mirror-linux/net
Lorenz Bauer c0ce0fb766 bpf: reject unhashed sockets in bpf_sk_assign
[ Upstream commit 67312adc96 ]

The semantics for bpf_sk_assign are as follows:

    sk = some_lookup_func()
    bpf_sk_assign(skb, sk)
    bpf_sk_release(sk)

That is, the sk is not consumed by bpf_sk_assign. The function
therefore needs to make sure that sk lives long enough to be
consumed from __inet_lookup_skb. The path through the stack for a
TCPv4 packet is roughly:

  netif_receive_skb_core: takes RCU read lock
    __netif_receive_skb_core:
      sch_handle_ingress:
        tcf_classify:
          bpf_sk_assign()
      deliver_ptype_list_skb:
        deliver_skb:
          ip_packet_type->func == ip_rcv:
            ip_rcv_core:
            ip_rcv_finish_core:
              dst_input:
                ip_local_deliver:
                  ip_local_deliver_finish:
                    ip_protocol_deliver_rcu:
                      tcp_v4_rcv:
                        __inet_lookup_skb:
                          skb_steal_sock

The existing helper takes advantage of the fact that everything
happens in the same RCU critical section: for sockets with
SOCK_RCU_FREE set, bpf_sk_assign never takes a reference.
skb_steal_sock then checks SOCK_RCU_FREE again and does sock_put
if necessary.

This approach assumes that SOCK_RCU_FREE is never set on a sk
between bpf_sk_assign and skb_steal_sock, but this invariant is
violated by unhashed UDP sockets. A new UDP socket is created
in TCP_CLOSE state but without SOCK_RCU_FREE set. That flag is only
added in udp_lib_get_port(), which runs when the socket is bound.

When bpf_sk_assign was added it wasn't possible to access unhashed
UDP sockets from BPF, so this wasn't a problem. This changed
in commit 0c48eefae7 ("sock_map: Lift socket state restriction
for datagram sockets"), but the helper wasn't adjusted accordingly.
The following sequence of events will therefore lead to a refcount
leak:

1. Add socket(AF_INET, SOCK_DGRAM) to a sockmap.
2. Pull socket out of sockmap and bpf_sk_assign it. Since
   SOCK_RCU_FREE is not set we increment the refcount.
3. bind() or connect() the socket, setting SOCK_RCU_FREE.
4. skb_steal_sock will now set refcounted = false due to
   SOCK_RCU_FREE.
5. tcp_v4_rcv() skips sock_put().
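
The five steps above can be replayed in a small userspace model. This is
only a sketch: toy_sock, toy_skb and every toy_* function are hypothetical
stand-ins invented for illustration, not kernel code. The model keeps just
the two pieces of state the scenario depends on, a reference count and a
flag standing in for SOCK_RCU_FREE.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sock: only the state this scenario needs. */
struct toy_sock {
    int  refcnt;    /* models sk->sk_refcnt */
    bool rcu_free;  /* models the SOCK_RCU_FREE flag */
};

/* Hypothetical stand-in for an skb that may carry an assigned socket. */
struct toy_skb {
    struct toy_sock *sk;
};

/* Models the pre-fix helper: take a reference only if the flag is clear. */
static void toy_sk_assign(struct toy_skb *skb, struct toy_sock *sk)
{
    if (!sk->rcu_free)
        sk->refcnt++;
    skb->sk = sk;
}

/* Models bind()/connect() on a UDP socket, which sets the flag (step 3). */
static void toy_bind(struct toy_sock *sk)
{
    sk->rcu_free = true;
}

/* Models skb_steal_sock: the flag is read again to decide `refcounted`. */
static struct toy_sock *toy_steal_sock(struct toy_skb *skb, bool *refcounted)
{
    struct toy_sock *sk = skb->sk;

    *refcounted = !sk->rcu_free;
    skb->sk = NULL;
    return sk;
}

/* Models the receive path: drop the reference only if `refcounted` is set. */
static void toy_rcv(struct toy_skb *skb)
{
    bool refcounted;
    struct toy_sock *sk = toy_steal_sock(skb, &refcounted);

    if (refcounted)
        sk->refcnt--;
}

/*
 * Replays the sequence and returns the final reference count.
 * The socket starts with one reference (held by its owner), so a
 * return value of 1 means balanced and anything higher means a leak.
 */
static int toy_leak_scenario(bool bind_in_between)
{
    struct toy_sock sk = { .refcnt = 1, .rcu_free = false };
    struct toy_skb skb = { .sk = NULL };

    toy_sk_assign(&skb, &sk);      /* step 2: reference taken */
    if (bind_in_between)
        toy_bind(&sk);             /* step 3: flag flips */
    toy_rcv(&skb);                 /* steps 4 and 5 */
    return sk.refcnt;
}
```

In this model toy_leak_scenario(false) ends with the count back at its
starting value, while toy_leak_scenario(true) leaves one extra reference
behind, which corresponds to the leak described in the list.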

Fix the problem by rejecting unhashed sockets in bpf_sk_assign().
This matches the behaviour of __inet_lookup_skb which is ultimately
the goal of bpf_sk_assign().
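
The shape of the fix can be sketched in a userspace model as well. The
types and the toy_sk_assign_fixed function below are hypothetical and
exist only for illustration. The `hashed` field stands in for whatever
check the kernel uses to detect an unhashed socket, and the specific
error code returned is an assumption, since the message above does not
name one.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sock. */
struct toy_sock {
    int  refcnt;    /* models sk->sk_refcnt */
    bool rcu_free;  /* models the SOCK_RCU_FREE flag */
    bool hashed;    /* models "socket is present in a lookup table" */
};

/* Hypothetical stand-in for an skb that may carry an assigned socket. */
struct toy_skb {
    struct toy_sock *sk;
};

/*
 * Models the helper after the fix: an unhashed socket is rejected before
 * any reference is taken, so the flag can no longer change between the
 * assign and the later steal for a socket that was accepted unhashed.
 * Returns 0 on success or a negative errno value on rejection.
 */
static int toy_sk_assign_fixed(struct toy_skb *skb, struct toy_sock *sk)
{
    if (!sk->hashed)
        return -EOPNOTSUPP;  /* assumption: the exact errno is illustrative */

    if (!sk->rcu_free)
        sk->refcnt++;
    skb->sk = sk;
    return 0;
}
```

With this check in place, a freshly created socket that has never been
bound is refused and its reference count is left untouched, while a
hashed socket is handled exactly as before.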

Fixes: cf7fbe660f ("bpf: Add socket assign support")
Cc: Joe Stringer <joe@cilium.io>
Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230720-so-reuseport-v6-2-7021b683cdae@isovalent.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-13 09:42:31 +02:00
6lowpan
9p 9p: virtio: make sure 'offs' is initialized in zc_request 2023-09-13 09:42:21 +02:00
802
8021q vlan: fix a potential uninit-value in vlan_dev_hard_start_xmit() 2023-05-24 17:32:47 +01:00
appletalk
atm atm: hide unused procfs functions 2023-06-09 10:34:16 +02:00
ax25
batman-adv batman-adv: Hold rtnl lock during MTU update via netlink 2023-08-30 16:11:08 +02:00
bluetooth Bluetooth: MGMT: Use correct address for memcpy() 2023-08-23 17:52:27 +02:00
bpf
bpfilter
bridge Revert "bridge: Add extack warning when enabling STP in netns." 2023-09-13 09:42:20 +02:00
caif
can can: raw: add missing refcount for memory leak fix 2023-08-30 16:11:11 +02:00
ceph libceph: fix potential hang in ceph_osdc_notify() 2023-08-11 12:08:19 +02:00
core bpf: reject unhashed sockets in bpf_sk_assign 2023-09-13 09:42:31 +02:00
dcb net: dcb: choose correct policy to parse DCB_ATTR_BCN 2023-08-11 12:08:17 +02:00
dccp ipv4: fix data-races around inet->inet_id 2023-08-30 16:11:02 +02:00
devlink devlink: add missing unregister linecard notification 2023-08-30 16:11:00 +02:00
dns_resolver
dsa net: dsa: sja1105: always enable the send_meta options 2023-07-19 16:22:06 +02:00
ethernet
ethtool ethtool: Fix uninitialized number of lanes 2023-05-17 11:53:37 +02:00
hsr
ieee802154
ife
ipv4 udp: re-score reuseport groups when connected sockets are present 2023-09-13 09:42:31 +02:00
ipv6 udp: re-score reuseport groups when connected sockets are present 2023-09-13 09:42:31 +02:00
iucv
kcm
key net: af_key: fix sadb_x_filter validation 2023-08-23 17:52:32 +02:00
l2tp net: annotate data-races around sk->sk_mark 2023-08-11 12:08:14 +02:00
l3mdev
lapb
llc llc: Don't drop packet from non-root netns. 2023-07-27 08:50:45 +02:00
mac80211 wifi: mac80211: Use active_links instead of valid_links in Tx 2023-09-13 09:42:24 +02:00
mac802154
mctp
mpls
mptcp mptcp: fix the incorrect judgment for msk->cb_flags 2023-08-16 18:27:26 +02:00
ncsi net/ncsi: change from ndo_set_mac_address to dev_set_mac_address 2023-07-23 13:49:51 +02:00
netfilter netfilter: nf_tables: fix out of memory error handling 2023-08-30 16:11:03 +02:00
netlabel netlabel: fix shift wrapping bug in netlbl_catmap_setlong() 2023-09-13 09:42:24 +02:00
netlink netlink: Add __sock_i_ino() for __netlink_diag_dump(). 2023-07-19 16:21:13 +02:00
netrom netrom: fix info-leak in nr_write_internal() 2023-06-09 10:34:01 +02:00
nfc net: nfc: Fix use-after-free caused by nfc_llcp_find_local 2023-07-19 16:21:13 +02:00
nsh net: nsh: Use correct mac_offset to unwind gso skb in nsh_gso_segment() 2023-05-24 17:32:45 +01:00
openvswitch net: openvswitch: reject negative ifindex 2023-08-23 17:52:35 +02:00
packet net/packet: annotate data-races around tp->status 2023-08-16 18:27:26 +02:00
phonet
psample
qrtr net: qrtr: Fix an uninit variable access bug in qrtr_tx_resume() 2023-04-20 12:35:09 +02:00
rds
rfkill
rose
rxrpc rxrpc: Fix hard call timeout units 2023-05-17 11:53:35 +02:00
sched net: annotate data-races around sk->sk_{rcv|snd}timeo 2023-09-13 09:42:23 +02:00
sctp sctp: handle invalid error codes without calling BUG() 2023-09-13 09:42:25 +02:00
smc net/smc: Fix setsockopt and sysctl to specify same buffer size again 2023-08-23 17:52:18 +02:00
strparser
sunrpc xprtrdma: Remap Receive buffers after a reconnect 2023-08-30 16:10:57 +02:00
switchdev
tipc tipc: stop tipc crypto on failure in tipc_node_create 2023-08-03 10:24:02 +02:00
tls net: tls: avoid discarding data on record close 2023-08-16 18:27:27 +02:00
unix af_unix: Fix null-ptr-deref in unix_stream_sendpage(). 2023-08-23 17:52:42 +02:00
vmw_vsock vsock: avoid to close connected socket after the timeout 2023-05-24 17:32:44 +01:00
wireless wifi: cfg80211: remove links only on AP 2023-09-13 09:42:24 +02:00
x25
xdp xsk: fix refcount underflow in error path 2023-08-16 18:27:27 +02:00
xfrm xfrm: add forgotten nla_policy for XFRMA_MTIMER_THRESH 2023-08-23 17:52:32 +02:00
Kconfig
Kconfig.debug
Makefile devlink: move code to a dedicated directory 2023-08-30 16:11:00 +02:00
compat.c
devres.c
socket.c net: Avoid address overwrite in kernel_connect 2023-09-13 09:42:26 +02:00
sysctl_net.c