mirror-linux/net/core
Lorenz Bauer c0ce0fb766 bpf: reject unhashed sockets in bpf_sk_assign
[ Upstream commit 67312adc96 ]

The semantics for bpf_sk_assign are as follows:

    sk = some_lookup_func()
    bpf_sk_assign(skb, sk)
    bpf_sk_release(sk)

That is, the sk is not consumed by bpf_sk_assign. The function
therefore needs to make sure that sk lives long enough to be
consumed from __inet_lookup_skb. The path through the stack for a
TCPv4 packet is roughly:

  netif_receive_skb_core: takes RCU read lock
    __netif_receive_skb_core:
      sch_handle_ingress:
        tcf_classify:
          bpf_sk_assign()
      deliver_ptype_list_skb:
        deliver_skb:
          ip_packet_type->func == ip_rcv:
            ip_rcv_core:
            ip_rcv_finish_core:
              dst_input:
                ip_local_deliver:
                  ip_local_deliver_finish:
                    ip_protocol_deliver_rcu:
                      tcp_v4_rcv:
                        __inet_lookup_skb:
                          skb_steal_sock

The existing helper takes advantage of the fact that everything
happens in the same RCU critical section: for sockets with
SOCK_RCU_FREE set bpf_sk_assign never takes a reference.
skb_steal_sock then checks SOCK_RCU_FREE again and does sock_put
if necessary.

This approach assumes that SOCK_RCU_FREE is never set on a sk
between bpf_sk_assign and skb_steal_sock, but this invariant is
violated by unhashed UDP sockets. A new UDP socket is created
in TCP_CLOSE state but without SOCK_RCU_FREE set. That flag is only
added in udp_lib_get_port() which happens when a socket is bound.

When bpf_sk_assign was added it wasn't possible to access unhashed
UDP sockets from BPF, so this wasn't a problem. This changed
in commit 0c48eefae7 ("sock_map: Lift socket state restriction
for datagram sockets"), but the helper wasn't adjusted accordingly.
The following sequence of events will therefore lead to a refcount
leak:

1. Add socket(AF_INET, SOCK_DGRAM) to a sockmap.
2. Pull socket out of sockmap and bpf_sk_assign it. Since
   SOCK_RCU_FREE is not set we increment the refcount.
3. bind() or connect() the socket, setting SOCK_RCU_FREE.
4. skb_steal_sock will now set refcounted = false due to
   SOCK_RCU_FREE.
5. tcp_v4_rcv() skips sock_put().

Fix the problem by rejecting unhashed sockets in bpf_sk_assign().
This matches the behaviour of __inet_lookup_skb which is ultimately
the goal of bpf_sk_assign().

Fixes: cf7fbe660f ("bpf: Add socket assign support")
Cc: Joe Stringer <joe@cilium.io>
Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230720-so-reuseport-v6-2-7021b683cdae@isovalent.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-13 09:42:31 +02:00
..
Makefile devlink: move code to a dedicated directory 2023-08-30 16:11:00 +02:00
bpf_sk_storage.c bpf: Add length check for SK_DIAG_BPF_STORAGE_REQ_MAP_FD parsing 2023-08-11 12:08:12 +02:00
datagram.c net: datagram: fix data-races in datagram_poll() 2023-05-24 17:32:32 +01:00
dev.c net: sched: add rcu annotations around qdisc->qdisc_sleeping 2023-06-14 11:15:21 +02:00
dev.h
dev_addr_lists.c
dev_addr_lists_test.c
dev_ioctl.c
drop_monitor.c
dst.c
dst_cache.c
failover.c
fib_notifier.c
fib_rules.c
filter.c bpf: reject unhashed sockets in bpf_sk_assign 2023-09-13 09:42:31 +02:00
flow_dissector.c netfilter: conntrack: Fix data-races around ct mark 2022-11-18 15:21:00 +01:00
flow_offload.c
gen_estimator.c
gen_stats.c
gro.c skb: Do mix page pool and page referenced frags in GRO 2023-02-09 11:28:05 +01:00
gro_cells.c net: drop the weight argument from netif_napi_add 2022-09-28 18:57:14 -07:00
hwbm.c
link_watch.c
lwt_bpf.c
lwtunnel.c xfrm: lwtunnel: squelch kernel warning in case XFRM encap type is not available 2022-10-12 10:45:51 +02:00
neighbour.c neighbour: delete neigh_lookup_nodev as not used 2023-06-21 16:01:03 +02:00
net-procfs.c
net-sysfs.c net-sysfs: Convert to use sysfs_emit() APIs 2022-09-30 12:27:44 +01:00
net-sysfs.h
net-traces.c
net_namespace.c net: fix UaF in netns ops registration error path 2023-02-01 08:34:43 +01:00
netclassid_cgroup.c
netevent.c
netpoll.c net: don't let netpoll invoke NAPI if in xmit context 2023-04-13 16:55:21 +02:00
netprio_cgroup.c
of_net.c
page_pool.c page_pool: fix inconsistency for page_pool_ring_[un]lock() 2023-06-05 09:26:20 +02:00
pktgen.c treewide: use get_random_u32() when possible 2022-10-11 17:42:58 -06:00
ptp_classifier.c
request_sock.c
rtnetlink.c rtnetlink: Reject negative ifindexes in RTM_NEWLINK 2023-08-30 16:11:04 +02:00
scm.c scm: add user copy checks to put_cmsg() 2023-03-10 09:33:54 +01:00
secure_seq.c
selftests.c
skbuff.c net: prevent skb corruption on frag list segmentation 2023-07-23 13:49:23 +02:00
skmsg.c bpf, sockmap: Fix bug that strp_done cannot be called 2023-08-16 18:27:26 +02:00
sock.c net: annotate data-races around sk->sk_{rcv|snd}timeo 2023-09-13 09:42:23 +02:00
sock_destructor.h
sock_diag.c
sock_map.c bpf, sockmap: Fix map type error in sock_map_del_link 2023-08-16 18:27:26 +02:00
sock_reuseport.c soreuseport: Fix socket selection for SO_INCOMING_CPU. 2022-12-31 13:32:04 +01:00
stream.c net: deal with most data-races in sk_wait_event() 2023-05-24 17:32:32 +01:00
sysctl_net_core.c
timestamping.c
tso.c
utils.c
xdp.c xdp: improve page_pool xdp_return performance 2022-09-26 11:28:19 -07:00