mirror-linux/net
Ido Schimmel 87d7e14008 nexthop: Fix infinite nexthop bucket dump when using maximum nexthop ID
commit 8743aeff5b upstream.

A netlink dump callback can return a positive number to signal that more
information needs to be dumped or zero to signal that the dump is
complete. In the second case, the core netlink code will append the
NLMSG_DONE message to the skb in order to indicate to user space that
the dump is complete.

The nexthop bucket dump callback always returns a positive number if
nexthop buckets were filled in the provided skb, even if the dump is
complete. This means that a dump will span at least two recvmsg() calls
as long as nexthop buckets are present. In the last recvmsg() call the
dump callback will not fill in any nexthop buckets because the previous
call indicated that the dump should restart from the last dumped nexthop
ID plus one.

 # ip link add name dummy1 up type dummy
 # ip nexthop add id 1 dev dummy1
 # ip nexthop add id 10 group 1 type resilient buckets 2
 # strace -e sendto,recvmsg -s 5 ip nexthop bucket
 sendto(3, [[{nlmsg_len=24, nlmsg_type=RTM_GETNEXTHOPBUCKET, nlmsg_flags=NLM_F_REQUEST|NLM_F_DUMP, nlmsg_seq=1691396980, nlmsg_pid=0}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}], {nlmsg_len=0, nlmsg_type=0 /* NLMSG_??? */, nlmsg_flags=0, nlmsg_seq=0, nlmsg_pid=0}], 152, 0, NULL, 0) = 152
 recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 128
 recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[[{nlmsg_len=64, nlmsg_type=RTM_NEWNEXTHOPBUCKET, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396980, nlmsg_pid=347}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}], [{nlmsg_len=64, nlmsg_type=RTM_NEWNEXTHOPBUCKET, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396980, nlmsg_pid=347}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}]], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 128
 id 10 index 0 idle_time 6.66 nhid 1
 id 10 index 1 idle_time 6.66 nhid 1
 recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 20
 recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[{nlmsg_len=20, nlmsg_type=NLMSG_DONE, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396980, nlmsg_pid=347}, 0], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 20
 +++ exited with 0 +++

This behavior is both inefficient and buggy. If the last nexthop to be
dumped had the maximum ID of 0xffffffff, then the dump will restart from
0 (0xffffffff + 1) and never end:

 # ip link add name dummy1 up type dummy
 # ip nexthop add id 1 dev dummy1
 # ip nexthop add id $((2**32-1)) group 1 type resilient buckets 2
 # ip nexthop bucket
 id 4294967295 index 0 idle_time 5.55 nhid 1
 id 4294967295 index 1 idle_time 5.55 nhid 1
 id 4294967295 index 0 idle_time 5.55 nhid 1
 id 4294967295 index 1 idle_time 5.55 nhid 1
 [...]

Fix by adjusting the dump callback to return zero when the dump is
complete. After the fix only one recvmsg() call is made and the
NLMSG_DONE message is appended to the RTM_NEWNEXTHOPBUCKET responses:

 # ip link add name dummy1 up type dummy
 # ip nexthop add id 1 dev dummy1
 # ip nexthop add id $((2**32-1)) group 1 type resilient buckets 2
 # strace -e sendto,recvmsg -s 5 ip nexthop bucket
 sendto(3, [[{nlmsg_len=24, nlmsg_type=RTM_GETNEXTHOPBUCKET, nlmsg_flags=NLM_F_REQUEST|NLM_F_DUMP, nlmsg_seq=1691396737, nlmsg_pid=0}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}], {nlmsg_len=0, nlmsg_type=0 /* NLMSG_??? */, nlmsg_flags=0, nlmsg_seq=0, nlmsg_pid=0}], 152, 0, NULL, 0) = 152
 recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 148
 recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[[{nlmsg_len=64, nlmsg_type=RTM_NEWNEXTHOPBUCKET, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396737, nlmsg_pid=350}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}], [{nlmsg_len=64, nlmsg_type=RTM_NEWNEXTHOPBUCKET, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396737, nlmsg_pid=350}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}], [{nlmsg_len=20, nlmsg_type=NLMSG_DONE, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396737, nlmsg_pid=350}, 0]], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 148
 id 4294967295 index 0 idle_time 6.61 nhid 1
 id 4294967295 index 1 idle_time 6.61 nhid 1
 +++ exited with 0 +++

Note that if the NLMSG_DONE message cannot be appended because of size
limitations, then another recvmsg() will be needed, but the core netlink
code will not invoke the dump callback and simply reply with a
NLMSG_DONE message since it knows that the callback previously returned
zero.

Add a test that fails before the fix:

 # ./fib_nexthops.sh -t basic_res
 [...]
 TEST: Maximum nexthop ID dump                                       [FAIL]
 [...]

And passes after it:

 # ./fib_nexthops.sh -t basic_res
 [...]
 TEST: Maximum nexthop ID dump                                       [ OK ]
 [...]

Fixes: 8a1bbabb03 ("nexthop: Add netlink handlers for bucket dump")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230808075233.3337922-4-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-16 18:27:28 +02:00
..
6lowpan
9p 9p/xen : Fix use after free bug in xen_9pfs_front_remove due to race condition 2023-04-20 12:35:08 +02:00
802 mrp: introduce active flags to prevent UAF when applicant uninit 2022-12-31 13:33:02 +01:00
8021q vlan: fix a potential uninit-value in vlan_dev_hard_start_xmit() 2023-05-24 17:32:47 +01:00
appletalk
atm atm: hide unused procfs functions 2023-06-09 10:34:16 +02:00
ax25
batman-adv batman-adv: Broken sync while rescheduling delayed work 2023-06-14 11:15:23 +02:00
bluetooth Bluetooth: L2CAP: Fix use-after-free in l2cap_sock_ready_cb 2023-08-11 12:08:23 +02:00
bpf Revert "bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES" 2023-03-17 08:50:32 +01:00
bpfilter
bridge bridge: Add extack warning when enabling STP in netns. 2023-07-27 08:50:40 +02:00
caif net: caif: Fix use-after-free in cfusbl_device_notify() 2023-03-17 08:50:24 +01:00
can can: bcm: Fix UAF in bcm_proc_show() 2023-07-27 08:50:27 +02:00
ceph libceph: fix potential hang in ceph_osdc_notify() 2023-08-11 12:08:19 +02:00
core bpf, sockmap: Fix bug that strp_done cannot be called 2023-08-16 18:27:26 +02:00
dcb net: dcb: choose correct policy to parse DCB_ATTR_BCN 2023-08-11 12:08:17 +02:00
dccp dccp: fix data-race around dp->dccps_mss_cache 2023-08-16 18:27:27 +02:00
dns_resolver
dsa net: dsa: sja1105: always enable the send_meta options 2023-07-19 16:22:06 +02:00
ethernet
ethtool ethtool: Fix uninitialized number of lanes 2023-05-17 11:53:37 +02:00
hsr hsr: ratelimit only when errors are printed 2023-04-06 12:10:58 +02:00
ieee802154
ife
ipv4 nexthop: Fix infinite nexthop bucket dump when using maximum nexthop ID 2023-08-16 18:27:28 +02:00
ipv6 ipv6: adjust ndisc_is_useropt() to also return true for PIO 2023-08-16 18:27:21 +02:00
iucv net/iucv: Fix size of interrupt data 2023-03-22 13:33:50 +01:00
kcm
key af_key: Reject optional tunnel/BEET mode templates in outbound policies 2023-05-24 17:32:43 +01:00
l2tp net: annotate data-races around sk->sk_mark 2023-08-11 12:08:14 +02:00
l3mdev
lapb
llc llc: Don't drop packet from non-root netns. 2023-07-27 08:50:45 +02:00
mac80211 wifi: mac80211: Remove "Missing iftype sband data/EHT cap" spam 2023-07-19 16:21:09 +02:00
mac802154 mac802154: fix missing INIT_LIST_HEAD in ieee802154_if_add() 2022-12-05 09:53:08 +01:00
mctp net: mctp: purge receive queues on sk destruction 2023-02-06 08:06:34 +01:00
mpls net: mpls: fix stale pointer if allocation fails during device rename 2023-02-22 12:59:53 +01:00
mptcp mptcp: fix the incorrect judgment for msk->cb_flags 2023-08-16 18:27:26 +02:00
ncsi net/ncsi: change from ndo_set_mac_address to dev_set_mac_address 2023-07-23 13:49:51 +02:00
netfilter net: annotate data-races around sk->sk_mark 2023-08-11 12:08:14 +02:00
netlabel
netlink netlink: Add __sock_i_ino() for __netlink_diag_dump(). 2023-07-19 16:21:13 +02:00
netrom netrom: fix info-leak in nr_write_internal() 2023-06-09 10:34:01 +02:00
nfc net: nfc: Fix use-after-free caused by nfc_llcp_find_local 2023-07-19 16:21:13 +02:00
nsh net: nsh: Use correct mac_offset to unwind gso skb in nsh_gso_segment() 2023-05-24 17:32:45 +01:00
openvswitch net: openvswitch: fix race on port output 2023-04-20 12:35:09 +02:00
packet net/packet: annotate data-races around tp->status 2023-08-16 18:27:26 +02:00
phonet
psample
qrtr net: qrtr: Fix an uninit variable access bug in qrtr_tx_resume() 2023-04-20 12:35:09 +02:00
rds rds: rds_rm_zerocopy_callback() correct order for list_add_tail() 2023-03-10 09:33:02 +01:00
rfkill
rose net/rose: Fix to not accept on connected socket 2023-02-22 12:59:42 +01:00
rxrpc rxrpc: Fix hard call timeout units 2023-05-17 11:53:35 +02:00
sched net/sched: cls_route: No longer copy tcf_result on update to avoid use-after-free 2023-08-11 12:08:16 +02:00
sctp sctp: fix potential deadlock on &net->sctp.addr_wq_lock 2023-07-19 16:22:00 +02:00
smc net/smc: Use correct buffer sizes when switching between TCP and SMC 2023-08-16 18:27:26 +02:00
strparser
sunrpc SUNRPC: Fix UAF in svc_tcp_listen_data_ready() 2023-07-19 16:21:48 +02:00
switchdev
tipc tipc: stop tipc crypto on failure in tipc_node_create 2023-08-03 10:24:02 +02:00
tls net: tls: avoid discarding data on record close 2023-08-16 18:27:27 +02:00
unix net: add missing data-race annotations around sk->sk_peek_off 2023-08-11 12:08:14 +02:00
vmw_vsock vsock: avoid to close connected socket after the timeout 2023-05-24 17:32:44 +01:00
wireless wifi: nl80211: fix integer overflow in nl80211_parse_mbssid_elems() 2023-08-16 18:27:20 +02:00
x25 net/x25: Fix to not accept on connected socket 2023-02-09 11:28:13 +01:00
xdp xsk: fix refcount underflow in error path 2023-08-16 18:27:27 +02:00
xfrm net: annotate data-races around sk->sk_mark 2023-08-11 12:08:14 +02:00
Kconfig
Kconfig.debug
Makefile
compat.c use less confusing names for iov_iter direction initializers 2023-02-09 11:28:04 +01:00
devres.c
socket.c net: annotate sk->sk_err write from do_recvmmsg() 2023-05-24 17:32:32 +01:00
sysctl_net.c