mirror-linux/net/sched
William Liu 0e1d5d9b5c net/sched: Return NULL when htb_lookup_leaf encounters an empty rbtree
htb_lookup_leaf has a BUG_ON that can trigger with the following:

tc qdisc del dev lo root
tc qdisc add dev lo root handle 1: htb default 1
tc class add dev lo parent 1: classid 1:1 htb rate 64bit
tc qdisc add dev lo parent 1:1 handle 2: netem
tc qdisc add dev lo parent 2:1 handle 3: blackhole
ping -I lo -c1 -W0.001 127.0.0.1

The root cause is the following:

1. htb_dequeue calls htb_dequeue_tree which calls the dequeue handler on
   the selected leaf qdisc
2. netem_dequeue calls enqueue on the child qdisc
3. blackhole_enqueue drops the packet and returns a value that is not
   just NET_XMIT_SUCCESS
4. Because of this, netem_dequeue calls qdisc_tree_reduce_backlog, and
   since qlen is now 0, it calls htb_qlen_notify -> htb_deactivate ->
   htb_deactivate_prios -> htb_remove_class_from_row -> htb_safe_rb_erase
5. As this is the only class in the selected hprio rbtree,
   __rb_change_child in __rb_erase_augmented sets the rb_root pointer to
   NULL
6. Because blackhole_dequeue returns NULL, netem_dequeue returns NULL,
   which causes htb_dequeue_tree to call htb_lookup_leaf with the same
   hprio rbtree, and fail the BUG_ON
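
The sequence above can be modelled in a few lines of userspace C. This is
only a sketch: every toy_* name is hypothetical and stands in for the kernel
function named in its comment, and a single pointer replaces the hprio
rbtree. It shows how a notification fired from inside the child's dequeue
empties the row that the caller is about to look up again.

```c
#include <stddef.h>

struct toy_row;

/* Hypothetical stand-in for an HTB class that sits alone in one row. */
struct toy_class {
	struct toy_row *row;	/* row (hprio rbtree) the class is linked into */
	int qlen;		/* packets queued below this class */
};

/* Hypothetical stand-in for one hprio rbtree holding at most one class. */
struct toy_row {
	struct toy_class *only;
};

/* Models htb_qlen_notify -> ... -> htb_safe_rb_erase (steps 4 and 5). */
static void toy_qlen_notify(struct toy_class *cl)
{
	if (cl->qlen == 0)
		cl->row->only = NULL;	/* last class removed, row is now empty */
}

/*
 * Models netem_dequeue on top of blackhole (steps 2, 3 and 6): the packet
 * is dropped by the child, the parent is notified, and no packet comes back.
 * Returns the number of packets handed to the caller, which is always 0.
 */
static int toy_child_dequeue(struct toy_class *cl)
{
	cl->qlen = 0;
	toy_qlen_notify(cl);
	return 0;
}

/*
 * Models the relevant part of htb_dequeue_tree: returns 1 when the row was
 * non-empty at the first lookup but is empty by the time of the second one.
 */
static int toy_row_emptied_by_dequeue(struct toy_row *row)
{
	struct toy_class *cl = row->only;	/* first lookup succeeds */

	if (!cl)
		return 0;
	if (toy_child_dequeue(cl))
		return 0;			/* a packet was delivered */
	return row->only == NULL;		/* state seen by the second lookup */
}
```

With one class of qlen 1 linked into the row, toy_row_emptied_by_dequeue
reports that the row went empty between the two lookups, which is the state
in which the real htb_lookup_leaf hits its BUG_ON.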

The function graph for this scenario is shown here:
 0)               |  htb_enqueue() {
 0) + 13.635 us   |    netem_enqueue();
 0)   4.719 us    |    htb_activate_prios();
 0) # 2249.199 us |  }
 0)               |  htb_dequeue() {
 0)   2.355 us    |    htb_lookup_leaf();
 0)               |    netem_dequeue() {
 0) + 11.061 us   |      blackhole_enqueue();
 0)               |      qdisc_tree_reduce_backlog() {
 0)               |        qdisc_lookup_rcu() {
 0)   1.873 us    |          qdisc_match_from_root();
 0)   6.292 us    |        }
 0)   1.894 us    |        htb_search();
 0)               |        htb_qlen_notify() {
 0)   2.655 us    |          htb_deactivate_prios();
 0)   6.933 us    |        }
 0) + 25.227 us   |      }
 0)   1.983 us    |      blackhole_dequeue();
 0) + 86.553 us   |    }
 0) # 2932.761 us |    qdisc_warn_nonwc();
 0)               |    htb_lookup_leaf() {
 0)               |      BUG_ON();
 ------------------------------------------

The full original bug report can be seen here [1].

The fix is to return NULL from htb_lookup_leaf instead of triggering the
BUG_ON. This is safe because htb_dequeue_tree already returns NULL when
htb_lookup_leaf returns NULL.
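
As an illustration of the pattern only (this is not the actual sch_htb.c
change), the sketch below uses hypothetical toy_* types in place of the
kernel rbtree. The lookup reports an empty tree by returning NULL, and the
caller treats that as "nothing to dequeue" rather than as a fatal condition.

```c
#include <stddef.h>

/* Hypothetical stand-in for a tree node; not the kernel's struct rb_node. */
struct toy_node {
	struct toy_node *left;
	int id;
};

/* Hypothetical stand-in for struct rb_root. */
struct toy_root {
	struct toy_node *node;
};

/*
 * Walk to the leftmost node. An empty tree yields NULL instead of an
 * assertion failure, mirroring the behaviour the commit describes.
 */
static struct toy_node *toy_lookup_leaf(const struct toy_root *root)
{
	struct toy_node *n = root->node;

	if (!n)
		return NULL;
	while (n->left)
		n = n->left;
	return n;
}

/*
 * Caller in the role of htb_dequeue_tree: a NULL leaf simply means there
 * is nothing to send. Returns the leaf id, or -1 when the tree is empty.
 */
static int toy_dequeue_tree(const struct toy_root *root)
{
	struct toy_node *leaf = toy_lookup_leaf(root);

	if (!leaf)
		return -1;
	return leaf->id;
}
```

The caller already needs a "no packet" path, so pushing the empty-tree case
onto that path avoids adding a separate error convention.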

[1] https://lore.kernel.org/netdev/pF5XOOIim0IuEfhI-SOxTgRvNoDwuux7UHKnE_Y5-zVd4wmGvNk2ceHjKb8ORnzw0cGwfmVu42g9dL7XyJLf1NEzaztboTWcm0Ogxuojoeo=@willsroot.io/

Fixes: 512bb43eb5 ("pkt_sched: sch_htb: Optimize WARN_ONs in htb_dequeue_tree() etc.")
Signed-off-by: William Liu <will@willsroot.io>
Signed-off-by: Savino Dicanosa <savy@syst3mfailure.io>
Link: https://patch.msgid.link/20250717022816.221364-1-will@willsroot.io
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17 07:47:55 -07:00
Kconfig sctp: use skb_crc32c() instead of __skb_checksum() 2025-05-21 15:40:16 -07:00
Makefile bpf: net_sched: Support implementation of Qdisc_ops in bpf 2025-04-17 10:54:33 -07:00
act_api.c tc: Return an error if filters try to attach too many actions 2025-04-15 17:13:11 -07:00
act_bpf.c net: Rename mono_delivery_time to tstamp_type for scalabilty 2024-05-23 14:14:23 -07:00
act_connmark.c
act_csum.c
act_ct.c net: convert to nla_get_*_default() 2024-11-11 10:32:06 -08:00
act_ctinfo.c net: convert to nla_get_*_default() 2024-11-11 10:32:06 -08:00
act_gact.c
act_gate.c net/sched: Switch to use hrtimer_setup() 2025-02-18 10:35:44 +01:00
act_ife.c
act_meta_mark.c
act_meta_skbprio.c
act_meta_skbtcindex.c
act_mirred.c net/sched: act_mirred: Move the recursion counter struct netdev_xmit 2025-05-15 15:23:31 +02:00
act_mpls.c net: convert to nla_get_*_default() 2024-11-11 10:32:06 -08:00
act_nat.c
act_pedit.c
act_police.c net: convert to nla_get_*_default() 2024-11-11 10:32:06 -08:00
act_sample.c net: sched: act_sample: add action cookie to sample 2024-07-05 17:45:47 -07:00
act_simple.c
act_skbedit.c
act_skbmod.c net/sched: act_skbmod: convert comma to semicolon 2024-07-11 17:12:15 -07:00
act_tunnel_key.c net: fix geneve_opt length integer overflow 2025-04-03 15:47:35 -07:00
act_vlan.c tc: adjust network header after 2nd vlan push 2024-08-27 11:37:42 +02:00
bpf_qdisc.c bpf: net_sched: Make some Qdisc_ops ops mandatory 2025-05-02 15:35:37 -07:00
cls_api.c tc: Ensure we have enough buffer space when sending filter netlink notifications 2025-04-08 13:57:49 +02:00
cls_basic.c
cls_bpf.c net: sched: refine software bypass handling in tc_run 2025-01-20 09:21:27 +00:00
cls_cgroup.c
cls_flow.c treewide, timers: Rename from_timer() to timer_container_of() 2025-06-08 09:07:37 +02:00
cls_flower.c net: fix geneve_opt length integer overflow 2025-04-03 15:47:35 -07:00
cls_fw.c
cls_matchall.c net: sched: refine software bypass handling in tc_run 2025-01-20 09:21:27 +00:00
cls_route.c
cls_u32.c net: sched: refine software bypass handling in tc_run 2025-01-20 09:21:27 +00:00
em_canid.c
em_cmp.c move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
em_ipset.c
em_ipt.c
em_meta.c net: dismiss sk_forward_alloc_get() 2025-02-19 19:05:28 -08:00
em_nbyte.c
em_text.c
em_u32.c
ematch.c
sch_api.c net/sched: sch_qfq: Fix null-deref in agg_dequeue 2025-07-10 11:08:35 +02:00
sch_blackhole.c
sch_cake.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-01-09 16:11:47 -08:00
sch_cbs.c net/sched: cbs: Fix integer overflow in cbs_set_port_rate() 2024-10-15 18:25:47 -07:00
sch_choke.c net: sched: fix ordering of qlen adjustment 2024-12-04 12:54:22 +00:00
sch_codel.c net_sched: Flush gso_skb list too during ->change() 2025-05-09 12:34:38 +01:00
sch_drr.c net_sched: drr: Fix double list add in class with netem as child qdisc 2025-04-28 15:55:06 -07:00
sch_etf.c
sch_ets.c net_sched: ets: fix a race in ets_qdisc_change() 2025-06-12 08:05:50 -07:00
sch_fifo.c pfifo_tail_enqueue: Drop new packet when sch->limit == 0 2025-02-05 18:13:58 -08:00
sch_fq.c net_sched: Flush gso_skb list too during ->change() 2025-05-09 12:34:38 +01:00
sch_fq_codel.c net_sched: Flush gso_skb list too during ->change() 2025-05-09 12:34:38 +01:00
sch_fq_pie.c treewide, timers: Rename from_timer() to timer_container_of() 2025-06-08 09:07:37 +02:00
sch_frag.c net/sched: Use nested-BH locking for sch_frag_data_storage 2025-05-15 15:23:31 +02:00
sch_generic.c treewide, timers: Rename from_timer() to timer_container_of() 2025-06-08 09:07:37 +02:00
sch_gred.c sched: address a potential NULL pointer dereference in the GRED scheduler. 2025-03-06 16:35:14 -08:00
sch_hfsc.c net/sched: sch_qfq: Fix null-deref in agg_dequeue 2025-07-10 11:08:35 +02:00
sch_hhf.c net_sched: Flush gso_skb list too during ->change() 2025-05-09 12:34:38 +01:00
sch_htb.c net/sched: Return NULL when htb_lookup_leaf encounters an empty rbtree 2025-07-17 07:47:55 -07:00
sch_ingress.c bpf: Fix too early release of tcx_entry 2024-07-08 14:07:31 -07:00
sch_mq.c
sch_mqprio.c
sch_mqprio_lib.c
sch_mqprio_lib.h
sch_multiq.c net: sched: sch_multiq: fix possible OOB write in multiq_tune() 2024-06-05 10:50:19 +01:00
sch_netem.c netem: Update sch->q.qlen before qdisc_tree_reduce_backlog() 2025-02-05 18:14:46 -08:00
sch_pie.c treewide, timers: Rename from_timer() to timer_container_of() 2025-06-08 09:07:37 +02:00
sch_plug.c
sch_prio.c net_sched: prio: fix a race in prio_tune() 2025-06-12 08:05:49 -07:00
sch_qfq.c net/sched: sch_qfq: Fix race condition on qfq_aggregate 2025-07-13 00:09:33 +01:00
sch_red.c Including fixes from bluetooth and wireless. 2025-06-12 09:50:36 -07:00
sch_sfb.c net/sched: Add drop reasons for AQM-based qdiscs 2024-12-17 13:27:29 +01:00
sch_sfq.c Including fixes from bluetooth and wireless. 2025-06-12 09:50:36 -07:00
sch_skbprio.c net_sched: skbprio: Remove overly strict queue assertions 2025-04-02 16:03:32 -07:00
sch_taprio.c net/sched: fix use-after-free in taprio_dev_notifier 2025-06-17 16:14:04 -07:00
sch_tbf.c net_sched: tbf: fix a race in tbf_change() 2025-06-12 08:05:50 -07:00
sch_teql.c net: annotate writes on dev->mtu from ndo_change_mtu() 2024-05-07 16:19:14 -07:00