[ Upstream commit ca22da2fbd ]
William reports kernel soft-lockups on some OVS topologies when TC mirred
egress->ingress action is hit by local TCP traffic [1].
The same can also be reproduced with SCTP (thanks Xin for verifying), when
client and server reach each other through mirred egress to ingress, and
one of the two peers sends a "heartbeat" packet (from within a timer).
Enqueueing to backlog proved to fix this soft lockup; however, as Cong
noticed [2], we should preserve - when possible - the current mirred
behavior that counts as "overlimits" any packet drop that occurs after
the mirred forwarding action [3]. A compromise solution might use the
backlog only when tcf_mirred_act() has a nest level greater than one:
change tcf_mirred_forward() accordingly.
Also, add a kselftest that can reproduce the lockup and verify TC mirred's
ability to account for further packet drops after TC mirred egress->ingress
(when the nest level is 1).
[1] https://lore.kernel.org/netdev/33dc43f587ec1388ba456b4915c75f02a8aae226.1663945716.git.dcaratti@redhat.com/
[2] https://lore.kernel.org/netdev/Y0w%2FWWY60gqrtGLp@pop-os.localdomain/
[3] such behavior is not guaranteed: for example, if RPS or skb RX
timestamping is enabled on the mirred target device, the kernel
can defer receiving the skb and return NET_RX_SUCCESS inside
tcf_mirred_forward().
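
For illustration only, the kind of rule discussed above (egress traffic
redirected back to ingress) could be installed roughly as follows. This is
a minimal sketch, not the kselftest added by this commit: the helper name
mirred_egress_to_ingress, the run wrapper and the DRY_RUN variable are
assumptions, and a real reproducer would likely need further steps so that
the looped packet is accepted by the local stack.

```shell
#!/bin/bash
# Sketch only: redirect a device's egress TCP traffic to its own ingress.
# DRY_RUN defaults to 1, so commands are printed rather than executed;
# set DRY_RUN=0 and run as root to apply them.

run()
{
	if [ "${DRY_RUN:-1}" = "1" ]; then
		echo "$*"
	else
		"$@"
	fi
}

# Hypothetical helper; "$1" is whatever interface the test uses.
mirred_egress_to_ingress()
{
	local dev=$1

	# clsact provides the egress hook the filter attaches to.
	run tc qdisc add dev "$dev" clsact
	run tc filter add dev "$dev" egress protocol ip pref 100 \
		flower ip_proto tcp \
		action mirred ingress redirect dev "$dev"
}
```

Calling "mirred_egress_to_ingress veth0" in dry-run mode prints the two tc
commands, which makes it easy to review them before applying.
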
Reported-by: William Zhao <wizhao@redhat.com>
CC: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Motivation
==========
One of the nice things about network namespaces is that they allow one
to easily create and test complex environments.
Unfortunately, these namespaces cannot be used with actual switching
ASICs, as their ports cannot be migrated to other network namespaces
(NETIF_F_NETNS_LOCAL) and most of them probably do not support the
L1-separation provided by namespaces.
However, a similar kind of flexibility can be achieved by using VRFs and
by looping the switch ports together. For example:
                             br0
                              +
               vrf-h1         |           vrf-h2
                 +        +---+----+        +
                 |        |        |        |
    192.0.2.1/24 +        +        +        + 192.0.2.2/24
               swp1     swp2     swp3     swp4
                 +        +        +        +
                 |        |        |        |
                 +--------+        +--------+
The VRFs act as lightweight namespaces representing hosts connected to
the switch.
This approach for testing switch ASICs has several advantages over the
traditional method that requires multiple physical machines. To name a
few:
1. Only the device under test (DUT) is being tested, without noise from
other systems.
2. Ability to easily provision complex topologies. Testing bridging
between 4-port LAGs or 8-way ECMP requires many physical links that are
not always available. With the VRF-based approach one merely needs to
loop back more ports.
These tests are written with switch ASICs in mind, but they can be run
on any Linux box using veth pairs to emulate physical loopbacks.
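
As a rough illustration, the topology described above could be built with
veth pairs along the following lines. This is a minimal sketch under stated
assumptions: the helper names (run, host_create, topology_create), the
DRY_RUN variable and the VRF table numbers 1001 and 1002 are invented for
the example and are not taken from lib.sh.

```shell
#!/bin/bash
# Sketch only: build the looped-port topology with veth pairs.
# DRY_RUN defaults to 1, so commands are printed instead of executed;
# set DRY_RUN=0 and run as root to actually create the devices.

run()
{
	if [ "${DRY_RUN:-1}" = "1" ]; then
		echo "$*"
	else
		"$@"
	fi
}

# Hypothetical helper: create a VRF acting as a host and enslave one port.
host_create()
{
	local vrf=$1 table=$2 port=$3 addr=$4

	run ip link add dev "$vrf" type vrf table "$table"
	run ip link set dev "$vrf" up
	run ip link set dev "$port" master "$vrf"
	run ip address add "$addr" dev "$port"
	run ip link set dev "$port" up
}

topology_create()
{
	# veth pairs stand in for the physical loopback cables.
	run ip link add dev swp1 type veth peer name swp2
	run ip link add dev swp3 type veth peer name swp4

	# Table numbers are arbitrary values chosen for this example.
	host_create vrf-h1 1001 swp1 192.0.2.1/24
	host_create vrf-h2 1002 swp4 192.0.2.2/24

	# swp2 and swp3 are the switch ports bridged by br0.
	run ip link add dev br0 type bridge
	run ip link set dev swp2 master br0
	run ip link set dev swp3 master br0
	run ip link set dev swp2 up
	run ip link set dev swp3 up
	run ip link set dev br0 up
}
```

Running topology_create with the default DRY_RUN=1 only prints the ip
commands, so the sequence can be inspected on an unprivileged machine before
it is applied for real.
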
Guidelines for Writing Tests
============================
o Where possible, reuse an existing topology for different tests instead
of recreating the same topology.
o Tests that use anything but the most trivial topologies should include
ASCII art showing the topology.
o Where possible, IPv6 and IPv4 addresses shall conform to RFC 3849 and
RFC 5737, respectively.
o Where possible, tests shall be written so that they can be reused by
multiple topologies and added to lib.sh.
o Checks shall be added to lib.sh for any external dependencies.
o Code shall be checked using ShellCheck [1] prior to submission.
1. https://www.shellcheck.net/
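
The guideline on external dependencies could translate into a helper such
as the one below. This is a sketch only: check_command is a hypothetical
name chosen for the example, not necessarily a function that lib.sh
provides. The only outside fact it relies on is the kselftest convention
that exit code 4 means "skipped".

```shell
#!/bin/bash
# Sketch only: a dependency check in the spirit of the guidelines above.
# check_command is a hypothetical name used for illustration.

# kselftest convention: exit code 4 means the test was skipped.
ksft_skip=4

check_command()
{
	local cmd=$1

	# command -v succeeds only if the shell can resolve "$cmd".
	if ! command -v "$cmd" > /dev/null 2>&1; then
		echo "SKIP: $cmd not installed"
		return "$ksft_skip"
	fi
}
```

A test could then run 'check_command tc || exit "$ksft_skip"' before
touching any device, so that a missing tool is reported as a skip rather
than as a failure.
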