mirror-linux/drivers
Michal Pecio bb0ba4cb10 usb: xhci: Apply the link chain quirk on NEC isoc endpoints
Two clearly different specimens of NEC uPD720200 (one with start/stop
bug, one without) were seen to cause IOMMU faults after some Missed
Service Errors. Faulting address is immediately after a transfer ring
segment and patched dynamic debug messages revealed that the MSE was
received when waiting for a TD near the end of that segment:

[ 1.041954] xhci_hcd: Miss service interval error for slot 1 ep 2 expected TD DMA ffa08fe0
[ 1.042120] xhci_hcd: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xffa09000 flags=0x0000]
[ 1.042146] xhci_hcd: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xffa09040 flags=0x0000]
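
The addresses in this log can be checked with a little arithmetic. The sketch below assumes 16-byte TRBs and ring segments of 256 TRBs (4096 bytes) aligned to their own size, with the Link TRB in the last slot; the helper names are hypothetical and are not taken from the driver. Under those assumptions the expected TD at ffa08fe0 is the last TRB before the Link TRB at ffa08ff0, and the first faulting address ffa09000 is exactly the end of the segment.

```c
#include <stdint.h>

#define TRB_SIZE     16u   /* bytes per TRB */
#define TRBS_PER_SEG 256u  /* assumption: TRBs per ring segment */
#define SEG_SIZE     (TRB_SIZE * TRBS_PER_SEG)  /* 4096 bytes */

/* Start of the segment containing dma (assumes size-aligned segments). */
static uint64_t seg_base(uint64_t dma)
{
	return dma & ~(uint64_t)(SEG_SIZE - 1);
}

/* Index of the TRB at dma within its segment. */
static unsigned int trb_index(uint64_t dma)
{
	return (unsigned int)((dma & (SEG_SIZE - 1)) / TRB_SIZE);
}

/* Address of the Link TRB, assumed to occupy the last slot. */
static uint64_t link_trb_dma(uint64_t dma)
{
	return seg_base(dma) + (uint64_t)(TRBS_PER_SEG - 1) * TRB_SIZE;
}

/* First address past the end of the segment containing dma. */
static uint64_t seg_end(uint64_t dma)
{
	return seg_base(dma) + SEG_SIZE;
}
```

A controller that ignores the Link TRB and keeps fetching would read seg_end() next, which matches the first IO_PAGE_FAULT address above.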

It gets even funnier if the next page is a ring segment accessible to
the HC. Below, it reports MSE in segment at ff1e8000, plows through a
zero-filled page at ff1e9000 and starts reporting events for TRBs in
page at ff1ea000 every microframe, instead of jumping to seg ff1e6000.

[ 7.041671] xhci_hcd: Miss service interval error for slot 1 ep 2 expected TD DMA ff1e8fe0
[ 7.041999] xhci_hcd: Miss service interval error for slot 1 ep 2 expected TD DMA ff1e8fe0
[ 7.042011] xhci_hcd: WARN: buffer overrun event for slot 1 ep 2 on endpoint
[ 7.042028] xhci_hcd: All TDs skipped for slot 1 ep 2. Clear skip flag.
[ 7.042134] xhci_hcd: WARN: buffer overrun event for slot 1 ep 2 on endpoint
[ 7.042138] xhci_hcd: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 31
[ 7.042144] xhci_hcd: Looking for event-dma 00000000ff1ea040 trb-start 00000000ff1e6820 trb-end 00000000ff1e6820
[ 7.042259] xhci_hcd: WARN: buffer overrun event for slot 1 ep 2 on endpoint
[ 7.042262] xhci_hcd: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 31
[ 7.042266] xhci_hcd: Looking for event-dma 00000000ff1ea050 trb-start 00000000ff1e6820 trb-end 00000000ff1e6820

At some point completion events change from Isoch Buffer Overrun to
Short Packet and the HC finally finds cycle bit mismatch in ff1ec000.

[ 7.098130] xhci_hcd: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 13
[ 7.098132] xhci_hcd: Looking for event-dma 00000000ff1ecc50 trb-start 00000000ff1e6820 trb-end 00000000ff1e6820
[ 7.098254] xhci_hcd: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 13
[ 7.098256] xhci_hcd: Looking for event-dma 00000000ff1ecc60 trb-start 00000000ff1e6820 trb-end 00000000ff1e6820
[ 7.098379] xhci_hcd: Overrun event on slot 1 ep 2
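
For readers matching the comp_code values in these logs to the names used in the text, here is a minimal decoder sketch. It covers only the three xHCI completion codes discussed in this message; the function name is hypothetical.

```c
/* Map the completion codes seen in the logs to readable names. */
static const char *comp_code_name(unsigned int code)
{
	switch (code) {
	case 13:
		return "Short Packet";
	case 23:
		return "Missed Service Error";
	case 31:
		return "Isoch Buffer Overrun";
	default:
		return "other";
	}
}
```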

It's possible that data from the isochronous device were written to
random buffers of pending TDs on other endpoints (either IN or OUT),
other devices or even other HCs in the same IOMMU domain.

Lastly, an error from a different USB device on another HC. Was it
caused by the above? I don't know, but it may have been. The disk
was working without any other issues and generated PCIe traffic to
starve the NEC of upstream bandwidth and trigger those MSEs. The two
HCs shared one x1 slot by means of a commercial "PCIe splitter" board.

[ 7.162604] usb 10-2: reset SuperSpeed USB device number 3 using xhci_hcd
[ 7.178990] sd 9:0:0:0: [sdb] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=DRIVER_OK cmd_age=0s
[ 7.179001] sd 9:0:0:0: [sdb] tag#0 CDB: opcode=0x28 28 00 04 02 ae 00 00 02 00 00
[ 7.179004] I/O error, dev sdb, sector 67284480 op 0x0:(READ) flags 0x80700 phys_seg 5 prio class 0

Fortunately, it appears that this ridiculous bug is avoided by setting
the chain bit of Link TRBs on isochronous rings. Other ancient HCs are
also known to expect the bit to be set and to ignore Link TRBs when it
is not. Reportedly, the 0.95 spec guaranteed that the bit is set.
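
The rule described above can be sketched as a small predicate. This is an illustration only, not the patch itself: the quirk flag names and values, the ring type enum and both function names are hypothetical. The only detail taken from the xHCI TRB layout is that the Chain bit is bit 4 of the TRB control word.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical quirk flags for illustration; not the driver's values. */
#define QUIRK_LINK_TRB (1u << 0)  /* assumed: chain Link TRBs on all rings */
#define QUIRK_NEC_HOST (1u << 1)  /* assumed: host is an NEC controller */

#define TRB_CHAIN (1u << 4)       /* Chain bit in the TRB control word */

enum ring_type { RING_CTRL, RING_BULK, RING_INTR, RING_ISOC };

/* Should Link TRBs on a ring of this type have the chain bit set? */
static bool link_chain_quirk(uint32_t quirks, enum ring_type type)
{
	if (quirks & QUIRK_LINK_TRB)
		return true;
	/* NEC hosts: only isochronous rings get chained Link TRBs. */
	return type == RING_ISOC && (quirks & QUIRK_NEC_HOST) != 0;
}

/* Return the Link TRB control word with the quirk applied. */
static uint32_t link_trb_control(uint32_t control, uint32_t quirks,
				 enum ring_type type)
{
	if (link_chain_quirk(quirks, type))
		control |= TRB_CHAIN;
	return control;
}
```

In this sketch an NEC host leaves Link TRBs on bulk, interrupt and control rings untouched, so only the isochronous case changes behavior.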

The bandwidth-starved NEC HC running a 32KB/uframe UVC endpoint reports
tens of MSEs per second and runs into the bug within seconds. Chaining
Link TRBs allows the same workload to run for many minutes, many times.

No negative side effects seen in UVC recording and UAC playback with a
few devices at full speed, high speed and SuperSpeed.

The problem doesn't reproduce on the newer Renesas uPD720201/uPD720202
or on the old Etron EJ168 and VIA VL805 (but the VL805 has another bug).

[shorten line length of log snippets in commit message -Mathias]

Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20250306144954.3507700-14-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-06 16:46:16 +01:00
accel A couple of fixes for ivpu to error handling, komeda for format 2025-02-07 14:47:25 +10:00
accessibility
acpi arm64 fixes for -rc3 2025-02-14 09:55:17 -08:00
amba
android Char/Misc/IIO driver updates for 6.14-rc1 2025-01-27 16:51:51 -08:00
ata ata changes for 6.14 part2 2025-01-31 11:07:56 -08:00
atm
auxdisplay auxdisplay for v6.14-1 2025-01-24 08:03:52 -08:00
base Driver core api addition for 6.14-rc3 2025-02-16 12:54:42 -08:00
bcma
block block-6.14-20250207 2025-02-07 11:00:33 -08:00
bluetooth Bluetooth: btintel_pcie: Fix a potential race condition 2025-02-13 11:14:04 -05:00
bus genirq: Remove leading space from irq_chip::irq_print_chip() callbacks 2025-02-07 08:56:01 +01:00
cache
cdrom treewide: const qualify ctl_tables where applicable 2025-01-28 13:48:37 +01:00
cdx
char treewide: const qualify ctl_tables where applicable 2025-01-28 13:48:37 +01:00
clk The various patchsets are summarized below. Plus of course many 2025-01-26 18:36:23 -08:00
clocksource
comedi
connector
counter
cpufreq amd-pstate fixes 2/6/25 2025-02-06 20:39:43 +01:00
cpuidle More power management updates for 6.14-rc1 2025-01-30 15:10:34 -08:00
crypto crypto: ccp: Add external API interface for PSP module initialization 2025-02-14 18:39:19 -05:00
cxl cxl changes for v6.14 2025-01-29 11:23:22 -08:00
dax
dca
devfreq Update devfreq next for v6.14 2025-01-13 20:48:34 +01:00
dio
dma tegra210-adma: fix 32-bit x86 build 2025-02-15 09:28:55 -08:00
dma-buf
dpll
edac - The first part of a restructuring of AMD's representation of a northbridge 2025-01-21 09:38:52 -08:00
eisa
extcon Update extcon next for v6.14 2025-01-12 13:44:27 +01:00
firewire Driver core and debugfs updates 2025-01-28 12:25:12 -08:00
firmware EFI fixes for v6.14 #1 2025-02-14 13:56:04 -08:00
fpga
fsi
gnss
gpio gpiolib: Fix crash on error in gpiochip_get_ngpios() 2025-02-13 18:51:39 +01:00
gpu - Remove bo->clients out of bos_lock area (Tejas) 2025-02-14 12:15:59 +10:00
greybus
hid hid-for-linus-2025021001 2025-02-10 09:50:01 -08:00
hsi
hte
hv treewide: const qualify ctl_tables where applicable 2025-01-28 13:48:37 +01:00
hwmon Driver core and debugfs updates 2025-01-28 12:25:12 -08:00
hwspinlock
hwtracing KVM/arm64 updates for 6.14 2025-01-28 09:01:36 -08:00
i2c Revert "i2c: Replace list-based mechanism for handling auto-detected clients" 2025-02-05 14:22:12 +01:00
i3c I3C for 6.14 2025-01-24 15:48:01 -08:00
idle Power management updates for 6.14-rc1 2025-01-22 11:16:14 -08:00
iio IIO: 2nd set of fixes for the 6.13 cycle. 2025-01-16 13:46:08 +01:00
infiniband Mainly individually changelogged singleton patches. The patch series in 2025-01-26 17:50:53 -08:00
input platform-drivers-x86 for v6.14-1 2025-01-24 07:18:39 -08:00
interconnect interconnect changes for 6.14 2025-01-16 14:01:40 +01:00
iommu ARM: 2025-02-16 10:25:12 -08:00
ipack
irqchip genirq: Remove leading space from irq_chip::irq_print_chip() callbacks 2025-02-07 08:56:01 +01:00
isdn
leds Driver core and debugfs updates 2025-01-28 12:25:12 -08:00
macintosh The various patchsets are summarized below. Plus of course many 2025-01-26 18:36:23 -08:00
mailbox mailbox: th1520: Fix memory corruption due to incorrect array size 2025-01-18 16:20:55 -06:00
mcb
md block-6.14-20250207 2025-02-07 11:00:33 -08:00
media [GIT PULL for v6.14] media updates 2025-02-01 09:15:01 -08:00
memory spi: Support DTR in spi-mem 2025-01-15 19:07:39 +01:00
memstick Char/Misc/IIO driver updates for 6.14-rc1 2025-01-27 16:51:51 -08:00
message Merge branch '6.13/scsi-fixes' into 6.14/scsi-staging 2025-01-10 15:20:30 -05:00
mfd mfd: syscon: Restore device_node_to_regmap() for non-syscon nodes 2025-02-11 14:53:39 +00:00
misc treewide: const qualify ctl_tables where applicable 2025-01-28 13:48:37 +01:00
mmc mmc: mtk-sd: Fix register settings for hs400(es) mode 2025-02-03 13:34:50 +01:00
most
mtd block-6.14-20250131 2025-01-31 11:49:30 -08:00
mux
net net: pse-pd: Fix deadlock in current limit functions 2025-02-13 10:00:39 -08:00
nfc nfc: mrvl: Don't use "proxy" headers 2025-01-18 17:10:05 -08:00
ntb PCI: Remove devres from pci_intx() 2025-01-18 14:38:49 -06:00
nubus
nvdimm
nvme nvme fixes for Linux 6.14 2025-02-03 09:19:03 -07:00
nvmem
of of: address: Add kunit test for __of_address_resource_bounds() 2025-02-02 20:59:04 -06:00
opp Driver core and debugfs updates 2025-01-28 12:25:12 -08:00
parisc
parport
pci pci-v6.14-fixes-3 2025-02-14 16:49:07 -08:00
pcmcia
peci
perf treewide: const qualify ctl_tables where applicable 2025-01-28 13:48:37 +01:00
phy phy-for-6.14 2025-01-29 14:32:38 -08:00
pinctrl pinctrl: pinconf-generic: Print unsigned value if a format is registered 2025-02-06 10:13:15 +01:00
platform platform/x86: thinkpad_acpi: Fix registration of tpacpi platform driver 2025-02-12 13:49:37 +02:00
pmdomain pmdomain: airoha: Fix compilation error with Clang-20 and Thumb2 mode 2025-01-21 10:45:24 +01:00
pnp
power power supply and reset changes for the 6.14 series 2025-01-27 15:37:16 -08:00
powercap Merge branch 'pm-powercap' 2025-02-07 12:43:58 +01:00
pps
ps3
ptp ptp: vmclock: Remove goto-based cleanup logic 2025-02-11 10:20:52 +01:00
pwm Driver core and debugfs updates 2025-01-28 12:25:12 -08:00
rapidio
ras
regulator regulator: core: let dt properties override driver init_data 2025-02-11 16:29:01 +00:00
remoteproc remoteproc: st: Use syscon_regmap_lookup_by_phandle_args 2025-01-15 10:04:27 -07:00
reset soc: driver updates for 6.14 2025-01-24 14:56:59 -08:00
rpmsg
rtc RTC for 6.13 2025-01-30 17:50:02 -08:00
s390 s390 updates for 6.14-rc3 2025-02-15 10:15:24 -08:00
sbus
scsi scsi: qla1280: Fix kernel oops when debug level > 2 2025-02-03 17:54:56 -05:00
sh
siox
slimbus Driver core and debugfs updates 2025-01-28 12:25:12 -08:00
soc genirq: Remove leading space from irq_chip::irq_print_chip() callbacks 2025-02-07 08:56:01 +01:00
soundwire soundwire updates for 6.14 2025-01-29 14:38:19 -08:00
spi spi: sn-f-ospi: Fix division by zero 2025-02-06 11:33:51 +00:00
spmi spmi: hisi-spmi-controller: Drop duplicated OF node assignment in spmi_controller_probe() 2025-01-17 12:58:49 +01:00
ssb
staging Driver core and debugfs updates 2025-01-28 12:25:12 -08:00
target Merge branch '6.14/scsi-queue' into 6.14/scsi-fixes 2025-02-03 16:28:51 -05:00
tc
tee
thermal thermal/cpufreq_cooling: Remove structure member documentation 2025-02-11 21:02:13 +01:00
thunderbolt Driver core and debugfs updates 2025-01-28 12:25:12 -08:00
tty Serial driver fixes for 6.14-rc3 2025-02-16 12:50:44 -08:00
ufs scsi: ufs: core: Fix error return with query response 2025-02-03 17:34:24 -05:00
uio Char/Misc/IIO driver updates for 6.14-rc1 2025-01-27 16:51:51 -08:00
usb usb: xhci: Apply the link chain quirk on NEC isoc endpoints 2025-03-06 16:46:16 +01:00
vdpa virtio: features, fixes, cleanups 2025-01-27 15:26:06 -08:00
vfio VFIO updates for v6.14-rc1 2025-01-28 14:16:46 -08:00
vhost vhost/net: Set num_buffers for virtio 1.0 2025-01-27 09:39:25 -05:00
video fbdev fixes and updates for 6.14-rc1: 2025-01-24 11:32:13 -08:00
virt - A segmented Reverse Map table (RMP) is a across-nodes distributed 2025-01-21 09:00:31 -08:00
virtio virtio: features, fixes, cleanups 2025-01-27 15:26:06 -08:00
w1
watchdog linux-watchdog 6.14-rc1 tag 2025-01-25 16:19:10 -08:00
xen xen: branch for v6.14-rc3 2025-02-14 08:15:17 -08:00
zorro
Kconfig
Makefile