mirror-linux/drivers/net/ethernet
Dipayaan Roy 95084f1883 net: mana: Fix crash from unvalidated SHM offset read from BAR0 during FLR
During Function Level Reset recovery, the MANA driver reads
hardware BAR0 registers that may temporarily contain garbage values.
The SHM (Shared Memory) offset read from GDMA_REG_SHM_OFFSET is used
to compute gc->shm_base, which is later dereferenced via readl() in
mana_smc_poll_register(). If the hardware returns an unaligned or
out-of-range value, the driver must not blindly use it, as this would
propagate the hardware error into a kernel crash.

The following crash was observed on an arm64 Hyper-V guest running
kernel 6.17.0-3013-azure during VF reset recovery triggered by an HWC
timeout.

[13291.785274] Unable to handle kernel paging request at virtual address ffff8000a200001b
[13291.785311] Mem abort info:
[13291.785332]   ESR = 0x0000000096000021
[13291.785343]   EC = 0x25: DABT (current EL), IL = 32 bits
[13291.785355]   SET = 0, FnV = 0
[13291.785363]   EA = 0, S1PTW = 0
[13291.785372]   FSC = 0x21: alignment fault
[13291.785382] Data abort info:
[13291.785391]   ISV = 0, ISS = 0x00000021, ISS2 = 0x00000000
[13291.785404]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[13291.785412]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[13291.785421] swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000014df3a1000
[13291.785432] [ffff8000a200001b] pgd=1000000100438403, p4d=1000000100438403, pud=1000000100439403, pmd=0068000fc2000711
[13291.785703] Internal error: Oops: 0000000096000021 [#1]  SMP
[13291.830975] Modules linked in: tls qrtr mana_ib ib_uverbs ib_core xt_owner xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables cfg80211 8021q garp mrp stp llc binfmt_misc joydev serio_raw nls_iso8859_1 hid_generic aes_ce_blk aes_ce_cipher polyval_ce ghash_ce sm4_ce_gcm sm4_ce_ccm sm4_ce sm4_ce_cipher hid_hyperv sm4 sm3_ce sha3_ce hv_netvsc hid vmgenid hyperv_keyboard hyperv_drm sch_fq_codel nvme_fabrics efi_pstore dm_multipath nfnetlink vsock_loopback vmw_vsock_virtio_transport_common hv_sock vmw_vsock_vmci_transport vmw_vmci vsock dmi_sysfs ip_tables x_tables autofs4
[13291.862630] CPU: 122 UID: 0 PID: 61796 Comm: kworker/122:2 Tainted: G        W           6.17.0-3013-azure #13-Ubuntu VOLUNTARY
[13291.869902] Tainted: [W]=WARN
[13291.871901] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 01/08/2026
[13291.878086] Workqueue: events mana_serv_func
[13291.880718] pstate: 62400005 (nZCv daif +PAN -UAO +TCO -DIT -SSBS BTYPE=--)
[13291.884835] pc : mana_smc_poll_register+0x48/0xb0
[13291.887902] lr : mana_smc_setup_hwc+0x70/0x1c0
[13291.890493] sp : ffff8000ab79bbb0
[13291.892364] x29: ffff8000ab79bbb0 x28: ffff00410c8b5900 x27: ffff00410d630680
[13291.896252] x26: ffff004171f9fd80 x25: 000000016ed55000 x24: 000000017f37e000
[13291.899990] x23: 0000000000000000 x22: 000000016ed55000 x21: 0000000000000000
[13291.904497] x20: ffff8000a200001b x19: 0000000000004e20 x18: ffff8000a6183050
[13291.908308] x17: 0000000000000000 x16: 0000000000000000 x15: 000000000000000a
[13291.912542] x14: 0000000000000004 x13: 0000000000000000 x12: 0000000000000000
[13291.916298] x11: 0000000000000000 x10: 0000000000000001 x9 : ffffc45006af1bd8
[13291.920945] x8 : ffff000151129000 x7 : 0000000000000000 x6 : 0000000000000000
[13291.925293] x5 : 000000015f214000 x4 : 000000017217a000 x3 : 000000016ed50000
[13291.930436] x2 : 000000016ed55000 x1 : 0000000000000000 x0 : ffff8000a1ffffff
[13291.934342] Call trace:
[13291.935736]  mana_smc_poll_register+0x48/0xb0 (P)
[13291.938611]  mana_smc_setup_hwc+0x70/0x1c0
[13291.941113]  mana_hwc_create_channel+0x1a0/0x3a0
[13291.944283]  mana_gd_setup+0x16c/0x398
[13291.946584]  mana_gd_resume+0x24/0x70
[13291.948917]  mana_do_service+0x13c/0x1d0
[13291.951583]  mana_serv_func+0x34/0x68
[13291.953732]  process_one_work+0x168/0x3d0
[13291.956745]  worker_thread+0x2ac/0x480
[13291.959104]  kthread+0xf8/0x110
[13291.961026]  ret_from_fork+0x10/0x20
[13291.963560] Code: d2807d00 9417c551 71000673 54000220 (b9400281)
[13291.967299] ---[ end trace 0000000000000000 ]---

Disassembly of mana_smc_poll_register() around the crash site:

Disassembly of section .text:

00000000000047c8 <mana_smc_poll_register>:
    47c8: d503201f        nop
    47cc: d503201f        nop
    47d0: d503233f        paciasp
    47d4: f800865e        str     x30, [x18], #8
    47d8: a9bd7bfd        stp     x29, x30, [sp, #-48]!
    47dc: 910003fd        mov     x29, sp
    47e0: a90153f3        stp     x19, x20, [sp, #16]
    47e4: 91007014        add     x20, x0, #0x1c
    47e8: 5289c413        mov     w19, #0x4e20
    47ec: f90013f5        str     x21, [sp, #32]
    47f0: 12001c35        and     w21, w1, #0xff
    47f4: 14000008        b       4814 <mana_smc_poll_register+0x4c>
    47f8: 36f801e1        tbz     w1, #31, 4834 <mana_smc_poll_register+0x6c>
    47fc: 52800042        mov     w2, #0x2
    4800: d280fa01        mov     x1, #0x7d0
    4804: d2807d00        mov     x0, #0x3e8
    4808: 94000000        bl      0 <usleep_range_state>
    480c: 71000673        subs    w19, w19, #0x1
    4810: 54000200        b.eq    4850 <mana_smc_poll_register+0x88>
    4814: b9400281        ldr     w1, [x20]    <-- **** CRASHED HERE ****
    4818: d50331bf        dmb     oshld
    481c: 2a0103e2        mov     w2, w1
    ...

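From the crash signature, x20 = ffff8000a200001b. This address
ends in 0x1b, which is not 4-byte aligned, so the 'ldr w1, [x20]'
instruction (readl) triggers the arm64 alignment fault (FSC = 0x21).
x20 is set at 47e4 as x0 + 0x1c, and x19 still holds its initial loop
count 0x4e20, so the fault hit the very first poll. The value in x0,
ffff8000a1ffffff, is therefore the unmodified base pointer passed in
by mana_smc_setup_hwc(), and that base is already misaligned.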

The root cause is in mana_gd_init_vf_regs(), which computes:

  gc->shm_base = gc->bar0_va + mana_gd_r64(gc, GDMA_REG_SHM_OFFSET);

The offset is used without any validation.  The same problem exists
in mana_gd_init_pf_regs() for sriov_base_off and sriov_shm_off.

Fix this by validating all offsets before use:

- VF: check shm_off is within BAR0, properly aligned to 4 bytes
  (readl requirement), and leaves room for the full 256-bit
  (32-byte) SMC aperture.

- PF: check sriov_base_off is within BAR0, aligned to 8 bytes
  (readq requirement), and leaves room to safely read the
  sriov_shm_off register at sriov_base_off + GDMA_PF_REG_SHM_OFF.
  Then check sriov_shm_off leaves room for the full SMC aperture.
  All arithmetic uses subtraction rather than addition to avoid
  integer overflow on garbage values.

Define SMC_APERTURE_SIZE as 32 bytes, derived from the 256-bit
aperture width.

Return -EPROTO on invalid values.  The existing recovery path in
mana_serv_reset() already handles -EPROTO by falling through to PCI
device rescan, giving the hardware another chance to present valid
register values after reset.

Fixes: 9bf66036d6 ("net: mana: Handle hardware recovery events when probing the device")
Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com>
Link: https://patch.msgid.link/afQUMClyjmBVfD+u@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-05-05 15:43:08 +02:00
3com drivers: net: 3com: 3c589: Remove this driver 2026-04-23 15:56:49 -07:00
8390 drivers: net: 8390: wd80x3: Remove this driver 2026-04-23 15:57:10 -07:00
actions
adaptec
adi treewide: Replace kmalloc with kmalloc_obj for non-scalar types 2026-02-21 01:02:28 -08:00
aeroflex
agere Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
airoha net: airoha: Move entries to queue head in case of DMA mapping failure in airoha_dev_xmit() 2026-04-30 18:08:48 -07:00
alacritech Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
allwinner treewide: Replace kmalloc with kmalloc_obj for non-scalar types 2026-02-21 01:02:28 -08:00
altera net: altera-tse: fix skb leak on DMA mapping error in tse_start_xmit() 2026-04-02 18:25:23 -07:00
amazon net: ena: convert to use .get_rx_ring_count 2026-01-17 18:10:16 -08:00
amd amd-xgbe: fix PTP addend overflow causing frozen clock 2026-05-02 10:16:27 -07:00
apm Convert more 'alloc_obj' cases to default GFP_KERNEL arguments 2026-02-21 20:03:00 -08:00
apple
aquantia net: atlantic: fix reading SFP module info on some AQC100 cards 2026-02-26 19:20:53 -08:00
arc net: ethernet: arc: emac: quiesce interrupts before requesting IRQ 2026-03-10 19:05:12 -07:00
asix
atheros Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
broadcom Including fixes from Netfilter. 2026-04-23 16:50:42 -07:00
brocade Convert remaining multi-line kmalloc_obj/flex GFP_KERNEL uses 2026-02-22 08:26:33 -08:00
cadence net: macb: Use napi_schedule_irqoff() in IRQ handler 2026-04-09 20:17:31 -07:00
calxeda Convert more 'alloc_obj' cases to default GFP_KERNEL arguments 2026-02-21 20:03:00 -08:00
cavium Convert remaining multi-line kmalloc_obj/flex GFP_KERNEL uses 2026-02-22 08:26:33 -08:00
chelsio ipv6: convert CONFIG_IPV6 to built-in only and clean up Kconfigs 2026-03-29 11:21:22 -07:00
cirrus
cisco enic: detect admin channel resources for SR-IOV 2026-04-02 18:05:06 -07:00
cortina Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
davicom
dec
dlink net: dlink: replace printk() with netdev_{info,dbg}() in rio_probe1() 2026-01-06 17:11:38 -08:00
emulex Convert more 'alloc_obj' cases to default GFP_KERNEL arguments 2026-02-21 20:03:00 -08:00
engleder Convert remaining multi-line kmalloc_obj/flex GFP_KERNEL uses 2026-02-22 08:26:33 -08:00
ezchip
faraday net: ftgmac100: fix ring allocation unwind on open failure 2026-03-31 19:38:36 -07:00
freescale net: enetc: fix VSI mailbox timeout handling and DMA lifecycle 2026-04-30 17:35:56 -07:00
fungible Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
google gve: add support for UDP GSO for DQO format 2026-03-09 19:17:52 -07:00
hisilicon Convert remaining multi-line kmalloc_obj/flex GFP_KERNEL uses 2026-02-22 08:26:33 -08:00
huawei hinic3: Fix spelling mistake "capbility" -> "capability" 2026-03-14 08:46:41 -07:00
i825xx
ibm ibmveth: Disable GSO for packets with small MSS 2026-04-27 19:07:57 -07:00
intel ice: add dpll peer notification for paired SMA and U.FL pins 2026-04-30 11:37:39 +02:00
litex net: ethernet: litex: use device pointer to simplify code. 2026-02-27 19:25:16 -08:00
marvell octeontx2-af: npc: cn20k: Reject missing default-rule MCAM indices 2026-04-30 18:50:17 -07:00
mediatek Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2026-04-14 12:04:00 -07:00
mellanox Including fixes from Netfilter. 2026-04-23 16:50:42 -07:00
meta fbnic: convert to ndo_set_rx_mode_async 2026-04-21 12:50:24 +02:00
micrel net: ks8851: Avoid excess softirq scheduling 2026-04-18 12:14:19 -07:00
microchip net: lan743x: rename chip_rev to fpga_rev 2026-04-12 09:41:56 -07:00
microsoft net: mana: Fix crash from unvalidated SHM offset read from BAR0 during FLR 2026-05-05 15:43:08 +02:00
moxa
mscc Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
mucse net: rnpgbe: Add register_netdev 2025-11-04 18:11:37 -08:00
myricom Convert more 'alloc_obj' cases to default GFP_KERNEL arguments 2026-02-21 20:03:00 -08:00
natsemi
netronome nfp: fix swapped arguments in nfp_encode_basic_qdr() calls 2026-04-23 11:01:20 -07:00
ni
nvidia Convert more 'alloc_obj' cases to default GFP_KERNEL arguments 2026-02-21 20:03:00 -08:00
nxp
oki-semi net: pch_gbe: convert to use ndo_hwtstamp callbacks 2025-11-04 17:43:52 -08:00
pasemi Convert more 'alloc_obj' cases to default GFP_KERNEL arguments 2026-02-21 20:03:00 -08:00
pensando Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2026-03-26 12:09:57 -07:00
qlogic qed: Reimplement qed_mcast_bin_from_mac() using library functions 2026-03-18 19:07:37 -07:00
qualcomm net: qualcomm: qca_uart: report the consumed byte on RX skb allocation failure 2026-04-03 15:32:56 -07:00
rdc
realtek r8169: add support for RTL8125cp 2026-03-05 13:41:48 +01:00
renesas net: ethernet: ravb: Suspend and resume the transmission flow 2026-04-03 16:04:28 -07:00
rocker net: rocker: kzalloc + kcalloc to kzalloc_flex 2026-03-09 18:51:07 -07:00
samsung Convert more 'alloc_obj' cases to default GFP_KERNEL arguments 2026-02-21 20:03:00 -08:00
seeq
sfc sfc: fix error code in efx_devlink_info_running_versions() 2026-04-30 13:44:30 +02:00
sgi
silan
sis Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
smsc drivers: net: smsc: smc91c92: Remove this driver 2026-04-23 15:57:06 -07:00
socionext Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
spacemit Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2026-03-12 12:53:34 -07:00
stmicro net: stmmac: Prevent NULL deref when RX memory exhausted 2026-04-28 12:26:20 +02:00
sun Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
sunplus treewide: Replace kmalloc with kmalloc_obj for non-scalar types 2026-02-21 01:02:28 -08:00
synopsys Convert remaining multi-line kmalloc_obj/flex GFP_KERNEL uses 2026-02-22 08:26:33 -08:00
tehuti
ti net: ethernet: ti: am65-cpsw: add support for J722S SoC family 2026-04-12 08:29:03 -07:00
toshiba Convert remaining multi-line kmalloc_obj/flex GFP_KERNEL uses 2026-02-22 08:26:33 -08:00
tundra
vertexcom
via Convert remaining multi-line kmalloc_obj/flex GFP_KERNEL uses 2026-02-22 08:26:33 -08:00
wangxun net: libwx: use request_irq for VF misc interrupt 2026-04-30 18:07:21 -07:00
wiznet
xilinx net: xilinx: axienet: Fix BQL accounting for multi-BD TX packets 2026-03-31 12:09:12 +02:00
xircom
xscale net: ethernet: xscale: Check for PTP support properly 2026-02-20 16:10:24 -08:00
Kconfig drivers: net: fujitsu: fmvj18x: Remove this driver 2026-04-23 15:57:10 -07:00
Makefile drivers: net: fujitsu: fmvj18x: Remove this driver 2026-04-23 15:57:10 -07:00
ec_bhf.c net: ethernet: ec_bhf: Fix dma_free_coherent() dma handle 2026-02-17 17:16:55 -08:00
ethoc.c
fealnx.c
jme.c treewide: Replace kmalloc with kmalloc_obj for non-scalar types 2026-02-21 01:02:28 -08:00
jme.h
korina.c
lantiq_etop.c
lantiq_xrx200.c
oa_tc6.c