-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmfmt1AUHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vxqNw//cC1BlVe4UUVR5r9nfpoFeGAZeDJz
32naGvZKjwL0tR6dStO/BEZx4QrBp+smVfJfuxtQxfzLHLgMigM2jVhfa7XUmaun
7yZlJZu4Jmydc57sPf56CFOYOFP6zyPzSaE8u1Eb4IIqpvuoYpvDayDt6PSsLmFS
PDzqmicT3nuNbbcfE4rYLyL6JsXooKCR1h+NNcDjy7Run9DvQbE6N2PPvXCu6O97
aC3+kYUydEpgn9DfjBDghe+GBQCkBPldwnWqXBxKDFmYj5bKFujNccS9/IDSEuuX
oWntDRAXgLWg048sBC1AuJQajF3UaqffRGJkzUBaZWbU/jB9t5N/Z3GpYlXzizRx
CAqnt/ciGUKVbaESKwoeeIqgK+wG1bnrmoEaJHXFGqjr6sjm2A2T5EzyBMJ1hFwE
wUq6SDnkp5igG7rWtsBPo/lGa5h/pNlaXng11570ikD2ZfHVfRgwy2MpXYxChrkt
X2q/lRYU7yyfNJQ8O5LQJ6bYztatjxT0TxXNFv+cxfVrdI7vnQMuaeBE352jn+Lo
aZ8fTJDFbRXtrZolcoetZjBdHwPIS42wYQtvxo/ylUl64xKEzZEzN09XODjwn74v
nuOzDtbM0TAjZWxi6bwRFPnemTEAQxPuv1i4VdPQ7yblvC0at2agNBsz6Fdq60GS
s80YMJVUU0UcVoc=
=gM1i
-----END PGP SIGNATURE-----
Merge tag 'pci-v6.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci updates from Bjorn Helgaas:
"Enumeration:
- Enable Configuration RRS SV, which makes device readiness visible,
early instead of during child bus scanning (Bjorn Helgaas)
- Log debug messages about reset methods being used (Bjorn Helgaas)
- Avoid reset when it has been disabled via sysfs (Nishanth
Aravamudan)
- Add common pci-ep-bus.yaml schema for exporting several peripherals
of a single PCI function via devicetree (Andrea della Porta)
- Create DT nodes for PCI host bridges to enable loading device tree
overlays to create platform devices for PCI devices that have
several features that require multiple drivers (Herve Codina)
Resource management:
- Enlarge devres table[] to accommodate bridge windows, ROM, IOV
BARs, etc., and validate BAR index in devres interfaces (Philipp
Stanner)
- Fix typo that repeatedly distributed resources to a bridge instead
of iterating over subordinate bridges, which resulted in too little
space to assign some BARs (Kai-Heng Feng)
- Relax bridge window tail sizing for optional resources, e.g., IOV
BARs, to avoid failures when removing and re-adding devices (Ilpo
Järvinen)
- Allow drivers to enable devices even if we haven't assigned
optional IOV resources to them (Ilpo Järvinen)
- Rework handling of optional resources (IOV BARs, ROMs) to reduce
failures if we can't allocate them (Ilpo Järvinen)
- Fix a NULL dereference in the SR-IOV VF creation error path (Shay
Drory)
- Fix s390 mmio_read/write syscalls, which didn't cause page faults
in some cases, which broke vfio-pci lazy mapping on first access
(Niklas Schnelle)
- Add pdev->non_mappable_bars to replace CONFIG_VFIO_PCI_MMAP, which
was disabled only for s390 (Niklas Schnelle)
- Support mmap of PCI resources on s390 except for ISM devices
(Niklas Schnelle)
ASPM:
- Delay pcie_link_state deallocation to avoid dangling pointers that
cause invalid references during hot-unplug (Daniel Stodden)
Power management:
- Allow PCI bridges to go to D3Hot when suspending on all non-x86
systems (Manivannan Sadhasivam)
Power control:
- Create pwrctrl devices in pci_scan_device() to make it more
symmetric with pci_pwrctrl_unregister() and make pwrctrl devices
for PCI bridges possible (Manivannan Sadhasivam)
- Unregister pwrctrl devices in pci_destroy_dev() so DOE, ASPM, etc.
can still access devices after pci_stop_dev() (Manivannan
Sadhasivam)
- If there's a pwrctrl device for a PCI device, skip scanning it
because the pwrctrl core will rescan the bus after the device is
powered on (Manivannan Sadhasivam)
- Add a pwrctrl driver for PCI slots based on voltage regulators
described via devicetree (Manivannan Sadhasivam)
Bandwidth control:
- Add set_pcie_speed.sh to TEST_PROGS to fix issue when executing the
set_pcie_cooling_state.sh test case (Yi Lai)
- Avoid a NULL pointer dereference when we run out of bus numbers to
assign for a bridge secondary bus (Lukas Wunner)
Hotplug:
- Drop superfluous pci_hotplug_slot_list, try_module_get() calls, and
NULL pointer checks (Lukas Wunner)
- Drop shpchp module init/exit logging, replace shpchp dbg() with
ctrl_dbg(), and remove unused dbg(), err(), info(), warn() wrappers
(Ilpo Järvinen)
- Drop 'shpchp_debug' module parameter in favor of standard dynamic
debugging (Ilpo Järvinen)
- Drop unused cpcihp .get_power(), .set_power() function pointers
(Guilherme Giacomo Simoes)
- Disable hotplug interrupts in portdrv only when pciehp is not
enabled to avoid issuing two hotplug commands too close together
(Feng Tang)
- Skip pciehp 'device replaced' check if the device has been removed
to address a deadlock when resuming after a device was removed
during system sleep (Lukas Wunner)
- Don't enable pciehp hotplug interupt when resuming in poll mode
(Ilpo Järvinen)
Virtualization:
- Fix bugs in 'pci=config_acs=' kernel command line parameter (Tushar
Dave)
DOE:
- Expose supported DOE features via sysfs (Alistair Francis)
- Allow DOE support to be enabled even if CXL isn't enabled (Alistair
Francis)
Endpoint framework:
- Convert PCI device data so pci-epf-test works correctly on
big-endian endpoint systems (Niklas Cassel)
- Add BAR_RESIZABLE type to endpoint framework and add DWC core
support for EPF drivers to set BAR_RESIZABLE type and size (Niklas
Cassel)
- Fix pci-epf-test double free that causes an oops if the host
reboots and PERST# deassertion restarts endpoint BAR allocation
(Christian Bruel)
- Fix endpoint BAR testing so tests can skip disabled BARs instead of
reporting them as failures (Niklas Cassel)
- Widen endpoint test BAR size variable to accommodate BARs larger
than INT_MAX (Niklas Cassel)
- Remove unused tools 'pci' build target left over after moving tests
to tools/testing/selftests/pci_endpoint (Jianfeng Liu)
Altera PCIe controller driver:
- Add DT binding and driver support for Agilex family (P-Tile,
F-Tile, R-Tile) (Matthew Gerlach and D M, Sharath Kumar)
AMD MDB PCIe controller driver:
- Add DT binding and driver for AMD MDB (Multimedia DMA Bridge)
(Thippeswamy Havalige)
Broadcom STB PCIe controller driver:
- Add BCM2712 MSI-X DT binding and interrupt controller drivers and
add softdep on irq_bcm2712_mip driver to ensure that it is loaded
first (Stanimir Varbanov)
- Expand inbound window map to 64GB so it can accommodate BCM2712
(Stanimir Varbanov)
- Add BCM2712 support and DT updates (Stanimir Varbanov)
- Apply link speed restriction before bringing link up, not after
(Jim Quinlan)
- Update Max Link Speed in Link Capabilities via the internal
writable register, not the read-only config register (Jim Quinlan)
- Handle regulator_bulk_get() error to avoid panic when we call
regulator_bulk_free() later (Jim Quinlan)
- Disable regulators only when removing the bus immediately below a
Root Port because we don't support regulators deeper in the
hierarchy (Jim Quinlan)
- Make const read-only arrays static (Colin Ian King)
Cadence PCIe endpoint driver:
- Correct MSG TLP generation so endpoints can generate INTx messages
(Hans Zhang)
Freescale i.MX6 PCIe controller driver:
- Identify the second controller on i.MX8MQ based on devicetree
'linux,pci-domain' instead of DBI 'reg' address (Richard Zhu)
- Remove imx_pcie_cpu_addr_fixup() since dwc core can now derive the
ATU input address (using parent_bus_offset) from devicetree (Frank
Li)
Freescale Layerscape PCIe controller driver:
- Drop deprecated 'num-ib-windows' and 'num-ob-windows' and
unnecessary 'status' from example (Krzysztof Kozlowski)
- Correct the syscon_regmap_lookup_by_phandle_args("fsl,pcie-scfg")
arg_count to fix probe failure on LS1043A (Ioana Ciornei)
HiSilicon STB PCIe controller driver:
- Call phy_exit() to clean up if histb_pcie_probe() fails (Christophe
JAILLET)
Intel Gateway PCIe controller driver:
- Remove intel_pcie_cpu_addr() since dwc core can now derive the ATU
input address (using parent_bus_offset) from devicetree (Frank Li)
Intel VMD host bridge driver:
- Convert vmd_dev.cfg_lock from spinlock_t to raw_spinlock_t so
pci_ops.read() will never sleep, even on PREEMPT_RT where
spinlock_t becomes a sleepable lock, to avoid calling a sleeping
function from invalid context (Ryo Takakura)
MediaTek PCIe Gen3 controller driver:
- Remove leftover mac_reset assert for Airoha EN7581 SoC (Lorenzo
Bianconi)
- Add EN7581 PBUS controller 'mediatek,pbus-csr' DT property and
program host bridge memory aperture to this syscon node (Lorenzo
Bianconi)
Qualcomm PCIe controller driver:
- Add qcom,pcie-ipq5332 binding (Varadarajan Narayanan)
- Add qcom i.MX8QM and i.MX8QXP/DXP optional DMA interrupt (Alexander
Stein)
- Add optional dma-coherent DT property for Qualcomm SA8775P (Dmitry
Baryshkov)
- Make DT iommu property required for SA8775P and prohibited for
SDX55 (Dmitry Baryshkov)
- Add DT IOMMU and DMA-related properties for Qualcomm SM8450 (Dmitry
Baryshkov)
- Add endpoint DT properties for SAR2130P and enable endpoint mode in
driver (Dmitry Baryshkov)
- Describe endpoint BAR0 and BAR2 as 64-bit only and BAR1 and BAR3 as
RESERVED (Manivannan Sadhasivam)
Rockchip DesignWare PCIe controller driver:
- Describe rk3568 and rk3588 BARs as Resizable, not Fixed (Niklas
Cassel)
Synopsys DesignWare PCIe controller driver:
- Add debugfs-based Silicon Debug, Error Injection, Statistical
Counter support for DWC (Shradha Todi)
- Add debugfs property to expose LTSSM status of DWC PCIe link (Hans
Zhang)
- Add Rockchip support for DWC debugfs features (Niklas Cassel)
- Add dw_pcie_parent_bus_offset() to look up the parent bus address
of a specified 'reg' property and return the offset from the CPU
physical address (Frank Li)
- Use dw_pcie_parent_bus_offset() to derive CPU -> ATU addr offset
via 'reg[config]' for host controllers and 'reg[addr_space]' for
endpoint controllers (Frank Li)
- Apply struct dw_pcie.parent_bus_offset in ATU users to remove use
of .cpu_addr_fixup() when programming ATU (Frank Li)
TI J721E PCIe driver:
- Correct the 'link down' interrupt bit for J784S4 (Siddharth
Vadapalli)
TI Keystone PCIe controller driver:
- Describe AM65x BARs 2 and 5 as Resizable (not Fixed) and reduce
alignment requirement from 1MB to 64KB (Niklas Cassel)
Xilinx Versal CPM PCIe controller driver:
- Free IRQ domain in probe error path to avoid leaking it
(Thippeswamy Havalige)
- Add DT .compatible "xlnx,versal-cpm5nc-host" and driver support for
Versal Net CPM5NC Root Port controller (Thippeswamy Havalige)
- Add driver support for CPM5_HOST1 (Thippeswamy Havalige)
Miscellaneous:
- Convert fsl,mpc83xx-pcie binding to YAML (J. Neuschäfer)
- Use for_each_available_child_of_node_scoped() to simplify apple,
kirin, mediatek, mt7621, tegra drivers (Zhang Zekun)"
* tag 'pci-v6.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (197 commits)
PCI: layerscape: Fix arg_count to syscon_regmap_lookup_by_phandle_args()
PCI: j721e: Fix the value of .linkdown_irq_regfield for J784S4
misc: pci_endpoint_test: Add support for PCITEST_IRQ_TYPE_AUTO
PCI: endpoint: pci-epf-test: Expose supported IRQ types in CAPS register
PCI: dw-rockchip: Endpoint mode cannot raise INTx interrupts
PCI: endpoint: Add intx_capable to epc_features struct
dt-bindings: PCI: Add common schema for devices accessible through PCI BARs
PCI: intel-gw: Remove intel_pcie_cpu_addr()
PCI: imx6: Remove imx_pcie_cpu_addr_fixup()
PCI: dwc: Use parent_bus_offset to remove need for .cpu_addr_fixup()
PCI: dwc: ep: Ensure proper iteration over outbound map windows
PCI: dwc: ep: Use devicetree 'reg[addr_space]' to derive CPU -> ATU addr offset
PCI: dwc: ep: Consolidate devicetree handling in dw_pcie_ep_get_resources()
PCI: dwc: ep: Call epc_create() early in dw_pcie_ep_init()
PCI: dwc: Use devicetree 'reg[config]' to derive CPU -> ATU addr offset
PCI: dwc: Add dw_pcie_parent_bus_offset() checking and debug
PCI: dwc: Add dw_pcie_parent_bus_offset()
PCI/bwctrl: Fix NULL pointer dereference on bus number exhaustion
PCI: xilinx-cpm: Add cpm_csr register mapping for CPM5_HOST1 variant
PCI: brcmstb: Make const read-only arrays static
...
uapi:
- add mediatek tiled fourcc
- add support for notifying userspace on device wedged
new driver:
- appletbdrm: support for Apple Touchbar displays on m1/m2
- nova-core: skeleton rust driver to develop nova inside off
firmware:
- add some rust firmware pieces
rust:
- add 'LocalModule' type alias
component:
- add helper to query bound status
fbdev:
- fbtft: remove access to page->index
media:
- cec: tda998x: import driver from drm
dma-buf:
- add fast path for single fence merging
tests:
- fix lockdep warnings
atomic:
- allow full modeset on connector changes
- clarify semantics of allow_modeset and drm_atomic_helper_check
- async-flip: support on arbitary planes
- writeback: fix UAF
- Document atomic-state history
format-helper:
- support ARGB8888 to ARGB4444 conversions
buddy:
- fix multi-root cleanup
ci:
- update IGT
dp:
- support extended wake timeout
- mst: fix RAD to string conversion
- increase DPCD eDP control CAP size to 5 bytes
- add DPCD eDP v1.5 definition
- add helpers for LTTPR transparent mode
panic:
- encode QR code according to Fido 2.2
scheduler:
- add parameter struct for init
- improve job peek/pop operations
- optimise drm_sched_job struct layout
ttm:
- refactor pool allocation
- add helpers for TTM shrinker
panel-orientation:
- add a bunch of new quirks
panel:
- convert panels to multi-style functions
- edp: Add support for B140UAN04.4, BOE NV140FHM-NZ, CSW MNB601LS1-3,
LG LP079QX1-SP0V, MNE007QS3-7, STA 116QHD024002, Starry 116KHD024006,
Lenovo T14s Gen6 Snapdragon
- himax-hx83102: Add support for CSOT PNA957QT1-1, Kingdisplay
kd110n11-51ie, Starry 2082109qfh040022-50e
- visionox-r66451: use multi-style MIPI-DSI functions
- raydium-rm67200: Add driver for Raydium RM67200
- simple: Add support for BOE AV123Z7M-N17, BOE AV123Z7M-N17
- sony-td4353-jdi: Use MIPI-DSI multi-func interface
- summit: Add driver for Apple Summit display panel
- visionox-rm692e5: Add driver for Visionox RM692E5
bridge:
- pass full atomic state to various callbacks
- adv7511: Report correct capabilities
- it6505: Fix HDCP V compare
- snd65dsi86: fix device IDs
- nwl-dsi: set bridge type
- ti-sn65si83: add error recovery and set bridge type
- synopsys: add HDMI audio support
xe:
- support device-wedged event
- add mmap support for PCI memory barrier
- perf pmu integration and expose per-engien activity
- add EU stall sampling support
- GPU SVM and Xe SVM implementation
- use TTM shrinker
- add survivability mode to allow the driver to do
firmware updates in critical failure states
- PXP HWDRM support for MTL and LNL
- expose package/vram temps over hwmon
- enable DP tunneling
- drop mmio_ext abstraction
- Reject BO evcition if BO is bound to current VM
- Xe suballocator improvements
- re-use display vmas when possible
- add GuC Buffer Cache abstraction
- PCI ID update for Panther Lake and Battlemage
- Enable SRIOV for Panther Lake
- Refactor VRAM manager location
i915:
- enable extends wake timeout
- support device-wedged event
- Enable DP 128b/132b SST DSC
- FBC dirty rectangle support for display version 30+
- convert i915/xe to drm client setup
- Compute HDMI PLLS for rates not in fixed tables
- Allow DSB usage when PSR is enabled on LNL+
- Enable panel replay without full modeset
- Enable async flips with compressed buffers on ICL+
- support luminance based brightness via DPCD for eDP
- enable VRR enable/disable without full modeset
- allow GuC SLPC default strategies on MTL+ for performance
- lots of display refactoring in move to struct intel_display
amdgpu:
- add device wedged event
- support async page flips on overlay planes
- enable broadcast RGB drm property
- add info ioctl for virt mode
- OEM i2c support for RGB lights
- GC 11.5.2 + 11.5.3 support
- SDMA 6.1.3 support
- NBIO 7.9.1 + 7.11.2 support
- MMHUB 1.8.1 + 3.3.2 support
- DCN 3.6.0 support
- Add dynamic workload profile switching for GC 10-12
- support larger VBIOS sizes
- Mark gttsize parameters as deprecated
- Initial JPEG queue resset support
amdkfd:
- add KFD per process flags for setting precision
- sync pasid values between KGD and KFD
- improve GTT/VRAM handling for APUs
- fix user queue validation on GC7/8
- SDMA queue reset support
raedeon:
- rs400 hyperz fix
i2c:
- td998x: drop platform_data, split driver into media and bridge
ast:
- transmitter chip detection refactoring
- vbios display mode refactoring
- astdp: fix connection status and filter unsupported modes
- cursor handling refactoring
imagination:
- check job dependencies with sched helper
ivpu:
- improve command queue handling
- use workqueue for IRQ handling
- add support HW fault injection
- locking fixes
mgag200:
- add support for G200eH5
msm:
- dpu: add concurrent writeback support for DPU 10.x+
- use LTTPR helpers
- GPU:
- Fix obscure GMU suspend failure
- Expose syncobj timeline support
- Extend GPU devcoredump with pagetable info
- a623 support
- Fix a6xx gen1/gen2 indexed-register blocks in gpu snapshot / devcoredump
- Display:
- Add cpu-cfg interconnect paths on SM8560 and SM8650
- Introduce KMS OMMU fault handler, causing devcoredump snapshot
- Fixed error pointer dereference in msm_kms_init_aspace()
- DPU:
- Fix mode_changing handling
- Add writeback support on SM6150 (QCS615)
- Fix DSC programming in 1:1:1 topology
- Reworked hardware resource allocation, moving it to the CRTC code
- Enabled support for Concurrent WriteBack (CWB) on SM8650
- Enabled CDM blocks on all relevant platforms
- Reworked debugfs interface for BW/clocks debugging
- Clear perf params before calculating bw
- Support YUV formats on writeback
- Fixed double inclusion
- Fixed writeback in YUV formats when using cloned output, Dropped
wb2_formats_rgb
- Corrected dpu_crtc_check_mode_changed and struct dpu_encoder_virt
kerneldocs
- Fixed uninitialized variable in dpu_crtc_kickoff_clone_mode()
- DSI:
- DSC-related fixes
- Rework clock programming
- DSI PHY:
- Fix 7nm (and lower) PHY programming
- Add proper DT schema definitions for DSI PHY clocks
- HDMI:
- Rework the driver, enabling the use of the HDMI Connector framework
- Bindings:
- Added eDP PHY on SA8775P
nouveau:
- move drm_slave_encoder interface into driver
- nvkm: refactor GSP RPC
- use LTTPR helpers
mediatek:
- HDMI fixup and refinement
- add MT8188 dsc compatible
- MT8365 SoC support
panthor:
- Expose sizes of intenral BOs via fdinfo
- Fix race between reset and suspend
- Improve locking
qaic:
- Add support for AIC200
renesas:
- Fix limits in DT bindings
rockchip:
- support rk3562-mali
- rk3576: Add HDMI support
- vop2: Add new display modes on RK3588 HDMI0 up to 4K
- Don't change HDMI reference clock rate
- Fix DT bindings
- analogix_dp: add eDP support
- fix shutodnw
solomon:
- Set SPI device table to silence warnings
- Fix pixel and scanline encoding
v3d:
- handle clock
vc4:
- Use drm_exec
- Use dma-resv for wait-BO ioctl
- Remove seqno infrastructure
virtgpu:
- Support partial mappings of GEM objects
- Reserve VGA resources during initialization
- Fix UAF in virtgpu_dma_buf_free_obj()
- Add panic support
vkms:
- Switch to a managed modesetting pipeline
- Add support for ARGB8888
- fix UAf
xlnx:
- Set correct DMA segment size
- use mutex guards
- Fix error handling
- Fix docs
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmfmA2cACgkQDHTzWXnE
hr7lKQ/+I7gmln3ka8FyKnpwG5KusDpxz3OzgUKHpzOTkXL1+vPMt0vjCKRJKE3D
zfpUTWNbwlVN0krqmUGyFIeCt8wmevBF6HvQ+GsWbiWltUj3xIlnkV0TVH84XTUo
0evrXNG9K8sLjMKjrf7yrGL53Ayoaq9IO9wtOws+FCgtykAsMR/IWLrYLpj21ZQ6
Oclhq5Cz21WRoQpzySR23s3DLi4LHri26RGKbGNh2PzxYwyP/euGW6O+ncEduNmg
vQLgUfptaM/EubJFG6jxDWZJ2ChIAUUxQuhZwt7DKqRsYIcJKcfDSUzqL95t6SYU
zewlYmeslvusoesCeTJzHaUj34yqJGsvjFPsHFUEvAy8BVncsqS40D6mhrRMo5nD
chnSJu+IpDOEEqcdbb1J73zKLw6X3GROM8qUQEThJBD2yTsOdw9d0HXekMDMBXSi
NayKvXfsBV19rI74FYnRCzHt8BVVANh5qJNnR5RcnPZ2KzHQbV0JFOA9YhxE8vFU
GWkFWKlpAQUQ+yoTy1kuO9dcUxLIC7QseMeS5BYhcJBMEV78xQuRYRxgsL8YS4Yg
rIhcb3mZwMFj7jBAqfpLKWiI+GTup+P9vcz7Bvm5iIf8gEhZLUTwqBeAYXQkcWd4
i1AqDuFR0//sAgODoeU2sg1Q3yTL/i/DhcwvCIPgcDZ/x4Eg868=
=oK/N
-----END PGP SIGNATURE-----
Merge tag 'drm-next-2025-03-28' of https://gitlab.freedesktop.org/drm/kernel
Pull drm updates from Dave Airlie:
"Outside of drm there are some rust patches from Danilo who maintains
that area in here, and some pieces for drm header check tests.
The major things in here are a new driver supporting the touchbar
displays on M1/M2, the nova-core stub driver which is just the vehicle
for adding rust abstractions and start developing a real driver inside
of.
xe adds support for SVM with a non-driver specific SVM core
abstraction that will hopefully be useful for other drivers, along
with support for shrinking for TTM devices. I'm sure xe and AMD
support new devices, but the pipeline depth on these things is hard to
know what they end up being in the marketplace!
uapi:
- add mediatek tiled fourcc
- add support for notifying userspace on device wedged
new driver:
- appletbdrm: support for Apple Touchbar displays on m1/m2
- nova-core: skeleton rust driver to develop nova inside off
firmware:
- add some rust firmware pieces
rust:
- add 'LocalModule' type alias
component:
- add helper to query bound status
fbdev:
- fbtft: remove access to page->index
media:
- cec: tda998x: import driver from drm
dma-buf:
- add fast path for single fence merging
tests:
- fix lockdep warnings
atomic:
- allow full modeset on connector changes
- clarify semantics of allow_modeset and drm_atomic_helper_check
- async-flip: support on arbitary planes
- writeback: fix UAF
- Document atomic-state history
format-helper:
- support ARGB8888 to ARGB4444 conversions
buddy:
- fix multi-root cleanup
ci:
- update IGT
dp:
- support extended wake timeout
- mst: fix RAD to string conversion
- increase DPCD eDP control CAP size to 5 bytes
- add DPCD eDP v1.5 definition
- add helpers for LTTPR transparent mode
panic:
- encode QR code according to Fido 2.2
scheduler:
- add parameter struct for init
- improve job peek/pop operations
- optimise drm_sched_job struct layout
ttm:
- refactor pool allocation
- add helpers for TTM shrinker
panel-orientation:
- add a bunch of new quirks
panel:
- convert panels to multi-style functions
- edp: Add support for B140UAN04.4, BOE NV140FHM-NZ, CSW MNB601LS1-3,
LG LP079QX1-SP0V, MNE007QS3-7, STA 116QHD024002, Starry
116KHD024006, Lenovo T14s Gen6 Snapdragon
- himax-hx83102: Add support for CSOT PNA957QT1-1, Kingdisplay
kd110n11-51ie, Starry 2082109qfh040022-50e
- visionox-r66451: use multi-style MIPI-DSI functions
- raydium-rm67200: Add driver for Raydium RM67200
- simple: Add support for BOE AV123Z7M-N17, BOE AV123Z7M-N17
- sony-td4353-jdi: Use MIPI-DSI multi-func interface
- summit: Add driver for Apple Summit display panel
- visionox-rm692e5: Add driver for Visionox RM692E5
bridge:
- pass full atomic state to various callbacks
- adv7511: Report correct capabilities
- it6505: Fix HDCP V compare
- snd65dsi86: fix device IDs
- nwl-dsi: set bridge type
- ti-sn65si83: add error recovery and set bridge type
- synopsys: add HDMI audio support
xe:
- support device-wedged event
- add mmap support for PCI memory barrier
- perf pmu integration and expose per-engien activity
- add EU stall sampling support
- GPU SVM and Xe SVM implementation
- use TTM shrinker
- add survivability mode to allow the driver to do firmware updates
in critical failure states
- PXP HWDRM support for MTL and LNL
- expose package/vram temps over hwmon
- enable DP tunneling
- drop mmio_ext abstraction
- Reject BO evcition if BO is bound to current VM
- Xe suballocator improvements
- re-use display vmas when possible
- add GuC Buffer Cache abstraction
- PCI ID update for Panther Lake and Battlemage
- Enable SRIOV for Panther Lake
- Refactor VRAM manager location
i915:
- enable extends wake timeout
- support device-wedged event
- Enable DP 128b/132b SST DSC
- FBC dirty rectangle support for display version 30+
- convert i915/xe to drm client setup
- Compute HDMI PLLS for rates not in fixed tables
- Allow DSB usage when PSR is enabled on LNL+
- Enable panel replay without full modeset
- Enable async flips with compressed buffers on ICL+
- support luminance based brightness via DPCD for eDP
- enable VRR enable/disable without full modeset
- allow GuC SLPC default strategies on MTL+ for performance
- lots of display refactoring in move to struct intel_display
amdgpu:
- add device wedged event
- support async page flips on overlay planes
- enable broadcast RGB drm property
- add info ioctl for virt mode
- OEM i2c support for RGB lights
- GC 11.5.2 + 11.5.3 support
- SDMA 6.1.3 support
- NBIO 7.9.1 + 7.11.2 support
- MMHUB 1.8.1 + 3.3.2 support
- DCN 3.6.0 support
- Add dynamic workload profile switching for GC 10-12
- support larger VBIOS sizes
- Mark gttsize parameters as deprecated
- Initial JPEG queue resset support
amdkfd:
- add KFD per process flags for setting precision
- sync pasid values between KGD and KFD
- improve GTT/VRAM handling for APUs
- fix user queue validation on GC7/8
- SDMA queue reset support
raedeon:
- rs400 hyperz fix
i2c:
- td998x: drop platform_data, split driver into media and bridge
ast:
- transmitter chip detection refactoring
- vbios display mode refactoring
- astdp: fix connection status and filter unsupported modes
- cursor handling refactoring
imagination:
- check job dependencies with sched helper
ivpu:
- improve command queue handling
- use workqueue for IRQ handling
- add support HW fault injection
- locking fixes
mgag200:
- add support for G200eH5
msm:
- dpu: add concurrent writeback support for DPU 10.x+
- use LTTPR helpers
- GPU:
- Fix obscure GMU suspend failure
- Expose syncobj timeline support
- Extend GPU devcoredump with pagetable info
- a623 support
- Fix a6xx gen1/gen2 indexed-register blocks in gpu snapshot /
devcoredump
- Display:
- Add cpu-cfg interconnect paths on SM8560 and SM8650
- Introduce KMS OMMU fault handler, causing devcoredump snapshot
- Fixed error pointer dereference in msm_kms_init_aspace()
- DPU:
- Fix mode_changing handling
- Add writeback support on SM6150 (QCS615)
- Fix DSC programming in 1:1:1 topology
- Reworked hardware resource allocation, moving it to the CRTC code
- Enabled support for Concurrent WriteBack (CWB) on SM8650
- Enabled CDM blocks on all relevant platforms
- Reworked debugfs interface for BW/clocks debugging
- Clear perf params before calculating bw
- Support YUV formats on writeback
- Fixed double inclusion
- Fixed writeback in YUV formats when using cloned output, Dropped
wb2_formats_rgb
- Corrected dpu_crtc_check_mode_changed and struct dpu_encoder_virt
kerneldocs
- Fixed uninitialized variable in dpu_crtc_kickoff_clone_mode()
- DSI:
- DSC-related fixes
- Rework clock programming
- DSI PHY:
- Fix 7nm (and lower) PHY programming
- Add proper DT schema definitions for DSI PHY clocks
- HDMI:
- Rework the driver, enabling the use of the HDMI Connector
framework
- Bindings:
- Added eDP PHY on SA8775P
nouveau:
- move drm_slave_encoder interface into driver
- nvkm: refactor GSP RPC
- use LTTPR helpers
mediatek:
- HDMI fixup and refinement
- add MT8188 dsc compatible
- MT8365 SoC support
panthor:
- Expose sizes of intenral BOs via fdinfo
- Fix race between reset and suspend
- Improve locking
qaic:
- Add support for AIC200
renesas:
- Fix limits in DT bindings
rockchip:
- support rk3562-mali
- rk3576: Add HDMI support
- vop2: Add new display modes on RK3588 HDMI0 up to 4K
- Don't change HDMI reference clock rate
- Fix DT bindings
- analogix_dp: add eDP support
- fix shutodnw
solomon:
- Set SPI device table to silence warnings
- Fix pixel and scanline encoding
v3d:
- handle clock
vc4:
- Use drm_exec
- Use dma-resv for wait-BO ioctl
- Remove seqno infrastructure
virtgpu:
- Support partial mappings of GEM objects
- Reserve VGA resources during initialization
- Fix UAF in virtgpu_dma_buf_free_obj()
- Add panic support
vkms:
- Switch to a managed modesetting pipeline
- Add support for ARGB8888
- fix UAf
xlnx:
- Set correct DMA segment size
- use mutex guards
- Fix error handling
- Fix docs"
* tag 'drm-next-2025-03-28' of https://gitlab.freedesktop.org/drm/kernel: (1762 commits)
drm/amd/pm: Update feature list for smu_v13_0_6
drm/amdgpu: Add parameter documentation for amdgpu_sync_fence
drm/amdgpu/discovery: optionally use fw based ip discovery
drm/amdgpu/discovery: use specific ip_discovery.bin for legacy asics
drm/amdgpu/discovery: check ip_discovery fw file available
drm/amd/pm: Remove unnecessay UQ10 to UINT conversion
drm/amd/pm: Remove unnecessay UQ10 to UINT conversion
drm/amdgpu/sdma_v4_4_2: update VM flush implementation for SDMA
drm/amdgpu: Optimize VM invalidation engine allocation and synchronize GPU TLB flush
drm/amd/amdgpu: Increase max rings to enable SDMA page ring
drm/amdgpu: Decode deferred error type in gfx aca bank parser
drm/amdgpu/gfx11: Add Cleaner Shader Support for GFX11.5 GPUs
drm/amdgpu/mes: clean up SDMA HQD loop
drm/amdgpu/mes: enable compute pipes across all MEC
drm/amdgpu/mes: drop MES 10.x leftovers
drm/amdgpu/mes: optimize compute loop handling
drm/amdgpu/sdma: guilty tracking is per instance
drm/amdgpu/sdma: fix engine reset handling
drm/amdgpu: remove invalid usage of sched.ready
drm/amdgpu: add cleaner shader trace point
...
IO split is usually bad in io_uring world, since -EAGAIN is caused and
IO handling may have to fallback to io-wq, this way does hurt performance.
ublk starts to support zero copy recently, for avoiding unnecessary IO
split, ublk driver's segment limit should be aligned with backend
device's segment limit.
Another reason is that io_buffer_register_bvec() needs to allocate bvecs,
which number is aligned with ublk request segment number, so that big
memory allocation can be avoided by setting reasonable max_segments limit.
So add segment parameter for providing ublk server chance to align
segment limit with backend, and keep it reasonable from implementation
viewpoint.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250327095123.179113-7-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmflYcAQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpmvJD/4tKQlr0yRhln/JzPiONS41mUAuNRI4MdqJ
ykpQkMx3NcQANbNyOxI0PV5I7y1Jdlg/UP9gy11BrIaBk4Kqoluc6iAzgr5q9pBC
8pXhPIe80R/q/LOKEz9n5gqOMPNyUtd7IaBayJPBJre/YZXQu+49IL2Uyy3hss8d
neqAbWErd2FoUfTY14XB3ImLM6a76Z6CjE3pJYvVDM5uRBuH0IGqehJJuNpsViBf
M9XAW/HZt8ISsVt1tJbCQVWx4b63L/omHI8u5K2M0isTPV+QPk1O2Vgkn7dBrDeT
JvThWrM1uE++DYGcQ3DXHfb3gBIFEjTrNb2nddstyEU2ZaEXUkuOV2O0b7WPuphj
zp0oFaLl/ivHT8NoJzzZzK24zt99Qz43GWUaFCQeR0R8oTix/M1q0unguER45Iv7
Po/b3h6+RAi+87KOlM5WWo05ScswS8AwcSUsP5xMR5BjjD+GQYO5PmVVyo8w0rid
8F9U9DpN2CTA5YVjI+ax1cxWMOfmAXPK5ONjzZpyJoWb0THgj97esEwc2un7SBi7
TJJz7Gc9/xOqfRKaPDoH9t8+b6ruWHMqCYDw6exSAUKeDxQ+7z0zNMudHkuR5VrX
x+Taaj95ONLVNZYz0mbFcvmJC0UBOqkE94omXk7TU2Cn7SBzAW//XDep6CPpX/sa
LcmOK4UXdg==
=vOm1
-----END PGP SIGNATURE-----
Merge tag 'for-6.15/io_uring-reg-vec-20250327' of git://git.kernel.dk/linux
Pull more io_uring updates from Jens Axboe:
"Final separate updates for io_uring.
This started out as a series of cleanups improvements and improvements
for registered buffers, but as the last series of the io_uring changes
for 6.15, it also collected a few fixes for the other branches on top:
- Add support for vectored fixed/registered buffers.
Previously only single segments have been supported for commands,
now vectored variants are supported as well. This series includes
networking and file read/write support.
- Small series unifying return codes across multi and single shot.
- Small series cleaning up registerd buffer importing.
- Adding support for vectored registered buffers for uring_cmd.
- Fix for io-wq handling of command reissue.
- Various little fixes and tweaks"
* tag 'for-6.15/io_uring-reg-vec-20250327' of git://git.kernel.dk/linux: (25 commits)
io_uring/net: fix io_req_post_cqe abuse by send bundle
io_uring/net: use REQ_F_IMPORT_BUFFER for send_zc
io_uring: move min_events sanitisation
io_uring: rename "min" arg in io_iopoll_check()
io_uring: open code __io_post_aux_cqe()
io_uring: defer iowq cqe overflow via task_work
io_uring: fix retry handling off iowq
io_uring/net: only import send_zc buffer once
io_uring/cmd: introduce io_uring_cmd_import_fixed_vec
io_uring/cmd: add iovec cache for commands
io_uring/cmd: don't expose entire cmd async data
io_uring: rename the data cmd cache
io_uring: rely on io_prep_reg_vec for iovec placement
io_uring: introduce io_prep_reg_iovec()
io_uring: unify STOP_MULTISHOT with IOU_OK
io_uring: return -EAGAIN to continue multishot
io_uring: cap cached iovec/bvec size
io_uring/net: implement vectored reg bufs for zctx
io_uring/net: convert to struct iou_vec
io_uring/net: pull vec alloc out of msghdr import
...
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmfjTRYQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpvfMD/4596ZTo0bzeimMMzFloabhaqcMC9eO21iW
oNb6KdSKQzAOdQl3+YfhSrP/ZbeoJl8y0YoPGIA2RBd4ELl/4YPVTNeahNhz70P2
iawXcYmMyrM+6W6RXof9uj+k8mE17G7N+M7DuMYjfMnC/OhDKg/3DbbydwesMW47
/1naMsk/rJx6CbeVKAl3V2es3NPUrZlh/dsurm5muGqXHAUmv459w46togKNvUhM
tAGSq54zyPLnvnuA4Cm1QBIYtN5dHmjDjMmEXFiEF1qg7c5WIfVpkopc/SjvvlQs
fVwN9Mn9Mh1VXuLysSqa5RsXINtXbJdpyXnh8pG/QqhntftNRKkwMxlN0gEYfBzB
4iuvzUpGmPAwPhYJ60Ylci4KhWY31mwbU+ns9+/G4bIdDxdEMJGvy4D/oz/fUr9p
1gwAIkJgfixb6fsUrL6REFoSEmRQQ7hQkyKRFyvXSXsTtqXz2BR7O75ujK381J9d
Sa3zexDBRaJWlDWspRe7Olcwf/hqLmY3cWwQ8Z+a2vozU2dAtYpKr5Lhr7SjuftI
Uw4pCkrXUG5EhdMSNuQnzVZv3xWPGAAUaAIT0MhDcWakhNT1dyyjbuXWuFzDmKQI
0Q71+UPBW300eJx8bFupYQ1mKp10LbjewCxiXOkBFZxaG7n5UwGsv4FoE3CJ8QEP
V0ou6ZpBFA==
=AZkA
-----END PGP SIGNATURE-----
Merge tag 'for-6.15/io_uring-epoll-wait-20250325' of git://git.kernel.dk/linux
Pull io_uring epoll support from Jens Axboe:
"This adds support for reading epoll events via io_uring.
While this may seem counter-intuitive (and/or productive), the
reasoning here is that quite a few existing epoll event loops can
easily do a partial conversion to a completion based model, but are
still stuck with one (or few) event types that remain readiness based.
For that case, they then need to add the io_uring fd to the epoll
context, and continue to rely on epoll_wait(2) for waiting on events.
This misses out on the finer grained waiting that io_uring can do, to
reduce context switches and wait for multiple events in one batch
reliably.
With adding support for reaping epoll events via io_uring, the whole
legacy readiness based event types can still be reaped via epoll, with
the overall waiting in the loop be driven by io_uring"
* tag 'for-6.15/io_uring-epoll-wait-20250325' of git://git.kernel.dk/linux:
io_uring/epoll: add support for IORING_OP_EPOLL_WAIT
io_uring/epoll: remove CONFIG_EPOLL guards
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmfjTP8QHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpm6oEACnpGL52FAKTVj14GDqFo6Pq0Jmnh07x8qj
mpHFPwxfWAzRiuNyji2iS9ecS2cnlkixNyMWZipXRi4KJAUjJH6YDd7IofUI3Glf
6v7b6srFSvsWJIJ8LdkJHLHAJuzYnJvFZ8apwgQczEDqgHq7BAunM1sVQ+mydjYk
EXT4kN6DSBOPzwr9GAay52f8nXhbqdHfT+YTGHPHg+QToojL6gD7vvW57w/QqD/x
91hJef1z01cSIsDZOxA0EUeD+9bBAHpoamr/e3IOOCVYCN6hy0dGa9g0QGbbpVyE
AeU4FGZLV9J8OOfvHVraDt5Wn3IXxYaMu22dSn1S6tVinwjXhJR2LAA+t4fGHAkt
i38LjOsIbopSQn/cNhzwC8UZcHLqnVsdDolHlHzsVFVfcpck2/4JFpUeP8QhWgrk
f9tY12QUf/oEaWm0/sUCHZNFxpIGeFA5FFXf0Z92clnzBuiuWoesBNvxqY/2DeZn
IDNXiv+Trxr6kFEjTpzPeuxbWrn4PJ7afQSAFcEmOCguk5riM+zJZNIKg0TxUHSS
tt6sfxmTP1DhgDKad5kT3MLyzOcx47Kbjf4dj6KmRnD+3DGwwN2F7X7R1GJylPSp
RLOzJ+Ouuy9UmBN6JMsT4BmR9+FJTVirADU926d/ZqCTtRV8Tnq/6HHmKmmr4CR0
THJ0PJqQjg==
=MOve
-----END PGP SIGNATURE-----
Merge tag 'for-6.15/io_uring-rx-zc-20250325' of git://git.kernel.dk/linux
Pull io_uring zero-copy receive support from Jens Axboe:
"This adds support for zero-copy receive with io_uring, enabling fast
bulk receive of data directly into application memory, rather than
needing to copy the data out of kernel memory.
While this version only supports host memory as that was the initial
target, other memory types are planned as well, with notably GPU
memory coming next.
This work depends on some networking components which were queued up
on the networking side, but have now landed in your tree.
This is the work of Pavel Begunkov and David Wei. From the v14 posting:
'We configure a page pool that a driver uses to fill a hw rx queue
to hand out user pages instead of kernel pages. Any data that ends
up hitting this hw rx queue will thus be dma'd into userspace
memory directly, without needing to be bounced through kernel
memory. 'Reading' data out of a socket instead becomes a
_notification_ mechanism, where the kernel tells userspace where
the data is. The overall approach is similar to the devmem TCP
proposal
This relies on hw header/data split, flow steering and RSS to
ensure packet headers remain in kernel memory and only desired
flows hit a hw rx queue configured for zero copy. Configuring this
is outside of the scope of this patchset.
We share netdev core infra with devmem TCP. The main difference is
that io_uring is used for the uAPI and the lifetime of all objects
are bound to an io_uring instance. Data is 'read' using a new
io_uring request type. When done, data is returned via a new shared
refill queue. A zero copy page pool refills a hw rx queue from this
refill queue directly. Of course, the lifetime of these data
buffers are managed by io_uring rather than the networking stack,
with different refcounting rules.
This patchset is the first step adding basic zero copy support. We
will extend this iteratively with new features e.g. dynamically
allocated zero copy areas, THP support, dmabuf support, improved
copy fallback, general optimisations and more'
In a local setup, I was able to saturate a 200G link with a single CPU
core, and at netdev conf 0x19 earlier this month, Jamal reported
188Gbit of bandwidth using a single core (no HT, including soft-irq).
Safe to say the efficiency is there, as bigger links would be needed
to find the per-core limit, and it's considerably more efficient and
faster than the existing devmem solution"
* tag 'for-6.15/io_uring-rx-zc-20250325' of git://git.kernel.dk/linux:
io_uring/zcrx: add selftest case for recvzc with read limit
io_uring/zcrx: add a read limit to recvzc requests
io_uring: add missing IORING_MAP_OFF_ZCRX_REGION in io_uring_mmap
io_uring: Rename KConfig to Kconfig
io_uring/zcrx: fix leaks on failed registration
io_uring/zcrx: recheck ifq on shutdown
io_uring/zcrx: add selftest
net: add documentation for io_uring zcrx
io_uring/zcrx: add copy fallback
io_uring/zcrx: throttle receive requests
io_uring/zcrx: set pp memory provider for an rx queue
io_uring/zcrx: add io_recvzc request
io_uring/zcrx: dma-map area for the device
io_uring/zcrx: implement zerocopy receive pp memory provider
io_uring/zcrx: grab a net device
io_uring/zcrx: add io_zcrx_area
io_uring/zcrx: add interface queue and refill queue
-----BEGIN PGP SIGNATURE-----
iIYEABYKAC4WIQSVyBthFV4iTW/VU1/l49DojIL20gUCZ+bGgBAcbWljQGRpZ2lr
b2QubmV0AAoJEOXj0OiMgvbSKmgBAICZsmQTuKMHIXdB7kwA+BX5k++SZcyA+qHN
0hrJTSMsAP0Uv6NpiPT4CTduqBMRbuMwNhujBczRiok6yaHDbC8eCw==
=K8XL
-----END PGP SIGNATURE-----
Merge tag 'landlock-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux
Pull landlock updates from Mickaël Salaün:
"This brings two main changes to Landlock:
- A signal scoping fix with a new interface for user space to know if
it is compatible with the running kernel.
- Audit support to give visibility on why access requests are denied,
including the origin of the security policy, missing access rights,
and description of object(s). This was designed to limit log spam
as much as possible while still alerting about unexpected blocked
access.
With these changes come new and improved documentation, and a lot of
new tests"
* tag 'landlock-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux: (36 commits)
landlock: Add audit documentation
selftests/landlock: Add audit tests for network
selftests/landlock: Add audit tests for filesystem
selftests/landlock: Add audit tests for abstract UNIX socket scoping
selftests/landlock: Add audit tests for ptrace
selftests/landlock: Test audit with restrict flags
selftests/landlock: Add tests for audit flags and domain IDs
selftests/landlock: Extend tests for landlock_restrict_self(2)'s flags
selftests/landlock: Add test for invalid ruleset file descriptor
samples/landlock: Enable users to log sandbox denials
landlock: Add LANDLOCK_RESTRICT_SELF_LOG_SUBDOMAINS_OFF
landlock: Add LANDLOCK_RESTRICT_SELF_LOG_*_EXEC_* flags
landlock: Log scoped denials
landlock: Log TCP bind and connect denials
landlock: Log truncate and IOCTL denials
landlock: Factor out IOCTL hooks
landlock: Log file-related denials
landlock: Log mount-related denials
landlock: Add AUDIT_LANDLOCK_DOMAIN and log domain status
landlock: Add AUDIT_LANDLOCK_ACCESS and log ptrace denials
...
Stephen Rothwell reports htmldocs warnings on iommufd_vevent_header
tables:
Documentation/userspace-api/iommufd:323: ./include/uapi/linux/iommufd.h:1048: CRITICAL: Unexpected section title or transition.
------------------------------------------------------------------------- [docutils]
WARNING: kernel-doc './scripts/kernel-doc -rst -enable-lineno -sphinx-version 8.1.3 ./include/uapi/linux/iommufd.h' processing failed with: Documentation/userspace-api/iommufd:323: ./include/uapi/linux/iommufd.h:1048: (SEVERE/4) Unexpected section title or transition.
-------------------------------------------------------------------------
These are because Sphinx confuses the tables for section headings. Fix
the table markup to squash away above warnings.
Fixes: e36ba5ab80 ("iommufd: Add IOMMUFD_OBJ_VEVENTQ and IOMMUFD_CMD_VEVENTQ_ALLOC")
Link: https://patch.msgid.link/r/20250328114654.55840-1-bagasdotme@gmail.com
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Closes: https://lore.kernel.org/linux-next/20250318213359.5dc56fd1@canb.auug.org.au/
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
PASID usage requires PASID support in both device and IOMMU. Since the
iommu drivers always enable the PASID capability for the device if it
is supported, this extends the IOMMU_GET_HW_INFO to report the PASID
capability to userspace. Also, enhances the selftest accordingly.
Link: https://patch.msgid.link/r/20250321180143.8468-5-yi.l.liu@intel.com
Cc: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org> #aarch64 platform
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Since the context-type additions to the virtio-gpu spec, these have been
defined locally in guest user-space, and virtio-gpu backend library code.
Now, these capsets have been stabilized, and should be defined in a
common space, in both the virtio_gpu header, and alongside the virtgpu_drm
interface that they apply to.
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Signed-off-by: Aaron Ruby <aruby@qnx.com>
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
[dmitry.osipenko@collabora.com: edit commit title]
Link: https://patchwork.freedesktop.org/patch/msgid/YT3PR01MB5857E808EDF6949F2DF517FDAFA12@YT3PR01MB5857.CANPRD01.PROD.OUTLOOK.COM
In this round, there are three major updates: 1) folio conversion, 2) refactor
for mount API conversion, 3) some performance improvement such as direct IO,
checkpoint speed, and IO priority hints. For stability, there are patches
which add more sanity checks and fixes some major issues like i_size in
atomic write operations and write pointer recovery in zoned devices.
Enhancement:
- huge folio converion work by Matthew Wilcox
- clean up for mount API conversion by Eric Sandeen
- improve direct IO speed in the overwrite case
- add some sanity check on node consistency
- set highest IO priority for checkpoint thread
- keep POSIX_FADV_NOREUSE ranges and add sysfs entry to reclaim pages
- add ioctl to get IO priority hint
- add carve_out sysfs node for fsstat
Bug fix:
- disable nat_bits during umount to avoid potential nat entry corruption
- fix missing i_size update on atomic writes
- fix missing discard for active segments
- fix running out of free segments
- fix out-of-bounds access in f2fs_truncate_inode_blocks()
- call f2fs_recover_quota_end() correctly
- fix potential deadloop in prepare_compress_overwrite()
- fix the missing write pointer correction for zoned device
- fix to avoid panic once fallocation fails for pinfile
- don't retry IO for corrupted data scenario
There are many other clean up patches and minor bug fixes as usual.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmfhjuYACgkQQBSofoJI
UNJF+hAAip0Sf6bXwt43KR3lmbXg/YBTtPDACOs375Pn8pfRVsAPIAf5e1AnlBET
rA0KpqUrEZHXenQFF9n4LIoHYqfuUw3EMempuvp4qgQcx15Kaajw+EGWVLYstNy2
9dELc9DA55f/i1uvHezGfQFy6hMfXasf+tkaYk0z0ZYDStYboLgkVAY+F869q1Dl
D16Y7Lna12h4eCxDdssIPDsLjFH/2LDn7SmhXsnpZxwK0Zx8JSo83WxXHnzXHxz4
vUkukInKgqcBDvf6ufW/YF3/tqSs20XEXNK3cI1vyHx1dwij6Us+G0n0WJHL30QE
zsliecR0X28JP1rJ3ldCjr+4Kxm6/u/Uwpinm2Fm1jL67UY4TZcaARBE8k20I6ND
j/L1+sXrIdZ1aILM9/bwCjgXiVFdZbvlfGfpTj1duAkQgRd+/s9cjPlo5j5HTad5
XJmzJz6YOaUNarhP/E31Z9SV9M2kEcmoDOTxKBg6ZcMWXZ27B6Z0ag92prd0GWk6
rWDuVj0eP/LDY1QedbHRPbg1D84jVgcnldfPaf9ptln993skJGdgS0dkegTqMR0L
H8RgpWOzWZI53gnQdePdej8diHmD8uRTrLf/oABm//GzTHC5BdwVpArOl1DCuLFF
YkMtVEicgnWRmL3PgODAdTJXaDi3uGvT116i+lfMlUn9p+t93r0=
=E+TE
-----END PGP SIGNATURE-----
Merge tag 'f2fs-for-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"In this round, there are three major updates: (1) folio conversion,
(2) refactoring for mount API conversion, (3) some performance
improvement such as direct IO, checkpoint speed, and IO priority
hints.
For stability, there are patches which add more sanity checks and
fixes some major issues like i_size in atomic write operations and
write pointer recovery in zoned devices.
Enhancements:
- huge folio converion work by Matthew Wilcox
- clean up for mount API conversion by Eric Sandeen
- improve direct IO speed in the overwrite case
- add some sanity check on node consistency
- set highest IO priority for checkpoint thread
- keep POSIX_FADV_NOREUSE ranges and add sysfs entry to reclaim pages
- add ioctl to get IO priority hint
- add carve_out sysfs node for fsstat
Bug fixes:
- disable nat_bits during umount to avoid potential nat entry corruption
- fix missing i_size update on atomic writes
- fix missing discard for active segments
- fix running out of free segments
- fix out-of-bounds access in f2fs_truncate_inode_blocks()
- call f2fs_recover_quota_end() correctly
- fix potential deadloop in prepare_compress_overwrite()
- fix the missing write pointer correction for zoned device
- fix to avoid panic once fallocation fails for pinfile
- don't retry IO for corrupted data scenario
There are many other clean up patches and minor bug fixes as usual"
* tag 'f2fs-for-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (68 commits)
f2fs: fix missing discard for active segments
f2fs: optimize f2fs DIO overwrites
f2fs: fix to avoid atomicity corruption of atomic file
f2fs: pass sbi rather than sb to parse_options()
f2fs: pass sbi rather than sb to quota qf_name helpers
f2fs: defer readonly check vs norecovery
f2fs: Pass sbi rather than sb to f2fs_set_test_dummy_encryption
f2fs: make LAZYTIME a mount option flag
f2fs: make INLINECRYPT a mount option flag
f2fs: factor out an f2fs_default_check function
f2fs: consolidate unsupported option handling errors
f2fs: use f2fs_sb_has_device_alias during option parsing
f2fs: add carve_out sysfs node
f2fs: fix to avoid running out of free segments
f2fs: Remove f2fs_write_node_page()
f2fs: Remove f2fs_write_meta_page()
f2fs: Remove f2fs_write_data_page()
f2fs: Remove check for ->writepage
Revert "f2fs: rebuild nat_bits during umount"
f2fs: fix to avoid accessing uninitialized curseg
...
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmfZy+0ACgkQxWXV+ddt
WDtcRw//bUfqbabUGBZ+t/a7YahSeukKx7jhHEHDvzaK8LSZj4otZtLtlKZbaNQK
gGhMitd+rwkf/KnnRvCmS9Y6v4PHbsH8NX0PaGH4ZFYD4mifAs6HNSQUzQIASAZt
OhX/PaKUdLN6kFOt4Yg8Qtem5LcF9Kmrc43ySkcF1T7KtZey8KZypMf0Af/4KvP/
QcNiYJiUlotz6m5K0+TjsDVJDKbYPYy07u3/9GHJBN8bEf5jswPmfDJrONd+NDFS
rMylVCTkW5Hl93qDM0zINPcyfuFFNUH4fWJVRizJPmOwQWUqkRx4J5nSKZzQSlgg
O3KTEYPJHG388an1Cs/k4oIEpOq2xJ7RKJP8ksPf/IcXOTJ0dLXUQisheRoeGyYR
04TWP1rZ2vyQI/LzlOiRozCkAWWhLMJMvWXRUTK/9z9Jh2dcbPdykJGQZ11D9hNI
W5i0XsHX/P2xD8D2sOHo+QY5o1QzMZpb+IaL/+Kv22s3Vb1brabZgOAq8H13l1/y
oe3RLVSLueth22q4GK/MSi7hxSZwV6Zj5HtxYxfs4RFqWo9sM6mp9xP3Via3MnLA
fK8FIMYUMqgvqonDqUD8Gv+YV15Haq8icO/2F9b9eiycJ1mSsRILVEiVCJGbBYIz
C1tB7j5Lv44ZExKHmxPzHMa8rrrG+jaSxxZpuLuOYX0VvVECKVY=
=t4Jn
-----END PGP SIGNATURE-----
Merge tag 'for-6.15-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
"User visible changes:
- fall back to buffered write if direct io is done on a file that
requires checksums
- this avoids a problem with checksum mismatch errors, observed
e.g. on virtual images when writes to pages under writeback
cause the checksum mismatch reports
- this may lead to some performance degradation but currently the
recommended setup for VM images is to use the NOCOW file
attribute that also disables checksums
- fast/realtime zstd levels -15 to -1
- supported by mount options (compress=zstd:-5) and defrag ioctl
- improved speed, reduced compression ratio, check the commit for
sample measurements
- defrag ioctl extended to accept negative compression levels
- subpage mode
- remove warning when subpage mode is used, the feature is now
reasonably complete and tested
- in debug mode allow to create 2K b-tree nodes to allow testing
subpage on x86_64 with 4K pages too
Performance improvements:
- in send, better file path caching improves runtime (on sample load
by -30%)
- on s390x with hardware zlib support prepare the input buffer in a
better way to get the best results from the acceleration
- minor speed improvement in encoded read, avoid memory allocation in
synchronous mode
Core:
- enable stable writes on inodes, replacing manually waiting for
writeback and allowing to skip that on inodes without checksums
- add last checks and warnings for out-of-band dirty writes to pages,
requiring a fixup ("fixup worker"), this should not be necessary
since 5.8 where get_user_page() and pin_user_pages*() prevent this
- long history behind that, we'll be happy to remove the whole
infrastructure in the near future
- more folio API conversions and preparations for large folio support
- subpage cleanups and refactoring, split handling of data and
metadata to allow future support for large folios
- readpage works as block-by-block, no change for normal mode, this
is preparation for future subpage updates
- block group refcount fixes and hardening
- delayed iput fixes
- in zoned mode, fix zone activation on filesystem with missing
devices
Cleanups:
- inode parameter cleanups
- path auto-freeing updates
- code flow simplifications in send
- redundant parameter cleanups"
* tag 'for-6.15-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (164 commits)
btrfs: zoned: fix zone finishing with missing devices
btrfs: zoned: fix zone activation with missing devices
btrfs: remove end_no_trans label from btrfs_log_inode_parent()
btrfs: simplify condition for logging new dentries at btrfs_log_inode_parent()
btrfs: remove redundant else statement from btrfs_log_inode_parent()
btrfs: use memcmp_extent_buffer() at replay_one_extent()
btrfs: update outdated comment for overwrite_item()
btrfs: use variables to store extent buffer and slot at overwrite_item()
btrfs: avoid unnecessary memory allocation and copy at overwrite_item()
btrfs: don't clobber ret in btrfs_validate_super()
btrfs: prepare btrfs_page_mkwrite() for large folios
btrfs: prepare extent_io.c for future large folio support
btrfs: prepare btrfs_launcher_folio() for large folios support
btrfs: replace PAGE_SIZE with folio_size for subpage.[ch]
btrfs: add a size parameter to btrfs_alloc_subpage()
btrfs: subpage: make btrfs_is_subpage() check against a folio
btrfs: add extra warning if delayed iput is added when it's not allowed
btrfs: avoid redundant path slot assignment in btrfs_search_forward()
btrfs: remove unnecessary btrfs_key local variable in btrfs_search_forward()
btrfs: simplify the return value handling in search_ioctl()
...
- Fix endpoint BAR testing so the test can skip disabled BARs instead of
reporting them as failures (Niklas Cassel)
- Verify that pci_endpoint interrupt tests set the correct IRQ type
(Kunihiko Hayashi)
- Fix interpretation of pci_endpoint_test_bars_read_bar() error returns
(Niklas Cassel)
- Fix potential string truncation in pci_endpoint_test_probe() (Niklas
Cassel)
- Increase endpoint test BAR size variable to accommodate BARs larger than
INT_MAX (Niklas Cassel)
- Release IRQs to avoid leak in pci_endpoint interrupt tests (Kunihiko
Hayashi)
- Log the correct IRQ type when pci_endpoint IRQ request test fails
(Kunihiko Hayashi)
- Remove pci_endpoint_test irq_type and no_msi globals; instead use
test->irq_type (Kunihiko Hayashi)
- Remove unnecessary use of managed IRQ functions in pci_endpoint_test
(Kunihiko Hayashi)
- Add and use IRQ_TYPE_* defines in pci_endpoint_test (Niklas Cassel)
- Add struct pci_epc_features.intx_capable and note that RK3568 and RK3588
can't raise INTx interrupts (Niklas Cassel)
- Expose supported IRQ types in CAPS so pci_endpoint_test can set
appropriate type (Niklas Cassel)
- Add PCITEST_IRQ_TYPE_AUTO to pci_endpoint_test for cases where the IRQ
type doesn't matter (Niklas Cassel)
* pci/endpoint-test:
misc: pci_endpoint_test: Add support for PCITEST_IRQ_TYPE_AUTO
PCI: endpoint: pci-epf-test: Expose supported IRQ types in CAPS register
PCI: dw-rockchip: Endpoint mode cannot raise INTx interrupts
PCI: endpoint: Add intx_capable to epc_features struct
selftests: pci_endpoint: Use IRQ_TYPE_* defines from UAPI header
misc: pci_endpoint_test: Use IRQ_TYPE_* defines from UAPI header
PCI: endpoint: pcitest: Add IRQ_TYPE_* defines to UAPI header
misc: pci_endpoint_test: Do not use managed IRQ functions
misc: pci_endpoint_test: Remove global 'irq_type' and 'no_msi'
misc: pci_endpoint_test: Fix 'irq_type' to convey the correct type
misc: pci_endpoint_test: Fix displaying 'irq_type' after 'request_irq' error
misc: pci_endpoint_test: Avoid issue of interrupts remaining after request_irq error
misc: pci_endpoint_test: Handle BAR sizes larger than INT_MAX
misc: pci_endpoint_test: Give disabled BARs a distinct error code
misc: pci_endpoint_test: Fix potential truncation in pci_endpoint_test_probe()
misc: pci_endpoint_test: Fix pci_endpoint_test_bars_read_bar() error handling
selftests: pci_endpoint: Add GET_IRQTYPE checks to each interrupt test
selftests: pci_endpoint: Skip disabled BARs
- Use pci_resource_n() to simplify BAR/window resource lookup (Ilpo
Järvinen)
- Fix typo that repeatedly distributed resources to a bridge instead of
iterating over subordinate bridges, which resulted in too little space to
assign some BARs (Kai-Heng Feng)
- Relax bridge window tail sizing for optional resources, e.g., IOV BARs,
to avoid failures when removing and re-adding devices (Ilpo Järvinen)
- Fix a double counting error for I/O resources, as we previously did for
memory resources (Ilpo Järvinen)
- Use resource_set_{range,size}() helpers in more places (Ilpo Järvinen)
- Add pci_resource_is_iov() to identify IOV resources (Ilpo Järvinen)
- Add pci_resource_num() to look up the BAR number from the resource
pointer (Ilpo Järvinen)
- Add restore_dev_resource() to simplify code that resources saved device
resources (Ilpo Järvinen)
- Allow drivers to enable devices even if we haven't assigned optional IOV
resources to them (Ilpo Järvinen)
- Improve debug output during resource reallocation (Ilpo Järvinen)
- Rework handling of optional resources (IOV BARs, ROMs) to reduce failures
if we can't allocate them (Ilpo Järvinen)
- Move declarations of pci_rescan_bus_bridge_resize(),
pci_reassign_bridge_resources(), and CardBus-related sizes from
include/linux/pci.h to drivers/pci/pci.h since they're not used outside
the PCI core (Ilpo Järvinen)
- Make pci_setup_bridge() static (Ilpo Järvinen)
- Fix a NULL dereference in the SR-IOV VF creation error path (Shay Drory)
- Fix s390 mmio_read/write syscalls, which didn't cause page faults in some
cases, which broke vfio-pci lazy mapping on first access (Niklas
Schnelle)
- Add pdev->non_mappable_bars to replace CONFIG_VFIO_PCI_MMAP, which was
disabled only for s390 (Niklas Schnelle)
- Support mmap of PCI resources on s390 except for ISM devices (Niklas
Schnelle)
* pci/resource:
s390/pci: Support mmap() of PCI resources except for ISM devices
s390/pci: Introduce pdev->non_mappable_bars and replace VFIO_PCI_MMAP
s390/pci: Fix s390_mmio_read/write syscall page fault handling
PCI: Fix NULL dereference in SR-IOV VF creation error path
PCI: Move cardbus IO size declarations into pci/pci.h
PCI: Make pci_setup_bridge() static
PCI: Move resource reassignment func declarations into pci/pci.h
PCI: Move pci_rescan_bus_bridge_resize() declaration to pci/pci.h
PCI: Fix BAR resizing when VF BARs are assigned
PCI: Do not claim to release resource falsely
PCI: Increase Resizable BAR support from 512 GB to 128 TB
PCI: Rework optional resource handling
PCI: Perform reset_resource() and build fail list in sync
PCI: Use res->parent to check if resource is assigned
PCI: Add debug print when releasing resources before retry
PCI: Indicate optional resource assignment failures
PCI: Always have realloc_head in __assign_resources_sorted()
PCI: Extend enable to check for any optional resource
PCI: Add restore_dev_resource()
PCI: Remove incorrect comment from pci_reassign_resource()
PCI: Consolidate assignment loop next round preparation
PCI: Rename retval to ret
PCI: Use while loop and break instead of gotos
PCI: Refactor pdev_sort_resources() & __dev_sort_resources()
PCI: Converge return paths in __assign_resources_sorted()
PCI: Add dev & res local variables to resource assignment funcs
PCI: Add pci_resource_num() helper
PCI: Check resource_size() separately
PCI: Add pci_resource_is_iov() to identify IOV resources
PCI: Use resource_set_{range,size}() helpers
PCI: Use SZ_* instead of literals in setup-bus.c
PCI: Fix old_size lower bound in calculate_iosize() too
PCI: Allow relaxed bridge window tail sizing for optional resources
PCI: Simplify size1 assignment logic
PCI: Use min_align, not unrelated add_align, for size0
PCI: Remove add_align overwrite unrelated to size0
PCI: Use downstream bridges for distributing resources
PCI: Cleanup dev->resource + resno to use pci_resource_n()
- Rename DOE 'protocol' to 'feature' to follow spec terminology (Alistair
Francis)
- Expose supported DOE features via sysfs (Alistair Francis)
- Allow DOE support to be enabled even if CXL isn't enabled (Alistair
Francis)
* pci/doe:
PCI/DOE: Allow enabling DOE without CXL
PCI/DOE: Expose DOE features via sysfs
PCI/DOE: Rename Discovery Response Data Object Contents to type
PCI/DOE: Rename DOE protocol to feature
Core & protocols
----------------
- Continue Netlink conversions to per-namespace RTNL lock
(IPv4 routing, routing rules, routing next hops, ARP ioctls).
- Continue extending the use of netdev instance locks. As a driver
opt-in protect queue operations and (in due course) ethtool
operations with the instance lock and not RTNL lock.
- Support collecting TCP timestamps (data submitted, sent, acked)
in BPF, allowing for transparent (to the application) and lower
overhead tracking of TCP RPC performance.
- Tweak existing networking Rx zero-copy infra to support zero-copy
Rx via io_uring.
- Optimize MPTCP performance in single subflow mode by 29%.
- Enable GRO on packets which went thru XDP CPU redirect (were queued
for processing on a different CPU). Improving TCP stream performance
up to 2x.
- Improve performance of contended connect() by 200% by searching
for an available 4-tuple under RCU rather than a spin lock.
Bring an additional 229% improvement by tweaking hash distribution.
- Avoid unconditionally touching sk_tsflags on RX, improving
performance under UDP flood by as much as 10%.
- Avoid skb_clone() dance in ping_rcv() to improve performance under
ping flood.
- Avoid FIB lookup in netfilter if socket is available, 20% perf win.
- Rework network device creation (in-kernel) API to more clearly
identify network namespaces and their roles.
There are up to 4 namespace roles but we used to have just 2 netns
pointer arguments, interpreted differently based on context.
- Use sysfs_break_active_protection() instead of trylock to avoid
deadlocks between unregistering objects and sysfs access.
- Add a new sysctl and sockopt for capping max retransmit timeout
in TCP.
- Support masking port and DSCP in routing rule matches.
- Support dumping IPv4 multicast addresses with RTM_GETMULTICAST.
- Support specifying at what time packet should be sent on AF_XDP
sockets.
- Expose TCP ULP diagnostic info (for TLS and MPTCP) to non-admin users.
- Add Netlink YAML spec for WiFi (nl80211) and conntrack.
- Introduce EXPORT_IPV6_MOD() and EXPORT_IPV6_MOD_GPL() for symbols
which only need to be exported when IPv6 support is built as a module.
- Age FDB entries based on Rx not Tx traffic in VxLAN, similar
to normal bridging.
- Allow users to specify source port range for GENEVE tunnels.
- netconsole: allow attaching kernel release, CPU ID and task name
to messages as metadata
Driver API
----------
- Continue rework / fixing of Energy Efficient Ethernet (EEE) across
the SW layers. Delegate the responsibilities to phylink where possible.
Improve its handling in phylib.
- Support symmetric OR-XOR RSS hashing algorithm.
- Support tracking and preserving IRQ affinity by NAPI itself.
- Support loopback mode speed selection for interface selftests.
Device drivers
--------------
- Remove the IBM LCS driver for s390.
- Remove the sb1000 cable modem driver.
- Add support for SFP module access over SMBus.
- Add MCTP transport driver for MCTP-over-USB.
- Enable XDP metadata support in multiple drivers.
- Ethernet high-speed NICs:
- Broadcom (bnxt):
- add PCIe TLP Processing Hints (TPH) support for new AMD platforms
- support dumping RoCE queue state for debug
- opt into instance locking
- Intel (100G, ice, idpf):
- ice: rework MSI-X IRQ management and distribution
- ice: support for E830 devices
- iavf: add support for Rx timestamping
- iavf: opt into instance locking
- nVidia/Mellanox:
- mlx4: use page pool memory allocator for Rx
- mlx5: support for one PTP device per hardware clock
- mlx5: support for 200Gbps per-lane link modes
- mlx5: move IPSec policy check after decryption
- AMD/Solarflare:
- support FW flashing via devlink
- Cisco (enic):
- use page pool memory allocator for Rx
- enable 32, 64 byte CQEs
- get max rx/tx ring size from the device
- Meta (fbnic):
- support flow steering and RSS configuration
- report queue stats
- support TCP segmentation
- support IRQ coalescing
- support ring size configuration
- Marvell/Cavium:
- support AF_XDP
- Wangxun:
- support for PTP clock and timestamping
- Huawei (hibmcge):
- checksum offload
- add more statistics
- Ethernet virtual:
- VirtIO net:
- aggressively suppress Tx completions, improve perf by 96% with
1 CPU and 55% with 2 CPUs
- expose NAPI to IRQ mapping and persist NAPI settings
- Google (gve):
- support XDP in DQO RDA Queue Format
- opt into instance locking
- Microsoft vNIC:
- support BIG TCP
- Ethernet NICs consumer, and embedded:
- Synopsys (stmmac):
- cleanup Tx and Tx clock setting and other link-focused cleanups
- enable SGMII and 2500BASEX mode switching for Intel platforms
- support Sophgo SG2044
- Broadcom switches (b53):
- support for BCM53101
- TI:
- iep: add perout configuration support
- icssg: support XDP
- Cadence (macb):
- implement BQL
- Xilinx (axinet):
- support dynamic IRQ moderation and changing coalescing at runtime
- implement BQL
- report standard stats
- MediaTek:
- support phylink managed EEE
- Intel:
- igc: don't restart the interface on every XDP program change
- RealTek (r8169):
- support reading registers of internal PHYs directly
- increase max jumbo packet size on RTL8125/RTL8126
- Airoha:
- support for RISC-V NPU packet processing unit
- enable scatter-gather and support MTU up to 9kB
- Tehuti (tn40xx):
- support cards with TN4010 MAC and an Aquantia AQR105 PHY
- Ethernet PHYs:
- support for TJA1102S, TJA1121
- dp83tg720: add randomized polling intervals for link detection
- dp83822: support changing the transmit amplitude voltage
- support for LEDs on 88q2xxx
- CAN:
- canxl: support Remote Request Substitution bit access
- flexcan: add S32G2/S32G3 SoC
- WiFi:
- remove cooked monitor support
- strict mode for better AP testing
- basic EPCS support
- OMI RX bandwidth reduction support
- batman-adv: add support for jumbo frames
- WiFi drivers:
- RealTek (rtw88):
- support RTL8814AE and RTL8814AU
- RealTek (rtw89):
- switch using wiphy_lock and wiphy_work
- add BB context to manipulate two PHY as preparation of MLO
- improve BT-coexistence mechanism to play A2DP smoothly
- Intel (iwlwifi):
- add new iwlmld sub-driver for latest HW/FW combinations
- MediaTek (mt76):
- preparation for mt7996 Multi-Link Operation (MLO) support
- Qualcomm/Atheros (ath12k):
- continued work on MLO
- Silabs (wfx):
- Wake-on-WLAN support
- Bluetooth:
- add support for skb TX SND/COMPLETION timestamping
- hci_core: enable buffer flow control for SCO/eSCO
- coredump: log devcd dumps into the monitor
- Bluetooth drivers:
- intel: add support to configure TX power
- nxp: handle bootloader error during cmd5 and cmd7
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmfkLC8ACgkQMUZtbf5S
Irsb5g/+L7oKOf0ALbaV9kxFsoz8AymZfAW9i/27F07omGJGpks8oX6j6rQLgIRO
OQOFcp7XEdDh1+jh82gHVuPrw2/6lchLtW8ARtzdiQKFr5DRjrsbtua6GRc8iBqA
DIRCBFoV2HuMkF39Vr09HMa9AZAT7QR2RLsRGpSq8E8Z8xxKz0X7oujs10PFpMTE
IVKhTrVrk+NDot/IU2hzVpnpup+0ld+T2/ZaBklJGcU8uDffImsqNepHRyCG5UC3
xz74Ju23MAj24Gct+og0yFUooF+lUltKyVm0FYCDCY3bASTwgY01NR3kEH/0NQvM
cywLzd/ngHm/SMD2ggVAHkjZUieiIVHdaZ53dgjDeBOQoVP6p0dgUK7EumXX8Mx4
8ReR2UiGoYRPaq9c4o+IjG4K027MwVK2p+mF1a6MLa+20XcyMbev8FIRbbHtC/V4
z5/FsOAxcuICWkA1hU9bODrrGzIqemmdRgKG8sGuTJCt/kYGAn72/TCATGNSaCJ0
00n2jN1aepa7wtywHJ5MhVzxN9iQX7+geUHXz0BI+lK4e1Pmk+vjGksymb9ai2fk
eQAUV9ekub6q68/J16scD7XeOUM37bTLiMBQeIF8UtZBOJscKiS71zn9QP9Twwxv
P2pm01RDZUI+z5ZX3hc12Pm1vjRHaAh9S1JpAw/pTOVlQ+mAJEM=
=XY0S
-----END PGP SIGNATURE-----
Merge tag 'net-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core & protocols:
- Continue Netlink conversions to per-namespace RTNL lock
(IPv4 routing, routing rules, routing next hops, ARP ioctls)
- Continue extending the use of netdev instance locks. As a driver
opt-in protect queue operations and (in due course) ethtool
operations with the instance lock and not RTNL lock.
- Support collecting TCP timestamps (data submitted, sent, acked) in
BPF, allowing for transparent (to the application) and lower
overhead tracking of TCP RPC performance.
- Tweak existing networking Rx zero-copy infra to support zero-copy
Rx via io_uring.
- Optimize MPTCP performance in single subflow mode by 29%.
- Enable GRO on packets which went thru XDP CPU redirect (were queued
for processing on a different CPU). Improving TCP stream
performance up to 2x.
- Improve performance of contended connect() by 200% by searching for
an available 4-tuple under RCU rather than a spin lock. Bring an
additional 229% improvement by tweaking hash distribution.
- Avoid unconditionally touching sk_tsflags on RX, improving
performance under UDP flood by as much as 10%.
- Avoid skb_clone() dance in ping_rcv() to improve performance under
ping flood.
- Avoid FIB lookup in netfilter if socket is available, 20% perf win.
- Rework network device creation (in-kernel) API to more clearly
identify network namespaces and their roles. There are up to 4
namespace roles but we used to have just 2 netns pointer arguments,
interpreted differently based on context.
- Use sysfs_break_active_protection() instead of trylock to avoid
deadlocks between unregistering objects and sysfs access.
- Add a new sysctl and sockopt for capping max retransmit timeout in
TCP.
- Support masking port and DSCP in routing rule matches.
- Support dumping IPv4 multicast addresses with RTM_GETMULTICAST.
- Support specifying at what time packet should be sent on AF_XDP
sockets.
- Expose TCP ULP diagnostic info (for TLS and MPTCP) to non-admin
users.
- Add Netlink YAML spec for WiFi (nl80211) and conntrack.
- Introduce EXPORT_IPV6_MOD() and EXPORT_IPV6_MOD_GPL() for symbols
which only need to be exported when IPv6 support is built as a
module.
- Age FDB entries based on Rx not Tx traffic in VxLAN, similar to
normal bridging.
- Allow users to specify source port range for GENEVE tunnels.
- netconsole: allow attaching kernel release, CPU ID and task name to
messages as metadata
Driver API:
- Continue rework / fixing of Energy Efficient Ethernet (EEE) across
the SW layers. Delegate the responsibilities to phylink where
possible. Improve its handling in phylib.
- Support symmetric OR-XOR RSS hashing algorithm.
- Support tracking and preserving IRQ affinity by NAPI itself.
- Support loopback mode speed selection for interface selftests.
Device drivers:
- Remove the IBM LCS driver for s390
- Remove the sb1000 cable modem driver
- Add support for SFP module access over SMBus
- Add MCTP transport driver for MCTP-over-USB
- Enable XDP metadata support in multiple drivers
- Ethernet high-speed NICs:
- Broadcom (bnxt):
- add PCIe TLP Processing Hints (TPH) support for new AMD
platforms
- support dumping RoCE queue state for debug
- opt into instance locking
- Intel (100G, ice, idpf):
- ice: rework MSI-X IRQ management and distribution
- ice: support for E830 devices
- iavf: add support for Rx timestamping
- iavf: opt into instance locking
- nVidia/Mellanox:
- mlx4: use page pool memory allocator for Rx
- mlx5: support for one PTP device per hardware clock
- mlx5: support for 200Gbps per-lane link modes
- mlx5: move IPSec policy check after decryption
- AMD/Solarflare:
- support FW flashing via devlink
- Cisco (enic):
- use page pool memory allocator for Rx
- enable 32, 64 byte CQEs
- get max rx/tx ring size from the device
- Meta (fbnic):
- support flow steering and RSS configuration
- report queue stats
- support TCP segmentation
- support IRQ coalescing
- support ring size configuration
- Marvell/Cavium:
- support AF_XDP
- Wangxun:
- support for PTP clock and timestamping
- Huawei (hibmcge):
- checksum offload
- add more statistics
- Ethernet virtual:
- VirtIO net:
- aggressively suppress Tx completions, improve perf by 96%
with 1 CPU and 55% with 2 CPUs
- expose NAPI to IRQ mapping and persist NAPI settings
- Google (gve):
- support XDP in DQO RDA Queue Format
- opt into instance locking
- Microsoft vNIC:
- support BIG TCP
- Ethernet NICs consumer, and embedded:
- Synopsys (stmmac):
- cleanup Tx and Tx clock setting and other link-focused
cleanups
- enable SGMII and 2500BASEX mode switching for Intel platforms
- support Sophgo SG2044
- Broadcom switches (b53):
- support for BCM53101
- TI:
- iep: add perout configuration support
- icssg: support XDP
- Cadence (macb):
- implement BQL
- Xilinx (axinet):
- support dynamic IRQ moderation and changing coalescing at
runtime
- implement BQL
- report standard stats
- MediaTek:
- support phylink managed EEE
- Intel:
- igc: don't restart the interface on every XDP program change
- RealTek (r8169):
- support reading registers of internal PHYs directly
- increase max jumbo packet size on RTL8125/RTL8126
- Airoha:
- support for RISC-V NPU packet processing unit
- enable scatter-gather and support MTU up to 9kB
- Tehuti (tn40xx):
- support cards with TN4010 MAC and an Aquantia AQR105 PHY
- Ethernet PHYs:
- support for TJA1102S, TJA1121
- dp83tg720: add randomized polling intervals for link detection
- dp83822: support changing the transmit amplitude voltage
- support for LEDs on 88q2xxx
- CAN:
- canxl: support Remote Request Substitution bit access
- flexcan: add S32G2/S32G3 SoC
- WiFi:
- remove cooked monitor support
- strict mode for better AP testing
- basic EPCS support
- OMI RX bandwidth reduction support
- batman-adv: add support for jumbo frames
- WiFi drivers:
- RealTek (rtw88):
- support RTL8814AE and RTL8814AU
- RealTek (rtw89):
- switch using wiphy_lock and wiphy_work
- add BB context to manipulate two PHY as preparation of MLO
- improve BT-coexistence mechanism to play A2DP smoothly
- Intel (iwlwifi):
- add new iwlmld sub-driver for latest HW/FW combinations
- MediaTek (mt76):
- preparation for mt7996 Multi-Link Operation (MLO) support
- Qualcomm/Atheros (ath12k):
- continued work on MLO
- Silabs (wfx):
- Wake-on-WLAN support
- Bluetooth:
- add support for skb TX SND/COMPLETION timestamping
- hci_core: enable buffer flow control for SCO/eSCO
- coredump: log devcd dumps into the monitor
- Bluetooth drivers:
- intel: add support to configure TX power
- nxp: handle bootloader error during cmd5 and cmd7"
* tag 'net-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1681 commits)
unix: fix up for "apparmor: add fine grained af_unix mediation"
mctp: Fix incorrect tx flow invalidation condition in mctp-i2c
net: usb: asix: ax88772: Increase phy_name size
net: phy: Introduce PHY_ID_SIZE — minimum size for PHY ID string
net: libwx: fix Tx L4 checksum
net: libwx: fix Tx descriptor content for some tunnel packets
atm: Fix NULL pointer dereference
net: tn40xx: add pci-id of the aqr105-based Tehuti TN4010 cards
net: tn40xx: prepare tn40xx driver to find phy of the TN9510 card
net: tn40xx: create swnode for mdio and aqr105 phy and add to mdiobus
net: phy: aquantia: add essential functions to aqr105 driver
net: phy: aquantia: search for firmware-name in fwnode
net: phy: aquantia: add probe function to aqr105 for firmware loading
net: phy: Add swnode support to mdiobus_scan
gve: add XDP DROP and PASS support for DQ
gve: update XDP allocation path support RX buffer posting
gve: merge packet buffer size fields
gve: update GQ RX to use buf_size
gve: introduce config-based allocation for XDP
gve: remove xdp_xsk_done and xdp_xsk_wakeup statistics
...
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmfe8DcQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgprzNEADPOOfi05F0f4KOMLYOiRlB+D9yGtnbcu8k
foPATU++L5fFDUpjAGobi38AXqHtF/Bx0CjCkPCvy4GjygcszYDY81FQmRlSFdeH
CnYV/bm5myyupc8xANcWGJYpxf/ZAw7ugOAgcR3Xxk/qXdD5hYJ1jG4477o8v4IJ
jyzE4vQCnWBogIoqpWTQyY3FSP5fdjeQlYdFGIFCx68rFTO9zEpuHhFr2OMfJtSu
y1dLkeqbu1BZ0fhgRog9k9ZWk0uRQJI7d7EUXf9iwXOfkUyzCaV+nHvNGRdhJ/XO
zj/qePw4lFyCgc+kuIejjIaPf/g84paCzshz/PIxOZTsvo6X78zj72RPM0Makhga
amgtEqegWHtKSya1zew57oDwqwaIe6b+56PtNSP7K8IrwcJLheszoPKJIwiiynIL
ZqvWseqYQXs58S9qZzSZlyEpwLVFcvyNt7Z/q7b56vhvXHabxOccPkUHt+UscVoQ
qpG0R9+zmIiDfzY6Vfu7JyH2Uspdq3WjGYiaoVUko1rJXY+HFPd3be9pIbckyaA8
ZQHIxB+h8IlxhpQ8HJKQpZhtYiluxEu6wDdvtpwC0KiGp5g4Q5eg9SkVyvETIEN8
yW2tHqMAJJtJjDAPwJOV+GIyWQMIexzLmMP7Fq8R5VIoAox+Xn3flgNNEnMZdl/r
kTc/hRjHvQ==
=VEQr
-----END PGP SIGNATURE-----
Merge tag 'for-6.15/io_uring-20250322' of git://git.kernel.dk/linux
Pull io_uring updates from Jens Axboe:
"This is the first of the io_uring pull requests for the 6.15 merge
window, there will be others once the net tree has gone in. This
contains:
- Cleanup and unification of cancelation handling across various
request types.
- Improvement for bundles, supporting them both for incrementally
consumed buffers, and for non-multishot requests.
- Enable toggling of using iowait while waiting on io_uring events or
not. Unfortunately this is still tied with CPU frequency boosting
on short waits, as the scheduler side has not been very receptive
to splitting the (useless) iowait stat from the cpufreq implied
boost.
- Add support for kbuf nodes, enabling zero-copy support for the ublk
block driver.
- Various cleanups for resource node handling.
- Series greatly cleaning up the legacy provided (non-ring based)
buffers. For years, we've been pushing the ring provided buffers as
the way to go, and that is what people have been using. Reduce the
complexity and code associated with legacy provided buffers.
- Series cleaning up the compat handling.
- Series improving and cleaning up the recvmsg/sendmsg iovec and msg
handling.
- Series of cleanups for io-wq.
- Start adding a bunch of selftests. The liburing repository
generally carries feature and regression tests for everything, but
at least for ublk initially, we'll try and go the route of having
it in selftests as well. We'll see how this goes, might decide to
migrate more tests this way in the future.
- Various little cleanups and fixes"
* tag 'for-6.15/io_uring-20250322' of git://git.kernel.dk/linux: (108 commits)
selftests: ublk: add stripe target
selftests: ublk: simplify loop io completion
selftests: ublk: enable zero copy for null target
selftests: ublk: prepare for supporting stripe target
selftests: ublk: move common code into common.c
selftests: ublk: increase max buffer size to 1MB
selftests: ublk: add single sqe allocator helper
selftests: ublk: add generic_01 for verifying sequential IO order
selftests: ublk: fix starting ublk device
io_uring: enable toggle of iowait usage when waiting on CQEs
selftests: ublk: fix write cache implementation
selftests: ublk: add variable for user to not show test result
selftests: ublk: don't show `modprobe` failure
selftests: ublk: add one dependency header
io_uring/kbuf: enable bundles for incrementally consumed buffers
Revert "io_uring/rsrc: simplify the bvec iter count calculation"
selftests: ublk: improve test usability
selftests: ublk: add stress test for covering IO vs. killing ublk server
selftests: ublk: add one stress test for covering IO vs. removing device
selftests: ublk: load/unload ublk_drv when preparing & cleaning up tests
...
Add LANDLOCK_RESTRICT_SELF_LOG_SUBDOMAINS_OFF for the case of sandboxer
tools, init systems, or runtime containers launching programs sandboxing
themselves in an inconsistent way. Setting this flag should only
depends on runtime configuration (i.e. not hardcoded).
We don't create a new ruleset's option because this should not be part
of the security policy: only the task that enforces the policy (not the
one that create it) knows if itself or its children may request denied
actions.
This is the first and only flag that can be set without actually
restricting the caller (i.e. without providing a ruleset).
Extend struct landlock_cred_security with a u8 log_subdomains_off.
struct landlock_file_security is still 16 bytes.
Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Closes: https://github.com/landlock-lsm/linux/issues/3
Link: https://lore.kernel.org/r/20250320190717.2287696-19-mic@digikod.net
[mic: Fix comment]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Most of the time we want to log denied access because they should not
happen and such information helps diagnose issues. However, when
sandboxing processes that we know will try to access denied resources
(e.g. unknown, bogus, or malicious binary), we might want to not log
related access requests that might fill up logs.
By default, denied requests are logged until the task call execve(2).
If the LANDLOCK_RESTRICT_SELF_LOG_SAME_EXEC_OFF flag is set, denied
requests will not be logged for the same executed file.
If the LANDLOCK_RESTRICT_SELF_LOG_NEW_EXEC_ON flag is set, denied
requests from after an execve(2) call will be logged.
The rationale is that a program should know its own behavior, but not
necessarily the behavior of other programs.
Because LANDLOCK_RESTRICT_SELF_LOG_SAME_EXEC_OFF is set for a specific
Landlock domain, it makes it possible to selectively mask some access
requests that would be logged by a parent domain, which might be handy
for unprivileged processes to limit logs. However, system
administrators should still use the audit filtering mechanism. There is
intentionally no audit nor sysctl configuration to re-enable these logs.
This is delegated to the user space program.
Increment the Landlock ABI version to reflect this interface change.
Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/20250320190717.2287696-18-mic@digikod.net
[mic: Rename variables and fix __maybe_unused]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Asynchronously log domain information when it first denies an access.
This minimize the amount of generated logs, which makes it possible to
always log denials for the current execution since they should not
happen. These records are identified with the new AUDIT_LANDLOCK_DOMAIN
type.
The AUDIT_LANDLOCK_DOMAIN message contains:
- the "domain" ID which is described;
- the "status" which can either be "allocated" or "deallocated";
- the "mode" which is for now only "enforcing";
- for the "allocated" status, a minimal set of properties to easily
identify the task that loaded the domain's policy with
landlock_restrict_self(2): "pid", "uid", executable path ("exe"), and
command line ("comm");
- for the "deallocated" state, the number of "denials" accounted to this
domain, which is at least 1.
This requires each domain to save these task properties at creation
time in the new struct landlock_details. A reference to the PID is kept
for the lifetime of the domain to avoid race conditions when
investigating the related task. The executable path is resolved and
stored to not keep a reference to the filesystem and block related
actions. All these metadata are stored for the lifetime of the related
domain and should then be minimal. The required memory is not accounted
to the task calling landlock_restrict_self(2) contrary to most other
Landlock allocations (see related comment).
The AUDIT_LANDLOCK_DOMAIN record follows the first AUDIT_LANDLOCK_ACCESS
record for the same domain, which is always followed by AUDIT_SYSCALL
and AUDIT_PROCTITLE. This is in line with the audit logic to first
record the cause of an event, and then add context with other types of
record.
Audit event sample for a first denial:
type=LANDLOCK_ACCESS msg=audit(1732186800.349:44): domain=195ba459b blockers=ptrace opid=1 ocomm="systemd"
type=LANDLOCK_DOMAIN msg=audit(1732186800.349:44): domain=195ba459b status=allocated mode=enforcing pid=300 uid=0 exe="/root/sandboxer" comm="sandboxer"
type=SYSCALL msg=audit(1732186800.349:44): arch=c000003e syscall=101 success=no [...] pid=300 auid=0
Audit event sample for a following denial:
type=LANDLOCK_ACCESS msg=audit(1732186800.372:45): domain=195ba459b blockers=ptrace opid=1 ocomm="systemd"
type=SYSCALL msg=audit(1732186800.372:45): arch=c000003e syscall=101 success=no [...] pid=300 auid=0
Log domain deletion with the "deallocated" state when a domain was
previously logged. This makes it possible for log parsers to free
potential resources when a domain ID will never show again.
The number of denied access requests is useful to easily check how many
access requests a domain blocked and potentially if some of them are
missing in logs because of audit rate limiting, audit rules, or Landlock
log configuration flags (see following commit).
Audit event sample for a deletion of a domain that denied something:
type=LANDLOCK_DOMAIN msg=audit(1732186800.393:46): domain=195ba459b status=deallocated denials=2
Cc: Günther Noack <gnoack@google.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/20250320190717.2287696-11-mic@digikod.net
[mic: Update comment and GFP flag for landlock_log_drop_domain()]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Add a new AUDIT_LANDLOCK_ACCESS record type dedicated to an access
request denied by a Landlock domain. AUDIT_LANDLOCK_ACCESS indicates
that something unexpected happened.
For now, only denied access are logged, which means that any
AUDIT_LANDLOCK_ACCESS record is always followed by a SYSCALL record with
"success=no". However, log parsers should check this syscall property
because this is the only sign that a request was denied. Indeed, we
could have "success=yes" if Landlock would support a "permissive" mode.
We could also add a new field to AUDIT_LANDLOCK_DOMAIN for this mode
(see following commit).
By default, the only logged access requests are those coming from the
same executed program that enforced the Landlock restriction on itself.
In other words, no audit record are created for a task after it called
execve(2). This is required to avoid log spam because programs may only
be aware of their own restrictions, but not the inherited ones.
Following commits will allow to conditionally generate
AUDIT_LANDLOCK_ACCESS records according to dedicated
landlock_restrict_self(2)'s flags.
The AUDIT_LANDLOCK_ACCESS message contains:
- the "domain" ID restricting the action on an object,
- the "blockers" that are missing to allow the requested access,
- a set of fields identifying the related object (e.g. task identified
with "opid" and "ocomm").
The blockers are implicit restrictions (e.g. ptrace), or explicit access
rights (e.g. filesystem), or explicit scopes (e.g. signal). This field
contains a list of at least one element, each separated with a comma.
The initial blocker is "ptrace", which describe all implicit Landlock
restrictions related to ptrace (e.g. deny tracing of tasks outside a
sandbox).
Add audit support to ptrace_access_check and ptrace_traceme hooks. For
the ptrace_access_check case, we log the current/parent domain and the
child task. For the ptrace_traceme case, we log the parent domain and
the current/child task. Indeed, the requester and the target are the
current task, but the action would be performed by the parent task.
Audit event sample:
type=LANDLOCK_ACCESS msg=audit(1729738800.349:44): domain=195ba459b blockers=ptrace opid=1 ocomm="systemd"
type=SYSCALL msg=audit(1729738800.349:44): arch=c000003e syscall=101 success=no [...] pid=300 auid=0
A following commit adds user documentation.
Add KUnit tests to check reading of domain ID relative to layer level.
The quick return for non-landlocked tasks is moved from task_ptrace() to
each LSM hooks.
It is not useful to inline the audit_enabled check because other
computation are performed by landlock_log_denial().
Use scoped guards for RCU read-side critical sections.
Cc: Günther Noack <gnoack@google.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/20250320190717.2287696-10-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
For PCITEST_MSI we really want to set PCITEST_SET_IRQTYPE explicitly
to PCITEST_IRQ_TYPE_MSI, since we want to test if MSI works.
For PCITEST_MSIX we really want to set PCITEST_SET_IRQTYPE explicitly
to PCITEST_IRQ_TYPE_MSIX, since we want to test if MSI works.
For PCITEST_LEGACY_IRQ we really want to set PCITEST_SET_IRQTYPE
explicitly to PCITEST_IRQ_TYPE_INTX, since we want to test if INTx
works.
However, for PCITEST_WRITE, PCITEST_READ, PCITEST_COPY, we really don't
care which IRQ type that is used, we just want to use a IRQ type that is
supported by the EPC.
The old behavior was to always use MSI for PCITEST_WRITE, PCITEST_READ,
PCITEST_COPY, was to always set IRQ type to MSI before doing the actual
test, however, there are EPC drivers that do not support MSI.
Add a new PCITEST_IRQ_TYPE_AUTO, that will use the CAPS register to see
which IRQ types the endpoint supports, and use one of the supported IRQ
types.
For backwards compatibility, if the endpoint does not expose any supported
IRQ type in the CAPS register, simply fallback to using MSI, as it was
unconditionally done before.
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Link: https://lore.kernel.org/r/20250310111016.859445-16-cassel@kernel.org
-----BEGIN PGP SIGNATURE-----
iQFHBAABCgAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmfhlLATHHdlaS5saXVA
a2VybmVsLm9yZwAKCRB2FHBfkEGgXgchCADOz33rSm4G4w4r0qT05dTDi/lZkEdK
64dQq322XXP/C9FfR66d30243gsAmuM5a0SvzFHLXAOu6yqM270Xehd/Rud+Um2s
lSVnc0Ux0AWBgksqFd0t577aN7zmJEukosEYO5lBNop+zOcadrm3S6Th/AoL2h/D
yphPkhH13bsCK+Wll/eBOQLIhC9iA0konYbBLuEQ5MqvUbrzc6Rmb5gxsHHZKOqg
vLjkrYR/d3s2gIpKxiFp0RwvzGyffZEHxvU/YF3hTenPMlTlnXWbyspBSTVmWggP
13IFLzqxDdW9RgUnGB4xRc424AC1LKqEr42QPQE7zGvl2jdJriA2Q1LT
=BXqj
-----END PGP SIGNATURE-----
Merge tag 'hyperv-next-signed-20250324' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv updates from Wei Liu:
- Add support for running as the root partition in Hyper-V (Microsoft
Hypervisor) by exposing /dev/mshv (Nuno and various people)
- Add support for CPU offlining in Hyper-V (Hamza Mahfooz)
- Misc fixes and cleanups (Roman Kisel, Tianyu Lan, Wei Liu, Michael
Kelley, Thorsten Blum)
* tag 'hyperv-next-signed-20250324' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (24 commits)
x86/hyperv: fix an indentation issue in mshyperv.h
x86/hyperv: Add comments about hv_vpset and var size hypercall input args
Drivers: hv: Introduce mshv_root module to expose /dev/mshv to VMMs
hyperv: Add definitions for root partition driver to hv headers
x86: hyperv: Add mshv_handler() irq handler and setup function
Drivers: hv: Introduce per-cpu event ring tail
Drivers: hv: Export some functions for use by root partition module
acpi: numa: Export node_to_pxm()
hyperv: Introduce hv_recommend_using_aeoi()
arm64/hyperv: Add some missing functions to arm64
x86/mshyperv: Add support for extended Hyper-V features
hyperv: Log hypercall status codes as strings
x86/hyperv: Fix check of return value from snp_set_vmsa()
x86/hyperv: Add VTL mode callback for restarting the system
x86/hyperv: Add VTL mode emergency restart callback
hyperv: Remove unused union and structs
hyperv: Add CONFIG_MSHV_ROOT to gate root partition support
hyperv: Change hv_root_partition into a function
hyperv: Convert hypercall statuses to linux error codes
drivers/hv: add CPU offlining support
...
* Nested virtualization support for VGICv3, giving the nested
hypervisor control of the VGIC hardware when running an L2 VM
* Removal of 'late' nested virtualization feature register masking,
making the supported feature set directly visible to userspace
* Support for emulating FEAT_PMUv3 on Apple silicon, taking advantage
of an IMPLEMENTATION DEFINED trap that covers all PMUv3 registers
* Paravirtual interface for discovering the set of CPU implementations
where a VM may run, addressing a longstanding issue of guest CPU
errata awareness in big-little systems and cross-implementation VM
migration
* Userspace control of the registers responsible for identifying a
particular CPU implementation (MIDR_EL1, REVIDR_EL1, AIDR_EL1),
allowing VMs to be migrated cross-implementation
* pKVM updates, including support for tracking stage-2 page table
allocations in the protected hypervisor in the 'SecPageTable' stat
* Fixes to vPMU, ensuring that userspace updates to the vPMU after
KVM_RUN are reflected into the backing perf events
LoongArch:
* Remove unnecessary header include path
* Assume constant PGD during VM context switch
* Add perf events support for guest VM
RISC-V:
* Disable the kernel perf counter during configure
* KVM selftests improvements for PMU
* Fix warning at the time of KVM module removal
x86:
* Add support for aging of SPTEs without holding mmu_lock. Not taking mmu_lock
allows multiple aging actions to run in parallel, and more importantly avoids
stalling vCPUs. This includes an implementation of per-rmap-entry locking;
aging the gfn is done with only a per-rmap single-bin spinlock taken, whereas
locking an rmap for write requires taking both the per-rmap spinlock and
the mmu_lock.
Note that this decreases slightly the accuracy of accessed-page information,
because changes to the SPTE outside aging might not use atomic operations
even if they could race against a clear of the Accessed bit. This is
deliberate because KVM and mm/ tolerate false positives/negatives for
accessed information, and testing has shown that reducing the latency of
aging is far more beneficial to overall system performance than providing
"perfect" young/old information.
* Defer runtime CPUID updates until KVM emulates a CPUID instruction, to
coalesce updates when multiple pieces of vCPU state are changing, e.g. as
part of a nested transition.
* Fix a variety of nested emulation bugs, and add VMX support for synthesizing
nested VM-Exit on interception (instead of injecting #UD into L2).
* Drop "support" for async page faults for protected guests that do not set
SEND_ALWAYS (i.e. that only want async page faults at CPL3)
* Bring a bit of sanity to x86's VM teardown code, which has accumulated
a lot of cruft over the years. Particularly, destroy vCPUs before
the MMU, despite the latter being a VM-wide operation.
* Add common secure TSC infrastructure for use within SNP and in the
future TDX
* Block KVM_CAP_SYNC_REGS if guest state is protected. It does not make
sense to use the capability if the relevant registers are not
available for reading or writing.
* Don't take kvm->lock when iterating over vCPUs in the suspend notifier to
fix a largely theoretical deadlock.
* Use the vCPU's actual Xen PV clock information when starting the Xen timer,
as the cached state in arch.hv_clock can be stale/bogus.
* Fix a bug where KVM could bleed PVCLOCK_GUEST_STOPPED across different
PV clocks; restrict PVCLOCK_GUEST_STOPPED to kvmclock, as KVM's suspend
notifier only accounts for kvmclock, and there's no evidence that the
flag is actually supported by Xen guests.
* Clean up the per-vCPU "cache" of its reference pvclock, and instead only
track the vCPU's TSC scaling (multipler+shift) metadata (which is moderately
expensive to compute, and rarely changes for modern setups).
* Don't write to the Xen hypercall page on MSR writes that are initiated by
the host (userspace or KVM) to fix a class of bugs where KVM can write to
guest memory at unexpected times, e.g. during vCPU creation if userspace has
set the Xen hypercall MSR index to collide with an MSR that KVM emulates.
* Restrict the Xen hypercall MSR index to the unofficial synthetic range to
reduce the set of possible collisions with MSRs that are emulated by KVM
(collisions can still happen as KVM emulates Hyper-V MSRs, which also reside
in the synthetic range).
* Clean up and optimize KVM's handling of Xen MSR writes and xen_hvm_config.
* Update Xen TSC leaves during CPUID emulation instead of modifying the CPUID
entries when updating PV clocks; there is no guarantee PV clocks will be
updated between TSC frequency changes and CPUID emulation, and guest reads
of the TSC leaves should be rare, i.e. are not a hot path.
x86 (Intel):
* Fix a bug where KVM unnecessarily reads XFD_ERR from hardware and thus
modifies the vCPU's XFD_ERR on a #NM due to CR0.TS=1.
* Pass XFD_ERR as the payload when injecting #NM, as a preparatory step
for upcoming FRED virtualization support.
* Decouple the EPT entry RWX protection bit macros from the EPT Violation
bits, both as a general cleanup and in anticipation of adding support for
emulating Mode-Based Execution Control (MBEC).
* Reject KVM_RUN if userspace manages to gain control and stuff invalid guest
state while KVM is in the middle of emulating nested VM-Enter.
* Add a macro to handle KVM's sanity checks on entry/exit VMCS control pairs
in anticipation of adding sanity checks for secondary exit controls (the
primary field is out of bits).
x86 (AMD):
* Ensure the PSP driver is initialized when both the PSP and KVM modules are
built-in (the initcall framework doesn't handle dependencies).
* Use long-term pins when registering encrypted memory regions, so that the
pages are migrated out of MIGRATE_CMA/ZONE_MOVABLE and don't lead to
excessive fragmentation.
* Add macros and helpers for setting GHCB return/error codes.
* Add support for Idle HLT interception, which elides interception if the vCPU
has a pending, unmasked virtual IRQ when HLT is executed.
* Fix a bug in INVPCID emulation where KVM fails to check for a non-canonical
address.
* Don't attempt VMRUN for SEV-ES+ guests if the vCPU's VMSA is invalid, e.g.
because the vCPU was "destroyed" via SNP's AP Creation hypercall.
* Reject SNP AP Creation if the requested SEV features for the vCPU don't
match the VM's configured set of features.
Selftests:
* Fix again the Intel PMU counters test; add a data load and do CLFLUSH{OPT} on the data
instead of executing code. The theory is that modern Intel CPUs have
learned new code prefetching tricks that bypass the PMU counters.
* Fix a flaw in the Intel PMU counters test where it asserts that an event is
counting correctly without actually knowing what the event counts on the
underlying hardware.
* Fix a variety of flaws, bugs, and false failures/passes dirty_log_test, and
improve its coverage by collecting all dirty entries on each iteration.
* Fix a few minor bugs related to handling of stats FDs.
* Add infrastructure to make vCPU and VM stats FDs available to tests by
default (open the FDs during VM/vCPU creation).
* Relax an assertion on the number of HLT exits in the xAPIC IPI test when
running on a CPU that supports AMD's Idle HLT (which elides interception of
HLT if a virtual IRQ is pending and unmasked).
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmfcTkEUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroMnQAf/cPx72hJOdNy4Qrm8M33YLXVRVV00
yEZ8eN8TWdOclr0ltE/w/ELGh/qS4CU8pjURAk0A6lPioU+mdcTn3dPEqMDMVYom
uOQ2lusEHw0UuSnGZSEjvZJsE/Ro2NSAsHIB6PWRqig1ZBPJzyu0frce34pMpeQH
diwriJL9lKPAhBWXnUQ9BKoi1R0P5OLW9ahX4SOWk7cAFg4DLlDE66Nqf6nKqViw
DwEucTiUEg5+a3d93gihdD4JNl+fb3vI2erxrMxjFjkacl0qgqRu3ei3DG0MfdHU
wNcFSG5B1n0OECKxr80lr1Ip1KTVNNij0Ks+w6Gc6lSg9c4PptnNkfLK3A==
=nnCN
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"ARM:
- Nested virtualization support for VGICv3, giving the nested
hypervisor control of the VGIC hardware when running an L2 VM
- Removal of 'late' nested virtualization feature register masking,
making the supported feature set directly visible to userspace
- Support for emulating FEAT_PMUv3 on Apple silicon, taking advantage
of an IMPLEMENTATION DEFINED trap that covers all PMUv3 registers
- Paravirtual interface for discovering the set of CPU
implementations where a VM may run, addressing a longstanding issue
of guest CPU errata awareness in big-little systems and
cross-implementation VM migration
- Userspace control of the registers responsible for identifying a
particular CPU implementation (MIDR_EL1, REVIDR_EL1, AIDR_EL1),
allowing VMs to be migrated cross-implementation
- pKVM updates, including support for tracking stage-2 page table
allocations in the protected hypervisor in the 'SecPageTable' stat
- Fixes to vPMU, ensuring that userspace updates to the vPMU after
KVM_RUN are reflected into the backing perf events
LoongArch:
- Remove unnecessary header include path
- Assume constant PGD during VM context switch
- Add perf events support for guest VM
RISC-V:
- Disable the kernel perf counter during configure
- KVM selftests improvements for PMU
- Fix warning at the time of KVM module removal
x86:
- Add support for aging of SPTEs without holding mmu_lock.
Not taking mmu_lock allows multiple aging actions to run in
parallel, and more importantly avoids stalling vCPUs. This includes
an implementation of per-rmap-entry locking; aging the gfn is done
with only a per-rmap single-bin spinlock taken, whereas locking an
rmap for write requires taking both the per-rmap spinlock and the
mmu_lock.
Note that this decreases slightly the accuracy of accessed-page
information, because changes to the SPTE outside aging might not
use atomic operations even if they could race against a clear of
the Accessed bit.
This is deliberate because KVM and mm/ tolerate false
positives/negatives for accessed information, and testing has shown
that reducing the latency of aging is far more beneficial to
overall system performance than providing "perfect" young/old
information.
- Defer runtime CPUID updates until KVM emulates a CPUID instruction,
to coalesce updates when multiple pieces of vCPU state are
changing, e.g. as part of a nested transition
- Fix a variety of nested emulation bugs, and add VMX support for
synthesizing nested VM-Exit on interception (instead of injecting
#UD into L2)
- Drop "support" for async page faults for protected guests that do
not set SEND_ALWAYS (i.e. that only want async page faults at CPL3)
- Bring a bit of sanity to x86's VM teardown code, which has
accumulated a lot of cruft over the years. Particularly, destroy
vCPUs before the MMU, despite the latter being a VM-wide operation
- Add common secure TSC infrastructure for use within SNP and in the
future TDX
- Block KVM_CAP_SYNC_REGS if guest state is protected. It does not
make sense to use the capability if the relevant registers are not
available for reading or writing
- Don't take kvm->lock when iterating over vCPUs in the suspend
notifier to fix a largely theoretical deadlock
- Use the vCPU's actual Xen PV clock information when starting the
Xen timer, as the cached state in arch.hv_clock can be stale/bogus
- Fix a bug where KVM could bleed PVCLOCK_GUEST_STOPPED across
different PV clocks; restrict PVCLOCK_GUEST_STOPPED to kvmclock, as
KVM's suspend notifier only accounts for kvmclock, and there's no
evidence that the flag is actually supported by Xen guests
- Clean up the per-vCPU "cache" of its reference pvclock, and instead
only track the vCPU's TSC scaling (multipler+shift) metadata (which
is moderately expensive to compute, and rarely changes for modern
setups)
- Don't write to the Xen hypercall page on MSR writes that are
initiated by the host (userspace or KVM) to fix a class of bugs
where KVM can write to guest memory at unexpected times, e.g.
during vCPU creation if userspace has set the Xen hypercall MSR
index to collide with an MSR that KVM emulates
- Restrict the Xen hypercall MSR index to the unofficial synthetic
range to reduce the set of possible collisions with MSRs that are
emulated by KVM (collisions can still happen as KVM emulates
Hyper-V MSRs, which also reside in the synthetic range)
- Clean up and optimize KVM's handling of Xen MSR writes and
xen_hvm_config
- Update Xen TSC leaves during CPUID emulation instead of modifying
the CPUID entries when updating PV clocks; there is no guarantee PV
clocks will be updated between TSC frequency changes and CPUID
emulation, and guest reads of the TSC leaves should be rare, i.e.
are not a hot path
x86 (Intel):
- Fix a bug where KVM unnecessarily reads XFD_ERR from hardware and
thus modifies the vCPU's XFD_ERR on a #NM due to CR0.TS=1
- Pass XFD_ERR as the payload when injecting #NM, as a preparatory
step for upcoming FRED virtualization support
- Decouple the EPT entry RWX protection bit macros from the EPT
Violation bits, both as a general cleanup and in anticipation of
adding support for emulating Mode-Based Execution Control (MBEC)
- Reject KVM_RUN if userspace manages to gain control and stuff
invalid guest state while KVM is in the middle of emulating nested
VM-Enter
- Add a macro to handle KVM's sanity checks on entry/exit VMCS
control pairs in anticipation of adding sanity checks for secondary
exit controls (the primary field is out of bits)
x86 (AMD):
- Ensure the PSP driver is initialized when both the PSP and KVM
modules are built-in (the initcall framework doesn't handle
dependencies)
- Use long-term pins when registering encrypted memory regions, so
that the pages are migrated out of MIGRATE_CMA/ZONE_MOVABLE and
don't lead to excessive fragmentation
- Add macros and helpers for setting GHCB return/error codes
- Add support for Idle HLT interception, which elides interception if
the vCPU has a pending, unmasked virtual IRQ when HLT is executed
- Fix a bug in INVPCID emulation where KVM fails to check for a
non-canonical address
- Don't attempt VMRUN for SEV-ES+ guests if the vCPU's VMSA is
invalid, e.g. because the vCPU was "destroyed" via SNP's AP
Creation hypercall
- Reject SNP AP Creation if the requested SEV features for the vCPU
don't match the VM's configured set of features
Selftests:
- Fix again the Intel PMU counters test; add a data load and do
CLFLUSH{OPT} on the data instead of executing code. The theory is
that modern Intel CPUs have learned new code prefetching tricks
that bypass the PMU counters
- Fix a flaw in the Intel PMU counters test where it asserts that an
event is counting correctly without actually knowing what the event
counts on the underlying hardware
- Fix a variety of flaws, bugs, and false failures/passes
dirty_log_test, and improve its coverage by collecting all dirty
entries on each iteration
- Fix a few minor bugs related to handling of stats FDs
- Add infrastructure to make vCPU and VM stats FDs available to tests
by default (open the FDs during VM/vCPU creation)
- Relax an assertion on the number of HLT exits in the xAPIC IPI test
when running on a CPU that supports AMD's Idle HLT (which elides
interception of HLT if a virtual IRQ is pending and unmasked)"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (216 commits)
RISC-V: KVM: Optimize comments in kvm_riscv_vcpu_isa_disable_allowed
RISC-V: KVM: Teardown riscv specific bits after kvm_exit
LoongArch: KVM: Register perf callbacks for guest
LoongArch: KVM: Implement arch-specific functions for guest perf
LoongArch: KVM: Add stub for kvm_arch_vcpu_preempted_in_kernel()
LoongArch: KVM: Remove PGD saving during VM context switch
LoongArch: KVM: Remove unnecessary header include path
KVM: arm64: Tear down vGIC on failed vCPU creation
KVM: arm64: PMU: Reload when resetting
KVM: arm64: PMU: Reload when user modifies registers
KVM: arm64: PMU: Fix SET_ONE_REG for vPMC regs
KVM: arm64: PMU: Assume PMU presence in pmu-emul.c
KVM: arm64: PMU: Set raw values from user to PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR}
KVM: arm64: Create each pKVM hyp vcpu after its corresponding host vcpu
KVM: arm64: Factor out pKVM hyp vcpu creation to separate function
KVM: arm64: Initialize HCRX_EL2 traps in pKVM
KVM: arm64: Factor out setting HCRX_EL2 traps into separate function
KVM: x86: block KVM_CAP_SYNC_REGS if guest state is protected
KVM: x86: Add infrastructure for secure TSC
KVM: x86: Push down setting vcpu.arch.user_set_tsc
...
core:
- Add support for skb TX SND/COMPLETION timestamping
- hci_core: Enable buffer flow control for SCO/eSCO
- coredump: Log devcd dumps into the monitor
drivers:
- btusb: Add 2 HWIDs for MT7922
- btusb: Fix regression in the initialization of fake Bluetooth controllers
- btusb: Add 14 USB device IDs for Qualcomm WCN785x
- btintel: Add support for Intel Scorpius Peak
- btintel: Add support to configure TX power
- btintel: Add DSBR support for ScP
- btintel_pcie: Add device id of Whale Peak
- btintel_pcie: Setup buffers for firmware traces
- btintel_pcie: Read hardware exception data
- btintel_pcie: Add support for device coredump
- btintel_pcie: Trigger device coredump on hardware exception
- btnxpuart: Support for controller wakeup gpio config
- btnxpuart: Add support to set BD address
- btnxpuart: Add correct bootloader error codes
- btnxpuart: Handle bootloader error during cmd5 and cmd7
- btnxpuart: Fix kernel panic during FW release
- qca: add WCN3950 support
- hci_qca: use the power sequencer for wcn6750
- btmtksdio: Prevent enabling interrupts after IRQ handler removal
-----BEGIN PGP SIGNATURE-----
iQJNBAABCAA3FiEE7E6oRXp8w05ovYr/9JCA4xAyCykFAmfjA3AZHGx1aXoudm9u
LmRlbnR6QGludGVsLmNvbQAKCRD0kIDjEDILKeWWD/0QZYBrNP9QSTyYTeNlYhCC
Lw/n7n3+LhxOqIu+tOGS7UplTqR3p4WQGu2g+XV9wSu4dvLplZGn/40XtiXJA0+r
VsXivQ4IR/Vjd8sNLLixfmuH4g4CbMSblQvECD3/5wFTeSH8T6/gyts/WV/LDYOJ
jp5kG6HCFHMv7RaiaHdZ0Pe4c1xJ/Ek8TnrW4G/kPBaLzm+lhjRkfzxCx8cO0k0H
mpoheUohLtUSgfLf49826t7rp3HDuX9db2hiGXQfKSrL2milwufKNaMFTmbuU2Uq
IyAwR1CEdSsKlcpbnVNF05r94sjf8NjuBD3YWxB9OfVXq9aymJYZdGh54XT5nF3f
ccD1WNnOoTsXDbEAnuVx+EYrJAI70e7vE4m8XbpxJcxhFoSIQCmtDoXUY199LiEG
El7DlwouNFESmOIj6gSK7ogUEhmcQA0AJxBVpdxnGBAcQMn1hrGSb75VGg7rul8K
HcBtf5j4gxZum2fzrufeY1+eBxfSRjNFIsnutAr53gEtidPZYlmiRyAuqQJMo2sO
0hlpC6gQGQJ5HiTR5Jbo9u+OC1oeHHXNN5IpNrd6J65n0pqCRldDsoXCerEJsOou
cqWzb9lfQCrjRnWRxUWOLukdYRgBH15wuORU+Z7/qVhqOr49RvARnzq7AjGSenTS
YKb2N+NGlqpxG+vQscFJUA==
=XCdn
-----END PGP SIGNATURE-----
Merge tag 'for-net-next-2025-03-25' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Luiz Augusto von Dentz says:
====================
bluetooth-next pull request for net-next:
core:
- Add support for skb TX SND/COMPLETION timestamping
- hci_core: Enable buffer flow control for SCO/eSCO
- coredump: Log devcd dumps into the monitor
drivers:
- btusb: Add 2 HWIDs for MT7922
- btusb: Fix regression in the initialization of fake Bluetooth controllers
- btusb: Add 14 USB device IDs for Qualcomm WCN785x
- btintel: Add support for Intel Scorpius Peak
- btintel: Add support to configure TX power
- btintel: Add DSBR support for ScP
- btintel_pcie: Add device id of Whale Peak
- btintel_pcie: Setup buffers for firmware traces
- btintel_pcie: Read hardware exception data
- btintel_pcie: Add support for device coredump
- btintel_pcie: Trigger device coredump on hardware exception
- btnxpuart: Support for controller wakeup gpio config
- btnxpuart: Add support to set BD address
- btnxpuart: Add correct bootloader error codes
- btnxpuart: Handle bootloader error during cmd5 and cmd7
- btnxpuart: Fix kernel panic during FW release
- qca: add WCN3950 support
- hci_qca: use the power sequencer for wcn6750
- btmtksdio: Prevent enabling interrupts after IRQ handler removal
* tag 'for-net-next-2025-03-25' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next: (53 commits)
Bluetooth: MGMT: Add LL Privacy Setting
Bluetooth: hci_event: Fix handling of HCI_EV_LE_DIRECT_ADV_REPORT
Bluetooth: btnxpuart: Fix kernel panic during FW release
Bluetooth: btnxpuart: Handle bootloader error during cmd5 and cmd7
Bluetooth: btnxpuart: Add correct bootloader error codes
t blameBluetooth: btintel: Fix leading white space
Bluetooth: btintel: Add support to configure TX power
Bluetooth: btmtksdio: Prevent enabling interrupts after IRQ handler removal
Bluetooth: btmtk: Remove the resetting step before downloading the fw
Bluetooth: SCO: add TX timestamping
Bluetooth: L2CAP: add TX timestamping
Bluetooth: ISO: add TX timestamping
Bluetooth: add support for skb TX SND/COMPLETION timestamping
net-timestamp: COMPLETION timestamp on packet tx completion
HCI: coredump: Log devcd dumps into the monitor
Bluetooth: HCI: Add definition of hci_rp_remote_name_req_cancel
Bluetooth: hci_vhci: Mark Sync Flow Control as supported
Bluetooth: hci_core: Enable buffer flow control for SCO/eSCO
Bluetooth: btintel_pci: Fix build warning
Bluetooth: btintel_pcie: Trigger device coredump on hardware exception
...
====================
Link: https://patch.msgid.link/20250325192925.2497890-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
- Consolidate the VDSO storage
The VDSO data storage and data layout has been largely architecture
specific for historical reasons. That increases the maintenance effort
and causes inconsistencies over and over.
There is no real technical reason for architecture specific layouts and
implementations. The architecture specific details can easily be
integrated into a generic layout, which also reduces the amount of
duplicated code for managing the mappings.
Convert all architectures over to a unified layout and common mapping
infrastructure. This splits the VDSO data layout into subsystem
specific blocks, timekeeping, random and architecture parts, which
provides a better structure and allows to improve and update the
functionalities without conflict and interaction.
- Rework the timekeeping data storage
The current implementation is designed for exposing system timekeeping
accessors, which was good enough at the time when it was designed.
PTP and Time Sensitive Networking (TSN) change that as there are
requirements to expose independent PTP clocks, which are not related to
system timekeeping.
Replace the monolithic data storage by a structured layout, which
allows to add support for independent PTP clocks on top while reusing
both the data structures and the time accessor implementations.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmfgSWUTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoYGED/0f/M8YyacAyErDYW4ufW+zh2sUidSf
GVlK0Jn5BMljOoye+y2XfTxuvvXxEDjJNYiJm2uKGPdV29tjNXreGK39XyNqXPu5
jwR4f/IN/QVSM2nCO6jyydMz8ympJ2k6M4RewwmxXBL2KsUzzJWSKTgRNqM5Tdjs
1RhJMjkQVTiiSYerBpHXYCeZLM7/VEfZ120uuzVAYPXo0/R6zuyF7IBgIao9hbfO
IQeCMLLfpDQHQhwquTA8ZbWqQusiEoSYHT+kTDa3eXDDbE/2UklAUs9gaatI979x
73zs0Yqxyx2iIGaghACWOAbKdcBWBeCYDw5fFwYVKn4VMQi1+wcxbtOYL767jp9o
vfkLXGilXcVkvDjv4fH+e1NoJXXBxq1Ug1silKdOeJzenQF8Q1i3tavkWUVCNfwH
qyOIM72NiCEWbYBDcz0lwBxEAyO4o0E6NP1bDc4y50VedEYIbXwSh0QGrdev1abn
rjY9vsuUR9oznmZ6BRPPxMTY87gOSHoKvqydgSZUACEgLV9346f5qZf341OReYai
MXUmXOM4+LdyaM1+Mec8ppvjMbLw+736NZyZtT2InusEBE+Ddp25L3hYiWnklJu8
2uwv0AoyrwaJ8y6ADOX4thcLZq0gND0Z/Ayz/XvpeI30eftsGUCt5KOVlqwfwOkI
4EQKvk2fAixPxg==
=rwei
-----END PGP SIGNATURE-----
Merge tag 'timers-vdso-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull VDSO infrastructure updates from Thomas Gleixner:
- Consolidate the VDSO storage
The VDSO data storage and data layout has been largely architecture
specific for historical reasons. That increases the maintenance
effort and causes inconsistencies over and over.
There is no real technical reason for architecture specific layouts
and implementations. The architecture specific details can easily be
integrated into a generic layout, which also reduces the amount of
duplicated code for managing the mappings.
Convert all architectures over to a unified layout and common mapping
infrastructure. This splits the VDSO data layout into subsystem
specific blocks, timekeeping, random and architecture parts, which
provides a better structure and allows to improve and update the
functionalities without conflict and interaction.
- Rework the timekeeping data storage
The current implementation is designed for exposing system
timekeeping accessors, which was good enough at the time when it was
designed.
PTP and Time Sensitive Networking (TSN) change that as there are
requirements to expose independent PTP clocks, which are not related
to system timekeeping.
Replace the monolithic data storage by a structured layout, which
allows to add support for independent PTP clocks on top while reusing
both the data structures and the time accessor implementations.
* tag 'timers-vdso-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (55 commits)
sparc/vdso: Always reject undefined references during linking
x86/vdso: Always reject undefined references during linking
vdso: Rework struct vdso_time_data and introduce struct vdso_clock
vdso: Move architecture related data before basetime data
powerpc/vdso: Prepare introduction of struct vdso_clock
arm64/vdso: Prepare introduction of struct vdso_clock
x86/vdso: Prepare introduction of struct vdso_clock
time/namespace: Prepare introduction of struct vdso_clock
vdso/namespace: Rename timens_setup_vdso_data() to reflect new vdso_clock struct
vdso/vsyscall: Prepare introduction of struct vdso_clock
vdso/gettimeofday: Prepare helper functions for introduction of struct vdso_clock
vdso/gettimeofday: Prepare do_coarse_timens() for introduction of struct vdso_clock
vdso/gettimeofday: Prepare do_coarse() for introduction of struct vdso_clock
vdso/gettimeofday: Prepare do_hres_timens() for introduction of struct vdso_clock
vdso/gettimeofday: Prepare do_hres() for introduction of struct vdso_clock
vdso/gettimeofday: Prepare introduction of struct vdso_clock
vdso/helpers: Prepare introduction of struct vdso_clock
vdso/datapage: Define vdso_clock to prepare for multiple PTP clocks
vdso: Make vdso_time_data cacheline aligned
arm64: Make asm/cache.h compatible with vDSO
...
- Fix a memory ordering issue in posix-timers
Posix-timer lookup is lockless and reevaluates the timer validity under
the timer lock, but the update which validates the timer is not
protected by the timer lock. That allows the store to be reordered
against the initialization stores, so that the lookup side can observe
a partially initialized timer. That's mostly a theoretical problem, but
incorrect nevertheless.
- Fix a long standing inconsistency of the coarse time getters
The coarse time getters read the base time of the current update cycle
without reading the actual hardware clock. NTP frequency adjustment can
set the base time backwards. The fine grained interfaces compensate
this by reading the clock and applying the new conversion factor, but
the coarse grained time getters use the base time directly. That allows
the user to observe time going backwards.
Cure it by always forwarding base time, when NTP changes the frequency
with an immediate step.
- Rework of posix-timer hashing
The posix-timer hash is not scalable and due to the CRIU timer restore
mechanism prone to massive contention on the global hash bucket lock.
Replace the global hash lock with a fine grained per bucket locking
scheme to address that.
- Rework the proc/$PID/timers interface.
/proc/$PID/timers is provided for CRIU to be able to restore a
timer. The printout happens with sighand lock held and interrupts
disabled. That's not required as this can be done with RCU protection
as well.
- Provide a sane mechanism for CRIU to restore a timer ID
CRIU restores timers by creating and deleting them until the kernel
internal per process ID counter reached the requested ID. That's
horribly slow for sparse timer IDs.
Provide a prctl() which allows CRIU to restore a timer with a given
ID. When enabled the ID pointer is used as input pointer to read the
requested ID from user space. When disabled, the normal allocation
scheme (next ID) is active as before. This is backwards compatible for
both kernel and user space.
- Make hrtimer_update_function() less expensive.
The sanity checks are valuable, but expensive for high frequency usage
in io/uring. Make the debug checks conditional and enable them only
when lockdep is enabled.
- Small updates, cleanups and improvements
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmfgQ6wTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoeQzD/9p+EuUGrMbSNaLVMCYFULBbR0lersJ
hrGGoKUsNt5T+f6hEEbSLBnkjZcMIj0J+mdIEUiRa73ryw1KmwLk/8MBu0c6u6q3
musDvJqt3dLTG98yN0YeWK3tJDxhSjxIpwcAXusPQ04j16I2fVXFzDQ/kGPq6MTI
tdMYzsS3wjuWpi+CbgRSP2HEwu08fIDVsQ7Grynh4Kmd31apne4ZgF2UVp6UiZyp
8yJHZgVzJcFs7Y3MS6XTgezHnuADxMY1irzbXmok19941X8mZz2QRIpGQX+oMh6o
g7SG2lj9i8YbLqU9/5RbC5ppjRcWfogDpW0Lk+OmdOpr0RiXTmx5Lz8Egxex9wG5
pUJszeTY+bLw7mmYmkGZyBz+PNoGgVM5KFZRe5ENvYM8Gy8LUW5DA9zvxeHqDDz1
FiMmKdYrwr8VCKqx+8hJQdzlzRbepxq9sNzDdMKVOUcFdGUVWekfG6ZFkfLKxwzA
XDTKJilzXbAAj4r57vEvOCYLUZH/ZsFK4yyg0O53fEg6fj87EbTDb5+YUGazb3+C
yNTEOQIT8LtutzLR9+xeLi92k+6zlJ4c1PfqBx5Kv/TwBrIfV1P8N2c6TCOWDoRM
AOvo2SXEA/jEPix2GjT5jalSV1mROEXo2T9/G7kz4H7K+DkI/dGgS9mXyUDO2mMd
ouOxYN0GohVqTQ==
=XUGH
-----END PGP SIGNATURE-----
Merge tag 'timers-core-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer core updates from Thomas Gleixner:
- Fix a memory ordering issue in posix-timers
Posix-timer lookup is lockless and reevaluates the timer validity
under the timer lock, but the update which validates the timer is not
protected by the timer lock. That allows the store to be reordered
against the initialization stores, so that the lookup side can
observe a partially initialized timer. That's mostly a theoretical
problem, but incorrect nevertheless.
- Fix a long standing inconsistency of the coarse time getters
The coarse time getters read the base time of the current update
cycle without reading the actual hardware clock. NTP frequency
adjustment can set the base time backwards. The fine grained
interfaces compensate this by reading the clock and applying the new
conversion factor, but the coarse grained time getters use the base
time directly. That allows the user to observe time going backwards.
Cure it by always forwarding base time, when NTP changes the
frequency with an immediate step.
- Rework of posix-timer hashing
The posix-timer hash is not scalable and due to the CRIU timer
restore mechanism prone to massive contention on the global hash
bucket lock.
Replace the global hash lock with a fine grained per bucket locking
scheme to address that.
- Rework the proc/$PID/timers interface.
/proc/$PID/timers is provided for CRIU to be able to restore a timer.
The printout happens with sighand lock held and interrupts disabled.
That's not required as this can be done with RCU protection as well.
- Provide a sane mechanism for CRIU to restore a timer ID
CRIU restores timers by creating and deleting them until the kernel
internal per process ID counter reached the requested ID. That's
horribly slow for sparse timer IDs.
Provide a prctl() which allows CRIU to restore a timer with a given
ID. When enabled the ID pointer is used as input pointer to read the
requested ID from user space. When disabled, the normal allocation
scheme (next ID) is active as before. This is backwards compatible
for both kernel and user space.
- Make hrtimer_update_function() less expensive.
The sanity checks are valuable, but expensive for high frequency
usage in io/uring. Make the debug checks conditional and enable them
only when lockdep is enabled.
- Small updates, cleanups and improvements
* tag 'timers-core-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
selftests/timers: Improve skew_consistency by testing with other clockids
timekeeping: Fix possible inconsistencies in _COARSE clockids
posix-timers: Drop redundant memset() invocation
selftests/timers/posix-timers: Add a test for exact allocation mode
posix-timers: Provide a mechanism to allocate a given timer ID
posix-timers: Dont iterate /proc/$PID/timers with sighand:: Siglock held
posix-timers: Make per process list RCU safe
posix-timers: Avoid false cacheline sharing
posix-timers: Switch to jhash32()
posix-timers: Improve hash table performance
posix-timers: Make signal_struct:: Next_posix_timer_id an atomic_t
posix-timers: Make lock_timer() use guard()
posix-timers: Rework timer removal
posix-timers: Simplify lock/unlock_timer()
posix-timers: Use guards in a few places
posix-timers: Remove SLAB_PANIC from kmem cache
posix-timers: Remove a few paranoid warnings
posix-timers: Cleanup includes
posix-timers: Add cond_resched() to posix_timer_add() search loop
posix-timers: Initialise timer before adding it to the hash table
...
Add SOF_TIMESTAMPING_TX_COMPLETION, for requesting a software timestamp
when hardware reports a packet completed.
Completion tstamp is useful for Bluetooth, as hardware timestamps do not
exist in the HCI specification except for ISO packets, and the hardware
has a queue where packets may wait. In this case the software SND
timestamp only reflects the kernel-side part of the total latency
(usually small) and queue length (usually 0 unless HW buffers
congested), whereas the completion report time is more informative of
the true latency.
It may also be useful in other cases where HW TX timestamps cannot be
obtained and user wants to estimate an upper bound to when the TX
probably happened.
Signed-off-by: Pauli Virtanen <pav@iki.fi>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
struct virtio_net_rss_config was less useful in actual code because of a
flexible array placed in the middle. Add new structures that split it
into two to avoid having a flexible array in the middle.
Suggested-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Link: https://patch.msgid.link/20250321-virtio-v2-1-33afb8f4640b@daynix.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* cfg80211/mac80211: fix and enable link reconfiguration
* rtw88: support RTL8814AE/RTL8814AU
* mt7996: preparations for MLO
* ath12k: continued work on MLO
* iwlwifi: add new iwlmld sub-driver/op-mode for
some current and future devices
* wfx: wowlan support
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAmfcE18ACgkQ10qiO8sP
aAAW4RAAlu0ehfMYvDmUifEo2q7NXVjou1D7u3gGNlwSR/Shrftn2k47n+tThARb
25ZutatgPlt4b+RjzeoSa724xLlDOQAG7e36tV7xZNDgiWRK6VrF+JVzFxin7cgQ
e3yXcgf5luLiUNWocCZXc2Vr4MGVrSFcqLAFZXuQPDzVfU8ClKObUsJkEX59PsFv
Sob5b4VzwAyBXUlUzMjSWxdAsUV04p0NBfAv0HlEuc+XboHwMZDkXJokrPz9YCpz
TiT+hySDxeFyeXADiXQsYUAVo8LwNUNlehqEc+yzXsuZb6daCiGQiZW0wN15uxCb
vrfYawJO05QsWtnmG6gg91gh6FwFnd0XIPjrveqzhsAqbA+AJpHqPT9lUBe3K3C2
PObDpz5NLRnGhO62Rt0au/wkoXGLe3sro5fOy1G0v2YMEvTIvEOByxEtlh0124zM
C9s28KRV3PqZ4+8YmTjfc/enTVzh8vFFiyj7b//xtclBbFJZezJ1iDTHj1gEuIiQ
/Npjvu7zXMYxmtGbUfTxoZ4AxzO7wk2j6oD3b+prI3DrthRjzq9n1j7bnmlAukyw
miFm5f0ss0TrS+dOnE8CysNz8EkYJGDjtoGh9U3Bej+q2HSYK1YwcL2BeW+v28qV
2rqyOQbftxXHWs0n2aEzkzGwnTYKCY7kR1+CMh11GQ7+5vKBVss=
=HSDM
-----END PGP SIGNATURE-----
Merge tag 'wireless-next-2025-03-20' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
Johannes Berg says:
====================
More features for 6.15, major changes:
* cfg80211/mac80211: fix and enable link reconfiguration
* rtw88: support RTL8814AE/RTL8814AU
* mt7996: preparations for MLO
* ath12k: continued work on MLO
* iwlwifi: add new iwlmld sub-driver/op-mode for
some current and future devices
* wfx: wowlan support
* tag 'wireless-next-2025-03-20' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (311 commits)
wifi: mt76: mt7996: fix locking in mt7996_mac_sta_rc_work()
wifi: mt76: mt76x2u: add TP-Link TL-WDN6200 ID to device table
wifi: mt76: mt792x: re-register CHANCTX_STA_CSA only for the mt7921 series
wifi: mt76: mt7996: Update mt7996_tx to MLO support
wifi: mt76: mt7996: rework mt7996_ampdu_action to support MLO
wifi: mt76: mt7996: rework set/get_tsf callabcks to support MLO
wifi: mt76: mt7996: set vif default link_id adding/removing vif links
wifi: mt76: mt7996: rework mt7996_mcu_beacon_inband_discov to support MLO
wifi: mt76: mt7996: rework mt7996_mcu_add_obss_spr to support MLO
wifi: mt76: mt7996: rework mt7996_net_fill_forward_path to support MLO
wifi: mt76: mt7996: rework mt7996_update_mu_group to support MLO
wifi: mt76: mt7996: rework mt7996_mac_sta_poll to support MLO
wifi: mt76: mt7996: rework mt7996_mac_sta_rc_work to support MLO
wifi: mt76: mt7996: remove mt7996_mac_enable_rtscts()
wifi: mt76: mt7996: rework mt7996_sta_hw_queue_read to support MLO
wifi: mt76: mt7996: rework mt7996_set_hw_key to support MLO
wifi: mt76: mt7996: Add mt7996_sta_link to mt7996_mcu_add_bss_info signature
wifi: mt76: mt7996: rework mt7996_sta_set_4addr and mt7996_sta_set_decap_offload to support MLO
wifi: mt76: mt7996: rework mt7996_rx_get_wcid to support MLO
wifi: mt76: mt7996: Rely on wcid_to_sta in mt7996_mac_add_txs_skb()
...
====================
Link: https://patch.msgid.link/20250320131106.33266-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit 14a1968074 ("net: reorganize IP MIB values") changed
MIB values to group hot fields together.
Since then 5 new fields have been added without caring about
data locality.
This patch moves IPSTATS_MIB_OUTPKTS, IPSTATS_MIB_NOECTPKTS,
IPSTATS_MIB_ECT1PKTS, IPSTATS_MIB_ECT0PKTS, IPSTATS_MIB_CEPKTS
to the hot portion of per-cpu data.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250320101434.3174412-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This extends the VFIO_DEVICE_[AT|DE]TACH_IOMMUFD_PT ioctls to attach/detach
a given pasid of a vfio device to/from an IOAS/HWPT.
Link: https://patch.msgid.link/r/20250321180143.8468-4-yi.l.liu@intel.com
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The underlying infrastructure has supported the PASID attach and related
enforcement per the requirement of the IOMMU_HWPT_ALLOC_PASID flag. This
extends iommufd to support PASID compatible domain requested by userspace.
Link: https://patch.msgid.link/r/20250321171940.7213-15-yi.l.liu@intel.com
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Support adjusting/reading delayed ack max for socket level by using
set/getsockopt().
This option aligns with TCP_BPF_DELACK_MAX usage. Considering that bpf
option was implemented before this patch, so we need to use a standalone
new option for pure tcp set/getsockopt() use.
Add WRITE_ONCE/READ_ONCE() to prevent data-race if setsockopt()
happens to write one value to icsk_delack_max while icsk_delack_max is
being read.
Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250317120314.41404-3-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Support adjusting/reading RTO MIN for socket level by using set/getsockopt().
This new option has the same effect as TCP_BPF_RTO_MIN, which means it
doesn't affect RTAX_RTO_MIN usage (by using ip route...). Considering that
bpf option was implemented before this patch, so we need to use a standalone
new option for pure tcp set/getsockopt() use.
When the socket is created, its icsk_rto_min is set to the default
value that is controlled by sysctl_tcp_rto_min_us. Then if application
calls setsockopt() with TCP_RTO_MIN_US flag to pass a valid value, then
icsk_rto_min will be overridden in jiffies unit.
This patch adds WRITE_ONCE/READ_ONCE to avoid data-race around
icsk_rto_min.
Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250317120314.41404-2-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Core:
- Move perf_event sysctls into kernel/events/ (Joel Granados)
- Use POLLHUP for pinned events in error (Namhyung Kim)
- Avoid the read if the count is already updated (Peter Zijlstra)
- Allow the EPOLLRDNORM flag for poll (Tao Chen)
- locking/percpu-rwsem: Add guard support (Peter Zijlstra)
[ NOTE: this got (mis-)merged into the perf tree due to related work. ]
perf_pmu_unregister() related improvements: (Peter Zijlstra)
- Simplify the perf_event_alloc() error path
- Simplify the perf_pmu_register() error path
- Simplify perf_pmu_register()
- Simplify perf_init_event()
- Simplify perf_event_alloc()
- Merge struct pmu::pmu_disable_count into struct perf_cpu_pmu_context::pmu_disable_count
- Add this_cpc() helper
- Introduce perf_free_addr_filters()
- Robustify perf_event_free_bpf_prog()
- Simplify the perf_mmap() control flow
- Further simplify perf_mmap()
- Remove retry loop from perf_mmap()
- Lift event->mmap_mutex in perf_mmap()
- Detach 'struct perf_cpu_pmu_context' and 'struct pmu' lifetimes
- Fix perf_mmap() failure path
Uprobes:
- Harden x86 uretprobe syscall trampoline check (Jiri Olsa)
- Remove redundant spinlock in uprobe_deny_signal() (Liao Chang)
- Remove the spinlock within handle_singlestep() (Liao Chang)
x86 Intel PMU enhancements:
- Support PEBS counters snapshotting (Kan Liang)
- Fix intel_pmu_read_event() (Kan Liang)
- Extend per event callchain limit to branch stack (Kan Liang)
- Fix system-wide LBR profiling (Kan Liang)
- Allocate bts_ctx only if necessary (Li RongQing)
- Apply static call for drain_pebs (Peter Zijlstra)
x86 AMD PMU enhancements: (Ravi Bangoria)
- Remove pointless sample period check
- Fix ->config to sample period calculation for OP PMU
- Fix perf_ibs_op.cnt_mask for CurCnt
- Don't allow freq mode event creation through ->config interface
- Add PMU specific minimum period
- Add ->check_period() callback
- Ceil sample_period to min_period
- Add support for OP Load Latency Filtering
- Update DTLB/PageSize decode logic
Hardware breakpoints:
- Return EOPNOTSUPP for unsupported breakpoint type (Saket Kumar Bhaskar)
Hardlockup detector improvements: (Li Huafei)
- perf_event memory leak
- Warn if watchdog_ev is leaked
Fixes and cleanups:
- Misc fixes and cleanups (Andy Shevchenko, Kan Liang, Peter Zijlstra,
Ravi Bangoria, Thorsten Blum, XieLudan)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmfehRIRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1hF3g//TCAQijI6OFNpYiD1xoyMq4m+baIYhYx0
lnxwxhsN58JFcEJeWIEGLACqUePyH68jNKVSr9sIoeV4gnnMX+x2Ny6rh/1H3Ox+
jQyVmPdFmKa8QG7wGNjcDteIzlEKK4zqruXWaG54LX2e6kbQZWwd0I21MyXkrHXb
oMIfyZbCAWuPW1wefZm8FPgImT+nvwOosyx90OVagGqk5mYdNb9DFhMjQveStHdQ
BnWU6rYdW1c2eXKpeuvxY4uWQoCELC6WntLimvcswy6fb+9LtbglpCYQOGGDrGvp
v3RASf/8clFVSau8P/8NEaNgLgjN/e3eN/fAoSut8Z22nAeBC6qv4qjFt1piDpbs
AaEXYCYM0/Tfzjp3ctPsFrxbKvB8q2qhxSm37Co0Ix6WyJn3JQbNx48g8GIod2os
eGPXSZzoz9O8coeTKKbxWp4fpAjFfyfe/ovWQuVd8JI4bYj7Mi63J+RxQDd2TkJP
H+IgxZoamJExgS1YcKJUBtw7QKQm5pHFx03Br7KsNxgmHy7JdoN9bh0h14pkeXjB
MnAvWOS5ouuriJgQ+4bqAezS8DSHnDdmFmWgNEEqAlOD9Zy9hDXJ2GiqbHKMyRNC
ae35o0PDUFTIX9O5NPIDUyWtJb5uH/S1lQhS7GD+ODlMDIX+ny+REXf9krSCR1H0
GUqq2UmxBGA=
=iPmA
-----END PGP SIGNATURE-----
Merge tag 'perf-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull performance events updates from Ingo Molnar:
"Core:
- Move perf_event sysctls into kernel/events/ (Joel Granados)
- Use POLLHUP for pinned events in error (Namhyung Kim)
- Avoid the read if the count is already updated (Peter Zijlstra)
- Allow the EPOLLRDNORM flag for poll (Tao Chen)
- locking/percpu-rwsem: Add guard support [ NOTE: this got
(mis-)merged into the perf tree due to related work ] (Peter
Zijlstra)
perf_pmu_unregister() related improvements: (Peter Zijlstra)
- Simplify the perf_event_alloc() error path
- Simplify the perf_pmu_register() error path
- Simplify perf_pmu_register()
- Simplify perf_init_event()
- Simplify perf_event_alloc()
- Merge struct pmu::pmu_disable_count into struct
perf_cpu_pmu_context::pmu_disable_count
- Add this_cpc() helper
- Introduce perf_free_addr_filters()
- Robustify perf_event_free_bpf_prog()
- Simplify the perf_mmap() control flow
- Further simplify perf_mmap()
- Remove retry loop from perf_mmap()
- Lift event->mmap_mutex in perf_mmap()
- Detach 'struct perf_cpu_pmu_context' and 'struct pmu' lifetimes
- Fix perf_mmap() failure path
Uprobes:
- Harden x86 uretprobe syscall trampoline check (Jiri Olsa)
- Remove redundant spinlock in uprobe_deny_signal() (Liao Chang)
- Remove the spinlock within handle_singlestep() (Liao Chang)
x86 Intel PMU enhancements:
- Support PEBS counters snapshotting (Kan Liang)
- Fix intel_pmu_read_event() (Kan Liang)
- Extend per event callchain limit to branch stack (Kan Liang)
- Fix system-wide LBR profiling (Kan Liang)
- Allocate bts_ctx only if necessary (Li RongQing)
- Apply static call for drain_pebs (Peter Zijlstra)
x86 AMD PMU enhancements: (Ravi Bangoria)
- Remove pointless sample period check
- Fix ->config to sample period calculation for OP PMU
- Fix perf_ibs_op.cnt_mask for CurCnt
- Don't allow freq mode event creation through ->config interface
- Add PMU specific minimum period
- Add ->check_period() callback
- Ceil sample_period to min_period
- Add support for OP Load Latency Filtering
- Update DTLB/PageSize decode logic
Hardware breakpoints:
- Return EOPNOTSUPP for unsupported breakpoint type (Saket Kumar
Bhaskar)
Hardlockup detector improvements: (Li Huafei)
- perf_event memory leak
- Warn if watchdog_ev is leaked
Fixes and cleanups:
- Misc fixes and cleanups (Andy Shevchenko, Kan Liang, Peter
Zijlstra, Ravi Bangoria, Thorsten Blum, XieLudan)"
* tag 'perf-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (55 commits)
perf: Fix __percpu annotation
perf: Clean up pmu specific data
perf/x86: Remove swap_task_ctx()
perf/x86/lbr: Fix shorter LBRs call stacks for the system-wide mode
perf: Supply task information to sched_task()
perf: attach/detach PMU specific data
locking/percpu-rwsem: Add guard support
perf: Save PMU specific data in task_struct
perf: Extend per event callchain limit to branch stack
perf/ring_buffer: Allow the EPOLLRDNORM flag for poll
perf/core: Use POLLHUP for pinned events in error
perf/core: Use sysfs_emit() instead of scnprintf()
perf/core: Remove optional 'size' arguments from strscpy() calls
perf/x86/intel/bts: Check if bts_ctx is allocated when calling BTS functions
uprobes/x86: Harden uretprobe syscall trampoline check
watchdog/hardlockup/perf: Warn if watchdog_ev is leaked
watchdog/hardlockup/perf: Fix perf_event memory leak
perf/x86: Annotate struct bts_buffer::buf with __counted_by()
perf/core: Clean up perf_try_init_event()
perf/core: Fix perf_mmap() failure path
...
- Significant changes throughout the tree to bring Python code up to
current standards and raise the minimum Python required to 3.9. Much of
this is preparatory to replacing the ancient Perl scripts/kernel-doc
horror with a slightly less horrifying Python implementation, expected
for 6.16.
- Update the minimum Sphinx required to 3.4.3, allowing us to remove a
bunch of older compatibility code.
- Rework and improve the generation of the ABI documentation.
(All of the above done by Mauro)
- Lots of translation updates. Alex Shi and Yanteng Si are taking on
responsibility for the Chinese translations going forward; that work will
still get to you via docs-next
- Try to standardize the format for indicating a developer's affiliation in
commit tags.
- Clarify the TAB's role in CoC enforcement actions.
- Try to spell out the rules for when a commit tag can name another
developer without their explicit permission.
Plus lots of other typo fixes and updates.
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmfccxwPHGNvcmJldEBs
d24ubmV0AAoJEBdDWhNsDH5YoYcH/jL/nS8YAiJ3awF5PH5tR3m5ddt9l+fKXWJx
PB3KcHtDORbWltTA+Tvo2aP1jxGY9wqsIIvl+nvjJyUcfd72g4HNfTDUDXwP3OFU
wTkaEAQp3n/hqnLXtJ2AzV3Ir5cIfEL2d7F6QsN1Gnof8iu2OuMk5iMeb0iexUX6
FYjJq+jknh30VdAp2hxHy8q17R7h7PySh5OsjeAYJJroLv60n3DwQgnzHjXC/FT2
Qq1UuEzlSpRoso2o2NwVTND6OVW081umo6YrioqD7ZC2G2fhRgLFJJtJGXDNcyUl
gQv9xLSaTD97V4zaWPm28ObNBpY/GnAd4hMjB17wAH5xUfVS5Aw=
=Gvdp
-----END PGP SIGNATURE-----
Merge tag 'docs-6.15' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
"It has been a reasonably busy cycle for docs...
- Significant changes throughout the tree to bring Python code up to
current standards and raise the minimum Python required to 3.9
Much of this is preparatory to replacing the ancient Perl
scripts/kernel-doc horror with a slightly less horrifying Python
implementation, expected for 6.16
- Update the minimum Sphinx required to 3.4.3, allowing us to remove
a bunch of older compatibility code
- Rework and improve the generation of the ABI documentation
(All of the above done by Mauro)
- Lots of translation updates. Alex Shi and Yanteng Si are taking on
responsibility for the Chinese translations going forward; that
work will still get to you via docs-next
- Try to standardize the format for indicating a developer's
affiliation in commit tags
- Clarify the TAB's role in CoC enforcement actions
- Try to spell out the rules for when a commit tag can name another
developer without their explicit permission
Plus lots of other typo fixes and updates"
* tag 'docs-6.15' of git://git.lwn.net/linux: (98 commits)
docs/zh_CN: fix spelling mistake
docs/Chinese: change the disclaimer words
docs/zh_CN: Add snp-tdx-threat-model index Chinese translation
docs: driver-api: firmware: clarify userspace requirements
docs: clarify rules wrt tagging other people
docs: Remove outdated highuid.rst documentation
Documentation: dma-buf: heaps: Add heap name definitions
docs/.../submit-checklist: Use Documentation/admin-guide/abi.rst for cross-ref of README
docs: Correct installation instruction
Documentation: kcsan: fix "Plain Accesses and Data Races" URL in kcsan.rst
Documentation/CoC: Spell out the TAB role in enforcement decisions
Documentation: ocxl.rst: Update consortium site
scripts: get_feat.pl: substitute s390x with s390
scripts/kernel-doc: drop dead code for Wcontents_before_sections
scripts/kernel-doc: don't add not needed new lines
docs: driver-api/infiniband.rst: fix Kerneldoc markup
drivers: firewire: firewire-cdev.h: fix identation on a kernel-doc markup
drivers: media: intel-ipu3.h: fix identation on a kernel-doc markup
include/asm-generic/io.h: fix kerneldoc markup
Docs/arch/arm64: Fix spelling in amu.rst
...
- loadpin: remove unsupported MODULE_COMPRESS_NONE (Arulpandiyan Vadivel)
- samples/check-exec: Fix script name (Mickaël Salaün)
- yama: remove needless locking in yama_task_prctl() (Oleg Nesterov)
- lib/string_choices: Sort by function name (R Sundar)
- hardening: Allow default HARDENED_USERCOPY to be set at compile time
(Mel Gorman)
- uaccess: Split out compile-time checks into ucopysize.h
- kbuild: clang: Support building UM with SUBARCH=i386
- x86: Enable i386 FORTIFY_SOURCE on Clang 16+
- ubsan/overflow: Rework integer overflow sanitizer option
- Add missing __nonstring annotations for callers of memtostr*()/strtomem*()
- Add __must_be_noncstr() and have memtostr*()/strtomem*() check for it
- Introduce __nonstring_array for silencing future GCC 15 warnings
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRSPkdeREjth1dHnSE2KwveOeQkuwUCZ9hGrgAKCRA2KwveOeQk
u1WvAQC3ZxFu3b0Omfmht2pPqCltf2UOQNvUx3egjoeXpUaNSgD+Lxr/T4xksy7E
jHh7rCYDkruOWs3DHA5JjRQcf0BBLQo=
=FTQp
-----END PGP SIGNATURE-----
Merge tag 'hardening-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening updates from Kees Cook:
"As usual, it's scattered changes all over. Patches touching things
outside of our traditional areas in the tree have been Acked by
maintainers or were trivial changes:
- loadpin: remove unsupported MODULE_COMPRESS_NONE (Arulpandiyan
Vadivel)
- samples/check-exec: Fix script name (Mickaël Salaün)
- yama: remove needless locking in yama_task_prctl() (Oleg Nesterov)
- lib/string_choices: Sort by function name (R Sundar)
- hardening: Allow default HARDENED_USERCOPY to be set at compile
time (Mel Gorman)
- uaccess: Split out compile-time checks into ucopysize.h
- kbuild: clang: Support building UM with SUBARCH=i386
- x86: Enable i386 FORTIFY_SOURCE on Clang 16+
- ubsan/overflow: Rework integer overflow sanitizer option
- Add missing __nonstring annotations for callers of
memtostr*()/strtomem*()
- Add __must_be_noncstr() and have memtostr*()/strtomem*() check for
it
- Introduce __nonstring_array for silencing future GCC 15 warnings"
* tag 'hardening-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (26 commits)
compiler_types: Introduce __nonstring_array
hardening: Enable i386 FORTIFY_SOURCE on Clang 16+
x86/build: Remove -ffreestanding on i386 with GCC
ubsan/overflow: Enable ignorelist parsing and add type filter
ubsan/overflow: Enable pattern exclusions
ubsan/overflow: Rework integer overflow sanitizer option to turn on everything
samples/check-exec: Fix script name
yama: don't abuse rcu_read_lock/get_task_struct in yama_task_prctl()
kbuild: clang: Support building UM with SUBARCH=i386
loadpin: remove MODULE_COMPRESS_NONE as it is no longer supported
lib/string_choices: Rearrange functions in sorted order
string.h: Validate memtostr*()/strtomem*() arguments more carefully
compiler.h: Introduce __must_be_noncstr()
nilfs2: Mark on-disk strings as nonstring
uapi: stddef.h: Introduce __kernel_nonstring
x86/tdx: Mark message.bytes as nonstring
string: kunit: Mark nonstring test strings as __nonstring
scsi: qla2xxx: Mark device strings as nonstring
scsi: mpt3sas: Mark device strings as nonstring
scsi: mpi3mr: Mark device strings as nonstring
...
- elf: Define and use note name macros (Akihiko Odaki)
- elf: add remaining SHF_ flag macros (Timur Tabi)
- binfmt: Remove loader from linux_binprm struct (Yonatan Goldschmidt)
- binfmt_elf_fdpic: fix variable set but not used warning (sunliming)
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRSPkdeREjth1dHnSE2KwveOeQkuwUCZ9hJxQAKCRA2KwveOeQk
uwhAAP9WXP77FCfAvJkmOfDI5FSa6g2WH/BnUOtZW63lSoXnTAEAuttg8W1IKcl/
Db+R/RLS3QqrU+ib5xWBeA6ZvxZXCwA=
=Svn4
-----END PGP SIGNATURE-----
Merge tag 'execve-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull execve updates from Kees Cook:
- elf: Define and use note name macros (Akihiko Odaki)
- elf: add remaining SHF_ flag macros (Timur Tabi)
- binfmt: Remove loader from linux_binprm struct (Yonatan Goldschmidt)
- binfmt_elf_fdpic: fix variable set but not used warning (sunliming)
* tag 'execve-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
binfmt_elf_fdpic: fix variable set but not used warning
elf: add remaining SHF_ flag macros
binfmt: Remove loader from linux_binprm struct
crash: Remove KEXEC_CORE_NOTE_NAME
s390/crash: Use note name macros
crash: Use note name macros
powerpc/crash: Use note name macros
binfmt_elf: Use note name macros
elf: Define note name macros
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZ90pqgAKCRCRxhvAZXjc
oqVsAP9Aq/fMCI14HeXehPCezKQZPu1HTrPPo2clLHXoSnafawEAsA3YfWTT4Heb
iexzqvAEUOMYOVN66QEc+6AAwtMLrwc=
=0eYo
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.15-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs pidfs updates from Christian Brauner:
- Allow retrieving exit information after a process has been reaped
through pidfds via the new PIDFD_INTO_EXIT extension for the
PIDFD_GET_INFO ioctl. Various tools need access to information about
a process/task even after it has already been reaped.
Pidfd polling allows waiting on either task exit or for a task to
have been reaped. The contract for PIDFD_INFO_EXIT is simply that
EPOLLHUP must be observed before exit information can be retrieved,
i.e., exit information is only provided once the task has been reaped
and then can be retrieved as long as the pidfd is open.
- Add PIDFD_SELF_{THREAD,THREAD_GROUP} sentinels allowing userspace to
forgo allocating a file descriptor for their own process. This is
useful in scenarios where users want to act on their own process
through pidfds and is akin to AT_FDCWD.
- Improve premature thread-group leader and subthread exec behavior
when polling on pidfds:
(1) During a multi-threaded exec by a subthread, i.e.,
non-thread-group leader thread, all other threads in the
thread-group including the thread-group leader are killed and the
struct pid of the thread-group leader will be taken over by the
subthread that called exec. IOW, two tasks change their TIDs.
(2) A premature thread-group leader exit means that the thread-group
leader exited before all of the other subthreads in the
thread-group have exited.
Both cases lead to inconsistencies for pidfd polling with
PIDFD_THREAD. Any caller that holds a PIDFD_THREAD pidfd to the
current thread-group leader may or may not see an exit notification
on the file descriptor depending on when poll is performed. If the
poll is performed before the exec of the subthread has concluded an
exit notification is generated for the old thread-group leader. If
the poll is performed after the exec of the subthread has concluded
no exit notification is generated for the old thread-group leader.
The correct behavior is to simply not generate an exit notification
on the struct pid of a subhthread exec because the struct pid is
taken over by the subthread and thus remains alive.
But this is difficult to handle because a thread-group may exit
premature as mentioned in (2). In that case an exit notification is
reliably generated but the subthreads may continue to run for an
indeterminate amount of time and thus also may exec at some point.
After this pull no exit notifications will be generated for a
PIDFD_THREAD pidfd for a thread-group leader until all subthreads
have been reaped. If a subthread should exec before no exit
notification will be generated until that task exits or it creates
subthreads and repeates the cycle.
This means an exit notification indicates the ability for the father
to reap the child.
* tag 'vfs-6.15-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (25 commits)
selftests/pidfd: third test for multi-threaded exec polling
selftests/pidfd: second test for multi-threaded exec polling
selftests/pidfd: first test for multi-threaded exec polling
pidfs: improve multi-threaded exec and premature thread-group leader exit polling
pidfs: ensure that PIDFS_INFO_EXIT is available
selftests/pidfd: add seventh PIDFD_INFO_EXIT selftest
selftests/pidfd: add sixth PIDFD_INFO_EXIT selftest
selftests/pidfd: add fifth PIDFD_INFO_EXIT selftest
selftests/pidfd: add fourth PIDFD_INFO_EXIT selftest
selftests/pidfd: add third PIDFD_INFO_EXIT selftest
selftests/pidfd: add second PIDFD_INFO_EXIT selftest
selftests/pidfd: add first PIDFD_INFO_EXIT selftest
selftests/pidfd: expand common pidfd header
pidfs/selftests: ensure correct headers for ioctl handling
selftests/pidfd: fix header inclusion
pidfs: allow to retrieve exit information
pidfs: record exit code and cgroupid at exit
pidfs: use private inode slab cache
pidfs: move setting flags into pidfs_alloc_file()
pidfd: rely on automatic cleanup in __pidfd_prepare()
...
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZ90qAwAKCRCRxhvAZXjc
on7lAP0akpIsJMWREg9tLwTNTySI1b82uKec0EAgM6T7n/PYhAD/T4zoY8UYU0Pr
qCxwTXHUVT6bkNhjREBkfqq9OkPP8w8=
=GxeN
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.15-rc1.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs mount updates from Christian Brauner:
- Mount notifications
The day has come where we finally provide a new api to listen for
mount topology changes outside of /proc/<pid>/mountinfo. A mount
namespace file descriptor can be supplied and registered with
fanotify to listen for mount topology changes.
Currently notifications for mount, umount and moving mounts are
generated. The generated notification record contains the unique
mount id of the mount.
The listmount() and statmount() api can be used to query detailed
information about the mount using the received unique mount id.
This allows userspace to figure out exactly how the mount topology
changed without having to generating diffs of /proc/<pid>/mountinfo
in userspace.
- Support O_PATH file descriptors with FSCONFIG_SET_FD in the new mount
api
- Support detached mounts in overlayfs
Since last cycle we support specifying overlayfs layers via file
descriptors. However, we don't allow detached mounts which means
userspace cannot user file descriptors received via
open_tree(OPEN_TREE_CLONE) and fsmount() directly. They have to
attach them to a mount namespace via move_mount() first.
This is cumbersome and means they have to undo mounts via umount().
Allow them to directly use detached mounts.
- Allow to retrieve idmappings with statmount
Currently it isn't possible to figure out what idmapping has been
attached to an idmapped mount. Add an extension to statmount() which
allows to read the idmapping from the mount.
- Allow creating idmapped mounts from mounts that are already idmapped
So far it isn't possible to allow the creation of idmapped mounts
from already idmapped mounts as this has significant lifetime
implications. Make the creation of idmapped mounts atomic by allow to
pass struct mount_attr together with the open_tree_attr() system call
allowing to solve these issues without complicating VFS lookup in any
way.
The system call has in general the benefit that creating a detached
mount and applying mount attributes to it becomes an atomic operation
for userspace.
- Add a way to query statmount() for supported options
Allow userspace to query which mount information can be retrieved
through statmount().
- Allow superblock owners to force unmount
* tag 'vfs-6.15-rc1.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (21 commits)
umount: Allow superblock owners to force umount
selftests: add tests for mount notification
selinux: add FILE__WATCH_MOUNTNS
samples/vfs: fix printf format string for size_t
fs: allow changing idmappings
fs: add kflags member to struct mount_kattr
fs: add open_tree_attr()
fs: add copy_mount_setattr() helper
fs: add vfs_open_tree() helper
statmount: add a new supported_mask field
samples/vfs: add STATMOUNT_MNT_{G,U}IDMAP
selftests: add tests for using detached mount with overlayfs
samples/vfs: check whether flag was raised
statmount: allow to retrieve idmappings
uidgid: add map_id_range_up()
fs: allow detached mounts in clone_private_mount()
selftests/overlayfs: test specifying layers as O_PATH file descriptors
fs: support O_PATH fds with FSCONFIG_SET_FD
vfs: add notifications for mount attach and detach
fanotify: notify on mount attach and detach
...
Provide a set of IOCTLs for creating and managing child partitions when
running as root partition on Hyper-V. The new driver is enabled via
CONFIG_MSHV_ROOT.
A brief overview of the interface:
MSHV_CREATE_PARTITION is the entry point, returning a file descriptor
representing a child partition. IOCTLs on this fd can be used to map
memory, create VPs, etc.
Creating a VP returns another file descriptor representing that VP which
in turn has another set of corresponding IOCTLs for running the VP,
getting/setting state, etc.
MSHV_ROOT_HVCALL is a generic "passthrough" hypercall IOCTL which can be
used for a number of partition or VP hypercalls. This is for hypercalls
that do not affect any state in the kernel driver, such as getting and
setting VP registers and partition properties, translating addresses,
etc. It is "passthrough" because the binary input and output for the
hypercall is only interpreted by the VMM - the kernel driver does
nothing but insert the VP and partition id where necessary (which are
always in the same place), and execute the hypercall.
Co-developed-by: Anirudh Rayabharam <anrayabh@linux.microsoft.com>
Signed-off-by: Anirudh Rayabharam <anrayabh@linux.microsoft.com>
Co-developed-by: Jinank Jain <jinankjain@microsoft.com>
Signed-off-by: Jinank Jain <jinankjain@microsoft.com>
Co-developed-by: Mukesh Rathor <mrathor@linux.microsoft.com>
Signed-off-by: Mukesh Rathor <mrathor@linux.microsoft.com>
Co-developed-by: Muminul Islam <muislam@microsoft.com>
Signed-off-by: Muminul Islam <muislam@microsoft.com>
Co-developed-by: Praveen K Paladugu <prapal@linux.microsoft.com>
Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
Co-developed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Co-developed-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
Reviewed-by: Roman Kisel <romank@linux.microsoft.com>
Link: https://lore.kernel.org/r/1741980536-3865-11-git-send-email-nunodasneves@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <1741980536-3865-11-git-send-email-nunodasneves@linux.microsoft.com>
Some fixes may require user space to check if they are applied on the
running kernel before using a specific feature. For instance, this
applies when a restriction was previously too restrictive and is now
getting relaxed (e.g. for compatibility reasons). However, non-visible
changes for legitimate use (e.g. security fixes) do not require an
erratum.
Because fixes are backported down to a specific Landlock ABI, we need a
way to avoid cherry-pick conflicts. The solution is to only update a
file related to the lower ABI impacted by this issue. All the ABI files
are then used to create a bitmask of fixes.
The new errata interface is similar to the one used to get the supported
Landlock ABI version, but it returns a bitmask instead because the order
of fixes may not match the order of versions, and not all fixes may
apply to all versions.
The actual errata will come with dedicated commits. The description is
not actually used in the code but serves as documentation.
Create the landlock_abi_version symbol and use its value to check errata
consistency.
Update test_base's create_ruleset_checks_ordering tests and add errata
tests.
This commit is backportable down to the first version of Landlock.
Fixes: 3532b0b435 ("landlock: Enable user space to infer supported features")
Cc: Günther Noack <gnoack@google.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250318161443.279194-3-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
By default, io_uring marks a waiting task as being in iowait, if it's
sleeping waiting on events and there are pending requests. This isn't
necessarily always useful, and may be confusing on non-storage setups
where iowait isn't expected. It can also cause extra power usage, by
preventing the CPU from entering lower sleep states.
This adds a new enter flag, IORING_ENTER_NO_IOWAIT. If set, then
io_uring will not account the sleeping task as being in iowait. If the
kernel supports this feature, then it will be marked by having the
IORING_FEAT_NO_IOWAIT feature flag set.
As the kernel currently does not support separating the iowait
accounting and CPU frequency boosting, the IORING_ENTER_NO_IOWAIT
controls both of these at the same time. In the future, if those do end
up being split, then it'd be possible to control them separately.
However, it seems more likely that the kernel will decouple iowait and
CPU frequency boosting anyway.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
* kvm-arm64/writable-midr:
: Writable implementation ID registers, courtesy of Sebastian Ott
:
: Introduce a new capability that allows userspace to set the
: ID registers that identify a CPU implementation: MIDR_EL1, REVIDR_EL1,
: and AIDR_EL1. Also plug a hole in KVM's trap configuration where
: SMIDR_EL1 was readable at EL1, despite the fact that KVM does not
: support SME.
KVM: arm64: Fix documentation for KVM_CAP_ARM_WRITABLE_IMP_ID_REGS
KVM: arm64: Copy MIDR_EL1 into hyp VM when it is writable
KVM: arm64: Copy guest CTR_EL0 into hyp VM
KVM: selftests: arm64: Test writes to MIDR,REVIDR,AIDR
KVM: arm64: Allow userspace to change the implementation ID registers
KVM: arm64: Load VPIDR_EL2 with the VM's MIDR_EL1 value
KVM: arm64: Maintain per-VM copy of implementation ID regs
KVM: arm64: Set HCR_EL2.TID1 unconditionally
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
When we currently create a pidfd we check that the task hasn't been
reaped right before we create the pidfd. But it is of course possible
that by the time we return the pidfd to userspace the task has already
been reaped since we don't check again after having created a dentry for
it.
This was fine until now because that race was meaningless. But now that
we provide PIDFD_INFO_EXIT it is a problem because it is possible that
the kernel returns a reaped pidfd and it depends on the race whether
PIDFD_INFO_EXIT information is available. This depends on if the task
gets reaped before or after a dentry has been attached to struct pid.
Make this consistent and only returned pidfds for reaped tasks if
PIDFD_INFO_EXIT information is available. This is done by performing
another check whether the task has been reaped right after we attached a
dentry to struct pid.
Since pidfs_exit() is called before struct pid's task linkage is removed
the case where the task got reaped but a dentry was already attached to
struct pid and exit information was recorded and published can be
handled correctly. In that case we do return a pidfd for a reaped task
like we would've before.
Link: https://lore.kernel.org/r/20250316-kabel-fehden-66bdb6a83436@brauner
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
The zstd and zlib compression types support setting compression levels.
Enhance the defrag interface to specify the levels as well. For zstd the
negative (realtime) levels are also accepted.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Daniel Vacek <neelx@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Aside from the IOPF framework, iommufd provides an additional pathway to
report hardware events, via the vEVENTQ of vIOMMU infrastructure.
Define an iommu_vevent_arm_smmuv3 uAPI structure, and report stage-1 events
in the threaded IRQ handler. Also, add another four event record types that
can be forwarded to a VM.
Link: https://patch.msgid.link/r/5cf6719682fdfdabffdb08374cdf31ad2466d75a.1741719725.git.nicolinc@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Introduce a new IOMMUFD_OBJ_VEVENTQ object for vIOMMU Event Queue that
provides user space (VMM) another FD to read the vIOMMU Events.
Allow a vIOMMU object to allocate vEVENTQs, with a condition that each
vIOMMU can only have one single vEVENTQ per type.
Add iommufd_veventq_alloc() with iommufd_veventq_ops for the new ioctl.
Link: https://patch.msgid.link/r/21acf0751dd5c93846935ee06f93b9c65eff5e04.1741719725.git.nicolinc@nvidia.com
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
- bump version strings, by Simon Wunderlich
- drop batadv_priv_debug_log struct, by Sven Eckelmann
- adopt netdev_hold() / netdev_put(), by Eric Dumazet
- add support for jumbo frames, by Sven Eckelmann
- use consistent name for mesh interface, by Sven Eckelmann
- cleanup B.A.T.M.A.N. IV OGM aggregation handling,
by Sven Eckelmann (4 patches)
- add missing newlines for log macros, by Sven Eckelmann
-----BEGIN PGP SIGNATURE-----
iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAmfTCE0WHHN3QHNpbW9u
d3VuZGVybGljaC5kZQAKCRChK+OYQpKeoea+EACQ3fJp7kf6iWxhJLiwon3Aoo4u
dg84xX5nc02KHT/zUC/ADSSN3FDdc3y8jgjpLfZv5TkWmRUqMNP9449PaoXpiIsW
oI6HKrVaEKIEPpMAaBrv0dPtG5fBZPku4tZBLP7ekU/2N0Ri3eKTCph7IznO3Ma5
cM4FN6H9Nxwnht2qN7Gx3nWoBn9LeaSyKfAFLR+hI3QxTh9PYLV26rGCLrUhbbvT
St8TtgdeULSH/50rhMqC4wpT8verk+BMJdShNPLyse4QSRky6DD6YOj7zS/c7akP
z8S77gvp0mcgzb/t49K4AFjUFEvK9l0rvPDx6DMrfJwQAZUPjFY93y1XI5QUOPRZ
+kVj0nr2XRc7c1/wDNfn4131T9Twi41+TX6BI8Fan8WpWKVTo2a/O5VWk5pHKjy+
a8aZqL41444wWyFhyCL+Io1y1mDc9TDa7644+fJjR5H4h47M9rGnLVEX3F9mVtBl
LIOwpHpHXg9z3JDgpCwLWykUzW4lEuF1hQFUD3DPIbVKENjvibiiB+qPNZJG1icL
n9Pt6843vF8gcw3aGbp0VGsESILiKr7EN7CMkhfcO1CaqAErJG4DinubH6DLQIjO
uW5Ss2751BfqWPOakWVzvvWafnleVRdw4rysbil4ko/xONg0qdfTMWe5fS42bkbk
CMmGPNeSiqVTXp5bIg==
=hmiI
-----END PGP SIGNATURE-----
Merge tag 'batadv-next-pullrequest-20250313' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
This feature/cleanup patchset includes the following patches:
- bump version strings, by Simon Wunderlich
- drop batadv_priv_debug_log struct, by Sven Eckelmann
- adopt netdev_hold() / netdev_put(), by Eric Dumazet
- add support for jumbo frames, by Sven Eckelmann
- use consistent name for mesh interface, by Sven Eckelmann
- cleanup B.A.T.M.A.N. IV OGM aggregation handling,
by Sven Eckelmann (4 patches)
- add missing newlines for log macros, by Sven Eckelmann
* tag 'batadv-next-pullrequest-20250313' of git://git.open-mesh.org/linux-merge:
batman-adv: add missing newlines for log macros
batman-adv: Limit aggregation size to outgoing MTU
batman-adv: Use actual packet count for aggregated packets
batman-adv: Switch to bitmap helper for aggregation handling
batman-adv: Limit number of aggregated packets directly
batman-adv: Use consistent name for mesh interface
batman-adv: Add support for jumbo frames
batman-adv: adopt netdev_hold() / netdev_put()
batman-adv: Drop batadv_priv_debug_log struct
batman-adv: Start new development cycle
====================
Link: https://patch.msgid.link/20250313164519.72808-1-sw@simonwunderlich.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
There are a few conflicts between the work that went
into wireless and that's here now, resolve them.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Currently BPF_BTF_GET_FD_BY_ID requires CAP_SYS_ADMIN, which does not
allow running it from user namespace. This creates a problem when
freplace program running from user namespace needs to query target
program BTF.
This patch relaxes capable check from CAP_SYS_ADMIN to CAP_BPF and adds
support for BPF token that can be passed in attributes to syscall.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250317174039.161275-2-mykyta.yatsenko5@gmail.com
With AccECN, there's one additional TCP flag to be used (AE)
and ACE field that overloads the definition of AE, CWR, and
ECE flags. As tcp_flags was previously only 1 byte, the
byte-order stuff needs to be added to it's handling.
Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The commit 97c79a38cd ("perf core: Per event callchain limit")
introduced a per-event term to allow finer tuning of the depth of
callchains to save space.
It should be applied to the branch stack as well. For example, autoFDO
collections require maximum LBR entries. In the meantime, other
system-wide LBR users may only be interested in the latest a few number
of LBRs. A per-event LBR depth would save the perf output buffer.
The patch simply drops the uninterested branches, but HW still collects
the maximum branches. There may be a model-specific optimization that
can reduce the HW depth for some cases to reduce the overhead further.
But it isn't included in the patch set. Because it's not useful for all
cases. For example, ARCH LBR can utilize the PEBS and XSAVE to collect
LBRs. The depth should have less impact on the collecting overhead.
The model-specific optimization may be implemented later separately.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20250310181536.3645382-1-kan.liang@linux.intel.com
We currently leave the decision of whether to shutdown or reboot to
protect hardware in an emergency situation to the individual drivers.
This works out in some cases, where the driver detecting the critical
failure has inside knowledge: It binds to the system management controller
for example or is guided by hardware description that defines what to do.
In the general case, however, the driver detecting the issue can't know
what the appropriate course of action is and shouldn't be dictating the
policy of dealing with it.
Therefore, add a global hw_protection toggle that allows the user to
specify whether shutdown or reboot should be the default action when the
driver doesn't set policy.
This introduces no functional change yet as hw_protection_trigger() has no
callers, but these will be added in subsequent commits.
[arnd@arndb.de: hide unused hw_protection_attr]
Link: https://lkml.kernel.org/r/20250224141849.1546019-1-arnd@kernel.org
Link: https://lkml.kernel.org/r/20250217-hw_protection-reboot-v3-7-e1c09b090c0c@pengutronix.de
Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Reviewed-by: Tzung-Bi Shih <tzungbi@kernel.org>
Cc: Benson Leung <bleung@chromium.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Fabio Estevam <festevam@denx.de>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Girdwood <lgirdwood@gmail.com>
Cc: Lukasz Luba <lukasz.luba@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Matteo Croce <teknoraver@meta.com>
Cc: Matti Vaittinen <mazziesaccount@gmail.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Rob Herring (Arm) <robh@kernel.org>
Cc: Rui Zhang <rui.zhang@intel.com>
Cc: Sascha Hauer <kernel@pengutronix.de>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Introduce BPF instructions with load-acquire and store-release
semantics, as discussed in [1]. Define 2 new flags:
#define BPF_LOAD_ACQ 0x100
#define BPF_STORE_REL 0x110
A "load-acquire" is a BPF_STX | BPF_ATOMIC instruction with the 'imm'
field set to BPF_LOAD_ACQ (0x100).
Similarly, a "store-release" is a BPF_STX | BPF_ATOMIC instruction with
the 'imm' field set to BPF_STORE_REL (0x110).
Unlike existing atomic read-modify-write operations that only support
BPF_W (32-bit) and BPF_DW (64-bit) size modifiers, load-acquires and
store-releases also support BPF_B (8-bit) and BPF_H (16-bit). As an
exception, however, 64-bit load-acquires/store-releases are not
supported on 32-bit architectures (to fix a build error reported by the
kernel test robot).
An 8- or 16-bit load-acquire zero-extends the value before writing it to
a 32-bit register, just like ARM64 instruction LDARH and friends.
Similar to existing atomic read-modify-write operations, misaligned
load-acquires/store-releases are not allowed (even if
BPF_F_ANY_ALIGNMENT is set).
As an example, consider the following 64-bit load-acquire BPF
instruction (assuming little-endian):
db 10 00 00 00 01 00 00 r0 = load_acquire((u64 *)(r1 + 0x0))
opcode (0xdb): BPF_ATOMIC | BPF_DW | BPF_STX
imm (0x00000100): BPF_LOAD_ACQ
Similarly, a 16-bit BPF store-release:
cb 21 00 00 10 01 00 00 store_release((u16 *)(r1 + 0x0), w2)
opcode (0xcb): BPF_ATOMIC | BPF_H | BPF_STX
imm (0x00000110): BPF_STORE_REL
In arch/{arm64,s390,x86}/net/bpf_jit_comp.c, have
bpf_jit_supports_insn(..., /*in_arena=*/true) return false for the new
instructions, until the corresponding JIT compiler supports them in
arena.
[1] https://lore.kernel.org/all/20240729183246.4110549-1-yepeilin@google.com/
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: kernel test robot <lkp@intel.com>
Signed-off-by: Peilin Ye <yepeilin@google.com>
Link: https://lore.kernel.org/r/a217f46f0e445fbd573a1a024be5c6bf1d5fe716.1741049567.git.yepeilin@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Currently for bpf progs in a cgroup hierarchy, the effective prog array
is computed from bottom cgroup to upper cgroups (post-ordering). For
example, the following cgroup hierarchy
root cgroup: p1, p2
subcgroup: p3, p4
have BPF_F_ALLOW_MULTI for both cgroup levels.
The effective cgroup array ordering looks like
p3 p4 p1 p2
and at run time, progs will execute based on that order.
But in some cases, it is desirable to have root prog executes earlier than
children progs (pre-ordering). For example,
- prog p1 intends to collect original pkt dest addresses.
- prog p3 will modify original pkt dest addresses to a proxy address for
security reason.
The end result is that prog p1 gets proxy address which is not what it
wants. Putting p1 to every child cgroup is not desirable either as it
will duplicate itself in many child cgroups. And this is exactly a use case
we are encountering in Meta.
To fix this issue, let us introduce a flag BPF_F_PREORDER. If the flag
is specified at attachment time, the prog has higher priority and the
ordering with that flag will be from top to bottom (pre-ordering).
For example, in the above example,
root cgroup: p1, p2
subcgroup: p3, p4
Let us say p2 and p4 are marked with BPF_F_PREORDER. The final
effective array ordering will be
p2 p4 p3 p1
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250224230116.283071-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Additions to the error enum after explicit 0x27 setting for
SEV_RET_INVALID_KEY leads to incorrect value assignments.
Use explicit values to match the manufacturer specifications more
clearly.
Fixes: 3a45dc2b41 ("crypto: ccp: Define the SEV-SNP commands")
CC: stable@vger.kernel.org
Signed-off-by: Dionna Glaze <dionnaglaze@google.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Convert TDG.VP.VMCALL<ReportFatalError> to KVM_EXIT_SYSTEM_EVENT with
a new type KVM_SYSTEM_EVENT_TDX_FATAL and forward it to userspace for
handling.
TD guest can use TDG.VP.VMCALL<ReportFatalError> to report the fatal
error it has experienced. This hypercall is special because TD guest
is requesting a termination with the error information, KVM needs to
forward the hypercall to userspace anyway, KVM doesn't do parsing or
conversion, it just dumps the 16 general-purpose registers to userspace
and let userspace decide what to do.
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Message-ID: <20250222014225.897298-8-binbin.wu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
These IRQ_TYPE_* defines are used by both drivers/misc/pci_endpoint_test.c
and tools/testing/selftests/pci_endpoint/pci_endpoint_test.c.
Considering that both the misc driver and the selftest already includes
the pcitest.h UAPI header, it makes sense for these IRQ_TYPE_* defines
to be located in the pcitest.h UAPI header.
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Link: https://lore.kernel.org/r/20250310111016.859445-10-cassel@kernel.org
The usual mixture of new drivers, support in existing drivers for new
devices, a range of features and general subsystem cleanup.
Two merges of immutable branches in here:
* SPI offload support. Culmination of a long effort to bring the ability
to offload triggered sequences of SPI operations to specific hardware,
allow high datarate acquisition over an SPI bus (if you have the right
hardware / FPGA firmware)
* GPIO set-array-helper - enables code simplification.
New device support
==================
adi,ad3552r-hs:
- Add support for AD3541r and AD3542r via newly supported FPGA HDL.
adi,ad4030
- New driver supporting the AD4030, AD4630 AD4630-16, AD4640-24, AD4632-16,
AD4632-24 1 and 2 channel high precision SPI ADCs.
adi,ad4851
- New driver and backend support for the AD4851, AD4852, AD4853, AD4854,
AD4855, AD4846, AD4857, AD4858 and AD4858I high speed multichannel
simultaneous sampling ADCs.
adi,ad7191
- New driver for this 24-bit ADC for precision bridge applications,
adi,ad7380
- Add support for the adaq4381-4 which is a 14-bit version of the
already supported adaq4380-1
adi,adis16550
- New driver using the ADIS library (which needed extensions) for this
IMU.
brcm,apds9160
- New driver for this proximity and ambient light sensor.
dynaimage,al3000a
- New driver for this illuminance sensor.
mcube,mc3230
- Add support for the mc3510c accelerometer with a different scale to existing
supported parts (some rework preceded this)
nxp,imx93
- Add compatibles for imx94 and imx95 which are fully compatible with imx93.
rockchip,saradc
- Add support for the RK3528 ADC
- Add support for the RK3562 ADC
silab,si7210
- New driver to support this I2C Hall effect magnetic position sensor.
ti,ads7138
- New driver supporting the ADS7128 and AD7138 I2C ADCs.
Staging driver drop
===================
adi,adis16240
- Drop this impact sensor. Interesting part but complex hence never left
staging due to ABI challenges. No longer readily available so drop driver.
New features
============
Documentation
- A really nice overview document introduce ADC terminology and how
it maps to IIO.
core
- New description for FAULT events, used in the ad7173.
- filter_type ABI used in ad4130.
buffer-dmaengine
- Split DMA channel request from buffer allocation (for SPI offload)
- Add a new _with_handle setup variant. (for SPI offload)
adi,adf4371
- Add control of reference clock type and support for frequency doubling
where appropriate.
adi,ad4695
- Support SPI offload.
- Support oversampling control.
adi,ad5791
- Support SPI offload.
adi,ad7124
- Add channel calibration support.
adi,ad7380:
- Alert support (threshold interrupts)
- SPI offload support.
adi,ad7606
- Support writing registers when using backend enabling software control
of modes.
adi,ad7944
- Support SPI offload.
adi,ad9832
- Use devm_regulator_get_enable() to simplify code.
adi,ad9834
- Use devm_regulator_get_enable() to simplify code.
adi,adxl345
- Improve IRQ handling code.
- Add debug access to registers.
bosch,bmi270
- Add temperature channel support.
- Add data ready trigger.
google,cross_ec
- Add trace events.
mcube,mc3230
- Add mount matrix support
- Add an OF match table.
Cleanup and minor bug fixes
===========================
Tree wide:
- Stop using iio_device_claim_direct_scoped() and introduce sparse friendly
iio_device_claim/release_direct()
The conditional scoped cleanup has proved hard to deal with, requiring
workarounds for various compiler issues and in is rather non-intuitive
so abandon that experiment. One of the attractions of that approach was
that it made it much harder to have unbalanced claim/release bugs so
instead introduce a conditional-lock style boolean returning new pair
of functions. These are inline in the header and have __acquire and
__release calls allowing sparse to detect lack of balance. There are
occasional false positives but so far those have reflected complex code
paths that benefited from cleanup anyway.
The first set of driver conversions are in this pull request, more to
follow next cycle. Various related cleanup in drivers.
Removal of the _scoped code is completed and the definition removed.
- Use of str_enable_disable() and similar helpers.
- Don't set regmap cache to REGCACHE_NONE as that's the default anyway.
- Change some caches from RBTREE to MAPLE reflecting best practice.
- Use the new gpiod_multi_set_value_cansleep()
- Make sure to grab direct mode for some calibrations paths.
- Avoid using memcmp on structures when checking for matching channel configs.
Instead just match field by field.
dt-bindings:
- Fix up indentation inconsistencies.
gts-helper:
- Simplify building of available scale table.
adi,ad-sigma-delta
- Make sure to disable channel after calibration done.
- Add error handling in configuring channel during calibration.
adi,ad2s1201
- use a bitmap_write() rather than directly accessing underlying storage.
adi,ad3552r-hs
- Fix a wrong error message.
- Make sure to use instruction mode for configuration.
adi,ad4695
- Add a conversion to ensure exit from conversion mode.
- Use custom regmap to handle required sclk rate change.
- Fix an out of bounds array access
- Simplify oversampling ratio handling.
adi,ad4851
- Fix a sign bug.
adi,ad5791
- Fix wrong exported number of storage bits.
adi,ad7124
- Disable all channels at probe to avoid strange initial configurations.
adi,ad7173
- Rework to allow static const struct ad_sigma_delta without need
to make a copy.
adi,ad7623
- Drop a BSD license tag that the authors consider unnecessary.
adi,ad7768-1
- Fix channels sign description exposed to user space.
- Set MOSI idle state to avoid accidental device reset.
- Avoid some overkill locking.
adi,axi-dac
- Check if device interface is busy when enabling data stream.
- Add control of bus mode.
bosch,bmi270
- Move a struct definition to a c file as only used there.
vishay,veml6030
- Enable regmap cache to reduce bus traffic.
- Fix ABI bug around scale reporting.
vishay,vem6075
- Check array bounds to harden against broken hardware.
Various other minor tweaks and fixes not called out.
*
-----BEGIN PGP SIGNATURE-----
iQJFBAABCAAvFiEEbilms4eEBlKRJoGxVIU0mcT0FogFAmfTNbkRHGppYzIzQGtl
cm5lbC5vcmcACgkQVIU0mcT0Foi41Q/+NTlPu1e9TqOqhXmpJc56EP5wnBaKuxMF
pgPjKr1bbh6Th131kdeCjcTtUyMeLoLLPzcmSzPVVBOLanTHw9d5DQutg7+k0yUE
8L0FEGEuSJeN6u6hnCFyKhOgu2xSPsCj1L99UCCsfXrvkUILSQgXSzoUkZDonmWj
gDd/Zfn1Otxevxct9KG4dHhiy17f734Oxhjv3xyMwHWsUtHndAJiYeYRzRxjMJTh
GeZHZMYvhnseLagAzfp3Dsqc5UCwm/Kmstz8wTlDW0kq/wbZqPwVLiC2xhPnTeds
PqAL3m/n7Z1SYBa5ouKZYcn2pMfcYNsd3ZfbFMIX0gecGZZ6zRfSgulVDZKe7tp+
cFbGhsIVWHt4TLWBevz6xihhyv/8ashi0PfIE1Gju8GZqqwwyR6wmdKJ1CNq3DmL
W9+74w7xAbN+Fg9C/EbuSk/+uKXFrdf9w1NtpnPzKWVwnR0tFwavgpxRCJpu9xMn
63nO/NCl+FtxibBBrpsTAb03hA4Fw595kiBBvkNSoNOYmoASj+w/bHhIFohZeteP
pS3eAQlMRCH4ZCjRDNHIEwO9xoja5aqprhnGCNrVQYR+OCfUw5ELf6GwYpEsmvtt
GyzgR92qxFBQz71h1/VyvAjxcaNRezO92VJajj+T2iJvj5lD6o/0+cscYWfEWEzn
dxuaXrpF3ak=
=4tID
-----END PGP SIGNATURE-----
Merge tag 'iio-for-6.15a' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jic23/iio into char-misc-next
Jonathan writes:
IIO: New device support, features and cleanup for the 6.15 cycle.
The usual mixture of new drivers, support in existing drivers for new
devices, a range of features and general subsystem cleanup.
Two merges of immutable branches in here:
* SPI offload support. Culmination of a long effort to bring the ability
to offload triggered sequences of SPI operations to specific hardware,
allow high datarate acquisition over an SPI bus (if you have the right
hardware / FPGA firmware)
* GPIO set-array-helper - enables code simplification.
New device support
==================
adi,ad3552r-hs:
- Add support for AD3541r and AD3542r via newly supported FPGA HDL.
adi,ad4030
- New driver supporting the AD4030, AD4630 AD4630-16, AD4640-24, AD4632-16,
AD4632-24 1 and 2 channel high precision SPI ADCs.
adi,ad4851
- New driver and backend support for the AD4851, AD4852, AD4853, AD4854,
AD4855, AD4846, AD4857, AD4858 and AD4858I high speed multichannel
simultaneous sampling ADCs.
adi,ad7191
- New driver for this 24-bit ADC for precision bridge applications,
adi,ad7380
- Add support for the adaq4381-4 which is a 14-bit version of the
already supported adaq4380-1
adi,adis16550
- New driver using the ADIS library (which needed extensions) for this
IMU.
brcm,apds9160
- New driver for this proximity and ambient light sensor.
dynaimage,al3000a
- New driver for this illuminance sensor.
mcube,mc3230
- Add support for the mc3510c accelerometer with a different scale to existing
supported parts (some rework preceded this)
nxp,imx93
- Add compatibles for imx94 and imx95 which are fully compatible with imx93.
rockchip,saradc
- Add support for the RK3528 ADC
- Add support for the RK3562 ADC
silab,si7210
- New driver to support this I2C Hall effect magnetic position sensor.
ti,ads7138
- New driver supporting the ADS7128 and AD7138 I2C ADCs.
Staging driver drop
===================
adi,adis16240
- Drop this impact sensor. Interesting part but complex hence never left
staging due to ABI challenges. No longer readily available so drop driver.
New features
============
Documentation
- A really nice overview document introduce ADC terminology and how
it maps to IIO.
core
- New description for FAULT events, used in the ad7173.
- filter_type ABI used in ad4130.
buffer-dmaengine
- Split DMA channel request from buffer allocation (for SPI offload)
- Add a new _with_handle setup variant. (for SPI offload)
adi,adf4371
- Add control of reference clock type and support for frequency doubling
where appropriate.
adi,ad4695
- Support SPI offload.
- Support oversampling control.
adi,ad5791
- Support SPI offload.
adi,ad7124
- Add channel calibration support.
adi,ad7380:
- Alert support (threshold interrupts)
- SPI offload support.
adi,ad7606
- Support writing registers when using backend enabling software control
of modes.
adi,ad7944
- Support SPI offload.
adi,ad9832
- Use devm_regulator_get_enable() to simplify code.
adi,ad9834
- Use devm_regulator_get_enable() to simplify code.
adi,adxl345
- Improve IRQ handling code.
- Add debug access to registers.
bosch,bmi270
- Add temperature channel support.
- Add data ready trigger.
google,cross_ec
- Add trace events.
mcube,mc3230
- Add mount matrix support
- Add an OF match table.
Cleanup and minor bug fixes
===========================
Tree wide:
- Stop using iio_device_claim_direct_scoped() and introduce sparse friendly
iio_device_claim/release_direct()
The conditional scoped cleanup has proved hard to deal with, requiring
workarounds for various compiler issues and in is rather non-intuitive
so abandon that experiment. One of the attractions of that approach was
that it made it much harder to have unbalanced claim/release bugs so
instead introduce a conditional-lock style boolean returning new pair
of functions. These are inline in the header and have __acquire and
__release calls allowing sparse to detect lack of balance. There are
occasional false positives but so far those have reflected complex code
paths that benefited from cleanup anyway.
The first set of driver conversions are in this pull request, more to
follow next cycle. Various related cleanup in drivers.
Removal of the _scoped code is completed and the definition removed.
- Use of str_enable_disable() and similar helpers.
- Don't set regmap cache to REGCACHE_NONE as that's the default anyway.
- Change some caches from RBTREE to MAPLE reflecting best practice.
- Use the new gpiod_multi_set_value_cansleep()
- Make sure to grab direct mode for some calibrations paths.
- Avoid using memcmp on structures when checking for matching channel configs.
Instead just match field by field.
dt-bindings:
- Fix up indentation inconsistencies.
gts-helper:
- Simplify building of available scale table.
adi,ad-sigma-delta
- Make sure to disable channel after calibration done.
- Add error handling in configuring channel during calibration.
adi,ad2s1201
- use a bitmap_write() rather than directly accessing underlying storage.
adi,ad3552r-hs
- Fix a wrong error message.
- Make sure to use instruction mode for configuration.
adi,ad4695
- Add a conversion to ensure exit from conversion mode.
- Use custom regmap to handle required sclk rate change.
- Fix an out of bounds array access
- Simplify oversampling ratio handling.
adi,ad4851
- Fix a sign bug.
adi,ad5791
- Fix wrong exported number of storage bits.
adi,ad7124
- Disable all channels at probe to avoid strange initial configurations.
adi,ad7173
- Rework to allow static const struct ad_sigma_delta without need
to make a copy.
adi,ad7623
- Drop a BSD license tag that the authors consider unnecessary.
adi,ad7768-1
- Fix channels sign description exposed to user space.
- Set MOSI idle state to avoid accidental device reset.
- Avoid some overkill locking.
adi,axi-dac
- Check if device interface is busy when enabling data stream.
- Add control of bus mode.
bosch,bmi270
- Move a struct definition to a c file as only used there.
vishay,veml6030
- Enable regmap cache to reduce bus traffic.
- Fix ABI bug around scale reporting.
vishay,vem6075
- Check array bounds to harden against broken hardware.
Various other minor tweaks and fixes not called out.
*
* tag 'iio-for-6.15a' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jic23/iio: (223 commits)
doc: iio: ad7380: describe offload support
iio: ad7380: add support for SPI offload
iio: light: Add check for array bounds in veml6075_read_int_time_ms
iio: adc: ti-ads7924 Drop unnecessary function parameters
staging: iio: ad9834: Use devm_regulator_get_enable()
staging: iio: ad9832: Use devm_regulator_get_enable()
iio: gyro: bmg160_spi: add of_match_table
dt-bindings: iio: adc: Add i.MX94 and i.MX95 support
iio: adc: ad7768-1: remove unnecessary locking
Documentation: ABI: add wideband filter type to sysfs-bus-iio
iio: adc: ad7768-1: set MOSI idle state to prevent accidental reset
iio: adc: ad7768-1: Fix conversion result sign
iio: adc: ad7124: Benefit of dev = indio_dev->dev.parent in ad7124_parse_channel_config()
iio: adc: ad7124: Implement system calibration
iio: adc: ad7124: Implement internal calibration at probe time
iio: adc: ad_sigma_delta: Add error checking for ad_sigma_delta_set_channel()
iio: adc: ad4130: Adapt internal names to match official filter_type ABI
iio: adc: ad7173: Fix comparison of channel configs
iio: adc: ad7124: Fix comparison of channel configs
iio: adc: ad4130: Fix comparison of channel setups
...
Checkpoint/Restore in Userspace (CRIU) requires to reconstruct posix timers
with the same timer ID on restore. It uses sys_timer_create() and relies on
the monotonic increasing timer ID provided by this syscall. It creates and
deletes timers until the desired ID is reached. This is can loop for a long
time, when the checkpointed process had a very sparse timer ID range.
It has been debated to implement a new syscall to allow the creation of
timers with a given timer ID, but that's tideous due to the 32/64bit compat
issues of sigevent_t and of dubious value.
The restore mechanism of CRIU creates the timers in a state where all
threads of the restored process are held on a barrier and cannot issue
syscalls. That means the restorer task has exclusive control.
This allows to address this issue with a prctl() so that the restorer
thread can do:
if (prctl(PR_TIMER_CREATE_RESTORE_IDS, PR_TIMER_CREATE_RESTORE_IDS_ON))
goto linear_mode;
create_timers_with_explicit_ids();
prctl(PR_TIMER_CREATE_RESTORE_IDS, PR_TIMER_CREATE_RESTORE_IDS_OFF);
This is backwards compatible because the prctl() fails on older kernels and
CRIU can fall back to the linear timer ID mechanism. CRIU versions which do
not know about the prctl() just work as before.
Implement the prctl() and modify timer_create() so that it copies the
requested timer ID from userspace by utilizing the existing timer_t
pointer, which is used to copy out the allocated timer ID on success.
If the prctl() is disabled, which it is by default, timer_create() works as
before and does not try to read from the userspace pointer.
There is no problem when a broken or rogue user space application enables
the prctl(). If the user space pointer does not contain a valid ID, then
timer_create() fails. If the data is not initialized, but constains a
random valid ID, timer_create() will create that random timer ID or fail if
the ID is already given out.
As CRIU must use the raw syscall to avoid manipulating the internal state
of the restored process, this has no library dependencies and can be
adopted by CRIU right away.
Recreating two timers with IDs 1000000 and 2000000 takes 1.5 seconds with
the create/delete method. With the prctl() it takes 3 microseconds.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>
Tested-by: Cyrill Gorcunov <gorcunov@gmail.com>
Link: https://lore.kernel.org/all/87jz8vz0en.ffs@tglx
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmfOKBUeHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG1aQH/iC+Oyij4VxAjBek
BOXIT/p6CwlIXb8ObiWWcRjDPizlcxb3RaV8J2RO+IqaQ2wltxpFANq2G7Re2FPm
SNcEpIURAOVcxHGedcfFA91srO5F4FzNTO8LVp7MIbcgMYy3pdk+dbZmi6A691R+
t9pb74m+MAnF1o/MUx7pUlhAT/4ymuuR0F7WCSg4h0Xwe5m0nlJY89kJBC7PCjyd
n3mdhsz3rDSLmt/z/T7HGD89r8sYSvm9cOKtL3ELgGTrm7boQV8ii9Y9w04DI8PQ
JmIernugcCxmhH36mVUAHgJf2+/T388xFUh/D5+skeUOUZpaJZG866rnb32WpsHc
eWLFUeg=
=Wypt
-----END PGP SIGNATURE-----
Backmerge tag 'v6.14-rc6' into drm-next
This is a backmerge from Linux 6.14-rc6, needed for the nova PR.
Signed-off-by: Dave Airlie <airlied@redhat.com>
counter:
- Introduce the COUNTER_EVENT_DIRECTION_CHANGE event
- Introduce the COUNTER_COMP_COMPARE helper macro
microchip-tcb-cpature:
- Add IRQ handling
- Add support for capture extensions
- Add support for compare extension
ti-eqep:
- Add support for reading and detecting changes in direction
tools/counter:
- Add counter_watch_events executable to .gitignore
- Support COUNTER_EVENT_DIRECTION_CHANGE in counter_watch_events tool
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQSNN83d4NIlKPjon7a1SFbKvhIjKwUCZ8+ECgAKCRC1SFbKvhIj
K/i+AQDS3TVUcot2C+NsvNKud2hA2Q7IIIsAsorQgSdjnx9wDAD8CfQvuvEr6JhZ
WuWxw8L59xJAYlInQjpP89pZQA4g+Ao=
=b0Dk
-----END PGP SIGNATURE-----
Merge tag 'counter-updates-for-6.15' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/wbg/counter into char-misc-next
William writes:
Counter updates for 6.15
counter:
- Introduce the COUNTER_EVENT_DIRECTION_CHANGE event
- Introduce the COUNTER_COMP_COMPARE helper macro
microchip-tcb-cpature:
- Add IRQ handling
- Add support for capture extensions
- Add support for compare extension
ti-eqep:
- Add support for reading and detecting changes in direction
tools/counter:
- Add counter_watch_events executable to .gitignore
- Support COUNTER_EVENT_DIRECTION_CHANGE in counter_watch_events tool
* tag 'counter-updates-for-6.15' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/wbg/counter:
counter: microchip-tcb-capture: Add support for RC Compare
counter: Introduce the compare component
counter: microchip-tcb-capture: Add capture extensions for registers RA/RB
counter: microchip-tcb-capture: Add IRQ handling
counter: ti-eqep: add direction support
tools/counter: add direction change event to watcher
counter: add direction change event
tools/counter: gitignore counter_watch_events
Some regulatory bodies doesn't allow IR (initiate radioation) on a
specific subband, but allows it for channels with a bandwidth of 20 MHz.
Add a channel flag that indicates that, and consider it in
cfg80211_reg_check_beaconing.
While on it, fix the kernel doc of enum nl80211_reg_rule_flags and
change it to use BIT().
Signed-off-by: Anjaneyulu <pagadala.yesu.anjaneyulu@intel.com>
Co-developed-by: Somashekhar Puttagangaiah <somashekhar.puttagangaiah@intel.com>
Signed-off-by: Somashekhar Puttagangaiah <somashekhar.puttagangaiah@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250308225541.d3ab352a73ff.I8a8f79e1c9eb74936929463960ee2a324712fe51@changeid
[fix typo]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Some extended MLD capabilities and operations bits (currently
the "BTM MLD Recommendataion For Multiple APs Support" bit)
may depend on userspace capabilities. Allow userspace to pass
the values for this field that it supports to the association
and link reconfiguration operations.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Ilan Peer <ilan.peer@intel.com>
Link: https://patch.msgid.link/20250308225541.bd52078b5f65.I4dd8f53b0030db7ea87a2e0920989e7e2c7b5345@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The legacy rpc.nfsd tool will set the nlm_grace_period if the NFSv4
grace period is set. nfsdctl is missing this functionality, so add a new
netlink control interface for lockd that it can use. For now, it only
allows setting the grace period, and the tcp and udp listener ports.
lockd currently uses module parameters and sysctls for configuration, so
all of its settings are global. With this change, lockd now tracks these
values on a per-net-ns basis. It will only fall back to using the global
values if any of them are 0.
Finally, as a backward compatibility measure, if updating the nlm
settings in the init_net namespace, also update the legacy global
values to match.
Link: https://issues.redhat.com/browse/RHEL-71698
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
TCB hardware is capable of capturing the timer value to registers RA and
RB. Add these registers as capture extensions.
Signed-off-by: Bence Csókás <csokas.bence@prolan.hu>
Link: https://lore.kernel.org/r/20250306134441.582819-3-csokas.bence@prolan.hu
Signed-off-by: William Breathitt Gray <wbg@kernel.org>
Add interrupt servicing to allow userspace to wait for the following:
* Change-of-state caused by external trigger
* Capture of timer value into RA/RB
* Compare to RC register
* Overflow
Signed-off-by: Bence Csókás <csokas.bence@prolan.hu>
Link: https://lore.kernel.org/r/20250306134441.582819-2-csokas.bence@prolan.hu
Signed-off-by: William Breathitt Gray <wbg@kernel.org>
Add support for more per-process flags starting with option to configure
MFMA precision for gfx 9.5
v2: Change flag name to KFD_PROC_FLAG_MFMA_HIGH_PRECISION
Remove unused else condition
v3: Bump the KFD API version
v4: Missed SH_MEM_CONFIG__PRECISION_MODE__SHIFT define. Added it.
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Amber Lin <Amber.Lin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add the remaining SHF_ flags, as listed in the "Executable and
Linkable Format" Wikipedia page and the System V Application Binary
Interface[1]. This allows drivers to load and parse ELF images that use
some of those flags.
In particular, an upcoming change to the Nouveau GPU driver will
use some of the flags.
Link: https://refspecs.linuxfoundation.org/elf/gabi4+/ch4.sheader.html#sh_flags [1]
Signed-off-by: Timur Tabi <ttabi@nvidia.com>
Link: https://lore.kernel.org/r/20250307171417.267488-1-ttabi@nvidia.com
Signed-off-by: Kees Cook <kees@kernel.org>
Per PCIe r6.0, sec 7.8.6.2, devices can advertise Resizable BAR sizes up to
128 TB in the Resizable BAR Capability register. Larger sizes can be
advertised via the Capability register, but that requires an API change.
Update pci_rebar_get_possible_sizes() and pbus_size_mem() to increase the
sizes we currently support from 512 GB to 128 TB.
Link: https://lore.kernel.org/r/20250307053535.44918-1-daizhiyuan@phytium.com.cn
Signed-off-by: Zhiyuan Dai <daizhiyuan@phytium.com.cn>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
* for-6.15/io_uring: (80 commits)
io_uring: introduce io_cache_free() helper
io_uring/rsrc: skip NULL file/buffer checks in io_free_rsrc_node()
io_uring/rsrc: avoid NULL node check on io_sqe_buffer_register() failure
io_uring/rsrc: call io_free_node() on io_sqe_buffer_register() failure
io_uring/rsrc: free io_rsrc_node using kfree()
io_uring/rsrc: split out io_free_node() helper
io_uring/rsrc: include io_uring_types.h in rsrc.h
ublk: don't cast registered buffer index to int
io_uring/nop: use io_find_buf_node()
io_uring/rsrc: declare io_find_buf_node() in header file
io_uring/ublk: report error when unregister operation fails
io_uring: convert cmd_to_io_kiocb() macro to function
io_uring/uring_cmd: specify io_uring_cmd_import_fixed() pointer type
io_uring/rsrc: use rq_data_dir() to compute bvec dir
selftests: ublk: add ublk zero copy test
selftests: ublk: add file backed ublk
selftests: ublk: add kernel selftests for ublk
io_uring: cache nodes and mapped buffers
ublk: zc register/unregister bvec
io_uring: add support for kernel registered bvecs
...
PCIe r6.1, sec 6.30.1.1, describes a "Vendor ID", a "Data Object Type" and
"Next Index" as the fields in the DOE Discovery Response Data Object. The
DOE driver currently uses both the terms 'type' and 'prot' for the second
element.
Rename all uses of the DOE Discovery Response Data Object to use 'type' as
the second element of the object header, instead of type/prot as it
currently is.
Link: https://lore.kernel.org/r/20250306075211.1855177-2-alistair@alistair23.me
Signed-off-by: Alistair Francis <alistair@alistair23.me>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
"int" was misspelled as "init" the code comments in the bits.h and
const.h files. Fix the typo.
CC: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Similar to compute queue reset, flag SDMA queue reset capabilities to
user space for safe testing.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Harish Kasiviswanathan <harish.kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Some tools like systemd's jounral need to retrieve the exit and cgroup
information after a process has already been reaped. This can e.g.,
happen when retrieving a pidfd via SCM_PIDFD or SCM_PEERPIDFD.
Link: https://lore.kernel.org/r/20250305-work-pidfs-kill_on_last_close-v3-6-c8c3d8361705@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
* cfg80211/mac80211
- remove cooked monitor support
- strict mode for better AP testing
- basic EPCS support
- OMI RX bandwidth reduction support
* rtw88
- preparation for RTL8814AU support
* rtw89
- use wiphy_lock/wiphy_work
- preparations for MLO
- BT-Coex improvements
- regulatory support in firmware files
* iwlwifi
- preparations for the new iwlmld sub-driver
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAmfG9tQACgkQ10qiO8sP
aAAE1g/9Fg6vrTY1OX0at1iXL5AMqtfAVUJffHGz8SyWJNzKgP0pLtXUTi1Ta2b4
TmHG0+4fMcwM0nigL0cg/F6vEE5V7qM30b7WMhIKpBejcCklINEIYuB1a6CwfDNF
7QtE/NA9/oKBy2JftYpm+4mdtlKdwBdkpz3MKE8/BpB1HdmG3IS3/z9+t4jkMOv+
hRxZKRlqpW98uimc5JjkjMGyt/e4QyHiUt0Z7Ly3dzm0PajvmRy4Idkl3TFMi4Yp
GgKFTix9/7G4iYFoyK5iNiqfesnzqY6jaI3sxv6h6AG/Yz8EOkBNNnKovPC0lAo/
ZpkhU63XN10SZXxeSDSiz4tWx3uBxpN8PIRvuaEr+KW7caCjMYXAuB5uE+KvREA3
hkM8zB5IkT54bLaHJYG6TGQoWxRnzoEDjURBHkEG9DfJs2ixKxdLfe/Rc1lvLqt4
t5ejc+UQsjICZ31LhrZkNbfmKLtQfFCcBj3ZUHz9RzO1AXmsqudeRcl0j3rDjsqY
AdYKFEKGCKeBlhr+Cng9VqzJa7gCmgVgkXYjGxJgbL0n3u6oV/S08/7Ssbzf7RXY
ulFqxEjaexn/vtUHVwMOTxyDttftOfB3Y1IrNUJj3OXWpuXK247OQN6QHM9XWtVy
xzBOW3g9N0L3EUM4PVmZTGU4SdtuwNxvZbSrToBfnjVu0rESAT4=
=KYXQ
-----END PGP SIGNATURE-----
Merge tag 'wireless-next-2025-03-04-v2' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
Johannes Berg says:
====================
First 6.15 material:
* cfg80211/mac80211
- remove cooked monitor support
- strict mode for better AP testing
- basic EPCS support
- OMI RX bandwidth reduction support
* rtw88
- preparation for RTL8814AU support
* rtw89
- use wiphy_lock/wiphy_work
- preparations for MLO
- BT-Coex improvements
- regulatory support in firmware files
* iwlwifi
- preparations for the new iwlmld sub-driver
* tag 'wireless-next-2025-03-04-v2' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (128 commits)
wifi: iwlwifi: remove mld/roc.c
wifi: mac80211: refactor populating mesh related fields in sinfo
wifi: cfg80211: reorg sinfo structure elements for mesh
wifi: iwlwifi: Fix spelling mistake "Increate" -> "Increase"
wifi: iwlwifi: add Debug Host Command APIs
wifi: iwlwifi: add IWL_MAX_NUM_IGTKS macro
wifi: iwlwifi: add OMI bandwidth reduction APIs
wifi: iwlwifi: remove mvm prefix from iwl_mvm_d3_end_notif
wifi: iwlwifi: remember if the UATS table was read successfully
wifi: iwlwifi: export iwl_get_lari_config_bitmap
wifi: iwlwifi: add support for external 32 KHz clock
wifi: iwlwifi: mld: add a debug level for EHT prints
wifi: iwlwifi: mld: add a debug level for PTP prints
wifi: iwlwifi: remove mvm prefix from iwl_mvm_esr_mode_notif
wifi: iwlwifi: use 0xff instead of 0xffffffff for invalid
wifi: iwlwifi: location api cleanup
wifi: cfg80211: expose update timestamp to drivers
wifi: mac80211: add ieee80211_iter_chan_contexts_mtx
wifi: mac80211: fix integer overflow in hwmp_route_info_get()
wifi: mac80211: Fix possible integer promotion issue
...
====================
Link: https://patch.msgid.link/20250304125605.127914-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since commit 05c1280a2b ("netdev_features: convert NETIF_F_NETNS_LOCAL to
dev->netns_local"), there is no way to see if the netns_immutable property
s set on a device. Let's add a netlink attribute to advertise it.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The type is used by tools/testing/selftests/vDSO/parse_vdso.c.
To be able to build the vDSO selftests without a libc dependency,
add the type to the kernels own UAPI headers. As documented by elf(5).
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/all/20250226-parse_vdso-nolibc-v2-5-28e14e031ed8@linutronix.de
The definition is used by tools/testing/selftests/vDSO/parse_vdso.c.
To be able to build the vDSO selftests without a libc dependency,
add the definition to the kernels own UAPI headers.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://refspecs.linuxfoundation.org/elf/gabi4+/ch4.symtab.html
Link: https://lore.kernel.org/all/20250226-parse_vdso-nolibc-v2-2-28e14e031ed8@linutronix.de
The in-tree ublk driver doesn't need DMA alignment limit because there
is one data copy between request pages and the userspace buffer.
However, ublk is going to support zero copy, then DMA alignment limit
is required, because same IO buffer is forwarded to backend which may
have specific buffer DMA alignment limit, so the limit has to be exposed
from the frontend driver to client application.
Cc: Keith Busch <kbusch@kernel.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250227103707.2640014-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Implement support for ROI as described in UVC 1.5:
4.2.2.1.20 Digital Region of Interest (ROI) Control
ROI control is implemented using V4L2 control API as
two UVC-specific controls:
V4L2_CID_UVC_REGION_OF_INTEREST_RECT and
V4L2_CID_UVC_REGION_OF_INTEREST_AUTO.
Reviewed-by: Ricardo Ribalda <ribalda@chromium.org>
Signed-off-by: Yunke Cao <yunkec@google.com>
Reviewed-by: Yunke Cao <yunkec@google.com>
Tested-by: Yunke Cao <yunkec@google.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
Link: https://lore.kernel.org/r/20250203-uvc-roi-v17-16-5900a9fed613@chromium.org
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
[hverkuil: fix control names: "Of" -> "of", "Controls" -> "Ctrls"]
Add the capability of retrieving the min and max values of a
compound control.
[Ricardo: Added static to v4l2_ctrl_type_op_(maximum|minimum) proto]
[Ricardo: Fix documentation]
Signed-off-by: Yunke Cao <yunkec@google.com>
Tested-by: Yunke Cao <yunkec@google.com>
Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
Link: https://lore.kernel.org/r/20250203-uvc-roi-v17-2-5900a9fed613@chromium.org
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
[hverkuil: fix small alignment checkpatch warning]
Add p_rect to struct v4l2_ext_control with basic support in
v4l2-ctrls.
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Reviewed-by: Ricardo Ribalda <ribalda@chromium.org>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Daniel Scally <dan.scally@ideasonboard.com>
Signed-off-by: Yunke Cao <yunkec@google.com>
Reviewed-by: Hans Verkuil <hverkuil@xs4all.nl>
Tested-by: Yunke Cao <yunkec@google.com>
Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
Link: https://lore.kernel.org/r/20250203-uvc-roi-v17-1-5900a9fed613@chromium.org
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
In preparation for memtostr*() checking that its source is marked as
nonstring, annotate the device strings accordingly using the new UAPI
alias for the "nonstring" attribute.
Signed-off-by: Kees Cook <kees@kernel.org>
In order to annotate byte arrays in UAPI that are not C strings (i.e.
they may not be NUL terminated), the "nonstring" attribute is needed.
However, we can't expose this to userspace as it is compiler version
specific.
Signed-off-by: Kees Cook <kees@kernel.org>
This patch reverts 'commit c32ee3d9abd2("bitops: avoid integer overflow in
GENMASK(_ULL)")'.
The code generation can be shrink by over 1KB by reverting this commit.
Originally the commit claimed that clang would emit warnings using the
implementation at that time.
The patch was applied and tested against numerous compilers, including
gcc-13, gcc-12, gcc-11 cross-compiler, clang-17, clang-18 and clang-19.
Various warning levels were set (-W=0, -W=1, -W=2) and CONFIG_WERROR
disabled to complete the compilation. The results show that no compilation
errors or warnings were generated due to the patch.
The results of code size reduction are summarized in the following table.
The code size changes for clang are all zero across different versions,
so they're not listed in the table.
For NR_CPUS=64 on x86_64.
----------------------------------------------
| | gcc-13 | gcc-12 | gcc-11 |
----------------------------------------------
| old | 22438085 | 22453915 | 22302033 |
----------------------------------------------
| new | 22436816 | 22452913 | 22300826 |
----------------------------------------------
| new - old | -1269 | -1002 | -1207 |
----------------------------------------------
For NR_CPUS=1024 on x86_64.
----------------------------------------------
| | gcc-13 | gcc-12 | gcc-11 |
----------------------------------------------
| old | 22493682 | 22509812 | 22357661 |
----------------------------------------------
| new | 22493230 | 22509487 | 22357250 |
----------------------------------------------
| new - old | -452 | -325 | -411 |
----------------------------------------------
For arm64 architecture, gcc cross-compiler was used and QEMU was
utilized to execute a VM for a CPU-heavy workload to ensure no
side effects and that functionalities remained correct. The test
even demonstrated a positive result in terms of code size reduction:
* Before: 31660668
* After: 31658724
* Difference (After - Before): -1944
An analysis of multiple functions compiled with gcc-13 on x86_64 was
performed. In summary, the patch elimates one negation in almost
every use case. However, negative effects may occur in some cases,
such as the generation of additional "mov" instruction or increased
register usage. The use of "~_UL(0) << (l)" may even result in the
allocations of "%r*" registers instead of "%e*" registers (which are
32-bit registers) because the compiler cannot assume that the higher
bits are zero.
Yury:
We limit GENMASK() usage with the const_true(l > h) condition, and
most of users just call it with constant parameters. For those, the
actual implementation of the macro doesn't matter, and since it
triggered clang warnings back then, it was reasonable to workaround
the warnings on the kernel side.
Now that some find_bit() functions call GENMASK() with runtime
parameters (although the const_true() condition holds), this ended up
hurting the generated code, as I Hsin discovered. This is especially
bad because it hurts small_const_nbits() optimization, where people are
most concerned about generated code quality. So, revert it to the
original version for good.
Signed-off-by: I Hsin Cheng <richard120310@gmail.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Provide new operations for the user to request mapping an active request
to an io uring instance's buf_table. The user has to provide the index
it wants to install the buffer.
A reference count is taken on the request to ensure it can't be
completed while it is active in a ring's buf_table.
Signed-off-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20250227223916.143006-6-kbusch@meta.com
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Recently, in case of Cilium, we run into users on Azure who require to use
tunneling for east/west traffic due to hitting IPAM API limits for Kubernetes
Pods if they would have gone with publicly routable IPs for Pods. In case
of tunneling, Cilium supports the option of vxlan or geneve. In order to
RSS spread flows among remote CPUs both derive a source port hash via
udp_flow_src_port() which takes the inner packet's skb->hash into account.
For clusters with many nodes, this can then hit a new limitation [0]: Today,
the Azure networking stack supports 1M total flows (500k inbound and 500k
outbound) for a VM. [...] Once this limit is hit, other connections are
dropped. [...] Each flow is distinguished by a 5-tuple (protocol, local IP
address, remote IP address, local port, and remote port) information. [...]
For vxlan and geneve, this can create a massive amount of UDP flows which
then run into the limits if stale flows are not evicted fast enough. One
option to mitigate this for vxlan is to narrow the source port range via
IFLA_VXLAN_PORT_RANGE while still being able to benefit from RSS. However,
geneve currently does not have this option and it spreads traffic across
the full source port range of [1, USHRT_MAX]. To overcome this limitation
also for geneve, add an equivalent IFLA_GENEVE_PORT_RANGE setting for users.
Note that struct geneve_config before/after still remains at 2 cachelines
on x86-64. The low/high members of struct ifla_geneve_port_range (which is
uapi exposed) are of type __be16. While they would be perfectly fine to be
of __u16 type, the consensus was that it would be good to be consistent
with the existing struct ifla_vxlan_port_range from a uapi consumer PoV.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://learn.microsoft.com/en-us/azure/virtual-network/virtual-machine-network-throughput [0]
Link: https://patch.msgid.link/20250226182030.89440-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If queue size is less than minimum, clamp it to minimum to prevent
underflow when writing queue mqd.
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
Reviewed-by: Jay Cornwall <jay.cornwall@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Merge mainline fixes into 6.15 branch, as upcoming patches depend on
fixes that went into the 6.14 mainline branch.
* io_uring-6.14:
io_uring/net: save msg_control for compat
io_uring/rw: clean up mshot forced sync mode
io_uring/rw: move ki_complete init into prep
io_uring/rw: don't directly use ki_complete
io_uring/rw: forbid multishot async reads
io_uring/rsrc: remove unused constants
io_uring: fix spelling error in uapi io_uring.h
io_uring: prevent opcode speculation
io-wq: backoff when retrying worker creation
Yong-Hao Zou mentioned that linux was not strict as other OS in 3WHS,
for flows using TCP TS option (RFC 7323)
As hinted by an old comment in tcp_check_req(),
we can check the TSEcr value in the incoming packet corresponds
to one of the SYNACK TSval values we have sent.
In this patch, I record the oldest and most recent values
that SYNACK packets have used.
Send a challenge ACK if we receive a TSEcr outside
of this range, and increase a new SNMP counter.
nstat -az | grep TSEcrRejected
TcpExtTSEcrRejected 0 0.0
Due to TCP fastopen implementation, do not apply yet these checks
for fastopen flows.
v2: No longer use req->num_timeout, but treq->snt_tsval_first
to detect when first SYNACK is prepared. This means
we make sure to not send an initial zero TSval.
Make sure MPTCP and TCP selftests are passing.
Change MIB name to TcpExtTSEcrRejected
v1: https://lore.kernel.org/netdev/CADVnQykD8i4ArpSZaPKaoNxLJ2if2ts9m4As+=Jvdkrgx1qMHw@mail.gmail.com/T/
Reported-by: Yong-Hao Zou <yonghaoz1994@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250225171048.3105061-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iIYEABYKAC4WIQSVyBthFV4iTW/VU1/l49DojIL20gUCZ785FhAcbWljQGRpZ2lr
b2QubmV0AAoJEOXj0OiMgvbSILQBAMFwpFClzjVeWyLFNd/gaTlPWeedvnag+yZu
CK9q39jOAP9tj1unQBFpsI7jsTk6ZxPVxb4DymPIirZMb/5FuxUnDQ==
=QrSM
-----END PGP SIGNATURE-----
Merge tag 'landlock-6.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux
Pull landlock fixes from Mickaël Salaün:
"Fixes to TCP socket identification, documentation, and tests"
* tag 'landlock-6.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
selftests/landlock: Add binaries to .gitignore
selftests/landlock: Test that MPTCP actions are not restricted
selftests/landlock: Test TCP accesses with protocol=IPPROTO_TCP
landlock: Fix non-TCP sockets restriction
landlock: Minor typo and grammar fixes in IPC scoping documentation
landlock: Fix grammar error
selftests/landlock: Enable the new CONFIG_AF_UNIX_OOB
KVM's treatment of the ID registers that describe the implementation
(MIDR, REVIDR, and AIDR) is interesting, to say the least. On the
userspace-facing end of it, KVM presents the values of the boot CPU on
all vCPUs and treats them as invariant. On the guest side of things KVM
presents the hardware values of the local CPU, which can change during
CPU migration in a big-little system.
While one may call this fragile, there is at least some degree of
predictability around it. For example, if a VMM wanted to present
big-little to a guest, it could affine vCPUs accordingly to the correct
clusters.
All of this makes a giant mess out of adding support for making these
implementation ID registers writable. Avoid breaking the rather subtle
ABI around the old way of doing things by requiring opt-in from
userspace to make the registers writable.
When the cap is enabled, allow userspace to set MIDR, REVIDR, and AIDR
to any non-reserved value and present those values consistently across
all vCPUs.
Signed-off-by: Sebastian Ott <sebott@redhat.com>
[oliver: changelog, capability]
Link: https://lore.kernel.org/r/20250225005401.679536-5-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
The way how the virtual interface is called inside the batman-adv source
code is not consistent. The genl headers call it meshif and the rest of the
code calls is (mostly) softif.
The genl definitions cannot be touched because they are part of the UAPI.
But the rest of the batman-adv code can be touched to have a consistent
name again.
The bulk of the renaming was done using
sed -i -e 's/soft\(-\|\_\| \|\)i\([nf]\)/mesh\1i\2/g' \
-e 's/SOFT\(-\|\_\| \|\)I\([NF]\)/MESH\1I\2/g'
and then it was adjusted slightly when proofreading the changes.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Upcoming changes will add a USB host (and later gadget) driver for the
MCTP-over-USB protocol. Add a header that provides common definitions
for protocol support: the packet header format and a few framing
definitions. Add a define for the MCTP class code, as per
https://usb.org/defined-class-codes.
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://patch.msgid.link/20250221-dev-mctp-usb-v3-1-3353030fe9cc@codeconstruct.com.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add an attribute that allows matching on DSCP with a mask. Matching on
DSCP with a mask is needed in deployments where users encode path
information into certain bits of the DSCP field.
Temporarily set the type of the attribute to 'NLA_REJECT' while support
is being added.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/20250220080525.831924-2-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQQ6NaUOruQGUkvPdG4raS+Z+3y5EwUCZ7ffOQAKCRAraS+Z+3y5
EzVHAP9h/QkeYoOZW9gul08I8vFiZsFe/lbOSLJWxeVfxb9JhgD/cMqby3qAxQK6
lsdNQ9jYG2232Wym89ag7fvTBK15Wg4=
=gkN2
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Martin KaFai Lau says:
====================
pull-request: bpf-next 2025-02-20
We've added 19 non-merge commits during the last 8 day(s) which contain
a total of 35 files changed, 1126 insertions(+), 53 deletions(-).
The main changes are:
1) Add TCP_RTO_MAX_MS support to bpf_set/getsockopt, from Jason Xing
2) Add network TX timestamping support to BPF sock_ops, from Jason Xing
3) Add TX metadata Launch Time support, from Song Yoong Siang
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
igc: Add launch time support to XDP ZC
igc: Refactor empty frame insertion for launch time support
net: stmmac: Add launch time support to XDP ZC
selftests/bpf: Add launch time request to xdp_hw_metadata
xsk: Add launch time hardware offload support to XDP Tx metadata
selftests/bpf: Add simple bpf tests in the tx path for timestamping feature
bpf: Support selective sampling for bpf timestamping
bpf: Add BPF_SOCK_OPS_TSTAMP_SENDMSG_CB callback
bpf: Add BPF_SOCK_OPS_TSTAMP_ACK_CB callback
bpf: Add BPF_SOCK_OPS_TSTAMP_SND_HW_CB callback
bpf: Add BPF_SOCK_OPS_TSTAMP_SND_SW_CB callback
bpf: Add BPF_SOCK_OPS_TSTAMP_SCHED_CB callback
net-timestamp: Prepare for isolating two modes of SO_TIMESTAMPING
bpf: Disable unsafe helpers in TX timestamping callbacks
bpf: Prevent unsafe access to the sock fields in the BPF timestamping callback
bpf: Prepare the sock_ops ctx and call bpf prog for TX timestamping
bpf: Add networking timestamping support to bpf_get/setsockopt()
selftests/bpf: Add rto max for bpf_setsockopt test
bpf: Support TCP_RTO_MAX_MS for bpf_setsockopt
====================
Link: https://patch.msgid.link/20250221022104.386462-1-martin.lau@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Flit mode introduced in PCIe r6.0 alters how the TLP Header Log is
presented through AER and DPC Capability registers. The TLP Prefix Log
Register is not present with Flit mode, and the register becomes an
extension of the TLP Header Log (PCIe r6.1 secs 7.8.4.12 & 7.9.14.13).
Adapt pcie_read_tlp_log() and struct pcie_tlp_log to read and store the
extended TLP Header Log when the Link is in Flit mode. As the Prefix Log
and Extended TLP Header are not present at the same time, a C union can be
used.
Determining whether the error occurred while the Link was in Flit mode is a
bit complicated. In case of AER, the Advanced Error Capabilities and
Control Register directly tells whether the error was logged in Flit mode
or not (PCIe r6.1 sec 7.8.4.7). The DPC Capability (PCIe r6.1 sec 7.9.14),
unfortunately, does not contain the same information.
Unlike AER, the DPC Capability does not provide a way to discern whether
the error was logged in Flit mode (this is confirmed by PCI WG to be an
oversight in the spec). DPC will bring the Link down immediately following
an error, which makes it impossible to acquire the Flit Mode Status
directly from the Link Status 2 register because Flit Mode Status is only
set in certain Link states (PCIe r6.1 sec 7.5.3.20). As a workaround, use
the flit_mode value stored into the struct pci_bus.
Link: https://lore.kernel.org/r/20250207161836.2755-3-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAme4rVYQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpodqD/9hVmNuIJTpfLNdKNGgsgE1VRhtTtKgISOV
U2c2NDVvx6brqyRL2JGfC3DhZyXvC98qEKXfsJf0lbDPh/pP3oQMLWmPJWsJKVUt
DpE9RzI89/objCvIUNMLS9svixmBcVSBnF9x1ppGl6Ou4+NSkS72aswMNlP3qLr9
+UnAxEFWCCn/8sl29GdtUDXSm8l/u6oftxeGhXTaczwEVUSvclyTqrsqoy05z9ib
Ar5PC9h7crwCgrnMahAUkM603Vi3gZzkDiQre/GK28HN7nx0TbwB9inRsxrO3sv5
01rwwA67oN7sEp2AM3Svh4MyBb3wyNRzcynZMgml2YM+0RRmstmeT2VBjilCRVCT
s5pO+KnJ1Q64rYcaqPU5+hqfJ55d8qveBjB8Omu4BrFaf7HuzgLq1bbEBrmyoHKm
TQvM/USnwycOs6FrdAWS5X+nx9pFIDowd6ZqLckVhWxjnIsi/WeyNoJFP9U887X7
YdatkNoqY7EApykzUxVKzuSw1sgADE9HvnBSrfBcbyxA3Wpl52K+Y+/NyRjlJamF
aGP8FpM1ZoOcwyWhi0By8PmB4flSnirYQ7Ysu0Wv12CoNylHv1n1pbbGNZ/LuXGN
MrrE4jQLk5cVEfSk/+C0G5MreUkwav8K5cnEoqq36peq9LXW36jQdWkYieZMCbun
lKODmRPAgA==
=w+aR
-----END PGP SIGNATURE-----
Merge tag 'io_uring-6.14-20250221' of git://git.kernel.dk/linux
Pull io_uring fixes from Jens Axboe:
- Series fixing an issue with multishot read on pollable files that may
return -EIOCBQUEUED from ->read_iter(). Four small patches for that,
the first one deliberately done in such a way that it'd be easy to
backport
- Remove some dead constant definitions
- Use array_index_nospec() for opcode indexing
- Work-around for worker creation retries in the presence of signals
* tag 'io_uring-6.14-20250221' of git://git.kernel.dk/linux:
io_uring/rw: clean up mshot forced sync mode
io_uring/rw: move ki_complete init into prep
io_uring/rw: don't directly use ki_complete
io_uring/rw: forbid multishot async reads
io_uring/rsrc: remove unused constants
io_uring: fix spelling error in uapi io_uring.h
io_uring: prevent opcode speculation
io-wq: backoff when retrying worker creation
Add support for the 'eUSB2 Isochronous Endpoint Companion Descriptor'
introduced in the recent USB 2.0 specification 'USB 2.0 Double Isochronous
IN Bandwidth' ECN.
It allows embedded USB2 (eUSB2) devices to report and use higher bandwidths
for isochronous IN transfers in order to support higher camera resolutions
on the lid of laptops and tablets with minimal change to the USB2 protocol.
The motivation for expanding USB 2.0 is further clarified in an additional
Embedded USB2 version 2.0 (eUSB2v2) supplement to the USB 2.0
specification. It points out this is optimized for performance, power and
cost by using the USB 2.0 low-voltage, power efficient PHY and half-duplex
link for the asymmetric camera bandwidth needs, avoiding the costly and
complex full-duplex USB 3.x symmetric link and gigabit receivers.
eUSB2 devices that support the higher isochronous IN bandwidth and the new
descriptor can be identified by their device descriptor bcdUSB value of
0x0220
Co-developed-by: Amardeep Rai <amardeep.rai@intel.com>
Signed-off-by: Amardeep Rai <amardeep.rai@intel.com>
Signed-off-by: Kannappan R <r.kannappan@intel.com>
Co-developed-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Link: https://lore.kernel.org/r/20250220141339.1939448-1-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The define used for the version in the example diagram does not match what
is defined in enum rksip1_ext_param_buffer_version, nor the description
above it. Correct the typo to make it clear which define to use.
Fixes: e9d05e9d5d ("media: uapi: rkisp1-config: Add extensible params format")
Cc: stable@vger.kernel.org
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Extend the XDP Tx metadata framework so that user can requests launch time
hardware offload, where the Ethernet device will schedule the packet for
transmission at a pre-determined time called launch time. The value of
launch time is communicated from user space to Ethernet driver via
launch_time field of struct xsk_tx_metadata.
Suggested-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250216093430.957880-2-yoong.siang.song@intel.com
This patch introduces a new callback in tcp_tx_timestamp() to correlate
tcp_sendmsg timestamp with timestamps from other tx timestamping
callbacks (e.g., SND/SW/ACK).
Without this patch, BPF program wouldn't know which timestamps belong
to which flow because of no socket lock protection. This new callback
is inserted in tcp_tx_timestamp() to address this issue because
tcp_tx_timestamp() still owns the same socket lock with
tcp_sendmsg_locked() in the meanwhile tcp_tx_timestamp() initializes
the timestamping related fields for the skb, especially tskey. The
tskey is the bridge to do the correlation.
For TCP, BPF program hooks the beginning of tcp_sendmsg_locked() and
then stores the sendmsg timestamp at the bpf_sk_storage, correlating
this timestamp with its tskey that are later used in other sending
timestamping callbacks.
Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250220072940.99994-11-kerneljasonxing@gmail.com
Support the ACK case for bpf timestamping.
Add a new sock_ops callback, BPF_SOCK_OPS_TSTAMP_ACK_CB. This
callback will occur at the same timestamping point as the user
space's SCM_TSTAMP_ACK. The BPF program can use it to get the
same SCM_TSTAMP_ACK timestamp without modifying the user-space
application.
This patch extends txstamp_ack to two bits: 1 stands for
SO_TIMESTAMPING mode, 2 bpf extension.
Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250220072940.99994-10-kerneljasonxing@gmail.com
Support hw SCM_TSTAMP_SND case for bpf timestamping.
Add a new sock_ops callback, BPF_SOCK_OPS_TSTAMP_SND_HW_CB. This
callback will occur at the same timestamping point as the user
space's hardware SCM_TSTAMP_SND. The BPF program can use it to
get the same SCM_TSTAMP_SND timestamp without modifying the
user-space application.
To avoid increasing the code complexity, replace SKBTX_HW_TSTAMP
with SKBTX_HW_TSTAMP_NOBPF instead of changing numerous callers
from driver side using SKBTX_HW_TSTAMP. The new definition of
SKBTX_HW_TSTAMP means the combination tests of socket timestamping
and bpf timestamping. After this patch, drivers can work under the
bpf timestamping.
Considering some drivers don't assign the skb with hardware
timestamp, this patch does the assignment and then BPF program
can acquire the hwstamp from skb directly.
Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250220072940.99994-9-kerneljasonxing@gmail.com
Support sw SCM_TSTAMP_SND case for bpf timestamping.
Add a new sock_ops callback, BPF_SOCK_OPS_TSTAMP_SND_SW_CB. This
callback will occur at the same timestamping point as the user
space's software SCM_TSTAMP_SND. The BPF program can use it to
get the same SCM_TSTAMP_SND timestamp without modifying the
user-space application.
Based on this patch, BPF program will get the software
timestamp when the driver is ready to send the skb. In the
sebsequent patch, the hardware timestamp will be supported.
Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250220072940.99994-8-kerneljasonxing@gmail.com
Support SCM_TSTAMP_SCHED case for bpf timestamping.
Add a new sock_ops callback, BPF_SOCK_OPS_TSTAMP_SCHED_CB. This
callback will occur at the same timestamping point as the user
space's SCM_TSTAMP_SCHED. The BPF program can use it to get the
same SCM_TSTAMP_SCHED timestamp without modifying the user-space
application.
A new SKBTX_BPF flag is added to mark skb_shinfo(skb)->tx_flags,
ensuring that the new BPF timestamping and the current user
space's SO_TIMESTAMPING do not interfere with each other.
Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250220072940.99994-7-kerneljasonxing@gmail.com
The new SK_BPF_CB_FLAGS and new SK_BPF_CB_TX_TIMESTAMPING are
added to bpf_get/setsockopt. The later patches will implement the
BPF networking timestamping. The BPF program will use
bpf_setsockopt(SK_BPF_CB_FLAGS, SK_BPF_CB_TX_TIMESTAMPING) to
enable the BPF networking timestamping on a socket.
Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250220072940.99994-2-kerneljasonxing@gmail.com
For existing epoll event loops that can't fully convert to io_uring,
the used approach is usually to add the io_uring fd to the epoll
instance and use epoll_wait() to wait on both "legacy" and io_uring
events. While this work, it isn't optimal as:
1) epoll_wait() is pretty limited in what it can do. It does not support
partial reaping of events, or waiting on a batch of events.
2) When an io_uring ring is added to an epoll instance, it activates the
io_uring "I'm being polled" logic which slows things down.
Rather than use this approach, with EPOLL_WAIT support added to io_uring,
event loops can use the normal io_uring wait logic for everything, as
long as an epoll wait request has been armed with io_uring.
Note that IORING_OP_EPOLL_WAIT does NOT take a timeout value, as this
is an async request. Waiting on io_uring events in general has various
timeout parameters, and those are the ones that should be used when
waiting on any kind of request. If events are immediately available for
reaping, then This opcode will return those immediately. If none are
available, then it will post an async completion when they become
available.
cqe->res will contain either an error code (< 0 value) for a malformed
request, invalid epoll instance, etc. It will return a positive result
indicating how many events were reaped.
IORING_OP_EPOLL_WAIT requests may be canceled using the normal io_uring
cancelation infrastructure.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-----BEGIN PGP SIGNATURE-----
iQFHBAABCgAxFiEEn/sM2K9nqF/8FWzzDHRl3/mQkZwFAme1voYTHG1rbEBwZW5n
dXRyb25peC5kZQAKCRAMdGXf+ZCRnFOCB/0dTrOBuRiX1Gg6yIkYs3lzrWlmCv98
itowE3y2RfAxlFZXaTSNTCFKrKpyk56C13tcq2okaqCWrBzcDkNeVemsNKJ49OBK
pYdxXsiJtddaVg6qJ2up2f/4btgbD1tEH0vzDm2S4z5YigjDFuAiQslbNi6rFSSt
Lcp4yhPcmQ+L+3nGYP0HmYPIX0K1gWL3jFEFPr4e/XWiCKn9bYOUdYRUV6xVa1Dz
lHkA8rqLJ+Gdwc6P3FgSdrTUafCFfgeFlPNnl1ULiLchHK2ZklPWZjFBb/jhKRJM
/Ef9RgpMXrSb8P6lS25RWptTSFnKrg00VqVWV0Fi9rxELe3OvSdoAUuK
=URof
-----END PGP SIGNATURE-----
Merge tag 'linux-can-next-for-6.15-20250219' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
Marc Kleine-Budde says:
====================
pull-request: can-next 2025-02-19
this is a pull request of 12 patches for net-next/master.
The first 4 patches are by Krzysztof Kozlowski and simplify the c_can
driver's c_can_plat_probe() function.
Ciprian Marian Costea contributes 3 patches to add S32G2/S32G3 support
to the flexcan driver.
Ruffalo Lavoisier's patch removes a duplicated word from the mcp251xfd
DT bindings documentation.
Oleksij Rempel extends the J1939 documentation.
The next patch is by Oliver Hartkopp and adds access for the Remote
Request Substitution bit in CAN-XL frames.
Henrik Brix Andersen's patch for the gs_usb driver adds support for
the CANnectivity firmware.
The last patch is by Robin van der Gracht and removes a duplicated
setup of RX FIFO in the rockchip_canfd driver.
linux-can-next-for-6.15-20250219
* tag 'linux-can-next-for-6.15-20250219' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next:
can: rockchip_canfd: rkcanfd_chip_fifo_setup(): remove duplicated setup of RX FIFO
can: gs_usb: add VID/PID for the CANnectivity firmware
can: canxl: support Remote Request Substitution bit access
can: j1939: Extend stack documentation with buffer size behavior
dt-binding: can: mcp251xfd: remove duplicate word
can: flexcan: add NXP S32G2/S32G3 SoC support
can: flexcan: Add quirk to handle separate interrupt lines for mailboxes
dt-bindings: can: fsl,flexcan: add S32G2/S32G3 SoC support
can: c_can: Use syscon_regmap_lookup_by_phandle_args
can: c_can: Use of_property_present() to test existence of DT property
can: c_can: Simplify handling syscon error path
can: c_can: Drop useless final probe failure message
====================
Link: https://patch.msgid.link/20250219113354.529611-1-mkl@pengutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add attributes that allow matching on source and destination ports with
a mask. Matching on the source port with a mask is needed in deployments
where users encode path information into certain bits of the UDP source
port.
Temporarily set the type of the attributes to 'NLA_REJECT' while support
is being added.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250217134109.311176-2-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
or aren't considered necessary for -stable kernels.
10 are for MM and 8 are for non-MM. All are singletons, please see the
changelogs for details.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ7aKTwAKCRDdBJ7gKXxA
jo9eAQD0GBh7LaeobM+OJBN0E+u/wKySR/QpGfQX1h/uTpcOPAEA+Q5yaNcmFIzO
NB/htGoMpW2F9gru3pwAT7CgnE3qeg8=
=Y0sw
-----END PGP SIGNATURE-----
Merge tag 'mm-hotfixes-stable-2025-02-19-17-49' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"18 hotfixes. 5 are cc:stable and the remainder address post-6.13
issues or aren't considered necessary for -stable kernels.
10 are for MM and 8 are for non-MM. All are singletons, please see the
changelogs for details"
* tag 'mm-hotfixes-stable-2025-02-19-17-49' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
test_xarray: fix failure in check_pause when CONFIG_XARRAY_MULTI is not defined
kasan: don't call find_vm_area() in a PREEMPT_RT kernel
MAINTAINERS: update Nick's contact info
selftests/mm: fix check for running THP tests
mm: hugetlb: avoid fallback for specific node allocation of 1G pages
memcg: avoid dead loop when setting memory.max
mailmap: update Nick's entry
mm: pgtable: fix incorrect reclaim of non-empty PTE pages
taskstats: modify taskstats version
getdelays: fix error format characters
mm/migrate_device: don't add folio to be freed to LRU in migrate_device_finalize()
tools/mm: fix build warnings with musl-libc
mailmap: add entry for Feng Tang
.mailmap: add entries for Jeff Johnson
mm,madvise,hugetlb: check for 0-length range after end address adjustment
mm/zswap: fix inconsistency when zswap_store_page() fails
lib/iov_iter: fix import_iovec_ubuf iovec management
procfs: fix a locking bug in a vmcore_add_device_dump() error path
The Remote Request Substitution bit is a dominant bit ("0") in the CAN
XL frame. As some CAN XL controllers support to access this bit a new
CANXL_RRS value has been defined for the canxl_frame.flags element.
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://patch.msgid.link/20250124142347.7444-1-socketcan@hartkopp.net
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
This is obviously not that important, but when changes are synced back
from the kernel to liburing, the codespell CI ends up erroring because
of this misspelling. Let's just correct it and avoid this biting us
again on an import.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In this patch, an eventfd object is created by the vfio_ap device driver
and used to notify userspace when a guests's AP configuration is
dynamically changed. Such changes may occur whenever:
* An adapter, domain or control domain is assigned to or unassigned from a
mediated device that is attached to the guest.
* A queue assigned to the mediated device that is attached to a guest is
bound to or unbound from the vfio_ap device driver. This can occur
either by manually binding/unbinding the queue via the vfio_ap driver's
sysfs bind/unbind attribute interfaces, or because an adapter, domain or
control domain assigned to the mediated device is added to or removed
from the host's AP configuration via an SE/HMC
The purpose of this patch is to provide immediate notification of changes
made to a guest's AP configuration by the vfio_ap driver. This will enable
the guest to take immediate action rather than relying on polling or some
other inefficient mechanism to detect changes to its AP configuration.
Note that there are corresponding QEMU patches that will be shipped along
with this patch (see vfio-ap: Report vfio-ap configuration changes) that
will pick up the eventfd signal.
Signed-off-by: Rorie Reyes <rreyes@linux.ibm.com>
Reviewed-by: Anthony Krowiak <akrowiak@linux.ibm.com>
Tested-by: Anthony Krowiak <akrowiak@linux.ibm.com>
Link: https://lore.kernel.org/r/20250107183645.90082-1-rreyes@linux.ibm.com
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
A slightly large collection of fixes, spread over various drivers.
Almost all are small and device-specific fixes and quirks in ASoC
SOF Intel and AMD, Renesas, Cirrus, HD-audio, in addition to a small
fix for MIDI 2.0.
-----BEGIN PGP SIGNATURE-----
iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAme0bwIOHHRpd2FpQHN1
c2UuZGUACgkQLtJE4w1nLE+6SQ//ZUPAzra8D1zIR9Gfk/ImULAPOpK2R9MK6Tg9
u6LIbFEUVZ1/A5MbsYWbW6bdndu0q/pWYHmOSNbpbWPpnxANu/SGTqex6KjkEUWw
Co5P/Mv9ZQWflI1tXGNtneS1cAnJ888qKrkNF2JnzENa0TeuiVOUuCNUI0Ro/SQ+
nxGtd1239zgzXYyruSe+Gj8haetBR3RfFKD042cGYY9TNNsnVFd0Tlsi5OLZYSk1
E0w4mJ92E3kpL9slFDVC9xa+pPy4e39y+2xpXqwN15kh86J5Gvp2w+Z8fz0W7EaV
6yZwx2xlM+DDO71gXoX4XwMLn93b4gQbFMjPdssDhVP9dH8aA04lRFDmvnRpy2eq
m8c7TLw8M/QNtz784GvmmgZavv4/twbL0x7HvZ1VvSenbFX9UxozPaYHnfh+Ev4T
vwjq8S7DX35gKKu7rEk/euGDnQ2wHEfD+f7inCKf1fB1o0oj3q0yxD6Olld3/M3L
8ALiDWQ8Yx42kpYpEpeQvStR0Ocb1BRQ5DrAhG2/v/z78sAZMorXF58cpzLphmmS
wSDlZm4rNe23AZ5GUh6BwrdUpQjttQEyakMZ97xm1T/KDZ4niY1i4f5SLi6MR0ak
ErkSSD0idSKdnxuZwuJCh18NjAgZE5ve5qgOPOYnUPr47LbXZuq/nCQAZ2ibFzV5
QlHZDQQ=
=F+UA
-----END PGP SIGNATURE-----
Merge tag 'sound-6.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A slightly large collection of fixes, spread over various drivers.
Almost all are small and device-specific fixes and quirks in ASoC SOF
Intel and AMD, Renesas, Cirrus, HD-audio, in addition to a small fix
for MIDI 2.0"
* tag 'sound-6.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (41 commits)
ALSA: seq: Drop UMP events when no UMP-conversion is set
ALSA: hda/conexant: Add quirk for HP ProBook 450 G4 mute LED
ALSA: hda/cirrus: Reduce codec resume time
ALSA: hda/cirrus: Correct the full scale volume set logic
virtio_snd.h: clarify that `controls` depends on VIRTIO_SND_F_CTLS
ALSA: hda: Add error check for snd_ctl_rename_id() in snd_hda_create_dig_out_ctls()
ALSA: hda/tas2781: Fix index issue in tas2781 hda SPI driver
ASoC: imx-audmix: remove cpu_mclk which is from cpu dai device
ALSA: hda/realtek: Fixup ALC225 depop procedure
ALSA: hda/tas2781: Update tas2781 hda SPI driver
ASoC: cs35l41: Fix acpi_device_hid() not found
ASoC: SOF: amd: Add branch prediction hint in ACP IRQ handler
ASoC: SOF: amd: Handle IPC replies before FW_BOOT_COMPLETE
ASoC: SOF: amd: Drop unused includes from Vangogh driver
ASoC: SOF: amd: Add post_fw_run_delay ACP quirk
ASoC: Intel: soc-acpi-intel-ptl-match: revise typo of rt713_vb_l2_rt1320_l13
ASoC: Intel: soc-acpi-intel-ptl-match: revise typo of rt712_vb + rt1320 support
ALSA: Switch to use hrtimer_setup()
ALSA: hda: hda-intel: add Panther Lake-H support
ASoC: SOF: Intel: pci-ptl: Add support for PTL-H
...
After adding "delay max" and "delay min" to the taskstats structure, the
taskstats version needs to be updated.
Link: https://lkml.kernel.org/r/20250208144901218Q5ptVpqsQkb2MOEmW4Ujn@zte.com.cn
Fixes: f65c64f311 ("delayacct: add delay min to record delay peak")
Signed-off-by: Wang Yaxin <wang.yaxin@zte.com.cn>
Signed-off-by: Kun Jiang <jiang.kun2@zte.com.cn>
Reviewed-by: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Expose a new per-queue nest attribute, xsk, which will be present for
queues that are being used for AF_XDP. If the queue is not being used for
AF_XDP, the nest will not be present.
In the future, this attribute can be extended to include more data about
XSK as it is needed.
Signed-off-by: Joe Damato <jdamato@fastly.com>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250214211255.14194-3-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add io_uring opcode OP_RECV_ZC for doing zero copy reads out of a
socket. Only the connection should be land on the specific rx queue set
up for zero copy, and the socket must be handled by the io_uring
instance that the rx queue was registered for zero copy with. That's
because neither net_iovs / buffers from our queue can be read by outside
applications, nor zero copy is possible if traffic for the zero copy
connection goes to another queue. This coordination is outside of the
scope of this patch series. Also, any traffic directed to the zero copy
enabled queue is immediately visible to the application, which is why
CAP_NET_ADMIN is required at the registration step.
Of course, no data is actually read out of the socket, it has already
been copied by the netdev into userspace memory via DMA. OP_RECV_ZC
reads skbs out of the socket and checks that its frags are indeed
net_iovs that belong to io_uring. A cqe is queued for each one of these
frags.
Recall that each cqe is a big cqe, with the top half being an
io_uring_zcrx_cqe. The cqe res field contains the len or error. The
lower IORING_ZCRX_AREA_SHIFT bits of the struct io_uring_zcrx_cqe::off
field contain the offset relative to the start of the zero copy area.
The upper part of the off field is trivially zero, and will be used
to carry the area id.
For now, there is no limit as to how much work each OP_RECV_ZC request
does. It will attempt to drain a socket of all available data. This
request always operates in multishot mode.
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: David Wei <dw@davidwei.uk>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20250215000947.789731-7-dw@davidwei.uk
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add io_zcrx_area that represents a region of userspace memory that is
used for zero copy. During ifq registration, userspace passes in the
uaddr and len of userspace memory, which is then pinned by the kernel.
Each net_iov is mapped to one of these pages.
The freelist is a spinlock protected list that keeps track of all the
net_iovs/pages that aren't used.
For now, there is only one area per ifq and area registration happens
implicitly as part of ifq registration. There is no API for
adding/removing areas yet. The struct for area registration is there for
future extensibility once we support multiple areas and TCP devmem.
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: David Wei <dw@davidwei.uk>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20250215000947.789731-3-dw@davidwei.uk
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add a new object called an interface queue (ifq) that represents a net
rx queue that has been configured for zero copy. Each ifq is registered
using a new registration opcode IORING_REGISTER_ZCRX_IFQ.
The refill queue is allocated by the kernel and mapped by userspace
using a new offset IORING_OFF_RQ_RING, in a similar fashion to the main
SQ/CQ. It is used by userspace to return buffers that it is done with,
which will then be re-used by the netdev again.
The main CQ ring is used to notify userspace of received data by using
the upper 16 bytes of a big CQE as a new struct io_uring_zcrx_cqe. Each
entry contains the offset + len to the data.
For now, each io_uring instance only has a single ifq.
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: David Wei <dw@davidwei.uk>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20250215000947.789731-2-dw@davidwei.uk
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Merge networking zerocopy receive tree, to get the prep patches for
the io_uring rx zc support.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (63 commits)
net: add helpers for setting a memory provider on an rx queue
net: page_pool: add memory provider helpers
net: prepare for non devmem TCP memory providers
net: page_pool: add a mp hook to unregister_netdevice*
net: page_pool: add callback for mp info printing
netdev: add io_uring memory provider info
net: page_pool: create hooks for custom memory providers
net: generalise net_iov chunk owners
net: prefix devmem specific helpers
net: page_pool: don't cast mp param to devmem
tools: ynl: add all headers to makefile deps
eth: fbnic: set IFF_UNICAST_FLT to avoid enabling promiscuous mode when adding unicast addrs
eth: fbnic: add MAC address TCAM to debugfs
tools: ynl-gen: support limits using definitions
tools: ynl-gen: don't output external constants
net/mlx5e: Avoid WARN_ON when configuring MQPRIO with HTB offload enabled
net/mlx5e: Remove unused mlx5e_tc_flow_action struct
net/mlx5: Remove stray semicolon in LAG port selection table creation
net/mlx5e: Support FEC settings for 200G per lane link modes
net/mlx5: Add support for 200Gbps per lane link modes
...
Fix a regression caused by an inadvertent change of the
THERMAL_GENL_ATTR_CPU_CAPABILITY value in one of the recent
thermal commits (Zhang Rui) and drop a stale piece of
documentation (Daniel Lezcano).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmevqTsSHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxNCoQAKgTxykNnDV9u/wHlb8E4RRjf1kx4nXl
yulKrI9Ow+8N48OStl/80X+Cjdnq2BwlblxKIm/jLg524HkSmwllpSwmQDnfJ2Td
yl+EJOMvevB2ccOwYnLV6xsMVJDQgQqVZgly4p/X8jL4S4qYDgag3WIzEIVaJHyC
TzSlr9kHkmjgP/a9oncgT2N2ONxf610wrn0u5IGZ3r5ac7Io4vzHJCDz+i7AfCac
bcCYRQCE7D4s+oPY9ru9yT7X+AGDJUP7LRO1F86He3YTJyntppFhnCRsaRHMkB8w
Cfd2d+3hnpbO4nUFTKSJOxQrKBoITeVTDLR2CFD/JeNpI2FixgwjVihGE21ccjAx
gd3oIxxfSvKi08gC8OoDfHtbirwXi+Wl32CBVxMvwzqvRfcJ4PD/34VUgpz6LyUQ
JWMiNLutc3qbrtXCemIibfXrI2gy6KUThcT7cU8seFL6Y/ZmKpL8eHmzTJzip40t
wNOyjSYFBTUZXiGWeq4yQq6Q1lPpuGt8wrL0dJ3rU7vk1RevaahqAmiIirzfLZRV
BhvguDR7BOfW5kq0fnzuSZXGE8nVUJDn/tnY+JWYEXwLXXEYrtXhC7Gpz9idhJdw
NUCWZMqwBLlHlu+ZmdctHwZH2r4vpZw3LePh+0aS4sRHR1qSbRJMGFVej5VLMzvG
Uk4tlEAT2HGe
=c9kh
-----END PGP SIGNATURE-----
Merge tag 'thermal-6.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull thermal control fixes from Rafael Wysocki:
"Fix a regression caused by an inadvertent change of the
THERMAL_GENL_ATTR_CPU_CAPABILITY value in one of the recent thermal
commits (Zhang Rui) and drop a stale piece of documentation (Daniel
Lezcano)"
* tag 'thermal-6.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal/cpufreq_cooling: Remove structure member documentation
thermal/netlink: Prevent userspace segmentation fault by adjusting UAPI header
As defined in the specification, the `controls` field in the configuration
space is only valid/present if VIRTIO_SND_F_CTLS is negotiated.
From https://docs.oasis-open.org/virtio/virtio/v1.3/virtio-v1.3.html:
5.14.4 Device Configuration Layout
...
controls
(driver-read-only) indicates a total number of all available control
elements if VIRTIO_SND_F_CTLS has been negotiated.
Let's use the same style used in virtio_blk.h to clarify this and to avoid
confusion as happened in QEMU (see link).
Link: https://gitlab.com/qemu-project/qemu/-/issues/2805
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://patch.msgid.link/20250213161825.139952-1-sgarzare@redhat.com
* Fix some whitespace, punctuation and minor grammar.
* Add a missing sentence about the minimum ABI version,
to stay in line with the section next to it.
Cc: Tahera Fahimi <fahimitahera@gmail.com>
Cc: Tanya Agarwal <tanyaagarwal25699@gmail.com>
Signed-off-by: Günther Noack <gnoack@google.com>
Link: https://lore.kernel.org/r/20250124154445.162841-1-gnoack@google.com
[mic: Add newlines, update doc date]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Introduct new xattr name prefix security.bpf., and enable reading these
xattrs from bpf kfuncs bpf_get_[file|dentry]_xattr().
As we are on it, correct the comments for return value of
bpf_get_[file|dentry]_xattr(), i.e. return length the xattr value on
success.
Signed-off-by: Song Liu <song@kernel.org>
Acked-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Matt Bobrowski <mattbobrowski@google.com>
Link: https://lore.kernel.org/r/20250130213549.3353349-2-song@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Christian Brauner <brauner@kernel.org> says:
Currently, it isn't possible to change the idmapping of an idmapped
mount. This is becoming an obstacle for various use-cases.
/* idmapped home directories with systemd-homed */
On newer systems /home is can be an idmapped mount such that each file
on disk is owned by 65536 and a subfolder exists for foreign id ranges
such as containers. For example, a home directory might look like this
(using an arbitrary folder as an example):
user1@localhost:~/data/mount-idmapped$ ls -al /data/
total 16
drwxrwxrwx 1 65536 65536 36 Jan 27 12:15 .
drwxrwxr-x 1 root root 184 Jan 27 12:06 ..
-rw-r--r-- 1 65536 65536 0 Jan 27 12:07 aaa
-rw-r--r-- 1 65536 65536 0 Jan 27 12:07 bbb
-rw-r--r-- 1 65536 65536 0 Jan 27 12:07 cc
drwxr-xr-x 1 2147352576 2147352576 0 Jan 27 19:06 containers
When logging in home is mounted as an idmapped mount with the following
idmappings:
65536:$(id -u):1 // uid mapping
65536:$(id -g):1 // gid mapping
2147352576:2147352576:65536 // uid mapping
2147352576:2147352576:65536 // gid mapping
So for a user with uid/gid 1000 an idmapped /home would like like this:
user1@localhost:~/data/mount-idmapped$ ls -aln /mnt/
total 16
drwxrwxrwx 1 1000 1000 36 Jan 27 12:15 .
drwxrwxr-x 1 0 0 184 Jan 27 12:06 ..
-rw-r--r-- 1 1000 1000 0 Jan 27 12:07 aaa
-rw-r--r-- 1 1000 1000 0 Jan 27 12:07 bbb
-rw-r--r-- 1 1000 1000 0 Jan 27 12:07 cc
drwxr-xr-x 1 2147352576 2147352576 0 Jan 27 19:06 containers
In other words, 65536 is mapped to the user's uid/gid and the range
2147352576 up to 2147352576 + 65536 is an identity mapping for
containers.
When a container is started a transient uid/gid range is allocated
outside of both mappings of the idmapped mount. For example, the
container might get the idmapping:
$ cat /proc/1742611/uid_map
0 537985024 65536
This container will be allowed to write to disk within the allocated
foreign id range 2147352576 to 2147352576 + 65536. To do this an
idmapped mount must be created from an already idmapped mount such that:
- The mappings for the user's uid/gid must be dropped, i.e., the
following mappings are removed:
65536:$(id -u):1 // uid mapping
65536:$(id -g):1 // gid mapping
- A mapping for the transient uid/gid range to the foreign uid/gid range
is added:
2147352576:537985024:65536
In combination this will mean that the container will write to disk
within the foreign id range 2147352576 to 2147352576 + 65536.
/* nested containers */
When the outer container makes use of idmapped mounts it isn't posssible
to create an idmapped mount for the inner container with a differen
idmapping from the outer container's idmapped mount.
There are other usecases and the two above just serve as an illustration
of the problem.
This patchset makes it possible to create a new idmapped mount from an
already idmapped mount. It aims to adhere to current performance
constraints and requirements:
- Idmapped mounts aim to have near zero performance implications for
path lookup. That is why no refernce counting, locking or any other
mechanism can be required that would impact performance.
This works be ensuring that a regular mount transitions to an idmapped
mount once going from a static nop_mnt_idmap mapping to a non-static
idmapping.
- The idmapping of a mount change anymore for the lifetime of the mount
afterwards. This not just avoids UAF issues it also avoids pitfalls
such as generating non-matching uid/gid values.
Changing idmappings could be solved by:
- Idmappings could simply be reference counted (above the simple
reference count when sharing them across multiple mounts).
This would require pairing mnt_idmap_get() with mnt_idmap_put() which
would end up being sprinkled everywhere into the VFS and some
filesystems that access idmappings directly.
It wouldn't just be quite ugly and introduce new complexity it would
have a noticeable performance impact.
- Idmappings could gain RCU protection. This would help the LOOKUP_RCU
case and avoids taking reference counts under RCU.
When not under LOOKUP_RCU reference counts need to be acquired on each
idmapping. This would require pairing mnt_idmap_get() with
mnt_idmap_put() which would end up being sprinkled everywhere into the
VFS and some filesystems that access idmappings directly.
This would have the same downsides as mentioned earlier.
- The earlier solutions work by updating the mnt->mnt_idmap pointer with
the new idmapping. Instead of this it would be possible to change the
idmapping itself to avoid UAF issues.
To do this a sequence counter would have to be added to struct mount.
When retrieving the idmapping to generate uid/gid values the sequence
counter would need to be sampled and the generation of the uid/gid
would spin until the update of the idmap is finished.
This has problems as well but the biggest issue will be that this can
lead to inconsistent permission checking and inconsistent uid/gid
pairs even more than this is already possible today. Specifically,
during creation it could happen that:
idmap = mnt_idmap(mnt);
inode_permission(idmap, ...);
may_create(idmap);
// create file with uid/gid based on @idmap
in between the permission checking and the generation of the uid/gid
value the idmapping could change leading to the permission checking
and uid/gid value that is actually used to create a file on disk being
out of sync.
Similarly if two values are generated like:
idmap = mnt_idmap(mnt)
vfsgid = make_vfsgid(idmap);
// idmapping gets update concurrently
vfsuid = make_vfsuid(idmap);
@vfsgid and @vfsuid could be out of sync if the idmapping was changed
in between. The generation of vfsgid/vfsuid could span a lot of
codelines so to guard against this a sequence count would have to be
passed around.
The performance impact of this solutio are less clear but very likely
not zero.
- Using SRCU similar to fanotify that can sleep. I find that not just
ugly but it would have memory consumption implications and is overall
pretty ugly.
/* solution */
So, to avoid all of these pitfalls creating an idmapped mount from an
already idmapped mount will be done atomically, i.e., a new detached
mount is created and a new set of mount properties applied to it without
it ever having been exposed to userspace at all.
This can be done in two ways. A new flag to open_tree() is added
OPEN_TREE_CLEAR_IDMAP that clears the old idmapping and returns a mount
that isn't idmapped. And then it is possible to set mount attributes on
it again including creation of an idmapped mount.
This has the consequence that a file descriptor must exist in userspace
that doesn't have any idmapping applied and it will thus never work in
unpriviledged scenarios. As a container would be able to remove the
idmapping of the mount it has been given. That should be avoided.
Instead, we add open_tree_attr() which works just like open_tree() but
takes an optional struct mount_attr parameter. This is useful beyond
idmappings as it fills a gap where a mount never exists in userspace
without the necessary mount properties applied.
This is particularly useful for mount options such as
MOUNT_ATTR_{RDONLY,NOSUID,NODEV,NOEXEC}.
To create a new idmapped mount the following works:
// Create a first idmapped mount
struct mount_attr attr = {
.attr_set = MOUNT_ATTR_IDMAP
.userns_fd = fd_userns
};
fd_tree = open_tree(-EBADF, "/", OPEN_TREE_CLONE, &attr, sizeof(attr));
move_mount(fd_tree, "", -EBADF, "/mnt", MOVE_MOUNT_F_EMPTY_PATH);
// Create a second idmapped mount from the first idmapped mount
attr.attr_set = MOUNT_ATTR_IDMAP;
attr.userns_fd = fd_userns2;
fd_tree2 = open_tree(-EBADF, "/mnt", OPEN_TREE_CLONE, &attr, sizeof(attr));
// Create a second non-idmapped mount from the first idmapped mount:
memset(&attr, 0, sizeof(attr));
attr.attr_clr = MOUNT_ATTR_IDMAP;
fd_tree2 = open_tree(-EBADF, "/mnt", OPEN_TREE_CLONE, &attr, sizeof(attr));
* patches from https://lore.kernel.org/r/20250128-work-mnt_idmap-update-v2-v1-0-c25feb0d2eb3@kernel.org:
fs: allow changing idmappings
fs: add kflags member to struct mount_kattr
fs: add open_tree_attr()
fs: add copy_mount_setattr() helper
fs: add vfs_open_tree() helper
Link: https://lore.kernel.org/r/20250128-work-mnt_idmap-update-v2-v1-0-c25feb0d2eb3@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
Some of the fields in the statmount() reply can be optional. If the
kernel has nothing to emit in that field, then it doesn't set the flag
in the reply. This presents a problem: There is currently no way to
know what mask flags the kernel supports since you can't always count on
them being in the reply.
Add a new STATMOUNT_SUPPORTED_MASK flag and field that the kernel can
set in the reply. Userland can use this to determine if the fields it
requires from the kernel are supported. This also gives us a way to
deprecate fields in the future, if that should become necessary.
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20250206-statmount-v2-1-6ae70a21c2ab@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
This adds the STATMOUNT_MNT_UIDMAP and STATMOUNT_MNT_GIDMAP options.
It allows the retrieval of idmappings via statmount().
Currently it isn't possible to figure out what idmappings are applied to
an idmapped mount. This information is often crucial. Before statmount()
the only realistic options for an interface like this would have been to
add it to /proc/<pid>/fdinfo/<nr> or to expose it in
/proc/<pid>/mountinfo. Both solution would have been pretty ugly and
would've shown information that is of strong interest to some
application but not all. statmount() is perfect for this.
The idmappings applied to an idmapped mount are shown relative to the
caller's user namespace. This is the most useful solution that doesn't
risk leaking information or confuse the caller.
For example, an idmapped mount might have been created with the
following idmappings:
mount --bind -o X-mount.idmap="0:10000:1000 2000:2000:1 3000:3000:1" /srv /opt
Listing the idmappings through statmount() in the same context shows:
mnt_id: 2147485088
mnt_parent_id: 2147484816
fs_type: btrfs
mnt_root: /srv
mnt_point: /opt
mnt_opts: ssd,discard=async,space_cache=v2,subvolid=5,subvol=/
mnt_uidmap[0]: 0 10000 1000
mnt_uidmap[1]: 2000 2000 1
mnt_uidmap[2]: 3000 3000 1
mnt_gidmap[0]: 0 10000 1000
mnt_gidmap[1]: 2000 2000 1
mnt_gidmap[2]: 3000 3000 1
But the idmappings might not always be resolvable in the caller's user
namespace. For example:
unshare --user --map-root
In this case statmount() will skip any mappings that fil to resolve in
the caller's idmapping:
mnt_id: 2147485087
mnt_parent_id: 2147484016
fs_type: btrfs
mnt_root: /srv
mnt_point: /opt
mnt_opts: ssd,discard=async,space_cache=v2,subvolid=5,subvol=/
The caller can differentiate between a mount not being idmapped and a
mount that is idmapped but where all mappings fail to resolve in the
caller's idmapping by check for the STATMOUNT_MNT_{G,U}IDMAP flag being
raised but the number of mappings in ->mnt_{g,u}idmap_num being zero.
Note that statmount() requires that the whole range must be resolvable
in the caller's user namespace. If a subrange fails to map it will still
list the map as not resolvable. This is a practical compromise to avoid
having to find which subranges are resovable and wich aren't.
Idmappings are listed as a string array with each mapping separated by
zero bytes. This allows to retrieve the idmappings and immediately use
them for writing to e.g., /proc/<pid>/{g,u}id_map and it also allow for
simple iteration like:
if (stmnt->mask & STATMOUNT_MNT_UIDMAP) {
const char *idmap = stmnt->str + stmnt->mnt_uidmap;
for (size_t idx = 0; idx < stmnt->mnt_uidmap_nr; idx++) {
printf("mnt_uidmap[%lu]: %s\n", idx, idmap);
idmap += strlen(idmap) + 1;
}
}
Link: https://lore.kernel.org/r/20250204-work-mnt_idmap-statmount-v2-2-007720f39f2e@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
This patch adds an ioctl to give a per-file priority hint to attach
REQ_PRIO.
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
The intel-lpmd tool [1], which uses the THERMAL_GENL_ATTR_CPU_CAPABILITY
attribute to receive HFI events from kernel space, encounters a
segmentation fault after commit 1773572863 ("thermal: netlink: Add the
commands and the events for the thresholds").
The issue arises because the THERMAL_GENL_ATTR_CPU_CAPABILITY raw value
was changed while intel_lpmd still uses the old value.
Although intel_lpmd can be updated to check the THERMAL_GENL_VERSION and
use the appropriate THERMAL_GENL_ATTR_CPU_CAPABILITY value, the commit
itself is questionable.
The commit introduced a new element in the middle of enum thermal_genl_attr,
which affects many existing attributes and introduces potential risks
and unnecessary maintenance burdens for userspace thermal netlink event
users.
Solve the issue by moving the newly introduced
THERMAL_GENL_ATTR_TZ_PREV_TEMP attribute to the end of the
enum thermal_genl_attr. This ensures that all existing thermal generic
netlink attributes remain unaffected.
Link: https://github.com/intel/intel-lpmd [1]
Fixes: 1773572863 ("thermal: netlink: Add the commands and the events for the thresholds")
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://patch.msgid.link/20250208074907.5679-1-rui.zhang@intel.com
[ rjw: Subject edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Currently, TCP stack uses a constant (120 seconds)
to limit the RTO value exponential growth.
Some applications want to set a lower value.
Add TCP_RTO_MAX_MS socket option to set a value (in ms)
between 1 and 120 seconds.
It is discouraged to change the socket rto max on a live
socket, as it might lead to unexpected disconnects.
Following patch is adding a netns sysctl to control the
default value at socket creation time.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Unconditionally start to refuse creating cooked monitor interfaces to
phase them out.
There is no feature flag for drivers to opt-in for cooked monitor and
all known users are using/preferring the modern API since the hostapd
release 1.0 in May 2012.
Signed-off-by: Alexander Wetzel <Alexander@wetzel-home.de>
Link: https://patch.msgid.link/20250204111352.7004-1-Alexander@wetzel-home.de
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
elf.h had a comment saying:
> Notes used in ET_CORE. Architectures export some of the arch register
> sets using the corresponding note types via the PTRACE_GETREGSET and
> PTRACE_SETREGSET requests.
> The note name for these types is "LINUX", except NT_PRFPREG that is
> named "CORE".
However, NT_PRSTATUS is also named "CORE". It is also unclear what
"these types" refers to.
To fix these problems, define a name for each note type. The added
definitions are macros so the kernel and userspace can directly refer to
them to remove their duplicate definitions of note names.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Acked-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Link: https://lore.kernel.org/r/20250115-elf-v5-1-0f9e55bbb2fc@daynix.com
Signed-off-by: Kees Cook <kees@kernel.org>
Until this point, the kernel can use hardware-wrapped keys to do
encryption if userspace provides one -- specifically a key in
ephemerally-wrapped form. However, no generic way has been provided for
userspace to get such a key in the first place.
Getting such a key is a two-step process. First, the key needs to be
imported from a raw key or generated by the hardware, producing a key in
long-term wrapped form. This happens once in the whole lifetime of the
key. Second, the long-term wrapped key needs to be converted into
ephemerally-wrapped form. This happens each time the key is "unlocked".
In Android, these operations are supported in a generic way through
KeyMint, a userspace abstraction layer. However, that method is
Android-specific and can't be used on other Linux systems, may rely on
proprietary libraries, and also misleads people into supporting KeyMint
features like rollback resistance that make sense for other KeyMint keys
but don't make sense for hardware-wrapped inline encryption keys.
Therefore, this patch provides a generic kernel interface for these
operations by introducing new block device ioctls:
- BLKCRYPTOIMPORTKEY: convert a raw key to long-term wrapped form.
- BLKCRYPTOGENERATEKEY: have the hardware generate a new key, then
return it in long-term wrapped form.
- BLKCRYPTOPREPAREKEY: convert a key from long-term wrapped form to
ephemerally-wrapped form.
These ioctls are implemented using new operations in blk_crypto_ll_ops.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Tested-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> # sm8650
Link: https://lore.kernel.org/r/20250204060041.409950-4-ebiggers@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add a new event type to describe an hardware failure.
Reviewed-by: Nuno Sa <nuno.sa@analog.com>
Signed-off-by: Guillaume Ranquet <granquet@baylibre.com>
Reviewed-by: David Lechner <dlechner@baylibre.com>
Link: https://patch.msgid.link/20250127-ad4111_openwire-v5-1-ef2db05c384f@baylibre.com
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Fix the netlink type for hardware timestamp flags, which are represented
as a bitset of flags. Although only one flag is supported currently, the
correct netlink bitset type should be used instead of u32 to keep
consistency with other fields. Address this by adding a new named string
set description for the hwtstamp flag structure.
The code has been introduced in the current release so the uAPI change is
still okay.
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Fixes: 6e9e2eed4f ("net: ethtool: Add support for tsconfig command to get/set hwtstamp config")
Link: https://patch.msgid.link/20250205110304.375086-1-kory.maincent@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David Wei says:
====================
io_uring zero copy rx
This patchset contains net/ patches needed by a new io_uring request
implementing zero copy rx into userspace pages, eliminating a kernel
to user copy.
We configure a page pool that a driver uses to fill a hw rx queue to
hand out user pages instead of kernel pages. Any data that ends up
hitting this hw rx queue will thus be dma'd into userspace memory
directly, without needing to be bounced through kernel memory. 'Reading'
data out of a socket instead becomes a _notification_ mechanism, where
the kernel tells userspace where the data is. The overall approach is
similar to the devmem TCP proposal.
This relies on hw header/data split, flow steering and RSS to ensure
packet headers remain in kernel memory and only desired flows hit a hw
rx queue configured for zero copy. Configuring this is outside of the
scope of this patchset.
We share netdev core infra with devmem TCP. The main difference is that
io_uring is used for the uAPI and the lifetime of all objects are bound
to an io_uring instance. Data is 'read' using a new io_uring request
type. When done, data is returned via a new shared refill queue. A zero
copy page pool refills a hw rx queue from this refill queue directly. Of
course, the lifetime of these data buffers are managed by io_uring
rather than the networking stack, with different refcounting rules.
This patchset is the first step adding basic zero copy support. We will
extend this iteratively with new features e.g. dynamically allocated
zero copy areas, THP support, dmabuf support, improved copy fallback,
general optimisations and more.
In terms of netdev support, we're first targeting Broadcom bnxt. Patches
aren't included since Taehee Yoo has already sent a more comprehensive
patchset adding support in [1]. Google gve should already support this,
and Mellanox mlx5 support is WIP pending driver changes.
===========
Performance
===========
Note: Comparison with epoll + TCP_ZEROCOPY_RECEIVE isn't done yet.
Test setup:
* AMD EPYC 9454
* Broadcom BCM957508 200G
* Kernel v6.11 base [2]
* liburing fork [3]
* kperf fork [4]
* 4K MTU
* Single TCP flow
With application thread + net rx softirq pinned to _different_ cores:
+-------------------------------+
| epoll | io_uring |
|-----------|-------------------|
| 82.2 Gbps | 116.2 Gbps (+41%) |
+-------------------------------+
Pinned to _same_ core:
+-------------------------------+
| epoll | io_uring |
|-----------|-------------------|
| 62.6 Gbps | 80.9 Gbps (+29%) |
+-------------------------------+
=====
Links
=====
Broadcom bnxt support:
[1]: https://lore.kernel.org/20241003160620.1521626-8-ap420073@gmail.com
Linux kernel branch including io_uring bits:
[2]: https://github.com/isilence/linux.git zcrx/v13
liburing for testing:
[3]: https://github.com/isilence/liburing.git zcrx/next
kperf for testing:
[4]: https://git.kernel.dk/kperf.git
====================
Link: https://patch.msgid.link/20250204215622.695511-1-dw@davidwei.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Define 200G, 400G and 800G link modes using 200Gbps per lane.
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Shahar Shitrit <shshitrit@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Explain the meaning of kind_flag in BTF type_tags and decl_tags.
Update uapi btf.h kind_flag comment to reflect the changes.
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250130201239.1429648-3-ihor.solodrai@linux.dev
Add notifications for attaching and detaching mounts. The following new
event masks are added:
FAN_MNT_ATTACH - Mount was attached
FAN_MNT_DETACH - Mount was detached
If a mount is moved, then the event is reported with (FAN_MNT_ATTACH |
FAN_MNT_DETACH).
These events add an info record of type FAN_EVENT_INFO_TYPE_MNT containing
these fields identifying the affected mounts:
__u64 mnt_id - the ID of the mount (see statmount(2))
FAN_REPORT_MNT must be supplied to fanotify_init() to receive these events
and no other type of event can be received with this report type.
Marks are added with FAN_MARK_MNTNS, which records the mount namespace from
an nsfs file (e.g. /proc/self/ns/mnt).
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Link: https://lore.kernel.org/r/20250129165803.72138-3-mszeredi@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
It is useful to be able to utilise the pidfd mechanism to reference the
current thread or process (from a userland point of view - thread group
leader from the kernel's point of view).
Therefore introduce PIDFD_SELF_THREAD to refer to the current thread, and
PIDFD_SELF_THREAD_GROUP to refer to the current thread group leader.
For convenience and to avoid confusion from userland's perspective we alias
these:
* PIDFD_SELF is an alias for PIDFD_SELF_THREAD - This is nearly always what
the user will want to use, as they would find it surprising if for
instance fd's were unshared()'d and they wanted to invoke pidfd_getfd()
and that failed.
* PIDFD_SELF_PROCESS is an alias for PIDFD_SELF_THREAD_GROUP - Most users
have no concept of thread groups or what a thread group leader is, and
from userland's perspective and nomenclature this is what userland
considers to be a process.
We adjust pidfd_get_task() and the pidfd_send_signal() system call with
specific handling for this, implementing this functionality for
process_madvise(), process_mrelease() (albeit, using it here wouldn't
really make sense) and pidfd_send_signal().
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Link: https://lore.kernel.org/r/24315a16a3d01a548dd45c7515f7d51c767e954e.1738268370.git.lorenzo.stoakes@oracle.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCZ5nzXQAKCRDh3BK/laaZ
PCaJAP4gw6CnxrdzuPvm7yEsINuHdavQ8aeCiimWwOC4eBzkOgD/SlMry5vwCkW9
WOzoONVUcNIPEqYXThw77OFlkFpKGwQ=
=JQE1
-----END PGP SIGNATURE-----
Merge tag 'fuse-update-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse updates from Miklos Szeredi:
"Add support for io-uring communication between kernel and userspace
using IORING_OP_URING_CMD (Bernd Schubert). Following features enable
gains in performance compared to the regular interface:
- Allow processing multiple requests with less syscall overhead
- Combine commit of old and fetch of new fuse request
- CPU/NUMA affinity of queues
Patches were reviewed by several people, including Pavel Begunkov,
io-uring co-maintainer"
* tag 'fuse-update-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: prevent disabling io-uring on active connections
fuse: enable fuse-over-io-uring
fuse: block request allocation until io-uring init is complete
fuse: {io-uring} Prevent mount point hang on fuse-server termination
fuse: Allow to queue bg requests through io-uring
fuse: Allow to queue fg requests through io-uring
fuse: {io-uring} Make fuse_dev_queue_{interrupt,forget} non-static
fuse: {io-uring} Handle teardown of ring entries
fuse: Add io-uring sqe commit and fetch support
fuse: {io-uring} Make hash-list req unique finding functions non-static
fuse: Add fuse-io-uring handling into fuse_copy
fuse: Make fuse_copy non static
fuse: {io-uring} Handle SQEs - register commands
fuse: make args->in_args[0] to be always the header
fuse: Add fuse-io-uring design documentation
fuse: Move request bits
fuse: Move fuse_get_dev to header file
fuse: rename to fuse_dev_end_requests and make non-static
Jeff Layton contributed an implementation of NFSv4.2+ attribute
delegation, as described here:
https://www.ietf.org/archive/id/draft-ietf-nfsv4-delstid-08.html
This interoperates with similar functionality introduced into the
Linux NFS client in v6.11. An attribute delegation permits an NFS
client to manage a file's mtime, rather than flushing dirty data to
the NFS server so that the file's mtime reflects the last write,
which is considerably slower.
Neil Brown contributed dynamic NFSv4.1 session slot table resizing.
This facility enables NFSD to increase or decrease the number of
slots per NFS session depending on server memory availability. More
session slots means greater parallelism.
Chuck Lever fixed a long-standing latent bug where NFSv4 COMPOUND
encoding screws up when crossing a page boundary in the encoding
buffer. This is a zero-day bug, but hitting it is rare and depends
on the NFS client implementation. The Linux NFS client does not
happen to trigger this issue.
A variety of bug fixes and other incremental improvements fill out
the list of commits in this release. Great thanks to all
contributors, reviewers, testers, and bug reporters who participated
during this development cycle.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmeVPBUACgkQM2qzM29m
f5dClxAAmW4O2bOJaR8neJ54fzeFYXtFYEXRF/XbIh2KdnCy2LoywT9ux8ndzE0k
1tsjtv0g4y84IcYfrxXPRTuhh2GO2pHw5L1kGVRezJg2ODSFbpdzGtcVK7SIrs2S
eijTViqdi9xRgfj1jPqRrvxC99RL3/fmCztqorAPLDFsqYAd/6ZRxZ7+IcZ2h4+J
cJ0Z6Wx6eh10roacZPXweH13XJ7xWO/ublYZvQFQpK2BAKyO98aXGgLDraNt8k60
X3DZLSKkGB/eBlNeAlTtcrXec/ot6XGJPKr3b/7zhwfMi8B13RGdSmCR8SxMdRQM
vCQO4G2YadU4YFS6FFIw9Wc1XDYUuYh2YgcveafjzjXbgi7NnY7rYOxtnTgBi0Xv
XGjtGqpvD676gPm+8b3DcwqmWI3c/WUdtQIZ1uYRCFZdFqkVP91bySPT2aHtx2GO
4j3uEyTlypC00kyvu1oL3+tVUG/EFlJCpYvIbOwqDG2m7KWPStzpfJTD5Q9cdlEl
fdgs6l82EVqe1YyjLTqajDuOcRrLYK2hlR/5STc03hQV+GpKSo5UypRejzE9WtRV
zT/tyelqhj4+0EZJz4ay/8q9s2Jp+5JGVxoVvjujSuH7+Ulb3T+IDkldtMamO8Fm
www2y0/fLfU2xIapMJdCoJ+ZKgel2i8RZMPIc0cIfO5ITXm+dOs=
=hwXx
-----END PGP SIGNATURE-----
Merge tag 'nfsd-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd updates from Chuck Lever:
"Jeff Layton contributed an implementation of NFSv4.2+ attribute
delegation, as described here:
https://www.ietf.org/archive/id/draft-ietf-nfsv4-delstid-08.html
This interoperates with similar functionality introduced into the
Linux NFS client in v6.11. An attribute delegation permits an NFS
client to manage a file's mtime, rather than flushing dirty data to
the NFS server so that the file's mtime reflects the last write, which
is considerably slower.
Neil Brown contributed dynamic NFSv4.1 session slot table resizing.
This facility enables NFSD to increase or decrease the number of slots
per NFS session depending on server memory availability. More session
slots means greater parallelism.
Chuck Lever fixed a long-standing latent bug where NFSv4 COMPOUND
encoding screws up when crossing a page boundary in the encoding
buffer. This is a zero-day bug, but hitting it is rare and depends on
the NFS client implementation. The Linux NFS client does not happen to
trigger this issue.
A variety of bug fixes and other incremental improvements fill out the
list of commits in this release. Great thanks to all contributors,
reviewers, testers, and bug reporters who participated during this
development cycle"
* tag 'nfsd-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (42 commits)
sunrpc: Remove gss_{de,en}crypt_xdr_buf deadcode
sunrpc: Remove gss_generic_token deadcode
sunrpc: Remove unused xprt_iter_get_xprt
Revert "SUNRPC: Reduce thread wake-up rate when receiving large RPC messages"
nfsd: implement OPEN_ARGS_SHARE_ACCESS_WANT_OPEN_XOR_DELEGATION
nfsd: handle delegated timestamps in SETATTR
nfsd: add support for delegated timestamps
nfsd: rework NFS4_SHARE_WANT_* flag handling
nfsd: add support for FATTR4_OPEN_ARGUMENTS
nfsd: prepare delegation code for handing out *_ATTRS_DELEG delegations
nfsd: rename NFS4_SHARE_WANT_* constants to OPEN4_SHARE_ACCESS_WANT_*
nfsd: switch to autogenerated definitions for open_delegation_type4
nfs_common: make include/linux/nfs4.h include generated nfs4_1.h
nfsd: fix handling of delegated change attr in CB_GETATTR
SUNRPC: Document validity guarantees of the pointer returned by reserve_space
NFSD: Insulate nfsd4_encode_fattr4() from page boundaries in the encode buffer
NFSD: Insulate nfsd4_encode_secinfo() from page boundaries in the encode buffer
NFSD: Refactor nfsd4_do_encode_secinfo() again
NFSD: Insulate nfsd4_encode_readlink() from page boundaries in the encode buffer
NFSD: Insulate nfsd4_encode_read_plus_data() from page boundaries in the encode buffer
...
- change kzalloc to kcalloc
- remove useless test in alloc_multiple_bios
- disable REQ_NOWAIT for flushes
- dm-transaction-manager: use red-black trees instead of linear lists
- atomic writes support for dm-linear, dm-stripe and dm-mirror
- dm-crypt: code cleanups and two bugfixes
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRnH8MwLyZDhyYfesYTAyx9YGnhbQUCZ5dX9RQcbXBhdG9ja2FA
cmVkaGF0LmNvbQAKCRATAyx9YGnhba9VAP97UEbvgxZU4UnysTZc+4t9eUlmWmmU
Tf/ERJGoi/nKXQEAr//Zj5oDLBxd80hgR8iDqLeG3L/QH8vMd8IxLwWJQg8=
=pRsj
-----END PGP SIGNATURE-----
Merge tag 'for-6.14/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mikulas Patocka:
- fix a spelling error in dm-raid
- change kzalloc to kcalloc
- remove useless test in alloc_multiple_bios
- disable REQ_NOWAIT for flushes
- dm-transaction-manager: use red-black trees instead of linear lists
- atomic writes support for dm-linear, dm-stripe and dm-mirror
- dm-crypt: code cleanups and two bugfixes
* tag 'for-6.14/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm-crypt: track tag_offset in convert_context
dm-crypt: don't initialize cc_sector again
dm-crypt: don't update io->sector after kcryptd_crypt_write_io_submit()
dm-crypt: use bi_sector in bio when initialize integrity seed
dm-crypt: fully initialize clone->bi_iter in crypt_alloc_buffer()
dm-crypt: set atomic as false when calling crypt_convert() in kworker
dm-mirror: Support atomic writes
dm-io: Warn on creating multiple atomic write bios for a region
dm-stripe: Enable atomic writes
dm-linear: Enable atomic writes
dm: Ensure cloned bio is same length for atomic write
dm-table: atomic writes support
dm-transaction-manager: use red-black trees instead of linear lists
dm: disable REQ_NOWAIT for flushes
dm: remove useless test in alloc_multiple_bios
dm: change kzalloc to kcalloc
dm raid: fix spelling errors in raid_ctr()
Here is the "big" set of char/misc/iio and other smaller driver
subsystem updates for 6.14-rc1. Loads of different things in here this
development cycle, highlights are:
- ntsync "driver" to handle Windows locking types enabling Wine to
work much better on many workloads (i.e. games). The driver
framework was in 6.13, but now it's enabled and fully working
properly. Should make many SteamOS users happy. Even comes with
tests!
- Large IIO driver updates and bugfixes
- FPGA driver updates
- Coresight driver updates
- MHI driver updates
- PPS driver updatesa
- const bin_attribute reworking for many drivers
- binder driver updates
- smaller driver updates and fixes
All of these have been in linux-next for a while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZ5fGOQ8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ynatACeLlbkhUT544Va1eOL2TkjfcGxrZUAoJ3ymGC0
y0N7/+fWL6aS+b4sEilv
=TU0D
-----END PGP SIGNATURE-----
Merge tag 'char-misc-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull Char/Misc/IIO driver updates from Greg KH:
"Here is the "big" set of char/misc/iio and other smaller driver
subsystem updates for 6.14-rc1. Loads of different things in here this
development cycle, highlights are:
- ntsync "driver" to handle Windows locking types enabling Wine to
work much better on many workloads (i.e. games). The driver
framework was in 6.13, but now it's enabled and fully working
properly. Should make many SteamOS users happy. Even comes with
tests!
- Large IIO driver updates and bugfixes
- FPGA driver updates
- Coresight driver updates
- MHI driver updates
- PPS driver updatesa
- const bin_attribute reworking for many drivers
- binder driver updates
- smaller driver updates and fixes
All of these have been in linux-next for a while with no reported
issues"
* tag 'char-misc-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (311 commits)
ntsync: Fix reference leaks in the remaining create ioctls.
spmi: hisi-spmi-controller: Drop duplicated OF node assignment in spmi_controller_probe()
spmi: Set fwnode for spmi devices
ntsync: fix a file reference leak in drivers/misc/ntsync.c
scripts/tags.sh: Don't tag usages of DECLARE_BITMAP
dt-bindings: interconnect: qcom,msm8998-bwmon: Add SM8750 CPU BWMONs
dt-bindings: interconnect: OSM L3: Document sm8650 OSM L3 compatible
dt-bindings: interconnect: qcom-bwmon: Document QCS615 bwmon compatibles
interconnect: sm8750: Add missing const to static qcom_icc_desc
memstick: core: fix kernel-doc notation
intel_th: core: fix kernel-doc warnings
binder: log transaction code on failure
iio: dac: ad3552r-hs: clear reset status flag
iio: dac: ad3552r-common: fix ad3541/2r ranges
iio: chemical: bme680: Fix uninitialized variable in __bme680_read_raw()
misc: fastrpc: Fix copy buffer page size
misc: fastrpc: Fix registered buffer page address
misc: fastrpc: Deregister device nodes properly in error scenarios
nvmem: core: improve range check for nvmem_cell_write()
nvmem: qcom-spmi-sdam: Set size in struct nvmem_config
...
Here is the USB and Thunderbolt driver updates for 6.14-rc1. Nothing
huge in here, just lots of new hardware support and updates for existing
drivers. Changes here are:
- big gadget f_tcm driver update
- other gadget driver updates and fixes
- thunderbolt driver updates for new hardware and capabilities and
lots more debugging functionality to handle it when things aren't
working well.
- xhci driver updates
- new USB-serial device updates
- typec driver updates, including a chrome platform driver (acked by
the subsystem maintainers)
- other small driver updates
All of these have been in linux-next for a while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZ5fI8A8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+yloPwCguHSvL8FM6Xwaxc/hdfalwI4c49AAnRECaMhR
mA1owvXgrKO3hjDHo2Sg
=Z5yt
-----END PGP SIGNATURE-----
Merge tag 'usb-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB / Thunderbolt driver updates from Greg KH:
"Here is the USB and Thunderbolt driver updates for 6.14-rc1. Nothing
huge in here, just lots of new hardware support and updates for
existing drivers. Changes here are:
- big gadget f_tcm driver update
- other gadget driver updates and fixes
- thunderbolt driver updates for new hardware and capabilities and
lots more debugging functionality to handle it when things aren't
working well.
- xhci driver updates
- new USB-serial device updates
- typec driver updates, including a chrome platform driver (acked by
the subsystem maintainers)
- other small driver updates
All of these have been in linux-next for a while with no reported
issues"
* tag 'usb-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (123 commits)
usb: hcd: Bump local buffer size in rh_string()
Revert "usb: gadget: u_serial: Disable ep before setting port to null to fix the crash caused by port being null"
usb: typec: tcpci: Prevent Sink disconnection before vPpsShutdown in SPR PPS
usb: xhci: tegra: Fix OF boolean read warning
usb: host: xhci-plat: add support compatible ID PNP0D15
usb: typec: ucsi: Add a macro definition for UCSI v1.0
usb: dwc3: core: Defer the probe until USB power supply ready
usbip: Correct format specifier for seqnum from %d to %u
usbip: Fix seqnum sign extension issue in vhci_tx_urb
dt-bindings: usb: snps,dwc3: Split core description
usb: quirks: Add NO_LPM quirk for TOSHIBA TransMemory-Mx device
usb: dwc3: gadget: Reinitiate stream for all host NoStream behavior
USB: Use str_enable_disable-like helpers
USB: gadget: Use str_enable_disable-like helpers
USB: phy: Use str_enable_disable-like helpers
USB: typec: Use str_enable_disable-like helpers
USB: host: Use str_enable_disable-like helpers
USB: Replace own str_plural with common one
USB: serial: quatech2: fix null-ptr-deref in qt2_process_read_urb()
usb: phy: Remove API devm_usb_put_phy()
...
A small number of improvements all over the place:
vdpa/octeon gained support for multiple interrupts
virtio-pci gained support for error recovery
vp_vdpa gained support for notification with data
vhost/net has been fixed to set num_buffers for spec compliance
virtio-mem now works with kdump on s390
Small cleanups all over the place.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmeXnAsPHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRpbsYH/0gfvGFBrILN3O06cWtm/ZEny6U86o3imvxm
5tBYOu/gh7yFqPHb3ywwz0Xy8Sty8zdIGVcod6+ioiS5JxV4m75/8eODZZHK/O+g
W+2ozgRFm07RIQX8qQxfN6MURTEw9GHWLPqHfLopbQtoKJbD0NpWnm272xlJkox2
SzuHJ2D1Sg3ItcRr0x1TVsjefQKUHFduS/nt2WfQWjCnEXEbCx3S+Jp6oFCoub6L
zgI6RLim9HdScgo5lXzbWEyJ4fEjWOypO3Z5IEXls8ZP/OEueCHZX3eZmfgbbfhP
/uCPhoIxHe4PJBFDRKogdNyV40Iq8LvF7RzhOtJjS7GFlf1bipM=
=PM05
-----END PGP SIGNATURE-----
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio updates from Michael Tsirkin:
"A small number of improvements all over the place:
- vdpa/octeon support for multiple interrupts
- virtio-pci support for error recovery
- vp_vdpa support for notification with data
- vhost/net fix to set num_buffers for spec compliance
- virtio-mem now works with kdump on s390
And small cleanups all over the place"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (23 commits)
virtio_blk: Add support for transport error recovery
virtio_pci: Add support for PCIe Function Level Reset
vhost/net: Set num_buffers for virtio 1.0
vdpa/octeon_ep: read vendor-specific PCI capability
virtio-pci: define type and header for PCI vendor data
vdpa/octeon_ep: handle device config change events
vdpa/octeon_ep: enable support for multiple interrupts per device
vdpa: solidrun: Replace deprecated PCI functions
s390/kdump: virtio-mem kdump support (CONFIG_PROC_VMCORE_DEVICE_RAM)
virtio-mem: support CONFIG_PROC_VMCORE_DEVICE_RAM
virtio-mem: remember usable region size
virtio-mem: mark device ready before registering callbacks in kdump mode
fs/proc/vmcore: introduce PROC_VMCORE_DEVICE_RAM to detect device RAM ranges in 2nd kernel
fs/proc/vmcore: factor out freeing a list of vmcore ranges
fs/proc/vmcore: factor out allocating a vmcore range and adding it to a list
fs/proc/vmcore: move vmcore definitions out of kcore.h
fs/proc/vmcore: prefix all pr_* with "vmcore:"
fs/proc/vmcore: disallow vmcore modifications while the vmcore is open
fs/proc/vmcore: replace vmcoredd_mutex by vmcore_mutex
fs/proc/vmcore: convert vmcore_cb_lock into vmcore_mutex
...
Added macro definition for VIRTIO_PCI_CAP_VENDOR_CFG to identify the PCI
vendor data type in the virtio_pci_cap structure. Defined a new struct
virtio_pci_vndr_data for the vendor data capability header as per the
specification.
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Shijith Thotton <sthotton@marvell.com>
Message-Id: <20250103153226.1933479-3-sthotton@marvell.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
this pull are:
- "lib min_heap: Improve min_heap safety, testing, and documentation"
from Kuan-Wei Chiu provides various tightenings to the min_heap library
code.
- "xarray: extract __xa_cmpxchg_raw" from Tamir Duberstein preforms some
cleanup and Rust preparation in the xarray library code.
- "Update reference to include/asm-<arch>" from Geert Uytterhoeven fixes
pathnames in some code comments.
- "Converge on using secs_to_jiffies()" from Easwar Hariharan uses the
new secs_to_jiffies() in various places where that is appropriate.
- "ocfs2, dlmfs: convert to the new mount API" from Eric Sandeen
switches two filesystems to the new mount API.
- "Convert ocfs2 to use folios" from Matthew Wilcox does that.
- "Remove get_task_comm() and print task comm directly" from Yafang Shao
removes now-unneeded calls to get_task_comm() in various places.
- "squashfs: reduce memory usage and update docs" from Phillip Lougher
implements some memory savings in squashfs and performs some
maintainability work.
- "lib: clarify comparison function requirements" from Kuan-Wei Chiu
tightens the sort code's behaviour and adds some maintenance work.
- "nilfs2: protect busy buffer heads from being force-cleared" from
Ryusuke Konishi fixes an issues in nlifs when the fs is presented with a
corrupted image.
- "nilfs2: fix kernel-doc comments for function return values" from
Ryusuke Konishi fixes some nilfs kerneldoc.
- "nilfs2: fix issues with rename operations" from Ryusuke Konishi
addresses some nilfs BUG_ONs which syzbot was able to trigger.
- "minmax.h: Cleanups and minor optimisations" from David Laight
does some maintenance work on the min/max library code.
- "Fixes and cleanups to xarray" from Kemeng Shi does maintenance work
on the xarray library code.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ5SP5QAKCRDdBJ7gKXxA
jqN7AQChvwXGG43n4d5SDiA/rH7ddvowQcDqhC9cAMJ1ReR7qwEA8/LIWDE4PdMX
mJnaZ1/ibpEpearrChCViApQtcyEGQI=
=ti4E
-----END PGP SIGNATURE-----
Merge tag 'mm-nonmm-stable-2025-01-24-23-16' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
"Mainly individually changelogged singleton patches. The patch series
in this pull are:
- "lib min_heap: Improve min_heap safety, testing, and documentation"
from Kuan-Wei Chiu provides various tightenings to the min_heap
library code
- "xarray: extract __xa_cmpxchg_raw" from Tamir Duberstein preforms
some cleanup and Rust preparation in the xarray library code
- "Update reference to include/asm-<arch>" from Geert Uytterhoeven
fixes pathnames in some code comments
- "Converge on using secs_to_jiffies()" from Easwar Hariharan uses
the new secs_to_jiffies() in various places where that is
appropriate
- "ocfs2, dlmfs: convert to the new mount API" from Eric Sandeen
switches two filesystems to the new mount API
- "Convert ocfs2 to use folios" from Matthew Wilcox does that
- "Remove get_task_comm() and print task comm directly" from Yafang
Shao removes now-unneeded calls to get_task_comm() in various
places
- "squashfs: reduce memory usage and update docs" from Phillip
Lougher implements some memory savings in squashfs and performs
some maintainability work
- "lib: clarify comparison function requirements" from Kuan-Wei Chiu
tightens the sort code's behaviour and adds some maintenance work
- "nilfs2: protect busy buffer heads from being force-cleared" from
Ryusuke Konishi fixes an issues in nlifs when the fs is presented
with a corrupted image
- "nilfs2: fix kernel-doc comments for function return values" from
Ryusuke Konishi fixes some nilfs kerneldoc
- "nilfs2: fix issues with rename operations" from Ryusuke Konishi
addresses some nilfs BUG_ONs which syzbot was able to trigger
- "minmax.h: Cleanups and minor optimisations" from David Laight does
some maintenance work on the min/max library code
- "Fixes and cleanups to xarray" from Kemeng Shi does maintenance
work on the xarray library code"
* tag 'mm-nonmm-stable-2025-01-24-23-16' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (131 commits)
ocfs2: use str_yes_no() and str_no_yes() helper functions
include/linux/lz4.h: add some missing macros
Xarray: use xa_mark_t in xas_squash_marks() to keep code consistent
Xarray: remove repeat check in xas_squash_marks()
Xarray: distinguish large entries correctly in xas_split_alloc()
Xarray: move forward index correctly in xas_pause()
Xarray: do not return sibling entries from xas_find_marked()
ipc/util.c: complete the kernel-doc function descriptions
gcov: clang: use correct function param names
latencytop: use correct kernel-doc format for func params
minmax.h: remove some #defines that are only expanded once
minmax.h: simplify the variants of clamp()
minmax.h: move all the clamp() definitions after the min/max() ones
minmax.h: use BUILD_BUG_ON_MSG() for the lo < hi test in clamp()
minmax.h: reduce the #define expansion of min(), max() and clamp()
minmax.h: update some comments
minmax.h: add whitespace around operators and after commas
nilfs2: do not update mtime of renamed directory that is not moved
nilfs2: handle errors that nilfs_prepare_chunk() may return
CREDITS: fix spelling mistake
...
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmeTr8wUHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vxrMw//TJXH+U6x5LhYvBPD/KZ20ecGHqaA
eGXrbHAasYbU1CfW7HM0onR8NffOIGoYvQrtefjQAln0w6rTvyFO0xJKLP15vMfN
hnj+y1WWtKwAkSpu10Cl9nTj8uYRNNSQeoy5kS+1diwuXdby/DlgQONO2APSe9zd
KMPXJcqSfDJlM5zHrcqqtlxauO9KHInLCc/iutd85AKjvcjOoNHNeZE0pTC0C3gE
sXYHDqJiS3zdEG6X6mWFo3OzI/Q/7NGlHJ2j0CQaObsgQ9yA7eWkez25ifwZcugc
TPtjm8DhaDo9/zx0NV9c2dPauHRC6NYUjAflMPK7Aye/41BE1Ag5Ka+tMDgC2i/N
TbfBxSeArhjnjY+eZwRhrJNNC58TtHTUs69TO7Dbmuwr7cp99MIEDAYI5V6LFAdk
plKqn1h8FztW5QKRPCgmzy6KTE+WPytiGAGAQFxzIGYkV/QqyvFaVs8FIyOJUIFM
aDSa6Xy5WLGxmPZ9hPapzEm4ws/HTRpFjNgi/d4rRG5RWMwAxZZa44s9eldhN1D/
ZwEmF2rJ+U8S7Q+mXPHDlwcsHe5APCbiaTEp4X+e3LNe0i9oxhhaUWG6LrDDmTlQ
tU5j5daHiBa0nTDL1lfaayJlYX/oJ+IYQrIYzGbnivZv4ZVdPnuWSsOsMOEiLhEt
4QqCoanqf0mCn2A=
=O62O
-----END PGP SIGNATURE-----
Merge tag 'pci-v6.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci updates from Bjorn Helgaas:
"Enumeration:
- Batch sizing of multiple BARs while memory decoding is disabled
instead of disabling/enabling decoding for each BAR individually;
this optimizes virtualized environments where toggling decoding
enable is expensive (Alex Williamson)
- Add host bridge .enable_device() and .disable_device() hooks for
bridges that need to configure things like Requester ID to StreamID
mapping when enabling devices (Frank Li)
- Extend struct pci_ecam_ops with .enable_device() and
.disable_device() hooks so drivers that use pci_host_common_probe()
instead of their own .probe() have a way to set the
.enable_device() callbacks (Marc Zyngier)
- Drop 'No bus range found' message so we don't complain when DTs
don't specify the default 'bus-range = <0x00 0xff>' (Bjorn Helgaas)
- Rename the drivers/pci/of_property.c struct of_pci_range to
of_pci_range_entry to avoid confusion with the global of_pci_range
in include/linux/of_address.h (Bjorn Helgaas)
Driver binding:
- Update resource request API documentation to encourage callers to
supply a driver name when requesting resources (Philipp Stanner)
- Export pci_intx_unmanaged() and pcim_intx() (always managed) so
callers of pci_intx() (which is sometimes managed) can explicitly
choose the one they need (Philipp Stanner)
- Convert drivers from pci_intx() to always-managed pcim_intx() or
never-managed pci_intx_unmanaged(): amd_sfh, ata (ahci, ata_piix,
pata_rdc, sata_sil24, sata_sis, sata_uli, sata_vsc), bnx2x, bna,
ntb, qtnfmac, rtsx, tifm_7xx1, vfio, xen-pciback (Philipp Stanner)
- Remove pci_intx_unmanaged() since pci_intx() is now always
unmanaged and pcim_intx() is always managed (Philipp Stanner)
Error handling:
- Unexport pcie_read_tlp_log() to encourage drivers to use PCI core
logging rather than building their own (Ilpo Järvinen)
- Move TLP Log handling to its own file (Ilpo Järvinen)
- Store number of supported End-End TLP Prefixes always so we can
read the correct number of DWORDs from the TLP Prefix Log (Ilpo
Järvinen)
- Read TLP Prefixes in addition to the Header Log in
pcie_read_tlp_log() (Ilpo Järvinen)
- Add pcie_print_tlp_log() to consolidate printing of TLP Header and
Prefix Log (Ilpo Järvinen)
- Quirk the Intel Raptor Lake-P PIO log size to accommodate vendor
BIOSes that don't configure it correctly (Takashi Iwai)
ASPM:
- Save parent L1 PM Substates config so when we restore it along with
an endpoint's config, the parent info isn't junk (Jian-Hong Pan)
Power management:
- Avoid D3 for Root Ports on TUXEDO Sirius Gen1 with old BIOS because
the system can't wake up from suspend (Werner Sembach)
Endpoint framework:
- Destroy the EPC device in devm_pci_epc_destroy(), which previously
didn't call devres_release() (Zijun Hu)
- Finish virtual EP removal in pci_epf_remove_vepf(), which
previously caused a subsequent pci_epf_add_vepf() to fail with
-EBUSY (Zijun Hu)
- Write BAR_MASK before iATU registers in pci_epc_set_bar() so we
don't depend on the BAR_MASK reset value being larger than the
requested BAR size (Niklas Cassel)
- Prevent changing BAR size/flags in pci_epc_set_bar() to prevent
reads from bypassing the iATU if we reduced the BAR size (Niklas
Cassel)
- Verify address alignment when programming iATU so we don't attempt
to write bits that are read-only because of the BAR size, which
could lead to directing accesses to the wrong address (Niklas
Cassel)
- Implement artpec6 pci_epc_features so we can rely on all drivers
supporting it so we can use it in EPC core code (Niklas Cassel)
- Check for BARs of fixed size to prevent endpoint drivers from
trying to change their size (Niklas Cassel)
- Verify that requested BAR size is a power of two when endpoint
driver sets the BAR (Niklas Cassel)
Endpoint framework tests:
- Clear pci-epf-test dma_chan_rx, not dma_chan_tx, after freeing
dma_chan_rx (Mohamed Khalfella)
- Correct the DMA MEMCPY test so it doesn't fail if the Endpoint
supports both DMA_PRIVATE and DMA_MEMCPY (Manivannan Sadhasivam)
- Add pci-epf-test and pci_endpoint_test support for capabilities
(Niklas Cassel)
- Add Endpoint test for consecutive BARs (Niklas Cassel)
- Remove redundant comparison from Endpoint BAR test because a > 1MB
BAR can always be exactly covered by iterating with a 1MB buffer
(Hans Zhang)
- Move and convert PCI Endpoint tests from tools/pci to Kselftests
(Manivannan Sadhasivam)
Apple PCIe controller driver:
- Convert StreamID mapping configuration from a bus notifier to the
.enable_device() and .disable_device() callbacks (Marc Zyngier)
Freescale i.MX6 PCIe controller driver:
- Add Requester ID to StreamID mapping configuration when enabling
devices (Frank Li)
- Use DWC core suspend/resume functions for imx6 (Frank Li)
- Add suspend/resume support for i.MX8MQ, i.MX8Q, and i.MX95 (Richard
Zhu)
- Add DT compatible string 'fsl,imx8q-pcie-ep' and driver support for
i.MX8Q series (i.MX8QM, i.MX8QXP, and i.MX8DXL) Endpoints (Frank
Li)
- Add DT binding for optional i.MX95 Refclk and driver support to
enable it if the platform hasn't enabled it (Richard Zhu)
- Configure PHY based on controller being in Root Complex or Endpoint
mode (Frank Li)
- Rely on dbi2 and iATU base addresses from DT via
dw_pcie_get_resources() instead of hardcoding them (Richard Zhu)
- Deassert apps_reset in imx_pcie_deassert_core_reset() since it is
asserted in imx_pcie_assert_core_reset() (Richard Zhu)
- Add missing reference clock enable or disable logic for IMX6SX,
IMX7D, IMX8MM (Richard Zhu)
- Remove redundant imx7d_pcie_init_phy() since
imx7d_pcie_enable_ref_clk() does the same thing (Richard Zhu)
Freescale Layerscape PCIe controller driver:
- Simplify by using syscon_regmap_lookup_by_phandle_args() instead
of syscon_regmap_lookup_by_phandle() followed by
of_property_read_u32_array() (Krzysztof Kozlowski)
Marvell MVEBU PCIe controller driver:
- Add MODULE_DEVICE_TABLE() to enable module autoloading (Liao Chen)
MediaTek PCIe Gen3 controller driver:
- Use clk_bulk_prepare_enable() instead of separate
clk_bulk_prepare() and clk_bulk_enable() (Lorenzo Bianconi)
- Rearrange reset assert/deassert so they're both done in the
*_power_up() callbacks (Lorenzo Bianconi)
- Document that Airoha EN7581 requires PHY init and power-on before
PHY reset deassert, unlike other MediaTek Gen3 controllers (Lorenzo
Bianconi)
- Move Airoha EN7581 post-reset delay from the en7581 clock .enable()
method to mtk_pcie_en7581_power_up() (Lorenzo Bianconi)
- Sleep instead of delay during Airoha EN7581 power-up, since this is
a non-atomic context (Lorenzo Bianconi)
- Skip PERST# assertion on Airoha EN7581 during probe and
suspend/resume to avoid a hardware defect (Lorenzo Bianconi)
- Enable async probe to reduce system startup time (Douglas Anderson)
Microchip PolarFlare PCIe controller driver:
- Set up the inbound address translation based on whether the
platform allows coherent or non-coherent DMA (Daire McNamara)
- Update DT binding such that platforms are DMA-coherent by default
and must specify 'dma-noncoherent' if needed (Conor Dooley)
Mobiveil PCIe controller driver:
- Convert mobiveil-pcie.txt to YAML and update 'interrupt-names'
and 'reg-names' (Frank Li)
Qualcomm PCIe controller driver:
- Add DT SM8550 and SM8650 optional 'global' interrupt for link
events (Neil Armstrong)
- Add DT 'compatible' strings for IPQ5424 PCIe controller (Manikanta
Mylavarapu)
- If 'global' IRQ is supported for detection of Link Up events, tell
DWC core not to wait for link up (Krishna chaitanya chundru)
Renesas R-Car PCIe controller driver:
- Avoid passing stack buffer as resource name (King Dix)
Rockchip PCIe controller driver:
- Simplify clock and reset handling by using bulk interfaces (Anand
Moon)
- Pass typed rockchip_pcie (not void) pointer to
rockchip_pcie_disable_clocks() (Anand Moon)
- Return -ENOMEM, not success, when pci_epc_mem_alloc_addr() fails
(Dan Carpenter)
Rockchip DesignWare PCIe controller driver:
- Use dll_link_up IRQ to detect Link Up and enumerate devices so
users don't have to manually rescan (Niklas Cassel)
- Tell DWC core not to wait for link up since the 'sys' interrupt is
required and detects Link Up events (Niklas Cassel)
Synopsys DesignWare PCIe controller driver:
- Don't wait for link up in DWC core if driver can detect Link Up
event (Krishna chaitanya chundru)
- Update ICC and OPP votes after Link Up events (Krishna chaitanya
chundru)
- Always stop link in dw_pcie_suspend_noirq(), which is required at
least for i.MX8QM to re-establish link on resume (Richard Zhu)
- Drop racy and unnecessary LTSSM state check before sending
PME_TURN_OFF message in dw_pcie_suspend_noirq() (Richard Zhu)
- Add struct of_pci_range.parent_bus_addr for devices that need their
immediate parent bus address, not the CPU address, e.g., to program
an internal Address Translation Unit (iATU) (Frank Li)
TI DRA7xx PCIe controller driver:
- Simplify by using syscon_regmap_lookup_by_phandle_args() instead of
syscon_regmap_lookup_by_phandle() followed by
of_parse_phandle_with_fixed_args() or of_property_read_u32_index()
(Krzysztof Kozlowski)
Xilinx Versal CPM PCIe controller driver:
- Add DT binding and driver support for Xilinx Versal CPM5
(Thippeswamy Havalige)
MicroSemi Switchtec management driver:
- Add Microchip PCI100X device IDs (Rakesh Babu Saladi)
Miscellaneous:
- Move reset related sysfs code from pci.c to pci-sysfs.c where other
similar code lives (Ilpo Järvinen)
- Simplify reset_method_store() memory management by using __free()
instead of explicit kfree() cleanup (Ilpo Järvinen)
- Constify struct bin_attribute for sysfs, VPD, P2PDMA, and the IBM
ACPI hotplug driver (Thomas Weißschuh)
- Remove redundant PCI_VSEC_HDR and PCI_VSEC_HDR_LEN_SHIFT (Dongdong
Zhang)
- Correct documentation of the 'config_acs=' kernel parameter
(Akihiko Odaki)"
* tag 'pci-v6.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (111 commits)
PCI: Batch BAR sizing operations
dt-bindings: PCI: microchip,pcie-host: Allow dma-noncoherent
PCI: microchip: Set inbound address translation for coherent or non-coherent mode
Documentation: Fix pci=config_acs= example
PCI: Remove redundant PCI_VSEC_HDR and PCI_VSEC_HDR_LEN_SHIFT
PCI: Don't include 'pm_wakeup.h' directly
selftests: pci_endpoint: Migrate to Kselftest framework
selftests: Move PCI Endpoint tests from tools/pci to Kselftests
misc: pci_endpoint_test: Fix IOCTL return value
dt-bindings: PCI: qcom: Document the IPQ5424 PCIe controller
dt-bindings: PCI: qcom,pcie-sm8550: Document 'global' interrupt
dt-bindings: PCI: mobiveil: Convert mobiveil-pcie.txt to YAML
PCI: switchtec: Add Microchip PCI100X device IDs
misc: pci_endpoint_test: Remove redundant 'remainder' test
misc: pci_endpoint_test: Add consecutive BAR test
misc: pci_endpoint_test: Add support for capabilities
PCI: endpoint: pci-epf-test: Add support for capabilities
PCI: endpoint: pci-epf-test: Fix check for DMA MEMCPY test
PCI: endpoint: pci-epf-test: Set dma_chan_rx pointer to NULL on error
PCI: dwc: Simplify config resource lookup
...
* Clear LLBCTL if secondary mmu mapping changes.
* Add hypercall service support for usermode VMM.
x86:
* Add a comment to kvm_mmu_do_page_fault() to explain why KVM performs a
direct call to kvm_tdp_page_fault() when RETPOLINE is enabled.
* Ensure that all SEV code is compiled out when disabled in Kconfig, even
if building with less brilliant compilers.
* Remove a redundant TLB flush on AMD processors when guest CR4.PGE changes.
* Use str_enabled_disabled() to replace open coded strings.
* Drop kvm_x86_ops.hwapic_irr_update() as KVM updates hardware's APICv cache
prior to every VM-Enter.
* Overhaul KVM's CPUID feature infrastructure to track all vCPU capabilities
instead of just those where KVM needs to manage state and/or explicitly
enable the feature in hardware. Along the way, refactor the code to make
it easier to add features, and to make it more self-documenting how KVM
is handling each feature.
* Rework KVM's handling of VM-Exits during event vectoring; this plugs holes
where KVM unintentionally puts the vCPU into infinite loops in some scenarios
(e.g. if emulation is triggered by the exit), and brings parity between VMX
and SVM.
* Add pending request and interrupt injection information to the kvm_exit and
kvm_entry tracepoints respectively.
* Fix a relatively benign flaw where KVM would end up redoing RDPKRU when
loading guest/host PKRU, due to a refactoring of the kernel helpers that
didn't account for KVM's pre-checking of the need to do WRPKRU.
* Make the completion of hypercalls go through the complete_hypercall
function pointer argument, no matter if the hypercall exits to
userspace or not. Previously, the code assumed that KVM_HC_MAP_GPA_RANGE
specifically went to userspace, and all the others did not; the new code
need not special case KVM_HC_MAP_GPA_RANGE and in fact does not care at
all whether there was an exit to userspace or not.
* As part of enabling TDX virtual machines, support support separation of
private/shared EPT into separate roots. When TDX will be enabled, operations
on private pages will need to go through the privileged TDX Module via SEAMCALLs;
as a result, they are limited and relatively slow compared to reading a PTE.
The patches included in 6.14 allow KVM to keep a mirror of the private EPT in
host memory, and define entries in kvm_x86_ops to operate on external page
tables such as the TDX private EPT.
* The recently introduced conversion of the NX-page reclamation kthread to
vhost_task moved the task under the main process. The task is created as
soon as KVM_CREATE_VM was invoked and this, of course, broke userspace that
didn't expect to see any child task of the VM process until it started
creating its own userspace threads. In particular crosvm refuses to fork()
if procfs shows any child task, so unbreak it by creating the task lazily.
This is arguably a userspace bug, as there can be other kinds of legitimate
worker tasks and they wouldn't impede fork(); but it's not like userspace
has a way to distinguish kernel worker tasks right now. Should they show
as "Kthread: 1" in proc/.../status?
x86 - Intel:
* Fix a bug where KVM updates hardware's APICv cache of the highest ISR bit
while L2 is active, while ultimately results in a hardware-accelerated L1
EOI effectively being lost.
* Honor event priority when emulating Posted Interrupt delivery during nested
VM-Enter by queueing KVM_REQ_EVENT instead of immediately handling the
interrupt.
* Rework KVM's processing of the Page-Modification Logging buffer to reap
entries in the same order they were created, i.e. to mark gfns dirty in the
same order that hardware marked the page/PTE dirty.
* Misc cleanups.
Generic:
* Cleanup and harden kvm_set_memory_region(); add proper lockdep assertions when
setting memory regions and add a dedicated API for setting KVM-internal
memory regions. The API can then explicitly disallow all flags for
KVM-internal memory regions.
* Explicitly verify the target vCPU is online in kvm_get_vcpu() to fix a bug
where KVM would return a pointer to a vCPU prior to it being fully online,
and give kvm_for_each_vcpu() similar treatment to fix a similar flaw.
* Wait for a vCPU to come online prior to executing a vCPU ioctl, to fix a
bug where userspace could coerce KVM into handling the ioctl on a vCPU that
isn't yet onlined.
* Gracefully handle xarray insertion failures; even though such failures are
impossible in practice after xa_reserve(), reserving an entry is always followed
by xa_store() which does not know (or differentiate) whether there was an
xa_reserve() before or not.
RISC-V:
* Zabha, Svvptc, and Ziccrse extension support for guests. None of them
require anything in KVM except for detecting them and marking them
as supported; Zabha adds byte and halfword atomic operations, while the
others are markers for specific operation of the TLB and of LL/SC
instructions respectively.
* Virtualize SBI system suspend extension for Guest/VM
* Support firmware counters which can be used by the guests to collect
statistics about traps that occur in the host.
Selftests:
* Rework vcpu_get_reg() to return a value instead of using an out-param, and
update all affected arch code accordingly.
* Convert the max_guest_memory_test into a more generic mmu_stress_test.
The basic gist of the "conversion" is to have the test do mprotect() on
guest memory while vCPUs are accessing said memory, e.g. to verify KVM
and mmu_notifiers are working as intended.
* Play nice with treewrite builds of unsupported architectures, e.g. arm
(32-bit), as KVM selftests' Makefile doesn't do anything to ensure the
target architecture is actually one KVM selftests supports.
* Use the kernel's $(ARCH) definition instead of the target triple for arch
specific directories, e.g. arm64 instead of aarch64, mainly so as not to
be different from the rest of the kernel.
* Ensure that format strings for logging statements are checked by the
compiler even when the logging statement itself is disabled.
* Attempt to whack the last LLC references/misses mole in the Intel PMU
counters test by adding a data load and doing CLFLUSH{OPT} on the data
instead of the code being executed. It seems that modern Intel CPUs
have learned new code prefetching tricks that bypass the PMU counters.
* Fix a flaw in the Intel PMU counters test where it asserts that events
are counting correctly without actually knowing what the events count
given the underlying hardware; this can happen if Intel reuses a
formerly microarchitecture-specific event encoding as an architectural
event, as was the case for Top-Down Slots.
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmeTuzoUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroOkBwf8CRNExYaM3j9y2E7mmo6AiL2ug6+J
Uy5Hai1poY48pPwKC6ke3EWT8WVsgj/Py5pCeHvLojQchWNjCCYNfSQluJdkRxwG
DgP3QUljSxEJWBeSwyTRcKM+IySi5hZd1IFo3gePFRB829Jpnj05vjbvCyv8gIwU
y3HXxSYDsViaaFoNg4OlZFsIGis7mtknsZzk++QjuCXmxNa6UCbv3qvE/UkVLhVg
WH65RTRdjk+EsdwaOMHKuUvQoGa+iM4o39b6bqmw8+ZMK39+y33WeTX/y5RXsp1N
tUUBRfS+MuuYgC/6LmTr66EkMzoChxk3Dp3kKUaCBcfqRC8PxQag5reZhw==
=NEaO
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"Loongarch:
- Clear LLBCTL if secondary mmu mapping changes
- Add hypercall service support for usermode VMM
x86:
- Add a comment to kvm_mmu_do_page_fault() to explain why KVM
performs a direct call to kvm_tdp_page_fault() when RETPOLINE is
enabled
- Ensure that all SEV code is compiled out when disabled in Kconfig,
even if building with less brilliant compilers
- Remove a redundant TLB flush on AMD processors when guest CR4.PGE
changes
- Use str_enabled_disabled() to replace open coded strings
- Drop kvm_x86_ops.hwapic_irr_update() as KVM updates hardware's
APICv cache prior to every VM-Enter
- Overhaul KVM's CPUID feature infrastructure to track all vCPU
capabilities instead of just those where KVM needs to manage state
and/or explicitly enable the feature in hardware. Along the way,
refactor the code to make it easier to add features, and to make it
more self-documenting how KVM is handling each feature
- Rework KVM's handling of VM-Exits during event vectoring; this
plugs holes where KVM unintentionally puts the vCPU into infinite
loops in some scenarios (e.g. if emulation is triggered by the
exit), and brings parity between VMX and SVM
- Add pending request and interrupt injection information to the
kvm_exit and kvm_entry tracepoints respectively
- Fix a relatively benign flaw where KVM would end up redoing RDPKRU
when loading guest/host PKRU, due to a refactoring of the kernel
helpers that didn't account for KVM's pre-checking of the need to
do WRPKRU
- Make the completion of hypercalls go through the complete_hypercall
function pointer argument, no matter if the hypercall exits to
userspace or not.
Previously, the code assumed that KVM_HC_MAP_GPA_RANGE specifically
went to userspace, and all the others did not; the new code need
not special case KVM_HC_MAP_GPA_RANGE and in fact does not care at
all whether there was an exit to userspace or not
- As part of enabling TDX virtual machines, support support
separation of private/shared EPT into separate roots.
When TDX will be enabled, operations on private pages will need to
go through the privileged TDX Module via SEAMCALLs; as a result,
they are limited and relatively slow compared to reading a PTE.
The patches included in 6.14 allow KVM to keep a mirror of the
private EPT in host memory, and define entries in kvm_x86_ops to
operate on external page tables such as the TDX private EPT
- The recently introduced conversion of the NX-page reclamation
kthread to vhost_task moved the task under the main process. The
task is created as soon as KVM_CREATE_VM was invoked and this, of
course, broke userspace that didn't expect to see any child task of
the VM process until it started creating its own userspace threads.
In particular crosvm refuses to fork() if procfs shows any child
task, so unbreak it by creating the task lazily. This is arguably a
userspace bug, as there can be other kinds of legitimate worker
tasks and they wouldn't impede fork(); but it's not like userspace
has a way to distinguish kernel worker tasks right now. Should they
show as "Kthread: 1" in proc/.../status?
x86 - Intel:
- Fix a bug where KVM updates hardware's APICv cache of the highest
ISR bit while L2 is active, while ultimately results in a
hardware-accelerated L1 EOI effectively being lost
- Honor event priority when emulating Posted Interrupt delivery
during nested VM-Enter by queueing KVM_REQ_EVENT instead of
immediately handling the interrupt
- Rework KVM's processing of the Page-Modification Logging buffer to
reap entries in the same order they were created, i.e. to mark gfns
dirty in the same order that hardware marked the page/PTE dirty
- Misc cleanups
Generic:
- Cleanup and harden kvm_set_memory_region(); add proper lockdep
assertions when setting memory regions and add a dedicated API for
setting KVM-internal memory regions. The API can then explicitly
disallow all flags for KVM-internal memory regions
- Explicitly verify the target vCPU is online in kvm_get_vcpu() to
fix a bug where KVM would return a pointer to a vCPU prior to it
being fully online, and give kvm_for_each_vcpu() similar treatment
to fix a similar flaw
- Wait for a vCPU to come online prior to executing a vCPU ioctl, to
fix a bug where userspace could coerce KVM into handling the ioctl
on a vCPU that isn't yet onlined
- Gracefully handle xarray insertion failures; even though such
failures are impossible in practice after xa_reserve(), reserving
an entry is always followed by xa_store() which does not know (or
differentiate) whether there was an xa_reserve() before or not
RISC-V:
- Zabha, Svvptc, and Ziccrse extension support for guests. None of
them require anything in KVM except for detecting them and marking
them as supported; Zabha adds byte and halfword atomic operations,
while the others are markers for specific operation of the TLB and
of LL/SC instructions respectively
- Virtualize SBI system suspend extension for Guest/VM
- Support firmware counters which can be used by the guests to
collect statistics about traps that occur in the host
Selftests:
- Rework vcpu_get_reg() to return a value instead of using an
out-param, and update all affected arch code accordingly
- Convert the max_guest_memory_test into a more generic
mmu_stress_test. The basic gist of the "conversion" is to have the
test do mprotect() on guest memory while vCPUs are accessing said
memory, e.g. to verify KVM and mmu_notifiers are working as
intended
- Play nice with treewrite builds of unsupported architectures, e.g.
arm (32-bit), as KVM selftests' Makefile doesn't do anything to
ensure the target architecture is actually one KVM selftests
supports
- Use the kernel's $(ARCH) definition instead of the target triple
for arch specific directories, e.g. arm64 instead of aarch64,
mainly so as not to be different from the rest of the kernel
- Ensure that format strings for logging statements are checked by
the compiler even when the logging statement itself is disabled
- Attempt to whack the last LLC references/misses mole in the Intel
PMU counters test by adding a data load and doing CLFLUSH{OPT} on
the data instead of the code being executed. It seems that modern
Intel CPUs have learned new code prefetching tricks that bypass the
PMU counters
- Fix a flaw in the Intel PMU counters test where it asserts that
events are counting correctly without actually knowing what the
events count given the underlying hardware; this can happen if
Intel reuses a formerly microarchitecture-specific event encoding
as an architectural event, as was the case for Top-Down Slots"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (151 commits)
kvm: defer huge page recovery vhost task to later
KVM: x86/mmu: Return RET_PF* instead of 1 in kvm_mmu_page_fault()
KVM: Disallow all flags for KVM-internal memslots
KVM: x86: Drop double-underscores from __kvm_set_memory_region()
KVM: Add a dedicated API for setting KVM-internal memslots
KVM: Assert slots_lock is held when setting memory regions
KVM: Open code kvm_set_memory_region() into its sole caller (ioctl() API)
LoongArch: KVM: Add hypercall service support for usermode VMM
LoongArch: KVM: Clear LLBCTL if secondary mmu mapping is changed
KVM: SVM: Use str_enabled_disabled() helper in svm_hardware_setup()
KVM: VMX: read the PML log in the same order as it was written
KVM: VMX: refactor PML terminology
KVM: VMX: Fix comment of handle_vmx_instruction()
KVM: VMX: Reinstate __exit attribute for vmx_exit()
KVM: SVM: Use str_enabled_disabled() helper in sev_hardware_setup()
KVM: x86: Avoid double RDPKRU when loading host/guest PKRU
KVM: x86: Use LVT_TIMER instead of an open coded literal
RISC-V: KVM: Add new exit statstics for redirected traps
RISC-V: KVM: Update firmware counters for various events
RISC-V: KVM: Redirect instruction access fault trap to guest
...
No major functionality this cycle:
- iommufd part of the domain_alloc_paging_flags() conversion
- Move IOMMU_HWPT_FAULT_ID_VALID processing out of drivers
- Increase a timeout waiting for other threads to drop transient refcounts
that syzkaller was hitting
- Fix a UBSAN hit in iova_bitmap due to shift out of bounds
- Add missing cleanup of fault events during FD shutdown, fixing a memory leak
- Improve the fault delivery flow to have a smaller locking critical
region that does not include copy_to_user()
- Fix 32 bit ABI breakage due to missed implicit padding, and fix the
stack memory leakage
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZ5JgnAAKCRCFwuHvBreF
YQWrAP9ItOsbeOxYIEVQK1E90HbMWVHF8RhWDPnpChgzyorjjQEA/OOou6uTAcVC
fJybLk3T7JrutMmBFvVLsRg+yCJPlgE=
=ldhT
-----END PGP SIGNATURE-----
Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd
Pull iommufd updates from Jason Gunthorpe:
"No major functionality this cycle:
- iommufd part of the domain_alloc_paging_flags() conversion
- Move IOMMU_HWPT_FAULT_ID_VALID processing out of drivers
- Increase a timeout waiting for other threads to drop transient
refcounts that syzkaller was hitting
- Fix a UBSAN hit in iova_bitmap due to shift out of bounds
- Add missing cleanup of fault events during FD shutdown, fixing a
memory leak
- Improve the fault delivery flow to have a smaller locking critical
region that does not include copy_to_user()
- Fix 32 bit ABI breakage due to missed implicit padding, and fix the
stack memory leakage"
* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd:
iommufd: Fix struct iommu_hwpt_pgfault init and padding
iommufd/fault: Use a separate spinlock to protect fault->deliver list
iommufd/fault: Destroy response and mutex in iommufd_fault_destroy()
iommufd: Keep OBJ/IOCTL lists in an alphabetical order
iommufd/iova_bitmap: Fix shift-out-of-bounds in iova_bitmap_offset_to_index()
iommu: iommufd: fix WARNING in iommufd_device_unbind
iommufd: Deal with IOMMU_HWPT_FAULT_ID_VALID in iommufd core
iommufd/selftest: Remove domain_alloc_paging()
Highlights:
- acer-wmi:
- Add support for PH14-51, PH16-72, and Nitro AN515-58
- Add proper hwmon support
- Improve error handling when reading "gaming system info"
- Replace direct EC reads for the current platform profile
with WMI calls to handle EC address variations
- Replace custom platform_profile cycling with the generic one
- ACPI: platform_profile: Major refactoring and improvements
- Support registering multiple platform_profile handlers
concurrently to avoid the need to quirk which handler takes
precedence
- Support reporting "custom" profile for cases where the current
profile is ambiguous or when settings tweaks are done outside
the pre-defined profile
- Abstract and layer platform_profile API better using the
class_dev and drvdata
- Various minor improvements
- Add Documentation and kerneldoc
- amd/hsmp: Add support for HSMP protocol v7
- amd/pmc:
- Support AMD 1Ah family 70h
- Support STB with Ryzen desktop SoCs
- amd/pmf:
- Support Custom BIOS inputs for PMF TA
- Support passing SRA sensor data from AMD SFH (HID) to PMF TA
- dell-smo8800:
- Move SMO88xx quirk away from the generic i2c-i801 driver
- Add accelerometer support for Dell Latitude E6330/E6430 and
XPS 9550
- Support probing accelerometer for models yet to be listed in
the DMI mapping table because ACPI lacks i2c-address for the
accelerometer (behind a module parameter because probing might
be dangerous)
- HID: amd_sfh: Add support for exporting SRA sensor data
- hp-wmi: Add fan and thermal support for Victus 16-s1000
- input: Add key for phone linking
- input: i8042: Add context for the i8042 filter to enable cleaning up
the filter related global variables from pdx86 drivers
- lenovo-wmi-camera: Use SW_CAMERA_LENS_COVER instead of
KEY_CAMERA_ACCESS
- mellanox: mlxbf-pmc:
- Add support for monitoring cycle count
- Add Documentation
- thinkpad_acpi: Add support for phone link key
- tools/power/x86/intel-speed-select: Fix Turbo Ratio Limit restore
- x86-android-tables: Add support for Vexia EDU ATLA 10 Bluetooth and
EC battery driver
- Miscellaneous cleanups / refactoring / improvements
The following is an automated shortlog grouped by driver:
acer-wmi:
- add support for Acer Nitro AN515-58
- Add support for Acer PH14-51
- Add support for Acer Predator PH16-72
- Fix initialization of last_non_turbo_profile
- Ignore AC events
- Implement proper hwmon support
- Improve error handling when reading gaming system information
- Rename ACER_CAP_FAN_SPEED_READ
- simplify platform profile cycling
- use an ACPI bitmap to set the platform profile choices
- Use devm_platform_profile_register()
- use new helper function for setting overclocks
- use WMI calls for platform profile handling
ACPI: platform-profile:
- Add a name member to handlers
ACPI: platform_profile:
- Add a prefix to log messages
- Add choices attribute for class interface
- Add concept of a "custom" profile
- Add device pointer into platform profile handler
- Add devm_platform_profile_register()
- Add documentation
- Add name attribute to class interface
- Add `ops` member to handlers
- Add platform handler argument to platform_profile_remove()
- Add `probe` to platform_profile_ops
- Add profile attribute for class interface
- Allow multiple handlers
- Check all profile handler to calculate next
- Clean platform_profile_handler
- Create class for ACPI platform profile
- Let drivers set drvdata to the class device
- Make sure all profile handlers agree on profile
- Move matching string for new profile out of mutex
- Move platform_profile_handler
- Move sanity check out of the mutex
- Notify change events on register and unregister
- Notify class device from platform_profile_notify()
- Only show profiles common for all handlers
- Pass the profile handler into platform_profile_notify()
- Remove platform_profile_handler from callbacks
- Remove platform_profile_handler from exported symbols
- Replace *class_dev member with class_dev
- Use guard(mutex) for register/unregister
- Use `scoped_cond_guard`
alienware_wmi:
- General cleanup of WMAX methods
alienware-wmi:
- Improve hdmi_mux, amplifier and deepslp group creation
- Improve rgb-zones group creation
- Modify parse_rgb() signature
- Move Lighting Control State
- Remove unnecessary check at module exit
- Use devm_platform_profile_register()
amd/hsmp:
- Add support for HSMP protocol version 7 messages
- Constify 'struct bin_attribute'
amd/pmc:
- Add STB support for AMD Desktop variants
- Define enum for S2D/PMC msg_port and add helper function
- Isolate STB code changes to a new file
- Move STB block into amd_pmc_s2d_init()
- Move STB functionality to a new file for better code organization
- Update function names to align with new STB file
- Update IP information structure for newer SoCs
- Update S2D message id for 1Ah Family 70h model
- Use ARRAY_SIZE() to fill num_ips information
amd: pmc:
- Use guard(mutex)
amd: pmf:
- Drop all quirks
amd/pmf:
- Enable Custom BIOS Inputs for PMF-TA
- Get SRA sensor data from AMD SFH driver
amd: pmf: sps:
- Use devm_platform_profile_register()
amd: pmf:
- Switch to guard(mutex)
asus-wmi:
- Use devm_platform_profile_register()
dell: dcdbas:
- Constify 'struct bin_attribute'
dell: dell-pc:
- Create platform device
dell-pc:
- Use devm_platform_profile_register()
dell_rbu:
- Constify 'struct bin_attribute'
dell-smo8800:
- Add a couple more models to lis3lv02d_devices[]
- Add support for probing for the accelerometer i2c address
- Move instantiation of lis3lv02d i2c_client from i2c-i801 to dell-lis3lv02d
- Move SMO88xx acpi_device_ids to dell-smo8800-ids.h
dell-sysman:
- Directly use firmware_attributes_class
dell-uart-backlight:
- Use blacklight power constant
docs: platform/x86: wmi:
- mention tool for invoking WMI methods
Documentation/ABI:
- Add document for Mellanox PMC driver
- Add new sysfs field to sysfs-platform-mellanox-pmc
Documentation:
- Add documentation about class interface for platform profiles
firmware_attributes_class:
- Drop lifecycle functions
- Move include linux/device/class.h
- Simplify API
fujitsu-laptop:
- replace strcpy -> strscpy
HID: amd_sfh:
- Add support to export device operating states
hp-bioscfg:
- Directly use firmware_attributes_class
hp-wmi:
- Add fan and thermal profile support for Victus 16-s1000
- Use devm_platform_profile_register()
ideapad-laptop:
- Use devm_platform_profile_register()
Input:
- allocate keycode for phone linking
- i8042 - Add support for platform filter contexts
inspur_platform_profile:
- Use devm_platform_profile_register()
int3472:
- Check for adev == NULL
- Debug log the sensor name
- Fix skl_int3472_handle_gpio_resources() return value
- Make "pin number mismatch" message a debug message
intel: bytcrc_pwrsrc:
- fix power_supply dependency
- Optionally register a power_supply dev
intel: int0002_vgpio:
- Make the irqchip immutable
intel/pmt:
- Constify 'struct bin_attribute'
intel: punit_ipc:
- Remove unused function
intel/sdsi:
- Constify 'struct bin_attribute'
intel/tpmi/plr:
- Make char[] longer to silence warning
lenovo-wmi-camera:
- Use SW_CAMERA_LENS_COVER instead of KEY_CAMERA_ACESS
MAINTAINERS:
- Change AMD PMC driver status to "Supported"
mlxbf-bootctl:
- Constify 'struct bin_attribute'
- use sysfs_emit() instead of sprintf()
mlxbf-pmc:
- Add support for clock_measure performance block
- Add support for monitoring cycle count
- incorrect type in assignment
mlxreg-hotplug:
- use sysfs_emit() instead of sprintf()
mlxreg-io:
- use sysfs_emit() instead of sprintf()
quickstart:
- don't include 'pm_wakeup.h' directly
serdev_helpers:
- Add get_serdev_controller_from_parent() helper
- Check for serial_ctrl_uid == NULL
surface: surface_platform_profile:
- Use devm_platform_profile_register()
think-lmi:
- Directly use firmware_attributes_class
thinkpad_acpi:
- Add support for new phone link hotkey
thinkpad-acpi:
- replace strcpy with strscpy
thinkpad_acpi:
- Use devm_platform_profile_register()
tools/power/x86/intel-speed-select:
- Fix TRL restore after SST-TF disable
- v1.21 release
wmi-bmof:
- Make use of .bin_size() callback
x86-android-tablets:
- Add Bluetooth support for Vexia EDU ATLA 10
- Add missing __init to get_i2c_adap_by_*()
- Add support for getting serdev-controller by PCI parent
- Add Vexia EDU ATLA 10 EC battery driver
- Change x86_instantiate_serdev() prototype
- make platform data be static
- Make variables only used locally static
- Store serdev-controller ACPI HID + UID in a union
Merges:
- Merge branch 'fixes' into 'for-next'
- Merge branch 'intel-sst' of https://github.com/spandruvada/linux-kernel into review-ilpo-next
- Merge branch 'platform-drivers-x86-platform-profile' into for-next
- Merge branch 'platform-drivers-x86-platform-profile' into for-next
- Merge import NS conversion from 'https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git' into for-next
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQSCSUwRdwTNL2MhaBlZrE9hU+XOMQUCZ5JDNAAKCRBZrE9hU+XO
MT3AAP9YSYaWZUEgV9T/De2C/ksx0XfmHULmtQHccMgqIsIxmAEAmsBOHsDozPuZ
9F2IbT4uBuQo2iwbGq0DhVd+N36kEQw=
=Vz0C
-----END PGP SIGNATURE-----
Merge tag 'platform-drivers-x86-v6.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver updates from Ilpo Järvinen:
"acer-wmi:
- Add support for PH14-51, PH16-72, and Nitro AN515-58
- Add proper hwmon support
- Improve error handling when reading "gaming system info"
- Replace direct EC reads for the current platform profile with WMI
calls to handle EC address variations
- Replace custom platform_profile cycling with the generic one
ACPI:
- platform_profile: Major refactoring and improvements
- Support registering multiple platform_profile handlers concurrently
to avoid the need to quirk which handler takes precedence
- Support reporting "custom" profile for cases where the current
profile is ambiguous or when settings tweaks are done outside the
pre-defined profile
- Abstract and layer platform_profile API better using the class_dev
and drvdata
- Various minor improvements
- Add Documentation and kerneldoc
amd/hsmp:
- Add support for HSMP protocol v7
amd/pmc:
- Support AMD 1Ah family 70h
- Support STB with Ryzen desktop SoCs
amd/pmf:
- Support Custom BIOS inputs for PMF TA
- Support passing SRA sensor data from AMD SFH (HID) to PMF TA
dell-smo8800:
- Move SMO88xx quirk away from the generic i2c-i801 driver
- Add accelerometer support for Dell Latitude E6330/E6430 and XPS
9550
- Support probing accelerometer for models yet to be listed in the
DMI mapping table because ACPI lacks i2c-address for the
accelerometer (behind a module parameter because probing might be
dangerous)
HID:
- amd_sfh: Add support for exporting SRA sensor data
hp-wmi:
- Add fan and thermal support for Victus 16-s1000
input:
- Add key for phone linking
- i8042: Add context for the i8042 filter to enable cleaning up the
filter related global variables from pdx86 drivers
lenovo-wmi-camera:
- Use SW_CAMERA_LENS_COVER instead of KEY_CAMERA_ACCESS
mellanox mlxbf-pmc:
- Add support for monitoring cycle count
- Add Documentation
thinkpad_acpi:
- Add support for phone link key
tools/power/x86/intel-speed-select:
- Fix Turbo Ratio Limit restore
x86-android-tables:
- Add support for Vexia EDU ATLA 10 Bluetooth and EC battery driver
And miscellaneous cleanups / refactoring / improvements"
* tag 'platform-drivers-x86-v6.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (133 commits)
platform/x86: acer-wmi: Fix initialization of last_non_turbo_profile
platform/x86: acer-wmi: Ignore AC events
platform/mellanox: mlxreg-io: use sysfs_emit() instead of sprintf()
platform/mellanox: mlxreg-hotplug: use sysfs_emit() instead of sprintf()
platform/mellanox: mlxbf-bootctl: use sysfs_emit() instead of sprintf()
platform/x86: hp-wmi: Add fan and thermal profile support for Victus 16-s1000
ACPI: platform_profile: Add a prefix to log messages
ACPI: platform_profile: Add documentation
ACPI: platform_profile: Clean platform_profile_handler
ACPI: platform_profile: Move platform_profile_handler
ACPI: platform_profile: Remove platform_profile_handler from exported symbols
platform/x86: thinkpad_acpi: Use devm_platform_profile_register()
platform/x86: inspur_platform_profile: Use devm_platform_profile_register()
platform/x86: hp-wmi: Use devm_platform_profile_register()
platform/x86: ideapad-laptop: Use devm_platform_profile_register()
platform/x86: dell-pc: Use devm_platform_profile_register()
platform/x86: asus-wmi: Use devm_platform_profile_register()
platform/x86: amd: pmf: sps: Use devm_platform_profile_register()
platform/x86: acer-wmi: Use devm_platform_profile_register()
platform/surface: surface_platform_profile: Use devm_platform_profile_register()
...
This adds basic support for ring SQEs (with opcode=IORING_OP_URING_CMD).
For now only FUSE_IO_URING_CMD_REGISTER is handled to register queue
entries.
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> # io_uring
Reviewed-by: Luis Henriques <luis@igalia.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmePs7oACgkQnJ2qBz9k
QNmHuAf9GkLnY5u1/81xP5V9ukZ4N2yeMW0dydLS5cjWj/St5ELeMAza3jeqtJtD
j36vbnmy2c5pPaGLAK8BJpMXT/R2TkmmKD004zcfqF2S3SgbGzdgO1zMZzq9KJpM
woRKZtLuglDajedsDEBBcKotBhlN2+C/sQlFuL1mX4zitk9ajr0qYUB1+JqOeg5f
qwPsDLT077ADpxd7lVIMcm+OqbduP5KWkBKYHpn7lJcLe1eqVMMzceJroW42zhVG
Dq8Iln26bbU9Wx6FSPFCUcHEzHRHUfXmu07HN9U0X++0QgWjrmBQQLooGFB/bR4a
edBrPpVas6xE4/brjgFX3gOKtv8xYg==
=ewDV
-----END PGP SIGNATURE-----
Merge tag 'fsnotify_hsm_for_v6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fsnotify pre-content notification support from Jan Kara:
"This introduces a new fsnotify event (FS_PRE_ACCESS) that gets
generated before a file contents is accessed.
The event is synchronous so if there is listener for this event, the
kernel waits for reply. On success the execution continues as usual,
on failure we propagate the error to userspace. This allows userspace
to fill in file content on demand from slow storage. The context in
which the events are generated has been picked so that we don't hold
any locks and thus there's no risk of a deadlock for the userspace
handler.
The new pre-content event is available only for users with global
CAP_SYS_ADMIN capability (similarly to other parts of fanotify
functionality) and it is an administrator responsibility to make sure
the userspace event handler doesn't do stupid stuff that can DoS the
system.
Based on your feedback from the last submission, fsnotify code has
been improved and now file->f_mode encodes whether pre-content event
needs to be generated for the file so the fast path when nobody wants
pre-content event for the file just grows the additional file->f_mode
check. As a bonus this also removes the checks whether the old
FS_ACCESS event needs to be generated from the fast path. Also the
place where the event is generated during page fault has been moved so
now filemap_fault() generates the event if and only if there is no
uptodate folio in the page cache.
Also we have dropped FS_PRE_MODIFY event as current real-world users
of the pre-content functionality don't really use it so let's start
with the minimal useful feature set"
* tag 'fsnotify_hsm_for_v6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: (21 commits)
fanotify: Fix crash in fanotify_init(2)
fs: don't block write during exec on pre-content watched files
fs: enable pre-content events on supported file systems
ext4: add pre-content fsnotify hook for DAX faults
btrfs: disable defrag on pre-content watched files
xfs: add pre-content fsnotify hook for DAX faults
fsnotify: generate pre-content permission event on page fault
mm: don't allow huge faults for files with pre content watches
fanotify: disable readahead if we have pre-content watches
fanotify: allow to set errno in FAN_DENY permission response
fanotify: report file range info with pre-content events
fanotify: introduce FAN_PRE_ACCESS permission event
fsnotify: generate pre-content permission event on truncate
fsnotify: pass optional file access range in pre-content event
fsnotify: introduce pre-content permission events
fanotify: reserve event bit of deprecated FAN_DIR_MODIFY
fanotify: rename a misnamed constant
fanotify: don't skip extra event info if no info_mode is set
fsnotify: check if file is actually being watched for pre-content events on open
fsnotify: opt-in for permission events at file open time
...
- Clear pci-epf-test dma_chan_rx, not dma_chan_tx, after freeing
dma_chan_rx (Mohamed Khalfella)
- Correct the DMA MEMCPY test so it doesn't fail if the Endpoint supports
both DMA_PRIVATE and DMA_MEMCPY (Manivannan Sadhasivam)
- Add pci-epf-test and pci_endpoint_test support for capabilities (Niklas
Cassel)
- Add Endpoint test for consecutive BARs (Niklas Cassel)
- Remove redundant comparison from Endpoint BAR test because a > 1MB BAR
can always be exactly covered by iterating with a 1MB buffer (Hans Zhang)
- Correct the PCI Endpoint test IOCTL return value (Manivannan Sadhasivam)
- Move PCI Endpoint tests from tools/pci to Kselftests (Manivannan
Sadhasivam)
- Convert PCI Endpoint tests to the Kselftest framework (Manivannan
Sadhasivam)
* pci/endpoint-test:
selftests: pci_endpoint: Migrate to Kselftest framework
selftests: Move PCI Endpoint tests from tools/pci to Kselftests
misc: pci_endpoint_test: Fix IOCTL return value
misc: pci_endpoint_test: Remove redundant 'remainder' test
misc: pci_endpoint_test: Add consecutive BAR test
misc: pci_endpoint_test: Add support for capabilities
PCI: endpoint: pci-epf-test: Add support for capabilities
PCI: endpoint: pci-epf-test: Fix check for DMA MEMCPY test
PCI: endpoint: pci-epf-test: Set dma_chan_rx pointer to NULL on error
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmeOu1YACgkQ6rmadz2v
bTrrHxAAn6eqEsluWnDlzhI0OGsPjvgS00sf+MOeqiXYeS2eJ8yJuKifp38+nIQZ
lIplsWU2ReUY20eizPqLPnQ7TXZGvLgp08E8yHUoZ0siWanqr9iDRfbZCCNrDMNm
lMqeR1SLapMws2R/UX9JbvPn2ajIJ6Lb4wxenTfdlW6q+0hAGM6Dt0k/jBod+quq
/oo+xwG3L0q4APBovJfiAFN2z6IYN03b+zLiOrpIJtMACGewEXnl3m4mkL8ZM/FV
nZGPIxIUPXCpKTGEkNqxfkrnHN2wZQ4ZSKEJ6lhEEp4jrgCVITaGZ/E7jlx6fZoj
bbd4YMonIPo9Nhim8p1dt8yYBhKKiE5IXIq0GqlMv5+MvAN8ylrlydpsouW1fu66
hZ1W1BxbxmrgyF0Bwo9JPOMhBHwMrmD6iH9LgiMpZf0ASeF+q9cJpoSOU5j5E9XB
LpLIRf5jYTd4wZjhDmrQREReLo+Bng9DlCBu+jjh2+YTz6l6Qed+ETpENcd7lL5i
IHZVbgD2RVPNJoUfdrd763HfYfDTk+50MF5FIMEyfKHz11if0E/LhBMzto22hm6b
2f8ruj/8yvg8s2dxEP3ySQgcnynlwEnGxLenUVv7uEOYKeWri1rq+fvTK5ne1OLK
oHnTlkViwQb74c0r8cFW+nkyfUYTfhhBAql14rl/fMjGDO2KZ10=
=f2CA
-----END PGP SIGNATURE-----
Merge tag 'bpf-next-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Pull bpf updates from Alexei Starovoitov:
"A smaller than usual release cycle.
The main changes are:
- Prepare selftest to run with GCC-BPF backend (Ihor Solodrai)
In addition to LLVM-BPF runs the BPF CI now runs GCC-BPF in compile
only mode. Half of the tests are failing, since support for
btf_decl_tag is still WIP, but this is a great milestone.
- Convert various samples/bpf to selftests/bpf/test_progs format
(Alexis Lothoré and Bastien Curutchet)
- Teach verifier to recognize that array lookup with constant
in-range index will always succeed (Daniel Xu)
- Cleanup migrate disable scope in BPF maps (Hou Tao)
- Fix bpf_timer destroy path in PREEMPT_RT (Hou Tao)
- Always use bpf_mem_alloc in bpf_local_storage in PREEMPT_RT (Martin
KaFai Lau)
- Refactor verifier lock support (Kumar Kartikeya Dwivedi)
This is a prerequisite for upcoming resilient spin lock.
- Remove excessive 'may_goto +0' instructions in the verifier that
LLVM leaves when unrolls the loops (Yonghong Song)
- Remove unhelpful bpf_probe_write_user() warning message (Marco
Elver)
- Add fd_array_cnt attribute for prog_load command (Anton Protopopov)
This is a prerequisite for upcoming support for static_branch"
* tag 'bpf-next-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (125 commits)
selftests/bpf: Add some tests related to 'may_goto 0' insns
bpf: Remove 'may_goto 0' instruction in opt_remove_nops()
bpf: Allow 'may_goto 0' instruction in verifier
selftests/bpf: Add test case for the freeing of bpf_timer
bpf: Cancel the running bpf_timer through kworker for PREEMPT_RT
bpf: Free element after unlock in __htab_map_lookup_and_delete_elem()
bpf: Bail out early in __htab_map_lookup_and_delete_elem()
bpf: Free special fields after unlock in htab_lru_map_delete_node()
tools: Sync if_xdp.h uapi tooling header
libbpf: Work around kernel inconsistently stripping '.llvm.' suffix
bpf: selftests: verifier: Add nullness elision tests
bpf: verifier: Support eliding map lookup nullness
bpf: verifier: Refactor helper access type tracking
bpf: tcp: Mark bpf_load_hdr_opt() arg2 as read-write
bpf: verifier: Add missing newline on verbose() call
selftests/bpf: Add distilled BTF test about marking BTF_IS_EMBEDDED
libbpf: Fix incorrect traversal end type ID when marking BTF_IS_EMBEDDED
libbpf: Fix return zero when elf_begin failed
selftests/bpf: Fix btf leak on new btf alloc failure in btf_distill test
veristat: Load struct_ops programs only once
...
- Implement AT_EXECVE_CHECK flag to execveat(2) (Mickaël Salaün)
- Implement EXEC_RESTRICT_FILE and EXEC_DENY_INTERACTIVE securebits
(Mickaël Salaün)
- Add selftests and samples for AT_EXECVE_CHECK (Mickaël Salaün)
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRSPkdeREjth1dHnSE2KwveOeQkuwUCZ4hO7wAKCRA2KwveOeQk
u4l+AP9UHO1KwMn3aOt6uFPj7omaoY0vpcB1rx/x5s4efNFHOAD/QjY0f+ND+HzF
mKLYOIeacGEQi7TNhpnOkGjz6jzSiwg=
=sMhZ
-----END PGP SIGNATURE-----
Merge tag 'AT_EXECVE_CHECK-v6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull AT_EXECVE_CHECK from Kees Cook:
- Implement AT_EXECVE_CHECK flag to execveat(2) (Mickaël Salaün)
- Implement EXEC_RESTRICT_FILE and EXEC_DENY_INTERACTIVE securebits
(Mickaël Salaün)
- Add selftests and samples for AT_EXECVE_CHECK (Mickaël Salaün)
* tag 'AT_EXECVE_CHECK-v6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
ima: instantiate the bprm_creds_for_exec() hook
samples/check-exec: Add an enlighten "inc" interpreter and 28 tests
selftests: ktap_helpers: Fix uninitialized variable
samples/check-exec: Add set-exec
selftests/landlock: Add tests for execveat + AT_EXECVE_CHECK
selftests/exec: Add 32 tests for AT_EXECVE_CHECK and exec securebits
security: Add EXEC_RESTRICT_FILE and EXEC_DENY_INTERACTIVE securebits
exec: Add a new AT_EXECVE_CHECK flag to execveat(2)
Core
----
- More core refactoring to reduce the RTNL lock contention,
including preparatory work for the per-network namespace RTNL lock,
replacing RTNL lock with a per device-one to protect NAPI-related
net device data and moving synchronize_net() calls outside such
lock.
- Extend drop reasons usage, adding net scheduler, AF_UNIX, bridge and
more specific TCP coverage.
- Reduce network namespace tear-down time by removing per-subsystems
synchronize_net() in tipc and sched.
- Add flow label selector support for fib rules, allowing traffic
redirection based on such header field.
Netfilter
---------
- Do not remove netdev basechain when last device is gone, allowing
netdev basechains without devices.
- Revisit the flowtable teardown strategy, dealing better with fin,
reset and re-open events.
- Scale-up IP-vs connection dumping by avoiding linear search on
each restart.
Protocols
---------
- A significant XDP socket refactor, consolidating and optimizing
several helpers into the core
- Better scaling of ICMP rate-limiting, by removing false-sharing in
inet peers handling.
- Introduces netlink notifications for multicast IPv4 and IPv6
address changes.
- Add ipsec support for IP-TFS/AggFrag encapsulation, allowing
aggregation and fragmentation of the inner IP.
- Add sysctl to configure TIME-WAIT reuse delay for TCP sockets,
to avoid local port exhaustion issues when the average connection
lifetime is very short.
- Support updating keys (re-keying) for connections using kernel
TLS (for TLS 1.3 only).
- Support ipv4-mapped ipv6 address clients in smc-r v2.
- Add support for jumbo data packet transmission in RxRPC sockets,
gluing multiple data packets in a single UDP packet.
- Support RxRPC RACK-TLP to manage packet loss and retransmission in
conjunction with the congestion control algorithm.
Driver API
----------
- Introduce a unified and structured interface for reporting PHY
statistics, exposing consistent data across different H/W via
ethtool.
- Make timestamping selectable, allow the user to select the desired
hwtstamp provider (PHY or MAC) administratively.
- Add support for configuring a header-data-split threshold (HDS)
value via ethtool, to deal with partial or buggy H/W implementation.
- Consolidate DSA drivers Energy Efficiency Ethernet support.
- Add EEE management to phylink, making use of the phylib
implementation.
- Add phylib support for in-band capabilities negotiation.
- Simplify how phylib-enabled mac drivers expose the supported
interfaces.
Tests and tooling
-----------------
- Make the YNL tool package-friendly to make it easier to deploy it
separately from the kernel.
- Increase TCP selftest coverage importing several packetdrill
test-cases.
- Regenerate the ethtool uapi header from the YNL spec,
to ease maintenance and future development.
- Add YNL support for decoding the link types used in net
self-tests, allowing a single build to run both net and
drivers/net.
Drivers
-------
- Ethernet high-speed NICs:
- nVidia/Mellanox (mlx5):
- add cross E-Switch QoS support
- add SW Steering support for ConnectX-8
- implement support for HW-Managed Flow Steering, improving the
rule deletion/insertion rate
- support for multi-host LAG
- Intel (ixgbe, ice, igb):
- ice: add support for devlink health events
- ixgbe: add initial support for E610 chipset variant
- igb: add support for AF_XDP zero-copy
- Meta:
- add support for basic RSS config
- allow changing the number of channels
- add hardware monitoring support
- Broadcom (bnxt):
- implement TCP data split and HDS threshold ethtool support,
enabling Device Memory TCP.
- Marvell Octeon:
- implement egress ipsec offload support for the cn10k family
- Hisilicon (HIBMC):
- implement unicast MAC filtering
- Ethernet NICs embedded and virtual:
- Convert UDP tunnel drivers to NETDEV_PCPU_STAT_DSTATS, avoiding
contented atomic operations for drop counters
- Freescale:
- quicc: phylink conversion
- enetc: support Tx and Rx checksum offload and improve TSO
performances
- MediaTek:
- airoha: introduce support for ETS and HTB Qdisc offload
- Microchip:
- lan78XX USB: preparation work for phylink conversion
- Synopsys (stmmac):
- support DWMAC IP on NXP Automotive SoCs S32G2xx/S32G3xx/S32R45
- refactor EEE support to leverage the new driver API
- optimize DMA and cache access to increase raw RX performances
by 40%
- TI:
- icssg-prueth: add multicast filtering support for VLAN
interface
- netkit:
- add ability to configure head/tailroom
- VXLAN:
- accepts packets with user-defined reserved bit
- Ethernet switches:
- Microchip:
- lan969x: add RGMII support
- lan969x: improve TX and RX performance using the FDMA engine
- nVidia/Mellanox:
- move Tx header handling to PCI driver, to ease XDP support
- Ethernet PHYs:
- Texas Instruments DP83822:
- add support for GPIO2 clock output
- Realtek:
- 8169: add support for RTL8125D rev.b
- rtl822x: add hwmon support for the temperature sensor
- Microchip:
- add support for RDS PTP hardware
- consolidate periodic output signal generation
- CAN:
- several DT-bindings to DT schema conversions
- tcan4x5x:
- add HW standby support
- support nWKRQ voltage selection
- kvaser:
- allowing Bus Error Reporting runtime configuration
- WiFi:
- the on-going Multi-Link Operation (MLO) effort continues, affecting
both the stack and in drivers
- mac80211/cfg80211:
- Emergency Preparedness Communication Services (EPCS) station mode
support
- support for adding and removing station links for MLO
- add support for WiFi 7/EHT mesh over 320 MHz channels
- report Tx power info for each link
- RealTek (rtw88):
- enable USB Rx aggregation and USB 3 to improve performance
- LED support
- RealTek (rtw89):
- refactor power save to support Multi-Link Operations
- add support for RTL8922AE-VS variant
- MediaTek (mt76):
- single wiphy multiband support (preparation for MLO)
- p2p device support
- add TP-Link TXE50UH USB adapter support
- Qualcomm (ath10k):
- support for the QCA6698AQ IP core
- Qualcomm (ath12k):
- enable MLO for QCN9274
- Bluetooth:
- Allow sysfs to trigger hdev reset, to allow recovering devices
not responsive from user-space
- MediaTek: add support for MT7922, MT7925, MT7921e devices
- Realtek: add support for RTL8851BE devices
- Qualcomm: add support for WCN785x devices
- ISO: allow BIG re-sync
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmePf5YSHHBhYmVuaUBy
ZWRoYXQuY29tAAoJECkkeY3MjxOkUcMQALblhkGTxurnfT+yK+Bsuhn2LoHl2RPN
4u2Kjkzm+2FYgcw6lS17cFXsnfAPlRIpmhnmKk1EBgsBdkuL29c+jtqnljA2bboD
tIMhMgWiaLS3xgEMrLeKnseIo0G9mviQRphGeZPFTaLb4Ww/bd5LAp4ZGc5oij76
tURatC3b6MuO4Lt5U+jWKnRwviXku8udHkVHXlvPdirawHCVinmx3tvce/BI/MaD
eUOp6ZeJCPCOLtk7b8WEyxxvdY0f6D9ed82qfPDHjb94SJv+Vxb38RZtNuApIjn9
S0KdlNih/4flDy17LDxGYSyFps78lUFRbpqmsUlnZkyLXpsph7/WTvAmMAFcrX0K
UgQ/F/q5GAvcP5WZcCj5+tZaRmfKQraQirXMtYU/Uj50qCnSU7ssyACASt23GLZ8
OF8tCLlm9lLOU1B6Ofkul1Dbo5f0Xpaghga4dFb0kzSfbm78fTUnqBNsJ7jIkWfi
fD6dO+fg+p2ZMD0CACGo3CNxQuJmaQWg6BIDeno6God8kZ6qBMxY/sFr4qozrvFH
x/FgQq8dgc8WLmaPejKiNIPkdQepXrIiv3T9jgMVyEjJnWB/LBfyWKSQOdTfnLs+
rgr4YMV6XW4bx0fYqTI8B9jZ+FCWbG6sn4UtRTHITKcd3FSvd8Y+PHa5YyCUWvJM
l8pePMGF0XVF
=hrsp
-----END PGP SIGNATURE-----
Merge tag 'net-next-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni:
"This is slightly smaller than usual, with the most interesting work
being still around RTNL scope reduction.
Core:
- More core refactoring to reduce the RTNL lock contention, including
preparatory work for the per-network namespace RTNL lock, replacing
RTNL lock with a per device-one to protect NAPI-related net device
data and moving synchronize_net() calls outside such lock.
- Extend drop reasons usage, adding net scheduler, AF_UNIX, bridge
and more specific TCP coverage.
- Reduce network namespace tear-down time by removing per-subsystems
synchronize_net() in tipc and sched.
- Add flow label selector support for fib rules, allowing traffic
redirection based on such header field.
Netfilter:
- Do not remove netdev basechain when last device is gone, allowing
netdev basechains without devices.
- Revisit the flowtable teardown strategy, dealing better with fin,
reset and re-open events.
- Scale-up IP-vs connection dumping by avoiding linear search on each
restart.
Protocols:
- A significant XDP socket refactor, consolidating and optimizing
several helpers into the core
- Better scaling of ICMP rate-limiting, by removing false-sharing in
inet peers handling.
- Introduces netlink notifications for multicast IPv4 and IPv6
address changes.
- Add ipsec support for IP-TFS/AggFrag encapsulation, allowing
aggregation and fragmentation of the inner IP.
- Add sysctl to configure TIME-WAIT reuse delay for TCP sockets, to
avoid local port exhaustion issues when the average connection
lifetime is very short.
- Support updating keys (re-keying) for connections using kernel TLS
(for TLS 1.3 only).
- Support ipv4-mapped ipv6 address clients in smc-r v2.
- Add support for jumbo data packet transmission in RxRPC sockets,
gluing multiple data packets in a single UDP packet.
- Support RxRPC RACK-TLP to manage packet loss and retransmission in
conjunction with the congestion control algorithm.
Driver API:
- Introduce a unified and structured interface for reporting PHY
statistics, exposing consistent data across different H/W via
ethtool.
- Make timestamping selectable, allow the user to select the desired
hwtstamp provider (PHY or MAC) administratively.
- Add support for configuring a header-data-split threshold (HDS)
value via ethtool, to deal with partial or buggy H/W
implementation.
- Consolidate DSA drivers Energy Efficiency Ethernet support.
- Add EEE management to phylink, making use of the phylib
implementation.
- Add phylib support for in-band capabilities negotiation.
- Simplify how phylib-enabled mac drivers expose the supported
interfaces.
Tests and tooling:
- Make the YNL tool package-friendly to make it easier to deploy it
separately from the kernel.
- Increase TCP selftest coverage importing several packetdrill
test-cases.
- Regenerate the ethtool uapi header from the YNL spec, to ease
maintenance and future development.
- Add YNL support for decoding the link types used in net self-tests,
allowing a single build to run both net and drivers/net.
Drivers:
- Ethernet high-speed NICs:
- nVidia/Mellanox (mlx5):
- add cross E-Switch QoS support
- add SW Steering support for ConnectX-8
- implement support for HW-Managed Flow Steering, improving the
rule deletion/insertion rate
- support for multi-host LAG
- Intel (ixgbe, ice, igb):
- ice: add support for devlink health events
- ixgbe: add initial support for E610 chipset variant
- igb: add support for AF_XDP zero-copy
- Meta:
- add support for basic RSS config
- allow changing the number of channels
- add hardware monitoring support
- Broadcom (bnxt):
- implement TCP data split and HDS threshold ethtool support,
enabling Device Memory TCP.
- Marvell Octeon:
- implement egress ipsec offload support for the cn10k family
- Hisilicon (HIBMC):
- implement unicast MAC filtering
- Ethernet NICs embedded and virtual:
- Convert UDP tunnel drivers to NETDEV_PCPU_STAT_DSTATS, avoiding
contented atomic operations for drop counters
- Freescale:
- quicc: phylink conversion
- enetc: support Tx and Rx checksum offload and improve TSO
performances
- MediaTek:
- airoha: introduce support for ETS and HTB Qdisc offload
- Microchip:
- lan78XX USB: preparation work for phylink conversion
- Synopsys (stmmac):
- support DWMAC IP on NXP Automotive SoCs S32G2xx/S32G3xx/S32R45
- refactor EEE support to leverage the new driver API
- optimize DMA and cache access to increase raw RX performances
by 40%
- TI:
- icssg-prueth: add multicast filtering support for VLAN
interface
- netkit:
- add ability to configure head/tailroom
- VXLAN:
- accepts packets with user-defined reserved bit
- Ethernet switches:
- Microchip:
- lan969x: add RGMII support
- lan969x: improve TX and RX performance using the FDMA engine
- nVidia/Mellanox:
- move Tx header handling to PCI driver, to ease XDP support
- Ethernet PHYs:
- Texas Instruments DP83822:
- add support for GPIO2 clock output
- Realtek:
- 8169: add support for RTL8125D rev.b
- rtl822x: add hwmon support for the temperature sensor
- Microchip:
- add support for RDS PTP hardware
- consolidate periodic output signal generation
- CAN:
- several DT-bindings to DT schema conversions
- tcan4x5x:
- add HW standby support
- support nWKRQ voltage selection
- kvaser:
- allowing Bus Error Reporting runtime configuration
- WiFi:
- the on-going Multi-Link Operation (MLO) effort continues,
affecting both the stack and in drivers
- mac80211/cfg80211:
- Emergency Preparedness Communication Services (EPCS) station
mode support
- support for adding and removing station links for MLO
- add support for WiFi 7/EHT mesh over 320 MHz channels
- report Tx power info for each link
- RealTek (rtw88):
- enable USB Rx aggregation and USB 3 to improve performance
- LED support
- RealTek (rtw89):
- refactor power save to support Multi-Link Operations
- add support for RTL8922AE-VS variant
- MediaTek (mt76):
- single wiphy multiband support (preparation for MLO)
- p2p device support
- add TP-Link TXE50UH USB adapter support
- Qualcomm (ath10k):
- support for the QCA6698AQ IP core
- Qualcomm (ath12k):
- enable MLO for QCN9274
- Bluetooth:
- Allow sysfs to trigger hdev reset, to allow recovering devices
not responsive from user-space
- MediaTek: add support for MT7922, MT7925, MT7921e devices
- Realtek: add support for RTL8851BE devices
- Qualcomm: add support for WCN785x devices
- ISO: allow BIG re-sync"
* tag 'net-next-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1386 commits)
net/rose: prevent integer overflows in rose_setsockopt()
net: phylink: fix regression when binding a PHY
net: ethernet: ti: am65-cpsw: streamline TX queue creation and cleanup
net: ethernet: ti: am65-cpsw: streamline RX queue creation and cleanup
net: ethernet: ti: am65-cpsw: ensure proper channel cleanup in error path
ipv6: Convert inet6_rtm_deladdr() to per-netns RTNL.
ipv6: Convert inet6_rtm_newaddr() to per-netns RTNL.
ipv6: Move lifetime validation to inet6_rtm_newaddr().
ipv6: Set cfg.ifa_flags before device lookup in inet6_rtm_newaddr().
ipv6: Pass dev to inet6_addr_add().
ipv6: Convert inet6_ioctl() to per-netns RTNL.
ipv6: Hold rtnl_net_lock() in addrconf_init() and addrconf_cleanup().
ipv6: Hold rtnl_net_lock() in addrconf_dad_work().
ipv6: Hold rtnl_net_lock() in addrconf_verify_work().
ipv6: Convert net.ipv6.conf.${DEV}.XXX sysctl to per-netns RTNL.
ipv6: Add __in6_dev_get_rtnl_net().
net: stmmac: Drop redundant skb_mark_for_recycle() for SKB frags
net: mii: Fix the Speed display when the network cable is not connected
sysctl net: Remove macro checks for CONFIG_SYSCTL
eth: bnxt: update header sizing defaults
...
Remove duplicate macro PCI_VSEC_HDR and its related macro
PCI_VSEC_HDR_LEN_SHIFT from pci_regs.h to avoid redundancy and
inconsistencies. Update VFIO PCI code to use PCI_VNDR_HEADER and
PCI_VNDR_HEADER_LEN() for consistent naming and functionality.
These changes aim to streamline header handling while minimizing impact,
given the niche usage of these macros in userspace.
Link: https://lore.kernel.org/r/20241216013536.4487-1-zhangdongdong@eswincomputing.com
Signed-off-by: Dongdong Zhang <zhangdongdong@eswincomputing.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
The delstid draft adds new NFS4_SHARE_WANT_TYPE_MASK values that don't
fit neatly into the existing WANT_MASK or WHEN_MASK. Add a new
NFS4_SHARE_WANT_MOD_MASK value and redefine NFS4_SHARE_WANT_MASK to
include it.
Also fix the checks in nfsd4_deleg_xgrade_none_ext() to check for the
flags instead of equality, since there may be modifier flags in the
value.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Add a more advanced BAR test that writes all BARs in one go, and then reads
them back and verifies that the value matches the BAR number bitwise OR'ed
with offset, this allows us to verify:
- The BAR number was what we intended to read
- The offset was what we intended to read
This allows us to detect potential address translation issues on the EP.
Reading back the BAR directly after writing will not allow us to detect the
case where inbound address translation on the endpoint incorrectly causes
multiple BARs to be redirected to the same memory region (within the EP).
Link: https://lore.kernel.org/r/20241116032045.2574168-2-cassel@kernel.org
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmeNDEUQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpl5hD/4t7kWWNQDeQG9CiA3QStMJ5Yow2AgYtK8f
sJBr5/6PGEsbTreX//Kh8DtPZPRGcjG9elCo58QxWaPZ2mg3fTOR3/QYLMlaGXU2
hSht58lj32utpuzMjMo9bG3aesi03bLf+buaq7V1FaMlcTV8rXqK1s/HGtphDBRo
8tNLEk3JDJDs3vlWbNp/5Hqh9+Ro6DU8df1zWWH4Vbu8RXaGIPyJyjKvvcbfuuCf
k7Ay45XNAmTZg+rSNGv1H3Yn1LNzPMVFLWBfzRahPCzlKy2+mJMWz1PWu9naaUK+
WTM+kgiBLF24k59G/9xuxC5bYtsTjTbr4GsEE5ZvFBnhKPzLzzaJj7iQHRj83vtv
tqxNmAbA3wJoNk48Zr8+cYbfDX9Q9Pl32wIaS/LxRgF9MT4lem6pyKY7Skd12oK3
rnQ8moGtnOBxp3QUU6BZ7IX3ipb+Bgw7FhZbtVYJdlqKeKyi1QO0MuITwGXpMwk/
EWDDTsspIf+QaTu+fmO8byJavugKljW8t7hM1JpvlfOLl+rsh6/+AYz42fCvcaA0
Tu4bpUk8SuwALvZfU2R6bLkorGG6MFuGI8g3eixOcGir3YAcHBMfdg6ItpZi5qVt
ToM87BMaezOZZvSwX1JBaQ0AR5HBQYmHaiLWgPsORf3PjJ0kz+u21SK9D+yJkUtU
rT6+HvoVXA==
=ufpE
-----END PGP SIGNATURE-----
Merge tag 'for-6.14/io_uring-20250119' of git://git.kernel.dk/linux
Pull io_uring updates from Jens Axboe:
"Not a lot in terms of features this time around, mostly just cleanups
and code consolidation:
- Support for PI meta data read/write via io_uring, with NVMe and
SCSI covered
- Cleanup the per-op structure caching, making it consistent across
various command types
- Consolidate the various user mapped features into a concept called
regions, making the various users of that consistent
- Various cleanups and fixes"
* tag 'for-6.14/io_uring-20250119' of git://git.kernel.dk/linux: (56 commits)
io_uring/fdinfo: fix io_uring_show_fdinfo() misuse of ->d_iname
io_uring: reuse io_should_terminate_tw() for cmds
io_uring: Factor out a function to parse restrictions
io_uring/rsrc: require cloned buffers to share accounting contexts
io_uring: simplify the SQPOLL thread check when cancelling requests
io_uring: expose read/write attribute capability
io_uring/rw: don't gate retry on completion context
io_uring/rw: handle -EAGAIN retry at IO completion time
io_uring/rw: use io_rw_recycle() from cleanup path
io_uring/rsrc: simplify the bvec iter count calculation
io_uring: ensure io_queue_deferred() is out-of-line
io_uring/rw: always clear ->bytes_done on io_async_rw setup
io_uring/rw: use NULL for rw->free_iovec assigment
io_uring/rw: don't mask in f_iocb_flags
io_uring/msg_ring: Drop custom destructor
io_uring: Move old async data allocation helper to header
io_uring/rw: Allocate async data through helper
io_uring/net: Allocate msghdr async data through helper
io_uring/uring_cmd: Allocate async data through generic helper
io_uring/poll: Allocate apoll with generic alloc_cache helper
...
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmeL6hoQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgppw2EADQV8nDgLRggZR+il4U03yKHXcQEdAX1GrB
Erowx+dasIJuh6kp3n6qRe9QD/pRqt1DKyLvXoWF8Qfuwq85j7oDnDDYxutNYT27
hDgrLJriJ3VeKYtTu+andHWt8P29b5h57UayInDOUJurEPA6rXyFZ5YVIti8n21K
uDOrQXiACG3qRWS2+p2f3UNhX0MkFNFdN/lxi13WMIJtRWF5bXAP+JOgIWCID4Ze
QuSY6rQD4dp4Q6M2erpX6tn0YZb7Hvw3rPjsd91n6jvYfTUVLH375zg8jCBpi6Wi
Syufbb8xcTtriVPTDRNu0ekjebkc8wD8ax/h86g0z9v3Ua4DlNmsx9eXrtv6r5nu
YXqDODOad6stI0+owFquW2vas0gHmfNSfyfGdlk2g24PMtP5Yx0V6FIEvwIeqnje
ghgxQvBuKUsdhqakByfNnc+XvXi3+RUJek8kvMeUSUQWT1IyMQqPOOk0yp9WdyWD
bY1f2ECP5BR1b37zYOyawewsI5xTupHUswn5a4r4qtGn3O15rGDkX98Nab5aLCnR
rW/DvX7+wT6gW9EwrRHiwjwfNDZbsJ9Ggu3lMhtUl5GUWdk58yTiVgKaHJLnlX9/
CKFKfyyIR1Vl8+gYIpemyFhhcoN+dCSf06ISkrg0jeS0/tYwydaAaCBPL5J4kxZA
h3Rtbh+Pgg==
=EXYs
-----END PGP SIGNATURE-----
Merge tag 'for-6.14/block-20250118' of git://git.kernel.dk/linux
Pull block updates from Jens Axboe:
- NVMe pull requests via Keith:
- Target support for PCI-Endpoint transport (Damien)
- TCP IO queue spreading fixes (Sagi, Chaitanya)
- Target handling for "limited retry" flags (Guixen)
- Poll type fix (Yongsoo)
- Xarray storage error handling (Keisuke)
- Host memory buffer free size fix on error (Francis)
- MD pull requests via Song:
- Reintroduce md-linear (Yu Kuai)
- md-bitmap refactor and fix (Yu Kuai)
- Replace kmap_atomic with kmap_local_page (David Reaver)
- Quite a few queue freeze and debugfs deadlock fixes
Ming introduced lockdep support for this in the 6.13 kernel, and it
has (unsurprisingly) uncovered quite a few issues
- Use const attributes for IO schedulers
- Remove bio ioprio wrappers
- Fixes for stacked device atomic write support
- Refactor queue affinity helpers, in preparation for better supporting
isolated CPUs
- Cleanups of loop O_DIRECT handling
- Cleanup of BLK_MQ_F_* flags
- Add rotational support for null_blk
- Various fixes and cleanups
* tag 'for-6.14/block-20250118' of git://git.kernel.dk/linux: (106 commits)
block: Don't trim an atomic write
block: Add common atomic writes enable flag
md/md-linear: Fix a NULL vs IS_ERR() bug in linear_add()
block: limit disk max sectors to (LLONG_MAX >> 9)
block: Change blk_stack_atomic_writes_limits() unit_min check
block: Ensure start sector is aligned for stacking atomic writes
blk-mq: Move more error handling into blk_mq_submit_bio()
block: Reorder the request allocation code in blk_mq_submit_bio()
nvme: fix bogus kzalloc() return check in nvme_init_effects_log()
md/md-bitmap: move bitmap_{start, end}write to md upper layer
md/raid5: implement pers->bitmap_sector()
md: add a new callback pers->bitmap_sector()
md/md-bitmap: remove the last parameter for bimtap_ops->endwrite()
md/md-bitmap: factor behind write counters out from bitmap_{start/end}write()
md: Replace deprecated kmap_atomic() with kmap_local_page()
md: reintroduce md-linear
partitions: ldm: remove the initial kernel-doc notation
blk-cgroup: rwstat: fix kernel-doc warnings in header file
blk-cgroup: fix kernel-doc warnings in header file
nbd: fix partial sending
...
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZ4pSTwAKCRCRxhvAZXjc
oiSbAQCIWp8Jm2FX9Mv+eX8erLFlyQSDQauAnqPtW/SvMbpgFgEAuZOTSRU3GSqI
NiowpYms9OckO638GlNHlSTUTcV4YwU=
=obHT
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.14-rc1.statx.dio' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs direct-io updates from Christian Brauner:
"File systems that write out of place usually require different
alignment for direct I/O writes than what they can do for reads.
Add a separate dio read align field to statx, as many out of place
write file systems can easily do reads aligned to the device sector
size, but require bigger alignment for writes.
This is usually papered over by falling back to buffered I/O for
smaller writes and doing read-modify-write cycles, but performance for
this sucks, so applications benefit from knowing the actual write
alignment"
* tag 'vfs-6.14-rc1.statx.dio' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
xfs: report larger dio alignment for COW inodes
xfs: report the correct read/write dio alignment for reflinked inodes
xfs: cleanup xfs_vn_getattr
fs: add STATX_DIO_READ_ALIGN
fs: reformat the statx definition
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZ4pRjQAKCRCRxhvAZXjc
omUyAP9k31Qr7RY1zNtmpPfejqc+3Xx+xXD7NwHr+tONWtUQiQEA/F94qU2U3ivS
AzyDABWrEQ5ZNsm+Rq2Y3zyoH7of3ww=
=s3Bu
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.14-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
"Features:
- Support caching symlink lengths in inodes
The size is stored in a new union utilizing the same space as
i_devices, thus avoiding growing the struct or taking up any more
space
When utilized it dodges strlen() in vfs_readlink(), giving about
1.5% speed up when issuing readlink on /initrd.img on ext4
- Add RWF_DONTCACHE iocb and FOP_DONTCACHE file_operations flag
If a file system supports uncached buffered IO, it may set
FOP_DONTCACHE and enable support for RWF_DONTCACHE.
If RWF_DONTCACHE is attempted without the file system supporting
it, it'll get errored with -EOPNOTSUPP
- Enable VBOXGUEST and VBOXSF_FS on ARM64
Now that VirtualBox is able to run as a host on arm64 (e.g. the
Apple M3 processors) we can enable VBOXSF_FS (and in turn
VBOXGUEST) for this architecture.
Tested with various runs of bonnie++ and dbench on an Apple MacBook
Pro with the latest Virtualbox 7.1.4 r165100 installed
Cleanups:
- Delay sysctl_nr_open check in expand_files()
- Use kernel-doc includes in fiemap docbook
- Use page->private instead of page->index in watch_queue
- Use a consume fence in mnt_idmap() as it's heavily used in
link_path_walk()
- Replace magic number 7 with ARRAY_SIZE() in fc_log
- Sort out a stale comment about races between fd alloc and dup2()
- Fix return type of do_mount() from long to int
- Various cosmetic cleanups for the lockref code
Fixes:
- Annotate spinning as unlikely() in __read_seqcount_begin
The annotation already used to be there, but got lost in commit
52ac39e5db ("seqlock: seqcount_t: Implement all read APIs as
statement expressions")
- Fix proc_handler for sysctl_nr_open
- Flush delayed work in delayed fput()
- Fix grammar and spelling in propagate_umount()
- Fix ESP not readable during coredump
In /proc/PID/stat, there is the kstkesp field which is the stack
pointer of a thread. While the thread is active, this field reads
zero. But during a coredump, it should have a valid value
However, at the moment, kstkesp is zero even during coredump
- Don't wake up the writer if the pipe is still full
- Fix unbalanced user_access_end() in select code"
* tag 'vfs-6.14-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (28 commits)
gfs2: use lockref_init for qd_lockref
erofs: use lockref_init for pcl->lockref
dcache: use lockref_init for d_lockref
lockref: add a lockref_init helper
lockref: drop superfluous externs
lockref: use bool for false/true returns
lockref: improve the lockref_get_not_zero description
lockref: remove lockref_put_not_zero
fs: Fix return type of do_mount() from long to int
select: Fix unbalanced user_access_end()
vbox: Enable VBOXGUEST and VBOXSF_FS on ARM64
pipe_read: don't wake up the writer if the pipe is still full
selftests: coredump: Add stackdump test
fs/proc: do_task_stat: Fix ESP not readable during coredump
fs: add RWF_DONTCACHE iocb and FOP_DONTCACHE file_operations flag
fs: sort out a stale comment about races between fd alloc and dup2
fs: Fix grammar and spelling in propagate_umount()
fs: fc_log replace magic number 7 with ARRAY_SIZE()
fs: use a consume fence in mnt_idmap()
file: flush delayed work in delayed fput()
...
- Overhaul KVM's CPUID feature infrastructure to replace "governed" features
with per-vCPU tracking of the vCPU's capabailities for all features. Along
the way, refactor the code to make it easier to add/modify features, and
add a variety of self-documenting macro types to again simplify adding new
features and to help readers understand KVM's handling of existing features.
- Rework KVM's handling of VM-Exits during event vectoring to plug holes where
KVM unintentionally puts the vCPU into infinite loops in some scenarios,
e.g. if emulation is triggered by the exit, and to bring parity between VMX
and SVM.
- Add pending request and interrupt injection information to the kvm_exit and
kvm_entry tracepoints respectively.
- Fix a relatively benign flaw where KVM would end up redoing RDPKRU when
loading guest/host PKRU due to a refactoring of the kernel helpers that
didn't account for KVM's pre-checking of the need to do WRPKRU.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmeJngsACgkQOlYIJqCj
N/1dfA/+NIZmnd8OV9Zvc6HGxrzgt4QsM9pmsUmrfkDWefxMYIAMeaW8Vn4CJfRf
zY/UcqyNI7JYxSiuVTckz+Tf54HhqYaLrUwILGCQ49koirZx+aQT1OUfjLroVMlh
ffX1i6GOoLNtxjb9MXM/heLVdUbvmzQMSFkd/AkOH+nrOtDNOiPlZfjHsewj9zrf
BNJGhzvT4M6vc/AsScC7tc0yFD5KKFRv8tVwJ6Zf1nWKyUDOSpMTWkVnq6geKJPZ
iGBZPPNg55Oy1g6uj6VYWmqYTD8Qioz5jtEJ/8pPHdAyIFo21s81bfJc548d+QLh
KfrL1K7TrCOhSAGC3Cb3lTLeq2immmGHaiTBLwGABG4MhpiX4NVpMMdOyFbVLMOS
HIYuwXwDckm1pfU7/w+PgPaakCyPrXQntm+3Y2pvDOoY6e2JbwodK4j8BvvQda35
8TrYKEGFvq5aij7Iw1O9TUoLAocDM/sHIHE6BCazHyzKBIv9xLRFeabiCQ+A1pwv
gZk5u0+j+DPpLdeLhbMYhIXUtr3bvyMYvc+tRkG716f8ubAE3+Kn5BEDo4Ot2DcT
vc+NTRYYWN6zavHiJH3Ddt153yj256JCZhLwCdfbryCQdz3Mpy16m36tgkDRd3lR
QT4IkPQo1Vl/aU0yiE/dhnJgh1rTO26YQjZoHs5Oj16d0HRrKyc=
=32mM
-----END PGP SIGNATURE-----
Merge tag 'kvm-x86-misc-6.14' of https://github.com/kvm-x86/linux into HEAD
KVM x86 misc changes for 6.14:
- Overhaul KVM's CPUID feature infrastructure to track all vCPU capabilities
instead of just those where KVM needs to manage state and/or explicitly
enable the feature in hardware. Along the way, refactor the code to make
it easier to add features, and to make it more self-documenting how KVM
is handling each feature.
- Rework KVM's handling of VM-Exits during event vectoring; this plugs holes
where KVM unintentionally puts the vCPU into infinite loops in some scenarios
(e.g. if emulation is triggered by the exit), and brings parity between VMX
and SVM.
- Add pending request and interrupt injection information to the kvm_exit and
kvm_entry tracepoints respectively.
- Fix a relatively benign flaw where KVM would end up redoing RDPKRU when
loading guest/host PKRU, due to a refactoring of the kernel helpers that
didn't account for KVM's pre-checking of the need to do WRPKRU.
Most likely the last "new features" pull request for v6.14 and this is
a bigger one. Multi-Link Operation (MLO) work continues both in stack
in drivers. Few new devices supported and usual fixes all over.
Major changes:
cfg80211
* Emergency Preparedness Communication Services (EPCS) station mode support
mac80211
* an option to filter a sta from being flushed
* some support for RX Operating Mode Indication (OMI) power saving
* support for adding and removing station links for MLO
iwlwifi
* new device ids
* rework firmware error handling and restart
rtw88
* RTL8812A: RFE type 2 support
* LED support
rtw89
* variant info to support RTL8922AE-VS
mt76
* mt7996: single wiphy multiband support (preparation for MLO)
* mt7996: support for more variants
* mt792x: P2P_DEVICE support
* mt7921u: TP-Link TXE50UH support
ath12k
* enable MLO for QCN9274 (although it seems to be broken with dual
band devices)
* MLO radar detection support
* debugfs: transmit buffer OFDMA, AST entry and puncture stats
-----BEGIN PGP SIGNATURE-----
iQFFBAABCgAvFiEEiBjanGPFTz4PRfLobhckVSbrbZsFAmeKvq8RHGt2YWxvQGtl
cm5lbC5vcmcACgkQbhckVSbrbZtJ3wf/bMRrED1k1xtC117yxBthXOZj7Ae0MG3s
cqIDBZ2MSmi596jfmc/FRFTN6Rie7+WQg8MGhpWpJt4WyvCd+IqmluLwuJGg3oHz
KdDQHN766cPb50mykTBY0m+Uh9eqyhaoozzK+KbrJZNLUt/Ri3DOdPInWP/ODKuT
uo9xv25zmOMQuUJlhOlFpOW68y6isuMsnPpgTvbtjndw3nPe+tVIfGPdg3xeQnq1
SG05wmOeJ8obmO1VsnmHPN20F5PfS5t2asu12UafR7KTiJOdP2QXKGNTy+ag7fQ/
9Sq6SzuPpbHHJJ93f0aJOkTUKdQVmk2xfCYNOyPNu1ewLqiVlvnkew==
=Vu+D
-----END PGP SIGNATURE-----
Merge tag 'wireless-next-2025-01-17' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
Kalle Valo says:
====================
wireless-next patches for v6.14
Most likely the last "new features" pull request for v6.14 and this is
a bigger one. Multi-Link Operation (MLO) work continues both in stack
in drivers. Few new devices supported and usual fixes all over.
Major changes:
cfg80211
* Emergency Preparedness Communication Services (EPCS) station mode support
mac80211
* an option to filter a sta from being flushed
* some support for RX Operating Mode Indication (OMI) power saving
* support for adding and removing station links for MLO
iwlwifi
* new device ids
* rework firmware error handling and restart
rtw88
* RTL8812A: RFE type 2 support
* LED support
rtw89
* variant info to support RTL8922AE-VS
mt76
* mt7996: single wiphy multiband support (preparation for MLO)
* mt7996: support for more variants
* mt792x: P2P_DEVICE support
* mt7921u: TP-Link TXE50UH support
ath12k
* enable MLO for QCN9274 (although it seems to be broken with dual
band devices)
* MLO radar detection support
* debugfs: transmit buffer OFDMA, AST entry and puncture stats
* tag 'wireless-next-2025-01-17' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (322 commits)
wifi: brcmfmac: fix NULL pointer dereference in brcmf_txfinalize()
wifi: rtw88: add RTW88_LEDS depends on LEDS_CLASS to Kconfig
wifi: wilc1000: unregister wiphy only after netdev registration
wifi: cfg80211: adjust allocation of colocated AP data
wifi: mac80211: fix memory leak in ieee80211_mgd_assoc_ml_reconf()
wifi: ath12k: fix key cache handling
wifi: ath12k: Fix uninitialized variable access in ath12k_mac_allocate() function
wifi: ath12k: Remove ath12k_get_num_hw() helper function
wifi: ath12k: Refactor the ath12k_hw get helper function argument
wifi: ath12k: Refactor ath12k_hw set helper function argument
wifi: mt76: mt7996: add implicit beamforming support for mt7992
wifi: mt76: mt7996: fix beacon command during disabling
wifi: mt76: mt7996: fix ldpc setting
wifi: mt76: mt7996: fix definition of tx descriptor
wifi: mt76: connac: adjust phy capabilities based on band constraints
wifi: mt76: mt7996: fix incorrect indexing of MIB FW event
wifi: mt76: mt7996: fix HE Phy capability
wifi: mt76: mt7996: fix the capability of reception of EHT MU PPDU
wifi: mt76: mt7996: add max mpdu len capability
wifi: mt76: mt7921: avoid undesired changes of the preset regulatory domain
...
====================
Link: https://patch.msgid.link/20250117203529.72D45C4CEDD@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For packets with two-step timestamp requests, the hardware timestamp
comes back to the driver through a confirmation mechanism of sorts,
which allows the driver to confidently bump the successful "pkts"
counter.
For one-step PTP, the NIC is supposed to autonomously insert its
hardware TX timestamp in the packet headers while simultaneously
transmitting it. There may be a confirmation that this was done
successfully, or there may not.
None of the current drivers which implement ethtool_ops :: get_ts_stats()
also support HWTSTAMP_TX_ONESTEP_SYNC or HWTSTAMP_TX_ONESTEP_SYNC, so it
is a bit unclear which model to follow. But there are NICs, such as DSA,
where there is no transmit confirmation at all. Here, it would be wrong /
misleading to increment the successful "pkts" counter, because one-step
PTP packets can be dropped on TX just like any other packets.
So introduce a special counter which signifies "yes, an attempt was made,
but we don't know whether it also exited the port or not". I expect that
for one-step PTP packets where a confirmation is available, the "pkts"
counter would be bumped.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250116104628.123555-2-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Support stacking atomic write limits for DM devices.
All the pre-existing code in blk_stack_atomic_writes_limits() already takes
care of finding the aggregrate limits from the bottom devices.
Feature flag DM_TARGET_ATOMIC_WRITES is introduced so that atomic writes
can be enabled on personalities selectively. This is to ensure that atomic
writes are only enabled when verified to be working properly (for a
specific personality). In addition, it just may not make sense to enable
atomic writes on some personalities (so this flag also helps there).
Signed-off-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Add a definition for the clock stop capable bit in the PCS MMD. This
bit indicates whether the MAC is able to stop the transmit xMII clock
while it is signalling LPI.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/E1tYADb-0014PV-6T@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>