mirror-linux/drivers
Jason Gunthorpe f394576eb1 iommufd: PFN handling for iopt_pages
The top of the data structure provides an IO Address Space (IOAS) that is
similar to a VFIO container. The IOAS allows map/unmap of memory into
ranges of IOVA called iopt_areas. Multiple IOMMU domains (IO page tables)
and in-kernel accesses (like VFIO mdevs) can be attached to the IOAS to
access the PFNs that those IOVA areas cover.

The IO Address Space (IOAS) data structure is composed of:
 - struct io_pagetable holding the IOVA map
 - struct iopt_area representing populated portions of IOVA
 - struct iopt_pages representing the storage of PFNs
 - struct iommu_domain representing each IO page table in the system IOMMU
 - struct iopt_pages_access representing in-kernel accesses of PFNs (e.g.
   VFIO mdevs)
 - struct xarray pinned_pfns holding a list of pages pinned by in-kernel
   accesses

This patch introduces the lowest part of the data structure: the movement
of PFNs in a tiered storage scheme:
 1) iopt_pages::pinned_pfns xarray
 2) Multiple iommu_domains
 3) The origin of the PFNs, i.e. the userspace pointer

PFNs have to be copied between all combinations of tiers, depending on the
configuration.

The interface is an iterator called a 'pfn_reader', which determines the
tier in which each PFN is stored and loads it into a list of PFNs held in
a struct pfn_batch.

Each step of the iterator will fill up the pfn_batch, then the caller can
use the pfn_batch to send the PFNs to the required destination. Repeating
this loop will read all the PFNs in an IOVA range.
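
The loop described above can be sketched in user-space C. This is only an
illustrative model, not the kernel code: every name here (toy_pages,
toy_batch, toy_reader, BATCH_CAP, NO_PFN) is hypothetical, the tiers are
plain arrays, and "pinning" is reduced to a counter. It also lets one batch
mix tiers, which is a simplification made for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model only; none of these names are kernel APIs. */
#define BATCH_CAP 4
#define NO_PFN UINT64_MAX

struct toy_pages {
	const uint64_t *xarray; /* tier 1: NO_PFN where nothing is stored */
	const uint64_t *domain; /* tier 2: NO_PFN where nothing is mapped */
	const uint64_t *user;   /* tier 3: stand-in for pinning user memory */
	size_t npages;
	size_t npinned;         /* pages newly "pinned" from tier 3 */
};

struct toy_batch {
	uint64_t pfns[BATCH_CAP];
	size_t count;
};

struct toy_reader {
	struct toy_pages *pages;
	size_t pos;  /* next page index to read */
	size_t last; /* one past the final page index */
	struct toy_batch batch;
};

static void toy_reader_init(struct toy_reader *r, struct toy_pages *p,
			    size_t start, size_t last)
{
	assert(start <= last && last <= p->npages);
	r->pages = p;
	r->pos = start;
	r->last = last;
	r->batch.count = 0;
}

static int toy_reader_done(const struct toy_reader *r)
{
	return r->pos == r->last;
}

/* One iterator step: refill the batch from whichever tier holds each PFN. */
static void toy_reader_next(struct toy_reader *r)
{
	struct toy_pages *p = r->pages;

	r->batch.count = 0;
	while (r->pos < r->last && r->batch.count < BATCH_CAP) {
		uint64_t pfn;

		if (p->xarray[r->pos] != NO_PFN) {
			pfn = p->xarray[r->pos];
		} else if (p->domain[r->pos] != NO_PFN) {
			pfn = p->domain[r->pos];
		} else {
			pfn = p->user[r->pos];
			p->npinned++; /* accounting for a new pin */
		}
		r->batch.pfns[r->batch.count++] = pfn;
		r->pos++;
	}
}

/* Drain the reader into dest; returns the number of PFNs copied. */
static size_t toy_read_all(struct toy_pages *p, size_t start, size_t last,
			   uint64_t *dest)
{
	struct toy_reader r;
	size_t n = 0;

	toy_reader_init(&r, p, start, last);
	while (!toy_reader_done(&r)) {
		toy_reader_next(&r);
		for (size_t i = 0; i < r.batch.count; i++)
			dest[n++] = r.batch.pfns[i];
	}
	return n;
}
```

In this model the caller of toy_read_all() plays the role of the
destination tier: each pass consumes one batch before the next step
refills it.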

The pfn_reader and pfn_batch also keep track of the pinned page accounting.

While PFNs are always stored and accessed as full PAGE_SIZE units, the
iommu_domain tier can store them with a sub-page offset/length to support
IOMMUs with a smaller IOPTE size than PAGE_SIZE.
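
As a rough illustration of the sub-page arithmetic, the sketch below
assumes a 64 KiB CPU page and a 4 KiB IOPTE. Both sizes and all names
(TOY_PAGE_SIZE, TOY_IOPTE_SIZE, toy_subpage_map) are assumptions chosen for
the example and do not come from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SIZE  65536u /* assumed CPU page size */
#define TOY_IOPTE_SIZE 4096u  /* assumed smaller IOMMU page size */

struct toy_mapping {
	uint64_t phys;   /* physical address of the first mapped byte */
	uint64_t length; /* bytes mapped */
	uint64_t niopte; /* IOPTEs needed to cover the range */
};

/* Map part of one page: offset and length are in bytes within that page. */
static struct toy_mapping toy_subpage_map(uint64_t pfn, uint32_t offset,
					  uint32_t length)
{
	struct toy_mapping m;

	/* The sub-page range must be aligned to the IOPTE size. */
	assert(offset % TOY_IOPTE_SIZE == 0 && length % TOY_IOPTE_SIZE == 0);
	assert(length > 0 && (uint64_t)offset + length <= TOY_PAGE_SIZE);

	m.phys = pfn * TOY_PAGE_SIZE + offset;
	m.length = length;
	m.niopte = length / TOY_IOPTE_SIZE;
	return m;
}
```

The PFN itself still names a whole page; only the offset and length passed
alongside it narrow the mapped range.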

Link: https://lore.kernel.org/r/8-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Lixiao Yang <lixiao.yang@intel.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-30 20:16:49 -04:00
..
accessibility
acpi ACPI and device properties fixes for 6.1-rc3 2022-10-28 16:48:29 -07:00
amba
android Scheduler changes for v6.1: 2022-10-10 09:10:28 -07:00
ata ata: ahci_qoriq: Fix compilation warning 2022-10-18 08:02:14 +09:00
atm
auxdisplay
base ACPI and device properties fixes for 6.1-rc3 2022-10-28 16:48:29 -07:00
bcma Interrupt subsystem updates: 2022-10-12 10:23:24 -07:00
block block-6.1-2022-10-28 2022-10-29 18:06:52 -07:00
bluetooth
bus Char/Misc and other driver changes for 6.1-rc1 2022-10-08 08:56:37 -07:00
cdrom
char random: use arch_get_random*_early() in random_init() 2022-10-29 00:24:03 +02:00
clk This is the final part of the clk patches for this merge window. 2022-10-16 11:08:19 -07:00
clocksource A boring time, timekeeping, timers update: 2022-10-10 10:16:00 -07:00
comedi
connector
counter counter: 104-quad-8: Fix race getting function mode and direction 2022-10-23 20:39:26 -04:00
cpufreq cpufreq: intel_pstate: hybrid: Use known scaling factor for P-cores 2022-10-25 15:09:23 +02:00
cpuidle RISC-V Patches for the 6.1 Merge Window, Part 1 2022-10-09 13:24:01 -07:00
crypto This update includes the following changes: 2022-10-10 13:04:25 -07:00
cxl
dax libnvdimm for 6.1 2022-10-14 18:41:41 -07:00
dca
devfreq
dio
dma iommu: Remove SVM_FLAG_SUPERVISOR_MODE support 2022-11-03 15:47:45 +01:00
dma-buf whack-a-mole: cropped up open-coded file_inode() uses... 2022-10-06 17:22:11 -07:00
edac Merge patch series "Use composable cache instead of L2 cache" 2022-10-13 11:07:13 -07:00
eisa
extcon Char/Misc and other driver changes for 6.1-rc1 2022-10-08 08:56:37 -07:00
firewire
firmware efi: runtime: Don't assume virtual mappings are missing if VA == PA == 0 2022-10-21 11:09:41 +02:00
fpga Char/Misc and other driver changes for 6.1-rc1 2022-10-08 08:56:37 -07:00
fsi
gnss
gpio gpio: tegra: Convert to immutable irq chip 2022-10-20 13:47:54 +02:00
gpu drm-misc-fixes for v6.1-rc3: 2022-10-28 13:00:15 +10:00
greybus
hid for-linus-2022102101 2022-10-21 17:41:57 -07:00
hsi
hte
hv hyperv-next for 6.1 2022-10-10 13:59:01 -07:00
hwmon - Use the correct CPU capability clearing function on the error path in 2022-10-23 10:01:34 -07:00
hwspinlock
hwtracing coresight: cti: Fix hang in cti_disable_hw() 2022-10-25 19:08:07 +02:00
i2c i2c: mlxbf: depend on ACPI; clean away ifdeffage 2022-10-21 07:59:35 +02:00
i3c i3c: master: Remove the wrong place of reattach. 2022-10-12 23:45:29 +02:00
idle
iio iio: bmc150-accel-core: Fix unsafe buffer attributes 2022-10-17 08:51:26 +01:00
infiniband treewide: use get_random_u32() when possible 2022-10-11 17:42:58 -06:00
input Input updates for 6.1 merge window: 2022-10-11 10:53:25 -07:00
interconnect
iommu iommufd: PFN handling for iopt_pages 2022-11-30 20:16:49 -04:00
ipack Char/Misc and other driver changes for 6.1-rc1 2022-10-08 08:56:37 -07:00
irqchip Interrupt subsystem updates: 2022-10-12 10:23:24 -07:00
isdn mISDN: hfcpci: Fix use-after-free bug in hfcpci_softirq 2022-10-09 19:11:54 +01:00
leds leds: simatic-ipc-leds-gpio: fix incorrect LED to GPIO mapping 2022-10-24 11:32:10 +02:00
macintosh powerpc updates for 6.1 2022-10-09 14:05:15 -07:00
mailbox mailbox: qcom-ipcc: flag IRQ NO_THREAD 2022-10-05 21:51:58 -05:00
mcb
md dm clone: Fix typo in block_device format specifier 2022-10-18 17:17:48 -04:00
media media: vivid: set num_in/outputs to 0 if not supported 2022-10-25 16:43:34 +01:00
memory
memstick
message
mfd Revert "mfd: syscon: Remove repetition of the regmap_get_val_endian()" 2022-10-23 12:04:56 -07:00
misc iommu: Remove SVM_FLAG_SUPERVISOR_MODE support 2022-11-03 15:47:45 +01:00
mmc mmc: sdhci_am654: 'select', not 'depends' REGMAP_MMIO 2022-10-26 11:48:03 +02:00
most
mtd mtd: parsers: bcm47xxpart: Fix halfblock reads 2022-10-18 11:20:12 +02:00
mux
net net: enetc: survive memory pressure without crashing 2022-10-27 11:32:25 -07:00
nfc nfc: virtual_ncidev: Fix memory leak in virtual_nci_send() 2022-10-20 21:13:04 -07:00
ntb
nubus
nvdimm libnvdimm for 6.1 2022-10-14 18:41:41 -07:00
nvme block-6.1-2022-10-28 2022-10-29 18:06:52 -07:00
nvmem
of Devicetree updates for v6.1: 2022-10-10 13:13:51 -07:00
opp
parisc parisc architecture fixes and updates for kernel v6.1-rc1: 2022-10-14 12:10:01 -07:00
parport
pci PCI: Enable PASID only when ACS RR & UF enabled on upstream path 2022-11-03 15:47:47 +01:00
pcmcia
peci
perf arm64 fixes: 2022-10-14 12:38:03 -07:00
phy pci-v6.1-changes 2022-10-11 11:08:18 -07:00
pinctrl pinctrl: ocelot: Fix incorrect trigger of the interrupt. 2022-10-18 10:42:10 +02:00
platform LoongArch fixes for v6.1-rc3 2022-10-30 09:44:06 -07:00
pnp Merge branches 'acpi-apei', 'acpi-wakeup', 'acpi-reboot' and 'acpi-thermal' 2022-10-10 18:11:11 +02:00
power power supply and reset changes for the v6.1 series 2022-10-07 11:48:30 -07:00
powercap Scheduler changes for v6.1: 2022-10-10 09:10:28 -07:00
pps
ps3
ptp ptp: ocp: remove symlink for second GNSS 2022-10-10 08:37:24 +01:00
pwm pwm: Changes for v6.1-rc1 2022-10-07 11:32:10 -07:00
rapidio
ras
regulator - Core Frameworks 2022-10-07 11:24:20 -07:00
remoteproc remoteproc: virtio: Fix warning on bindings by removing the of_match_table 2022-10-05 09:20:44 -06:00
reset Here's the main clk pull request for this merge window. We have some 2022-10-08 10:06:48 -07:00
rpmsg
rtc rtc: cmos: fix build on non-ACPI platforms 2022-10-18 22:36:54 +02:00
s390 s390/vfio-ap: Fix memory allocation for mdev_types array 2022-10-26 14:47:31 +02:00
sbus
scsi SCSI fixes on 20221028 2022-10-29 18:12:45 -07:00
sh
siox
slimbus
soc Merge patch series "Use composable cache instead of L2 cache" 2022-10-13 11:07:13 -07:00
soundwire soundwire updates for 6.1-rc1 2022-10-07 16:13:55 -07:00
spi spi: Fixes for v6.1 2022-10-26 17:38:46 -07:00
spmi
ssb
staging media fixes for v6.1-rc2 2022-10-22 15:30:15 -07:00
target Merge branch '6.1/scsi-queue' into 6.1/scsi-fixes 2022-10-21 01:10:34 +00:00
tc
tee - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in 2022-10-10 17:53:04 -07:00
thermal thermal: intel_powerclamp: Use first online CPU as control_cpu 2022-10-15 19:33:57 +02:00
thunderbolt treewide: use get_random_u32() when possible 2022-10-11 17:42:58 -06:00
tty parisc architecture fixes and updates for kernel v6.1-rc1: 2022-10-14 12:10:01 -07:00
ufs scsi: ufs: core: Fix typo in comment 2022-10-22 03:29:32 +00:00
uio
usb fbdev fixes for kernel 6.1-rc3: 2022-10-30 11:31:14 -07:00
vdpa virtio: fixes, features 2022-10-10 14:02:53 -07:00
vfio VFIO updates for v6.1-rc1 2022-10-12 14:46:48 -07:00
vhost virtio: fixes, features 2022-10-10 14:02:53 -07:00
video fbdev fixes for kernel 6.1-rc3: 2022-10-30 11:31:14 -07:00
virt Char/Misc and other driver changes for 6.1-rc1 2022-10-08 08:56:37 -07:00
virtio virtio_pci: use irq to detect interrupt support 2022-10-13 09:33:03 -04:00
vlynq
w1 Char/Misc and other driver changes for 6.1-rc1 2022-10-08 08:56:37 -07:00
watchdog linux-watchdog 6.1-rc2 tag 2022-10-21 12:25:39 -07:00
xen xen: branch for v6.1-rc2 2022-10-21 14:43:09 -07:00
zorro
Kconfig
Makefile