mirror-linux/include
Jason Gunthorpe 99cb252f5e mm/mmu_notifier: add an interval tree notifier
Of the 13 users of mmu_notifiers, 8 of them use only
invalidate_range_start/end() and immediately intersect the
mmu_notifier_range with some kind of internal list of VAs.  4 use an
interval tree (i915_gem, radeon_mn, umem_odp, hfi1). 4 use a linked list
of some kind (scif_dma, vhost, gntdev, hmm)

And the remaining 5 either don't use invalidate_range_start() or do some
special thing with it.

It turns out that building a correct scheme with an interval tree is
pretty complicated, particularly if the use case is synchronizing against
another thread doing get_user_pages().  Many of these implementations have
various subtle and difficult to fix races.

This approach puts the interval tree as common code at the top of the mmu
notifier call tree and implements a shareable locking scheme.

It includes:
 - An interval tree tracking VA ranges, with per-range callbacks
 - A read/write locking scheme for the interval tree that avoids
   sleeping in the notifier path (for OOM killer)
 - A sequence counter based collision-retry locking scheme to tell
   device page fault that a VA range is being concurrently invalidated.

This is based on various ideas:
- hmm accumulates invalidated VA ranges and releases them when all
  invalidates are done, via active_invalidate_ranges count.
  This approach avoids having to intersect the interval tree twice (as
  umem_odp does) at the potential cost of a longer device page fault.

- kvm/umem_odp use a sequence counter to drive the collision retry,
  via invalidate_seq

- a deferred work todo list on unlock scheme like RTNL, via deferred_list.
  This makes adding/removing interval tree members more deterministic

- seqlock, except this version makes the seqlock idea multi-holder on the
  write side by protecting it with active_invalidate_ranges and a spinlock

To minimize MM overhead when only the interval tree is being used, the
entire SRCU and hlist overheads are dropped using some simple
branches. Similarly the interval tree overhead is dropped when in hlist
mode.

The overhead from the mandatory spinlock is broadly the same as most of
existing users which already had a lock (or two) of some sort on the
invalidation path.

Link: https://lore.kernel.org/r/20191112202231.3856-3-jgg@ziepe.ca
Acked-by: Christian König <christian.koenig@amd.com>
Tested-by: Philip Yang <Philip.Yang@amd.com>
Tested-by: Ralph Campbell <rcampbell@nvidia.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-23 19:56:44 -04:00
..
acpi cpufreq: Use per-policy frequency QoS 2019-10-21 02:05:21 +02:00
asm-generic
clocksource
crypto
drm
dt-bindings
keys
kvm
linux mm/mmu_notifier: add an interval tree notifier 2019-11-23 19:56:44 -04:00
math-emu
media
misc
net net: reorder 'struct net' fields to avoid false sharing 2019-10-19 12:21:53 -07:00
pcmcia
ras
rdma RDMA/mlx5: Rework implicit ODP destroy 2019-10-28 16:41:14 -03:00
scsi SCSI fixes on 20191015 2019-10-15 12:19:08 -07:00
soc
sound ASoC: Fixes for v5.4 2019-10-21 14:05:26 +02:00
target
trace for-5.4-rc4-tag 2019-10-23 06:14:29 -04:00
uapi TTY/Serial driver fixes for 5.4-rc3 2019-10-12 15:42:19 -07:00
vdso
video
xen xen: fixes and cleanups for 5.4-rc2 2019-10-04 11:13:09 -07:00
Kbuild