Commit Graph

932869 Commits (d1739eabdd9600569ff2da170b3baf343263f952)

Author SHA1 Message Date
Christophe JAILLET 320bdbd816 crypto: cavium/nitrox - Fix 'nitrox_get_first_device()' when ndevlist is fully iterated
When a list is completely iterated with 'list_for_each_entry(x, ...)', x is
not NULL at the end.

While at it, remove a useless initialization of the ndev variable. It
is overridden by 'list_for_each_entry'.

Fixes: f2663872f0 ("crypto: cavium - Register the CNN55XX supported crypto algorithms.")
Cc: <stable@vger.kernel.org>
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-06-04 22:06:26 +10:00
Tero Kristo 281c377872 crypto: omap-sham - add proper load balancing support for multicore
The current implementation of the multiple accelerator core support for
OMAP SHA does not work properly. It always picks up the first probed
accelerator core if this is available, and rest of the book keeping also
gets confused if there are two cores available. Add proper load
balancing support for SHA, and also fix any bugs related to the
multicore support while doing it.

Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-06-04 22:03:43 +10:00
Tero Kristo 9ef4e6e5e3 crypto: omap-aes - prevent unregistering algorithms twice
Most of the OMAP family SoCs contain two instances for AES core, which
causes the remove callbacks to be also done twice when driver is
removed. Fix the algorithm unregister callbacks to take into account the
number of algorithms still registered to avoid removing these twice.

Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-06-04 22:03:42 +10:00
Tero Kristo 63832a0c6f crypto: omap-sham - fix very small data size handling
With very small data sizes, the whole data can end up in the xmit
buffer. This code path does not set the sg_len properly which causes the
core dma framework to crash. Fix by adding the proper size in place.
Also, the data length must be a multiple of block-size, so extend the
DMA data size while here.

Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-06-04 22:03:41 +10:00
Tero Kristo 6395166d7a crypto: omap-sham - huge buffer access fixes
The ctx internal buffer can only hold buflen amount of data, don't try
to copy over more than that. Also, initialize the context sg pointer
if we only have data in the context internal buffer, this can happen
when closing a hash with certain data amounts.

Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-06-04 22:03:41 +10:00
Tero Kristo 7e34e0bbc6 crypto: omap-crypto - fix userspace copied buffer access
In case buffers are copied from userspace, directly accessing the page
will most likely fail because it hasn't been mapped into the kernel
memory space. Fix the issue by forcing a kmap / kunmap within the
cleanup functionality.

Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-06-04 22:03:40 +10:00
Tero Kristo 8dc43636e3 crypto: omap-sham - force kernel driver usage for sha algos
As the hardware acceleration for the omap-sham algos is not available
from userspace, force kernel driver usage. Without this flag in place,
openssl 1.1 implementation thinks it can accelerate sha algorithms on
omap devices directly from userspace.

Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-06-04 22:03:40 +10:00
Tero Kristo b29cb8d645 crypto: omap-aes - avoid spamming console with self tests
Running the self test suite for omap-aes with extra tests enabled causes
huge spam with the tag message wrong indicators. With self tests, this
is fine as there are some tests that purposedly pass bad data to the
driver. Also, returning -EBADMSG from the driver is enough, so remove the
dev_err message completely.

Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-06-04 22:03:39 +10:00
Vasily Gorbik bfa50e1427 vfio-ccw: make vfio_ccw_regops variables declarations static
Fixes the following sparse warnings:
drivers/s390/cio/vfio_ccw_chp.c:62:30: warning: symbol 'vfio_ccw_schib_region_ops' was not declared. Should it be static?
drivers/s390/cio/vfio_ccw_chp.c:117:30: warning: symbol 'vfio_ccw_crw_region_ops' was not declared. Should it be static?

Link: https://lkml.kernel.org/r/patch.git-a34be7aede18.your-ad-here.call-01591269421-ext-5655@work.hours
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-06-04 13:45:51 +02:00
Vasily Gorbik 0927e157b7 vfio-ccw updates:
- accept requests without the prefetch bit set
 - enable path handling via two new regions
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEw9DWbcNiT/aowBjO3s9rk8bwL68FAl7Xhu0SHGNvaHVja0By
 ZWRoYXQuY29tAAoJEN7Pa5PG8C+vCvAP/2X7tJlL7XJqMHTWI84WsFF0HP9jPNzc
 DetIc692A49Yhidrfrd2qcJaDmencKCD1BR7OdepgSQJkcI+ohcLq8RxgIvPANzy
 lK9+Zf1uSxWVDLYzdyX5lUN5AhWzvkjQbgNLassspLuZWwon1n57GGtBCpHPs9+V
 8ESLAlMjIw2BiGq5FLWWd5VtL5/4Oyzfb/SZIL48LsPfHT9lpuypf2JheGwMZSJR
 dBdr/Gn5poQ7Y6uorDD8Cw40MEoHHjYJdqOZhbUg/CeTmgTO6Wy7mEr16PobUxIy
 lEnRgdz4cJf5yK3e6O5lBfY8xD14qubaAhjZlauQqGy+fUEVIGqOSZoTgC6xv8G/
 3OrSDtzT20wk38dyc+urZAiwwhMTf10BdkRXP15wV+xZb/IhUYzYGGNXAkXBTUvQ
 zC3sEjWqPBKGlxW3w3fVSPVDL8gaAF1yKTIhPds1A1PBElQ8R+edPYb26NW4cl7J
 rjnREBxhldJj+sc0DB/Foa3ut8a2luaGktBAJUu7eS+akWZSb/krvAjQHj3ZeWzS
 78ALCoMSX4QvRZHEddLDs+eaPEQefp1FfqAn28bbLwrnS1QXkVxPriMMj5j0zKE5
 IKA80XW+/SntIXGP+eiPTOUDI9X+e0mMavHjCxb9gBEPZY9aPwvWyB6PekYajZ4a
 ZISA+9MioI46
 =3Qm+
 -----END PGP SIGNATURE-----

Merge tag 'vfio-ccw-20200603-v2' of https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/vfio-ccw into features

vfio-ccw updates:
- accept requests without the prefetch bit set
- enable path handling via two new regions

* tag 'vfio-ccw-20200603-v2' of https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/vfio-ccw:
  vfio-ccw: Add trace for CRW event
  vfio-ccw: Wire up the CRW irq and CRW region
  vfio-ccw: Introduce a new CRW region
  vfio-ccw: Refactor IRQ handlers
  vfio-ccw: Introduce a new schib region
  vfio-ccw: Refactor the unregister of the async regions
  vfio-ccw: Register a chp_event callback for vfio-ccw
  vfio-ccw: Introduce new helper functions to free/destroy regions
  vfio-ccw: document possible errors
  vfio-ccw: Enable transparent CCW IPL from DASD

Link: https://lkml.kernel.org/r/20200603112716.332801-1-cohuck@redhat.com
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-06-04 11:58:05 +02:00
Joerg Roedel 431275afdc iommu: Check for deferred attach in iommu_group_do_dma_attach()
The iommu_group_do_dma_attach() must not attach devices which have
deferred_attach set. Otherwise devices could cause IOMMU faults when
re-initialized in a kdump kernel.

Fixes: deac0b3bed ("iommu: Split off default domain allocation from group assignment")
Reported-by: Jerry Snitselaar <jsnitsel@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Tested-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Link: https://lore.kernel.org/r/20200604091944.26402-1-joro@8bytes.org
2020-06-04 11:38:17 +02:00
Kunihiko Hayashi 8d7e33d681 PCI: uniphier: Add Socionext UniPhier Pro5 PCIe endpoint controller driver
Add driver for the Socionext UniPhier Pro5 SoC endpoint controller.
This controller is based on the DesignWare PCIe core.

And add "host" to existing controller descriontions for the host controller
in Kconfig.

Link: https://lore.kernel.org/r/1589457801-12796-3-git-send-email-hayashi.kunihiko@socionext.com
Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Rob Herring <robh@kernel.org>
2020-06-04 10:03:18 +01:00
Miklos Szeredi 74c6e384e9 ovl: make oip->index bool
ovl_get_inode() uses oip->index as a bool value, not as a pointer.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-06-04 10:48:19 +02:00
Miklos Szeredi b778e1ee1a ovl: only pass ->ki_flags to ovl_iocb_to_rwf()
Next patch will want to pass a modified set of flags, so...

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-06-04 10:48:19 +02:00
Miklos Szeredi df820f8de4 ovl: make private mounts longterm
Overlayfs is using clone_private_mount() to create internal mounts for
underlying layers.  These are used for operations requiring a path, such as
dentry_open().

Since these private mounts are not in any namespace they are treated as
short term, "detached" mounts and mntput() involves taking the global
mount_lock, which can result in serious cacheline pingpong.

Make these private mounts longterm instead, which trade the penalty on
mntput() for a slightly longer shutdown time due to an added RCU grace
period when putting these mounts.

Introduce a new helper kern_unmount_many() that can take care of multiple
longterm mounts with a single RCU grace period.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-06-04 10:48:19 +02:00
Miklos Szeredi b8e42a651b ovl: get rid of redundant members in struct ovl_fs
ofs->upper_mnt is copied to ->layers[0].mnt and ->layers[0].trap could be
used instead of a separate ->upperdir_trap.

Split the lowerdir option early to get the number of layers, then allocate
the ->layers array, and finally fill the upper and lower layers, as before.

Get rid of path_put_init() in ovl_lower_dir(), since the only caller will
take care of that.

[Colin Ian King] Fix null pointer dereference on null stack pointer on
error return found by Coverity.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-06-04 10:48:19 +02:00
Miklos Szeredi 08f4c7c86d ovl: add accessor for ofs->upper_mnt
Next patch will remove ofs->upper_mnt, so add an accessor function for this
field.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-06-04 10:48:19 +02:00
Yuxuan Shui 520da69d26 ovl: initialize error in ovl_copy_xattr
In ovl_copy_xattr, if all the xattrs to be copied are overlayfs private
xattrs, the copy loop will terminate without assigning anything to the
error variable, thus returning an uninitialized value.

If ovl_copy_xattr is called from ovl_clear_empty, this uninitialized error
value is put into a pointer by ERR_PTR(), causing potential invalid memory
accesses down the line.

This commit initialize error with 0. This is the correct value because when
there's no xattr to copy, because all xattrs are private, ovl_copy_xattr
should succeed.

This bug is discovered with the help of INIT_STACK_ALL and clang.

Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com>
Link: https://bugs.chromium.org/p/chromium/issues/detail?id=1050405
Fixes: 0956254a2d ("ovl: don't copy up opaqueness")
Cc: stable@vger.kernel.org # v4.8
Signed-off-by: Alexander Potapenko <glider@google.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-06-04 10:48:19 +02:00
Steve French e80ddeb2f7 smb3: fix incorrect number of credits when ioctl MaxOutputResponse > 64K
We were not checking to see if ioctl requests asked for more than
64K (ie when CIFSMaxBufSize was > 64K) so when setting larger
CIFSMaxBufSize then ioctls would fail with invalid parameter errors.
When requests ask for more than 64K in MaxOutputResponse then we
need to ask for more than 1 credit.

Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
2020-06-04 01:13:41 -05:00
Steve French 1ee0e6d47d smb3: default to minimum of two channels when multichannel specified
When "multichannel" is specified on mount, make sure to default to
at least two channels.

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2020-06-04 01:13:37 -05:00
Ben Skeggs dd67cab5db drm/nouveau/kms/nv50-: clear SW state of disabled windows harder
The most innocuous result of not having done this is that we end up
sending unnecessary methods when we next enable the window.

However, interactions with the code handling skipping disables when
an update immediately follows, and window ownership assignment, can
lead to upsetting the display hardware on Volta and newer.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2020-06-04 14:23:22 +10:00
Thierry Reding 21454fe697 drm/nouveau: gr/gk20a: Use firmware version 0
Tegra firmware doesn't actually use any version numbers and passing -1
causes the existing firmware binaries not to be found. Use version 0 to
find the correct files.

Fixes: ef16dc278e ("drm/nouveau/gr/gf100-: select implementation based on available FW")
Signed-off-by: Thierry Reding <treding@nvidia.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2020-06-04 14:23:22 +10:00
Ben Skeggs 9b5ca547bb drm/nouveau/disp/gm200-: detect and potentially disable HDA support on some SORs
Some HDA pin widgets may be disabled by BIOS, and unavailable from a
SOR.  Our SOR allocation policy uses this information to allocate an
appropriate SOR when HDA is supported by a display.

Thank you to NVIDIA for providing the information to determine this.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2020-06-04 14:23:21 +10:00
Ben Skeggs 9f9f54e887 drm/nouveau/disp/gp100: split SOR implementation from gm200
GP100 needs different HDA detection.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2020-06-04 14:23:21 +10:00
Ben Skeggs e6867ffa34 drm/nouveau/disp: modify OR allocation policy to account for HDA requirements
Since GM200, SORs are no longer tied to a specific connector, and we
allocate them instead, with the assumption that all SORs are equally
capable.

However, there's a 1<->1 mapping between SOR and HDA pin widget, and
it turns out that it's possible for some widgets to be disabled...

In order to avoid picking a SOR without a valid pin widget, some new
rules need to be added.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2020-06-04 14:23:21 +10:00
Ben Skeggs f24b6ae19f drm/nouveau/disp: split part of OR allocation logic into a function
No logical changes here, this is just moving the code to make the
changes in the next commit more obvious.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2020-06-04 14:23:21 +10:00
Ben Skeggs 6f8dbcf1c9 drm/nouveau/disp: provide hint to OR allocation about HDA requirements
Will be used by a subsequent commit to influence SOR allocation policy.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2020-06-04 14:23:20 +10:00
Linus Torvalds 6929f71e46 atomisp: avoid warning about unused function
The atomisp_mrfld_power() function isn't actually ever called, because
the two call-sites have commented out the use because it breaks on some
platforms.  That results in:

  drivers/staging/media/atomisp/pci/atomisp_v4l2.c:764:12: warning: ‘atomisp_mrfld_power’ defined but not used [-Wunused-function]
    764 | static int atomisp_mrfld_power(struct atomisp_device *isp, bool enable)
        |            ^~~~~~~~~~~~~~~~~~~

during the build.

Rather than commenting out the use entirely, just disable it
semantically instead (using a "0 &&" construct), leaving the call in
place from a syntax standpoint, and avoiding the warning.

I really don't want my builds to have any warnings that can then hide
real issues.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-03 21:22:46 -07:00
Linus Torvalds a98f670e41 media updates for v5.8-rc1
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+QmuaPwR3wnBdVwACF8+vY7k4RUFAl7XUmwACgkQCF8+vY7k
 4RU4zg//fT32wiVAPHCCp+pDZVnWNeipXE1gnpqghd/qZXfzBPiLEC9sPS74VVkA
 jf1hhR33VZpKAKTPg/b074qhRZBywEOdHZnT/0CEE1oNB61shVOnyDYzLGSq95cO
 6V55ovbi5IOkrg0QEJbHpG5YHzt+pq5XeWOkqGNsHwla7N7iMGMVYfHepVVDWPnZ
 0wGYFF9cAJP+X/uxqkZLDVMA/K1I+QKh6vrj/qx53/eRt8VID3+i8ig3guk4PlUq
 7RLw5w/CywtNaGE5zaz7T3i2eoED71JHOTXi6RxdP1z8IDvELZ9mT95GQ+enlwqt
 AS6Ju1sV40wviHMv5prJWQjJkrrtYH3S907lIjwBpQLNGbh2+5crCd/6CwumkGgv
 1cCZ1dVmXpCe++9mU9AXmSkjsjGPStNcmHMOpc1Pwn9jUV3LQOOSDp8+RYdt1WHU
 Iw9cyM8NOpz5Mv/B1/ZPQ1gPb9lr1gE09XyUekxtAI/nl4nNHGWO8QDuX7Odfrv9
 8nfo14lk/p6XCTA8dsWJCgI5B1fgnqD4frHKWO9Uctppc/KBW41c8JpQUjBNlG/T
 MhtlGwYMVgSQxpQ6wK018JUAFoWkn1Sr0zMKRayqCnMjMLHsaMwE6kq+LgmRBqbB
 ersKV/9ZLYqCU1d6PhEVG6xUs6GsWdLcyhALlmHsddPSdpFXdf8=
 =KNAo
 -----END PGP SIGNATURE-----

Merge tag 'media/v5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media

Pull media updates from Mauro Carvalho Chehab:

 - Media documentation is now split into admin-guide, driver-api and
   userspace-api books (a longstanding request from Jon);

 - The media Kconfig was reorganized, in order to make easier to select
   drivers and their dependencies;

 - The testing drivers now has a separate directory;

 - added a new driver for Rockchip Video Decoder IP;

 - The atomisp staging driver was resurrected. It is meant to work with
   4 generations of cameras on Atom-based laptops, tablets and cell
   phones. So, it seems worth investing time to cleanup this driver and
   making it in good shape.

 - Added some V4L2 core ancillary routines to help with h264 codecs;

 - Added an ov2740 image sensor driver;

 - The si2157 gained support for Analog TV, which, in turn, added
   support for some cx231xx and cx23885 boards to also support analog
   standards;

 - Added some V4L2 controls (V4L2_CID_CAMERA_ORIENTATION and
   V4L2_CID_CAMERA_SENSOR_ROTATION) to help identifying where the camera
   is located at the device;

 - VIDIOC_ENUM_FMT was extended to support MC-centric devices;

 - Lots of drivers improvements and cleanups.

* tag 'media/v5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (503 commits)
  media: Documentation: media: Refer to mbus format documentation from CSI-2 docs
  media: s5k5baf: Replace zero-length array with flexible-array
  media: i2c: imx219: Drop <linux/clk-provider.h> and <linux/clkdev.h>
  media: i2c: Add ov2740 image sensor driver
  media: ov8856: Implement sensor module revision identification
  media: ov8856: Add devicetree support
  media: dt-bindings: ov8856: Document YAML bindings
  media: dvb-usb: Add Cinergy S2 PCIe Dual Port support
  media: dvbdev: Fix tuner->demod media controller link
  media: dt-bindings: phy: phy-rockchip-dphy-rx0: move rockchip dphy rx0 bindings out of staging
  media: staging: dt-bindings: phy-rockchip-dphy-rx0: remove non-used reg property
  media: atomisp: unify the version for isp2401 a0 and b0 versions
  media: atomisp: update TODO with the current data
  media: atomisp: adjust some code at sh_css that could be broken
  media: atomisp: don't produce errs for ignored IRQs
  media: atomisp: print IRQ when debugging
  media: atomisp: isp_mmu: don't use kmem_cache
  media: atomisp: add a notice about possible leak resources
  media: atomisp: disable the dynamic and reserved pools
  media: atomisp: turn on camera before setting it
  ...
2020-06-03 20:59:38 -07:00
Linus Torvalds ee01c4d72a Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:
 "More mm/ work, plenty more to come

  Subsystems affected by this patch series: slub, memcg, gup, kasan,
  pagealloc, hugetlb, vmscan, tools, mempolicy, memblock, hugetlbfs,
  thp, mmap, kconfig"

* akpm: (131 commits)
  arm64: mm: use ARCH_HAS_DEBUG_WX instead of arch defined
  x86: mm: use ARCH_HAS_DEBUG_WX instead of arch defined
  riscv: support DEBUG_WX
  mm: add DEBUG_WX support
  drivers/base/memory.c: cache memory blocks in xarray to accelerate lookup
  mm/thp: rename pmd_mknotpresent() as pmd_mkinvalid()
  powerpc/mm: drop platform defined pmd_mknotpresent()
  mm: thp: don't need to drain lru cache when splitting and mlocking THP
  hugetlbfs: get unmapped area below TASK_UNMAPPED_BASE for hugetlbfs
  sparc32: register memory occupied by kernel as memblock.memory
  include/linux/memblock.h: fix minor typo and unclear comment
  mm, mempolicy: fix up gup usage in lookup_node
  tools/vm/page_owner_sort.c: filter out unneeded line
  mm: swap: memcg: fix memcg stats for huge pages
  mm: swap: fix vmstats for huge pages
  mm: vmscan: limit the range of LRU type balancing
  mm: vmscan: reclaim writepage is IO cost
  mm: vmscan: determine anon/file pressure balance at the reclaim root
  mm: balance LRU lists based on relative thrashing
  mm: only count actual rotations as LRU reclaim cost
  ...
2020-06-03 20:24:15 -07:00
Jan Kara 6b8ed62008 ext4: avoid unnecessary transaction starts during writeback
ext4_writepages() currently works in a loop like:
  start a transaction
  scan inode for pages to write
  map and submit these pages
  stop the transaction

This loop results in starting transaction once more than is needed
because in the last iteration we start a transaction only to scan the
inode and find there are no pages to write. This can be significant
increase in number of transaction starts for single-extent files or
files that have all blocks already mapped. Furthermore we already know
from previous iteration whether there are more pages to write or not. So
propagate the information from mpage_prepare_extent_to_map() and avoid
unnecessary looping in case there are no more pages to write.

Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200525081215.29451-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03 23:16:56 -04:00
Jens Axboe 6e014c621e ext4: don't block for O_DIRECT if IOCB_NOWAIT is set
Running with some debug patches to detect illegal blocking triggered the
extend/unaligned condition in ext4. If ext4 needs to extend the file (and
hence go to buffered IO), or if the app is doing unaligned IO, then ext4
asks the iomap code to wait for IO completion. If the caller asked for
no-wait semantics by setting IOCB_NOWAIT, then ext4 should return -EAGAIN
instead.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

Link: https://lore.kernel.org/r/76152096-2bbb-7682-8fce-4cb498bcd909@kernel.dk
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03 23:16:55 -04:00
Christoph Hellwig ba98890393 ext4: remove the access_ok() check in ext4_ioctl_get_es_cache
access_ok just checks we are fed a proper user pointer.  We also do that
in copy_to_user itself, so no need to do this early.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200523073016.2944131-10-hch@lst.de
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03 23:16:55 -04:00
Christoph Hellwig c7d216e8c4 fs: remove the access_ok() check in ioctl_fiemap
access_ok just checks we are fed a proper user pointer.  We also do that
in copy_to_user itself, so no need to do this early.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Link: https://lore.kernel.org/r/20200523073016.2944131-9-hch@lst.de
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03 23:16:55 -04:00
Christoph Hellwig 45dd052e67 fs: handle FIEMAP_FLAG_SYNC in fiemap_prep
By moving FIEMAP_FLAG_SYNC handling to fiemap_prep we ensure it is
handled once instead of duplicated, but can still be done under fs locks,
like xfs/iomap intended with its duplicate handling.  Also make sure the
error value of filemap_write_and_wait is propagated to user space.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Link: https://lore.kernel.org/r/20200523073016.2944131-8-hch@lst.de
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03 23:16:55 -04:00
Christoph Hellwig cddf8a2c4a fs: move fiemap range validation into the file systems instances
Replace fiemap_check_flags with a fiemap_prep helper that also takes the
inode and mapped range, and performs the sanity check and truncation
previously done in fiemap_check_range.  This way the validation is inside
the file system itself and thus properly works for the stacked overlayfs
case as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Link: https://lore.kernel.org/r/20200523073016.2944131-7-hch@lst.de
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03 23:16:55 -04:00
Christoph Hellwig 2732881894 iomap: fix the iomap_fiemap prototype
iomap_fiemap should take u64 start and len arguments, just like the
->fiemap prototype.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Link: https://lore.kernel.org/r/20200523073016.2944131-6-hch@lst.de
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03 23:16:55 -04:00
Christoph Hellwig 10c5db2864 fs: move the fiemap definitions out of fs.h
No need to pull the fiemap definitions into almost every file in the
kernel build.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Link: https://lore.kernel.org/r/20200523073016.2944131-5-hch@lst.de
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03 23:16:55 -04:00
Christoph Hellwig 44ebcd06bb fs: mark __generic_block_fiemap static
There is no caller left outside of ioctl.c.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Link: https://lore.kernel.org/r/20200523073016.2944131-4-hch@lst.de
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03 23:16:54 -04:00
Christoph Hellwig da565e792b ext4: remove the call to fiemap_check_flags in ext4_fiemap
iomap_fiemap already calls fiemap_check_flags first thing, so this
additional check is redundant.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200523073016.2944131-3-hch@lst.de
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03 23:16:54 -04:00
Christoph Hellwig 03a5ed24c9 ext4: split _ext4_fiemap
The fiemap and EXT4_IOC_GET_ES_CACHE cases share almost no code, so split
them into entirely separate functions.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200523073016.2944131-2-hch@lst.de
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03 23:16:54 -04:00
Christoph Hellwig 328e24ae14 ext4: fix fiemap size checks for bitmap files
Add an extra validation of the len parameter, as for ext4 some files
might have smaller file size limits than others.  This also means the
redundant size check in ext4_ioctl_get_es_cache can go away, as all
size checking is done in the shared fiemap handler.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200505154324.3226743-3-hch@lst.de
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03 23:16:54 -04:00
Ritesh Harjani 175efa81fe ext4: fix EXT4_MAX_LOGICAL_BLOCK macro
ext4 supports max number of logical blocks in a file to be 0xffffffff.
(This is since ext4_extent's ee_block is __le32).
This means that EXT4_MAX_LOGICAL_BLOCK should be 0xfffffffe (starting
from 0 logical offset). This patch fixes this.

The issue was seen when ext4 moved to iomap_fiemap API and when
overlayfs was mounted on top of ext4. Since overlayfs was missing
filemap_check_ranges(), so it could pass a arbitrary huge length which
lead to overflow of map.m_len logic.

This patch fixes that.

Fixes: d3b6f23f71 ("ext4: move ext4_fiemap to use iomap framework")
Reported-by: syzbot+77fa5bdb65cc39711820@syzkaller.appspotmail.com
Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20200505154324.3226743-2-hch@lst.de
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03 23:16:54 -04:00
Jonathan Grant 9f364e1d95 add comment for ext4_dir_entry_2 file_type member
Signed-off-by: Jonathan Grant <jg@jguk.org>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Link: https://lore.kernel.org/r/ad3290d5-86af-99c1-f9d5-cd1bab710429@jguk.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03 23:16:54 -04:00
Jan Kara 14ff628630 jbd2: avoid leaking transaction credits when unreserving handle
When reserved transaction handle is unused, we subtract its reserved
credits in __jbd2_journal_unreserve_handle() called from
jbd2_journal_stop(). However this function forgets to remove reserved
credits from transaction->t_outstanding_credits and thus the transaction
space that was reserved remains effectively leaked. The leaked
transaction space can be quite significant in some cases and leads to
unnecessarily small transactions and thus reducing throughput of the
journalling machinery. E.g. fsmark workload creating lots of 4k files
was observed to have about 20% lower throughput due to this when ext4 is
mounted with dioread_nolock mount option.

Subtract reserved credits from t_outstanding_credits as well.

CC: stable@vger.kernel.org
Fixes: 8f7d89f368 ("jbd2: transaction reservation support")
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200520133119.1383-3-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03 23:16:53 -04:00
Jan Kara dfcd4489e2 ext4: drop ext4_journal_free_reserved()
Remove ext4_journal_free_reserved() function. It is never used.

Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Link: https://lore.kernel.org/r/20200520133119.1383-2-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03 23:16:53 -04:00
Ritesh Harjani 993778306e ext4: mballoc: use lock for checking free blocks while retrying
Currently while doing block allocation grp->bb_free may be getting
modified if discard is happening in parallel.
For e.g. consider a case where there are lot of threads who have
preallocated lot of blocks and there is a thread which is trying
to discard all of this group's PA. Now it could happen that
we see all of those group's bb_free is zero and fail the allocation
while there is sufficient space if we free up all the PA.

So this patch adds another flag "EXT4_MB_STRICT_CHECK" which will be set
if we are unable to allocate any blocks in the first try (since we may
not have considered blocks about to be discarded from PA lists).
So during retry attempt to allocate blocks we will use ext4_lock_group()
for checking if the group is good or not.

Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/9cb740a117c958c36596f167b12af1beae9a68b7.1589955723.git.riteshh@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03 23:16:53 -04:00
Ritesh Harjani 8ef123fe02 ext4: mballoc: refactor ext4_mb_good_group()
ext4_mb_good_group() definition was changed some time back
and now it even initializes the buddy cache (via ext4_mb_init_group()),
if in case the EXT4_MB_GRP_NEED_INIT() is true for a group.
Note that ext4_mb_init_group() could sleep and so should not be called
under a spinlock held.
This is fine as of now because ext4_mb_good_group() is called before
loading the buddy bitmap without ext4_lock_group() held
and again called after loading the bitmap, only this time with
ext4_lock_group() held.
But still this whole thing is confusing.

So this patch refactors out ext4_mb_good_group_nolock() which should be
called when without holding ext4_lock_group().
Also in further patches we hold the spinlock (ext4_lock_group()) while
doing any calculations which involves grp->bb_free or grp->bb_fragments.

Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/d9f7d031a5fbe1c943fae6bf1ff5cdf0604ae722.1589955723.git.riteshh@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03 23:16:53 -04:00
Ritesh Harjani 07b5b8e1ac ext4: mballoc: introduce pcpu seqcnt for freeing PA to improve ENOSPC handling
There could be a race in function ext4_mb_discard_group_preallocations()
where the 1st thread may iterate through group's bb_prealloc_list and
remove all the PAs and add to function's local list head.
Now if the 2nd thread comes in to discard the group preallocations,
it will see that the group->bb_prealloc_list is empty and will return 0.

Consider for a case where we have less number of groups
(for e.g. just group 0),
this may even return an -ENOSPC error from ext4_mb_new_blocks()
(where we call for ext4_mb_discard_group_preallocations()).
But that is wrong, since 2nd thread should have waited for 1st thread
to release all the PAs and should have retried for allocation.
Since 1st thread was anyway going to discard the PAs.

The algorithm using this percpu seq counter goes below:
1. We sample the percpu discard_pa_seq counter before trying for block
   allocation in ext4_mb_new_blocks().
2. We increment this percpu discard_pa_seq counter when we either allocate
   or free these blocks i.e. while marking those blocks as used/free in
   mb_mark_used()/mb_free_blocks().
3. We also increment this percpu seq counter when we successfully identify
   that the bb_prealloc_list is not empty and hence proceed for discarding
   of those PAs inside ext4_mb_discard_group_preallocations().

Now to make sure that the regular fast path of block allocation is not
affected, as a small optimization we only sample the percpu seq counter
on that cpu. Only when the block allocation fails and when freed blocks
found were 0, that is when we sample percpu seq counter for all cpus using
below function ext4_get_discard_pa_seq_sum(). This happens after making
sure that all the PAs on grp->bb_prealloc_list got freed or if it's empty.

It can be well argued that why don't just check for grp->bb_free to
see if there are any free blocks to be allocated. So here are the two
concerns which were discussed:-

1. If for some reason the blocks available in the group are not
   appropriate for allocation logic (say for e.g.
   EXT4_MB_HINT_GOAL_ONLY, although this is not yet implemented), then
   the retry logic may result into infinte looping since grp->bb_free is
   non-zero.

2. Also before preallocation was clubbed with block allocation with the
   same ext4_lock_group() held, there were lot of races where grp->bb_free
   could not be reliably relied upon.
Due to above, this patch considers discard_pa_seq logic to determine if
we should retry for block allocation. Say if there are are n threads
trying for block allocation and none of those could allocate or discard
any of the blocks, then all of those n threads will fail the block
allocation and return -ENOSPC error. (Since the seq counter for all of
those will match as no block allocation/discard was done during that
duration).

Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/7f254686903b87c419d798742fd9a1be34f0657b.1589955723.git.riteshh@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03 23:16:53 -04:00
Ritesh Harjani cf5e2ca6c9 ext4: mballoc: refactor ext4_mb_discard_preallocations()
Implement ext4_mb_discard_preallocations_should_retry()
which we will need in later patches to add more logic
like check for sequence number match to see if we should
retry for block allocation or not.

There should be no functionality change in this patch.

Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/1cfae0098d2aa9afbeb59331401258182868c8f2.1589955723.git.riteshh@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03 23:16:53 -04:00