Will Deacon says:
====================
sparc32 SRMMU fixes for SMP
Enabling SMP for sparc32 uncovered some issues in the SRMMU page-table
allocation code. One of these was introduced by me, but the other two
seem to have been there a while and are probably just exposed more
easily by my recent changes.
Tested on QEMU. I'm assuming these will go via David's tree.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The SRMMU page-table allocator allocates multiple PTE tables per page,
since they are only 1K in size. However, this means that calls to
pgtable_pte_page_{ctor,dtor}() must be serialised and performed only by
the first and last page-table allocation for the page respectively.
Use the page reference count to track how many PTE tables we have
allocated for a given page returned by the SRMMU allocator and only
call the ctor()/dtor() functions for the first and last user respectively.
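The first-user/last-user rule can be sketched as a small user-space model.
This is not the SRMMU code: the struct, the function names and the
four-tables-per-page figure are assumptions made for illustration, and the
model is single-threaded, so it ignores the serialisation the real
allocator needs.

```c
#include <assert.h>

#define MODEL_TABLES_PER_PAGE 4 /* assumption: 4K page holding 1K PTE tables */

/* Hypothetical stand-in for one page backing several PTE tables. */
struct model_page {
    int refcount;   /* PTE tables currently allocated from this page */
    int ctor_calls; /* how often the constructor ran */
    int dtor_calls; /* how often the destructor ran */
};

/* Hand out one PTE table; only the first user of the page runs the ctor. */
static void model_pte_alloc(struct model_page *page)
{
    assert(page->refcount < MODEL_TABLES_PER_PAGE);
    if (page->refcount++ == 0)
        page->ctor_calls++;
}

/* Return one PTE table; only the last user of the page runs the dtor. */
static void model_pte_free(struct model_page *page)
{
    assert(page->refcount > 0);
    if (--page->refcount == 0)
        page->dtor_calls++;
}
```

Allocating two tables from one page and freeing both should leave
ctor_calls and dtor_calls at exactly one each in this model.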
Cc: David S. Miller <davem@davemloft.net>
Fixes: 8c8f3156dd ("sparc32: mm: Reduce allocation size for PMD and PTE tables")
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The SRMMU page-table allocator is not compatible with SPLIT_PTLOCK_CPUS
for two major reasons:
1. Pages are allocated via memblock, and therefore the ptl is not
cleared by prep_new_page(), which is expected by ptlock_init()
2. Multiple PTE tables can exist in a single page, causing them to
share the same ptl and deadlock when attempting to take the same
lock twice (e.g. as part of copy_page_range()).
Ensure that SPLIT_PTLOCK_CPUS is not selected for SPARC32.
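Reason 2 can be illustrated with a toy model in which the lock is chosen
per page, so two 1K tables inside the same page resolve to the same lock.
All names and sizes below are hypothetical, and a trylock that reports
failure stands in for the point where a real spinlock would spin forever.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u
#define MODEL_NR_PAGES  8u

/* Hypothetical lock: 'held' is nonzero while somebody owns it. */
struct model_lock {
    int held;
};

static struct model_lock model_ptl[MODEL_NR_PAGES];

/* One lock per page: every PTE table inside that page shares it. */
static struct model_lock *model_ptlock_ptr(uintptr_t table_addr)
{
    return &model_ptl[(table_addr / MODEL_PAGE_SIZE) % MODEL_NR_PAGES];
}

/* Return 0 on success, -1 if the lock is already held. */
static int model_trylock(struct model_lock *lock)
{
    assert(lock != 0);
    if (lock->held)
        return -1;
    lock->held = 1;
    return 0;
}

static void model_unlock(struct model_lock *lock)
{
    lock->held = 0;
}
```

In this model, tables at 0x1000 and 0x1400 map to the same lock, so taking
both (as a copy between two tables would) fails on the second attempt,
while tables at 0x1000 and 0x2000 can be locked together.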
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The pages backing page-table allocations for SRMMU are allocated via
memblock as part of the "nocache" region initialisation during
srmmu_paging_init() and should not be freed even if a later call to
pgtable_pte_page_ctor() fails.
Remove the broken call to __free_page().
Cc: David S. Miller <davem@davemloft.net>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Fixes: 1ae9ae5f7d ("sparc: handle pgtable_page_ctor() fail")
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
sparc32 never registered the memory occupied by the kernel image with
memblock_add() and it only reserved this memory with memblock_reserve().
With openbios as system firmware, the memory occupied by the kernel is
reserved in openbios and removed from mem.available. The prom setup code in
the kernel uses mem.available to set up the memory banks and essentially
there is a hole for the memory occupied by the kernel image.
Later in bootmem_init() this memory is memblock_reserve()d.
Up until recently, memmap initialization would call __init_single_page()
for the pages in that hole, then free_low_memory_core_early() would mark
them as reserved and everything would be OK.
After the change in memmap initialization introduced by the commit "mm:
memmap_init: iterate over memblock regions rather that check each PFN", the
hole is skipped and the page structs for it are not initialized. And when
they are passed from memblock to page allocator as reserved, the latter
gets confused.
Simply registering the memory occupied by the kernel with memblock_add()
resolves this issue.
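The effect of the hole can be sketched with a toy model of the two lists
involved. This is only an illustration under stated assumptions: the range
type, the function names and the 16-PFN address space are made up, and page
state is reduced to one "initialised" byte per PFN.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NR_PFNS 16u

/* Half-open PFN range [start, end); hypothetical stand-in for a region. */
struct model_range {
    unsigned int start;
    unsigned int end;
};

/* Initialise page state only for PFNs covered by registered memory. */
static void model_memmap_init(const struct model_range *mem, size_t nr,
                              unsigned char *initialised)
{
    for (size_t i = 0; i < nr; i++) {
        assert(mem[i].start <= mem[i].end && mem[i].end <= MODEL_NR_PFNS);
        for (unsigned int pfn = mem[i].start; pfn < mem[i].end; pfn++)
            initialised[pfn] = 1;
    }
}

/* Return 1 if every PFN of the reserved range has initialised page state. */
static int model_reserved_ok(const struct model_range *rsv,
                             const unsigned char *initialised)
{
    for (unsigned int pfn = rsv->start; pfn < rsv->end; pfn++)
        if (!initialised[pfn])
            return 0;
    return 1;
}
```

With memory banks [0,4) and [8,16) and a reserved kernel image at [4,8),
the reserved pages are never initialised in this model; adding [4,8) to the
registered memory makes the check pass.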
Tested on qemu-system-sparc with Debian Etch [1] userspace.
[1] https://people.debian.org/~aurel32/qemu/sparc/debian_etch_sparc_small.qcow2
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lkml.kernel.org/r/20200517000050.GA87467@roeck-us.net/
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
In ufshcd_probe_hba(), all BKOP SW tracking variables can be reset together
in ufshcd_force_reset_auto_bkops(), thus urgent_bkop_lvl initialization in
the beginning of ufshcd_probe_hba() can be merged into
ufshcd_force_reset_auto_bkops().
Link: https://lore.kernel.org/r/20200530141200.4616-1-stanley.chu@mediatek.com
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Urgent bkops level is used to compare against actual bkops status read from
UFS device. Urgent bkops level is set during initialization and might be
updated in exception event handler during runtime. But it should not be
updated to the actual bkops status every time when auto bkops is toggled.
Otherwise, if urgent bkops level is updated to 0, auto bkops shall always
be kept enabled.
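Why a level of 0 keeps auto bkops enabled forever can be seen in a tiny
model. The status names and the "status >= urgent level" comparison are
assumptions made for illustration, not taken from the driver.

```c
#include <assert.h>

/* Hypothetical status levels, ordered from least to most urgent. */
enum model_bkops_status {
    MODEL_BKOPS_NONE = 0,
    MODEL_BKOPS_NON_CRITICAL = 1,
    MODEL_BKOPS_PERF_IMPACT = 2,
    MODEL_BKOPS_CRITICAL = 3
};

/* Assumed rule: keep auto bkops on once status reaches the urgent level. */
static int model_keep_auto_bkops(enum model_bkops_status status,
                                 enum model_bkops_status urgent_level)
{
    assert(status <= MODEL_BKOPS_CRITICAL);
    assert(urgent_level <= MODEL_BKOPS_CRITICAL);
    return status >= urgent_level;
}
```

Under this assumed rule, an urgent level of MODEL_BKOPS_NONE makes the
check true for every possible status, which is the situation the commit
message describes.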
Link: https://lore.kernel.org/r/1590632686-17866-1-git-send-email-cang@codeaurora.org
Fixes: 24366c2afb ("scsi: ufs: Recheck bkops level if bkops is disabled")
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The variable rc is being initialized with a value that is never read and it
is being updated later with a new value. The initialization is redundant
and can be removed.
Link: https://lore.kernel.org/r/20200527115242.172344-1-colin.king@canonical.com
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fix memset() accessing out of range address when reply_queue count is less
than RDPQ_MAX_INDEX_IN_ONE_CHUNK (i.e. 16) in non-RDPQ mode.
In non-RDPQ mode, the driver allocates a single contiguous pool of size
reply_queue's count * reply_post_free_sz. But the driver is always
memsetting this pool with size 16 * reply_post_free_sz. If reply queue
count is less than 16 (i.e. when MSI-X vectors enabled < 16), the driver is
accessing out of range address and this results in 'BUG: unable to handle
kernel paging request at fff0x...x' bug.
Make driver use dma_pool_zalloc() API to allocate and zero the pool.
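The size mismatch can be sketched in user space. This is a model, not the
driver: the function names are hypothetical and calloc() stands in for an
allocate-and-zero call, the point being that one size is used for both
steps so they cannot diverge.

```c
#include <assert.h>
#include <stdlib.h>

#define MODEL_MAX_INDEX_IN_ONE_CHUNK 16u

/* Bytes actually allocated for the pool in the non-RDPQ case. */
static size_t model_pool_bytes(size_t reply_queue_count, size_t entry_sz)
{
    return reply_queue_count * entry_sz;
}

/* Bytes the buggy code cleared, regardless of the real queue count. */
static size_t model_buggy_memset_bytes(size_t entry_sz)
{
    return MODEL_MAX_INDEX_IN_ONE_CHUNK * entry_sz;
}

/* Allocate and zero in one step; caller releases the pool with free(). */
static void *model_alloc_zeroed_pool(size_t reply_queue_count, size_t entry_sz)
{
    assert(reply_queue_count > 0 && entry_sz > 0);
    return calloc(reply_queue_count, entry_sz);
}
```

For 4 reply queues and a 64-byte entry, the pool is 256 bytes while the
buggy clear covers 1024 bytes, so 768 bytes beyond the allocation would be
written.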
Link: https://lore.kernel.org/r/20200528145617.27252-1-suganath-prabu.subramani@broadcom.com
Fixes: 8012209eb2 ("scsi: mpt3sas: Handle RDPQ DMA allocation in same 4G region")
Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
kobject_init_and_add() takes a reference even when it fails. If this
function returns an error, kobject_put() must be called to properly
clean up the memory associated with the object.
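The rule can be modelled in user space with a toy reference-counted object.
The model_* names are hypothetical stand-ins and not the kobject API; the
sketch only shows why the error path still has to drop the reference.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical reference-counted object. */
struct model_kobj {
    int refcount;
    int released; /* set once the last reference is dropped */
};

/* Takes the initial reference even when registration fails. */
static int model_init_and_add(struct model_kobj *kobj, int simulate_failure)
{
    kobj->refcount = 1;
    kobj->released = 0;
    return simulate_failure ? -EINVAL : 0;
}

/* Drop one reference; the object is released when the count reaches zero. */
static void model_put(struct model_kobj *kobj)
{
    assert(kobj->refcount > 0);
    if (--kobj->refcount == 0)
        kobj->released = 1;
}

/* Caller pattern: on error, put the object instead of just returning. */
static int model_register(struct model_kobj *kobj, int simulate_failure)
{
    int err = model_init_and_add(kobj, simulate_failure);

    if (err) {
        model_put(kobj);
        return err;
    }
    return 0;
}
```

Without the model_put() call in the error path, the object would keep a
refcount of 1 and never be released in this model.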
Link: https://lore.kernel.org/r/20200528201353.14849-1-wu000273@umn.edu
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Qiushi Wu <wu000273@umn.edu>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
1) If remaining ring space before the end of the ring is smaller than the
next cmd to write, tcmu writes a padding entry which fills the remaining
space at the end of the ring.
Then tcmu calls tcmu_flush_dcache_range() with the size of struct
tcmu_cmd_entry as data length to flush. If the space filled by the
padding was smaller than tcmu_cmd_entry, tcmu_flush_dcache_range() is
called for an address range reaching behind the end of the vmalloc'ed
ring.
tcmu_flush_dcache_range() in a loop calls
flush_dcache_page(virt_to_page(start)); for every page being part of the
range. On x86 the line is optimized out by the compiler, as
flush_dcache_page() is empty on x86.
But I assume the above can cause trouble on other architectures that
really have a flush_dcache_page(). For paddings, only the header part of
an entry is relevant. Due to alignment rules, the header always fits in
the remaining space if padding is needed. So tcmu_flush_dcache_range()
can safely be called with sizeof(entry->hdr) as the length here.
2) After it has written a command to cmd ring, tcmu calls
tcmu_flush_dcache_range() using the size of a struct tcmu_cmd_entry as
data length to flush. But if a command needs many iovecs, the real size
of the command may be bigger than tcmu_cmd_entry, so a part of the
written command is not flushed then.
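Both cases come down to choosing the flush length from what was actually
written. A user-space sketch follows; the struct layouts are hypothetical
and much simpler than the real tcmu_cmd_entry, and the helper names are
made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical entry header. */
struct model_hdr {
    unsigned int len_op;
    unsigned short cmd_id;
    unsigned char kflags;
    unsigned char uflags;
};

struct model_iovec {
    void *base;
    size_t len;
};

/* Hypothetical entry: header, fixed fields, then a variable iovec array. */
struct model_entry {
    struct model_hdr hdr;
    unsigned long long fixed[4];
    struct model_iovec iov[1];
};

/* Case 1: a padding entry only needs its header flushed. */
static size_t model_padding_flush_len(void)
{
    return sizeof(struct model_hdr);
}

/* Case 2: a command must be flushed for its real size, iovecs included. */
static size_t model_cmd_flush_len(size_t iov_cnt)
{
    assert(iov_cnt >= 1);
    return offsetof(struct model_entry, iov) +
           iov_cnt * sizeof(struct model_iovec);
}

/* Return 1 if [off, off + len) lies inside a ring of ring_size bytes. */
static int model_range_in_ring(size_t ring_size, size_t off, size_t len)
{
    return off <= ring_size && len <= ring_size - off;
}
```

With 16 bytes left at the end of a 4096-byte ring, a flush of
sizeof(struct model_entry) runs past the ring while a header-sized flush
stays inside it, and a command with 8 iovecs is larger than one
struct model_entry.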
Link: https://lore.kernel.org/r/20200528193108.9085-1-bstroesser@ts.fujitsu.com
Acked-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Pull lockdown update from James Morris:
"An update for the security subsystem to allow unprivileged users
to see the status of the lockdown feature. From Jeremy Cline"
Also an added comment to describe CAP_SETFCAP.
* 'next-general' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
capabilities: add description for CAP_SETFCAP
lockdown: Allow unprivileged users to see lockdown status
dummy_netdev should be freed by free_netdev() instead of kfree(). Also
remove the unneeded variable 'priv'.
Fixes: 4730f4a6c6 ("IB/hfi1: Activate the dummy netdev")
Link: https://lore.kernel.org/r/20200602061635.31224-1-yuehaibing@huawei.com
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Merge tag 'selinux-pr-20200601' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull SELinux updates from Paul Moore:
"The highlights:
- A number of improvements to various SELinux internal data
structures to help improve performance. We move the role
transitions into a hash table. In the context structure we shift
from hashing the context string (aka SELinux label) to the
structure itself, when it is valid. This last change not only
offers a speedup, but it helps us simplify the code some as well.
- Add a new SELinux policy version which allows for a more space
efficient way of storing the filename transitions in the binary
policy. Given the default Fedora SELinux policy with the unconfined
module enabled, this change drops the policy size from ~7.6MB to
~3.3MB. The kernel policy load time dropped as well.
- Some fixes to the error handling code in the policy parser to
properly return error codes when things go wrong"
* tag 'selinux-pr-20200601' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: netlabel: Remove unused inline function
selinux: do not allocate hashtabs dynamically
selinux: fix return value on error in policydb_read()
selinux: simplify range_write()
selinux: fix error return code in policydb_read()
selinux: don't produce incorrect filename_trans_count
selinux: implement new format of filename transitions
selinux: move context hashing under sidtab
selinux: hash context structure directly
selinux: store role transitions in a hash table
selinux: drop unnecessary smp_load_acquire() call
selinux: fix warning Comparison to bool
Merge tag 'audit-pr-20200601' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit
Pull audit updates from Paul Moore:
"Summary of the significant patches:
- Record information about binds/unbinds to the audit multicast
socket. This helps identify which processes have/had access to the
information in the audit stream.
- Cleanup and add some additional information to the netfilter
configuration events collected by audit.
- Fix some of the audit error handling code so we don't leak network
namespace references"
* tag 'audit-pr-20200601' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: add subj creds to NETFILTER_CFG record to
audit: Replace zero-length array with flexible-array
audit: make symbol 'audit_nfcfgs' static
netfilter: add audit table unregister actions
audit: tidy and extend netfilter_cfg x_tables
audit: log audit netlink multicast bind and unbind
audit: fix a net reference leak in audit_list_rules_send()
audit: fix a net reference leak in audit_send_reply()
The "dmac" variable is used before it is initialized.
Fixes: 494c3b3122 ("RDMA/hns: Refactor the QP context filling process related to WQE buffer configure")
Link: https://lore.kernel.org/r/20200529083918.GA1298465@mwanda
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The hfi1_vnic_up() function doesn't check whether hfi1_netdev_rx_init()
returns errors. In hfi1_vnic_init() we need to change the code to
preserve the error code instead of returning success.
Fixes: 2280740f01 ("IB/hfi1: Virtual Network Interface Controller (VNIC) HW support")
Fixes: 4730f4a6c6 ("IB/hfi1: Activate the dummy netdev")
Link: https://lore.kernel.org/r/20200530140224.GA1330098@mwanda
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Use FRWR method to register memory by default and remove the ancient and
unsafe FMR method.
Link: https://lore.kernel.org/r/10-v3-f58e6669d5d3+2cf-fmr_removal_jgg@mellanox.com
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Tested-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
HCA's that are driven by mlx4 driver support FRWR method to register
memory. Remove the ancient and unsafe FMR method.
Link: https://lore.kernel.org/r/8-v3-f58e6669d5d3+2cf-fmr_removal_jgg@mellanox.com
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Remove a few leftovers from FMR functionality which are no longer used.
Link: https://lore.kernel.org/r/5-v3-f58e6669d5d3+2cf-fmr_removal_jgg@mellanox.com
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This ancient and unsafe method for memory registration is no longer used
by any RDMA based ULP. Remove the FMR pool API from the core driver.
Link: https://lore.kernel.org/r/4-v3-f58e6669d5d3+2cf-fmr_removal_jgg@mellanox.com
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
FMR is not supported on most recent RDMA devices (that use fast memory
registration mechanism). Also, FMR was recently removed from NFS/RDMA
ULP.
Link: https://lore.kernel.org/r/2-v3-f58e6669d5d3+2cf-fmr_removal_jgg@mellanox.com
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
FMR is not supported on most recent RDMA devices (that use fast memory
registration mechanism). Also, FMR was recently removed from NFS/RDMA
ULP.
Link: https://lore.kernel.org/r/1-v3-f58e6669d5d3+2cf-fmr_removal_jgg@mellanox.com
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Document the purpose of CAP_SETFCAP. For some reason this capability
had no description while the others did.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
A previous commit enabled this functionality, which also enabled O_PATH
to work correctly with io_uring. But we can't safely close the ring
itself, as the file handle isn't reference counted inside
io_uring_enter(). Instead of jumping through hoops to enable ring
closure, add a "soft" ->needs_file option, ->needs_file_no_error. This
enables O_PATH file descriptors to work, but still catches the case of
trying to close the ring itself.
Reported-by: Jann Horn <jannh@google.com>
Fixes: 904fbcb115 ("io_uring: remove 'fd is io_uring' from close path")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
commit e7bf90e5af ("block/bio-integrity: fix a memory leak bug") added
a kfree() for 'buf' if bio_integrity_add_page() returns '0'. However,
the object will be freed in bio_integrity_free() since 'bio->bi_opf' and
'bio->bi_integrity' were set previously in bio_integrity_alloc().
Fixes: commit e7bf90e5af ("block/bio-integrity: fix a memory leak bug")
Signed-off-by: yu kuai <yukuai3@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bob Liu <bob.liu@oracle.com>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When running with conntrack rules, the dropped overlap fragments may cause
EPERM to be returned to sendto. Instead of completely failing, just ignore
those errors and continue. If this causes packets with overlap fragments to
be dropped as expected, that is okay. And if it causes packets that are
expected to be received to be dropped, which should not happen, it will be
detected as failure.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'for-5.8/io_uring-2020-06-01' of git://git.kernel.dk/linux-block
Pull io_uring updates from Jens Axboe:
"A relatively quiet round, mostly just fixes and code improvements. In
particular:
- Make statx just use the generic statx handler, instead of open
coding it. We don't need that anymore, as we always call it async
safe (Bijan)
- Enable closing of the ring itself. Also fixes O_PATH closure (me)
- Properly name completion members (me)
- Batch reap of dead file registrations (me)
- Allow IORING_OP_POLL with double waitqueues (me)
- Add tee(2) support (Pavel)
- Remove double off read (Pavel)
- Fix overflow cancellations (Pavel)
- Improve CQ timeouts (Pavel)
- Async defer drain fixes (Pavel)
- Add support for enabling/disabling notifications on a registered
eventfd (Stefano)
- Remove dead state parameter (Xiaoguang)
- Disable SQPOLL submit on dying ctx (Xiaoguang)
- Various code cleanups"
* tag 'for-5.8/io_uring-2020-06-01' of git://git.kernel.dk/linux-block: (29 commits)
io_uring: fix overflowed reqs cancellation
io_uring: off timeouts based only on completions
io_uring: move timeouts flushing to a helper
statx: hide interfaces no longer used by io_uring
io_uring: call statx directly
statx: allow system call to be invoked from io_uring
io_uring: add io_statx structure
io_uring: get rid of manual punting in io_close
io_uring: separate DRAIN flushing into a cold path
io_uring: don't re-read sqe->off in timeout_prep()
io_uring: simplify io_timeout locking
io_uring: fix flush req->refs underflow
io_uring: don't submit sqes when ctx->refs is dying
io_uring: async task poll trigger cleanup
io_uring: add tee(2) support
splice: export do_tee()
io_uring: don't repeat valid flag list
io_uring: rename io_file_put()
io_uring: remove req->needs_fixed_files
io_uring: cleanup io_poll_remove_one() logic
...
Merge tag 'for-5.8/drivers-2020-06-01' of git://git.kernel.dk/linux-block
Pull block driver updates from Jens Axboe:
"On top of the core changes, here are the block driver changes for this
merge window:
- NVMe changes:
- NVMe over Fibre Channel protocol updates, which also reach
over to drivers/scsi/lpfc (James Smart)
- namespace revalidation support on the target (Anthony
Iliopoulos)
- gcc zero length array fix (Arnd Bergmann)
- nvmet cleanups (Chaitanya Kulkarni)
- misc cleanups and fixes (me, Keith Busch, Sagi Grimberg)
- use a SRQ per completion vector (Max Gurtovoy)
- fix handling of runtime changes to the queue count (Weiping
Zhang)
- t10 protection information support for nvme-rdma and
nvmet-rdma (Israel Rukshin and Max Gurtovoy)
- target side AEN improvements (Chaitanya Kulkarni)
- various fixes and minor improvements all over, including the
nvme part of the lpfc driver
- Floppy code cleanup series (Willy, Denis)
- Floppy contention fix (Jiri)
- Loop CONFIGURE support (Martijn)
- bcache fixes/improvements (Coly, Joe, Colin)
- q->queuedata cleanups (Christoph)
- Get rid of ioctl_by_bdev (Christoph, Stefan)
- md/raid5 allocation fixes (Coly)
- zero length array fixes (Gustavo)
- swim3 task state fix (Xu)"
* tag 'for-5.8/drivers-2020-06-01' of git://git.kernel.dk/linux-block: (166 commits)
bcache: configure the asynchronous registertion to be experimental
bcache: asynchronous devices registration
bcache: fix refcount underflow in bcache_device_free()
bcache: Convert pr_<level> uses to a more typical style
bcache: remove redundant variables i and n
lpfc: Fix return value in __lpfc_nvme_ls_abort
lpfc: fix axchg pointer reference after free and double frees
lpfc: Fix pointer checks and comments in LS receive refactoring
nvme: set dma alignment to qword
nvmet: cleanups the loop in nvmet_async_events_process
nvmet: fix memory leak when removing namespaces and controllers concurrently
nvmet-rdma: add metadata/T10-PI support
nvmet: add metadata support for block devices
nvmet: add metadata/T10-PI support
nvme: add Metadata Capabilities enumerations
nvmet: rename nvmet_check_data_len to nvmet_check_transfer_len
nvmet: rename nvmet_rw_len to nvmet_rw_data_len
nvmet: add metadata characteristics for a namespace
nvme-rdma: add metadata/T10-PI support
nvme-rdma: introduce nvme_rdma_sgl structure
...
found by smatch:
drivers/net/net_failover.c:65 net_failover_open() error:
we previously assumed 'primary_dev' could be null (see line 43)
Fixes: cfc80d9a11 ("net: Introduce net_failover driver")
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'for-5.8/block-2020-06-01' of git://git.kernel.dk/linux-block
Pull block updates from Jens Axboe:
"Core block changes that have been queued up for this release:
- Remove dead blk-throttle and blk-wbt code (Guoqing)
- Include pid in blktrace note traces (Jan)
- Don't spew I/O errors on wouldblock termination (me)
- Zone append addition (Johannes, Keith, Damien)
- IO accounting improvements (Konstantin, Christoph)
- blk-mq hardware map update improvements (Ming)
- Scheduler dispatch improvement (Salman)
- Inline block encryption support (Satya)
- Request map fixes and improvements (Weiping)
- blk-iocost tweaks (Tejun)
- Fix for timeout failing with error injection (Keith)
- Queue re-run fixes (Douglas)
- CPU hotplug improvements (Christoph)
- Queue entry/exit improvements (Christoph)
- Move DMA drain handling to the few drivers that use it (Christoph)
- Partition handling cleanups (Christoph)"
* tag 'for-5.8/block-2020-06-01' of git://git.kernel.dk/linux-block: (127 commits)
block: mark bio_wouldblock_error() bio with BIO_QUIET
blk-wbt: rename __wbt_update_limits to wbt_update_limits
blk-wbt: remove wbt_update_limits
blk-throttle: remove tg_drain_bios
blk-throttle: remove blk_throtl_drain
null_blk: force complete for timeout request
blk-mq: drain I/O when all CPUs in a hctx are offline
blk-mq: add blk_mq_all_tag_iter
blk-mq: open code __blk_mq_alloc_request in blk_mq_alloc_request_hctx
blk-mq: use BLK_MQ_NO_TAG in more places
blk-mq: rename BLK_MQ_TAG_FAIL to BLK_MQ_NO_TAG
blk-mq: move more request initialization to blk_mq_rq_ctx_init
blk-mq: simplify the blk_mq_get_request calling convention
blk-mq: remove the bio argument to ->prepare_request
nvme: force complete cancelled requests
blk-mq: blk-mq: provide forced completion method
block: fix a warning when blkdev.h is included for !CONFIG_BLOCK builds
block: blk-crypto-fallback: remove redundant initialization of variable err
block: reduce part_stat_lock() scope
block: use __this_cpu_add() instead of access by smp_processor_id()
...
Just finished bisecting mmotm, to find why a test which used to take
four minutes now took more than an hour: the __buffer_migrate_page()
cleanup left behind a get_page() which attach_page_private() now does.
Fixes: cd0f371544 ("mm/migrate.c: call detach_page_private to cleanup code")
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tuong Lien says:
====================
tipc: revert two patches
We revert two patches:
tipc: Fix potential tipc_node refcnt leak in tipc_rcv
tipc: Fix potential tipc_aead refcnt leak in tipc_crypto_rcv
which prevented TIPC encryption from working properly and caused kernel
panic.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 441870ee42.
Like the previous patch in this series, we revert the above commit that
causes similar issues with the 'aead' object.
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit de05842076.
There is no actual tipc_node refcnt leak as stated in the above commit.
The refcnt is held carefully for the case of an asynchronous decryption
(i.e. -EINPROGRESS/-EBUSY and skb = NULL is returned), so that the node
object cannot be freed in the meantime. The counter will be re-balanced
when the operation's callback arrives with the decrypted buffer if any.
In other cases, e.g. a synchronous crypto, the counter will be decreased
immediately when it is done.
Now with that commit, a kernel panic will occur when there is no node
found (i.e. n = NULL) in the 'tipc_rcv()' or a premature release of the
node object.
This commit solves the issues by reverting the said commit, but keeps
the one valid case in which 'skb_linearize()' fails.
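The reference handling described above can be sketched as a toy model. The
names are hypothetical and nothing here is the TIPC code: it only shows a
reference held across an asynchronous operation and re-balanced by the
completion callback, versus being dropped immediately on the synchronous
path.

```c
#include <assert.h>

/* Hypothetical node object with a plain reference counter. */
struct model_node {
    int refcnt;
};

enum model_result {
    MODEL_DONE = 0,       /* synchronous: decrypted buffer is ready */
    MODEL_INPROGRESS = 1  /* asynchronous: completion callback comes later */
};

/* Receive path: hold the node while decryption is outstanding. */
static enum model_result model_rcv(struct model_node *node, int async)
{
    assert(node->refcnt > 0);
    node->refcnt++;
    if (async)
        return MODEL_INPROGRESS; /* keep the reference for the callback */
    node->refcnt--;              /* synchronous: drop it right away */
    return MODEL_DONE;
}

/* Completion callback: re-balance the reference taken in model_rcv(). */
static void model_decrypt_done(struct model_node *node)
{
    assert(node->refcnt > 1);
    node->refcnt--;
}
```

In this model the count is back at its starting value after either path
completes, so dropping an extra reference in the asynchronous case would
release the node prematurely.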
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Tested-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
It makes sense to allow changes to get/set rx flow hash callback only
when rss is enabled. This patch restricts get_rss_hash_opts and
set_rss_hash_opts methods to allow querying and configuring different
Rx flow hash configurations only when rss is enabled.
Signed-off-by: Ronak Doshi <doshir@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support to change TX/RX queue number with "ethtool -L combined".
V5 -> V6: remove check for carrier in hinic_xmit_frame
V4 -> V5: change time zone in patch header
V3 -> V4: update date in patch header
V2 -> V3: remove check for zero channels->combined_count
V1 -> V2: update commit message("ethtool -L" to "ethtool -L combined")
V0 -> V1: remove check for channels->tx_count/rx_count/other_count
Signed-off-by: Luo bin <luobin9@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf-next 2020-06-02
The following pull-request contains BPF _fixes-only_ for your *net-next*
tree.
We've added 10 non-merge commits during the last 1 day(s) which contain
a total of 15 files changed, 229 insertions(+), 74 deletions(-).
The main changes are:
1) Several fixes to s390 BPF JIT e.g. fixing kernel panic when BPF stack is
not 8-byte aligned, from Ilya Leoshkevich.
2) Fix bpf_skb_adjust_room() helper's CHECKSUM_UNNECESSARY handling which
was wrongly bypassing TCP checksum verification, from Daniel Borkmann.
3) Fix tools/bpf/ build under MAKEFLAGS=rR which causes built-in CXX and
others vars to be undefined, also from Ilya Leoshkevich.
4) Fix BPF ringbuf's selftest shared sample_cnt variable to avoid compiler
optimizations on it, from Andrii Nakryiko.
5) Fix up test_verifier selftest due to addition of rx_queue_mapping to
the bpf_sock structure, from Alexei Starovoitov.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull hmm updates from Jason Gunthorpe:
"This series adds a selftest for hmm_range_fault() and several of the
DEVICE_PRIVATE migration related actions, and another simplification
for hmm_range_fault()'s API.
- Simplify hmm_range_fault() with a simpler return code, no
HMM_PFN_SPECIAL, and no customizable output PFN format
- Add a selftest for hmm_range_fault() and DEVICE_PRIVATE related
functionality"
* tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
MAINTAINERS: add HMM selftests
mm/hmm/test: add selftests for HMM
mm/hmm/test: add selftest driver for HMM
mm/hmm: remove the customizable pfn format from hmm_range_fault
mm/hmm: remove HMM_PFN_SPECIAL
drm/amdgpu: remove dead code after hmm_range_fault()
mm/hmm: make hmm_range_fault return 0 or -1