Commit Graph

7346 Commits (df6afe2f7c19349de2ee560dc62ea4d9ad3ff889)

Author SHA1 Message Date
Mauro Carvalho Chehab 463c43fcd9 Linux 5.9-rc7
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl9xBkIeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGSCoIAKIRBovDAx0dA+qV
 vf7PncPra8ofbext+c70G6UV27Eg9dBnLWRTTbOLK07yVdPs+fH3P5Jq1LPTKX9T
 hJfhNVPzxX5dOney/caEllTistzdLTzYrP4OLnr/z7Bc5dlDsxAdO2F4/EIWxk0U
 qEsyOWubVLTL5UkFdlosyyEDwQkWCgkvWXlR50WPRvDlxCY5OFK6YKE9WdGoAtcj
 YRPVIIjJmlPFG9eXqH22rn2hLmXbIhszgpypaNaZBZu9Yt+uhEMfLqnj1dLlxzHf
 zRj3EHq2ifHJm5AdJYsjUxi3vUMVow6QfU0LDFJ4LYkE8o/zcpP13MZKYoA0lWiI
 DueaKws=
 =M48O
 -----END PGP SIGNATURE-----

Merge tag 'v5.9-rc7' into patchwork

Linux 5.9-rc7

* tag 'v5.9-rc7': (683 commits)
  Linux 5.9-rc7
  mm/thp: Split huge pmds/puds if they're pinned when fork()
  mm: Do early cow for pinned pages during fork() for ptes
  mm/fork: Pass new vma pointer into copy_page_range()
  mm: Introduce mm_struct.has_pinned
  mm: validate pmd after splitting
  mm: don't rely on system state to detect hot-plug operations
  mm: replace memmap_context by meminit_context
  arch/x86/lib/usercopy_64.c: fix __copy_user_flushcache() cache writeback
  lib/memregion.c: include memregion.h
  lib/string.c: implement stpcpy
  mm/migrate: correct thp migration stats
  mm/gup: fix gup_fast with dynamic page table folding
  mm: memcontrol: fix missing suffix of workingset_restore
  mm, THP, swap: fix allocating cluster for swapfile by mistake
  mm: slab: fix potential double free in ___cache_free
  Documentation/llvm: Fix clang target examples
  io_uring: ensure async buffered read-retry is setup properly
  KVM: SVM: Add a dedicated INVD intercept routine
  io_uring: don't unconditionally set plug->nowait = true
  ...
2020-10-04 12:19:12 +02:00
Guillaume Nault a45294af9e net/sched: act_mpls: Add action to push MPLS LSE before Ethernet header
Define the MAC_PUSH action which pushes an MPLS LSE before the mac
header (instead of between the mac and the network headers as the
plain PUSH action does).

The only special case is when the skb has an offloaded VLAN. In that
case, it has to be inlined before pushing the MPLS header.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-03 17:28:45 -07:00
Guillaume Nault 19fbcb36a3 net/sched: act_vlan: Add {POP,PUSH}_ETH actions
Implement TCA_VLAN_ACT_POP_ETH and TCA_VLAN_ACT_PUSH_ETH, to
respectively pop and push a base Ethernet header at the beginning of a
frame.

POP_ETH is just a matter of pulling ETH_HLEN bytes. VLAN tags, if any,
must be stripped before calling POP_ETH.

PUSH_ETH is restricted to skbs with no mac_header, and only the MAC
addresses can be configured. The Ethertype is automatically set from
skb->protocol. These restrictions ensure that all skb's fields remain
consistent, so that this action can't confuse other part of the
networking stack (like GSO).

Since openvswitch already had these actions, consolidate the code in
skbuff.c (like for vlan and mpls push/pop).

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-03 17:28:45 -07:00
Jakub Kicinski e992a6eda9 genetlink: allow dumping command-specific policy
Right now CTRL_CMD_GETPOLICY can only dump the family-wide
policy. Support dumping policy of a specific op.

v3:
 - rebase after per-op policy export and handle that
v2:
 - make cmd U32, just in case.
v1:
 - don't echo op in the output in a naive way, this should
   make it cleaner to extend the output format for dumping
   policies for all the commands at once in the future.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20201001225933.1373426-11-kuba@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-03 14:18:29 -07:00
Johannes Berg 50a896cf2d genetlink: properly support per-op policy dumping
Add support for per-op policy dumping. The data is pretty much
as before, except that now the assumption that the policy with
index 0 is "the" policy no longer holds - you now need to look
at the new CTRL_ATTR_OP_POLICY attribute which is a nested attr
(indexed by op) containing attributes for do and dump policies.

When a single op is requested, the CTRL_ATTR_OP_POLICY will be
added in the same way, since do and dump policies may differ.

v2:
 - conditionally advertise per-command policies only if there
   actually is a policy being used for the do/dump and it's
   present at all

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-03 14:18:29 -07:00
Gustavo A. R. Silva f5ace5ef37 block: scsi_ioctl: Avoid the use of one-element arrays
One-element arrays are being deprecated[1]. Replace the one-element array
with a simple object of type compat_caddr_t: 'compat_caddr_t unused'[2],
once it seems this field is actually never used.

Also, update struct cdrom_generic_command in UAPI by adding an
anonimous union to avoid using the one-element array _reserved_.

[1] https://www.kernel.org/doc/html/v5.9-rc1/process/deprecated.html#zero-length-and-one-element-arrays
[2] https://github.com/KSPP/linux/issues/86

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/lkml/5f76f5d0.qJ4t%2FHWuRzSW7bTa%25lkp@intel.com/
Build-tested-by: kernel test robot <lkp@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-02 17:58:52 -06:00
David S. Miller 26d0a8edca Another set of changes, this time with:
* lots more S1G band support
  * 6 GHz scanning, finally
  * kernel-doc fixes
  * non-split wiphy dump fixes in nl80211
  * various other small cleanups/features
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAl92/IsACgkQB8qZga/f
 l8QrBxAAi8HJdFyZtCIuLrXL3KHzey5AjmrYuHdsFcdk8NJZkEco17I3l05D9ek6
 76VvqjiYzDwdmgoHr3yz0K7pOAoTRpBKlaecvZLPXWf2bVhebWSU5EPcrTZHolrJ
 JBoBj4FU6Im/MnFbeiKxPj3M+NTQrLdekODSeaC5hFhi/oSF9lap6RMC8sz4YrVp
 9yKzB8zjz+eL4wL3EsztEzpTxbvHTaVMe0XBVou7Fg2ZauJGwqMxpIukpMWUmmNr
 EequhVFpdlXbVMle8wP4ZR58c4+O1kbRoYL9WhAILtdDhCKfLccWXnlUjzuQlCeB
 RH/jzG7AlVhm972oUuqG9szAcU8hEgWdsNEML7pilXmFk/ZSNLpUZfZCAILn+Gd3
 8oMQnXp2br+DLzf1SO7cxpL2KrTNjrb4gcJVBJ9eBlDjK/64N22MqZkpOKcMxq51
 ocmf1MJ1TbAbZn/kY2hsoaPYt2+bm1umMa/t/Pwuds+xKZEOOuPNgZQILcAsfJZB
 2OWDDT+RNLo/K4mPETtyQQZoCxAWB9n/CcnU+UTsmnUmMsEnCEbPnbYKBrc6jX1l
 jSP6XUD8fxhB2lfW+SPtQPnAi86+gblXVvEO8zm0+ez3juItlIsVcRY8ey/j9N1F
 uorpycvrfl6+Q1+mmhe6et2r+TLdGs73I0PJ44HRA9JZexKBrDg=
 =B9Xl
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-next-for-net-next-2020-10-02' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
Another set of changes, this time with:
 * lots more S1G band support
 * 6 GHz scanning, finally
 * kernel-doc fixes
 * non-split wiphy dump fixes in nl80211
 * various other small cleanups/features
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-02 15:33:13 -07:00
Hao Luo 63d9b80dcf bpf: Introducte bpf_this_cpu_ptr()
Add bpf_this_cpu_ptr() to help access percpu var on this cpu. This
helper always returns a valid pointer, therefore no need to check
returned value for NULL. Also note that all programs run with
preemption disabled, which means that the returned pointer is stable
during all the execution of the program.

Signed-off-by: Hao Luo <haoluo@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200929235049.2533242-6-haoluo@google.com
2020-10-02 15:00:49 -07:00
Hao Luo eaa6bcb71e bpf: Introduce bpf_per_cpu_ptr()
Add bpf_per_cpu_ptr() to help bpf programs access percpu vars.
bpf_per_cpu_ptr() has the same semantic as per_cpu_ptr() in the kernel
except that it may return NULL. This happens when the cpu parameter is
out of range. So the caller must check the returned value.

Signed-off-by: Hao Luo <haoluo@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200929235049.2533242-5-haoluo@google.com
2020-10-02 15:00:49 -07:00
Hao Luo 4976b718c3 bpf: Introduce pseudo_btf_id
Pseudo_btf_id is a type of ld_imm insn that associates a btf_id to a
ksym so that further dereferences on the ksym can use the BTF info
to validate accesses. Internally, when seeing a pseudo_btf_id ld insn,
the verifier reads the btf_id stored in the insn[0]'s imm field and
marks the dst_reg as PTR_TO_BTF_ID. The btf_id points to a VAR_KIND,
which is encoded in btf_vminux by pahole. If the VAR is not of a struct
type, the dst reg will be marked as PTR_TO_MEM instead of PTR_TO_BTF_ID
and the mem_size is resolved to the size of the VAR's type.

>From the VAR btf_id, the verifier can also read the address of the
ksym's corresponding kernel var from kallsyms and use that to fill
dst_reg.

Therefore, the proper functionality of pseudo_btf_id depends on (1)
kallsyms and (2) the encoding of kernel global VARs in pahole, which
should be available since pahole v1.18.

Signed-off-by: Hao Luo <haoluo@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200929235049.2533242-2-haoluo@google.com
2020-10-02 14:59:25 -07:00
David S. Miller 23a1f682a9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2020-10-01

The following pull-request contains BPF updates for your *net-next* tree.

We've added 90 non-merge commits during the last 8 day(s) which contain
a total of 103 files changed, 7662 insertions(+), 1894 deletions(-).

Note that once bpf(/net) tree gets merged into net-next, there will be a small
merge conflict in tools/lib/bpf/btf.c between commit 1245008122 ("libbpf: Fix
native endian assumption when parsing BTF") from the bpf tree and the commit
3289959b97 ("libbpf: Support BTF loading and raw data output in both endianness")
from the bpf-next tree. Correct resolution would be to stick with bpf-next, it
should look like:

  [...]
        /* check BTF magic */
        if (fread(&magic, 1, sizeof(magic), f) < sizeof(magic)) {
                err = -EIO;
                goto err_out;
        }
        if (magic != BTF_MAGIC && magic != bswap_16(BTF_MAGIC)) {
                /* definitely not a raw BTF */
                err = -EPROTO;
                goto err_out;
        }

        /* get file size */
  [...]

The main changes are:

1) Add bpf_snprintf_btf() and bpf_seq_printf_btf() helpers to support displaying
   BTF-based kernel data structures out of BPF programs, from Alan Maguire.

2) Speed up RCU tasks trace grace periods by a factor of 50 & fix a few race
   conditions exposed by it. It was discussed to take these via BPF and
   networking tree to get better testing exposure, from Paul E. McKenney.

3) Support multi-attach for freplace programs, needed for incremental attachment
   of multiple XDP progs using libxdp dispatcher model, from Toke Høiland-Jørgensen.

4) libbpf support for appending new BTF types at the end of BTF object, allowing
   intrusive changes of prog's BTF (useful for future linking), from Andrii Nakryiko.

5) Several BPF helper improvements e.g. avoid atomic op in cookie generator and add
   a redirect helper into neighboring subsys, from Daniel Borkmann.

6) Allow map updates on sockmaps from bpf_iter context in order to migrate sockmaps
   from one to another, from Lorenz Bauer.

7) Fix 32 bit to 64 bit assignment from latest alu32 bounds tracking which caused
   a verifier issue due to type downgrade to scalar, from John Fastabend.

8) Follow-up on tail-call support in BPF subprogs which optimizes x64 JIT prologue
   and epilogue sections, from Maciej Fijalkowski.

9) Add an option to perf RB map to improve sharing of event entries by avoiding remove-
   on-close behavior. Also, add BPF_PROG_TEST_RUN for raw_tracepoint, from Song Liu.

10) Fix a crash in AF_XDP's socket_release when memory allocation for UMEMs fails,
    from Magnus Karlsson.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-01 14:29:01 -07:00
Mike Snitzer 61931c0ee9 dm: export dm_copy_name_and_uuid
Allow DM targets to access the configured name and uuid.
Also, bump DM ioctl version.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-10-01 15:03:40 -04:00
Jacob Pan 6278eecba3 iommu/vt-d: Check UAPI data processed by IOMMU core
IOMMU generic layer already does sanity checks on UAPI data for version
match and argsz range based on generic information.

This patch adjusts the following data checking responsibilities:
- removes the redundant version check from VT-d driver
- removes the check for vendor specific data size
- adds check for the use of reserved/undefined flags

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/1601051567-54787-7-git-send-email-jacob.jun.pan@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-10-01 14:52:46 +02:00
Jacob Pan d90573812e iommu/uapi: Handle data and argsz filled by users
IOMMU user APIs are responsible for processing user data. This patch
changes the interface such that user pointers can be passed into IOMMU
code directly. Separate kernel APIs without user pointers are introduced
for in-kernel users of the UAPI functionality.

IOMMU UAPI data has a user filled argsz field which indicates the data
length of the structure. User data is not trusted, argsz must be
validated based on the current kernel data size, mandatory data size,
and feature flags.

User data may also be extended, resulting in possible argsz increase.
Backward compatibility is ensured based on size and flags (or
the functional equivalent fields) checking.

This patch adds sanity checks in the IOMMU layer. In addition to argsz,
reserved/unused fields in padding, flags, and version are also checked.
Details are documented in Documentation/userspace-api/iommu.rst

Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/1601051567-54787-6-git-send-email-jacob.jun.pan@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-10-01 14:52:46 +02:00
Jacob Pan 8d3bb3b8cb iommu/uapi: Use named union for user data
IOMMU UAPI data size is filled by the user space which must be validated
by the kernel. To ensure backward compatibility, user data can only be
extended by either re-purpose padding bytes or extend the variable sized
union at the end. No size change is allowed before the union. Therefore,
the minimum size is the offset of the union.

To use offsetof() on the union, we must make it named.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/linux-iommu/20200611145518.0c2817d6@x1.home/
Link: https://lore.kernel.org/r/1601051567-54787-4-git-send-email-jacob.jun.pan@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-10-01 14:52:46 +02:00
Jacob Pan 1e6aaae93e iommu/uapi: Add argsz for user filled data
As IOMMU UAPI gets extended, user data size may increase. To support
backward compatibiliy, this patch introduces a size field to each UAPI
data structures. It is *always* the responsibility for the user to fill in
the correct size. Padding fields are adjusted to ensure 8 byte alignment.

Specific scenarios for user data handling are documented in:
Documentation/userspace-api/iommu.rst

As there is no current users of the API, struct version is not
incremented.

Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/1601051567-54787-3-git-send-email-jacob.jun.pan@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-10-01 14:52:46 +02:00
Song Liu 792caccc45 bpf: Introduce BPF_F_PRESERVE_ELEMS for perf event array
Currently, perf event in perf event array is removed from the array when
the map fd used to add the event is closed. This behavior makes it
difficult to the share perf events with perf event array.

Introduce perf event map that keeps the perf event open with a new flag
BPF_F_PRESERVE_ELEMS. With this flag set, perf events in the array are not
removed when the original map fd is closed. Instead, the perf event will
stay in the map until 1) it is explicitly removed from the array; or 2)
the array is freed.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200930224927.1936644-2-songliubraving@fb.com
2020-09-30 23:18:12 -07:00
Jens Axboe 9055420072 io_uring: provide IORING_ENTER_SQ_WAIT for SQPOLL SQ ring waits
When using SQPOLL, applications can run into the issue of running out of
SQ ring entries because the thread hasn't consumed them yet. The only
option for dealing with that is checking later, or busy checking for the
condition.

Provide IORING_ENTER_SQ_WAIT if applications want to wait on this
condition.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-30 20:32:33 -06:00
Stefano Garzarella 7e84e1c756 io_uring: allow disabling rings during the creation
This patch adds a new IORING_SETUP_R_DISABLED flag to start the
rings disabled, allowing the user to register restrictions,
buffers, files, before to start processing SQEs.

When IORING_SETUP_R_DISABLED is set, SQE are not processed and
SQPOLL kthread is not started.

The restrictions registration are allowed only when the rings
are disable to prevent concurrency issue while processing SQEs.

The rings can be enabled using IORING_REGISTER_ENABLE_RINGS
opcode with io_uring_register(2).

Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-30 20:32:33 -06:00
Stefano Garzarella 21b55dbc06 io_uring: add IOURING_REGISTER_RESTRICTIONS opcode
The new io_uring_register(2) IOURING_REGISTER_RESTRICTIONS opcode
permanently installs a feature allowlist on an io_ring_ctx.
The io_ring_ctx can then be passed to untrusted code with the
knowledge that only operations present in the allowlist can be
executed.

The allowlist approach ensures that new features added to io_uring
do not accidentally become available when an existing application
is launched on a newer kernel version.

Currently is it possible to restrict sqe opcodes, sqe flags, and
register opcodes.

IOURING_REGISTER_RESTRICTIONS can only be made once. Afterwards
it is not possible to change restrictions anymore.
This prevents untrusted code from removing restrictions.

Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-30 20:32:33 -06:00
Stefano Garzarella 9d4a75efa2 io_uring: use an enumeration for io_uring_register(2) opcodes
The enumeration allows us to keep track of the last
io_uring_register(2) opcode available.

Behaviour and opcodes names don't change.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-30 20:32:33 -06:00
Daniel Borkmann b4ab314149 bpf: Add redirect_neigh helper as redirect drop-in
Add a redirect_neigh() helper as redirect() drop-in replacement
for the xmit side. Main idea for the helper is to be very similar
in semantics to the latter just that the skb gets injected into
the neighboring subsystem in order to let the stack do the work
it knows best anyway to populate the L2 addresses of the packet
and then hand over to dev_queue_xmit() as redirect() does.

This solves two bigger items: i) skbs don't need to go up to the
stack on the host facing veth ingress side for traffic egressing
the container to achieve the same for populating L2 which also
has the huge advantage that ii) the skb->sk won't get orphaned in
ip_rcv_core() when entering the IP routing layer on the host stack.

Given that skb->sk neither gets orphaned when crossing the netns
as per 9c4c325252 ("skbuff: preserve sock reference when scrubbing
the skb.") the helper can then push the skbs directly to the phys
device where FQ scheduler can do its work and TCP stack gets proper
backpressure given we hold on to skb->sk as long as skb is still
residing in queues.

With the helper used in BPF data path to then push the skb to the
phys device, I observed a stable/consistent TCP_STREAM improvement
on veth devices for traffic going container -> host -> host ->
container from ~10Gbps to ~15Gbps for a single stream in my test
environment.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: David Ahern <dsahern@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Cc: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/bpf/f207de81629e1724899b73b8112e0013be782d35.1601477936.git.daniel@iogearbox.net
2020-09-30 11:50:35 -07:00
Daniel Borkmann b426ce83ba bpf: Add classid helper only based on skb->sk
Similarly to 5a52ae4e32 ("bpf: Allow to retrieve cgroup v1 classid
from v2 hooks"), add a helper to retrieve cgroup v1 classid solely
based on the skb->sk, so it can be used as key as part of BPF map
lookups out of tc from host ns, in particular given the skb->sk is
retained these days when crossing net ns thanks to 9c4c325252
("skbuff: preserve sock reference when scrubbing the skb."). This
is similar to bpf_skb_cgroup_id() which implements the same for v2.
Kubernetes ecosystem is still operating on v1 however, hence net_cls
needs to be used there until this can be dropped in with the v2
helper of bpf_skb_cgroup_id().

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/ed633cf27a1c620e901c5aa99ebdefb028dce600.1601477936.git.daniel@iogearbox.net
2020-09-30 11:50:34 -07:00
Jose M. Guisado Gomez 002f217653 netfilter: nf_tables: add userdata attributes to nft_chain
Enables storing userdata for nft_chain. Field udata points to user data
and udlen stores its length.

Adds new attribute flag NFTA_CHAIN_USERDATA.

Signed-off-by: Jose M. Guisado Gomez <guigom@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-09-30 11:46:50 +02:00
Kent Gibson b234d233fe gpio: uapi: document uAPI v1 as deprecated
Update uAPI documentation to deprecate v1 structs and ioctls.

Signed-off-by: Kent Gibson <warthog618@gmail.com>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
2020-09-30 10:57:08 +02:00
Kent Gibson b53911aa87 gpio: uapi: define uAPI v2
Add a new version of the uAPI to address existing 32/64-bit alignment
issues, add support for debounce and event sequence numbers, allow
requested lines with different configurations, and provide some future
proofing by adding padding reserved for future use.

The alignment issue relates to the gpioevent_data, which packs to different
sizes on 32-bit and 64-bit platforms. That creates problems for 32-bit apps
running on 64-bit kernels.  uAPI v2 addresses that particular issue, and
the problem more generally, by adding pad fields that explicitly pad
structs out to 64-bit boundaries, so they will pack to the same size now,
and even if some of the reserved padding is used for __u64 fields in the
future.

The new structs have been analysed with pahole to ensure that they
are sized as expected and contain no implicit padding.

The lack of future proofing in v1 makes it impossible to, for example,
add the debounce feature that is included in v2.
The future proofing is addressed by providing configurable attributes in
line config and reserved padding in all structs for future features.
Specifically, the line request, config, info, info_changed and event
structs receive updated versions and new ioctls.

As the majority of the structs and ioctls were being replaced, it is
opportune to rework some of the other aspects of the uAPI:

v1 has three different flags fields, each with their own separate
bit definitions.  In v2 that is collapsed to one - gpio_v2_line_flag.

The handle and event requests are merged into a single request, the line
request, as the two requests were mostly the same other than the edge
detection provided by event requests.  As a byproduct, the v2 uAPI allows
for multiple lines producing edge events on the same line handle.
This is a new capability as v1 only supports a single line in an event
request.

As a consequence, there are now only two types of file handle to be
concerned with, the chip and the line, and it is clearer which ioctls
apply to which type of handle.

There is also some minor renaming of fields for consistency compared to
their v1 counterparts, e.g. offset rather than lineoffset or line_offset,
and consumer rather than consumer_label.

Additionally, v1 GPIOHANDLES_MAX becomes GPIO_V2_LINES_MAX in v2 for
clarity, and the gpiohandle_data __u8 array becomes a bitmap in
gpio_v2_line_values.

The v2 uAPI is mostly a reorganisation and extension of v1, so userspace
code, particularly libgpiod, should readily port to it.

Signed-off-by: Kent Gibson <warthog618@gmail.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
2020-09-30 10:56:38 +02:00
Kent Gibson 539430fbbc gpio: uapi: define GPIO_MAX_NAME_SIZE for array sizes
Replace constant array sizes with a macro constant to clarify the source
of array sizes, provide a place to document any constraints on the size,
and to simplify array sizing in userspace if constructing structs
from their composite fields.

Signed-off-by: Kent Gibson <warthog618@gmail.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
2020-09-30 10:56:34 +02:00
Jacob Keller 2ec13cbcfa devlink: include <linux/const.h> for _BITUL
Commit 5d5b4128c4 ("devlink: introduce flash update overwrite mask")
added a usage of _BITUL to the UAPI <linux/devlink.h> header, but failed
to include the header file where it was defined. It happens that this
does not break any existing kernel include chains because it gets
included through other sources. However, when including the UAPI headers
in a userspace application (such as devlink in iproute2), _BITUL is not
defined.

Fixes: 5d5b4128c4 ("devlink: introduce flash update overwrite mask")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-29 13:58:41 -07:00
Mike Snitzer 1471308fb5 Merge remote-tracking branch 'jens/for-5.10/block' into dm-5.10
DM depends on these block 5.10 commits:

22ada802ed block: use lcm_not_zero() when stacking chunk_sectors
07d098e6bb block: allow 'chunk_sectors' to be non-power-of-2
021a24460d block: add QUEUE_FLAG_NOWAIT
6abc49468e dm: add support for REQ_NOWAIT and enable it for linear target

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-09-29 16:31:35 -04:00
Tom Parkin 3f47cb4c1c l2tp: report rx cookie discards in netlink get
When an L2TPv3 session receives a data frame with an incorrect cookie
l2tp_core logs a warning message and bumps a stats counter to reflect
the fact that the packet has been dropped.

However, the stats counter in question is missing from the l2tp_netlink
get message for tunnel and session instances.

Include the statistic in the netlink get response.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-29 13:26:36 -07:00
Toke Høiland-Jørgensen 4a1e7c0c63 bpf: Support attaching freplace programs to multiple attach points
This enables support for attaching freplace programs to multiple attach
points. It does this by amending the UAPI for bpf_link_Create with a target
btf ID that can be used to supply the new attachment point along with the
target program fd. The target must be compatible with the target that was
supplied at program load time.

The implementation reuses the checks that were factored out of
check_attach_btf_id() to ensure compatibility between the BTF types of the
old and new attachment. If these match, a new bpf_tracing_link will be
created for the new attach target, allowing multiple attachments to
co-exist simultaneously.

The code could theoretically support multiple-attach of other types of
tracing programs as well, but since I don't have a use case for any of
those, there is no API support for doing so.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/160138355169.48470.17165680973640685368.stgit@toke.dk
2020-09-29 13:09:24 -07:00
Alan Maguire eb411377ae bpf: Add bpf_seq_printf_btf helper
A helper is added to allow seq file writing of kernel data
structures using vmlinux BTF.  Its signature is

long bpf_seq_printf_btf(struct seq_file *m, struct btf_ptr *ptr,
                        u32 btf_ptr_size, u64 flags);

Flags and struct btf_ptr definitions/use are identical to the
bpf_snprintf_btf helper, and the helper returns 0 on success
or a negative error value.

Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/1601292670-1616-8-git-send-email-alan.maguire@oracle.com
2020-09-28 18:26:58 -07:00
Alan Maguire c4d0bfb450 bpf: Add bpf_snprintf_btf helper
A helper is added to support tracing kernel type information in BPF
using the BPF Type Format (BTF).  Its signature is

long bpf_snprintf_btf(char *str, u32 str_size, struct btf_ptr *ptr,
		      u32 btf_ptr_size, u64 flags);

struct btf_ptr * specifies

- a pointer to the data to be traced
- the BTF id of the type of data pointed to
- a flags field is provided for future use; these flags
  are not to be confused with the BTF_F_* flags
  below that control how the btf_ptr is displayed; the
  flags member of the struct btf_ptr may be used to
  disambiguate types in kernel versus module BTF, etc;
  the main distinction is the flags relate to the type
  and information needed in identifying it; not how it
  is displayed.

For example a BPF program with a struct sk_buff *skb
could do the following:

	static struct btf_ptr b = { };

	b.ptr = skb;
	b.type_id = __builtin_btf_type_id(struct sk_buff, 1);
	bpf_snprintf_btf(str, sizeof(str), &b, sizeof(b), 0, 0);

Default output looks like this:

(struct sk_buff){
 .transport_header = (__u16)65535,
 .mac_header = (__u16)65535,
 .end = (sk_buff_data_t)192,
 .head = (unsigned char *)0x000000007524fd8b,
 .data = (unsigned char *)0x000000007524fd8b,
 .truesize = (unsigned int)768,
 .users = (refcount_t){
  .refs = (atomic_t){
   .counter = (int)1,
  },
 },
}

Flags modifying display are as follows:

- BTF_F_COMPACT:	no formatting around type information
- BTF_F_NONAME:		no struct/union member names/types
- BTF_F_PTR_RAW:	show raw (unobfuscated) pointer values;
			equivalent to %px.
- BTF_F_ZERO:		show zero-valued struct/union members;
			they are not displayed by default

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/1601292670-1616-4-git-send-email-alan.maguire@oracle.com
2020-09-28 18:26:58 -07:00
Song Liu 1b4d60ec16 bpf: Enable BPF_PROG_TEST_RUN for raw_tracepoint
Add .test_run for raw_tracepoint. Also, introduce a new feature that runs
the target program on a specific CPU. This is achieved by a new flag in
bpf_attr.test, BPF_F_TEST_RUN_ON_CPU. When this flag is set, the program
is triggered on cpu with id bpf_attr.test.cpu. This feature is needed for
BPF programs that handle perf_event and other percpu resources, as the
program can access these resource locally.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200925205432.1777-2-songliubraving@fb.com
2020-09-28 21:52:36 +02:00
Rajkumar Manoharan f5bec330e3 nl80211: extend support to config spatial reuse parameter set
Allow the user to configure below Spatial Reuse Parameter Set element.
  * Non-SRG OBSS PD Max Offset
  * SRG BSS Color Bitmap
  * SRG Partial BSSID Bitmap

Signed-off-by: Rajkumar Manoharan <rmanohar@codeaurora.org>
Link: https://lore.kernel.org/r/1601278091-20313-2-git-send-email-rmanohar@codeaurora.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-09-28 15:07:41 +02:00
Thomas Pedersen 58ef7c1b55 nl80211: include frequency offset in survey info
Recently channels gained a potential frequency offset, so
include this in the per-channel survey info.

Signed-off-by: Thomas Pedersen <thomas@adapt-ip.com>
Link: https://lore.kernel.org/r/20200922022818.15855-16-thomas@adapt-ip.com
[add the offset only if non-zero]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-09-28 14:09:52 +02:00
Thomas Pedersen d2b7588a47 nl80211: support S1G capability overrides in assoc
NL80211_ATTR_S1G_CAPABILITY can be passed along with
NL80211_ATTR_S1G_CAPABILITY_MASK to NL80211_CMD_ASSOCIATE
to indicate S1G capabilities which should override the
hardware capabilities in eg. the association request.

Signed-off-by: Thomas Pedersen <thomas@adapt-ip.com>
Link: https://lore.kernel.org/r/20200922022818.15855-4-thomas@adapt-ip.com
[johannes: always require both attributes together, commit message]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-09-28 13:53:05 +02:00
Tova Mussai c8cb5b854b nl80211/cfg80211: support 6 GHz scanning
Support 6 GHz scanning, by
 * a new scan flag to scan for colocated BSSes advertised
   by (and found) APs on 2.4 & 5 GHz
 * doing the necessary reduced neighbor report parsing for
   this, to find them
 * adding the ability to split the scan request in case the
   device by itself cannot support this.

Also add some necessary bits in mac80211 to not break with
these changes.

Signed-off-by: Tova Mussai <tova.mussai@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://lore.kernel.org/r/20200918113313.232917c93af9.Ida22f0212f9122f47094d81659e879a50434a6a2@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-09-28 13:53:04 +02:00
Dafna Hirschfeld 62aacfa9bf media: v4l2: extend the CSC API to subdevice.
This patch extends the CSC API in video devices to be supported
also on sub-devices. The flag V4L2_MBUS_FRAMEFMT_SET_CSC set by
the application when calling VIDIOC_SUBDEV_S_FMT ioctl.
The flags:

V4L2_SUBDEV_MBUS_CODE_CSC_COLORSPACE,
V4L2_SUBDEV_MBUS_CODE_CSC_XFER_FUNC,
V4L2_SUBDEV_MBUS_CODE_CSC_YCBCR_ENC/V4L2_SUBDEV_MBUS_CODE_CSC_HSV_ENC
V4L2_SUBDEV_MBUS_CODE_CSC_QUANTIZATION

are set by the driver in the VIDIOC_SUBDEV_ENUM_MBUS_CODE ioctl.

New 'flags' fields were added to the structs
v4l2_subdev_mbus_code_enum, v4l2_mbus_framefmt which are borrowed
from the 'reserved' field

The patch also replaces the 'ycbcr_enc' field in
'struct v4l2_mbus_framefmt' with a union that includes 'hsv_enc'

Signed-off-by: Dafna Hirschfeld <dafna.hirschfeld@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-09-26 10:22:29 +02:00
Dafna Hirschfeld b38c73ca1c media: v4l2: add support for colorspace conversion API (CSC) for video capture
For video capture it is the driver that reports the colorspace,
transfer function, Y'CbCr/HSV encoding and quantization range
used by the video, and there is no way to request something
different, even though many HDTV receivers have some sort of
colorspace conversion capabilities.

For output video this feature already exists since the application
specifies this information for the video format it will send out, and
the transmitter will enable any available CSC if a format conversion has
to be performed in order to match the capabilities of the sink.

For video capture we propose adding new v4l2_pix_format flag:
V4L2_PIX_FMT_FLAG_SET_CSC. The flag is set by the application,
the driver will interpret the colorspace, xfer_func, ycbcr_enc/hsv_enc
and quantization fields as the requested colorspace information and will
attempt to do the conversion it supports.

Drivers set the flags
V4L2_FMT_FLAG_CSC_COLORSPACE,
V4L2_FMT_FLAG_CSC_XFER_FUNC,
V4L2_FMT_FLAG_CSC_YCBCR_ENC/V4L2_FMT_FLAG_CSC_HSV_ENC,
V4L2_FMT_FLAG_CSC_QUANTIZATION,
in the flags field of the struct v4l2_fmtdesc during enumeration to
indicate that they support colorspace conversion for the respective field.

Drivers do not have to actually look at the flags. If the flags are not
set, then the fields 'colorspace', 'xfer_func', 'ycbcr_enc/hsv_enc',
and 'quantization' are set to the default values by the core, i.e. just
pass on the received format without conversion.

Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Dafna Hirschfeld <dafna.hirschfeld@collabora.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-09-26 10:20:57 +02:00
Jacob Keller 5d5b4128c4 devlink: introduce flash update overwrite mask
Sections of device flash may contain settings or device identifying
information. When performing a flash update, it is generally expected
that these settings and identifiers are not overwritten.

However, it may sometimes be useful to allow overwriting these fields
when performing a flash update. Some examples include, 1) customizing
the initial device config on first programming, such as overwriting
default device identifying information, or 2) reverting a device
configuration to known good state provided in the new firmware image, or
3) in case it is suspected that current firmware logic for managing the
preservation of fields during an update is broken.

Although some devices are able to completely separate these types of
settings and fields into separate components, this is not true for all
hardware.

To support controlling this behavior, a new
DEVLINK_ATTR_FLASH_UPDATE_OVERWRITE_MASK is defined. This is an
nla_bitfield32 which will define what subset of fields in a component
should be overwritten during an update.

If no bits are specified, or of the overwrite mask is not provided, then
an update should not overwrite anything, and should maintain the
settings and identifiers as they are in the previous image.

If the overwrite mask has the DEVLINK_FLASH_OVERWRITE_SETTINGS bit set,
then the device should be configured to overwrite any of the settings in
the requested component with settings found in the provided image.

Similarly, if the DEVLINK_FLASH_OVERWRITE_IDENTIFIERS bit is set, the
device should be configured to overwrite any device identifiers in the
requested component with the identifiers from the image.

Multiple overwrite modes may be combined to indicate that a combination
of the set of fields that should be overwritten.

Drivers which support the new overwrite mask must set the
DEVLINK_SUPPORT_FLASH_UPDATE_OVERWRITE_MASK in the
supported_flash_update_params field of their devlink_ops.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-25 17:20:57 -07:00
Martin KaFai Lau 27e5203bd9 bpf: Change bpf_sk_assign to accept ARG_PTR_TO_BTF_ID_SOCK_COMMON
This patch changes the bpf_sk_assign() to take
ARG_PTR_TO_BTF_ID_SOCK_COMMON such that they will work with the pointer
returned by the bpf_skc_to_*() helpers also.

The bpf_sk_lookup_assign() is taking ARG_PTR_TO_SOCKET_"OR_NULL".  Meaning
it specifically takes a literal NULL.  ARG_PTR_TO_BTF_ID_SOCK_COMMON
does not allow a literal NULL, so another ARG type is required
for this purpose and another follow-up patch can be used if
there is such need.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200925000415.3857374-1-kafai@fb.com
2020-09-25 13:58:02 -07:00
Martin KaFai Lau c0df236e13 bpf: Change bpf_tcp_*_syncookie to accept ARG_PTR_TO_BTF_ID_SOCK_COMMON
This patch changes the bpf_tcp_*_syncookie() to take
ARG_PTR_TO_BTF_ID_SOCK_COMMON such that they will work with the pointer
returned by the bpf_skc_to_*() helpers also.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Lorenz Bauer <lmb@cloudflare.com>
Link: https://lore.kernel.org/bpf/20200925000409.3856725-1-kafai@fb.com
2020-09-25 13:58:01 -07:00
Martin KaFai Lau 592a349864 bpf: Change bpf_sk_storage_*() to accept ARG_PTR_TO_BTF_ID_SOCK_COMMON
This patch changes the bpf_sk_storage_*() to take
ARG_PTR_TO_BTF_ID_SOCK_COMMON such that they will work with the pointer
returned by the bpf_skc_to_*() helpers also.

A micro benchmark has been done on a "cgroup_skb/egress" bpf program
which does a bpf_sk_storage_get().  It was driven by netperf doing
a 4096 connected UDP_STREAM test with 64bytes packet.
The stats from "kernel.bpf_stats_enabled" shows no meaningful difference.

The sk_storage_get_btf_proto, sk_storage_delete_btf_proto,
btf_sk_storage_get_proto, and btf_sk_storage_delete_proto are
no longer needed, so they are removed.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Lorenz Bauer <lmb@cloudflare.com>
Link: https://lore.kernel.org/bpf/20200925000402.3856307-1-kafai@fb.com
2020-09-25 13:58:01 -07:00
Martin KaFai Lau a5fa25adf0 bpf: Change bpf_sk_release and bpf_sk_*cgroup_id to accept ARG_PTR_TO_BTF_ID_SOCK_COMMON
The previous patch allows the networking bpf prog to use the
bpf_skc_to_*() helpers to get a PTR_TO_BTF_ID socket pointer,
e.g. "struct tcp_sock *".  It allows the bpf prog to read all the
fields of the tcp_sock.

This patch changes the bpf_sk_release() and bpf_sk_*cgroup_id()
to take ARG_PTR_TO_BTF_ID_SOCK_COMMON such that they will
work with the pointer returned by the bpf_skc_to_*() helpers
also.  For example, the following will work:

	sk = bpf_skc_lookup_tcp(skb, tuple, tuplen, BPF_F_CURRENT_NETNS, 0);
	if (!sk)
		return;
	tp = bpf_skc_to_tcp_sock(sk);
	if (!tp) {
		bpf_sk_release(sk);
		return;
	}
	lsndtime = tp->lsndtime;
	/* Pass tp to bpf_sk_release() will also work */
	bpf_sk_release(tp);

Since PTR_TO_BTF_ID could be NULL, the helper taking
ARG_PTR_TO_BTF_ID_SOCK_COMMON has to check for NULL at runtime.

A btf_id of "struct sock" may not always mean a fullsock.  Regardless
the helper's running context may get a non-fullsock or not,
considering fullsock check/handling is pretty cheap, it is better to
keep the same verifier expectation on helper that takes ARG_PTR_TO_BTF_ID*
will be able to handle the minisock situation.  In the bpf_sk_*cgroup_id()
case,  it will try to get a fullsock by using sk_to_full_sk() as its
skb variant bpf_sk"b"_*cgroup_id() has already been doing.

bpf_sk_release can already handle minisock, so nothing special has to
be done.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200925000356.3856047-1-kafai@fb.com
2020-09-25 13:58:01 -07:00
Peter Oskolkov 2a36ab717e rseq/membarrier: Add MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ
This patchset is based on Google-internal RSEQ work done by Paul
Turner and Andrew Hunter.

When working with per-CPU RSEQ-based memory allocations, it is
sometimes important to make sure that a global memory location is no
longer accessed from RSEQ critical sections. For example, there can be
two per-CPU lists, one is "active" and accessed per-CPU, while another
one is inactive and worked on asynchronously "off CPU" (e.g.  garbage
collection is performed). Then at some point the two lists are
swapped, and a fast RCU-like mechanism is required to make sure that
the previously active list is no longer accessed.

This patch introduces such a mechanism: in short, membarrier() syscall
issues an IPI to a CPU, restarting a potentially active RSEQ critical
section on the CPU.

Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lkml.kernel.org/r/20200923233618.2572849-1-posk@google.com
2020-09-25 14:23:27 +02:00
Elena Petrova 77ebdabe8d crypto: af_alg - add extra parameters for DRBG interface
Extend the user-space RNG interface:
  1. Add entropy input via ALG_SET_DRBG_ENTROPY setsockopt option;
  2. Add additional data input via sendmsg syscall.

This allows DRBG to be tested with test vectors, for example for the
purpose of CAVP testing, which otherwise isn't possible.

To prevent erroneous use of entropy input, it is hidden under
CRYPTO_USER_API_RNG_CAVP config option and requires CAP_SYS_ADMIN to
succeed.

Signed-off-by: Elena Petrova <lenaptr@google.com>
Acked-by: Stephan Müller <smueller@chronox.de>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-09-25 17:48:52 +10:00
Priyaranjan Jha ad2b9b0f8d tcp: skip DSACKs with dubious sequence ranges
Currently, we use length of DSACKed range to compute number of
delivered packets. And if sequence range in DSACK is corrupted,
we can get bogus dsacked/acked count, and bogus cwnd.

This patch put bounds on DSACKed range to skip update of data
delivery and spurious retransmission information, if the DSACK
is unlikely caused by sender's action:
- DSACKed range shouldn't be greater than maximum advertised rwnd.
- Total no. of DSACKed segments shouldn't be greater than total
  no. of retransmitted segs. Unlike spurious retransmits, network
  duplicates or corrupted DSACKs shouldn't be counted as delivery.

Signed-off-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24 20:15:45 -07:00
Jens Axboe ac8f7a0264 Merge branch 'for-5.10/block' into for-5.10/drivers
* for-5.10/block: (140 commits)
  bdi: replace BDI_CAP_NO_{WRITEBACK,ACCT_DIRTY} with a single flag
  bdi: invert BDI_CAP_NO_ACCT_WB
  bdi: replace BDI_CAP_STABLE_WRITES with a queue and a sb flag
  mm: use SWP_SYNCHRONOUS_IO more intelligently
  bdi: remove BDI_CAP_SYNCHRONOUS_IO
  bdi: remove BDI_CAP_CGROUP_WRITEBACK
  block: lift setting the readahead size into the block layer
  md: update the optimal I/O size on reshape
  bdi: initialize ->ra_pages and ->io_pages in bdi_init
  aoe: set an optimal I/O size
  bcache: inherit the optimal I/O size
  drbd: remove dead code in device_to_statistics
  fs: remove the unused SB_I_MULTIROOT flag
  block: mark blkdev_get static
  PM: mm: cleanup swsusp_swap_check
  mm: split swap_type_of
  PM: rewrite is_hibernate_resume_dev to not require an inode
  mm: cleanup claim_swapfile
  ocfs2: cleanup o2hb_region_dev_store
  dasd: cleanup dasd_scan_partitions
  ...
2020-09-24 13:44:39 -06:00
Linus Torvalds 8fa5960f57 media fixes for v5.9-rc7
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+QmuaPwR3wnBdVwACF8+vY7k4RUFAl9sj1cACgkQCF8+vY7k
 4RXwjg//WBvX2zi7lS5UM8btQG+PoVE8dqlnD020OUy3S16PPseEPmXgUb54ohJW
 wFaMa1mSCd+dob0qtcd4n7sErjBuWicvxSQmGmEiKCFOWmivsmv7OMRS0tJVo19d
 HctpEyZykFWZWoIZxN+P4ZPhNVJ0zgr9zK13eM1WscnDkZ2wXKnSFftlELEtNctx
 oBDqLX5sBYcZYYRdblI/Ifbpl8Xi0vwbbpxvonIt2ZNHDqzwY+X+g8V5Zqx6R0ig
 MeTQ/3DGuR1pU+fP9+sazwVZLNoPhkormKbtkdPiQfxl1x3oM3S5KNhKdcdUU1Uw
 k50n4ijsLuTfAqPQqFcBLOLg1iCTB1758FR+6RaJASibIUusf6uuiQ6AaeJvQ3Js
 YUb+VODagcRaGKaxICIrEFkxDyn9SXDOyBvFZ+s9qFJwO1YrH+Y5C2IwaNJK/wcq
 OZBVGw4qDyZXpGjrevM0c+uGEDX1YvJPezrwfvGIvYhOQdqpGmdLWQXBGoSTG55b
 DljVmduvIfvAAHmrdmGeOV27MxGTVEKTl/AYScrMuiOi6YtGH+gnMc6brCN1pql9
 P2L2/0Ju45xiLkBClqi7GSmggoA7nw3OQJ9mS7QTf0ii0DPZnrrBysDC9hnfQLU1
 KtByJn99bmx6nP/ed1z5YZSLAKnlc2rztjpGc6MIkg8M+dYCtas=
 =xTpy
 -----END PGP SIGNATURE-----

Merge tag 'media/v5.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media

Pull media fixes from Mauro Carvalho Chehab:

 - fix a regression at the CEC adapter core

 - two uAPI patches (one revert) for changes in this development cycle

* tag 'media/v5.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  media: dt-bindings: media: imx274: Convert to json-schema
  media: media/v4l2: remove V4L2_FLAG_MEMORY_NON_CONSISTENT flag
  media: cec-adap.c: don't use flush_scheduled_work()
2020-09-24 09:05:04 -07:00
Nikolay Aleksandrov 9116ffbf1d net: bridge: mcast: add support for blocked port groups
When excluding S,G entries we need a way to block a particular S,G,port.
The new port group flag is managed based on the source's timer as per
RFCs 3376 and 3810. When a source expires and its port group is in
EXCLUDE mode, it will be blocked.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-23 13:24:34 -07:00
Nikolay Aleksandrov 8266a0491e net: bridge: mcast: handle port group filter modes
We need to handle group filter mode transitions and initial state.
To change a port group's INCLUDE -> EXCLUDE mode (or when we have added
a new port group in EXCLUDE mode) we need to add that port to all of
*,G ports' S,G entries for proper replication. When the EXCLUDE state is
changed from IGMPv3 report, br_multicast_fwd_filter_exclude() must be
called after the source list processing because the assumption is that
all of the group's S,G entries will be created before transitioning to
EXCLUDE mode, i.e. most importantly its blocked entries will already be
added so it will not get automatically added to them.
The transition EXCLUDE -> INCLUDE happens only when a port group timer
expires, it requires us to remove that port from all of *,G ports' S,G
entries where it was automatically added previously.
Finally when we are adding a new S,G entry we must add all of *,G's
EXCLUDE ports to it.
In order to distinguish automatically added *,G EXCLUDE ports we have a
new port group flag - MDB_PG_FLAGS_STAR_EXCL.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-23 13:24:34 -07:00
Nikolay Aleksandrov 8f8cb77e0b net: bridge: mcast: add rt_protocol field to the port group struct
We need to be able to differentiate between pg entries created by
user-space and the kernel when we start generating S,G entries for
IGMPv3/MLDv2's fast path. User-space entries are created by default as
RTPROT_STATIC and the kernel entries are RTPROT_KERNEL. Later we can
allow user-space to provide the entry rt_protocol so we can
differentiate between who added the entries specifically (e.g. clag,
admin, frr etc).

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-23 13:24:34 -07:00
Nikolay Aleksandrov 88d4bd1804 net: bridge: mdb: add support for add/del/dump of entries with source
Add new mdb attributes (MDBE_ATTR_SOURCE for setting,
MDBA_MDB_EATTR_SOURCE for dumping) to allow add/del and dump of mdb
entries with a source address (S,G). New S,G entries are created with
filter mode of MCAST_INCLUDE. The same attributes are used for IPv4 and
IPv6, they're validated and parsed based on their protocol.
S,G host joined entries which are added by user are not allowed yet.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-23 13:24:34 -07:00
Nikolay Aleksandrov 9c4258c78a net: bridge: mdb: add support to extend add/del commands
Since the MDB add/del code expects an exact struct br_mdb_entry we can't
really add any extensions, thus add a new nested attribute at the level of
MDBA_SET_ENTRY called MDBA_SET_ENTRY_ATTRS which will be used to pass
all new options via netlink attributes. This patch doesn't change
anything functionally since the new attribute is not used yet, only
parsed.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-23 13:24:34 -07:00
David S. Miller 6d772f328d Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2020-09-23

The following pull-request contains BPF updates for your *net-next* tree.

We've added 95 non-merge commits during the last 22 day(s) which contain
a total of 124 files changed, 4211 insertions(+), 2040 deletions(-).

The main changes are:

1) Full multi function support in libbpf, from Andrii.

2) Refactoring of function argument checks, from Lorenz.

3) Make bpf_tail_call compatible with functions (subprograms), from Maciej.

4) Program metadata support, from YiFei.

5) bpf iterator optimizations, from Yonghong.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-23 13:11:11 -07:00
David S. Miller 3ab0a7a0c3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Two minor conflicts:

1) net/ipv4/route.c, adding a new local variable while
   moving another local variable and removing it's
   initial assignment.

2) drivers/net/dsa/microchip/ksz9477.c, overlapping changes.
   One pretty prints the port mode differently, whilst another
   changes the driver to try and obtain the port mode from
   the port node rather than the switch node.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-22 16:45:34 -07:00
Linus Torvalds d3017135c4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:

 - fix failure to add bond interfaces to a bridge, the offload-handling
   code was too defensive there and recent refactoring unearthed that.
   Users complained (Ido)

 - fix unnecessarily reflecting ECN bits within TOS values / QoS marking
   in TCP ACK and reset packets (Wei)

 - fix a deadlock with bpf iterator. Hopefully we're in the clear on
   this front now... (Yonghong)

 - BPF fix for clobbering r2 in bpf_gen_ld_abs (Daniel)

 - fix AQL on mt76 devices with FW rate control and add a couple of AQL
   issues in mac80211 code (Felix)

 - fix authentication issue with mwifiex (Maximilian)

 - WiFi connectivity fix: revert IGTK support in ti/wlcore (Mauro)

 - fix exception handling for multipath routes via same device (David
   Ahern)

 - revert back to a BH spin lock flavor for nsid_lock: there are paths
   which do require the BH context protection (Taehee)

 - fix interrupt / queue / NAPI handling in the lantiq driver (Hauke)

 - fix ife module load deadlock (Cong)

 - make an adjustment to netlink reply message type for code added in
   this release (the sole change touching uAPI here) (Michal)

 - a number of fixes for small NXP and Microchip switches (Vladimir)

[ Pull request acked by David: "you can expect more of this in the
  future as I try to delegate more things to Jakub" ]

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (167 commits)
  net: mscc: ocelot: fix some key offsets for IP4_TCP_UDP VCAP IS2 entries
  net: dsa: seville: fix some key offsets for IP4_TCP_UDP VCAP IS2 entries
  net: dsa: felix: fix some key offsets for IP4_TCP_UDP VCAP IS2 entries
  inet_diag: validate INET_DIAG_REQ_PROTOCOL attribute
  net: bridge: br_vlan_get_pvid_rcu() should dereference the VLAN group under RCU
  net: Update MAINTAINERS for MediaTek switch driver
  net/mlx5e: mlx5e_fec_in_caps() returns a boolean
  net/mlx5e: kTLS, Avoid kzalloc(GFP_KERNEL) under spinlock
  net/mlx5e: kTLS, Fix leak on resync error flow
  net/mlx5e: kTLS, Add missing dma_unmap in RX resync
  net/mlx5e: kTLS, Fix napi sync and possible use-after-free
  net/mlx5e: TLS, Do not expose FPGA TLS counter if not supported
  net/mlx5e: Fix using wrong stats_grps in mlx5e_update_ndo_stats()
  net/mlx5e: Fix multicast counter not up-to-date in "ip -s"
  net/mlx5e: Fix endianness when calculating pedit mask first bit
  net/mlx5e: Enable adding peer miss rules only if merged eswitch is supported
  net/mlx5e: CT: Fix freeing ct_label mapping
  net/mlx5e: Fix memory leak of tunnel info when rule under multipath not ready
  net/mlx5e: Use synchronize_rcu to sync with NAPI
  net/mlx5e: Use RCU to protect rq->xdp_prog
  ...
2020-09-22 14:43:50 -07:00
Eric Biggers c7f0207b61 fscrypt: make "#define fscrypt_policy" user-only
The fscrypt UAPI header defines fscrypt_policy to fscrypt_policy_v1,
for source compatibility with old userspace programs.

Internally, the kernel doesn't want that compatibility definition.
Instead, fscrypt_private.h #undefs it and re-defines it to a union.

That works for now.  However, in order to add
fscrypt_operations::get_dummy_policy(), we'll need to forward declare
'union fscrypt_policy' in include/linux/fscrypt.h.  That would cause
build errors because "fscrypt_policy" is used in ioctl numbers.

To avoid this, modify the UAPI header to make the fscrypt_policy
compatibility definition conditional on !__KERNEL__, and make the ioctls
use fscrypt_policy_v1 instead of fscrypt_policy.

Note that this doesn't change the actual ioctl numbers.

Acked-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20200917041136.178600-11-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-09-22 06:48:44 -07:00
Andra Paraschiv 15b760c37a nitro_enclaves: Add ioctl interface definition
The Nitro Enclaves driver handles the enclave lifetime management. This
includes enclave creation, termination and setting up its resources such
as memory and CPU.

An enclave runs alongside the VM that spawned it. It is abstracted as a
process running in the VM that launched it. The process interacts with
the NE driver, that exposes an ioctl interface for creating an enclave
and setting up its resources.

Changelog

v9 -> v10

* Update commit message to include the changelog before the SoB tag(s).

v8 -> v9

* No changes.

v7 -> v8

* Add NE custom error codes for user space memory regions not backed by
  pages multiple of 2 MiB, invalid flags and enclave CID.
* Add max flag value for enclave image load info.

v6 -> v7

* Clarify in the ioctls documentation that the return value is -1 and
  errno is set on failure.
* Update the error code value for NE_ERR_INVALID_MEM_REGION_SIZE as it
  gets in user space as value 25 (ENOTTY) instead of 515. Update the
  NE custom error codes values range to not be the same as the ones
  defined in include/linux/errno.h, although these are not propagated
  to user space.

v5 -> v6

* Fix typo in the description about the NE CPU pool.
* Update documentation to kernel-doc format.
* Remove the ioctl to query API version.

v4 -> v5

* Add more details about the ioctl calls usage e.g. error codes, file
  descriptors used.
* Update the ioctl to set an enclave vCPU to not return a file
  descriptor.
* Add specific NE error codes.

v3 -> v4

* Decouple NE ioctl interface from KVM API.
* Add NE API version and the corresponding ioctl call.
* Add enclave / image load flags options.

v2 -> v3

* Remove the GPL additional wording as SPDX-License-Identifier is
  already in place.

v1 -> v2

* Add ioctl for getting enclave image load metadata.
* Update NE_ENCLAVE_START ioctl name to NE_START_ENCLAVE.
* Add entry in Documentation/userspace-api/ioctl/ioctl-number.rst for NE
  ioctls.
* Update NE ioctls definition based on the updated ioctl range for major
  and minor.

Reviewed-by: Alexander Graf <graf@amazon.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Alexandru Vasile <lexnv@amazon.com>
Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
Link: https://lore.kernel.org/r/20200921121732.44291-2-andraprs@amazon.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-09-22 13:58:40 +02:00
David S. Miller ae4dd9a8c2 This time we have:
* some AP-side infrastructure for FILS discovery and
    unsolicited probe resonses
  * a major rework of the encapsulation/header conversion
    offload from Felix, to fit better with e.g. AP_VLAN
    interfaces
  * performance fix for VHT A-MPDU size, don't limit to HT
  * some initial patches for S1G (sub 1 GHz) support, more
    will come in this area
  * minor cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAl9oqc0ACgkQB8qZga/f
 l8TdMxAAnGTS7p6n//5RrdNmbT/UgMPBPhdHJTXZy7q1z3hDI5xrWRrx26fQeB31
 exslnHd6q28bNGKscExcLRsP63SKn+P3PdUWJzbusya46OZnMgajNQYiQkL3ISfS
 MkF0Uyv7GmkkRn8nBg1ZbLMtzONxqn9uT6o3qyNODHwYgf/QjlEWstL7Q0fEy5UU
 gftBz4WOhMOsEQQu3XvLu2hMd6EI14ZWkChdwboYJ29GgJrCYnhO/0B4QpVJxBpp
 ROYQgF3c3Fis2tJpyZ0FNqG7T16MhjOD2+hyNAcX2+ZkkbSzn6B89jOd/qeowJOT
 FwYbJNQTUxBYH8+IYZeGVWRMyPuZdazrfccbTd9Lfrk3y/Hi+cuuneFFXykST6Zd
 EqRkumAOJY9b6HgcGTkfRFdY4SU4v3lcAOcAgg1fJVFvmLKcHfhD3xKyGz3Q1JLv
 v9x/9S+G69YWyxl09rMY9c78S6enxKmhJdV1wnv3zbALvqNTa2OpaZnCBx+hmkEy
 NoSowwYeb0PANk9PfA1N0+uQPVKu8SGT0FhxLj6NEAhCbbVBTqZ3z4TkdYByy/a0
 L80zk1ziyC4uS24+TcILHyCzRHkdLdsD8lEeVTDZIusU6nXX9imZ+yWGwIkbi35Y
 v+6fUns/1UMJqBc3J/wtr9pamoIBJT3Qe0lHC4Uy7CxuO5S1UVM=
 =Bnsm
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-next-for-net-next-2020-09-21' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
This time we have:
 * some AP-side infrastructure for FILS discovery and
   unsolicited probe resonses
 * a major rework of the encapsulation/header conversion
   offload from Felix, to fit better with e.g. AP_VLAN
   interfaces
 * performance fix for VHT A-MPDU size, don't limit to HT
 * some initial patches for S1G (sub 1 GHz) support, more
   will come in this area
 * minor cleanups
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-21 14:55:09 -07:00
Dan Murphy 55f1331178 ethtool: Add 100base-FX link mode entries
Add entries for the 100base-FX full and half duplex supported modes.

$ ethtool eth0
        Supported ports: [ FIBRE ]
        Supported link modes:  100baseFX/Half 100baseFX/Full
        Supported pause frame use: Symmetric Receive-only
        Supports auto-negotiation: No
        Supported FEC modes: Not reported
        Advertised link modes: 100baseFX/Half 100baseFX/Full
        Advertised pause frame use: No
        Advertised auto-negotiation: No
        Advertised FEC modes: Not reported
        Speed: 100Mb/s
        Duplex: Full
        Auto-negotiation: off
        Port: MII
        PHYAD: 1
        Transceiver: external
        Supports Wake-on: gs
        Wake-on: d
        SecureOn password: 00:00:00:00:00:00
        Current message level: 0x00000000 (0)

        Link detected: yes

Signed-off-by: Dan Murphy <dmurphy@ti.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-19 16:54:35 -07:00
Tuong Lien 23700da29b tipc: add automatic rekeying for encryption key
Rekeying is required for security since a key is less secure when using
for a long time. Also, key will be detached when its nonce value (or
seqno ...) is exhausted. We now make the rekeying process automatic and
configurable by user.

Basically, TIPC will at a specific interval generate a new key by using
the kernel 'Random Number Generator' cipher, then attach it as the node
TX key and securely distribute to others in the cluster as RX keys (-
the key exchange). The automatic key switching will then take over, and
make the new key active shortly. Afterwards, the traffic from this node
will be encrypted with the new session key. The same can happen in peer
nodes but not necessarily at the same time.

For simplicity, the automatically generated key will be initiated as a
per node key. It is not too hard to also support a cluster key rekeying
(e.g. a given node will generate a unique cluster key and update to the
others in the cluster...), but that doesn't bring much benefit, while a
per-node key is even more secure.

We also enable user to force a rekeying or change the rekeying interval
via netlink, the new 'set key' command option: 'TIPC_NLA_NODE_REKEYING'
is added for these purposes as follows:
- A value >= 1 will be set as the rekeying interval (in minutes);
- A value of 0 will disable the rekeying;
- A value of 'TIPC_REKEYING_NOW' (~0) will force an immediate rekeying;

The default rekeying interval is (60 * 24) minutes i.e. done every day.
There isn't any restriction for the value but user shouldn't set it too
small or too large which results in an "ineffective" rekeying (thats ok
for testing though).

Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-18 13:58:37 -07:00
Tuong Lien daef1ee379 tipc: introduce encryption master key
In addition to the supported cluster & per-node encryption keys for the
en/decryption of TIPC messages, we now introduce one option for user to
set a cluster key as 'master key', which is simply a symmetric key like
the former but has a longer life cycle. It has two purposes:

- Authentication of new member nodes in the cluster. New nodes, having
  no knowledge of current session keys in the cluster will still be
  able to join the cluster as long as they know the master key. This is
  because all neighbor discovery (LINK_CONFIG) messages must be
  encrypted with this key.

- Encryption of session encryption keys during automatic exchange and
  update of those.This is a feature we will introduce in a later commit
  in this series.

We insert the new key into the currently unused slot 0 in the key array
and start using it immediately once the user has set it.
After joining, a node only knowing the master key should be fully
communicable to existing nodes in the cluster, although those nodes may
have their own session keys activated (i.e. not the master one). To
support this, we define a 'grace period', starting from the time a node
itself reports having no RX keys, so the existing nodes will use the
master key for encryption instead. The grace period can be extended but
will automatically stop after e.g. 5 seconds without a new report. This
is also the basis for later key exchanging feature as the new node will
be impossible to decrypt anything without the support from master key.

For user to set a master key, we define a new netlink flag -
'TIPC_NLA_NODE_KEY_MASTER', so it can be added to the current 'set key'
netlink command to specify the setting key to be a master key.

Above all, the traditional cluster/per-node key mechanism is guaranteed
to work when user comes not to use this master key option. This is also
compatible to legacy nodes without the feature supported.

Even this master key can be updated without any interruption of cluster
connectivity but is so is needed, this has to be coordinated and set by
the user.

Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-18 13:58:37 -07:00
Shannon Nelson f92970c694 devlink: add timeout information to status_notify
Add a timeout element to the DEVLINK_CMD_FLASH_UPDATE_STATUS
netlink message for use by a userland utility to show that
a particular firmware flash activity may take a long but
bounded time to finish.  Also add a handy helper for drivers
to make use of the new timeout value.

UI usage hints:
 - if non-zero, add timeout display to the end of the status line
 	[component] status_msg  ( Xm Ys : Am Bs )
     using the timeout value for Am Bs and updating the Xm Ys
     every second
 - if the timeout expires while awaiting the next update,
   display something like
 	[component] status_msg  ( timeout reached : Am Bs )
 - if new status notify messages are received, remove
   the timeout and start over

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-18 13:54:22 -07:00
Aloka Dixit 7443dcd1f1 nl80211: Unsolicited broadcast probe response support
This patch adds new attributes to support unsolicited broadcast
probe response transmission used for in-band
discovery in 6GHz band (IEEE P802.11ax/D6.0 26.17.2.3.2, AP behavior for
fast passive scanning).
The new attribute, NL80211_ATTR_UNSOL_BCAST_PROBE_RESP, is nested which
supports following parameters:
(1) NL80211_UNSOL_BCAST_PROBE_RESP_ATTR_INT - Packet interval
(2) NL80211_UNSOL_BCAST_PROBE_RESP_ATTR_TMPL - Template data

Signed-off-by: Aloka Dixit <alokad@codeaurora.org>
Link: https://lore.kernel.org/r/010101747a946698-aac263ae-2ed3-4dab-9590-0bc7131214e1-000000@us-west-2.amazonses.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-09-18 14:06:20 +02:00
Aloka Dixit 291c49ded2 nl80211: Add FILS discovery support
FILS discovery attribute, NL80211_ATTR_FILS_DISCOVERY, is nested which
supports following parameters as given in IEEE Std 802.11ai-2016,
Annex C.3 MIB detail:
(1) NL80211_FILS_DISCOVERY_ATTR_INT_MIN - Minimum packet interval
(2) NL80211_FILS_DISCOVERY_ATTR_INT_MAX - Maximum packet interval
(3) NL80211_FILS_DISCOVERY_ATTR_TMPL - Template data

Signed-off-by: Aloka Dixit <alokad@codeaurora.org>
Link: https://lore.kernel.org/r/20200805011838.28166-2-alokad@codeaurora.org
[fix attribute and other names, use NLA_RANGE(), use policy only once]
Link: https://lore.kernel.org/r/010101747a7b38a8-306f06b2-9061-4baf-81c1-054a42a18e22-000000@us-west-2.amazonses.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-09-18 14:06:19 +02:00
Thomas Pedersen d65a977087 nl80211: advertise supported channel width in S1G
S1G supports 5 channel widths: 1, 2, 4, 8, and 16. One
channel width is allowed per frequency in each operating
class, so it makes more sense to advertise the specific
channel width allowed.

Signed-off-by: Thomas Pedersen <thomas@adapt-ip.com>
Link: https://lore.kernel.org/r/20200908190323.15814-3-thomas@adapt-ip.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-09-18 12:28:06 +02:00
Jakub Kicinski 78a3ea5557 net: remove comments on struct rtnl_link_stats
We removed the misleading comments from struct rtnl_link_stats64
when we added proper kdoc. struct rtnl_link_stats has the same
inline comments, so remove them, too.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-17 17:58:13 -07:00
Michal Kubecek 19a83d36f9 ethtool: add and use message type for tunnel info reply
Tunnel offload info code uses ETHTOOL_MSG_TUNNEL_INFO_GET message type (cmd
field in genetlink header) for replies to tunnel info netlink request, i.e.
the same value as the request have. This is a problem because we are using
two separate enums for userspace to kernel and kernel to userspace message
types so that this ETHTOOL_MSG_TUNNEL_INFO_GET (28) collides with
ETHTOOL_MSG_CABLE_TEST_TDR_NTF which is what message type 28 means for
kernel to userspace messages.

As the tunnel info request reached mainline in 5.9 merge window, we should
still be able to fix the reply message type without breaking backward
compatibility.

Fixes: c7d759eb7b ("ethtool: add tunnel info interface")
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-17 16:43:20 -07:00
Tingwei Zhang 4af8b3d3eb coresight: stm: Support marked packet
STP_PACKET_MARKED is not supported by STM currently.
Add STM_FLAG_MARKED to support marked packet in STM.

Signed-off-by: Tingwei Zhang <tingwei@codeaurora.org>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Link: https://lore.kernel.org/r/20200916191737.4001561-3-mathieu.poirier@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-09-17 18:46:04 +02:00
Maxime Ripard 05e63c6a7e
Merge branch 'virtio-shm' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse into drm-misc-next
Topic pull request for core virtio changes that will be required by the DRM
driver.

Signed-off-by: Maxime Ripard <maxime@cerno.tech>
From: Gurchetan Singh <gurchetansingh@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/CAAfnVBn2BzXWFY3hhjDxd5q0P2_JWn-HdkVxgS94x9keAUZiow@mail.gmail.com
2020-09-16 13:18:32 +02:00
YiFei Zhu ef15314aa5 bpf: Add BPF_PROG_BIND_MAP syscall
This syscall binds a map to a program. Returns success if the map is
already bound to the program.

Signed-off-by: YiFei Zhu <zhuyifei@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Cc: YiFei Zhu <zhuyifei1999@gmail.com>
Link: https://lore.kernel.org/bpf/20200915234543.3220146-3-sdf@google.com
2020-09-15 18:28:27 -07:00
Jiri Pirko e2ce94dc1d devlink: introduce the health reporter test command
Introduce a test command for health reporters. User might use this
command to trigger test event on a reporter if the reporter supports it.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-15 15:57:16 -07:00
Jakub Kicinski 9a27a33027 ethtool: add standard pause stats
Currently drivers have to report their pause frames statistics
via ethtool -S, and there is a wide variety of names used for
these statistics.

Add the two statistics defined in IEEE 802.3x to the standard
API. Create a new ethtool request header flag for including
statistics in the response to GET commands.

Always create the ETHTOOL_A_PAUSE_STATS nest in replies when
flag is set. Testing if driver declares the op is not a reliable
way of checking if any stats will actually be included and therefore
we don't want to give the impression that presence of
ETHTOOL_A_PAUSE_STATS indicates driver support.

Note that this patch does not include PFC counters, which may fit
better in dcbnl? But mostly I don't need them/have a setup to test
them so I haven't looked deeply into exposing them :)

v3:
 - add a helper for "uninitializing" stats, rather than a cryptic
   memset() (Andrew)

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-15 13:26:28 -07:00
Xianting Tian f8910ffa81 ipmi:msghandler: retry to get device id on an error
We fail to get the BMCS's device id with low probability when loading
the ipmi driver and it causes BMC device registration failed. When this
issue occurs we got below kernel prints:

  [Wed Sep  9 19:52:03 2020] ipmi_si IPI0001:00: IPMI message handler:
     device id demangle failed: -22
  [Wed Sep  9 19:52:03 2020] IPMI BT: using default values
  [Wed Sep  9 19:52:03 2020] IPMI BT: req2rsp=5 secs retries=2
  [Wed Sep  9 19:52:03 2020] ipmi_si IPI0001:00: Unable to get the
     device id: -5
  [Wed Sep  9 19:52:04 2020] ipmi_si IPI0001:00: Unable to register
     device: error -5

When this issue happens, we want to manually unload the driver and try to
load it again, but it can't be unloaded by 'rmmod' as it is already 'in
use'.

We add a print in handle_one_recv_msg(), when this issue happens,
the msg we received is "Recv: 1c 01 d5", which means the data_len is 1,
data[0] is 0xd5 (completion code), which means "bmc cannot execute
command.  Command, or request parameter(s), not supported in present
state".  Debug code:
	static int handle_one_recv_msg(struct ipmi_smi *intf,
                               struct ipmi_smi_msg *msg) {
        	printk("Recv: %*ph\n", msg->rsp_size, msg->rsp);
		... ...
	}
Then in ipmi_demangle_device_id(), it returned '-EINVAL' as 'data_len < 7'
and 'data[0] != 0'.

We created this patch to retry the get device id when this error
happens.  We reproduced this issue again and the retry succeed on the
first retry, we finally got the correct msg and then all is ok:
Recv: 1c 01 00 01 81 05 84 02 af db 07 00 01 00 b9 00 10 00

So use a retry machanism in this patch to give bmc more opportunity to
correctly response kernel when we received specific completion codes.

Signed-off-by: Xianting Tian <tian.xianting@h3c.com>
Message-Id: <20200915071817.4484-1-tian.xianting@h3c.com>
[Cleaned up the verbage a bit in the header and prints.]
Signed-off-by: Corey Minyard <cminyard@mvista.com>
2020-09-15 09:57:45 -05:00
Daniel Vetter 818280d5ad Linux 5.9-rc5
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl9epdgeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG9IMH/jHCRSbcsIXHuQHn
 xcRLlhrDHfXoBza7auHfPWx2+9DZsmaSJs/SEiTGNag0Bi7jBcWcwBpsep7iVG/+
 WiftD5uOMhZigyuvfMFrt0mjr2Kr3wg5p58lwMBeBdm8iL5uKV8ehKsh05/Fral2
 6hu3jP8L0PCZMpF+sZ7s2jlhfVUMmjA8VzXZCvgQtmhoraHiF3mzfkcSMxnHwBPO
 HLo+TDDm49u+LbVsJT7+cSTiWxuUJCbix9Q4PCTx/BGg4ezYsjc6v0BnYRaYtrrA
 1uYiT6PVBEUkYYBHKQlD3N2KnUmbKx7dGUF4t+peTg5/JiocAJMNi1N9Qzvv7N6Q
 CqTiuio=
 =q+kJ
 -----END PGP SIGNATURE-----

Merge v5.9-rc5 into drm-next

Paul needs 1a21e5b930 ("drm/ingenic: Fix leak of device_node
pointer") and 3b5b005ef7 ("drm/ingenic: Fix driver not probing when
IPU port is missing") from -fixes to be able to merge further ingenic
patches into -next.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2020-09-14 17:19:11 +02:00
Stanimir Varbanov 5823833c9a media: v4l2-ctrl: Add VP9 codec levels
Add menu control for VP9 codec levels. A total of 14 levels are
defined for Profile 0 (8bit) and Profile 2 (10bit). Each level
is a set of constrained bitstreams coded with targeted resolutions,
frame rates, and bitrates.

The definitions have been taken from webm project [1].

[1] https://www.webmproject.org/vp9/levels/

Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Reviewed-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Reviewed-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-09-14 15:39:42 +02:00
Sergey Senozhatsky 129134e541 media: media/v4l2: remove V4L2_FLAG_MEMORY_NON_CONSISTENT flag
The patch partially reverts some of the UAPI bits of the buffer
cache management hints. Namely, the queue consistency (memory
coherency) user-space hint because, as it turned out, the kernel
implementation of this feature was misusing DMA_ATTR_NON_CONSISTENT.

The patch reverts both kernel and user space parts: removes the
DMA consistency attr functions, rolls back changes to v4l2_requestbuffers,
v4l2_create_buffers structures and corresponding UAPI functions
(plus compat32 layer) and cleans up the documentation.

[hverkuil: fixed a few typos in the commit log]
[hverkuil: fixed vb2_core_reqbufs call in drivers/media/dvb-core/dvb_vb2.c]
[mchehab: fixed a typo in the commit log: revers->reverts]

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-09-14 15:28:06 +02:00
Greg Kroah-Hartman 05fa34dcdb Linux 5.9-rc5
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl9epdgeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG9IMH/jHCRSbcsIXHuQHn
 xcRLlhrDHfXoBza7auHfPWx2+9DZsmaSJs/SEiTGNag0Bi7jBcWcwBpsep7iVG/+
 WiftD5uOMhZigyuvfMFrt0mjr2Kr3wg5p58lwMBeBdm8iL5uKV8ehKsh05/Fral2
 6hu3jP8L0PCZMpF+sZ7s2jlhfVUMmjA8VzXZCvgQtmhoraHiF3mzfkcSMxnHwBPO
 HLo+TDDm49u+LbVsJT7+cSTiWxuUJCbix9Q4PCTx/BGg4ezYsjc6v0BnYRaYtrrA
 1uYiT6PVBEUkYYBHKQlD3N2KnUmbKx7dGUF4t+peTg5/JiocAJMNi1N9Qzvv7N6Q
 CqTiuio=
 =q+kJ
 -----END PGP SIGNATURE-----

Merge 5.9-rc5 into char-misc-next

We want the char/misc fixes in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-09-14 10:07:08 +02:00
Greg Kroah-Hartman 1622d35453 Merge 5.9-rc5 into staging-next
We want the staging/iio changes in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-09-14 06:57:52 +02:00
Linus Torvalds 84b1349972 ARM:
- Multiple stolen time fixes, with a new capability to match x86
 - Fix for hugetlbfs mappings when PUD and PMD are the same level
 - Fix for hugetlbfs mappings when PTE mappings are enforced
   (dirty logging, for example)
 - Fix tracing output of 64bit values
 
 x86:
 - nSVM state restore fixes
 - Async page fault fixes
 - Lots of small fixes everywhere
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl9dM5kUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroM+Iwf+LbISO7ccpPMK1kKtOeug/jZv+xQA
 sVaBGRzYo+k2e0XtV8E8IV4N30FBtYSwXsbBKkMAoy2FpmMebgDWDQ7xspb6RJMS
 /y8t1iqPwdOaLIkUkgc7UihSTlZm05Es3f3q6uZ9+oaM4Fe+V7xWzTUX4Oy89JO7
 KcQsTD7pMqS4bfZGADK781ITR/WPgCi0aYx5s6dcwcZAQXhb1K1UKEjB8OGKnjUh
 jliReJtxRA16rjF+S5aJ7L07Ce/ksrfwkI4NXJ4GxW+lyOfVNdSBJUBaZt1m7G2M
 1We5+i5EjKCjuxmgtUUUfVdazpj1yl+gBGT7KKkLte9T9WZdXyDnixAbvg==
 =OFb3
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "A bit on the bigger side, mostly due to me being on vacation, then
  busy, then on parental leave, but there's nothing worrisome.

  ARM:
   - Multiple stolen time fixes, with a new capability to match x86
   - Fix for hugetlbfs mappings when PUD and PMD are the same level
   - Fix for hugetlbfs mappings when PTE mappings are enforced (dirty
     logging, for example)
   - Fix tracing output of 64bit values

  x86:
   - nSVM state restore fixes
   - Async page fault fixes
   - Lots of small fixes everywhere"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (25 commits)
  KVM: emulator: more strict rsm checks.
  KVM: nSVM: more strict SMM checks when returning to nested guest
  SVM: nSVM: setup nested msr permission bitmap on nested state load
  SVM: nSVM: correctly restore GIF on vmexit from nesting after migration
  x86/kvm: don't forget to ACK async PF IRQ
  x86/kvm: properly use DEFINE_IDTENTRY_SYSVEC() macro
  KVM: VMX: Don't freeze guest when event delivery causes an APIC-access exit
  KVM: SVM: avoid emulation with stale next_rip
  KVM: x86: always allow writing '0' to MSR_KVM_ASYNC_PF_EN
  KVM: SVM: Periodically schedule when unregistering regions on destroy
  KVM: MIPS: Change the definition of kvm type
  kvm x86/mmu: use KVM_REQ_MMU_SYNC to sync when needed
  KVM: nVMX: Fix the update value of nested load IA32_PERF_GLOBAL_CTRL control
  KVM: fix memory leak in kvm_io_bus_unregister_dev()
  KVM: Check the allocation of pv cpu mask
  KVM: nVMX: Update VMCS02 when L2 PAE PDPTE updates detected
  KVM: arm64: Update page shift if stage 2 block mapping not supported
  KVM: arm64: Fix address truncation in traces
  KVM: arm64: Do not try to map PUDs when they are folded into PMD
  arm64/x86: KVM: Introduce steal-time cap
  ...
2020-09-13 08:34:47 -07:00
Huacai Chen 15e9e35cd1 KVM: MIPS: Change the definition of kvm type
MIPS defines two kvm types:

 #define KVM_VM_MIPS_TE          0
 #define KVM_VM_MIPS_VZ          1

In Documentation/virt/kvm/api.rst it is said that "You probably want to
use 0 as machine type", which implies that type 0 be the "automatic" or
"default" type. And, in user-space libvirt use the null-machine (with
type 0) to detect the kvm capability, which returns "KVM not supported"
on a VZ platform.

I try to fix it in QEMU but it is ugly:
https://lists.nongnu.org/archive/html/qemu-devel/2020-08/msg05629.html

And Thomas Huth suggests me to change the definition of kvm type:
https://lists.nongnu.org/archive/html/qemu-devel/2020-09/msg03281.html

So I define like this:

 #define KVM_VM_MIPS_AUTO        0
 #define KVM_VM_MIPS_VZ          1
 #define KVM_VM_MIPS_TE          2

Since VZ and TE cannot co-exists, using type 0 on a TE platform will
still return success (so old user-space tools have no problems on new
kernels); the advantage is that using type 0 on a VZ platform will not
return failure. So, the only problem is "new user-space tools use type
2 on old kernels", but if we treat this as a kernel bug, we can backport
this patch to old stable kernels.

Signed-off-by: Huacai Chen <chenhc@lemote.com>
Message-Id: <1599734031-28746-1-git-send-email-chenhc@lemote.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-11 13:22:52 -04:00
Song Liu 1aef5b4391 bpf: Fix comment for helper bpf_current_task_under_cgroup()
This should be "current" not "skb".

Fixes: c6b5fb8690 ("bpf: add documentation for eBPF helpers (42-50)")
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/bpf/20200910203314.70018-1-songliubraving@fb.com
2020-09-10 20:04:51 -07:00
Paul Davey c8715a8e9f ipmr: Add high byte of VIF ID to igmpmsg
Use the unused3 byte in struct igmpmsg to hold the high 8 bits of the
VIF ID.

If using more than 255 IPv4 multicast interfaces it is necessary to have
access to a VIF ID for cache reports that is wider than 8 bits, the VIF
ID present in the igmpmsg reports sent to mroute_sk was only 8 bits wide
in the igmpmsg header.  Adding the high 8 bits of the 16 bit VIF ID in
the unused byte allows use of more than 255 IPv4 multicast interfaces.

Signed-off-by: Paul Davey <paul.davey@alliedtelesis.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-10 12:25:51 -07:00
Paul Davey 501cb00890 ipmr: Add route table ID to netlink cache reports
Insert the multicast route table ID as a Netlink attribute to Netlink
cache report notifications.

When multiple route tables are in use it is necessary to have a way to
determine which route table a given cache report belongs to when
receiving the cache report.

Signed-off-by: Paul Davey <paul.davey@alliedtelesis.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-10 12:25:51 -07:00
Sebastien Boeuf 38e895487a virtio: Implement get_shm_region for MMIO transport
On MMIO a new set of registers is defined for finding SHM
regions.  Add their definitions and use them to find the region.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Cc: kvm@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-09-10 10:05:58 +02:00
Sebastien Boeuf 0dd4ff93f4 virtio: Implement get_shm_region for PCI transport
On PCI the shm regions are found using capability entries;
find a region by searching for the capability.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: kbuild test robot <lkp@intel.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Cc: kvm@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-09-10 10:05:58 +02:00
Jan Kara d1c1076783 quota: Expand comment describing d_itimer
Expand comment describing d_itimer in struct fs_disk_quota.

Reported-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jan Kara <jack@suse.cz>
2020-09-10 09:10:06 +02:00
Darrick J. Wong ad47ff330b quota: widen timestamps for the fs_disk_quota structure
Soon, XFS will support quota grace period expiration timestamps beyond
the year 2038, widen the timestamp fields to handle the extra time bits.
Internally, XFS now stores unsigned 34-bit quantities, so the extra 8
bits here should work fine.  (Note that XFS is the only user of this
structure.)

Link: https://lore.kernel.org/r/20200909163413.GJ7955@magnolia
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2020-09-10 09:09:51 +02:00
Parav Pandit 3a2d9588c4 devlink: Introduce controller number
A devlink port may be for a controller consist of PCI device.
A devlink instance holds ports of two types of controllers.
(1) controller discovered on same system where eswitch resides
This is the case where PCI PF/VF of a controller and devlink eswitch
instance both are located on a single system.
(2) controller located on external host system.
This is the case where a controller is located in one system and its
devlink eswitch ports are located in a different system.

When a devlink eswitch instance serves the devlink ports of both
controllers together, PCI PF/VF numbers may overlap.
Due to this a unique phys_port_name cannot be constructed.

For example in below such system controller-0 and controller-1, each has
PCI PF pf0 whose eswitch ports can be present in controller-0.
These results in phys_port_name as "pf0" for both.
Similar problem exists for VFs and upcoming Sub functions.

An example view of two controller systems:

             ---------------------------------------------------------
             |                                                       |
             |           --------- ---------         ------- ------- |
-----------  |           | vf(s) | | sf(s) |         |vf(s)| |sf(s)| |
| server  |  | -------   ----/---- ---/----- ------- ---/--- ---/--- |
| pci rc  |=== | pf0 |______/________/       | pf1 |___/_______/     |
| connect |  | -------                       -------                 |
-----------  |     | controller_num=1 (no eswitch)                   |
             ------|--------------------------------------------------
             (internal wire)
                   |
             ---------------------------------------------------------
             | devlink eswitch ports and reps                        |
             | ----------------------------------------------------- |
             | |ctrl-0 | ctrl-0 | ctrl-0 | ctrl-0 | ctrl-0 |ctrl-0 | |
             | |pf0    | pf0vfN | pf0sfN | pf1    | pf1vfN |pf1sfN | |
             | ----------------------------------------------------- |
             | |ctrl-1 | ctrl-1 | ctrl-1 | ctrl-1 | ctrl-1 |ctrl-1 | |
             | |pf1    | pf1vfN | pf1sfN | pf1    | pf1vfN |pf0sfN | |
             | ----------------------------------------------------- |
             |                                                       |
             |                                                       |
             |           --------- ---------         ------- ------- |
             |           | vf(s) | | sf(s) |         |vf(s)| |sf(s)| |
             | -------   ----/---- ---/----- ------- ---/--- ---/--- |
             | | pf0 |______/________/       | pf1 |___/_______/     |
             | -------                       -------                 |
             |                                                       |
             |  local controller_num=0 (eswitch)                     |
             ---------------------------------------------------------

An example devlink port for external controller with controller
number = 1 for a VF 1 of PF 0:

$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev ens2f0pf0vf1 flavour pcivf controller 1 pfnum 0 vfnum 1 external true splittable false
  function:
    hw_addr 00:00:00:00:00:00

$ devlink port show pci/0000:06:00.0/2 -jp
{
    "port": {
        "pci/0000:06:00.0/2": {
            "type": "eth",
            "netdev": "ens2f0pf0vf1",
            "flavour": "pcivf",
            "controller": 1,
            "pfnum": 0,
            "vfnum": 1,
            "external": true,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:00:00"
            }
        }
    }
}

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-09 14:19:55 -07:00
Parav Pandit 05b595e9c4 devlink: Introduce external controller flag
A devlink eswitch port may represent PCI PF/VF ports of a controller.

A controller either located on same system or it can be an external
controller located in host where such NIC is plugged in.

Add the ability for driver to specify if a port is for external
controller.

Use such flag in the mlx5_core driver.

An example of an external controller having VF1 of PF0 belong to
controller 1.

$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev ens2f0pf0vf1 flavour pcivf pfnum 0 vfnum 1 external true splittable false
  function:
    hw_addr 00:00:00:00:00:00
$ devlink port show pci/0000:06:00.0/2 -jp
{
    "port": {
        "pci/0000:06:00.0/2": {
            "type": "eth",
            "netdev": "ens2f0pf0vf1",
            "flavour": "pcivf",
            "pfnum": 0,
            "vfnum": 1,
            "external": true,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:00:00"
            }
        }
    }
}

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-09 14:19:55 -07:00
David S. Miller d85427e3c8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for net-next:

1) Rewrite inner header IPv6 in ICMPv6 messages in ip6t_NPT,
   from Michael Zhou.

2) do_ip_vs_set_ctl() dereferences uninitialized value,
   from Peilin Ye.

3) Support for userdata in tables, from Jose M. Guisado.

4) Do not increment ct error and invalid stats at the same time,
   from Florian Westphal.

5) Remove ct ignore stats, also from Florian.

6) Add ct stats for clash resolution, from Florian Westphal.

7) Bump reference counter bump on ct clash resolution only,
   this is safe because bucket lock is held, again from Florian.

8) Use ip_is_fragment() in xt_HMARK, from YueHaibing.

9) Add wildcard support for nft_socket, from Balazs Scheidler.

10) Remove superfluous IPVS dependency on iptables, from
    Yaroslav Bolyukin.

11) Remove unused definition in ebt_stp, from Wang Hai.

12) Replace CONFIG_NFT_CHAIN_NAT_{IPV4,IPV6} by CONFIG_NFT_NAT
    in selftests/net, from Fabian Frederick.

13) Add userdata support for nft_object, from Jose M. Guisado.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-09 11:21:19 -07:00
Jose M. Guisado Gomez b131c96496 netfilter: nf_tables: add userdata support for nft_object
Enables storing userdata for nft_object. Initially this will store an
optional comment but can be extended in the future as needed.

Adds new attribute NFTA_OBJ_USERDATA to nft_object.

Signed-off-by: Jose M. Guisado Gomez <guigom@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-09-08 16:35:38 +02:00
Dave Airlie 0c8d22fcae Merge tag 'amd-drm-next-5.10-2020-09-03' of git://people.freedesktop.org/~agd5f/linux into drm-next
amd-drm-next-5.10-2020-09-03:

amdgpu:
- RAS fixes
- Sienna Cichlid updates
- Navy Flounder updates
- DCE6 (SI) support in DC
- Enable plane rotation
- Rework pre-OS vram reservation handling during driver init
- Add standard interface to dump GPU metrics table from SMU
- Rework tiling and tmz state handling in atomic commits
- Pstate fixes
- Add voltage and power hwmon interfaces for renoir
- SW CTF fixes
- S/G display fix for Raven
- Print client strings for vmfaults for vega and newer
- Manual fan control fixes
- Display updates
- Reorg power management directory structure
- Misc bug fixes
- Misc code cleanups

amdkfd:
- Topology fixes
- Add SMI events for thermal throttling and GPU resets

radeon:
- switch from pci_* to dma_* for dma allocations
- PLL fix

Scheduler:
- Clean up priority levels

UAPI:
- amdgpu INFO IOCTL query update for TMZ state
  https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6049
- amdkfd SMI event interface updates
  https://github.com/RadeonOpenCompute/rocm_smi_lib/tree/therm_thrott

From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200903222921.4152-1-alexander.deucher@amd.com
2020-09-08 16:40:13 +10:00
Dave Airlie ce5c207c6b Linux 5.9-rc4
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl9VerweHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGhc4H/iHD6qLdB36gZB6K
 oc2nJyrqyWitv4ti2Mnt5PA7o4wX4l6nnr1QvoaJ4BRs5Ja1czRvb2XDmdzqAoIA
 xITGoafqaAeDfxQ91bWrJsVN0pCRKiGwddXlU7TWmqw/riAkfOqi6GYKvav4biJH
 +n1mUPQb1M2IbRFsqkAS+ebKHq3CWaRvzKOEneS88nGlL5u31S9NAru8Ru/fkxRn
 6CwGcs1XRaBPYaZAhdfIb0NuatUlpkhPC9yhNS9up6SqrWmK3m65vmFVng6H0eCF
 fwn1jVztboY/XcNAi5sM9ExpQCql6WLQEEktVikqRDojC8fVtSx6W55tPt7qeaoO
 Z6t4/DA=
 =bcA4
 -----END PGP SIGNATURE-----

Merge tag 'v5.9-rc4' into drm-next

Backmerge 5.9-rc4 as there is a nasty qxl conflict
that needs to be resolved.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2020-09-08 14:41:40 +10:00
Jakub Kicinski 0db0c34cfb net: tighten the definition of interface statistics
This patch is born out of an investigation into which IEEE statistics
correspond to which struct rtnl_link_stats64 members. Turns out that
there seems to be reasonable consensus on the matter, among many drivers.
To save others the time (and it took more time than I'm comfortable
admitting) I'm adding comments referring to IEEE attributes to
struct rtnl_link_stats64.

Up until now we had two forms of documentation for stats - in
Documentation/ABI/testing/sysfs-class-net-statistics and the comments
on struct rtnl_link_stats64 itself. While the former is very cautious
in defining the expected behavior, the latter feel quite dated and
may not be easy to understand for modern day driver author
(e.g. rx_over_errors). At the same time modern systems are far more
complex and once obvious definitions lost their clarity. For example
- does rx_packet count at the MAC layer (aFramesReceivedOK)?
packets processed correctly by hardware? received by the driver?
or maybe received by the stack?

I tried to clarify the expectations, further clarifications from
others are very welcome.

The part hardest to untangle is rx_over_errors vs rx_fifo_errors
vs rx_missed_errors. After much deliberation I concluded that for
modern HW only two of the counters will make sense. The distinction
between internal FIFO overflow and packets dropped due to back-pressure
from the host is likely too implementation (driver and device) specific
to expose in the standard stats.

Now - which two of those counters we select to use is anyone's pick:

sysfs documentation suggests rx_over_errors counts packets which
did not fit into buffers due to MTU being too small, which I reused.
There don't seem to be many modern drivers using it (well, CAN drivers
seem to love this statistic).

Of the remaining two I picked rx_missed_errors to report device drops.
bnxt reports it and it's folded into "drop"s in procfs (while
rx_fifo_errors is an error, and modern devices usually receive the frame
OK, they just can't admit it into the pipeline).

Of the drivers I looked at only AMD Lance-like and NS8390-like use all
three of these counters. rx_missed_errors counts missed frames,
rx_over_errors counts overflow events, and rx_fifo_errors counts frames
which were truncated because they didn't fit into buffers. This suggests
that rx_fifo_errors may be the correct stat for truncated packets, but
I'd think a FIFO stat counting truncated packets would be very confusing
to a modern reader.

v2:
 - add driver developer notes about ethtool stat count and reset
 - replace Ethernet with IEEE 802.3 to better indicate source of attrs
 - mention byte counters don't count FCS
 - clarify RX counter is from device to host
 - drop "sightly" from sysfs paragraph
 - add examples of ethtool stats
 - s/incoming/received/ s/incoming/transmitted/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-09-07 15:08:05 -07:00
Nikolay Aleksandrov 5205e919c9 net: bridge: mcast: add support for src list and filter mode dumping
Support per port group src list (address and timer) and filter mode
dumping. Protected by either multicast_lock or rcu.

v3: add IPv6 support
v2: require RCU or multicast_lock to traverse src groups

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-09-07 13:16:34 -07:00
Quentin Monnet 938c3efd9e bpf: Fix formatting in documentation for BPF helpers
Fix a formatting error in the description of bpf_load_hdr_opt() (rst2man
complains about a wrong indentation, but what is missing is actually a
blank line before the bullet list).

Fix and harmonise the formatting for other helpers.

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200904161454.31135-3-quentin@isovalent.com
2020-09-07 16:31:18 +02:00
Mauro Carvalho Chehab d13ee586e0 Linux 5.9-rc4
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl9VerweHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGhc4H/iHD6qLdB36gZB6K
 oc2nJyrqyWitv4ti2Mnt5PA7o4wX4l6nnr1QvoaJ4BRs5Ja1czRvb2XDmdzqAoIA
 xITGoafqaAeDfxQ91bWrJsVN0pCRKiGwddXlU7TWmqw/riAkfOqi6GYKvav4biJH
 +n1mUPQb1M2IbRFsqkAS+ebKHq3CWaRvzKOEneS88nGlL5u31S9NAru8Ru/fkxRn
 6CwGcs1XRaBPYaZAhdfIb0NuatUlpkhPC9yhNS9up6SqrWmK3m65vmFVng6H0eCF
 fwn1jVztboY/XcNAi5sM9ExpQCql6WLQEEktVikqRDojC8fVtSx6W55tPt7qeaoO
 Z6t4/DA=
 =bcA4
 -----END PGP SIGNATURE-----

Merge tag 'v5.9-rc4' into patchwork

Linux 5.9-rc4

* tag 'v5.9-rc4': (1001 commits)
  Linux 5.9-rc4
  io_uring: fix linked deferred ->files cancellation
  io_uring: fix cancel of deferred reqs with ->files
  include/linux/log2.h: add missing () around n in roundup_pow_of_two()
  mm/khugepaged.c: fix khugepaged's request size in collapse_file
  mm/hugetlb: fix a race between hugetlb sysctl handlers
  mm/hugetlb: try preferred node first when alloc gigantic page from cma
  mm/migrate: preserve soft dirty in remove_migration_pte()
  mm/migrate: remove unnecessary is_zone_device_page() check
  mm/rmap: fixup copying of soft dirty and uffd ptes
  mm/migrate: fixup setting UFFD_WP flag
  mm: madvise: fix vma user-after-free
  checkpatch: fix the usage of capture group ( ... )
  fork: adjust sysctl_max_threads definition to match prototype
  ipc: adjust proc_ipc_sem_dointvec definition to match prototype
  mm: track page table modifications in __apply_to_page_range()
  MAINTAINERS: IA64: mark Status as Odd Fixes only
  MAINTAINERS: add LLVM maintainers
  MAINTAINERS: update Cavium/Marvell entries
  mm: slub: fix conversion of freelist_corrupted()
  ...
2020-09-07 16:13:06 +02:00