Commit Graph

680 Commits (7d40bfc1933efbbd65762b0bcb63287c07125370)

Author SHA1 Message Date
David S. Miller a52171ae7b Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2021-06-17

The following pull-request contains BPF updates for your *net-next* tree.

We've added 50 non-merge commits during the last 25 day(s) which contain
a total of 148 files changed, 4779 insertions(+), 1248 deletions(-).

The main changes are:

1) BPF infrastructure to migrate TCP child sockets from a listener to another
   in the same reuseport group/map, from Kuniyuki Iwashima.

2) Add a provably sound, faster and more precise algorithm for tnum_mul() as
   noted in https://arxiv.org/abs/2105.05398, from Harishankar Vishwanathan.

3) Streamline error reporting changes in libbpf as planned out in the
   'libbpf: the road to v1.0' effort, from Andrii Nakryiko.

4) Add broadcast support to xdp_redirect_map(), from Hangbin Liu.

5) Extends bpf_map_lookup_and_delete_elem() functionality to 4 more map
   types, that is, {LRU_,PERCPU_,LRU_PERCPU_,}HASH, from Denis Salopek.

6) Support new LLVM relocations in libbpf to make them more linker friendly,
   also add a doc to describe the BPF backend relocations, from Yonghong Song.

7) Silence long standing KUBSAN complaints on register-based shifts in
   interpreter, from Daniel Borkmann and Eric Biggers.

8) Add dummy PT_REGS macros in libbpf to fail BPF program compilation when
   target arch cannot be determined, from Lorenz Bauer.

9) Extend AF_XDP to support large umems with 1M+ pages, from Magnus Karlsson.

10) Fix two minor libbpf tc BPF API issues, from Kumar Kartikeya Dwivedi.

11) Move libbpf BPF_SEQ_PRINTF/BPF_SNPRINTF macros that can be used by BPF
    programs to bpf_helpers.h header, from Florent Revest.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-17 11:54:56 -07:00
Kuniyuki Iwashima d5e4ddaeb6 bpf: Support socket migration by eBPF.
This patch introduces a new bpf_attach_type for BPF_PROG_TYPE_SK_REUSEPORT
to check if the attached eBPF program is capable of migrating sockets. When
the eBPF program is attached, we run it for socket migration if the
expected_attach_type is BPF_SK_REUSEPORT_SELECT_OR_MIGRATE or
net.ipv4.tcp_migrate_req is enabled.

Currently, the expected_attach_type is not enforced for the
BPF_PROG_TYPE_SK_REUSEPORT type of program. Thus, this commit follows the
earlier idea in the commit aac3fc320d ("bpf: Post-hooks for sys_bind") to
fix up the zero expected_attach_type in bpf_prog_load_fixup_attach_type().

Moreover, this patch adds a new field (migrating_sk) to sk_reuseport_md to
select a new listener based on the child socket. migrating_sk varies
depending on if it is migrating a request in the accept queue or during
3WHS.

  - accept_queue : sock (ESTABLISHED/SYN_RECV)
  - 3WHS         : request_sock (NEW_SYN_RECV)

In the eBPF program, we can select a new listener by
BPF_FUNC_sk_select_reuseport(). Also, we can cancel migration by returning
SK_DROP. This feature is useful when listeners have different settings at
the socket API level or when we want to free resources as soon as possible.

  - SK_PASS with selected_sk, select it as a new listener
  - SK_PASS with selected_sk NULL, fallbacks to the random selection
  - SK_DROP, cancel the migration.

There is a noteworthy point. We select a listening socket in three places,
but we do not have struct skb at closing a listener or retransmitting a
SYN+ACK. On the other hand, some helper functions do not expect skb is NULL
(e.g. skb_header_pointer() in BPF_FUNC_skb_load_bytes(), skb_tail_pointer()
in BPF_FUNC_skb_load_bytes_relative()). So we allocate an empty skb
temporarily before running the eBPF program.

Suggested-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/netdev/20201123003828.xjpjdtk4ygl6tg6h@kafai-mbp.dhcp.thefacebook.com/
Link: https://lore.kernel.org/netdev/20201203042402.6cskdlit5f3mw4ru@kafai-mbp.dhcp.thefacebook.com/
Link: https://lore.kernel.org/netdev/20201209030903.hhow5r53l6fmozjn@kafai-mbp.dhcp.thefacebook.com/
Link: https://lore.kernel.org/bpf/20210612123224.12525-10-kuniyu@amazon.co.jp
2021-06-15 18:01:06 +02:00
Kuniyuki Iwashima e061047684 bpf: Support BPF_FUNC_get_socket_cookie() for BPF_PROG_TYPE_SK_REUSEPORT.
We will call sock_reuseport.prog for socket migration in the next commit,
so the eBPF program has to know which listener is closing to select a new
listener.

We can currently get a unique ID of each listener in the userspace by
calling bpf_map_lookup_elem() for BPF_MAP_TYPE_REUSEPORT_SOCKARRAY map.

This patch makes the pointer of sk available in sk_reuseport_md so that we
can get the ID by BPF_FUNC_get_socket_cookie() in the eBPF program.

Suggested-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/netdev/20201119001154.kapwihc2plp4f7zc@kafai-mbp.dhcp.thefacebook.com/
Link: https://lore.kernel.org/bpf/20210612123224.12525-9-kuniyu@amazon.co.jp
2021-06-15 18:01:06 +02:00
Jakub Kicinski 5ada57a9a6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
cdc-wdm: s/kill_urbs/poison_urbs/ to fix build

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-27 09:55:10 -07:00
Hangbin Liu e624d4ed4a xdp: Extend xdp_redirect_map with broadcast support
This patch adds two flags BPF_F_BROADCAST and BPF_F_EXCLUDE_INGRESS to
extend xdp_redirect_map for broadcast support.

With BPF_F_BROADCAST the packet will be broadcasted to all the interfaces
in the map. with BPF_F_EXCLUDE_INGRESS the ingress interface will be
excluded when do broadcasting.

When getting the devices in dev hash map via dev_map_hash_get_next_key(),
there is a possibility that we fall back to the first key when a device
was removed. This will duplicate packets on some interfaces. So just walk
the whole buckets to avoid this issue. For dev array map, we also walk the
whole map to find valid interfaces.

Function bpf_clear_redirect_map() was removed in
commit ee75aef23a ("bpf, xdp: Restructure redirect actions").
Add it back as we need to use ri->map again.

With test topology:
  +-------------------+             +-------------------+
  | Host A (i40e 10G) |  ---------- | eno1(i40e 10G)    |
  +-------------------+             |                   |
                                    |   Host B          |
  +-------------------+             |                   |
  | Host C (i40e 10G) |  ---------- | eno2(i40e 10G)    |
  +-------------------+             |                   |
                                    |          +------+ |
                                    | veth0 -- | Peer | |
                                    | veth1 -- |      | |
                                    | veth2 -- |  NS  | |
                                    |          +------+ |
                                    +-------------------+

On Host A:
 # pktgen/pktgen_sample03_burst_single_flow.sh -i eno1 -d $dst_ip -m $dst_mac -s 64

On Host B(Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz, 128G Memory):
Use xdp_redirect_map and xdp_redirect_map_multi in samples/bpf for testing.
All the veth peers in the NS have a XDP_DROP program loaded. The
forward_map max_entries in xdp_redirect_map_multi is modify to 4.

Testing the performance impact on the regular xdp_redirect path with and
without patch (to check impact of additional check for broadcast mode):

5.12 rc4         | redirect_map        i40e->i40e      |    2.0M |  9.7M
5.12 rc4         | redirect_map        i40e->veth      |    1.7M | 11.8M
5.12 rc4 + patch | redirect_map        i40e->i40e      |    2.0M |  9.6M
5.12 rc4 + patch | redirect_map        i40e->veth      |    1.7M | 11.7M

Testing the performance when cloning packets with the redirect_map_multi
test, using a redirect map size of 4, filled with 1-3 devices:

5.12 rc4 + patch | redirect_map multi  i40e->veth (x1) |    1.7M | 11.4M
5.12 rc4 + patch | redirect_map multi  i40e->veth (x2) |    1.1M |  4.3M
5.12 rc4 + patch | redirect_map multi  i40e->veth (x3) |    0.8M |  2.6M

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/20210519090747.1655268-3-liuhangbin@gmail.com
2021-05-26 09:46:16 +02:00
Denis Salopek 3e87f192b4 bpf: Add lookup_and_delete_elem support to hashtab
Extend the existing bpf_map_lookup_and_delete_elem() functionality to
hashtab map types, in addition to stacks and queues.
Create a new hashtab bpf_map_ops function that does lookup and deletion
of the element under the same bucket lock and add the created map_ops to
bpf.h.

Signed-off-by: Denis Salopek <denis.salopek@sartura.hr>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/4d18480a3e990ffbf14751ddef0325eed3be2966.1620763117.git.denis.salopek@sartura.hr
2021-05-24 13:30:26 -07:00
Arnaldo Carvalho de Melo 4224680ee7 tools headers UAPI: Sync linux/perf_event.h with the kernel sources
To pick the trivial change in:

  0683b53197 ("signal: Deliver all of the siginfo perf data in _perf")

This silences this perf build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/linux/perf_event.h' differs from latest version at 'include/uapi/linux/perf_event.h'
  diff -u tools/include/uapi/linux/perf_event.h include/uapi/linux/perf_event.h

Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-05-21 16:03:08 -03:00
Arnaldo Carvalho de Melo ec347b7c31 tools headers UAPI: Sync linux/fs.h with the kernel sources
To pick the trivial change in:

  63c8af5687 ("block: uapi: fix comment about block device ioctl")

This silences this perf build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/linux/fs.h' differs from latest version at 'include/uapi/linux/fs.h'
  diff -u tools/include/uapi/linux/fs.h include/uapi/linux/fs.h

Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-05-21 16:00:31 -03:00
Alexei Starovoitov 5d67f34959 bpf: Add cmd alias BPF_PROG_RUN
Add BPF_PROG_RUN command as an alias to BPF_RPOG_TEST_RUN to better
indicate the full range of use cases done by the command.

Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20210519014032.20908-1-alexei.starovoitov@gmail.com
2021-05-19 15:35:12 +02:00
Alexei Starovoitov 3abea08924 bpf: Add bpf_sys_close() helper.
Add bpf_sys_close() helper to be used by the syscall/loader program to close
intermediate FDs and other cleanup.
Note this helper must never be allowed inside fdget/fdput bracketing.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-11-alexei.starovoitov@gmail.com
2021-05-19 00:33:40 +02:00
Alexei Starovoitov 3d78417b60 bpf: Add bpf_btf_find_by_name_kind() helper.
Add new helper:
long bpf_btf_find_by_name_kind(char *name, int name_sz, u32 kind, int flags)
Description
	Find BTF type with given name and kind in vmlinux BTF or in module's BTFs.
Return
	Returns btf_id and btf_obj_fd in lower and upper 32 bits.

It will be used by loader program to find btf_id to attach the program to
and to find btf_ids of ksyms.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-10-alexei.starovoitov@gmail.com
2021-05-19 00:33:40 +02:00
Alexei Starovoitov 387544bfa2 bpf: Introduce fd_idx
Typical program loading sequence involves creating bpf maps and applying
map FDs into bpf instructions in various places in the bpf program.
This job is done by libbpf that is using compiler generated ELF relocations
to patch certain instruction after maps are created and BTFs are loaded.
The goal of fd_idx is to allow bpf instructions to stay immutable
after compilation. At load time the libbpf would still create maps as usual,
but it wouldn't need to patch instructions. It would store map_fds into
__u32 fd_array[] and would pass that pointer to sys_bpf(BPF_PROG_LOAD).

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-9-alexei.starovoitov@gmail.com
2021-05-19 00:33:40 +02:00
Alexei Starovoitov 79a7f8bdb1 bpf: Introduce bpf_sys_bpf() helper and program type.
Add placeholders for bpf_sys_bpf() helper and new program type.
Make sure to check that expected_attach_type is zero for future extensibility.
Allow tracing helper functions to be used in this program type, since they will
only execute from user context via bpf_prog_test_run.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-2-alexei.starovoitov@gmail.com
2021-05-19 00:33:39 +02:00
Arnaldo Carvalho de Melo 71d7924b3e tools headers UAPI: Sync perf_event.h with the kernel sources
To pick up the changes in:

  2b26f0aa00 ("perf: Support only inheriting events if cloned with CLONE_THREAD")
  2e498d0a74 ("perf: Add support for event removal on exec")
  547b60988e ("perf: aux: Add flags for the buffer format")
  55bcf6ef31 ("perf: Extend PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE")
  7dde51767c ("perf: aux: Add CoreSight PMU buffer formats")
  97ba62b278 ("perf: Add support for SIGTRAP on perf events")
  d0d1dd6285 ("perf core: Add PERF_COUNT_SW_CGROUP_SWITCHES event")

Also change the expected sizeof(struct perf_event_attr) from 120 to 128 due to
fields being added for the SIGTRAP changes.

Addressing this perf build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/linux/perf_event.h' differs from latest version at 'include/uapi/linux/perf_event.h'
  diff -u tools/include/uapi/linux/perf_event.h include/uapi/linux/perf_event.h

Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Marco Elver <elver@google.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-05-10 09:01:01 -03:00
Arnaldo Carvalho de Melo 5a80ee4219 tools headers UAPI: Sync linux/prctl.h with the kernel sources
To pick a new prctl introduced in:

  201698626f ("arm64: Introduce prctl(PR_PAC_{SET,GET}_ENABLED_KEYS)")

That results in

  $ grep prctl tools/perf/trace/beauty/*.sh
  tools/perf/trace/beauty/prctl_option.sh:printf "static const char *prctl_options[] = {\n"
  tools/perf/trace/beauty/prctl_option.sh:egrep $regex ${header_dir}/prctl.h | grep -v PR_SET_PTRACER | \
  tools/perf/trace/beauty/prctl_option.sh:printf "static const char *prctl_set_mm_options[] = {\n"
  tools/perf/trace/beauty/prctl_option.sh:egrep $regex ${header_dir}/prctl.h | \
  tools/perf/trace/beauty/x86_arch_prctl.sh:prctl_arch_header=${x86_header_dir}/prctl.h
  tools/perf/trace/beauty/x86_arch_prctl.sh:	printf "#define x86_arch_prctl_codes_%d_offset %s\n" $idx $first_entry
  tools/perf/trace/beauty/x86_arch_prctl.sh:	printf "static const char *x86_arch_prctl_codes_%d[] = {\n" $idx
  tools/perf/trace/beauty/x86_arch_prctl.sh:	egrep -q $regex ${prctl_arch_header} && \
  tools/perf/trace/beauty/x86_arch_prctl.sh:	(egrep $regex ${prctl_arch_header} | \
  $ tools/perf/trace/beauty/prctl_option.sh > before
  $ cp include/uapi/linux/prctl.h tools/include/uapi/linux/prctl.h
  $ tools/perf/trace/beauty/prctl_option.sh > after
  $ diff -u before after
  --- before	2021-05-09 10:06:10.064559675 -0300
  +++ after	2021-05-09 10:06:21.319791396 -0300
  @@ -54,6 +54,8 @@
   	[57] = "SET_IO_FLUSHER",
   	[58] = "GET_IO_FLUSHER",
   	[59] = "SET_SYSCALL_USER_DISPATCH",
  +	[60] = "PAC_SET_ENABLED_KEYS",
  +	[61] = "PAC_GET_ENABLED_KEYS",
   };
   static const char *prctl_set_mm_options[] = {
   	[1] = "START_CODE",
  $

Now users can do:

  # perf trace -e syscalls:sys_enter_prctl --filter "option==PAC_GET_ENABLED_KEYS"
^C#
  # trace -v -e syscalls:sys_enter_prctl --filter "option==PAC_GET_ENABLED_KEYS"
  New filter for syscalls:sys_enter_prctl: (option==0x3d) && (common_pid != 5519 && common_pid != 3404)
^C#

And also when prctl appears in a session, its options will be
translated to the string.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Peter Collingbourne <pcc@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-05-10 09:01:00 -03:00
Arnaldo Carvalho de Melo 0d943d5fde tools headers UAPI: Sync linux/kvm.h with the kernel sources
To pick the changes in:

  15fb7de1a7 ("KVM: SVM: Add KVM_SEV_RECEIVE_UPDATE_DATA command")
  3bf725699b ("KVM: arm64: Add support for the KVM PTP service")
  4cfdd47d6d ("KVM: SVM: Add KVM_SEV SEND_START command")
  54526d1fd5 ("KVM: x86: Support KVM VMs sharing SEV context")
  5569e2e7a6 ("KVM: SVM: Add support for KVM_SEV_SEND_CANCEL command")
  8b13c36493 ("KVM: introduce KVM_CAP_SET_GUEST_DEBUG2")
  af43cbbf95 ("KVM: SVM: Add support for KVM_SEV_RECEIVE_START command")
  d3d1af85e2 ("KVM: SVM: Add KVM_SEND_UPDATE_DATA command")
  fe7e948837 ("KVM: x86: Add capability to grant VM access to privileged SGX attribute")

That don't cause any change in tooling as it doesn't introduce any new
ioctl.

  $ grep kvm tools/perf/trace/beauty/*.sh
  tools/perf/trace/beauty/kvm_ioctl.sh:printf "static const char *kvm_ioctl_cmds[] = {\n"
  tools/perf/trace/beauty/kvm_ioctl.sh:egrep $regex ${header_dir}/kvm.h	| \
  $
  $ tools/perf/trace/beauty/kvm_ioctl.sh > before
  $ cp include/uapi/linux/kvm.h tools/include/uapi/linux/kvm.h
  $ tools/perf/trace/beauty/kvm_ioctl.sh > after
  $ diff -u before after
  $

This silences this perf build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h'
  diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h

Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Jianyong Wu <jianyong.wu@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Nathan Tempelman <natet@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Steve Rutherford <srutherford@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-05-10 09:01:00 -03:00
Linus Torvalds 10a3efd0fe perf tools changes for v5.13: 1st batch
perf stat:
 
 - Add support for hybrid PMUs to support systems such as Intel Alderlake
   and its BIG/little core/atom cpus.
 
 - Introduce 'bperf' to share hardware PMCs with BPF.
 
 - New --iostat option to collect and present IO stats on Intel hardware.
 
   This functionality is based on recently introduced sysfs attributes
   for Intel® Xeon® Scalable processor family (code name Skylake-SP):
 
     commit bb42b3d397 ("perf/x86/intel/uncore: Expose an Uncore unit to IIO PMON mapping")
 
   It is intended to provide four I/O performance metrics in MB per each
   PCIe root port:
 
    - Inbound Read: I/O devices below root port read from the host memory
    - Inbound Write: I/O devices below root port write to the host memory
    - Outbound Read: CPU reads from I/O devices below root port
    - Outbound Write: CPU writes to I/O devices below root port
 
 - Align CSV output for summary.
 
 - Clarify --null use cases: Assess raw overhead of 'perf stat' or
   measure just wall clock time.
 
 - Improve readability of shadow stats.
 
 perf record:
 
 - Change the COMM when starting tha workload so that --exclude-perf
   doesn't seem to be not honoured.
 
 - Improve 'Workload failed' message printing events + what was exec'ed.
 
 - Fix cross-arch support for TIME_CONV.
 
 perf report:
 
 - Add option to disable raw event ordering.
 
 - Dump the contents of PERF_RECORD_TIME_CONV in 'perf report -D'.
 
 - Improvements to --stat output, that shows information about PERF_RECORD_ events.
 
 - Preserve identifier id in OCaml demangler.
 
 perf annotate:
 
 - Show full source location with 'l' hotkey in the 'perf annotate' TUI.
 
 - Add line number like in TUI and source location at EOL to the 'perf annotate' --stdio mode.
 
 - Add --demangle and --demangle-kernel to 'perf annotate'.
 
 - Allow configuring annotate.demangle{,_kernel} in 'perf config'.
 
 - Fix sample events lost in stdio mode.
 
 perf data:
 
 - Allow converting a perf.data file to JSON.
 
 libperf:
 
 - Add support for user space counter access.
 
 - Update topdown documentation to permit rdpmc calls.
 
 perf test:
 
 - Add 'perf test' for 'perf stat' CSV output.
 
 - Add 'perf test' entries to test the hybrid PMU support.
 
 - Cleanup 'perf test daemon' if its 'perf test' is interrupted.
 
 - Handle metric reuse in pmu-events parsing 'perf test' entry.
 
 - Add test for PE executable support.
 
 - Add timeout for wait for daemon start in its 'perf test' entries.
 
 Build:
 
 - Enable libtraceevent dynamic linking.
 
 - Improve feature detection output.
 
 - Fix caching of feature checks caching.
 
 - First round of updates for tools copies of kernel headers.
 
 - Enable warnings when compiling BPF programs.
 
 Vendor specific events:
 
 Intel:
 
 - Add missing skylake & icelake model numbers.
 
 arm64:
 
 - Add Hisi hip08 L1, L2 and L3 metrics.
 
 - Add Fujitsu A64FX PMU events.
 
 PowerPC:
 
 - Initial JSON/events list for power10 platform.
 
 - Remove unsupported power9 metrics.
 
 AMD:
 
 - Add Zen3 events.
 
 - Fix broken L2 Cache Hits from L2 HWPF metric.
 
 - Use lowercases for all the eventcodes and umasks.
 
 Hardware tracing:
 
 arm64:
 
 - Update CoreSight ETM metadata format.
 
 - Fix bitmap for CS-ETM option.
 
 - Support PID tracing in config.
 
 - Detect pid in VMID for kernel running at EL2.
 
 Arch specific:
 
 MIPS:
 
 - Support MIPS unwinding and dwarf-regs.
 
 - Generate mips syscalls_n64.c syscall table.
 
 PowerPC:
 
 - Add support for PERF_SAMPLE_WEIGH_STRUCT on PowerPC.
 
 - Support pipeline stage cycles for powerpc.
 
 libbeauty:
 
 - Fix fsconfig generator.
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCYIshAwAKCRCyPKLppCJ+
 J8oWAP9c1POclDQ7AZDe5/t/InZYSQKJFIku1sE1SNCSOupy7wEAuPBtaN7wDaRj
 BFBibfUGd4MNzLPvMMHneIhSY3DgJwg=
 =FLLr
 -----END PGP SIGNATURE-----

Merge tag 'perf-tools-for-v5.13-2021-04-29' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tool updates from Arnaldo Carvalho de Melo:
 "perf stat:

   - Add support for hybrid PMUs to support systems such as Intel
     Alderlake and its BIG/little core/atom cpus.

   - Introduce 'bperf' to share hardware PMCs with BPF.

   - New --iostat option to collect and present IO stats on Intel
     hardware.

     This functionality is based on recently introduced sysfs attributes
     for Intel® Xeon® Scalable processor family (code name Skylake-SP)
     in commit bb42b3d397 ("perf/x86/intel/uncore: Expose an Uncore
     unit to IIO PMON mapping")

     It is intended to provide four I/O performance metrics in MB per
     each PCIe root port:

       - Inbound Read: I/O devices below root port read from the host memory
       - Inbound Write: I/O devices below root port write to the host memory
       - Outbound Read: CPU reads from I/O devices below root port
       - Outbound Write: CPU writes to I/O devices below root port

   - Align CSV output for summary.

   - Clarify --null use cases: Assess raw overhead of 'perf stat' or
     measure just wall clock time.

   - Improve readability of shadow stats.

  perf record:

   - Change the COMM when starting tha workload so that --exclude-perf
     doesn't seem to be not honoured.

   - Improve 'Workload failed' message printing events + what was
     exec'ed.

   - Fix cross-arch support for TIME_CONV.

  perf report:

   - Add option to disable raw event ordering.

   - Dump the contents of PERF_RECORD_TIME_CONV in 'perf report -D'.

   - Improvements to --stat output, that shows information about
     PERF_RECORD_ events.

   - Preserve identifier id in OCaml demangler.

  perf annotate:

   - Show full source location with 'l' hotkey in the 'perf annotate'
     TUI.

   - Add line number like in TUI and source location at EOL to the 'perf
     annotate' --stdio mode.

   - Add --demangle and --demangle-kernel to 'perf annotate'.

   - Allow configuring annotate.demangle{,_kernel} in 'perf config'.

   - Fix sample events lost in stdio mode.

  perf data:

   - Allow converting a perf.data file to JSON.

  libperf:

   - Add support for user space counter access.

   - Update topdown documentation to permit rdpmc calls.

  perf test:

   - Add 'perf test' for 'perf stat' CSV output.

   - Add 'perf test' entries to test the hybrid PMU support.

   - Cleanup 'perf test daemon' if its 'perf test' is interrupted.

   - Handle metric reuse in pmu-events parsing 'perf test' entry.

   - Add test for PE executable support.

   - Add timeout for wait for daemon start in its 'perf test' entries.

  Build:

   - Enable libtraceevent dynamic linking.

   - Improve feature detection output.

   - Fix caching of feature checks caching.

   - First round of updates for tools copies of kernel headers.

   - Enable warnings when compiling BPF programs.

  Vendor specific events:

   - Intel:
      - Add missing skylake & icelake model numbers.

   - arm64:
      - Add Hisi hip08 L1, L2 and L3 metrics.
      - Add Fujitsu A64FX PMU events.

   - PowerPC:
      - Initial JSON/events list for power10 platform.
      - Remove unsupported power9 metrics.

   - AMD:
      - Add Zen3 events.
      - Fix broken L2 Cache Hits from L2 HWPF metric.
      - Use lowercases for all the eventcodes and umasks.

  Hardware tracing:

   - arm64:
      - Update CoreSight ETM metadata format.
      - Fix bitmap for CS-ETM option.
      - Support PID tracing in config.
      - Detect pid in VMID for kernel running at EL2.

  Arch specific updates:

   - MIPS:
      - Support MIPS unwinding and dwarf-regs.
      - Generate mips syscalls_n64.c syscall table.

   - PowerPC:
      - Add support for PERF_SAMPLE_WEIGH_STRUCT on PowerPC.
      - Support pipeline stage cycles for powerpc.

  libbeauty:

   - Fix fsconfig generator"

* tag 'perf-tools-for-v5.13-2021-04-29' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (132 commits)
  perf build: Defer printing detected features to the end of all feature checks
  tools build: Allow deferring printing the results of feature detection
  perf build: Regenerate the FEATURE_DUMP file after extra feature checks
  perf session: Dump PERF_RECORD_TIME_CONV event
  perf session: Add swap operation for event TIME_CONV
  perf jit: Let convert_timestamp() to be backwards-compatible
  perf tools: Change fields type in perf_record_time_conv
  perf tools: Enable libtraceevent dynamic linking
  perf Documentation: Document intel-hybrid support
  perf tests: Skip 'perf stat metrics (shadow stat) test' for hybrid
  perf tests: Support 'Convert perf time to TSC' test for hybrid
  perf tests: Support 'Session topology' test for hybrid
  perf tests: Support 'Parse and process metrics' test for hybrid
  perf tests: Support 'Track with sched_switch' test for hybrid
  perf tests: Skip 'Setup struct perf_event_attr' test for hybrid
  perf tests: Add hybrid cases for 'Roundtrip evsel->name' test
  perf tests: Add hybrid cases for 'Parse event definition strings' test
  perf record: Uniquify hybrid event name
  perf stat: Warn group events from different hybrid PMU
  perf stat: Filter out unmatched aggregation for hybrid event
  ...
2021-05-01 12:22:38 -07:00
Jin Yao 4127361191 tools headers uapi: Update tools's copy of linux/perf_event.h
To get the changes in:

Liang Kan's patch

  55bcf6ef31 ("perf: Extend PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE")

Kan's patch is in the tip/perf/core branch.

So the next perf tool patches need this interface for hybrid support.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210427070139.25256-2-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-04-29 10:30:59 -03:00
David S. Miller 5f6c2f536d Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2021-04-23

The following pull-request contains BPF updates for your *net-next* tree.

We've added 69 non-merge commits during the last 22 day(s) which contain
a total of 69 files changed, 3141 insertions(+), 866 deletions(-).

The main changes are:

1) Add BPF static linker support for extern resolution of global, from Andrii.

2) Refine retval for bpf_get_task_stack helper, from Dave.

3) Add a bpf_snprintf helper, from Florent.

4) A bunch of miscellaneous improvements from many developers.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-25 18:02:32 -07:00
Florent Revest 7b15523a98 bpf: Add a bpf_snprintf helper
The implementation takes inspiration from the existing bpf_trace_printk
helper but there are a few differences:

To allow for a large number of format-specifiers, parameters are
provided in an array, like in bpf_seq_printf.

Because the output string takes two arguments and the array of
parameters also takes two arguments, the format string needs to fit in
one argument. Thankfully, ARG_PTR_TO_CONST_STR is guaranteed to point to
a zero-terminated read-only map so we don't need a format string length
arg.

Because the format-string is known at verification time, we also do
a first pass of format string validation in the verifier logic. This
makes debugging easier.

Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210419155243.1632274-4-revest@chromium.org
2021-04-19 15:27:36 -07:00
Toke Høiland-Jørgensen 441e8c66b2 bpf: Return target info when a tracing bpf_link is queried
There is currently no way to discover the target of a tracing program
attachment after the fact. Add this information to bpf_link_info and return
it when querying the bpf_link fd.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210413091607.58945-1-toke@redhat.com
2021-04-13 18:18:57 -07:00
Pedro Tammela 5c50732900 libbpf: Clarify flags in ringbuf helpers
In 'bpf_ringbuf_reserve()' we require the flag to '0' at the moment.

For 'bpf_ringbuf_{discard,submit,output}' a flag of '0' might send a
notification to the process if needed.

Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210412192434.944343-1-pctammela@mojatatu.com
2021-04-12 21:28:33 -07:00
Daniel Borkmann cbaa683bb3 bpf: Sync bpf headers in tooling infrastucture
Synchronize tools/include/uapi/linux/bpf.h which was missing changes
from various commits:

  - f3c45326ee ("bpf: Document PROG_TEST_RUN limitations")
  - e5e35e754c ("bpf: BPF-helper for MTU checking add length input")

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2021-04-12 17:31:09 +02:00
Jakub Kicinski 8859a44ea0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Conflicts:

MAINTAINERS
 - keep Chandrasekar
drivers/net/ethernet/mellanox/mlx5/core/en_main.c
 - simple fix + trust the code re-added to param.c in -next is fine
include/linux/bpf.h
 - trivial
include/linux/ethtool.h
 - trivial, fix kdoc while at it
include/linux/skmsg.h
 - move to relevant place in tcp.c, comment re-wrapped
net/core/skmsg.c
 - add the sk = sk // sk = NULL around calls
net/tipc/crypto.c
 - trivial

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-04-09 20:48:35 -07:00
Cong Wang a7ba4558e6 sock_map: Introduce BPF_SK_SKB_VERDICT
Reusing BPF_SK_SKB_STREAM_VERDICT is possible but its name is
confusing and more importantly we still want to distinguish them
from user-space. So we can just reuse the stream verdict code but
introduce a new type of eBPF program, skb_verdict. Users are not
allowed to attach stream_verdict and skb_verdict programs to the
same map.

Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210331023237.41094-10-xiyou.wangcong@gmail.com
2021-04-01 10:56:14 -07:00
Martin KaFai Lau e6ac2450d6 bpf: Support bpf program calling kernel function
This patch adds support to BPF verifier to allow bpf program calling
kernel function directly.

The use case included in this set is to allow bpf-tcp-cc to directly
call some tcp-cc helper functions (e.g. "tcp_cong_avoid_ai()").  Those
functions have already been used by some kernel tcp-cc implementations.

This set will also allow the bpf-tcp-cc program to directly call the
kernel tcp-cc implementation,  For example, a bpf_dctcp may only want to
implement its own dctcp_cwnd_event() and reuse other dctcp_*() directly
from the kernel tcp_dctcp.c instead of reimplementing (or
copy-and-pasting) them.

The tcp-cc kernel functions mentioned above will be white listed
for the struct_ops bpf-tcp-cc programs to use in a later patch.
The white listed functions are not bounded to a fixed ABI contract.
Those functions have already been used by the existing kernel tcp-cc.
If any of them has changed, both in-tree and out-of-tree kernel tcp-cc
implementations have to be changed.  The same goes for the struct_ops
bpf-tcp-cc programs which have to be adjusted accordingly.

This patch is to make the required changes in the bpf verifier.

First change is in btf.c, it adds a case in "btf_check_func_arg_match()".
When the passed in "btf->kernel_btf == true", it means matching the
verifier regs' states with a kernel function.  This will handle the
PTR_TO_BTF_ID reg.  It also maps PTR_TO_SOCK_COMMON, PTR_TO_SOCKET,
and PTR_TO_TCP_SOCK to its kernel's btf_id.

In the later libbpf patch, the insn calling a kernel function will
look like:

insn->code == (BPF_JMP | BPF_CALL)
insn->src_reg == BPF_PSEUDO_KFUNC_CALL /* <- new in this patch */
insn->imm == func_btf_id /* btf_id of the running kernel */

[ For the future calling function-in-kernel-module support, an array
  of module btf_fds can be passed at the load time and insn->off
  can be used to index into this array. ]

At the early stage of verifier, the verifier will collect all kernel
function calls into "struct bpf_kfunc_desc".  Those
descriptors are stored in "prog->aux->kfunc_tab" and will
be available to the JIT.  Since this "add" operation is similar
to the current "add_subprog()" and looking for the same insn->code,
they are done together in the new "add_subprog_and_kfunc()".

In the "do_check()" stage, the new "check_kfunc_call()" is added
to verify the kernel function call instruction:
1. Ensure the kernel function can be used by a particular BPF_PROG_TYPE.
   A new bpf_verifier_ops "check_kfunc_call" is added to do that.
   The bpf-tcp-cc struct_ops program will implement this function in
   a later patch.
2. Call "btf_check_kfunc_args_match()" to ensure the regs can be
   used as the args of a kernel function.
3. Mark the regs' type, subreg_def, and zext_dst.

At the later do_misc_fixups() stage, the new fixup_kfunc_call()
will replace the insn->imm with the function address (relative
to __bpf_call_base).  If needed, the jit can find the btf_func_model
by calling the new bpf_jit_find_kfunc_model(prog, insn).
With the imm set to the function address, "bpftool prog dump xlated"
will be able to display the kernel function calls the same way as
it displays other bpf helper calls.

gpl_compatible program is required to call kernel function.

This feature currently requires JIT.

The verifier selftests are adjusted because of the changes in
the verbose log in add_subprog_and_kfunc().

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210325015142.1544736-1-kafai@fb.com
2021-03-26 20:41:51 -07:00
Arnaldo Carvalho de Melo 49f2675f5b tools headers UAPI: Sync linux/kvm.h with the kernel sources
To pick the changes in:

  30b5c851af ("KVM: x86/xen: Add support for vCPU runstate information")

That don't cause any change in tooling as it doesn't introduce any new
ioctl, just parameters to existing one.

This silences this perf build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h'
  diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-10 09:18:30 -03:00
David S. Miller c1acda9807 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2021-03-09

The following pull-request contains BPF updates for your *net-next* tree.

We've added 90 non-merge commits during the last 17 day(s) which contain
a total of 114 files changed, 5158 insertions(+), 1288 deletions(-).

The main changes are:

1) Faster bpf_redirect_map(), from Björn.

2) skmsg cleanup, from Cong.

3) Support for floating point types in BTF, from Ilya.

4) Documentation for sys_bpf commands, from Joe.

5) Support for sk_lookup in bpf_prog_test_run, form Lorenz.

6) Enable task local storage for tracing programs, from Song.

7) bpf_for_each_map_elem() helper, from Yonghong.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-09 18:07:05 -08:00
Linus Torvalds 05a59d7979 Merge git://git.kernel.org:/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:

 1) Fix transmissions in dynamic SMPS mode in ath9k, from Felix Fietkau.

 2) TX skb error handling fix in mt76 driver, also from Felix.

 3) Fix BPF_FETCH atomic in x86 JIT, from Brendan Jackman.

 4) Avoid double free of percpu pointers when freeing a cloned bpf prog.
    From Cong Wang.

 5) Use correct printf format for dma_addr_t in ath11k, from Geert
    Uytterhoeven.

 6) Fix resolve_btfids build with older toolchains, from Kun-Chuan
    Hsieh.

 7) Don't report truncated frames to mac80211 in mt76 driver, from
    Lorenzop Bianconi.

 8) Fix watcdog timeout on suspend/resume of stmmac, from Joakim Zhang.

 9) mscc ocelot needs NET_DEVLINK selct in Kconfig, from Arnd Bergmann.

10) Fix sign comparison bug in TCP_ZEROCOPY_RECEIVE getsockopt(), from
    Arjun Roy.

11) Ignore routes with deleted nexthop object in mlxsw, from Ido
    Schimmel.

12) Need to undo tcp early demux lookup sometimes in nf_nat, from
    Florian Westphal.

13) Fix gro aggregation for udp encaps with zero csum, from Daniel
    Borkmann.

14) Make sure to always use imp*_ndo_send when necessaey, from Jason A.
    Donenfeld.

15) Fix TRSCER masks in sh_eth driver from Sergey Shtylyov.

16) prevent overly huge skb allocationsd in qrtr, from Pavel Skripkin.

17) Prevent rx ring copnsumer index loss of sync in enetc, from Vladimir
    Oltean.

18) Make sure textsearch copntrol block is large enough, from Wilem de
    Bruijn.

19) Revert MAC changes to r8152 leading to instability, from Hates Wang.

20) Advance iov in 9p even for empty reads, from Jissheng Zhang.

21) Double hook unregister in nftables, from PabloNeira Ayuso.

22) Fix memleak in ixgbe, fropm Dinghao Liu.

23) Avoid dups in pkt scheduler class dumps, from Maximilian Heyne.

24) Various mptcp fixes from Florian Westphal, Paolo Abeni, and Geliang
    Tang.

25) Fix DOI refcount bugs in cipso, from Paul Moore.

26) One too many irqsave in ibmvnic, from Junlin Yang.

27) Fix infinite loop with MPLS gso segmenting via virtio_net, from
    Balazs Nemeth.

* git://git.kernel.org:/pub/scm/linux/kernel/git/netdev/net: (164 commits)
  s390/qeth: fix notification for pending buffers during teardown
  s390/qeth: schedule TX NAPI on QAOB completion
  s390/qeth: improve completion of pending TX buffers
  s390/qeth: fix memory leak after failed TX Buffer allocation
  net: avoid infinite loop in mpls_gso_segment when mpls_hlen == 0
  net: check if protocol extracted by virtio_net_hdr_set_proto is correct
  net: dsa: xrs700x: check if partner is same as port in hsr join
  net: lapbether: Remove netif_start_queue / netif_stop_queue
  atm: idt77252: fix null-ptr-dereference
  atm: uPD98402: fix incorrect allocation
  atm: fix a typo in the struct description
  net: qrtr: fix error return code of qrtr_sendmsg()
  mptcp: fix length of ADD_ADDR with port sub-option
  net: bonding: fix error return code of bond_neigh_init()
  net: enetc: allow hardware timestamping on TX queues with tc-etf enabled
  net: enetc: set MAC RX FIFO to recommended value
  net: davicom: Use platform_get_irq_optional()
  net: davicom: Fix regulator not turned off on driver removal
  net: davicom: Fix regulator not turned off on failed probe
  net: dsa: fix switchdev objects on bridge master mistakenly being applied on ports
  ...
2021-03-09 17:15:56 -08:00
Arnaldo Carvalho de Melo 743108e104 tools headers: Update syscall.tbl files to support mount_setattr
To pick the changes from:

  9caccd4154 ("fs: introduce MOUNT_ATTR_IDMAP")

This adds this new syscall to the tables used by tools such as 'perf
trace', so that one can specify it by name and have it filtered, etc.

Addressing these perf build warnings:

  Warning: Kernel ABI header at 'tools/perf/arch/x86/entry/syscalls/syscall_64.tbl' differs from latest version at 'arch/x86/entry/syscalls/syscall_64.tbl'
  diff -u tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl
  Warning: Kernel ABI header at 'tools/perf/arch/powerpc/entry/syscalls/syscall.tbl' differs from latest version at 'arch/powerpc/kernel/syscalls/syscall.tbl'
  diff -u tools/perf/arch/powerpc/entry/syscalls/syscall.tbl arch/powerpc/kernel/syscalls/syscall.tbl
  Warning: Kernel ABI header at 'tools/perf/arch/s390/entry/syscalls/syscall.tbl' differs from latest version at 'arch/s390/kernel/syscalls/syscall.tbl'
  diff -u tools/perf/arch/s390/entry/syscalls/syscall.tbl arch/s390/kernel/syscalls/syscall.tbl

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/YD6Wsxr9ByUbab/a@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:54:30 -03:00
Arnaldo Carvalho de Melo 21b7e35bdf tools headers UAPI: Sync kvm.h headers with the kernel sources
To pick the changes in:

  d9a47edabc ("KVM: PPC: Book3S HV: Introduce new capability for 2nd DAWR")
  8d4e7e8083 ("KVM: x86: declare Xen HVM shared info capability and add test case")
  40da8ccd72 ("KVM: x86/xen: Add event channel interrupt vector upcall")

These new IOCTLs are now supported on 'perf trace':

  $ tools/perf/trace/beauty/kvm_ioctl.sh > before
  $ cp include/uapi/linux/kvm.h tools/include/uapi/linux/kvm.h
  $ tools/perf/trace/beauty/kvm_ioctl.sh > after
  $ diff -u before after
  --- before	2021-02-23 09:55:46.229058308 -0300
  +++ after	2021-02-23 09:55:57.509308058 -0300
  @@ -91,6 +91,10 @@
   	[0xc1] = "GET_SUPPORTED_HV_CPUID",
   	[0xc6] = "X86_SET_MSR_FILTER",
   	[0xc7] = "RESET_DIRTY_RINGS",
  +	[0xc8] = "XEN_HVM_GET_ATTR",
  +	[0xc9] = "XEN_HVM_SET_ATTR",
  +	[0xca] = "XEN_VCPU_GET_ATTR",
  +	[0xcb] = "XEN_VCPU_SET_ATTR",
   	[0xe0] = "CREATE_DEVICE",
   	[0xe1] = "SET_DEVICE_ATTR",
   	[0xe2] = "GET_DEVICE_ATTR",
  $

Addressing this perf build warning:
  Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h'
  diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h

Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:54:23 -03:00
Arnaldo Carvalho de Melo 1e61463cfc tools headers UAPI: Sync openat2.h with the kernel sources
To pick the changes in:

  99668f6180 ("fs: expose LOOKUP_CACHED through openat2() RESOLVE_CACHED")

That don't result in any change in tooling, only silences this perf
build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/linux/openat2.h' differs from latest version at 'include/uapi/linux/openat2.h'
  diff -u tools/include/uapi/linux/openat2.h include/uapi/linux/openat2.h

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:54:22 -03:00
Xuesen Huang d01b59c9ae bpf: Add bpf_skb_adjust_room flag BPF_F_ADJ_ROOM_ENCAP_L2_ETH
bpf_skb_adjust_room sets the inner_protocol as skb->protocol for packets
encapsulation. But that is not appropriate when pushing Ethernet header.

Add an option to further specify encap L2 type and set the inner_protocol
as ETH_P_TEB.

Suggested-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Xuesen Huang <huangxuesen@kuaishou.com>
Signed-off-by: Zhiyong Cheng <chengzhiyong@kuaishou.com>
Signed-off-by: Li Wang <wangli09@kuaishou.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/bpf/20210304064046.6232-1-hxseverything@gmail.com
2021-03-05 16:59:00 +01:00
Lorenz Bauer 7c32e8f8bc bpf: Add PROG_TEST_RUN support for sk_lookup programs
Allow to pass sk_lookup programs to PROG_TEST_RUN. User space
provides the full bpf_sk_lookup struct as context. Since the
context includes a socket pointer that can't be exposed
to user space we define that PROG_TEST_RUN returns the cookie
of the selected socket or zero in place of the socket pointer.

We don't support testing programs that select a reuseport socket,
since this would mean running another (unrelated) BPF program
from the sk_lookup test handler.

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210303101816.36774-3-lmb@cloudflare.com
2021-03-04 19:11:29 -08:00
Joe Stringer 242029f426 tools: Sync uapi bpf.h header with latest changes
Synchronize the header after all of the recent changes.

Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210302171947.2268128-16-joe@cilium.io
2021-03-04 18:39:46 -08:00
Joe Stringer 923a932c98 scripts/bpf: Abstract eBPF API target parameter
Abstract out the target parameter so that upcoming commits, more than
just the existing "helpers" target can be called to generate specific
portions of docs from the eBPF UAPI headers.

Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210302171947.2268128-10-joe@cilium.io
2021-03-04 18:39:45 -08:00
Ilya Leoshkevich 8fd886911a bpf: Add BTF_KIND_FLOAT to uapi
Add a new kind value and expand the kind bitfield.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210226202256.116518-2-iii@linux.ibm.com
2021-03-04 17:58:15 -08:00
Yonghong Song 69c087ba62 bpf: Add bpf_for_each_map_elem() helper
The bpf_for_each_map_elem() helper is introduced which
iterates all map elements with a callback function. The
helper signature looks like
  long bpf_for_each_map_elem(map, callback_fn, callback_ctx, flags)
and for each map element, the callback_fn will be called. For example,
like hashmap, the callback signature may look like
  long callback_fn(map, key, val, callback_ctx)

There are two known use cases for this. One is from upstream ([1]) where
a for_each_map_elem helper may help implement a timeout mechanism
in a more generic way. Another is from our internal discussion
for a firewall use case where a map contains all the rules. The packet
data can be compared to all these rules to decide allow or deny
the packet.

For array maps, users can already use a bounded loop to traverse
elements. Using this helper can avoid using bounded loop. For other
type of maps (e.g., hash maps) where bounded loop is hard or
impossible to use, this helper provides a convenient way to
operate on all elements.

For callback_fn, besides map and map element, a callback_ctx,
allocated on caller stack, is also passed to the callback
function. This callback_ctx argument can provide additional
input and allow to write to caller stack for output.

If the callback_fn returns 0, the helper will iterate through next
element if available. If the callback_fn returns 1, the helper
will stop iterating and returns to the bpf program. Other return
values are not used for now.

Currently, this helper is only available with jit. It is possible
to make it work with interpreter with so effort but I leave it
as the future work.

[1]: https://lore.kernel.org/bpf/20210122205415.113822-1-xiyou.wangcong@gmail.com/

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210226204925.3884923-1-yhs@fb.com
2021-02-26 13:23:52 -08:00
Jakub Kicinski 9e8e714f2d Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Alexei Starovoitov says:

====================
pull-request: bpf 2021-02-26

1) Fix for bpf atomic insns with src_reg=r0, from Brendan.

2) Fix use after free due to bpf_prog_clone, from Cong.

3) Drop imprecise verifier log message, from Dmitrii.

4) Remove incorrect blank line in bpf helper description, from Hangbin.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests/bpf: No need to drop the packet when there is no geneve opt
  bpf: Remove blank line in bpf helper description comment
  tools/resolve_btfids: Fix build error with older host toolchains
  selftests/bpf: Fix a compiler warning in global func test
  bpf: Drop imprecise log message
  bpf: Clear percpu pointers in bpf_prog_clone_free()
  bpf: Fix a warning message in mark_ptr_not_null_reg()
  bpf, x86: Fix BPF_FETCH atomic and/or/xor with r0 as src
====================

Link: https://lore.kernel.org/r/20210226193737.57004-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-26 13:16:31 -08:00
Hangbin Liu a83586a7dd bpf: Remove blank line in bpf helper description comment
Commit 34b2021cc6 ("bpf: Add BPF-helper for MTU checking") added an extra
blank line in bpf helper description. This will make bpf_helpers_doc.py stop
building bpf_helper_defs.h immediately after bpf_check_mtu(), which will
affect future added functions.

Fixes: 34b2021cc6 ("bpf: Add BPF-helper for MTU checking")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/20210223131457.1378978-1-liuhangbin@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-02-26 12:18:12 -08:00
Hangbin Liu a7c9c25a99 bpf: Remove blank line in bpf helper description comment
Commit 34b2021cc6 ("bpf: Add BPF-helper for MTU checking") added an extra
blank line in bpf helper description. This will make bpf_helpers_doc.py stop
building bpf_helper_defs.h immediately after bpf_check_mtu(), which will
affect future added functions.

Fixes: 34b2021cc6 ("bpf: Add BPF-helper for MTU checking")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/20210223131457.1378978-1-liuhangbin@gmail.com
2021-02-24 17:20:21 +01:00
Linus Torvalds 3a36281a17 New features:
- Support instruction latency in 'perf report', with both memory latency
   (weight) and instruction latency information, users can locate expensive load
   instructions and understand time spent in different stages.
 
 - Extend 'perf c2c' to display the number of loads which were blocked by data
   or address conflict.
 
 - Add 'perf stat' support for L2 topdown events in systems such as Intel's
   Sapphire rapids server.
 
 - Add support for PERF_SAMPLE_CODE_PAGE_SIZE in various tools, as a sort key, for instance:
 
     perf report --stdio --sort=comm,symbol,code_page_size
 
 - New 'perf daemon' command to run long running sessions while providing a way to control
   the enablement of events without restarting a traditional 'perf record' session.
 
 - Enable counting events for BPF programs in 'perf stat' just like for other
   targets (tid, cgroup, cpu, etc), e.g.:
 
       # perf stat -e ref-cycles,cycles -b 254 -I 1000
          1.487903822            115,200      ref-cycles
          1.487903822             86,012      cycles
          2.489147029             80,560      ref-cycles
          2.489147029             73,784      cycles
       ^C#
 
   The example above counts 'cycles' and 'ref-cycles' of BPF program of id 254.
   It is similar to bpftool-prog-profile command, but more flexible.
 
 - Support the new layout for PERF_RECORD_MMAP2 to carry the DSO build-id using infrastructure
   generalised from the eBPF subsystem, removing the need for traversing the perf.data file
   to collect build-ids at the end of 'perf record' sessions and helping with long running
   sessions where binaries can get replaced in updates, leading to possible mis-resolution
   of symbols.
 
 - Support filtering by hex address in 'perf script'.
 
 - Support DSO filter in 'perf script', like in other perf tools.
 
 - Add namespaces support to 'perf inject'
 
 - Add support for SDT (Dtrace Style Markers) events on ARM64.
 
 perf record:
 
 - Fix handling of eventfd() when draining a buffer in 'perf record'.
 
 - Improvements to the generation of metadata events for pre-existing threads (mmaps, comm, etc),
   speeding up the work done at the start of system wide or per CPU 'perf record' sessions.
 
 Hardware tracing:
 
 - Initial support for tracing KVM with Intel PT.
 
 - Intel PT fixes for IPC
 
 - Support Intel PT PSB (synchronization packets) events.
 
 - Automatically group aux-output events to overcome --filter syntax.
 
 - Enable PERF_SAMPLE_DATA_SRC on ARMs SPE.
 
 - Update ARM's CoreSight hardware tracing OpenCSD library to v1.0.0.
 
 perf annotate TUI:
 
 - Fix handling of 'k' ("show line number") hotkey
 
 - Fix jump parsing for C++ code.
 
 perf probe:
 
 - Add protection to avoid endless loop.
 
 cgroups:
 
 - Avoid reading cgroup mountpoint multiple times, caching it.
 
 - Fix handling of cgroup v1/v2 in mixed hierarchy.
 
 Symbol resolving:
 
 - Add OCaml symbol demangling.
 
 - Further fixes for handling PE executables when using perf with Wine and .exe/.dll files.
 
 - Fix 'perf unwind' DSO handling.
 
 - Resolve symbols against debug file first, to deal with artifacts related to LTO.
 
 - Fix gap between kernel end and module start on powerpc.
 
 Reporting tools:
 
 - The DSO filter shouldn't show samples in unresolved maps.
 
 - Improve debuginfod support in various tools.
 
 build ids:
 
 - Fix 16-byte build ids in 'perf buildid-cache', add a 'perf test' entry for that case.
 
 perf test:
 
 - Support for PERF_SAMPLE_WEIGHT_STRUCT.
 
 - Add test case for PERF_SAMPLE_CODE_PAGE_SIZE.
 
 - Shell based tests for 'perf daemon's commands ('start', 'stop, 'reconfig', 'list', etc).
 
 - ARM cs-etm 'perf test' fixes.
 
 - Add parse-metric memory bandwidth testcase.
 
 Compiler related:
 
 - Fix 'perf probe' kretprobe issue caused by gcc 11 bug when used with -fpatchable-function-entry.
 
 - Fix ARM64 build with gcc 11's -Wformat-overflow.
 
 - Fix unaligned access in sample parsing test.
 
 - Fix printf conversion specifier for IP addresses on arm64, s390 and powerpc.
 
 Arch specific:
 
 - Support exposing Performance Monitor Counter SPRs as part of extended regs on powerpc.
 
 - Add JSON 'perf stat' metrics for ARM64's imx8mp, imx8mq and imx8mn DDR, fix imx8mm ones.
 
 - Fix common and uarch events for ARM64's A76 and Ampere eMag
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCYDANTQAKCRCyPKLppCJ+
 J4veAQCISY1BPHscUTRYhq9cwU/Zs0ImtX7zDT4jxaP39JkduAD/eSqYavAJrtQh
 HDyEiTgZ7CQSp5eCbXkzrnet4n3G9QE=
 =H/Jk
 -----END PGP SIGNATURE-----

Merge tag 'perf-tools-for-v5.12-2020-02-19' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tool updates from Arnaldo Carvalho de Melo:
 "New features:

   - Support instruction latency in 'perf report', with both memory
     latency (weight) and instruction latency information, users can
     locate expensive load instructions and understand time spent in
     different stages.

   - Extend 'perf c2c' to display the number of loads which were blocked
     by data or address conflict.

   - Add 'perf stat' support for L2 topdown events in systems such as
     Intel's Sapphire rapids server.

   - Add support for PERF_SAMPLE_CODE_PAGE_SIZE in various tools, as a
     sort key, for instance:

        perf report --stdio --sort=comm,symbol,code_page_size

   - New 'perf daemon' command to run long running sessions while
     providing a way to control the enablement of events without
     restarting a traditional 'perf record' session.

   - Enable counting events for BPF programs in 'perf stat' just like
     for other targets (tid, cgroup, cpu, etc), e.g.:

        # perf stat -e ref-cycles,cycles -b 254 -I 1000
           1.487903822            115,200      ref-cycles
           1.487903822             86,012      cycles
           2.489147029             80,560      ref-cycles
           2.489147029             73,784      cycles
        ^C

     The example above counts 'cycles' and 'ref-cycles' of BPF program
     of id 254. It is similar to bpftool-prog-profile command, but more
     flexible.

   - Support the new layout for PERF_RECORD_MMAP2 to carry the DSO
     build-id using infrastructure generalised from the eBPF subsystem,
     removing the need for traversing the perf.data file to collect
     build-ids at the end of 'perf record' sessions and helping with
     long running sessions where binaries can get replaced in updates,
     leading to possible mis-resolution of symbols.

   - Support filtering by hex address in 'perf script'.

   - Support DSO filter in 'perf script', like in other perf tools.

   - Add namespaces support to 'perf inject'

   - Add support for SDT (Dtrace Style Markers) events on ARM64.

  perf record:

   - Fix handling of eventfd() when draining a buffer in 'perf record'.

   - Improvements to the generation of metadata events for pre-existing
     threads (mmaps, comm, etc), speeding up the work done at the start
     of system wide or per CPU 'perf record' sessions.

  Hardware tracing:

   - Initial support for tracing KVM with Intel PT.

   - Intel PT fixes for IPC

   - Support Intel PT PSB (synchronization packets) events.

   - Automatically group aux-output events to overcome --filter syntax.

   - Enable PERF_SAMPLE_DATA_SRC on ARMs SPE.

   - Update ARM's CoreSight hardware tracing OpenCSD library to v1.0.0.

  perf annotate TUI:

   - Fix handling of 'k' ("show line number") hotkey

   - Fix jump parsing for C++ code.

  perf probe:

   - Add protection to avoid endless loop.

  cgroups:

   - Avoid reading cgroup mountpoint multiple times, caching it.

   - Fix handling of cgroup v1/v2 in mixed hierarchy.

  Symbol resolving:

   - Add OCaml symbol demangling.

   - Further fixes for handling PE executables when using perf with Wine
     and .exe/.dll files.

   - Fix 'perf unwind' DSO handling.

   - Resolve symbols against debug file first, to deal with artifacts
     related to LTO.

   - Fix gap between kernel end and module start on powerpc.

  Reporting tools:

   - The DSO filter shouldn't show samples in unresolved maps.

   - Improve debuginfod support in various tools.

  build ids:

   - Fix 16-byte build ids in 'perf buildid-cache', add a 'perf test'
     entry for that case.

  perf test:

   - Support for PERF_SAMPLE_WEIGHT_STRUCT.

   - Add test case for PERF_SAMPLE_CODE_PAGE_SIZE.

   - Shell based tests for 'perf daemon's commands ('start', 'stop,
     'reconfig', 'list', etc).

   - ARM cs-etm 'perf test' fixes.

   - Add parse-metric memory bandwidth testcase.

  Compiler related:

   - Fix 'perf probe' kretprobe issue caused by gcc 11 bug when used
     with -fpatchable-function-entry.

   - Fix ARM64 build with gcc 11's -Wformat-overflow.

   - Fix unaligned access in sample parsing test.

   - Fix printf conversion specifier for IP addresses on arm64, s390 and
     powerpc.

  Arch specific:

   - Support exposing Performance Monitor Counter SPRs as part of
     extended regs on powerpc.

   - Add JSON 'perf stat' metrics for ARM64's imx8mp, imx8mq and imx8mn
     DDR, fix imx8mm ones.

   - Fix common and uarch events for ARM64's A76 and Ampere eMag"

* tag 'perf-tools-for-v5.12-2020-02-19' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (148 commits)
  perf buildid-cache: Don't skip 16-byte build-ids
  perf buildid-cache: Add test for 16-byte build-id
  perf symbol: Remove redundant libbfd checks
  perf test: Output the sub testing result in cs-etm
  perf test: Suppress logs in cs-etm testing
  perf tools: Fix arm64 build error with gcc-11
  perf intel-pt: Add documentation for tracing virtual machines
  perf intel-pt: Split VM-Entry and VM-Exit branches
  perf intel-pt: Adjust sample flags for VM-Exit
  perf intel-pt: Allow for a guest kernel address filter
  perf intel-pt: Support decoding of guest kernel
  perf machine: Factor out machine__idle_thread()
  perf machine: Factor out machines__find_guest()
  perf intel-pt: Amend decoder to track the NR flag
  perf intel-pt: Retain the last PIP packet payload as is
  perf intel_pt: Add vmlaunch and vmresume as branches
  perf script: Add branch types for VM-Entry and VM-Exit
  perf auxtrace: Automatically group aux-output events
  perf test: Fix unaligned access in sample parsing test
  perf tools: Support arch specific PERF_SAMPLE_WEIGHT_STRUCT processing
  ...
2021-02-22 13:59:43 -08:00
Linus Torvalds 3e10585335 x86:
- Support for userspace to emulate Xen hypercalls
 - Raise the maximum number of user memslots
 - Scalability improvements for the new MMU.  Instead of the complex
   "fast page fault" logic that is used in mmu.c, tdp_mmu.c uses an
   rwlock so that page faults are concurrent, but the code that can run
   against page faults is limited.  Right now only page faults take the
   lock for reading; in the future this will be extended to some
   cases of page table destruction.  I hope to switch the default MMU
   around 5.12-rc3 (some testing was delayed due to Chinese New Year).
 - Cleanups for MAXPHYADDR checks
 - Use static calls for vendor-specific callbacks
 - On AMD, use VMLOAD/VMSAVE to save and restore host state
 - Stop using deprecated jump label APIs
 - Workaround for AMD erratum that made nested virtualization unreliable
 - Support for LBR emulation in the guest
 - Support for communicating bus lock vmexits to userspace
 - Add support for SEV attestation command
 - Miscellaneous cleanups
 
 PPC:
 - Support for second data watchpoint on POWER10
 - Remove some complex workarounds for buggy early versions of POWER9
 - Guest entry/exit fixes
 
 ARM64
 - Make the nVHE EL2 object relocatable
 - Cleanups for concurrent translation faults hitting the same page
 - Support for the standard TRNG hypervisor call
 - A bunch of small PMU/Debug fixes
 - Simplification of the early init hypercall handling
 
 Non-KVM changes (with acks):
 - Detection of contended rwlocks (implemented only for qrwlocks,
   because KVM only needs it for x86)
 - Allow __DISABLE_EXPORTS from assembly code
 - Provide a saner follow_pfn replacements for modules
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmApSRgUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroOc7wf9FnlinKoTFaSk7oeuuhF/CoCVwSFs
 Z9+A2sNI99tWHQxFR6dyDkEFeQoXnqSxfLHtUVIdH/JnTg0FkEvFz3NK+0PzY1PF
 PnGNbSoyhP58mSBG4gbBAxdF3ZJZMB8GBgYPeR62PvMX2dYbcHqVBNhlf6W4MQK4
 5mAUuAnbf19O5N267sND+sIg3wwJYwOZpRZB7PlwvfKAGKf18gdBz5dQ/6Ej+apf
 P7GODZITjqM5Iho7SDm/sYJlZprFZT81KqffwJQHWFMEcxFgwzrnYPx7J3gFwRTR
 eeh9E61eCBDyCTPpHROLuNTVBqrAioCqXLdKOtO5gKvZI3zmomvAsZ8uXQ==
 =uFZU
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "x86:

   - Support for userspace to emulate Xen hypercalls

   - Raise the maximum number of user memslots

   - Scalability improvements for the new MMU.

     Instead of the complex "fast page fault" logic that is used in
     mmu.c, tdp_mmu.c uses an rwlock so that page faults are concurrent,
     but the code that can run against page faults is limited. Right now
     only page faults take the lock for reading; in the future this will
     be extended to some cases of page table destruction. I hope to
     switch the default MMU around 5.12-rc3 (some testing was delayed
     due to Chinese New Year).

   - Cleanups for MAXPHYADDR checks

   - Use static calls for vendor-specific callbacks

   - On AMD, use VMLOAD/VMSAVE to save and restore host state

   - Stop using deprecated jump label APIs

   - Workaround for AMD erratum that made nested virtualization
     unreliable

   - Support for LBR emulation in the guest

   - Support for communicating bus lock vmexits to userspace

   - Add support for SEV attestation command

   - Miscellaneous cleanups

  PPC:

   - Support for second data watchpoint on POWER10

   - Remove some complex workarounds for buggy early versions of POWER9

   - Guest entry/exit fixes

  ARM64:

   - Make the nVHE EL2 object relocatable

   - Cleanups for concurrent translation faults hitting the same page

   - Support for the standard TRNG hypervisor call

   - A bunch of small PMU/Debug fixes

   - Simplification of the early init hypercall handling

  Non-KVM changes (with acks):

   - Detection of contended rwlocks (implemented only for qrwlocks,
     because KVM only needs it for x86)

   - Allow __DISABLE_EXPORTS from assembly code

   - Provide a saner follow_pfn replacements for modules"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (192 commits)
  KVM: x86/xen: Explicitly pad struct compat_vcpu_info to 64 bytes
  KVM: selftests: Don't bother mapping GVA for Xen shinfo test
  KVM: selftests: Fix hex vs. decimal snafu in Xen test
  KVM: selftests: Fix size of memslots created by Xen tests
  KVM: selftests: Ignore recently added Xen tests' build output
  KVM: selftests: Add missing header file needed by xAPIC IPI tests
  KVM: selftests: Add operand to vmsave/vmload/vmrun in svm.c
  KVM: SVM: Make symbol 'svm_gp_erratum_intercept' static
  locking/arch: Move qrwlock.h include after qspinlock.h
  KVM: PPC: Book3S HV: Fix host radix SLB optimisation with hash guests
  KVM: PPC: Book3S HV: Ensure radix guest has no SLB entries
  KVM: PPC: Don't always report hash MMU capability for P9 < DD2.2
  KVM: PPC: Book3S HV: Save and restore FSCR in the P9 path
  KVM: PPC: remove unneeded semicolon
  KVM: PPC: Book3S HV: Use POWER9 SLBIA IH=6 variant to clear SLB
  KVM: PPC: Book3S HV: No need to clear radix host SLB before loading HPT guest
  KVM: PPC: Book3S HV: Fix radix guest SLB side channel
  KVM: PPC: Book3S HV: Remove support for running HPT guest on RPT host without mixed mode support
  KVM: PPC: Book3S HV: Introduce new capability for 2nd DAWR
  KVM: PPC: Book3S HV: Add infrastructure to support 2nd DAWR
  ...
2021-02-21 13:31:43 -08:00
David S. Miller b8af417e4d Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2021-02-16

The following pull-request contains BPF updates for your *net-next* tree.

There's a small merge conflict between 7eeba1706e ("tcp: Add receive timestamp
support for receive zerocopy.") from net-next tree and 9cacf81f81 ("bpf: Remove
extra lock_sock for TCP_ZEROCOPY_RECEIVE") from bpf-next tree. Resolve as follows:

  [...]
                lock_sock(sk);
                err = tcp_zerocopy_receive(sk, &zc, &tss);
                err = BPF_CGROUP_RUN_PROG_GETSOCKOPT_KERN(sk, level, optname,
                                                          &zc, &len, err);
                release_sock(sk);
  [...]

We've added 116 non-merge commits during the last 27 day(s) which contain
a total of 156 files changed, 5662 insertions(+), 1489 deletions(-).

The main changes are:

1) Adds support of pointers to types with known size among global function
   args to overcome the limit on max # of allowed args, from Dmitrii Banshchikov.

2) Add bpf_iter for task_vma which can be used to generate information similar
   to /proc/pid/maps, from Song Liu.

3) Enable bpf_{g,s}etsockopt() from all sock_addr related program hooks. Allow
   rewriting bind user ports from BPF side below the ip_unprivileged_port_start
   range, both from Stanislav Fomichev.

4) Prevent recursion on fentry/fexit & sleepable programs and allow map-in-map
   as well as per-cpu maps for the latter, from Alexei Starovoitov.

5) Add selftest script to run BPF CI locally. Also enable BPF ringbuffer
   for sleepable programs, both from KP Singh.

6) Extend verifier to enable variable offset read/write access to the BPF
   program stack, from Andrei Matei.

7) Improve tc & XDP MTU handling and add a new bpf_check_mtu() helper to
   query device MTU from programs, from Jesper Dangaard Brouer.

8) Allow bpf_get_socket_cookie() helper also be called from [sleepable] BPF
   tracing programs, from Florent Revest.

9) Extend x86 JIT to pad JMPs with NOPs for helping image to converge when
   otherwise too many passes are required, from Gary Lin.

10) Verifier fixes on atomics with BPF_FETCH as well as function-by-function
    verification both related to zero-extension handling, from Ilya Leoshkevich.

11) Better kernel build integration of resolve_btfids tool, from Jiri Olsa.

12) Batch of AF_XDP selftest cleanups and small performance improvement
    for libbpf's xsk map redirect for newer kernels, from Björn Töpel.

13) Follow-up BPF doc and verifier improvements around atomics with
    BPF_FETCH, from Brendan Jackman.

14) Permit zero-sized data sections e.g. if ELF .rodata section contains
    read-only data from local variables, from Yonghong Song.

15) veth driver skb bulk-allocation for ndo_xdp_xmit, from Lorenzo Bianconi.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-16 13:14:06 -08:00
Arnaldo Carvalho de Melo c1bd8a2b9f Merge branch 'perf/urgent' into perf/core
To get some fixes that didn't made into 5.11.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-16 11:52:16 -03:00
Jesper Dangaard Brouer 34b2021cc6 bpf: Add BPF-helper for MTU checking
This BPF-helper bpf_check_mtu() works for both XDP and TC-BPF programs.

The SKB object is complex and the skb->len value (accessible from
BPF-prog) also include the length of any extra GRO/GSO segments, but
without taking into account that these GRO/GSO segments get added
transport (L4) and network (L3) headers before being transmitted. Thus,
this BPF-helper is created such that the BPF-programmer don't need to
handle these details in the BPF-prog.

The API is designed to help the BPF-programmer, that want to do packet
context size changes, which involves other helpers. These other helpers
usually does a delta size adjustment. This helper also support a delta
size (len_diff), which allow BPF-programmer to reuse arguments needed by
these other helpers, and perform the MTU check prior to doing any actual
size adjustment of the packet context.

It is on purpose, that we allow the len adjustment to become a negative
result, that will pass the MTU check. This might seem weird, but it's not
this helpers responsibility to "catch" wrong len_diff adjustments. Other
helpers will take care of these checks, if BPF-programmer chooses to do
actual size adjustment.

V14:
 - Improve man-page desc of len_diff.

V13:
 - Enforce flag BPF_MTU_CHK_SEGS cannot use len_diff.

V12:
 - Simplify segment check that calls skb_gso_validate_network_len.
 - Helpers should return long

V9:
- Use dev->hard_header_len (instead of ETH_HLEN)
- Annotate with unlikely req from Daniel
- Fix logic error using skb_gso_validate_network_len from Daniel

V6:
- Took John's advice and dropped BPF_MTU_CHK_RELAX
- Returned MTU is kept at L3-level (like fib_lookup)

V4: Lot of changes
 - ifindex 0 now use current netdev for MTU lookup
 - rename helper from bpf_mtu_check to bpf_check_mtu
 - fix bug for GSO pkt length (as skb->len is total len)
 - remove __bpf_len_adj_positive, simply allow negative len adj

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/161287790461.790810.3429728639563297353.stgit@firesoul
2021-02-13 01:15:28 +01:00
Jesper Dangaard Brouer e1850ea9bd bpf: bpf_fib_lookup return MTU value as output when looked up
The BPF-helpers for FIB lookup (bpf_xdp_fib_lookup and bpf_skb_fib_lookup)
can perform MTU check and return BPF_FIB_LKUP_RET_FRAG_NEEDED. The BPF-prog
don't know the MTU value that caused this rejection.

If the BPF-prog wants to implement PMTU (Path MTU Discovery) (rfc1191) it
need to know this MTU value for the ICMP packet.

Patch change lookup and result struct bpf_fib_lookup, to contain this MTU
value as output via a union with 'tot_len' as this is the value used for
the MTU lookup.

V5:
 - Fixed uninit value spotted by Dan Carpenter.
 - Name struct output member mtu_result

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/161287789952.790810.13134700381067698781.stgit@firesoul
2021-02-13 01:15:22 +01:00
Paolo Bonzini 8c6e67bec3 KVM/arm64 updates for Linux 5.12
- Make the nVHE EL2 object relocatable, resulting in much more
   maintainable code
 - Handle concurrent translation faults hitting the same page
   in a more elegant way
 - Support for the standard TRNG hypervisor call
 - A bunch of small PMU/Debug fixes
 - Allow the disabling of symbol export from assembly code
 - Simplification of the early init hypercall handling
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmAmjqEPHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpDoUEQAIrJ7YF4v4gz06a0HG9+b6fbmykHyxlG7jfm
 trvctfaiKzOybKoY5odPpNFzhbYOOdXXqYipyTHGwBYtGSy9G/9SjMKSUrfln2Ni
 lr1wBqapr9TE+SVKoR8pWWuZxGGbHVa7brNuMbMsMi1wwAsM2/n70H9PXrdq3QiK
 Ge1DWLso2oEfhtTwqNKa4dwB2MHjBhBFhhq+Nq5pslm6mmxJaYqz7pyBmw/C+2cc
 oU/6kpAa1yPAauptWXtYXJYOMHihxgEa1IdK3Gl0hUyFyu96xVkwH/KFsj+bRs23
 QGGCSdy4313hzaoGaSOTK22R98Aeg0wI9a6tcCBvVVjTAztnlu1FPtUZr8e/F7uc
 +r8xVJUJFiywt3Zktf/D7YDK9LuMMqFnj0BkI4U9nIBY59XZRNhENsBCmjru5lnL
 iXa5cuta03H4emfssIChLpgn0XHFas6t5dFXBPGbXyw0qsQchTw98iQX9LVxefUK
 rOUGPIN4nE9ESRIZe0SPlAVeCtNP8cLH7+0YG9MJ1QeDVYaUsnvy9Ln/ox+514mR
 5y2KJ6y7xnLB136SKCzPDDloYtz7BDiJq6a/RPiXKGheKoxy+N+BSe58yWCqFZYE
 Fx/cGUr7oSg39U7gCboog6BDp5e2CXBfbRllg6P47bZFfdPNwzNEzHvk49VltMxx
 Rl2W05bk
 =6EwV
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 updates for Linux 5.12

- Make the nVHE EL2 object relocatable, resulting in much more
  maintainable code
- Handle concurrent translation faults hitting the same page
  in a more elegant way
- Support for the standard TRNG hypervisor call
- A bunch of small PMU/Debug fixes
- Allow the disabling of symbol export from assembly code
- Simplification of the early init hypercall handling
2021-02-12 11:23:44 -05:00
Florent Revest c5dbb89fc2 bpf: Expose bpf_get_socket_cookie to tracing programs
This needs a new helper that:
- can work in a sleepable context (using sock_gen_cookie)
- takes a struct sock pointer and checks that it's not NULL

Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: KP Singh <kpsingh@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210210111406.785541-2-revest@chromium.org
2021-02-11 17:44:41 -08:00
Florent Revest 07881ccbf4 bpf: Be less specific about socket cookies guarantees
Since "92acdc58ab11 bpf, net: Rework cookie generator as per-cpu one"
socket cookies are not guaranteed to be non-decreasing. The
bpf_get_socket_cookie helper descriptions are currently specifying that
cookies are non-decreasing but we don't want users to rely on that.

Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: KP Singh <kpsingh@kernel.org>
Link: https://lore.kernel.org/bpf/20210210111406.785541-1-revest@chromium.org
2021-02-11 17:44:40 -08:00