Commit Graph

5682 Commits (c38b8400aef99d63be2b1ff131bb993465dcafe1)

Author SHA1 Message Date
Uros Bizjak b4b55dfd96 ring_buffer: Change some static functions to void
The results of some static functions are not used. Change the
type of these function to void and remove unnecessary returns.

No functional change intended.

Link: https://lkml.kernel.org/r/20230305155532.5549-2-ubizjak@gmail.com

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-21 13:59:31 -04:00
Mark Rutland fee86a4ed5 ftrace: selftest: remove broken trace_direct_tramp
The ftrace selftest code has a trace_direct_tramp() function which it
uses as a direct call trampoline. This happens to work on x86, since the
direct call's return address is in the usual place, and can be returned
to via a RET, but in general the calling convention for direct calls is
different from regular function calls, and requires a trampoline written
in assembly.

On s390, regular function calls place the return address in %r14, and an
ftrace patch-site in an instrumented function places the trampoline's
return address (which is within the instrumented function) in %r0,
preserving the original %r14 value in-place. As a regular C function
will return to the address in %r14, using a C function as the trampoline
results in the trampoline returning to the caller of the instrumented
function, skipping the body of the instrumented function.

Note that the s390 issue is not detcted by the ftrace selftest code, as
the instrumented function is trivial, and returning back into the caller
happens to be equivalent.

On arm64, regular function calls place the return address in x30, and
an ftrace patch-site in an instrumented function saves this into r9
and places the trampoline's return address (within the instrumented
function) in x30. A regular C function will return to the address in
x30, but will not restore x9 into x30. Consequently, using a C function
as the trampoline results in returning to the trampoline's return
address having corrupted x30, such that when the instrumented function
returns, it will return back into itself.

To avoid future issues in this area, remove the trace_direct_tramp()
function, and require that each architecture with direct calls provides
a stub trampoline, named ftrace_stub_direct_tramp. This can be written
to handle the architecture's trampoline calling convention, and in
future could be used elsewhere (e.g. in the ftrace ops sample, to
measure the overhead of direct calls), so we may as well always build it
in.

Link: https://lkml.kernel.org/r/20230321140424.345218-8-revest@chromium.org

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Li Huafei <lihuafei1@huawei.com>
Cc: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Florent Revest <revest@chromium.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-21 13:59:29 -04:00
Florent Revest 60c8971899 ftrace: Make DIRECT_CALLS work WITH_ARGS and !WITH_REGS
Direct called trampolines can be called in two ways:
- either from the ftrace callsite. In this case, they do not access any
  struct ftrace_regs nor pt_regs
- Or, if a ftrace ops is also attached, from the end of a ftrace
  trampoline. In this case, the call_direct_funcs ops is in charge of
  setting the direct call trampoline's address in a struct ftrace_regs

Since:

commit 9705bc7096 ("ftrace: pass fregs to arch_ftrace_set_direct_caller()")

The later case no longer requires a full pt_regs. It only needs a struct
ftrace_regs so DIRECT_CALLS can work with both WITH_ARGS or WITH_REGS.
With architectures like arm64 already abandoning WITH_REGS in favor of
WITH_ARGS, it's important to have DIRECT_CALLS work WITH_ARGS only.

Link: https://lkml.kernel.org/r/20230321140424.345218-7-revest@chromium.org

Signed-off-by: Florent Revest <revest@chromium.org>
Co-developed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-21 13:43:32 -04:00
Florent Revest dbaccb618f ftrace: Store direct called addresses in their ops
All direct calls are now registered using the register_ftrace_direct API
so each ops can jump to only one direct-called trampoline.

By storing the direct called trampoline address directly in the ops we
can save one hashmap lookup in the direct call ops and implement arm64
direct calls on top of call ops.

Link: https://lkml.kernel.org/r/20230321140424.345218-6-revest@chromium.org

Signed-off-by: Florent Revest <revest@chromium.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-21 13:43:32 -04:00
Florent Revest da8bdfbd42 ftrace: Rename _ftrace_direct_multi APIs to _ftrace_direct APIs
Now that the original _ftrace_direct APIs are gone, the "_multi"
suffixes only add confusion.

Link: https://lkml.kernel.org/r/20230321140424.345218-5-revest@chromium.org

Signed-off-by: Florent Revest <revest@chromium.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-21 13:43:32 -04:00
Florent Revest 8788ca164e ftrace: Remove the legacy _ftrace_direct API
This API relies on a single global ops, used for all direct calls
registered with it. However, to implement arm64 direct calls, we need
each ops to point to a single direct call trampoline.

Link: https://lkml.kernel.org/r/20230321140424.345218-4-revest@chromium.org

Signed-off-by: Florent Revest <revest@chromium.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-21 13:43:31 -04:00
Florent Revest 23edf48309 ftrace: Replace uses of _ftrace_direct APIs with _ftrace_direct_multi
The _multi API requires that users keep their own ops but can enforce
that an op is only associated to one direct call.

Link: https://lkml.kernel.org/r/20230321140424.345218-3-revest@chromium.org

Signed-off-by: Florent Revest <revest@chromium.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-21 13:42:12 -04:00
Florent Revest 59495740f7 ftrace: Let unregister_ftrace_direct_multi() call ftrace_free_filter()
A common pattern when using the ftrace_direct_multi API is to unregister
the ops and also immediately free its filter. We've noticed it's very
easy for users to miss calling ftrace_free_filter().

This adds a "free_filters" argument to unregister_ftrace_direct_multi()
to both remind the user they should free filters and also to make their
life easier.

Link: https://lkml.kernel.org/r/20230321140424.345218-2-revest@chromium.org

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Florent Revest <revest@chromium.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-21 13:42:11 -04:00
Zhen Lei 3703bd54cd kallsyms: Delete an unused parameter related to {module_}kallsyms_on_each_symbol()
The parameter 'struct module *' in the hook function associated with
{module_}kallsyms_on_each_symbol() is no longer used. Delete it.

Suggested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Reviewed-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-03-19 13:27:19 -07:00
Linus Torvalds eaba52d63b Tracing fixes for 6.3:
- Fix setting affinity of hwlat threads in containers
   Using sched_set_affinity() has unwanted side effects when being
   called within a container. Use set_cpus_allowed_ptr() instead.
 
 - Fix per cpu thread management of the hwlat tracer
   * Do not start per_cpu threads if one is already running for the CPU.
   * When starting per_cpu threads, do not clear the kthread variable
     as it may already be set to running per cpu threads
 
 - Fix return value for test_gen_kprobe_cmd()
   On error the return value was overwritten by being set to
   the result of the call from kprobe_event_delete(), which would
   likely succeed, and thus have the function return success.
 
 - Fix splice() reads from the trace file that was broken by
   36e2c7421f ("fs: don't allow splice read/write without explicit ops")
 
 - Remove obsolete and confusing comment in ring_buffer.c
   The original design of the ring buffer used struct page flags
   for tricks to optimize, which was shortly removed due to them
   being tricks. But a comment for those tricks remained.
 
 - Set local functions and variables to static
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZBdIkBQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qti2AP49s1GM8teFgDF/CO3oa45BoYq1lMJO
 1Z+x+mS/bdUAZgEAw+3KGo8oZDuvWu/nr04JPeoy0GL1/JnbQ6JNCCjzhQc=
 =Yk66
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fixes from Steven Rostedt:

 - Fix setting affinity of hwlat threads in containers

   Using sched_set_affinity() has unwanted side effects when being
   called within a container. Use set_cpus_allowed_ptr() instead

 - Fix per cpu thread management of the hwlat tracer:
    - Do not start per_cpu threads if one is already running for the CPU
    - When starting per_cpu threads, do not clear the kthread variable
      as it may already be set to running per cpu threads

 - Fix return value for test_gen_kprobe_cmd()

   On error the return value was overwritten by being set to the result
   of the call from kprobe_event_delete(), which would likely succeed,
   and thus have the function return success

 - Fix splice() reads from the trace file that was broken by commit
   36e2c7421f ("fs: don't allow splice read/write without explicit
   ops")

 - Remove obsolete and confusing comment in ring_buffer.c

   The original design of the ring buffer used struct page flags for
   tricks to optimize, which was shortly removed due to them being
   tricks. But a comment for those tricks remained

 - Set local functions and variables to static

* tag 'trace-v6.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing/hwlat: Replace sched_setaffinity with set_cpus_allowed_ptr
  ring-buffer: remove obsolete comment for free_buffer_page()
  tracing: Make splice_read available again
  ftrace: Set direct_ops storage-class-specifier to static
  trace/hwlat: Do not start per-cpu thread if it is already running
  trace/hwlat: Do not wipe the contents of per-cpu thread data
  tracing/osnoise: set several trace_osnoise.c variables storage-class-specifier to static
  tracing: Fix wrong return in kprobe_event_gen_test.c
2023-03-19 10:46:02 -07:00
Costa Shulyupin 71c7a30442 tracing/hwlat: Replace sched_setaffinity with set_cpus_allowed_ptr
There is a problem with the behavior of hwlat in a container,
resulting in incorrect output. A warning message is generated:
"cpumask changed while in round-robin mode, switching to mode none",
and the tracing_cpumask is ignored. This issue arises because
the kernel thread, hwlatd, is not a part of the container, and
the function sched_setaffinity is unable to locate it using its PID.
Additionally, the task_struct of hwlatd is already known.
Ultimately, the function set_cpus_allowed_ptr achieves
the same outcome as sched_setaffinity, but employs task_struct
instead of PID.

Test case:

  # cd /sys/kernel/tracing
  # echo 0 > tracing_on
  # echo round-robin > hwlat_detector/mode
  # echo hwlat > current_tracer
  # unshare --fork --pid bash -c 'echo 1 > tracing_on'
  # dmesg -c

Actual behavior:

[573502.809060] hwlat_detector: cpumask changed while in round-robin mode, switching to mode none

Link: https://lore.kernel.org/linux-trace-kernel/20230316144535.1004952-1-costa.shul@redhat.com

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: 0330f7aa8e ("tracing: Have hwlat trace migrate across tracing_cpumask CPUs")
Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-19 13:23:22 -04:00
Vlastimil Babka a98151ad53 ring-buffer: remove obsolete comment for free_buffer_page()
The comment refers to mm/slob.c which is being removed. It comes from
commit ed56829cb3 ("ring_buffer: reset buffer page when freeing") and
according to Steven the borrowed code was a page mapcount and mapping
reset, which was later removed by commit e4c2ce82ca ("ring_buffer:
allocate buffer page pointer"). Thus the comment is not accurate anyway,
remove it.

Link: https://lore.kernel.org/linux-trace-kernel/20230315142446.27040-1-vbabka@suse.cz

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Reported-by: Mike Rapoport <mike.rapoport@gmail.com>
Suggested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Fixes: e4c2ce82ca ("ring_buffer: allocate buffer page pointer")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-19 13:23:22 -04:00
Sung-hun Kim e400be674a tracing: Make splice_read available again
Since the commit 36e2c7421f ("fs: don't allow splice read/write
without explicit ops") is applied to the kernel, splice() and
sendfile() calls on the trace file (/sys/kernel/debug/tracing
/trace) return EINVAL.

This patch restores these system calls by initializing splice_read
in file_operations of the trace file. This patch only enables such
functionalities for the read case.

Link: https://lore.kernel.org/linux-trace-kernel/20230314013707.28814-1-sfoon.kim@samsung.com

Cc: stable@vger.kernel.org
Fixes: 36e2c7421f ("fs: don't allow splice read/write without explicit ops")
Signed-off-by: Sung-hun Kim <sfoon.kim@samsung.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-19 13:16:21 -04:00
Tom Rix 8732565549 ftrace: Set direct_ops storage-class-specifier to static
smatch reports this warning
kernel/trace/ftrace.c:2594:19: warning:
  symbol 'direct_ops' was not declared. Should it be static?

The variable direct_ops is only used in ftrace.c, so it should be static

Link: https://lore.kernel.org/linux-trace-kernel/20230311135113.711824-1-trix@redhat.com

Signed-off-by: Tom Rix <trix@redhat.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-19 12:20:49 -04:00
Tero Kristo 08697bca9b trace/hwlat: Do not start per-cpu thread if it is already running
The hwlatd tracer will end up starting multiple per-cpu threads with
the following script:

    #!/bin/sh
    cd /sys/kernel/debug/tracing
    echo 0 > tracing_on
    echo hwlat > current_tracer
    echo per-cpu > hwlat_detector/mode
    echo 100000 > hwlat_detector/width
    echo 200000 > hwlat_detector/window
    echo 1 > tracing_on

To fix the issue, check if the hwlatd thread for the cpu is already
running, before starting a new one. Along with the previous patch, this
avoids running multiple instances of the same CPU thread on the system.

Link: https://lore.kernel.org/all/20230302113654.2984709-1-tero.kristo@linux.intel.com/
Link: https://lkml.kernel.org/r/20230310100451.3948583-3-tero.kristo@linux.intel.com

Cc: stable@vger.kernel.org
Fixes: f46b16520a ("trace/hwlat: Implement the per-cpu mode")
Signed-off-by: Tero Kristo <tero.kristo@linux.intel.com>
Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-19 12:20:49 -04:00
Tero Kristo 4c42f5f0d1 trace/hwlat: Do not wipe the contents of per-cpu thread data
Do not wipe the contents of the per-cpu kthread data when starting the
tracer, as this will completely forget about already running instances
and can later start new additional per-cpu threads.

Link: https://lore.kernel.org/all/20230302113654.2984709-1-tero.kristo@linux.intel.com/
Link: https://lkml.kernel.org/r/20230310100451.3948583-2-tero.kristo@linux.intel.com

Cc: stable@vger.kernel.org
Fixes: f46b16520a ("trace/hwlat: Implement the per-cpu mode")
Signed-off-by: Tero Kristo <tero.kristo@linux.intel.com>
Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-19 12:20:48 -04:00
Tom Rix 7a025e066e tracing/osnoise: set several trace_osnoise.c variables storage-class-specifier to static
smatch reports several similar warnings
kernel/trace/trace_osnoise.c:220:1: warning:
  symbol '__pcpu_scope_per_cpu_osnoise_var' was not declared. Should it be static?
kernel/trace/trace_osnoise.c:243:1: warning:
  symbol '__pcpu_scope_per_cpu_timerlat_var' was not declared. Should it be static?
kernel/trace/trace_osnoise.c:335:14: warning:
  symbol 'interface_lock' was not declared. Should it be static?
kernel/trace/trace_osnoise.c:2242:5: warning:
  symbol 'timerlat_min_period' was not declared. Should it be static?
kernel/trace/trace_osnoise.c:2243:5: warning:
  symbol 'timerlat_max_period' was not declared. Should it be static?

These variables are only used in trace_osnoise.c, so it should be static

Link: https://lore.kernel.org/linux-trace-kernel/20230309150414.4036764-1-trix@redhat.com

Signed-off-by: Tom Rix <trix@redhat.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-19 12:20:48 -04:00
Anton Gusev bc4f359b3b tracing: Fix wrong return in kprobe_event_gen_test.c
Overwriting the error code with the deletion result may cause the
function to return 0 despite encountering an error. Commit b111545d26
("tracing: Remove the useless value assignment in
test_create_synth_event()") solves a similar issue by
returning the original error code, so this patch does the same.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Link: https://lore.kernel.org/linux-trace-kernel/20230131075818.5322-1-aagusev@ispras.ru

Signed-off-by: Anton Gusev <aagusev@ispras.ru>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-19 12:20:48 -04:00
Jakub Kicinski 1118aa4c70 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
net/wireless/nl80211.c
  b27f07c50a ("wifi: nl80211: fix puncturing bitmap policy")
  cbbaf2bb82 ("wifi: nl80211: add a command to enable/disable HW timestamping")
https://lore.kernel.org/all/20230314105421.3608efae@canb.auug.org.au

tools/testing/selftests/net/Makefile
  62199e3f16 ("selftests: net: Add VXLAN MDB test")
  13715acf8a ("selftest: Add test for bind() conflicts.")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-17 16:29:25 -07:00
Linus Torvalds 29db00c252 Tracing fixes for v6.3
- Do not allow histogram values to have modifies.
   Can cause a NULL pointer dereference if they do.
 
 - Warn if hist_field_name() is passed a NULL.
   Prevent the NULL pointer dereference mentioned above.
 
 - Fix invalid address look up race in lookup_rec()
 
 - Define ftrace_stub_graph conditionally to prevent linker errors
 
 - Always check if RCU is watching at all tracepoint locations
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZBDuTBQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qsboAP4yfrFYvIIKM5EkzkEiPI+V2hdlA12x
 bt839jO5AWCmhAEAiY8FmKatpBJQKsiGqSOab8aHOMnhGFZwltCHAPa9PAI=
 =vtA2
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fixes from Steven Rostedt:

 - Do not allow histogram values to have modifies. They can cause a NULL
   pointer dereference if they do.

 - Warn if hist_field_name() is passed a NULL. Prevent the NULL pointer
   dereference mentioned above.

 - Fix invalid address look up race in lookup_rec()

 - Define ftrace_stub_graph conditionally to prevent linker errors

 - Always check if RCU is watching at all tracepoint locations

* tag 'trace-v6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Make tracepoint lockdep check actually test something
  ftrace,kcfi: Define ftrace_stub_graph conditionally
  ftrace: Fix invalid address access in lookup_rec() when index is 0
  tracing: Check field value in hist_field_name()
  tracing: Do not let histogram values have some modifiers
2023-03-14 17:07:54 -07:00
Jakub Kicinski d0ddf5065f Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Documentation/bpf/bpf_devel_QA.rst
  b7abcd9c65 ("bpf, doc: Link to submitting-patches.rst for general patch submission info")
  d56b0c461d ("bpf, docs: Fix link to netdev-FAQ target")
https://lore.kernel.org/all/20230307095812.236eb1be@canb.auug.org.au/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-09 22:22:11 -08:00
Chen Zhongjin ee92fa4433 ftrace: Fix invalid address access in lookup_rec() when index is 0
KASAN reported follow problem:

 BUG: KASAN: use-after-free in lookup_rec
 Read of size 8 at addr ffff000199270ff0 by task modprobe
 CPU: 2 Comm: modprobe
 Call trace:
  kasan_report
  __asan_load8
  lookup_rec
  ftrace_location
  arch_check_ftrace_location
  check_kprobe_address_safe
  register_kprobe

When checking pg->records[pg->index - 1].ip in lookup_rec(), it can get a
pg which is newly added to ftrace_pages_start in ftrace_process_locs().
Before the first pg->index++, index is 0 and accessing pg->records[-1].ip
will cause this problem.

Don't check the ip when pg->index is 0.

Link: https://lore.kernel.org/linux-trace-kernel/20230309080230.36064-1-chenzhongjin@huawei.com

Cc: stable@vger.kernel.org
Fixes: 9644302e33 ("ftrace: Speed up search by skipping pages by address")
Suggested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-09 22:17:06 -05:00
Steven Rostedt (Google) 9f116f76fa tracing: Check field value in hist_field_name()
The function hist_field_name() cannot handle being passed a NULL field
parameter. It should never be NULL, but due to a previous bug, NULL was
passed to the function and the kernel crashed due to a NULL dereference.
Mark Rutland reported this to me on IRC.

The bug was fixed, but to prevent future bugs from crashing the kernel,
check the field and add a WARN_ON() if it is NULL.

Link: https://lkml.kernel.org/r/20230302020810.762384440@goodmis.org

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Reported-by: Mark Rutland <mark.rutland@arm.com>
Fixes: c6afad49d1 ("tracing: Add hist trigger 'sym' and 'sym-offset' modifiers")
Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-09 22:17:06 -05:00
Steven Rostedt (Google) e0213434fe tracing: Do not let histogram values have some modifiers
Histogram values can not be strings, stacktraces, graphs, symbols,
syscalls, or grouped in buckets or log. Give an error if a value is set to
do so.

Note, the histogram code was not prepared to handle these modifiers for
histograms and caused a bug.

Mark Rutland reported:

 # echo 'p:copy_to_user __arch_copy_to_user n=$arg2' >> /sys/kernel/tracing/kprobe_events
 # echo 'hist:keys=n:vals=hitcount.buckets=8:sort=hitcount' > /sys/kernel/tracing/events/kprobes/copy_to_user/trigger
 # cat /sys/kernel/tracing/events/kprobes/copy_to_user/hist
[  143.694628] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[  143.695190] Mem abort info:
[  143.695362]   ESR = 0x0000000096000004
[  143.695604]   EC = 0x25: DABT (current EL), IL = 32 bits
[  143.695889]   SET = 0, FnV = 0
[  143.696077]   EA = 0, S1PTW = 0
[  143.696302]   FSC = 0x04: level 0 translation fault
[  143.702381] Data abort info:
[  143.702614]   ISV = 0, ISS = 0x00000004
[  143.702832]   CM = 0, WnR = 0
[  143.703087] user pgtable: 4k pages, 48-bit VAs, pgdp=00000000448f9000
[  143.703407] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
[  143.704137] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
[  143.704714] Modules linked in:
[  143.705273] CPU: 0 PID: 133 Comm: cat Not tainted 6.2.0-00003-g6fc512c10a7c #3
[  143.706138] Hardware name: linux,dummy-virt (DT)
[  143.706723] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  143.707120] pc : hist_field_name.part.0+0x14/0x140
[  143.707504] lr : hist_field_name.part.0+0x104/0x140
[  143.707774] sp : ffff800008333a30
[  143.707952] x29: ffff800008333a30 x28: 0000000000000001 x27: 0000000000400cc0
[  143.708429] x26: ffffd7a653b20260 x25: 0000000000000000 x24: ffff10d303ee5800
[  143.708776] x23: ffffd7a6539b27b0 x22: ffff10d303fb8c00 x21: 0000000000000001
[  143.709127] x20: ffff10d303ec2000 x19: 0000000000000000 x18: 0000000000000000
[  143.709478] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[  143.709824] x14: 0000000000000000 x13: 203a6f666e692072 x12: 6567676972742023
[  143.710179] x11: 0a230a6d6172676f x10: 000000000000002c x9 : ffffd7a6521e018c
[  143.710584] x8 : 000000000000002c x7 : 7f7f7f7f7f7f7f7f x6 : 000000000000002c
[  143.710915] x5 : ffff10d303b0103e x4 : ffffd7a653b20261 x3 : 000000000000003d
[  143.711239] x2 : 0000000000020001 x1 : 0000000000000001 x0 : 0000000000000000
[  143.711746] Call trace:
[  143.712115]  hist_field_name.part.0+0x14/0x140
[  143.712642]  hist_field_name.part.0+0x104/0x140
[  143.712925]  hist_field_print+0x28/0x140
[  143.713125]  event_hist_trigger_print+0x174/0x4d0
[  143.713348]  hist_show+0xf8/0x980
[  143.713521]  seq_read_iter+0x1bc/0x4b0
[  143.713711]  seq_read+0x8c/0xc4
[  143.713876]  vfs_read+0xc8/0x2a4
[  143.714043]  ksys_read+0x70/0xfc
[  143.714218]  __arm64_sys_read+0x24/0x30
[  143.714400]  invoke_syscall+0x50/0x120
[  143.714587]  el0_svc_common.constprop.0+0x4c/0x100
[  143.714807]  do_el0_svc+0x44/0xd0
[  143.714970]  el0_svc+0x2c/0x84
[  143.715134]  el0t_64_sync_handler+0xbc/0x140
[  143.715334]  el0t_64_sync+0x190/0x194
[  143.715742] Code: a9bd7bfd 910003fd a90153f3 aa0003f3 (f9400000)
[  143.716510] ---[ end trace 0000000000000000 ]---
Segmentation fault

Link: https://lkml.kernel.org/r/20230302020810.559462599@goodmis.org

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Fixes: c6afad49d1 ("tracing: Add hist trigger 'sym' and 'sym-offset' modifiers")
Reported-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-09 22:17:06 -05:00
Jakub Kicinski 36e5e391a2 bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZAZsBwAKCRDbK58LschI
 g3W1AQCQnO6pqqX5Q2aYDAZPlZRtV2TRLjuqrQE0dHW/XLAbBgD/bgsAmiKhPSCG
 2mTt6izpTQVlZB0e8KcDIvbYd9CE3Qc=
 =EjJQ
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2023-03-06

We've added 85 non-merge commits during the last 13 day(s) which contain
a total of 131 files changed, 7102 insertions(+), 1792 deletions(-).

The main changes are:

1) Add skb and XDP typed dynptrs which allow BPF programs for more
   ergonomic and less brittle iteration through data and variable-sized
   accesses, from Joanne Koong.

2) Bigger batch of BPF verifier improvements to prepare for upcoming BPF
   open-coded iterators allowing for less restrictive looping capabilities,
   from Andrii Nakryiko.

3) Rework RCU enforcement in the verifier, add kptr_rcu and enforce BPF
   programs to NULL-check before passing such pointers into kfunc,
   from Alexei Starovoitov.

4) Add support for kptrs in percpu hashmaps, percpu LRU hashmaps and in
   local storage maps, from Kumar Kartikeya Dwivedi.

5) Add BPF verifier support for ST instructions in convert_ctx_access()
   which will help new -mcpu=v4 clang flag to start emitting them,
   from Eduard Zingerman.

6) Make uprobe attachment Android APK aware by supporting attachment
   to functions inside ELF objects contained in APKs via function names,
   from Daniel Müller.

7) Add a new flag BPF_F_TIMER_ABS flag for bpf_timer_start() helper
   to start the timer with absolute expiration value instead of relative
   one, from Tero Kristo.

8) Add a new kfunc bpf_cgroup_from_id() to look up cgroups via id,
   from Tejun Heo.

9) Extend libbpf to support users manually attaching kprobes/uprobes
   in the legacy/perf/link mode, from Menglong Dong.

10) Implement workarounds in the mips BPF JIT for DADDI/R4000,
   from Jiaxun Yang.

11) Enable mixing bpf2bpf and tailcalls for the loongarch BPF JIT,
    from Hengqi Chen.

12) Extend BPF instruction set doc with describing the encoding of BPF
    instructions in terms of how bytes are stored under big/little endian,
    from Jose E. Marchesi.

13) Follow-up to enable kfunc support for riscv BPF JIT, from Pu Lehui.

14) Fix bpf_xdp_query() backwards compatibility on old kernels,
    from Yonghong Song.

15) Fix BPF selftest cross compilation with CLANG_CROSS_FLAGS,
    from Florent Revest.

16) Improve bpf_cpumask_ma to only allocate one bpf_mem_cache,
    from Hou Tao.

17) Fix BPF verifier's check_subprogs to not unnecessarily mark
    a subprogram with has_tail_call, from Ilya Leoshkevich.

18) Fix arm syscall regs spec in libbpf's bpf_tracing.h, from Puranjay Mohan.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (85 commits)
  selftests/bpf: Add test for legacy/perf kprobe/uprobe attach mode
  selftests/bpf: Split test_attach_probe into multi subtests
  libbpf: Add support to set kprobe/uprobe attach mode
  tools/resolve_btfids: Add /libsubcmd to .gitignore
  bpf: add support for fixed-size memory pointer returns for kfuncs
  bpf: generalize dynptr_get_spi to be usable for iters
  bpf: mark PTR_TO_MEM as non-null register type
  bpf: move kfunc_call_arg_meta higher in the file
  bpf: ensure that r0 is marked scratched after any function call
  bpf: fix visit_insn()'s detection of BPF_FUNC_timer_set_callback helper
  bpf: clean up visit_insn()'s instruction processing
  selftests/bpf: adjust log_fixup's buffer size for proper truncation
  bpf: honor env->test_state_freq flag in is_state_visited()
  selftests/bpf: enhance align selftest's expected log matching
  bpf: improve regsafe() checks for PTR_TO_{MEM,BUF,TP_BUFFER}
  bpf: improve stack slot state printing
  selftests/bpf: Disassembler tests for verifier.c:convert_ctx_access()
  selftests/bpf: test if pointer type is tracked for BPF_ST_MEM
  bpf: allow ctx writes using BPF_ST_MEM instruction
  bpf: Use separate RCU callbacks for freeing selem
  ...
====================

Link: https://lore.kernel.org/r/20230307004346.27578-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-06 20:36:39 -08:00
Linus Torvalds 9d0281b56b block-6.3-2023-03-03
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmQB57MQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgputpEADVrc1OFzHOivJq+LJ3HS3ufhLBthtgu1Lp
 sEHvDNp9tBGXMLkomuCYpAju5TBAEKC+AJTZyj9iS1j++ItoezdoP55YRIH7t2Or
 UTy8ex3rLPGkQk6k3o8roWCyajTW/ZS+4fmk+NkVYMLsQBp9I+kFbxgJa5bbREdU
 Z8b/9hcBGz58R8Kq+TEMp/bO7oCV4c8xWumrKER+MktDDx0kc5d+afWXoy7bEKFg
 jLB3gleTM9HUpa9a2GPc4fxqdb0KanQdMtiyn/oplg0JcZLMiHfRbiRnsgQkjN0O
 RVtUcdxXmOkQeFra4GXPiHmQBcIfE85wP4wxb8p/F2StYRhb1epzzeCXOhuNZvv4
 dd6OSARgtzWt3OlHka4aC63H4kzs9SxJp0F2uwuPLV0fM91TP1oOTWV+53FrQr9Z
 OQYyB8d9Il4K72NFLwU4ukJ1fPoCRHjpgAXIIkasEjaBftpJlMNnfblncTZTBumy
 XumFVdKfvqc3OFt8LLKWqLDV0j3TknVeCMPKhsbRwQ0NG4vlNOSWaLkGJCDLJ7ga
 ebf8AD5eaLCT9qyYquBuW5VBKZH5Z4rf5yHta9Dx+Omu0JTQYtTkiiM3UTdpDbtq
 SObZ31UvLoYK2dOZcVgjhE2RgM/AV5jJcx7aHhT3UptavAehHbePgiNhuEEntlKv
 L87kXJkSSQ==
 =ezrg
 -----END PGP SIGNATURE-----

Merge tag 'block-6.3-2023-03-03' of git://git.kernel.dk/linux

Pull block fixes from Jens Axboe:

 - NVMe pull request via Christoph:
      - Don't access released socket during error recovery (Akinobu
        Mita)
      - Bring back auto-removal of deleted namespaces during sequential
        scan (Christoph Hellwig)
      - Fix an error code in nvme_auth_process_dhchap_challenge (Dan
        Carpenter)
      - Show well known discovery name (Daniel Wagner)
      - Add a missing endianess conversion in effects masking (Keith
        Busch)

 - Fix for a regression introduced in blk-rq-qos during init in this
   merge window (Breno)

 - Reorder a few fields in struct blk_mq_tag_set, eliminating a few
   holes and shrinking it (Christophe)

 - Remove redundant bdev_get_queue() NULL checks (Juhyung)

 - Add sed-opal single user mode support flag (Luca)

 - Remove SQE128 check in ublk as it isn't needed, saving some memory
   (Ming)

 - Op specific segment checking for cloned requests (Uday)

 - Exclusive open partition scan fixes (Yu)

 - Loop offset/size checking before assigning them in the device (Zhong)

 - Bio polling fixes (me)

* tag 'block-6.3-2023-03-03' of git://git.kernel.dk/linux:
  blk-mq: enforce op-specific segment limits in blk_insert_cloned_request
  nvme-fabrics: show well known discovery name
  nvme-tcp: don't access released socket during error recovery
  nvme-auth: fix an error code in nvme_auth_process_dhchap_challenge()
  nvme: bring back auto-removal of deleted namespaces during sequential scan
  blk-iocost: Pass gendisk to ioc_refresh_params
  nvme: fix sparse warning on effects masking
  block: be a bit more careful in checking for NULL bdev while polling
  block: clear bio->bi_bdev when putting a bio back in the cache
  loop: loop_set_status_from_info() check before assignment
  ublk: remove check IO_URING_F_SQE128 in ublk_ch_uring_cmd
  block: remove more NULL checks after bdev_get_queue()
  blk-mq: Reorder fields in 'struct blk_mq_tag_set'
  block: fix scan partition for exclusively open device again
  block: Revert "block: Do not reread partition table on exclusively open device"
  sed-opal: add support flag for SUM in status ioctl
2023-03-03 10:21:39 -08:00
Tejun Heo c501bf55c8 bpf: Make bpf_get_current_[ancestor_]cgroup_id() available for all program types
These helpers are safe to call from any context and there's no reason to
restrict access to them. Remove them from bpf_trace and filter lists and add
to bpf_base_func_proto() under perfmon_capable().

v2: After consulting with Andrii, relocated in bpf_base_func_proto() so that
    they require bpf_capable() but not perfomon_capable() as it doesn't read
    from or affect others on the system.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/ZAD8QyoszMZiTzBY@slm.duckdns.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-02 22:37:07 -08:00
Linus Torvalds 116b41162f probes cleanup updates for v6.3:
These are the probe events cleanup patches, no new features but improve
 readability.
 
 - Rename print_probe_args() to trace_probe_print_args() and un-inlined.
 
 - Introduce a set of default data fetch functions for dynamic probe
   events.
 
 - Extract common code of data fetch process of dynamic probe events.
 -----BEGIN PGP SIGNATURE-----
 
 iQFPBAABCgA5FiEEh7BulGwFlgAOi5DV2/sHvwUrPxsFAmP40igbHG1hc2FtaS5o
 aXJhbWF0c3VAZ21haWwuY29tAAoJENv7B78FKz8b1Q4IAIdX/xWscmgnH/OjKlXK
 U/O/mzvgY4m8pGpxbTHQ0DLea6PW2SUmGuPz+9BvRMisNtJP3M/C3vpnYEZRwkY9
 Movn0lCBZ47Dw2pdwBz34XKUI+B3Tp42/Xs5J8FaKlp+/CcJNqGpsJnF0Hm05MUb
 fZyQNI2hArnXITLHP+ADNiG6lTly0Zaza5nw27P+VW+6sfJhzuSG4hl+WXgHL8ut
 M0s6972QYH08uGs7Ywnl8pMpng6861YpHnREoOLOJ+1Cg6IYA+O6XK3Q75EXp+IX
 lXl611AAKZKpvc27eX+e9ionJRS4j1C0C12HSHYX/B8DN7SzuctXm1zmKewonHd5
 r8M=
 =1Lix
 -----END PGP SIGNATURE-----

Merge tag 'probes-v6.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull kprobes cleanup updates from Masami Hiramatsu:
 "These are probe events cleanups, no new features but improve
  readability:

   - Rename print_probe_args() to trace_probe_print_args() and
     un-inline it

   - Introduce a set of default data fetch functions for dynamic
     probe events

   - Extract common code of data fetch process of dynamic probe events"

* tag 'probes-v6.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  kernel/trace: extract common part in process_fetch_insn
  kernel/trace: Provide default impelentations defined in trace_probe_tmpl.h
  kernel/trace: Introduce trace_probe_print_args and use it in *probes
2023-02-25 13:06:28 -08:00
Linus Torvalds e534a583cc Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha
Pull alpha updates from Al Viro:
 "Mostly small janitorial fixes but there's also more important ones: a
  patch to fix loading large modules from Edward Humes, and some fixes
  from Al Viro"

[ The fixes from Al mostly came in separately through Al's trees too and
  are now duplicated..   - Linus ]

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha:
  alpha: in_irq() cleanup
  alpha: lazy FPU switching
  alpha/boot/misc: trim unused declarations
  alpha/boot/tools/objstrip: fix the check for ELF header
  alpha/boot: fix the breakage from -isystem series...
  alpha: fix FEN fault handling
  alpha: Avoid comma separated statements
  alpha: fixed a typo in core_cia.c
  alpha: remove unused __SLOW_DOWN_IO and SLOW_DOWN_IO definitions
  alpha: update config files
  alpha: fix R_ALPHA_LITERAL reloc for large modules
  alpha: Add some spaces to ensure format specification
  alpha: replace NR_SYSCALLS by NR_syscalls
  alpha: Remove redundant local asm header redirections
  alpha: Implement "current_stack_pointer"
  alpha: remove redundant err variable
  alpha: osf_sys: reduce kernel log spamming on invalid osf_mount call typenr
2023-02-25 12:49:29 -08:00
Song Chen bd78acc89d kernel/trace: extract common part in process_fetch_insn
Each probe has an instance of process_fetch_insn respectively,
but they have something in common.

This patch aims to extract the common part into
process_common_fetch_insn which can be shared by each probe,
and they only need to focus on their special cases.

Signed-off-by: Song Chen <chensong_2000@189.cn>
Suggested-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
2023-02-24 10:04:13 +09:00
Song Chen 672a2bf840 kernel/trace: Provide default impelentations defined in trace_probe_tmpl.h
There are 6 function definitions in trace_probe_tmpl.h, they are:

1, fetch_store_strlen
2, fetch_store_string
3, fetch_store_strlen_user
4, fetch_store_string_user
5, probe_mem_read
6, probe_mem_read_user

Every C file which includes trace_probe_tmpl.h has to implement them,
otherwise it gets warnings and errors. However, some of them are identical,
like kprobe and eprobe, as a result, there is a lot redundant code in those
2 files.

This patch would like to provide default behaviors for those functions
which kprobe and eprobe can share by just including trace_probe_kernel.h
with trace_probe_tmpl.h together.

It removes redundant code, increases readability, and more importantly,
makes it easier to introduce a new feature based on trace probe
(it's possible).

Link: https://lore.kernel.org/all/1672382018-18347-1-git-send-email-chensong_2000@189.cn/

Signed-off-by: Song Chen <chensong_2000@189.cn>
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2023-02-24 09:44:27 +09:00
Song Chen 196b6389a3 kernel/trace: Introduce trace_probe_print_args and use it in *probes
print_probe_args is currently inplemented in trace_probe_tmpl.h and
included by *probes, as a result, each probe has an identical copy.

This patch will move it to trace_probe.c as an new API, each probe
calls it to print their args in trace file.

Link: https://lore.kernel.org/all/1672382000-18304-1-git-send-email-chensong_2000@189.cn/

Signed-off-by: Song Chen <chensong_2000@189.cn>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2023-02-24 09:44:15 +09:00
Linus Torvalds 2b79eb73e2 probes updates for 6.3:
- Skip negative return code check for snprintf in eprobe.
 
 - Add recursive call test cases for kprobe unit test
 
 - Add 'char' type to probe events to show it as the character instead of value.
 
 - Update kselftest kprobe-event testcase to ignore '__pfx_' symbols.
 
 - Fix kselftest to check filter on eprobe event correctly.
 
 - Add filter on eprobe to the README file in tracefs.
 
 - Fix optprobes to check whether there is 'under unoptimizing' optprobe when optimizing another kprobe correctly.
 
 - Fix optprobe to check whether there is 'under unoptimizing' optprobe when fetching the original instruction correctly.
 
 - Fix optprobe to free 'forcibly unoptimized' optprobe correctly.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEh7BulGwFlgAOi5DV2/sHvwUrPxsFAmP0JdYACgkQ2/sHvwUr
 Pxt6sQf/TD9Kwqx3XG1tnLPev6yt2nuggUippHwWUFHlJtMyUaLV8aKFqByyEe+j
 tCQvrFIIJq242xg0Jac/MAf2exlWG9jsmVZPmvC1YzepOAbjXu2eBkIS7LsbeHjF
 JJypNnEceffWCpNoD6nlvR0xWXenqRbZJwdsGqo3u+fXnzTurEMY2GU2xOyv39tv
 S1uNLPANJxdMb/2iUsUE3hMbe82dqr8zPcApqWFtTBB6QPHI3B2SjuQHpQxwbTPl
 bzAl0yQkLSQXprVzT7xJ4xLnzbl1ljgJBci5aX8BFF+VD9oYkypdfYVczBH5VsP9
 E3eT9T9lRf4Q99EqxNy5uw7NqQXGQg==
 =CMPb
 -----END PGP SIGNATURE-----

Merge tag 'probes-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull kprobes updates from Masami Hiramatsu:

 - Skip negative return code check for snprintf in eprobe

 - Add recursive call test cases for kprobe unit test

 - Add 'char' type to probe events to show it as the character instead
   of value

 - Update kselftest kprobe-event testcase to ignore '__pfx_' symbols

 - Fix kselftest to check filter on eprobe event correctly

 - Add filter on eprobe to the README file in tracefs

 - Fix optprobes to check whether there is 'under unoptimizing' optprobe
   when optimizing another kprobe correctly

 - Fix optprobe to check whether there is 'under unoptimizing' optprobe
   when fetching the original instruction correctly

 - Fix optprobe to free 'forcibly unoptimized' optprobe correctly

* tag 'probes-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing/eprobe: no need to check for negative ret value for snprintf
  test_kprobes: Add recursed kprobe test case
  tracing/probe: add a char type to show the character value of traced arguments
  selftests/ftrace: Fix probepoint testcase to ignore __pfx_* symbols
  selftests/ftrace: Fix eprobe syntax test case to check filter support
  tracing/eprobe: Fix to add filter on eprobe description in README file
  x86/kprobes: Fix arch_check_optimized_kprobe check within optimized_kprobe range
  x86/kprobes: Fix __recover_optprobed_insn check optimizing logic
  kprobes: Fix to handle forcibly unoptimized kprobes on freeing_list
2023-02-23 13:03:08 -08:00
Linus Torvalds b72b5fecc1 tracing updates for 6.3:
- Add function names as a way to filter function addresses
 
 - Add sample module to test ftrace ops and dynamic trampolines
 
 - Allow stack traces to be passed from beginning event to end event for
   synthetic events. This will allow seeing the stack trace of when a task is
   scheduled out and recorded when it gets scheduled back in.
 
 - Add trace event helper __get_buf() to use as a temporary buffer when printing
   out trace event output.
 
 - Add kernel command line to create trace instances on boot up.
 
 - Add enabling of events to instances created at boot up.
 
 - Add trace_array_puts() to write into instances.
 
 - Allow boot instances to take a snapshot at the end of boot up.
 
 - Allow live patch modules to include trace events
 
 - Minor fixes and clean ups
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCY/PaaBQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qh5iAPoD0LKZzD33rhO5Ec4hoexE0DkqycP3
 dvmOMbCBL8GkxwEA+d2gLz/EquSFm166hc4D79Sn3geCqvkwmy8vQWVjIQc=
 =M82D
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing updates from Steven Rostedt:

 - Add function names as a way to filter function addresses

 - Add sample module to test ftrace ops and dynamic trampolines

 - Allow stack traces to be passed from beginning event to end event for
   synthetic events. This will allow seeing the stack trace of when a
   task is scheduled out and recorded when it gets scheduled back in.

 - Add trace event helper __get_buf() to use as a temporary buffer when
   printing out trace event output.

 - Add kernel command line to create trace instances on boot up.

 - Add enabling of events to instances created at boot up.

 - Add trace_array_puts() to write into instances.

 - Allow boot instances to take a snapshot at the end of boot up.

 - Allow live patch modules to include trace events

 - Minor fixes and clean ups

* tag 'trace-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (31 commits)
  tracing: Remove unnecessary NULL assignment
  tracepoint: Allow livepatch module add trace event
  tracing: Always use canonical ftrace path
  tracing/histogram: Fix stacktrace histogram Documententation
  tracing/histogram: Fix stacktrace key
  tracing/histogram: Fix a few problems with stacktrace variable printing
  tracing: Add BUILD_BUG() to make sure stacktrace fits in strings
  tracing/histogram: Don't use strlen to find length of stacktrace variables
  tracing: Allow boot instances to have snapshot buffers
  tracing: Add trace_array_puts() to write into instance
  tracing: Add enabling of events to boot instances
  tracing: Add creation of instances at boot command line
  tracing: Fix trace_event_raw_event_synth() if else statement
  samples: ftrace: Make some global variables static
  ftrace: sample: avoid open-coded 64-bit division
  samples: ftrace: Include the nospec-branch.h only for x86
  tracing: Acquire buffer from temparary trace sequence
  tracing/histogram: Wrap remaining shell snippets in code blocks
  tracing/osnoise: No need for schedule_hrtimeout range
  bpf/tracing: Use stage6 of tracing to not duplicate macros
  ...
2023-02-23 10:20:49 -08:00
Linus Torvalds 9191423872 tracing: Fix race that causes a warning of corrupt ring buffer
With the change that allows to read the "trace" file without disabling
 writing to the ring buffer, there was an integrity check of the ring
 buffer in the iterator read code, that expected the ring buffer to be
 write disabled. This caused the integrity check to trigger when stress
 reading the "trace" file while writing was happening.
 
 The integrity check is a bit aggressive (and has never triggered in
 practice). Change it so that it checks just the integrity of the linked
 pages without clearing the flags inside the pointers. This removes the
 warning that was being triggered.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCY+05KBQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qm2uAQDpygn75IrW1aApx5wySl2GKo0dSjg9
 PaQVTYFYLMxIqgEA7oZQFQwokmZrIThqcQ+JA710+C1xecKYUl+apGza9Qk=
 =aTTM
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.2-rc7-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fix from Steven Rostedt:
 "Fix race that causes a warning of corrupt ring buffer

  With the change that allows to read the "trace" file without disabling
  writing to the ring buffer, there was an integrity check of the ring
  buffer in the iterator read code, that expected the ring buffer to be
  write disabled. This caused the integrity check to trigger when stress
  reading the "trace" file while writing was happening.

  The integrity check is a bit aggressive (and has never triggered in
  practice). Change it so that it checks just the integrity of the
  linked pages without clearing the flags inside the pointers. This
  removes the warning that was being triggered"

[ Heh. This was supposed to have gone in last week before the 6.2
  release, but Steven forgot to actually add me to the participants of
  the pull request, so here it is, a week later   - Linus ]

* tag 'trace-v6.2-rc7-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  ring-buffer: Handle race between rb_move_tail and rb_check_pages
2023-02-23 10:08:01 -08:00
Linus Torvalds 5b7c4cabbb Networking changes for 6.3.
Core
 ----
 
  - Add dedicated kmem_cache for typical/small skb->head, avoid having
    to access struct page at kfree time, and improve memory use.
 
  - Introduce sysctl to set default RPS configuration for new netdevs.
 
  - Define Netlink protocol specification format which can be used
    to describe messages used by each family and auto-generate parsers.
    Add tools for generating kernel data structures and uAPI headers.
 
  - Expose all net/core sysctls inside netns.
 
  - Remove 4s sleep in netpoll if carrier is instantly detected on boot.
 
  - Add configurable limit of MDB entries per port, and port-vlan.
 
  - Continue populating drop reasons throughout the stack.
 
  - Retire a handful of legacy Qdiscs and classifiers.
 
 Protocols
 ---------
 
  - Support IPv4 big TCP (TSO frames larger than 64kB).
 
  - Add IP_LOCAL_PORT_RANGE socket option, to control local port range
    on socket by socket basis.
 
  - Track and report in procfs number of MPTCP sockets used.
 
  - Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP
    path manager.
 
  - IPv6: don't check net.ipv6.route.max_size and rely on garbage
    collection to free memory (similarly to IPv4).
 
  - Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986).
 
  - ICMP: add per-rate limit counters.
 
  - Add support for user scanning requests in ieee802154.
 
  - Remove static WEP support.
 
  - Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate
    reporting.
 
  - WiFi 7 EHT channel puncturing support (client & AP).
 
 BPF
 ---
 
  - Add a rbtree data structure following the "next-gen data structure"
    precedent set by recently added linked list, that is, by using
    kfunc + kptr instead of adding a new BPF map type.
 
  - Expose XDP hints via kfuncs with initial support for RX hash and
    timestamp metadata.
 
  - Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key
    to better support decap on GRE tunnel devices not operating
    in collect metadata.
 
  - Improve x86 JIT's codegen for PROBE_MEM runtime error checks.
 
  - Remove the need for trace_printk_lock for bpf_trace_printk
    and bpf_trace_vprintk helpers.
 
  - Extend libbpf's bpf_tracing.h support for tracing arguments of
    kprobes/uprobes and syscall as a special case.
 
  - Significantly reduce the search time for module symbols
    by livepatch and BPF.
 
  - Enable cpumasks to be used as kptrs, which is useful for tracing
    programs tracking which tasks end up running on which CPUs in
    different time intervals.
 
  - Add support for BPF trampoline on s390x and riscv64.
 
  - Add capability to export the XDP features supported by the NIC.
 
  - Add __bpf_kfunc tag for marking kernel functions as kfuncs.
 
  - Add cgroup.memory=nobpf kernel parameter option to disable BPF
    memory accounting for container environments.
 
 Netfilter
 ---------
 
  - Remove the CLUSTERIP target. It has been marked as obsolete
    for years, and we still have WARN splats wrt. races of
    the out-of-band /proc interface installed by this target.
 
  - Add 'destroy' commands to nf_tables. They are identical to
    the existing 'delete' commands, but do not return an error if
    the referenced object (set, chain, rule...) did not exist.
 
 Driver API
 ----------
 
  - Improve cpumask_local_spread() locality to help NICs set the right
    IRQ affinity on AMD platforms.
 
  - Separate C22 and C45 MDIO bus transactions more clearly.
 
  - Introduce new DCB table to control DSCP rewrite on egress.
 
  - Support configuration of Physical Layer Collision Avoidance (PLCA)
    Reconciliation Sublayer (RS) (802.3cg-2019). Modern version of
    shared medium Ethernet.
 
  - Support for MAC Merge layer (IEEE 802.3-2018 clause 99). Allowing
    preemption of low priority frames by high priority frames.
 
  - Add support for controlling MACSec offload using netlink SET.
 
  - Rework devlink instance refcounts to allow registration and
    de-registration under the instance lock. Split the code into multiple
    files, drop some of the unnecessarily granular locks and factor out
    common parts of netlink operation handling.
 
  - Add TX frame aggregation parameters (for USB drivers).
 
  - Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning
    messages with notifications for debug.
 
  - Allow offloading of UDP NEW connections via act_ct.
 
  - Add support for per action HW stats in TC.
 
  - Support hardware miss to TC action (continue processing in SW from
    a specific point in the action chain).
 
  - Warn if old Wireless Extension user space interface is used with
    modern cfg80211/mac80211 drivers. Do not support Wireless Extensions
    for Wi-Fi 7 devices at all. Everyone should switch to using nl80211
    interface instead.
 
  - Improve the CAN bit timing configuration. Use extack to return error
    messages directly to user space, update the SJW handling, including
    the definition of a new default value that will benefit CAN-FD
    controllers, by increasing their oscillator tolerance.
 
 New hardware / drivers
 ----------------------
 
  - Ethernet:
    - nVidia BlueField-3 support (control traffic driver)
    - Ethernet support for imx93 SoCs
    - Motorcomm yt8531 gigabit Ethernet PHY
    - onsemi NCN26000 10BASE-T1S PHY (with support for PLCA)
    - Microchip LAN8841 PHY (incl. cable diagnostics and PTP)
    - Amlogic gxl MDIO mux
 
  - WiFi:
    - RealTek RTL8188EU (rtl8xxxu)
    - Qualcomm Wi-Fi 7 devices (ath12k)
 
  - CAN:
    - Renesas R-Car V4H
 
 Drivers
 -------
 
  - Bluetooth:
    - Set Per Platform Antenna Gain (PPAG) for Intel controllers.
 
  - Ethernet NICs:
    - Intel (1G, igc):
      - support TSN / Qbv / packet scheduling features of i226 model
    - Intel (100G, ice):
      - use GNSS subsystem instead of TTY
      - multi-buffer XDP support
      - extend support for GPIO pins to E823 devices
    - nVidia/Mellanox:
      - update the shared buffer configuration on PFC commands
      - implement PTP adjphase function for HW offset control
      - TC support for Geneve and GRE with VF tunnel offload
      - more efficient crypto key management method
      - multi-port eswitch support
    - Netronome/Corigine:
      - add DCB IEEE support
      - support IPsec offloading for NFP3800
    - Freescale/NXP (enetc):
      - enetc: support XDP_REDIRECT for XDP non-linear buffers
      - enetc: improve reconfig, avoid link flap and waiting for idle
      - enetc: support MAC Merge layer
    - Other NICs:
      - sfc/ef100: add basic devlink support for ef100
      - ionic: rx_push mode operation (writing descriptors via MMIO)
      - bnxt: use the auxiliary bus abstraction for RDMA
      - r8169: disable ASPM and reset bus in case of tx timeout
      - cpsw: support QSGMII mode for J721e CPSW9G
      - cpts: support pulse-per-second output
      - ngbe: add an mdio bus driver
      - usbnet: optimize usbnet_bh() by avoiding unnecessary queuing
      - r8152: handle devices with FW with NCM support
      - amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation
      - virtio-net: support multi buffer XDP
      - virtio/vsock: replace virtio_vsock_pkt with sk_buff
      - tsnep: XDP support
 
  - Ethernet high-speed switches:
    - nVidia/Mellanox (mlxsw):
      - add support for latency TLV (in FW control messages)
    - Microchip (sparx5):
      - separate explicit and implicit traffic forwarding rules, make
        the implicit rules always active
      - add support for egress DSCP rewrite
      - IS0 VCAP support (Ingress Classification)
      - IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS etc.)
      - ES2 VCAP support (Egress Access Control)
      - support for Per-Stream Filtering and Policing (802.1Q, 8.6.5.1)
 
  - Ethernet embedded switches:
    - Marvell (mv88e6xxx):
      - add MAB (port auth) offload support
      - enable PTP receive for mv88e6390
    - NXP (ocelot):
      - support MAC Merge layer
      - support for the the vsc7512 internal copper phys
    - Microchip:
      - lan9303: convert to PHYLINK
      - lan966x: support TC flower filter statistics
      - lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x
      - lan937x: support Credit Based Shaper configuration
      - ksz9477: support Energy Efficient Ethernet
    - other:
      - qca8k: convert to regmap read/write API, use bulk operations
      - rswitch: Improve TX timestamp accuracy
 
  - Intel WiFi (iwlwifi):
    - EHT (Wi-Fi 7) rate reporting
    - STEP equalizer support: transfer some STEP (connection to radio
      on platforms with integrated wifi) related parameters from the
      BIOS to the firmware.
 
  - Qualcomm 802.11ax WiFi (ath11k):
    - IPQ5018 support
    - Fine Timing Measurement (FTM) responder role support
    - channel 177 support
 
  - MediaTek WiFi (mt76):
    - per-PHY LED support
    - mt7996: EHT (Wi-Fi 7) support
    - Wireless Ethernet Dispatch (WED) reset support
    - switch to using page pool allocator
 
  - RealTek WiFi (rtw89):
    - support new version of Bluetooth co-existance
 
  - Mobile:
    - rmnet: support TX aggregation.
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmP1VIYACgkQMUZtbf5S
 IrvsChAApz0rNL/sPKxXTEfxZ1tN7D3sYxYKQPomxvl5BV+MvicrLddJy3KmzEFK
 nnJNO3nuRNuH422JQ/ylZ4mGX1opa6+5QJb0UINImXUI7Fm8HHBIuPGkv7d5CheZ
 7JexFqjPJXUy9nPyh1Rra+IA9AcRd2U7jeGEZR38wb99bHJQj5Bzdk20WArEB0el
 n44aqg49LXH71bSeXRz77x5SjkwVtYiccQxLcnmTbjLU2xVraLvI2J+wAhHnVXWW
 9lrU1+V4Ex2Xcd1xR0L0cHeK+meP1TrPRAeF+JDpVI3a/zJiE7cZjfHdG/jH5xWl
 leZJqghVozrZQNtewWWO7XhUFhMDgFu3W/1vNLjSHPZEqaz1JpM67J1+ql6s63l4
 LMWoXbcYZz+SL9ZRCoPkbGue/5fKSHv8/Jl9Sh58+eTS+c/zgN8uFGRNFXLX1+EP
 n8uvt985PxMd6x1+dHumhOUzxnY4Sfi1vjitSunTsNFQ3Cmp4SO0IfBVJWfLUCuC
 xz5hbJGJJbSpvUsO+HWyCg83E5OWghRE/Onpt2jsQSZCrO9HDg4FRTEf3WAMgaqc
 edb5KfbRZPTJQM08gWdluXzSk1nw3FNP2tXW4XlgUrEbjb+fOk0V9dQg2gyYTxQ1
 Nhvn8ZQPi6/GMMELHAIPGmmW1allyOGiAzGlQsv8EmL+OFM6WDI=
 =xXhC
 -----END PGP SIGNATURE-----

Merge tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
 "Core:

   - Add dedicated kmem_cache for typical/small skb->head, avoid having
     to access struct page at kfree time, and improve memory use.

   - Introduce sysctl to set default RPS configuration for new netdevs.

   - Define Netlink protocol specification format which can be used to
     describe messages used by each family and auto-generate parsers.
     Add tools for generating kernel data structures and uAPI headers.

   - Expose all net/core sysctls inside netns.

   - Remove 4s sleep in netpoll if carrier is instantly detected on
     boot.

   - Add configurable limit of MDB entries per port, and port-vlan.

   - Continue populating drop reasons throughout the stack.

   - Retire a handful of legacy Qdiscs and classifiers.

  Protocols:

   - Support IPv4 big TCP (TSO frames larger than 64kB).

   - Add IP_LOCAL_PORT_RANGE socket option, to control local port range
     on socket by socket basis.

   - Track and report in procfs number of MPTCP sockets used.

   - Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP path
     manager.

   - IPv6: don't check net.ipv6.route.max_size and rely on garbage
     collection to free memory (similarly to IPv4).

   - Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986).

   - ICMP: add per-rate limit counters.

   - Add support for user scanning requests in ieee802154.

   - Remove static WEP support.

   - Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate
     reporting.

   - WiFi 7 EHT channel puncturing support (client & AP).

  BPF:

   - Add a rbtree data structure following the "next-gen data structure"
     precedent set by recently added linked list, that is, by using
     kfunc + kptr instead of adding a new BPF map type.

   - Expose XDP hints via kfuncs with initial support for RX hash and
     timestamp metadata.

   - Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to
     better support decap on GRE tunnel devices not operating in collect
     metadata.

   - Improve x86 JIT's codegen for PROBE_MEM runtime error checks.

   - Remove the need for trace_printk_lock for bpf_trace_printk and
     bpf_trace_vprintk helpers.

   - Extend libbpf's bpf_tracing.h support for tracing arguments of
     kprobes/uprobes and syscall as a special case.

   - Significantly reduce the search time for module symbols by
     livepatch and BPF.

   - Enable cpumasks to be used as kptrs, which is useful for tracing
     programs tracking which tasks end up running on which CPUs in
     different time intervals.

   - Add support for BPF trampoline on s390x and riscv64.

   - Add capability to export the XDP features supported by the NIC.

   - Add __bpf_kfunc tag for marking kernel functions as kfuncs.

   - Add cgroup.memory=nobpf kernel parameter option to disable BPF
     memory accounting for container environments.

  Netfilter:

   - Remove the CLUSTERIP target. It has been marked as obsolete for
     years, and we still have WARN splats wrt races of the out-of-band
     /proc interface installed by this target.

   - Add 'destroy' commands to nf_tables. They are identical to the
     existing 'delete' commands, but do not return an error if the
     referenced object (set, chain, rule...) did not exist.

  Driver API:

   - Improve cpumask_local_spread() locality to help NICs set the right
     IRQ affinity on AMD platforms.

   - Separate C22 and C45 MDIO bus transactions more clearly.

   - Introduce new DCB table to control DSCP rewrite on egress.

   - Support configuration of Physical Layer Collision Avoidance (PLCA)
     Reconciliation Sublayer (RS) (802.3cg-2019). Modern version of
     shared medium Ethernet.

   - Support for MAC Merge layer (IEEE 802.3-2018 clause 99). Allowing
     preemption of low priority frames by high priority frames.

   - Add support for controlling MACSec offload using netlink SET.

   - Rework devlink instance refcounts to allow registration and
     de-registration under the instance lock. Split the code into
     multiple files, drop some of the unnecessarily granular locks and
     factor out common parts of netlink operation handling.

   - Add TX frame aggregation parameters (for USB drivers).

   - Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning
     messages with notifications for debug.

   - Allow offloading of UDP NEW connections via act_ct.

   - Add support for per action HW stats in TC.

   - Support hardware miss to TC action (continue processing in SW from
     a specific point in the action chain).

   - Warn if old Wireless Extension user space interface is used with
     modern cfg80211/mac80211 drivers. Do not support Wireless
     Extensions for Wi-Fi 7 devices at all. Everyone should switch to
     using nl80211 interface instead.

   - Improve the CAN bit timing configuration. Use extack to return
     error messages directly to user space, update the SJW handling,
     including the definition of a new default value that will benefit
     CAN-FD controllers, by increasing their oscillator tolerance.

  New hardware / drivers:

   - Ethernet:
      - nVidia BlueField-3 support (control traffic driver)
      - Ethernet support for imx93 SoCs
      - Motorcomm yt8531 gigabit Ethernet PHY
      - onsemi NCN26000 10BASE-T1S PHY (with support for PLCA)
      - Microchip LAN8841 PHY (incl. cable diagnostics and PTP)
      - Amlogic gxl MDIO mux

   - WiFi:
      - RealTek RTL8188EU (rtl8xxxu)
      - Qualcomm Wi-Fi 7 devices (ath12k)

   - CAN:
      - Renesas R-Car V4H

  Drivers:

   - Bluetooth:
      - Set Per Platform Antenna Gain (PPAG) for Intel controllers.

   - Ethernet NICs:
      - Intel (1G, igc):
         - support TSN / Qbv / packet scheduling features of i226 model
      - Intel (100G, ice):
         - use GNSS subsystem instead of TTY
         - multi-buffer XDP support
         - extend support for GPIO pins to E823 devices
      - nVidia/Mellanox:
         - update the shared buffer configuration on PFC commands
         - implement PTP adjphase function for HW offset control
         - TC support for Geneve and GRE with VF tunnel offload
         - more efficient crypto key management method
         - multi-port eswitch support
      - Netronome/Corigine:
         - add DCB IEEE support
         - support IPsec offloading for NFP3800
      - Freescale/NXP (enetc):
         - support XDP_REDIRECT for XDP non-linear buffers
         - improve reconfig, avoid link flap and waiting for idle
         - support MAC Merge layer
      - Other NICs:
         - sfc/ef100: add basic devlink support for ef100
         - ionic: rx_push mode operation (writing descriptors via MMIO)
         - bnxt: use the auxiliary bus abstraction for RDMA
         - r8169: disable ASPM and reset bus in case of tx timeout
         - cpsw: support QSGMII mode for J721e CPSW9G
         - cpts: support pulse-per-second output
         - ngbe: add an mdio bus driver
         - usbnet: optimize usbnet_bh() by avoiding unnecessary queuing
         - r8152: handle devices with FW with NCM support
         - amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation
         - virtio-net: support multi buffer XDP
         - virtio/vsock: replace virtio_vsock_pkt with sk_buff
         - tsnep: XDP support

   - Ethernet high-speed switches:
      - nVidia/Mellanox (mlxsw):
         - add support for latency TLV (in FW control messages)
      - Microchip (sparx5):
         - separate explicit and implicit traffic forwarding rules, make
           the implicit rules always active
         - add support for egress DSCP rewrite
         - IS0 VCAP support (Ingress Classification)
         - IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS
           etc.)
         - ES2 VCAP support (Egress Access Control)
         - support for Per-Stream Filtering and Policing (802.1Q,
           8.6.5.1)

   - Ethernet embedded switches:
      - Marvell (mv88e6xxx):
         - add MAB (port auth) offload support
         - enable PTP receive for mv88e6390
      - NXP (ocelot):
         - support MAC Merge layer
         - support for the the vsc7512 internal copper phys
      - Microchip:
         - lan9303: convert to PHYLINK
         - lan966x: support TC flower filter statistics
         - lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x
         - lan937x: support Credit Based Shaper configuration
         - ksz9477: support Energy Efficient Ethernet
      - other:
         - qca8k: convert to regmap read/write API, use bulk operations
         - rswitch: Improve TX timestamp accuracy

   - Intel WiFi (iwlwifi):
      - EHT (Wi-Fi 7) rate reporting
      - STEP equalizer support: transfer some STEP (connection to radio
        on platforms with integrated wifi) related parameters from the
        BIOS to the firmware.

   - Qualcomm 802.11ax WiFi (ath11k):
      - IPQ5018 support
      - Fine Timing Measurement (FTM) responder role support
      - channel 177 support

   - MediaTek WiFi (mt76):
      - per-PHY LED support
      - mt7996: EHT (Wi-Fi 7) support
      - Wireless Ethernet Dispatch (WED) reset support
      - switch to using page pool allocator

   - RealTek WiFi (rtw89):
      - support new version of Bluetooth co-existance

   - Mobile:
      - rmnet: support TX aggregation"

* tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1872 commits)
  page_pool: add a comment explaining the fragment counter usage
  net: ethtool: fix __ethtool_dev_mm_supported() implementation
  ethtool: pse-pd: Fix double word in comments
  xsk: add linux/vmalloc.h to xsk.c
  sefltests: netdevsim: wait for devlink instance after netns removal
  selftest: fib_tests: Always cleanup before exit
  net/mlx5e: Align IPsec ASO result memory to be as required by hardware
  net/mlx5e: TC, Set CT miss to the specific ct action instance
  net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG
  net/mlx5: Refactor tc miss handling to a single function
  net/mlx5: Kconfig: Make tc offload depend on tc skb extension
  net/sched: flower: Support hardware miss to tc action
  net/sched: flower: Move filter handle initialization earlier
  net/sched: cls_api: Support hardware miss to tc action
  net/sched: Rename user cookie and act cookie
  sfc: fix builds without CONFIG_RTC_LIB
  sfc: clean up some inconsistent indentings
  net/mlx4_en: Introduce flexible array to silence overflow warning
  net: lan966x: Fix possible deadlock inside PTP
  net/ulp: Remove redundant ->clone() test in inet_clone_ulp().
  ...
2023-02-21 18:24:12 -08:00
Linus Torvalds 8bf1a529cd arm64 updates for 6.3:
- Support for arm64 SME 2 and 2.1. SME2 introduces a new 512-bit
   architectural register (ZT0, for the look-up table feature) that Linux
   needs to save/restore.
 
 - Include TPIDR2 in the signal context and add the corresponding
   kselftests.
 
 - Perf updates: Arm SPEv1.2 support, HiSilicon uncore PMU updates, ACPI
   support to the Marvell DDR and TAD PMU drivers, reset DTM_PMU_CONFIG
   (ARM CMN) at probe time.
 
 - Support for DYNAMIC_FTRACE_WITH_CALL_OPS on arm64.
 
 - Permit EFI boot with MMU and caches on. Instead of cleaning the entire
   loaded kernel image to the PoC and disabling the MMU and caches before
   branching to the kernel bare metal entry point, leave the MMU and
   caches enabled and rely on EFI's cacheable 1:1 mapping of all of
   system RAM to populate the initial page tables.
 
 - Expose the AArch32 (compat) ELF_HWCAP features to user in an arm64
   kernel (the arm32 kernel only defines the values).
 
 - Harden the arm64 shadow call stack pointer handling: stash the shadow
   stack pointer in the task struct on interrupt, load it directly from
   this structure.
 
 - Signal handling cleanups to remove redundant validation of size
   information and avoid reading the same data from userspace twice.
 
 - Refactor the hwcap macros to make use of the automatically generated
   ID registers. It should make new hwcaps writing less error prone.
 
 - Further arm64 sysreg conversion and some fixes.
 
 - arm64 kselftest fixes and improvements.
 
 - Pointer authentication cleanups: don't sign leaf functions, unify
   asm-arch manipulation.
 
 - Pseudo-NMI code generation optimisations.
 
 - Minor fixes for SME and TPIDR2 handling.
 
 - Miscellaneous updates: ARCH_FORCE_MAX_ORDER is now selectable, replace
   strtobool() to kstrtobool() in the cpufeature.c code, apply dynamic
   shadow call stack in two passes, intercept pfn changes in set_pte_at()
   without the required break-before-make sequence, attempt to dump all
   instructions on unhandled kernel faults.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmP0/QsACgkQa9axLQDI
 XvG+gA/+JDVEH9wRzAIZvbp9hSuohPc48xgAmIMP1eiVB0/5qeRjYAJwS33H0rXS
 BPC2kj9IBy/eQeM9ICg0nFd0zYznSVacITqe6NrqeJ1F+ftS4rrHdfxd+J7kIoCs
 V2L8e+BJvmHdhmNV2qMAgJdGlfxfQBA7fv2cy52HKYcouoOh1AUVR/x+yXVXAsCd
 qJP3+dlUKccgm/oc5unEC1eZ49u8O+EoasqOyfG6K5udMgzhEX3K6imT9J3hw0WT
 UjstYkx5uGS/prUrRCQAX96VCHoZmzEDKtQuHkHvQXEYXsYPF3ldbR2CziNJnHe7
 QfSkjJlt8HAtExA+BkwEe9i0MQO/2VF5qsa2e4fA6l7uqGu3LOtS/jJd23C9n9fR
 Id8aBMeN6S8+MjqRA9L2uf4t6e4ISEHoG9ZRdc4WOwloxEEiJoIeun+7bHdOSZLj
 AFdHFCz4NXiiwC0UP0xPDI2YeCLqt5np7HmnrUqwzRpVO8UUagiJD8TIpcBSjBN9
 J68eidenHUW7/SlIeaMKE2lmo8AUEAJs9AorDSugF19/ThJcQdx7vT2UAZjeVB3j
 1dbbwajnlDOk/w8PQC4thFp5/MDlfst0htS3WRwa+vgkweE2EAdTU4hUZ8qEP7FQ
 smhYtlT1xUSTYDTqoaG/U2OWR6/UU79wP0jgcOsHXTuyYrtPI/Q=
 =VmXL
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:

 - Support for arm64 SME 2 and 2.1. SME2 introduces a new 512-bit
   architectural register (ZT0, for the look-up table feature) that
   Linux needs to save/restore

 - Include TPIDR2 in the signal context and add the corresponding
   kselftests

 - Perf updates: Arm SPEv1.2 support, HiSilicon uncore PMU updates, ACPI
   support to the Marvell DDR and TAD PMU drivers, reset DTM_PMU_CONFIG
   (ARM CMN) at probe time

 - Support for DYNAMIC_FTRACE_WITH_CALL_OPS on arm64

 - Permit EFI boot with MMU and caches on. Instead of cleaning the
   entire loaded kernel image to the PoC and disabling the MMU and
   caches before branching to the kernel bare metal entry point, leave
   the MMU and caches enabled and rely on EFI's cacheable 1:1 mapping of
   all of system RAM to populate the initial page tables

 - Expose the AArch32 (compat) ELF_HWCAP features to user in an arm64
   kernel (the arm32 kernel only defines the values)

 - Harden the arm64 shadow call stack pointer handling: stash the shadow
   stack pointer in the task struct on interrupt, load it directly from
   this structure

 - Signal handling cleanups to remove redundant validation of size
   information and avoid reading the same data from userspace twice

 - Refactor the hwcap macros to make use of the automatically generated
   ID registers. It should make new hwcaps writing less error prone

 - Further arm64 sysreg conversion and some fixes

 - arm64 kselftest fixes and improvements

 - Pointer authentication cleanups: don't sign leaf functions, unify
   asm-arch manipulation

 - Pseudo-NMI code generation optimisations

 - Minor fixes for SME and TPIDR2 handling

 - Miscellaneous updates: ARCH_FORCE_MAX_ORDER is now selectable,
   replace strtobool() to kstrtobool() in the cpufeature.c code, apply
   dynamic shadow call stack in two passes, intercept pfn changes in
   set_pte_at() without the required break-before-make sequence, attempt
   to dump all instructions on unhandled kernel faults

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (130 commits)
  arm64: fix .idmap.text assertion for large kernels
  kselftest/arm64: Don't require FA64 for streaming SVE+ZA tests
  kselftest/arm64: Copy whole EXTRA context
  arm64: kprobes: Drop ID map text from kprobes blacklist
  perf: arm_spe: Print the version of SPE detected
  perf: arm_spe: Add support for SPEv1.2 inverted event filtering
  perf: Add perf_event_attr::config3
  arm64/sme: Fix __finalise_el2 SMEver check
  drivers/perf: fsl_imx8_ddr_perf: Remove set-but-not-used variable
  arm64/signal: Only read new data when parsing the ZT context
  arm64/signal: Only read new data when parsing the ZA context
  arm64/signal: Only read new data when parsing the SVE context
  arm64/signal: Avoid rereading context frame sizes
  arm64/signal: Make interface for restore_fpsimd_context() consistent
  arm64/signal: Remove redundant size validation from parse_user_sigframe()
  arm64/signal: Don't redundantly verify FPSIMD magic
  arm64/cpufeature: Use helper macros to specify hwcaps
  arm64/cpufeature: Always use symbolic name for feature value in hwcaps
  arm64/sysreg: Initial unsigned annotations for ID registers
  arm64/sysreg: Initial annotation of signed ID registers
  ...
2023-02-21 15:27:48 -08:00
Juhyung Park 9e0c7efa5e block: remove more NULL checks after bdev_get_queue()
bdev_get_queue() never returns NULL. Several commits [1][2] have been made
before to remove such superfluous checks, but some still remained.

For places where bdev_get_queue() is called solely for NULL checks, it is
removed entirely.

[1] commit ec9fd2a13d ("blk-lib: don't check bdev_get_queue() NULL check")
[2] commit fea127b36c ("block: remove superfluous check for request queue in bdev_is_zoned()")

Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
Reviewed-by: Pankaj Raghav <p.raghav@samsung.com>
Link: https://lore.kernel.org/r/20230203024029.48260-1-qkrwngud825@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-21 09:23:22 -07:00
Linus Torvalds 1f2d9ffc7a Scheduler updates in this cycle are:
- Improve the scalability of the CFS bandwidth unthrottling logic
    with large number of CPUs.
 
  - Fix & rework various cpuidle routines, simplify interaction with
    the generic scheduler code. Add __cpuidle methods as noinstr to
    objtool's noinstr detection and fix boatloads of cpuidle bugs & quirks.
 
  - Add new ABI: introduce MEMBARRIER_CMD_GET_REGISTRATIONS,
    to query previously issued registrations.
 
  - Limit scheduler slice duration to the sysctl_sched_latency period,
    to improve scheduling granularity with a large number of SCHED_IDLE
    tasks.
 
  - Debuggability enhancement on sys_exit(): warn about disabled IRQs,
    but also enable them to prevent a cascade of followup problems and
    repeat warnings.
 
  - Fix the rescheduling logic in prio_changed_dl().
 
  - Micro-optimize cpufreq and sched-util methods.
 
  - Micro-optimize ttwu_runnable()
 
  - Micro-optimize the idle-scanning in update_numa_stats(),
    select_idle_capacity() and steal_cookie_task().
 
  - Update the RSEQ code & self-tests
 
  - Constify various scheduler methods
 
  - Remove unused methods
 
  - Refine __init tags
 
  - Documentation updates
 
  - ... Misc other cleanups, fixes
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmPzbJwRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1iIvA//ZcEaB8Z6ChLRQjM+bsaudKJu3pdLQbPK
 iYbP8Da+LsAfxbEfYuGV3m+jIp0LlBOtsI/EezxQrXV+V7FvNyAX9Y00eEu/zlj8
 7Jn3LMy/DBYTwH7LwVdcU0MyIVI8ZPc6WNnkx0LOtGZn8n+qfHPSDzcP3CW+a5AV
 UvllPYpYyEmsX0Eby7CF4Ue8mSmbViw/xR3rNr8ZSve0c25XzKabw8O9kE3jiHxP
 d/zERJoAYeDyYUEuZqhfn5dTlB4an4IjNEkAfRE5SQ09RA8Gkxsa5Ar8gob9e9M1
 eQsdd4/bdhnrkM8L5qDZczqmgCTZ2bukQrxkBXhRDhLgoFxwAn77b+2ZjmIW3Lae
 AyGqRcDSg1q2oxaYm5ZiuO/t26aDOZu9vPHyHRDGt95EGbZlrp+GgeePyfCigJYz
 UmPdZAAcHdSymnnnlcvdG37WVvaVkpgWZzd8LbtBi23QR+Zc4WQ2IlgnUS5WKNNf
 VOBcAcP6E1IslDotZDQCc2dPFFQoQQEssVooyUc5oMytm7BsvxXLOeHG+Ncu/8uc
 H+U8Qn8jnqTxJbC5hkWQIJlhVKCq2FJrHxxySYTKROfUNcDgCmxboFeAcXTCIU1K
 T0S+sdoTS/CvtLklRkG0j6B8N4N98mOd9cFwUV3tX+/gMLMep3hCQs5L76JagvC5
 skkQXoONNaM=
 =l1nN
 -----END PGP SIGNATURE-----

Merge tag 'sched-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Ingo Molnar:

 - Improve the scalability of the CFS bandwidth unthrottling logic with
   large number of CPUs.

 - Fix & rework various cpuidle routines, simplify interaction with the
   generic scheduler code. Add __cpuidle methods as noinstr to objtool's
   noinstr detection and fix boatloads of cpuidle bugs & quirks.

 - Add new ABI: introduce MEMBARRIER_CMD_GET_REGISTRATIONS, to query
   previously issued registrations.

 - Limit scheduler slice duration to the sysctl_sched_latency period, to
   improve scheduling granularity with a large number of SCHED_IDLE
   tasks.

 - Debuggability enhancement on sys_exit(): warn about disabled IRQs,
   but also enable them to prevent a cascade of followup problems and
   repeat warnings.

 - Fix the rescheduling logic in prio_changed_dl().

 - Micro-optimize cpufreq and sched-util methods.

 - Micro-optimize ttwu_runnable()

 - Micro-optimize the idle-scanning in update_numa_stats(),
   select_idle_capacity() and steal_cookie_task().

 - Update the RSEQ code & self-tests

 - Constify various scheduler methods

 - Remove unused methods

 - Refine __init tags

 - Documentation updates

 - Misc other cleanups, fixes

* tag 'sched-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (110 commits)
  sched/rt: pick_next_rt_entity(): check list_entry
  sched/deadline: Add more reschedule cases to prio_changed_dl()
  sched/fair: sanitize vruntime of entity being placed
  sched/fair: Remove capacity inversion detection
  sched/fair: unlink misfit task from cpu overutilized
  objtool: mem*() are not uaccess safe
  cpuidle: Fix poll_idle() noinstr annotation
  sched/clock: Make local_clock() noinstr
  sched/clock/x86: Mark sched_clock() noinstr
  x86/pvclock: Improve atomic update of last_value in pvclock_clocksource_read()
  x86/atomics: Always inline arch_atomic64*()
  cpuidle: tracing, preempt: Squash _rcuidle tracing
  cpuidle: tracing: Warn about !rcu_is_watching()
  cpuidle: lib/bug: Disable rcu_is_watching() during WARN/BUG
  cpuidle: drivers: firmware: psci: Dont instrument suspend code
  KVM: selftests: Fix build of rseq test
  exit: Detect and fix irq disabled state in oops
  cpuidle, arm64: Fix the ARM64 cpuidle logic
  cpuidle: mvebu: Fix duplicate flags assignment
  sched/fair: Limit sched slice duration
  ...
2023-02-20 17:41:08 -08:00
Linus Torvalds a2f0e7eee1 The latest perf updates in this cycle are:
- Optimize perf_sample_data layout
  - Prepare sample data handling for BPF integration
  - Update the x86 PMU driver for Intel Meteor Lake
  - Restructure the x86 uncore code to fix a SPR (Sapphire Rapids)
    discovery breakage
  - Fix the x86 Zhaoxin PMU driver
  - Cleanups
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmPzaHgRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1jYQg/+KRfobCevMQlZVnz09T3SsJ4ahJ587BL6
 g2C6kobyUNfeChpFVroBkTR+yCb6Mq4xGr2nda9+2E978BYu9eanpx/u/bXNQ6NU
 6YhLwgRrlFXonYn07kFfUJeELZ0W+zpPvymEN1KhTQWcrgXDfXRt2VfMwNsVxGRF
 ZRyCWK+UOzSMU22FtW3I/xVLBB0vio9Y6wRC5QOpDVW5YtGwQGust7GJ53JPK43J
 m2soJvWORauT+v0aqc7ggOtKd6pahVoXrDrbktxtq9N0ZGI+PubVCGevex++cXm/
 B3QSf6VcMMuU6pfzxiEwRa8Whrc3XFeSDEfvMjC5v3becGNkdNBnGOJzYprwgRZJ
 irb6/dSrv5P2lj6WphsO1Wzcm7EoWh8M7DVOMh/13Y/oODRdOrv48112Don9UURC
 EPyvzAzizqdwdDopUmfiqUwuAXqb8uPZqCgmlz/NJkVz1/ijlfrmLgeDuf0vI7Aq
 HznzzRwjFHzyCH7D+rtonFh3JDaqgaouY76tpC5yTtzKbZPlFT8kzeCvqkTMnGgH
 czZnSNc/kBup0HDkNSlthK+TyrMXWKeVa8KQSY1E0NJHO4IBBCMzZywSoAaeofQK
 hqfQyofX9XHmuHhCA4yIfv1XkZGlBTxpPAyDdHjgs9iJTsodSYMs8ESY08eW8DXn
 Ld/35O6SylM=
 =ztUT
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf updates from Ingo Molnar:

 - Optimize perf_sample_data layout

 - Prepare sample data handling for BPF integration

 - Update the x86 PMU driver for Intel Meteor Lake

 - Restructure the x86 uncore code to fix a SPR (Sapphire Rapids)
   discovery breakage

 - Fix the x86 Zhaoxin PMU driver

 - Cleanups

* tag 'perf-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
  perf/x86/intel/uncore: Add Meteor Lake support
  x86/perf/zhaoxin: Add stepping check for ZXC
  perf/x86/intel/ds: Fix the conversion from TSC to perf time
  perf/x86/uncore: Don't WARN_ON_ONCE() for a broken discovery table
  perf/x86/uncore: Add a quirk for UPI on SPR
  perf/x86/uncore: Ignore broken units in discovery table
  perf/x86/uncore: Fix potential NULL pointer in uncore_get_alias_name
  perf/x86/uncore: Factor out uncore_device_to_die()
  perf/core: Call perf_prepare_sample() before running BPF
  perf/core: Introduce perf_prepare_header()
  perf/core: Do not pass header for sample ID init
  perf/core: Set data->sample_flags in perf_prepare_sample()
  perf/core: Add perf_sample_save_brstack() helper
  perf/core: Add perf_sample_save_raw_data() helper
  perf/core: Add perf_sample_save_callchain() helper
  perf/core: Save the dynamic parts of sample data size
  x86/kprobes: Use switch-case for 0xFF opcodes in prepare_emulation
  perf/core: Change the layout of perf_sample_data
  perf/x86/msr: Add Meteor Lake support
  perf/x86/cstate: Add Meteor Lake support
  ...
2023-02-20 17:29:55 -08:00
Quanfa Fu c96abaec78 tracing/eprobe: no need to check for negative ret value for snprintf
No need to check for negative return value from snprintf() as the
code does not return negative values.

Link: https://lore.kernel.org/all/20230109040625.3259642-1-quanfafu@gmail.com/

Signed-off-by: Quanfa Fu <quanfafu@gmail.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2023-02-21 08:52:42 +09:00
Donglin Peng 8478cca1e3 tracing/probe: add a char type to show the character value of traced arguments
There are scenes that we want to show the character value of traced
arguments other than a decimal or hexadecimal or string value for debug
convinience. I add a new type named 'char' to do it and a new test case
file named 'kprobe_args_char.tc' to do selftest for char type.

For example:

The to be traced function is 'void demo_func(char type, char *name);', we
can add a kprobe event as follows to show argument values as we want:

echo  'p:myprobe demo_func $arg1:char +0($arg2):char[5]' > kprobe_events

we will get the following trace log:

... myprobe: (demo_func+0x0/0x29) arg1='A' arg2={'b','p','f','1',''}

Link: https://lore.kernel.org/all/20221219110613.367098-1-dolinux.peng@gmail.com/

Signed-off-by: Donglin Peng <dolinux.peng@gmail.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2023-02-21 08:52:42 +09:00
Masami Hiramatsu (Google) 133921530c tracing/eprobe: Fix to add filter on eprobe description in README file
Fix to add a description of the filter on eprobe in README file. This
is required to identify the kernel supports the filter on eprobe or not.

Link: https://lore.kernel.org/all/167309833728.640500.12232259238201433587.stgit@devnote3/

Fixes: 752be5c5c9 ("tracing/eprobe: Add eprobe filter support")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-02-21 08:49:16 +09:00
Linus Torvalds 5b0ed59649 for-6.3/block-2023-02-16
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmPvfncQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpob2EADXJxcr2jjYHm/7cjKkyuVX8fr80dNdMeuY
 JFdsjG1k6Uj73BVhQQWYTcs/PsrWBHWRsv6uz4WgOELj55eXmf5Q0kJszyUeJW33
 /DjqLvtoppVcYf80xE13wKvCfn73BjwQo6xkGM0qAYn15eaXiD/Ax3xC6eJlsBeK
 PEw7EJyhacbSxZa/1D2B6+mqII1jUQWProTCc3udZ4JHi3WvdWa3Rda0qCqHl4a1
 +K2aP2YTFIRPxBzfMNa/CafWVIFubTdht+4Ds6R60RImzB9e0VUBfcsiUyW5Zg7L
 Fwv7ptXuWrALwVNdW56Oz1QikBxn2pdRR2HMLwKJW1MD8kP9r8LMm2jV5Rhiwe0B
 OQsGRYkOzBvw+bxeP5fvk0iPGVMz6ActH4gkraA5QdLqayDaFYOadlhqz0uRo5SH
 Fb42Vl658K/MHDSIk8U58TNkmrsIJsBGohXI9DOGINPPvv3XOPi4Q1HmXkGRmii0
 y+lNU/QEGh7xXXew29SPP76uQpQaYfC7NxXCMw/OpOMwehzjsjshmM2lpxi8zsgt
 PJUmfHv5qxCplNmTJXmUpmX7sS7550HUdu9FJb13DM+gzKg8bk9jWVuLrzqrVlG5
 1hKWEl1+heg1heRfaIuJVLbPI0au6Sb4uqhih/PHyrP9TWIoAruDbDJM65GKTxyE
 2uEgcHzHQw==
 =poRc
 -----END PGP SIGNATURE-----

Merge tag 'for-6.3/block-2023-02-16' of git://git.kernel.dk/linux

Pull block updates from Jens Axboe:

 - NVMe updates via Christoph:
      - Small improvements to the logging functionality (Amit Engel)
      - Authentication cleanups (Hannes Reinecke)
      - Cleanup and optimize the DMA mapping cod in the PCIe driver
        (Keith Busch)
      - Work around the command effects for Format NVM (Keith Busch)
      - Misc cleanups (Keith Busch, Christoph Hellwig)
      - Fix and cleanup freeing single sgl (Keith Busch)

 - MD updates via Song:
      - Fix a rare crash during the takeover process
      - Don't update recovery_cp when curr_resync is ACTIVE
      - Free writes_pending in md_stop
      - Change active_io to percpu

 - Updates to drbd, inching us closer to unifying the out-of-tree driver
   with the in-tree one (Andreas, Christoph, Lars, Robert)

 - BFQ update adding support for multi-actuator drives (Paolo, Federico,
   Davide)

 - Make brd compliant with REQ_NOWAIT (me)

 - Fix for IOPOLL and queue entering, fixing stalled IO waiting on
   timeouts (me)

 - Fix for REQ_NOWAIT with multiple bios (me)

 - Fix memory leak in blktrace cleanup (Greg)

 - Clean up sbitmap and fix a potential hang (Kemeng)

 - Clean up some bits in BFQ, and fix a bug in the request injection
   (Kemeng)

 - Clean up the request allocation and issue code, and fix some bugs
   related to that (Kemeng)

 - ublk updates and fixes:
      - Add support for unprivileged ublk (Ming)
      - Improve device deletion handling (Ming)
      - Misc (Liu, Ziyang)

 - s390 dasd fixes (Alexander, Qiheng)

 - Improve utility of request caching and fixes (Anuj, Xiao)

 - zoned cleanups (Pankaj)

 - More constification for kobjs (Thomas)

 - blk-iocost cleanups (Yu)

 - Remove bio splitting from drivers that don't need it (Christoph)

 - Switch blk-cgroups to use struct gendisk. Some of this is now
   incomplete as select late reverts were done. (Christoph)

 - Add bvec initialization helpers, and convert callers to use that
   rather than open-coding it (Christoph)

 - Misc fixes and cleanups (Jinke, Keith, Arnd, Bart, Li, Martin,
   Matthew, Ulf, Zhong)

* tag 'for-6.3/block-2023-02-16' of git://git.kernel.dk/linux: (169 commits)
  brd: use radix_tree_maybe_preload instead of radix_tree_preload
  block: use proper return value from bio_failfast()
  block: bio-integrity: Copy flags when bio_integrity_payload is cloned
  block: Fix io statistics for cgroup in throttle path
  brd: mark as nowait compatible
  brd: check for REQ_NOWAIT and set correct page allocation mask
  brd: return 0/-error from brd_insert_page()
  block: sync mixed merged request's failfast with 1st bio's
  Revert "blk-cgroup: pin the gendisk in struct blkcg_gq"
  Revert "blk-cgroup: pass a gendisk to blkg_lookup"
  Revert "blk-cgroup: delay blk-cgroup initialization until add_disk"
  Revert "blk-cgroup: delay calling blkcg_exit_disk until disk_release"
  Revert "blk-cgroup: move the cgroup information to struct gendisk"
  nvme-pci: remove iod use_sgls
  nvme-pci: fix freeing single sgl
  block: ublk: check IO buffer based on flag need_get_data
  s390/dasd: Fix potential memleak in dasd_eckd_init()
  s390/dasd: sort out physical vs virtual pointers usage
  block: Remove the ALLOC_CACHE_SLACK constant
  block: make kobj_type structures constant
  ...
2023-02-20 14:27:21 -08:00
Wang ShaoBo 7568a21e52 tracing: Remove unnecessary NULL assignment
Remove unnecessary NULL assignment int create_new_subsystem().

Link: https://lkml.kernel.org/r/20221123065124.3982439-1-bobo.shaobowang@huawei.com

Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-02-18 14:34:36 -05:00
Ross Zwisler 2455f0e124 tracing: Always use canonical ftrace path
The canonical location for the tracefs filesystem is at /sys/kernel/tracing.

But, from Documentation/trace/ftrace.rst:

  Before 4.1, all ftrace tracing control files were within the debugfs
  file system, which is typically located at /sys/kernel/debug/tracing.
  For backward compatibility, when mounting the debugfs file system,
  the tracefs file system will be automatically mounted at:

  /sys/kernel/debug/tracing

Many comments and Kconfig help messages in the tracing code still refer
to this older debugfs path, so let's update them to avoid confusion.

Link: https://lore.kernel.org/linux-trace-kernel/20230215223350.2658616-2-zwisler@google.com

Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com>
Signed-off-by: Ross Zwisler <zwisler@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-02-18 14:34:09 -05:00
David S. Miller 675f176b4d Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/net
Some of the devlink bits were tricky, but I think I got it right.

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-17 11:06:39 +00:00
Tom Zanussi f5914b301a tracing/histogram: Fix stacktrace key
The current code will always use the current stacktrace as a key even
if a stacktrace contained in a specific event field was specified.

For example, we expect to use the 'unsigned long[] stack' field in the
below event in the histogram:

  # echo 's:block_lat pid_t pid; u64 delta; unsigned long[] stack;' > /sys/kernel/debug/tracing/dynamic_events
  # echo 'hist:keys=delta.buckets=100,stack.stacktrace:sort=delta' > /sys/kernel/debug/tracing/events/synthetic/block_lat/trigger

But in fact, when we type out the trigger, we see that it's using the
plain old global 'stacktrace' as the key, which is just the stacktrace
when the event was hit and not the stacktrace contained in the event,
which is what we want:

  # cat /sys/kernel/debug/tracing/events/synthetic/block_lat/trigger
  hist:keys=delta.buckets=100,stacktrace:vals=hitcount:sort=delta.buckets=100:size=2048 [active]

And in fact, there's no code to actually retrieve it from the event,
so we need to add HIST_FIELD_FN_STACK and hist_field_stack() to get it
and hook it into the trigger code.  For now, since the stack is just
using dynamic strings, this could just use the dynamic string
function, but it seems cleaner to have a dedicated function an be able
to tweak independently as necessary.

Link: https://lkml.kernel.org/r/11aa614c82976adbfa4ea763dbe885b5fb01d59c.1676063532.git.zanussi@kernel.org

Signed-off-by: Tom Zanussi <zanussi@kernel.org>
[ Fixed 32bit build warning reported by kernel test robot <lkp@intel.com> ]
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-02-16 13:48:15 -05:00
Tom Zanussi 2bacfd9f7e tracing/histogram: Fix a few problems with stacktrace variable printing
Currently, there are a few problems when printing hist triggers and
trace output when using stacktrace variables.  This fixes the problems
seen below:

  # echo 'hist:keys=delta.buckets=100,stack.stacktrace:sort=delta' > /sys/kernel/debug/tracing/events/synthetic/block_lat/trigger
  # cat /sys/kernel/debug/tracing/events/synthetic/block_lat/trigger
  hist:keys=delta.buckets=100,stacktrace:vals=hitcount:sort=delta.buckets=100:size=2048 [active]

  # echo 'hist:keys=next_pid:ts=common_timestamp.usecs,st=stacktrace  if prev_state == 2' >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
  # cat /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
  hist:keys=next_pid:vals=hitcount:ts=common_timestamp.usecs,st=stacktrace.stacktrace:sort=hitcount:size=2048:clock=global if prev_state == 2 [active]

and also in the trace output (should be stack.stacktrace):

  {  delta: ~ 100-199, stacktrace         __schedule+0xa19/0x1520

Link: https://lkml.kernel.org/r/60bebd4e546728e012a7a2bcbf58716d48ba6edb.1676063532.git.zanussi@kernel.org

Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-02-16 12:37:59 -05:00
Steven Rostedt (Google) 8261ef2eb3 tracing: Add BUILD_BUG() to make sure stacktrace fits in strings
The max string length for a histogram variable is 256 bytes. The max depth
of a stacktrace is 16. With 8byte words, that's 16 * 8 = 128. Which can
easily fit in the string variable. The histogram stacktrace is being
stored in the string value (with the given max length), with the
assumption it will fit. To make sure that this is always the case (in the
case that the stack trace depth increases), add a BUILD_BUG_ON() to test
this.

Link: https://lore.kernel.org/linux-trace-kernel/20230214002418.0103b9e765d3e5c374d2aa7d@kernel.org/

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-02-15 20:25:38 -05:00
Tom Zanussi fc1a9dc101 tracing/histogram: Don't use strlen to find length of stacktrace variables
Because stacktraces are saved in dynamic strings,
trace_event_raw_event_synth() uses strlen to determine the length of
the stack.  Stacktraces may contain 0-bytes, though, in the saved
addresses, so the length found and passed to reserve() will be too
small.

Fix this by using the first unsigned long in the stack variables to
store the actual number of elements in the stack and have
trace_event_raw_event_synth() use that to determine the length of the
stack.

Link: https://lkml.kernel.org/r/1ed6906cd9d6477ef2bd8e63c61de20a9ffe64d7.1676063532.git.zanussi@kernel.org

Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-02-15 19:59:09 -05:00
Linus Torvalds ca5ca22775 tracing: Make trace_define_field_ext() static
Just after the fix to TASK_COMM_LEN not converted to its value in
 trace_events was pulled, the kernel test robot reported that the helper
 function trace_define_field_ext() added to that change was only used in
 the file it was defined in but was not declared static. Make it a local
 function.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCY+pQvhQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qpWMAP958Izvo22zPjlvqypLrC4wkwOrU6BG
 ITApOESLGS6YMAEA3X1qVpjgXClFmRv6j+J7S6LdhUzhkOm9Sxg5Vejxzgo=
 =4fmj
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.2-rc7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fixlet from Steven Rostedt:
 "Make trace_define_field_ext() static.

  Just after the fix to TASK_COMM_LEN not converted to its value in
  trace_events was pulled, the kernel test robot reported that the
  helper function trace_define_field_ext() added to that change was only
  used in the file it was defined in but was not declared static.

  Make it a local function"

* tag 'trace-v6.2-rc7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Make trace_define_field_ext() static
2023-02-15 11:31:34 -08:00
Mukesh Ojha 8843e06f67 ring-buffer: Handle race between rb_move_tail and rb_check_pages
It seems a data race between ring_buffer writing and integrity check.
That is, RB_FLAG of head_page is been updating, while at same time
RB_FLAG was cleared when doing integrity check rb_check_pages():

  rb_check_pages()            rb_handle_head_page():
  --------                    --------
  rb_head_page_deactivate()
                              rb_head_page_set_normal()
  rb_head_page_activate()

We do intergrity test of the list to check if the list is corrupted and
it is still worth doing it. So, let's refactor rb_check_pages() such that
we no longer clear and set flag during the list sanity checking.

[1] and [2] are the test to reproduce and the crash report respectively.

1:
``` read_trace.sh
  while true;
  do
    # the "trace" file is closed after read
    head -1 /sys/kernel/tracing/trace > /dev/null
  done
```
``` repro.sh
  sysctl -w kernel.panic_on_warn=1
  # function tracer will writing enough data into ring_buffer
  echo function > /sys/kernel/tracing/current_tracer
  ./read_trace.sh &
  ./read_trace.sh &
  ./read_trace.sh &
  ./read_trace.sh &
  ./read_trace.sh &
  ./read_trace.sh &
  ./read_trace.sh &
  ./read_trace.sh &
```

2:
------------[ cut here ]------------
WARNING: CPU: 9 PID: 62 at kernel/trace/ring_buffer.c:2653
rb_move_tail+0x450/0x470
Modules linked in:
CPU: 9 PID: 62 Comm: ksoftirqd/9 Tainted: G        W          6.2.0-rc6+
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
RIP: 0010:rb_move_tail+0x450/0x470
Code: ff ff 4c 89 c8 f0 4d 0f b1 02 48 89 c2 48 83 e2 fc 49 39 d0 75 24
83 e0 03 83 f8 02 0f 84 e1 fb ff ff 48 8b 57 10 f0 ff 42 08 <0f> 0b 83
f8 02 0f 84 ce fb ff ff e9 db
RSP: 0018:ffffb5564089bd00 EFLAGS: 00000203
RAX: 0000000000000000 RBX: ffff9db385a2bf81 RCX: ffffb5564089bd18
RDX: ffff9db281110100 RSI: 0000000000000fe4 RDI: ffff9db380145400
RBP: ffff9db385a2bf80 R08: ffff9db385a2bfc0 R09: ffff9db385a2bfc2
R10: ffff9db385a6c000 R11: ffff9db385a2bf80 R12: 0000000000000000
R13: 00000000000003e8 R14: ffff9db281110100 R15: ffffffffbb006108
FS:  0000000000000000(0000) GS:ffff9db3bdcc0000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005602323024c8 CR3: 0000000022e0c000 CR4: 00000000000006e0
Call Trace:
 <TASK>
 ring_buffer_lock_reserve+0x136/0x360
 ? __do_softirq+0x287/0x2df
 ? __pfx_rcu_softirq_qs+0x10/0x10
 trace_function+0x21/0x110
 ? __pfx_rcu_softirq_qs+0x10/0x10
 ? __do_softirq+0x287/0x2df
 function_trace_call+0xf6/0x120
 0xffffffffc038f097
 ? rcu_softirq_qs+0x5/0x140
 rcu_softirq_qs+0x5/0x140
 __do_softirq+0x287/0x2df
 run_ksoftirqd+0x2a/0x30
 smpboot_thread_fn+0x188/0x220
 ? __pfx_smpboot_thread_fn+0x10/0x10
 kthread+0xe7/0x110
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x2c/0x50
 </TASK>
---[ end trace 0000000000000000 ]---

[ crash report and test reproducer credit goes to Zheng Yejian]

Link: https://lore.kernel.org/linux-trace-kernel/1676376403-16462-1-git-send-email-quic_mojha@quicinc.com

Cc: <mhiramat@kernel.org>
Cc: stable@vger.kernel.org
Fixes: 1039221cc2 ("ring-buffer: Do not disable recording when there is an iterator")
Reported-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-02-15 11:59:16 -05:00
Yang Yang d6e595792f alpha: replace NR_SYSCALLS by NR_syscalls
Reference to other arch likes x86_64 or arm64 to do this replacement.
To solve compile error when using NR_syscalls in kernel[1].

[1] https://lore.kernel.org/all/202203270449.WBYQF9X3-lkp@intel.com/

Signed-off-by: Yang Yang <yang.yang29@zte.com.cn>
Signed-off-by: Matt Turner <mattst88@gmail.com>
2023-02-14 12:37:17 -05:00
Steven Rostedt (Google) 70b5339caf tracing: Make trace_define_field_ext() static
trace_define_field_ext() is not used outside of trace_events.c, it should
be static.

Link: https://lore.kernel.org/oe-kbuild-all/202302130750.679RaRog-lkp@intel.com/

Fixes: b6c7abd1c2 ("tracing: Fix TASK_COMM_LEN in trace event format file")
Reported-by: Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-02-12 20:14:11 -05:00
Linus Torvalds 5e98e916f9 tracing: Fix showing of TASK_COMM_LEN instead of its value
The TASK_COMM_LEN was converted from a macro into an enum so that BTF
 would have access to it. But this unfortunately caused TASK_COMM_LEN to
 display in the format fields of trace events, as they are created by the
 TRACE_EVENT() macro and such, macros convert to their values, where as
 enums do not.
 
 To handle this, instead of using the field itself to be display, save the
 value of the array size as another field in the trace_event_fields
 structure, and use that instead. Not only does this fix the issue, but
 also converts the other trace events that have this same problem (but were
 not breaking tooling). With this change, the original work around
 b3bc8547d3 ("tracing: Have TRACE_DEFINE_ENUM affect trace event types
 as well") could be reverted (but that should be done in the merge window).
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCY+lOqxQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6quYPAQD+9j+RPIUm9Ms4XCIEOXkFI04yjsbd
 rQSRcpYBWyP59AEAnZNYNwE11vDsKBGxPrOPgGYe4Pzfr5yOWY84mgaxgwo=
 =iYsE
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fix from Steven Rostedt:
 "Fix showing of TASK_COMM_LEN instead of its value

  The TASK_COMM_LEN was converted from a macro into an enum so that BTF
  would have access to it. But this unfortunately caused TASK_COMM_LEN
  to display in the format fields of trace events, as they are created
  by the TRACE_EVENT() macro and such, macros convert to their values,
  where as enums do not.

  To handle this, instead of using the field itself to be display, save
  the value of the array size as another field in the trace_event_fields
  structure, and use that instead.

  Not only does this fix the issue, but also converts the other trace
  events that have this same problem (but were not breaking tooling).

  With this change, the original work around b3bc8547d3 ("tracing:
  Have TRACE_DEFINE_ENUM affect trace event types as well") could be
  reverted (but that should be done in the merge window)"

* tag 'trace-v6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Fix TASK_COMM_LEN in trace event format file
2023-02-12 13:52:17 -08:00
Yafang Shao b6c7abd1c2 tracing: Fix TASK_COMM_LEN in trace event format file
After commit 3087c61ed2 ("tools/testing/selftests/bpf: replace open-coded 16 with TASK_COMM_LEN"),
the content of the format file under
/sys/kernel/tracing/events/task/task_newtask was changed from
  field:char comm[16];    offset:12;    size:16;    signed:0;
to
  field:char comm[TASK_COMM_LEN];    offset:12;    size:16;    signed:0;

John reported that this change breaks older versions of perfetto.
Then Mathieu pointed out that this behavioral change was caused by the
use of __stringify(_len), which happens to work on macros, but not on enum
labels. And he also gave the suggestion on how to fix it:
  :One possible solution to make this more robust would be to extend
  :struct trace_event_fields with one more field that indicates the length
  :of an array as an actual integer, without storing it in its stringified
  :form in the type, and do the formatting in f_show where it belongs.

The result as follows after this change,
$ cat /sys/kernel/tracing/events/task/task_newtask/format
        field:char comm[16];    offset:12;      size:16;        signed:0;

Link: https://lore.kernel.org/lkml/Y+QaZtz55LIirsUO@google.com/
Link: https://lore.kernel.org/linux-trace-kernel/20230210155921.4610-1-laoar.shao@gmail.com/
Link: https://lore.kernel.org/linux-trace-kernel/20230212151303.12353-1-laoar.shao@gmail.com

Cc: stable@vger.kernel.org
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Kajetan Puchalski <kajetan.puchalski@arm.com>
CC: Qais Yousef <qyousef@layalina.io>
Fixes: 3087c61ed2 ("tools/testing/selftests/bpf: replace open-coded 16 with TASK_COMM_LEN")
Reported-by: John Stultz <jstultz@google.com>
Debugged-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-02-12 10:23:39 -05:00
Jakub Kicinski de42873367 bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCY+bZrwAKCRDbK58LschI
 gzi4AP4+TYo0jnSwwkrOoN9l4f5VO9X8osmj3CXfHBv7BGWVxAD/WnvA3TDZyaUd
 agIZTkRs6BHF9He8oROypARZxTeMLwM=
 =nO1C
 -----END PGP SIGNATURE-----

Daniel Borkmann says:

====================
pull-request: bpf-next 2023-02-11

We've added 96 non-merge commits during the last 14 day(s) which contain
a total of 152 files changed, 4884 insertions(+), 962 deletions(-).

There is a minor conflict in drivers/net/ethernet/intel/ice/ice_main.c
between commit 5b246e533d ("ice: split probe into smaller functions")
from the net-next tree and commit 66c0e13ad2 ("drivers: net: turn on
XDP features") from the bpf-next tree. Remove the hunk given ice_cfg_netdev()
is otherwise there a 2nd time, and add XDP features to the existing
ice_cfg_netdev() one:

        [...]
        ice_set_netdev_features(netdev);
        netdev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT |
                               NETDEV_XDP_ACT_XSK_ZEROCOPY;
        ice_set_ops(netdev);
        [...]

Stephen's merge conflict mail:
https://lore.kernel.org/bpf/20230207101951.21a114fa@canb.auug.org.au/

The main changes are:

1) Add support for BPF trampoline on s390x which finally allows to remove many
   test cases from the BPF CI's DENYLIST.s390x, from Ilya Leoshkevich.

2) Add multi-buffer XDP support to ice driver, from Maciej Fijalkowski.

3) Add capability to export the XDP features supported by the NIC.
   Along with that, add a XDP compliance test tool,
   from Lorenzo Bianconi & Marek Majtyka.

4) Add __bpf_kfunc tag for marking kernel functions as kfuncs,
   from David Vernet.

5) Add a deep dive documentation about the verifier's register
   liveness tracking algorithm, from Eduard Zingerman.

6) Fix and follow-up cleanups for resolve_btfids to be compiled
   as a host program to avoid cross compile issues,
   from Jiri Olsa & Ian Rogers.

7) Batch of fixes to the BPF selftest for xdp_hw_metadata which resulted
   when testing on different NICs, from Jesper Dangaard Brouer.

8) Fix libbpf to better detect kernel version code on Debian, from Hao Xiang.

9) Extend libbpf to add an option for when the perf buffer should
   wake up, from Jon Doron.

10) Follow-up fix on xdp_metadata selftest to just consume on TX
    completion, from Stanislav Fomichev.

11) Extend the kfuncs.rst document with description on kfunc
    lifecycle & stability expectations, from David Vernet.

12) Fix bpftool prog profile to skip attaching to offline CPUs,
    from Tonghao Zhang.

====================

Link: https://lore.kernel.org/r/20230211002037.8489-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10 17:51:27 -08:00
Jakub Kicinski 8697a258ae Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
net/devlink/leftover.c / net/core/devlink.c:
  565b4824c3 ("devlink: change port event netdev notifier from per-net to global")
  f05bd8ebeb ("devlink: move code to a dedicated directory")
  687125b579 ("devlink: split out core code")
https://lore.kernel.org/all/20230208094657.379f2b1a@canb.auug.org.au/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-09 12:25:40 -08:00
Steven Rostedt (Google) 9c1c251d67 tracing: Allow boot instances to have snapshot buffers
Add to ftrace_boot_snapshot, "=<instance>" name, where the instance will
get a snapshot buffer, and will take a snapshot at the end of boot (which
will save the boot traces).

Link: https://lkml.kernel.org/r/20230207173026.792774721@goodmis.org

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Ross Zwisler <zwisler@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-02-07 12:49:56 -05:00
Steven Rostedt (Google) d503b8f747 tracing: Add trace_array_puts() to write into instance
Add a generic trace_array_puts() that can be used to "trace_puts()" into
an allocated trace_array instance. This is just another variant of
trace_array_printk().

Link: https://lkml.kernel.org/r/20230207173026.584717290@goodmis.org

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Ross Zwisler <zwisler@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-02-07 12:49:56 -05:00
Steven Rostedt (Google) c484648083 tracing: Add enabling of events to boot instances
Add the format of:

  trace_instance=foo,sched:sched_switch,irq_handler_entry,initcall

That will create the "foo" instance and enable the sched_switch event
(here were the "sched" system is explicitly specified), the
irq_handler_entry event, and all events under the system initcall.

Link: https://lkml.kernel.org/r/20230207173026.386114535@goodmis.org

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Ross Zwisler <zwisler@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-02-07 12:49:56 -05:00
Steven Rostedt (Google) cb1f98c5e5 tracing: Add creation of instances at boot command line
Add kernel command line to add tracing instances. This only creates
instances at boot but still does not enable any events to them. Later
changes will extend this command line to add enabling of events, filters,
and triggers. As well as possibly redirecting trace_printk()!

Link: https://lkml.kernel.org/r/20230207173026.186210158@goodmis.org

Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Ross Zwisler <zwisler@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-02-07 12:49:56 -05:00
Steven Rostedt (Google) 9971c3f944 tracing: Fix trace_event_raw_event_synth() if else statement
The test to check if the field is a stack is to be done if it is not a
string. But the code had:

    } if (event->fields[i]->is_stack) {

and not

   } else if (event->fields[i]->is_stack) {

which would cause it to always be tested. Worse yet, this also included an
"else" statement that was only to be called if the field was not a string
and a stack, but this code allows it to be called if it was a string (and
not a stack).

Also fixed some whitespace issues.

Link: https://lore.kernel.org/all/202301302110.mEtNwkBD-lkp@intel.com/
Link: https://lore.kernel.org/linux-trace-kernel/20230131095237.63e3ca8d@gandalf.local.home

Cc: Tom Zanussi <zanussi@kernel.org>
Fixes: 00cf3d672a ("tracing: Allow synthetic events to pass around stacktraces")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2023-02-07 12:48:58 -05:00
Linyu Yuan a9c4bdd505 tracing: Acquire buffer from temparary trace sequence
there is one dwc3 trace event declare as below,
DECLARE_EVENT_CLASS(dwc3_log_event,
	TP_PROTO(u32 event, struct dwc3 *dwc),
	TP_ARGS(event, dwc),
	TP_STRUCT__entry(
		__field(u32, event)
		__field(u32, ep0state)
		__dynamic_array(char, str, DWC3_MSG_MAX)
	),
	TP_fast_assign(
		__entry->event = event;
		__entry->ep0state = dwc->ep0state;
	),
	TP_printk("event (%08x): %s", __entry->event,
			dwc3_decode_event(__get_str(str), DWC3_MSG_MAX,
				__entry->event, __entry->ep0state))
);
the problem is when trace function called, it will allocate up to
DWC3_MSG_MAX bytes from trace event buffer, but never fill the buffer
during fast assignment, it only fill the buffer when output function are
called, so this means if output function are not called, the buffer will
never used.

add __get_buf(len) which acquiree buffer from iter->tmp_seq when trace
output function called, it allow user write string to acquired buffer.

the mentioned dwc3 trace event will changed as below,
DECLARE_EVENT_CLASS(dwc3_log_event,
	TP_PROTO(u32 event, struct dwc3 *dwc),
	TP_ARGS(event, dwc),
	TP_STRUCT__entry(
		__field(u32, event)
		__field(u32, ep0state)
	),
	TP_fast_assign(
		__entry->event = event;
		__entry->ep0state = dwc->ep0state;
	),
	TP_printk("event (%08x): %s", __entry->event,
		dwc3_decode_event(__get_buf(DWC3_MSG_MAX), DWC3_MSG_MAX,
				__entry->event, __entry->ep0state))
);.

Link: https://lore.kernel.org/linux-trace-kernel/1675065249-23368-1-git-send-email-quic_linyyuan@quicinc.com

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Linyu Yuan <quic_linyyuan@quicinc.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-02-07 12:42:54 -05:00
Davidlohr Bueso b18c58af29 tracing/osnoise: No need for schedule_hrtimeout range
No slack time is being passed, just use schedule_hrtimeout().

Link: https://lore.kernel.org/linux-trace-kernel/20230123234649.17968-1-dave@stgolabs.net

Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-02-07 12:34:11 -05:00
Linus Torvalds 513c1a3d3f Fix regression in poll() and select()
With the fix that made poll() and select() block if read would block
 caused a slight regression in rasdaemon, as it needed that kind
 of behavior. Add a way to make that behavior come back by writing
 zero into the "buffer_percentage", which means to never block on read.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCY+Jn3xQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qgQ6AQC30hHcPMPm8+drlH/P6wEYstRP6xbp
 nHYHcvT6qXNPtAD+OhUQR2Zav66m6cE0qvkdnZb72E0YHRTfBhN5OpshTgQ=
 =dJEF
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fix from Steven Rostedt:
 "Fix regression in poll() and select()

  With the fix that made poll() and select() block if read would block
  caused a slight regression in rasdaemon, as it needed that kind of
  behavior. Add a way to make that behavior come back by writing zero
  into the 'buffer_percentage', which means to never block on read"

* tag 'trace-v6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Fix poll() and select() do not work on per_cpu trace_pipe and trace_pipe_raw
2023-02-07 07:54:40 -08:00
Greg Kroah-Hartman 83e8864fee trace/blktrace: fix memory leak with using debugfs_lookup()
When calling debugfs_lookup() the result must have dput() called on it,
otherwise the memory will leak over time.  To make things simpler, just
call debugfs_lookup_and_remove() instead which handles all of the logic
at once.

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: linux-block@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-trace-kernel@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20230202141956.2299521-1-gregkh@linuxfoundation.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-06 09:29:30 -07:00
Jakub Kicinski 82b4a9412b Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
net/core/gro.c
  7d2c89b325 ("skb: Do mix page pool and page referenced frags in GRO")
  b1a78b9b98 ("net: add support for ipv4 big tcp")
https://lore.kernel.org/all/20230203094454.5766f160@canb.auug.org.au/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-02 14:49:55 -08:00
Linus Torvalds edb9b8f380 Including fixes from bpf, can and netfilter.
Current release - regressions:
 
  - phy: fix null-deref in phy_attach_direct
 
  - mac802154: fix possible double free upon parsing error
 
 Previous releases - regressions:
 
  - bpf: preserve reg parent/live fields when copying range info,
    prevent mis-verification of programs as safe
 
  - ip6: fix GRE tunnels not generating IPv6 link local addresses
 
  - phy: dp83822: fix null-deref on DP83825/DP83826 devices
 
  - sctp: do not check hb_timer.expires when resetting hb_timer
 
  - eth: mtk_sock: fix SGMII configuration after phylink conversion
 
 Previous releases - always broken:
 
  - eth: xdp: execute xdp_do_flush() before napi_complete_done()
 
  - skb: do not mix page pool and page referenced frags in GRO
 
  - bpf:
    - fix a possible task gone issue with bpf_send_signal[_thread]()
    - fix an off-by-one bug in bpf_mem_cache_idx() to select
      the right cache
    - add missing btf_put to register_btf_id_dtor_kfuncs
    - sockmap: fon't let sock_map_{close,destroy,unhash} call itself
 
  - gso: fix null-deref in skb_segment_list()
 
  - mctp: purge receive queues on sk destruction
 
  - fix UaF caused by accept on already connected socket in exotic
    socket families
 
  - tls: don't treat list head as an entry in tls_is_tx_ready()
 
  - netfilter: br_netfilter: disable sabotage_in hook after first
    suppression
 
  - wwan: t7xx: fix runtime PM implementation
 
 Misc:
 
  - MAINTAINERS: spring cleanup of networking maintainers
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmPcKlkACgkQMUZtbf5S
 IrtHgxAAxvuXeN9kQ+/gDYbxTa4Fc3gwCAA7SdJiBdfxP9xJAUEBs9TNfh5MPynb
 iIWl9tTe1OOQtWJTnp5CKMuegUj4hlJVYuLu4MK36hW56Q41cbCFvhsDk/1xoFTF
 2mtmEr5JiXZfgEdf9/tEITTGKgfANsbXOAzL4b5HcV9qtvwvEd2aFxll3VO9N2Xn
 k6o59Lr2mff7NnXQsrVkTLBxXRK9oir8VwyTnCs7j+T7E8Qe4hIkx2LDWcDiRC1e
 vSxfoWFoe+uOGTQMxlarMOAgAOt7nvOQngwCaaVpMBa/OLzGo51Clf98cpPm+cSi
 18ZjmA8r5owGghd75p2PtxcND8U1vXeeRJW+FKK60K8qznMc0D4s7HX+wHBfevnV
 U649F0OHsNsX0Y75Z5sqA1eSmJbTR9QeyiRbuS6nfOnT7ZKDw6vPJrKGJRN789y0
 erzG/JwrfnrCDavXfNNEgk0mMlP0squemffWjuH6FfL4Pt4/gUz63Nokr5vRFSns
 yskHZgXVs4gq++yrbZDZjKfQaPDkA1P/z6VADVRWh4Y5m1m6GtvDUy6vJxK7/hUs
 5giqmhbAAQq0U+2L5RIaoYGfjk2UaDcunNpqTCgjdCdMXx9Yz41QBloFgBjRshp/
 3kg6ElYPzflXXCYK4pGRrzqO4Gqz1Hrvr9YuT+kka+wrpZPDtWc=
 =OGU7
 -----END PGP SIGNATURE-----

Merge tag 'net-6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from bpf, can and netfilter.

  Current release - regressions:

   - phy: fix null-deref in phy_attach_direct

   - mac802154: fix possible double free upon parsing error

  Previous releases - regressions:

   - bpf: preserve reg parent/live fields when copying range info,
     prevent mis-verification of programs as safe

   - ip6: fix GRE tunnels not generating IPv6 link local addresses

   - phy: dp83822: fix null-deref on DP83825/DP83826 devices

   - sctp: do not check hb_timer.expires when resetting hb_timer

   - eth: mtk_sock: fix SGMII configuration after phylink conversion

  Previous releases - always broken:

   - eth: xdp: execute xdp_do_flush() before napi_complete_done()

   - skb: do not mix page pool and page referenced frags in GRO

   - bpf:
      - fix a possible task gone issue with bpf_send_signal[_thread]()
      - fix an off-by-one bug in bpf_mem_cache_idx() to select the right
        cache
      - add missing btf_put to register_btf_id_dtor_kfuncs
      - sockmap: fon't let sock_map_{close,destroy,unhash} call itself

   - gso: fix null-deref in skb_segment_list()

   - mctp: purge receive queues on sk destruction

   - fix UaF caused by accept on already connected socket in exotic
     socket families

   - tls: don't treat list head as an entry in tls_is_tx_ready()

   - netfilter: br_netfilter: disable sabotage_in hook after first
     suppression

   - wwan: t7xx: fix runtime PM implementation

  Misc:

   - MAINTAINERS: spring cleanup of networking maintainers"

* tag 'net-6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (65 commits)
  mtk_sgmii: enable PCS polling to allow SFP work
  net: mediatek: sgmii: fix duplex configuration
  net: mediatek: sgmii: ensure the SGMII PHY is powered down on configuration
  MAINTAINERS: update SCTP maintainers
  MAINTAINERS: ipv6: retire Hideaki Yoshifuji
  mailmap: add John Crispin's entry
  MAINTAINERS: bonding: move Veaceslav Falico to CREDITS
  net: openvswitch: fix flow memory leak in ovs_flow_cmd_new
  net: ethernet: mtk_eth_soc: disable hardware DSA untagging for second MAC
  virtio-net: Keep stop() to follow mirror sequence of open()
  selftests: net: udpgso_bench_tx: Cater for pending datagrams zerocopy benchmarking
  selftests: net: udpgso_bench: Fix racing bug between the rx/tx programs
  selftests: net: udpgso_bench_rx/tx: Stop when wrong CLI args are provided
  selftests: net: udpgso_bench_rx: Fix 'used uninitialized' compiler warning
  can: mcp251xfd: mcp251xfd_ring_set_ringparam(): assign missing tx_obj_num_coalesce_irq
  can: isotp: split tx timer into transmission and timeout
  can: isotp: handle wait_event_interruptible() return values
  can: raw: fix CAN FD frame transmissions over CAN XL devices
  can: j1939: fix errant WARN_ON_ONCE in j1939_session_deactivate
  hv_netvsc: Fix missed pagebuf entries in netvsc_dma_map/unmap()
  ...
2023-02-02 14:03:31 -08:00
Shiju Jose 3e46d910d8 tracing: Fix poll() and select() do not work on per_cpu trace_pipe and trace_pipe_raw
poll() and select() on per_cpu trace_pipe and trace_pipe_raw do not work
since kernel 6.1-rc6. This issue is seen after the commit
42fb0a1e84 ("tracing/ring-buffer: Have
polling block on watermark").

This issue is firstly detected and reported, when testing the CXL error
events in the rasdaemon and also erified using the test application for poll()
and select().

This issue occurs for the per_cpu case, when calling the ring_buffer_poll_wait(),
in kernel/trace/ring_buffer.c, with the buffer_percent > 0 and then wait until the
percentage of pages are available. The default value set for the buffer_percent is 50
in the kernel/trace/trace.c.

As a fix, allow userspace application could set buffer_percent as 0 through
the buffer_percent_fops, so that the task will wake up as soon as data is added
to any of the specific cpu buffer.

Link: https://lore.kernel.org/linux-trace-kernel/20230202182309.742-2-shiju.jose@huawei.com

Cc: <mhiramat@kernel.org>
Cc: <mchehab@kernel.org>
Cc: <linux-edac@vger.kernel.org>
Cc: stable@vger.kernel.org
Fixes: 42fb0a1e84 ("tracing/ring-buffer: Have polling block on watermark")
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-02-02 16:15:53 -05:00
David Vernet 400031e05a bpf: Add __bpf_kfunc tag to all kfuncs
Now that we have the __bpf_kfunc tag, we should use add it to all
existing kfuncs to ensure that they'll never be elided in LTO builds.

Signed-off-by: David Vernet <void@manifault.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20230201173016.342758-4-void@manifault.com
2023-02-02 00:25:14 +01:00
Peter Zijlstra 3017ba4b83 cpuidle: tracing, preempt: Squash _rcuidle tracing
Extend/fix commit:

  9aedeaed6f ("tracing, hardirq: No moar _rcuidle() tracing")

... to also cover trace_preempt_{on,off}() which were mysteriously
untouched.

Fixes: 9aedeaed6f ("tracing, hardirq: No moar _rcuidle() tracing")
Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lkml.kernel.org/r/Y9D5AfnOukWNOZ5q@hirez.programming.kicks-ass.net
Link: https://lore.kernel.org/r/Y9jWXKgkxY5EZVwW@hirez.programming.kicks-ass.net
2023-01-31 15:01:46 +01:00
Ingo Molnar 57a30218fa Linux 6.2-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmPW7E8eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGf7MIAI0JnHN9WvtEukSZ
 E6j6+cEGWxsvD6q0g3GPolaKOCw7hlv0pWcFJFcUAt0jebspMdxV2oUGJ8RYW7Lg
 nCcHvEVswGKLAQtQSWw52qotW6fUfMPsNYYB5l31sm1sKH4Cgss0W7l2HxO/1LvG
 TSeNHX53vNAZ8pVnFYEWCSXC9bzrmU/VALF2EV00cdICmfvjlgkELGXoLKJJWzUp
 s63fBHYGGURSgwIWOKStoO6HNo0j/F/wcSMx8leY8qDUtVKHj4v24EvSgxUSDBER
 ch3LiSQ6qf4sw/z7pqruKFthKOrlNmcc0phjiES0xwwGiNhLv0z3rAhc4OM2cgYh
 SDc/Y/c=
 =zpaD
 -----END PGP SIGNATURE-----

Merge tag 'v6.2-rc6' into sched/core, to pick up fixes

Pick up fixes before merging another batch of cpuidle updates.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2023-01-31 15:01:20 +01:00
Jakub Kicinski 2d104c390f bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCY9RqJgAKCRDbK58LschI
 gw2IAP9G5uhFO5abBzYLupp6SY3T5j97MUvPwLfFqUEt7EXmuwEA2lCUEWeW0KtR
 QX+QmzCa6iHxrW7WzP4DUYLue//FJQY=
 =yYqA
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
bpf-next 2023-01-28

We've added 124 non-merge commits during the last 22 day(s) which contain
a total of 124 files changed, 6386 insertions(+), 1827 deletions(-).

The main changes are:

1) Implement XDP hints via kfuncs with initial support for RX hash and
   timestamp metadata kfuncs, from Stanislav Fomichev and
   Toke Høiland-Jørgensen.
   Measurements on overhead: https://lore.kernel.org/bpf/875yellcx6.fsf@toke.dk

2) Extend libbpf's bpf_tracing.h support for tracing arguments of
   kprobes/uprobes and syscall as a special case, from Andrii Nakryiko.

3) Significantly reduce the search time for module symbols by livepatch
   and BPF, from Jiri Olsa and Zhen Lei.

4) Enable cpumasks to be used as kptrs, which is useful for tracing
   programs tracking which tasks end up running on which CPUs
   in different time intervals, from David Vernet.

5) Fix several issues in the dynptr processing such as stack slot liveness
   propagation, missing checks for PTR_TO_STACK variable offset, etc,
   from Kumar Kartikeya Dwivedi.

6) Various performance improvements, fixes, and introduction of more
   than just one XDP program to XSK selftests, from Magnus Karlsson.

7) Big batch to BPF samples to reduce deprecated functionality,
   from Daniel T. Lee.

8) Enable struct_ops programs to be sleepable in verifier,
   from David Vernet.

9) Reduce pr_warn() noise on BTF mismatches when they are expected under
   the CONFIG_MODULE_ALLOW_BTF_MISMATCH config anyway, from Connor O'Brien.

10) Describe modulo and division by zero behavior of the BPF runtime
    in BPF's instruction specification document, from Dave Thaler.

11) Several improvements to libbpf API documentation in libbpf.h,
    from Grant Seltzer.

12) Improve resolve_btfids header dependencies related to subcmd and add
    proper support for HOSTCC, from Ian Rogers.

13) Add ipip6 and ip6ip decapsulation support for bpf_skb_adjust_room()
    helper along with BPF selftests, from Ziyang Xuan.

14) Simplify the parsing logic of structure parameters for BPF trampoline
    in the x86-64 JIT compiler, from Pu Lehui.

15) Get BTF working for kernels with CONFIG_RUST enabled by excluding
    Rust compilation units with pahole, from Martin Rodriguez Reboredo.

16) Get bpf_setsockopt() working for kTLS on top of TCP sockets,
    from Kui-Feng Lee.

17) Disable stack protection for BPF objects in bpftool given BPF backends
    don't support it, from Holger Hoffstätte.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (124 commits)
  selftest/bpf: Make crashes more debuggable in test_progs
  libbpf: Add documentation to map pinning API functions
  libbpf: Fix malformed documentation formatting
  selftests/bpf: Properly enable hwtstamp in xdp_hw_metadata
  selftests/bpf: Calls bpf_setsockopt() on a ktls enabled socket.
  bpf: Check the protocol of a sock to agree the calls to bpf_setsockopt().
  bpf/selftests: Verify struct_ops prog sleepable behavior
  bpf: Pass const struct bpf_prog * to .check_member
  libbpf: Support sleepable struct_ops.s section
  bpf: Allow BPF_PROG_TYPE_STRUCT_OPS programs to be sleepable
  selftests/bpf: Fix vmtest static compilation error
  tools/resolve_btfids: Alter how HOSTCC is forced
  tools/resolve_btfids: Install subcmd headers
  bpf/docs: Document the nocast aliasing behavior of ___init
  bpf/docs: Document how nested trusted fields may be defined
  bpf/docs: Document cpumask kfuncs in a new file
  selftests/bpf: Add selftest suite for cpumask kfuncs
  selftests/bpf: Add nested trust selftests suite
  bpf: Enable cpumasks to be queried and used as kptrs
  bpf: Disallow NULLable pointers for trusted kfuncs
  ...
====================

Link: https://lore.kernel.org/r/20230128004827.21371-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-28 00:00:14 -08:00
Steven Rostedt (Google) b81a3a100c tracing/histogram: Add simple tests for stacktrace usage of synthetic events
Update the selftests to include a test of passing a stacktrace between the
events of a synthetic event.

Link: https://lkml.kernel.org/r/20230117152236.475439286@goodmis.org

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Ross Zwisler <zwisler@google.com>
Cc: Ching-lin Yu <chinglinyu@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-01-25 10:31:24 -05:00
Steven Rostedt (Google) cc5fc8bfc9 tracing/histogram: Add stacktrace type
Now that stacktraces can be part of synthetic events, allow a key to be
typed as a stacktrace.

  # cd /sys/kernel/tracing
  # echo 's:block_lat u64 delta; unsigned long stack[];' >> dynamic_events
  # echo 'hist:keys=next_pid:ts=common_timestamp.usecs,st=stacktrace if prev_state == 2' >> events/sched/sched_switch/trigger
  # echo 'hist:keys=prev_pid:delta=common_timestamp.usecs-$ts,st2=$st:onmatch(sched.sched_switch).trace(block_lat,$delta,$st2)' >> events/sched/sched_switch/trigger
  # echo 'hist:keys=delta.buckets=100,stack.stacktrace:sort=delta' > events/synthetic/block_lat/trigger

  # cat events/synthetic/block_lat/hist

  # event histogram
  #
  # trigger info: hist:keys=delta.buckets=100,stacktrace:vals=hitcount:sort=delta.buckets=100:size=2048 [active]
  #

  { delta: ~ 0-99, stacktrace:
           event_hist_trigger+0x464/0x480
           event_triggers_call+0x52/0xe0
           trace_event_buffer_commit+0x193/0x250
           trace_event_raw_event_sched_switch+0xfc/0x150
           __traceiter_sched_switch+0x41/0x60
           __schedule+0x448/0x7b0
           schedule_idle+0x26/0x40
           cpu_startup_entry+0x19/0x20
           start_secondary+0xed/0xf0
           secondary_startup_64_no_verify+0xe0/0xeb
  } hitcount:          6
  { delta: ~ 0-99, stacktrace:
           event_hist_trigger+0x464/0x480
           event_triggers_call+0x52/0xe0
           trace_event_buffer_commit+0x193/0x250
           trace_event_raw_event_sched_switch+0xfc/0x150
           __traceiter_sched_switch+0x41/0x60
           __schedule+0x448/0x7b0
           schedule_idle+0x26/0x40
           cpu_startup_entry+0x19/0x20
           __pfx_kernel_init+0x0/0x10
           arch_call_rest_init+0xa/0x24
           start_kernel+0x964/0x98d
           secondary_startup_64_no_verify+0xe0/0xeb
  } hitcount:          3
  { delta: ~ 0-99, stacktrace:
           event_hist_trigger+0x464/0x480
           event_triggers_call+0x52/0xe0
           trace_event_buffer_commit+0x193/0x250
           trace_event_raw_event_sched_switch+0xfc/0x150
           __traceiter_sched_switch+0x41/0x60
           __schedule+0x448/0x7b0
           schedule+0x5a/0xb0
           worker_thread+0xaf/0x380
           kthread+0xe9/0x110
           ret_from_fork+0x2c/0x50
  } hitcount:          1
  { delta: ~ 100-199, stacktrace:
           event_hist_trigger+0x464/0x480
           event_triggers_call+0x52/0xe0
           trace_event_buffer_commit+0x193/0x250
           trace_event_raw_event_sched_switch+0xfc/0x150
           __traceiter_sched_switch+0x41/0x60
           __schedule+0x448/0x7b0
           schedule_idle+0x26/0x40
           cpu_startup_entry+0x19/0x20
           start_secondary+0xed/0xf0
           secondary_startup_64_no_verify+0xe0/0xeb
  } hitcount:         15
  [..]
  { delta: ~ 8500-8599, stacktrace:
           event_hist_trigger+0x464/0x480
           event_triggers_call+0x52/0xe0
           trace_event_buffer_commit+0x193/0x250
           trace_event_raw_event_sched_switch+0xfc/0x150
           __traceiter_sched_switch+0x41/0x60
           __schedule+0x448/0x7b0
           schedule_idle+0x26/0x40
           cpu_startup_entry+0x19/0x20
           start_secondary+0xed/0xf0
           secondary_startup_64_no_verify+0xe0/0xeb
  } hitcount:          1

  Totals:
      Hits: 89
      Entries: 11
      Dropped: 0

Link: https://lkml.kernel.org/r/20230117152236.167046397@goodmis.org

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Ross Zwisler <zwisler@google.com>
Cc: Ching-lin Yu <chinglinyu@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-01-25 10:31:24 -05:00
Steven Rostedt (Google) 00cf3d672a tracing: Allow synthetic events to pass around stacktraces
Allow a stacktrace from one event to be displayed by the end event of a
synthetic event. This is very useful when looking for the longest latency
of a sleep or something blocked on I/O.

 # cd /sys/kernel/tracing/
 # echo 's:block_lat pid_t pid; u64 delta; unsigned long[] stack;' > dynamic_events
 # echo 'hist:keys=next_pid:ts=common_timestamp.usecs,st=stacktrace  if prev_state == 1||prev_state == 2' > events/sched/sched_switch/trigger
 # echo 'hist:keys=prev_pid:delta=common_timestamp.usecs-$ts,s=$st:onmax($delta).trace(block_lat,prev_pid,$delta,$s)' >> events/sched/sched_switch/trigger

The above creates a "block_lat" synthetic event that take the stacktrace of
when a task schedules out in either the interruptible or uninterruptible
states, and on a new per process max $delta (the time it was scheduled
out), will print the process id and the stacktrace.

  # echo 1 > events/synthetic/block_lat/enable
  # cat trace
 #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
 #              | |         |   |||||     |         |
    kworker/u16:0-767     [006] d..4.   560.645045: block_lat: pid=767 delta=66 stack=STACK:
 => __schedule
 => schedule
 => pipe_read
 => vfs_read
 => ksys_read
 => do_syscall_64
 => 0x966000aa

           <idle>-0       [003] d..4.   561.132117: block_lat: pid=0 delta=413787 stack=STACK:
 => __schedule
 => schedule
 => schedule_hrtimeout_range_clock
 => do_sys_poll
 => __x64_sys_poll
 => do_syscall_64
 => 0x966000aa

            <...>-153     [006] d..4.   562.068407: block_lat: pid=153 delta=54 stack=STACK:
 => __schedule
 => schedule
 => io_schedule
 => rq_qos_wait
 => wbt_wait
 => __rq_qos_throttle
 => blk_mq_submit_bio
 => submit_bio_noacct_nocheck
 => ext4_bio_write_page
 => mpage_submit_page
 => mpage_process_page_bufs
 => mpage_prepare_extent_to_map
 => ext4_do_writepages
 => ext4_writepages
 => do_writepages
 => __writeback_single_inode

Link: https://lkml.kernel.org/r/20230117152236.010941267@goodmis.org

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Ross Zwisler <zwisler@google.com>
Cc: Ching-lin Yu <chinglinyu@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-01-25 10:31:24 -05:00
Steven Rostedt (Google) 288709c9f3 tracing: Allow stacktraces to be saved as histogram variables
Allow to save stacktraces into a histogram variable. This will be used by
synthetic events to allow a stacktrace from one event to be passed and
displayed by another event.

The special keyword "stacktrace" is to be used to trigger a stack
trace for the event that the histogram trigger is attached to.

  echo 'hist:keys=pid:st=stacktrace" > events/sched/sched_waking/trigger

Currently nothing can get access to the "$st" variable above that contains
the stack trace, but that will soon change.

Link: https://lkml.kernel.org/r/20230117152235.856323729@goodmis.org

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Ross Zwisler <zwisler@google.com>
Cc: Ching-lin Yu <chinglinyu@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-01-25 10:31:24 -05:00
Steven Rostedt (Google) 19ff804964 tracing: Simplify calculating entry size using struct_size()
When tracing a dynamic string field for a synthetic event, the offset
calculation for where to write the next event can use struct_size() to
find what the current size of the structure is.

This simplifies the code and makes it less error prone.

Link: https://lkml.kernel.org/r/20230117152235.698632147@goodmis.org

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Ross Zwisler <zwisler@google.com>
Cc: Ching-lin Yu <chinglinyu@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-01-25 10:31:24 -05:00
Jia-Ju Bai 3e4272b995 tracing: Add NULL checks for buffer in ring_buffer_free_read_page()
In a previous commit 7433632c9f, buffer, buffer->buffers and
buffer->buffers[cpu] in ring_buffer_wake_waiters() can be NULL,
and thus the related checks are added.

However, in the same call stack, these variables are also used in
ring_buffer_free_read_page():

tracing_buffers_release()
  ring_buffer_wake_waiters(iter->array_buffer->buffer)
    cpu_buffer = buffer->buffers[cpu] -> Add checks by previous commit
  ring_buffer_free_read_page(iter->array_buffer->buffer)
    cpu_buffer = buffer->buffers[cpu] -> No check

Thus, to avod possible null-pointer derefernces, the related checks
should be added.

These results are reported by a static tool designed by myself.

Link: https://lkml.kernel.org/r/20230113125501.760324-1-baijiaju1990@gmail.com

Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-01-25 10:31:24 -05:00
Steven Rostedt (Google) e6745a4da9 tracing: Add a way to filter function addresses to function names
There's been several times where an event records a function address in
its field and I needed to filter on that address for a specific function
name. It required looking up the function in kallsyms, finding its size,
and doing a compare of "field >= function_start && field < function_end".

But this would change from boot to boot and is unreliable in scripts.
Also, it is useful to have this at boot up, where the addresses will not
be known. For example, on the boot command line:

  trace_trigger="initcall_finish.traceoff if func.function == acpi_init"

To implement this, add a ".function" prefix, that will check that the
field is of size long, and the only operations allowed (so far) are "=="
and "!=".

Link: https://lkml.kernel.org/r/20221219183213.916833763@goodmis.org

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Zheng Yejian <zhengyejian1@huawei.com>
Reviewed-by: Ross Zwisler <zwisler@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-01-25 10:31:11 -05:00
Colin Ian King ae3edea88e rv: remove redundant initialization of pointer ptr
The pointer ptr is being initialized with a value that is never read,
it is being updated later on a call to strim. Remove the extraneous
initialization.

Link: https://lkml.kernel.org/r/20230116161612.77192-1-colin.i.king@gmail.com

Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-01-25 10:31:02 -05:00
Randy Dunlap d5090d91ec tracing/filter: fix kernel-doc warnings
Use the 'struct' keyword for a struct's kernel-doc notation and
use the correct function parameter name to eliminate kernel-doc
warnings:

kernel/trace/trace_events_filter.c:136: warning: cannot understand function prototype: 'struct prog_entry '
kerne/trace/trace_events_filter.c:155: warning: Excess function parameter 'when_to_branch' description in 'update_preds'

Also correct some trivial punctuation problems.

Link: https://lkml.kernel.org/r/20230108021238.16398-1-rdunlap@infradead.org

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-01-25 10:28:13 -05:00
Natalia Petrova 8b152e9150 trace_events_hist: add check for return value of 'create_hist_field'
Function 'create_hist_field' is called recursively at
trace_events_hist.c:1954 and can return NULL-value that's why we have
to check it to avoid null pointer dereference.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Link: https://lkml.kernel.org/r/20230111120409.4111-1-n.petrova@fintech.ru

Cc: stable@vger.kernel.org
Fixes: 30350d65ac ("tracing: Add variable support to hist triggers")
Signed-off-by: Natalia Petrova <n.petrova@fintech.ru>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-01-24 18:19:36 -05:00
Chuang Wang 685b64e4d6 tracing/osnoise: Use built-in RCU list checking
list_for_each_entry_rcu() has built-in RCU and lock checking.

Pass cond argument to list_for_each_entry_rcu() to silence false lockdep
warning when CONFIG_PROVE_RCU_LIST is enabled.

Execute as follow:

 [tracing]# echo osnoise > current_tracer
 [tracing]# echo 1 > tracing_on
 [tracing]# echo 0 > tracing_on

The trace_types_lock is held when osnoise_tracer_stop() or
timerlat_tracer_stop() are called in the non-RCU read side section.
So, pass lockdep_is_held(&trace_types_lock) to silence false lockdep
warning.

Link: https://lkml.kernel.org/r/20221227023036.784337-1-nashuiliang@gmail.com

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: dae181349f ("tracing/osnoise: Support a list of trace_array *tr")
Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Chuang Wang <nashuiliang@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-01-24 18:11:41 -05:00
Randy Dunlap ac28d0a0f4 tracing: Kconfig: Fix spelling/grammar/punctuation
Fix some editorial nits in trace Kconfig.

Link: https://lkml.kernel.org/r/20230124181647.15902-1-rdunlap@infradead.org

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-01-24 13:22:38 -05:00
Steven Rostedt (Google) 3bb06eb6e9 tracing: Make sure trace_printk() can output as soon as it can be used
Currently trace_printk() can be used as soon as early_trace_init() is
called from start_kernel(). But if a crash happens, and
"ftrace_dump_on_oops" is set on the kernel command line, all you get will
be:

  [    0.456075]   <idle>-0         0dN.2. 347519us : Unknown type 6
  [    0.456075]   <idle>-0         0dN.2. 353141us : Unknown type 6
  [    0.456075]   <idle>-0         0dN.2. 358684us : Unknown type 6

This is because the trace_printk() event (type 6) hasn't been registered
yet. That gets done via an early_initcall(), which may be early, but not
early enough.

Instead of registering the trace_printk() event (and other ftrace events,
which are not trace events) via an early_initcall(), have them registered at
the same time that trace_printk() can be used. This way, if there is a
crash before early_initcall(), then the trace_printk()s will actually be
useful.

Link: https://lkml.kernel.org/r/20230104161412.019f6c55@gandalf.local.home

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: e725c731e3 ("tracing: Split tracing initialization into two for early initialization")
Reported-by: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Tested-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-01-24 11:27:29 -05:00
Mark Rutland 8be9fbd534 ftrace: Export ftrace_free_filter() to modules
Setting filters on an ftrace ops results in some memory being allocated
for the filter hashes, which must be freed before the ops can be freed.
This can be done by removing every individual element of the hash by
calling ftrace_set_filter_ip() or ftrace_set_filter_ips() with `remove`
set, but this is somewhat error prone as it's easy to forget to remove
an element.

Make it easier to clean this up by exporting ftrace_free_filter(), which
can be used to clean up all of the filter hashes after an ftrace_ops has
been unregistered.

Using this, fix the ftrace-direct* samples to free hashes prior to being
unloaded. All other code either removes individual filters explicitly or
is built-in and already calls ftrace_free_filter().

Link: https://lkml.kernel.org/r/20230103124912.2948963-3-mark.rutland@arm.com

Cc: stable@vger.kernel.org
Cc: Florent Revest <revest@chromium.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: e1067a07cf ("ftrace/samples: Add module to test multi direct modify interface")
Fixes: 5fae941b9a ("ftrace/samples: Add multi direct interface test module")
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-01-24 11:20:58 -05:00
Mark Rutland cbad0fb2d8 ftrace: Add DYNAMIC_FTRACE_WITH_CALL_OPS
Architectures without dynamic ftrace trampolines incur an overhead when
multiple ftrace_ops are enabled with distinct filters. in these cases,
each call site calls a common trampoline which uses
ftrace_ops_list_func() to iterate over all enabled ftrace functions, and
so incurs an overhead relative to the size of this list (including RCU
protection overhead).

Architectures with dynamic ftrace trampolines avoid this overhead for
call sites which have a single associated ftrace_ops. In these cases,
the dynamic trampoline is customized to branch directly to the relevant
ftrace function, avoiding the list overhead.

On some architectures it's impractical and/or undesirable to implement
dynamic ftrace trampolines. For example, arm64 has limited branch ranges
and cannot always directly branch from a call site to an arbitrary
address (e.g. from a kernel text address to an arbitrary module
address). Calls from modules to core kernel text can be indirected via
PLTs (allocated at module load time) to address this, but the same is
not possible from calls from core kernel text.

Using an indirect branch from a call site to an arbitrary trampoline is
possible, but requires several more instructions in the function
prologue (or immediately before it), and/or comes with far more complex
requirements for patching.

Instead, this patch adds a new option, where an architecture can
associate each call site with a pointer to an ftrace_ops, placed at a
fixed offset from the call site. A shared trampoline can recover this
pointer and call ftrace_ops::func() without needing to go via
ftrace_ops_list_func(), avoiding the associated overhead.

This avoids issues with branch range limitations, and avoids the need to
allocate and manipulate dynamic trampolines, making it far simpler to
implement and maintain, while having similar performance
characteristics.

Note that this allows for dynamic ftrace_ops to be invoked directly from
an architecture's ftrace_caller trampoline, whereas existing code forces
the use of ftrace_ops_get_list_func(), which is in part necessary to
permit the ftrace_ops to be freed once unregistered *and* to avoid
branch/address-generation range limitation on some architectures (e.g.
where ops->func is a module address, and may be outside of the direct
branch range for callsites within the main kernel image).

The CALL_OPS approach avoids this problems and is safe as:

* The existing synchronization in ftrace_shutdown() using
  ftrace_shutdown() using synchronize_rcu_tasks_rude() (and
  synchronize_rcu_tasks()) ensures that no tasks hold a stale reference
  to an ftrace_ops (e.g. in the middle of the ftrace_caller trampoline,
  or while invoking ftrace_ops::func), when that ftrace_ops is
  unregistered.

  Arguably this could also be relied upon for the existing scheme,
  permitting dynamic ftrace_ops to be invoked directly when ops->func is
  in range, but this will require additional logic to handle branch
  range limitations, and is not handled by this patch.

* Each callsite's ftrace_ops pointer literal can hold any valid kernel
  address, and is updated atomically. As an architecture's ftrace_caller
  trampoline will atomically load the ops pointer then dereference
  ops->func, there is no risk of invoking ops->func with a mismatches
  ops pointer, and updates to the ops pointer do not require special
  care.

A subsequent patch will implement architectures support for arm64. There
should be no functional change as a result of this patch alone.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Florent Revest <revest@chromium.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230123134603.1064407-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-01-24 11:49:42 +00:00
Jakub Kicinski b3c588cd55 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/net/ipa/ipa_interrupt.c
drivers/net/ipa/ipa_interrupt.h
  9ec9b2a308 ("net: ipa: disable ipa interrupt during suspend")
  8e461e1f09 ("net: ipa: introduce ipa_interrupt_enable()")
  d50ed35587 ("net: ipa: enable IPA interrupt handlers separate from registration")
https://lore.kernel.org/all/20230119114125.5182c7ab@canb.auug.org.au/
https://lore.kernel.org/all/79e46152-8043-a512-79d9-c3b905462774@tessares.net/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-20 12:28:23 -08:00
Jiri Olsa 6a5f2d6ee8 bpf: Change modules resolving for kprobe multi link
We currently use module_kallsyms_on_each_symbol that iterates all
modules/symbols and we try to lookup each such address in user
provided symbols/addresses to get list of used modules.

This fix instead only iterates provided kprobe addresses and calls
__module_address on each to get list of used modules. This turned
out to be simpler and also bit faster.

On my setup with workload (executed 10 times):

   # test_progs -t kprobe_multi_bench_attach/modules

Current code:

 Performance counter stats for './test.sh' (5 runs):

    76,081,161,596      cycles:k                   ( +-  0.47% )

           18.3867 +- 0.0992 seconds time elapsed  ( +-  0.54% )

With the fix:

 Performance counter stats for './test.sh' (5 runs):

    74,079,889,063      cycles:k                   ( +-  0.04% )

           17.8514 +- 0.0218 seconds time elapsed  ( +-  0.12% )

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20230116101009.23694-4-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-01-19 17:07:40 -08:00
Zhen Lei 07cc2c931e livepatch: Improve the search performance of module_kallsyms_on_each_symbol()
Currently we traverse all symbols of all modules to find the specified
function for the specified module. But in reality, we just need to find
the given module and then traverse all the symbols in it.

Let's add a new parameter 'const char *modname' to function
module_kallsyms_on_each_symbol(), then we can compare the module names
directly in this function and call hook 'fn' after matching. If 'modname'
is NULL, the symbols of all modules are still traversed for compatibility
with other usage cases.

Phase1: mod1-->mod2..(subsequent modules do not need to be compared)
                |
Phase2:          -->f1-->f2-->f3

Assuming that there are m modules, each module has n symbols on average,
then the time complexity is reduced from O(m * n) to O(m) + O(n).

Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20230116101009.23694-2-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-01-19 17:07:15 -08:00
Yonghong Song bdb7fdb0ac bpf: Fix a possible task gone issue with bpf_send_signal[_thread]() helpers
In current bpf_send_signal() and bpf_send_signal_thread() helper
implementation, irq_work is used to handle nmi context. Hao Sun
reported in [1] that the current task at the entry of the helper
might be gone during irq_work callback processing. To fix the issue,
a reference is acquired for the current task before enqueuing into
the irq_work so that the queued task is still available during
irq_work callback processing.

  [1] https://lore.kernel.org/bpf/20230109074425.12556-1-sunhao.th@gmail.com/

Fixes: 8b401f9ed2 ("bpf: implement bpf_send_signal() helper")
Tested-by: Hao Sun <sunhao.th@gmail.com>
Reported-by: Hao Sun <sunhao.th@gmail.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20230118204815.3331855-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-01-18 18:44:16 -08:00
Namhyung Kim 0a9081cf0a perf/core: Add perf_sample_save_raw_data() helper
When we save the raw_data to the perf sample data, we need to update
the sample flags and the dynamic size.  To make sure this is done
consistently, add the perf_sample_save_raw_data() helper and convert
all call sites.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230118060559.615653-4-namhyung@kernel.org
2023-01-18 11:57:19 +01:00
Peter Zijlstra 9aedeaed6f tracing, hardirq: No moar _rcuidle() tracing
Robot reported that trace_hardirqs_{on,off}() tickle the forbidden
_rcuidle() tracepoint through local_irq_{en,dis}able().

For 'sane' configs, these calls will only happen with RCU enabled and
as such can use the regular tracepoint. This also means it's possible
to trace them from NMI context again.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230112195541.477416709@infradead.org
2023-01-13 11:48:16 +01:00
Peter Zijlstra 408b961146 tracing: WARN on rcuidle
ARCH_WANTS_NO_INSTR (a superset of CONFIG_GENERIC_ENTRY) disallows any
and all tracing when RCU isn't enabled.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20230112195541.416110581@infradead.org
2023-01-13 11:48:16 +01:00
Peter Zijlstra dc7305606d tracing: Remove trace_hardirqs_{on,off}_caller()
Per commit 56e62a7370 ("s390: convert to generic entry") the last
and only callers of trace_hardirqs_{on,off}_caller() went away, clean
up.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230112195541.355283994@infradead.org
2023-01-13 11:48:16 +01:00
Hao Sun a3d81bc1ea bpf: Skip task with pid=1 in send_signal_common()
The following kernel panic can be triggered when a task with pid=1 attaches
a prog that attempts to send killing signal to itself, also see [1] for more
details:

  Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
  CPU: 3 PID: 1 Comm: systemd Not tainted 6.1.0-09652-g59fe41b5255f #148
  Call Trace:
  <TASK>
  __dump_stack lib/dump_stack.c:88 [inline]
  dump_stack_lvl+0x100/0x178 lib/dump_stack.c:106
  panic+0x2c4/0x60f kernel/panic.c:275
  do_exit.cold+0x63/0xe4 kernel/exit.c:789
  do_group_exit+0xd4/0x2a0 kernel/exit.c:950
  get_signal+0x2460/0x2600 kernel/signal.c:2858
  arch_do_signal_or_restart+0x78/0x5d0 arch/x86/kernel/signal.c:306
  exit_to_user_mode_loop kernel/entry/common.c:168 [inline]
  exit_to_user_mode_prepare+0x15f/0x250 kernel/entry/common.c:203
  __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline]
  syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:296
  do_syscall_64+0x44/0xb0 arch/x86/entry/common.c:86
  entry_SYSCALL_64_after_hwframe+0x63/0xcd

So skip task with pid=1 in bpf_send_signal_common() to avoid the panic.

  [1] https://lore.kernel.org/bpf/20221222043507.33037-1-sunhao.th@gmail.com

Signed-off-by: Hao Sun <sunhao.th@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20230106084838.12690-1-sunhao.th@gmail.com
2023-01-06 22:48:58 +01:00
Jakub Kicinski 4aea86b403 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05 15:34:11 -08:00