There are a lot of changes in the perf tools in this cycle.
build
-----
* Use generic syscall table to generate syscall numbers on supported archs.
* This also enables to get rid of libaudit which was used for syscall numbers.
* Remove python2 support as it's deprecated for years.
* Fix issues on static build with libzstd.
perf record
-----------
* Intel-PT supports "aux-action" config term to pause or resume tracing in
the aux-buffer. Users can start the intel_pt event as "started-paused" and
configure other events to control the Intel-PT tracing.
# perf record --kcore -e intel_pt/aux-action=start-paused/ \
-e syscalls:sys_enter_newuname/aux-action=resume/ \
-e syscalls:sys_exit_newuname/aux-action=pause/ -- uname
This requires the kernel support (which was added in v6.13).
perf lock
---------
* 'perf lock contention' command has an ability to symbolize locks in
dynamically allocated objects using slab cache name when it runs with BPF.
Those dynamic locks would have "&" prefix in the name to distinguish them
from ordinary (static) locks.
# perf lock con -abl -E 5 sleep 1
contended total wait max wait avg wait address symbol
2 1.95 us 1.77 us 975 ns ffff9d5e852d3498 &task_struct (mutex)
1 1.18 us 1.18 us 1.18 us ffff9d5e852d3538 &task_struct (mutex)
4 1.12 us 354 ns 279 ns ffff9d5e841ca800 &kmalloc-cg-512 (mutex)
2 859 ns 617 ns 429 ns ffffffffa41c3620 delayed_uprobe_lock (mutex)
3 691 ns 388 ns 230 ns ffffffffa41c0940 pack_mutex (mutex)
This also requires the kernel/BPF support (which was added in v6.13).
perf ftrace
-----------
* 'perf ftrace latency' command gets a couple of options to support linear
buckets instead of exponential. Also it's possible to specify max and
min latency for the linear buckets.
# perf ftrace latency -abn -T switch_mm_irqs_off --bucket-range=100 \
--min-latency=200 --max-latency=800 -- sleep 1
# DURATION | COUNT | GRAPH |
0 - 200 ns | 186 | ### |
200 - 300 ns | 256 | ##### |
300 - 400 ns | 364 | ####### |
400 - 500 ns | 223 | #### |
500 - 600 ns | 111 | ## |
600 - 700 ns | 41 | |
700 - 800 ns | 141 | ## |
800 - ... ns | 169 | ### |
# statistics (in nsec)
total time: 2162212
avg time: 967
max time: 16817
min time: 132
count: 2236
* As you can see in the above example, it nows shows the statistics at the
end so that users can see the avg/max/min latencies easily.
* 'perf ftrace profile' command has --graph-opts option like 'perf ftrace
trace' so that it can control the tracing behaviors in the same way.
For example, it can limit the function call depth or threshold.
perf script
-----------
* Improve physical memory resolution in 'mem-phys-addr' script by parsing
/proc/iomem file.
# perf script mem-phys-addr -- find /
...
Event: mem_inst_retired.all_loads:P
Memory type count percentage
---------------------------------------- ---------- ----------
100000000-85f7fffff : System RAM 8929 69.7
547600000-54785d23f : Kernel data 1240 9.7
546a00000-5474bdfff : Kernel rodata 490 3.8
5480ce000-5485fffff : Kernel bss 121 0.9
0-fff : Reserved 3860 30.1
100000-89c01fff : System RAM 18 0.1
8a22c000-8df6efff : System RAM 5 0.0
Others
------
* 'perf test' gets --runs-per-test option to run the test cases repeatedly.
This would be helpful to see if it's flaky.
* Add 'parse_events' method to Python perf extension module, so that users
can use the same event parsing logic in the python code. One more step
towards implementing perf tools in Python. :)
* Support opening tracepoint events without libtraceevent. This will be
helpful if it won't use the tracing data like in 'perf stat'.
* Update ARM Neoverse N2/V2 JSON events and metrics
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQSo2x5BnqMqsoHtzsmMstVUGiXMgwUCZ5AgiQAKCRCMstVUGiXM
g0WhAP43Dpfatrm1jicTyAogk5D/JrIMOgjGtrJJi5RXG/r0gwD8DSWFzLppS9xy
KGtjLHrN6v6BqR4DCubdlZmRfh9Qjgg=
=M0Kz
-----END PGP SIGNATURE-----
Merge tag 'perf-tools-for-v6.14-2025-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf-tools updates from Namhyung Kim:
"There are a lot of changes in the perf tools in this cycle.
build:
- Use generic syscall table to generate syscall numbers on supported
archs
- This also enables to get rid of libaudit which was used for syscall
numbers
- Remove python2 support as it's deprecated for years
- Fix issues on static build with libzstd
perf record:
- Intel-PT supports "aux-action" config term to pause or resume
tracing in the aux-buffer. Users can start the intel_pt event as
"started-paused" and configure other events to control the Intel-PT
tracing:
# perf record --kcore -e intel_pt/aux-action=start-paused/ \
-e syscalls:sys_enter_newuname/aux-action=resume/ \
-e syscalls:sys_exit_newuname/aux-action=pause/ -- uname
This requires kernel support (which was added in v6.13)
perf lock:
- 'perf lock contention' command has an ability to symbolize locks in
dynamically allocated objects using slab cache name when it runs
with BPF. Those dynamic locks would have "&" prefix in the name to
distinguish them from ordinary (static) locks
# perf lock con -abl -E 5 sleep 1
contended total wait max wait avg wait address symbol
2 1.95 us 1.77 us 975 ns ffff9d5e852d3498 &task_struct (mutex)
1 1.18 us 1.18 us 1.18 us ffff9d5e852d3538 &task_struct (mutex)
4 1.12 us 354 ns 279 ns ffff9d5e841ca800 &kmalloc-cg-512 (mutex)
2 859 ns 617 ns 429 ns ffffffffa41c3620 delayed_uprobe_lock (mutex)
3 691 ns 388 ns 230 ns ffffffffa41c0940 pack_mutex (mutex)
This also requires kernel/BPF support (which was added in v6.13)
perf ftrace:
- 'perf ftrace latency' command gets a couple of options to support
linear buckets instead of exponential. Also it's possible to
specify max and min latency for the linear buckets:
# perf ftrace latency -abn -T switch_mm_irqs_off --bucket-range=100 \
--min-latency=200 --max-latency=800 -- sleep 1
# DURATION | COUNT | GRAPH |
0 - 200 ns | 186 | ### |
200 - 300 ns | 256 | ##### |
300 - 400 ns | 364 | ####### |
400 - 500 ns | 223 | #### |
500 - 600 ns | 111 | ## |
600 - 700 ns | 41 | |
700 - 800 ns | 141 | ## |
800 - ... ns | 169 | ### |
# statistics (in nsec)
total time: 2162212
avg time: 967
max time: 16817
min time: 132
count: 2236
- As you can see in the above example, it nows shows the statistics
at the end so that users can see the avg/max/min latencies easily
- 'perf ftrace profile' command has --graph-opts option like 'perf
ftrace trace' so that it can control the tracing behaviors in the
same way. For example, it can limit the function call depth or
threshold
perf script:
- Improve physical memory resolution in 'mem-phys-addr' script by
parsing /proc/iomem file
# perf script mem-phys-addr -- find /
...
Event: mem_inst_retired.all_loads:P
Memory type count percentage
---------------------------------------- ---------- ----------
100000000-85f7fffff : System RAM 8929 69.7
547600000-54785d23f : Kernel data 1240 9.7
546a00000-5474bdfff : Kernel rodata 490 3.8
5480ce000-5485fffff : Kernel bss 121 0.9
0-fff : Reserved 3860 30.1
100000-89c01fff : System RAM 18 0.1
8a22c000-8df6efff : System RAM 5 0.0
Others:
- 'perf test' gets --runs-per-test option to run the test cases
repeatedly. This would be helpful to see if it's flaky
- Add 'parse_events' method to Python perf extension module, so that
users can use the same event parsing logic in the python code. One
more step towards implementing perf tools in Python. :)
- Support opening tracepoint events without libtraceevent. This will
be helpful if it won't use the tracing data like in 'perf stat'
- Update ARM Neoverse N2/V2 JSON events and metrics"
* tag 'perf-tools-for-v6.14-2025-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (176 commits)
perf test: Update event_groups test to use instructions
perf bench: Fix undefined behavior in cmpworker()
perf annotate: Prefer passing evsel to evsel->core.idx
perf lock: Rename fields in lock_type_table
perf lock: Add percpu-rwsem for type filter
perf lock: Fix parse_lock_type which only retrieve one lock flag
perf lock: Fix return code for functions in __cmd_contention
perf hist: Fix width calculation in hpp__fmt()
perf hist: Fix bogus profiles when filters are enabled
perf hist: Deduplicate cmp/sort/collapse code
perf test: Improve verbose documentation
perf test: Add a runs-per-test flag
perf test: Fix parallel/sequential option documentation
perf test: Send list output to stdout rather than stderr
perf test: Rename functions and variables for better clarity
perf tools: Expose quiet/verbose variables in Makefile.perf
perf config: Add a function to set one variable in .perfconfig
perf test perftool_testsuite: Return correct value for skipping
perf test perftool_testsuite: Add missing description
perf test record+probe_libc_inet_pton: Make test resilient
...
percpu-rwsem was missing in man page. And for backward compatibility,
replace `pcpu-sem` with `percpu-rwsem` before parsing lock name.
Tested `./perf lock con -ab -Y pcpu-sem` and `./perf lock con -ab -Y
percpu-rwsem`
Fixes: 4f701063bf ("perf lock contention: Show lock type with address")
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Chun-Tse Shao <ctshao@google.com>
Cc: nick.forrington@arm.com
Link: https://lore.kernel.org/r/20250116235838.2769691-2-ctshao@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Add a little more detail on the output expectations for each verbose
level.
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20250110045736.598281-6-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
To detect flakes it is useful to run tests more than once. Add a
runs-per-test flag that will run each test multiple times. Example
output:
```
$ perf test -r 3 lbr -v
122: perf record LBR tests : Ok
122: perf record LBR tests : Ok
122: perf record LBR tests : Ok
```
Update the documentation for the runs-per-test option.
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20250110045736.598281-5-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The parallel option was removed in commit 94d1a913bd ("perf test:
Make parallel testing the default"). Update the sequential
documentation to reflect it isn't the default except for "exclusive"
tests.
Fixes: 94d1a913bd ("perf test: Make parallel testing the default")
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20250110045736.598281-4-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Document the flag along with PMU events to hint what it's used for and
give an example with other useful options to get minimal output.
Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20250108142904.401139-3-james.clark@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
All architectures now support HAVE_SYSCALL_TABLE_SUPPORT, so the flag is
no longer needed. With the removal of the flag, the related
GENERIC_SYSCALL_TABLE can also be removed.
libaudit was only used as a fallback for when HAVE_SYSCALL_TABLE_SUPPORT
was not defined, so libaudit is also no longer needed for any
architecture.
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Günther Noack <gnoack@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mickaël Salaün <mic@digikod.net>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20250108-perf_syscalltbl-v6-16-7543b5293098@rivosinc.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Document the use of aux-action config term and provide a simple example.
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20241216070244.14450-7-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Improve format of config terms and section references.
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20241216070244.14450-6-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add parsing for aux-action to accept "pause", "resume" or "start-paused"
values.
"start-paused" is valid only for AUX area events.
"pause" and "resume" are valid only for events grouped with an AUX area
event as the group leader. However, like with aux-output, the events
will be automatically grouped if they are not currently in a group, and
the AUX area event precedes the other events.
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20241216070244.14450-4-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch adds a max-latency option as discussed, in case the number of
buckets is more than 22, we don't observe the setting (for now, let's
say).
By default or if 0 is passed, the value is automatically determined
based on the number of buckets, range and minimum, so that we fill all
available buffers (equivalent to the behaviour before this patch).
We now get something like this:
# perf ftrace latency --bucket-range=20 \
--min-latency 10 \
--max-latency=100 \
-T switch_mm_irqs_off -a sleep 2
# DURATION | COUNT | GRAPH |
0 - 10 us | 1731 | ################ |
10 - 30 us | 1 | |
30 - 50 us | 0 | |
50 - 70 us | 0 | |
70 - 90 us | 0 | |
90 - 100 us | 0 | |
100 - ... us | 0 | |
Note the maximum is observed also if it doesn't cover completely a full
range (the second to last range is 10us long to let the last start at
100 sharp), this looks to me more sensible and eases the computations,
since we don't need to account for the range while filling the buckets.
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20241112181214.1171244-5-acme@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Things below and over will be in the first and last, outlier, buckets.
Without it:
# perf ftrace latency --use-nsec --use-bpf \
--bucket-range=200 \
-T switch_mm_irqs_off -a sleep 2
# DURATION | COUNT | GRAPH |
0 - 200 ns | 0 | |
200 - 400 ns | 44 | |
400 - 600 ns | 291 | # |
600 - 800 ns | 506 | ## |
800 - 1000 ns | 148 | |
1.00 - 1.20 us | 581 | ## |
1.20 - 1.40 us | 2199 | ########## |
1.40 - 1.60 us | 1048 | #### |
1.60 - 1.80 us | 1448 | ###### |
1.80 - 2.00 us | 1091 | ##### |
2.00 - 2.20 us | 517 | ## |
2.20 - 2.40 us | 318 | # |
2.40 - 2.60 us | 370 | # |
2.60 - 2.80 us | 271 | # |
2.80 - 3.00 us | 150 | |
3.00 - 3.20 us | 85 | |
3.20 - 3.40 us | 48 | |
3.40 - 3.60 us | 40 | |
3.60 - 3.80 us | 22 | |
3.80 - 4.00 us | 13 | |
4.00 - 4.20 us | 14 | |
4.20 - ... us | 626 | ## |
#
# perf ftrace latency --use-nsec --use-bpf \
--bucket-range=20 --min-latency=1200 \
-T switch_mm_irqs_off -a sleep 2
# DURATION | COUNT | GRAPH |
0 - 1200 ns | 1243 | ##### |
1.20 - 1.22 us | 141 | |
1.22 - 1.24 us | 202 | |
1.24 - 1.26 us | 209 | |
1.26 - 1.28 us | 219 | |
1.28 - 1.30 us | 208 | |
1.30 - 1.32 us | 245 | # |
1.32 - 1.34 us | 246 | # |
1.34 - 1.36 us | 224 | # |
1.36 - 1.38 us | 219 | |
1.38 - 1.40 us | 206 | |
1.40 - 1.42 us | 190 | |
1.42 - 1.44 us | 190 | |
1.44 - 1.46 us | 146 | |
1.46 - 1.48 us | 140 | |
1.48 - 1.50 us | 125 | |
1.50 - 1.52 us | 115 | |
1.52 - 1.54 us | 102 | |
1.54 - 1.56 us | 87 | |
1.56 - 1.58 us | 90 | |
1.58 - 1.60 us | 85 | |
1.60 - ... us | 5487 | ######################## |
#
Now we want focus on the latencies starting at 1.2us, with a finer
grained range of 20ns:
This is all on a live system, so statistically interesting, but not
narrowing down on the same numbers, so a 'perf ftrace latency record'
seems interesting to then use all on the same snapshot of latencies.
A --max-latency counterpart should come next, at first limiting the
max-latency to 20 * bucket-size, as we have a fixed buckets array with
20 + 2 entries (+ for the outliers) and thus would need to make it
larger for higher latencies.
We also may need a way to ask for not considering the out of range
values (first and last buckets) when drawing the buckets bars.
Co-developed-by: Gabriele Monaco <gmonaco@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20241112181214.1171244-4-acme@kernel.org
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Just a trivial typo, should be 'can', did a spell check on the rest of
the file just in case, nothing more stood out.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The perf tools annotation code used for a long time parsing the output
of binutils's objdump (or its reimplementations, like llvm's) to then
parse and augment it with samples, allow navigation, etc.
More recently disassemblers from the capstone and llvm (libraries, not
parsing the output of tools using those libraries to mimic binutils's
objdump output) were introduced.
So when all those methods are available, there is a static preference
for a series of attempts of disassembling a binary, with the 'llvm,
capstone, objdump' sequence being hard coded.
This patch allows users to change that sequence, specifying via a 'perf
config' 'annotate.disassemblers' entry which and in what order
disassemblers should be attempted.
As alluded to in the comments in the source code of this series, this
flexibility is useful for users and developers alike, elliminating the
requirement to rebuild the tool with some specific set of libraries to
see how the output of disassembling would be for one of these methods.
root@x1:~# rm -f ~/.perfconfig
root@x1:~# perf annotate -v --stdio2 update_load_avg
<SNIP>
symbol__disassemble:
filename=/usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux,
sym=update_load_avg, start=0xffffffffb6148fe0, en>
annotating [0x6ff7170]
/usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux :
[0x7407ca0] update_load_avg
Disassembled with llvm
annotate.disassemblers=llvm,capstone,objdump
Samples: 66 of event 'cpu_atom/cycles/P', 10000 Hz,
Event count (approx.): 5185444, [percent: local period]
update_load_avg()
/usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux
Percent 0xffffffff81148fe0 <update_load_avg>:
1.61 pushq %r15
pushq %r14
1.00 pushq %r13
movl %edx,%r13d
1.90 pushq %r12
pushq %rbp
movq %rsi,%rbp
pushq %rbx
movq %rdi,%rbx
subq $0x18,%rsp
15.14 movl 0x1a4(%rdi),%eax
root@x1:~# perf config annotate.disassemblers=capstone
root@x1:~# cat ~/.perfconfig
# this file is auto-generated.
[annotate]
disassemblers = capstone
root@x1:~#
root@x1:~# perf annotate -v --stdio2 update_load_avg
<SNIP>
Disassembled with capstone
annotate.disassemblers=capstone
Samples: 66 of event 'cpu_atom/cycles/P', 10000 Hz,
Event count (approx.): 5185444, [percent: local period]
update_load_avg()
/usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux
Percent 0xffffffff81148fe0 <update_load_avg>:
1.61 pushq %r15
pushq %r14
1.00 pushq %r13
movl %edx,%r13d
1.90 pushq %r12
pushq %rbp
movq %rsi,%rbp
pushq %rbx
movq %rdi,%rbx
subq $0x18,%rsp
15.14 movl 0x1a4(%rdi),%eax
root@x1:~# perf config annotate.disassemblers=objdump,capstone
root@x1:~# perf config annotate.disassemblers
annotate.disassemblers=objdump,capstone
root@x1:~# cat ~/.perfconfig
# this file is auto-generated.
[annotate]
disassemblers = objdump,capstone
root@x1:~# perf annotate -v --stdio2 update_load_avg
Executing: objdump --start-address=0xffffffff81148fe0 \
--stop-address=0xffffffff811497aa \
-d --no-show-raw-insn -S -C "$1"
Disassembled with objdump
annotate.disassemblers=objdump,capstone
Samples: 66 of event 'cpu_atom/cycles/P', 10000 Hz,
Event count (approx.): 5185444, [percent: local period]
update_load_avg()
/usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux
Percent
Disassembly of section .text:
ffffffff81148fe0 <update_load_avg>:
#define DO_ATTACH 0x4
ffffffff81148fe0 <update_load_avg>:
#define DO_ATTACH 0x4
#define DO_DETACH 0x8
/* Update task and its cfs_rq load average */
static inline void update_load_avg(struct cfs_rq *cfs_rq,
struct sched_entity *se,
int flags)
{
1.61 push %r15
push %r14
1.00 push %r13
mov %edx,%r13d
1.90 push %r12
push %rbp
mov %rsi,%rbp
push %rbx
mov %rdi,%rbx
sub $0x18,%rsp
}
/* rq->task_clock normalized against any time
this cfs_rq has spent throttled */
static inline u64 cfs_rq_clock_pelt(struct cfs_rq *cfs_rq)
{
if (unlikely(cfs_rq->throttle_count))
15.14 mov 0x1a4(%rdi),%eax
root@x1:~#
After adding a way to select the disassembler from the command line a
'perf test' comparing the output of the various diassemblers should be
introduced, to test these codebases.
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Steinar H. Gunderson <sesse@google.com>
Link: https://lore.kernel.org/r/20241111151734.1018476-4-acme@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add a few paragraphs on tool and hwmon events.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Yoshihiro Furudera <fj5100bi@fujitsu.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ze Gao <zegao2021@gmail.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Junhao He <hejunhao3@huawei.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Link: https://lore.kernel.org/r/20241109003759.473460-8-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Wasn't documented so far, mention that it is mostly used in the shell
regression tests.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Clark Williams <williams@redhat.com>
Link: https://lore.kernel.org/r/20241020021842.1752770-4-acme@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
In Makefile.config for unwinding the name dwarf implies either
libunwind or libdw. Make it clearer that HAVE_DWARF_SUPPORT is really
just defined when libdw is present by renaming to HAVE_LIBDW_SUPPORT.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Leo Yan <leo.yan@arm.com>
Cc: Anup Patel <anup@brainfault.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Shenlin Liang <liangshenlin@eswincomputing.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Guilherme Amadio <amadio@gentoo.org>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Alexander Lobakin <aleksander.lobakin@intel.com>
Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Chen Pei <cp0613@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Aditya Gupta <adityag@linux.ibm.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-riscv@lists.infradead.org
Cc: Bibo Mao <maobibo@loongson.cn>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Atish Patra <atishp@rivosinc.com>
Cc: Dima Kogan <dima@secretsauce.net>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: linux-csky@vger.kernel.org
Link: https://lore.kernel.org/r/20241017001354.56973-11-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
As HAVE_DWARF_GETLOCATIONS_SUPPORT and HAVE_DWARF_CFI_SUPPORT always
match HAVE_DWARF_SUPPORT remove the macros and use
HAVE_DWARF_SUPPORT. If building the file is guarded by CONFIG_DWARF
then remove all ifs.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Anup Patel <anup@brainfault.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Shenlin Liang <liangshenlin@eswincomputing.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Guilherme Amadio <amadio@gentoo.org>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Alexander Lobakin <aleksander.lobakin@intel.com>
Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Chen Pei <cp0613@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Aditya Gupta <adityag@linux.ibm.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-riscv@lists.infradead.org
Cc: Bibo Mao <maobibo@loongson.cn>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Atish Patra <atishp@rivosinc.com>
Cc: Dima Kogan <dima@secretsauce.net>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: linux-csky@vger.kernel.org
Link: https://lore.kernel.org/r/20241017001354.56973-10-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
pre-migration wait time is the time that a task unnecessarily spends
on the runqueue of a CPU but doesn't get switched-in there. In terms
of tracepoints, it is the time between sched:sched_wakeup and
sched:sched_migrate_task.
Let's say a task woke up on CPU2, then it got migrated to CPU4 and
then it's switched-in to CPU4. So, here pre-migration wait time is
time that it was waiting on runqueue of CPU2 after it is woken up.
The general pattern for pre-migration to occur is:
sched:sched_wakeup
sched:sched_migrate_task
sched:sched_switch
The sched:sched_waking event is used to capture the wakeup time,
as it aligns with the existing code and only introduces a negligible
time difference.
pre-migrations are generally not useful and it increases migrations.
This metric would be helpful in testing patches mainly related to wakeup
and load-balancer code paths as better wakeup logic would choose an
optimal CPU where task would be switched-in and thereby reducing pre-
migrations.
The sample output(s) when -P or --pre-migrations is used:
=================
time cpu task name wait time sch delay run time pre-mig time
[tid/pid] (msec) (msec) (msec) (msec)
--------------- ------ ------------------------------ --------- --------- --------- ---------
38456.720806 [0001] schbench[28634/28574] 4.917 4.768 1.004 0.000
38456.720810 [0001] rcu_preempt[18] 3.919 0.003 0.004 0.000
38456.721800 [0006] schbench[28779/28574] 23.465 23.465 1.999 0.000
38456.722800 [0002] schbench[28773/28574] 60.371 60.237 3.955 60.197
38456.722806 [0001] schbench[28634/28574] 0.004 0.004 1.996 0.000
38456.722811 [0001] rcu_preempt[18] 1.996 0.005 0.005 0.000
38456.723800 [0000] schbench[28833/28574] 4.000 4.000 3.999 0.000
38456.723800 [0004] schbench[28762/28574] 42.951 42.839 3.999 39.867
38456.723802 [0007] schbench[28812/28574] 43.947 43.817 3.999 40.866
38456.723804 [0001] schbench[28587/28574] 7.935 7.822 0.993 0.000
Signed-off-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com>
Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
Link: https://lore.kernel.org/r/20241004170756.18064-1-vineethr@linux.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The original commit message:
"
Use current sort mechanism but the real .se_cmp() just returns 0 so
that new columns "Predicted", "Abort" and "Cycles" are created in display
but actually these keys are not the sort keys.
For example:
Overhead Source:Line Symbol Shared Object Predicted Abort Cycles
........ ............ ........ ............. ......... ..... ......
38.25% div.c:45 [.] main div 97.6% 0 3
"
Update missed commit from series "perf report: Show branch flags/cycles
in --branch-history callgraph view" to apply to current repository so that
new columns described above are visible.
Link to original series:
https://lore.kernel.org/lkml/1477876794-30749-1-git-send-email-yao.jin@linux.intel.com/
Reported-by: Dr. David Alan Gilbert <linux@treblig.org>
Suggested-by: Kan Liang <kan.liang@linux.intel.com>
Co-developed-by: Jin Yao <yao.jin@linux.intel.com>
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Tested-by: Thomas Falcon <thomas.falcon@intel.com>
Signed-off-by: Thomas Falcon <thomas.falcon@intel.com>
Link: https://lore.kernel.org/r/20241010184046.203822-1-thomas.falcon@intel.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
There is a difference between the SYNOPSIS section of the help message
and the man page (tools/perf/Documentation/perf-list.txt) for the perf
list command. After checking, we found that the help message reflected
the latest specifications. Therefore, revised the SYNOPSIS section of
the man page to match the help message.
Signed-off-by: Yoshihiro Furudera <fj5100bi@fujitsu.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Liang
Link: https://lore.kernel.org/r/20241003002404.2592094-1-fj5100bi@fujitsu.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
This can be used to get config values like which objdump Perf uses for
disassembly.
Reviewed-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Tested-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Cc: Ben Gainey <ben.gainey@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Ruidong Tian <tianruidong@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Benjamin Gray <bgray@linux.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: coresight@lists.linaro.org
Cc: John Garry <john.g.garry@oracle.com>
Cc: scclevenger@os.amperecomputing.com
Link: https://lore.kernel.org/r/20240916135743.1490403-4-james.clark@linaro.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Fix two inconsistencies in feature names as discussed in [1]:
1. Rename "dwarf-unwind-support" to "dwarf-unwind"
2. 'get_cpuid' feature and 'HAVE_AUXTRACE_SUPPORT' names don't
look related, change the feature name to 'auxtrace' to match the
macro name, as 'get_cpuid' string is not used anywhere to check the
feature presence
[1]: https://lore.kernel.org/linux-perf-users/ZoRw5we4HLSTZND6@x1/
Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240904190132.415212-7-adityag@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently the presence of a feature is checked with a combination of
perf version --build-options and greps, such as:
perf version --build-options | grep " on .* HAVE_FEATURE"
Instead of this, introduce a subcommand "perf check feature", with which
scripts can test for presence of a feature, such as:
perf check feature HAVE_FEATURE
'perf check feature' command is expected to have exit status of 0 if
feature is built-in, and 1 if it's not built-in or if feature is not known.
Multiple features can also be passed as a comma-separated list, in which
case the exit status will be 1 only if all of the passed features are
built-in. For example, with below command, it will have exit status of 0
only if both libtraceevent and bpf are enabled, else 1 in all other cases
perf check feature libtraceevent,bpf
The arguments are case-insensitive.
An array 'supported_features' has also been introduced that can be used by
other commands like 'perf version --build-options', so that new features
can be added in one place, with the array
Committer testing:
$ perf check feature libtraceevent,bpf
libtraceevent: [ on ] # HAVE_LIBTRACEEVENT
bpf: [ on ] # HAVE_LIBBPF_SUPPORT
$ perf check feature libtraceevent
libtraceevent: [ on ] # HAVE_LIBTRACEEVENT
$ perf check feature bpf
bpf: [ on ] # HAVE_LIBBPF_SUPPORT
$ perf check -q feature bpf && echo "BPF support is present"
BPF support is present
$ perf check -q feature Bogus && echo "Bogus support is present"
$
Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240904061836.55873-3-adityag@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The --show-prio option is used to display the priority of task.
It is disabled by default, which is consistent with original behavior.
The display format is xxx (priority does not change during task running)
or xxx->yyy (priority changes during task running)
Testcase:
# perf sched record nice -n 9 true
[ perf record: Woken up 0 times to write data ]
[ perf record: Captured and wrote 0.497 MB perf.data ]
# perf sched timehist -h
Usage: perf sched timehist [<options>]
-C, --cpu <cpu> list of cpus to profile
-D, --dump-raw-trace dump raw trace in ASCII
-f, --force don't complain, do it
-g, --call-graph Display call chains if present (default on)
-I, --idle-hist Show idle events only
-i, --input <file> input file name
-k, --vmlinux <file> vmlinux pathname
-M, --migrations Show migration events
-n, --next Show next task
-p, --pid <pid[,pid...]>
analyze events only for given process id(s)
-s, --summary Show only syscall summary with statistics
-S, --with-summary Show all syscalls and summary with statistics
-t, --tid <tid[,tid...]>
analyze events only for given thread id(s)
-V, --cpu-visual Add CPU visual
-v, --verbose be more verbose (show symbol address, etc)
-w, --wakeups Show wakeup events
--kallsyms <file>
kallsyms pathname
--max-stack <n> Maximum number of functions to display backtrace.
--show-prio Show task priority
--state Show task state when sched-out
--symfs <directory>
Look for files with symbols relative to this directory
--time <str> Time span for analysis (start,stop)
# perf sched timehist
Samples of sched_switch event do not have callchains.
time cpu task name wait time sch delay run time
[tid/pid] (msec) (msec) (msec)
--------------- ------ ------------------------------ --------- --------- ---------
23952.006537 [0000] perf[534] 0.000 0.000 0.000
23952.006593 [0000] migration/0[19] 0.000 0.014 0.056
23952.006899 [0001] perf[534] 0.000 0.000 0.000
23952.006947 [0001] migration/1[22] 0.000 0.015 0.047
23952.007138 [0002] perf[534] 0.000 0.000 0.000
<SNIP>
# perf sched timehist --show-prio
Samples of sched_switch event do not have callchains.
time cpu task name prio wait time sch delay run time
[tid/pid] (msec) (msec) (msec)
--------------- ------ ------------------------------ -------- --------- --------- ---------
23952.006537 [0000] perf[534] 120 0.000 0.000 0.000
23952.006593 [0000] migration/0[19] 0 0.000 0.014 0.056
23952.006899 [0001] perf[534] 120 0.000 0.000 0.000
<SNIP>
23952.034843 [0003] nice[535] 120->129 0.189 0.024 23.314
<SNIP>
23952.053838 [0005] rcu_preempt[16] 120 3.993 0.000 0.023
23952.053990 [0005] <idle> 120 0.023 0.023 0.152
23952.054137 [0006] <idle> 120 1.427 1.427 17.855
23952.054278 [0007] <idle> 120 0.506 0.506 1.650
Signed-off-by: Yang Jihong <yangjihong@bytedance.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240819033016.2427235-2-yangjihong@bytedance.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
TPEBS (Timed PEBS(Precise Event-Based Sampling)) is a new feature Intel
PMU from Granite Rapids microarchitecture.
It will be used in new TMA (Top-Down Microarchitecture Analysis)
releases.
Add related introduction to documents while adding new code to support
it in 'perf stat'.
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Weilin Wang <weilin.wang@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Samantha Alt <samantha.alt@intel.com>
Link: https://lore.kernel.org/r/20240720062102.444578-8-weilin.wang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Current description for the AUX trace buffer size is misleading. When a
user specifies the option '-m,512M', it represents a size value in bytes
(512MiB) but not 512M pages (512M x 4KiB regard to a page of 4KiB).
Make the document clear that the normal buffer and the AUX tracing
buffer share the same semantics. Syncs the documents for consistent
text.
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240812093459.2575278-1-leo.yan@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Similarly to other subcommands (like report, top), it would be handy to
provide a path for addr2line command.
Signed-off-by: Martin Liska <martin.liska@hey.com>
Cc: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/eadc3e36-029d-4848-9d69-272fe5a83a26@foxlink.cz
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add a common options section and move some items to the section. Also
add description of new options to report options.
Suggested-by: Ian Rogers <irogers@google.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/lkml/20240802180913.1023886-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To allow BPF filters for unprivileged users it needs to pin the BPF
objects to BPF-fs first. Let's add a new option to pin and unpin the
objects easily. I'm not sure 'perf record' is a right place to do this
but I don't have a better idea right now.
$ sudo perf record --setup-filter pin
The above command would pin BPF program and maps for the filter when the
system has BPF-fs (usually at /sys/fs/bpf/). To unpin the objects,
users can run the following command (as root).
$ sudo perf record --setup-filter unpin
Committer testing:
root@number:~# perf record --setup-filter pin
root@number:~# ls -la /sys/fs/bpf/perf_filter/
total 0
drwxr-xr-x. 2 root root 0 Jul 31 10:43 .
drwxr-xr-t. 3 root root 0 Jul 31 10:43 ..
-rw-rw-rw-. 1 root root 0 Jul 31 10:43 dropped
-rw-rw-rw-. 1 root root 0 Jul 31 10:43 filters
-rwxrwxrwx. 1 root root 0 Jul 31 10:43 perf_sample_filter
-rw-rw-rw-. 1 root root 0 Jul 31 10:43 pid_hash
-rw-------. 1 root root 0 Jul 31 10:43 sample_f_rodata
root@number:~# ls -la /sys/fs/bpf/perf_filter/perf_sample_filter
-rwxrwxrwx. 1 root root 0 Jul 31 10:43 /sys/fs/bpf/perf_filter/perf_sample_filter
root@number:~#
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20240703223035.2024586-8-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The 'perf ftrace profile' command is to get function execution profiles
using function-graph tracer so that users can see the total, average,
max execution time as well as the number of invocations easily.
The following is a profile for the perf_event_open syscall.
$ sudo perf ftrace profile -G __x64_sys_perf_event_open -- \
perf stat -e cycles -C1 true 2> /dev/null | head
# Total (us) Avg (us) Max (us) Count Function
65.611 65.611 65.611 1 __x64_sys_perf_event_open
30.527 30.527 30.527 1 anon_inode_getfile
30.260 30.260 30.260 1 __anon_inode_getfile
29.700 29.700 29.700 1 alloc_file_pseudo
17.578 17.578 17.578 1 d_alloc_pseudo
17.382 17.382 17.382 1 __d_alloc
16.738 16.738 16.738 1 kmem_cache_alloc_lru
15.686 15.686 15.686 1 perf_event_alloc
14.012 7.006 11.264 2 obj_cgroup_charge
#
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Changbin Du <changbin.du@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lore.kernel.org/lkml/20240729004127.238611-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The 'graph-tail' option is to print function name as a comment at the end.
This is useful when a large function is mixed with other functions
(possibly from different CPUs).
For example,
$ sudo perf ftrace -- perf stat true
...
1) | get_unused_fd_flags() {
1) | alloc_fd() {
1) 0.178 us | _raw_spin_lock();
1) 0.187 us | expand_files();
1) 0.169 us | _raw_spin_unlock();
1) 1.211 us | }
1) 1.503 us | }
$ sudo perf ftrace --graph-opts tail -- perf stat true
...
1) | get_unused_fd_flags() {
1) | alloc_fd() {
1) 0.099 us | _raw_spin_lock();
1) 0.083 us | expand_files();
1) 0.081 us | _raw_spin_unlock();
1) 0.601 us | } /* alloc_fd */
1) 0.751 us | } /* get_unused_fd_flags */
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Changbin Du <changbin.du@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lore.kernel.org/lkml/20240729004127.238611-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Records the commands for cross compilation with two methods.
The first method relies on Multiarch. The second approach is to explicitly
specify the PKG_CONFIG variables, which is widely used in build system
(like Buildroot, Yocto, etc).
Co-developed-by: James Clark <james.clark@arm.com>
Signed-off-by: James Clark <james.clark@arm.com>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Tested-by: Ian Rogers <irogers@google.com>
Cc: amadio@gentoo.org
Cc: Thomas Richter <tmricht@linux.ibm.com>
Link: https://lore.kernel.org/r/20240717082211.524826-7-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
By default, perf sched map prints sched-in events for all the tasks
which may not be required all the time as it prints lot of symbols
and rows to the terminal.
With --task-name option, one could specify the specific task name
for which the map has to be shown. This would help in analyzing the
CPU usage patterns easier for that specific task. Since multiple
PID's might have the same task name, using task-name filter
would be more useful for debugging.
For other tasks, instead of printing the symbol, '-' is printed and
the same '.' is used to represent idle. '-' is used instead of symbol
for other tasks because it helps in clear visualization of task
of interest and secondly the symbol itself doesn't mean anything
because the sched-in of that symbol will not be printed(first sched-in
contains pid and the corresponding symbol).
When using the --task-name option, the sched-out time is represented
by a '*-'. Since not all task sched-in events are printed, the sched-out
time of the relevant task might be lost. This representation ensures
that the sched-out time of the interested task is not overlooked.
6.10.0-rc1
==========
*A0 131040.639793 secs A0 => migration/0:19
*. 131040.639801 secs . => swapper:0
. *B0 131040.639830 secs B0 => migration/1:24
. *. 131040.639836 secs
. . *C0 131040.640108 secs C0 => migration/2:30
. . *. 131040.640163 secs
. . . *D0 131040.640386 secs D0 => migration/3:36
. . . *. 131040.640395 secs
6.10.0-rc1 + patch (--task-name wdavdaemon)
=============
. *A0 . . . . - . 131040.641346 secs A0 => wdavdaemon:62509
. A0 *B0 . . . - . 131040.641378 secs B0 => wdavdaemon:62274
- *- B0 . . . - . 131040.641379 secs
*C0 . B0 . . . . . 131040.641572 secs C0 => wdavdaemon:62283
C0 . B0 . *D0 . . . 131040.641572 secs D0 => wdavdaemon:62277
C0 . B0 . D0 . *E0 . 131040.641578 secs E0 => wdavdaemon:62270
*- . B0 . D0 . E0 . 131040.641581 secs
. . B0 . D0 . *- . 131040.641583 secs
Reviewed-and-tested-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com>
Cc: Chen Yu <yu.c.chen@intel.com>
Link: https://lore.kernel.org/r/20240707182716.22054-2-vineethr@linux.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Currently, the -r/--repeat option accepts values from 0 and complains
for -1. The help section specifies:
-r, --repeat <n> repeat the workload replay N times (-1: infinite)
The -r -1 option raises an error because replay_repeat is defined as
an unsigned int.
In the current implementation, the workload is repeated n times when
-r <n> is used, except when n is 0.
When -r is set to 0, the workload is also repeated once. This happens
because when -r=0, the run_one_test function is not called. (Note that
mutex unlocking, which is essential for child threads spawned to emulate
the workload, happens in run_one_test.) However, mutex unlocking is
still performed in the destroy_tasks function. Thus, -r=0 results in the
workload running once coincidentally.
To clarify and maintain the existing logic for -r >= 1 (which runs the
workload the specified number of times) and to fix the issue with infinite
runs, make -r=0 perform an infinite run.
Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Link: https://lore.kernel.org/r/20240628071821.15264-1-vineethr@linux.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
When using perf timehist, sch delay is only computed for a waking task,
not for a pre empted task. This patches changes sch delay to account for
both. This makes sense as testing scheduling policy need to consider the
effect of scheduling delay globally, not only for waking tasks.
Example of `perf timehist` report before the patch for `stress` task
competing with each other.
First column is wait time, second column sch delay, third column
runtime.
1.492060 [0000] s stress[81] 1.999 0.000 2.000 R next: stress[83]
1.494060 [0000] s stress[83] 2.000 0.000 2.000 R next: stress[81]
1.496060 [0000] s stress[81] 2.000 0.000 2.000 R next: stress[83]
1.498060 [0000] s stress[83] 2.000 0.000 1.999 R next: stress[81]
After the patch, it looks like this (note that all wait time is not zero
anymore):
1.492060 [0000] s stress[81] 1.999 1.999 2.000 R next: stress[83]
1.494060 [0000] s stress[83] 2.000 2.000 2.000 R next: stress[81]
1.496060 [0000] s stress[81] 2.000 2.000 2.000 R next: stress[83]
1.498060 [0000] s stress[83] 2.000 2.000 1.999 R next: stress[81]
Signed-off-by: Fernand Sieber <sieberf@amazon.com>
Reviewed-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240618090339.87482-1-sieberf@amazon.com
Add a perf man page document that describes how to exploit AMD IBS with
Linux perf. Brief intro about IBS and simple one-liner examples will help
naive users to get started. This is not meant to be an exhaustive IBS
guide. User should refer latest AMD64 Architecture Programmer's Manual
for detailed description of IBS.
Usage:
$ man perf-amd-ibs
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: ananth.narayan@amd.com
Cc: sandipan.das@amd.com
Cc: santosh.shukla@amd.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240620054104.815-1-ravi.bangoria@amd.com
Change "perf lock info" argument handling to:
Display both map and thread info (rather than an error) when neither are
specified.
Display both map and thread info (rather than just thread info) when
both are requested.
Signed-off-by: Nick Forrington <nick.forrington@arm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240513091413.738537-2-nick.forrington@arm.com