mirror-linux

Linus Torvalds 9e906a9dea [GIT PULL] perf tools changes for v6.19	2025-12-07 07:07:02 -08:00

Perf event/metric description
-----------------------------
Unify all event and metric descriptions in JSON format. Now event parsing and handling is greatly simplified by that.

From users point of view, perf list will provide richer information about hardware events like the following.

  $ perf list hw

  List of pre-defined events (to be used in -e or -M):

  legacy hardware:
    branch-instructions
         [Retired branch instructions [This event is an alias of branches]. Unit: cpu]
    branch-misses
         [Mispredicted branch instructions. Unit: cpu]
    branches
         [Retired branch instructions [This event is an alias of branch-instructions]. Unit: cpu]
    bus-cycles
         [Bus cycles,which can be different from total cycles. Unit: cpu]
    cache-misses
         [Cache misses. Usually this indicates Last Level Cache misses; this is intended to be used in conjunction with the PERF_COUNT_HW_CACHE_REFERENCES event to calculate cache miss rates. Unit: cpu]
    cache-references
         [Cache accesses. Usually this indicates Last Level Cache accesses but this may vary depending on your CPU. This may include prefetches and coherency messages; again this depends on the design of your CPU. Unit: cpu]
    cpu-cycles
         [Total cycles. Be wary of what happens during CPU frequency scaling [This event is an alias of cycles]. Unit: cpu]
    cycles
         [Total cycles. Be wary of what happens during CPU frequency scaling [This event is an alias of cpu-cycles]. Unit: cpu]
    instructions
         [Retired instructions. Be careful,these can be affected by various issues,most notably hardware interrupt counts. Unit: cpu]
    ref-cycles
         [Total cycles; not affected by CPU frequency scaling. Unit: cpu]

But most notable changes would be in the perf stat. On the right side, the default metrics are better named and aligned. :)

  $ perf stat -- perf test -w noploop

   Performance counter stats for 'perf test -w noploop':

                  11      context-switches   #     10.8 cs/sec          cs_per_second
                   0      cpu-migrations     #      0.0 migrations/sec  migrations_per_second
               3,612      page-faults        #   3532.5 faults/sec      page_faults_per_second
            1,022.51 msec task-clock         #      1.0 CPUs            CPUs_utilized
             110,466      branch-misses      #      0.0 %               branch_miss_rate       (88.66%)
       6,934,452,104      branches           #   6781.8 M/sec           branch_frequency       (88.66%)
       4,657,032,590      cpu-cycles         #      4.6 GHz             cycles_frequency       (88.65%)
      27,755,874,218      instructions       #      6.0 instructions    insn_per_cycle         (89.03%)
                          TopdownL1          #      0.3 %               tma_backend_bound
                                             #      9.3 %               tma_bad_speculation    (89.05%)
                                             #      9.7 %               tma_frontend_bound     (77.86%)
                                             #     80.7 %               tma_retiring           (88.81%)

         1.025318171 seconds time elapsed

         1.013248000 seconds user
         0.012014000 seconds sys

Deferred unwinding support
--------------------------
With the kernel support [1], perf can use deferred callchains for userspace stack trace with frame pointers like below:

  $ perf record --call-graph fp,defer ...

This will be transparent to users when it comes to other commands like perf report and perf script. They will merge the deferred callchains to the previous samples as if they were collected together.

[1] https://git.kernel.org/torvalds/c/c69993ecdd4dfde2b7da08b022052a33b203da07 (commit c69993ecdd4d: "perf: Support deferred user unwind")

ARM SPE updates
---------------
* Extensive enhancements to support various kinds of memory operations including GCS, MTE allocation tags, memcpy/memset, register access, and SIMD operations.
* Add inverted data source filter (inv_data_src_filter) support to exclude certain data sources.
* Improve documentation.

Vendor event updates
--------------------
* Intel: Updated event files for Sierra Forest, Panther Lake, Meteor Lake, Lunar Lake, Granite Rapids, and others.
* Arm64: Added metrics for i.MX94 DDR PMU and Cortex-A720AE definitions.
* RISC-V: Added JSON support for T-HEAD C920V2.

Misc
----
* Improve pointer tracking in data type profiling. It'd give better output when the variable is using container_of() to convert type.
* Annotation support for perf c2c report in TUI. Press 'a' key to enter annotation view from cacheline browser window. This will show which instruction is causing the cacheline contention.
* Lots of fixes and test coverage improvements!

Signed-off-by: Namhyung Kim <namhyung@kernel.org>

Merge tag 'perf-tools-for-v6.19-2025-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools

Pull perf tools updates from Namhyung Kim.

* tag 'perf-tools-for-v6.19-2025-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (214 commits)
  libperf: Use 'extern' in LIBPERF_API visibility macro
  perf stat: Improve handling of termination by signal
  perf tests stat: Add test for error for an offline CPU
  perf stat: When no events, don't report an error if there is none
  perf tests stat: Add "--null" coverage
  perf cpumap: Add "any" CPU handling to cpu_map__snprint_mask
  libperf cpumap: Fix perf_cpu_map__max for an empty/NULL map
  perf stat: Allow no events to open if this is a "--null" run
  perf test kvm: Add some basic perf kvm test coverage
  perf tests evlist: Add basic evlist test
  perf tests script dlfilter: Add a dlfilter test
  perf tests kallsyms: Add basic kallsyms test
  perf tests timechart: Add a perf timechart test
  perf tests top: Add basic perf top coverage test
  perf tests buildid: Add purge and remove testing
  perf tests c2c: Add a basic c2c
  perf c2c: Clean up some defensive gets and make asan clean
  perf jitdump: Fix missed dso__put
  perf mem-events: Don't leak online CPU map
  perf hist: In init, ensure mem_info is put on error paths
  ...
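
The named metrics in the perf stat sample from the pull message can be cross-checked by hand. The sketch below is only an illustration: it assumes each metric is a plain ratio of the counters printed in that sample (rates taken over task-clock time, CPUs_utilized over elapsed time). The real definitions live in perf's JSON metric files and may differ; the function name derive_metrics and the dictionary keys are made up for this example.

```python
# Raw counters copied from the sample `perf stat` output in the pull message.
counters = {
    "context-switches": 11,
    "page-faults": 3_612,
    "task-clock-msec": 1_022.51,
    "branch-misses": 110_466,
    "branches": 6_934_452_104,
    "cpu-cycles": 4_657_032_590,
    "instructions": 27_755_874_218,
}
elapsed_sec = 1.025318171  # "seconds time elapsed" in the same sample


def derive_metrics(c, elapsed):
    """Recompute the right-hand column under ASSUMED ratio definitions.

    These formulas are guesses that happen to reproduce the sample;
    they are not taken from perf's metric sources.
    """
    task_sec = c["task-clock-msec"] / 1000.0
    return {
        "cs_per_second": c["context-switches"] / task_sec,
        "page_faults_per_second": c["page-faults"] / task_sec,
        "CPUs_utilized": task_sec / elapsed,
        "branch_miss_rate": 100.0 * c["branch-misses"] / c["branches"],  # percent
        "branch_frequency": c["branches"] / task_sec / 1e6,  # M/sec
        "cycles_frequency": c["cpu-cycles"] / task_sec / 1e9,  # GHz
        "insn_per_cycle": c["instructions"] / c["cpu-cycles"],
    }


metrics = derive_metrics(counters, elapsed_sec)
for name, value in metrics.items():
    print(f"{name:24s} {value:10.1f}")
```

Rounded to one decimal place, these ratios match the values shown in the sample (for example 10.8 cs/sec and 6.0 instructions per cycle), which suggests the new metric names describe simple counter ratios for this default set.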
Documentation	perf timechart: Add record support for output perf.data path	2025-12-03 11:07:23 -08:00
arch	[GIT PULL] perf tools changes for v6.19	2025-12-07 07:07:02 -08:00
bench	perf tools: Don't read build-ids from non-regular files	2025-11-26 10:13:38 -08:00
check-header_ignore_hunks/lib	…
dlfilters	…
include/perf	…
jvmti	…
pmu-events	perf vendor events intel: Update sierraforest events from 1.12 to 1.13	2025-12-03 11:02:07 -08:00
python	perf ilist: Be tolerant of reading a metric on the wrong CPU	2025-12-02 16:12:49 -08:00
scripts	perf build: Disable thread safety analysis for perl header	2025-10-06 16:49:25 -03:00
tests	[GIT PULL] perf tools changes for v6.19	2025-12-07 07:07:02 -08:00
trace	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	2025-11-13 12:35:38 -08:00
ui	perf auxtrace: Remove errno.h from auxtrace.h and fix transitive dependencies	2025-11-13 23:03:11 -08:00
util	[GIT PULL] perf tools changes for v6.19	2025-12-07 07:07:02 -08:00
.gitignore	…
Build	…
CREDITS	…
MANIFEST	…
Makefile	…
Makefile.config	[GIT PULL] perf tools changes for v6.19	2025-12-07 07:07:02 -08:00
Makefile.perf	[GIT PULL] perf tools changes for v6.19	2025-12-07 07:07:02 -08:00
builtin-annotate.c	perf tool: Add the perf_tool argument to all callbacks	2025-11-07 13:25:05 -08:00
builtin-bench.c	perf bench mem: Add mmap() workloads	2025-09-19 12:43:59 -03:00
builtin-buildid-cache.c	perf tools: Don't read build-ids from non-regular files	2025-11-26 10:13:38 -08:00
builtin-buildid-list.c	…
builtin-c2c.c	perf c2c: Clean up some defensive gets and make asan clean	2025-12-03 11:07:46 -08:00
builtin-check.c	perf build: Remove NO_AUXTRACE build option	2025-11-13 23:03:11 -08:00
builtin-config.c	…
builtin-daemon.c	…
builtin-data.c	…
builtin-diff.c	…
builtin-evlist.c	perf tool: Add the perf_tool argument to all callbacks	2025-11-07 13:25:05 -08:00
builtin-ftrace.c	…
builtin-help.c	…
builtin-inject.c	perf tools: Merge deferred user callchains	2025-12-02 21:59:14 -08:00
builtin-kallsyms.c	…
builtin-kmem.c	…
builtin-kvm.c	perf kvm: Fix debug assertion	2025-12-03 11:07:19 -08:00
builtin-kwork.c	perf tools kwork: Add missed memory allocation check and free	2025-10-02 15:30:30 -03:00
builtin-list.c	perf list: Support filtering in JSON output	2025-11-20 11:11:48 -08:00
builtin-lock.c	perf lock: Fix segfault due to missing kernel map	2025-11-13 17:17:41 -03:00
builtin-mem.c	perf auxtrace: Remove errno.h from auxtrace.h and fix transitive dependencies	2025-11-13 23:03:11 -08:00
builtin-probe.c	…
builtin-record.c	perf build: Remove NO_AUXTRACE build option	2025-11-13 23:03:11 -08:00
builtin-report.c	perf tools: Merge deferred user callchains	2025-12-02 21:59:14 -08:00
builtin-sched.c	perf sched: Avoid union type punning undefined behavior	2025-10-01 11:22:04 -03:00
builtin-script.c	perf tools: Merge deferred user callchains	2025-12-02 21:59:14 -08:00
builtin-stat.c	perf stat: Improve handling of termination by signal	2025-12-04 15:44:39 -08:00
builtin-timechart.c	perf timechart: Add record support for output perf.data path	2025-12-03 11:07:23 -08:00
builtin-top.c	perf top: Use evlist__new_default when no events specified	2025-10-15 23:59:11 +09:00
builtin-trace.c	perf trace: Skip internal syscall arguments	2025-11-29 12:23:37 -08:00
builtin-version.c	…
builtin.h	…
check-headers.sh	tools include: Add headers to make tools builds more hermetic	2025-10-02 15:13:19 -03:00
command-list.txt	…
design.txt	…
perf-archive.sh	…
perf-completion.sh	…
perf-iostat.sh	…
perf-read-vdso.c	…
perf-sys.h	…
perf.c	…
perf.h	perf: Completely remove possibility to override MAX_NR_CPUS	2025-09-12 10:52:22 -03:00