mirror-linux/tools/perf/arch/x86/util
Ian Rogers 0ffca606e9 perf pmu intel: Adjust cpumaks for sub-NUMA clusters on graniterapids
On graniterapids the cache home agent (CHA) and memory controller
(IMC) PMUs all have their cpumask set to per-socket information. For
per-NUMA-node aggregation to work correctly, each PMU's cpumask
needs to be set to the CPUs of the relevant sub-NUMA grouping.

For example, on a 2 socket graniterapids machine with sub NUMA
clustering of 3, for uncore_cha and uncore_imc PMUs the cpumask is
"0,120" leading to aggregation only on NUMA nodes 0 and 3:
```
$ perf stat --per-node -e 'UNC_CHA_CLOCKTICKS,UNC_M_CLOCKTICKS' -a sleep 1

 Performance counter stats for 'system wide':

N0        1    277,835,681,344      UNC_CHA_CLOCKTICKS
N0        1     19,242,894,228      UNC_M_CLOCKTICKS
N3        1    277,803,448,124      UNC_CHA_CLOCKTICKS
N3        1     19,240,741,498      UNC_M_CLOCKTICKS

       1.002113847 seconds time elapsed
```

Updating the PMUs' cpumasks to "0,120", "40,160" and "80,200" yields
the correct 6 NUMA node aggregations:
```
$ perf stat --per-node -e 'UNC_CHA_CLOCKTICKS,UNC_M_CLOCKTICKS' -a sleep 1

 Performance counter stats for 'system wide':

N0        1     92,748,667,796      UNC_CHA_CLOCKTICKS
N0        0      6,424,021,142      UNC_M_CLOCKTICKS
N1        0     92,753,504,424      UNC_CHA_CLOCKTICKS
N1        1      6,424,308,338      UNC_M_CLOCKTICKS
N2        0     92,751,170,084      UNC_CHA_CLOCKTICKS
N2        0      6,424,227,402      UNC_M_CLOCKTICKS
N3        1     92,745,944,144      UNC_CHA_CLOCKTICKS
N3        0      6,423,752,086      UNC_M_CLOCKTICKS
N4        0     92,725,793,788      UNC_CHA_CLOCKTICKS
N4        1      6,422,393,266      UNC_M_CLOCKTICKS
N5        0     92,717,504,388      UNC_CHA_CLOCKTICKS
N5        0      6,421,842,618      UNC_M_CLOCKTICKS

       1.003406645 seconds time elapsed
```
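
As a rough illustration of where the three cpumasks above come from, the
sketch below derives one cpumask string per sub-NUMA cluster. It is not
the code added to pmu.c: the helper name snc_cpumask_str and its
parameters are hypothetical, and it assumes that the first CPUs of
consecutive sockets are a fixed stride apart (120 in the example) and
that each socket's CPUs split evenly across the clusters.
```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper, for illustration only (not the perf implementation).
 * Writes into buf the cpumask for sub-NUMA cluster 'cluster': the first CPU
 * of that cluster on every socket, comma separated.
 *
 * Assumptions: socket s starts at CPU s * socket_stride, and each socket's
 * CPUs are divided evenly into snc_ways contiguous clusters.
 *
 * Returns 0 on success, -1 if buf is too small.
 */
int snc_cpumask_str(int num_sockets, int socket_stride, int snc_ways,
		    int cluster, char *buf, size_t len)
{
	int cluster_stride;
	size_t off = 0;

	assert(snc_ways > 0 && socket_stride % snc_ways == 0);
	assert(cluster >= 0 && cluster < snc_ways);

	if (len == 0)
		return -1;
	buf[0] = '\0';

	/* Number of CPUs between the starts of adjacent clusters. */
	cluster_stride = socket_stride / snc_ways;

	for (int s = 0; s < num_sockets; s++) {
		int cpu = s * socket_stride + cluster * cluster_stride;
		int n = snprintf(buf + off, len - off, "%s%d",
				 s ? "," : "", cpu);

		/* snprintf reports the length it wanted; detect truncation. */
		if (n < 0 || (size_t)n >= len - off)
			return -1;
		off += (size_t)n;
	}
	return 0;
}
```
With num_sockets = 2, socket_stride = 120 and snc_ways = 3, clusters 0, 1
and 2 give "0,120", "40,160" and "80,200", matching the masks quoted
above. On a real system the strides would have to come from the CPU
topology rather than being passed in as constants.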

In general, having the perf tool adjust cpumasks isn't desirable as
ideally the PMU driver would be advertising the correct cpumask.

Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Weilin Wang <weilin.wang@intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lore.kernel.org/r/20250515181417.491401-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-22 23:15:48 -03:00
Build perf x86: Define arch_fetch_insn in NO_AUXTRACE builds 2024-12-18 16:24:33 -03:00
archinsn.c
auxtrace.c perf header: Refactor get_cpuid to take a CPU for ARM 2024-11-16 16:37:54 -03:00
cpuid.h
event.c perf tool: Constify tool pointers 2024-08-12 18:05:14 -03:00
evlist.c perf x86 evlist: Update comments on topdown regrouping 2025-03-11 19:04:56 -07:00
evsel.c perf x86/topdown: Complete topdown slots/metrics events check 2024-09-30 15:23:43 -07:00
evsel.h
header.c perf header: Pass a perf_cpu rather than a PMU to get_cpuid_str 2024-11-16 16:40:30 -03:00
intel-bts.c perf auxtrace: Remove unused 'pmu' pointer from struct auxtrace_record 2024-08-28 18:15:16 -03:00
intel-pt.c perf intel-pt: Do not default to recording all switch events 2025-05-12 14:18:16 -03:00
iostat.c perf tools: Fix up some comments and code to properly use the event_source bus 2025-02-19 13:23:43 -08:00
kvm-stat.c
machine.c
mem-events.c perf mem/c2c amd: Add ldlat support 2025-04-29 22:30:46 -03:00
mem-events.h perf mem/c2c amd: Add ldlat support 2025-04-29 22:30:46 -03:00
perf_regs.c perf parse-regs: Introduce a weak function arch__sample_reg_masks() 2024-02-15 13:48:36 -08:00
pmu.c perf pmu intel: Adjust cpumaks for sub-NUMA clusters on graniterapids 2025-05-22 23:15:48 -03:00
topdown.c perf x86/topdown: Fix topdown leader sampling test error on hybrid 2025-03-11 19:00:50 -07:00
topdown.h perf x86/topdown: Complete topdown slots/metrics events check 2024-09-30 15:23:43 -07:00
tsc.c perf tool_pmu: Move expr literals to tool_pmu 2024-10-10 23:40:32 -07:00
unwind-libdw.c perf sample: Make user_regs and intr_regs optional 2025-02-12 20:06:11 -08:00
unwind-libunwind.c