perf tools updates for 7.1

perf report:

 - Add 'comm_nodigit' sort key to combine similar threads whose comm
   differs only in its numbers.  In the following example,
   'comm_nodigit' folds the samples from all threads whose comm starts
   with "bpfrb/" into a single "bpfrb/<N>" entry.

     $ perf report -s comm_nodigit,comm -H
     ...
     #
     #    Overhead  CommandNoDigit / Command
     # ...........  ........................
     #
        20.30%     swapper
           20.30%     swapper
        13.37%     chrome
           13.37%     chrome
        10.07%     bpfrb/<N>
            7.47%     bpfrb/0
            0.70%     bpfrb/1
            0.47%     bpfrb/3
            0.46%     bpfrb/2
            0.25%     bpfrb/4
            0.23%     bpfrb/5
            0.20%     bpfrb/6
            0.14%     bpfrb/10
            0.07%     bpfrb/7

 - Support flat layout for symfs.  The --symfs option specifies the
   location of debugging symbol files.  The default 'hierarchy' layout
   looks for the symbol file under the symfs root at the same path as
   the original file.  The new 'flat' layout searches only the root
   directory.

 - Update 'simd' sort key for ARM SIMD flags to cover ASE/SME and more
   predicate flags.

perf stat:

 - Add --pmu-filter option to select specific PMUs.  This is useful
   when measuring metrics from multiple instances of uncore PMUs with
   similar names.

     # perf stat -M cpa_p0_avg_bw

      Performance counter stats for 'system wide':

        19,417,779,115      hisi_sicl0_cpa0/cpa_cycles/            #     0.00 cpa_p0_avg_bw
                     0      hisi_sicl0_cpa0/cpa_p0_wr_dat/
                     0      hisi_sicl0_cpa0/cpa_p0_rd_dat_64b/
                     0      hisi_sicl0_cpa0/cpa_p0_rd_dat_32b/
        19,417,751,103      hisi_sicl10_cpa0/cpa_cycles/           #     0.00 cpa_p0_avg_bw
                     0      hisi_sicl10_cpa0/cpa_p0_wr_dat/
                     0      hisi_sicl10_cpa0/cpa_p0_rd_dat_64b/
                     0      hisi_sicl10_cpa0/cpa_p0_rd_dat_32b/
        19,417,730,679      hisi_sicl2_cpa0/cpa_cycles/            #     0.31 cpa_p0_avg_bw
            75,635,749      hisi_sicl2_cpa0/cpa_p0_wr_dat/
            18,520,640      hisi_sicl2_cpa0/cpa_p0_rd_dat_64b/
                     0      hisi_sicl2_cpa0/cpa_p0_rd_dat_32b/
        19,417,674,227      hisi_sicl8_cpa0/cpa_cycles/            #     0.00 cpa_p0_avg_bw
                     0      hisi_sicl8_cpa0/cpa_p0_wr_dat/
                     0      hisi_sicl8_cpa0/cpa_p0_rd_dat_64b/
                     0      hisi_sicl8_cpa0/cpa_p0_rd_dat_32b/

          19.417734480 seconds time elapsed

   With --pmu-filter, users can select only the hisi_sicl2_cpa0 PMU.

     # perf stat --pmu-filter hisi_sicl2_cpa0 -M cpa_p0_avg_bw

      Performance counter stats for 'system wide':

         6,234,093,559      cpa_cycles                #     0.60 cpa_p0_avg_bw
            50,548,465      cpa_p0_wr_dat
             7,552,182      cpa_p0_rd_dat_64b
                     0      cpa_p0_rd_dat_32b

           6.234139320 seconds time elapsed

Data type profiling:

 - Quality improvements by tracking register state more precisely.

 - Ensure array members get their type.

 - Handle more cases for global variables.

Vendor event/metric updates:

 - Update various Intel events and metrics

 - Add NVIDIA Tegra 410 Olympus events

Internal changes:

 - Verify perf.data header for maliciously crafted files.

 - Update perf tests to cover more use cases and make them more robust.

 - Move a couple of copied kernel headers so that they do not disturb
   the objtool build.

 - Fix a bug in sorting maps by name.

 - Remove some unused code.

Misc:

 - Fix module symbol resolution with non-zero text address.

 - Add -t/--threads option to `perf bench mem mmap`.

 - Track duration of exit*() syscall by `perf trace -s`.

 - Add core.addr2line-timeout and core.addr2line-disable-warn config
   items.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
-----BEGIN PGP SIGNATURE-----

iHUEABYKAB0WIQSo2x5BnqMqsoHtzsmMstVUGiXMgwUCaeKePAAKCRCMstVUGiXM
g5HiAQD7V4hiNd1atnY2slRfvkqSV7wlrXjYEQj01Ht0eJxJwAEA+3991R+6+RTZ
9AbC0LvjBgKhnRDR1/DE+GkXUmQZnwA=
=rlNN
-----END PGP SIGNATURE-----

Merge tag 'perf-tools-for-v7.1-2026-04-17' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools

Pull perf tools updates from Namhyung Kim.

* tag 'perf-tools-for-v7.1-2026-04-17' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (131 commits)
  perf loongarch: Fix build failure with CONFIG_LIBDW_DWARF_UNWIND
  perf annotate: Use jump__delete when freeing LoongArch jumps
  perf test: Fixes for check branch stack sampling
  perf test: Fix inet_pton probe failure and unroll call graph
  perf build: fix "argument list too long" in second location
  perf header: Add sanity checks to HEADER_BPF_BTF processing
  perf header: Sanity check HEADER_BPF_PROG_INFO
  perf header: Sanity check HEADER_PMU_CAPS
  perf header: Sanity check HEADER_HYBRID_TOPOLOGY
  perf header: Sanity check HEADER_CACHE
  perf header: Sanity check HEADER_GROUP_DESC
  perf header: Sanity check HEADER_PMU_MAPPINGS
  perf header: Sanity check HEADER_MEM_TOPOLOGY
  perf header: Sanity check HEADER_NUMA_TOPOLOGY
  perf header: Sanity check HEADER_CPU_TOPOLOGY
  perf header: Sanity check HEADER_NRCPUS and HEADER_CPU_DOMAIN_INFO
  perf header: Bump up the max number of command line args allowed
  perf header: Validate nr_domains when reading HEADER_CPU_DOMAIN_INFO
  perf sample: Fix documentation typo
  perf arm_spe: Improve SIMD flags setting
  ...
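
The 'comm_nodigit' key described in the message can be pictured as a small string transform. The sketch below is not taken from the perf sources: comm_strip_digits() is a hypothetical helper, and the choice to collapse each run of digits into a single "<N>" is an assumption based only on the "bpfrb/10" to "bpfrb/<N>" example above.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not perf's implementation): copy 'comm' into 'out',
 * replacing every run of decimal digits with the literal "<N>".
 * Returns 0 on success, -1 if 'out' is too small.
 */
static int comm_strip_digits(const char *comm, char *out, size_t outsz)
{
	size_t o = 0;

	if (outsz == 0)
		return -1;

	while (*comm) {
		if (isdigit((unsigned char)*comm)) {
			/* Need room for "<N>" plus the trailing NUL. */
			if (o + 3 >= outsz)
				return -1;
			memcpy(out + o, "<N>", 3);
			o += 3;
			/* Skip the rest of this digit run. */
			while (isdigit((unsigned char)*comm))
				comm++;
		} else {
			/* Need room for one character plus the trailing NUL. */
			if (o + 1 >= outsz)
				return -1;
			out[o++] = *comm++;
		}
	}
	out[o] = '\0';
	return 0;
}
```

With this transform, "bpfrb/0" through "bpfrb/10" all map to the same string, which is what lets the report group them under one parent entry while the plain 'comm' key still separates them one level down.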
2026-04-18 09:24:56 -07:00 · 2026-04-18 09:24:56 -07:00 · df8f6181ab
parent 8541d8f725 9a683fe0a0
commit df8f6181ab
245 changed files with 6800 additions and 1679 deletions
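
The two --symfs layouts named in the message differ only in how the lookup path is built. The following is a minimal sketch under stated assumptions: symfs_build_path() and the enum are hypothetical names, the original path is assumed to be absolute, and none of this is copied from the perf sources.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical layout selector, named only for this sketch. */
enum symfs_sketch_layout {
	SYMFS_SKETCH_HIERARCHY,	/* <symfs root> + full original path */
	SYMFS_SKETCH_FLAT,	/* <symfs root> + "/" + base name only */
};

/*
 * Hypothetical helper: build the path to probe for a symbol file.
 * 'path' is assumed to be absolute (to start with '/').
 * Returns 0 on success, -1 if 'out' is too small.
 */
static int symfs_build_path(char *out, size_t outsz, const char *symfs,
			    const char *path, enum symfs_sketch_layout layout)
{
	int n;

	if (layout == SYMFS_SKETCH_FLAT) {
		/* Keep only the component after the last '/'. */
		const char *slash = strrchr(path, '/');
		const char *name = slash ? slash + 1 : path;

		n = snprintf(out, outsz, "%s/%s", symfs, name);
	} else {
		/* Mirror the original directory structure under the root. */
		n = snprintf(out, outsz, "%s%s", symfs, path);
	}

	return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

For a root of /tmp/sym and an original file /usr/lib/libc.so.6, the hierarchy layout yields /tmp/sym/usr/lib/libc.so.6 and the flat layout yields /tmp/sym/libc.so.6. One consequence of the flat form is that two files with the same base name in different directories cannot be told apart.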
--- a/tools/build/feature/Makefile
+++ b/tools/build/feature/Makefile
@ -104,12 +104,18 @@ else
  endif
 endif

+ifeq ($(findstring -static,${LDFLAGS}),-static)
+  PKG_CONFIG += --static
+endif
+
 all: $(FILES)

 __BUILD = $(CC) $(CFLAGS) -MD -Wall -Werror -o $@ $(patsubst %.bin,%.c,$(@F)) $(LDFLAGS)
  BUILD = $(__BUILD) > $(@:.bin=.make.output) 2>&1
  BUILD_BFD = $(BUILD) -DPACKAGE='"perf"' -lbfd -ldl
-  BUILD_ALL = $(BUILD) -fstack-protector-all -O2 -D_FORTIFY_SOURCE=2 -ldw -lelf -lnuma -lelf -lslang $(FLAGS_PERL_EMBED) $(FLAGS_PYTHON_EMBED) -ldl -lz -llzma -lzstd -lssl
+  BUILD_ALL = $(BUILD) -fstack-protector-all -O2 -D_FORTIFY_SOURCE=2 -ldw -lelf -lnuma -lelf -lslang \
+	      $(FLAGS_PERL_EMBED) $(FLAGS_PYTHON_EMBED) -ldl -lz -llzma -lzstd \
+	      $(shell $(PKG_CONFIG) --libs --cflags openssl 2>/dev/null)

 __BUILDXX = $(CXX) $(CXXFLAGS) -MD -Wall -Werror -o $@ $(patsubst %.bin,%.cpp,$(@F)) $(LDFLAGS)
  BUILDXX = $(__BUILDXX) > $(@:.bin=.make.output) 2>&1
@ -388,7 +394,7 @@ $(OUTPUT)test-libpfm4.bin:
 	$(BUILD) -lpfm

 $(OUTPUT)test-libopenssl.bin:
-	$(BUILD) -lssl
+	$(BUILD) $(shell $(PKG_CONFIG) --libs --cflags openssl 2>/dev/null)

 $(OUTPUT)test-bpftool-skeletons.bin:
 	$(SYSTEM_BPFTOOL) version | grep '^features:.*skeletons' \
--- a/tools/lib/perf/cpumap.c
+++ b/tools/lib/perf/cpumap.c
@ -15,12 +15,12 @@

 #define MAX_NR_CPUS 4096

-void perf_cpu_map__set_nr(struct perf_cpu_map *map, int nr_cpus)
+void perf_cpu_map__set_nr(struct perf_cpu_map *map, unsigned int nr_cpus)
 {
 	RC_CHK_ACCESS(map)->nr = nr_cpus;
 }

-struct perf_cpu_map *perf_cpu_map__alloc(int nr_cpus)
+struct perf_cpu_map *perf_cpu_map__alloc(unsigned int nr_cpus)
 {
 	RC_STRUCT(perf_cpu_map) *cpus;
 	struct perf_cpu_map *result;
@ -78,7 +78,7 @@ void perf_cpu_map__put(struct perf_cpu_map *map)
 static struct perf_cpu_map *cpu_map__new_sysconf(void)
 {
 	struct perf_cpu_map *cpus;
-	int nr_cpus, nr_cpus_conf;
+	long nr_cpus, nr_cpus_conf;

 	nr_cpus = sysconf(_SC_NPROCESSORS_ONLN);
 	if (nr_cpus < 0)
@ -86,15 +86,13 @@ static struct perf_cpu_map *cpu_map__new_sysconf(void)

 	nr_cpus_conf = sysconf(_SC_NPROCESSORS_CONF);
 	if (nr_cpus != nr_cpus_conf) {
-		pr_warning("Number of online CPUs (%d) differs from the number configured (%d) the CPU map will only cover the first %d CPUs.",
+		pr_warning("Number of online CPUs (%ld) differs from the number configured (%ld) the CPU map will only cover the first %ld CPUs.",
 			nr_cpus, nr_cpus_conf, nr_cpus);
 	}

 	cpus = perf_cpu_map__alloc(nr_cpus);
 	if (cpus != NULL) {
-		int i;
-
-		for (i = 0; i < nr_cpus; ++i)
+		for (long i = 0; i < nr_cpus; ++i)
 			RC_CHK_ACCESS(cpus)->map[i].cpu = i;
 	}

@ -132,23 +130,23 @@ static int cmp_cpu(const void *a, const void *b)
 	return cpu_a->cpu - cpu_b->cpu;
 }

-static struct perf_cpu __perf_cpu_map__cpu(const struct perf_cpu_map *cpus, int idx)
+static struct perf_cpu __perf_cpu_map__cpu(const struct perf_cpu_map *cpus, unsigned int idx)
 {
 	return RC_CHK_ACCESS(cpus)->map[idx];
 }

-static struct perf_cpu_map *cpu_map__trim_new(int nr_cpus, const struct perf_cpu *tmp_cpus)
+static struct perf_cpu_map *cpu_map__trim_new(unsigned int nr_cpus, const struct perf_cpu *tmp_cpus)
 {
 	size_t payload_size = nr_cpus * sizeof(struct perf_cpu);
 	struct perf_cpu_map *cpus = perf_cpu_map__alloc(nr_cpus);
-	int i, j;

 	if (cpus != NULL) {
+		unsigned int j = 0;
+
 		memcpy(RC_CHK_ACCESS(cpus)->map, tmp_cpus, payload_size);
 		qsort(RC_CHK_ACCESS(cpus)->map, nr_cpus, sizeof(struct perf_cpu), cmp_cpu);
 		/* Remove dups */
-		j = 0;
-		for (i = 0; i < nr_cpus; i++) {
+		for (unsigned int i = 0; i < nr_cpus; i++) {
 			if (i == 0 ||
 			    __perf_cpu_map__cpu(cpus, i).cpu !=
 			    __perf_cpu_map__cpu(cpus, i - 1).cpu) {
@ -167,9 +165,8 @@ struct perf_cpu_map *perf_cpu_map__new(const char *cpu_list)
 	struct perf_cpu_map *cpus = NULL;
 	unsigned long start_cpu, end_cpu = 0;
 	char *p = NULL;
-	int i, nr_cpus = 0;
+	unsigned int nr_cpus = 0, max_entries = 0;
 	struct perf_cpu *tmp_cpus = NULL, *tmp;
-	int max_entries = 0;

 	if (!cpu_list)
 		return perf_cpu_map__new_online_cpus();
@ -208,9 +205,10 @@ struct perf_cpu_map *perf_cpu_map__new(const char *cpu_list)

 		for (; start_cpu <= end_cpu; start_cpu++) {
 			/* check for duplicates */
-			for (i = 0; i < nr_cpus; i++)
+			for (unsigned int i = 0; i < nr_cpus; i++) {
 				if (tmp_cpus[i].cpu == (int16_t)start_cpu)
 					goto invalid;
+			}

 			if (nr_cpus == max_entries) {
 				max_entries += max(end_cpu - start_cpu + 1, 16UL);
@ -252,12 +250,12 @@ struct perf_cpu_map *perf_cpu_map__new_int(int cpu)
 	return cpus;
 }

-static int __perf_cpu_map__nr(const struct perf_cpu_map *cpus)
+static unsigned int __perf_cpu_map__nr(const struct perf_cpu_map *cpus)
 {
 	return RC_CHK_ACCESS(cpus)->nr;
 }

-struct perf_cpu perf_cpu_map__cpu(const struct perf_cpu_map *cpus, int idx)
+struct perf_cpu perf_cpu_map__cpu(const struct perf_cpu_map *cpus, unsigned int idx)
 {
 	struct perf_cpu result = {
 		.cpu = -1
@ -269,7 +267,7 @@ struct perf_cpu perf_cpu_map__cpu(const struct perf_cpu_map *cpus, int idx)
 	return result;
 }

-int perf_cpu_map__nr(const struct perf_cpu_map *cpus)
+unsigned int perf_cpu_map__nr(const struct perf_cpu_map *cpus)
 {
 	return cpus ? __perf_cpu_map__nr(cpus) : 1;
 }
@ -294,7 +292,7 @@ bool perf_cpu_map__is_empty(const struct perf_cpu_map *map)

 int perf_cpu_map__idx(const struct perf_cpu_map *cpus, struct perf_cpu cpu)
 {
-	int low, high;
+	unsigned int low, high;

 	if (!cpus)
 		return -1;
@ -324,7 +322,7 @@ bool perf_cpu_map__has(const struct perf_cpu_map *cpus, struct perf_cpu cpu)

 bool perf_cpu_map__equal(const struct perf_cpu_map *lhs, const struct perf_cpu_map *rhs)
 {
-	int nr;
+	unsigned int nr;

 	if (lhs == rhs)
 		return true;
@ -336,7 +334,7 @@ bool perf_cpu_map__equal(const struct perf_cpu_map *lhs, const struct perf_cpu_m
 	if (nr != __perf_cpu_map__nr(rhs))
 		return false;

-	for (int idx = 0; idx < nr; idx++) {
+	for (unsigned int idx = 0; idx < nr; idx++) {
 		if (__perf_cpu_map__cpu(lhs, idx).cpu != __perf_cpu_map__cpu(rhs, idx).cpu)
 			return false;
 	}
@ -353,7 +351,7 @@ struct perf_cpu perf_cpu_map__min(const struct perf_cpu_map *map)
 	struct perf_cpu cpu, result = {
 		.cpu = -1
 	};
-	int idx;
+	unsigned int idx;

 	perf_cpu_map__for_each_cpu_skip_any(cpu, idx, map) {
 		result = cpu;
@ -384,7 +382,7 @@ bool perf_cpu_map__is_subset(const struct perf_cpu_map *a, const struct perf_cpu
 	if (!a || __perf_cpu_map__nr(b) > __perf_cpu_map__nr(a))
 		return false;

-	for (int i = 0, j = 0; i < __perf_cpu_map__nr(a); i++) {
+	for (unsigned int i = 0, j = 0; i < __perf_cpu_map__nr(a); i++) {
 		if (__perf_cpu_map__cpu(a, i).cpu > __perf_cpu_map__cpu(b, j).cpu)
 			return false;
 		if (__perf_cpu_map__cpu(a, i).cpu == __perf_cpu_map__cpu(b, j).cpu) {
@ -410,8 +408,7 @@ bool perf_cpu_map__is_subset(const struct perf_cpu_map *a, const struct perf_cpu
 int perf_cpu_map__merge(struct perf_cpu_map **orig, struct perf_cpu_map *other)
 {
 	struct perf_cpu *tmp_cpus;
-	int tmp_len;
-	int i, j, k;
+	unsigned int tmp_len, i, j, k;
 	struct perf_cpu_map *merged;

 	if (perf_cpu_map__is_subset(*orig, other))
@ -455,7 +452,7 @@ int perf_cpu_map__merge(struct perf_cpu_map **orig, struct perf_cpu_map *other)
 struct perf_cpu_map *perf_cpu_map__intersect(struct perf_cpu_map *orig,
 					     struct perf_cpu_map *other)
 {
-	int i, j, k;
+	unsigned int i, j, k;
 	struct perf_cpu_map *merged;

 	if (perf_cpu_map__is_subset(other, orig))
--- a/tools/lib/perf/evsel.c
+++ b/tools/lib/perf/evsel.c
@ -127,7 +127,8 @@ int perf_evsel__open(struct perf_evsel *evsel, struct perf_cpu_map *cpus,
 		     struct perf_thread_map *threads)
 {
 	struct perf_cpu cpu;
-	int idx, thread, err = 0;
+	unsigned int idx;
+	int thread, err = 0;

 	if (cpus == NULL) {
 		static struct perf_cpu_map *empty_cpu_map;
@ -460,7 +461,7 @@ int perf_evsel__enable_cpu(struct perf_evsel *evsel, int cpu_map_idx)
 int perf_evsel__enable_thread(struct perf_evsel *evsel, int thread)
 {
 	struct perf_cpu cpu __maybe_unused;
-	int idx;
+	unsigned int idx;
 	int err;

 	perf_cpu_map__for_each_cpu(cpu, idx, evsel->cpus) {
@ -499,12 +500,13 @@ int perf_evsel__disable(struct perf_evsel *evsel)

 int perf_evsel__apply_filter(struct perf_evsel *evsel, const char *filter)
 {
-	int err = 0, i;
+	int err = 0;

-	for (i = 0; i < perf_cpu_map__nr(evsel->cpus) && !err; i++)
+	for (unsigned int i = 0; i < perf_cpu_map__nr(evsel->cpus) && !err; i++) {
 		err = perf_evsel__run_ioctl(evsel,
 				     PERF_EVENT_IOC_SET_FILTER,
 				     (void *)filter, i);
+	}
 	return err;
 }

--- a/tools/lib/perf/include/internal/cpumap.h
+++ b/tools/lib/perf/include/internal/cpumap.h
@ -16,16 +16,16 @@
 DECLARE_RC_STRUCT(perf_cpu_map) {
 	refcount_t	refcnt;
 	/** Length of the map array. */
-	int		nr;
+	unsigned int	nr;
 	/** The CPU values. */
 	struct perf_cpu	map[];
 };

-struct perf_cpu_map *perf_cpu_map__alloc(int nr_cpus);
+struct perf_cpu_map *perf_cpu_map__alloc(unsigned int nr_cpus);
 int perf_cpu_map__idx(const struct perf_cpu_map *cpus, struct perf_cpu cpu);
 bool perf_cpu_map__is_subset(const struct perf_cpu_map *a, const struct perf_cpu_map *b);

-void perf_cpu_map__set_nr(struct perf_cpu_map *map, int nr_cpus);
+void perf_cpu_map__set_nr(struct perf_cpu_map *map, unsigned int nr_cpus);

 static inline refcount_t *perf_cpu_map__refcnt(struct perf_cpu_map *map)
 {
--- a/tools/lib/perf/include/perf/cpumap.h
+++ b/tools/lib/perf/include/perf/cpumap.h
@ -49,7 +49,7 @@ LIBPERF_API void perf_cpu_map__put(struct perf_cpu_map *map);
 * perf_cpu_map__cpu - get the CPU value at the given index. Returns -1 if index
 *                     is invalid.
 */
-LIBPERF_API struct perf_cpu perf_cpu_map__cpu(const struct perf_cpu_map *cpus, int idx);
+LIBPERF_API struct perf_cpu perf_cpu_map__cpu(const struct perf_cpu_map *cpus, unsigned int idx);
 /**
 * perf_cpu_map__nr - for an empty map returns 1, as perf_cpu_map__cpu returns a
 *                    cpu of -1 for an invalid index, this makes an empty map
@ -57,7 +57,7 @@ LIBPERF_API struct perf_cpu perf_cpu_map__cpu(const struct perf_cpu_map *cpus, i
 *                    the result is the number CPUs in the map plus one if the
 *                    "any CPU"/dummy value is present.
 */
-LIBPERF_API int perf_cpu_map__nr(const struct perf_cpu_map *cpus);
+LIBPERF_API unsigned int perf_cpu_map__nr(const struct perf_cpu_map *cpus);
 /**
 * perf_cpu_map__has_any_cpu_or_is_empty - is map either empty or has the "any CPU"/dummy value.
 */
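
The contract stated in the comments above (an invalid index yields a CPU of -1, and a missing map reports a length of 1) can be modelled in a few lines. This is a toy model only: the toy_* names and the simplified struct are hypothetical and stand in for the reference-counted libperf types, which are not reproduced here.

```c
#include <stddef.h>

/* Toy stand-ins for struct perf_cpu and struct perf_cpu_map. */
struct toy_cpu {
	int cpu;
};

struct toy_cpu_map {
	unsigned int nr;		/* length of the map array */
	const struct toy_cpu *map;	/* the CPU values */
};

/* A NULL map reports one entry, mirroring the documented behaviour. */
static unsigned int toy_cpu_map__nr(const struct toy_cpu_map *cpus)
{
	return cpus ? cpus->nr : 1;
}

/*
 * With an unsigned index only the upper bound needs checking;
 * anything out of range returns the "any CPU" value of -1.
 */
static struct toy_cpu toy_cpu_map__cpu(const struct toy_cpu_map *cpus,
				       unsigned int idx)
{
	struct toy_cpu result = { .cpu = -1 };

	if (cpus && idx < cpus->nr)
		return cpus->map[idx];

	return result;
}
```

A loop of the form `for (unsigned int i = 0; i < toy_cpu_map__nr(m); i++)` therefore runs exactly once for a NULL map and sees a CPU of -1, which is how the model makes an empty map behave like a map holding only the "any CPU" value.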
--- a/tools/perf/Documentation/perf-annotate.txt
+++ b/tools/perf/Documentation/perf-annotate.txt
@ -110,8 +110,11 @@ include::itrace.txt[]
 	Interleave source code with assembly code. Enabled by default,
 	disable with --no-source.

--symfs=<directory>::
-        Look for files with symbols relative to this directory.
+--symfs=<directory[,layout]>::
+        Look for files with symbols relative to this directory. The optional
+        layout can be 'hierarchy' (default, matches full path) or 'flat'
+        (only matches base name). This is useful when debug files are stored
+        in a flat directory structure.

 -M::
 --disassembler-style=:: Set disassembler style for objdump.
--- a/tools/perf/Documentation/perf-bench.txt
+++ b/tools/perf/Documentation/perf-bench.txt
@ -274,6 +274,10 @@ Repeat mmap() invocation this number of times.
 --cycles::
 Use perf's cpu-cycles event instead of gettimeofday syscall.

+-t::
+--threads=<NUM>::
+Create multiple threads to call mmap/munmap concurrently.
+
 SUITES FOR 'numa'
 ~~~~~~~~~~~~~~~~~
 *mem*::
--- a/tools/perf/Documentation/perf-config.txt
+++ b/tools/perf/Documentation/perf-config.txt
@ -210,6 +210,12 @@ core.*::
 		Sets a timeout (in milliseconds) for parsing /proc/<pid>/maps files.
 		Can be overridden by the --proc-map-timeout option on supported
 		subcommands. The default timeout is 500ms.
+	addr2line-disable-warn::
+		When set to 'true' disable all warnings from 'addr2line' output.
+		Default setting is 'false' to show these warnings.
+	addr2line-timeout::
+		Sets a timeout (in milliseconds) for parsing 'addr2line'
+		output.  The default timeout is 5s.

 tui.*, gtk.*::
 	Subcommands that can be configured here are 'top', 'report' and 'annotate'.
--- a/tools/perf/Documentation/perf-diff.txt
+++ b/tools/perf/Documentation/perf-diff.txt
@ -81,8 +81,11 @@ OPTIONS
 --force::
        Don't do ownership validation.

--symfs=<directory>::
-        Look for files with symbols relative to this directory.
+--symfs=<directory[,layout]>::
+        Look for files with symbols relative to this directory. The optional
+        layout can be 'hierarchy' (default, matches full path) or 'flat'
+        (only matches base name). This is useful when debug files are stored
+        in a flat directory structure.

 -b::
 --baseline-only::
--- a/tools/perf/Documentation/perf-kwork.txt
+++ b/tools/perf/Documentation/perf-kwork.txt
@ -169,8 +169,11 @@ OPTIONS for 'perf kwork timehist'
 --max-stack::
 	Maximum number of functions to display in backtrace, default 5.

--symfs=<directory>::
-    Look for files with symbols relative to this directory.
+--symfs=<directory[,layout]>::
+        Look for files with symbols relative to this directory. The optional
+        layout can be 'hierarchy' (default, matches full path) or 'flat'
+        (only matches base name). This is useful when debug files are stored
+        in a flat directory structure.

 --time::
 	Only analyze samples within given time window: <start>,<stop>. Times
--- a/tools/perf/Documentation/perf-probe.txt
+++ b/tools/perf/Documentation/perf-probe.txt
@ -50,6 +50,12 @@ OPTIONS
 --source=PATH::
 	Specify path to kernel source.

+--symfs=<directory[,layout]>::
+	Look for files with symbols relative to this directory. The optional
+	layout can be 'hierarchy' (default, matches full path) or 'flat'
+	(only matches base name). This is useful when debug files are stored
+	in a flat directory structure.
+
 -v::
 --verbose::
        Be more verbose (show parsed arguments, etc).
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@ -88,7 +88,7 @@ OPTIONS
 	Sort histogram entries by given key(s) - multiple keys can be specified
 	in CSV format.  Following sort keys are available:
 	pid, comm, dso, symbol, parent, cpu, socket, srcline, weight,
-	local_weight, cgroup_id, addr.
+	local_weight, cgroup_id, addr, comm_nodigit.

 	Each key has following meaning:

@ -136,13 +136,17 @@ OPTIONS
 	- addr: (Full) virtual address of the sampled instruction
 	- retire_lat: On X86, this reports pipeline stall of this instruction compared
 	  to the previous instruction in cycles. And currently supported only on X86
-	- simd: Flags describing a SIMD operation. "e" for empty Arm SVE predicate. "p" for partial Arm SVE predicate
+	- simd: Flags describing a SIMD operation. The architecture type can be Arm's
+	  ASE (Advanced SIMD extension), SVE, SME. It provides an extra tag for
+	  predicate: "e" for empty predicate, "p" for partial predicate, "d" for
+	  predicate disabled, and "f" for full predicate.
 	- type: Data type of sample memory access.
 	- typeoff: Offset in the data type of sample memory access.
 	- symoff: Offset in the symbol.
 	- weight1: Average value of event specific weight (1st field of weight_struct).
 	- weight2: Average value of event specific weight (2nd field of weight_struct).
 	- weight3: Average value of event specific weight (3rd field of weight_struct).
+	- comm_nodigit: same as comm, with numbers replaced by "<N>"

 	By default, overhead, comm, dso and symbol keys are used.
 	(i.e. --sort overhead,comm,dso,symbol).
@ -368,8 +372,11 @@ OPTIONS
 --force::
        Don't do ownership validation.

--symfs=<directory>::
-        Look for files with symbols relative to this directory.
+--symfs=<directory[,layout]>::
+        Look for files with symbols relative to this directory. The optional
+        layout can be 'hierarchy' (default, matches full path) or 'flat'
+        (only matches base name). This is useful when debug files are stored
+        in a flat directory structure.

 -C::
 --cpu:: Only report samples for the list of CPUs provided. Multiple CPUs can
--- a/tools/perf/Documentation/perf-sched.txt
+++ b/tools/perf/Documentation/perf-sched.txt
@ -437,8 +437,11 @@ OPTIONS for 'perf sched timehist'
    Show all scheduling events followed by a summary by thread with min,
    max, and average run times (in sec) and relative stddev.

--symfs=<directory>::
-    Look for files with symbols relative to this directory.
+--symfs=<directory[,layout]>::
+        Look for files with symbols relative to this directory. The optional
+        layout can be 'hierarchy' (default, matches full path) or 'flat'
+        (only matches base name). This is useful when debug files are stored
+        in a flat directory structure.

 -V::
 --cpu-visual::
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@ -307,8 +307,11 @@ OPTIONS
 --kallsyms=<file>::
        kallsyms pathname

--symfs=<directory>::
-        Look for files with symbols relative to this directory.
+--symfs=<directory[,layout]>::
+        Look for files with symbols relative to this directory. The optional
+        layout can be 'hierarchy' (default, matches full path) or 'flat'
+        (only matches base name). This is useful when debug files are stored
+        in a flat directory structure.

 -G::
 --hide-call-graph::
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@ -578,6 +578,10 @@ $ perf config stat.no-csv-summary=true
 Only enable events on applying cpu with this type for hybrid platform
 (e.g. core or atom)"

+--pmu-filter::
+Only enable events on applying pmu with specified for multiple
+pmus with same type (e.g. hisi_sicl2_cpa0 or hisi_sicl0_cpa0)
+
 EXAMPLES
 --------

--- a/tools/perf/Documentation/perf-timechart.txt
+++ b/tools/perf/Documentation/perf-timechart.txt
@ -53,8 +53,11 @@ TIMECHART OPTIONS
 -f::
 --force::
 	Don't complain, do it.
--symfs=<directory>::
-        Look for files with symbols relative to this directory.
+--symfs=<directory[,layout]>::
+        Look for files with symbols relative to this directory. The optional
+        layout can be 'hierarchy' (default, matches full path) or 'flat'
+        (only matches base name). This is useful when debug files are stored
+        in a flat directory structure.
 -n::
 --proc-num::
        Print task info for at least given number of tasks.
--- a/tools/perf/Documentation/tips.txt
+++ b/tools/perf/Documentation/tips.txt
@ -11,7 +11,7 @@ Search options using a keyword: perf report -h <keyword>
 Use parent filter to see specific call path: perf report -p <regex>
 List events using substring match: perf list <keyword>
 To see list of saved events and attributes: perf evlist -v
-Use --symfs <dir> if your symbol files are in non-standard locations
+Use --symfs <dir>[,layout] if your symbol files are in non-standard locations.
 To see callchains in a more compact form: perf report -g folded
 To see call chains by final symbol taking CPU time (bottom up) use perf report -G
 Show individual samples with: perf script
--- a/tools/perf/Makefile.config
+++ b/tools/perf/Makefile.config
@ -349,6 +349,7 @@ CORE_CFLAGS += -fno-omit-frame-pointer
 CORE_CFLAGS += -Wall
 CORE_CFLAGS += -Wextra
 CORE_CFLAGS += -std=gnu11
+CORE_CFLAGS += -funsigned-char

 CXXFLAGS += -std=gnu++17 -fno-exceptions -fno-rtti
 CXXFLAGS += -Wall
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@ -547,7 +547,7 @@ $(clone_flags_array): $(beauty_uapi_linux_dir)/sched.h $(clone_flags_tbl)
 	$(Q)$(SHELL) '$(clone_flags_tbl)' $(beauty_uapi_linux_dir) > $@

 drm_ioctl_array := $(beauty_ioctl_outdir)/drm_ioctl_array.c
-drm_hdr_dir := $(srctree)/tools/include/uapi/drm
+drm_hdr_dir := $(srctree)/tools/perf/trace/beauty/include/uapi/drm
 drm_ioctl_tbl := $(srctree)/tools/perf/trace/beauty/drm_ioctl.sh

 $(drm_ioctl_array): $(drm_hdr_dir)/drm.h $(drm_hdr_dir)/i915_drm.h $(drm_ioctl_tbl)
@ -556,8 +556,8 @@ $(drm_ioctl_array): $(drm_hdr_dir)/drm.h $(drm_hdr_dir)/i915_drm.h $(drm_ioctl_t
 fadvise_advice_array := $(beauty_outdir)/fadvise_advice_array.c
 fadvise_advice_tbl := $(srctree)/tools/perf/trace/beauty/fadvise.sh

-$(fadvise_advice_array): $(linux_uapi_dir)/in.h $(fadvise_advice_tbl)
-	$(Q)$(SHELL) '$(fadvise_advice_tbl)' $(linux_uapi_dir) > $@
+$(fadvise_advice_array): $(beauty_uapi_linux_dir)/fadvise.h $(fadvise_advice_tbl)
+	$(Q)$(SHELL) '$(fadvise_advice_tbl)' $(beauty_uapi_linux_dir) > $@

 fsmount_arrays := $(beauty_outdir)/fsmount_arrays.c
 fsmount_tbls := $(srctree)/tools/perf/trace/beauty/fsmount.sh
--- a/tools/perf/arch/arm/util/auxtrace.c
+++ b/tools/perf/arch/arm/util/auxtrace.c
@ -8,7 +8,7 @@
 #include <errno.h>
 #include <stdbool.h>
 #include <linux/coresight-pmu.h>
-#include <linux/zalloc.h>
+#include <stdlib.h>
 #include <api/fs/fs.h>

 #include "../../../util/auxtrace.h"
@ -27,7 +27,7 @@ static struct perf_pmu **find_all_arm_spe_pmus(int *nr_spes, int *err)
 	/* arm_spe_xxxxxxxxx\0 */
 	char arm_spe_pmu_name[sizeof(ARM_SPE_PMU_NAME) + 10];

-	arm_spe_pmus = zalloc(sizeof(struct perf_pmu *) * nr_cpus);
+	arm_spe_pmus = calloc(nr_cpus, sizeof(struct perf_pmu *));
 	if (!arm_spe_pmus) {
 		pr_err("spes alloc failed\n");
 		*err = -ENOMEM;
@ -79,7 +79,7 @@ static struct perf_pmu **find_all_hisi_ptt_pmus(int *nr_ptts, int *err)
 	if (!(*nr_ptts))
 		goto out;

-	hisi_ptt_pmus = zalloc(sizeof(struct perf_pmu *) * (*nr_ptts));
+	hisi_ptt_pmus = calloc((*nr_ptts), sizeof(struct perf_pmu *));
 	if (!hisi_ptt_pmus) {
 		pr_err("hisi_ptt alloc failed\n");
 		*err = -ENOMEM;
--- a/tools/perf/arch/arm/util/cs-etm.c
+++ b/tools/perf/arch/arm/util/cs-etm.c
@ -197,7 +197,8 @@ static struct perf_pmu *cs_etm_get_pmu(struct auxtrace_record *itr)
 static int cs_etm_validate_config(struct perf_pmu *cs_etm_pmu,
 				  struct evsel *evsel)
 {
-	int idx, err = 0;
+	unsigned int idx;
+	int err = 0;
 	struct perf_cpu_map *event_cpus = evsel->evlist->core.user_requested_cpus;
 	struct perf_cpu_map *intersect_cpus;
 	struct perf_cpu cpu;
@ -546,7 +547,7 @@ static size_t
 cs_etm_info_priv_size(struct auxtrace_record *itr,
 		      struct evlist *evlist)
 {
-	int idx;
+	unsigned int idx;
 	int etmv3 = 0, etmv4 = 0, ete = 0;
 	struct perf_cpu_map *event_cpus = evlist->core.user_requested_cpus;
 	struct perf_cpu_map *intersect_cpus;
@ -783,7 +784,7 @@ static int cs_etm_info_fill(struct auxtrace_record *itr,
 			    struct perf_record_auxtrace_info *info,
 			    size_t priv_size)
 {
-	int i;
+	unsigned int i;
 	u32 offset;
 	u64 nr_cpu, type;
 	struct perf_cpu_map *cpu_map;
--- a/tools/perf/arch/arm64/util/arm-spe.c
+++ b/tools/perf/arch/arm64/util/arm-spe.c
@ -144,7 +144,8 @@ static int arm_spe_info_fill(struct auxtrace_record *itr,
 			     struct perf_record_auxtrace_info *auxtrace_info,
 			     size_t priv_size)
 {
-	int i, ret;
+	unsigned int i;
+	int ret;
 	size_t offset;
 	struct arm_spe_recording *sper =
 			container_of(itr, struct arm_spe_recording, itr);
--- a/tools/perf/arch/arm64/util/header.c
+++ b/tools/perf/arch/arm64/util/header.c
@ -43,7 +43,7 @@ static int _get_cpuid(char *buf, size_t sz, struct perf_cpu cpu)
 int get_cpuid(char *buf, size_t sz, struct perf_cpu cpu)
 {
 	struct perf_cpu_map *cpus;
-	int idx;
+	unsigned int idx;

 	if (cpu.cpu != -1)
 		return _get_cpuid(buf, sz, cpu);
--- a/tools/perf/arch/common.c
+++ b/tools/perf/arch/common.c
@ -9,14 +9,14 @@
 #include "../util/debug.h"
 #include <linux/zalloc.h>

-const char *const arc_triplets[] = {
+static const char *const arc_triplets[] = {
 	"arc-linux-",
 	"arc-snps-linux-uclibc-",
 	"arc-snps-linux-gnu-",
 	NULL
 };

-const char *const arm_triplets[] = {
+static const char *const arm_triplets[] = {
 	"arm-eabi-",
 	"arm-linux-androideabi-",
 	"arm-unknown-linux-",
@ -28,13 +28,13 @@ const char *const arm_triplets[] = {
 	NULL
 };

-const char *const arm64_triplets[] = {
+static const char *const arm64_triplets[] = {
 	"aarch64-linux-android-",
 	"aarch64-linux-gnu-",
 	NULL
 };

-const char *const powerpc_triplets[] = {
+static const char *const powerpc_triplets[] = {
 	"powerpc-unknown-linux-gnu-",
 	"powerpc-linux-gnu-",
 	"powerpc64-unknown-linux-gnu-",
@ -43,40 +43,40 @@ const char *const powerpc_triplets[] = {
 	NULL
 };

-const char *const riscv32_triplets[] = {
+static const char *const riscv32_triplets[] = {
 	"riscv32-unknown-linux-gnu-",
 	"riscv32-linux-android-",
 	"riscv32-linux-gnu-",
 	NULL
 };

-const char *const riscv64_triplets[] = {
+static const char *const riscv64_triplets[] = {
 	"riscv64-unknown-linux-gnu-",
 	"riscv64-linux-android-",
 	"riscv64-linux-gnu-",
 	NULL
 };

-const char *const s390_triplets[] = {
+static const char *const s390_triplets[] = {
 	"s390-ibm-linux-",
 	"s390x-linux-gnu-",
 	NULL
 };

-const char *const sh_triplets[] = {
+static const char *const sh_triplets[] = {
 	"sh-unknown-linux-gnu-",
 	"sh-linux-gnu-",
 	NULL
 };

-const char *const sparc_triplets[] = {
+static const char *const sparc_triplets[] = {
 	"sparc-unknown-linux-gnu-",
 	"sparc64-unknown-linux-gnu-",
 	"sparc64-linux-gnu-",
 	NULL
 };

-const char *const x86_triplets[] = {
+static const char *const x86_triplets[] = {
 	"x86_64-pc-linux-gnu-",
 	"x86_64-unknown-linux-gnu-",
 	"i686-pc-linux-gnu-",
@ -90,7 +90,7 @@ const char *const x86_triplets[] = {
 	NULL
 };

-const char *const mips_triplets[] = {
+static const char *const mips_triplets[] = {
 	"mips-unknown-linux-gnu-",
 	"mipsel-linux-android-",
 	"mips-linux-gnu-",
--- a/tools/perf/arch/loongarch/util/Build
+++ b/tools/perf/arch/loongarch/util/Build
@ -1,4 +1,3 @@
 perf-util-y += header.o

 perf-util-$(CONFIG_LOCAL_LIBUNWIND) += unwind-libunwind.o
-perf-util-$(CONFIG_LIBDW_DWARF_UNWIND) += unwind-libdw.o
--- a/tools/perf/arch/powerpc/util/auxtrace.c
+++ b/tools/perf/arch/powerpc/util/auxtrace.c
@ -6,6 +6,7 @@
 #include <linux/kernel.h>
 #include <linux/types.h>
 #include <linux/string.h>
+#include <linux/zalloc.h>

 #include "../../util/evlist.h"
 #include "../../util/debug.h"
--- a/tools/perf/arch/sh/include/dwarf-regs-table.h
+++ b/tools/perf/arch/sh/include/dwarf-regs-table.h
@ -2,7 +2,7 @@
 #ifdef DEFINE_DWARF_REGSTR_TABLE
 /* This is included in perf/util/dwarf-regs.c */

-const char * const sh_regstr_tbl[] = {
+static const char * const sh_regstr_tbl[] = {
 	"r0",
 	"r1",
 	"r2",
--- a/tools/perf/arch/x86/tests/amd-ibs-period.c
+++ b/tools/perf/arch/x86/tests/amd-ibs-period.c
@ -8,7 +8,6 @@

 #include "arch-tests.h"
 #include "linux/perf_event.h"
-#include "linux/zalloc.h"
 #include "tests/tests.h"
 #include "../perf-sys.h"
 #include "pmu.h"
@ -60,7 +59,7 @@ static int dummy_workload_1(unsigned long count)
 		0xcc, /* int 3 */
 	};

-	p = zalloc(2 * page_size);
+	p = calloc(2, page_size);
 	if (!p) {
 		printf("malloc() failed. %m");
 		return 1;
--- a/tools/perf/arch/x86/tests/dwarf-unwind.c
+++ b/tools/perf/arch/x86/tests/dwarf-unwind.c
@ -54,22 +54,13 @@ int test__arch_unwind_sample(struct perf_sample *sample,
 			     struct thread *thread)
 {
 	struct regs_dump *regs = perf_sample__user_regs(sample);
-	u64 *buf;
+	u64 *buf = calloc(PERF_REGS_MAX, sizeof(u64));

-	buf = malloc(sizeof(u64) * PERF_REGS_MAX);
 	if (!buf) {
 		pr_debug("failed to allocate sample uregs data\n");
 		return -1;
 	}

-#ifdef MEMORY_SANITIZER
-	/*
-	 * Assignments to buf in the assembly function perf_regs_load aren't
-	 * seen by memory sanitizer. Zero the memory to convince memory
-	 * sanitizer the memory is initialized.
-	 */
-	memset(buf, 0, sizeof(u64) * PERF_REGS_MAX);
-#endif
 	perf_regs_load(buf);
 	regs->abi  = PERF_SAMPLE_REGS_ABI;
 	regs->regs = buf;
--- a/tools/perf/arch/x86/util/pmu.c
+++ b/tools/perf/arch/x86/util/pmu.c
@ -5,8 +5,8 @@
 #include <dirent.h>
 #include <fcntl.h>
 #include <linux/stddef.h>
+#include <linux/string.h>
 #include <linux/perf_event.h>
-#include <linux/zalloc.h>
 #include <api/fs/fs.h>
 #include <api/io_dir.h>
 #include <internal/cpumap.h>
@ -71,11 +71,6 @@ static int snc_nodes_per_l3_cache(void)
 	return snc_nodes;
 }

-static bool starts_with(const char *str, const char *prefix)
-{
-	return !strncmp(prefix, str, strlen(prefix));
-}
-
 static int num_chas(void)
 {
 	static bool checked_chas;
@ -93,7 +88,7 @@ static int num_chas(void)

 		while ((dent = io_dir__readdir(&dir)) != NULL) {
 			/* Note, dent->d_type will be DT_LNK and so isn't a useful filter. */
-			if (starts_with(dent->d_name, "uncore_cha_"))
+			if (strstarts(dent->d_name, "uncore_cha_"))
 				num_chas++;
 		}
 		close(fd);
@ -225,7 +220,8 @@ static void gnr_uncore_cha_imc_adjust_cpumask_for_snc(struct perf_pmu *pmu, bool
 	static struct perf_cpu_map *cha_adjusted[MAX_SNCS];
 	static struct perf_cpu_map *imc_adjusted[MAX_SNCS];
 	struct perf_cpu_map **adjusted = cha ? cha_adjusted : imc_adjusted;
-	int idx, pmu_snc, cpu_adjust;
+	unsigned int idx;
+	int pmu_snc, cpu_adjust;
 	struct perf_cpu cpu;
 	bool alloc;

@ -305,9 +301,9 @@ void perf_pmu__arch_init(struct perf_pmu *pmu)
 			else
 				pmu->mem_events = perf_mem_events_intel;
 		} else if (x86__is_intel_graniterapids()) {
-			if (starts_with(pmu->name, "uncore_cha_"))
+			if (strstarts(pmu->name, "uncore_cha_"))
 				gnr_uncore_cha_imc_adjust_cpumask_for_snc(pmu, /*cha=*/true);
-			else if (starts_with(pmu->name, "uncore_imc_"))
+			else if (strstarts(pmu->name, "uncore_imc_"))
 				gnr_uncore_cha_imc_adjust_cpumask_for_snc(pmu, /*cha=*/false);
 		}
 	}
--- a/tools/perf/bench/breakpoint.c
+++ b/tools/perf/bench/breakpoint.c
@ -16,7 +16,7 @@
 #include "bench.h"
 #include "futex.h"

-struct {
+static struct {
 	unsigned int nbreakpoints;
 	unsigned int nparallel;
 	unsigned int nthreads;
@ -173,7 +173,7 @@ int bench_breakpoint_thread(int argc, const char **argv)
 	return 0;
 }

-struct {
+static struct {
 	unsigned int npassive;
 	unsigned int nactive;
 } enable_params = {
--- a/tools/perf/bench/mem-functions.c
+++ b/tools/perf/bench/mem-functions.c
@ -7,13 +7,14 @@
 * Written by Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
 */

-#include "debug.h"
+#include "bench.h"
 #include "../perf-sys.h"
 #include <subcmd/parse-options.h>
-#include "../util/header.h"
-#include "../util/cloexec.h"
-#include "../util/string2.h"
-#include "bench.h"
+#include "util/cloexec.h"
+#include "util/debug.h"
+#include "util/header.h"
+#include "util/stat.h"
+#include "util/string2.h"
 #include "mem-memcpy-arch.h"
 #include "mem-memset-arch.h"

@ -26,6 +27,7 @@
 #include <errno.h>
 #include <linux/time64.h>
 #include <linux/log2.h>
+#include <pthread.h>

 #define K 1024

@ -41,6 +43,7 @@ static unsigned int	nr_loops	= 1;
 static bool		use_cycles;
 static int		cycles_fd;
 static unsigned int	seed;
+static unsigned int	nr_threads	= 1;

 static const struct option bench_common_options[] = {
 	OPT_STRING('s', "size", &size_str, "1MB",
@ -121,6 +124,8 @@ static struct perf_event_attr cycle_attr = {
 	.config		= PERF_COUNT_HW_CPU_CYCLES
 };

+static struct stats stats;
+
 static int init_cycles(void)
 {
 	cycles_fd = sys_perf_event_open(&cycle_attr, getpid(), -1, -1, perf_event_open_cloexec_flag());
@ -174,18 +179,18 @@ static void clock_accum(union bench_clock *a, union bench_clock *b)

 static double timeval2double(struct timeval *ts)
 {
-	return (double)ts->tv_sec + (double)ts->tv_usec / (double)USEC_PER_SEC;
+	return ((double)ts->tv_sec + (double)ts->tv_usec / (double)USEC_PER_SEC) / nr_threads;
 }

 #define print_bps(x) do {						\
 		if (x < K)						\
-			printf(" %14lf bytes/sec\n", x);		\
+			printf(" %14lf bytes/sec", x);			\
 		else if (x < K * K)					\
-			printf(" %14lfd KB/sec\n", x / K);		\
+			printf(" %14lf KB/sec", x / K);		\
 		else if (x < K * K * K)					\
-			printf(" %14lf MB/sec\n", x / K / K);		\
+			printf(" %14lf MB/sec", x / K / K);		\
 		else							\
-			printf(" %14lf GB/sec\n", x / K / K / K);	\
+			printf(" %14lf GB/sec", x / K / K / K);	\
 	} while (0)

 static void __bench_mem_function(struct bench_mem_info *info, struct bench_params *p,
@ -196,6 +201,7 @@ static void __bench_mem_function(struct bench_mem_info *info, struct bench_param
 	union bench_clock rt = { 0 };
 	void *src = NULL, *dst = NULL;

+	init_stats(&stats);
 	printf("# function '%s' (%s)\n", r->name, r->desc);

 	if (r->fn.init && r->fn.init(info, p, &src, &dst))
@ -210,11 +216,16 @@ static void __bench_mem_function(struct bench_mem_info *info, struct bench_param
 	switch (bench_format) {
 	case BENCH_FORMAT_DEFAULT:
 		if (use_cycles) {
-			printf(" %14lf cycles/byte\n", (double)rt.cycles/(double)p->size_total);
+			printf(" %14lf cycles/byte", (double)rt.cycles/(double)p->size_total);
 		} else {
 			result_bps = (double)p->size_total/timeval2double(&rt.tv);
 			print_bps(result_bps);
 		}
+		if (nr_threads > 1) {
+			printf("/thread\t( +- %6.2f%% )",
+			       rel_stddev_stats(stddev_stats(&stats), avg_stats(&stats)));
+		}
+		printf("\n");
 		break;

 	case BENCH_FORMAT_SIMPLE:
@ -388,7 +399,7 @@ static void mem_free(struct bench_mem_info *info __maybe_unused,
 	*dst = *src = NULL;
 }

-struct function memcpy_functions[] = {
+static struct function memcpy_functions[] = {
 	{ .name		= "default",
 	  .desc		= "Default memcpy() provided by glibc",
 	  .fn.init	= mem_alloc,
@ -494,16 +505,27 @@ static void mmap_page_touch(void *dst, size_t size, unsigned int page_shift, boo
 	}
 }

-static int do_mmap(const struct function *r, struct bench_params *p,
-		  void *src __maybe_unused, void *dst __maybe_unused,
-		  union bench_clock *accum)
+struct mmap_data {
+	pthread_t id;
+	const struct function *func;
+	struct bench_params *params;
+	union bench_clock result;
+	unsigned int seed;
+	int error;
+};
+
+static void *do_mmap_thread(void *arg)
 {
+	struct mmap_data *data = arg;
+	const struct function *r = data->func;
+	struct bench_params *p = data->params;
 	union bench_clock start, end, diff;
 	mmap_op_t fn = r->fn.mmap_op;
 	bool populate = strcmp(r->name, "populate") == 0;
+	void *dst;

-	if (p->seed)
-		srand(p->seed);
+	if (data->seed)
+		srand(data->seed);

 	for (unsigned int i = 0; i < p->nr_loops; i++) {
 		clock_get(&start);
@ -514,16 +536,59 @@ static int do_mmap(const struct function *r, struct bench_params *p,
 		fn(dst, p->size, p->page_shift, p->seed);
 		clock_get(&end);
 		diff = clock_diff(&start, &end);
-		clock_accum(accum, &diff);
+		clock_accum(&data->result, &diff);

 		bench_munmap(dst, p->size);
 	}

-	return 0;
+	return data;
 out:
-	printf("# Memory allocation failed - maybe size (%s) %s?\n", size_str,
-			p->page_shift != PAGE_SHIFT_4KB ? "has insufficient hugepages" : "is too large");
-	return -1;
+	data->error = -ENOMEM;
+	return NULL;
+}
+
+static int do_mmap(const struct function *r, struct bench_params *p,
+		  void *src __maybe_unused, void *dst __maybe_unused,
+		  union bench_clock *accum)
+{
+	struct mmap_data *data;
+	int error = 0;
+
+	data = calloc(nr_threads, sizeof(*data));
+	if (!data) {
+		printf("# Failed to allocate thread resources\n");
+		return -1;
+	}
+
+	for (unsigned int i = 0; i < nr_threads; i++) {
+		data[i].func = r;
+		data[i].params = p;
+		if (p->seed)
+			data[i].seed = p->seed + i;
+
+		if ((errno = pthread_create(&data[i].id, NULL, do_mmap_thread, &data[i])) != 0)
+			data[i].error = -errno;
+	}
+
+	for (unsigned int i = 0; i < nr_threads; i++) {
+		union bench_clock *t = &data[i].result;
+
+		pthread_join(data[i].id, NULL);
+
+		clock_accum(accum, t);
+		if (use_cycles)
+			update_stats(&stats, t->cycles);
+		else
+			update_stats(&stats, t->tv.tv_sec * 1e6 + t->tv.tv_usec);
+		error |= data[i].error;
+	}
+	free(data);
+
+	if (error) {
+		printf("# Memory allocation failed - maybe size (%s) %s?\n", size_str,
+		       p->page_shift != PAGE_SHIFT_4KB ? "has insufficient hugepages" : "is too large");
+	}
+	return error ? -1 : 0;
 }

 static const char * const bench_mem_mmap_usage[] = {
@ -548,6 +613,8 @@ int bench_mem_mmap(int argc, const char **argv)
 	static const struct option bench_mmap_options[] = {
 		OPT_UINTEGER('r', "randomize", &seed,
 			    "Seed to randomize page access offset."),
+		OPT_UINTEGER('t', "threads", &nr_threads,
+			    "Number of threads to run concurrently (default: 1)."),
 		OPT_PARENT(bench_common_options),
 		OPT_END()
 	};
--- a/tools/perf/bench/numa.c
+++ b/tools/perf/bench/numa.c
@ -32,7 +32,6 @@
 #include <linux/kernel.h>
 #include <linux/time64.h>
 #include <linux/numa.h>
-#include <linux/zalloc.h>

 #include "../util/header.h"
 #include "../util/mutex.h"
@ -166,7 +165,7 @@ static struct global_info	*g = NULL;
 static int parse_cpus_opt(const struct option *opt, const char *arg, int unset);
 static int parse_nodes_opt(const struct option *opt, const char *arg, int unset);

-struct params p0;
+static struct params p0;

 static const struct option options[] = {
 	OPT_INTEGER('p', "nr_proc"	, &p0.nr_proc,		"number of processes"),
@ -980,10 +979,8 @@ static int count_process_nodes(int process_nr)
 	int nodes;
 	int n, t;

-	node_present = (char *)malloc(g->p.nr_nodes * sizeof(char));
+	node_present = calloc(g->p.nr_nodes, sizeof(char));
 	BUG_ON(!node_present);
-	for (nodes = 0; nodes < g->p.nr_nodes; nodes++)
-		node_present[nodes] = 0;

 	for (t = 0; t < g->p.nr_threads; t++) {
 		struct thread_data *td;
@ -1090,10 +1087,8 @@ static void calc_convergence(double runtime_ns_max, double *convergence)
 	if (!g->p.show_convergence && !g->p.measure_convergence)
 		return;

-	nodes = (int *)malloc(g->p.nr_nodes * sizeof(int));
+	nodes = calloc(g->p.nr_nodes, sizeof(int));
 	BUG_ON(!nodes);
-	for (node = 0; node < g->p.nr_nodes; node++)
-		nodes[node] = 0;

 	loops_done_min = -1;
 	loops_done_max = 0;
@ -1423,7 +1418,7 @@ static void worker_process(int process_nr)
 	bind_to_memnode(td->bind_node);
 	bind_to_cpumask(td->bind_cpumask);

-	pthreads = zalloc(g->p.nr_threads * sizeof(pthread_t));
+	pthreads = calloc(g->p.nr_threads, sizeof(pthread_t));
 	process_data = setup_private_data(g->p.bytes_process);

 	if (g->p.show_details >= 3) {
@ -1629,7 +1624,7 @@ static int __bench_numa(const char *name)
 	if (init())
 		return -1;

-	pids = zalloc(g->p.nr_proc * sizeof(*pids));
+	pids = calloc(g->p.nr_proc, sizeof(*pids));
 	pid = -1;

 	if (g->p.serialize_startup) {
--- a/tools/perf/bench/sched-messaging.c
+++ b/tools/perf/bench/sched-messaging.c
@ -301,7 +301,7 @@ int bench_sched_messaging(int argc, const char **argv)
 	argc = parse_options(argc, argv, options,
 			     bench_sched_message_usage, 0);

-	worker_tab = malloc(num_fds * 2 * num_groups * sizeof(union messaging_worker));
+	worker_tab = calloc(num_fds * 2 * num_groups, sizeof(union messaging_worker));
 	if (!worker_tab)
 		err(EXIT_FAILURE, "main:malloc()");

--- a/tools/perf/bench/uprobe.c
+++ b/tools/perf/bench/uprobe.c
@ -58,7 +58,7 @@ static const char * const bench_uprobe_usage[] = {
 		goto cleanup; \
 	}

-struct bench_uprobe_bpf *skel;
+static struct bench_uprobe_bpf *skel;

 static int bench_uprobe__setup_bpf_skel(enum bench_uprobe bench)
 {
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@ -13,7 +13,6 @@
 #include <linux/list.h>
 #include "util/cache.h"
 #include <linux/rbtree.h>
-#include <linux/zalloc.h>
 #include "util/symbol.h"

 #include "util/debug.h"
@ -313,15 +312,6 @@ out_put:
 	return ret;
 }

-static int process_feature_event(const struct perf_tool *tool __maybe_unused,
-				 struct perf_session *session,
-				 union perf_event *event)
-{
-	if (event->feat.feat_id < HEADER_LAST_FEATURE)
-		return perf_event__process_feature(session, event);
-	return 0;
-}
-
 static int hist_entry__stdio_annotate(struct hist_entry *he,
 				    struct evsel *evsel,
 				    struct perf_annotate *ann)
@ -744,8 +734,7 @@ int cmd_annotate(int argc, const char **argv)
 			&annotate.group_set,
 			"Show event group information together"),
 	OPT_STRING('C', "cpu", &annotate.cpu_list, "cpu", "list of cpus to profile"),
-	OPT_CALLBACK(0, "symfs", NULL, "directory",
-		     "Look for files with symbols relative to this directory",
+	OPT_CALLBACK(0, "symfs", NULL, "directory[,layout]", SYMFS_HELP,
 		     symbol__config_symfs),
 	OPT_BOOLEAN(0, "source", &annotate_opts.annotate_src,
 		    "Interleave source code with assembly code (default)"),
@ -876,7 +865,7 @@ int cmd_annotate(int argc, const char **argv)
 	annotate.tool.id_index	= perf_event__process_id_index;
 	annotate.tool.auxtrace_info	= perf_event__process_auxtrace_info;
 	annotate.tool.auxtrace	= perf_event__process_auxtrace;
-	annotate.tool.feature	= process_feature_event;
+	annotate.tool.feature	= perf_event__process_feature;
 	annotate.tool.ordering_requires_timestamps = true;

 	annotate.session = perf_session__new(&data, &annotate.tool);
--- a/tools/perf/builtin-bench.c
+++ b/tools/perf/builtin-bench.c
@ -37,14 +37,14 @@ struct bench {
 };

 #ifdef HAVE_LIBNUMA_SUPPORT
-static struct bench numa_benchmarks[] = {
+static const struct bench numa_benchmarks[] = {
 	{ "mem",	"Benchmark for NUMA workloads",			bench_numa		},
 	{ "all",	"Run all NUMA benchmarks",			NULL			},
 	{ NULL,		NULL,						NULL			}
 };
 #endif

-static struct bench sched_benchmarks[] = {
+static const struct bench sched_benchmarks[] = {
 	{ "messaging",	"Benchmark for scheduling and IPC",		bench_sched_messaging	},
 	{ "pipe",	"Benchmark for pipe() between two processes",	bench_sched_pipe	},
 	{ "seccomp-notify",	"Benchmark for seccomp user notify",	bench_sched_seccomp_notify},
@ -52,7 +52,7 @@ static struct bench sched_benchmarks[] = {
 	{ NULL,		NULL,						NULL			}
 };

-static struct bench syscall_benchmarks[] = {
+static const struct bench syscall_benchmarks[] = {
 	{ "basic",	"Benchmark for basic getppid(2) calls",		bench_syscall_basic	},
 	{ "getpgid",	"Benchmark for getpgid(2) calls",		bench_syscall_getpgid	},
 	{ "fork",	"Benchmark for fork(2) calls",			bench_syscall_fork	},
@ -61,7 +61,7 @@ static struct bench syscall_benchmarks[] = {
 	{ NULL,		NULL,						NULL			},
 };

-static struct bench mem_benchmarks[] = {
+static const struct bench mem_benchmarks[] = {
 	{ "memcpy",	"Benchmark for memcpy() functions",		bench_mem_memcpy	},
 	{ "memset",	"Benchmark for memset() functions",		bench_mem_memset	},
 	{ "find_bit",	"Benchmark for find_bit() functions",		bench_mem_find_bit	},
@ -70,7 +70,7 @@ static struct bench mem_benchmarks[] = {
 	{ NULL,		NULL,						NULL			}
 };

-static struct bench futex_benchmarks[] = {
+static const struct bench futex_benchmarks[] = {
 	{ "hash",	"Benchmark for futex hash table",               bench_futex_hash	},
 	{ "wake",	"Benchmark for futex wake calls",               bench_futex_wake	},
 	{ "wake-parallel", "Benchmark for parallel futex wake calls",   bench_futex_wake_parallel },
@ -82,7 +82,7 @@ static struct bench futex_benchmarks[] = {
 };

 #ifdef HAVE_EVENTFD_SUPPORT
-static struct bench epoll_benchmarks[] = {
+static const struct bench epoll_benchmarks[] = {
 	{ "wait",	"Benchmark epoll concurrent epoll_waits",       bench_epoll_wait	},
 	{ "ctl",	"Benchmark epoll concurrent epoll_ctls",        bench_epoll_ctl		},
 	{ "all",	"Run all futex benchmarks",			NULL			},
@ -90,7 +90,7 @@ static struct bench epoll_benchmarks[] = {
 };
 #endif // HAVE_EVENTFD_SUPPORT

-static struct bench internals_benchmarks[] = {
+static const struct bench internals_benchmarks[] = {
 	{ "synthesize", "Benchmark perf event synthesis",	bench_synthesize	},
 	{ "kallsyms-parse", "Benchmark kallsyms parsing",	bench_kallsyms_parse	},
 	{ "inject-build-id", "Benchmark build-id injection",	bench_inject_build_id	},
@ -99,14 +99,14 @@ static struct bench internals_benchmarks[] = {
 	{ NULL,		NULL,					NULL			}
 };

-static struct bench breakpoint_benchmarks[] = {
+static const struct bench breakpoint_benchmarks[] = {
 	{ "thread", "Benchmark thread start/finish with breakpoints", bench_breakpoint_thread},
 	{ "enable", "Benchmark breakpoint enable/disable", bench_breakpoint_enable},
 	{ "all", "Run all breakpoint benchmarks", NULL},
 	{ NULL,	NULL, NULL },
 };

-static struct bench uprobe_benchmarks[] = {
+static const struct bench uprobe_benchmarks[] = {
 	{ "baseline",	"Baseline libc usleep(1000) call",				bench_uprobe_baseline,	},
 	{ "empty",	"Attach empty BPF prog to uprobe on usleep, system wide",	bench_uprobe_empty,	},
 	{ "trace_printk", "Attach trace_printk BPF prog to uprobe on usleep syswide",	bench_uprobe_trace_printk,	},
@ -116,12 +116,12 @@ static struct bench uprobe_benchmarks[] = {
 };

 struct collection {
-	const char	*name;
-	const char	*summary;
-	struct bench	*benchmarks;
+	const char		*name;
+	const char		*summary;
+	const struct bench	*benchmarks;
 };

-static struct collection collections[] = {
+static const struct collection collections[] = {
 	{ "sched",	"Scheduler and IPC benchmarks",			sched_benchmarks	},
 	{ "syscall",	"System call benchmarks",			syscall_benchmarks	},
 	{ "mem",	"Memory access benchmarks",			mem_benchmarks		},
@ -147,9 +147,9 @@ static struct collection collections[] = {
 #define for_each_bench(coll, bench) \
 	for (bench = coll->benchmarks; bench && bench->name; bench++)

-static void dump_benchmarks(struct collection *coll)
+static void dump_benchmarks(const struct collection *coll)
 {
-	struct bench *bench;
+	const struct bench *bench;

 	printf("\n        # List of available benchmarks for collection '%s':\n\n", coll->name);

@ -178,7 +178,7 @@ static const char * const bench_usage[] = {

 static void print_usage(void)
 {
-	struct collection *coll;
+	const struct collection *coll;
 	int i;

 	printf("Usage: \n");
@ -234,9 +234,9 @@ static int run_bench(const char *coll_name, const char *bench_name, bench_fn_t f
 	return ret;
 }

-static void run_collection(struct collection *coll)
+static void run_collection(const struct collection *coll)
 {
-	struct bench *bench;
+	const struct bench *bench;
 	const char *argv[2];

 	argv[1] = NULL;
@ -260,7 +260,7 @@ static void run_collection(struct collection *coll)

 static void run_all_collections(void)
 {
-	struct collection *coll;
+	const struct collection *coll;

 	for_each_collection(coll)
 		run_collection(coll);
@ -268,7 +268,7 @@ static void run_all_collections(void)

 int cmd_bench(int argc, const char **argv)
 {
-	struct collection *coll;
+	const struct collection *coll;
 	int ret = 0;

 	/* Unbuffered output */
@ -306,7 +306,7 @@ int cmd_bench(int argc, const char **argv)
 	}

 	for_each_collection(coll) {
-		struct bench *bench;
+		const struct bench *bench;

 		if (strcmp(coll->name, argv[0]))
 			continue;
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@ -155,7 +155,7 @@ static void *c2c_he_zalloc(size_t size)
 	if (!c2c_he->nodeset)
 		goto out_free;

-	c2c_he->node_stats = zalloc(c2c.nodes_cnt * sizeof(*c2c_he->node_stats));
+	c2c_he->node_stats = calloc(c2c.nodes_cnt, sizeof(*c2c_he->node_stats));
 	if (!c2c_he->node_stats)
 		goto out_free;

@ -2310,7 +2310,6 @@ static int setup_nodes(struct perf_session *session)
 {
 	struct numa_node *n;
 	unsigned long **nodes;
-	int node, idx;
 	struct perf_cpu cpu;
 	int *cpu2node;
 	struct perf_env *env = perf_session__env(session);
@ -2325,24 +2324,25 @@ static int setup_nodes(struct perf_session *session)
 	if (!n)
 		return -EINVAL;

-	nodes = zalloc(sizeof(unsigned long *) * c2c.nodes_cnt);
+	nodes = calloc(c2c.nodes_cnt, sizeof(unsigned long *));
 	if (!nodes)
 		return -ENOMEM;

 	c2c.nodes = nodes;

-	cpu2node = zalloc(sizeof(int) * c2c.cpus_cnt);
+	cpu2node = calloc(c2c.cpus_cnt, sizeof(int));
 	if (!cpu2node)
 		return -ENOMEM;

-	for (idx = 0; idx < c2c.cpus_cnt; idx++)
+	for (int idx = 0; idx < c2c.cpus_cnt; idx++)
 		cpu2node[idx] = -1;

 	c2c.cpu2node = cpu2node;

-	for (node = 0; node < c2c.nodes_cnt; node++) {
+	for (int node = 0; node < c2c.nodes_cnt; node++) {
 		struct perf_cpu_map *map = n[node].map;
 		unsigned long *set;
+		unsigned int idx;

 		set = bitmap_zalloc(c2c.cpus_cnt);
 		if (!set)
@ -2892,9 +2892,10 @@ static int ui_quirks(void)

 #define CALLCHAIN_DEFAULT_OPT  "graph,0.5,caller,function,percent"

-const char callchain_help[] = "Display call graph (stack chain/backtrace):\n\n"
-				CALLCHAIN_REPORT_HELP
-				"\n\t\t\t\tDefault: " CALLCHAIN_DEFAULT_OPT;
+static const char callchain_help[] =
+	"Display call graph (stack chain/backtrace):\n\n"
+	CALLCHAIN_REPORT_HELP
+	"\n\t\t\t\tDefault: " CALLCHAIN_DEFAULT_OPT;

 static int
 parse_callchain_opt(const struct option *opt, const char *arg, int unset)
--- a/tools/perf/builtin-config.c
+++ b/tools/perf/builtin-config.c
@ -23,7 +23,7 @@ static const char * const config_usage[] = {
 	NULL
 };

-enum actions {
+static enum actions {
 	ACTION_LIST = 1
 } actions;

--- a/tools/perf/builtin-daemon.c
+++ b/tools/perf/builtin-daemon.c
@ -1016,7 +1016,7 @@ static int setup_config_changes(struct daemon *daemon)
 {
 	char *basen = strdup(daemon->config_real);
 	char *dirn  = strdup(daemon->config_real);
-	char *base, *dir;
+	const char *base, *dir;
 	int fd, wd = -1;

 	if (!dirn || !basen)
@ -1029,7 +1029,7 @@ static int setup_config_changes(struct daemon *daemon)
 	}

 	dir = dirname(dirn);
-	base = basename(basen);
+	base = perf_basename(basen);
 	pr_debug("config file: %s, dir: %s\n", base, dir);

 	wd = inotify_add_watch(fd, dir, IN_CLOSE_WRITE);
--- a/tools/perf/builtin-data.c
+++ b/tools/perf/builtin-data.c
@ -28,15 +28,15 @@ static const char *data_usage[] = {
 	NULL
 };

-const char *to_json;
-const char *to_ctf;
-struct perf_data_convert_opts opts = {
+static const char *to_json;
+static const char *to_ctf;
+static struct perf_data_convert_opts opts = {
 	.force = false,
 	.all = false,
 	.time_str = NULL,
 };

-const struct option data_options[] = {
+static const struct option data_options[] = {
 		OPT_INCR('v', "verbose", &verbose, "be more verbose"),
 		OPT_STRING('i', "input", &input_name, "file", "input file name"),
 		OPT_STRING(0, "to-json", &to_json, NULL, "Convert to JSON format"),
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@ -113,7 +113,7 @@ enum {
 	COMPUTE_STREAM,	/* After COMPUTE_MAX to avoid use current compute arrays */
 };

-const char *compute_names[COMPUTE_MAX] = {
+static const char *compute_names[COMPUTE_MAX] = {
 	[COMPUTE_DELTA] = "delta",
 	[COMPUTE_DELTA_ABS] = "delta-abs",
 	[COMPUTE_RATIO] = "ratio",
@ -382,7 +382,7 @@ static void block_hist_free(void *he)
 	free(bh);
 }

-struct hist_entry_ops block_hist_ops = {
+static struct hist_entry_ops block_hist_ops = {
 	.new    = block_hist_zalloc,
 	.free   = block_hist_free,
 };
@ -1280,8 +1280,7 @@ static const struct option options[] = {
 	OPT_STRING_NOEMPTY('t', "field-separator", &symbol_conf.field_sep, "separator",
 		   "separator for columns, no spaces will be added between "
 		   "columns '.' is reserved."),
-	OPT_CALLBACK(0, "symfs", NULL, "directory",
-		     "Look for files with symbols relative to this directory",
+	OPT_CALLBACK(0, "symfs", NULL, "directory[,layout]", SYMFS_HELP,
 		     symbol__config_symfs),
 	OPT_UINTEGER('o', "order", &sort_compute, "Specify compute sorting."),
 	OPT_CALLBACK(0, "percentage", NULL, "relative|absolute",
@ -1353,7 +1352,7 @@ static int cycles_printf(struct hist_entry *he, struct hist_entry *pair,
 	/*
 	 * Avoid printing the warning "addr2line_init failed for ..."
 	 */
-	symbol_conf.disable_add2line_warn = true;
+	symbol_conf.addr2line_disable_warn = true;

 	bi = block_he->block_info;

@ -1892,7 +1891,7 @@ static int data_init(int argc, const char **argv)
 		return -EINVAL;
 	}

-	data__files = zalloc(sizeof(*data__files) * data__files_cnt);
+	data__files = calloc(data__files_cnt, sizeof(*data__files));
 	if (!data__files)
 		return -ENOMEM;

@ -1987,7 +1986,7 @@ int cmd_diff(int argc, const char **argv)

 	if (compute == COMPUTE_STREAM) {
 		symbol_conf.show_branchflag_count = true;
-		symbol_conf.disable_add2line_warn = true;
+		symbol_conf.addr2line_disable_warn = true;
 		callchain_param.mode = CHAIN_FLAT;
 		callchain_param.key = CCKEY_SRCLINE;
 		callchain_param.branch_callstack = 1;
--- a/tools/perf/builtin-ftrace.c
+++ b/tools/perf/builtin-ftrace.c
@ -20,6 +20,7 @@
 #include <linux/capability.h>
 #include <linux/err.h>
 #include <linux/string.h>
+#include <linux/zalloc.h>
 #include <sys/stat.h>

 #include "debug.h"
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@ -133,7 +133,7 @@ struct perf_inject {
 	struct perf_file_section secs[HEADER_FEAT_BITS];
 	struct guest_session	guest_session;
 	struct strlist		*known_build_ids;
-	const struct evsel	*mmap_evsel;
+	struct evsel		*mmap_evsel;
 	struct ip_callchain	*raw_callchain;
 };

@ -270,9 +270,8 @@ static s64 perf_event__repipe_auxtrace(const struct perf_tool *tool,
 	inject->have_auxtrace = true;

 	if (!inject->output.is_pipe) {
-		off_t offset;
+		off_t offset = perf_data__seek(&inject->output, 0, SEEK_CUR);

-		offset = lseek(inject->output.file.fd, 0, SEEK_CUR);
 		if (offset == -1)
 			return -errno;
 		ret = auxtrace_index__auxtrace_event(&session->auxtrace_index,
@ -519,7 +518,7 @@ static struct dso *findnew_dso(int pid, int tid, const char *filename,
 * processing mmap events. If not stashed, search the evlist for the first mmap
 * gathering event.
 */
-static const struct evsel *inject__mmap_evsel(struct perf_inject *inject)
+static struct evsel *inject__mmap_evsel(struct perf_inject *inject)
 {
 	struct evsel *pos;

@ -1023,7 +1022,6 @@ int perf_event__inject_buildid(const struct perf_tool *tool, union perf_event *e

 	sample__for_each_callchain_node(thread, evsel, sample, PERF_MAX_STACK_DEPTH,
 					/*symbols=*/false, mark_dso_hit_callback, &args);
-
 	thread__put(thread);
 repipe:
 	perf_event__repipe(tool, event, sample, machine);
@ -1087,6 +1085,7 @@ static int perf_inject__sched_stat(const struct perf_tool *tool,
 	struct perf_sample sample_sw;
 	struct perf_inject *inject = container_of(tool, struct perf_inject, tool);
 	u32 pid = evsel__intval(evsel, sample, "pid");
+	int ret;

 	list_for_each_entry(ent, &inject->samples, node) {
 		if (pid == ent->tid)
@ -1103,7 +1102,9 @@ found:
 	perf_event__synthesize_sample(event_sw, evsel->core.attr.sample_type,
 				      evsel->core.attr.read_format, &sample_sw);
 	build_id__mark_dso_hit(tool, event_sw, &sample_sw, evsel, machine);
-	return perf_event__repipe(tool, event_sw, &sample_sw, machine);
+	ret = perf_event__repipe(tool, event_sw, &sample_sw, machine);
+	perf_sample__exit(&sample_sw);
+	return ret;
 }
 #endif

@ -1429,6 +1430,7 @@ static int synthesize_build_id(struct perf_inject *inject, struct dso *dso, pid_
 {
 	struct machine *machine = perf_session__findnew_machine(inject->session, machine_pid);
 	struct perf_sample synth_sample = {
+		.evsel	   = inject__mmap_evsel(inject),
 		.pid	   = -1,
 		.tid	   = -1,
 		.time	   = -1,
@ -1648,6 +1650,7 @@ static int guest_session__fetch(struct guest_session *gs)
 	size_t hdr_sz = sizeof(*hdr);
 	ssize_t ret;

+	perf_sample__init(&gs->ev.sample, /*all=*/false);
 	buf = gs->ev.event_buf;
 	if (!buf) {
 		buf = malloc(PERF_SAMPLE_MAX_SIZE);
@ -1745,18 +1748,24 @@ static int guest_session__inject_events(struct guest_session *gs, u64 timestamp)
 		if (!gs->fetched) {
 			ret = guest_session__fetch(gs);
 			if (ret)
-				return ret;
+				break;
 			gs->fetched = true;
 		}

 		ev = gs->ev.event;
 		sample = &gs->ev.sample;

-		if (!ev->header.size)
-			return 0; /* EOF */
-
-		if (sample->time > timestamp)
-			return 0;
+		if (!ev->header.size) {
+			/* EOF */
+			perf_sample__exit(&gs->ev.sample);
+			gs->fetched = false;
+			ret = 0;
+			break;
+		}
+		if (sample->time > timestamp) {
+			ret = 0;
+			break;
+		}

 		/* Change cpumode to guest */
 		cpumode = ev->header.misc & PERF_RECORD_MISC_CPUMODE_MASK;
@ -1779,12 +1788,14 @@ static int guest_session__inject_events(struct guest_session *gs, u64 timestamp)

 		if (id_hdr_size & 7) {
 			pr_err("Bad id_hdr_size %u\n", id_hdr_size);
-			return -EINVAL;
+			ret = -EINVAL;
+			break;
 		}

 		if (ev->header.size & 7) {
 			pr_err("Bad event size %u\n", ev->header.size);
-			return -EINVAL;
+			ret = -EINVAL;
+			break;
 		}

 		/* Remove guest id sample */
@ -1792,14 +1803,16 @@ static int guest_session__inject_events(struct guest_session *gs, u64 timestamp)

 		if (ev->header.size & 7) {
 			pr_err("Bad raw event size %u\n", ev->header.size);
-			return -EINVAL;
+			ret = -EINVAL;
+			break;
 		}

 		guest_id = guest_session__lookup_id(gs, id);
 		if (!guest_id) {
 			pr_err("Guest event with unknown id %llu\n",
 			       (unsigned long long)id);
-			return -EINVAL;
+			ret = -EINVAL;
+			break;
 		}

 		/* Change to host ID to avoid conflicting ID values */
@@ -1819,19 +1832,28 @@ static int guest_session__inject_events(struct guest_session *gs, u64 timestamp)
 		/* New id sample with new ID and CPU */
 		ret = evlist__append_id_sample(inject->session->evlist, ev, sample);
 		if (ret)
-			return ret;
+			break;

 		if (ev->header.size & 7) {
 			pr_err("Bad new event size %u\n", ev->header.size);
-			return -EINVAL;
+			ret = -EINVAL;
+			break;
 		}

-		gs->fetched = false;
-
 		ret = output_bytes(inject, ev, ev->header.size);
 		if (ret)
-			return ret;
+			break;
+
+		/* Reset for next guest session event fetch. */
+		perf_sample__exit(sample);
+		gs->fetched = false;
 	}
+	if (ret && gs->fetched) {
+		/* Clear saved sample state on error. */
+		perf_sample__exit(&gs->ev.sample);
+		gs->fetched = false;
+	}
+	return ret;
 }

 static int guest_session__flush_events(struct guest_session *gs)
@@ -2134,6 +2156,7 @@ static bool keep_feat(struct perf_inject *inject, int feat)
 	case HEADER_HYBRID_TOPOLOGY:
 	case HEADER_PMU_CAPS:
 	case HEADER_CPU_DOMAIN_INFO:
+	case HEADER_CLN_SIZE:
 		return true;
 	/* Information that can be updated */
 	case HEADER_BUILD_ID:
@@ -2479,12 +2502,12 @@ int cmd_inject(int argc, const char **argv)
 		.output = {
 			.path = "-",
 			.mode = PERF_DATA_MODE_WRITE,
-			.use_stdio = true,
+			.file.use_stdio = true,
 		},
 	};
 	struct perf_data data = {
 		.mode = PERF_DATA_MODE_READ,
-		.use_stdio = true,
+		.file.use_stdio = true,
 	};
 	int ret;
 	const char *known_build_ids = NULL;
--- a/tools/perf/builtin-kmem.c
+++ b/tools/perf/builtin-kmem.c
@@ -82,7 +82,7 @@ static unsigned long nr_allocs, nr_cross_allocs;

 /* filters for controlling start and stop of time of analysis */
 static struct perf_time_interval ptime;
-const char *time_str;
+static const char *time_str;

 static int insert_alloc_stat(unsigned long call_site, unsigned long ptr,
 			     int bytes_req, int bytes_alloc, int cpu)
--- a/tools/perf/builtin-kwork.c
+++ b/tools/perf/builtin-kwork.c
@@ -985,7 +985,7 @@ static int process_irq_handler_exit_event(const struct perf_tool *tool,
 	return 0;
 }

-const struct evsel_str_handler irq_tp_handlers[] = {
+static const struct evsel_str_handler irq_tp_handlers[] = {
 	{ "irq:irq_handler_entry", process_irq_handler_entry_event, },
 	{ "irq:irq_handler_exit",  process_irq_handler_exit_event,  },
 };
@@ -1080,7 +1080,7 @@ static int process_softirq_exit_event(const struct perf_tool *tool,
 	return 0;
 }

-const struct evsel_str_handler softirq_tp_handlers[] = {
+static const struct evsel_str_handler softirq_tp_handlers[] = {
 	{ "irq:softirq_raise", process_softirq_raise_event, },
 	{ "irq:softirq_entry", process_softirq_entry_event, },
 	{ "irq:softirq_exit",  process_softirq_exit_event,  },
@@ -1211,7 +1211,7 @@ static int process_workqueue_execute_end_event(const struct perf_tool *tool,
 	return 0;
 }

-const struct evsel_str_handler workqueue_tp_handlers[] = {
+static const struct evsel_str_handler workqueue_tp_handlers[] = {
 	{ "workqueue:workqueue_activate_work", process_workqueue_activate_work_event, },
 	{ "workqueue:workqueue_execute_start", process_workqueue_execute_start_event, },
 	{ "workqueue:workqueue_execute_end",   process_workqueue_execute_end_event,   },
@@ -1281,7 +1281,7 @@ static int process_sched_switch_event(const struct perf_tool *tool,
 	return 0;
 }

-const struct evsel_str_handler sched_tp_handlers[] = {
+static const struct evsel_str_handler sched_tp_handlers[] = {
 	{ "sched:sched_switch",  process_sched_switch_event, },
 };

@@ -1561,13 +1561,13 @@ static void print_bad_events(struct perf_kwork *kwork)
 	}
 }

-const char *graph_load = "||||||||||||||||||||||||||||||||||||||||||||||||";
-const char *graph_idle = "                                                ";
 static void top_print_per_cpu_load(struct perf_kwork *kwork)
 {
 	int i, load_width;
 	u64 total, load, load_ratio;
 	struct kwork_top_stat *stat = &kwork->top_stat;
+	const char *graph_load = "||||||||||||||||||||||||||||||||||||||||||||||||";
+	const char *graph_idle = "                                                ";

 	for (i = 0; i < MAX_NR_CPUS; i++) {
 		total = stat->cpus_runtime[i].total;
@@ -2208,7 +2208,7 @@ static int perf_kwork__top(struct perf_kwork *kwork)
 	struct __top_cpus_runtime *cpus_runtime;
 	int ret = 0;

-	cpus_runtime = zalloc(sizeof(struct __top_cpus_runtime) * (MAX_NR_CPUS + 1));
+	cpus_runtime = calloc(MAX_NR_CPUS + 1, sizeof(struct __top_cpus_runtime));
 	if (!cpus_runtime)
 		return -1;

@@ -2423,8 +2423,8 @@ int cmd_kwork(int argc, const char **argv)
 		    "Display call chains if present"),
 	OPT_UINTEGER(0, "max-stack", &kwork.max_stack,
 		   "Maximum number of functions to display backtrace."),
-	OPT_STRING(0, "symfs", &symbol_conf.symfs, "directory",
-		    "Look for files with symbols relative to this directory"),
+	OPT_CALLBACK(0, "symfs", NULL, "directory[,layout]", SYMFS_HELP,
+		     symbol__config_symfs),
 	OPT_STRING(0, "time", &kwork.time_str, "str",
 		   "Time span for analysis (start,stop)"),
 	OPT_STRING('C', "cpu", &kwork.cpu_list, "cpu",
--- a/tools/perf/builtin-lock.c
+++ b/tools/perf/builtin-lock.c
@@ -2250,7 +2250,7 @@ static int parse_map_entry(const struct option *opt, const char *str,
 static int parse_max_stack(const struct option *opt, const char *str,
 			   int unset __maybe_unused)
 {
-	unsigned long *len = (unsigned long *)opt->value;
+	int *len = opt->value;
 	long val;
 	char *endptr;

--- a/tools/perf/builtin-probe.c
+++ b/tools/perf/builtin-probe.c
@@ -597,8 +597,8 @@ __cmd_probe(int argc, const char **argv)
 	OPT_BOOLEAN(0, "demangle-kernel", &symbol_conf.demangle_kernel,
 		    "Enable kernel symbol demangling"),
 	OPT_BOOLEAN(0, "cache", &probe_conf.cache, "Manipulate probe cache"),
-	OPT_STRING(0, "symfs", &symbol_conf.symfs, "directory",
-		   "Look for files with symbols relative to this directory"),
+	OPT_CALLBACK(0, "symfs", NULL, "directory[,layout]", SYMFS_HELP,
+		     symbol__config_symfs),
 	OPT_CALLBACK(0, "target-ns", NULL, "pid",
 		     "target pid for namespace contexts", opt_set_target_ns),
 	OPT_BOOLEAN(0, "bootconfig", &probe_conf.bootconfig,
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -40,7 +40,6 @@
 #include "util/perf_api_probe.h"
 #include "util/trigger.h"
 #include "util/perf-hooks.h"
-#include "util/cpu-set-sched.h"
 #include "util/synthetic-events.h"
 #include "util/time-utils.h"
 #include "util/units.h"
@@ -56,6 +55,7 @@
 #include "asm/bug.h"
 #include "perf.h"
 #include "cputopo.h"
+#include "dwarf-regs.h"

 #include <errno.h>
 #include <inttypes.h>
@@ -453,7 +453,7 @@ static int record__aio_pushfn(struct mmap *map, void *to, void *buf, size_t size
 static int record__aio_push(struct record *rec, struct mmap *map, off_t *off)
 {
 	int ret, idx;
-	int trace_fd = rec->session->data->file.fd;
+	int trace_fd = perf_data__fd(rec->session->data);
 	struct record_aio aio = { .rec = rec, .size = 0 };

 	/*
@@ -1070,12 +1070,12 @@ static int record__thread_data_init_maps(struct record_thread *thread_data, stru
 		thread_data->nr_mmaps = bitmap_weight(thread_data->mask->maps.bits,
 						      thread_data->mask->maps.nbits);
 	if (mmap) {
-		thread_data->maps = zalloc(thread_data->nr_mmaps * sizeof(struct mmap *));
+		thread_data->maps = calloc(thread_data->nr_mmaps, sizeof(struct mmap *));
 		if (!thread_data->maps)
 			return -ENOMEM;
 	}
 	if (overwrite_mmap) {
-		thread_data->overwrite_maps = zalloc(thread_data->nr_mmaps * sizeof(struct mmap *));
+		thread_data->overwrite_maps = calloc(thread_data->nr_mmaps, sizeof(struct mmap *));
 		if (!thread_data->overwrite_maps) {
 			zfree(&thread_data->maps);
 			return -ENOMEM;
@@ -1220,7 +1220,7 @@ static int record__alloc_thread_data(struct record *rec, struct evlist *evlist)
 	int t, ret;
 	struct record_thread *thread_data;

-	rec->thread_data = zalloc(rec->nr_threads * sizeof(*(rec->thread_data)));
+	rec->thread_data = calloc(rec->nr_threads, sizeof(*(rec->thread_data)));
 	if (!rec->thread_data) {
 		pr_err("Failed to allocate thread data\n");
 		return -ENOMEM;
@@ -1640,7 +1640,7 @@ static int record__mmap_read_evlist(struct record *rec, struct evlist *evlist,
 	int rc = 0;
 	int nr_mmaps;
 	struct mmap **maps;
-	int trace_fd = rec->data.file.fd;
+	int trace_fd = perf_data__fd(&rec->data);
 	off_t off = 0;

 	if (!evlist)
@@ -1845,10 +1845,12 @@ record__finish_output(struct record *rec)
 	}

 	rec->session->header.data_size += rec->bytes_written;
-	data->file.size = lseek(perf_data__fd(data), 0, SEEK_CUR);
+	data->file.size = perf_data__seek(data, 0, SEEK_CUR);
 	if (record__threads_enabled(rec)) {
-		for (i = 0; i < data->dir.nr; i++)
-			data->dir.files[i].size = lseek(data->dir.files[i].fd, 0, SEEK_CUR);
+		for (i = 0; i < data->dir.nr; i++) {
+			data->dir.files[i].size =
+				perf_data_file__seek(&data->dir.files[i], 0, SEEK_CUR);
+		}
 	}

 	/* Buildid scanning disabled or build ID in kernel and synthesized map events. */
@@ -2976,65 +2978,32 @@ out_delete_session:
 	return status;
 }

-static void callchain_debug(struct callchain_param *callchain)
-{
-	static const char *str[CALLCHAIN_MAX] = { "NONE", "FP", "DWARF", "LBR" };
-
-	pr_debug("callchain: type %s\n", str[callchain->record_mode]);
-
-	if (callchain->record_mode == CALLCHAIN_DWARF)
-		pr_debug("callchain: stack dump size %d\n",
-			 callchain->dump_size);
-}
-
-int record_opts__parse_callchain(struct record_opts *record,
-				 struct callchain_param *callchain,
-				 const char *arg, bool unset)
-{
-	int ret;
-	callchain->enabled = !unset;
-
-	/* --no-call-graph */
-	if (unset) {
-		callchain->record_mode = CALLCHAIN_NONE;
-		pr_debug("callchain: disabled\n");
-		return 0;
-	}
-
-	ret = parse_callchain_record_opt(arg, callchain);
-	if (!ret) {
-		/* Enable data address sampling for DWARF unwind. */
-		if (callchain->record_mode == CALLCHAIN_DWARF &&
-		    !record->record_data_mmap_set)
-			record->record_data_mmap = true;
-		callchain_debug(callchain);
-	}
-
-	return ret;
-}
-
-int record_parse_callchain_opt(const struct option *opt,
+static int record_parse_callchain_opt(const struct option *opt,
 			       const char *arg,
 			       int unset)
 {
 	return record_opts__parse_callchain(opt->value, &callchain_param, arg, unset);
 }

-int record_callchain_opt(const struct option *opt,
-			 const char *arg __maybe_unused,
-			 int unset __maybe_unused)
+static int record_callchain_opt(const struct option *opt,
+				const char *arg __maybe_unused,
+				int unset)
 {
-	struct callchain_param *callchain = opt->value;
+	/*
+	 * The -g option only sets the callchain mode if not already configured
+	 * by .perfconfig. It does, however, enable callchains.
+	 */
+	if (callchain_param.record_mode != CALLCHAIN_NONE) {
+		callchain_param.enabled = true;
+		return 0;
+	}

-	callchain->enabled = true;
-
-	if (callchain->record_mode == CALLCHAIN_NONE)
-		callchain->record_mode = CALLCHAIN_FP;
-
-	callchain_debug(callchain);
-	return 0;
+	return record_opts__parse_callchain(opt->value, &callchain_param,
+					    EM_HOST != EM_S390 ? "fp" : "dwarf",
+					    unset);
 }

+
 static int perf_record_config(const char *var, const char *value, void *cb)
 {
 	struct record *rec = cb;
@@ -3526,7 +3495,7 @@ static struct option __record_options[] = {
 	OPT_CALLBACK(0, "mmap-flush", &record.opts, "number",
 		     "Minimal number of bytes that is extracted from mmap data pages (default: 1)",
 		     record__mmap_flush_parse),
-	OPT_CALLBACK_NOOPT('g', NULL, &callchain_param,
+	OPT_CALLBACK_NOOPT('g', NULL, &record.opts,
 			   NULL, "enables call-graph recording" ,
 			   &record_callchain_opt),
 	OPT_CALLBACK(0, "call-graph", &record.opts,
@@ -3696,7 +3665,7 @@ struct option *record_options = __record_options;
 static int record__mmap_cpu_mask_init(struct mmap_cpu_mask *mask, struct perf_cpu_map *cpus)
 {
 	struct perf_cpu cpu;
-	int idx;
+	unsigned int idx;

 	if (cpu_map__is_dummy(cpus))
 		return 0;
@@ -3743,7 +3712,7 @@ static int record__alloc_thread_masks(struct record *rec, int nr_threads, int nr
 {
 	int t, ret;

-	rec->thread_masks = zalloc(nr_threads * sizeof(*(rec->thread_masks)));
+	rec->thread_masks = calloc(nr_threads, sizeof(*(rec->thread_masks)));
 	if (!rec->thread_masks) {
 		pr_err("Failed to allocate thread masks\n");
 		return -ENOMEM;
@@ -3953,7 +3922,7 @@ static int record__init_thread_numa_masks(struct record *rec, struct perf_cpu_ma
 		return -ENOMEM;
 	}

-	spec = zalloc(topo->nr * sizeof(char *));
+	spec = calloc(topo->nr, sizeof(char *));
 	if (!spec) {
 		pr_err("Failed to allocate NUMA spec\n");
 		ret = -ENOMEM;
@@ -4131,8 +4100,11 @@ int cmd_record(int argc, const char **argv)

 	perf_debuginfod_setup(&record.debuginfod);

-	/* Make system wide (-a) the default target. */
-	if (!argc && target__none(&rec->opts.target))
+	/*
+	 * Use system wide (-a) for the default target (i.e. when no
+	 * workload is given). User ID filtering also implies system-wide.
+	 */
+	if ((!argc && target__none(&rec->opts.target)) || rec->uid_str)
 		rec->opts.target.system_wide = true;

 	if (nr_cgroups && !rec->opts.target.system_wide) {
@@ -4310,7 +4282,8 @@ int cmd_record(int argc, const char **argv)
 		record.opts.tail_synthesize = true;

 	if (rec->evlist->core.nr_entries == 0) {
-		struct evlist *def_evlist = evlist__new_default();
+		struct evlist *def_evlist = evlist__new_default(&rec->opts.target,
+								callchain_param.enabled);

 		if (!def_evlist)
 			goto out;
@@ -4339,9 +4312,6 @@ int cmd_record(int argc, const char **argv)
 		err = parse_uid_filter(rec->evlist, uid);
 		if (err)
 			goto out;
-
-		/* User ID filtering implies system wide. */
-		rec->opts.target.system_wide = true;
 	}

 	/* Enable ignoring missing threads when -p option is defined. */
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -245,25 +245,20 @@ static int process_feature_event(const struct perf_tool *tool,
 				 union perf_event *event)
 {
 	struct report *rep = container_of(tool, struct report, tool);
+	int ret = perf_event__process_feature(tool, session, event);

-	if (event->feat.feat_id < HEADER_LAST_FEATURE)
-		return perf_event__process_feature(session, event);
+	if (ret == 0 && event->header.size == sizeof(struct perf_record_header_feature) &&
+	    (int)event->feat.feat_id >= session->header.last_feat) {
+		/*
+		 * (feat_id = HEADER_LAST_FEATURE) is the end marker which means
+		 * all features are received.
+		 */
+		if (rep->header_only)
+			session_done = 1;

-	if (event->feat.feat_id != HEADER_LAST_FEATURE) {
-		pr_err("failed: wrong feature ID: %" PRI_lu64 "\n",
-		       event->feat.feat_id);
-		return -1;
-	} else if (rep->header_only) {
-		session_done = 1;
+		setup_forced_leader(rep, session->evlist);
 	}
-
-	/*
-	 * (feat_id = HEADER_LAST_FEATURE) is the end marker which
-	 * means all features are received, now we can force the
-	 * group if needed.
-	 */
-	setup_forced_leader(rep, session->evlist);
-	return 0;
+	return ret;
 }

 static int process_sample_event(const struct perf_tool *tool,
@@ -1416,8 +1411,7 @@ int cmd_report(int argc, const char **argv)
 		   "columns '.' is reserved."),
 	OPT_BOOLEAN('U', "hide-unresolved", &symbol_conf.hide_unresolved,
 		    "Only display entries resolved to a symbol"),
-	OPT_CALLBACK(0, "symfs", NULL, "directory",
-		     "Look for files with symbols relative to this directory",
+	OPT_CALLBACK(0, "symfs", NULL, "directory[,layout]", SYMFS_HELP,
 		     symbol__config_symfs),
 	OPT_STRING('C', "cpu", &report.cpu_list, "cpu",
 		   "list of cpus to profile"),
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -2405,7 +2405,7 @@ static int init_idle_threads(int ncpu)
 {
 	int i, ret;

-	idle_threads = zalloc(ncpu * sizeof(struct thread *));
+	idle_threads = calloc(ncpu, sizeof(struct thread *));
 	if (!idle_threads)
 		return -ENOMEM;

@@ -3483,7 +3483,7 @@ static int setup_cpus_switch_event(struct perf_sched *sched)
 	if (!sched->cpu_last_switched)
 		return -1;

-	sched->curr_pid = malloc(MAX_CPUS * sizeof(*(sched->curr_pid)));
+	sched->curr_pid = calloc(MAX_CPUS, sizeof(*(sched->curr_pid)));
 	if (!sched->curr_pid) {
 		zfree(&sched->cpu_last_switched);
 		return -1;
@@ -3559,7 +3559,7 @@ static int setup_map_cpus(struct perf_sched *sched)
 	sched->max_cpu.cpu  = sysconf(_SC_NPROCESSORS_CONF);

 	if (sched->map.comp) {
-		sched->map.comp_cpus = zalloc(sched->max_cpu.cpu * sizeof(int));
+		sched->map.comp_cpus = calloc(sched->max_cpu.cpu, sizeof(int));
 		if (!sched->map.comp_cpus)
 			return -1;
 	}
@@ -4879,8 +4879,8 @@ int cmd_sched(int argc, const char **argv)
 		    "Display call chains if present (default on)"),
 	OPT_UINTEGER(0, "max-stack", &sched.max_stack,
 		   "Maximum number of functions to display backtrace."),
-	OPT_STRING(0, "symfs", &symbol_conf.symfs, "directory",
-		    "Look for files with symbols relative to this directory"),
+	OPT_CALLBACK(0, "symfs", NULL, "directory[,layout]", SYMFS_HELP,
+		     symbol__config_symfs),
 	OPT_BOOLEAN('s', "summary", &sched.summary_only,
 		    "Show only syscall summary with statistics"),
 	OPT_BOOLEAN('S', "with-summary", &sched.summary,
@@ -4955,6 +4955,7 @@ int cmd_sched(int argc, const char **argv)
 		.switch_event	    = replay_switch_event,
 		.fork_event	    = replay_fork_event,
 	};
+	struct trace_sched_handler stats_ops  = {};
 	int ret;

 	perf_tool__init(&sched.tool, /*ordered_events=*/true);
@@ -5037,6 +5038,7 @@ int cmd_sched(int argc, const char **argv)
 	} else if (!strcmp(argv[0], "stats")) {
 		const char *const stats_subcommands[] = {"record", "report", NULL};

+		sched.tp_handler = &stats_ops;
 		argc = parse_options_subcommand(argc, argv, stats_options,
 						stats_subcommands,
 						stats_usage,
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -166,7 +166,7 @@ struct perf_script {
 	int			range_num;
 };

-struct output_option {
+static struct output_option {
 	const char *str;
 	enum perf_output_field field;
 } all_output_options[] = {
@@ -1271,11 +1271,11 @@ static int ip__fprintf_jump(uint64_t ip, struct branch_entry *en,

 	if (PRINT_FIELD(BRCNTR)) {
 		struct evsel *pos = evsel__leader(evsel);
-		unsigned int i = 0, j, num, mask, width;
+		unsigned int i = 0, j, num, mask, width, numprinted = 0;

 		perf_env__find_br_cntr_info(evsel__env(evsel), NULL, &width);
 		mask = (1L << width) - 1;
-		printed += fprintf(fp, "br_cntr: ");
+		printed += fprintf(fp, "\t# br_cntr: ");
 		evlist__for_each_entry_from(evsel->evlist, pos) {
 			if (!(pos->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_COUNTERS))
 				continue;
@@ -1283,16 +1283,20 @@ static int ip__fprintf_jump(uint64_t ip, struct branch_entry *en,
 				break;

 			num = (br_cntr >> (i++ * width)) & mask;
+			numprinted += num;
 			if (!verbose) {
 				for (j = 0; j < num; j++)
 					printed += fprintf(fp, "%s", pos->abbr_name);
 			} else
 				printed += fprintf(fp, "%s %d ", pos->name, num);
 		}
-		printed += fprintf(fp, "\t");
+		if (numprinted == 0 && !verbose)
+			printed += fprintf(fp, "-");
+		printed += fprintf(fp, " ");
 	}

-	printed += fprintf(fp, "#%s%s%s%s",
+	printed += fprintf(fp, "%s%s%s%s%s",
+			      !PRINT_FIELD(BRCNTR) ? "#" : "",
 			      en->flags.predicted ? " PRED" : "",
 			      en->flags.mispred ? " MISPRED" : "",
 			      en->flags.in_tx ? " INTX" : "",
@@ -2568,7 +2572,6 @@ static struct scripting_ops	*scripting_ops;
 static void __process_stat(struct evsel *counter, u64 tstamp)
 {
 	int nthreads = perf_thread_map__nr(counter->core.threads);
-	int idx, thread;
 	struct perf_cpu cpu;
 	static int header_printed;

@@ -2578,7 +2581,9 @@ static void __process_stat(struct evsel *counter, u64 tstamp)
 		header_printed = 1;
 	}

-	for (thread = 0; thread < nthreads; thread++) {
+	for (int thread = 0; thread < nthreads; thread++) {
+		unsigned int idx;
+
 		perf_cpu_map__for_each_cpu(cpu, idx, evsel__cpus(counter)) {
 			struct perf_counts_values *counts;

@@ -2905,8 +2910,12 @@ static int print_event_with_time(const struct perf_tool *tool,
 		thread = machine__findnew_thread(machine, pid, tid);

 	if (evsel) {
+		struct evsel *saved_evsel = sample->evsel;
+
+		sample->evsel = evsel;
 		perf_sample__fprintf_start(script, sample, thread, evsel,
 					   event->header.type, stdout);
+		sample->evsel = saved_evsel;
 	}

 	perf_event__fprintf(event, machine, stdout);
@@ -3814,7 +3823,7 @@ out:

 static int have_cmd(int argc, const char **argv)
 {
-	char **__argv = malloc(sizeof(const char *) * argc);
+	char **__argv = calloc(argc, sizeof(const char *));

 	if (!__argv) {
 		pr_err("malloc failed\n");
@@ -3939,15 +3948,6 @@ int process_cpu_map_event(const struct perf_tool *tool,
 	return set_maps(script);
 }

-static int process_feature_event(const struct perf_tool *tool __maybe_unused,
-				 struct perf_session *session,
-				 union perf_event *event)
-{
-	if (event->feat.feat_id < HEADER_LAST_FEATURE)
-		return perf_event__process_feature(session, event);
-	return 0;
-}
-
 static int perf_script__process_auxtrace_info(const struct perf_tool *tool,
 					      struct perf_session *session,
 					      union perf_event *event)
@@ -4074,8 +4074,7 @@ int cmd_script(int argc, const char **argv)
 		   "file", "kallsyms pathname"),
 	OPT_BOOLEAN('G', "hide-call-graph", &no_callchain,
 		    "When printing symbols do not display call chain"),
-	OPT_CALLBACK(0, "symfs", NULL, "directory",
-		     "Look for files with symbols relative to this directory",
+	OPT_CALLBACK(0, "symfs", NULL, "directory[,layout]", SYMFS_HELP,
 		     symbol__config_symfs),
 	OPT_CALLBACK('F', "fields", NULL, "str",
 		     "comma separated output fields prepend with 'type:'. "
@@ -4313,7 +4312,7 @@ int cmd_script(int argc, const char **argv)
 				}
 			}

-			__argv = malloc((argc + 6) * sizeof(const char *));
+			__argv = calloc(argc + 6, sizeof(const char *));
 			if (!__argv) {
 				pr_err("malloc failed\n");
 				err = -ENOMEM;
@@ -4339,7 +4338,7 @@ int cmd_script(int argc, const char **argv)
 		dup2(live_pipe[0], 0);
 		close(live_pipe[1]);

-		__argv = malloc((argc + 4) * sizeof(const char *));
+		__argv = calloc(argc + 4, sizeof(const char *));
 		if (!__argv) {
 			pr_err("malloc failed\n");
 			err = -ENOMEM;
@@ -4377,7 +4376,7 @@ script_found:
 			}
 		}

-		__argv = malloc((argc + 2) * sizeof(const char *));
+		__argv = calloc(argc + 2, sizeof(const char *));
 		if (!__argv) {
 			pr_err("malloc failed\n");
 			err = -ENOMEM;
@@ -4423,7 +4422,7 @@ script_found:
 #ifdef HAVE_LIBTRACEEVENT
 	script.tool.tracing_data	 = perf_event__process_tracing_data;
 #endif
-	script.tool.feature		 = process_feature_event;
+	script.tool.feature		 = perf_event__process_feature;
 	script.tool.build_id		 = perf_event__process_build_id;
 	script.tool.id_index		 = perf_event__process_id_index;
 	script.tool.auxtrace_info	 = perf_script__process_auxtrace_info;
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -164,7 +164,7 @@ struct opt_aggr_mode {
 };

 /* Turn command line option into most generic aggregation mode setting. */
-static enum aggr_mode opt_aggr_mode_to_aggr_mode(struct opt_aggr_mode *opt_mode)
+static enum aggr_mode opt_aggr_mode_to_aggr_mode(const struct opt_aggr_mode *opt_mode)
 {
 	enum aggr_mode mode = AGGR_GLOBAL;

@@ -410,7 +410,7 @@ static int read_tool_counters(void)
 	struct evsel *counter;

 	evlist__for_each_entry(evsel_list, counter) {
-		int idx;
+		unsigned int idx;

 		if (!evsel__is_tool(counter))
 			continue;
@@ -1214,13 +1214,28 @@ static int parse_cputype(const struct option *opt,
 	return 0;
 }

+static int parse_pmu_filter(const struct option *opt,
+			   const char *str,
+			   int unset __maybe_unused)
+{
+	struct evlist *evlist = *(struct evlist **)opt->value;
+
+	if (!list_empty(&evlist->core.entries)) {
+		fprintf(stderr, "Must define pmu-filter before events/metrics\n");
+		return -1;
+	}
+
+	parse_events_option_args.pmu_filter = str;
+	return 0;
+}
+
 static int parse_cache_level(const struct option *opt,
 			     const char *str,
 			     int unset __maybe_unused)
 {
 	int level;
-	struct opt_aggr_mode *opt_aggr_mode = (struct opt_aggr_mode *)opt->value;
-	u32 *aggr_level = (u32 *)opt->data;
+	bool *per_cache = opt->value;
+	u32 *aggr_level = opt->data;

 	/*
 	 * If no string is specified, aggregate based on the topology of
@@ -1258,7 +1273,7 @@ static int parse_cache_level(const struct option *opt,
 		return -EINVAL;
 	}
 out:
-	opt_aggr_mode->cache = true;
+	*per_cache = true;
 	*aggr_level = level;
 	return 0;
 }
@@ -1917,25 +1932,33 @@ static int default_evlist_evsel_cmp(void *priv __maybe_unused,
 	const struct evsel *lhs = container_of(lhs_core, struct evsel, core);
 	const struct perf_evsel *rhs_core = container_of(r, struct perf_evsel, node);
 	const struct evsel *rhs = container_of(rhs_core, struct evsel, core);
+	const struct evsel *lhs_leader = evsel__leader(lhs);
+	const struct evsel *rhs_leader = evsel__leader(rhs);

-	if (evsel__leader(lhs) == evsel__leader(rhs)) {
+	if (lhs_leader == rhs_leader) {
 		/* Within the same group, respect the original order. */
 		return lhs_core->idx - rhs_core->idx;
 	}

-	/* Sort default metrics evsels first, and default show events before those. */
-	if (lhs->default_metricgroup != rhs->default_metricgroup)
-		return lhs->default_metricgroup ? -1 : 1;
+	/*
+	 * Compare using leader's attributes so that all members of a group
+	 * stay together. This ensures leaders are opened before their members.
+	 */

-	if (lhs->default_show_events != rhs->default_show_events)
-		return lhs->default_show_events ? -1 : 1;
+	/* Sort default metrics evsels first, and default show events before those. */
+	if (lhs_leader->default_metricgroup != rhs_leader->default_metricgroup)
+		return lhs_leader->default_metricgroup ? -1 : 1;
+
+	if (lhs_leader->default_show_events != rhs_leader->default_show_events)
+		return lhs_leader->default_show_events ? -1 : 1;

 	/* Sort by PMU type (prefers legacy types first). */
-	if (lhs->pmu != rhs->pmu)
-		return lhs->pmu->type - rhs->pmu->type;
+	if (lhs_leader->pmu != rhs_leader->pmu)
+		return lhs_leader->pmu->type - rhs_leader->pmu->type;

-	/* Sort by name. */
-	return strcmp(evsel__name((struct evsel *)lhs), evsel__name((struct evsel *)rhs));
+	/* Sort by leader's name. */
+	return strcmp(evsel__name((struct evsel *)lhs_leader),
+		      evsel__name((struct evsel *)rhs_leader));
 }

 /*
@@ -2305,24 +2328,23 @@ static struct perf_stat perf_stat = {
 static int __cmd_report(int argc, const char **argv)
 {
 	struct perf_session *session;
+	struct opt_aggr_mode opt_mode = {};
 	const struct option options[] = {
 	OPT_STRING('i', "input", &input_name, "file", "input file name"),
-	OPT_SET_UINT(0, "per-socket", &perf_stat.aggr_mode,
-		     "aggregate counts per processor socket", AGGR_SOCKET),
-	OPT_SET_UINT(0, "per-die", &perf_stat.aggr_mode,
-		     "aggregate counts per processor die", AGGR_DIE),
-	OPT_SET_UINT(0, "per-cluster", &perf_stat.aggr_mode,
-		     "aggregate counts perf processor cluster", AGGR_CLUSTER),
-	OPT_CALLBACK_OPTARG(0, "per-cache", &perf_stat.aggr_mode, &perf_stat.aggr_level,
-			    "cache level",
-			    "aggregate count at this cache level (Default: LLC)",
+	OPT_BOOLEAN(0, "per-thread", &opt_mode.thread, "aggregate counts per thread"),
+	OPT_BOOLEAN(0, "per-socket", &opt_mode.socket,
+		    "aggregate counts per processor socket"),
+	OPT_BOOLEAN(0, "per-die", &opt_mode.die, "aggregate counts per processor die"),
+	OPT_BOOLEAN(0, "per-cluster", &opt_mode.cluster,
+		    "aggregate counts per processor cluster"),
+	OPT_CALLBACK_OPTARG(0, "per-cache", &opt_mode.cache, &perf_stat.aggr_level,
+			    "cache level", "aggregate count at this cache level (Default: LLC)",
 			    parse_cache_level),
-	OPT_SET_UINT(0, "per-core", &perf_stat.aggr_mode,
-		     "aggregate counts per physical processor core", AGGR_CORE),
-	OPT_SET_UINT(0, "per-node", &perf_stat.aggr_mode,
-		     "aggregate counts per numa node", AGGR_NODE),
-	OPT_SET_UINT('A', "no-aggr", &perf_stat.aggr_mode,
-		     "disable CPU count aggregation", AGGR_NONE),
+	OPT_BOOLEAN(0, "per-core", &opt_mode.core,
+		    "aggregate counts per physical processor core"),
+	OPT_BOOLEAN(0, "per-node", &opt_mode.node, "aggregate counts per numa node"),
+	OPT_BOOLEAN('A', "no-aggr", &opt_mode.no_aggr,
+		    "disable aggregation across CPUs or PMUs"),
 	OPT_END()
 	};
 	struct stat st;
@@ -2330,6 +2352,10 @@ static int __cmd_report(int argc, const char **argv)

 	argc = parse_options(argc, argv, options, stat_report_usage, 0);

+	perf_stat.aggr_mode = opt_aggr_mode_to_aggr_mode(&opt_mode);
+	if (perf_stat.aggr_mode == AGGR_GLOBAL)
+		perf_stat.aggr_mode = AGGR_UNSET; /* No option found so leave unset. */
+
 	if (!input_name || !strlen(input_name)) {
 		if (!fstat(STDIN_FILENO, &st) && S_ISFIFO(st.st_mode))
 			input_name = "-";
@@ -2506,7 +2532,7 @@ int cmd_stat(int argc, const char **argv)
 		OPT_BOOLEAN(0, "per-die", &opt_mode.die, "aggregate counts per processor die"),
 		OPT_BOOLEAN(0, "per-cluster", &opt_mode.cluster,
 			"aggregate counts per processor cluster"),
-		OPT_CALLBACK_OPTARG(0, "per-cache", &opt_mode, &stat_config.aggr_level,
+		OPT_CALLBACK_OPTARG(0, "per-cache", &opt_mode.cache, &stat_config.aggr_level,
 				"cache level", "aggregate count at this cache level (Default: LLC)",
 				parse_cache_level),
 		OPT_BOOLEAN(0, "per-core", &opt_mode.core,
@@ -2561,6 +2587,10 @@ int cmd_stat(int argc, const char **argv)
 			"Only enable events on applying cpu with this type "
 			"for hybrid platform (e.g. core or atom)",
 			parse_cputype),
+		OPT_CALLBACK(0, "pmu-filter", &evsel_list, "pmu",
+			"Only enable events on the specified PMU; useful with "
+			"multiple PMUs of the same type (e.g. hisi_sicl2_cpa0 or hisi_sicl0_cpa0)",
+			parse_pmu_filter),
 #ifdef HAVE_LIBPFM
 		OPT_CALLBACK(0, "pfm-events", &evsel_list, "event",
 			"libpfm4 event selector. use 'perf list' to list available events",
@@ -2744,7 +2774,7 @@ int cmd_stat(int argc, const char **argv)
 	}

 	if (stat_config.walltime_run_table) {
-		stat_config.walltime_run = zalloc(stat_config.run_count * sizeof(stat_config.walltime_run[0]));
+		stat_config.walltime_run = calloc(stat_config.run_count, sizeof(stat_config.walltime_run[0]));
 		if (!stat_config.walltime_run) {
 			pr_err("failed to setup -r option");
 			goto out;
--- a/tools/perf/builtin-timechart.c
+++ b/tools/perf/builtin-timechart.c
@@ -1951,8 +1951,7 @@ int cmd_timechart(int argc, const char **argv)
 	OPT_CALLBACK('p', "process", NULL, "process",
 		      "process selector. Pass a pid or process name.",
 		       parse_process),
-	OPT_CALLBACK(0, "symfs", NULL, "directory",
-		     "Look for files with symbols relative to this directory",
+	OPT_CALLBACK(0, "symfs", NULL, "directory[,layout]", SYMFS_HELP,
 		     symbol__config_symfs),
 	OPT_INTEGER('n', "proc-num", &tchart.proc_num,
 		    "min. number of tasks to print"),
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -56,6 +56,7 @@
 #include "util/debug.h"
 #include "util/ordered-events.h"
 #include "util/pfm.h"
+#include "dwarf-regs.h"

 #include <assert.h>
 #include <elf.h>
@@ -1386,13 +1387,6 @@ out_join_thread:
 	return ret;
 }

-static int
-callchain_opt(const struct option *opt, const char *arg, int unset)
-{
-	symbol_conf.use_callchain = true;
-	return record_callchain_opt(opt, arg, unset);
-}
-
 static int
 parse_callchain_opt(const struct option *opt, const char *arg, int unset)
 {
@@ -1413,6 +1407,24 @@ parse_callchain_opt(const struct option *opt, const char *arg, int unset)
 	return parse_callchain_top_opt(arg);
 }

+static int
+callchain_opt(const struct option *opt, const char *arg __maybe_unused, int unset)
+{
+	struct callchain_param *callchain = opt->value;
+
+	/*
+	 * The -g option only sets the callchain mode if not already configured
+	 * by .perfconfig. It does, however, enable callchains.
+	 */
+	if (callchain->record_mode != CALLCHAIN_NONE) {
+		callchain->enabled = true;
+		return 0;
+	}
+
+	return parse_callchain_opt(opt, EM_HOST != EM_S390 ? "fp" : "dwarf", unset);
+}
+
+
 static int perf_top_config(const char *var, const char *value, void *cb __maybe_unused)
 {
 	if (!strcmp(var, "top.call-graph")) {
@ -1437,11 +1449,10 @@ parse_percent_limit(const struct option *opt, const char *arg,
 	return 0;
 }

-const char top_callchain_help[] = CALLCHAIN_RECORD_HELP CALLCHAIN_REPORT_HELP
-	"\n\t\t\t\tDefault: fp,graph,0.5,caller,function";
-
 int cmd_top(int argc, const char **argv)
 {
+	static const char top_callchain_help[] = CALLCHAIN_RECORD_HELP CALLCHAIN_REPORT_HELP
+		"\n\t\t\t\tDefault: fp,graph,0.5,caller,function";
 	char errbuf[BUFSIZ];
 	struct perf_top top = {
 		.count_filter	     = 5,
@ -1694,8 +1705,17 @@ int cmd_top(int argc, const char **argv)
 	if (annotate_check_args() < 0)
 		goto out_delete_evlist;

+	status = target__validate(target);
+	if (status) {
+		target__strerror(target, status, errbuf, BUFSIZ);
+		ui__warning("%s\n", errbuf);
+	}
+
+	if (target__none(target))
+		target->system_wide = true;
+
 	if (!top.evlist->core.nr_entries) {
-		struct evlist *def_evlist = evlist__new_default();
+		struct evlist *def_evlist = evlist__new_default(target, callchain_param.enabled);

 		if (!def_evlist)
 			goto out_delete_evlist;
@ -1788,12 +1808,6 @@ int cmd_top(int argc, const char **argv)
 		goto out_delete_evlist;
 	}

-	status = target__validate(target);
-	if (status) {
-		target__strerror(target, status, errbuf, BUFSIZ);
-		ui__warning("%s\n", errbuf);
-	}
-
 	if (top.uid_str) {
 		uid_t uid = parse_uid(top.uid_str);

@ -1807,9 +1821,6 @@ int cmd_top(int argc, const char **argv)
 			goto out_delete_evlist;
 	}

-	if (target__none(target))
-		target->system_wide = true;
-
 	if (evlist__create_maps(top.evlist, target) < 0) {
 		ui__error("Couldn't create thread/CPU maps: %s\n",
 			  errno == ENOENT ? "No such process" : str_error_r(errno, errbuf, sizeof(errbuf)));
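Note on the callchain_opt() change in builtin-top.c above: a plain -g now keeps a record mode that was already configured (for example from .perfconfig) and only enables it. If no mode is configured, it falls back to "dwarf" on s390 hosts and "fp" elsewhere. The sketch below models that decision in isolation. The names callchain_cfg, parse_mode and handle_dash_g are hypothetical stand-ins, not perf's actual types or functions.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, reduced stand-ins for perf's callchain_param fields. */
enum record_mode { MODE_NONE, MODE_FP, MODE_DWARF };

struct callchain_cfg {
	enum record_mode record_mode;
	bool enabled;
};

/* Stand-in for the real option parser: accepts only "fp" or "dwarf". */
static int parse_mode(struct callchain_cfg *cfg, const char *arg)
{
	if (!strcmp(arg, "fp"))
		cfg->record_mode = MODE_FP;
	else if (!strcmp(arg, "dwarf"))
		cfg->record_mode = MODE_DWARF;
	else
		return -1;

	cfg->enabled = true;
	return 0;
}

/*
 * Plain -g: keep a mode that was already configured and just enable it,
 * otherwise fall back to "dwarf" on s390 hosts and "fp" elsewhere.
 */
static int handle_dash_g(struct callchain_cfg *cfg, bool host_is_s390)
{
	if (cfg->record_mode != MODE_NONE) {
		cfg->enabled = true;
		return 0;
	}

	return parse_mode(cfg, host_is_s390 ? "dwarf" : "fp");
}
```

Calling handle_dash_g() on a config whose record_mode is already MODE_DWARF leaves the mode untouched and only sets enabled, which mirrors the early return in the patch.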
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@ -21,7 +21,6 @@
 #include <bpf/libbpf.h>
 #include <bpf/btf.h>
 #endif
-#include "util/bpf_map.h"
 #include "util/rlimit.h"
 #include "builtin.h"
 #include "util/cgroup.h"
@ -1565,7 +1564,9 @@ static bool syscall_id_equal(long key1, long key2, void *ctx __maybe_unused)

 static struct hashmap *alloc_syscall_stats(void)
 {
-	return hashmap__new(syscall_id_hash, syscall_id_equal, NULL);
+	struct hashmap *result = hashmap__new(syscall_id_hash, syscall_id_equal, NULL);
+
+	return IS_ERR(result) ? NULL : result;
 }

 static void delete_syscall_stats(struct hashmap *syscall_stats)
@ -1573,7 +1574,7 @@ static void delete_syscall_stats(struct hashmap *syscall_stats)
 	struct hashmap_entry *pos;
 	size_t bkt;

-	if (syscall_stats == NULL)
+	if (!syscall_stats)
 		return;

 	hashmap__for_each_entry(syscall_stats, pos, bkt)
@ -1589,7 +1590,7 @@ static struct thread_trace *thread_trace__new(struct trace *trace)
 		ttrace->files.max = -1;
 		if (trace->summary) {
 			ttrace->syscall_stats = alloc_syscall_stats();
-			if (IS_ERR(ttrace->syscall_stats))
+			if (!ttrace->syscall_stats)
 				zfree(&ttrace);
 		}
 	}
@ -2003,9 +2004,13 @@ static int trace__symbols_init(struct trace *trace, int argc, const char **argv,
 	if (err < 0)
 		goto out;

+	if (trace->summary_only && trace->summary_mode != SUMMARY__BY_THREAD)
+		goto out;
+
 	err = __machine__synthesize_threads(trace->host, &trace->tool, &trace->opts.target,
 					    evlist->core.threads, trace__tool_process,
-					    /*needs_mmap=*/callchain_param.enabled,
+					    /*needs_mmap=*/callchain_param.enabled &&
+							   !trace->summary_only,
 					    /*mmap_data=*/false,
 					    /*nr_threads_synthesize=*/1);
 out:
@ -2264,9 +2269,7 @@ static int trace__validate_ev_qualifier(struct trace *trace)
 	struct str_node *pos;
 	size_t nr_used = 0, nr_allocated = strlist__nr_entries(trace->ev_qualifier);

-	trace->ev_qualifier_ids.entries = malloc(nr_allocated *
-						 sizeof(trace->ev_qualifier_ids.entries[0]));
-
+	trace->ev_qualifier_ids.entries = calloc(nr_allocated, sizeof(trace->ev_qualifier_ids.entries[0]));
 	if (trace->ev_qualifier_ids.entries == NULL) {
 		fputs("Error:\tNot enough memory for allocating events qualifier ids\n",
 		       trace->output);
@ -2955,7 +2958,7 @@ static int trace__sys_exit(struct trace *trace, struct evsel *evsel,
 		++trace->stats.vfs_getname;
 	}

-	if (ttrace->entry_time) {
+	if (ttrace->entry_time && sample->time >= ttrace->entry_time) {
 		duration = sample->time - ttrace->entry_time;
 		if (trace__filter_duration(trace, duration))
 			goto out;
@ -4464,7 +4467,7 @@ create_maps:

 	if (trace->summary_mode == SUMMARY__BY_TOTAL && !trace->summary_bpf) {
 		trace->syscall_stats = alloc_syscall_stats();
-		if (IS_ERR(trace->syscall_stats))
+		if (!trace->syscall_stats)
 			goto out_delete_evlist;
 	}

@ -4771,7 +4774,7 @@ static int trace__replay(struct trace *trace)

 	if (trace->summary_mode == SUMMARY__BY_TOTAL) {
 		trace->syscall_stats = alloc_syscall_stats();
-		if (IS_ERR(trace->syscall_stats))
+		if (!trace->syscall_stats)
 			goto out;
 	}

@ -5299,6 +5302,13 @@ static int trace__parse_summary_mode(const struct option *opt, const char *str,
 	return 0;
 }

+static int trace_parse_callchain_opt(const struct option *opt,
+				     const char *arg,
+				     int unset)
+{
+	return record_opts__parse_callchain(opt->value, &callchain_param, arg, unset);
+}
+
 static int trace__config(const char *var, const char *value, void *arg)
 {
 	struct trace *trace = arg;
@ -5446,7 +5456,7 @@ int cmd_trace(int argc, const char **argv)
 	OPT_BOOLEAN('f', "force", &trace.force, "don't complain, do it"),
 	OPT_CALLBACK(0, "call-graph", &trace.opts,
 		     "record_mode[,record_size]", record_callchain_help,
-		     &record_parse_callchain_opt),
+		     &trace_parse_callchain_opt),
 	OPT_BOOLEAN(0, "libtraceevent_print", &trace.libtraceevent_print,
 		    "Use libtraceevent to print the tracepoint arguments."),
 	OPT_BOOLEAN(0, "kernel-syscall-graph", &trace.kernel_syscallchains,
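Note on the alloc_syscall_stats() change in builtin-trace.c above: hashmap__new() reports failure as an error pointer, while delete_syscall_stats() only checks for NULL. Converting the error pointer to NULL inside the allocator lets every caller use a plain NULL check. A minimal sketch of the pattern follows. ERR_PTR and IS_ERR are simplified re-implementations of the kernel-style helpers, and struct table, table__new and table__alloc are hypothetical names used only for illustration.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-ins for the kernel-style error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(intptr_t error)
{
	return (void *)error;
}

static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical container type standing in for struct hashmap. */
struct table {
	int nr;
};

/* Hypothetical constructor that reports failure as ERR_PTR(-ENOMEM). */
static struct table *table__new(int simulate_failure)
{
	struct table *t;

	if (simulate_failure)
		return ERR_PTR(-ENOMEM);

	t = calloc(1, sizeof(*t));
	return t ? t : ERR_PTR(-ENOMEM);
}

/* Wrapper in the style of alloc_syscall_stats(): failure becomes NULL. */
static struct table *table__alloc(int simulate_failure)
{
	struct table *t = table__new(simulate_failure);

	return IS_ERR(t) ? NULL : t;
}
```

With this wrapper, a stored pointer is either valid or NULL, so cleanup code never has to tell an error pointer apart from a real object.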
--- a/tools/perf/check-headers.sh
+++ b/tools/perf/check-headers.sh
@ -6,10 +6,7 @@ NC='\033[0m' # No Color

 declare -a FILES=(
  "include/uapi/linux/const.h"
-  "include/uapi/drm/drm.h"
-  "include/uapi/drm/i915_drm.h"
  "include/uapi/linux/bits.h"
-  "include/uapi/linux/fadvise.h"
  "include/uapi/linux/fscrypt.h"
  "include/uapi/linux/genetlink.h"
  "include/uapi/linux/if_addr.h"
@ -90,7 +87,10 @@ declare -a SYNC_CHECK_FILES=(
 declare -a BEAUTY_FILES=(
  "arch/x86/include/asm/irq_vectors.h"
  "arch/x86/include/uapi/asm/prctl.h"
+  "include/uapi/drm/drm.h"
+  "include/uapi/drm/i915_drm.h"
  "include/linux/socket.h"
+  "include/uapi/linux/fadvise.h"
  "include/uapi/linux/fcntl.h"
  "include/uapi/linux/fs.h"
  "include/uapi/linux/mount.h"
--- a/tools/perf/jvmti/libjvmti.c
+++ b/tools/perf/jvmti/libjvmti.c
@ -98,7 +98,7 @@ get_line_numbers(jvmtiEnv *jvmti, const void *compile_info, jvmti_line_info_t **
 	/*
 	 * Phase 2 -- allocate big enough line table
 	 */
-	*tab = malloc(nr_total * sizeof(**tab));
+	*tab = calloc(nr_total, sizeof(**tab));
 	if (!*tab)
 		return JVMTI_ERROR_OUT_OF_MEMORY;

@ -262,11 +262,10 @@ compiled_method_load_cb(jvmtiEnv *jvmti,
 			}
 			nr_lines = 0;
 		} else if (nr_lines > 0) {
-			line_file_names = malloc(sizeof(char*) * nr_lines);
+			line_file_names = calloc(nr_lines, sizeof(char *));
 			if (!line_file_names) {
 				warnx("jvmti: cannot allocate space for line table method names");
 			} else {
-				memset(line_file_names, 0, sizeof(char*) * nr_lines);
 				ret = fill_source_filenames(jvmti, nr_lines, line_tab, line_file_names);
 				if (ret != JVMTI_ERROR_NONE) {
 					warnx("jvmti: fill_source_filenames failed");
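Note on the malloc() to calloc() conversions above (builtin-stat.c, builtin-trace.c, libjvmti.c): calloc(nr, size) returns zeroed memory, which is why the explicit memset() in libjvmti.c can be dropped. It also takes the element count and size separately, so a multiplication that would wrap in malloc(nr * size) can be caught by the allocator. Current C libraries such as glibc are expected to fail such a call rather than return a short buffer. The sketch below illustrates both points. struct line_info, alloc_line_table and mul_overflows are hypothetical names, not part of the patch.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical element type standing in for jvmti_line_info_t. */
struct line_info {
	unsigned long pc;
	int line_number;
};

/* True when nr * size cannot be represented in a size_t. */
static bool mul_overflows(size_t nr, size_t size)
{
	return size != 0 && nr > SIZE_MAX / size;
}

/*
 * Allocate a zero-initialized table of nr entries. Passing the count and
 * the element size separately leaves the overflow check to calloc().
 */
static struct line_info *alloc_line_table(size_t nr)
{
	return calloc(nr, sizeof(struct line_info));
}
```

A caller only needs to check the result for NULL and may rely on every entry starting out as all zero bytes.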
--- a/tools/perf/perf.c
+++ b/tools/perf/perf.c
@ -48,7 +48,7 @@ struct cmd_struct {
 	int option;
 };

-static struct cmd_struct commands[] = {
+static const struct cmd_struct commands[] = {
 	{ "archive",	NULL,	0 },
 	{ "buildid-cache", cmd_buildid_cache, 0 },
 	{ "buildid-list", cmd_buildid_list, 0 },
@ -178,7 +178,7 @@ static int set_debug_file(const char *path)
 	return 0;
 }

-struct option options[] = {
+static const struct option options[] = {
 	OPT_ARGUMENT("help", "help"),
 	OPT_ARGUMENT("version", "version"),
 	OPT_ARGUMENT("exec-path", "exec-path"),
@ -280,7 +280,7 @@ static int handle_options(const char ***argv, int *argc, int *envchanged)
 			unsigned int i;

 			for (i = 0; i < ARRAY_SIZE(commands); i++) {
-				struct cmd_struct *p = commands+i;
+				const struct cmd_struct *p = commands + i;
 				printf("%s ", p->cmd);
 			}
 			putchar('\n');
@ -289,7 +289,7 @@ static int handle_options(const char ***argv, int *argc, int *envchanged)
 			unsigned int i;

 			for (i = 0; i < ARRAY_SIZE(options)-1; i++) {
-				struct option *p = options+i;
+				const struct option *p = options + i;
 				printf("--%s ", p->long_name);
 			}
 			putchar('\n');
@ -331,7 +331,7 @@ static int handle_options(const char ***argv, int *argc, int *envchanged)
 #define RUN_SETUP	(1<<0)
 #define USE_PAGER	(1<<1)

-static int run_builtin(struct cmd_struct *p, int argc, const char **argv)
+static int run_builtin(const struct cmd_struct *p, int argc, const char **argv)
 {
 	int status;
 	struct stat st;
@ -390,7 +390,7 @@ static void handle_internal_command(int argc, const char **argv)
 	}

 	for (i = 0; i < ARRAY_SIZE(commands); i++) {
-		struct cmd_struct *p = commands+i;
+		const struct cmd_struct *p = commands+i;
 		if (p->fn == NULL)
 			continue;
 		if (strcmp(p->cmd, cmd))
--- a/tools/perf/pmu-events/Build
+++ b/tools/perf/pmu-events/Build
@ -211,10 +211,10 @@ ifneq ($(strip $(ORPHAN_FILES)),)

 # Message for $(call echo-cmd,rm). Generally cleaning files isn't part
 # of a build step.
-quiet_cmd_rm  = RM      $^
+quiet_cmd_rm = RM      ...$(words $^) orphan file(s)...

+# The list of files can be long. Use xargs to prevent issues.
 prune_orphans: $(ORPHAN_FILES)
-	# The list of files can be long. Use xargs to prevent issues.
 	$(Q)$(call echo-cmd,rm)echo "$^" | xargs rm -f

 JEVENTS_DEPS += prune_orphans
--- a/tools/perf/pmu-events/arch/arm64/common-and-microarch.json
+++ b/tools/perf/pmu-events/arch/arm64/common-and-microarch.json
@ -1512,11 +1512,26 @@
        "EventName": "L2D_CACHE_REFILL_PRFM",
        "BriefDescription": "Level 2 data cache refill, software preload"
    },
+    {
+        "EventCode": "0x8150",
+        "EventName": "L3D_CACHE_RW",
+        "BriefDescription": "Level 3 data cache demand access."
+    },
+    {
+        "EventCode": "0x8151",
+        "EventName": "L3D_CACHE_PRFM",
+        "BriefDescription": "Level 3 data cache software prefetch"
+    },
    {
        "EventCode": "0x8152",
        "EventName": "L3D_CACHE_MISS",
        "BriefDescription": "Level 3 data cache demand access miss"
    },
+    {
+        "EventCode": "0x8153",
+        "EventName": "L3D_CACHE_REFILL_PRFM",
+        "BriefDescription": "Level 3 data cache refill, software prefetch."
+    },
    {
        "EventCode": "0x8154",
        "EventName": "L1D_CACHE_HWPRF",
@ -1527,6 +1542,11 @@
        "EventName": "L2D_CACHE_HWPRF",
        "BriefDescription": "Level 2 data cache hardware prefetch."
    },
+    {
+        "EventCode": "0x8156",
+        "EventName": "L3D_CACHE_HWPRF",
+        "BriefDescription": "Level 3 data cache hardware prefetch."
+    },
    {
        "EventCode": "0x8158",
        "EventName": "STALL_FRONTEND_MEMBOUND",
@ -1682,6 +1702,11 @@
        "EventName": "L2D_CACHE_REFILL_HWPRF",
        "BriefDescription": "Level 2 data cache refill, hardware prefetch."
    },
+    {
+        "EventCode": "0x81BE",
+        "EventName": "L3D_CACHE_REFILL_HWPRF",
+        "BriefDescription": "Level 3 data cache refill, hardware prefetch."
+    },
    {
        "EventCode": "0x81C0",
        "EventName": "L1I_CACHE_HIT_RD",
@ -1712,11 +1737,31 @@
        "EventName": "L1I_CACHE_HIT_RD_FPRFM",
        "BriefDescription": "Level 1 instruction cache demand fetch first hit, fetched by software preload"
    },
+    {
+        "EventCode": "0x81DC",
+        "EventName": "L1D_CACHE_HIT_RW_FPRFM",
+        "BriefDescription": "Level 1 data cache demand access first hit, fetched by software prefetch."
+    },
    {
        "EventCode": "0x81E0",
        "EventName": "L1I_CACHE_HIT_RD_FHWPRF",
        "BriefDescription": "Level 1 instruction cache demand fetch first hit, fetched by hardware prefetcher"
    },
+    {
+        "EventCode": "0x81EC",
+        "EventName": "L1D_CACHE_HIT_RW_FHWPRF",
+        "BriefDescription": "Level 1 data cache demand access first hit, fetched by hardware prefetcher."
+    },
+    {
+        "EventCode": "0x81F0",
+        "EventName": "L1I_CACHE_HIT_RD_FPRF",
+        "BriefDescription": "Level 1 instruction cache demand fetch first hit, fetched by prefetch."
+    },
+    {
+        "EventCode": "0x81FC",
+        "EventName": "L1D_CACHE_HIT_RW_FPRF",
+        "BriefDescription": "Level 1 data cache demand access first hit, fetched by prefetch."
+    },
    {
        "EventCode": "0x8200",
        "EventName": "L1I_CACHE_HIT",
@ -1767,11 +1812,26 @@
        "EventName": "L1I_LFB_HIT_RD_FPRFM",
        "BriefDescription": "Level 1 instruction cache demand fetch line-fill buffer first hit, recently fetched by software preload"
    },
+    {
+        "EventCode": "0x825C",
+        "EventName": "L1D_LFB_HIT_RW_FPRFM",
+        "BriefDescription": "Level 1 data cache demand access line-fill buffer first hit, recently fetched by software prefetch."
+    },
    {
        "EventCode": "0x8260",
        "EventName": "L1I_LFB_HIT_RD_FHWPRF",
        "BriefDescription": "Level 1 instruction cache demand fetch line-fill buffer first hit, recently fetched by hardware prefetcher"
    },
+    {
+        "EventCode": "0x826C",
+        "EventName": "L1D_LFB_HIT_RW_FHWPRF",
+        "BriefDescription": "Level 1 data cache demand access line-fill buffer first hit, recently fetched by hardware prefetcher."
+    },
+    {
+        "EventCode": "0x827C",
+        "EventName": "L1D_LFB_HIT_RW_FPRF",
+        "BriefDescription": "Level 1 data cache demand access line-fill buffer first hit, recently fetched by prefetch."
+    },
    {
        "EventCode": "0x8280",
        "EventName": "L1I_CACHE_PRF",
@ -1807,6 +1867,11 @@
        "EventName": "LL_CACHE_REFILL",
        "BriefDescription": "Last level cache refill"
    },
+    {
+        "EventCode": "0x828E",
+        "EventName": "L3D_CACHE_REFILL_PRF",
+        "BriefDescription": "Level 3 data cache refill, prefetch."
+    },
    {
        "EventCode": "0x8320",
        "EventName": "L1D_CACHE_REFILL_PERCYC",
@ -1872,6 +1937,16 @@
        "EventName": "FP_FP8_MIN_SPEC",
        "BriefDescription": "Floating-point operation speculatively_executed, smallest type is 8-bit floating-point."
    },
+    {
+        "EventCode": "0x8480",
+        "EventName": "FP_SP_FIXED_MIN_OPS_SPEC",
+        "BriefDescription": "Non-scalable element arithmetic operations speculatively executed, smallest type is single-precision floating-point."
+    },
+    {
+        "EventCode": "0x8482",
+        "EventName": "FP_HP_FIXED_MIN_OPS_SPEC",
+        "BriefDescription": "Non-scalable element arithmetic operations speculatively executed, smallest type is half-precision floating-point."
+    },
    {
        "EventCode": "0x8483",
        "EventName": "FP_BF16_FIXED_MIN_OPS_SPEC",
@ -1882,6 +1957,16 @@
        "EventName": "FP_FP8_FIXED_MIN_OPS_SPEC",
        "BriefDescription": "Non-scalable element arithmetic operations speculatively executed, smallest type is 8-bit floating-point."
    },
+    {
+        "EventCode": "0x8488",
+        "EventName": "FP_SP_SCALE_MIN_OPS_SPEC",
+        "BriefDescription": "Scalable element arithmetic operations speculatively executed, smallest type is single-precision floating-point."
+    },
+    {
+        "EventCode": "0x848A",
+        "EventName": "FP_HP_SCALE_MIN_OPS_SPEC",
+        "BriefDescription": "Scalable element arithmetic operations speculatively executed, smallest type is half-precision floating-point."
+    },
    {
        "EventCode": "0x848B",
        "EventName": "FP_BF16_SCALE_MIN_OPS_SPEC",
--- a/tools/perf/pmu-events/arch/arm64/mapfile.csv
+++ b/tools/perf/pmu-events/arch/arm64/mapfile.csv
@ -46,3 +46,4 @@
 0x00000000500f0000,v1,ampere/emag,core
 0x00000000c00fac30,v1,ampere/ampereone,core
 0x00000000c00fac40,v1,ampere/ampereonex,core
+0x000000004e0f0100,v1,nvidia/t410,core
--- a/tools/perf/pmu-events/arch/arm64/nvidia/t410/branch.json
+++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/branch.json
@ -0,0 +1,45 @@
+[
+    {
+        "ArchStdEvent": "BR_MIS_PRED",
+        "PublicDescription": "This event counts branches which are speculatively executed and mispredicted."
+    },
+    {
+        "ArchStdEvent": "BR_PRED",
+        "PublicDescription": "This event counts all speculatively executed branches."
+    },
+    {
+        "EventCode": "0x017e",
+        "EventName": "BR_PRED_BTB_CTX_UPDATE",
+        "PublicDescription": "Branch context table update."
+    },
+    {
+        "EventCode": "0x0188",
+        "EventName": "BR_MIS_PRED_DIR_RESOLVED",
+        "PublicDescription": "Number of branch mispredictions due to direction misprediction."
+    },
+    {
+        "EventCode": "0x0189",
+        "EventName": "BR_MIS_PRED_DIR_UNCOND_RESOLVED",
+        "PublicDescription": "Number of branch mispredictions due to direction misprediction for unconditional branches."
+    },
+    {
+        "EventCode": "0x018a",
+        "EventName": "BR_MIS_PRED_DIR_UNCOND_DIRECT_RESOLVED",
+        "PublicDescription": "Number of branch mispredictions due to direction misprediction for unconditional direct branches."
+    },
+    {
+        "EventCode": "0x018b",
+        "EventName": "BR_PRED_MULTI_RESOLVED",
+        "PublicDescription": "Number of resolved branches predicted by the polymorphic indirect predictor."
+    },
+    {
+        "EventCode": "0x018c",
+        "EventName": "BR_MIS_PRED_MULTI_RESOLVED",
+        "PublicDescription": "Number of mispredicted branches predicted by the polymorphic indirect predictor."
+    },
+    {
+        "EventCode": "0x01e4",
+        "EventName": "BR_RGN_RECLAIM",
+        "PublicDescription": "This event counts the indirect predictor entries flushed by region reclamation."
+    }
+]
--- a/tools/perf/pmu-events/arch/arm64/nvidia/t410/brbe.json
+++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/brbe.json
@ -0,0 +1,6 @@
+[
+    {
+        "ArchStdEvent": "BRB_FILTRATE",
+        "PublicDescription": "This event counts each valid branch record captured in the branch record buffer. Branch records that are not captured because they are removed by filtering are not counted."
+    }
+]
--- a/tools/perf/pmu-events/arch/arm64/nvidia/t410/bus.json
+++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/bus.json
@ -0,0 +1,48 @@
+[
+    {
+        "ArchStdEvent": "BUS_ACCESS",
+        "PublicDescription": "This event counts the number of data-beat accesses between the CPU and the external bus. This count includes accesses due to read, write, and snoop. Each beat of data is counted individually."
+    },
+    {
+        "ArchStdEvent": "BUS_CYCLES",
+        "PublicDescription": "This event counts bus cycles in the CPU. Bus cycles represent a clock cycle in which a transaction could be sent or received on the interface from the CPU to the external bus. Since that interface is driven at the same clock speed as the CPU, this event increments at the rate of CPU clock. Regardless of the WFE/WFI state of the PE, this event increments on each processor clock."
+    },
+    {
+        "ArchStdEvent": "BUS_ACCESS_RD",
+        "PublicDescription": "This event counts memory Read transactions seen on the external bus. Each beat of data is counted individually."
+    },
+    {
+        "ArchStdEvent": "BUS_ACCESS_WR",
+        "PublicDescription": "This event counts memory Write transactions seen on the external bus. Each beat of data is counted individually."
+    },
+    {
+        "EventCode": "0x0154",
+        "EventName": "BUS_REQUEST_REQ",
+        "PublicDescription": "Bus request, request."
+    },
+    {
+        "EventCode": "0x0155",
+        "EventName": "BUS_REQUEST_RETRY",
+        "PublicDescription": "Bus request, retry."
+    },
+    {
+        "EventCode": "0x0198",
+        "EventName": "L2_CHI_CBUSY0",
+        "PublicDescription": "Number of RXDAT or RXRSP responses received with CBusy of 0."
+    },
+    {
+        "EventCode": "0x0199",
+        "EventName": "L2_CHI_CBUSY1",
+        "PublicDescription": "Number of RXDAT or RXRSP responses received with CBusy of 1."
+    },
+    {
+        "EventCode": "0x019a",
+        "EventName": "L2_CHI_CBUSY2",
+        "PublicDescription": "Number of RXDAT or RXRSP responses received with CBusy of 2."
+    },
+    {
+        "EventCode": "0x019b",
+        "EventName": "L2_CHI_CBUSY3",
+        "PublicDescription": "Number of RXDAT or RXRSP responses received with CBusy of 3."
+    }
+]
--- a/tools/perf/pmu-events/arch/arm64/nvidia/t410/exception.json
+++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/exception.json
@ -0,0 +1,62 @@
+[
+    {
+        "ArchStdEvent": "EXC_TAKEN",
+        "PublicDescription": "This event counts any taken architecturally visible exceptions such as IRQ, FIQ, SError, and other synchronous exceptions. Exceptions are counted whether or not they are taken locally."
+    },
+    {
+        "ArchStdEvent": "EXC_RETURN",
+        "PublicDescription": "This event counts any architecturally executed exception return instructions. For example: AArch64: ERET."
+    },
+    {
+        "ArchStdEvent": "EXC_UNDEF",
+        "PublicDescription": "This event counts the number of synchronous exceptions which are taken locally that are due to attempting to execute an instruction that is UNDEFINED.\nAttempting to execute instruction bit patterns that have not been allocated.\nAttempting to execute instructions when they are disabled.\nAttempting to execute instructions at an inappropriate Exception level.\nAttempting to execute an instruction when the value of PSTATE.IL is 1."
+    },
+    {
+        "ArchStdEvent": "EXC_SVC",
+        "PublicDescription": "This event counts SVC exceptions taken locally."
+    },
+    {
+        "ArchStdEvent": "EXC_PABORT",
+        "PublicDescription": "This event counts synchronous exceptions that are taken locally and caused by Instruction Aborts."
+    },
+    {
+        "ArchStdEvent": "EXC_DABORT",
+        "PublicDescription": "This event counts exceptions that are taken locally and are caused by data aborts or SErrors. Conditions that could cause those exceptions are attempting to read or write memory where the MMU generates a fault, attempting to read or write memory with a misaligned address, Interrupts from the nSEI inputs and internally generated SErrors."
+    },
+    {
+        "ArchStdEvent": "EXC_IRQ",
+        "PublicDescription": "This event counts IRQ exceptions including the virtual IRQs that are taken locally."
+    },
+    {
+        "ArchStdEvent": "EXC_FIQ",
+        "PublicDescription": "This event counts FIQ exceptions including the virtual FIQs that are taken locally."
+    },
+    {
+        "ArchStdEvent": "EXC_SMC",
+        "PublicDescription": "This event counts SMC exceptions taken to EL3."
+    },
+    {
+        "ArchStdEvent": "EXC_HVC",
+        "PublicDescription": "This event counts HVC exceptions taken to EL2."
+    },
+    {
+        "ArchStdEvent": "EXC_TRAP_PABORT",
+        "PublicDescription": "This event counts exceptions which are traps not taken locally and are caused by Instruction Aborts. For example, attempting to execute an instruction with a misaligned PC."
+    },
+    {
+        "ArchStdEvent": "EXC_TRAP_DABORT",
+        "PublicDescription": "This event counts exceptions which are traps not taken locally and are caused by Data Aborts or SError Interrupts. Conditions that could cause those exceptions are:\n* Attempting to read or write memory where the MMU generates a fault,\n* Attempting to read or write memory with a misaligned address,\n* Interrupts from the SEI input,\n* Internally generated SErrors."
+    },
+    {
+        "ArchStdEvent": "EXC_TRAP_OTHER",
+        "PublicDescription": "This event counts the number of synchronous trap exceptions which are not taken locally and are not SVC, SMC, HVC, Data Aborts, Instruction Aborts, or Interrupts."
+    },
+    {
+        "ArchStdEvent": "EXC_TRAP_IRQ",
+        "PublicDescription": "This event counts IRQ exceptions including the virtual IRQs that are not taken locally."
+    },
+    {
+        "ArchStdEvent": "EXC_TRAP_FIQ",
+        "PublicDescription": "This event counts FIQs which are not taken locally but taken from EL0, EL1, or EL2 to EL3 (which would be the normal behavior for FIQs when not executing in EL3)."
+    }
+]
--- a/tools/perf/pmu-events/arch/arm64/nvidia/t410/fp_operation.json
+++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/fp_operation.json
@ -0,0 +1,78 @@
+[
+    {
+        "ArchStdEvent": "FP_HP_SPEC",
+        "PublicDescription": "This event counts speculatively executed half precision floating point operations."
+    },
+    {
+        "ArchStdEvent": "FP_SP_SPEC",
+        "PublicDescription": "This event counts speculatively executed single precision floating point operations."
+    },
+    {
+        "ArchStdEvent": "FP_DP_SPEC",
+        "PublicDescription": "This event counts speculatively executed double precision floating point operations."
+    },
+    {
+        "ArchStdEvent": "FP_SCALE_OPS_SPEC",
+        "PublicDescription": "This event counts speculatively executed scalable single precision floating point operations."
+    },
+    {
+        "ArchStdEvent": "FP_FIXED_OPS_SPEC",
+        "PublicDescription": "This event counts speculatively executed non-scalable single precision floating point operations."
+    },
+    {
+        "ArchStdEvent": "FP_HP_SCALE_OPS_SPEC",
+        "PublicDescription": "This event increments by v for each speculatively executed scalable element arithmetic operation, due to an instruction where the largest type was half-precision floating-point, where v is a value such that (v*(VL/128)) is the number of arithmetic operations carried out by the operation or instruction which causes the counter to increment.\nThis event does not count operations that are counted by FP_FIXED_OPS_SPEC or FP_SCALE2_OPS_SPEC."
+    },
+    {
+        "ArchStdEvent": "FP_HP_FIXED_OPS_SPEC",
+        "PublicDescription": "This event increments by v for each speculatively executed non-scalable element arithmetic operation, due to an instruction where the largest type was half-precision floating-point, where v is the number of arithmetic operations carried out by the operation or instruction which causes the event to increment.\nThis event does not count operations that are counted by FP_SCALE_OPS_SPEC or FP_SCALE2_OPS_SPEC."
+    },
+    {
+        "ArchStdEvent": "FP_SP_SCALE_OPS_SPEC",
+        "PublicDescription": "This event increments by v for each speculatively executed scalable element arithmetic operation, due to an instruction where the largest type was single-precision floating-point, where v is a value such that (v*(VL/128)) is the number of arithmetic operations carried out by the operation or instruction which causes the event to increment.\nThis event does not count operations that are counted by FP_FIXED_OPS_SPEC or FP_SCALE2_OPS_SPEC."
+    },
+    {
+        "ArchStdEvent": "FP_SP_FIXED_OPS_SPEC",
+        "PublicDescription": "This event increments by v for each speculatively executed non-scalable element arithmetic operation, due to an instruction where the largest type was single-precision floating-point, where v is the number of arithmetic operations carried out by the operation or instruction which causes the event to increment.\nThis event does not count operations that are counted by FP_SCALE_OPS_SPEC or FP_SCALE2_OPS_SPEC."
+    },
+    {
+        "ArchStdEvent": "FP_DP_SCALE_OPS_SPEC",
+        "PublicDescription": "This event increments by v for each speculatively executed scalable element arithmetic operation, due to an instruction where the largest type was double-precision floating-point, where v is a value such that (v*(VL/128)) is the number of arithmetic operations carried out by the operation or instruction which causes the event to increment.\nThis event does not count operations that are counted by FP_FIXED_OPS_SPEC or FP_SCALE2_OPS_SPEC."
+    },
+    {
+        "ArchStdEvent": "FP_DP_FIXED_OPS_SPEC",
+        "PublicDescription": "This event increments by v for each speculatively executed non-scalable element arithmetic operation, due to an instruction where the largest type was double-precision floating-point, where v is the number of arithmetic operations carried out by the operation or instruction which causes the event to increment.\nThis event does not count operations that are counted by FP_SCALE_OPS_SPEC or FP_SCALE2_OPS_SPEC."
+    },
+    {
+        "ArchStdEvent": "FP_SP_FIXED_MIN_OPS_SPEC",
+        "PublicDescription": "This event increments by v for each speculatively executed non-scalable element arithmetic operation, due to an instruction where the smallest type was single-precision floating-point, where v is the number of arithmetic operations carried out by the operation or instruction which causes the event to increment.\nThis event does not count operations that are counted by FP_SCALE_OPS_SPEC or FP_SCALE2_OPS_SPEC."
+    },
+    {
+        "ArchStdEvent": "FP_HP_FIXED_MIN_OPS_SPEC",
+        "PublicDescription": "This event increments by v for each speculatively executed non-scalable element arithmetic operation, due to an instruction where the smallest type was half-precision floating-point, where v is the number of arithmetic operations carried out by the operation or instruction which causes the event to increment.\nThis event does not count operations that are counted by FP_SCALE_OPS_SPEC or FP_SCALE2_OPS_SPEC."
+    },
+    {
+        "ArchStdEvent": "FP_BF16_FIXED_MIN_OPS_SPEC",
+        "PublicDescription": "This event increments by v for each speculatively executed non-scalable element arithmetic operation, due to an instruction where the smallest type was BFloat16 floating-point, where v is the number of arithmetic operations carried out by the operation or instruction which causes the event to increment.\nThis event does not count operations that are counted by FP_SCALE_OPS_SPEC or FP_SCALE2_OPS_SPEC."
+    },
+    {
+        "ArchStdEvent": "FP_FP8_FIXED_MIN_OPS_SPEC",
+        "PublicDescription": "This event increments by v for each speculatively executed non-scalable element arithmetic operation, due to an instruction where the smallest type was 8-bit floating-point, where v is the number of arithmetic operations carried out by the operation or instruction which causes the event to increment.\nThis event does not count operations that are counted by FP_SCALE_OPS_SPEC or FP_SCALE2_OPS_SPEC."
+    },
+    {
+        "ArchStdEvent": "FP_SP_SCALE_MIN_OPS_SPEC",
+        "PublicDescription": "This event increments by v for each speculatively executed scalable element arithmetic operation, due to an instruction where the smallest type was single-precision floating-point, where v is a value such that (v*(VL/128)) is the number of arithmetic operations carried out by the operation or instruction which causes the event to increment.\nThis event does not count operations that are counted by FP_FIXED_OPS_SPEC or FP_SCALE2_OPS_SPEC."
+    },
+    {
+        "ArchStdEvent": "FP_HP_SCALE_MIN_OPS_SPEC",
+        "PublicDescription": "This event increments by v for each speculatively executed scalable element arithmetic operation, due to an instruction where the smallest type was half-precision floating-point, where v is a value such that (v*(VL/128)) is the number of arithmetic operations carried out by the operation or instruction which causes the event to increment.\nThis event does not count operations that are counted by FP_FIXED_OPS_SPEC or FP_SCALE2_OPS_SPEC."
+    },
+    {
+        "ArchStdEvent": "FP_BF16_SCALE_MIN_OPS_SPEC",
+        "PublicDescription": "This event increments by v for each speculatively executed scalable element arithmetic operation, due to an instruction where the smallest type was BFloat16 floating-point, where v is a value such that (v*(VL/128)) is the number of arithmetic operations carried out by the operation or instruction which causes the event to increment.\nThis event does not count operations that are counted by FP_FIXED_OPS_SPEC or FP_SCALE2_OPS_SPEC."
+    },
+    {
+        "ArchStdEvent": "FP_FP8_SCALE_MIN_OPS_SPEC",
+        "PublicDescription": "This event increments by v for each speculatively executed scalable element arithmetic operation, due to an instruction where the smallest type was 8-bit floating-point, where v is a value such that (v*(VL/128)) is the number of arithmetic operations carried out by the operation or instruction which causes the event to increment.\nThis event does not count operations that are counted by FP_FIXED_OPS_SPEC or FP_SCALE2_OPS_SPEC."
+    }
+]
--- a/tools/perf/pmu-events/arch/arm64/nvidia/t410/general.json
+++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/general.json
@@ -0,0 +1,15 @@
+[
+    {
+        "ArchStdEvent": "CPU_CYCLES",
+        "PublicDescription": "This event counts CPU clock cycles when the PE is not in WFE/WFI. The clock measured by this event is defined as the physical clock driving the CPU logic."
+    },
+    {
+        "ArchStdEvent": "CNT_CYCLES",
+        "PublicDescription": "This event increments at a constant frequency equal to the rate of increment of the System Counter, CNTPCT_EL0.\nThis event does not increment when the PE is in WFE/WFI."
+    },
+    {
+        "EventCode": "0x01e1",
+        "EventName": "CPU_SLOT",
+        "PublicDescription": "Entitled CPU slots.\nThis event counts the number of slots. When in ST mode, this event shall increment by PMMIR_EL1.SLOTS quantities, and when in SMT partitioned resource mode (regardless of in WFI state or otherwise), this event is incremented by PMMIR_EL1.SLOTS/2 quantities."
+    }
+]
--- a/tools/perf/pmu-events/arch/arm64/nvidia/t410/l1d_cache.json
+++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/l1d_cache.json
@@ -0,0 +1,122 @@
+[
+    {
+        "ArchStdEvent": "L1D_CACHE_REFILL",
+        "PublicDescription": "This event counts L1 D-cache refills caused by speculatively executed load or store operations, preload instructions, or hardware cache prefetching that missed in the L1 D-cache. This event only counts one event per cache line.\nSince the caches are Write-back only for this processor, there are no Write-through cache accesses."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE",
+        "PublicDescription": "This event counts L1 D-cache accesses from any load/store operations, software preload, or hardware prefetch operations. Atomic operations that resolve in the CPU's caches (near atomic operations) count as both a write access and read access. Each access to a cache line is counted including the multiple accesses caused by single instructions such as LDM or STM. Each access to other L1 data or unified memory structures, for example refill buffers, write buffers, and write-back buffers, is also counted.\nThis event counts the sum of the following events:\nL1D_CACHE_RD,\nL1D_CACHE_WR,\nL1D_CACHE_PRFM, and\nL1D_CACHE_HWPRF."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_WB",
+        "PublicDescription": "This event counts write-backs of dirty data from the L1 D-cache to the L2 cache. This occurs when either a dirty cache line is evicted from L1 D-cache and allocated in the L2 cache or dirty data is written to the L2 and possibly to the next level of cache. This event counts both victim cache line evictions and cache write-backs from snoops or cache maintenance operations. The following cache operations are not counted:\n* Invalidations which do not result in data being transferred out of the L1 (such as evictions of clean data),\n* Full line writes which write to L2 without writing L1, such as write streaming mode.\nThis event is the sum of the following events:\nL1D_CACHE_WB_CLEAN and\nL1D_CACHE_WB_VICTIM."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_LMISS_RD",
+        "PublicDescription": "This event counts cache line refills into the L1 D-cache from any memory Read operations that incurred additional latency.\nCounts the same as L1D_CACHE_REFILL_RD on this CPU."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_RD",
+        "PublicDescription": "This event counts L1 D-cache accesses from any Load operation. Atomic Load operations that resolve in the CPU's caches count as both a write access and read access."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_WR",
+        "PublicDescription": "This event counts L1 D-cache accesses generated by Store operations. This event also counts accesses caused by a DC ZVA (D-cache zero, specified by virtual address) instruction. Near atomic operations that resolve in the CPU's caches count as a write access and read access.\nThis event is a subset of the L1D_CACHE event, except this event only counts memory Write operations."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_REFILL_RD",
+        "PublicDescription": "This event counts L1 D-cache refills caused by speculatively executed Load instructions where the memory Read operation misses in the L1 D-cache. This event only counts one event per cache line.\nThis event is a subset of the L1D_CACHE_REFILL event, but only counts memory Read operations. This event does not count reads caused by cache maintenance operations or preload instructions."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_REFILL_WR",
+        "PublicDescription": "This event counts L1 D-cache refills caused by speculatively executed Store instructions where the memory Write operation misses in the L1 D-cache. This event only counts one event per cache line.\nThis event is a subset of the L1D_CACHE_REFILL event, but only counts memory Write operations."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_REFILL_INNER",
+        "PublicDescription": "This event counts L1 D-cache refills (L1D_CACHE_REFILL) where the cache line data came from caches inside the immediate Cluster of the Core (L2 cache)."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_REFILL_OUTER",
+        "PublicDescription": "This event counts L1 D-cache refills (L1D_CACHE_REFILL) for which the cache line data came from outside the immediate Cluster of the Core, such as an SLC in the system interconnect, DRAM, or a remote socket."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_WB_VICTIM",
+        "PublicDescription": "This event counts dirty cache line evictions from the L1 D-cache caused by a new cache line allocation. This event does not count evictions caused by cache maintenance operations.\nThis event is a subset of the L1D_CACHE_WB event, but only counts write-backs that are a result of the line being allocated for an access made by the CPU."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_WB_CLEAN",
+        "PublicDescription": "This event counts write-backs from the L1 D-cache that are a result of a coherency operation made by another CPU. Event counts include cache maintenance operations.\nThis event is a subset of the L1D_CACHE_WB event."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_INVAL",
+        "PublicDescription": "This event counts each explicit invalidation of a cache line in the L1 D-cache caused by:\n* Cache Maintenance Operations (CMO) that operate by a virtual address.\n* Broadcast cache coherency operations from another CPU in the system.\nThis event does not count for the following conditions:\n* A cache refill invalidates a cache line.\n* A CMO which is executed on that CPU and invalidates a cache line specified by Set/Way.\nNote that CMOs that operate by Set/Way cannot be broadcast from one CPU to another."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_RW",
+        "PublicDescription": "This event counts L1 data demand cache accesses from any Load or Store operation. Near atomic operations that resolve in the CPU's caches count as both a write access and read access.\nThis event is implemented as L1D_CACHE_RD + L1D_CACHE_WR."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_PRFM",
+        "PublicDescription": "This event counts L1 D-cache accesses from software preload or prefetch instructions."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_MISS",
+        "PublicDescription": "This event counts each demand access counted by L1D_CACHE_RW that misses in the L1 Data or unified cache, causing an access to outside of the L1 caches of this PE."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_REFILL_PRFM",
+        "PublicDescription": "This event counts L1 D-cache refills where the cache line access was generated by software preload or prefetch instructions."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_HWPRF",
+        "PublicDescription": "This event counts L1 D-cache accesses from any Load/Store operations generated by the hardware prefetcher."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_REFILL_HWPRF",
+        "PublicDescription": "This event counts each hardware prefetch access counted by L1D_CACHE_HWPRF that causes a refill of the L1 D-cache from outside of the L1 D-cache."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_HIT_RW_FPRFM",
+        "PublicDescription": "This event counts each demand access first hit counted by L1D_CACHE_HIT_RW_FPRF where the cache line was fetched in response to a prefetch instruction. That is, the L1D_CACHE_REFILL_PRFM event was generated when the cache line was fetched into the cache.\nOnly the first hit by a demand access is counted. After this event is generated for a cache line, the event is not generated again for the same cache line while it remains in the cache."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_HIT_RW_FHWPRF",
+        "PublicDescription": "This event counts each demand access first hit counted by L1D_CACHE_HIT_RW_FPRF where the cache line was fetched by a hardware prefetcher. That is, the L1D_CACHE_REFILL_HWPRF event was generated when the cache line was fetched into the cache.\nOnly the first hit by a demand access is counted. After this event is generated for a cache line, the event is not generated again for the same cache line while it remains in the cache."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_HIT_RW_FPRF",
+        "PublicDescription": "This event counts each demand access first hit counted by L1D_CACHE_HIT_RW where the cache line was fetched in response to a prefetch instruction or by a hardware prefetcher. That is, the L1D_CACHE_REFILL_PRF event was generated when the cache line was fetched into the cache.\nOnly the first hit by a demand access is counted. After this event is generated for a cache line, the event is not generated again for the same cache line while it remains in the cache."
+    },
+    {
+        "ArchStdEvent": "L1D_LFB_HIT_RW_FPRFM",
+        "PublicDescription": "This event counts each demand access line-fill buffer first hit counted by L1D_LFB_HIT_RW_FPRF where the cache line was fetched in response to a prefetch instruction. That is, the access hits a cache line that is in the process of being loaded into the L1 D-cache, and so does not generate a new refill, but has to wait for the previous refill to complete, and the L1D_CACHE_REFILL_PRFM event was generated when the cache line was fetched into the cache.\nOnly the first hit by a demand access is counted. After this event is generated for a cache line, the event is not generated again for the same cache line while it remains in the cache."
+    },
+    {
+        "ArchStdEvent": "L1D_LFB_HIT_RW_FHWPRF",
+        "PublicDescription": "This event counts each demand access line-fill buffer first hit counted by L1D_LFB_HIT_RW_FPRF, where the cache line was fetched by a hardware prefetcher. That is, the access hits a cache line that is in the process of being loaded into the L1 D-cache, and so does not generate a new refill, but has to wait for the previous refill to complete, and the L1D_CACHE_REFILL_HWPRF event was generated when the cache line was fetched into the cache.\nOnly the first hit by a demand access is counted. After this event is generated for a cache line, the event is not generated again for the same cache line while it remains in the cache."
+    },
+    {
+        "ArchStdEvent": "L1D_LFB_HIT_RW_FPRF",
+        "PublicDescription": "This event counts each demand access line-fill buffer first hit counted by L1D_LFB_HIT_RW where the cache line was fetched in response to a prefetch instruction or by a hardware prefetcher. That is, the access hits a cache line that is in the process of being loaded into the L1 D-cache, and so does not generate a new refill, but has to wait for the previous refill to complete, and the L1D_CACHE_REFILL_PRF event was generated when the cache line was fetched into the cache.\nOnly the first hit by a demand access is counted. After this event is generated for a cache line, the event is not generated again for the same cache line while it remains in the cache."
+    },
+    {
+        "EventCode": "0x01f5",
+        "EventName": "L1D_CACHE_REFILL_RW",
+        "PublicDescription": "L1 D-cache refill, demand Read and Write. This event counts demand Read and Write accesses that cause a refill of the L1 D-cache of this PE, from outside of this cache."
+    },
+    {
+        "EventCode": "0x0204",
+        "EventName": "L1D_CACHE_REFILL_OUTER_LLC",
+        "PublicDescription": "This event counts L1D_CACHE_REFILL from L3 D-cache."
+    },
+    {
+        "EventCode": "0x0205",
+        "EventName": "L1D_CACHE_REFILL_OUTER_DRAM",
+        "PublicDescription": "This event counts L1D_CACHE_REFILL from local memory."
+    },
+    {
+        "EventCode": "0x0206",
+        "EventName": "L1D_CACHE_REFILL_OUTER_REMOTE",
+        "PublicDescription": "This event counts L1D_CACHE_REFILL from remote memory."
+    }
+]
--- a/tools/perf/pmu-events/arch/arm64/nvidia/t410/l1i_cache.json
+++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/l1i_cache.json
@@ -0,0 +1,114 @@
+[
+    {
+        "ArchStdEvent": "L1I_CACHE_REFILL",
+        "PublicDescription": "This event counts cache line refills in the L1 I-cache caused by a missed instruction fetch (demand, hardware prefetch, and software preload accesses). Instruction fetches may include accessing multiple instructions, but the single cache line allocation is counted once."
+    },
+    {
+        "ArchStdEvent": "L1I_CACHE",
+        "PublicDescription": "This event counts instruction fetches (demand, hardware prefetch, and software preload accesses) which access the L1 Instruction Cache. Instruction Cache accesses caused by cache maintenance operations are not counted."
+    },
+    {
+        "ArchStdEvent": "L1I_CACHE_LMISS",
+        "PublicDescription": "This event counts cache line refills into the L1 I-cache, that incurred additional latency.\nCounts the same as L1I_CACHE_REFILL in this CPU."
+    },
+    {
+        "ArchStdEvent": "L1I_CACHE_RD",
+        "PublicDescription": "This event counts demand instruction fetches which access the L1 I-cache."
+    },
+    {
+        "ArchStdEvent": "L1I_CACHE_PRFM",
+        "PublicDescription": "This event counts instruction fetches generated by software preload or prefetch instructions which access the L1 I-cache."
+    },
+    {
+        "ArchStdEvent": "L1I_CACHE_HWPRF",
+        "PublicDescription": "This event counts instruction fetches which access the L1 I-cache generated by the hardware prefetcher."
+    },
+    {
+        "ArchStdEvent": "L1I_CACHE_REFILL_PRFM",
+        "PublicDescription": "This event counts cache line refills in the L1 I-cache caused by a missed instruction fetch generated by software preload or prefetch instructions. Instruction fetches may include accessing multiple instructions, but the single cache line allocation is counted once."
+    },
+    {
+        "ArchStdEvent": "L1I_CACHE_REFILL_HWPRF",
+        "PublicDescription": "This event counts each hardware prefetch access counted by L1I_CACHE_HWPRF that causes a refill of the L1 I-cache from outside of the L1 I-cache."
+    },
+    {
+        "ArchStdEvent": "L1I_CACHE_HIT_RD",
+        "PublicDescription": "This event counts demand instruction fetches that access the L1 I-cache and hit in the L1 I-cache."
+    },
+    {
+        "ArchStdEvent": "L1I_CACHE_HIT_RD_FPRF",
+        "PublicDescription": "This event counts each demand fetch first hit counted by L1I_CACHE_HIT_RD where the cache line was fetched in response to a software preload or by a hardware prefetcher. That is, the L1I_CACHE_REFILL_PRF event was generated when the cache line was fetched into the cache.\nOnly the first hit by a demand access is counted. After this event is generated for a cache line, the event is not generated again for the same cache line while it remains in the cache."
+    },
+    {
+        "ArchStdEvent": "L1I_CACHE_HIT",
+        "PublicDescription": "This event counts instruction fetches that access the L1 I-cache (demand, hardware prefetch, and software preload accesses) and hit in the L1 I-cache. I-cache accesses caused by cache maintenance operations are not counted."
+    },
+    {
+        "ArchStdEvent": "L1I_CACHE_HIT_PRFM",
+        "PublicDescription": "This event counts instruction fetches generated by software preload or prefetch instructions that access the L1 I-cache and hit in the L1 I-cache."
+    },
+    {
+        "ArchStdEvent": "L1I_LFB_HIT_RD",
+        "PublicDescription": "This event counts demand instruction fetches that access the L1 I-cache and hit in a line that is in the process of being loaded into the L1 I-cache."
+    },
+    {
+        "EventCode": "0x0174",
+        "EventName": "L1I_HWPRF_REQ_DROP",
+        "PublicDescription": "L1 I-cache hardware prefetch dropped."
+    },
+    {
+        "EventCode": "0x01e3",
+        "EventName": "L1I_CACHE_REFILL_RD",
+        "PublicDescription": "L1 I-cache refill, Read.\nThis event counts each demand instruction fetch that causes a refill of the L1 I-cache of this PE, from outside of this cache."
+    },
+    {
+        "EventCode": "0x01ea",
+        "EventName": "L1I_CFC_ENTRIES",
+        "PublicDescription": "This event counts the CFC (Cache Fill Control) entries.\nThe CFC is the fill buffer for I-cache."
+    },
+    {
+        "EventCode": "0x01ef",
+        "EventName": "L1I_CACHE_INVAL",
+        "PublicDescription": "L1 I-cache invalidate.\nThis event counts each explicit invalidation of a cache line in the L1 I-cache caused by:\n* Broadcast cache coherency operations from another CPU in the system.\n* Invalidations due to capacity eviction in L2 D-cache.\nThis event does not count for the following conditions:\n* A cache refill invalidates a cache line.\n* A CMO which is executed on that CPU Core and invalidates a cache line specified by Set/Way.\n* Cache Maintenance Operations (CMO) that operate by a virtual address.\nNote that:\n* CMOs that operate by Set/Way cannot be broadcast from one CPU Core to another.\n* The CMO is treated as No-op for the purposes of L1 I-cache line invalidation, as this Core implements fully coherent I-cache."
+    },
+    {
+        "EventCode": "0x0212",
+        "EventName": "L1I_CACHE_HIT_HWPRF",
+        "PublicDescription": "This event counts each hardware prefetch access that hits an L1 I-cache."
+    },
+    {
+        "EventCode": "0x0215",
+        "EventName": "L1I_LFB_HIT",
+        "PublicDescription": "L1 Line fill buffer hit.\nThis event counts each Demand or software preload or hardware prefetch induced instruction fetch that hits an L1 I-cache line that is in the process of being loaded into the L1 instruction cache, and so does not generate a new refill, but has to wait for the previous refill to complete."
+    },
+    {
+        "EventCode": "0x0216",
+        "EventName": "L1I_LFB_HIT_PRFM",
+        "PublicDescription": "This event counts each software prefetch access that hits a cache line that is in the process of being loaded into the L1 instruction cache, and so does not generate a new refill, but has to wait for the previous refill to complete."
+    },
+    {
+        "EventCode": "0x0219",
+        "EventName": "L1I_LFB_HIT_HWPRF",
+        "PublicDescription": "This event counts each hardware prefetch access that hits a cache line that is in the process of being loaded into the L1 instruction cache, and so does not generate a new refill, but has to wait for the previous refill to complete."
+    },
+    {
+        "EventCode": "0x0221",
+        "EventName": "L1I_PRFM_REQ",
+        "PublicDescription": "L1 I-cache software prefetch requests."
+    },
+    {
+        "EventCode": "0x0222",
+        "EventName": "L1I_HWPRF_REQ",
+        "PublicDescription": "L1 I-cache hardware prefetch requests."
+    },
+    {
+        "EventCode": "0x0228",
+        "EventName": "L1I_CACHE_HIT_PRFM_FPRF",
+        "PublicDescription": "L1 I-cache software prefetch access first hit, fetched by hardware or software prefetch.\nThis event counts each software preload access first hit where the cache line was fetched in response to a hardware prefetcher or software preload instruction.\nOnly the first hit is counted. After this event is generated for a cache line, the event is not generated again for the same cache line while it remains in the cache."
+    },
+    {
+        "EventCode": "0x022a",
+        "EventName": "L1I_CACHE_HIT_HWPRF_FPRF",
+        "PublicDescription": "L1 I-cache hardware prefetch access first hit, fetched by hardware or software prefetch.\nThis event counts each hardware prefetch access first hit where the cache line was fetched in response to a hardware prefetcher or software preload instruction.\nOnly the first hit is counted. After this event is generated for a cache line, the event is not generated again for the same cache line while it remains in the cache."
+    }
+]
--- a/tools/perf/pmu-events/arch/arm64/nvidia/t410/l2d_cache.json
+++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/l2d_cache.json
@@ -0,0 +1,134 @@
+[
+    {
+        "ArchStdEvent": "L2D_CACHE",
+        "PublicDescription": "This event counts accesses to the L2 cache due to data accesses. L2 cache is a unified cache for data and instruction accesses. Accesses are for misses in the L1 D-cache or translation resolutions due to accesses. This event also counts write-back of dirty data from L1 D-cache to the L2 cache.\nI-cache accesses are included in this event. This event is the sum of the following events:\nL2D_CACHE_RD,\nL2D_CACHE_WR,\nL2D_CACHE_PRFM, and\nL2D_CACHE_HWPRF."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_REFILL",
+        "PublicDescription": "This event counts cache line refills into the L2 cache. L2 cache is a unified cache for data and instruction accesses. Accesses are for misses in the L1 D-cache or translation resolutions due to accesses.\nI-cache refills are included in this event. This event is the sum of the following events:\nL2D_CACHE_REFILL_RD,\nL2D_CACHE_REFILL_WR,\nL2D_CACHE_REFILL_HWPRF, and\nL2D_CACHE_REFILL_PRFM."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_WB",
+        "PublicDescription": "This event counts write-backs of data from the L2 cache to outside the CPU. This includes snoops to the L2 (from other CPUs) which return data even if the snoops cause an invalidation. L2 cache line invalidations which do not write data outside the CPU and snoops which return data from an L1 cache are not counted. Data would not be written outside the cache when invalidating a clean cache line.\nThis event is the sum of the following events:\nL2D_CACHE_WB_VICTIM and\nL2D_CACHE_WB_CLEAN."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_RD",
+        "PublicDescription": "This event counts L2 D-cache accesses due to memory Read operations. L2 cache is a unified cache for data and instruction accesses. Accesses are for misses in the L1 D-cache or translation resolutions due to accesses.\nI-cache accesses are included in this event. This event is a subset of the L2D_CACHE event, but this event only counts memory Read operations."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_WR",
+        "PublicDescription": "This event counts L2 cache accesses due to memory Write operations. L2 cache is a unified cache for data and instruction accesses. Accesses are for misses in the L1 D-cache or translation resolutions due to accesses.\nThis event is a subset of the L2D_CACHE event, but this event only counts memory Write operations."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_REFILL_RD",
+        "PublicDescription": "This event counts refills for memory accesses due to memory Read operations counted by L2D_CACHE_RD. L2 cache is a unified cache for data and instruction accesses. Accesses are for misses in the L1 D-cache or translation resolutions due to accesses.\nThis CPU includes I-cache refills in this counter as an L2I equivalent event was not implemented. This event is a subset of the L2D_CACHE_REFILL event. This event does not count L2 refills caused by stashes into L2.\nThis count includes demand requests that encounter an L2 prefetch request or an L2 software prefetch request to the same cache line, which is still pending in the L2 LFB."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_REFILL_WR",
+        "PublicDescription": "This event counts refills for memory accesses due to memory Write operations counted by L2D_CACHE_WR. L2 cache is a unified cache for data and instruction accesses. Accesses are for misses in the L1 D-cache or translation resolutions due to accesses.\nThis count includes demand requests that encounter an L2 prefetch request or an L2 software prefetch request to the same cache line, which is still pending in the L2 LFB."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_WB_VICTIM",
+        "PublicDescription": "This event counts evictions from the L2 cache because of a line being allocated into the L2 cache.\nThis event is a subset of the L2D_CACHE_WB event."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_WB_CLEAN",
+        "PublicDescription": "This event counts write-backs from the L2 cache that are a result of any of the following:\n* Cache maintenance operations,\n* Snoop responses, or\n* Direct cache transfers to another CPU due to a forwarding snoop request.\nThis event is a subset of the L2D_CACHE_WB event."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_INVAL",
+        "PublicDescription": "This event counts each explicit invalidation of a cache line in the L2 cache by cache maintenance operations that operate by a virtual address, or by external coherency operations. This event does not count if either:\n* A cache refill invalidates a cache line, or\n* A cache Maintenance Operation (CMO), which invalidates a cache line specified by Set/Way,\nis executed on that CPU.\nCMOs that operate by Set/Way cannot be broadcast from one CPU to another."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_LMISS_RD",
+        "PublicDescription": "This event counts cache line refills into the L2 unified cache from any memory Read operations that incurred additional latency.\nCounts the same as L2D_CACHE_REFILL_RD in this CPU."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_RW",
+        "PublicDescription": "This event counts L2 cache demand accesses from any Load/Store operations. L2 cache is a unified cache for data and instruction accesses. Accesses are for misses in the L1 D-cache or translation resolutions due to accesses.\nI-cache accesses are included in this event.\nThis event is the sum of the following events:\nL2D_CACHE_RD and\nL2D_CACHE_WR."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_PRFM",
+        "PublicDescription": "This event counts L2 D-cache accesses generated by software preload or prefetch instructions with target = L1/L2/L3 cache.\nNote that a software preload or prefetch instruction with (target = L1/L2/L3) that hits in L1D will not result in an L2 D-cache access. Therefore, such a software preload or prefetch instruction will not be counted by this event."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_MISS",
+        "PublicDescription": "This event counts cache line misses in the L2 cache. L2 cache is a unified cache for data and instruction accesses. Accesses are for misses in the L1 D-cache or translation resolutions due to accesses.\nThis event counts the same as L2D_CACHE_REFILL_RD in this CPU."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_REFILL_PRFM",
+        "PublicDescription": "This event counts refills due to accesses generated as a result of software preload or prefetch instructions as counted by L2D_CACHE_PRFM. I-cache refills are included in this event."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_HWPRF",
+        "PublicDescription": "This event counts L2 D-cache accesses caused by the L1 or L2 hardware prefetcher."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_REFILL_HWPRF",
+        "PublicDescription": "This event counts each hardware prefetch access counted by L2D_CACHE_HWPRF that causes a refill of the L2 cache, or any L1 Data, or Instruction cache of this PE, from outside of those caches.\nThis does not include prefetch requests pending waiting for a refill in LFB and a new demand request to the same cache line hitting the LFB entry. All such refills are counted as L2D_LFB_HIT_RWL1PRF_FHWPRF."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_REFILL_PRF",
+        "PublicDescription": "This event counts each access to L2 Cache due to a prefetch instruction, or hardware prefetch that causes a refill of the L2 or any Level 1, from outside of those caches."
+    },
+    {
+        "EventCode": "0x0108",
+        "EventName": "L2D_CACHE_IF_REFILL",
+        "PublicDescription": "L2 D-cache refill, instruction fetch.\nThis event counts each demand instruction fetch that causes a refill of the L2 cache or L1 cache of this PE, from outside of those caches."
+    },
+    {
+        "EventCode": "0x0109",
+        "EventName": "L2D_CACHE_TBW_REFILL",
+        "PublicDescription": "L2 D-cache refill, Page table walk.\nThis event counts each demand translation table walk that causes a refill of the L2 cache or L1 cache of this PE, from outside of those caches."
+    },
+    {
+        "EventCode": "0x010a",
+        "EventName": "L2D_CACHE_PF_REFILL",
+        "PublicDescription": "L2 D-cache refill, prefetch.\nThis event counts L1 or L2 hardware or software prefetch accesses that cause a refill of the L2 cache or L1 cache of this PE, from outside of those caches."
+    },
+    {
+        "EventCode": "0x010b",
+        "EventName": "L2D_LFB_HIT_RWL1PRF_FHWPRF",
+        "PublicDescription": "L2 line fill buffer demand Read, demand Write or L1 prefetch first hit, fetched by hardware prefetch.\nThis event counts each of the following accesses that hit the line-fill buffer when the same cache line is already being fetched due to an L2 hardware prefetcher.\n* Demand Read or Write\n* L1I-HWPRF\n* L1D-HWPRF\n* L1I PRFM\n* L1D PRFM\nThese accesses hit a cache line that is currently being loaded into the L2 cache as a result of a hardware prefetcher to the same line. Consequently, this access does not initiate a new refill but waits for the completion of the previous refill.\nOnly the first hit is counted. After this event is generated for a cache line, the event is not generated again for the same cache line while it remains in the cache."
+    },
+    {
+        "EventCode": "0x0179",
+        "EventName": "L2D_CACHE_HIT_RWL1PRF_FHWPRF",
+        "PublicDescription": "L2 D-cache demand Read, demand Write and L1 prefetch hit, fetched by hardware prefetch. This event counts each demand Read, demand Write and L1 hardware or software prefetch request that hit an L2 D-cache line that was refilled into L2 D-cache in response to an L2 hardware prefetch. Only the first hit is counted. After this event is generated for a cache line, the event is not generated again for the same cache line while it remains in the cache."
+    },
+    {
+        "EventCode": "0x01b8",
+        "EventName": "L2D_CACHE_L1PRF",
+        "PublicDescription": "L2 D-cache access, L1 hardware or software prefetch. This event counts L1 hardware or software prefetch accesses to the L2 D-cache."
+    },
+    {
+        "EventCode": "0x01b9",
+        "EventName": "L2D_CACHE_REFILL_L1PRF",
+        "PublicDescription": "L2 D-cache refill, L1 hardware or software prefetch.\nThis event counts each access counted by L2D_CACHE_L1PRF that causes a refill of the L2 cache or any L1 cache of this PE, from outside of those caches."
+    },
+    {
+        "EventCode": "0x0201",
+        "EventName": "L2D_CACHE_BACKSNOOP_L1D_VIRT_ALIASING",
+        "PublicDescription": "This event counts when the L2 D-cache sends an invalidating back-snoop to the L1 D for an access initiated by the L1 D, where the corresponding line is already present in the L1 D-cache.\nThe L2 D-cache line tags the PE that refilled the line. It also retains specific bits of the VA to identify virtually aliased addresses.\nThe L1 D request requiring a back-snoop can originate either from the same PE that refilled the L2 D line or from a different PE. In either case, this event only counts those back-snoops where the requested VA mismatches the VA stored in the L2 D tag.\nThis event is counted only by the PE that initiated the original request necessitating a back-snoop.\nNote: Because the L1 D is VIPT, it identifies this access as a miss. Conversely, as L2 is PIPT, it identifies this as a hit. L2 D utilizes the back-snoop mechanism to refill L1 D with the snooped data."
+    },
+    {
+        "EventCode": "0x0208",
+        "EventName": "L2D_CACHE_RWL1PRF",
+        "PublicDescription": "L2 D-cache access, demand Read, demand Write or L1 hardware or software prefetch.\nThis event counts each access to L2 D-cache due to the following:\n* Demand Read or Write.\n* L1 Hardware or software prefetch."
+    },
+    {
+        "EventCode": "0x020a",
+        "EventName": "L2D_CACHE_REFILL_RWL1PRF",
+        "PublicDescription": "L2 D-cache refill, demand Read, demand Write or L1 hardware or software prefetch.\nThis event counts each access counted by L2D_CACHE_RWL1PRF that causes a refill of the L2 cache, or any L1 cache of this PE, from outside of those caches."
+    },
+    {
+        "EventCode": "0x020c",
+        "EventName": "L2D_CACHE_HIT_RWL1PRF_FPRFM",
+        "PublicDescription": "L2 D-cache demand Read, demand Write and L1 prefetch hit, fetched by software prefetch.\nThis event counts each demand Read, demand Write and L1 hardware or software prefetch request that hit an L2 D-cache line that was refilled into L2 D-cache in response to an L2 software prefetch. Only the first hit is counted. After this event is generated for a cache line, the event is not generated again for the same cache line while it remains in the cache."
+    },
+    {
+        "EventCode": "0x020e",
+        "EventName": "L2D_CACHE_HIT_RWL1PRF_FPRF",
+        "PublicDescription": "L2 D-cache demand Read, demand Write and L1 prefetch hit, fetched by software or hardware prefetch.\nThis event counts each demand Read, demand Write and L1 hardware or software prefetch request that hit an L2 D-cache line that was refilled into L2 D-cache in response to an L2 hardware prefetch or software prefetch. Only the first hit is counted. After this event is generated for a cache line, the event is not generated again for the same cache line while it remains in the cache."
+    }
+]
--- a/tools/perf/pmu-events/arch/arm64/nvidia/t410/ll_cache.json
+++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/ll_cache.json
@@ -0,0 +1,107 @@
+[
+    {
+        "ArchStdEvent": "L3D_CACHE_ALLOCATE",
+        "PublicDescription": "This event counts each memory Write operation that writes an entire line into the L3 Data without fetching data from outside the L3 Data. These are allocations of cache lines in the L3 Data that are not refills counted by L3D_CACHE_REFILL. For example:\n* A Write-back of an entire cache line from an L2 cache to the L3 D-cache.\n* A Write of an entire cache line from a coalescing Write buffer.\n* An operation such as DC ZVA.\nThis counter does not count Writes that write an entire line to beyond Level 3. Thus this counter does not count streaming Writes to beyond the L3 cache."
+    },
+    {
+        "ArchStdEvent": "L3D_CACHE_REFILL",
+        "PublicDescription": "This event counts each access counted by L3D_CACHE that causes a refill of the L3 Data, or any L1 Data, instruction or L2 cache of this PE, from outside of those caches. This includes the refill due to hardware prefetch and software prefetch accesses.\nThis event is a sum of the L3D_CACHE_MISS, L3D_CACHE_REFILL_PRFM and L3D_CACHE_REFILL_HWPRF events.\nA refill includes any access that causes data to be fetched from outside of the L1 to L3 caches, even if the data is ultimately not allocated into the L3 D-cache."
+    },
+    {
+        "ArchStdEvent": "L3D_CACHE",
+        "PublicDescription": "This event counts each memory Read operation or memory Write operation that causes a cache access to the Level 3 cache.\nThis event is a sum of the following events:\n* L3D_CACHE_RD(0x00a0)\n* L3D_CACHE_ALLOCATE(0x0029)\n* L3D_CACHE_PRFM(0x8151)\n* L3D_CACHE_HWPRF(0x8156)\n* L2D_CACHE_WB(0x0018)"
+    },
+    {
+        "ArchStdEvent": "LL_CACHE_RD",
+        "PublicDescription": "This is an alias to the event L3D_CACHE_RD (0x00a0)."
+    },
+    {
+        "ArchStdEvent": "LL_CACHE_MISS_RD",
+        "PublicDescription": "This is an alias to the event L3D_CACHE_REFILL_RD (0x00a2)."
+    },
+    {
+        "ArchStdEvent": "L3D_CACHE_RD",
+        "PublicDescription": "This event counts each Memory Read operation to L3 D-cache from instruction fetch, Load/Store, and MMU translation table accesses. This does not include hardware prefetcher or PRFM instruction accesses. This includes L1 and L2 prefetcher accesses to L3 D-cache."
+    },
+    {
+        "ArchStdEvent": "L3D_CACHE_REFILL_RD",
+        "PublicDescription": "This event counts each access counted by both L3D_CACHE_RD and L3D_CACHE_REFILL. That is, every refill of the L3 cache counted by L3D_CACHE_REFILL that is caused by a Memory Read operation.\nThe L3D_CACHE_MISS(0x8152), L3D_CACHE_REFILL_RD (0x00a2) and L3D_CACHE_LMISS_RD(0x400b) count the same event in the hardware."
+    },
+    {
+        "ArchStdEvent": "L3D_CACHE_LMISS_RD",
+        "PublicDescription": "This event counts each memory Read operation to the L3 cache counted by L3D_CACHE that incurs additional latency because it returns data from outside of the L1 to L3 caches.\nThe L3D_CACHE_MISS(0x8152), L3D_CACHE_REFILL_RD (0x00a2) and L3D_CACHE_LMISS_RD(0x400b) count the same event in the hardware."
+    },
+    {
+        "ArchStdEvent": "L3D_CACHE_RW",
+        "PublicDescription": "This event counts each access counted by L3D_CACHE that is due to a demand memory Read operation or demand memory Write operation.\nThis event is a sum of L3D_CACHE_RD(0x00a0), L3D_CACHE_ALLOCATE(0x0029) and L2D_CACHE_WB(0x0018).\nNote that this counter does not count Writes that write an entire line to beyond Level 3. Thus this counter does not count streaming Writes to beyond the L3 cache."
+    },
+    {
+        "ArchStdEvent": "L3D_CACHE_PRFM",
+        "PublicDescription": "This event counts each access counted by L3D_CACHE that is due to a prefetch instruction. This includes L3 Data accesses due to the L1, L2, or L3 prefetch instruction."
+    },
+    {
+        "ArchStdEvent": "L3D_CACHE_MISS",
+        "PublicDescription": "This event counts each demand Read access counted by L3D_CACHE_RD that misses in the L1 to L3 Data, causing an access to outside of the L3 cache.\nThe L3D_CACHE_MISS(0x8152), L3D_CACHE_REFILL_RD (0x00a2) and L3D_CACHE_LMISS_RD(0x400b) count the same event in the hardware."
+    },
+    {
+        "ArchStdEvent": "L3D_CACHE_REFILL_PRFM",
+        "PublicDescription": "This event counts each access counted by L3D_CACHE_PRFM that causes a refill of the L3 cache, or any L1 or L2 Data, from outside of those caches."
+    },
+    {
+        "ArchStdEvent": "L3D_CACHE_HWPRF",
+        "PublicDescription": "This event counts each access to L3 cache that is due to a hardware prefetcher. This includes L3D accesses due to the Level-1 or Level-2 or Level-3 hardware prefetcher."
+    },
+    {
+        "ArchStdEvent": "L3D_CACHE_REFILL_HWPRF",
+        "PublicDescription": "This event counts each hardware prefetch counted by L3D_CACHE_HWPRF that causes a refill of the L3 Data or unified cache, or any L1 or L2 Data, Instruction, or unified cache of this PE, from outside of those caches."
+    },
+    {
+        "ArchStdEvent": "L3D_CACHE_REFILL_PRF",
+        "PublicDescription": "This event counts each access to L3 cache due to a prefetch instruction, or hardware prefetch that causes a refill of the L3 Data, or any L1 or L2 Data, from outside of those caches."
+    },
+    {
+        "EventCode": "0x01e8",
+        "EventName": "L3D_CACHE_RWL1PRFL2PRF",
+        "PublicDescription": "L3 cache access, demand Read, demand Write, L1 hardware or software prefetch or L2 hardware or software prefetch.\nThis event counts each access to L3 D-cache due to the following:\n* Demand Read or Write.\n* L1 Hardware or software prefetch.\n* L2 Hardware or software prefetch."
+    },
+    {
+        "EventCode": "0x01e9",
+        "EventName": "L3D_CACHE_REFILL_RWL1PRFL2PRF",
+        "PublicDescription": "L3 cache refill, demand Read, demand Write, L1 hardware or software prefetch or L2 hardware or software prefetch.\nThis event counts each access counted by L3D_CACHE_RWL1PRFL2PRF that causes a refill of the L3 cache, or any L1 or L2 cache of this PE, from outside of those caches."
+    },
+    {
+        "EventCode": "0x01f6",
+        "EventName": "L3D_CACHE_REFILL_L2PRF",
+        "PublicDescription": "This event counts each access counted by L3D_CACHE_L2PRF that causes a refill of the L3 cache, or any L1 or L2 cache of this PE, from outside of those caches."
+    },
+    {
+        "EventCode": "0x01f7",
+        "EventName": "L3D_CACHE_HIT_RWL1PRFL2PRF_FPRF",
+        "PublicDescription": "L3 cache demand Read, demand Write, L1 prefetch and L2 prefetch first hit, fetched by software or hardware prefetch.\nThis event counts each demand Read, demand Write, L1 hardware or software prefetch request and L2 hardware or software prefetch that hit an L3 D-cache line that was refilled into L3 D-cache in response to an L3 hardware prefetch or software prefetch. Only the first hit is counted. After this event is generated for a cache line, the event is not generated again for the same cache line while it remains in the cache."
+    },
+    {
+        "EventCode": "0x0225",
+        "EventName": "L3D_CACHE_REFILL_IF",
+        "PublicDescription": "L3 cache refill, instruction fetch.\nThis event counts each demand instruction fetch that causes a refill of the L3 cache, or any L1 or L2 cache of this PE, from outside of those caches."
+    },
+    {
+        "EventCode": "0x0226",
+        "EventName": "L3D_CACHE_REFILL_MM",
+        "PublicDescription": "L3 cache refill, translation table walk access.\nThis event counts each demand translation table access that causes a refill of the L3 cache, or any L1 or L2 cache of this PE, from outside of those caches."
+    },
+    {
+        "EventCode": "0x0227",
+        "EventName": "L3D_CACHE_REFILL_L1PRF",
+        "PublicDescription": "This event counts each access counted by L3D_CACHE_L1PRF that causes a refill of the L3 cache, or any L1 or L2 cache of this PE, from outside of those caches."
+    },
+    {
+        "EventCode": "0x022c",
+        "EventName": "L3D_CACHE_L1PRF",
+        "PublicDescription": "This event counts L3 D-cache accesses due to L1 hardware prefetch or software prefetch requests.\nL1 hardware prefetch or software prefetch requests that miss the L1I, L1D and L2 D-cache are counted by this counter."
+    },
+    {
+        "EventCode": "0x022d",
+        "EventName": "L3D_CACHE_L2PRF",
+        "PublicDescription": "This event counts L3 D-cache accesses due to L2 hardware prefetch or software prefetch requests.\nL2 hardware prefetch or software prefetch requests that miss the L2 D-cache are counted by this counter."
+    }
+]
--- a/tools/perf/pmu-events/arch/arm64/nvidia/t410/memory.json
+++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/memory.json
@@ -0,0 +1,46 @@
+[
+    {
+        "ArchStdEvent": "MEM_ACCESS",
+        "PublicDescription": "This event counts memory accesses issued by the CPU load/store unit, where those accesses are issued due to load or store operations. This event counts memory accesses regardless of whether the data is received from any level of cache hierarchy or external memory. If memory accesses are broken up into smaller transactions than what were specified in the load or store instructions, then the event counts those smaller memory transactions.\nMemory accesses generated by the following instructions or activity are not counted: instruction fetches, cache maintenance instructions, translation table walks or prefetches, memory prefetch operations. This event counts the sum of the following events:\nMEM_ACCESS_RD and\nMEM_ACCESS_WR."
+    },
+    {
+        "ArchStdEvent": "MEMORY_ERROR",
+        "PublicDescription": "This event counts any detected correctable or uncorrectable physical memory errors (ECC or parity) in protected CPU RAMs. On the Core, this event counts errors in the caches (including data and tag RAMs). Any detected memory error (from either a speculative and abandoned access, or an architecturally executed access) is counted.\nNote that errors are only detected when the actual protected memory is accessed by an operation."
+    },
+    {
+        "ArchStdEvent": "REMOTE_ACCESS",
+        "PublicDescription": "This event counts each external bus read access that causes an access to a remote device, that is, a socket that does not contain the PE."
+    },
+    {
+        "ArchStdEvent": "MEM_ACCESS_RD",
+        "PublicDescription": "This event counts memory accesses issued by the CPU due to Load operations. This event counts any memory Load access, no matter whether the data is received from any level of cache hierarchy or external memory. This event also counts atomic Load operations. If memory accesses are broken up by the Load/Store unit into smaller transactions that are issued by the bus interface, then the event counts those smaller transactions.\nThe following instructions are not counted:\n1) Instruction fetches,\n2) Cache maintenance instructions,\n3) Translation table walks or prefetches,\n4) Memory prefetch operations.\nThis event is a subset of the MEM_ACCESS event but the event only counts memory-Read operations."
+    },
+    {
+        "ArchStdEvent": "MEM_ACCESS_WR",
+        "PublicDescription": "This event counts memory accesses issued by the CPU due to Store operations. This event counts any memory Store access, no matter whether the data is located in any level of cache or external memory. This event also counts atomic Load and Store operations. If memory accesses are broken up by the Load/Store unit into smaller transactions that are issued by the bus interface, then the event counts those smaller transactions."
+    },
+    {
+        "ArchStdEvent": "LDST_ALIGN_LAT",
+        "PublicDescription": "This event counts the number of memory Read and Write accesses in a cycle that incurred additional latency due to the alignment of the address and the size of data being accessed, which results in a store crossing a single cache line.\nThis event is implemented as the sum of the following events on this CPU:\nLD_ALIGN_LAT and\nST_ALIGN_LAT."
+    },
+    {
+        "ArchStdEvent": "LD_ALIGN_LAT",
+        "PublicDescription": "This event counts the number of memory Read accesses in a cycle that incurred additional latency due to the alignment of the address and size of data being accessed, which results in a load crossing a single cache line."
+    },
+    {
+        "ArchStdEvent": "ST_ALIGN_LAT",
+        "PublicDescription": "This event counts the number of memory Write accesses in a cycle that incurred additional latency due to the alignment of the address and size of data being accessed."
+    },
+    {
+        "ArchStdEvent": "INST_FETCH_PERCYC",
+        "PublicDescription": "This event counts the number of instruction fetches outstanding per cycle, which can be used to compute the average latency of instruction fetches."
+    },
+    {
+        "ArchStdEvent": "MEM_ACCESS_RD_PERCYC",
+        "PublicDescription": "This event counts the number of outstanding Loads or memory Read accesses per cycle."
+    },
+    {
+        "ArchStdEvent": "INST_FETCH",
+        "PublicDescription": "This event counts instruction memory accesses that the PE makes."
+    }
+]
--- a/tools/perf/pmu-events/arch/arm64/nvidia/t410/metrics.json
+++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/metrics.json
@@ -0,0 +1,722 @@
+[
+    {
+        "MetricName": "backend_bound",
+        "MetricExpr": "100 * (STALL_SLOT_BACKEND / CPU_SLOT)",
+        "BriefDescription": "This metric is the percentage of total slots that were stalled due to resource constraints in the backend of the processor.",
+        "ScaleUnit": "1percent of slots",
+        "MetricGroup": "TopdownL1"
+    },
+    {
+        "MetricName": "backend_busy_bound",
+        "MetricExpr": "100 * (STALL_BACKEND_BUSY / STALL_BACKEND)",
+        "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to issue queues being too full to accept operations for execution.",
+        "ScaleUnit": "1percent of cycles",
+        "MetricGroup": "Topdown_Backend"
+    },
+    {
+        "MetricName": "backend_cache_l1d_bound",
+        "MetricExpr": "100 * (STALL_BACKEND_L1D / (STALL_BACKEND_L1D + STALL_BACKEND_MEM))",
+        "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to memory access latency issues caused by L1 D-cache misses.",
+        "ScaleUnit": "1percent of cycles",
+        "MetricGroup": "Topdown_Backend"
+    },
+    {
+        "MetricName": "backend_cache_l2d_bound",
+        "MetricExpr": "100 * (STALL_BACKEND_MEM / (STALL_BACKEND_L1D + STALL_BACKEND_MEM))",
+        "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to memory access latency issues caused by L2 D-cache misses.",
+        "ScaleUnit": "1percent of cycles",
+        "MetricGroup": "Topdown_Backend"
+    },
+    {
+        "MetricName": "backend_core_bound",
+        "MetricExpr": "100 * (STALL_BACKEND_CPUBOUND / STALL_BACKEND)",
+        "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to backend Core resource constraints not related to instruction fetch latency issues caused by memory access components.",
+        "ScaleUnit": "1percent of cycles",
+        "MetricGroup": "Topdown_Backend"
+    },
+    {
+        "MetricName": "backend_core_rename_bound",
+        "MetricExpr": "100 * (STALL_BACKEND_RENAME / STALL_BACKEND_CPUBOUND)",
+        "BriefDescription": "This metric is the percentage of total cycles stalled in the backend as the rename unit registers are unavailable.",
+        "ScaleUnit": "1percent of cycles",
+        "MetricGroup": "Topdown_Backend"
+    },
+    {
+        "MetricName": "backend_mem_bound",
+        "MetricExpr": "100 * (STALL_BACKEND_MEMBOUND / STALL_BACKEND)",
+        "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to backend Core resource constraints related to memory access latency issues caused by memory access components.",
+        "ScaleUnit": "1percent of cycles",
+        "MetricGroup": "Topdown_Backend"
+    },
+    {
+        "MetricName": "backend_mem_cache_bound",
+        "MetricExpr": "100 * ((STALL_BACKEND_L1D + STALL_BACKEND_MEM) / STALL_BACKEND_MEMBOUND)",
+        "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to memory latency issues caused by D-cache misses.",
+        "ScaleUnit": "1percent of cycles",
+        "MetricGroup": "Topdown_Backend"
+    },
+    {
+        "MetricName": "backend_mem_store_bound",
+        "MetricExpr": "100 * (STALL_BACKEND_ST / STALL_BACKEND_MEMBOUND)",
+        "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to memory Write pending caused by Stores stalled in the pre-commit stage.",
+        "ScaleUnit": "1percent of cycles",
+        "MetricGroup": "Topdown_Backend"
+    },
+    {
+        "MetricName": "backend_mem_tlb_bound",
+        "MetricExpr": "100 * (STALL_BACKEND_TLB / STALL_BACKEND_MEMBOUND)",
+        "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to memory access latency issues caused by Data TLB misses.",
+        "ScaleUnit": "1percent of cycles",
+        "MetricGroup": "Topdown_Backend"
+    },
+    {
+        "MetricName": "backend_stalled_cycles",
+        "MetricExpr": "100 * (STALL_BACKEND / CPU_CYCLES)",
+        "BriefDescription": "This metric is the percentage of cycles that were stalled due to resource constraints in the backend unit of the processor.",
+        "ScaleUnit": "1percent of cycles",
+        "MetricGroup": "Cycle_Accounting"
+    },
+    {
+        "MetricName": "bad_speculation",
+        "MetricExpr": "100 - (frontend_bound + retiring + backend_bound)",
+        "BriefDescription": "This metric is the percentage of total slots that executed operations and didn't retire due to a pipeline flush. This indicates cycles that were utilized but inefficiently.",
+        "ScaleUnit": "1percent of slots",
+        "MetricGroup": "TopdownL1"
+    },
+    {
+        "MetricName": "barrier_percentage",
+        "MetricExpr": "100 * ((ISB_SPEC + DSB_SPEC + DMB_SPEC) / INST_SPEC)",
+        "BriefDescription": "This metric measures instruction and data barrier operations as a percentage of operations speculatively executed.",
+        "ScaleUnit": "1percent of operations",
+        "MetricGroup": "Operation_Mix"
+    },
+    {
+        "MetricName": "branch_direct_ratio",
+        "MetricExpr": "BR_IMMED_RETIRED / BR_RETIRED",
+        "BriefDescription": "This metric measures the ratio of direct branches retired to the total number of branches architecturally executed.",
+        "ScaleUnit": "1per branch",
+        "MetricGroup": "Branch_Effectiveness"
+    },
+    {
+        "MetricName": "branch_indirect_ratio",
+        "MetricExpr": "BR_IND_RETIRED / BR_RETIRED",
+        "BriefDescription": "This metric measures the ratio of indirect branches retired, including function returns, to the total number of branches architecturally executed.",
+        "ScaleUnit": "1per branch",
+        "MetricGroup": "Branch_Effectiveness"
+    },
+    {
+        "MetricName": "branch_misprediction_ratio",
+        "MetricExpr": "BR_MIS_PRED_RETIRED / BR_RETIRED",
+        "BriefDescription": "This metric measures the ratio of branches mispredicted to the total number of branches architecturally executed. This gives an indication of the effectiveness of the branch prediction unit.",
+        "ScaleUnit": "1per branch",
+        "MetricGroup": "Miss_Ratio;Branch_Effectiveness"
+    },
+    {
+        "MetricName": "branch_mpki",
+        "MetricExpr": "1000 * (BR_MIS_PRED_RETIRED / INST_RETIRED)",
+        "BriefDescription": "This metric measures the number of branch mispredictions per thousand instructions executed.",
+        "ScaleUnit": "1MPKI",
+        "MetricGroup": "MPKI;Branch_Effectiveness"
+    },
+    {
+        "MetricName": "branch_percentage",
+        "MetricExpr": "100 * ((BR_IMMED_SPEC + BR_INDIRECT_SPEC) / INST_SPEC)",
+        "BriefDescription": "This metric measures branch operations as a percentage of operations speculatively executed.",
+        "ScaleUnit": "1percent of operations",
+        "MetricGroup": "Operation_Mix"
+    },
+    {
+        "MetricName": "branch_return_ratio",
+        "MetricExpr": "BR_RETURN_RETIRED / BR_RETIRED",
+        "BriefDescription": "This metric measures the ratio of branches retired that are function returns to the total number of branches architecturally executed.",
+        "ScaleUnit": "1per branch",
+        "MetricGroup": "Branch_Effectiveness"
+    },
+    {
+        "MetricName": "bus_bandwidth",
+        "MetricExpr": "BUS_ACCESS * 32 / duration_time",
+        "BriefDescription": "This metric measures the bus bandwidth of the data transferred between this PE's L2 and the uncore in the system.",
+        "ScaleUnit": "1Bytes/sec"
+    },
+    {
+        "MetricName": "cpu_cycles_fraction_in_st_mode",
+        "MetricExpr": "((CPU_SLOT/CPU_CYCLES) - 5) / 5",
+        "BriefDescription": "This metric counts the fraction of CPU cycles spent in ST mode during program execution.",
+        "ScaleUnit": "1fraction of cycles",
+        "MetricGroup": "SMT"
+    },
+    {
+        "MetricName": "cpu_cycles_in_smt_mode",
+        "MetricExpr": "(1 - cpu_cycles_fraction_in_st_mode) * CPU_CYCLES",
+        "BriefDescription": "This metric counts CPU cycles in SMT mode during program execution.",
+        "ScaleUnit": "1CPU cycles",
+        "MetricGroup": "SMT"
+    },
+    {
+        "MetricName": "cpu_cycles_in_st_mode",
+        "MetricExpr": "cpu_cycles_fraction_in_st_mode * CPU_CYCLES",
+        "BriefDescription": "This metric counts CPU cycles in ST mode during program execution.",
+        "ScaleUnit": "1CPU cycles",
+        "MetricGroup": "SMT"
+    },
+    {
+        "MetricName": "crypto_percentage",
+        "MetricExpr": "100 * (CRYPTO_SPEC / INST_SPEC)",
+        "BriefDescription": "This metric measures crypto operations as a percentage of operations speculatively executed.",
+        "ScaleUnit": "1percent of operations",
+        "MetricGroup": "Operation_Mix"
+    },
+    {
+        "MetricName": "dtlb_mpki",
+        "MetricExpr": "1000 * (DTLB_WALK / INST_RETIRED)",
+        "BriefDescription": "This metric measures the number of Data TLB Walks per thousand instructions executed.",
+        "ScaleUnit": "1MPKI",
+        "MetricGroup": "MPKI;DTLB_Effectiveness"
+    },
+    {
+        "MetricName": "dtlb_walk_average_latency",
+        "MetricExpr": "DTLB_WALK_PERCYC / DTLB_WALK",
+        "BriefDescription": "This metric measures the average latency of Data TLB walks in CPU cycles.",
+        "ScaleUnit": "1CPU cycles",
+        "MetricGroup": "Average_Latency"
+    },
+    {
+        "MetricName": "dtlb_walk_ratio",
+        "MetricExpr": "DTLB_WALK / L1D_TLB",
+        "BriefDescription": "This metric measures the ratio of Data TLB Walks to the total number of Data TLB accesses. This gives an indication of the effectiveness of the Data TLB accesses.",
+        "ScaleUnit": "1per TLB access",
+        "MetricGroup": "Miss_Ratio;DTLB_Effectiveness"
+    },
+    {
+        "MetricName": "fp16_percentage",
+        "MetricExpr": "100 * (FP_HP_SPEC / INST_SPEC)",
+        "BriefDescription": "This metric measures half-precision floating point operations as a percentage of operations speculatively executed.",
+        "ScaleUnit": "1percent of operations",
+        "MetricGroup": "FP_Precision_Mix"
+    },
+    {
+        "MetricName": "fp32_percentage",
+        "MetricExpr": "100 * (FP_SP_SPEC / INST_SPEC)",
+        "BriefDescription": "This metric measures single-precision floating point operations as a percentage of operations speculatively executed.",
+        "ScaleUnit": "1percent of operations",
+        "MetricGroup": "FP_Precision_Mix"
+    },
+    {
+        "MetricName": "fp64_percentage",
+        "MetricExpr": "100 * (FP_DP_SPEC / INST_SPEC)",
+        "BriefDescription": "This metric measures double-precision floating point operations as a percentage of operations speculatively executed.",
+        "ScaleUnit": "1percent of operations",
+        "MetricGroup": "FP_Precision_Mix"
+    },
+    {
+        "MetricName": "fp_ops_per_cycle",
+        "MetricExpr": "(FP_SCALE_OPS_SPEC + FP_FIXED_OPS_SPEC) / CPU_CYCLES",
+        "BriefDescription": "This metric measures floating point operations per cycle in any precision performed by any instruction. Operations are counted by computation and by vector lane; for example, fused computations such as multiply-add count twice per vector lane.",
+        "ScaleUnit": "1operations per cycle",
+        "MetricGroup": "FP_Arithmetic_Intensity"
+    },
+    {
+        "MetricName": "frontend_bound",
+        "MetricExpr": "100 * (STALL_SLOT_FRONTEND_WITHOUT_MISPRED / CPU_SLOT)",
+        "BriefDescription": "This metric is the percentage of total slots that were stalled due to resource constraints in the frontend of the processor.",
+        "ScaleUnit": "1percent of slots",
+        "MetricGroup": "TopdownL1"
+    },
+    {
+        "MetricName": "frontend_cache_l1i_bound",
+        "MetricExpr": "100 * (STALL_FRONTEND_L1I / (STALL_FRONTEND_L1I + STALL_FRONTEND_MEM))",
+        "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend due to memory access latency issues caused by L1 I-cache misses.",
+        "ScaleUnit": "1percent of cycles",
+        "MetricGroup": "Topdown_Frontend"
+    },
+    {
+        "MetricName": "frontend_cache_l2i_bound",
+        "MetricExpr": "100 * (STALL_FRONTEND_MEM / (STALL_FRONTEND_L1I + STALL_FRONTEND_MEM))",
+        "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend due to memory access latency issues caused by L2 I-cache misses.",
+        "ScaleUnit": "1percent of cycles",
+        "MetricGroup": "Topdown_Frontend"
+    },
+    {
+        "MetricName": "frontend_core_bound",
+        "MetricExpr": "100 * (STALL_FRONTEND_CPUBOUND / STALL_FRONTEND)",
+        "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend due to frontend Core resource constraints not related to instruction fetch latency issues caused by memory access components.",
+        "ScaleUnit": "1percent of cycles",
+        "MetricGroup": "Topdown_Frontend"
+    },
+    {
+        "MetricName": "frontend_core_flow_bound",
+        "MetricExpr": "100 * (STALL_FRONTEND_FLOW / STALL_FRONTEND_CPUBOUND)",
+        "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend as the decode unit is awaiting input from the branch prediction unit.",
+        "ScaleUnit": "1percent of cycles",
+        "MetricGroup": "Topdown_Frontend"
+    },
+    {
+        "MetricName": "frontend_core_flush_bound",
+        "MetricExpr": "100 * (STALL_FRONTEND_FLUSH / STALL_FRONTEND_CPUBOUND)",
+        "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend as the processor is recovering from a pipeline flush caused by bad speculation or other machine resteers.",
+        "ScaleUnit": "1percent of cycles",
+        "MetricGroup": "Topdown_Frontend"
+    },
+    {
+        "MetricName": "frontend_mem_bound",
+        "MetricExpr": "100 * (STALL_FRONTEND_MEMBOUND / STALL_FRONTEND)",
+        "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend due to frontend Core resource constraints related to the instruction fetch latency issues caused by memory access components.",
+        "ScaleUnit": "1percent of cycles",
+        "MetricGroup": "Topdown_Frontend"
+    },
+    {
+        "MetricName": "frontend_mem_cache_bound",
+        "MetricExpr": "100 * ((STALL_FRONTEND_L1I + STALL_FRONTEND_MEM) / STALL_FRONTEND_MEMBOUND)",
+        "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend due to instruction fetch latency issues caused by I-cache misses.",
+        "ScaleUnit": "1percent of cycles",
+        "MetricGroup": "Topdown_Frontend"
+    },
+    {
+        "MetricName": "frontend_mem_tlb_bound",
+        "MetricExpr": "100 * (STALL_FRONTEND_TLB / STALL_FRONTEND_MEMBOUND)",
+        "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend due to instruction fetch latency issues caused by Instruction TLB misses.",
+        "ScaleUnit": "1percent of cycles",
+        "MetricGroup": "Topdown_Frontend"
+    },
+    {
+        "MetricName": "frontend_stalled_cycles",
+        "MetricExpr": "100 * (STALL_FRONTEND / CPU_CYCLES)",
+        "BriefDescription": "This metric is the percentage of cycles that were stalled due to resource constraints in the frontend unit of the processor.",
+        "ScaleUnit": "1percent of cycles",
+        "MetricGroup": "Cycle_Accounting"
+    },
+    {
+        "MetricName": "instruction_fetch_average_latency",
+        "MetricExpr": "INST_FETCH_PERCYC / INST_FETCH",
+        "BriefDescription": "This metric measures the average latency of instruction fetches in CPU cycles.",
+        "ScaleUnit": "1CPU cycles",
+        "MetricGroup": "Average_Latency"
+    },
+    {
+        "MetricName": "integer_dp_percentage",
+        "MetricExpr": "100 * (DP_SPEC / INST_SPEC)",
+        "BriefDescription": "This metric measures scalar integer operations as a percentage of operations speculatively executed.",
+        "ScaleUnit": "1percent of operations",
+        "MetricGroup": "Operation_Mix"
+    },
+    {
+        "MetricName": "ipc",
+        "MetricExpr": "INST_RETIRED / CPU_CYCLES",
+        "BriefDescription": "This metric measures the number of instructions retired per cycle.",
+        "ScaleUnit": "1per cycle",
+        "MetricGroup": "General"
+    },
+    {
+        "MetricName": "itlb_mpki",
+        "MetricExpr": "1000 * (ITLB_WALK / INST_RETIRED)",
+        "BriefDescription": "This metric measures the number of instruction TLB Walks per thousand instructions executed.",
+        "ScaleUnit": "1MPKI",
+        "MetricGroup": "MPKI;ITLB_Effectiveness"
+    },
+    {
+        "MetricName": "itlb_walk_average_latency",
+        "MetricExpr": "ITLB_WALK_PERCYC / ITLB_WALK",
+        "BriefDescription": "This metric measures the average latency of instruction TLB walks in CPU cycles.",
+        "ScaleUnit": "1CPU cycles",
+        "MetricGroup": "Average_Latency"
+    },
+    {
+        "MetricName": "itlb_walk_ratio",
+        "MetricExpr": "ITLB_WALK / L1I_TLB",
+        "BriefDescription": "This metric measures the ratio of instruction TLB Walks to the total number of Instruction TLB accesses. This gives an indication of the effectiveness of the Instruction TLB accesses.",
+        "ScaleUnit": "1per TLB access",
+        "MetricGroup": "Miss_Ratio;ITLB_Effectiveness"
+    },
+    {
+        "MetricName": "l1d_cache_miss_ratio",
+        "MetricExpr": "L1D_CACHE_REFILL / L1D_CACHE",
+        "BriefDescription": "This metric measures the ratio of L1 D-cache accesses missed to the total number of L1 D-cache accesses. This gives an indication of the effectiveness of the L1 D-cache.",
+        "ScaleUnit": "1per cache access",
+        "MetricGroup": "Miss_Ratio;L1D_Cache_Effectiveness"
+    },
+    {
+        "MetricName": "l1d_cache_mpki",
+        "MetricExpr": "1000 * (L1D_CACHE_REFILL / INST_RETIRED)",
+        "BriefDescription": "This metric measures the number of L1 D-cache accesses missed per thousand instructions executed.",
+        "ScaleUnit": "1MPKI",
+        "MetricGroup": "MPKI;L1D_Cache_Effectiveness"
+    },
+    {
+        "MetricName": "l1d_cache_rw_miss_ratio",
+        "MetricExpr": "l1d_demand_misses / l1d_demand_accesses",
+        "BriefDescription": "This metric measures the ratio of L1 D-cache Read accesses missed to the total number of L1 D-cache accesses. This gives an indication of the effectiveness of the L1 D-cache for demand Load or Store traffic.",
+        "ScaleUnit": "1per cache access",
+        "MetricGroup": "L1D_Prefetcher_Effectiveness"
+    },
+    {
+        "MetricName": "l1d_demand_accesses",
+        "MetricExpr": "L1D_CACHE_RW",
+        "BriefDescription": "This metric measures the count of L1 D-cache accesses incurred on Load or Store by the instruction stream of the program.",
+        "ScaleUnit": "1count",
+        "MetricGroup": "L1D_Prefetcher_Effectiveness"
+    },
+    {
+        "MetricName": "l1d_demand_misses",
+        "MetricExpr": "L1D_CACHE_REFILL_RW",
+        "BriefDescription": "This metric measures the count of L1 D-cache misses incurred on a Load or Store by the instruction stream of the program.",
+        "ScaleUnit": "1count",
+        "MetricGroup": "L1D_Prefetcher_Effectiveness"
+    },
+    {
+        "MetricName": "l1d_prf_accuracy",
+        "MetricExpr": "100 * (l1d_useful_prf / l1d_refilled_prf)",
+        "BriefDescription": "This metric measures the fraction of prefetched memory addresses that are used by the instruction stream.",
+        "ScaleUnit": "1percent of prefetch",
+        "MetricGroup": "L1D_Prefetcher_Effectiveness"
+    },
+    {
+        "MetricName": "l1d_prf_coverage",
+        "MetricExpr": "100 * (l1d_useful_prf / (l1d_demand_misses + l1d_refilled_prf))",
+        "BriefDescription": "This metric measures the baseline demand cache misses which the prefetcher brings into the cache.",
+        "ScaleUnit": "1percent of cache access",
+        "MetricGroup": "L1D_Prefetcher_Effectiveness"
+    },
+    {
+        "MetricName": "l1d_refilled_prf",
+        "MetricExpr": "L1D_CACHE_REFILL_HWPRF + L1D_CACHE_REFILL_PRFM + L1D_LFB_HIT_RW_FHWPRF + L1D_LFB_HIT_RW_FPRFM",
+        "BriefDescription": "This metric measures the count of cache lines refilled by L1 data prefetcher (hardware prefetches or software preload) into L1 D-cache.",
+        "ScaleUnit": "1count",
+        "MetricGroup": "L1D_Prefetcher_Effectiveness"
+    },
+    {
+        "MetricName": "l1d_tlb_miss_ratio",
+        "MetricExpr": "L1D_TLB_REFILL / L1D_TLB",
+        "BriefDescription": "This metric measures the ratio of L1 Data TLB accesses missed to the total number of L1 Data TLB accesses. This gives an indication of the effectiveness of the L1 Data TLB.",
+        "ScaleUnit": "1per TLB access",
+        "MetricGroup": "Miss_Ratio;DTLB_Effectiveness"
+    },
+    {
+        "MetricName": "l1d_tlb_mpki",
+        "MetricExpr": "1000 * (L1D_TLB_REFILL / INST_RETIRED)",
+        "BriefDescription": "This metric measures the number of L1 Data TLB accesses missed per thousand instructions executed.",
+        "ScaleUnit": "1MPKI",
+        "MetricGroup": "MPKI;DTLB_Effectiveness"
+    },
+    {
+        "MetricName": "l1d_useful_prf",
+        "MetricExpr": "L1D_CACHE_HIT_RW_FPRF + L1D_LFB_HIT_RW_FHWPRF + L1D_LFB_HIT_RW_FPRFM",
+        "BriefDescription": "This metric measures the count of cache lines refilled by L1 data prefetcher (hardware prefetches or software preload) into L1 D-cache which are further used by Load or Store from the instruction stream of the program.",
+        "ScaleUnit": "1count",
+        "MetricGroup": "L1D_Prefetcher_Effectiveness"
+    },
+    {
+        "MetricName": "l1i_cache_miss_ratio",
+        "MetricExpr": "L1I_CACHE_REFILL / L1I_CACHE",
+        "BriefDescription": "This metric measures the ratio of L1 I-cache accesses missed to the total number of L1 I-cache accesses. This gives an indication of the effectiveness of the L1 I-cache.",
+        "ScaleUnit": "1per cache access",
+        "MetricGroup": "Miss_Ratio;L1I_Cache_Effectiveness"
+    },
+    {
+        "MetricName": "l1i_cache_mpki",
+        "MetricExpr": "1000 * (L1I_CACHE_REFILL / INST_RETIRED)",
+        "BriefDescription": "This metric measures the number of L1 I-cache accesses missed per thousand instructions executed.",
+        "ScaleUnit": "1MPKI",
+        "MetricGroup": "MPKI;L1I_Cache_Effectiveness"
+    },
+    {
+        "MetricName": "l1i_cache_rd_miss_ratio",
+        "MetricExpr": "l1i_demand_misses / l1i_demand_accesses",
+        "BriefDescription": "This metric measures the ratio of L1 I-cache Read accesses missed to the total number of L1 I-cache accesses. This gives an indication of the effectiveness of the L1 I-cache for demand instruction fetch traffic. Note that cache accesses in this cache are demand instruction fetch.",
+        "ScaleUnit": "1per cache access",
+        "MetricGroup": "L1I_Prefetcher_Effectiveness"
+    },
+    {
+        "MetricName": "l1i_demand_accesses",
+        "MetricExpr": "L1I_CACHE_RD",
+        "BriefDescription": "This metric measures the count of L1 I-cache accesses caused by an instruction fetch by the instruction stream of the program.",
+        "ScaleUnit": "1count",
+        "MetricGroup": "L1I_Prefetcher_Effectiveness"
+    },
+    {
+        "MetricName": "l1i_demand_misses",
+        "MetricExpr": "L1I_CACHE_REFILL_RD",
+        "BriefDescription": "This metric measures the count of L1 I-cache misses caused by an instruction fetch by the instruction stream of the program.",
+        "ScaleUnit": "1count",
+        "MetricGroup": "L1I_Prefetcher_Effectiveness"
+    },
+    {
+        "MetricName": "l1i_prf_accuracy",
+        "MetricExpr": "100 * (l1i_useful_prf / l1i_refilled_prf)",
+        "BriefDescription": "This metric measures the fraction of prefetched memory addresses that are used by the instruction stream.",
+        "ScaleUnit": "1percent of prefetch",
+        "MetricGroup": "L1I_Prefetcher_Effectiveness"
+    },
+    {
+        "MetricName": "l1i_prf_coverage",
+        "MetricExpr": "100 * (l1i_useful_prf / (l1i_demand_misses + l1i_refilled_prf))",
+        "BriefDescription": "This metric measures the baseline demand cache misses which the prefetcher brings into the cache.",
+        "ScaleUnit": "1percent of cache access",
+        "MetricGroup": "L1I_Prefetcher_Effectiveness"
+    },
+    {
+        "MetricName": "l1i_refilled_prf",
+        "MetricExpr": "L1I_CACHE_REFILL_HWPRF + L1I_CACHE_REFILL_PRFM",
+        "BriefDescription": "This metric measures the count of cache lines refilled by L1 instruction prefetcher (hardware prefetches or software preload) into L1 I-cache.",
+        "ScaleUnit": "1count",
+        "MetricGroup": "L1I_Prefetcher_Effectiveness"
+    },
+    {
+        "MetricName": "l1i_tlb_miss_ratio",
+        "MetricExpr": "L1I_TLB_REFILL / L1I_TLB",
+        "BriefDescription": "This metric measures the ratio of L1 Instruction TLB accesses missed to the total number of L1 Instruction TLB accesses. This gives an indication of the effectiveness of the L1 Instruction TLB.",
+        "ScaleUnit": "1per TLB access",
+        "MetricGroup": "Miss_Ratio;ITLB_Effectiveness"
+    },
+    {
+        "MetricName": "l1i_tlb_mpki",
+        "MetricExpr": "1000 * (L1I_TLB_REFILL / INST_RETIRED)",
+        "BriefDescription": "This metric measures the number of L1 Instruction TLB accesses missed per thousand instructions executed.",
+        "ScaleUnit": "1MPKI",
+        "MetricGroup": "MPKI;ITLB_Effectiveness"
+    },
+    {
+        "MetricName": "l1i_useful_prf",
+        "MetricExpr": "L1I_CACHE_HIT_RD_FPRF",
+        "BriefDescription": "This metric measures the count of cache lines refilled by L1 instruction prefetcher (hardware prefetches or software preload) into L1 I-cache which are further used by instruction stream of the program.",
+        "ScaleUnit": "1count",
+        "MetricGroup": "L1I_Prefetcher_Effectiveness"
+    },
+    {
+        "MetricName": "l2_cache_miss_ratio",
+        "MetricExpr": "L2D_CACHE_REFILL / L2D_CACHE",
+        "BriefDescription": "This metric measures the ratio of L2 cache accesses missed to the total number of L2 cache accesses. This gives an indication of the effectiveness of the L2 cache, which is a unified cache that stores both data and instruction.\nNote that cache accesses in this cache are either data memory access or instruction fetch as this is a unified cache.",
+        "ScaleUnit": "1per cache access",
+        "MetricGroup": "Miss_Ratio;L2_Cache_Effectiveness"
+    },
+    {
+        "MetricName": "l2_cache_mpki",
+        "MetricExpr": "1000 * (l2d_demand_misses / INST_RETIRED)",
+        "BriefDescription": "This metric measures the number of L2 unified cache accesses missed per thousand instructions executed.\nNote that cache accesses in this cache are either data memory access or instruction fetch as this is a unified cache.",
+        "ScaleUnit": "1MPKI",
+        "MetricGroup": "MPKI;L2_Cache_Effectiveness"
+    },
+    {
+        "MetricName": "l2_tlb_miss_ratio",
+        "MetricExpr": "L2D_TLB_REFILL / L2D_TLB",
+        "BriefDescription": "This metric measures the ratio of L2 unified TLB accesses missed to the total number of L2 unified TLB accesses.\nThis gives an indication of the effectiveness of the L2 TLB.",
+        "ScaleUnit": "1per TLB access",
+        "MetricGroup": "Miss_Ratio;ITLB_Effectiveness;DTLB_Effectiveness"
+    },
+    {
+        "MetricName": "l2_tlb_mpki",
+        "MetricExpr": "1000 * (L2D_TLB_REFILL / INST_RETIRED)",
+        "BriefDescription": "This metric measures the number of L2 unified TLB accesses missed per thousand instructions executed.",
+        "ScaleUnit": "1MPKI",
+        "MetricGroup": "MPKI;ITLB_Effectiveness;DTLB_Effectiveness"
+    },
+    {
+        "MetricName": "l2d_cache_rwl1prf_miss_ratio",
+        "MetricExpr": "l2d_demand_misses / l2d_demand_accesses",
+        "BriefDescription": "This metric measures the ratio of L2 D-cache Read accesses missed to the total number of L2 D-cache accesses.\nThis gives an indication of the effectiveness of the L2 D-cache for demand instruction fetch, Load, Store, or L1 prefetcher accesses traffic.",
+        "ScaleUnit": "1per cache access",
+        "MetricGroup": "L2_Prefetcher_Effectiveness"
+    },
+    {
+        "MetricName": "l2d_demand_accesses",
+        "MetricExpr": "L2D_CACHE_RD + L2D_CACHE_WR + L2D_CACHE_L1PRF",
+        "BriefDescription": "This metric measures the count of L2 D-cache accesses incurred on an instruction fetch, Load, Store, or L1 prefetcher accesses by the instruction stream of the program.",
+        "ScaleUnit": "1count",
+        "MetricGroup": "L2_Prefetcher_Effectiveness"
+    },
+    {
+        "MetricName": "l2d_demand_misses",
+        "MetricExpr": "L2D_CACHE_REFILL_RD + L2D_CACHE_REFILL_WR + L2D_CACHE_REFILL_L1PRF",
+        "BriefDescription": "This metric measures the count of L2 D-cache misses incurred on an instruction fetch, Load, Store, or L1 prefetcher accesses by the instruction stream of the program.",
+        "ScaleUnit": "1count",
+        "MetricGroup": "L2_Prefetcher_Effectiveness"
+    },
+    {
+        "MetricName": "l2d_prf_accuracy",
+        "MetricExpr": "100 * (l2d_useful_prf / l2d_refilled_prf)",
+        "BriefDescription": "This metric measures the fraction of prefetched memory addresses that are used by the instruction stream.",
+        "ScaleUnit": "1percent of prefetch",
+        "MetricGroup": "L2_Prefetcher_Effectiveness"
+    },
+    {
+        "MetricName": "l2d_prf_coverage",
+        "MetricExpr": "100 * (l2d_useful_prf / (l2d_demand_misses + l2d_refilled_prf))",
+        "BriefDescription": "This metric measures the baseline demand cache misses which the prefetcher brings into the cache.",
+        "ScaleUnit": "1percent of cache access",
+        "MetricGroup": "L2_Prefetcher_Effectiveness"
+    },
+    {
+        "MetricName": "l2d_refilled_prf",
+        "MetricExpr": "(L2D_CACHE_REFILL_PRF - L2D_CACHE_REFILL_L1PRF) + L2D_LFB_HIT_RWL1PRF_FHWPRF",
+        "BriefDescription": "This metric measures the count of cache lines refilled by L2 data prefetcher (hardware prefetches or software preload) into L2 D-cache.",
+        "ScaleUnit": "1count",
+        "MetricGroup": "L2_Prefetcher_Effectiveness"
+    },
+    {
+        "MetricName": "l2d_useful_prf",
+        "MetricExpr": "L2D_CACHE_HIT_RWL1PRF_FPRF + L2D_LFB_HIT_RWL1PRF_FHWPRF",
+        "BriefDescription": "This metric measures the count of cache lines refilled by L2 data prefetcher (hardware prefetches or software preload) into L2 D-cache which are further used by instruction fetch, Load, Store, or L1 prefetcher accesses from the instruction stream of the program.",
+        "ScaleUnit": "1count",
+        "MetricGroup": "L2_Prefetcher_Effectiveness"
+    },
+    {
+        "MetricName": "l3d_cache_rwl1prfl2prf_miss_ratio",
+        "MetricExpr": "l3d_demand_misses / l3d_demand_accesses",
+        "BriefDescription": "This metric measures the ratio of L3 D-cache Read accesses missed to the total number of L3 D-cache accesses. This gives an indication of the effectiveness of the L3 D-cache for demand instruction fetch, Load, Store, L1 prefetcher, or L2 prefetcher accesses traffic.",
+        "ScaleUnit": "1per cache access",
+        "MetricGroup": "L3_Prefetcher_Effectiveness"
+    },
+    {
+        "MetricName": "l3d_demand_accesses",
+        "MetricExpr": "L3D_CACHE_RWL1PRFL2PRF",
+        "BriefDescription": "This metric measures the count of L3 D-cache accesses incurred on an instruction fetch, Load, Store, L1 prefetcher, or L2 prefetcher accesses by the instruction stream of the program.",
+        "ScaleUnit": "1count",
+        "MetricGroup": "L3_Prefetcher_Effectiveness"
+    },
+    {
+        "MetricName": "l3d_demand_misses",
+        "MetricExpr": "L3D_CACHE_REFILL_RWL1PRFL2PRF",
+        "BriefDescription": "This metric measures the count of L3 D-cache misses incurred on an instruction fetch, Load, Store, L1 prefetcher, or L2 prefetcher accesses by the instruction stream of the program.",
+        "ScaleUnit": "1count",
+        "MetricGroup": "L3_Prefetcher_Effectiveness"
+    },
+    {
+        "MetricName": "l3d_prf_accuracy",
+        "MetricExpr": "100 * (l3d_useful_prf / l3d_refilled_prf)",
+        "BriefDescription": "This metric measures the fraction of prefetched memory addresses that are used by the instruction stream.",
+        "ScaleUnit": "1percent of prefetch",
+        "MetricGroup": "L3_Prefetcher_Effectiveness"
+    },
+    {
+        "MetricName": "l3d_prf_coverage",
+        "MetricExpr": "100 * (l3d_useful_prf / (l3d_demand_misses + l3d_refilled_prf))",
+        "BriefDescription": "This metric measures the baseline demand cache misses which the prefetcher brings into the cache.",
+        "ScaleUnit": "1percent of cache access",
+        "MetricGroup": "L3_Prefetcher_Effectiveness"
+    },
+    {
+        "MetricName": "l3d_refilled_prf",
+        "MetricExpr": "L3D_CACHE_REFILL_HWPRF + L3D_CACHE_REFILL_PRFM - L3D_CACHE_REFILL_L1PRF - L3D_CACHE_REFILL_L2PRF",
+        "BriefDescription": "This metric measures the count of cache lines refilled by L3 data prefetcher (hardware prefetches or software preload) into L3 D-cache.",
+        "ScaleUnit": "1count",
+        "MetricGroup": "L3_Prefetcher_Effectiveness"
+    },
+    {
+        "MetricName": "l3d_useful_prf",
+        "MetricExpr": "L3D_CACHE_HIT_RWL1PRFL2PRF_FPRF",
+        "BriefDescription": "This metric measures the count of cache lines refilled by L3 data prefetcher (hardware prefetches or software preload) into L3 D-cache which are further used by instruction fetch, Load, Store, L1 prefetcher, or L2 prefetcher accesses from the instruction stream of the program.",
+        "ScaleUnit": "1count",
+        "MetricGroup": "L3_Prefetcher_Effectiveness"
+    },
+    {
+        "MetricName": "ll_cache_read_hit_ratio",
+        "MetricExpr": "(LL_CACHE_RD - LL_CACHE_MISS_RD) / LL_CACHE_RD",
+        "BriefDescription": "This metric measures the ratio of last level cache Read accesses hit in the cache to the total number of last level cache accesses. This gives an indication of the effectiveness of the last level cache for Read traffic. Note that cache accesses in this cache are either data memory access or instruction fetch as this is a system level cache.",
+        "ScaleUnit": "1per cache access",
+        "MetricGroup": "LL_Cache_Effectiveness"
+    },
+    {
+        "MetricName": "ll_cache_read_miss_ratio",
+        "MetricExpr": "LL_CACHE_MISS_RD / LL_CACHE_RD",
+        "BriefDescription": "This metric measures the ratio of last level cache Read accesses missed to the total number of last level cache accesses. This gives an indication of the effectiveness of the last level cache for Read traffic. Note that cache accesses in this cache are either data memory access or instruction fetch as this is a system level cache.",
+        "ScaleUnit": "1per cache access",
+        "MetricGroup": "Miss_Ratio;LL_Cache_Effectiveness"
+    },
+    {
+        "MetricName": "ll_cache_read_mpki",
+        "MetricExpr": "1000 * (LL_CACHE_MISS_RD / INST_RETIRED)",
+        "BriefDescription": "This metric measures the number of last level cache Read accesses missed per thousand instructions executed.",
+        "ScaleUnit": "1MPKI",
+        "MetricGroup": "MPKI;LL_Cache_Effectiveness"
+    },
+    {
+        "MetricName": "load_average_latency",
+        "MetricExpr": "MEM_ACCESS_RD_PERCYC / MEM_ACCESS_RD",
+        "BriefDescription": "This metric measures the average latency of Load operations in CPU cycles.",
+        "ScaleUnit": "1CPU cycles",
+        "MetricGroup": "Average_Latency"
+    },
+    {
+        "MetricName": "load_percentage",
+        "MetricExpr": "100 * (LD_SPEC / INST_SPEC)",
+        "BriefDescription": "This metric measures Load operations as a percentage of operations speculatively executed.",
+        "ScaleUnit": "1percent of operations",
+        "MetricGroup": "Operation_Mix"
+    },
+    {
+        "MetricName": "nonsve_fp_ops_per_cycle",
+        "MetricExpr": "FP_FIXED_OPS_SPEC / CPU_CYCLES",
+        "BriefDescription": "This metric measures floating point operations per cycle in any precision performed by an instruction that is not an SVE instruction. Operations are counted by computation and by vector lanes, fused computations such as multiply-add count as twice per vector lane for example.",
+        "ScaleUnit": "1operations per cycle",
+        "MetricGroup": "FP_Arithmetic_Intensity"
+    },
+    {
+        "MetricName": "retiring",
+        "MetricExpr": "100 * ((OP_RETIRED/OP_SPEC) * (1 - (STALL_SLOT/CPU_SLOT)))",
+        "BriefDescription": "This metric is the percentage of total slots that retired operations, which indicates cycles that were utilized efficiently.",
+        "ScaleUnit": "1percent of slots",
+        "MetricGroup": "TopdownL1"
+    },
+    {
+        "MetricName": "scalar_fp_percentage",
+        "MetricExpr": "100 * (VFP_SPEC / INST_SPEC)",
+        "BriefDescription": "This metric measures scalar floating point operations as a percentage of operations speculatively executed.",
+        "ScaleUnit": "1percent of operations",
+        "MetricGroup": "Operation_Mix"
+    },
+    {
+        "MetricName": "simd_percentage",
+        "MetricExpr": "100 * (ASE_SPEC / INST_SPEC)",
+        "BriefDescription": "This metric measures advanced SIMD operations as a percentage of total operations speculatively executed.",
+        "ScaleUnit": "1percent of operations",
+        "MetricGroup": "Operation_Mix"
+    },
+    {
+        "MetricName": "store_percentage",
+        "MetricExpr": "100 * (ST_SPEC / INST_SPEC)",
+        "BriefDescription": "This metric measures Store operations as a percentage of operations speculatively executed.",
+        "ScaleUnit": "1percent of operations",
+        "MetricGroup": "Operation_Mix"
+    },
+    {
+        "MetricName": "sve_all_percentage",
+        "MetricExpr": "100 * (SVE_INST_SPEC / INST_SPEC)",
+        "BriefDescription": "This metric measures scalable vector operations, including Loads and Stores, as a percentage of operations speculatively executed.",
+        "ScaleUnit": "1percent of operations",
+        "MetricGroup": "Operation_Mix"
+    },
+    {
+        "MetricName": "sve_fp_ops_per_cycle",
+        "MetricExpr": "FP_SCALE_OPS_SPEC / CPU_CYCLES",
+        "BriefDescription": "This metric measures floating point operations per cycle in any precision performed by SVE instructions. Operations are counted by computation and by vector lanes, fused computations such as multiply-add count as twice per vector lane for example.",
+        "ScaleUnit": "1operations per cycle",
+        "MetricGroup": "FP_Arithmetic_Intensity"
+    },
+    {
+        "MetricName": "sve_predicate_empty_percentage",
+        "MetricExpr": "100 * (SVE_PRED_EMPTY_SPEC / SVE_PRED_SPEC)",
+        "BriefDescription": "This metric measures scalable vector operations with no active predicates as a percentage of SVE predicated operations speculatively executed.",
+        "ScaleUnit": "1percent of SVE predicated operations",
+        "MetricGroup": "SVE_Effectiveness"
+    },
+    {
+        "MetricName": "sve_predicate_full_percentage",
+        "MetricExpr": "100 * (SVE_PRED_FULL_SPEC / SVE_PRED_SPEC)",
+        "BriefDescription": "This metric measures scalable vector operations with all active predicates as a percentage of SVE predicated operations speculatively executed.",
+        "ScaleUnit": "1percent of SVE predicated operations",
+        "MetricGroup": "SVE_Effectiveness"
+    },
+    {
+        "MetricName": "sve_predicate_partial_percentage",
+        "MetricExpr": "100 * (SVE_PRED_PARTIAL_SPEC / SVE_PRED_SPEC)",
+        "BriefDescription": "This metric measures scalable vector operations with at least one active predicate as a percentage of SVE predicated operations speculatively executed.",
+        "ScaleUnit": "1percent of SVE predicated operations",
+        "MetricGroup": "SVE_Effectiveness"
+    },
+    {
+        "MetricName": "sve_predicate_percentage",
+        "MetricExpr": "100 * (SVE_PRED_SPEC / INST_SPEC)",
+        "BriefDescription": "This metric measures scalable vector operations with predicates as a percentage of operations speculatively executed.",
+        "ScaleUnit": "1percent of operations",
+        "MetricGroup": "SVE_Effectiveness"
+    }
+]
--- a/tools/perf/pmu-events/arch/arm64/nvidia/t410/misc.json
+++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/misc.json
@@ -0,0 +1,642 @@
+[
+    {
+        "ArchStdEvent": "SW_INCR",
+        "PublicDescription": "This event counts software writes to the PMSWINC_EL0 (software PMU increment) register. The PMSWINC_EL0 register is a manually updated counter for use by application software.\nThis event could be used to measure any user program event, such as accesses to a particular data structure (by writing to the PMSWINC_EL0 register each time the data structure is accessed).\nTo use the PMSWINC_EL0 register and event, developers must insert instructions that write to the PMSWINC_EL0 register into the source code.\nSince the SW_INCR event records writes to the PMSWINC_EL0 register, there is no need to do a Read/Increment/Write sequence to the PMSWINC_EL0 register."
+    },
+    {
+        "ArchStdEvent": "TRB_WRAP",
+        "PublicDescription": "This event is generated each time the trace buffer current Write pointer is wrapped to the trace buffer base pointer."
+    },
+    {
+        "ArchStdEvent": "TRCEXTOUT0",
+        "PublicDescription": "Trace unit external output 0."
+    },
+    {
+        "ArchStdEvent": "TRCEXTOUT1",
+        "PublicDescription": "Trace unit external output 1."
+    },
+    {
+        "ArchStdEvent": "TRCEXTOUT2",
+        "PublicDescription": "Trace unit external output 2."
+    },
+    {
+        "ArchStdEvent": "TRCEXTOUT3",
+        "PublicDescription": "Trace unit external output 3."
+    },
+    {
+        "ArchStdEvent": "CTI_TRIGOUT4",
+        "PublicDescription": "Cross-trigger Interface output trigger 4."
+    },
+    {
+        "ArchStdEvent": "CTI_TRIGOUT5",
+        "PublicDescription": "Cross-trigger Interface output trigger 5."
+    },
+    {
+        "ArchStdEvent": "CTI_TRIGOUT6",
+        "PublicDescription": "Cross-trigger Interface output trigger 6."
+    },
+    {
+        "ArchStdEvent": "CTI_TRIGOUT7",
+        "PublicDescription": "Cross-trigger Interface output trigger 7."
+    },
+    {
+        "EventCode": "0x00e1",
+        "EventName": "L1I_PRFM_REQ_DROP",
+        "PublicDescription": "L1 I-cache software prefetch dropped."
+    },
+    {
+        "EventCode": "0x0100",
+        "EventName": "L1_PF_REFILL",
+        "PublicDescription": "L1 prefetch requests, refilled to L1 cache."
+    },
+    {
+        "EventCode": "0x0120",
+        "EventName": "FLUSH",
+        "PublicDescription": "This event counts both CT flushes and BX flushes. BR_MIS_PRED counts the BX flushes, so FLUSH - BR_MIS_PRED gives the number of CT flushes."
+    },
+    {
+        "EventCode": "0x0121",
+        "EventName": "FLUSH_MEM",
+        "PublicDescription": "Flushes due to memory hazards. This only includes CT flushes."
+    },
+    {
+        "EventCode": "0x0122",
+        "EventName": "FLUSH_BAD_BRANCH",
+        "PublicDescription": "Flushes due to a mispredicted branch. This only includes CT flushes."
+    },
+    {
+        "EventCode": "0x0123",
+        "EventName": "FLUSH_STDBYPASS",
+        "PublicDescription": "Flushes due to bad predecode. This only includes CT flushes."
+    },
+    {
+        "EventCode": "0x0124",
+        "EventName": "FLUSH_ISB",
+        "PublicDescription": "Flushes due to ISB or similar side-effects. This only includes CT flushes."
+    },
+    {
+        "EventCode": "0x0125",
+        "EventName": "FLUSH_OTHER",
+        "PublicDescription": "Flushes due to other hazards. This only includes CT flushes."
+    },
+    {
+        "EventCode": "0x0126",
+        "EventName": "STORE_STREAM",
+        "PublicDescription": "Stored lines in streaming no-Write-allocate mode."
+    },
+    {
+        "EventCode": "0x0127",
+        "EventName": "NUKE_RAR",
+        "PublicDescription": "Load/Store nuke due to Read-after-Read ordering hazard."
+    },
+    {
+        "EventCode": "0x0128",
+        "EventName": "NUKE_RAW",
+        "PublicDescription": "Load/Store nuke due to Read-after-Write ordering hazard."
+    },
+    {
+        "EventCode": "0x0129",
+        "EventName": "L1_PF_GEN_PAGE",
+        "PublicDescription": "Load/Store prefetch to L1 generated, Page mode."
+    },
+    {
+        "EventCode": "0x012a",
+        "EventName": "L1_PF_GEN_STRIDE",
+        "PublicDescription": "Load/Store prefetch to L1 generated, stride mode."
+    },
+    {
+        "EventCode": "0x012b",
+        "EventName": "L2_PF_GEN_LD",
+        "PublicDescription": "Load prefetch to L2 generated."
+    },
+    {
+        "EventCode": "0x012d",
+        "EventName": "LS_PF_TRAIN_TABLE_ALLOC",
+        "PublicDescription": "LS prefetch train table entry allocated."
+    },
+    {
+        "EventCode": "0x0130",
+        "EventName": "LS_PF_GEN_TABLE_ALLOC",
+        "PublicDescription": "This event counts the number of cycles with at least one table allocation, for L2 hardware prefetches (including the software PRFM instructions that are converted into hardware prefetches due to D-TLB miss).\nLS prefetch gen table allocation (for L2 prefetches)."
+    },
+    {
+        "EventCode": "0x0131",
+        "EventName": "LS_PF_GEN_TABLE_ALLOC_PF_PEND",
+        "PublicDescription": "This event counts the number of cycles in which at least one hardware prefetch is dropped due to the inability to identify a victim when the generation table is full. The hardware prefetch considered here includes the software PRFM that is converted into hardware prefetches due to D-TLB miss."
+    },
+    {
+        "EventCode": "0x0132",
+        "EventName": "TBW",
+        "PublicDescription": "Tablewalks."
+    },
+    {
+        "EventCode": "0x0134",
+        "EventName": "S1L2_HIT",
+        "PublicDescription": "Translation cache hit on S1L2 walk cache entry."
+    },
+    {
+        "EventCode": "0x0135",
+        "EventName": "S1L1_HIT",
+        "PublicDescription": "Translation cache hit on S1L1 walk cache entry."
+    },
+    {
+        "EventCode": "0x0136",
+        "EventName": "S1L0_HIT",
+        "PublicDescription": "Translation cache hit on S1L0 walk cache entry."
+    },
+    {
+        "EventCode": "0x0137",
+        "EventName": "S2L2_HIT",
+        "PublicDescription": "Translation cache hit for S2L2 IPA walk cache entry."
+    },
+    {
+        "EventCode": "0x0138",
+        "EventName": "IPA_REQ",
+        "PublicDescription": "Translation cache lookups for IPA to PA entries."
+    },
+    {
+        "EventCode": "0x0139",
+        "EventName": "IPA_REFILL",
+        "PublicDescription": "Translation cache refills for IPA to PA entries."
+    },
+    {
+        "EventCode": "0x013a",
+        "EventName": "S1_FLT",
+        "PublicDescription": "Stage1 tablewalk fault."
+    },
+    {
+        "EventCode": "0x013b",
+        "EventName": "S2_FLT",
+        "PublicDescription": "Stage2 tablewalk fault."
+    },
+    {
+        "EventCode": "0x013c",
+        "EventName": "COLT_REFILL",
+        "PublicDescription": "Aggregated page refill."
+    },
+    {
+        "EventCode": "0x0145",
+        "EventName": "L1_PF_HIT",
+        "PublicDescription": "L1 prefetch requests, hitting in L1 cache."
+    },
+    {
+        "EventCode": "0x0146",
+        "EventName": "L1_PF",
+        "PublicDescription": "L1 prefetch requests."
+    },
+    {
+        "EventCode": "0x0147",
+        "EventName": "CACHE_LS_REFILL",
+        "PublicDescription": "L2 D-cache refill, Load/Store."
+    },
+    {
+        "EventCode": "0x0148",
+        "EventName": "CACHE_PF",
+        "PublicDescription": "L2 prefetch requests."
+    },
+    {
+        "EventCode": "0x0149",
+        "EventName": "CACHE_PF_HIT",
+        "PublicDescription": "L2 prefetch requests, hitting in L2 cache."
+    },
+    {
+        "EventCode": "0x0150",
+        "EventName": "UNUSED_PF",
+        "PublicDescription": "L2 unused prefetch."
+    },
+    {
+        "EventCode": "0x0151",
+        "EventName": "PFT_SENT",
+        "PublicDescription": "L2 prefetch TGT sent.\nNote that PFT_SENT != PFT_USEFUL + PFT_DROP. There may be PFT_SENT for which the accesses resulted in an SLC hit."
+    },
+    {
+        "EventCode": "0x0152",
+        "EventName": "PFT_USEFUL",
+        "PublicDescription": "L2 prefetch TGT useful."
+    },
+    {
+        "EventCode": "0x0153",
+        "EventName": "PFT_DROP",
+        "PublicDescription": "L2 prefetch TGT dropped."
+    },
+    {
+        "EventCode": "0x0162",
+        "EventName": "LRQ_FULL",
+        "PublicDescription": "This event counts the number of cycles the LRQ is full."
+    },
+    {
+        "EventCode": "0x0163",
+        "EventName": "FETCH_FQ_EMPTY",
+        "PublicDescription": "Fetch Queue empty cycles."
+    },
+    {
+        "EventCode": "0x0164",
+        "EventName": "FPG2",
+        "PublicDescription": "Forward progress guarantee. Medium range livelock triggered."
+    },
+    {
+        "EventCode": "0x0165",
+        "EventName": "FPG",
+        "PublicDescription": "Forward progress guarantee. Tofu global livelock buster is triggered."
+    },
+    {
+        "EventCode": "0x0172",
+        "EventName": "DEADBLOCK",
+        "PublicDescription": "Write-back evictions converted to dataless EVICT.\nThe victim line is deemed a deadblock if the likelihood of reuse is low. The Core uses a dataless evict to evict a deadblock, and it uses an evict with data to evict an L2 line that is not a deadblock."
+    },
+    {
+        "EventCode": "0x0173",
+        "EventName": "PF_PRQ_ALLOC_PF_PEND",
+        "PublicDescription": "L1 prefetch prq allocation (replacing pending)."
+    },
+    {
+        "EventCode": "0x0178",
+        "EventName": "FETCH_ICACHE_INSTR",
+        "PublicDescription": "Instructions fetched from I-cache."
+    },
+    {
+        "EventCode": "0x017b",
+        "EventName": "NEAR_CAS",
+        "PublicDescription": "Near atomics: compare and swap."
+    },
+    {
+        "EventCode": "0x017c",
+        "EventName": "NEAR_CAS_PASS",
+        "PublicDescription": "Near atomics: compare and swap pass."
+    },
+    {
+        "EventCode": "0x017d",
+        "EventName": "FAR_CAS",
+        "PublicDescription": "Far atomics: compare and swap."
+    },
+    {
+        "EventCode": "0x0186",
+        "EventName": "L2_BTB_RELOAD_MAIN_BTB",
+        "PublicDescription": "Number of completed L1 BTB updates initiated by an L2 BTB hit, which swap branch information between the L1 BTB and the L2 BTB."
+    },
+    {
+        "EventCode": "0x018f",
+        "EventName": "L1_PF_GEN_MCMC",
+        "PublicDescription": "Load/Store prefetch to L1 generated, MCMC."
+    },
+    {
+        "EventCode": "0x0190",
+        "EventName": "PF_MODE_0_CYCLES",
+        "PublicDescription": "Number of cycles in which the hardware prefetcher is in the most aggressive mode."
+    },
+    {
+        "EventCode": "0x0191",
+        "EventName": "PF_MODE_1_CYCLES",
+        "PublicDescription": "Number of cycles in which the hardware prefetcher is in the more aggressive mode."
+    },
+    {
+        "EventCode": "0x0192",
+        "EventName": "PF_MODE_2_CYCLES",
+        "PublicDescription": "Number of cycles in which the hardware prefetcher is in the less aggressive mode."
+    },
+    {
+        "EventCode": "0x0193",
+        "EventName": "PF_MODE_3_CYCLES",
+        "PublicDescription": "Number of cycles in which the hardware prefetcher is in the most conservative mode."
+    },
+    {
+        "EventCode": "0x0194",
+        "EventName": "TXREQ_LIMIT_MAX_CYCLES",
+        "PublicDescription": "Number of cycles in which the dynamic TXREQ limit is the L2_TQ_SIZE."
+    },
+    {
+        "EventCode": "0x0195",
+        "EventName": "TXREQ_LIMIT_3QUARTER_CYCLES",
+        "PublicDescription": "Number of cycles in which the dynamic TXREQ limit is between 3/4 of the L2_TQ_SIZE and the L2_TQ_SIZE-1."
+    },
+    {
+        "EventCode": "0x0196",
+        "EventName": "TXREQ_LIMIT_HALF_CYCLES",
+        "PublicDescription": "Number of cycles in which the dynamic TXREQ limit is between 1/2 of the L2_TQ_SIZE and 3/4 of the L2_TQ_SIZE."
+    },
+    {
+        "EventCode": "0x0197",
+        "EventName": "TXREQ_LIMIT_1QUARTER_CYCLES",
+        "PublicDescription": "Number of cycles in which the dynamic TXREQ limit is between 1/4 of the L2_TQ_SIZE and 1/2 of the L2_TQ_SIZE."
+    },
+    {
+        "EventCode": "0x019d",
+        "EventName": "PREFETCH_LATE_CMC",
+        "PublicDescription": "LS/readclean or LS/readunique lookup hit on TQ entry allocated by CMC prefetch request."
+    },
+    {
+        "EventCode": "0x019e",
+        "EventName": "PREFETCH_LATE_BO",
+        "PublicDescription": "LS/readclean or LS/readunique lookup hit on TQ entry allocated by BO prefetch request."
+    },
+    {
+        "EventCode": "0x019f",
+        "EventName": "PREFETCH_LATE_STRIDE",
+        "PublicDescription": "LS/readclean or LS/readunique lookup hit on TQ entry allocated by STRIDE prefetch request."
+    },
+    {
+        "EventCode": "0x01a0",
+        "EventName": "PREFETCH_LATE_SPATIAL",
+        "PublicDescription": "LS/readclean or LS/readunique lookup hit on TQ entry allocated by SPATIAL prefetch request."
+    },
+    {
+        "EventCode": "0x01a2",
+        "EventName": "PREFETCH_LATE_TBW",
+        "PublicDescription": "LS/readclean or LS/readunique lookup hit on TQ entry allocated by TBW prefetch request."
+    },
+    {
+        "EventCode": "0x01a3",
+        "EventName": "PREFETCH_LATE_PAGE",
+        "PublicDescription": "LS/readclean or LS/readunique lookup hit on TQ entry allocated by PAGE prefetch request."
+    },
+    {
+        "EventCode": "0x01a4",
+        "EventName": "PREFETCH_LATE_GSMS",
+        "PublicDescription": "LS/readclean or LS/readunique lookup hit on TQ entry allocated by GSMS prefetch request."
+    },
+    {
+        "EventCode": "0x01a5",
+        "EventName": "PREFETCH_LATE_SIP_CONS",
+        "PublicDescription": "LS/readclean or LS/readunique lookup hit on TQ entry allocated by SIP_CONS prefetch request."
+    },
+    {
+        "EventCode": "0x01a6",
+        "EventName": "PREFETCH_REFILL_CMC",
+        "PublicDescription": "PF/prefetch or PF/readclean request from CMC pf engine filled the L2 cache."
+    },
+    {
+        "EventCode": "0x01a7",
+        "EventName": "PREFETCH_REFILL_BO",
+        "PublicDescription": "PF/prefetch or PF/readclean request from BO pf engine filled the L2 cache."
+    },
+    {
+        "EventCode": "0x01a8",
+        "EventName": "PREFETCH_REFILL_STRIDE",
+        "PublicDescription": "PF/prefetch or PF/readclean request from STRIDE pf engine filled the L2 cache."
+    },
+    {
+        "EventCode": "0x01a9",
+        "EventName": "PREFETCH_REFILL_SPATIAL",
+        "PublicDescription": "PF/prefetch or PF/readclean request from SPATIAL pf engine filled the L2 cache."
+    },
+    {
+        "EventCode": "0x01ab",
+        "EventName": "PREFETCH_REFILL_TBW",
+        "PublicDescription": "PF/prefetch or PF/readclean request from TBW pf engine filled the L2 cache."
+    },
+    {
+        "EventCode": "0x01ac",
+        "EventName": "PREFETCH_REFILL_PAGE",
+        "PublicDescription": "PF/prefetch or PF/readclean request from PAGE pf engine filled the L2 cache."
+    },
+    {
+        "EventCode": "0x01ad",
+        "EventName": "PREFETCH_REFILL_GSMS",
+        "PublicDescription": "PF/prefetch or PF/readclean request from GSMS pf engine filled the L2 cache."
+    },
+    {
+        "EventCode": "0x01ae",
+        "EventName": "PREFETCH_REFILL_SIP_CONS",
+        "PublicDescription": "PF/prefetch or PF/readclean request from SIP_CONS pf engine filled the L2 cache."
+    },
+    {
+        "EventCode": "0x01af",
+        "EventName": "CACHE_HIT_LINE_PF_CMC",
+        "PublicDescription": "LS/readclean or LS/readunique lookup hit in L2 cache on line filled by CMC prefetch request."
+    },
+    {
+        "EventCode": "0x01b0",
+        "EventName": "CACHE_HIT_LINE_PF_BO",
+        "PublicDescription": "LS/readclean or LS/readunique lookup hit in L2 cache on line filled by BO prefetch request."
+    },
+    {
+        "EventCode": "0x01b1",
+        "EventName": "CACHE_HIT_LINE_PF_STRIDE",
+        "PublicDescription": "LS/readclean or LS/readunique lookup hit in L2 cache on line filled by STRIDE prefetch request."
+    },
+    {
+        "EventCode": "0x01b2",
+        "EventName": "CACHE_HIT_LINE_PF_SPATIAL",
+        "PublicDescription": "LS/readclean or LS/readunique lookup hit in L2 cache on line filled by SPATIAL prefetch request."
+    },
+    {
+        "EventCode": "0x01b4",
+        "EventName": "CACHE_HIT_LINE_PF_TBW",
+        "PublicDescription": "LS/readclean or LS/readunique lookup hit in L2 cache on line filled by TBW prefetch request."
+    },
+    {
+        "EventCode": "0x01b5",
+        "EventName": "CACHE_HIT_LINE_PF_PAGE",
+        "PublicDescription": "LS/readclean or LS/readunique lookup hit in L2 cache on line filled by PAGE prefetch request."
+    },
+    {
+        "EventCode": "0x01b6",
+        "EventName": "CACHE_HIT_LINE_PF_GSMS",
+        "PublicDescription": "LS/readclean or LS/readunique lookup hit in L2 cache on line filled by GSMS prefetch request."
+    },
+    {
+        "EventCode": "0x01b7",
+        "EventName": "CACHE_HIT_LINE_PF_SIP_CONS",
+        "PublicDescription": "LS/readclean or LS/readunique lookup hit in L2 cache on line filled by SIP_CONS prefetch request."
+    },
+    {
+        "EventCode": "0x01ba",
+        "EventName": "PREFETCH_LATE_STORE_ISSUE",
+        "PublicDescription": "This event counts the number of demand requests that match a Store-issue prefetcher's pending refill request. These are called late prefetch requests and are still counted as useful prefetcher requests for the sake of accuracy and coverage measurements."
+    },
+    {
+        "EventCode": "0x01bb",
+        "EventName": "PREFETCH_LATE_STORE_STRIDE",
+        "PublicDescription": "This event counts the number of demand requests that match a Store-stride prefetcher's pending refill request. These are called late prefetch requests and are still counted as useful prefetcher requests for the sake of accuracy and coverage measurements."
+    },
+    {
+        "EventCode": "0x01bc",
+        "EventName": "PREFETCH_LATE_PC_OFFSET",
+        "PublicDescription": "This event counts the number of demand requests that match a PC-offset prefetcher's pending refill request. These are called late prefetch requests and are still counted as useful prefetcher requests for the sake of accuracy and coverage measurements."
+    },
+    {
+        "EventCode": "0x01bd",
+        "EventName": "PREFETCH_LATE_IFUPF",
+        "PublicDescription": "This event counts the number of demand requests that match an IFU prefetcher's pending refill request. These are called late prefetch requests and are still counted as useful prefetcher requests for the sake of accuracy and coverage measurements."
+    },
+    {
+        "EventCode": "0x01be",
+        "EventName": "PREFETCH_REFILL_STORE_ISSUE",
+        "PublicDescription": "This event counts the number of cache refills due to Store-issue prefetcher."
+    },
+    {
+        "EventCode": "0x01bf",
+        "EventName": "PREFETCH_REFILL_STORE_STRIDE",
+        "PublicDescription": "This event counts the number of cache refills due to Store-stride prefetcher."
+    },
+    {
+        "EventCode": "0x01c0",
+        "EventName": "PREFETCH_REFILL_PC_OFFSET",
+        "PublicDescription": "This event counts the number of cache refills due to PC-offset prefetcher."
+    },
+    {
+        "EventCode": "0x01c1",
+        "EventName": "PREFETCH_REFILL_IFUPF",
+        "PublicDescription": "This event counts the number of cache refills due to IFU prefetcher."
+    },
+    {
+        "EventCode": "0x01c2",
+        "EventName": "CACHE_HIT_LINE_PF_STORE_ISSUE",
+        "PublicDescription": "This event counts the number of first hits to a cache line filled by Store-issue prefetcher."
+    },
+    {
+        "EventCode": "0x01c3",
+        "EventName": "CACHE_HIT_LINE_PF_STORE_STRIDE",
+        "PublicDescription": "This event counts the number of first hits to a cache line filled by Store-stride prefetcher."
+    },
+    {
+        "EventCode": "0x01c4",
+        "EventName": "CACHE_HIT_LINE_PF_PC_OFFSET",
+        "PublicDescription": "This event counts the number of first hits to a cache line filled by PC-offset prefetcher."
+    },
+    {
+        "EventCode": "0x01c5",
+        "EventName": "CACHE_HIT_LINE_PF_IFUPF",
+        "PublicDescription": "This event counts the number of first hits to a cache line filled by IFU prefetcher."
+    },
+    {
+        "EventCode": "0x01c6",
+        "EventName": "L2_PF_GEN_ST_ISSUE",
+        "PublicDescription": "Store-issue prefetch to L2 generated."
+    },
+    {
+        "EventCode": "0x01c7",
+        "EventName": "L2_PF_GEN_ST_STRIDE",
+        "PublicDescription": "Store-stride prefetch to L2 generated."
+    },
+    {
+        "EventCode": "0x01cb",
+        "EventName": "L2_TQ_OUTSTANDING",
+        "PublicDescription": "Outstanding tracker count, per cycle.\nThis event increments by the number of valid entries pertaining to this thread in the L2TQ, in each cycle.\nThis event can be used to calculate the occupancy of L2TQ by dividing this by the CPU_CYCLES event. The L2TQ queue tracks the outstanding Read, Write and Snoop transactions. The Read transaction and the Write transaction entries are attributable to PE, whereas the Snoop transactions are not always attributable to PE."
+    },
+    {
+        "EventCode": "0x01cc",
+        "EventName": "TXREQ_LIMIT_COUNT_CYCLES",
+        "PublicDescription": "This event increments by the dynamic TXREQ value, in each cycle.\nThis is a companion event of TXREQ_LIMIT_MAX_CYCLES, TXREQ_LIMIT_3QUARTER_CYCLES, TXREQ_LIMIT_HALF_CYCLES, and TXREQ_LIMIT_1QUARTER_CYCLES."
+    },
+    {
+        "EventCode": "0x01ce",
+        "EventName": "L3DPRFM_TO_L2PRQ_CONVERTED",
+        "PublicDescription": "This event counts the number of Converted-L3D-PRFMs. These are indeed L3D PRFMs, and activities around these PRFMs are counted by the L3D_CACHE_PRFM, L3D_CACHE_REFILL_PRFM and L3D_CACHE_REFILL events."
+    },
+    {
+        "EventCode": "0x01d2",
+        "EventName": "DVM_TLBI_RCVD",
+        "PublicDescription": "This event counts the number of TLBI DVM messages received over the CHI interface, for *this* Core."
+    },
+    {
+        "EventCode": "0x01d6",
+        "EventName": "DSB_COMMITING_LOCAL_TLBI",
+        "PublicDescription": "This event counts the number of DSBs that are retired and have committed at least one local TLBI instruction. This event increments no more than once (in a cycle) even if the DSB commits multiple local TLBI instructions."
+    },
+    {
+        "EventCode": "0x01d7",
+        "EventName": "DSB_COMMITING_BROADCAST_TLBI",
+        "PublicDescription": "This event counts the number of DSBs that are retired and have committed at least one broadcast TLBI instruction. This event increments no more than once (in a cycle) even if the DSB commits multiple broadcast TLBI instructions."
+    },
+    {
+        "EventCode": "0x01eb",
+        "EventName": "L1DPRFM_L2DPRFM_TO_L2PRQ_CONVERTED",
+        "PublicDescription": "This event counts the number of Converted-L1D-PRFMs and Converted-L2D-PRFMs.\nActivities involving the Converted-L1D-PRFMs are counted by the L1D_CACHE_PRFM. However, they are *not* counted by the L1D_CACHE_REFILL_PRFM and L1D_CACHE_REFILL, as these Converted-L1D-PRFMs are treated as L2 D hardware prefetches. Activities around the Converted-L1D-PRFMs and Converted-L2D-PRFMs are counted by the L2D_CACHE_PRFM, L2D_CACHE_REFILL_PRFM and L2D_CACHE_REFILL events."
+    },
+    {
+        "EventCode": "0x01ec",
+        "EventName": "PREFETCH_LATE_CONVERTED_PRFM",
+        "PublicDescription": "This event counts the number of demand requests that match a Converted-L1D-PRFM or Converted-L2D-PRFM pending refill request at the L2 D-cache. These are called late prefetch requests and are still counted as useful prefetcher requests for the sake of accuracy and coverage measurements.\nNote that this event is not counted by the L2D_CACHE_HIT_RWL1PRF_LATE_HWPRF, though the Converted-L1D-PRFM or Converted-L2D-PRFM are replayed by the L2PRQ."
+    },
+    {
+        "EventCode": "0x01ed",
+        "EventName": "PREFETCH_REFILL_CONVERTED_PRFM",
+        "PublicDescription": "This event counts the number of L2 D-cache refills due to Converted-L1D-PRFM or Converted-L2D-PRFM.\nNote: L2D_CACHE_REFILL_PRFM is inclusive of PREFETCH_REFILL_CONVERTED_PRFM, where both the PREFETCH_REFILL_CONVERTED_PRFM and the L2D_CACHE_REFILL_PRFM increment when the L2 D-cache refills due to Converted-L1D-PRFM or Converted-L2D-PRFM."
+    },
+    {
+        "EventCode": "0x01ee",
+        "EventName": "CACHE_HIT_LINE_PF_CONVERTED_PRFM",
+        "PublicDescription": "This event counts the number of first hits to a cache line filled by Converted-L1D-PRFM or Converted-L2D-PRFM.\nNote that L2D_CACHE_HIT_RWL1PRF_FPRFM is inclusive of CACHE_HIT_LINE_PF_CONVERTED_PRFM, where both the CACHE_HIT_LINE_PF_CONVERTED_PRFM and the L2D_CACHE_HIT_RWL1PRF_FPRFM increment on a first hit to the L2 D-cache filled by Converted-L1D-PRFM or Converted-L2D-PRFM."
+    },
+    {
+        "EventCode": "0x01f0",
+        "EventName": "TMS_ST_TO_SMT_LATENCY",
+        "PublicDescription": "This event counts the number of CPU cycles spent on TMS for ST-to-SMT switch.\nThis event is counted by both threads: it increments in both threads during TMS for an ST-to-SMT switch."
+    },
+    {
+        "EventCode": "0x01f1",
+        "EventName": "TMS_SMT_TO_ST_LATENCY",
+        "PublicDescription": "This event counts the number of CPU cycles spent on TMS for SMT-to-ST switch. The count also includes the CPU cycles spent due to an aborted SMT-to-ST TMS attempt.\nThis event is counted only by the thread that is not in WFI."
+    },
+    {
+        "EventCode": "0x01f2",
+        "EventName": "TMS_ST_TO_SMT_COUNT",
+        "PublicDescription": "This event counts the number of completed TMS from ST-to-SMT.\nThis event is counted only by the active thread (the one that is not in WFI).\nNote: When an active thread enters the Debug state in ST-Full resource mode, it is switched to SMT mode. This is because the inactive thread cannot wake up while the other thread remains in the Debug state. To prevent this issue, threads operating in ST-Full resource mode are transitioned to SMT mode upon entering Debug state. This event count will also reflect such switches from ST to SMT mode.\n(Also see the NV_CPUACTLR14_EL1.chka_prEvent_st_tx_to_smt_when_tx_in_debug_state bit to disable this behavior.)"
+    },
+    {
+        "EventCode": "0x01f3",
+        "EventName": "TMS_SMT_TO_ST_COUNT",
+        "PublicDescription": "This event counts the number of completed TMS from SMT-to-ST.\nThis event is counted only by the thread that is not in WFI."
+    },
+    {
+        "EventCode": "0x01f4",
+        "EventName": "TMS_SMT_TO_ST_COUNT_ABRT",
+        "PublicDescription": "This event counts the number of aborted TMS from SMT-to-ST.\nThis event is counted only by the thread that is not in WFI."
+    },
+    {
+        "EventCode": "0x0202",
+        "EventName": "L0I_CACHE_RD",
+        "PublicDescription": "This event counts the number of predict blocks serviced out of L0 I-cache.\nNote: The L0 I-cache performs at most 4 L0 I lookups in a cycle: two service PBs from L0 I, and the other two refill the L0 I-cache from L1 I. This event counts only the L0 I-cache lookups that service PBs from L0 I."
+    },
+    {
+        "EventCode": "0x0203",
+        "EventName": "L0I_CACHE_REFILL",
+        "PublicDescription": "This event counts the number of L0 I-cache refills from the L1 I-cache."
+    },
+    {
+        "EventCode": "0x0207",
+        "EventName": "INTR_LATENCY",
+        "PublicDescription": "This event counts the number of cycles elapsed from when an Interrupt is recognized (after masking) to when a uop associated with the first instruction in the destination exception level is allocated. If there is some other flush condition that pre-empts the Interrupt, then the cycle count terminates early at the first instruction executed after that flush. In the event of dropped Interrupts (when an Interrupt is deasserted before it is taken), this counter measures the number of cycles that elapse from the moment an Interrupt is recognized (post-masking) until the Interrupt is dropped or deasserted.\nNote that\n* IESB (Implicit Error Synchronization Barrier) is an internal mop, so the latency of an implicit IESB mop executed before the Interrupt is taken is included in the Interrupt latency count.\n* Nukes or TMS sequences within the window are also counted by the Interrupt latency event.\n* An SMT-to-ST TMS will be aborted on detecting the wake condition for the WFI thread. The Interrupt latency count includes any additional penalty for an aborted TMS."
+    },
+    {
+        "EventCode": "0x021c",
+        "EventName": "CWT_ALLOC_ENTRY",
+        "PublicDescription": "Cache Way Tracker Allocate entry."
+    },
+    {
+        "EventCode": "0x021d",
+        "EventName": "CWT_ALLOC_LINE",
+        "PublicDescription": "Cache Way Tracker Allocate line."
+    },
+    {
+        "EventCode": "0x021e",
+        "EventName": "CWT_HIT",
+        "PublicDescription": "Cache Way Tracker hit."
+    },
+    {
+        "EventCode": "0x021f",
+        "EventName": "CWT_HIT_TAG",
+        "PublicDescription": "Cache Way Tracker hit when ITAG lookup suppressed."
+    },
+    {
+        "EventCode": "0x0220",
+        "EventName": "CWT_REPLAY_TAG",
+        "PublicDescription": "Cache Way Tracker causes ITAG replay due to miss when ITAG lookup suppressed."
+    },
+    {
+        "EventCode": "0x0250",
+        "EventName": "GPT_REQ",
+        "PublicDescription": "GPT lookup."
+    },
+    {
+        "EventCode": "0x0251",
+        "EventName": "GPT_WC_HIT",
+        "PublicDescription": "GPT lookup hit in Walk cache."
+    },
+    {
+        "EventCode": "0x0252",
+        "EventName": "GPT_PG_HIT",
+        "PublicDescription": "GPT lookup hit in TLB."
+    }
+]
--- a/tools/perf/pmu-events/arch/arm64/nvidia/t410/retired.json
+++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/retired.json
@@ -0,0 +1,94 @@
+[
+    {
+        "ArchStdEvent": "INST_RETIRED",
+        "PublicDescription": "This event counts instructions that have been architecturally executed."
+    },
+    {
+        "ArchStdEvent": "CID_WRITE_RETIRED",
+        "PublicDescription": "This event counts architecturally executed writes to the CONTEXTIDR_EL1 register, which usually contains the kernel PID and can be output with hardware trace."
+    },
+    {
+        "ArchStdEvent": "BR_IMMED_RETIRED",
+        "PublicDescription": "This event counts architecturally executed direct branches."
+    },
+    {
+        "ArchStdEvent": "BR_RETURN_RETIRED",
+        "PublicDescription": "This event counts architecturally executed procedure returns."
+    },
+    {
+        "ArchStdEvent": "TTBR_WRITE_RETIRED",
+        "PublicDescription": "This event counts architectural writes to TTBR0/1_EL1. If virtualization host extensions are enabled (by setting the HCR_EL2.E2H bit to 1), then accesses to TTBR0/1_EL1 that are redirected to TTBR0/1_EL2, or accesses to TTBR0/1_EL12, are counted. TTBRn registers are typically updated when the kernel is swapping user-space threads or applications."
+    },
+    {
+        "ArchStdEvent": "BR_RETIRED",
+        "PublicDescription": "This event counts architecturally executed branches, whether the branch is taken or not. Instructions that explicitly write to the PC are also counted. Note that exception generating instructions, exception return instructions, and context synchronization instructions are not counted."
+    },
+    {
+        "ArchStdEvent": "BR_MIS_PRED_RETIRED",
+        "PublicDescription": "This event counts branches counted by BR_RETIRED which were mispredicted and caused a pipeline flush."
+    },
+    {
+        "ArchStdEvent": "OP_RETIRED",
+        "PublicDescription": "This event counts micro-operations that are architecturally executed. This is a count of the number of micro-operations retired from the commit queue in a single cycle."
+    },
+    {
+        "ArchStdEvent": "BR_INDNR_TAKEN_RETIRED",
+        "PublicDescription": "This event counts architecturally executed indirect branches excluding procedure returns that were taken."
+    },
+    {
+        "ArchStdEvent": "BR_IMMED_PRED_RETIRED",
+        "PublicDescription": "This event counts architecturally executed direct branches that were correctly predicted."
+    },
+    {
+        "ArchStdEvent": "BR_IMMED_MIS_PRED_RETIRED",
+        "PublicDescription": "This event counts architecturally executed direct branches that were mispredicted and caused a pipeline flush."
+    },
+    {
+        "ArchStdEvent": "BR_IND_PRED_RETIRED",
+        "PublicDescription": "This event counts architecturally executed indirect branches including procedure returns that were correctly predicted."
+    },
+    {
+        "ArchStdEvent": "BR_IND_MIS_PRED_RETIRED",
+        "PublicDescription": "This event counts architecturally executed indirect branches including procedure returns that were mispredicted and caused a pipeline flush."
+    },
+    {
+        "ArchStdEvent": "BR_RETURN_PRED_RETIRED",
+        "PublicDescription": "This event counts architecturally executed procedure returns that were correctly predicted."
+    },
+    {
+        "ArchStdEvent": "BR_RETURN_MIS_PRED_RETIRED",
+        "PublicDescription": "This event counts architecturally executed procedure returns that were mispredicted and caused a pipeline flush."
+    },
+    {
+        "ArchStdEvent": "BR_INDNR_PRED_RETIRED",
+        "PublicDescription": "This event counts architecturally executed indirect branches excluding procedure returns that were correctly predicted."
+    },
+    {
+        "ArchStdEvent": "BR_INDNR_MIS_PRED_RETIRED",
+        "PublicDescription": "This event counts architecturally executed indirect branches excluding procedure returns that were mispredicted and caused a pipeline flush."
+    },
+    {
+        "ArchStdEvent": "BR_TAKEN_PRED_RETIRED",
+        "PublicDescription": "This event counts architecturally executed branches that were taken and were correctly predicted."
+    },
+    {
+        "ArchStdEvent": "BR_TAKEN_MIS_PRED_RETIRED",
+        "PublicDescription": "This event counts architecturally executed branches that were taken and were mispredicted causing a pipeline flush."
+    },
+    {
+        "ArchStdEvent": "BR_SKIP_PRED_RETIRED",
+        "PublicDescription": "This event counts architecturally executed branches that were not taken and were correctly predicted."
+    },
+    {
+        "ArchStdEvent": "BR_SKIP_MIS_PRED_RETIRED",
+        "PublicDescription": "This event counts architecturally executed branches that were not taken and were mispredicted causing a pipeline flush."
+    },
+    {
+        "ArchStdEvent": "BR_PRED_RETIRED",
+        "PublicDescription": "This event counts branch instructions counted by BR_RETIRED which were correctly predicted."
+    },
+    {
+        "ArchStdEvent": "BR_IND_RETIRED",
+        "PublicDescription": "This event counts architecturally executed indirect branches including procedure returns."
+    }
+]
--- a/tools/perf/pmu-events/arch/arm64/nvidia/t410/spe.json
+++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/spe.json
@@ -0,0 +1,42 @@
+[
+    {
+        "ArchStdEvent": "SAMPLE_POP",
+        "PublicDescription": "This event counts statistical profiling sample population, the count of all operations that could be sampled but may or may not be chosen for sampling."
+    },
+    {
+        "ArchStdEvent": "SAMPLE_FEED",
+        "PublicDescription": "This event counts statistical profiling samples taken for sampling."
+    },
+    {
+        "ArchStdEvent": "SAMPLE_FILTRATE",
+        "PublicDescription": "This event counts statistical profiling samples taken which are not removed by filtering."
+    },
+    {
+        "ArchStdEvent": "SAMPLE_COLLISION",
+        "PublicDescription": "This event counts statistical profiling samples that have collided with a previous sample and are therefore not taken."
+    },
+    {
+        "ArchStdEvent": "SAMPLE_FEED_BR",
+        "PublicDescription": "This event counts statistical profiling samples taken which are branches."
+    },
+    {
+        "ArchStdEvent": "SAMPLE_FEED_LD",
+        "PublicDescription": "This event counts statistical profiling samples taken which are Loads or Load atomic operations."
+    },
+    {
+        "ArchStdEvent": "SAMPLE_FEED_ST",
+        "PublicDescription": "This event counts statistical profiling samples taken which are Stores or Store atomic operations."
+    },
+    {
+        "ArchStdEvent": "SAMPLE_FEED_OP",
+        "PublicDescription": "This event counts statistical profiling samples taken which are matching any operation type filters supported."
+    },
+    {
+        "ArchStdEvent": "SAMPLE_FEED_EVENT",
+        "PublicDescription": "This event counts statistical profiling samples taken which are matching event packet filter constraints."
+    },
+    {
+        "ArchStdEvent": "SAMPLE_FEED_LAT",
+        "PublicDescription": "This event counts statistical profiling samples taken which are exceeding minimum latency set by operation latency filter constraints."
+    }
+]
--- a/tools/perf/pmu-events/arch/arm64/nvidia/t410/spec_operation.json
+++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/spec_operation.json
@@ -0,0 +1,230 @@
+[
+    {
+        "ArchStdEvent": "INST_SPEC",
+        "PublicDescription": "This event counts operations that have been speculatively executed."
+    },
+    {
+        "ArchStdEvent": "OP_SPEC",
+        "PublicDescription": "This event counts micro-operations speculatively executed. This is the count of the number of micro-operations dispatched in a cycle."
+    },
+    {
+        "ArchStdEvent": "UNALIGNED_LD_SPEC",
+        "PublicDescription": "This event counts unaligned memory Read operations issued by the CPU. This event counts unaligned accesses (as defined by the actual instruction), even if they are subsequently issued as multiple aligned accesses.\nThis event does not count preload operations (PLD, PLI).\nThis event is a subset of the UNALIGNED_LDST_SPEC event."
+    },
+    {
+        "ArchStdEvent": "UNALIGNED_ST_SPEC",
+        "PublicDescription": "This event counts unaligned memory Write operations issued by the CPU. This event counts unaligned accesses (as defined by the actual instruction), even if they are subsequently issued as multiple aligned accesses.\nThis event is a subset of the UNALIGNED_LDST_SPEC event."
+    },
+    {
+        "ArchStdEvent": "UNALIGNED_LDST_SPEC",
+        "PublicDescription": "This event counts unaligned memory operations issued by the CPU. This event counts unaligned accesses (as defined by the actual instruction), even if they are subsequently issued as multiple aligned accesses.\nThis event is the sum of the following events:\nUNALIGNED_ST_SPEC and\nUNALIGNED_LD_SPEC."
+    },
+    {
+        "ArchStdEvent": "LDREX_SPEC",
+        "PublicDescription": "This event counts Load-Exclusive operations that have been speculatively executed. For example: LDREX, LDX"
+    },
+    {
+        "ArchStdEvent": "STREX_PASS_SPEC",
+        "PublicDescription": "This event counts Store-exclusive operations that have been speculatively executed and have successfully completed the Store operation."
+    },
+    {
+        "ArchStdEvent": "STREX_FAIL_SPEC",
+        "PublicDescription": "This event counts Store-exclusive operations that have been speculatively executed and have not successfully completed the Store operation."
+    },
+    {
+        "ArchStdEvent": "STREX_SPEC",
+        "PublicDescription": "This event counts Store-exclusive operations that have been speculatively executed.\nThis event is the sum of the following events:\nSTREX_PASS_SPEC and\nSTREX_FAIL_SPEC."
+    },
+    {
+        "ArchStdEvent": "LD_SPEC",
+        "PublicDescription": "This event counts speculatively executed Load operations including Single Instruction Multiple Data (SIMD) Load operations."
+    },
+    {
+        "ArchStdEvent": "ST_SPEC",
+        "PublicDescription": "This event counts speculatively executed Store operations including Single Instruction Multiple Data (SIMD) Store operations."
+    },
+    {
+        "ArchStdEvent": "LDST_SPEC",
+        "PublicDescription": "This event counts Load and Store operations that have been speculatively executed."
+    },
+    {
+        "ArchStdEvent": "DP_SPEC",
+        "PublicDescription": "This event counts speculatively executed logical or arithmetic instructions such as MOV/MVN operations."
+    },
+    {
+        "ArchStdEvent": "ASE_SPEC",
+        "PublicDescription": "This event counts speculatively executed Advanced SIMD operations excluding Load, Store, and Move micro-operations that move data to or from SIMD (vector) registers."
+    },
+    {
+        "ArchStdEvent": "VFP_SPEC",
+        "PublicDescription": "This event counts speculatively executed floating point operations. This event does not count operations that move data to or from floating point (vector) registers."
+    },
+    {
+        "ArchStdEvent": "PC_WRITE_SPEC",
+        "PublicDescription": "This event counts speculatively executed operations which cause software changes of the PC. Those operations include all taken branch operations."
+    },
+    {
+        "ArchStdEvent": "CRYPTO_SPEC",
+        "PublicDescription": "This event counts speculatively executed cryptographic operations except for PMULL and VMULL operations."
+    },
+    {
+        "ArchStdEvent": "BR_IMMED_SPEC",
+        "PublicDescription": "This event counts direct branch operations which are speculatively executed."
+    },
+    {
+        "ArchStdEvent": "BR_RETURN_SPEC",
+        "PublicDescription": "This event counts procedure return operations (RET, RETAA and RETAB) which are speculatively executed."
+    },
+    {
+        "ArchStdEvent": "BR_INDIRECT_SPEC",
+        "PublicDescription": "This event counts indirect branch operations including procedure returns, which are speculatively executed. This includes operations that force a software change of the PC, other than exception-generating operations and direct branch instructions. Some examples of the instructions counted by this event include BR Xn, RET, etc."
+    },
+    {
+        "ArchStdEvent": "ISB_SPEC",
+        "PublicDescription": "This event counts ISB operations that are executed."
+    },
+    {
+        "ArchStdEvent": "DSB_SPEC",
+        "PublicDescription": "This event counts DSB operations that are speculatively issued to the Load/Store unit in the CPU."
+    },
+    {
+        "ArchStdEvent": "DMB_SPEC",
+        "PublicDescription": "This event counts DMB operations that are speculatively issued to the Load/Store unit in the CPU. This event does not count implied barriers from Load-acquire/Store-release operations."
+    },
+    {
+        "ArchStdEvent": "CSDB_SPEC",
+        "PublicDescription": "This event counts CSDB operations that are speculatively issued to the Load/Store unit in the CPU. This event does not count implied barriers from Load-acquire/Store-release operations."
+    },
+    {
+        "ArchStdEvent": "RC_LD_SPEC",
+        "PublicDescription": "This event counts any Load acquire operations that are speculatively executed. For example: LDAR, LDARH, LDARB"
+    },
+    {
+        "ArchStdEvent": "RC_ST_SPEC",
+        "PublicDescription": "This event counts any Store release operations that are speculatively executed. For example: STLR, STLRH, STLRB"
+    },
+    {
+        "ArchStdEvent": "SIMD_INST_SPEC",
+        "PublicDescription": "This event counts speculatively executed operations that are SIMD or SVE vector operations or Advanced SIMD non-scalar operations."
+    },
+    {
+        "ArchStdEvent": "ASE_INST_SPEC",
+        "PublicDescription": "This event counts speculatively executed Advanced SIMD operations."
+    },
+    {
+        "ArchStdEvent": "SVE_INST_SPEC",
+        "PublicDescription": "This event counts speculatively executed operations that are SVE operations."
+    },
+    {
+        "ArchStdEvent": "INT_SPEC",
+        "PublicDescription": "This event counts speculatively executed integer arithmetic operations."
+    },
+    {
+        "ArchStdEvent": "SVE_PRED_SPEC",
+        "PublicDescription": "This event counts speculatively executed predicated SVE operations.\nThis counter also counts SVE operation due to instruction with Governing predicate operand that determines the Active elements that do not write to any SVE Z vector destination register using either zeroing or merging predicate. Thus, the operations due to instructions such as INCP, DECP, UQINCP, UQDECP, SQINCP, SQDECP and PNEXT, are counted by the SVE_PRED_* events."
+    },
+    {
+        "ArchStdEvent": "SVE_PRED_EMPTY_SPEC",
+        "PublicDescription": "This event counts speculatively executed predicated SVE operations with no active predicate elements.\nThis counter also counts SVE operation due to instruction with Governing predicate operand that determines the Active elements that do not write to any SVE Z vector destination register using either zeroing or merging predicate. Thus, the operations due to instructions such as INCP, DECP, UQINCP, UQDECP, SQINCP, SQDECP and PNEXT, are counted by the SVE_PRED_* events."
+    },
+    {
+        "ArchStdEvent": "SVE_PRED_FULL_SPEC",
+        "PublicDescription": "This event counts speculatively executed predicated SVE operations with all predicate elements active.\nThis counter also counts SVE operation due to instruction with Governing predicate operand that determines the Active elements that do not write to any SVE Z vector destination register using either zeroing or merging predicate. Thus, the operations due to instructions such as INCP, DECP, UQINCP, UQDECP, SQINCP, SQDECP and PNEXT, are counted by the SVE_PRED_* events."
+    },
+    {
+        "ArchStdEvent": "SVE_PRED_PARTIAL_SPEC",
+        "PublicDescription": "This event counts speculatively executed predicated SVE operations with at least one but not all active predicate elements.\nThis counter also counts SVE operation due to instruction with Governing predicate operand that determines the Active elements that do not write to any SVE Z vector destination register using either zeroing or merging predicate. Thus, the operations due to instructions such as INCP, DECP, UQINCP, UQDECP, SQINCP, SQDECP and PNEXT, are counted by the SVE_PRED_* events."
+    },
+    {
+        "ArchStdEvent": "SVE_PRED_NOT_FULL_SPEC",
+        "PublicDescription": "This event counts speculatively executed predicated SVE operations with at least one inactive predicate element.\nThis counter also counts SVE operation due to instruction with Governing predicate operand that determines the Active elements that do not write to any SVE Z vector destination register using either zeroing or merging predicate. Thus, the operations due to instructions such as INCP, DECP, UQINCP, UQDECP, SQINCP, SQDECP and PNEXT, are counted by the SVE_PRED_* events."
+    },
+    {
+        "ArchStdEvent": "PRF_SPEC",
+        "PublicDescription": "This event counts speculatively executed operations that prefetch memory. For example, Scalar: PRFM, SVE: PRFB, PRFD, PRFH, or PRFW."
+    },
+    {
+        "ArchStdEvent": "SVE_LDFF_SPEC",
+        "PublicDescription": "This event counts speculatively executed SVE first fault or non-fault Load operations."
+    },
+    {
+        "ArchStdEvent": "SVE_LDFF_FAULT_SPEC",
+        "PublicDescription": "This event counts speculatively executed SVE first fault or non-fault Load operations that clear at least one bit in the FFR."
+    },
+    {
+        "ArchStdEvent": "ASE_SVE_INT8_SPEC",
+        "PublicDescription": "This event counts speculatively executed Advanced SIMD or SVE integer operations with the largest data type being an 8-bit integer."
+    },
+    {
+        "ArchStdEvent": "ASE_SVE_INT16_SPEC",
+        "PublicDescription": "This event counts speculatively executed Advanced SIMD or SVE integer operations with the largest data type being a 16-bit integer."
+    },
+    {
+        "ArchStdEvent": "ASE_SVE_INT32_SPEC",
+        "PublicDescription": "This event counts speculatively executed Advanced SIMD or SVE integer operations with the largest data type being a 32-bit integer."
+    },
+    {
+        "ArchStdEvent": "ASE_SVE_INT64_SPEC",
+        "PublicDescription": "This event counts speculatively executed Advanced SIMD or SVE integer operations with the largest data type being a 64-bit integer."
+    },
+    {
+        "EventCode": "0x011d",
+        "EventName": "SPEC_RET_STACK_FULL",
+        "PublicDescription": "This event counts predict pipe stalls that occur because the speculative return address predictor is full."
+    },
+    {
+        "EventCode": "0x011f",
+        "EventName": "MOPS_SPEC",
+        "PublicDescription": "Macro-ops speculatively decoded."
+    },
+    {
+        "EventCode": "0x0180",
+        "EventName": "BR_SPEC_PRED_TAKEN",
+        "PublicDescription": "Number of predicted-taken branches from the branch predictor."
+    },
+    {
+        "EventCode": "0x0181",
+        "EventName": "BR_SPEC_PRED_TAKEN_FROM_L2BTB",
+        "PublicDescription": "Number of predicted-taken branches from the L2 BTB."
+    },
+    {
+        "EventCode": "0x0182",
+        "EventName": "BR_SPEC_PRED_TAKEN_MULTI",
+        "PublicDescription": "Number of predicted-taken polymorphic branches."
+    },
+    {
+        "EventCode": "0x0185",
+        "EventName": "BR_SPEC_PRED_STATIC",
+        "PublicDescription": "Number of post-fetch predictions."
+    },
+    {
+        "EventCode": "0x01d0",
+        "EventName": "TLBI_LOCAL_SPEC",
+        "PublicDescription": "A non-broadcast TLBI instruction executed (Speculatively or otherwise) on *this* PE."
+    },
+    {
+        "EventCode": "0x01d1",
+        "EventName": "TLBI_BROADCAST_SPEC",
+        "PublicDescription": "A broadcast TLBI instruction executed (Speculatively or otherwise) on *this* PE."
+    },
+    {
+        "EventCode": "0x01e7",
+        "EventName": "BR_SPEC_PRED_ALN_REDIR",
+        "PublicDescription": "BPU predict pipe align redirect (either AL-APQ hit/miss)."
+    },
+    {
+        "EventCode": "0x0200",
+        "EventName": "SIMD_CRYPTO_INST_SPEC",
+        "PublicDescription": "SIMD, SVE, and CRYPTO instructions speculatively decoded."
+    },
+    {
+        "EventCode": "0x022e",
+        "EventName": "VPRED_LD_SPEC",
+        "PublicDescription": "This event counts the number of Speculatively-executed-Load operations with addresses produced by the value-prediction mechanism. The loaded data might be discarded if the predicted address differs from the actual address."
+    },
+    {
+        "EventCode": "0x022f",
+        "EventName": "VPRED_LD_SPEC_MISMATCH",
+        "PublicDescription": "This event counts a subset of VPRED_LD_SPEC where the predicted Load address and the actual address mismatched."
+    }
+]
--- a/tools/perf/pmu-events/arch/arm64/nvidia/t410/stall.json
+++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/stall.json
@@ -0,0 +1,145 @@
+[
+    {
+        "ArchStdEvent": "STALL_FRONTEND",
+        "PublicDescription": "This event counts cycles when the frontend could not send any micro-operations to the rename stage because of frontend resource stalls caused by fetch memory latency or branch prediction flow stalls. STALL_SLOT_FRONTEND counts SLOTS during each cycle in which this event counts on this CPU."
+    },
+    {
+        "ArchStdEvent": "STALL_BACKEND",
+        "PublicDescription": "This event counts cycles whenever the rename unit is unable to send any micro-operations to the backend of the pipeline because of backend resource constraints. Backend resource constraints can include issue stage fullness, execution stage fullness, or other internal pipeline resource fullness. All the backend slots were empty during the cycle when this event counts."
+    },
+    {
+        "ArchStdEvent": "STALL",
+        "PublicDescription": "This event counts cycles when no operations are sent to the rename unit from the frontend or from the rename unit to the backend for any reason (either frontend or backend stall). This event is the sum of the following events:\nSTALL_FRONTEND and\nSTALL_BACKEND."
+    },
+    {
+        "ArchStdEvent": "STALL_SLOT_BACKEND",
+        "PublicDescription": "This event counts slots per cycle in which no operations are sent from the rename unit to the backend due to backend resource constraints. STALL_BACKEND counts during the cycle when STALL_SLOT_BACKEND counts at least 1. STALL_BACKEND counts during the cycle when STALL_SLOT_BACKEND is SLOTS."
+    },
+    {
+        "ArchStdEvent": "STALL_SLOT_FRONTEND",
+        "PublicDescription": "This event counts slots per cycle in which no operations are sent to the rename unit from the frontend due to frontend resource constraints. STALL_FRONTEND counts during the cycle when STALL_SLOT_FRONTEND is SLOTS."
+    },
+    {
+        "ArchStdEvent": "STALL_SLOT",
+        "PublicDescription": "This event counts slots per cycle in which no operations are sent to the rename unit from the frontend or from the rename unit to the backend for any reason (either frontend or backend stall).\nSTALL_SLOT is the sum of the following events:\nSTALL_SLOT_FRONTEND and\nSTALL_SLOT_BACKEND."
+    },
+    {
+        "ArchStdEvent": "STALL_BACKEND_MEM",
+        "PublicDescription": "This event counts cycles when the backend is stalled because there is a pending demand Load request in progress in the last level Core cache.\nLast level cache in this CPU is Level 2, hence this event counts same as STALL_BACKEND_L2D."
+    },
+    {
+        "ArchStdEvent": "STALL_FRONTEND_MEMBOUND",
+        "PublicDescription": "This event counts cycles when the frontend could not send any micro-operations to the rename stage due to resource constraints in the memory resources."
+    },
+    {
+        "ArchStdEvent": "STALL_FRONTEND_L1I",
+        "PublicDescription": "This event counts cycles when the frontend is stalled because there is an instruction fetch request pending in the L1 I-cache."
+    },
+    {
+        "ArchStdEvent": "STALL_FRONTEND_MEM",
+        "PublicDescription": "This event counts cycles when the frontend is stalled because there is an instruction fetch request pending in the last level Core cache.\nLast level cache in this CPU is Level 2, hence this event counts rather than STALL_FRONTEND_L2I."
+    },
+    {
+        "ArchStdEvent": "STALL_FRONTEND_TLB",
+        "PublicDescription": "This event counts when the frontend is stalled on any TLB misses being handled. This event also counts the TLB accesses made by hardware prefetches."
+    },
+    {
+        "ArchStdEvent": "STALL_FRONTEND_CPUBOUND",
+        "PublicDescription": "This event counts cycles when the frontend could not send any micro-operations to the rename stage due to resource constraints in the CPU resources excluding memory resources."
+    },
+    {
+        "ArchStdEvent": "STALL_FRONTEND_FLOW",
+        "PublicDescription": "This event counts cycles when the frontend could not send any micro-operations to the rename stage due to resource constraints in the branch prediction unit."
+    },
+    {
+        "ArchStdEvent": "STALL_FRONTEND_FLUSH",
+        "PublicDescription": "This event counts cycles when the frontend could not send any micro-operations to the rename stage as the frontend is recovering from a machine flush or resteer. Example scenarios that cause a flush include branch mispredictions, taken exceptions, microarchitectural flush etc."
+    },
+    {
+        "ArchStdEvent": "STALL_BACKEND_MEMBOUND",
+        "PublicDescription": "This event counts cycles when the backend could not accept any micro-operations due to resource constraints in the memory resources."
+    },
+    {
+        "ArchStdEvent": "STALL_BACKEND_L1D",
+        "PublicDescription": "This event counts cycles when the backend is stalled because there is a pending demand Load request in progress in the L1 D-cache."
+    },
+    {
+        "ArchStdEvent": "STALL_BACKEND_TLB",
+        "PublicDescription": "This event counts cycles when the backend is stalled on any demand TLB misses being handled."
+    },
+    {
+        "ArchStdEvent": "STALL_BACKEND_ST",
+        "PublicDescription": "This event counts cycles when the backend is stalled and there is a Store that has not reached the pre-commit stage."
+    },
+    {
+        "ArchStdEvent": "STALL_BACKEND_CPUBOUND",
+        "PublicDescription": "This event counts cycles when the backend could not accept any micro-operations due to any resource constraints in the CPU excluding memory resources."
+    },
+    {
+        "ArchStdEvent": "STALL_BACKEND_BUSY",
+        "PublicDescription": "This event counts cycles when the backend could not accept any micro-operations because the issue queues are too full to take any operations for execution."
+    },
+    {
+        "ArchStdEvent": "STALL_BACKEND_ILOCK",
+        "PublicDescription": "This event counts cycles when the backend could not accept any micro-operations due to resource constraints imposed by input dependency."
+    },
+    {
+        "ArchStdEvent": "STALL_BACKEND_RENAME",
+        "PublicDescription": "This event counts cycles when backend is stalled even when operations are available from the frontend but at least one is not ready to be sent to the backend because no rename register is available."
+    },
+    {
+        "EventCode": "0x0158",
+        "EventName": "FLAG_DISP_STALL",
+        "PublicDescription": "Rename stalled due to FRF (Flag register file) full."
+    },
+    {
+        "EventCode": "0x0159",
+        "EventName": "GEN_DISP_STALL",
+        "PublicDescription": "Rename stalled due to GRF (General-purpose register file) full."
+    },
+    {
+        "EventCode": "0x015a",
+        "EventName": "VEC_DISP_STALL",
+        "PublicDescription": "Rename stalled due to VRF (Vector register file) full."
+    },
+    {
+        "EventCode": "0x015c",
+        "EventName": "SX_IQ_STALL",
+        "PublicDescription": "Dispatch stalled due to IQ full, SX."
+    },
+    {
+        "EventCode": "0x015d",
+        "EventName": "MX_IQ_STALL",
+        "PublicDescription": "Dispatch stalled due to IQ full, MX."
+    },
+    {
+        "EventCode": "0x015e",
+        "EventName": "LS_IQ_STALL",
+        "PublicDescription": "Dispatch stalled due to IQ full, LS."
+    },
+    {
+        "EventCode": "0x015f",
+        "EventName": "VX_IQ_STALL",
+        "PublicDescription": "Dispatch stalled due to IQ full, VX."
+    },
+    {
+        "EventCode": "0x0160",
+        "EventName": "MCQ_FULL_STALL",
+        "PublicDescription": "Dispatch stalled due to MCQ full."
+    },
+    {
+        "EventCode": "0x01cf",
+        "EventName": "PRD_DISP_STALL",
+        "PublicDescription": "Rename stalled because the physical predicate registers are full."
+    },
+    {
+        "EventCode": "0x01e0",
+        "EventName": "CSDB_STALL",
+        "PublicDescription": "Rename stalled due to CSDB."
+    },
+    {
+        "EventCode": "0x01e2",
+        "EventName": "STALL_SLOT_FRONTEND_WITHOUT_MISPRED",
+        "PublicDescription": "Stall slot frontend during non-mispredicted branch.\nThis event counts the STALL_SLOT_FRONTEND events, except for the 4 cycles following a mispredicted branch event or the 4 cycles following a commit flush&restart event."
+    }
+]
--- a/tools/perf/pmu-events/arch/arm64/nvidia/t410/tlb.json
+++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/tlb.json
@@ -0,0 +1,158 @@
+[
+    {
+        "ArchStdEvent": "L1I_TLB_REFILL",
+        "PublicDescription": "This event counts L1 Instruction TLB refills from any instruction fetch (demand, hardware prefetch, and software preload accesses). If there are multiple misses in the TLB that are resolved by the refill, then this event only counts once. This event will not count if the translation table walk results in a fault (such as a translation or access fault), since there is no new translation created for the TLB."
+    },
+    {
+        "ArchStdEvent": "L1D_TLB_REFILL",
+        "PublicDescription": "This event counts L1 Data TLB accesses that resulted in TLB refills. If there are multiple misses in the TLB that are resolved by the refill, then this event only counts once. This event counts for refills caused by preload instructions or hardware prefetch accesses. This event counts regardless of whether the miss hits in L2 or results in a translation table walk. This event will not count if the translation table walk results in a fault (such as a translation or access fault), since there is no new translation created for the TLB. This event will not count on an access from an AT (Address Translation) instruction.\nThis event counts the sum of the following events:\nL1D_TLB_REFILL_RD and\nL1D_TLB_REFILL_WR."
+    },
+    {
+        "ArchStdEvent": "L1D_TLB",
+        "PublicDescription": "This event counts L1 Data TLB accesses caused by any memory Load or Store operation.\nNote that Load or Store instructions can be broken up into multiple memory operations.\nThis event does not count TLB maintenance operations."
+    },
+    {
+        "ArchStdEvent": "L1I_TLB",
+        "PublicDescription": "This event counts L1 instruction TLB accesses (caused by demand or hardware prefetch or software preload accesses), whether the access hits or misses in the TLB. This event counts both demand accesses and prefetch or preload generated accesses.\nThis event is a superset of the L1I_TLB_REFILL event."
+    },
+    {
+        "ArchStdEvent": "L2D_TLB_REFILL",
+        "PublicDescription": "This event counts L2 TLB refills caused by memory operations from both data and instruction fetch, except for those caused by TLB maintenance operations and hardware prefetches.\nThis event is the sum of the following events:\nL2D_TLB_REFILL_RD and\nL2D_TLB_REFILL_WR."
+    },
+    {
+        "ArchStdEvent": "L2D_TLB",
+        "PublicDescription": "This event counts L2 TLB accesses except those caused by TLB maintenance operations.\nThis event is the sum of the following events:\nL2D_TLB_RD and\nL2D_TLB_WR."
+    },
+    {
+        "ArchStdEvent": "DTLB_WALK",
+        "PublicDescription": "This event counts the number of demand data translation table walks caused by a miss in the L2 TLB and performing at least one memory access. Translation table walks are counted even if the translation ended up taking a translation fault for reasons other than EPD, E0PD and NFD. Note that partial translations that cause a translation table walk are also counted. Also note that this event counts walks triggered by software preloads, but not walks triggered by hardware prefetchers, and that this event does not count walks triggered by TLB maintenance operations.\nThis event does not include prefetches."
+    },
+    {
+        "ArchStdEvent": "ITLB_WALK",
+        "PublicDescription": "This event counts the number of instruction translation table walks caused by a miss in the L2 TLB and performing at least one memory access. Translation table walks are counted even if the translation ended up taking a translation fault for reasons other than EPD, E0PD and NFD. Note that partial translations that cause a translation table walk are also counted. Also note that this event does not count walks triggered by TLB maintenance operations.\nThis event does not include prefetches."
+    },
+    {
+        "ArchStdEvent": "L1D_TLB_REFILL_RD",
+        "PublicDescription": "This event counts L1 Data TLB refills caused by memory Read operations. If there are multiple misses in the TLB that are resolved by the refill, then this event only counts once. This event counts for refills caused by preload instructions or hardware prefetch accesses. This event counts regardless of whether the miss hits in L2 or results in a translation table walk. This event will not count if the translation table walk results in a fault (such as a translation or access fault), since there is no new translation created for the TLB. This event will not count on an access from an Address Translation (AT) instruction.\nThis event is a subset of the L1D_TLB_REFILL event."
+    },
+    {
+        "ArchStdEvent": "L1D_TLB_REFILL_WR",
+        "PublicDescription": "This event counts L1 Data TLB refills caused by data side memory Write operations. If there are multiple misses in the TLB that are resolved by the refill, then this event only counts once. This event counts for refills caused by preload instructions or hardware prefetch accesses. This event counts regardless of whether the miss hits in L2 or results in a translation table walk. This event will not count if the table walk results in a fault (such as a translation or access fault), since there is no new translation created for the TLB. This event will not count with an access from an Address Translation (AT) instruction.\nThis event is a subset of the L1D_TLB_REFILL event."
+    },
+    {
+        "ArchStdEvent": "L1D_TLB_RD",
+        "PublicDescription": "This event counts L1 Data TLB accesses caused by memory Read operations. This event counts whether the access hits or misses in the TLB. This event does not count TLB maintenance operations."
+    },
+    {
+        "ArchStdEvent": "L1D_TLB_WR",
+        "PublicDescription": "This event counts any L1 Data side TLB accesses caused by memory Write operations. This event counts whether the access hits or misses in the TLB. This event does not count TLB maintenance operations."
+    },
+    {
+        "ArchStdEvent": "L2D_TLB_REFILL_RD",
+        "PublicDescription": "This event counts L2 TLB refills caused by memory Read operations from both data and instruction fetch except for those caused by TLB maintenance operations or hardware prefetches.\nThis event is a subset of the L2D_TLB_REFILL event."
+    },
+    {
+        "ArchStdEvent": "L2D_TLB_REFILL_WR",
+        "PublicDescription": "This event counts L2 TLB refills caused by memory Write operations from both data and instruction fetch except for those caused by TLB maintenance operations.\nThis event is a subset of the L2D_TLB_REFILL event."
+    },
+    {
+        "ArchStdEvent": "L2D_TLB_RD",
+        "PublicDescription": "This event counts L2 TLB accesses caused by memory Read operations from both data and instruction fetch except for those caused by TLB maintenance operations.\nThis event is a subset of the L2D_TLB event."
+    },
+    {
+        "ArchStdEvent": "L2D_TLB_WR",
+        "PublicDescription": "This event counts L2 TLB accesses caused by memory Write operations from both data and instruction fetch except for those caused by TLB maintenance operations.\nThis event is a subset of the L2D_TLB event."
+    },
+    {
+        "ArchStdEvent": "DTLB_WALK_PERCYC",
+        "PublicDescription": "This event counts the number of data translation table walks in progress per cycle."
+    },
+    {
+        "ArchStdEvent": "ITLB_WALK_PERCYC",
+        "PublicDescription": "This event counts the number of instruction translation table walks in progress per cycle."
+    },
+    {
+        "ArchStdEvent": "L1D_TLB_RW",
+        "PublicDescription": "This event counts L1 Data TLB demand accesses caused by memory Read or Write operations. This event counts whether the access hits or misses in the TLB. This event does not count TLB maintenance operations."
+    },
+    {
+        "ArchStdEvent": "L1I_TLB_RD",
+        "PublicDescription": "This event counts L1 Instruction TLB demand accesses whether the access hits or misses in the TLB."
+    },
+    {
+        "ArchStdEvent": "L1D_TLB_PRFM",
+        "PublicDescription": "This event counts L1 Data TLB accesses generated by software prefetch or preload memory accesses. Load or Store instructions can be broken into multiple memory operations. This event does not count TLB maintenance operations."
+    },
+    {
+        "ArchStdEvent": "L1I_TLB_PRFM",
+        "PublicDescription": "This event counts L1 Instruction TLB accesses generated by software preload or prefetch instructions. This event counts whether the access hits or misses in the TLB. This event does not count TLB maintenance operations."
+    },
+    {
+        "ArchStdEvent": "DTLB_HWUPD",
+        "PublicDescription": "This event counts the number of memory accesses triggered by a data translation table walk and performing an update of a translation table entry. Memory accesses are counted even if the translation ended up taking a translation fault for reasons other than EPD, E0PD and NFD. Note that this event counts accesses triggered by software preloads, but not accesses triggered by hardware prefetchers."
+    },
+    {
+        "ArchStdEvent": "ITLB_HWUPD",
+        "PublicDescription": "This event counts the number of memory accesses triggered by an instruction translation table walk and performing an update of a translation table entry. Memory accesses are counted even if the translation ended up taking a translation fault for reasons other than EPD, E0PD and NFD."
+    },
+    {
+        "ArchStdEvent": "DTLB_STEP",
+        "PublicDescription": "This event counts the number of memory accesses triggered by a demand data translation table walk and performing a Read of a translation table entry. Memory accesses are counted even if the translation ended up taking a translation fault for reasons other than EPD, E0PD and NFD.\nNote that this event counts accesses triggered by software preloads, but not accesses triggered by hardware prefetchers."
+    },
+    {
+        "ArchStdEvent": "ITLB_STEP",
+        "PublicDescription": "This event counts the number of memory accesses triggered by an instruction translation table walk and performing a Read of a translation table entry. Memory accesses are counted even if the translation ended up taking a translation fault for reasons other than EPD, E0PD and NFD."
+    },
+    {
+        "ArchStdEvent": "DTLB_WALK_LARGE",
+        "PublicDescription": "This event counts the number of demand data translation table walks caused by a miss in the L2 TLB and yielding a large page. The set of large pages is defined as all pages with a final size higher than or equal to 2MB. Translation table walks that end up taking a translation fault are not counted, as the page size would be undefined in that case. If DTLB_WALK_BLOCK is implemented, then it is an alias for this event in this family.\nNote that partial translations that cause a translation table walk are also counted.\nAlso note that this event counts walks triggered by software preloads, but not walks triggered by hardware prefetchers, and that this event does not count walks triggered by TLB maintenance operations."
+    },
+    {
+        "ArchStdEvent": "ITLB_WALK_LARGE",
+        "PublicDescription": "This event counts the number of instruction translation table walks caused by a miss in the L2 TLB and yielding a large page. The set of large pages is defined as all pages with a final size higher than or equal to 2MB. Translation table walks that end up taking a translation fault are not counted, as the page size would be undefined in that case. In this family, this is equal to ITLB_WALK_BLOCK event.\nNote that partial translations that cause a translation table walk are also counted.\nAlso note that this event does not count walks triggered by TLB maintenance operations."
+    },
+    {
+        "ArchStdEvent": "DTLB_WALK_SMALL",
+        "PublicDescription": "This event counts the number of data translation table walks caused by a miss in the L2 TLB and yielding a small page. The set of small pages is defined as all pages with a final size lower than 2MB. Translation table walks that end up taking a translation fault are not counted, as the page size would be undefined in that case. If DTLB_WALK_PAGE event is implemented, then it is an alias for this event in this family. Note that partial translations that cause a translation table walk are also counted.\nAlso note that this event counts walks triggered by software preloads, but not walks triggered by hardware prefetchers, and that this event does not count walks triggered by TLB maintenance operations."
+    },
+    {
+        "ArchStdEvent": "ITLB_WALK_SMALL",
+        "PublicDescription": "This event counts number of instruction translation table walks caused by a miss in the L2 TLB and yielding a small page. The set of small pages is defined as all pages with a final size lower than 2MB. Translation table walks that end up taking a translation fault are not counted, as the page size would be undefined in that case. In this family, this is equal to ITLB_WALK_PAGE event.\nNote that partial translations that cause a translation table walk are also counted.\nAlso note that this event does not count walks triggered by TLB maintenance operations."
+    },
+    {
+        "ArchStdEvent": "DTLB_WALK_RW",
+        "PublicDescription": "This event counts number of demand data translation table walks caused by a miss in the L2 TLB and performing at least one memory access. Translation table walks are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD.\nNote that partial translations that cause a translation table walk are also counted.\nAlso note that this event does not count walks triggered by TLB maintenance operations."
+    },
+    {
+        "ArchStdEvent": "ITLB_WALK_RD",
+        "PublicDescription": "This event counts number of demand instruction translation table walks caused by a miss in the L2 TLB and performing at least one memory access. Translation table walks are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD.\nNote that partial translations that cause a translation table walk are also counted.\nAlso note that this event does not count walks triggered by TLB maintenance operations."
+    },
+    {
+        "ArchStdEvent": "DTLB_WALK_PRFM",
+        "PublicDescription": "This event counts number of software prefetches or preloads generated data translation table walks caused by a miss in the L2 TLB and performing at least one memory access. Translation table walks are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD.\nNote that partial translations that cause a translation table walk are also counted.\nAlso note that this event does not count walks triggered by TLB maintenance operations."
+    },
+    {
+        "ArchStdEvent": "ITLB_WALK_PRFM",
+        "PublicDescription": "This event counts number of software prefetches or preloads generated instruction translation table walks caused by a miss in the L2 TLB and performing at least one memory access. Translation table walks are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD.\nNote that partial translations that cause a translation table walk are also counted.\nAlso note that this event does not count walks triggered by TLB maintenance operations."
+    },
+    {
+        "EventCode": "0x010e",
+        "EventName": "L1D_TLB_REFILL_RD_PF",
+        "PublicDescription": "L1 Data TLB refill, Read, prefetch."
+    },
+    {
+        "EventCode": "0x010f",
+        "EventName": "L2TLB_PF_REFILL",
+        "PublicDescription": "L2 Data TLB refill, Read, prefetch.\nThis event counts MMU refills due to internal PFStream requests."
+    },
+    {
+        "EventCode": "0x0223",
+        "EventName": "L1I_TLB_REFILL_RD",
+        "PublicDescription": "L1 Instruction TLB refills due to Demand miss."
+    },
+    {
+        "EventCode": "0x0224",
+        "EventName": "L1I_TLB_REFILL_PRFM",
+        "PublicDescription": "L1 Instruction TLB refills due to Software prefetch miss."
+    }
+]
--- a/tools/perf/pmu-events/arch/common/common/metrics.json
+++ b/tools/perf/pmu-events/arch/common/common/metrics.json
@@ -46,14 +46,14 @@
    },
    {
        "BriefDescription": "Max front or backend stalls per instruction",
-        "MetricExpr": "max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions",
+        "MetricExpr": "(max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions) if (has_event(stalled\\-cycles\\-frontend) & has_event(stalled\\-cycles\\-backend)) else ((stalled\\-cycles\\-frontend / instructions) if has_event(stalled\\-cycles\\-frontend) else ((stalled\\-cycles\\-backend / instructions) if has_event(stalled\\-cycles\\-backend) else 0))",
        "MetricGroup": "Default",
        "MetricName": "stalled_cycles_per_instruction",
        "DefaultShowEvents": "1"
    },
    {
        "BriefDescription": "Frontend stalls per cycle",
-        "MetricExpr": "stalled\\-cycles\\-frontend / cpu\\-cycles",
+        "MetricExpr": "(stalled\\-cycles\\-frontend / cpu\\-cycles) if has_event(stalled\\-cycles\\-frontend) else 0",
        "MetricGroup": "Default",
        "MetricName": "frontend_cycles_idle",
        "MetricThreshold": "frontend_cycles_idle > 0.1",
@@ -61,7 +61,7 @@
    },
    {
        "BriefDescription": "Backend stalls per cycle",
-        "MetricExpr": "stalled\\-cycles\\-backend / cpu\\-cycles",
+        "MetricExpr": "(stalled\\-cycles\\-backend / cpu\\-cycles) if has_event(stalled\\-cycles\\-backend) else 0",
        "MetricGroup": "Default",
        "MetricName": "backend_cycles_idle",
        "MetricThreshold": "backend_cycles_idle > 0.2",
--- a/tools/perf/pmu-events/arch/x86/alderlake/cache.json
+++ b/tools/perf/pmu-events/arch/x86/alderlake/cache.json
@@ -876,105 +876,97 @@
        "Unit": "cpu_atom"
    },
    {
-        "BriefDescription": "Counts the number of tagged loads with an instruction latency that exceeds or equals the threshold of 128 cycles as defined in MEC_CR_PEBS_LD_LAT_THRESHOLD (3F6H). Only counts with PEBS enabled.",
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold of 128. Only counts with PEBS enabled.",
        "Counter": "0,1",
        "Data_LA": "1",
        "EventCode": "0xd0",
        "EventName": "MEM_UOPS_RETIRED.LOAD_LATENCY_GT_128",
        "MSRIndex": "0x3F6",
        "MSRValue": "0x80",
-        "PublicDescription": "Counts the number of tagged loads with an instruction latency that exceeds or equals the threshold of 128 cycles as defined in MEC_CR_PEBS_LD_LAT_THRESHOLD (3F6H). Only counts with PEBS enabled. If a PEBS record is generated, will populate the PEBS Latency and PEBS Data Source fields accordingly.",
        "SampleAfterValue": "1000003",
        "UMask": "0x5",
        "Unit": "cpu_atom"
    },
    {
-        "BriefDescription": "Counts the number of tagged loads with an instruction latency that exceeds or equals the threshold of 16 cycles as defined in MEC_CR_PEBS_LD_LAT_THRESHOLD (3F6H). Only counts with PEBS enabled.",
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold of 16. Only counts with PEBS enabled.",
        "Counter": "0,1",
        "Data_LA": "1",
        "EventCode": "0xd0",
        "EventName": "MEM_UOPS_RETIRED.LOAD_LATENCY_GT_16",
        "MSRIndex": "0x3F6",
        "MSRValue": "0x10",
-        "PublicDescription": "Counts the number of tagged loads with an instruction latency that exceeds or equals the threshold of 16 cycles as defined in MEC_CR_PEBS_LD_LAT_THRESHOLD (3F6H). Only counts with PEBS enabled. If a PEBS record is generated, will populate the PEBS Latency and PEBS Data Source fields accordingly.",
        "SampleAfterValue": "1000003",
        "UMask": "0x5",
        "Unit": "cpu_atom"
    },
    {
-        "BriefDescription": "Counts the number of tagged loads with an instruction latency that exceeds or equals the threshold of 256 cycles as defined in MEC_CR_PEBS_LD_LAT_THRESHOLD (3F6H). Only counts with PEBS enabled.",
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold of 256. Only counts with PEBS enabled.",
        "Counter": "0,1",
        "Data_LA": "1",
        "EventCode": "0xd0",
        "EventName": "MEM_UOPS_RETIRED.LOAD_LATENCY_GT_256",
        "MSRIndex": "0x3F6",
        "MSRValue": "0x100",
-        "PublicDescription": "Counts the number of tagged loads with an instruction latency that exceeds or equals the threshold of 256 cycles as defined in MEC_CR_PEBS_LD_LAT_THRESHOLD (3F6H). Only counts with PEBS enabled. If a PEBS record is generated, will populate the PEBS Latency and PEBS Data Source fields accordingly.",
        "SampleAfterValue": "1000003",
        "UMask": "0x5",
        "Unit": "cpu_atom"
    },
    {
-        "BriefDescription": "Counts the number of tagged loads with an instruction latency that exceeds or equals the threshold of 32 cycles as defined in MEC_CR_PEBS_LD_LAT_THRESHOLD (3F6H). Only counts with PEBS enabled.",
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold of 32. Only counts with PEBS enabled.",
        "Counter": "0,1",
        "Data_LA": "1",
        "EventCode": "0xd0",
        "EventName": "MEM_UOPS_RETIRED.LOAD_LATENCY_GT_32",
        "MSRIndex": "0x3F6",
        "MSRValue": "0x20",
-        "PublicDescription": "Counts the number of tagged loads with an instruction latency that exceeds or equals the threshold of 32 cycles as defined in MEC_CR_PEBS_LD_LAT_THRESHOLD (3F6H). Only counts with PEBS enabled. If a PEBS record is generated, will populate the PEBS Latency and PEBS Data Source fields accordingly.",
        "SampleAfterValue": "1000003",
        "UMask": "0x5",
        "Unit": "cpu_atom"
    },
    {
-        "BriefDescription": "Counts the number of tagged loads with an instruction latency that exceeds or equals the threshold of 4 cycles as defined in MEC_CR_PEBS_LD_LAT_THRESHOLD (3F6H). Only counts with PEBS enabled.",
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold of 4. Only counts with PEBS enabled.",
        "Counter": "0,1",
        "Data_LA": "1",
        "EventCode": "0xd0",
        "EventName": "MEM_UOPS_RETIRED.LOAD_LATENCY_GT_4",
        "MSRIndex": "0x3F6",
        "MSRValue": "0x4",
-        "PublicDescription": "Counts the number of tagged loads with an instruction latency that exceeds or equals the threshold of 4 cycles as defined in MEC_CR_PEBS_LD_LAT_THRESHOLD (3F6H). Only counts with PEBS enabled. If a PEBS record is generated, will populate the PEBS Latency and PEBS Data Source fields accordingly.",
        "SampleAfterValue": "1000003",
        "UMask": "0x5",
        "Unit": "cpu_atom"
    },
    {
-        "BriefDescription": "Counts the number of tagged loads with an instruction latency that exceeds or equals the threshold of 512 cycles as defined in MEC_CR_PEBS_LD_LAT_THRESHOLD (3F6H). Only counts with PEBS enabled.",
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold of 512. Only counts with PEBS enabled.",
        "Counter": "0,1",
        "Data_LA": "1",
        "EventCode": "0xd0",
        "EventName": "MEM_UOPS_RETIRED.LOAD_LATENCY_GT_512",
        "MSRIndex": "0x3F6",
        "MSRValue": "0x200",
-        "PublicDescription": "Counts the number of tagged loads with an instruction latency that exceeds or equals the threshold of 512 cycles as defined in MEC_CR_PEBS_LD_LAT_THRESHOLD (3F6H). Only counts with PEBS enabled. If a PEBS record is generated, will populate the PEBS Latency and PEBS Data Source fields accordingly.",
        "SampleAfterValue": "1000003",
        "UMask": "0x5",
        "Unit": "cpu_atom"
    },
    {
-        "BriefDescription": "Counts the number of tagged loads with an instruction latency that exceeds or equals the threshold of 64 cycles as defined in MEC_CR_PEBS_LD_LAT_THRESHOLD (3F6H). Only counts with PEBS enabled.",
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold of 64. Only counts with PEBS enabled.",
        "Counter": "0,1",
        "Data_LA": "1",
        "EventCode": "0xd0",
        "EventName": "MEM_UOPS_RETIRED.LOAD_LATENCY_GT_64",
        "MSRIndex": "0x3F6",
        "MSRValue": "0x40",
-        "PublicDescription": "Counts the number of tagged loads with an instruction latency that exceeds or equals the threshold of 64 cycles as defined in MEC_CR_PEBS_LD_LAT_THRESHOLD (3F6H). Only counts with PEBS enabled. If a PEBS record is generated, will populate the PEBS Latency and PEBS Data Source fields accordingly.",
        "SampleAfterValue": "1000003",
        "UMask": "0x5",
        "Unit": "cpu_atom"
    },
    {
-        "BriefDescription": "Counts the number of tagged loads with an instruction latency that exceeds or equals the threshold of 8 cycles as defined in MEC_CR_PEBS_LD_LAT_THRESHOLD (3F6H). Only counts with PEBS enabled.",
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold of 8. Only counts with PEBS enabled.",
        "Counter": "0,1",
        "Data_LA": "1",
        "EventCode": "0xd0",
        "EventName": "MEM_UOPS_RETIRED.LOAD_LATENCY_GT_8",
        "MSRIndex": "0x3F6",
        "MSRValue": "0x8",
-        "PublicDescription": "Counts the number of tagged loads with an instruction latency that exceeds or equals the threshold of 8 cycles as defined in MEC_CR_PEBS_LD_LAT_THRESHOLD (3F6H). Only counts with PEBS enabled. If a PEBS record is generated, will populate the PEBS Latency and PEBS Data Source fields accordingly.",
        "SampleAfterValue": "1000003",
        "UMask": "0x5",
        "Unit": "cpu_atom"
@@ -1030,12 +1022,11 @@
        "Unit": "cpu_atom"
    },
    {
-        "BriefDescription": "Counts the number of stores uops retired. Counts with or without PEBS enabled.",
+        "BriefDescription": "Counts the number of stores uops retired.",
        "Counter": "0,1,2,3,4,5",
        "Data_LA": "1",
        "EventCode": "0xd0",
        "EventName": "MEM_UOPS_RETIRED.STORE_LATENCY",
-        "PublicDescription": "Counts the number of stores uops retired. Counts with or without PEBS enabled. If PEBS is enabled and a PEBS record is generated, will populate PEBS Latency and PEBS Data Source fields accordingly.",
        "SampleAfterValue": "1000003",
        "UMask": "0x6",
        "Unit": "cpu_atom"
--- a/tools/perf/pmu-events/arch/x86/alderlake/frontend.json
+++ b/tools/perf/pmu-events/arch/x86/alderlake/frontend.json
@@ -327,6 +327,24 @@
        "UMask": "0x4",
        "Unit": "cpu_core"
    },
+    {
+        "BriefDescription": "ICACHE_TAG.STALLS_INUSE",
+        "Counter": "0,1,2,3",
+        "EventCode": "0x83",
+        "EventName": "ICACHE_TAG.STALLS_INUSE",
+        "SampleAfterValue": "200003",
+        "UMask": "0x10",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "ICACHE_TAG.STALLS_ISB",
+        "Counter": "0,1,2,3",
+        "EventCode": "0x83",
+        "EventName": "ICACHE_TAG.STALLS_ISB",
+        "SampleAfterValue": "200003",
+        "UMask": "0x8",
+        "Unit": "cpu_core"
+    },
    {
        "BriefDescription": "Cycles Decode Stream Buffer (DSB) is delivering any Uop",
        "Counter": "0,1,2,3",
--- a/tools/perf/pmu-events/arch/x86/alderlake/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/alderlake/pipeline.json
@@ -244,6 +244,15 @@
        "UMask": "0xfb",
        "Unit": "cpu_atom"
    },
+    {
+        "BriefDescription": "Counts the number of near indirect JMP branch instructions retired.",
+        "Counter": "0,1,2,3,4,5",
+        "EventCode": "0xc4",
+        "EventName": "BR_INST_RETIRED.INDIRECT_JMP",
+        "SampleAfterValue": "200003",
+        "UMask": "0xef",
+        "Unit": "cpu_atom"
+    },
    {
        "BriefDescription": "This event is deprecated. Refer to new event BR_INST_RETIRED.INDIRECT_CALL",
        "Counter": "0,1,2,3,4,5",
@@ -464,6 +473,15 @@
        "UMask": "0x2",
        "Unit": "cpu_core"
    },
+    {
+        "BriefDescription": "Counts the number of mispredicted near indirect JMP branch instructions retired.",
+        "Counter": "0,1,2,3,4,5",
+        "EventCode": "0xc5",
+        "EventName": "BR_MISP_RETIRED.INDIRECT_JMP",
+        "SampleAfterValue": "200003",
+        "UMask": "0xef",
+        "Unit": "cpu_atom"
+    },
    {
        "BriefDescription": "This event is deprecated. Refer to new event BR_MISP_RETIRED.INDIRECT_CALL",
        "Counter": "0,1,2,3,4,5",
@@ -573,7 +591,7 @@
        "Unit": "cpu_core"
    },
    {
-        "BriefDescription": "Counts the number of unhalted core clock cycles. (Fixed event)",
+        "BriefDescription": "Fixed Counter: Counts the number of unhalted core clock cycles. [This event is alias to CPU_CLK_UNHALTED.THREAD]",
        "Counter": "Fixed counter 1",
        "EventName": "CPU_CLK_UNHALTED.CORE",
        "PublicDescription": "Counts the number of core cycles while the core is not in a halt state. The core enters the halt state when it is running the HLT instruction. The core frequency may change from time to time. For this reason this event may have a changing ratio with regards to time. This event uses fixed counter 1.",
@@ -582,7 +600,7 @@
        "Unit": "cpu_atom"
    },
    {
-        "BriefDescription": "Counts the number of unhalted core clock cycles.",
+        "BriefDescription": "Counts the number of unhalted core clock cycles. [This event is alias to CPU_CLK_UNHALTED.THREAD_P]",
        "Counter": "0,1,2,3,4,5",
        "EventCode": "0x3c",
        "EventName": "CPU_CLK_UNHALTED.CORE_P",
@@ -651,7 +669,7 @@
        "Unit": "cpu_core"
    },
    {
-        "BriefDescription": "Counts the number of unhalted reference clock cycles at TSC frequency. (Fixed event)",
+        "BriefDescription": "Fixed Counter: Counts the number of unhalted reference clock cycles at TSC frequency.",
        "Counter": "Fixed counter 2",
        "EventName": "CPU_CLK_UNHALTED.REF_TSC",
        "PublicDescription": "Counts the number of reference cycles that the core is not in a halt state. The core enters the halt state when it is running the HLT instruction. This event is not affected by core frequency changes and increments at a fixed frequency that is also used for the Time Stamp Counter (TSC). This event uses fixed counter 2.",
@@ -689,7 +707,7 @@
        "Unit": "cpu_core"
    },
    {
-        "BriefDescription": "Counts the number of unhalted core clock cycles. (Fixed event)",
+        "BriefDescription": "Fixed Counter: Counts the number of unhalted core clock cycles. [This event is alias to CPU_CLK_UNHALTED.CORE]",
        "Counter": "Fixed counter 1",
        "EventName": "CPU_CLK_UNHALTED.THREAD",
        "PublicDescription": "Counts the number of core cycles while the core is not in a halt state.  The core enters the halt state when it is running the HLT instruction. The core frequency may change from time to time. For this reason this event may have a changing ratio with regards to time.  This event uses fixed counter 1.",
@@ -707,7 +725,7 @@
        "Unit": "cpu_core"
    },
    {
-        "BriefDescription": "Counts the number of unhalted core clock cycles.",
+        "BriefDescription": "Counts the number of unhalted core clock cycles. [This event is alias to CPU_CLK_UNHALTED.CORE_P]",
        "Counter": "0,1,2,3,4,5",
        "EventCode": "0x3c",
        "EventName": "CPU_CLK_UNHALTED.THREAD_P",
@@ -875,7 +893,7 @@
        "Unit": "cpu_core"
    },
    {
-        "BriefDescription": "Counts the total number of instructions retired. (Fixed event)",
+        "BriefDescription": "Fixed Counter: Counts the total number of instructions retired.",
        "Counter": "Fixed counter 0",
        "EventName": "INST_RETIRED.ANY",
        "PublicDescription": "Counts the total number of instructions that retired. For instructions that consist of multiple uops, this event counts the retirement of the last uop of the instruction. This event continues counting during hardware interrupts, traps, and inside interrupt handlers. This event uses fixed counter 0. Available PDIST counters: 32",
@@ -1273,6 +1291,42 @@
        "UMask": "0x20",
        "Unit": "cpu_core"
    },
+    {
+        "BriefDescription": "Counts the number of CLFLUSH, CLWB, and CLDEMOTE instructions retired.",
+        "Counter": "0,1,2,3,4,5",
+        "EventCode": "0xe0",
+        "EventName": "MISC_RETIRED1.CL_INST",
+        "SampleAfterValue": "1000003",
+        "UMask": "0xff",
+        "Unit": "cpu_atom"
+    },
+    {
+        "BriefDescription": "Counts the number of LFENCE instructions retired.",
+        "Counter": "0,1,2,3,4,5",
+        "EventCode": "0xe0",
+        "EventName": "MISC_RETIRED1.LFENCE",
+        "SampleAfterValue": "1000003",
+        "UMask": "0x2",
+        "Unit": "cpu_atom"
+    },
+    {
+        "BriefDescription": "Counts the number of accesses to KeyLocker cache.",
+        "Counter": "0,1,2,3,4,5",
+        "EventCode": "0xe1",
+        "EventName": "MISC_RETIRED2.KEYLOCKER_ACCESS",
+        "SampleAfterValue": "1000003",
+        "UMask": "0x10",
+        "Unit": "cpu_atom"
+    },
+    {
+        "BriefDescription": "Counts the number of misses to KeyLocker cache.",
+        "Counter": "0,1,2,3,4,5",
+        "EventCode": "0xe1",
+        "EventName": "MISC_RETIRED2.KEYLOCKER_MISS",
+        "SampleAfterValue": "1000003",
+        "UMask": "0x11",
+        "Unit": "cpu_atom"
+    },
    {
        "BriefDescription": "Cycles stalled due to no store buffers available. (not including draining form sync).",
        "Counter": "0,1,2,3,4,5,6,7",
--- a/tools/perf/pmu-events/arch/x86/alderlaken/cache.json
+++ b/tools/perf/pmu-events/arch/x86/alderlaken/cache.json
@@ -246,98 +246,90 @@
        "UMask": "0x82"
    },
    {
-        "BriefDescription": "Counts the number of tagged loads with an instruction latency that exceeds or equals the threshold of 128 cycles as defined in MEC_CR_PEBS_LD_LAT_THRESHOLD (3F6H). Only counts with PEBS enabled.",
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold of 128. Only counts with PEBS enabled.",
        "Counter": "0,1",
        "Data_LA": "1",
        "EventCode": "0xd0",
        "EventName": "MEM_UOPS_RETIRED.LOAD_LATENCY_GT_128",
        "MSRIndex": "0x3F6",
        "MSRValue": "0x80",
-        "PublicDescription": "Counts the number of tagged loads with an instruction latency that exceeds or equals the threshold of 128 cycles as defined in MEC_CR_PEBS_LD_LAT_THRESHOLD (3F6H). Only counts with PEBS enabled. If a PEBS record is generated, will populate the PEBS Latency and PEBS Data Source fields accordingly.",
        "SampleAfterValue": "1000003",
        "UMask": "0x5"
    },
    {
-        "BriefDescription": "Counts the number of tagged loads with an instruction latency that exceeds or equals the threshold of 16 cycles as defined in MEC_CR_PEBS_LD_LAT_THRESHOLD (3F6H). Only counts with PEBS enabled.",
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold of 16. Only counts with PEBS enabled.",
        "Counter": "0,1",
        "Data_LA": "1",
        "EventCode": "0xd0",
        "EventName": "MEM_UOPS_RETIRED.LOAD_LATENCY_GT_16",
        "MSRIndex": "0x3F6",
        "MSRValue": "0x10",
-        "PublicDescription": "Counts the number of tagged loads with an instruction latency that exceeds or equals the threshold of 16 cycles as defined in MEC_CR_PEBS_LD_LAT_THRESHOLD (3F6H). Only counts with PEBS enabled. If a PEBS record is generated, will populate the PEBS Latency and PEBS Data Source fields accordingly.",
        "SampleAfterValue": "1000003",
        "UMask": "0x5"
    },
    {
-        "BriefDescription": "Counts the number of tagged loads with an instruction latency that exceeds or equals the threshold of 256 cycles as defined in MEC_CR_PEBS_LD_LAT_THRESHOLD (3F6H). Only counts with PEBS enabled.",
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold of 256. Only counts with PEBS enabled.",
        "Counter": "0,1",
        "Data_LA": "1",
        "EventCode": "0xd0",
        "EventName": "MEM_UOPS_RETIRED.LOAD_LATENCY_GT_256",
        "MSRIndex": "0x3F6",
        "MSRValue": "0x100",
-        "PublicDescription": "Counts the number of tagged loads with an instruction latency that exceeds or equals the threshold of 256 cycles as defined in MEC_CR_PEBS_LD_LAT_THRESHOLD (3F6H). Only counts with PEBS enabled. If a PEBS record is generated, will populate the PEBS Latency and PEBS Data Source fields accordingly.",
        "SampleAfterValue": "1000003",
        "UMask": "0x5"
    },
    {
-        "BriefDescription": "Counts the number of tagged loads with an instruction latency that exceeds or equals the threshold of 32 cycles as defined in MEC_CR_PEBS_LD_LAT_THRESHOLD (3F6H). Only counts with PEBS enabled.",
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold of 32. Only counts with PEBS enabled.",
        "Counter": "0,1",
        "Data_LA": "1",
        "EventCode": "0xd0",
        "EventName": "MEM_UOPS_RETIRED.LOAD_LATENCY_GT_32",
        "MSRIndex": "0x3F6",
        "MSRValue": "0x20",
-        "PublicDescription": "Counts the number of tagged loads with an instruction latency that exceeds or equals the threshold of 32 cycles as defined in MEC_CR_PEBS_LD_LAT_THRESHOLD (3F6H). Only counts with PEBS enabled. If a PEBS record is generated, will populate the PEBS Latency and PEBS Data Source fields accordingly.",
        "SampleAfterValue": "1000003",
        "UMask": "0x5"
    },
    {
-        "BriefDescription": "Counts the number of tagged loads with an instruction latency that exceeds or equals the threshold of 4 cycles as defined in MEC_CR_PEBS_LD_LAT_THRESHOLD (3F6H). Only counts with PEBS enabled.",
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold of 4. Only counts with PEBS enabled.",
        "Counter": "0,1",
        "Data_LA": "1",
        "EventCode": "0xd0",
        "EventName": "MEM_UOPS_RETIRED.LOAD_LATENCY_GT_4",
        "MSRIndex": "0x3F6",
        "MSRValue": "0x4",
-        "PublicDescription": "Counts the number of tagged loads with an instruction latency that exceeds or equals the threshold of 4 cycles as defined in MEC_CR_PEBS_LD_LAT_THRESHOLD (3F6H). Only counts with PEBS enabled. If a PEBS record is generated, will populate the PEBS Latency and PEBS Data Source fields accordingly.",
        "SampleAfterValue": "1000003",
        "UMask": "0x5"
    },
    {
-        "BriefDescription": "Counts the number of tagged loads with an instruction latency that exceeds or equals the threshold of 512 cycles as defined in MEC_CR_PEBS_LD_LAT_THRESHOLD (3F6H). Only counts with PEBS enabled.",
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold of 512. Only counts with PEBS enabled.",
        "Counter": "0,1",
        "Data_LA": "1",
        "EventCode": "0xd0",
        "EventName": "MEM_UOPS_RETIRED.LOAD_LATENCY_GT_512",
        "MSRIndex": "0x3F6",
        "MSRValue": "0x200",
-        "PublicDescription": "Counts the number of tagged loads with an instruction latency that exceeds or equals the threshold of 512 cycles as defined in MEC_CR_PEBS_LD_LAT_THRESHOLD (3F6H). Only counts with PEBS enabled. If a PEBS record is generated, will populate the PEBS Latency and PEBS Data Source fields accordingly.",
        "SampleAfterValue": "1000003",
        "UMask": "0x5"
    },
    {
-        "BriefDescription": "Counts the number of tagged loads with an instruction latency that exceeds or equals the threshold of 64 cycles as defined in MEC_CR_PEBS_LD_LAT_THRESHOLD (3F6H). Only counts with PEBS enabled.",
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold of 64. Only counts with PEBS enabled.",
        "Counter": "0,1",
        "Data_LA": "1",
        "EventCode": "0xd0",
        "EventName": "MEM_UOPS_RETIRED.LOAD_LATENCY_GT_64",
        "MSRIndex": "0x3F6",
        "MSRValue": "0x40",
-        "PublicDescription": "Counts the number of tagged loads with an instruction latency that exceeds or equals the threshold of 64 cycles as defined in MEC_CR_PEBS_LD_LAT_THRESHOLD (3F6H). Only counts with PEBS enabled. If a PEBS record is generated, will populate the PEBS Latency and PEBS Data Source fields accordingly.",
        "SampleAfterValue": "1000003",
        "UMask": "0x5"
    },
    {
-        "BriefDescription": "Counts the number of tagged loads with an instruction latency that exceeds or equals the threshold of 8 cycles as defined in MEC_CR_PEBS_LD_LAT_THRESHOLD (3F6H). Only counts with PEBS enabled.",
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold of 8. Only counts with PEBS enabled.",
        "Counter": "0,1",
        "Data_LA": "1",
        "EventCode": "0xd0",
        "EventName": "MEM_UOPS_RETIRED.LOAD_LATENCY_GT_8",
        "MSRIndex": "0x3F6",
        "MSRValue": "0x8",
-        "PublicDescription": "Counts the number of tagged loads with an instruction latency that exceeds or equals the threshold of 8 cycles as defined in MEC_CR_PEBS_LD_LAT_THRESHOLD (3F6H). Only counts with PEBS enabled. If a PEBS record is generated, will populate the PEBS Latency and PEBS Data Source fields accordingly.",
        "SampleAfterValue": "1000003",
        "UMask": "0x5"
    },
@ -387,12 +379,11 @@
        "UMask": "0x12"
    },
    {
-        "BriefDescription": "Counts the number of stores uops retired. Counts with or without PEBS enabled.",
+        "BriefDescription": "Counts the number of store uops retired.",
        "Counter": "0,1,2,3,4,5",
        "Data_LA": "1",
        "EventCode": "0xd0",
        "EventName": "MEM_UOPS_RETIRED.STORE_LATENCY",
-        "PublicDescription": "Counts the number of stores uops retired. Counts with or without PEBS enabled. If PEBS is enabled and a PEBS record is generated, will populate PEBS Latency and PEBS Data Source fields accordingly.",
        "SampleAfterValue": "1000003",
        "UMask": "0x6"
    },
--- a/tools/perf/pmu-events/arch/x86/alderlaken/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/alderlaken/pipeline.json
@ -108,6 +108,14 @@
        "SampleAfterValue": "200003",
        "UMask": "0xfb"
    },
+    {
+        "BriefDescription": "Counts the number of near indirect JMP branch instructions retired.",
+        "Counter": "0,1,2,3,4,5",
+        "EventCode": "0xc4",
+        "EventName": "BR_INST_RETIRED.INDIRECT_JMP",
+        "SampleAfterValue": "200003",
+        "UMask": "0xef"
+    },
    {
        "BriefDescription": "This event is deprecated. Refer to new event BR_INST_RETIRED.INDIRECT_CALL",
        "Counter": "0,1,2,3,4,5",
@ -225,6 +233,14 @@
        "SampleAfterValue": "200003",
        "UMask": "0xfb"
    },
+    {
+        "BriefDescription": "Counts the number of mispredicted near indirect JMP branch instructions retired.",
+        "Counter": "0,1,2,3,4,5",
+        "EventCode": "0xc5",
+        "EventName": "BR_MISP_RETIRED.INDIRECT_JMP",
+        "SampleAfterValue": "200003",
+        "UMask": "0xef"
+    },
    {
        "BriefDescription": "This event is deprecated. Refer to new event BR_MISP_RETIRED.INDIRECT_CALL",
        "Counter": "0,1,2,3,4,5",
@ -278,7 +294,7 @@
        "UMask": "0xfe"
    },
    {
-        "BriefDescription": "Counts the number of unhalted core clock cycles. (Fixed event)",
+        "BriefDescription": "Fixed Counter: Counts the number of unhalted core clock cycles. [This event is an alias for CPU_CLK_UNHALTED.THREAD]",
        "Counter": "Fixed counter 1",
        "EventName": "CPU_CLK_UNHALTED.CORE",
        "PublicDescription": "Counts the number of core cycles while the core is not in a halt state. The core enters the halt state when it is running the HLT instruction. The core frequency may change from time to time. For this reason this event may have a changing ratio with regards to time. This event uses fixed counter 1.",
@ -286,7 +302,7 @@
        "UMask": "0x2"
    },
    {
-        "BriefDescription": "Counts the number of unhalted core clock cycles.",
+        "BriefDescription": "Counts the number of unhalted core clock cycles. [This event is an alias for CPU_CLK_UNHALTED.THREAD_P]",
        "Counter": "0,1,2,3,4,5",
        "EventCode": "0x3c",
        "EventName": "CPU_CLK_UNHALTED.CORE_P",
@ -303,7 +319,7 @@
        "UMask": "0x1"
    },
    {
-        "BriefDescription": "Counts the number of unhalted reference clock cycles at TSC frequency. (Fixed event)",
+        "BriefDescription": "Fixed Counter: Counts the number of unhalted reference clock cycles at TSC frequency.",
        "Counter": "Fixed counter 2",
        "EventName": "CPU_CLK_UNHALTED.REF_TSC",
        "PublicDescription": "Counts the number of reference cycles that the core is not in a halt state. The core enters the halt state when it is running the HLT instruction. This event is not affected by core frequency changes and increments at a fixed frequency that is also used for the Time Stamp Counter (TSC). This event uses fixed counter 2.",
@ -320,7 +336,7 @@
        "UMask": "0x1"
    },
    {
-        "BriefDescription": "Counts the number of unhalted core clock cycles. (Fixed event)",
+        "BriefDescription": "Fixed Counter: Counts the number of unhalted core clock cycles. [This event is an alias for CPU_CLK_UNHALTED.CORE]",
        "Counter": "Fixed counter 1",
        "EventName": "CPU_CLK_UNHALTED.THREAD",
        "PublicDescription": "Counts the number of core cycles while the core is not in a halt state.  The core enters the halt state when it is running the HLT instruction. The core frequency may change from time to time. For this reason this event may have a changing ratio with regards to time.  This event uses fixed counter 1.",
@ -328,7 +344,7 @@
        "UMask": "0x2"
    },
    {
-        "BriefDescription": "Counts the number of unhalted core clock cycles.",
+        "BriefDescription": "Counts the number of unhalted core clock cycles. [This event is an alias for CPU_CLK_UNHALTED.CORE_P]",
        "Counter": "0,1,2,3,4,5",
        "EventCode": "0x3c",
        "EventName": "CPU_CLK_UNHALTED.THREAD_P",
@ -336,7 +352,7 @@
        "SampleAfterValue": "2000003"
    },
    {
-        "BriefDescription": "Counts the total number of instructions retired. (Fixed event)",
+        "BriefDescription": "Fixed Counter: Counts the total number of instructions retired.",
        "Counter": "Fixed counter 0",
        "EventName": "INST_RETIRED.ANY",
        "PublicDescription": "Counts the total number of instructions that retired. For instructions that consist of multiple uops, this event counts the retirement of the last uop of the instruction. This event continues counting during hardware interrupts, traps, and inside interrupt handlers. This event uses fixed counter 0. Available PDIST counters: 32",
@ -426,6 +442,38 @@
        "SampleAfterValue": "1000003",
        "UMask": "0x1"
    },
+    {
+        "BriefDescription": "Counts the number of CLFLUSH, CLWB, and CLDEMOTE instructions retired.",
+        "Counter": "0,1,2,3,4,5",
+        "EventCode": "0xe0",
+        "EventName": "MISC_RETIRED1.CL_INST",
+        "SampleAfterValue": "1000003",
+        "UMask": "0xff"
+    },
+    {
+        "BriefDescription": "Counts the number of LFENCE instructions retired.",
+        "Counter": "0,1,2,3,4,5",
+        "EventCode": "0xe0",
+        "EventName": "MISC_RETIRED1.LFENCE",
+        "SampleAfterValue": "1000003",
+        "UMask": "0x2"
+    },
+    {
+        "BriefDescription": "Counts the number of accesses to KeyLocker cache.",
+        "Counter": "0,1,2,3,4,5",
+        "EventCode": "0xe1",
+        "EventName": "MISC_RETIRED2.KEYLOCKER_ACCESS",
+        "SampleAfterValue": "1000003",
+        "UMask": "0x10"
+    },
+    {
+        "BriefDescription": "Counts the number of misses to KeyLocker cache.",
+        "Counter": "0,1,2,3,4,5",
+        "EventCode": "0xe1",
+        "EventName": "MISC_RETIRED2.KEYLOCKER_MISS",
+        "SampleAfterValue": "1000003",
+        "UMask": "0x11"
+    },
    {
        "BriefDescription": "Counts the number of issue slots in a UMWAIT or TPAUSE instruction where no uop issues due to the instruction putting the CPU into the C0.1 activity state. For Tremont, UMWAIT and TPAUSE will only put the CPU into C0.1 activity state (not C0.2 activity state)",
        "Counter": "0,1,2,3,4,5",
--- a/tools/perf/pmu-events/arch/x86/arrowlake/cache.json
+++ b/tools/perf/pmu-events/arch/x86/arrowlake/cache.json
@ -628,6 +628,15 @@
        "UMask": "0x7f",
        "Unit": "cpu_atom"
    },
+    {
+        "BriefDescription": "Counts the number of unhalted cycles when the core is stalled due to an instruction cache or TLB miss.",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "EventCode": "0x35",
+        "EventName": "MEM_BOUND_STALLS_IFETCH.ALL",
+        "SampleAfterValue": "1000003",
+        "UMask": "0x7f",
+        "Unit": "cpu_lowpower"
+    },
    {
        "BriefDescription": "Counts the number of cycles the core is stalled due to an instruction cache or TLB miss which hit in the L2 cache.",
        "Counter": "0,1,2,3,4,5,6,7",
@ -731,6 +740,24 @@
        "UMask": "0x6",
        "Unit": "cpu_atom"
    },
+    {
+        "BriefDescription": "Counts the number of unhalted cycles that the core is stalled due to a demand load miss which hit in the LLC, no snoop was required, and the LLC provided data",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "EventCode": "0x34",
+        "EventName": "MEM_BOUND_STALLS_LOAD.LLC_HIT_NOSNOOP",
+        "SampleAfterValue": "1000003",
+        "UMask": "0x2",
+        "Unit": "cpu_atom"
+    },
+    {
+        "BriefDescription": "Counts the number of unhalted cycles when the core is stalled due to a demand load miss which hit in the LLC, where a snoop was required and either missed or hit without forwarding, so the LLC provided the data",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "EventCode": "0x34",
+        "EventName": "MEM_BOUND_STALLS_LOAD.LLC_HIT_SNOOP",
+        "SampleAfterValue": "1000003",
+        "UMask": "0x4",
+        "Unit": "cpu_atom"
+    },
    {
        "BriefDescription": "Counts the number of unhalted cycles when the core is stalled due to a demand load miss which missed all the local caches.",
        "Counter": "0,1,2,3,4,5,6,7",
@ -749,6 +776,24 @@
        "UMask": "0x78",
        "Unit": "cpu_lowpower"
    },
+    {
+        "BriefDescription": "Counts the number of unhalted cycles when the core is stalled due to a demand load miss which missed all the caches.  DRAM, MMIO or other LOCAL memory type provides the data",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "EventCode": "0x34",
+        "EventName": "MEM_BOUND_STALLS_LOAD.LLC_MISS_LOCALMEM",
+        "SampleAfterValue": "1000003",
+        "UMask": "0x50",
+        "Unit": "cpu_atom"
+    },
+    {
+        "BriefDescription": "Counts the number of unhalted cycles when the core is stalled due to a demand load miss and the data was provided from an unknown source. If the core has access to an L3 cache, an LLC miss refers to an L3 cache miss, otherwise it is an L2 cache miss.",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "EventCode": "0x34",
+        "EventName": "MEM_BOUND_STALLS_LOAD.LLC_MISS_LOCALMEM",
+        "SampleAfterValue": "1000003",
+        "UMask": "0x50",
+        "Unit": "cpu_lowpower"
+    },
    {
        "BriefDescription": "Counts the number of unhalted cycles when the core is stalled to a store buffer full condition",
        "Counter": "0,1,2,3,4,5,6,7",
@ -1081,6 +1126,15 @@
        "UMask": "0x20",
        "Unit": "cpu_core"
    },
+    {
+        "BriefDescription": "Counts the number of retired load ops with an unknown source",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "EventCode": "0xd4",
+        "EventName": "MEM_LOAD_UOPS_MISC_RETIRED.LOCAL_DRAM",
+        "SampleAfterValue": "1000003",
+        "UMask": "0x2",
+        "Unit": "cpu_atom"
+    },
    {
        "BriefDescription": "Counts the number of load ops retired that miss the L3 cache and hit in DRAM",
        "Counter": "0,1,2,3,4,5,6,7",
@ -1181,6 +1235,15 @@
        "UMask": "0x1c",
        "Unit": "cpu_lowpower"
    },
+    {
+        "BriefDescription": "Counts the number of load ops retired that hit in the L3 cache in which no snoop was required",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "EventCode": "0xd1",
+        "EventName": "MEM_LOAD_UOPS_RETIRED.L3_HIT_NO_SNOOP",
+        "SampleAfterValue": "1000003",
+        "UMask": "0x4",
+        "Unit": "cpu_atom"
+    },
    {
        "BriefDescription": "Counts the number of loads that hit in a write combining buffer (WCB), excluding the first load that caused the WCB to allocate.",
        "Counter": "0,1,2,3,4,5,6,7",
@ -1331,7 +1394,7 @@
        "Unit": "cpu_lowpower"
    },
    {
-        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD - Only counts with PEBS enabled.",
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold of 1024. Only counts with PEBS enabled.",
        "Counter": "0,1",
        "Data_LA": "1",
        "EventCode": "0xd0",
@ -1343,7 +1406,7 @@
        "Unit": "cpu_lowpower"
    },
    {
-        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD - Only counts with PEBS enabled",
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold of 128.",
        "Counter": "0,1",
        "Data_LA": "1",
        "EventCode": "0xd0",
@ -1355,7 +1418,7 @@
        "Unit": "cpu_atom"
    },
    {
-        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD - Only counts with PEBS enabled.",
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold of 128. Only counts with PEBS enabled.",
        "Counter": "0,1",
        "Data_LA": "1",
        "EventCode": "0xd0",
@ -1367,7 +1430,7 @@
        "Unit": "cpu_lowpower"
    },
    {
-        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD - Only counts with PEBS enabled",
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold of 16.",
        "Counter": "0,1",
        "Data_LA": "1",
        "EventCode": "0xd0",
@ -1379,7 +1442,7 @@
        "Unit": "cpu_atom"
    },
    {
-        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD - Only counts with PEBS enabled.",
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold of 16. Only counts with PEBS enabled.",
        "Counter": "0,1",
        "Data_LA": "1",
        "EventCode": "0xd0",
@ -1391,7 +1454,7 @@
        "Unit": "cpu_lowpower"
    },
    {
-        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD - Only counts with PEBS enabled.",
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold of 2048. Only counts with PEBS enabled.",
        "Counter": "0,1",
        "Data_LA": "1",
        "EventCode": "0xd0",
@ -1403,7 +1466,7 @@
        "Unit": "cpu_lowpower"
    },
    {
-        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD - Only counts with PEBS enabled",
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold of 256.",
        "Counter": "0,1",
        "Data_LA": "1",
        "EventCode": "0xd0",
@ -1415,7 +1478,7 @@
        "Unit": "cpu_atom"
    },
    {
-        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD - Only counts with PEBS enabled.",
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold of 256. Only counts with PEBS enabled.",
        "Counter": "0,1",
        "Data_LA": "1",
        "EventCode": "0xd0",
@ -1427,7 +1490,7 @@
        "Unit": "cpu_lowpower"
    },
    {
-        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD - Only counts with PEBS enabled",
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold of 32.",
        "Counter": "0,1",
        "Data_LA": "1",
        "EventCode": "0xd0",
@ -1439,7 +1502,7 @@
        "Unit": "cpu_atom"
    },
    {
-        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD - Only counts with PEBS enabled.",
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold of 32. Only counts with PEBS enabled.",
        "Counter": "0,1",
        "Data_LA": "1",
        "EventCode": "0xd0",
@ -1451,7 +1514,7 @@
        "Unit": "cpu_lowpower"
    },
    {
-        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD - Only counts with PEBS enabled",
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold of 4.",
        "Counter": "0,1",
        "Data_LA": "1",
        "EventCode": "0xd0",
@ -1463,7 +1526,7 @@
        "Unit": "cpu_atom"
    },
    {
-        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD - Only counts with PEBS enabled.",
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold of 4. Only counts with PEBS enabled.",
        "Counter": "0,1",
        "Data_LA": "1",
        "EventCode": "0xd0",
@ -1475,7 +1538,7 @@
        "Unit": "cpu_lowpower"
    },
    {
-        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD - Only counts with PEBS enabled",
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold of 512.",
        "Counter": "0,1",
        "Data_LA": "1",
        "EventCode": "0xd0",
@ -1487,7 +1550,7 @@
        "Unit": "cpu_atom"
    },
    {
-        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD - Only counts with PEBS enabled.",
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold of 512. Only counts with PEBS enabled.",
        "Counter": "0,1",
        "Data_LA": "1",
        "EventCode": "0xd0",
@ -1499,7 +1562,7 @@
        "Unit": "cpu_lowpower"
    },
    {
-        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD - Only counts with PEBS enabled",
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold of 64.",
        "Counter": "0,1",
        "Data_LA": "1",
        "EventCode": "0xd0",
@ -1511,7 +1574,7 @@
        "Unit": "cpu_atom"
    },
    {
-        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD - Only counts with PEBS enabled.",
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold of 64. Only counts with PEBS enabled.",
        "Counter": "0,1",
        "Data_LA": "1",
        "EventCode": "0xd0",
@ -1523,7 +1586,7 @@
        "Unit": "cpu_lowpower"
    },
    {
-        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD - Only counts with PEBS enabled",
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold of 8.",
        "Counter": "0,1",
        "Data_LA": "1",
        "EventCode": "0xd0",
@ -1535,7 +1598,7 @@
        "Unit": "cpu_atom"
    },
    {
-        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD - Only counts with PEBS enabled.",
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold of 8. Only counts with PEBS enabled.",
        "Counter": "0,1",
        "Data_LA": "1",
        "EventCode": "0xd0",
@ -1707,7 +1770,7 @@
        "Unit": "cpu_lowpower"
    },
    {
-        "BriefDescription": "Counts the number of  stores uops retired same as MEM_UOPS_RETIRED.ALL_STORES",
+        "BriefDescription": "Counts the number of store uops retired.",
        "Counter": "0,1,2,3,4,5,6,7",
        "Data_LA": "1",
        "EventCode": "0xd0",
@ -1717,7 +1780,7 @@
        "Unit": "cpu_atom"
    },
    {
-        "BriefDescription": "Counts the number of  stores uops retired same as MEM_UOPS_RETIRED.ALL_STORES",
+        "BriefDescription": "Counts the number of store uops retired.",
        "Counter": "0,1,2,3,4,5,6,7",
        "Data_LA": "1",
        "EventCode": "0xd0",
--- a/tools/perf/pmu-events/arch/x86/arrowlake/frontend.json
+++ b/tools/perf/pmu-events/arch/x86/arrowlake/frontend.json
@ -627,6 +627,24 @@
        "UMask": "0x4",
        "Unit": "cpu_core"
    },
+    {
+        "BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache In use-full",
+        "Counter": "0,1,2,3,4,5,6,7,8,9",
+        "EventCode": "0x83",
+        "EventName": "ICACHE_TAG.STALLS_INUSE",
+        "SampleAfterValue": "200003",
+        "UMask": "0x10",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache ISB-full",
+        "Counter": "0,1,2,3,4,5,6,7,8,9",
+        "EventCode": "0x83",
+        "EventName": "ICACHE_TAG.STALLS_ISB",
+        "SampleAfterValue": "200003",
+        "UMask": "0x8",
+        "Unit": "cpu_core"
+    },
    {
        "BriefDescription": "Cycles Decode Stream Buffer (DSB) is delivering any Uop",
        "Counter": "0,1,2,3,4,5,6,7,8,9",
--- a/tools/perf/pmu-events/arch/x86/arrowlake/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/arrowlake/pipeline.json
@ -822,7 +822,7 @@
        "Unit": "cpu_core"
    },
    {
-        "BriefDescription": "Fixed Counter: Counts the number of unhalted core clock cycles.",
+        "BriefDescription": "Fixed Counter: Counts the number of unhalted core clock cycles. [This event is an alias for CPU_CLK_UNHALTED.THREAD]",
        "Counter": "Fixed counter 1",
        "EventName": "CPU_CLK_UNHALTED.CORE",
        "SampleAfterValue": "2000003",
@ -839,7 +839,7 @@
        "Unit": "cpu_core"
    },
    {
-        "BriefDescription": "Fixed Counter: Counts the number of unhalted core clock cycles",
+        "BriefDescription": "Fixed Counter: Counts the number of unhalted core clock cycles. [This event is an alias for CPU_CLK_UNHALTED.THREAD]",
        "Counter": "Fixed counter 1",
        "EventName": "CPU_CLK_UNHALTED.CORE",
        "SampleAfterValue": "2000003",
@ -909,7 +909,7 @@
        "Unit": "cpu_core"
    },
    {
-        "BriefDescription": "Fixed Counter: Counts the number of unhalted reference clock cycles",
+        "BriefDescription": "Fixed Counter: Counts the number of unhalted reference clock cycles.",
        "Counter": "Fixed counter 2",
        "EventName": "CPU_CLK_UNHALTED.REF_TSC",
        "SampleAfterValue": "2000003",
@ -947,7 +947,7 @@
        "Unit": "cpu_lowpower"
    },
    {
-        "BriefDescription": "Fixed Counter: Counts the number of unhalted core clock cycles.",
+        "BriefDescription": "Fixed Counter: Counts the number of unhalted core clock cycles. [This event is an alias for CPU_CLK_UNHALTED.CORE]",
        "Counter": "Fixed counter 1",
        "EventName": "CPU_CLK_UNHALTED.THREAD",
        "SampleAfterValue": "2000003",
@ -964,7 +964,7 @@
        "Unit": "cpu_core"
    },
    {
-        "BriefDescription": "Fixed Counter: Counts the number of unhalted core clock cycles",
+        "BriefDescription": "Fixed Counter: Counts the number of unhalted core clock cycles. [This event is an alias for CPU_CLK_UNHALTED.CORE]",
        "Counter": "Fixed counter 1",
        "EventName": "CPU_CLK_UNHALTED.THREAD",
        "SampleAfterValue": "2000003",
@ -1134,10 +1134,10 @@
        "Unit": "cpu_core"
    },
    {
-        "BriefDescription": "Fixed Counter: Counts the number of instructions retired",
+        "BriefDescription": "Fixed Counter: Counts the number of instructions retired.",
        "Counter": "Fixed counter 0",
        "EventName": "INST_RETIRED.ANY",
-        "PublicDescription": "Fixed Counter: Counts the number of instructions retired Available PDIST counters: 32",
+        "PublicDescription": "Fixed Counter: Counts the number of instructions retired. Available PDIST counters: 32",
        "SampleAfterValue": "2000003",
        "UMask": "0x1",
        "Unit": "cpu_lowpower"
@ -1607,6 +1607,14 @@
        "SampleAfterValue": "20003",
        "Unit": "cpu_atom"
    },
+    {
+        "BriefDescription": "Counts the total number of machine clears for any reason including, but not limited to, memory ordering, memory disambiguation, SMC, and FP assist.",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "EventCode": "0xc3",
+        "EventName": "MACHINE_CLEARS.ANY",
+        "SampleAfterValue": "20003",
+        "Unit": "cpu_lowpower"
+    },
    {
        "BriefDescription": "Counts the number of machine clears that flush the pipeline and restart the machine without the use of microcode.",
        "Counter": "0,1,2,3,4,5,6,7",
@ -1813,6 +1821,15 @@
        "UMask": "0xff",
        "Unit": "cpu_atom"
    },
+    {
+        "BriefDescription": "Counts the number of CLFLUSH, CLWB, and CLDEMOTE instructions retired.",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "EventCode": "0xe0",
+        "EventName": "MISC_RETIRED1.CL_INST",
+        "SampleAfterValue": "1000003",
+        "UMask": "0xff",
+        "Unit": "cpu_lowpower"
+    },
    {
        "BriefDescription": "Counts the number of LFENCE instructions retired.",
        "Counter": "0,1,2,3,4,5,6,7",
@ -1822,6 +1839,15 @@
        "UMask": "0x2",
        "Unit": "cpu_atom"
    },
+    {
+        "BriefDescription": "Counts the number of LFENCE instructions retired.",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "EventCode": "0xe0",
+        "EventName": "MISC_RETIRED1.LFENCE",
+        "SampleAfterValue": "1000003",
+        "UMask": "0x2",
+        "Unit": "cpu_lowpower"
+    },
    {
        "BriefDescription": "Counts the number of RDPMC, RDTSC, and RDTSCP instructions retired.",
        "Counter": "0,1,2,3,4,5,6,7",
--- a/tools/perf/pmu-events/arch/x86/emeraldrapids/cache.json
+++ b/tools/perf/pmu-events/arch/x86/emeraldrapids/cache.json
@ -514,7 +514,7 @@
        "EventCode": "0xd3",
        "EventName": "MEM_LOAD_L3_MISS_RETIRED.REMOTE_DRAM",
        "PublicDescription": "MEM_LOAD_L3_MISS_RETIRED.REMOTE_DRAM Available PDIST counters: 0",
-        "SampleAfterValue": "1000003",
+        "SampleAfterValue": "100007",
        "UMask": "0x2"
    },
    {
@ -534,7 +534,7 @@
        "EventCode": "0xd3",
        "EventName": "MEM_LOAD_L3_MISS_RETIRED.REMOTE_HITM",
        "PublicDescription": "MEM_LOAD_L3_MISS_RETIRED.REMOTE_HITM Available PDIST counters: 0",
-        "SampleAfterValue": "1000003",
+        "SampleAfterValue": "100007",
        "UMask": "0x4"
    },
    {
--- a/tools/perf/pmu-events/arch/x86/emeraldrapids/frontend.json
+++ b/tools/perf/pmu-events/arch/x86/emeraldrapids/frontend.json
@ -271,6 +271,22 @@
        "SampleAfterValue": "200003",
        "UMask": "0x4"
    },
+    {
+        "BriefDescription": "ICACHE_TAG.STALLS_INUSE",
+        "Counter": "0,1,2,3",
+        "EventCode": "0x83",
+        "EventName": "ICACHE_TAG.STALLS_INUSE",
+        "SampleAfterValue": "200003",
+        "UMask": "0x10"
+    },
+    {
+        "BriefDescription": "ICACHE_TAG.STALLS_ISB",
+        "Counter": "0,1,2,3",
+        "EventCode": "0x83",
+        "EventName": "ICACHE_TAG.STALLS_ISB",
+        "SampleAfterValue": "200003",
+        "UMask": "0x8"
+    },
    {
        "BriefDescription": "Cycles Decode Stream Buffer (DSB) is delivering any Uop",
        "Counter": "0,1,2,3",
--- a/tools/perf/pmu-events/arch/x86/emeraldrapids/uncore-cache.json
+++ b/tools/perf/pmu-events/arch/x86/emeraldrapids/uncore-cache.json
@ -3501,7 +3501,7 @@
        "EventName": "UNC_CHA_SNOOP_RESP.RSPIFWD",
        "Experimental": "1",
        "PerPkg": "1",
-        "PublicDescription": "Counts when a a transaction with the opcode type RspIFwd Snoop Response was received which indicates a remote caching agent forwarded the data and the requesting agent is able to acquire the data in E (Exclusive) or M (modified) states.  This is commonly returned with RFO (the Read for Ownership issued before a write) transactions.  The snoop could have either been to a cacheline in the M,E,F (Modified, Exclusive or Forward)  states.",
+        "PublicDescription": "Counts when a transaction with the opcode type RspIFwd Snoop Response was received which indicates a remote caching agent forwarded the data and the requesting agent is able to acquire the data in E (Exclusive) or M (modified) states.  This is commonly returned with RFO (the Read for Ownership issued before a write) transactions.  The snoop could have either been to a cacheline in the M,E,F (Modified, Exclusive or Forward)  states.",
        "UMask": "0x4",
        "Unit": "CHA"
    },
@ -3523,7 +3523,7 @@
        "EventName": "UNC_CHA_SNOOP_RESP.RSPSFWD",
        "Experimental": "1",
        "PerPkg": "1",
-        "PublicDescription": "Counts when a a transaction with the opcode type RspSFwd Snoop Response was received which indicates a remote caching agent forwarded the data but held on to its current copy.  This is common for data and code reads that hit in a remote socket in E (Exclusive) or F (Forward) state.",
+        "PublicDescription": "Counts when a transaction with the opcode type RspSFwd Snoop Response was received which indicates a remote caching agent forwarded the data but held on to its current copy.  This is common for data and code reads that hit in a remote socket in E (Exclusive) or F (Forward) state.",
        "UMask": "0x8",
        "Unit": "CHA"
    },
--- a/tools/perf/pmu-events/arch/x86/emeraldrapids/uncore-io.json
+++ b/tools/perf/pmu-events/arch/x86/emeraldrapids/uncore-io.json
@ -223,6 +223,7 @@
        "Experimental": "1",
        "FCMask": "0x07",
        "PerPkg": "1",
+        "PortMask": "0xff",
        "UMask": "0xff",
        "Unit": "IIO"
    },
@ -234,7 +235,7 @@
        "Experimental": "1",
        "FCMask": "0x07",
        "PerPkg": "1",
-        "PortMask": "0x0000",
+        "PortMask": "0x01",
        "PublicDescription": "x16 card plugged in to stack, Or x8 card plugged in to Lane 0/1, Or x4 card is plugged in to slot 0",
        "UMask": "0x1",
        "Unit": "IIO"
@ -247,7 +248,7 @@
        "Experimental": "1",
        "FCMask": "0x07",
        "PerPkg": "1",
-        "PortMask": "0x0000",
+        "PortMask": "0x02",
        "PublicDescription": "x4 card is plugged in to slot 1",
        "UMask": "0x2",
        "Unit": "IIO"
@ -260,7 +261,7 @@
        "Experimental": "1",
        "FCMask": "0x07",
        "PerPkg": "1",
-        "PortMask": "0x0000",
+        "PortMask": "0x04",
        "PublicDescription": "x8 card plugged in to Lane 2/3, Or x4 card is plugged in to slot 1",
        "UMask": "0x4",
        "Unit": "IIO"
@ -273,7 +274,7 @@
        "Experimental": "1",
        "FCMask": "0x07",
        "PerPkg": "1",
-        "PortMask": "0x0000",
+        "PortMask": "0x08",
        "PublicDescription": "x4 card is plugged in to slot 3",
        "UMask": "0x8",
        "Unit": "IIO"
@ -286,7 +287,7 @@
        "Experimental": "1",
        "FCMask": "0x07",
        "PerPkg": "1",
-        "PortMask": "0x0000",
+        "PortMask": "0x10",
        "PublicDescription": "x16 card plugged in to stack, Or x8 card plugged in to Lane 0/1, Or x4 card is plugged in to slot 0",
        "UMask": "0x10",
        "Unit": "IIO"
@ -299,7 +300,7 @@
        "Experimental": "1",
        "FCMask": "0x07",
        "PerPkg": "1",
-        "PortMask": "0x0000",
+        "PortMask": "0x20",
        "PublicDescription": "x4 card is plugged in to slot 1",
        "UMask": "0x20",
        "Unit": "IIO"
@ -312,7 +313,7 @@
        "Experimental": "1",
        "FCMask": "0x07",
        "PerPkg": "1",
-        "PortMask": "0x0000",
+        "PortMask": "0x40",
        "PublicDescription": "x8 card plugged in to Lane 2/3, Or x4 card is plugged in to slot 1",
        "UMask": "0x40",
        "Unit": "IIO"
@ -325,7 +326,7 @@
        "Experimental": "1",
        "FCMask": "0x07",
        "PerPkg": "1",
-        "PortMask": "0x0000",
+        "PortMask": "0x80",
        "PublicDescription": "x4 card is plugged in to slot 3",
        "UMask": "0x80",
        "Unit": "IIO"
--- a/tools/perf/pmu-events/arch/x86/grandridge/cache.json
+++ b/tools/perf/pmu-events/arch/x86/grandridge/cache.json
@ -285,8 +285,8 @@
        "UMask": "0x82"
    },
    {
-        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD - Only counts with PEBS enabled.",
-        "Counter": "0,1,2,3,4,5,6,7",
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold of 1024. Only counts with PEBS enabled.",
+        "Counter": "0,1",
        "Data_LA": "1",
        "EventCode": "0xd0",
        "EventName": "MEM_UOPS_RETIRED.LOAD_LATENCY_GT_1024",
@ -296,8 +296,8 @@
        "UMask": "0x5"
    },
    {
-        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD - Only counts with PEBS enabled.",
-        "Counter": "0,1,2,3,4,5,6,7",
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold of 128. Only counts with PEBS enabled.",
+        "Counter": "0,1",
        "Data_LA": "1",
        "EventCode": "0xd0",
        "EventName": "MEM_UOPS_RETIRED.LOAD_LATENCY_GT_128",
@ -307,8 +307,8 @@
        "UMask": "0x5"
    },
    {
-        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD - Only counts with PEBS enabled.",
-        "Counter": "0,1,2,3,4,5,6,7",
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold of 16. Only counts with PEBS enabled.",
+        "Counter": "0,1",
        "Data_LA": "1",
        "EventCode": "0xd0",
        "EventName": "MEM_UOPS_RETIRED.LOAD_LATENCY_GT_16",
@ -318,8 +318,8 @@
        "UMask": "0x5"
    },
    {
-        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD - Only counts with PEBS enabled.",
-        "Counter": "0,1,2,3,4,5,6,7",
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold of 2048. Only counts with PEBS enabled.",
+        "Counter": "0,1",
        "Data_LA": "1",
        "EventCode": "0xd0",
        "EventName": "MEM_UOPS_RETIRED.LOAD_LATENCY_GT_2048",
@ -329,8 +329,8 @@
        "UMask": "0x5"
    },
    {
-        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD - Only counts with PEBS enabled.",
-        "Counter": "0,1,2,3,4,5,6,7",
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold of 256. Only counts with PEBS enabled.",
+        "Counter": "0,1",
        "Data_LA": "1",
        "EventCode": "0xd0",
        "EventName": "MEM_UOPS_RETIRED.LOAD_LATENCY_GT_256",
@ -340,8 +340,8 @@
        "UMask": "0x5"
    },
    {
-        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD - Only counts with PEBS enabled.",
-        "Counter": "0,1,2,3,4,5,6,7",
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold of 32. Only counts with PEBS enabled.",
+        "Counter": "0,1",
        "Data_LA": "1",
        "EventCode": "0xd0",
        "EventName": "MEM_UOPS_RETIRED.LOAD_LATENCY_GT_32",
@ -351,8 +351,8 @@
        "UMask": "0x5"
    },
    {
-        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD - Only counts with PEBS enabled.",
-        "Counter": "0,1,2,3,4,5,6,7",
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold of 4. Only counts with PEBS enabled.",
+        "Counter": "0,1",
        "Data_LA": "1",
        "EventCode": "0xd0",
        "EventName": "MEM_UOPS_RETIRED.LOAD_LATENCY_GT_4",
@ -362,8 +362,8 @@
        "UMask": "0x5"
    },
    {
-        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD - Only counts with PEBS enabled.",
-        "Counter": "0,1,2,3,4,5,6,7",
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold of 512. Only counts with PEBS enabled.",
+        "Counter": "0,1",
        "Data_LA": "1",
        "EventCode": "0xd0",
        "EventName": "MEM_UOPS_RETIRED.LOAD_LATENCY_GT_512",
@ -373,8 +373,8 @@
        "UMask": "0x5"
    },
    {
-        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD - Only counts with PEBS enabled.",
-        "Counter": "0,1,2,3,4,5,6,7",
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold of 64. Only counts with PEBS enabled.",
+        "Counter": "0,1",
        "Data_LA": "1",
        "EventCode": "0xd0",
        "EventName": "MEM_UOPS_RETIRED.LOAD_LATENCY_GT_64",
@ -384,8 +384,8 @@
        "UMask": "0x5"
    },
    {
-        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD - Only counts with PEBS enabled.",
-        "Counter": "0,1,2,3,4,5,6,7",
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold of 8. Only counts with PEBS enabled.",
+        "Counter": "0,1",
        "Data_LA": "1",
        "EventCode": "0xd0",
        "EventName": "MEM_UOPS_RETIRED.LOAD_LATENCY_GT_8",
@ -458,7 +458,7 @@
        "UMask": "0x12"
    },
    {
-        "BriefDescription": "Counts the number of  stores uops retired same as MEM_UOPS_RETIRED.ALL_STORES",
+        "BriefDescription": "Counts the number of stores uops retired.",
        "Counter": "0,1,2,3,4,5,6,7",
        "Data_LA": "1",
        "EventCode": "0xd0",
--- a/tools/perf/pmu-events/arch/x86/grandridge/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/grandridge/pipeline.json
@ -178,7 +178,7 @@
        "UMask": "0xf7"
    },
    {
-        "BriefDescription": "Fixed Counter: Counts the number of unhalted core clock cycles",
+        "BriefDescription": "Fixed Counter: Counts the number of unhalted core clock cycles. [This event is alias to CPU_CLK_UNHALTED.THREAD]",
        "Counter": "Fixed counter 1",
        "EventName": "CPU_CLK_UNHALTED.CORE",
        "SampleAfterValue": "2000003",
@ -192,7 +192,7 @@
        "SampleAfterValue": "2000003"
    },
    {
-        "BriefDescription": "Fixed Counter: Counts the number of unhalted reference clock cycles",
+        "BriefDescription": "Fixed Counter: Counts the number of unhalted reference clock cycles.",
        "Counter": "Fixed counter 2",
        "EventName": "CPU_CLK_UNHALTED.REF_TSC",
        "SampleAfterValue": "2000003",
@ -208,7 +208,7 @@
        "UMask": "0x1"
    },
    {
-        "BriefDescription": "Fixed Counter: Counts the number of unhalted core clock cycles",
+        "BriefDescription": "Fixed Counter: Counts the number of unhalted core clock cycles. [This event is alias to CPU_CLK_UNHALTED.CORE]",
        "Counter": "Fixed counter 1",
        "EventName": "CPU_CLK_UNHALTED.THREAD",
        "SampleAfterValue": "2000003",
@ -222,10 +222,10 @@
        "SampleAfterValue": "2000003"
    },
    {
-        "BriefDescription": "Fixed Counter: Counts the number of instructions retired",
+        "BriefDescription": "Fixed Counter: Counts the number of instructions retired.",
        "Counter": "Fixed counter 0",
        "EventName": "INST_RETIRED.ANY",
-        "PublicDescription": "Fixed Counter: Counts the number of instructions retired Available PDIST counters: 32",
+        "PublicDescription": "Fixed Counter: Counts the number of instructions retired. Available PDIST counters: 32",
        "SampleAfterValue": "2000003",
        "UMask": "0x1"
    },
@ -301,6 +301,38 @@
        "SampleAfterValue": "1000003",
        "UMask": "0x1"
    },
+    {
+        "BriefDescription": "Counts the number of CLFLUSH, CLWB, and CLDEMOTE instructions retired.",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "EventCode": "0xe0",
+        "EventName": "MISC_RETIRED1.CL_INST",
+        "SampleAfterValue": "1000003",
+        "UMask": "0xff"
+    },
+    {
+        "BriefDescription": "Counts the number of LFENCE instructions retired.",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "EventCode": "0xe0",
+        "EventName": "MISC_RETIRED1.LFENCE",
+        "SampleAfterValue": "1000003",
+        "UMask": "0x2"
+    },
+    {
+        "BriefDescription": "Counts the number of accesses to KeyLocker cache.",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "EventCode": "0xe1",
+        "EventName": "MISC_RETIRED2.KEYLOCKER_ACCESS",
+        "SampleAfterValue": "1000003",
+        "UMask": "0x10"
+    },
+    {
+        "BriefDescription": "Counts the number of misses to KeyLocker cache.",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "EventCode": "0xe1",
+        "EventName": "MISC_RETIRED2.KEYLOCKER_MISS",
+        "SampleAfterValue": "1000003",
+        "UMask": "0x11"
+    },
    {
        "BriefDescription": "Counts the number of issue slots in a UMWAIT or TPAUSE instruction where no uop issues due to the instruction putting the CPU into the C0.1 activity state.",
        "Counter": "0,1,2,3,4,5,6,7",
--- a/tools/perf/pmu-events/arch/x86/graniterapids/frontend.json
+++ b/tools/perf/pmu-events/arch/x86/graniterapids/frontend.json
@ -325,6 +325,22 @@
        "SampleAfterValue": "200003",
        "UMask": "0x4"
    },
+    {
+        "BriefDescription": "ICACHE_TAG.STALLS_INUSE",
+        "Counter": "0,1,2,3",
+        "EventCode": "0x83",
+        "EventName": "ICACHE_TAG.STALLS_INUSE",
+        "SampleAfterValue": "200003",
+        "UMask": "0x10"
+    },
+    {
+        "BriefDescription": "ICACHE_TAG.STALLS_ISB",
+        "Counter": "0,1,2,3",
+        "EventCode": "0x83",
+        "EventName": "ICACHE_TAG.STALLS_ISB",
+        "SampleAfterValue": "200003",
+        "UMask": "0x8"
+    },
    {
        "BriefDescription": "Cycles Decode Stream Buffer (DSB) is delivering any Uop",
        "Counter": "0,1,2,3",
--- a/tools/perf/pmu-events/arch/x86/graniterapids/gnr-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/graniterapids/gnr-metrics.json
@ -143,6 +143,12 @@
        "MetricName": "io_full_write_l3_miss",
        "ScaleUnit": "100%"
    },
+    {
+        "BriefDescription": "The number of times per second that ownership of a cacheline was stolen from the integrated IO controller before it was able to write back the modified line",
+        "MetricExpr": "(UNC_I_MISC1.LOST_FWD + UNC_I_MISC1.SEC_RCVD_INVLD) / duration_time",
+        "MetricName": "io_lost_fwd",
+        "ScaleUnit": "1per_sec"
+    },
    {
        "BriefDescription": "Message Signaled Interrupts (MSI) per second sent by the integrated I/O traffic controller (IIO) to System Configuration Controller (Ubox)",
        "MetricExpr": "UNC_IIO_NUM_REQ_OF_CPU_BY_TGT.UBOX_POSTED / duration_time",
@ -294,6 +300,27 @@
        "MetricName": "memory_bandwidth_write",
        "ScaleUnit": "1MB/s"
    },
+    {
+        "BriefDescription": "All reads to the local sub-numa cluster cache as a percentage of total memory read accesses",
+        "MetricExpr": "(L2_LINES_IN.ALL - (OCR.READS_TO_CORE.SNC_CACHE.HITM + OCR.READS_TO_CORE.SNC_CACHE.HIT_WITH_FWD + OCR.READS_TO_CORE.REMOTE_CACHE.SNOOP_FWD + OCR.READS_TO_CORE.REMOTE_MEMORY + OCR.READS_TO_CORE.L3_MISS_LOCAL)) / L2_LINES_IN.ALL",
+        "MetricName": "numa_percent_all_reads_to_local_cluster_cache",
+        "PublicDescription": "All reads to the local sub-numa cluster cache as a percentage of total memory read accesses. Includes demand and prefetch requests for data reads, code reads, read for ownerships (RFO), does not include LLC prefetches",
+        "ScaleUnit": "100%"
+    },
+    {
+        "BriefDescription": "All reads to the local sub-numa cluster memory as a percentage of total memory read accesses",
+        "MetricExpr": "OCR.READS_TO_CORE.L3_MISS_LOCAL / L2_LINES_IN.ALL",
+        "MetricName": "numa_percent_all_reads_to_local_cluster_memory",
+        "PublicDescription": "All reads to the local sub-numa cluster memory as a percentage of total memory read accesses. Includes demand and prefetch requests for data reads, code reads, read for ownerships (RFO), does not include LLC prefetches",
+        "ScaleUnit": "100%"
+    },
+    {
+        "BriefDescription": "All reads to a remote sub-numa cluster cache as a percentage of total memory read accesses",
+        "MetricExpr": "(OCR.READS_TO_CORE.SNC_CACHE.HIT_WITH_FWD + OCR.READS_TO_CORE.SNC_CACHE.HITM) / L2_LINES_IN.ALL",
+        "MetricName": "numa_percent_all_reads_to_remote_cluster_cache",
+        "PublicDescription": "All reads to a remote sub-numa cluster cache as a percentage of total memory read accesses. Includes demand and prefetch requests for data reads, code reads, read for ownerships (RFO), does not include LLC prefetches",
+        "ScaleUnit": "100%"
+    },
    {
        "BriefDescription": "Memory read that miss the last level cache (LLC) addressed to local DRAM as a percentage of total memory read accesses, does not include LLC prefetches",
        "MetricExpr": "(UNC_CHA_TOR_INSERTS.IA_MISS_DRD_LOCAL + UNC_CHA_TOR_INSERTS.IA_MISS_DRD_PREF_LOCAL) / (UNC_CHA_TOR_INSERTS.IA_MISS_DRD_LOCAL + UNC_CHA_TOR_INSERTS.IA_MISS_DRD_PREF_LOCAL + UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE + UNC_CHA_TOR_INSERTS.IA_MISS_DRD_PREF_REMOTE)",
--- a/tools/perf/pmu-events/arch/x86/lunarlake/cache.json
+++ b/tools/perf/pmu-events/arch/x86/lunarlake/cache.json
@ -550,6 +550,24 @@
        "UMask": "0x7e",
        "Unit": "cpu_atom"
    },
+    {
+        "BriefDescription": "Counts the number of unhalted cycles when the core is stalled due to an icache or itlb miss which missed all the caches.",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "EventCode": "0x35",
+        "EventName": "MEM_BOUND_STALLS_IFETCH.LLC_MISS",
+        "SampleAfterValue": "1000003",
+        "UMask": "0x78",
+        "Unit": "cpu_atom"
+    },
+    {
+        "BriefDescription": "Counts the number of unhalted cycles when the core is stalled due to an icache or itlb miss which missed all the caches. Local DRAM, MMIO or other local memory type provides the data.",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "EventCode": "0x35",
+        "EventName": "MEM_BOUND_STALLS_IFETCH.LLC_MISS_LOCALMEM",
+        "SampleAfterValue": "1000003",
+        "UMask": "0x50",
+        "Unit": "cpu_atom"
+    },
    {
        "BriefDescription": "Counts the number of unhalted cycles when the core is stalled due to an L1 demand load miss.",
        "Counter": "0,1,2,3,4,5,6,7",
@ -1088,7 +1106,7 @@
        "Unit": "cpu_atom"
    },
    {
-        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD - Only counts with PEBS enabled",
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold of 128.",
        "Counter": "0,1",
        "Data_LA": "1",
        "EventCode": "0xd0",
@ -1100,7 +1118,7 @@
        "Unit": "cpu_atom"
    },
    {
-        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD - Only counts with PEBS enabled",
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold of 16.",
        "Counter": "0,1",
        "Data_LA": "1",
        "EventCode": "0xd0",
@ -1112,7 +1130,7 @@
        "Unit": "cpu_atom"
    },
    {
-        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD - Only counts with PEBS enabled",
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold of 256.",
        "Counter": "0,1",
        "Data_LA": "1",
        "EventCode": "0xd0",
@ -1124,7 +1142,7 @@
        "Unit": "cpu_atom"
    },
    {
-        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD - Only counts with PEBS enabled",
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold of 32.",
        "Counter": "0,1",
        "Data_LA": "1",
        "EventCode": "0xd0",
@ -1136,7 +1154,7 @@
        "Unit": "cpu_atom"
    },
    {
-        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD - Only counts with PEBS enabled",
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold of 4.",
        "Counter": "0,1",
        "Data_LA": "1",
        "EventCode": "0xd0",
@ -1148,7 +1166,7 @@
        "Unit": "cpu_atom"
    },
    {
-        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD - Only counts with PEBS enabled",
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold of 512.",
        "Counter": "0,1",
        "Data_LA": "1",
        "EventCode": "0xd0",
@ -1160,7 +1178,7 @@
        "Unit": "cpu_atom"
    },
    {
-        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD - Only counts with PEBS enabled",
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold of 64.",
        "Counter": "0,1",
        "Data_LA": "1",
        "EventCode": "0xd0",
@ -1172,7 +1190,7 @@
        "Unit": "cpu_atom"
    },
    {
-        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD - Only counts with PEBS enabled",
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold of 8.",
        "Counter": "0,1",
        "Data_LA": "1",
        "EventCode": "0xd0",
@ -1274,7 +1292,7 @@
        "Unit": "cpu_atom"
    },
    {
-        "BriefDescription": "Counts the number of  stores uops retired same as MEM_UOPS_RETIRED.ALL_STORES",
+        "BriefDescription": "Counts the number of stores uops retired.",
        "Counter": "0,1,2,3,4,5,6,7",
        "Data_LA": "1",
        "EventCode": "0xd0",
--- a/tools/perf/pmu-events/arch/x86/lunarlake/frontend.json
+++ b/tools/perf/pmu-events/arch/x86/lunarlake/frontend.json
@ -424,6 +424,15 @@
        "UMask": "0x1",
        "Unit": "cpu_atom"
    },
+    {
+        "BriefDescription": "Counts the number of instructions retired that were tagged because empty issue slots were seen before the uop due to Instruction L1 cache miss, that missed in the L2 cache.",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "EventCode": "0xc9",
+        "EventName": "FRONTEND_RETIRED_SOURCE.ICACHE_L2_MISS",
+        "SampleAfterValue": "1000003",
+        "UMask": "0xe",
+        "Unit": "cpu_atom"
+    },
    {
        "BriefDescription": "Counts the number of instructions retired that were tagged because empty issue slots were seen before the uop due to ITLB miss that hit in the second level TLB.",
        "Counter": "0,1,2,3,4,5,6,7",
@ -500,6 +509,24 @@
        "UMask": "0x4",
        "Unit": "cpu_core"
    },
+    {
+        "BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache In use-full",
+        "Counter": "0,1,2,3,4,5,6,7,8,9",
+        "EventCode": "0x83",
+        "EventName": "ICACHE_TAG.STALLS_INUSE",
+        "SampleAfterValue": "200003",
+        "UMask": "0x10",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache ISB-full",
+        "Counter": "0,1,2,3,4,5,6,7,8,9",
+        "EventCode": "0x83",
+        "EventName": "ICACHE_TAG.STALLS_ISB",
+        "SampleAfterValue": "200003",
+        "UMask": "0x8",
+        "Unit": "cpu_core"
+    },
    {
        "BriefDescription": "Cycles Decode Stream Buffer (DSB) is delivering any Uop",
        "Counter": "0,1,2,3,4,5,6,7,8,9",
--- a/tools/perf/pmu-events/arch/x86/lunarlake/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/lunarlake/pipeline.json
@ -634,7 +634,7 @@
        "Unit": "cpu_core"
    },
    {
-        "BriefDescription": "Fixed Counter: Counts the number of unhalted core clock cycles.",
+        "BriefDescription": "Fixed Counter: Counts the number of unhalted core clock cycles. [This event is alias to CPU_CLK_UNHALTED.THREAD]",
        "Counter": "Fixed counter 1",
        "EventName": "CPU_CLK_UNHALTED.CORE",
        "SampleAfterValue": "2000003",
@ -725,7 +725,7 @@
        "Unit": "cpu_core"
    },
    {
-        "BriefDescription": "Fixed Counter: Counts the number of unhalted core clock cycles.",
+        "BriefDescription": "Fixed Counter: Counts the number of unhalted core clock cycles. [This event is alias to CPU_CLK_UNHALTED.CORE]",
        "Counter": "Fixed counter 1",
        "EventName": "CPU_CLK_UNHALTED.THREAD",
        "SampleAfterValue": "2000003",
@ -1530,8 +1530,9 @@
        "Unit": "cpu_atom"
    },
    {
-        "BriefDescription": "Counts the number of accesses to KeyLocker cache.",
+        "BriefDescription": "This event is deprecated.",
        "Counter": "0,1,2,3,4,5,6,7",
+        "Deprecated": "1",
        "EventCode": "0xe1",
        "EventName": "MISC_RETIRED2.KEYLOCKER_ACCESS",
        "SampleAfterValue": "1000003",
@ -1539,8 +1540,9 @@
        "Unit": "cpu_atom"
    },
    {
-        "BriefDescription": "Counts the number of misses to KeyLocker cache.",
+        "BriefDescription": "This event is deprecated.",
        "Counter": "0,1,2,3,4,5,6,7",
+        "Deprecated": "1",
        "EventCode": "0xe1",
        "EventName": "MISC_RETIRED2.KEYLOCKER_MISS",
        "SampleAfterValue": "1000003",