Merge tag 'perf-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull performance events updates from Ingo Molnar:
 "Core:
   - Move perf_event sysctls into kernel/events/ (Joel Granados)
   - Use POLLHUP for pinned events in error (Namhyung Kim)
   - Avoid the read if the count is already updated (Peter Zijlstra)
   - Allow the EPOLLRDNORM flag for poll (Tao Chen)
   - locking/percpu-rwsem: Add guard support [ NOTE: this got
     (mis-)merged into the perf tree due to related work ] (Peter
     Zijlstra)

  perf_pmu_unregister() related improvements: (Peter Zijlstra)
   - Simplify the perf_event_alloc() error path
   - Simplify the perf_pmu_register() error path
   - Simplify perf_pmu_register()
   - Simplify perf_init_event()
   - Simplify perf_event_alloc()
   - Merge struct pmu::pmu_disable_count into struct
     perf_cpu_pmu_context::pmu_disable_count
   - Add this_cpc() helper
   - Introduce perf_free_addr_filters()
   - Robustify perf_event_free_bpf_prog()
   - Simplify the perf_mmap() control flow
   - Further simplify perf_mmap()
   - Remove retry loop from perf_mmap()
   - Lift event->mmap_mutex in perf_mmap()
   - Detach 'struct perf_cpu_pmu_context' and 'struct pmu' lifetimes
   - Fix perf_mmap() failure path

  Uprobes:
   - Harden x86 uretprobe syscall trampoline check (Jiri Olsa)
   - Remove redundant spinlock in uprobe_deny_signal() (Liao Chang)
   - Remove the spinlock within handle_singlestep() (Liao Chang)

  x86 Intel PMU enhancements:
   - Support PEBS counters snapshotting (Kan Liang)
   - Fix intel_pmu_read_event() (Kan Liang)
   - Extend per event callchain limit to branch stack (Kan Liang)
   - Fix system-wide LBR profiling (Kan Liang)
   - Allocate bts_ctx only if necessary (Li RongQing)
   - Apply static call for drain_pebs (Peter Zijlstra)

  x86 AMD PMU enhancements: (Ravi Bangoria)
   - Remove pointless sample period check
   - Fix ->config to sample period calculation for OP PMU
   - Fix perf_ibs_op.cnt_mask for CurCnt
   - Don't allow freq mode event creation through ->config interface
   - Add PMU specific minimum period
   - Add ->check_period() callback
   - Ceil sample_period to min_period
   - Add support for OP Load Latency Filtering
   - Update DTLB/PageSize decode logic

  Hardware breakpoints:
   - Return EOPNOTSUPP for unsupported breakpoint type (Saket Kumar
     Bhaskar)

  Hardlockup detector improvements: (Li Huafei)
   - Fix perf_event memory leak
   - Warn if watchdog_ev is leaked

  Fixes and cleanups:
   - Misc fixes and cleanups (Andy Shevchenko, Kan Liang, Peter
     Zijlstra, Ravi Bangoria, Thorsten Blum, XieLudan)"

* tag 'perf-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (55 commits)
  perf: Fix __percpu annotation
  perf: Clean up pmu specific data
  perf/x86: Remove swap_task_ctx()
  perf/x86/lbr: Fix shorter LBRs call stacks for the system-wide mode
  perf: Supply task information to sched_task()
  perf: attach/detach PMU specific data
  locking/percpu-rwsem: Add guard support
  perf: Save PMU specific data in task_struct
  perf: Extend per event callchain limit to branch stack
  perf/ring_buffer: Allow the EPOLLRDNORM flag for poll
  perf/core: Use POLLHUP for pinned events in error
  perf/core: Use sysfs_emit() instead of scnprintf()
  perf/core: Remove optional 'size' arguments from strscpy() calls
  perf/x86/intel/bts: Check if bts_ctx is allocated when calling BTS functions
  uprobes/x86: Harden uretprobe syscall trampoline check
  watchdog/hardlockup/perf: Warn if watchdog_ev is leaked
  watchdog/hardlockup/perf: Fix perf_event memory leak
  perf/x86: Annotate struct bts_buffer::buf with __counted_by()
  perf/core: Clean up perf_try_init_event()
  perf/core: Fix perf_mmap() failure path
  ...
Linus Torvalds 2025-03-24 21:46:36 -07:00
commit 327ecdbc0f
34 changed files with 1425 additions and 746 deletions


@ -132,7 +132,10 @@ static unsigned long ebb_switch_in(bool ebb, struct cpu_hw_events *cpuhw)
static inline void power_pmu_bhrb_enable(struct perf_event *event) {}
static inline void power_pmu_bhrb_disable(struct perf_event *event) {}
static void power_pmu_sched_task(struct perf_event_pmu_context *pmu_ctx, bool sched_in) {}
static void power_pmu_sched_task(struct perf_event_pmu_context *pmu_ctx,
struct task_struct *task, bool sched_in)
{
}
static inline void power_pmu_bhrb_read(struct perf_event *event, struct cpu_hw_events *cpuhw) {}
static void pmao_restore_workaround(bool ebb) { }
#endif /* CONFIG_PPC32 */
@ -444,7 +447,8 @@ static void power_pmu_bhrb_disable(struct perf_event *event)
/* Called from ctxsw to prevent one process's branch entries to
* mingle with the other process's entries during context switch.
*/
static void power_pmu_sched_task(struct perf_event_pmu_context *pmu_ctx, bool sched_in)
static void power_pmu_sched_task(struct perf_event_pmu_context *pmu_ctx,
struct task_struct *task, bool sched_in)
{
if (!ppmu->bhrb_nr)
return;


@ -518,7 +518,8 @@ static void paicrypt_have_samples(void)
/* Called on schedule-in and schedule-out. No access to event structure,
* but for sampling only event CRYPTO_ALL is allowed.
*/
static void paicrypt_sched_task(struct perf_event_pmu_context *pmu_ctx, bool sched_in)
static void paicrypt_sched_task(struct perf_event_pmu_context *pmu_ctx,
struct task_struct *task, bool sched_in)
{
/* We started with a clean page on event installation. So read out
* results on schedule_out and if page was dirty, save old values.


@ -542,7 +542,8 @@ static void paiext_have_samples(void)
/* Called on schedule-in and schedule-out. No access to event structure,
* but for sampling only event NNPA_ALL is allowed.
*/
static void paiext_sched_task(struct perf_event_pmu_context *pmu_ctx, bool sched_in)
static void paiext_sched_task(struct perf_event_pmu_context *pmu_ctx,
struct task_struct *task, bool sched_in)
{
/* We started with a clean page on event installation. So read out
* results on schedule_out and if page was dirty, save old values.


@ -381,7 +381,8 @@ static void amd_brs_poison_buffer(void)
* On ctxswin, sched_in = true, called after the PMU has started
* On ctxswout, sched_in = false, called before the PMU is stopped
*/
void amd_pmu_brs_sched_task(struct perf_event_pmu_context *pmu_ctx, bool sched_in)
void amd_pmu_brs_sched_task(struct perf_event_pmu_context *pmu_ctx,
struct task_struct *task, bool sched_in)
{
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);


@ -28,9 +28,6 @@ static u32 ibs_caps;
#include <asm/nmi.h>
#include <asm/amd-ibs.h>
#define IBS_FETCH_CONFIG_MASK (IBS_FETCH_RAND_EN | IBS_FETCH_MAX_CNT)
#define IBS_OP_CONFIG_MASK IBS_OP_MAX_CNT
/* attr.config2 */
#define IBS_SW_FILTER_MASK 1
@ -89,6 +86,7 @@ struct perf_ibs {
u64 cnt_mask;
u64 enable_mask;
u64 valid_mask;
u16 min_period;
u64 max_period;
unsigned long offset_mask[1];
int offset_max;
@ -270,11 +268,19 @@ static int validate_group(struct perf_event *event)
return 0;
}
static bool perf_ibs_ldlat_event(struct perf_ibs *perf_ibs,
struct perf_event *event)
{
return perf_ibs == &perf_ibs_op &&
(ibs_caps & IBS_CAPS_OPLDLAT) &&
(event->attr.config1 & 0xFFF);
}
static int perf_ibs_init(struct perf_event *event)
{
struct hw_perf_event *hwc = &event->hw;
struct perf_ibs *perf_ibs;
u64 max_cnt, config;
u64 config;
int ret;
perf_ibs = get_ibs_pmu(event->attr.type);
@ -310,25 +316,47 @@ static int perf_ibs_init(struct perf_event *event)
if (config & perf_ibs->cnt_mask)
/* raw max_cnt may not be set */
return -EINVAL;
if (!event->attr.sample_freq && hwc->sample_period & 0x0f)
/*
* lower 4 bits can not be set in ibs max cnt,
* but allowing it in case we adjust the
* sample period to set a frequency.
*/
return -EINVAL;
hwc->sample_period &= ~0x0FULL;
if (!hwc->sample_period)
hwc->sample_period = 0x10;
if (event->attr.freq) {
hwc->sample_period = perf_ibs->min_period;
} else {
/* Silently mask off lower nibble. IBS hw mandates it. */
hwc->sample_period &= ~0x0FULL;
if (hwc->sample_period < perf_ibs->min_period)
return -EINVAL;
}
} else {
max_cnt = config & perf_ibs->cnt_mask;
u64 period = 0;
if (event->attr.freq)
return -EINVAL;
if (perf_ibs == &perf_ibs_op) {
period = (config & IBS_OP_MAX_CNT) << 4;
if (ibs_caps & IBS_CAPS_OPCNTEXT)
period |= config & IBS_OP_MAX_CNT_EXT_MASK;
} else {
period = (config & IBS_FETCH_MAX_CNT) << 4;
}
config &= ~perf_ibs->cnt_mask;
event->attr.sample_period = max_cnt << 4;
hwc->sample_period = event->attr.sample_period;
event->attr.sample_period = period;
hwc->sample_period = period;
if (hwc->sample_period < perf_ibs->min_period)
return -EINVAL;
}
if (!hwc->sample_period)
return -EINVAL;
if (perf_ibs_ldlat_event(perf_ibs, event)) {
u64 ldlat = event->attr.config1 & 0xFFF;
if (ldlat < 128 || ldlat > 2048)
return -EINVAL;
ldlat >>= 7;
config |= (ldlat - 1) << 59;
config |= IBS_OP_L3MISSONLY | IBS_OP_LDLAT_EN;
}
/*
* If we modify hwc->sample_period, we also need to update
@ -349,7 +377,8 @@ static int perf_ibs_set_period(struct perf_ibs *perf_ibs,
int overflow;
/* ignore lower 4 bits in min count: */
overflow = perf_event_set_period(hwc, 1<<4, perf_ibs->max_period, period);
overflow = perf_event_set_period(hwc, perf_ibs->min_period,
perf_ibs->max_period, period);
local64_set(&hwc->prev_count, 0);
return overflow;
@ -447,6 +476,9 @@ static void perf_ibs_start(struct perf_event *event, int flags)
WARN_ON_ONCE(!(hwc->state & PERF_HES_UPTODATE));
hwc->state = 0;
if (event->attr.freq && hwc->sample_period < perf_ibs->min_period)
hwc->sample_period = perf_ibs->min_period;
perf_ibs_set_period(perf_ibs, hwc, &period);
if (perf_ibs == &perf_ibs_op && (ibs_caps & IBS_CAPS_OPCNTEXT)) {
config |= period & IBS_OP_MAX_CNT_EXT_MASK;
@ -554,6 +586,28 @@ static void perf_ibs_del(struct perf_event *event, int flags)
static void perf_ibs_read(struct perf_event *event) { }
static int perf_ibs_check_period(struct perf_event *event, u64 value)
{
struct perf_ibs *perf_ibs;
u64 low_nibble;
if (event->attr.freq)
return 0;
perf_ibs = container_of(event->pmu, struct perf_ibs, pmu);
low_nibble = value & 0xFULL;
/*
* This contradicts with perf_ibs_init() which allows sample period
* with lower nibble bits set but silently masks them off. Whereas
* this returns error.
*/
if (low_nibble || value < perf_ibs->min_period)
return -EINVAL;
return 0;
}
/*
* We need to initialize with empty group if all attributes in the
* group are dynamic.
@ -572,7 +626,10 @@ PMU_FORMAT_ATTR(cnt_ctl, "config:19");
PMU_FORMAT_ATTR(swfilt, "config2:0");
PMU_EVENT_ATTR_STRING(l3missonly, fetch_l3missonly, "config:59");
PMU_EVENT_ATTR_STRING(l3missonly, op_l3missonly, "config:16");
PMU_EVENT_ATTR_STRING(ldlat, ibs_op_ldlat_format, "config1:0-11");
PMU_EVENT_ATTR_STRING(zen4_ibs_extensions, zen4_ibs_extensions, "1");
PMU_EVENT_ATTR_STRING(ldlat, ibs_op_ldlat_cap, "1");
PMU_EVENT_ATTR_STRING(dtlb_pgsize, ibs_op_dtlb_pgsize_cap, "1");
static umode_t
zen4_ibs_extensions_is_visible(struct kobject *kobj, struct attribute *attr, int i)
@ -580,6 +637,18 @@ zen4_ibs_extensions_is_visible(struct kobject *kobj, struct attribute *attr, int
return ibs_caps & IBS_CAPS_ZEN4 ? attr->mode : 0;
}
static umode_t
ibs_op_ldlat_is_visible(struct kobject *kobj, struct attribute *attr, int i)
{
return ibs_caps & IBS_CAPS_OPLDLAT ? attr->mode : 0;
}
static umode_t
ibs_op_dtlb_pgsize_is_visible(struct kobject *kobj, struct attribute *attr, int i)
{
return ibs_caps & IBS_CAPS_OPDTLBPGSIZE ? attr->mode : 0;
}
static struct attribute *fetch_attrs[] = {
&format_attr_rand_en.attr,
&format_attr_swfilt.attr,
@ -596,6 +665,16 @@ static struct attribute *zen4_ibs_extensions_attrs[] = {
NULL,
};
static struct attribute *ibs_op_ldlat_cap_attrs[] = {
&ibs_op_ldlat_cap.attr.attr,
NULL,
};
static struct attribute *ibs_op_dtlb_pgsize_cap_attrs[] = {
&ibs_op_dtlb_pgsize_cap.attr.attr,
NULL,
};
static struct attribute_group group_fetch_formats = {
.name = "format",
.attrs = fetch_attrs,
@ -613,6 +692,18 @@ static struct attribute_group group_zen4_ibs_extensions = {
.is_visible = zen4_ibs_extensions_is_visible,
};
static struct attribute_group group_ibs_op_ldlat_cap = {
.name = "caps",
.attrs = ibs_op_ldlat_cap_attrs,
.is_visible = ibs_op_ldlat_is_visible,
};
static struct attribute_group group_ibs_op_dtlb_pgsize_cap = {
.name = "caps",
.attrs = ibs_op_dtlb_pgsize_cap_attrs,
.is_visible = ibs_op_dtlb_pgsize_is_visible,
};
static const struct attribute_group *fetch_attr_groups[] = {
&group_fetch_formats,
&empty_caps_group,
@ -651,6 +742,11 @@ static struct attribute_group group_op_formats = {
.attrs = op_attrs,
};
static struct attribute *ibs_op_ldlat_format_attrs[] = {
&ibs_op_ldlat_format.attr.attr,
NULL,
};
static struct attribute_group group_cnt_ctl = {
.name = "format",
.attrs = cnt_ctl_attrs,
@ -669,10 +765,19 @@ static const struct attribute_group *op_attr_groups[] = {
NULL,
};
static struct attribute_group group_ibs_op_ldlat_format = {
.name = "format",
.attrs = ibs_op_ldlat_format_attrs,
.is_visible = ibs_op_ldlat_is_visible,
};
static const struct attribute_group *op_attr_update[] = {
&group_cnt_ctl,
&group_op_l3missonly,
&group_zen4_ibs_extensions,
&group_ibs_op_ldlat_cap,
&group_ibs_op_ldlat_format,
&group_ibs_op_dtlb_pgsize_cap,
NULL,
};
@ -686,12 +791,14 @@ static struct perf_ibs perf_ibs_fetch = {
.start = perf_ibs_start,
.stop = perf_ibs_stop,
.read = perf_ibs_read,
.check_period = perf_ibs_check_period,
},
.msr = MSR_AMD64_IBSFETCHCTL,
.config_mask = IBS_FETCH_CONFIG_MASK,
.config_mask = IBS_FETCH_MAX_CNT | IBS_FETCH_RAND_EN,
.cnt_mask = IBS_FETCH_MAX_CNT,
.enable_mask = IBS_FETCH_ENABLE,
.valid_mask = IBS_FETCH_VAL,
.min_period = 0x10,
.max_period = IBS_FETCH_MAX_CNT << 4,
.offset_mask = { MSR_AMD64_IBSFETCH_REG_MASK },
.offset_max = MSR_AMD64_IBSFETCH_REG_COUNT,
@ -709,13 +816,15 @@ static struct perf_ibs perf_ibs_op = {
.start = perf_ibs_start,
.stop = perf_ibs_stop,
.read = perf_ibs_read,
.check_period = perf_ibs_check_period,
},
.msr = MSR_AMD64_IBSOPCTL,
.config_mask = IBS_OP_CONFIG_MASK,
.config_mask = IBS_OP_MAX_CNT,
.cnt_mask = IBS_OP_MAX_CNT | IBS_OP_CUR_CNT |
IBS_OP_CUR_CNT_RAND,
.enable_mask = IBS_OP_ENABLE,
.valid_mask = IBS_OP_VAL,
.min_period = 0x90,
.max_period = IBS_OP_MAX_CNT << 4,
.offset_mask = { MSR_AMD64_IBSOP_REG_MASK },
.offset_max = MSR_AMD64_IBSOP_REG_COUNT,
@ -917,6 +1026,10 @@ static void perf_ibs_get_tlb_lvl(union ibs_op_data3 *op_data3,
if (!op_data3->dc_lin_addr_valid)
return;
if ((ibs_caps & IBS_CAPS_OPDTLBPGSIZE) &&
!op_data3->dc_phy_addr_valid)
return;
if (!op_data3->dc_l1tlb_miss) {
data_src->mem_dtlb = PERF_MEM_TLB_L1 | PERF_MEM_TLB_HIT;
return;
@ -1023,15 +1136,25 @@ static void perf_ibs_parse_ld_st_data(__u64 sample_type,
}
}
static int perf_ibs_get_offset_max(struct perf_ibs *perf_ibs, u64 sample_type,
static bool perf_ibs_is_mem_sample_type(struct perf_ibs *perf_ibs,
struct perf_event *event)
{
u64 sample_type = event->attr.sample_type;
return perf_ibs == &perf_ibs_op &&
sample_type & (PERF_SAMPLE_DATA_SRC |
PERF_SAMPLE_WEIGHT_TYPE |
PERF_SAMPLE_ADDR |
PERF_SAMPLE_PHYS_ADDR);
}
static int perf_ibs_get_offset_max(struct perf_ibs *perf_ibs,
struct perf_event *event,
int check_rip)
{
if (sample_type & PERF_SAMPLE_RAW ||
(perf_ibs == &perf_ibs_op &&
(sample_type & PERF_SAMPLE_DATA_SRC ||
sample_type & PERF_SAMPLE_WEIGHT_TYPE ||
sample_type & PERF_SAMPLE_ADDR ||
sample_type & PERF_SAMPLE_PHYS_ADDR)))
if (event->attr.sample_type & PERF_SAMPLE_RAW ||
perf_ibs_is_mem_sample_type(perf_ibs, event) ||
perf_ibs_ldlat_event(perf_ibs, event))
return perf_ibs->offset_max;
else if (check_rip)
return 3;
@ -1148,7 +1271,7 @@ fail:
offset = 1;
check_rip = (perf_ibs == &perf_ibs_op && (ibs_caps & IBS_CAPS_RIPINVALIDCHK));
offset_max = perf_ibs_get_offset_max(perf_ibs, event->attr.sample_type, check_rip);
offset_max = perf_ibs_get_offset_max(perf_ibs, event, check_rip);
do {
rdmsrl(msr + offset, *buf++);
@ -1157,6 +1280,22 @@ fail:
perf_ibs->offset_max,
offset + 1);
} while (offset < offset_max);
if (perf_ibs_ldlat_event(perf_ibs, event)) {
union ibs_op_data3 op_data3;
op_data3.val = ibs_data.regs[ibs_op_msr_idx(MSR_AMD64_IBSOPDATA3)];
/*
* Opening event is errored out if load latency threshold is
* outside of [128, 2048] range. Since the event has reached
* interrupt handler, we can safely assume the threshold is
* within [128, 2048] range.
*/
if (!op_data3.ld_op || !op_data3.dc_miss ||
op_data3.dc_miss_lat <= (event->attr.config1 & 0xFFF))
goto out;
}
/*
* Read IbsBrTarget, IbsOpData4, and IbsExtdCtl separately
* depending on their availability.
@ -1229,6 +1368,10 @@ fail:
perf_sample_save_callchain(&data, event, iregs);
throttle = perf_event_overflow(event, &data, &regs);
if (event->attr.freq && hwc->sample_period < perf_ibs->min_period)
hwc->sample_period = perf_ibs->min_period;
out:
if (throttle) {
perf_ibs_stop(event, 0);
@ -1318,7 +1461,8 @@ static __init int perf_ibs_op_init(void)
if (ibs_caps & IBS_CAPS_OPCNTEXT) {
perf_ibs_op.max_period |= IBS_OP_MAX_CNT_EXT_MASK;
perf_ibs_op.config_mask |= IBS_OP_MAX_CNT_EXT_MASK;
perf_ibs_op.cnt_mask |= IBS_OP_MAX_CNT_EXT_MASK;
perf_ibs_op.cnt_mask |= (IBS_OP_MAX_CNT_EXT_MASK |
IBS_OP_CUR_CNT_EXT_MASK);
}
if (ibs_caps & IBS_CAPS_ZEN4)
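
The OP Load Latency Filtering added above is driven entirely by the event's config1 bits 0-11 (the new "ldlat" format attribute) and, per perf_ibs_init(), only thresholds between 128 and 2048 are accepted. A minimal sketch of opening such an event from userspace follows; the PMU type, sample period and sample_type mix are illustrative assumptions rather than values taken from this change.

/* Hedged sketch: open an IBS op sampling event with the load-latency filter.
 * The dynamic PMU type must be read from
 * /sys/bus/event_source/devices/ibs_op/type at runtime; it is passed in here
 * rather than hard-coded. */
#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <string.h>
#include <unistd.h>

static int open_ibs_op_ldlat(__u32 ibs_op_pmu_type, __u64 ldlat)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size          = sizeof(attr);
	attr.type          = ibs_op_pmu_type;	/* from sysfs, not a fixed constant */
	attr.config1       = ldlat & 0xFFF;	/* "ldlat" format: config1:0-11, 128..2048 */
	attr.sample_period = 0x10000;		/* example only: nibble-aligned, above min_period */
	attr.sample_type   = PERF_SAMPLE_IP | PERF_SAMPLE_ADDR |
			     PERF_SAMPLE_DATA_SRC | PERF_SAMPLE_WEIGHT_STRUCT;

	/* monitor the calling thread on any CPU */
	return syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
}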


@ -30,7 +30,7 @@
#define GET_DOMID_MASK(x) (((x)->conf1 >> 16) & 0xFFFFULL)
#define GET_PASID_MASK(x) (((x)->conf1 >> 32) & 0xFFFFFULL)
#define IOMMU_NAME_SIZE 16
#define IOMMU_NAME_SIZE 24
struct perf_amd_iommu {
struct list_head list;


@ -371,7 +371,8 @@ void amd_pmu_lbr_del(struct perf_event *event)
perf_sched_cb_dec(event->pmu);
}
void amd_pmu_lbr_sched_task(struct perf_event_pmu_context *pmu_ctx, bool sched_in)
void amd_pmu_lbr_sched_task(struct perf_event_pmu_context *pmu_ctx,
struct task_struct *task, bool sched_in)
{
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);


@ -87,13 +87,14 @@ DEFINE_STATIC_CALL_NULL(x86_pmu_commit_scheduling, *x86_pmu.commit_scheduling);
DEFINE_STATIC_CALL_NULL(x86_pmu_stop_scheduling, *x86_pmu.stop_scheduling);
DEFINE_STATIC_CALL_NULL(x86_pmu_sched_task, *x86_pmu.sched_task);
DEFINE_STATIC_CALL_NULL(x86_pmu_swap_task_ctx, *x86_pmu.swap_task_ctx);
DEFINE_STATIC_CALL_NULL(x86_pmu_drain_pebs, *x86_pmu.drain_pebs);
DEFINE_STATIC_CALL_NULL(x86_pmu_pebs_aliases, *x86_pmu.pebs_aliases);
DEFINE_STATIC_CALL_NULL(x86_pmu_filter, *x86_pmu.filter);
DEFINE_STATIC_CALL_NULL(x86_pmu_late_setup, *x86_pmu.late_setup);
/*
* This one is magic, it will get called even when PMU init fails (because
* there is no PMU), in which case it should simply return NULL.
@ -1298,6 +1299,15 @@ static void x86_pmu_enable(struct pmu *pmu)
if (cpuc->n_added) {
int n_running = cpuc->n_events - cpuc->n_added;
/*
* The late setup (after counters are scheduled)
* is required for some cases, e.g., PEBS counters
* snapshotting. Because an accurate counter index
* is needed.
*/
static_call_cond(x86_pmu_late_setup)();
/*
* apply assignment obtained either from
* hw_perf_group_sched_in() or x86_pmu_enable()
@ -2028,13 +2038,14 @@ static void x86_pmu_static_call_update(void)
static_call_update(x86_pmu_stop_scheduling, x86_pmu.stop_scheduling);
static_call_update(x86_pmu_sched_task, x86_pmu.sched_task);
static_call_update(x86_pmu_swap_task_ctx, x86_pmu.swap_task_ctx);
static_call_update(x86_pmu_drain_pebs, x86_pmu.drain_pebs);
static_call_update(x86_pmu_pebs_aliases, x86_pmu.pebs_aliases);
static_call_update(x86_pmu_guest_get_msrs, x86_pmu.guest_get_msrs);
static_call_update(x86_pmu_filter, x86_pmu.filter);
static_call_update(x86_pmu_late_setup, x86_pmu.late_setup);
}
static void _x86_pmu_read(struct perf_event *event)
@ -2625,15 +2636,10 @@ static const struct attribute_group *x86_pmu_attr_groups[] = {
NULL,
};
static void x86_pmu_sched_task(struct perf_event_pmu_context *pmu_ctx, bool sched_in)
static void x86_pmu_sched_task(struct perf_event_pmu_context *pmu_ctx,
struct task_struct *task, bool sched_in)
{
static_call_cond(x86_pmu_sched_task)(pmu_ctx, sched_in);
}
static void x86_pmu_swap_task_ctx(struct perf_event_pmu_context *prev_epc,
struct perf_event_pmu_context *next_epc)
{
static_call_cond(x86_pmu_swap_task_ctx)(prev_epc, next_epc);
static_call_cond(x86_pmu_sched_task)(pmu_ctx, task, sched_in);
}
void perf_check_microcode(void)
@ -2700,7 +2706,6 @@ static struct pmu pmu = {
.event_idx = x86_pmu_event_idx,
.sched_task = x86_pmu_sched_task,
.swap_task_ctx = x86_pmu_swap_task_ctx,
.check_period = x86_pmu_check_period,
.aux_output_match = x86_pmu_aux_output_match,


@ -36,7 +36,7 @@ enum {
BTS_STATE_ACTIVE,
};
static DEFINE_PER_CPU(struct bts_ctx, bts_ctx);
static struct bts_ctx __percpu *bts_ctx;
#define BTS_RECORD_SIZE 24
#define BTS_SAFETY_MARGIN 4080
@ -58,7 +58,7 @@ struct bts_buffer {
local_t head;
unsigned long end;
void **data_pages;
struct bts_phys buf[];
struct bts_phys buf[] __counted_by(nr_bufs);
};
static struct pmu bts_pmu;
@ -231,7 +231,7 @@ bts_buffer_reset(struct bts_buffer *buf, struct perf_output_handle *handle);
static void __bts_event_start(struct perf_event *event)
{
struct bts_ctx *bts = this_cpu_ptr(&bts_ctx);
struct bts_ctx *bts = this_cpu_ptr(bts_ctx);
struct bts_buffer *buf = perf_get_aux(&bts->handle);
u64 config = 0;
@ -260,7 +260,7 @@ static void __bts_event_start(struct perf_event *event)
static void bts_event_start(struct perf_event *event, int flags)
{
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
struct bts_ctx *bts = this_cpu_ptr(&bts_ctx);
struct bts_ctx *bts = this_cpu_ptr(bts_ctx);
struct bts_buffer *buf;
buf = perf_aux_output_begin(&bts->handle, event);
@ -290,7 +290,7 @@ fail_stop:
static void __bts_event_stop(struct perf_event *event, int state)
{
struct bts_ctx *bts = this_cpu_ptr(&bts_ctx);
struct bts_ctx *bts = this_cpu_ptr(bts_ctx);
/* ACTIVE -> INACTIVE(PMI)/STOPPED(->stop()) */
WRITE_ONCE(bts->state, state);
@ -305,7 +305,7 @@ static void __bts_event_stop(struct perf_event *event, int state)
static void bts_event_stop(struct perf_event *event, int flags)
{
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
struct bts_ctx *bts = this_cpu_ptr(&bts_ctx);
struct bts_ctx *bts = this_cpu_ptr(bts_ctx);
struct bts_buffer *buf = NULL;
int state = READ_ONCE(bts->state);
@ -338,9 +338,14 @@ static void bts_event_stop(struct perf_event *event, int flags)
void intel_bts_enable_local(void)
{
struct bts_ctx *bts = this_cpu_ptr(&bts_ctx);
int state = READ_ONCE(bts->state);
struct bts_ctx *bts;
int state;
if (!bts_ctx)
return;
bts = this_cpu_ptr(bts_ctx);
state = READ_ONCE(bts->state);
/*
* Here we transition from INACTIVE to ACTIVE;
* if we instead are STOPPED from the interrupt handler,
@ -358,7 +363,12 @@ void intel_bts_enable_local(void)
void intel_bts_disable_local(void)
{
struct bts_ctx *bts = this_cpu_ptr(&bts_ctx);
struct bts_ctx *bts;
if (!bts_ctx)
return;
bts = this_cpu_ptr(bts_ctx);
/*
* Here we transition from ACTIVE to INACTIVE;
@ -450,12 +460,17 @@ bts_buffer_reset(struct bts_buffer *buf, struct perf_output_handle *handle)
int intel_bts_interrupt(void)
{
struct debug_store *ds = this_cpu_ptr(&cpu_hw_events)->ds;
struct bts_ctx *bts = this_cpu_ptr(&bts_ctx);
struct perf_event *event = bts->handle.event;
struct bts_ctx *bts;
struct perf_event *event;
struct bts_buffer *buf;
s64 old_head;
int err = -ENOSPC, handled = 0;
if (!bts_ctx)
return 0;
bts = this_cpu_ptr(bts_ctx);
event = bts->handle.event;
/*
* The only surefire way of knowing if this NMI is ours is by checking
* the write ptr against the PMI threshold.
@ -518,7 +533,7 @@ static void bts_event_del(struct perf_event *event, int mode)
static int bts_event_add(struct perf_event *event, int mode)
{
struct bts_ctx *bts = this_cpu_ptr(&bts_ctx);
struct bts_ctx *bts = this_cpu_ptr(bts_ctx);
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
struct hw_perf_event *hwc = &event->hw;
@ -605,6 +620,10 @@ static __init int bts_init(void)
return -ENODEV;
}
bts_ctx = alloc_percpu(struct bts_ctx);
if (!bts_ctx)
return -ENOMEM;
bts_pmu.capabilities = PERF_PMU_CAP_AUX_NO_SG | PERF_PMU_CAP_ITRACE |
PERF_PMU_CAP_EXCLUSIVE;
bts_pmu.task_ctx_nr = perf_sw_context;


@ -2714,7 +2714,7 @@ static void update_saved_topdown_regs(struct perf_event *event, u64 slots,
* modify by a NMI. PMU has to be disabled before calling this function.
*/
static u64 intel_update_topdown_event(struct perf_event *event, int metric_end)
static u64 intel_update_topdown_event(struct perf_event *event, int metric_end, u64 *val)
{
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
struct perf_event *other;
@ -2722,13 +2722,24 @@ static u64 intel_update_topdown_event(struct perf_event *event, int metric_end)
bool reset = true;
int idx;
/* read Fixed counter 3 */
rdpmcl((3 | INTEL_PMC_FIXED_RDPMC_BASE), slots);
if (!slots)
return 0;
if (!val) {
/* read Fixed counter 3 */
rdpmcl((3 | INTEL_PMC_FIXED_RDPMC_BASE), slots);
if (!slots)
return 0;
/* read PERF_METRICS */
rdpmcl(INTEL_PMC_FIXED_RDPMC_METRICS, metrics);
/* read PERF_METRICS */
rdpmcl(INTEL_PMC_FIXED_RDPMC_METRICS, metrics);
} else {
slots = val[0];
metrics = val[1];
/*
* Don't reset the PERF_METRICS and Fixed counter 3
* for each PEBS record read. Utilize the RDPMC metrics
* clear mode.
*/
reset = false;
}
for_each_set_bit(idx, cpuc->active_mask, metric_end + 1) {
if (!is_topdown_idx(idx))
@ -2771,36 +2782,47 @@ static u64 intel_update_topdown_event(struct perf_event *event, int metric_end)
return slots;
}
static u64 icl_update_topdown_event(struct perf_event *event)
static u64 icl_update_topdown_event(struct perf_event *event, u64 *val)
{
return intel_update_topdown_event(event, INTEL_PMC_IDX_METRIC_BASE +
x86_pmu.num_topdown_events - 1);
x86_pmu.num_topdown_events - 1,
val);
}
DEFINE_STATIC_CALL(intel_pmu_update_topdown_event, x86_perf_event_update);
static void intel_pmu_read_topdown_event(struct perf_event *event)
{
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
/* Only need to call update_topdown_event() once for group read. */
if ((cpuc->txn_flags & PERF_PMU_TXN_READ) &&
!is_slots_event(event))
return;
perf_pmu_disable(event->pmu);
static_call(intel_pmu_update_topdown_event)(event);
perf_pmu_enable(event->pmu);
}
DEFINE_STATIC_CALL(intel_pmu_update_topdown_event, intel_pmu_topdown_event_update);
static void intel_pmu_read_event(struct perf_event *event)
{
if (event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD)
intel_pmu_auto_reload_read(event);
else if (is_topdown_count(event))
intel_pmu_read_topdown_event(event);
else
x86_perf_event_update(event);
if (event->hw.flags & (PERF_X86_EVENT_AUTO_RELOAD | PERF_X86_EVENT_TOPDOWN) ||
is_pebs_counter_event_group(event)) {
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
bool pmu_enabled = cpuc->enabled;
/* Only need to call update_topdown_event() once for group read. */
if (is_metric_event(event) && (cpuc->txn_flags & PERF_PMU_TXN_READ))
return;
cpuc->enabled = 0;
if (pmu_enabled)
intel_pmu_disable_all();
/*
* If the PEBS counters snapshotting is enabled,
* the topdown event is available in PEBS records.
*/
if (is_topdown_event(event) && !is_pebs_counter_event_group(event))
static_call(intel_pmu_update_topdown_event)(event, NULL);
else
intel_pmu_drain_pebs_buffer();
cpuc->enabled = pmu_enabled;
if (pmu_enabled)
intel_pmu_enable_all(0);
return;
}
x86_perf_event_update(event);
}
static void intel_pmu_enable_fixed(struct perf_event *event)
@ -2932,7 +2954,7 @@ static int intel_pmu_set_period(struct perf_event *event)
static u64 intel_pmu_update(struct perf_event *event)
{
if (unlikely(is_topdown_count(event)))
return static_call(intel_pmu_update_topdown_event)(event);
return static_call(intel_pmu_update_topdown_event)(event, NULL);
return x86_perf_event_update(event);
}
@ -3070,7 +3092,7 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
handled++;
x86_pmu_handle_guest_pebs(regs, &data);
x86_pmu.drain_pebs(regs, &data);
static_call(x86_pmu_drain_pebs)(regs, &data);
status &= intel_ctrl | GLOBAL_STATUS_TRACE_TOPAPMI;
/*
@ -3098,7 +3120,7 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
*/
if (__test_and_clear_bit(GLOBAL_STATUS_PERF_METRICS_OVF_BIT, (unsigned long *)&status)) {
handled++;
static_call(intel_pmu_update_topdown_event)(NULL);
static_call(intel_pmu_update_topdown_event)(NULL, NULL);
}
/*
@ -3116,6 +3138,27 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
if (!test_bit(bit, cpuc->active_mask))
continue;
/*
* There may be unprocessed PEBS records in the PEBS buffer,
* which still stores the previous values.
* Process those records first before handling the latest value.
* For example,
* A is a regular counter
* B is a PEBS event which reads A
* C is a PEBS event
*
* The following can happen:
* B-assist A=1
* C A=2
* B-assist A=3
* A-overflow-PMI A=4
* C-assist-PMI (PEBS buffer) A=5
*
* The PEBS buffer has to be drained before handling the A-PMI
*/
if (is_pebs_counter_event_group(event))
x86_pmu.drain_pebs(regs, &data);
if (!intel_pmu_save_and_restart(event))
continue;
@ -4148,6 +4191,13 @@ static int intel_pmu_hw_config(struct perf_event *event)
event->hw.flags |= PERF_X86_EVENT_PEBS_VIA_PT;
}
if ((event->attr.sample_type & PERF_SAMPLE_READ) &&
(x86_pmu.intel_cap.pebs_format >= 6) &&
x86_pmu.intel_cap.pebs_baseline &&
is_sampling_event(event) &&
event->attr.precise_ip)
event->group_leader->hw.flags |= PERF_X86_EVENT_PEBS_CNTR;
if ((event->attr.type == PERF_TYPE_HARDWARE) ||
(event->attr.type == PERF_TYPE_HW_CACHE))
return 0;
@ -5244,16 +5294,10 @@ static void intel_pmu_cpu_dead(int cpu)
}
static void intel_pmu_sched_task(struct perf_event_pmu_context *pmu_ctx,
bool sched_in)
struct task_struct *task, bool sched_in)
{
intel_pmu_pebs_sched_task(pmu_ctx, sched_in);
intel_pmu_lbr_sched_task(pmu_ctx, sched_in);
}
static void intel_pmu_swap_task_ctx(struct perf_event_pmu_context *prev_epc,
struct perf_event_pmu_context *next_epc)
{
intel_pmu_lbr_swap_task_ctx(prev_epc, next_epc);
intel_pmu_lbr_sched_task(pmu_ctx, task, sched_in);
}
static int intel_pmu_check_period(struct perf_event *event, u64 value)
@ -5424,7 +5468,6 @@ static __initconst const struct x86_pmu intel_pmu = {
.guest_get_msrs = intel_guest_get_msrs,
.sched_task = intel_pmu_sched_task,
.swap_task_ctx = intel_pmu_swap_task_ctx,
.check_period = intel_pmu_check_period,


@ -953,11 +953,11 @@ unlock:
return 1;
}
static inline void intel_pmu_drain_pebs_buffer(void)
void intel_pmu_drain_pebs_buffer(void)
{
struct perf_sample_data data;
x86_pmu.drain_pebs(NULL, &data);
static_call(x86_pmu_drain_pebs)(NULL, &data);
}
/*
@ -1294,6 +1294,19 @@ static inline void pebs_update_threshold(struct cpu_hw_events *cpuc)
ds->pebs_interrupt_threshold = threshold;
}
#define PEBS_DATACFG_CNTRS(x) \
((x >> PEBS_DATACFG_CNTR_SHIFT) & PEBS_DATACFG_CNTR_MASK)
#define PEBS_DATACFG_CNTR_BIT(x) \
(((1ULL << x) & PEBS_DATACFG_CNTR_MASK) << PEBS_DATACFG_CNTR_SHIFT)
#define PEBS_DATACFG_FIX(x) \
((x >> PEBS_DATACFG_FIX_SHIFT) & PEBS_DATACFG_FIX_MASK)
#define PEBS_DATACFG_FIX_BIT(x) \
(((1ULL << (x)) & PEBS_DATACFG_FIX_MASK) \
<< PEBS_DATACFG_FIX_SHIFT)
static void adaptive_pebs_record_size_update(void)
{
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
@ -1308,10 +1321,58 @@ static void adaptive_pebs_record_size_update(void)
sz += sizeof(struct pebs_xmm);
if (pebs_data_cfg & PEBS_DATACFG_LBRS)
sz += x86_pmu.lbr_nr * sizeof(struct lbr_entry);
if (pebs_data_cfg & (PEBS_DATACFG_METRICS | PEBS_DATACFG_CNTR)) {
sz += sizeof(struct pebs_cntr_header);
/* Metrics base and Metrics Data */
if (pebs_data_cfg & PEBS_DATACFG_METRICS)
sz += 2 * sizeof(u64);
if (pebs_data_cfg & PEBS_DATACFG_CNTR) {
sz += (hweight64(PEBS_DATACFG_CNTRS(pebs_data_cfg)) +
hweight64(PEBS_DATACFG_FIX(pebs_data_cfg))) *
sizeof(u64);
}
}
cpuc->pebs_record_size = sz;
}
static void __intel_pmu_pebs_update_cfg(struct perf_event *event,
int idx, u64 *pebs_data_cfg)
{
if (is_metric_event(event)) {
*pebs_data_cfg |= PEBS_DATACFG_METRICS;
return;
}
*pebs_data_cfg |= PEBS_DATACFG_CNTR;
if (idx >= INTEL_PMC_IDX_FIXED)
*pebs_data_cfg |= PEBS_DATACFG_FIX_BIT(idx - INTEL_PMC_IDX_FIXED);
else
*pebs_data_cfg |= PEBS_DATACFG_CNTR_BIT(idx);
}
static void intel_pmu_late_setup(void)
{
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
struct perf_event *event;
u64 pebs_data_cfg = 0;
int i;
for (i = 0; i < cpuc->n_events; i++) {
event = cpuc->event_list[i];
if (!is_pebs_counter_event_group(event))
continue;
__intel_pmu_pebs_update_cfg(event, cpuc->assign[i], &pebs_data_cfg);
}
if (pebs_data_cfg & ~cpuc->pebs_data_cfg)
cpuc->pebs_data_cfg |= pebs_data_cfg | PEBS_UPDATE_DS_SW;
}
#define PERF_PEBS_MEMINFO_TYPE (PERF_SAMPLE_ADDR | PERF_SAMPLE_DATA_SRC | \
PERF_SAMPLE_PHYS_ADDR | \
PERF_SAMPLE_WEIGHT_TYPE | \
@ -1914,12 +1975,89 @@ static void adaptive_pebs_save_regs(struct pt_regs *regs,
#endif
}
static void intel_perf_event_update_pmc(struct perf_event *event, u64 pmc)
{
int shift = 64 - x86_pmu.cntval_bits;
struct hw_perf_event *hwc;
u64 delta, prev_pmc;
/*
* A recorded counter may not have an assigned event in the
* following cases. The value should be dropped.
* - An event is deleted. There is still an active PEBS event.
* The PEBS record doesn't shrink on pmu::del().
* If the counter of the deleted event once occurred in a PEBS
* record, PEBS still records the counter until the counter is
* reassigned.
* - An event is stopped for some reason, e.g., throttled.
* During this period, another event is added and takes the
* counter of the stopped event. The stopped event is assigned
* to another new and uninitialized counter, since the
* x86_pmu_start(RELOAD) is not invoked for a stopped event.
* The PEBS_DATA_CFG is updated regardless of the event state.
* The uninitialized counter can be recorded in a PEBS record.
* But the cpuc->events[uninitialized_counter] is always NULL,
* because the event is stopped. The uninitialized value is
* safely dropped.
*/
if (!event)
return;
hwc = &event->hw;
prev_pmc = local64_read(&hwc->prev_count);
/* Only update the count when the PMU is disabled */
WARN_ON(this_cpu_read(cpu_hw_events.enabled));
local64_set(&hwc->prev_count, pmc);
delta = (pmc << shift) - (prev_pmc << shift);
delta >>= shift;
local64_add(delta, &event->count);
local64_sub(delta, &hwc->period_left);
}
static inline void __setup_pebs_counter_group(struct cpu_hw_events *cpuc,
struct perf_event *event,
struct pebs_cntr_header *cntr,
void *next_record)
{
int bit;
for_each_set_bit(bit, (unsigned long *)&cntr->cntr, INTEL_PMC_MAX_GENERIC) {
intel_perf_event_update_pmc(cpuc->events[bit], *(u64 *)next_record);
next_record += sizeof(u64);
}
for_each_set_bit(bit, (unsigned long *)&cntr->fixed, INTEL_PMC_MAX_FIXED) {
/* The slots event will be handled with perf_metric later */
if ((cntr->metrics == INTEL_CNTR_METRICS) &&
(bit + INTEL_PMC_IDX_FIXED == INTEL_PMC_IDX_FIXED_SLOTS)) {
next_record += sizeof(u64);
continue;
}
intel_perf_event_update_pmc(cpuc->events[bit + INTEL_PMC_IDX_FIXED],
*(u64 *)next_record);
next_record += sizeof(u64);
}
/* HW will reload the value right after the overflow. */
if (event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD)
local64_set(&event->hw.prev_count, (u64)-event->hw.sample_period);
if (cntr->metrics == INTEL_CNTR_METRICS) {
static_call(intel_pmu_update_topdown_event)
(cpuc->events[INTEL_PMC_IDX_FIXED_SLOTS],
(u64 *)next_record);
next_record += 2 * sizeof(u64);
}
}
#define PEBS_LATENCY_MASK 0xffff
/*
* With adaptive PEBS the layout depends on what fields are configured.
*/
static void setup_pebs_adaptive_sample_data(struct perf_event *event,
struct pt_regs *iregs, void *__pebs,
struct perf_sample_data *data,
@ -2049,6 +2187,28 @@ static void setup_pebs_adaptive_sample_data(struct perf_event *event,
}
}
if (format_group & (PEBS_DATACFG_CNTR | PEBS_DATACFG_METRICS)) {
struct pebs_cntr_header *cntr = next_record;
unsigned int nr;
next_record += sizeof(struct pebs_cntr_header);
/*
* The PEBS_DATA_CFG is a global register, which is the
* superset configuration for all PEBS events.
* For the PEBS record of non-sample-read group, ignore
* the counter snapshot fields.
*/
if (is_pebs_counter_event_group(event)) {
__setup_pebs_counter_group(cpuc, event, cntr, next_record);
data->sample_flags |= PERF_SAMPLE_READ;
}
nr = hweight32(cntr->cntr) + hweight32(cntr->fixed);
if (cntr->metrics == INTEL_CNTR_METRICS)
nr += 2;
next_record += nr * sizeof(u64);
}
WARN_ONCE(next_record != __pebs + basic->format_size,
"PEBS record size %u, expected %llu, config %llx\n",
basic->format_size,
@ -2094,15 +2254,6 @@ get_next_pebs_record_by_bit(void *base, void *top, int bit)
return NULL;
}
void intel_pmu_auto_reload_read(struct perf_event *event)
{
WARN_ON(!(event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD));
perf_pmu_disable(event->pmu);
intel_pmu_drain_pebs_buffer();
perf_pmu_enable(event->pmu);
}
/*
* Special variant of intel_pmu_save_and_restart() for auto-reload.
*/
@ -2211,13 +2362,21 @@ __intel_pmu_pebs_last_event(struct perf_event *event,
}
if (hwc->flags & PERF_X86_EVENT_AUTO_RELOAD) {
/*
* Now, auto-reload is only enabled in fixed period mode.
* The reload value is always hwc->sample_period.
* May need to change it, if auto-reload is enabled in
* freq mode later.
*/
intel_pmu_save_and_restart_reload(event, count);
if ((is_pebs_counter_event_group(event))) {
/*
* The value of each sample has been updated when setup
* the corresponding sample data.
*/
perf_event_update_userpage(event);
} else {
/*
* Now, auto-reload is only enabled in fixed period mode.
* The reload value is always hwc->sample_period.
* May need to change it, if auto-reload is enabled in
* freq mode later.
*/
intel_pmu_save_and_restart_reload(event, count);
}
} else
intel_pmu_save_and_restart(event);
}
@ -2552,6 +2711,11 @@ void __init intel_ds_init(void)
break;
case 6:
if (x86_pmu.intel_cap.pebs_baseline) {
x86_pmu.large_pebs_flags |= PERF_SAMPLE_READ;
x86_pmu.late_setup = intel_pmu_late_setup;
}
fallthrough;
case 5:
x86_pmu.pebs_ept = 1;
fallthrough;
@ -2576,7 +2740,7 @@ void __init intel_ds_init(void)
PERF_SAMPLE_REGS_USER |
PERF_SAMPLE_REGS_INTR);
}
pr_cont("PEBS fmt4%c%s, ", pebs_type, pebs_qual);
pr_cont("PEBS fmt%d%c%s, ", format, pebs_type, pebs_qual);
/*
* The PEBS-via-PT is not supported on hybrid platforms,


@ -422,11 +422,17 @@ static __always_inline bool lbr_is_reset_in_cstate(void *ctx)
return !rdlbr_from(((struct x86_perf_task_context *)ctx)->tos, NULL);
}
static inline bool has_lbr_callstack_users(void *ctx)
{
return task_context_opt(ctx)->lbr_callstack_users ||
x86_pmu.lbr_callstack_users;
}
static void __intel_pmu_lbr_restore(void *ctx)
{
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
if (task_context_opt(ctx)->lbr_callstack_users == 0 ||
if (!has_lbr_callstack_users(ctx) ||
task_context_opt(ctx)->lbr_stack_state == LBR_NONE) {
intel_pmu_lbr_reset();
return;
@ -503,7 +509,7 @@ static void __intel_pmu_lbr_save(void *ctx)
{
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
if (task_context_opt(ctx)->lbr_callstack_users == 0) {
if (!has_lbr_callstack_users(ctx)) {
task_context_opt(ctx)->lbr_stack_state = LBR_NONE;
return;
}
@ -516,32 +522,11 @@ static void __intel_pmu_lbr_save(void *ctx)
cpuc->last_log_id = ++task_context_opt(ctx)->log_id;
}
void intel_pmu_lbr_swap_task_ctx(struct perf_event_pmu_context *prev_epc,
struct perf_event_pmu_context *next_epc)
{
void *prev_ctx_data, *next_ctx_data;
swap(prev_epc->task_ctx_data, next_epc->task_ctx_data);
/*
* Architecture specific synchronization makes sense in case
* both prev_epc->task_ctx_data and next_epc->task_ctx_data
* pointers are allocated.
*/
prev_ctx_data = next_epc->task_ctx_data;
next_ctx_data = prev_epc->task_ctx_data;
if (!prev_ctx_data || !next_ctx_data)
return;
swap(task_context_opt(prev_ctx_data)->lbr_callstack_users,
task_context_opt(next_ctx_data)->lbr_callstack_users);
}
void intel_pmu_lbr_sched_task(struct perf_event_pmu_context *pmu_ctx, bool sched_in)
void intel_pmu_lbr_sched_task(struct perf_event_pmu_context *pmu_ctx,
struct task_struct *task, bool sched_in)
{
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
struct perf_ctx_data *ctx_data;
void *task_ctx;
if (!cpuc->lbr_users)
@ -552,14 +537,18 @@ void intel_pmu_lbr_sched_task(struct perf_event_pmu_context *pmu_ctx, bool sched
* the task was scheduled out, restore the stack. Otherwise flush
* the LBR stack.
*/
task_ctx = pmu_ctx ? pmu_ctx->task_ctx_data : NULL;
rcu_read_lock();
ctx_data = rcu_dereference(task->perf_ctx_data);
task_ctx = ctx_data ? ctx_data->data : NULL;
if (task_ctx) {
if (sched_in)
__intel_pmu_lbr_restore(task_ctx);
else
__intel_pmu_lbr_save(task_ctx);
rcu_read_unlock();
return;
}
rcu_read_unlock();
/*
* Since a context switch can flip the address space and LBR entries
@ -588,9 +577,19 @@ void intel_pmu_lbr_add(struct perf_event *event)
cpuc->br_sel = event->hw.branch_reg.reg;
if (branch_user_callstack(cpuc->br_sel) && event->pmu_ctx->task_ctx_data)
task_context_opt(event->pmu_ctx->task_ctx_data)->lbr_callstack_users++;
if (branch_user_callstack(cpuc->br_sel)) {
if (event->attach_state & PERF_ATTACH_TASK) {
struct task_struct *task = event->hw.target;
struct perf_ctx_data *ctx_data;
rcu_read_lock();
ctx_data = rcu_dereference(task->perf_ctx_data);
if (ctx_data)
task_context_opt(ctx_data->data)->lbr_callstack_users++;
rcu_read_unlock();
} else
x86_pmu.lbr_callstack_users++;
}
/*
* Request pmu::sched_task() callback, which will fire inside the
* regular perf event scheduling, so that call will:
@ -664,9 +663,19 @@ void intel_pmu_lbr_del(struct perf_event *event)
if (!x86_pmu.lbr_nr)
return;
if (branch_user_callstack(cpuc->br_sel) &&
event->pmu_ctx->task_ctx_data)
task_context_opt(event->pmu_ctx->task_ctx_data)->lbr_callstack_users--;
if (branch_user_callstack(cpuc->br_sel)) {
if (event->attach_state & PERF_ATTACH_TASK) {
struct task_struct *task = event->hw.target;
struct perf_ctx_data *ctx_data;
rcu_read_lock();
ctx_data = rcu_dereference(task->perf_ctx_data);
if (ctx_data)
task_context_opt(ctx_data->data)->lbr_callstack_users--;
rcu_read_unlock();
} else
x86_pmu.lbr_callstack_users--;
}
if (event->hw.flags & PERF_X86_EVENT_LBR_SELECT)
cpuc->lbr_select = 0;


@ -115,6 +115,11 @@ static inline bool is_branch_counters_group(struct perf_event *event)
return event->group_leader->hw.flags & PERF_X86_EVENT_BRANCH_COUNTERS;
}
static inline bool is_pebs_counter_event_group(struct perf_event *event)
{
return event->group_leader->hw.flags & PERF_X86_EVENT_PEBS_CNTR;
}
struct amd_nb {
int nb_id; /* NorthBridge id */
int refcnt; /* reference count */
@ -800,6 +805,7 @@ struct x86_pmu {
u64 (*update)(struct perf_event *event);
int (*hw_config)(struct perf_event *event);
int (*schedule_events)(struct cpu_hw_events *cpuc, int n, int *assign);
void (*late_setup)(void);
unsigned eventsel;
unsigned perfctr;
unsigned fixedctr;
@ -869,7 +875,7 @@ struct x86_pmu {
void (*check_microcode)(void);
void (*sched_task)(struct perf_event_pmu_context *pmu_ctx,
bool sched_in);
struct task_struct *task, bool sched_in);
/*
* Intel Arch Perfmon v2+
@ -914,6 +920,7 @@ struct x86_pmu {
const int *lbr_sel_map; /* lbr_select mappings */
int *lbr_ctl_map; /* LBR_CTL mappings */
};
u64 lbr_callstack_users; /* lbr callstack system wide users */
bool lbr_double_abort; /* duplicated lbr aborts */
bool lbr_pt_coexist; /* (LBR|BTS) may coexist with PT */
@ -951,14 +958,6 @@ struct x86_pmu {
*/
int num_topdown_events;
/*
* perf task context (i.e. struct perf_event_pmu_context::task_ctx_data)
* switch helper to bridge calls from perf/core to perf/x86.
* See struct pmu::swap_task_ctx() usage for examples;
*/
void (*swap_task_ctx)(struct perf_event_pmu_context *prev_epc,
struct perf_event_pmu_context *next_epc);
/*
* AMD bits
*/
@ -1107,6 +1106,8 @@ extern struct x86_pmu x86_pmu __read_mostly;
DECLARE_STATIC_CALL(x86_pmu_set_period, *x86_pmu.set_period);
DECLARE_STATIC_CALL(x86_pmu_update, *x86_pmu.update);
DECLARE_STATIC_CALL(x86_pmu_drain_pebs, *x86_pmu.drain_pebs);
DECLARE_STATIC_CALL(x86_pmu_late_setup, *x86_pmu.late_setup);
static __always_inline struct x86_perf_task_context_opt *task_context_opt(void *ctx)
{
@ -1148,6 +1149,12 @@ extern u64 __read_mostly hw_cache_extra_regs
u64 x86_perf_event_update(struct perf_event *event);
static inline u64 intel_pmu_topdown_event_update(struct perf_event *event, u64 *val)
{
return x86_perf_event_update(event);
}
DECLARE_STATIC_CALL(intel_pmu_update_topdown_event, intel_pmu_topdown_event_update);
static inline unsigned int x86_pmu_config_addr(int index)
{
return x86_pmu.eventsel + (x86_pmu.addr_offset ?
@ -1394,7 +1401,8 @@ void amd_pmu_lbr_reset(void);
void amd_pmu_lbr_read(void);
void amd_pmu_lbr_add(struct perf_event *event);
void amd_pmu_lbr_del(struct perf_event *event);
void amd_pmu_lbr_sched_task(struct perf_event_pmu_context *pmu_ctx, bool sched_in);
void amd_pmu_lbr_sched_task(struct perf_event_pmu_context *pmu_ctx,
struct task_struct *task, bool sched_in);
void amd_pmu_lbr_enable_all(void);
void amd_pmu_lbr_disable_all(void);
int amd_pmu_lbr_hw_config(struct perf_event *event);
@ -1448,7 +1456,8 @@ static inline void amd_pmu_brs_del(struct perf_event *event)
perf_sched_cb_dec(event->pmu);
}
void amd_pmu_brs_sched_task(struct perf_event_pmu_context *pmu_ctx, bool sched_in);
void amd_pmu_brs_sched_task(struct perf_event_pmu_context *pmu_ctx,
struct task_struct *task, bool sched_in);
#else
static inline int amd_brs_init(void)
{
@ -1473,7 +1482,8 @@ static inline void amd_pmu_brs_del(struct perf_event *event)
{
}
static inline void amd_pmu_brs_sched_task(struct perf_event_pmu_context *pmu_ctx, bool sched_in)
static inline void amd_pmu_brs_sched_task(struct perf_event_pmu_context *pmu_ctx,
struct task_struct *task, bool sched_in)
{
}
@ -1643,7 +1653,7 @@ void intel_pmu_pebs_disable_all(void);
void intel_pmu_pebs_sched_task(struct perf_event_pmu_context *pmu_ctx, bool sched_in);
void intel_pmu_auto_reload_read(struct perf_event *event);
void intel_pmu_drain_pebs_buffer(void);
void intel_pmu_store_pebs_lbrs(struct lbr_entry *lbr);
@ -1653,10 +1663,8 @@ void intel_pmu_lbr_save_brstack(struct perf_sample_data *data,
struct cpu_hw_events *cpuc,
struct perf_event *event);
void intel_pmu_lbr_swap_task_ctx(struct perf_event_pmu_context *prev_epc,
struct perf_event_pmu_context *next_epc);
void intel_pmu_lbr_sched_task(struct perf_event_pmu_context *pmu_ctx, bool sched_in);
void intel_pmu_lbr_sched_task(struct perf_event_pmu_context *pmu_ctx,
struct task_struct *task, bool sched_in);
u64 lbr_from_signext_quirk_wr(u64 val);


@ -9,7 +9,7 @@ PERF_ARCH(PEBS_LD_HSW, 0x00008) /* haswell style datala, load */
PERF_ARCH(PEBS_NA_HSW, 0x00010) /* haswell style datala, unknown */
PERF_ARCH(EXCL, 0x00020) /* HT exclusivity on counter */
PERF_ARCH(DYNAMIC, 0x00040) /* dynamic alloc'd constraint */
/* 0x00080 */
PERF_ARCH(PEBS_CNTR, 0x00080) /* PEBS counters snapshot */
PERF_ARCH(EXCL_ACCT, 0x00100) /* accounted EXCL event */
PERF_ARCH(AUTO_RELOAD, 0x00200) /* use PEBS auto-reload */
PERF_ARCH(LARGE_PEBS, 0x00400) /* use large PEBS */


@ -64,7 +64,8 @@ union ibs_op_ctl {
opmaxcnt_ext:7, /* 20-26: upper 7 bits of periodic op maximum count */
reserved0:5, /* 27-31: reserved */
opcurcnt:27, /* 32-58: periodic op counter current count */
reserved1:5; /* 59-63: reserved */
ldlat_thrsh:4, /* 59-62: Load Latency threshold */
ldlat_en:1; /* 63: Load Latency enabled */
};
};


@ -141,6 +141,12 @@
#define PEBS_DATACFG_XMMS BIT_ULL(2)
#define PEBS_DATACFG_LBRS BIT_ULL(3)
#define PEBS_DATACFG_LBR_SHIFT 24
#define PEBS_DATACFG_CNTR BIT_ULL(4)
#define PEBS_DATACFG_CNTR_SHIFT 32
#define PEBS_DATACFG_CNTR_MASK GENMASK_ULL(15, 0)
#define PEBS_DATACFG_FIX_SHIFT 48
#define PEBS_DATACFG_FIX_MASK GENMASK_ULL(7, 0)
#define PEBS_DATACFG_METRICS BIT_ULL(5)
/* Steal the highest bit of pebs_data_cfg for SW usage */
#define PEBS_UPDATE_DS_SW BIT_ULL(63)
@ -482,6 +488,15 @@ struct pebs_xmm {
u64 xmm[16*2]; /* two entries for each register */
};
struct pebs_cntr_header {
u32 cntr;
u32 fixed;
u32 metrics;
u32 reserved;
};
#define INTEL_CNTR_METRICS 0x3
/*
* AMD Extended Performance Monitoring and Debug cpuid feature detection
*/
@ -509,6 +524,8 @@ struct pebs_xmm {
#define IBS_CAPS_FETCHCTLEXTD (1U<<9)
#define IBS_CAPS_OPDATA4 (1U<<10)
#define IBS_CAPS_ZEN4 (1U<<11)
#define IBS_CAPS_OPLDLAT (1U<<12)
#define IBS_CAPS_OPDTLBPGSIZE (1U<<19)
#define IBS_CAPS_DEFAULT (IBS_CAPS_AVAIL \
| IBS_CAPS_FETCHSAM \
@ -534,8 +551,11 @@ struct pebs_xmm {
* The lower 7 bits of the current count are random bits
* preloaded by hardware and ignored in software
*/
#define IBS_OP_LDLAT_EN (1ULL<<63)
#define IBS_OP_LDLAT_THRSH (0xFULL<<59)
#define IBS_OP_CUR_CNT (0xFFF80ULL<<32)
#define IBS_OP_CUR_CNT_RAND (0x0007FULL<<32)
#define IBS_OP_CUR_CNT_EXT_MASK (0x7FULL<<52)
#define IBS_OP_CNT_CTL (1ULL<<19)
#define IBS_OP_VAL (1ULL<<18)
#define IBS_OP_ENABLE (1ULL<<17)


@ -357,19 +357,23 @@ void *arch_uprobe_trampoline(unsigned long *psize)
return &insn;
}
static unsigned long trampoline_check_ip(void)
static unsigned long trampoline_check_ip(unsigned long tramp)
{
unsigned long tramp = uprobe_get_trampoline_vaddr();
return tramp + (uretprobe_syscall_check - uretprobe_trampoline_entry);
}
SYSCALL_DEFINE0(uretprobe)
{
struct pt_regs *regs = task_pt_regs(current);
unsigned long err, ip, sp, r11_cx_ax[3];
unsigned long err, ip, sp, r11_cx_ax[3], tramp;
if (regs->ip != trampoline_check_ip())
/* If there's no trampoline, we are called from wrong place. */
tramp = uprobe_get_trampoline_vaddr();
if (unlikely(tramp == UPROBE_NO_TRAMPOLINE_VADDR))
goto sigill;
/* Make sure the ip matches the only allowed sys_uretprobe caller. */
if (unlikely(regs->ip != trampoline_check_ip(tramp)))
goto sigill;
err = copy_from_user(r11_cx_ax, (void __user *)regs->sp, sizeof(r11_cx_ax));


@ -15,6 +15,7 @@
#include <linux/radix-tree.h>
#include <linux/gfp.h>
#include <linux/percpu.h>
#include <linux/cleanup.h>
struct idr {
struct radix_tree_root idr_rt;
@ -124,6 +125,22 @@ void *idr_get_next_ul(struct idr *, unsigned long *nextid);
void *idr_replace(struct idr *, void *, unsigned long id);
void idr_destroy(struct idr *);
struct __class_idr {
struct idr *idr;
int id;
};
#define idr_null ((struct __class_idr){ NULL, -1 })
#define take_idr_id(id) __get_and_null(id, idr_null)
DEFINE_CLASS(idr_alloc, struct __class_idr,
if (_T.id >= 0) idr_remove(_T.idr, _T.id),
((struct __class_idr){
.idr = idr,
.id = idr_alloc(idr, ptr, start, end, gfp),
}),
struct idr *idr, void *ptr, int start, int end, gfp_t gfp);
/**
* idr_init_base() - Initialise an IDR.
* @idr: IDR handle.
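
The idr_alloc cleanup class above ties idr_alloc() to an automatic idr_remove() on early exit, with take_idr_id() handing the ID out of the scope on success. A hedged sketch of a call site follows; the IDR, the object and the example_setup() helper are hypothetical, only the class machinery is from this change.

#include <linux/idr.h>
#include <linux/gfp.h>

static DEFINE_IDR(example_ids);		/* hypothetical IDR */

static bool example_setup(void *obj, int id)
{
	return obj && id > 0;		/* stand-in for real initialization work */
}

static int example_install(void *obj)
{
	/* id.id is the allocated ID, or a negative errno from idr_alloc() */
	CLASS(idr_alloc, id)(&example_ids, obj, 1, 0, GFP_KERNEL);

	if (id.id < 0)
		return id.id;

	if (!example_setup(obj, id.id))
		return -EINVAL;		/* scope exit runs idr_remove() for us */

	/* Success: disarm the cleanup so the ID stays allocated. */
	return take_idr_id(id).id;
}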


@ -17,7 +17,6 @@
void lockup_detector_init(void);
void lockup_detector_retry_init(void);
void lockup_detector_soft_poweroff(void);
void lockup_detector_cleanup(void);
extern int watchdog_user_enabled;
extern int watchdog_thresh;
@ -37,7 +36,6 @@ extern int sysctl_hardlockup_all_cpu_backtrace;
static inline void lockup_detector_init(void) { }
static inline void lockup_detector_retry_init(void) { }
static inline void lockup_detector_soft_poweroff(void) { }
static inline void lockup_detector_cleanup(void) { }
#endif /* !CONFIG_LOCKUP_DETECTOR */
#ifdef CONFIG_SOFTLOCKUP_DETECTOR
@ -104,12 +102,10 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs);
#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
extern void hardlockup_detector_perf_stop(void);
extern void hardlockup_detector_perf_restart(void);
extern void hardlockup_detector_perf_cleanup(void);
extern void hardlockup_config_perf_event(const char *str);
#else
static inline void hardlockup_detector_perf_stop(void) { }
static inline void hardlockup_detector_perf_restart(void) { }
static inline void hardlockup_detector_perf_cleanup(void) { }
static inline void hardlockup_config_perf_event(const char *str) { }
#endif


@ -8,6 +8,7 @@
#include <linux/wait.h>
#include <linux/rcu_sync.h>
#include <linux/lockdep.h>
#include <linux/cleanup.h>
struct percpu_rw_semaphore {
struct rcu_sync rss;
@ -125,6 +126,13 @@ extern bool percpu_is_read_locked(struct percpu_rw_semaphore *);
extern void percpu_down_write(struct percpu_rw_semaphore *);
extern void percpu_up_write(struct percpu_rw_semaphore *);
DEFINE_GUARD(percpu_read, struct percpu_rw_semaphore *,
percpu_down_read(_T), percpu_up_read(_T))
DEFINE_GUARD_COND(percpu_read, _try, percpu_down_read_trylock(_T))
DEFINE_GUARD(percpu_write, struct percpu_rw_semaphore *,
percpu_down_write(_T), percpu_up_write(_T))
static inline bool percpu_is_write_locked(struct percpu_rw_semaphore *sem)
{
return atomic_read(&sem->block);
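
The new guards wrap percpu_down_read()/percpu_up_read() and their write-side counterparts in the usual cleanup.h style. A brief sketch of how they read at a call site follows; the semaphore and the functions are made up for illustration, only the guard definitions come from this change.

#include <linux/percpu-rwsem.h>
#include <linux/errno.h>

static DEFINE_STATIC_PERCPU_RWSEM(example_sem);	/* illustrative semaphore */

static void example_writer(void)
{
	guard(percpu_write)(&example_sem);	/* percpu_down_write(); released on return */
	/* ... exclusive section ... */
}

static int example_try_reader(void)
{
	scoped_guard(percpu_read_try, &example_sem) {
		/* body only runs if percpu_down_read_trylock() succeeded */
		return 0;
	}
	return -EBUSY;
}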


@ -343,8 +343,7 @@ struct pmu {
*/
unsigned int scope;
int __percpu *pmu_disable_count;
struct perf_cpu_pmu_context __percpu *cpu_pmu_context;
struct perf_cpu_pmu_context * __percpu *cpu_pmu_context;
atomic_t exclusive_cnt; /* < 0: cpu; > 0: tsk */
int task_ctx_nr;
int hrtimer_interval_ms;
@ -495,23 +494,13 @@ struct pmu {
* context-switches callback
*/
void (*sched_task) (struct perf_event_pmu_context *pmu_ctx,
bool sched_in);
struct task_struct *task, bool sched_in);
/*
* Kmem cache of PMU specific data
*/
struct kmem_cache *task_ctx_cache;
/*
* PMU specific parts of task perf event context (i.e. ctx->task_ctx_data)
* can be synchronized using this function. See Intel LBR callstack support
* implementation and Perf core context switch handling callbacks for usage
* examples.
*/
void (*swap_task_ctx) (struct perf_event_pmu_context *prev_epc,
struct perf_event_pmu_context *next_epc);
/* optional */
/*
* Set up pmu-private data structures for an AUX area
*/
@ -673,13 +662,16 @@ struct swevent_hlist {
struct rcu_head rcu_head;
};
#define PERF_ATTACH_CONTEXT 0x01
#define PERF_ATTACH_GROUP 0x02
#define PERF_ATTACH_TASK 0x04
#define PERF_ATTACH_TASK_DATA 0x08
#define PERF_ATTACH_ITRACE 0x10
#define PERF_ATTACH_SCHED_CB 0x20
#define PERF_ATTACH_CHILD 0x40
#define PERF_ATTACH_CONTEXT 0x0001
#define PERF_ATTACH_GROUP 0x0002
#define PERF_ATTACH_TASK 0x0004
#define PERF_ATTACH_TASK_DATA 0x0008
#define PERF_ATTACH_GLOBAL_DATA 0x0010
#define PERF_ATTACH_SCHED_CB 0x0020
#define PERF_ATTACH_CHILD 0x0040
#define PERF_ATTACH_EXCLUSIVE 0x0080
#define PERF_ATTACH_CALLCHAIN 0x0100
#define PERF_ATTACH_ITRACE 0x0200
struct bpf_prog;
struct perf_cgroup;
@@ -921,7 +913,7 @@ struct perf_event_pmu_context {
struct list_head pinned_active;
struct list_head flexible_active;
/* Used to avoid freeing per-cpu perf_event_pmu_context */
/* Used to identify the per-cpu perf_event_pmu_context */
unsigned int embedded : 1;
unsigned int nr_events;
@@ -931,7 +923,6 @@ struct perf_event_pmu_context {
atomic_t refcount; /* event <-> epc */
struct rcu_head rcu_head;
void *task_ctx_data; /* pmu specific data */
/*
* Set when one or more (plausibly active) event can't be scheduled
* due to pmu overcommit or pmu constraints, except tolerant to
@@ -979,7 +970,6 @@ struct perf_event_context {
int nr_user;
int is_active;
int nr_task_data;
int nr_stat;
int nr_freq;
int rotate_disable;
@@ -1020,6 +1010,41 @@ struct perf_event_context {
local_t nr_no_switch_fast;
};
/**
* struct perf_ctx_data - PMU specific data for a task
* @rcu_head: To avoid races when freeing the PMU specific data
* @refcount: To track users
* @global: To track system-wide users
* @ctx_cache: Kmem cache of PMU specific data
* @data: PMU specific data
*
* Currently, the struct is only used in Intel LBR call stack mode to
* save/restore the call stack of a task on context switches.
*
* The rcu_head is used to prevent races when freeing the data.
* The data is only allocated when Intel LBR call stack mode is enabled,
* and it is freed again when the mode is disabled.
* The content of the data is only accessed at context switch time, which
* must be protected by rcu_read_lock().
*
* Because of the alignment requirement of Intel Arch LBR, the Kmem cache
* is used to allocate the PMU specific data. The ctx_cache is to track
* the Kmem cache.
*
* Careful: struct perf_ctx_data is added as a pointer in struct task_struct.
* When system-wide Intel LBR call stack mode is enabled, a fixed-size
* buffer is allocated for every task.
* System memory consumption grows further if the size of
* struct perf_ctx_data itself increases.
*/
struct perf_ctx_data {
struct rcu_head rcu_head;
refcount_t refcount;
int global;
struct kmem_cache *ctx_cache;
void *data;
};
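
As the comment above spells out, the per-task data hanging off task_struct::perf_ctx_data is only meant to be dereferenced under RCU. A minimal read-side sketch (not part of this diff; consume_lbr_callstack() is a hypothetical consumer):

	static void use_task_ctx_data(struct task_struct *task)
	{
		struct perf_ctx_data *cd;

		rcu_read_lock();
		cd = rcu_dereference(task->perf_ctx_data);
		if (cd)
			consume_lbr_callstack(cd->data);	/* valid only inside the RCU section */
		rcu_read_unlock();
	}
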
struct perf_cpu_pmu_context {
struct perf_event_pmu_context epc;
struct perf_event_pmu_context *task_epc;
@@ -1029,6 +1054,7 @@ struct perf_cpu_pmu_context {
int active_oncpu;
int exclusive;
int pmu_disable_count;
raw_spinlock_t hrtimer_lock;
struct hrtimer hrtimer;
@@ -1062,7 +1088,13 @@ struct perf_output_handle {
struct perf_buffer *rb;
unsigned long wakeup;
unsigned long size;
u64 aux_flags;
union {
u64 flags; /* perf_output*() */
u64 aux_flags; /* perf_aux_output*() */
struct {
u64 skip_read : 1;
};
};
union {
void *addr;
unsigned long head;
@@ -1339,6 +1371,9 @@ static inline void perf_sample_save_brstack(struct perf_sample_data *data,
if (branch_sample_hw_index(event))
size += sizeof(u64);
brs->nr = min_t(u16, event->attr.sample_max_stack, brs->nr);
size += brs->nr * sizeof(struct perf_branch_entry);
/*
@@ -1646,19 +1681,10 @@ static inline int perf_callchain_store(struct perf_callchain_entry_ctx *ctx, u64
}
extern int sysctl_perf_event_paranoid;
extern int sysctl_perf_event_mlock;
extern int sysctl_perf_event_sample_rate;
extern int sysctl_perf_cpu_time_max_percent;
extern void perf_sample_event_took(u64 sample_len_ns);
int perf_event_max_sample_rate_handler(const struct ctl_table *table, int write,
void *buffer, size_t *lenp, loff_t *ppos);
int perf_cpu_time_max_percent_handler(const struct ctl_table *table, int write,
void *buffer, size_t *lenp, loff_t *ppos);
int perf_event_max_stack_handler(const struct ctl_table *table, int write,
void *buffer, size_t *lenp, loff_t *ppos);
/* Access to perf_event_open(2) syscall. */
#define PERF_SECURITY_OPEN 0


@@ -65,6 +65,7 @@ struct mempolicy;
struct nameidata;
struct nsproxy;
struct perf_event_context;
struct perf_ctx_data;
struct pid_namespace;
struct pipe_inode_info;
struct rcu_node;
@@ -1316,6 +1317,7 @@ struct task_struct {
struct perf_event_context *perf_event_ctxp;
struct mutex perf_event_mutex;
struct list_head perf_event_list;
struct perf_ctx_data __rcu *perf_ctx_data;
#endif
#ifdef CONFIG_DEBUG_PREEMPT
unsigned long preempt_disable_ip;


@@ -39,6 +39,8 @@ struct page;
#define MAX_URETPROBE_DEPTH 64
#define UPROBE_NO_TRAMPOLINE_VADDR (~0UL)
struct uprobe_consumer {
/*
* handler() can return UPROBE_HANDLER_REMOVE to signal the need to
@@ -143,6 +145,7 @@ struct uprobe_task {
struct uprobe *active_uprobe;
unsigned long xol_vaddr;
bool signal_denied;
struct arch_uprobe *auprobe;
};


@@ -385,6 +385,8 @@ enum perf_event_read_format {
*
* @sample_max_stack: Max number of frame pointers in a callchain,
* should be < /proc/sys/kernel/perf_event_max_stack
* Also the max number of branch stack entries,
* which should be < the hardware limit
*/
struct perf_event_attr {


@@ -1453,11 +1453,6 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen,
out:
cpus_write_unlock();
/*
* Do post unplug cleanup. This is still protected against
* concurrent CPU hotplug via cpu_add_remove_lock.
*/
lockup_detector_cleanup();
arch_smt_update();
return ret;
}


@@ -22,6 +22,7 @@ struct callchain_cpus_entries {
int sysctl_perf_event_max_stack __read_mostly = PERF_MAX_STACK_DEPTH;
int sysctl_perf_event_max_contexts_per_stack __read_mostly = PERF_MAX_CONTEXTS_PER_STACK;
static const int six_hundred_forty_kb = 640 * 1024;
static inline size_t perf_callchain_entry__sizeof(void)
{
@@ -266,12 +267,8 @@ exit_put:
return entry;
}
/*
* Used for sysctl_perf_event_max_stack and
* sysctl_perf_event_max_contexts_per_stack.
*/
int perf_event_max_stack_handler(const struct ctl_table *table, int write,
void *buffer, size_t *lenp, loff_t *ppos)
static int perf_event_max_stack_handler(const struct ctl_table *table, int write,
void *buffer, size_t *lenp, loff_t *ppos)
{
int *value = table->data;
int new_value = *value, ret;
@@ -292,3 +289,32 @@ int perf_event_max_stack_handler(const struct ctl_table *table, int write,
return ret;
}
static const struct ctl_table callchain_sysctl_table[] = {
{
.procname = "perf_event_max_stack",
.data = &sysctl_perf_event_max_stack,
.maxlen = sizeof(sysctl_perf_event_max_stack),
.mode = 0644,
.proc_handler = perf_event_max_stack_handler,
.extra1 = SYSCTL_ZERO,
.extra2 = (void *)&six_hundred_forty_kb,
},
{
.procname = "perf_event_max_contexts_per_stack",
.data = &sysctl_perf_event_max_contexts_per_stack,
.maxlen = sizeof(sysctl_perf_event_max_contexts_per_stack),
.mode = 0644,
.proc_handler = perf_event_max_stack_handler,
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_ONE_THOUSAND,
},
};
static int __init init_callchain_sysctls(void)
{
register_sysctl_init("kernel", callchain_sysctl_table);
return 0;
}
core_initcall(init_callchain_sysctls);

[File diff suppressed because it is too large]


@@ -950,9 +950,10 @@ static int hw_breakpoint_event_init(struct perf_event *bp)
return -ENOENT;
/*
* no branch sampling for breakpoint events
* Check if breakpoint type is supported before proceeding.
* Also, no branch sampling for breakpoint events.
*/
if (has_branch_stack(bp))
if (!hw_breakpoint_slots_cached(find_slot_idx(bp->attr.bp_type)) || has_branch_stack(bp))
return -EOPNOTSUPP;
err = register_perf_hw_breakpoint(bp);


@@ -19,7 +19,7 @@
static void perf_output_wakeup(struct perf_output_handle *handle)
{
atomic_set(&handle->rb->poll, EPOLLIN);
atomic_set(&handle->rb->poll, EPOLLIN | EPOLLRDNORM);
handle->event->pending_wakeup = 1;
@@ -185,6 +185,7 @@ __perf_output_begin(struct perf_output_handle *handle,
handle->rb = rb;
handle->event = event;
handle->flags = 0;
have_lost = local_read(&rb->lost);
if (unlikely(have_lost)) {


@@ -2169,8 +2169,8 @@ void uprobe_copy_process(struct task_struct *t, unsigned long flags)
*/
unsigned long uprobe_get_trampoline_vaddr(void)
{
unsigned long trampoline_vaddr = UPROBE_NO_TRAMPOLINE_VADDR;
struct xol_area *area;
unsigned long trampoline_vaddr = -1;
/* Pairs with xol_add_vma() smp_store_release() */
area = READ_ONCE(current->mm->uprobes_state.xol_area); /* ^^^ */
@@ -2311,9 +2311,8 @@ bool uprobe_deny_signal(void)
WARN_ON_ONCE(utask->state != UTASK_SSTEP);
if (task_sigpending(t)) {
spin_lock_irq(&t->sighand->siglock);
utask->signal_denied = true;
clear_tsk_thread_flag(t, TIF_SIGPENDING);
spin_unlock_irq(&t->sighand->siglock);
if (__fatal_signal_pending(t) || arch_uprobe_xol_was_trapped(t)) {
utask->state = UTASK_SSTEP_TRAPPED;
@@ -2746,9 +2745,10 @@ static void handle_singlestep(struct uprobe_task *utask, struct pt_regs *regs)
utask->state = UTASK_RUNNING;
xol_free_insn_slot(utask);
spin_lock_irq(&current->sighand->siglock);
recalc_sigpending(); /* see uprobe_deny_signal() */
spin_unlock_irq(&current->sighand->siglock);
if (utask->signal_denied) {
set_thread_flag(TIF_SIGPENDING);
utask->signal_denied = false;
}
if (unlikely(err)) {
uprobe_warn(current, "execute the probed insn, sending SIGILL.");


@@ -54,7 +54,6 @@
#include <linux/acpi.h>
#include <linux/reboot.h>
#include <linux/ftrace.h>
#include <linux/perf_event.h>
#include <linux/oom.h>
#include <linux/kmod.h>
#include <linux/capability.h>
@@ -91,12 +90,6 @@ EXPORT_SYMBOL_GPL(sysctl_long_vals);
#if defined(CONFIG_SYSCTL)
/* Constants used for minimum and maximum */
#ifdef CONFIG_PERF_EVENTS
static const int six_hundred_forty_kb = 640 * 1024;
#endif
static const int ngroups_max = NGROUPS_MAX;
static const int cap_last_cap = CAP_LAST_CAP;
@@ -1932,63 +1925,6 @@ static const struct ctl_table kern_table[] = {
.mode = 0644,
.proc_handler = proc_dointvec,
},
#endif
#ifdef CONFIG_PERF_EVENTS
/*
* User-space scripts rely on the existence of this file
* as a feature check for perf_events being enabled.
*
* So it's an ABI, do not remove!
*/
{
.procname = "perf_event_paranoid",
.data = &sysctl_perf_event_paranoid,
.maxlen = sizeof(sysctl_perf_event_paranoid),
.mode = 0644,
.proc_handler = proc_dointvec,
},
{
.procname = "perf_event_mlock_kb",
.data = &sysctl_perf_event_mlock,
.maxlen = sizeof(sysctl_perf_event_mlock),
.mode = 0644,
.proc_handler = proc_dointvec,
},
{
.procname = "perf_event_max_sample_rate",
.data = &sysctl_perf_event_sample_rate,
.maxlen = sizeof(sysctl_perf_event_sample_rate),
.mode = 0644,
.proc_handler = perf_event_max_sample_rate_handler,
.extra1 = SYSCTL_ONE,
},
{
.procname = "perf_cpu_time_max_percent",
.data = &sysctl_perf_cpu_time_max_percent,
.maxlen = sizeof(sysctl_perf_cpu_time_max_percent),
.mode = 0644,
.proc_handler = perf_cpu_time_max_percent_handler,
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_ONE_HUNDRED,
},
{
.procname = "perf_event_max_stack",
.data = &sysctl_perf_event_max_stack,
.maxlen = sizeof(sysctl_perf_event_max_stack),
.mode = 0644,
.proc_handler = perf_event_max_stack_handler,
.extra1 = SYSCTL_ZERO,
.extra2 = (void *)&six_hundred_forty_kb,
},
{
.procname = "perf_event_max_contexts_per_stack",
.data = &sysctl_perf_event_max_contexts_per_stack,
.maxlen = sizeof(sysctl_perf_event_max_contexts_per_stack),
.mode = 0644,
.proc_handler = perf_event_max_stack_handler,
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_ONE_THOUSAND,
},
#endif
{
.procname = "panic_on_warn",


@@ -347,8 +347,6 @@ static int __init watchdog_thresh_setup(char *str)
}
__setup("watchdog_thresh=", watchdog_thresh_setup);
static void __lockup_detector_cleanup(void);
#ifdef CONFIG_SOFTLOCKUP_DETECTOR_INTR_STORM
enum stats_per_group {
STATS_SYSTEM,
@@ -886,11 +884,6 @@ static void __lockup_detector_reconfigure(void)
watchdog_hardlockup_start();
cpus_read_unlock();
/*
* Must be called outside the cpus locked section to prevent
* recursive locking in the perf code.
*/
__lockup_detector_cleanup();
}
void lockup_detector_reconfigure(void)
@@ -940,24 +933,6 @@ static inline void lockup_detector_setup(void)
}
#endif /* !CONFIG_SOFTLOCKUP_DETECTOR */
static void __lockup_detector_cleanup(void)
{
lockdep_assert_held(&watchdog_mutex);
hardlockup_detector_perf_cleanup();
}
/**
* lockup_detector_cleanup - Cleanup after cpu hotplug or sysctl changes
*
* Caller must not hold the cpu hotplug rwsem.
*/
void lockup_detector_cleanup(void)
{
mutex_lock(&watchdog_mutex);
__lockup_detector_cleanup();
mutex_unlock(&watchdog_mutex);
}
/**
* lockup_detector_soft_poweroff - Interface to stop lockup detector(s)
*


@@ -21,8 +21,6 @@
#include <linux/perf_event.h>
static DEFINE_PER_CPU(struct perf_event *, watchdog_ev);
static DEFINE_PER_CPU(struct perf_event *, dead_event);
static struct cpumask dead_events_mask;
static atomic_t watchdog_cpus = ATOMIC_INIT(0);
@@ -146,6 +144,7 @@ static int hardlockup_detector_event_create(void)
PTR_ERR(evt));
return PTR_ERR(evt);
}
WARN_ONCE(this_cpu_read(watchdog_ev), "unexpected watchdog_ev leak");
this_cpu_write(watchdog_ev, evt);
return 0;
}
@@ -181,36 +180,12 @@ void watchdog_hardlockup_disable(unsigned int cpu)
if (event) {
perf_event_disable(event);
perf_event_release_kernel(event);
this_cpu_write(watchdog_ev, NULL);
this_cpu_write(dead_event, event);
cpumask_set_cpu(smp_processor_id(), &dead_events_mask);
atomic_dec(&watchdog_cpus);
}
}
/**
* hardlockup_detector_perf_cleanup - Cleanup disabled events and destroy them
*
* Called from lockup_detector_cleanup(). Serialized by the caller.
*/
void hardlockup_detector_perf_cleanup(void)
{
int cpu;
for_each_cpu(cpu, &dead_events_mask) {
struct perf_event *event = per_cpu(dead_event, cpu);
/*
* Required because for_each_cpu() reports unconditionally
* CPU0 as set on UP kernels. Sigh.
*/
if (event)
perf_event_release_kernel(event);
per_cpu(dead_event, cpu) = NULL;
}
cpumask_clear(&dead_events_mask);
}
/**
* hardlockup_detector_perf_stop - Globally stop watchdog events
*


@@ -64,7 +64,8 @@ union ibs_op_ctl
opmaxcnt_ext:7, /* 20-26: upper 7 bits of periodic op maximum count */
reserved0:5, /* 27-31: reserved */
opcurcnt:27, /* 32-58: periodic op counter current count */
reserved1:5; /* 59-63: reserved */
ldlat_thrsh:4, /* 59-62: Load Latency threshold */
ldlat_en:1; /* 63: Load Latency enabled */
};
};