Merge tag 'perf-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull performance events updates from Ingo Molnar:
"Core:
- Move perf_event sysctls into kernel/events/ (Joel Granados)
- Use POLLHUP for pinned events in error (Namhyung Kim)
- Avoid the read if the count is already updated (Peter Zijlstra)
- Allow the EPOLLRDNORM flag for poll (Tao Chen)
- locking/percpu-rwsem: Add guard support [ NOTE: this got
(mis-)merged into the perf tree due to related work ] (Peter
Zijlstra)
perf_pmu_unregister() related improvements: (Peter Zijlstra)
- Simplify the perf_event_alloc() error path
- Simplify the perf_pmu_register() error path
- Simplify perf_pmu_register()
- Simplify perf_init_event()
- Simplify perf_event_alloc()
- Merge struct pmu::pmu_disable_count into struct
perf_cpu_pmu_context::pmu_disable_count
- Add this_cpc() helper
- Introduce perf_free_addr_filters()
- Robustify perf_event_free_bpf_prog()
- Simplify the perf_mmap() control flow
- Further simplify perf_mmap()
- Remove retry loop from perf_mmap()
- Lift event->mmap_mutex in perf_mmap()
- Detach 'struct perf_cpu_pmu_context' and 'struct pmu' lifetimes
- Fix perf_mmap() failure path
Uprobes:
- Harden x86 uretprobe syscall trampoline check (Jiri Olsa)
- Remove redundant spinlock in uprobe_deny_signal() (Liao Chang)
- Remove the spinlock within handle_singlestep() (Liao Chang)
x86 Intel PMU enhancements:
- Support PEBS counters snapshotting (Kan Liang)
- Fix intel_pmu_read_event() (Kan Liang)
- Extend per event callchain limit to branch stack (Kan Liang)
- Fix system-wide LBR profiling (Kan Liang)
- Allocate bts_ctx only if necessary (Li RongQing)
- Apply static call for drain_pebs (Peter Zijlstra)
x86 AMD PMU enhancements: (Ravi Bangoria)
- Remove pointless sample period check
- Fix ->config to sample period calculation for OP PMU
- Fix perf_ibs_op.cnt_mask for CurCnt
- Don't allow freq mode event creation through ->config interface
- Add PMU specific minimum period
- Add ->check_period() callback
- Ceil sample_period to min_period
- Add support for OP Load Latency Filtering
- Update DTLB/PageSize decode logic
Hardware breakpoints:
- Return EOPNOTSUPP for unsupported breakpoint type (Saket Kumar
Bhaskar)
Hardlockup detector improvements: (Li Huafei)
- Fix perf_event memory leak
- Warn if watchdog_ev is leaked
Fixes and cleanups:
- Misc fixes and cleanups (Andy Shevchenko, Kan Liang, Peter
Zijlstra, Ravi Bangoria, Thorsten Blum, XieLudan)"
* tag 'perf-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (55 commits)
perf: Fix __percpu annotation
perf: Clean up pmu specific data
perf/x86: Remove swap_task_ctx()
perf/x86/lbr: Fix shorter LBRs call stacks for the system-wide mode
perf: Supply task information to sched_task()
perf: attach/detach PMU specific data
locking/percpu-rwsem: Add guard support
perf: Save PMU specific data in task_struct
perf: Extend per event callchain limit to branch stack
perf/ring_buffer: Allow the EPOLLRDNORM flag for poll
perf/core: Use POLLHUP for pinned events in error
perf/core: Use sysfs_emit() instead of scnprintf()
perf/core: Remove optional 'size' arguments from strscpy() calls
perf/x86/intel/bts: Check if bts_ctx is allocated when calling BTS functions
uprobes/x86: Harden uretprobe syscall trampoline check
watchdog/hardlockup/perf: Warn if watchdog_ev is leaked
watchdog/hardlockup/perf: Fix perf_event memory leak
perf/x86: Annotate struct bts_buffer::buf with __counted_by()
perf/core: Clean up perf_try_init_event()
perf/core: Fix perf_mmap() failure path
...
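
The change that threads through most of the cross-architecture commits above is the new task argument to the pmu::sched_task() callback (with the separate swap_task_ctx() callback dropped). A minimal sketch of the updated callback shape follows; the driver name and the empty save/restore bodies are illustrative only, not code from this merge:

#include <linux/perf_event.h>
#include <linux/sched.h>

/* Hypothetical PMU driver: only the callback signature reflects this merge.
 * 'task' is the task being switched in or out, so per-task PMU state
 * (e.g. LBR call-stack context) can be restored or saved directly.
 */
static void example_pmu_sched_task(struct perf_event_pmu_context *pmu_ctx,
                                   struct task_struct *task, bool sched_in)
{
        if (sched_in) {
                /* restore per-task PMU state for 'task' */
        } else {
                /* save per-task PMU state for 'task' */
        }
}

static struct pmu example_pmu = {
        /* ...other callbacks elided... */
        .sched_task = example_pmu_sched_task,
};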
|
|
@ -132,7 +132,10 @@ static unsigned long ebb_switch_in(bool ebb, struct cpu_hw_events *cpuhw)
|
|||
|
||||
static inline void power_pmu_bhrb_enable(struct perf_event *event) {}
|
||||
static inline void power_pmu_bhrb_disable(struct perf_event *event) {}
|
||||
static void power_pmu_sched_task(struct perf_event_pmu_context *pmu_ctx, bool sched_in) {}
|
||||
static void power_pmu_sched_task(struct perf_event_pmu_context *pmu_ctx,
|
||||
struct task_struct *task, bool sched_in)
|
||||
{
|
||||
}
|
||||
static inline void power_pmu_bhrb_read(struct perf_event *event, struct cpu_hw_events *cpuhw) {}
|
||||
static void pmao_restore_workaround(bool ebb) { }
|
||||
#endif /* CONFIG_PPC32 */
|
||||
|
|
@ -444,7 +447,8 @@ static void power_pmu_bhrb_disable(struct perf_event *event)
|
|||
/* Called from ctxsw to prevent one process's branch entries to
|
||||
* mingle with the other process's entries during context switch.
|
||||
*/
|
||||
static void power_pmu_sched_task(struct perf_event_pmu_context *pmu_ctx, bool sched_in)
|
||||
static void power_pmu_sched_task(struct perf_event_pmu_context *pmu_ctx,
|
||||
struct task_struct *task, bool sched_in)
|
||||
{
|
||||
if (!ppmu->bhrb_nr)
|
||||
return;
|
||||
|
|
|
|||
|
|
@ -518,7 +518,8 @@ static void paicrypt_have_samples(void)
|
|||
/* Called on schedule-in and schedule-out. No access to event structure,
|
||||
* but for sampling only event CRYPTO_ALL is allowed.
|
||||
*/
|
||||
static void paicrypt_sched_task(struct perf_event_pmu_context *pmu_ctx, bool sched_in)
|
||||
static void paicrypt_sched_task(struct perf_event_pmu_context *pmu_ctx,
|
||||
struct task_struct *task, bool sched_in)
|
||||
{
|
||||
/* We started with a clean page on event installation. So read out
|
||||
* results on schedule_out and if page was dirty, save old values.
|
||||
|
|
|
|||
|
|
@ -542,7 +542,8 @@ static void paiext_have_samples(void)
|
|||
/* Called on schedule-in and schedule-out. No access to event structure,
|
||||
* but for sampling only event NNPA_ALL is allowed.
|
||||
*/
|
||||
static void paiext_sched_task(struct perf_event_pmu_context *pmu_ctx, bool sched_in)
|
||||
static void paiext_sched_task(struct perf_event_pmu_context *pmu_ctx,
|
||||
struct task_struct *task, bool sched_in)
|
||||
{
|
||||
/* We started with a clean page on event installation. So read out
|
||||
* results on schedule_out and if page was dirty, save old values.
|
||||
|
|
|
|||
|
|
@ -381,7 +381,8 @@ static void amd_brs_poison_buffer(void)
|
|||
* On ctxswin, sched_in = true, called after the PMU has started
|
||||
* On ctxswout, sched_in = false, called before the PMU is stopped
|
||||
*/
|
||||
void amd_pmu_brs_sched_task(struct perf_event_pmu_context *pmu_ctx, bool sched_in)
|
||||
void amd_pmu_brs_sched_task(struct perf_event_pmu_context *pmu_ctx,
|
||||
struct task_struct *task, bool sched_in)
|
||||
{
|
||||
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
|
||||
|
||||
|
|
|
|||
|
|
@ -28,9 +28,6 @@ static u32 ibs_caps;
|
|||
#include <asm/nmi.h>
|
||||
#include <asm/amd-ibs.h>
|
||||
|
||||
#define IBS_FETCH_CONFIG_MASK (IBS_FETCH_RAND_EN | IBS_FETCH_MAX_CNT)
|
||||
#define IBS_OP_CONFIG_MASK IBS_OP_MAX_CNT
|
||||
|
||||
/* attr.config2 */
|
||||
#define IBS_SW_FILTER_MASK 1
|
||||
|
||||
|
|
@ -89,6 +86,7 @@ struct perf_ibs {
|
|||
u64 cnt_mask;
|
||||
u64 enable_mask;
|
||||
u64 valid_mask;
|
||||
u16 min_period;
|
||||
u64 max_period;
|
||||
unsigned long offset_mask[1];
|
||||
int offset_max;
|
||||
|
|
@ -270,11 +268,19 @@ static int validate_group(struct perf_event *event)
|
|||
return 0;
|
||||
}
|
||||
|
||||
static bool perf_ibs_ldlat_event(struct perf_ibs *perf_ibs,
|
||||
struct perf_event *event)
|
||||
{
|
||||
return perf_ibs == &perf_ibs_op &&
|
||||
(ibs_caps & IBS_CAPS_OPLDLAT) &&
|
||||
(event->attr.config1 & 0xFFF);
|
||||
}
|
||||
|
||||
static int perf_ibs_init(struct perf_event *event)
|
||||
{
|
||||
struct hw_perf_event *hwc = &event->hw;
|
||||
struct perf_ibs *perf_ibs;
|
||||
u64 max_cnt, config;
|
||||
u64 config;
|
||||
int ret;
|
||||
|
||||
perf_ibs = get_ibs_pmu(event->attr.type);
|
||||
|
|
@ -310,25 +316,47 @@ static int perf_ibs_init(struct perf_event *event)
|
|||
if (config & perf_ibs->cnt_mask)
|
||||
/* raw max_cnt may not be set */
|
||||
return -EINVAL;
|
||||
if (!event->attr.sample_freq && hwc->sample_period & 0x0f)
|
||||
/*
|
||||
* lower 4 bits can not be set in ibs max cnt,
|
||||
* but allowing it in case we adjust the
|
||||
* sample period to set a frequency.
|
||||
*/
|
||||
return -EINVAL;
|
||||
hwc->sample_period &= ~0x0FULL;
|
||||
if (!hwc->sample_period)
|
||||
hwc->sample_period = 0x10;
|
||||
|
||||
if (event->attr.freq) {
|
||||
hwc->sample_period = perf_ibs->min_period;
|
||||
} else {
|
||||
/* Silently mask off lower nibble. IBS hw mandates it. */
|
||||
hwc->sample_period &= ~0x0FULL;
|
||||
if (hwc->sample_period < perf_ibs->min_period)
|
||||
return -EINVAL;
|
||||
}
|
||||
} else {
|
||||
max_cnt = config & perf_ibs->cnt_mask;
|
||||
u64 period = 0;
|
||||
|
||||
if (event->attr.freq)
|
||||
return -EINVAL;
|
||||
|
||||
if (perf_ibs == &perf_ibs_op) {
|
||||
period = (config & IBS_OP_MAX_CNT) << 4;
|
||||
if (ibs_caps & IBS_CAPS_OPCNTEXT)
|
||||
period |= config & IBS_OP_MAX_CNT_EXT_MASK;
|
||||
} else {
|
||||
period = (config & IBS_FETCH_MAX_CNT) << 4;
|
||||
}
|
||||
|
||||
config &= ~perf_ibs->cnt_mask;
|
||||
event->attr.sample_period = max_cnt << 4;
|
||||
hwc->sample_period = event->attr.sample_period;
|
||||
event->attr.sample_period = period;
|
||||
hwc->sample_period = period;
|
||||
|
||||
if (hwc->sample_period < perf_ibs->min_period)
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
if (!hwc->sample_period)
|
||||
return -EINVAL;
|
||||
if (perf_ibs_ldlat_event(perf_ibs, event)) {
|
||||
u64 ldlat = event->attr.config1 & 0xFFF;
|
||||
|
||||
if (ldlat < 128 || ldlat > 2048)
|
||||
return -EINVAL;
|
||||
ldlat >>= 7;
|
||||
|
||||
config |= (ldlat - 1) << 59;
|
||||
config |= IBS_OP_L3MISSONLY | IBS_OP_LDLAT_EN;
|
||||
}
|
||||
|
||||
/*
|
||||
* If we modify hwc->sample_period, we also need to update
|
||||
|
|
@ -349,7 +377,8 @@ static int perf_ibs_set_period(struct perf_ibs *perf_ibs,
|
|||
int overflow;
|
||||
|
||||
/* ignore lower 4 bits in min count: */
|
||||
overflow = perf_event_set_period(hwc, 1<<4, perf_ibs->max_period, period);
|
||||
overflow = perf_event_set_period(hwc, perf_ibs->min_period,
|
||||
perf_ibs->max_period, period);
|
||||
local64_set(&hwc->prev_count, 0);
|
||||
|
||||
return overflow;
|
||||
|
|
@ -447,6 +476,9 @@ static void perf_ibs_start(struct perf_event *event, int flags)
|
|||
WARN_ON_ONCE(!(hwc->state & PERF_HES_UPTODATE));
|
||||
hwc->state = 0;
|
||||
|
||||
if (event->attr.freq && hwc->sample_period < perf_ibs->min_period)
|
||||
hwc->sample_period = perf_ibs->min_period;
|
||||
|
||||
perf_ibs_set_period(perf_ibs, hwc, &period);
|
||||
if (perf_ibs == &perf_ibs_op && (ibs_caps & IBS_CAPS_OPCNTEXT)) {
|
||||
config |= period & IBS_OP_MAX_CNT_EXT_MASK;
|
||||
|
|
@ -554,6 +586,28 @@ static void perf_ibs_del(struct perf_event *event, int flags)
|
|||
|
||||
static void perf_ibs_read(struct perf_event *event) { }
|
||||
|
||||
static int perf_ibs_check_period(struct perf_event *event, u64 value)
|
||||
{
|
||||
struct perf_ibs *perf_ibs;
|
||||
u64 low_nibble;
|
||||
|
||||
if (event->attr.freq)
|
||||
return 0;
|
||||
|
||||
perf_ibs = container_of(event->pmu, struct perf_ibs, pmu);
|
||||
low_nibble = value & 0xFULL;
|
||||
|
||||
/*
|
||||
* This contradicts with perf_ibs_init() which allows sample period
|
||||
* with lower nibble bits set but silently masks them off. Whereas
|
||||
* this returns error.
|
||||
*/
|
||||
if (low_nibble || value < perf_ibs->min_period)
|
||||
return -EINVAL;
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
/*
|
||||
* We need to initialize with empty group if all attributes in the
|
||||
* group are dynamic.
|
||||
|
|
@ -572,7 +626,10 @@ PMU_FORMAT_ATTR(cnt_ctl, "config:19");
|
|||
PMU_FORMAT_ATTR(swfilt, "config2:0");
|
||||
PMU_EVENT_ATTR_STRING(l3missonly, fetch_l3missonly, "config:59");
|
||||
PMU_EVENT_ATTR_STRING(l3missonly, op_l3missonly, "config:16");
|
||||
PMU_EVENT_ATTR_STRING(ldlat, ibs_op_ldlat_format, "config1:0-11");
|
||||
PMU_EVENT_ATTR_STRING(zen4_ibs_extensions, zen4_ibs_extensions, "1");
|
||||
PMU_EVENT_ATTR_STRING(ldlat, ibs_op_ldlat_cap, "1");
|
||||
PMU_EVENT_ATTR_STRING(dtlb_pgsize, ibs_op_dtlb_pgsize_cap, "1");
|
||||
|
||||
static umode_t
|
||||
zen4_ibs_extensions_is_visible(struct kobject *kobj, struct attribute *attr, int i)
|
||||
|
|
@ -580,6 +637,18 @@ zen4_ibs_extensions_is_visible(struct kobject *kobj, struct attribute *attr, int
|
|||
return ibs_caps & IBS_CAPS_ZEN4 ? attr->mode : 0;
|
||||
}
|
||||
|
||||
static umode_t
|
||||
ibs_op_ldlat_is_visible(struct kobject *kobj, struct attribute *attr, int i)
|
||||
{
|
||||
return ibs_caps & IBS_CAPS_OPLDLAT ? attr->mode : 0;
|
||||
}
|
||||
|
||||
static umode_t
|
||||
ibs_op_dtlb_pgsize_is_visible(struct kobject *kobj, struct attribute *attr, int i)
|
||||
{
|
||||
return ibs_caps & IBS_CAPS_OPDTLBPGSIZE ? attr->mode : 0;
|
||||
}
|
||||
|
||||
static struct attribute *fetch_attrs[] = {
|
||||
&format_attr_rand_en.attr,
|
||||
&format_attr_swfilt.attr,
|
||||
|
|
@ -596,6 +665,16 @@ static struct attribute *zen4_ibs_extensions_attrs[] = {
|
|||
NULL,
|
||||
};
|
||||
|
||||
static struct attribute *ibs_op_ldlat_cap_attrs[] = {
|
||||
&ibs_op_ldlat_cap.attr.attr,
|
||||
NULL,
|
||||
};
|
||||
|
||||
static struct attribute *ibs_op_dtlb_pgsize_cap_attrs[] = {
|
||||
&ibs_op_dtlb_pgsize_cap.attr.attr,
|
||||
NULL,
|
||||
};
|
||||
|
||||
static struct attribute_group group_fetch_formats = {
|
||||
.name = "format",
|
||||
.attrs = fetch_attrs,
|
||||
|
|
@ -613,6 +692,18 @@ static struct attribute_group group_zen4_ibs_extensions = {
|
|||
.is_visible = zen4_ibs_extensions_is_visible,
|
||||
};
|
||||
|
||||
static struct attribute_group group_ibs_op_ldlat_cap = {
|
||||
.name = "caps",
|
||||
.attrs = ibs_op_ldlat_cap_attrs,
|
||||
.is_visible = ibs_op_ldlat_is_visible,
|
||||
};
|
||||
|
||||
static struct attribute_group group_ibs_op_dtlb_pgsize_cap = {
|
||||
.name = "caps",
|
||||
.attrs = ibs_op_dtlb_pgsize_cap_attrs,
|
||||
.is_visible = ibs_op_dtlb_pgsize_is_visible,
|
||||
};
|
||||
|
||||
static const struct attribute_group *fetch_attr_groups[] = {
|
||||
&group_fetch_formats,
|
||||
&empty_caps_group,
|
||||
|
|
@ -651,6 +742,11 @@ static struct attribute_group group_op_formats = {
|
|||
.attrs = op_attrs,
|
||||
};
|
||||
|
||||
static struct attribute *ibs_op_ldlat_format_attrs[] = {
|
||||
&ibs_op_ldlat_format.attr.attr,
|
||||
NULL,
|
||||
};
|
||||
|
||||
static struct attribute_group group_cnt_ctl = {
|
||||
.name = "format",
|
||||
.attrs = cnt_ctl_attrs,
|
||||
|
|
@ -669,10 +765,19 @@ static const struct attribute_group *op_attr_groups[] = {
|
|||
NULL,
|
||||
};
|
||||
|
||||
static struct attribute_group group_ibs_op_ldlat_format = {
|
||||
.name = "format",
|
||||
.attrs = ibs_op_ldlat_format_attrs,
|
||||
.is_visible = ibs_op_ldlat_is_visible,
|
||||
};
|
||||
|
||||
static const struct attribute_group *op_attr_update[] = {
|
||||
&group_cnt_ctl,
|
||||
&group_op_l3missonly,
|
||||
&group_zen4_ibs_extensions,
|
||||
&group_ibs_op_ldlat_cap,
|
||||
&group_ibs_op_ldlat_format,
|
||||
&group_ibs_op_dtlb_pgsize_cap,
|
||||
NULL,
|
||||
};
|
||||
|
||||
|
|
@ -686,12 +791,14 @@ static struct perf_ibs perf_ibs_fetch = {
|
|||
.start = perf_ibs_start,
|
||||
.stop = perf_ibs_stop,
|
||||
.read = perf_ibs_read,
|
||||
.check_period = perf_ibs_check_period,
|
||||
},
|
||||
.msr = MSR_AMD64_IBSFETCHCTL,
|
||||
.config_mask = IBS_FETCH_CONFIG_MASK,
|
||||
.config_mask = IBS_FETCH_MAX_CNT | IBS_FETCH_RAND_EN,
|
||||
.cnt_mask = IBS_FETCH_MAX_CNT,
|
||||
.enable_mask = IBS_FETCH_ENABLE,
|
||||
.valid_mask = IBS_FETCH_VAL,
|
||||
.min_period = 0x10,
|
||||
.max_period = IBS_FETCH_MAX_CNT << 4,
|
||||
.offset_mask = { MSR_AMD64_IBSFETCH_REG_MASK },
|
||||
.offset_max = MSR_AMD64_IBSFETCH_REG_COUNT,
|
||||
|
|
@ -709,13 +816,15 @@ static struct perf_ibs perf_ibs_op = {
|
|||
.start = perf_ibs_start,
|
||||
.stop = perf_ibs_stop,
|
||||
.read = perf_ibs_read,
|
||||
.check_period = perf_ibs_check_period,
|
||||
},
|
||||
.msr = MSR_AMD64_IBSOPCTL,
|
||||
.config_mask = IBS_OP_CONFIG_MASK,
|
||||
.config_mask = IBS_OP_MAX_CNT,
|
||||
.cnt_mask = IBS_OP_MAX_CNT | IBS_OP_CUR_CNT |
|
||||
IBS_OP_CUR_CNT_RAND,
|
||||
.enable_mask = IBS_OP_ENABLE,
|
||||
.valid_mask = IBS_OP_VAL,
|
||||
.min_period = 0x90,
|
||||
.max_period = IBS_OP_MAX_CNT << 4,
|
||||
.offset_mask = { MSR_AMD64_IBSOP_REG_MASK },
|
||||
.offset_max = MSR_AMD64_IBSOP_REG_COUNT,
|
||||
|
|
@ -917,6 +1026,10 @@ static void perf_ibs_get_tlb_lvl(union ibs_op_data3 *op_data3,
|
|||
if (!op_data3->dc_lin_addr_valid)
|
||||
return;
|
||||
|
||||
if ((ibs_caps & IBS_CAPS_OPDTLBPGSIZE) &&
|
||||
!op_data3->dc_phy_addr_valid)
|
||||
return;
|
||||
|
||||
if (!op_data3->dc_l1tlb_miss) {
|
||||
data_src->mem_dtlb = PERF_MEM_TLB_L1 | PERF_MEM_TLB_HIT;
|
||||
return;
|
||||
|
|
@ -1023,15 +1136,25 @@ static void perf_ibs_parse_ld_st_data(__u64 sample_type,
|
|||
}
|
||||
}
|
||||
|
||||
static int perf_ibs_get_offset_max(struct perf_ibs *perf_ibs, u64 sample_type,
|
||||
static bool perf_ibs_is_mem_sample_type(struct perf_ibs *perf_ibs,
|
||||
struct perf_event *event)
|
||||
{
|
||||
u64 sample_type = event->attr.sample_type;
|
||||
|
||||
return perf_ibs == &perf_ibs_op &&
|
||||
sample_type & (PERF_SAMPLE_DATA_SRC |
|
||||
PERF_SAMPLE_WEIGHT_TYPE |
|
||||
PERF_SAMPLE_ADDR |
|
||||
PERF_SAMPLE_PHYS_ADDR);
|
||||
}
|
||||
|
||||
static int perf_ibs_get_offset_max(struct perf_ibs *perf_ibs,
|
||||
struct perf_event *event,
|
||||
int check_rip)
|
||||
{
|
||||
if (sample_type & PERF_SAMPLE_RAW ||
|
||||
(perf_ibs == &perf_ibs_op &&
|
||||
(sample_type & PERF_SAMPLE_DATA_SRC ||
|
||||
sample_type & PERF_SAMPLE_WEIGHT_TYPE ||
|
||||
sample_type & PERF_SAMPLE_ADDR ||
|
||||
sample_type & PERF_SAMPLE_PHYS_ADDR)))
|
||||
if (event->attr.sample_type & PERF_SAMPLE_RAW ||
|
||||
perf_ibs_is_mem_sample_type(perf_ibs, event) ||
|
||||
perf_ibs_ldlat_event(perf_ibs, event))
|
||||
return perf_ibs->offset_max;
|
||||
else if (check_rip)
|
||||
return 3;
|
||||
|
|
@ -1148,7 +1271,7 @@ fail:
|
|||
offset = 1;
|
||||
check_rip = (perf_ibs == &perf_ibs_op && (ibs_caps & IBS_CAPS_RIPINVALIDCHK));
|
||||
|
||||
offset_max = perf_ibs_get_offset_max(perf_ibs, event->attr.sample_type, check_rip);
|
||||
offset_max = perf_ibs_get_offset_max(perf_ibs, event, check_rip);
|
||||
|
||||
do {
|
||||
rdmsrl(msr + offset, *buf++);
|
||||
|
|
@ -1157,6 +1280,22 @@ fail:
|
|||
perf_ibs->offset_max,
|
||||
offset + 1);
|
||||
} while (offset < offset_max);
|
||||
|
||||
if (perf_ibs_ldlat_event(perf_ibs, event)) {
|
||||
union ibs_op_data3 op_data3;
|
||||
|
||||
op_data3.val = ibs_data.regs[ibs_op_msr_idx(MSR_AMD64_IBSOPDATA3)];
|
||||
/*
|
||||
* Opening event is errored out if load latency threshold is
|
||||
* outside of [128, 2048] range. Since the event has reached
|
||||
* interrupt handler, we can safely assume the threshold is
|
||||
* within [128, 2048] range.
|
||||
*/
|
||||
if (!op_data3.ld_op || !op_data3.dc_miss ||
|
||||
op_data3.dc_miss_lat <= (event->attr.config1 & 0xFFF))
|
||||
goto out;
|
||||
}
|
||||
|
||||
/*
|
||||
* Read IbsBrTarget, IbsOpData4, and IbsExtdCtl separately
|
||||
* depending on their availability.
|
||||
|
|
@ -1229,6 +1368,10 @@ fail:
|
|||
perf_sample_save_callchain(&data, event, iregs);
|
||||
|
||||
throttle = perf_event_overflow(event, &data, ®s);
|
||||
|
||||
if (event->attr.freq && hwc->sample_period < perf_ibs->min_period)
|
||||
hwc->sample_period = perf_ibs->min_period;
|
||||
|
||||
out:
|
||||
if (throttle) {
|
||||
perf_ibs_stop(event, 0);
|
||||
|
|
@ -1318,7 +1461,8 @@ static __init int perf_ibs_op_init(void)
|
|||
if (ibs_caps & IBS_CAPS_OPCNTEXT) {
|
||||
perf_ibs_op.max_period |= IBS_OP_MAX_CNT_EXT_MASK;
|
||||
perf_ibs_op.config_mask |= IBS_OP_MAX_CNT_EXT_MASK;
|
||||
perf_ibs_op.cnt_mask |= IBS_OP_MAX_CNT_EXT_MASK;
|
||||
perf_ibs_op.cnt_mask |= (IBS_OP_MAX_CNT_EXT_MASK |
|
||||
IBS_OP_CUR_CNT_EXT_MASK);
|
||||
}
|
||||
|
||||
if (ibs_caps & IBS_CAPS_ZEN4)
|
||||
|
|
|
|||
|
|
@ -30,7 +30,7 @@
|
|||
#define GET_DOMID_MASK(x) (((x)->conf1 >> 16) & 0xFFFFULL)
|
||||
#define GET_PASID_MASK(x) (((x)->conf1 >> 32) & 0xFFFFFULL)
|
||||
|
||||
#define IOMMU_NAME_SIZE 16
|
||||
#define IOMMU_NAME_SIZE 24
|
||||
|
||||
struct perf_amd_iommu {
|
||||
struct list_head list;
|
||||
|
|
|
|||
|
|
@ -371,7 +371,8 @@ void amd_pmu_lbr_del(struct perf_event *event)
|
|||
perf_sched_cb_dec(event->pmu);
|
||||
}
|
||||
|
||||
void amd_pmu_lbr_sched_task(struct perf_event_pmu_context *pmu_ctx, bool sched_in)
|
||||
void amd_pmu_lbr_sched_task(struct perf_event_pmu_context *pmu_ctx,
|
||||
struct task_struct *task, bool sched_in)
|
||||
{
|
||||
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
|
||||
|
||||
|
|
|
|||
|
|
@ -87,13 +87,14 @@ DEFINE_STATIC_CALL_NULL(x86_pmu_commit_scheduling, *x86_pmu.commit_scheduling);
|
|||
DEFINE_STATIC_CALL_NULL(x86_pmu_stop_scheduling, *x86_pmu.stop_scheduling);
|
||||
|
||||
DEFINE_STATIC_CALL_NULL(x86_pmu_sched_task, *x86_pmu.sched_task);
|
||||
DEFINE_STATIC_CALL_NULL(x86_pmu_swap_task_ctx, *x86_pmu.swap_task_ctx);
|
||||
|
||||
DEFINE_STATIC_CALL_NULL(x86_pmu_drain_pebs, *x86_pmu.drain_pebs);
|
||||
DEFINE_STATIC_CALL_NULL(x86_pmu_pebs_aliases, *x86_pmu.pebs_aliases);
|
||||
|
||||
DEFINE_STATIC_CALL_NULL(x86_pmu_filter, *x86_pmu.filter);
|
||||
|
||||
DEFINE_STATIC_CALL_NULL(x86_pmu_late_setup, *x86_pmu.late_setup);
|
||||
|
||||
/*
|
||||
* This one is magic, it will get called even when PMU init fails (because
|
||||
* there is no PMU), in which case it should simply return NULL.
|
||||
|
|
@ -1298,6 +1299,15 @@ static void x86_pmu_enable(struct pmu *pmu)
|
|||
|
||||
if (cpuc->n_added) {
|
||||
int n_running = cpuc->n_events - cpuc->n_added;
|
||||
|
||||
/*
|
||||
* The late setup (after counters are scheduled)
|
||||
* is required for some cases, e.g., PEBS counters
|
||||
* snapshotting. Because an accurate counter index
|
||||
* is needed.
|
||||
*/
|
||||
static_call_cond(x86_pmu_late_setup)();
|
||||
|
||||
/*
|
||||
* apply assignment obtained either from
|
||||
* hw_perf_group_sched_in() or x86_pmu_enable()
|
||||
|
|
@ -2028,13 +2038,14 @@ static void x86_pmu_static_call_update(void)
|
|||
static_call_update(x86_pmu_stop_scheduling, x86_pmu.stop_scheduling);
|
||||
|
||||
static_call_update(x86_pmu_sched_task, x86_pmu.sched_task);
|
||||
static_call_update(x86_pmu_swap_task_ctx, x86_pmu.swap_task_ctx);
|
||||
|
||||
static_call_update(x86_pmu_drain_pebs, x86_pmu.drain_pebs);
|
||||
static_call_update(x86_pmu_pebs_aliases, x86_pmu.pebs_aliases);
|
||||
|
||||
static_call_update(x86_pmu_guest_get_msrs, x86_pmu.guest_get_msrs);
|
||||
static_call_update(x86_pmu_filter, x86_pmu.filter);
|
||||
|
||||
static_call_update(x86_pmu_late_setup, x86_pmu.late_setup);
|
||||
}
|
||||
|
||||
static void _x86_pmu_read(struct perf_event *event)
|
||||
|
|
@ -2625,15 +2636,10 @@ static const struct attribute_group *x86_pmu_attr_groups[] = {
|
|||
NULL,
|
||||
};
|
||||
|
||||
static void x86_pmu_sched_task(struct perf_event_pmu_context *pmu_ctx, bool sched_in)
|
||||
static void x86_pmu_sched_task(struct perf_event_pmu_context *pmu_ctx,
|
||||
struct task_struct *task, bool sched_in)
|
||||
{
|
||||
static_call_cond(x86_pmu_sched_task)(pmu_ctx, sched_in);
|
||||
}
|
||||
|
||||
static void x86_pmu_swap_task_ctx(struct perf_event_pmu_context *prev_epc,
|
||||
struct perf_event_pmu_context *next_epc)
|
||||
{
|
||||
static_call_cond(x86_pmu_swap_task_ctx)(prev_epc, next_epc);
|
||||
static_call_cond(x86_pmu_sched_task)(pmu_ctx, task, sched_in);
|
||||
}
|
||||
|
||||
void perf_check_microcode(void)
|
||||
|
|
@ -2700,7 +2706,6 @@ static struct pmu pmu = {
|
|||
|
||||
.event_idx = x86_pmu_event_idx,
|
||||
.sched_task = x86_pmu_sched_task,
|
||||
.swap_task_ctx = x86_pmu_swap_task_ctx,
|
||||
.check_period = x86_pmu_check_period,
|
||||
|
||||
.aux_output_match = x86_pmu_aux_output_match,
|
||||
|
|
|
|||
|
|
@ -36,7 +36,7 @@ enum {
|
|||
BTS_STATE_ACTIVE,
|
||||
};
|
||||
|
||||
static DEFINE_PER_CPU(struct bts_ctx, bts_ctx);
|
||||
static struct bts_ctx __percpu *bts_ctx;
|
||||
|
||||
#define BTS_RECORD_SIZE 24
|
||||
#define BTS_SAFETY_MARGIN 4080
|
||||
|
|
@ -58,7 +58,7 @@ struct bts_buffer {
|
|||
local_t head;
|
||||
unsigned long end;
|
||||
void **data_pages;
|
||||
struct bts_phys buf[];
|
||||
struct bts_phys buf[] __counted_by(nr_bufs);
|
||||
};
|
||||
|
||||
static struct pmu bts_pmu;
|
||||
|
|
@ -231,7 +231,7 @@ bts_buffer_reset(struct bts_buffer *buf, struct perf_output_handle *handle);
|
|||
|
||||
static void __bts_event_start(struct perf_event *event)
|
||||
{
|
||||
struct bts_ctx *bts = this_cpu_ptr(&bts_ctx);
|
||||
struct bts_ctx *bts = this_cpu_ptr(bts_ctx);
|
||||
struct bts_buffer *buf = perf_get_aux(&bts->handle);
|
||||
u64 config = 0;
|
||||
|
||||
|
|
@ -260,7 +260,7 @@ static void __bts_event_start(struct perf_event *event)
|
|||
static void bts_event_start(struct perf_event *event, int flags)
|
||||
{
|
||||
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
|
||||
struct bts_ctx *bts = this_cpu_ptr(&bts_ctx);
|
||||
struct bts_ctx *bts = this_cpu_ptr(bts_ctx);
|
||||
struct bts_buffer *buf;
|
||||
|
||||
buf = perf_aux_output_begin(&bts->handle, event);
|
||||
|
|
@ -290,7 +290,7 @@ fail_stop:
|
|||
|
||||
static void __bts_event_stop(struct perf_event *event, int state)
|
||||
{
|
||||
struct bts_ctx *bts = this_cpu_ptr(&bts_ctx);
|
||||
struct bts_ctx *bts = this_cpu_ptr(bts_ctx);
|
||||
|
||||
/* ACTIVE -> INACTIVE(PMI)/STOPPED(->stop()) */
|
||||
WRITE_ONCE(bts->state, state);
|
||||
|
|
@ -305,7 +305,7 @@ static void __bts_event_stop(struct perf_event *event, int state)
|
|||
static void bts_event_stop(struct perf_event *event, int flags)
|
||||
{
|
||||
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
|
||||
struct bts_ctx *bts = this_cpu_ptr(&bts_ctx);
|
||||
struct bts_ctx *bts = this_cpu_ptr(bts_ctx);
|
||||
struct bts_buffer *buf = NULL;
|
||||
int state = READ_ONCE(bts->state);
|
||||
|
||||
|
|
@ -338,9 +338,14 @@ static void bts_event_stop(struct perf_event *event, int flags)
|
|||
|
||||
void intel_bts_enable_local(void)
|
||||
{
|
||||
struct bts_ctx *bts = this_cpu_ptr(&bts_ctx);
|
||||
int state = READ_ONCE(bts->state);
|
||||
struct bts_ctx *bts;
|
||||
int state;
|
||||
|
||||
if (!bts_ctx)
|
||||
return;
|
||||
|
||||
bts = this_cpu_ptr(bts_ctx);
|
||||
state = READ_ONCE(bts->state);
|
||||
/*
|
||||
* Here we transition from INACTIVE to ACTIVE;
|
||||
* if we instead are STOPPED from the interrupt handler,
|
||||
|
|
@ -358,7 +363,12 @@ void intel_bts_enable_local(void)
|
|||
|
||||
void intel_bts_disable_local(void)
|
||||
{
|
||||
struct bts_ctx *bts = this_cpu_ptr(&bts_ctx);
|
||||
struct bts_ctx *bts;
|
||||
|
||||
if (!bts_ctx)
|
||||
return;
|
||||
|
||||
bts = this_cpu_ptr(bts_ctx);
|
||||
|
||||
/*
|
||||
* Here we transition from ACTIVE to INACTIVE;
|
||||
|
|
@ -450,12 +460,17 @@ bts_buffer_reset(struct bts_buffer *buf, struct perf_output_handle *handle)
|
|||
int intel_bts_interrupt(void)
|
||||
{
|
||||
struct debug_store *ds = this_cpu_ptr(&cpu_hw_events)->ds;
|
||||
struct bts_ctx *bts = this_cpu_ptr(&bts_ctx);
|
||||
struct perf_event *event = bts->handle.event;
|
||||
struct bts_ctx *bts;
|
||||
struct perf_event *event;
|
||||
struct bts_buffer *buf;
|
||||
s64 old_head;
|
||||
int err = -ENOSPC, handled = 0;
|
||||
|
||||
if (!bts_ctx)
|
||||
return 0;
|
||||
|
||||
bts = this_cpu_ptr(bts_ctx);
|
||||
event = bts->handle.event;
|
||||
/*
|
||||
* The only surefire way of knowing if this NMI is ours is by checking
|
||||
* the write ptr against the PMI threshold.
|
||||
|
|
@ -518,7 +533,7 @@ static void bts_event_del(struct perf_event *event, int mode)
|
|||
|
||||
static int bts_event_add(struct perf_event *event, int mode)
|
||||
{
|
||||
struct bts_ctx *bts = this_cpu_ptr(&bts_ctx);
|
||||
struct bts_ctx *bts = this_cpu_ptr(bts_ctx);
|
||||
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
|
||||
struct hw_perf_event *hwc = &event->hw;
|
||||
|
||||
|
|
@ -605,6 +620,10 @@ static __init int bts_init(void)
|
|||
return -ENODEV;
|
||||
}
|
||||
|
||||
bts_ctx = alloc_percpu(struct bts_ctx);
|
||||
if (!bts_ctx)
|
||||
return -ENOMEM;
|
||||
|
||||
bts_pmu.capabilities = PERF_PMU_CAP_AUX_NO_SG | PERF_PMU_CAP_ITRACE |
|
||||
PERF_PMU_CAP_EXCLUSIVE;
|
||||
bts_pmu.task_ctx_nr = perf_sw_context;
|
||||
|
|
|
|||
|
|
@ -2714,7 +2714,7 @@ static void update_saved_topdown_regs(struct perf_event *event, u64 slots,
|
|||
* modify by a NMI. PMU has to be disabled before calling this function.
|
||||
*/
|
||||
|
||||
static u64 intel_update_topdown_event(struct perf_event *event, int metric_end)
|
||||
static u64 intel_update_topdown_event(struct perf_event *event, int metric_end, u64 *val)
|
||||
{
|
||||
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
|
||||
struct perf_event *other;
|
||||
|
|
@ -2722,13 +2722,24 @@ static u64 intel_update_topdown_event(struct perf_event *event, int metric_end)
|
|||
bool reset = true;
|
||||
int idx;
|
||||
|
||||
/* read Fixed counter 3 */
|
||||
rdpmcl((3 | INTEL_PMC_FIXED_RDPMC_BASE), slots);
|
||||
if (!slots)
|
||||
return 0;
|
||||
if (!val) {
|
||||
/* read Fixed counter 3 */
|
||||
rdpmcl((3 | INTEL_PMC_FIXED_RDPMC_BASE), slots);
|
||||
if (!slots)
|
||||
return 0;
|
||||
|
||||
/* read PERF_METRICS */
|
||||
rdpmcl(INTEL_PMC_FIXED_RDPMC_METRICS, metrics);
|
||||
/* read PERF_METRICS */
|
||||
rdpmcl(INTEL_PMC_FIXED_RDPMC_METRICS, metrics);
|
||||
} else {
|
||||
slots = val[0];
|
||||
metrics = val[1];
|
||||
/*
|
||||
* Don't reset the PERF_METRICS and Fixed counter 3
|
||||
* for each PEBS record read. Utilize the RDPMC metrics
|
||||
* clear mode.
|
||||
*/
|
||||
reset = false;
|
||||
}
|
||||
|
||||
for_each_set_bit(idx, cpuc->active_mask, metric_end + 1) {
|
||||
if (!is_topdown_idx(idx))
|
||||
|
|
@ -2771,36 +2782,47 @@ static u64 intel_update_topdown_event(struct perf_event *event, int metric_end)
|
|||
return slots;
|
||||
}
|
||||
|
||||
static u64 icl_update_topdown_event(struct perf_event *event)
|
||||
static u64 icl_update_topdown_event(struct perf_event *event, u64 *val)
|
||||
{
|
||||
return intel_update_topdown_event(event, INTEL_PMC_IDX_METRIC_BASE +
|
||||
x86_pmu.num_topdown_events - 1);
|
||||
x86_pmu.num_topdown_events - 1,
|
||||
val);
|
||||
}
|
||||
|
||||
DEFINE_STATIC_CALL(intel_pmu_update_topdown_event, x86_perf_event_update);
|
||||
|
||||
static void intel_pmu_read_topdown_event(struct perf_event *event)
|
||||
{
|
||||
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
|
||||
|
||||
/* Only need to call update_topdown_event() once for group read. */
|
||||
if ((cpuc->txn_flags & PERF_PMU_TXN_READ) &&
|
||||
!is_slots_event(event))
|
||||
return;
|
||||
|
||||
perf_pmu_disable(event->pmu);
|
||||
static_call(intel_pmu_update_topdown_event)(event);
|
||||
perf_pmu_enable(event->pmu);
|
||||
}
|
||||
DEFINE_STATIC_CALL(intel_pmu_update_topdown_event, intel_pmu_topdown_event_update);
|
||||
|
||||
static void intel_pmu_read_event(struct perf_event *event)
|
||||
{
|
||||
if (event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD)
|
||||
intel_pmu_auto_reload_read(event);
|
||||
else if (is_topdown_count(event))
|
||||
intel_pmu_read_topdown_event(event);
|
||||
else
|
||||
x86_perf_event_update(event);
|
||||
if (event->hw.flags & (PERF_X86_EVENT_AUTO_RELOAD | PERF_X86_EVENT_TOPDOWN) ||
|
||||
is_pebs_counter_event_group(event)) {
|
||||
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
|
||||
bool pmu_enabled = cpuc->enabled;
|
||||
|
||||
/* Only need to call update_topdown_event() once for group read. */
|
||||
if (is_metric_event(event) && (cpuc->txn_flags & PERF_PMU_TXN_READ))
|
||||
return;
|
||||
|
||||
cpuc->enabled = 0;
|
||||
if (pmu_enabled)
|
||||
intel_pmu_disable_all();
|
||||
|
||||
/*
|
||||
* If the PEBS counters snapshotting is enabled,
|
||||
* the topdown event is available in PEBS records.
|
||||
*/
|
||||
if (is_topdown_event(event) && !is_pebs_counter_event_group(event))
|
||||
static_call(intel_pmu_update_topdown_event)(event, NULL);
|
||||
else
|
||||
intel_pmu_drain_pebs_buffer();
|
||||
|
||||
cpuc->enabled = pmu_enabled;
|
||||
if (pmu_enabled)
|
||||
intel_pmu_enable_all(0);
|
||||
|
||||
return;
|
||||
}
|
||||
|
||||
x86_perf_event_update(event);
|
||||
}
|
||||
|
||||
static void intel_pmu_enable_fixed(struct perf_event *event)
|
||||
|
|
@ -2932,7 +2954,7 @@ static int intel_pmu_set_period(struct perf_event *event)
|
|||
static u64 intel_pmu_update(struct perf_event *event)
|
||||
{
|
||||
if (unlikely(is_topdown_count(event)))
|
||||
return static_call(intel_pmu_update_topdown_event)(event);
|
||||
return static_call(intel_pmu_update_topdown_event)(event, NULL);
|
||||
|
||||
return x86_perf_event_update(event);
|
||||
}
|
||||
|
|
@ -3070,7 +3092,7 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
|
|||
|
||||
handled++;
|
||||
x86_pmu_handle_guest_pebs(regs, &data);
|
||||
x86_pmu.drain_pebs(regs, &data);
|
||||
static_call(x86_pmu_drain_pebs)(regs, &data);
|
||||
status &= intel_ctrl | GLOBAL_STATUS_TRACE_TOPAPMI;
|
||||
|
||||
/*
|
||||
|
|
@ -3098,7 +3120,7 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
|
|||
*/
|
||||
if (__test_and_clear_bit(GLOBAL_STATUS_PERF_METRICS_OVF_BIT, (unsigned long *)&status)) {
|
||||
handled++;
|
||||
static_call(intel_pmu_update_topdown_event)(NULL);
|
||||
static_call(intel_pmu_update_topdown_event)(NULL, NULL);
|
||||
}
|
||||
|
||||
/*
|
||||
|
|
@ -3116,6 +3138,27 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
|
|||
if (!test_bit(bit, cpuc->active_mask))
|
||||
continue;
|
||||
|
||||
/*
|
||||
* There may be unprocessed PEBS records in the PEBS buffer,
|
||||
* which still stores the previous values.
|
||||
* Process those records first before handling the latest value.
|
||||
* For example,
|
||||
* A is a regular counter
|
||||
* B is a PEBS event which reads A
|
||||
* C is a PEBS event
|
||||
*
|
||||
* The following can happen:
|
||||
* B-assist A=1
|
||||
* C A=2
|
||||
* B-assist A=3
|
||||
* A-overflow-PMI A=4
|
||||
* C-assist-PMI (PEBS buffer) A=5
|
||||
*
|
||||
* The PEBS buffer has to be drained before handling the A-PMI
|
||||
*/
|
||||
if (is_pebs_counter_event_group(event))
|
||||
x86_pmu.drain_pebs(regs, &data);
|
||||
|
||||
if (!intel_pmu_save_and_restart(event))
|
||||
continue;
|
||||
|
||||
|
|
@ -4148,6 +4191,13 @@ static int intel_pmu_hw_config(struct perf_event *event)
|
|||
event->hw.flags |= PERF_X86_EVENT_PEBS_VIA_PT;
|
||||
}
|
||||
|
||||
if ((event->attr.sample_type & PERF_SAMPLE_READ) &&
|
||||
(x86_pmu.intel_cap.pebs_format >= 6) &&
|
||||
x86_pmu.intel_cap.pebs_baseline &&
|
||||
is_sampling_event(event) &&
|
||||
event->attr.precise_ip)
|
||||
event->group_leader->hw.flags |= PERF_X86_EVENT_PEBS_CNTR;
|
||||
|
||||
if ((event->attr.type == PERF_TYPE_HARDWARE) ||
|
||||
(event->attr.type == PERF_TYPE_HW_CACHE))
|
||||
return 0;
|
||||
|
|
@ -5244,16 +5294,10 @@ static void intel_pmu_cpu_dead(int cpu)
|
|||
}
|
||||
|
||||
static void intel_pmu_sched_task(struct perf_event_pmu_context *pmu_ctx,
|
||||
bool sched_in)
|
||||
struct task_struct *task, bool sched_in)
|
||||
{
|
||||
intel_pmu_pebs_sched_task(pmu_ctx, sched_in);
|
||||
intel_pmu_lbr_sched_task(pmu_ctx, sched_in);
|
||||
}
|
||||
|
||||
static void intel_pmu_swap_task_ctx(struct perf_event_pmu_context *prev_epc,
|
||||
struct perf_event_pmu_context *next_epc)
|
||||
{
|
||||
intel_pmu_lbr_swap_task_ctx(prev_epc, next_epc);
|
||||
intel_pmu_lbr_sched_task(pmu_ctx, task, sched_in);
|
||||
}
|
||||
|
||||
static int intel_pmu_check_period(struct perf_event *event, u64 value)
|
||||
|
|
@ -5424,7 +5468,6 @@ static __initconst const struct x86_pmu intel_pmu = {
|
|||
|
||||
.guest_get_msrs = intel_guest_get_msrs,
|
||||
.sched_task = intel_pmu_sched_task,
|
||||
.swap_task_ctx = intel_pmu_swap_task_ctx,
|
||||
|
||||
.check_period = intel_pmu_check_period,
|
||||
|
||||
|
|
|
|||
|
|
@ -953,11 +953,11 @@ unlock:
|
|||
return 1;
|
||||
}
|
||||
|
||||
static inline void intel_pmu_drain_pebs_buffer(void)
|
||||
void intel_pmu_drain_pebs_buffer(void)
|
||||
{
|
||||
struct perf_sample_data data;
|
||||
|
||||
x86_pmu.drain_pebs(NULL, &data);
|
||||
static_call(x86_pmu_drain_pebs)(NULL, &data);
|
||||
}
|
||||
|
||||
/*
|
||||
|
|
@ -1294,6 +1294,19 @@ static inline void pebs_update_threshold(struct cpu_hw_events *cpuc)
|
|||
ds->pebs_interrupt_threshold = threshold;
|
||||
}
|
||||
|
||||
#define PEBS_DATACFG_CNTRS(x) \
|
||||
((x >> PEBS_DATACFG_CNTR_SHIFT) & PEBS_DATACFG_CNTR_MASK)
|
||||
|
||||
#define PEBS_DATACFG_CNTR_BIT(x) \
|
||||
(((1ULL << x) & PEBS_DATACFG_CNTR_MASK) << PEBS_DATACFG_CNTR_SHIFT)
|
||||
|
||||
#define PEBS_DATACFG_FIX(x) \
|
||||
((x >> PEBS_DATACFG_FIX_SHIFT) & PEBS_DATACFG_FIX_MASK)
|
||||
|
||||
#define PEBS_DATACFG_FIX_BIT(x) \
|
||||
(((1ULL << (x)) & PEBS_DATACFG_FIX_MASK) \
|
||||
<< PEBS_DATACFG_FIX_SHIFT)
|
||||
|
||||
static void adaptive_pebs_record_size_update(void)
|
||||
{
|
||||
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
|
||||
|
|
@ -1308,10 +1321,58 @@ static void adaptive_pebs_record_size_update(void)
|
|||
sz += sizeof(struct pebs_xmm);
|
||||
if (pebs_data_cfg & PEBS_DATACFG_LBRS)
|
||||
sz += x86_pmu.lbr_nr * sizeof(struct lbr_entry);
|
||||
if (pebs_data_cfg & (PEBS_DATACFG_METRICS | PEBS_DATACFG_CNTR)) {
|
||||
sz += sizeof(struct pebs_cntr_header);
|
||||
|
||||
/* Metrics base and Metrics Data */
|
||||
if (pebs_data_cfg & PEBS_DATACFG_METRICS)
|
||||
sz += 2 * sizeof(u64);
|
||||
|
||||
if (pebs_data_cfg & PEBS_DATACFG_CNTR) {
|
||||
sz += (hweight64(PEBS_DATACFG_CNTRS(pebs_data_cfg)) +
|
||||
hweight64(PEBS_DATACFG_FIX(pebs_data_cfg))) *
|
||||
sizeof(u64);
|
||||
}
|
||||
}
|
||||
|
||||
cpuc->pebs_record_size = sz;
|
||||
}
|
||||
|
||||
static void __intel_pmu_pebs_update_cfg(struct perf_event *event,
|
||||
int idx, u64 *pebs_data_cfg)
|
||||
{
|
||||
if (is_metric_event(event)) {
|
||||
*pebs_data_cfg |= PEBS_DATACFG_METRICS;
|
||||
return;
|
||||
}
|
||||
|
||||
*pebs_data_cfg |= PEBS_DATACFG_CNTR;
|
||||
|
||||
if (idx >= INTEL_PMC_IDX_FIXED)
|
||||
*pebs_data_cfg |= PEBS_DATACFG_FIX_BIT(idx - INTEL_PMC_IDX_FIXED);
|
||||
else
|
||||
*pebs_data_cfg |= PEBS_DATACFG_CNTR_BIT(idx);
|
||||
}
|
||||
|
||||
|
||||
static void intel_pmu_late_setup(void)
|
||||
{
|
||||
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
|
||||
struct perf_event *event;
|
||||
u64 pebs_data_cfg = 0;
|
||||
int i;
|
||||
|
||||
for (i = 0; i < cpuc->n_events; i++) {
|
||||
event = cpuc->event_list[i];
|
||||
if (!is_pebs_counter_event_group(event))
|
||||
continue;
|
||||
__intel_pmu_pebs_update_cfg(event, cpuc->assign[i], &pebs_data_cfg);
|
||||
}
|
||||
|
||||
if (pebs_data_cfg & ~cpuc->pebs_data_cfg)
|
||||
cpuc->pebs_data_cfg |= pebs_data_cfg | PEBS_UPDATE_DS_SW;
|
||||
}
|
||||
|
||||
#define PERF_PEBS_MEMINFO_TYPE (PERF_SAMPLE_ADDR | PERF_SAMPLE_DATA_SRC | \
|
||||
PERF_SAMPLE_PHYS_ADDR | \
|
||||
PERF_SAMPLE_WEIGHT_TYPE | \
|
||||
|
|
@ -1914,12 +1975,89 @@ static void adaptive_pebs_save_regs(struct pt_regs *regs,
|
|||
#endif
|
||||
}
|
||||
|
||||
static void intel_perf_event_update_pmc(struct perf_event *event, u64 pmc)
|
||||
{
|
||||
int shift = 64 - x86_pmu.cntval_bits;
|
||||
struct hw_perf_event *hwc;
|
||||
u64 delta, prev_pmc;
|
||||
|
||||
/*
|
||||
* A recorded counter may not have an assigned event in the
|
||||
* following cases. The value should be dropped.
|
||||
* - An event is deleted. There is still an active PEBS event.
|
||||
* The PEBS record doesn't shrink on pmu::del().
|
||||
* If the counter of the deleted event once occurred in a PEBS
|
||||
* record, PEBS still records the counter until the counter is
|
||||
* reassigned.
|
||||
* - An event is stopped for some reason, e.g., throttled.
|
||||
* During this period, another event is added and takes the
|
||||
* counter of the stopped event. The stopped event is assigned
|
||||
* to another new and uninitialized counter, since the
|
||||
* x86_pmu_start(RELOAD) is not invoked for a stopped event.
|
||||
* The PEBS__DATA_CFG is updated regardless of the event state.
|
||||
* The uninitialized counter can be recorded in a PEBS record.
|
||||
* But the cpuc->events[uninitialized_counter] is always NULL,
|
||||
* because the event is stopped. The uninitialized value is
|
||||
* safely dropped.
|
||||
*/
|
||||
if (!event)
|
||||
return;
|
||||
|
||||
hwc = &event->hw;
|
||||
prev_pmc = local64_read(&hwc->prev_count);
|
||||
|
||||
/* Only update the count when the PMU is disabled */
|
||||
WARN_ON(this_cpu_read(cpu_hw_events.enabled));
|
||||
local64_set(&hwc->prev_count, pmc);
|
||||
|
||||
delta = (pmc << shift) - (prev_pmc << shift);
|
||||
delta >>= shift;
|
||||
|
||||
local64_add(delta, &event->count);
|
||||
local64_sub(delta, &hwc->period_left);
|
||||
}
|
||||
|
||||
static inline void __setup_pebs_counter_group(struct cpu_hw_events *cpuc,
|
||||
struct perf_event *event,
|
||||
struct pebs_cntr_header *cntr,
|
||||
void *next_record)
|
||||
{
|
||||
int bit;
|
||||
|
||||
for_each_set_bit(bit, (unsigned long *)&cntr->cntr, INTEL_PMC_MAX_GENERIC) {
|
||||
intel_perf_event_update_pmc(cpuc->events[bit], *(u64 *)next_record);
|
||||
next_record += sizeof(u64);
|
||||
}
|
||||
|
||||
for_each_set_bit(bit, (unsigned long *)&cntr->fixed, INTEL_PMC_MAX_FIXED) {
|
||||
/* The slots event will be handled with perf_metric later */
|
||||
if ((cntr->metrics == INTEL_CNTR_METRICS) &&
|
||||
(bit + INTEL_PMC_IDX_FIXED == INTEL_PMC_IDX_FIXED_SLOTS)) {
|
||||
next_record += sizeof(u64);
|
||||
continue;
|
||||
}
|
||||
intel_perf_event_update_pmc(cpuc->events[bit + INTEL_PMC_IDX_FIXED],
|
||||
*(u64 *)next_record);
|
||||
next_record += sizeof(u64);
|
||||
}
|
||||
|
||||
/* HW will reload the value right after the overflow. */
|
||||
if (event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD)
|
||||
local64_set(&event->hw.prev_count, (u64)-event->hw.sample_period);
|
||||
|
||||
if (cntr->metrics == INTEL_CNTR_METRICS) {
|
||||
static_call(intel_pmu_update_topdown_event)
|
||||
(cpuc->events[INTEL_PMC_IDX_FIXED_SLOTS],
|
||||
(u64 *)next_record);
|
||||
next_record += 2 * sizeof(u64);
|
||||
}
|
||||
}
|
||||
|
||||
#define PEBS_LATENCY_MASK 0xffff
|
||||
|
||||
/*
|
||||
* With adaptive PEBS the layout depends on what fields are configured.
|
||||
*/
|
||||
|
||||
static void setup_pebs_adaptive_sample_data(struct perf_event *event,
|
||||
struct pt_regs *iregs, void *__pebs,
|
||||
struct perf_sample_data *data,
|
||||
|
|
@ -2049,6 +2187,28 @@ static void setup_pebs_adaptive_sample_data(struct perf_event *event,
|
|||
}
|
||||
}
|
||||
|
||||
if (format_group & (PEBS_DATACFG_CNTR | PEBS_DATACFG_METRICS)) {
|
||||
struct pebs_cntr_header *cntr = next_record;
|
||||
unsigned int nr;
|
||||
|
||||
next_record += sizeof(struct pebs_cntr_header);
|
||||
/*
|
||||
* The PEBS_DATA_CFG is a global register, which is the
|
||||
* superset configuration for all PEBS events.
|
||||
* For the PEBS record of non-sample-read group, ignore
|
||||
* the counter snapshot fields.
|
||||
*/
|
||||
if (is_pebs_counter_event_group(event)) {
|
||||
__setup_pebs_counter_group(cpuc, event, cntr, next_record);
|
||||
data->sample_flags |= PERF_SAMPLE_READ;
|
||||
}
|
||||
|
||||
nr = hweight32(cntr->cntr) + hweight32(cntr->fixed);
|
||||
if (cntr->metrics == INTEL_CNTR_METRICS)
|
||||
nr += 2;
|
||||
next_record += nr * sizeof(u64);
|
||||
}
|
||||
|
||||
WARN_ONCE(next_record != __pebs + basic->format_size,
|
||||
"PEBS record size %u, expected %llu, config %llx\n",
|
||||
basic->format_size,
|
||||
|
|
@ -2094,15 +2254,6 @@ get_next_pebs_record_by_bit(void *base, void *top, int bit)
|
|||
return NULL;
|
||||
}
|
||||
|
||||
void intel_pmu_auto_reload_read(struct perf_event *event)
|
||||
{
|
||||
WARN_ON(!(event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD));
|
||||
|
||||
perf_pmu_disable(event->pmu);
|
||||
intel_pmu_drain_pebs_buffer();
|
||||
perf_pmu_enable(event->pmu);
|
||||
}
|
||||
|
||||
/*
|
||||
* Special variant of intel_pmu_save_and_restart() for auto-reload.
|
||||
*/
|
||||
|
|
@ -2211,13 +2362,21 @@ __intel_pmu_pebs_last_event(struct perf_event *event,
|
|||
}
|
||||
|
||||
if (hwc->flags & PERF_X86_EVENT_AUTO_RELOAD) {
|
||||
/*
|
||||
* Now, auto-reload is only enabled in fixed period mode.
|
||||
* The reload value is always hwc->sample_period.
|
||||
* May need to change it, if auto-reload is enabled in
|
||||
* freq mode later.
|
||||
*/
|
||||
intel_pmu_save_and_restart_reload(event, count);
|
||||
if ((is_pebs_counter_event_group(event))) {
|
||||
/*
|
||||
* The value of each sample has been updated when setup
|
||||
* the corresponding sample data.
|
||||
*/
|
||||
perf_event_update_userpage(event);
|
||||
} else {
|
||||
/*
|
||||
* Now, auto-reload is only enabled in fixed period mode.
|
||||
* The reload value is always hwc->sample_period.
|
||||
* May need to change it, if auto-reload is enabled in
|
||||
* freq mode later.
|
||||
*/
|
||||
intel_pmu_save_and_restart_reload(event, count);
|
||||
}
|
||||
} else
|
||||
intel_pmu_save_and_restart(event);
|
||||
}
|
||||
|
|
@ -2552,6 +2711,11 @@ void __init intel_ds_init(void)
|
|||
break;
|
||||
|
||||
case 6:
|
||||
if (x86_pmu.intel_cap.pebs_baseline) {
|
||||
x86_pmu.large_pebs_flags |= PERF_SAMPLE_READ;
|
||||
x86_pmu.late_setup = intel_pmu_late_setup;
|
||||
}
|
||||
fallthrough;
|
||||
case 5:
|
||||
x86_pmu.pebs_ept = 1;
|
||||
fallthrough;
|
||||
|
|
@ -2576,7 +2740,7 @@ void __init intel_ds_init(void)
|
|||
PERF_SAMPLE_REGS_USER |
|
||||
PERF_SAMPLE_REGS_INTR);
|
||||
}
|
||||
pr_cont("PEBS fmt4%c%s, ", pebs_type, pebs_qual);
|
||||
pr_cont("PEBS fmt%d%c%s, ", format, pebs_type, pebs_qual);
|
||||
|
||||
/*
|
||||
* The PEBS-via-PT is not supported on hybrid platforms,
|
||||
|
|
|
|||
|
|
@ -422,11 +422,17 @@ static __always_inline bool lbr_is_reset_in_cstate(void *ctx)
|
|||
return !rdlbr_from(((struct x86_perf_task_context *)ctx)->tos, NULL);
|
||||
}
|
||||
|
||||
static inline bool has_lbr_callstack_users(void *ctx)
|
||||
{
|
||||
return task_context_opt(ctx)->lbr_callstack_users ||
|
||||
x86_pmu.lbr_callstack_users;
|
||||
}
|
||||
|
||||
static void __intel_pmu_lbr_restore(void *ctx)
|
||||
{
|
||||
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
|
||||
|
||||
if (task_context_opt(ctx)->lbr_callstack_users == 0 ||
|
||||
if (!has_lbr_callstack_users(ctx) ||
|
||||
task_context_opt(ctx)->lbr_stack_state == LBR_NONE) {
|
||||
intel_pmu_lbr_reset();
|
||||
return;
|
||||
|
|
@ -503,7 +509,7 @@ static void __intel_pmu_lbr_save(void *ctx)
|
|||
{
|
||||
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
|
||||
|
||||
if (task_context_opt(ctx)->lbr_callstack_users == 0) {
|
||||
if (!has_lbr_callstack_users(ctx)) {
|
||||
task_context_opt(ctx)->lbr_stack_state = LBR_NONE;
|
||||
return;
|
||||
}
|
||||
|
|
@ -516,32 +522,11 @@ static void __intel_pmu_lbr_save(void *ctx)
|
|||
cpuc->last_log_id = ++task_context_opt(ctx)->log_id;
|
||||
}
|
||||
|
||||
void intel_pmu_lbr_swap_task_ctx(struct perf_event_pmu_context *prev_epc,
|
||||
struct perf_event_pmu_context *next_epc)
|
||||
{
|
||||
void *prev_ctx_data, *next_ctx_data;
|
||||
|
||||
swap(prev_epc->task_ctx_data, next_epc->task_ctx_data);
|
||||
|
||||
/*
|
||||
* Architecture specific synchronization makes sense in case
|
||||
* both prev_epc->task_ctx_data and next_epc->task_ctx_data
|
||||
* pointers are allocated.
|
||||
*/
|
||||
|
||||
prev_ctx_data = next_epc->task_ctx_data;
|
||||
next_ctx_data = prev_epc->task_ctx_data;
|
||||
|
||||
if (!prev_ctx_data || !next_ctx_data)
|
||||
return;
|
||||
|
||||
swap(task_context_opt(prev_ctx_data)->lbr_callstack_users,
|
||||
task_context_opt(next_ctx_data)->lbr_callstack_users);
|
||||
}
|
||||
|
||||
void intel_pmu_lbr_sched_task(struct perf_event_pmu_context *pmu_ctx, bool sched_in)
|
||||
void intel_pmu_lbr_sched_task(struct perf_event_pmu_context *pmu_ctx,
|
||||
struct task_struct *task, bool sched_in)
|
||||
{
|
||||
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
|
||||
struct perf_ctx_data *ctx_data;
|
||||
void *task_ctx;
|
||||
|
||||
if (!cpuc->lbr_users)
|
||||
|
|
@ -552,14 +537,18 @@ void intel_pmu_lbr_sched_task(struct perf_event_pmu_context *pmu_ctx, bool sched
|
|||
* the task was scheduled out, restore the stack. Otherwise flush
|
||||
* the LBR stack.
|
||||
*/
|
||||
task_ctx = pmu_ctx ? pmu_ctx->task_ctx_data : NULL;
|
||||
rcu_read_lock();
|
||||
ctx_data = rcu_dereference(task->perf_ctx_data);
|
||||
task_ctx = ctx_data ? ctx_data->data : NULL;
|
||||
if (task_ctx) {
|
||||
if (sched_in)
|
||||
__intel_pmu_lbr_restore(task_ctx);
|
||||
else
|
||||
__intel_pmu_lbr_save(task_ctx);
|
||||
rcu_read_unlock();
|
||||
return;
|
||||
}
|
||||
rcu_read_unlock();
|
||||
|
||||
/*
|
||||
* Since a context switch can flip the address space and LBR entries
|
||||
|
|
@ -588,9 +577,19 @@ void intel_pmu_lbr_add(struct perf_event *event)
|
|||
|
||||
cpuc->br_sel = event->hw.branch_reg.reg;
|
||||
|
||||
if (branch_user_callstack(cpuc->br_sel) && event->pmu_ctx->task_ctx_data)
|
||||
task_context_opt(event->pmu_ctx->task_ctx_data)->lbr_callstack_users++;
|
||||
if (branch_user_callstack(cpuc->br_sel)) {
|
||||
if (event->attach_state & PERF_ATTACH_TASK) {
|
||||
struct task_struct *task = event->hw.target;
|
||||
struct perf_ctx_data *ctx_data;
|
||||
|
||||
rcu_read_lock();
|
||||
ctx_data = rcu_dereference(task->perf_ctx_data);
|
||||
if (ctx_data)
|
||||
task_context_opt(ctx_data->data)->lbr_callstack_users++;
|
||||
rcu_read_unlock();
|
||||
} else
|
||||
x86_pmu.lbr_callstack_users++;
|
||||
}
|
||||
/*
|
||||
* Request pmu::sched_task() callback, which will fire inside the
|
||||
* regular perf event scheduling, so that call will:
|
||||
|
|
@ -664,9 +663,19 @@ void intel_pmu_lbr_del(struct perf_event *event)
|
|||
if (!x86_pmu.lbr_nr)
|
||||
return;
|
||||
|
||||
if (branch_user_callstack(cpuc->br_sel) &&
|
||||
event->pmu_ctx->task_ctx_data)
|
||||
task_context_opt(event->pmu_ctx->task_ctx_data)->lbr_callstack_users--;
|
||||
if (branch_user_callstack(cpuc->br_sel)) {
|
||||
if (event->attach_state & PERF_ATTACH_TASK) {
|
||||
struct task_struct *task = event->hw.target;
|
||||
struct perf_ctx_data *ctx_data;
|
||||
|
||||
rcu_read_lock();
|
||||
ctx_data = rcu_dereference(task->perf_ctx_data);
|
||||
if (ctx_data)
|
||||
task_context_opt(ctx_data->data)->lbr_callstack_users--;
|
||||
rcu_read_unlock();
|
||||
} else
|
||||
x86_pmu.lbr_callstack_users--;
|
||||
}
|
||||
|
||||
if (event->hw.flags & PERF_X86_EVENT_LBR_SELECT)
|
||||
cpuc->lbr_select = 0;
|
||||
|
|
|
|||
|
|
@ -115,6 +115,11 @@ static inline bool is_branch_counters_group(struct perf_event *event)
|
|||
return event->group_leader->hw.flags & PERF_X86_EVENT_BRANCH_COUNTERS;
|
||||
}
|
||||
|
||||
static inline bool is_pebs_counter_event_group(struct perf_event *event)
|
||||
{
|
||||
return event->group_leader->hw.flags & PERF_X86_EVENT_PEBS_CNTR;
|
||||
}
|
||||
|
||||
struct amd_nb {
|
||||
int nb_id; /* NorthBridge id */
|
||||
int refcnt; /* reference count */
|
||||
|
|
@ -800,6 +805,7 @@ struct x86_pmu {
|
|||
u64 (*update)(struct perf_event *event);
|
||||
int (*hw_config)(struct perf_event *event);
|
||||
int (*schedule_events)(struct cpu_hw_events *cpuc, int n, int *assign);
|
||||
void (*late_setup)(void);
|
||||
unsigned eventsel;
|
||||
unsigned perfctr;
|
||||
unsigned fixedctr;
|
||||
|
|
@ -869,7 +875,7 @@ struct x86_pmu {
|
|||
|
||||
void (*check_microcode)(void);
|
||||
void (*sched_task)(struct perf_event_pmu_context *pmu_ctx,
|
||||
bool sched_in);
|
||||
struct task_struct *task, bool sched_in);
|
||||
|
||||
/*
|
||||
* Intel Arch Perfmon v2+
|
||||
|
|
@ -914,6 +920,7 @@ struct x86_pmu {
|
|||
const int *lbr_sel_map; /* lbr_select mappings */
|
||||
int *lbr_ctl_map; /* LBR_CTL mappings */
|
||||
};
|
||||
u64 lbr_callstack_users; /* lbr callstack system wide users */
|
||||
bool lbr_double_abort; /* duplicated lbr aborts */
|
||||
bool lbr_pt_coexist; /* (LBR|BTS) may coexist with PT */
|
||||
|
||||
|
|
@ -951,14 +958,6 @@ struct x86_pmu {
|
|||
*/
|
||||
int num_topdown_events;
|
||||
|
||||
/*
|
||||
* perf task context (i.e. struct perf_event_pmu_context::task_ctx_data)
|
||||
* switch helper to bridge calls from perf/core to perf/x86.
|
||||
* See struct pmu::swap_task_ctx() usage for examples;
|
||||
*/
|
||||
void (*swap_task_ctx)(struct perf_event_pmu_context *prev_epc,
|
||||
struct perf_event_pmu_context *next_epc);
|
||||
|
||||
/*
|
||||
* AMD bits
|
||||
*/
|
||||
|
|
@ -1107,6 +1106,8 @@ extern struct x86_pmu x86_pmu __read_mostly;
|
|||
|
||||
DECLARE_STATIC_CALL(x86_pmu_set_period, *x86_pmu.set_period);
|
||||
DECLARE_STATIC_CALL(x86_pmu_update, *x86_pmu.update);
|
||||
DECLARE_STATIC_CALL(x86_pmu_drain_pebs, *x86_pmu.drain_pebs);
|
||||
DECLARE_STATIC_CALL(x86_pmu_late_setup, *x86_pmu.late_setup);
|
||||
|
||||
static __always_inline struct x86_perf_task_context_opt *task_context_opt(void *ctx)
|
||||
{
|
||||
|
|
@ -1148,6 +1149,12 @@ extern u64 __read_mostly hw_cache_extra_regs
|
|||
|
||||
u64 x86_perf_event_update(struct perf_event *event);
|
||||
|
||||
static inline u64 intel_pmu_topdown_event_update(struct perf_event *event, u64 *val)
|
||||
{
|
||||
return x86_perf_event_update(event);
|
||||
}
|
||||
DECLARE_STATIC_CALL(intel_pmu_update_topdown_event, intel_pmu_topdown_event_update);
|
||||
|
||||
static inline unsigned int x86_pmu_config_addr(int index)
|
||||
{
|
||||
return x86_pmu.eventsel + (x86_pmu.addr_offset ?
|
||||
|
|
@@ -1394,7 +1401,8 @@ void amd_pmu_lbr_reset(void);
 void amd_pmu_lbr_read(void);
 void amd_pmu_lbr_add(struct perf_event *event);
 void amd_pmu_lbr_del(struct perf_event *event);
-void amd_pmu_lbr_sched_task(struct perf_event_pmu_context *pmu_ctx, bool sched_in);
+void amd_pmu_lbr_sched_task(struct perf_event_pmu_context *pmu_ctx,
+			    struct task_struct *task, bool sched_in);
 void amd_pmu_lbr_enable_all(void);
 void amd_pmu_lbr_disable_all(void);
 int amd_pmu_lbr_hw_config(struct perf_event *event);
@@ -1448,7 +1456,8 @@ static inline void amd_pmu_brs_del(struct perf_event *event)
 	perf_sched_cb_dec(event->pmu);
 }
 
-void amd_pmu_brs_sched_task(struct perf_event_pmu_context *pmu_ctx, bool sched_in);
+void amd_pmu_brs_sched_task(struct perf_event_pmu_context *pmu_ctx,
+			    struct task_struct *task, bool sched_in);
 #else
 static inline int amd_brs_init(void)
 {
@@ -1473,7 +1482,8 @@ static inline void amd_pmu_brs_del(struct perf_event *event)
 {
 }
 
-static inline void amd_pmu_brs_sched_task(struct perf_event_pmu_context *pmu_ctx, bool sched_in)
+static inline void amd_pmu_brs_sched_task(struct perf_event_pmu_context *pmu_ctx,
+					  struct task_struct *task, bool sched_in)
 {
 }
@@ -1643,7 +1653,7 @@ void intel_pmu_pebs_disable_all(void);
 
 void intel_pmu_pebs_sched_task(struct perf_event_pmu_context *pmu_ctx, bool sched_in);
 
-void intel_pmu_auto_reload_read(struct perf_event *event);
+void intel_pmu_drain_pebs_buffer(void);
 
 void intel_pmu_store_pebs_lbrs(struct lbr_entry *lbr);
@@ -1653,10 +1663,8 @@ void intel_pmu_lbr_save_brstack(struct perf_sample_data *data,
 				struct cpu_hw_events *cpuc,
 				struct perf_event *event);
 
-void intel_pmu_lbr_swap_task_ctx(struct perf_event_pmu_context *prev_epc,
-				 struct perf_event_pmu_context *next_epc);
-
-void intel_pmu_lbr_sched_task(struct perf_event_pmu_context *pmu_ctx, bool sched_in);
+void intel_pmu_lbr_sched_task(struct perf_event_pmu_context *pmu_ctx,
+			      struct task_struct *task, bool sched_in);
 
 u64 lbr_from_signext_quirk_wr(u64 val);
@@ -9,7 +9,7 @@ PERF_ARCH(PEBS_LD_HSW,	0x00008) /* haswell style datala, load */
 PERF_ARCH(PEBS_NA_HSW,	0x00010) /* haswell style datala, unknown */
 PERF_ARCH(EXCL,		0x00020) /* HT exclusivity on counter */
 PERF_ARCH(DYNAMIC,	0x00040) /* dynamic alloc'd constraint */
-			/* 0x00080	*/
+PERF_ARCH(PEBS_CNTR,	0x00080) /* PEBS counters snapshot */
 PERF_ARCH(EXCL_ACCT,	0x00100) /* accounted EXCL event */
 PERF_ARCH(AUTO_RELOAD,	0x00200) /* use PEBS auto-reload */
 PERF_ARCH(LARGE_PEBS,	0x00400) /* use large PEBS */
@@ -64,7 +64,8 @@ union ibs_op_ctl {
 			opmaxcnt_ext:7,	/* 20-26: upper 7 bits of periodic op maximum count */
 			reserved0:5,	/* 27-31: reserved */
 			opcurcnt:27,	/* 32-58: periodic op counter current count */
-			reserved1:5;	/* 59-63: reserved */
+			ldlat_thrsh:4,	/* 59-62: Load Latency threshold */
+			ldlat_en:1;	/* 63: Load Latency enabled */
 	};
 };
@@ -141,6 +141,12 @@
 #define PEBS_DATACFG_XMMS	BIT_ULL(2)
 #define PEBS_DATACFG_LBRS	BIT_ULL(3)
 #define PEBS_DATACFG_LBR_SHIFT	24
+#define PEBS_DATACFG_CNTR	BIT_ULL(4)
+#define PEBS_DATACFG_CNTR_SHIFT	32
+#define PEBS_DATACFG_CNTR_MASK	GENMASK_ULL(15, 0)
+#define PEBS_DATACFG_FIX_SHIFT	48
+#define PEBS_DATACFG_FIX_MASK	GENMASK_ULL(7, 0)
+#define PEBS_DATACFG_METRICS	BIT_ULL(5)
 
 /* Steal the highest bit of pebs_data_cfg for SW usage */
 #define PEBS_UPDATE_DS_SW	BIT_ULL(63)
@@ -482,6 +488,15 @@ struct pebs_xmm {
 	u64 xmm[16*2];	/* two entries for each register */
 };
 
+struct pebs_cntr_header {
+	u32 cntr;
+	u32 fixed;
+	u32 metrics;
+	u32 reserved;
+};
+
+#define INTEL_CNTR_METRICS	0x3
+
 /*
  * AMD Extended Performance Monitoring and Debug cpuid feature detection
  */
@@ -509,6 +524,8 @@ struct pebs_xmm {
 #define IBS_CAPS_FETCHCTLEXTD		(1U<<9)
 #define IBS_CAPS_OPDATA4		(1U<<10)
 #define IBS_CAPS_ZEN4			(1U<<11)
+#define IBS_CAPS_OPLDLAT		(1U<<12)
+#define IBS_CAPS_OPDTLBPGSIZE		(1U<<19)
 
 #define IBS_CAPS_DEFAULT		(IBS_CAPS_AVAIL		\
 					 | IBS_CAPS_FETCHSAM	\
@@ -534,8 +551,11 @@ struct pebs_xmm {
  * The lower 7 bits of the current count are random bits
  * preloaded by hardware and ignored in software
  */
+#define IBS_OP_LDLAT_EN		(1ULL<<63)
+#define IBS_OP_LDLAT_THRSH	(0xFULL<<59)
 #define IBS_OP_CUR_CNT		(0xFFF80ULL<<32)
 #define IBS_OP_CUR_CNT_RAND	(0x0007FULL<<32)
+#define IBS_OP_CUR_CNT_EXT_MASK	(0x7FULL<<52)
 #define IBS_OP_CNT_CTL		(1ULL<<19)
 #define IBS_OP_VAL		(1ULL<<18)
 #define IBS_OP_ENABLE		(1ULL<<17)
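For reference, a sketch of how the new load-latency bits pack into an IBS op control value, using only the masks added above; the helper name and the threshold variable are illustrative, and the field semantics are defined by the IBS specification rather than by this snippet:

	/* Illustrative bit packing only; not a function from this series. */
	static inline u64 ibs_op_ctl_set_ldlat(u64 op_ctl, u64 thrsh)
	{
		/* Clear, then set, the bits covered by the new masks. */
		op_ctl &= ~(IBS_OP_LDLAT_EN | IBS_OP_LDLAT_THRSH);
		op_ctl |= IBS_OP_LDLAT_EN;			/* bit 63 */
		op_ctl |= (thrsh << 59) & IBS_OP_LDLAT_THRSH;	/* bits 59-62 */
		return op_ctl;
	}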
@@ -357,19 +357,23 @@ void *arch_uprobe_trampoline(unsigned long *psize)
 	return &insn;
 }
 
-static unsigned long trampoline_check_ip(void)
+static unsigned long trampoline_check_ip(unsigned long tramp)
 {
-	unsigned long tramp = uprobe_get_trampoline_vaddr();
-
 	return tramp + (uretprobe_syscall_check - uretprobe_trampoline_entry);
 }
 
 SYSCALL_DEFINE0(uretprobe)
 {
 	struct pt_regs *regs = task_pt_regs(current);
-	unsigned long err, ip, sp, r11_cx_ax[3];
+	unsigned long err, ip, sp, r11_cx_ax[3], tramp;
+
+	/* If there's no trampoline, we are called from wrong place. */
+	tramp = uprobe_get_trampoline_vaddr();
+	if (unlikely(tramp == UPROBE_NO_TRAMPOLINE_VADDR))
+		goto sigill;
 
-	if (regs->ip != trampoline_check_ip())
+	/* Make sure the ip matches the only allowed sys_uretprobe caller. */
+	if (unlikely(regs->ip != trampoline_check_ip(tramp)))
 		goto sigill;
 
 	err = copy_from_user(r11_cx_ax, (void __user *)regs->sp, sizeof(r11_cx_ax));
@@ -15,6 +15,7 @@
 #include <linux/radix-tree.h>
 #include <linux/gfp.h>
 #include <linux/percpu.h>
+#include <linux/cleanup.h>
 
 struct idr {
 	struct radix_tree_root	idr_rt;
@@ -124,6 +125,22 @@ void *idr_get_next_ul(struct idr *, unsigned long *nextid);
 void *idr_replace(struct idr *, void *, unsigned long id);
 void idr_destroy(struct idr *);
 
+struct __class_idr {
+	struct idr *idr;
+	int id;
+};
+
+#define idr_null	((struct __class_idr){ NULL, -1 })
+#define take_idr_id(id)	__get_and_null(id, idr_null)
+
+DEFINE_CLASS(idr_alloc, struct __class_idr,
+	     if (_T.id >= 0) idr_remove(_T.idr, _T.id),
+	     ((struct __class_idr){
+		.idr = idr,
+		.id  = idr_alloc(idr, ptr, start, end, gfp),
+	     }),
+	     struct idr *idr, void *ptr, int start, int end, gfp_t gfp);
+
 /**
  * idr_init_base() - Initialise an IDR.
  * @idr: IDR handle.
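As an illustration of the scoped-allocation class added above, a hypothetical caller (the function and struct names below are made up, not from this series) could rely on the automatic idr_remove() on early returns and claim the id with take_idr_id() only on success:

	/* Hypothetical caller; sketches the intended scoped idr_alloc pattern. */
	static int register_thing(struct idr *ids, struct thing *t)
	{
		/* idr_remove() runs automatically if we leave scope
		 * without claiming the id via take_idr_id(). */
		CLASS(idr_alloc, id)(ids, t, 1, 0, GFP_KERNEL);

		if (id.id < 0)
			return id.id;

		if (thing_setup(t))		/* hypothetical failure path */
			return -EINVAL;		/* id is removed automatically */

		t->id = id.id;
		take_idr_id(id);		/* success: disarm the cleanup */
		return 0;
	}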
@@ -17,7 +17,6 @@
 void lockup_detector_init(void);
 void lockup_detector_retry_init(void);
 void lockup_detector_soft_poweroff(void);
-void lockup_detector_cleanup(void);
 
 extern int watchdog_user_enabled;
 extern int watchdog_thresh;
@@ -37,7 +36,6 @@ extern int sysctl_hardlockup_all_cpu_backtrace;
 static inline void lockup_detector_init(void) { }
 static inline void lockup_detector_retry_init(void) { }
 static inline void lockup_detector_soft_poweroff(void) { }
-static inline void lockup_detector_cleanup(void) { }
 #endif /* !CONFIG_LOCKUP_DETECTOR */
 
 #ifdef CONFIG_SOFTLOCKUP_DETECTOR
@@ -104,12 +102,10 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs);
 #if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
 extern void hardlockup_detector_perf_stop(void);
 extern void hardlockup_detector_perf_restart(void);
-extern void hardlockup_detector_perf_cleanup(void);
 extern void hardlockup_config_perf_event(const char *str);
 #else
 static inline void hardlockup_detector_perf_stop(void) { }
 static inline void hardlockup_detector_perf_restart(void) { }
-static inline void hardlockup_detector_perf_cleanup(void) { }
 static inline void hardlockup_config_perf_event(const char *str) { }
 #endif
@@ -8,6 +8,7 @@
 #include <linux/wait.h>
 #include <linux/rcu_sync.h>
 #include <linux/lockdep.h>
+#include <linux/cleanup.h>
 
 struct percpu_rw_semaphore {
 	struct rcu_sync		rss;
@@ -125,6 +126,13 @@ extern bool percpu_is_read_locked(struct percpu_rw_semaphore *);
 extern void percpu_down_write(struct percpu_rw_semaphore *);
 extern void percpu_up_write(struct percpu_rw_semaphore *);
 
+DEFINE_GUARD(percpu_read, struct percpu_rw_semaphore *,
+	     percpu_down_read(_T), percpu_up_read(_T))
+DEFINE_GUARD_COND(percpu_read, _try, percpu_down_read_trylock(_T))
+
+DEFINE_GUARD(percpu_write, struct percpu_rw_semaphore *,
+	     percpu_down_write(_T), percpu_up_write(_T))
+
 static inline bool percpu_is_write_locked(struct percpu_rw_semaphore *sem)
 {
 	return atomic_read(&sem->block);
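With the guards above in place, callers can take the per-CPU rwsem for the duration of a scope and have it released automatically. A minimal sketch, with a made-up protected structure rather than anything from this series:

	/* Hypothetical users of the new percpu_read/percpu_write guards. */
	static int read_config(struct percpu_rw_semaphore *sem,
			       struct my_cfg *cfg, int *out)
	{
		guard(percpu_read)(sem);	/* percpu_down_read(), auto up_read() */
		*out = cfg->value;
		return 0;
	}

	static void update_config(struct percpu_rw_semaphore *sem,
				  struct my_cfg *cfg, int val)
	{
		scoped_guard(percpu_write, sem)	/* writer side, same automatic release */
			cfg->value = val;
	}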
@@ -343,8 +343,7 @@ struct pmu {
 	 */
 	unsigned int			scope;
 
-	int __percpu			*pmu_disable_count;
-	struct perf_cpu_pmu_context __percpu *cpu_pmu_context;
+	struct perf_cpu_pmu_context * __percpu *cpu_pmu_context;
 	atomic_t			exclusive_cnt; /* < 0: cpu; > 0: tsk */
 	int				task_ctx_nr;
 	int				hrtimer_interval_ms;
@@ -495,23 +494,13 @@ struct pmu {
 	 * context-switches callback
 	 */
 	void (*sched_task)		(struct perf_event_pmu_context *pmu_ctx,
-					 bool sched_in);
+					 struct task_struct *task, bool sched_in);
 
 	/*
 	 * Kmem cache of PMU specific data
 	 */
 	struct kmem_cache		*task_ctx_cache;
 
-	/*
-	 * PMU specific parts of task perf event context (i.e. ctx->task_ctx_data)
-	 * can be synchronized using this function. See Intel LBR callstack support
-	 * implementation and Perf core context switch handling callbacks for usage
-	 * examples.
-	 */
-	void (*swap_task_ctx)		(struct perf_event_pmu_context *prev_epc,
-					 struct perf_event_pmu_context *next_epc);
-					/* optional */
-
 	/*
 	 * Set up pmu-private data structures for an AUX area
 	 */
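For orientation, a driver-side callback matching the new sched_task prototype would look roughly like the sketch below; the PMU name and the saved state are hypothetical, the point is only that the incoming or outgoing task is now passed in explicitly:

	/* Hypothetical PMU callback using the new sched_task() prototype. */
	static void example_pmu_sched_task(struct perf_event_pmu_context *pmu_ctx,
					   struct task_struct *task, bool sched_in)
	{
		if (!task)
			return;

		if (sched_in) {
			/* restore per-task PMU state for @task */
		} else {
			/* save per-task PMU state for @task */
		}
	}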
@@ -673,13 +662,16 @@ struct swevent_hlist {
 	struct rcu_head		rcu_head;
 };
 
-#define PERF_ATTACH_CONTEXT	0x01
-#define PERF_ATTACH_GROUP	0x02
-#define PERF_ATTACH_TASK	0x04
-#define PERF_ATTACH_TASK_DATA	0x08
-#define PERF_ATTACH_ITRACE	0x10
-#define PERF_ATTACH_SCHED_CB	0x20
-#define PERF_ATTACH_CHILD	0x40
+#define PERF_ATTACH_CONTEXT	0x0001
+#define PERF_ATTACH_GROUP	0x0002
+#define PERF_ATTACH_TASK	0x0004
+#define PERF_ATTACH_TASK_DATA	0x0008
+#define PERF_ATTACH_GLOBAL_DATA	0x0010
+#define PERF_ATTACH_SCHED_CB	0x0020
+#define PERF_ATTACH_CHILD	0x0040
+#define PERF_ATTACH_EXCLUSIVE	0x0080
+#define PERF_ATTACH_CALLCHAIN	0x0100
+#define PERF_ATTACH_ITRACE	0x0200
 
 struct bpf_prog;
 struct perf_cgroup;
@@ -921,7 +913,7 @@ struct perf_event_pmu_context {
 	struct list_head		pinned_active;
 	struct list_head		flexible_active;
 
-	/* Used to avoid freeing per-cpu perf_event_pmu_context */
+	/* Used to identify the per-cpu perf_event_pmu_context */
 	unsigned int			embedded : 1;
 
 	unsigned int			nr_events;
@@ -931,7 +923,6 @@ struct perf_event_pmu_context {
 	atomic_t			refcount; /* event <-> epc */
 	struct rcu_head			rcu_head;
 
-	void				*task_ctx_data; /* pmu specific data */
 	/*
 	 * Set when one or more (plausibly active) event can't be scheduled
 	 * due to pmu overcommit or pmu constraints, except tolerant to
@@ -979,7 +970,6 @@ struct perf_event_context {
 	int				nr_user;
 	int				is_active;
 
-	int				nr_task_data;
 	int				nr_stat;
 	int				nr_freq;
 	int				rotate_disable;
@@ -1020,6 +1010,41 @@ struct perf_event_context {
 	local_t		nr_no_switch_fast;
 };
 
+/**
+ * struct perf_ctx_data - PMU specific data for a task
+ * @rcu_head:  To avoid the race on free PMU specific data
+ * @refcount:  To track users
+ * @global:    To track system-wide users
+ * @ctx_cache: Kmem cache of PMU specific data
+ * @data:      PMU specific data
+ *
+ * Currently, the struct is only used in Intel LBR call stack mode to
+ * save/restore the call stack of a task on context switches.
+ *
+ * The rcu_head is used to prevent the race on free the data.
+ * The data only be allocated when Intel LBR call stack mode is enabled.
+ * The data will be freed when the mode is disabled.
+ * The content of the data will only be accessed in context switch, which
+ * should be protected by rcu_read_lock().
+ *
+ * Because of the alignment requirement of Intel Arch LBR, the Kmem cache
+ * is used to allocate the PMU specific data. The ctx_cache is to track
+ * the Kmem cache.
+ *
+ * Careful: Struct perf_ctx_data is added as a pointer in struct task_struct.
+ * When system-wide Intel LBR call stack mode is enabled, a buffer with
+ * constant size will be allocated for each task.
+ * Also, system memory consumption can further grow when the size of
+ * struct perf_ctx_data enlarges.
+ */
+struct perf_ctx_data {
+	struct rcu_head			rcu_head;
+	refcount_t			refcount;
+	int				global;
+	struct kmem_cache		*ctx_cache;
+	void				*data;
+};
+
 struct perf_cpu_pmu_context {
 	struct perf_event_pmu_context	epc;
 	struct perf_event_pmu_context	*task_epc;
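The LBR hunk at the top of this diff shows the intended consumer pattern for this structure; in sketch form, a PMU touching a task's PMU-specific payload would do something like the following (hypothetical helper, not from this series), keeping all accesses inside an RCU read-side section as the kerneldoc above requires:

	/* Hypothetical helper; mirrors the RCU access pattern in the LBR hunk. */
	static void task_pmu_data_touch(struct task_struct *task)
	{
		struct perf_ctx_data *ctx_data;

		rcu_read_lock();
		ctx_data = rcu_dereference(task->perf_ctx_data);
		if (ctx_data) {
			/* The PMU-specific payload is ctx_data->data and must
			 * only be used inside this RCU read-side section. */
		}
		rcu_read_unlock();
	}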
@@ -1029,6 +1054,7 @@ struct perf_cpu_pmu_context {
 
 	int				active_oncpu;
 	int				exclusive;
+	int				pmu_disable_count;
 
 	raw_spinlock_t			hrtimer_lock;
 	struct hrtimer			hrtimer;
@@ -1062,7 +1088,13 @@ struct perf_output_handle {
 	struct perf_buffer		*rb;
 	unsigned long			wakeup;
 	unsigned long			size;
-	u64				aux_flags;
+	union {
+		u64			flags;		/* perf_output*() */
+		u64			aux_flags;	/* perf_aux_output*() */
+		struct {
+			u64		skip_read : 1;
+		};
+	};
 	union {
 		void			*addr;
 		unsigned long		head;
@@ -1339,6 +1371,9 @@ static inline void perf_sample_save_brstack(struct perf_sample_data *data,
 
 	if (branch_sample_hw_index(event))
 		size += sizeof(u64);
+
+	brs->nr = min_t(u16, event->attr.sample_max_stack, brs->nr);
+
 	size += brs->nr * sizeof(struct perf_branch_entry);
 
 	/*
@@ -1646,19 +1681,10 @@ static inline int perf_callchain_store(struct perf_callchain_entry_ctx *ctx, u64
 }
 
 extern int sysctl_perf_event_paranoid;
-extern int sysctl_perf_event_mlock;
-extern int sysctl_perf_event_sample_rate;
-extern int sysctl_perf_cpu_time_max_percent;
 
 extern void perf_sample_event_took(u64 sample_len_ns);
 
-int perf_event_max_sample_rate_handler(const struct ctl_table *table, int write,
-		void *buffer, size_t *lenp, loff_t *ppos);
-int perf_cpu_time_max_percent_handler(const struct ctl_table *table, int write,
-		void *buffer, size_t *lenp, loff_t *ppos);
-int perf_event_max_stack_handler(const struct ctl_table *table, int write,
-		void *buffer, size_t *lenp, loff_t *ppos);
-
 /* Access to perf_event_open(2) syscall. */
 #define PERF_SECURITY_OPEN		0
@@ -65,6 +65,7 @@ struct mempolicy;
 struct nameidata;
 struct nsproxy;
 struct perf_event_context;
+struct perf_ctx_data;
 struct pid_namespace;
 struct pipe_inode_info;
 struct rcu_node;
@@ -1316,6 +1317,7 @@ struct task_struct {
 	struct perf_event_context	*perf_event_ctxp;
 	struct mutex			perf_event_mutex;
 	struct list_head		perf_event_list;
+	struct perf_ctx_data __rcu	*perf_ctx_data;
 #endif
 #ifdef CONFIG_DEBUG_PREEMPT
 	unsigned long			preempt_disable_ip;
@@ -39,6 +39,8 @@ struct page;
 
 #define MAX_URETPROBE_DEPTH		64
 
+#define UPROBE_NO_TRAMPOLINE_VADDR	(~0UL)
+
 struct uprobe_consumer {
 	/*
 	 * handler() can return UPROBE_HANDLER_REMOVE to signal the need to
@@ -143,6 +145,7 @@ struct uprobe_task {
 
 	struct uprobe			*active_uprobe;
 	unsigned long			xol_vaddr;
+	bool				signal_denied;
 
 	struct arch_uprobe		*auprobe;
 };
@@ -385,6 +385,8 @@ enum perf_event_read_format {
  *
  * @sample_max_stack: Max number of frame pointers in a callchain,
  *		      should be < /proc/sys/kernel/perf_event_max_stack
+ *		      Max number of entries of branch stack
+ *		      should be < hardware limit
  */
 struct perf_event_attr {
@@ -1453,11 +1453,6 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen,
 
 out:
 	cpus_write_unlock();
-	/*
-	 * Do post unplug cleanup. This is still protected against
-	 * concurrent CPU hotplug via cpu_add_remove_lock.
-	 */
-	lockup_detector_cleanup();
 	arch_smt_update();
 	return ret;
 }
@@ -22,6 +22,7 @@ struct callchain_cpus_entries {
 
 int sysctl_perf_event_max_stack __read_mostly = PERF_MAX_STACK_DEPTH;
 int sysctl_perf_event_max_contexts_per_stack __read_mostly = PERF_MAX_CONTEXTS_PER_STACK;
+static const int six_hundred_forty_kb = 640 * 1024;
 
 static inline size_t perf_callchain_entry__sizeof(void)
 {
@@ -266,12 +267,8 @@ exit_put:
 	return entry;
 }
 
-/*
- * Used for sysctl_perf_event_max_stack and
- * sysctl_perf_event_max_contexts_per_stack.
- */
-int perf_event_max_stack_handler(const struct ctl_table *table, int write,
-				 void *buffer, size_t *lenp, loff_t *ppos)
+static int perf_event_max_stack_handler(const struct ctl_table *table, int write,
+					void *buffer, size_t *lenp, loff_t *ppos)
 {
 	int *value = table->data;
 	int new_value = *value, ret;
@@ -292,3 +289,32 @@ int perf_event_max_stack_handler(const struct ctl_table *table, int write,
 
 	return ret;
 }
+
+static const struct ctl_table callchain_sysctl_table[] = {
+	{
+		.procname	= "perf_event_max_stack",
+		.data		= &sysctl_perf_event_max_stack,
+		.maxlen		= sizeof(sysctl_perf_event_max_stack),
+		.mode		= 0644,
+		.proc_handler	= perf_event_max_stack_handler,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= (void *)&six_hundred_forty_kb,
+	},
+	{
+		.procname	= "perf_event_max_contexts_per_stack",
+		.data		= &sysctl_perf_event_max_contexts_per_stack,
+		.maxlen		= sizeof(sysctl_perf_event_max_contexts_per_stack),
+		.mode		= 0644,
+		.proc_handler	= perf_event_max_stack_handler,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_ONE_THOUSAND,
+	},
+};
+
+static int __init init_callchain_sysctls(void)
+{
+	register_sysctl_init("kernel", callchain_sysctl_table);
+	return 0;
+}
+core_initcall(init_callchain_sysctls);

[kernel/events/core.c: 1076 changed lines, diff suppressed because it is too large]
@@ -950,9 +950,10 @@ static int hw_breakpoint_event_init(struct perf_event *bp)
 		return -ENOENT;
 
 	/*
-	 * no branch sampling for breakpoint events
+	 * Check if breakpoint type is supported before proceeding.
+	 * Also, no branch sampling for breakpoint events.
 	 */
-	if (has_branch_stack(bp))
+	if (!hw_breakpoint_slots_cached(find_slot_idx(bp->attr.bp_type)) || has_branch_stack(bp))
 		return -EOPNOTSUPP;
 
 	err = register_perf_hw_breakpoint(bp);
@@ -19,7 +19,7 @@
 
 static void perf_output_wakeup(struct perf_output_handle *handle)
 {
-	atomic_set(&handle->rb->poll, EPOLLIN);
+	atomic_set(&handle->rb->poll, EPOLLIN | EPOLLRDNORM);
 
 	handle->event->pending_wakeup = 1;
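Since the wakeup path now signals both EPOLLIN and EPOLLRDNORM, user space can wait for either bit. An illustrative user-space poller (the fd setup is omitted and the function is hypothetical):

	/* Userspace sketch: wait until the perf ring buffer is readable. */
	#include <poll.h>
	#include <stdio.h>

	static void wait_for_samples(int perf_fd)
	{
		struct pollfd pfd = {
			.fd     = perf_fd,
			.events = POLLIN | POLLRDNORM,
		};

		if (poll(&pfd, 1, -1) > 0 &&
		    (pfd.revents & (POLLIN | POLLRDNORM)))
			printf("ring buffer readable\n");
	}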
@@ -185,6 +185,7 @@ __perf_output_begin(struct perf_output_handle *handle,
 
 	handle->rb    = rb;
 	handle->event = event;
+	handle->flags = 0;
 
 	have_lost = local_read(&rb->lost);
 	if (unlikely(have_lost)) {
@@ -2169,8 +2169,8 @@ void uprobe_copy_process(struct task_struct *t, unsigned long flags)
  */
 unsigned long uprobe_get_trampoline_vaddr(void)
 {
+	unsigned long trampoline_vaddr = UPROBE_NO_TRAMPOLINE_VADDR;
 	struct xol_area *area;
-	unsigned long trampoline_vaddr = -1;
 
 	/* Pairs with xol_add_vma() smp_store_release() */
 	area = READ_ONCE(current->mm->uprobes_state.xol_area); /* ^^^ */
@@ -2311,9 +2311,8 @@ bool uprobe_deny_signal(void)
 	WARN_ON_ONCE(utask->state != UTASK_SSTEP);
 
 	if (task_sigpending(t)) {
-		spin_lock_irq(&t->sighand->siglock);
+		utask->signal_denied = true;
 		clear_tsk_thread_flag(t, TIF_SIGPENDING);
-		spin_unlock_irq(&t->sighand->siglock);
 
 		if (__fatal_signal_pending(t) || arch_uprobe_xol_was_trapped(t)) {
 			utask->state = UTASK_SSTEP_TRAPPED;
@@ -2746,9 +2745,10 @@ static void handle_singlestep(struct uprobe_task *utask, struct pt_regs *regs)
 	utask->state = UTASK_RUNNING;
 	xol_free_insn_slot(utask);
 
-	spin_lock_irq(&current->sighand->siglock);
-	recalc_sigpending(); /* see uprobe_deny_signal() */
-	spin_unlock_irq(&current->sighand->siglock);
+	if (utask->signal_denied) {
+		set_thread_flag(TIF_SIGPENDING);
+		utask->signal_denied = false;
+	}
 
 	if (unlikely(err)) {
 		uprobe_warn(current, "execute the probed insn, sending SIGILL.");
@@ -54,7 +54,6 @@
 #include <linux/acpi.h>
 #include <linux/reboot.h>
 #include <linux/ftrace.h>
-#include <linux/perf_event.h>
 #include <linux/oom.h>
 #include <linux/kmod.h>
 #include <linux/capability.h>
@@ -91,12 +90,6 @@ EXPORT_SYMBOL_GPL(sysctl_long_vals);
 #if defined(CONFIG_SYSCTL)
 
 /* Constants used for minimum and maximum */
-
-#ifdef CONFIG_PERF_EVENTS
-static const int six_hundred_forty_kb = 640 * 1024;
-#endif
-
 
 static const int ngroups_max = NGROUPS_MAX;
 static const int cap_last_cap = CAP_LAST_CAP;
@@ -1932,63 +1925,6 @@ static const struct ctl_table kern_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec,
 	},
 #endif
-#ifdef CONFIG_PERF_EVENTS
-	/*
-	 * User-space scripts rely on the existence of this file
-	 * as a feature check for perf_events being enabled.
-	 *
-	 * So it's an ABI, do not remove!
-	 */
-	{
-		.procname	= "perf_event_paranoid",
-		.data		= &sysctl_perf_event_paranoid,
-		.maxlen		= sizeof(sysctl_perf_event_paranoid),
-		.mode		= 0644,
-		.proc_handler	= proc_dointvec,
-	},
-	{
-		.procname	= "perf_event_mlock_kb",
-		.data		= &sysctl_perf_event_mlock,
-		.maxlen		= sizeof(sysctl_perf_event_mlock),
-		.mode		= 0644,
-		.proc_handler	= proc_dointvec,
-	},
-	{
-		.procname	= "perf_event_max_sample_rate",
-		.data		= &sysctl_perf_event_sample_rate,
-		.maxlen		= sizeof(sysctl_perf_event_sample_rate),
-		.mode		= 0644,
-		.proc_handler	= perf_event_max_sample_rate_handler,
-		.extra1		= SYSCTL_ONE,
-	},
-	{
-		.procname	= "perf_cpu_time_max_percent",
-		.data		= &sysctl_perf_cpu_time_max_percent,
-		.maxlen		= sizeof(sysctl_perf_cpu_time_max_percent),
-		.mode		= 0644,
-		.proc_handler	= perf_cpu_time_max_percent_handler,
-		.extra1		= SYSCTL_ZERO,
-		.extra2		= SYSCTL_ONE_HUNDRED,
-	},
-	{
-		.procname	= "perf_event_max_stack",
-		.data		= &sysctl_perf_event_max_stack,
-		.maxlen		= sizeof(sysctl_perf_event_max_stack),
-		.mode		= 0644,
-		.proc_handler	= perf_event_max_stack_handler,
-		.extra1		= SYSCTL_ZERO,
-		.extra2		= (void *)&six_hundred_forty_kb,
-	},
-	{
-		.procname	= "perf_event_max_contexts_per_stack",
-		.data		= &sysctl_perf_event_max_contexts_per_stack,
-		.maxlen		= sizeof(sysctl_perf_event_max_contexts_per_stack),
-		.mode		= 0644,
-		.proc_handler	= perf_event_max_stack_handler,
-		.extra1		= SYSCTL_ZERO,
-		.extra2		= SYSCTL_ONE_THOUSAND,
-	},
-#endif
 	{
 		.procname	= "panic_on_warn",
@@ -347,8 +347,6 @@ static int __init watchdog_thresh_setup(char *str)
 }
 __setup("watchdog_thresh=", watchdog_thresh_setup);
 
-static void __lockup_detector_cleanup(void);
-
 #ifdef CONFIG_SOFTLOCKUP_DETECTOR_INTR_STORM
 enum stats_per_group {
 	STATS_SYSTEM,
@@ -886,11 +884,6 @@ static void __lockup_detector_reconfigure(void)
 
 	watchdog_hardlockup_start();
 	cpus_read_unlock();
-	/*
-	 * Must be called outside the cpus locked section to prevent
-	 * recursive locking in the perf code.
-	 */
-	__lockup_detector_cleanup();
 }
 
 void lockup_detector_reconfigure(void)
@@ -940,24 +933,6 @@ static inline void lockup_detector_setup(void)
 }
 #endif /* !CONFIG_SOFTLOCKUP_DETECTOR */
 
-static void __lockup_detector_cleanup(void)
-{
-	lockdep_assert_held(&watchdog_mutex);
-	hardlockup_detector_perf_cleanup();
-}
-
-/**
- * lockup_detector_cleanup - Cleanup after cpu hotplug or sysctl changes
- *
- * Caller must not hold the cpu hotplug rwsem.
- */
-void lockup_detector_cleanup(void)
-{
-	mutex_lock(&watchdog_mutex);
-	__lockup_detector_cleanup();
-	mutex_unlock(&watchdog_mutex);
-}
-
 /**
  * lockup_detector_soft_poweroff - Interface to stop lockup detector(s)
  *
@@ -21,8 +21,6 @@
 #include <linux/perf_event.h>
 
 static DEFINE_PER_CPU(struct perf_event *, watchdog_ev);
-static DEFINE_PER_CPU(struct perf_event *, dead_event);
-static struct cpumask dead_events_mask;
 
 static atomic_t watchdog_cpus = ATOMIC_INIT(0);
@@ -146,6 +144,7 @@ static int hardlockup_detector_event_create(void)
 			 PTR_ERR(evt));
 		return PTR_ERR(evt);
 	}
+	WARN_ONCE(this_cpu_read(watchdog_ev), "unexpected watchdog_ev leak");
 	this_cpu_write(watchdog_ev, evt);
 	return 0;
 }
@@ -181,36 +180,12 @@ void watchdog_hardlockup_disable(unsigned int cpu)
 
 	if (event) {
 		perf_event_disable(event);
+		perf_event_release_kernel(event);
 		this_cpu_write(watchdog_ev, NULL);
-		this_cpu_write(dead_event, event);
-		cpumask_set_cpu(smp_processor_id(), &dead_events_mask);
 		atomic_dec(&watchdog_cpus);
 	}
 }
 
-/**
- * hardlockup_detector_perf_cleanup - Cleanup disabled events and destroy them
- *
- * Called from lockup_detector_cleanup(). Serialized by the caller.
- */
-void hardlockup_detector_perf_cleanup(void)
-{
-	int cpu;
-
-	for_each_cpu(cpu, &dead_events_mask) {
-		struct perf_event *event = per_cpu(dead_event, cpu);
-
-		/*
-		 * Required because for_each_cpu() reports unconditionally
-		 * CPU0 as set on UP kernels. Sigh.
-		 */
-		if (event)
-			perf_event_release_kernel(event);
-		per_cpu(dead_event, cpu) = NULL;
-	}
-	cpumask_clear(&dead_events_mask);
-}
-
 /**
  * hardlockup_detector_perf_stop - Globally stop watchdog events
  *
@@ -64,7 +64,8 @@ union ibs_op_ctl {
 			opmaxcnt_ext:7,	/* 20-26: upper 7 bits of periodic op maximum count */
 			reserved0:5,	/* 27-31: reserved */
 			opcurcnt:27,	/* 32-58: periodic op counter current count */
-			reserved1:5;	/* 59-63: reserved */
+			ldlat_thrsh:4,	/* 59-62: Load Latency threshold */
+			ldlat_en:1;	/* 63: Load Latency enabled */
 	};
 };