perf/x86/intel: Add support for rdpmc user disable feature

Starting with Panther Cove, the rdpmc user disable feature is supported.
This feature allows the perf system to disable user space rdpmc reads at
the counter level.

Currently, when a global counter is active, any user with rdpmc rights
can read it, even if perf access permissions forbid it (e.g., disallow
reading ring 0 counters). The rdpmc user disable feature mitigates this
security concern.

Details:

- A new RDPMC_USR_DISABLE bit (bit 37) in each EVNTSELx MSR indicates
  that the GP counter cannot be read by RDPMC in ring 3.
- New RDPMC_USR_DISABLE bits in IA32_FIXED_CTR_CTRL MSR (bits 33, 37,
  41, 45, etc.) for fixed counters 0, 1, 2, 3, etc.
- When the rdpmc instruction is executed for counter x, the returned
  value is determined as in the following pseudo code:
  	value = (!CPL0 && RDPMC_USR_DISABLE[x] == 1) ? 0 : counter_value;
- RDPMC_USR_DISABLE is enumerated by CPUID.0x23.0.EBX[2].
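
As a rough illustration of the points above, the following C sketch models
the bit layout, the CPUID enumeration bit, and the value rdpmc returns. It
is not kernel code: the macro and function names are hypothetical and were
chosen for this example only, and the limit of 8 fixed counters is an
assumption made purely to keep the shift within 64 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 37 of each EVNTSELx MSR, as listed above. */
#define EVNTSEL_RDPMC_USR_DISABLE	(1ULL << 37)

/* Fixed counter 0 uses bit 33; each further fixed counter is 4 bits up. */
#define FIXED_0_RDPMC_USR_DISABLE_BIT	33
#define FIXED_BITS_STRIDE		4

/* Hypothetical helper: mask of the disable bit for fixed counter idx. */
static inline uint64_t fixed_rdpmc_usr_disable_mask(unsigned int idx)
{
	/* Assumed bound for this sketch: 33 + 4 * 7 = 61 still fits in u64. */
	assert(idx < 8);
	return 1ULL << (FIXED_0_RDPMC_USR_DISABLE_BIT + idx * FIXED_BITS_STRIDE);
}

/* Hypothetical helper: test CPUID.0x23.0.EBX[2] in an already-read EBX. */
static inline bool cpuid23_ebx_has_rdpmc_usr_disable(uint32_t ebx)
{
	return (ebx >> 2) & 1;
}

/* Model of the value rdpmc returns for one counter at privilege level cpl. */
static inline uint64_t rdpmc_result(unsigned int cpl, bool usr_disable,
				    uint64_t counter_value)
{
	return (cpl != 0 && usr_disable) ? 0 : counter_value;
}
```

With these definitions, fixed counters 0 to 3 map to bits 33, 37, 41 and 45,
and a ring 3 read of a counter whose disable bit is set yields 0.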

This patch extends the current global user space rdpmc control logic via
the sysfs interface (/sys/devices/cpu/rdpmc) as follows:

- rdpmc = 0:
  Global user space rdpmc and counter-level user space rdpmc for all
  counters are both disabled.
- rdpmc = 1:
  Global user space rdpmc is enabled during the mmap-enabled time window,
  and counter-level user space rdpmc is enabled only for non-system-wide
  events. This prevents counter data leaks as count data is cleared
  during context switches.
- rdpmc = 2:
  Global user space rdpmc and counter-level user space rdpmc for all
  counters are enabled unconditionally.

The new rdpmc settings only affect newly activated perf events; currently
active perf events remain unaffected. This simplifies and cleans up the
code. The default value of rdpmc remains unchanged at 1.
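
The counter-level part of this policy can be summarized as a small decision
function. The sketch below is a standalone model for illustration only; the
enum and function names are hypothetical and do not appear in the patch.

```c
#include <stdbool.h>

/* Values of the rdpmc sysfs attribute (names are hypothetical). */
enum rdpmc_mode {
	RDPMC_NEVER       = 0,
	RDPMC_CONDITIONAL = 1,
	RDPMC_ALWAYS      = 2,
};

/*
 * Returns true when RDPMC_USR_DISABLE should be set for a newly
 * activated event. per_task_event is true for non-system-wide events.
 */
static inline bool counter_rdpmc_usr_disabled(enum rdpmc_mode mode,
					      bool per_task_event)
{
	if (mode == RDPMC_ALWAYS)
		return false;
	if (mode == RDPMC_CONDITIONAL && per_task_event)
		return false;
	/* rdpmc = 0, or rdpmc = 1 with a system-wide event. */
	return true;
}
```

Under this model, the default setting (rdpmc = 1) leaves user space reads
enabled for per-task events and disabled for system-wide events.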

For more details about rdpmc user disable, please refer to chapter 15
"RDPMC USER DISABLE" in the ISE documentation.

Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260114011750.350569-8-dapeng1.mi@linux.intel.com
Dapeng Mi 2026-01-14 09:17:50 +08:00 committed by Peter Zijlstra
parent 8c74e4e3e0
commit 59af95e028
5 changed files with 104 additions and 2 deletions


@@ -0,0 +1,44 @@
What:		/sys/bus/event_source/devices/cpu.../rdpmc
Date:		November 2011
KernelVersion:	3.10
Contact:	Linux kernel mailing list linux-kernel@vger.kernel.org
Description:	The /sys/bus/event_source/devices/cpu.../rdpmc attribute
		shows and controls whether the rdpmc instruction can be
		executed in user space. The attribute accepts three values.

		- rdpmc = 0
		  User space rdpmc is globally disabled for all PMU
		  counters.
		- rdpmc = 1
		  User space rdpmc is globally enabled only during the
		  time window in which an event is mmap'ed. Once the mmap
		  region is unmapped, user space rdpmc is disabled again.
		- rdpmc = 2
		  User space rdpmc is globally enabled for all PMU
		  counters.

		On Intel platforms that support the counter-level user
		space rdpmc disable feature (CPUID.23H.EBX[2] = 1), the
		meaning of the three values is extended as follows.

		- rdpmc = 0
		  Global user space rdpmc and counter-level user space
		  rdpmc of all counters are both disabled.
		- rdpmc = 1
		  The behavior of global user space rdpmc is unchanged.
		  Counter-level rdpmc is disabled for system-wide events
		  and enabled for non-system-wide events.
		- rdpmc = 2
		  Global user space rdpmc and counter-level user space
		  rdpmc of all counters are both enabled unconditionally.

		The default value of rdpmc is 1.

		Please note:

		- A change to the rdpmc value takes effect immediately
		  for global user space rdpmc, but takes effect for
		  counter-level user space rdpmc only when the event is
		  reactivated or recreated.
		- The rdpmc attribute is global, even on x86 hybrid
		  platforms. For example, changing cpu_core/rdpmc also
		  changes cpu_atom/rdpmc.
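
For illustration, a user space tool could read and validate the attribute
roughly as follows. This is a minimal sketch, not part of the patch: both
function names are hypothetical, and the caller is assumed to pass the full
path of the attribute (for example /sys/devices/cpu/rdpmc, the path named
in the commit message).

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse the attribute text; returns 0, 1 or 2, or -1 if it is invalid. */
static int parse_rdpmc_value(const char *text)
{
	char *end;
	long v;

	errno = 0;
	v = strtol(text, &end, 10);
	if (end == text || errno != 0)
		return -1;		/* no digits, or out of range for long */
	if (*end != '\0' && *end != '\n')
		return -1;		/* trailing garbage after the number */
	if (v < 0 || v > 2)
		return -1;		/* only 0, 1 and 2 are defined */
	return (int)v;
}

/* Read the attribute at path; returns 0, 1 or 2, or -1 on any error. */
static int read_rdpmc_value(const char *path)
{
	char buf[16];
	int ret = -1;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (fgets(buf, sizeof(buf), f))
		ret = parse_rdpmc_value(buf);
	fclose(f);
	return ret;
}
```

Because counter-level changes apply only to newly activated events, a tool
that writes a new value would also need to recreate or reactivate its
events before relying on the new behavior.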


@@ -2616,6 +2616,27 @@ static ssize_t get_attr_rdpmc(struct device *cdev,
return snprintf(buf, 40, "%d\n", x86_pmu.attr_rdpmc);
}
/*
 * Behaviors of rdpmc value:
 * - rdpmc = 0
 *   global user space rdpmc and counter level's user space rdpmc of all
 *   counters are both disabled.
 * - rdpmc = 1
 *   global user space rdpmc is enabled in the mmap enabled time window and
 *   counter level's user space rdpmc is enabled only for non system-wide
 *   events. Counter level's user space rdpmc of system-wide events is
 *   still disabled by default. This won't introduce a counter data leak
 *   for non system-wide events since their count data is cleared on
 *   context switches.
 * - rdpmc = 2
 *   global user space rdpmc and counter level's user space rdpmc of all
 *   counters are enabled unconditionally.
 *
 * Assuming the rdpmc value is not changed frequently, events are not
 * dynamically rescheduled to make the new rdpmc value take effect on
 * active perf events immediately; the new rdpmc value only affects newly
 * activated perf events. This makes the code simpler and cleaner.
 */
static ssize_t set_attr_rdpmc(struct device *cdev,
struct device_attribute *attr,
const char *buf, size_t count)


@@ -3128,6 +3128,8 @@ static void intel_pmu_enable_fixed(struct perf_event *event)
bits |= INTEL_FIXED_0_USER;
if (hwc->config & ARCH_PERFMON_EVENTSEL_OS)
bits |= INTEL_FIXED_0_KERNEL;
if (hwc->config & ARCH_PERFMON_EVENTSEL_RDPMC_USER_DISABLE)
bits |= INTEL_FIXED_0_RDPMC_USER_DISABLE;
/*
* ANY bit is supported in v3 and up
@@ -3263,6 +3265,27 @@ static void intel_pmu_enable_event_ext(struct perf_event *event)
__intel_pmu_update_event_ext(hwc->idx, ext);
}
static void intel_pmu_update_rdpmc_user_disable(struct perf_event *event)
{
	if (!x86_pmu_has_rdpmc_user_disable(event->pmu))
		return;

	/*
	 * Counter scope's user-space rdpmc is disabled by default
	 * except in two cases:
	 * a. rdpmc = 2 (user space rdpmc enabled unconditionally)
	 * b. rdpmc = 1 and the event is not a system-wide event.
	 *    The count of non-system-wide events is cleared on
	 *    context switches, so no count data is leaked.
	 */
	if (x86_pmu.attr_rdpmc == X86_USER_RDPMC_ALWAYS_ENABLE ||
	    (x86_pmu.attr_rdpmc == X86_USER_RDPMC_CONDITIONAL_ENABLE &&
	     event->ctx->task))
		event->hw.config &= ~ARCH_PERFMON_EVENTSEL_RDPMC_USER_DISABLE;
	else
		event->hw.config |= ARCH_PERFMON_EVENTSEL_RDPMC_USER_DISABLE;
}
DEFINE_STATIC_CALL_NULL(intel_pmu_enable_event_ext, intel_pmu_enable_event_ext);
static void intel_pmu_enable_event(struct perf_event *event)
@@ -3271,6 +3294,8 @@ static void intel_pmu_enable_event(struct perf_event *event)
struct hw_perf_event *hwc = &event->hw;
int idx = hwc->idx;
intel_pmu_update_rdpmc_user_disable(event);
if (unlikely(event->attr.precise_ip))
static_call(x86_pmu_pebs_enable)(event);
@@ -5869,6 +5894,8 @@ static void update_pmu_cap(struct pmu *pmu)
hybrid(pmu, config_mask) |= ARCH_PERFMON_EVENTSEL_UMASK2;
if (ebx_0.split.eq)
hybrid(pmu, config_mask) |= ARCH_PERFMON_EVENTSEL_EQ;
if (ebx_0.split.rdpmc_user_disable)
hybrid(pmu, config_mask) |= ARCH_PERFMON_EVENTSEL_RDPMC_USER_DISABLE;
if (eax_0.split.cntr_subleaf) {
cpuid_count(ARCH_PERFMON_EXT_LEAF, ARCH_PERFMON_NUM_COUNTER_LEAF,


@@ -1333,6 +1333,12 @@ static inline u64 x86_pmu_get_event_config(struct perf_event *event)
return event->attr.config & hybrid(event->pmu, config_mask);
}
static inline bool x86_pmu_has_rdpmc_user_disable(struct pmu *pmu)
{
	return !!(hybrid(pmu, config_mask) &
		 ARCH_PERFMON_EVENTSEL_RDPMC_USER_DISABLE);
}
extern struct event_constraint emptyconstraint;
extern struct event_constraint unconstrained;


@@ -33,6 +33,7 @@
#define ARCH_PERFMON_EVENTSEL_CMASK 0xFF000000ULL
#define ARCH_PERFMON_EVENTSEL_BR_CNTR (1ULL << 35)
#define ARCH_PERFMON_EVENTSEL_EQ (1ULL << 36)
#define ARCH_PERFMON_EVENTSEL_RDPMC_USER_DISABLE (1ULL << 37)
#define ARCH_PERFMON_EVENTSEL_UMASK2 (0xFFULL << 40)
#define INTEL_FIXED_BITS_STRIDE 4
@@ -40,6 +41,7 @@
#define INTEL_FIXED_0_USER (1ULL << 1)
#define INTEL_FIXED_0_ANYTHREAD (1ULL << 2)
#define INTEL_FIXED_0_ENABLE_PMI (1ULL << 3)
#define INTEL_FIXED_0_RDPMC_USER_DISABLE (1ULL << 33)
#define INTEL_FIXED_3_METRICS_CLEAR (1ULL << 2)
#define HSW_IN_TX (1ULL << 32)
@@ -50,7 +52,7 @@
#define INTEL_FIXED_BITS_MASK \
(INTEL_FIXED_0_KERNEL | INTEL_FIXED_0_USER | \
INTEL_FIXED_0_ANYTHREAD | INTEL_FIXED_0_ENABLE_PMI | \
-ICL_FIXED_0_ADAPTIVE)
+ICL_FIXED_0_ADAPTIVE | INTEL_FIXED_0_RDPMC_USER_DISABLE)
#define intel_fixed_bits_by_idx(_idx, _bits) \
((_bits) << ((_idx) * INTEL_FIXED_BITS_STRIDE))
@@ -226,7 +228,9 @@ union cpuid35_ebx {
unsigned int umask2:1;
/* EQ-bit Supported */
unsigned int eq:1;
-unsigned int reserved:30;
+/* rdpmc user disable Supported */
+unsigned int rdpmc_user_disable:1;
+unsigned int reserved:29;
} split;
unsigned int full;
};