Commit Graph

164 Commits (c38b8400aef99d63be2b1ff131bb993465dcafe1)

Author SHA1 Message Date
Dongxu Sun 97b5576b01 arm64/sme: Fix some comments of ARM SME
When TIF_SME is clear, fpsimd_restore_current_state will disable
SME trap during ret_to_user, then SME access trap is impossible
in userspace, not SVE.

Besides, fix typo: alocated->allocated.

Signed-off-by: Dongxu Sun <sundongxu3@huawei.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20230317124915.1263-5-sundongxu3@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-04-12 09:41:48 +01:00
Linus Torvalds 39ce4395c3 arm64 fixes:
- In copy_highpage(), only reset the tag of the destination pointer if
   KASAN_HW_TAGS is enabled so that user-space MTE does not interfere
   with KASAN_SW_TAGS (which relies on top-byte-ignore).
 
 - Remove warning if SME is detected without SVE, the kernel can cope
   with such configuration (though none in the field currently).
 
 - In cfi_handler(), pass the ESR_EL1 value to die() for consistency with
   other die() callers.
 
 - Disable HUGETLB_PAGE_OPTIMIZE_VMEMMAP on arm64 since the pte
   manipulation from the generic vmemmap_remap_pte() does not follow the
   required ARM break-before-make sequence (clear the pte, flush the
   TLBs, set the new pte). It may be re-enabled once this sequence is
   sorted.
 
 - Fix possible memory leak in the arm64 ACPI code if the SMCCC version
   and conduit checks fail.
 
 - Forbid CALL_OPS with CC_OPTIMIZE_FOR_SIZE since gcc ignores
   -falign-functions=N with -Os.
 
 - Don't pretend KASLR is enabled if offset < MIN_KIMG_ALIGN as no
   randomisation would actually take place.
 -----BEGIN PGP SIGNATURE-----
 
 iQIyBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmQBHFEACgkQa9axLQDI
 XvHf9Q/3Zg8o/8HchnWSvzgV//9ljGrrDfAjbfZHrE2W4PCniSd0op0uXYsVK3IH
 Nk6ZDiRe5uIXKgHuSq5caOoL4aRk0hk1TpQ3RKCuh8E3ybhQe9gwYm8xEWXDSSWh
 QzcfENsKlZLpuMoSMILJ2NlMPMbMLprXNCUlgENBbRT7KUToHZKTwE6BL2AUI3tg
 RdMntccorybxk1hiXV1YKT8482i+x2gAnylYXFsq3eI+G54rdfiks+tft0CQV3ng
 1/i1PfbnGC45sBoxXPqYXzBSUDNHpAqb5dwvtlVinGo3J6STxIvbM6Zi5Ma5hl3u
 QrhwyduwCTZ6wVOqzd4KAH9gmhJSzRG75OzCek2dTwU9KXVMOPEvp1ZfTwUXDx7J
 5j8UkjGgrbtj6IioGqBAO/HiFfoty8EBtmlSZIj0thwxkM73ZBG6efQOJaVWh85m
 ioUzMC2Y5yfKLfHEcy9yKIQVizMYoz6fl+QHOEbVSoFhJKNRc4wt5CCJCvsbMHsu
 K8rvD/CI9jFMP9GEK7ObTaC7ICjUz/+8wbIrRrm5ObRQ65Tm2zv3OLqGnK8O5O4W
 gcDEraTnSPHDUtgG6dAEPFN5Wi9hT3zYC0xAcNhc3aZC5ofS5RD6YXIWJvqjWrvL
 5k8G1gfa57C/hfxO6pPw7bg/nY8vvYpxUkZ9erRWD430g7y0Sg==
 =jfPa
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Catalin Marinas:

 - In copy_highpage(), only reset the tag of the destination pointer if
   KASAN_HW_TAGS is enabled so that user-space MTE does not interfere
   with KASAN_SW_TAGS (which relies on top-byte-ignore).

 - Remove warning if SME is detected without SVE, the kernel can cope
   with such configuration (though none in the field currently).

 - In cfi_handler(), pass the ESR_EL1 value to die() for consistency
   with other die() callers.

 - Disable HUGETLB_PAGE_OPTIMIZE_VMEMMAP on arm64 since the pte
   manipulation from the generic vmemmap_remap_pte() does not follow the
   required ARM break-before-make sequence (clear the pte, flush the
   TLBs, set the new pte). It may be re-enabled once this sequence is
   sorted.

 - Fix possible memory leak in the arm64 ACPI code if the SMCCC version
   and conduit checks fail.

 - Forbid CALL_OPS with CC_OPTIMIZE_FOR_SIZE since gcc ignores
  -falign-functions=N with -Os.

 - Don't pretend KASLR is enabled if offset < MIN_KIMG_ALIGN as no
   randomisation would actually take place.

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: kaslr: don't pretend KASLR is enabled if offset < MIN_KIMG_ALIGN
  arm64: ftrace: forbid CALL_OPS with CC_OPTIMIZE_FOR_SIZE
  arm64: acpi: Fix possible memory leak of ffh_ctxt
  arm64: mm: hugetlb: Disable HUGETLB_PAGE_OPTIMIZE_VMEMMAP
  arm64: pass ESR_ELx to die() of cfi_handler
  arm64/fpsimd: Remove warning for SME without SVE
  arm64: Reset KASAN tag in copy_highpage with HW tags only
2023-03-02 14:57:53 -08:00
Mark Brown 0269680e5e arm64/fpsimd: Remove warning for SME without SVE
Support for SME without SVE is architecturally valid and has now been tested
well enough so let's remove the warning message that is displayed at boot.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20230209-arm64-sme-no-sve-v1-1-74eb3df2f878@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-02-22 15:54:32 +00:00
Linus Torvalds 8bf1a529cd arm64 updates for 6.3:
- Support for arm64 SME 2 and 2.1. SME2 introduces a new 512-bit
   architectural register (ZT0, for the look-up table feature) that Linux
   needs to save/restore.
 
 - Include TPIDR2 in the signal context and add the corresponding
   kselftests.
 
 - Perf updates: Arm SPEv1.2 support, HiSilicon uncore PMU updates, ACPI
   support to the Marvell DDR and TAD PMU drivers, reset DTM_PMU_CONFIG
   (ARM CMN) at probe time.
 
 - Support for DYNAMIC_FTRACE_WITH_CALL_OPS on arm64.
 
 - Permit EFI boot with MMU and caches on. Instead of cleaning the entire
   loaded kernel image to the PoC and disabling the MMU and caches before
   branching to the kernel bare metal entry point, leave the MMU and
   caches enabled and rely on EFI's cacheable 1:1 mapping of all of
   system RAM to populate the initial page tables.
 
 - Expose the AArch32 (compat) ELF_HWCAP features to user in an arm64
   kernel (the arm32 kernel only defines the values).
 
 - Harden the arm64 shadow call stack pointer handling: stash the shadow
   stack pointer in the task struct on interrupt, load it directly from
   this structure.
 
 - Signal handling cleanups to remove redundant validation of size
   information and avoid reading the same data from userspace twice.
 
 - Refactor the hwcap macros to make use of the automatically generated
   ID registers. It should make new hwcaps writing less error prone.
 
 - Further arm64 sysreg conversion and some fixes.
 
 - arm64 kselftest fixes and improvements.
 
 - Pointer authentication cleanups: don't sign leaf functions, unify
   asm-arch manipulation.
 
 - Pseudo-NMI code generation optimisations.
 
 - Minor fixes for SME and TPIDR2 handling.
 
 - Miscellaneous updates: ARCH_FORCE_MAX_ORDER is now selectable, replace
   strtobool() to kstrtobool() in the cpufeature.c code, apply dynamic
   shadow call stack in two passes, intercept pfn changes in set_pte_at()
   without the required break-before-make sequence, attempt to dump all
   instructions on unhandled kernel faults.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmP0/QsACgkQa9axLQDI
 XvG+gA/+JDVEH9wRzAIZvbp9hSuohPc48xgAmIMP1eiVB0/5qeRjYAJwS33H0rXS
 BPC2kj9IBy/eQeM9ICg0nFd0zYznSVacITqe6NrqeJ1F+ftS4rrHdfxd+J7kIoCs
 V2L8e+BJvmHdhmNV2qMAgJdGlfxfQBA7fv2cy52HKYcouoOh1AUVR/x+yXVXAsCd
 qJP3+dlUKccgm/oc5unEC1eZ49u8O+EoasqOyfG6K5udMgzhEX3K6imT9J3hw0WT
 UjstYkx5uGS/prUrRCQAX96VCHoZmzEDKtQuHkHvQXEYXsYPF3ldbR2CziNJnHe7
 QfSkjJlt8HAtExA+BkwEe9i0MQO/2VF5qsa2e4fA6l7uqGu3LOtS/jJd23C9n9fR
 Id8aBMeN6S8+MjqRA9L2uf4t6e4ISEHoG9ZRdc4WOwloxEEiJoIeun+7bHdOSZLj
 AFdHFCz4NXiiwC0UP0xPDI2YeCLqt5np7HmnrUqwzRpVO8UUagiJD8TIpcBSjBN9
 J68eidenHUW7/SlIeaMKE2lmo8AUEAJs9AorDSugF19/ThJcQdx7vT2UAZjeVB3j
 1dbbwajnlDOk/w8PQC4thFp5/MDlfst0htS3WRwa+vgkweE2EAdTU4hUZ8qEP7FQ
 smhYtlT1xUSTYDTqoaG/U2OWR6/UU79wP0jgcOsHXTuyYrtPI/Q=
 =VmXL
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:

 - Support for arm64 SME 2 and 2.1. SME2 introduces a new 512-bit
   architectural register (ZT0, for the look-up table feature) that
   Linux needs to save/restore

 - Include TPIDR2 in the signal context and add the corresponding
   kselftests

 - Perf updates: Arm SPEv1.2 support, HiSilicon uncore PMU updates, ACPI
   support to the Marvell DDR and TAD PMU drivers, reset DTM_PMU_CONFIG
   (ARM CMN) at probe time

 - Support for DYNAMIC_FTRACE_WITH_CALL_OPS on arm64

 - Permit EFI boot with MMU and caches on. Instead of cleaning the
   entire loaded kernel image to the PoC and disabling the MMU and
   caches before branching to the kernel bare metal entry point, leave
   the MMU and caches enabled and rely on EFI's cacheable 1:1 mapping of
   all of system RAM to populate the initial page tables

 - Expose the AArch32 (compat) ELF_HWCAP features to user in an arm64
   kernel (the arm32 kernel only defines the values)

 - Harden the arm64 shadow call stack pointer handling: stash the shadow
   stack pointer in the task struct on interrupt, load it directly from
   this structure

 - Signal handling cleanups to remove redundant validation of size
   information and avoid reading the same data from userspace twice

 - Refactor the hwcap macros to make use of the automatically generated
   ID registers. It should make new hwcaps writing less error prone

 - Further arm64 sysreg conversion and some fixes

 - arm64 kselftest fixes and improvements

 - Pointer authentication cleanups: don't sign leaf functions, unify
   asm-arch manipulation

 - Pseudo-NMI code generation optimisations

 - Minor fixes for SME and TPIDR2 handling

 - Miscellaneous updates: ARCH_FORCE_MAX_ORDER is now selectable,
   replace strtobool() to kstrtobool() in the cpufeature.c code, apply
   dynamic shadow call stack in two passes, intercept pfn changes in
   set_pte_at() without the required break-before-make sequence, attempt
   to dump all instructions on unhandled kernel faults

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (130 commits)
  arm64: fix .idmap.text assertion for large kernels
  kselftest/arm64: Don't require FA64 for streaming SVE+ZA tests
  kselftest/arm64: Copy whole EXTRA context
  arm64: kprobes: Drop ID map text from kprobes blacklist
  perf: arm_spe: Print the version of SPE detected
  perf: arm_spe: Add support for SPEv1.2 inverted event filtering
  perf: Add perf_event_attr::config3
  arm64/sme: Fix __finalise_el2 SMEver check
  drivers/perf: fsl_imx8_ddr_perf: Remove set-but-not-used variable
  arm64/signal: Only read new data when parsing the ZT context
  arm64/signal: Only read new data when parsing the ZA context
  arm64/signal: Only read new data when parsing the SVE context
  arm64/signal: Avoid rereading context frame sizes
  arm64/signal: Make interface for restore_fpsimd_context() consistent
  arm64/signal: Remove redundant size validation from parse_user_sigframe()
  arm64/signal: Don't redundantly verify FPSIMD magic
  arm64/cpufeature: Use helper macros to specify hwcaps
  arm64/cpufeature: Always use symbolic name for feature value in hwcaps
  arm64/sysreg: Initial unsigned annotations for ID registers
  arm64/sysreg: Initial annotation of signed ID registers
  ...
2023-02-21 15:27:48 -08:00
Catalin Marinas 156010ed9c Merge branches 'for-next/sysreg', 'for-next/sme', 'for-next/kselftest', 'for-next/misc', 'for-next/sme2', 'for-next/tpidr2', 'for-next/scs', 'for-next/compat-hwcap', 'for-next/ftrace', 'for-next/efi-boot-mmu-on', 'for-next/ptrauth' and 'for-next/pseudo-nmi', remote-tracking branch 'arm64/for-next/perf' into for-next/core
* arm64/for-next/perf:
  perf: arm_spe: Print the version of SPE detected
  perf: arm_spe: Add support for SPEv1.2 inverted event filtering
  perf: Add perf_event_attr::config3
  drivers/perf: fsl_imx8_ddr_perf: Remove set-but-not-used variable
  perf: arm_spe: Support new SPEv1.2/v8.7 'not taken' event
  perf: arm_spe: Use new PMSIDR_EL1 register enums
  perf: arm_spe: Drop BIT() and use FIELD_GET/PREP accessors
  arm64/sysreg: Convert SPE registers to automatic generation
  arm64: Drop SYS_ from SPE register defines
  perf: arm_spe: Use feature numbering for PMSEVFR_EL1 defines
  perf/marvell: Add ACPI support to TAD uncore driver
  perf/marvell: Add ACPI support to DDR uncore driver
  perf/arm-cmn: Reset DTM_PMU_CONFIG at probe
  drivers/perf: hisi: Extract initialization of "cpa_pmu->pmu"
  drivers/perf: hisi: Simplify the parameters of hisi_pmu_init()
  drivers/perf: hisi: Advertise the PERF_PMU_CAP_NO_EXCLUDE capability

* for-next/sysreg:
  : arm64 sysreg and cpufeature fixes/updates
  KVM: arm64: Use symbolic definition for ISR_EL1.A
  arm64/sysreg: Add definition of ISR_EL1
  arm64/sysreg: Add definition for ICC_NMIAR1_EL1
  arm64/cpufeature: Remove 4 bit assumption in ARM64_FEATURE_MASK()
  arm64/sysreg: Fix errors in 32 bit enumeration values
  arm64/cpufeature: Fix field sign for DIT hwcap detection

* for-next/sme:
  : SME-related updates
  arm64/sme: Optimise SME exit on syscall entry
  arm64/sme: Don't use streaming mode to probe the maximum SME VL
  arm64/ptrace: Use system_supports_tpidr2() to check for TPIDR2 support

* for-next/kselftest: (23 commits)
  : arm64 kselftest fixes and improvements
  kselftest/arm64: Don't require FA64 for streaming SVE+ZA tests
  kselftest/arm64: Copy whole EXTRA context
  kselftest/arm64: Fix enumeration of systems without 128 bit SME for SSVE+ZA
  kselftest/arm64: Fix enumeration of systems without 128 bit SME
  kselftest/arm64: Don't require FA64 for streaming SVE tests
  kselftest/arm64: Limit the maximum VL we try to set via ptrace
  kselftest/arm64: Correct buffer size for SME ZA storage
  kselftest/arm64: Remove the local NUM_VL definition
  kselftest/arm64: Verify simultaneous SSVE and ZA context generation
  kselftest/arm64: Verify that SSVE signal context has SVE_SIG_FLAG_SM set
  kselftest/arm64: Remove spurious comment from MTE test Makefile
  kselftest/arm64: Support build of MTE tests with clang
  kselftest/arm64: Initialise current at build time in signal tests
  kselftest/arm64: Don't pass headers to the compiler as source
  kselftest/arm64: Remove redundant _start labels from FP tests
  kselftest/arm64: Fix .pushsection for strings in FP tests
  kselftest/arm64: Run BTI selftests on systems without BTI
  kselftest/arm64: Fix test numbering when skipping tests
  kselftest/arm64: Skip non-power of 2 SVE vector lengths in fp-stress
  kselftest/arm64: Only enumerate power of two VLs in syscall-abi
  ...

* for-next/misc:
  : Miscellaneous arm64 updates
  arm64/mm: Intercept pfn changes in set_pte_at()
  Documentation: arm64: correct spelling
  arm64: traps: attempt to dump all instructions
  arm64: Apply dynamic shadow call stack patching in two passes
  arm64: el2_setup.h: fix spelling typo in comments
  arm64: Kconfig: fix spelling
  arm64: cpufeature: Use kstrtobool() instead of strtobool()
  arm64: Avoid repeated AA64MMFR1_EL1 register read on pagefault path
  arm64: make ARCH_FORCE_MAX_ORDER selectable

* for-next/sme2: (23 commits)
  : Support for arm64 SME 2 and 2.1
  arm64/sme: Fix __finalise_el2 SMEver check
  kselftest/arm64: Remove redundant _start labels from zt-test
  kselftest/arm64: Add coverage of SME 2 and 2.1 hwcaps
  kselftest/arm64: Add coverage of the ZT ptrace regset
  kselftest/arm64: Add SME2 coverage to syscall-abi
  kselftest/arm64: Add test coverage for ZT register signal frames
  kselftest/arm64: Teach the generic signal context validation about ZT
  kselftest/arm64: Enumerate SME2 in the signal test utility code
  kselftest/arm64: Cover ZT in the FP stress test
  kselftest/arm64: Add a stress test program for ZT0
  arm64/sme: Add hwcaps for SME 2 and 2.1 features
  arm64/sme: Implement ZT0 ptrace support
  arm64/sme: Implement signal handling for ZT
  arm64/sme: Implement context switching for ZT0
  arm64/sme: Provide storage for ZT0
  arm64/sme: Add basic enumeration for SME2
  arm64/sme: Enable host kernel to access ZT0
  arm64/sme: Manually encode ZT0 load and store instructions
  arm64/esr: Document ISS for ZT0 being disabled
  arm64/sme: Document SME 2 and SME 2.1 ABI
  ...

* for-next/tpidr2:
  : Include TPIDR2 in the signal context
  kselftest/arm64: Add test case for TPIDR2 signal frame records
  kselftest/arm64: Add TPIDR2 to the set of known signal context records
  arm64/signal: Include TPIDR2 in the signal context
  arm64/sme: Document ABI for TPIDR2 signal information

* for-next/scs:
  : arm64: harden shadow call stack pointer handling
  arm64: Stash shadow stack pointer in the task struct on interrupt
  arm64: Always load shadow stack pointer directly from the task struct

* for-next/compat-hwcap:
  : arm64: Expose compat ARMv8 AArch32 features (HWCAPs)
  arm64: Add compat hwcap SSBS
  arm64: Add compat hwcap SB
  arm64: Add compat hwcap I8MM
  arm64: Add compat hwcap ASIMDBF16
  arm64: Add compat hwcap ASIMDFHM
  arm64: Add compat hwcap ASIMDDP
  arm64: Add compat hwcap FPHP and ASIMDHP

* for-next/ftrace:
  : Add arm64 support for DYNAMICE_FTRACE_WITH_CALL_OPS
  arm64: avoid executing padding bytes during kexec / hibernation
  arm64: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS
  arm64: ftrace: Update stale comment
  arm64: patching: Add aarch64_insn_write_literal_u64()
  arm64: insn: Add helpers for BTI
  arm64: Extend support for CONFIG_FUNCTION_ALIGNMENT
  ACPI: Don't build ACPICA with '-Os'
  Compiler attributes: GCC cold function alignment workarounds
  ftrace: Add DYNAMIC_FTRACE_WITH_CALL_OPS

* for-next/efi-boot-mmu-on:
  : Permit arm64 EFI boot with MMU and caches on
  arm64: kprobes: Drop ID map text from kprobes blacklist
  arm64: head: Switch endianness before populating the ID map
  efi: arm64: enter with MMU and caches enabled
  arm64: head: Clean the ID map and the HYP text to the PoC if needed
  arm64: head: avoid cache invalidation when entering with the MMU on
  arm64: head: record the MMU state at primary entry
  arm64: kernel: move identity map out of .text mapping
  arm64: head: Move all finalise_el2 calls to after __enable_mmu

* for-next/ptrauth:
  : arm64 pointer authentication cleanup
  arm64: pauth: don't sign leaf functions
  arm64: unify asm-arch manipulation

* for-next/pseudo-nmi:
  : Pseudo-NMI code generation optimisations
  arm64: irqflags: use alternative branches for pseudo-NMI logic
  arm64: add ARM64_HAS_GIC_PRIO_RELAXED_SYNC cpucap
  arm64: make ARM64_HAS_GIC_PRIO_MASKING depend on ARM64_HAS_GIC_CPUIF_SYSREGS
  arm64: rename ARM64_HAS_IRQ_PRIO_MASKING to ARM64_HAS_GIC_PRIO_MASKING
  arm64: rename ARM64_HAS_SYSREG_GIC_CPUIF to ARM64_HAS_GIC_CPUIF_SYSREGS
2023-02-10 18:51:49 +00:00
Mark Brown 95fcec7132 arm64/sme: Implement context switching for ZT0
When the system supports SME2 the ZT0 register must be context switched as
part of the floating point state. This register is stored immediately
after ZA in memory and is only accessible when PSTATE.ZA is set so we
handle it in the same functions we use to save and restore ZA.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221208-arm64-sme2-v4-10-f2fa0aef982f@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-01-20 12:23:06 +00:00
Mark Brown d4913eee15 arm64/sme: Add basic enumeration for SME2
Add basic feature detection for SME2, detecting that the feature is present
and disabling traps for ZT0.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221208-arm64-sme2-v4-8-f2fa0aef982f@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-01-20 12:23:06 +00:00
Mark Brown ce514000da arm64/sme: Rename za_state to sme_state
In preparation for adding support for storage for ZT0 to the thread_struct
rename za_state to sme_state. Since ZT0 is accessible when PSTATE.ZA is
set just like ZA itself we will extend the allocation done for ZA to
cover it, avoiding the need to further expand task_struct for non-SME
tasks.

No functional changes.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221208-arm64-sme2-v4-1-f2fa0aef982f@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-01-20 12:23:05 +00:00
Mark Brown fcd3d2c082 arm64/sme: Don't use streaming mode to probe the maximum SME VL
During development the architecture added the RDSVL instruction which means
we do not need to enter streaming mode to enumerate the SME VLs, use it
when we probe the maximum supported VL. Other users were already updated.

No functional change.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221223-arm64-sme-probe-max-v1-1-cbde68f67ad0@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-01-12 16:07:26 +00:00
Mark Brown 0cab5b4964 arm64/sme: Fix context switch for SME only systems
When refactoring fpsimd_load() to support keeping SVE enabled over syscalls
support for systems with SME but not SVE was broken. The code that selects
between loading regular FPSIMD and SVE states was guarded by using
system_supports_sve() but is also needed to handle the streaming SVE state
in SME only systems where that check will be false. Fix this by also
checking for system_supports_sme().

Fixes: a0136be443 ("arm64/fpsimd: Load FP state based on recorded data type")
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221223-arm64-fix-sme-only-v1-1-938d663f69e5@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2023-01-05 15:31:18 +00:00
Will Deacon 75bc81d08f Merge branch 'for-next/sve-state' into for-next/core
* for-next/sve-state:
  arm64/fp: Use a struct to pass data to fpsimd_bind_state_to_cpu()
  arm64/sve: Leave SVE enabled on syscall if we don't context switch
  arm64/fpsimd: SME no longer requires SVE register state
  arm64/fpsimd: Load FP state based on recorded data type
  arm64/fpsimd: Stop using TIF_SVE to manage register saving in KVM
  arm64/fpsimd: Have KVM explicitly say which FP registers to save
  arm64/fpsimd: Track the saved FPSIMD state type separately to TIF_SVE
  KVM: arm64: Discard any SVE state when entering KVM guests
2022-12-06 11:27:28 +00:00
Mark Brown 1192b93ba3 arm64/fp: Use a struct to pass data to fpsimd_bind_state_to_cpu()
For reasons that are unclear to this reader fpsimd_bind_state_to_cpu()
populates the struct fpsimd_last_state_struct that it uses to store the
active floating point state for KVM guests by passing an argument for
each member of the structure. As the richness of the architecture increases
this is resulting in a function with a rather large number of arguments
which isn't ideal.

Simplify the interface by using the struct directly as the single argument
for the function, renaming it as we lift the definition into the header.
This could be built on further to reduce the work we do adding storage for
new FP state in various places but for now it just simplifies this one
interface.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221115094640.112848-9-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-11-29 15:01:56 +00:00
Mark Brown 8c845e2731 arm64/sve: Leave SVE enabled on syscall if we don't context switch
The syscall ABI says that the SVE register state not shared with FPSIMD
may not be preserved on syscall, and this is the only mechanism we have
in the ABI to stop tracking the extra SVE state for a process. Currently
we do this unconditionally by means of disabling SVE for the process on
syscall, causing userspace to take a trap to EL1 if it uses SVE again.
These extra traps result in a noticeable overhead for using SVE instead
of FPSIMD in some workloads, especially for simple syscalls where we can
return directly to userspace and would not otherwise need to update the
floating point registers. Tests with fp-pidbench show an approximately
70% overhead on a range of implementations when SVE is in use - while
this is an extreme and entirely artificial benchmark it is clear that
there is some useful room for improvement here.

Now that we have the ability to track the decision about what to save
seprately to TIF_SVE we can improve things by leaving TIF_SVE enabled on
syscall but only saving the FPSIMD registers if we are in a syscall.
This means that if we need to restore the register state from memory
(eg, after a context switch or kernel mode NEON) we will drop TIF_SVE
and reenable traps for userspace but if we can just return to userspace
then traps will remain disabled.

Since our current implementation and hence ABI has the effect of zeroing
all the SVE register state not shared with FPSIMD on syscall we replace
the disabling of TIF_SVE with a flush of the non-shared register state,
this means that there is still some overhead for syscalls when SVE is in
use but it is very much reduced.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221115094640.112848-8-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-11-29 15:01:56 +00:00
Mark Brown bbc6172eef arm64/fpsimd: SME no longer requires SVE register state
Now that we track the type of the stored register state separately to
what is active in the task, it is valid to have the FPSIMD register
state stored while in streaming mode. Remove the special case handling
for SME when setting FPSIMD register state.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221115094640.112848-7-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-11-29 15:01:56 +00:00
Mark Brown a0136be443 arm64/fpsimd: Load FP state based on recorded data type
Now that we are recording the type of floating point register state we
are saving when we write the register state out to memory we can use
that information when we load from memory to decide which format to
load, bringing TIF_SVE into line with what we saved rather than relying
on TIF_SVE to determine what to load.

The SME state details are already recorded directly in the saved
SVCR and handled based on the information there.

Since we are not changing any of the save paths there should be no
functional change from this patch, further patches will make use of this
to optimise and clarify the code.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221115094640.112848-6-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-11-29 15:01:56 +00:00
Mark Brown 62021cc36a arm64/fpsimd: Stop using TIF_SVE to manage register saving in KVM
Now that we are explicitly telling the host FP code which register state
it needs to save we can remove the manipulation of TIF_SVE from the KVM
code, simplifying it and allowing us to optimise our handling of normal
tasks. Remove the manipulation of TIF_SVE from KVM and instead rely on
to_save to ensure we save the correct data for it.

There should be no functional or performance impact from this change.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221115094640.112848-5-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-11-29 15:01:56 +00:00
Mark Brown deeb8f9a80 arm64/fpsimd: Have KVM explicitly say which FP registers to save
In order to avoid needlessly saving and restoring the guest registers KVM
relies on the host FPSMID code to save the guest registers when we context
switch away from the guest. This is done by binding the KVM guest state to
the CPU on top of the task state that was originally there, then carefully
managing the TIF_SVE flag for the task to cause the host to save the full
SVE state when needed regardless of the needs of the host task. This works
well enough but isn't terribly direct about what is going on and makes it
much more complicated to try to optimise what we're doing with the SVE
register state.

Let's instead have KVM pass in the register state it wants saving when it
binds to the CPU. We introduce a new FP_STATE_CURRENT for use
during normal task binding to indicate that we should base our
decisions on the current task. This should not be used when
actually saving. Ideally we might want to use a separate enum for
the type to save but this enum and the enum values would then
need to be named which has problems with clarity and ambiguity.

In order to ease any future debugging that might be required this patch
does not actually update any of the decision making about what to save,
it merely starts tracking the new information and warns if the requested
state is not what we would otherwise have decided to save.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221115094640.112848-4-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-11-29 15:01:56 +00:00
Mark Brown baa8515281 arm64/fpsimd: Track the saved FPSIMD state type separately to TIF_SVE
When we save the state for the floating point registers this can be done
in the form visible through either the FPSIMD V registers or the SVE Z and
P registers. At present we track which format is currently used based on
TIF_SVE and the SME streaming mode state but particularly in the SVE case
this limits our options for optimising things, especially around syscalls.
Introduce a new enum which we place together with saved floating point
state in both thread_struct and the KVM guest state which explicitly
states which format is active and keep it up to date when we change it.

At present we do not use this state except to verify that it has the
expected value when loading the state, future patches will introduce
functional changes.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221115094640.112848-3-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-11-29 15:01:56 +00:00
Mark Brown 93ae6b01ba KVM: arm64: Discard any SVE state when entering KVM guests
Since 8383741ab2 (KVM: arm64: Get rid of host SVE tracking/saving)
KVM has not tracked the host SVE state, relying on the fact that we
currently disable SVE whenever we perform a syscall. This may not be true
in future since performance optimisation may result in us keeping SVE
enabled in order to avoid needing to take access traps to reenable it.
Handle this by clearing TIF_SVE and converting the stored task state to
FPSIMD format when preparing to run the guest.  This is done with a new
call fpsimd_kvm_prepare() to keep the direct state manipulation
functions internal to fpsimd.c.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221115094640.112848-2-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-11-29 15:01:56 +00:00
Mark Brown aaeca98456 arm64/fpsimd: Make kernel_neon_ API _GPL
Currently for reasons lost in the mists of time the kernel_neon_ APIs are
EXPORT_SYMBOL() but the general policy for floating point usage is that it
should be GPL only given the non-standard runtime environment that holds
while it is in use and PCS impacts when code is compiled for FP usage.

Given the limited existing deployment of non-GPL modules for arm64 and the
fact that other architectures like x86 already make their equivalent
functions GPL only this is not expected to be disruptive to existing users.

Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221107170747.276910-1-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-11-08 16:07:36 +00:00
Mark Brown 714f3cbd70 arm64/sme: Don't flush SVE register state when handling SME traps
Currently as part of handling a SME access trap we flush the SVE register
state. This is not needed and would corrupt register state if the task has
access to the SVE registers already. For non-streaming mode accesses the
required flushing will be done in the SVE access trap. For streaming
mode SVE register accesses the architecture guarantees that the register
state will be flushed when streaming mode is entered or exited so there is
no need for us to do so. Simply remove the register initialisation.

Fixes: 8bd7f91c03 ("arm64/sme: Implement traps and syscall handling for SME")
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220817182324.638214-5-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-08-23 11:29:12 +01:00
Mark Brown 826a4fdd2a arm64/sme: Don't flush SVE register state when allocating SME storage
Currently when taking a SME access trap we allocate storage for the SVE
register state in order to be able to handle storage of streaming mode SVE.
Due to the original usage in a purely SVE context the SVE register state
allocation this also flushes the register state for SVE if storage was
already allocated but in the SME context this is not desirable. For a SME
access trap to be taken the task must not be in streaming mode so either
there already is SVE register state present for regular SVE mode which would
be corrupted or the task does not have TIF_SVE and the flush is redundant.

Fix this by adding a flag to sve_alloc() indicating if we are in a SVE
context and need to flush the state. Freshly allocated storage is always
zeroed either way.

Fixes: 8bd7f91c03 ("arm64/sme: Implement traps and syscall handling for SME")
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220817182324.638214-4-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-08-23 11:29:11 +01:00
Schspa Shi 4139320d19 arm64/fpsimd: Remove duplicate SYS_SVCR read
It seems to be a typo, remove the duplicate SYS_SVCR read.

Signed-off-by: Schspa Shi <schspa@gmail.com>
Link: https://lore.kernel.org/r/20220629051023.18173-1-schspa@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-06-29 10:29:30 +01:00
Mark Brown 2e990e6322 arm64/sme: Fix EFI save/restore
The EFI save/restore code is confused. When saving the check for saving
FFR is inverted due to confusion with the streaming mode check, and when
restoring we check if we need to restore FFR by checking the percpu
efi_sm_state without the required wrapper rather than based on the
combination of FA64 support and streaming mode.

Fixes: e0838f6373 ("arm64/sme: Save and restore streaming mode over EFI runtime calls")
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220602124132.3528951-1-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-06-10 17:29:23 +01:00
Xiang wangx bb314511b6 arm64/fpsimd: Fix typo in comment
Delete the redundant word 'in'.

Signed-off-by: Xiang wangx <wangxiang@cdjrlc.com>
Link: https://lore.kernel.org/r/20220610070543.59338-1-wangxiang@cdjrlc.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-06-10 16:24:15 +01:00
Catalin Marinas 0616ea3f1b Merge branch 'for-next/esr-elx-64-bit' into for-next/core
* for-next/esr-elx-64-bit:
  : Treat ESR_ELx as a 64-bit register.
  KVM: arm64: uapi: Add kvm_debug_exit_arch.hsr_high
  KVM: arm64: Treat ESR_EL2 as a 64-bit register
  arm64: Treat ESR_ELx as a 64-bit register
  arm64: compat: Do not treat syscall number as ESR_ELx for a bad syscall
  arm64: Make ESR_ELx_xVC_IMM_MASK compatible with assembly
2022-05-20 18:51:54 +01:00
Catalin Marinas e003d5335c Merge branch 'for-next/sysreg-gen' into for-next/core
* for-next/sysreg-gen: (32 commits)
  : Automatic system register definition generation.
  arm64/sysreg: Generate definitions for FAR_ELx
  arm64/sysreg: Generate definitions for DACR32_EL2
  arm64/sysreg: Generate definitions for CSSELR_EL1
  arm64/sysreg: Generate definitions for CPACR_ELx
  arm64/sysreg: Generate definitions for CONTEXTIDR_ELx
  arm64/sysreg: Generate definitions for CLIDR_EL1
  arm64/sve: Generate ZCR definitions
  arm64/sme: Generate defintions for SVCR
  arm64/sme: Generate SMPRI_EL1 definitions
  arm64/sme: Automatically generate SMPRIMAP_EL2 definitions
  arm64/sme: Automatically generate SMIDR_EL1 defines
  arm64/sme: Automatically generate defines for SMCR
  arm64/sysreg: Support generation of RAZ fields
  arm64/sme: Remove _EL0 from name of SVCR - FIXME sysreg.h
  arm64/sme: Standardise bitfield names for SVCR
  arm64/sme: Drop SYS_ from SMIDR_EL1 defines
  arm64/fp: Rename SVE and SME LEN field name to _WIDTH
  arm64/fp: Make SVE and SME length register definition match architecture
  arm64/sysreg: fix odd line spacing
  arm64/sysreg: improve comment for regs without fields
  ...
2022-05-20 18:50:57 +01:00
Geert Uytterhoeven 8e1f78a921 arm64/sve: Move sve_free() into SVE code section
If CONFIG_ARM64_SVE is not set:

    arch/arm64/kernel/fpsimd.c:294:13: warning: ‘sve_free’ defined but not used [-Wunused-function]

Fix this by moving sve_free() and __sve_free() into the existing section
protected by "#ifdef CONFIG_ARM64_SVE", now the last user outside that
section has been removed.

Fixes: a1259dd807 ("arm64/sve: Delay freeing memory in fpsimd_flush_thread()")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/cd633284683c24cb9469f8ff429915aedf67f868.1652798894.git.geert+renesas@glider.be
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-05-18 14:44:36 +01:00
Mark Brown ec0067a63e arm64/sme: Remove _EL0 from name of SVCR - FIXME sysreg.h
The defines for SVCR call it SVCR_EL0 however the architecture calls the
register SVCR with no _EL0 suffix. In preparation for generating the sysreg
definitions rename to match the architecture, no functional change.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220510161208.631259-6-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-05-16 19:50:20 +01:00
Mark Brown e65fc01bf2 arm64/sme: Standardise bitfield names for SVCR
The bitfield definitions for SVCR have a SYS_ added to the names of the
constant which will be a problem for automatic generation. Remove the
prefixes, no functional change.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220510161208.631259-5-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-05-16 19:50:20 +01:00
Sebastian Andrzej Siewior 696207d425 arm64/sve: Make kernel FPU protection RT friendly
Non RT kernels need to protect FPU against preemption and bottom half
processing. This is achieved by disabling bottom halves via
local_bh_disable() which implictly disables preemption.

On RT kernels this protection mechanism is not sufficient because
local_bh_disable() does not disable preemption. It serializes bottom half
related processing via a CPU local lock.

As bottom halves are running always in thread context on RT kernels
disabling preemption is the proper choice as it implicitly prevents bottom
half processing.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220505163207.85751-3-bigeasy@linutronix.de
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-05-16 19:15:46 +01:00
Sebastian Andrzej Siewior a1259dd807 arm64/sve: Delay freeing memory in fpsimd_flush_thread()
fpsimd_flush_thread() invokes kfree() via sve_free()+sme_free() within a
preempt disabled section which is not working on -RT.

Delay freeing of memory until preemption is enabled again.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220505163207.85751-2-bigeasy@linutronix.de
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-05-16 19:15:45 +01:00
Alexandru Elisei 8d56e5c5a9 arm64: Treat ESR_ELx as a 64-bit register
In the initial release of the ARM Architecture Reference Manual for
ARMv8-A, the ESR_ELx registers were defined as 32-bit registers. This
changed in 2018 with version D.a (ARM DDI 0487D.a) of the architecture,
when they became 64-bit registers, with bits [63:32] defined as RES0. In
version G.a, a new field was added to ESR_ELx, ISS2, which covers bits
[36:32].  This field is used when the Armv8.7 extension FEAT_LS64 is
implemented.

As a result of the evolution of the register width, Linux stores it as
both a 64-bit value and a 32-bit value, which hasn't affected correctness
so far as Linux only uses the lower 32 bits of the register.

Make the register type consistent and always treat it as 64-bit wide. The
register is redefined as an "unsigned long", which is an unsigned
double-word (64-bit quantity) for the LP64 machine (aapcs64 [1], Table 1,
page 14). The type was chosen because "unsigned int" is the most frequent
type for ESR_ELx and because FAR_ELx, which is used together with ESR_ELx
in exception handling, is also declared as "unsigned long". The 64-bit type
also makes adding support for architectural features that use fields above
bit 31 easier in the future.

The KVM hypervisor will receive a similar update in a subsequent patch.

[1] https://github.com/ARM-software/abi-aa/releases/download/2021Q3/aapcs64.pdf

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220425114444.368693-4-alexandru.elisei@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-04-29 19:26:27 +01:00
Mark Brown e0838f6373 arm64/sme: Save and restore streaming mode over EFI runtime calls
When saving and restoring the floating point state over an EFI runtime
call ensure that we handle streaming mode, only handling FFR if we are not
in streaming mode and ensuring that we are in normal mode over the call
into runtime services.

We currently assume that ZA will not be modified by runtime services, the
specification is not yet finalised so this may need updating if that
changes.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220419112247.711548-24-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-04-22 18:51:21 +01:00
Mark Brown d45d7ff704 arm64/sme: Disable streaming mode and ZA when flushing CPU state
Both streaming mode and ZA may increase power consumption when they are
enabled and streaming mode makes many FPSIMD and SVE instructions undefined
which will cause problems for any kernel mode floating point so disable
both when we flush the CPU state. This covers both kernel_neon_begin() and
idle and after flushing the state a reload is always required anyway.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220419112247.711548-23-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-04-22 18:51:19 +01:00
Mark Brown e12310a0d3 arm64/sme: Implement ptrace support for streaming mode SVE registers
The streaming mode SVE registers are represented using the same data
structures as for SVE but since the vector lengths supported and in use
may not be the same as SVE we represent them with a new type NT_ARM_SSVE.
Unfortunately we only have a single 16 bit reserved field available in
the header so there is no space to fit the current and maximum vector
length for both standard and streaming SVE mode without redefining the
structure in a way the creates a complicatd and fragile ABI. Since FFR
is not present in streaming mode it is read and written as zero.

Setting NT_ARM_SSVE registers will put the task into streaming mode,
similarly setting NT_ARM_SVE registers will exit it. Reads that do not
correspond to the current mode of the task will return the header with
no register data. For compatibility reasons on write setting no flag for
the register type will be interpreted as setting SVE registers, though
users can provide no register data as an alternative mechanism for doing
so.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220419112247.711548-21-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-04-22 18:51:15 +01:00
Mark Brown 39782210eb arm64/sme: Implement ZA signal handling
Implement support for ZA in signal handling in a very similar way to how
we implement support for SVE registers, using a signal context structure
with optional register state after it. Where present this register state
stores the ZA matrix as a series of horizontal vectors numbered from 0 to
VL/8 in the endinanness independent format used for vectors.

As with SVE we do not allow changes in the vector length during signal
return but we do allow ZA to be enabled or disabled.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220419112247.711548-20-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-04-22 18:51:13 +01:00
Mark Brown 8bd7f91c03 arm64/sme: Implement traps and syscall handling for SME
By default all SME operations in userspace will trap.  When this happens
we allocate storage space for the SME register state, set up the SVE
registers and disable traps.  We do not need to initialize ZA since the
architecture guarantees that it will be zeroed when enabled and when we
trap ZA is disabled.

On syscall we exit streaming mode if we were previously in it and ensure
that all but the lower 128 bits of the registers are zeroed while
preserving the state of ZA. This follows the aarch64 PCS for SME, ZA
state is preserved over a function call and streaming mode is exited.
Since the traps for SME do not distinguish between streaming mode SVE
and ZA usage if ZA is in use rather than reenabling traps we instead
zero the parts of the SVE registers not shared with FPSIMD and leave SME
enabled, this simplifies handling SME traps. If ZA is not in use then we
reenable SME traps and fall through to normal handling of SVE.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220419112247.711548-17-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-04-22 18:51:05 +01:00
Mark Brown 0033cd9339 arm64/sme: Implement ZA context switching
Allocate space for storing ZA on first access to SME and use that to save
and restore ZA state when context switching. We do this by using the vector
form of the LDR and STR ZA instructions, these do not require streaming
mode and have implementation recommendations that they avoid contention
issues in shared SMCU implementations.

Since ZA is architecturally guaranteed to be zeroed when enabled we do not
need to explicitly zero ZA, either we will be restoring from a saved copy
or trapping on first use of SME so we know that ZA must be disabled.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220419112247.711548-16-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-04-22 18:51:02 +01:00
Mark Brown af7167d6d2 arm64/sme: Implement streaming SVE context switching
When in streaming mode we need to save and restore the streaming mode
SVE register state rather than the regular SVE register state. This uses
the streaming mode vector length and omits FFR but is otherwise identical,
if TIF_SVE is enabled when we are in streaming mode then streaming mode
takes precedence.

This does not handle use of streaming SVE state with KVM, ptrace or
signals. This will be updated in further patches.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220419112247.711548-15-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-04-22 18:51:00 +01:00
Mark Brown b40c559b45 arm64/sme: Implement SVCR context switching
In SME the use of both streaming SVE mode and ZA are tracked through
PSTATE.SM and PSTATE.ZA, visible through the system register SVCR.  In
order to context switch the floating point state for SME we need to
context switch the contents of this register as part of context
switching the floating point state.

Since changing the vector length exits streaming SVE mode and disables
ZA we also make sure we update SVCR appropriately when setting vector
length, and similarly ensure that new threads have streaming SVE mode
and ZA disabled.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220419112247.711548-14-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-04-22 18:50:58 +01:00
Mark Brown a9d6915859 arm64/sme: Implement support for TPIDR2
The Scalable Matrix Extension introduces support for a new thread specific
data register TPIDR2 intended for use by libc. The kernel must save the
value of TPIDR2 on context switch and should ensure that all new threads
start off with a default value of 0. Add a field to the thread_struct to
store TPIDR2 and context switch it with the other thread specific data.

In case there are future extensions which also use TPIDR2 we introduce
system_supports_tpidr2() and use that rather than system_supports_sme()
for TPIDR2 handling.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220419112247.711548-13-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-04-22 18:50:56 +01:00
Mark Brown 9e4ab6c891 arm64/sme: Implement vector length configuration prctl()s
As for SVE provide a prctl() interface which allows processes to
configure their SME vector length.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220419112247.711548-12-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-04-22 18:50:54 +01:00
Mark Brown 12f1bacfc5 arm64/sme: Implement sysctl to set the default vector length
As for SVE provide a sysctl which allows the default SME vector length to
be configured.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220419112247.711548-11-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-04-22 18:50:52 +01:00
Mark Brown b42990d3bf arm64/sme: Identify supported SME vector lengths at boot
The vector lengths used for SME are controlled through a similar set of
registers to those for SVE and enumerated using a similar algorithm with
some slight differences due to the fact that unlike SVE there are no
restrictions on which combinations of vector lengths can be supported
nor any mandatory vector lengths which must be implemented.  Add a new
vector type and implement support for enumerating it.

One slightly awkward feature is that we need to read the current vector
length using a different instruction (or enter streaming mode which
would have the same issue and be higher cost).  Rather than add an ops
structure we add special cases directly in the otherwise generic
vec_probe_vqs() function, this is a bit inelegant but it's the only
place where this is an issue.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220419112247.711548-10-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-04-22 18:50:51 +01:00
Mark Brown 5e64b862c4 arm64/sme: Basic enumeration support
This patch introduces basic cpufeature support for discovering the presence
of the Scalable Matrix Extension.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220419112247.711548-9-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-04-22 18:50:49 +01:00
Mark Brown 432110cd83 arm64/fpsimd: Clarify the purpose of using last in fpsimd_save()
When saving the floating point context in fpsimd_save() we always reference
the state using last-> rather than using current->. Looking at the FP code
in isolation the reason for this is not entirely obvious, it's done because
when KVM is running it will bind the guest context and rely on the host
writing out the guest state on context switch away from the guest.

There's a slight trick here in that KVM still uses TIF_FOREIGN_FPSTATE and
TIF_SVE to communicate what needs to be saved, it maintains those flags
and restores them when it is done running the guest so that the normal
restore paths function when we return back to userspace.

Add a comment to explain this to help future readers work out what's going
on a bit faster.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220124161115.115200-1-broonie@kernel.org
2022-02-08 14:43:22 +00:00
Linus Torvalds 79e06c4c49 RISCV:
- Use common KVM implementation of MMU memory caches
 
 - SBI v0.2 support for Guest
 
 - Initial KVM selftests support
 
 - Fix to avoid spurious virtual interrupts after clearing hideleg CSR
 
 - Update email address for Anup and Atish
 
 ARM:
 - Simplification of the 'vcpu first run' by integrating it into
   KVM's 'pid change' flow
 
 - Refactoring of the FP and SVE state tracking, also leading to
   a simpler state and less shared data between EL1 and EL2 in
   the nVHE case
 
 - Tidy up the header file usage for the nvhe hyp object
 
 - New HYP unsharing mechanism, finally allowing pages to be
   unmapped from the Stage-1 EL2 page-tables
 
 - Various pKVM cleanups around refcounting and sharing
 
 - A couple of vgic fixes for bugs that would trigger once
   the vcpu xarray rework is merged, but not sooner
 
 - Add minimal support for ARMv8.7's PMU extension
 
 - Rework kvm_pgtable initialisation ahead of the NV work
 
 - New selftest for IRQ injection
 
 - Teach selftests about the lack of default IPA space and
   page sizes
 
 - Expand sysreg selftest to deal with Pointer Authentication
 
 - The usual bunch of cleanups and doc update
 
 s390:
 - fix sigp sense/start/stop/inconsistency
 
 - cleanups
 
 x86:
 - Clean up some function prototypes more
 
 - improved gfn_to_pfn_cache with proper invalidation, used by Xen emulation
 
 - add KVM_IRQ_ROUTING_XEN_EVTCHN and event channel delivery
 
 - completely remove potential TOC/TOU races in nested SVM consistency checks
 
 - update some PMCs on emulated instructions
 
 - Intel AMX support (joint work between Thomas and Intel)
 
 - large MMU cleanups
 
 - module parameter to disable PMU virtualization
 
 - cleanup register cache
 
 - first part of halt handling cleanups
 
 - Hyper-V enlightened MSR bitmap support for nested hypervisors
 
 Generic:
 - clean up Makefiles
 
 - introduce CONFIG_HAVE_KVM_DIRTY_RING
 
 - optimize memslot lookup using a tree
 
 - optimize vCPU array usage by converting to xarray
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmHhxvsUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroPZkAf+Nz92UL/5nNGcdHtE4m7AToMmitE9
 bYkesf9BMQvAe5wjkABLuoHGi6ay4jabo4fiGzbdkiK7lO5YgfsWiMB3/MT5fl4E
 jRPzaVQabp3YZLM8UYCBmfUVuRj524S967SfSRe0AvYjDEH8y7klPf4+7sCsFT0/
 Px9Vf2KGuOlf0eM78yKg4rGaF0jS22eLgXm6FfNMY8/e29ZAo/jyUmqBY+Z2xxZG
 aWhceDtSheW1jwLHLj3nOlQJvHTn8LVGXBE/R8Gda3ZjrBV2rKaDi4Fh+HD+dz86
 2zVXwzQ7uck2CMW73GMoXMTWoKSHMyvlBOs1BdvBm4UsnGcXR+q8IFCeuQ==
 =s73m
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "RISCV:

   - Use common KVM implementation of MMU memory caches

   - SBI v0.2 support for Guest

   - Initial KVM selftests support

   - Fix to avoid spurious virtual interrupts after clearing hideleg CSR

   - Update email address for Anup and Atish

  ARM:

   - Simplification of the 'vcpu first run' by integrating it into KVM's
     'pid change' flow

   - Refactoring of the FP and SVE state tracking, also leading to a
     simpler state and less shared data between EL1 and EL2 in the nVHE
     case

   - Tidy up the header file usage for the nvhe hyp object

   - New HYP unsharing mechanism, finally allowing pages to be unmapped
     from the Stage-1 EL2 page-tables

   - Various pKVM cleanups around refcounting and sharing

   - A couple of vgic fixes for bugs that would trigger once the vcpu
     xarray rework is merged, but not sooner

   - Add minimal support for ARMv8.7's PMU extension

   - Rework kvm_pgtable initialisation ahead of the NV work

   - New selftest for IRQ injection

   - Teach selftests about the lack of default IPA space and page sizes

   - Expand sysreg selftest to deal with Pointer Authentication

   - The usual bunch of cleanups and doc update

  s390:

   - fix sigp sense/start/stop/inconsistency

   - cleanups

  x86:

   - Clean up some function prototypes more

   - improved gfn_to_pfn_cache with proper invalidation, used by Xen
     emulation

   - add KVM_IRQ_ROUTING_XEN_EVTCHN and event channel delivery

   - completely remove potential TOC/TOU races in nested SVM consistency
     checks

   - update some PMCs on emulated instructions

   - Intel AMX support (joint work between Thomas and Intel)

   - large MMU cleanups

   - module parameter to disable PMU virtualization

   - cleanup register cache

   - first part of halt handling cleanups

   - Hyper-V enlightened MSR bitmap support for nested hypervisors

  Generic:

   - clean up Makefiles

   - introduce CONFIG_HAVE_KVM_DIRTY_RING

   - optimize memslot lookup using a tree

   - optimize vCPU array usage by converting to xarray"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (268 commits)
  x86/fpu: Fix inline prefix warnings
  selftest: kvm: Add amx selftest
  selftest: kvm: Move struct kvm_x86_state to header
  selftest: kvm: Reorder vcpu_load_state steps for AMX
  kvm: x86: Disable interception for IA32_XFD on demand
  x86/fpu: Provide fpu_sync_guest_vmexit_xfd_state()
  kvm: selftests: Add support for KVM_CAP_XSAVE2
  kvm: x86: Add support for getting/setting expanded xstate buffer
  x86/fpu: Add uabi_size to guest_fpu
  kvm: x86: Add CPUID support for Intel AMX
  kvm: x86: Add XCR0 support for Intel AMX
  kvm: x86: Disable RDMSR interception of IA32_XFD_ERR
  kvm: x86: Emulate IA32_XFD_ERR for guest
  kvm: x86: Intercept #NM for saving IA32_XFD_ERR
  x86/fpu: Prepare xfd_err in struct fpu_guest
  kvm: x86: Add emulation for IA32_XFD
  x86/fpu: Provide fpu_update_guest_xfd() for IA32_XFD emulation
  kvm: x86: Enable dynamic xfeatures at KVM_SET_CPUID2
  x86/fpu: Provide fpu_enable_guest_xfd_features() for KVM
  x86/fpu: Add guest support to xfd_enable_feature()
  ...
2022-01-16 16:15:14 +02:00
Catalin Marinas 945409a6ef Merge branches 'for-next/misc', 'for-next/cache-ops-dzp', 'for-next/stacktrace', 'for-next/xor-neon', 'for-next/kasan', 'for-next/armv8_7-fp', 'for-next/atomics', 'for-next/bti', 'for-next/sve', 'for-next/kselftest' and 'for-next/kcsan', remote-tracking branch 'arm64/for-next/perf' into for-next/core
* arm64/for-next/perf: (32 commits)
  arm64: perf: Don't register user access sysctl handler multiple times
  drivers: perf: marvell_cn10k: fix an IS_ERR() vs NULL check
  perf/smmuv3: Fix unused variable warning when CONFIG_OF=n
  arm64: perf: Support new DT compatibles
  arm64: perf: Simplify registration boilerplate
  arm64: perf: Support Denver and Carmel PMUs
  drivers/perf: hisi: Add driver for HiSilicon PCIe PMU
  docs: perf: Add description for HiSilicon PCIe PMU driver
  dt-bindings: perf: Add YAML schemas for Marvell CN10K LLC-TAD pmu bindings
  drivers: perf: Add LLC-TAD perf counter support
  perf/smmuv3: Synthesize IIDR from CoreSight ID registers
  perf/smmuv3: Add devicetree support
  dt-bindings: Add Arm SMMUv3 PMCG binding
  perf/arm-cmn: Add debugfs topology info
  perf/arm-cmn: Add CI-700 Support
  dt-bindings: perf: arm-cmn: Add CI-700
  perf/arm-cmn: Support new IP features
  perf/arm-cmn: Demarcate CMN-600 specifics
  perf/arm-cmn: Move group validation data off-stack
  perf/arm-cmn: Optimise DTC counter accesses
  ...

* for-next/misc:
  : Miscellaneous patches
  arm64: Use correct method to calculate nomap region boundaries
  arm64: Drop outdated links in comments
  arm64: errata: Fix exec handling in erratum 1418040 workaround
  arm64: Unhash early pointer print plus improve comment
  asm-generic: introduce io_stop_wc() and add implementation for ARM64
  arm64: remove __dma_*_area() aliases
  docs/arm64: delete a space from tagged-address-abi
  arm64/fp: Add comments documenting the usage of state restore functions
  arm64: mm: Use asid feature macro for cheanup
  arm64: mm: Rename asid2idx() to ctxid2asid()
  arm64: kexec: reduce calls to page_address()
  arm64: extable: remove unused ex_handler_t definition
  arm64: entry: Use SDEI event constants
  arm64: Simplify checking for populated DT
  arm64/kvm: Fix bitrotted comment for SVE handling in handle_exit.c

* for-next/cache-ops-dzp:
  : Avoid DC instructions when DCZID_EL0.DZP == 1
  arm64: mte: DC {GVA,GZVA} shouldn't be used when DCZID_EL0.DZP == 1
  arm64: clear_page() shouldn't use DC ZVA when DCZID_EL0.DZP == 1

* for-next/stacktrace:
  : Unify the arm64 unwind code
  arm64: Make some stacktrace functions private
  arm64: Make dump_backtrace() use arch_stack_walk()
  arm64: Make profile_pc() use arch_stack_walk()
  arm64: Make return_address() use arch_stack_walk()
  arm64: Make __get_wchan() use arch_stack_walk()
  arm64: Make perf_callchain_kernel() use arch_stack_walk()
  arm64: Mark __switch_to() as __sched
  arm64: Add comment for stack_info::kr_cur
  arch: Make ARCH_STACKWALK independent of STACKTRACE

* for-next/xor-neon:
  : Use SHA3 instructions to speed up XOR
  arm64/xor: use EOR3 instructions when available

* for-next/kasan:
  : Log potential KASAN shadow aliases
  arm64: mm: log potential KASAN shadow alias
  arm64: mm: use die_kernel_fault() in do_mem_abort()

* for-next/armv8_7-fp:
  : Add HWCAPS for ARMv8.7 FEAT_AFP amd FEAT_RPRES
  arm64: cpufeature: add HWCAP for FEAT_RPRES
  arm64: add ID_AA64ISAR2_EL1 sys register
  arm64: cpufeature: add HWCAP for FEAT_AFP

* for-next/atomics:
  : arm64 atomics clean-ups and codegen improvements
  arm64: atomics: lse: define RETURN ops in terms of FETCH ops
  arm64: atomics: lse: improve constraints for simple ops
  arm64: atomics: lse: define ANDs in terms of ANDNOTs
  arm64: atomics lse: define SUBs in terms of ADDs
  arm64: atomics: format whitespace consistently

* for-next/bti:
  : BTI clean-ups
  arm64: Ensure that the 'bti' macro is defined where linkage.h is included
  arm64: Use BTI C directly and unconditionally
  arm64: Unconditionally override SYM_FUNC macros
  arm64: Add macro version of the BTI instruction
  arm64: ftrace: add missing BTIs
  arm64: kexec: use __pa_symbol(empty_zero_page)
  arm64: update PAC description for kernel

* for-next/sve:
  : SVE code clean-ups and refactoring in prepararation of Scalable Matrix Extensions
  arm64/sve: Minor clarification of ABI documentation
  arm64/sve: Generalise vector length configuration prctl() for SME
  arm64/sve: Make sysctl interface for SVE reusable by SME

* for-next/kselftest:
  : arm64 kselftest additions
  kselftest/arm64: Add pidbench for floating point syscall cases
  kselftest/arm64: Add a test program to exercise the syscall ABI
  kselftest/arm64: Allow signal tests to trigger from a function
  kselftest/arm64: Parameterise ptrace vector length information

* for-next/kcsan:
  : Enable KCSAN for arm64
  arm64: Enable KCSAN
2022-01-05 18:14:32 +00:00
Mark Brown 12b792e5e2 arm64/fp: Add comments documenting the usage of state restore functions
Add comments to help people figure out when fpsimd_bind_state_to_cpu() and
fpsimd_update_current_state() are used.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20211207163250.1373542-1-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-14 18:36:19 +00:00