Commit 2aeaf663b8 introduced strict checking for variable length
payload size validation. The payload length of received data must
match the size of the requested data by the caller except for the case
where the min_out value is set.
The Get Log command does not have a header with a length field set.
The Log size is determined by the Get Supported Logs command (CXL 3.0,
8.2.9.5.1). However, the actual size can be smaller and the number of
valid bytes in the payload output must be determined reading the
Payload Length field (CXL 3.0, Table 8-36, Note 2).
Two issues arise: The command can successfully complete with a payload
length of zero. And, the valid payload length must then also be
consumed by the caller.
Change cxl_xfer_log() to pass the number of payload bytes back to the
caller to determine the number of log entries. Implement the payload
handling as a special case where mbox_cmd->size_out is consulted when
cxl_internal_send_cmd() returns -EIO. A WARN_ONCE() is added to check
that -EIO is only returned in case of an unexpected output size.
Logs can be bigger than the maximum payload length and multiple Get
Log commands can be issued. If the received payload size is smaller
than the maximum payload size we can assume all valid bytes have been
fetched. Stop sending further Get Log commands then.
On that occasion, change debug messages to also report the opcodes of
supported commands.
The variable payload commands GET_LSA and SET_LSA are not affected by
this strict check: SET_LSA cannot be broken because SET_LSA does not
return an output payload, and GET_LSA never expects short reads.
Fixes: 2aeaf663b8 ("cxl/mbox: Add variable output size validation for internal commands")
Signed-off-by: Robert Richter <rrichter@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20230119094934.86067-1-rrichter@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-----BEGIN PGP SIGNATURE-----
iQJDBAABCgAtFiEEOZGx6rniZ1Gk92RdFA3kzBSgKbYFAmPUO0cPHHdzYUBrZXJu
ZWwub3JnAAoJEBQN5MwUoCm2W58QAIQMUVwScHxlEM4N9D7O585wIVFcSksmBdHn
xQREKleAm+84EVXjj2rTFonqIUcVzm367rz2w9YOvtHilJaepTOcB8HRfoi5Mo9o
9gYa2m+7PfqsWJI/DE/Bee97AwK1u7N+L1WoHPVY2ZJpPruw0/Ylr/kNXMf3cKAU
8qTYYuNgc8uMU1b9WZbypYy1g0Tj+dyXE0nmPr1z26obDhsSsTG/dJ4fhoOmlGLC
m1ioVymwlGBcc+aTIHoryiFF2/t4duPtFJ5J4Ks9Dw5brtUCNvq1CHu0u0g94mlB
cEapGuhY6mhtbMkMBA4l7+MiJkeX0ZDBvDdTtSzeiEttHmVZAReWvtPkMIt2P+6z
EgxYRx4Y0H0O/ckWW6zxM8f+Tp3WmYYoWncU7pnuekn78+usX+zFAF9xs9SpWEV5
Mj8qQyDh56Aoaqdh/JP5SVARVMJz8uKn2NgZ32gxHBBx37N1gD6DYhs+N06yFr+X
6V3aqcLKgpMMxLZMUP3vJEwuB+dHJ4JHWGsKW5MjXrBL0Oc17bEfXAjD6bpYas0t
n7eeSk+d5VBT0gdcdgrHqBZyxKcnTUDqT18BS1+L+rvT+axSZxW0R6A1nI4j4QwU
nV8NDKiB1nOSJiPPUuVuWFJcSrHDOilmGAM71d8VuYIvpak8aLqP4xz3Ve2UibU4
hdfy3f/i
=LSQy
-----END PGP SIGNATURE-----
Merge tag 'i2c-for-6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"A bunch of driver fixes with a tiny bit of new IDs"
* tag 'i2c-for-6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: rk3x: fix a bunch of kernel-doc warnings
i2c: axxia: use 'struct' for kernel-doc notation
dt-bindings: i2c: renesas,rzv2m: Fix SoC specific string
i2c: mxs: suppress probe-deferral error message
i2c: designware-pci: Add new PCI IDs for AMD NAVI GPU
i2c: designware: Fix unbalanced suspended flag
i2c: designware: use casting of u64 in clock multiplication to avoid overflow
- fix the -c option in the gpio-event-mode user-space example program
- fix the irq number translation in gpio-ep93xx and make its irqchip immutable
- add a missing spin_unlock in error path in gpio-mxc
- fix a suspend breakage on System76 and Lenovo Gen2a introduced in GPIO ACPI
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEFp3rbAvDxGAT0sefEacuoBRx13IFAmPUKckACgkQEacuoBRx
13KjuQ//fL9MagRQVsE85SVhhMsjmTTbXnvGJOIT0iTTvqgh9DACX9m6EDQQ/8Kd
bKuciI0h2zO5xt/fZPAyJWgGw0w3qvld7f3f6KQE60X4MRyckyZB9xEyHnvuUs+X
v0SiH8rLm+znezFXFiIB3lq004T9DU+wxUCaSq4WH1eE7eqKP8ZXhg2sLUgiskiY
c1Albv2X0xM4duLmPBK4vsXV2x35yj5qS0qkIymezff2sQekzb3TOZsaS7Doznex
cKr2OvJR/VuH4SMv+uIoKjsVKzdvtP/6hBKJtTH5i1krzNsBW/S+/7odwxzdxPVU
EgQc5I7/ubV7FP3Jo3ThO3exTFyd50hl4jKmrkrzzFpeh2xZ2eKU/AD8/MNsPVuA
KAoBUxHnoJfdZuUNL0Cmf1pTmzZbS80qntznzO700QtBaIxS/6redfJ8CKUK4Iy8
f491K3SbzpXzRAjVMBv8ua7JrB8MT82IKxbeXd1bellGrLaNuMzj1Yd8XhWlepnQ
wZxMSo3457FpOghiiMM3YnLSCcpbs4WU8VV/NMCc8jfS14oq7gA26EalQLPwKeCF
5c+UMAZ95gafJwPjuZcDJcRCGpDlhqPawRBo5cDVUlSokh1PSmPhcGNtE8XKUKlE
fHyd6WEYn8AzXnf3PMceiuJYCADs8wtg87GHKuLF1PPXmcqM4C8=
=wh3L
-----END PGP SIGNATURE-----
Merge tag 'gpio-fixes-for-v6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
- fix the -c option in the gpio-event-mode user-space example program
- fix the irq number translation in gpio-ep93xx and make its irqchip
immutable
- add a missing spin_unlock in error path in gpio-mxc
- fix a suspend breakage on System76 and Lenovo Gen2a introduced in
GPIO ACPI
* tag 'gpio-fixes-for-v6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
tools: gpio: fix -c option of gpio-event-mon
gpio: ep93xx: remove unused variable
gpio: ep93xx: Make irqchip immutable
gpio: ep93xx: Fix port F hwirq numbers in handler
gpio: mxc: Unlock on error path in mxc_flip_edge()
gpiolib-acpi: Don't set GPIOs for wakeup in S3 mode
A fix for the DT binding documentation which dropped a property
when being converted to YAML format causing spurious errors
validating device trees for platforms using the device.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmPT/N4ACgkQJNaLcl1U
h9DebQf+O5ghcGfIu5RnFV5NMaUPlkVKJqY51iJQH50+CNqFAzmCIM8uBQjUxeE9
Ujxenq31RxnBPBzdRZzuKYEWnM7Q2qPUp/dWrcdxJVpBYZuhq+ExQ9teUsf/EeIT
95DorQ9jfOqlV18HNHNCoH8xn5clnW8iDIdciSzfpp3pX+trWAefeIaOlU5H3tKj
Lb1gEf29LrE6zDsDRKjL1tuyQw/ED8MpKzcX6/Pjm6He5hWXpof1kSMm8Z085XQF
pB8nuJoipQIbbj/cQm/eAwb8R5AUbotTKxDwamdVtPYm+3FEg3fi/SIBOdbDvRKc
yQS42il5cE+YPozy9myeVps6d8/60A==
=7HDJ
-----END PGP SIGNATURE-----
Merge tag 'regulator-fix-v6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fix from Mark Brown:
"A fix for the DT binding documentation which dropped a property when
being converted to YAML format causing spurious errors validating
device trees for platforms using the device"
* tag 'regulator-fix-v6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: dt-bindings: samsung,s2mps14: add lost samsung,ext-control-gpios
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCY9PrPQAKCRDh3BK/laaZ
PKk5AP9UUlwGP2XIuCY7hMWvsZKe1FpAXyXzG3jrEmRyBmOEFQD/RMRItvlj330O
ntPw7luRC4Us4TO/xc3OqVE0UUnwqQw=
=sqlV
-----END PGP SIGNATURE-----
Merge tag 'ovl-fixes-6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs fixes from Miklos Szeredi:
"Fix two bugs, a recent one introduced in the last cycle, and an older
one from v5.11"
* tag 'ovl-fixes-6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: fail on invalid uid/gid mapping at copy up
ovl: fix tmpfile leak
The Bananapi-M3 has a SATA connector, driven by a USB-to-SATA bridge
soldered on the board. The power for the SATA device is provided by a
GPIO controlled regulator. Since the SATA device is behind USB, it has
no DT node, so we never described this regulator. Instead U-Boot was
turning this on in a rather hackish way, which we now want to get rid of.
On top of that it seems fragile to leave this GPIO undescribed, as
userland could claim it and turn the disk off.
Add a fixed regulator, controlled by the PD25 GPIO, and mark it as
always-on. This would mimic the current situation, but in a safer way,
and would allow U-Boot to drop the CONFIG_SATAPWR enable hack.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Samuel Holland <samuel@sholland.org>
Link: https://lore.kernel.org/r/20230120012616.30960-1-andre.przywara@arm.com
Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com>
The property named in the schema is 'enable-gpios', not 'enable-gpio'.
This makes no difference at runtime, because the regulator is marked as
always-on, but it breaks validation.
Fixes: 4701fc6e5d ("ARM: dts: sun8i: add FriendlyARM NanoPi Duo2")
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Signed-off-by: Samuel Holland <samuel@sholland.org>
Link: https://lore.kernel.org/r/20221231225854.16320-2-samuel@sholland.org
Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com>
These board devicetrees fail to validate because the gpio-leds schema
requires its child nodes to have "led" in the node name.
Signed-off-by: Samuel Holland <samuel@sholland.org>
Reviewed-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Link: https://lore.kernel.org/r/20221231225854.16320-1-samuel@sholland.org
Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com>
drm:
- DP MST kref fix
- fb_helper: check return value
i915:
- Fix BSC default context for Meteor Lake
- Fix selftest-scheduler's modify_type
- memory leak fix
amdgpu:
- GC11.x fixes
- SMU13.0.0 fix
- Freesync video fix
- DP MST fixes
- build fix
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmPTXuMACgkQDHTzWXnE
hr5kaw/+JVm3ghpcdjueHjanKEIk1gzpXY9mqfv2VyNoJvl+9Tf9IdCZ3KDJIH4u
/fCR10IAWO5rUDE6pYELkwE78Fj4d0Tqj/OcuLTdIPsdzsvmh2gMdlKB24BiWods
T4mFca0xpq4Ogeh2sfvBmNxUejK/4syiKYyLkvMKR51Nt9M0iF0WEY6RYgm/XYSz
sLAkeAgh+YdEfH94zEj0U1Hi+N5RG1bQGbbrEbKsjrf10XoyICzP2U0MbtECQhQt
dqYaLSnrQEiNybOtdG+H/oi4Gvn7MXe7QRSBBcgLtHYG+CR89rbw1imgVHDQhxA3
G2SLUqTMdZ2m9oG8KRe1HTVr0kXl063Sdv7XWmJx5pZoXlKj5GuSGs2eaG+tdm3y
SlwaOzSdo3v4ad5txFS1GKlpas8bhSFQ0XSQfJ0wXt7rXzz+1R4v8OJ8LWdLu/gP
mFIMZ8o5U850hq8Y6m//Ivwff5MzPmrvcG53my9lr7D7/YH1J+UvmhfK9JtfllDe
GxC4GXQoodhIqvgxGlJ1YX2GmJRGfrplhYPmdWSeNoUn+zPinOGFdWqgH50bKNvS
0aykDMHBpsdxnzjugVaOGxpwCeYX/GZtcuJKCU7Ak/bGzuKoFU+aKKYFhO2u14pe
N2pxqP+RRtHzif5vmIKSkjACYCAfjaaVvHcEMv2C1f5ySXXRTSM=
=w7XH
-----END PGP SIGNATURE-----
Merge tag 'drm-fixes-2023-01-27' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Fairly small this week as well, i915 has a memory leak fix and some
minor changes, and amdgpu has some MST fixes, and some other minor
ones:
drm:
- DP MST kref fix
- fb_helper: check return value
i915:
- Fix BSC default context for Meteor Lake
- Fix selftest-scheduler's modify_type
- memory leak fix
amdgpu:
- GC11.x fixes
- SMU13.0.0 fix
- Freesync video fix
- DP MST fixes
- build fix"
* tag 'drm-fixes-2023-01-27' of git://anongit.freedesktop.org/drm/drm:
amdgpu: fix build on non-DCN platforms.
drm/amd/display: Fix timing not changning when freesync video is enabled
drm/display/dp_mst: Correct the kref of port.
drm/amdgpu/display/mst: update mst_mgr relevant variable when long HPD
drm/amdgpu/display/mst: limit payload to be updated one by one
drm/amdgpu/display/mst: Fix mst_state->pbn_div and slot count assignments
drm/amdgpu: declare firmware for new MES 11.0.4
drm/amdgpu: enable imu firmware for GC 11.0.4
drm/amd/pm: add missing AllowIHInterrupt message mapping for SMU13.0.0
drm/amdgpu: remove unconditional trap enable on add gfx11 queues
drm/fb-helper: Use a per-driver FB deferred I/O handler
drm/fb-helper: Check fb_deferred_io_init() return value
drm/i915/selftest: fix intel_selftest_modify_policy argument types
drm/i915/mtl: Fix bcs default context
drm/i915: Fix a memory leak with reused mmap_offset
drm/drm_vma_manager: Add drm_vma_node_allow_once()
- Add support for USB host/device configuration on RZ/N1,
- Add PLL2 programming support, and CAN-FD clocks on R-Car V4H,
- Miscellaneous fixes and improvements.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQQ9qaHoIs/1I4cXmEiKwlD9ZEnxcAUCY9OhkwAKCRCKwlD9ZEnx
cHPuAQCo1Id5WB+ho1fs0L1OCrMR+3k9xoFuSWdq1OzKnQ1vmAEAq50oJT9U8+Ge
LFz+RyJVvlVkpKNhYT40OzIXIn0KmwE=
=7Cjr
-----END PGP SIGNATURE-----
Merge tag 'renesas-clk-for-v6.3-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-drivers into clk-renesas
Pull more Renesas clk driver updates from Geert Uytterhoeven:
- Add support for USB host/device configuration on RZ/N1
- Add PLL2 programming support, and CAN-FD clocks on R-Car V4H
- Miscellaneous fixes and improvements
* tag 'renesas-clk-for-v6.3-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-drivers:
clk: renesas: r8a779g0: Add CAN-FD clocks
clk: renesas: r8a779g0: Tidy up DMAC name on SYS-DMAC
clk: renesas: r8a779a0: Tidy up DMAC name on SYS-DMAC
clk: renesas: r8a779g0: Add custom clock for PLL2
clk: renesas: cpg-mssr: Remove superfluous check in resume code
clk: renesas: r9a06g032: Handle h2mode setting based on USBF presence
Add locking to the Intel int340x thermal control driver to prevent
its thermal zone callbacks from racing with firmware-induced thermal
trip point updates (Srinivas Pandruvada, Rafael Wysocki).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmPT4RMSHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxytIP/0rs3UsBprl3ShRbdILjwkZ/tMWvgkx0
92Uv47Q5d69ydbc2/4uzALPykJkbDlglrg5qaTwfqekNsiFxcV8ii+ztmvnTgkoD
5CvgSO28/YaVEetbF5fnG+9Ckuq3IA/oaCsRJl5ODccm2k++HFkOB6o4k8SWK0Eg
KdXyHvQdF8uiR7HjVMHX4xScmICuYp22Fcj6P++A6+QkwiOUMQ5hDTBJv0+IaSlh
z9n3C7Wq5Q+sMUZN3rU1F0oLRpULUmZPaamrp2IHHQtsmzyrtSr90cZcLjFA1IrQ
p5RMoT3M8nJc9X8WMALP62o4r/HRI2U1lVejxVlnf/gGLfqhf10v3VvDX8UfCodn
RkVbT+RmgxMDBb5GpvHrJMKzMvjwyZnDLA1ZiIHiP1ok3NFiUCG+j1bYF35mgJIj
GX+rg01Qv0royn3FBePfbfYfDs5c+/elpOAu3Zs/DPi/R2hoFSYt/7iVBH3BYtvm
qMjLucR/GochCo6X4RJsCB0KxotcaEahc6rr4V7D0a9kCEuAjCRPdMtnvU3orFhh
b022ORJIbH8MhK43U2m+q4JXcYqbjm094smBwlD5NvIlOJfdWLo/5rXLfp7zPDub
YPbB5OiIV5sVs61IP71KF/LEYnVjySOLyyEpFr7HBhsZDtIBnJwF8ivC93ZrK7MY
Z70GgUqV5NX/
=wNe5
-----END PGP SIGNATURE-----
Merge tag 'thermal-6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull thermal control fixes from Rafael Wysocki:
"Add locking to the Intel int340x thermal control driver to prevent its
thermal zone callbacks from racing with firmware-induced thermal trip
point updates (Srinivas Pandruvada, Rafael Wysocki)"
* tag 'thermal-6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: intel: int340x: Add locking to int340x_thermal_get_trip_type()
thermal: intel: int340x: Protect trip temperature from concurrent updates
The GuC specific register state entry in the error capture object was
just called 'capture'. Although the companion 'node' entry was called
'guc_capture_node'. Rename the base entry to be 'guc_capture' instead
so that it is a) more consistent and b) more obvious what it is.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230127002842.3169194-9-John.C.Harrison@Intel.com
For understanding bug reports, it can be useful to have an explicit
dmesg print when a reset notification is received from GuC. As opposed
to simply inferring that this happened from other messages.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230127002842.3169194-8-John.C.Harrison@Intel.com
Engine resets are supposed to never fail. But in the case when one
does (due to unknown reasons that normally come down to a missing
w/a), it is useful to get as much information out of the system as
possible. Given that the GuC intentionally dies on such a situation,
it is not possible to get a guilty context notification back. So do a
manual search instead. Given that GuC is dead, this is safe because
GuC won't be changing the engine state asynchronously.
v2: Change comment to be less alarming (Tvrtko)
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230127002842.3169194-7-John.C.Harrison@Intel.com
A hang situation has been observed where the only requests on the
context were either completed or not yet started according to the
breaadcrumbs. However, the register state claimed a batch was (maybe)
in progress. So, allow capture of the pending request on the grounds
that this might be better than nothing.
v2: Reword 'not started' warning message (Tvrtko)
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230127002842.3169194-6-John.C.Harrison@Intel.com
There was a report of error captures occurring without any hung
context being indicated despite the capture being initiated by a 'hung
context notification' from GuC. The problem was not reproducible.
However, it is possible to happen if the context in question has no
active requests. For example, if the hang was in the context switch
itself then the breadcrumb write would have occurred and the KMD would
see an idle context.
In the interests of attempting to provide as much information as
possible about a hang, it seems wise to include the engine info
regardless of whether a request was found or not. As opposed to just
prentending there was no hang at all.
So update the error capture code to always record engine information
if a context is given. Which means updating record_context() to take a
context instead of a request (which it only ever used to find the
context anyway). And split the request agnostic parts of
intel_engine_coredump_add_request() out into a seaprate function.
v2: Remove a duplicate 'if' statement (Umesh) and fix a put of a null
pointer.
v3: Tidy up request locking code flow (Tvrtko)
v4: Pull in improved info message from next patch and fix up potential
leak of GuC register state (Daniele)
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> (v2)
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230127002842.3169194-5-John.C.Harrison@Intel.com
The debugfs dump of requests was confused about what state requires
the execlist lock versus the GuC lock. There was also a bunch of
duplicated messy code between it and the error capture code.
So refactor the hung request search into a re-usable function. And
reduce the span of the execlist state lock to only the execlist
specific code paths. In order to do that, also move the report of hold
count (which is an execlist only concept) from the top level dump
function to the lower level execlist specific function. Also, move the
execlist specific code into the execlist source file.
v2: Rename some functions and move to more appropriate files (Daniele).
v3: Rename new execlist dump function (Daniele)
Fixes: dc0dad365c ("drm/i915/guc: Fix for error capture after full GPU reset with GuC")
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Cc: Michael Cheng <michael.cheng@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Bruce Chang <yu.bruce.chang@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230127002842.3169194-4-John.C.Harrison@Intel.com
When GuC support was added to error capture, the reference counting
around the request object was broken. Fix it up.
The context based search manages the spinlocking around the search
internally. So it needs to grab the reference count internally as
well. The execlist only request based search relies on external
locking, so it needs an external reference count but within the
spinlock not outside it.
The only other caller of the context based search is the code for
dumping engine state to debugfs. That code wasn't previously getting
an explicit reference at all as it does everything while holding the
execlist specific spinlock. So, that needs updaing as well as that
spinlock doesn't help when using GuC submission. Rather than trying to
conditionally get/put depending on submission model, just change it to
always do the get/put.
v2: Explicitly document adding an extra blank line in some dense code
(Andy Shevchenko). Fix multiple potential null pointer derefs in case
of no request found (some spotted by Tvrtko, but there was more!).
Also fix a leaked request in case of !started and another in
__guc_reset_context now that intel_context_find_active_request is
actually reference counting the returned request.
v3: Add a _get suffix to intel_context_find_active_request now that it
grabs a reference (Daniele).
v4: Split the intel_guc_find_hung_context change to a separate patch
and rename intel_context_find_active_request_get to
intel_context_get_active_request (Tvrtko).
v5: s/locking/reference counting/ in commit message (Tvrtko)
Fixes: dc0dad365c ("drm/i915/guc: Fix for error capture after full GPU reset with GuC")
Fixes: 573ba126ae ("drm/i915/guc: Capture error state on context reset")
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Cc: Michael Cheng <michael.cheng@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Tejas Upadhyay <tejaskumarx.surendrakumar.upadhyay@intel.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: Bruce Chang <yu.bruce.chang@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230127002842.3169194-3-John.C.Harrison@Intel.com
intel_guc_find_hung_context() was not acquiring the correct spinlock
before searching the request list. So fix that up. While at it, add
some extra whitespace padding for readability.
Fixes: dc0dad365c ("drm/i915/guc: Fix for error capture after full GPU reset with GuC")
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Cc: Michael Cheng <michael.cheng@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Tejas Upadhyay <tejaskumarx.surendrakumar.upadhyay@intel.com>
Cc: Chris Wilson <chris.p.wilson@intel.com>
Cc: Bruce Chang <yu.bruce.chang@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230127002842.3169194-2-John.C.Harrison@Intel.com
- Fix event counting regression in Arm CMN PMU driver due to broken optimisation
-----BEGIN PGP SIGNATURE-----
iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmPUBmoQHHdpbGxAa2Vy
bmVsLm9yZwAKCRC3rHDchMFjND5ICACD5MypW0I+BtQiRRG2/yq+hAYgJN2TVnJC
+o//Atj7aiwI5EIv+SxFhhwlRNsWG8DnwotetdO7YTqQWskDBNSSX6LO9bxjt7N8
eGERr97IK2hOYlLgA5gStKSp4FJL6WLysNXwoyy+Ff6pBtLTFOZHdgUEsEV55XIL
yCK8DdUU+J4CbcQX1EKwdj4yyJ2Hu2ThMvNnIpC6OrNg4eHIIP4fZbE+EKGNdcA6
Xh+2tSfqUeClqOqbXKnshH9DLgywBgIwW6JUnKHJynNNUrJni6oKqoeBAEJUJkol
prmvEr1if/++0QufOHRHZIx1VyeCFN/vwXrY6nZ8AhMRmjwrTVia
=Dfxt
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fix from Will Deacon:
- Fix event counting regression in Arm CMN PMU driver due to broken
optimisation
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
Partially revert "perf/arm-cmn: Optimise DTC counter accesses"
* A few DT bindings fixes to more closely align the ISA string
requirements between the bindings and the ISA manual.
* A handful of build error/warning fixes.
* A fix to move init_cpu_topology() later in the boot flow, so it can
allocate memory.
* The IRC channel is now in the MAINTAINERS file, so it's easier to
find.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmPUC6kTHHBhbG1lckBk
YWJiZWx0LmNvbQAKCRAuExnzX7sYibIoD/9WvCKADgFTdAsKx+daiKebpuWvCjUv
du8/l+QE9xdmpbISdA5tpzHVLQNc3oqM2AtgeKtxdX5mG1DndtIuigM9k8hoF5p9
id9nWnIAHI1t5/yigmd4NPMNBB23Lsh+NfFSfP+fn1Q9F3ZmUCvVnotu91+Qs9lQ
ZYeb2l2jEJU19KV79Do+zNisJ5iXMMl7/5GIFjov6vUpVw5jIU5C6gbVEWHrgSEe
xGr7aZBJPIszcM6L7nTVWrikdK4gn417ydqBrBUGN8n2skve6Q32snfN8zrs0DV7
YYJfo+Gh2j2K5pQnpLganpbQIGBm1u9cx5sKboD6X/eAsyzlgjGelH13uA/1ZNB1
rqUsr5V11TN2RconXK/kkox+pusq3/KRDB01focndJdHKNDJo5bZps6uH/GM9aog
3wMafDLWd4LeBJiPbJx49834wy277GGbyxBpSar7uIRrhX7IPwLkj3tTU3DCP4kK
xvrk2LhitdgLOzGSDMoPjTPTpXlqrXwb3bh4NITxx95XXOAE03/4W6FhYKRbgWub
JDmX0JKGkcgWC/JX4TMAFLZU7AvjKpZBLoTTMrAe+y/bzHASBgFBrzqUVaKZ8GgC
KV91OzOLJb1/np7kZCQHxO/72v5YPbPcfEyNUQwm8KC4/8CPf8/tq5eWgWsw9xBM
GZjoUco1Y+9RCQ==
=ZHbl
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- A few DT bindings fixes to more closely align the ISA string
requirements between the bindings and the ISA manual.
- A handful of build error/warning fixes.
- A fix to move init_cpu_topology() later in the boot flow, so it can
allocate memory.
- The IRC channel is now in the MAINTAINERS file, so it's easier to
find.
* tag 'riscv-for-linus-6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Move call to init_cpu_topology() to later initialization stage
riscv/kprobe: Fix instruction simulation of JALR
riscv: fix -Wundef warning for CONFIG_RISCV_BOOT_SPINWAIT
MAINTAINERS: add an IRC entry for RISC-V
RISC-V: fix compile error from deduplicated __ALTERNATIVE_CFG_2
dt-bindings: riscv: fix single letter canonical order
dt-bindings: riscv: fix underscore requirement for multi-letter extensions
This Kselftest fixes update for Linux 6.2-rc6 consists of a single
fix to a amd-pstate test Makefile bug that deletes source files
during make clean run.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmPS4awACgkQCwJExA0N
QxyxIA/+LSuXceaj895VX/wwCeB9SFarG5oZesX5sg4Z8RID91w505hY+ekuKyUT
6Wt2sjc9XJ3RJfPZ5L9fK1TncIjkkQbKoDIJjKZ9nM10zT4wj0fK+vjnKY96Da9w
s70stRKr0zyVE+uPV5oEuj89LsZITuOsDJxbxcF7RWHOCid+nYaO8RdoANAHMPnF
f70ziaAAb8zNqv55Lnh05N+gd7VptaJ6CuS3d/WxRABhNgqSHheEMWymY51G0aO1
3Om3/HLVFWLr7nqdt7Vnc2EjcllzwxaRnIACctmLIzVlabJSuDm4FtDpOU2D+sG4
pGW/NVc+ydrM1PJuugery7tMMP1vV+frhhDvfnuRdLaPFl5Wxc0W8D7WOgMce0fo
hK+wahfIrcNrXX8mQQi54P0K5imIZcgPxpZz3rgZHrbP+hJmDdStMh93zCLLfL/t
zmVYf9xYJzb32OOkmjZdN0AKNDlDowgZQcbesmH22bDVMuKORb6YG920J6Qa2pmK
5A168cLYS5yDHtbouZlcDK5XAroydcAYWGLXmVMoLKpS2Sa58DsKlVgq7BeHcUgX
dQLhfS18YsW9XKiKYakWb6z67B+MQ0SulnOdOLhOO3ekds04hWFSASR4nFvDKHce
h9MZKBk2osHYfX//RCfmyCVqE4SleMXBWNzFOfrfm0de0d+NILQ=
=Wb/P
-----END PGP SIGNATURE-----
Merge tag 'linux-kselftest-fixes-6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull Kselftest fixes from Shuah Khan:
"A single fix to a amd-pstate test Makefile bug that deletes source
files during make clean run"
* tag 'linux-kselftest-fixes-6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: amd-pstate: Don't delete source files via Makefile
Use strchr() instead of open coding it as it's done elsewhere in
the same file. Either we will have similar to what it was or possibly
better performance in case architecture implements its own strchr().
Memory wise on x86_64 bloat-o-meter shows the following
Function old new delta
strsep 111 102 -9
Total: Before=2763, After=2754, chg -0.33%
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230127155135.27153-1-andriy.shevchenko@linux.intel.com
To work around a Clang __builtin_object_size bug that shows up under
CONFIG_FORTIFY_SOURCE and UBSAN_BOUNDS, move the per-loop-iteration
mem_block wipe into a single wipe of the entire pool structure after
the loop.
Reported-by: Nathan Chancellor <nathan@kernel.org>
Link: https://github.com/ClangBuiltLinux/linux/issues/1780
Cc: Weili Qian <qianweili@huawei.com>
Cc: Zhou Wang <wangzhou1@hisilicon.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-crypto@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Tested-by: Nathan Chancellor <nathan@kernel.org> # build
Link: https://lore.kernel.org/r/20230106041945.never.831-kees@kernel.org
Zero-length arrays are deprecated[1]. Replace struct i40e_lump_tracking's
"list" 0-length array with a flexible array. Detected with GCC 13,
using -fstrict-flex-arrays=3:
In function 'i40e_put_lump',
inlined from 'i40e_clear_interrupt_scheme' at drivers/net/ethernet/intel/i40e/i40e_main.c:5145:2:
drivers/net/ethernet/intel/i40e/i40e_main.c:278:27: warning: array subscript <unknown> is outside array bounds of 'u16[0]' {aka 'short unsigned int[]'} [-Warray-bounds=]
278 | pile->list[i] = 0;
| ~~~~~~~~~~^~~
drivers/net/ethernet/intel/i40e/i40e.h: In function 'i40e_clear_interrupt_scheme':
drivers/net/ethernet/intel/i40e/i40e.h:179:13: note: while referencing 'list'
179 | u16 list[0];
| ^~~~
[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Cc: intel-wired-lan@lists.osuosl.org
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20230105234557.never.799-kees@kernel.org
Zero-length arrays are deprecated[1]. Replace struct io_uring_buf_ring's
"bufs" with a flexible array member. (How is the size of this array
verified?) Detected with GCC 13, using -fstrict-flex-arrays=3:
In function 'io_ring_buffer_select',
inlined from 'io_buffer_select' at io_uring/kbuf.c:183:10:
io_uring/kbuf.c:141:23: warning: array subscript 255 is outside the bounds of an interior zero-length array 'struct io_uring_buf[0]' [-Wzero-length-bounds]
141 | buf = &br->bufs[head];
| ^~~~~~~~~~~~~~~
In file included from include/linux/io_uring.h:7,
from io_uring/kbuf.c:10:
include/uapi/linux/io_uring.h: In function 'io_buffer_select':
include/uapi/linux/io_uring.h:628:41: note: while referencing 'bufs'
628 | struct io_uring_buf bufs[0];
| ^~~~
[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays
Fixes: c7fb19428d ("io_uring: add support for ring mapped supplied buffers")
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Pavel Begunkov <asml.silence@gmail.com>
Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Cc: stable@vger.kernel.org
Cc: io-uring@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20230105190507.gonna.131-kees@kernel.org
With clang's kernel control flow integrity (kCFI, CONFIG_CFI_CLANG),
indirect call targets are validated against the expected function
pointer prototype to make sure the call target is valid to help mitigate
ROP attacks. If they are not identical, there is a failure at run time,
which manifests as either a kernel panic or thread getting killed.
ext4_feat_ktype was setting the "release" handler to "kfree", which
doesn't have a matching function prototype. Add a simple wrapper
with the correct prototype.
This was found as a result of Clang's new -Wcast-function-type-strict
flag, which is more sensitive than the simpler -Wcast-function-type,
which only checks for type width mismatches.
Note that this code is only reached when ext4 is a loadable module and
it is being unloaded:
CFI failure at kobject_put+0xbb/0x1b0 (target: kfree+0x0/0x180; expected type: 0x7c4aa698)
...
RIP: 0010:kobject_put+0xbb/0x1b0
...
Call Trace:
<TASK>
ext4_exit_sysfs+0x14/0x60 [ext4]
cleanup_module+0x67/0xedb [ext4]
Fixes: b99fee58a2 ("ext4: create ext4_feat kobject dynamically")
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: stable@vger.kernel.org
Build-tested-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20230103234616.never.915-kees@kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20230104210908.gonna.388-kees@kernel.org
One-element arrays are deprecated, and we are replacing them with
flexible array members instead. So, replace one-element array with
flexible-array member in struct gvt_firmware_header and refactor the
rest of the code accordingly.
Additionally, previous implementation was allocating 8 bytes more than
required to represent firmware_header + cfg_space data + mmio data.
This helps with the ongoing efforts to tighten the FORTIFY_SOURCE
routines on memcpy() and help us make progress towards globally
enabling -fstrict-flex-arrays=3 [1].
To make reviewing this patch easier, I'm pasting before/after struct
sizes.
pahole -C gvt_firmware_header before/drivers/gpu/drm/i915/gvt/firmware.o
struct gvt_firmware_header {
u64 magic; /* 0 8 */
u32 crc32; /* 8 4 */
u32 version; /* 12 4 */
u64 cfg_space_size; /* 16 8 */
u64 cfg_space_offset; /* 24 8 */
u64 mmio_size; /* 32 8 */
u64 mmio_offset; /* 40 8 */
unsigned char data[1]; /* 48 1 */
/* size: 56, cachelines: 1, members: 8 */
/* padding: 7 */
/* last cacheline: 56 bytes */
};
pahole -C gvt_firmware_header after/drivers/gpu/drm/i915/gvt/firmware.o
struct gvt_firmware_header {
u64 magic; /* 0 8 */
u32 crc32; /* 8 4 */
u32 version; /* 12 4 */
u64 cfg_space_size; /* 16 8 */
u64 cfg_space_offset; /* 24 8 */
u64 mmio_size; /* 32 8 */
u64 mmio_offset; /* 40 8 */
unsigned char data[]; /* 48 0 */
/* size: 48, cachelines: 1, members: 8 */
/* last cacheline: 48 bytes */
};
As you can see the additional byte of the fake-flexible array (data[1])
forced the compiler to pad the struct but those bytes aren't actually used
as first & last bytes (of both cfg_space and mmio) are controlled by the
<>_size and <>_offset members present in the gvt_firmware_header struct.
Link: https://github.com/KSPP/linux/issues/79
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101836 [1]
Signed-off-by: Paulo Miguel Almeida <paulo.miguel.almeida.rodenas@gmail.com>
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/Y6Eu2604cqtryP4g@mail.google.com
Both Coverity and GCC with -Wstringop-overflow noticed that
nvif_outp_acquire_dp() accidentally defined its second argument with 1
additional element:
drivers/gpu/drm/nouveau/dispnv50/disp.c: In function 'nv50_pior_atomic_enable':
drivers/gpu/drm/nouveau/dispnv50/disp.c:1813:17: error: 'nvif_outp_acquire_dp' accessing 16 bytes in a region of size 15 [-Werror=stringop-overflow=]
1813 | nvif_outp_acquire_dp(&nv_encoder->outp, nv_encoder->dp.dpcd, 0, 0, false, false);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/nouveau/dispnv50/disp.c:1813:17: note: referencing argument 2 of type 'u8[16]' {aka 'unsigned char[16]'}
drivers/gpu/drm/nouveau/include/nvif/outp.h:24:5: note: in a call to function 'nvif_outp_acquire_dp'
24 | int nvif_outp_acquire_dp(struct nvif_outp *, u8 dpcd[16],
| ^~~~~~~~~~~~~~~~~~~~
Avoid these warnings by defining the argument size using the matching
define (DP_RECEIVER_CAP_SIZE, 15) instead of having it be a literal
(and incorrect) value (16).
Reported-by: coverity-bot <keescook+coverity-bot@chromium.org>
Addresses-Coverity-ID: 1527269 ("Memory - corruptions")
Addresses-Coverity-ID: 1527268 ("Memory - corruptions")
Link: https://lore.kernel.org/lkml/202211100848.FFBA2432@keescook/
Link: https://lore.kernel.org/lkml/202211100848.F4C2819BB@keescook/
Fixes: 8134437213 ("drm/nouveau/disp: move DP link config into acquire")
Reviewed-by: Lyude Paul <lyude@redhat.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Karol Herbst <kherbst@redhat.com>
Cc: David Airlie <airlied@gmail.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: "Gustavo A. R. Silva" <gustavo@embeddedor.com>
Cc: dri-devel@lists.freedesktop.org
Cc: nouveau@lists.freedesktop.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221127183036.never.139-kees@kernel.org
Current link for NAPI documentation in ice driver doesn't work - it
returns 404. Update the link to the working one.
Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The PF controls the set of queues that the RDMA auxiliary_driver requests
resources from. The set_channel command will alter that pool and trigger a
reconfiguration of the VSI, which breaks RDMA functionality.
Prevent set_channel from executing when RDMA driver bound to auxiliary
device.
Adding a locked variable to pass down the call chain to avoid double
locking the device_lock.
Fixes: 348048e724 ("ice: Implement iidc operations")
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
build_id__init() only copies the buildid data up to size leaving the
rest of the data array uninitialized. Copying the full array during
synthesis means the written event contains uninitialized memory.
Ensure the size is less that the buffer size and only copy the bytes
that were initialized. This was detected by the Clang/LLVM memory
sanitizer.
v2. Avoids the potential for copying too much as suggested by Arnaldo.
Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Rix <trix@redhat.com>
Cc: llvm@lists.linux.dev
Link: https://lore.kernel.org/r/20230120185828.43231-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Setup is non-trivial so also link to the full SPE docs.
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-perf-users@vger.kernel.or
Link: https://lore.kernel.org/r/20230124145929.557891-1-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Make the sink error message more similar to the event error message that
reminds about missing kernel support. The available sinks are also
determined by the hardware so mention that too.
Also, usually it's not necessary to specify the sink, so add that as a
hint.
Now the error for a made up sink looks like this:
$ perf record -e cs_etm/@abc/
Couldn't find sink "abc" on event cs_etm/@abc/.
Missing kernel or device support?
Hint: An appropriate sink will be picked automatically if one isn't is specified.
For any error other than ENOENT, the same message as before is
displayed.
Signed-off-by: James Clark <james.clark@arm.com>
Acked-by: Suzuki Poulouse <suzuki.poulose@arm.com>
Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/ec7502e6-b406-3997-c2a5-24f98e5c4854@arm.com
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230124110220.460551-1-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian has been reviewing perf tooling patches consistently for a long
time, so lets reflect that in the MAINTAINERS file so that contributors
add him to the CC list in patch submissions.
Reviewed-by: Ian Rogers <irogers@google.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Update ptrace tests according to all potential Yama security policies.
This is required to make such tests pass even if Yama is enabled.
Tests are not skipped but they now check both Landlock and Yama boundary
restrictions at run time to keep a maximum test coverage (i.e. positive
and negative testing).
Signed-off-by: Jeff Xu <jeffxu@google.com>
Link: https://lore.kernel.org/r/20230114020306.1407195-2-jeffxu@google.com
Cc: stable@vger.kernel.org
[mic: Add curly braces around EXPECT_EQ() to make it build, and improve
commit message]
Co-developed-by: Mickaël Salaün <mic@digikod.net>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Currently dump_kernel_instr() dumps a few instructions around the
pt_regs::pc value, dumping 4 instructions before the PC before dumping
the instruction at the PC. If an attempt to read an instruction fails,
it gives up and does not attempt to dump any subsequent instructions.
This is unfortunate when the pt_regs::pc value points to the start of a
page with a leading guard page, where the instruction at the PC can be
read, but prior instructions cannot.
This patch makes dump_kernel_instr() attempt to dump each instruction
regardless of whether reading a prior instruction could be read, which
gives a more useful code dump in such cases. When an instruction cannot
be read, it is reported as "????????", which cannot be confused with a
hex value,
For example, with a `UDF #0` (AKA 0x00000000) early in the kexec control
page, we'll now get the following code dump:
| Internal error: Oops - Undefined instruction: 0000000002000000 [#1] SMP
| Modules linked in:
| CPU: 0 PID: 261 Comm: kexec Not tainted 6.2.0-rc5+ #26
| Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
| pstate: 604003c5 (nZCv DAIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
| pc : 0x48c00000
| lr : machine_kexec+0x190/0x200
| sp : ffff80000d36ba80
| x29: ffff80000d36ba80 x28: ffff000002dfc380 x27: 0000000000000000
| x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
| x23: ffff80000a9f7858 x22: 000000004c460000 x21: 0000000000000010
| x20: 00000000ad821000 x19: ffff000000aa0000 x18: 0000000000000006
| x17: ffff8000758a2000 x16: ffff800008000000 x15: ffff80000d36b568
| x14: 0000000000000000 x13: ffff80000d36b707 x12: ffff80000a9bf6e0
| x11: 00000000ffffdfff x10: ffff80000aaaf8e0 x9 : ffff80000815eff8
| x8 : 000000000002ffe8 x7 : c0000000ffffdfff x6 : 00000000000affa8
| x5 : 0000000000001fff x4 : 0000000000000001 x3 : ffff80000a263008
| x2 : ffff80000a9e20f8 x1 : 0000000048c00000 x0 : ffff000000aa0000
| Call trace:
| 0x48c00000
| kernel_kexec+0x88/0x138
| __do_sys_reboot+0x108/0x288
| __arm64_sys_reboot+0x2c/0x40
| invoke_syscall+0x78/0x140
| el0_svc_common.constprop.0+0x4c/0x100
| do_el0_svc+0x34/0x80
| el0_svc+0x34/0x140
| el0t_64_sync_handler+0xf4/0x140
| el0t_64_sync+0x194/0x1c0
| Code: ???????? ???????? ???????? ???????? (00000000)
| ---[ end trace 0000000000000000 ]---
| Kernel panic - not syncing: Oops - Undefined instruction: Fatal exception
| Kernel Offset: disabled
| CPU features: 0x002000,00050108,c8004203
| Memory Limit: none
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20230127121256.2141368-1-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
== Background ==
There is a class of side-channel attacks against SGX enclaves called
"SGX Step"[1]. These attacks create lots of exceptions inside of
enclaves. Basically, run an in-enclave instruction, cause an exception.
Over and over.
There is a concern that a VMM could attack a TDX guest in the same way
by causing lots of #VE's. The TDX architecture includes new
countermeasures for these attacks. It basically counts the number of
exceptions and can send another *special* exception once the number of
VMM-induced #VE's hits a critical threshold[2].
== Problem ==
But, these special exceptions are independent of any action that the
guest takes. They can occur anywhere that the guest executes. This
includes sensitive areas like the entry code. The (non-paranoid) #VE
handler is incapable of handling exceptions in these areas.
== Solution ==
Fortunately, the special exceptions can be disabled by the guest via
write to NOTIFY_ENABLES TDCS field. NOTIFY_ENABLES is disabled by
default, but might be enabled by a bootloader, firmware or an earlier
kernel before the current kernel runs.
Disable NOTIFY_ENABLES feature explicitly and unconditionally. Any
NOTIFY_ENABLES-based #VE's that occur before this point will end up
in the early #VE exception handler and die due to unexpected exit
reason.
[1] https://github.com/jovanbulck/sgx-step
[2] https://intel.github.io/ccc-linux-guest-hardening-docs/security-spec.html#safety-against-ve-in-kernel-code
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Link: https://lore.kernel.org/all/20230126221159.8635-8-kirill.shutemov%40linux.intel.com
A "SEPT #VE" occurs when a TDX guest touches memory that is not properly
mapped into the "secure EPT". This can be the result of hypervisor
attacks or bugs, *OR* guest bugs. Most notably, buggy guests might
touch unaccepted memory for lots of different memory safety bugs like
buffer overflows.
TDX guests do not want to continue in the face of hypervisor attacks or
hypervisor bugs. They want to terminate as fast and safely as possible.
SEPT_VE_DISABLE ensures that TDX guests *can't* continue in the face of
these kinds of issues.
But, that causes a problem. TDX guests that can't continue can't spit
out oopses or other debugging info. In essence SEPT_VE_DISABLE=1 guests
are not debuggable.
Relax the SEPT_VE_DISABLE check to warning on debug TD and panic() in
the #VE handler on EPT-violation on private memory. It will produce
useful backtrace.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20230126221159.8635-7-kirill.shutemov%40linux.intel.com
Linux TDX guests require that the SEPT_VE_DISABLE "attribute" be set.
If it is not set, the kernel is theoretically required to handle
exceptions anywhere that kernel memory is accessed, including places
like NMI handlers and in the syscall entry gap.
Rather than even try to handle these exceptions, the kernel refuses to
run if SEPT_VE_DISABLE is unset.
However, the SEPT_VE_DISABLE detection and refusal code happens very
early in boot, even before earlyprintk runs. Calling panic() will
effectively just hang the system.
Instead, call a TDX-specific panic() function. This makes a very simple
TDVMCALL which gets a short error string out to the hypervisor without
any console infrastructure.
Use TDG.VP.VMCALL<ReportFatalError> to report the error. The hypercall
can encode message up to 64 bytes in eight registers.
[ dhansen: tweak comment and remove while loop brackets. ]
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20230126221159.8635-6-kirill.shutemov%40linux.intel.com
Currently we rely on the HIBERNATE_TEXT section starting with the entry
point to swsusp_arch_suspend_exit, and the KEXEC_TEXT section starting
with the entry point to arm64_relocate_new_kernel. In both cases we copy
the entire section into a dynamically-allocated page, and then later
branch to the start of this page.
SYM_FUNC_START() will align the function entry points to
CONFIG_FUNCTION_ALIGNMENT, and when the linker later processes the
assembled code it will place padding bytes before the function entry
point if the location counter was not already sufficiently aligned. The
linker happens to use the value zero for these padding bytes.
This padding may end up being applied whenever CONFIG_FUNCTION_ALIGNMENT
is greater than 4, which can be the case with
CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_64B=y or
CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS=y.
When such padding is applied, attempting to kexec or resume from
hibernate will result ina crash: the kernel will branch to the padding
bytes as the start of the dynamically-allocated page, and as those bytes
are zero they will decode as UDF #0, which reliably triggers an
UNDEFINED exception. For example:
| # ./kexec --reuse-cmdline -f Image
| [ 46.965800] kexec_core: Starting new kernel
| [ 47.143641] psci: CPU1 killed (polled 0 ms)
| [ 47.233653] psci: CPU2 killed (polled 0 ms)
| [ 47.323465] psci: CPU3 killed (polled 0 ms)
| [ 47.324776] Bye!
| [ 47.327072] Internal error: Oops - Undefined instruction: 0000000002000000 [#1] SMP
| [ 47.328510] Modules linked in:
| [ 47.329086] CPU: 0 PID: 259 Comm: kexec Not tainted 6.2.0-rc5+ #3
| [ 47.330223] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
| [ 47.331497] pstate: 604003c5 (nZCv DAIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
| [ 47.332782] pc : 0x43a95000
| [ 47.333338] lr : machine_kexec+0x190/0x1e0
| [ 47.334169] sp : ffff80000d293b70
| [ 47.334845] x29: ffff80000d293b70 x28: ffff000002cc0000 x27: 0000000000000000
| [ 47.336292] x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
| [ 47.337744] x23: ffff80000a837858 x22: 0000000048ec9000 x21: 0000000000000010
| [ 47.339192] x20: 00000000adc83000 x19: ffff000000827000 x18: 0000000000000006
| [ 47.340638] x17: ffff800075a61000 x16: ffff800008000000 x15: ffff80000d293658
| [ 47.342085] x14: 0000000000000000 x13: ffff80000d2937f7 x12: ffff80000a7ff6e0
| [ 47.343530] x11: 00000000ffffdfff x10: ffff80000a8ef8e0 x9 : ffff80000813ef00
| [ 47.344976] x8 : 000000000002ffe8 x7 : c0000000ffffdfff x6 : 00000000000affa8
| [ 47.346431] x5 : 0000000000001fff x4 : 0000000000000001 x3 : ffff80000a0a3008
| [ 47.347877] x2 : ffff80000a8220f8 x1 : 0000000043a95000 x0 : ffff000000827000
| [ 47.349334] Call trace:
| [ 47.349834] 0x43a95000
| [ 47.350338] kernel_kexec+0x88/0x100
| [ 47.351070] __do_sys_reboot+0x108/0x268
| [ 47.351873] __arm64_sys_reboot+0x2c/0x40
| [ 47.352689] invoke_syscall+0x78/0x108
| [ 47.353458] el0_svc_common.constprop.0+0x4c/0x100
| [ 47.354426] do_el0_svc+0x34/0x50
| [ 47.355102] el0_svc+0x34/0x108
| [ 47.355747] el0t_64_sync_handler+0xf4/0x120
| [ 47.356617] el0t_64_sync+0x194/0x198
| [ 47.357374] Code: bad PC value
| [ 47.357999] ---[ end trace 0000000000000000 ]---
| [ 47.358937] Kernel panic - not syncing: Oops - Undefined instruction: Fatal exception
| [ 47.360515] Kernel Offset: disabled
| [ 47.361230] CPU features: 0x002000,00050108,c8004203
| [ 47.362232] Memory Limit: none
Note: Unfortunately the code dump reports "bad PC value" as it attempts
to dump some instructions prior to the UDF (i.e. before the start of the
page), and terminates early upon a fault, obscuring the problem.
This patch fixes this issue by aligning the section starter markes to
CONFIG_FUNCTION_ALIGNMENT using the ALIGN_FUNCTION() helper, which
ensures that the linker never needs to place padding bytes within the
section. Assertions are added to verify each section begins with the
function we expect, making our implicit requirement explicit.
In future it might be nice to rework the kexec and hibernation code to
decouple the section start from the entry point, but that involves much
more significant changes that come with a higher risk of error, so I've
tried to keep this fix as simple as possible for now.
Fixes: 47a15aa544 ("arm64: Extend support for CONFIG_FUNCTION_ALIGNMENT")
Reported-by: CKI Project <cki-project@redhat.com>
Link: https://lore.kernel.org/linux-arm-kernel/29992.123012504212600261@us-mta-139.us.mimecast.lan/
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
So far __tdx_hypercall() only handles six arguments for VMCALL.
Expanding it to six more register would allow to cover more use-cases
like ReportFatalError() and Hyper-V hypercalls.
With all preparations in place, the expansion is pretty straight
forward.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20230126221159.8635-5-kirill.shutemov%40linux.intel.com
RDI is the first argument to __tdx_hypercall() that used to pass pointer
to struct tdx_hypercall_args. RSI is the second argument that contains
flags, such as TDX_HCALL_HAS_OUTPUT and TDX_HCALL_ISSUE_STI.
RDI and RSI can also be used as arguments to TDVMCALL leafs. Move RDI to
RAX and RSI to RBP to free up them for the hypercall arguments.
RAX saved on stack during TDCALL as it returns status code in the
register.
RBP value has to be restored before returning from __tdx_hypercall() as
it is callee-saved register.
This is preparatory patch. No functional change.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20230126221159.8635-4-kirill.shutemov%40linux.intel.com
struct tdx_hypercall_args is used to pass down hypercall arguments to
__tdx_hypercall() assembly routine.
Currently __tdx_hypercall() handles up to 6 arguments. In preparation to
changes in __tdx_hypercall(), expand the structure to 6 more registers
and generate asm offsets for them.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20230126221159.8635-3-kirill.shutemov%40linux.intel.com
Comment in __tdx_hypercall() points that RAX==0 indicates TDVMCALL
failure which is opposite of the truth: RAX==0 is success.
Fix the comment. No functional changes.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20230126221159.8635-2-kirill.shutemov%40linux.intel.com