-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmPMgtUeHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGv7wH/3J6vjENrW+Jg/3W
Vhr8bDEBtZ0+myheNZQTjR+/VNLI88UPxp9AV4a5qH1jlPmmqEC1AJcCN5sg9PMB
Kqr8Jr6gOLI8zVKP7fAhbiJVL0bjtZv4G/IEk5y30jNyO40YBVqwhPwUD8Umhooy
DAHDXyDF3laWk6bMEmJODahM0rdVmPLqvH+X+ALcliRh0SK61t+bMG/4Ox/j81HO
7d61Kd0ttXuzsPdwWmZWiDFMc+KxQ4004vCemU4dZwOaDW3RJU7Uc84+fCBH6nJC
HEzcIaAu9FdmSBY0U1fH8BNIiaBAo+nye2LgV5mZ9LQ77MfuMdDo0I+rWjtBmEp0
hayPOPI=
=aO6E
-----END PGP SIGNATURE-----
Merge tag 'v6.2-rc5' into locking/core, to pick up fixes
Refresh this branch with the latest locking fixes, before
applying a new series of changes.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
It's a bit confusing to have two cached EDIDs in struct intel_connector
with slightly different purposes. Make the distinction a bit clearer by
moving the EDID cached for eDP and LVDS panels at connector init time to
struct intel_panel, and name it fixed_edid. That's what it is, a fixed
EDID for the panels.
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/328350ef918638928a8286cdbab3107c8258332d.1674643465.git.jani.nikula@intel.com
Convert all the connectors that use cached connector edid and
detect_edid to drm_edid.
Since drm_get_edid() calls drm_connector_update_edid_property() while
drm_edid_read*() do not, we need to call drm_edid_connector_update()
separately, in part due to the EDID caching behaviour in HDMI and
DP. Especially DP depends on the details parsed from EDID. (The big
behavioural change conflating EDID reading with parsing and property
update was done in commit 5186421cbf ("drm: Introduce epoch counter to
drm_connector"))
v6: Rebase on drm_edid_connector_add_modes()
v5: Fix potential uninitialized var use (kernel test robot <lkp@intel.com>)
v4: Call drm_edid_connector_update() after reading HDMI/DP EDID
v3: Don't leak vga switcheroo EDID in LVDS init (Ville)
v2: Don't leak opregion fallback EDID (Ville)
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/eabb4de932841b38b34cc2818ea9fbf7c10224fd.1674643465.git.jani.nikula@intel.com
DT schema expects names of operating points tables to match certain
pattern:
stih418-b2264.dtb: opp_table: $nodename:0: 'opp_table' does not match '^opp-table(-[a-z0-9]+)?$'
Link: https://lore.kernel.org/r/20230120072110.138627-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
clang correctly complains
arch/x86/kernel/cpu/resctrl/rdtgroup.c:1456:6: warning: variable \
'h' set but not used [-Wunused-but-set-variable]
u32 h;
^
but it can't know whether this use is innocuous or really a problem.
There's a reason why those warning switches are behind a W=1 and not
enabled by default - yes, one needs to do:
make W=1 CC=clang HOSTCC=clang arch/x86/kernel/cpu/resctrl/
with clang 14 in order to trigger it.
I would normally not take a silly fix like that but this one is simple
and doesn't make the code uglier so...
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/202301242015.kbzkVteJ-lkp@intel.com
When a module with a livepatched function is unloaded and then reloaded,
klp attempts to dynamically re-patch it. On ppc64, that fails with the
following error:
module_64: livepatch_nfsd: Expected nop after call, got e8410018 at e_show+0x60/0x548 [livepatch_nfsd]
livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' (-8)
livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to load module 'nfsd'
The error happens because the restore r2 instruction had already
previously been written into the klp module's replacement function when
the original function was patched the first time. So the instruction
wasn't a nop as expected.
When the restore r2 instruction has already been patched in, detect that
and skip the warning and the instruction write.
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/2f6329ffd9674df6ff57e03edeb2ca54414770ab.1674617130.git.jpoimboe@kernel.org
restore_r2() returns 1 on success, which is surprising for a non-boolean
function. Change it to return 0 on success and -errno on error to match
kernel coding convention.
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/15baf76c271a0ae09f7b8556e50f2b4251e7049d.1674617130.git.jpoimboe@kernel.org
There are multiple ICMP rate limiting mechanisms:
* Global limits: net.ipv4.icmp_msgs_burst/icmp_msgs_per_sec
* v4 per-host limits: net.ipv4.icmp_ratelimit/ratemask
* v6 per-host limits: net.ipv6.icmp_ratelimit/ratemask
However, when ICMP output is limited, there is no way to tell
which limit has been hit or even if the limits are responsible
for the lack of ICMP output.
Add counters for each of the cases above. As we are within
local_bh_disable(), use the __INC stats variant.
Example output:
# nstat -sz "*RateLimit*"
IcmpOutRateLimitGlobal 134 0.0
IcmpOutRateLimitHost 770 0.0
Icmp6OutRateLimitHost 84 0.0
Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Suggested-by: Abhishek Rawal <rawal.abhishek92@gmail.com>
Link: https://lore.kernel.org/r/273b32241e6b7fdc5c609e6f5ebc68caf3994342.1674605770.git.jamie.bainbridge@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Stephen Rothwell reported htmldocs warnings when merging accel tree:
Documentation/ABI/testing/sysfs-driver-habanalabs:201: ERROR: Unexpected indentation.
Documentation/ABI/testing/sysfs-driver-habanalabs:201: WARNING: Block quote ends without a blank line; unexpected unindent.
Documentation/ABI/testing/sysfs-driver-habanalabs:201: ERROR: Unexpected indentation.
Documentation/ABI/testing/sysfs-driver-habanalabs:201: WARNING: Block quote ends without a blank line; unexpected unindent.
Fix these by fixing alignment of list of card status returned by
/sys/class/habanalabs/hl<n>/status.
Link: https://lore.kernel.org/linux-next/20230120130634.61c3e857@canb.auug.org.au/
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
The device specific directory in debugfs does not have "accel". For
example, the documentation says device 0 should have a debugfs entry as
/sys/kernel/debug/accel/accel0/ but in reality the entry is
/sys/kernel/debug/accel/0/
Fix the documentation to match the implementation.
Fixes: 8c5577a5cc ("doc: add documentation for accel subsystem")
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
When a decode error happens, we often don't know the exact root
cause (the erroneous address that was accessed) and the exact engine
that created the erroneous transaction.
To find out, we need to go over all the relevant register blocks
in the ASIC. Once we find the relevant engine, we print its details
and the offending address.
This helps tremendously when debugging an error that was created
by running a user workload.
Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
This is required in order to allow the kernel to control relevant
configuration space via load and store instructions.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
If resetting device upon release while the release watchdog work is
scheduled, the compute reset is replaced with hard reset.
In this case, need to clear the in_compute_reset indication in the
device reset information structure.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
If device memory scrubbing from hl_device_reset() fails, we return with
an error code but not perform error handling code.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
This commit enhances the following error messages to also provide the
type of error occurred, this in order to ease debugging of errors
detected during firmware-load.
Signed-off-by: Moti Haimovski <mhaimovski@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Completion timestamp is taken during the actual command submission
release. As the release happens in a work queue, the timestamp taken
is not accurate. Hence, we will take the timestamp in the interrupt
handler itself while propagating it to the release function.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
In order to support more user interrupt types in the future, we
enumerate the user interrupt type instead of using a boolean.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Handling edma razwi is different than all other engines since edma
uses sft routers. For hbw transactions sft router contain separate
interface for each edma and for lbw there is common interface for
both edma engines of the same dcore.
To handle the razwi correctly we need to:
1. Simplify the calculation of the sft router address.
2. Add razwi handling for edma qm errors, since edma qman doesn't
reports axi error response.
Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
A device with status malfunction indicates that it can't be used.
In such a case we do not support certain reset types, e.g.,
all kinds of soft-resets (compute reset, inference soft-reset),
and reset upon device release.
A hard-reset is the only way that an unusable device can change its
status. All other reset procedures can't put the device in a reset
procedure, which might ultimately cause the device to change its
status, unintentionally, to become operational again.
Such a scenario has recently occurred, when a user requested
a hard-reset while another heavy user workload was ongoing (reset
request is queued).
Since the workload couldn't finish within reset's timeout limits, the
reset has failed and set a device status malfunction.
Eventually, when the user released the FD, an unsuccessful soft-reset
occurred, hence followed by an additional hard-reset that changed the
ASICs status back to be operational.
Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
AXI transaction id holds information about the initiator which caused
the page fault. In the future it will be translated automatically by
driver to an initiator name.
Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
As device status was changed recently, we must update the
documentation as well.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
In reviewing the ivpu driver, DEFINE_DRM_ACCEL_FOPS could have been used
if DRM_ACCEL_FOPS defined .mmap to be drm_gem_mmap. Lets add that since
accel drivers are a variant of drm drivers, modern drm drivers are
expected to use GEM, and mmap() is a common operation that is expected
to be heavily used in accel drivers thus the common accel driver should
be able to just use DEFINE_DRM_ACCEL_FOPS() for convenience.
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
get_maintainer.pl does not suggest Oded Gabbay, the DRM COMPUTE
ACCELERATORS DRIVERS AND FRAMEWORK maintainer for changes that touch
the Accel Subsystem header - drm_accel.h. This is because that file is
missing from the Accel Subsystem entry. Fix this.
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Captured addresses of low b/w razwi information contains only the
offset from the cfg base. To make it more user readable, add the cfg
base to it.
Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
In gaudi2 there night be different routers for low b/w and high b/w
transactions. But in the code that collects razwi information, we used
the same router for high b/w and low b/w.
Fixed it by reading the information also from low b/w routers.
Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Protect re-using the same timestamp buffer record before actually
adding it to the to interrupt wait list.
Mark ts buff offset as in use in the spinlock protection area of the
interrupt wait list to avoid getting in the re-use section in
ts_buff_get_kernel_ts_record before adding the node to the list.
this scenario might happen when multiple threads are racing on
same offset and one thread could set data in the ts buff in
ts_buff_get_kernel_ts_record then the other thread takes over
and get to ts_buff_get_kernel_ts_record and we will try
to re-use the same ts buff offset then we will try to
delete a non existing node from the list.
Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
use argument instead of fixed GFP value for allocation
in Timestamps buffers alloc function.
change data type of size to size_t.
Fixes: 9158bf69e7 ("habanalabs: Timestamps buffers registration")
Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Make sure all reserved/pad fields in uapi input structures
are set to 0.
Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
data is a void * type and does not require a cast.
Signed-off-by: XU pengfei <xupengfei@nfschina.com>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
This commit attaches the PCI device address to driver fatal messages
in order to ease debugging in multi-device setups.
Signed-off-by: Moti Haimovski <mhaimovski@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Because f/w does not update razwi info when sending events, remove the
use of it.
The driver is responsible to check if razwi happened and to
collect razwi data.
Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Add traces to LBW reads/writes.
This may be handy when debugging configuration failure or events when
tracking configuration flow.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
There are cases where it may be useful to dump the whole LBW configs.
Yet, doing so while spamming the kernel log will probably shade other
important messages since the LBW access is done in sheer volume.
To answer this we add trace events for those too.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
The value in SM_SEI_CAUSE includes the SOB index and not the SOB group
index.
Remove usage of log_mask in sm_sei_cause structure as it was never
used.
Signed-off-by: Carmit Carmel <ccarmel@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
This function shall be used whenever components enable/binning masks
should be updated.
Usage is in one of the below cases:
- update user (or default) component masks
- update when getting the masks from FW (either CPUCP or COMMS)
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>