When test-compiling the BCM2835 pin control driver on m68k:
In file included from arch/m68k/include/asm/io_mm.h:32:0,
from arch/m68k/include/asm/io.h:8,
from include/linux/io.h:13,
from include/linux/irq.h:20,
from include/linux/gpio/driver.h:7,
from drivers/pinctrl/bcm/pinctrl-bcm2835.c:17:
drivers/pinctrl/bcm/pinctrl-bcm2835.c: In function 'bcm2711_pull_config_set':
arch/m68k/include/asm/atarihw.h:190:22: error: expected identifier or '(' before 'volatile'
# define shifter ((*(volatile struct SHIFTER *)SHF_BAS))
"shifter" is a too generic name for a global definition.
As the corresponding definition for Atari TT is already called
"shifter_tt", fix this by renaming the definition for Atari ST to
"shifter_st".
Reported-by: kbuild test robot <lkp@intel.com>
Suggested-by: Michael Schmitz <schmitzmic@gmail.com>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Since commit d3b41b6bb4 ("m68k: Dispatch nvram_ops calls to Atari or
Mac functions"), Coldfire builds generate compiler warnings due to the
unconditional inclusion of asm/atarihw.h and asm/macintosh.h.
The inclusion of asm/atarihw.h causes warnings like this:
In file included from ./arch/m68k/include/asm/atarihw.h:25:0,
from arch/m68k/kernel/setup_mm.c:41,
from arch/m68k/kernel/setup.c:3:
./arch/m68k/include/asm/raw_io.h:39:0: warning: "__raw_readb" redefined
#define __raw_readb in_8
In file included from ./arch/m68k/include/asm/io.h:6:0,
from arch/m68k/kernel/setup_mm.c:36,
from arch/m68k/kernel/setup.c:3:
./arch/m68k/include/asm/io_no.h:16:0: note: this is the location of the previous definition
#define __raw_readb(addr) \
...
This issue is resolved by dropping the asm/raw_io.h include. It turns out
that asm/io_mm.h already includes that header file.
Moving the relevant macro definitions helps to clarify this dependency
and make it safe to include asm/atarihw.h.
The other warnings look like this:
In file included from arch/m68k/kernel/setup_mm.c:48:0,
from arch/m68k/kernel/setup.c:3:
./arch/m68k/include/asm/macintosh.h:19:35: warning: 'struct irq_data' declared inside parameter list will not be visible outside of this definition or declaration
extern void mac_irq_enable(struct irq_data *data);
^~~~~~~~
...
This issue is resolved by adding the missing linux/irq.h include.
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Acked-by: Greg Ungerer <gerg@linux-m68k.org>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Rename floppy_type macros to make them more consistent with the scsi_type
macros, which are named after classes of models with similar memory maps.
The MAC_FLOPPY_OLD symbol is introduced to change the relevant base
address from 0x50F00000 to 0x50000000 (consistent with MAC_SCSI_OLD).
The documentation for LC-class machines has the IO devices at offsets
from $50F00000. Use these addresses for MAC_FLOPPY_LC (consistent with
MAC_SCSI_LC) because they may not be aliased elsewhere in the memory map.
Add comments with controller type information from 'Designing Cards and
Drivers for the Macintosh Family', relevant Developer Notes and
http://mess.redump.net/mess/driver_info/mac_technical_notes
Adopt phys_addr_t to avoid type casts.
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Stan Johnson <userm57@yahoo.com>
Acked-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Add the phy-node and mdio bus to the fec-node, represented as is on
hardware.
This commit includes micrel,led-mode that is set to the default
value, prepared for someone who wants to change this.
Signed-off-by: Philippe Schenker <philippe.schenker@toradex.com>
Acked-by: Marcel Ziswiler <marcel.ziswiler@toradex.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Add sleep pinmux to the fec so it can properly sleep.
Signed-off-by: Philippe Schenker <philippe.schenker@toradex.com>
Acked-by: Marcel Ziswiler <marcel.ziswiler@toradex.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Prepare FlexCAN use on SODIMM 55/63 178/188. Those SODIMM pins are
compatible for CAN bus use with several modules from the Colibri
family.
Add Better drivestrength and also add flexcan2.
Signed-off-by: Philippe Schenker <philippe.schenker@toradex.com>
Acked-by: Marcel Ziswiler <marcel.ziswiler@toradex.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Force HS200 by masking bit 63 of the SDHCI capability register.
The i.MX ESDHC driver uses SDHCI_QUIRK2_CAPS_BIT63_FOR_HS400. With
that the stack checks bit 63 to descide whether HS400 is available.
Using sdhci-caps-mask allows to mask bit 63. The stack then selects
HS200 as operating mode.
This prevents rare communication errors with minimal effect on
performance:
sdhci-esdhc-imx 30b60000.usdhc: warning! HS400 strobe DLL
status REF not lock!
Signed-off-by: Stefan Agner <stefan.agner@toradex.com>
Signed-off-by: Philippe Schenker <philippe.schenker@toradex.com>
Reviewed-by: Oleksandr Suvorov <oleksandr.suvorov@toradex.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
klp_module_coming() is called for every module appearing in the system.
It sets obj->mod to a patched module for klp_object obj. Unfortunately
it leaves it set even if an error happens later in the function and the
patched module is not allowed to be loaded.
klp_is_object_loaded() uses obj->mod variable and could currently give a
wrong return value. The bug is probably harmless as of now.
Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Replace "depot_save_stack" with "stack_depot_save" in code comments because
depot_save_stack() was replaced in commit c0cfc33726 ("lib/stackdepot:
Provide functions which operate on plain storage arrays") and removed in
commit 56d8f079c5 ("lib/stackdepot: Remove obsolete functions")
Signed-off-by: Miles Chen <miles.chen@mediatek.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190815113246.18478-1-miles.chen@mediatek.com
Some newer machines do not advertise legacy timers. The kernel can handle
that situation if the TSC and the CPU frequency are enumerated by CPUID or
MSRs and the CPU supports TSC deadline timer. If the CPU does not support
TSC deadline timer the local APIC timer frequency has to be known as well.
Some Ryzens machines do not advertize legacy timers, but there is no
reliable way to determine the bus frequency which feeds the local APIC
timer when the machine allows overclocking of that frequency.
As there is no legacy timer the local APIC timer calibration crashes due to
a NULL pointer dereference when accessing the not installed global clock
event device.
Switch the calibration loop to a non interrupt based one, which polls
either TSC (if frequency is known) or jiffies. The latter requires a global
clockevent. As the machines which do not have a global clockevent installed
have a known TSC frequency this is a non issue. For older machines where
TSC frequency is not known, there is no known case where the legacy timers
do not exist as that would have been reported long ago.
Reported-by: Daniel Drake <drake@endlessm.com>
Reported-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Daniel Drake <drake@endlessm.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1908091443030.21433@nanos.tec.linutronix.de
Link: http://bugzilla.opensuse.org/show_bug.cgi?id=1142926#c12
lockdep reports the following deadlock scenario:
WARNING: possible circular locking dependency detected
kworker/1:1/48 is trying to acquire lock:
000000008d7a62b2 (text_mutex){+.+.}, at: kprobe_optimizer+0x163/0x290
but task is already holding lock:
00000000850b5e2d (module_mutex){+.+.}, at: kprobe_optimizer+0x31/0x290
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (module_mutex){+.+.}:
__mutex_lock+0xac/0x9f0
mutex_lock_nested+0x1b/0x20
set_all_modules_text_rw+0x22/0x90
ftrace_arch_code_modify_prepare+0x1c/0x20
ftrace_run_update_code+0xe/0x30
ftrace_startup_enable+0x2e/0x50
ftrace_startup+0xa7/0x100
register_ftrace_function+0x27/0x70
arm_kprobe+0xb3/0x130
enable_kprobe+0x83/0xa0
enable_trace_kprobe.part.0+0x2e/0x80
kprobe_register+0x6f/0xc0
perf_trace_event_init+0x16b/0x270
perf_kprobe_init+0xa7/0xe0
perf_kprobe_event_init+0x3e/0x70
perf_try_init_event+0x4a/0x140
perf_event_alloc+0x93a/0xde0
__do_sys_perf_event_open+0x19f/0xf30
__x64_sys_perf_event_open+0x20/0x30
do_syscall_64+0x65/0x1d0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
-> #0 (text_mutex){+.+.}:
__lock_acquire+0xfcb/0x1b60
lock_acquire+0xca/0x1d0
__mutex_lock+0xac/0x9f0
mutex_lock_nested+0x1b/0x20
kprobe_optimizer+0x163/0x290
process_one_work+0x22b/0x560
worker_thread+0x50/0x3c0
kthread+0x112/0x150
ret_from_fork+0x3a/0x50
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(module_mutex);
lock(text_mutex);
lock(module_mutex);
lock(text_mutex);
*** DEADLOCK ***
As a reproducer I've been using bcc's funccount.py
(https://github.com/iovisor/bcc/blob/master/tools/funccount.py),
for example:
# ./funccount.py '*interrupt*'
That immediately triggers the lockdep splat.
Fix by acquiring text_mutex before module_mutex in kprobe_optimizer().
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: d5b844a2cf ("ftrace/x86: Remove possible deadlock between register_kprobe() and ftrace_run_update_code()")
Link: http://lkml.kernel.org/r/20190812184302.GA7010@xps-13
Signed-off-by: Ingo Molnar <mingo@kernel.org>
On Asus X571GT, GPIO 297 is configured as an interrupt and serves
for the touchpad. The touchpad will report input events much less
than expected after S3 suspend/resume, which results in extremely
slow cursor movement. However, the number of interrupts observed
from /proc/interrupts increases much more than expected even no
touching touchpad.
This is due to the value of PADCFG0 of PIN 225 for the interrupt
has been changed from 0x80800102 to 0x80100102. The GPIROUTIOXAPIC
is toggled on which results in the spurious interrupts. The PADCFG0
of PIN 225 is expected to be saved during suspend, but the 297 is
saved instead because the gpiochip_line_is_irq() expect the GPIO
offset but what's really passed to it is PIN number. In this case,
the /sys/kernel/debug/pinctrl/INT3450:00/gpio-ranges shows
288: INT3450:00 GPIOS [436 - 459] PINS [216 - 239]
So gpiochip_line_is_irq() returns true for GPIO offset 297, the
suspend routine spuriously saves the content for PIN 297 which
we expect to save for PIN 225.
This commit maps the PIN number to GPIO offset first in the
intel_pinctrl_should_save() to make sure the values for the
specific PINs can be correctly saved and then restored.
Fixes: c538b94367 ("pinctrl: intel: Only restore pins that are used by the driver")
Signed-off-by: Chris Chiu <chiu@endlessm.com>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
We use a fake timeline->mutex lock to reassure lockdep that the timeline
is always locked when emitting requests. However, the use inside
__engine_park() may be inside hardirq and so lockdep now complains about
the mixed irq-state of the nested locked. Disable irqs around the
lockdep tracking to keep it happy.
Fixes: 6c69a45445 ("drm/i915/gt: Mark context->active_count as protected by timeline->mutex")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190819075835.20065-3-chris@chris-wilson.co.uk
While not strictly needed as "ethernet-phy-ieee802.3-c22"
is assumed by default if not given explicitly, having
the compatible string here makes it more clear what
this is and which driver handles this - an Ethernet
phy attached to mdio, handled by of_mdio.c
Signed-off-by: André Draszik <git@andred.net>
CC: Ilya Ledvich <ilya@compulab.co.il>
CC: Igor Grinberg <grinberg@compulab.co.il>
CC: Rob Herring <robh+dt@kernel.org>
CC: Mark Rutland <mark.rutland@arm.com>
CC: Shawn Guo <shawnguo@kernel.org>
CC: Sascha Hauer <s.hauer@pengutronix.de>
CC: Pengutronix Kernel Team <kernel@pengutronix.de>
CC: Fabio Estevam <festevam@gmail.com>
CC: NXP Linux Team <linux-imx@nxp.com>
CC: devicetree@vger.kernel.org
CC: linux-arm-kernel@lists.infradead.org
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
If a task is PI-blocked (blocking on sleeping spinlock) then we don't want to
schedule a new kworker if we schedule out due to lock contention because !RT
does not do that as well. A spinning spinlock disables preemption and a worker
does not schedule out on lock contention (but spin).
On RT the RW-semaphore implementation uses an rtmutex so
tsk_is_pi_blocked() will return true if a task blocks on it. In this case we
will now start a new worker which may deadlock if one worker is waiting on
progress from another worker. Since a RW-semaphore starts a new worker on !RT,
we should do the same on RT.
XFS is able to trigger this deadlock.
Allow to schedule new worker if the current worker is PI-blocked.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20190816160626.12742-1-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
... sort them in and fixup comment, while at it.
No functional changes.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20190819070140.23708-1-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Recent changes to the atheros at803x driver caused
ethernet to stop working on this board.
In particular commit 6d4cd041f0
("net: phy: at803x: disable delay only for RGMII mode")
and commit cd28d1d6e5
("net: phy: at803x: Disable phy delay for RGMII mode")
fix the AR8031 driver to configure the phy's (RX/TX)
delays as per the 'phy-mode' in the device tree.
This now prevents ethernet from working on this board.
It used to work before those commits, because the
AR8031 comes out of reset with RX delay enabled, and
the at803x driver didn't touch the delay configuration
at all when "rgmii" mode was selected, and because
arch/arm/mach-imx/mach-imx7d.c:ar8031_phy_fixup()
unconditionally enables TX delay.
Since above commits ar8031_phy_fixup() also has no
effect anymore, and the end-result is that all delays
are disabled in the phy, no ethernet.
Update the device tree to restore functionality.
Signed-off-by: André Draszik <git@andred.net>
CC: Ilya Ledvich <ilya@compulab.co.il>
CC: Igor Grinberg <grinberg@compulab.co.il>
CC: Rob Herring <robh+dt@kernel.org>
CC: Mark Rutland <mark.rutland@arm.com>
CC: Shawn Guo <shawnguo@kernel.org>
CC: Sascha Hauer <s.hauer@pengutronix.de>
CC: Pengutronix Kernel Team <kernel@pengutronix.de>
CC: Fabio Estevam <festevam@gmail.com>
CC: NXP Linux Team <linux-imx@nxp.com>
CC: devicetree@vger.kernel.org
CC: linux-arm-kernel@lists.infradead.org
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
When running a 64-bit kernel with a 32-bit iptables binary, the size of
the xt_nfacct_match_info struct diverges.
kernel: sizeof(struct xt_nfacct_match_info) : 40
iptables: sizeof(struct xt_nfacct_match_info)) : 36
Trying to append nfacct related rules results in an unhelpful message.
Although it is suggested to look for more information in dmesg, nothing
can be found there.
# iptables -A <chain> -m nfacct --nfacct-name <acct-object>
iptables: Invalid argument. Run `dmesg' for more information.
This patch fixes the memory misalignment by enforcing 8-byte alignment
within the struct's first revision. This solution is often used in many
other uapi netfilter headers.
Signed-off-by: Juliana Rodrigueiro <juliana.rodrigueiro@intra2net.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The netlink attribute policy for NFTA_FLOW_TABLE_NAME is missing.
Fixes: a3c90f7a23 ("netfilter: nf_tables: flow offload expression")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The ordering of arguments to the x_tables ADD_COUNTER macro
appears to be wrong in ebtables (cf. ip_tables.c, ip6_tables.c,
and arp_tables.c).
This causes data corruption in the ebtables userspace tools
because they get incorrect packet & byte counts from the kernel.
Fixes: d72133e628 ("netfilter: ebtables: use ADD_COUNTER macro")
Signed-off-by: Todd Seidelmann <tseidelmann@linode.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This entry is in MAINTAINERS for historical purpose.
It doesn't match current sources since the commit
adf82accc5 ("netfilter: x_tables: merge ip and
ipv6 masquerade modules") moved the module.
The net/netfilter/xt_MASQUERADE.c module is already under
the netfilter section. Thus, there is no purpose to keep this
separate entry in MAINTAINERS.
Cc: Florian Westphal <fw@strlen.de>
Cc: Juanjo Ciarlante <jjciarla@raiz.uncu.edu.ar>
Cc: netfilter-devel@vger.kernel.org
Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Denis Efremov <efremov@linux.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This change exports an interface to read tcc offset and allow writing if
the platform is not locked.
Refer to Intel SDM for details on the MSR: MSR_TEMPERATURE_TARGET.
Here TCC Activation Offset (R/W) bits allow temperature offset in degrees
in relation to TjMAX.
This change will be useful for improving performance from user space for
some platforms, if the current offset is not optimal.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Tested-by: Benjamin Berg <bberg@redhat.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Static structure fb_funcs, of type drm_framebuffer_funcs, is used only
when it is passed to drm_gem_fb_create_with_funcs() as its last
argument. drm_gem_fb_create_with_funcs does not modify its lst argument
(fb_funcs) and hence fb_funcs is never modified. Therefore make fb_funcs
constant to protect it from further modification.
Issue found with Coccinelle.
Signed-off-by: Nishka Dasgupta <nishkadg.linux@gmail.com>
Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Reviewed-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190813062712.24993-1-nishkadg.linux@gmail.com
Modify the xmon 'dxi' command to query all interrupts if no IRQ number
is specified.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190814154754.23682-4-clg@kaod.org
The xmon 'dxi' command calls OPAL to query the XIVE configuration of a
interrupt. This can only be done on baremetal (PowerNV) and it will
crash a pseries machine.
Introduce a new XIVE get_irq_config() operation which implements a
different query depending on the platform, PowerNV or pseries, and
modify xmon to use a top level wrapper.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190814154754.23682-3-clg@kaod.org
Currently, the xmon 'dx' command calls OPAL to dump the XIVE state in
the OPAL logs and also outputs some of the fields of the internal XIVE
structures in Linux. The OPAL calls can only be done on baremetal
(PowerNV) and they crash a pseries machine. Fix by checking the
hypervisor feature of the CPU.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190814154754.23682-2-clg@kaod.org
At the moment we create a small window only for 32bit devices, the window
maps 0..2GB of the PCI space only. For other devices we either use
a sketchy bypass or hardware bypass but the former can only work if
the amount of RAM is no bigger than the device's DMA mask and the latter
requires devices to support at least 59bit DMA.
This extends the default DMA window to the maximum size possible to allow
a wider DMA mask than just 32bit. The default window size is now limited
by the the iommu_table::it_map allocation bitmap which is a contiguous
array, 1 bit per an IOMMU page.
This increases the default IOMMU page size from hard coded 4K to
the system page size to allow wider DMA masks.
This increases the level number to not exceed the max order allocation
limit per TCE level. By the same time, this keeps minimal levels number
as 2 in order to save memory.
As the extended window now overlaps the 32bit MMIO region, this adds
an area reservation to iommu_init_table().
After this change the default window size is 0x80000000000==1<<43 so
devices limited to DMA mask smaller than the amount of system RAM can
still use more than just 2GB of memory for DMA.
This is an optimization and not a bug fix for DMA API usage.
With the on-demand allocation of indirect TCE table levels enabled and
2 levels, the first TCE level size is just
1<<ceil((log2(0x7ffffffffff+1)-16)/2)=16384 TCEs or 2 system pages.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190718051139.74787-5-aik@ozlabs.ru
We allocate only the first level of multilevel TCE tables for KVM
already (alloc_userspace_copy==true), and the rest is allocated on demand.
This is not enabled though for bare metal.
This removes the KVM limitation (implicit, via the alloc_userspace_copy
parameter) and always allocates just the first level. The on-demand
allocation of missing levels is already implemented.
As from now on DMA map might happen with disabled interrupts, this
allocates TCEs with GFP_ATOMIC; otherwise lockdep reports errors 1].
In practice just a single page is allocated there so chances for failure
are quite low.
To save time when creating a new clean table, this skips non-allocated
indirect TCE entries in pnv_tce_free just like we already do in
the VFIO IOMMU TCE driver.
This changes the default level number from 1 to 2 to reduce the amount
of memory required for the default 32bit DMA window at the boot time.
The default window size is up to 2GB which requires 4MB of TCEs which is
unlikely to be used entirely or at all as most devices these days are
64bit capable so by switching to 2 levels by default we save 4032KB of
RAM per a device.
While at this, add __GFP_NOWARN to alloc_pages_node() as the userspace
can trigger this path via VFIO, see the failure and try creating a table
again with different parameters which might succeed.
[1]:
===
BUG: sleeping function called from invalid context at mm/page_alloc.c:4596
in_atomic(): 1, irqs_disabled(): 1, pid: 1038, name: scsi_eh_1
2 locks held by scsi_eh_1/1038:
#0: 000000005efd659a (&host->eh_mutex){+.+.}, at: ata_eh_acquire+0x34/0x80
#1: 0000000006cf56a6 (&(&host->lock)->rlock){....}, at: ata_exec_internal_sg+0xb0/0x5c0
irq event stamp: 500
hardirqs last enabled at (499): [<c000000000cb8a74>] _raw_spin_unlock_irqrestore+0x94/0xd0
hardirqs last disabled at (500): [<c000000000cb85c4>] _raw_spin_lock_irqsave+0x44/0x120
softirqs last enabled at (0): [<c000000000101120>] copy_process.isra.4.part.5+0x640/0x1a80
softirqs last disabled at (0): [<0000000000000000>] 0x0
CPU: 73 PID: 1038 Comm: scsi_eh_1 Not tainted 5.2.0-rc6-le_nv2_aikATfstn1-p1 #634
Call Trace:
[c000003d064cef50] [c000000000c8e6c4] dump_stack+0xe8/0x164 (unreliable)
[c000003d064cefa0] [c00000000014ed78] ___might_sleep+0x2f8/0x310
[c000003d064cf020] [c0000000003ca084] __alloc_pages_nodemask+0x2a4/0x1560
[c000003d064cf220] [c0000000000c2530] pnv_alloc_tce_level.isra.0+0x90/0x130
[c000003d064cf290] [c0000000000c2888] pnv_tce+0x128/0x3b0
[c000003d064cf360] [c0000000000c2c00] pnv_tce_build+0xb0/0xf0
[c000003d064cf3c0] [c0000000000bbd9c] pnv_ioda2_tce_build+0x3c/0xb0
[c000003d064cf400] [c00000000004cfe0] ppc_iommu_map_sg+0x210/0x550
[c000003d064cf510] [c00000000004b7a4] dma_iommu_map_sg+0x74/0xb0
[c000003d064cf530] [c000000000863944] ata_qc_issue+0x134/0x470
[c000003d064cf5b0] [c000000000863ec4] ata_exec_internal_sg+0x244/0x5c0
[c000003d064cf700] [c0000000008642d0] ata_exec_internal+0x90/0xe0
[c000003d064cf780] [c0000000008650ac] ata_dev_read_id+0x2ec/0x640
[c000003d064cf8d0] [c000000000878e28] ata_eh_recover+0x948/0x16d0
[c000003d064cfa10] [c00000000087d760] sata_pmp_error_handler+0x480/0xbf0
[c000003d064cfbc0] [c000000000884624] ahci_error_handler+0x74/0xe0
[c000003d064cfbf0] [c000000000879fa8] ata_scsi_port_error_handler+0x2d8/0x7c0
[c000003d064cfca0] [c00000000087a544] ata_scsi_error+0xb4/0x100
[c000003d064cfd00] [c000000000802450] scsi_error_handler+0x120/0x510
[c000003d064cfdb0] [c000000000140c48] kthread+0x1b8/0x1c0
[c000003d064cfe20] [c00000000000bd8c] ret_from_kernel_thread+0x5c/0x70
ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
irq event stamp: 2305
========================================================
hardirqs last enabled at (2305): [<c00000000000e4c8>] fast_exc_return_irq+0x28/0x34
hardirqs last disabled at (2303): [<c000000000cb9fd0>] __do_softirq+0x4a0/0x654
WARNING: possible irq lock inversion dependency detected
5.2.0-rc6-le_nv2_aikATfstn1-p1 #634 Tainted: G W
softirqs last enabled at (2304): [<c000000000cba054>] __do_softirq+0x524/0x654
softirqs last disabled at (2297): [<c00000000010f278>] irq_exit+0x128/0x180
--------------------------------------------------------
swapper/0/0 just changed the state of lock:
0000000006cf56a6 (&(&host->lock)->rlock){-...}, at: ahci_single_level_irq_intr+0xac/0x120
but this lock took another, HARDIRQ-unsafe lock in the past:
(fs_reclaim){+.+.}
and interrupts could create inverse lock ordering between them.
other info that might help us debug this:
Possible interrupt unsafe locking scenario:
CPU0 CPU1
---- ----
lock(fs_reclaim);
local_irq_disable();
lock(&(&host->lock)->rlock);
lock(fs_reclaim);
<Interrupt>
lock(&(&host->lock)->rlock);
*** DEADLOCK ***
no locks held by swapper/0/0.
the shortest dependencies between 2nd lock and 1st lock:
-> (fs_reclaim){+.+.} ops: 167579 {
HARDIRQ-ON-W at:
lock_acquire+0xf8/0x2a0
fs_reclaim_acquire.part.23+0x44/0x60
kmem_cache_alloc_node_trace+0x80/0x590
alloc_desc+0x64/0x270
__irq_alloc_descs+0x2e4/0x3a0
irq_domain_alloc_descs+0xb0/0x150
irq_create_mapping+0x168/0x2c0
xics_smp_probe+0x2c/0x98
pnv_smp_probe+0x40/0x9c
smp_prepare_cpus+0x524/0x6c4
kernel_init_freeable+0x1b4/0x650
kernel_init+0x2c/0x148
ret_from_kernel_thread+0x5c/0x70
SOFTIRQ-ON-W at:
lock_acquire+0xf8/0x2a0
fs_reclaim_acquire.part.23+0x44/0x60
kmem_cache_alloc_node_trace+0x80/0x590
alloc_desc+0x64/0x270
__irq_alloc_descs+0x2e4/0x3a0
irq_domain_alloc_descs+0xb0/0x150
irq_create_mapping+0x168/0x2c0
xics_smp_probe+0x2c/0x98
pnv_smp_probe+0x40/0x9c
smp_prepare_cpus+0x524/0x6c4
kernel_init_freeable+0x1b4/0x650
kernel_init+0x2c/0x148
ret_from_kernel_thread+0x5c/0x70
INITIAL USE at:
lock_acquire+0xf8/0x2a0
fs_reclaim_acquire.part.23+0x44/0x60
kmem_cache_alloc_node_trace+0x80/0x590
alloc_desc+0x64/0x270
__irq_alloc_descs+0x2e4/0x3a0
irq_domain_alloc_descs+0xb0/0x150
irq_create_mapping+0x168/0x2c0
xics_smp_probe+0x2c/0x98
pnv_smp_probe+0x40/0x9c
smp_prepare_cpus+0x524/0x6c4
kernel_init_freeable+0x1b4/0x650
kernel_init+0x2c/0x148
ret_from_kernel_thread+0x5c/0x70
}
===
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190718051139.74787-4-aik@ozlabs.ru
POWER8 and newer support a bypass mode which maps all host memory to
PCI buses so an IOMMU table is not always required. However if we fail to
create such a table, the DMA setup fails and the kernel does not boot.
This skips the 32bit DMA setup check if the bypass is selected.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190718051139.74787-3-aik@ozlabs.ru
pnv_tce() returns a pointer to a TCE entry and originally a TCE table
would be pre-allocated. For the default case of 2GB window the table
needs only a single level and that is fine. However if more levels are
requested, it is possible to get a race when 2 threads want a pointer
to a TCE entry from the same page of TCEs.
This adds cmpxchg to handle the race. Note that once TCE is non-zero,
it cannot become zero again.
Fixes: a68bd1267b ("powerpc/powernv/ioda: Allocate indirect TCE levels on demand")
CC: stable@vger.kernel.org # v4.19+
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190718051139.74787-2-aik@ozlabs.ru
The calls to arch_add_memory()/arch_remove_memory() are always made
with the read-side cpu_hotplug_lock acquired via memory_hotplug_begin().
On pSeries, arch_add_memory()/arch_remove_memory() eventually call
resize_hpt() which in turn calls stop_machine() which acquires the
read-side cpu_hotplug_lock again, thereby resulting in the recursive
acquisition of this lock.
In the absence of CONFIG_PROVE_LOCKING, we hadn't observed a system
lockup during a memory hotplug operation because cpus_read_lock() is a
per-cpu rwsem read, which, in the fast-path (in the absence of the
writer, which in our case is a CPU-hotplug operation) simply
increments the read_count on the semaphore. Thus a recursive read in
the fast-path doesn't cause any problems.
However, we can hit this problem in practice if there is a concurrent
CPU-Hotplug operation in progress which is waiting to acquire the
write-side of the lock. This will cause the second recursive read to
block until the writer finishes. While the writer is blocked since the
first read holds the lock. Thus both the reader as well as the writers
fail to make any progress thereby blocking both CPU-Hotplug as well as
Memory Hotplug operations.
Memory-Hotplug CPU-Hotplug
CPU 0 CPU 1
------ ------
1. down_read(cpu_hotplug_lock.rw_sem)
[memory_hotplug_begin]
2. down_write(cpu_hotplug_lock.rw_sem)
[cpu_up/cpu_down]
3. down_read(cpu_hotplug_lock.rw_sem)
[stop_machine()]
Lockdep complains as follows in these code-paths.
swapper/0/1 is trying to acquire lock:
(____ptrval____) (cpu_hotplug_lock.rw_sem){++++}, at: stop_machine+0x2c/0x60
but task is already holding lock:
(____ptrval____) (cpu_hotplug_lock.rw_sem){++++}, at: mem_hotplug_begin+0x20/0x50
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(cpu_hotplug_lock.rw_sem);
lock(cpu_hotplug_lock.rw_sem);
*** DEADLOCK ***
May be due to missing lock nesting notation
3 locks held by swapper/0/1:
#0: (____ptrval____) (&dev->mutex){....}, at: __driver_attach+0x12c/0x1b0
#1: (____ptrval____) (cpu_hotplug_lock.rw_sem){++++}, at: mem_hotplug_begin+0x20/0x50
#2: (____ptrval____) (mem_hotplug_lock.rw_sem){++++}, at: percpu_down_write+0x54/0x1a0
stack backtrace:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.0.0-rc5-58373-gbc99402235f3-dirty #166
Call Trace:
dump_stack+0xe8/0x164 (unreliable)
__lock_acquire+0x1110/0x1c70
lock_acquire+0x240/0x290
cpus_read_lock+0x64/0xf0
stop_machine+0x2c/0x60
pseries_lpar_resize_hpt+0x19c/0x2c0
resize_hpt_for_hotplug+0x70/0xd0
arch_add_memory+0x58/0xfc
devm_memremap_pages+0x5e8/0x8f0
pmem_attach_disk+0x764/0x830
nvdimm_bus_probe+0x118/0x240
really_probe+0x230/0x4b0
driver_probe_device+0x16c/0x1e0
__driver_attach+0x148/0x1b0
bus_for_each_dev+0x90/0x130
driver_attach+0x34/0x50
bus_add_driver+0x1a8/0x360
driver_register+0x108/0x170
__nd_driver_register+0xd0/0xf0
nd_pmem_driver_init+0x34/0x48
do_one_initcall+0x1e0/0x45c
kernel_init_freeable+0x540/0x64c
kernel_init+0x2c/0x160
ret_from_kernel_thread+0x5c/0x68
Fix this issue by
1) Requiring all the calls to pseries_lpar_resize_hpt() be made
with cpu_hotplug_lock held.
2) In pseries_lpar_resize_hpt() invoke stop_machine_cpuslocked()
as a consequence of 1)
3) To satisfy 1), in hpt_order_set(), call mmu_hash_ops.resize_hpt()
with cpu_hotplug_lock held.
Fixes: dbcf929c00 ("powerpc/pseries: Add support for hash table resizing")
Cc: stable@vger.kernel.org # v4.11+
Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1557906352-29048-1-git-send-email-ego@linux.vnet.ibm.com
While trawling through the dedupe file comparison code trying to fix
page deadlocking problems, Dave Chinner noticed that the reflink code
only takes shared IOLOCK/MMAPLOCKs on the source file. Because
page_mkwrite and directio writes do not take the EXCL versions of those
locks, this means that reflink can race with writer processes.
For pure remapping this can lead to undefined behavior and file
corruption; for dedupe this means that we cannot be sure that the
contents are identical when we decide to go ahead with the remapping.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Currently the driver does not handle EPROBE_DEFER for the confd gpio.
Use devm_gpiod_get_optional() instead of devm_gpiod_get() and return
error codes from altera_ps_probe().
Fixes: 5692fae074 ("fpga manager: Add altera-ps-spi driver for Altera FPGAs")
Signed-off-by: Phil Reid <preid@electromag.com.au>
Signed-off-by: Moritz Fischer <mdf@kernel.org>
Each iteration of for_each_child_of_node puts the previous
node, but in the case of a goto from the middle of the loop, there is
no put, thus causing a memory leak. Hence add an of_node_put before the
goto in two places.
Issue found with Coccinelle.
Fixes: 119f517362 (drm/mediatek: Add DRM Driver for Mediatek SoC MT8173)
Signed-off-by: Nishka Dasgupta <nishkadg.linux@gmail.com>
Signed-off-by: CK Hu <ck.hu@mediatek.com>
This patch adds the call to phy_attached_info() to the hns driver
to identify which exact PHY drivers is in use.
Suggested-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a Tx timestamp is requested, a pointer to the skb is stored in the
ravb_tstamp_skb struct. This was done without an skb_get. There exists
the possibility that the skb could be freed by ravb_tx_free (when
ravb_tx_free is called from ravb_start_xmit) before the timestamp was
processed, leading to a use-after-free bug.
Use skb_get when filling a ravb_tstamp_skb struct, and add appropriate
frees/consumes when a ravb_tstamp_skb struct is freed.
Fixes: c156633f13 ("Renesas Ethernet AVB driver proper")
Signed-off-by: Tho Vu <tho.vu.wh@rvc.renesas.com>
Signed-off-by: Kazuya Mizuguchi <kazuya.mizuguchi.ks@renesas.com>
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for the MediaTek MT7628/88 SoCs to the common
MediaTek ethernet driver. Some minor changes are needed for this and
a bigger change, as the MT7628 does not support QDMA (only PDMA).
Signed-off-by: Stefan Roese <sr@denx.de>
Cc: René van Dorst <opensource@vdorst.com>
Cc: Daniel Golle <daniel@makrotopia.org>
Cc: Sean Wang <sean.wang@mediatek.com>
Cc: John Crispin <john@phrozen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rename the NEXT_RX_DESP_IDX macro to NEXT_DESP_IDX, so that it better
can be used for TX ops as well. This will be used in the upcoming
MT7628/88 support (same functionality for RX and TX in this macro).
Signed-off-by: Stefan Roese <sr@denx.de>
Cc: René van Dorst <opensource@vdorst.com>
Cc: Daniel Golle <daniel@makrotopia.org>
Cc: Sean Wang <sean.wang@mediatek.com>
Cc: John Crispin <john@phrozen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently all QDMA registers are named "MTK_QDMA_foo" in this driver
with one exception: MTK_QMTK_INT_STATUS. This patch renames
MTK_QMTK_INT_STATUS to MTK_QDMA_INT_STATUS so that all macros follow
this rule.
Signed-off-by: Stefan Roese <sr@denx.de>
Cc: René van Dorst <opensource@vdorst.com>
Cc: Daniel Golle <daniel@makrotopia.org>
Cc: Sean Wang <sean.wang@mediatek.com>
Cc: John Crispin <john@phrozen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>