Commit Graph

873808 Commits (d2cd795c4ece1a24fda170c35eeb4f17d9826cbb)

Author SHA1 Message Date
Zhuohao Lee dcc935b06f
mtd: spi-nor: enable the debugfs for the partname and partid
This patch adds spi_nor_debugfs_init() for the debugfs initialization.
With this patch, we can read the partname and partid through the
debugfs.

The output of new debugfs nodes on my device are:
cat /sys/kernel/debug/mtd/mtd0/partid
spi-nor:ef6017
cat /sys/kernel/debug/mtd/mtd0/partname
w25q64dw

Signed-off-by: Zhuohao Lee <zhuohao@chromium.org>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
2019-08-29 10:36:54 +03:00
Zhuohao Lee 1018c94be6
mtd: mtdcore: add debugfs nodes for querying the flash name and id
Currently, we don't have vfs nodes for querying the underlying flash name
and flash id. This information is important especially when we want to
know the flash detail of the defective system. In order to support the
query, we add mtd_debugfs_populate() to create two debugfs nodes
(ie. partname and partid). The upper driver can assign the pointer to
partname and partid before calling mtd_device_register().

Signed-off-by: Zhuohao Lee <zhuohao@chromium.org>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
2019-08-29 10:36:47 +03:00
Chris Packham 23d103ae3e ARM: 8891/1: EDAC: armada_xp: Add support for more SoCs
The Armada 38x and other integrated SoCs use a reduced pin count so the
width of the SDRAM interface is smaller than the Armada XP SoCs. This
means that the definition of "full" and "half" width is reduced from
64/32 to 32/16.

Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2019-08-29 07:58:01 +01:00
Jan Luebbe 7f6998a412 ARM: 8888/1: EDAC: Add driver for the Marvell Armada XP SDRAM and L2 cache ECC
Add support for the ECC functionality as found in the DDR RAM and L2
cache controllers on the MV78230/MV78x60 SoCs. This driver has been
tested on the MV78460 (on a custom board with a DDR3 ECC DIMM).

[cp use SPDX license]

Signed-off-by: Jan Luebbe <jlu@pengutronix.de>
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2019-08-29 07:58:01 +01:00
Jan Luebbe 0ecace04a3 ARM: 8892/1: EDAC: Add missing debugfs_create_x32 wrapper
We already have wrappers for x8 and x16, so add the missing x32 one.

Signed-off-by: Jan Luebbe <jlu@pengutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2019-08-29 07:58:01 +01:00
Chris Packham c8abbd6f9d ARM: 8890/1: l2x0: add marvell,ecc-enable property for aurora
The aurora cache on the Marvell Armada-XP SoC supports ECC protection
for the L2 data arrays. Add a "marvell,ecc-enable" device tree property
which can be used to enable this.

[jlu@pengutronix.de: use aurora specific define AURORA_ACR_ECC_EN]

Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Signed-off-by: Jan Luebbe <jlu@pengutronix.de>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2019-08-29 07:58:01 +01:00
Chris Packham 4bf4770db4 ARM: 8889/1: dt-bindings: document marvell,ecc-enable binding
Add documentation for the marvell,ecc-enable properties which can be
used to enable ECC on the Marvell aurora cache.

Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2019-08-29 07:58:01 +01:00
Chris Packham fd3bbde717 ARM: 8886/1: l2x0: support parity-enable/disable on aurora
The aurora cache on the Marvell Armada-XP SoC supports the same tag
parity features as the other l2x0 cache implementations.

[jlu@pengutronix.de: use aurora specific define AURORA_ACR_PARITY_EN]

Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Signed-off-by: Jan Luebbe <jlu@pengutronix.de>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2019-08-29 07:58:01 +01:00
Jan Luebbe 0770bc9214 ARM: 8885/1: aurora-l2: add defines for parity and ECC registers
These defines will be used by subsequent patches to add support for the
parity check and error correction functionality in the Aurora L2 cache
controller.

Signed-off-by: Jan Luebbe <jlu@pengutronix.de>
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2019-08-29 07:58:01 +01:00
Jan Luebbe 1a85cb4b0d ARM: 8887/1: aurora-l2: add prefix to MAX_RANGE_SIZE
The macro name is too generic, so add a AURORA_ prefix.

Signed-off-by: Jan Luebbe <jlu@pengutronix.de>
Reviewed-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2019-08-29 07:58:01 +01:00
Jan Luebbe 921a3fe5be ARM: 8902/1: l2c: move cache-aurora-l2.h to asm/hardware
This include file will be used by the AURORA EDAC code.

Signed-off-by: Jan Luebbe <jlu@pengutronix.de>
Reviewed-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2019-08-29 07:58:01 +01:00
Nathan Huckleberry 6dc5fd93b2 ARM: 8900/1: UNWINDER_FRAME_POINTER implementation for Clang
The stackframe setup when compiled with clang is different.
Since the stack unwinder expects the gcc stackframe setup it
fails to print backtraces. This patch adds support for the
clang stackframe setup.

Link: https://github.com/ClangBuiltLinux/linux/issues/35

Cc: clang-built-linux@googlegroups.com
Suggested-by: Tri Vo <trong@google.com>
Signed-off-by: Nathan Huckleberry <nhuck@google.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2019-08-29 07:58:01 +01:00
Thomas Gleixner 8f2edb4a78 posix-timers: Unbreak CONFIG_POSIX_TIMERS=n build
The rework of the posix-cpu-timers patch series dropped the empty
declaration of struct cpu_timer for the CONFIG_POSIX_TIMERS=n case which
causes the build to fail:

./include/linux/posix-timers.h:218:20: error: field 'cpu' has incomplete type

Add it back.

Fixes: 60bda037f1 ("posix-cpu-timers: Utilize timerqueue for storage")
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2019-08-29 08:25:21 +02:00
Tejun Heo 8504dea783 blkcg: add tools/cgroup/iocost_coef_gen.py
Add a script which can be used to generate device-specific iocost
linear model coefficients.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-28 21:17:17 -06:00
Tejun Heo 6954ff185e blkcg: add tools/cgroup/iocost_monitor.py
Instead of mucking with debugfs and ->pd_stat(), add drgn based
monitoring script.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-28 21:17:14 -06:00
Tejun Heo 7caa47151a blkcg: implement blk-iocost
This patchset implements IO cost model based work-conserving
proportional controller.

While io.latency provides the capability to comprehensively prioritize
and protect IOs depending on the cgroups, its protection is binary -
the lowest latency target cgroup which is suffering is protected at
the cost of all others.  In many use cases including stacking multiple
workload containers in a single system, it's necessary to distribute
IO capacity with better granularity.

One challenge of controlling IO resources is the lack of trivially
observable cost metric.  The most common metrics - bandwidth and iops
- can be off by orders of magnitude depending on the device type and
IO pattern.  However, the cost isn't a complete mystery.  Given
several key attributes, we can make fairly reliable predictions on how
expensive a given stream of IOs would be, at least compared to other
IO patterns.

The function which determines the cost of a given IO is the IO cost
model for the device.  This controller distributes IO capacity based
on the costs estimated by such model.  The more accurate the cost
model the better but the controller adapts based on IO completion
latency and as long as the relative costs across differents IO
patterns are consistent and sensible, it'll adapt to the actual
performance of the device.

Currently, the only implemented cost model is a simple linear one with
a few sets of default parameters for different classes of device.
This covers most common devices reasonably well.  All the
infrastructure to tune and add different cost models is already in
place and a later patch will also allow using bpf progs for cost
models.

Please see the top comment in blk-iocost.c and documentation for
more details.

v2: Rebased on top of RQ_ALLOC_TIME changes and folded in Rik's fix
    for a divide-by-zero bug in current_hweight() triggered by zero
    inuse_sum.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Andy Newell <newella@fb.com>
Cc: Josef Bacik <jbacik@fb.com>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-28 21:17:12 -06:00
Tejun Heo 6f816b4b74 blk-mq: add optional request->alloc_time_ns
There are currently two start time timestamps - start_time_ns and
io_start_time_ns.  The former marks the request allocation and and the
second issue-to-device time.  The planned io.weight controller needs
to measure the total time bios take to execute after it leaves rq_qos
including the time spent waiting for request to become available,
which can easily dominate on saturated devices.

This patch adds request->alloc_time_ns which records when the request
allocation attempt started.  As it isn't used for the usual stats,
make it optional behind CONFIG_BLK_RQ_ALLOC_TIME and
QUEUE_FLAG_RQ_ALLOC_TIME so that it can be compiled out when there are
no users and it's active only on queues which need it even when
compiled in.

v2: s/pre_start_time/alloc_time/ and add CONFIG_BLK_RQ_ALLOC_TIME
    gating as suggested by Jens.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-28 21:17:10 -06:00
Tejun Heo beab17fc2a blkcg: s/RQ_QOS_CGROUP/RQ_QOS_LATENCY/
io.weight is gonna be another rq_qos cgroup mechanism.  Let's rename
RQ_QOS_CGROUP which is being used by io.latency to RQ_QOS_LATENCY in
preparation.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-28 21:17:08 -06:00
Tejun Heo 9677a3e01f block/rq_qos: implement rq_qos_ops->queue_depth_changed()
wbt already gets queue depth changed notification through
wbt_set_queue_depth().  Generalize it into
rq_qos_ops->queue_depth_changed() so that other rq_qos policies can
easily hook into the events too.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-28 21:17:07 -06:00
Tejun Heo d3e65ffff6 block/rq_qos: add rq_qos_merge()
Add a merge hook for rq_qos.  This will be used by io.weight.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-28 21:17:05 -06:00
Tejun Heo 015d254cb0 blkcg: separate blkcg_conf_get_disk() out of blkg_conf_prep()
Separate out blkcg_conf_get_disk() so that it can be used by blkcg
policy interface file input parsers before the policy is actually
enabled.  This doesn't introduce any functional changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-28 21:17:04 -06:00
Tejun Heo 86a5bba5c2 blkcg: make ->cpd_init_fn() optional
For policies which can do enough initialization from ->cpd_alloc_fn(),
make ->cpd_init_fn() optional.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-28 21:17:03 -06:00
Tejun Heo cf09a8ee19 blkcg: pass @q and @blkcg into blkcg_pol_alloc_pd_fn()
Instead of @node, pass in @q and @blkcg so that the alloc function has
more context.  This doesn't cause any behavior change and will be used
by io.weight implementation.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-28 21:17:01 -06:00
David S. Miller 198836fdff Merge branch 'mlxsw-Various-updates'
Ido Schimmel says:

====================
mlxsw: Various updates

Patch #1 from Amit removes 56G speed support. The reasons for this are
detailed in the commit message.

Patch #2 from Shalom ensures that the hardware does not auto negotiate
the number of used lanes. For example, if a four lane port supports 100G
over both two and four lanes, it will not advertise the two lane link
mode.

Patch #3 bumps the firmware version supported by the driver.

Patch #4 from Petr adds ethtool counters to help debug the internal PTP
implementation in mlxsw. I copied Richard on this patch in case he has
comments.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-28 18:24:04 -07:00
Petr Machata dc4f3eb08a mlxsw: spectrum_ptp: Add counters for GC events
On Spectrum-1, timestamped PTP packets and the corresponding timestamps need to
be kept in caches until both are available, at which point they are matched up
and packets forwarded as appropriate. However, not all packets will ever see
their timestamp, and not all timestamps will ever see their packet. It is
necessary to dispose of such abandoned entries, so a garbage collector was
introduced in commit 5d23e41597 ("mlxsw: spectrum: PTP: Garbage-collect
unmatched entries").

If these GC events happen often, it is a sign of a problem. However because this
whole mechanism is taking place behind the scenes, there is no direct way to
determine whether garbage collection took place.

Therefore to fix this, on Spectrum-1 only, expose four artificial ethtool
counters for the GC events: GCd timestamps and packets, in TX and RX directions.

Cc: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-28 18:24:04 -07:00
Ido Schimmel 45bd634131 mlxsw: Bump firmware version to 13.2000.1886
The new version supports extended error reporting from firmware via a
new TLV in the EMAD packet. Similar to netlink extended ack.

It also fixes an issue in the PCI code that can result in false AER
errors under high Tx rate.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-28 18:24:04 -07:00
Shalom Toledo 3f61967f41 mlxsw: spectrum: Prevent auto negotiation on number of lanes
After 50G-1-lane and 100G-2-lanes link modes were introduced, the driver
is facing situations in which the hardware auto negotiates not only on
speed and type, but also on number of lanes.

Prevent auto negotiation on number of lanes by allowing only port speeds
that can be supported on a given port according to its width.

Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-28 18:24:04 -07:00
Amit Cohen b97cd89126 mlxsw: Remove 56G speed support
Commit 275e928f19 ("mlxsw: spectrum: Prevent force of 56G") prevented
the driver from setting a speed of 56G when auto-negotiation is off.
This is the only speed supported by mlxsw that cannot be set when
auto-negotiation is off, which makes it difficult to write generic
tests.

Further, the speed is not supported by newer ASICs such as Spectrum-2
and to the best of our knowledge it is not used by current users.

Therefore, remove 56G support from mlxsw.

Signed-off-by: Amit Cohen <amitc@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-28 18:24:04 -07:00
J. Bruce Fields 2b86e3aaf9 nfsd: eliminate an unnecessary acl size limit
We're unnecessarily limiting the size of an ACL to less than what most
filesystems will support.  Some users do hit the limit and it's
confusing and unnecessary.

It still seems prudent to impose some limit on the number of ACEs the
client gives us before passing it straight to kmalloc().  So, let's just
limit it to the maximum number that would be possible given the amount
of data left in the argument buffer.

That will still leave one limit beyond whatever the filesystem imposes:
the client and server negotiate a limit on the size of a request, which
we have to respect.

But we're no longer imposing any additional arbitrary limit.

struct nfs4_ace is 20 bytes on my system and the maximum call size we'll
negotiate is about a megabyte, so in practice this is limiting the
allocation here to about a megabyte.

Reported-by: "de Vandiere, Louis" <louis.devandiere@atos.net>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-08-28 21:13:45 -04:00
Jian Shen 95fb8bb318 net: phy: force phy suspend when calling phy_stop
Some ethernet drivers may call phy_start() and phy_stop() from
ndo_open() and ndo_close() respectively.

When network cable is unconnected, and operate like below:
step 1: ifconfig ethX up -> ndo_open -> phy_start ->start
autoneg, and phy is no link.
step 2: ifconfig ethX down -> ndo_close -> phy_stop -> just stop
phy state machine.

This patch forces phy suspend even phydev->link is off.

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-28 16:16:27 -07:00
Takashi Iwai 189308d582 sky2: Disable MSI on yet another ASUS boards (P6Xxxx)
A similar workaround for the suspend/resume problem is needed for yet
another ASUS machines, P6X models.  Like the previous fix, the BIOS
doesn't provide the standard DMI_SYS_* entry, so again DMI_BOARD_*
entries are used instead.

Reported-and-tested-by: SteveM <swm@swm1.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-28 16:09:02 -07:00
David S. Miller 807e329995 Merge branch 'nfp-flower-fix-bugs-in-merge-tunnel-encap-code'
Jakub Kicinski says:

====================
nfp: flower: fix bugs in merge tunnel encap code

John says:

There are few bugs in the merge encap code that have come to light with
recent driver changes. Effectively, flow bind callbacks were being
registered twice when using internal ports (new 'busy' code triggers
this). There was also an issue with neighbour notifier messages being
ignored for internal ports.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-28 16:06:49 -07:00
John Hurley e8024cb483 nfp: flower: handle neighbour events on internal ports
Recent code changes to NFP allowed the offload of neighbour entries to FW
when the next hop device was an internal port. This allows for offload of
tunnel encap when the end-point IP address is applied to such a port.

Unfortunately, the neighbour event handler still rejects events that are
not associated with a repr dev and so the firmware neighbour table may get
out of sync for internal ports.

Fix this by allowing internal port neighbour events to be correctly
processed.

Fixes: 45756dfeda ("nfp: flower: allow tunnels to output to internal port")
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-28 16:06:49 -07:00
John Hurley 739d7c5752 nfp: flower: prevent ingress block binds on internal ports
Internal port TC offload is implemented through user-space applications
(such as OvS) by adding filters at egress via TC clsact qdiscs. Indirect
block offload support in the NFP driver accepts both ingress qdisc binds
and egress binds if the device is an internal port. However, clsact sends
bind notification for both ingress and egress block binds which can lead
to the driver registering multiple callbacks and receiving multiple
notifications of new filters.

Fix this by rejecting ingress block bind callbacks when the port is
internal and only adding filter callbacks for egress binds.

Fixes: 4d12ba4278 ("nfp: flower: allow offloading of matches on 'internal' ports")
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-28 16:06:49 -07:00
David S. Miller 80a6a5d62d Merge branch 'r8152-fix-side-effect'
Hayes Wang says:

====================
r8152: fix side effect

v3:
Update the commit message for patch #1.

v2:
Replace patch #2 with "r8152: remove calling netif_napi_del".

v1:
The commit 0ee1f47349 ("r8152: napi hangup fix after disconnect")
add a check to avoid using napi_disable after netif_napi_del. However,
the commit ffa9fec30c ("r8152: set RTL8152_UNPLUG only for real
disconnection") let the check useless.

Therefore, I revert commit 0ee1f47349 ("r8152: napi hangup fix
after disconnect") first, and add another patch to fix it.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-28 16:02:32 -07:00
Hayes Wang 973dc6cfc0 r8152: remove calling netif_napi_del
Remove unnecessary use of netif_napi_del. This also avoids to call
napi_disable() after netif_napi_del().

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-28 16:02:32 -07:00
Hayes Wang 49d4b14113 Revert "r8152: napi hangup fix after disconnect"
This reverts commit 0ee1f47349.

The commit 0ee1f47349 ("r8152: napi hangup fix after
disconnect") adds a check about RTL8152_UNPLUG to determine
if calling napi_disable() is invalid in rtl8152_close(),
when rtl8152_disconnect() is called. This avoids to use
napi_disable() after calling netif_napi_del().

Howver, commit ffa9fec30c ("r8152: set RTL8152_UNPLUG
only for real disconnection") causes that RTL8152_UNPLUG
is not always set when calling rtl8152_disconnect().
Therefore, I have to revert commit 0ee1f47349 ("r8152:
napi hangup fix after disconnect"), first. And submit
another patch to fix it.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-28 16:02:32 -07:00
Davide Caratti 092e22e586 net/sched: pfifo_fast: fix wrong dereference in pfifo_fast_enqueue
Now that 'TCQ_F_CPUSTATS' bit can be cleared, depending on the value of
'TCQ_F_NOLOCK' bit in the parent qdisc, we can't assume anymore that
per-cpu counters are there in the error path of skb_array_produce().
Otherwise, the following splat can be seen:

 Unable to handle kernel paging request at virtual address 0000600dea430008
 Mem abort info:
   ESR = 0x96000005
   Exception class = DABT (current EL), IL = 32 bits
   SET = 0, FnV = 0
   EA = 0, S1PTW = 0
 Data abort info:
   ISV = 0, ISS = 0x00000005
   CM = 0, WnR = 0
 user pgtable: 64k pages, 48-bit VAs, pgdp = 000000007b97530e
 [0000600dea430008] pgd=0000000000000000, pud=0000000000000000
 Internal error: Oops: 96000005 [#1] SMP
[...]
 pstate: 10000005 (nzcV daif -PAN -UAO)
 pc : pfifo_fast_enqueue+0x524/0x6e8
 lr : pfifo_fast_enqueue+0x46c/0x6e8
 sp : ffff800d39376fe0
 x29: ffff800d39376fe0 x28: 1ffff001a07d1e40
 x27: ffff800d03e8f188 x26: ffff800d03e8f200
 x25: 0000000000000062 x24: ffff800d393772f0
 x23: 0000000000000000 x22: 0000000000000403
 x21: ffff800cca569a00 x20: ffff800d03e8ee00
 x19: ffff800cca569a10 x18: 00000000000000bf
 x17: 0000000000000000 x16: 0000000000000000
 x15: 0000000000000000 x14: ffff1001a726edd0
 x13: 1fffe4000276a9a4 x12: 0000000000000000
 x11: dfff200000000000 x10: ffff800d03e8f1a0
 x9 : 0000000000000003 x8 : 0000000000000000
 x7 : 00000000f1f1f1f1 x6 : ffff1001a726edea
 x5 : ffff800cca56a53c x4 : 1ffff001bf9a8003
 x3 : 1ffff001bf9a8003 x2 : 1ffff001a07d1dcb
 x1 : 0000600dea430000 x0 : 0000600dea430008
 Process ping (pid: 6067, stack limit = 0x00000000dc0aa557)
 Call trace:
  pfifo_fast_enqueue+0x524/0x6e8
  htb_enqueue+0x660/0x10e0 [sch_htb]
  __dev_queue_xmit+0x123c/0x2de0
  dev_queue_xmit+0x24/0x30
  ip_finish_output2+0xc48/0x1720
  ip_finish_output+0x548/0x9d8
  ip_output+0x334/0x788
  ip_local_out+0x90/0x138
  ip_send_skb+0x44/0x1d0
  ip_push_pending_frames+0x5c/0x78
  raw_sendmsg+0xed8/0x28d0
  inet_sendmsg+0xc4/0x5c0
  sock_sendmsg+0xac/0x108
  __sys_sendto+0x1ac/0x2a0
  __arm64_sys_sendto+0xc4/0x138
  el0_svc_handler+0x13c/0x298
  el0_svc+0x8/0xc
 Code: f9402e80 d538d081 91002000 8b010000 (885f7c03)

Fix this by testing the value of 'TCQ_F_CPUSTATS' bit in 'qdisc->flags',
before dereferencing 'qdisc->cpu_qstats'.

Fixes: 8a53e616de ("net: sched: when clearing NOLOCK, clear TCQ_F_CPUSTATS, too")
CC: Paolo Abeni <pabeni@redhat.com>
CC: Stefano Brivio <sbrivio@redhat.com>
Reported-by: Li Shuang <shuali@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-28 15:57:38 -07:00
Willem de Bruijn 888a5c53c0 tcp: inherit timestamp on mtu probe
TCP associates tx timestamp requests with a byte in the bytestream.
If merging skbs in tcp_mtu_probe, migrate the tstamp request.

Similar to MSG_EOR, do not allow moving a timestamp from any segment
in the probe but the last. This to avoid merging multiple timestamps.

Tested with the packetdrill script at
https://github.com/wdebruij/packetdrill/commits/mtu_probe-1

Link: http://patchwork.ozlabs.org/patch/1143278/#2232897
Fixes: 4ed2d765df ("net-timestamp: TCP timestamping")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-28 15:56:28 -07:00
Vlad Buslov dbf47a2a09 net: sched: act_sample: fix psample group handling on overwrite
Action sample doesn't properly handle psample_group pointer in overwrite
case. Following issues need to be fixed:

- In tcf_sample_init() function RCU_INIT_POINTER() is used to set
  s->psample_group, even though we neither setting the pointer to NULL, nor
  preventing concurrent readers from accessing the pointer in some way.
  Use rcu_swap_protected() instead to safely reset the pointer.

- Old value of s->psample_group is not released or deallocated in any way,
  which results resource leak. Use psample_group_put() on non-NULL value
  obtained with rcu_swap_protected().

- The function psample_group_put() that released reference to struct
  psample_group pointed by rcu-pointer s->psample_group doesn't respect rcu
  grace period when deallocating it. Extend struct psample_group with rcu
  head and use kfree_rcu when freeing it.

Fixes: 5c5670fae4 ("net/sched: Introduce sample tc action")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-28 15:53:51 -07:00
David S. Miller 8e4a2adced Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
100GbE Intel Wired LAN Driver Updates 2019-08-26

This series contains updates to ice driver only.

Usha fixes the statistics reported on 4 port NICs which were reporting
the incorrect statistics due to using the incorrect port identifier.

Victor fixes an issue when trying to traverse to the first node of a
requested layer by adding a sibling head pointer for each layer per
traffic class.

Anirudh cleans up the locking and logic for enabling and disabling
VSI's to make it more consistent.  Updates the driver to do dynamic
allocation of queue management bitmaps and arrays, rather than
statically allocating them which consumes more memory than required.
Refactor the logic in ice_ena_msix_range() for clarity and add
additional checks for when requested resources exceed what is available.

Jesse updates the debugging print statements to make it more useful when
dealing with link and PHY related issues.

Krzysztof adds a local variable to the VSI rebuild path to improve
readability.

Akeem limits the reporting of MDD events from VFs so that the kernel
log is not clogged up with MDD events which are duplicate or potentially
false positives.  Fixed a reset issue that would result in the system
getting into a state that could only be resolved by a reboot by
testing if the VF is in a disabled state during a reset.

Michal adds a check to avoid trying to access memory that has not be
allocated by checking the number of queue pairs.

Jake fixes a static analysis warning due to a cast of a u8 to unsigned
long, so just update ice_is_tc_ena() to take a unsigned long so that a
cast is not necessary.

Colin Ian King fixes a potential infinite loop where a u8 is being
compared to an int.

Maciej refactors the queue handling functions that work on queue arrays
so that the logic can be done for a single queue.

Paul adds support for VFs to enable and disable single queues.

Henry fixed the order of operations in ice_remove() which was trying to
use adminq operations that were already disabled.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-28 15:47:11 -07:00
Thomas Falcon 36f1031c51 ibmvnic: Do not process reset during or after device removal
Currently, the ibmvnic driver will not schedule device resets
if the device is being removed, but does not check the device
state before the reset is actually processed. This leads to a race
where a reset is scheduled with a valid device state but is
processed after the driver has been removed, resulting in an oops.

Fix this by checking the device state before processing a queued
reset event.

Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Tested-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-28 15:45:40 -07:00
zhaoyang 5b3efa4f14 ARM: 8901/1: add a criteria for pfn_valid of arm
pfn_valid can be wrong when parsing a invalid pfn whose phys address
exceeds BITS_PER_LONG as the MSB will be trimed when shifted.

The issue originally arise from bellowing call stack, which corresponding to
an access of the /proc/kpageflags from userspace with a invalid pfn parameter
and leads to kernel panic.

[46886.723249] c7 [<c031ff98>] (stable_page_flags) from [<c03203f8>]
[46886.723264] c7 [<c0320368>] (kpageflags_read) from [<c0312030>]
[46886.723280] c7 [<c0311fb0>] (proc_reg_read) from [<c02a6e6c>]
[46886.723290] c7 [<c02a6e24>] (__vfs_read) from [<c02a7018>]
[46886.723301] c7 [<c02a6f74>] (vfs_read) from [<c02a778c>]
[46886.723315] c7 [<c02a770c>] (SyS_pread64) from [<c0108620>]
(ret_fast_syscall+0x0/0x28)

Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2019-08-28 23:30:21 +01:00
Anup Patel a256f2e329 RISC-V: Fix FIXMAP area corruption on RV32 systems
Currently, various virtual memory areas of Linux RISC-V are organized
in increasing order of their virtual addresses is as follows:
1. User space area (This is lowest area and starts at 0x0)
2. FIXMAP area
3. VMALLOC area
4. Kernel area (This is highest area and starts at PAGE_OFFSET)

The maximum size of user space aread is represented by TASK_SIZE.

On RV32 systems, TASK_SIZE is defined as VMALLOC_START which causes the
user space area to overlap the FIXMAP area. This allows user space apps
to potentially corrupt the FIXMAP area and kernel OF APIs will crash
whenever they access corrupted FDT in the FIXMAP area.

On RV64 systems, TASK_SIZE is set to fixed 256GB and no other areas
happen to overlap so we don't see any FIXMAP area corruptions.

This patch fixes FIXMAP area corruption on RV32 systems by setting
TASK_SIZE to FIXADDR_START. We also move FIXADDR_TOP, FIXADDR_SIZE,
and FIXADDR_START defines to asm/pgtable.h so that we can avoid cyclic
header includes.

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Tested-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
2019-08-28 15:30:12 -07:00
Justin Pettit 0754b4e8cd openvswitch: Clear the L4 portion of the key for "later" fragments.
Only the first fragment in a datagram contains the L4 headers.  When the
Open vSwitch module parses a packet, it always sets the IP protocol
field in the key, but can only set the L4 fields on the first fragment.
The original behavior would not clear the L4 portion of the key, so
garbage values would be sent in the key for "later" fragments.  This
patch clears the L4 fields in that circumstance to prevent sending those
garbage values as part of the upcall.

Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-28 14:53:51 -07:00
Greg Rose ad06a566e1 openvswitch: Properly set L4 keys on "later" IP fragments
When IP fragments are reassembled before being sent to conntrack, the
key from the last fragment is used.  Unless there are reordering
issues, the last fragment received will not contain the L4 ports, so the
key for the reassembled datagram won't contain them.  This patch updates
the key once we have a reassembled datagram.

The handle_fragments() function works on L3 headers so we pull the L3/L4
flow key update code from key_extract into a new function
'key_extract_l3l4'.  Then we add a another new function
ovs_flow_key_update_l3l4() and export it so that it is accessible by
handle_fragments() for conntrack packet reassembly.

Co-authored-by: Justin Pettit <jpettit@ovn.org>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-28 14:53:51 -07:00
YueHaibing 3894793e4b phy: mdio-sun4i: use devm_platform_ioremap_resource() to simplify code
Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-28 14:51:00 -07:00
YueHaibing bd51ce0583 phy: mdio-mux-meson-g12a: use devm_platform_ioremap_resource() to simplify code
Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-28 14:51:00 -07:00
YueHaibing ea7076923b phy: mdio-moxart: use devm_platform_ioremap_resource() to simplify code
Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-28 14:51:00 -07:00
YueHaibing ba869d3c40 phy: mdio-hisi-femac: use devm_platform_ioremap_resource() to simplify code
Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-28 14:51:00 -07:00