Commit Graph

948892 Commits (94dea151bf3651c01acb12a38ca75ba9d26ea4da)

Author SHA1 Message Date
Eric Biggers 97c6327f71 fscrypt: use smp_load_acquire() for fscrypt_prepared_key
Normally smp_store_release() or cmpxchg_release() is paired with
smp_load_acquire().  Sometimes smp_load_acquire() can be replaced with
the more lightweight READ_ONCE().  However, for this to be safe, all the
published memory must only be accessed in a way that involves the
pointer itself.  This may not be the case if allocating the object also
involves initializing a static or global variable, for example.

fscrypt_prepared_key includes a pointer to a crypto_skcipher object,
which is internal to and is allocated by the crypto subsystem.  By using
READ_ONCE() for it, we're relying on internal implementation details of
the crypto subsystem.

Remove this fragile assumption by using smp_load_acquire() instead.

(Note: I haven't seen any real-world problems here.  This change is just
fixing the code to be guaranteed correct and less fragile.)

Fixes: 5fee36095c ("fscrypt: add inline encryption support")
Cc: Satya Tangirala <satyat@google.com>
Link: https://lore.kernel.org/r/20200721225920.114347-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-07-21 16:02:13 -07:00
Eric Biggers bd0d97b719 fscrypt: switch fscrypt_do_sha256() to use the SHA-256 library
fscrypt_do_sha256() is only used for hashing encrypted filenames to
create no-key tokens, which isn't performance-critical.  Therefore a C
implementation of SHA-256 is sufficient.

Also, the logic to create no-key tokens is always potentially needed.
This differs from fscrypt's other dependencies on crypto API algorithms,
which are conditionally needed depending on what encryption policies
userspace is using.  Therefore, for fscrypt there isn't much benefit to
allowing SHA-256 to be a loadable module.

So, make fscrypt_do_sha256() use the SHA-256 library instead of the
crypto_shash API.  This is much simpler, since it avoids having to
implement one-time-init (which is hard to do correctly, and in fact was
implemented incorrectly) and handle failures to allocate the
crypto_shash object.

Fixes: edc440e3d2 ("fscrypt: improve format of no-key names")
Cc: Daniel Rosenberg <drosen@google.com>
Link: https://lore.kernel.org/r/20200721225920.114347-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-07-21 16:02:13 -07:00
Taehee Yoo 2c9d8e01f0 netdevsim: fix unbalaced locking in nsim_create()
In the nsim_create(), rtnl_lock() is called before nsim_bpf_init().
If nsim_bpf_init() is failed, rtnl_unlock() should be called,
but it isn't called.
So, unbalanced locking would occur.

Fixes: e05b2d141f ("netdevsim: move netdev creation/destruction to dev probe")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21 16:00:43 -07:00
David S. Miller 4c8024f731 Merge branch 'ena-driver-new-features'
Arthur Kiyanovski says:

====================
ENA driver new features

V4 changes:
-----------
Add smp_rmb() to "net: ena: avoid unnecessary rearming of interrupt
vector when busy-polling" to adhere to the linux kernel memory model,
and update the commit message accordingly.

V3 changes:
-----------
1. Add "net: ena: enable support of rss hash key and function
   changes" patch again, with more explanations why it should
   be in net-next in commit message.
2. Add synchronization considerations to "net: ena: avoid unnecessary
   rearming of interrupt vector when busy-polling"

V2 changes:
-----------
1. Update commit messages of 2 patches to be more verbose.
2. Remove "net: ena: enable support of rss hash key and function
   changes" patch. Will be resubmitted net.

V1 cover letter:
----------------
This patchset contains performance improvements, support for new devices
and functionality:

1. Support for upcoming ENA devices
2. Avoid unnecessary IRQ unmasking in busy poll to reduce interrupt rate
3. Enabling device support for RSS function and key manipulation
4. Support for NIC-based traffic mirroring (SPAN port)
5. Additional PCI device ID
6. Cosmetic changes
====================

Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21 15:59:04 -07:00
Arthur Kiyanovski 0e3a3f6dac net: ena: support new LLQ acceleration mode
New devices add a new hardware acceleration engine, which adds some
restrictions to the driver.
Metadata descriptor must be present for each packet and the maximum
burst size between two doorbells is now limited to a number
advertised by the device.

This patch adds:
1. A handshake protocol between the driver and the device, so the
device will enable the accelerated queues only when both sides
support it.

2. The driver support for the new acceleration engine:
2.1. Send metadata descriptor for each Tx packet.
2.2. Limit the number of packets sent between doorbells.(*)

(*) A previous driver implementation of this feature was comitted in
commit 05d62ca218 ("net: ena: add handling of llq max tx burst size")
however the design of the interface between the driver and device
changed since then. This change is reflected in this commit.

Signed-off-by: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21 15:59:04 -07:00
Arthur Kiyanovski c29efeae37 net: ena: move llq configuration from ena_probe to ena_device_init()
When the ENA device resets to recover from some error state, all LLQ
configuration values are reset to their defaults, because LLQ is
initialized only once during ena_probe().

Changes in this commit:
1. Move the LLQ configuration process into ena_init_device()
which is called from both ena_probe() and ena_restore_device(). This
way, LLQ setup configurations that are different from the default
values will survive resets.

2. Extract the LLQ bar mapping to ena_map_llq_bar(),
and call once in the lifetime of the driver from ena_probe(),
since there is no need to unmap and map the LLQ bar again every reset.

3. Map the LLQ bar if it exists, regardless if initialization of LLQ
placement policy (ENA_ADMIN_PLACEMENT_POLICY_DEV) succeeded
or not. Initialization might fail the first time, falling back to the
ENA_ADMIN_PLACEMENT_POLICY_HOST placement policy, but later succeed
after device reset, in which case the LLQ bar needs to be mapped
already.

Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21 15:59:04 -07:00
Arthur Kiyanovski 0ee60edf46 net: ena: enable support of rss hash key and function changes
Add the rss_configurable_function_key bit to driver_supported_feature.

This bit tells the device that the driver in question supports the
retrieving and updating of RSS function and hash key, and therefore
the device should allow RSS function and key manipulation.

This commit turns on  device support for hash key and RSS function
management. Without this commit this feature is turned off at the
device and appears to the user as unsupported.

This commit concludes the following series of already merged commits:
commit 0af3c4e2ea ("net: ena: changes to RSS hash key allocation")
commit c1bd17e51c ("net: ena: change default RSS hash function to Toeplitz")
commit f66c2ea3b1 ("net: ena: allow setting the hash function without changing the key")
commit e9a1de378d ("net: ena: fix error returning in ena_com_get_hash_function()")
commit 80f8443fcd ("net: ena: avoid unnecessary admin command when RSS function set fails")
commit 6a4f7dc82d ("net: ena: rss: do not allocate key when not supported")
commit 0d1c3de7b8 ("net: ena: fix incorrect default RSS key")

The above commits represent the last part of the implementation of
this feature, and with them merged the feature can be enabled
in the device.

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21 15:59:04 -07:00
Arthur Kiyanovski 0f505c604e net: ena: add support for traffic mirroring
Add support for traffic mirroring, where the hardware reads the
buffer from the instance memory directly.

Traffic Mirroring needs access to the rx buffers in the instance.
To have this access, this patch:
1. Changes the code to map and unmap the rx buffers bidirectionally.
2. Enables the relevant bit in driver_supported_features to indicate
   to the FW that this driver supports traffic mirroring.

Rx completion is not generated until mirroring is done to avoid
the situation where the driver changes the buffer before it is
mirrored.

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21 15:59:04 -07:00
Arthur Kiyanovski 0dcec68651 net: ena: cosmetic: change ena_com_stats_admin stats to u64
The size of the admin statistics in ena_com_stats_admin is changed
from 32bit to 64bit so to align with the sizes of the other statistics
in the driver (i.e. rx_stats, tx_stats and ena_stats_dev).

This is done as part of an effort to create a unified API to read
statistics.

Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21 15:59:04 -07:00
Arthur Kiyanovski 79890d3f3c net: ena: cosmetic: satisfy gcc warning
gcc 4.8 reports a warning when initializing with = {0}.
Dropping the "0" from the braces fixes the issue.
This fix is not ANSI compatible but is allowed by gcc.

Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21 15:59:04 -07:00
Arthur Kiyanovski 866032ab4d net: ena: add reserved PCI device ID
Add a reserved PCI device ID to the driver's table
Used for internal testing purposes.

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21 15:59:04 -07:00
Arthur Kiyanovski 1e5ae35072 net: ena: avoid unnecessary rearming of interrupt vector when busy-polling
For an overview of the race created by this patch goto synchronization
label.

In napi busy-poll mode, the kernel invokes the napi handler of the
device repeatedly to poll the NIC's receive queues. This process
repeats until a timeout, specific for each connection, is up.
By polling packets in busy-poll mode the user may gain lower latency
and higher throughput (since the kernel no longer waits for interrupts
to poll the queues) in expense of CPU usage.

Upon completing a napi routine, the driver checks whether
the routine was called by an interrupt handler. If so, the driver
re-enables interrupts for the device. This is needed since an
interrupt routine invocation disables future invocations until
explicitly re-enabled.

The driver avoids re-enabling the interrupts if they were not disabled
in the first place (e.g. if driver in busy mode).
Originally, the driver checked whether interrupt re-enabling is needed
by reading the 'ena_napi->unmask_interrupt' variable. This atomic
variable was set upon interrupt and cleared after re-enabling it.

In the 4.10 Linux version, the 'napi_complete_done' call was changed
so that it returns 'false' when device should not re-enable
interrupts, and 'true' otherwise. The change includes reading the
"NAPIF_STATE_IN_BUSY_POLL" flag to check if the napi call is in
busy-poll mode, and if so, return 'false'.
The driver was changed to re-enable interrupts according to this
routine's return value.
The Linux community rejected the use of the
'ena_napi->unmaunmask_interrupt' variable to determine whether
unmasking is needed, and urged to use napi_napi_complete_done()
return value solely.
See https://lore.kernel.org/patchwork/patch/741149/ for more details

As explained, a busy-poll session exists for a specified timeout
value, after which it exits the busy-poll mode and re-enters it later.
This leads to many invocations of the napi handler where
napi_complete_done() false indicates that interrupts should be
re-enabled.
This creates a bug in which the interrupts are re-enabled
unnecessarily.
To reproduce this bug:
    1) echo 50 | sudo tee /proc/sys/net/core/busy_poll
    2) echo 50 | sudo tee /proc/sys/net/core/busy_read
    3) Add counters that check whether
    'ena_unmask_interrupt(tx_ring, rx_ring);'
    is called without disabling the interrupts in the first
    place (i.e. with calling the interrupt routine
    ena_intr_msix_io())

Steps 1+2 enable busy-poll as the default mode for new connections.

The busy poll routine rearms the interrupts after every session by
design, and so we need to add an extra check that the interrupts were
masked in the first place.

synchronization:
This patch introduces a race between the interrupt handler
ena_intr_msix_io() and the napi routine ena_io_poll().
Some macros and instruction were added to prevent this race from leaving
the interrupts masked. The following specifies the different race
scenarios in this patch:

1) interrupt handler and napi routine run sequentially
    i) interrupt handler is called, sets 'interrupts_masked' flag and
	successfully schedules the napi handler via softirq.

    In this scenario the napi routine might not see the flag change
    for several reasons:
	a) The flag is stored in a register by the compiler. For this
	case the WRITE_ONCE macro which prevents this.
	b) The compiler might reorder the instruction. For this the
	smp_wmb() instruction was used which implies a compiler memory
	barrier.
	c) On archs with weak consistency model (like ARM64) the napi
	routine might be scheduled and start running before the flag
	STORE instruction is committed to cache/memory. To ensure this
	doesn't happen, the smp_wmb() instruction was added. It ensures
	that the flag set instruction is committed before scheduling
	napi.

    ii) compiler reorders the flag's value check in the 'if' with
    the flag set in the napi routine.

    This scenario is prevented by smp_rmb() call after the flag check.

2) interrupt handler and napi routine run in parallel (can happen when
busy poll routine invokes the napi handler)

    i) interrupt handler sets the flag in one core, while the napi
    routine reads it in another core.

    This scenario also is divided into two cases:
	a) napi_complete_done() doesn't finish running, in which case
	napi_sched() would just set NAPIF_STATE_MISSED and the napi
	routine would reschedule itself without changing the flag's value.

	b) napi_complete_done() finishes running. In this case the
	napi routine might override the flag's value.
	This doesn't present any rise since it later unmasks the
	interrupt vector.

Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21 15:59:04 -07:00
Yuval Basson d4eae993fc qed: Fix ILT and XRCD bitmap memory leaks
- Free ILT lines used for XRC-SRQ's contexts.
- Free XRCD bitmap

Fixes: b8204ad878 ("qed: changes to ILT to support XRC")
Fixes: 7bfb399eca ("qed: Add XRC to RoCE")
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Yuval Basson <ybason@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21 15:50:53 -07:00
Helmut Grohne 3506b2f42d net: dsa: microchip: call phy_remove_link_mode during probe
When doing "ip link set dev ... up" for a ksz9477 backed link,
ksz9477_phy_setup is called and it calls phy_remove_link_mode to remove
1000baseT HDX. During phy_remove_link_mode, phy_advertise_supported is
called. Doing so reverts any previous change to advertised link modes
e.g. using a udevd .link file.

phy_remove_link_mode is not meant to be used while opening a link and
should be called during phy probe when the link is not yet available to
userspace.

Therefore move the phy_remove_link_mode calls into
ksz9477_switch_register. It indirectly calls dsa_register_switch, which
creates the relevant struct phy_devices and we update the link modes
right after that. At that time dev->features is already initialized by
ksz9477_switch_detect.

Remove phy_setup from ksz_dev_ops as no users remain.

Link: https://lore.kernel.org/netdev/20200715192722.GD1256692@lunn.ch/
Fixes: 42fc6a4c61 ("net: dsa: microchip: prepare PHY for proper advertisement")
Signed-off-by: Helmut Grohne <helmut.grohne@intenta.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21 15:50:02 -07:00
David S. Miller 2dc3bd74d7 Merge branch 'hns3-fixes'
Huazhong Tan says:

====================
net: hns3: fixes for -net

There are some bugfixes for the HNS3 ethernet driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21 15:49:17 -07:00
Jian Shen fac24df7b9 net: hns3: fix return value error when query MAC link status fail
Currently, PF queries the MAC link status per second by calling
function hclge_get_mac_link_status(). It return the error code
when failed to send cmdq command to firmware. It's incorrect,
because this return value is used as the MAC link status, which
0 means link down, and none-zero means link up. So fixes it.

Fixes: 46a3df9f97 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21 15:49:17 -07:00
Yunsheng Lin 8ceca59fb3 net: hns3: fix error handling for desc filling
The content of the TX desc is automatically cleared by the HW
when the HW has sent out the packet to the wire. When desc filling
fails in hns3_nic_net_xmit(), it will call hns3_clear_desc() to do
the error handling, which miss zeroing of the TX desc and the
checking if a unmapping is needed.

So add the zeroing and checking in hns3_clear_desc() to avoid the
above problem. Also add DESC_TYPE_UNKNOWN to indicate the info in
desc_cb is not valid, because hns3_nic_reclaim_desc() may treat
the desc_cb->type of zero as packet and add to the sent pkt
statistics accordingly.

Fixes: 76ad4f0ee7 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21 15:49:17 -07:00
Yunsheng Lin 48ae74c9d8 net: hns3: fix for not calculating TX BD send size correctly
With GRO and fraglist support, the SKB can be aggregated to
a total size of 65535, and when that SKB is forwarded through
a bridge, the size of the SKB may be pushed to exceed the size
of 65535 when br_dev_queue_push_xmit() is called.

The max send size of BD supported by the HW is 65535, when a SKB
with a headlen of over 65535 is sent to the driver, the driver
needs to use multi BD to send the linear data, and the send size
of the last BD is calculated incorrectly by the driver who is
using '&' operation, which causes a TX error.

Use '%' operation to fix this problem.

Fixes: 3fe13ed95d ("net: hns3: avoid mult + div op in critical data path")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21 15:49:17 -07:00
Yunsheng Lin 0ec3b6a7c0 net: hns3: fix for not unmapping TX buffer correctly
When a big TX buffer is sent using multi BD, the driver maps the
whole TX buffer, and unmaps it using info in desc_cb corresponding
to each BD, but only the info in the desc_cb of first BD is correct,
other info in desc_cb is wrong, which causes TX unmapping problem
when SMMU is on.

Only set the mapping and freeing info in the desc_cb of first BD to
fix this problem, because the TX buffer only need to be unmapped and
freed once.

Fixes: 1e8a7977d09f("net: hns3: add handling for big TX fragment")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huzhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21 15:49:17 -07:00
David S. Miller 11de5770c7 Merge branch 'Phylink-PCS-updates'
Russell King says:

====================
Phylink PCS updates

This series updates the rudimentary phylink PCS support with the
results of the last four months of development of that.  Phylink
PCS support was initially added back at the end of March, when it
became clear that the current approach of treating everything at
the MAC end as being part of the MAC was inadequate.

However, this rudimentary implementation was fine initially for
mvneta and similar, but in practice had a fair number of issues,
particularly when ethtool interfaces were used to change various
link properties.

It became apparent that relying on the phylink_config structure for
the PCS was also bad when it became clear that the same PCS was used
in DSA drivers as well as in NXPs other offerings, and there was a
desire to re-use that code.

It also became apparent that splitting the "configuration" step on
an interface mode configuration between the MAC and PCS using just
mac_config() and pcs_config() methods was not sufficient for some
setups, as the MAC needed to be "taken down" prior to making changes,
and once all settings were complete, the MAC could only then be
resumed.

This series addresses these points, progressing PCS support, and
has been developed with mvneta and DPAA2 setups, with work on both
those drivers to prove this approach.  It has been rigorously tested
with mvneta, as that provides the most flexibility for testing the
various code paths.

To solve the phylink_config reuse problem, we introduce a struct
phylink_pcs, which contains the minimal information necessary, and it
is intended that this is embedded in the PCS private data structure.

To solve the interface mode configuration problem, we introduce two
new MAC methods, mac_prepare() and mac_finish() which wrap the entire
interface mode configuration only.  This has the additional benefit of
relieving MAC drivers from working out whether an interface change has
occurred, and whether they need to do some major work.

I have not yet updated all the interface documentation for these
changes yet, that work remains, but this patch set is provided in the
hope that those working on PCS support in NXP will find this useful.

Since there is a lot of change here, this is the reason why I strongly
advise that everyone has converted to the mac_link_up() way of
configuring the link parameters when the link comes up, rather than
the old way of using mac_config() - especially as splitting the PCS
changes how and when phylink calls mac_config(). Although no change
for existing users is intended, that is something I no longer am able
to test.

Changes since RFC:
- fix bisect build failure
- add patch to use config.an_enabled
- rename phylink_config_interface to phylink_major_reconfig
- add expanded documentation for phylink_set_pcs()
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21 15:46:51 -07:00
Russell King 93eaceb0fc net: phylink: add interface to configure clause 22 PCS PHY
Add an interface to configure the advertisement for a clause 22 PCS
PHY, and set the AN enable flag in the BMCR appropriately.

Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21 15:46:51 -07:00
Russell King 7137e18f6f net: phylink: add struct phylink_pcs
Add a way for MAC PCS to have private data while keeping independence
from struct phylink_config, which is used for the MAC itself. We need
this independence as we will have stand-alone code for PCS that is
independent of the MAC.  Introduce struct phylink_pcs, which is
designed to be embedded in a driver private data structure.

This structure does not include a mdio_device as there are PCS
implementations such as the Marvell DSA and network drivers where this
is not necessary.

Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21 15:46:51 -07:00
Russell King b7ad14c2fe net: phylink: re-implement interface configuration with PCS
With PCS support, how we implement interface reconfiguration (or other
major reconfiguration) is not up to the job; we end up reconfiguring
the PCS for an interface change while the link could potentially be up.
In order to solve this, add two additional MAC methods for major
configuration, one to prepare for the change, and one to finish the
change.

This allows mvneta and mvpp2 to shutdown what they require prior to the
MAC and PCS configuration calls, and then restart as appropriate.

This impacts ksettings_set(), which now needs to identify whether the
change is a minor tweak to the advertisement masks or whether the
interface mode has changed, and call the appropriate function for that
update.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21 15:46:51 -07:00
Russell King 1571e700fd net: phylink: in-band pause mode advertisement update for PCS
Re-code the pause in-band advertisement update in light of the addition
of PCS support, so that we perform the minimum required; only the PCS
configuration function needs to be called in this case, followed by the
request to trigger a restart of negotiation if the programmed
advertisement changed.

We need to change the pcs_config() signature to pass whether resolved
pause should be passed to the MAC for setups such as mvneta and mvpp2
where doing so overrides the MAC manual flow controls.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21 15:46:50 -07:00
Russell King 1e1bf14a89 net: phylink: simplify fixed-link case for ksettings_set method
For fixed links, we only allow the current settings, so this should be
a matter of merely rejecting an attempt to change the settings.  If the
settings agree, then there is nothing more we need to do.

Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21 15:46:50 -07:00
Russell King a83c8829d1 net: phylink: use config.an_enabled in ksettings_set method
Rather than recomputing whether AN is enabled, use config.an_enabled.

Suggested-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21 15:46:50 -07:00
Russell King cbc1bb1e46 net: phylink: simplify phy case for ksettings_set method
When we have a PHY attached, an ethtool ksettings_set() call only
really needs to call through to the phylib equivalent; phylib will
call back to us when the link changes so we can update our state.
Therefore, we can bypass most of our ksettings_set() call for this
case.

Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21 15:46:50 -07:00
Russell King c8cab719cc net: phylink: simplify ksettings_set() implementation
Simplify the ksettings_set() implementation to look more like phylib's
implementation; use a switch() for validating the autoneg setting, and
use the linkmode_modify() helper to set the autoneg bit in the
advertisement mask.

Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21 15:46:50 -07:00
Russell King 7cceb599d1 net: phylink: avoid mac_config calls
Avoid calling mac_config() when using split PCS, and the interface
remains the same.

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21 15:46:50 -07:00
Russell King 5005b16344 net: phylink: update PCS when changing interface during resolution
The only PHYs that are used with phylink which change their interface
are the BCM84881 and MV88X3310 family, both of which only change their
interface modes on link-up events.  This will break when drivers are
converted to split-PCS.  Fix this.

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21 15:46:50 -07:00
Russell King 16319a7d31 net: phylink: ensure link is down when changing interface
The only PHYs that are used with phylink which change their interface
are the BCM84881 and MV88X3310 family, both of which only change their
interface modes on link-up events.  However, rather than relying upon
this behaviour by the PHY, we should give a stronger guarantee when
resolving that the link will be down whenever we change the interface
mode.  This patch implements that stronger guarantee for resolve.

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21 15:46:50 -07:00
Russell King 319bfafe34 net: phylink: rearrange resolve mac_config() call
Use a boolean to indicate whether mac_config() should be called during
a resolution. This allows resolution to have a single location where
mac_config() will be called, which will allow us to make decisions
about how and what we do.

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21 15:46:50 -07:00
Russell King b06e5cac21 net: phylink: rejig link state tracking
Rejig the link state tracking, so that we can use the current state
in a future patch.

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21 15:46:50 -07:00
Russell King 1ceb7ee7a6 net: phylink: update ethtool reporting for fixed-link modes
Comparing the ethtool output from phylink and non-phylink fixed-link
setups shows that we have some differences:

- The "auto-negotiation" fields are different; phylink reports these
  as "No", non-phylink reports these as "Yes" for the supported and
  advertising masks.
- The link partner advertisement is set to the link speed with non-
  phylink, but phylink leaves this unset, causing all link partner
  fields to be omitted.

The phylink ethtool output also disagrees with the software emulated
PHY dump via the MII registers.

Update the phylink fixed-link parsing code so that we better reflect
the behaviour of the non-phylink code that this facility replaces, and
bring the ethtool interface more into line with the report from via the
MII interface.

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21 15:46:50 -07:00
Mark Brown 43a10bf49f
Merge series "Add ASoC AHUB components for Tegra210 and later" from Sameer Pujar <spujar@nvidia.com>:
Overview
========
Audio Processing Engine (APE) comprises of Audio DMA (ADMA) and Audio
Hub (AHUB) unit. AHUB is a collection of hardware accelerators for audio
pre-processing and post-processing. It also includes a programmable full
crossbar for routing audio data across these accelerators.

This series exposes some of these below mentioned HW devices as ASoC
components for Tegra platforms from Tegra210 onwards.
 * ADMAIF : The interface between ADMA and AHUB
 * XBAR   : Crossbar for routing audio samples across various modules
 * I2S    : Inter-IC Sound Controller
 * DMIC   : Digital Microphone
 * DSPK   : Digital Speaker

Following is the summary of current series.
 * Add YAML DT binding documentation for above mentioned modules.
 * Helper function for ACIF programming is exposed for Tegra210 and later.
 * Add ASoC driver components for each of the above modules.
 * Build ACONNECT and ADMA drivers which are essential to realize audio
   use case.
 * Add DT entries for above components for Tegra210, Tegra186 and
   Tegra194.

As per the suggestion in [0] audio graph based sound card support
is pushed in a separate series.

[0] https://lkml.org/lkml/2020/6/27/4

Changelog
=========

v4 -> v5
--------
 * Common changes
   - simple-card driver changes are dropped. Changes are migrated to audio
     graph card and are moved to a separate series as suggested.

   - '#sound-dai-cells' property is not needed for planned audio graph card
     Hence dropped from documentation and related DT binding of component
     drivers.

   - CIF and DAP DAIs are added for I/O drivers (DMIC, DSPK, I2S) to
     represent DAI links using audio graph card. Similary DAIs are added in
     AHUB driver to describe endpoints in audio crossbar. Routing is updated
     to reflect the same in drivers.

v3 -> v4
--------
 * [1/23] "ASoC: dt-bindings: tegra: Add DT bindings for Tegra210"
   - Removed multiple examples and retained one example per doc
   - Fixed as per inputs on the previous series
   - Tested bindings with 'make dt_binding_check/dtbs_check'

 * [2/23] "ASoC: tegra: Add support for CIF programming"
   - No change

 * Common changes (for patch [3/10] to [7/10])
   - Mixer control overrides, for PCM parameters (rate, channel, bits),
     in each driver are dropped.
   - Updated routing as per DPCM usage
   - Minor changes related to formatting

 * New changes (patch [8/23] to [18/23] and patch [23/23])
   - Based on discussions in following threads DPCM is used for Tegra Audio.
     https://lkml.org/lkml/2020/2/20/91
     https://lkml.org/lkml/2020/4/30/519
   - The simple-card driver is used for Tegra Audio and accordingly
     some enhancements are made in simple-card and core drivers.
   - Patch [8/23] to [18/23] are related to simple-card and core changes.
   - Patch [23/23] adds sound card support to realize complete audio path.
     This is based on simple-card driver with proposed enhancements.
   - Re-ordered patches depending on above

v2 -> v3
--------
 * [1/10]  "dt-bindings: sound: tegra: add DT binding for AHUB
   - Updated licence
   - Removed redundancy w.r.t items/const/enum
   - Added constraints wherever needed with "pattern" property

 * [2/10]  "ASoC: tegra: add support for CIF programming"
   - Removed tegra_cif.c
   - Instead added inline helper function in tegra_cif.h

 * common changes (for patch [3/10] to [7/10])
   - Replace LATE system calls with Normal sleep
   - Remove explicit RPM suspend in driver remove() call
   - Use devm_kzalloc() instead of devm_kcalloc() for single element
   - Replace 'ret' with 'err' for better reading
   - Consistent error printing style across drivers
   - Minor formating fixes

 * [8/10]  "arm64: tegra: add AHUB components for few Tegra chips"
   - no change

 * [9/10]  "arm64: tegra: enable AHUB modules for few Tegra chips"
   - no change

 * [10/10] "arm64: defconfig: enable AHUB components for Tegra210 and later"
   (New patch)
   - Enables ACONNECT and AHUB components. With this AHUB and components are
     registered with ASoC core.

v1 -> v2
--------
 * [1/9] "dt-bindings: sound: tegra: add DT binding for AHUB"
   - no changes

 * [2/9] "ASoC: tegra: add support for CIF programming"
   - removed CIF programming changes for legacy chips.
   - this patch now exposes helper function for CIF programming,
     which can be used on Tegra210 later.
   - later tegra_cif.c can be extended for legacy chips as well.
   - updated commit message accordingly

 * [3/9] "ASoC: tegra: add Tegra210 based DMIC driver"
   - removed unnecessary initialization of 'ret' in probe()

 * [4/9] "ASoC: tegra: add Tegra210 based I2S driver"
   - removed unnecessary initialization of 'ret' in probe()
   - fixed indentation
   - added consistent bracing for if-else clauses
   - updated 'rx_fifo_th' type to 'unsigned int'
   - used BIT() macro for defines like '1 << {x}' in tegra210_i2s.h

 * [5/9] "ASoC: tegra: add Tegra210 based AHUB driver"
   - used of_device_get_match_data() to get 'soc_data' and removed
    explicit of_match_device()
   - used devm_platform_ioremap_resource() and removed explicit
    platform_get_resource()
   - fixed indentation for devm_snd_soc_register_component()
   - updated commit message
   - updated commit message to reflect compatible binding for Tegra186 and
     Tegra194.

 * [6/9] "ASoC: tegra: add Tegra186 based DSPK driver"
   - removed unnecessary initialization of 'ret' in probe()
   - updated 'max_th' to 'unsigned int'
   - shortened lengthy macro names to avoid wrapping in
     tegra186_dspk_wr_reg() and to be consistent

 * [7/9] "ASoC: tegra: add Tegra210 based ADMAIF driver"
   - used of_device_get_match_data() and removed explicit of_match_device()
   - used BIT() macro for defines like '1 << {x}' in tegra210_admaif.h
   - updated commit message to reflect compatible binding for Tegra186 and
     Tegra194.

 * [8/9] "arm64: tegra: add AHUB components for few Tegra chips"
   - no change

 * [9/9] "arm64: tegra: enable AHUB modules for few Tegra chips"
   - no change

 * common changes for patch [3/9] to [7/9]
   - sorted headers in alphabetical order
   - moved MODULE_DEVICE_TABLE() right below *_of_match table
   - removed macro DRV_NAME
   - removed explicit 'owner' field from platform_driver structure
   - added 'const' to snd_soc_dai_ops structure

Sameer Pujar (11):
  ASoC: dt-bindings: tegra: Add DT bindings for Tegra210
  ASoC: tegra: Add support for CIF programming
  ASoC: tegra: Add Tegra210 based DMIC driver
  ASoC: tegra: Add Tegra210 based I2S driver
  ASoC: tegra: Add Tegra210 based AHUB driver
  ASoC: tegra: Add Tegra186 based DSPK driver
  ASoC: tegra: Add Tegra210 based ADMAIF driver
  arm64: defconfig: Build AHUB component drivers
  arm64: defconfig: Build ADMA and ACONNECT driver
  arm64: tegra: Enable ACONNECT, ADMA and AGIC on Jetson Nano
  arm64: tegra: Add DT binding for AHUB components

 .../bindings/sound/nvidia,tegra186-dspk.yaml       |  83 +++
 .../bindings/sound/nvidia,tegra210-admaif.yaml     | 111 +++
 .../bindings/sound/nvidia,tegra210-ahub.yaml       | 136 ++++
 .../bindings/sound/nvidia,tegra210-dmic.yaml       |  83 +++
 .../bindings/sound/nvidia,tegra210-i2s.yaml        | 101 +++
 arch/arm64/boot/dts/nvidia/tegra186.dtsi           | 217 +++++-
 arch/arm64/boot/dts/nvidia/tegra194.dtsi           | 225 +++++-
 arch/arm64/boot/dts/nvidia/tegra210-p3450-0000.dts |  12 +
 arch/arm64/boot/dts/nvidia/tegra210.dtsi           | 140 ++++
 arch/arm64/configs/defconfig                       |   8 +
 sound/soc/tegra/Kconfig                            |  56 ++
 sound/soc/tegra/Makefile                           |  10 +
 sound/soc/tegra/tegra186_dspk.c                    | 442 +++++++++++
 sound/soc/tegra/tegra186_dspk.h                    |  70 ++
 sound/soc/tegra/tegra210_admaif.c                  | 800 ++++++++++++++++++++
 sound/soc/tegra/tegra210_admaif.h                  | 162 ++++
 sound/soc/tegra/tegra210_ahub.c                    | 676 +++++++++++++++++
 sound/soc/tegra/tegra210_ahub.h                    | 127 ++++
 sound/soc/tegra/tegra210_dmic.c                    | 455 ++++++++++++
 sound/soc/tegra/tegra210_dmic.h                    |  82 +++
 sound/soc/tegra/tegra210_i2s.c                     | 812 +++++++++++++++++++++
 sound/soc/tegra/tegra210_i2s.h                     | 126 ++++
 sound/soc/tegra/tegra_cif.h                        |  65 ++
 sound/soc/tegra/tegra_pcm.c                        | 235 +++++-
 sound/soc/tegra/tegra_pcm.h                        |  21 +-
 25 files changed, 5251 insertions(+), 4 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/sound/nvidia,tegra186-dspk.yaml
 create mode 100644 Documentation/devicetree/bindings/sound/nvidia,tegra210-admaif.yaml
 create mode 100644 Documentation/devicetree/bindings/sound/nvidia,tegra210-ahub.yaml
 create mode 100644 Documentation/devicetree/bindings/sound/nvidia,tegra210-dmic.yaml
 create mode 100644 Documentation/devicetree/bindings/sound/nvidia,tegra210-i2s.yaml
 create mode 100644 sound/soc/tegra/tegra186_dspk.c
 create mode 100644 sound/soc/tegra/tegra186_dspk.h
 create mode 100644 sound/soc/tegra/tegra210_admaif.c
 create mode 100644 sound/soc/tegra/tegra210_admaif.h
 create mode 100644 sound/soc/tegra/tegra210_ahub.c
 create mode 100644 sound/soc/tegra/tegra210_ahub.h
 create mode 100644 sound/soc/tegra/tegra210_dmic.c
 create mode 100644 sound/soc/tegra/tegra210_dmic.h
 create mode 100644 sound/soc/tegra/tegra210_i2s.c
 create mode 100644 sound/soc/tegra/tegra210_i2s.h
 create mode 100644 sound/soc/tegra/tegra_cif.h

--
2.7.4
2020-07-21 23:44:59 +01:00
Serge Semin affe93dd5b
spi: dw-dma: Fix Tx DMA channel working too fast
It turns out having a Rx DMA channel serviced with higher priority than
a Tx DMA channel is not enough to provide a well balanced DMA-based SPI
transfer interface. There might still be moments when the Tx DMA channel
is occasionally handled faster than the Rx DMA channel. That in its turn
will eventually cause the SPI Rx FIFO overflow if SPI bus speed is high
enough to fill the SPI Rx FIFO in before it's cleared by the Rx DMA
channel. That's why having the DMA-based SPI Tx interface too optimized
is the errors prone, so the commit 0b2b66514f ("spi: dw: Use DMA max
burst to set the request thresholds") though being perfectly normal from
the standard functionality point of view implicitly introduced the problem
described above. In order to fix that the Tx DMA activity is intentionally
slowed down by limiting the SPI Tx FIFO depth with a value twice bigger
than the Tx burst length calculated earlier by the
dw_spi_dma_maxburst_init() method.

Fixes: 0b2b66514f ("spi: dw: Use DMA max burst to set the request thresholds")
Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Alexey Malahov <Alexey.Malahov@baikalelectronics.ru>
Cc: Feng Tang <feng.tang@intel.com>
Link: https://lore.kernel.org/r/20200721203951.2159-1-Sergey.Semin@baikalelectronics.ru
Signed-off-by: Mark Brown <broonie@kernel.org>
2020-07-21 23:43:24 +01:00
Miaohe Lin b0a422772f net: udp: Fix wrong clean up for IS_UDPLITE macro
We can't use IS_UDPLITE to replace udp_sk->pcflag when UDPLITE_RECV_CC is
checked.

Fixes: b2bf1e2659 ("[UDP]: Clean up for IS_UDPLITE macro")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21 15:41:49 -07:00
David S. Miller ccbc6dacac Merge branch 'enetc-Add-adaptive-interrupt-coalescing'
Claudiu Manoil says:

====================
enetc: Add adaptive interrupt coalescing

Apart from some related cleanup patches, this set
introduces in a straightforward way the support needed
to enable and configure interrupt coalescing for ENETC.

Patch 5 introduces the support needed for configuring the
interrupt coalescing parameters and for switching between
moderated (int. coalescing) and per-packet interrupt modes.
When interrupt coalescing is enabled the Rx/Tx time
thresholds are configurable, packet thresholds are fixed.
To make this work reliably, patch 5 uses the traffic
pause procedure introduced in patch 2.

Patch 6 adds DIM (Dynamic Interrupt Moderation) to implement
adaptive coalescing based on time thresholds, for the Rx 'channel'.
On the Tx side a default optimal value is used instead, optimized for
TCP traffic over 1G and 2.5G links.  This default 'optimal' value can
be overridden anytime via 'ethtool -C tx-usecs'.

netperf -t TCP_MAERTS measurements show a significant CPU load
reduction correlated w/ reduced interrupt rates. For the
measurement results refer to the comments in patch 6.

v2: Replaced Tx DIM with predefined optimal value, giving
better results. This was also suggested by Jakub (cc).
Switched order of patches 4 and 5, for better grouping.

v3: minor cleanup/improvements
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21 15:38:30 -07:00
Claudiu Manoil ae0e6a5d16 enetc: Add adaptive interrupt coalescing
Use the generic dynamic interrupt moderation (dim)
framework to implement adaptive interrupt coalescing
on Rx.  With the per-packet interrupt scheme, a high
interrupt rate has been noted for moderate traffic flows
leading to high CPU utilization.  The 'dim' scheme
implemented by the current patch addresses this issue
improving CPU utilization while using minimal coalescing
time thresholds in order to preserve a good latency.
On the Tx side use an optimal time threshold value by
default.  This value has been optimized for Tx TCP
streams at a rate of around 85kpps on a 1G link,
at which rate half of the Tx ring size (128) gets filled
in 1500 usecs.  Scaling this down to 2.5G links yields
the current value of 600 usecs, which is conservative
and gives good enough results for 1G links too (see
next).

Below are some measurement results for before and after
this patch (and related dependencies) basically, for a
2 ARM Cortex-A72 @1.3Ghz CPUs system (32 KB L1 data cache),
using 60secs log netperf TCP stream tests @ 1Gbit link
(maximum throughput):

1) 1 Rx TCP flow, both Rx and Tx processed by the same NAPI
thread on the same CPU:
	CPU utilization		int rate (ints/sec)
Before:	50%-60% (over 50%)		92k
After:  13%-22%				3.5k-12k
Comment:  Major CPU utilization improvement for a single flow
	  Rx TCP flow (i.e. netperf -t TCP_MAERTS) on a single
	  CPU. Usually settles under 16% for longer tests.

2) 4 Rx TCP flows + 4 Tx TCP flows (+ pings to check the latency):
	Total CPU utilization	Total int rate (ints/sec)
Before:	~80% (spikes to 90%)		~100k
After:   60% (more steady)		  ~4k
Comment:  Important improvement for this load test, while the
	  ping test outcome does not show any notable
	  difference compared to before.

Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21 15:38:30 -07:00
Claudiu Manoil 915710812b enetc: Add interrupt coalescing support
Enable programming of the interrupt coalescing registers
and allow manual configuration of the coalescing time
thresholds via ethtool.  Packet thresholds have been fixed
to predetermined values as there's no point in making them
run-time configurable, also anticipating the dynamic interrupt
moderation (DIM) algorithm which uses fixed packet thresholds
as well.  If the interface is up when the operation mode of
traffic interrupt events is changed by the user (i.e. switching
from default per-packet interrupts to coalesced interrupts),
the traffic needs to be paused in the process.
This patch also prepares the ground for introducing DIM on Rx.

Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21 15:38:30 -07:00
Claudiu Manoil 058d9cfa60 enetc: Drop redundant ____cacheline_aligned_in_smp
'struct enetc_bdr' is already '____cacheline_aligned_in_smp'.

Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21 15:38:30 -07:00
Claudiu Manoil 12460a0abe enetc: Fix interrupt coalescing register naming
Interrupt coalescing registers naming in the current revision
of the Ref Man (RM) is ICR, deprecating the ICIR name used
in earlier (draft) versions of the RM.

Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21 15:38:30 -07:00
Claudiu Manoil bbb96dc7fa enetc: Factor out the traffic start/stop procedures
A reliable traffic pause (and reconfiguration) procedure
is needed to be able to safely make h/w configuration
changes during run-time, like changing the mode in which the
interrupts are operating (i.e. with or without coalescing),
as opposed to making on-the-fly register updates that
may be subject to h/w or s/w concurrency issues.
To this end, the code responsible of the run-time device
configurations that basically starts resp. stops the traffic
flow through the device has been extracted from the
the enetc_open/_close procedures, to the separate standalone
enetc_start/_stop procedures. Traffic stop should be as
graceful as possible, it lets the executing napi threads to
to finish while the interrupts stay disabled.  But since
the napi thread will try to re-enable interrupts by clearing
the device's unmask register, the enable_irq/ disable_irq
API has been used to avoid this potential concurrency issue
and make the traffic pause procedure more reliable.

Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21 15:38:30 -07:00
Claudiu Manoil 02293dd4b7 enetc: Refine buffer descriptor ring sizes
It's time to differentiate between Rx and Tx ring sizes.
Not only Tx rings are processed differently than Rx rings,
but their default number also differs - i.e. up to 8 Tx rings
per device (8 traffic classes) vs. 2 Rx rings (one per CPU).
So let's set Tx rings sizes to half the size of the Rx rings
for now, to be conservative.
The default ring sizes were decreased as well (to the next
lower power of 2), to reduce the memory footprint, buffering
etc., since the measurements I've made so far show that the
rings are very unlikely to get full.
This change also anticipates the introduction of the
dynamic interrupt moderation (dim) algorithm which operates
on maximum packet thresholds of 256 packets for Rx and 128
packets for Tx.

Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21 15:38:30 -07:00
Xiongfeng Wang 9bb5fbea59 net-sysfs: add a newline when printing 'tx_timeout' by sysfs
When I cat 'tx_timeout' by sysfs, it displays as follows. It's better to
add a newline for easy reading.

root@syzkaller:~# cat /sys/devices/virtual/net/lo/queues/tx-0/tx_timeout
0root@syzkaller:~#

Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21 15:35:58 -07:00
Jisheng Zhang c17e317802 net: mdio-mux-gpio: use devm_gpiod_get_array()
Use devm_gpiod_get_array() to simplify the error handling and exit
code path.

Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21 15:35:29 -07:00
Yoshihiro Shimoda 015c5d5e6a net: ethernet: ravb: exit if re-initialization fails in tx timeout
According to the report of [1], this driver is possible to cause
the following error in ravb_tx_timeout_work().

ravb e6800000.ethernet ethernet: failed to switch device to config mode

This error means that the hardware could not change the state
from "Operation" to "Configuration" while some tx and/or rx queue
are operating. After that, ravb_config() in ravb_dmac_init() will fail,
and then any descriptors will be not allocaled anymore so that NULL
pointer dereference happens after that on ravb_start_xmit().

To fix the issue, the ravb_tx_timeout_work() should check
the return values of ravb_stop_dma() and ravb_dmac_init().
If ravb_stop_dma() fails, ravb_tx_timeout_work() re-enables TX and RX
and just exits. If ravb_dmac_init() fails, just exits.

[1]
https://lore.kernel.org/linux-renesas-soc/20200518045452.2390-1-dirk.behme@de.bosch.com/

Reported-by: Dirk Behme <dirk.behme@de.bosch.com>
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21 15:34:07 -07:00
David S. Miller ca67323552 Merge branch 'udp-Fix-reuseport-selection-with-connected-sockets'
Kuniyuki Iwashima says:

====================
udp: Fix reuseport selection with connected sockets.

This patch set addresses two issues which happen when both connected and
unconnected sockets are in the same UDP reuseport group.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21 15:31:03 -07:00
Kuniyuki Iwashima efc6b6f6c3 udp: Improve load balancing for SO_REUSEPORT.
Currently, SO_REUSEPORT does not work well if connected sockets are in a
UDP reuseport group.

Then reuseport_has_conns() returns true and the result of
reuseport_select_sock() is discarded. Also, unconnected sockets have the
same score, hence only does the first unconnected socket in udp_hslot
always receive all packets sent to unconnected sockets.

So, the result of reuseport_select_sock() should be used for load
balancing.

The noteworthy point is that the unconnected sockets placed after
connected sockets in sock_reuseport.socks will receive more packets than
others because of the algorithm in reuseport_select_sock().

    index | connected | reciprocal_scale | result
    ---------------------------------------------
    0     | no        | 20%              | 40%
    1     | no        | 20%              | 20%
    2     | yes       | 20%              | 0%
    3     | no        | 20%              | 40%
    4     | yes       | 20%              | 0%

If most of the sockets are connected, this can be a problem, but it still
works better than now.

Fixes: acdcecc612 ("udp: correct reuseport selection with connected sockets")
CC: Willem de Bruijn <willemb@google.com>
Reviewed-by: Benjamin Herrenschmidt <benh@amazon.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21 15:31:02 -07:00
Kuniyuki Iwashima f2b2c55e51 udp: Copy has_conns in reuseport_grow().
If an unconnected socket in a UDP reuseport group connect()s, has_conns is
set to 1. Then, when a packet is received, udp[46]_lib_lookup2() scans all
sockets in udp_hslot looking for the connected socket with the highest
score.

However, when the number of sockets bound to the port exceeds max_socks,
reuseport_grow() resets has_conns to 0. It can cause udp[46]_lib_lookup2()
to return without scanning all sockets, resulting in that packets sent to
connected sockets may be distributed to unconnected sockets.

Therefore, reuseport_grow() should copy has_conns.

Fixes: acdcecc612 ("udp: correct reuseport selection with connected sockets")
CC: Willem de Bruijn <willemb@google.com>
Reviewed-by: Benjamin Herrenschmidt <benh@amazon.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21 15:31:02 -07:00