Including one's name in copyright claims is appropriate. Including it
in random comments is just vanity. After 2 decades, it is time for
these to be gone.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
This patch adds support for BSS color collisions to the wireless subsystem.
Add the required functionality to nl80211 that will notify about color
collisions, triggering the color change and notifying when it is completed.
Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: John Crispin <john@phrozen.org>
Link: https://lore.kernel.org/r/500b3582aec8fe2c42ef46f3117b148cb7cbceb5.1625247619.git.lorenzo@kernel.org
[remove unnecessary NULL initialisation]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add new BPF helper, bpf_get_attach_cookie(), which can be used by BPF programs
to get access to a user-provided bpf_cookie value, specified during BPF
program attachment (BPF link creation) time.
Naming is hard, though. With the concept being named "BPF cookie", I've
considered calling the helper:
- bpf_get_cookie() -- seems too unspecific and easily mistaken with socket
cookie;
- bpf_get_bpf_cookie() -- too much tautology;
- bpf_get_link_cookie() -- would be ok, but while we create a BPF link to
attach BPF program to BPF hook, it's still an "attachment" and the
bpf_cookie is associated with BPF program attachment to a hook, not a BPF
link itself. Technically, we could support bpf_cookie with old-style
cgroup programs.So I ultimately rejected it in favor of
bpf_get_attach_cookie().
Currently all perf_event-backed BPF program types support
bpf_get_attach_cookie() helper. Follow-up patches will add support for
fentry/fexit programs as well.
While at it, mark bpf_tracing_func_proto() as static to make it obvious that
it's only used from within the kernel/trace/bpf_trace.c.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210815070609.987780-7-andrii@kernel.org
Add ability for users to specify custom u64 value (bpf_cookie) when creating
BPF link for perf_event-backed BPF programs (kprobe/uprobe, perf_event,
tracepoints).
This is useful for cases when the same BPF program is used for attaching and
processing invocation of different tracepoints/kprobes/uprobes in a generic
fashion, but such that each invocation is distinguished from each other (e.g.,
BPF program can look up additional information associated with a specific
kernel function without having to rely on function IP lookups). This enables
new use cases to be implemented simply and efficiently that previously were
possible only through code generation (and thus multiple instances of almost
identical BPF program) or compilation at runtime (BCC-style) on target hosts
(even more expensive resource-wise). For uprobes it is not even possible in
some cases to know function IP before hand (e.g., when attaching to shared
library without PID filtering, in which case base load address is not known
for a library).
This is done by storing u64 bpf_cookie in struct bpf_prog_array_item,
corresponding to each attached and run BPF program. Given cgroup BPF programs
already use two 8-byte pointers for their needs and cgroup BPF programs don't
have (yet?) support for bpf_cookie, reuse that space through union of
cgroup_storage and new bpf_cookie field.
Make it available to kprobe/tracepoint BPF programs through bpf_trace_run_ctx.
This is set by BPF_PROG_RUN_ARRAY, used by kprobe/uprobe/tracepoint BPF
program execution code, which luckily is now also split from
BPF_PROG_RUN_ARRAY_CG. This run context will be utilized by a new BPF helper
giving access to this user-provided cookie value from inside a BPF program.
Generic perf_event BPF programs will access this value from perf_event itself
through passed in BPF program context.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/bpf/20210815070609.987780-6-andrii@kernel.org
Introduce a new type of BPF link - BPF perf link. This brings perf_event-based
BPF program attachments (perf_event, tracepoints, kprobes, and uprobes) into
the common BPF link infrastructure, allowing to list all active perf_event
based attachments, auto-detaching BPF program from perf_event when link's FD
is closed, get generic BPF link fdinfo/get_info functionality.
BPF_LINK_CREATE command expects perf_event's FD as target_fd. No extra flags
are currently supported.
Force-detaching and atomic BPF program updates are not yet implemented, but
with perf_event-based BPF links we now have common framework for this without
the need to extend ioctl()-based perf_event interface.
One interesting consideration is a new value for bpf_attach_type, which
BPF_LINK_CREATE command expects. Generally, it's either 1-to-1 mapping from
bpf_attach_type to bpf_prog_type, or many-to-1 mapping from a subset of
bpf_attach_types to one bpf_prog_type (e.g., see BPF_PROG_TYPE_SK_SKB or
BPF_PROG_TYPE_CGROUP_SOCK). In this case, though, we have three different
program types (KPROBE, TRACEPOINT, PERF_EVENT) using the same perf_event-based
mechanism, so it's many bpf_prog_types to one bpf_attach_type. I chose to
define a single BPF_PERF_EVENT attach type for all of them and adjust
link_create()'s logic for checking correspondence between attach type and
program type.
The alternative would be to define three new attach types (e.g., BPF_KPROBE,
BPF_TRACEPOINT, and BPF_PERF_EVENT), but that seemed like unnecessary overkill
and BPF_KPROBE will cause naming conflicts with BPF_KPROBE() macro, defined by
libbpf. I chose to not do this to avoid unnecessary proliferation of
bpf_attach_type enum values and not have to deal with naming conflicts.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/bpf/20210815070609.987780-5-andrii@kernel.org
ETHTOOL_LINK_EXT_SUBSTATE_BSI_SERDES_REFERENCE_CLOCK_LOST means the input
external clock signal for SerDes is too weak or lost.
ETHTOOL_LINK_EXT_SUBSTATE_BSI_SERDES_ALOS means the received signal for
SerDes is too weak because analog loss of signal.
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Lightnvm supports the OCSSD 1.x and 2.0 specs which were early attempts
to produce Open Channel SSDs and never made it into the NVMe spec
proper. They have since been superceeded by NVMe enhancements such
as ZNS support. Remove the support per the deprecation schedule.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210812132308.38486-1-hch@lst.de
Reviewed-by: Matias Bjørling <mb@lightnvm.io>
Reviewed-by: Javier González <javier@javigon.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
commit <47595e32869f> ("<MAINTAINERS: Mark some staging directories>")
indicated the ipx network layer as obsolete in Jan 2018,
updated in the MAINTAINERS file
now, after being exposed for 3 years to refactoring, so to
delete uapi/linux/ipx.h and net/ipx.h header files for good.
additionally, there is no module that depends on ipx.h except
a broken staging driver(r8188eu)
Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the new mcast querier state dump infrastructure and export vlans'
mcast context querier state embedded in attribute
BRIDGE_VLANDB_GOPTS_MCAST_QUERIER_STATE.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for dumping global IPv6 querier state, we dump the state
only if our own querier is enabled or there has been another external
querier which has won the election. For the bridge global state we use
a new attribute IFLA_BR_MCAST_QUERIER_STATE and embed the state inside.
The structure is:
[IFLA_BR_MCAST_QUERIER_STATE]
`[BRIDGE_QUERIER_IPV6_ADDRESS] - ip address of the querier
`[BRIDGE_QUERIER_IPV6_PORT] - bridge port ifindex where the querier
was seen (set only if external querier)
`[BRIDGE_QUERIER_IPV6_OTHER_TIMER] - other querier timeout
IPv4 and IPv6 attributes are embedded at the same level of
IFLA_BR_MCAST_QUERIER_STATE. If we didn't dump anything we cancel the nest
and return.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for dumping global IPv4 querier state, we dump the state
only if our own querier is enabled or there has been another external
querier which has won the election. For the bridge global state we use
a new attribute IFLA_BR_MCAST_QUERIER_STATE and embed the state inside.
The structure is:
[IFLA_BR_MCAST_QUERIER_STATE]
`[BRIDGE_QUERIER_IP_ADDRESS] - ip address of the querier
`[BRIDGE_QUERIER_IP_PORT] - bridge port ifindex where the querier was
seen (set only if external querier)
`[BRIDGE_QUERIER_IP_OTHER_TIMER] - other querier timeout
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
iwlmei allows to integrate with the CSME firmware. There are
flows that are prioprietary for this purpose:
* Get the information of the AP the CSME firmware is connected
to. This is useful when we need to speed up the connection
process in case the CSME firmware has a TCP connection
that must be kept alive across the ownership transition.
* Forbid roaming, which will happen when the CSME firmware
wants to tell the user space not disrupt the connection.
* Request ownership, upon driver boot when the CSME firmware
owns the device. This is a notification sent by the kernel.
All those commands are expected to be used by any software
managing the connection (mainly NetworkManager). Those commands
are expected to be used only in case the CSME firmware owns
the device and doesn't want to release the device unless the
host made sure that it can keep the connectivity.
Here are the steps of the expected flow:
1) The machine boots while AMT has an active TCP connection
2) iwlwifi starts and tries to access the device
3) The device is not available because of the active TCP
connection. (If there are no active connections, the CSME
firmware would have allowed iwlwifi to use the device)
Note that all the steps up to here don't involve iwlmei. All
this happens in iwlwifi (in iwl_pcie_prepare_card_hw).
4) iwlmei establishes a connection to the CSME firmware (through
SAP)
Here iwlwifi uses iwlmei to access the device's capabilities
(since it can't touch the device), but this is not relevant
for the vendor commands.
5) The CSME firmware tells iwlmei that it uses the NIC and
that there is an acitve TCP connection, and hence, the
host needs to think twice before asking the CSME firmware
to release the device
6) iwlmei tells iwlwifi to report HW RFKILL with a special
reason
Up to here, there was no user space involved.
7) The user space (NetworkManager) boots and sees that the
device is in RFKILL because the host doesn't own the
device
8) The user space asks the kernel what AP the CSME firmware
is connected to (with the first vendor command mentionned
above)
9) The user space checks if it has a profile that matches the
reply from the CSME firmware
10) The user space installs a network to the wpa_supplicant
with a specific BSSID and a specific frequency
11) The user space prevents any type of full scan
12) The user space asks iwlmei to request ownership on the
device (with the third vendor command)
13) iwlmei request ownership from the CSME firmware
14) The CSME firmware grants ownership
15) iwlmei tells iwlwifi to lift the RFKILL
16) RFKILL OFF is reported to userspace
17) The host boots the device, loads the firwmare, and
connect to a specific BSSID without scanning including IP
in less than 600ms (this is what I measured, of course
it depends on many factors)
18) The host reports to the CSME firmware that there is a
connection
19) The TCP connection is preserved and the host has now
connectivity
20) Later, the TCP connection to the CSME firmware is
terminated
21) The CSME firmware tells iwlmei that it is now free to
do whatever it likes
22) iwlwifi sends the second vendor command to tell the
user space that it can remove the special network
configuration and pick any SSID / BSSID it likes.
Co-Developed-by: Ayala Beker <ayala.beker@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Link: https://lore.kernel.org/r/20210625081717.7680-4-emmanuel.grumbach@intel.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The bulk of the addition this time is mainly refactoring to add support
for Virtio transport for SCMI and the addition of the support itself.
The refactoring includes allowing transport specific init/exit calls,
making each transport as compile time configurable, supporting
monotonically increasing tokens instead of using the next available
free buffer index as the token for scmi messages which eases handling
concurrent and out-of-order messages which is a must have for virtio
transport.
Virtio support itself is conformant to the virtio SCMI device spec [1].
Virtio device id 32 has been reserved for the SCMI device [2].
Other than the virtio support, there is one bug fix in the probe failure
clean up path.
[1] https://github.com/oasis-tcs/virtio-spec/blob/master/virtio-scmi.tex
[2] https://www.oasis-open.org/committees/ballot.php?id=3496
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEunHlEgbzHrJD3ZPhAEG6vDF+4pgFAmETO7YACgkQAEG6vDF+
4pgdHQ/9GKscQuD8U969ypygHEUqkjdMvylBdrKoV2qo6sRv18GTIa1vCVVYxpwq
34gx+o6vZxkm5krwnYcLo73VII2O/ifBHt2U5N4fRrZF8EHQcMh0YxW9wyxRvN9Y
LXR9jOUbqIvJ/gHSKGrcjSv+1DzxzGDu9wluseW6EJoYgKyYTfrhHrytdTgN//63
0RlL6jJokcfe62pD2cQQdUj2ia0+Q0mgxJd4vQYj6uy/q6S7aEUEb6yYcJGR2DEH
4nRXVhx6VXbry5UeguZG1nlE4MmHcE2qMlAeO/T1V5IgvhMbcwapYeHtwY3uQY0A
HHMUa3VJwoIJ0eCICToZkUs1ySh3gm2xCSCl3EAfeKL055zGBvozBkR32CBrPUnN
ozIT+dgFMwM1Np5lK68Ny4tuxqmAK3Zj3KDLrApfUOWHfn+EeWaPOQcfKRA2qKmD
NohXwh3luybxtwLfiUZ4s2QxPmJk/YhTkkSIqhLBvGMfVwF6zyG+7q/3HXA1F3Q2
AqL4kNm5kg/6tMgS0Htj9Eb33/w+KUGU+nU8uh+bVOg8MvmwsFAy/Qt5nCJYq8Lj
NpnqoMbatppSGU1ICwoRJLA1uxMssFeTAiy+ULHIMHCVc4Fm0WIuUTf9LTZn+7AE
cWZQmtJs8B+UoMZnc/x3JShr54oXJhapMIQtVYOVkPkdZ1zUDD4=
=CR6R
-----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmEVhscACgkQmmx57+YA
GNlOcxAAlSAsiZWe7B/1TuA8R903Zq1/Dq3pYvPtmancGL9N32GM081Xr123sHyx
sO5KsxU94Zed+9h2UQl0kU9/JHve8+utzLnBQvk7xYypRtDe0gOgTuOUiGIcUuFH
xMkwV9kzSOXJlaUzlFrHqDYPF/ve9eQ7RHbH/7K3pfBdbNgRPkPzyGkMM0qYEz7T
RTVAGl4nMsUkaqqEs1TRtPN3UWpbX08xphZwSC8dKRJtbnwUYpaYNvaBPzyKFhgp
zBK6JlMoiqaLN6gW5rZzsZc6hoRv5H8jLPnHPKpFlAuQK/Z1QXbqN4mBU49VnLFf
hyQNOZxSadw6BEapY+aG+r+RY5hKgk4CXcN8hbmDjGXVDoplrhbqUhj9zW3VFIT/
0JLbDUq92tjBPyjAMYxy/oRs2qMgtOajsrqRHZTwNyZthhdW/9M04siAYb2LfFdP
Q9Ydxw6dG7rPPficXtNiasXzd2+/RotCQioymR9nJp94xqbI+gm7ci8veHWMyVtQ
s68DxBgED0k4CrNxVa6vgWWAn1yVLKfeU5lONx4aNPSRVBs1WuepKW5IuWsM0Cn5
QH9DJ6XX7Yo91ykOCPfvxN/JEPgMydAzBThbXJAVWRHh8Esk68fYusq+dMmnNA3H
WD22l4sCZXaGKRsmRG89a6qYognBtBJtrX93I574Odr9xXvL5kc=
=H7PM
-----END PGP SIGNATURE-----
Merge tag 'scmi-updates-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux into arm/drivers
SCMI Updates for v5.15
The bulk of the addition this time is mainly refactoring to add support
for Virtio transport for SCMI and the addition of the support itself.
The refactoring includes allowing transport specific init/exit calls,
making each transport as compile time configurable, supporting
monotonically increasing tokens instead of using the next available
free buffer index as the token for scmi messages which eases handling
concurrent and out-of-order messages which is a must have for virtio
transport.
Virtio support itself is conformant to the virtio SCMI device spec [1].
Virtio device id 32 has been reserved for the SCMI device [2].
Other than the virtio support, there is one bug fix in the probe failure
clean up path.
[1] https://github.com/oasis-tcs/virtio-spec/blob/master/virtio-scmi.tex
[2] https://www.oasis-open.org/committees/ballot.php?id=3496
* tag 'scmi-updates-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux:
firmware: arm_scmi: Use WARN_ON() to check configured transports
firmware: arm_scmi: Fix boolconv.cocci warnings
firmware: arm_scmi: Free mailbox channels if probe fails
firmware: arm_scmi: Add virtio transport
firmware: arm_scmi: Add priv parameter to scmi_rx_callback
dt-bindings: arm: Add virtio transport for SCMI
firmware: arm_scmi: Add optional link_supplier() transport op
firmware: arm_scmi: Add message passing abstractions for transports
firmware: arm_scmi: Add method to override max message number
firmware: arm_scmi: Make shmem support optional for transports
firmware: arm_scmi: Make SCMI transports configurable
firmware: arm_scmi: Make polling mode optional
firmware: arm_scmi: Make .clear_channel optional
firmware: arm_scmi: Handle concurrent and out-of-order messages
firmware: arm_scmi: Introduce monotonically increasing tokens
firmware: arm_scmi: Add optional transport_init/exit support
firmware: arm_scmi: Remove scmi_dump_header_dbg() helper
firmware: arm_scmi: Add support for type handling in common functions
Link: https://lore.kernel.org/r/20210811075743.707961-1-sudeep.holla@arm.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Embed the standard multicast router port export by br_rports_fill_info()
into a new global vlan attribute BRIDGE_VLANDB_GOPTS_MCAST_ROUTER_PORTS.
In order to have the same format for the global bridge mcast context and
the per-vlan mcast context we need a double-nesting:
- BRIDGE_VLANDB_GOPTS_MCAST_ROUTER_PORTS
- MDBA_ROUTER
Currently we don't compare router lists, if any router port exists in
the bridge mcast contexts we consider their option sets as different and
export them separately.
In addition we export the router port vlan id when dumping similar to
the router port notification format.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support to change and retrieve global vlan multicast router state
which is used for the bridge itself. We just need to pass multicast context
to br_multicast_set_router instead of bridge device and the rest of the
logic remains the same.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support to change and retrieve global vlan multicast querier state.
We just need to pass multicast context to br_multicast_set_querier
instead of bridge device and the rest of the logic remains the same.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support to change and retrieve global vlan multicast startup query
interval option.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support to change and retrieve global vlan multicast query response
interval option.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support to change and retrieve global vlan multicast query interval
option.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support to change and retrieve global vlan multicast querier interval
option.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support to change and retrieve global vlan multicast membership
interval option.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support to change and retrieve global vlan multicast last member
interval option.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support to change and retrieve global vlan multicast startup query
count option.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support to change and retrieve global vlan multicast last member
count option.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support to change and retrieve global vlan IGMP/MLD versions.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for net-next:
1) Use nfnetlink_unicast() instead of netlink_unicast() in nft_compat.
2) Remove call to nf_ct_l4proto_find() in flowtable offload timeout
fixup.
3) CLUSTERIP registers ARP hook on demand, from Florian.
4) Use clusterip_net to store pernet warning, also from Florian.
5) Remove struct netns_xt, from Florian Westphal.
6) Enable ebtables hooks in initns on demand, from Florian.
7) Allow to filter conntrack netlink dump per status bits,
from Florian Westphal.
8) Register x_tables hooks in initns on demand, from Florian.
9) Remove queue_handler from per-netns structure, again from Florian.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ignore fdb flags when adding port extern learn entries and always set
BR_FDB_LOCAL flag when adding bridge extern learn entries. This is
closest to the behaviour we had before and avoids breaking any use cases
which were allowed.
This patch fixes iproute2 calls which assume NUD_PERMANENT and were
allowed before, example:
$ bridge fdb add 00:11:22:33:44:55 dev swp1 extern_learn
Extern learn entries are allowed to roam, but do not expire, so static
or dynamic flags make no sense for them.
Also add a comment for future reference.
Fixes: eb100e0e24 ("net: bridge: allow to add externally learned entries from user-space")
Fixes: 0541a62932 ("net: bridge: validate the NUD_PERMANENT bit when adding an extern_learn FDB entry")
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20210810110010.43859-1-razor@blackwall.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
DM configures a block device with various target specific attributes
passed to it as a table. DM loads the table, and calls each target’s
respective constructors with the attributes as input parameters.
Some of these attributes are critical to ensure the device meets
certain security bar. Thus, IMA should measure these attributes, to
ensure they are not tampered with, during the lifetime of the device.
So that the external services can have high confidence in the
configuration of the block-devices on a given system.
Some devices may have large tables. And a given device may change its
state (table-load, suspend, resume, rename, remove, table-clear etc.)
many times. Measuring these attributes each time when the device
changes its state will significantly increase the size of the IMA logs.
Further, once configured, these attributes are not expected to change
unless a new table is loaded, or a device is removed and recreated.
Therefore the clear-text of the attributes should only be measured
during table load, and the hash of the active/inactive table should be
measured for the remaining device state changes.
Export IMA function ima_measure_critical_data() to allow measurement
of DM device parameters, as well as target specific attributes, during
table load. Compute the hash of the inactive table and store it for
measurements during future state change. If a load is called multiple
times, update the inactive table hash with the hash of the latest
populated table. So that the correct inactive table hash is measured
when the device transitions to different states like resume, remove,
rename, etc.
Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com> # leak fix
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Introduce a new flag FAN_REPORT_PIDFD for fanotify_init(2) which
allows userspace applications to control whether a pidfd information
record containing a pidfd is to be returned alongside the generic
event metadata for each event.
If FAN_REPORT_PIDFD is enabled for a notification group, an additional
struct fanotify_event_info_pidfd object type will be supplied
alongside the generic struct fanotify_event_metadata for a single
event. This functionality is analogous to that of FAN_REPORT_FID in
terms of how the event structure is supplied to a userspace
application. Usage of FAN_REPORT_PIDFD with
FAN_REPORT_FID/FAN_REPORT_DFID_NAME is permitted, and in this case a
struct fanotify_event_info_pidfd object will likely follow any struct
fanotify_event_info_fid object.
Currently, the usage of the FAN_REPORT_TID flag is not permitted along
with FAN_REPORT_PIDFD as the pidfd API currently only supports the
creation of pidfds for thread-group leaders. Additionally, usage of
the FAN_REPORT_PIDFD flag is limited to privileged processes only
i.e. event listeners that are running with the CAP_SYS_ADMIN
capability. Attempting to supply the FAN_REPORT_TID initialization
flags with FAN_REPORT_PIDFD or creating a notification group without
CAP_SYS_ADMIN will result with -EINVAL being returned to the caller.
In the event of a pidfd creation error, there are two types of error
values that can be reported back to the listener. There is
FAN_NOPIDFD, which will be reported in cases where the process
responsible for generating the event has terminated prior to the event
listener being able to read the event. Then there is FAN_EPIDFD, which
will be reported when a more generic pidfd creation error has occurred
when fanotify calls pidfd_create().
Link: https://lore.kernel.org/r/5f9e09cff7ed62bfaa51c1369e0f7ea5f16a91aa.1628398044.git.repnop@google.com
Signed-off-by: Matthew Bobrowski <repnop@google.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Currently the SVM get_attr call allows querying, which flags are set
in the entire address range. Add the opposite query, which flags are
clear in the entire address range. Both queries can be combined in a
single get_attr call, which allows answering questions such as, "is
this address range coherent, non-coherent, or a mix of both"?
Proposed userspace for UAPI:
https://github.com/RadeonOpenCompute/ROCR-Runtime/tree/memory_model_queries
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Philip Yand <philip.yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The family is relevant for pseudo-families like NFPROTO_INET
otherwise the user needs to rely on the hook function name to
differentiate it from NFPROTO_IPV4 and NFPROTO_IPV6 names.
Add nfnl_hook_chain_desc_attributes instead of using the existing
NFTA_CHAIN_* attributes, since these do not provide a family number.
Fixes: e2cf17d377 ("netfilter: add new hook nfnl subsystem")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmEHNoEeHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGv34H/05pr8hlCcs14RcN
V6YWWHZoAi6jpClkp8612e3Pn9xTVFHaEd81E0dJVO7Vr0xAeuObeG7yDpbnFGIR
9Q1PLAeMJUi+dxLV7A0VpWHm15ODtDHnCd2P5y4YNvH/iiO0mMb6Pqw8bUF9HKII
M8u/Lqn8FXyHTUG2yqGTAZJgUVWpbD93IeKVHG4LdsiVCJuAaXLvg6LmkxWTYeGW
BcDDIL53sY4IgwrAVgwIji8Wiqmh4Bzp1+BvksPymA/91u0kJoZS9397CsDIomHp
i0RQ7Roo/sGamC7HR6MOHm21DiwIkwY1ULo+PglYSw5U3mohkMJemBvQRW3PGYJy
7SADmiE=
=LOoK
-----END PGP SIGNATURE-----
Merge tag 'v5.14-rc4' into media_tree
Linux 5.14-rc4
* tag 'v5.14-rc4': (948 commits)
Linux 5.14-rc4
pipe: make pipe writes always wake up readers
Revert "perf map: Fix dso->nsinfo refcounting"
mm/memcg: fix NULL pointer dereference in memcg_slab_free_hook()
slub: fix unreclaimable slab stat for bulk free
mm/migrate: fix NR_ISOLATED corruption on 64-bit
mm: memcontrol: fix blocking rstat function called from atomic cgroup1 thresholding code
ocfs2: issue zeroout to EOF blocks
ocfs2: fix zero out valid data
lib/test_string.c: move string selftest in the Runtime Testing menu
gve: Update MAINTAINERS list
arch: Kconfig: clean up obsolete use of HAVE_IDE
can: esd_usb2: fix memory leak
can: ems_usb: fix memory leak
can: usb_8dev: fix memory leak
can: mcba_usb_start(): add missing urb->transfer_dma initialization
can: hi311x: fix a signedness bug in hi3110_cmd()
MAINTAINERS: add Yasushi SHOJI as reviewer for the Microchip CAN BUS Analyzer Tool driver
scsi: fas216: Fix fall-through warning for Clang
scsi: acornscsi: Fix fall-through warning for clang
...
If CTA_STATUS is present, but CTA_STATUS_MASK is not, then the
mask is automatically set to 'status', so that kernel returns those
entries that have all of the requested bits set.
This makes more sense than using a all-one mask since we'd hardly
ever find a match.
There are no other checks for status bits, so if e.g. userspace
sets impossible combinations it will get an empty dump.
If kernel would reject unknown status bits, then a program that works on
a future kernel that has IPS_FOO bit fails on old kernels.
Same for 'impossible' combinations:
Kernel never sets ASSURED without first having set SEEN_REPLY, but its
possible that a future kernel could do so.
Therefore no sanity tests other than a 0-mask.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This transport enables communications with an SCMI platform through virtio;
the SCMI platform will be represented by a virtio device.
Implement an SCMI virtio driver according to the virtio SCMI device spec
[1]. Virtio device id 32 has been reserved for the SCMI device [2].
The virtio transport has one Tx channel (virtio cmdq, A2P channel) and
at most one Rx channel (virtio eventq, P2A channel).
The following feature bit defined in [1] is not implemented:
VIRTIO_SCMI_F_SHARED_MEMORY.
The number of messages which can be pending simultaneously is restricted
according to the virtqueue capacity negotiated at probing time.
As soon as Rx channel message buffers are allocated or have been read
out by the arm-scmi driver, feed them back to the virtio device.
Since some virtio devices may not have the short response time exhibited
by SCMI platforms using other transports, set a generous response
timeout.
SCMI polling mode is not supported by this virtio transport since deemed
meaningless: polling mode operation is offered by the SCMI core to those
transports that could not provide a completion interrupt on the TX path,
which is never the case for virtio whose core callbacks can easily call
into core scmi_rx_callback upon messages reception.
[1] https://github.com/oasis-tcs/virtio-spec/blob/master/virtio-scmi.tex
[2] https://www.oasis-open.org/committees/ballot.php?id=3496
Link: https://lore.kernel.org/r/20210803131024.40280-16-cristian.marussi@arm.com
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Co-developed-by: Peter Hilber <peter.hilber@opensynergy.com>
Co-developed-by: Cristian Marussi <cristian.marussi@arm.com>
Signed-off-by: Igor Skalkin <igor.skalkin@opensynergy.com>
[ Peter: Adapted patch for submission to upstream. ]
Signed-off-by: Peter Hilber <peter.hilber@opensynergy.com>
[ Cristian: simplified driver logic, changed link_supplier and channel
available/setup logic, removed dummy callbacks ]
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Add a control to set intra-refresh period.
Acked-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
SOCK_SNDBUF_LOCK and SOCK_RCVBUF_LOCK flags disable automatic socket
buffers adjustment done by kernel (see tcp_fixup_rcvbuf() and
tcp_sndbuf_expand()). If we've just created a new socket this adjustment
is enabled on it, but if one changes the socket buffer size by
setsockopt(SO_{SND,RCV}BUF*) it becomes disabled.
CRIU needs to call setsockopt(SO_{SND,RCV}BUF*) on each socket on
restore as it first needs to increase buffer sizes for packet queues
restore and second it needs to restore back original buffer sizes. So
after CRIU restore all sockets become non-auto-adjustable, which can
decrease network performance of restored applications significantly.
CRIU need to be able to restore sockets with enabled/disabled adjustment
to the same state it was before dump, so let's add special setsockopt
for it.
Let's also export SOCK_SNDBUF_LOCK and SOCK_RCVBUF_LOCK flags to uAPI so
that using these interface one can reenable automatic socket buffer
adjustment on their sockets.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQFHBAABCgAxFiEEK3kIWJt9yTYMP3ehqclaivrt76kFAmEKaBUTHG1rbEBwZW5n
dXRyb25peC5kZQAKCRCpyVqK+u3vqSvgCACpR64hydl7/qt9QGnm9Ym6/v/L9y9v
aBfZMQsedP1GSuev5PpxghXU4GF0LXiDr6ryr0hhu7w2ojjlLNl9sVHCF9qdAJKz
x2D4YTlxct2KuPBdhWllQr/KWFbJh2IzarHEWzdo+QoU5A8jDlsK2kLeeikFECzT
fVUe3mu1k66/DvHsetsfzIvbUkuHk2SPpK/pwrUC6Siw6wQZBHlSoUEtBNwEPlyH
8+ZQJPqtrjr2v3mZUOkgHrlXEOZRu6OM3i1Yv2bn2x4VI+3KQHEw/cA1WNE2AOzN
CfMp4sS98QdCrAboX4VJZpGAbziTFHedqFjjIP9ultCfH9ROHhQj4Zsl
=37wt
-----END PGP SIGNATURE-----
Merge tag 'linux-can-next-for-5.15-20210804' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
Marc Kleine-Budde says:
====================
pull-request: can-next 2021-08-04
this is a pull request of 5 patches for net-next/master.
The first patch is by me and fixes a typo in a comment in the CAN
J1939 protocol.
The next 2 patches are by Oleksij Rempel and update the CAN J1939
protocol to send RX status updates via the error queue mechanism.
The next patch is by me and adds a missing variable initialization to
the flexcan driver (the problem was introduced in the current net-next
cycle).
The last patch is by Aswath Govindraju and adds power-domains to the
Bosch m_can DT binding documentation.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
To be able to create applications with user friendly feedback, we need be
able to provide receive status information.
Typical ETP transfer may take seconds or even hours. To give user some
clue or show a progress bar, the stack should push status updates.
Same as for the TX information, the socket error queue will be used with
following new signals:
- J1939_EE_INFO_RX_RTS - received and accepted request to send signal.
- J1939_EE_INFO_RX_DPO - received data package offset signal
- J1939_EE_INFO_RX_ABORT - RX session was aborted
Instead of completion signal, user will get data package.
To activate this signals, application should set
SOF_TIMESTAMPING_RX_SOFTWARE to the SO_TIMESTAMPING socket option. This
will avoid unpredictable application behavior for the old software.
Link: https://lore.kernel.org/r/20210707094854.30781-3-o.rempel@pengutronix.de
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
When running command pipelining for WRITE direction commands (e.g. tape
device write), userspace sends cmd completion to cmd ring before processing
write data. In that case userspace has to copy data before sending
completion, because cmd completion also implicitly releases the data buffer
in data area.
The new feature KEEP_BUF allows userspace to optionally keep the buffer
after completion by setting new bit TCMU_UFLAG_KEEP_BUF in
tcmu_cmd_entry_hdr->uflags. In that case buffer has to be released
explicitly by writing the cmd_id to new action item free_kept_buf.
All kept buffers are released during reset_ring and if userspace closes uio
device (tcmu_release).
Link: https://lore.kernel.org/r/20210713175021.20103-1-bostroesser@gmail.com
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Bodo Stroesser <bostroesser@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add an option lacp_active, which is similar with team's runner.active.
This option specifies whether to send LACPDU frames periodically. If set
on, the LACPDU frames are sent along with the configured lacp_rate
setting. If set off, the LACPDU frames acts as "speak when spoken to".
Note, the LACPDU state frames still will be sent when init or unbind port.
v2: remove module parameter
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
systemd added a modified copy of include/linux/ioprio.h into its
code to get the relevant content definitions for the exposed
ioprio_[get|set] system calls.
Move the user space relevant ioprio bits to the UAPI includes to be
able to use the ioprio_[get|set] syscalls as intended.
Cc: Kay Sievers <kay@vrfy.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://lore.kernel.org/r/20210714195655.181943-1-socketcan@hartkopp.net
Signed-off-by: Jens Axboe <axboe@kernel.dk>
There is a regular need in the kernel to provide a way to declare having
a dynamically sized set of trailing elements in a structure. Kernel code
should always use “flexible array members”[1] for these cases. The older
style of one-element or zero-length arrays should no longer be used[2].
Use an anonymous union with a couple of anonymous structs in order to
keep userspace unchanged:
$ pahole -C ip_msfilter net/ipv4/ip_sockglue.o
struct ip_msfilter {
union {
struct {
__be32 imsf_multiaddr_aux; /* 0 4 */
__be32 imsf_interface_aux; /* 4 4 */
__u32 imsf_fmode_aux; /* 8 4 */
__u32 imsf_numsrc_aux; /* 12 4 */
__be32 imsf_slist[1]; /* 16 4 */
}; /* 0 20 */
struct {
__be32 imsf_multiaddr; /* 0 4 */
__be32 imsf_interface; /* 4 4 */
__u32 imsf_fmode; /* 8 4 */
__u32 imsf_numsrc; /* 12 4 */
__be32 imsf_slist_flex[0]; /* 16 0 */
}; /* 0 16 */
}; /* 0 20 */
/* size: 20, cachelines: 1, members: 1 */
/* last cacheline: 20 bytes */
};
Also, refactor the code accordingly and make use of the struct_size()
and flex_array_size() helpers.
This helps with the ongoing efforts to globally enable -Warray-bounds
and get us closer to being able to tighten the FORTIFY_SOURCE routines
on memcpy().
[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://www.kernel.org/doc/html/v5.10/process/deprecated.html#zero-length-and-one-element-arrays
Link: https://github.com/KSPP/linux/issues/79
Link: https://github.com/KSPP/linux/issues/109
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
TC action ->init() API has 10 parameters, it becomes harder
to read. Some of them are just boolean and can be replaced
by flags. Similarly for the internal API tcf_action_init()
and tcf_exts_validate().
This patch converts them to flags and fold them into
the upper 16 bits of "flags", whose lower 16 bits are still
reserved for user-space. More specifically, the following
kernel flags are introduced:
TCA_ACT_FLAGS_POLICE replace 'name' in a few contexts, to
distinguish whether it is compatible with policer.
TCA_ACT_FLAGS_BIND replaces 'bind', to indicate whether
this action is bound to a filter.
TCA_ACT_FLAGS_REPLACE replaces 'ovr' in most contexts,
means we are replacing an existing action.
TCA_ACT_FLAGS_NO_RTNL replaces 'rtnl_held' but has the
opposite meaning, because we still hold RTNL in most
cases.
The only user-space flag TCA_ACT_FLAGS_NO_PERCPU_STATS is
untouched and still stored as before.
I have tested this patch with tdc and I do not see any
failure related to this patch.
Tested-by: Vlad Buslov <vladbu@nvidia.com>
Acked-by: Jamal Hadi Salim<jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently we have a compile-time default network
(MCTP_INITIAL_DEFAULT_NET). This change introduces a default_net field
on the net namespace, allowing future configuration for new interfaces.
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change adds the infrastructure for managing MCTP netdevices; we add
a pointer to the AF_MCTP-specific data to struct netdevice, and hook up
the rtnetlink operations for adding and removing addresses.
Includes changes from Matt Johnston <matt@codeconstruct.com.au>.
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add an empty drivers/net/mctp/, for future interface drivers.
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change introduces the user-visible MCTP header, containing the
protocol-specific addressing definitions.
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add basic Kconfig, an initial (empty) af_mctp source object, and
{AF,PF}_MCTP definitions, and the required definitions for a new
protocol type.
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to check up->dirmask to avoid shift-out-of-bounce bug,
since up->dirmask comes from userspace.
Also, added XFRM_USERPOLICY_DIRMASK_MAX constant to uapi to inform
user-space that up->dirmask has maximum possible value
Fixes: 2d151d3907 ("xfrm: Add possibility to set the default to block if we have no policy")
Reported-and-tested-by: syzbot+9cd5837a045bbee5b810@syzkaller.appspotmail.com
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Allow the user program to specify both ASYNC and SYNC TCF modes by
repurposing the existing constants as bitfields. This will allow the
kernel to select one of the modes on behalf of the user program. With
this patch the kernel will always select async mode, but a subsequent
patch will make this configurable.
Link: https://linux-review.googlesource.com/id/Icc5923c85a8ea284588cc399ae74fd19ec291230
Signed-off-by: Peter Collingbourne <pcc@google.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20210727205300.2554659-3-pcc@google.com
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This file was given GPL-2.0 license. But LGPL-2.1 makes more sense
as it needs to be used by libraries outside of the kernel source tree.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Enabling device and wq returns standard errno and that does not provide
enough details to indicate what exactly failed. The hardware command status
is only 8bits. Expand the command status to 32bits and use the upper 16
bits to define software errors to provide more details on the exact
failure. Bit 31 will be used to indicate the error is software set as the
driver is using some of the spec defined hardware error as well.
Cc: Ramesh Thomas <ramesh.thomas@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/162681373579.1968485.5891788397526827892.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Currently, when doing rate limiting using the tc-police(8) action, the
easiest way is to simply drop the packets which exceed or conform the
configured bandwidth limit. Add a new option to tc-skbmod(8), so that
users may use the ECN [1] extension to explicitly inform the receiver
about the congestion instead of dropping packets "on the floor".
The 2 least significant bits of the Traffic Class field in IPv4 and IPv6
headers are used to represent different ECN states [2]:
0b00: "Non ECN-Capable Transport", Non-ECT
0b10: "ECN Capable Transport", ECT(0)
0b01: "ECN Capable Transport", ECT(1)
0b11: "Congestion Encountered", CE
As an example:
$ tc filter add dev eth0 parent 1: protocol ip prio 10 \
matchall action skbmod ecn
Doing the above marks all ECT(0) and ECT(1) packets as CE. It does NOT
affect Non-ECT or non-IP packets. In the tc-police scenario mentioned
above, users may pipe a tc-police action and a tc-skbmod "ecn" action
together to achieve ECN-based rate limiting.
For TCP connections, upon receiving a CE packet, the receiver will respond
with an ECE packet, asking the sender to reduce their congestion window.
However ECN also works with other L4 protocols e.g. DCCP and SCTP [2], and
our implementation does not touch or care about L4 headers.
The updated tc-skbmod SYNOPSIS looks like the following:
tc ... action skbmod { set SETTABLE | swap SWAPPABLE | ecn } ...
Only one of "set", "swap" or "ecn" shall be used in a single tc-skbmod
command. Trying to use more than one of them at a time is considered
undefined behavior; pipe multiple tc-skbmod commands together instead.
"set" and "swap" only affect Ethernet packets, while "ecn" only affects
IPv{4,6} packets.
It is also worth mentioning that, in theory, the same effect could be
achieved by piping a "police" action and a "bpf" action using the
bpf_skb_ecn_set_ce() helper, but this requires eBPF programming from the
user, thus impractical.
Depends on patch "net/sched: act_skbmod: Skip non-Ethernet packets".
[1] https://datatracker.ietf.org/doc/html/rfc3168
[2] https://en.wikipedia.org/wiki/Explicit_Congestion_Notification
Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the existing PR_GET/SET_SPECULATION_CTRL API to expose the L1D flush
capability. For L1D flushing PR_SPEC_FORCE_DISABLE and
PR_SPEC_DISABLE_NOEXEC are not supported.
Enabling L1D flush does not check if the task is running on an SMT enabled
core, rather a check is done at runtime (at the time of flush), if the task
runs on a SMT sibling then the task is sent a SIGBUS which is executed
before the task returns to user space or to a guest.
This is better than the other alternatives of:
a. Ensuring strict affinity of the task (hard to enforce without further
changes in the scheduler)
b. Silently skipping flush for tasks that move to SMT enabled cores.
Hook up the core prctl and implement the x86 specific parts which in turn
makes it functional.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Balbir Singh <sblbir@amazon.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210108121056.21940-5-sblbir@amazon.com
Oxford Semiconductor 950 serial port devices have a 128-byte FIFO and in
the enhanced (650) mode, which we select in `autoconfig_has_efr' with
the ECB bit set in the EFR register, they support the receive interrupt
trigger level selectable with FCR bits 7:6 from the set of 16, 32, 112,
120. This applies to the original OX16C950 discrete UART[1] as well as
950 cores embedded into more complex devices.
For these devices we set the default to 112, which sets an excessively
high level of 112 or 7/8 of the FIFO capacity, unlike with other port
types where we choose at most 1/2 of their respective FIFO capacities.
Additionally we don't make the trigger level configurable. Consequently
frequent input overruns happen with high bit rates where hardware flow
control cannot be used (e.g. terminal applications) even with otherwise
highly-performant systems.
Lower the default receive interrupt trigger level to 32 then, and make
it configurable. Document the trigger levels along with other port
types, including the set of 16, 32, 64, 112 for the transmit interrupt
as well[2].
References:
[1] "OX16C950 rev B High Performance UART with 128 byte FIFOs", Oxford
Semiconductor, Inc., DS-0031, Sep 05, Table 10: "Receiver Trigger
Levels", p. 22
[2] same, Table 9: "Transmit Interrupt Trigger Levels", p. 22
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Link: https://lore.kernel.org/r/alpine.DEB.2.21.2106260608480.37803@angie.orcam.me.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Previously a sharing group (shared and master ids pair) can be only
inherited when mount is created via bindmount. This patch adds an
ability to add an existing private mount into an existing sharing group.
With this functionality one can first create the desired mount tree from
only private mounts (without the need to care about undesired mount
propagation or mount creation order implied by sharing group
dependencies), and next then setup any desired mount sharing between
those mounts in tree as needed.
This allows CRIU to restore any set of mount namespaces, mount trees and
sharing group trees for a container.
We have many issues with restoring mounts in CRIU related to sharing
groups and propagation:
- reverse sharing groups vs mount tree order requires complex mounts
reordering which mostly implies also using some temporary mounts
(please see https://lkml.org/lkml/2021/3/23/569 for more info)
- mount() syscall creates tons of mounts due to propagation
- mount re-parenting due to propagation
- "Mount Trap" due to propagation
- "Non Uniform" propagation, meaning that with different tricks with
mount order and temporary children-"lock" mounts one can create mount
trees which can't be restored without those tricks
(see https://www.linuxplumbersconf.org/event/7/contributions/640/)
With this new functionality we can resolve all the problems with
propagation at once.
Link: https://lore.kernel.org/r/20210715100714.120228-1-ptikhomirov@virtuozzo.com
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Mattias Nissler <mnissler@chromium.org>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-api@vger.kernel.org
Cc: lkml <linux-kernel@vger.kernel.org>
Co-developed-by: Andrei Vagin <avagin@gmail.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Add support for the IOAM inline insertion (only for the host-to-host use case)
which is per-route configured with lightweight tunnels. The target is iproute2
and the patch is ready. It will be posted as soon as this patchset is merged.
Here is an overview:
$ ip -6 ro ad fc00::1/128 encap ioam6 trace type 0x800000 ns 1 size 12 dev eth0
This example configures an IOAM Pre-allocated Trace option attached to the
fc00::1/128 prefix. The IOAM namespace (ns) is 1, the size of the pre-allocated
trace data block is 12 octets (size) and only the first IOAM data (bit 0:
hop_limit + node id) is included in the trace (type) represented as a bitfield.
The reason why the in-transit (IPv6-in-IPv6 encapsulation) use case is not
implemented is explained on the patchset cover.
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add Generic Netlink commands to allow userspace to configure IOAM
namespaces and schemas. The target is iproute2 and the patch is ready.
It will be posted as soon as this patchset is merged. Here is an overview:
$ ip ioam
Usage: ip ioam { COMMAND | help }
ip ioam namespace show
ip ioam namespace add ID [ data DATA32 ] [ wide DATA64 ]
ip ioam namespace del ID
ip ioam schema show
ip ioam schema add ID DATA
ip ioam schema del ID
ip ioam namespace set ID schema { ID | none }
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement support for processing the IOAM Pre-allocated Trace with IPv6,
see [1] and [2]. Introduce a new IPv6 Hop-by-Hop TLV option, see IANA [3].
A new per-interface sysctl is introduced. The value is a boolean to accept (=1)
or ignore (=0, by default) IPv6 IOAM options on ingress for an interface:
- net.ipv6.conf.XXX.ioam6_enabled
Two other sysctls are introduced to define IOAM IDs, represented by an integer.
They are respectively per-namespace and per-interface:
- net.ipv6.ioam6_id
- net.ipv6.conf.XXX.ioam6_id
The value of the first one represents the IOAM ID of the node itself (u32; max
and default value = U32_MAX>>8, due to hop limit concatenation) while the other
represents the IOAM ID of an interface (u16; max and default value = U16_MAX).
Each "ioam6_id" sysctl has a "_wide" equivalent:
- net.ipv6.ioam6_id_wide
- net.ipv6.conf.XXX.ioam6_id_wide
The value of the first one represents the wide IOAM ID of the node itself (u64;
max and default value = U64_MAX>>8, due to hop limit concatenation) while the
other represents the wide IOAM ID of an interface (u32; max and default value
= U32_MAX).
The use of short and wide equivalents is not exclusive, a deployment could
choose to leverage both. For example, net.ipv6.conf.XXX.ioam6_id (short format)
could be an identifier for a physical interface, whereas
net.ipv6.conf.XXX.ioam6_id_wide (wide format) could be an identifier for a
logical sub-interface. Documentation about new sysctls is provided at the end
of this patchset.
Two relativistic hash tables are used: one for IOAM namespaces, the other for
IOAM schemas. A namespace can only have a single active schema and a schema
can only be attached to a single namespace (1:1 relationship).
[1] https://tools.ietf.org/html/draft-ietf-ippm-ioam-ipv6-options
[2] https://tools.ietf.org/html/draft-ietf-ippm-ioam-data
[3] https://www.iana.org/assignments/ipv6-parameters/ipv6-parameters.xhtml#ipv6-parameters-2
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch provides the IPv6 IOAM option header [1] as well as the IOAM
Trace header [2]. An IOAM option must be 4n-aligned. Here is an overview of
a Hop-by-Hop with an IOAM Trace option:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Next header | Hdr Ext Len | Padding | Padding |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Option Type | Opt Data Len | Reserved | IOAM Type |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Namespace-ID | NodeLen | Flags | RemainingLen|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| IOAM-Trace-Type | Reserved |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<-+
| | |
| node data [n] | |
| | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ D
| | a
| node data [n-1] | t
| | a
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
~ ... ~ S
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ p
| | a
| node data [1] | c
| | e
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| | |
| node data [0] | |
| | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<-+
The IOAM option header starts at "Option Type" and ends after "IOAM
Type". The IOAM Trace header starts at "Namespace-ID" and ends after
"IOAM-Trace-Type/Reserved".
IOAM Type: either Pre-allocated Trace (=0), Incremental Trace (=1),
Proof-of-Transit (=2) or Edge-to-Edge (=3). Note that both the
Pre-allocated Trace and the Incremental Trace look the same. The two
others are not implemented.
Namespace-ID: IOAM namespace identifier, not to be confused with network
namespaces. It adds further context to IOAM options and associated data,
and allows devices which are IOAM capable to determine whether IOAM
options must be processed or ignored. It can also be used by an operator
to distinguish different operational domains or to identify different
sets of devices.
NodeLen: Length of data added by each node. It depends on the Trace
Type.
Flags: Only the Overflow (O) flag for now. The O flag is set by a
transit node when there are not enough octets left to record its data.
RemainingLen: Remaining free space to record data.
IOAM-Trace-Type: Bit field where each bit corresponds to a specific kind
of IOAM data. See [2] for a detailed list.
[1] https://tools.ietf.org/html/draft-ietf-ippm-ioam-ipv6-options
[2] https://tools.ietf.org/html/draft-ietf-ippm-ioam-data
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
As the default we assume the traffic to pass, if we have no
matching IPsec policy. With this patch, we have a possibility to
change this default from allow to block. It can be configured
via netlink. Each direction (input/output/forward) can be
configured separately. With the default to block configuered,
we need allow policies for all packet flows we accept.
We do not use default policy lookup for the loopback device.
v1->v2
- fix compiling when XFRM is disabled
- Reported-by: kernel test robot <lkp@intel.com>
Co-developed-by: Christian Langrock <christian.langrock@secunet.com>
Signed-off-by: Christian Langrock <christian.langrock@secunet.com>
Co-developed-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
UAPI Changes:
Cross-subsystem Changes:
- udmabuf: Add support for mapping hugepages
- Add dma-buf stats to sysfs.
- Assorted fixes to fbdev/omap2.
- dma-buf: Document DMA_BUF_IOCTL_SYNC
- Improve dma-buf non-dynamic exporter expectations better.
- Add module parameters for dma-buf size and list limit.
- Add HDMI codec support to vc4, to replace vc4's own codec.
- Document dma-buf implicit fencing rules.
- dma_resv_test_signaled test_all handling.
Core Changes:
- Extract i915's eDP backlight code into DRM helpers.
- Assorted docbook updates.
- Rework drm_dp_aux documentation.
- Add support for the DP aux bus.
- Shrink dma-fence-chain slightly.
- Add alloc/free helpers for dma-fence-chain.
- Assorted fixes to TTM., drm/of, bridge
- drm_gem_plane_helper_prepare/cleanup_fb is now the default for gem drivers.
- Small fix for scheduler completion.
- Remove use of drm_device.irq_enabled.
- Print the driver name to dmesg when registering framebuffer.
- Export drm/gem's shadow plane handling, and use it in vkms.
- Assorted small fixes.
Driver Changes:
- Add eDP backlight to nouveau.
- Assorted fixes and cleanups to nouveau, panfrost, vmwgfx, anx7625,
amdgpu, gma500, radeon, mgag200, vgem, vc4, vkms, omapdrm.
- Add support for Samsung DB7430, Samsung ATNA33XC20, EDT ETMV570G2DHU,
EDT ETM0350G0DH6, Innolux EJ030NA panels.
- Fix some simple pannels missing bus_format and connector types.
- Add mks-guest-stats instrumentation support to vmwgfx.
- Merge i915-ttm topic branch.
- Make s6e63m0 panel use Mipi-DBI helpers.
- Add detect() supoprt for AST.
- Use interrupts for hotplug on vc4.
- vmwgfx is now moved to drm-misc-next, as sroland is no longer a maintainer for now.
- vmwgfx now uses copies of vmware's internal device headers.
- Slowly convert ti-sn65dsi83 over to atomic.
- Rework amdgpu dma-resv handling.
- Fix virtio fencing for planes.
- Ensure amdgpu can always evict to SYSTEM.
- Many drivers fixed for implicit fencing rules.
- Set default prepare/cleanup fb for tiny, vram and simple helpers too.
- Rework panfrost gpu reset and related serialization.
- Update VKMS todo list.
- Make bochs a tiny gpu driver, and use vram helper.
- Use linux irq interfaces instead of drm_irq in some drivers.
- Add support for Raspberry Pi Pico to GUD.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEuXvWqAysSYEJGuVH/lWMcqZwE8MFAmDxaBwACgkQ/lWMcqZw
E8PBYRAAsZgmuQU1urEsDTL931jWoJ8zxHpxSLow8ZtplembyhloGeRXRmGT8erd
ocw1wAzm0UajbFLvv50XW5N4jPnsn9IBRQVhfNNc06g4OH6qy17PPAA+clHaBJrf
BFiAcK4rzmUet3+6335ko/OvkD5er0s7ipNljxgB7FkIwP3gh3NEFG0yFcpFpxF4
fzT5Wz5vMW++XUCXZHMX+vBMjFP2AosxLVvsnxpM/48dyFWTiYRg7jhy5bICKYBM
3GdRj2e1wm3cAsZISbqtDpXSlstIw6u0w+BB6ryQvD/K5nPTqydE/YMOB85DUWLg
Sp1tijxM/KtOyC5w/IpDLkf9X24KAIcu0eKffUGbkLvIkP5cSyibelOtZBG6Jmln
AubXpgi4+mGVyYvMEVngHyrY2tW/rtpNGr/g9To9hYVHKkdRZUtolQk7KgtdV7v3
pFq60AilYTENJthkjCRoTi66BsocpaJfQOyppp6uD8/a0Spxfrq5tM+POWNylqxB
70L2ObvM4Xx51GI0ziCZQwkMp2Uzwosr+6CdbrzQKaxxpbQEcr3frkv6cap5V0WY
lnYgFw3dbA/Ga6YsnInQ87KmF4svnaWB2z/KzfnBF5pNrwoR9/4K5k7Vfb3P9YyN
w+nrfeHto0r768PjC/05uyD9diDuHOw3RHtljf/C4klBNRDDovU=
=x8Eo
-----END PGP SIGNATURE-----
Merge tag 'drm-misc-next-2021-07-16' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
drm-misc-next for v5.15:
UAPI Changes:
Cross-subsystem Changes:
- udmabuf: Add support for mapping hugepages
- Add dma-buf stats to sysfs.
- Assorted fixes to fbdev/omap2.
- dma-buf: Document DMA_BUF_IOCTL_SYNC
- Improve dma-buf non-dynamic exporter expectations better.
- Add module parameters for dma-buf size and list limit.
- Add HDMI codec support to vc4, to replace vc4's own codec.
- Document dma-buf implicit fencing rules.
- dma_resv_test_signaled test_all handling.
Core Changes:
- Extract i915's eDP backlight code into DRM helpers.
- Assorted docbook updates.
- Rework drm_dp_aux documentation.
- Add support for the DP aux bus.
- Shrink dma-fence-chain slightly.
- Add alloc/free helpers for dma-fence-chain.
- Assorted fixes to TTM., drm/of, bridge
- drm_gem_plane_helper_prepare/cleanup_fb is now the default for gem drivers.
- Small fix for scheduler completion.
- Remove use of drm_device.irq_enabled.
- Print the driver name to dmesg when registering framebuffer.
- Export drm/gem's shadow plane handling, and use it in vkms.
- Assorted small fixes.
Driver Changes:
- Add eDP backlight to nouveau.
- Assorted fixes and cleanups to nouveau, panfrost, vmwgfx, anx7625,
amdgpu, gma500, radeon, mgag200, vgem, vc4, vkms, omapdrm.
- Add support for Samsung DB7430, Samsung ATNA33XC20, EDT ETMV570G2DHU,
EDT ETM0350G0DH6, Innolux EJ030NA panels.
- Fix some simple pannels missing bus_format and connector types.
- Add mks-guest-stats instrumentation support to vmwgfx.
- Merge i915-ttm topic branch.
- Make s6e63m0 panel use Mipi-DBI helpers.
- Add detect() supoprt for AST.
- Use interrupts for hotplug on vc4.
- vmwgfx is now moved to drm-misc-next, as sroland is no longer a maintainer for now.
- vmwgfx now uses copies of vmware's internal device headers.
- Slowly convert ti-sn65dsi83 over to atomic.
- Rework amdgpu dma-resv handling.
- Fix virtio fencing for planes.
- Ensure amdgpu can always evict to SYSTEM.
- Many drivers fixed for implicit fencing rules.
- Set default prepare/cleanup fb for tiny, vram and simple helpers too.
- Rework panfrost gpu reset and related serialization.
- Update VKMS todo list.
- Make bochs a tiny gpu driver, and use vram helper.
- Use linux irq interfaces instead of drm_irq in some drivers.
- Add support for Raspberry Pi Pico to GUD.
Signed-off-by: Dave Airlie <airlied@redhat.com>
# gpg: Signature made Fri 16 Jul 2021 21:06:04 AEST
# gpg: using RSA key B97BD6A80CAC4981091AE547FE558C72A67013C3
# gpg: Good signature from "Maarten Lankhorst <maarten.lankhorst@linux.intel.com>" [expired]
# gpg: aka "Maarten Lankhorst <maarten@debian.org>" [expired]
# gpg: aka "Maarten Lankhorst <maarten.lankhorst@canonical.com>" [expired]
# gpg: Note: This key has expired!
# Primary key fingerprint: B97B D6A8 0CAC 4981 091A E547 FE55 8C72 A670 13C3
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/444811c3-cbec-e9d5-9a6b-9632eda7962a@linux.intel.com
Add a new global vlan option which controls whether multicast snooping
is enabled or disabled for a single vlan. It controls the vlan private
flag: BR_VLFLAG_GLOBAL_MCAST_ENABLED.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a new vlan options dump flag which causes only global vlan options
to be dumped. The dumps are done only with bridge devices, ports are
ignored. They support vlan compression if the options in sequential
vlans are equal (currently always true).
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We can have two types of vlan options depending on context:
- per-device vlan options (split in per-bridge and per-port)
- global vlan options
The second type wasn't supported in the bridge until now, but we need
them for per-vlan multicast support, per-vlan STP support and other
options which require global vlan context. They are contained in the global
bridge vlan context even if the vlan is not configured on the bridge device
itself. This patch adds initial netlink attributes and support for setting
these global vlan options, they can only be set (RTM_NEWVLAN) and the
operation must use the bridge device. Since there are no such options yet
it shouldn't have any functional effect.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the port multicast context to check if the router port is a vlan and
in case it is include its vlan id in the notification.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a global knob that controls if vlan multicast snooping is enabled.
The proper contexts (vlan or bridge-wide) will be chosen based on the knob
when processing packets and changing bridge device state. Note that
vlans have their individual mcast snooping enabled by default, but this
knob is needed to turn on bridge vlan snooping. It is disabled by
default. To enable the knob vlan filtering must also be enabled, it
doesn't make sense to have vlan mcast snooping without vlan filtering
since that would lead to inconsistencies. Disabling vlan filtering will
also automatically disable vlan mcast snooping.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Open vSwitch kernel module uses the upcall mechanism to send
packets from kernel space to user space when it misses in the kernel
space flow table. The upcall sends packets via a Netlink socket.
Currently, a Netlink socket is created for every vport. In this way,
there is a 1:1 mapping between a vport and a Netlink socket.
When a packet is received by a vport, if it needs to be sent to
user space, it is sent via the corresponding Netlink socket.
This mechanism, with various iterations of the corresponding user
space code, has seen some limitations and issues:
* On systems with a large number of vports, there is a correspondingly
large number of Netlink sockets which can limit scaling.
(https://bugzilla.redhat.com/show_bug.cgi?id=1526306)
* Packet reordering on upcalls.
(https://bugzilla.redhat.com/show_bug.cgi?id=1844576)
* A thundering herd issue.
(https://bugzilla.redhat.com/show_bug.cgi?id=1834444)
This patch introduces an alternative, feature-negotiated, upcall
mode using a per-cpu dispatch rather than a per-vport dispatch.
In this mode, the Netlink socket to be used for the upcall is
selected based on the CPU of the thread that is executing the upcall.
In this way, it resolves the issues above as:
a) The number of Netlink sockets scales with the number of CPUs
rather than the number of vports.
b) Ordering per-flow is maintained as packets are distributed to
CPUs based on mechanisms such as RSS and flows are distributed
to a single user space thread.
c) Packets from a flow can only wake up one user space thread.
The corresponding user space code can be found at:
https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385139.html
Bugzilla: https://bugzilla.redhat.com/1844576
Signed-off-by: Mark Gray <mark.d.gray@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov says:
====================
pull-request: bpf-next 2021-07-15
The following pull-request contains BPF updates for your *net-next* tree.
We've added 45 non-merge commits during the last 15 day(s) which contain
a total of 52 files changed, 3122 insertions(+), 384 deletions(-).
The main changes are:
1) Introduce bpf timers, from Alexei.
2) Add sockmap support for unix datagram socket, from Cong.
3) Fix potential memleak and UAF in the verifier, from He.
4) Add bpf_get_func_ip helper, from Jiri.
5) Improvements to generic XDP mode, from Kumar.
6) Support for passing xdp_md to XDP programs in bpf_prog_run, from Zvi.
===================
Signed-off-by: David S. Miller <davem@davemloft.net>
Adding bpf_get_func_ip helper for BPF_PROG_TYPE_KPROBE programs,
so it's now possible to call bpf_get_func_ip from both kprobe and
kretprobe programs.
Taking the caller's address from 'struct kprobe::addr', which is
defined for both kprobe and kretprobe.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/bpf/20210714094400.396467-5-jolsa@kernel.org
Adding bpf_get_func_ip helper for BPF_PROG_TYPE_TRACING programs,
specifically for all trampoline attach types.
The trampoline's caller IP address is stored in (ctx - 8) address.
so there's no reason to actually call the helper, but rather fixup
the call instruction and return [ctx - 8] value directly.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210714094400.396467-4-jolsa@kernel.org
Introduce 'struct bpf_timer { __u64 :64; __u64 :64; };' that can be embedded
in hash/array/lru maps as a regular field and helpers to operate on it:
// Initialize the timer.
// First 4 bits of 'flags' specify clockid.
// Only CLOCK_MONOTONIC, CLOCK_REALTIME, CLOCK_BOOTTIME are allowed.
long bpf_timer_init(struct bpf_timer *timer, struct bpf_map *map, int flags);
// Configure the timer to call 'callback_fn' static function.
long bpf_timer_set_callback(struct bpf_timer *timer, void *callback_fn);
// Arm the timer to expire 'nsec' nanoseconds from the current time.
long bpf_timer_start(struct bpf_timer *timer, u64 nsec, u64 flags);
// Cancel the timer and wait for callback_fn to finish if it was running.
long bpf_timer_cancel(struct bpf_timer *timer);
Here is how BPF program might look like:
struct map_elem {
int counter;
struct bpf_timer timer;
};
struct {
__uint(type, BPF_MAP_TYPE_HASH);
__uint(max_entries, 1000);
__type(key, int);
__type(value, struct map_elem);
} hmap SEC(".maps");
static int timer_cb(void *map, int *key, struct map_elem *val);
/* val points to particular map element that contains bpf_timer. */
SEC("fentry/bpf_fentry_test1")
int BPF_PROG(test1, int a)
{
struct map_elem *val;
int key = 0;
val = bpf_map_lookup_elem(&hmap, &key);
if (val) {
bpf_timer_init(&val->timer, &hmap, CLOCK_REALTIME);
bpf_timer_set_callback(&val->timer, timer_cb);
bpf_timer_start(&val->timer, 1000 /* call timer_cb2 in 1 usec */, 0);
}
}
This patch adds helper implementations that rely on hrtimers
to call bpf functions as timers expire.
The following patches add necessary safety checks.
Only programs with CAP_BPF are allowed to use bpf_timer.
The amount of timers used by the program is constrained by
the memcg recorded at map creation time.
The bpf_timer_init() helper needs explicit 'map' argument because inner maps
are dynamic and not known at load time. While the bpf_timer_set_callback() is
receiving hidden 'aux->prog' argument supplied by the verifier.
The prog pointer is needed to do refcnting of bpf program to make sure that
program doesn't get freed while the timer is armed. This approach relies on
"user refcnt" scheme used in prog_array that stores bpf programs for
bpf_tail_call. The bpf_timer_set_callback() will increment the prog refcnt which is
paired with bpf_timer_cancel() that will drop the prog refcnt. The
ops->map_release_uref is responsible for cancelling the timers and dropping
prog refcnt when user space reference to a map reaches zero.
This uref approach is done to make sure that Ctrl-C of user space process will
not leave timers running forever unless the user space explicitly pinned a map
that contained timers in bpffs.
bpf_timer_init() and bpf_timer_set_callback() will return -EPERM if map doesn't
have user references (is not held by open file descriptor from user space and
not pinned in bpffs).
The bpf_map_delete_elem() and bpf_map_update_elem() operations cancel
and free the timer if given map element had it allocated.
"bpftool map update" command can be used to cancel timers.
The 'struct bpf_timer' is explicitly __attribute__((aligned(8))) because
'__u64 :64' has 1 byte alignment of 8 byte padding.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210715005417.78572-4-alexei.starovoitov@gmail.com
Current release - regressions:
- sock: fix parameter order in sock_setsockopt()
Current release - new code bugs:
- netfilter: nft_last:
- fix incorrect arithmetic when restoring last used
- honor NFTA_LAST_SET on restoration
Previous releases - regressions:
- udp: properly flush normal packet at GRO time
- sfc: ensure correct number of XDP queues; don't allow enabling the
feature if there isn't sufficient resources to Tx from any CPU
- dsa: sja1105: fix address learning getting disabled on the CPU port
- mptcp: addresses a rmem accounting issue that could keep packets
in subflow receive buffers longer than necessary, delaying
MPTCP-level ACKs
- ip_tunnel: fix mtu calculation for ETHER tunnel devices
- do not reuse skbs allocated from skbuff_fclone_cache in the napi
skb cache, we'd try to return them to the wrong slab cache
- tcp: consistently disable header prediction for mptcp
Previous releases - always broken:
- bpf: fix subprog poke descriptor tracking use-after-free
- ipv6:
- allocate enough headroom in ip6_finish_output2() in case
iptables TEE is used
- tcp: drop silly ICMPv6 packet too big messages to avoid
expensive and pointless lookups (which may serve as a DDOS
vector)
- make sure fwmark is copied in SYNACK packets
- fix 'disable_policy' for forwarded packets (align with IPv4)
- netfilter: conntrack: do not renew entry stuck in tcp SYN_SENT state
- netfilter: conntrack: do not mark RST in the reply direction coming
after SYN packet for an out-of-sync entry
- mptcp: cleanly handle error conditions with MP_JOIN and syncookies
- mptcp: fix double free when rejecting a join due to port mismatch
- validate lwtstate->data before returning from skb_tunnel_info()
- tcp: call sk_wmem_schedule before sk_mem_charge in zerocopy path
- mt76: mt7921: continue to probe driver when fw already downloaded
- bonding: fix multiple issues with offloading IPsec to (thru?) bond
- stmmac: ptp: fix issues around Qbv support and setting time back
- bcmgenet: always clear wake-up based on energy detection
Misc:
- sctp: move 198 addresses from unusable to private scope
- ptp: support virtual clocks and timestamping
- openvswitch: optimize operation for key comparison
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmDu3mMACgkQMUZtbf5S
Irsjxg//UwcPJMYFmXV+fGkEsWYe1Kf29FcUDEeANFtbltfAcIfZ0GoTbSDRnrVb
HcYAKcm4XRx5bWWdQrQsQq/yiLbnS/rSLc7VRB+uRHWRKl3eYcaUB2rnCXsxrjGw
wQJgOmztDCJS4BIky24iQpF/8lg7p/Gj2Ih532gh93XiYo612FrEJKkYb2/OQfYX
GkbnZ0kL2Y1SV+bhy6aT5azvhHKM4/3eA4fHeJ2p8e2gOZ5ni0vpX0xEzdzKOCd0
vwR/Wu3h/+2QuFYVcSsVguuM++JXACG8MAS/Tof78dtNM4a3kQxzqeh5Bv6IkfTu
rokENLq4pjNRy+nBAOeQZj8Jd0K0kkf/PN9WMdGQtplMoFhjjV25R6PeRrV9wwPo
peozIz2MuQo7Kfof1D+44h2foyLfdC28/Z0CvRbDpr5EHOfYynvBbrnhzIGdQp6V
xgftKTOdgz2Djgg8HiblZund1FA44OYerddVAASrIsnSFnIz1VLVQIsfV+GLBwwc
FawrIZ6WfIjzRSrDGOvDsbAQI47T/1jbaPJeK6XgjWkQmjEd6UtRWRZLYCxemQEw
4HP3sWC96BOehuD8ylipVE1oFqrxCiOB/fZxezXqjo8dSX3NLdak4cCHTHoW5SuZ
eEAxQRaBliKd+P7hoy9cZ57CAu3zUa8kijfM5QRlCAHF+zSxaPs=
=QFnb
-----END PGP SIGNATURE-----
Merge tag 'net-5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski.
"Including fixes from bpf and netfilter.
Current release - regressions:
- sock: fix parameter order in sock_setsockopt()
Current release - new code bugs:
- netfilter: nft_last:
- fix incorrect arithmetic when restoring last used
- honor NFTA_LAST_SET on restoration
Previous releases - regressions:
- udp: properly flush normal packet at GRO time
- sfc: ensure correct number of XDP queues; don't allow enabling the
feature if there isn't sufficient resources to Tx from any CPU
- dsa: sja1105: fix address learning getting disabled on the CPU port
- mptcp: addresses a rmem accounting issue that could keep packets in
subflow receive buffers longer than necessary, delaying MPTCP-level
ACKs
- ip_tunnel: fix mtu calculation for ETHER tunnel devices
- do not reuse skbs allocated from skbuff_fclone_cache in the napi
skb cache, we'd try to return them to the wrong slab cache
- tcp: consistently disable header prediction for mptcp
Previous releases - always broken:
- bpf: fix subprog poke descriptor tracking use-after-free
- ipv6:
- allocate enough headroom in ip6_finish_output2() in case
iptables TEE is used
- tcp: drop silly ICMPv6 packet too big messages to avoid
expensive and pointless lookups (which may serve as a DDOS
vector)
- make sure fwmark is copied in SYNACK packets
- fix 'disable_policy' for forwarded packets (align with IPv4)
- netfilter: conntrack:
- do not renew entry stuck in tcp SYN_SENT state
- do not mark RST in the reply direction coming after SYN packet
for an out-of-sync entry
- mptcp: cleanly handle error conditions with MP_JOIN and syncookies
- mptcp: fix double free when rejecting a join due to port mismatch
- validate lwtstate->data before returning from skb_tunnel_info()
- tcp: call sk_wmem_schedule before sk_mem_charge in zerocopy path
- mt76: mt7921: continue to probe driver when fw already downloaded
- bonding: fix multiple issues with offloading IPsec to (thru?) bond
- stmmac: ptp: fix issues around Qbv support and setting time back
- bcmgenet: always clear wake-up based on energy detection
Misc:
- sctp: move 198 addresses from unusable to private scope
- ptp: support virtual clocks and timestamping
- openvswitch: optimize operation for key comparison"
* tag 'net-5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (158 commits)
net: dsa: properly check for the bridge_leave methods in dsa_switch_bridge_leave()
sfc: add logs explaining XDP_TX/REDIRECT is not available
sfc: ensure correct number of XDP queues
sfc: fix lack of XDP TX queues - error XDP TX failed (-22)
net: fddi: fix UAF in fza_probe
net: dsa: sja1105: fix address learning getting disabled on the CPU port
net: ocelot: fix switchdev objects synced for wrong netdev with LAG offload
net: Use nlmsg_unicast() instead of netlink_unicast()
octeontx2-pf: Fix uninitialized boolean variable pps
ipv6: allocate enough headroom in ip6_finish_output2()
net: hdlc: rename 'mod_init' & 'mod_exit' functions to be module-specific
net: bridge: multicast: fix MRD advertisement router port marking race
net: bridge: multicast: fix PIM hello router port marking race
net: phy: marvell10g: fix differentiation of 88X3310 from 88X3340
dsa: fix for_each_child.cocci warnings
virtio_net: check virtqueue_add_sgs() return value
mptcp: properly account bulk freed memory
selftests: mptcp: fix case multiple subflows limited by server
mptcp: avoid processing packet if a subflow reset
mptcp: fix syncookie process if mptcp can not_accept new subflow
...
The bdflush system call has been deprecated for a very long time.
Recently Michael Schmitz tested[1] and found that the last known
caller of of the bdflush system call is unaffected by it's removal.
Since the code is not needed delete it.
[1] https://lkml.kernel.org/r/36123b5d-daa0-6c2b-f2d4-a942f069fd54@gmail.com
Link: https://lkml.kernel.org/r/87sg10quue.fsf_-_@disp2133
Tested-by: Michael Schmitz <schmitzmic@gmail.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Cyril Hrubis <chrubis@suse.cz>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Fix typo in a define: CEC_OP_REC_SEQ_SATERDAY -> CEC_OP_REC_SEQ_SATURDAY
This isn't used yet in actual applications to the best of my knowledge,
and it certainly doesn't break the ABI since the value doesn't change.
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmDnGVYQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpv6UEAC78zkseI8TmKaowNfkz/+MkP9eSFb1pVn3
rxpbPOsZompHoZpeWt4oHL+3Rmm3a9iRo/APA2ELas4zvp+Q+6uG7eha2Dc4hUA9
YgeO4z9YfG8wQNZc3x7bncb6ZwqEE5nnbFe/m25SyrAZVLlZ7FKHxfoZDqjhlGFC
eLNiYO6vdvwgCoBMcotyCDttrPfEu6947/5vB1zevv57twdQQaEWGUhvyx1XrlDX
0YD5fmdOjNU2isgxt4xo2Ur2zL6w254/hvj58sV3Z7JfkJpI9DCK+ztKEfzuyEhA
WYz06rDAT1+1KuVLfowaZ+pYiPPOIsL0+QXI83r3nLaE7WGGlfS8Hmz//1FbziYs
ZSZI826kEN+/lKeWTcKOOMhmkYyXEFFuQZS34eg9KI4xwML8v+ILlHmcp+tjebw9
vzNF6f7N2ki+jnyxxyNxeMHxeAMWsqnIRROOhZg6bbs6UVNpDy4qRzpQaDOaJsVe
uSAQ6PTd/etR9KE+ClhLe6X7Rmp/lfZCPe64wqM/3k1qV2KWhE1fwCQO4c5o1MBN
rpk3Ef5PZYP3aakCvZnfcjMWlpZNbq/xMc6vPc+yq32akq1t1KbODVBiR5odcH0C
Gt5N11im50SO06haBt7EOe4JMQLbK5sxG15t4C6mNQZgPegGfaLlVkKpzIkOzUha
OkRofKMcDA==
=gHse
-----END PGP SIGNATURE-----
Merge tag 'block-5.14-2021-07-08' of git://git.kernel.dk/linux-block
Pull more block updates from Jens Axboe:
"A combination of changes that ended up depending on both the driver
and core branch (and/or the IDE removal), and a few late arriving
fixes. In detail:
- Fix io ticks wrap-around issue (Chunguang)
- nvme-tcp sock locking fix (Maurizio)
- s390-dasd fixes (Kees, Christoph)
- blk_execute_rq polling support (Keith)
- blk-cgroup RCU iteration fix (Yu)
- nbd backend ID addition (Prasanna)
- Partition deletion fix (Yufen)
- Use blk_mq_alloc_disk for mmc, mtip32xx, ubd (Christoph)
- Removal of now dead block request types due to IDE removal
(Christoph)
- Loop probing and control device cleanups (Christoph)
- Device uevent fix (Christoph)
- Misc cleanups/fixes (Tetsuo, Christoph)"
* tag 'block-5.14-2021-07-08' of git://git.kernel.dk/linux-block: (34 commits)
blk-cgroup: prevent rcu_sched detected stalls warnings while iterating blkgs
block: fix the problem of io_ticks becoming smaller
nvme-tcp: can't set sk_user_data without write_lock
loop: remove unused variable in loop_set_status()
block: remove the bdgrab in blk_drop_partitions
block: grab a device refcount in disk_uevent
s390/dasd: Avoid field over-reading memcpy()
dasd: unexport dasd_set_target_state
block: check disk exist before trying to add partition
ubd: remove dead code in ubd_setup_common
nvme: use return value from blk_execute_rq()
block: return errors from blk_execute_rq()
nvme: use blk_execute_rq() for passthrough commands
block: support polling through blk_execute_rq
block: remove REQ_OP_SCSI_{IN,OUT}
block: mark blk_mq_init_queue_data static
loop: rewrite loop_exit using idr_for_each_entry
loop: split loop_lookup
loop: don't allow deleting an unspecified loop device
loop: move loop_ctl_mutex locking into loop_add
...
Doorbell remapping for ifcvf, mlx5.
virtio_vdpa support for mlx5.
Validate device input in several drivers (for SEV and friends).
ZONE_MOVABLE aware handling in virtio-mem.
Misc fixes, cleanups.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmDm5jQPHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRp6mYIAMTk5ggM5xdt6NCAASAigssEAoCTMorfoxkx
i7O562TEejgLvYKx/EZnYF+YpmYGyWEY9AgxMPxP/nPRLszuf0nZSmMp5ivu/vMz
zwpAto+7RpUmIQP+N6QjWabiWrpQI9EnXA47kOnyU703Y+RnITPNCvD1PpnDG3zs
W2GdH7DKqwsCY22hB+zboH2D6HNf3gTuUtgUBYbdBnYVxdOsSd1dx9Te0EKUTV3y
uvENmFEcushDRYpUhAsZm4bKcLOn+6rgNGXuXNa4R/hUlJTwrQjGmzu+ua6vfMwF
dcGxdaeMJUo8o0C1Pz7wJBXF5UZXQlxoyBP+0b0ZTm69AwmIHMY=
=o6A1
-----END PGP SIGNATURE-----
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio,vhost,vdpa updates from Michael Tsirkin:
- Doorbell remapping for ifcvf, mlx5
- virtio_vdpa support for mlx5
- Validate device input in several drivers (for SEV and friends)
- ZONE_MOVABLE aware handling in virtio-mem
- Misc fixes, cleanups
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (48 commits)
virtio-mem: prioritize unplug from ZONE_MOVABLE in Big Block Mode
virtio-mem: simplify high-level unplug handling in Big Block Mode
virtio-mem: prioritize unplug from ZONE_MOVABLE in Sub Block Mode
virtio-mem: simplify high-level unplug handling in Sub Block Mode
virtio-mem: simplify high-level plug handling in Sub Block Mode
virtio-mem: use page_zonenum() in virtio_mem_fake_offline()
virtio-mem: don't read big block size in Sub Block Mode
virtio/vdpa: clear the virtqueue state during probe
vp_vdpa: allow set vq state to initial state after reset
virtio-pci library: introduce vp_modern_get_driver_features()
vdpa: support packed virtqueue for set/get_vq_state()
virtio-ring: store DMA metadata in desc_extra for split virtqueue
virtio: use err label in __vring_new_virtqueue()
virtio_ring: introduce virtqueue_desc_add_split()
virtio_ring: secure handling of mapping errors
virtio-ring: factor out desc_extra allocation
virtio_ring: rename vring_desc_extra_packed
virtio-ring: maintain next in extra state for packed virtqueue
vdpa/mlx5: Clear vq ready indication upon device reset
vdpa/mlx5: Add support for doorbell bypassing
...
- Support for optimized routines based on the host CPU
- Support for PCI via virtio
- Various fixes
-----BEGIN PGP SIGNATURE-----
iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAmDnZwAWHHJpY2hhcmRA
c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wW1BD/9SHWGYhxLY+xL27eO0Q8XOPePb
diqllGavzq3fcakmJ3+6iIpb/WYX0ztu1M4KMBRP3QxNjP6nFkS1ph3PC0LL3ec2
h23hRfOrhlQd4rdonPcq/Z7oXKhrkem9G6KneVfvB94HmXnaZIrNBjwQRy0uRMXE
/IVNH4o6YMR8Av/VrG+L6BS+O/oXVnYVSLOuXsIrxmxS24NybsOpRzHvl14ZUsHt
eiwzcRC3ugAaxJn8cOSrHdBwvdOgbFFWEtMITcesQpYru+EmQcsCZdmJ0DbwsV2e
9k+LrVoy0CZFoekBtaaFvZq+JVBjUZKoAUYBML4ejWnQKolJH0BZQRh4RT0rbTjc
UMiuE3kFUsdJjzJRyO4pcqpwaNhCiZ2XrwyKeev/FLIn95bD1xbLJWfRvoKhioiI
X+1vujN2+N5n8T+u8sCVohujJCkUkMjevfF6ew8rvYOj3FrGqTi4jgrXUFAIsjLa
mHdA92oHIjNOCjyVIqnoUFTDltVMW9CwnLtd5nPnGvJoMtsj7lthy6fdtdPH0WVu
iNR4toE/AjBJo4rtib/irYbZtqmw2AbBFqoRk4yj8Fw4ZdSPYELwAR1aah0Oce9R
t1T9OE66vlr28XIC0NF917JfSNkc2eXnx4B21Zh+a/68XSJ1FzXPTob3lvXVVhQR
Ou4aw6dH7mql/2bq1w==
=wAww
-----END PGP SIGNATURE-----
Merge tag 'for-linus-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML updates from Richard Weinberger:
- Support for optimized routines based on the host CPU
- Support for PCI via virtio
- Various fixes
* tag 'for-linus-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: remove unneeded semicolon in um_arch.c
um: Remove the repeated declaration
um: fix error return code in winch_tramp()
um: fix error return code in slip_open()
um: Fix stack pointer alignment
um: implement flush_cache_vmap/flush_cache_vunmap
um: add a UML specific futex implementation
um: enable the use of optimized xor routines in UML
um: Add support for host CPU flags and alignment
um: allow not setting extra rpaths in the linux binary
um: virtio/pci: enable suspend/resume
um: add PCI over virtio emulation driver
um: irqs: allow invoking time-travel handler multiple times
um: time-travel/signals: fix ndelay() in interrupt
um: expose time-travel mode to userspace side
um: export signals_enabled directly
um: remove unused smp_sigio_handler() declaration
lib: add iomem emulation (logic_iomem)
um: allow disabling NO_IOMEM
Pull yet more updates from Andrew Morton:
"54 patches.
Subsystems affected by this patch series: lib, mm (slub, secretmem,
cleanups, init, pagemap, and mremap), and debug"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (54 commits)
powerpc/mm: enable HAVE_MOVE_PMD support
powerpc/book3s64/mm: update flush_tlb_range to flush page walk cache
mm/mremap: allow arch runtime override
mm/mremap: hold the rmap lock in write mode when moving page table entries.
mm/mremap: use pmd/pud_poplulate to update page table entries
mm/mremap: don't enable optimized PUD move if page table levels is 2
mm/mremap: convert huge PUD move to separate helper
selftest/mremap_test: avoid crash with static build
selftest/mremap_test: update the test to handle pagesize other than 4K
mm: rename p4d_page_vaddr to p4d_pgtable and make it return pud_t *
mm: rename pud_page_vaddr to pud_pgtable and make it return pmd_t *
kdump: use vmlinux_build_id to simplify
buildid: fix kernel-doc notation
buildid: mark some arguments const
scripts/decode_stacktrace.sh: indicate 'auto' can be used for base path
scripts/decode_stacktrace.sh: silence stderr messages from addr2line/nm
scripts/decode_stacktrace.sh: support debuginfod
x86/dumpstack: use %pSb/%pBb for backtrace printing
arm64: stacktrace: use %pSb for backtrace printing
module: add printk formats to add module build ID to stacktraces
...
Introduce "memfd_secret" system call with the ability to create memory
areas visible only in the context of the owning process and not mapped not
only to other processes but in the kernel page tables as well.
The secretmem feature is off by default and the user must explicitly
enable it at the boot time.
Once secretmem is enabled, the user will be able to create a file
descriptor using the memfd_secret() system call. The memory areas created
by mmap() calls from this file descriptor will be unmapped from the kernel
direct map and they will be only mapped in the page table of the processes
that have access to the file descriptor.
Secretmem is designed to provide the following protections:
* Enhanced protection (in conjunction with all the other in-kernel
attack prevention systems) against ROP attacks. Seceretmem makes
"simple" ROP insufficient to perform exfiltration, which increases the
required complexity of the attack. Along with other protections like
the kernel stack size limit and address space layout randomization which
make finding gadgets is really hard, absence of any in-kernel primitive
for accessing secret memory means the one gadget ROP attack can't work.
Since the only way to access secret memory is to reconstruct the missing
mapping entry, the attacker has to recover the physical page and insert
a PTE pointing to it in the kernel and then retrieve the contents. That
takes at least three gadgets which is a level of difficulty beyond most
standard attacks.
* Prevent cross-process secret userspace memory exposures. Once the
secret memory is allocated, the user can't accidentally pass it into the
kernel to be transmitted somewhere. The secreremem pages cannot be
accessed via the direct map and they are disallowed in GUP.
* Harden against exploited kernel flaws. In order to access secretmem,
a kernel-side attack would need to either walk the page tables and
create new ones, or spawn a new privileged uiserspace process to perform
secrets exfiltration using ptrace.
The file descriptor based memory has several advantages over the
"traditional" mm interfaces, such as mlock(), mprotect(), madvise(). File
descriptor approach allows explicit and controlled sharing of the memory
areas, it allows to seal the operations. Besides, file descriptor based
memory paves the way for VMMs to remove the secret memory range from the
userspace hipervisor process, for instance QEMU. Andy Lutomirski says:
"Getting fd-backed memory into a guest will take some possibly major
work in the kernel, but getting vma-backed memory into a guest without
mapping it in the host user address space seems much, much worse."
memfd_secret() is made a dedicated system call rather than an extension to
memfd_create() because it's purpose is to allow the user to create more
secure memory mappings rather than to simply allow file based access to
the memory. Nowadays a new system call cost is negligible while it is way
simpler for userspace to deal with a clear-cut system calls than with a
multiplexer or an overloaded syscall. Moreover, the initial
implementation of memfd_secret() is completely distinct from
memfd_create() so there is no much sense in overloading memfd_create() to
begin with. If there will be a need for code sharing between these
implementation it can be easily achieved without a need to adjust user
visible APIs.
The secret memory remains accessible in the process context using uaccess
primitives, but it is not exposed to the kernel otherwise; secret memory
areas are removed from the direct map and functions in the
follow_page()/get_user_page() family will refuse to return a page that
belongs to the secret memory area.
Once there will be a use case that will require exposing secretmem to the
kernel it will be an opt-in request in the system call flags so that user
would have to decide what data can be exposed to the kernel.
Removing of the pages from the direct map may cause its fragmentation on
architectures that use large pages to map the physical memory which
affects the system performance. However, the original Kconfig text for
CONFIG_DIRECT_GBPAGES said that gigabyte pages in the direct map "... can
improve the kernel's performance a tiny bit ..." (commit 00d1c5e057
("x86: add gbpages switches")) and the recent report [1] showed that "...
although 1G mappings are a good default choice, there is no compelling
evidence that it must be the only choice". Hence, it is sufficient to
have secretmem disabled by default with the ability of a system
administrator to enable it at boot time.
Pages in the secretmem regions are unevictable and unmovable to avoid
accidental exposure of the sensitive data via swap or during page
migration.
Since the secretmem mappings are locked in memory they cannot exceed
RLIMIT_MEMLOCK. Since these mappings are already locked independently
from mlock(), an attempt to mlock()/munlock() secretmem range would fail
and mlockall()/munlockall() will ignore secretmem mappings.
However, unlike mlock()ed memory, secretmem currently behaves more like
long-term GUP: secretmem mappings are unmovable mappings directly consumed
by user space. With default limits, there is no excessive use of
secretmem and it poses no real problem in combination with
ZONE_MOVABLE/CMA, but in the future this should be addressed to allow
balanced use of large amounts of secretmem along with ZONE_MOVABLE/CMA.
A page that was a part of the secret memory area is cleared when it is
freed to ensure the data is not exposed to the next user of that page.
The following example demonstrates creation of a secret mapping (error
handling is omitted):
fd = memfd_secret(0);
ftruncate(fd, MAP_SIZE);
ptr = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE,
MAP_SHARED, fd, 0);
[1] https://lore.kernel.org/linux-mm/213b4567-46ce-f116-9cdf-bbd0c884eb3c@linux.intel.com/
[akpm@linux-foundation.org: suppress Kconfig whine]
Link: https://lkml.kernel.org/r/20210518072034.31572-5-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Hagen Paul Pfeifer <hagen@jauu.net>
Acked-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Bottomley <jejb@linux.ibm.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Palmer Dabbelt <palmerdabbelt@google.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tycho Andersen <tycho@tycho.ws>
Cc: Will Deacon <will@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: kernel test robot <lkp@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Support passing a xdp_md via ctx_in/ctx_out in bpf_attr for
BPF_PROG_TEST_RUN.
The intended use case is to pass some XDP meta data to the test runs of
XDP programs that are used as tail calls.
For programs that use bpf_prog_test_run_xdp, support xdp_md input and
output. Unlike with an actual xdp_md during a non-test run, data_meta must
be 0 because it must point to the start of the provided user data. From
the initial xdp_md, use data and data_end to adjust the pointers in the
generated xdp_buff. All other non-zero fields are prohibited (with
EINVAL). If the user has set ctx_out/ctx_size_out, copy the (potentially
different) xdp_md back to the userspace.
We require all fields of input xdp_md except the ones we explicitly
support to be set to zero. The expectation is that in the future we might
add support for more fields and we want to fail explicitly if the user
runs the program on the kernel where we don't yet support them.
Co-developed-by: Cody Haas <chaas@riotgames.com>
Co-developed-by: Lisa Watanabe <lwatanabe@riotgames.com>
Signed-off-by: Cody Haas <chaas@riotgames.com>
Signed-off-by: Lisa Watanabe <lwatanabe@riotgames.com>
Signed-off-by: Zvi Effron <zeffron@riotgames.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210707221657.3985075-3-zeffron@riotgames.com
- Prevent sigaltstack out of bounds writes. The kernel unconditionally
writes the FPU state to the alternate stack without checking whether
the stack is large enough to accomodate it.
Check the alternate stack size before doing so and in case it's too
small force a SIGSEGV instead of silently corrupting user space data.
- MINSIGSTKZ and SIGSTKSZ are constants in signal.h and have never been
updated despite the fact that the FPU state which is stored on the
signal stack has grown over time which causes trouble in the field
when AVX512 is available on a CPU. The kernel does not expose the
minimum requirements for the alternate stack size depending on the
available and enabled CPU features.
ARM already added an aux vector AT_MINSIGSTKSZ for the same reason.
Add it to x86 as well
- A major cleanup of the x86 FPU code. The recent discoveries of XSTATE
related issues unearthed quite some inconsistencies, duplicated code
and other issues.
The fine granular overhaul addresses this, makes the code more robust
and maintainable, which allows to integrate upcoming XSTATE related
features in sane ways.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmDlcpETHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoeP5D/4i+AgYYeiMLgGb+NS7iaKPfoWo6LIz
y3qdTSA0DQaIYbYivWwRO/g0GYdDMXDWeZalFi7eGnVI8O3eOog+22Zrf/y0UINB
KJHdYd4ApWHhs401022y5hexrWQvnV8w1yQCuj/zLm6eC+AVhdwt2AY+IBoRrdUj
wqY97B/4rJNsBvvqTDn9EeDrJA2y0y0Suc7AhIp2BGMI+dpIdxys8RJDamXNWyDL
gJf0YRgUoiIn3AHKb+fgv60AoxfC175NSg/5/y/scFNXqVlW0Up4YCb7pqG9o2Ga
f3XvtWfbw1N5PmUYjFkALwEkzGUbM3v0RA3xLY2j2WlWm9fBPPy59dt+i/h/VKyA
GrA7i7lcIqX8dfVH6XkrReZBkRDSB6t9SZTvV54jAz5fcIZO2Rg++UFUvI/R6GKK
XCcxukYaArwo+IG62iqDszS3gfLGhcor/cviOeULRC5zMUIO4Jah+IhDnifmShtC
M5s9QzrwIRD/XMewGRQmvkiN4kBfE7jFoBQr1J9leCXJKrM+2JQmMzVInuubTQIq
SdlKOaAIn7xtekz+6XdFG9Gmhck0PCLMJMOLNvQkKWI3KqGLRZ+dAWKK0vsCizAx
0BA7ZeB9w9lFT+D8mQCX77JvW9+VNwyfwIOLIrJRHk3VqVpS5qvoiFTLGJJBdZx4
/TbbRZu7nXDN2w==
=Mq1m
-----END PGP SIGNATURE-----
Merge tag 'x86-fpu-2021-07-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fpu updates from Thomas Gleixner:
"Fixes and improvements for FPU handling on x86:
- Prevent sigaltstack out of bounds writes.
The kernel unconditionally writes the FPU state to the alternate
stack without checking whether the stack is large enough to
accomodate it.
Check the alternate stack size before doing so and in case it's too
small force a SIGSEGV instead of silently corrupting user space
data.
- MINSIGSTKZ and SIGSTKSZ are constants in signal.h and have never
been updated despite the fact that the FPU state which is stored on
the signal stack has grown over time which causes trouble in the
field when AVX512 is available on a CPU. The kernel does not expose
the minimum requirements for the alternate stack size depending on
the available and enabled CPU features.
ARM already added an aux vector AT_MINSIGSTKSZ for the same reason.
Add it to x86 as well.
- A major cleanup of the x86 FPU code. The recent discoveries of
XSTATE related issues unearthed quite some inconsistencies,
duplicated code and other issues.
The fine granular overhaul addresses this, makes the code more
robust and maintainable, which allows to integrate upcoming XSTATE
related features in sane ways"
* tag 'x86-fpu-2021-07-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (74 commits)
x86/fpu/xstate: Clear xstate header in copy_xstate_to_uabi_buf() again
x86/fpu/signal: Let xrstor handle the features to init
x86/fpu/signal: Handle #PF in the direct restore path
x86/fpu: Return proper error codes from user access functions
x86/fpu/signal: Split out the direct restore code
x86/fpu/signal: Sanitize copy_user_to_fpregs_zeroing()
x86/fpu/signal: Sanitize the xstate check on sigframe
x86/fpu/signal: Remove the legacy alignment check
x86/fpu/signal: Move initial checks into fpu__restore_sig()
x86/fpu: Mark init_fpstate __ro_after_init
x86/pkru: Remove xstate fiddling from write_pkru()
x86/fpu: Don't store PKRU in xstate in fpu_reset_fpstate()
x86/fpu: Remove PKRU handling from switch_fpu_finish()
x86/fpu: Mask PKRU from kernel XRSTOR[S] operations
x86/fpu: Hook up PKRU into ptrace()
x86/fpu: Add PKRU storage outside of task XSAVE buffer
x86/fpu: Dont restore PKRU in fpregs_restore_userspace()
x86/fpu: Rename xfeatures_mask_user() to xfeatures_mask_uabi()
x86/fpu: Move FXSAVE_LEAK quirk info __copy_kernel_to_fpregs()
x86/fpu: Rename __fpregs_load_activate() to fpregs_restore_userregs()
...
nf_conntrack_netlink.h does not exist, refer to nfnetlink_conntrack.h instead.
Signed-off-by: Duncan Roe <duncan_roe@optusnet.com.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCYORvYQAKCRDh3BK/laaZ
PCfvAQCbU+PW2RbwlqjZMet6w9qorh29XYe786P5pNRVbMYCygD+N45l66Sbd/Rz
7M7ioVDseyTW4dnLhb8SzSNB0zr6jQs=
=MDvD
-----END PGP SIGNATURE-----
Merge tag 'fuse-update-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse updates from Miklos Szeredi:
- Fixes for virtiofs submounts
- Misc fixes and cleanups
* tag 'fuse-update-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
virtiofs: Fix spelling mistakes
fuse: use DIV_ROUND_UP helper macro for calculations
fuse: fix illegal access to inode with reused nodeid
fuse: allow fallocate(FALLOC_FL_ZERO_RANGE)
fuse: Make fuse_fill_super_submount() static
fuse: Switch to fc_mount() for submounts
fuse: Call vfs_get_tree() for submounts
fuse: add dedicated filesystem context ops for submounts
virtiofs: propagate sync() to file server
fuse: reject internal errno
fuse: check connected before queueing on fpq->io
fuse: ignore PG_workingset after stealing
fuse: Fix infinite loop in sget_fc()
fuse: Fix crash if superblock of submount gets killed early
fuse: Fix crash in fuse_dentry_automount() error path
Here is the big set of tty and serial driver patches for 5.14-rc1.
A bit more than normal, but nothing major, lots of cleanups. Highlights
are:
- lots of tty api cleanups and mxser driver cleanups from Jiri
- build warning fixes
- various serial driver updates
- coding style cleanups
- various tty driver minor fixes and updates
- removal of broken and disable r3964 line discipline (finally!)
All of these have been in linux-next for a while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYOM4qQ8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ylKvQCfbh+OmTkDlDlDhSWlxuV05M1XTXoAoLUcLZru
s5JCnwSZztQQLMDHj7Pd
=Zupm
-----END PGP SIGNATURE-----
Merge tag 'tty-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty / serial updates from Greg KH:
"Here is the big set of tty and serial driver patches for 5.14-rc1.
A bit more than normal, but nothing major, lots of cleanups.
Highlights are:
- lots of tty api cleanups and mxser driver cleanups from Jiri
- build warning fixes
- various serial driver updates
- coding style cleanups
- various tty driver minor fixes and updates
- removal of broken and disable r3964 line discipline (finally!)
All of these have been in linux-next for a while with no reported
issues"
* tag 'tty-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (227 commits)
serial: mvebu-uart: remove unused member nb from struct mvebu_uart
arm64: dts: marvell: armada-37xx: Fix reg for standard variant of UART
dt-bindings: mvebu-uart: fix documentation
serial: mvebu-uart: correctly calculate minimal possible baudrate
serial: mvebu-uart: do not allow changing baudrate when uartclk is not available
serial: mvebu-uart: fix calculation of clock divisor
tty: make linux/tty_flip.h self-contained
serial: Prefer unsigned int to bare use of unsigned
serial: 8250: 8250_omap: Fix possible interrupt storm on K3 SoCs
serial: qcom_geni_serial: use DT aliases according to DT bindings
Revert "tty: serial: Add UART driver for Cortina-Access platform"
tty: serial: Add UART driver for Cortina-Access platform
MAINTAINERS: add me back as mxser maintainer
mxser: Documentation, fix typos
mxser: Documentation, make the docs up-to-date
mxser: Documentation, remove traces of callout device
mxser: introduce mxser_16550A_or_MUST helper
mxser: rename flags to old_speed in mxser_set_serial_info
mxser: use port variable in mxser_set_serial_info
mxser: access info->MCR under info->slock
...
Here is the big set of char / misc and other driver subsystem updates
for 5.14-rc1. Included in here are:
- habanna driver updates
- fsl-mc driver updates
- comedi driver updates
- fpga driver updates
- extcon driver updates
- interconnect driver updates
- mei driver updates
- nvmem driver updates
- phy driver updates
- pnp driver updates
- soundwire driver updates
- lots of other tiny driver updates for char and misc drivers
This is looking more and more like the "various driver subsystems mushed
together" tree...
All of these have been in linux-next for a while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYOM8jQ8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ymECgCg0yL+8WxDKO5Gg5llM5PshvLB1rQAn0y5pDgg
nw78LV3HQ0U7qaZBtI91
=x+AR
-----END PGP SIGNATURE-----
Merge tag 'char-misc-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char / misc driver updates from Greg KH:
"Here is the big set of char / misc and other driver subsystem updates
for 5.14-rc1. Included in here are:
- habanalabs driver updates
- fsl-mc driver updates
- comedi driver updates
- fpga driver updates
- extcon driver updates
- interconnect driver updates
- mei driver updates
- nvmem driver updates
- phy driver updates
- pnp driver updates
- soundwire driver updates
- lots of other tiny driver updates for char and misc drivers
This is looking more and more like the "various driver subsystems
mushed together" tree...
All of these have been in linux-next for a while with no reported
issues"
* tag 'char-misc-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (292 commits)
mcb: Use DEFINE_RES_MEM() helper macro and fix the end address
PNP: moved EXPORT_SYMBOL so that it immediately followed its function/variable
bus: mhi: pci-generic: Add missing 'pci_disable_pcie_error_reporting()' calls
bus: mhi: Wait for M2 state during system resume
bus: mhi: core: Fix power down latency
intel_th: Wait until port is in reset before programming it
intel_th: msu: Make contiguous buffers uncached
intel_th: Remove an unused exit point from intel_th_remove()
stm class: Spelling fix
nitro_enclaves: Set Bus Master for the NE PCI device
misc: ibmasm: Modify matricies to matrices
misc: vmw_vmci: return the correct errno code
siox: Simplify error handling via dev_err_probe()
fpga: machxo2-spi: Address warning about unused variable
lkdtm/heap: Add init_on_alloc tests
selftests/lkdtm: Enable various testable CONFIGs
lkdtm: Add CONFIG hints in errors where possible
lkdtm: Enable DOUBLE_FAULT on all architectures
lkdtm/heap: Add vmalloc linear overflow test
lkdtm/bugs: XFAIL UNALIGNED_LOAD_STORE_WRITE
...
- Add support for the CXL Fixed Memory Window Structure, a recent
extension of the ACPI CEDT (CXL Early Discovery Table)
- Add infrastructure for component registers
- Add HDM (Host-managed device memory) decoder definitions
- Define a device model for an HDM decoder tree
- Bridge CXL persistent memory capabilities to an NVDIMM bus /
device-model
- Switch to fine grained mapping of CXL MMIO registers to allow
different drivers / system software to own individual register blocks
- Enable media provisioning commands, and publish the label storage area
size in sysfs
- Miscellaneous cleanups and fixes
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQSbo+XnGs+rwLz9XGXfioYZHlFsZwUCYOB2lAAKCRDfioYZHlFs
ZyyaAP9O+SnYflFX+3gpoU4pK92VbIUl9KzzHdvJdW2CqtEVMgD9GO4V2Ng17WFg
/Mzn9Mj9S+YaHYvOsN6qEF1V0QvqNQ4=
=+X3m
-----END PGP SIGNATURE-----
Merge tag 'cxl-for-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull CXL (Compute Express Link) updates from Dan Williams:
"This subsystem is still in the build-out phase as the bulk of the
update is improvements to enumeration and fleshing out the device
model. In terms of new features, more mailbox commands have been added
to the allowed-list in support of persistent memory provisioning
support targeting v5.15.
The critical update from an enumeration perspective is support for the
CXL Fixed Memory Window Structure that indicates to Linux which system
physical address ranges decode to the CXL Host Bridges in the system.
This allows the driver to detect which address ranges have been mapped
by firmware and what address ranges are available for future hotplug.
So, again, mostly skeleton this round, with more meat targeting v5.15.
Summary:
- Add support for the CXL Fixed Memory Window Structure, a recent
extension of the ACPI CEDT (CXL Early Discovery Table)
- Add infrastructure for component registers
- Add HDM (Host-managed device memory) decoder definitions
- Define a device model for an HDM decoder tree
- Bridge CXL persistent memory capabilities to an NVDIMM bus /
device-model
- Switch to fine grained mapping of CXL MMIO registers to allow
different drivers / system software to own individual register
blocks
- Enable media provisioning commands, and publish the label storage
area size in sysfs
- Miscellaneous cleanups and fixes"
* tag 'cxl-for-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (34 commits)
cxl/pci: Rename CXL REGLOC ID
cxl/acpi: Use the ACPI CFMWS to create static decoder objects
cxl/acpi: Add the Host Bridge base address to CXL port objects
cxl/pmem: Register 'pmem' / cxl_nvdimm devices
libnvdimm: Drop unused device power management support
libnvdimm: Export nvdimm shutdown helper, nvdimm_delete()
cxl/pmem: Add initial infrastructure for pmem support
cxl/core: Add cxl-bus driver infrastructure
cxl/pci: Add media provisioning required commands
cxl/component_regs: Fix offset
cxl/hdm: Fix decoder count calculation
cxl/acpi: Introduce cxl_decoder objects
cxl/acpi: Enumerate host bridge root ports
cxl/acpi: Add downstream port data to cxl_port instances
cxl/Kconfig: Default drivers to CONFIG_CXL_BUS
cxl/acpi: Introduce the root of a cxl_port topology
cxl/pci: Fixup devm_cxl_iomap_block() to take a 'struct device *'
cxl/pci: Add HDM decoder capabilities
cxl/pci: Reserve individual register block regions
cxl/pci: Map registers based on capabilities
...
Merge more updates from Andrew Morton:
"190 patches.
Subsystems affected by this patch series: mm (hugetlb, userfaultfd,
vmscan, kconfig, proc, z3fold, zbud, ras, mempolicy, memblock,
migration, thp, nommu, kconfig, madvise, memory-hotplug, zswap,
zsmalloc, zram, cleanups, kfence, and hmm), procfs, sysctl, misc,
core-kernel, lib, lz4, checkpatch, init, kprobes, nilfs2, hfs,
signals, exec, kcov, selftests, compress/decompress, and ipc"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (190 commits)
ipc/util.c: use binary search for max_idx
ipc/sem.c: use READ_ONCE()/WRITE_ONCE() for use_global_lock
ipc: use kmalloc for msg_queue and shmid_kernel
ipc sem: use kvmalloc for sem_undo allocation
lib/decompressors: remove set but not used variabled 'level'
selftests/vm/pkeys: exercise x86 XSAVE init state
selftests/vm/pkeys: refill shadow register after implicit kernel write
selftests/vm/pkeys: handle negative sys_pkey_alloc() return code
selftests/vm/pkeys: fix alloc_random_pkey() to make it really, really random
kcov: add __no_sanitize_coverage to fix noinstr for all architectures
exec: remove checks in __register_bimfmt()
x86: signal: don't do sas_ss_reset() until we are certain that sigframe won't be abandoned
hfsplus: report create_date to kstat.btime
hfsplus: remove unnecessary oom message
nilfs2: remove redundant continue statement in a while-loop
kprobes: remove duplicated strong free_insn_page in x86 and s390
init: print out unknown kernel parameters
checkpatch: do not complain about positive return values starting with EPOLL
checkpatch: improve the indented label test
checkpatch: scripts/spdxcheck.py now requires python3
...
Since PTP virtual clock support is added, there can be
several PTP virtual clocks based on one PTP physical
clock for timestamping.
This patch is to extend SO_TIMESTAMPING API to support
PHC (PTP Hardware Clock) binding by adding a new flag
SOF_TIMESTAMPING_BIND_PHC. When PTP virtual clocks are
in use, user space can configure to bind one for
timestamping, but PTP physical clock is not supported
and not needed to bind.
This patch is preparation for timestamp conversion from
raw timestamp to a specific PTP virtual clock time in
core net.
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add an interface for getting PHC (PTP Hardware Clock)
virtual clocks, which are based on PHC physical clock
providing hardware timestamp to network packets.
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
MPOL_LOCAL policy has been setup as a real policy, but it is still handled
like a faked POL_PREFERRED policy with one internal MPOL_F_LOCAL flag bit
set, and there are many places having to judge the real 'prefer' or the
'local' policy, which are quite confusing.
In current code, there are 4 cases that MPOL_LOCAL are used:
1. user specifies 'local' policy
2. user specifies 'prefer' policy, but with empty nodemask
3. system 'default' policy is used
4. 'prefer' policy + valid 'preferred' node with MPOL_F_STATIC_NODES
flag set, and when it is 'rebind' to a nodemask which doesn't contains
the 'preferred' node, it will perform as 'local' policy
So make 'local' a real policy instead of a fake 'prefer' one, and kill
MPOL_F_LOCAL bit, which can greatly reduce the confusion for code reading.
For case 4, the logic of mpol_rebind_preferred() is confusing, as Michal
Hocko pointed out:
: I do believe that rebinding preferred policy is just bogus and it should
: be dropped altogether on the ground that a preference is a mere hint from
: userspace where to start the allocation. Unless I am missing something
: cpusets will be always authoritative for the final placement. The
: preferred node just acts as a starting point and it should be really
: preserved when cpusets changes. Otherwise we have a very subtle behavior
: corner cases.
So dump all the tricky transformation between 'prefer' and 'local', and
just record the new nodemask of rebinding.
[feng.tang@intel.com: fix a problem in mpol_set_nodemask(), per Michal Hocko]
Link: https://lkml.kernel.org/r/1622560492-1294-3-git-send-email-feng.tang@intel.com
[feng.tang@intel.com: refine code and comments of mpol_set_nodemask(), per Michal]
Link: https://lkml.kernel.org/r/20210603081807.GE56979@shbuild999.sh.intel.com
Link: https://lkml.kernel.org/r/1622469956-82897-3-git-send-email-feng.tang@intel.com
Signed-off-by: Feng Tang <feng.tang@intel.com>
Suggested-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Ben Widawsky <ben.widawsky@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now that the feature is fully implemented (the faulting path hooks exist
so userspace is notified, and the ioctl to resolve such faults is
available), advertise this as a supported feature.
Link: https://lkml.kernel.org/r/20210503180737.2487560-6-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Peter Xu <peterx@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Wang Qing <wangqing@vivo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Core:
- BPF:
- add syscall program type and libbpf support for generating
instructions and bindings for in-kernel BPF loaders (BPF loaders
for BPF), this is a stepping stone for signed BPF programs
- infrastructure to migrate TCP child sockets from one listener
to another in the same reuseport group/map to improve flexibility
of service hand-off/restart
- add broadcast support to XDP redirect
- allow bypass of the lockless qdisc to improving performance
(for pktgen: +23% with one thread, +44% with 2 threads)
- add a simpler version of "DO_ONCE()" which does not require
jump labels, intended for slow-path usage
- virtio/vsock: introduce SOCK_SEQPACKET support
- add getsocketopt to retrieve netns cookie
- ip: treat lowest address of a IPv4 subnet as ordinary unicast address
allowing reclaiming of precious IPv4 addresses
- ipv6: use prandom_u32() for ID generation
- ip: add support for more flexible field selection for hashing
across multi-path routes (w/ offload to mlxsw)
- icmp: add support for extended RFC 8335 PROBE (ping)
- seg6: add support for SRv6 End.DT46 behavior
- mptcp:
- DSS checksum support (RFC 8684) to detect middlebox meddling
- support Connection-time 'C' flag
- time stamping support
- sctp: packetization Layer Path MTU Discovery (RFC 8899)
- xfrm: speed up state addition with seq set
- WiFi:
- hidden AP discovery on 6 GHz and other HE 6 GHz improvements
- aggregation handling improvements for some drivers
- minstrel improvements for no-ack frames
- deferred rate control for TXQs to improve reaction times
- switch from round robin to virtual time-based airtime scheduler
- add trace points:
- tcp checksum errors
- openvswitch - action execution, upcalls
- socket errors via sk_error_report
Device APIs:
- devlink: add rate API for hierarchical control of max egress rate
of virtual devices (VFs, SFs etc.)
- don't require RCU read lock to be held around BPF hooks
in NAPI context
- page_pool: generic buffer recycling
New hardware/drivers:
- mobile:
- iosm: PCIe Driver for Intel M.2 Modem
- support for Qualcomm MSM8998 (ipa)
- WiFi: Qualcomm QCN9074 and WCN6855 PCI devices
- sparx5: Microchip SparX-5 family of Enterprise Ethernet switches
- Mellanox BlueField Gigabit Ethernet (control NIC of the DPU)
- NXP SJA1110 Automotive Ethernet 10-port switch
- Qualcomm QCA8327 switch support (qca8k)
- Mikrotik 10/25G NIC (atl1c)
Driver changes:
- ACPI support for some MDIO, MAC and PHY devices from Marvell and NXP
(our first foray into MAC/PHY description via ACPI)
- HW timestamping (PTP) support: bnxt_en, ice, sja1105, hns3, tja11xx
- Mellanox/Nvidia NIC (mlx5)
- NIC VF offload of L2 bridging
- support IRQ distribution to Sub-functions
- Marvell (prestera):
- add flower and match all
- devlink trap
- link aggregation
- Netronome (nfp): connection tracking offload
- Intel 1GE (igc): add AF_XDP support
- Marvell DPU (octeontx2): ingress ratelimit offload
- Google vNIC (gve): new ring/descriptor format support
- Qualcomm mobile (rmnet & ipa): inline checksum offload support
- MediaTek WiFi (mt76)
- mt7915 MSI support
- mt7915 Tx status reporting
- mt7915 thermal sensors support
- mt7921 decapsulation offload
- mt7921 enable runtime pm and deep sleep
- Realtek WiFi (rtw88)
- beacon filter support
- Tx antenna path diversity support
- firmware crash information via devcoredump
- Qualcomm 60GHz WiFi (wcn36xx)
- Wake-on-WLAN support with magic packets and GTK rekeying
- Micrel PHY (ksz886x/ksz8081): add cable test support
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmDb+fUACgkQMUZtbf5S
Irs2Jg//aqN0Q8CgIvYCVhPxQw1tY7pTAbgyqgBZ01vwjyvtIOgJiWzSfFEU84mX
M8fcpFX5eTKrOyJ9S6UFfQ/JG114n3hjAxFFT4Hxk2gC1Tg0vHuFQTDHcUl28bUE
mTm61e1YpdorILnv2k5JVQ/wu0vs5QKDrjcYcrcPnh+j93wvnPOgAfDBV95nZzjS
OTt4q2fR8GzLcSYWWsclMbDNkzyTG50RW/0Yd6aGjr5QGvXfrMeXfUJNz533PMf/
w5lNyjRKv+x9mdTZJzU0+msNUrZgUdRz7W8Ey8lD3hJZRE+D6/uU7FtsE8Mi3+uc
HWxeZUyzA3YF1MfVl/eesbxyPT7S/OkLzk4O5B35FbqP0YltaP+bOjq1/nM3ce1/
io9Dx9pIl/2JANUgRCAtLi8Z2dkvRoqTaBxZ/nPudCCljFwDwl6joTMJ7Ow22i5Y
5aIkcXFmZq4LbJDiHvbTlqT7yiuaEvu2UK/23bSIg/K3nF4eAmkY9Y1EgiMf60OF
78Ttw0wk2tUegwaS5MZnCniKBKDyl9gM2F6rbZ/IxQRR2LTXFc1B6gC+ynUxgXfh
Ub8O++6qGYGYZ0XvQH4pzco79p3qQWBTK5beIp2eu6BOAjBVIXq4AibUfoQLACsu
hX7jMPYd0kc3WFgUnKgQP8EnjFSwbf4XiaE7fIXvWBY8hzCw2h4=
=LvtX
-----END PGP SIGNATURE-----
Merge tag 'net-next-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core:
- BPF:
- add syscall program type and libbpf support for generating
instructions and bindings for in-kernel BPF loaders (BPF loaders
for BPF), this is a stepping stone for signed BPF programs
- infrastructure to migrate TCP child sockets from one listener to
another in the same reuseport group/map to improve flexibility
of service hand-off/restart
- add broadcast support to XDP redirect
- allow bypass of the lockless qdisc to improving performance (for
pktgen: +23% with one thread, +44% with 2 threads)
- add a simpler version of "DO_ONCE()" which does not require jump
labels, intended for slow-path usage
- virtio/vsock: introduce SOCK_SEQPACKET support
- add getsocketopt to retrieve netns cookie
- ip: treat lowest address of a IPv4 subnet as ordinary unicast
address allowing reclaiming of precious IPv4 addresses
- ipv6: use prandom_u32() for ID generation
- ip: add support for more flexible field selection for hashing
across multi-path routes (w/ offload to mlxsw)
- icmp: add support for extended RFC 8335 PROBE (ping)
- seg6: add support for SRv6 End.DT46 behavior
- mptcp:
- DSS checksum support (RFC 8684) to detect middlebox meddling
- support Connection-time 'C' flag
- time stamping support
- sctp: packetization Layer Path MTU Discovery (RFC 8899)
- xfrm: speed up state addition with seq set
- WiFi:
- hidden AP discovery on 6 GHz and other HE 6 GHz improvements
- aggregation handling improvements for some drivers
- minstrel improvements for no-ack frames
- deferred rate control for TXQs to improve reaction times
- switch from round robin to virtual time-based airtime scheduler
- add trace points:
- tcp checksum errors
- openvswitch - action execution, upcalls
- socket errors via sk_error_report
Device APIs:
- devlink: add rate API for hierarchical control of max egress rate
of virtual devices (VFs, SFs etc.)
- don't require RCU read lock to be held around BPF hooks in NAPI
context
- page_pool: generic buffer recycling
New hardware/drivers:
- mobile:
- iosm: PCIe Driver for Intel M.2 Modem
- support for Qualcomm MSM8998 (ipa)
- WiFi: Qualcomm QCN9074 and WCN6855 PCI devices
- sparx5: Microchip SparX-5 family of Enterprise Ethernet switches
- Mellanox BlueField Gigabit Ethernet (control NIC of the DPU)
- NXP SJA1110 Automotive Ethernet 10-port switch
- Qualcomm QCA8327 switch support (qca8k)
- Mikrotik 10/25G NIC (atl1c)
Driver changes:
- ACPI support for some MDIO, MAC and PHY devices from Marvell and
NXP (our first foray into MAC/PHY description via ACPI)
- HW timestamping (PTP) support: bnxt_en, ice, sja1105, hns3, tja11xx
- Mellanox/Nvidia NIC (mlx5)
- NIC VF offload of L2 bridging
- support IRQ distribution to Sub-functions
- Marvell (prestera):
- add flower and match all
- devlink trap
- link aggregation
- Netronome (nfp): connection tracking offload
- Intel 1GE (igc): add AF_XDP support
- Marvell DPU (octeontx2): ingress ratelimit offload
- Google vNIC (gve): new ring/descriptor format support
- Qualcomm mobile (rmnet & ipa): inline checksum offload support
- MediaTek WiFi (mt76)
- mt7915 MSI support
- mt7915 Tx status reporting
- mt7915 thermal sensors support
- mt7921 decapsulation offload
- mt7921 enable runtime pm and deep sleep
- Realtek WiFi (rtw88)
- beacon filter support
- Tx antenna path diversity support
- firmware crash information via devcoredump
- Qualcomm WiFi (wcn36xx)
- Wake-on-WLAN support with magic packets and GTK rekeying
- Micrel PHY (ksz886x/ksz8081): add cable test support"
* tag 'net-next-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2168 commits)
tcp: change ICSK_CA_PRIV_SIZE definition
tcp_yeah: check struct yeah size at compile time
gve: DQO: Fix off by one in gve_rx_dqo()
stmmac: intel: set PCI_D3hot in suspend
stmmac: intel: Enable PHY WOL option in EHL
net: stmmac: option to enable PHY WOL with PMT enabled
net: say "local" instead of "static" addresses in ndo_dflt_fdb_{add,del}
net: use netdev_info in ndo_dflt_fdb_{add,del}
ptp: Set lookup cookie when creating a PTP PPS source.
net: sock: add trace for socket errors
net: sock: introduce sk_error_report
net: dsa: replay the local bridge FDB entries pointing to the bridge dev too
net: dsa: ensure during dsa_fdb_offload_notify that dev_hold and dev_put are on the same dev
net: dsa: include fdb entries pointing to bridge in the host fdb list
net: dsa: include bridge addresses which are local in the host fdb list
net: dsa: sync static FDB entries on foreign interfaces to hardware
net: dsa: install the host MDB and FDB entries in the master's RX filter
net: dsa: reference count the FDB addresses at the cross-chip notifier level
net: dsa: introduce a separate cross-chip notifier type for host FDBs
net: dsa: reference count the MDB entries at the cross-chip notifier level
...
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmDbjZ4UHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXMMMRAAgwwYgJ8pqac21co0rLCEMLdnSKvu
ueaqCa0F/c+UJ9VmHZ0eDr4FHFgCxKamUP1+/Zn4U53tc483mX/Kz6blGd91hz1K
YnYJGK/tlOmhOJtaZOsp0saRrYn/62kO5tyzp+mhNKje5ON4ICA5jR3Wz+B+RDSM
GXH8EDuxKVkAuqpTZHm79a8LL2G9uDre2uWWHehDxLNAOuLxK/3DnObHRdEi4ZcV
9MSFljLnSocFdLOPRL2R05LQ2ce18NHkTW+FMzPbTezLP4lm62lEsAYOQ1vvUm0A
MoncR4bFGMPYbM5WQmwiDDBBVoo6+HB5Q0AK6fbZwaIJYNFCCc0ytKKyfSyyIaRK
8qSaGeXpUU4nHeygeHvhz7uOVFJfG+WXWRM2WIOwUWU0KKbvCSJ3UDM3cuJHIQol
jv2yuYIZQcz/tXTsOn59ivjsxXclsb3d1Z6FmligaoAoASCerCgeGkhDC4CwHYg5
qZHxE/rtiZxzgZ15cghxITOcGzOVEeYqbo9FzD5aXh66MGzOkrtYwJ+CCVTSx27O
uco6PKelVI2EJ9v4KeQtIMOFDU/ZyJrde/q2otfqQoSaR12Jarpi0qL/5ossuR2B
bEqNN6sX22++QfvLZOAJ8EmYjzkmDfs4q4LYQxMpQfw27OTcK17rXcnI//mmRbUr
SrCuVW3mF8FzxJc=
=1bAZ
-----END PGP SIGNATURE-----
Merge tag 'audit-pr-20210629' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit
Pull audit updates from Paul Moore:
"Another merge window, another small audit pull request.
Four patches in total: one is cosmetic, one removes an unnecessary
initialization, one renames some enum values to prevent name
collisions, and one converts list_del()/list_add() to list_move().
None of these are earth shattering and all pass the audit-testsuite
tests while merging cleanly on top of your tree from earlier today"
* tag 'audit-pr-20210629' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: remove unnecessary 'ret' initialization
audit: remove trailing spaces and tabs
audit: Use list_move instead of list_del/list_add
audit: Rename enum audit_state constants to avoid AUDIT_DISABLED redefinition
audit: add blank line after variable declarations
Problem:
On reconfigure of device, there is no way to defend if the backend
storage is matching with the initial backend storage.
Say, if an initial connect request for backend "pool1/image1" got
mapped to /dev/nbd0 and the userspace process is terminated. A next
reconfigure request within NBD_ATTR_DEAD_CONN_TIMEOUT is allowed to
use /dev/nbd0 for a different backend "pool1/image2"
For example, an operation like below could be dangerous:
$ sudo rbd-nbd map --try-netlink rbd-pool/ext4-image
/dev/nbd0
$ sudo blkid /dev/nbd0
/dev/nbd0: UUID="bfc444b4-64b1-418f-8b36-6e0d170cfc04" TYPE="ext4"
$ sudo pkill -9 rbd-nbd
$ sudo rbd-nbd attach --try-netlink --device /dev/nbd0 rbd-pool/xfs-image
/dev/nbd0
$ sudo blkid /dev/nbd0
/dev/nbd0: UUID="d29bf343-6570-4069-a9ea-2fa156ced908" TYPE="xfs"
Solution:
Provide a way for userspace processes to keep some metadata to identify
between the device and the backend, so that when a reconfigure request is
made, we can compare and avoid such dangerous operations.
With this solution, as part of the initial connect request, backend
path can be stored in the sysfs per device config, so that on a reconfigure
request it's easy to check if the backend path matches with the initial
connect backend path.
Please note, ioctl interface to nbd will not have these changes, as there
won't be any reconfigure.
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20210429102828.31248-1-prasanna.kalever@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Highlights:
- New think-lmi driver adding support for changing BIOS settings from
within Linux using the standard firmware-attributes class sysfs API
- MS Surface aggregator-cdev now also supports forwarding events to
user-space (for debugging / new driver development purposes only)
- New intel_skl_int3472 driver this provides the necessary glue to
translate ACPI table information to GPIOs, regulators, etc. for
camera sensors on Intel devices with IPU3 attached MIPI cameras
- A whole bunch of other fixes + device-specific quirk additions
- New devm_work_autocancel() devm-helpers.h function
Note this also contains merges of the following immutable branches/tags
shared with other subsystems:
- platform-drivers-x86-goodix-v5.14-1
- intel-gpio-v5.14-1
- linux-pm/acpi-scan
- devm-helpers-v5.14-1
The following is an automated git shortlog grouped by driver:
ACPI:
- scan: initialize local variable to avoid garbage being returned
- scan: Add function to fetch dependent of ACPI device
- scan: Extend acpi_walk_dep_device_list()
- scan: Rearrange dep_unmet initialization
Add intel_skl_int3472 driver:
- Add intel_skl_int3472 driver
ISST:
- Use numa node id for cpu pci dev mapping
- Optimize CPU to PCI device mapping
Input:
- goodix - platform/x86: touchscreen_dmi - Move upside down quirks to touchscreen_dmi.c
MAINTAINERS:
- Update IRC link for Surface System Aggregator subsystem
- Update info for telemetry
Merge remote-tracking branch 'linux-pm/acpi-scan' into review-hans:
- Merge remote-tracking branch 'linux-pm/acpi-scan' into review-hans
Merge tag 'devm-helpers-v5.14-1' into review-hans:
- Merge tag 'devm-helpers-v5.14-1' into review-hans
Merge tag 'intel-gpio-v5.14-1' into review-hans:
- Merge tag 'intel-gpio-v5.14-1' into review-hans
Merge tag 'platform-drivers-x86-goodix-v5.14-1' into review-hans:
- Merge tag 'platform-drivers-x86-goodix-v5.14-1' into review-hans
Remove "default n" entries:
- Remove "default n" entries
Rename hp-wireless to wireless-hotkey:
- Rename hp-wireless to wireless-hotkey
asus-nb-wmi:
- Revert "add support for ASUS ROG Zephyrus G14 and G15"
- Revert "Drop duplicate DMI quirk structures"
dcdbas:
- drop unneeded assignment in host_control_smi()
dell-privacy:
- Add support for Dell hardware privacy
dell-wmi:
- Rename dell-wmi.c to dell-wmi-base.c
dell-wmi-sysman:
- Change user experience when Admin/System Password is modified
- fw_attr_inuse can be static
- Use firmware_attributes_class helper
- Make populate_foo_data functions more robust
dell-wmi-sysman/think-lmi:
- Make fw_attr_class global static
devm-helpers:
- Add resource managed version of work init
docs:
- driver-api: Update Surface Aggregator user-space interface documentation
extcon:
- extcon-max8997: Simplify driver using devm
- extcon-max8997: Fix IRQ freeing at error path
- extcon-max77693.c: Fix potential work-queue cancellation race
- extcon-max14577: Fix potential work-queue cancellation race
firmware_attributes_class:
- Create helper file for handling firmware-attributes class registration events
gpio:
- wcove: Split error handling for CTRL and IRQ registers
- wcove: Unify style of to_reg() with to_ireg()
- wcove: Use IRQ hardware number getter instead of direct access
- crystalcove: remove platform_set_drvdata() + cleanup probe
gpiolib:
- acpi: Add acpi_gpio_get_io_resource()
- acpi: Introduce acpi_get_and_request_gpiod() helper
hdaps:
- Constify static attribute_group struct
ideapad-laptop:
- Ignore VPC event bit 10
intel_cht_int33fe:
- Move to its own subfolder
- Correct "displayport" fwnode reference
intel_ips:
- fix set but unused warning in read_mgtv
intel_pmt_crashlog:
- Constify static attribute_group struct
intel_skl_int3472:
- Uninitialized variable in skl_int3472_handle_gpio_resources()
- Move to intel/ subfolder
- Provide skl_int3472_unregister_clock()
- Provide skl_int3472_unregister_regulator()
- Use ACPI GPIO resource directly
- Fix dependencies (drop CLKDEV_LOOKUP)
- Free ACPI device resources after use
mfd:
- tps68470: Remove tps68470 MFD driver
platform/mellanox:
- mlxreg-hotplug: Revert "move to use request_irq by IRQF_NO_AUTOEN flag"
platform/surface:
- aggregator: Use list_move_tail instead of list_del/list_add_tail in ssh_packet_layer.c
- aggregator: Use list_move_tail instead of list_del/list_add_tail in ssh_request_layer.c
- aggregator: Drop unnecessary variable initialization
- aggregator: Do not return uninitialized value
- aggregator_cdev: Add lockdep support
- aggregator_cdev: Allow enabling of events from user-space
- aggregator_cdev: Add support for forwarding events to user-space
- aggregator: Update copyright
- aggregator: Allow enabling of events without notifiers
- aggregator: Allow registering notifiers without enabling events
- dtx: Add missing mutex_destroy() call in failure path
- aggregator: Fix event disable function
- aggregator_registry: Consolidate node groups for 5th- and 6th-gen devices
- aggregator_registry: Add support for 13" Intel Surface Laptop 4
- aggregator_registry: Update comments for 15" AMD Surface Laptop 4
samsung-laptop:
- set debugfs blobs to read only
- use octal numbers for rwx file permissions
tc1100-wmi:
- Constify static attribute_group struct
think-lmi:
- Move kfree(setting->possible_values) to tlmi_attr_setting_release()
- Split current_value to reflect only the value
- Fix issues with duplicate attributes
- Return EINVAL when kbdlang gets set to a 0 length string
- Add missing MODULE_DEVICE_TABLE
- Avoid potential read before start of the buffer
- Fix check for admin password being set
- Add WMI interface support on Lenovo platforms
thinkpad-lmi:
- Remove unused display_name member from struct tlmi_pwd_setting
thinkpad_acpi:
- Add X1 Carbon Gen 9 second fan support
- Fix inconsistent indenting
tools/power/x86/intel-speed-select:
- v1.10 release
- Fix uncore memory frequency display
toshiba_acpi:
- Fix missing error code in toshiba_acpi_setup_keyboard()
toshiba_haps:
- Fix missing newline in pr_debug call in toshiba_haps_notify
touchscreen_dmi:
- Fix Chuwi Hi10 Pro comment
- Add info for the Goodix GT912 panel of TM800A550L tablets
- Add an extra entry for the upside down Goodix touchscreen on Teclast X89 tablets
x86/platform/uv:
- Constify static attribute_group struct
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEEuvA7XScYQRpenhd+kuxHeUQDJ9wFAmDbELwUHGhkZWdvZWRl
QHJlZGhhdC5jb20ACgkQkuxHeUQDJ9yp2wgAj1mTOJi/4Rx1g8wXLpP/hflEkFMU
yyMeKe3LOEzuo/LZUfW4tqWiXa4aTgN6rUOF8KUumsIor/72hKcczuPVY+qCqF7V
qYZ0vMG93DfAyVPQvBrNjHMXiVevD/gMFRqJEOOgXt96B6Zea4vh1pBvLACAHFZ0
bjkZDX3cO89TSfUF7uhiU9UkMvMMAVs34Knc1Pe4QnZ16e2kPGcKip3qb73yT+xC
8NVRgE6fdSIJfDAVzqpdh91rfDdzHDJ6vT10uijOTkriJciN07UKtYuK5StCpAo5
sXIQllHySHRHj5N0IWZ04w6RMQ+l/9CaHDttkYWW3fV1EU9SVzvp/+d6zA==
=tAuE
-----END PGP SIGNATURE-----
Merge tag 'platform-drivers-x86-v5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver updates from Hans de Goede:
"Highlights:
- New think-lmi driver adding support for changing Lenovo Thinkpad
BIOS settings from within Linux using the standard firmware-
attributes class sysfs API
- MS Surface aggregator-cdev now also supports forwarding events to
user-space (for debugging / new driver development purposes only)
- New intel_skl_int3472 driver this provides the necessary glue to
translate ACPI table information to GPIOs, regulators, etc. for
camera sensors on Intel devices with IPU3 attached MIPI cameras
- A whole bunch of other fixes + device-specific quirk additions
- New devm_work_autocancel() devm-helpers.h function"
* tag 'platform-drivers-x86-v5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (83 commits)
platform/x86: dell-wmi-sysman: Change user experience when Admin/System Password is modified
platform/x86: intel_skl_int3472: Uninitialized variable in skl_int3472_handle_gpio_resources()
platform/x86: think-lmi: Move kfree(setting->possible_values) to tlmi_attr_setting_release()
platform/x86: think-lmi: Split current_value to reflect only the value
platform/x86: think-lmi: Fix issues with duplicate attributes
platform/x86: think-lmi: Return EINVAL when kbdlang gets set to a 0 length string
platform/x86: intel_cht_int33fe: Move to its own subfolder
platform/x86: intel_skl_int3472: Move to intel/ subfolder
platform/x86: intel_skl_int3472: Provide skl_int3472_unregister_clock()
platform/x86: intel_skl_int3472: Provide skl_int3472_unregister_regulator()
platform/x86: intel_skl_int3472: Use ACPI GPIO resource directly
platform/x86: intel_skl_int3472: Fix dependencies (drop CLKDEV_LOOKUP)
platform/x86: intel_skl_int3472: Free ACPI device resources after use
platform/x86: Remove "default n" entries
platform/x86: ISST: Use numa node id for cpu pci dev mapping
platform/x86: ISST: Optimize CPU to PCI device mapping
tools/power/x86/intel-speed-select: v1.10 release
tools/power/x86/intel-speed-select: Fix uncore memory frequency display
extcon: extcon-max8997: Simplify driver using devm
extcon: extcon-max8997: Fix IRQ freeing at error path
...
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCYNnKNAAKCRCRxhvAZXjc
ok/HAQDz3FK3/yeqxH6OLyCedUcD+YBFPPzrfqX+3y6q3z5tGgD9GAGxFXWcMFA2
/cbfmizwh1eJ3WMnbHUp7x6ogpQhWwQ=
=PpNs
-----END PGP SIGNATURE-----
Merge tag 'fs.mount_setattr.nosymfollow.v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull mount_setattr updates from Christian Brauner:
"A few releases ago the old mount API gained support for a mount
options which prevents following symlinks on a given mount. This adds
support for it in the new mount api through the MOUNT_ATTR_NOSYMFOLLOW
flag via mount_setattr() and fsmount(). With mount_setattr() that flag
can even be applied recursively.
There's an additional ack from Ross Zwisler who originally authored
the nosymfollow patch. As I've already had the patches in my for-next
I didn't add his ack explicitly"
* tag 'fs.mount_setattr.nosymfollow.v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
tests: test MOUNT_ATTR_NOSYMFOLLOW with mount_setattr()
mount: Support "nosymfollow" in new mount api
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmDZ4TwACgkQxWXV+ddt
WDvWCQ/8Dgnk+FBC25JOkqgu29VZtvhfWkY1poDRuG+tca6VeMMnDbPgnTQFyeS1
38F4uNNi/F5UdFuLz3RK0jYgGFKXTp+sFjavFuXeJQpFxe7VSu7JrilZPaA1Dti8
E8Dp42ilrHDikDbZaT8JB9GSnR7a8tHnIs0RfZSIkHsd+rPs7QPtM0TTzEZyLHqH
2uYoVyd5EvclvM5JLVGxRZ3lTU64zfZlJg+TnoAkBpilqUHqpD+x5cEoNYbdhbAb
j3sF11h/zEa/wmU5w5LRd4Qvl3JygCrnAo+6VAxB/u0yzJnH+UwOEJdDDeUpB/9k
2F/Zy69CUQ7DdXM+Es4TOfAyQ9fpPLt8Z96GIBrdD5BxWbam4pyU5xH4cDPNpsHo
zRCepdU1zwD6z3cfEYKmUAx89ewC8SE8XlUOWiGun4pBKdi3tgwcrytTnu+02JND
mEkP4vTWG2bU+S0Si0u/aAKHcFvOwiY9iHM9tmblVvvlSFYrhFAclsytihPwu9NQ
d9FRQMo9JZbQZXqaWpcmd8eXACz9+5AulIhofpuZLciyhvWpL+CQ+xGNnzJ1DnTH
ct0m+ByFb33bTpAnblkgCMQa9xuwlM57NxvIclRaDPXWipqyZReih9fbF1TkHbXQ
0dkrKe8cHn9w+DI1Hs1Hu1zdD7WJJxNMY2x9MowMU9gDVNBbbVs=
=htVu
-----END PGP SIGNATURE-----
Merge tag 'for-5.14-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
"A normal mix of improvements, core changes and features that user have
been missing or complaining about.
User visible changes:
- new sysfs exports:
- add sysfs knob to limit scrub IO bandwidth per device
- device stats are also available in
/sys/fs/btrfs/FSID/devinfo/DEVID/error_stats
- support cancellable resize and device delete ioctls
- change how the empty value is interpreted when setting a property,
so far we have only 'btrfs.compression' and we need to distinguish
a reset to defaults and setting "do not compress", in general the
empty value will always mean 'reset to defaults' for any other
property, for compression it's either 'no' or 'none' to forbid
compression
Performance improvements:
- no need for full sync when truncation does not touch extents,
reported run time change is -12%
- avoid unnecessary logging of xattrs during fast fsyncs (+17%
throughput, -17% runtime on xattr stress workload)
Core:
- preemptive flushing improvements and fixes
- adjust clamping logic on multi-threaded workloads to avoid
flushing too soon
- take into account global block reserve, may help on almost full
filesystems
- continue flushing when there are enough pending delalloc and
ordered bytes
- simplify logic around conditional transaction commit, a workaround
used in the past for throttling that's been superseded by ticket
reservations that manage the throttling in a better way
- subpage blocksize preparation:
- submit read time repair only for each corrupted sector
- scrub repair now works with sectors and not pages
- free space cache (v1) works with sectors and not pages
- more fine grained bio tracking for extents
- subpage support in page callbacks, extent callbacks, end io
callbacks
- simplify transaction abort logic and always abort and don't check
various potentially unreliable stats tracked by the transaction
- exclusive operations can do more checks when started and allow eg.
cancellation of the same running operation
- ensure relocation never runs while we have send operations running,
e.g. when zoned background auto reclaim starts
Fixes:
- zoned: more sanity checks of write pointer
- improve error handling in delayed inodes
- send:
- fix invalid path for unlink operations after parent
orphanization
- fix crash when memory allocations trigger reclaim
- skip compression of we have only one page (can't make things
better)
- empty value of a property newly means reset to default
Other:
- lots of cleanups, comment updates, yearly typo fixing
- disable build on platforms having page size 256K"
* tag 'for-5.14-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (101 commits)
btrfs: remove unused btrfs_fs_info::total_pinned
btrfs: rip out btrfs_space_info::total_bytes_pinned
btrfs: rip the first_ticket_bytes logic from fail_all_tickets
btrfs: remove FLUSH_DELAYED_REFS from data ENOSPC flushing
btrfs: rip out may_commit_transaction
btrfs: send: fix crash when memory allocations trigger reclaim
btrfs: ensure relocation never runs while we have send operations running
btrfs: shorten integrity checker extent data mount option
btrfs: switch mount option bits to enums and use wider type
btrfs: props: change how empty value is interpreted
btrfs: compression: don't try to compress if we don't have enough pages
btrfs: fix unbalanced unlock in qgroup_account_snapshot()
btrfs: sysfs: export dev stats in devinfo directory
btrfs: fix typos in comments
btrfs: remove a stale comment for btrfs_decompress_bio()
btrfs: send: use list_move_tail instead of list_del/list_add_tail
btrfs: disable build on platforms having page size 256K
btrfs: send: fix invalid path for unlink operations after parent orphanization
btrfs: inline wait_current_trans_commit_start in its caller
btrfs: sink wait_for_unblock parameter to async commit
...
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE+QmuaPwR3wnBdVwACF8+vY7k4RUFAmDZzHUACgkQCF8+vY7k
4RWjmQ//QSMEAlz/Xv7YmnkheMEbEvMGgRzUK6UIzfbI9sb2ZPx/1dYySVExrPba
Z1WtrH9oEZGX75IwRSBb6Kb0j2k3E1Y9UZzaAAofE4yyrw5sH8oNvhxhAdak+YRC
XaVJq3xulIq/ClsEyaDzZPFfIhZ5Uo/Cz9s4bZMiA8IwZOImnttJmbrw+Og9ly0+
TrvA0MMkO790h+OOnu5Lv1Q2qJZXaVoTIZ+/icDW29WbQdTQnEZ6XOz/Y+4BZFwW
dA49MLdz7BlypV7a3ijIM4ENPrPmmRIf5agKkQ13Z84gNH14Vb8zShDFq2vHwdKC
7qqvZLuCug/GyK7hyPQHOM8b7wN2utNMZCJIcWVob4oZDzHSvLO+wFI4RCI/RBdY
3tOxrH5cd1FJXB5Vi4KtWNk2Ne63UyaJUYSDk9j30LSwNC/EA6+7fc9LeRY2ykNg
rSD8aXMBoHWtxPxBH5O1ljVdbvG4f8Ds6Yb1cjrxvQWIXvzE7TIC2zhRl0o/r6My
1BWOkVVHw0i5/U+PTyQnVK2XYprq0Jp0b9a0ErkJoOtRpd4VW0pFFcDzWN2tlR0h
McIinHAXdeZO6qHbNMyArYcwfTgWP51IWKm5qDQxPtEnrZDQIjbG3GfZZz9wtks/
Moz7vUTbSUaxSqY/Eg+meg1lilU/lyee9O6hhKCDsVQDxaZUKU8=
=VQX8
-----END PGP SIGNATURE-----
Merge tag 'media/v5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media updates from Mauro Carvalho Chehab:
- V4L2 core control API was split into separate files
- New RC maps: tango and tc-90405
- Hantro driver got support for G2/HEVC decoder
- av7710 is moving to staging, together with some legacy APIs
- several cleanups related to compat_ioctl32 code
- Move the MPEG-2 stateless control type out of staging
- Address several issues with RPM get logic on media drivers
- Lots of cleanups, bug fixes and improvements.
* tag 'media/v5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (394 commits)
media: s5p-mfc: Fix display delay control creation
media: mtk-vpu: on suspend, read/write regs only if vpu is running
media: video-mux: Skip dangling endpoints
media: Fix Media Controller API config checks
media: i2c: rdacm20: Re-work ov10635 reset
media: i2c: rdacm20: Check return values
media: i2c: rdacm20: Report camera module name
media: i2c: rdacm20: Enable noise immunity
media: i2c: rdacm20: Embed 'serializer' field
media: i2c: rdacm21: Power up OV10640 before OV490
media: i2c: rdacm21: Fix OV10640 powerup
media: i2c: rdacm21: Add delay after OV490 reset
media: i2c: max9271: Introduce wake_up() function
media: i2c: max9271: Check max9271_write() return
media: i2c: max9286: Rework comments in .bound()
media: i2c: max9286: Define high channel amplitude
media: i2c: max9286: Cache channel amplitude
media: i2c: max9286: Rename reverse_channel_mv
media: i2c: max9286: Adjust parameters indent
media: hantro: add support for Rockchip RK3036
...
- Add MTE support in guests, complete with tag save/restore interface
- Reduce the impact of CMOs by moving them in the page-table code
- Allow device block mappings at stage-2
- Reduce the footprint of the vmemmap in protected mode
- Support the vGIC on dumb systems such as the Apple M1
- Add selftest infrastructure to support multiple configuration
and apply that to PMU/non-PMU setups
- Add selftests for the debug architecture
- The usual crop of PMU fixes
PPC:
- Support for the H_RPT_INVALIDATE hypercall
- Conversion of Book3S entry/exit to C
- Bug fixes
S390:
- new HW facilities for guests
- make inline assembly more robust with KASAN and co
x86:
- Allow userspace to handle emulation errors (unknown instructions)
- Lazy allocation of the rmap (host physical -> guest physical address)
- Support for virtualizing TSC scaling on VMX machines
- Optimizations to avoid shattering huge pages at the beginning of live migration
- Support for initializing the PDPTRs without loading them from memory
- Many TLB flushing cleanups
- Refuse to load if two-stage paging is available but NX is not (this has
been a requirement in practice for over a year)
- A large series that separates the MMU mode (WP/SMAP/SMEP etc.) from
CR0/CR4/EFER, using the MMU mode everywhere once it is computed
from the CPU registers
- Use PM notifier to notify the guest about host suspend or hibernate
- Support for passing arguments to Hyper-V hypercalls using XMM registers
- Support for Hyper-V TLB flush hypercalls and enlightened MSR bitmap on
AMD processors
- Hide Hyper-V hypercalls that are not included in the guest CPUID
- Fixes for live migration of virtual machines that use the Hyper-V
"enlightened VMCS" optimization of nested virtualization
- Bugfixes (not many)
Generic:
- Support for retrieving statistics without debugfs
- Cleanups for the KVM selftests API
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmDV9UYUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroOIRgf/XX8fKLh24RnTOs2ldIu2AfRGVrT4
QMrr8MxhmtukBAszk2xKvBt8/6gkUjdaIC3xqEnVjxaDaUvZaEtP7CQlF5JV45rn
iv1zyxUKucXrnIOr+gCioIT7qBlh207zV35ArKioP9Y83cWx9uAs22pfr6g+7RxO
h8bJZlJbSG6IGr3voANCIb9UyjU1V/l8iEHqRwhmr/A5rARPfD7g8lfMEQeGkzX6
+/UydX2fumB3tl8e2iMQj6vLVdSOsCkehvpHK+Z33EpkKhan7GwZ2sZ05WmXV/nY
QLAYfD10KegoNWl5Ay4GTp4hEAIYVrRJCLC+wnLdc0U8udbfCuTC31LK4w==
=NcRh
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"This covers all architectures (except MIPS) so I don't expect any
other feature pull requests this merge window.
ARM:
- Add MTE support in guests, complete with tag save/restore interface
- Reduce the impact of CMOs by moving them in the page-table code
- Allow device block mappings at stage-2
- Reduce the footprint of the vmemmap in protected mode
- Support the vGIC on dumb systems such as the Apple M1
- Add selftest infrastructure to support multiple configuration and
apply that to PMU/non-PMU setups
- Add selftests for the debug architecture
- The usual crop of PMU fixes
PPC:
- Support for the H_RPT_INVALIDATE hypercall
- Conversion of Book3S entry/exit to C
- Bug fixes
S390:
- new HW facilities for guests
- make inline assembly more robust with KASAN and co
x86:
- Allow userspace to handle emulation errors (unknown instructions)
- Lazy allocation of the rmap (host physical -> guest physical
address)
- Support for virtualizing TSC scaling on VMX machines
- Optimizations to avoid shattering huge pages at the beginning of
live migration
- Support for initializing the PDPTRs without loading them from
memory
- Many TLB flushing cleanups
- Refuse to load if two-stage paging is available but NX is not (this
has been a requirement in practice for over a year)
- A large series that separates the MMU mode (WP/SMAP/SMEP etc.) from
CR0/CR4/EFER, using the MMU mode everywhere once it is computed
from the CPU registers
- Use PM notifier to notify the guest about host suspend or hibernate
- Support for passing arguments to Hyper-V hypercalls using XMM
registers
- Support for Hyper-V TLB flush hypercalls and enlightened MSR bitmap
on AMD processors
- Hide Hyper-V hypercalls that are not included in the guest CPUID
- Fixes for live migration of virtual machines that use the Hyper-V
"enlightened VMCS" optimization of nested virtualization
- Bugfixes (not many)
Generic:
- Support for retrieving statistics without debugfs
- Cleanups for the KVM selftests API"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (314 commits)
KVM: x86: rename apic_access_page_done to apic_access_memslot_enabled
kvm: x86: disable the narrow guest module parameter on unload
selftests: kvm: Allows userspace to handle emulation errors.
kvm: x86: Allow userspace to handle emulation errors
KVM: x86/mmu: Let guest use GBPAGES if supported in hardware and TDP is on
KVM: x86/mmu: Get CR4.SMEP from MMU, not vCPU, in shadow page fault
KVM: x86/mmu: Get CR0.WP from MMU, not vCPU, in shadow page fault
KVM: x86/mmu: Drop redundant rsvd bits reset for nested NPT
KVM: x86/mmu: Optimize and clean up so called "last nonleaf level" logic
KVM: x86: Enhance comments for MMU roles and nested transition trickiness
KVM: x86/mmu: WARN on any reserved SPTE value when making a valid SPTE
KVM: x86/mmu: Add helpers to do full reserved SPTE checks w/ generic MMU
KVM: x86/mmu: Use MMU's role to determine PTTYPE
KVM: x86/mmu: Collapse 32-bit PAE and 64-bit statements for helpers
KVM: x86/mmu: Add a helper to calculate root from role_regs
KVM: x86/mmu: Add helper to update paging metadata
KVM: x86/mmu: Don't update nested guest's paging bitmasks if CR0.PG=0
KVM: x86/mmu: Consolidate reset_rsvds_bits_mask() calls
KVM: x86/mmu: Use MMU role_regs to get LA57, and drop vCPU LA57 helper
KVM: x86/mmu: Get nested MMU's root level from the MMU's role
...
* aggregation handling improvements for some drivers
* hidden AP discovery on 6 GHz and other HE 6 GHz
improvements
* minstrel improvements for no-ack frames
* deferred rate control for TXQs to improve reaction
times
* virtual time-based airtime scheduler
* along with various little cleanups/fixups
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAmDWUHgACgkQB8qZga/f
l8RxXA/+KjkrFUGDHlMTIzS/i03EcB+idLYy+XBWYuO3PN4p/uCCXLzdGfiRKTxp
iCAjYp0ZP01FlN4sOIN+qqrUMoN0e8xqBEY1DFbJELBB1knV4VR0FeBPPcBMUEgE
xAjdZnIsfKO1yU2mrdQVDnkvXr4jHDALePvGwCQe+4KUbwCmVjo6nb534v17Ie3C
MDNnGq395fCvo9NW+Yvzw6s9ZLdhr28bGhSBgBrqmPBx/9vm3b7BA8qO+X6BzAny
87S2x+z40wabBujCGMvw9J5H6OJ0ZLfbT0TumNznqQbCbmwRdSUC+W/cjgRq/aQn
CA3E9T7tofIr1mk9EgcjE8ax91TB/TTX5Nh9Huki25B7rNRBM57VnuEuzmXP5qBS
xgsr60agQHp2ZAGRPBC9msPcPCbvRfdl9ckTAF8/4vI7uEXSyYViX92M0F1FEy60
Mw13B8WeRA4bH22xHpqdXsyl0x2OQHXHlLHfpzmhcUy0i2cQOQMsWbl3g6CJMTq+
/InNSaG+cqrCVse734KJLumRhOq4+MkedYJlqL7uHZJrXlLgQ9Z8ewANjJ/BjTsQ
d8I16XkKPyv8UiJEwHZ6NaHvU48LR8T/r4ym9c/rz0DTWHZyp87QMYa55iwuk4Zo
cW7G3f/N7dMk9fGRxgfIrnNwNUCMsGcBSJeJyr5qT7u8+RW0bRo=
=nFvj
-----END PGP SIGNATURE-----
Merge tag 'mac80211-next-for-net-next-2021-06-25' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes berg says:
====================
Lots of changes:
* aggregation handling improvements for some drivers
* hidden AP discovery on 6 GHz and other HE 6 GHz
improvements
* minstrel improvements for no-ack frames
* deferred rate control for TXQs to improve reaction
times
* virtual time-based airtime scheduler
* along with various little cleanups/fixups
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Alban Crequy reported a race condition userspace faces when we want to
add some fds and make the syscall return them[1] using seccomp notify.
The problem is that currently two different ioctl() calls are needed by
the process handling the syscalls (agent) for another userspace process
(target): SECCOMP_IOCTL_NOTIF_ADDFD to allocate the fd and
SECCOMP_IOCTL_NOTIF_SEND to return that value. Therefore, it is possible
for the agent to do the first ioctl to add a file descriptor but the
target is interrupted (EINTR) before the agent does the second ioctl()
call.
This patch adds a flag to the ADDFD ioctl() so it adds the fd and
returns that value atomically to the target program, as suggested by
Kees Cook[2]. This is done by simply allowing
seccomp_do_user_notification() to add the fd and return it in this case.
Therefore, in this case the target wakes up from the wait in
seccomp_do_user_notification() either to interrupt the syscall or to add
the fd and return it.
This "allocate an fd and return" functionality is useful for syscalls
that return a file descriptor only, like connect(2). Other syscalls that
return a file descriptor but not as return value (or return more than
one fd), like socketpair(), pipe(), recvmsg with SCM_RIGHTs, will not
work with this flag.
This effectively combines SECCOMP_IOCTL_NOTIF_ADDFD and
SECCOMP_IOCTL_NOTIF_SEND into an atomic opteration. The notification's
return value, nor error can be set by the user. Upon successful invocation
of the SECCOMP_IOCTL_NOTIF_ADDFD ioctl with the SECCOMP_ADDFD_FLAG_SEND
flag, the notifying process's errno will be 0, and the return value will
be the file descriptor number that was installed.
[1]: https://lore.kernel.org/lkml/CADZs7q4sw71iNHmV8EOOXhUKJMORPzF7thraxZYddTZsxta-KQ@mail.gmail.com/
[2]: https://lore.kernel.org/lkml/202012011322.26DCBC64F2@keescook/
Signed-off-by: Rodrigo Campos <rodrigo@kinvolk.io>
Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Acked-by: Tycho Andersen <tycho@tycho.pizza>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210517193908.3113-4-sargun@sargun.me
- Changes to core scheduling facilities:
- Add "Core Scheduling" via CONFIG_SCHED_CORE=y, which enables
coordinated scheduling across SMT siblings. This is a much
requested feature for cloud computing platforms, to allow
the flexible utilization of SMT siblings, without exposing
untrusted domains to information leaks & side channels, plus
to ensure more deterministic computing performance on SMT
systems used by heterogenous workloads.
There's new prctls to set core scheduling groups, which
allows more flexible management of workloads that can share
siblings.
- Fix task->state access anti-patterns that may result in missed
wakeups and rename it to ->__state in the process to catch new
abuses.
- Load-balancing changes:
- Tweak newidle_balance for fair-sched, to improve
'memcache'-like workloads.
- "Age" (decay) average idle time, to better track & improve workloads
such as 'tbench'.
- Fix & improve energy-aware (EAS) balancing logic & metrics.
- Fix & improve the uclamp metrics.
- Fix task migration (taskset) corner case on !CONFIG_CPUSET.
- Fix RT and deadline utilization tracking across policy changes
- Introduce a "burstable" CFS controller via cgroups, which allows
bursty CPU-bound workloads to borrow a bit against their future
quota to improve overall latencies & batching. Can be tweaked
via /sys/fs/cgroup/cpu/<X>/cpu.cfs_burst_us.
- Rework assymetric topology/capacity detection & handling.
- Scheduler statistics & tooling:
- Disable delayacct by default, but add a sysctl to enable
it at runtime if tooling needs it. Use static keys and
other optimizations to make it more palatable.
- Use sched_clock() in delayacct, instead of ktime_get_ns().
- Misc cleanups and fixes.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmDZcPoRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1g3yw//WfhIqy7Psa9d/MBMjQDRGbTuO4+w22Dj
vmWFU44Q4KJxQHWeIgUlrK+dzvYWvNmflUs2CUUOiDVzxFTHMIyBtL4qCBUbx4Ns
vKAcB9wsWZge2o3WzZqpProRhdoRaSKw8egUr2q7rACVBkckY7eGP/OjWxXU8BdA
b7D0LPWwuIBFfN4pFYeCDLn32Dqr9s6Chyj+ZecabdG7EE6Gu+f1diVcxy7JE/mc
4WWL0D1RqdgpGrBEuMJIxPYekdrZiuy4jtEbztz5gbTBteN1cj3BLfqn0Pc/e6rO
Vyuc5mXCAmzRVi18z6g6bsVl+IA/nrbErENB2OHOhOYtqiZxqGTd4GPWZszMyY17
5AsEO5+5pcaBsy4gyp09qURggBu9zhJnMVmOI3rIHZkmkhwzc6uUJlyhDCTiFWOz
3ZF3LjbZEyCKodMD8qMHbs3axIBpIfZqjzkvSKyFnvfXEGVytVse7NUuWtQ36u92
GnURxVeYY1TDVXvE1Y8owNKMxknKQ6YRlypP7Dtbeo/qG6hShp0xmS7qDLDi0ybZ
ZlK+bDECiVoDf3nvJo+8v5M82IJ3CBt4UYldeRJsa1YCK/FsbK8tp91fkEfnXVue
+U6LPX0AmMpXacR5HaZfb3uBIKRw/QMdP/7RFtBPhpV6jqCrEmuqHnpPQiEVtxwO
UmG7bt94Trk=
=3VDr
-----END PGP SIGNATURE-----
Merge tag 'sched-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler udpates from Ingo Molnar:
- Changes to core scheduling facilities:
- Add "Core Scheduling" via CONFIG_SCHED_CORE=y, which enables
coordinated scheduling across SMT siblings. This is a much
requested feature for cloud computing platforms, to allow the
flexible utilization of SMT siblings, without exposing untrusted
domains to information leaks & side channels, plus to ensure more
deterministic computing performance on SMT systems used by
heterogenous workloads.
There are new prctls to set core scheduling groups, which allows
more flexible management of workloads that can share siblings.
- Fix task->state access anti-patterns that may result in missed
wakeups and rename it to ->__state in the process to catch new
abuses.
- Load-balancing changes:
- Tweak newidle_balance for fair-sched, to improve 'memcache'-like
workloads.
- "Age" (decay) average idle time, to better track & improve
workloads such as 'tbench'.
- Fix & improve energy-aware (EAS) balancing logic & metrics.
- Fix & improve the uclamp metrics.
- Fix task migration (taskset) corner case on !CONFIG_CPUSET.
- Fix RT and deadline utilization tracking across policy changes
- Introduce a "burstable" CFS controller via cgroups, which allows
bursty CPU-bound workloads to borrow a bit against their future
quota to improve overall latencies & batching. Can be tweaked via
/sys/fs/cgroup/cpu/<X>/cpu.cfs_burst_us.
- Rework assymetric topology/capacity detection & handling.
- Scheduler statistics & tooling:
- Disable delayacct by default, but add a sysctl to enable it at
runtime if tooling needs it. Use static keys and other
optimizations to make it more palatable.
- Use sched_clock() in delayacct, instead of ktime_get_ns().
- Misc cleanups and fixes.
* tag 'sched-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (72 commits)
sched/doc: Update the CPU capacity asymmetry bits
sched/topology: Rework CPU capacity asymmetry detection
sched/core: Introduce SD_ASYM_CPUCAPACITY_FULL sched_domain flag
psi: Fix race between psi_trigger_create/destroy
sched/fair: Introduce the burstable CFS controller
sched/uclamp: Fix uclamp_tg_restrict()
sched/rt: Fix Deadline utilization tracking during policy change
sched/rt: Fix RT utilization tracking during policy change
sched: Change task_struct::state
sched,arch: Remove unused TASK_STATE offsets
sched,timer: Use __set_current_state()
sched: Add get_current_state()
sched,perf,kvm: Fix preemption condition
sched: Introduce task_is_running()
sched: Unbreak wakeups
sched/fair: Age the average idle time
sched/cpufreq: Consider reduced CPU capacity in energy calculation
sched/fair: Take thermal pressure into account while estimating energy
thermal/cpufreq_cooling: Update offline CPUs per-cpu thermal_pressure
sched/fair: Return early from update_tg_cfs_load() if delta == 0
...
- Core locking & atomics:
- Convert all architectures to ARCH_ATOMIC: move every
architecture to ARCH_ATOMIC, then get rid of ARCH_ATOMIC
and all the transitory facilities and #ifdefs.
Much reduction in complexity from that series:
63 files changed, 756 insertions(+), 4094 deletions(-)
- Self-test enhancements
- Futexes:
- Add the new FUTEX_LOCK_PI2 ABI, which is a variant that
doesn't set FLAGS_CLOCKRT (.e. uses CLOCK_MONOTONIC).
[ The temptation to repurpose FUTEX_LOCK_PI's implicit
setting of FLAGS_CLOCKRT & invert the flag's meaning
to avoid having to introduce a new variant was
resisted successfully. ]
- Enhance futex self-tests
- Lockdep:
- Fix dependency path printouts
- Optimize trace saving
- Broaden & fix wait-context checks
- Misc cleanups and fixes.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmDZaEYRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1hPdxAAiNCsxL6X1cZ8zqbWsvLefT9Zqhzgs5u6
gdZele7PNibvbYdON26b5RUzuKfOW/hgyX6LKqr+AiNYTT9PGhcY+tycUr2PGk5R
LMyhJWmmX5cUVPU92ky+z5hEHB2gr4XPJcvgpKKUL0XB1tBaSvy2DtgwPuhXOoT1
1sCQfy63t71snt2RfEnibVW6xovwaA2lsqL81lLHJN4iRFWvqO498/m4+PWkylsm
ig/+VT1Oz7t4wqu3NhTqNNZv+4K4W2asniyo53Dg2BnRm/NjhJtgg4jRibrb0ssb
67Xdq6y8+xNBmEAKj+Re8VpMcu4aj346Ctk7d4gst2ah/Rc0TvqfH6mezH7oq7RL
hmOrMBWtwQfKhEE/fDkng30nrVxc/98YXP0n2rCCa0ySsaF6b6T185mTcYDRDxFs
BVNS58ub+zxrF9Zd4nhIHKaEHiL2ZdDimqAicXN0RpywjIzTQ/y11uU7I1WBsKkq
WkPYs+FPHnX7aBv1MsuxHhb8sUXjG924K4JeqnjF45jC3sC1crX+N0jv4wHw+89V
h4k20s2Tw6m5XGXlgGwMJh0PCcD6X22Vd9Uyw8zb+IJfvNTGR9Rp1Ec+1gMRSll+
xsn6G6Uy9bcNU0SqKlBSfelweGKn4ZxbEPn76Jc8KWLiepuZ6vv5PBoOuaujWht9
KAeOC5XdjMk=
=tH//
-----END PGP SIGNATURE-----
Merge tag 'locking-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
- Core locking & atomics:
- Convert all architectures to ARCH_ATOMIC: move every architecture
to ARCH_ATOMIC, then get rid of ARCH_ATOMIC and all the
transitory facilities and #ifdefs.
Much reduction in complexity from that series:
63 files changed, 756 insertions(+), 4094 deletions(-)
- Self-test enhancements
- Futexes:
- Add the new FUTEX_LOCK_PI2 ABI, which is a variant that doesn't
set FLAGS_CLOCKRT (.e. uses CLOCK_MONOTONIC).
[ The temptation to repurpose FUTEX_LOCK_PI's implicit setting of
FLAGS_CLOCKRT & invert the flag's meaning to avoid having to
introduce a new variant was resisted successfully. ]
- Enhance futex self-tests
- Lockdep:
- Fix dependency path printouts
- Optimize trace saving
- Broaden & fix wait-context checks
- Misc cleanups and fixes.
* tag 'locking-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits)
locking/lockdep: Correct the description error for check_redundant()
futex: Provide FUTEX_LOCK_PI2 to support clock selection
futex: Prepare futex_lock_pi() for runtime clock selection
lockdep/selftest: Remove wait-type RCU_CALLBACK tests
lockdep/selftests: Fix selftests vs PROVE_RAW_LOCK_NESTING
lockdep: Fix wait-type for empty stack
locking/selftests: Add a selftest for check_irq_usage()
lockding/lockdep: Avoid to find wrong lock dep path in check_irq_usage()
locking/lockdep: Remove the unnecessary trace saving
locking/lockdep: Fix the dep path printing for backwards BFS
selftests: futex: Add futex compare requeue test
selftests: futex: Add futex wait test
seqlock: Remove trailing semicolon in macros
locking/lockdep: Reduce LOCKDEP dependency list
locking/lockdep,doc: Improve readability of the block matrix
locking/atomics: atomic-instrumented: simplify ifdeffery
locking/atomic: delete !ARCH_ATOMIC remnants
locking/atomic: xtensa: move to ARCH_ATOMIC
locking/atomic: sparc: move to ARCH_ATOMIC
locking/atomic: sh: move to ARCH_ATOMIC
...
This ioctl request reads from uffdio_continue structure written by
userspace which justifies _IOC_WRITE flag. It also writes back to that
structure which justifies _IOC_READ flag.
See NOTEs in include/uapi/asm-generic/ioctl.h for more information.
Fixes: f619147104 ("userfaultfd: add UFFDIO_CONTINUE ioctl")
Signed-off-by: Gleb Fotengauer-Malinovskiy <glebfm@altlinux.org>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Reviewed-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Add MTE support in guests, complete with tag save/restore interface
- Reduce the impact of CMOs by moving them in the page-table code
- Allow device block mappings at stage-2
- Reduce the footprint of the vmemmap in protected mode
- Support the vGIC on dumb systems such as the Apple M1
- Add selftest infrastructure to support multiple configuration
and apply that to PMU/non-PMU setups
- Add selftests for the debug architecture
- The usual crop of PMU fixes
-----BEGIN PGP SIGNATURE-----
iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmDV2bEPHG1hekBrZXJu
ZWwub3JnAAoJECPQ0LrRPXpDEr8P/ivwROx5NwGcHGmU5RfUCT3aFqhtVHHwD/lu
jPcgoO61kz9TelOu6QRaVuK+mVHxcq3iP4R8nPq/QCkUlEXTmK2xkyhXhGXSYpH4
6jM8+BbC3eG7iAxx6H0UM4JTl4Riwat6ZZtXpWEWs9TKqOHOQYFpMkxSttwVZ1CZ
SjbtFvXLEdzKn6PzUWnKdBNMV/mHsdAtohZit9oJOc4ttc8072XxETQ4TFQ+MSvA
j9zY9QPmWzgcZnotqRRu9sbTGO2vxtXuUtY3sjdD8+C9OgSe9qvpnNjymcmfwaMu
1fBkfh65oaO4ItJBdGOUOoEcFqwN5imPiI7CB/O+ZYkO9sBCuTUPSQwPkyiwXb9r
bUkTaQw2nZiNWsqR1x07fQ2sGYbMp5mnmgmqiV4MUWkLmFp9LZATCWYTTn24cBNS
6SjVP6/8S0r3EhLnYjH0Pn1we5PooU1EF6RlCAd3ewYoo+9fPnwjNYwIWH5i5wB7
+tnei44NACAw9cfbos+BYQQ/dY15OSFzLzIMomlabB7OpXOdDg3H6tJnPbFwWwXb
9nF8XdHqxeDVVVrDCAx1BSodSXm9xqgnQM2RDGTUnpVcAfqAr3MXX6VsyKQDzj8T
QXF9qOVCBAABv6BXAvSQ6mvMJZDUVbUPEPhf7kXzF46JsRd6A7wWoU/OnMGHQ/w7
wjvH8HVy
=fWBV
-----END PGP SIGNATURE-----
Merge tag 'kvmarm-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for v5.14.
- Add MTE support in guests, complete with tag save/restore interface
- Reduce the impact of CMOs by moving them in the page-table code
- Allow device block mappings at stage-2
- Reduce the footprint of the vmemmap in protected mode
- Support the vGIC on dumb systems such as the Apple M1
- Add selftest infrastructure to support multiple configuration
and apply that to PMU/non-PMU setups
- Add selftests for the debug architecture
- The usual crop of PMU fixes
Add a fallback mechanism to the in-kernel instruction emulator that
allows userspace the opportunity to process an instruction the emulator
was unable to. When the in-kernel instruction emulator fails to process
an instruction it will either inject a #UD into the guest or exit to
userspace with exit reason KVM_INTERNAL_ERROR. This is because it does
not know how to proceed in an appropriate manner. This feature lets
userspace get involved to see if it can figure out a better path
forward.
Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Reviewed-by: David Edmondson <david.edmondson@oracle.com>
Message-Id: <20210510144834.658457-2-aaronlewis@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This commit defines the API for userspace and prepare the common
functionalities to support per VM/VCPU binary stats data readings.
The KVM stats now is only accessible by debugfs, which has some
shortcomings this change series are supposed to fix:
1. The current debugfs stats solution in KVM could be disabled
when kernel Lockdown mode is enabled, which is a potential
rick for production.
2. The current debugfs stats solution in KVM is organized as "one
stats per file", it is good for debugging, but not efficient
for production.
3. The stats read/clear in current debugfs solution in KVM are
protected by the global kvm_lock.
Besides that, there are some other benefits with this change:
1. All KVM VM/VCPU stats can be read out in a bulk by one copy
to userspace.
2. A schema is used to describe KVM statistics. From userspace's
perspective, the KVM statistics are self-describing.
3. With the fd-based solution, a separate telemetry would be able
to read KVM stats in a less privileged environment.
4. After the initial setup by reading in stats descriptors, a
telemetry only needs to read the stats data itself, no more
parsing or setup is needed.
Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com> #arm64
Signed-off-by: Jing Zhang <jingzhangos@google.com>
Message-Id: <20210618222709.1858088-3-jingzhangos@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This commit adds two stats for the socket migration feature to evaluate the
effectiveness: LINUX_MIB_TCPMIGRATEREQ(SUCCESS|FAILURE).
If the migration fails because of the own_req race in receiving ACK and
sending SYN+ACK paths, we do not increment the failure stat. Then another
CPU is responsible for the req.
Link: https://lore.kernel.org/bpf/CAK6E8=cgFKuGecTzSCSQ8z3YJ_163C0uwO9yRvfDSE7vOe9mJA@mail.gmail.com/
Suggested-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for net-next:
1) Skip non-SCTP packets in the new SCTP chunk support for nft_exthdr,
from Phil Sutter.
2) Simplify TCP option sanity check for TCP packets, also from Phil.
3) Add a new expression to store when the rule has been used last time.
4) Pass the hook state object to log function, from Florian Westphal.
5) Document the new sysctl knobs to tune the flowtable timeouts,
from Oz Shlomo.
6) Fix snprintf error check in the new nfnetlink_hook infrastructure,
from Dan Carpenter.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pick up dependent changes which either went mainline (x86/urgent is
based on -rc7 and that contains them) as urgent fixes and the current
x86/urgent branch which contains two more urgent fixes, so that the
bigger FPU rework can base off ontop.
Signed-off-by: Borislav Petkov <bp@suse.de>
There may be cases where vendor-specific elements need to be
used over the air. Rather than have driver or firmware add
them and possibly cause problems that way, add them to the
iftype-data band capabilities. This way we can advertise to
userspace first, and use them in mac80211 next.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/iwlwifi.20210618133832.e8c4f0347276.Iee5964682b3e9ec51fc1cd57a7c62383eaf6ddd7@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmDPuyMeHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGvxgH/RKvSuRPwkJ2Jcp9
VLi5kCbqtJlYLq6tB6peSJ8otKgxkcRwC0pIY4LlYIAWYboktLQ5RKp/9nB2h2FN
aMZUMu6AI/lVJyFMI5MnKnJIUiUq+WXR3lSSlw68vwFLFdzqUZFNq+bqeiVvnIy1
yqA6naj24Tu/RbYffQoPvdSJcU2SLXRMxwD8HRGiU2d51RaFsOvsZvF+P5TVcsEV
ZmttJeER21CaI/A809eqaFmyGrUOcZZK9roZEbMwanTZOMw18biEsLu/UH4kBX01
JC4+RlGxcWjQ5YNZgChsgoOK/CHzc6ITztTntdeDWAvwZjQFzV7pCy4/3BWne3O+
5178yHM=
=o8cN
-----END PGP SIGNATURE-----
Backmerge tag 'v5.13-rc7' into drm-next
Backmerge Linux 5.13-rc7 to make some pulls from later bases apply,
and to bake in the conflicts so far.
With this socket option, users can change probe_interval for
a transport, asoc or sock after it's created.
Note that if the change is for an asoc, also apply the change
to each transport in this asoc.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
'ETHTOOL_A_MODULE_EEPROM_DATA' is a binary attribute, not a nested one.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The FUTEX_LOCK_PI futex operand uses a CLOCK_REALTIME based absolute
timeout since it was implemented, but it does not require that the
FUTEX_CLOCK_REALTIME flag is set, because that was introduced later.
In theory as none of the user space implementations can set the
FUTEX_CLOCK_REALTIME flag on this operand, it would be possible to
creatively abuse it and make the meaning invers, i.e. select CLOCK_REALTIME
when not set and CLOCK_MONOTONIC when set. But that's a nasty hackery.
Another option would be to have a new FUTEX_CLOCK_MONOTONIC flag only for
FUTEX_LOCK_PI, but that's also awkward because it does not allow libraries
to handle the timeout clock selection consistently.
So provide a new FUTEX_LOCK_PI2 operand which implements the timeout
semantics which the other operands use and leave FUTEX_LOCK_PI alone.
Reported-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210422194705.440773992@linutronix.de
Now that we have H_RPT_INVALIDATE fully implemented, enable
support for the same via KVM_CAP_PPC_RPT_INVALIDATE KVM capability
Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210621085003.904767-6-bharata@linux.ibm.com
The VMM may not wish to have it's own mapping of guest memory mapped
with PROT_MTE because this causes problems if the VMM has tag checking
enabled (the guest controls the tags in physical RAM and it's unlikely
the tags are correct for the VMM).
Instead add a new ioctl which allows the VMM to easily read/write the
tags from guest memory, allowing the VMM's mapping to be non-PROT_MTE
while the VMM can still read/write the tags for the purpose of
migration.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210621111716.37157-6-steven.price@arm.com
Add a new VM feature 'KVM_ARM_CAP_MTE' which enables memory tagging
for a VM. This will expose the feature to the guest and automatically
tag memory pages touched by the VM as PG_mte_tagged (and clear the tag
storage) to ensure that the guest cannot see stale tags, and so that
the tags are correctly saved/restored across swap.
Actually exposing the new capability to user space happens in a later
patch.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
[maz: move VM_SHARED sampling into the critical section]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210621111716.37157-3-steven.price@arm.com
Even if POSIX doesn't mandate it, linux users legitimately expect sync() to
flush all data and metadata to physical storage when it is located on the
same system. This isn't happening with virtiofs though: sync() inside the
guest returns right away even though data still needs to be flushed from
the host page cache.
This is easily demonstrated by doing the following in the guest:
$ dd if=/dev/zero of=/mnt/foo bs=1M count=5K ; strace -T -e sync sync
5120+0 records in
5120+0 records out
5368709120 bytes (5.4 GB, 5.0 GiB) copied, 5.22224 s, 1.0 GB/s
sync() = 0 <0.024068>
and start the following in the host when the 'dd' command completes
in the guest:
$ strace -T -e fsync /usr/bin/sync virtiofs/foo
fsync(3) = 0 <10.371640>
There are no good reasons not to honor the expected behavior of sync()
actually: it gives an unrealistic impression that virtiofs is super fast
and that data has safely landed on HW, which isn't the case obviously.
Implement a ->sync_fs() superblock operation that sends a new FUSE_SYNCFS
request type for this purpose. Provision a 64-bit placeholder for possible
future extensions. Since the file server cannot handle the wait == 0 case,
we skip it to avoid a gratuitous roundtrip. Note that this is
per-superblock: a FUSE_SYNCFS is send for the root mount and for each
submount.
Like with FUSE_FSYNC and FUSE_FSYNCDIR, lack of support for FUSE_SYNCFS in
the file server is treated as permanent success. This ensures
compatibility with older file servers: the client will get the current
behavior of sync() not being propagated to the file server.
Note that such an operation allows the file server to DoS sync(). Since a
typical FUSE file server is an untrusted piece of software running in
userspace, this is disabled by default. Only enable it with virtiofs for
now since virtiofsd is supposedly trusted by the guest kernel.
Reported-by: Robert Krawitz <rlk@redhat.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Trivial conflicts in net/can/isotp.c and
tools/testing/selftests/net/mptcp/mptcp_connect.sh
scaled_ppm_to_ppb() was moved from drivers/ptp/ptp_clock.c
to include/linux/ptp_clock_kernel.h in -next so re-apply
the fix there.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
bluetooth, netfilter and can.
Current release - regressions:
- mlxsw: spectrum_qdisc: Pass handle, not band number to find_class()
to fix modifying offloaded qdiscs
- lantiq: net: fix duplicated skb in rx descriptor ring
- rtnetlink: fix regression in bridge VLAN configuration, empty info
is not an error, bot-generated "fix" was not needed
- libbpf: s/rx/tx/ typo on umem->rx_ring_setup_done to fix
umem creation
Current release - new code bugs:
- ethtool: fix NULL pointer dereference during module EEPROM dump via
the new netlink API
- mlx5e: don't update netdev RQs with PTP-RQ, the special purpose queue
should not be visible to the stack
- mlx5e: select special PTP queue only for SKBTX_HW_TSTAMP skbs
- mlx5e: verify dev is present in get devlink port ndo, avoid a panic
Previous releases - regressions:
- neighbour: allow NUD_NOARP entries to be force GCed
- further fixes for fallout from reorg of WiFi locking
(staging: rtl8723bs, mac80211, cfg80211)
- skbuff: fix incorrect msg_zerocopy copy notifications
- mac80211: fix NULL ptr deref for injected rate info
- Revert "net/mlx5: Arm only EQs with EQEs" it may cause missed IRQs
Previous releases - always broken:
- bpf: more speculative execution fixes
- netfilter: nft_fib_ipv6: skip ipv6 packets from any to link-local
- udp: fix race between close() and udp_abort() resulting in a panic
- fix out of bounds when parsing TCP options before packets
are validated (in netfilter: synproxy, tc: sch_cake and mptcp)
- mptcp: improve operation under memory pressure, add missing wake-ups
- mptcp: fix double-lock/soft lookup in subflow_error_report()
- bridge: fix races (null pointer deref and UAF) in vlan tunnel egress
- ena: fix DMA mapping function issues in XDP
- rds: fix memory leak in rds_recvmsg
Misc:
- vrf: allow larger MTUs
- icmp: don't send out ICMP messages with a source address of 0.0.0.0
- cdc_ncm: switch to eth%d interface naming
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmDNP7EACgkQMUZtbf5S
IrvTmxAAgOAM9MdRl9wnYtqXKPXJ1JJtenozwt1yX6b6OG+Ns7cm6YYafU3KoZWR
KlzpvP90vRrER3RqksbMngHzvGjZKDS4LWRur7sRlJ1TBQoLrQCIbriAh07d7wlU
0nnS4J8mczTCKx78QCUYy1QBIX5TQrUbx0JQZDPoIPBjFeILW+Gx/Ghg5tUR4mhf
6icYqwIPocTXO37ZmWOzezZNVOXJF4kaQUZeuOHNe5hOtm6EeIpZbW1Xx3DIr5bd
80a/uNU7nVyos0n7jxnfVE/oelTnYbT5scZeV/PPVqZ4U113f7uex2QP23/XhGSX
lK1EhwPqPOyaNhQoihLM6Xzd4o7aZOcmF8NY96xqjC+DqdN+juvfJU+ClCZojGIj
H4bwCSaj3y2PiimfQdBiIKvYMc5d4zBdw/Dpk/gLDp4d5N638TAtuunK4Mj+TEuT
QF1qkBLIB4HFtLS0M35/twk93md/5GUdSTij2GB3fOkAWRu2m266P5m+4DigW/TB
Xm8FgKdetvxVP0Qv/p49nPEn24Ny8wCafH1x1wVTmoda2qi6j1EXMuSa0PlCdz70
Sl5FrlxdEkOpC4p+Aoc8APSoBXnOriAlpU+z/EVb8Co4JR/+Ge5zBWpsiZDVD0/K
Ay0FW3I87iyn9tw1H1Fzr9GBlVl5vWRauZFHjzl90fWakCrCzJE=
=xxUe
-----END PGP SIGNATURE-----
Merge tag 'net-5.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Networking fixes for 5.13-rc7, including fixes from wireless, bpf,
bluetooth, netfilter and can.
Current release - regressions:
- mlxsw: spectrum_qdisc: Pass handle, not band number to find_class()
to fix modifying offloaded qdiscs
- lantiq: net: fix duplicated skb in rx descriptor ring
- rtnetlink: fix regression in bridge VLAN configuration, empty info
is not an error, bot-generated "fix" was not needed
- libbpf: s/rx/tx/ typo on umem->rx_ring_setup_done to fix umem
creation
Current release - new code bugs:
- ethtool: fix NULL pointer dereference during module EEPROM dump via
the new netlink API
- mlx5e: don't update netdev RQs with PTP-RQ, the special purpose
queue should not be visible to the stack
- mlx5e: select special PTP queue only for SKBTX_HW_TSTAMP skbs
- mlx5e: verify dev is present in get devlink port ndo, avoid a panic
Previous releases - regressions:
- neighbour: allow NUD_NOARP entries to be force GCed
- further fixes for fallout from reorg of WiFi locking (staging:
rtl8723bs, mac80211, cfg80211)
- skbuff: fix incorrect msg_zerocopy copy notifications
- mac80211: fix NULL ptr deref for injected rate info
- Revert "net/mlx5: Arm only EQs with EQEs" it may cause missed IRQs
Previous releases - always broken:
- bpf: more speculative execution fixes
- netfilter: nft_fib_ipv6: skip ipv6 packets from any to link-local
- udp: fix race between close() and udp_abort() resulting in a panic
- fix out of bounds when parsing TCP options before packets are
validated (in netfilter: synproxy, tc: sch_cake and mptcp)
- mptcp: improve operation under memory pressure, add missing
wake-ups
- mptcp: fix double-lock/soft lookup in subflow_error_report()
- bridge: fix races (null pointer deref and UAF) in vlan tunnel
egress
- ena: fix DMA mapping function issues in XDP
- rds: fix memory leak in rds_recvmsg
Misc:
- vrf: allow larger MTUs
- icmp: don't send out ICMP messages with a source address of 0.0.0.0
- cdc_ncm: switch to eth%d interface naming"
* tag 'net-5.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (139 commits)
net: ethernet: fix potential use-after-free in ec_bhf_remove
selftests/net: Add icmp.sh for testing ICMP dummy address responses
icmp: don't send out ICMP messages with a source address of 0.0.0.0
net: ll_temac: Avoid ndo_start_xmit returning NETDEV_TX_BUSY
net: ll_temac: Fix TX BD buffer overwrite
net: ll_temac: Add memory-barriers for TX BD access
net: ll_temac: Make sure to free skb when it is completely used
MAINTAINERS: add Guvenc as SMC maintainer
bnxt_en: Call bnxt_ethtool_free() in bnxt_init_one() error path
bnxt_en: Fix TQM fastpath ring backing store computation
bnxt_en: Rediscover PHY capabilities after firmware reset
cxgb4: fix wrong shift.
mac80211: handle various extensible elements correctly
mac80211: reset profile_periodicity/ema_ap
cfg80211: avoid double free of PMSR request
cfg80211: make certificate generation more robust
mac80211: minstrel_ht: fix sample time check
net: qed: Fix memcpy() overflow of qed_dcbx_params()
net: cdc_eem: fix tx fixup skb leak
net: hamradio: fix memory leak in mkiss_close
...
Modify the pr_info content from int to char *, this looks more readable.
Signed-off-by: Yejune Deng <yejune.deng@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When constructing ICMP response messages, the kernel will try to pick a
suitable source address for the outgoing packet. However, if no IPv4
addresses are configured on the system at all, this will fail and we end up
producing an ICMP message with a source address of 0.0.0.0. This can happen
on a box routing IPv4 traffic via v6 nexthops, for instance.
Since 0.0.0.0 is not generally routable on the internet, there's a good
chance that such ICMP messages will never make it back to the sender of the
original packet that the ICMP message was sent in response to. This, in
turn, can create connectivity and PMTUd problems for senders. Fortunately,
RFC7600 reserves a dummy address to be used as a source for ICMP
messages (192.0.0.8/32), so let's teach the kernel to substitute that
address as a last resort if the regular source address selection procedure
fails.
Below is a quick example reproducing this issue with network namespaces:
ip netns add ns0
ip l add type veth peer netns ns0
ip l set dev veth0 up
ip a add 10.0.0.1/24 dev veth0
ip a add fc00:dead:cafe:42::1/64 dev veth0
ip r add 10.1.0.0/24 via inet6 fc00:dead:cafe:42::2
ip -n ns0 l set dev veth0 up
ip -n ns0 a add fc00:dead:cafe:42::2/64 dev veth0
ip -n ns0 r add 10.0.0.0/24 via inet6 fc00:dead:cafe:42::1
ip netns exec ns0 sysctl -w net.ipv4.icmp_ratelimit=0
ip netns exec ns0 sysctl -w net.ipv4.ip_forward=1
tcpdump -tpni veth0 -c 2 icmp &
ping -w 1 10.1.0.1 > /dev/null
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on veth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
IP 10.0.0.1 > 10.1.0.1: ICMP echo request, id 29, seq 1, length 64
IP 0.0.0.0 > 10.0.0.1: ICMP net 10.1.0.1 unreachable, length 92
2 packets captured
2 packets received by filter
0 packets dropped by kernel
With this patch the above capture changes to:
IP 10.0.0.1 > 10.1.0.1: ICMP echo request, id 31127, seq 1, length 64
IP 192.0.0.8 > 10.0.0.1: ICMP net 10.1.0.1 unreachable, length 92
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reported-by: Juliusz Chroboczek <jch@irif.fr>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch added a new member named csum_enabled in struct mptcp_sock,
used a dummy mptcp_is_checksum_enabled() helper to initialize it.
Also added a new member named mptcpi_csum_enabled in struct mptcp_info
to expose the csum_enabled flag.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
IETF RFC 8986 [1] includes the definition of SRv6 End.DT4, End.DT6, and
End.DT46 Behaviors.
The current SRv6 code in the Linux kernel only implements End.DT4 and
End.DT6 which can be used respectively to support IPv4-in-IPv6 and
IPv6-in-IPv6 VPNs. With End.DT4 and End.DT6 it is not possible to create a
single SRv6 VPN tunnel to carry both IPv4 and IPv6 traffic.
The proposed End.DT46 implementation is meant to support the decapsulation
of IPv4 and IPv6 traffic coming from a single SRv6 tunnel.
The implementation of the SRv6 End.DT46 Behavior in the Linux kernel
greatly simplifies the setup and operations of SRv6 VPNs.
The SRv6 End.DT46 Behavior leverages the infrastructure of SRv6 End.DT{4,6}
Behaviors implemented so far, because it makes use of a VRF device in
order to force the routing lookup into the associated routing table.
To make the End.DT46 work properly, it must be guaranteed that the routing
table used for routing lookup operations is bound to one and only one VRF
during the tunnel creation. Such constraint has to be enforced by enabling
the VRF strict_mode sysctl parameter, i.e.:
$ sysctl -wq net.vrf.strict_mode=1
Note that the same approach is used for the SRv6 End.DT4 Behavior and for
the End.DT6 Behavior in VRF mode.
The command used to instantiate an SRv6 End.DT46 Behavior is
straightforward, i.e.:
$ ip -6 route add 2001:db8::1 encap seg6local action End.DT46 vrftable 100 dev vrf100.
[1] https://www.rfc-editor.org/rfc/rfc8986.html#name-enddt46-decapsulation-and-s
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Performance and impact of SRv6 End.DT46 Behavior on the SRv6 Networking
=======================================================================
This patch aims to add the SRv6 End.DT46 Behavior with minimal impact on
the performance of SRv6 End.DT4 and End.DT6 Behaviors.
In order to verify this, we tested the performance of the newly introduced
SRv6 End.DT46 Behavior and compared it with the performance of SRv6
End.DT{4,6} Behaviors, considering both the patched kernel and the kernel
before applying the End.DT46 patch (referred to as vanilla kernel).
In details, the following decapsulation scenarios were considered:
1.a) IPv6 traffic in SRv6 End.DT46 Behavior on patched kernel;
1.b) IPv4 traffic in SRv6 End.DT46 Behavior on patched kernel;
2.a) SRv6 End.DT6 Behavior (VRF mode) on patched kernel;
2.b) SRv6 End.DT4 Behavior on patched kernel;
3.a) SRv6 End.DT6 Behavior (VRF mode) on vanilla kernel (without the
End.DT46 patch);
3.b) SRv6 End.DT4 Behavior on vanilla kernel (without the End.DT46 patch).
All tests were performed on a testbed deployed on the CloudLab [2]
facilities. We considered IPv{4,6} traffic handled by a single core (at 2.4
GHz on a Xeon(R) CPU E5-2630 v3) on kernel 5.13-rc1 using packets of size
~ 100 bytes.
Scenario (1.a): average 684.70 kpps; std. dev. 0.7 kpps;
Scenario (1.b): average 711.69 kpps; std. dev. 1.2 kpps;
Scenario (2.a): average 690.70 kpps; std. dev. 1.2 kpps;
Scenario (2.b): average 722.22 kpps; std. dev. 1.7 kpps;
Scenario (3.a): average 690.02 kpps; std. dev. 2.6 kpps;
Scenario (3.b): average 721.91 kpps; std. dev. 1.2 kpps;
Considering the results for the patched kernel (1.a, 1.b, 2.a, 2.b) we
observe that the performance degradation incurred in using End.DT46 rather
than End.DT6 and End.DT4 respectively for IPv6 and IPv4 traffic is minimal,
around 0.9% and 1.5%. Such very minimal performance degradation is the
price to be paid if one prefers to use a single tunnel capable of handling
both types of traffic (IPv4 and IPv6).
Comparing the results for End.DT4 and End.DT6 under the patched and the
vanilla kernel (2.a, 2.b, 3.a, 3.b) we observe that the introduction of the
End.DT46 patch has no impact on the performance of End.DT4 and End.DT6.
[2] https://www.cloudlab.us
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This commit in sched/urgent moved the cfs_rq_is_decayed() function:
a7b359fc6a37: ("sched/fair: Correctly insert cfs_rq's to list on unthrottle")
and this fresh commit in sched/core modified it in the old location:
9e077b52d86a: ("sched/pelt: Check that *_avg are null when *_sum are")
Merge the two variants.
Conflicts:
kernel/sched/fair.c
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This adds a new "DMA Buffer ioctls" section to the dma-buf docs and adds
documentation for DMA_BUF_IOCTL_SYNC.
v2 (Daniel Vetter):
- Fix a couple typos
- Add commentary about synchronization with other devices
- Use item list format for describing flags
v3 (Pekka Paalanen):
- Clarify stalling requirements.
- Be more clear that that DMA_BUF_IOCTL_SYNC with SINC_END has to be
called before more GPU work happens.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Pekka Paalanen <pekka.paalanen@collabora.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210617194258.579011-1-jason@jlekstrand.net
To support testing of PCI/PCIe drivers in UML, add a PCI bus
support driver. This driver uses virtio, which in UML is really
just vhost-user, to talk to devices, and adds the devices to
the virtual PCI bus in the system.
Since virtio already allows DMA/bus mastering this really isn't
all that hard, of course we need the logic_iomem infrastructure
that was added by a previous patch.
The protocol to talk to the device is has a few fairly simple
messages for reading to/writing from config and IO spaces, and
messages for the device to send the various interrupts (INT#,
MSI/MSI-X and while suspended PME#).
Note that currently no offical virtio device ID is assigned for
this protocol, as a consequence this patch requires defining it
in the Kconfig, with a default that makes the driver refuse to
work at all.
Finally, in order to add support for MSI/MSI-X interrupts, some
small changes are needed in the UML IRQ code, it needs to have
more interrupts, changing NR_IRQS from 64 to 128 if this driver
is enabled, but not actually use them for anything so that the
generic IRQ domain/MSI infrastructure can allocate IRQ numbers.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Daniel Borkmann says:
====================
pull-request: bpf-next 2021-06-17
The following pull-request contains BPF updates for your *net-next* tree.
We've added 50 non-merge commits during the last 25 day(s) which contain
a total of 148 files changed, 4779 insertions(+), 1248 deletions(-).
The main changes are:
1) BPF infrastructure to migrate TCP child sockets from a listener to another
in the same reuseport group/map, from Kuniyuki Iwashima.
2) Add a provably sound, faster and more precise algorithm for tnum_mul() as
noted in https://arxiv.org/abs/2105.05398, from Harishankar Vishwanathan.
3) Streamline error reporting changes in libbpf as planned out in the
'libbpf: the road to v1.0' effort, from Andrii Nakryiko.
4) Add broadcast support to xdp_redirect_map(), from Hangbin Liu.
5) Extends bpf_map_lookup_and_delete_elem() functionality to 4 more map
types, that is, {LRU_,PERCPU_,LRU_PERCPU_,}HASH, from Denis Salopek.
6) Support new LLVM relocations in libbpf to make them more linker friendly,
also add a doc to describe the BPF backend relocations, from Yonghong Song.
7) Silence long standing KUBSAN complaints on register-based shifts in
interpreter, from Daniel Borkmann and Eric Biggers.
8) Add dummy PT_REGS macros in libbpf to fail BPF program compilation when
target arch cannot be determined, from Lorenz Bauer.
9) Extend AF_XDP to support large umems with 1M+ pages, from Magnus Karlsson.
10) Fix two minor libbpf tc BPF API issues, from Kumar Kartikeya Dwivedi.
11) Move libbpf BPF_SEQ_PRINTF/BPF_SNPRINTF macros that can be used by BPF
programs to bpf_helpers.h header, from Florent Revest.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Th_strings arrays netdev_features_strings, tunable_strings, and
phy_tunable_strings has been moved to file net/ethtool/common.c.
So fixes the comment.
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This hypercall is used by the SEV guest to notify a change in the page
encryption status to the hypervisor. The hypercall should be invoked
only when the encryption attribute is changed from encrypted -> decrypted
and vice versa. By default all guest pages are considered encrypted.
The hypercall exits to userspace to manage the guest shared regions and
integrate with the userspace VMM's migration code.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Steve Rutherford <srutherford@google.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Co-developed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <90778988e1ee01926ff9cac447aacb745f954c8c.1623174621.git.ashish.kalra@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This is a new version of KVM_GET_SREGS / KVM_SET_SREGS.
It has the following changes:
* Has flags for future extensions
* Has vcpu's PDPTRs, allowing to save/restore them on migration.
* Lacks obsolete interrupt bitmap (done now via KVM_SET_VCPU_EVENTS)
New capability, KVM_CAP_SREGS2 is added to signal
the userspace of this ioctl.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210607090203.133058-8-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Modeled after KVM_CAP_ENFORCE_PV_FEATURE_CPUID, the new capability allows
for limiting Hyper-V features to those exposed to the guest in Hyper-V
CPUIDs (0x40000003, 0x40000004, ...).
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210521095204.2161214-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
io-wq defaults to per-node masks for IO workers. This works fine by
default, but isn't particularly handy for workloads that prefer more
specific affinities, for either performance or isolation reasons.
This adds IORING_REGISTER_IOWQ_AFF that allows the user to pass in a CPU
mask that is then applied to IO thread workers, and an
IORING_UNREGISTER_IOWQ_AFF that simply resets the masks back to the
default of per-node.
Note that no care is given to existing IO threads, they will need to go
through a reschedule before the affinity is correct if they are already
running or sleeping.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add a new optional expression that tells you when last matching on a
given rule / set element element has happened.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add support to collect more detailed SMC fallback reason statistics and
provide these statistics to user space on the netlink interface.
Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the netlink function which collects the statistics information and
delivers it to the userspace.
Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While events can already be enabled and disabled via the generic request
IOCTL, this bypasses the internal reference counting mechanism of the
controller. Due to that, disabling an event will turn it off regardless
of any other client having requested said event, which may break
functionality of that client.
To solve this, add IOCTLs wrapping the ssam_controller_event_enable()
and ssam_controller_event_disable() functions, which have been
previously introduced for this specific purpose.
Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20210604134755.535590-6-luzmaximilian@gmail.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Currently, debugging unknown events requires writing a custom driver.
This is somewhat difficult, slow to adapt, and not entirely
user-friendly for quickly trying to figure out things on devices of some
third-party user. We can do better. We already have a user-space
interface intended for debugging SAM EC requests, so let's add support
for receiving events to that.
This commit provides support for receiving events by reading from the
controller file. It additionally introduces two new IOCTLs to control
which event categories will be forwarded. Specifically, a user-space
client can specify which target categories it wants to receive events
from by registering the corresponding notifier(s) via the IOCTLs and
after that, read the received events by reading from the controller
device.
Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20210604134755.535590-5-luzmaximilian@gmail.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
This patch introduces a new bpf_attach_type for BPF_PROG_TYPE_SK_REUSEPORT
to check if the attached eBPF program is capable of migrating sockets. When
the eBPF program is attached, we run it for socket migration if the
expected_attach_type is BPF_SK_REUSEPORT_SELECT_OR_MIGRATE or
net.ipv4.tcp_migrate_req is enabled.
Currently, the expected_attach_type is not enforced for the
BPF_PROG_TYPE_SK_REUSEPORT type of program. Thus, this commit follows the
earlier idea in the commit aac3fc320d ("bpf: Post-hooks for sys_bind") to
fix up the zero expected_attach_type in bpf_prog_load_fixup_attach_type().
Moreover, this patch adds a new field (migrating_sk) to sk_reuseport_md to
select a new listener based on the child socket. migrating_sk varies
depending on if it is migrating a request in the accept queue or during
3WHS.
- accept_queue : sock (ESTABLISHED/SYN_RECV)
- 3WHS : request_sock (NEW_SYN_RECV)
In the eBPF program, we can select a new listener by
BPF_FUNC_sk_select_reuseport(). Also, we can cancel migration by returning
SK_DROP. This feature is useful when listeners have different settings at
the socket API level or when we want to free resources as soon as possible.
- SK_PASS with selected_sk, select it as a new listener
- SK_PASS with selected_sk NULL, fallbacks to the random selection
- SK_DROP, cancel the migration.
There is a noteworthy point. We select a listening socket in three places,
but we do not have struct skb at closing a listener or retransmitting a
SYN+ACK. On the other hand, some helper functions do not expect skb is NULL
(e.g. skb_header_pointer() in BPF_FUNC_skb_load_bytes(), skb_tail_pointer()
in BPF_FUNC_skb_load_bytes_relative()). So we allocate an empty skb
temporarily before running the eBPF program.
Suggested-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/netdev/20201123003828.xjpjdtk4ygl6tg6h@kafai-mbp.dhcp.thefacebook.com/
Link: https://lore.kernel.org/netdev/20201203042402.6cskdlit5f3mw4ru@kafai-mbp.dhcp.thefacebook.com/
Link: https://lore.kernel.org/netdev/20201209030903.hhow5r53l6fmozjn@kafai-mbp.dhcp.thefacebook.com/
Link: https://lore.kernel.org/bpf/20210612123224.12525-10-kuniyu@amazon.co.jp
We will call sock_reuseport.prog for socket migration in the next commit,
so the eBPF program has to know which listener is closing to select a new
listener.
We can currently get a unique ID of each listener in the userspace by
calling bpf_map_lookup_elem() for BPF_MAP_TYPE_REUSEPORT_SOCKARRAY map.
This patch makes the pointer of sk available in sk_reuseport_md so that we
can get the ID by BPF_FUNC_get_socket_cookie() in the eBPF program.
Suggested-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/netdev/20201119001154.kapwihc2plp4f7zc@kafai-mbp.dhcp.thefacebook.com/
Link: https://lore.kernel.org/bpf/20210612123224.12525-9-kuniyu@amazon.co.jp
Some of the commands have already been defined for the support of RAW
commands (to be blocked). Unlike their usage in the RAW interface, when
used through the supported interface, they will be coordinated and
marshalled along with other commands being issued by userspace and the
driver itself. That coordination will be added later.
The list of commands was determined based on the learnings from
libnvdimm and this list is provided directly from Dan.
Recommended-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20210413140907.534404-1-ben.widawsky@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmDGe+4eHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG/IUH/iyHVulAtAhL9bnR
qL4M1kWfcG1sKS2TzGRZzo6YiUABf89vFP90r4sKxG3AKrb8YkTwmJr8B/sWwcsv
PpKkXXTobbDfpSrsXGEapBkQOE7h2w739XeXyBLRPkoCR4UrEFn68TV2rLjMLBPS
/EIZkonXLWzzWalgKDP4wSJ7GaQxi3LMx3dGAvbFArEGZ1mPHNlgWy2VokFY/yBf
qh1EZ5rugysc78JCpTqfTf3fUPK2idQW5gtHSMbyESrWwJ/3XXL9o1ET3JWURYf1
b0FgVztzddwgULoIGWLxDH5WWts3l54sjBLj0yrLUlnGKA5FjrZb12g9PdhdywuY
/8KfjeE=
=JfJm
-----END PGP SIGNATURE-----
Merge tag 'v5.13-rc6' into tty-next
We want the tty fixes in here as well.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmDGe+4eHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG/IUH/iyHVulAtAhL9bnR
qL4M1kWfcG1sKS2TzGRZzo6YiUABf89vFP90r4sKxG3AKrb8YkTwmJr8B/sWwcsv
PpKkXXTobbDfpSrsXGEapBkQOE7h2w739XeXyBLRPkoCR4UrEFn68TV2rLjMLBPS
/EIZkonXLWzzWalgKDP4wSJ7GaQxi3LMx3dGAvbFArEGZ1mPHNlgWy2VokFY/yBf
qh1EZ5rugysc78JCpTqfTf3fUPK2idQW5gtHSMbyESrWwJ/3XXL9o1ET3JWURYf1
b0FgVztzddwgULoIGWLxDH5WWts3l54sjBLj0yrLUlnGKA5FjrZb12g9PdhdywuY
/8KfjeE=
=JfJm
-----END PGP SIGNATURE-----
Merge tag 'v5.13-rc6' into char-misc-next
We need the fixes in here as well.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add support to create (and destroy) interfaces via a new
rtnetlink kind "wwan". The responsible driver has to use
the new wwan_register_ops() to make this possible.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In some cases, for example in the upcoming WWAN framework changes,
there's no natural "parent netdev", so sometimes dummy netdevs are
created or similar. IFLA_PARENT_DEV_NAME is a new attribute intended to
contain a device (sysfs, struct device) name that can be used instead
when creating a new netdev, if the rtnetlink family implements it.
As suggested by Parav Pandit, we also introduce IFLA_PARENT_DEV_BUS_NAME
attribute in order to uniquely identify a device on the system (with
bus/name pair).
ip-link(8) support for the generic parent device attributes will help
us avoid code duplication, so no other link type will require a custom
code to handle the parent name attribute. E.g. the WWAN interface
creation command will looks like this:
$ ip link add wwan0-1 parent-dev wwan0 type wwan channel-id 1
So, some future subsystem (or driver) FOO will have an interface
creation command that looks like this:
$ ip link add foo1-3 parent-dev foo1 type foo bar-id 3 baz-type Y
Below is an example of dumping link info of a random device with these
new attributes:
$ ip --details link show wlp0s20f3
4: wlp0s20f3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
state UP mode DORMANT group default qlen 1000
...
parent_bus pci parent_dev 0000:00:14.3
Co-developed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Co-developed-by: Loic Poulain <loic.poulain@linaro.org>
Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Suggested-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmDEwEEQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpu2uEACIZXc0e4Jz2tJmtlLzhm0T+YUXu88/n0Ki
3HsCfjyk0k2tvGjAmzLgBruR+0dxuoTlC8ZyLWkCgYFvRxCQMrjxB4+Q53WAAPud
ictv/5C992eWfmkk5lKWYh/SVUZU0nN/HlcITggFzH+/Ek4RgqBJK6rYPpN4YM6W
OifSZ22xwjZy9i8svzCPzGUbS5d5qbNeRSaacfADWFmzTqqzllWz/KkN633UFefR
tkqWy610P0O8fz3xe5HcECIOc3aNRZuk5zrNqCJPvxcOdYlqlL/HfsWMACEiC/g1
N3ahNGrUzJqhB1QNAIKATKAlh8hzAws9t/alLJQzSHZWRu7vso0qctoVJT3i6xRp
qD17EAQgrC0R0fQxdHmoMzRHEnKPCXQx36wb/mhZbG60/Q+scmSrFXp86XvbKZiI
uzHTsUL/80bRXHuVrKXT+JWTRCzpv1yk9ufIVzSOheVCl/H6bxZ29cabBL2/XvvI
d+OljDsy7oMH6rOBFi3XYmwZShEoUqeATeFoFf5isjkWfe7qdiMVu4apD8fBhIjX
8rNLjp0nIKN+5IjHwFkAXRwp8P1SJQ8c7Tl4I6xY82FsMQxUUgMhjSqrn58i2g9d
Lem9YHKaXIbw1yfWcaf8erA6d0S4rujG+j3miG0y248kOTb9FeMbfbRgjj8v99m1
XB7F9SIQUw==
=MbrN
-----END PGP SIGNATURE-----
Merge tag 'io_uring-5.13-2021-06-12' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"Just an API change for the registration changes that went into this
release. Better to get it sorted out now than before it's too late"
* tag 'io_uring-5.13-2021-06-12' of git://git.kernel.dk/linux-block:
io_uring: add feature flag for rsrc tags
io_uring: change registration/upd/rsrc tagging ABI
Add set of defines and constants for SOCK_SEQPACKET support
in vsock.
Signed-off-by: Arseny Krasnov <arseny.krasnov@kaspersky.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Run the following command to find and remove the trailing spaces and tabs:
sed -r -i 's/[ \t]+$//' <audit_files>
The files to be checked are as follows:
kernel/audit*
include/linux/audit.h
include/uapi/linux/audit.h
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Acked-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Add IORING_FEAT_RSRC_TAGS indicating that io_uring supports a bunch of
new IORING_REGISTER operations, in particular
IORING_REGISTER_[FILES[,UPDATE]2,BUFFERS[2,UPDATE]] that support rsrc
tagging, and also indicating implemented dynamic fixed buffer updates.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/9b995d4045b6c6b4ab7510ca124fd25ac2203af7.1623339162.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
There are ABI moments about recently added rsrc registration/update and
tagging that might become a nuisance in the future. First,
IORING_REGISTER_RSRC[_UPD] hide different types of resources under it,
so breaks fine control over them by restrictions. It works for now, but
once those are wanted under restrictions it would require a rework.
It was also inconvenient trying to fit a new resource not supporting
all the features (e.g. dynamic update) into the interface, so better
to return to IORING_REGISTER_* top level dispatching.
Second, register/update were considered to accept a type of resource,
however that's not a good idea because there might be several ways of
registration of a single resource type, e.g. we may want to add
non-contig buffers or anything more exquisite as dma mapped memory.
So, remove IORING_RSRC_[FILE,BUFFER] out of the ABI, and place them
internally for now to limit changes.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/9b554897a7c17ad6e3becc48dfed2f7af9f423d5.1623339162.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The NLMSG_LENGTH(0) may confuse the API users,
NLMSG_HDRLEN is much more clear.
Besides, some code style problems are also fixed.
Signed-off-by: Chen Li <chenli@uniontech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for net-next:
1) Add nfgenmsg field to nfnetlink's struct nfnl_info and use it.
2) Remove nft_ctx_init_from_elemattr() and nft_ctx_init_from_setattr()
helper functions.
3) Add the nf_ct_pernet() helper function to fetch the conntrack
pernetns data area.
4) Expose TCP and UDP flowtable offload timeouts through sysctl,
from Oz Shlomo.
5) Add nfnetlink_hook subsystem to fetch the netfilter hook
pipeline configuration, from Florian Westphal. This also includes
a new field to annotate the hook type as metadata.
6) Fix unsafe memory access to non-linear skbuff in the new SCTP
chunk support for nft_exthdr, from Phil Sutter.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This nfnl subsystem allows to dump the list of all active netfiler hooks,
e.g. defrag, conntrack, nf/ip/arp/ip6tables and so on.
This helps to see what kind of features are currently enabled in
the network stack.
Sample output from nft tool using this infra:
$ nft list hook ip input
family ip hook input {
+0000000010 nft_do_chain_inet [nf_tables] # nft table firewalld INPUT
+0000000100 nf_nat_ipv4_local_in [nf_nat]
+2147483647 ipv4_confirm [nf_conntrack]
}
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter and wireguard trees.
The bpf vs lockdown+audit fix is the most notable.
Current release - regressions:
- virtio-net: fix page faults and crashes when XDP is enabled
- mlx5e: fix HW timestamping with CQE compression, and make sure they
are only allowed to coexist with capable devices
- stmmac:
- fix kernel panic due to NULL pointer dereference of mdio_bus_data
- fix double clk unprepare when no PHY device is connected
Current release - new code bugs:
- mt76: a few fixes for the recent MT7921 devices and runtime
power management
Previous releases - regressions:
- ice: - track AF_XDP ZC enabled queues in bitmap to fix copy mode Tx
- fix allowing VF to request more/less queues via virtchnl
- correct supported and advertised autoneg by using PHY capabilities
- allow all LLDP packets from PF to Tx
- kbuild: quote OBJCOPY var to avoid a pahole call break the build
Previous releases - always broken:
- bpf, lockdown, audit: fix buggy SELinux lockdown permission checks
- mt76: address the recent FragAttack vulnerabilities not covered
by generic fixes
- ipv6: fix KASAN: slab-out-of-bounds Read in fib6_nh_flush_exceptions
- Bluetooth:
- fix the erroneous flush_work() order, to avoid double free
- use correct lock to prevent UAF of hdev object
- nfc: fix NULL ptr dereference in llcp_sock_getname() after failed connect
- ieee802154: multiple fixes to error checking and return values
- igb: fix XDP with PTP enabled
- intel: add correct exception tracing for XDP
- tls: fix use-after-free when TLS offload device goes down and back up
- ipvs: ignore IP_VS_SVC_F_HASHED flag when adding service
- netfilter: nft_ct: skip expectations for confirmed conntrack
- mptcp: fix falling back to TCP in presence of out of order packets
early in connection lifetime
- wireguard: switch from O(n) to a O(1) algorithm for maintaining peers,
fixing stalls and a large memory leak in the process
Misc:
- devlink: correct VIRTUAL port to not have phys_port attributes
- Bluetooth: fix VIRTIO_ID_BT assigned number
- net: return the correct errno code ENOBUF -> ENOMEM
- wireguard:
- peer: allocate in kmem_cache saving 25% on peer memory
- do not use -O3
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmC6yGMACgkQMUZtbf5S
Irv67w//ZpT4+KHETUIS+CgeUIgjAQD0FTmO4iboHFGG7BadWEZpEVswUU0xBfY/
RJrSWAEqTga8zbjWqRaLRx5Qii99F2hHPZ502VR6x6NbPu1mNdS5rUOa61YbtGCv
v4sC45eOvG7T/y5mceq4rQaPsQKEUUAIgYzIOpjSiDoMfgFCT3UUF/UrBhgLzybj
aMXd12rg17dN+RJeNOZjQKligNENX9A0tBtSGXxs9hhYYbY25O+uECOsESrA1RKt
uHeh003iqApT5x8hmJsdMDtis05n7S/Bq1/4RZfAdbTcgJngepw570bQ999tbXqE
HeB3Ls9k3Vi9W6svfUkYjFGt3GYygsVGPjFAVhC+g0TZXAgdsh5w2SPQAgcIrzIr
WOfDL9hu7OJp/XRsPiB9pg8cul7a4Q5Yhp29bvN33u43AMij2TWD0CpKCQt9UQdi
8V0KOLAGC8bzXx35VTP/pbbwAI21PIYxVKfe/0cOJKShTMtfPePx1a2cuYRWoQSP
PYYbQaY6WhfUniV3DEmvL1Z+dgL0yyaJKIV2IdBHR8MPKKy+5kD+6HDaNo2lO75J
wWSN1LtoVKrc5msCD375epGmkbjatpWdfzOE+pljWHz5LnW+2cGwFhCo7+UJhAG5
XwE8+G9YUyYH51PjFpGBsoPBWEmYmIMnY34p20A1Pz1M7/HFfXc=
=sNP5
-----END PGP SIGNATURE-----
Merge tag 'net-5.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Networking fixes, including fixes from bpf, wireless, netfilter and
wireguard trees.
The bpf vs lockdown+audit fix is the most notable.
Things haven't slowed down just yet, both in terms of regressions in
current release and largish fixes for older code, but we usually see a
slowdown only after -rc5.
Current release - regressions:
- virtio-net: fix page faults and crashes when XDP is enabled
- mlx5e: fix HW timestamping with CQE compression, and make sure they
are only allowed to coexist with capable devices
- stmmac:
- fix kernel panic due to NULL pointer dereference of
mdio_bus_data
- fix double clk unprepare when no PHY device is connected
Current release - new code bugs:
- mt76: a few fixes for the recent MT7921 devices and runtime power
management
Previous releases - regressions:
- ice:
- track AF_XDP ZC enabled queues in bitmap to fix copy mode Tx
- fix allowing VF to request more/less queues via virtchnl
- correct supported and advertised autoneg by using PHY
capabilities
- allow all LLDP packets from PF to Tx
- kbuild: quote OBJCOPY var to avoid a pahole call break the build
Previous releases - always broken:
- bpf, lockdown, audit: fix buggy SELinux lockdown permission checks
- mt76: address the recent FragAttack vulnerabilities not covered by
generic fixes
- ipv6: fix KASAN: slab-out-of-bounds Read in
fib6_nh_flush_exceptions
- Bluetooth:
- fix the erroneous flush_work() order, to avoid double free
- use correct lock to prevent UAF of hdev object
- nfc: fix NULL ptr dereference in llcp_sock_getname() after failed
connect
- ieee802154: multiple fixes to error checking and return values
- igb: fix XDP with PTP enabled
- intel: add correct exception tracing for XDP
- tls: fix use-after-free when TLS offload device goes down and back
up
- ipvs: ignore IP_VS_SVC_F_HASHED flag when adding service
- netfilter: nft_ct: skip expectations for confirmed conntrack
- mptcp: fix falling back to TCP in presence of out of order packets
early in connection lifetime
- wireguard: switch from O(n) to a O(1) algorithm for maintaining
peers, fixing stalls and a large memory leak in the process
Misc:
- devlink: correct VIRTUAL port to not have phys_port attributes
- Bluetooth: fix VIRTIO_ID_BT assigned number
- net: return the correct errno code ENOBUF -> ENOMEM
- wireguard:
- peer: allocate in kmem_cache saving 25% on peer memory
- do not use -O3"
* tag 'net-5.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (91 commits)
cxgb4: avoid link re-train during TC-MQPRIO configuration
sch_htb: fix refcount leak in htb_parent_to_leaf_offload
wireguard: allowedips: free empty intermediate nodes when removing single node
wireguard: allowedips: allocate nodes in kmem_cache
wireguard: allowedips: remove nodes in O(1)
wireguard: allowedips: initialize list head in selftest
wireguard: peer: allocate in kmem_cache
wireguard: use synchronize_net rather than synchronize_rcu
wireguard: do not use -O3
wireguard: selftests: make sure rp_filter is disabled on vethc
wireguard: selftests: remove old conntrack kconfig value
virtchnl: Add missing padding to virtchnl_proto_hdrs
ice: Allow all LLDP packets from PF to Tx
ice: report supported and advertised autoneg using PHY capabilities
ice: handle the VF VSI rebuild failure
ice: Fix VFR issues for AVF drivers that expect ATQLEN cleared
ice: Fix allowing VF to request more/less queues via virtchnl
virtio-net: fix for skb_over_panic inside big mode
ipv6: Fix KASAN: slab-out-of-bounds Read in fib6_nh_flush_exceptions
fib: Return the correct errno code
...
The audio, video and OSD APIs are used upstream only by the
av7110 driver, which was moved to staging.
So, move the corresponding header files to it.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
- Fixes UAF and CVE-2021-3564
- Fix VIRTIO_ID_BT to use an unassigned ID
- Fix firmware loading on some Intel Controllers
-----BEGIN PGP SIGNATURE-----
iQJNBAABCAA3FiEE7E6oRXp8w05ovYr/9JCA4xAyCykFAmC5RWQZHGx1aXoudm9u
LmRlbnR6QGludGVsLmNvbQAKCRD0kIDjEDILKS0+D/4kJF7G9FohvLJUzTrrhcPx
nEE/5IL1eZeCQVCdKmgMeiy6K2iARGY9ZNqnx/AX1SJN9bHI7WsL6uy2RV7r57kx
iP2XZsV2uzXbwY9KVvfXBMNoCA2E4xS0UxpxA2h1znRUgMWDFLFkZydwYsBieGb6
tXZwJo3WOnDp169RbKdWTrWstYlL6KTTJoIxaVYWlghXVZ8Fl8LUHbhnx5MEqhqz
469AfGDlUKEoiYUUDwNrwX1ory/RWhcDxTFpDeji48U0P7oLFL73Aoyy/WP0B2FO
dhOErn38YUDivwBqSO2O21RUsICREbyLqHy6K/JWe4RqY50nEmWhfQo59ApzSuV3
e2HcbDwK5vgGYxmU6T9vb5S0nV1AgTV+5O3t1Mj6ZVqTAl6b2OkfqskCZzTrklIS
aKIP4viRAPLsJMdKKHW1mhR3zBH0deYEovIpFy+LkjX5aFsrEgc8hRn7i5ceF8GW
d+Ov9LPJQJQTK+r6W7xPiCUkC1dj/SMZ756Gr6cGhXPzY1DgBoyaaoZV1K4mz17g
dlLwXfF4nIJqJFop3iTPVGWVoeapZ/tgu73iTUdkXIEbqj19wj67nw+xz0WGs1pB
B1H/OemQS4/yfo4IsfLRDAJ14Q+5JS4qRKBf7p4e/yj533BW6lia0GTdujO+N4eT
FQfnUoYaexkiPYwGMyjRpQ==
=X9Cg
-----END PGP SIGNATURE-----
Merge tag 'for-net-2021-06-03' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
bluetooth pull request for net:
- Fixes UAF and CVE-2021-3564
- Fix VIRTIO_ID_BT to use an unassigned ID
- Fix firmware loading on some Intel Controllers
Signed-off-by: David S. Miller <davem@davemloft.net>
Including <linux/in.h> and <netinet/in.h> in the dependencies breaks
compilation of trinity due to multiple definitions. <linux/in.h> is only
used in <linux/icmp.h> to provide the definition of the struct in_addr,
but this can be substituted out by using the datatype __be32.
Signed-off-by: Andreas Roeseler <andreas.a.roeseler@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It turned out that the VIRTIO_ID_* are not assigned in the virtio_ids.h
file in the upstream kernel. Picking the next free one was wrong and
there is a process that has been followed now.
See https://github.com/oasis-tcs/virtio-spec/issues/108 for details.
Fixes: afd2daa26c ("Bluetooth: Add support for virtio transport driver")
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Refactor DEVLINK_CMD_RATE_{GET|SET} command handlers to support setting
a node as a parent for another rate object (leaf or node) by means of
new attribute DEVLINK_ATTR_RATE_PARENT_NODE_NAME. Extend devlink ops
with new callbacks rate_{leaf|node}_parent_set() to set node as a parent
for rate object to allow supporting drivers to implement rate grouping
through devlink. Driver implementations are allowed to support leafs
or node children only. Invoking callback with NULL as parent should be
threated by the driver as unset parent action.
Extend rate object struct with reference counter to disallow deleting a
node with any child pointing to it. User should unset parent for the
child explicitly.
Example:
$ devlink port function rate add netdevsim/netdevsim10/group1
$ devlink port function rate add netdevsim/netdevsim10/group2
$ devlink port function rate set netdevsim/netdevsim10/group1 parent group2
$ devlink port function rate show netdevsim/netdevsim10/group1
netdevsim/netdevsim10/group1: type node parent group2
$ devlink port function rate set netdevsim/netdevsim10/group1 noparent
Co-developed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement support for DEVLINK_CMD_RATE_{NEW|DEL} commands that are used
to create and delete devlink rate nodes. Add new attribute
DEVLINK_ATTR_RATE_NODE_NAME that specify node name string. The node name
is an alphanumeric identifier. No valid node name can be a devlink port
index, eg. decimal number. Extend devlink ops with new callbacks
rate_node_{new|del}() and rate_node_tx_{share|max}_set() to allow
supporting drivers to implement ports rate grouping and setting tx rate
of rate nodes through devlink.
Expose devlink_rate_nodes_destroy() function to allow vendor driver do
proper cleanup of internally allocated resources for the nodes if the
driver goes down or due to any other reasons which requires nodes to be
destroyed.
Disallow moving device from switchdev to legacy mode if any node exists
on that device. User must explicitly delete nodes before switching mode.
Example:
$ devlink port function rate add netdevsim/netdevsim10/group1
$ devlink port function rate set netdevsim/netdevsim10/group1 \
tx_share 10mbit tx_max 100mbit
Add + set command can be combined:
$ devlink port function rate add netdevsim/netdevsim10/group1 \
tx_share 10mbit tx_max 100mbit
$ devlink port function rate show netdevsim/netdevsim10/group1
netdevsim/netdevsim10/group1: type node tx_share 10mbit tx_max 100mbit
$ devlink port function rate del netdevsim/netdevsim10/group1
Co-developed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement support for DEVLINK_CMD_RATE_SET command with new attributes
DEVLINK_ATTR_RATE_TX_{SHARE|MAX} that are used to set devlink rate
shared/max tx rate values. Extend devlink ops with new callbacks
rate_leaf_tx_{share|max}_set() to allow supporting drivers to implement
rate control through devlink.
New attributes are optional. Driver implementations are allowed to
support either or both of them.
Shared rate example:
$ devlink port function rate set netdevsim/netdevsim10/0 tx_share 10mbit
$ devlink port function rate show netdevsim/netdevsim10/0
netdevsim/netdevsim10/0: type leaf tx_share 10mbit
Max rate example:
$ devlink port function rate set netdevsim/netdevsim10/0 tx_max 100mbit
$ devlink port function rate show netdevsim/netdevsim10/0
netdevsim/netdevsim10/0: type leaf tx_max 100mbit
Co-developed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow registering rate object for devlink ports with dedicated
devlink_rate_leaf_{create|destroy}() API. Implement new netlink
DEVLINK_CMD_RATE_GET command that is used to retrieve rate object info.
Add new DEVLINK_CMD_RATE_{NEW|DEL} commands that are used for
notifications when creating/deleting leaf rate object.
Rate API is intended to be used for rate limiting of individual
devlink ports (leafs) and their aggregates (nodes).
Example:
$ devlink port show
pci/0000:03:00.0/0
pci/0000:03:00.0/1
$ devlink port function rate show
pci/0000:03:00.0/0: type leaf
pci/0000:03:00.0/1: type leaf
Co-developed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace BIT() in v4l2's UPAI header with _BITUL(). BIT() is not defined
in the UAPI headers and its usage may cause userspace build errors.
Fixes: 206bc0f6fb ("media: vicodec: mark the stateless FWHT API as stable")
Signed-off-by: Joe Richey <joerichey@google.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
If the audio_out_delay value is unused, then set it to 1, not 0.
The value 0 is reserved, and 1 is a much safer value since it
translates to a delay of (1 - 1) * 2 = 0 ms.
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmC0CoEeHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG8qYH/0bJOUUsj4fsBHY+
1fSTcbEZYxrdO6nYOzk34UZBeC7rGdhpFFVCU0fh/il8ec3K8pU9ypZykhvpFwGD
cY0zrfaNiEcsIinYZh/tMyfPK3JMqGrcOkKqYli5kSfW9BR0YzHBeiETcHMyZfMu
a8uI9nNYIZda0hwrXMbMNaFMH4Nif3kXwDxNNrx1THaTYifVZK+IwlM8TpcSPIK/
eNbU+5G/VBiQ9YE4PZyXVbsWMZcJ9NDRanx6aKMYTlwSDytxmd5HlURKJcfOiXnQ
AQq9VYbkkuw6FhBPnWvkYVJj1UwkySXjXyZkTiIH8iIMpJ406rWrQgIaQPI3o/98
iVDfH9k=
=t88B
-----END PGP SIGNATURE-----
Merge tag 'v5.13-rc4' into media_tree
Linux 5.13-rc4
* tag 'v5.13-rc4': (976 commits)
Linux 5.13-rc4
seccomp: Refactor notification handler to prepare for new semantics
selftests: kvm: fix overlapping addresses in memslot_perf_test
KVM: X86: Kill off ctxt->ud
KVM: X86: Fix warning caused by stale emulation context
KVM: X86: Use kvm_get_linear_rip() in single-step and #DB/#BP interception
Documentation: seccomp: Fix user notification documentation
MAINTAINERS: adjust to removing i2c designware platform data
perf vendor events powerpc: Fix eventcode of power10 JSON events
Revert "serial: 8250: 8250_omap: Fix possible interrupt storm"
i2c: s3c2410: fix possible NULL pointer deref on read message after write
i2c: mediatek: Disable i2c start_en and clear intr_stat brfore reset
perf stat: Fix error check for bpf_program__attach
cifs: change format of CIFS_FULL_KEY_DUMP ioctl
i2c: i801: Don't generate an interrupt on bus reset
i2c: mpc: implement erratum A-004447 workaround
powerpc/fsl: set fsl,i2c-erratum-a004447 flag for P1010 i2c controllers
powerpc/fsl: set fsl,i2c-erratum-a004447 flag for P2041 i2c controllers
dt-bindings: i2c: mpc: Add fsl,i2c-erratum-a004447 flag
i2c: busses: i2c-stm32f4: Remove incorrectly placed ' ' from function name
...
Pull HID fixes from Jiri Kosina:
- memory leak fix in usbhid from Anirudh Rayabharam
- additions for a few new recognized generic key IDs from Dmitry
Torokhov
- Asus T101HA and Dell K15A quirks from Hans de Goede
- memory leak fix in amd_sfh from Basavaraj Natikar
- Win8 compatibility and Stylus fixes in multitouch driver from
Ahelenia Ziemiańska
- NULL pointer dereference fix in hid-magicmouse from Johan Hovold
- assorted other small fixes and device ID additions
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: (33 commits)
HID: asus: Cleanup Asus T101HA keyboard-dock handling
HID: magicmouse: fix NULL-deref on disconnect
HID: intel-ish-hid: ipc: Add Alder Lake device IDs
HID: i2c-hid: fix format string mismatch
HID: amd_sfh: Fix memory leak in amd_sfh_work
HID: amd_sfh: Use devm_kzalloc() instead of kzalloc()
HID: ft260: improve error handling of ft260_hid_feature_report_get()
HID: magicmouse: fix crash when disconnecting Magic Trackpad 2
HID: gt683r: add missing MODULE_DEVICE_TABLE
HID: pidff: fix error return code in hid_pidff_init()
HID: logitech-hidpp: initialize level variable
HID: multitouch: Disable event reporting on suspend on the Asus T101HA touchpad
HID: core: Remove extraneous empty line before EXPORT_SYMBOL_GPL(hid_check_keys_pressed)
HID: hid-sensor-custom: Process failure of sensor_hub_set_feature()
HID: i2c-hid: Skip ELAN power-on command after reset
HID: usbhid: fix info leak in hid_submit_ctrl
HID: Add BUS_VIRTUAL to hid_connect logging
HID: multitouch: set Stylus suffix for Stylus-application devices, too
HID: multitouch: require Finger field to mark Win8 reports as MT
HID: remove the unnecessary redefinition of a macro
...
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for net-next:
1) Support for SCTP chunks matching on nf_tables, from Phil Sutter.
2) Skip LDMXCSR, we don't need a valid MXCSR state. From Stefano Brivio.
3) CONFIG_RETPOLINE for nf_tables set lookups, from Florian Westphal.
4) A few Kconfig leading spaces removal, from Juerg Haefliger.
5) Remove spinlock from xt_limit, from Jason Baron.
6) Remove useless initialization in xt_CT, oneliner from Yang Li.
7) Tree-wide replacement of netlink_unicast() by nfnetlink_unicast().
8) Reduce footprint of several structures: xt_action_param,
nft_pktinfo and nf_hook_state, from Florian.
10) Add nft_thoff() and nft_sk() helpers and use them, also from Florian.
11) Fix documentation in nf_tables pipapo avx2, from Florian Westphal.
12) Fix clang-12 fmt string warnings, also from Florian.
====================
Adding support for MAPv5 egress packets.
This involves adding the MAPv5 header and setting the csum_valid_required
in the checksum header to request HW compute the checksum.
Corresponding stats are incremented based on whether the checksum is
computed in software or HW.
New stat has been added which represents the count of packets whose
checksum is calculated by the HW.
Signed-off-by: Sharath Chandra Vurukala <sharathv@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adding support for processing of MAPv5 downlink packets.
It involves parsing the Mapv5 packet and checking the csum header
to know whether the hardware has validated the checksum and is
valid or not.
Based on the checksum valid bit the corresponding stats are
incremented and skb->ip_summed is marked either CHECKSUM_UNNECESSARY
or left as CHEKSUM_NONE to let network stack revalidate the checksum
and update the respective snmp stats.
Current MAPV1 header has been modified, the reserved field in the
Mapv1 header is now used for next header indication.
Signed-off-by: Sharath Chandra Vurukala <sharathv@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit dab741e0e0 ("Add a "nosymfollow" mount option.") added support
for the "nosymfollow" mount option allowing to block following symlinks
when resolving paths. The mount option so far was only available in the
old mount api. Make it available in the new mount api as well. Bonus is
that it can be applied to a whole subtree not just a single mount.
Cc: Christoph Hellwig <hch@lst.de>
Cc: Mattias Nissler <mnissler@chromium.org>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Ross Zwisler <zwisler@google.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
* Another state update on exit to userspace fix
* Prevent the creation of mixed 32/64 VMs
* Fix regression with irqbypass not restarting the guest on failed connect
* Fix regression with debug register decoding resulting in overlapping access
* Commit exception state on exit to usrspace
* Fix the MMU notifier return values
* Add missing 'static' qualifiers in the new host stage-2 code
x86 fixes:
* fix guest missed wakeup with assigned devices
* fix WARN reported by syzkaller
* do not use BIT() in UAPI headers
* make the kvm_amd.avic parameter bool
PPC fixes:
* make halt polling heuristics consistent with other architectures
selftests:
* various fixes
* new performance selftest memslot_perf_test
* test UFFD minor faults in demand_paging_test
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmCyF0MUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroOHSgf/Q4Hm5e12Bj2xJy6A+iShnrbbT8PW
hcIIOA7zGWXfjVYcBV7anbj7CcpzfIz0otcRBABa5mkhj+fb3YmPEb0EzCPi4Hru
zxpcpB2w7W7WtUOIKe2EmaT+4Pk6/iLcfr8UMHMqx460akE9OmIg10QNWai3My/3
RIOeakSckBI9e/1TQZbxH66dsLwCT0lLco7i7AWHdFxkzUQyoA34HX5pczOCBsO5
3nXH+/txnRVhqlcyzWLVVGVzFqmpHtBqkIInDOXfUqIoxo/gOhOgF1QdMUEKomxn
5ZFXlL5IXNtr+7yiI67iHX7CWkGZE9oJ04TgPHn6LR6wRnVvc3JInzcB5Q==
=ollO
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"ARM fixes:
- Another state update on exit to userspace fix
- Prevent the creation of mixed 32/64 VMs
- Fix regression with irqbypass not restarting the guest on failed
connect
- Fix regression with debug register decoding resulting in
overlapping access
- Commit exception state on exit to usrspace
- Fix the MMU notifier return values
- Add missing 'static' qualifiers in the new host stage-2 code
x86 fixes:
- fix guest missed wakeup with assigned devices
- fix WARN reported by syzkaller
- do not use BIT() in UAPI headers
- make the kvm_amd.avic parameter bool
PPC fixes:
- make halt polling heuristics consistent with other architectures
selftests:
- various fixes
- new performance selftest memslot_perf_test
- test UFFD minor faults in demand_paging_test"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (44 commits)
selftests: kvm: fix overlapping addresses in memslot_perf_test
KVM: X86: Kill off ctxt->ud
KVM: X86: Fix warning caused by stale emulation context
KVM: X86: Use kvm_get_linear_rip() in single-step and #DB/#BP interception
KVM: x86/mmu: Fix comment mentioning skip_4k
KVM: VMX: update vcpu posted-interrupt descriptor when assigning device
KVM: rename KVM_REQ_PENDING_TIMER to KVM_REQ_UNBLOCK
KVM: x86: add start_assignment hook to kvm_x86_ops
KVM: LAPIC: Narrow the timer latency between wait_lapic_expire and world switch
selftests: kvm: do only 1 memslot_perf_test run by default
KVM: X86: Use _BITUL() macro in UAPI headers
KVM: selftests: add shared hugetlbfs backing source type
KVM: selftests: allow using UFFD minor faults for demand paging
KVM: selftests: create alias mappings when using shared memory
KVM: selftests: add shmem backing source type
KVM: selftests: refactor vm_mem_backing_src_type flags
KVM: selftests: allow different backing source types
KVM: selftests: compute correct demand paging size
KVM: selftests: simplify setup_demand_paging error handling
KVM: selftests: Print a message if /dev/kvm is missing
...
Chunks are SCTP header extensions similar in implementation to IPv6
extension headers or TCP options. Reusing exthdr expression to find and
extract field values from them is therefore pretty straightforward.
For now, this supports extracting data from chunks at a fixed offset
(and length) only - chunks themselves are an extensible data structure;
in order to make all fields available, a nested extension search is
needed.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-----BEGIN PGP SIGNATURE-----
iQFHBAABCgAxFiEEK3kIWJt9yTYMP3ehqclaivrt76kFAmCvTYUTHG1rbEBwZW5n
dXRyb25peC5kZQAKCRCpyVqK+u3vqTxPB/4xVeasYKcyinylU7adBp9HFIvKjOiK
TpxQ7h4EhjJxkmQyONP529ZeQ5sjbnbc9IGkDQNhhVVm764LnEJ01aIi+kMtRs+M
szAGbWcITSyv4iaYCcKtNDSi2m74TK4gtRhsKItkIBRAZCs5jb54DSjWae7cGH0A
M/ts6WbYTbp89Lmww3mYtQ4dpmqvk/gXNbzKicrs2uGbPg0YTyq8rAQztt4yFaQR
9cBzxnwcfgvTz/uihkItiClv7kZIYjwzFB8BO1S2qx0TaUE1n78uiuuzcIdfvPt8
TKap7pwnjLYYToHokcWaU2t8Nd1Hy3KCaHuYO54TdXM4KYODJmbPydER
=pVNd
-----END PGP SIGNATURE-----
Merge tag 'linux-can-next-for-5.14-20210527' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
Marc Kleine-Budde says:
====================
can-next 2021-05-27
The first 2 patches are by Geert Uytterhoeven and convert the rcan_can
and rcan_canfd device tree bindings to yaml.
The next 2 patches are by Oliver Hartkopp and me and update the CAN
uapi headers.
zuoqilin's patch removes an unnecessary variable from the CAN proc
code.
Patrick Menschel contributes 3 patches for CAN ISOTP to enhance the
error messages.
Jiapeng Chong's patch removes two dead stores from the softing driver.
The next 4 patches are by me and silence several warnings found by
clang compiler.
Jimmy Assarsson's patches for the kvaser_usb driver add support for
the Kvaser hydra devices.
Dario Binacchi provides 2 patches for the c_can driver, first removing
an unused variable, then adding basic ethtool support to query driver
and ring parameter info.
The last 4 patches are by Torin Cooper-Bennun and clean up the m_can
driver.
* tag 'linux-can-next-for-5.14-20210527' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next: (21 commits)
can: m_can: fix whitespace in a few comments
can: m_can: make TXESC, RXESC config more explicit
can: m_can: clean up CCCR reg defs, order by revs
can: m_can: use bits.h macros for all regmasks
can: c_can: add ethtool support
can: c_can: remove unused variable struct c_can_priv::rxmasked
can: kvaser_usb: Add new Kvaser hydra devices
can: kvaser_usb: Rename define USB_HYBRID_{,PRO_}CANLIN_PRODUCT_ID
can: at91_can: silence clang warning
can: mcp251xfd: silence clang warning
can: mcp251x: mcp251x_can_probe(): silence clang warning
can: hi311x: hi3110_can_probe(): silence clang warning
can: softing: Remove redundant variable ptr
can: isotp: Add error message if txqueuelen is too small
can: isotp: add symbolic error message to isotp_module_init()
can: isotp: change error format from decimal to symbolic error names
can: proc: remove unnecessary variables
can: uapi: introduce CANFD_FDF flag for mixed content in struct canfd_frame
can: uapi: update CAN-FD frame description
dt-bindings: can: rcar_canfd: Convert to json-schema
...
====================
Link: https://lore.kernel.org/r/20210527084532.1384031-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Replace BIT() in KVM's UPAI header with _BITUL(). BIT() is not defined
in the UAPI headers and its usage may cause userspace build errors.
Fixes: fb04a1eddb ("KVM: X86: Implement ring-based dirty memory tracking")
Signed-off-by: Joe Richey <joerichey@google.com>
Message-Id: <20210521085849.37676-3-joerichey94@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmCqzFgeHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGIgQH/3nAV/fYbUCubEQe
RXUcjMGznIpdHeMiY/hPezObYnpBI3UAi2JwHCvQfoE8ckbx4tq8Xp+TUWebsdaf
zpDhKXDj2jHha1f5AixHCn1UFxiqOSn3d2muY2Bh1Nhg7iJuzU8xjIMCcOdss+fp
8e4wqidOHkpWvGJ96CQ5zCNxeXI+/f7VX2IgdJ+RCDwzbqJlIvvXwAkg1KrguUEz
EPmhpODqjPbVVc/mhtguMLMWl78WKCTBOSHCcYBolatXfm2ojsnX1hXprypWY4Mg
vKXxF/91AS8InCC08Jw+puz+fXDBx1jtNmFFhDOFTyz/TvwPaKZiWbAeXOZFJA2Z
Wm4su7g=
=cqxg
-----END PGP SIGNATURE-----
Merge v5.13-rc3 into drm-next
drm/i915 is extremely on fire without the below revert from -rc3:
commit 293837b9ac
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date: Wed May 19 05:55:57 2021 -1000
Revert "i915: fix remap_io_sg to verify the pgprot"
Backmerge so we don't have a too wide bisect window for anything
that's a more involved workload than booting the driver.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The struct can_frame and struct canfd_frame intentionally share the
same layout to be able to write CAN frame content into a CAN FD frame
structure. When this is done the former differentiation via CAN_MTU /
CANFD_MTU is lost. CANFD_FDF allows programmers to mark CAN FD frames
in the case of using struct canfd_frame for mixed CAN/CAN FD
content (dual use).
N.B. the Kernel APIs do NOT provide mixed CAN / CAN FD content inside
of struct canfd_frame therefore the CANFD_FDF flag is disregarded by
Linux.
Link: https://lore.kernel.org/r/20170411134343.3089-1-socketcan@hartkopp.net
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Since an early version of the CAN-FD specification the bit that
defines a CAN-FD frame on the wire, has been renamed from Extended
Data Length (EDL) to FD Frame (FDF).
To avoid confusion, update the struct canfd_frame description in the
UAPI headers accordingly.
Link: https://lore.kernel.org/r/20210517113727.77597-1-mkl@pengutronix.de
Suggested-by: Ayoub Kaanich <kayoub5@live.com>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
This patch adds two flags BPF_F_BROADCAST and BPF_F_EXCLUDE_INGRESS to
extend xdp_redirect_map for broadcast support.
With BPF_F_BROADCAST the packet will be broadcasted to all the interfaces
in the map. with BPF_F_EXCLUDE_INGRESS the ingress interface will be
excluded when do broadcasting.
When getting the devices in dev hash map via dev_map_hash_get_next_key(),
there is a possibility that we fall back to the first key when a device
was removed. This will duplicate packets on some interfaces. So just walk
the whole buckets to avoid this issue. For dev array map, we also walk the
whole map to find valid interfaces.
Function bpf_clear_redirect_map() was removed in
commit ee75aef23a ("bpf, xdp: Restructure redirect actions").
Add it back as we need to use ri->map again.
With test topology:
+-------------------+ +-------------------+
| Host A (i40e 10G) | ---------- | eno1(i40e 10G) |
+-------------------+ | |
| Host B |
+-------------------+ | |
| Host C (i40e 10G) | ---------- | eno2(i40e 10G) |
+-------------------+ | |
| +------+ |
| veth0 -- | Peer | |
| veth1 -- | | |
| veth2 -- | NS | |
| +------+ |
+-------------------+
On Host A:
# pktgen/pktgen_sample03_burst_single_flow.sh -i eno1 -d $dst_ip -m $dst_mac -s 64
On Host B(Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz, 128G Memory):
Use xdp_redirect_map and xdp_redirect_map_multi in samples/bpf for testing.
All the veth peers in the NS have a XDP_DROP program loaded. The
forward_map max_entries in xdp_redirect_map_multi is modify to 4.
Testing the performance impact on the regular xdp_redirect path with and
without patch (to check impact of additional check for broadcast mode):
5.12 rc4 | redirect_map i40e->i40e | 2.0M | 9.7M
5.12 rc4 | redirect_map i40e->veth | 1.7M | 11.8M
5.12 rc4 + patch | redirect_map i40e->i40e | 2.0M | 9.6M
5.12 rc4 + patch | redirect_map i40e->veth | 1.7M | 11.7M
Testing the performance when cloning packets with the redirect_map_multi
test, using a redirect map size of 4, filled with 1-3 devices:
5.12 rc4 + patch | redirect_map multi i40e->veth (x1) | 1.7M | 11.4M
5.12 rc4 + patch | redirect_map multi i40e->veth (x2) | 1.1M | 4.3M
5.12 rc4 + patch | redirect_map multi i40e->veth (x3) | 0.8M | 2.6M
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/20210519090747.1655268-3-liuhangbin@gmail.com
Extend the existing bpf_map_lookup_and_delete_elem() functionality to
hashtab map types, in addition to stacks and queues.
Create a new hashtab bpf_map_ops function that does lookup and deletion
of the element under the same bucket lock and add the created map_ops to
bpf.h.
Signed-off-by: Denis Salopek <denis.salopek@sartura.hr>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/4d18480a3e990ffbf14751ddef0325eed3be2966.1620763117.git.denis.salopek@sartura.hr
Until now, the MPEG-2 V4L2 API was not exported as a public API,
and only defined in a private media header (media/mpeg2-ctrls.h).
After reviewing the MPEG-2 specification in detail, and reworking
the controls so they match the MPEG-2 semantics properly,
we can consider it ready.
Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com>
Tested-by: Jernej Skrabec <jernej.skrabec@siol.net>
Reviewed-by: Jernej Skrabec <jernej.skrabec@siol.net>
Tested-by: Daniel Almeida <daniel.almeida@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Move the MPEG-2 stateless control types out of staging,
and re-number it to avoid any confusion.
Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com>
Tested-by: Jernej Skrabec <jernej.skrabec@siol.net>
Reviewed-by: Jernej Skrabec <jernej.skrabec@siol.net>
Tested-by: Daniel Almeida <daniel.almeida@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Pull siginfo fix from Eric Biederman:
"During the merge window an issue with si_perf and the siginfo ABI came
up. The alpha and sparc siginfo structure layout had changed with the
addition of SIGTRAP TRAP_PERF and the new field si_perf.
The reason only alpha and sparc were affected is that they are the
only architectures that use si_trapno.
Looking deeper it was discovered that si_trapno is used for only a few
select signals on alpha and sparc, and that none of the other
_sigfault fields past si_addr are used at all. Which means technically
no regression on alpha and sparc.
While the alignment concerns might be dismissed the abuse of si_errno
by SIGTRAP TRAP_PERF does have the potential to cause regressions in
existing userspace.
While we still have time before userspace starts using and depending
on the new definition siginfo for SIGTRAP TRAP_PERF this set of
changes cleans up siginfo_t.
- The si_trapno field is demoted from magic alpha and sparc status
and made an ordinary union member of the _sigfault member of
siginfo_t. Without moving it of course.
- si_perf is replaced with si_perf_data and si_perf_type ending the
abuse of si_errno.
- Unnecessary additions to signalfd_siginfo are removed"
* 'for-v5.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
signalfd: Remove SIL_PERF_EVENT fields from signalfd_siginfo
signal: Deliver all of the siginfo perf data in _perf
signal: Factor force_sig_perf out of perf_sigtrap
signal: Implement SIL_FAULT_TRAPNO
siginfo: Move si_trapno inside the union inside _si_fault
This file has been updated many times since 2010.
Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
amd-drm-next-5.14-2021-05-19:
amdgpu:
- Aldebaran updates
- More LTTPR display work
- Vangogh updates
- SDMA 5.x GCR fixes
- RAS fixes
- PCIe ASPM support
- Modifier fixes
- Enable TMZ on Renoir
- Buffer object code cleanup
- Display overlay fixes
- Initial support for multiple eDP panels
- Initial SR-IOV support for Aldebaran
- DP link training refactor
- Misc code cleanups and bug fixes
- SMU regression fixes for variable sized arrays
- MAINTAINERS fixes for amdgpu
amdkfd:
- Initial SR-IOV support for Aldebaran
- Topology fixes
- Initial HMM SVM support
- Misc code cleanups and bug fixes
radeon:
- Misc code cleanups and bug fixes
- SMU regression fixes for variable sized arrays
- Flickering fix for Oland with multiple 4K displays
UAPI:
- amdgpu: Drop AMDGPU_GEM_CREATE_SHADOW flag.
This was always a kernel internal flag and userspace use of it has always been blocked.
It's no longer needed so remove it.
- amdkgd: HMM SVM support
Overview: https://patchwork.freedesktop.org/series/85562/
Porposed userspace: https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/tree/fxkamd/hmm-wip
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210520031258.231896-1-alexander.deucher@amd.com
Alexei Starovoitov says:
====================
pull-request: bpf-next 2021-05-19
The following pull-request contains BPF updates for your *net-next* tree.
We've added 43 non-merge commits during the last 11 day(s) which contain
a total of 74 files changed, 3717 insertions(+), 578 deletions(-).
The main changes are:
1) syscall program type, fd array, and light skeleton, from Alexei.
2) Stop emitting static variables in skeleton, from Andrii.
3) Low level tc-bpf api, from Kumar.
4) Reduce verifier kmalloc/kfree churn, from Lorenz.
====================
Add BPF_PROG_RUN command as an alias to BPF_RPOG_TEST_RUN to better
indicate the full range of use cases done by the command.
Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20210519014032.20908-1-alexei.starovoitov@gmail.com
Define AT_MINSIGSTKSZ in the generic uapi header. It is already used
as generic ABI in glibc's generic elf.h, and this define will prevent
future namespace conflicts. In particular, x86 is also using this
generic definition.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Len Brown <len.brown@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20210518200320.17239-2-chang.seok.bae@intel.com
Add bpf_sys_close() helper to be used by the syscall/loader program to close
intermediate FDs and other cleanup.
Note this helper must never be allowed inside fdget/fdput bracketing.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-11-alexei.starovoitov@gmail.com
Add new helper:
long bpf_btf_find_by_name_kind(char *name, int name_sz, u32 kind, int flags)
Description
Find BTF type with given name and kind in vmlinux BTF or in module's BTFs.
Return
Returns btf_id and btf_obj_fd in lower and upper 32 bits.
It will be used by loader program to find btf_id to attach the program to
and to find btf_ids of ksyms.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-10-alexei.starovoitov@gmail.com
Typical program loading sequence involves creating bpf maps and applying
map FDs into bpf instructions in various places in the bpf program.
This job is done by libbpf that is using compiler generated ELF relocations
to patch certain instruction after maps are created and BTFs are loaded.
The goal of fd_idx is to allow bpf instructions to stay immutable
after compilation. At load time the libbpf would still create maps as usual,
but it wouldn't need to patch instructions. It would store map_fds into
__u32 fd_array[] and would pass that pointer to sys_bpf(BPF_PROG_LOAD).
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-9-alexei.starovoitov@gmail.com
Add placeholders for bpf_sys_bpf() helper and new program type.
Make sure to check that expected_attach_type is zero for future extensibility.
Allow tracing helper functions to be used in this program type, since they will
only execute from user context via bpf_prog_test_run.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-2-alexei.starovoitov@gmail.com
With the addition of ssi_perf_data and ssi_perf_type struct signalfd_siginfo
is dangerously close to running out of space. All that remains is just
enough space for two additional 64bit fields. A practice of adding all
possible siginfo_t fields into struct singalfd_siginfo can not be supported
as adding the missing fields ssi_lower, ssi_upper, and ssi_pkey would
require two 64bit fields and one 32bit fields. In practice the fields
ssi_perf_data and ssi_perf_type can never be used by signalfd as the signal
that generates them always delivers them synchronously to the thread that
triggers them.
Therefore until someone actually needs the fields ssi_perf_data and
ssi_perf_type in signalfd_siginfo remove them. This leaves a bit more room
for future expansion.
v1: https://lkml.kernel.org/r/20210503203814.25487-12-ebiederm@xmission.com
v2: https://lkml.kernel.org/r/20210505141101.11519-12-ebiederm@xmission.com
Link: https://lkml.kernel.org/r/20210517195748.8880-5-ebiederm@xmission.com
Reviewed-by: Marco Elver <elver@google.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmCexAAQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpo3SEADXxkU29LW0GPLvtQqaHSWiHZRHL1BHeqcI
tDRMx4Sch3wGJg6dKV9KoL1xRRTeKNpPFzVvQ0BLDrE+Brqu6S44AnWrYuRFvhAa
uOXsvyUCYkcB2y1ylGxPfj35ccAyFCi204/px96nz7+2J/0ASI5qaPXcZ7yf1yR/
HbN8oV3iM/pYohHlwFkEt785sMSqjDVNFA8gWWwCek+iF0Pp34J8ktesHEJubCJ+
R8B5YEK5JSfQ7ROQFlCYn/dS/DrP5mD6+1Yyy1iDumPkHgkxIz8tJ+z6h8jlyfhE
XqKPtSFVyE6LYyOk0m9j4lmNuNXWcIo1c5iiScMRvcvHyvVMMZoV0mjzSGI7HCsf
RoYQt8Ypi27Iei2EXph1V+WmpdYDhG55649m8ubn2YfMJbbep2+ya5DYZpWO1Ir+
Bof8idZkYFDZVSA6T9eBzMg/XwTvNI5WuwjCdD9tfO0s9R7OSVD0eZQNlLSJSjJA
c7N+jQkod+2uhgMzqGLSDvRze/0BOaN25Xt+R7bbOEG+k/mBd8+xgPIemAPKmS93
s6Ia87SRFdYpcJkxoIPJ6Tqky3QTcmSApTZ9ckYVUCxo8IGSsYV5gaoKX6G4O9nm
eewhdiN7si65f1duDkjXEySQ2eBPqwWpA0/w/O1WUwPDJdIYhXU2d1zDdnVGh0nH
NUcsJD1UDQ==
=JpHn
-----END PGP SIGNATURE-----
Merge tag 'block-5.13-2021-05-14' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- Fix for shared tag set exit (Bart)
- Correct ioctl range for zoned ioctls (Damien)
- Removed dead/unused function (Lin)
- Fix perf regression for shared tags (Ming)
- Fix out-of-bounds issue with kyber and preemption (Omar)
- BFQ merge fix (Paolo)
- Two error handling fixes for nbd (Sun)
- Fix weight update in blk-iocost (Tejun)
- NVMe pull request (Christoph):
- correct the check for using the inline bio in nvmet (Chaitanya
Kulkarni)
- demote unsupported command warnings (Chaitanya Kulkarni)
- fix corruption due to double initializing ANA state (me, Hou Pu)
- reset ns->file when open fails (Daniel Wagner)
- fix a NULL deref when SEND is completed with error in nvmet-rdma
(Michal Kalderon)
- Fix kernel-doc warning (Bart)
* tag 'block-5.13-2021-05-14' of git://git.kernel.dk/linux-block:
block/partitions/efi.c: Fix the efi_partition() kernel-doc header
blk-mq: Swap two calls in blk_mq_exit_queue()
blk-mq: plug request for shared sbitmap
nvmet: use new ana_log_size instead the old one
nvmet: seset ns->file when open fails
nbd: share nbd_put and return by goto put_nbd
nbd: Fix NULL pointer in flush_workqueue
blkdev.h: remove unused codes blk_account_rq
block, bfq: avoid circular stable merges
blk-iocost: fix weight updates of inner active iocgs
nvmet: demote fabrics cmd parse err msg to debug
nvmet: use helper to remove the duplicate code
nvmet: demote discovery cmd parse err msg to debug
nvmet-rdma: Fix NULL deref when SEND is completed with error
nvmet: fix inline bio check for passthru
nvmet: fix inline bio check for bdev-ns
nvme-multipath: fix double initialization of ANA state
kyber: fix out of bounds access when preempted
block: uapi: fix comment about block device ioctl
Now that we have split the multicast router state into two, one for IPv4
and one for IPv6, also add individual timers to the mdb netlink router
port dump. Leaving the old timer attribute for backwards compatibility.
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: David S. Miller <davem@davemloft.net>
Noone stepped up in the past two years since it was marked as BROKEN by
commit c7084edc3f (tty: mark Siemens R3964 line discipline as BROKEN).
Remove the line discipline for good.
Three remarks:
* we remove also the uapi header (as noone is able to use that interface
anyway)
* we do *not* remove the N_R3964 constant definition from tty.h, so it
remains reserved.
* in_interrupt() check is now removed from vt's con_put_char. Noone else
calls tty_operations::put_char from interrupt context.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Link: https://lore.kernel.org/r/20210505091928.22010-2-jslaby@suse.cz
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch provides support for setting and copying core scheduling
'task cookies' between threads (PID), processes (TGID), and process
groups (PGID).
The value of core scheduling isn't that tasks don't share a core,
'nosmt' can do that. The value lies in exploiting all the sharing
opportunities that exist to recover possible lost performance and that
requires a degree of flexibility in the API.
From a security perspective (and there are others), the thread,
process and process group distinction is an existent hierarchal
categorization of tasks that reflects many of the security concerns
about 'data sharing'. For example, protecting against cache-snooping
by a thread that can just read the memory directly isn't all that
useful.
With this in mind, subcommands to CREATE/SHARE (TO/FROM) provide a
mechanism to create and share cookies. CREATE/SHARE_TO specify a
target pid with enum pidtype used to specify the scope of the targeted
tasks. For example, PIDTYPE_TGID will share the cookie with the
process and all of it's threads as typically desired in a security
scenario.
API:
prctl(PR_SCHED_CORE, PR_SCHED_CORE_GET, tgtpid, pidtype, &cookie)
prctl(PR_SCHED_CORE, PR_SCHED_CORE_CREATE, tgtpid, pidtype, NULL)
prctl(PR_SCHED_CORE, PR_SCHED_CORE_SHARE_TO, tgtpid, pidtype, NULL)
prctl(PR_SCHED_CORE, PR_SCHED_CORE_SHARE_FROM, srcpid, pidtype, NULL)
where 'tgtpid/srcpid == 0' implies the current process and pidtype is
kernel enum pid_type {PIDTYPE_PID, PIDTYPE_TGID, PIDTYPE_PGID, ...}.
For return values, EINVAL, ENOMEM are what they say. ESRCH means the
tgtpid/srcpid was not found. EPERM indicates lack of PTRACE permission
access to tgtpid/srcpid. ENODEV indicates your machines lacks SMT.
[peterz: complete rewrite]
Signed-off-by: Chris Hyser <chris.hyser@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Don Hiatt <dhiatt@digitalocean.com>
Tested-by: Hongyu Ning <hongyu.ning@linux.intel.com>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210422123309.039845339@infradead.org
Fix the comment mentioning ioctl command range used for zoned block
devices to reflect the range of commands actually implemented.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Link: https://lore.kernel.org/r/20210509234806.3000-1-damien.lemoal@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
and netfilter trees. Self-contained fixes, nothing risky.
Current release - new code bugs:
- dsa: ksz: fix a few bugs found by static-checker in the new driver
- stmmac: fix frame preemption handshake not triggering after
interface restart
Previous releases - regressions:
- make nla_strcmp handle more then one trailing null character
- fix stack OOB reads while fragmenting IPv4 packets in openvswitch
and net/sched
- sctp: do asoc update earlier in sctp_sf_do_dupcook_a
- sctp: delay auto_asconf init until binding the first addr
- stmmac: clear receive all(RA) bit when promiscuous mode is off
- can: mcp251x: fix resume from sleep before interface was brought up
Previous releases - always broken:
- bpf: fix leakage of uninitialized bpf stack under speculation
- bpf: fix masking negation logic upon negative dst register
- netfilter: don't assume that skb_header_pointer() will never fail
- only allow init netns to set default tcp cong to a restricted algo
- xsk: fix xp_aligned_validate_desc() when len == chunk_size to
avoid false positive errors
- ethtool: fix missing NLM_F_MULTI flag when dumping
- can: m_can: m_can_tx_work_queue(): fix tx_skb race condition
- sctp: fix a SCTP_MIB_CURRESTAB leak in sctp_sf_do_dupcook_b
- bridge: fix NULL-deref caused by a races between assigning
rx_handler_data and setting the IFF_BRIDGE_PORT bit
Latecomer:
- seg6: add counters support for SRv6 Behaviors
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmCV3YoACgkQMUZtbf5S
IrsQ2w//Q8/qbl6wGTKUfu6DZHYUU5j5sTwiHR823PKKSgXI+okWMN0KUlZszOsz
qnPkH6GuojRooOE1s8PFLSlt9axKhQ0y7uzMTrWYafQ+JZTtgg9/MiPxQ8fdiE5i
uOG1ngttZ+1jlE5tMPL4GAOSegg3rWVDclzqnJTdsPPOco3MWj6SL9xN0LDPxCEL
BDysRqL/UiOIoh4v6IXQRx2UWjsNGu4biM1po+Jfumnd9T0zKoEpzu6UN6yPShbx
284LihZSQtughCbhGqkErBOxfjZcvpFOQrqmjEvI+Z/eYg4InfWZemt8Sa92/alE
yAFjK76MUTaUxaAO/gk8XauhvkYOzJJwKpqhbOmlaM7oj55QdzT5/8JxMxVoA6hV
pscHOixk15GVse49PdPV8v47cyTLc/Xi69i+/uUdNVVfuORL1wft1w1xbd0S6Pbe
7Gqax21S7zxcDsrUli7cFheYiqtbQAL0anlIUz8tUOZFz0VQ/zPuFd4rUYZ/o38V
Mrevdk3t6CXNxS4CRXyUW4UejYB1O6Qw12sUue31e3h73d6LiN3NAiN5Qp7SEk1/
fvk+jfOf8vvmtimYvcUK2i0D+vqj4Ec/qRIE/XXuUDBcp22tPL9uWMfWavwTdAj1
Se4SzksTWF+NM0lO0ItonMyPh3ZXcSLhIv/gHrZwEKuWkXCGO4M=
=JmWS
-----END PGP SIGNATURE-----
Merge tag 'net-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Networking fixes for 5.13-rc1, including fixes from bpf, can and
netfilter trees. Self-contained fixes, nothing risky.
Current release - new code bugs:
- dsa: ksz: fix a few bugs found by static-checker in the new driver
- stmmac: fix frame preemption handshake not triggering after
interface restart
Previous releases - regressions:
- make nla_strcmp handle more then one trailing null character
- fix stack OOB reads while fragmenting IPv4 packets in openvswitch
and net/sched
- sctp: do asoc update earlier in sctp_sf_do_dupcook_a
- sctp: delay auto_asconf init until binding the first addr
- stmmac: clear receive all(RA) bit when promiscuous mode is off
- can: mcp251x: fix resume from sleep before interface was brought up
Previous releases - always broken:
- bpf: fix leakage of uninitialized bpf stack under speculation
- bpf: fix masking negation logic upon negative dst register
- netfilter: don't assume that skb_header_pointer() will never fail
- only allow init netns to set default tcp cong to a restricted algo
- xsk: fix xp_aligned_validate_desc() when len == chunk_size to avoid
false positive errors
- ethtool: fix missing NLM_F_MULTI flag when dumping
- can: m_can: m_can_tx_work_queue(): fix tx_skb race condition
- sctp: fix a SCTP_MIB_CURRESTAB leak in sctp_sf_do_dupcook_b
- bridge: fix NULL-deref caused by a races between assigning
rx_handler_data and setting the IFF_BRIDGE_PORT bit
Latecomer:
- seg6: add counters support for SRv6 Behaviors"
* tag 'net-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (73 commits)
atm: firestream: Use fallthrough pseudo-keyword
net: stmmac: Do not enable RX FIFO overflow interrupts
mptcp: fix splat when closing unaccepted socket
i40e: Remove LLDP frame filters
i40e: Fix PHY type identifiers for 2.5G and 5G adapters
i40e: fix the restart auto-negotiation after FEC modified
i40e: Fix use-after-free in i40e_client_subtask()
i40e: fix broken XDP support
netfilter: nftables: avoid potential overflows on 32bit arches
netfilter: nftables: avoid overflows in nft_hash_buckets()
tcp: Specify cmsgbuf is user pointer for receive zerocopy.
mlxsw: spectrum_mr: Update egress RIF list before route's action
net: ipa: fix inter-EE IRQ register definitions
can: m_can: m_can_tx_work_queue(): fix tx_skb race condition
can: mcp251x: fix resume from sleep before interface was brought up
can: mcp251xfd: mcp251xfd_probe(): add missing can_rx_offload_del() in error path
can: mcp251xfd: mcp251xfd_probe(): fix an error pointer dereference in probe
netfilter: nftables: Fix a memleak from userdata error path in new objects
netfilter: remove BUG_ON() after skb_header_pointer()
netfilter: nfnetlink_osf: Fix a missing skb_header_pointer() NULL check
...
The section "19) Editor modelines and other cruft" in
Documentation/process/coding-style.rst clearly says, "Do not include any
of these in source files."
I recently receive a patch to explicitly add a new one.
Let's do treewide cleanups, otherwise some people follow the existing code
and attempt to upstream their favoriate editor setups.
It is even nicer if scripts/checkpatch.pl can check it.
If we like to impose coding style in an editor-independent manner, I think
editorconfig (patch [1]) is a saner solution.
[1] https://lore.kernel.org/lkml/20200703073143.423557-1-danny@kdrag0n.dev/
Link: https://lkml.kernel.org/r/20210324054457.1477489-1-masahiroy@kernel.org
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Miguel Ojeda <ojeda@kernel.org> [auxdisplay]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Support for the memtest= kernel command-line argument.
* Support for building the kernel with FORTIFY_SOURCE.
* Support for generic clockevent broadcasts.
* Support for the buildtar build target.
* Some build system cleanups to pass more LLVM-friendly arguments.
* Support for kprobes.
* A rearranged kernel memory map, the first part of supporting sv48
systems.
* Improvements to kexec, along with support for kdump and crash kernels.
* An alternatives-based errata framework, along with support for
handling a pair of errata that manifest on some SiFive designs
(including the HiFive Unmatched).
* Support for XIP.
* A device tree for the Microchip PolarFire ICICLE SoC and associated
dev board.
Along with a bunch of cleanups. There are already a handful of fixes
on the list so there will likely be a part 2.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmCS4lITHHBhbG1lckBk
YWJiZWx0LmNvbQAKCRAuExnzX7sYieZqEACSihfcOgZ/oyGWN3chca917/yCWimM
DOu37Zlh81TNPgzzJwbT44IY5sg/lSecwktxs665TChiJjr3JlM4jmz+u64KOTA8
mTWhqZNr5zT9kFj/m3x0V9yYOVr9g43QRmIlc14d+8JaQDw0N8WeH/yK85/CXDSS
X5gQK/e9q/yPf/NPyPuPm67jDsFnJERINWaAHI8lhA5fvFyy/xRLmSkuexchysss
XOGfyxxX590jGLK1vD+5wccX7ZwfwU4jriTaxyah/VBl8QUur/xSPVyspHIdWiMG
jrNXI1dg6oI861BdjryUpZI0iYJaRe5FRWUx7uTIqHfIyL/MnvYI7USVYOOPb72M
yZgN903R++5NeUUVTzfXwaigTwfXAPB6USFqZpEfRAf204pgNybmznJWThAVBdYG
rUixp7GsEMU3aAT2tE/iHR33JQxQfnZq8Tg43/4gB7MoACrzQrYrGcPnj9xssMyV
F1hnao3dr+5Xjo3MwfkW9JvLPwvDuE3mdrdj+a0XZ45gbTJeuBhYxo3VOsFeijhQ
gf/VYuoNn5iae9fiMzx5rlmFT9NJDYKDhla+BpAel84/6nRryyfCZCaE5FvDynOO
CNQynaeJMIMEygPBYR9FVVCwm+EtVsz3NVFKEuo5ilQpgX8ipctxiqy2+moZALLN
OWlEH6BKEgXqkw==
=PsA8
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-5.13-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V updates from Palmer Dabbelt:
- Support for the memtest= kernel command-line argument.
- Support for building the kernel with FORTIFY_SOURCE.
- Support for generic clockevent broadcasts.
- Support for the buildtar build target.
- Some build system cleanups to pass more LLVM-friendly arguments.
- Support for kprobes.
- A rearranged kernel memory map, the first part of supporting sv48
systems.
- Improvements to kexec, along with support for kdump and crash
kernels.
- An alternatives-based errata framework, along with support for
handling a pair of errata that manifest on some SiFive designs
(including the HiFive Unmatched).
- Support for XIP.
- A device tree for the Microchip PolarFire ICICLE SoC and associated
dev board.
... along with a bunch of cleanups. There are already a handful of fixes
on the list so there will likely be a part 2.
* tag 'riscv-for-linus-5.13-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (45 commits)
RISC-V: Always define XIP_FIXUP
riscv: Remove 32b kernel mapping from page table dump
riscv: Fix 32b kernel build with CONFIG_DEBUG_VIRTUAL=y
RISC-V: Fix error code returned by riscv_hartid_to_cpuid()
RISC-V: Enable Microchip PolarFire ICICLE SoC
RISC-V: Initial DTS for Microchip ICICLE board
dt-bindings: riscv: microchip: Add YAML documentation for the PolarFire SoC
RISC-V: Add Microchip PolarFire SoC kconfig option
RISC-V: enable XIP
RISC-V: Add crash kernel support
RISC-V: Add kdump support
RISC-V: Improve init_resources()
RISC-V: Add kexec support
RISC-V: Add EM_RISCV to kexec UAPI header
riscv: vdso: fix and clean-up Makefile
riscv/mm: Use BUG_ON instead of if condition followed by BUG.
riscv/kprobe: fix kernel panic when invoking sys_read traced by kprobe
riscv: Set ARCH_HAS_STRICT_MODULE_RWX if MMU
riscv: module: Create module allocations without exec permissions
riscv: bpf: Avoid breaking W^X
...
Merge more updates from Andrew Morton:
"The remainder of the main mm/ queue.
143 patches.
Subsystems affected by this patch series (all mm): pagecache, hugetlb,
userfaultfd, vmscan, compaction, migration, cma, ksm, vmstat, mmap,
kconfig, util, memory-hotplug, zswap, zsmalloc, highmem, cleanups, and
kfence"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (143 commits)
kfence: use power-efficient work queue to run delayed work
kfence: maximize allocation wait timeout duration
kfence: await for allocation using wait_event
kfence: zero guard page after out-of-bounds access
mm/process_vm_access.c: remove duplicate include
mm/mempool: minor coding style tweaks
mm/highmem.c: fix coding style issue
btrfs: use memzero_page() instead of open coded kmap pattern
iov_iter: lift memzero_page() to highmem.h
mm/zsmalloc: use BUG_ON instead of if condition followed by BUG.
mm/zswap.c: switch from strlcpy to strscpy
arm64/Kconfig: introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
x86/Kconfig: introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
mm,memory_hotplug: add kernel boot option to enable memmap_on_memory
acpi,memhotplug: enable MHP_MEMMAP_ON_MEMORY when supported
mm,memory_hotplug: allocate memmap from the added memory range
mm,memory_hotplug: factor out adjusting present pages into adjust_present_page_count()
mm,memory_hotplug: relax fully spanned sections check
drivers/base/memory: introduce memory_block_{online,offline}
mm/memory_hotplug: remove broken locking of zone PCP structures during hot remove
...
- Fix spellos in comments for the tegra and sun8i (Bhaskar Chowdhury)
- Add the missing fifth node on the rcar_gen3 sensor (Niklas
Söderlund)
- Remove duplicate include in ti-bandgap (Zhang Yunkai)
- Assign error code in the error path in the function
thermal_of_populate_bind_params() (Jia-Ju Bai)
- Fix spelling mistake in a comment 'disabed' -> 'disabled' (Colin Ian
King)
- Use the device name instead of auto-numbering for a better
identification of the cooling device (Daniel Lezcano)
- Improve a bit the division accuracy in the power allocator governor
(Jeson Gao)
- Enable the missing third sensor on msm8976 (Konrad Dybcio)
- Add QCom tsens driver co-maintainer (Thara Gopinath)
- Fix memory leak and use after free errors in the core code (Daniel
Lezcano)
- Add the MDM9607 compatible bindings (Konrad Dybcio)
- Fix trivial spello in the copyright name for Hisilicon (Hao Fang)
- Fix negative index array access when converting the frequency to
power in the energy model (Brian-sy Yang)
- Add support for Gen2 new PMIC support for Qcom SPMI (David Collins)
- Update maintainer file for CPU cooling device section (Lukasz Luba)
- Fix missing put_device on error in the Qcom tsens driver (Guangqing
Zhu)
- Add compatible DT binding for sm8350 (Robert Foss)
- Add support for the MDM9607's tsens driver (Konrad Dybcio)
- Remove duplicate error messages in thermal_mmio and the bcm2835
driver (Ruiqi Gong)
- Add the Thermal Temperature Cooling driver (Zhang Rui)
- Remove duplicate error messages in the Hisilicon sensor driver (Ye
Bin)
- Use the devm_platform_ioremap_resource_byname() function instead of
a couple of corresponding calls (dingsenjie)
- Sort the headers alphabetically in the ti-bandgap driver (Zhen Lei)
- Add missing property in the DT thermal sensor binding (Rafał
Miłecki)
- Remove dead code in the ti-bandgap sensor driver (Lin Ruizhe)
- Convert the BRCM DT bindings to the yaml schema (Rafał Miłecki)
- Replace the thermal_notify_framework() call by a call to the
thermal_zone_device_update() function. Remove the function as well
as the corresponding documentation (Thara Gopinath)
- Add support for the ipq8064-tsens sensor along with a set of
cleanups and code preparation (Ansuel Smith)
- Add a lockless __thermal_cdev_update() function to improve the
locking scheme in the core code and governors (Lukasz Luba)
- Fix multiple cooling device notification changes (Lukasz Luba)
- Remove unneeded variable initialization (Colin Ian King)
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGn3N4YVz0WNVyHskqDIjiipP6E8FAmCRqDIACgkQqDIjiipP
6E8O2Qf5AQvSVoN9WYRBLo1+a4mkGsJ/wHQMEsOA4FVHft5/QVkRtpMNbSiyq00O
YTpNuoBqiYm/tSTyzK/5Oh+0ucgm/ef4c4dTyPjZYw2GB+3rYNRAXdX/tB6Ggjl/
oUArUCoSQZjOU6Y573B05rcHp1PVM/XL9LgD1uX76tXA1MaGvsyC0cyPRAdOANke
W83BWI0XMhv8B1bZwHVB2Oft5x6HhqWBl3HKbNOmPEMtwkqqBCFAqB0wNEH88ZTf
2hyBjBoZQHdMkJsC0piMvIyAjHZiIjQB47VWz31EvKB3/E28xCqRqPViPq9QbrA5
got0+oDbxI96T024ndXRomc0SSxZnw==
=5THg
-----END PGP SIGNATURE-----
Merge tag 'thermal-v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux
Pull thermal updates from Daniel Lezcano:
- Remove duplicate error message for the amlogic driver (Tang Bin)
- Fix spellos in comments for the tegra and sun8i (Bhaskar Chowdhury)
- Add the missing fifth node on the rcar_gen3 sensor (Niklas Söderlund)
- Remove duplicate include in ti-bandgap (Zhang Yunkai)
- Assign error code in the error path in the function
thermal_of_populate_bind_params() (Jia-Ju Bai)
- Fix spelling mistake in a comment 'disabed' -> 'disabled' (Colin Ian
King)
- Use the device name instead of auto-numbering for a better
identification of the cooling device (Daniel Lezcano)
- Improve a bit the division accuracy in the power allocator governor
(Jeson Gao)
- Enable the missing third sensor on msm8976 (Konrad Dybcio)
- Add QCom tsens driver co-maintainer (Thara Gopinath)
- Fix memory leak and use after free errors in the core code (Daniel
Lezcano)
- Add the MDM9607 compatible bindings (Konrad Dybcio)
- Fix trivial spello in the copyright name for Hisilicon (Hao Fang)
- Fix negative index array access when converting the frequency to
power in the energy model (Brian-sy Yang)
- Add support for Gen2 new PMIC support for Qcom SPMI (David Collins)
- Update maintainer file for CPU cooling device section (Lukasz Luba)
- Fix missing put_device on error in the Qcom tsens driver (Guangqing
Zhu)
- Add compatible DT binding for sm8350 (Robert Foss)
- Add support for the MDM9607's tsens driver (Konrad Dybcio)
- Remove duplicate error messages in thermal_mmio and the bcm2835
driver (Ruiqi Gong)
- Add the Thermal Temperature Cooling driver (Zhang Rui)
- Remove duplicate error messages in the Hisilicon sensor driver (Ye
Bin)
- Use the devm_platform_ioremap_resource_byname() function instead of a
couple of corresponding calls (dingsenjie)
- Sort the headers alphabetically in the ti-bandgap driver (Zhen Lei)
- Add missing property in the DT thermal sensor binding (Rafał Miłecki)
- Remove dead code in the ti-bandgap sensor driver (Lin Ruizhe)
- Convert the BRCM DT bindings to the yaml schema (Rafał Miłecki)
- Replace the thermal_notify_framework() call by a call to the
thermal_zone_device_update() function. Remove the function as well as
the corresponding documentation (Thara Gopinath)
- Add support for the ipq8064-tsens sensor along with a set of cleanups
and code preparation (Ansuel Smith)
- Add a lockless __thermal_cdev_update() function to improve the
locking scheme in the core code and governors (Lukasz Luba)
- Fix multiple cooling device notification changes (Lukasz Luba)
- Remove unneeded variable initialization (Colin Ian King)
* tag 'thermal-v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux: (55 commits)
thermal/drivers/mtk_thermal: Remove redundant initializations of several variables
thermal/core/power allocator: Use the lockless __thermal_cdev_update() function
thermal/core/fair share: Use the lockless __thermal_cdev_update() function
thermal/core/fair share: Lock the thermal zone while looping over instances
thermal/core/power_allocator: Update once cooling devices when temp is low
thermal/core/power_allocator: Maintain the device statistics from going stale
thermal/core: Create a helper __thermal_cdev_update() without a lock
dt-bindings: thermal: tsens: Document ipq8064 bindings
thermal/drivers/tsens: Add support for ipq8064-tsens
thermal/drivers/tsens: Drop unused define for msm8960
thermal/drivers/tsens: Replace custom 8960 apis with generic apis
thermal/drivers/tsens: Fix bug in sensor enable for msm8960
thermal/drivers/tsens: Use init_common for msm8960
thermal/drivers/tsens: Add VER_0 tsens version
thermal/drivers/tsens: Convert msm8960 to reg_field
thermal/drivers/tsens: Don't hardcode sensor slope
Documentation: driver-api: thermal: Remove thermal_notify_framework from documentation
thermal/core: Remove thermal_notify_framework
iwlwifi: mvm: tt: Replace thermal_notify_framework
dt-bindings: thermal: brcm,ns-thermal: Convert to the json-schema
...
It is currently not obvious that the RECLAIM_* bits are part of the uapi
since they are defined in vmscan.c. Move them to a uapi header to make it
obvious.
This should have no functional impact.
Link: https://lkml.kernel.org/r/20210219172557.08074910@viggo.jf.intel.com
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Ben Widawsky <ben.widawsky@intel.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Daniel Wagner <dwagner@suse.de>
Cc: "Tobin C. Harding" <tobin@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This ioctl is how userspace ought to resolve "minor" userfaults. The
idea is, userspace is notified that a minor fault has occurred. It
might change the contents of the page using its second non-UFFD mapping,
or not. Then, it calls UFFDIO_CONTINUE to tell the kernel "I have
ensured the page contents are correct, carry on setting up the mapping".
Note that it doesn't make much sense to use UFFDIO_{COPY,ZEROPAGE} for
MINOR registered VMAs. ZEROPAGE maps the VMA to the zero page; but in
the minor fault case, we already have some pre-existing underlying page.
Likewise, UFFDIO_COPY isn't useful if we have a second non-UFFD mapping.
We'd just use memcpy() or similar instead.
It turns out hugetlb_mcopy_atomic_pte() already does very close to what
we want, if an existing page is provided via `struct page **pagep`. We
already special-case the behavior a bit for the UFFDIO_ZEROPAGE case, so
just extend that design: add an enum for the three modes of operation,
and make the small adjustments needed for the MCOPY_ATOMIC_CONTINUE
case. (Basically, look up the existing page, and avoid adding the
existing page to the page cache or calling set_page_huge_active() on
it.)
Link: https://lkml.kernel.org/r/20210301222728.176417-5-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Adam Ruprecht <ruprecht@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Cannon Matthews <cannonmatthews@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chinwen Chang <chinwen.chang@mediatek.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michal Koutn" <mkoutny@suse.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shawn Anastasio <shawn@anastas.io>
Cc: Steven Price <steven.price@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "userfaultfd: add minor fault handling", v9.
Overview
========
This series adds a new userfaultfd feature, UFFD_FEATURE_MINOR_HUGETLBFS.
When enabled (via the UFFDIO_API ioctl), this feature means that any
hugetlbfs VMAs registered with UFFDIO_REGISTER_MODE_MISSING will *also*
get events for "minor" faults. By "minor" fault, I mean the following
situation:
Let there exist two mappings (i.e., VMAs) to the same page(s) (shared
memory). One of the mappings is registered with userfaultfd (in minor
mode), and the other is not. Via the non-UFFD mapping, the underlying
pages have already been allocated & filled with some contents. The UFFD
mapping has not yet been faulted in; when it is touched for the first
time, this results in what I'm calling a "minor" fault. As a concrete
example, when working with hugetlbfs, we have huge_pte_none(), but
find_lock_page() finds an existing page.
We also add a new ioctl to resolve such faults: UFFDIO_CONTINUE. The idea
is, userspace resolves the fault by either a) doing nothing if the
contents are already correct, or b) updating the underlying contents using
the second, non-UFFD mapping (via memcpy/memset or similar, or something
fancier like RDMA, or etc...). In either case, userspace issues
UFFDIO_CONTINUE to tell the kernel "I have ensured the page contents are
correct, carry on setting up the mapping".
Use Case
========
Consider the use case of VM live migration (e.g. under QEMU/KVM):
1. While a VM is still running, we copy the contents of its memory to a
target machine. The pages are populated on the target by writing to the
non-UFFD mapping, using the setup described above. The VM is still running
(and therefore its memory is likely changing), so this may be repeated
several times, until we decide the target is "up to date enough".
2. We pause the VM on the source, and start executing on the target machine.
During this gap, the VM's user(s) will *see* a pause, so it is desirable to
minimize this window.
3. Between the last time any page was copied from the source to the target, and
when the VM was paused, the contents of that page may have changed - and
therefore the copy we have on the target machine is out of date. Although we
can keep track of which pages are out of date, for VMs with large amounts of
memory, it is "slow" to transfer this information to the target machine. We
want to resume execution before such a transfer would complete.
4. So, the guest begins executing on the target machine. The first time it
touches its memory (via the UFFD-registered mapping), userspace wants to
intercept this fault. Userspace checks whether or not the page is up to date,
and if not, copies the updated page from the source machine, via the non-UFFD
mapping. Finally, whether a copy was performed or not, userspace issues a
UFFDIO_CONTINUE ioctl to tell the kernel "I have ensured the page contents
are correct, carry on setting up the mapping".
We don't have to do all of the final updates on-demand. The userfaultfd manager
can, in the background, also copy over updated pages once it receives the map of
which pages are up-to-date or not.
Interaction with Existing APIs
==============================
Because this is a feature, a registered VMA could potentially receive both
missing and minor faults. I spent some time thinking through how the
existing API interacts with the new feature:
UFFDIO_CONTINUE cannot be used to resolve non-minor faults, as it does not
allocate a new page. If UFFDIO_CONTINUE is used on a non-minor fault:
- For non-shared memory or shmem, -EINVAL is returned.
- For hugetlb, -EFAULT is returned.
UFFDIO_COPY and UFFDIO_ZEROPAGE cannot be used to resolve minor faults.
Without modifications, the existing codepath assumes a new page needs to
be allocated. This is okay, since userspace must have a second
non-UFFD-registered mapping anyway, thus there isn't much reason to want
to use these in any case (just memcpy or memset or similar).
- If UFFDIO_COPY is used on a minor fault, -EEXIST is returned.
- If UFFDIO_ZEROPAGE is used on a minor fault, -EEXIST is returned (or -EINVAL
in the case of hugetlb, as UFFDIO_ZEROPAGE is unsupported in any case).
- UFFDIO_WRITEPROTECT simply doesn't work with shared memory, and returns
-ENOENT in that case (regardless of the kind of fault).
Future Work
===========
This series only supports hugetlbfs. I have a second series in flight to
support shmem as well, extending the functionality. This series is more
mature than the shmem support at this point, and the functionality works
fully on hugetlbfs, so this series can be merged first and then shmem
support will follow.
This patch (of 6):
This feature allows userspace to intercept "minor" faults. By "minor"
faults, I mean the following situation:
Let there exist two mappings (i.e., VMAs) to the same page(s). One of the
mappings is registered with userfaultfd (in minor mode), and the other is
not. Via the non-UFFD mapping, the underlying pages have already been
allocated & filled with some contents. The UFFD mapping has not yet been
faulted in; when it is touched for the first time, this results in what
I'm calling a "minor" fault. As a concrete example, when working with
hugetlbfs, we have huge_pte_none(), but find_lock_page() finds an existing
page.
This commit adds the new registration mode, and sets the relevant flag on
the VMAs being registered. In the hugetlb fault path, if we find that we
have huge_pte_none(), but find_lock_page() does indeed find an existing
page, then we have a "minor" fault, and if the VMA has the userfaultfd
registration flag, we call into userfaultfd to handle it.
This is implemented as a new registration mode, instead of an API feature.
This is because the alternative implementation has significant drawbacks
[1].
However, doing it this was requires we allocate a VM_* flag for the new
registration mode. On 32-bit systems, there are no unused bits, so this
feature is only supported on architectures with
CONFIG_ARCH_USES_HIGH_VMA_FLAGS. When attempting to register a VMA in
MINOR mode on 32-bit architectures, we return -EINVAL.
[1] https://lore.kernel.org/patchwork/patch/1380226/
[peterx@redhat.com: fix minor fault page leak]
Link: https://lkml.kernel.org/r/20210322175132.36659-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20210301222728.176417-1-axelrasmussen@google.com
Link: https://lkml.kernel.org/r/20210301222728.176417-2-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chinwen Chang <chinwen.chang@mediatek.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michal Koutn" <mkoutny@suse.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shawn Anastasio <shawn@anastas.io>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Steven Price <steven.price@arm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Adam Ruprecht <ruprecht@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Cannon Matthews <cannonmatthews@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Revert the uAPI changes from the below commit with notice that these
regions and capabilities are no longer provided.
Fixes: b392a19891 ("vfio/pci: remove vfio_pci_nvlink2")
Reported-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Message-Id: <162014341432.3807030.11054087109120670135.stgit@omen>
HUTRR101 added a new usage code for a key that is supposed to invoke and
dismiss an emoji picker widget to assist users to locate and enter emojis.
This patch adds a new key definition KEY_EMOJI_PICKER and maps 0x0c/0x0d9
usage code to this new keycode. Additionally hid-debug is adjusted to
recognize this new usage code as well.
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
In addition to some bug fixes and cleanups this adds support for
exposing the virtio based transport to user space using the rpmsg_char
driver.
-----BEGIN PGP SIGNATURE-----
iQJPBAABCAA5FiEEBd4DzF816k8JZtUlCx85Pw2ZrcUFAmCRYXsbHGJqb3JuLmFu
ZGVyc3NvbkBsaW5hcm8ub3JnAAoJEAsfOT8Nma3FXiEP/RC4cXMaHCtIKvXDbwwB
yP0Gl+gnP1LEEK8IkD5PJVzS5oi/s6KGct4x3EadvQEq42uJCxskbYZMN/R+wMGd
XTz2N0Bt7wErLQznD0dqDCdpihrkoao1zSu4rSDlnBzRZEGuWQBgTTeiEPh1sVXF
40r2zuUrvkERKG27DsflbE15Iqaz3ZoEc7X/0Mv9YQtBV+0HTtd6n0X8Vus6JcGR
un7t9Op9SlvYcEtsyqCewNrgpJhb9orpWDnIV+V7VXrvEBr0UOat881t2QZN3geM
pdIo+bU66VnqWy8Rd2g5zJl9jHr5C8YbpTgUy0+WFAjNu6CV0sW9jlMFKBMxMrDi
XpKqj/4TUqTvdSe30Y70/HdCQelbFVxNEFljX86hqD2Isg6P5DAe2GKDApt1SsMo
apy6Snp0F4JTL8rb0g5e/WUnWh11bKyhFSYFKnzPZKpQWfE0ppfOLKfDMlvksxyI
IZIqVz87VVu2itVPuYAvSBTtL+5KQP1U39LIOrX4y9Y3eordlbjIMU1wiem7uU3c
B3VYgaJY0iElElNqyrGZWvJxHuJf/IgOqg4V2P8x5fcw+vmSuz2prS6a2kU8doKU
D3cZrIpSGZLBfJw47bOBsdEnVNh16gC+TbGuh/hzEA+wJ+nu/haLl/KQf6yUQd24
ifIRQ0csTeptssTb/5rjLtNf
=pv2g
-----END PGP SIGNATURE-----
Merge tag 'rpmsg-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/andersson/remoteproc
Pull rpmsg updates from Bjorn Andersson:
"In addition to some bug fixes and cleanups this adds support for
exposing the virtio based transport to user space using the rpmsg_char
driver"
* tag 'rpmsg-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/andersson/remoteproc:
rpmsg: qcom_glink_native: fix error return code of qcom_glink_rx_data()
rpmsg: char: Return an error if device already open
rpmsg: virtio: Register the rpmsg_char device
rpmsg: char: Use rpmsg_sendto to specify the message destination address
rpmsg: Add short description of the IOCTL defined in UAPI.
rpmsg: Move RPMSG_ADDR_ANY in user API
rpmsg: char: Rename rpmsg_char_init to rpmsg_chrdev_init
This extension breaks when trying to delete rules, add a new revision to
fix this.
Fixes: 5e6874cdb8 ("[SECMARK]: Add xtables SECMARK target")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEgycj0O+d1G2aycA8rZhLv9lQBTwFAmCInP4ACgkQrZhLv9lQ
BTza0g//dTeb9woC9H7qlEhK4l9yk62lTss60Q8X7m7ZSNfdL4tiEbi64SgK+iOW
OOegbrOEb8Kzh4KJJYmVlVZ5YUWyH4szgmee1wnylBdsWiWaPLPF3Cflz77apy6T
TiiBsJd7rRE29FKheaMt34B41BMh8QHESN+DzjzJWsFoi/uNxjgSs2W16XuSupKu
bpRmB1pYNXMlrkzz7taL05jndZYE5arVriqlxgAsuLOFOp/ER7zecrjImdCM/4kL
W6ej0R1fz2Geh6CsLBJVE+bKWSQ82q5a4xZEkSYuQHXgZV5eywE5UKu8ssQcRgQA
VmGUY5k73rfY9Ofupf2gCaf/JSJNXKO/8Xjg0zAdklKtmgFjtna5Tyg9I90j7zn+
5swSpKuRpilN8MQH+6GWAnfqQlNoviTOpFeq3LwBtNVVOh08cOg6lko/bmebBC+R
TeQPACKS0Q0gCDPm9RYoU1pMUuYgfOwVfVRZK1prgi2Co7ZBUMOvYbNoKYoPIydr
ENBYljlU1OYwbzgR2nE+24fvhU8xdNOVG1xXYPAEHShu+p7dLIWRLhl8UCtRQpSR
1ofeVaJjgjrp29O+1OIQjB2kwCaRdfv/Gq1mztE/VlMU/r++E62OEzcH0aS+mnrg
yzfyUdI8IFv1q6FGT9yNSifWUWxQPmOKuC8kXsKYfqfJsFwKmHM=
=uCN4
-----END PGP SIGNATURE-----
Merge tag 'landlock_v34' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull Landlock LSM from James Morris:
"Add Landlock, a new LSM from Mickaël Salaün.
Briefly, Landlock provides for unprivileged application sandboxing.
From Mickaël's cover letter:
"The goal of Landlock is to enable to restrict ambient rights (e.g.
global filesystem access) for a set of processes. Because Landlock
is a stackable LSM [1], it makes possible to create safe security
sandboxes as new security layers in addition to the existing
system-wide access-controls. This kind of sandbox is expected to
help mitigate the security impact of bugs or unexpected/malicious
behaviors in user-space applications. Landlock empowers any
process, including unprivileged ones, to securely restrict
themselves.
Landlock is inspired by seccomp-bpf but instead of filtering
syscalls and their raw arguments, a Landlock rule can restrict the
use of kernel objects like file hierarchies, according to the
kernel semantic. Landlock also takes inspiration from other OS
sandbox mechanisms: XNU Sandbox, FreeBSD Capsicum or OpenBSD
Pledge/Unveil.
In this current form, Landlock misses some access-control features.
This enables to minimize this patch series and ease review. This
series still addresses multiple use cases, especially with the
combined use of seccomp-bpf: applications with built-in sandboxing,
init systems, security sandbox tools and security-oriented APIs [2]"
The cover letter and v34 posting is here:
https://lore.kernel.org/linux-security-module/20210422154123.13086-1-mic@digikod.net/
See also:
https://landlock.io/
This code has had extensive design discussion and review over several
years"
Link: https://lore.kernel.org/lkml/50db058a-7dde-441b-a7f9-f6837fe8b69f@schaufler-ca.com/ [1]
Link: https://lore.kernel.org/lkml/f646e1c7-33cf-333f-070c-0a40ad0468cd@digikod.net/ [2]
* tag 'landlock_v34' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
landlock: Enable user space to infer supported features
landlock: Add user and kernel documentation
samples/landlock: Add a sandbox manager example
selftests/landlock: Add user space tests
landlock: Add syscall implementations
arch: Wire up Landlock syscalls
fs,security: Add sb_delete hook
landlock: Support filesystem access-control
LSM: Infrastructure management of the superblock
landlock: Add ptrace restrictions
landlock: Set up the security framework and manage credentials
landlock: Add ruleset and domain management
landlock: Add object management
- Extend DM ioctl's DM_LIST_DEVICES_CMD handling to include UUID and
allow filtering based on name or UUID prefix.
- Various small fixes for typos, warnings, unused function, or
needlessly exported interfaces.
- Remove needless request_queue NULL pointer checks in DM thin and
cache targets.
- Remove unnecessary loop in DM core's __split_and_process_bio().
- Remove DM core's dm_vcalloc() and just use kvcalloc or
kvmalloc_array instead (depending whether zeroing is useful).
- Fix request-based DM's double free of blk_mq_tag_set in device
remove after table load fails.
- Improve DM persistent data performance on non-x86 by fixing packed
structs to have a stated alignment. Also remove needless extra work
from redundant calls to sm_disk_get_nr_free() and a paranoid BUG_ON()
that caused duplicate checksum calculation.
- Fix missing goto in DM integrity's bitmap_flush_interval error
handling.
- Add "reset_recalculate" feature flag to DM integrity.
- Improve DM integrity by leveraging discard support to avoid needless
re-writing of metadata and also use discard support to improve
hash recalculation.
- Fix race with DM raid target's reshape and MD raid4/5/6 resync that
resulted in inconsistant reshape state during table reloads.
- Update DM raid target to temove unnecessary discard limits for raid0
and raid10 now that MD has optimized discard handling for both raid
levels.
-----BEGIN PGP SIGNATURE-----
iQFHBAABCAAxFiEEJfWUX4UqZ4x1O2wixSPxCi2dA1oFAmCMVxkTHHNuaXR6ZXJA
cmVkaGF0LmNvbQAKCRDFI/EKLZ0DWjD7B/4mucuval0w8OBl7MPE5mR/tPmU/Avr
iFmXQjofjXCXGdCk7XqOqKZGKlm/jCTQkhbZPEo5PTCZPf6iGyeoSFOC0xck9HUI
9EJNrnx6L0ch3OS5IREgc3oO7vXFnmnLvmo27t7yUaqWqdMBIZ/rEA6Ro7oq4pXc
fo/Yqka2iuS0X2RKhAbci2KAOWdeX400nqH7bHaR5VgOE3JRJtsEWRTmkngB4bzM
Bsz2zzaSnbFa/Nlg2cH69kw2NJpZsAoKmoQ0OJt7QQs+GYLvCbioQnlxLPSFhlKX
4mSPPPKy3TxaAqsHXt+gCG6Vs8VvvgK0iHIa9e0ifIy3wvLoosLpIXhc
=k1ND
-----END PGP SIGNATURE-----
Merge tag 'for-5.13/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mike Snitzer:
- Improve scalability of DM's device hash by switching to rbtree
- Extend DM ioctl's DM_LIST_DEVICES_CMD handling to include UUID and
allow filtering based on name or UUID prefix.
- Various small fixes for typos, warnings, unused function, or
needlessly exported interfaces.
- Remove needless request_queue NULL pointer checks in DM thin and
cache targets.
- Remove unnecessary loop in DM core's __split_and_process_bio().
- Remove DM core's dm_vcalloc() and just use kvcalloc or kvmalloc_array
instead (depending whether zeroing is useful).
- Fix request-based DM's double free of blk_mq_tag_set in device remove
after table load fails.
- Improve DM persistent data performance on non-x86 by fixing packed
structs to have a stated alignment. Also remove needless extra work
from redundant calls to sm_disk_get_nr_free() and a paranoid BUG_ON()
that caused duplicate checksum calculation.
- Fix missing goto in DM integrity's bitmap_flush_interval error
handling.
- Add "reset_recalculate" feature flag to DM integrity.
- Improve DM integrity by leveraging discard support to avoid needless
re-writing of metadata and also use discard support to improve hash
recalculation.
- Fix race with DM raid target's reshape and MD raid4/5/6 resync that
resulted in inconsistant reshape state during table reloads.
- Update DM raid target to temove unnecessary discard limits for raid0
and raid10 now that MD has optimized discard handling for both raid
levels.
* tag 'for-5.13/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (26 commits)
dm raid: remove unnecessary discard limits for raid0 and raid10
dm rq: fix double free of blk_mq_tag_set in dev remove after table load fails
dm integrity: use discard support when recalculating
dm integrity: increase RECALC_SECTORS to improve recalculate speed
dm integrity: don't re-write metadata if discarding same blocks
dm raid: fix inconclusive reshape layout on fast raid4/5/6 table reload sequences
dm raid: fix fall-through warning in rs_check_takeover() for Clang
dm clone metadata: remove unused function
dm integrity: fix missing goto in bitmap_flush_interval error handling
dm: replace dm_vcalloc()
dm space map common: fix division bug in sm_ll_find_free_block()
dm persistent data: packed struct should have an aligned() attribute too
dm btree spine: remove paranoid node_check call in node_prep_for_write()
dm space map disk: remove redundant calls to sm_disk_get_nr_free()
dm integrity: add the "reset_recalculate" feature flag
dm persistent data: remove unused return from exit_shadow_spine()
dm cache: remove needless request_queue NULL pointer checks
dm thin: remove needless request_queue NULL pointer check
dm: unexport dm_{get,put}_table_device
dm ebs: fix a few typos
...
- Stage-2 isolation for the host kernel when running in protected mode
- Guest SVE support when running in nVHE mode
- Force W^X hypervisor mappings in nVHE mode
- ITS save/restore for guests using direct injection with GICv4.1
- nVHE panics now produce readable backtraces
- Guest support for PTP using the ptp_kvm driver
- Performance improvements in the S2 fault handler
x86:
- Optimizations and cleanup of nested SVM code
- AMD: Support for virtual SPEC_CTRL
- Optimizations of the new MMU code: fast invalidation,
zap under read lock, enable/disably dirty page logging under
read lock
- /dev/kvm API for AMD SEV live migration (guest API coming soon)
- support SEV virtual machines sharing the same encryption context
- support SGX in virtual machines
- add a few more statistics
- improved directed yield heuristics
- Lots and lots of cleanups
Generic:
- Rework of MMU notifier interface, simplifying and optimizing
the architecture-specific code
- Some selftests improvements
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmCJ13kUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroM1HAgAqzPxEtiTPTFeFJV5cnPPJ3dFoFDK
y/juZJUQ1AOtvuWzzwuf175ewkv9vfmtG6rVohpNSkUlJYeoc6tw7n8BTTzCVC1b
c/4Dnrjeycr6cskYlzaPyV6MSgjSv5gfyj1LA5UEM16LDyekmaynosVWY5wJhju+
Bnyid8l8Utgz+TLLYogfQJQECCrsU0Wm//n+8TWQgLf1uuiwshU5JJe7b43diJrY
+2DX+8p9yWXCTz62sCeDWNahUv8AbXpMeJ8uqZPYcN1P0gSEUGu8xKmLOFf9kR7b
M4U1Gyz8QQbjd2lqnwiWIkvRLX6gyGVbq2zH0QbhUe5gg3qGUX7JjrhdDQ==
=AXUi
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"This is a large update by KVM standards, including AMD PSP (Platform
Security Processor, aka "AMD Secure Technology") and ARM CoreSight
(debug and trace) changes.
ARM:
- CoreSight: Add support for ETE and TRBE
- Stage-2 isolation for the host kernel when running in protected
mode
- Guest SVE support when running in nVHE mode
- Force W^X hypervisor mappings in nVHE mode
- ITS save/restore for guests using direct injection with GICv4.1
- nVHE panics now produce readable backtraces
- Guest support for PTP using the ptp_kvm driver
- Performance improvements in the S2 fault handler
x86:
- AMD PSP driver changes
- Optimizations and cleanup of nested SVM code
- AMD: Support for virtual SPEC_CTRL
- Optimizations of the new MMU code: fast invalidation, zap under
read lock, enable/disably dirty page logging under read lock
- /dev/kvm API for AMD SEV live migration (guest API coming soon)
- support SEV virtual machines sharing the same encryption context
- support SGX in virtual machines
- add a few more statistics
- improved directed yield heuristics
- Lots and lots of cleanups
Generic:
- Rework of MMU notifier interface, simplifying and optimizing the
architecture-specific code
- a handful of "Get rid of oprofile leftovers" patches
- Some selftests improvements"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (379 commits)
KVM: selftests: Speed up set_memory_region_test
selftests: kvm: Fix the check of return value
KVM: x86: Take advantage of kvm_arch_dy_has_pending_interrupt()
KVM: SVM: Skip SEV cache flush if no ASIDs have been used
KVM: SVM: Remove an unnecessary prototype declaration of sev_flush_asids()
KVM: SVM: Drop redundant svm_sev_enabled() helper
KVM: SVM: Move SEV VMCB tracking allocation to sev.c
KVM: SVM: Explicitly check max SEV ASID during sev_hardware_setup()
KVM: SVM: Unconditionally invoke sev_hardware_teardown()
KVM: SVM: Enable SEV/SEV-ES functionality by default (when supported)
KVM: SVM: Condition sev_enabled and sev_es_enabled on CONFIG_KVM_AMD_SEV=y
KVM: SVM: Append "_enabled" to module-scoped SEV/SEV-ES control variables
KVM: SEV: Mask CPUID[0x8000001F].eax according to supported features
KVM: SVM: Move SEV module params/variables to sev.c
KVM: SVM: Disable SEV/SEV-ES if NPT is disabled
KVM: SVM: Free sev_asid_bitmap during init if SEV setup fails
KVM: SVM: Zero out the VMCB array used to track SEV ASID association
x86/sev: Drop redundant and potentially misleading 'sev_enabled'
KVM: x86: Move reverse CPUID helpers to separate header file
KVM: x86: Rename GPR accessors to make mode-aware variants the defaults
...
Including:
- Big cleanup of almost unsused parts of the IOMMU API by
Christoph Hellwig. This mostly affects the Freescale PAMU
driver.
- New IOMMU driver for Unisoc SOCs
- ARM SMMU Updates from Will:
- SMMUv3: Drop vestigial PREFETCH_ADDR support
- SMMUv3: Elide TLB sync logic for empty gather
- SMMUv3: Fix "Service Failure Mode" handling
- SMMUv2: New Qualcomm compatible string
- Removal of the AMD IOMMU performance counter writeable check
on AMD. It caused long boot delays on some machines and is
only needed to work around an errata on some older (possibly
pre-production) chips. If someone is still hit by this
hardware issue anyway the performance counters will just
return 0.
- Support for targeted invalidations in the AMD IOMMU driver.
Before that the driver only invalidated a single 4k page or the
whole IO/TLB for an address space. This has been extended now
and is mostly useful for emulated AMD IOMMUs.
- Several fixes for the Shared Virtual Memory support in the
Intel VT-d driver
- Mediatek drivers can now be built as modules
- Re-introduction of the forcedac boot option which got lost
when converting the Intel VT-d driver to the common dma-iommu
implementation.
- Extension of the IOMMU device registration interface and
support iommu_ops to be const again when drivers are built as
modules.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmCMEIoACgkQK/BELZcB
GuOu9xAAvg6aR0uHlxvRq6cgNnHN9Ltp5+t3qFYtRRrauY0iOPMO62k0QQli5shX
CGeczD0e59KAZqI0zNJnQn8hMY5dg7XVkFCC5BrSzuCDCtwJZ0N5Tq3pfUlaV1rw
BJf41t79Fd+jp7kn53tu+vRAfYZ3+sLOx/6U3c15pqKRZSkyFWbQllOtD3J5LnLu
1PyPlfiNpMwCajiS7aQbN+fuJ/lKIFeA2MDPOsCBzhbfxiJUqJxZOKAZO3rOjFfK
feTibqQ+3Zz6MPXt9st1cvPpy8jCosv81OY6Knqvxf/oB5q+fEdi2uNrKISonb/t
Fw331oOIwg2A+HOpwC9MN1AumOIqiHSWWENAMk9SlP+TMIWKQ8kZreyI6IEB23dV
+QvP3DVA+CfLwtNY/Zh0IqKh28D+IHlKbpWNU1m+9AUe468mV/MTjfwxr9Yfffhm
LZ6C0DgFdmtqv8jPuDGUOgo3RNeN8bLnUSEHG9gHibA+RKujl5BWDjKkwILqMQTt
Ysdsu8TiNtFIULomizqCpgqEbQfW8TLFvASXCM1VMQ/PDURxvchZPxFDJonYXy+K
z2HGaG3eUE07YrAdRKH69aMVIbmS+sjEhvmi4xZ1Lh7wWcIE2AZVvO8qNb+Ckcp3
4tLPPDksm/iQngnFf6gdgH3qv4rgbzE4+74GXqeANiQCjY9dSJI=
=qF2C
-----END PGP SIGNATURE-----
Merge tag 'iommu-updates-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu updates from Joerg Roedel:
- Big cleanup of almost unsused parts of the IOMMU API by Christoph
Hellwig. This mostly affects the Freescale PAMU driver.
- New IOMMU driver for Unisoc SOCs
- ARM SMMU Updates from Will:
- Drop vestigial PREFETCH_ADDR support (SMMUv3)
- Elide TLB sync logic for empty gather (SMMUv3)
- Fix "Service Failure Mode" handling (SMMUv3)
- New Qualcomm compatible string (SMMUv2)
- Removal of the AMD IOMMU performance counter writeable check on AMD.
It caused long boot delays on some machines and is only needed to
work around an errata on some older (possibly pre-production) chips.
If someone is still hit by this hardware issue anyway the performance
counters will just return 0.
- Support for targeted invalidations in the AMD IOMMU driver. Before
that the driver only invalidated a single 4k page or the whole IO/TLB
for an address space. This has been extended now and is mostly useful
for emulated AMD IOMMUs.
- Several fixes for the Shared Virtual Memory support in the Intel VT-d
driver
- Mediatek drivers can now be built as modules
- Re-introduction of the forcedac boot option which got lost when
converting the Intel VT-d driver to the common dma-iommu
implementation.
- Extension of the IOMMU device registration interface and support
iommu_ops to be const again when drivers are built as modules.
* tag 'iommu-updates-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (84 commits)
iommu: Streamline registration interface
iommu: Statically set module owner
iommu/mediatek-v1: Add error handle for mtk_iommu_probe
iommu/mediatek-v1: Avoid build fail when build as module
iommu/mediatek: Always enable the clk on resume
iommu/fsl-pamu: Fix uninitialized variable warning
iommu/vt-d: Force to flush iotlb before creating superpage
iommu/amd: Put newline after closing bracket in warning
iommu/vt-d: Fix an error handling path in 'intel_prepare_irq_remapping()'
iommu/vt-d: Fix build error of pasid_enable_wpe() with !X86
iommu/amd: Remove performance counter pre-initialization test
Revert "iommu/amd: Fix performance counter initialization"
iommu/amd: Remove duplicate check of devid
iommu/exynos: Remove unneeded local variable initialization
iommu/amd: Page-specific invalidations for more than one page
iommu/arm-smmu-v3: Remove the unused fields for PREFETCH_CONFIG command
iommu/vt-d: Avoid unnecessary cache flush in pasid entry teardown
iommu/vt-d: Invalidate PASID cache when root/context entry changed
iommu/vt-d: Remove WO permissions on second-level paging entries
iommu/vt-d: Report the right page fault address
...
No surprises in this development cycle, and most of works are about
the fixes and the improvements of the existing code, while a new LED
control layer and a few new drivers have been introduced.
Here are some highlights:
Core:
- A common mute-LED framework was introduced;
used by HD-audio for now, more adaption will follow later.
The former "Mic Mute-LED Mode" mixer control has been replaced with
the corresponding sysfs now.
- User-control management was changed to count consumed bytes instead
of capping by number of elements;
this will allow more controls in the normal usage pattern while
avoiding the possible memory exhaustion DoS
ASoC:
- Continued refactoring and cleanups in ASoC core and generic card
drivers
- Wide range of small cppcheck and warning fixes
- New drivers for Freescale i.MX DMA over rpmsg, Mediatek MT6358
accessory detection, and Realtek RT1019, RT1316, RT711 and RT715
USB-audio:
- Continued improvements and fixes of the implicit feedback mode,
including better support for Pioneer and Roland/BOSS devices
HD-audio:
- Default back to non-buffer preallocation on x86
- Cirrus codec improvements, more quirks for Realtek codecs
Others:
- New virtio sound driver
- FireWire Bebob updates
Note that this PR includes a couple of changes in reset and SPI
drivers, too, and some merge conflicts might happen.
-----BEGIN PGP SIGNATURE-----
iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAmCMJaAOHHRpd2FpQHN1
c2UuZGUACgkQLtJE4w1nLE/T6A//Ti0SAWYnAr5l/7ccuwS4zljHcuHngwvIxRPY
BokU1ZUlagi+Ro2HLUq13G8T4AlUAQ8r2ecz7EJQHHl9tkrIg7Cc0+fiBPHju1Yu
0F3Vjc78/JsJHvAR2DPll2rwhsdD3usSQXFo181k38J098X02iNcrzsj3kW5Bpzb
DBvXzOBIAg/PPfPa4edSYsSurqYeZTkhshedTohlwOCnVbW9NN5b5T9yoXP+t5na
rvK1Vhu0He8nVMBPDrzjKgE5rjm7Kn0FNXZ6CMDekU9sRVzm/PbgAqqmRnn6bUKa
GDpcQzlaiDrw8a7/uTVgUZy85F9kMXMMnfYpBy4bBXOt6RWOplXY1yMxy1RXV+op
3qC9k5R+IsjSWFQZ2z5bIHtGBNCG0698z9fQcvpsWTv+R68rUyfj+jeO/G9zzvpi
qpQTloBfI28NoP+iGis7wtrlQ15ut47YMCQS8QiOEvLmd5/3xKXRut4Ac/VmvDpS
q7fLivL8MZ/SMoXY74q/kByMBkXNpryQCAN+xAslaJ5P0aefNYJJdBt/sJlsDd9J
Ya2VIxHoP+Sb1MG6OLq1Y8c53Di9lwY80pOtF3plcz/ZWgzipirf6BhFj0OttiKP
a6+VewXA7zZcWEdw+Ik4dWP2dybWL+CuNl7Bwug8SyG9iWqg8Ph7FgoCTWAi93Fx
KKUJxsc=
=YT3U
-----END PGP SIGNATURE-----
Merge tag 'sound-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound updates from Takashi Iwai:
"No surprises in this development cycle, and most of work is about the
fixes and the improvements of the existing code, while a new LED
control layer and a few new drivers have been introduced.
Here are some highlights:
Core:
- A common mute-LED framework was introduced. It is used by HD-audio
for now, more adaption will follow later. The former "Mic Mute-LED
Mode" mixer control has been replaced with the corresponding sysfs
now.
- User-control management was changed to count consumed bytes instead
of capping by number of elements; this will allow more controls in
the normal usage pattern while avoiding the possible memory
exhaustion DoS
ASoC:
- Continued refactoring and cleanups in ASoC core and generic card
drivers
- Wide range of small cppcheck and warning fixes
- New drivers for Freescale i.MX DMA over rpmsg, Mediatek MT6358
accessory detection, and Realtek RT1019, RT1316, RT711 and RT715
USB-audio:
- Continued improvements and fixes of the implicit feedback mode,
including better support for Pioneer and Roland/BOSS devices
HD-audio:
- Default back to non-buffer preallocation on x86
- Cirrus codec improvements, more quirks for Realtek codecs
Others:
- New virtio sound driver
- FireWire Bebob updates"
* tag 'sound-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (587 commits)
ALSA: hda/conexant: Re-order CX5066 quirk table entries
ALSA: hda/realtek: Remove redundant entry for ALC861 Haier/Uniwill devices
ALSA: hda/realtek: Re-order ALC662 quirk table entries
ALSA: hda/realtek: Re-order remaining ALC269 quirk table entries
ALSA: hda/realtek: Re-order ALC269 Lenovo quirk table entries
ALSA: hda/realtek: Re-order ALC269 Sony quirk table entries
ALSA: hda/realtek: Re-order ALC269 ASUS quirk table entries
ALSA: hda/realtek: Re-order ALC269 Dell quirk table entries
ALSA: hda/realtek: Re-order ALC269 Acer quirk table entries
ALSA: hda/realtek: Re-order ALC269 HP quirk table entries
ALSA: hda/realtek: Re-order ALC882 Clevo quirk table entries
ALSA: hda/realtek: Re-order ALC882 Sony quirk table entries
ALSA: hda/realtek: Re-order ALC882 Acer quirk table entries
ALSA: usb-audio: Remove redundant assignment to len
ALSA: hda/realtek: Add quirk for Intel Clevo PCx0Dx
ALSA: virtio: fix kernel-doc
ALSA: hda/cirrus: Use CS8409 filter to fix abnormal sounds on Bullseye
ALSA: hda/cirrus: Set Initial DMIC volume for Bullseye to -26 dB
ALSA: sb: Fix two use after free in snd_sb_qsound_build
ALSA: emu8000: Fix a use after free in snd_emu8000_create_mixer
...
This patch provides counters for SRv6 Behaviors as defined in [1],
section 6. For each SRv6 Behavior instance, counters defined in [1] are:
- the total number of packets that have been correctly processed;
- the total amount of traffic in bytes of all packets that have been
correctly processed;
In addition, this patch introduces a new counter that counts the number of
packets that have NOT been properly processed (i.e. errors) by an SRv6
Behavior instance.
Counters are not only interesting for network monitoring purposes (i.e.
counting the number of packets processed by a given behavior) but they also
provide a simple tool for checking whether a behavior instance is working
as we expect or not.
Counters can be useful for troubleshooting misconfigured SRv6 networks.
Indeed, an SRv6 Behavior can silently drop packets for very different
reasons (i.e. wrong SID configuration, interfaces set with SID addresses,
etc) without any notification/message to the user.
Due to the nature of SRv6 networks, diagnostic tools such as ping and
traceroute may be ineffective: paths used for reaching a given router can
be totally different from the ones followed by probe packets. In addition,
paths are often asymmetrical and this makes it even more difficult to keep
up with the journey of the packets and to understand which behaviors are
actually processing our traffic.
When counters are enabled on an SRv6 Behavior instance, it is possible to
verify if packets are actually processed by such behavior and what is the
outcome of the processing. Therefore, the counters for SRv6 Behaviors offer
an non-invasive observability point which can be leveraged for both traffic
monitoring and troubleshooting purposes.
[1] https://www.rfc-editor.org/rfc/rfc8986.html#name-counters
Troubleshooting using SRv6 Behavior counters
--------------------------------------------
Let's make a brief example to see how helpful counters can be for SRv6
networks. Let's consider a node where an SRv6 End Behavior receives an SRv6
packet whose Segment Left (SL) is equal to 0. In this case, the End
Behavior (which accepts only packets with SL >= 1) discards the packet and
increases the error counter.
This information can be leveraged by the network operator for
troubleshooting. Indeed, the error counter is telling the user that the
packet:
(i) arrived at the node;
(ii) the packet has been taken into account by the SRv6 End behavior;
(iii) but an error has occurred during the processing.
The error (iii) could be caused by different reasons, such as wrong route
settings on the node or due to an invalid SID List carried by the SRv6
packet. Anyway, the error counter is used to exclude that the packet did
not arrive at the node or it has not been processed by the behavior at
all.
Turning on/off counters for SRv6 Behaviors
------------------------------------------
Each SRv6 Behavior instance can be configured, at the time of its creation,
to make use of counters.
This is done through iproute2 which allows the user to create an SRv6
Behavior instance specifying the optional "count" attribute as shown in the
following example:
$ ip -6 route add 2001:db8::1 encap seg6local action End count dev eth0
per-behavior counters can be shown by adding "-s" to the iproute2 command
line, i.e.:
$ ip -s -6 route show 2001:db8::1
2001:db8::1 encap seg6local action End packets 0 bytes 0 errors 0 dev eth0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Impact of counters for SRv6 Behaviors on performance
====================================================
To determine the performance impact due to the introduction of counters in
the SRv6 Behavior subsystem, we have carried out extensive tests.
We chose to test the throughput achieved by the SRv6 End.DX2 Behavior
because, among all the other behaviors implemented so far, it reaches the
highest throughput which is around 1.5 Mpps (per core at 2.4 GHz on a
Xeon(R) CPU E5-2630 v3) on kernel 5.12-rc2 using packets of size ~ 100
bytes.
Three different tests were conducted in order to evaluate the overall
throughput of the SRv6 End.DX2 Behavior in the following scenarios:
1) vanilla kernel (without the SRv6 Behavior counters patch) and a single
instance of an SRv6 End.DX2 Behavior;
2) patched kernel with SRv6 Behavior counters and a single instance of
an SRv6 End.DX2 Behavior with counters turned off;
3) patched kernel with SRv6 Behavior counters and a single instance of
SRv6 End.DX2 Behavior with counters turned on.
All tests were performed on a testbed deployed on the CloudLab facilities
[2], a flexible infrastructure dedicated to scientific research on the
future of Cloud Computing.
Results of tests are shown in the following table:
Scenario (1): average 1504764,81 pps (~1504,76 kpps); std. dev 3956,82 pps
Scenario (2): average 1501469,78 pps (~1501,47 kpps); std. dev 2979,85 pps
Scenario (3): average 1501315,13 pps (~1501,32 kpps); std. dev 2956,00 pps
As can be observed, throughputs achieved in scenarios (2),(3) did not
suffer any observable degradation compared to scenario (1).
Thanks to Jakub Kicinski and David Ahern for their valuable suggestions
and comments provided during the discussion of the proposed RFCs.
[2] https://www.cloudlab.us
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Core:
- bpf:
- allow bpf programs calling kernel functions (initially to
reuse TCP congestion control implementations)
- enable task local storage for tracing programs - remove the
need to store per-task state in hash maps, and allow tracing
programs access to task local storage previously added for
BPF_LSM
- add bpf_for_each_map_elem() helper, allowing programs to
walk all map elements in a more robust and easier to verify
fashion
- sockmap: support UDP and cross-protocol BPF_SK_SKB_VERDICT
redirection
- lpm: add support for batched ops in LPM trie
- add BTF_KIND_FLOAT support - mostly to allow use of BTF
on s390 which has floats in its headers files
- improve BPF syscall documentation and extend the use of kdoc
parsing scripts we already employ for bpf-helpers
- libbpf, bpftool: support static linking of BPF ELF files
- improve support for encapsulation of L2 packets
- xdp: restructure redirect actions to avoid a runtime lookup,
improving performance by 4-8% in microbenchmarks
- xsk: build skb by page (aka generic zerocopy xmit) - improve
performance of software AF_XDP path by 33% for devices
which don't need headers in the linear skb part (e.g. virtio)
- nexthop: resilient next-hop groups - improve path stability
on next-hops group changes (incl. offload for mlxsw)
- ipv6: segment routing: add support for IPv4 decapsulation
- icmp: add support for RFC 8335 extended PROBE messages
- inet: use bigger hash table for IP ID generation
- tcp: deal better with delayed TX completions - make sure we don't
give up on fast TCP retransmissions only because driver is
slow in reporting that it completed transmitting the original
- tcp: reorder tcp_congestion_ops for better cache locality
- mptcp:
- add sockopt support for common TCP options
- add support for common TCP msg flags
- include multiple address ids in RM_ADDR
- add reset option support for resetting one subflow
- udp: GRO L4 improvements - improve 'forward' / 'frag_list'
co-existence with UDP tunnel GRO, allowing the first to take
place correctly even for encapsulated UDP traffic
- micro-optimize dev_gro_receive() and flow dissection, avoid
retpoline overhead on VLAN and TEB GRO
- use less memory for sysctls, add a new sysctl type, to allow using
u8 instead of "int" and "long" and shrink networking sysctls
- veth: allow GRO without XDP - this allows aggregating UDP
packets before handing them off to routing, bridge, OvS, etc.
- allow specifing ifindex when device is moved to another namespace
- netfilter:
- nft_socket: add support for cgroupsv2
- nftables: add catch-all set element - special element used
to define a default action in case normal lookup missed
- use net_generic infra in many modules to avoid allocating
per-ns memory unnecessarily
- xps: improve the xps handling to avoid potential out-of-bound
accesses and use-after-free when XPS change race with other
re-configuration under traffic
- add a config knob to turn off per-cpu netdev refcnt to catch
underflows in testing
Device APIs:
- add WWAN subsystem to organize the WWAN interfaces better and
hopefully start driving towards more unified and vendor-
-independent APIs
- ethtool:
- add interface for reading IEEE MIB stats (incl. mlx5 and
bnxt support)
- allow network drivers to dump arbitrary SFP EEPROM data,
current offset+length API was a poor fit for modern SFP
which define EEPROM in terms of pages (incl. mlx5 support)
- act_police, flow_offload: add support for packet-per-second
policing (incl. offload for nfp)
- psample: add additional metadata attributes like transit delay
for packets sampled from switch HW (and corresponding egress
and policy-based sampling in the mlxsw driver)
- dsa: improve support for sandwiched LAGs with bridge and DSA
- netfilter:
- flowtable: use direct xmit in topologies with IP
forwarding, bridging, vlans etc.
- nftables: counter hardware offload support
- Bluetooth:
- improvements for firmware download w/ Intel devices
- add support for reading AOSP vendor capabilities
- add support for virtio transport driver
- mac80211:
- allow concurrent monitor iface and ethernet rx decap
- set priority and queue mapping for injected frames
- phy: add support for Clause-45 PHY Loopback
- pci/iov: add sysfs MSI-X vector assignment interface
to distribute MSI-X resources to VFs (incl. mlx5 support)
New hardware/drivers:
- dsa: mv88e6xxx: add support for Marvell mv88e6393x -
11-port Ethernet switch with 8x 1-Gigabit Ethernet
and 3x 10-Gigabit interfaces.
- dsa: support for legacy Broadcom tags used on BCM5325, BCM5365
and BCM63xx switches
- Microchip KSZ8863 and KSZ8873; 3x 10/100Mbps Ethernet switches
- ath11k: support for QCN9074 a 802.11ax device
- Bluetooth: Broadcom BCM4330 and BMC4334
- phy: Marvell 88X2222 transceiver support
- mdio: add BCM6368 MDIO mux bus controller
- r8152: support RTL8153 and RTL8156 (USB Ethernet) chips
- mana: driver for Microsoft Azure Network Adapter (MANA)
- Actions Semi Owl Ethernet MAC
- can: driver for ETAS ES58X CAN/USB interfaces
Pure driver changes:
- add XDP support to: enetc, igc, stmmac
- add AF_XDP support to: stmmac
- virtio:
- page_to_skb() use build_skb when there's sufficient tailroom
(21% improvement for 1000B UDP frames)
- support XDP even without dedicated Tx queues - share the Tx
queues with the stack when necessary
- mlx5:
- flow rules: add support for mirroring with conntrack,
matching on ICMP, GTP, flex filters and more
- support packet sampling with flow offloads
- persist uplink representor netdev across eswitch mode
changes
- allow coexistence of CQE compression and HW time-stamping
- add ethtool extended link error state reporting
- ice, iavf: support flow filters, UDP Segmentation Offload
- dpaa2-switch:
- move the driver out of staging
- add spanning tree (STP) support
- add rx copybreak support
- add tc flower hardware offload on ingress traffic
- ionic:
- implement Rx page reuse
- support HW PTP time-stamping
- octeon: support TC hardware offloads - flower matching on ingress
and egress ratelimitting.
- stmmac:
- add RX frame steering based on VLAN priority in tc flower
- support frame preemption (FPE)
- intel: add cross time-stamping freq difference adjustment
- ocelot:
- support forwarding of MRP frames in HW
- support multiple bridges
- support PTP Sync one-step timestamping
- dsa: mv88e6xxx, dpaa2-switch: offload bridge port flags like
learning, flooding etc.
- ipa: add IPA v4.5, v4.9 and v4.11 support (Qualcomm SDX55, SM8350,
SC7280 SoCs)
- mt7601u: enable TDLS support
- mt76:
- add support for 802.3 rx frames (mt7915/mt7615)
- mt7915 flash pre-calibration support
- mt7921/mt7663 runtime power management fixes
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmCKFPIACgkQMUZtbf5S
Irtw0g/+NA8bWdHNgG4H5rya0pv2z3IieLRmSdDfKRQQXcJpklawc5MKVVaTee/Q
5/QqgPdCsu1LAU6JXBKsKmyDDaMlQKdWuKbOqDSiAQKoMesZStTEHf9d851ZzgxA
Cdb6O7BD3lBl/IN+oxNG+KcmD1LKquTPKGySq2mQtEdLO12ekAsranzmj4voKffd
q9tBShpXQ7Dq77DLYfiQXVCvsizNcbbJFuxX0o9Lpb9+61ZyYAbogZSa9ypiZZwR
I/9azRBtJg7UV1aD/cLuAfy66Qh7t63+rCxVazs5Os8jVO26P/jQdisnnOe/x+p9
wYEmKm3GSu0V4SAPxkWW+ooKusflCeqDoMIuooKt6kbP6BRj540veGw3Ww/m5YFr
7pLQkTSP/tSjuGQIdBE1LOP5LBO8DZeC8Kiop9V0fzAW9hFSZbEq25WW0bPj8QQO
zA4Z7yWlslvxcfY2BdJX3wD8klaINkl/8fDWZFFsBdfFX2VeLtm7Xfduw34BJpvU
rYT3oWr6PhtkPAKR32SUcemSfeWgIVU41eSshzRz3kez1NngBUuLlSGGSEaKbes5
pZVt6pYFFVByyf6MTHFEoQvafZfEw04JILZpo4R5V8iTHzom0kD3Py064sBiXEw2
B6t+OW4qgcxGblpFkK2lD4kR2s1TPUs0ckVO6sAy1x8q60KKKjY=
=vcbA
-----END PGP SIGNATURE-----
Merge tag 'net-next-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core:
- bpf:
- allow bpf programs calling kernel functions (initially to
reuse TCP congestion control implementations)
- enable task local storage for tracing programs - remove the
need to store per-task state in hash maps, and allow tracing
programs access to task local storage previously added for
BPF_LSM
- add bpf_for_each_map_elem() helper, allowing programs to walk
all map elements in a more robust and easier to verify fashion
- sockmap: support UDP and cross-protocol BPF_SK_SKB_VERDICT
redirection
- lpm: add support for batched ops in LPM trie
- add BTF_KIND_FLOAT support - mostly to allow use of BTF on
s390 which has floats in its headers files
- improve BPF syscall documentation and extend the use of kdoc
parsing scripts we already employ for bpf-helpers
- libbpf, bpftool: support static linking of BPF ELF files
- improve support for encapsulation of L2 packets
- xdp: restructure redirect actions to avoid a runtime lookup,
improving performance by 4-8% in microbenchmarks
- xsk: build skb by page (aka generic zerocopy xmit) - improve
performance of software AF_XDP path by 33% for devices which don't
need headers in the linear skb part (e.g. virtio)
- nexthop: resilient next-hop groups - improve path stability on
next-hops group changes (incl. offload for mlxsw)
- ipv6: segment routing: add support for IPv4 decapsulation
- icmp: add support for RFC 8335 extended PROBE messages
- inet: use bigger hash table for IP ID generation
- tcp: deal better with delayed TX completions - make sure we don't
give up on fast TCP retransmissions only because driver is slow in
reporting that it completed transmitting the original
- tcp: reorder tcp_congestion_ops for better cache locality
- mptcp:
- add sockopt support for common TCP options
- add support for common TCP msg flags
- include multiple address ids in RM_ADDR
- add reset option support for resetting one subflow
- udp: GRO L4 improvements - improve 'forward' / 'frag_list'
co-existence with UDP tunnel GRO, allowing the first to take place
correctly even for encapsulated UDP traffic
- micro-optimize dev_gro_receive() and flow dissection, avoid
retpoline overhead on VLAN and TEB GRO
- use less memory for sysctls, add a new sysctl type, to allow using
u8 instead of "int" and "long" and shrink networking sysctls
- veth: allow GRO without XDP - this allows aggregating UDP packets
before handing them off to routing, bridge, OvS, etc.
- allow specifing ifindex when device is moved to another namespace
- netfilter:
- nft_socket: add support for cgroupsv2
- nftables: add catch-all set element - special element used to
define a default action in case normal lookup missed
- use net_generic infra in many modules to avoid allocating
per-ns memory unnecessarily
- xps: improve the xps handling to avoid potential out-of-bound
accesses and use-after-free when XPS change race with other
re-configuration under traffic
- add a config knob to turn off per-cpu netdev refcnt to catch
underflows in testing
Device APIs:
- add WWAN subsystem to organize the WWAN interfaces better and
hopefully start driving towards more unified and vendor-
independent APIs
- ethtool:
- add interface for reading IEEE MIB stats (incl. mlx5 and bnxt
support)
- allow network drivers to dump arbitrary SFP EEPROM data,
current offset+length API was a poor fit for modern SFP which
define EEPROM in terms of pages (incl. mlx5 support)
- act_police, flow_offload: add support for packet-per-second
policing (incl. offload for nfp)
- psample: add additional metadata attributes like transit delay for
packets sampled from switch HW (and corresponding egress and
policy-based sampling in the mlxsw driver)
- dsa: improve support for sandwiched LAGs with bridge and DSA
- netfilter:
- flowtable: use direct xmit in topologies with IP forwarding,
bridging, vlans etc.
- nftables: counter hardware offload support
- Bluetooth:
- improvements for firmware download w/ Intel devices
- add support for reading AOSP vendor capabilities
- add support for virtio transport driver
- mac80211:
- allow concurrent monitor iface and ethernet rx decap
- set priority and queue mapping for injected frames
- phy: add support for Clause-45 PHY Loopback
- pci/iov: add sysfs MSI-X vector assignment interface to distribute
MSI-X resources to VFs (incl. mlx5 support)
New hardware/drivers:
- dsa: mv88e6xxx: add support for Marvell mv88e6393x - 11-port
Ethernet switch with 8x 1-Gigabit Ethernet and 3x 10-Gigabit
interfaces.
- dsa: support for legacy Broadcom tags used on BCM5325, BCM5365 and
BCM63xx switches
- Microchip KSZ8863 and KSZ8873; 3x 10/100Mbps Ethernet switches
- ath11k: support for QCN9074 a 802.11ax device
- Bluetooth: Broadcom BCM4330 and BMC4334
- phy: Marvell 88X2222 transceiver support
- mdio: add BCM6368 MDIO mux bus controller
- r8152: support RTL8153 and RTL8156 (USB Ethernet) chips
- mana: driver for Microsoft Azure Network Adapter (MANA)
- Actions Semi Owl Ethernet MAC
- can: driver for ETAS ES58X CAN/USB interfaces
Pure driver changes:
- add XDP support to: enetc, igc, stmmac
- add AF_XDP support to: stmmac
- virtio:
- page_to_skb() use build_skb when there's sufficient tailroom
(21% improvement for 1000B UDP frames)
- support XDP even without dedicated Tx queues - share the Tx
queues with the stack when necessary
- mlx5:
- flow rules: add support for mirroring with conntrack, matching
on ICMP, GTP, flex filters and more
- support packet sampling with flow offloads
- persist uplink representor netdev across eswitch mode changes
- allow coexistence of CQE compression and HW time-stamping
- add ethtool extended link error state reporting
- ice, iavf: support flow filters, UDP Segmentation Offload
- dpaa2-switch:
- move the driver out of staging
- add spanning tree (STP) support
- add rx copybreak support
- add tc flower hardware offload on ingress traffic
- ionic:
- implement Rx page reuse
- support HW PTP time-stamping
- octeon: support TC hardware offloads - flower matching on ingress
and egress ratelimitting.
- stmmac:
- add RX frame steering based on VLAN priority in tc flower
- support frame preemption (FPE)
- intel: add cross time-stamping freq difference adjustment
- ocelot:
- support forwarding of MRP frames in HW
- support multiple bridges
- support PTP Sync one-step timestamping
- dsa: mv88e6xxx, dpaa2-switch: offload bridge port flags like
learning, flooding etc.
- ipa: add IPA v4.5, v4.9 and v4.11 support (Qualcomm SDX55, SM8350,
SC7280 SoCs)
- mt7601u: enable TDLS support
- mt76:
- add support for 802.3 rx frames (mt7915/mt7615)
- mt7915 flash pre-calibration support
- mt7921/mt7663 runtime power management fixes"
* tag 'net-next-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2451 commits)
net: selftest: fix build issue if INET is disabled
net: netrom: nr_in: Remove redundant assignment to ns
net: tun: Remove redundant assignment to ret
net: phy: marvell: add downshift support for M88E1240
net: dsa: ksz: Make reg_mib_cnt a u8 as it never exceeds 255
net/sched: act_ct: Remove redundant ct get and check
icmp: standardize naming of RFC 8335 PROBE constants
bpf, selftests: Update array map tests for per-cpu batched ops
bpf: Add batched ops support for percpu array
bpf: Implement formatted output helpers with bstr_printf
seq_file: Add a seq_bprintf function
sfc: adjust efx->xdp_tx_queue_count with the real number of initialized queues
net:nfc:digital: Fix a double free in digital_tg_recv_dep_req
net: fix a concurrency bug in l2tp_tunnel_register()
net/smc: Remove redundant assignment to rc
mpls: Remove redundant assignment to err
llc2: Remove redundant assignment to rc
net/tls: Remove redundant initialization of record
rds: Remove redundant assignment to nr_sig
dt-bindings: net: mdio-gpio: add compatible for microchip,mdio-smi0
...
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmCJU1UACgkQnJ2qBz9k
QNk62AgAgp05OIXU/AgObb7DvSyI3ycwCV8PeWBpwD8yoDAh5x0tmT7vnJu974p6
yHdnF7rr69ZzvbNCHLJ5kRykRlUao9W7cO5fdOW1uTpL7Ic60QuJMks/NfgVTHp1
2zIQmBDerfn1/LTK8r2pPGcvtcjRcr7Ep4beN0Duw57lfVMJhjsNRPnBbXGBcp0r
QzKk4/8V3DCZvOw+XNC3nto7avjvf+nU9sJmuh83546eqh0atjWivvO5aAlDOe6W
rhBiLlmP0in5u2n1fYqzI1OQvtgtleyEZT2G0CrbAZn0xjmV/if9wl+3K6TOwDvR
778xDEX7sZCaO/xkB+WK3hrd15ftKg==
=0kYE
-----END PGP SIGNATURE-----
Merge tag 'for_v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull quota, ext2, reiserfs updates from Jan Kara:
- support for path (instead of device) based quotactl syscall
(quotactl_path(2))
- ext2 conversion to kmap_local()
- other minor cleanups & fixes
* tag 'for_v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fs/reiserfs/journal.c: delete useless variables
fs/ext2: Replace kmap() with kmap_local_page()
ext2: Match up ext2_put_page() with ext2_dotdot() and ext2_find_entry()
fs/ext2/: fix misspellings using codespell tool
quota: report warning limits for realtime space quotas
quota: wire up quotactl_path
quota: Add mountpath based quota support
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmCIRBUQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpjt5D/9de6zCaha6CyfIIPiU+crropQ2jPzO49cb
WzcOCmdhSv0GtYlhdnIqCOo5p8mRDWJAEBU9upTDTCWOx9hwr5Ms0TCNQHxuQ/T0
4Ll+/cMsOxeTypiykfMtOG9TEmYSria2vTJKLgpyaP4ohfJa3uT7r2NZ8NK/8T4t
wwbJ+jCSKewelI1l0XD8k8LBU39FS/KRgLTdfYj/rCW3PWt/ZE2eSIYjZQvMCVOC
3fIdgOOJAMQVQafz+YAeJd2E+/l5/8YcJVKpJMVtBNbqTHIjA4EsInZauy8TpBgW
OzJ3I+XdF70qZM119tI/nXw3sb0e+UV0fRsIXLkOwTEBzowernrAtsEwAOP+qFKS
2YnqSKOSjMO5d5Mpkz6T0MDMloU45jph88lUH0RoShVxGa7jv+TMOL6QU1oOyxc1
+gPPbApQs9WtSZDHsTJ0xFLpol804UDQmwb38mHdzedDVSE7iip1jANkw6LEhKkJ
Mlg60ZF1Z305G+cDhrbs02ZGVa+fzbrtXtLlTqZw8bNX9lBp0JLtDpzskjbnUmck
6A04nfg+Eto5GvAn+FRBuOCPridLEk2K6ygko/gwQWsYCgqkCgRuqjlIQCSZy5iu
jHEFixIXKn6eACf+YzLVxSLyEQrmFyDSypbN7LvzoKJYo/loy8Q1+42nGlrVC3zi
+CB1NokPng==
=ZJ8L
-----END PGP SIGNATURE-----
Merge tag 'for-5.13/io_uring-2021-04-27' of git://git.kernel.dk/linux-block
Pull io_uring updates from Jens Axboe:
- Support for multi-shot mode for POLL requests
- More efficient reference counting. This is shamelessly stolen from
the mm side. Even though referencing is mostly single/dual user, the
128 count was retained to keep the code the same. Maybe this
should/could be made generic at some point.
- Removal of the need to have a manager thread for each ring. The
manager threads only job was checking and creating new io-threads as
needed, instead we handle this from the queue path.
- Allow SQPOLL without CAP_SYS_ADMIN or CAP_SYS_NICE. Since 5.12, this
thread is "just" a regular application thread, so no need to restrict
use of it anymore.
- Cleanup of how internal async poll data lifetime is managed.
- Fix for syzbot reported crash on SQPOLL cancelation.
- Make buffer registration more like file registrations, which includes
flexibility in avoiding full set unregistration and re-registration.
- Fix for io-wq affinity setting.
- Be a bit more defensive in task->pf_io_worker setup.
- Various SQPOLL fixes.
- Cleanup of SQPOLL creds handling.
- Improvements to in-flight request tracking.
- File registration cleanups.
- Tons of cleanups and little fixes
* tag 'for-5.13/io_uring-2021-04-27' of git://git.kernel.dk/linux-block: (156 commits)
io_uring: maintain drain logic for multishot poll requests
io_uring: Check current->io_uring in io_uring_cancel_sqpoll
io_uring: fix NULL reg-buffer
io_uring: simplify SQPOLL cancellations
io_uring: fix work_exit sqpoll cancellations
io_uring: Fix uninitialized variable up.resv
io_uring: fix invalid error check after malloc
io_uring: io_sq_thread() no longer needs to reset current->pf_io_worker
kernel: always initialize task->pf_io_worker to NULL
io_uring: update sq_thread_idle after ctx deleted
io_uring: add full-fledged dynamic buffers support
io_uring: implement fixed buffers registration similar to fixed files
io_uring: prepare fixed rw for dynanic buffers
io_uring: keep table of pointers to ubufs
io_uring: add generic rsrc update with tags
io_uring: add IORING_REGISTER_RSRC
io_uring: enumerate dynamic resources
io_uring: add generic path for rsrc update
io_uring: preparation for rsrc tagging
io_uring: decouple CQE filling from requests
...
The current definitions of constants for PROBE, currently defined only
in the net-next kernel branch, are inconsistent, with
some beginning with ICMP and others with simply EXT. This patch
attempts to standardize the naming conventions of the constants for
PROBE before their release into a stable Kernel, and to update the
relevant definitions in net/ipv4/icmp.c.
Similarly, the definitions for the code field (previously
ICMP_EXT_MAL_QUERY, etc) use the same prefixes as the type field. This
patch adds _CODE_ to the prefix to clarify the distinction of these
constants.
Signed-off-by: Andreas Roeseler <andreas.a.roeseler@gmail.com>
Acked-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20210427153635.2591-1-andreas.a.roeseler@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
- Clean up SCHED_DEBUG: move the decades old mess of sysctl, procfs and debugfs interfaces
to a unified debugfs interface.
- Signals: Allow caching one sigqueue object per task, to improve performance & latencies.
- Improve newidle_balance() irq-off latencies on systems with a large number of CPU cgroups.
- Improve energy-aware scheduling
- Improve the PELT metrics for certain workloads
- Reintroduce select_idle_smt() to improve load-balancing locality - but without the previous
regressions
- Add 'scheduler latency debugging': warn after long periods of pending need_resched. This
is an opt-in feature that requires the enabling of the LATENCY_WARN scheduler feature,
or the use of the resched_latency_warn_ms=xx boot parameter.
- CPU hotplug fixes for HP-rollback, and for the 'fail' interface. Fix remaining
balance_push() vs. hotplug holes/races
- PSI fixes, plus allow /proc/pressure/ files to be written by CAP_SYS_RESOURCE tasks as well
- Fix/improve various load-balancing corner cases vs. capacity margins
- Fix sched topology on systems with NUMA diameter of 3 or above
- Fix PF_KTHREAD vs to_kthread() race
- Minor rseq optimizations
- Misc cleanups, optimizations, fixes and smaller updates
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmCJInsRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1i5XxAArh0b+fwXlkVGzTUly7HQjhU7lFbChnmF
h6ToyNLi6pXoZ14VC/WoRIME+RzK3gmw9cEFaSLVPxbkbekTcyWS78kqmcg1/j2v
kO/20QhXobiIxVskYfoMmqSavZ5mKhMWBqtFXkCuYfxwGylas0VVdh3AZLJ7N21G
WEoFh99pVULwWnPHxM2ZQ87Ex9BkGKbsBTswxWpprCfXLqD0N2hHlABpwJP78zRf
VniWFOcC7lslILCFawb7CqGgAwbgV85nDRS4QCuCKisrkFywvjJrEeu/W+h1NfhF
d6ves/osNdEAM1DSALoxwEA42An8l8xh8NyJnl8JZV00LW0DM108O5/7pf5Zcryc
RHV3RxA7skgezBh5uThvo60QzNK+kVMatI4qpQEHxLE52CaDl/fBu1Cgb/VUxnIl
AEBfyiFbk+skHpuMFKtl30Tx3M+yJKMTzFPd4kYjHYGEDwtAcXcB3dJQW48A79i3
H3IWcDcXpk5Rjo2UZmaXdt/qlj7mP6U0xdOUq8ZK6JOC4uY9skszVGsfuNN9QQ5u
2E2YKKVrGFoQydl4C8R6A7axL2VzIJszHFZNipd8E3YOyW7PWRAkr02tOOkBTj8N
dLMcNM7aPJWqEYiEIjEzGQN20pweJ1dRA29LDuOswKh+7W2bWTQFh6F2Q8Haansc
RVg5PDzl+Mc=
=E7mz
-----END PGP SIGNATURE-----
Merge tag 'sched-core-2021-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
- Clean up SCHED_DEBUG: move the decades old mess of sysctl, procfs and
debugfs interfaces to a unified debugfs interface.
- Signals: Allow caching one sigqueue object per task, to improve
performance & latencies.
- Improve newidle_balance() irq-off latencies on systems with a large
number of CPU cgroups.
- Improve energy-aware scheduling
- Improve the PELT metrics for certain workloads
- Reintroduce select_idle_smt() to improve load-balancing locality -
but without the previous regressions
- Add 'scheduler latency debugging': warn after long periods of pending
need_resched. This is an opt-in feature that requires the enabling of
the LATENCY_WARN scheduler feature, or the use of the
resched_latency_warn_ms=xx boot parameter.
- CPU hotplug fixes for HP-rollback, and for the 'fail' interface. Fix
remaining balance_push() vs. hotplug holes/races
- PSI fixes, plus allow /proc/pressure/ files to be written by
CAP_SYS_RESOURCE tasks as well
- Fix/improve various load-balancing corner cases vs. capacity margins
- Fix sched topology on systems with NUMA diameter of 3 or above
- Fix PF_KTHREAD vs to_kthread() race
- Minor rseq optimizations
- Misc cleanups, optimizations, fixes and smaller updates
* tag 'sched-core-2021-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (61 commits)
cpumask/hotplug: Fix cpu_dying() state tracking
kthread: Fix PF_KTHREAD vs to_kthread() race
sched/debug: Fix cgroup_path[] serialization
sched,psi: Handle potential task count underflow bugs more gracefully
sched: Warn on long periods of pending need_resched
sched/fair: Move update_nohz_stats() to the CONFIG_NO_HZ_COMMON block to simplify the code & fix an unused function warning
sched/debug: Rename the sched_debug parameter to sched_verbose
sched,fair: Alternative sched_slice()
sched: Move /proc/sched_debug to debugfs
sched,debug: Convert sysctl sched_domains to debugfs
debugfs: Implement debugfs_create_str()
sched,preempt: Move preempt_dynamic to debug.c
sched: Move SCHED_DEBUG sysctl to debugfs
sched: Don't make LATENCYTOP select SCHED_DEBUG
sched: Remove sched_schedstats sysctl out from under SCHED_DEBUG
sched/numa: Allow runtime enabling/disabling of NUMA balance without SCHED_DEBUG
sched: Use cpu_dying() to fix balance_push vs hotplug-rollback
cpumask: Introduce DYING mask
cpumask: Make cpu_{online,possible,present,active}() inline
rseq: Optimise rseq_get_rseq_cs() and clear_rseq_cs()
...
- Improve Intel uncore PMU support:
- Parse uncore 'discovery tables' - a new hardware capability enumeration method
introduced on the latest Intel platforms. This table is in a well-defined PCI
namespace location and is read via MMIO. It is organized in an rbtree.
These uncore tables will allow the discovery of standard counter blocks, but
fancier counters still need to be enumerated explicitly.
- Add Alder Lake support
- Improve IIO stacks to PMON mapping support on Skylake servers
- Add Intel Alder Lake PMU support - which requires the introduction of 'hybrid' CPUs
and PMUs. Alder Lake is a mix of Golden Cove ('big') and Gracemont ('small' - Atom derived)
cores.
The CPU-side feature set is entirely symmetrical - but on the PMU side there's
core type dependent PMU functionality.
- Reduce data loss with CPU level hardware tracing on Intel PT / AUX profiling, by
fixing the AUX allocation watermark logic.
- Improve ring buffer allocation on NUMA systems
- Put 'struct perf_event' into their separate kmem_cache pool
- Add support for synchronous signals for select perf events. The immediate motivation
is to support low-overhead sampling-based race detection for user-space code. The
feature consists of the following main changes:
- Add thread-only event inheritance via perf_event_attr::inherit_thread, which limits
inheritance of events to CLONE_THREAD.
- Add the ability for events to not leak through exec(), via perf_event_attr::remove_on_exec.
- Allow the generation of SIGTRAP via perf_event_attr::sigtrap, extend siginfo with an u64
::si_perf, and add the breakpoint information to ::si_addr and ::si_perf if the event is
PERF_TYPE_BREAKPOINT.
The siginfo support is adequate for breakpoints right now - but the new field can be used
to introduce support for other types of metadata passed over siginfo as well.
- Misc fixes, cleanups and smaller updates.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmCJGpERHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1j9zBAAuVbG2snV6SBSdXLhQcM66N3NckOXvSY5
QjjhQcuwJQEK/NJB3266K5d8qSmdyRBsWf3GCsrmyBT67P1V28K44Pu7oCV0UDtf
mpVRjEP0oR7hNsANSSgo8Fa4ZD7H5waX7dK7925Tvw8By3mMoZoddiD/84WJHhxO
NDF+GRFaRj+/dpbhV8cdCoXTjYdkC36vYuZs3b9lu0tS9D/AJgsNy7TinLvO02Cs
5peP+2y29dgvCXiGBiuJtEA6JyGnX3nUJCvfOZZ/DWDc3fdduARlRrc5Aiq4n/wY
UdSkw1VTZBlZ1wMSdmHQVeC5RIH3uWUtRoNqy0Yc90lBm55AQ0EENwIfWDUDC5zy
USdBqWTNWKMBxlEilUIyqKPQK8LW/31TRzqy8BWKPNcZt5yP5YS1SjAJRDDjSwL/
I+OBw1vjLJamYh8oNiD5b+VLqNQba81jFASfv+HVWcULumnY6ImECCpkg289Fkpi
BVR065boifJDlyENXFbvTxyMBXQsZfA+EhtxG7ju2Ni+TokBbogyCb3L2injPt9g
7jjtTOqmfad4gX1WSc+215iYZMkgECcUd9E+BfOseEjBohqlo7yNKIfYnT8mE/Xq
nb7eHjyvLiE8tRtZ+7SjsujOMHv9LhWFAbSaxU/kEVzpkp0zyd6mnnslDKaaHLhz
goUMOL/D0lg=
=NhQ7
-----END PGP SIGNATURE-----
Merge tag 'perf-core-2021-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf event updates from Ingo Molnar:
- Improve Intel uncore PMU support:
- Parse uncore 'discovery tables' - a new hardware capability
enumeration method introduced on the latest Intel platforms. This
table is in a well-defined PCI namespace location and is read via
MMIO. It is organized in an rbtree.
These uncore tables will allow the discovery of standard counter
blocks, but fancier counters still need to be enumerated
explicitly.
- Add Alder Lake support
- Improve IIO stacks to PMON mapping support on Skylake servers
- Add Intel Alder Lake PMU support - which requires the introduction of
'hybrid' CPUs and PMUs. Alder Lake is a mix of Golden Cove ('big')
and Gracemont ('small' - Atom derived) cores.
The CPU-side feature set is entirely symmetrical - but on the PMU
side there's core type dependent PMU functionality.
- Reduce data loss with CPU level hardware tracing on Intel PT / AUX
profiling, by fixing the AUX allocation watermark logic.
- Improve ring buffer allocation on NUMA systems
- Put 'struct perf_event' into their separate kmem_cache pool
- Add support for synchronous signals for select perf events. The
immediate motivation is to support low-overhead sampling-based race
detection for user-space code. The feature consists of the following
main changes:
- Add thread-only event inheritance via
perf_event_attr::inherit_thread, which limits inheritance of
events to CLONE_THREAD.
- Add the ability for events to not leak through exec(), via
perf_event_attr::remove_on_exec.
- Allow the generation of SIGTRAP via perf_event_attr::sigtrap,
extend siginfo with an u64 ::si_perf, and add the breakpoint
information to ::si_addr and ::si_perf if the event is
PERF_TYPE_BREAKPOINT.
The siginfo support is adequate for breakpoints right now - but the
new field can be used to introduce support for other types of
metadata passed over siginfo as well.
- Misc fixes, cleanups and smaller updates.
* tag 'perf-core-2021-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits)
signal, perf: Add missing TRAP_PERF case in siginfo_layout()
signal, perf: Fix siginfo_t by avoiding u64 on 32-bit architectures
perf/x86: Allow for 8<num_fixed_counters<16
perf/x86/rapl: Add support for Intel Alder Lake
perf/x86/cstate: Add Alder Lake CPU support
perf/x86/msr: Add Alder Lake CPU support
perf/x86/intel/uncore: Add Alder Lake support
perf: Extend PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE
perf/x86/intel: Add Alder Lake Hybrid support
perf/x86: Support filter_match callback
perf/x86/intel: Add attr_update for Hybrid PMUs
perf/x86: Add structures for the attributes of Hybrid PMUs
perf/x86: Register hybrid PMUs
perf/x86: Factor out x86_pmu_show_pmu_cap
perf/x86: Remove temporary pmu assignment in event_init
perf/x86/intel: Factor out intel_pmu_check_extra_regs
perf/x86/intel: Factor out intel_pmu_check_event_constraints
perf/x86/intel: Factor out intel_pmu_check_num_counters
perf/x86: Hybrid PMU support for extra_regs
perf/x86: Hybrid PMU support for event constraints
...
- printk fourcc modifier support added %p4cc
core:
- drm_crtc_commit_wait
- atomic plane state helpers reworked for full state
- dma-buf heaps API rework
- edid: rework and improvements for displayid
dp-mst:
- better topology logging
bridge:
- Chipone ICN6211
- Lontium LT8912B
- anx7625 regulator support
panel:
- fix lt9611 4k panels handling
simple-kms:
- add plane state helpers
ttm:
- debugfs support
- removal of unused sysfs
- ignore signaled moved fences
- ioremap buffer according to mem caching
i915:
- Alderlake S enablement
- Conversion to dma_resv_locking
- Bring back watchdog timeout support
- legacy ioctl cleanups
- add GEM TDDO and RFC process
- DG1 LMEM preparation work
- intel_display.c refactoring
- Gen9/TGL PCH combination support
- eDP MSO Support
- multiple PSR instance support
- Link training debug updates
- Disable PSR2 support on JSL/EHL
- DDR5/LPDDR5 support for bw calcs
- LSPCON limited to gen9/10 platforms
- HSW/BDW async flip/VTd corruption workaround
= SAGV watermakr fixes
- SNB hard hang on ring resume fix
- Limit imported dma-buf size
- move to use new tasklet API
- refactor KBL/TGL/ADL-S display/gt steppings
- refactoring legacy DP/HDMI, FB plane code out
amdgpu:
- uapi: add ioctl to query video capabilities
- Iniital AMD Freesync HDMI support
- Initial Adebaran support
- 10bpc dithering improvements
- DCN secure display support
- Drop legacy IO BAR requirements
- PCIE/S0ix/RAS/Prime/Reset fixes
- Display ASSR support
- SMU gfx busy queues for RV/PCO
- Initial LTTPR display work
amdkfd:
- MMU notifier fixes
- APU fixes
radeon:
- debugfs cleanps
- fw error handling ifix
- Flexible array cleanups
msm:
- big DSI phy/pll cleanup
- sc7280 initial support
- commong bandwidth scaling path
- shrinker locking contention fixes
- unpin/swap support for GEM objcets
ast:
- cursor plane handling reworked
tegra:
- don't register DP AUX channels before connectors
zynqmp:
- fix OOB struct padding memset
gma500:
- drop ttm and medfield support
exynos:
- request_irq cleanup function
mediatek:
- fine tune line time for EOTp
- MT8192 dpi support
- atomic crtc config updates
- don't support HDMI connector creation
mxsdb:
- imx8mm support
panfrost:
-= MMU IRQ handling rework
qxl:
- locking fixes
- resource deallocation changes
sun4i:
- add alpha properties to UI/VI layers
vc4:
- RPi4 CEC support
vmwgfx:
- doc cleanups
arc:
- moved to drm/tiny
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJgiNSVAAoJEAx081l5xIa+fvYP/1206BfOYOx5opt5K3By06ZY
zrOsbeaqFdHzfUR7xVwO4vqQNhkX4Pt8H/U7uYZx8PRdrXzGENwWLIaIskyUrKOd
BtwNqUr0ZXJGDlGg26StnUHKeAXuYXlpBKLta5y4LUTkI+bm6V/oVaDMq4dnah70
2CXS4C2mnaFRLBzuRlraxoGFN4eZkz6Waeyo6PJxn/l2GE2gw+jho0Yrh8e8F2w5
EjQeNF22/uHwznov03XFJlyugecuBDbE8A6Ma/znnkVdBXcT94eUMugbKOKi4Nn6
PuJOEdJxmj/9s3oi6kBERc8dvpOj0O+8Vp+xOzn2U3BVXebvu7VoJsq6FcAvL5lN
ltj4iErxUlEud2GRIVUMx8OTFiKj4ThRFJ2/8Uf22r3P7RHO5E9BLnZBzqIAhDVr
s2cDBMItcxcVHRCmE04h12XAO4libZBb2TVjbqG94Acq7beR76pMszFrmxPmHBEm
NGe1s7+ajxMzsq/NIsk4XAhqSmJo6+ujKyyVnrgvKUVeEaWW1U4YvjhJaetnP4fB
47gF24wOSNFwiCUZlqaIpp/MR4Z8YmaJ7tayWQq4Oj/neWe/yc8xQgQIuE8GL20j
P9eNQNvlBnoxkz275M9x4kVhJ5FRjr7OYnd3sFVnALuj6fnL3Z1RXLqI1lNtIz1d
YM89veZuNxMaiDz8roPH
=bLWZ
-----END PGP SIGNATURE-----
Merge tag 'drm-next-2021-04-28' of git://anongit.freedesktop.org/drm/drm
Pull drm updates from Dave Airlie:
"The usual lots of work all over the place.
i915 has gotten some Alderlake work and prelim DG1 code, along with a
major locking rework over the GEM code, and brings back the property
of timing out long running jobs using a watchdog. amdgpu has some
Alderbran support (new GPU), freesync HDMI support along with a lot
other fixes.
Outside of the drm, there is a new printf specifier added which should
have all the correct acks/sobs:
- printk fourcc modifier support added %p4cc
Summary:
core:
- drm_crtc_commit_wait
- atomic plane state helpers reworked for full state
- dma-buf heaps API rework
- edid: rework and improvements for displayid
dp-mst:
- better topology logging
bridge:
- Chipone ICN6211
- Lontium LT8912B
- anx7625 regulator support
panel:
- fix lt9611 4k panels handling
simple-kms:
- add plane state helpers
ttm:
- debugfs support
- removal of unused sysfs
- ignore signaled moved fences
- ioremap buffer according to mem caching
i915:
- Alderlake S enablement
- Conversion to dma_resv_locking
- Bring back watchdog timeout support
- legacy ioctl cleanups
- add GEM TDDO and RFC process
- DG1 LMEM preparation work
- intel_display.c refactoring
- Gen9/TGL PCH combination support
- eDP MSO Support
- multiple PSR instance support
- Link training debug updates
- Disable PSR2 support on JSL/EHL
- DDR5/LPDDR5 support for bw calcs
- LSPCON limited to gen9/10 platforms
- HSW/BDW async flip/VTd corruption workaround
- SAGV watermark fixes
- SNB hard hang on ring resume fix
- Limit imported dma-buf size
- move to use new tasklet API
- refactor KBL/TGL/ADL-S display/gt steppings
- refactoring legacy DP/HDMI, FB plane code out
amdgpu:
- uapi: add ioctl to query video capabilities
- Iniital AMD Freesync HDMI support
- Initial Adebaran support
- 10bpc dithering improvements
- DCN secure display support
- Drop legacy IO BAR requirements
- PCIE/S0ix/RAS/Prime/Reset fixes
- Display ASSR support
- SMU gfx busy queues for RV/PCO
- Initial LTTPR display work
amdkfd:
- MMU notifier fixes
- APU fixes
radeon:
- debugfs cleanps
- fw error handling ifix
- Flexible array cleanups
msm:
- big DSI phy/pll cleanup
- sc7280 initial support
- commong bandwidth scaling path
- shrinker locking contention fixes
- unpin/swap support for GEM objcets
ast:
- cursor plane handling reworked
tegra:
- don't register DP AUX channels before connectors
zynqmp:
- fix OOB struct padding memset
gma500:
- drop ttm and medfield support
exynos:
- request_irq cleanup function
mediatek:
- fine tune line time for EOTp
- MT8192 dpi support
- atomic crtc config updates
- don't support HDMI connector creation
mxsdb:
- imx8mm support
panfrost:
- MMU IRQ handling rework
qxl:
- locking fixes
- resource deallocation changes
sun4i:
- add alpha properties to UI/VI layers
vc4:
- RPi4 CEC support
vmwgfx:
- doc cleanups
arc:
- moved to drm/tiny"
* tag 'drm-next-2021-04-28' of git://anongit.freedesktop.org/drm/drm: (1390 commits)
drm/ttm: Don't count pages in SG BOs against pages_limit
drm/ttm: fix return value check
drm/bridge: lt8912b: fix incorrect handling of of_* return values
drm: bridge: fix LONTIUM use of mipi_dsi_() functions
drm: bridge: fix ANX7625 use of mipi_dsi_() functions
drm/amdgpu: page retire over debugfs mechanism
drm/radeon: Fix a missing check bug in radeon_dp_mst_detect()
drm/amd/display: Fix the Wunused-function warning
drm/radeon/r600: Fix variables that are not used after assignment
drm/amdgpu/smu7: fix CAC setting on TOPAZ
drm/amd/display: Update DCN302 SR Exit Latency
drm/amdgpu: enable ras eeprom on aldebaran
drm/amdgpu: RAS harvest on driver load
drm/amdgpu: add ras aldebaran ras eeprom driver
drm/amd/pm: increase time out value when sending msg to SMU
drm/amdgpu: add DMUB outbox event IRQ source define/complete/debug flag
drm/amd/pm: add the callback to get vbios bootup values for vangogh
drm/radeon: Fix size overflow
drm/amdgpu: Fix size overflow
drm/amdgpu: move mmhub ras_func init to ip specific file
...
This patch extends the set infrastructure to add a special catch-all set
element. If the lookup fails to find an element (or range) in the set,
then the catch-all element is selected. Users can specify a mapping,
expression(s) and timeout to be attached to the catch-all element.
This patch adds a catchall list to the set, this list might contain more
than one single catch-all element (e.g. in case that the catch-all
element is removed and a new one is added in the same transaction).
However, most of the time, there will be either one element or no
elements at all in this list.
The catch-all element is identified via NFT_SET_ELEM_CATCHALL flag and
such special element has no NFTA_SET_ELEM_KEY attribute. There is a new
nft_set_elem_catchall object that stores a reference to the dummy
catch-all element (catchall->elem) whose layout is the same of the set
element type to reuse the existing set element codebase.
The set size does not apply to the catch-all element, users can define a
catch-all element even if the set is full.
The check for valid set element flags hava been updates to report
EOPNOTSUPP in case userspace requests flags that are not supported when
using new userspace nftables and old kernel.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
- Update NFSv2 and NFSv3 XDR encoding functions
- Add batch Receive posting to the server's RPC/RDMA transport (take 2)
- Reduce page allocator traffic in svcrdma
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmBzLCIACgkQM2qzM29m
f5evWA//fE2WlZDoTP8Iq1BGreGrGyzqOIkJakDGoZs4VOaUJN9WxWEcmBHI4t22
yom7aZ7S7VtMF6SoGMnohYoNwkloPJ1kfYBVZuUxDUCIHGVrLaAGZwjtojQftUS0
19ZdiSx8D8xWEqI/cpbHsj+CNCH1F5IDGjJzNhlz+rIFLrRPBMeHwbY8qf/zusm/
uZ4tGJtKWmFXvdT9duGkoNXRd0gBdcfDeFN//JYrLPS9sX4zs4/2KnDh25YJR8Jf
EQryc19l4ztlVT8PXJfI4I/fG0Sfv5AWuYYzaFIncp1PkmiunqGL1yai6eeW4LEN
8v3QEUNy9J7J20FrsL1ge1icbEyObfNFvkgYqNhmGIBdbEDdLWPppL6fQKUYl/zi
HSnAoJsJYyzYKbE7BMLy3wwZ775GsTrkU/7tfiu/M9KgvXJtDy0m7vsxt3/qULXn
Bg4KAKnmYIcuigdG6BATJ23jISRK7cHH/YYlORj3lsB/KQi+c2U/5zQFaROOwbtc
Ny+y92zQZVu9ZMkUfs2+b5qdg8Z/J0p8n9MfS7GcpCIyGZTnvfs8SfM8BXuHH2Yn
BdL3xDoPlbYQnAzp16oVfdoYm8RWzFBv+xHc360ielMOQbP4ntdC0oexIApsXzyz
Yv4OK9itsLagwcuAxPWnfnIgGxj9hPsp/peUsChsglhx+4NSGWA=
=BOO0
-----END PGP SIGNATURE-----
Merge tag 'nfsd-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd updates from Chuck Lever:
"Highlights:
- Update NFSv2 and NFSv3 XDR encoding functions
- Add batch Receive posting to the server's RPC/RDMA transport (take 2)
- Reduce page allocator traffic in svcrdma"
* tag 'nfsd-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (70 commits)
NFSD: Use DEFINE_SPINLOCK() for spinlock
sunrpc: Remove unused function ip_map_lookup
NFSv4.2: fix copy stateid copying for the async copy
UAPI: nfsfh.h: Replace one-element array with flexible-array member
svcrdma: Clean up dto_q critical section in svc_rdma_recvfrom()
svcrdma: Remove svc_rdma_recv_ctxt::rc_pages and ::rc_arg
svcrdma: Remove sc_read_complete_q
svcrdma: Single-stage RDMA Read
SUNRPC: Move svc_xprt_received() call sites
SUNRPC: Export svc_xprt_received()
svcrdma: Retain the page backing rq_res.head[0].iov_base
svcrdma: Remove unused sc_pages field
svcrdma: Normalize Send page handling
svcrdma: Add a "deferred close" helper
svcrdma: Maintain a Receive water mark
svcrdma: Use svc_rdma_refresh_recvs() in wc_receive
svcrdma: Add a batch Receive posting mechanism
svcrdma: Remove stale comment for svc_rdma_wc_receive()
svcrdma: Provide an explanatory comment in CMA event handler
svcrdma: RPCDBG_FACILITY is no longer used
...
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for net-next:
1) The various ip(6)table_foo incarnations are updated to expect
that the table is passed as 'void *priv' argument that netfilter core
passes to the hook functions. This reduces the struct net size by 2
cachelines on x86_64. From Florian Westphal.
2) Add cgroupsv2 support for nftables.
3) Fix bridge log family merge into nf_log_syslog: Missing
unregistration from netns exit path, from Phil Sutter.
4) Add nft_pernet() helper to access nftables pernet area.
5) Add struct nfnl_info to reduce nfnetlink callback footprint and
to facilite future updates. Consolidate nfnetlink callbacks.
6) Add CONFIG_NETFILTER_XTABLES_COMPAT Kconfig knob, also from Florian.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Here is the big set of USB and Thunderbolt driver updates for 5.13-rc1.
Lots of little things in here, with loads of tiny fixes and cleanups
over these drivers, as well as these "larger" changes:
- thunderbolt updates and new features added
- xhci driver updates and split out of a mediatek-specific xhci
driver from the main xhci module to make it easier to work
with (something that I have been wanting for a while).
- loads of typec feature additions and updates
- dwc2 driver updates
- dwc3 driver updates
- gadget driver fixes and minor updates
- loads of usb-serial cleanups and fixes and updates
- usbip documentation updates and fixes
- lots of other tiny USB driver updates
All of these have been in linux-next for a while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYIa42A8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ymMEACgkQBzLb5W/IocS+oq7+D7P3V581sAn1Dcy2Qq
Yz370/X2hrjXAyIm7/Cz
=5dj/
-----END PGP SIGNATURE-----
Merge tag 'usb-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB and Thunderbolt updates from Greg KH:
"Here is the big set of USB and Thunderbolt driver updates for
5.13-rc1.
Lots of little things in here, with loads of tiny fixes and cleanups
over these drivers, as well as these "larger" changes:
- thunderbolt updates and new features added
- xhci driver updates and split out of a mediatek-specific xhci
driver from the main xhci module to make it easier to work with
(something that I have been wanting for a while).
- loads of typec feature additions and updates
- dwc2 driver updates
- dwc3 driver updates
- gadget driver fixes and minor updates
- loads of usb-serial cleanups and fixes and updates
- usbip documentation updates and fixes
- lots of other tiny USB driver updates
All of these have been in linux-next for a while with no reported
issues"
* tag 'usb-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (371 commits)
usb: Fix up movement of USB core kerneldoc location
usb: dwc3: gadget: Handle DEV_TXF_FLUSH_BYPASS capability
usb: dwc3: Capture new capability register GHWPARAMS9
usb: gadget: prevent a ternary sign expansion bug
usb: dwc3: core: Do core softreset when switch mode
usb: dwc2: Get rid of useless error checks in suspend interrupt
usb: dwc2: Update dwc2_handle_usb_suspend_intr function.
usb: dwc2: Add exit hibernation mode before removing drive
usb: dwc2: Add hibernation exiting flow by system resume
usb: dwc2: Add hibernation entering flow by system suspend
usb: dwc2: Allow exit hibernation in urb enqueue
usb: dwc2: Move exit hibernation to dwc2_port_resume() function
usb: dwc2: Move enter hibernation to dwc2_port_suspend() function
usb: dwc2: Clear GINTSTS_RESTOREDONE bit after restore is generated.
usb: dwc2: Clear fifo_map when resetting core.
usb: dwc2: Allow exiting hibernation from gpwrdn rst detect
usb: dwc2: Fix hibernation between host and device modes.
usb: dwc2: Fix host mode hibernation exit with remote wakeup flow.
usb: dwc2: Reset DEVADDR after exiting gadget hibernation.
usb: dwc2: Update exit hibernation when port reset is asserted
...
Here is the big set of tty and serial driver updates for 5.13-rc1.
Actually busy this release, with a number of cleanups happening:
- much needed core tty cleanups by Jiri Slaby
- removal of unused and orphaned old-style serial drivers. If
anyone shows up with this hardware, it is trivial to restore
these but we really do not think they are in use anymore.
- fixes and cleanups from Johan Hovold on a number of termios
setting corner cases that loads of drivers got wrong as well
as removing unneeded code due to tty core changes from long
ago that were never propagated out to the drivers
- loads of platform-specific serial port driver updates and
fixes
- coding style cleanups and other small fixes and updates all
over the tty/serial tree.
All of these have been in linux-next for a while now with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYIa3NQ8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ykMXgCfX3FZgKveI4l94ChXSy4OyKwycHUAn00BzrMC
/7BwA1FnjQnC4zSzuHnm
=bAas
-----END PGP SIGNATURE-----
Merge tag 'tty-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty and serial driver updates from Greg KH:
"Here is the big set of tty and serial driver updates for 5.13-rc1.
Actually busy this release, with a number of cleanups happening:
- much needed core tty cleanups by Jiri Slaby
- removal of unused and orphaned old-style serial drivers. If anyone
shows up with this hardware, it is trivial to restore these but we
really do not think they are in use anymore.
- fixes and cleanups from Johan Hovold on a number of termios setting
corner cases that loads of drivers got wrong as well as removing
unneeded code due to tty core changes from long ago that were never
propagated out to the drivers
- loads of platform-specific serial port driver updates and fixes
- coding style cleanups and other small fixes and updates all over
the tty/serial tree.
All of these have been in linux-next for a while now with no reported
issues"
* tag 'tty-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (186 commits)
serial: extend compile-test coverage
serial: stm32: add FIFO threshold configuration
dt-bindings: serial: 8250: update TX FIFO trigger level
dt-bindings: serial: stm32: override FIFO threshold properties
dt-bindings: serial: add RX and TX FIFO properties
serial: xilinx_uartps: drop low-latency workaround
serial: vt8500: drop low-latency workaround
serial: timbuart: drop low-latency workaround
serial: sunsu: drop low-latency workaround
serial: sifive: drop low-latency workaround
serial: txx9: drop low-latency workaround
serial: sa1100: drop low-latency workaround
serial: rp2: drop low-latency workaround
serial: rda: drop low-latency workaround
serial: owl: drop low-latency workaround
serial: msm_serial: drop low-latency workaround
serial: mpc52xx_uart: drop low-latency workaround
serial: meson: drop low-latency workaround
serial: mcf: drop low-latency workaround
serial: lpc32xx_hs: drop low-latency workaround
...
Here is the big set of staging and IIO driver updates for 5.13-rc1.
Lots of little churn in here, and some larger churn as well. Major
things are:
- removal of wimax drivers, no one has this hardware anymore for
this failed "experiment".
- removal of the Google gasket driver, turns out no one wanted
to maintain it or cares about it anymore, so they asked for it
to be removed.
- comedi finally moves out of the staging directory into
drivers/comedi/ This is one of the oldest kernel subsystems
around, being created in the 2.0 kernel days, and was one of
the first things added to drivers/staging/ when that was
created over 15 years ago. It should have been moved out of
staging a long time ago, it's well maintained and used by
loads of different devices in the real world every day. Nice
to see this finally happen.
- so many tiny coding style cleanups it's not funny. Perfect
storm of at least 2 different intern project application
deadlines combined to provide a huge number of new
contributions in this area from people learning how to do
kernel development. Great job to everyone involved here.
There's also the normal updates for IIO drivers with new IIO drivers and
updates all over that subsystem.
All of these have been in linux-next for a while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYIa1zw8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ykfMACgq/Qj9n6NO/P4BX55XWjRkjOmxxwAoKrYEWkG
fIdLmhh4FGWkxaJO3Izf
=PCXb
-----END PGP SIGNATURE-----
Merge tag 'staging-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging/IIO driver updates from Greg KH:
"Here is the big set of staging and IIO driver updates for 5.13-rc1.
Lots of little churn in here, and some larger churn as well. Major
things are:
- removal of wimax drivers, no one has this hardware anymore for this
failed "experiment".
- removal of the Google gasket driver, turns out no one wanted to
maintain it or cares about it anymore, so they asked for it to be
removed.
- comedi finally moves out of the staging directory into drivers/comedi
This is one of the oldest kernel subsystems around, being created
in the 2.0 kernel days, and was one of the first things added to
drivers/staging/ when that was created over 15 years ago.
It should have been moved out of staging a long time ago, it's well
maintained and used by loads of different devices in the real world
every day. Nice to see this finally happen.
- so many tiny coding style cleanups it's not funny.
Perfect storm of at least 2 different intern project application
deadlines combined to provide a huge number of new contributions in
this area from people learning how to do kernel development. Great
job to everyone involved here.
There's also the normal updates for IIO drivers with new IIO drivers
and updates all over that subsystem.
All of these have been in linux-next for a while with no reported
issues"
* tag 'staging-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (907 commits)
staging: octeon: Use 'for_each_child_of_node'
Staging: rtl8723bs: rtw_xmit: fixed tabbing issue
staging: rtl8188eu: remove unused function parameters
staging: rtl8188eu: cmdThread is a task_struct
staging: rtl8188eu: remove constant variable and dead code
staging: rtl8188eu: change bLeisurePs' type to bool
staging: rtl8723bs: remove empty #ifdef block
staging: rtl8723bs: remove unused DBG_871X_LEVEL macro declarations
staging: rtl8723bs: split too long line
staging: rtl8723bs: fix indentation in if block
staging: rtl8723bs: fix code indent issue
staging: rtl8723bs: replace DBG_871X_LEVEL logs with netdev_*()
staging: rtl8192e: indent statement properly
staging: rtl8723bs: Remove led_blink_hdl() and everything related
staging: comedi: move out of staging directory
staging: rtl8723bs: remove sdio_drv_priv structure
staging: rtl8723bs: remove unused argument in function
staging: rtl8723bs: remove DBG_871X_SEL_NL macro declaration
staging: rtl8723bs: replace DBG_871X_SEL_NL with netdev_dbg()
staging: rtl8723bs: fix indentation issue introduced by long line split
...
Here is the big set of various smaller driver subsystem updates for
5.13-rc1.
Major bits in here are:
- habanalabs driver updates
- hwtracing driver updates
- interconnect driver updates
- mhi driver updates
- extcon driver updates
- fpga driver updates
- new binder features added
- nvmem driver updates
- phy driver updates
- soundwire driver updates
- smaller misc and char driver fixes and updates.
- bluetooth driver bugfix that maintainer wanted to go through
this tree.
All of these have been in linux-next with no reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYIa0CQ8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ylQ/QCgwLQleU5hH/iQwxbHgNL5GawNUroAmwZtxILF
1r6zjmGi0Ak4oFBf7A0T
=Rrl6
-----END PGP SIGNATURE-----
Merge tag 'char-misc-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver updates from Greg KH:
"Here is the big set of various smaller driver subsystem updates for
5.13-rc1.
Major bits in here are:
- habanalabs driver updates
- hwtracing driver updates
- interconnect driver updates
- mhi driver updates
- extcon driver updates
- fpga driver updates
- new binder features added
- nvmem driver updates
- phy driver updates
- soundwire driver updates
- smaller misc and char driver fixes and updates.
- bluetooth driver bugfix that maintainer wanted to go through this
tree.
All of these have been in linux-next with no reported issues"
* tag 'char-misc-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (330 commits)
bluetooth: eliminate the potential race condition when removing the HCI controller
coresight: etm-perf: Fix define build issue when built as module
phy: Revert "phy: ti: j721e-wiz: add missing of_node_put"
phy: ti: j721e-wiz: Add missing include linux/slab.h
phy: phy-twl4030-usb: Fix possible use-after-free in twl4030_usb_remove()
stm class: Use correct UUID APIs
intel_th: pci: Add Alder Lake-M support
intel_th: pci: Add Rocket Lake CPU support
intel_th: Consistency and off-by-one fix
intel_th: Constify attribute_group structs
intel_th: Constify all drvdata references
stm class: Remove an unused function
habanalabs/gaudi: Fix uninitialized return code rc when read size is zero
greybus: es2: fix kernel-doc warnings
mei: me: add Alder Lake P device id.
dw-xdata-pcie: Update outdated info and improve text format
dw-xdata-pcie: Fix documentation build warns
fbdev: zero-fill colormap in fbcmap.c
firmware: qcom-scm: Fix QCOM_SCM configuration
speakup: i18n: Switch to kmemdup_nul() in spk_msg_set()
...
Highlights:
- Lots of Microsoft Surface work
- platform-profile support for HP and Microsoft Surface devices
- New WMI Gigabyte motherboard temperature monitoring driver
- Intel PMC improvements for Tiger Lake and Alder Lake
- Misc. bugfixes, improvements and quirk additions all over
The following is an automated git shortlog grouped by driver:
Add support for DYTC MMC_GET BIOS API.:
- Add support for DYTC MMC_GET BIOS API.
Adjust Dell drivers to a personal email address:
- Adjust Dell drivers to a personal email address
Fix typo in Kconfig:
- Fix typo in Kconfig
ISST:
- Account for increased timeout in some cases
MAINTAINERS:
- Add missing section for alienware-wmi driver
- Adjust Dell drivers to email alias
- update MELLANOX HARDWARE PLATFORM SUPPORT maintainers
Merge tag 'ib-mfd-platform-x86-v5.13' into review-hans:
- Merge tag 'ib-mfd-platform-x86-v5.13' into review-hans
Merge tag 'irq-no-autoen-2021-03-25' into review-hans:
- Merge tag 'irq-no-autoen-2021-03-25' into review-hans
Typo fix in the file classmate-laptop.c:
- Typo fix in the file classmate-laptop.c
add Gigabyte WMI temperature driver:
- add Gigabyte WMI temperature driver
add support for Advantech software defined button:
- add support for Advantech software defined button
asus-laptop:
- fix kobj_to_dev.cocci warnings
asus-wmi:
- Add param to turn fn-lock mode on by default
dell-wmi-sysman:
- Make init_bios_attributes() ACPI object parsing more robust
- Cleanup create_attributes_level_sysfs_files()
- Make sysman_init() return -ENODEV of the interfaces are not found
- Cleanup sysman_init() error-exit handling
- Fix release_attributes_data() getting called twice on init_bios_attributes() failure
- Make it safe to call exit_foo_attributes() multiple times
- Fix possible NULL pointer deref on exit
- Fix crash caused by calling kset_unregister twice
docs:
- driver-api: Add Surface DTX driver documentation
genirq:
- Add IRQF_NO_AUTOEN for request_irq/nmi()
gigabyte-wmi:
- add support for B550M AORUS PRO-P
- add X570 AORUS ELITE
hp-wmi:
- add platform profile support
- rename "thermal policy" to "thermal profile"
intel-hid:
- Fix spurious wakeups caused by tablet-mode events during suspend
- Support Lenovo ThinkPad X1 Tablet Gen 2
intel-vbtn:
- Remove unused KEYMAP_LEN define
- Stop reporting SW_DOCK events
intel_chtdc_ti_pwrbtn:
- Fix missing IRQF_ONESHOT as only threaded handler
intel_pmc_core:
- Uninitialized data in pmc_core_lpm_latch_mode_write()
- add ACPI dependency
- Fix "unsigned 'ret' is never less than zero" smatch warning
- Add support for Alder Lake PCH-P
- Add LTR registers for Tiger Lake
- Add option to set/clear LPM mode
- Add requirements file to debugfs
- Get LPM requirements for Tiger Lake
- Show LPM residency in microseconds
- Handle sub-states generically
- Remove global struct pmc_dev
- Don't use global pmcdev in quirks
- export platform global reset bits via etr3 sysfs file
- Ignore GBE LTR on Tiger Lake platforms
- Update Kconfig
intel_pmt_class:
- Initial resource to 0
intel_pmt_crashlog:
- Fix incorrect macros
mfd:
- intel_pmt: Add support for DG1
- intel_pmt: Fix nuisance messages and handling of disabled capabilities
panasonic-laptop:
- remove redundant assignment of variable result
platform:
- x86: ACPI: Get rid of ACPICA message printing
platform/mellanox:
- mlxreg-hotplug: move to use request_irq by IRQF_NO_AUTOEN flag
- Typo fix in the file mlxbf-bootctl.c
platform/surface:
- aggregator: fix a bit test
- aggregator: move to use request_irq by IRQF_NO_AUTOEN flag
- aggregator_registry: Give devices time to set up when connecting
- clean up a variable in surface_dtx_read()
- fix semicolon.cocci warnings
- aggregator_registry: Add support for Surface Pro 7+
- aggregator_registry: Make symbol 'ssam_base_hub_group' static
- dtx: Add support for native SSAM devices
- Add DTX driver
- aggregator: Make SSAM_DEFINE_SYNC_REQUEST_x define static functions
- Add platform profile driver
- aggregator_registry: Add HID subsystem devices
- aggregator_registry: Add DTX device
- aggregator_registry: Add platform profile device
- aggregator_registry: Add battery subsystem devices
- aggregator_registry: Add base device hub
- Set up Surface Aggregator device registry
pmc_atom:
- Match all Beckhoff Automation baytrail boards with critclk_systems DMI table
thinkpad_acpi:
- Add labels to the first 2 temperature sensors
- Correct thermal sensor allocation
- Correct minor typo
- sysfs interface to get wwan antenna type
- Disable DYTC CQL mode around switching to balanced mode
- Allow the FnLock LED to change state
- check dytc version for lapmode sysfs
- Handle keyboard cover attach/detach events
tools/power/x86/intel-speed-select:
- v1.9 release
- Drop __DATE__ and __TIME__ macros
- Add options to force online
- Process mailbox read error for core-power
- Increase string size
touchscreen_dmi:
- Add info for the Teclast Tbook 11 tablet
- Handle device properties with software node API
wmi:
- Make remove callback return void
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEEuvA7XScYQRpenhd+kuxHeUQDJ9wFAmCGbVEUHGhkZWdvZWRl
QHJlZGhhdC5jb20ACgkQkuxHeUQDJ9x4ywgAo51ExPQcLMlEDdfpN7oa0ErT+4AF
lKqOHO/g3Am63NwlAVZElKAJq+AChfQzZ+Idy9E/IirFplmhuoKBBRQoB+U9SwYS
zerwNDwAh1j1ZLlWDo0BSsiJLdGJH3j5BvScjo57+Vfa75J9EofIGXvNEjLNxb7j
djLc4FawAfaqL6YerKXZPvYIfpIw2+26SyxDw2s6KlYyBkPIEneQvto0ObWR3vLc
1iFxLgfxL1fYX7dD9e/9H84kIQzs/wgTduXmnSn32BcFw3YOtWpnpwB0wJ8IIXM0
8Ta6jH2ZGTbgfKaHZf2O+UObj8tRXFzjpx4neh5vybRrBsYELzQIm+W+jQ==
=fsK6
-----END PGP SIGNATURE-----
Merge tag 'platform-drivers-x86-v5.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver updates freom Hans de Goede:
- lots of Microsoft Surface work
- platform-profile support for HP and Microsoft Surface devices
- new WMI Gigabyte motherboard temperature monitoring driver
- Intel PMC improvements for Tiger Lake and Alder Lake
- misc bugfixes, improvements and quirk additions all over
* tag 'platform-drivers-x86-v5.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (87 commits)
platform/x86: gigabyte-wmi: add support for B550M AORUS PRO-P
platform/x86: intel_pmc_core: Uninitialized data in pmc_core_lpm_latch_mode_write()
platform/x86: intel_pmc_core: add ACPI dependency
platform/surface: aggregator: fix a bit test
platform/x86: intel_pmc_core: Fix "unsigned 'ret' is never less than zero" smatch warning
platform/x86: touchscreen_dmi: Add info for the Teclast Tbook 11 tablet
platform/x86: intel_pmc_core: Add support for Alder Lake PCH-P
platform/x86: intel_pmc_core: Add LTR registers for Tiger Lake
platform/x86: intel_pmc_core: Add option to set/clear LPM mode
platform/x86: intel_pmc_core: Add requirements file to debugfs
platform/x86: intel_pmc_core: Get LPM requirements for Tiger Lake
platform/x86: intel_pmc_core: Show LPM residency in microseconds
platform/x86: intel_pmc_core: Handle sub-states generically
platform/x86: intel_pmc_core: Remove global struct pmc_dev
platform/x86: intel_pmc_core: Don't use global pmcdev in quirks
platform/x86: intel_chtdc_ti_pwrbtn: Fix missing IRQF_ONESHOT as only threaded handler
platform/x86: gigabyte-wmi: add X570 AORUS ELITE
platform/x86: thinkpad_acpi: Add labels to the first 2 temperature sensors
platform/x86: pmc_atom: Match all Beckhoff Automation baytrail boards with critclk_systems DMI table
platform/x86: add Gigabyte WMI temperature driver
...
- MTE asynchronous support for KASan. Previously only synchronous
(slower) mode was supported. Asynchronous is faster but does not allow
precise identification of the illegal access.
- Run kernel mode SIMD with softirqs disabled. This allows using NEON in
softirq context for crypto performance improvements. The conditional
yield support is modified to take softirqs into account and reduce the
latency.
- Preparatory patches for Apple M1: handle CPUs that only have the VHE
mode available (host kernel running at EL2), add FIQ support.
- arm64 perf updates: support for HiSilicon PA and SLLC PMU drivers, new
functions for the HiSilicon HHA and L3C PMU, cleanups.
- Re-introduce support for execute-only user permissions but only when
the EPAN (Enhanced Privileged Access Never) architecture feature is
available.
- Disable fine-grained traps at boot and improve the documented boot
requirements.
- Support CONFIG_KASAN_VMALLOC on arm64 (only with KASAN_GENERIC).
- Add hierarchical eXecute Never permissions for all page tables.
- Add arm64 prctl(PR_PAC_{SET,GET}_ENABLED_KEYS) allowing user programs
to control which PAC keys are enabled in a particular task.
- arm64 kselftests for BTI and some improvements to the MTE tests.
- Minor improvements to the compat vdso and sigpage.
- Miscellaneous cleanups.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmB5xkkACgkQa9axLQDI
XvEBgRAAsr6r8gsBQJP3FDHmbtbVf2ej5QJTCOAQAGHbTt0JH7Pk03pWSBr7h5nF
vsddRDxxeDgB6xd7jWP7EvDaPxHeB0CdSj5gG8EP/ZdOm8sFAwB1ZIHWikgUgSwW
nu6R28yXTMSj+EkyFtahMhTMJ1EMF4sCPuIgAo59ST5w/UMMqLCJByOu4ej6RPKZ
aeSJJWaDLBmbgnTKWxRvCc/MgIx4J/LAHWGkdpGjuMK6SLp38Kdf86XcrklXtzwf
K30ZYeoKq8zZ+nFOsK9gBVlOlocZcbS1jEbN842jD6imb6vKLQtBWrKk9A6o4v5E
XulORWcSBhkZb3ItIU9+6SmelUExf0VeVlSp657QXYPgquoIIGvFl6rCwhrdGMGO
bi6NZKCfJvcFZJoIN1oyhuHejgZSBnzGEcvhvzNdg7ItvOCed7q3uXcGHz/OI6tL
2TZKddzHSEMVfTo0D+RUsYfasZHI1qAiQ0mWVC31c+YHuRuW/K/jlc3a5TXlSBUa
Dwu0/zzMLiqx65ISx9i7XNMrngk55uzrS6MnwSByPoz4M4xsElZxt3cbUxQ8YAQz
jhxTHs1Pwes8i7f4n61ay/nHCFbmVvN/LlsPRpZdwd8JumThLrDolF3tc6aaY0xO
hOssKtnGY4Xvh/WitfJ5uvDb1vMObJKTXQEoZEJh4hlNQDxdeUE=
=6NGI
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
- MTE asynchronous support for KASan. Previously only synchronous
(slower) mode was supported. Asynchronous is faster but does not
allow precise identification of the illegal access.
- Run kernel mode SIMD with softirqs disabled. This allows using NEON
in softirq context for crypto performance improvements. The
conditional yield support is modified to take softirqs into account
and reduce the latency.
- Preparatory patches for Apple M1: handle CPUs that only have the VHE
mode available (host kernel running at EL2), add FIQ support.
- arm64 perf updates: support for HiSilicon PA and SLLC PMU drivers,
new functions for the HiSilicon HHA and L3C PMU, cleanups.
- Re-introduce support for execute-only user permissions but only when
the EPAN (Enhanced Privileged Access Never) architecture feature is
available.
- Disable fine-grained traps at boot and improve the documented boot
requirements.
- Support CONFIG_KASAN_VMALLOC on arm64 (only with KASAN_GENERIC).
- Add hierarchical eXecute Never permissions for all page tables.
- Add arm64 prctl(PR_PAC_{SET,GET}_ENABLED_KEYS) allowing user programs
to control which PAC keys are enabled in a particular task.
- arm64 kselftests for BTI and some improvements to the MTE tests.
- Minor improvements to the compat vdso and sigpage.
- Miscellaneous cleanups.
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (86 commits)
arm64/sve: Add compile time checks for SVE hooks in generic functions
arm64/kernel/probes: Use BUG_ON instead of if condition followed by BUG.
arm64: pac: Optimize kernel entry/exit key installation code paths
arm64: Introduce prctl(PR_PAC_{SET,GET}_ENABLED_KEYS)
arm64: mte: make the per-task SCTLR_EL1 field usable elsewhere
arm64/sve: Remove redundant system_supports_sve() tests
arm64: fpsimd: run kernel mode NEON with softirqs disabled
arm64: assembler: introduce wxN aliases for wN registers
arm64: assembler: remove conditional NEON yield macros
kasan, arm64: tests supports for HW_TAGS async mode
arm64: mte: Report async tag faults before suspend
arm64: mte: Enable async tag check fault
arm64: mte: Conditionally compile mte_enable_kernel_*()
arm64: mte: Enable TCO in functions that can read beyond buffer limits
kasan: Add report for async mode
arm64: mte: Drop arch_enable_tagging()
kasan: Add KASAN mode kernel parameter
arm64: mte: Add asynchronous mode support
arm64: Get rid of CONFIG_ARM64_VHE
arm64: Cope with CPUs stuck in VHE mode
...
Add RISC-V to the list of supported kexec architectures, we need to
add the definition early-on so that later patches can use it.
EM_RISCV is 243 as per ELF psABI specification here:
https://github.com/riscv/riscv-elf-psabi-doc/blob/master/riscv-elf.md
Signed-off-by: Nick Kossifidis <mick@ics.forth.gr>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
A lot of changes here for quite a quiet release in subsystem terms -
there's been a lot of fixes and cleanups all over the subsystem both
from generic work and from people working on specific drivers.
- More cleanup and consolidation work in the core and the generic card
drivers from Morimoto-san.
- Lots of cppcheck fixes for Pierre-Louis Brossart.
- New drivers for Freescale i.MX DMA over rpmsg, Mediatek MT6358
accessory detection, and Realtek RT1019, RT1316, RT711 and RT715.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmCGv4YACgkQJNaLcl1U
h9DwqAf/bSdRqMQLPAAzU/O79ztMRwSRcF14ygZceoqnNbohwqzeFTHweTK8NINj
dZsZiXK/NYDlcbBE3e5VcYr6g149L+1Xu6HZEY1CBUz7LOR8QaHUXAnJQHuXlv/D
J0EK5NBILR8jk9mpPd/c+dd3lo4liREWTOQKCcIuFI8M5V8CZqtoSfg6RK2qf3Oi
myC3+2pEqI4+h5GQRy5y7mxtFOn4w9kzp49P7EwD9SL9o4VGbsaORMeA+QaOe9PS
KLn6ZKSJ7lBcxvg5a1w4E4SwRC/GA0QY+n1YMNGfrfCm7PSdw4GSyovd9xQKwrvG
vhf+bYkzRBVRqvQP9pvrGGJY9DdDIA==
=A+NC
-----END PGP SIGNATURE-----
Merge tag 'asoc-v5.13' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Updates for v5.13
A lot of changes here for quite a quiet release in subsystem terms -
there's been a lot of fixes and cleanups all over the subsystem both
from generic work and from people working on specific drivers.
- More cleanup and consolidation work in the core and the generic card
drivers from Morimoto-san.
- Lots of cppcheck fixes for Pierre-Louis Brossart.
- New drivers for Freescale i.MX DMA over rpmsg, Mediatek MT6358
accessory detection, and Realtek RT1019, RT1316, RT711 and RT715.
The default behavior for source MACVLAN is to duplicate packets to
appropriate type source devices, and then do the normal destination MACVLAN
flow. This patch adds an option to skip destination MACVLAN processing if
any matching source MACVLAN device has the option set.
This allows setting up a "catch all" device for source MACVLAN: create one
or more devices with type source nodst, and one device with e.g. type vepa,
and incoming traffic will be received on exactly one device.
v2: netdev wants non-standard line length
Signed-off-by: Jethro Beekman <kernel@jbeekman.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov says:
====================
pull-request: bpf-next 2021-04-23
The following pull-request contains BPF updates for your *net-next* tree.
We've added 69 non-merge commits during the last 22 day(s) which contain
a total of 69 files changed, 3141 insertions(+), 866 deletions(-).
The main changes are:
1) Add BPF static linker support for extern resolution of global, from Andrii.
2) Refine retval for bpf_get_task_stack helper, from Dave.
3) Add a bpf_snprintf helper, from Florent.
4) A bunch of miscellaneous improvements from many developers.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a new io_uring_register() opcode for rsrc registeration. Instead of
accepting a pointer to resources, fds or iovecs, it @arg is now pointing
to a struct io_uring_rsrc_register, and the second argument tells how
large that struct is to make it easily extendible by adding new fields.
All that is done mainly to be able to pass in a pointer with tags. Pass
it in and enable CQE posting for file resources. Doesn't support setting
tags on update yet.
A design choice made here is to not post CQEs on rsrc de-registration,
but only when we updated-removed it by rsrc dynamic update.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/c498aaec32a4bb277b2406b9069662c02cdda98c.1619356238.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
New features:
- Stage-2 isolation for the host kernel when running in protected mode
- Guest SVE support when running in nVHE mode
- Force W^X hypervisor mappings in nVHE mode
- ITS save/restore for guests using direct injection with GICv4.1
- nVHE panics now produce readable backtraces
- Guest support for PTP using the ptp_kvm driver
- Performance improvements in the S2 fault handler
- Alexandru is now a reviewer (not really a new feature...)
Fixes:
- Proper emulation of the GICR_TYPER register
- Handle the complete set of relocation in the nVHE EL2 object
- Get rid of the oprofile dependency in the PMU code (and of the
oprofile body parts at the same time)
- Debug and SPE fixes
- Fix vcpu reset
-----BEGIN PGP SIGNATURE-----
iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmCCpuAPHG1hekBrZXJu
ZWwub3JnAAoJECPQ0LrRPXpD2G8QALWQYeBggKnNmAJfuihzZ2WariBmgcENs2R2
qNZ/Py6dIF+b69P68nmgrEV1x2Kp35cPJbBwXnnrS4FCB5tk0b8YMaj00QbiRIYV
UXbPxQTmYO1KbevpoEcw8NmR4bZJ/hRYPuzcQG7CCMKIZw0zj2cMcBofzQpTOAp/
CgItdcv7at3iwamQatfU9vUmC0nDdnjdIwSxTAJOYMVV1ENwtnYSNgZVo4XLTg7n
xR/5Qx27PKBJw7GyTRAIIxKAzNXG2tDL+GVIHe4AnRp3z3La8sr6PJf7nz9MCmco
ISgeY7EGQINzmm4LahpnV+2xwwxOWo8QotxRFGNuRTOBazfARyAbp97yJ6eXJUpa
j0qlg3xK9neyIIn9BQKkKx4sY9V45yqkuVDsK6odmqPq3EE01IMTRh1N/XQi+sTF
iGrlM3ZW4AjlT5zgtT9US/FRXeDKoYuqVCObJeXZdm3sJSwEqTAs0JScnc0YTsh7
m30CODnomfR2y5X6GoaubbQ0wcZ2I20K1qtIm+2F6yzD5P1/3Yi8HbXMxsSWyYWZ
1ldoSa+ZUQlzV9Ot0S3iJ4PkphLKmmO96VlxE2+B5gQG50PZkLzsr8bVyYOuJC8p
T83xT9xd07cy+FcGgF9veZL99Y6BLHMa6ZwFUolYNbzJxqrmqyR1aiJMEBIcX+aP
ACeKW1w5
=fpey
-----END PGP SIGNATURE-----
Merge tag 'kvmarm-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for Linux 5.13
New features:
- Stage-2 isolation for the host kernel when running in protected mode
- Guest SVE support when running in nVHE mode
- Force W^X hypervisor mappings in nVHE mode
- ITS save/restore for guests using direct injection with GICv4.1
- nVHE panics now produce readable backtraces
- Guest support for PTP using the ptp_kvm driver
- Performance improvements in the S2 fault handler
- Alexandru is now a reviewer (not really a new feature...)
Fixes:
- Proper emulation of the GICR_TYPER register
- Handle the complete set of relocation in the nVHE EL2 object
- Get rid of the oprofile dependency in the PMU code (and of the
oprofile body parts at the same time)
- Debug and SPE fixes
- Fix vcpu reset
Add a new flag LANDLOCK_CREATE_RULESET_VERSION to
landlock_create_ruleset(2). This enables to retreive a Landlock ABI
version that is useful to efficiently follow a best-effort security
approach. Indeed, it would be a missed opportunity to abort the whole
sandbox building, because some features are unavailable, instead of
protecting users as much as possible with the subset of features
provided by the running kernel.
This new flag enables user space to identify the minimum set of Landlock
features supported by the running kernel without relying on a filesystem
interface (e.g. /proc/version, which might be inaccessible) nor testing
multiple syscall argument combinations (i.e. syscall bisection). New
Landlock features will be documented and tied to a minimum version
number (greater than 1). The current version will be incremented for
each new kernel release supporting new Landlock features. User space
libraries can leverage this information to seamlessly restrict processes
as much as possible while being compatible with newer APIs.
This is a much more lighter approach than the previous
landlock_get_features(2): the complexity is pushed to user space
libraries. This flag meets similar needs as securityfs versions:
selinux/policyvers, apparmor/features/*/version* and tomoyo/version.
Supporting this flag now will be convenient for backward compatibility.
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: James Morris <jmorris@namei.org>
Cc: Jann Horn <jannh@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Serge E. Hallyn <serge@hallyn.com>
Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
Link: https://lore.kernel.org/r/20210422154123.13086-14-mic@digikod.net
Signed-off-by: James Morris <jamorris@linux.microsoft.com>
These 3 system calls are designed to be used by unprivileged processes
to sandbox themselves:
* landlock_create_ruleset(2): Creates a ruleset and returns its file
descriptor.
* landlock_add_rule(2): Adds a rule (e.g. file hierarchy access) to a
ruleset, identified by the dedicated file descriptor.
* landlock_restrict_self(2): Enforces a ruleset on the calling thread
and its future children (similar to seccomp). This syscall has the
same usage restrictions as seccomp(2): the caller must have the
no_new_privs attribute set or have CAP_SYS_ADMIN in the current user
namespace.
All these syscalls have a "flags" argument (not currently used) to
enable extensibility.
Here are the motivations for these new syscalls:
* A sandboxed process may not have access to file systems, including
/dev, /sys or /proc, but it should still be able to add more
restrictions to itself.
* Neither prctl(2) nor seccomp(2) (which was used in a previous version)
fit well with the current definition of a Landlock security policy.
All passed structs (attributes) are checked at build time to ensure that
they don't contain holes and that they are aligned the same way for each
architecture.
See the user and kernel documentation for more details (provided by a
following commit):
* Documentation/userspace-api/landlock.rst
* Documentation/security/landlock.rst
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: James Morris <jmorris@namei.org>
Cc: Jann Horn <jannh@google.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
Link: https://lore.kernel.org/r/20210422154123.13086-9-mic@digikod.net
Signed-off-by: James Morris <jamorris@linux.microsoft.com>
Using Landlock objects and ruleset, it is possible to tag inodes
according to a process's domain. To enable an unprivileged process to
express a file hierarchy, it first needs to open a directory (or a file)
and pass this file descriptor to the kernel through
landlock_add_rule(2). When checking if a file access request is
allowed, we walk from the requested dentry to the real root, following
the different mount layers. The access to each "tagged" inodes are
collected according to their rule layer level, and ANDed to create
access to the requested file hierarchy. This makes possible to identify
a lot of files without tagging every inodes nor modifying the
filesystem, while still following the view and understanding the user
has from the filesystem.
Add a new ARCH_EPHEMERAL_INODES for UML because it currently does not
keep the same struct inodes for the same inodes whereas these inodes are
in use.
This commit adds a minimal set of supported filesystem access-control
which doesn't enable to restrict all file-related actions. This is the
result of multiple discussions to minimize the code of Landlock to ease
review. Thanks to the Landlock design, extending this access-control
without breaking user space will not be a problem. Moreover, seccomp
filters can be used to restrict the use of syscall families which may
not be currently handled by Landlock.
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: James Morris <jmorris@namei.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Serge E. Hallyn <serge@hallyn.com>
Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
Link: https://lore.kernel.org/r/20210422154123.13086-8-mic@digikod.net
Signed-off-by: James Morris <jamorris@linux.microsoft.com>
The command is used for copying the incoming buffer into the
SEV guest memory space.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Steve Rutherford <srutherford@google.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Message-Id: <c5d0e3e719db7bb37ea85d79ed4db52e9da06257.1618498113.git.ashish.kalra@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The command is used to create the encryption context for an incoming
SEV guest. The encryption context can be later used by the hypervisor
to import the incoming data into the SEV guest memory space.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Steve Rutherford <srutherford@google.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Message-Id: <c7400111ed7458eee01007c4d8d57cdf2cbb0fc2.1618498113.git.ashish.kalra@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
After completion of SEND_START, but before SEND_FINISH, the source VMM can
issue the SEND_CANCEL command to stop a migration. This is necessary so
that a cancelled migration can restart with a new target later.
Reviewed-by: Nathan Tempelman <natet@google.com>
Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Steve Rutherford <srutherford@google.com>
Message-Id: <20210412194408.2458827-1-srutherford@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The command is used for encrypting the guest memory region using the encryption
context created with KVM_SEV_SEND_START.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by : Steve Rutherford <srutherford@google.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Message-Id: <d6a6ea740b0c668b30905ae31eac5ad7da048bb3.1618498113.git.ashish.kalra@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add a capability for userspace to mirror SEV encryption context from
one vm to another. On our side, this is intended to support a
Migration Helper vCPU, but it can also be used generically to support
other in-guest workloads scheduled by the host. The intention is for
the primary guest and the mirror to have nearly identical memslots.
The primary benefits of this are that:
1) The VMs do not share KVM contexts (think APIC/MSRs/etc), so they
can't accidentally clobber each other.
2) The VMs can have different memory-views, which is necessary for post-copy
migration (the migration vCPUs on the target need to read and write to
pages, when the primary guest would VMEXIT).
This does not change the threat model for AMD SEV. Any memory involved
is still owned by the primary guest and its initial state is still
attested to through the normal SEV_LAUNCH_* flows. If userspace wanted
to circumvent SEV, they could achieve the same effect by simply attaching
a vCPU to the primary VM.
This patch deliberately leaves userspace in charge of the memslots for the
mirror, as it already has the power to mess with them in the primary guest.
This patch does not support SEV-ES (much less SNP), as it does not
handle handing off attested VMSAs to the mirror.
For additional context, we need a Migration Helper because SEV PSP
migration is far too slow for our live migration on its own. Using
an in-guest migrator lets us speed this up significantly.
Signed-off-by: Nathan Tempelman <natet@google.com>
Message-Id: <20210408223214.2582277-1-natet@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Xnack retries are used for page fault recovery. Some AMD chip
families support continuously retry while page table entries are invalid.
The driver must handle the page fault interrupt and fill in a valid entry
for the GPU to continue.
This ioctl allows to enable/disable XNACK retries per KFD process.
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add svm (shared virtual memory) ioctl data structure and API definition.
The svm ioctl API is designed to be extensible in the future. All
operations are provided by a single IOCTL to preserve ioctl number
space. The arguments structure ends with a variable size array of
attributes that can be used to set or get one or multiple attributes.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* set sk_pacing_shift for 802.3->802.11 encap offload
* some monitor support for 802.11->802.3 decap offload
* HE (802.11ax) spec updates
* userspace API for TDLS HE support
* along with various other small features, cleanups and
fixups
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAmB+4y8ACgkQB8qZga/f
l8SiGw/9Fz3XETnNDYMvyY7ppmWzZ6vofRq307YJiCz1fszEKqwyyzMQOrHA9tg2
Nasl711egWlVyHTBCN+VCSaTQUjkODsK/5t4XWoxdJ0J3lZkgryVGBJljpl+k4A6
11qpvwUnO1WCmt0s49V2yU/jWgZ9itHfu9dosu/YIq+NfXUVA7ylKmP3gqfmcCeV
631z5AnM8/9N8QVMpnk5F2fE57WUXbA+KdVsw0LXMmjXYSsQ9MyTBX/lRDVcaMWV
7cOtHekkzD0MVfsOoBVvsJl+bybBgEPOfZn2Kt22Rh4JzAch/uUhwRQGzsGxcR3p
D8W9BABXCU8C5mhP8gcKlOSuH3h7ydKKqrXXNeRO+y5hymOtUSGJxia93m+uQ8qC
97wootP3cb97/dEzv5cWqw5Pa39uEsny6mQqueD5WcMI9imL98HEo3hrZElbctx8
s9ZE37WAlZ0zw+cGIsmElZfE2qMqEhjxF3mGFcpXLkk9/Y/1jmypYopkBLJh6KcS
mIfwk9qWgADbPT5df1A/1388lMkjBRcQGc1SriYxy/olvb70mD8IPPiDSD2kULDt
Sq2frnOdvjW0Q5DB6jBKzdMudAxY3WP5MlcGDy1iYwEbY6s4lPfQXG48joJpRQFG
I3zPM6Z+Pimx7vcTd5a+IUyKvDoF+DtxiOu8DGKYT2M5tv3/tpI=
=b0NQ
-----END PGP SIGNATURE-----
Merge tag 'mac80211-next-for-net-next-2021-04-20' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:
====================
Another set of updates, all over the map:
* set sk_pacing_shift for 802.3->802.11 encap offload
* some monitor support for 802.11->802.3 decap offload
* HE (802.11ax) spec updates
* userspace API for TDLS HE support
* along with various other small features, cleanups and
fixups
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
cap_setfcap is required to create file capabilities.
Since commit 8db6c34f1d ("Introduce v3 namespaced file capabilities"),
a process running as uid 0 but without cap_setfcap is able to work
around this as follows: unshare a new user namespace which maps parent
uid 0 into the child namespace.
While this task will not have new capabilities against the parent
namespace, there is a loophole due to the way namespaced file
capabilities are represented as xattrs. File capabilities valid in
userns 1 are distinguished from file capabilities valid in userns 2 by
the kuid which underlies uid 0. Therefore the restricted root process
can unshare a new self-mapping namespace, add a namespaced file
capability onto a file, then use that file capability in the parent
namespace.
To prevent that, do not allow mapping parent uid 0 if the process which
opened the uid_map file does not have CAP_SETFCAP, which is the
capability for setting file capabilities.
As a further wrinkle: a task can unshare its user namespace, then open
its uid_map file itself, and map (only) its own uid. In this case we do
not have the credential from before unshare, which was potentially more
restricted. So, when creating a user namespace, we record whether the
creator had CAP_SETFCAP. Then we can use that during map_write().
With this patch:
1. Unprivileged user can still unshare -Ur
ubuntu@caps:~$ unshare -Ur
root@caps:~# logout
2. Root user can still unshare -Ur
ubuntu@caps:~$ sudo bash
root@caps:/home/ubuntu# unshare -Ur
root@caps:/home/ubuntu# logout
3. Root user without CAP_SETFCAP cannot unshare -Ur:
root@caps:/home/ubuntu# /sbin/capsh --drop=cap_setfcap --
root@caps:/home/ubuntu# /sbin/setcap cap_setfcap=p /sbin/setcap
unable to set CAP_SETFCAP effective capability: Operation not permitted
root@caps:/home/ubuntu# unshare -Ur
unshare: write failed /proc/self/uid_map: Operation not permitted
Note: an alternative solution would be to allow uid 0 mappings by
processes without CAP_SETFCAP, but to prevent such a namespace from
writing any file capabilities. This approach can be seen at [1].
Background history: commit 95ebabde38 ("capabilities: Don't allow
writing ambiguous v3 file capabilities") tried to fix the issue by
preventing v3 fscaps to be written to disk when the root uid would map
to the same uid in nested user namespaces. This led to regressions for
various workloads. For example, see [2]. Ultimately this is a valid
use-case we have to support meaning we had to revert this change in
3b0c2d3eaa ("Revert 95ebabde38 ("capabilities: Don't allow writing
ambiguous v3 file capabilities")").
Link: https://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux.git/log/?h=2021-04-15/setfcap-nsfscaps-v4 [1]
Link: https://github.com/containers/buildah/issues/3071 [2]
Signed-off-by: Serge Hallyn <serge@hallyn.com>
Reviewed-by: Andrew G. Morgan <morgan@kernel.org>
Tested-by: Christian Brauner <christian.brauner@ubuntu.com>
Reviewed-by: Christian Brauner <christian.brauner@ubuntu.com>
Tested-by: Giuseppe Scrivano <gscrivan@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a capability, KVM_CAP_SGX_ATTRIBUTE, that can be used by userspace
to grant a VM access to a priveleged attribute, with args[0] holding a
file handle to a valid SGX attribute file.
The SGX subsystem restricts access to a subset of enclave attributes to
provide additional security for an uncompromised kernel, e.g. to prevent
malware from using the PROVISIONKEY to ensure its nodes are running
inside a geniune SGX enclave and/or to obtain a stable fingerprint.
To prevent userspace from circumventing such restrictions by running an
enclave in a VM, KVM restricts guest access to privileged attributes by
default.
Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Message-Id: <0b099d65e933e068e3ea934b0523bab070cb8cea.1618196135.git.kai.huang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The implementation takes inspiration from the existing bpf_trace_printk
helper but there are a few differences:
To allow for a large number of format-specifiers, parameters are
provided in an array, like in bpf_seq_printf.
Because the output string takes two arguments and the array of
parameters also takes two arguments, the format string needs to fit in
one argument. Thankfully, ARG_PTR_TO_CONST_STR is guaranteed to point to
a zero-terminated read-only map so we don't need a format string length
arg.
Because the format-string is known at verification time, we also do
a first pass of format string validation in the verifier logic. This
makes debugging easier.
Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210419155243.1632274-4-revest@chromium.org
Current Hardware events and Hardware cache events have special perf
types, PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE. The two types don't
pass the PMU type in the user interface. For a hybrid system, the perf
subsystem doesn't know which PMU the events belong to. The first capable
PMU will always be assigned to the events. The events never get a chance
to run on the other capable PMUs.
Extend the two types to become PMU aware types. The PMU type ID is
stored at attr.config[63:32].
Add a new PMU capability, PERF_PMU_CAP_EXTENDED_HW_TYPE, to indicate a
PMU which supports the extended PERF_TYPE_HARDWARE and
PERF_TYPE_HW_CACHE.
The PMU type is only required when searching a specific PMU. The PMU
specific codes will only be interested in the 'real' config value, which
is stored in the low 32 bit of the event->attr.config. Update the
event->attr.config in the generic code, so the PMU specific codes don't
need to calculate it separately.
If a user specifies a PMU type, but the PMU doesn't support the extended
type, error out.
If an event cannot be initialized in a PMU specified by a user, error
out immediately. Perf should not try to open it on other PMUs.
The new PMU capability is only set for the X86 hybrid PMUs for now.
Other architectures, e.g., ARM, may need it as well. The support on ARM
may be implemented later separately.
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1618237865-33448-22-git-send-email-kan.liang@linux.intel.com
Draft P802.11ax_D2.5 defines the following capabilities that
can be negotiated using RSNXE capabilities:
- Secure LTF measurement exchange protocol.
- Secure RTT measurement exchange protocol.
- Management frame protection for all management frames exchanged
during the negotiation and range measurement procedure.
Extend the nl80211 API to allow drivers to declare support for
these new capabilities as part of extended feature.
Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/iwlwifi.20210409123755.8280e31d8091.Ifcb29f84f432290338f80c8378aa5c9e0a390c93@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
- keep the ZC code, drop the code related to reinit
net/bridge/netfilter/ebtables.c
- fix build after move to net_generic
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This capability will allow the user to know which KVM_GUESTDBG_* bits
are supported.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210401135451.1004564-3-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Most devices maintain RMON (RFC 2819) stats - particularly
the "histogram" of packets received by size. Unlike other
RFCs which duplicate IEEE stats, the short/oversized frame
counters in RMON don't seem to match IEEE stats 1-to-1 either,
so expose those, too. Do not expose basic packet, CRC errors
etc - those are already otherwise covered.
Because standard defines packet ranges only up to 1518, and
everything above that should theoretically be "oversized"
- devices often create their own ranges.
Going beyond what the RFC defines - expose the "histogram"
in the Tx direction (assume for now that the ranges will
be the same).
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Number of devices maintains the standard-based MAC control
counters for control frames. Add a API for those.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Most of the MAC statistics are included in
struct rtnl_link_stats64, but some fields
are aggregated. Besides it's good to expose
these clearly hardware stats separately.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add an interface for reading standard stats, including
stats which don't have a corresponding control interface.
Start with IEEE 802.3 PHY stats. There seems to be only
one stat to expose there.
Define API to not require user space changes when new
stats or groups are added. Groups are based on bitset,
stats have a string set associated.
v1: wrap stats in a nest
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a new software event to count context switches
involving cgroup switches. So it's counted only if cgroups of
previous and next tasks are different. Note that it only checks the
cgroups in the perf_event subsystem. For cgroup v2, it shouldn't
matter anyway.
One can argue that we can do this by using existing sched_switch event
with eBPF. But some systems might not have eBPF for some reason so
I'd like to add this as a simple way.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210210083327.22726-2-namhyung@kernel.org
Adds bit perf_event_attr::sigtrap, which can be set to cause events to
send SIGTRAP (with si_code TRAP_PERF) to the task where the event
occurred. The primary motivation is to support synchronous signals on
perf events in the task where an event (such as breakpoints) triggered.
To distinguish perf events based on the event type, the type is set in
si_errno. For events that are associated with an address, si_addr is
copied from perf_sample_data.
The new field perf_event_attr::sig_data is copied to si_perf, which
allows user space to disambiguate which event (of the same type)
triggered the signal. For example, user space could encode the relevant
information it cares about in sig_data.
We note that the choice of an opaque u64 provides the simplest and most
flexible option. Alternatives where a reference to some user space data
is passed back suffer from the problem that modification of referenced
data (be it the event fd, or the perf_event_attr) can race with the
signal being delivered (of course, the same caveat applies if user space
decides to store a pointer in sig_data, but the ABI explicitly avoids
prescribing such a design).
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Link: https://lore.kernel.org/lkml/YBv3rAT566k+6zjg@hirez.programming.kicks-ass.net/
Introduces the TRAP_PERF si_code, and associated siginfo_t field
si_perf. These will be used by the perf event subsystem to send signals
(if requested) to the task where an event occurred.
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> # m68k
Acked-by: Arnd Bergmann <arnd@arndb.de> # asm-generic
Link: https://lkml.kernel.org/r/20210408103605.1676875-6-elver@google.com
Adds bit perf_event_attr::remove_on_exec, to support removing an event
from a task on exec.
This option supports the case where an event is supposed to be
process-wide only, and should not propagate beyond exec, to limit
monitoring to the original process image only.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210408103605.1676875-5-elver@google.com
Adds bit perf_event_attr::inherit_thread, to restricting inheriting
events only if the child was cloned with CLONE_THREAD.
This option supports the case where an event is supposed to be
process-wide only (including subthreads), but should not propagate
beyond the current process's shared environment.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/lkml/YBvj6eJR%2FDY2TsEB@hirez.programming.kicks-ass.net/
Similarly to pause statistics add stats for FEC.
The IEEE standard mandates two sets of counters:
- 30.5.1.1.17 aFECCorrectedBlocks
- 30.5.1.1.18 aFECUncorrectableBlocks
where block is a block of bits FEC operates on.
Each of these counters is defined per lane (PCS instance).
Multiple vendors provide number of corrected _bits_ rather
than/as well as blocks.
This set adds the 2 standard-based block counters and a extra
one for corrected bits.
Counters are exposed to user space via netlink in new attributes.
Each attribute carries an array of u64s, first element is
the total count, and the following ones are a per-lane break down.
Much like with pause stats the operation will not fail when driver
does not implement the get_fec_stats callback (nor can the driver
fail the operation by returning an error). If stats can't be
reported the relevant attributes will be empty.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Couple of dmaengine driver fixes for:
- race and descriptor issue for xilinx driver
- fix interrupt handling, wq state & cleanup, field sizes for
completion, msix permissions for idxd driver
- rumtim pm fix for tegra driver
- double free fix in dma_async_device_register
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE+vs47OPLdNbVcHzyfBQHDyUjg0cFAmB3F20ACgkQfBQHDyUj
g0d4GhAAw24ujgoKFdtV11h8eVyHwbsP45saRD8FVZ81oQ/3PI60rwnfuBNRnv4y
KkWxQj0TsV1XezJ2t16Ovoq/9y8m/x5pWgP0LxwYj463P8PP40p31WZJ1qaAcSCp
elau7TaoUSPW5HmD+Rv9IKpzVwLR9VLu0nqL22aJV62zouTJ0CCBFC/tJO+qrnmx
vsa8kE3B85M+KzONUlmYstclgYEMaH2IROnrY0fhY68Txg2BENl9+mT69YaY6btb
UABhKCUlfHYlXqCZNNsXlPo0E+n/jjVnTUceblCWWMwz3bPuYlc2KQLz49Nd3dWQ
CRj2WQ9YWBXcPntUOicLMOFEUiAyUEC+bRssCfLofqKltRDXI5vyq6Wdo23L1Dz5
eXLqI9DgccX4uoDQNp72AIPR7WO2pJnWjeLf7+tiXQ+htyr/SLfaZm2jciI4DcvK
NLczhwMo0aWb50XYPI1iyN+Rhohq/s724xHPx5JrlxJlacuXwHIqUs27GrMpdEQv
7Cpn4Hb0orPVjDeMQLFUcN8L7OmxC3M+vfTTl35ACI0EyAaeMW5Bm8SsKY8FjmFz
f6OioKE7wDlfUnJqqQyYaZoeaW40ofGJoVSB6cs3XyJXH6PNYoVD+rKw5joS06Jt
PbKkSJm3ZyV/m4Dd+GALRqntpHuYB4AaJbDI/blPzRzCJj4olJI=
=GduB
-----END PGP SIGNATURE-----
Merge tag 'dmaengine-fix-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine
Pull dmaengine fixes from Vinod Koul:
"A couple of dmaengine driver fixes for:
- race and descriptor issue for xilinx driver
- fix interrupt handling, wq state & cleanup, field sizes for
completion, msix permissions for idxd driver
- runtime pm fix for tegra driver
- double free fix in dma_async_device_register"
* tag 'dmaengine-fix-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine:
dmaengine: idxd: fix wq cleanup of WQCFG registers
dmaengine: idxd: clear MSIX permission entry on shutdown
dmaengine: plx_dma: add a missing put_device() on error path
dmaengine: tegra20: Fix runtime PM imbalance on error
dmaengine: Fix a double free in dma_async_device_register
dmaengine: dw: Make it dependent to HAS_IOMEM
dmaengine: idxd: fix wq size store permission state
dmaengine: idxd: fix opcap sysfs attribute output
dmaengine: idxd: fix delta_rec and crc size field for completion record
dmaengine: idxd: Fix clobbering of SWERR overflow bit on writeback
dmaengine: xilinx: dpdma: Fix race condition in done IRQ
dmaengine: xilinx: dpdma: Fix descriptor issuing on video group
When posix access ACL is set, it can have an effect on file mode and it can
also need to clear SGID if.
- None of caller's group/supplementary groups match file owner group.
AND
- Caller is not priviliged (No CAP_FSETID).
As of now fuser server is responsible for changing the file mode as
well. But it does not know whether to clear SGID or not.
So add a flag FUSE_SETXATTR_ACL_KILL_SGID and send this info with SETXATTR
to let file server know that sgid needs to be cleared as well.
Reported-by: Luis Henriques <lhenriques@suse.de>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fuse client needs to send additional information to file server when it
calls SETXATTR(system.posix_acl_access), so add extra flags field to the
structure.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
There is currently no way to discover the target of a tracing program
attachment after the fact. Add this information to bpf_link_info and return
it when querying the bpf_link fd.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210413091607.58945-1-toke@redhat.com
msm-next pull request has a baseline with stuff from -fixes, roll
forward first.
Some simple conflicts in amdgpu, ttm and one in i915 where git gets
confused and tries to add the same function twice.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This change introduces a prctl that allows the user program to control
which PAC keys are enabled in a particular task. The main reason
why this is useful is to enable a userspace ABI that uses PAC to
sign and authenticate function pointers and other pointers exposed
outside of the function, while still allowing binaries conforming
to the ABI to interoperate with legacy binaries that do not sign or
authenticate pointers.
The idea is that a dynamic loader or early startup code would issue
this prctl very early after establishing that a process may load legacy
binaries, but before executing any PAC instructions.
This change adds a small amount of overhead to kernel entry and exit
due to additional required instruction sequences.
On a DragonBoard 845c (Cortex-A75) with the powersave governor, the
overhead of similar instruction sequences was measured as 4.9ns when
simulating the common case where IA is left enabled, or 43.7ns when
simulating the uncommon case where IA is disabled. These numbers can
be seen as the worst case scenario, since in more realistic scenarios
a better performing governor would be used and a newer chip would be
used that would support PAC unlike Cortex-A75 and would be expected
to be faster than Cortex-A75.
On an Apple M1 under a hypervisor, the overhead of the entry/exit
instruction sequences introduced by this patch was measured as 0.3ns
in the case where IA is left enabled, and 33.0ns in the case where
IA is disabled.
Signed-off-by: Peter Collingbourne <pcc@google.com>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Link: https://linux-review.googlesource.com/id/Ibc41a5e6a76b275efbaa126b31119dc197b927a5
Link: https://lore.kernel.org/r/d6609065f8f40397a4124654eb68c9f490b4d477.1616123271.git.pcc@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
'linux/blkdev.h' and 'uapi/linux/lightnvm.h' included in 'lightnvm.h'
is duplicated.It is also included in the 5th and 7th line.
Signed-off-by: Zhang Yunkai <zhang.yunkai@zte.com.cn>
Signed-off-by: Matias Bjørling <matias.bjorling@wdc.com>
Link: https://lore.kernel.org/r/20210413105257.159260-4-matias.bjorling@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In 'bpf_ringbuf_reserve()' we require the flag to '0' at the moment.
For 'bpf_ringbuf_{discard,submit,output}' a flag of '0' might send a
notification to the process if needed.
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210412192434.944343-1-pctammela@mojatatu.com
Per net/bpf/test_run.c, particular prog types have additional
restrictions around the parameters that can be provided, so document
these in the header.
I didn't bother documenting the limitation on duration for raw
tracepoints since that's an output parameter anyway.
Tested with ./tools/testing/selftests/bpf/test_doc_build.sh.
Suggested-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Lorenz Bauer <lmb@cloudflare.com>
Link: https://lore.kernel.org/bpf/20210410174549.816482-1-joe@cilium.io
This adds two new POLL_ADD flags, IORING_POLL_UPDATE_EVENTS and
IORING_POLL_UPDATE_USER_DATA. As with the other POLL_ADD flag, these are
masked into sqe->len. If set, the POLL_ADD will have the following
behavior:
- sqe->addr must contain the the user_data of the poll request that
needs to be modified. This field is otherwise invalid for a POLL_ADD
command.
- If IORING_POLL_UPDATE_EVENTS is set, sqe->poll_events must contain the
new mask for the existing poll request. There are no checks for whether
these are identical or not, if a matching poll request is found, then it
is re-armed with the new mask.
- If IORING_POLL_UPDATE_USER_DATA is set, sqe->off must contain the new
user_data for the existing poll request.
A POLL_ADD with any of these flags set may complete with any of the
following results:
1) 0, which means that we successfully found the existing poll request
specified, and performed the re-arm procedure. Any error from that
re-arm will be exposed as a completion event for that original poll
request, not for the update request.
2) -ENOENT, if no existing poll request was found with the given
user_data.
3) -EALREADY, if the existing poll request was already in the process of
being removed/canceled/completing.
4) -EACCES, if an attempt was made to modify an internal poll request
(eg not one originally issued ass IORING_OP_POLL_ADD).
The usual -EINVAL cases apply as well, if any invalid fields are set
in the sqe for this command type.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The default io_uring poll mode is one-shot, where once the event triggers,
the poll command is completed and won't trigger any further events. If
we're doing repeated polling on the same file or socket, then it can be
more efficient to do multishot, where we keep triggering whenever the
event becomes true.
This deviates from the usual norm of having one CQE per SQE submitted. Add
a CQE flag, IORING_CQE_F_MORE, which tells the application to expect
further completion events from the submitted SQE. Right now the only user
of this is POLL_ADD in multishot mode.
Since sqe->poll_events is using the space that we normally use for adding
flags to commands, use sqe->len for the flag space for POLL_ADD. Multishot
mode is selected by setting IORING_POLL_ADD_MULTI in sqe->len. An
application should expect more CQEs for the specificed SQE if the CQE is
flagged with IORING_CQE_F_MORE. In multishot mode, only cancelation or an
error will terminate the poll request, in which case the flag will be
cleared.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Define get_module_eeprom_by_page() ethtool callback and implement
netlink infrastructure.
get_module_eeprom_by_page() allows network drivers to dump a part of
module's EEPROM specified by page and bank numbers along with offset and
length. It is effectively a netlink replacement for get_module_info()
and get_module_eeprom() pair, which is needed due to emergence of
complex non-linear EEPROM layouts.
Signed-off-by: Vladyslav Tarasiuk <vladyslavt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When async binder buffer got exhausted, some normal oneway transactions
will also be discarded and may cause system or application failures. By
that time, the binder debug information we dump may not be relevant to
the root cause. And this issue is difficult to debug if without the
backtrace of the thread sending spam.
This change will send BR_ONEWAY_SPAM_SUSPECT to userspace when oneway
spamming is detected, request to dump current backtrace. Oneway spamming
will be reported only once when exceeding the threshold (target process
dips below 80% of its oneway space, and current process is responsible for
either more than 50 transactions, or more than 50% of the oneway space).
And the detection will restart when the async buffer has returned to a
healthy state.
Acked-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Hang Lu <hangl@codeaurora.org>
Link: https://lore.kernel.org/r/1617961246-4502-3-git-send-email-hangl@codeaurora.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Callout devices are long-gone, but the ASYNC_SPLIT_TERMIOS flag was
never added to the deprecation mask.
Add it so that a warning is printed if it is ever used.
Fixes: 8a8ae62f82 ("tty: warn on deprecated serial flags")
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20210407095208.31838-7-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Some kernel-internal ASYNC flags have been superseded by tty-port flags
and should no longer be used by kernel drivers.
Fix the misspelled "__KERNEL__" compile guards which failed their sole
purpose to break out-of-tree drivers that have not yet been updated.
Fixes: 5c0517fefc ("tty: core: Undefine ASYNC_* flags superceded by TTY_PORT* flags")
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20210407095208.31838-2-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Conflicts:
MAINTAINERS
- keep Chandrasekar
drivers/net/ethernet/mellanox/mlx5/core/en_main.c
- simple fix + trust the code re-added to param.c in -next is fine
include/linux/bpf.h
- trivial
include/linux/ethtool.h
- trivial, fix kdoc while at it
include/linux/skmsg.h
- move to relevant place in tcp.c, comment re-wrapped
net/core/skmsg.c
- add the sk = sk // sk = NULL around calls
net/tipc/crypto.c
- trivial
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
mac80211, wireless, and bpf trees. No scary regressions here
or in the works, but small fixes for 5.12 changes keep coming.
Current release - regressions:
- virtio: do not pull payload in skb->head
- virtio: ensure mac header is set in virtio_net_hdr_to_skb()
- Revert "net: correct sk_acceptq_is_full()"
- mptcp: revert "mptcp: provide subflow aware release function"
- ethernet: lan743x: fix ethernet frame cutoff issue
- dsa: fix type was not set for devlink port
- ethtool: remove link_mode param and derive link params
from driver
- sched: htb: fix null pointer dereference on a null new_q
- wireless: iwlwifi: Fix softirq/hardirq disabling in
iwl_pcie_enqueue_hcmd()
- wireless: iwlwifi: fw: fix notification wait locking
- wireless: brcmfmac: p2p: Fix deadlock introduced by avoiding
the rtnl dependency
Current release - new code bugs:
- napi: fix hangup on napi_disable for threaded napi
- bpf: take module reference for trampoline in module
- wireless: mt76: mt7921: fix airtime reporting and related
tx hangs
- wireless: iwlwifi: mvm: rfi: don't lock mvm->mutex when sending
config command
Previous releases - regressions:
- rfkill: revert back to old userspace API by default
- nfc: fix infinite loop, refcount & memory leaks in LLCP sockets
- let skb_orphan_partial wake-up waiters
- xfrm/compat: Cleanup WARN()s that can be user-triggered
- vxlan, geneve: do not modify the shared tunnel info when PMTU
triggers an ICMP reply
- can: fix msg_namelen values depending on CAN_REQUIRED_SIZE
- can: uapi: mark union inside struct can_frame packed
- sched: cls: fix action overwrite reference counting
- sched: cls: fix err handler in tcf_action_init()
- ethernet: mlxsw: fix ECN marking in tunnel decapsulation
- ethernet: nfp: Fix a use after free in nfp_bpf_ctrl_msg_rx
- ethernet: i40e: fix receiving of single packets in xsk zero-copy
mode
- ethernet: cxgb4: avoid collecting SGE_QBASE regs during traffic
Previous releases - always broken:
- bpf: Refuse non-O_RDWR flags in BPF_OBJ_GET
- bpf: Refcount task stack in bpf_get_task_stack
- bpf, x86: Validate computation of branch displacements
- ieee802154: fix many similar syzbot-found bugs
- fix NULL dereferences in netlink attribute handling
- reject unsupported operations on monitor interfaces
- fix error handling in llsec_key_alloc()
- xfrm: make ipv4 pmtu check honor ip header df
- xfrm: make hash generation lock per network namespace
- xfrm: esp: delete NETIF_F_SCTP_CRC bit from features for esp
offload
- ethtool: fix incorrect datatype in set_eee ops
- xdp: fix xdp_return_frame() kernel BUG throw for page_pool
memory model
- openvswitch: fix send of uninitialized stack memory in ct limit
reply
Misc:
- udp: add get handling for UDP_GRO sockopt
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmBwyfAACgkQMUZtbf5S
IruJ/BAAnjghw2kWXRCKK3Tkm0pi0zjaKvTS30AcKCW2+GnqSxTdiWNv+mxqFgnm
YdduPKiGwLoDkA2i2d4EF8/HK6m+Q6bHcUbZ2npEm1ElkKfxCYGmocor8n2kD+a9
je94VGYV7zytnxXw85V6/jFLDqOXXwhBfHhlDMVBZP8OyzUfbDKGorWmyGuy9GJp
81bvzqN2bHUGIM0cDr+ol3eYw2ituGWgiqNfnq7z+/NVcYmD0EPChDRbp0jtH1ng
dcoONI6YlymDEDpu/9GmyKL1ken9lcWoVdvv/aDGtP62x6SYDt5HKe3wAtJ+Kjbq
jIPADxPx5BymYIZRBtdNR0rP66LycA7hDtM/C/h1WoihDXwpGeNUU4g0aJ+hsP5Q
ldwJI1DJo79VbwM2c3Kg73PaphLcPD4RdwF0/ovFsl0+bTDfj8i93ah4Wnzj0Qli
EMiSDEDNb51e9nkW+xu+FjLWmxHJvLOL/+VgHV5bPJJBob2fqnjAMj2PkPEuEtXY
TPWEh9y3zaEyp/9tNx0cstGOt6Gf5DQ5Nk6tX6hMpJT/BeL8mju1jm0yPLZhMJjF
LlTrJgXftfP/cjltdSm4aVqSU5okjHNYDhmHlNgvzih5mt+NVslRJfzwq62Vudqy
C0kpmVdQNFkOB0UcqQihevZg9mvem3m/dYl+v/MV7Uq6r4s4M2A=
=SHL0
-----END PGP SIGNATURE-----
Merge tag 'net-5.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Networking fixes for 5.12-rc7, including fixes from can, ipsec,
mac80211, wireless, and bpf trees.
No scary regressions here or in the works, but small fixes for 5.12
changes keep coming.
Current release - regressions:
- virtio: do not pull payload in skb->head
- virtio: ensure mac header is set in virtio_net_hdr_to_skb()
- Revert "net: correct sk_acceptq_is_full()"
- mptcp: revert "mptcp: provide subflow aware release function"
- ethernet: lan743x: fix ethernet frame cutoff issue
- dsa: fix type was not set for devlink port
- ethtool: remove link_mode param and derive link params from driver
- sched: htb: fix null pointer dereference on a null new_q
- wireless: iwlwifi: Fix softirq/hardirq disabling in
iwl_pcie_enqueue_hcmd()
- wireless: iwlwifi: fw: fix notification wait locking
- wireless: brcmfmac: p2p: Fix deadlock introduced by avoiding the
rtnl dependency
Current release - new code bugs:
- napi: fix hangup on napi_disable for threaded napi
- bpf: take module reference for trampoline in module
- wireless: mt76: mt7921: fix airtime reporting and related tx hangs
- wireless: iwlwifi: mvm: rfi: don't lock mvm->mutex when sending
config command
Previous releases - regressions:
- rfkill: revert back to old userspace API by default
- nfc: fix infinite loop, refcount & memory leaks in LLCP sockets
- let skb_orphan_partial wake-up waiters
- xfrm/compat: Cleanup WARN()s that can be user-triggered
- vxlan, geneve: do not modify the shared tunnel info when PMTU
triggers an ICMP reply
- can: fix msg_namelen values depending on CAN_REQUIRED_SIZE
- can: uapi: mark union inside struct can_frame packed
- sched: cls: fix action overwrite reference counting
- sched: cls: fix err handler in tcf_action_init()
- ethernet: mlxsw: fix ECN marking in tunnel decapsulation
- ethernet: nfp: Fix a use after free in nfp_bpf_ctrl_msg_rx
- ethernet: i40e: fix receiving of single packets in xsk zero-copy
mode
- ethernet: cxgb4: avoid collecting SGE_QBASE regs during traffic
Previous releases - always broken:
- bpf: Refuse non-O_RDWR flags in BPF_OBJ_GET
- bpf: Refcount task stack in bpf_get_task_stack
- bpf, x86: Validate computation of branch displacements
- ieee802154: fix many similar syzbot-found bugs
- fix NULL dereferences in netlink attribute handling
- reject unsupported operations on monitor interfaces
- fix error handling in llsec_key_alloc()
- xfrm: make ipv4 pmtu check honor ip header df
- xfrm: make hash generation lock per network namespace
- xfrm: esp: delete NETIF_F_SCTP_CRC bit from features for esp
offload
- ethtool: fix incorrect datatype in set_eee ops
- xdp: fix xdp_return_frame() kernel BUG throw for page_pool memory
model
- openvswitch: fix send of uninitialized stack memory in ct limit
reply
Misc:
- udp: add get handling for UDP_GRO sockopt"
* tag 'net-5.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (182 commits)
net: fix hangup on napi_disable for threaded napi
net: hns3: Trivial spell fix in hns3 driver
lan743x: fix ethernet frame cutoff issue
net: ipv6: check for validity before dereferencing cfg->fc_nlinfo.nlh
net: dsa: lantiq_gswip: Configure all remaining GSWIP_MII_CFG bits
net: dsa: lantiq_gswip: Don't use PHY auto polling
net: sched: sch_teql: fix null-pointer dereference
ipv6: report errors for iftoken via netlink extack
net: sched: fix err handler in tcf_action_init()
net: sched: fix action overwrite reference counting
Revert "net: sched: bump refcount for new action in ACT replace mode"
ice: fix memory leak of aRFS after resuming from suspend
i40e: Fix sparse warning: missing error code 'err'
i40e: Fix sparse error: 'vsi->netdev' could be null
i40e: Fix sparse error: uninitialized symbol 'ring'
i40e: Fix sparse errors in i40e_txrx.c
i40e: Fix parameters in aq_get_phy_register()
nl80211: fix beacon head validation
bpf, x86: Validate computation of branch displacements for x86-32
bpf, x86: Validate computation of branch displacements for x86-64
...
- Proper support for BCM4330 and BMC4334
- Various improvements for firmware download of Intel controllers
- Update management interface revision to 20
- Support for AOSP HCI vendor commands
- Initial Virtio support
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
-----BEGIN PGP SIGNATURE-----
iQJNBAABCAA3FiEE7E6oRXp8w05ovYr/9JCA4xAyCykFAmBvL7UZHGx1aXoudm9u
LmRlbnR6QGludGVsLmNvbQAKCRD0kIDjEDILKU+eD/4qGiwnwQVia4hjnQ5PZ0c/
sBmZvzzZE43RNRvy8gAwcVLypLaiIkEYml4ruVdF6RGoqaDA6kRB7D0tdk8EtZc4
6h72LX0zefnyZNj/Fcl6+dKuq7bSHjdOK0G+oXFmpvYfScPsI6Oq0ntpYOFYwUE0
cDdeZyb/sW4f6L5AXUgLjD9AIjzlBOJEU5HcRL/uY+E/fH1oe757/IhUDerXjTIh
9EhY28mrI7KeoEcJ1NIgvjPnRCxPRLAKgrzc8NXkUsLeNZSIywrwjgRfTvCZ+qQf
g/MmNv+n7yFlUzA7Fu2d8YAgRtgF9nZc45geqOSHqcSZURORSs0CzEoYinqwT7f0
xdMBi/UoSzJsLksgv+OfXOyoQ4lPNJ47pcH/edmT/ZJ/Ai2yTqUUSODHQ3MLC4Dp
kQj34thQvfTf5b6+9HwKmBjlfMK0QGPqWjp6dth2bzos/ugurDP+XimAv0urvFz+
NkES5kCMUs/1z7Yh+Nj1MyU3Wfdwol0wOh+PXBTAw2fxSy6kmKYGyLw7HEdWegB0
K2w/ZS58g0EGK89ZSiqlZKmXtAd22dQqS6JlZje+9YTtEtRWu+rAF1dddS8vJDjl
Rq3VSQ1B6Qw0rdGpyfxdgm/sWwq5S1YgQQcAZgVaeo9ZiIhfscVD5v5L0Ol6k9ms
3eCK/2brsX0IaV61eVnoOw==
=jiCz
-----END PGP SIGNATURE-----
Merge tag 'for-net-next-2021-04-08' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Luiz Augusto von Dentz says:
====================
bluetooth-next pull request for net-next:
- Proper support for BCM4330 and BMC4334
- Various improvements for firmware download of Intel controllers
- Update management interface revision to 20
- Support for AOSP HCI vendor commands
- Initial Virtio support
====================
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add constants for 2.5G and 5G speed in PCS speed register into mdio.h.
Signed-off-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enhance enum nl80211_tdls_peer_capability to configure TDLS peer's
support for HE mode. Userspace decodes the TDLS setup response frame
and confugures the HE mode support to driver if the peer has advertized
HE mode support in TDLS setup response frame. The driver uses this
information to decide whether to include HE operation IE in TDLS setup
confirmation frame.
Signed-off-by: Vamsi Krishna <vamsin@codeaurora.org>
Link: https://lore.kernel.org/r/1614696636-30144-1-git-send-email-vamsin@codeaurora.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This adds support for Bluetooth HCI transport over virtio.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
The docs were very sparse with how exactly CMD_ROAM should be
used. Specifically related to BSS information normally obtained
through a user space scan.
Signed-off-by: James Prestwood <prestwoj@gmail.com>
Link: https://lore.kernel.org/r/20210311230333.103934-1-prestwoj@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Recompiling with the new extended version of struct rfkill_event
broke systemd in *two* ways:
- It used "sizeof(struct rfkill_event)" to read the event, but
then complained if it actually got something != 8, this broke
it on new kernels (that include the updated API);
- It used sizeof(struct rfkill_event) to write a command, but
didn't implement the intended expansion protocol where the
kernel returns only how many bytes it accepted, and errored
out due to the unexpected smaller size on kernels that didn't
include the updated API.
Even though systemd has now been fixed, that fix may not be always
deployed, and other applications could potentially have similar
issues.
As such, in the interest of avoiding regressions, revert the
default API "struct rfkill_event" back to the original size.
Instead, add a new "struct rfkill_event_ext" that extends it by
the new field, and even more clearly document that applications
should be prepared for extensions in two ways:
* write might only accept fewer bytes on older kernels, and
will return how many to let userspace know which data may
have been ignored;
* read might return anything between 8 (the original size) and
whatever size the application sized its buffer at, indicating
how much event data was supported by the kernel.
Perhaps that will help avoid such issues in the future and we
won't have to come up with another version of the struct if we
ever need to extend it again.
Applications that want to take advantage of the new field will
have to be modified to use struct rfkill_event_ext instead now,
which comes with the danger of them having already been updated
to use it from 'struct rfkill_event', but I found no evidence
of that, and it's still relatively new.
Cc: stable@vger.kernel.org # 5.11
Reported-by: Takashi Iwai <tiwai@suse.de>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # LLVM/Clang v12.0.0-r4 (x86-64)
Link: https://lore.kernel.org/r/20210319232510.f1a139cfdd9c.Ic5c7c9d1d28972059e132ea653a21a427c326678@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Fix remaining issues with kdoc in the ethtool headers.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a note on expected handling of reserved fields,
and references to all kdocs. This fixes a bunch
of kdoc warnings.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extended link state structures and enums use kdoc headers
but then do not describe any of the members.
Convert to normal comments.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement the hypervisor side of the KVM PTP interface.
The service offers wall time and cycle count from host to guest.
The caller must specify whether they want the host's view of
either the virtual or physical counter.
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201209060932.212364-7-jianyong.wu@arm.com
This driver never had any open userspace (which for VFIO would include
VM kernel drivers) that use it, and thus should never have been added
by our normal userspace ABI rules.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Message-Id: <20210326061311.1497642-2-hch@lst.de>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Add Colorimetry control class for colorimetry controls
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Reviewed-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Long Term Reference (LTR) frames are the frames that are encoded
sometime in the past and stored in the DPB buffer list to be used
as reference to encode future frames.
This change adds controls to enable this feature.
Signed-off-by: Dikshita Agarwal <dikshita@codeaurora.org>
Reviewed-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Modify the documentation to point out which flags and structs are
used to configure the statistics.
Signed-off-by: Dafna Hirschfeld <dafna.hirschfeld@collabora.com>
Reviewed-by: Sebastian Fricke <sebastian.fricke@posteo.net>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
CoreSight PMU supports aux-buffer for the ETM tracing. The trace
generated by the ETM (associated with individual CPUs, like Intel PT)
is captured by a separate IP (CoreSight TMC-ETR/ETF until now).
The TMC-ETR applies formatting of the raw ETM trace data, as it
can collect traces from multiple ETMs, with the TraceID to indicate
the source of a given trace packet.
Arm Trace Buffer Extension is new "sink" IP, attached to individual
CPUs and thus do not provide additional formatting, like TMC-ETR.
Additionally, a system could have both TRBE *and* TMC-ETR for
the trace collection. e.g, TMC-ETR could be used as a single
trace buffer to collect data from multiple ETMs to correlate
the traces from different CPUs. It is possible to have a
perf session where some events end up collecting the trace
in TMC-ETR while the others in TRBE. Thus we need a way
to identify the type of the trace for each AUX record.
Define the trace formats exported by the CoreSight PMU.
We don't define the flags following the "ETM" as this
information is available to the user when issuing
the session. What is missing is the additional
formatting applied by the "sink" which is decided
at the runtime and the user may not have a control on.
So we define :
- CORESIGHT format (indicates the Frame format)
- RAW format (indicates the format of the source)
The default value is CORESIGHT format for all the records
(i,e == 0). Add the RAW format for others that use
raw format.
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20210405164307.1720226-3-suzuki.poulose@arm.com
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Allocate a byte for advertising the PMU specific format type
of the given AUX record. A PMU could end up providing hardware
trace data in multiple format in a single session.
e.g, The format of hardware buffer produced by CoreSight ETM
PMU depends on the type of the "sink" device used for collection
for an event (Traditional TMC-ETR/Bs with formatting or
TRBEs without any formatting).
# Boring story of why this is needed. Goto The_End_of_Story for skipping.
CoreSight ETM trace allows instruction level tracing of Arm CPUs.
The ETM generates the CPU excecution trace and pumps it into CoreSight
AMBA Trace Bus and is collected by a different CoreSight component
(traditionally CoreSight TMC-ETR /ETB/ETF), called "sink".
Important to note that there is no guarantee that every CPU has
a dedicated sink. Thus multiple ETMs could pump the trace data
into the same "sink" and thus they apply additional formatting
of the trace data for the user to decode it properly and attribute
the trace data to the corresponding ETM.
However, with the introduction of Arm Trace buffer Extensions (TRBE),
we now have a dedicated per-CPU architected sink for collecting the
trace. Since the TRBE is always per-CPU, it doesn't apply any formatting
of the trace. The support for this driver is under review [1].
Now a system could have a per-cpu TRBE and one or more shared
TMC-ETRs on the system. A user could choose a "specific" sink
for a perf session (e.g, a TMC-ETR) or the driver could automatically
select the nearest sink for a given ETM. It is possible that
some ETMs could end up using TMC-ETR (e.g, if the TRBE is not
usable on the CPU) while the others using TRBE in a single
perf session. Thus we now have "formatted" trace collected
from TMC-ETR and "unformatted" trace collected from TRBE.
However, we don't get into a situation where a single event
could end up using TMC-ETR & TRBE. i.e, any AUX buffer is
guaranteed to be either RAW or FORMATTED, but not a mix
of both.
As for perf decoding, we need to know the type of the data
in the individual AUX buffers, so that it can set up the
"OpenCSD" (library for decoding CoreSight trace) decoder
instance appropriately. Thus the perf.data file must conatin
the hints for the tool to decode the data correctly.
Since this is a runtime variable, and perf tool doesn't have
a control on what sink gets used (in case of automatic sink
selection), we need this information made available from
the PMU driver for each AUX record.
# The_End_of_Story
Cc: Peter Ziljstra <peterz@infradead.org>
Cc: alexander.shishkin@linux.intel.com
Cc: mingo@redhat.com
Cc: will@kernel.org
Cc: mark.rutland@arm.com
Cc: mike.leach@linaro.org
Cc: acme@kernel.org
Cc: jolsa@redhat.com
Cc: Mathieu Poirier <mathieu.poirer@linaro.org>
Reviewed by: Mike Leach <mike.leach@linaro.org>
Acked-by: Peter Ziljstra <peterz@infradead.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20210405164307.1720226-2-suzuki.poulose@arm.com
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmBnh84QHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpubkD/0Y+l3cecjzds3RnRXEXYsRFBKGfK6c7Uuu
QVCrlRp6tKPmBoDLQyl95Mg0e44pR4s3Bw5W4j9GmJtVyNVzC2x3dqXn3uXSFca/
KU+4GIzl2VIXS5Pn90GLE6/xw3FtVy8w2c6V3g4jkLR29bexPdO4s57cohxKR9kL
ZU+icCag9RlNIYkuB79Wy6Y3/m41L5WRkMGiMb0sJS9Q+k+zetZNIeNIxWn4E1zF
qWymdyBFx31qL8/2ZmRwb8XzF5qE2XimXz1a7ZX754zyR/Ry5rGc0h+JjqgUhSV9
wM2gLlMNEP+k+8DOU9ACYdff18P6b+RZ8mJnGZjZseAut1qJXonVtgDoWX7mEs9+
8Gl+n18TYpKEfzLiOOOtu/xeZYMjp0MUjO6iHTpzRfqBjKNoZGTuz0wGC5nX/ZYI
y5QWifI0NmMmTPDJpH6nVYzqDLbEZzcMz6WeOfhKQ/yv7gOxj+BFGJ3olJ+DAx8c
e6HDPa/WkC0iqie5cpzYjmve0HrKJADMMrRRWGRkgmOZ8uAaSS17rZExg1CICr8I
bOVYsrPsg8ErKVvzlx/DK6EfhNrw0+Db7paYccl2a3pXx/T8iHmW3RSqn7jMrhA1
7QPOCUMKuWuaOupWJWw25gxNS3viJa57/hxMG1nvAgpJx6QvBNaLrwcIWXO1cfrp
boe/UFnftg==
=8odY
-----END PGP SIGNATURE-----
Merge tag 'block-5.12-2021-04-02' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- Remove comment that never came to fruition in 22 years of development
(Christoph)
- Remove unused request flag (Christoph)
- Fix for null_blk fake timeout handling (Damien)
- Fix for IOCB_NOWAIT being ignored for O_DIRECT on raw bdevs (Pavel)
- Error propagation fix for multiple split bios (Yufen)
* tag 'block-5.12-2021-04-02' of git://git.kernel.dk/linux-block:
block: remove the unused RQF_ALLOCED flag
block: update a few comments in uapi/linux/blkpg.h
block: don't ignore REQ_NOWAIT for direct IO
null_blk: fix command timeout completion handling
block: only update parent bi_status when bio fail
The MPTCP reset option allows to carry a mptcp-specific error code that
provides more information on the nature of a connection reset.
Reset option data received gets stored in the subflow context so it can
be sent to userspace via the 'subflow closed' netlink event.
When a subflow is closed, the desired error code that should be sent to
the peer is also placed in the subflow context structure.
If a reset is sent before subflow establishment could complete, e.g. on
HMAC failure during an MP_JOIN operation, the mptcp skb extension is
used to store the reset information.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov says:
====================
pull-request: bpf-next 2021-04-01
The following pull-request contains BPF updates for your *net-next* tree.
We've added 68 non-merge commits during the last 7 day(s) which contain
a total of 70 files changed, 2944 insertions(+), 1139 deletions(-).
The main changes are:
1) UDP support for sockmap, from Cong.
2) Verifier merge conflict resolution fix, from Daniel.
3) xsk selftests enhancements, from Maciej.
4) Unstable helpers aka kernel func calling, from Martin.
5) Batches ops for LPM map, from Pedro.
6) Fix race in bpf_get_local_storage, from Yonghong.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The big top of the file comment talk about grand plans that never
happened, so remove them to not confuse the readers. Also mark the
devname and volname fields as ignored as they were never used by the
kernel.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Reusing BPF_SK_SKB_STREAM_VERDICT is possible but its name is
confusing and more importantly we still want to distinguish them
from user-space. So we can just reuse the stream verdict code but
introduce a new type of eBPF program, skb_verdict. Users are not
allowed to attach stream_verdict and skb_verdict programs to the
same map.
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210331023237.41094-10-xiyou.wangcong@gmail.com
Add FEC API to netlink.
This is not a 1-to-1 conversion.
FEC settings already depend on link modes to tell user which
modes are supported. Take this further an use link modes for
manual configuration. Old struct ethtool_fecparam is still
used to talk to the drivers, so we need to translate back
and forth. We can revisit the internal API if number of FEC
encodings starts to grow.
Enforce only one active FEC bit (by using a bit position
rather than another mask).
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add definitions for the ICMPV6 type of Extended Echo Request and
Extended Echo Reply, as defined by sections 2 and 3 of RFC 8335.
Signed-off-by: Andreas Roeseler <andreas.a.roeseler@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add definitions for PROBE ICMP types and codes.
Add AFI definitions for IP and IPV6 as specified by IANA
Add a struct to represent the additional header when probing by IP
address (ctype == 3) for use in parsing incoming PROBE messages
Add a struct to represent the entire Interface Identification Object
(IIO) section of an incoming PROBE packet
Signed-off-by: Andreas Roeseler <andreas.a.roeseler@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patchset tries to resolve the diversity in the audio LED
control among the ALSA drivers. A new control layer registration
is introduced which allows to run additional operations on
top of the elementary ALSA sound controls.
A new control access group (three bits in the access flags)
was introduced to carry the LED group information for
the sound controls. The low-level sound drivers can just
mark those controls using this access group. This information
is not exported to the user space, but user space can
manage the LED sound control associations through sysfs
(last patch) per Mark's request. It makes things fully
configurable in the kernel and user space (UCM).
The actual state ('route') evaluation is really easy
(the minimal value check for all channels / controls / cards).
If there's more complicated logic for a given hardware,
the card driver may eventually export a new read-only
sound control for the LED group and do the logic itself.
The new LED trigger control code is completely separated
and possibly optional (there's no symbol dependency).
The full code separation allows eventually to move this
LED trigger control to the user space in future.
Actually it replaces the already present functionality
in the kernel space (HDA drivers) and allows a quick adoption
for the recent hardware (ASoC codecs including SoundWire).
snd_ctl_led 24576 0
The sound driver implementation is really easy:
1) call snd_ctl_led_request() when control LED layer should be
automatically activated
/ it calls module_request("snd-ctl-led") on demand /
2) mark all related kcontrols with
SNDRV_CTL_ELEM_ACCESS_SPK_LED or
SNDRV_CTL_ELEM_ACCESS_MIC_LED
Link: https://lore.kernel.org/r/20210317172945.842280-1-perex@perex.cz
Signed-off-by: Takashi Iwai <tiwai@suse.de>
-----BEGIN PGP SIGNATURE-----
iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAmBjRuQOHHRpd2FpQHN1
c2UuZGUACgkQLtJE4w1nLE9ZQw/6Ao2X1io4TVnyO/gO8HtwmnZ6TWcrLUlySaep
H6Suf0RHsOQO9VOaMcUarA3Wnz1vZ44qJ/fkdLTslnIPGSRJDUx15bbb+n2N6pQJ
gS7Umxy6n73rQyEoDjd3ZorvDGjFSVFpjM+RYjk/Ohq+yziz7nQ/SZuHPPeqm1GU
C5d9SxyXGdqlJJ6yFCHzbtjSmIey+l1TZ+j3rSSww/CzDKxB2l5J6JZAMUjVdL9b
J80Mcw0XLdG9iTtEnkUt3TwvLMcMl95UPeQHIkVQlwsRRb3BtHNIwJLPQ/n+Cou7
Hre3y2miUYHrNICEzMdMlpDzQBqu5wvpXgbgIV0CwAwCXPZBVWE1hVJ9gG0l+r1G
fy1a75OmEin4V9g8w+wNTWDEgjwAOkWhA67WVjpbvHtd6kEbISzt4JHFksG1rjU2
vXOIj2VBmQN6zHtxfcZqY8Ge4A7XGo7tAM/3NsUxin+2y7ZXI6sDvv+0esYmqrYr
9as/tD84L5LTrbUYewaUgEdauQXluQI1egRi7pSeg7hZyLeYYkmszk54Ra3zdlmM
m7Hr6u+Y/G7yeFdn/WdeG3VzdmxhC2ZFfk3gq0vneBS3WrATbf6nAORp2bwzGSz4
pUsVLSv+vhpZdSF+IxpUuMnsLkkbUCvFivXLjQ6irSWvp7uts1HWdRowdg7Pe2lC
FVbFRuA=
=1uM7
-----END PGP SIGNATURE-----
Merge tag 'tags/mute-led-rework' into for-next
ALSA: control - add generic LED API
This patchset tries to resolve the diversity in the audio LED
control among the ALSA drivers. A new control layer registration
is introduced which allows to run additional operations on
top of the elementary ALSA sound controls.
A new control access group (three bits in the access flags)
was introduced to carry the LED group information for
the sound controls. The low-level sound drivers can just
mark those controls using this access group. This information
is not exported to the user space, but user space can
manage the LED sound control associations through sysfs
(last patch) per Mark's request. It makes things fully
configurable in the kernel and user space (UCM).
The actual state ('route') evaluation is really easy
(the minimal value check for all channels / controls / cards).
If there's more complicated logic for a given hardware,
the card driver may eventually export a new read-only
sound control for the LED group and do the logic itself.
The new LED trigger control code is completely separated
and possibly optional (there's no symbol dependency).
The full code separation allows eventually to move this
LED trigger control to the user space in future.
Actually it replaces the already present functionality
in the kernel space (HDA drivers) and allows a quick adoption
for the recent hardware (ASoC codecs including SoundWire).
snd_ctl_led 24576 0
The sound driver implementation is really easy:
1) call snd_ctl_led_request() when control LED layer should be
automatically activated
/ it calls module_request("snd-ctl-led") on demand /
2) mark all related kcontrols with
SNDRV_CTL_ELEM_ACCESS_SPK_LED or
SNDRV_CTL_ELEM_ACCESS_MIC_LED
Link: https://lore.kernel.org/r/20210317172945.842280-1-perex@perex.cz
Signed-off-by: Takashi Iwai <tiwai@suse.de>
In commit ea7800565a ("can: add optional DLC element to Classical
CAN frame structure") the struct can_frame::can_dlc was put into an
anonymous union with another u8 variable.
For various reasons some members in struct can_frame and canfd_frame
including the first 8 byes of data are expected to have the same
memory layout. This is enforced by a BUILD_BUG_ON check in af_can.c.
Since the above mentioned commit this check fails on ARM kernels
compiled with the ARM OABI (which means CONFIG_AEABI not set). In this
case -mabi=apcs-gnu is passed to the compiler, which leads to a
structure size boundary of 32, instead of 8 compared to CONFIG_AEABI
enabled. This means the the union in struct can_frame takes 4 bytes
instead of the expected 1.
Rong Chen illustrates the problem with pahole in the ARM OABI case:
| struct can_frame {
| canid_t can_id; /* 0 4 */
| union {
| __u8 len; /* 4 1 */
| __u8 can_dlc; /* 4 1 */
| }; /* 4 4 */
| __u8 __pad; /* 8 1 */
| __u8 __res0; /* 9 1 */
| __u8 len8_dlc; /* 10 1 */
|
| /* XXX 5 bytes hole, try to pack */
|
| __u8 data[8]
| __attribute__((__aligned__(8))); /* 16 8 */
|
| /* size: 24, cachelines: 1, members: 6 */
| /* sum members: 19, holes: 1, sum holes: 5 */
| /* forced alignments: 1, forced holes: 1, sum forced holes: 5 */
| /* last cacheline: 24 bytes */
| } __attribute__((__aligned__(8)));
Marking the anonymous union as __attribute__((packed)) fixes the
BUILD_BUG_ON problem on these compilers.
Fixes: ea7800565a ("can: add optional DLC element to Classical CAN frame structure")
Reported-by: kernel test robot <lkp@intel.com>
Suggested-by: Rong Chen <rong.a.chen@intel.com>
Link: https://lore.kernel.org/linux-can/2c82ec23-3551-61b5-1bd8-178c3407ee83@hartkopp.net/
Link: https://lore.kernel.org/r/20210325125850.1620-3-socketcan@hartkopp.net
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
This patch adds support to BPF verifier to allow bpf program calling
kernel function directly.
The use case included in this set is to allow bpf-tcp-cc to directly
call some tcp-cc helper functions (e.g. "tcp_cong_avoid_ai()"). Those
functions have already been used by some kernel tcp-cc implementations.
This set will also allow the bpf-tcp-cc program to directly call the
kernel tcp-cc implementation, For example, a bpf_dctcp may only want to
implement its own dctcp_cwnd_event() and reuse other dctcp_*() directly
from the kernel tcp_dctcp.c instead of reimplementing (or
copy-and-pasting) them.
The tcp-cc kernel functions mentioned above will be white listed
for the struct_ops bpf-tcp-cc programs to use in a later patch.
The white listed functions are not bounded to a fixed ABI contract.
Those functions have already been used by the existing kernel tcp-cc.
If any of them has changed, both in-tree and out-of-tree kernel tcp-cc
implementations have to be changed. The same goes for the struct_ops
bpf-tcp-cc programs which have to be adjusted accordingly.
This patch is to make the required changes in the bpf verifier.
First change is in btf.c, it adds a case in "btf_check_func_arg_match()".
When the passed in "btf->kernel_btf == true", it means matching the
verifier regs' states with a kernel function. This will handle the
PTR_TO_BTF_ID reg. It also maps PTR_TO_SOCK_COMMON, PTR_TO_SOCKET,
and PTR_TO_TCP_SOCK to its kernel's btf_id.
In the later libbpf patch, the insn calling a kernel function will
look like:
insn->code == (BPF_JMP | BPF_CALL)
insn->src_reg == BPF_PSEUDO_KFUNC_CALL /* <- new in this patch */
insn->imm == func_btf_id /* btf_id of the running kernel */
[ For the future calling function-in-kernel-module support, an array
of module btf_fds can be passed at the load time and insn->off
can be used to index into this array. ]
At the early stage of verifier, the verifier will collect all kernel
function calls into "struct bpf_kfunc_desc". Those
descriptors are stored in "prog->aux->kfunc_tab" and will
be available to the JIT. Since this "add" operation is similar
to the current "add_subprog()" and looking for the same insn->code,
they are done together in the new "add_subprog_and_kfunc()".
In the "do_check()" stage, the new "check_kfunc_call()" is added
to verify the kernel function call instruction:
1. Ensure the kernel function can be used by a particular BPF_PROG_TYPE.
A new bpf_verifier_ops "check_kfunc_call" is added to do that.
The bpf-tcp-cc struct_ops program will implement this function in
a later patch.
2. Call "btf_check_kfunc_args_match()" to ensure the regs can be
used as the args of a kernel function.
3. Mark the regs' type, subreg_def, and zext_dst.
At the later do_misc_fixups() stage, the new fixup_kfunc_call()
will replace the insn->imm with the function address (relative
to __bpf_call_base). If needed, the jit can find the btf_func_model
by calling the new bpf_jit_find_kfunc_model(prog, insn).
With the imm set to the function address, "bpftool prog dump xlated"
will be able to display the kernel function calls the same way as
it displays other bpf helper calls.
gpl_compatible program is required to call kernel function.
This feature currently requires JIT.
The verifier selftests are adjusted because of the changes in
the verbose log in add_subprog_and_kfunc().
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210325015142.1544736-1-kafai@fb.com
kdoc does not have good support for documenting defines,
and we can't abuse the enum documentation because it
generates warnings.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
struct ethtool_fecparam::reserved can't be used in SET, because
ethtool user space doesn't zero-initialize the structure.
Make this clear.
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When LVM needs to find a device with a particular UUID it needs to ask for
UUID for each device. This patch returns UUID directly in the list of
devices, so that LVM doesn't have to query all the devices with an ioctl.
The UUID is returned if the flag DM_UUID_FLAG is set in the parameters.
Returning UUID is done in backward-compatible way. There's one unused
32-bit word value after the event number. This patch sets the bit
DM_NAME_LIST_FLAG_HAS_UUID if UUID is present and
DM_NAME_LIST_FLAG_DOESNT_HAVE_UUID if it isn't (if none of these bits is
set, then we have an old kernel that doesn't support returning UUIDs). The
UUID is stored after this word. The 'next' value is updated to point after
the UUID, so that old version of libdevmapper will skip the UUID without
attempting to interpret it.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
amd-drm-next-5.13-2021-03-23:
amdgpu:
- Debugfs cleanup
- Various cleanups and spelling fixes
- Flexible array cleanups
- Initial AMD Freesync HDMI
- Display fixes
- 10bpc dithering improvements
- Display ASSR support
- Clean up and unify powerplay and swsmu interfaces
- Vangogh fixes
- Add SMU gfx busy queues for RV/PCO
- PCIE DPM fixes
- S0ix fixes
- GPU metrics data fixes
- DCN secure display support
- Backlight type override
- Add initial support for Aldebaran
- RAS fixes
- Prime fixes for A+A systems
- Reset fixes
- Initial resource cursor support
- Drop legacy IO BAR requirements
- Various power fixes
amdkfd:
- MMU notifier fixes
- APU fixes
radeon:
- Debugfs cleanups
- Flexible array cleanups
UAPI:
- amdgpu: Add a new INFO ioctl interface to query video capabilities
rather than hardcoding them in userspace. This allows us to provide
fine grained asic capabilities (e.g., if a particular part is
bandwidth limited, we can limit the capabilities). Proposed userspace:
https://gitlab.freedesktop.org/leoliu/drm/-/commits/info_video_capshttps://gitlab.freedesktop.org/leoliu/mesa/-/commits/info_video_caps
- amdkfd: bump the driver version. There was a problem with reporting
some RAS features on older versions of the driver. Proposed userspace:
7cdd63475c
Danvet: A bunch of conflicts all over, but it seems to compile ... I
did put the call to dc_allow_idle_optimizations() on a single line
since it looked a bit too jarring to be left alone.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210324040147.1990338-1-alexander.deucher@amd.com
Big set in here from Alexandru Ardelean enabling multiple buffer support.
This includes providing a new directory per buffer that combines
what was previously in buffer/ and scan_elements/. Old interfaces still
in place for compatiblity.
Note immuatable branch for scmi patches to allow for some significant
rework going on in that subsystem. Merge required updating to reflect
some changes in IIO.
Late rebase to fix some wrong fixes tags due to some earlier rebases
made necessary by messing up the immutable branch.
IIO New Device Support
* adi,ad5686
- Add info to support AD5673R and AD5677R
* bosch,bmi088
- New driver supporting this accelerometer + gyroscope
* cros_ec_mkbp
- New driver for this proximity sensor that exposes a 'front'
sensor. Very simple switch like device, but driver allows it
to share interface with more sophisticated proximity sensors.
* iio_scmi
- New driver to support ARM SCMI protocol to expose underlying
accelerometers and gyroscopes via this firmware interface.
* st,st_magn
- Add ID for IISMDC magnetometer.
* ti,ads131e0
- New driver supporting ads131e04, ads131e06 and ads131e08 24 bit ADCs
Counter New Device Support
* IRQ or GPIO based counter
- New driver for a conceptually simple counter that uses interrupts
to perform the count.
Features
* core
- Dual buffer supprt including:
Various helpers to centralize handling of bufferer related elements.
Document existing and new IOCTLs
Register the IIO chrdev only if it can actually be used for anything.
Rework attribute group creation in the core (lots of patches)
Merge buffer/ and scan_elements/ entries into one list + maintain
backwards compatible set.
Introduce the internal logic and IOCTL to allow multiple buffers
+ access to an anon FD per buffer to actually read from it.
Tidy up tools/iio/iio_generic_buffer and switch to new interfaces.
Update ABI docs.
A few follow up fixes, unsuprising as this was a huge bit of rework.
- Move common case setting of trig->parent to the core.
- Provide an iio_read_channel_processed_scale() to avoid loss of
precision from iio_read_channel_processed() then applying integer
scale. Use it in ntc_thermistor driver in hwmon.
- Allow drivers to specify labels from elsewhere than DT. Use it for
bmc150 and kxcjk-1013 labels related to position on 2 in one tablets.
- Document label usage for proximity and accelerometer sensors.
- Some local variable renames for consistency
tools
- Add -a parameter to iio_event_monitor to allow autoenabling of events.
* acpi_als
- Add trigger support for devices that don't support notification method.
* adi,ad7124
- Allow more than 8 channels. This is a complex little device, but is
capable of supporting up to 16 channels if the share certain
configuration settings.
* hrtimer-trigger
- Support sampling frequency below 1Hz.
* mediatek,mt8195-auxadc
- Add compatible to binding docs (always also includes mt8173)
* st,stm32-adc
- Enable timetamps when not using DMA.
* vishay,vcnl3020
- Sampling frequency control.
Cleanup and minor fixes:
* treewide
- Use some getter and setter functions instead of opencoding.
- Set of fixes for pointless casts in various drivers.
- Avoid wrong kernel-doc marking on comment blocks.
- Fix various other minor kernel-doc issues shown by W=1
* core
- Use a signed temporary for IIO_VAL_FRACTIONAL_LOG2 to avoid odd casts.
- Fix IIO_VAL_FRACTIONAL_LOG2 for values between -1.0 and 0.0
- Add unit tests for iio_format_value()
* docs
- Fix formatting/typos in iio_configfs.rst and buffers.rst
- Add documentation of index in buffers.rst
- Fix scan element description
- Avoid some issues with HTML generation from ABI docs by moving
duplicated defintions to more generic files.
- Drop reference to long dead mailing list.
* 104-quad
- Remove left over deprecated IIO counter ABI.
* adi,adi-axi-adc
- Fix wrong bit of docs.
* adi,ad5791
- Typos
* adi,ad9834
- Switch to device managed functions in probe.
* adi,adis*
- Add and use helpers for locking to reduced duplication.
* adi,adis16480
- Fix calculation of sampling frequency when using pulse per second input.
* adi,adis16475
- Calculate the IMU scaled internal sampling rate and runtime depending
on sysfs based configuration rather than getting from DT. Drop now
unnecessary property from DT bindings doc.
* cros_ec
- Fix result of a series of recent changes that means extended buffer
attributes turn up in the wrong place. Too complex to revert the
various patches unfortunately so this is a bit messy.
* fsl,mma3452
- Indentation cleanup.
* hid-sensors
- Size of storage needs to increase for some parts when using quaternions.
- Move the get sensistivity attribute to hid-sensors-common to reduce
duplication. Enable it for more device types.
- Correctly handle relative sensitivity if reported that way including
documenting the new ABI.
* maxim,max517
- Use device managed functions in probe.
* mediatek,mt6360-adc
- Use asm/unaligned.h instead of directly including
unaligned/be_byteshift.h
* novuton,npcm-adc
- Local lock instead of missusing mlock.
* semtech,sx9500
- Typos
* st,sensor
- typo fix
* st,spear-adc
- Local lock instead of missusing mlock.
* st,stm32-adc
- Long standing HAS_IOMEM dependency fix.
* st,stm32-counter
- Remove left over deprecated IIO counter ABI.
* ti,palmas-adc
- Local lock instead of missusing mlock.
* ti,tmp007
- Switch to device managed functions in probe.
Other
* MAINTAINERS
- Move Peter Meerwald-Stadler to Credits at his request
-----BEGIN PGP SIGNATURE-----
iQJFBAABCAAvFiEEbilms4eEBlKRJoGxVIU0mcT0FogFAmBdtl4RHGppYzIzQGtl
cm5lbC5vcmcACgkQVIU0mcT0FogEhxAAuTWrEwun8rE5fQkQIlEkKYwZqEgUln4Q
tLKhrqeyfGcY/A1aX/HTpnn0TOtaOkUqRzLWsAW0thZih1u7yEL6Vc55KKh5WGL7
CvcvLWAkorsTjbtusgrBgFmjuoAMjW892Q+bbh1CJ/0qlezhFE9jrmJfmH2klI/p
nIoJsdyCE98+4oIdcOCxwJe7nTDDHP8BCF7WnKtHCLtn3T9Dzttises3T6HfKxlg
cdu3cy2N+pQpakYpv96tvjBGI9Ho3FX8R+dILUxJpVwCcLUjf8b1CFcgboJwxou2
tgPNwWToxd9OTYJa7EOsDaFPZD46NRProkUBGKgA58XPkhqSvLcSdvGokFPgKnPW
NorymGaUOC2qolH91nuFaWrd6c6hIf5NeWtGDo1GHJdcSgu21C0OdaU3K72EGhsB
YLnl0Wp8Bthwn7KS0Ck4TqUPN3D3Q9NCEz7sAUzqc3QBzm4U+dXVzCwRehI7hPdw
YlORAzbV1o7Z0skhAAth+NAYUUB6GywGZLaUi5oXWoJSYhNvI1K1uiHVVStVINWl
L7uor5FXTr4/czjrutWQbw7GQ0cfCODH6B1cbS9vNaDQ6wO9XGSaWgc3mK9Lgsqc
Y1ekYvXNSxKJw42FWvr4ylkeF7BV6h0oBFB4DLlZppYi1pKZb8oPsED8UpBrFnG1
uPqjNX9Tsqw=
=jeRJ
-----END PGP SIGNATURE-----
Merge tag 'iio-for-5.13a' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-next
Jonathan writes:
1st set of IIO/counter device support, features and cleanup in the 5.13 cycle
Big set in here from Alexandru Ardelean enabling multiple buffer support.
This includes providing a new directory per buffer that combines
what was previously in buffer/ and scan_elements/. Old interfaces still
in place for compatiblity.
Note immuatable branch for scmi patches to allow for some significant
rework going on in that subsystem. Merge required updating to reflect
some changes in IIO.
Late rebase to fix some wrong fixes tags due to some earlier rebases
made necessary by messing up the immutable branch.
IIO New Device Support
* adi,ad5686
- Add info to support AD5673R and AD5677R
* bosch,bmi088
- New driver supporting this accelerometer + gyroscope
* cros_ec_mkbp
- New driver for this proximity sensor that exposes a 'front'
sensor. Very simple switch like device, but driver allows it
to share interface with more sophisticated proximity sensors.
* iio_scmi
- New driver to support ARM SCMI protocol to expose underlying
accelerometers and gyroscopes via this firmware interface.
* st,st_magn
- Add ID for IISMDC magnetometer.
* ti,ads131e0
- New driver supporting ads131e04, ads131e06 and ads131e08 24 bit ADCs
Counter New Device Support
* IRQ or GPIO based counter
- New driver for a conceptually simple counter that uses interrupts
to perform the count.
Features
* core
- Dual buffer supprt including:
Various helpers to centralize handling of bufferer related elements.
Document existing and new IOCTLs
Register the IIO chrdev only if it can actually be used for anything.
Rework attribute group creation in the core (lots of patches)
Merge buffer/ and scan_elements/ entries into one list + maintain
backwards compatible set.
Introduce the internal logic and IOCTL to allow multiple buffers
+ access to an anon FD per buffer to actually read from it.
Tidy up tools/iio/iio_generic_buffer and switch to new interfaces.
Update ABI docs.
A few follow up fixes, unsuprising as this was a huge bit of rework.
- Move common case setting of trig->parent to the core.
- Provide an iio_read_channel_processed_scale() to avoid loss of
precision from iio_read_channel_processed() then applying integer
scale. Use it in ntc_thermistor driver in hwmon.
- Allow drivers to specify labels from elsewhere than DT. Use it for
bmc150 and kxcjk-1013 labels related to position on 2 in one tablets.
- Document label usage for proximity and accelerometer sensors.
- Some local variable renames for consistency
tools
- Add -a parameter to iio_event_monitor to allow autoenabling of events.
* acpi_als
- Add trigger support for devices that don't support notification method.
* adi,ad7124
- Allow more than 8 channels. This is a complex little device, but is
capable of supporting up to 16 channels if the share certain
configuration settings.
* hrtimer-trigger
- Support sampling frequency below 1Hz.
* mediatek,mt8195-auxadc
- Add compatible to binding docs (always also includes mt8173)
* st,stm32-adc
- Enable timetamps when not using DMA.
* vishay,vcnl3020
- Sampling frequency control.
Cleanup and minor fixes:
* treewide
- Use some getter and setter functions instead of opencoding.
- Set of fixes for pointless casts in various drivers.
- Avoid wrong kernel-doc marking on comment blocks.
- Fix various other minor kernel-doc issues shown by W=1
* core
- Use a signed temporary for IIO_VAL_FRACTIONAL_LOG2 to avoid odd casts.
- Fix IIO_VAL_FRACTIONAL_LOG2 for values between -1.0 and 0.0
- Add unit tests for iio_format_value()
* docs
- Fix formatting/typos in iio_configfs.rst and buffers.rst
- Add documentation of index in buffers.rst
- Fix scan element description
- Avoid some issues with HTML generation from ABI docs by moving
duplicated defintions to more generic files.
- Drop reference to long dead mailing list.
* 104-quad
- Remove left over deprecated IIO counter ABI.
* adi,adi-axi-adc
- Fix wrong bit of docs.
* adi,ad5791
- Typos
* adi,ad9834
- Switch to device managed functions in probe.
* adi,adis*
- Add and use helpers for locking to reduced duplication.
* adi,adis16480
- Fix calculation of sampling frequency when using pulse per second input.
* adi,adis16475
- Calculate the IMU scaled internal sampling rate and runtime depending
on sysfs based configuration rather than getting from DT. Drop now
unnecessary property from DT bindings doc.
* cros_ec
- Fix result of a series of recent changes that means extended buffer
attributes turn up in the wrong place. Too complex to revert the
various patches unfortunately so this is a bit messy.
* fsl,mma3452
- Indentation cleanup.
* hid-sensors
- Size of storage needs to increase for some parts when using quaternions.
- Move the get sensistivity attribute to hid-sensors-common to reduce
duplication. Enable it for more device types.
- Correctly handle relative sensitivity if reported that way including
documenting the new ABI.
* maxim,max517
- Use device managed functions in probe.
* mediatek,mt6360-adc
- Use asm/unaligned.h instead of directly including
unaligned/be_byteshift.h
* novuton,npcm-adc
- Local lock instead of missusing mlock.
* semtech,sx9500
- Typos
* st,sensor
- typo fix
* st,spear-adc
- Local lock instead of missusing mlock.
* st,stm32-adc
- Long standing HAS_IOMEM dependency fix.
* st,stm32-counter
- Remove left over deprecated IIO counter ABI.
* ti,palmas-adc
- Local lock instead of missusing mlock.
* ti,tmp007
- Switch to device managed functions in probe.
Other
* MAINTAINERS
- Move Peter Meerwald-Stadler to Credits at his request
* tag 'iio-for-5.13a' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio: (119 commits)
iio: acpi_als: Add trigger support
iio: acpi_als: Add local variable dev in probe
iio: acpi_als: Add timestamp channel
iio: adc: ad7292: Modify the bool initialization assignment
iio: cros: unify hw fifo attributes without API changes
iio: kfifo: add devm_iio_triggered_buffer_setup_ext variant
iio: event_monitor: Enable events before monitoring
dt-bindings: iio: adc: Add compatible for Mediatek MT8195
iio:magnetometer: Add Support for ST IIS2MDC
dt-bindings: iio: st,st-sensors add IIS2MDC.
staging: iio: ad9832: kernel-doc fixes
iio:dac:max517.c: Use devm_iio_device_register()
iio:cros_ec_sensors: Fix a wrong function name in kernel doc.
iio: buffer: kfifo_buf: kernel-doc, typo in function name.
iio: accel: sca3000: kernel-doc fixes. Missing - and wrong function names.
iio: adc: adi-axi-adc: Drop false marking for kernel-doc
iio: adc: cpcap-adc: kernel-doc fix - that should be _ in structure name
iio: dac: ad5504: fix wrong part number in kernel-doc structure name.
iio: dac: ad5770r: kernel-doc fix case of letter R wrong in structure name
iio: adc: ti-adc084s021: kernel-doc fixes, missing function names
...
struct ethtool_fecparam::active_fec is a GET-only field,
all in-tree drivers correctly ignore it on SET. Clear
the field on SET to avoid any confusion. Again, we can't
reject non-zero now since ethtool user space does not
zero-init the param correctly.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
struct ethtool_fecparam::reserved is never looked at by the core.
Make sure it's actually 0. Unfortunately we can't return an error
because old ethtool doesn't zero-initialize the structure for SET.
On GET we can be more verbose, there are no in tree (ab)users.
Fix up the kdoc on the structure. Remove the mention of FEC
bypass. Seems like a niche thing to configure in the first
place.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Digging through the mailing list archive @autoneg was part
of the first version of the RFC, this left over comment was
pointed out twice in review but wasn't removed.
The sentence is an exact copy-paste from pauseparam.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
s/porte/the port/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
"Various fixes, all over:
1) Fix overflow in ptp_qoriq_adjfine(), from Yangbo Lu.
2) Always store the rx queue mapping in veth, from Maciej
Fijalkowski.
3) Don't allow vmlinux btf in map_create, from Alexei Starovoitov.
4) Fix memory leak in octeontx2-af from Colin Ian King.
5) Use kvalloc in bpf x86 JIT for storing jit'd addresses, from
Yonghong Song.
6) Fix tx ptp stats in mlx5, from Aya Levin.
7) Check correct ip version in tun decap, fropm Roi Dayan.
8) Fix rate calculation in mlx5 E-Switch code, from arav Pandit.
9) Work item memork leak in mlx5, from Shay Drory.
10) Fix ip6ip6 tunnel crash with bpf, from Daniel Borkmann.
11) Lack of preemptrion awareness in macvlan, from Eric Dumazet.
12) Fix data race in pxa168_eth, from Pavel Andrianov.
13) Range validate stab in red_check_params(), from Eric Dumazet.
14) Inherit vlan filtering setting properly in b53 driver, from
Florian Fainelli.
15) Fix rtnl locking in igc driver, from Sasha Neftin.
16) Pause handling fixes in igc driver, from Muhammad Husaini
Zulkifli.
17) Missing rtnl locking in e1000_reset_task, from Vitaly Lifshits.
18) Use after free in qlcnic, from Lv Yunlong.
19) fix crash in fritzpci mISDN, from Tong Zhang.
20) Premature rx buffer reuse in igb, from Li RongQing.
21) Missing termination of ip[a driver message handler arrays, from
Alex Elder.
22) Fix race between "x25_close" and "x25_xmit"/"x25_rx" in hdlc_x25
driver, from Xie He.
23) Use after free in c_can_pci_remove(), from Tong Zhang.
24) Uninitialized variable use in nl80211, from Jarod Wilson.
25) Off by one size calc in bpf verifier, from Piotr Krysiuk.
26) Use delayed work instead of deferrable for flowtable GC, from
Yinjun Zhang.
27) Fix infinite loop in NPC unmap of octeontx2 driver, from
Hariprasad Kelam.
28) Fix being unable to change MTU of dwmac-sun8i devices due to lack
of fifo sizes, from Corentin Labbe.
29) DMA use after free in r8169 with WoL, fom Heiner Kallweit.
30) Mismatched prototypes in isdn-capi, from Arnd Bergmann.
31) Fix psample UAPI breakage, from Ido Schimmel"
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (171 commits)
psample: Fix user API breakage
math: Export mul_u64_u64_div_u64
ch_ktls: fix enum-conversion warning
octeontx2-af: Fix memory leak of object buf
ptp_qoriq: fix overflow in ptp_qoriq_adjfine() u64 calcalation
net: bridge: don't notify switchdev for local FDB addresses
net/sched: act_ct: clear post_ct if doing ct_clear
net: dsa: don't assign an error value to tag_ops
isdn: capi: fix mismatched prototypes
net/mlx5: SF, do not use ecpu bit for vhca state processing
net/mlx5e: Fix division by 0 in mlx5e_select_queue
net/mlx5e: Fix error path for ethtool set-priv-flag
net/mlx5e: Offload tuple rewrite for non-CT flows
net/mlx5e: Allow to match on MPLS parameters only for MPLS over UDP
net/mlx5: Add back multicast stats for uplink representor
net: ipconfig: ic_dev can be NULL in ic_close_devs
MAINTAINERS: Combine "QLOGIC QLGE 10Gb ETHERNET DRIVER" sections into one
docs: networking: Fix a typo
r8169: fix DMA being used after buffer free if WoL is enabled
net: ipa: fix init header command validation
...
Cited commit added a new attribute before the existing group reference
count attribute, thereby changing its value and breaking existing
applications on new kernels.
Before:
# psample -l
libpsample ERROR psample_group_foreach: failed to recv message: Operation not supported
After:
# psample -l
Group Num Refcount Group Seq
1 1 0
Fix by restoring the value of the old attribute and remove the
misleading comments from the enumerator to avoid future bugs.
Cc: stable@vger.kernel.org
Fixes: d8bed686ab ("net: psample: Add tunnel support")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reported-by: Adiel Bidani <adielb@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- The Open Routing (Open/R) network protocol netlink handler uses ID 99
- Will also add to `/etc/iproute2/rt_protos` once this is accepted
- For more information: https://github.com/facebook/openr
Signed-off-by: From: Cooper Lees <me@cooperlees.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
User space needs to know if binder transactions occurred to frozen
processes. Introduce a new BINDER_GET_FROZEN ioctl and keep track of
transactions occurring to frozen proceses.
Signed-off-by: Marco Ballesio <balejs@google.com>
Signed-off-by: Li Li <dualli@google.com>
Acked-by: Todd Kjos <tkjos@google.com>
Link: https://lore.kernel.org/r/20210316011630.1121213-4-dualli@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Frozen tasks can't process binder transactions, so a way is required to
inform transmitting ends of communication failures due to the frozen
state of their receiving counterparts. Additionally, races are possible
between transitions to frozen state and binder transactions enqueued to
a specific process.
Implement BINDER_FREEZE ioctl for user space to inform the binder driver
about the intention to freeze or unfreeze a process. When the ioctl is
called, block the caller until any pending binder transactions toward
the target process are flushed. Return an error to transactions to
processes marked as frozen.
Co-developed-by: Todd Kjos <tkjos@google.com>
Acked-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Marco Ballesio <balejs@google.com>
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Li Li <dualli@google.com>
Link: https://lore.kernel.org/r/20210316011630.1121213-2-dualli@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Remove the license boilerplate (containing an obsolete address), because
we now have the SPDX header.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Link: https://lore.kernel.org/r/20210322141748.1062733-1-geert@linux-m68k.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Indicate the availability reliable SRAM EDC state in the new bit
in the device properties.
Proposed userspace changes:
7cdd63475c
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The macro is for memory mapped by GPU as uncached.
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Oak Zeng <Oak.Zeng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The added format is V4L2_PIX_FMT_YUV24, this is a packed
YUV 4:4:4 format, with 8 bits for each component, 24 bits
per sample.
This format is used by the i.MX 8QuadMax and i.MX 8DualXPlus/8QuadXPlus
JPEG encoder/decoder.
Signed-off-by: Mirela Rabulea <mirela.rabulea@nxp.com>
Reviewed-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Fix these kernel-doc warnings:
include/media/davinci/vpbe_osd.h:77: warning: Enum value 'PIXFMT_YCBCRI' not described in enum 'osd_pix_format'
include/media/davinci/vpbe_osd.h:77: warning: Enum value 'PIXFMT_YCRCBI' not described in enum 'osd_pix_format'
include/media/davinci/vpbe_osd.h:77: warning: Excess enum value 'PIXFMT_YCrCbI' description in 'osd_pix_format'
include/media/davinci/vpbe_osd.h:77: warning: Excess enum value 'PIXFMT_YCbCrI' description in 'osd_pix_format'
include/media/davinci/vpbe_osd.h:232: warning: expecting prototype for enum davinci_cursor_v_width. Prototype was for enum
osd_cursor_v_width instead
include/uapi/linux/uvcvideo.h:98: warning: Function parameter or member 'ns' not described in 'uvc_meta_buf'
include/uapi/linux/uvcvideo.h:98: warning: Function parameter or member 'sof' not described in 'uvc_meta_buf'
include/uapi/linux/uvcvideo.h:98: warning: Function parameter or member 'length' not described in 'uvc_meta_buf'
include/uapi/linux/uvcvideo.h:98: warning: Function parameter or member 'flags' not described in 'uvc_meta_buf'
include/uapi/linux/uvcvideo.h:98: warning: Function parameter or member 'buf' not described in 'uvc_meta_buf'
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
This patch fixes the following kernel-doc warnings:
include/uapi/linux/videodev2.h:996: warning: Function parameter or member 'm' not described in 'v4l2_plane'
include/uapi/linux/videodev2.h:996: warning: Function parameter or member 'reserved' not described in 'v4l2_plane'
include/uapi/linux/videodev2.h:1057: warning: Function parameter or member 'm' not described in 'v4l2_buffer'
include/uapi/linux/videodev2.h:1057: warning: Function parameter or member 'reserved2' not described in 'v4l2_buffer'
include/uapi/linux/videodev2.h:1057: warning: Function parameter or member 'reserved' not described in 'v4l2_buffer'
include/uapi/linux/videodev2.h:1068: warning: Function parameter or member 'tv' not described in 'v4l2_timeval_to_ns'
include/uapi/linux/videodev2.h:1068: warning: Excess function parameter 'ts' description in 'v4l2_timeval_to_ns'
include/uapi/linux/videodev2.h:1138: warning: Function parameter or member 'reserved' not described in 'v4l2_exportbuffer'
include/uapi/linux/videodev2.h:2237: warning: Function parameter or member 'reserved' not described in 'v4l2_plane_pix_format'
include/uapi/linux/videodev2.h:2270: warning: Function parameter or member 'hsv_enc' not described in 'v4l2_pix_format_mplane'
include/uapi/linux/videodev2.h:2270: warning: Function parameter or member 'reserved' not described in 'v4l2_pix_format_mplane'
include/uapi/linux/videodev2.h:2281: warning: Function parameter or member 'reserved' not described in 'v4l2_sdr_format'
include/uapi/linux/videodev2.h:2315: warning: Function parameter or member 'fmt' not described in 'v4l2_format'
include/uapi/linux/v4l2-subdev.h:53: warning: Function parameter or member 'reserved' not described in 'v4l2_subdev_format'
include/uapi/linux/v4l2-subdev.h:66: warning: Function parameter or member 'reserved' not described in 'v4l2_subdev_crop'
include/uapi/linux/v4l2-subdev.h:89: warning: Function parameter or member 'reserved' not described in 'v4l2_subdev_mbus_code_enum'
include/uapi/linux/v4l2-subdev.h:108: warning: Function parameter or member 'min_width' not described in 'v4l2_subdev_frame_size_enum'
include/uapi/linux/v4l2-subdev.h:108: warning: Function parameter or member 'max_width' not described in 'v4l2_subdev_frame_size_enum'
include/uapi/linux/v4l2-subdev.h:108: warning: Function parameter or member 'min_height' not described in 'v4l2_subdev_frame_size_enum'
include/uapi/linux/v4l2-subdev.h:108: warning: Function parameter or member 'max_height' not described in 'v4l2_subdev_frame_size_enum'
include/uapi/linux/v4l2-subdev.h:108: warning: Function parameter or member 'reserved' not described in 'v4l2_subdev_frame_size_enum'
include/uapi/linux/v4l2-subdev.h:119: warning: Function parameter or member 'reserved' not described in 'v4l2_subdev_frame_interval'
include/uapi/linux/v4l2-subdev.h:140: warning: Function parameter or member 'reserved' not described in 'v4l2_subdev_frame_interval_enum'
include/uapi/linux/cec.h:406: warning: Function parameter or member 'raw' not described in 'cec_connector_info'
include/uapi/linux/cec.h:470: warning: Function parameter or member 'flags' not described in 'cec_event'
include/media/v4l2-h264.h:82: warning: Function parameter or member 'reflist' not described in 'v4l2_h264_build_p_ref_list'
include/media/v4l2-h264.h:82: warning: expecting prototype for v4l2_h264_build_b_ref_lists(). Prototype was for v4l2_h264_build_p_ref_list()
instead
include/media/cec.h:50: warning: Function parameter or member 'lock' not described in 'cec_devnode'
include/media/v4l2-jpeg.h:122: warning: Function parameter or member 'num_dht' not described in 'v4l2_jpeg_header'
include/media/v4l2-jpeg.h:122: warning: Function parameter or member 'num_dqt' not described in 'v4l2_jpeg_header'
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Fix following warnings:
./scripts/kernel-doc --none include/uapi/linux/v4l2-controls.h
include/uapi/linux/v4l2-controls.h:1727: warning: bad line:
include/uapi/linux/v4l2-controls.h:1853: warning: expecting prototype for struct v4l2_vp8_frame. Prototype was for struct v4l2_ctrl_vp8_frame instead
include/uapi/linux/v4l2-controls.h:1853: warning: Function parameter or member 'segment' not described in 'v4l2_ctrl_vp8_frame'
include/uapi/linux/v4l2-controls.h:1853: warning: Function parameter or member 'entropy' not described in 'v4l2_ctrl_vp8_frame'
include/uapi/linux/v4l2-controls.h:1853: warning: Function parameter or member 'coder_state' not described in 'v4l2_ctrl_vp8_frame'
Reported-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Until now, the VP8 V4L2 API was not exported as a public API,
and only defined in a private media header (media/vp8-ctrls.h).
The reason for this was a concern about the API not complete
and ready to support VP8 decoding hardware accelerators.
After reviewing the VP8 specification in detail, and now
that the API is able to support Cedrus and Hantro G1,
we can consider this ready.
Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Move the VP8 stateless control types out of staging,
and re-number it to avoid any confusion.
Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Since we are ready to stabilize the VP8 stateless API,
move the parsed VP8 pixel format.
Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Add a control to enable inserting of AUD NALU into encoded
bitstream.
Reviewed-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Make display delay and display delay enable MFC controls standard v4l
controls. This will allow reuse of the controls for other decoder
drivers. Also the new proposed controls are now codec agnostic because
they could be used for any codec.
Acked-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmBXwRseHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGX80H/2qQ49e2lbOfIqdR
CBThtgg89QmN9WPTfVhwB6b4vejB7kIiIpOPyJVkaio6lVe8Ewhl064fnGHwXm39
vPy0ZVAB96oaKaki6qi1k7jhCAMpl/vXf1RDe5PaEKPwp3Lr81BlY6dcTPbjxkFP
Uw+uC3iRQnT8msSqA1vnhbDl9w6jfmuxX45Eo9NWGz0hDCpZNOEt2oSo/OcXTH4k
c91FiW8Qv9uZX2tV4VSqFQgVPfneA+OWXMpjMg6kfK3jOJ5cZwmFGAa8ByqWACH/
U9OODYQCsyX5ZM11g7MOt7Iv+YSU8OA0We8KDN4cRZobrCHF0Txp3ZTSPjb/xHE3
9nUM50I=
=a6Go
-----END PGP SIGNATURE-----
Merge 5.12-rc4 into usb-next
We need the usb/thunderbolt fixes in here as well.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Report the number of warnings that a user will get for exceeding the
soft limit of a realtime volume. This plugs a gap needed before we
can land a realtime quota implementation for XFS in the next cycle.
Link: https://lore.kernel.org/r/20210318041736.GB22094@magnolia
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Write protect bit, when set, inhibits supervisor writes to the read-only
pages. In guest supervisor shared virtual addressing (SVA), write-protect
should be honored upon guest bind supervisor PASID request.
This patch extends the VT-d portion of the IOMMU UAPI to include WP bit.
WPE bit of the supervisor PASID entry will be set to match CPU CR0.WP bit.
Signed-off-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/1614680040-1989-3-git-send-email-jacob.jun.pan@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
According with USB Device Class Definition for Video Device the
Processing Unit Descriptor bLength should be 12 (10 + bmControlSize),
but it has 11.
Invalid length caused that Processing Unit Descriptor Test Video form
CV tool failed. To fix this issue patch adds bmVideoStandards into
uvc_processing_unit_descriptor structure.
The bmVideoStandards field was added in UVC 1.1 and it wasn't part of
UVC 1.0a.
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Pawel Laszczak <pawell@cadence.com>
Reviewed-by: Peter Chen <peter.chen@kernel.org>
Link: https://lore.kernel.org/r/20210315071748.29706-1-pawell@gli-login.cadence.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add a description of the IOCTLs and provide information on the default
value of the source and destination addresses.
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Arnaud Pouliquen <arnaud.pouliquen@foss.st.com>
Link: https://lore.kernel.org/r/20210311140413.31725-4-arnaud.pouliquen@foss.st.com
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
As the RPMSG_ADDR_ANY is a valid src or dst address that can be set by
user applications, migrate its definition in user API.
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Arnaud Pouliquen <arnaud.pouliquen@foss.st.com>
Link: https://lore.kernel.org/r/20210311140413.31725-3-arnaud.pouliquen@foss.st.com
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
The Microsoft Surface Book series devices consist of a so-called
clipboard part (containing the CPU, touchscreen, and primary battery)
and a base part (containing keyboard, secondary battery, and optional
discrete GPU). These parts can be separated, i.e. the clipboard can be
detached and used as tablet.
This detachment process is initiated by pressing a button. On the
Surface Book 2 and 3 (targeted with this commit), the Surface Aggregator
Module (i.e. the embedded controller on those devices) attempts to send
a notification to any listening client driver and waits for further
instructions (i.e. whether the detachment process should continue or be
aborted). If it does not receive a response in a certain time-frame, the
detachment process (by default) continues and the clipboard can be
physically separated. In other words, (by default and) without a driver,
the detachment process takes about 10 seconds to complete.
This commit introduces a driver for this detachment system (called DTX).
This driver allows a user-space daemon to control and influence the
detachment behavior. Specifically, it forwards any detachment requests
to user-space, allows user-space to make such requests itself, and
allows handling of those requests. Requests can be handled by either
aborting, continuing/allowing, or delaying (i.e. resetting the timeout
via a heartbeat commend). The user-space API is implemented via the
/dev/surface/dtx miscdevice.
In addition, user-space can change the default behavior on timeout from
allowing detachment to disallowing it, which is useful if the (optional)
discrete GPU is in use.
Furthermore, this driver allows user-space to receive notifications
about the state of the base, specifically when it is physically removed
(as opposed to detachment requested), in what manner it is connected
(i.e. in reverse-/tent-/studio- or laptop-mode), and what type of base
is connected. Based on this information, the driver also provides a
simple tablet-mode switch (aliasing all modes without keyboard access,
i.e. tablet-mode and studio-mode to its reported tablet-mode).
An implementation of such a user-space daemon, allowing configuration of
detachment behavior via scripts (e.g. safely unmounting USB devices
connected to the base before continuing) can be found at [1].
[1]: https://github.com/linux-surface/surface-dtx-daemon
Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com>
Link: https://lore.kernel.org/r/20210308184819.437438-2-luzmaximilian@gmail.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
For userspace checkpoint and restore (C/R) a way of getting process state
containing RSEQ configuration is needed.
There are two ways this information is going to be used:
- to re-enable RSEQ for threads which had it enabled before C/R
- to detect if a thread was in a critical section during C/R
Since C/R preserves TLS memory and addresses RSEQ ABI will be restored
using the address registered before C/R.
Detection whether the thread is in a critical section during C/R is needed
to enforce behavior of RSEQ abort during C/R. Attaching with ptrace()
before registers are dumped itself doesn't cause RSEQ abort.
Restoring the instruction pointer within the critical section is
problematic because rseq_cs may get cleared before the control is passed
to the migrated application code leading to RSEQ invariants not being
preserved. C/R code will use RSEQ ABI address to find the abort handler
to which the instruction pointer needs to be set.
To achieve above goals expose the RSEQ ABI address and the signature value
with the new ptrace request PTRACE_GET_RSEQ_CONFIGURATION.
This new ptrace request can also be used by debuggers so they are aware
of stops within restartable sequences in progress.
Signed-off-by: Piotr Figiel <figiel@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Michal Miroslaw <emmir@google.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lkml.kernel.org/r/20210226135156.1081606-1-figiel@google.com
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCYFC+9QAKCRDh3BK/laaZ
PNUIAQD+g4qznv8fTiN5Juj+qr42DsLAWutI0EdVvZI4UMe01AEAmlLrlHZCE1dM
inXPu/Nq+0gMytAlodcOkHFtdOZqpgY=
=9izk
-----END PGP SIGNATURE-----
Merge tag 'fuse-fixes-5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse fixes from Miklos Szeredi:
"Fix a deadlock and a couple of other bugs"
* tag 'fuse-fixes-5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: 32-bit user space ioctl compat for fuse device
virtiofs: Fail dax mount if device does not support it
fuse: fix live lock in fuse_iget()