mirror-linux/net
Eric Dumazet 93ab6cc691 tcp: implement mmap() for zero copy receive
Some networks can make sure TCP payload can exactly fit 4KB pages,
with well chosen MSS/MTU and architectures.

Implement mmap() system call so that applications can avoid
copying data without complex splice() games.

Note that a successful mmap( X bytes) on TCP socket is consuming
bytes, as if recvmsg() has been done. (tp->copied += X)

Only PROT_READ mappings are accepted, as skb page frags
are fundamentally shared and read only.

If tcp_mmap() finds data that is not a full page, or a patch of
urgent data, -EINVAL is returned, no bytes are consumed.

Application must fallback to recvmsg() to read the problematic sequence.

mmap() wont block,  regardless of socket being in blocking or
non-blocking mode. If not enough bytes are in receive queue,
mmap() would return -EAGAIN, or -EIO if socket is in a state
where no other bytes can be added into receive queue.

An application might use SO_RCVLOWAT, poll() and/or ioctl( FIONREAD)
to efficiently use mmap()

On the sender side, MSG_EOR might help to clearly separate unaligned
headers and 4K-aligned chunks if necessary.

Tested:

mlx4 (cx-3) 40Gbit NIC, with tcp_mmap program provided in following patch.
MTU set to 4168  (4096 TCP payload, 40 bytes IPv6 header, 32 bytes TCP header)

Without mmap() (tcp_mmap -s)

received 32768 MB (0 % mmap'ed) in 8.13342 s, 33.7961 Gbit,
  cpu usage user:0.034 sys:3.778, 116.333 usec per MB, 63062 c-switches
received 32768 MB (0 % mmap'ed) in 8.14501 s, 33.748 Gbit,
  cpu usage user:0.029 sys:3.997, 122.864 usec per MB, 61903 c-switches
received 32768 MB (0 % mmap'ed) in 8.11723 s, 33.8635 Gbit,
  cpu usage user:0.048 sys:3.964, 122.437 usec per MB, 62983 c-switches
received 32768 MB (0 % mmap'ed) in 8.39189 s, 32.7552 Gbit,
  cpu usage user:0.038 sys:4.181, 128.754 usec per MB, 55834 c-switches

With mmap() on receiver (tcp_mmap -s -z)

received 32768 MB (100 % mmap'ed) in 8.03083 s, 34.2278 Gbit,
  cpu usage user:0.024 sys:1.466, 45.4712 usec per MB, 65479 c-switches
received 32768 MB (100 % mmap'ed) in 7.98805 s, 34.4111 Gbit,
  cpu usage user:0.026 sys:1.401, 43.5486 usec per MB, 65447 c-switches
received 32768 MB (100 % mmap'ed) in 7.98377 s, 34.4296 Gbit,
  cpu usage user:0.028 sys:1.452, 45.166 usec per MB, 65496 c-switches
received 32768 MB (99.9969 % mmap'ed) in 8.01838 s, 34.281 Gbit,
  cpu usage user:0.02 sys:1.446, 44.7388 usec per MB, 65505 c-switches

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-16 18:26:37 -04:00
..
6lowpan
9p net/9p/client.c: fix potential refcnt problem of trans module 2018-04-05 21:36:23 -07:00
802
8021q vlan: also check phy_driver ts_info for vlan's real device 2018-04-01 20:53:50 -04:00
appletalk net: Use octal not symbolic permissions 2018-03-26 12:07:48 -04:00
atm net: Use octal not symbolic permissions 2018-03-26 12:07:48 -04:00
ax25 net: Use octal not symbolic permissions 2018-03-26 12:07:48 -04:00
batman-adv Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-04-01 19:49:34 -04:00
bluetooth Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth 2018-04-08 17:19:15 -04:00
bpf bpf: fix null pointer deref in bpf_prog_test_run_xdp 2018-02-01 07:43:56 -08:00
bridge net: bridge: disable bridge MTU auto tuning if it was set manually 2018-03-31 22:19:00 -04:00
caif net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
can net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
ceph The big ticket items are: 2018-04-10 12:25:30 -07:00
core tcp: fix SO_RCVLOWAT and RCVBUF autotuning 2018-04-16 18:26:37 -04:00
dcb
dccp dccp: initialize ireq->ir_mark 2018-04-07 22:32:31 -04:00
decnet net: Use octal not symbolic permissions 2018-03-26 12:07:48 -04:00
dns_resolver net: Use octal not symbolic permissions 2018-03-26 12:07:48 -04:00
dsa net: dsa: Discard frames from unused ports 2018-04-08 10:34:49 -04:00
ethernet
hsr
ieee802154 inet: frags: fix ip6frag_low_thresh boundary 2018-04-04 12:04:59 -04:00
ife
ipv4 tcp: implement mmap() for zero copy receive 2018-04-16 18:26:37 -04:00
ipv6 tcp: implement mmap() for zero copy receive 2018-04-16 18:26:37 -04:00
iucv Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-03-23 11:31:58 -04:00
kcm net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
key net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
l2tp l2tp: fix race in duplicate tunnel detection 2018-04-11 17:41:27 -04:00
l3mdev
lapb
llc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-04-01 19:49:34 -04:00
mac80211 We have a fair number of patches, but many of them are from the 2018-03-29 16:23:26 -04:00
mac802154 net/mac802154: disambiguate mac80215 vs mac802154 trace events 2018-03-28 22:55:18 +02:00
mpls net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
ncsi net/ncsi: check for null return from call to nla_nest_start 2018-03-27 10:38:26 -04:00
netfilter Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-04-01 19:49:34 -04:00
netlabel netlabel: If PF_INET6, check sk_buff ip header version 2018-02-14 14:01:41 -05:00
netlink netlink: fix uninit-value in netlink_sendmsg 2018-04-07 22:32:31 -04:00
netrom net: Use octal not symbolic permissions 2018-03-26 12:07:48 -04:00
nfc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-02-19 18:46:11 -05:00
nsh
openvswitch ovs: Remove rtnl_lock() from ovs_exit_net() 2018-03-29 13:47:54 -04:00
packet net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
phonet net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
psample
qrtr Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-03-06 01:20:46 -05:00
rds rds: MP-RDS may use an invalid c_path 2018-04-11 10:24:01 -04:00
rfkill vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
rose net: Use octal not symbolic permissions 2018-03-26 12:07:48 -04:00
rxrpc rxrpc: Fix undefined packet handling 2018-04-04 11:04:08 -04:00
sched net_sched: fix a missing idr_remove() in u32_delete_key() 2018-04-07 12:36:45 -04:00
sctp Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-04-09 17:04:10 -07:00
smc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-04-01 19:49:34 -04:00
strparser strparser: Fix sign of err codes 2018-03-27 11:00:18 -04:00
sunrpc Chuck Lever did a bunch of work on nfsd tracepoints, on RDMA, and on 2018-04-05 19:15:29 -07:00
switchdev
tipc tipc: use the right skb in tipc_sk_fill_sock_diag() 2018-04-08 12:34:29 -04:00
tls tls: support for Inline tls record 2018-03-31 23:37:32 -04:00
unix af_unix: remove redundant lockdep class 2018-04-04 11:13:40 -04:00
vmw_vsock net: make getname() functions return length rather than use int* parameter 2018-02-12 14:15:04 -05:00
wimax
wireless Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2018-03-31 23:33:04 -04:00
x25 net: Use octal not symbolic permissions 2018-03-26 12:07:48 -04:00
xfrm Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-04-01 19:49:34 -04:00
Kconfig Staging/IIO patches for 4.16-rc1 2018-02-01 09:51:57 -08:00
Makefile
compat.c net: socket: add __compat_sys_...msg() helpers; remove in-kernel calls to compat syscalls 2018-04-02 20:15:20 +02:00
socket.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2018-04-05 11:56:35 -07:00
sysctl_net.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00