mirror-linux

Commit Graph

Author	SHA1	Message	Date
David S. Miller	6b798d70d0	Merge branch 'net_next_ovs' of git://git.kernel.org/pub/scm/linux/kernel/git/pshelar/openvswitch Pravin B Shelar says: ==================== Open vSwitch First two patches are related to OVS MPLS support. Rest of patches are mostly refactoring and minor improvements to openvswitch. v1-v2: - Fix conflicts due to "gue: Remote checksum offload" ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-06 16:33:13 -05:00
Martin Townsend	56b2c3eea3	6lowpan: move skb_free from error paths in decompression Currently we ensure that the skb is freed on every error path in IPHC decompression which makes it easy to introduce skb leaks. By centralising the skb_free into the receive function it makes future decompression routines easier to maintain. It does come at the expense of ensuring that the skb passed into the decompression routine must not be copied. Signed-off-by: Martin Townsend <mtownsend1973@gmail.com> Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Acked-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-11-06 22:09:48 +01:00
Joe Perches	4508349777	net: esp: Convert NETDEBUG to pr_info Commit `64ce207306` ("[NET]: Make NETDEBUG pure printk wrappers") originally had these NETDEBUG printks as always emitting. Commit `a2a316fd06` ("[NET]: Replace CONFIG_NET_DEBUG with sysctl") added a net_msg_warn sysctl to these NETDEBUG uses. Convert these NETDEBUG uses to normal pr_info calls. This changes the output prefix from "ESP: " to include "IPSec: " for the ipv4 case and "IPv6: " for the ipv6 case. These output lines are now like the other messages in the files. Other miscellanea: Neaten the arithmetic spacing to be consistent with other arithmetic spacing in the files. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-06 15:11:10 -05:00
Joe Perches	cbffccc970	net; ipv[46] - Remove 2 unnecessary NETDEBUG OOM messages These messages aren't useful as there's a generic dump_stack() on OOM. Neaten the comment and if test above the OOM by separating the assign in if into an allocation then if test. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-06 15:11:10 -05:00
Andrew Lunn	b31f65fb43	net: dsa: slave: Fix autoneg for phys on switch MDIO bus When the ports' PHYs are connected to the switch's internal MDIO bus, we need to connect the PHY to the slave netdev; otherwise auto-negotiation etc. does not work. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-06 15:06:28 -05:00
Jiri Pirko	0c6965dd31	sched: fix act file names in header comment Fixes: `4bba3925` ("[PKT_SCHED]: Prefix tc actions with act_") Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-06 15:04:41 -05:00
Steffen Klassert	ea3dc9601b	ip6_tunnel: Add support for wildcard tunnel endpoints. This patch adds support for tunnels with local or remote wildcard endpoints. With this we get a NBMA tunnel mode like we have it for ipv4 and sit tunnels. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-06 14:19:20 -05:00
Steffen Klassert	d50051407f	ipv6: Allow sending packets through tunnels with wildcard endpoints Currently we need the IP6_TNL_F_CAP_XMIT capability to transmit packets through an ipv6 tunnel. This capability is set when the tunnel gets configured, based on the tunnel endpoint addresses. On tunnels with wildcard tunnel endpoints, we need to do the capability checking on a per packet basis like it is done in the receive path. This patch extends ip6_tnl_xmit_ctl() to take local and remote addresses as parameters to allow for per packet capability checking. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-06 14:19:19 -05:00
Kuba Pawlak	9645c76c7c	Bluetooth: Sort switch cases by opcode's numeric value Opcodes in switch/case in hci_cmd_status_evt are not sorted by value. This patch restores proper ordering. Signed-off-by: Kuba Pawlak <kubax.t.pawlak@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-11-06 19:38:42 +01:00
Kuba Pawlak	50fc85f1b0	Bluetooth: Clear role switch pending flag If role switch was rejected by the controller and HCI Event: Command Status returned with status "Command Disallowed" (0x0C) the flag HCI_CONN_RSWITCH_PEND remains set. No further role switches are possible as this flag prevents us from sending any new HCI Switch Role requests and the only way to clear it is to receive a valid HCI Event Switch Role. This patch clears the flag if command was rejected. 2013-01-01 00:03:44.209913 < HCI Command: Switch Role (0x02|0x000b) plen 7 bdaddr BC:C6:DB:C4:6F:79 role 0x00 Role: Master 2013-01-01 00:03:44.210867 > HCI Event: Command Status (0x0f) plen 4 Switch Role (0x02|0x000b) status 0x0c ncmd 1 Error: Command Disallowed Signed-off-by: Kuba Pawlak <kubax.t.pawlak@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-11-06 19:38:42 +01:00
Ronald Wahl	4f031fa9f1	mac80211: Fix regression that triggers a kernel BUG with CCMP Commit `7ec7c4a9a6` (mac80211: port CCMP to cryptoapi's CCM driver) introduced a regression when decrypting empty packets (data_len == 0). This will lead to backtraces like: (scatterwalk_start) from [<c01312f4>] (scatterwalk_map_and_copy+0x2c/0xa8) (scatterwalk_map_and_copy) from [<c013a5a0>] (crypto_ccm_decrypt+0x7c/0x25c) (crypto_ccm_decrypt) from [<c032886c>] (ieee80211_aes_ccm_decrypt+0x160/0x170) (ieee80211_aes_ccm_decrypt) from [<c031c628>] (ieee80211_crypto_ccmp_decrypt+0x1ac/0x238) (ieee80211_crypto_ccmp_decrypt) from [<c032ef28>] (ieee80211_rx_handlers+0x870/0x1d24) (ieee80211_rx_handlers) from [<c0330c7c>] (ieee80211_prepare_and_rx_handle+0x8a0/0x91c) (ieee80211_prepare_and_rx_handle) from [<c0331260>] (ieee80211_rx+0x568/0x730) (ieee80211_rx) from [<c01d3054>] (__carl9170_rx+0x94c/0xa20) (__carl9170_rx) from [<c01d3324>] (carl9170_rx_stream+0x1fc/0x320) (carl9170_rx_stream) from [<c01cbccc>] (carl9170_usb_tasklet+0x80/0xc8) (carl9170_usb_tasklet) from [<c00199dc>] (tasklet_hi_action+0x88/0xcc) (tasklet_hi_action) from [<c00193c8>] (__do_softirq+0xcc/0x200) (__do_softirq) from [<c0019734>] (irq_exit+0x80/0xe0) (irq_exit) from [<c0009c10>] (handle_IRQ+0x64/0x80) (handle_IRQ) from [<c000c3a0>] (__irq_svc+0x40/0x4c) (__irq_svc) from [<c0009d44>] (arch_cpu_idle+0x2c/0x34) Such packets can appear for example when using the carl9170 wireless driver because hardware sometimes generates garbage when the internal FIFO overruns. This patch adds an additional length check. Cc: stable@vger.kernel.org Fixes: `7ec7c4a9a6` ("mac80211: port CCMP to cryptoapi's CCM driver") Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Ronald Wahl <ronald.wahl@raritan.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-11-06 12:42:22 +01:00
Pravin B Shelar	a85311bf1f	openvswitch: Avoid NULL mask check while building mask OVS does mask validation even if it does not need to convert netlink mask attributes to mask structure. ovs_nla_get_match() caller can pass NULL mask structure pointer if the caller does not need mask. Therefore NULL check is required in SW_FLOW_KEY* macros. Following patch does not convert mask netlink attributes if mask pointer is NULL, so we do not need these checks in SW_FLOW_KEY* macro. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Daniele Di Proietto <ddiproietto@vmware.com> Acked-by: Andy Zhou <azhou@nicira.com>	2014-11-05 23:52:35 -08:00
Pravin B Shelar	2fdb957d63	openvswitch: Refactor action alloc and copy api. There are two separate API to allocate and copy actions list. Anytime OVS needs to copy action list, it needs to call both functions. Following patch moves action allocation to copy function to avoid code duplication. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>	2014-11-05 23:52:35 -08:00
Joe Stringer	41af73e9c1	openvswitch: Move key_attr_size() to flow_netlink.h. flow-netlink has netlink related code. Signed-off-by: Joe Stringer <joestringer@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>	2014-11-05 23:52:35 -08:00
Lorand Jakab	d98612b8c1	openvswitch: Remove flow member from struct ovs_skb_cb The 'flow' member was chosen for removal because it is only used in ovs_execute_actions(); we can pass it as an argument to this function. Signed-off-by: Lorand Jakab <lojakab@cisco.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>	2014-11-05 23:52:35 -08:00
Chunhe Li	e1f9c356d2	openvswitch: Drop packets when interdev is not up If the internal device is not up, it should drop received packets. Sometimes it receives broadcast or multicast packets, and the IP protocol stack will cause more CPU usage to be wasted. Signed-off-by: Chunhe Li <lichunhe@huawei.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>	2014-11-05 23:52:35 -08:00
Andy Zhou	cc3a5ae6f2	openvswitch: Refactor get_dp() function into multiple access APIs. Avoid recursive rcu_read_lock() by using the lighter weight get_dp_rcu() API. Add proper locking assertions to get_dp(). Signed-off-by: Andy Zhou <azhou@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>	2014-11-05 23:52:34 -08:00
Joe Stringer	ca7105f278	openvswitch: Refactor ovs_flow_cmd_fill_info(). Split up ovs_flow_cmd_fill_info() to make it easier to cache parts of a dump reply. This will be used to streamline flow_dump in a future patch. Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Thomas Graf <tgraf@noironetworks.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>	2014-11-05 23:52:34 -08:00
Andy Zhou	738967b8bf	openvswitch: refactor do_output() to move NULL check out of fast path skb_clone() NULL check is implemented in do_output(), as part of the common (fast) path. Refactoring so that NULL check is done in the slow path, immediately after skb_clone() is called. Besides optimization, this change also improves code readability by making the skb_clone() NULL check consistent within OVS datapath module. Signed-off-by: Andy Zhou <azhou@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>	2014-11-05 23:52:34 -08:00
Jesse Gross	426cda5cc1	openvswitch: Additional logging for -EINVAL on flow setups. There are many possible ways that a flow can be invalid so we've added logging for most of them. This adds logs for the remaining possible cases so there isn't any ambiguity while debugging. CC: Federico Iezzi <fiezzi@enter.it> Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Thomas Graf <tgraf@noironetworks.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>	2014-11-05 23:52:34 -08:00
Joe Stringer	1b760fb9a8	openvswitch: Remove redundant tcp_flags code. These two cases used to be treated differently for IPv4/IPv6, but they are now identical. Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Jarno Rajahalme <jrajahalme@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>	2014-11-05 23:52:34 -08:00
Pravin B Shelar	9b996e544a	openvswitch: Move table destroy to dp-rcu callback. This simplifies flow-table-destroy API. No need to pass explicit parameter about context. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Thomas Graf <tgraf@redhat.com>	2014-11-05 23:52:34 -08:00
Simon Horman	25cd9ba0ab	openvswitch: Add basic MPLS support to kernel Allow datapath to recognize and extract MPLS labels into flow keys and execute actions which push, pop, and set labels on packets. Based heavily on work by Leo Alterman, Ravi K, Isaku Yamahata and Joe Stringer. Cc: Ravi K <rkerur@gmail.com> Cc: Leo Alterman <lalterman@nicira.com> Cc: Isaku Yamahata <yamahata@valinux.co.jp> Cc: Joe Stringer <joe@wand.net.nz> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>	2014-11-05 23:52:33 -08:00
Pravin B Shelar	59b93b41e7	net: Remove MPLS GSO feature. A device can export MPLS GSO support in dev->mpls_features the same way it exports VLAN features in dev->vlan_features. So it is safe to remove the redundant NETIF_F_GSO_MPLS flag. Signed-off-by: Pravin B Shelar <pshelar@nicira.com>	2014-11-05 23:52:33 -08:00
Tom Herbert	e1b2cb6550	fou: Fix typo in returning flags in netlink When filling netlink info, dport is being returned as flags. Fix instances to return correct value. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-05 22:18:20 -05:00
Daniel Borkmann	4c672e4b42	ipv6: mld: fix add_grhead skb_over_panic for devs with large MTUs It has been reported that generating an MLD listener report on devices with large MTUs (e.g. 9000) and a high number of IPv6 addresses can trigger a skb_over_panic(): skbuff: skb_over_panic: text:ffffffff80612a5d len:3776 put:20 head:ffff88046d751000 data:ffff88046d751010 tail:0xed0 end:0xec0 dev:port1 ------------[ cut here ]------------ kernel BUG at net/core/skbuff.c:100! invalid opcode: 0000 [#1] SMP Modules linked in: ixgbe(O) CPU: 3 PID: 0 Comm: swapper/3 Tainted: G O 3.14.23+ #4 [...] Call Trace: <IRQ> [<ffffffff80578226>] ? skb_put+0x3a/0x3b [<ffffffff80612a5d>] ? add_grhead+0x45/0x8e [<ffffffff80612e3a>] ? add_grec+0x394/0x3d4 [<ffffffff80613222>] ? mld_ifc_timer_expire+0x195/0x20d [<ffffffff8061308d>] ? mld_dad_timer_expire+0x45/0x45 [<ffffffff80255b5d>] ? call_timer_fn.isra.29+0x12/0x68 [<ffffffff80255d16>] ? run_timer_softirq+0x163/0x182 [<ffffffff80250e6f>] ? __do_softirq+0xe0/0x21d [<ffffffff8025112b>] ? irq_exit+0x4e/0xd3 [<ffffffff802214bb>] ? smp_apic_timer_interrupt+0x3b/0x46 [<ffffffff8063f10a>] ? apic_timer_interrupt+0x6a/0x70 mld_newpack() skb allocations are usually requested with dev->mtu in size, since commit `72e09ad107` ("ipv6: avoid high order allocations") we have changed the limit in order to be less likely to fail. However, in MLD/IGMP code, we have some rather ugly AVAILABLE(skb) macros, which determine if we may end up doing an skb_put() for adding another record. To avoid possible fragmentation, we check the skb's tailroom as skb->dev->mtu - skb->len, which is a wrong assumption as the actual max allocation size can be much smaller. The IGMP case doesn't have this issue as commit `57e1ab6ead` ("igmp: refine skb allocations") stores the allocation size in the cb[]. Set a reserved_tailroom to make it fit into the MTU and use skb_availroom() helper instead. This also allows to get rid of igmp_skb_size(). Reported-by: Wei Liu <lw1a2.jing@gmail.com> Fixes: `72e09ad107` ("ipv6: avoid high order allocations") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: David L Stevens <david.stevens@oracle.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-05 22:12:30 -05:00
Joe Perches	1744bea1fa	net: Convert SEQ_START_TOKEN/seq_printf to seq_puts Using a single fixed string is smaller code size than using a format and many string arguments. Reduces overall code size a little. $ size net/ipv4/igmp.o* net/ipv6/mcast.o* net/ipv6/ip6_flowlabel.o* text data bss dec hex filename 34269 7012 14824 56105 db29 net/ipv4/igmp.o.new 34315 7012 14824 56151 db57 net/ipv4/igmp.o.old 30078 7869 13200 51147 c7cb net/ipv6/mcast.o.new 30105 7869 13200 51174 c7e6 net/ipv6/mcast.o.old 11434 3748 8580 23762 5cd2 net/ipv6/ip6_flowlabel.o.new 11491 3748 8580 23819 5d0b net/ipv6/ip6_flowlabel.o.old Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-05 22:04:55 -05:00
Marcelo Leitner	1f37bf87aa	tcp: zero retrans_stamp if all retrans were acked Ueki Kohei reported that when we are using NewReno with connections that have very low traffic, we may time out the connection too early if a second loss occurs after the first one was successfully acked but no data was transferred later. Below is his description of it: When SACK is disabled, and a socket suffers multiple separate TCP retransmissions, that socket's ETIMEDOUT value is calculated from the time of the first retransmission instead of the latest retransmission. This happens because the tcp_sock's retrans_stamp is set once then never cleared. Take the following connection: Linux remote-machine | | send#1---->(1)|--------> data#1 --------->| | | | RTO : : | | | ---(2)|----> data#1(retrans) ---->| | (3)|<---------- ACK <----------| | | | | : : | : : | : : 16 minutes (or more) : | : : | : : | : : | | | send#2---->(4)|--------> data#2 --------->| | | | RTO : : | | | ---(5)|----> data#2(retrans) ---->| | | | | | | RTO2 : : | | | | | | ETIMEDOUT<----(6)| | (1) One data packet sent. (2) Because no ACK packet is received, the packet is retransmitted. (3) The ACK packet is received. The transmitted packet is acknowledged. At this point the first "retransmission event" has passed and been recovered from. Any future retransmission is a completely new "event". (4) After 16 minutes (to correspond with retries2=15), a new data packet is sent. Note: No data is transmitted between (3) and (4). The socket's timeout SHOULD be calculated from this point in time, but instead it's calculated from the prior "event" 16 minutes ago. (5) Because no ACK packet is received, the packet is retransmitted. (6) At the time of the 2nd retransmission, the socket returns ETIMEDOUT. Therefore, now we clear retrans_stamp as soon as all data during the loss window is fully acked. Reported-by: Ueki Kohei Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Acked-by: Neal Cardwell <ncardwell@google.com> Tested-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-05 16:59:49 -05:00
David S. Miller	51f3d02b98	net: Add and use skb_copy_datagram_msg() helper. This encapsulates all of the skb_copy_datagram_iovec() callers with call argument signature "skb, offset, msghdr->msg_iov, length". When we move to iov_iters in the networking, the iov_iter object will sit in the msghdr. Having a helper like this means there will be less places to touch during that transformation. Based upon descriptions and patch from Al Viro. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-05 16:46:40 -05:00
Tom Herbert	a8d31c128b	gue: Receive side of remote checksum offload Add processing of the remote checksum offload option in both the normal path as well as the GRO path. This implements patching the affected checksum to derive the offloaded checksum. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-05 16:30:04 -05:00
Tom Herbert	b17f709a24	gue: TX support for using remote checksum offload option Add if_tunnel flag TUNNEL_ENCAP_FLAG_REMCSUM to configure remote checksum offload on an IP tunnel. Add logic in gue_build_header to insert remote checksum offload option. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-05 16:30:03 -05:00
Tom Herbert	e585f23636	udp: Changes to udp_offload to support remote checksum offload Add a new GSO type, SKB_GSO_TUNNEL_REMCSUM, which indicates remote checksum offload being done (in this case inner checksum must not be offloaded to the NIC). Added logic in __skb_udp_tunnel_segment to handle remote checksum offload case. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-05 16:30:03 -05:00
Tom Herbert	5024c33ac3	gue: Add infrastructure for flags and options Add functions and basic definitions for processing standard flags, private flags, and control messages. This includes definitions to compute length of optional fields corresponding to a set of flags. Flag validation is in validate_gue_flags function. This checks for unknown flags, and that length of optional fields is <= length in guehdr hlen. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-05 16:30:03 -05:00
Tom Herbert	4bcb877d25	udp: Offload outer UDP tunnel csum if available In __skb_udp_tunnel_segment if outer UDP checksums are enabled and ip_summed is not already CHECKSUM_PARTIAL, set up checksum offload if device features allow it. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-05 16:30:03 -05:00
Tom Herbert	63487babf0	net: Move fou_build_header into fou.c and refactor Move fou_build_header out of ip_tunnel.c and into fou.c splitting it up into fou_build_header, gue_build_header, and fou_build_udp. This allows for other users for TX of FOU or GUE. Change ip_tunnel_encap to call fou_build_header or gue_build_header based on the tunnel encapsulation type. Similarly, added fou_encap_hlen and gue_encap_hlen functions which are called by ip_encap_hlen. New net/fou.h has prototypes and defines for this. Added NET_FOU_IP_TUNNELS configuration. When this is set, IP tunnels can use FOU/GUE and fou module is also selected. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-05 16:30:02 -05:00
Alexander Aring	0916c02205	mac802154: fix typo promisuous to promiscuous Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-11-05 21:53:06 +01:00
Alexander Aring	e57a894684	mac802154: use IEEE802154_EXTENDED_ADDR_LEN This patch removes the af_ieee802154 defines and uses IEEE802154_EXTENDED_ADDR_LEN. We should do this everywhere in the 802.15.4 subsystem because af_ieee802154 should normally be a uapi header. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-11-05 21:53:06 +01:00
Alexander Aring	dee56d1477	mac802154: add support for perm_extended_addr This patch adds support for a perm extended address. This is useful when a device has an EEPROM with a programmed static extended address. If a device doesn't support such an EEPROM or serial registers, then the driver should generate a random extended address. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-11-05 21:53:05 +01:00
Alexander Aring	705cbbbe9c	mac802154: cleanup ieee802154_netdev_to_extended_addr This patch cleans up ieee802154_be64_to_le64 so that it resembles ieee802154_le64_to_be64, only with the source and destination types switched. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-11-05 21:53:05 +01:00
Alexander Aring	7c118c1a86	mac802154: add ieee802154_vif struct This patch adds an ieee802154_vif struct, similar to ieee80211_vif, which holds the interface type and may hold further attributes, as the ieee80211_vif structure does. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Cc: Varka Bhadram <varkabhadram@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-11-05 21:53:04 +01:00
Alexander Aring	e4962a1443	mac802154: add default interface registration This patch adds a default interface registration for a wpan interface type. Currently, userspace tools must be called to add an interface in the 802.15.4 subsystem. This patch follows the mac80211 handling, which registers a station interface type by default. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-11-05 21:53:04 +01:00
Alexander Aring	bd28a11f25	ieee802154: remove mlme get_phy callback This patch removes the get_phy callback from the mlme ops structure. Instead we dereference via the ieee802154_ptr dev pointer. For backwards compatibility we need to run get_device after dereferencing wpan_phy via ieee802154_ptr. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-11-05 21:53:04 +01:00
Alexander Aring	d5ae67bacd	ieee802154: rework interface registration This patch melds mac802154_netdev_register into the ieee802154_if_add function. We now also have only one alloc_netdev call with one interface setup routine, "ieee802154_if_setup", instead of two different ones, one for each interface type. This patch now checks the interface type at runtime and handles each type differently. Additionally, we add the wpan_dev struct to ieee802154_sub_if_data and set the new ieee802154_ptr during netdev registration. This behaviour is very similar to the mac80211 netdev registration functionality. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-11-05 21:53:04 +01:00
Alexander Aring	12cb56c237	mac802154: move dev_hold out of ieee802154_if_add This patch moves the dev_hold call into the nl-phy ieee802154_add_iface function. ieee802154_add_iface is the only function that uses ieee802154_if_add, and it contains the corresponding dev_put call. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-11-05 21:53:03 +01:00
Alexander Aring	986a8abfc5	mac802154: move interface add handling in iface This patch moves the mac802154_add_iface and mac802154_netdev_register functions into iface.c and renames them. The function mac802154_add_iface is renamed to ieee802154_if_add, which follows a naming convention similar to mac80211's. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-11-05 21:53:03 +01:00
Alexander Aring	b210b18747	mac802154: move interface del handling in iface This patch moves the mac802154_del_iface function into iface.c and renames it to ieee802154_if_remove, which follows a naming convention similar to mac80211's. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-11-05 21:53:03 +01:00
Alexander Aring	9f3295b9ea	ieee802154: remove nl802154 unused functions The include/net/nl802154.h file contains a lot of prototypes which are not used inside of the ieee802154 subsystem. This patch removes this file and makes the only used prototype, "ieee802154_nl_start_confirm", a static declaration in ieee802154/nl-mac.c Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-11-05 21:53:03 +01:00
Alexander Aring	53f9ee61b4	ieee802154: rework wpan_phy index assignment This patch reworks the wpan_phy index incrementation. It is now similar to the wireless wiphy index incrementation. We move the wpan_phy index attribute into cfg802154_registered_device and use atomic operations instead of the locking mechanism via wpan_phy_mutex. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-11-05 21:53:03 +01:00
Jesse Gross	d3ca9eafc0	geneve: Unregister pernet subsys on module unload. The pernet ops aren't ever unregistered, which causes a memory leak and an OOPs if the module is ever reinserted. Fixes: `0b5e8b8eea` ("net: Add Geneve tunneling protocol driver") CC: Andy Zhou <azhou@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-05 15:00:51 -05:00
Jesse Gross	45cac46e51	geneve: Set GSO type on transmit. Geneve does not currently set the inner protocol type when transmitting packets. This causes GSO segmentation to fail on NICs that do not support Geneve offloading. CC: Andy Zhou <azhou@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-05 15:00:51 -05:00
Steven Rostedt (Red Hat)	e71456ae98	netfilter: Remove checks of seq_printf() return values The return value of seq_printf() is soon to be removed. Remove the checks from seq_printf() in favor of seq_has_overflowed(). Link: http://lkml.kernel.org/r/20141104142236.GA10239@salvia Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Cc: netfilter-devel@vger.kernel.org Cc: coreteam@netfilter.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-05 14:11:02 -05:00
Joe Perches	824f1fbee7	netfilter: Convert print_tuple functions to return void Since adding a new function to seq_file (seq_has_overflowed()) there isn't any value for functions called from seq_show to return anything. Remove the int returns of the various print_tuple/<foo>_print_tuple functions. Link: http://lkml.kernel.org/p/f2e8cf8df433a197daa62cbaf124c900c708edc7.1412031505.git.joe@perches.com Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Cc: netfilter-devel@vger.kernel.org Cc: coreteam@netfilter.org Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-05 14:10:33 -05:00
Steven Rostedt (Red Hat)	37246a5837	netfilter: Remove return values for print_conntrack callbacks The seq_printf() and friends are having their return values removed. The print_conntrack() returns the result of seq_printf(), which is meaningless when seq_printf() returns void. Might as well remove the return values of print_conntrack() as well. Link: http://lkml.kernel.org/r/20141029220107.465008329@goodmis.org Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Cc: netfilter-devel@vger.kernel.org Cc: coreteam@netfilter.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-05 14:09:47 -05:00
Fabian Frederick	6cf1093e58	udp: remove blank line between set and test Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-04 17:12:10 -05:00
Florent Fourcot	869ba988fe	ipv6: trivial, add bracket for the if block The "else" block is on several lines and use bracket. Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-04 17:10:19 -05:00
Fabian Frederick	05006e8c59	esp4: remove assignment in if condition Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-04 16:57:49 -05:00
John W. Linville	bf515fb11a	Merge tag 'mac80211-next-for-john-2014-11-04' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg <johannes@sipsolutions.net> says: "This relatively large batch of changes is comprised of the following: * large mac80211-hwsim changes from Ben, Jukka and a bit myself * OCB/WAVE/11p support from Rostislav on behalf of the Czech Technical University in Prague and Volkswagen Group Research * minstrel VHT work from Karl * more CSA work from Luca * WMM admission control support in mac80211 (myself) * various smaller fixes, spelling corrections, and minor API additions" Conflicts: drivers/net/wireless/ath/wil6210/cfg80211.c Signed-off-by: John W. Linville <linville@tuxdriver.com>	2014-11-04 16:18:12 -05:00
Florian Westphal	f7b3bec6f5	net: allow setting ecn via routing table This patch allows to set ECN on a per-route basis in case the sysctl tcp_ecn is not set to 1. In other words, when ECN is set for specific routes, it provides a tcp_ecn=1 behaviour for that route while the rest of the stack acts according to the global settings. One can use 'ip route change dev $dev $net features ecn' to toggle this. Having a more fine-grained per-route setting can be beneficial for various reasons, for example, 1) within data centers, or 2) local ISPs may deploy ECN support for their own video/streaming services [1], etc. There was a recent measurement study/paper [2] which scanned the Alexa's publicly available top million websites list from a vantage point in US, Europe and Asia: Half of the Alexa list will now happily use ECN (tcp_ecn=2, most likely blamed to commit `255cac91c3` ("tcp: extend ECN sysctl to allow server-side only ECN") ;)); the break in connectivity on-path was found is about 1 in 10,000 cases. Timeouts rather than receiving back RSTs were much more common in the negotiation phase (and mostly seen in the Alexa middle band, ranks around 50k-150k): from 12-thousand hosts on which there _may_ be ECN-linked connection failures, only 79 failed with RST when _not_ failing with RST when ECN is not requested. It's unclear though, how much equipment in the wild actually marks CE when buffers start to fill up. We thought about a fallback to non-ECN for retransmitted SYNs as another global option (which could perhaps one day be made default), but as Eric points out, there's much more work needed to detect broken middleboxes. Two examples Eric mentioned are buggy firewalls that accept only a single SYN per flow, and middleboxes that successfully let an ECN flow establish, but later mark CE for all packets (so cwnd converges to 1). [1] http://www.ietf.org/proceedings/89/slides/slides-89-tsvarea-1.pdf, p.15 [2] http://ecn.ethz.ch/ Joint work with Daniel Borkmann. Reference: http://thread.gmane.org/gmane.linux.network/335797 Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-04 16:06:09 -05:00
Florian Westphal	f1673381b1	syncookies: split cookie_check_timestamp() into two functions The function cookie_check_timestamp(), both called from IPv4/6 context, is being used to decode the echoed timestamp from the SYN/ACK into TCP options used for follow-up communication with the peer. We can remove ECN handling from that function, split it into a separate one, and simply rename the original function into cookie_decode_options(). cookie_decode_options() just fills in tcp_option struct based on the echoed timestamp received from the peer. Anything that fails in this function will actually discard the request socket. While this is the natural place for decoding options such as ECN which commit `172d69e63c` ("syncookies: add support for ECN") added, we argue that in particular for ECN handling, it can be checked at a later point in time as the request sock would actually not need to be dropped from this, but just ECN support turned off. Therefore, we split this functionality into cookie_ecn_ok(), which tells us if the timestamp indicates ECN support AND the tcp_ecn sysctl is enabled. This prepares for per-route ECN support: just looking at the tcp_ecn sysctl won't be enough anymore at that point; if the timestamp indicates ECN and sysctl tcp_ecn == 0, we will also need to check the ECN dst metric. This would mean adding a route lookup to cookie_check_timestamp(), which we definitely want to avoid. As we already do a route lookup at a later point in cookie_{v4,v6}_check(), we can simply make use of that as well for the new cookie_ecn_ok() function w/o any additional cost. Joint work with Daniel Borkmann. Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-04 16:06:09 -05:00
Florian Westphal	274e2da0ec	syncookies: avoid magic values and document which-bit-is-what-option Was a bit more difficult to read than needed due to magic shifts; add defines and document the used encoding scheme. Joint work with Daniel Borkmann. Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-04 16:06:08 -05:00
Mika Westerberg	72daceb9a1	net: rfkill: gpio: Add default GPIO driver mappings for ACPI The driver uses devm_gpiod_get_index(..., index) so that the index refers directly to the GpioIo resource under the ACPI device. The problem with this is that if the ordering changes we get wrong GPIOs. With ACPI 5.1 _DSD we can now use names instead to reference GPIOs analogous to Device Tree. However, we still have systems out there that do not provide _DSD at all. These systems must be supported as well. Luckily we now have acpi_dev_add_driver_gpios() that can be used to provide mappings for systems where _DSD is not provided and still take advantage of _DSD if it exists. This patch changes the driver to create default GPIO mappings if we are running on ACPI system. While there we can drop the indices completely and use devm_gpiod_get() with name instead. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: John W. Linville <linville@tuxdriver.com> Acked-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-11-04 21:58:24 +01:00
John W. Linville	0c9a67c8f1	Merge tag 'mac80211-for-john-2014-11-04' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg <johannes@sipsolutions.net> says: "This contains another small set of fixes for 3.18, these are all over the place and most of the bugs are old, one even dates back to the original mac80211 we merged into the kernel." Signed-off-by: John W. Linville <linville@tuxdriver.com>	2014-11-04 15:56:33 -05:00
Fabian Frederick	436f7c2068	igmp: remove camel case definitions use standard uppercase for definitions Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-04 15:13:18 -05:00
Fabian Frederick	c18450a52a	udp: remove else after return else is unnecessary after return 0 in __udp4_lib_rcv() Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-04 15:13:18 -05:00
Fabian Frederick	aa1f731e52	inet: frags: remove inline on static in c file remove __inline__ / inline and let compiler decide what to do with static functions Inspired-by: "David S. Miller" <davem@davemloft.net> Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-04 15:13:18 -05:00
Fabian Frederick	0d3979b9c7	ipv4: remove 0/NULL assignment on static static values are automatically initialized to 0 Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-04 15:09:52 -05:00
Fabian Frederick	c9f503b006	ipv4: use seq_puts instead of seq_printf where possible Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-04 15:09:52 -05:00
Fabian Frederick	b92022f3e5	tcp: spelling s/plugable/pluggable Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-04 15:09:52 -05:00
Fabian Frederick	988b13438c	cipso: remove NULL assignment on static Also add blank line after structure declarations Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-04 15:09:52 -05:00
Fabian Frederick	4c787b1626	ipv4: include linux/bug.h instead of asm/bug.h Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-04 15:09:20 -05:00
Fabian Frederick	4973404f81	cipso: kerneldoc warning fix Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-04 15:09:20 -05:00
Pablo Neira Ayuso	c5a589cc30	netfilter: nf_log: fix sparse warning in nf_logger_find_get() net/netfilter/nf_log.c:157:16: warning: incorrect type in assignment (different address spaces) net/netfilter/nf_log.c:157:16: expected struct nf_logger logger net/netfilter/nf_log.c:157:16: got struct nf_logger [noderef] <asn:4><noident> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-11-04 17:56:31 +01:00
Simon Vincent	980edbd503	6lowpan: fix udp header compression when using raw sockets If you use RAW sockets the transport header offset is not set by the ipv6 stack so when we get to the udp header compression it does not compress the right part of the packet. This patch adds a check for this scenario and sets the transport header offset. Signed-off-by: Simon Vincent <simon.vincent@xsilon.com> Acked-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-11-04 17:31:01 +01:00
Henning Rogge	1ef4c85049	cfg80211: fix nl80211 cmd id in nl80211_send_mpath() Netlink command for nl80211_send_mpath() should be NL80211_CMD_NEW_MPATH. Signed-off-by: Henning Rogge <henning.rogge@fkie.fraunhofer.de> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-11-04 16:37:22 +01:00
Eliad Peller	cf2c92d840	mac80211: replace restart_complete() with reconfig_complete() Drivers might want to know also when mac80211 has completed reconfiguring after resume (e.g. in order to know when frames can be passed to mac80211). Rename restart_complete() to a more-generic reconfig_complete(), and add a new enum to indicate the reconfiguration type. Update the current users with the new prototype. Signed-off-by: Eliad Peller <eliadx.peller@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-11-04 13:49:00 +01:00
Andrei Otcheretianski	13a8098af9	mac80211: increase U-APSD max service period length Deliver up to 128 frames during service period instead of 8 if unlimited is specified by the client during association. 8 was just an arbitrary value; so is 128 since unlimited can be any number. However for large traffic bursts, increasing this value looks reasonable. Also, it seems that a few certification tests expect more frames to be delivered during SP. Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-11-04 13:18:22 +01:00
Johannes Berg	8ed2874715	mac80211: handle RIC data element in reassociation request When the RIC data element (RDE) is included in the IEs coming from userspace for an association request, its handling is currently broken as any IEs that are contained within it would be split off from it and inserted again after all the IEs that mac80211 generates (e.g. HT, VHT.) To fix this, treat the RIC element specially, and stop after it only when we find something that doesn't actually belong to it. This assumes userspace is actually correctly building it, directly after the fast BSS transition IE and before all the others like extended capabilities. This leaves as a potential problem the case where userspace is building the following IEs: [RDE] [vendor resource description] [vendor non-resource IE] In this case, we'd erroneously consider all three IEs to be part of the RIC data together, and not split them between the two vendor IEs. Unfortunately, it isn't easily possible to distinguish vendor IEs, so this isn't easy to fix. Luckily, this case is rare as normally wpa_supplicant will include an extended capabilities IE in the IEs, and that certainly will break the two vendor IEs apart correctly. Reviewed-by: Eliad Peller <eliad@wizery.com> Reviewed-by: Beni Lev <beni.lev@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-11-04 13:18:21 +01:00
Rostislav Lisovy	239281f803	mac80211: 802.11p OCB mode support This patch adds 802.11p OCB (Outside the Context of a BSS) mode support. When communicating in OCB mode a mandatory wildcard BSSID (48 '1' bits) is used. The EDCA parameters handling function was changed to support 802.11p specific values. The insertion of newly discovered STAs is done in a similar way as in the IBSS mode -- through deferred insertion. The OCB mode uses a periodic 'housekeeping task' for expiration of disconnected STAs (in a similar manner as in the MESH mode). A new Kconfig option for verbose OCB debugging output is added. Signed-off-by: Rostislav Lisovy <rostislav.lisovy@fel.cvut.cz> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-11-04 13:18:21 +01:00
Rostislav Lisovy	6e0bd6c35b	cfg80211: 802.11p OCB mode handling This patch adds a new iface type (NL80211_IFTYPE_OCB) representing the OCB (Outside the Context of a BSS) mode. When establishing a connection to the network a cfg80211_join_ocb function is called (particular nl80211_command is added as well). The mandatory parameters for the ocb_join operation are 'center frequency' and 'channel width (5/10 MHz)'. The changes made in mac80211 are the minimum required to avoid many warnings (warning: enumeration value 'NL80211_IFTYPE_OCB' not handled in switch) during compilation. Full functionality (where needed) is added in the following patch. Signed-off-by: Rostislav Lisovy <rostislav.lisovy@fel.cvut.cz> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-11-04 13:18:17 +01:00
Felix Fietkau	5b3dc42b1b	mac80211: add support for driver tx power reporting The configured tx power is often limited by hardware capabilities, channel settings, antenna configuration, etc. Signed-off-by: Felix Fietkau <nbd@openwrt.org> [fix tracing compilation] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-11-04 10:15:09 +01:00
Johan Hedberg	2a68c89724	Bluetooth: Fix sparse warnings in RFCOMM This patch fixes the following sparse warnings in rfcomm/core.c: net/bluetooth/rfcomm/core.c:391:16: warning: dubious: x | !y net/bluetooth/rfcomm/core.c:546:24: warning: dubious: x | !y Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-11-04 08:01:46 +01:00
Peter Zijlstra	ff960a7317	netdev, sched/wait: Fix sleeping inside wait event rtnl_lock_unregistering*() take rtnl_lock() -- a mutex -- inside a wait loop. The wait loop relies on current->state to function, but so does mutex_lock(), nesting them makes for the inner to destroy the outer state. Fix this using the new wait_woken() bits. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: David S. Miller <davem@davemloft.net> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Cong Wang <cwang@twopensource.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jerry Chu <hkchu@google.com> Cc: Jiri Pirko <jiri@resnulli.us> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Cc: sfeldma@cumulusnetworks.com <sfeldma@cumulusnetworks.com> Cc: stephen hemminger <stephen@networkplumber.org> Cc: Tom Gundersen <teg@jklm.no> Cc: Tom Herbert <therbert@google.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Vlad Yasevich <vyasevic@redhat.com> Cc: netdev@vger.kernel.org Link: http://lkml.kernel.org/r/20141029173110.GE15602@worktop.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-11-04 07:17:48 +01:00
Peter Zijlstra	eedf7e47da	rfcomm, sched/wait: Fix broken wait construct rfcomm_run() is a tad broken in that it has a nested wait loop. One cannot rely on p->state for the outer wait because the inner wait will overwrite it. Fix this using the new wait_woken() facility. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: Alexander Holler <holler@ahsoftware.de> Cc: David S. Miller <davem@davemloft.net> Cc: Gustavo Padovan <gustavo@padovan.org> Cc: Joe Perches <joe@perches.com> Cc: Johan Hedberg <johan.hedberg@gmail.com> Cc: Libor Pechacek <lpechacek@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Marcel Holtmann <marcel@holtmann.org> Cc: Seung-Woo Kim <sw0312.kim@samsung.com> Cc: Vignesh Raman <Vignesh_Raman@mentor.com> Cc: linux-bluetooth@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-11-04 07:17:47 +01:00
Greg Kroah-Hartman	a8a93c6f99	Merge branch 'platform/remove_owner' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux into driver-core-next Remove all .owner fields from platform drivers	2014-11-03 19:53:56 -08:00
Linus Torvalds	ce1928da84	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull ceph fixes from Sage Weil: "There is a GFP flag fix from Mike Christie, an error code fix from Jan, and fixes for two unnecessary allocations (kmalloc and workqueue) from Ilya. All are well tested. Ilya has one other fix on the way but it didn't get tested in time" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: libceph: eliminate unnecessary allocation in process_one_ticket() rbd: Fix error recovery in rbd_obj_read_sync() libceph: use memalloc flags for net IO rbd: use a single workqueue for all devices	2014-11-03 15:04:26 -08:00
Eric Dumazet	56b174256b	net: add rbnode to struct sk_buff Yaogong replaces TCP out of order receive queue by an RB tree. As netem already does a private skb->{next/prev/tstamp} union with a 'struct rb_node', lets do this in a cleaner way. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yaogong Wang <wygivan@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-03 16:13:03 -05:00
Steffen Klassert	f03eb128e3	gre6: Move the setting of dev->iflink into the ndo_init functions. Otherwise it gets overwritten by register_netdev(). Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-03 15:42:24 -05:00
Steffen Klassert	ebe084aafb	sit: Use ipip6_tunnel_init as the ndo_init function. ipip6_tunnel_init() sets the dev->iflink via a call to ipip6_tunnel_bind_dev(). After that, register_netdevice() sets dev->iflink = -1. So we lose the iflink configuration for ipv6 tunnels. Fix this by using ipip6_tunnel_init() as the ndo_init function. Then ipip6_tunnel_init() is called after dev->iflink is set to -1 from register_netdevice(). Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-03 15:42:24 -05:00
Steffen Klassert	16a0231bf7	vti6: Use vti6_dev_init as the ndo_init function. vti6_dev_init() sets the dev->iflink via a call to vti6_link_config(). After that, register_netdevice() sets dev->iflink = -1. So we lose the iflink configuration for vti6 tunnels. Fix this by using vti6_dev_init() as the ndo_init function. Then vti6_dev_init() is called after dev->iflink is set to -1 from register_netdevice(). Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-03 15:42:24 -05:00
Steffen Klassert	6c6151daaf	ip6_tunnel: Use ip6_tnl_dev_init as the ndo_init function. ip6_tnl_dev_init() sets the dev->iflink via a call to ip6_tnl_link_config(). After that, register_netdevice() sets dev->iflink = -1. So we lose the iflink configuration for ipv6 tunnels. Fix this by using ip6_tnl_dev_init() as the ndo_init function. Then ip6_tnl_dev_init() is called after dev->iflink is set to -1 from register_netdevice(). Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-03 15:42:24 -05:00
Eric Dumazet	d75b1ade56	net: less interrupt masking in NAPI net_rx_action() can mask irqs a single time to transfer sd->poll_list into a private list, for a very short duration. Then, napi_complete() can avoid masking irqs again, and net_rx_action() only needs to mask irq again in slow path. This patch removes 2 couples of irq mask/unmask per typical NAPI run, more if multiple napi were triggered. Note this also allows to give control back to caller (do_softirq()) more often, so that other softirq handlers can be called a bit earlier, or ksoftirqd can be woken up earlier under pressure. This was developed while testing an alternative to RX interrupt mitigation to reduce latencies while keeping or improving GRO aggregation on fast NIC. Idea is to test napi->gro_list at the end of a napi->poll() and reschedule one NAPI poll, but after servicing a full round of softirqs (timers, TX, rcu, ...). This will be allowed only if softirq is currently serviced by idle task or ksoftirqd, and resched not needed. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-03 12:25:09 -05:00
Guenter Roeck	c1207c049b	netfilter: nft_reject_bridge: Fix powerpc build error Fix: net/bridge/netfilter/nft_reject_bridge.c: In function 'nft_reject_br_send_v6_unreach': net/bridge/netfilter/nft_reject_bridge.c:240:3: error: implicit declaration of function 'csum_ipv6_magic' csum_ipv6_magic(&nip6h->saddr, &nip6h->daddr, ^ make[3]: *** [net/bridge/netfilter/nft_reject_bridge.o] Error 1 Seen with powerpc:allmodconfig. Fixes: `523b929d54` ("netfilter: nft_reject_bridge: don't use IP stack to reject traffic") Cc: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-03 12:12:34 -05:00
Szymon Janc	a736abc1ac	Bluetooth: Fix invalid response for 'Start Discovery' command According to Management Interface API 'Start Discovery' command should generate a Command Complete event on failure. Currently kernel is sending Command Status on early errors. This results in userspace ignoring such event due to invalid size. bluetoothd[28499]: src/adapter.c:trigger_start_discovery() bluetoothd[28499]: src/adapter.c:cancel_passive_scanning() bluetoothd[28499]: src/adapter.c:start_discovery_timeout() bluetoothd[28499]: src/adapter.c:start_discovery_complete() status 0x0a bluetoothd[28499]: Wrong size of start discovery return parameters Reported-by: Jukka Taimisto <jtt@codenomicon.com> Signed-off-by: Szymon Janc <szymon.janc@tieto.com> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2014-11-03 15:43:05 +02:00
Johannes Berg	b8fff407a1	mac80211: fix use-after-free in defragmentation Upon receiving the last fragment, all but the first fragment are freed, but the multicast check for statistics at the end of the function refers to the current skb (the last fragment) causing a use-after-free bug. Since multicast frames cannot be fragmented and we check for this early in the function, just modify that check to also do the accounting to fix the issue. Cc: stable@vger.kernel.org Reported-by: Yosef Khyal <yosefx.khyal@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-11-03 14:28:50 +01:00
Marcel Holtmann	40f4938aa6	Bluetooth: Consolidate whitelist debugfs entry into device_list The debugfs entry for the BR/EDR whitelist is confusing since there is a controller debugfs entry with the name white_list and both are two different things. With the BR/EDR whitelist, the actual interface in use is the device list and thus just include all values from the internal BR/EDR whitelist in the device_list debugfs entry. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2014-11-03 10:13:42 +02:00
dingzhi	f293a5e33e	xfrm: add XFRMA_REPLAY_VAL attribute to SA messages After this commit, the attribute XFRMA_REPLAY_VAL is added when no ESN replay value is defined. Thus sequence number values are always notified to userspace. Signed-off-by: dingzhi <zhi.ding@6wind.com> Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-11-03 08:54:52 +01:00
Alexander Aring	868ed8e06a	ieee802154: remove unnecessary functions This patch fixes commit `c7420c367d` ("mac802154: move mac_params functions into mac_cmd"). The mac_params functions weren't deleted by this commit. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Reported-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-11-02 21:52:03 +01:00
Alexander Aring	fdd2068ab7	mac802154: cfg: add missing include Running make C=2 produces the warning: symbol 'mac802154_config_ops' was not declared. Should it be static? This patch adds a missing include in cfg.c to solve this warning. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Reported-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-11-02 21:52:02 +01:00
Alexander Aring	c5fbbc4683	ieee802154: sysfs: add missing include Running make C=2 results in the warnings: symbol 'wpan_phy_class' was not declared. Should it be static? symbol 'wpan_phy_sysfs_init' was not declared. Should it be static? symbol 'wpan_phy_sysfs_exit' was not declared. Should it be static? This patch adds a missing include "sysfs.h" to solve these warnings. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Reported-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-11-02 21:52:02 +01:00
Linus Torvalds	4cb8c3593b	irda: stop calling sk_prot->disconnect() on connection failure The sk_prot is irda's own set of protocol handlers, so irda should statically know what that function is anyway, without using an indirect pointer. And as it happens, we know exactly what that pointer is statically: it's NULL, because irda doesn't define a disconnect operation. So calling that function is doubly wrong, and will just cause an oops. Reported-by: Martin Lang <mlg.hessigheim@gmail.com> Cc: Samuel Ortiz <samuel@sortiz.org> Cc: David Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-11-02 10:20:26 -08:00
Marcel Holtmann	75e0569f7f	Bluetooth: Add hci_reset_dev() for driver triggered stack reset Some Bluetooth drivers need to reset the upper stack. To avoid having all drivers send HCI Hardware Error events, provide a generic function to wrap the reset functionality. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2014-11-02 10:03:45 +02:00
Marcel Holtmann	65efd2bf48	Bluetooth: Introduce BT_BREDR and BT_LE config options The current kernel options do not make it clear which modules are for Bluetooth Classic (BR/EDR) and which are for Bluetooth Low Energy (LE). To make it really clear, introduce BT_BREDR and BT_LE options with proper dependencies into the different modules. Both new options default to y to not create a regression with previous kernel config files. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2014-11-02 10:01:53 +02:00
Marcel Holtmann	24dfa34371	Bluetooth: Print error message for HCI_Hardware_Error event When the HCI_Hardware_Error event is sent by the controller or injected by the driver, then at least print an error message. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2014-11-02 09:59:42 +02:00
Marcel Holtmann	8761f9d662	Bluetooth: Check status of command complete for HCI_Reset When the HCI_Reset command returns, the status needs to be checked. It is unlikely that HCI_Reset actually fails, but when it fails, it is a bad idea to reset all values since the controller will not have reset its values in that case. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2014-11-02 09:58:50 +02:00
Alexander Aring	6290671018	ieee802154: 6lowpan: remove set of mac address Currently the ieee802154 6lowpan interface operates on wpan interfaces only. Setting the wpan mac address over the 6lowpan interface is complex and maybe we can never do this. This patch removes the set of mac address handling in the ieee802154 6lowpan interface for now. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-11-02 04:51:08 +01:00
Alexander Aring	ea7053c1df	mac802154: iface: add validation for extended address This patch uses the validation function to check whether an extended address is valid while setting the extended address. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-11-02 04:51:08 +01:00
Alexander Aring	f59f419d31	mac802154: move phy settings into netlink receive All PHY attributes should be directly set to the transceiver after netlink. MAC attributes should be set by interface up. Currently the macparams netlink cmd contains mixed attributes of phy and mac settings. This patch moves all phy settings to the netlink receive function for setting macparams. This is the only way which doesn't change the userspace API and keeps the deprecated netlink interface alive. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-11-02 04:51:07 +01:00
Alexander Aring	50c7907501	mac802154: set panid address filter on ifup This patch moves the setting of hardware panid address filtering inside of interface up instead of doing it directly inside of the netlink interface. The netlink call, which can only be called when netif isn't running, sets only the necessary panid value in sdata. After an interface up the address filter will be set with this value. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-11-02 04:51:07 +01:00
Alexander Aring	78b4bad16e	mac802154: set short address filter on ifup This patch moves the setting of hardware short address filtering inside of interface up instead of doing it directly inside of the netlink interface. The netlink call, which can only be called when netif isn't running, sets only the necessary short_addr value in sdata. After an interface up the address filter will be set with this value. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-11-02 04:51:07 +01:00
Alexander Aring	776e59de46	mac802154: set extended address filter on ifup This patch moves the setting of hardware extended address filtering inside of interface up instead of doing it directly inside of the netlink interface. Also we don't need to set the sdata extended attribute in netlink. This is already done by ndo_set_mac_address of net_device_ops. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-11-02 04:51:07 +01:00
Alexander Aring	8f499f991c	ieee802154: don't allow to change addr while netif_running This patch changes the actual behaviour for setting address attributes. We should not change addresses while netif_running is true. Furthermore when netif_running is running the address attributes become read only and we can remove locking mechanism in receive and transmit hotpaths of 802.15.4 subsystem. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-11-02 04:51:07 +01:00
Alexander Aring	4a9a816a4f	cfg802154: convert deprecated iface add and del This patch removes the wpan_phy callbacks for add and del an interface on a phy. Instead we introduce deprecated cfg802154 callbacks for this. Furthermore we introduce a new netlink interface nl802154 which use different callbacks. The deprecated function is to have a backwards compatibility with the current netlink interface. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-11-02 04:51:06 +01:00
Alexander Aring	ea4dcd32a4	ieee802154: add helper wpan_phy_to_rdev function This patch introduce a function to get the cfg802154_registered_device from a wpan_phy. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-11-02 04:51:06 +01:00
Alexander Aring	1201cd22fd	mac802154: introduce mac802154_config_ops This patch introduces mac802154_config_ops struct. Like wireless this struct should be the only one interface between ieee802154 to mac802154 or possible HardMAC drivers. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-11-02 04:51:06 +01:00
Alexander Aring	a5dd1d72d8	cfg802154: introduce cfg802154_registered_device This patch introduce the cfg802154_registered_device struct. Like cfg80211_registered_device in wireless this should contain similar functionality for cfg802154. This patch should not change any behaviour. We just adds cfg802154_registered_device as container for wpan_phy struct. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-11-02 04:51:06 +01:00
Alexander Aring	ff4e65581e	ieee802154: remove default channel settings This patch removes the default channel setting. A channel is always set and there is no default channel setting according 802.15.4. Drivers should set the default channel and page in probing routine. This behaviour is currently a lack of all 802.15.4 drivers. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-11-02 04:51:06 +01:00
Chan-yeol Park	039fada5cd	Bluetooth: Fix hci_sync missing wakeup interrupt __hci_cmd_sync_ev(), __hci_req_sync() could miss wake_up_interrupt from hci_req_sync_complete() because hci_cmd_work() workqueue and its response could be completed before they are ready to get the signal through add_wait_queue(), set_current_state(TASK_INTERRUPTIBLE). Signed-off-by: Chan-yeol Park <chanyeol.park@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2014-11-01 23:20:21 +02:00
David S. Miller	55b42b5ca2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/phy/marvell.c Simple overlapping changes in drivers/net/phy/marvell.c Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-01 14:53:27 -04:00
Ilya Dryomov	e9226d7c9f	libceph: eliminate unnecessary allocation in process_one_ticket() Commit `c27a3e4d66` ("libceph: do not hard code max auth ticket len") while fixing a buffer overflow tried to keep the same as much of the surrounding code as possible and introduced an unnecessary kmalloc() in the unencrypted ticket path. It is likely to fail on huge tickets, so get rid of it. Signed-off-by: Ilya Dryomov <idryomov@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>	2014-10-31 23:43:08 +03:00
Guenter Roeck	e0fb6fb6d5	net: ethtool: Return -EOPNOTSUPP if user space tries to read EEPROM with lengh 0 If a driver supports reading EEPROM but no EEPROM is installed in the system, the driver's get_eeprom_len function returns 0. ethtool will subsequently try to read that zero-length EEPROM anyway. If the driver does not support EEPROM access at all, this operation will return -EOPNOTSUPP. If the driver does support EEPROM access but no EEPROM is installed, the operation will return -EINVAL. Return -EOPNOTSUPP in both cases for consistency. Signed-off-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-31 16:12:34 -04:00
Pravin B Shelar	de05c400f7	mpls: Allow mpls_gso to be built as module Kconfig already allows mpls to be built as module. Following patch fixes Makefile to do same. CC: Simon Horman <simon.horman@netronome.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-31 15:47:21 -04:00
Pravin B Shelar	f7065f4bd3	mpls: Fix mpls_gso handler. mpls gso handler needs to pull skb after segmenting skb. CC: Simon Horman <simon.horman@netronome.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-31 15:47:21 -04:00
David S. Miller	e3a88f9c4f	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== netfilter/ipvs fixes for net The following patchset contains fixes for netfilter/ipvs. This round of fixes is larger than usual at this stage, specifically because of the nf_tables bridge reject fixes that I would like to see in 3.18. The patches are: 1) Fix a null-pointer dereference that may occur when logging errors. This problem was introduced by `4a4739d56b` ("ipvs: Pull out crosses_local_route_boundary logic") in v3.17-rc5. 2) Update hook mask in nft_reject_bridge so we can also filter out packets from there. This fixes `36d2af5` ("netfilter: nf_tables: allow to filter from prerouting and postrouting"), which needs this chunk to work. 3) Two patches to refactor common code to forge the IPv4 and IPv6 reject packets from the bridge. These are required by the nf_tables reject bridge fix. 4) Fix nft_reject_bridge by avoiding the use of the IP stack to reject packets from the bridge. The idea is to forge the reject packets and inject them to the original port via br_deliver() which is now exported for that purpose. 5) Restrict nft_reject_bridge to bridge prerouting and input hooks. the original skbuff may cloned after prerouting when the bridge stack needs to flood it to several bridge ports, it is too late to reject the traffic. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-31 12:29:42 -04:00
Johannes Berg	de4fcbadde	cfg80211: avoid using default in interface type switch Most code avoids having a default case in interface type switch statements already, to make it easier to find places that need to be extended. Change the code in the __cfg80211_leave() and nl80211_key_allowed() functions to not have a default case. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-10-31 14:19:19 +01:00
Pablo Neira Ayuso	127917c29a	netfilter: nft_reject_bridge: restrict reject to prerouting and input Restrict the reject expression to the prerouting and input bridge hooks. If we allow this to be used from forward or any other later bridge hook, if the frame is flooded to several ports, we'll end up sending several reject packets, one per cloned packet. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-10-31 12:50:09 +01:00
Pablo Neira Ayuso	523b929d54	netfilter: nft_reject_bridge: don't use IP stack to reject traffic If the packet is received via the bridge stack, this cannot reject packets from the IP stack. This adds functions to build the reject packet and send it from the bridge stack. Comments and assumptions on this patch: 1) Validate the IPv4 and IPv6 headers before further processing, given that the packet comes from the bridge stack, we cannot assume they are clean. Truncated packets are dropped, we follow similar approach in the existing iptables match/target extensions that need to inspect layer 4 headers that is not available. This also includes packets that are directed to multicast and broadcast ethernet addresses. 2) br_deliver() is exported to inject the reject packet via bridge localout -> postrouting. So the approach is similar to what we already do in the iptables reject target. The reject packet is sent to the bridge port from which we have received the original packet. 3) The reject packet is forged based on the original packet. The TTL is set based on sysctl_ip_default_ttl for IPv4 and per-net ipv6.devconf_all hoplimit for IPv6. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-10-31 12:50:08 +01:00
Pablo Neira Ayuso	8bfcdf6671	netfilter: nf_reject_ipv6: split nf_send_reset6() in smaller functions That can be reused by the reject bridge expression to build the reject packet. The new functions are: * nf_reject_ip6_tcphdr_get(): to sanitize and to obtain the TCP header. * nf_reject_ip6hdr_put(): to build the IPv6 header. * nf_reject_ip6_tcphdr_put(): to build the TCP header. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-10-31 12:49:57 +01:00
Pablo Neira Ayuso	052b9498ee	netfilter: nf_reject_ipv4: split nf_send_reset() in smaller functions That can be reused by the reject bridge expression to build the reject packet. The new functions are: * nf_reject_ip_tcphdr_get(): to sanitize and to obtain the TCP header. * nf_reject_iphdr_put(): to build the IPv4 header. * nf_reject_ip_tcphdr_put(): to build the TCP header. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-10-31 12:49:05 +01:00
Pablo Neira Ayuso	4d87716cd0	netfilter: nf_tables_bridge: update hook_mask to allow {pre,post}routing Fixes: `36d2af5` ("netfilter: nf_tables: allow to filter from prerouting and postrouting") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-10-31 12:44:56 +01:00
Stephen Hemminger	d070f9137a	mac80211: fix spelling errors Use codespell to find spelling errors. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-10-31 08:29:56 +01:00
Ben Hutchings	5188cd44c5	drivers/net, ipv6: Select IPv6 fragment idents for virtio UFO packets UFO is now disabled on all drivers that work with virtio net headers, but userland may try to send UFO/IPv6 packets anyway. Instead of sending with ID=0, we should select identifiers on their behalf (as we used to). Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Fixes: `916e4cf46d` ("ipv6: reuse ip6_frag_id from ip6_ufo_append_data") Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-30 20:01:18 -04:00
Eric Dumazet	39bb5e6286	net: skb_fclone_busy() needs to detect orphaned skb Some drivers are unable to perform TX completions in a bound time. They instead call skb_orphan() Problem is skb_fclone_busy() has to detect this case, otherwise we block TCP retransmits and can freeze unlucky tcp sessions on mostly idle hosts. Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: `1f3279ae0c` ("tcp: avoid retransmits of TCP packets hanging in host queues") Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-30 19:58:30 -04:00
Sowmini Varadhan	cd2145358e	tcp: Correction to RFC number in comment Challenge ACK is described in RFC 5961, fix typo. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-30 19:53:34 -04:00
Tom Herbert	14051f0452	gre: Use inner mac length when computing tunnel length Currently, skb_inner_network_header is used but this does not account for Ethernet header for ETH_P_TEB. Use skb_inner_mac_header which handles TEB and also should work with IP encapsulation in which case inner mac and inner network headers are the same. Tested: Ran TCP_STREAM over GRE, worked as expected. Signed-off-by: Tom Herbert <therbert@google.com> Acked-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-30 19:51:56 -04:00
Michele Baldessari	afb6befce6	sctp: replace seq_printf with seq_puts Fixes checkpatch warning: "WARNING: Prefer seq_puts to seq_printf" Signed-off-by: Michele Baldessari <michele@acksyn.org> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-30 19:40:16 -04:00
Michele Baldessari	891310d53d	sctp: add transport state in /proc/net/sctp/remaddr It is often quite helpful to be able to know the state of a transport outside of the application itself (for troubleshooting purposes or for monitoring purposes). Add it under /proc/net/sctp/remaddr. Signed-off-by: Michele Baldessari <michele@acksyn.org> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-30 19:40:16 -04:00
Nicolas Cavallari	fa19c2b050	ipv4: Do not cache routing failures due to disabled forwarding. If we cache them, the kernel will reuse them, independently of whether forwarding is enabled or not. Which means that if forwarding is disabled on the input interface where the first routing request comes from, then that unreachable result will be cached and reused for other interfaces, even if forwarding is enabled on them. The opposite is also true. This can be verified with two interfaces A and B and an output interface C, where B has forwarding enabled, but not A and trying ip route get $dst iif A from $src && ip route get $dst iif B from $src Signed-off-by: Nicolas Cavallari <nicolas.cavallari@green-communications.fr> Reviewed-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-30 19:20:40 -04:00
stephen hemminger	b2ad5e5fcc	tipc: spelling errors Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-30 16:56:41 -04:00
Florian Westphal	646697b9e3	syncookies: only increment SYNCOOKIESFAILED on validation error Only count packets that failed cookie-authentication. We can get SYNCOOKIESFAILED > 0 while we never even sent a single cookie. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-30 16:53:39 -04:00
stephen hemminger	f4e715c325	ipv4: minor spelling fixes Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-30 16:14:43 -04:00
Alexey Andriyanov	acf722f734	ip6_tunnel: allow to change mode for the ip6tnl0 The fallback device is in ipv6 mode by default. The mode can not be changed in runtime, so there is no way to decapsulate ip4in6 packets coming from various sources without creating the specific tunnel ifaces for each peer. This allows to update the fallback tunnel device, but only the mode could be changed. Usual command should work for the fallback device: `ip -6 tun change ip6tnl0 mode any` The fallback device can not be hidden from the packet receiver as a regular tunnel, but there is no need for synchronization as long as we do single assignment. Cc: David S. Miller <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Alexey Andriyanov <alan@al-an.info> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-30 16:09:20 -04:00
Fabian Frederick	43728fa5c5	ipv6: remove assignment in if condition Do assignment before if condition and test !skb like in rawv6_recvmsg() Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-30 15:51:43 -04:00
Fabian Frederick	fc08c25819	ipv6: remove inline on static in c file remove __inline__ / inline and let compiler decide what to do with static functions Inspired-by: "David S. Miller" <davem@davemloft.net> Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-30 15:51:43 -04:00
Fabian Frederick	40dc2ca3cb	ipv6: spelling s/incomming/incoming Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-30 15:51:43 -04:00
Fabian Frederick	e7b6658ea8	ipx: remove all unnecessary castings on ntohl Apply commit `e0f36310f7` ("ipx: remove unnecessary casting on ntohl") to all seq_printf/08lX Inspired-by: "David S. Miller" <davem@davemloft.net> Inspired-by: Joe Perches <joe@perches.com> Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-30 15:51:43 -04:00
Guenter Roeck	3d762a0f0a	net: dsa: Add support for reading switch registers with ethtool Add support for reading switch registers with 'ethtool -d'. Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-30 14:54:11 -04:00
Guenter Roeck	6793abb4e8	net: dsa: Add support for switch EEPROM access On some chips it is possible to access the switch eeprom. Add infrastructure support for it. Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-30 14:54:11 -04:00
Guenter Roeck	51579c3f1a	net: dsa: Add support for reporting switch chip temperatures Some switches provide chip temperature data. Add support for reporting it through the hwmon subsystem. Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-30 14:54:11 -04:00
Guenter Roeck	734cbb5b6b	net: dsa: Don't set skb->protocol on outgoing tagged packets Setting skb->protocol to a private protocol type may result in warning messages such as e1000e 0000:00:19.0 em1: checksum_partial proto=dada! This happens if the L3 protocol is IP or IPv6 and skb->ip_summed is set to CHECKSUM_PARTIAL. Looking through the code, it appears that changing skb->protocol for transmitted packets is not necessary and may actually be harmful. For example, it prevents purposely unmodified (from a DSA perspective) network drivers from properly setting up their transmit checksum offload pointers since they inspect skb->protocol to set up the IPv4 header or IPv6 header pointers. So don't unnecessarily change the protocol field. Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-30 14:54:10 -04:00
Marcel Holtmann	a4d5504d5c	Bluetooth: Clear LE white list when resetting controller The internal representation of the LE white list needs to be cleared when receiving a successful HCI_Reset command. A reset of the controller is expected to start with an empty LE white list. When the LE white list is not cleared on controller reset, the passive background scanning might skip programming the remote devices. Only changes to the LE white list are programmed when passive background is started. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Cc: stable@vger.kernel.org # 3.17.x	2014-10-30 17:41:08 +01:00
stephen hemminger	01cfa0a4ed	netfilter: fix spelling errors Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-10-30 17:35:30 +01:00
Dan Carpenter	daac197ca9	Bluetooth: 6lowpan: use after free in disconnect_devices() This was accidentally changed from list_for_each_entry_safe() to list_for_each_entry() so now it has a use after free bug. I've changed it back. Fixes: `9030582963` ('Bluetooth: 6lowpan: Converting rwlocks to use RCU') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-30 17:23:25 +01:00
Pablo Neira Ayuso	c3ac759ea6	Merge branch 'ipvs-next' Simon Horman says: ==================== The single patch in this series fixes some minor fallout from adding support IPv6 real servers in IPv4 virtual-services and vice versa. It should not have any run-time affect other than perhaps saving a few cycles. ==================== Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-10-30 16:52:30 +01:00
Marcelo Leitner	8ac2bde2a4	netfilter: log: protect nf_log_register against double registering Currently, despite the comment right before the function, nf_log_register allows registering two loggers on with the same type and end up overwriting the previous register. Not a real issue today as current tree doesn't have two loggers for the same type but it's better to get this protected. Also make sure that all of its callers do error checking. Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-10-30 16:41:48 +01:00
Marcelo Leitner	0c26ed1c07	netfilter: nf_log: Introduce nft_log_dereference() macro Wrap up a common call pattern in an easier to handle call. Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-10-30 16:39:40 +01:00
Johannes Berg	46238845bd	mac80211: properly flush delayed scan work on interface removal When an interface is deleted, an ongoing hardware scan is canceled and the driver must abort the scan, at the very least reporting completion while the interface is removed. However, if it scheduled the work that might only run after everything is said and done, which leads to cfg80211 warning that the scan isn't reported as finished yet; this is no fault of the driver, it already did, but mac80211 hasn't processed it. To fix this situation, flush the delayed work when the interface being removed is the one that was executing the scan. Cc: stable@vger.kernel.org Reported-by: Sujith Manoharan <sujith@msujith.org> Tested-by: Sujith Manoharan <sujith@msujith.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-10-30 15:48:32 +01:00
Mike Christie	89baaa570a	libceph: use memalloc flags for net IO This patch has ceph's lib code use the memalloc flags. If the VM layer needs to write data out to free up memory to handle new allocation requests, the block layer must be able to make forward progress. To handle that requirement we use structs like mempools to reserve memory for objects like bios and requests. The problem is when we send/receive block layer requests over the network layer, net skb allocations can fail and the system can lock up. To solve this, the memalloc related flags were added. NBD, iSCSI and NFS uses these flags to tell the network/vm layer that it should use memory reserves to fulfill allocation requests for structs like skbs. I am running ceph in a bunch of VMs in my laptop, so this patch was not tested very harshly. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Reviewed-by: Ilya Dryomov <idryomov@redhat.com>	2014-10-30 13:11:50 +03:00
Alexander Aring	38130c31ef	mac802154: add basic support for monitor This patch adds basic support for monitor mode. Also change the open call that we set the transceiver mac setting on an interface up. Further patches will add a better handling while interface up an interface. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-29 23:07:46 +01:00
Alexander Aring	05f7de6792	mac802154: rx: add error handling after skb_clone This patch adds error handling after skb_clone and deliver only if skb_clone was successful. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-29 23:07:46 +01:00
Alexander Aring	20b48120c1	mac802154: rx: monitor receive cleanup This patch replace the !netif_running(sdata->dev) instead we doing a !ieee802154_sdata_running(sdata). Also move this in two separate if branches to compare with mac80211 code. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-29 23:07:46 +01:00
Alexander Aring	18460672e0	mac802154: rx: add rx stats incrementation This patch adds rx stats incrementation when the monitor interface received a frame. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-29 23:07:46 +01:00
Alexander Aring	21fdf0a1c1	mac802154: rx: use netif_receive_skb This patch removes netif_rx_ni call. Instead we call netif_receive_skb, we can do that since commit `c5c47e67bc` ("mac802154: rx: use tasklet instead workqueue"). Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-29 23:07:46 +01:00
Alexander Aring	1054ed81c4	mac802154: rx: remove override pkt_type set to PACKET_HOST This patch removes pkt_type set to PACKET_HOST while monitor receiving. This should be PACKET_OTHERHOST on monitor mode which already set before. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-29 23:07:45 +01:00
Alexander Aring	ec718f3db9	mac802154: rx: add software checksum filtering check This patch adds a new hardware flag which indicate that the transceiver doesn't support check for bad checksum via hardware. Also add a handling of this while receive. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-29 23:07:45 +01:00
Alexander Aring	b7889497d3	mac802154: rx: simplify crc receive handling This patch change the actual crc handling while receive. Currently the IEEE802154_HW_RX_OMIT_CKSUM flag is used to filter a frame with a bad crc. This patch changes the behaviour of IEEE802154_HW_RX_OMIT_CKSUM to add a crc while receiving for the monitor interface. After monitor receiving we remove the crc for frame parsing. This affect the driver layer because all drivers sets IEEE802154_HW_RX_OMIT_CKSUM and deliver without checksum. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-29 23:07:45 +01:00
Alexander Aring	08c511a733	mac802154: rx: remove unnecessary parameter This patch removes a not used parameter in ieee802154_deliver_skb. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-29 23:07:45 +01:00
Alexander Aring	90386a7e3b	mac802154: separate omit tx/rx flags This patch splits the IEEE802154_HW_OMIT_CKSUM hardware flag into IEEE802154_HW_TX_OMIT_CKSUM and IEEE802154_HW_RX_OMIT_CKSUM. This is useful to deliver the received crc from the driver layer to the monitor interface. At the moment we can't do that without change the xmit handling. The received checksum should be visible in monitor mode only. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-29 23:07:45 +01:00
Alexander Aring	94b792220c	mac802154: add support for promiscuous mode This patch adds a new driver operation to bring the transceiver into promiscuous mode. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-29 23:07:45 +01:00
Alexander Aring	55a2d06517	mac802154: main: remove unnecessary include This patch removes an unnecessary include of driver-ops header file. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-29 23:07:44 +01:00
Nicolas Dichtel	75fbfd3323	neigh: optimize neigh_parms_release() In neigh_parms_release() we loop over all entries to find the entry given in argument and being able to remove it from the list. By using a double linked list, we can avoid this loop. Here are some numbers with 30 000 dummy interfaces configured: Before the patch: $ time rmmod dummy real 2m0.118s user 0m0.000s sys 1m50.048s After the patch: $ time rmmod dummy real 1m9.970s user 0m0.000s sys 0m47.976s Suggested-by: Thierry Herbelot <thierry.herbelot@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-29 16:11:50 -04:00
Eric Dumazet	bc9ad166e3	net: introduce napi_schedule_irqoff() napi_schedule() can be called from any context and has to mask hard irqs. Add a variant that can only be called from hard interrupts handlers or when irqs are already masked. Many NIC drivers can use it from their hard IRQ handler instead of generic variant. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-29 16:07:27 -04:00
Nikolay Aleksandrov	d70127e8a9	inet: frags: remove the WARN_ON from inet_evict_bucket The WARN_ON in inet_evict_bucket can be triggered by a valid case: inet_frag_kill and inet_evict_bucket can be running in parallel on the same queue which means that there has been at least one more ref added by a previous inet_frag_find call, but inet_frag_kill can delete the timer before inet_evict_bucket which will cause the WARN_ON() there to trigger since we'll have refcnt!=1. Now, this case is valid because the queue is being "killed" for some reason (removed from the chain list and its timer deleted) so it will get destroyed in the end by one of the inet_frag_put() calls which reaches 0 i.e. refcnt is still valid. CC: Florian Westphal <fw@strlen.de> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Patrick McLean <chutzpah@gentoo.org> Fixes: `b13d3cbfb8` ("inet: frag: move eviction of queues to work queue") Reported-by: Patrick McLean <chutzpah@gentoo.org> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-29 15:21:30 -04:00
Nikolay Aleksandrov	65ba1f1ec0	inet: frags: fix a race between inet_evict_bucket and inet_frag_kill When the evictor is running it adds some chosen frags to a local list to be evicted once the chain lock has been released but at the same time the *frag_queue can be running for some of the same queues and it may call inet_frag_kill which will wait on the chain lock and will then delete the queue from the wrong list since it was added in the eviction one. The fix is simple - check if the queue has the evict flag set under the chain lock before deleting it, this is safe because the evict flag is set only under that lock and having the flag set also means that the queue has been detached from the chain list, so no need to delete it again. An important note to make is that we're safe w.r.t refcnt because inet_frag_kill and inet_evict_bucket will sync on the del_timer operation where only one of the two can succeed (or if the timer is executing - none of them), the cases are: 1. inet_frag_kill succeeds in del_timer - then the timer ref is removed, but inet_evict_bucket will not add this queue to its expire list but will restart eviction in that chain 2. inet_evict_bucket succeeds in del_timer - then the timer ref is kept until the evictor "expires" the queue, but inet_frag_kill will remove the initial ref and will set INET_FRAG_COMPLETE which will make the frag_expire fn just to remove its ref. In the end all of the queue users will do an inet_frag_put and the one that reaches 0 will free it. The refcount balance should be okay. CC: Florian Westphal <fw@strlen.de> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Patrick McLean <chutzpah@gentoo.org> Fixes: `b13d3cbfb8` ("inet: frag: move eviction of queues to work queue") Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Reported-by: Patrick McLean <chutzpah@gentoo.org> Tested-by: Patrick McLean <chutzpah@gentoo.org> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-29 15:21:30 -04:00
Erik Kline	7fd2561e4e	net: ipv6: Add a sysctl to make optimistic addresses useful candidates Add a sysctl that causes an interface's optimistic addresses to be considered equivalent to other non-deprecated addresses for source address selection purposes. Preferred addresses will still take precedence over optimistic addresses, subject to other ranking in the source address selection algorithm. This is useful where different interfaces are connected to different networks from different ISPs (e.g., a cell network and a home wifi network). The current behaviour complies with RFC 3484/6724, and it makes sense if the host has only one interface, or has multiple interfaces on the same network (same or cooperating administrative domain(s), but not in the multiple distinct networks case. For example, if a mobile device has an IPv6 address on an LTE network and then connects to IPv6-enabled wifi, while the wifi IPv6 address is undergoing DAD, IPv6 connections will try use the wifi default route with the LTE IPv6 address, and will get stuck until they time out. Also, because optimistic nodes can receive frames, issue an RTM_NEWADDR as soon as DAD starts (with the IFA_F_OPTIMSTIC flag appropriately set). A second RTM_NEWADDR is sent if DAD completes (the address flags have changed), otherwise an RTM_DELADDR is sent. Also: add an entry in ip-sysctl.txt for optimistic_dad. Signed-off-by: Erik Kline <ek@google.com> Acked-by: Lorenzo Colitti <lorenzo@google.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-29 15:11:36 -04:00
Eric Dumazet	dca145ffaa	tcp: allow for bigger reordering level While testing upcoming Yaogong patch (converting out of order queue into an RB tree), I hit the max reordering level of linux TCP stack. Reordering level was limited to 127 for no good reason, and some network setups [1] can easily reach this limit and get limited throughput. Allow a new max limit of 300, and add a sysctl to allow admins to even allow bigger (or lower) values if needed. [1] Aggregation of links, per packet load balancing, fabrics not doing deep packet inspections, alternative TCP congestion modules... Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yaogong Wang <wygivan@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-29 15:05:15 -04:00
Toshiaki Makita	432c856fcf	net: skb_segment() should preserve backpressure This patch generalizes commit `d6a4a10411` ("tcp: GSO should be TSQ friendly") to protocols using skb_set_owner_w() TCP uses its own destructor (tcp_wfree) and needs a more complex scheme as explained in commit `6ff50cd555` ("tcp: gso: do not generate out of order packets") This allows UDP sockets using UFO to get proper backpressure, thus avoiding qdisc drops and excessive cpu usage. Here are performance test results (macvlan on vlan): - Before # netperf -t UDP_STREAM ... Socket Message Elapsed Messages Size Size Time Okay Errors Throughput bytes bytes secs # # 10^6bits/sec 212992 65507 60.00 144096 1224195 1258.56 212992 60.00 51 0.45 Average: CPU %user %nice %system %iowait %steal %idle Average: all 0.23 0.00 25.26 0.08 0.00 74.43 - After # netperf -t UDP_STREAM ... Socket Message Elapsed Messages Size Size Time Okay Errors Throughput bytes bytes secs # # 10^6bits/sec 212992 65507 60.00 109593 0 957.20 212992 60.00 109593 957.20 Average: CPU %user %nice %system %iowait %steal %idle Average: all 0.18 0.00 8.38 0.02 0.00 91.43 [edumazet] Rewrote patch and changelog. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-29 14:47:19 -04:00
Lubomir Rintel	b2ed64a974	ipv6: notify userspace when we added or changed an ipv6 token NetworkManager might want to know that it changed when the router advertisement arrives. Signed-off-by: Lubomir Rintel <lkundrak@v3.sk> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Daniel Borkmann <dborkman@redhat.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-29 14:35:32 -04:00
WANG Cong	d56109020d	sch_pie: schedule the timer after all init succeed Cc: Vijay Subramanian <vijaynsu@cisco.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com>	2014-10-29 14:28:01 -04:00
Johannes Berg	fc1f48ffd5	cfg80211: fix integer signedness in chandef_primary_freqs() The helper function can't ever create negative values, so use u32 pointers as the function arguments as the caller does. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-10-29 18:42:51 +01:00
Fabian Frederick	dcc6c2f516	cfg80211: fix set but not used warning in nl80211_channel_switch() radar_detect_width is unused since commit `97dc94f1d9` ("cfg80211: remove channel_switch combination check") Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-10-29 18:42:51 +01:00
Luciano Coelho	ff1e417c7c	mac80211: schedule the actual switch of the station before CSA count 0 Due to the time it takes to process the beacon that started the CSA process, we may be late for the switch if we try to reach exactly beacon 0. To avoid that, use count - 1 when calculating the switch time. Cc: stable@vger.kernel.org Reported-by: Jouni Malinen <j@w1.fi> Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-10-29 16:37:54 +01:00
Luciano Coelho	84469a45a1	mac80211: use secondary channel offset IE also beacons during CSA If we are switching from an HT40+ to an HT40- channel (or vice-versa), we need the secondary channel offset IE to specify what is the post-CSA offset to be used. This applies both to beacons and to probe responses. In ieee80211_parse_ch_switch_ie() we were ignoring this IE from beacons and using the current HT information IE instead. This was causing us to use the same offset as before the switch. Fix that by using the secondary channel offset IE also for beacons and don't ever use the pre-switch offset. Additionally, remove the "beacon" argument from ieee80211_parse_ch_switch_ie(), since it's not needed anymore. Cc: stable@vger.kernel.org Reported-by: Jouni Malinen <j@w1.fi> Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-10-29 16:37:45 +01:00
Dan Carpenter	eb63192bb8	SUNRPC: off by one in BUG_ON() The m->pool_to[] array has "maxpools" number of elements. It's allocated in svc_pool_map_alloc_arrays() which we called earlier in the function. This test should be >= instead of >. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-10-29 11:37:42 -04:00
Felix Fietkau	10b6848786	mac80211: flush keys for AP mode on ieee80211_do_stop Userspace can add keys to an AP mode interface before start_ap has been called. If there have been no calls to start_ap/stop_ap in the mean time, the keys will still be around when the interface is brought down. Signed-off-by: Felix Fietkau <nbd@openwrt.org> [adjust comments, fix AP_VLAN case] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-10-29 16:33:37 +01:00
Jukka Rissanen	9cfd5a23a4	Bluetooth: Wrong style spin lock used Use spin_lock_bh() as the code is called from softirq in networking subsystem. This is needed to prevent deadlocks when 6lowpan link is in use. Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-29 16:20:40 +01:00
Alexander Aring	e23e9ec16b	ieee802154: introduce sysfs file This patch moves the sysfs handling into its own file, like the wireless sysfs file handling. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-28 23:19:09 +01:00
Alexander Aring	7445764155	mac802154: cleanup open count handling This patch cleans up the open_count variable increment in the open and close calls of netdev. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-28 23:19:09 +01:00
Alexander Aring	c7420c367d	mac802154: move mac_params functions into mac_cmd These functions can be static in mac_cmd file. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-28 23:19:08 +01:00
Alexander Aring	12439a5356	mac802154: remove channel attributes from sdata These channel attributes were part of "channel context switch while xmit" which was removed by commit `dc67c6b30f` ("mac802154: tx: remove xmit channel context switch"). This patch removes these unnecessary variables and now uses the current_page and current_channel from the wpan_phy struct. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-28 23:19:08 +01:00
Alexander Aring	33d4189f51	mac802154: iface: remove assign to zero These variables should already be zero, so we remove the extra assign to zero. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-28 23:19:08 +01:00
Alexander Aring	538181a879	mac802154: add synchronization handling This patch adds synchronization handling in start and stop driver ops calls. This is mostly taken from mac80211, where it was introduced by commit `ea77f12f2c` ("mac80211: remove tasklet enable/disable"). This is to be sure that we don't run into the same issues. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-28 23:19:08 +01:00
Alexander Aring	e363eca386	mac802154: move local started handling This patch removes the current handling of the started boolean. This is actually dead code, because mac802154_netdev_register can never be called before ieee802154_register_hw. This means that local->started is always true when mac802154_netdev_register is called. Instead we now use this like mac80211 to indicate that an instance of sdata is running. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-28 23:19:08 +01:00
Alexander Aring	5d65cae4bf	mac802154: rename running to started This variable should be handled like ieee80211_local struct of mac80211. We rename this variable to started now to have the same name convention. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-28 23:19:08 +01:00
Alexander Aring	0ea3da64fa	mac802154: rework sdata state change to running This patch reworks the handling for setting the state like mac80211. We use bits instead of a bool variable. The mutex is not needed because it uses test and set bit operations, which are atomic. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-28 23:19:07 +01:00
Alexander Aring	a543c5989d	mac802154: remove driver ops in wpan-phy This patch removes the driver ops callbacks inside of wpan_phy struct. It was used to check if a phy supports this driver ops call. We do this now via hardware flags. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-28 23:19:07 +01:00
Alexander Aring	59cb300f2b	mac802154: use driver-ops function wrappers This patch replaces all directly called driver ops by previous introduced driver-ops function wrappers. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-28 23:19:07 +01:00
Alexander Aring	b6eea9ca35	mac802154: introduce driver-ops header This patch introduces a driver-ops header file with function wrappers to call the driver ops. These wrappers check for the right context information and warn if optional driver ops are called when they aren't implemented. This behaviour is like the mac80211 driver-ops header file, just without function tracing calls. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-28 23:19:07 +01:00
Alexander Aring	1630186100	mac802154: declare struct ieee802154_ops as const The ieee802154_ops structure should never be changed during runtime. This patch declares this structure as const to avoid a runtime change. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Cc: Alan Ott <alan@signal11.us> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-28 23:19:07 +01:00
Alexander Aring	19ec690a43	mac802154: main: move open and close into iface These functions can be static inside the iface file, because it's not used anywhere else. This patch moves these functions into iface file. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-28 23:19:06 +01:00
Alexander Aring	b9ff77e50c	mac802154: monitor: merge into iface implementation This patch removes the monitor implementation file and put all monitor stuff into iface file. It's now small enough to put all necessary handling into iface. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-28 23:19:06 +01:00
Johan Hedberg	0b1db38ca2	Bluetooth: Fix check for direct advertising These days we allow simultaneous LE scanning and advertising. Checking for whether advertising is enabled or not is therefore not a reliable way to determine whether directed advertising was used to trigger the connection creation. The appropriate place to check (instead of the hdev context) is the connection role that's stored in the hci_conn. This patch fixes such a check in le_conn_timeout() which could otherwise lead to incorrect HCI commands being sent. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Cc: stable@vger.kernel.org # 3.16.x	2014-10-28 22:48:56 +01:00
Johan Hedberg	980ffc0a2c	Bluetooth: Fix LE connection timeout deadlock The le_conn_timeout() may call hci_le_conn_failed() which in turn may call hci_conn_del(). Trying to use the _sync variant for cancelling the conn timeout from hci_conn_del() could therefore result in a deadlock. This patch converts hci_conn_del() to use the non-sync variant so the deadlock is not possible. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Cc: stable@vger.kernel.org # 3.16.x	2014-10-28 22:48:56 +01:00
David S. Miller	2c6c49ded7	openvswitch: Export lockdep_ovsl_is_held to modules. ERROR: "lockdep_ovsl_is_held" [net/openvswitch/vport-gre.ko] undefined! Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-28 17:27:23 -04:00
Simon Horman	941d8ebcf7	datapath: Rename last_action() as nla_is_last() and move to netlink.h The original motivation for this change was to allow the helper to be used in files other than actions.c as part of work on an odp select group action. It was pointed out by Thomas Graf that this helper would be best off living in netlink.h. Furthermore, I think that the generic nature of this helper means it is best off in netlink.h regardless of whether it is used in more than one .c file. Thus, I would like it considered independent of the work on an odp select group action. Cc: Thomas Graf <tgraf@suug.ch> Cc: Pravin Shelar <pshelar@nicira.com> Cc: Andy Zhou <azhou@nicira.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Acked-by: Thomas Graf <tgraf@noironetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-28 17:07:29 -04:00
David S. Miller	25946f20b7	Merge tag 'master-2014-10-27' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless John W. Linville says: ==================== pull request: wireless 2014-10-28 Please pull this batch of fixes intended for the 3.18 stream! For the mac80211 bits, Johannes says: "Here are a few fixes for the wireless stack: one fixes the RTS rate, one for a debugfs file, one to return the correct channel to userspace, a sanity check for a userspace value and the remaining two are just documentation fixes." For the iwlwifi bits, Emmanuel says: "I revert here a patch that caused interoperability issues. dvm gets a fix for a bug that was reported by many users. Two minor fixes for BT Coex and platform power fix that helps reducing latency when the PCIe link goes to low power states." In addition... Felix Fietkau adds a couple of ath code fixes related to regulatory rule enforcement. Hauke Mehrtens fixes a build break with bcma when CONFIG_OF_ADDRESS is not set. Karsten Wiese provides a trio of minor fixes for rtl8192cu. Kees Cook prevents a potential information leak in rtlwifi. Larry Finger also brings a trio of minor fixes for rtlwifi. Rafał Miłecki adds a device ID to the bcma bus driver. Rickard Strandqvist offers some strn* -> strl* changes in brcmfmac to eliminate non-terminated string issues. Sujith Manoharan avoids some ath9k stalls by enabling HW queue control only for MCC. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-28 15:30:15 -04:00
Andrew Lunn	ae439286a0	net: dsa: Error out on tagging protocol mismatches If there is a mismatch between enabled tagging protocols and the protocol the switch supports, error out, rather than continue with a situation which is unlikely to work. Signed-off-by: Andrew Lunn <andrew@lunn.ch> cc: alexander.h.duyck@intel.com Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-28 15:27:54 -04:00
Thomas Graf	62b9c8d037	ovs: Turn vports with dependencies into separate modules The internal and netdev vport remain part of openvswitch.ko. Encap vports including vxlan, gre, and geneve can be built as separate modules and are loaded on demand. Modules can be unloaded after use. Datapath ports keep a reference to the vport module during their lifetime. Allows to remove the error prone maintenance of the global list vport_ops_list. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-28 14:43:18 -04:00
Stephen Hemminger	49c922bb1e	Bluetooth: spelling fixes Fix spelling errors in comments. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-28 17:23:58 +01:00
Jukka Rissanen	df092306d6	Bluetooth: 6lowpan: Fix lockdep splats When a device ndo_start_xmit() calls again dev_queue_xmit(), lockdep can complain because dev_queue_xmit() is re-entered and the spinlocks protecting tx queues share a common lockdep class. Same issue was fixed for ieee802154 in commit "20e7c4e80dcd" Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-28 17:04:39 +01:00
Jukka Rissanen	9030582963	Bluetooth: 6lowpan: Converting rwlocks to use RCU The rwlocks are converted to use RCU. This helps performance as the irq locks are not needed any more. Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-28 17:04:38 +01:00
Johan Hedberg	da213f8e0c	Bluetooth: Revert SMP self-test patches This reverts commits `c6992e9ef2` and `4cd3362da8`. The reason for the revert is that we cannot have more than one module initialization function and the SMP one breaks the build with modular kernels. As the proper fix for this is right now looking non-trivial it's better to simply revert the problematic patches in order to keep the upstream tree compilable. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-28 15:32:49 +01:00
Alex Gartrell	d770108911	ipvs: remove unnecessary assignment in __ip_vs_get_out_rt It is a precondition of the function that daddr be equal to dest->addr.ip if dest is non-NULL, so this additional assignment is just confusing for stupid engineers like me. Signed-off-by: Alex Gartrell <agartrell@fb.com> Signed-off-by: Simon Horman <horms@verge.net.au>	2014-10-28 09:50:06 +09:00
Alex Gartrell	3d53666b40	ipvs: Avoid null-pointer deref in debug code Use daddr instead of reaching into dest. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Alex Gartrell <agartrell@fb.com> Signed-off-by: Simon Horman <horms@verge.net.au>	2014-10-28 09:48:31 +09:00
Alexei Starovoitov	f89b7755f5	bpf: split eBPF out of NET introduce two configs: - hidden CONFIG_BPF to select eBPF interpreter that classic socket filters depend on - visible CONFIG_BPF_SYSCALL (default off) that tracing and sockets can use that solves several problems: - tracing and others that wish to use eBPF don't need to depend on NET. They can use BPF_SYSCALL to allow loading from userspace or select BPF to use it directly from kernel in NET-less configs. - in 3.18 programs cannot be attached to events yet, so don't force it on - when the rest of eBPF infra is there in 3.19+, it's still useful to switch it off to minimize kernel size bloat-o-meter on x64 shows: add/remove: 0/60 grow/shrink: 0/2 up/down: 0/-15601 (-15601) tested with many different config combinations. Hopefully didn't miss anything. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-27 19:09:59 -04:00
Kyeyoon Park	958501163d	bridge: Add support for IEEE 802.11 Proxy ARP This feature is defined in IEEE Std 802.11-2012, 10.23.13. It allows the AP devices to keep track of the hardware-address-to-IP-address mapping of the mobile devices within the WLAN network. The AP will learn this mapping via observing DHCP, ARP, and NS/NA frames. When a request for such information is made (i.e. ARP request, Neighbor Solicitation), the AP will respond on behalf of the associated mobile device. In the process of doing so, the AP will drop the multicast request frame that was intended to go out to the wireless medium. It was recommended at the LKS workshop to do this implementation in the bridge layer. vxlan.c is already doing something very similar. The DHCP snooping code will be added to the userspace application (hostapd) per the recommendation. This RFC commit is only for IPv4. A similar approach in the bridge layer will be taken for IPv6 as well. Signed-off-by: Kyeyoon Park <kyeyoonp@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-27 19:02:04 -04:00
David S. Miller	5d26b1f50a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for your net tree, they are: 1) Allow to recycle a TCP port in conntrack when the change role from server to client, from Marcelo Leitner. 2) Fix possible off by one access in ip_set_nfnl_get_byindex(), patch from Dan Carpenter. 3) alloc_percpu returns NULL on error, no need for IS_ERR() in nf_tables chain statistic updates. From Sabrina Dubroca. 4) Don't compile ip options in bridge netfilter, this mangles the packet and bridge should not alter layer >= 3 headers when forwarding packets. Patch from Herbert Xu and tested by Florian Westphal. 5) Account the final NLMSG_DONE message when calculating the size of the nflog netlink batches. Patch from Florian Westphal. 6) Fix a possible netlink attribute length overflow with large packets. Again from Florian Westphal. 7) Release the skbuff if nfnetlink_log fails to put the final NLMSG_DONE message. This fixes a leak on error. This shouldn't ever happen though, otherwise this means we miscalculate the netlink batch size, so spot a warning if this ever happens so we can track down the problem. This patch from Houcheng Lin. 8) Look at the right list when recycling targets in the nft_compat, patch from Arturo Borrero. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-27 18:47:40 -04:00
Arturo Borrero	e9105f1bea	netfilter: nf_tables: add new expression nft_redir This new expression provides NAT in the redirect flavour, which is to redirect packets to local machine. Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-10-27 22:49:39 +01:00
Arturo Borrero	9de920eddb	netfilter: refactor NAT redirect IPv6 code to use it from nf_tables This patch refactors the IPv6 code so it can be usable both from xt and nf_tables. Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-10-27 22:48:10 +01:00
Arturo Borrero	8b13eddfdf	netfilter: refactor NAT redirect IPv4 to use it from nf_tables This patch refactors the IPv4 code so it can be usable both from xt and nf_tables. A similar patch follows-up to handle IPv6. Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-10-27 22:47:06 +01:00
Arturo Borrero	7965ee9371	netfilter: nft_compat: fix wrong target lookup in nft_target_select_ops() The code looks for an already loaded target, and the correct list to search is nft_target_list, not nft_match_list. Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-10-27 22:17:46 +01:00
Fabian Frederick	b8901ac319	ipx: remove __inline__ in c file on static Let compiler decide what to do with static void __ipxitf_put() Suggested-by: David S. Miller <davem@davemloft.net> Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-27 16:25:31 -04:00
Fabian Frederick	e0f36310f7	ipx: remove unnecessary casting on ntohl use %08X instead of %08lX and remove casting. Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-27 16:03:53 -04:00
Fabian Frederick	5e96d788d9	ipx: move extern sysctl_ipx_pprop_broadcasting to header file include ipx.h from sysctl_net_ipx.c Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-27 16:03:53 -04:00
Fabian Frederick	ce256981e5	ipv6: include linux/uaccess.h instead of asm/uaccess.h Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-27 16:03:52 -04:00
Fabian Frederick	9451a304ce	ipv6: replace min/casting by min_t Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-27 16:03:52 -04:00
Fabian Frederick	6b436d3381	ipv4: remove set but unused variable sha unsigned char *sha (source) was already in original git version but was never used. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-27 16:03:52 -04:00
John W. Linville	99c814066e	Merge tag 'mac80211-for-john-2014-10-23' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg <johannes@sipsolutions.net> says: "Here are a few fixes for the wireless stack: one fixes the RTS rate, one for a debugfs file, one to return the correct channel to userspace, a sanity check for a userspace value and the remaining two are just documentation fixes." Signed-off-by: John W. Linville <linville@tuxdriver.com>	2014-10-27 13:38:15 -04:00
Alexander Aring	be9d215fa9	mac802154: rx: change naming convention This patch changes the naming convention of the mac802154 rx file. It should be named more like the mac80211 stack. Furthermore we introduce a new frame parsing implementation which is much like the mac80211 implementation. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-27 18:07:50 +01:00
Alexander Aring	e176b681b0	mac802154: rx: move rcu locking Instead of locking and unlocking twice, this patch holds these locks only once, at one position. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-27 18:07:45 +01:00
Alexander Aring	9cf215d073	mac802154: rx: move skb_reset_mac_header This patch moves the skb_reset_mac_header call before frame parsing while wpan rx and before monitor deliver functionality. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-27 18:07:45 +01:00
Alexander Aring	c9ca640140	mac802154: rx: add monitor pkt_type information This patch adds a PACKET_OTHERHOST setting when a monitor interface receives a skb. All receiving skb's to the monitor interface should be PACKET_OTHERHOST. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-27 18:07:44 +01:00
Alexander Aring	75a46f0ee7	mac802154: rx: add CHECKSUM_UNNECESSARY This patch adds CHECKSUM_UNNECESSARY to skb->ip_summed before delivery. There exists no transceiver with IP checksum functionality. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-27 18:07:44 +01:00
Alexander Aring	702dcf994a	mac802154: rx: move skb->protocol setting This patch moves the skb->protocol setting to the position when it's needed. It's only needed when frame parsing was successful. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-27 18:07:43 +01:00
Alexander Aring	469100d6c2	mac802154: rx: rename remove mac802154_subif_rx This patch removes the mac802154_subif_rx function and do the necessary calls inside of ieee802154_rx function. The ieee802154_rx is small enough to move the functionality inside this function. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-27 18:07:43 +01:00
Alexander Aring	4ca18be54f	mac802154: tx: remove monitor receive while xmit This removes the call of the monitor receive function when any interface type calls xmit. There exists no use case in which a monitor interface should receive the frame actually being sent. One use case could be that a wpan interface and a monitor interface are running at the same time on one phy. Then the monitor interface receives the wpan frames also. Furthermore we are adding support for the promiscuous mode setting. With the promiscuous mode setting we can't run a wpan and monitor interface at the same time. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-27 18:07:42 +01:00
Alexander Aring	2a9820c9e2	mac802154: rx: move receive handling into rx.c This patch removes all relevant receiving functions inclusive frame parsing into rx file. Like mac80211 we should implement the complete receive handling and parsing in this file. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-27 18:07:42 +01:00
Alexander Aring	c730c90316	mac802154: rx: document ieee802154_rx() context requirement This patch is similar to `d20ef63d32` ("mac80211: document ieee80211_rx() context requirement"). The netif_receive_skb call must be made with softirqs disabled. This patch adds a warning if softirqs are pending while calling ieee802154_rx. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-27 18:07:42 +01:00
Alexander Aring	c5c47e67bc	mac802154: rx: use tasklet instead workqueue Tasklets have much less overhead than workqueues. This patch also removes the heap allocation for the worker on the receive path. Like mac80211, we should prefer a tasklet here instead of a workqueue, to get out of interrupt context quickly when ieee802154_rx_irqsafe is called by the driver. As in wireless, inside the tasklet context we should call netif_receive_skb instead of netif_rx_ni. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-27 18:07:40 +01:00
Alexander Aring	061ef8f915	mac802154: tx: use put_unaligned_le16 for copy crc This patch replaces the memcpy with a put_unaligned_le16. The placement of crc inside of PSDU can also be unaligned. With memcpy this can fail on some architectures. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Reported-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-27 18:07:36 +01:00
Martin Townsend	01141234f2	ieee802154: 6lowpan: rename process_data and lowpan_process_data As we have decoupled decompression from data delivery we can now rename all occurrences of process_data in receive path. Signed-off-by: Martin Townsend <mtownsend1973@gmail.com> Acked-by: Alexander Aring <alex.aring@gmail.com> Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-27 15:51:16 +01:00
Martin Townsend	3c400b843d	bluetooth:6lowpan: use consume_skb when packet processed successfully Signed-off-by: Martin Townsend <mtownsend1973@gmail.com> Acked-by: Alexander Aring <alex.aring@gmail.com> Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-27 15:51:16 +01:00
Martin Townsend	04dfd7386a	6lowpan: fix process_data return values As process_data now returns just error codes fix up the calls to this function to only drop the skb if an error code is returned. Signed-off-by: Martin Townsend <mtownsend1973@gmail.com> Acked-by: Alexander Aring <alex.aring@gmail.com> Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-27 15:51:15 +01:00
Martin Townsend	f8b361768e	6lowpan: remove skb_deliver from IPHC Separating skb delivery from decompression ensures that we can support further decompression schemes and removes the mixed return value of error codes with NET_RX_FOO. Signed-off-by: Martin Townsend <mtownsend1973@gmail.com> Acked-by: Alexander Aring <alex.aring@gmail.com> Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-27 15:51:15 +01:00
Karl Beldan	9ffe904405	mac80211: minstrel_ht: do not always skip ht rates vht_only is true When CONFIG_MAC80211_RC_MINSTREL_VHT is set, the module param minstrel_vht_only tells minstrel_ht whether to allow the mix of ht rates with vht rates. ATM, minstrel_ht skips ht rates when minstrel_vht_only is true, but it does that even if vht is not supported, which makes the sta rates fallback to legacy as no ht rate gets enabled. Fixes: `9208247d74` ("mac80211: minstrel_ht: add basic support for VHT rates <= 3SS@80MHz") Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-10-27 08:48:35 +01:00
Emmanuel Grumbach	cfede0d80d	mac80211: don't flush when probing the AP All the callers of ieee80211_mgd_probe_ap_send return right after they call the flush() callback. This means that calling flush() is unneeded since its meaning is to wait until the queues of the device are empty. Devices that know how to report status on Tx will do so using the regular path (ieee80211_tx_status) and this status will trigger the continuation of the flow of the probe (ieee80211_sta_tx_notify). Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-10-27 08:48:34 +01:00
Ben Greear	b5dfae020b	mac80211: support creating vifs with specified mac address This is useful when creating virtual interfaces. Signed-off-by: Ben Greear <greearb@candelatech.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-10-27 08:48:34 +01:00
Ben Greear	e8f479b112	cfg80211: support configuring vif mac addr on create This is useful when creating virtual interfaces. Keeps udev from mucking with things it shouldn't, since the default MAC is never seen by udev when specified on the cmd-line during creation. Signed-off-by: Ben Greear <greearb@candelatech.com> [check for feature flag in nl80211 to force drivers to set it] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-10-27 08:48:33 +01:00
Ben Greear	e27513fbd0	mac80211: support creating wiphy w/out creating wlanX This will be helpful when using the mac80211_hwsim wiphys and automated testing. Let user create the vifs as needed, and named as expected. Signed-off-by: Ben Greear <greearb@candelatech.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-10-27 08:48:32 +01:00
Ben Greear	ad28757eef	mac80211: allow creating wiphy devices with suggested name Support creating wiphy devices with an optional name. This will be used by hwsim to have better automated control over virtual radio creation/deletion. Signed-off-by: Ben Greear <greearb@candelatech.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-10-27 08:48:31 +01:00
Ben Greear	1998d90ad4	cfg80211: support creating wiphy with suggested name Kernel will attempt to use the name if it is supplied, but if name cannot be used for some reason, the default phyX name will be used instead. Signed-off-by: Ben Greear <greearb@candelatech.com> [while at it, use wiphy_name() instead of dev_name(), fix format string issue reported by Kees Cook] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-10-27 08:48:18 +01:00
Fabian Frederick	5c1e9f2c1f	xfrm: fix set but not used warning in xfrm_policy_queue_process() err was set but unused. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-10-27 07:16:46 +01:00
Eric Dumazet	93a35f59f1	net: napi_reuse_skb() should check pfmemalloc Do not reuse skb if it was pfmemalloc tainted, otherwise future frame might be dropped anyway. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Roman Gushchin <klamm@yandex-team.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-26 22:47:23 -04:00
Alexander Aring	f81f466ca5	mac802154: tx: make worker information static This patch moves the worker information struct out of skb control block. Instead of the control block we declare it static inside of tx.c file. We can do that, because the worker can't be used twice at the same time. It's protected by stop and wake netdev queue. This patch fixes an issue that the "struct ieee802154_xmit_cb" doesn't fit into the skb control block on some kernel configurations reported by kbuild test robot. It was introduced by commit `fe24371d66` ("mac802154: tx: remove kmalloc in xmit hotpath"). Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-26 19:18:35 +01:00
Alexander Aring	e5e584fcc2	mac802154: tx: change naming convention This patch changes the naming convention of the tx functions like mac80211. Just with an 802154 instead 80211 inside the name. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-26 17:24:05 +01:00
Alexander Aring	409c3b0c5f	mac802154: tx: move stats tx increment This patch moves the stats increment of successfully transmitted packets to the right place, when the skb was really successfully transmitted. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-26 17:24:05 +01:00
Alexander Aring	b7eec52bcb	mac802154: tx: cleanup crc calculation Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-26 17:24:05 +01:00
Alexander Aring	cfa626cb37	mac802154: tx: use netdev print helpers This patch replace the pr_foo printout function to netdev_foo printout function. Inside the xmit handling, the interface is already known. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-26 17:24:04 +01:00
Alexander Aring	6001d5223d	mac802154: tx: don't allow if down while sync tx This patch holds rtnl lock while sync xmit inside of workqueue. Otherwise we could down the interface while worker xmit handling. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-26 17:24:04 +01:00
Alexander Aring	ed0a5dce0c	mac802154: tx: add support for xmit_async callback This patch renames the existing xmit callback to xmit_sync and introduces an asynchronous xmit_async function. If ieee802154_ops doesn't provide the xmit_async callback, then we have a fallback to the xmit_sync callback. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Cc: Alan Ott <alan@signal11.us> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-26 17:24:04 +01:00
Alexander Aring	cdb66beaa0	mac802154: tx: fix error handling while xmit In case of an error we should call kfree_skb instead of consume_skb which is called by ieee802154_xmit_complete function. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-26 17:24:04 +01:00
Alexander Aring	18d60a0d49	mac802154: tx: use queue helpers in xmit worker This patch uses the queue utility helpers inside the xmit worker of mac802154 subsystem. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-26 17:24:03 +01:00
Alexander Aring	c208510351	mac802154: add netdev queue helpers This patch adds a new file net/mac802154/util.c which contains utility functions for drivers, etc. This file contains functions to start and stop queues for all virtual interfaces, this is useful for asynchronous handling by driver level. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-26 17:24:03 +01:00
Alexander Aring	dc67c6b30f	mac802154: tx: remove xmit channel context switch This patch removes the channel hopping feature before xmit. There are several issues to provide a real channel hopping (timing requirements, etc...). We don't have any known kernelspace protocol which really uses this feature. And I don't know a real user of this feature. We simply drop this feature now. This patch also removes the hold of pib lock which isn't needed by any real driver xmit callback implementation. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-26 17:24:03 +01:00
Alexander Aring	e89e45f22a	mac802154: tx: squash multiple dereferencing This patch introduce some new stack variables to avoid multiple dereferencing inside the xmit worker function. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-26 17:24:02 +01:00
Alexander Aring	fe24371d66	mac802154: tx: remove kmalloc in xmit hotpath This patch removes the kmalloc allocation for workqueue data. This patch replaces the kmalloc and uses the control block of skb. The control block has enough space and isn't use by any other layer in this case. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-26 17:23:58 +01:00
Alexander Aring	50c6fb9965	mac802154: tx: move xmit callback to tx file This patch moves the netdev xmit callback functions into the tx.c file. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-26 17:23:50 +01:00
Eric Dumazet	349ce993ac	tcp: md5: do not use alloc_percpu() percpu tcp_md5sig_pool contains memory blobs that ultimately go through sg_set_buf(). -> sg_set_page(sg, virt_to_page(buf), buflen, offset_in_page(buf)); This requires that whole area is in a physically contiguous portion of memory. And that @buf is not backed by vmalloc(). Given that alloc_percpu() can use vmalloc() areas, this does not fit the requirements. Replace alloc_percpu() by a static DEFINE_PER_CPU() as tcp_md5sig_pool is small anyway, there is no gain to dynamically allocate it. Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: `765cf9976e` ("tcp: md5: remove one indirection level in tcp_md5sig_pool") Reported-by: Crestez Dan Leonard <cdleonard@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-25 16:10:04 -04:00
Alexander Aring	c6f635faf3	mac802154: remove ieee802154_addr from driver_ops This driver_ops callback function is never used by any driver. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-25 21:55:39 +02:00
Alexander Aring	f773054254	mac802154: rename dev_workqueue to workqueue Small rename to use the name workqueue than dev_workqueue. To bring the same naming convention like wireless into 802.15.4. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-25 21:55:38 +02:00
Alexander Aring	59d19cd70c	mac802154: introduce IEEE802154_DEV_TO_SUB_IF This function adds a wrapper to call netdev_priv to getting the sdata attribute. This is similar like the IEEE80211_DEV_TO_SUB_IF function inside wireless stack implementation. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-25 21:55:38 +02:00
Alexander Aring	60741361c3	mac802154: introduce hw_to_local function This patch replace the mac802154_to_priv macro with a static inline function named hw_to_local. This brings a similar naming convention like mac80211 stack. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-25 21:55:38 +02:00
Alexander Aring	d98be45b36	mac802154: rename sdata slaves and slaves_mtx This patch renames the slaves attribute in sdata to interfaces and slaves_mtx to iflist_mtx. This is similar to the mac80211 stack naming convention. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-25 21:55:38 +02:00
Alexander Aring	04e850fe06	mac802154: rename hw subif_data variable to local This patch renames the hw attribute in struct ieee802154_sub_if_data to local. This avoid confusing with the struct ieee802154_hw hw; inside of local struct. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-25 21:55:38 +02:00
Alexander Aring	036562f9c4	mac802154: rename mac802154_sub_if_data Like wireless this structure should named ieee802154_sub_if_data and not mac802154_sub_if_data. This patch renames the struct and variables to sdata instead priv sometimes. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-25 21:55:37 +02:00
Alexander Aring	a5e1ec538f	mac802154: rename mac802154_priv to ieee802154_local This patch rename the mac802154_priv to ieee802154_local. The mac802154_priv structure is like ieee80211_local and so we name it ieee802154_local. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-25 21:55:37 +02:00
Alexander Aring	5a50439775	ieee802154: rename ieee802154_dev to ieee802154_hw The identical struct of the wireless stack implementation is named ieee80211_hw. This is useful to name the variable hw instead of get confusing with netdev dev variable. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Cc: Alan Ott <alan@signal11.us> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-25 21:55:37 +02:00
Alexander Aring	4ca24aca55	ieee802154: move ieee802154 header This patch moves the ieee802154 header into include/linux instead include/net. Similar like wireless which have the ieee80211 header inside of include/linux. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Cc: Alan Ott <alan@signal11.us> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-25 21:39:57 +02:00
Alexander Aring	86d52cd964	ieee802154: move wpan-class.c to core.c Like the wireless core.c file this file contains function for phy allocation and freeing. Move this file to core.c to get similar behaviour. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-25 21:39:56 +02:00
Alexander Aring	5ad60d3699	ieee802154: move wpan-phy.h to cfg802154.h The wpan-phy header contains the wpan_phy struct information. Later this header will be have similar function like cfg80211 header. The cfg80211 header contains the wiphy struct which is identically the wpan_phy struct inside 802.15.4 subsystem. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Cc: Alan Ott <alan@signal11.us> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-25 21:39:56 +02:00
Alexander Aring	15859a5e14	mac802154: move wpan.c to iface.c The wpan.c file contains the interface handling functions now. It's similar like the mac80211 iface.c file. This patch renames this file to iface.c to have similar naming convention in mac802154. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-25 21:39:55 +02:00
Alexander Aring	0f1556bc2b	mac802154: move mac802154.h to ieee802154_i.h This patch moves the mac802154.h internal header to ieee802154_i.h like the wireless stack ieee80211_i.h file. This avoids confusing with the not internal header include/net/mac802154.h header. Additional we get the same naming conversion like mac80211 for this file. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-25 21:39:55 +02:00
Alexander Aring	62eb01f5c2	mac802154: move ieee802154_dev.c to main.c The ieee802154_dev functionality contains various function for allocation and registration of an ieee802154_dev. This is equal to the net/mac80211/main.c file. This patch rename the ieee802154_dev.c to main.c to have the same behaviour. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-25 21:39:54 +02:00
Johan Hedberg	c6992e9ef2	Bluetooth: Add self-tests for SMP crypto functions This patch adds self-tests for the c1 and s1 crypto functions used for SMP pairing. The data used is the sample data from the core specification. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-25 21:33:57 +02:00
Johan Hedberg	4cd3362da8	Bluetooth: Add skeleton for SMP self-tests This patch adds a basic skeleton for SMP self-tests. The tests are put behind a new configuration option since running them will slow down the boot process. For now there are no actual tests defined but those will come in a subsequent patch. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-25 21:33:56 +02:00
Johan Hedberg	e491eaf3c0	Bluetooth: Pass only crypto context to SMP crypto functions In order to make unit testing possible we need to make the SMP crypto functions only take the crypto context instead of the full SMP context (the latter would require having hci_dev, hci_conn, l2cap_chan, l2cap_conn, etc around). The drawback is that we no-longer get the involved hdev in the debug logs, but this is really the only way to make simple unit tests for the code. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-25 21:33:56 +02:00
Fabian Frederick	4f639edef7	Bluetooth: fix shadow warning in hci_disconnect() use clkoff_cp for hci_cp_read_clock_offset instead of cp (already defined above). Suggested-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-25 18:53:39 +02:00
Alexander Aring	ed1da14817	ieee802154: wpan-class: fix trailing semicolon This patch removes an unnecessary trailing semicolon after macro define. Otherwise we get a trailing semicolon while using this macro. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-25 08:07:30 +02:00
Alexander Aring	57205c14ca	mac802154: fix typo IEEE802515 to IEEE802154 This patch fixes a typo in address filter defines from IEEE802515 to IEEE802154. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Cc: Alan Ott <alan@signal11.us> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-25 08:07:30 +02:00
Alexander Aring	139f14adab	ieee802154: ieee802154_dev: fix align typo This patch fix a typo and fix align instead allign. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-25 08:07:30 +02:00
Alexander Aring	b3020f0a35	ieee802154: mac802154: remove FSF address This patch removes the FSF address in files which belongs to ieee802154 and mac802154. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Cc: Alan Ott <alan@signal11.us> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-25 08:07:30 +02:00
Martin Townsend	ee93053d56	Bluetooth: Fix missing channel unlock in l2cap_le_credits In the error case where credits is greater than max_credits there is a missing l2cap_chan_unlock before returning. Signed-off-by: Martin Townsend <mtownsend1973@gmail.com> Tested-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-25 07:56:25 +02:00
Martin Townsend	11e3ff7072	6lowpan: Use skb_cow in IPHC decompression. Currently there are potentially 2 skb_copy_expand calls in IPHC decompression. This patch replaces this with one call to skb_cow which will check to see if there is enough headroom first to ensure it's only done if necessary and will handle alignment issues for cache. As skb_cow uses pskb_expand_head we ensure the skb isn't shared from bluetooth and ieee802.15.4 code that use the IPHC decompression. Signed-off-by: Martin Townsend <martin.townsend@xsilon.com> Acked-by: Alexander Aring <alex.aring@gmail.com> Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-25 07:56:25 +02:00
Li RongQing	4456c50d23	Bluetooth: 6lowpan: remove unnecessary codes in give_skb_to_upper netif_rx() only returns NET_RX_DROP and NET_RX_SUCCESS; it never returns a negative value. Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-25 07:56:25 +02:00
Szymon Janc	15346a9c28	Bluetooth: Improve RFCOMM __test_pf macro robustness Value returned by this macro might be used as bit value so it should return either 0 or 1 to avoid possible bugs (similar to NSC bug) when shifting it. Signed-off-by: Szymon Janc <szymon.janc@tieto.com> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2014-10-25 07:56:24 +02:00
Szymon Janc	ec511545ef	Bluetooth: Fix RFCOMM NSC response rfcomm_send_nsc expects CR to be either 0 or 1 since it is later passed to __mcc_type macro and shifted. Unfortunately CR extracted from received frame type was not sanitized and shifted value was passed resulting in bogus response. Note: shifted value was also passed to other functions but was used only in if statements so this bug appears only for NSC case. The CR bit in the value octet shall be set to the same value as the CR bit in the type field octet of the not supported command frame but the CR bit for NSC response should be set to 0 since it is always a response. This was affecting TC_RFC_BV_25_C PTS qualification test. Signed-off-by: Szymon Janc <szymon.janc@tieto.com> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2014-10-25 07:56:24 +02:00
Alfonso Acosta	89cbb0638e	Bluetooth: Defer connection-parameter removal when unpairing Systematically removing the LE connection parameters and autoconnect action is inconvenient for rebonding without disconnecting from userland (i.e. unpairing followed by repairing without disconnecting). The parameters will be lost after unpairing and userland needs to take care of book-keeping them and re-adding them. This patch allows userland to forget about parameter management when rebonding without disconnecting. It defers clearing the connection parameters when unpairing without disconnecting, giving a chance of keeping the parameters if a repairing happens before the connection is closed. Signed-off-by: Alfonso Acosta <fons@spotify.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2014-10-25 07:56:24 +02:00
Alexander Aring	c37a8106de	ieee802154: 6lowpan: add RTNL assertion This patch ensure that the rtnl lock is hold while newlink callback. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-25 07:56:24 +02:00
Alexander Aring	1ae2605e55	ieee802154: 6lowpan: improve packet registration This patch improves the packet registration handling. Instead of registration with module init we have a open count variable and registration the lowpan packet handler when it's needed. The open count variable should be protected by RTNL. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-25 07:56:24 +02:00
Alfonso Acosta	ddbea5cff7	Bluetooth: Remove redundant check on hci_conn's device class NULL-checking conn->dev_class is pointless since the variable is defined as an array, i.e. it will always be non-NULL. Signed-off-by: Alfonso Acosta <fons@spotify.com> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2014-10-25 07:56:24 +02:00
Alfonso Acosta	fd45ada910	Bluetooth: Include ADV_IND report in Device Connected event There are scenarios when autoconnecting to a device after the reception of an ADV_IND report (action 0x02), in which userland might want to examine the report's contents. For instance, the Service Data might have changed and it would be useful to know ahead of time before starting any GATT procedures. Also, the ADV_IND may contain Manufacturer Specific data which would be lost if not propagated to userland. In fact, this patch results from the need to rebond with a device lacking persistent storage which notifies about losing its LTK in ADV_IND reports. This patch appends the ADV_IND report which triggered the autoconnection to the EIR Data in the Device Connected event. Signed-off-by: Alfonso Acosta <fons@spotify.com> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2014-10-25 07:56:24 +02:00
Alfonso Acosta	48ec92fa4f	Bluetooth: Refactor arguments of mgmt_device_connected The values of a lot of the mgmt_device_connected() parameters come straight from a hci_conn object. We can simplify the function by passing the full hci_conn pointer to it. Signed-off-by: Alfonso Acosta <fons@spotify.com> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2014-10-25 07:56:23 +02:00
Alexander Aring	c0bffc7ddc	ieee802154: 6lowpan: fix sign of errno return val This patch fix ERR_PTR(-rc) to ERR_PTR(rc). The variable rc is already a negative errno value. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-25 07:56:22 +02:00
Alexander Aring	f870b8c631	ieee802154: reassembly: fix tag byteorder This patch fix byte order handling in reassembly code of 802.15.4 6LoWPAN fragmentation handling. net/ieee802154/reassembly.c:58:43: warning: restricted __be16 degrades to integer Signed-off-by: Alexander Aring <alex.aring@gmail.com> Reported-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-25 07:56:22 +02:00
Alexander Aring	cd97a713ac	ieee802154: 6lowpan: fix byteorder for frag tag This patch fix byteorder issues with fragment tag of generation 802.15.4 6LoWPAN fragment header. net/ieee802154/6lowpan_rtnl.c:278:54: warning restricted __be16 degrades to integer net/ieee802154/6lowpan_rtnl.c:278:18: warning: incorrect type in assignment (different base types) net/ieee802154/6lowpan_rtnl.c:278:18: expected restricted __be16 [usertype] frag_tag net/ieee802154/6lowpan_rtnl.c:278:18: got unsigned short Signed-off-by: Alexander Aring <alex.aring@gmail.com> Reported-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-25 07:56:22 +02:00
Simon Vincent	39f6eb19cf	ieee802154: 6lowpan: Drop PACKET_OTHERHOST skbs in 6lowpan There is no point processing pkts which are PACKET_OTHERHOST in 6lowpan as they are discarded as soon as they reach the ipv6 layer. Therefore we should drop them in the 6lowpan layer. Signed-off-by: Simon Vincent <simon.vincent@xsilon.com> Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-25 07:56:21 +02:00
Julien Catalano	ee4c148e8a	trivial: net/mac802154: Fix Kconfig typo Signed-off-by: Julien Catalano <julien.catalano@gmail.com> Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-25 07:56:21 +02:00
Fabian Frederick	74bca138e1	net: llc: include linux/errno.h instead of asm/errno.h Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-24 15:51:42 -04:00
Fabian Frederick	75da1469f9	lapb: move EXPORT_SYMBOL after functions. See Documentation/CodingStyle Chapter 6 Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-24 15:51:42 -04:00
Houcheng Lin	b51d3fa364	netfilter: nf_log: release skbuff on nlmsg put failure The kernel should reserve enough room in the skb so that the DONE message can always be appended. However, in case of e.g. new attribute erroneously not being size-accounted for, __nfulnl_send() will still try to put next nlmsg into this full skbuf, causing the skb to be stuck forever and blocking delivery of further messages. Fix issue by releasing skb immediately after nlmsg_put error and WARN() so we can track down the cause of such size mismatch. [ fw@strlen.de: add tailroom/len info to WARN ] Signed-off-by: Houcheng Lin <houcheng@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-10-24 14:34:11 +02:00
Florian Westphal	c1e7dc91ee	netfilter: nfnetlink_log: fix maximum packet length logged to userspace don't try to queue payloads > 0xffff - NLA_HDRLEN, it does not work. The nla length includes the size of the nla struct, so anything larger results in u16 integer overflow. This patch is similar to `9cefbbc9c8` (netfilter: nfnetlink_queue: cleanup copy_range usage). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-10-24 14:32:27 +02:00
Florian Westphal	9dfa1dfe4d	netfilter: nf_log: account for size of NLMSG_DONE attribute We currently neither account for the nlattr size, nor do we consider the size of the trailing NLMSG_DONE when allocating nlmsg skb. This can result in nflog to stop working, as __nfulnl_send() re-tries sending forever if it failed to append NLMSG_DONE (which will never work if buffer is not large enough). Reported-by: Houcheng Lin <houcheng@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-10-24 14:30:15 +02:00
Herbert Xu	7677e86843	bridge: Do not compile options in br_parse_ip_options Commit `462fb2af97` bridge : Sanitize skb before it enters the IP stack broke when IP options are actually used because it mangles the skb as if it entered the IP stack which is wrong because the bridge is supposed to operate below the IP stack. Since nobody has actually requested for parsing of IP options this patch fixes it by simply reverting to the previous approach of ignoring all IP options, i.e., zeroing the IPCB. If and when somebody who uses IP options and actually needs them to be parsed by the bridge complains then we can revisit this. Reported-by: David Newall <davidn@davidnewall.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Tested-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-10-24 14:24:03 +02:00
Martin KaFai Lau	367efcb932	ipv6: Avoid redoing fib6_lookup() with reachable = 0 by saving fn This patch save the fn before doing rt6_backtrack. Hence, without redo-ing the fib6_lookup(), saved_fn can be used to redo rt6_select() with RT6_LOOKUP_F_REACHABLE off. Some minor changes I think make sense to review as a single patch: * Remove the 'out:' goto label. * Remove the 'reachable' variable. Only use the 'strict' variable instead. After this patch, "failing ip6_ins_rt()" should be the only case that requires a redo of fib6_lookup(). Cc: David Miller <davem@davemloft.net> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-24 00:14:39 -04:00
Martin KaFai Lau	94c77bb41d	ipv6: Avoid redoing fib6_lookup() for RTF_CACHE hit case When there is a RTF_CACHE hit, no need to redo fib6_lookup() with reachable=0. Cc: David Miller <davem@davemloft.net> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-24 00:14:39 -04:00
Martin KaFai Lau	a3c00e46ef	ipv6: Remove BACKTRACK macro It is the prep work to reduce the number of calls to fib6_lookup(). The BACKTRACK macro could be hard-to-read and error-prone due to its side effects (mainly goto). This patch is to: 1. Replace BACKTRACK macro with a function (fib6_backtrack) with the following return values: * If it is backtrack-able, returns next fn for retry. * If it reaches the root, returns NULL. 2. The caller needs to decide if a backtrack is needed (by testing rt == net->ipv6.ip6_null_entry). 3. Rename the goto labels in ip6_pol_route() to make the next few patches easier to read. Cc: David Miller <davem@davemloft.net> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-24 00:14:39 -04:00
Kenjiro Nakayama	105970f608	net: Remove trailing whitespace in tcp.h icmp.c syncookies.c Remove trailing whitespace in tcp.h icmp.c syncookies.c Signed-off-by: Kenjiro Nakayama <nakayamakenjiro@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-24 00:13:10 -04:00
Arik Nemtsov	0fc1e0495f	mac80211: expose API allowing station iteration Allow drivers to iterate all stations currently uploaded to them. Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-10-23 20:40:03 +02:00
Arik Nemtsov	2bad7748b3	mac80211: add stations in order to the station list During reconfig the station list is traversed in order and station are added back to the driver. Make sure the stations are added to the driver in the same order they were added to mac80211. This has a real side effect - some drivers (iwlwifi) require TDLS stations to be added only after the AP station for the same network. Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-10-23 20:40:02 +02:00
Arik Nemtsov	8b94148cfe	mac80211: expose TDLS-initiator value to low level driver Some drivers need to know which station is the TDLS link initiator. Expose this value via the mac80211 ieee80211_sta structure. Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-10-23 20:40:02 +02:00
Arik Nemtsov	452218d9fd	mac80211: fix network header breakage during encryption When an IV is generated, only the MAC header is moved back. The network header location remains the same relative to the skb head, as the new IV is using headroom space that was reserved in advance. Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-10-23 20:40:01 +02:00
Andrei Otcheretianski	a7f3a76828	mac80211: export IE splitting function Export ieee80211_ie_split function, so it can be reused by drivers which need to insert additional elements. Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-10-23 20:39:43 +02:00
Karl Beldan	8ec7886b1c	mac80211: minstrel_ht: use group flags instead of index to display rates When displaying a rate through debugfs minstrel_ht guesses its flags comparing group indexes. Since `3ec373c421` ("mac80211: minstrel_ht: include type (cck/ht) in rates flag"), the rate flags of interest are present in the mcs_group-s, so use it. While improving the code, this also fixes a smatch false positive "error: testing array offset 'i' after use" in minstrel_ht_stats_dump. This warning only triggers after `9208247d74` ("mac80211: minstrel_ht: add basic support for VHT rates <= 3SS@80MHz") with CONFIG_MAC80211_RC_MINSTREL_VHT unset because then MINSTREL_VHT_GROUP_0 is above MINSTREL_GROUPS_NB and smatch only barks when the "testing array offset" seems to prevent possible out of bounds accesses (which does not happen here since i < ARRAY_SIZE(mi->groups)). Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com> Cc: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-10-23 20:36:13 +02:00
J. Bruce Fields	280caac078	rpc: change comments to assertions Reported-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-10-23 14:05:11 -04:00
J. Bruce Fields	ed38c06998	RPC: remove unneeded checks from xdr_truncate_encode() Thanks to Andrea Arcangeli for pointing out these checks are obviously unnecessary given the preceding calculations. Reported-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-10-23 14:05:11 -04:00
Sathya Perla	9e7ceb0607	net: fix saving TX flow hash in sock for outgoing connections The commit "net: Save TX flow hash in sock and set in skbuf on xmit" introduced the inet_set_txhash() and ip6_set_txhash() routines to calculate and record flow hash(sk_txhash) in the socket structure. sk_txhash is used to set skb->hash which is used to spread flows across multiple TXQs. But, the above routines are invoked before the source port of the connection is created. Because of this all outgoing connections that just differ in the source port get hashed into the same TXQ. This patch fixes this problem for IPv4/6 by invoking the above routines after the source port is available for the socket. Fixes: b73c3d0e4("net: Save TX flow hash in sock and set in skbuf on xmit") Signed-off-by: Sathya Perla <sathya.perla@emulex.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-22 16:14:29 -04:00
Li RongQing	789f202326	xfrm6: fix a potential use after free in xfrm6_policy.c pskb_may_pull() may change skb->data and make the nh and exthdr pointers obsolete, so recompute nh and exthdr. Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-22 15:38:48 -04:00
Karl Beldan	a63ba13eec	net: tso: fix unaligned access to crafted TCP header in helper API The crafted header start address is from a driver supplied buffer, which one can reasonably expect to be aligned on a 4-bytes boundary. However ATM the TSO helper API is only used by ethernet drivers and the tcp header will then be aligned to a 2-bytes only boundary from the header start address. Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com> Cc: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-22 12:52:55 -04:00
Sabrina Dubroca	c123bb7163	netfilter: nf_tables: check for NULL in nf_tables_newchain pcpu stats allocation alloc_percpu returns NULL on failure, not a negative error code. Fixes: `ff3cd7b3c9` ("netfilter: nf_tables: refactor chain statistic routines") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-10-22 14:12:51 +02:00
Dan Carpenter	0f9f5e1b83	netfilter: ipset: off by one in ip_set_nfnl_get_byindex() The ->ip_set_list[] array is initialized in ip_set_net_init() and it has ->ip_set_max elements so this check should be >= instead of > otherwise we are off by one. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-10-22 14:12:50 +02:00
Marcelo Leitner	e37ad9fd63	netfilter: nf_conntrack: allow server to become a client in TW handling When a port that was used to listen for inbound connections gets closed and reused for outgoing connections (like rsh ends up doing for stderr flow), currently we may reject the SYN/ACK packet for the new connection because tcp_conntracks states forbids a port to become a client while there is still a TIME_WAIT entry in there for it. As TCP may expire the TIME_WAIT socket in 60s and conntrack's timeout for it is 120s, there is a ~60s window that the application can end up opening a port that conntrack will end up blocking. This patch fixes this by simply allowing such state transition: if we see a SYN, in TIME_WAIT state, on REPLY direction, move it to sSS. Note that the rest of the code already handles this situation, more specifically in tcp_packet(), first switch clause. Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-10-22 14:12:50 +02:00
Johannes Berg	4619194a49	mac80211: don't remove tainted keys after not programming When a key is tainted during resume, it is no longer programmed into the device; however, its uploaded flag may (will) be set. Clear the flag when not programming it because it's tainted to avoid attempting to remove it again later. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-10-22 11:30:30 +02:00
Johannes Berg	02219b3abc	mac80211: add WMM admission control support Use the currently existing APIs between mac80211 and the low level driver to implement WMM admission control. The low level driver needs to report the media time used by each transmitted packet in ieee80211_tx_status. Based on that information, mac80211 will modify the QoS parameters of the admission controlled Access Category when the limit is reached. Once the original QoS parameters can be restored, mac80211 will do so. One issue with this approach is that management frames will also erroneously be downgraded, but the upside is that the implementation is simple. In the future, it can be extended to driver- or device-based implementations that are better. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-10-22 10:42:09 +02:00
Johannes Berg	f409079bb6	mac80211: sanity check CW_min/CW_max towards driver There's no reason to ever set invalid CW_min/CW_max to the drivers, we should catch it in higher layers. However, the consequences of setting it wrong can be quite severe, so double-check at a low level and error out for invalid data. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-10-22 10:42:09 +02:00
Johannes Berg	723e73acd1	cfg80211: make WMM TSPEC support flag an nl80211 feature flag During the review of the corresponding wpa_supplicant patches we noticed that the only way for it to detect that this functionality is supported currently is to check for the command support. This can be misleading though, as the command was also designed to, in the future, support pure 802.11 TSPECs. Expose the WMM-TSPEC feature flag to nl80211 so later we can also expose an 802.11-TSPEC feature flag (if needed) to differentiate the two cases. Note: this change isn't needed in 3.18 as there's no driver there yet that supports the functionality at all. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-10-22 10:41:49 +02:00
Sabrina Dubroca	7c1c97d54f	net: sched: initialize bstats syncp Use netdev_alloc_pcpu_stats to allocate percpu stats and initialize syncp. Fixes: `22e0f8b932` "net: sched: make bstats per cpu and estimator RCU safe" Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-21 21:45:21 -04:00
Thomas Graf	78fd1d0ab0	netlink: Re-add locking to netlink_lookup() and seq walker The synchronize_rcu() in netlink_release() introduces unacceptable latency. Reintroduce minimal lookup so we can drop the synchronize_rcu() until socket destruction has been RCUfied. Cc: David S. Miller <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Reported-by: Steinar H. Gunderson <sgunderson@bigfoot.com> Reported-and-tested-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-21 21:34:49 -04:00
Ying Xue	1a194c2d59	tipc: fix lockdep warning when intra-node messages are delivered When running tipcTC&tipcTS test suite, below lockdep unsafe locking scenario is reported: [ 1109.997854] [ 1109.997988] ================================= [ 1109.998290] [ INFO: inconsistent lock state ] [ 1109.998575] 3.17.0-rc1+ #113 Not tainted [ 1109.998762] --------------------------------- [ 1109.998762] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. [ 1109.998762] swapper/7/0 [HC0[0]:SC1[1]:HE1:SE0] takes: [ 1109.998762] (slock-AF_TIPC){+.?...}, at: [<ffffffffa0011969>] tipc_sk_rcv+0x49/0x2b0 [tipc] [ 1109.998762] {SOFTIRQ-ON-W} state was registered at: [ 1109.998762] [<ffffffff810a4770>] __lock_acquire+0x6a0/0x1d80 [ 1109.998762] [<ffffffff810a6555>] lock_acquire+0x95/0x1e0 [ 1109.998762] [<ffffffff81a2d1ce>] _raw_spin_lock+0x3e/0x80 [ 1109.998762] [<ffffffffa0011969>] tipc_sk_rcv+0x49/0x2b0 [tipc] [ 1109.998762] [<ffffffffa0004fe8>] tipc_link_xmit+0xa8/0xc0 [tipc] [ 1109.998762] [<ffffffffa000ec6f>] tipc_sendmsg+0x15f/0x550 [tipc] [ 1109.998762] [<ffffffffa000f165>] tipc_connect+0x105/0x140 [tipc] [ 1109.998762] [<ffffffff817676ee>] SYSC_connect+0xae/0xc0 [ 1109.998762] [<ffffffff81767b7e>] SyS_connect+0xe/0x10 [ 1109.998762] [<ffffffff817a9788>] compat_SyS_socketcall+0xb8/0x200 [ 1109.998762] [<ffffffff81a306e5>] sysenter_dispatch+0x7/0x1f [ 1109.998762] irq event stamp: 241060 [ 1109.998762] hardirqs last enabled at (241060): [<ffffffff8105a4ad>] __local_bh_enable_ip+0x6d/0xd0 [ 1109.998762] hardirqs last disabled at (241059): [<ffffffff8105a46f>] __local_bh_enable_ip+0x2f/0xd0 [ 1109.998762] softirqs last enabled at (241020): [<ffffffff81059a52>] _local_bh_enable+0x22/0x50 [ 1109.998762] softirqs last disabled at (241021): [<ffffffff8105a626>] irq_exit+0x96/0xc0 [ 1109.998762] [ 1109.998762] other info that might help us debug this: [ 1109.998762] Possible unsafe locking scenario: [ 1109.998762] [ 1109.998762] CPU0 [ 1109.998762] ---- [ 1109.998762] 
lock(slock-AF_TIPC); [ 1109.998762] <Interrupt> [ 1109.998762] lock(slock-AF_TIPC); [ 1109.998762] [ 1109.998762] * DEADLOCK * [ 1109.998762] [ 1109.998762] 2 locks held by swapper/7/0: [ 1109.998762] #0: (rcu_read_lock){......}, at: [<ffffffff81782dc9>] __netif_receive_skb_core+0x69/0xb70 [ 1109.998762] #1: (rcu_read_lock){......}, at: [<ffffffffa0001c90>] tipc_l2_rcv_msg+0x40/0x260 [tipc] [ 1109.998762] [ 1109.998762] stack backtrace: [ 1109.998762] CPU: 7 PID: 0 Comm: swapper/7 Not tainted 3.17.0-rc1+ #113 [ 1109.998762] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 [ 1109.998762] ffffffff82745830 ffff880016c03828 ffffffff81a209eb 0000000000000007 [ 1109.998762] ffff880017b3cac0 ffff880016c03888 ffffffff81a1c5ef 0000000000000001 [ 1109.998762] ffff880000000001 ffff880000000000 ffffffff81012d4f 0000000000000000 [ 1109.998762] Call Trace: [ 1109.998762] <IRQ> [<ffffffff81a209eb>] dump_stack+0x4e/0x68 [ 1109.998762] [<ffffffff81a1c5ef>] print_usage_bug+0x1f1/0x202 [ 1109.998762] [<ffffffff81012d4f>] ? save_stack_trace+0x2f/0x50 [ 1109.998762] [<ffffffff810a406c>] mark_lock+0x28c/0x2f0 [ 1109.998762] [<ffffffff810a3440>] ? print_irq_inversion_bug.part.46+0x1f0/0x1f0 [ 1109.998762] [<ffffffff810a467d>] __lock_acquire+0x5ad/0x1d80 [ 1109.998762] [<ffffffff810a70dd>] ? trace_hardirqs_on+0xd/0x10 [ 1109.998762] [<ffffffff8108ace8>] ? sched_clock_cpu+0x98/0xc0 [ 1109.998762] [<ffffffff8108ad2b>] ? local_clock+0x1b/0x30 [ 1109.998762] [<ffffffff810a10dc>] ? lock_release_holdtime.part.29+0x1c/0x1a0 [ 1109.998762] [<ffffffff8108aa05>] ? sched_clock_local+0x25/0x90 [ 1109.998762] [<ffffffffa000dec0>] ? tipc_sk_get+0x60/0x80 [tipc] [ 1109.998762] [<ffffffff810a6555>] lock_acquire+0x95/0x1e0 [ 1109.998762] [<ffffffffa0011969>] ? tipc_sk_rcv+0x49/0x2b0 [tipc] [ 1109.998762] [<ffffffff810a6fb6>] ? trace_hardirqs_on_caller+0xa6/0x1c0 [ 1109.998762] [<ffffffff81a2d1ce>] _raw_spin_lock+0x3e/0x80 [ 1109.998762] [<ffffffffa0011969>] ? 
tipc_sk_rcv+0x49/0x2b0 [tipc] [ 1109.998762] [<ffffffffa000dec0>] ? tipc_sk_get+0x60/0x80 [tipc] [ 1109.998762] [<ffffffffa0011969>] tipc_sk_rcv+0x49/0x2b0 [tipc] [ 1109.998762] [<ffffffffa00076bd>] tipc_rcv+0x5ed/0x960 [tipc] [ 1109.998762] [<ffffffffa0001d1c>] tipc_l2_rcv_msg+0xcc/0x260 [tipc] [ 1109.998762] [<ffffffffa0001c90>] ? tipc_l2_rcv_msg+0x40/0x260 [tipc] [ 1109.998762] [<ffffffff81783345>] __netif_receive_skb_core+0x5e5/0xb70 [ 1109.998762] [<ffffffff81782dc9>] ? __netif_receive_skb_core+0x69/0xb70 [ 1109.998762] [<ffffffff81784eb9>] ? dev_gro_receive+0x259/0x4e0 [ 1109.998762] [<ffffffff817838f6>] __netif_receive_skb+0x26/0x70 [ 1109.998762] [<ffffffff81783acd>] netif_receive_skb_internal+0x2d/0x1f0 [ 1109.998762] [<ffffffff81785518>] napi_gro_receive+0xd8/0x240 [ 1109.998762] [<ffffffff815bf854>] e1000_clean_rx_irq+0x2c4/0x530 [ 1109.998762] [<ffffffff815c1a46>] e1000_clean+0x266/0x9c0 [ 1109.998762] [<ffffffff8108ad2b>] ? local_clock+0x1b/0x30 [ 1109.998762] [<ffffffff8108aa05>] ? sched_clock_local+0x25/0x90 [ 1109.998762] [<ffffffff817842b1>] net_rx_action+0x141/0x310 [ 1109.998762] [<ffffffff810bd710>] ? handle_fasteoi_irq+0xe0/0x150 [ 1109.998762] [<ffffffff81059fa6>] __do_softirq+0x116/0x4d0 [ 1109.998762] [<ffffffff8105a626>] irq_exit+0x96/0xc0 [ 1109.998762] [<ffffffff81a30d07>] do_IRQ+0x67/0x110 [ 1109.998762] [<ffffffff81a2ee2f>] common_interrupt+0x6f/0x6f [ 1109.998762] <EOI> [<ffffffff8100d2b7>] ? default_idle+0x37/0x250 [ 1109.998762] [<ffffffff8100d2b5>] ? default_idle+0x35/0x250 [ 1109.998762] [<ffffffff8100dd1f>] arch_cpu_idle+0xf/0x20 [ 1109.998762] [<ffffffff810999fd>] cpu_startup_entry+0x27d/0x4d0 [ 1109.998762] [<ffffffff81034c78>] start_secondary+0x188/0x1f0 When intra-node messages are delivered from one process to another process, tipc_link_xmit() doesn't disable BH before it directly calls tipc_sk_rcv() on process context to forward messages to destination socket. 
Meanwhile, if messages delivered by remote node arrive at the node and their destinations are also the same socket, tipc_sk_rcv() running on process context might be preempted by tipc_sk_rcv() running BH context. As a result, the latter cannot obtain the socket lock as the lock was obtained by the former, however, the former has no chance to be run as the latter is owning the CPU now, so deadlock happens. To avoid it, BH should be always disabled in tipc_sk_rcv(). Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-21 15:28:15 -04:00
Ying Xue	7b8613e0a1	tipc: fix a potential deadlock Locking dependency detected below possible unsafe locking scenario: CPU0 CPU1 T0: tipc_named_rcv() tipc_rcv() T1: [grab nametbl write lock]* [grab node lock]* T2: tipc_update_nametbl() tipc_node_link_up() T3: tipc_nodesub_subscribe() tipc_nametbl_publish() T4: [grab node lock]* [grab nametbl write lock]* The opposite order of holding nametbl write lock and node lock on above two different paths may result in a deadlock. If we move the updating of the name table after link state named out of node lock, the reverse order of holding locks will be eliminated, and as a result, the deadlock risk. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-21 15:28:15 -04:00
Fabian Frederick	5c6761adc7	mac80211: remove unnecessary null test before debugfs_remove() The debugfs_remove() function can safely take NULL parameters so the additional null test isn't required, and there's no other reason to have it here, so remove it. Signed-off-by: Fabian Frederick <fabf@skynet.be> [rewrite commit message, re-introduce blank line after assert] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-10-21 21:08:10 +02:00
Karl Beldan	9208247d74	mac80211: minstrel_ht: add basic support for VHT rates <= 3SS@80MHz When the new CONFIG_MAC80211_RC_MINSTREL_VHT is not set (default 'N'), there is no behavioral change including in sampling and MCS_GROUP_RATES remains 8. Otherwise MCS_GROUP_RATES is 10, and a module parameter vht_only (default 'true'), restricts the rates selection to VHT when VHT is supported. Regarding the debugfs stats buffer: It is explicitly increased from 8k to 32k to fit every rates incl. when both HT and VHT rates are enabled, as for the format, before: type rate tpt eprob prob ret ok(cum) ok( cum) HT20/LGI ABCDP MCS0 0.0 0.0 0.0 1 0( 0) 0( 0) after: type rate tpt eprob prob ret ok(cum) ok( cum) HT20/LGI ABCDP MCS0 0.0 0.0 0.0 1 0( 0) 0( 0) VHT40/LGI MCS5/2 0.0 0.0 0.0 0 0( 0) 0( 0) Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com> Cc: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-10-21 13:25:26 +02:00
Karl Beldan	3ec373c421	mac80211: minstrel_ht: include type (cck/ht) in rates flag ATM, we grep cck rates idx with idx / MCS_GROUP_RATES == MINSTREL_CCK_GROUP. Matching neither-cck-non-ht rates could be done by replacing '==' with '>', however it would be less versatile or explicit. This will allow to match VHT rates with IEEE80211_TX_RC_VHT_MCS. Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com> Cc: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-10-20 21:39:35 +02:00
Karl Beldan	8a0ee4fe19	mac80211: minstrel_ht: macros adjustments for future VHT_GROUPs No functional change. Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com> Cc: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-10-20 21:39:35 +02:00
Karl Beldan	d4d141cae8	mac80211: minstrel_ht: Increase the range of handled rate indexes Since `5935839ad7` ("mac80211: improve minstrel_ht rate sorting by throughput & probability"), the rate indexes are manipulated via u8's and hence allow for a maximum of 256 mcs_group entries in minstrel_mcs_groups. ATM, minstrel_ht advertizes support up to 3HTSS@40MHz, consuming: 8(MCS_GROUP_RATES) * (3(SS)2(GI)2(BW)+1(CCK)), i.e. 104 entries. Support for 3VHTSS@80MHz will require: 10(MCS_GROUP_RATES) * (3(SS)2(GI)2(BW)+1(CCK)) + 10(MCS_GROUP_RATES) * (3(SS)2(GI)3(BW)), i.e. 130 + 180 entries. This change moves from u8s to u16s where necessary. Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com> Cc: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-10-20 21:39:35 +02:00
Johannes Berg	8fa74e3aa6	Merge branch 'mac80211' into mac80211-next This was needed to avoid conflicts in the minstrel changes. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-10-20 21:39:29 +02:00
Johannes Berg	b08cc24e0a	mac80211: fix change flags variable signedness This showed up as a sparse warning (with higher verbosity) and is certainly correct - the change flags should be unsigned. It's not that important since high flag numbers aren't used and bitwise operations would still work. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-10-20 21:36:55 +02:00
Florian Westphal	f993bc25e5	net: core: handle encapsulation offloads when computing segment lengths if ->encapsulation is set we have to use inner_tcp_hdrlen and add the size of the inner network headers too. This is 'mostly harmless'; tbf might send skb that is slightly over quota or drop skb even if it would have fit. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-20 12:38:13 -04:00
Florian Westphal	330966e501	net: make skb_gso_segment error handling more robust skb_gso_segment has three possible return values: 1. a pointer to the first segmented skb 2. an errno value (IS_ERR()) 3. NULL. This can happen when GSO is used for header verification. However, several callers currently test IS_ERR instead of IS_ERR_OR_NULL and would oops when NULL is returned. Note that these call sites should never actually see such a NULL return value; all callers mask out the GSO bits in the feature argument. However, there have been issues with some protocol handlers erroneously not respecting the specified feature mask in some cases. It is preferable to get 'have to turn off hw offloading, else slow' reports rather than 'kernel crashes'. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-20 12:38:13 -04:00
Florian Westphal	1e16aa3ddf	net: gso: use feature flag argument in all protocol gso handlers skb_gso_segment() has a 'features' argument representing offload features available to the output path. A few handlers, e.g. GRE, instead re-fetch the features of skb->dev and use those instead of the provided ones when handling encapsulation/tunnels. Depending on dev->hw_enc_features of the output device skb_gso_segment() can then return NULL even when the caller has disabled all GSO feature bits, as segmentation of inner header thinks device will take care of segmentation. This e.g. affects the tbf scheduler, which will silently drop GRE-encap GSO skbs that did not fit the remaining token quota as the segmentation does not work when device supports corresponding hw offload capabilities. Cc: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-20 12:38:12 -04:00
David S. Miller	ce8ec48967	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== netfilter fixes for net The following patchset contains netfilter fixes for your net tree, they are: 1) Fix missing MODULE_LICENSE() in the new nf_reject_ipv{4,6} modules. 2) Restrict nat and masq expressions to the nat chain type. Otherwise, users may crash their kernel if they attach a nat/masq rule to a non nat chain. 3) Fix hook validation in nft_compat when non-base chains are used. Basically, initialize hook_mask to zero. 4) Make sure you use match/targets in nft_compat from the right chain type. The existing validation relies on the table name which can be avoided by 5) Better netlink attribute validation in nft_nat. This expression has to reject the configuration when no address and proto configurations are specified. 6) Interpret NFTA_NAT_REG__MAX only if NFTA_NAT_REG__MIN is set. Yet another sanity check to reject incorrect configurations from userspace. 7) Conditional NAT attribute dumping depending on the existing configuration. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-20 11:57:47 -04:00
Jouni Malinen	988568669d	cfg80211: Specify frame and reason code for NL80211_CMD_DEL_STATION The optional NL80211_ATTR_MGMT_SUBTYPE and NL80211_ATTR_REASON_CODE attributes can now be included in NL80211_CMD_DEL_STATION to indicate to the driver which frame (Deauthentication/Disassociation) and reason code in that frame should be used to indicate removal to the specific station. This is used by drivers that implement AP SME and generate those frames internally. Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-10-20 16:39:23 +02:00
Karl Beldan	11b2357d5d	mac80211: minstrels: fix buffer overflow in HT debugfs rc_stats ATM an HT rc_stats line is 106 chars. Times 8(MCS_GROUP_RATES)3(SS)2(GI)2(BW) + CCK(4), i.e. x100, this is well above the current 8192 - sizeof(ms) currently allocated. Fix this by squeezing the output as follows (not that we're short on memory but this also improves readability and range, the new format adds one more digit to ok/cum and ok/cum): - Before (HT) (106 ch): type rate throughput ewma prob this prob retry this succ/attempt success attempts CCK/LP 5.5M 0.0 0.0 0.0 0 0( 0) 0 0 HT20/LGI ABCDP MCS0 0.0 0.0 0.0 1 0( 0) 0 0 - After (75 ch): type rate tpt eprob prob ret ok(cum) ok( cum) CCK/LP 5.5M 0.0 0.0 0.0 0 0( 0) 0( 0) HT20/LGI ABCDP MCS0 0.0 0.0 0.0 1 0( 0) 0( 0) - Align non-HT format Before (non-HT) (83 ch): rate throughput ewma prob this prob this succ/attempt success attempts ABCDP 6 0.0 0.0 0.0 0( 0) 0 0 54 0.0 0.0 0.0 0( 0) 0 0 - After (61 ch): rate tpt eprob prob ok(cum) ok( cum) ABCDP 1 0.0 0.0 0.0 0( 0) 0( 0) 54 0.0 0.0 0.0 0( 0) 0( 0) This also adds dynamic checks for overflow, lowers the size of the non-HT request (allowing > 30 entries) and replaces the buddy-rounded allocations (s/sizeof(ms) + 8192/8192). Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com> Acked-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-10-20 16:37:01 +02:00
Jouni Malinen	89c771e5a6	cfg80211: Convert del_station() callback to use a param struct This makes it easier to add new parameters for the del_station calls without having to modify all drivers that use this. Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-10-20 16:24:21 +02:00
Wolfram Sang	140bbc4af0	net: rfkill: drop owner assignment from platform_drivers A platform_driver does not need to set an owner, it will be populated by the driver core. Signed-off-by: Wolfram Sang <wsa@the-dreams.de>	2014-10-20 16:21:58 +02:00
Wolfram Sang	0dd1153813	net: dsa: drop owner assignment from platform_drivers A platform_driver does not need to set an owner, it will be populated by the driver core. Signed-off-by: Wolfram Sang <wsa@the-dreams.de>	2014-10-20 16:21:58 +02:00
Linus Torvalds	e25b492741	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: "A quick batch of bug fixes: 1) Fix build with IPV6 disabled, from Eric Dumazet. 2) Several more cases of caching SKB data pointers across calls to pskb_may_pull(), thus referencing potentially free'd memory. From Li RongQing. 3) DSA phy code tests operation presence improperly, instead of going: if (x->ops->foo) r = x->ops->foo(args); it was going: if (x->ops->foo(args)) r = x->ops->foo(args); Fix from Andrew Lunn" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: Net: DSA: Fix checking for get_phy_flags function ipv6: fix a potential use after free in sit.c ipv6: fix a potential use after free in ip6_offload.c ipv4: fix a potential use after free in gre_offload.c tcp: fix build error if IPv6 is not enabled	2014-10-19 11:41:57 -07:00
Andrew Lunn	228b16cb13	Net: DSA: Fix checking for get_phy_flags function The check for the presence or not of the optional switch function get_phy_flags() called the function, rather than checked to see if it is a NULL pointer. This causes a dereference of a NULL pointer on all switch chips except the sf2, the only switch to implement this call. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Fixes: `6819563e64` ("net: dsa: allow switch drivers to specify phy_device::dev_flags") Cc: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-19 12:46:31 -04:00
Linus Torvalds	0e6e58f941	Merge tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull virtio updates from Rusty Russell: "One cc: stable commit, the rest are a series of minor cleanups which have been sitting in MST's tree during my vacation. 
I changed a function name and made one trivial change, then they spent two days in linux-next" * tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (25 commits) virtio-rng: refactor probe error handling virtio_scsi: drop scan callback virtio_balloon: enable VQs early on restore virtio_scsi: fix race on device removal virito_scsi: use freezable WQ for events virtio_net: enable VQs early on restore virtio_console: enable VQs early on restore virtio_scsi: enable VQs early on restore virtio_blk: enable VQs early on restore virtio_scsi: move kick event out from virtscsi_init virtio_net: fix use after free on allocation failure 9p/trans_virtio: enable VQs early virtio_console: enable VQs early virtio_blk: enable VQs early virtio_net: enable VQs early virtio: add API to enable VQs early virtio_net: minor cleanup virtio-net: drop config_mutex virtio_net: drop config_enable virtio-blk: drop config_mutex ...	2014-10-18 10:25:09 -07:00
Li RongQing	a6d4518da3	ipv6: fix a potential use after free in sit.c pskb_may_pull() may change skb->data and make the iph pointer obsolete; fix it by getting the IP header length directly. Fixes: `ca15a078` (sit: generate icmpv6 error when receiving icmpv4 error) Cc: Oussama Ghorbel <ghorbel@pivasoftware.com> Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-18 13:04:09 -04:00
Li RongQing	fc6fb41cd6	ipv6: fix a potential use after free in ip6_offload.c pskb_may_pull() may change skb->data and make the opth pointer obsolete, so set opth again. Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-18 13:04:08 -04:00
Li RongQing	b4e3cef703	ipv4: fix a potential use after free in gre_offload.c pskb_may_pull() may change skb->data and make the greh pointer obsolete, so greh would need to be reassigned. But since the first call to pskb_may_pull() already ensured that skb->data has enough space for greh, move the use of greh before the second call to pskb_may_pull() to avoid reassigning it. Fixes: `7a7ffbabf9` ("ipv4: fix tunneled VM traffic over hw VXLAN/GRE GSO NIC") Cc: Wei-Chun Chao <weichunc@plumgrid.com> Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-18 13:04:08 -04:00
Linus Torvalds	2e923b0251	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) Include fixes for netrom and dsa (Fabian Frederick and Florian Fainelli) 2) Fix FIXED_PHY support in stmmac, from Giuseppe CAVALLARO. 3) Several SKB use after free fixes (vxlan, openvswitch, vxlan, ip_tunnel, fou), from Li RongQing. 4) fec driver PTP support fixes from Luwei Zhou and Nimrod Andy. 5) Use after free in virtio_net, from Michael S Tsirkin. 6) Fix flow mask handling for megaflows in openvswitch, from Pravin B Shelar. 7) ISDN gigaset and capi bug fixes from Tilman Schmidt. 8) Fix route leak in ip_send_unicast_reply(), from Vasily Averin. 9) Fix two eBPF JIT bugs on x86, from Alexei Starovoitov. 10) TCP_SKB_CB() reorganization caused a few regressions, fixed by Cong Wang and Eric Dumazet. 11) Don't overwrite end of SKB when parsing malformed sctp ASCONF chunks, from Daniel Borkmann. 12) Don't call sock_kfree_s() with NULL pointers, this function also has the side effect of adjusting the socket memory usage. From Cong Wang. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (90 commits) bna: fix skb->truesize underestimation net: dsa: add includes for ethtool and phy_fixed definitions openvswitch: Set flow-key members. 
netrom: use linux/uaccess.h dsa: Fix conversion from host device to mii bus tipc: fix bug in bundled buffer reception ipv6: introduce tcp_v6_iif() sfc: add support for skb->xmit_more r8152: return -EBUSY for runtime suspend ipv4: fix a potential use after free in fou.c ipv4: fix a potential use after free in ip_tunnel_core.c hyperv: Add handling of IP header with option field in netvsc_set_hash() openvswitch: Create right mask with disabled megaflows vxlan: fix a free after use openvswitch: fix a use after free ipv4: dst_entry leak in ip_send_unicast_reply() ipv4: clean up cookie_v4_check() ipv4: share tcp_v4_save_options() with cookie_v4_check() ipv4: call __ip_options_echo() in cookie_v4_check() atm: simplify lanai.c by using module_pci_driver ...	2014-10-18 09:31:37 -07:00
Pablo Neira Ayuso	1e2d56a5d3	netfilter: nft_nat: dump attributes if they are set Dump NFTA_NAT_REG_ADDR_MIN if this is non-zero. Same thing with NFTA_NAT_REG_PROTO_MIN. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-10-18 14:16:13 +02:00
Pablo Neira Ayuso	61cfac6b42	netfilter: nft_nat: NFTA_NAT_REG_ADDR_MAX depends on NFTA_NAT_REG_ADDR_MIN Interpret NFTA_NAT_REG_ADDR_MAX if NFTA_NAT_REG_ADDR_MIN is present, otherwise, skip it. Same thing with NFTA_NAT_REG_PROTO_MAX. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-10-18 14:16:12 +02:00
Pablo Neira Ayuso	5c819a3975	netfilter: nft_nat: insufficient attribute validation We have to validate that we at least get an NFTA_NAT_REG_ADDR_MIN or NFTA_NAT_REG_PROTO_MIN attribute. Reject the configuration if neither is present. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-10-18 14:16:11 +02:00
Pablo Neira Ayuso	f3f5ddeddd	netfilter: nft_compat: validate chain type in match/target We have to validate the real chain type to ensure that matches/targets are not used outside their scope (e.g. MASQUERADE in nat chain type). The existing validation relies on the table name, but this is not sufficient since userspace can fool us by using the appropriate table name with a different chain type. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-10-18 14:14:07 +02:00
Florian Fainelli	a28205437b	net: dsa: add includes for ethtool and phy_fixed definitions net/dsa/slave.c uses functions and structures declared in phy_fixed.h but does not explicitly include it, while dsa.h needs structure declarations for 'struct ethtool_wolinfo' and 'struct ethtool_eee'. Fix both by including the correct header files. Fixes: `ec9436baed` ("net: dsa: allow drivers to do link adjustment") Fixes: `ce31b31c68` ("net: dsa: allow updating fixed PHY link information") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-17 23:54:46 -04:00
Pravin B Shelar	25ef1328a0	openvswitch: Set flow-key members. This patch adds missing memset which are required to initialize flow key member. For example for IP flow we need to initialize ip.frag for all cases. Found by inspection. This bug is introduced by commit `0714812134` ("openvswitch: Eliminate memset() from flow_extract"). Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-17 23:54:02 -04:00
Fabian Frederick	dc8e54165f	netrom: use linux/uaccess.h replace asm/uaccess.h by linux/uaccess.h Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-17 23:52:54 -04:00
Jon Paul Maloy	643566d4b4	tipc: fix bug in bundled buffer reception In commit `ec8a2e5621` ("tipc: same receive code path for connection protocol and data messages") we omitted the possibility that an arriving message extracted from a bundle buffer may be a multicast message. Such messages need to be delivered to the socket via a separate function, tipc_sk_mcast_rcv(). As a result, small multicast messages arriving as members of a bundle buffer will be silently dropped. This commit corrects the error by considering this case in the function tipc_link_bundle_rcv(). Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-17 23:50:53 -04:00
Eric Dumazet	870c315138	ipv6: introduce tcp_v6_iif() Commit `971f10eca1` ("tcp: better TCP_SKB_CB layout to reduce cache line misses") added a regression for SO_BINDTODEVICE on IPv6. This is because we still use inet6_iif() which expects that IP6 control block is still at the beginning of skb->cb[] This patch adds tcp_v6_iif() helper and uses it where necessary. Because __inet6_lookup_skb() is used by TCP and DCCP, we add an iif parameter to it. Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: `971f10eca1` ("tcp: better TCP_SKB_CB layout to reduce cache line misses") Acked-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-17 23:48:07 -04:00
Li RongQing	d8f00d2710	ipv4: fix a potential use after free in fou.c pskb_may_pull() may change skb->data and make the uh pointer obsolete, so reload uh and guehdr. Fixes: `37dd0247` ("gue: Receive side for Generic UDP Encapsulation") Cc: Tom Herbert <therbert@google.com> Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-17 23:45:26 -04:00
Li RongQing	1245dfc8ca	ipv4: fix a potential use after free in ip_tunnel_core.c pskb_may_pull() may change skb->data and make the eth pointer obsolete, so set eth after pskb_may_pull(). Fixes: `3d7b46cd` ("ip_tunnel: push generic protocol handling to ip_tunnel module") Cc: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-17 23:45:26 -04:00
Pravin B Shelar	f47de068f6	openvswitch: Create right mask with disabled megaflows If megaflows are disabled, the userspace does not send the netlink attribute OVS_FLOW_ATTR_MASK, and the kernel must create an exact match mask. sw_flow_mask_set() sets every byte (in 'range') of the mask to 0xff, even the bytes that represent padding for struct sw_flow, or the bytes that represent fields that may not be set during ovs_flow_extract(). This is a problem, because when we extract a flow from a packet, we no longer memset() the struct sw_flow to 0. This commit gets rid of sw_flow_mask_set() and introduces mask_set_nlattr(), which operates on the netlink attributes rather than on the mask key. Using this approach we are sure that only the bytes that the user provided in the flow are matched. Also, if the parse_flow_mask_nlattrs() for the mask ENCAP attribute fails, we now return with an error. This bug is introduced by commit `0714812134` ("openvswitch: Eliminate memset() from flow_extract"). Reported-by: Alex Wang <alexw@nicira.com> Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com> Signed-off-by: Andy Zhou <azhou@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-17 16:49:34 -04:00
Li RongQing	389f48947a	openvswitch: fix a use after free pskb_may_pull() called by arphdr_ok can change skb->data, so put the arp setting after arphdr_ok to avoid using the freed memory. Fixes: `0714812134` ("openvswitch: Eliminate memset() from flow_extract.") Cc: Jesse Gross <jesse@nicira.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Acked-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-17 16:21:53 -04:00
Vasily Averin	4062090e3e	ipv4: dst_entry leak in ip_send_unicast_reply() ip_setup_cork() called inside ip_append_data() steals the dst entry from rt to cork, and in case of errors in __ip_append_data() nobody frees the stolen dst entry. Fixes: `2e77d89b2f` ("net: avoid a pair of dst_hold()/dst_release() in ip_append_data()") Signed-off-by: Vasily Averin <vvs@parallels.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-17 15:30:12 -04:00
Cong Wang	461b74c391	ipv4: clean up cookie_v4_check() We can retrieve opt from skb, no need to pass it as a parameter. And opt should always be non-NULL, no need to check. Cc: Krzysztof Kolasa <kkolasa@winsoft.pl> Cc: Eric Dumazet <edumazet@google.com> Tested-by: Krzysztof Kolasa <kkolasa@winsoft.pl> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-17 12:02:57 -04:00
Cong Wang	e25f866fbc	ipv4: share tcp_v4_save_options() with cookie_v4_check() cookie_v4_check() allocates ip_options_rcu in the same way with tcp_v4_save_options(), we can just make it a helper function. Cc: Krzysztof Kolasa <kkolasa@winsoft.pl> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-17 12:02:57 -04:00
Cong Wang	2077eebf7d	ipv4: call __ip_options_echo() in cookie_v4_check() commit `971f10eca1` ("tcp: better TCP_SKB_CB layout to reduce cache line misses") missed that cookie_v4_check() still calls ip_options_echo() which uses IPCB(). It should use TCPCB() at TCP layer, so call __ip_options_echo() instead. Fixes: commit `971f10eca1` ("tcp: better TCP_SKB_CB layout to reduce cache line misses") Cc: Krzysztof Kolasa <kkolasa@winsoft.pl> Cc: Eric Dumazet <edumazet@google.com> Reported-by: Krzysztof Kolasa <kkolasa@winsoft.pl> Tested-by: Krzysztof Kolasa <kkolasa@winsoft.pl> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-17 12:02:57 -04:00
Fabian Frederick	4e8febd0a7	openvswitch: use vport instead of p All functions used struct vport *vport except ovs_vport_find_upcall_portid. This fixes 1 kerneldoc warning Signed-off-by: Fabian Frederick <fabf@skynet.be> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-15 23:25:33 -04:00
Fabian Frederick	7e78cc46b7	openvswitch: kerneldoc warning fix s/sock/gs Signed-off-by: Fabian Frederick <fabf@skynet.be> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-15 23:25:33 -04:00
Tom Herbert	04ffcb255f	net: Add ndo_gso_check Add ndo_gso_check which a device can define to indicate whether it is capable of doing GSO on a packet. This function would be called from the stack to determine whether software GSO needs to be done. A driver should populate this function if it advertises GSO types for which there are combinations that it wouldn't be able to handle. For instance a device that performs UDP tunneling might only implement support for transparent Ethernet bridging type of inner packets or might have limitations on lengths of inner headers. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-15 12:11:00 -04:00
Linus Torvalds	0429fbc0bd	Merge branch 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu Pull percpu consistent-ops changes from Tejun Heo: "Way back, before the current percpu allocator was implemented, static and dynamic percpu memory areas were allocated and handled separately and had their own accessors. The distinction has been gone for many years now; however, the now duplicate two sets of accessors remained with the pointer based ones - this_cpu_() - evolving various other operations over time. During the process, we also accumulated other inconsistent operations. This pull request contains Christoph's patches to clean up the duplicate accessor situation. __get_cpu_var() uses are replaced with this_cpu_ptr() and __this_cpu_ptr() with raw_cpu_ptr(). Unfortunately, the former sometimes is tricky thanks to C being a bit messy with the distinction between lvalues and pointers, which led to a rather ugly solution for cpumask_var_t involving the introduction of this_cpu_cpumask_var_ptr(). This converts most of the uses but not all. Christoph will follow up with the remaining conversions in this merge window and hopefully remove the obsolete accessors" 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (38 commits) irqchip: Properly fetch the per cpu offset percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t -fix ia64: sn_nodepda cannot be assigned to after this_cpu conversion. Use __this_cpu_write. 
percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t Revert "powerpc: Replace __get_cpu_var uses" percpu: Remove __this_cpu_ptr clocksource: Replace __this_cpu_ptr with raw_cpu_ptr sparc: Replace __get_cpu_var uses avr32: Replace __get_cpu_var with __this_cpu_write blackfin: Replace __get_cpu_var uses tile: Use this_cpu_ptr() for hardware counters tile: Replace __get_cpu_var uses powerpc: Replace __get_cpu_var uses alpha: Replace __get_cpu_var ia64: Replace __get_cpu_var uses s390: cio driver &__get_cpu_var replacements s390: Replace __get_cpu_var uses mips: Replace __get_cpu_var uses MIPS: Replace __get_cpu_var uses in FPU emulator. arm: Replace __this_cpu_ptr with raw_cpu_ptr ...	2014-10-15 07:48:18 +02:00
Linus Torvalds	6b04908166	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph updates from Sage Weil: "There is the long-awaited discard support for RBD (Guangliang Zhao, Josh Durgin), a pile of RBD bug fixes that didn't belong in late -rc's (Ilya Dryomov, Li RongQing), a pile of fs/ceph bug fixes and performance and debugging improvements (Yan, Zheng, John Spray), and a smattering of cleanups (Chao Yu, Fabian Frederick, Joe Perches)" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (40 commits) ceph: fix divide-by-zero in __validate_layout() rbd: rbd workqueues need a resque worker libceph: ceph-msgr workqueue needs a resque worker ceph: fix bool assignments libceph: separate multiple ops with commas in debugfs output libceph: sync osd op definitions in rados.h libceph: remove redundant declaration ceph: additional debugfs output ceph: export ceph_session_state_name function ceph: include the initial ACL in create/mkdir/mknod MDS requests ceph: use pagelist to present MDS request data libceph: reference counting pagelist ceph: fix llistxattr on symlink ceph: send client metadata to MDS ceph: remove redundant code for max file size verification ceph: remove redundant io_iter_advance() ceph: move ceph_find_inode() outside the s_mutex ceph: request xattrs if xattr_version is zero rbd: set the remaining discard properties to enable support rbd: use helpers to handle discard for layered images correctly ...	2014-10-15 06:46:01 +02:00
Michael S. Tsirkin	64b4cc3911	9p/trans_virtio: enable VQs early virtio spec requires drivers to set DRIVER_OK before using VQs. This is set automatically after probe returns, but virtio 9p device adds self to channel list within probe, at which point VQ can be used in violation of the spec. To fix, call virtio_device_ready before using VQs. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2014-10-15 10:25:04 +10:30
Eric Dumazet	9b462d02d6	tcp: TCP Small Queues and strange attractors TCP Small queues tries to keep number of packets in qdisc as small as possible, and depends on a tasklet to feed following packets at TX completion time. Choice of tasklet was driven by latencies requirements. Then, TCP stack tries to avoid reorders, by locking flows with outstanding packets in qdisc in a given TX queue. What can happen is that many flows get attracted by a low performing TX queue, and cpu servicing TX completion has to feed packets for all of them, making this cpu 100% busy in softirq mode. This became particularly visible with latest skb->xmit_more support Strategy adopted in this patch is to detect when tcp_wfree() is called from ksoftirqd and let the outstanding queue for this flow being drained before feeding additional packets, so that skb->ooo_okay can be set to allow select_queue() to select the optimal queue : Incoming ACKS are normally handled by different cpus, so this patch gives more chance for these cpus to take over the burden of feeding qdisc with future packets. 
Tested: lpaa23:~# ./super_netperf 1400 --google-pacing-rate 3028000 -H lpaa24 -l 3600 & lpaa23:~# sar -n DEV 1 10 | grep eth1 06:16:18 AM eth1 595448.00 1190564.00 38381.09 1760253.12 0.00 0.00 1.00 06:16:19 AM eth1 594858.00 1189686.00 38340.76 1758952.72 0.00 0.00 0.00 06:16:20 AM eth1 597017.00 1194019.00 38480.79 1765370.29 0.00 0.00 1.00 06:16:21 AM eth1 595450.00 1190936.00 38380.19 1760805.05 0.00 0.00 0.00 06:16:22 AM eth1 596385.00 1193096.00 38442.56 1763976.29 0.00 0.00 1.00 06:16:23 AM eth1 598155.00 1195978.00 38552.97 1768264.60 0.00 0.00 0.00 06:16:24 AM eth1 594405.00 1188643.00 38312.57 1757414.89 0.00 0.00 1.00 06:16:25 AM eth1 593366.00 1187154.00 38252.16 1755195.83 0.00 0.00 0.00 06:16:26 AM eth1 593188.00 1186118.00 38232.88 1753682.57 0.00 0.00 1.00 06:16:27 AM eth1 596301.00 1192241.00 38440.94 1762733.09 0.00 0.00 0.00 Average: eth1 595457.30 1190843.50 38381.69 1760664.84 0.00 0.00 0.50 lpaa23:~# ./tc -s -d qd sh dev eth1 | grep backlog backlog 7606336b 2513p requeues 167982 backlog 224072b 74p requeues 566 backlog 581376b 192p requeues 5598 backlog 181680b 60p requeues 1070 backlog 5305056b 1753p requeues 110166 // Here, this TX queue is attracting flows backlog 157456b 52p requeues 1758 backlog 672216b 222p requeues 3025 backlog 60560b 20p requeues 24541 backlog 448144b 148p requeues 21258 lpaa23:~# echo 1 >/proc/sys/net/ipv4/tcp_tsq_enable_tcp_wfree_ksoftirqd_detect Immediate jump to full bandwidth, and traffic is properly sharded on all tx queues. 
lpaa23:~# sar -n DEV 1 10 | grep eth1 06:16:46 AM eth1 1397632.00 2795397.00 90081.87 4133031.26 0.00 0.00 1.00 06:16:47 AM eth1 1396874.00 2793614.00 90032.99 4130385.46 0.00 0.00 0.00 06:16:48 AM eth1 1395842.00 2791600.00 89966.46 4127409.67 0.00 0.00 1.00 06:16:49 AM eth1 1395528.00 2791017.00 89946.17 4126551.24 0.00 0.00 0.00 06:16:50 AM eth1 1397891.00 2795716.00 90098.74 4133497.39 0.00 0.00 1.00 06:16:51 AM eth1 1394951.00 2789984.00 89908.96 4125022.51 0.00 0.00 0.00 06:16:52 AM eth1 1394608.00 2789190.00 89886.90 4123851.36 0.00 0.00 1.00 06:16:53 AM eth1 1395314.00 2790653.00 89934.33 4125983.09 0.00 0.00 0.00 06:16:54 AM eth1 1396115.00 2792276.00 89984.25 4128411.21 0.00 0.00 1.00 06:16:55 AM eth1 1396829.00 2793523.00 90030.19 4130250.28 0.00 0.00 0.00 Average: eth1 1396158.40 2792297.00 89987.09 4128439.35 0.00 0.00 0.50 lpaa23:~# tc -s -d qd sh dev eth1 | grep backlog backlog 7900052b 2609p requeues 173287 backlog 878120b 290p requeues 589 backlog 1068884b 354p requeues 5621 backlog 996212b 329p requeues 1088 backlog 984100b 325p requeues 115316 backlog 956848b 316p requeues 1781 backlog 1080996b 357p requeues 3047 backlog 975016b 322p requeues 24571 backlog 990156b 327p requeues 21274 (All 8 TX queues get a fair share of the traffic) Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-14 17:16:26 -04:00
David S. Miller	e53da5fbfc	net: Trap attempts to call sock_kfree_s() with a NULL pointer. Unlike normal kfree() it is never right to call sock_kfree_s() with a NULL pointer, because sock_kfree_s() also has the side effect of discharging the memory from the socket's quota. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-14 17:02:37 -04:00
Cong Wang	dee49f203a	rds: avoid calling sock_kfree_s() on allocation failure It is okay to free a NULL pointer but not okay to mischarge the socket optmem accounting. Compile test only. Reported-by: rucsoftsec@gmail.com Cc: Chien Yen <chien.yen@oracle.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-14 17:00:19 -04:00
Fabian Frederick	91c4467e3c	caif_usb: use target structure member in memset parent cfusbl was used instead of first structure member 'layer' Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-14 16:05:45 -04:00
Fabian Frederick	7970f1918f	caif_usb: remove redundant memory message Let MM subsystem display out of memory messages. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-14 16:05:45 -04:00
Fabian Frederick	6ff1e1e3c8	caif: replace kmalloc/memset 0 by kzalloc Also add blank line after declaration Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-14 16:04:07 -04:00
Jiri Pirko	f76936d07c	ipv4: fix nexthop attlen check in fib_nh_match fib_nh_match does not match nexthops correctly. Example: ip route add 172.16.10/24 nexthop via 192.168.122.12 dev eth0 \ nexthop via 192.168.122.13 dev eth0 ip route del 172.16.10/24 nexthop via 192.168.122.14 dev eth0 \ nexthop via 192.168.122.15 dev eth0 Del command is successful and route is removed. After this patch applied, the route is correctly matched and result is: RTNETLINK answers: No such process Please consider this for stable trees as well. Fixes: `4e902c5741` ("[IPv4]: FIB configuration using struct fib_config") Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-14 15:59:37 -04:00
Eric Dumazet	ad971f616a	tcp: fix tcp_ack() performance problem We worked hard to improve tcp_ack() performance, by not accessing skb_shinfo() in fast path (`cd7d8498c9` tcp: change tcp_skb_pcount() location) We still have one spurious access because of ACK timestamping, added in commit `e1c8a607b2` ("net-timestamp: ACK timestamp for bytestreams") By checking if sk_tsflags has SOF_TIMESTAMPING_TX_ACK set, we can avoid two cache line misses for the common case. While we are at it, add two prefetchw() : One in tcp_ack() to bring skb at the head of write queue. One in tcp_clean_rtx_queue() loop to bring following skb, as we will delete skb from the write queue and dirty skb->next->prev. Add a couple of [un]likely() clauses. After this patch, tcp_ack() is no longer the most consuming function in tcp stack. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Van Jacobson <vanj@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-14 15:59:37 -04:00
Ilya Dryomov	f9865f06f7	libceph: ceph-msgr workqueue needs a resque worker Commit `f363e45fd1` ("net/ceph: make ceph_msgr_wq non-reentrant") effectively removed WQ_MEM_RECLAIM flag from ceph_msgr_wq. This is wrong - libceph is very much a memory reclaim path, so restore it. Cc: stable@vger.kernel.org # needs backporting for < 3.12 Signed-off-by: Ilya Dryomov <idryomov@redhat.com> Tested-by: Micha Krause <micha@krausam.de> Reviewed-by: Sage Weil <sage@redhat.com>	2014-10-14 12:57:04 -07:00
Ilya Dryomov	25f897773b	libceph: separate multiple ops with commas in debugfs output For requests with multiple ops, separate ops with commas instead of \t, which is a field separator here. Signed-off-by: Ilya Dryomov <idryomov@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>	2014-10-14 12:57:03 -07:00
Ilya Dryomov	70b5bfa360	libceph: sync osd op definitions in rados.h Bring in missing osd ops and strings, use macros to eliminate multiple points of maintenance. Signed-off-by: Ilya Dryomov <idryomov@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>	2014-10-14 12:57:02 -07:00
Yan, Zheng	e4339d28f6	libceph: reference counting pagelist This allows a pagelist to present data that may be sent multiple times. Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>	2014-10-14 12:56:48 -07:00
Li RongQing	02ea80741a	ipv6: remove aca_lock spinlock from struct ifacaddr6 No user uses this lock. Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-14 13:15:15 -04:00
Eric Dumazet	b2532eb9ab	tcp: fix ooo_okay setting vs Small Queues TCP Small Queues (tcp_tsq_handler()) can hold one reference on sk->sk_wmem_alloc, preventing skb->ooo_okay being set. We should relax test done to set skb->ooo_okay to take care of this extra reference. Minimal truesize of skb containing one byte of payload is SKB_TRUESIZE(1) Without this fix, we have more chance locking flows into the wrong transmit queue. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-14 13:12:00 -04:00
Ilya Dryomov	91883cd27c	libceph: don't try checking queue_work() return value queue_work() doesn't "fail to queue", it returns false if work was already on a queue, which can't happen here since we allocate event_work right before we queue it. So don't bother at all. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-10-14 21:03:25 +04:00
Joe Perches	b9a678994b	libceph: Convert pr_warning to pr_warn Use the more common pr_warn. Other miscellanea: o Coalesce formats o Realign arguments Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>	2014-10-14 21:03:23 +04:00
Li RongQing	589506f1e7	libceph: fix a use after free issue in osdmap_set_max_osd If the state variable is krealloc'ed successfully, the old map->osd_state is freed. If either of the following two reallocations then fails, the function exits without resetting map->osd_state, and map->osd_state becomes a dangling pointer. Fix it by resetting each pointer right after its krealloc succeeds. Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>	2014-10-14 21:03:21 +04:00
Ilya Dryomov	dc220db03f	libceph: select CRYPTO_CBC in addition to CRYPTO_AES We want "cbc(aes)" algorithm, so select CRYPTO_CBC too, not just CRYPTO_AES. Otherwise on !CRYPTO_CBC kernels we fail rbd map/mount with libceph: error -2 building auth method x request Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>	2014-10-14 21:03:20 +04:00
Ilya Dryomov	2cc6128ab2	libceph: resend lingering requests with a new tid Both not yet registered (r_linger && list_empty(&r_linger_item)) and registered linger requests should use the new tid on resend to avoid the dup op detection logic on the OSDs, yet we were doing this only for "registered" case. Factor out and simplify the "registered" logic and use the new helper for "not registered" case as well. Fixes: http://tracker.ceph.com/issues/8806 Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-10-14 21:03:19 +04:00
Ilya Dryomov	f671b581f1	libceph: abstract out ceph_osd_request enqueue logic Introduce __enqueue_request() and switch to it. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-10-14 21:03:18 +04:00
Daniel Borkmann	26b87c7881	net: sctp: fix remote memory pressure from excessive queueing This scenario is not limited to ASCONF, just taken as one example triggering the issue. When receiving ASCONF probes in the form of ... -------------- INIT[ASCONF; ASCONF_ACK] -------------> <----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------ -------------------- COOKIE-ECHO --------------------> <-------------------- COOKIE-ACK --------------------- ---- ASCONF_a; [ASCONF_b; ...; ASCONF_n;] JUNK ------> [...] ---- ASCONF_m; [ASCONF_o; ...; ASCONF_z;] JUNK ------> ... where ASCONF_a, ASCONF_b, ..., ASCONF_z are good-formed ASCONFs and have increasing serial numbers, we process such ASCONF chunk(s) marked with !end_of_packet and !singleton, since we have not yet reached the SCTP packet end. SCTP does only do verification on a chunk by chunk basis, as an SCTP packet is nothing more than just a container of a stream of chunks which it eats up one by one. We could run into the case that we receive a packet with a malformed tail, above marked as trailing JUNK. All previous chunks are here goodformed, so the stack will eat up all previous chunks up to this point. In case JUNK does not fit into a chunk header and there are no more other chunks in the input queue, or in case JUNK contains a garbage chunk header, but the encoded chunk length would exceed the skb tail, or we came here from an entirely different scenario and the chunk has pdiscard=1 mark (without having had a flush point), it will happen, that we will excessively queue up the association's output queue (a correct final chunk may then turn it into a response flood when flushing the queue ;)): I ran a simple script with incremental ASCONF serial numbers and could see the server side consuming excessive amount of RAM [before/after: up to 2GB and more]. 
The issue at heart is that the chunk train basically ends with !end_of_packet and !singleton markers and since commit `2e3216cd54` ("sctp: Follow security requirement of responding with 1 packet") therefore preventing an output queue flush point in sctp_do_sm() -> sctp_cmd_interpreter() on the input chunk (chunk = event_arg) even though local_cork is set, but its precedence has changed since then. In the normal case, the last chunk with end_of_packet=1 would trigger the queue flush to accommodate possible outgoing bundling. In the input queue, sctp_inq_pop() seems to do the right thing in terms of discarding invalid chunks. So, above JUNK will not enter the state machine and instead be released and exit the sctp_assoc_bh_rcv() chunk processing loop. It's simply the flush point being missing at loop exit. Adding a try-flush approach on the output queue might not work as the underlying infrastructure might be long gone at this point due to the side-effect interpreter run. One possibility, albeit a bit of a kludge, would be to defer invalid chunk freeing into the state machine in order to possibly trigger packet discards and thus indirectly a queue flush on error. It would surely be better to discard chunks as in the current, perhaps better controlled environment, but going back and forth, it's simply architecturally not possible. I tried various trailing JUNK attack cases and it seems to look good now. Joint work with Vlad Yasevich. Fixes: `2e3216cd54` ("sctp: Follow security requirement of responding with 1 packet") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-14 12:46:22 -04:00
Daniel Borkmann	b69040d8e3	net: sctp: fix panic on duplicate ASCONF chunks When receiving a e.g. semi-good formed connection scan in the form of ... -------------- INIT[ASCONF; ASCONF_ACK] -------------> <----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------ -------------------- COOKIE-ECHO --------------------> <-------------------- COOKIE-ACK --------------------- ---------------- ASCONF_a; ASCONF_b -----------------> ... where ASCONF_a equals ASCONF_b chunk (at least both serials need to be equal), we panic an SCTP server! The problem is that good-formed ASCONF chunks that we reply with ASCONF_ACK chunks are cached per serial. Thus, when we receive a same ASCONF chunk twice (e.g. through a lost ASCONF_ACK), we do not need to process them again on the server side (that was the idea, also proposed in the RFC). Instead, we know it was cached and we just resend the cached chunk instead. So far, so good. Where things get nasty is in SCTP's side effect interpreter, that is, sctp_cmd_interpreter(): While incoming ASCONF_a (chunk = event_arg) is being marked !end_of_packet and !singleton, and we have an association context, we do not flush the outqueue the first time after processing the ASCONF_ACK singleton chunk via SCTP_CMD_REPLY. Instead, we keep it queued up, although we set local_cork to 1. Commit `2e3216cd54` changed the precedence, so that as long as we get bundled, incoming chunks we try possible bundling on outgoing queue as well. Before this commit, we would just flush the output queue. Now, while ASCONF_a's ASCONF_ACK sits in the corked outq, we continue to process the same ASCONF_b chunk from the packet. As we have cached the previous ASCONF_ACK, we find it, grab it and do another SCTP_CMD_REPLY command on it. So, effectively, we rip the chunk->list pointers and requeue the same ASCONF_ACK chunk another time. Since we process ASCONF_b, it's correctly marked with end_of_packet and we enforce an uncork, and thus flush, thus crashing the kernel. 
Fix it by testing if the ASCONF_ACK is currently pending and if that is the case, do not requeue it. When flushing the output queue we may relink the chunk for preparing an outgoing packet, but eventually unlink it when it's copied into the skb right before transmission. Joint work with Vlad Yasevich. Fixes: `2e3216cd54` ("sctp: Follow security requirement of responding with 1 packet") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-14 12:46:22 -04:00
Daniel Borkmann	9de7922bc7	net: sctp: fix skb_over_panic when receiving malformed ASCONF chunks Commit `6f4c618ddb` ("SCTP : Add paramters validity check for ASCONF chunk") added basic verification of ASCONF chunks, however, it is still possible to remotely crash a server by sending a special crafted ASCONF chunk, even up to pre 2.6.12 kernels: skb_over_panic: text:ffffffffa01ea1c3 len:31056 put:30768 head:ffff88011bd81800 data:ffff88011bd81800 tail:0x7950 end:0x440 dev:<NULL> ------------[ cut here ]------------ kernel BUG at net/core/skbuff.c:129! [...] Call Trace: <IRQ> [<ffffffff8144fb1c>] skb_put+0x5c/0x70 [<ffffffffa01ea1c3>] sctp_addto_chunk+0x63/0xd0 [sctp] [<ffffffffa01eadaf>] sctp_process_asconf+0x1af/0x540 [sctp] [<ffffffff8152d025>] ? _read_unlock_bh+0x15/0x20 [<ffffffffa01e0038>] sctp_sf_do_asconf+0x168/0x240 [sctp] [<ffffffffa01e3751>] sctp_do_sm+0x71/0x1210 [sctp] [<ffffffff8147645d>] ? fib_rules_lookup+0xad/0xf0 [<ffffffffa01e6b22>] ? sctp_cmp_addr_exact+0x32/0x40 [sctp] [<ffffffffa01e8393>] sctp_assoc_bh_rcv+0xd3/0x180 [sctp] [<ffffffffa01ee986>] sctp_inq_push+0x56/0x80 [sctp] [<ffffffffa01fcc42>] sctp_rcv+0x982/0xa10 [sctp] [<ffffffffa01d5123>] ? ipt_local_in_hook+0x23/0x28 [iptable_filter] [<ffffffff8148bdc9>] ? nf_iterate+0x69/0xb0 [<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0 [<ffffffff8148bf86>] ? nf_hook_slow+0x76/0x120 [<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0 [<ffffffff81496ded>] ip_local_deliver_finish+0xdd/0x2d0 [<ffffffff81497078>] ip_local_deliver+0x98/0xa0 [<ffffffff8149653d>] ip_rcv_finish+0x12d/0x440 [<ffffffff81496ac5>] ip_rcv+0x275/0x350 [<ffffffff8145c88b>] __netif_receive_skb+0x4ab/0x750 [<ffffffff81460588>] netif_receive_skb+0x58/0x60 This can be triggered e.g., through a simple scripted nmap connection scan injecting the chunk after the handshake, for example, ... 
-------------- INIT[ASCONF; ASCONF_ACK] -------------> <----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------ -------------------- COOKIE-ECHO --------------------> <-------------------- COOKIE-ACK --------------------- ------------------ ASCONF; UNKNOWN ------------------> ... where ASCONF chunk of length 280 contains 2 parameters ... 1) Add IP address parameter (param length: 16) 2) Add/del IP address parameter (param length: 255) ... followed by an UNKNOWN chunk of e.g. 4 bytes. Here, the Address Parameter in the ASCONF chunk is even missing, too. This is just an example and similarly-crafted ASCONF chunks could be used just as well. The ASCONF chunk passes through sctp_verify_asconf() as all parameters passed sanity checks, and after walking, we ended up successfully at the chunk end boundary, and thus may invoke sctp_process_asconf(). Parameter walking is done with WORD_ROUND() to take padding into account. In sctp_process_asconf()'s TLV processing, we may fail in sctp_process_asconf_param() e.g., due to removal of the IP address that is also the source address of the packet containing the ASCONF chunk, and thus we need to add all TLVs after the failure to our ASCONF response to remote via helper function sctp_add_asconf_response(), which basically invokes a sctp_addto_chunk() adding the error parameters to the given skb. When walking to the next parameter this time, we proceed with ... length = ntohs(asconf_param->param_hdr.length); asconf_param = (void *)asconf_param + length; ... instead of the WORD_ROUND()'ed length, thus resulting here in an off-by-one that leads to reading the follow-up garbage parameter length of 12336, and thus throwing an skb_over_panic for the reply when trying to sctp_addto_chunk() next time, which implicitly calls the skb_put() with that length. 
Fix it by using sctp_walk_params() [ which is also used in INIT parameter processing ] macro in the verification *and* in ASCONF processing: it will make sure we don't spill over, that we walk parameters WORD_ROUND()'ed. Moreover, we're being more defensive and guard against unknown parameter types and missized addresses. Joint work with Vlad Yasevich. Fixes: b896b82be4ae ("[SCTP] ADDIP: Support for processing incoming ASCONF_ACK chunks.") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-14 12:46:22 -04:00
Pablo Neira Ayuso	493618a92c	netfilter: nft_compat: fix hook validation for non-base chains Set hook_mask to zero for non-base chains, otherwise people may hit bogus errors from the xt_check_target() and xt_check_match() when validating the uninitialized hook_mask. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-10-14 12:52:40 +02:00
Karl Beldan	c7abf25af0	mac80211: fix typo in starting baserate for rts_cts_rate_idx It affects non-(V)HT rates and can lead to selecting an rts_cts rate that is not a basic rate or way superior to the reference rate (ATM rates[0] used for the 1st attempt of the protected frame data). E.g, assuming drivers register growing (bitrate) sorted tables of ieee80211_rate-s, having : - rates[0].idx == d'2 and basic_rates == b'10100 will select rts_cts idx b'10011 & ~d'(BIT(2)-1), i.e. 1, likewise - rates[0].idx == d'2 and basic_rates == b'10001 will select rts_cts idx b'10000 The first is not a basic rate and the second is > rates[0]. Also, wrt severity of the addressed misbehavior, ATM we only have one rts_cts_rate_idx rather than one per rate table entry, so this idx might still point to bitrates > rates[1..MAX_RATES]. Fixes: `5253ffb8c9` ("mac80211: always pick a basic rate to tx RTS/CTS for pre-HT rates") Cc: stable@vger.kernel.org Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-10-14 11:16:16 +02:00
Andy Shevchenko	5df1415aee	lib80211: remove unused print_ssid() In kernel we have %*pE specifier to print an escaped buffer. All users now switched to that approach. This fixes a bug as well. The current implementation wrongly prints octal numbers: only two first digits are used in case when 3 are required and the rest of the string ends up cut off. Additionally by default the \f, \v, \a, and \e are escaped to their alphabetic representation. It's safe to do since it is currently used for messaging only. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: "John W . Linville" <linville@tuxdriver.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-10-14 02:18:27 +02:00
Rasmus Villemoes	b7a8d756fb	batman-adv: replace strnicmp with strncasecmp The kernel used to contain two functions for length-delimited, case-insensitive string comparison, strnicmp with correct semantics and a slightly buggy strncasecmp. The latter is the POSIX name, so strnicmp was renamed to strncasecmp, and strnicmp made into a wrapper for the new strncasecmp to avoid breaking existing users. To allow the compat wrapper strnicmp to be removed at some point in the future, and to avoid the extra indirection cost, do s/strnicmp/strncasecmp/g. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Marek Lindner <mareklindner@neomailbox.ch> Acked-by: Antonio Quartulli <antonio@meshcoding.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-10-14 02:18:24 +02:00
Rasmus Villemoes	18082746a2	netfilter: replace strnicmp with strncasecmp The kernel used to contain two functions for length-delimited, case-insensitive string comparison, strnicmp with correct semantics and a slightly buggy strncasecmp. The latter is the POSIX name, so strnicmp was renamed to strncasecmp, and strnicmp made into a wrapper for the new strncasecmp to avoid breaking existing users. To allow the compat wrapper strnicmp to be removed at some point in the future, and to avoid the extra indirection cost, do s/strnicmp/strncasecmp/g. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-10-14 02:18:24 +02:00
Pablo Neira Ayuso	7210e4e38f	netfilter: nf_tables: restrict nat/masq expressions to nat chain type This adds the missing validation code to avoid the use of nat/masq from non-nat chains. The validation assumes two possible configuration scenarios: 1) Use of nat from base chain that is not of nat type. Reject this configuration from the nft_*_init() path of the expression. 2) Use of nat from non-base chain. In this case, we have to wait until the non-base chain is referenced by at least one base chain via jump/goto. This is resolved from the nft_*_validate() path which is called from nf_tables_check_loops(). The user gets an -EOPNOTSUPP in both cases. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-10-13 20:42:00 +02:00
Linus Torvalds	77c688ac87	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs updates from Al Viro: "The big thing in this pile is Eric's unmount-on-rmdir series; we finally have everything we need for that. The final piece of prereqs is delayed mntput() - now filesystem shutdown always happens on shallow stack. Other than that, we have several new primitives for iov_iter (Matt Wilcox, culled from his XIP-related series) pushing the conversion to ->read_iter()/ ->write_iter() a bit more, a bunch of fs/dcache.c cleanups and fixes (including the external name refcounting, which gives consistent behaviour of d_move() wrt procfs symlinks for long and short names alike) and assorted cleanups and fixes all over the place. This is just the first pile; there's a lot of stuff from various people that ought to go in this window. Starting with unionmount/overlayfs mess... ;-/" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (60 commits) fs/file_table.c: Update alloc_file() comment vfs: Deduplicate code shared by xattr system calls operating on paths reiserfs: remove pointless forward declaration of struct nameidata don't need that forward declaration of struct nameidata in dcache.h anymore take dname_external() into fs/dcache.c let path_init() failures treated the same way as subsequent link_path_walk() fix misuses of f_count() in ppp and netlink ncpfs: use list_for_each_entry() for d_subdirs walk vfs: move getname() from callers to do_mount() gfs2_atomic_open(): skip lookups on hashed dentry [infiniband] remove pointless assignments gadgetfs: saner API for gadgetfs_create_file() f_fs: saner API for ffs_sb_create_file() jfs: don't hash direct inode [s390] remove pointless assignment of ->f_op in vmlogrdr ->open() ecryptfs: ->f_op is never NULL android: ->f_op is never NULL nouveau: __iomem misannotations missing annotation in fs/file.c fs: namespace: suppress 'may be used uninitialized' warnings ...	2014-10-13 11:28:42 +02:00
Linus Torvalds	5e40d331bd	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security subsystem updates from James Morris. Mostly ima, selinux, smack and key handling updates. * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (65 commits) integrity: do zero padding of the key id KEYS: output last portion of fingerprint in /proc/keys KEYS: strip 'id:' from ca_keyid KEYS: use swapped SKID for performing partial matching KEYS: Restore partial ID matching functionality for asymmetric keys X.509: If available, use the raw subjKeyId to form the key description KEYS: handle error code encoded in pointer selinux: normalize audit log formatting selinux: cleanup error reporting in selinux_nlmsg_perm() KEYS: Check hex2bin()'s return when generating an asymmetric key ID ima: detect violations for mmaped files ima: fix race condition on ima_rdwr_violation_check and process_measurement ima: added ima_policy_flag variable ima: return an error code from ima_add_boot_aggregate() ima: provide 'ima_appraise=log' kernel option ima: move keyring initialization to ima_init() PKCS#7: Handle PKCS#7 messages that contain no X.509 certs PKCS#7: Better handling of unsupported crypto KEYS: Overhaul key identification when searching for asymmetric keys KEYS: Implement binary asymmetric key ID handling ...	2014-10-12 10:13:55 -04:00
Linus Torvalds	ca321885b0	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: "This set fixes a bunch of fallout from the changes that went in during this merge window, particularly: - Fix fsl_pq_mdio (Claudiu Manoil) and fm10k (Pranith Kumar) build failures. - Several networking drivers do atomic_set() on page counts where that's not exactly legal. From Eric Dumazet. - Make __skb_flow_get_ports() work cleanly with unaligned data, from Alexander Duyck. - Fix some kernel-doc buglets in rfkill and netlabel, from Fabian Frederick. - Unbalanced enable_irq_wake usage in bcmgenet and systemport drivers, from Florian Fainelli. - pxa168_eth needs to depend on HAS_DMA, from Geert Uytterhoeven. - Multi-dequeue in the qdisc layer severely bypasses the fairness limits the previous code used to enforce, reintroduce in a way that at the same time doesn't compromise bulk dequeue opportunities. From Jesper Dangaard Brouer. - macvlan receive path unnecessarily hops through a softirq by using netif_rx() instead of netif_receive_skb(). 
From Jason Baron" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (51 commits) net: systemport: avoid unbalanced enable_irq_wake calls net: bcmgenet: avoid unbalanced enable_irq_wake calls net: bcmgenet: fix off-by-one in incrementing read pointer net: fix races in page->_count manipulation mlx4: fix race accessing page->_count ixgbe: fix race accessing page->_count igb: fix race accessing page->_count fm10k: fix race accessing page->_count net/phy: micrel: Add clock support for KSZ8021/KSZ8031 flow-dissector: Fix alignment issue in __skb_flow_get_ports net: filter: fix the comments Documentation: replace __sk_run_filter with __bpf_prog_run macvlan: optimize the receive path macvlan: pass 'bool' type to macvlan_count_rx() drivers: net: xgene: Add 10GbE ethtool support drivers: net: xgene: Add 10GbE support drivers: net: xgene: Preparing for adding 10GbE support dtb: Add 10GbE node to APM X-Gene SoC device tree Documentation: dts: Update section header for APM X-Gene MAINTAINERS: Update APM X-Gene section ...	2014-10-11 21:19:00 -04:00
Linus Torvalds	ef4a48c513	File locking related changes for v3.18 (pile #1) Merge tag 'locks-v3.18-1' of git://git.samba.org/jlayton/linux Pull file locking related changes from Jeff Layton: "This release is a little more busy for file locking changes than the last: - a set of patches from Kinglong Mee to fix the lockowner handling in knfsd - a pile of cleanups to the internal file lease API. This should get us a bit closer to allowing for setlease methods that can block. 
There are some dependencies between mine and Bruce's trees this cycle, and I based my tree on top of the requisite patches in Bruce's tree" * tag 'locks-v3.18-1' of git://git.samba.org/jlayton/linux: (26 commits) locks: fix fcntl_setlease/getlease return when !CONFIG_FILE_LOCKING locks: flock_make_lock should return a struct file_lock (or PTR_ERR) locks: set fl_owner for leases to filp instead of current->files locks: give lm_break a return value locks: __break_lease cleanup in preparation of allowing direct removal of leases locks: remove i_have_this_lease check from __break_lease locks: move freeing of leases outside of i_lock locks: move i_lock acquisition into generic_*_lease handlers locks: define a lm_setup handler for leases locks: plumb a "priv" pointer into the setlease routines nfsd: don't keep a pointer to the lease in nfs4_file locks: clean up vfs_setlease kerneldoc comments locks: generic_delete_lease doesn't need a file_lock at all nfsd: fix potential lease memory leak in nfs4_setlease locks: close potential race in lease_get_mtime security: make security_file_set_fowner, f_setown and __f_setown void return locks: consolidate "nolease" routines locks: remove lock_may_read and lock_may_write lockd: rip out deferred lock handling from testlock codepath NFSD: Get reference of lockowner when coping file_lock ...	2014-10-11 13:21:34 -04:00
Pablo Neira Ayuso	ab2d7251d6	netfilter: missing module license in the nf_reject_ipvX modules [ 23.545204] nf_reject_ipv4: module license 'unspecified' taints kernel. Fixes: `c8d7b98` ("netfilter: move nf_send_resetX() code to nf_reject_ipvX modules") Reported-by: Dave Young <dyoung@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-10-11 14:59:41 +02:00
Eric Dumazet	4c450583d9	net: fix races in page->_count manipulation It is illegal to use atomic_set(&page->_count, ...) even if we 'own' the page. Other entities in the kernel need to use get_page_unless_zero() to get a reference to the page before testing page properties, so we could lose a refcount increment. The only case where it is valid is when page->_count is 0. Fixes: `540eb7bf0b` ("net: Update alloc frag to reduce get/put page usage and recycle pages") Signed-off-by: Eric Dumaze <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-10 15:37:29 -04:00
Alexander Duyck	5af7fb6e3e	flow-dissector: Fix alignment issue in __skb_flow_get_ports This patch addresses a kernel unaligned access bug seen on a sparc64 system with an igb adapter. Specifically the __skb_flow_get_ports was returning a be32 pointer which was then having the value directly returned. In order to prevent this it is actually easier to simply not populate the ports or address values when an skb is not present. In this case the assumption is that the data isn't needed and rather than slow down the faster aligned accesses by making them have to assume the unaligned path on architectures that don't support efficient unaligned access it makes more sense to simply switch off the bits that were copying the source and destination address/port for the case where we only care about the protocol types and lengths which are normally 16 bit fields anyway. Reported-by: David S. Miller <davem@davemloft.net> Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-10 15:33:47 -04:00
Li RongQing	8ea6e345a6	net: filter: fix the comments 1. sk_run_filter has been renamed, sk_filter() is using SK_RUN_FILTER. 2. Remove wrong comments about storing intermediate value. 3. replace sk_run_filter with __bpf_prog_run for check_load_and_stores's comments Cc: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-10 15:11:51 -04:00
Alexei Starovoitov	38b3629adb	net: bpf: fix bpf syscall dependence on anon_inodes minimal configurations where EPOLL, PERF_EVENTS, etc are disabled, but NET is enabled, are failing to build with link error: kernel/built-in.o: In function `bpf_prog_load': syscall.c:(.text+0x3b728): undefined reference to `anon_inode_getfd' fix it by selecting ANON_INODES when NET is enabled Reported-by: Michal Sojka <sojkam1@fel.cvut.cz> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-10 15:02:23 -04:00
David S. Miller	7b6fa1eef6	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter fixes for net-next This batch contains two fixes for what you have in your net-next, they are: 1) Remove nf_send_reset6() from header file. This function now resides in the nf_reject_ipv6 module. Reported by Eric Dumazet. 2) Fix wrong NFT_REJECT_ICMPX_MAX definition and adjust code to fix errors reported by Dan Carpenter's static analysis tools. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-10 15:01:09 -04:00
David S. Miller	4511a4a50e	Merge tag 'master-2014-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next John W. Linville says: ==================== pull request: wireless-next 2014-10-09 Please pull this batch of fixes intended for the 3.18 stream! Andrea Merello makes rtl818x_pci use a more reasonable transmission rate for HW generated frames. Fabian Frederick tweaks some kernel-doc bits to avoid warnings. Larry Finger corrects a possible unaligned access in the rtlwifi code. Marek Puzyniak avoids a kernel panic in ath9k_hw_reset. Sujith Manoharan goes for the hat trick -- he fixes a smatch warning in the shared ath code, he fixes a crash in ath9k, and he corrects a sequence number assignment problem in ath9k too. For ease of merging, I pulled the last bits of the wireless tree as well... Please let me know if there are problems! ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-10 14:49:55 -04:00
Karl Beldan	2a84ee8625	cfg80211: set the rates mask in connection probes over specified freq ATM, specifying the frequency when connecting sends a void 'supported rates' EID. Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com> [fix memory leak in error path] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-10-10 17:11:13 +02:00
Michal Kazior	486cf4c08f	mac80211: enable DFS with channel contexts It is okay to enable DFS for channel contexts based drivers as long as no combination advertises radar detection and multi-channel operation at the same time. Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-10-10 17:08:33 +02:00
Linus Torvalds	c798360cd1	Merge branch 'for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu Pull percpu updates from Tejun Heo: "A lot of activities on percpu front. Notable changes are... - percpu allocator now can take @gfp. If @gfp doesn't contain GFP_KERNEL, it tries to allocate from what's already available to the allocator and a work item tries to keep the reserve around certain level so that these atomic allocations usually succeed. This will replace the ad-hoc percpu memory pool used by blk-throttle and also be used by the planned blkcg support for writeback IOs. Please note that I noticed a bug in how @gfp is interpreted while preparing this pull request and applied the fix `6ae833c7fe` ("percpu: fix how @gfp is interpreted by the percpu allocator") just now. - percpu_ref now uses longs for percpu and global counters instead of ints. It leads to more sparse packing of the percpu counters on 64bit machines but the overhead should be negligible and this allows using percpu_ref for refcnting pages and in-memory objects directly. - The switching between percpu and single counter modes of a percpu_ref is made independent of putting the base ref and a percpu_ref can now optionally be initialized in single or killed mode. This allows avoiding percpu shutdown latency for cases where the refcounted objects may be synchronously created and destroyed in rapid succession with only a fraction of them reaching fully operational status (SCSI probing does this when combined with blk-mq support). It's also planned to be used to implement forced single mode to detect underflow more timely for debugging. There's a separate branch percpu/for-3.18-consistent-ops which cleans up the duplicate percpu accessors. That branch causes a number of conflicts with s390 and other trees. 
I'll send a separate pull request w/ resolutions once other branches are merged" * 'for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (33 commits) percpu: fix how @gfp is interpreted by the percpu allocator blk-mq, percpu_ref: start q->mq_usage_counter in atomic mode percpu_ref: make INIT_ATOMIC and switch_to_atomic() sticky percpu_ref: add PERCPU_REF_INIT_* flags percpu_ref: decouple switching to percpu mode and reinit percpu_ref: decouple switching to atomic mode and killing percpu_ref: add PCPU_REF_DEAD percpu_ref: rename things to prepare for decoupling percpu/atomic mode switch percpu_ref: replace pcpu_ prefix with percpu_ percpu_ref: minor code and comment updates percpu_ref: relocate percpu_ref_reinit() Revert "blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during probe" Revert "percpu: free percpu allocation info for uniprocessor system" percpu-refcount: make percpu_ref based on longs instead of ints percpu-refcount: improve WARN messages percpu: fix locking regression in the failure path of pcpu_alloc() percpu-refcount: add @gfp to percpu_ref_init() proportions: add @gfp to init functions percpu_counter: add @gfp to percpu_counter_init() percpu_counter: make percpu_counters_lock irq-safe ...	2014-10-10 07:26:02 -04:00
Jesper Dangaard Brouer	b8358d70ce	net_sched: restore qdisc quota fairness limits after bulk dequeue Restore the quota fairness between qdiscs that we broke with commit `5772e9a346` ("qdisc: bulk dequeue support for qdiscs with TCQ_F_ONETXQUEUE"). Before that commit, the quota in __qdisc_run() was counted in packets, as dequeue_skb() would only dequeue a single packet; that assumption broke with bulk dequeue. We choose not to account for the number of packets inside the TSO/GSO packets (accessible via "skb_gso_segs"), as the previous fairness also had this "defect". Thus, a GSO/TSO packet counts as a single packet. Furthermore, we choose to slack on accuracy, by allowing a bulk dequeue try_bulk_dequeue_skb() to exceed the "packets" limit, only limited by the BQL bytelimit. This is done because BQL prefers to get its full budget for appropriate feedback from TX completion. In future, we might consider reworking this further and, if it allows, switch to a time-based model, as suggested by Eric. Right now, we only restore old semantics. Joint work with Eric, Hannes, Daniel and Jesper. Hannes wrote the first patch in cooperation with Daniel and Jesper. Eric rewrote the patch. Fixes: `5772e9a346` ("qdisc: bulk dequeue support for qdiscs with TCQ_F_ONETXQUEUE") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-09 19:12:26 -04:00
Masanari Iida	de3f0d0eff	net: Missing @ before descriptions cause make xmldocs warning This patch fixes the following warnings. Warning(.//net/core/skbuff.c:4142): No description found for parameter 'header_len' Warning(.//net/core/skbuff.c:4142): No description found for parameter 'data_len' Warning(.//net/core/skbuff.c:4142): No description found for parameter 'max_page_order' Warning(.//net/core/skbuff.c:4142): No description found for parameter 'errcode' Warning(.//net/core/skbuff.c:4142): No description found for parameter 'gfp_mask' Actually the descriptions exist, but the "@" in front is missing. This problem started to happen when the following commit was merged into Linus's tree during the 3.18-rc1 merge period. commit `2e4e441071` net: add alloc_skb_with_frags() helper Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-09 18:57:14 -04:00
Fabian Frederick	408b18abf6	mac80211: directly return ieee80211_vif_use_reserved_context() No need to store ieee80211_vif_use_reserved_context() result and test it before returning. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-10-09 20:54:04 +02:00
Luciano Coelho	0f791eb47f	mac80211: allow channel switch with multiple channel contexts Channel switch with multiple channel contexts should now work fine. Remove check that disallows switches when multiple contexts are in use. Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-10-09 11:30:09 +02:00
Luciano Coelho	0c21e6320f	mac80211: wait for the first beacon on the new channel after CSA Instead of immediately reopening the queues (in case of block_tx), calling the post_channel_switch operation and sending the notification, wait for the first beacon on the new channel. This makes sure that we don't lose packets if the AP/GO is not on the new channel yet. Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-10-09 11:30:09 +02:00
Luciano Coelho	f1d65583bc	mac80211: add post_channel_switch driver operation As a counterpart to the pre_channel_switch operation, add a post_channel_switch operation. This allows the drivers to go back to a normal configuration after the channel switch is completed. Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-10-09 11:30:09 +02:00
Luciano Coelho	6d027bcc8a	mac80211: add pre_channel_switch driver operation Some drivers may need to prepare for a channel switch also when it is initiated from the remote side (eg. station, P2P client). To make this possible, add a generic callback that can be called for all interface types. Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-10-09 11:30:08 +02:00
Luciano Coelho	e9a21949b7	mac80211: add extended channel switching capability if the driver supports CSA The Extended Channel Switching capability bit in the extended capabilities element must be set if the driver supports CSA on non-beaconing interfaces. Since this capability needs to be set during driver registration, the extended_capabiliities global variable needs to be moved to the local structure so that it can be modified. Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-10-09 11:30:08 +02:00
Luciano Coelho	2ba45384e5	mac80211: add device_timestamp to the ieee80211_channel_switch struct Some devices may need the device timestamp in order to synchronize the channel switch. To pass this value back to the driver, add it to the channel switch structure and copy the device_timestamp value received in the rx info structure into it. Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-10-09 11:30:08 +02:00
Luciano Coelho	252e07ca5f	nl80211: sanity check the channel switch counter value The nl80211 channel switch count attribute (NL80211_ATTR_CH_SWITCH_COUNT) is specified as u32, but the specification uses u8 for the counter. To make sure strange things don't happen without informing the user, sanity check the value and return -EINVAL if it doesn't fit in u8. Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-10-09 11:25:11 +02:00
Henning Rogge	a2db2ed3fb	mac80211: implement cfg80211_ops to query mesh proxy path table Implement get_mpp and dump_mpp cfg80211_ops to export the content of the 802.11s mesh proxy path table to userspace. Signed-off-by: Henning Rogge <henning.rogge@fkie.fraunhofer.de> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-10-09 11:19:07 +02:00
Henning Rogge	66be7d2bcd	cfg80211: add ops to query mesh proxy path table Add two new cfg80211 operations for querying a table with proxied mesh paths. Signed-off-by: Henning Rogge <henning.rogge@fkie.fraunhofer.de> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-10-09 11:19:07 +02:00
Fabian Frederick	bc37b16870	net: rfkill: kernel-doc warning fixes Correct the kernel-doc, the parameter is called "blocked" not "state". Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-10-09 11:16:15 +02:00
Luciano Coelho	c12bc4885f	mac80211: return the vif's chandef in ieee80211_cfg_get_channel() The chandef of the channel context a vif is using may be different than the chandef of the vif itself. For instance, the bandwidth used by the vif may be narrower than the one configured in the channel context. To avoid confusion, return the vif's chandef in ieee80211_cfg_get_channel() instead of the chandef of the channel context. Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-10-09 11:01:58 +02:00
Karl Beldan	cc61d8df0a	mac80211: minstrel_ht: fix MCS_GROUP_RATES usage Commit `5935839ad7` ("mac80211: improve minstrel_ht rate sorting by throughput & probability") replaced the constant 8 with MCS_GROUP_RATES when getting the number of streams of an HT MCS. See commit `7a5e3fa2c8` ("mac80211: minstrel_ht: replace some occurences of MCS_GROUP_RATES"). Fixes: `5935839ad7` ("mac80211: improve minstrel_ht rate sorting by throughput & probability") Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-10-09 10:51:19 +02:00
Liad Kaufman	8975ae88e1	mac80211: fix warning on htmldocs for last_tdls_pkt_time Forgot to add an entry to the struct description of sta_info. Signed-off-by: Liad Kaufman <liad.kaufman@intel.com> Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-10-09 10:33:29 +02:00
Al Viro	24dff96a37	fix misuses of f_count() in ppp and netlink we used to check for "nobody else could start doing anything with that opened file" by checking that refcount was 2 or less - one for descriptor table and one we'd acquired in fget() on the way to wherever we are. That was race-prone (somebody else might have had a reference to descriptor table and do fget() just as we'd been checking) and it had become flat-out incorrect back when we switched to fget_light() on those codepaths - unlike fget(), it doesn't grab an extra reference unless the descriptor table is shared. The same change allowed a race-free check, though - we are safe exactly when refcount is less than 2. It was a long time ago; pre-2.6.12 for ioctl() (the codepath leading to ppp one) and 2.6.17 for sendmsg() (netlink one). OTOH, netlink hadn't grown that check until 3.9 and ppp used to live in drivers/net, not drivers/net/ppp until 3.1. The bug existed well before that, though, and the same fix used to apply in old location of file. Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-10-09 02:39:17 -04:00
Fabian Frederick	59f35b810e	netlabel: kernel-doc warning fix no secid argument in netlbl_cfg_unlbl_static_del Signed-off-by: Fabian Frederick <fabf@skynet.be> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-09 01:40:05 -04:00
Linus Torvalds	35a9ad8af0	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Pull networking updates from David Miller: "Most notable changes in here: 1) By far the biggest accomplishment, thanks to a large range of contributors, is the addition of multi-send for transmit. This is the result of discussions back in Chicago, and the hard work of several individuals. Now, when the ->ndo_start_xmit() method of a driver sees skb->xmit_more as true, it can choose to defer the doorbell telling the driver to start processing the new TX queue entires. skb->xmit_more means that the generic networking is guaranteed to call the driver immediately with another SKB to send. There is logic added to the qdisc layer to dequeue multiple packets at a time, and the handling mis-predicted offloads in software is now done with no locks held. Finally, pktgen is extended to have a "burst" parameter that can be used to test a multi-send implementation. Several drivers have xmit_more support: i40e, igb, ixgbe, mlx4, virtio_net Adding support is almost trivial, so export more drivers to support this optimization soon. I want to thank, in no particular or implied order, Jesper Dangaard Brouer, Eric Dumazet, Alexander Duyck, Tom Herbert, Jamal Hadi Salim, John Fastabend, Florian Westphal, Daniel Borkmann, David Tat, Hannes Frederic Sowa, and Rusty Russell. 2) PTP and timestamping support in bnx2x, from Michal Kalderon. 3) Allow adjusting the rx_copybreak threshold for a driver via ethtool, and add rx_copybreak support to enic driver. From Govindarajulu Varadarajan. 4) Significant enhancements to the generic PHY layer and the bcm7xxx driver in particular (EEE support, auto power down, etc.) from Florian Fainelli. 5) Allow raw buffers to be used for flow dissection, allowing drivers to determine the optimal "linear pull" size for devices that DMA into pools of pages. The objective is to get exactly the necessary amount of headers into the linear SKB area pre-pulled, but no more. 
The new interface drivers use is eth_get_headlen(). From WANG Cong, with driver conversions (several had their own by-hand duplicated implementations) by Alexander Duyck and Eric Dumazet. 6) Support checksumming more smoothly and efficiently for encapsulations, and add "foo over UDP" facility. From Tom Herbert. 7) Add Broadcom SF2 switch driver to DSA layer, from Florian Fainelli. 8) eBPF now can load programs via a system call and has an extensive testsuite. Alexei Starovoitov and Daniel Borkmann. 9) Major overhaul of the packet scheduler to use RCU in several major areas such as the classifiers and rate estimators. From John Fastabend. 10) Add driver for Intel FM10000 Ethernet Switch, from Alexander Duyck. 11) Rearrange TCP_SKB_CB() to reduce cache line misses, from Eric Dumazet. 12) Add Datacenter TCP congestion control algorithm support, From Florian Westphal. 13) Reorganize sk_buff so that __copy_skb_header() is significantly faster. From Eric Dumazet" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1558 commits) netlabel: directly return netlbl_unlabel_genl_init() net: add netdev_txq_bql_{enqueue, complete}_prefetchw() helpers net: description of dma_cookie cause make xmldocs warning cxgb4: clean up a type issue cxgb4: potential shift wrapping bug i40e: skb->xmit_more support net: fs_enet: Add NAPI TX net: fs_enet: Remove non NAPI RX r8169:add support for RTL8168EP net_sched: copy exts->type in tcf_exts_change() wimax: convert printk to pr_foo() af_unix: remove 0 assignment on static ipv6: Do not warn for informational ICMP messages, regardless of type. Update Intel Ethernet Driver maintainers list bridge: Save frag_max_size between PRE_ROUTING and POST_ROUTING tipc: fix bug in multicast congestion handling net: better IFF_XMIT_DST_RELEASE support net/mlx4_en: remove NETDEV_TX_BUSY 3c59x: fix bad split of cpu_to_le32(pci_map_single()) net: bcmgenet: fix Tx ring priority programming ...	2014-10-08 21:40:54 -04:00
David S. Miller	64b1f00a08	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2014-10-08 16:22:22 -04:00
Fabian Frederick	16b99a4f66	netlabel: directly return netlbl_unlabel_genl_init() No need to store netlbl_unlabel_genl_init result and test it before returning. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-08 16:08:04 -04:00
WANG Cong	5301e3e117	net_sched: copy exts->type in tcf_exts_change() We need to copy exts->type when committing the change, otherwise it would be always 0. This is a quick fix for -net and -stable, for net-next tcf_exts will be removed. Fixes: commit `33be627159` ("net_sched: act: use standard struct list_head") Reported-by: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-08 15:41:27 -04:00
Fabian Frederick	2f29fed3f8	net: rfkill: kernel-doc warning fixes s/state/blocked Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2014-10-08 15:24:15 -04:00
Linus Torvalds	6dea0737bc	Merge branch 'for-3.18' of git://linux-nfs.org/~bfields/linux Pull nfsd updates from Bruce Fields: "Highlights: - support the NFSv4.2 SEEK operation (allowing clients to support SEEK_HOLE/SEEK_DATA), thanks to Anna. - end the grace period early in a number of cases, mitigating a long-standing annoyance, thanks to Jeff - improve SMP scalability, thanks to Trond" * 'for-3.18' of git://linux-nfs.org/~bfields/linux: (55 commits) nfsd: eliminate "to_delegation" define NFSD: Implement SEEK NFSD: Add generic v4.2 infrastructure svcrdma: advertise the correct max payload nfsd: introduce nfsd4_callback_ops nfsd: split nfsd4_callback initialization and use nfsd: introduce a generic nfsd4_cb nfsd: remove nfsd4_callback.cb_op nfsd: do not clear rpc_resp in nfsd4_cb_done_sequence nfsd: fix nfsd4_cb_recall_done error handling nfsd4: clarify how grace period ends nfsd4: stop grace_time update at end of grace period nfsd: skip subsequent UMH "create" operations after the first one for v4.0 clients nfsd: set and test NFSD4_CLIENT_STABLE bit to reduce nfsdcltrack upcalls nfsd: serialize nfsdcltrack upcalls for a particular client nfsd: pass extra info in env vars to upcalls to allow for early grace period end nfsd: add a v4_end_grace file to /proc/fs/nfsd lockd: add a /proc/fs/lockd/nlm_end_grace file nfsd: reject reclaim request when client has already sent RECLAIM_COMPLETE nfsd: remove redundant boot_time parm from grace_done client tracking op ...	2014-10-08 12:51:44 -04:00
Linus Torvalds	25641c0c8d	NFS client updates for Linux 3.18 Highlights include: Stable fixes: - fix an NFSv4.1 state renewal regression - fix open/lock state recovery error handling - fix lock recovery when CREATE_SESSION/SETCLIENTID_CONFIRM fails - fix statd when reconnection fails - Don't wake tasks during connection abort - Don't start reboot recovery if lease check fails - fix duplicate proc entries Features: - pNFS block driver fixes and clean ups from Christoph - More code cleanups from Anna - Improve mmap() writeback performance - Replace use of PF_TRANS with a more generic mechanism for avoiding deadlocks in nfs_release_page -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJUMpFYAAoJEGcL54qWCgDywHYP/A7XNykwOGhoHVP1Cgr3xqoz gVhAw97AEMZE8xSNVEGS++pJTe59JVzsIsYAwdHMwePV33l3zyzYorae6N9p7zWF 0xVaNQ4qNLVhbrNLAoB5KA/c3/jMnNjF5t15+8akZad5pt4kXLlhSKjyVpdEEtJE A0eneXShMYEeLZoOJhpQt5bsw0OZ8YbWWEMjGlDqyeelvV3K1+zfivQOoyX6hS4w XFkPEDmU7zunE/xFP9ZoUaVdLO0TvOWfEZ7STWoHm7NuWfPQiDb9w1mTnuZbZyka ssezoGcitzwsjCcQ5e1iKTOoFRIsm/zYXFQgFQL7VFMBU1Tss9Of8047EyDkqcPF GxctsGg0gQ2FkG7yx7JH7AKpyibOIuByQrQQ916coWSf7K0L4H4Rcky3vryroylP 1e1RI49xu215OTm+dLvlvYCv55bqCrTmaUGImZac18+ixD2eh6MNfW2ubSdxk89L U2rTFV09Bd52N7IQOGQx1FBEI2ZnIFUV4UaFz7v+rGFxOnk6+WYe+iWyb4wC70Yc 8Jh/gTIQDd5aghql3FTieMOyfEvO6Re4pLMXmqEWMAevicx2t8DwkJriRu6X8Iy2 rlDlBPwu5QmRWC20Dc897f0VajwDtwdeB8puod7nobOWzOfx4FrNqLJ+jR3pmHUk 0otvJytqemXt+zkqqHKK =/OQi -----END PGP SIGNATURE----- Merge tag 'nfs-for-3.18-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client updates from Trond Myklebust: "Highlights include: Stable fixes: - fix an NFSv4.1 state renewal regression - fix open/lock state recovery error handling - fix lock recovery when CREATE_SESSION/SETCLIENTID_CONFIRM fails - fix statd when reconnection fails - don't wake tasks during connection abort - don't start reboot recovery if lease check fails - fix duplicate proc entries Features: - pNFS block driver fixes and clean ups from Christoph 
- More code cleanups from Anna - Improve mmap() writeback performance - Replace use of PF_TRANS with a more generic mechanism for avoiding deadlocks in nfs_release_page" * tag 'nfs-for-3.18-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (66 commits) NFSv4.1: Fix an NFSv4.1 state renewal regression NFSv4: fix open/lock state recovery error handling NFSv4: Fix lock recovery when CREATE_SESSION/SETCLIENTID_CONFIRM fails NFS: Fabricate fscache server index key correctly SUNRPC: Add missing support for RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT NFSv3: Fix missing includes of nfs3_fs.h NFS/SUNRPC: Remove other deadlock-avoidance mechanisms in nfs_release_page() NFS: avoid waiting at all in nfs_release_page when congested. NFS: avoid deadlocks with loop-back mounted NFS filesystems. MM: export page_wakeup functions SCHED: add some "wait..on_bit...timeout()" interfaces. NFS: don't use STABLE writes during writeback. NFSv4: use exponential retry on NFS4ERR_DELAY for async requests. rpc: Add -EPERM processing for xs_udp_send_request() rpc: return sent and err from xs_sendpages() lockd: Try to reconnect if statd has moved SUNRPC: Don't wake tasks during connection abort Fixing lease renewal nfs: fix duplicate proc entries pnfs/blocklayout: Fix a 64-bit division/remainder issue in bl_map_stripe ...	2014-10-08 12:49:23 -04:00
Linus Torvalds	28596c9722	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull "trivial tree" updates from Jiri Kosina: "Usual pile from trivial tree everyone is so eagerly waiting for" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits) Remove MN10300_PROC_MN2WS0038 mei: fix comments treewide: Fix typos in Kconfig kprobes: update jprobe_example.c for do_fork() change Documentation: change "&" to "and" in Documentation/applying-patches.txt Documentation: remove obsolete pcmcia-cs from Changes Documentation: update links in Changes Documentation: Docbook: Fix generated DocBook/kernel-api.xml score: Remove GENERIC_HAS_IOMAP gpio: fix 'CONFIG_GPIO_IRQCHIP' comments tty: doc: Fix grammar in serial/tty dma-debug: modify check_for_stack output treewide: fix errors in printk genirq: fix reference in devm_request_threaded_irq comment treewide: fix synchronize_rcu() in comments checkstack.pl: port to AArch64 doc: queue-sysfs: minor fixes init/do_mounts: better syntax description MIPS: fix comment spelling powerpc/simpleboot: fix comment ...	2014-10-07 21:16:26 -04:00
Linus Torvalds	d0cd84817c	dmaengine-3.17 1/ Step down as dmaengine maintainer see commit `08223d80df` "dmaengine maintainer update" 2/ Removal of net_dma, as it has been marked 'broken' since 3.13 (commit `7787380336` "net_dma: mark broken"), without reports of performance regression. 3/ Miscellaneous fixes -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJUKDLKAAoJEB7SkWpmfYgC7wwP/iNHqRjf1suMUTBIF3P6Hgbe VCUwh0IkuujMPDG46WRn6cYzarRxVPLoGaLHLPszgjI6pmGPVv19wqeDOlUxtcmr 0iQWEWv/zqseaAIW+4gj/WYCyMgKil49EUBJKCZCfNmIaad+e0pr8f0uE5yOkHPM tqWoZERu9A4dlXGr1TjeOZVzdnPrCt92MrLDN6ZZ6tMuJaEc5PauaLxKTeGy5fYj UB+k1xJQzECbsYfpB+uCVYl5/qPO1rNyuBYS8THCsW+JYmrbbfH2kkF2lo2FaUpO 8Yd50FtzXHKWwAt7BzfIwU2M7x0wRmryrC/xsQi6M+WmVeHYvvHUIpzaA66xRZ5x fCy3Fu8sEnmnmboAbh2v2c5uTycqRl2xPzbpLAuxglloXIxzi3ckp6ESF/Z4SldH oxIoEievN7lah3vKgvlHZYcWDzrYr8EKf/EzFe9RqDBQDKtzDzre1H9Uivr387Vm uFUcGHYG/GXuX47C7EUsMtaSW2UEoR2ytw/HR6CKFPTVXwAzEO6kA9vg0EqL0iIq 2wVLgavlZuwegmaUBgnr+bgVZMvVN7OU7fAIRVe5xNO6itrPKvheSlQthmRiiq9C uzOu4PS6PexqzHUNPCcJpCsj+lawmCSrE0bxtPzTA/CQInVgWs219V9+W5Gn/0YA EARN9k6ueX9PZPQrPQLm =BBBv -----END PGP SIGNATURE----- Merge tag 'dmaengine-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/dmaengine Pull dmaengine updates from Dan Williams: "Even though this has fixes marked for -stable, given the size and the needed conflict resolutions this is 3.18-rc1/merge-window material. These patches have been languishing in my tree for a long while. The fact that I do not have the time to do proper/prompt maintenance of this tree is a primary factor in the decision to step down as dmaengine maintainer. That and the fact that the bulk of drivers/dma/ activity is going through Vinod these days. The net_dma removal has not been in -next. It has developed simple conflicts against mainline and net-next (for-3.18). Continuing thanks to Vinod for staying on top of drivers/dma/. 
Summary: 1/ Step down as dmaengine maintainer see commit `08223d80df` "dmaengine maintainer update" 2/ Removal of net_dma, as it has been marked 'broken' since 3.13 (commit `7787380336` "net_dma: mark broken"), without reports of performance regression. 3/ Miscellaneous fixes" * tag 'dmaengine-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/dmaengine: net: make tcp_cleanup_rbuf private net_dma: revert 'copied_early' net_dma: simple removal dmaengine maintainer update dmatest: prevent memory leakage on error path in thread ioat: Use time_before_jiffies() dmaengine: fix xor sources continuation dma: mv_xor: Rename __mv_xor_slot_cleanup() to mv_xor_slot_cleanup() dma: mv_xor: Remove all callers of mv_xor_slot_cleanup() dma: mv_xor: Remove unneeded mv_xor_clean_completed_slots() call ioat: Use pci_enable_msix_exact() instead of pci_enable_msix() drivers: dma: Include appropriate header file in dca.c drivers: dma: Mark functions as static in dma_v3.c dma: mv_xor: Add DMA API error checks ioat/dca: Use dev_is_pci() to check whether it is pci device	2014-10-07 20:39:25 -04:00
Fabian Frederick	28b7deae75	wimax: convert printk to pr_foo() Use current logging functions and add module name prefix. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-07 20:28:44 -04:00
Fabian Frederick	505e907db3	af_unix: remove 0 assignment on static static values are automatically initialized to 0 Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-07 17:03:14 -04:00
David S. Miller	ea85a0a2dc	ipv6: Do not warn for informational ICMP messages, regardless of type. There is no reason to emit a log message for these. Based upon a suggestion from Hannes Frederic Sowa. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>	2014-10-07 16:33:53 -04:00
Herbert Xu	93fdd47e52	bridge: Save frag_max_size between PRE_ROUTING and POST_ROUTING As we may defragment the packet in IPv4 PRE_ROUTING and refragment it after POST_ROUTING we should save the value of frag_max_size. This is still very wrong as the bridge is supposed to leave the packets intact, meaning that the right thing to do is to use the original frag_list for fragmentation. Unfortunately we don't currently guarantee that the frag_list is left untouched throughout netfilter so until this changes this is the best we can do. There is also a spot in FORWARD where it appears that we can forward a packet without going through fragmentation, mark it so that we can fix it later. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-07 15:12:44 -04:00
Jon Maloy	908344cdda	tipc: fix bug in multicast congestion handling One aim of commit `50100a5e39` ("tipc: use pseudo message to wake up sockets after link congestion") was to handle link congestion abatement in a uniform way for both unicast and multicast transmit. However, the latter doesn't work correctly, and has been broken since the referenced commit was applied. If a user now sends a burst of multicast messages that is big enough to cause broadcast link congestion, it will be put to sleep, and not be waked up when the congestion abates as it should be. This has two reasons. First, the flag that is used, TIPC_WAKEUP_USERS, is set correctly, but in the wrong field. Instead of setting it in the 'action_flags' field of the arrival node struct, it is by mistake set in the dummy node struct that is owned by the broadcast link, where it will never tested for. Second, we cannot use the same flag for waking up unicast and multicast users, since the function tipc_node_unlock() needs to pick the wakeup pseudo messages to deliver from different queues. It must hence be able to distinguish between the two cases. This commit solves this problem by adding a new flag TIPC_WAKEUP_BCAST_USERS, and a new function tipc_bclink_wakeup_user(). The latter is to be called by tipc_node_unlock() when the named flag, now set in the correct field, is encountered. v2: using explicit 'unsigned int' declaration instead of 'uint', as per comment from David Miller. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-07 14:50:15 -04:00
John W. Linville	d7ffd588f0	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless	2014-10-07 14:48:29 -04:00
Pablo Neira Ayuso	f0d1f04f0a	netfilter: fix wrong arithmetics regarding NFT_REJECT_ICMPX_MAX NFT_REJECT_ICMPX_MAX should be __NFT_REJECT_ICMPX_MAX - 1. nft_reject_icmp_code() and nft_reject_icmpv6_code() are called from the packet path, so BUG_ON in case we try to access an unknown abstracted ICMP code. This should not happen since we already validate this from nft_reject_{inet,bridge}_init(). Fixes: `51b0a5d` ("netfilter: nft_reject: introduce icmp code abstraction for inet and bridge") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-10-07 20:16:31 +02:00
Eric Dumazet	0287587884	net: better IFF_XMIT_DST_RELEASE support Testing xmit_more support with netperf and connected UDP sockets, I found strange dst refcount false sharing. Current handling of IFF_XMIT_DST_RELEASE is not optimal. Dropping dst in validate_xmit_skb() is certainly too late in case packet was queued by cpu X but dequeued by cpu Y The logical point to take care of drop/force is in __dev_queue_xmit() before even taking qdisc lock. As Julian Anastasov pointed out, need for skb_dst() might come from some packet schedulers or classifiers. This patch adds new helper to cleanly express needs of various drivers or qdiscs/classifiers. Drivers that need skb_dst() in their ndo_start_xmit() should call following helper in their setup instead of the prior : dev->priv_flags &= ~IFF_XMIT_DST_RELEASE; -> netif_keep_dst(dev); Instead of using a single bit, we use two bits, one being eventually rebuilt in bonding/team drivers. The other one, is permanent and blocks IFF_XMIT_DST_RELEASE being rebuilt in bonding/team. Eventually, we could add something smarter later. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-07 13:22:11 -04:00
WANG Cong	02c0fc1b8f	net_sched: fix unused variables in __gnet_stats_copy_basic_cpu() Probably not a big deal, but we'd better just use the one we get in retry loop. Fixes: commit `22e0f8b932` ("net: sched: make bstats per cpu and estimator RCU safe") Reported-by: Joe Perches <joe@perches.com> Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-07 00:10:49 -04:00
Andy Zhou	7c5df8fa19	openvswitch: fix a compilation error when CONFIG_INET is not setW! Fix a openvswitch compilation error when CONFIG_INET is not set: ===================================================== In file included from include/net/geneve.h:4:0, from net/openvswitch/flow_netlink.c:45: include/net/udp_tunnel.h: In function 'udp_tunnel_handle_offloads': >> include/net/udp_tunnel.h:100:2: error: implicit declaration of function 'iptunnel_handle_offloads' [-Werror=implicit-function-declaration] >> return iptunnel_handle_offloads(skb, udp_csum, type); >> ^ >> >> include/net/udp_tunnel.h:100:2: warning: return makes pointer from integer without a cast >> >> cc1: some warnings being treated as errors ===================================================== Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Andy Zhou <azhou@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-07 00:10:49 -04:00
Andy Zhou	0a5d1c55fa	openvswitch: fix a sparse warning Fix a sparse warning introduced by commit: `f579668406` (openvswitch: Add support for Geneve tunneling.) caught by kbuild test robot: reproduce: # apt-get install sparse # git checkout `f579668406` # make ARCH=x86_64 allmodconfig # make C=1 CF=-D__CHECK_ENDIAN__ # # # sparse warnings: (new ones prefixed by >>) # # >> net/openvswitch/vport-geneve.c:109:15: sparse: incorrect type in assignment (different base types) # net/openvswitch/vport-geneve.c:109:15: expected restricted __be16 [usertype] sport # net/openvswitch/vport-geneve.c:109:15: got int # >> net/openvswitch/vport-geneve.c:110:56: sparse: incorrect type in argument 3 (different base types) # net/openvswitch/vport-geneve.c:110:56: expected unsigned short [unsigned] [usertype] value # net/openvswitch/vport-geneve.c:110:56: got restricted __be16 [usertype] sport Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Andy Zhou <azhou@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-07 00:10:48 -04:00
Andy Zhou	42350dcaaf	net: fix a sparse warning Fix a sparse warning introduced by Commit `0b5e8b8eea` (net: Add Geneve tunneling protocol driver) caught by kbuild test robot: # apt-get install sparse # git checkout `0b5e8b8eea` # make ARCH=x86_64 allmodconfig # make C=1 CF=-D__CHECK_ENDIAN__ # # # sparse warnings: (new ones prefixed by >>) # # >> net/ipv4/geneve.c:230:42: sparse: incorrect type in assignment (different base types) # net/ipv4/geneve.c:230:42: expected restricted __be32 [addressable] [assigned] [usertype] s_addr # net/ipv4/geneve.c:230:42: got unsigned long [unsigned] <noident> # Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Andy Zhou <azhou@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-07 00:10:47 -04:00
Hannes Frederic Sowa	327571cb10	ipv6: don't walk node's leaf during serial number update Cc: YOSHIFUJI Hideaki <hideaki@yoshifuji.org> Cc: Martin Lau <kafai@fb.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-07 00:02:30 -04:00
Hannes Frederic Sowa	812918c464	ipv6: make fib6 serial number per namespace Try to reduce number of possible fn_sernum mutation by constraining them to their namespace. Also remove rt_genid which I forgot to remove in `705f1c869d` ("ipv6: remove rt6i_genid"). Cc: YOSHIFUJI Hideaki <hideaki@yoshifuji.org> Cc: Martin Lau <kafai@fb.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-07 00:02:30 -04:00
Hannes Frederic Sowa	c8c4d42a6b	ipv6: only generate one new serial number per fib mutation Cc: YOSHIFUJI Hideaki <hideaki@yoshifuji.org> Cc: Martin Lau <kafai@fb.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-07 00:02:30 -04:00
Hannes Frederic Sowa	42b1870646	ipv6: make rt_sernum atomic and serial number fields ordinary ints Cc: YOSHIFUJI Hideaki <hideaki@yoshifuji.org> Cc: Martin Lau <kafai@fb.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-07 00:02:30 -04:00
Hannes Frederic Sowa	94b2cfe02b	ipv6: minor fib6 cleanups like type safety, bool conversion, inline removal Also renamed struct fib6_walker_t to fib6_walker and enum fib_walk_state_t to fib6_walk_state as recommended by Cong Wang. Cc: Cong Wang <cwang@twopensource.com> Cc: YOSHIFUJI Hideaki <hideaki@yoshifuji.org> Cc: Martin Lau <kafai@fb.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-07 00:02:30 -04:00
Eric Dumazet	1ff0dc9499	net: validate_xmit_vlan() is static Marking this as static allows compiler to inline it. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-06 18:17:17 -04:00
Fabian Frederick	79952bca86	net: fix rcu access on phonet_routes -Add __rcu annotation on table to fix sparse warnings: net/phonet/pn_dev.c:279:25: warning: incorrect type in assignment (different address spaces) net/phonet/pn_dev.c:279:25: expected struct net_device <noident> net/phonet/pn_dev.c:279:25: got void [noderef] <asn:4><noident> net/phonet/pn_dev.c:376:17: warning: incorrect type in assignment (different address spaces) net/phonet/pn_dev.c:376:17: expected struct net_device volatile <noident> net/phonet/pn_dev.c:376:17: got struct net_device [noderef] <asn:4><noident> net/phonet/pn_dev.c:392:17: warning: incorrect type in assignment (different address spaces) net/phonet/pn_dev.c:392:17: expected struct net_device <noident> net/phonet/pn_dev.c:392:17: got void [noderef] <asn:4><noident> -Access table with rcu_access_pointer (fixes the following sparse errors): net/phonet/pn_dev.c:278:25: error: incompatible types in comparison expression (different address spaces) net/phonet/pn_dev.c:391:17: error: incompatible types in comparison expression (different address spaces) Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Fabian Frederick <fabf@skynet.be> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-06 18:16:30 -04:00
John Fastabend	18cdb37ebf	net: sched: do not use tcf_proto 'tp' argument from call_rcu Using the tcf_proto pointer 'tp' from inside the classifiers callback is not valid because it may have been cleaned up by another call_rcu occurring on another CPU. 'tp' is currently being used by tcf_unbind_filter(); in this patch we move instances of tcf_unbind_filter outside of the call_rcu() context. This is safe to do because any running schedulers will either read the valid class field or it will be zeroed. And all schedulers today when the class is 0 do a lookup using the same call used by the tcf_exts_bind(). So even if we have a running classifier hit the null class pointer it will do a lookup and get to the same result. This is particularly fragile at the moment because the only way to verify this is to audit the schedulers call sites. Reported-by: Cong Wang <xiyou.wangconf@gmail.com> Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-06 18:02:33 -04:00
John Fastabend	13990f8156	net: sched: cls_cgroup tear down exts and ematch from rcu callback It is not RCU safe to destroy the action chain while there is a possibility of readers accessing it. Move this code into the rcu callback using the same rcu callback used in the code path to make a change to head. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-06 18:02:32 -04:00
John Fastabend	82a470f111	net: sched: remove tcf_proto from ematch calls This removes the tcf_proto argument from the ematch code paths that only need it to reference the net namespace. This allows simplifying qdisc code paths especially when we need to tear down the ematch from an RCU callback. In this case we cannot guarantee that the tcf_proto structure is still valid. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-06 18:02:32 -04:00
Eric Dumazet	fcbeb976d7	net: introduce netdevice gso_min_segs attribute Some TSO engines might have a too heavy setup cost, that impacts performance on hosts sending small bursts (2 MSS per packet). This patch adds a device gso_min_segs, allowing drivers to set a minimum segment size for TSO packets, according to the NIC performance. Tested on a mlx4 NIC, this allows to get a ~110% increase of throughput when sending 2 MSS per packet. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-06 17:56:28 -04:00
Daniel Borkmann	b47bd8d279	ipv4: igmp: fix v3 general query drop monitor false positive In case we find a general query with non-zero number of sources, we are dropping the skb as it's malformed. RFC3376, section 4.1.8. Number of Sources (N): This number is zero in a General Query or a Group-Specific Query, and non-zero in a Group-and-Source-Specific Query. Therefore, reflect that by using kfree_skb() instead of consume_skb(). Fixes: `d679c5324d` ("igmp: avoid drop_monitor false positives") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-06 17:14:54 -04:00
Eric Dumazet	1255a50554	ethtool: Ethtool parameter to dynamically change tx_copybreak Use new ethtool [sg]et_tunable() to set tx_copybreak (inline threshold). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-06 01:04:16 -04:00
Eric Dumazet	f2600cf02b	net: sched: avoid costly atomic operation in fq_dequeue() Standard qdisc API to setup a timer implies an atomic operation on every packet dequeue : qdisc_unthrottled() It turns out this is not really needed for FQ, as FQ has no concept of global qdisc throttling, being a qdisc handling many different flows, some of them can be throttled, while others are not. Fix is straightforward : add a 'bool throttle' to qdisc_watchdog_schedule_ns(), and remove calls to qdisc_unthrottled() in sch_fq. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-06 00:55:10 -04:00
Eric Dumazet	bec3cfdca3	net: skb_segment() provides list head and tail It's unfortunate we have to walk the skb list again to find the tail after segmentation, even if data is probably hot in cpu caches. skb_segment() can store the tail of the list into segs->prev, and validate_xmit_skb_list() can immediately get the tail. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-06 00:37:30 -04:00
Jesse Gross	f579668406	openvswitch: Add support for Geneve tunneling. The Openvswitch implementation is completely agnostic to the options that are in use and can handle newly defined options without further work. It does this by simply matching on a byte array of options and allowing userspace to setup flows on this array. Signed-off-by: Jesse Gross <jesse@nicira.com> Singed-off-by: Ansis Atteka <aatteka@nicira.com> Signed-off-by: Andy Zhou <azhou@nicira.com> Acked-by: Thomas Graf <tgraf@noironetworks.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-06 00:32:21 -04:00
Jesse Gross	6b205b2ca1	openvswitch: Factor out allocation and verification of actions. As the size of the flow key grows, it can put some pressure on the stack. This is particularly true in ovs_flow_cmd_set(), which needs several copies of the key on the stack. One of those uses is logically separate, so this factors it out to reduce stack pressure and improve readability. Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: Andy Zhou <azhou@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-06 00:32:20 -04:00
Jesse Gross	f0b128c1e2	openvswitch: Wrap struct ovs_key_ipv4_tunnel in a new structure. Currently, the flow information that is matched for tunnels and the tunnel data passed around with packets is the same. However, as additional information is added this is not necessarily desirable, as in the case of pointers. This adds a new structure for tunnel metadata which currently contains only the existing struct. This change is purely internal to the kernel since the current OVS_KEY_ATTR_IPV4_TUNNEL is simply a compressed version of OVS_KEY_ATTR_TUNNEL that is translated at flow setup. Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: Andy Zhou <azhou@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-06 00:32:20 -04:00
Jesse Gross	67fa034194	openvswitch: Add support for matching on OAM packets. Some tunnel formats have mechanisms for indicating that packets are OAM frames that should be handled specially (either as high priority or not forwarded beyond an endpoint). This provides support for allowing those types of packets to be matched. Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: Andy Zhou <azhou@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-06 00:32:20 -04:00
Jesse Gross	0714812134	openvswitch: Eliminate memset() from flow_extract. As new protocols are added, the size of the flow key tends to increase although few protocols care about all of the fields. In order to optimize this for hashing and matching, OVS uses a variable length portion of the key. However, when fields are extracted from the packet we must still zero out the entire key. This is no longer necessary now that OVS implements masking. Any fields (or holes in the structure) which are not part of a given protocol will be by definition not part of the mask and zeroed out during lookup. Furthermore, since masking already uses variable length keys this zeroing operation automatically benefits as well. In principle, the only thing that needs to be done at this point is remove the memset() at the beginning of flow. However, some fields assume that they are initialized to zero, which now must be done explicitly. In addition, in the event of an error we must also zero out corresponding fields to signal that there is no valid data present. These increase the total amount of code but very little of it is executed in non-error situations. Removing the memset() reduces the profile of ovs_flow_extract() from 0.64% to 0.56% when tested with large packets on a 10G link. Suggested-by: Pravin Shelar <pshelar@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: Andy Zhou <azhou@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-06 00:32:20 -04:00
Andy Zhou	0b5e8b8eea	net: Add Geneve tunneling protocol driver This adds a device level support for Geneve -- Generic Network Virtualization Encapsulation. The protocol is documented at http://tools.ietf.org/html/draft-gross-geneve-01 Only protocol layer Geneve support is provided by this driver. Openvswitch can be used for configuring, set up and tear down functional Geneve tunnels. Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: Andy Zhou <azhou@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-06 00:32:20 -04:00
Vlad Yasevich	bdf6fa52f0	sctp: handle association restarts when the socket is closed. Currently association restarts do not take into consideration the state of the socket. When a restart happens, the current association simply transitions into established state. This creates a condition where a remote system, through the restart procedure, may create a local association that is in no way reachable by the user. The conditions to trigger this are as follows: 1) Remote does not acknowledge some data causing data to remain outstanding. 2) Local application calls close() on the socket. Since data is still outstanding, the association is placed in SHUTDOWN_PENDING state. However, the socket is closed. 3) The remote tries to create a new association, triggering a restart on the local system. The association moves from SHUTDOWN_PENDING to ESTABLISHED. At this point, it is no longer reachable by any socket on the local system. This patch addresses the above situation by moving the newly ESTABLISHED association into SHUTDOWN-SENT state and bundling a SHUTDOWN after the COOKIE-ACK chunk. This way, the restarted association immediately enters the shutdown procedure and forces the termination of the unreachable association. Reported-by: David Laight <David.Laight@aculab.com> Signed-off-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-06 00:21:45 -04:00
David S. Miller	a4b4a2b7f9	Merge tag 'master-2014-10-02' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next John W. Linville says: ==================== pull request: wireless-next 2014-10-03 Please pull tihs batch of updates intended for the 3.18 stream! For the iwlwifi bits, Emmanuel says: "I have here a few things that depend on the latest mac80211's changes: RRM, TPC, Quiet Period etc... Eyal keeps improving our rate control and we have a new device ID. This last patch should probably have gone to wireless.git, but at that stage, I preferred to send it to -next and CC stable." For (most of) the Atheros bits, Kalle says: "The only new feature is testmode support from me. Ben added a new method to crash the firmware with an assert for debug purposes. As usual, we have lots of smaller fixes from Michal. Matteo fixed a Kconfig dependency with debugfs. I fixed some warnings recently added to checkpatch." For the NFC bits, Samuel says: "We've had major updates for TI and ST Microelectronics drivers, and a few NCI related changes. For TI's trf7970a driver: - Target mode support for trf7970a - Suspend/resume support for trf7970a - DT properties additions to handle different quirks - A bunch of fixes for smartphone IOP related issues For ST Microelectronics' ST21NFCA and ST21NFCB drivers: - ISO15693 support for st21nfcb - checkpatch and sparse related warning fixes - Code cleanups and a few minor fixes Finally, Marvell added ISO15693 support to the NCI stack, together with a couple of NCI fixes." For the Bluetooth bits, Johan says: "This 3.18 pull request replaces the one I did on Monday ("bluetooth-next 2014-09-22", which hasn't been pulled yet). 
The additions since the last request are: - SCO connection fix for devices not supporting eSCO - Cleanups regarding the SCO establishment logic - Remove unnecessary return value from logging functions - Header compression fix for 6lowpan - Cleanups to the ieee802154/mrf24j40 driver Here's a copy from previous request that this one replaces: ' Here are some more patches for 3.18. They include various fixes to the btusb HCI driver, a fix for LE SMP, as well as adding Jukka to the MAINTAINERS file for generic 6LoWPAN (as requested by Alexander Aring). I've held on to this pull request a bit since we were waiting for a SCO related fix to get sorted out first. However, since the merge window is getting closer I decided not to wait for it. If we do get the fix sorted out there'll probably be a second small pull request later this week. '" And, "Unless 3.17 gets delayed this will probably be our last -next pull request for 3.18. We've got: - New Marvell hardware support - Multicast support for 6lowpan - Several 6lowpan fixes & cleanups - Fix for a (false-positive) lockdep warning in L2CAP - Minor btusb cleanup" On top of all that comes the usual sort of updates to ath5k, ath9k, ath10k, brcmfmac, mwifiex, and wil6210. This time around there are also a number of rtlwifi updates to enable some new hardware and to reconcile the in-kernel drivers with some newer releases of the Realtek vendor drivers. Also of note is some device tree work for the bcma bus. Please let me know if there are problems! ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-05 21:34:39 -04:00
David S. Miller	61b37d2f54	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains another batch with Netfilter/IPVS updates for net-next, they are: 1) Add abstracted ICMP codes to the nf_tables reject expression. We introduce four reasons to reject using ICMP that overlap in IPv4 and IPv6 from the semantic point of view. This should simplify the maintainance of dual stack rule-sets through the inet table. 2) Move nf_send_reset() functions from header files to per-family nf_reject modules, suggested by Patrick McHardy. 3) We have to use IS_ENABLED(CONFIG_BRIDGE_NETFILTER) everywhere in the code now that br_netfilter can be modularized. Convert remaining spots in the network stack code. 4) Use rcu_barrier() in the nf_tables module removal path to ensure that we don't leave object that are still pending to be released via call_rcu (that may likely result in a crash). 5) Remove incomplete arch 32/64 compat from nft_compat. The original (bad) idea was to probe the word size based on the xtables match/target info size, but this assumption is wrong when you have to dump the information back to userspace. 6) Allow to filter from prerouting and postrouting in the nf_tables bridge. In order to emulate the ebtables NAT chains (which are actually simple filter chains with no special semantics), we have support filtering from this hooks too. 7) Add explicit module dependency between xt_physdev and br_netfilter. This provides a way to detect if the user needs br_netfilter from the configuration path. This should reduce the breakage of the br_netfilter modularization. 8) Cleanup coding style in ip_vs.h, from Simon Horman. 9) Fix crash in the recently added nf_tables masq expression. We have to register/unregister the notifiers to clean up the conntrack table entries from the module init/exit path, not from the rule addition / deletion path. 
From Arturo Borrero. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-05 21:32:37 -04:00
Vlad Yasevich	5be5a2df40	bridge: Add filtering support for default_pvid Currently when vlan filtering is turned on on the bridge, the bridge will drop all traffic until the user configures the filter. This isn't very nice for ports that don't care about vlans and just want untagged traffic. A concept of a default_pvid was recently introduced. This patch adds filtering support for default_pvid. Now, ports that don't care about vlans and don't define their own filter will belong to the VLAN of the default_pvid and continue to receive untagged traffic. This filtering can be disabled by setting default_pvid to 0. Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-05 21:21:37 -04:00
Vlad Yasevich	3df6bf45ec	bridge: Simplify pvid checks. Currently, if the pvid is not set, we return an illegal vlan value even though the pvid value is set to 0. Since pvid of 0 is currently invalid, just return 0 instead. This makes the current and future checks simpler. Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-05 21:21:36 -04:00
Vlad Yasevich	96a20d9d7f	bridge: Add a default_pvid sysfs attribute This patch allows the user to set and retrieve default_pvid value. A new value can only be stored when vlan filtering is disabled. Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-05 21:21:36 -04:00
Ignacy Gawędzki	34a419d4e2	ematch: Fix early ending of inverted containers. The result of a negated container has to be inverted before checking for early ending. This fixes my previous attempt (`17c9c82326`) to make inverted containers work correctly. Signed-off-by: Ignacy Gawędzki <ignacy.gawedzki@green-communications.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-04 20:49:46 -04:00
John Fastabend	1e203c1a2c	net: sched: suspicious RCU usage in qdisc_watchdog Suspicious RCU usage in qdisc_watchdog call needs to be done inside rcu_read_lock/rcu_read_unlock. And then Qdisc destroy operations need to ensure timer is cancelled before removing qdisc structure. [ 3992.191339] =============================== [ 3992.191340] [ INFO: suspicious RCU usage. ] [ 3992.191343] 3.17.0-rc6net-next+ #72 Not tainted [ 3992.191345] ------------------------------- [ 3992.191347] include/net/sch_generic.h:272 suspicious rcu_dereference_check() usage! [ 3992.191348] [ 3992.191348] other info that might help us debug this: [ 3992.191348] [ 3992.191351] [ 3992.191351] rcu_scheduler_active = 1, debug_locks = 1 [ 3992.191353] no locks held by swapper/1/0. [ 3992.191355] [ 3992.191355] stack backtrace: [ 3992.191358] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.17.0-rc6net-next+ #72 [ 3992.191360] Hardware name: /DZ77RE-75K, BIOS GAZ7711H.86A.0060.2012.1115.1750 11/15/2012 [ 3992.191362] 0000000000000001 ffff880235803e48 ffffffff8178f92c 0000000000000000 [ 3992.191366] ffff8802322224a0 ffff880235803e78 ffffffff810c9966 ffff8800a5fe3000 [ 3992.191370] ffff880235803f30 ffff8802359cd768 ffff8802359cd6e0 ffff880235803e98 [ 3992.191374] Call Trace: [ 3992.191376] <IRQ> [<ffffffff8178f92c>] dump_stack+0x4e/0x68 [ 3992.191387] [<ffffffff810c9966>] lockdep_rcu_suspicious+0xe6/0x130 [ 3992.191392] [<ffffffff8167213a>] qdisc_watchdog+0x8a/0xb0 [ 3992.191396] [<ffffffff810f93f2>] __run_hrtimer+0x72/0x420 [ 3992.191399] [<ffffffff810f9bcd>] ? hrtimer_interrupt+0x7d/0x240 [ 3992.191403] [<ffffffff816720b0>] ? tc_classify+0xc0/0xc0 [ 3992.191406] [<ffffffff810f9c4f>] hrtimer_interrupt+0xff/0x240 [ 3992.191410] [<ffffffff8109e4a5>] ? 
__atomic_notifier_call_chain+0x5/0x140 [ 3992.191415] [<ffffffff8103577b>] local_apic_timer_interrupt+0x3b/0x60 [ 3992.191419] [<ffffffff8179c2b5>] smp_apic_timer_interrupt+0x45/0x60 [ 3992.191422] [<ffffffff8179a6bf>] apic_timer_interrupt+0x6f/0x80 [ 3992.191424] <EOI> [<ffffffff815ed233>] ? cpuidle_enter_state+0x73/0x2e0 [ 3992.191432] [<ffffffff815ed22e>] ? cpuidle_enter_state+0x6e/0x2e0 [ 3992.191437] [<ffffffff815ed567>] cpuidle_enter+0x17/0x20 [ 3992.191441] [<ffffffff810c0741>] cpu_startup_entry+0x3d1/0x4a0 [ 3992.191445] [<ffffffff81106fc6>] ? clockevents_config_and_register+0x26/0x30 [ 3992.191448] [<ffffffff81033c16>] start_secondary+0x1b6/0x260 Fixes: `b26b0d1e8b` ("net: qdisc: use rcu prefix and silence sparse warnings") Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-04 20:45:54 -04:00
Florian Fainelli	f7d6b96f34	net: dsa: do not call phy_start_aneg Commit `f7f1de51ed` ("net: dsa: start and stop the PHY state machine") add calls to phy_start() in dsa_slave_open() respectively phy_stop() in dsa_slave_close(). We also call phy_start_aneg() in dsa_slave_create(), and this call is messing up with the PHY state machine, since we basically start the auto-negotiation, and later on restart it when calling phy_start(). phy_start() does not currently handle the PHY_FORCING or PHY_AN states properly, but such a fix would be too invasive for this window. Fixes: `f7f1de51ed` ("net: dsa: start and stop the PHY state machine") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-04 20:44:44 -04:00
Vijay Subramanian	c8753d55af	net: Cleanup skb cloning by adding SKB_FCLONE_FREE SKB_FCLONE_UNAVAILABLE has overloaded meaning depending on type of skb. 1: If skb is allocated from head_cache, it indicates fclone is not available. 2: If skb is a companion fclone skb (allocated from fclone_cache), it indicates it is available to be used. To avoid confusion for case 2 above, this patch replaces SKB_FCLONE_UNAVAILABLE with SKB_FCLONE_FREE where appropriate. For fclone companion skbs, this indicates it is free for use. SKB_FCLONE_UNAVAILABLE will now simply indicate skb is from head_cache and cannot / will not have a companion fclone. Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-04 20:34:25 -04:00
Nicolas Dichtel	3be07244b7	ip6_gre: fix flowi6_proto value in xmit path In xmit path, we build a flowi6 which will be used for the output route lookup. We are sending a GRE packet, neither IPv4 nor IPv6 encapsulated packet, thus the protocol should be IPPROTO_GRE. Fixes: `c12b395a46` ("gre: Support GRE over IPv6") Reported-by: Matthieu Ternisien d'Ouville <matthieu.tdo@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-04 20:08:24 -04:00
Tom Herbert	bc1fc390e1	ip_tunnel: Add GUE support This patch allows configuring IPIP, sit, and GRE tunnels to use GUE. This is very similar to fou except that we need to insert the GUE header in addition to the UDP header on transmit. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-03 16:53:33 -07:00
Tom Herbert	37dd024779	gue: Receive side for Generic UDP Encapsulation This patch adds support for receiving GUE packets in the fou module. The fou module now supports direct foo-over-udp (no encapsulation header) and GUE. To support this a type parameter is added to the fou netlink parameters. For a GUE socket we define gue_udp_recv, gue_gro_receive, and gue_gro_complete to handle the specifics of the GUE protocol. Most of the code to manage and configure sockets is common with the fou. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-03 16:53:33 -07:00
Tom Herbert	efc98d08e1	fou: eliminate IPv4,v6 specific GRO functions This patch removes fou[46]_gro_receive and fou[46]_gro_complete functions. The v4 or v6 variants were chosen for the UDP offloads based on the address family of the socket; this is neither necessary nor correct. Alternatively, this patch adds is_ipv6 to napi_gro_skb. This is set in udp6_gro_receive and unset in udp4_gro_receive. In fou_gro_receive the value is used to select the correct inet_offloads for the protocol of the outer IP header. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-03 16:53:32 -07:00
Tom Herbert	7371e0221c	ip_tunnel: Account for secondary encapsulation header in max_headroom When adjusting max_header for the tunnel interface based on egress device we need to account for any extra bytes in secondary encapsulation (e.g. FOU). Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-03 16:53:32 -07:00
Eric Dumazet	01291202ed	net: do not export skb_gro_receive() skb_gro_receive() is only called from tcp_gro_receive() which is not in a module. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-03 15:54:30 -07:00
Eric Dumazet	55a93b3ea7	qdisc: validate skb without holding lock Validation of skb can be pretty expensive : GSO segmentation and/or checksum computations. We can do this without holding qdisc lock, so that other cpus can queue additional packets. Trick is that requeued packets were already validated, so we carry a boolean so that sch_direct_xmit() can validate a fresh skb list, or directly use an old one. Tested on 40Gb NIC (8 TX queues) and 200 concurrent flows, 48 threads host. Turning TSO on or off had no effect on throughput, only few more cpu cycles. Lock contention on qdisc lock disappeared. Same if disabling TX checksum offload. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-03 15:36:11 -07:00
Herton R. Krzesinski	593cbb3ec6	net/rds: fix possible double free on sock tear down I got a report of a double free happening at RDS slab cache. One suspicion was that may be somewhere we were doing a sock_hold/sock_put on an already freed sock. Thus after providing a kernel with the following change: static inline void sock_hold(struct sock *sk) { - atomic_inc(&sk->sk_refcnt); + if (!atomic_inc_not_zero(&sk->sk_refcnt)) + WARN(1, "Trying to hold sock already gone: %p (family: %hd)\n", + sk, sk->sk_family); } The warning successfuly triggered: Trying to hold sock already gone: ffff81f6dda61280 (family: 21) WARNING: at include/net/sock.h:350 sock_hold() Call Trace: <IRQ> [<ffffffff8adac135>] :rds:rds_send_remove_from_sock+0xf0/0x21b [<ffffffff8adad35c>] :rds:rds_send_drop_acked+0xbf/0xcf [<ffffffff8addf546>] :rds_rdma:rds_ib_recv_tasklet_fn+0x256/0x2dc [<ffffffff8009899a>] tasklet_action+0x8f/0x12b [<ffffffff800125a2>] __do_softirq+0x89/0x133 [<ffffffff8005f30c>] call_softirq+0x1c/0x28 [<ffffffff8006e644>] do_softirq+0x2c/0x7d [<ffffffff8006e4d4>] do_IRQ+0xee/0xf7 [<ffffffff8005e625>] ret_from_intr+0x0/0xa <EOI> Looking at the call chain above, the only way I think this would be possible is if somewhere we already released the same socket->sock which is assigned to the rds_message at rds_send_remove_from_sock. Which seems only possible to happen after the tear down done on rds_release. rds_release properly calls rds_send_drop_to to drop the socket from any rds_message, and some proper synchronization is in place to avoid race with rds_send_drop_acked/rds_send_remove_from_sock. However, I still see a very narrow window where it may be possible we touch a sock already released: when rds_release races with rds_send_drop_acked, we check RDS_MSG_ON_CONN to avoid cleanup on the same rds_message, but in this specific case we don't clear rm->m_rs. 
In this case, it seems we could then go on at rds_send_drop_to and after it returns, the sock is freed by last sock_put on rds_release, with concurrently we being at rds_send_remove_from_sock; then at some point in the loop at rds_send_remove_from_sock we process an rds_message which didn't have rm->m_rs unset for a freed sock, and a possible sock_hold on an sock already gone at rds_release happens. This hopefully address the described condition above and avoids a double free on "second last" sock_put. In addition, I removed the comment about socket destruction on top of rds_send_drop_acked: we call rds_send_drop_to in rds_release and we should have things properly serialized there, thus I can't see the comment being accurate there. Signed-off-by: Herton R. Krzesinski <herton@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-03 12:52:00 -07:00
Herton R. Krzesinski	eb74cc97b8	net/rds: do proper house keeping if connection fails in rds_tcp_conn_connect I see two problems if we consider the sock->ops->connect attempt to fail in rds_tcp_conn_connect. The first issue is that for example we don't remove the previously added rds_tcp_connection item to rds_tcp_tc_list at rds_tcp_set_callbacks, which means that on a next reconnect attempt for the same rds_connection, when rds_tcp_conn_connect is called we can again call rds_tcp_set_callbacks, resulting in duplicated items on rds_tcp_tc_list, leading to list corruption: to avoid this just make sure we call properly rds_tcp_restore_callbacks before we exit. The second issue is that we should also release the sock properly, by setting sock = NULL only if we are returning without error. Signed-off-by: Herton R. Krzesinski <herton@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-03 12:51:59 -07:00
Herton R. Krzesinski	310886dd5f	net/rds: call rds_conn_drop instead of open code it at rds_connect_complete Signed-off-by: Herton R. Krzesinski <herton@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-03 12:51:59 -07:00
Jesper Dangaard Brouer	808e7ac0bd	qdisc: dequeue bulking also pickup GSO/TSO packets The TSO and GSO segmented packets already benefit from bulking on their own. The TSO packets have always taken advantage of only updating the tailptr once for a large packet. The GSO segmented packets have recently taken advantage of bulking xmit_more API, via merge commit `53fda7f7f9` ("Merge branch 'xmit_list'"), specifically via commit `7f2e870f2a` ("net: Move main gso loop out of dev_hard_start_xmit() into helper.") allowing qdisc requeue of remaining list. And via commit `ce93718fb7` ("net: Don't keep around original SKB when we software segment GSO frames."). This patch allows further bulking of TSO/GSO packets together, when dequeueing from the qdisc. Testing: Measuring HoL (Head-of-Line) blocking for TSO and GSO, with netperf-wrapper. Bulking several TSOs shows no performance regressions (requeues were in the area 32 requeues/sec). Bulking several GSOs does show small regression or very small improvement (requeues were in the area 8000 requeues/sec). Using ixgbe 10Gbit/s with GSO bulking, we can measure some additional latency. Base-case, which is "normal" GSO bulking, sees varying high-prio queue delay between 0.38ms to 0.47ms. Bulking several GSOs together results in a stable high-prio queue delay of 0.50ms. Using igb at 100Mbit/s with GSO bulking, shows an improvement. Base-case sees varying high-prio queue delay between 2.23ms to 2.35ms Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-03 12:37:06 -07:00
Jesper Dangaard Brouer	5772e9a346	qdisc: bulk dequeue support for qdiscs with TCQ_F_ONETXQUEUE Based on DaveM's recent API work on dev_hard_start_xmit(), that allows sending/processing an entire skb list. This patch implements qdisc bulk dequeue, by allowing multiple packets to be dequeued in dequeue_skb(). The optimization principle for this is twofold, (1) to amortize locking cost and (2) avoid expensive tailptr update for notifying HW. (1) Several packets are dequeued while holding the qdisc root_lock, amortizing locking cost over several packets. The dequeued SKB list is processed under the TXQ lock in dev_hard_start_xmit(), thus also amortizing the cost of the TXQ lock. (2) Furthermore, dev_hard_start_xmit() will utilize the skb->xmit_more API to delay HW tailptr update, which also reduces the cost per packet. One restriction of the new API is that every SKB must belong to the same TXQ. This patch takes the easy way out, by restricting bulk dequeue to qdiscs with the TCQ_F_ONETXQUEUE flag, which specifies that the qdisc only has a single TXQ attached. Some detail about the flow; dev_hard_start_xmit() will process the skb list, and transmit packets individually towards the driver (see xmit_one()). In case the driver stops midway in the list, the remaining skb list is returned by dev_hard_start_xmit(). In sch_direct_xmit() this returned list is requeued by dev_requeue_skb(). To avoid overshooting the HW limits, which results in requeuing, the patch limits the amount of bytes dequeued, based on the driver's BQL limits. In effect, bulking will only happen for BQL enabled drivers. Small amounts of extra HoL blocking (2x MTU/0.24ms) were measured at 100Mbit/s, with bulking 8 packets, but the oscillating nature of the measurement indicates that something like sched latency might be causing this effect. More comparisons show that this oscillation goes away occasionally. Thus, we disregard this artifact completely and remove any "magic" bulking limit. 
For now, as a conservative approach, stop bulking when seeing TSO and segmented GSO packets. They already benefit from bulking on their own. A followup patch adds this, to allow easier bisect-ability for finding regressions. Joint work with Hannes, Daniel and Florian. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-03 12:37:06 -07:00
Arturo Borrero	8da4cc1b10	netfilter: nft_masq: register/unregister notifiers on module init/exit We have to register the notifiers in the masquerade expression from the module _init and _exit path. This fixes crashes when removing the masquerade rule with no ipt_MASQUERADE support in place (which was masking the problem). Fixes: 9ba1f72 ("netfilter: nf_tables: add new nft_masq expression") Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-10-03 14:24:35 +02:00
David S. Miller	739e4a758e	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/usb/r8152.c net/netfilter/nfnetlink.c Both r8152 and nfnetlink conflicts were simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-02 11:25:43 -07:00
John W. Linville	f6cd071891	Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next	2014-10-02 13:56:19 -04:00
Pablo Neira Ayuso	4b7fd5d97e	netfilter: explicit module dependency between br_netfilter and physdev You can use physdev to match the physical interface enslaved to the bridge device. This information is stored in skb->nf_bridge and it is set up by br_netfilter. So, this is only available when iptables is used from the bridge netfilter path. Since `34666d4` ("netfilter: bridge: move br_netfilter out of the core"), the br_netfilter code is modular. To reduce the impact of this change, we can autoload the br_netfilter if the physdev match is used since we assume that the users need br_netfilter in place. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-10-02 18:30:57 +02:00
Pablo Neira Ayuso	36d2af5998	netfilter: nf_tables: allow to filter from prerouting and postrouting This allows us to emulate the NAT table in ebtables, which is actually a plain filter chain that hooks at prerouting, output and postrouting. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-10-02 18:30:56 +02:00
Pablo Neira Ayuso	756c1b1a7f	netfilter: nft_compat: remove incomplete 32/64 bits arch compat code This code was based on the wrong assumption that you can probe based on the match/target private size that we get from userspace. This doesn't work at all when you have to dump the info back to userspace since you don't know what word size the userspace utility is using. Currently, the extensions that require arch compat are limit match and the ebt_mark match/target. The standard targets are not used by the nft-xt compat layer, so they are not affected. We can work around this limitation with a new revision that uses arch agnostic types. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-10-02 18:30:55 +02:00
Pablo Neira Ayuso	1b1bc49c0f	netfilter: nf_tables: wait for call_rcu completion on module removal Make sure the objects have been released before the nf_tables module is removed. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-10-02 18:30:54 +02:00
Pablo Neira Ayuso	1109a90c01	netfilter: use IS_ENABLED(CONFIG_BRIDGE_NETFILTER) In `34666d4` ("netfilter: bridge: move br_netfilter out of the core"), the bridge netfilter code has been modularized. Use IS_ENABLED instead of ifdef to cover the module case. Fixes: `34666d4` ("netfilter: bridge: move br_netfilter out of the core") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-10-02 18:30:54 +02:00
Pablo Neira Ayuso	c8d7b98bec	netfilter: move nf_send_resetX() code to nf_reject_ipvX modules Move nf_send_reset() and nf_send_reset6() to nf_reject_ipv4 and nf_reject_ipv6 respectively. This code is shared by x_tables and nf_tables. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-10-02 18:30:49 +02:00
Pablo Neira Ayuso	51b0a5d8c2	netfilter: nft_reject: introduce icmp code abstraction for inet and bridge This patch introduces the NFT_REJECT_ICMPX_UNREACH type which provides an abstraction to the ICMP and ICMPv6 codes that you can use from the inet and bridge tables, they are: * NFT_REJECT_ICMPX_NO_ROUTE: no route to host - network unreachable * NFT_REJECT_ICMPX_PORT_UNREACH: port unreachable * NFT_REJECT_ICMPX_HOST_UNREACH: host unreachable * NFT_REJECT_ICMPX_ADMIN_PROHIBITED: administratively prohibited You can still use the specific codes when restricting the rule to match the corresponding layer 3 protocol. I decided to not overload the existing NFT_REJECT_ICMP_UNREACH to have different semantics depending on the table family and to allow the user to specify ICMP family specific codes if they restrict it to the corresponding family. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-10-02 18:29:57 +02:00
Jukka Rissanen	9c238ca8ec	Bluetooth: 6lowpan: Check transmit errors for multicast packets We did not return error if multicast packet transmit failed. This might not be desired so return error also in this case. If there are multiple 6lowpan devices where the multicast packet is sent, then return error even if sending to only one of them fails. Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2014-10-02 13:41:57 +03:00
Jukka Rissanen	d7b6b0a532	Bluetooth: 6lowpan: Return EAGAIN error also for multicast packets Make sure that we are able to return EAGAIN from l2cap_chan_send() even for multicast packets. The error code was ignored unnecessarily. Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2014-10-02 13:41:39 +03:00
Jukka Rissanen	a7807d73a0	Bluetooth: 6lowpan: Avoid memory leak if memory allocation fails If skb_unshare() returns NULL, then we leak the original skb. Solution is to use temp variable to hold the new skb. Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2014-10-02 13:41:32 +03:00
Jukka Rissanen	fc12518a4b	Bluetooth: 6lowpan: Memory leak as the skb is not freed The earlier multicast commit `36b3dd250d` ("Bluetooth: 6lowpan: Ensure header compression does not corrupt IPv6 header") lost one skb free which then caused memory leak. Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2014-10-02 13:41:30 +03:00
Johan Hedberg	02e246aee8	Bluetooth: Fix lockdep warning with l2cap_chan_connect The L2CAP connection's channel list lock (conn->chan_lock) must never be taken while already holding a channel lock (chan->lock) in order to avoid lock-inversion and lockdep warnings. So far the l2cap_chan_connect function has acquired the chan->lock early in the function and then later called l2cap_chan_add(conn, chan) which will try to take the conn->chan_lock. This violates the correct order of taking the locks and may lead to the following type of lockdep warnings: -> #1 (&conn->chan_lock){+.+...}: [<c109324d>] lock_acquire+0x9d/0x140 [<c188459c>] mutex_lock_nested+0x6c/0x420 [<d0aab48e>] l2cap_chan_add+0x1e/0x40 [bluetooth] [<d0aac618>] l2cap_chan_connect+0x348/0x8f0 [bluetooth] [<d0cc9a91>] lowpan_control_write+0x221/0x2d0 [bluetooth_6lowpan] -> #0 (&chan->lock){+.+.+.}: [<c10928d8>] __lock_acquire+0x1a18/0x1d20 [<c109324d>] lock_acquire+0x9d/0x140 [<c188459c>] mutex_lock_nested+0x6c/0x420 [<d0ab05fd>] l2cap_connect_cfm+0x1dd/0x3f0 [bluetooth] [<d0a909c4>] hci_le_meta_evt+0x11a4/0x1260 [bluetooth] [<d0a910eb>] hci_event_packet+0x3ab/0x3120 [bluetooth] [<d0a7cb08>] hci_rx_work+0x208/0x4a0 [bluetooth] CPU0 CPU1 ---- ---- lock(&conn->chan_lock); lock(&chan->lock); lock(&conn->chan_lock); lock(&chan->lock); Before calling l2cap_chan_add() the channel is not part of the conn->chan_l list, and can therefore only be accessed by the L2CAP user (such as l2cap_sock.c). We can therefore assume that it is the responsibility of the user to handle mutual exclusion until this point (which we can see is already true in l2cap_sock.c by it in many places touching chan members without holding chan->lock). Since the hci_conn and by extension l2cap_conn creation in the l2cap_chan_connect() function depend on chan details we cannot simply add a mutex_lock(&conn->chan_lock) in the beginning of the function (since the conn object doesn't yet exist there). 
What we can do however is move the chan->lock taking later into the function where we already have the conn object and can that way take conn->chan_lock first. This patch implements the above strategy and does some other necessary changes such as using __l2cap_chan_add() which assumes conn->chan_lock is held, as well as adding a second needed label so the unlocking happens as it should. Reported-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Tested-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-10-02 10:37:07 +02:00
Alexei Starovoitov	38b2cf2982	net: pktgen: packet bursting via skb->xmit_more This patch demonstrates the effect of delaying update of HW tailptr. (based on earlier patch by Jesper) burst=1 is the default. It sends one packet with xmit_more=false burst=2 sends one packet with xmit_more=true and 2nd copy of the same packet with xmit_more=false burst=3 sends two copies of the same packet with xmit_more=true and 3rd copy with xmit_more=false Performance with ixgbe (usec 30): burst=1 tx:9.2 Mpps burst=2 tx:13.5 Mpps burst=3 tx:14.5 Mpps full 10G line rate Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-01 22:08:12 -04:00
Florian Fainelli	775dd692bd	net: bridge: add a br_set_state helper function In preparation for being able to propagate port states to e.g: notifiers or other kernel parts, do not manipulate the port state directly, but instead use a helper function which will allow us to do a bit more than just setting the state. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-01 22:03:50 -04:00
WANG Cong	a0efb80ce3	net_sched: avoid calling tcf_unbind_filter() in call_rcu callback This fixes the following crash: [ 63.976822] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC [ 63.980094] CPU: 1 PID: 15 Comm: ksoftirqd/1 Not tainted 3.17.0-rc6+ #648 [ 63.980094] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 63.980094] task: ffff880117dea690 ti: ffff880117dfc000 task.ti: ffff880117dfc000 [ 63.980094] RIP: 0010:[<ffffffff817e6d07>] [<ffffffff817e6d07>] u32_destroy_key+0x27/0x6d [ 63.980094] RSP: 0018:ffff880117dffcc0 EFLAGS: 00010202 [ 63.980094] RAX: ffff880117dea690 RBX: ffff8800d02e0820 RCX: 0000000000000000 [ 63.980094] RDX: 0000000000000001 RSI: 0000000000000002 RDI: 6b6b6b6b6b6b6b6b [ 63.980094] RBP: ffff880117dffcd0 R08: 0000000000000000 R09: 0000000000000000 [ 63.980094] R10: 00006c0900006ba8 R11: 00006ba100006b9d R12: 0000000000000001 [ 63.980094] R13: ffff8800d02e0898 R14: ffffffff817e6d4d R15: ffff880117387a30 [ 63.980094] FS: 0000000000000000(0000) GS:ffff88011a800000(0000) knlGS:0000000000000000 [ 63.980094] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 63.980094] CR2: 00007f07e6732fed CR3: 000000011665b000 CR4: 00000000000006e0 [ 63.980094] Stack: [ 63.980094] ffff88011a9cd300 ffffffff82051ac0 ffff880117dffce0 ffffffff817e6d68 [ 63.980094] ffff880117dffd70 ffffffff810cb4c7 ffffffff810cb3cd ffff880117dfffd8 [ 63.980094] ffff880117dea690 ffff880117dea690 ffff880117dfffd8 000000000000000a [ 63.980094] Call Trace: [ 63.980094] [<ffffffff817e6d68>] u32_delete_key_freepf_rcu+0x1b/0x1d [ 63.980094] [<ffffffff810cb4c7>] rcu_process_callbacks+0x3bb/0x691 [ 63.980094] [<ffffffff810cb3cd>] ? rcu_process_callbacks+0x2c1/0x691 [ 63.980094] [<ffffffff817e6d4d>] ? u32_destroy_key+0x6d/0x6d [ 63.980094] [<ffffffff810780a4>] __do_softirq+0x142/0x323 [ 63.980094] [<ffffffff810782a8>] run_ksoftirqd+0x23/0x53 [ 63.980094] [<ffffffff81092126>] smpboot_thread_fn+0x203/0x221 [ 63.980094] [<ffffffff81091f23>] ? 
smpboot_unpark_thread+0x33/0x33 [ 63.980094] [<ffffffff8108e44d>] kthread+0xc9/0xd1 [ 63.980094] [<ffffffff819e00ea>] ? do_wait_for_common+0xf8/0x125 [ 63.980094] [<ffffffff8108e384>] ? __kthread_parkme+0x61/0x61 [ 63.980094] [<ffffffff819e43ec>] ret_from_fork+0x7c/0xb0 [ 63.980094] [<ffffffff8108e384>] ? __kthread_parkme+0x61/0x61 tp could be freed in call_rcu callback too, the order is not guaranteed. John Fastabend says: ==================== It's worth noting why this is safe. Any running schedulers will either read the valid class field or it will be zeroed. All schedulers today when the class is 0 do a lookup using the same call used by the tcf_exts_bind(). So even if we have a running classifier hit the null class pointer it will do a lookup and get to the same result. This is particularly fragile at the moment because the only way to verify this is to audit the schedulers' call sites. ==================== Cc: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-01 22:00:42 -04:00
WANG Cong	6e0565697a	net_sched: fix another crash in cls_tcindex This patch fixes the following crash: [ 166.670795] BUG: unable to handle kernel NULL pointer dereference at (null) [ 166.674230] IP: [<ffffffff814b739f>] __list_del_entry+0x5c/0x98 [ 166.674230] PGD d0ea5067 PUD ce7fc067 PMD 0 [ 166.674230] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC [ 166.674230] CPU: 1 PID: 775 Comm: tc Not tainted 3.17.0-rc6+ #642 [ 166.674230] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 166.674230] task: ffff8800d03c4d20 ti: ffff8800cae7c000 task.ti: ffff8800cae7c000 [ 166.674230] RIP: 0010:[<ffffffff814b739f>] [<ffffffff814b739f>] __list_del_entry+0x5c/0x98 [ 166.674230] RSP: 0018:ffff8800cae7f7d0 EFLAGS: 00010207 [ 166.674230] RAX: 0000000000000000 RBX: ffff8800cba8d700 RCX: ffff8800cba8d700 [ 166.674230] RDX: 0000000000000000 RSI: dead000000200200 RDI: ffff8800cba8d700 [ 166.674230] RBP: ffff8800cae7f7d0 R08: 0000000000000001 R09: 0000000000000001 [ 166.674230] R10: 0000000000000000 R11: 000000000000859a R12: ffffffffffffffe8 [ 166.674230] R13: ffff8800cba8c5b8 R14: 0000000000000001 R15: ffff8800cba8d700 [ 166.674230] FS: 00007fdb5f04a740(0000) GS:ffff88011a800000(0000) knlGS:0000000000000000 [ 166.674230] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 166.674230] CR2: 0000000000000000 CR3: 00000000cf929000 CR4: 00000000000006e0 [ 166.674230] Stack: [ 166.674230] ffff8800cae7f7e8 ffffffff814b73e8 ffff8800cba8d6e8 ffff8800cae7f828 [ 166.674230] ffffffff817caeec 0000000000000046 ffff8800cba8c5b0 ffff8800cba8c5b8 [ 166.674230] 0000000000000000 0000000000000001 ffff8800cf8e33e8 ffff8800cae7f848 [ 166.674230] Call Trace: [ 166.674230] [<ffffffff814b73e8>] list_del+0xd/0x2b [ 166.674230] [<ffffffff817caeec>] tcf_action_destroy+0x4c/0x71 [ 166.674230] [<ffffffff817ca0ce>] tcf_exts_destroy+0x20/0x2d [ 166.674230] [<ffffffff817ec2b5>] tcindex_delete+0x196/0x1b7 struct list_head can not be simply copied and we should always init it. 
Cc: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-01 22:00:42 -04:00
Tom Herbert	54bc9bac30	gre: Set inner protocol in v4 and v6 GRE transmit Call skb_set_inner_protocol to set inner Ethernet protocol to protocol being encapsulated by GRE before tunnel_xmit. This is needed for GSO if UDP encapsulation (fou) is being done. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-01 21:35:51 -04:00
Tom Herbert	077c5a0948	ipip: Set inner IP protocol in ipip Call skb_set_inner_ipproto to set inner IP protocol to IPPROTO_IPV4 before tunnel_xmit. This is needed if UDP encapsulation (fou) is being done. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-01 21:35:51 -04:00
Tom Herbert	469471cdfc	sit: Set inner IP protocol in sit Call skb_set_inner_ipproto to set inner IP protocol to IPPROTO_IPV6 before tunnel_xmit. This is needed if UDP encapsulation (fou) is being done. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-01 21:35:51 -04:00
Tom Herbert	8bce6d7d0d	udp: Generalize skb_udp_segment skb_udp_segment is the function called from udp4_ufo_fragment to segment a UDP tunnel packet. This function currently assumes segmentation is transparent Ethernet bridging (i.e. VXLAN encapsulation). This patch generalizes the function to operate on either Ethertype or IP protocol. The inner_protocol field must be set to the protocol of the inner header. This can now be either an Ethertype or an IP protocol (in a union). A new flag in the skbuff indicates which type is effective. skb_set_inner_protocol and skb_set_inner_ipproto helper functions were added to set the inner_protocol. These functions are called from the point where the tunnel encapsulation is occurring. When skb_udp_tunnel_segment is called, the function to segment the inner packet is selected based on the inner IP or Ethertype. In the case of an IP protocol encapsulation, the function is derived from inet[6]_offloads. In the case of Ethertype, skb->protocol is set to the inner_protocol and skb_mac_gso_segment is called. (GRE currently does this, but it might be possible to lookup the protocol in offload_base and call the appropriate segmentation function directly). Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-01 21:35:51 -04:00
Eric Dumazet	ce1a4ea3f1	net: avoid one atomic operation in skb_clone() Fast clone cloning can actually avoid an atomic_inc(), if we guarantee prior clone_ref value is 1. This requires a change to kfree_skbmem(), to perform the atomic_dec_and_test() on clone_ref before setting fclone to SKB_FCLONE_UNAVAILABLE. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-01 21:27:23 -04:00
Fabian Frederick	e500f488c2	net/dccp/ccid.c: add __init to ccid_activate ccid_activate is only called by __init ccid_initialize_builtins in same module. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-01 18:33:13 -04:00
Fabian Frederick	0c5b8a4629	net/dccp/proto.c: add __init to dccp_mib_init dccp_mib_init is only called by __init dccp_init in same module. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-01 18:33:13 -04:00
Eric Dumazet	d0bf4a9e92	net: cleanup and document skb fclone layout Let's use a proper structure to clearly document and implement skb fast clones. Then, we might experiment more easily alternative layouts. This patch adds a new skb_fclone_busy() helper, used by tcp and xfrm, to stop leaking of implementation details. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-01 16:34:25 -04:00
Yuchung Cheng	b248230c34	tcp: abort orphan sockets stalling on zero window probes Currently we have two different policies for orphan sockets that repeatedly stall on zero window ACKs. If a socket gets a zero window ACK when it is transmitting data, the RTO is used to probe the window. The socket is aborted after roughly tcp_orphan_retries() retries (as in tcp_write_timeout()). But if the socket was idle when it received the zero window ACK, and later wants to send more data, we use the probe timer to probe the window. If the receiver always returns zero window ACKs, icsk_probes keeps getting reset in tcp_ack() and the orphan socket can stall forever until the system reaches the orphan limit (as commented in tcp_probe_timer()). This opens up a simple attack to create lots of hanging orphan sockets to burn the memory and the CPU, as demonstrated in the recent netdev post "TCP connection will hang in FIN_WAIT1 after closing if zero window is advertised." http://www.spinics.net/lists/netdev/msg296539.html This patch follows the design in RTO-based probe: we abort an orphan socket stalling on zero window when the probe timer reaches both the maximum backoff and the maximum RTO. For example, a 100ms RTT connection will time out after roughly 153 seconds (0.3 + 0.6 + .... + 76.8) if the receiver keeps the window shut. If the orphan socket passes this check, but the system already has too many orphans (as in tcp_out_of_resources()), we still abort it but we'll also send an RST packet as the connection may still be active. In addition, we change TCP_USER_TIMEOUT to cover (live or dead) sockets stalled on zero-window probes. This changes the semantics of TCP_USER_TIMEOUT slightly because it previously only applied when the socket had pending transmission. 
Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Reported-by: Andrey Dmitrov <andrey.dmitrov@oktetlabs.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-01 16:27:52 -04:00
Fabian Frederick	cb57659a15	cipso: add __init to cipso_v4_cache_init cipso_v4_cache_init is only called by __init cipso_v4_init Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-01 15:46:20 -04:00
Fabian Frederick	57a02c39c1	inet: frags: add __init to ip4_frags_ctl_register ip4_frags_ctl_register is only called by __init ipfrag_init Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-01 15:46:19 -04:00
Fabian Frederick	47d7a88c18	tcp: add __init to tcp_init_mem tcp_init_mem is only called by __init tcp_init. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-01 15:41:14 -04:00
Thierry Reding	e506d405ac	net: dsa: Fix build warning for !PM_SLEEP The dsa_switch_suspend() and dsa_switch_resume() functions are only used when PM_SLEEP is enabled, so they need #ifdef CONFIG_PM_SLEEP protection to avoid a compiler warning. Signed-off-by: Thierry Reding <treding@nvidia.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-01 15:24:00 -04:00
Eric Dumazet	2c804d0f8f	ipv4: mentions skb_gro_postpull_rcsum() in inet_gro_receive() Proper CHECKSUM_COMPLETE support needs to adjust skb->csum when we remove one header. It's done using skb_gro_postpull_rcsum(). In the case of IPv4, we know that the adjustment is not really needed, because the checksum over IPv4 header is 0. Let's add a comment to ease code comprehension and avoid copy/paste errors. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-01 13:44:05 -04:00
Fabian Frederick	f0a0c1cedf	ieee802154: fix __init functions Commit `3243acd37f` ("ieee802154: add __init to lowpan_frags_sysctl_register") added __init to lowpan_frags_ns_sysctl_register instead of lowpan_frags_sysctl_register Suggested-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-01 02:03:13 -04:00
Trond Myklebust	72c23f0819	Merge branch 'bugfixes' into linux-next * bugfixes: NFSv4.1: Fix an NFSv4.1 state renewal regression NFSv4: fix open/lock state recovery error handling NFSv4: Fix lock recovery when CREATE_SESSION/SETCLIENTID_CONFIRM fails NFS: Fabricate fscache server index key correctly SUNRPC: Add missing support for RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT nfs: fix duplicate proc entries	2014-09-30 17:21:41 -04:00
Li RongQing	a12a601ed1	tcp: Change tcp_slow_start function to return void No caller uses the return value, so make this function return void. Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-30 17:09:16 -04:00
Fabian Frederick	3243acd37f	ieee802154: add __init to lowpan_frags_sysctl_register lowpan_frags_sysctl_register is only called by __init lowpan_net_frag_init (part of the lowpan module). Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-30 17:08:06 -04:00
Fabian Frederick	0d4a2f9a33	irda: add __init to irlan_open irlan_open is only called by __init irlan_init in same module. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-30 17:08:06 -04:00
Florian Westphal	57f5877c11	netfilter: bridge: build br_nf_core only if required Eric reports build failure with CONFIG_BRIDGE_NETFILTER=n We insist to build br_nf_core.o unconditionally, but we must only do so if br_netfilter was enabled, else it fails to build due to functions being defined to empty stubs (and some structure members being defined out). Also, BRIDGE_NETFILTER=y\|m makes no sense when BRIDGE=n. Fixes: `34666d467` (netfilter: bridge: move br_netfilter out of the core) Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-30 14:07:51 -04:00
Hannes Frederic Sowa	705f1c869d	ipv6: remove rt6i_genid Eric Dumazet noticed that all no-nonexthop or no-gateway routes which are already marked DST_HOST (e.g. input routes) will always be invalidated during sk_dst_check. Thus per-socket dst caching absolutely had no effect and early demuxing had no effect. Thus this patch removes rt6i_genid: fn_sernum already gets modified during add operations, so we only must ensure we mutate fn_sernum during ipv6 address remove operations. This is a fairly costly operation, but address removal should not happen that often. Also our mtu update functions do the same and we heard no complaints so far. xfrm policy changes also cause a call into fib6_flush_trees. Also plug a hole in rt6_info (no cacheline changes). I verified via tracing that this change has effect. Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: YOSHIFUJI Hideaki <hideaki@yoshifuji.org> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Cc: Martin Lau <kafai@fb.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-30 14:00:48 -04:00
James Morris	6c8ff877cd	Merge commit 'v3.16' into next	2014-10-01 00:44:04 +10:00
John Fastabend	b0ab6f9275	net: sched: enable per cpu qstats After previous patches to simplify qstats the qstats can be made per cpu with a packed union in Qdisc struct. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-30 01:02:26 -04:00
John Fastabend	6401585366	net: sched: restrict use of qstats qlen This removes the use of qstats->qlen variable from the classifiers and makes it an explicit argument to gnet_stats_copy_queue(). The qlen represents the qdisc queue length and is packed into the qstats at the last moment before passing to user space. By handling it explicitly we avoid, in the percpu stats case, having to figure out which per_cpu variable to put it in. It would probably be best to remove it from qstats completely but qstats is a user space ABI and can't be broken. A future patch could make an internal only qstats structure that would avoid having to allocate an additional u32 variable on the Qdisc struct. This would make the qstats struct 128bits instead of 128+32. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-30 01:02:26 -04:00
John Fastabend	25331d6ce4	net: sched: implement qstat helper routines This adds helpers to manipulate qstats logic and replaces locations that touch the counters directly. This simplifies future patches to push qstats onto per cpu counters. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-30 01:02:26 -04:00
John Fastabend	22e0f8b932	net: sched: make bstats per cpu and estimator RCU safe In order to run qdisc's without locking statistics and estimators need to be handled correctly. To resolve bstats make the statistics per cpu. And because this is only needed for qdiscs that are running without locks which is not the case for most qdiscs in the near future only create percpu stats when qdiscs set the TCQ_F_CPUSTATS flag. Next because estimators use the bstats to calculate packets per second and bytes per second the estimator code paths are updated to use the per cpu statistics. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-30 01:02:26 -04:00
Ignacy Gawędzki	17c9c82326	ematch: Fix matching of inverted containers. Negated expressions and sub-expressions need to have their flags checked for TCF_EM_INVERT and their result negated accordingly. Signed-off-by: Ignacy Gawędzki <ignacy.gawedzki@green-communications.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-29 15:31:29 -04:00
Eric Dumazet	73d3fe6d1c	gro: fix aggregation for skb using frag_list In commit `8a29111c7c` ("net: gro: allow to build full sized skb") I added a regression for linear skb that traditionally force GRO to use the frag_list fallback. Erez Shitrit found that at most two segments were aggregated and the "if (skb_gro_len(p) != pinfo->gso_size)" test was failing. This is because pinfo at this spot still points to the last skb in the chain, instead of the first one, where we find the correct gso_size information. Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: `8a29111c7c` ("net: gro: allow to build full sized skb") Reported-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-29 15:17:59 -04:00
David S. Miller	852248449c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== pull request: netfilter/ipvs updates for net-next The following patchset contains Netfilter/IPVS updates for net-next, most relevantly they are: 1) Four patches to make the new nf_tables masquerading support independent of the x_tables infrastructure. This also resolves a compilation breakage if the masquerade target is disabled but the nf_tables masq expression is enabled. 2) ipset updates via Jozsef Kadlecsik. This includes the addition of the skbinfo extension that allows you to store packet metainformation in the elements. This can be used to fetch and restore this to the packets through the iptables SET target, patches from Anton Danilov. 3) Add the hash:mac set type to ipset, from Jozsef Kadlecsik. 4) Add simple weighted fail-over scheduler via Simon Horman. This provides a fail-over IPVS scheduler (unlike existing load balancing schedulers). Connections are directed to the appropriate server based solely on highest weight value and server availability, patch from Kenny Mathis. 5) Support IPv6 real servers in IPv4 virtual-services and vice versa. Simon Horman informs that the motivation for this is to allow more flexibility in the choice of IP version offered by both virtual-servers and real-servers as they no longer need to match: An IPv4 connection from an end-user may be forwarded to a real-server using IPv6 and vice versa. No ip_vs_sync support yet though. Patches from Alex Gartrell and Julian Anastasov. 6) Add global generation ID to the nf_tables ruleset. When dumping from several different object lists, we need a way to identify that an update has occurred so userspace knows that it needs to refresh its lists. This also includes a new command to obtain the 32-bit generation ID.
The less significant 16 bits of this ID are also exposed through the res_id field in the nfnetlink header to quickly detect the interference and retry when there is no risk of ID wraparound. 7) Move br_netfilter out of the bridge core. The br_netfilter code is built in the bridge core by default. This causes different kinds of problems for people that don't want this: Jesper reported a performance drop due to the unconditional hook registration and I remember reading complaints on netdev from people regarding the unexpected behaviour of our bridging stack when br_netfilter is enabled (fragmentation handling, layer 3 and upper inspection). People that still need this should easily undo the damage by modprobing the new br_netfilter module. 8) Dump the set policy nf_tables that allows set parameterization. So userspace can keep user-defined preferences when saving the ruleset. From Arturo Borrero. 9) Use __seq_open_private() helper function to reduce boilerplate code in x_tables, from Rob Jones. 10) Safer default behaviour in case that you forget to load the protocol tracker. Daniel Borkmann and Florian Westphal detected that if your ruleset is stateful, you allow traffic to at least one single SCTP port and the SCTP protocol tracker is not loaded, then any SCTP traffic may pass through unfiltered. After this patch, the connection tracking classifies SCTP/DCCP/UDPlite/GRE packets as invalid if your kernel has been compiled with support for these modules. ==================== Trivially resolved conflict in include/linux/skbuff.h, Eric moved some netfilter skbuff members around, and the netfilter tree adjusted the ifdef guards for the bridging info pointer. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-29 14:46:53 -04:00
Florian Westphal	735d383117	tcp: change TCP_ECN prefixes to lower case Suggested by Stephen. Also drop inline keyword and let compiler decide. gcc 4.7.3 decides to no longer inline tcp_ecn_check_ce, so split it up. The actual evaluation is not inlined anymore while the ECN_OK test is. Suggested-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-29 14:41:22 -04:00
Florian Westphal	d82bd12298	tcp: move TCP_ECN_create_request out of header After Octavian Purdila's tcp ipv4/ipv6 unification work, this helper only has a single callsite. While at it, convert name to lowercase, suggested by Stephen. Suggested-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-29 14:41:22 -04:00
Steve Wise	7e5be28827	svcrdma: advertise the correct max payload Svcrdma currently advertises 1MB, which is too large. The correct value is the minimum of RPCSVC_MAXPAYLOAD and the max scatter-gather allowed in an NFSRDMA IO chunk * the host page size. This bug is usually benign because the Linux X64 NFSRDMA client correctly limits the payload size to the correct value (64*4096 = 256KB). But if the Linux client is PPC64 with a 64KB page size, then the client will indeed use a payload size that will overflow the server. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-09-29 14:35:18 -04:00
Li RongQing	41c91996d9	tcp: remove unnecessary assignment. The variable i is overwritten with 0 by the following code. Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-29 12:31:12 -04:00
Eric Dumazet	b193722731	net: reorganize sk_buff for faster __copy_skb_header() With proliferation of bit fields in sk_buff, __copy_skb_header() became quite expensive, showing as the most expensive function in a GSO workload. __copy_skb_header() performance is also critical for non GSO TCP operations, as it is used from skb_clone() This patch carefully moves all the fields that were not copied in a separate zone : cloned, nohdr, fclone, peeked, head_frag, xmit_more Then I moved all other fields and all other copied fields in a section delimited by headers_start[0]/headers_end[0] section so that we can use a single memcpy() call, inlined by compiler using long word load/stores. I also tried to make all copies in the natural orders of sk_buff, to help hardware prefetching. I made sure sk_buff size did not change. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-29 12:27:20 -04:00
Jukka Rissanen	156395c998	Bluetooth: 6lowpan: Enable multicast support Set multicast support for 6lowpan network interface. This is needed in every network interface that supports IPv6. Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-09-29 17:06:38 +02:00
Jukka Rissanen	36b3dd250d	Bluetooth: 6lowpan: Ensure header compression does not corrupt IPv6 header If skb is going to multiple destinations, then make sure that we do not overwrite the common IPv6 headers. So before compressing the IPv6 headers, we copy the skb and that is then sent to 6LoWPAN Bluetooth devices. This patch is similar to what was done for IEEE 802.15.4 6LoWPAN in commit `f19f4f9525` ("ieee802154: 6lowpan: ensure header compression does not corrupt ipv6 header") Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-09-29 17:06:38 +02:00
Florian Westphal	db29a9508a	netfilter: conntrack: disable generic tracking for known protocols Given the following iptables ruleset: -P FORWARD DROP -A FORWARD -m sctp --dport 9 -j ACCEPT -A FORWARD -p tcp --dport 80 -j ACCEPT -A FORWARD -p tcp -m conntrack -m state ESTABLISHED,RELATED -j ACCEPT One would assume that this allows SCTP on port 9 and TCP on port 80. Unfortunately, if the SCTP conntrack module is not loaded, this allows all SCTP communication to pass through, i.e. -p sctp -j ACCEPT, which we think is a security issue. This is because on the first SCTP packet on port 9, we create a dummy "generic l4" conntrack entry without any port information (since conntrack doesn't know how to extract this information). All subsequent packets that are unknown will then be in established state since they will fall back to proto_generic and will match the 'generic' entry. Our originally proposed version [1] completely disabled generic protocol tracking, but Jozsef suggests to not track protocols for which a more suitable helper is available, hence we now mitigate the issue for in tree known ct protocol helpers only, so that at least NAT and direction information will still be preserved for others. [1] http://www.spinics.net/lists/netfilter-devel/msg33430.html Joint work with Daniel Borkmann. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-09-29 12:17:49 +02:00
Arturo Borrero	9363dc4b59	netfilter: nf_tables: store and dump set policy We want to know in which cases the user explicitly sets the policy options. In that case, we also want to dump back the info. Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-09-29 11:28:03 +02:00
Jukka Rissanen	59790aa287	Bluetooth: 6lowpan: Make sure skb exists before accessing it We need to make sure that the saved skb exists when resuming or suspending a CoC channel. This can happen if initial credits is 0 when channel is connected. Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-09-29 10:10:02 +02:00
Daniel Borkmann	e3118e8359	net: tcp: add DCTCP congestion control algorithm This work adds the DataCenter TCP (DCTCP) congestion control algorithm [1], which has been first published at SIGCOMM 2010 [2], resp. follow-up analysis at SIGMETRICS 2011 [3] (and also, more recently as an informational IETF draft available at [4]). DCTCP is an enhancement to the TCP congestion control algorithm for data center networks. Typical data center workloads are i.e. i) partition/aggregate (queries; bursty, delay sensitive), ii) short messages e.g. 50KB-1MB (for coordination and control state; delay sensitive), and iii) large flows e.g. 1MB-100MB (data update; throughput sensitive). DCTCP has therefore been designed for such environments to provide/achieve the following three requirements: * High burst tolerance (incast due to partition/aggregate) * Low latency (short flows, queries) * High throughput (continuous data updates, large file transfers) with commodity, shallow buffered switches The basic idea of its design consists of two fundamentals: i) on the switch side, packets are being marked when its internal queue length > threshold K (K is chosen so that a large enough headroom for marked traffic is still available in the switch queue); ii) the sender/host side maintains a moving average of the fraction of marked packets, so each RTT, F is being updated as follows: F := X / Y, where X is # of marked ACKs, Y is total # of ACKs alpha := (1 - g) * alpha + g * F, where g is a smoothing constant The resulting alpha (iow: probability that switch queue is congested) is then being used in order to adaptively decrease the congestion window W: W := (1 - (alpha / 2)) * W The means for receiving marked packets resp. marking them on switch side in DCTCP is the use of ECN. RFC3168 describes a mechanism for using Explicit Congestion Notification from the switch for early detection of congestion, rather than waiting for segment loss to occur. 
However, this method only detects the presence of congestion, not the extent. In the presence of mild congestion, it reduces the TCP congestion window too aggressively and unnecessarily affects the throughput of long flows [4]. DCTCP, as mentioned, enhances Explicit Congestion Notification (ECN) processing to estimate the fraction of bytes that encounter congestion, rather than simply detecting that some congestion has occurred. DCTCP then scales the TCP congestion window based on this estimate [4], thus it can derive multibit feedback from the information present in the single-bit sequence of marks in its control law. And thus act in proportion to the extent of congestion, not its presence. Switches therefore set the Congestion Experienced (CE) codepoint in packets when internal queue lengths exceed threshold K. Resulting, DCTCP delivers the same or better throughput than normal TCP, while using 90% less buffer space. It was found in [2] that DCTCP enables the applications to handle 10x the current background traffic, without impacting foreground traffic. Moreover, a 10x increase in foreground traffic did not cause any timeouts, and thus largely eliminates TCP incast collapse problems. The algorithm itself has already seen deployments in large production data centers since then. We did a long-term stress-test and analysis in a data center, short summary of our TCP incast tests with iperf compared to cubic: This test measured DCTCP throughput and latency and compared it with CUBIC throughput and latency for an incast scenario. In this test, 19 senders sent at maximum rate to a single receiver. The receiver simply ran iperf -s. The senders ran iperf -c <receiver> -t 30. All senders started simultaneously (using local clocks synchronized by ntp). This test was repeated multiple times. Below shows the results from a single test. Other tests are similar. 
(DCTCP results were extremely consistent, CUBIC results show some variance induced by the TCP timeouts that CUBIC encountered.) For this test, we report statistics on the number of TCP timeouts, flow throughput, and traffic latency. 1) Timeouts (total over all flows, and per flow summaries): CUBIC DCTCP Total 3227 25 Mean 169.842 1.316 Median 183 1 Max 207 5 Min 123 0 Stddev 28.991 1.600 Timeout data is taken by measuring the net change in netstat -s "other TCP timeouts" reported. As a result, the timeout measurements above are not restricted to the test traffic, and we believe that it is likely that all of the "DCTCP timeouts" are actually timeouts for non-test traffic. We report them nevertheless. CUBIC will also include some non-test timeouts, but they are dwarfed by bona fide test traffic timeouts for CUBIC. Clearly DCTCP does an excellent job of preventing TCP timeouts. DCTCP reduces timeouts by at least two orders of magnitude and may well have eliminated them in this scenario. 2) Throughput (per flow in Mbps): CUBIC DCTCP Mean 521.684 521.895 Median 464 523 Max 776 527 Min 403 519 Stddev 105.891 2.601 Fairness 0.962 0.999 Throughput data was simply the average throughput for each flow reported by iperf. By avoiding TCP timeouts, DCTCP is able to achieve much better per-flow results. In CUBIC, many flows experience TCP timeouts which makes flow throughput unpredictable and unfair. DCTCP, on the other hand, provides very clean predictable throughput without incurring TCP timeouts. Thus, the standard deviation of CUBIC throughput is dramatically higher than the standard deviation of DCTCP throughput. Mean throughput is nearly identical because even though cubic flows suffer TCP timeouts, other flows will step in and fill the unused bandwidth. Note that this test is something of a best case scenario for incast under CUBIC: it allows other flows to fill in for flows experiencing a timeout.
Under situations where the receiver is issuing requests and then waiting for all flows to complete, flows cannot fill in for timed out flows and throughput will drop dramatically. 3) Latency (in ms): CUBIC DCTCP Mean 4.0088 0.04219 Median 4.055 0.0395 Max 4.2 0.085 Min 3.32 0.028 Stddev 0.1666 0.01064 Latency for each protocol was computed by running "ping -i 0.2 <receiver>" from a single sender to the receiver during the incast test. For DCTCP, "ping -Q 0x6 -i 0.2 <receiver>" was used to ensure that traffic traversed the DCTCP queue and was not dropped when the queue size was greater than the marking threshold. The summary statistics above are over all ping metrics measured between the single sender, receiver pair. The latency results for this test show a dramatic difference between CUBIC and DCTCP. CUBIC intentionally overflows the switch buffer which incurs the maximum queue latency (more buffer memory will lead to high latency.) DCTCP, on the other hand, deliberately attempts to keep queue occupancy low. The result is a two orders of magnitude reduction of latency with DCTCP - even with a switch with relatively little RAM. Switches with larger amounts of RAM will incur increasing amounts of latency for CUBIC, but not for DCTCP. 4) Convergence and stability test: This test measured the time that DCTCP took to fairly redistribute bandwidth when a new flow commences. It also measured DCTCP's ability to remain stable at a fair bandwidth distribution. DCTCP is compared with CUBIC for this test. At the commencement of this test, a single flow is sending at maximum rate (near 10 Gbps) to a single receiver. One second after that first flow commences, a new flow from a distinct server begins sending to the same receiver as the first flow. After the second flow has sent data for 10 seconds, the second flow is terminated. The first flow sends for an additional second. 
Ideally, the bandwidth would be evenly shared as soon as the second flow starts, and recover as soon as it stops. The results of this test are shown below. Note that the flow bandwidth for the two flows was measured near the same time, but not simultaneously. DCTCP performs nearly perfectly within the measurement limitations of this test: bandwidth is quickly distributed fairly between the two flows, remains stable throughout the duration of the test, and recovers quickly. CUBIC, in contrast, is slow to divide the bandwidth fairly, and has trouble remaining stable. CUBIC DCTCP Seconds Flow 1 Flow 2 Seconds Flow 1 Flow 2 0 9.93 0 0 9.92 0 0.5 9.87 0 0.5 9.86 0 1 8.73 2.25 1 6.46 4.88 1.5 7.29 2.8 1.5 4.9 4.99 2 6.96 3.1 2 4.92 4.94 2.5 6.67 3.34 2.5 4.93 5 3 6.39 3.57 3 4.92 4.99 3.5 6.24 3.75 3.5 4.94 4.74 4 6 3.94 4 5.34 4.71 4.5 5.88 4.09 4.5 4.99 4.97 5 5.27 4.98 5 4.83 5.01 5.5 4.93 5.04 5.5 4.89 4.99 6 4.9 4.99 6 4.92 5.04 6.5 4.93 5.1 6.5 4.91 4.97 7 4.28 5.8 7 4.97 4.97 7.5 4.62 4.91 7.5 4.99 4.82 8 5.05 4.45 8 5.16 4.76 8.5 5.93 4.09 8.5 4.94 4.98 9 5.73 4.2 9 4.92 5.02 9.5 5.62 4.32 9.5 4.87 5.03 10 6.12 3.2 10 4.91 5.01 10.5 6.91 3.11 10.5 4.87 5.04 11 8.48 0 11 8.49 4.94 11.5 9.87 0 11.5 9.9 0 SYN/ACK ECT test: This test demonstrates the importance of ECT on SYN and SYN-ACK packets by measuring the connection probability in the presence of competing flows for a DCTCP connection attempt without ECT in the SYN packet. The test was repeated five times for each number of competing flows. Competing Flows 1 \| 2 \| 4 \| 8 \| 16 ------------------------------ Mean Connection Probability 1 \| 0.67 \| 0.45 \| 0.28 \| 0 Median Connection Probability 1 \| 0.65 \| 0.45 \| 0.25 \| 0 As the number of competing flows moves beyond 1, the connection probability drops rapidly. 
Enabling DCTCP with this patch requires the following steps: DCTCP must be running both on the sender and receiver side in your data center, i.e.: sysctl -w net.ipv4.tcp_congestion_control=dctcp Also, ECN functionality must be enabled on all switches in your data center for DCTCP to work. The default ECN marking threshold (K) heuristic on the switch for DCTCP is e.g., 20 packets (30KB) at 1Gbps, and 65 packets (~100KB) at 10Gbps (K > 1/7 * C * RTT, [4]). In above tests, for each switch port, traffic was segregated into two queues. For any packet with a DSCP of 0x01 - or equivalently a TOS of 0x04 - the packet was placed into the DCTCP queue. All other packets were placed into the default drop-tail queue. For the DCTCP queue, RED/ECN marking was enabled, here, with a marking threshold of 75 KB. More details however, we refer you to the paper [2] under section 3). There are no code changes required to applications running in user space. DCTCP has been implemented in full isolation of the rest of the TCP code as its own congestion control module, so that it can run without a need to expose code to the core of the TCP stack, and thus nothing changes for non-DCTCP users. Changes in the CA framework code are minimal, and DCTCP algorithm operates on mechanisms that are already available in most Silicon. The gain (dctcp_shift_g) is currently a fixed constant (1/16) from the paper, but we leave the option that it can be chosen carefully to a different value by the user. In case DCTCP is being used and ECN support on peer site is off, DCTCP falls back after 3WHS to operate in normal TCP Reno mode. ss {-4,-6} -t -i diag interface: ... dctcp wscale:7,7 rto:203 rtt:2.349/0.026 mss:1448 cwnd:2054 ssthresh:1102 ce_state 0 alpha 15 ab_ecn 0 ab_tot 735584 send 10129.2Mbps pacing_rate 20254.1Mbps unacked:1822 retrans:0/15 reordering:101 rcv_space:29200 ... 
dctcp-reno wscale:7,7 rto:201 rtt:0.711/1.327 ato:40 mss:1448 cwnd:10 ssthresh:1102 fallback_mode send 162.9Mbps pacing_rate 325.5Mbps rcv_rtt:1.5 rcv_space:29200 More information about DCTCP can be found in [1-4]. [1] http://simula.stanford.edu/~alizade/Site/DCTCP.html [2] http://simula.stanford.edu/~alizade/Site/DCTCP_files/dctcp-final.pdf [3] http://simula.stanford.edu/~alizade/Site/DCTCP_files/dctcp_analysis-full.pdf [4] http://tools.ietf.org/html/draft-bensley-tcpm-dctcp-00 Joint work with Florian Westphal and Glenn Judd. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Glenn Judd <glenn.judd@morganstanley.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-29 00:13:10 -04:00
Florian Westphal	9890092e46	net: tcp: more detailed ACK events and events for CE marked packets DataCenter TCP (DCTCP) determines cwnd growth based on ECN information and ACK properties, e.g. ACK that updates window is treated differently than DUPACK. Also DCTCP needs information whether ACK was delayed ACK. Furthermore, DCTCP also implements a CE state machine that keeps track of CE markings of incoming packets. Therefore, extend the congestion control framework to provide these event types, so that DCTCP can be properly implemented as a normal congestion algorithm module outside of the core stack. Joint work with Daniel Borkmann and Glenn Judd. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Glenn Judd <glenn.judd@morganstanley.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-29 00:13:10 -04:00
Florian Westphal	7354c8c389	net: tcp: split ack slow/fast events from cwnd_event The congestion control ops "cwnd_event" currently supports CA_EVENT_FAST_ACK and CA_EVENT_SLOW_ACK events (among others). Both FAST and SLOW_ACK are only used by Westwood congestion control algorithm. This removes both flags from cwnd_event and adds a new in_ack_event callback for this. The goal is to be able to provide more detailed information about ACKs, such as whether ECE flag was set, or whether the ACK resulted in a window update. It is required for DataCenter TCP (DCTCP) congestion control algorithm as it makes a different choice depending on ECE being set or not. Joint work with Daniel Borkmann and Glenn Judd. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Glenn Judd <glenn.judd@morganstanley.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-29 00:13:10 -04:00
Daniel Borkmann	30e502a34b	net: tcp: add flag for ca to indicate that ECN is required This patch adds a flag to TCP congestion algorithms that allows for requesting to mark IPv4/IPv6 sockets with transport as ECN capable, that is, ECT(0), when required by a congestion algorithm. It is currently used and needed in DataCenter TCP (DCTCP), as it requires both peers to assert ECT on all IP packets sent - it uses ECN feedback (i.e. CE, Congestion Experienced information) from switches inside the data center to derive feedback to the end hosts. Therefore, simply add a new flag to icsk_ca_ops. Note that DCTCP's algorithm/behaviour slightly diverges from RFC3168, therefore this is only (!) enabled iff the assigned congestion control ops module has requested this. By that, we can tightly couple this logic really only to the provided congestion control ops. Joint work with Florian Westphal and Glenn Judd. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Glenn Judd <glenn.judd@morganstanley.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-29 00:13:10 -04:00
Florian Westphal	55d8694fa8	net: tcp: assign tcp cong_ops when tcp sk is created Split assignment and initialization from one into two functions. This is required by followup patches that add Datacenter TCP (DCTCP) congestion control algorithm - we need to be able to determine if the connection is moderated by DCTCP before the 3WHS has finished. As we walk the available congestion control list during the assignment, we are always guaranteed to have Reno present as it's fixed compiled-in. Therefore, since we're doing the early assignment, we don't have a real use for the Reno alias tcp_init_congestion_ops anymore and can thus remove it. Actual usage of the congestion control operations are being made after the 3WHS has finished, in some cases however we can access get_info() via diag if implemented, therefore we need to zero out the private area for those modules. Joint work with Daniel Borkmann and Glenn Judd. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Glenn Judd <glenn.judd@morganstanley.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-29 00:13:10 -04:00
John Fastabend	53dfd50181	net: sched: cls_rcvp, complete rcu conversion This completes the cls_rsvp conversion to RCU safe copy, update semantics. As a result all cases of tcf_exts_change occur on empty lists now. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-29 00:04:55 -04:00
WANG Cong	68f6a7c6c9	net_sched: fix another regression in cls_tcindex Clearly the following change is not expected: - if (!cp.perfect && !cp.h) - cp.alloc_hash = cp.hash; + if (!cp->perfect && cp->h) + cp->alloc_hash = cp->hash; Fixes: commit `331b72922c` ("net: sched: RCU cls_tcindex") Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-28 17:34:35 -04:00
WANG Cong	02c5e84413	net_sched: fix errno in tcindex_set_parms() When kmemdup() fails, we should return -ENOMEM. Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-28 17:34:22 -04:00
Rick Jones	825bae5d97	arp: Do not perturb drop profiles with ignored ARP packets We do not wish to disturb dropwatch or perf drop profiles with an ARP we will ignore. Signed-off-by: Rick Jones <rick.jones2@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-28 17:30:35 -04:00
WANG Cong	18d0264f63	net_sched: remove the first parameter from tcf_exts_destroy() Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <hadi@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-28 17:29:01 -04:00
David S. Miller	f5c7e1a47a	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2014-09-25 1) Remove useless hash_resize_mutex in xfrm_hash_resize(). This mutex is used only there, but xfrm_hash_resize() can't be called concurrently at all. From Ying Xue. 2) Extend policy hashing to prefixed policies based on prefix length thresholds. From Christophe Gouault. 3) Make the policy hash table thresholds configurable via netlink. From Christophe Gouault. 4) Remove the maximum authentication length for AH. This was needed to limit stack usage. We switched already to allocate space, so no need to keep the limit. From Herbert Xu. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-28 17:19:15 -04:00
WANG Cong	2c1a4311b6	neigh: check error pointer instead of NULL for ipv4_neigh_lookup() Fixes: commit `f187bc6efb` ("ipv4: No need to set generic neighbour pointer") Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-28 17:16:04 -04:00
Florian Fainelli	7905288f09	net: dsa: allow switches driver to implement get/set EEE Allow switches driver to query and enable/disable EEE on a per-port basis by implementing the ethtool_{get,set}_eee settings and delegating these operations to the switch driver. set_eee() will need to coordinate with the PHY driver to make sure that EEE is enabled, the link-partner supports it and the auto-negotiation result is satisfactory. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-28 17:14:09 -04:00
Florian Fainelli	b2f2af21e3	net: dsa: allow enabling and disable switch ports Whenever a per-port network device is used/unused, invoke the switch driver port_enable/port_disable callbacks to allow saving as much power as possible by disabling unused parts of the switch (RX/TX logic, memory arrays, PHYs...). We supply a PHY device argument to make sure the switch driver can act on the PHY device if needed (like putting/taking the PHY out of deep low power mode). Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-28 17:14:08 -04:00
Florian Fainelli	f7f1de51ed	net: dsa: start and stop the PHY state machine dsa_slave_open() should start the PHY library state machine for its PHY interface, and dsa_slave_close() should stop the PHY library state machine accordingly. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-28 17:14:08 -04:00
Peter Pan(潘卫平)	155c6e1ad4	tcp: use tcp_flags in tcp_data_queue() This patch is a cleanup which follows the idea in commit `e11ecddf51` (tcp: use TCP_SKB_CB(skb)->tcp_flags in input path), and it may reduce register pressure since skb->cb[] access is fast, because skb is probably in a register. v2: remove variable th v3: reword the changelog Signed-off-by: Weiping Pan <panweiping3@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-28 16:37:57 -04:00
Eric Dumazet	cd7d8498c9	tcp: change tcp_skb_pcount() location Our goal is to access no more than one cache line per skb in a write or receive queue when doing the various walks. After recent TCP_SKB_CB() reorganizations, it is almost done. Last part is tcp_skb_pcount() which currently uses skb_shinfo(skb)->gso_segs, which is a terrible choice, because it needs 3 cache lines in current kernel (skb->head, skb->end, and shinfo->gso_segs are all in 3 different cache lines, far from skb->cb) This very simple patch reuses space currently taken by tcp_tw_isn only in input path, as tcp_skb_pcount is only needed for skb stored in write queue. This considerably speeds up tcp_ack(), granted we avoid shinfo->tx_flags to get SKBTX_ACK_TSTAMP, which seems possible. This also speeds up all sack processing in general. This speeds up tcp_sendmsg() because it no longer has to access/dirty shinfo. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-28 16:36:48 -04:00
Eric Dumazet	971f10eca1	tcp: better TCP_SKB_CB layout to reduce cache line misses TCP maintains lists of skb in write queue, and in receive queues (in order and out of order queues) Scanning these lists both in input and output path usually requires access to skb->next, TCP_SKB_CB(skb)->seq, and TCP_SKB_CB(skb)->end_seq These fields are currently in two different cache lines, meaning we waste lot of memory bandwidth when these queues are big and flows have either packet drops or packet reorders. We can move TCP_SKB_CB(skb)->header at the end of TCP_SKB_CB, because this header is not used in fast path. This allows TCP to search much faster in the skb lists. Even with regular flows, we save one cache line miss in fast path. Thanks to Christoph Paasch for noticing we need to cleanup skb->cb[] (IPCB/IP6CB) before entering IP stack in tx path, and that I forgot IPCB use in tcp_v4_hnd_req() and tcp_v4_save_options(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-28 16:35:43 -04:00
Eric Dumazet	a224772db8	ipv6: add a struct inet6_skb_parm param to ipv6_opt_accepted() ipv6_opt_accepted() assumes IP6CB(skb) holds the struct inet6_skb_parm that it needs. Let's not assume this, as TCP stack might use a different place. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-28 16:35:43 -04:00
Eric Dumazet	24a2d43d88	ipv4: rename ip_options_echo to __ip_options_echo() ip_options_echo() assumes struct ip_options is provided in &IPCB(skb)->opt. Let's break this assumption, but provide a helper to not change all call points. ip_send_unicast_reply() gets a new struct ip_options pointer. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-28 16:35:42 -04:00
Steffen Klassert	cd0a0bd9b8	ip6_gre: Return an error when adding an existing tunnel. ip6gre_tunnel_locate() should not return an existing tunnel if create is true. Otherwise it is possible to add the same tunnel multiple times without getting an error. So return NULL if the tunnel that should be created already exists. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-28 16:19:46 -04:00
Steffen Klassert	d814b847be	ip6_vti: Return an error when adding an existing tunnel. vti6_locate() should not return an existing tunnel if create is true. Otherwise it is possible to add the same tunnel multiple times without getting an error. So return NULL if the tunnel that should be created already exists. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-28 16:19:46 -04:00
Steffen Klassert	2b0bb01b6e	ip6_tunnel: Return an error when adding an existing tunnel. ip6_tnl_locate() should not return an existing tunnel if create is true. Otherwise it is possible to add the same tunnel multiple times without getting an error. So return NULL if the tunnel that should be created already exists. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-28 16:19:46 -04:00
Dan Williams	3f33407856	net: make tcp_cleanup_rbuf private net_dma was the only external user so this can become local to tcp.c again. Cc: James Morris <jmorris@namei.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2014-09-28 07:22:21 -07:00
Dan Williams	d27f9bc104	net_dma: revert 'copied_early' Now that tcp_dma_try_early_copy() is gone nothing ever sets copied_early. Also reverts "53240c208776 tcp: Fix possible double-ack w/ user dma" since it is no longer necessary. Cc: Ali Saidi <saidi@engin.umich.edu> Cc: James Morris <jmorris@namei.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Neal Cardwell <ncardwell@google.com> Reported-by: Dave Jones <davej@redhat.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2014-09-28 07:22:21 -07:00
Dan Williams	7bced39751	net_dma: simple removal Per commit "77873803363c net_dma: mark broken" net_dma is no longer used and there is no plan to fix it. This is the mechanical removal of bits in CONFIG_NET_DMA ifdef guards. Reverting the remainder of the net_dma induced changes is deferred to subsequent patches. Marked for stable due to Roman's report of a memory leak in dma_pin_iovec_pages(): https://lkml.org/lkml/2014/9/3/177 Cc: Dave Jiang <dave.jiang@intel.com> Cc: Vinod Koul <vinod.koul@intel.com> Cc: David Whipple <whipple@securedatainnovations.ch> Cc: Alexander Duyck <alexander.h.duyck@intel.com> Cc: <stable@vger.kernel.org> Reported-by: Roman Gushchin <klamm@yandex-team.ru> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2014-09-28 07:05:16 -07:00
Nicolas Dichtel	5a4ee9a9a0	ip6gre: add a rtnl link alias for ip6gretap With this alias, we don't need to load manually the module before adding an ip6gretap interface with iproute2. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-26 17:15:57 -04:00
Eric Dumazet	ff04a771ad	net : optimize skb_release_data() Cache skb_shinfo(skb) in a variable to avoid computing it multiple times. Reorganize the tests to remove one indentation level. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-26 16:53:49 -04:00
Wang Sheng-Hui	8280bf00fd	net/openvswitch: remove dup comment in vport.h Remove the duplicated comment "/* The following definitions are for users of the vport subsytem: */" in vport.h Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-26 16:42:33 -04:00
David S. Miller	e7af85db54	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== nf pull request for net This series contains netfilter fixes for net, they are: 1) Fix lockdep splat in nft_hash when releasing sets from the rcu_callback context. We don't the mutex there anymore. 2) Remove unnecessary spinlock_bh in the destroy path of the nf_tables rbtree set type from rcu_callback context. 3) Fix another lockdep splat in rhashtable. None of the callers hold a mutex when calling rhashtable_destroy. 4) Fix duplicated error reporting from nfnetlink when aborting and replaying a batch. 5) Fix a Kconfig issue reported by kbuild robot. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-26 16:21:29 -04:00
LEROY Christophe	58e3cac561	net: optimise inet_proto_csum_replace4() csum_partial() is a generic function which is not optimised for small fixed length calculations, and its use requires to store "from" and "to" values in memory while we already have them available in registers. This also has impact, especially on RISC processors. In the same spirit as the change done by Eric Dumazet on csum_replace2(), this patch rewrites inet_proto_csum_replace4() taking into account RFC1624. I spotted during a NATted tcp transfer that csum_partial() is one of the top 5 consuming functions (around 8%), and the second user of csum_partial() is inet_proto_csum_replace4(). Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-26 16:14:17 -04:00
Eric Dumazet	f4a775d144	net: introduce __skb_header_release() While profiling TCP stack, I noticed one useless atomic operation in tcp_sendmsg(), caused by skb_header_release(). It turns out all current skb_header_release() users have a fresh skb, that no other user can see, so we can avoid one atomic operation. Introduce __skb_header_release() to clearly document this. This gave me a 1.5 % improvement on TCP_RR workload. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-26 15:40:06 -04:00
David S. Miller	57219dc7bf	Merge tag 'master-2014-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next John W. Linville says: ==================== pull request: wireless-next 2014-09-22 Please pull this batch of updates intended for the 3.18 stream... For the mac80211 bits, Johannes says: "This time, I have some rate minstrel improvements, support for a very small feature from CCX that Steinar reverse-engineered, dynamic ACK timeout support, a number of changes for TDLS, early support for radio resource measurement and many fixes. Also, I'm changing a number of places to clear key memory when it's freed and Intel claims copyright for code they developed." For the bluetooth bits, Johan says: "Here are some more patches intended for 3.18. Most of them are cleanups or fixes for SMP. The only exception is a fix for BR/EDR L2CAP fixed channels which should now work better together with the L2CAP information request procedure." For the iwlwifi bits, Emmanuel says: "I fix here dvm which was broken by my last pull request. Arik continues to work on TDLS and Luca solved a few issues in CT-Kill. Eyal keeps digging into rate scaling code, more to come soon. Besides this, nothing really special here." Beyond that, there are the usual big batches of updates to ath9k, b43, mwifiex, and wil6210 as well as a handful of other bits here and there. Also, rtlwifi gets some btcoexist attention from Larry. Please let me know if there are problems! ==================== Had to adjust the wil6210 code to comply with Joe Perches's recent change in net-next to make the netdev_*() routines return void instead of 'int'. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-26 15:39:24 -04:00
Joe Perches	6ea754eb76	net: Change netdev_<level> logging functions to return void No caller or macro uses the return value so make all the functions return void. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-26 15:17:17 -04:00
John W. Linville	30d3c071a6	Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next	2014-09-26 13:38:51 -04:00
John W. Linville	330bd4ec9d	Merge tag 'nfc-next-3.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next Samuel Ortiz <sameo@linux.intel.com> says: "NFC: 3.18 pull request This is the NFC pull request for 3.18. We've had major updates for TI and ST Microelectronics drivers: For TI's trf7970a driver: - Target mode support for trf7970a - Suspend/resume support for trf7970a - DT properties additions to handle different quirks - A bunch of fixes for smartphone IOP related issues For ST Microelectronics' ST21NFCA and ST21NFCB drivers: - ISO15693 support for st21nfcb - checkpatch and sparse related warning fixes - Code cleanups and a few minor fixes Finally, Marvell add ISO15693 support to the NCI stack, together with a couple of NCI fixes." Signed-off-by: John W. Linville <linville@tuxdriver.com>	2014-09-26 13:37:02 -04:00
Pablo Neira Ayuso	34666d467c	netfilter: bridge: move br_netfilter out of the core Jesper reported that br_netfilter always registers the hooks since this is part of the bridge core. This harms performance for people that don't need this. This patch modularizes br_netfilter so it can be rmmod'ed, thus, the hooks can be unregistered. I think the bridge netfilter should have been a separate module since the beginning, Patrick agreed on that. Note that this is breaking compatibility for users that expect that bridge netfilter is going to be available after explicitly 'modprobe bridge' or via automatic load through brctl. However, the damage can be easily undone by modprobing br_netfilter. The bridge core also prints a message to provide a clue to people that didn't notice that this has been deprecated. On top of that, the plan is that nftables will not rely on this software layer, but integrate the connection tracking into the bridge layer to enable stateful filtering and NAT, which is what bridge netfilter users seem to require. This patch still keeps the fake_dst_ops in the bridge core, since this is required when the bridge port is initialized. So we can safely modprobe/rmmod br_netfilter anytime. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Florian Westphal <fw@strlen.de>	2014-09-26 18:42:31 +02:00
Pablo Neira Ayuso	7276ca3fa2	netfilter: bridge: nf_bridge_copy_header as static inline in header Move nf_bridge_copy_header() as static inline in netfilter_bridge.h header file. This patch prepares the modularization of the br_netfilter code. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-09-26 18:42:30 +02:00
Rob Jones	772476df70	net/netfilter/x_tables.c: use __seq_open_private() Reduce boilerplate code by using __seq_open_private() instead of seq_open() in xt_match_open() and xt_target_open(). Signed-off-by: Rob Jones <rob.jones@codethink.co.uk> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-09-26 18:42:29 +02:00
Steffen Klassert	d61746b2e7	ip_tunnel: Don't allow to add the same tunnel multiple times. When we try to add an already existing tunnel, we don't return an error. Instead we continue and call ip_tunnel_update(). This means that we can change existing tunnels by adding the same tunnel multiple times. It is even possible to change the tunnel endpoints of the fallback device. We fix this by returning an error if we try to add an existing tunnel. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-26 00:41:30 -04:00
Eric Dumazet	4a8e320c92	net: sched: use pinned timers While using a MQ + NETEM setup, I had confirmation that the default timer migration ( /proc/sys/kernel/timer_migration ) is killing us. Installing this on a receiver side of a TCP_STREAM test, (NIC has 8 TX queues) : EST="est 1sec 4sec" for ETH in eth1 do tc qd del dev $ETH root 2>/dev/null tc qd add dev $ETH root handle 1: mq tc qd add dev $ETH parent 1:1 $EST netem limit 70000 delay 6ms tc qd add dev $ETH parent 1:2 $EST netem limit 70000 delay 8ms tc qd add dev $ETH parent 1:3 $EST netem limit 70000 delay 10ms tc qd add dev $ETH parent 1:4 $EST netem limit 70000 delay 12ms tc qd add dev $ETH parent 1:5 $EST netem limit 70000 delay 14ms tc qd add dev $ETH parent 1:6 $EST netem limit 70000 delay 16ms tc qd add dev $ETH parent 1:7 $EST netem limit 80000 delay 18ms tc qd add dev $ETH parent 1:8 $EST netem limit 90000 delay 20ms done We can see that timers get migrated into a single cpu, presumably idle at the time timers are set up. Then all qdisc dequeues run from this cpu and huge lock contention happens. This single cpu is stuck in softirq mode and cannot dequeue fast enough. 39.24% [kernel] [k] _raw_spin_lock 2.65% [kernel] [k] netem_enqueue 1.80% [kernel] [k] netem_dequeue 1.63% [kernel] [k] copy_user_enhanced_fast_string 1.45% [kernel] [k] _raw_spin_lock_bh By pinning qdisc timers on the cpu running the qdisc, we respect proper XPS setting and remove this lock contention. 5.84% [kernel] [k] netem_enqueue 4.83% [kernel] [k] _raw_spin_lock 2.92% [kernel] [k] copy_user_enhanced_fast_string Current Qdiscs that benefit from this change are : netem, cbq, fq, hfsc, tbf, htb. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-26 00:26:48 -04:00
Tom Herbert	53e5039896	net: Remove gso_send_check as an offload callback The send_check logic was only interesting in cases of TCP offload and UDP UFO where the checksum needed to be initialized to the pseudo header checksum. Now we've moved that logic into the related gso_segment functions so gso_send_check is no longer needed. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-26 00:22:47 -04:00
Tom Herbert	f71470b37e	udp: move logic out of udp[46]_ufo_send_check In udp[46]_ufo_send_check the UDP checksum is initialized to the pseudo header checksum. We can move this logic into udp[46]_ufo_fragment. After this change udp[46]_ufo_send_check is a no-op. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-26 00:22:46 -04:00
Tom Herbert	d020f8f733	tcp: move logic out of tcp_v[64]_gso_send_check In tcp_v[46]_gso_send_check the TCP checksum is initialized to the pseudo header checksum using __tcp_v[46]_send_check. We can move this logic into new tcp[46]_gso_segment functions to be done when ip_summed != CHECKSUM_PARTIAL (ip_summed == CHECKSUM_PARTIAL should be the common case, possibly always true when taking GSO path). After this change tcp_v[46]_gso_send_check is a no-op. Signed-off-by: Tom Herbert <therbert@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-26 00:22:46 -04:00
Trond Myklebust	2aca5b869a	SUNRPC: Add missing support for RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT The flag RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT was introduced in order to allow NFSv4 clients to disable resend timeouts. Since those cause the RPC layer to break the connection, they mess up the duplicate reply caches that remain indexed on the port number in NFSv4. This patch includes the code that was missing in the original to set the appropriate flag in struct rpc_clnt, when the caller of rpc_create() sets RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT. Fixes: `8a19a0b6cb` (SUNRPC: Add RPC task and client level options to...) Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-09-25 21:25:17 -04:00
NeilBrown	1aff525629	NFS/SUNRPC: Remove other deadlock-avoidance mechanisms in nfs_release_page() Now that nfs_release_page() doesn't block indefinitely, other deadlock avoidance mechanisms aren't needed. - it doesn't hurt for kswapd to block occasionally. If it doesn't want to block it would clear __GFP_WAIT. The current_is_kswapd() was only added to avoid deadlocks and we have a new approach for that. - memory allocation in the SUNRPC layer can very rarely try to ->releasepage() a page it is trying to handle. The deadlock is removed as nfs_release_page() doesn't block indefinitely. So we don't need to set PF_FSTRANS for sunrpc network operations any more. Signed-off-by: NeilBrown <neilb@suse.de> Acked-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-09-25 08:25:47 -04:00
Johan Hedberg	565766b087	Bluetooth: Rename sco_param_wideband table to esco_param_msbc The sco_param_wideband table represents the eSCO parameters for specifically mSBC encoding. This patch renames the table to the more descriptive esco_param_msbc name. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-09-25 10:35:08 +02:00
Jason Baron	3dedbb5ca1	rpc: Add -EPERM processing for xs_udp_send_request() If an iptables drop rule is added for an nfs server, the client can end up in a softlockup. Because of the way that xs_sendpages() is structured, the -EPERM is ignored since the prior bits of the packet may have been successfully queued and thus xs_sendpages() returns a non-zero value. Then, xs_udp_send_request() thinks that because some bits were queued it should return -EAGAIN. We then try the request again and again, resulting in cpu spinning. Reproducer: 1) open a file on the nfs server '/nfs/foo' (mounted using udp) 2) iptables -A OUTPUT -d <nfs server ip> -j DROP 3) write to /nfs/foo 4) close /nfs/foo 5) iptables -D OUTPUT -d <nfs server ip> -j DROP The softlockup occurs in step 4 above. The previous patch allows xs_sendpages() to return both a sent count and any error values that may have occurred. Thus, if we get an -EPERM, return that to the higher level code. With this patch in place we can successfully abort the above sequence and avoid the softlockup. I also tried the above test case on an nfs mount on tcp and although the system does not softlockup, I still ended up with the 'hung_task' firing after 120 seconds, due to the i/o being stuck. The tcp case appears a bit harder to fix, since -EPERM appears to get ignored much lower down in the stack and does not propagate up to xs_sendpages(). This case is not quite as insidious as the softlockup and it is not addressed here. Reported-by: Yigong Lou <ylou@akamai.com> Signed-off-by: Jason Baron <jbaron@akamai.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-09-24 23:13:46 -04:00
Jason Baron	f279cd008f	rpc: return sent and err from xs_sendpages() If an error is returned after the first bits of a packet have already been successfully queued, xs_sendpages() will return a positive 'int' value indicating success. Callers seem to treat this as -EAGAIN. However, there are cases where it's not a question of waiting for the write queue to drain. For example, when there is an iptables rule dropping packets to the destination, the lower level code can return -EPERM only after parts of the packet have been successfully queued. In this case, we can end up continuously retrying resulting in a kernel softlockup. This patch is intended to make no changes in behavior but is in preparation for subsequent patches that can make decisions based on both the number of bytes sent by xs_sendpages() and any errors that may have been returned. Signed-off-by: Jason Baron <jbaron@akamai.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-09-24 23:13:37 -04:00
Benjamin Coddington	a743419f42	SUNRPC: Don't wake tasks during connection abort When aborting a connection to preserve source ports, don't wake the task in xs_error_report. This allows tasks with RPC_TASK_SOFTCONN to succeed if the connection needs to be re-established since it preserves the task's status instead of setting it to the status of the aborting kernel_connect(). This may also avoid a potential conflict on the socket's lock. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Cc: stable@vger.kernel.org # 3.14+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-09-24 23:06:56 -04:00
David S. Miller	4daaab4f0c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2014-09-24 16:48:32 -04:00
Johan Hedberg	c7da579763	Bluetooth: Add retransmission effort into SCO parameter table It is expected that new parameter combinations will have the retransmission effort value different between some entries (mainly because of the new S4 configuration added by HFP 1.7), so it makes sense to move it into the table instead of having it hard coded based on the selected SCO_AIRMODE_*. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-09-24 22:15:29 +02:00
David S. Miller	543a2dff5e	Merge tag 'master-2014-09-23' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless John W. Linville says: ==================== pull request: wireless 2014-09-23 Please consider pulling this one last batch of fixes intended for the 3.17 stream! For the NFC bits, Samuel says: "Hopefully not too late for a handful of NFC fixes: - 2 potential build failures for ST21NFCA and ST21NFCB, triggered by a depmod dependency cycle. - One potential buffer overflow in the microread driver." On top of that... Emil Goode provides a fix for a brcmfmac off-by-one regression which was introduced in the 3.17 cycle. Loic Poulain fixes a polarity mismatch for a variable assignment inside of rfkill-gpio. Wojciech Dubowik prevents a NULL pointer dereference in ath9k. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-24 15:00:12 -04:00
Tejun Heo	d06efebf0c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block into for-3.18 This is to receive `0a30288da1` ("blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during probe") which implements __percpu_ref_kill_expedited() to work around SCSI blk-mq stall. The commit reverted and patches to implement proper fix will be added. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@lst.de>	2014-09-24 13:00:21 -04:00
Simon Vincent	f19f4f9525	ieee802154: 6lowpan: ensure header compression does not corrupt ipv6 header The 6lowpan ipv6 header compression was causing problems for other interfaces that expected a ipv6 header to still be in place, as we were replacing the ipv6 header with a compressed version. This happened if you sent a packet to a multicast address as the packet would be output on 802.15.4, ethernet, and also be sent to the loopback interface. The skb data was shared between these interfaces so all interfaces ended up with a compressed ipv6 header. The solution is to ensure that before we do any header compression we are not sharing the skb or skb data with any other interface. If we are then we must take a copy of the skb and skb data before modifying the ipv6 header. The only place we can copy the skb is inside the xmit function so we don't leave dangling references to skb. This patch moves all the header compression to inside the xmit function. Very little code has been changed it has mostly been moved from lowpan_header_create to lowpan_xmit. At the top of the xmit function we now check if the skb is shared and if so copy it. In lowpan_header_create all we do now is store the source and destination addresses for use later when we compress the header. Signed-off-by: Simon Vincent <simon.vincent@xsilon.com> Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-09-24 14:15:08 +02:00
Johan Hedberg	d41c15cf95	Bluetooth: Fix reason code used for rejecting SCO connections The core specification defines valid values for the HCI_Reject_Synchronous_Connection_Request command to be 0x0D-0x0F. So far the code has been using HCI_ERROR_REMOTE_USER_TERM (0x13) which is not a valid value and is therefore being rejected by some controllers: > HCI Event: Connect Request (0x04) plen 10 bdaddr 40:6F:2A:6A:E5:E0 class 0x000000 type eSCO < HCI Command: Reject Synchronous Connection (0x01|0x002a) plen 7 bdaddr 40:6F:2A:6A:E5:E0 reason 0x13 Reason: Remote User Terminated Connection > HCI Event: Command Status (0x0f) plen 4 Reject Synchronous Connection (0x01|0x002a) status 0x12 ncmd 1 Error: Invalid HCI Command Parameters This patch introduces a new define for a value from the valid range (0x0d == Connection Rejected Due To Limited Resources) and uses it instead for rejecting incoming connections. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-09-24 14:03:32 +02:00
Joe Perches	2b0bf6c85a	Bluetooth: Convert bt_<level> logging functions to return void No caller or macro uses the return value so make all the functions return void. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-09-24 09:40:08 +02:00
Christophe Ricard	9e87f9a9c4	NFC: nci: Add support for proprietary RF Protocols In NFC Forum NCI specification, some RF Protocol values are reserved for proprietary use (from 0x80 to 0xfe). Some CLF vendors may need to use one value within this range for specific technology. Furthermore, some CLF may not be compliant with NFC Forum NCI specification 2.0 and therefore will not support RF Protocol value 0x06 for PROTOCOL_T5T as mentioned in a draft specification and in a recent push. Adding get_rf_protocol handle to the nci_ops structure will help to set the correct technology to target. Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2014-09-24 02:02:24 +02:00
Eric Dumazet	bd1e75abf4	tcp: add coalescing attempt in tcp_ofo_queue() In order to make TCP more resilient in presence of reorders, we need to allow coalescing to happen when skbs from out of order queue are transferred into receive queue. LRO/GRO can be completely canceled in some pathological cases, like per packet load balancing on aggregated links. I had to move tcp_try_coalesce() up in the file above tcp_ofo_queue() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-23 12:47:38 -04:00
Eric Dumazet	4cdf507d54	icmp: add a global rate limitation Current ICMP rate limiting uses inetpeer cache, which is an RBL tree protected by a lock, meaning that hosts can be stuck hard if all cpus want to check ICMP limits. When say a DNS or NTP server process is restarted, inetpeer tree grows quick and machine comes to its knees. iptables can not help because the bottleneck happens before ICMP messages are even cooked and sent. This patch adds a new global limitation, using a token bucket filter, controlled by two new sysctl : icmp_msgs_per_sec - INTEGER Limit maximal number of ICMP packets sent per second from this host. Only messages whose type matches icmp_ratemask are controlled by this limit. Default: 1000 icmp_msgs_burst - INTEGER icmp_msgs_per_sec controls number of ICMP packets sent per second, while icmp_msgs_burst controls the burst size of these packets. Default: 50 Note that if we really want to send millions of ICMP messages per second, we might extend idea and infra added in commit `04ca6973f7` ("ip: make IP identifiers less predictable") : add a token bucket in the ip_idents hash and no longer rely on inetpeer. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-23 12:47:38 -04:00
David S. Miller	1f6d80358d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: arch/mips/net/bpf_jit.c drivers/net/can/flexcan.c Both the flexcan and MIPS bpf_jit conflicts were cases of simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-23 12:09:27 -04:00
Bernhard Thaler	48e68ff5e5	Bluetooth: Check for SCO type before setting retransmission effort SCO connection cannot be setup to devices that do not support retransmission. Patch based on http://permalink.gmane.org/gmane.linux.bluez.kernel/7779 and adapted for this kernel version. Code changed to check SCO/eSCO type before setting retransmission effort and max. latency. The purpose of the patch is to support older devices not capable of eSCO. Tested on Blackberry 655+ headset which does not support retransmission. Credits go to Alexander Sommerhuber. Signed-off-by: Bernhard Thaler <bernhard.thaler@r-it.at> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-09-23 11:30:04 +02:00
Linus Torvalds	98f75b8291	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) If the user gives us a msg_namelen of 0, don't try to interpret anything pointed to by msg_name. From Ani Sinha. 2) Fix some bnx2i/bnx2fc randconfig compilation errors. The gist of the issue is that we firstly have drivers that span both SCSI and networking. And at the top of that chain of dependencies we have things like SCSI_FC_ATTRS and SCSI_NETLINK which are selected. But since select is a sledgehammer and ignores dependencies, everything to select's SCSI_FC_ATTRS and/or SCSI_NETLINK has to also explicitly select their dependencies and so on and so forth. Generally speaking 'select' is supposed to only be used for child nodes, those which have no dependencies of their own. And this whole chain of dependencies in the scsi layer violates that rather strongly. So just make SCSI_NETLINK depend upon it's dependencies, and so on and so forth for the things selecting it (either directly or indirectly). From Anish Bhatt and Randy Dunlap. 3) Fix generation of blackhole routes in IPSEC, from Steffen Klassert. 4) Actually notice netdev feature changes in rtl_open() code, from Hayes Wang. 5) Fix divide by zero in bond enslaving, from Nikolay Aleksandrov. 6) Missing memory barrier in sunvnet driver, from David Stevens. 7) Don't leave anycast addresses around when ipv6 interface is destroyed, from Sabrina Dubroca. 8) Don't call efx_{arch}_filter_sync_rx_mode before addr_list_lock is initialized in SFC driver, from Edward Cree. 9) Fix missing DMA error checking in 3c59x, from Neal Horman. 10) Openvswitch doesn't emit OVS_FLOW_CMD_NEW notifications accidently, fix from Samuel Gauthier. 11) pch_gbe needs to select NET_PTP_CLASSIFY otherwise we can get a build error. 12) Fix macvlan regression wherein we stopped emitting broadcast/multicast frames over software devices. From Nicolas Dichtel. 
13) Fix infiniband bug due to unintended overflow of skb->cb[], from Eric Dumazet. And add an assertion so this doesn't happen again. 14) dm9000_parse_dt() should return error pointers, not NULL. From Tobias Klauser. 15) IP tunneling code uses this_cpu_ptr() in preemptible contexts, fix from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (87 commits) net: bcmgenet: call bcmgenet_dma_teardown in bcmgenet_fini_dma net: bcmgenet: fix TX reclaim accounting for fragments ipv4: do not use this_cpu_ptr() in preemptible context dm9000: Return an ERR_PTR() in all error conditions of dm9000_parse_dt() r8169: fix an if condition r8152: disable ALDPS ipoib: validate struct ipoib_cb size net: sched: shrink struct qdisc_skb_cb to 28 bytes tg3: Work around HW/FW limitations with vlan encapsulated frames macvlan: allow to enqueue broadcast pkt on virtual device pch_gbe: 'select' NET_PTP_CLASSIFY. scsi: Use 'depends' with LIBFC instead of 'select'. openvswitch: restore OVS_FLOW_CMD_NEW notifications genetlink: add function genl_has_listeners() lib: rhashtable: remove second linux/log2.h inclusion net: allow macvlans to move to net namespace 3c59x: Fix bad offset spec in skb_frag_dma_map 3c59x: Add dma error checking and recovery sparc: bpf_jit: fix support for ldx/stx mem and SKF_AD_VLAN_TAG can: at91_can: add missing prepare and unprepare of the clock ...	2014-09-22 18:23:33 -07:00
Eric Dumazet	a35165ca10	ipv4: do not use this_cpu_ptr() in preemptible context this_cpu_ptr() in preemptible context is generally bad Sep 22 05:05:55 br kernel: [ 94.608310] BUG: using smp_processor_id() in preemptible [00000000] code: ip/2261 Sep 22 05:05:55 br kernel: [ 94.608316] caller is tunnel_dst_set.isra.28+0x20/0x60 [ip_tunnel] Sep 22 05:05:55 br kernel: [ 94.608319] CPU: 3 PID: 2261 Comm: ip Not tainted 3.17.0-rc5 #82 We can simply use raw_cpu_ptr(), as preemption is safe in these contexts. Should fix https://bugzilla.kernel.org/show_bug.cgi?id=84991 Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Joe <joe9mail@gmail.com> Fixes: `9a4aa9af44` ("ipv4: Use percpu Cache route in IP tunnels") Acked-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-22 18:31:18 -04:00
Eric Dumazet	a2aeb02a8e	net: sched: fix compile warning in cls_u32 $ grep CONFIG_CLS_U32_MARK .config # CONFIG_CLS_U32_MARK is not set net/sched/cls_u32.c: In function 'u32_change': net/sched/cls_u32.c:852:1: warning: label 'errout' defined but not used [-Wunused-label] Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-22 16:47:19 -04:00
David S. Miller	84de67b298	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2014-09-22 We generate a blackhole or queueing route if a packet matches an IPsec policy but a state can't be resolved. Here we assume that dst_output() is called to kill these packets. Unfortunately this assumption is not true in all cases, so it is possible that these packets leave the system without the necessary transformations. This pull request contains two patches to fix this issue: 1) Fix for blackhole routed packets. 2) Fix for queue routed packets. Both patches are serious stable candidates. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-22 16:41:41 -04:00
Eric Dumazet	fcdd1cf4dd	tcp: avoid possible arithmetic overflows icsk_rto is a 32bit field, and icsk_backoff can reach 15 by default, or more if some sysctl (eg tcp_retries2) are changed. Better use 64bit to perform icsk_rto << icsk_backoff operations As Joe Perches suggested, add a helper for this. Yuchung spotted the tcp_v4_err() case. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-22 16:27:10 -04:00
Daniel Borkmann	35f7aa5309	ipv6: mld: answer mldv2 queries with mldv1 reports in mldv1 fallback RFC2710 (MLDv1), section 3.7. says: The length of a received MLD message is computed by taking the IPv6 Payload Length value and subtracting the length of any IPv6 extension headers present between the IPv6 header and the MLD message. If that length is greater than 24 octets, that indicates that there are other fields present beyond the fields described above, perhaps belonging to a future backwards-compatible version of MLD. An implementation of the version of MLD specified in this document MUST NOT send an MLD message longer than 24 octets and MUST ignore anything past the first 24 octets of a received MLD message. RFC3810 (MLDv2), section 8.2.1. states for listeners regarding presence of MLDv1 routers: In order to be compatible with MLDv1 routers, MLDv2 hosts MUST operate in version 1 compatibility mode. [...] When Host Compatibility Mode is MLDv2, a host acts using the MLDv2 protocol on that interface. When Host Compatibility Mode is MLDv1, a host acts in MLDv1 compatibility mode, using only the MLDv1 protocol, on that interface. [...] While section 8.3.1. specifies router behaviour regarding presence of MLDv1 routers: MLDv2 routers may be placed on a network where there is at least one MLDv1 router. The following requirements apply: If an MLDv1 router is present on the link, the Querier MUST use the lowest version of MLD present on the network. This must be administratively assured. Routers that desire to be compatible with MLDv1 MUST have a configuration option to act in MLDv1 mode; if an MLDv1 router is present on the link, the system administrator must explicitly configure all MLDv2 routers to act in MLDv1 mode. When in MLDv1 mode, the Querier MUST send periodic General Queries truncated at the Multicast Address field (i.e., 24 bytes long), and SHOULD also warn about receiving an MLDv2 Query (such warnings must be rate-limited). 
The Querier MUST also fill in the Maximum Response Delay in the Maximum Response Code field, i.e., the exponential algorithm described in section 5.1.3. is not used. [...] That means that we should not get queries from different versions of MLD. When there's a MLDv1 router present, MLDv2 enforces truncation and MRC == MRD (both fields are overlapping within the 24 octet range). Section 8.3.2. specifies behaviour in the presence of MLDv1 multicast address listeners: MLDv2 routers may be placed on a network where there are hosts that have not yet been upgraded to MLDv2. In order to be compatible with MLDv1 hosts, MLDv2 routers MUST operate in version 1 compatibility mode. MLDv2 routers keep a compatibility mode per multicast address record. The compatibility mode of a multicast address is determined from the Multicast Address Compatibility Mode variable, which can be in one of the two following states: MLDv1 or MLDv2. The Multicast Address Compatibility Mode of a multicast address record is set to MLDv1 whenever an MLDv1 Multicast Listener Report is received for that multicast address. At the same time, the Older Version Host Present timer for the multicast address is set to Older Version Host Present Timeout seconds. The timer is re-set whenever a new MLDv1 Report is received for that multicast address. If the Older Version Host Present timer expires, the router switches back to Multicast Address Compatibility Mode of MLDv2 for that multicast address. [...] That means, what can happen is the following scenario, that hosts can act in MLDv1 compatibility mode when they previously have received an MLDv1 query (or, simply operate in MLDv1 mode-only); and at the same time, an MLDv2 router could start up and transmits MLDv2 startup query messages while being unaware of the current operational mode. Given RFC2710, section 3.7 we would need to answer to that with an MLDv1 listener report, so that the router according to RFC3810, section 8.3.2. 
would receive that and internally switch to MLDv1 compatibility as well. Right now, I believe since the initial implementation of MLDv2, Linux hosts would just silently drop such MLDv2 queries instead of replying with an MLDv1 listener report, which would prevent a MLDv2 router going into fallback mode (until it receives other MLDv1 queries). Since the mapping of MRC to MRD in exactly such cases can make use of the exponential algorithm from 5.1.3, we cannot [strictly speaking] be aware in MLDv1 of the encoding in MRC, it seems also not mentioned by the RFC. Since encodings are the same up to 32767, assume in such a situation this value as a hard upper limit we would clamp. We have asked one of the RFC authors on that regard, and he mentioned that there seem not to be any implementations that make use of that exponential algorithm on startup messages. In any case, this patch fixes this MLD interoperability issue. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-22 16:23:15 -04:00
Loic Poulain	fa5c107cc8	net: rfkill: gpio: Fix clock status Clock is disabled when the device is blocked. So, clock_enabled is the logical negation of "blocked". Signed-off-by: Loic Poulain <loic.poulain@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2014-09-22 16:02:15 -04:00
John Fastabend	de5df63228	net: sched: cls_u32 changes to knode must appear atomic to readers Changes to the cls_u32 classifier must appear atomic to the readers. Before this patch, if a change is requested for both the exts and ifindex, first the ifindex is updated, then the exts with tcf_exts_change(). This opens a small window where a reader can have an exts chain with an incorrect ifindex. This violates the RCU semantics. Here we resolve this by always passing u32_set_parms() a copy of the tc_u_knode to work on and then inserting it into the hash table after the updates have been successfully applied. Tested with the following short script: #tc filter add dev p3p2 parent 8001:0 protocol ip prio 99 handle 1: \ u32 divisor 256 #tc filter add dev p3p2 parent 8001:0 protocol ip prio 99 \ u32 link 1: hashkey mask ffffff00 at 12 \ match ip src 192.168.8.0/2 #tc filter add dev p3p2 parent 8001:0 protocol ip prio 102 \ handle 1::10 u32 classid 1:2 ht 1: \ match ip src 192.168.8.0/8 match ip tos 0x0a 1e #tc filter change dev p3p2 parent 8001:0 protocol ip prio 102 \ handle 1::10 u32 classid 1:2 ht 1: \ match ip src 1.1.0.0/8 match ip tos 0x0b 1e CC: Eric Dumazet <edumazet@google.com> CC: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-22 15:59:21 -04:00
John Fastabend	a1ddcfee2d	net: cls_u32: fix missed pcpu_success free_percpu This fixes a missed free_percpu in the unwind code path and when keys are destroyed. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-22 15:59:21 -04:00
Tom Herbert	3fcb95a84f	udp: Need to make ip6_udp_tunnel.c have GPL license Unable to load various tunneling modules without this: [ 80.679049] fou: Unknown symbol udp_sock_create6 (err 0) [ 91.439939] ip6_udp_tunnel: Unknown symbol ip6_local_out (err 0) [ 91.439954] ip6_udp_tunnel: Unknown symbol __put_net (err 0) [ 91.457792] vxlan: Unknown symbol udp_sock_create6 (err 0) [ 91.457831] vxlan: Unknown symbol udp_tunnel6_xmit_skb (err 0) Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-22 15:08:25 -04:00
Jason Wang	cecda693a9	net: keep original skb which only needs header checking during software GSO Commit `ce93718fb7` ("net: Don't keep around original SKB when we software segment GSO frames") frees the original skb after software GSO even for dodgy gso skbs. This breaks the stream throughput from untrusted sources, since only header checking was done during software GSO instead of a true segmentation. This patch fixes this by freeing the original gso skb only when it was really segmented by software. Fixes `ce93718fb7` ("net: Don't keep around original SKB when we software segment GSO frames.") Cc: David S. Miller <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-22 14:57:08 -04:00
Florian Fainelli	19e57c4e6d	net: dsa: add {get, set}_wol callbacks to slave devices Allow switch drivers to implement per-port Wake-on-LAN getter and setters. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-22 14:41:23 -04:00
Florian Fainelli	2446254915	net: dsa: allow switch drivers to implement suspend/resume hooks Add an abstraction layer to suspend/resume switch devices, doing the following split: - suspend/resume the slave network devices and their corresponding PHY devices - suspend/resume the switch hardware using switch driver callbacks Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-22 14:41:23 -04:00
Eric Dumazet	2571178626	net: sched: shrink struct qdisc_skb_cb to 28 bytes We cannot make struct qdisc_skb_cb bigger without impacting IPoIB, or increasing skb->cb[] size. Commit `e0f31d8498` ("flow_keys: Record IP layer protocol in skb_flow_dissect()") broke IPoIB. The only current offender is sch_choke, and this one does not need an absolutely precise flow key. If we store 17 bytes of flow key, it's more than enough. (It's the actual size of flow_keys if it were a packed structure, but we might add new fields at the end of it later) Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: `e0f31d8498` ("flow_keys: Record IP layer protocol in skb_flow_dissect()") Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-22 14:21:47 -04:00
Andy Zhou	6d967f8789	udp_tunnel: Only build ip6_udp_tunnel.c when IPV6 is selected Functions supplied in ip6_udp_tunnel.c are only needed when IPV6 is selected. When IPV6 is not selected, those functions are stubbed out in udp_tunnel.h. ================================================================== net/ipv6/ip6_udp_tunnel.c:15:5: error: redefinition of 'udp_sock_create6' int udp_sock_create6(struct net *net, struct udp_port_cfg *cfg, In file included from net/ipv6/ip6_udp_tunnel.c:9:0: include/net/udp_tunnel.h:36:19: note: previous definition of 'udp_sock_create6' was here static inline int udp_sock_create6(struct net *net, struct udp_port_cfg *cfg, ================================================================== Fixes: `fd384412e` udp_tunnel: Seperate ipv6 functions into its own file Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Andy Zhou <azhou@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-19 22:05:28 -04:00
Samuel Gauthier	9b67aa4a82	openvswitch: restore OVS_FLOW_CMD_NEW notifications Since commit `fb5d1e9e12` ("openvswitch: Build flow cmd netlink reply only if needed."), new flows are no longer notified to the listeners of OVS_FLOW_MCGROUP. This commit fixes the problem by using the genl function, i.e. genl_has_listeners(), instead of netlink_has_listeners(). Signed-off-by: Samuel Gauthier <samuel.gauthier@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-19 17:28:26 -04:00
Tom Herbert	4565e9919c	gre: Setup and TX path for gre/UDP foo-over-udp encapsulation Added netlink attrs to configure FOU encapsulation for GRE, netlink handling of these flags, and properly adjust MTU for encapsulation. ip_tunnel_encap is called from ip_tunnel_xmit to actually perform FOU encapsulation. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-19 17:15:32 -04:00
Tom Herbert	473ab820dd	ipip: Setup and TX path for ipip/UDP foo-over-udp encapsulation Add netlink handling for IP tunnel encapsulation parameters and adjustment of MTU for encapsulation. ip_tunnel_encap is called from ip_tunnel_xmit to actually perform FOU encapsulation. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-19 17:15:32 -04:00
Tom Herbert	14909664e4	sit: Setup and TX path for sit/UDP foo-over-udp encapsulation Added netlink handling of IP tunnel encapsulation parameters and proper adjustment of MTU for encapsulation. Added ip_tunnel_encap call to ipip6_tunnel_xmit to actually perform FOU encapsulation. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-19 17:15:32 -04:00
Tom Herbert	5632848653	net: Changes to ip_tunnel to support foo-over-udp encapsulation This patch changes IP tunnel to support (secondary) encapsulation, Foo-over-UDP. Changes include: 1) Adding tun_hlen as the tunnel header length, encap_hlen as the encapsulation header length, and hlen becomes the grand total of these. 2) Added common netlink define to support FOU encapsulation. 3) Routines to perform FOU encapsulation. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-19 17:15:32 -04:00
Tom Herbert	afe93325bc	fou: Add GRO support Implement fou_gro_receive and fou_gro_complete, and populate these in the corresponding udp_offloads for the socket. Added ipproto to udp_offloads and pass this from UDP to the fou GRO routine in proto field of napi_gro_cb structure. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-19 17:15:31 -04:00
Tom Herbert	23461551c0	fou: Support for foo-over-udp RX path This patch provides a receive path for foo-over-udp. This allows direct encapsulation of IP protocols over UDP. The bound destination port is used to map to an IP protocol, and the XFRM framework (udp_encap_rcv) is used to receive encapsulated packets. Upon reception, the encapsulation header is logically removed (pointer to transport header is advanced) and the packet is reinjected into the receive path with the IP protocol indicated by the mapping. Netlink is used to configure FOU ports. The configuration information includes the port number to bind to and the IP protocol corresponding to that port. This should support GRE/UDP (http://tools.ietf.org/html/draft-yong-tsvwg-gre-in-udp-encap-02), as well as the other IP tunneling protocols (IPIP, SIT). Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-19 17:15:31 -04:00
Tom Herbert	ce3e02867e	net: Export inet_offloads and inet6_offloads Want to be able to use these in foo-over-udp offloads, etc. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-19 17:15:31 -04:00
John Fastabend	4e2840eee6	net: sched: cls_u32: rcu can not be last node tc_u32_sel 'sel' in tc_u_knode expects to be the last element in the structure and pads the structure with tc_u32_key fields for each key. kzalloc(sizeof(*n) + s->nkeys*sizeof(struct tc_u32_key), GFP_KERNEL) CC: Eric Dumazet <edumazet@google.com> Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-19 17:05:45 -04:00
David S. Miller	8f665f6cb7	Merge tag 'master-2014-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless John W. Linville says: ==================== pull request: wireless 2014-09-17 Please pull this batch of fixes intended for the 3.17 stream... Arend van Spriel sends a trio of minor brcmfmac fixes, including a fix for a Kconfig/build issue, a fix for a crash (null reference), and a regression fix related to event handling on a P2P interface. Hante Meuleman follows-up with a brcmfmac fix for a memory leak. Johannes Stezenbach brings an ath9k_htc fix for a regression related to hardware decryption offload. Marcel Holtmann delivers a one-liner to properly mark a device ID table in rfkill-gpio. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-19 16:33:15 -04:00
Eric Dumazet	ab34f64808	net: sched: use __skb_queue_head_init() where applicable pfifo_fast and htb use skb lists, without needing their spinlocks. (They instead use the standard qdisc lock) We can use __skb_queue_head_init() instead of skb_queue_head_init() to be consistent. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-19 16:32:10 -04:00
Florian Fainelli	6819563e64	net: dsa: allow switch drivers to specify phy_device::dev_flags Some switch drivers (e.g. bcm_sf2) may have to communicate specific workarounds or flags towards the PHY device driver. Allow switch drivers to be delegated that task by introducing a get_phy_flags() callback which will do just that. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-19 16:27:07 -04:00
Eric Dumazet	2e4e441071	net: add alloc_skb_with_frags() helper Extract from sock_alloc_send_pskb() code building skb with frags, so that we can reuse this in other contexts. Intent is to use it from tcp_send_rcvq(), tcp_collapse(), ... We also want to replace some skb_linearize() calls to a more reliable strategy in pathological cases where we need to reduce number of frags. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-19 16:25:23 -04:00
Eric Dumazet	cb93471acc	tcp: do not fake tcp headers in tcp_send_rcvq() Now that we no longer rely on having tcp headers for skbs in the receive queue, tcp repair does not need to build fake ones. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-19 16:04:13 -04:00
Andy Zhou	c8fffcea0a	l2tp: Refactor l2tp core driver to make use of the common UDP tunnel functions Simplify l2tp implementation using common UDP tunnel APIs. Signed-off-by: Andy Zhou <azhou@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-19 15:57:15 -04:00
Andy Zhou	6a93cc9052	udp-tunnel: Add a few more UDP tunnel APIs Added a few more UDP tunnel APIs that can be shared by UDP based tunnel protocol implementations. The main ones are highlighted below. setup_udp_tunnel_sock() configures UDP listener socket for receiving UDP encapsulated packets. udp_tunnel_xmit_skb() and udp_tunnel6_xmit_skb() transmit skb using UDP encapsulation. udp_tunnel_sock_release() closes the UDP tunnel listener socket. Signed-off-by: Andy Zhou <azhou@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-19 15:57:15 -04:00
Andy Zhou	fd384412e1	udp_tunnel: Seperate ipv6 functions into its own file. Add ip6_udp_tunnel.c for ipv6 UDP tunnel functions to avoid ifdefs in udp_tunnel.c Signed-off-by: Andy Zhou <azhou@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-19 15:57:15 -04:00
Pablo Neira Ayuso	84d7fce693	netfilter: nf_tables: export rule-set generation ID This patch exposes the ruleset generation ID in three ways: 1) The new command NFT_MSG_GETGEN that exposes the 32-bit ruleset generation ID. This ID is incremented in every commit and it should be large enough to avoid wraparound problems. 2) The less significant 16 bits of the generation ID are exposed through the nfgenmsg->res_id header field. This allows us to quickly catch if the ruleset has changed between two consecutive list dumps from different object lists (in this specific case I think the risk of wraparound is unlikely). 3) Userspace subscribers may receive notifications of new rule-set generation after every commit. This also provides an alternative way to monitor the generation ID. If the events are lost, the userspace process hits an overrun error, so it knows that it is working with a stale ruleset anyway. Patrick spotted that rule-set transformations in userspace may take quite some time. In that case, it annotates the 32-bit generation ID before fetching the rule-set, then: 1) it compares it to what we obtain after the transformation to make sure it is not working with a stale rule-set and no wraparound has occurred. 2) it subscribes to ruleset notifications, so it can watch for new generation ID. This is complementary to the NLM_F_DUMP_INTR approach, which allows us to detect an interference in the middle of a single list dump. There is no way to explicitly check that an interference has occurred between two list dumps from the kernel, since it doesn't know how many lists the userspace client is actually going to dump. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-09-19 11:14:43 +02:00
Pablo Neira Ayuso	fc04733a1a	netfilter: nfnetlink: use original skbuff when committing/aborting This allows us to access the original content of the batch from the commit and the abort paths. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-09-19 11:14:42 +02:00
Johan Hedberg	5eb596f55c	Bluetooth: Fix setting correct security level when initiating SMP We can only determine the final security level when both pairing request and response have been exchanged. When initiating pairing the starting target security level is set to MEDIUM unless explicitly specified to be HIGH, so that we can still perform pairing even if the remote doesn't have MITM capabilities. However, once we've received the pairing response we should re-consult the remote and local IO capabilities and upgrade the target security level if necessary. Without this patch the resulting Long Term Key will occasionally be reported to be unauthenticated when it in reality is an authenticated one. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Cc: stable@vger.kernel.org	2014-09-18 17:39:37 +02:00
Pablo Neira Ayuso	fcfa8f493f	Merge branch 'ipvs-next' Simon Horman says: ==================== This pull requests makes the following changes: * Add simple weighted fail-over scheduler. - Unlike other IPVS schedulers this offers fail-over rather than load balancing. Connections are directed to the appropriate server based solely on highest weight value and server availability. - Thanks to Kenny Mathis * Support IPv6 real servers in IPv4 virtual-services and vice versa - This feature is supported in conjunction with the tunnel (IPIP) forwarding mechanism. That is, IPv4 may be forwarded in IPv6 and vice versa. - The motivation for this is to allow more flexibility in the choice of IP version offered by both virtual-servers and real-servers as they no longer need to match: An IPv4 connection from an end-user may be forwarded to a real-server using IPv6 and vice versa. - Further work need to be done to support this feature in conjunction with connection synchronisation. For now such configurations are not allowed. - This change includes update to netlink protocol, adding a new destination address family attribute. And the necessary changes to plumb this information throughout IPVS. - Thanks to Alex Gartrell and Julian Anastasov ==================== Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-09-18 10:59:33 +02:00
Herbert Xu	689f1c9de2	ipsec: Remove obsolete MAX_AH_AUTH_LEN While tracking down the MAX_AH_AUTH_LEN crash in an old kernel I thought that this limit was rather arbitrary and we should just get rid of it. In fact it seems that we've already done all the work needed to remove it apart from actually removing it. This limit was there in order to limit stack usage. Since we've already switched over to allocating scratch space using kmalloc, there is no longer any need to limit the authentication length. This patch kills all references to it, including the BUG_ONs that led me here. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-09-18 10:54:36 +02:00
Alex Gartrell	bc18d37f67	ipvs: Allow heterogeneous pools now that we support them Remove the temporary consistency check and add a case statement to only allow ipip mixed dests. Signed-off-by: Alex Gartrell <agartrell@fb.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2014-09-18 08:59:29 +09:00
Julian Anastasov	f18ae7206e	ipvs: use the new dest addr family field Use the new address family field cp->daf when printing cp->daddr in logs or connection listing. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Alex Gartrell <agartrell@fb.com> Signed-off-by: Simon Horman <horms@verge.net.au>	2014-09-18 08:59:28 +09:00
Julian Anastasov	4d316f3f9a	ipvs: use correct address family in scheduler logs Needed to support svc->af != dest->af. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Alex Gartrell <agartrell@fb.com> Signed-off-by: Simon Horman <horms@verge.net.au>	2014-09-18 08:59:23 +09:00
Marcel Holtmann	0097db06f5	Bluetooth: Remove exported hci_recv_fragment function The hci_recv_fragment function is no longer used by any driver and thus do not export it. In fact it is not even needed by the core and it can be removed altogether. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2014-09-17 10:23:03 +03:00
John Fastabend	9f6c38e70b	net: sched: cls_cgroup need tcf_exts_init in all cases This ensures the tcf_exts_init() is called for all cases. Fixes: `952313bd62` ("net: sched: cls_cgroup use RCU") Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-16 16:26:39 -04:00
David S. Miller	2d9d65fa44	Merge branch 'net_next_ovs' of git://git.kernel.org/pub/scm/linux/kernel/git/pshelar/openvswitch Pravin B Shelar says: ==================== Open vSwitch Following patches adds recirculation and hash action to OVS. First patch removes pointer to stack object. Next three patches does code restructuring which is required for last patch. Recirculation implementation is changed, according to comments from David Miller, to avoid using recursive calls in OVS. It is using queue to record recirc action and deferred recirc is executed at the end of current actions execution. v1-v2: Changed subsystem name in subject to openvswitch v2-v3: Added patch to remove pkt_key pointer from skb->cb. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-16 16:21:48 -04:00
Marcel Holtmann	dda3b191eb	net: rfkill: gpio: Enable module auto-loading for ACPI based switches For the ACPI based switches the MODULE_DEVICE_TABLE is missing to export the entries for module auto-loading. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2014-09-16 16:09:01 -04:00
John Fastabend	e1f93eb06c	net: sched: cls_fw: add missing tcf_exts_init call in fw_change() When allocating a new structure we also need to call tcf_exts_init to initialize exts. A follow up patch might be in order to remove some of this code and do tcf_exts_assign(). With this we could remove the tcf_exts_init/tcf_exts_change pattern for some of the classifiers. As part of the future tcf_actions RCU series this will need to be done. For now fix the call here. Fixes `e35a8ee599` ("net: sched: fw use RCU") Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-16 15:59:36 -04:00
John Fastabend	d14cbfc88f	net: sched: cls_cgroup fix possible memory leak of 'new' tree: git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master head: `54996b529a` commit: `c7953ef230` [625/646] net: sched: cls_cgroup use RCU net/sched/cls_cgroup.c:130 cls_cgroup_change() warn: possible memory leak of 'new' net/sched/cls_cgroup.c:135 cls_cgroup_change() warn: possible memory leak of 'new' net/sched/cls_cgroup.c:139 cls_cgroup_change() warn: possible memory leak of 'new' Fixes: `c7953ef230` ("net: sched: cls_cgroup use RCU") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-16 15:59:36 -04:00
John Fastabend	a96366bf26	net: sched: cls_u32 add missing rcu_assign_pointer and annotation Add missing rcu_assign_pointer and missing annotation for ht_up in cls_u32.c Caught by kbuild bot, >> net/sched/cls_u32.c:378:36: sparse: incorrect type in initializer (different address spaces) net/sched/cls_u32.c:378:36: expected struct tc_u_hnode ht net/sched/cls_u32.c:378:36: got struct tc_u_hnode [noderef] <asn:4>ht_up >> net/sched/cls_u32.c:610:54: sparse: incorrect type in argument 4 (different address spaces) net/sched/cls_u32.c:610:54: expected struct tc_u_hnode ht net/sched/cls_u32.c:610:54: got struct tc_u_hnode [noderef] <asn:4>ht_up >> net/sched/cls_u32.c:684:18: sparse: incorrect type in assignment (different address spaces) net/sched/cls_u32.c:684:18: expected struct tc_u_hnode [noderef] <asn:4>ht_up net/sched/cls_u32.c:684:18: got struct tc_u_hnode [assigned] ht >> net/sched/cls_u32.c:359:18: sparse: dereference of noderef expression Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-16 15:59:36 -04:00
John Fastabend	80aab73de4	net: sched: fix unused cpu variable kbuild test robot reported an unused variable cpu in cls_u32.c after the patch below. This happens when the PERF and MARK config variables are disabled. The fix is to use separate variables for perf and mark and define the cpu variable inside the ifdef logic. Fixes: `459d5f626d` ("net: sched: make cls_u32 per cpu") Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-16 15:59:36 -04:00
WANG Cong	69301eaa7f	net_sched: fix a null pointer dereference in tcindex_set_parms() This patch fixes the following crash: [ 42.199159] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 [ 42.200027] IP: [<ffffffff817e3fc4>] tcindex_set_parms+0x45c/0x526 [ 42.200027] PGD d2319067 PUD d4ffe067 PMD 0 [ 42.200027] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC [ 42.200027] CPU: 0 PID: 541 Comm: tc Not tainted 3.17.0-rc4+ #603 [ 42.200027] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 42.200027] task: ffff8800d22d2670 ti: ffff8800ce790000 task.ti: ffff8800ce790000 [ 42.200027] RIP: 0010:[<ffffffff817e3fc4>] [<ffffffff817e3fc4>] tcindex_set_parms+0x45c/0x526 [ 42.200027] RSP: 0018:ffff8800ce793898 EFLAGS: 00010202 [ 42.200027] RAX: 0000000000000001 RBX: ffff8800d1786498 RCX: 0000000000000000 [ 42.200027] RDX: ffffffff82114ec8 RSI: ffffffff82114ec8 RDI: ffffffff82114ec8 [ 42.200027] RBP: ffff8800ce793958 R08: 00000000000080d0 R09: 0000000000000001 [ 42.200027] R10: ffff8800ce7939a0 R11: 0000000000000246 R12: ffff8800d017d238 [ 42.200027] R13: 0000000000000018 R14: ffff8800d017c6a0 R15: ffff8800d1786620 [ 42.200027] FS: 00007f4e24539740(0000) GS:ffff88011a600000(0000) knlGS:0000000000000000 [ 42.200027] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 42.200027] CR2: 0000000000000018 CR3: 00000000cff38000 CR4: 00000000000006f0 [ 42.200027] Stack: [ 42.200027] ffff8800ce0949f0 0000000000000000 0000000200000003 ffff880000000000 [ 42.200027] ffff8800ce7938b8 ffff8800ce7938b8 0000000600000007 0000000000000000 [ 42.200027] ffff8800ce7938d8 ffff8800ce7938d8 0000000600000007 ffff8800ce0949f0 [ 42.200027] Call Trace: [ 42.200027] [<ffffffff817e4169>] tcindex_change+0xdb/0xee [ 42.200027] [<ffffffff817c16ca>] tc_ctl_tfilter+0x44d/0x63f [ 42.200027] [<ffffffff8179d161>] rtnetlink_rcv_msg+0x181/0x194 [ 42.200027] [<ffffffff8179cf9d>] ? rtnl_lock+0x17/0x19 [ 42.200027] [<ffffffff8179cfe0>] ? 
__rtnl_unlock+0x17/0x17 [ 42.200027] [<ffffffff817ee296>] netlink_rcv_skb+0x49/0x8b [ 43.462494] [<ffffffff8179cfc2>] rtnetlink_rcv+0x23/0x2a [ 43.462494] [<ffffffff817ec8df>] netlink_unicast+0xc7/0x148 [ 43.462494] [<ffffffff817ed413>] netlink_sendmsg+0x5cb/0x63d [ 43.462494] [<ffffffff810ad781>] ? mark_lock+0x2e/0x224 [ 43.462494] [<ffffffff817757b8>] __sock_sendmsg_nosec+0x25/0x27 [ 43.462494] [<ffffffff81778165>] sock_sendmsg+0x57/0x71 [ 43.462494] [<ffffffff81152bbd>] ? might_fault+0x57/0xa4 [ 43.462494] [<ffffffff81152c06>] ? might_fault+0xa0/0xa4 [ 43.462494] [<ffffffff81152bbd>] ? might_fault+0x57/0xa4 [ 43.462494] [<ffffffff817838fd>] ? verify_iovec+0x69/0xb7 [ 43.462494] [<ffffffff817784f8>] ___sys_sendmsg+0x21d/0x2bb [ 43.462494] [<ffffffff81009db3>] ? native_sched_clock+0x35/0x37 [ 43.462494] [<ffffffff8109ab53>] ? sched_clock_local+0x12/0x72 [ 43.462494] [<ffffffff810ad781>] ? mark_lock+0x2e/0x224 [ 43.462494] [<ffffffff8109ada4>] ? sched_clock_cpu+0xa0/0xb9 [ 43.462494] [<ffffffff810aee37>] ? __lock_acquire+0x5fe/0xde4 [ 43.462494] [<ffffffff8119f570>] ? rcu_read_lock_held+0x36/0x38 [ 43.462494] [<ffffffff8119f75a>] ? __fcheck_files.isra.7+0x4b/0x57 [ 43.462494] [<ffffffff8119fbf2>] ? __fget_light+0x30/0x54 [ 43.462494] [<ffffffff81779012>] __sys_sendmsg+0x42/0x60 [ 43.462494] [<ffffffff81779042>] SyS_sendmsg+0x12/0x1c [ 43.462494] [<ffffffff819d24d2>] system_call_fastpath+0x16/0x1b 'p->h' could be NULL while 'cp->h' is always update to date. Fixes: commit `331b72922c` ("net: sched: RCU cls_tcindex") Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-By: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-16 15:20:09 -04:00
WANG Cong	44b75e4317	net_sched: fix memory leak in cls_tcindex Fixes: commit `331b72922c` ("net: sched: RCU cls_tcindex") Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-By: John Fastabend <john.r.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-16 15:19:23 -04:00
David Howells	0c903ab64f	KEYS: Make the key matching functions return bool Make the key matching functions pointed to by key_match_data::cmp return bool rather than int. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Vivek Goyal <vgoyal@redhat.com>	2014-09-16 17:36:08 +01:00
David Howells	c06cfb08b8	KEYS: Remove key_type::match in favour of overriding default by match_preparse A previous patch added a ->match_preparse() method to the key type. This is allowed to override the function called by the iteration algorithm. Therefore, we can just set a default that simply checks for an exact match of the key description with the original criterion data and allow match_preparse to override it as needed. The key_type::match op is then redundant and can be removed, as can the user_match() function. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Vivek Goyal <vgoyal@redhat.com>	2014-09-16 17:36:06 +01:00
David Howells	462919591a	KEYS: Preparse match data Preparse the match data. This provides several advantages: (1) The preparser can reject invalid criteria up front. (2) The preparser can convert the criteria to binary data if necessary (the asymmetric key type really wants to do binary comparison of the key IDs). (3) The preparser can set the type of search to be performed. This means that it's not then a one-off setting in the key type. (4) The preparser can set an appropriate comparator function. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Vivek Goyal <vgoyal@redhat.com>	2014-09-16 17:36:02 +01:00
Steffen Klassert	b8c203b2d2	xfrm: Generate queueing routes only from route lookup functions Currently we generate a queueing route if we have matching policies but cannot resolve the states and the sysctl xfrm_larval_drop is disabled. Here we assume that dst_output() is called to kill the queued packets. Unfortunately this assumption is not true in all cases, so it is possible that these packets leave the system unwanted. We fix this by generating queueing routes only from the route lookup functions, where we can guarantee a call to dst_output() afterwards. Fixes: `a0073fe18e` ("xfrm: Add a state resolution packet queue") Reported-by: Konstantinos Kolelis <k.kolelis@sirrix.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-09-16 10:08:49 +02:00
Steffen Klassert	f92ee61982	xfrm: Generate blackhole routes only from route lookup functions Currently we generate a blackhole route whenever we have matching policies but cannot resolve the states. Here we assume that dst_output() is called to kill the blackholed packets. Unfortunately this assumption is not true in all cases, so it is possible that these packets leave the system unwanted. We fix this by generating blackhole routes only from the route lookup functions, where we can guarantee a call to dst_output() afterwards. Fixes: `2774c131b1` ("xfrm: Handle blackhole route creation via afinfo.") Reported-by: Konstantinos Kolelis <k.kolelis@sirrix.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-09-16 10:08:40 +02:00
Andy Zhou	971427f353	openvswitch: Add recirc and hash action. The recirc action allows a packet to reenter openvswitch processing. Currently openvswitch looks up the flow for a received packet and executes a set of actions on that packet; with the help of the recirc action we can process/modify the packet and recirculate it back into openvswitch for another pass. The OVS hash action calculates a 5-tuple hash and sets it in the flow-key hash. This can be used along with recirculation for distributing packets among different ports for bond devices. For example: OVS bonding can use the following actions: Match on: bond flow; Action: hash, recirc(id) Match on: recirc-id == id and hash lower bits == a; Action: output port_bond_a Signed-off-by: Andy Zhou <azhou@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>	2014-09-15 23:28:14 -07:00
Andy Zhou	32ae87ff79	openvswitch: simplify sample action implementation The current sample() function implementation is more complicated than necessary in handling single user space action optimization and skb reference counting. There is no functional changes. Signed-off-by: Andy Zhou <azhou@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>	2014-09-15 23:28:14 -07:00
Pravin B Shelar	8c8b1b83fc	openvswitch: Use tun_key only for egress tunnel path. Currently tun_key is used for passing tunnel information on both the ingress and egress paths, which causes confusion. The following patch removes its use on the ingress path and makes it an egress-only parameter. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Andy Zhou <azhou@nicira.com>	2014-09-15 23:28:13 -07:00
Pravin B Shelar	83c8df26a3	openvswitch: refactor ovs flow extract API. OVS flow extract is called on packet receive or packet execute code path. Following patch defines separate API for extracting flow-key in packet execute code path. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Andy Zhou <azhou@nicira.com>	2014-09-15 23:28:13 -07:00
Pravin B Shelar	2ff3e4e486	openvswitch: Remove pkt_key from OVS_CB OVS keeps a pointer to the packet key in skb->cb, but the packet key is stored on the stack. This can make the code a bit tricky, so it is better to get rid of the pointer. Signed-off-by: Pravin B Shelar <pshelar@nicira.com>	2014-09-15 23:28:13 -07:00
Julian Anastasov	cf34e646da	ipvs: address family of LBLCR entry depends on svc family The LBLCR entries should use svc->af, not dest->af. Needed to support svc->af != dest->af. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Alex Gartrell <agartrell@fb.com> Signed-off-by: Simon Horman <horms@verge.net.au>	2014-09-16 09:03:38 +09:00
Julian Anastasov	f7fa380069	ipvs: address family of LBLC entry depends on svc family The LBLC entries should use svc->af, not dest->af. Needed to support svc->af != dest->af. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Alex Gartrell <agartrell@fb.com> Signed-off-by: Simon Horman <horms@verge.net.au>	2014-09-16 09:03:38 +09:00
Alex Gartrell	8052ba2925	ipvs: support ipv4 in ipv6 and ipv6 in ipv4 tunnel forwarding Pull the common logic for preparing an skb to prepend the header into a single function and then set fields such that they can be used in either case (generalize tos and tclass to dscp, hop_limit and ttl to ttl, etc) Signed-off-by: Alex Gartrell <agartrell@fb.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2014-09-16 09:03:37 +09:00
Alex Gartrell	c63e4de2be	ipvs: Add generic ensure_mtu_is_adequate to handle mixed pools The out_rt functions check to see if the mtu is large enough for the packet and, if not, send icmp messages (TOOBIG or DEST_UNREACH) to the source and bail out. We needed the ability to send ICMP from the out_rt_v6 function and DEST_UNREACH from the out_rt function, so we just pulled it out into a common function. Signed-off-by: Alex Gartrell <agartrell@fb.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2014-09-16 09:03:37 +09:00
Alex Gartrell	919aa0b2bb	ipvs: Pull out update_pmtu code Another step toward heterogeneous pools, this removes another piece of functionality currently specific to each address family type. Signed-off-by: Alex Gartrell <agartrell@fb.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2014-09-16 09:03:36 +09:00
Alex Gartrell	4a4739d56b	ipvs: Pull out crosses_local_route_boundary logic This logic is repeated in both out_rt functions so it was redundant. Additionally, we'll need to be able to do checks to route v4 to v6 and vice versa in order to deal with heterogeneous pools. This patch also updates the callsites to add an additional parameter to the out route functions. Signed-off-by: Alex Gartrell <agartrell@fb.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2014-09-16 09:03:36 +09:00
Alex Gartrell	391f503d69	ipvs: prevent mixing heterogeneous pools and synchronization The synchronization protocol is not compatible with heterogeneous pools, so we need to verify that we're not turning both on at the same time. Signed-off-by: Alex Gartrell <agartrell@fb.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2014-09-16 09:03:35 +09:00
Alex Gartrell	ba38528aae	ipvs: Supply destination address family to ip_vs_conn_new The assumption that dest af is equal to service af is now unreliable, so we must specify it manually so as not to copy just the first 4 bytes of a v6 address or do an illegal read of 16 bytes on a v4 address. We "lie" in two places: for synchronization (which we will explicitly disallow from happening when we have heterogeneous pools) and for black hole addresses where there's no real dest. Signed-off-by: Alex Gartrell <agartrell@fb.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2014-09-16 09:03:34 +09:00
Alex Gartrell	ad147aa4dd	ipvs: Pass destination address family to ip_vs_trash_get_dest Part of a series of diffs to tease out destination family from virtual family. This diff just adds a parameter to ip_vs_trash_get and then uses it for comparison rather than svc->af. Signed-off-by: Alex Gartrell <agartrell@fb.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2014-09-16 09:03:34 +09:00
Alex Gartrell	655eef103d	ipvs: Supply destination addr family to ip_vs_{lookup_dest,find_dest} We need to remove the assumption that virtual address family is the same as real address family in order to support heterogeneous services (that is, services with v4 vips and v6 backends or the opposite). Signed-off-by: Alex Gartrell <agartrell@fb.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2014-09-16 09:03:33 +09:00
Alex Gartrell	6cff339bbd	ipvs: Add destination address family to netlink interface This is necessary to support heterogeneous pools. For example, if you have an ipv6 addressed network, you'll want to be able to forward ipv4 traffic into it. This patch enforces that destination address family is the same as service family, as none of the forwarding mechanisms support anything else. For the old setsockopt mechanism, we simply set the dest address family to AF_INET as we do with the service. Signed-off-by: Alex Gartrell <agartrell@fb.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2014-09-16 09:03:33 +09:00
Kenny Mathis	616a9be25c	ipvs: Add simple weighted failover scheduler Add simple weighted IPVS failover support to the Linux kernel. All other scheduling modules implement some form of load balancing, while this offers a simple failover solution. Connections are directed to the appropriate server based solely on highest weight value and server availability. Tested functionality with keepalived. Signed-off-by: Kenny Mathis <kmathis@chokepoint.net> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2014-09-16 09:03:32 +09:00
Florian Fainelli	c1f570a6ab	net: dsa: fix mii_bus to host_dev replacement dsa_of_probe() still used cd->mii_bus instead of cd->host_dev when building with CONFIG_OF=y. Fix this by making the replacement here as well. Fixes: `b4d2394d01` ("dsa: Replace mii_bus with a generic host device") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-15 17:52:48 -04:00
WANG Cong	10ee1c34be	net_sched: use tcindex_filter_result_init() Fixes: commit `331b72922c` ("net: sched: RCU cls_tcindex") Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-15 17:51:18 -04:00
WANG Cong	2f9a220eff	net_sched: fix suspicious RCU usage in tcindex_classify() This patch fixes the following kernel warning: [ 44.805900] [ INFO: suspicious RCU usage. ] [ 44.808946] 3.17.0-rc4+ #610 Not tainted [ 44.811831] ------------------------------- [ 44.814873] net/sched/cls_tcindex.c:84 suspicious rcu_dereference_check() usage! Fixes: commit `331b72922c` ("net: sched: RCU cls_tcindex") Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-15 17:49:42 -04:00
WANG Cong	a57a65ba47	net_sched: fix an allocation bug in tcindex_set_parms() Fixes: commit `331b72922c` ("net: sched: RCU cls_tcindex") Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-15 17:48:23 -04:00
WANG Cong	80dcbd12fb	net_sched: fix suspicious RCU usage in cls_bpf_classify() Fixes: commit `1f947bf151` ("net: sched: rcu'ify cls_bpf") Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-15 17:42:08 -04:00
Vlad Yasevich	c095f248e6	bridge: Fix br_should_learn to check vlan_enabled As Toshiaki Makita pointed out, the BRIDGE_INPUT_SKB_CB will not be initialized in br_should_learn() as that function is called only from br_handle_local_finish(). That is an input handler for link-local ethernet traffic, so it is perfectly correct to check br->vlan_enabled here. Reported-by: Toshiaki Makita <toshiaki.makita1@gmail.com> Fixes: `20adfa1` bridge: Check if vlan filtering is enabled only once. Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-15 17:38:30 -04:00
Alexander Duyck	b4d2394d01	dsa: Replace mii_bus with a generic host device This change makes it so that instead of passing and storing a mii_bus we instead pass and store a host_dev. From there we can test to determine the exact type of device, and can verify it is the correct device for our switch. So for example it would be possible to pass a device pointer from a pci_dev and instead of checking for a PHY ID we could check for a vendor and/or device ID. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-15 17:24:20 -04:00
Alexander Duyck	5075314e4e	dsa: Split ops up, and avoid assigning tag_protocol and receive separately This change addresses several issues. First, it was possible to set tag_protocol without setting the ops pointer. To correct that I have reordered things so that rcv is now populated before we set tag_protocol. Second, it didn't make much sense to keep setting the device ops each time a new slave was registered. So by moving the receive portion out into root switch initialization that issue should be addressed. Third, I wanted to avoid sending tags if the rcv pointer was not registered so I changed the tag check to verify if the rcv function pointer is set on the root tree. If it is then we start sending DSA tagged frames. Finally I split the device ops pointer in the structures into two spots. I placed the rcv function pointer in the root switch since this makes it easiest to access from there, and I placed the xmit function pointer in the slave for the same reason. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-15 17:24:20 -04:00
Jozsef Kadlecsik	07034aeae1	netfilter: ipset: hash:mac type added to ipset Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2014-09-15 22:20:21 +02:00
Anton Danilov	76cea4109c	netfilter: ipset: Add skbinfo extension support to SET target. Signed-off-by: Anton Danilov <littlesmilingcloud@gmail.com> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2014-09-15 22:20:21 +02:00
Anton Danilov	cbee93d7b7	netfilter: ipset: Add skbinfo extension kernel support for the list set type. Add skbinfo extension kernel support for the list set type. Introduce the new revision of the list set type. Signed-off-by: Anton Danilov <littlesmilingcloud@gmail.com> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2014-09-15 22:20:20 +02:00
Anton Danilov	af331419d3	netfilter: ipset: Add skbinfo extension kernel support for the hash set types. Add skbinfo extension kernel support for the hash set types. Introduce the new revisions of all hash set types. Signed-off-by: Anton Danilov <littlesmilingcloud@gmail.com> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2014-09-15 22:20:20 +02:00
Anton Danilov	39d1ecf1ad	netfilter: ipset: Add skbinfo extension kernel support for the bitmap set types. Add skbinfo extension kernel support for the bitmap set types. Introduce the new revisions of bitmap_ip, bitmap_ipmac and bitmap_port set types. Signed-off-by: Anton Danilov <littlesmilingcloud@gmail.com> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2014-09-15 22:20:20 +02:00
Anton Danilov	0e9871e3f7	netfilter: ipset: Add skbinfo extension kernel support in the ipset core. The skbinfo extension provides mapping of metainformation with lookup in the ipset tables. This patch defines the flags, the constants, the functions and the structures for the data type independent support of the extension. Note that the firewall mark is stored in the kernel structures as two 32bit values, but transferred through netlink as one 64bit value. Signed-off-by: Anton Danilov <littlesmilingcloud@gmail.com> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2014-09-15 22:20:20 +02:00
Jozsef Kadlecsik	73e64e1813	netfilter: ipset: Fix static checker warning in ip_set_core.c Dan Carpenter reported the following static checker warning: net/netfilter/ipset/ip_set_core.c:1414 call_ad() error: 'nlh->nlmsg_len' from user is not capped properly The payload size is limited now by the max size of size_t. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2014-09-15 22:20:20 +02:00
John W. Linville	1186b623c2	Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next	2014-09-15 14:55:45 -04:00
John W. Linville	6bd2bd27ba	Merge tag 'mac80211-next-for-john-2014-09-12' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg <johannes@sipsolutions.net> says: "This time, I have some rate minstrel improvements, support for a very small feature from CCX that Steinar reverse-engineered, dynamic ACK timeout support, a number of changes for TDLS, early support for radio resource measurement and many fixes. Also, I'm changing a number of places to clear key memory when it's freed and Intel claims copyright for code they developed." Conflicts: net/mac80211/iface.c Signed-off-by: John W. Linville <linville@tuxdriver.com>	2014-09-15 14:51:23 -04:00
Eric Dumazet	b3d6cb92fd	tcp: do not copy headers in tcp_collapse() tcp_collapse() wants to shrink skb so that the overhead is minimal. Now we store tcp flags into TCP_SKB_CB(skb)->tcp_flags, we no longer need to keep around full headers. Whole available space is dedicated to the payload. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-15 14:41:08 -04:00
Eric Dumazet	e93a0435f8	tcp: allow segment with FIN in tcp_try_coalesce() We can allow a segment with FIN to be aggregated, if we take care to add tcp flags, and if skb_try_coalesce() takes care of zero sized skbs. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-15 14:41:07 -04:00
Eric Dumazet	e11ecddf51	tcp: use TCP_SKB_CB(skb)->tcp_flags in input path The input path of TCP does not currently use TCP_SKB_CB(skb)->tcp_flags, which is only used in the output path. tcp_recvmsg() looks at tcp_hdr(skb)->syn for every skb found in the receive queue, and it's unfortunate because this bit is located in a cache line right before the payload. We can simplify TCP by copying tcp flags into TCP_SKB_CB(skb)->tcp_flags. This patch does so, and avoids the cache line miss in tcp_recvmsg(). Following patches will - allow a segment with FIN being coalesced in tcp_try_coalesce() - simplify tcp_collapse() by not copying the headers. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-15 14:41:07 -04:00
Alexander Y. Fomichev	7ce64c79c4	net: fix creation adjacent device symlinks __netdev_adjacent_dev_insert may add an adjacent device of a different net namespace; without a proper check this leads to broken sysfs links from/to devices in another namespace. Fix: rewrite netdev_adjacent_is_neigh_list macro as a function, move net_eq check into netdev_adjacent_is_neigh_list. (thanks David) related to: `4c75431ac3` Signed-off-by: Alexander Fomichev <git.user@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-15 14:24:53 -04:00
Marcel Holtmann	43e73e4e2a	Bluetooth: Provide HCI command opcode information to driver The Bluetooth core already does processing of the HCI command header and puts it together before sending it to the driver. It is not really efficient for the driver to look at the HCI command header again in case it has to make certain decisions about certain commands. To make this easier, just provide the opcode as part of the SKB control buffer information. The extra information about the opcode is optional and only provided for HCI commands. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2014-09-15 07:15:45 +03:00
Marcel Holtmann	7cb9d20fd9	Bluetooth: Add BUILD_BUG_ON check for SKB control buffer size The struct bt_skb_cb size needs to stay within the limits of skb->cb at all times and to ensure that add a BUILD_BUG_ON to check for it at compile time. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2014-09-15 07:15:41 +03:00
Sasha Levin	c0d1379a19	net: bpf: correctly handle errors in sk_attach_filter() Commit "net: bpf: make eBPF interpreter images read-only" has changed bpf_prog to be vmalloc()ed but never handled some of the error paths of the old code. On error within sk_attach_filter (which userspace can easily trigger), we'd kfree() the vmalloc()ed memory, and leak the internal bpf_work_struct. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-13 17:37:49 -04:00
Vlad Yasevich	635126b7ca	bridge: Allow clearing of pvid and untagged bitmap Currently, it is possible to modify the vlan filter configuration to add pvid or untagged support. For example: bridge vlan add vid 10 dev eth0 bridge vlan add vid 10 dev eth0 untagged pvid The second statement will modify vlan 10 to include untagged and pvid configuration. However, it is currently impossible to go backwards: bridge vlan add vid 10 dev eth0 untagged pvid bridge vlan add vid 10 dev eth0 Here nothing happens. This patch corrects this so that any modifiers not supplied are removed from the configuration. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-13 17:21:56 -04:00
Vlad Yasevich	20adfa1a81	bridge: Check if vlan filtering is enabled only once. The bridge code checks if vlan filtering is enabled on both ingress and egress. When the state flip happens, it is possible for the bridge to currently be forwarding packets and forwarding behavior becomes non-deterministic. Bridge may drop packets on some interfaces, but not others. This patch solves this by caching the filtered state of the packet into skb_cb on ingress. The skb_cb is guaranteed to not be over-written between the time packet enters bridge forwarding path and the time it leaves it. On egress, we can then check the cached state to see if we need to apply filtering information. Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-13 17:21:56 -04:00
Hannes Frederic Sowa	233577a220	net: filter: constify detection of pkt_type_offset Currently we have 2 pkt_type_offset functions doing the same thing and spread across the architecture files. Remove those and replace them with a PKT_TYPE_OFFSET macro helper which gets the constant value from a zero sized sk_buff member right in front of the bitfield with offsetof. This new offset marker does not change size of struct sk_buff. Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Markos Chandras <markos.chandras@imgtec.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Daniel Borkmann <dborkman@redhat.com> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-13 17:07:21 -04:00
Florian Fainelli	ac7a04c33d	net: dsa: change tag_protocol to an enum Now that we introduced an additional multiplexing/demultiplexing layer with commit `3e8a72d1da` ("net: dsa: reduce number of protocol hooks") that lives within the DSA code, we no longer need to have a given switch driver tag_protocol be an actual ethertype value, instead, we can replace it with an enum: dsa_tag_protocol. Do this replacement in the drivers, which allows us to get rid of the cpu_to_be16()/htons() dance, and remove ETH_P_BRCMTAG since we do not need it anymore. Suggested-by: Alexander Duyck <alexander.duyck@gmail.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-13 17:04:35 -04:00
WANG Cong	3ce62a84d5	ipv6: exit early in addrconf_notify() if IPv6 is disabled If IPv6 is explicitly disabled before the interface comes up, it makes no sense to continue when it comes up, even just print a message. (I am not sure about other cases though, so I prefer not to touch) Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-13 16:39:40 -04:00
WANG Cong	1691c63ea4	ipv6: refactor ipv6_dev_mc_inc() Refactor out allocation and initialization and make the refcount code more readable. Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-13 16:38:42 -04:00
WANG Cong	f7ed925c1b	ipv6: update the comment in mcast.c Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-13 16:38:42 -04:00
WANG Cong	414b6c943f	ipv6: drop some rcu_read_lock in mcast Similarly the code is already protected by rtnl lock. Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-13 16:38:42 -04:00
WANG Cong	b5350916bf	ipv6: drop ipv6_sk_mc_lock in mcast Similarly the code is already protected by rtnl lock. Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-13 16:38:42 -04:00
WANG Cong	83aa29eefd	ipv6: refactor __ipv6_dev_ac_inc() Refactor out allocation and initialization and make the refcount code more readable. Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-13 16:38:42 -04:00
WANG Cong	013b4d9038	ipv6: clean up ipv6_dev_ac_inc() Make it accept inet6_dev, and rename it to __ipv6_dev_ac_inc() to reflect this change. Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-13 16:38:42 -04:00
WANG Cong	b03a9c04a3	ipv6: remove ipv6_sk_ac_lock Just move rtnl lock up, so that the anycast list can be protected by rtnl lock now. Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-13 16:38:42 -04:00
WANG Cong	6c555490e0	ipv6: drop useless rcu_read_lock() in anycast This code is now protected by rtnl lock, rcu read lock is useless now. Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-13 16:38:42 -04:00
John Fastabend	1f947bf151	net: sched: rcu'ify cls_bpf This patch makes the cls_bpf classifier RCU safe. The tcf_lock was being used to protect a list of cls_bpf_prog now this list is RCU safe and updates occur with rcu_replace. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-13 12:30:26 -04:00
John Fastabend	b929d86d25	net: sched: rcu'ify cls_rsvp Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-13 12:30:26 -04:00
John Fastabend	1ce87720d4	net: sched: make cls_u32 lockless Make cls_u32 classifier safe to run without holding lock. This patch converts statistics that are kept in read section u32_classify into per cpu counters. This patch was tested with a tight u32 filter add/delete loop while generating traffic with pktgen. By running pktgen on vlan devices created on top of a physical device we can hit the qdisc layer correctly. For ingress qdisc's a loopback cable was used. for i in {1..100}; do q=`echo $i%8\|bc`; echo -n "u32 tos: iteration $i on queue $q"; tc filter add dev p3p2 parent $p prio $i u32 match ip tos 0x10 0xff \ action skbedit queue_mapping $q; sleep 1; tc filter del dev p3p2 prio $i; echo -n "u32 tos hash table: iteration $i on queue $q"; tc filter add dev p3p2 parent $p protocol ip prio $i handle 628: u32 divisor 1 tc filter add dev p3p2 parent $p protocol ip prio $i u32 \ match ip protocol 17 0xff link 628: offset at 0 mask 0xf00 shift 6 plus 0 tc filter add dev p3p2 parent $p protocol ip prio $i u32 \ ht 628:0 match ip tos 0x10 0xff action skbedit queue_mapping $q sleep 2; tc filter del dev p3p2 prio $i sleep 1; done Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-13 12:30:26 -04:00
John Fastabend	459d5f626d	net: sched: make cls_u32 per cpu This uses per cpu counters in cls_u32 in preparation to convert over to rcu. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-13 12:30:26 -04:00
John Fastabend	331b72922c	net: sched: RCU cls_tcindex Make cls_tcindex RCU safe. This patch adds a new RCU routine rcu_dereference_bh_rtnl() to check caller either holds the rcu read lock or RTNL. This is needed to handle the case where tcindex_lookup() is being called in both cases. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-13 12:30:26 -04:00
John Fastabend	1109c00547	net: sched: RCU cls_route RCUify the route classifier. For now however spinlock's are used to protect fastmap cache. The issue here is the fastmap may be read by one CPU while the cache is being updated by another. An array of pointers could be one possible solution. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-13 12:30:26 -04:00
John Fastabend	e35a8ee599	net: sched: fw use RCU RCU'ify fw classifier. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-13 12:30:26 -04:00
John Fastabend	70da9f0bf9	net: sched: cls_flow use RCU Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-13 12:30:26 -04:00
John Fastabend	952313bd62	net: sched: cls_cgroup use RCU Make cgroup classifier safe for RCU. Also drops the calls in the classify routine that were doing a rcu_read_lock()/rcu_read_unlock(). If the rcu_read_lock() isn't held entering this routine we have issues with deleting the classifier chain so remove the unnecessary rcu_read_lock()/rcu_read_unlock() pair noting all paths AFAIK hold rcu_read_lock. If there is a case where classify is called without the rcu read lock then an rcu splat will occur and we can correct it. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-13 12:30:26 -04:00
John Fastabend	9888faefe1	net: sched: cls_basic use RCU Enable basic classifier for RCU. Dereferencing tp->root may look a bit strange here but it is needed by my accounting because it is allocated at init time and needs to be kfree'd at destroy time. However because it may be referenced in the classify() path we must wait an RCU grace period before free'ing it. We use kfree_rcu() and rcu_ APIs to enforce this. This pattern is used in all the classifiers. Also the hgenerator can be incremented without concern because it is always incremented under RTNL. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-13 12:30:25 -04:00
John Fastabend	25d8c0d55f	net: rcu-ify tcf_proto rcu'ify tcf_proto this allows calling tc_classify() without holding any locks. Updaters are protected by RTNL. This patch prepares the core net_sched infrastructure for running the classifier/action chains without holding the qdisc lock however it does nothing to ensure cls_xxx and act_xxx types also work without locking. Additional patches are required to address the fall out. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-13 12:30:25 -04:00
John Fastabend	46e5da40ae	net: qdisc: use rcu prefix and silence sparse warnings Add __rcu notation to qdisc handling by doing this we can make smatch output more legible. And anyways some of the cases should be using rcu_dereference() see qdisc_all_tx_empty(), qdisc_tx_changing(), and so on. Also *wake_queue() API is commonly called from driver timer routines without rcu lock or rtnl lock. So I added rcu_read_lock() blocks around netif_wake_subqueue and netif_tx_wake_queue. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-13 12:30:25 -04:00
David S. Miller	cffc6c4c94	Merge tag 'master-2014-09-11' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless John W. Linville says: ==================== pull request: wireless 2014-09-11 Please pull this batch of fixes intended for the 3.17 stream: For the mac80211 bits, Johannes says: "Two more fixes for mac80211 - one of them addresses a long-standing issue that we only found when using vendor events more frequently; the other addresses some bad information being reported in userspace that people were starting to actually look at." For the iwlwifi bits, Emmanuel says: "I re-enable scheduled scan on firmware that contain the fix for the bug that Linus reported. A few trivial fixes: endianity issues, the same DTIM period fix that I did in mac80211. Eyal fixes a few issues we identified with EAPOL, we now send them just as if they were management frames, this solves interrop issues. Johannes has another set of trivial fixes, while Luca fixes the way we configure the filters in the firmware. Last but not least, a new device is added by Oren." Emmanuel was traveling, resulting in his pull to be a bit larger than I would have liked to see at this point. FWIW, I have asked Emmanuel to be much more strict for any more pull requests in this cycle. In addition to the above, Sujith Manoharan reverts an earlier ath9k patch. The earlier change was found to allow for the device to sleep too long and miss beacons. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-12 18:21:47 -04:00
Scott Wood	2d8f7e2c8a	udp: Fix inverted NAPI_GRO_CB(skb)->flush test Commit `2abb7cdc0d` ("udp: Add support for doing checksum unnecessary conversion") caused napi_gro_cb structs with the "flush" field zero to take the "udp_gro_receive" path rather than the "set flush to 1" path that they would previously take. As a result I saw booting from an NFS root hang shortly after starting userspace, with "server not responding" messages. This change to the handling of "flush == 0" packets appears to be incidental to the goal of adding new code in the case where skb_gro_checksum_validate_zero_check() returns zero. Based on that and the fact that it breaks things, I'm assuming that it is unintentional. Fixes: `2abb7cdc0d` ("udp: Add support for doing checksum unnecessary conversion") Cc: Tom Herbert <therbert@google.com> Signed-off-by: Scott Wood <scottwood@freescale.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-12 17:55:41 -04:00
Alexander Duyck	bf7fa551e0	mac80211: Resolve sk_refcnt/sk_wmem_alloc issue in wifi ack path There is a possible issue with the use, or lack thereof of sk_refcnt and sk_wmem_alloc in the wifi ack status functionality. Specifically if a socket were to request acknowledgements, and the socket were to have sk_refcnt drop to 0 resulting in it waiting on sk_wmem_alloc to reach 0 it would be possible to have sock_queue_err_skb orphan the last buffer, resulting in __sk_free being called on the socket. After this the buffer is enqueued on sk_error_queue, however the queue has already been flushed resulting in at least a memory leak, if not a data corruption. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-12 17:51:25 -04:00
Alexander Duyck	cab41c47d9	skb: Add documentation for skb_clone_sk This change adds some documentation to the call skb_clone_sk. This is meant to help clarify the purpose of the function for other developers. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-12 17:51:24 -04:00
Sabrina Dubroca	381f4dca48	ipv6: clean up anycast when an interface is destroyed If we try to rmmod the driver for an interface while sockets with setsockopt(JOIN_ANYCAST) are alive, some refcounts aren't cleaned up and we get stuck on: unregister_netdevice: waiting for ens3 to become free. Usage count = 1 If we LEAVE_ANYCAST/close everything before rmmod'ing, there is no problem. We need to perform a cleanup similar to the one for multicast in addrconf_ifdown(how == 1). Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-12 17:33:06 -04:00
Johan Hedberg	9a783a139c	Bluetooth: Fix re-setting RPA as expired when deferring update The hci_update_random_address will clear the RPA_EXPIRED flag and proceed with setting a new one if the flag was set. However, the set_random_addr() function that is called may choose to defer the update to a later moment. In such a case the flag would incorrectly remain unset unless set_random_addr() re-sets it. This patch fixes the issue. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-09-12 18:34:25 +02:00
Pablo Neira Ayuso	0bbe80e571	netfilter: masquerading needs to be independent of x_tables in Kconfig Users are starting to test nf_tables with no x_tables support. Therefore, masquerading needs to be independent of it from Kconfig. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-09-12 09:40:18 +02:00
Pablo Neira Ayuso	3e8dc212a0	netfilter: NFT_CHAIN_NAT_IPV* is independent of NFT_NAT Now that we have masquerading support in nf_tables, the NAT chain can be used with it, not only for SNAT/DNAT. So make this chain type independent of it. While at it, move it inside the scope of 'if NF_NAT_IPV*' to simplify dependencies. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-09-12 09:40:17 +02:00
Linus Torvalds	c73f6fdf2f	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph fixes from Sage Weil: "The main thing here is a set of three patches that fix a buffer overrun for large authentication tickets (sigh). There is also a trivial warning fix and an error path fix that are both regressions" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: libceph: do not hard code max auth ticket len libceph: add process_one_ticket() helper libceph: gracefully handle large reply messages from the mon rbd: fix error return code in rbd_dev_device_setup() rbd: avoid format-security warning inside alloc_workqueue()	2014-09-11 18:03:21 -07:00
Eliad Peller	0d8614b4b9	mac80211: replace SMPS hw flags with wiphy feature bits Use the new static_smps / dynamic_smps feature bits instead of mac80211-internal hw flags. Signed-off-by: Eliad Peller <eliad@wizery.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-09-11 13:37:02 +02:00
Eliad Peller	f699317487	mac80211: set smps_mode according to ap params Take the requested smps mode from the ap params (instead of always starting with SMPS_OFF) Signed-off-by: Eliad Peller <eliad@wizery.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-09-11 13:37:02 +02:00
Eliad Peller	18998c381b	cfg80211: allow requesting SMPS mode on ap start Add feature bits to indicate device support for static-smps and dynamic-smps modes. Add a new NL80211_ATTR_SMPS_MODE attribute to allow configuring the smps mode to be used by the ap (e.g. configuring the ap to dynamic smps mode will reduce power consumption while having minor effect on throughput) Signed-off-by: Eliad Peller <eliad@wizery.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-09-11 13:37:02 +02:00
Arik Nemtsov	59cd85cbcf	mac80211: set network header in TDLS frames Correctly mark the network header location in mac80211-generated TDLS frames. These may be used by lower-level drivers. Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-09-11 12:25:22 +02:00
Eliad Peller	b0b6aa2c8e	cfg80211/mac80211: add wmm info to assoc event Userspace might need to know what queues are configured for uapsd (e.g. for setting proper default values in tspecs). Add this bitmap to the association event (inside wmm nested attribute) Add additional parameter to cfg80211_rx_assoc_resp, and update its callers. Signed-off-by: Eliad Peller <eliadx.peller@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-09-11 12:24:39 +02:00
Johannes Berg	960d01acf6	cfg80211: add WMM traffic stream API Add nl80211 and driver API to validate, add and delete traffic streams with appropriate settings. The API calls for userspace doing the action frame handshake with the peer, and then allows only to set up the parameters in the driver. To avoid setting up a session only to tear it down again, the validate API is provided, but the real usage later can still fail so userspace must be prepared for that. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-09-11 12:21:18 +02:00
Liad Kaufman	9d58f25b12	mac80211: add TDLS connection timeout Adding a timeout for tearing down a TDLS connection that hasn't had ACKed traffic sent through it for a certain amount of time. Since we have no other monitoring facility to indicate the existence (or non-existence) of a peer, this patch will cause a peer to be considered as unavailable if for some X time at least some Y packets have all not been ACKed. Signed-off-by: Liad Kaufman <liad.kaufman@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-09-11 12:18:47 +02:00
Thomas Huehn	5935839ad7	mac80211: improve minstrel_ht rate sorting by throughput & probability This patch improves the way minstrel_ht sorts rates according to throughput and success probability. 3 FOR-loops across the entire rate and mcs group set in function minstrel_ht_update_stats() which were used to determine the fastest, second fastest and most robust rate are reduced to 2 FOR-loops. The sorted list of rates according to throughput is extended to the best four rates as we need them in upcoming joint rate and power control. The sorting is done via the new function minstrel_ht_sort_best_tp_rates(). The annotation of those 4 best throughput rates in the debugfs file rc-stats is changed to: "A,B,C,D", where A is the fastest rate and D the 4th fastest. Signed-off-by: Thomas Huehn <thomas@net.t-labs.tu-berlin.de> Tested-by: Stefan Venz <ikstream86@gmail.com> Acked-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-09-11 12:10:14 +02:00
Thomas Huehn	ca12c0c833	mac80211: Unify rate statistic variables between Minstrel & Minstrel_HT Minstrel and Minstrel_HT used their own structs to keep track of rate statistics. Unify those variables in struct minstrel_rate_states and move it to rc80211_minstrel.h for common usage. This is a clean-up patch to prepare Minstrel and Minstrel_HT codebase for upcoming TPC. Signed-off-by: Thomas Huehn <thomas@net.t-labs.tu-berlin.de> Acked-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-09-11 12:08:31 +02:00
Johannes Berg	5393b917bc	cfg80211: clear nl80211 messages carrying keys after processing Clear any nl80211 messages that might contain keys after processing them to avoid leaving their data in memory "forever" after they've been freed. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-09-11 12:07:39 +02:00
Johannes Berg	78f686cae0	cfg80211: don't put kek/kck/replay counter on the stack There's no need to put the values on the stack, just pass a pointer to the data in the nl80211 message. This reduces stack usage and avoids potential issues with putting sensitive data on the stack. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-09-11 12:07:34 +02:00
Johannes Berg	538c9eb8b3	cfg80211: clear wext keys when freeing and removing them When freeing the keys stored for wireless extensions, clear the memory to avoid having the key material stick around in memory "forever". Similarly, when userspace overwrites a key, actually clear it instead of just setting the key length to zero. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-09-11 12:07:28 +02:00
Johannes Berg	29c3f9c399	mac80211: clear key material when freeing keys When freeing the key, clear the memory to avoid having the key material stick around in memory "forever". Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-09-11 12:07:23 +02:00
Johannes Berg	b47f610bd6	cfg80211: clear connect keys when freeing them When freeing the connect keys, clear the memory to avoid having the key material stick around in memory "forever". Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-09-11 12:07:18 +02:00
Johan Hedberg	7ed3fa2078	Bluetooth: Expire RPA if encryption fails If encryption fails and we're using an RPA it may be because of a conflict with another device. To avoid repeated failures the safest action is to simply mark the RPA as expired so that a new one gets generated as soon as the connection drops. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-09-11 07:32:14 +02:00
Johan Hedberg	5be5e275ad	Bluetooth: Avoid hard-coded IO capability values in SMP This is a trivial change to use a proper define for the NoInputNoOutput IO capability instead of hard-coded values. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-09-11 03:02:22 +02:00
Johan Hedberg	aeaeb4bbca	Bluetooth: Fix L2CAP information request handling for fixed channels Even if we have no connection-oriented channels we should perform the L2CAP Information Request procedures before notifying L2CAP channels of the connection. This is so that the L2CAP channel implementations can perform checks on what the remote side supports (e.g. does it support the fixed channel in question). So far the code has relied on the l2cap_do_start() function to initiate the Information Request, however l2cap_do_start() is used on a per-channel basis and only for connection-oriented channels. This means that if there are no connection-oriented channels on the system we would never start the Information Request procedure. This patch creates a new l2cap_request_info() helper function to initiate the Information Request procedure, and ensures that it is called whenever a BR/EDR connection has been established. The patch also updates fixed channels to be notified of connection readiness only once the Information Request procedure has completed. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-09-11 02:45:24 +02:00
Johan Hedberg	a6f7833ca3	Bluetooth: Add smp_ltk_sec_level() helper function There are several places that need to determine the security level that an LTK can provide. This patch adds a convenience function for this to help make the code more readable. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-09-11 02:45:24 +02:00
Johan Hedberg	1afc2a1ab6	Bluetooth: Fix SMP security level when we have no IO capabilities When the local IO capability is NoInputNoOutput any attempt to convert the remote authentication requirement to a target security level is futile. This patch makes sure that we set the target security level at most to MEDIUM if the local IO capability is NoInputNoOutput. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-09-11 02:45:24 +02:00
Johan Hedberg	24bd0bd94e	Bluetooth: Centralize disallowing SMP commands to a single place All the cases where we mark SMP commands as disallowed are in their respective command handlers. We can therefore simplify the code by always clearing the bit immediately after testing it. This patch converts the corresponding test_bit() call to a test_and_clear_bit() call and also removes the now unused SMP_DISALLOW_CMD macro. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-09-11 02:45:24 +02:00
Johan Hedberg	c05b9339c8	Bluetooth: Fix ignoring unknown SMP authentication requirement bits The SMP specification states that we should ignore any unknown bits from the authentication requirement. We already have a define for masking out unknown bits but we haven't used it in all places so far. This patch adds usage of the AUTH_REQ_MASK to all places that need it and ensures that we don't pass unknown bits onward to other functions. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-09-11 02:45:24 +02:00
Johan Hedberg	3a7dbfb8ff	Bluetooth: Remove unnecessary early initialization of variable We do nothing else with the auth variable in smp_cmd_pairing_rsp() besides passing it to tk_request() which in turn only cares about whether one of the sides had the MITM bit set. It is therefore unnecessary to assign a value to it until just before calling tk_request(), and this value can simply be the bit-wise or of the local and remote requirements. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-09-11 02:45:24 +02:00
Erik Hugne	0fc4dffad1	tipc: fix sparse warnings This fixes the following sparse warnings: sparse: symbol 'tipc_update_nametbl' was not declared. Should it be static? Also, the function is changed to return bool upon success, rather than a potentially freed pointer. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-10 14:00:58 -07:00
Chris Perl	0f7a622ca6	rpc: xs_bind - do not bind when requesting a random ephemeral port When attempting to establish a local ephemeral endpoint for a TCP or UDP socket, do not explicitly call bind, instead let it happen implicitly when the socket is first used. The main motivating factor for this change is when TCP runs out of unique ephemeral ports (i.e. cannot find any ephemeral ports which are not a part of any TCP connection). In this situation if you explicitly call bind, then the call will fail with EADDRINUSE. However, if you allow the allocation of an ephemeral port to happen implicitly as part of connect (or other functions), then ephemeral ports can be reused, so long as the combination of (local_ip, local_port, remote_ip, remote_port) is unique for TCP sockets on the system. This doesn't matter for UDP sockets, but it seemed easiest to treat TCP and UDP sockets the same. This can allow mount.nfs(8) to continue to function successfully, even in the face of misbehaving applications which are creating a large number of TCP connections. Signed-off-by: Chris Perl <chris.perl@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-09-10 12:47:00 -07:00
David S. Miller	0aac383353	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== nf-next pull request The following patchset contains Netfilter/IPVS updates for your net-next tree. Regarding nf_tables, most updates focus on consolidating the NAT infrastructure and adding support for masquerading. More specifically, they are: 1) use __u8 instead of u_int8_t in arptables header, from Mike Frysinger. 2) Add support to match by skb->pkttype to the meta expression, from Ana Rey. 3) Add support to match by cpu to the meta expression, also from Ana Rey. 4) A smatch warning about IPSET_ATTR_MARKMASK validation, patch from Vytas Dauksa. 5) Fix netnet and netportnet hash types the range support for IPv4, from Sergey Popovich. 6) Fix missing-field-initializer warnings resolved, from Mark Rustad. 7) Dan Carpenter reported possible integer overflows in ipset, from Jozsef Kadlecsik. 8) Filter out accounting objects in nfacct by type, so you can selectively reset quotas, from Alexey Perevalov. 9) Move specific NAT IPv4 functions to the core so x_tables and nf_tables can share the same NAT IPv4 engine. 10) Use the new NAT IPv4 functions from nft_chain_nat_ipv4. 11) Move specific NAT IPv6 functions to the core so x_tables and nf_tables can share the same NAT IPv4 engine. 12) Use the new NAT IPv6 functions from nft_chain_nat_ipv6. 13) Refactor code to add nft_delrule(), which can be reused in the enhancement of the NFT_MSG_DELTABLE to remove a table and its content, from Arturo Borrero. 14) Add a helper function to unregister chain hooks, from Arturo Borrero. 15) A cleanup to rename to nft_delrule_by_chain for consistency with the new nft_*() functions, also from Arturo. 16) Add support to match devgroup to the meta expression, from Ana Rey. 17) Reduce stack usage for IPVS socket option, from Julian Anastasov. 18) Remove unnecessary textsearch state initialization in xt_string, from Bojan Prtvar. 19) Add several helper functions to nf_tables, more work to prepare the enhancement of NFT_MSG_DELTABLE, again from Arturo Borrero. 20) Enhance NFT_MSG_DELTABLE to delete a table and its content, from Arturo Borrero. 21) Support NAT flags in the nat expression to indicate the flavour, eg. random fully, from Arturo. 22) Add missing audit code to ebtables when replacing tables, from Nicolas Dichtel. 23) Generalize the IPv4 masquerading code to allow its re-use from nf_tables, from Arturo. 24) Generalize the IPv6 masquerading code, also from Arturo. 25) Add the new masq expression to support IPv4/IPv6 masquerading from nf_tables, also from Arturo. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-10 12:46:32 -07:00
Joe Perches	b167a37c7b	netfilter: Convert pr_warning to pr_warn Use the more common pr_warn. Other miscellanea: o Coalesce formats o Realign arguments Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-10 12:40:10 -07:00
Joe Perches	47c4cfc37f	iucv: Convert pr_warning to pr_warn Use the more common pr_warn. Coalesce formats. Realign arguments. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-10 12:40:10 -07:00
Joe Perches	294a0b7f31	pktgen: Convert pr_warning to pr_warn Use the more common pr_warn. Realign arguments. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-10 12:40:10 -07:00
Joe Perches	ef423a4109	atm: Convert pr_warning to pr_warn Use the more common pr_warn. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-10 12:40:10 -07:00
Ilya Dryomov	c27a3e4d66	libceph: do not hard code max auth ticket len We hard code cephx auth ticket buffer size to 256 bytes. This isn't enough for any moderate setup and, in case tickets themselves are not encrypted, leads to buffer overflows (ceph_x_decrypt() errors out, but ceph_decode_copy() doesn't - it's just a memcpy() wrapper). Since the buffer is allocated dynamically anyway, allocate it a bit later, at the point where we know how much is going to be needed. Fixes: http://tracker.ceph.com/issues/8979 Cc: stable@vger.kernel.org Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@redhat.com>	2014-09-10 20:08:36 +04:00
Ilya Dryomov	597cda3577	libceph: add process_one_ticket() helper Add a helper for processing individual cephx auth tickets. Needed for the next commit, which deals with allocating ticket buffers. (Most of the diff here is whitespace - view with git diff -b). Cc: stable@vger.kernel.org Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@redhat.com>	2014-09-10 20:08:35 +04:00
Sage Weil	73c3d4812b	libceph: gracefully handle large reply messages from the mon We preallocate a few of the message types we get back from the mon. If we get a larger message than we are expecting, fall back to trying to allocate a new one instead of blindly using the one we have. CC: stable@vger.kernel.org Signed-off-by: Sage Weil <sage@redhat.com> Reviewed-by: Ilya Dryomov <ilya.dryomov@inktank.com>	2014-09-10 20:08:32 +04:00
Tom Herbert	19424e052f	sit: Add gro callbacks to sit_offload Add ipv6_gro_receive and ipv6_gro_complete to sit_offload to support GRO. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-09 21:29:33 -07:00
Tom Herbert	9667e9bb3f	ipip: Add gro callbacks to ipip offload Add inet_gro_receive and inet_gro_complete to ipip_offload to support GRO. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-09 21:29:33 -07:00
Tom Herbert	03d56daafe	ipv6: Clear flush_id to make GRO work In TCP gro we check flush_id which is derived from the IP identifier. In IPv4 gro path the flush_id is set with the expectation that every matched packet increments IP identifier. In IPv6, the flush_id is never set and thus is uninitialized. What's worse is that in IPv6 over IPv4 encapsulation, the IP identifier is taken from the outer header which is currently not incremented on every packet for Linux stack, so GRO in this case never matches packets (identifier is not increasing). This patch clears flush_id every time for a matched packet in IPv6 gro_receive. We need to do this each time to overwrite the setting that would be done in IPv4 gro_receive per the outer header in IPv6 over IPv4 encapsulation. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-09 21:29:33 -07:00
David Howells	ed3bfdfdce	RxRPC: Fix missing __user annotation Fix a missing __user annotation in a cast of a user space pointer (found by checker). Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-09 20:39:40 -07:00
Florian Westphal	46cfd725c3	net: use kfree_skb_list() helper in more places Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-09 20:10:45 -07:00
Eric Dumazet	72bb17b37b	ipv4: udp4_gro_complete() is static net/ipv4/udp_offload.c:339:5: warning: symbol 'udp4_gro_complete' was not declared. Should it be static? Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tom Herbert <therbert@google.com> Fixes: `57c67ff4bd` ("udp: additional GRO support") Acked-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-09 20:10:45 -07:00
Eric Dumazet	416c51e17b	netns: remove one sparse warning net/core/net_namespace.c:227:18: warning: incorrect type in argument 1 (different address spaces) net/core/net_namespace.c:227:18: expected void const <noident> net/core/net_namespace.c:227:18: got struct net_generic [noderef] <asn:4>gen We can use rcu_access_pointer() here as read-side access to the pointer was removed at least one grace period ago. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-09 20:10:45 -07:00
Eric Dumazet	cc9c668a08	ipv6: udp6_gro_complete() is static net/ipv6/udp_offload.c:159:5: warning: symbol 'udp6_gro_complete' was not declared. Should it be static? Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: `57c67ff4bd` ("udp: additional GRO support") Cc: Tom Herbert <therbert@google.com> Acked-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-09 20:10:44 -07:00
Eric Dumazet	8e380f004e	ipv4: rcu cleanup in ip_ra_control() Remove one sparse warning : net/ipv4/ip_sockglue.c:328:22: warning: incorrect type in assignment (different address spaces) net/ipv4/ip_sockglue.c:328:22: expected struct ip_ra_chain [noderef] <asn:4>next net/ipv4/ip_sockglue.c:328:22: got struct ip_ra_chain [assigned] ra And replace one rcu_assign_pointer() by RCU_INIT_POINTER() where applicable. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-09 20:10:44 -07:00
Daniel Borkmann	cbeddd5d16	ipv6: mcast: remove dead debugging defines It's not used anywhere, so just remove these. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-09 20:10:44 -07:00
Ani Sinha	6a2a2b3ae0	net:socket: set msg_namelen to 0 if msg_name is passed as NULL in msghdr struct from userland. Linux manpage for recvmsg and sendmsg calls does not explicitly mention setting msg_namelen to 0 when msg_name is passed as NULL. When developers don't set the msg_namelen member in msghdr, it might contain a garbage value which will fail the validation check, and sendmsg and recvmsg calls from kernel will return EINVAL. This will break old binaries and any code for which there is no access to source code. To fix this, we set msg_namelen to 0 when msg_name is passed as NULL from userland. Signed-off-by: Ani Sinha <ani@arista.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-09 17:35:46 -07:00
Willem de Bruijn	67cc0d4077	net-timestamp: optimize sock_tx_timestamp default path Few packets have timestamping enabled. Exit sock_tx_timestamp quickly in this common case. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-09 17:34:41 -07:00
Florian Westphal	17448e5f63	net_sched: sfq: remove unused macro not used anymore since `ddecf0f` (net_sched: sfq: add optional RED on top of SFQ). Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-09 17:34:41 -07:00
Daniel Borkmann	286aad3c40	net: bpf: be friendly to kmemcheck Reported by Mikulas Patocka, kmemcheck currently barks out a false positive since we don't have special kmemcheck annotation for bitfields used in bpf_prog structure. We currently have jited:1, len:31 and thus when accessing len while CONFIG_KMEMCHECK enabled, kmemcheck throws a warning that we're reading uninitialized memory. As we don't need the whole bit universe for pages member, we can just split it to u16 and use a bool flag for jited instead of a bitfield. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-09 16:58:56 -07:00
Eric Dumazet	ca777eff51	tcp: remove dst refcount false sharing for prequeue mode Alexander Duyck reported high false sharing on dst refcount in tcp stack when prequeue is used. prequeue is the mechanism used when a thread is blocked in recvmsg()/read() on a TCP socket, using a blocking model rather than select()/poll()/epoll() non blocking one. We already try to use RCU in input path as much as possible, but we were forced to take a refcount on the dst when skb escaped RCU protected region. When/if the user thread runs on different cpu, dst_release() will then touch dst refcount again. Commit `093162553c` (tcp: force a dst refcount when prequeue packet) was an example of a race fix. It turns out the only remaining usage of skb->dst for a packet stored in a TCP socket prequeue is IP early demux. We can add logic to detect when IP early demux is probably going to use skb->dst. Because we do an optimistic check rather than duplicate existing logic, we need to guard inet_sk_rx_dst_set() and inet6_sk_rx_dst_set() from using a NULL dst. Many thanks to Alexander for providing a nice bug report, git bisection, and reproducer. Tested using Alexander script on a 40Gb NIC, 8 RX queues. Hosts have 24 cores, 48 hyper threads. echo 0 >/proc/sys/net/ipv4/tcp_autocorking; for i in `seq 0 47`; do for j in `seq 0 2`; do netperf -H $DEST -t TCP_STREAM -l 1000 -c -C -T $i,$i -P 0 -- -m 64 -s 64K -D & done; done Before patch : ~6Mpps and ~95% cpu usage on receiver After patch : ~9Mpps and ~35% cpu usage on receiver. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-09 16:54:41 -07:00
Johan Hedberg	196332f5a1	Bluetooth: Fix allowing SMP Signing info PDU If the remote side is not distributing its IRK but is distributing the CSRK the next PDU after master identification is the Signing Information. This patch fixes a missing SMP_ALLOW_CMD() for this in the smp_cmd_master_ident() function. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-09-10 01:45:01 +02:00
Jeff Layton	e0b93eddfe	security: make security_file_set_fowner, f_setown and __f_setown void return security_file_set_fowner always returns 0, so make it f_setown and __f_setown void return functions and fix up the error handling in the callers. Cc: linux-security-module@vger.kernel.org Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2014-09-09 16:01:36 -04:00
Li RongQing	e403aded79	openvswitch: change the data type of error status to atomic_long_t Change the data type of error status from u64 to atomic_long_t, and use atomic operations, then remove the lock which is used to protect the error status. Atomic operations may be faster than a spin lock. Cc: Pravin Shelar <pshelar@nicira.com> Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-09 11:48:07 -07:00
Rami Rosen	5aaa62d608	bridge: Cleanup of unnecessary check. This patch removes an unnecessary check in the br_afspec() method of br_netlink.c. Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-09 11:32:11 -07:00
Jiri Pirko	1332351617	bridge: implement rtnl_link_ops->changelink Allow rtnetlink users to set bridge master info via IFLA_INFO_DATA attr This initial part implements forward_delay, hello_time, max_age options. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-09 11:29:55 -07:00
Jiri Pirko	e5c3ea5c66	bridge: implement rtnl_link_ops->get_size and rtnl_link_ops->fill_info Allow rtnetlink users to get bridge master info in IFLA_INFO_DATA attr This initial part implements forward_delay, hello_time, max_age options. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-09 11:29:55 -07:00
Jiri Pirko	3ac636b859	bridge: implement rtnl_link_ops->slave_changelink Allow rtnetlink users to set port info via IFLA_INFO_SLAVE_DATA attr Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-09 11:29:55 -07:00
Jiri Pirko	ced8283f90	bridge: implement rtnl_link_ops->get_slave_size and rtnl_link_ops->fill_slave_info Allow rtnetlink users to get port info in IFLA_INFO_SLAVE_DATA attr Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-09 11:29:55 -07:00
Jiri Pirko	0f49579a39	bridge: switch order of rx_handler reg and upper dev link The thing is that netdev_master_upper_dev_link calls call_netdevice_notifiers(NETDEV_CHANGEUPPER, dev). That generates an rtnl link message and during that, rtnl_link_ops->fill_slave_info is called. But with the current ordering, rx_handler and IFF_BRIDGE_PORT are not set yet so there would have to be a check for that in the fill_slave_info callback. Resolve this by reordering to be similar to what bonding and team do, to avoid the check. Also add removal of the IFF_BRIDGE_PORT flag into the error path. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-09 11:29:54 -07:00
John W. Linville	ab09b95cbf	Merge tag 'mac80211-for-john-2014-09-08' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg <johannes@sipsolutions.net> says: "Two more fixes for mac80211 - one of them addresses a long-standing issue that we only found when using vendor events more frequently; the other addresses some bad information being reported in userspace that people were starting to actually look at." Signed-off-by: John W. Linville <linville@tuxdriver.com>	2014-09-09 14:29:36 -04:00
Vincent Bernat	49a601589c	net/ipv4: bind ip_nonlocal_bind to current netns net.ipv4.ip_nonlocal_bind sysctl was global to all network namespaces. This patch allows to set a different value for each network namespace. Signed-off-by: Vincent Bernat <vincent@bernat.im> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-09 11:27:09 -07:00
Arturo Borrero	9ba1f726be	netfilter: nf_tables: add new nft_masq expression The nft_masq expression is intended to perform NAT in the masquerade flavour. We decided to have the masquerade functionality in a separated expression other than nft_nat. Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-09-09 16:31:30 +02:00
Arturo Borrero	be6b635cd6	netfilter: nf_nat: generalize IPv6 masquerading support for nf_tables Let's refactor the code so we can reach the masquerade functionality from outside the xt context (ie. nftables). The patch includes the addition of an atomic counter to the masquerade notifier: the stuff to be done by the notifier is the same for xt and nftables. Therefore, only one notification handler is needed. This factorization only involves IPv6; a similar patch exists to handle IPv4. Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-09-09 16:31:29 +02:00
Arturo Borrero	8dd33cc93e	netfilter: nf_nat: generalize IPv4 masquerading support for nf_tables Let's refactor the code so we can reach the masquerade functionality from outside the xt context (ie. nftables). The patch includes the addition of an atomic counter to the masquerade notifier: the stuff to be done by the notifier is the same for xt and nftables. Therefore, only one notification handler is needed. This factorization only involves IPv4; a similar patch follows to handle IPv6. Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-09-09 16:31:29 +02:00
Nicolas Dichtel	c55fbbb4a7	netfilter: ebtables: create audit records for replaces This is already done for x_tables (family AF_INET and AF_INET6), let's do it for AF_BRIDGE also. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-09-09 16:31:28 +02:00
Arturo Borrero	e42eff8a32	netfilter: nft_nat: include a flag attribute Both SNAT and DNAT (and the upcoming masquerade) can have additional configuration parameters, such as port randomization and NAT addressing persistence. We can cover these scenarios by simply adding a flag attribute for userspace to fill when needed. The flags to use are defined in include/uapi/linux/netfilter/nf_nat.h: NF_NAT_RANGE_MAP_IPS NF_NAT_RANGE_PROTO_SPECIFIED NF_NAT_RANGE_PROTO_RANDOM NF_NAT_RANGE_PERSISTENT NF_NAT_RANGE_PROTO_RANDOM_FULLY NF_NAT_RANGE_PROTO_RANDOM_ALL The caller must take care of not messing up with the flags, as they are added unconditionally to the final resulting nf_nat_range. Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-09-09 16:31:27 +02:00
Arturo Borrero	b9ac12ef09	netfilter: nf_tables: extend NFT_MSG_DELTABLE to support flushing the ruleset This patch extends the NFT_MSG_DELTABLE call to support flushing the entire ruleset. The options now are: * No family specified, no table specified: flush all the ruleset. * Family specified, no table specified: flush all tables in the AF. * Family specified, table specified: flush the given table. Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-09-09 16:31:26 +02:00
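The three NFT_MSG_DELTABLE cases listed in the commit above can be read as a small decision table. The following is a minimal sketch of that table only, not the kernel implementation; the function name `deltable_scope` and the returned strings are hypothetical. The commit message does not describe the "table given without family" case, so the sketch rejects it as an assumption.

```python
def deltable_scope(family=None, table=None):
    """Return which part of the ruleset a delete-table request covers.

    Mirrors only the three cases listed in the commit message.
    """
    if family is None and table is None:
        return "flush the entire ruleset"
    if family is not None and table is None:
        return f"flush all tables in family {family}"
    if family is not None and table is not None:
        return f"flush table {table} in family {family}"
    # Assumption: this combination is not covered by the commit message.
    raise ValueError("table specified without a family")


print(deltable_scope())
print(deltable_scope(family="ip"))
print(deltable_scope(family="ip", table="filter"))
```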
Arturo Borrero	ee01d54256	netfilter: nf_tables: add helpers to schedule objects deletion This patch refactors the code to schedule objects deletion. They are useful in follow-up patches. In order to be able to use these new helper functions in all the code, they are placed in the top of the file, with all the dependent functions and symbols. nft_rule_disactivate_next has been renamed to nft_rule_deactivate. Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-09-09 16:31:25 +02:00
Bojan Prtvar	c435201bed	netfilter: xt_string: Remove unnecessary initialization of struct ts_state The skb_find_text() accepts uninitialized textsearch state variable. Signed-off-by: Bojan Prtvar <prtvar.b@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-09-09 16:31:25 +02:00
Julian Anastasov	5fcf0cf607	ipvs: reduce stack usage for sockopt data Use union to reserve the required stack space for sockopt data which is less than the currently hardcoded value of 128. Now the tables for commands should be more readable. The checks added for readability are optimized by compiler, others warn at compile time if command uses too much stack or exceeds the storage of set_arglen and get_arglen. As Dan Carpenter points out, we can run for an unprivileged user, so we can silence some error messages. Signed-off-by: Julian Anastasov <ja@ssi.bg> CC: Dan Carpenter <dan.carpenter@oracle.com> CC: Andrey Utkin <andrey.krieger.utkin@gmail.com> CC: David Binderman <dcb314@hotmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-09-09 16:31:24 +02:00
Ana Rey	3045d76070	netfilter: nf_tables: add devgroup support in meta expression Add devgroup support to let us match the device group of a packet's incoming or outgoing interface. Signed-off-by: Ana Rey <anarey@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-09-09 16:31:23 +02:00
Arturo Borrero	ce24b7217b	netfilter: nf_tables: rename nf_table_delrule_by_chain() For the sake of homogenizing the function naming scheme, let's rename nf_table_delrule_by_chain() to nft_delrule_by_chain(). Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-09-09 16:31:22 +02:00
Arturo Borrero	c559879406	netfilter: nf_tables: add helper to unregister chain hooks This patch adds a helper function to unregister chain hooks in the chain deletion path. Basically, a code factorization. The new function is useful in follow-up patches. Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-09-09 16:31:21 +02:00
Arturo Borrero	5e266fe7c0	netfilter: nf_tables: refactor rule deletion helper This helper function always schedules the rule to be removed in the following transaction. In follow-up patches, it is interesting to handle separately the logic of rule activation/deactivation from the transaction mechanism. So, this patch simply splits the original nf_tables_delrule_one() in two functions, allowing further control. While at it, for the sake of homogenizing the function naming scheme, let's rename nf_tables_delrule_one() to nft_delrule(). Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-09-09 16:31:20 +02:00
Pablo Neira Ayuso	876665eafc	netfilter: nft_chain_nat_ipv6: use generic IPv6 NAT code from core Use the exported IPv6 NAT functions that are provided by the core. This removes duplicated code so iptables and nft use the same NAT codebase. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-09-09 16:31:09 +02:00
Pablo Neira Ayuso	2a5538e9aa	netfilter: nat: move specific NAT IPv6 to core Move the specific NAT IPv6 core functions that are called from the hooks from ip6table_nat.c to nf_nat_l3proto_ipv6.c. This prepares the ground to allow iptables and nft to use the same NAT engine code that comes in a follow up patch. This also renames nf_nat_ipv6_fn to nft_nat_ipv6_fn in net/ipv6/netfilter/nft_chain_nat_ipv6.c to avoid a compilation breakage. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-09-09 16:30:00 +02:00
Jukka Rissanen	39e90c7763	Bluetooth: 6lowpan: Route packets that are not meant to peer via correct device Packets that are supposed to be delivered via the peer device need to be checked and sent to correct device. This requires that user has set the routes properly so that the 6lowpan module can then figure out the destination gateway and the correct Bluetooth device. Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Cc: stable@vger.kernel.org # 3.17.x	2014-09-09 15:51:47 +02:00
Jukka Rissanen	b2799cec22	Bluetooth: 6lowpan: Set the peer IPv6 address correctly The peer IPv6 address contained wrong U/L bit in the EUI-64 part. Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Cc: stable@vger.kernel.org # 3.17.x	2014-09-09 15:51:47 +02:00
Jukka Rissanen	2ae50d8d3a	Bluetooth: 6lowpan: Increase the connection timeout value Use the default connection timeout value defined in l2cap.h because the current timeout was too short and most of the time the connection attempts timed out. Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Cc: stable@vger.kernel.org # 3.17.x	2014-09-09 15:51:47 +02:00
Johan Hedberg	e1e930f591	Bluetooth: Fix mgmt pairing failure when authentication fails Whether through HCI with BR/EDR or SMP with LE when authentication fails we should also notify any pending Pair Device mgmt command. This patch updates the mgmt_auth_failed function to take the actual hci_conn object and makes sure that any pending pairing command is notified and cleaned up appropriately. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-09-09 03:12:15 +02:00
David S. Miller	5b4c314575	Merge tag 'master-2014-09-08' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next John W. Linville says: ==================== pull request: wireless-next 2014-09-08 Please pull this batch of updates intended for the 3.18 stream... For the mac80211 bits, Johannes says: "Not that much content this time. Some RCU cleanups, crypto performance improvements, and various patches all over, rather than listing them one might as well look into the git log instead." For the Bluetooth bits, Gustavo says: "The changes consist of: - Coding style fixes to HCI drivers - Corrupted ack value fix for the H5 HCI driver - A couple of Enhanced L2CAP fixes - Conversion of SMP code to use common L2CAP channel API - Page scan optimizations when using the kernel-side whitelist - Various mac802154 and ieee802154 6lowpan cleanups - One new Atheros USB ID" For the iwlwifi bits, Emmanuel says: "We have a new big thing coming up which is called Dynamic Queue Allocation (or DQA). This is a completely new way to work with the Tx queues and it requires major refactoring. This is being done by Johannes and Avri. Besides this, Johannes disables U-APSD by default because of APs that would disable A-MPDU if the association supports U-APSD. Luca contributed to the power area which he was cleaning up on the way while working on CSA. A few more random things here and there." For the Atheros bits, Kalle says: "For ath6kl we had two small fixes and a new SDIO device id. For ath10k the bigger changes are: * support for new firmware version 10.2 (Michal) * spectral scan support (Simon, Sven & Mathias) * export a firmware crash dump file (Ben & me) * cleaning up of pci.c (Michal) * print pci id in all messages, which causes most of the churn (Michal)" Beyond that, we have the usual collection of various updates to ath9k, b43, mwifiex, and wil6210, as well as a few other bits here and there. Please let me know if there are problems! 
==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-08 16:43:58 -07:00
Willem de Bruijn	a7f26b7e1e	inet: remove dead inetpeer sequence code inetpeer sequence numbers are no longer incremented, so no need to check and flush the tree. The function that increments the sequence number was already dead code and removed in "ipv4: remove unused function" (`068a6e18`). Remove the code that checks for a change, too. Verifying that v4_seq and v6_seq are never incremented and thus that flush_check compares bp->flush_seq to 0 is trivial. The second part of the change removes flush_check completely even though bp->flush_seq is exactly !0 once, at initialization. This change is correct because the time this branch is true is when bp->root == peer_avl_empty_rcu, in which the branch and inetpeer_invalidate_tree are a NOOP. Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-08 16:42:42 -07:00
Tom Herbert	1e701f1698	net: Fix GRE RX to use skb_transport_header for GRE header offset GRE assumes that the GRE header is at skb_network_header + ip_hdrlen(skb). It is more general to use skb_transport_header and this allows the possibility of inserting an additional header between IP and GRE (which is what we will do in Generic UDP Encapsulation for GRE). Signed-off-by: Tom Herbert <therbert@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-08 15:23:05 -07:00
Eric Dumazet	82d5e2b8b4	net: fix skb_page_frag_refill() kerneldoc In commit `d9b2938aab` ("net: attempt a single high order allocation") I forgot to update kerneldoc, as @prio parameter was renamed to @gfp Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-08 14:12:14 -07:00
Johan Hedberg	c68b7f127d	Bluetooth: Fix dereferencing conn variable before NULL check This patch fixes the following type of static analyzer warning (and probably a real bug as well as the NULL check should be there for a reason): net/bluetooth/smp.c:1182 smp_conn_security() warn: variable dereferenced before check 'conn' (see line 1174) Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-09-08 19:07:57 +02:00
Behan Webster	9f06a8d623	Bluetooth: LLVMLinux: Remove VLAIS from bluetooth/amp.c Replaced the use of a Variable Length Array In Struct (VLAIS) with a C99 compliant equivalent. This patch allocates the appropriate amount of memory using a char array. The new code can be compiled with both gcc and clang. struct shash_desc contains a flexible array member ctx declared with CRYPTO_MINALIGN_ATTR, so sizeof(struct shash_desc) aligns the beginning of the array declared after struct shash_desc with long long. No trailing padding is required because it is not a struct type that can be used in an array. The CRYPTO_MINALIGN_ATTR is required so that desc is aligned with long long as would be the case for a struct containing a member with CRYPTO_MINALIGN_ATTR. Signed-off-by: Behan Webster <behanw@converseincode.com> Signed-off-by: Mark Charlebois <charlebm@gmail.com> Signed-off-by: Jan-Simon Möller <dl9pf@gmx.de> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-09-08 19:07:56 +02:00
Johan Hedberg	b28b494366	Bluetooth: Add strict checks for allowed SMP PDUs SMP defines quite clearly when certain PDUs are to be expected/allowed and when not, but doesn't have any explicit request/response definition. So far the code has relied on each PDU handler to behave correctly if receiving PDUs at an unexpected moment, however this requires many different checks and is prone to errors. This patch introduces a generic way to keep track of allowed PDUs and thereby reduces the responsibility & load on individual command handlers. The tracking is implemented using a simple bit-mask where each opcode maps to its own bit. If the bit is set the corresponding PDU is allowed and if the bit is not set the PDU is not allowed. As a simple example, when we send the Pairing Request we'd set the bit for Pairing Response, and when we receive the Pairing Response we'd clear the bit for Pairing Response. Since the disallowed PDU rejection is now done in a single central place we need to be a bit careful of which action makes most sense to all cases. Previously some, such as Security Request, have been simply ignored whereas others have caused an explicit disconnect. The only PDU rejection action that keeps good interoperability and can be used for all the applicable use cases is to drop the data. This may raise some concerns of us now being more lenient for misbehaving (and potentially malicious) devices, but the policy of simply dropping data has been a successful one for many years e.g. in L2CAP (where this is the only policy for such cases - we never request disconnection in l2cap_core.c because of bad data). Furthermore, we cannot prevent connected devices from creating the SMP context (through a Security or Pairing Request), and once the context exists looking up the corresponding bit for the received opcode and deciding to reject it is essentially an equally lightweight operation as the kind of rejection that l2cap_core.c already successfully does. 
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-09-08 19:07:56 +02:00
Johan Hedberg	c6e81e9ae6	Bluetooth: Fix calling smp_distribute_keys() when still waiting for keys When we're in the process of receiving keys in phase 3 of SMP we keep track of which keys are still expected in the smp->remote_key_dist variable. If we still have some key bits set we need to continue waiting for more PDUs and not needlessly call smp_distribute_keys(). This patch fixes two such cases in the smp_cmd_master_ident() and smp_cmd_ident_addr_info() handler functions. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-09-08 19:07:56 +02:00
Johan Hedberg	88d3a8acf3	Bluetooth: Add define for key distribution mask This patch adds a define for the allowed bits of the key distribution mask so we don't have to have magic 0x07 constants throughout the code. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-09-08 19:07:56 +02:00
Johan Hedberg	fc75cc8684	Bluetooth: Fix locking of the SMP context Before the move to l2cap_chan the SMP context (smp_chan) didn't have any kind of proper locking. The best that existed was the HCI_CONN_LE_SMP_PEND flag which was used to enable mutual exclusion for potential multiple creators of the SMP context. Now that SMP has been converted to use the l2cap_chan infrastructure and since the SMP context is directly mapped to a corresponding l2cap_chan we get the SMP context locking essentially for free through the l2cap_chan lock. For all callbacks that l2cap_core.c makes for each channel implementation (smp.c in the case of SMP) the l2cap_chan lock is held through l2cap_chan_lock(chan). Since the calls from l2cap_core.c to smp.c are covered the only missing piece to have the locking implemented properly is to ensure that the lock is held for any other call path that may access the SMP context. This means user responses through mgmt.c, requests to elevate the security of a connection through hci_conn.c, as well as any deferred work through workqueues. This patch adds the necessary locking to all these other code paths that try to access the SMP context. Since mutual exclusion for the l2cap_chan access is now covered from all directions the patch also removes the unnecessary HCI_CONN_LE_SMP_PEND flag (once we've acquired the chan lock we can simply check whether chan->smp is set to know if there's an SMP context). Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-09-08 19:07:56 +02:00
Johan Hedberg	d6268e86a1	Bluetooth: Remove unnecessary deferred work for SMP key distribution Now that the identity address update happens through its own deferred work there's no need to have smp_distribute_keys anymore behind a second deferred work. This patch removes this extra construction and makes the code do direct calls to smp_distribute_keys() again. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-09-08 19:07:56 +02:00
Johan Hedberg	f3d82d0c8e	Bluetooth: Move identity address update behind a workqueue The identity address update of all channels for an l2cap_conn needs to take the lock for each channel, i.e. it's safest to do this by a separate workqueue callback. Previously this was partially solved by moving the entire SMP key distribution behind a workqueue. However, if we want SMP context locking to be correct and safe we should always use the l2cap_chan lock when accessing it, meaning even smp_distribute_keys needs to take that lock which would once again create a deadlock when updating the identity address. The simplest way to solve this is to have l2cap_conn manage the deferred work which is what this patch does. A subsequent patch will remove the now unnecessary SMP key distribution work struct. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-09-08 19:07:55 +02:00
Johan Hedberg	84bc0db53b	Bluetooth: Don't take any action in smp_resume_cb if not encrypted When smp_resume_cb is called if we're not encrypted (i.e. the callback wasn't called because the connection became encrypted) we shouldn't take any action at all. This patch also moves the security_timer cancellation behind this condition. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-09-08 19:07:55 +02:00
Johan Hedberg	1b0921d6be	Bluetooth: Remove unnecessary checks after canceling SMP security timer The SMP security timer used to be able to modify the SMP context state but nowadays it simply calls hci_disconnect(). It is therefore unnecessary to have extra sanity checks for the SMP context after canceling the timer. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-09-08 19:07:55 +02:00
Johan Hedberg	434714dc02	Bluetooth: Add clarifying comment for LE CoC result value The "pending" L2CAP response value is not defined for LE CoC. This patch adds a clarifying comment to the code so that the reader will not think there is a bug in trying to use this value for LE CoC. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-09-08 19:07:55 +02:00
Johan Hedberg	839035a7b3	Bluetooth: Move clock offset reading into hci_disconnect() To give all hci_disconnect() users the advantage of getting the clock offset read automatically this patch moves the necessary code from hci_conn_timeout() into hci_disconnect(). This way we pretty much always update the clock offset when disconnecting. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-09-08 19:07:55 +02:00
Johan Hedberg	e3f2f92a04	Bluetooth: Use hci_disconnect() for mgmt_disconnect_device() There's no reason to custom build the HCI_Disconnect command in the Disconnect Device mgmt command handler. This patch updates the code to use hci_disconnect() instead. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-09-08 19:07:55 +02:00
Johan Hedberg	e3b679d56c	Bluetooth: Update hci_disconnect() to return an error value We'll soon use hci_disconnect() from places that are interested to know whether the hci_send_cmd() really succeeded or not. This patch updates hci_disconnect() to pass on any error returned from hci_send_cmd(). Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-09-08 19:07:55 +02:00
Johan Hedberg	9b7b18ef1b	Bluetooth: Fix SMP error and response to be mutually exclusive Returning failure from the SMP data parsing function will cause an immediate disconnect, making any attempts to send a response PDU futile. This patch updates the function to always either send a response or return an error, but never both at the same time: * In the case that HCI_LE_ENABLED is not set we want to send a Pairing Not Supported response but it is not required to force a disconnection, so do not set the error return in this case. * If we get garbage SMP data we can just fail with the handler function instead of also trying to send an SMP Failure PDU. * There's no reason to force a disconnection if we receive an unknown SMP command. Instead simply send a proper Command Not Supported SMP response. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-09-08 19:07:54 +02:00
Johan Hedberg	b04afa0c28	Bluetooth: Remove unused l2cap_conn_shutdown API Now that there are no more users of the l2cap_conn_shutdown API (since smp.c switched to using hci_disconnect) we can simply remove it along with all of its l2cap_conn variables. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-09-08 19:07:54 +02:00
Johan Hedberg	1e91c29eb6	Bluetooth: Use hci_disconnect for immediate disconnection from SMP Relying on the l2cap_conn_del procedure (triggered through the l2cap_conn_shutdown API) to get the connection disconnected is not reliable as it depends on all users releasing (through hci_conn_drop) and that there's at least one user (so hci_conn_drop is called at least one time). A much simpler and more reliable solution is to call hci_disconnect() directly from the SMP code when we want to disconnect. One side-effect this has is that it prevents any SMP Failure PDU from being sent before the disconnection, however neither one of the scenarios where l2cap_conn_shutdown was used really requires this. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-09-08 19:07:54 +02:00
Johan Hedberg	e31fb86005	Bluetooth: Set discon_timeout to 0 in l2cap_conn_del When the l2cap_conn_del() function is used we do not want to wait around "in case something happens" before disconnecting. This patch sets the disconnection timeout to 0 so that the disconnection routines get immediately scheduled. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-09-08 19:07:54 +02:00
Johan Hedberg	bcbb655a18	Bluetooth: Remove hci_conn_hold/drop from hci_chan We can't have hci_chan contribute to the "active" reference counting of the hci_conn since otherwise the connection would never get dropped when there are no more users (since hci_chan would be counted as a user). This patch removes hold() when creating the hci_chan and drop() when destroying it. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-09-08 19:07:54 +02:00
Johan Hedberg	f94b665dcf	Bluetooth: Ignore incoming data after initiating disconnection When hci_chan_del is called the disconnection routines get scheduled through a workqueue. If there's any incoming ACL data before the routines get executed there's a chance that a new hci_chan is created and the disconnection never happens. This patch adds a new hci_conn flag to indicate that we're in the process of driving the connection down. We set the flag in hci_chan_del and check for it in hci_chan_create so that no new channels are created for the same connection. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-09-08 19:07:53 +02:00
Johan Hedberg	b3ff670a44	Bluetooth: Set disc_timeout to 0 when calling hci_chan_del The hci_chan_del() function is used in scenarios where we've decided we want to get rid of the underlying baseband link. It makes therefore sense to force the disc_timeout to 0 so that the disconnection routines are immediately scheduled. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-09-08 19:07:53 +02:00
Johan Hedberg	6c388d32ec	Bluetooth: Fix hci_conn reference counting with hci_chan The hci_chan_del() function was doing a hci_conn_drop() but there was no matching hci_conn_hold() in the hci_chan_create() function. Furthermore, as the hci_chan struct holds a pointer to the hci_conn there should be proper use of hci_conn_get/put. This patch fixes both issues so that hci_chan does correct reference counting of the hci_conn object. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-09-08 19:07:53 +02:00
Johan Hedberg	f6c6324969	Bluetooth: Refactor connection parameter freeing into its own function The necessary steps for freeing connection parameters have grown quite a bit so we can simplify the code by factoring it out into its own function. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-09-08 19:07:53 +02:00
Johan Hedberg	f8aaf9b65a	Bluetooth: Fix using hci_conn_get() for hci_conn pointers Wherever we keep hci_conn pointers around we should be using hci_conn_get/put to ensure that they stay valid. This patch fixes all places currently violating this principle. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-09-08 19:07:53 +02:00
Johan Hedberg	51bb8457dd	Bluetooth: Improve _get() functions to return the object type It's natural to have _get() functions that increment the reference count of an object to return the object type itself. This way it's simple to make a copy of the object pointer and increase the reference count in a single step. This patch updates two such get() functions, namely hci_conn_get() and l2cap_conn_get(), and updates the users to take advantage of the new API. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-09-08 19:07:52 +02:00
Johan Hedberg	5477610fc1	Bluetooth: Optimize connection parameter lookup for LE connections When we get an LE connection complete event there's really no reason to look through the entire connection parameter list as the entry should be present in the hdev->pend_le_conns list too. This patch changes the lookup code to do a more restricted lookup only in the pend_le_conns list. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-09-08 19:07:52 +02:00
Johan Hedberg	08853f18ea	Bluetooth: Set addr_type only when it's needed In the hci_le_conn_complete_evt() function there's no need to set the addr_type value until it's actually needed, i.e. for the black list lookup. This patch moves the code a bit further down in the function. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-09-08 19:07:52 +02:00
Johan Hedberg	c16900cf28	Bluetooth: Fix hci_conn reference counting for fixed channels Now that SMP has been converted to use fixed channels we've got a bit of a problem with the hci_conn reference counting. So far the L2CAP code has kept a reference for each L2CAP channel that was notified of the connection. With SMP however this would mean that the connection is never dropped even though there are no other users of it. Furthermore, SMP already does its own hci_conn reference counting internally, starting from a security or pairing request and ending with the key distribution. This patch makes L2CAP fixed channels default to the L2CAP core not keeping a hci_conn reference for them. A new FLAG_HOLD_HCI_CONN flag is added so that L2CAP users can declare an exception to this rule and hold a reference even for their fixed channels. One such exception is the L2CAP socket layer which does want a reference for each socket (e.g. an ATT socket which uses a fixed channel). Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-09-08 19:07:52 +02:00
Johan Hedberg	b3ed6c63f7	Bluetooth: Remove unnecessary l2cap_chan_unlock before l2cap_chan_add The l2cap_chan_add() function doesn't require the channel to be unlocked. It only requires the l2cap_conn to be unlocked. Therefore, it's unnecessary to unlock a channel before calling l2cap_chan_add(). This patch removes such unnecessary unlocking from the l2cap_chan_connect() function. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-09-08 19:07:52 +02:00
Johan Hedberg	72c6fb915f	Bluetooth: Fix incorrect LE CoC PDU length restriction based on HCI MTU The l2cap_create_le_flowctl_pdu() function that l2cap_segment_le_sdu() calls is perfectly capable of doing packet fragmentation if given bigger PDUs than the HCI buffers allow. Forcing the PDU length based on the HCI MTU (conn->mtu) would therefore needlessly restrict operation on hardware with limited LE buffers (e.g. both Intel and Broadcom seem to have this set to just 27 bytes). This patch removes the restriction and makes it possible to send PDUs of the full length that the remote MPS value allows. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Cc: stable@vger.kernel.org	2014-09-08 19:07:52 +02:00
John W. Linville	61a3d4f9d5	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless	2014-09-08 11:14:56 -04:00
Johannes Berg	b1e9be8775	mac80211: annotate MMIC head/tailroom warning This message occasionally triggers for some people as in https://bugzilla.redhat.com/show_bug.cgi?id=1111740 but it's not clear which (headroom or tailroom) is at fault. Annotate the message a bit to get more information. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-09-08 11:22:42 +02:00
Steinar H. Gunderson	c8d6591752	mac80211: support DTPC IE (from Cisco Client eXtensions) Linux already supports 802.11h, where the access point can tell the client to reduce its transmission power. However, 802.11h is only defined for 5 GHz, where the need for this is much smaller than on 2.4 GHz. Cisco has their own solution, called DTPC (Dynamic Transmit Power Control). Cisco APs on a controller sometimes but not always send 802.11h; they always send DTPC, even on 2.4 GHz. This patch adds support for parsing and honoring the DTPC IE in addition to the 802.11h element (they do not always contain the same limits, so both must be honored); the format is not documented, but very simple. Tested (on top of wireless.git and on 3.16.1) against a Cisco Aironet 1142 joined to a Cisco 2504 WLC, by setting various transmit power levels for the given access points and observing the results. The Wireshark 802.11 dissector agrees with the interpretation of the element, except for negative numbers, which seem to never happen anyway. Signed-off-by: Steinar H. Gunderson <sgunderson@bigfoot.com> Signed-off-by: Johannes Berg <johannes@sipsolutions.net>	2014-09-08 10:52:00 +02:00
Steinar H. Gunderson	24a4e4008c	mac80211: split 802.11h parsing from transmit power policy Decouple the logic of parsing the 802.11d and 802.11h IEs from the part of deciding what to do about the data (messaging, clamping to 0 dBm, doing the actual setting). This paves the way for the next patch, which introduces more data sources for transmit power limitation. Signed-off-by: Steinar H. Gunderson <sgunderson@bigfoot.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-09-08 10:43:18 +02:00
David S. Miller	eb84d6b604	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2014-09-07 21:41:53 -07:00
Tejun Heo	908c7f1949	percpu_counter: add @gfp to percpu_counter_init() Percpu allocator now supports allocation mask. Add @gfp to percpu_counter_init() so that !GFP_KERNEL allocation masks can be used with percpu_counters too. We could have left percpu_counter_init() alone and added percpu_counter_init_gfp(); however, the number of users isn't that high and introducing _gfp variants to all percpu data structures would be quite ugly, so let's just do the conversion. This is the one with the most users. Other percpu data structures are a lot easier to convert. This patch doesn't make any functional difference. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Jan Kara <jack@suse.cz> Acked-by: "David S. Miller" <davem@davemloft.net> Cc: x86@kernel.org Cc: Jens Axboe <axboe@kernel.dk> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org>	2014-09-08 09:51:29 +09:00
David S. Miller	45ce829dd0	Merge tag 'master-2014-09-04' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless John W. Linville says: ==================== pull request: wireless 2014-09-05 Please pull this batch of fixes intended for the 3.17 stream... For the mac80211 bits, Johannes says: "Here are a few fixes for mac80211. One has been discussed for a while and adds a terminating NUL-byte to the alpha2 sent to userspace, which shouldn't be necessary but since many places treat it as a string we couldn't move to just sending two bytes. In addition to that, we have two VLAN fixes from Felix, a mesh fix, a fix for the recently introduced RX aggregation offload, a revert for a broken patch (that luckily didn't really cause any harm) and a small fix for alignment in debugfs." For the iwlwifi bits, Emmanuel says: "I revert a patch that disabled CTS to self in dvm because users reported issues. The revert is CCed to stable since the offending patch was sent to stable too. I also bump the firmware API versions since a new firmware is coming up. On top of that, Marcel fixes a bug I introduced while fixing a bug in our Kconfig file." Please let me know if there are problems! ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-07 16:11:10 -07:00
WANG Cong	de185ab46c	ipv6: restore the behavior of ipv6_sock_ac_drop() It is possible that the interface is already gone after joining the list of anycast on this interface as we don't hold a refcount for the device, in this case we are safe to ignore the error. What's more important, for API compatibility we should not change this behavior for applications even if it were correct. Fixes: commit `a9ed4a2986` ("ipv6: fix rtnl locking in setsockopt for anycast and multicast") Cc: Sabrina Dubroca <sd@queasysnail.net> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-07 16:10:07 -07:00
Andy Shevchenko	13aa3463e5	rose: use %ph specifier Instead of dereferencing each byte let's use the %ph specifier in the printk() calls. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-07 16:07:25 -07:00
Pablo Neira Ayuso	679ab4ddbd	netfilter: xt_TPROXY: undefined reference to `udp6_lib_lookup' CONFIG_IPV6=m CONFIG_NETFILTER_XT_TARGET_TPROXY=y net/built-in.o: In function `nf_tproxy_get_sock_v6.constprop.11': >> xt_TPROXY.c:(.text+0x583a1): undefined reference to `udp6_lib_lookup' net/built-in.o: In function `tproxy_tg_init': >> xt_TPROXY.c:(.init.text+0x1dc3): undefined reference to `nf_defrag_ipv6_enable' This fix is similar to `1a5bbfc` ("netfilter: Fix build errors with xt_socket.c"). Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-09-07 17:25:16 +02:00
Neal Cardwell	87d943085b	tcp: remove obsolete comment about TCP_SKB_CB(skb)->when in tcp_fragment() The TCP_SKB_CB(skb)->when field no longer exists as of recent change `7faee5c0d5` ("tcp: remove TCP_SKB_CB(skb)->when"). And in any case, tcp_fragment() is called on already-transmitted packets from the __tcp_retransmit_skb() call site, so copying timestamps of any kind in this spot is quite sensible. Signed-off-by: Neal Cardwell <ncardwell@google.com> Reported-by: Yuchung Cheng <ycheng@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-06 12:29:10 -07:00
Eric Dumazet	7faee5c0d5	tcp: remove TCP_SKB_CB(skb)->when After commit `740b0f1841` ("tcp: switch rtt estimations to usec resolution"), we no longer need to maintain timestamps in two different fields. TCP_SKB_CB(skb)->when can be removed, as same information sits in skb_mstamp.stamp_jiffies Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-05 17:49:33 -07:00
Eric Dumazet	04317dafd1	tcp: introduce TCP_SKB_CB(skb)->tcp_tw_isn TCP_SKB_CB(skb)->when has different meaning in output and input paths. In output path, it contains a timestamp. In input path, it contains an ISN, chosen by tcp_timewait_state_process(). Let's add a different name to ease code comprehension. Note that 'when' field will disappear in following patch, as skb_mstamp already contains timestamp, the anonymous union will promptly disappear as well. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-05 17:49:33 -07:00
Alexander Duyck	56193d1bce	net: Add function for parsing the header length out of linear ethernet frames This patch updates some of the flow_dissector api so that it can be used to parse the length of ethernet buffers stored in fragments. Most of the changes needed were to __skb_get_poff as it needed to be updated to support sending a linear buffer instead of a skb. I have split __skb_get_poff into two functions, the first is skb_get_poff and it retains the functionality of the original __skb_get_poff. The other function is __skb_get_poff which now works much like __skb_flow_dissect in relation to skb_flow_dissect in that it provides the same functionality but works with just a data buffer and hlen instead of needing an skb. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-05 17:47:02 -07:00
Alexander Duyck	82eabd9eb2	net: merge cases where sock_efree and sock_edemux are the same function Since sock_efree and sock_edemux are essentially the same code for non-TCP sockets and the case where CONFIG_INET is not defined we can combine the code or replace the call to sock_edemux in several spots. As a result we can avoid a bit of unnecessary code or code duplication. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-05 17:43:45 -07:00
Alexander Duyck	62bccb8cdb	net-timestamp: Make the clone operation stand-alone from phy timestamping The phy timestamping takes a different path than the regular timestamping does in that it will create a clone first so that the packets needing to be timestamped can be placed in a queue, or the context block could be used. In order to support these use cases I am pulling the core of the code out so it can be used in other drivers beyond just phy devices. In addition I have added a destructor named sock_efree which is meant to provide a simple way for dropping the reference to skb exceptions that aren't part of either the receive or send windows for the socket, and I have removed some duplication in spots where this destructor could be used in place of sock_edemux. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-05 17:43:45 -07:00
Alexander Duyck	37846ef018	net-timestamp: Merge shared code between phy and regular timestamping This change merges the shared bits that exist between skb_tx_tstamp and skb_complete_tx_timestamp. By doing this we can avoid the two diverging as there were already changes pushed into skb_tx_tstamp that hadn't made it into the other function. In addition this resolves issues with the fact that skb_complete_tx_timestamp was included in linux/skbuff.h even though it was only compiled in if phy timestamping was enabled. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-05 17:43:45 -07:00
Eric Dumazet	d546c62154	ipv4: harden fnhe_hashfun() Let's make this hash function a bit more secure, as ICMP attacks are still in the wild. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-05 17:40:33 -07:00
Masanari Iida	e793c0f70e	net: treewide: Fix typo found in DocBook/networking.xml This patch fixes spelling typos found in DocBook/networking.xml. Because networking.xml is generated from comments in the source, I have to fix the typos in the comments within the source. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-05 17:35:28 -07:00
Pablo Neira Ayuso	84a59ca55f	netfilter: add explicit Kconfig for NETFILTER_XT_NAT Paul Bolle reports that 'select NETFILTER_XT_NAT' from the IPV4 and IPV6 NAT tables becomes noop since there is no Kconfig switch for it. Add the Kconfig switch to resolve this problem. Fixes: `8993cf8` netfilter: move NAT Kconfig switches out of the iptables scope Reported-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-05 17:23:31 -07:00
Eric Dumazet	caa415270c	ipv4: fix a race in update_or_create_fnhe() nh_exceptions is effectively used under rcu, but lacks proper barriers. Between kzalloc() and setting of nh->nh_exceptions(), we need a proper memory barrier. Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: `4895c771c7` ("ipv4: Add FIB nexthop exceptions.") Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-05 17:15:50 -07:00
Nicolas Dichtel	e7478dfc46	ipv6: use addrconf_get_prefix_route() to remove peer addr addrconf_get_prefix_route() ensures that we get the right route in the right table. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-05 17:13:24 -07:00
Nicolas Dichtel	f24062b07d	ipv6: fix a refcnt leak with peer addr There is no reason to take a refcnt before deleting the peer address route. It's done some lines below for the local prefix route because inet6_ifa_finish_destroy() will release it at the end. For the peer address route, we want to free it right now. This bug has been introduced by commit `caeaba7900` ("ipv6: add support of peer address"). Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-05 17:13:24 -07:00
Andy Zhou	29abe2fda5	l2tp: fix missing line continuation This syntax error was covered by L2TP_REFCNT_DEBUG not being set by default. Signed-off-by: Andy Zhou <azhou@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-05 15:19:53 -07:00
Willem de Bruijn	c199105d15	net-timestamp: only report sw timestamp if reporting bit is set The timestamping API has separate bits for generating and reporting timestamps. A software timestamp should only be reported for a packet when the packet has the relevant generation flag (SKBTX_..) set and the socket has reporting bit SOF_TIMESTAMPING_SOFTWARE set. The second check was accidentally removed. Reinstitute the original behavior. Tested: Without this patch, Documentation/networking/txtimestamp reports timestamps regardless of whether SOF_TIMESTAMPING_SOFTWARE is set. After the patch, it only reports them when the flag is set. Fixes: `f24b9be595` ("net-timestamp: extend SCM_TIMESTAMPING ancillary data struct") Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-05 15:02:43 -07:00
Guillaume Nault	eed4d839b0	l2tp: fix race while getting PMTU on PPP pseudo-wire Use dst_entry held by sk_dst_get() to retrieve tunnel's PMTU. The dst_mtu(__sk_dst_get(tunnel->sock)) call was racy. __sk_dst_get() could return NULL if tunnel->sock->sk_dst_cache was reset just before the call, thus making dst_mtu() dereference a NULL pointer: [ 1937.661598] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020 [ 1937.664005] IP: [<ffffffffa049db88>] pppol2tp_connect+0x33d/0x41e [l2tp_ppp] [ 1937.664005] PGD daf0c067 PUD d9f93067 PMD 0 [ 1937.664005] Oops: 0000 [#1] SMP [ 1937.664005] Modules linked in: l2tp_ppp l2tp_netlink l2tp_core ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables udp_tunnel pppoe pppox ppp_generic slhc deflate ctr twofish_generic twofish_x86_64_3way xts lrw gf128mul glue_helper twofish_x86_64 twofish_common blowfish_generic blowfish_x86_64 blowfish_common des_generic cbc xcbc rmd160 sha512_generic hmac crypto_null af_key xfrm_algo 8021q garp bridge stp llc tun atmtcp clip atm ext3 mbcache jbd iTCO_wdt coretemp kvm_intel iTCO_vendor_support kvm pcspkr evdev ehci_pci lpc_ich mfd_core i5400_edac edac_core i5k_amb shpchp button processor thermal_sys xfs crc32c_generic libcrc32c dm_mod usbhid sg hid sr_mod sd_mod cdrom crc_t10dif crct10dif_common ata_generic ahci ata_piix tg3 libahci libata uhci_hcd ptp ehci_hcd pps_core usbcore scsi_mod libphy usb_common [last unloaded: l2tp_core] [ 1937.664005] CPU: 0 PID: 10022 Comm: l2tpstress Tainted: G O 3.17.0-rc1 #1 [ 1937.664005] Hardware name: HP ProLiant DL160 G5, BIOS O12 08/22/2008 [ 1937.664005] task: ffff8800d8fda790 ti: ffff8800c43c4000 task.ti: ffff8800c43c4000 [ 1937.664005] RIP: 0010:[<ffffffffa049db88>] [<ffffffffa049db88>] pppol2tp_connect+0x33d/0x41e [l2tp_ppp] [ 1937.664005] RSP: 0018:ffff8800c43c7de8 EFLAGS: 00010282 [ 1937.664005] RAX: ffff8800da8a7240 RBX: ffff8800d8c64600 RCX: 000001c325a137b5 [ 1937.664005] RDX: 8c6318c6318c6320 RSI: 
000000000000010c RDI: 0000000000000000 [ 1937.664005] RBP: ffff8800c43c7ea8 R08: 0000000000000000 R09: 0000000000000000 [ 1937.664005] R10: ffffffffa048e2c0 R11: ffff8800d8c64600 R12: ffff8800ca7a5000 [ 1937.664005] R13: ffff8800c439bf40 R14: 000000000000000c R15: 0000000000000009 [ 1937.664005] FS: 00007fd7f610f700(0000) GS:ffff88011a600000(0000) knlGS:0000000000000000 [ 1937.664005] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 1937.664005] CR2: 0000000000000020 CR3: 00000000d9d75000 CR4: 00000000000027e0 [ 1937.664005] Stack: [ 1937.664005] ffffffffa049da80 ffff8800d8fda790 000000000000005b ffff880000000009 [ 1937.664005] ffff8800daf3f200 0000000000000003 ffff8800c43c7e48 ffffffff81109b57 [ 1937.664005] ffffffff81109b0e ffffffff8114c566 0000000000000000 0000000000000000 [ 1937.664005] Call Trace: [ 1937.664005] [<ffffffffa049da80>] ? pppol2tp_connect+0x235/0x41e [l2tp_ppp] [ 1937.664005] [<ffffffff81109b57>] ? might_fault+0x9e/0xa5 [ 1937.664005] [<ffffffff81109b0e>] ? might_fault+0x55/0xa5 [ 1937.664005] [<ffffffff8114c566>] ? rcu_read_unlock+0x1c/0x26 [ 1937.664005] [<ffffffff81309196>] SYSC_connect+0x87/0xb1 [ 1937.664005] [<ffffffff813e56f7>] ? sysret_check+0x1b/0x56 [ 1937.664005] [<ffffffff8107590d>] ? trace_hardirqs_on_caller+0x145/0x1a1 [ 1937.664005] [<ffffffff81213dee>] ? trace_hardirqs_on_thunk+0x3a/0x3f [ 1937.664005] [<ffffffff8114c262>] ? 
spin_lock+0x9/0xb [ 1937.664005] [<ffffffff813092b4>] SyS_connect+0x9/0xb [ 1937.664005] [<ffffffff813e56d2>] system_call_fastpath+0x16/0x1b [ 1937.664005] Code: 10 2a 84 81 e8 65 76 bd e0 65 ff 0c 25 10 bb 00 00 4d 85 ed 74 37 48 8b 85 60 ff ff ff 48 8b 80 88 01 00 00 48 8b b8 10 02 00 00 <48> 8b 47 20 ff 50 20 85 c0 74 0f 83 e8 28 89 83 10 01 00 00 89 [ 1937.664005] RIP [<ffffffffa049db88>] pppol2tp_connect+0x33d/0x41e [l2tp_ppp] [ 1937.664005] RSP <ffff8800c43c7de8> [ 1937.664005] CR2: 0000000000000020 [ 1939.559375] ---[ end trace 82d44500f28f8708 ]--- Fixes: `f34c4a35d8` ("l2tp: take PMTU from tunnel UDP socket") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-05 14:40:18 -07:00
Govindarajulu Varadarajan	f0db9b0734	ethtool: Add generic options for tunables This patch adds new ethtool cmd, ETHTOOL_GTUNABLE & ETHTOOL_STUNABLE for getting tunable values from driver. Add get_tunable and set_tunable to ethtool_ops. Driver implements these functions for getting/setting tunable value. Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-05 12:12:20 -07:00
Daniel Borkmann	e020836d95	dev_ioctl: remove dev_load() CAP_SYS_MODULE message Marcel reported to see the following message when autoloading is being triggered when adding nlmon device: Loading kernel module for a network device with CAP_SYS_MODULE (deprecated). Use CAP_NET_ADMIN and alias netdev-nlmon instead. This false-positive happens despite with having correct capabilities set, e.g. through issuing `ip link del dev nlmon` more than once on a valid device with name nlmon, but Marcel has also seen it on creation time when no nlmon module is previously compiled-in or loaded as module and the device name equals a link type name (e.g. nlmon, vxlan, team). Stephen says: The netdev module alias is a hold over from the past. For normal devices, people used to create a alias eth0 to and point it to the type of network device used, that was back in the bad old ISA days before real discovery. Also, the tunnels create module alias for the control device and ip used to use this to autoload the tunnel device. The message is bogus and should just be removed, I also see it in a couple of other cases where tap devices are renamed for other usese. As mentioned in `8909c9ad8f` ("net: don't allow CAP_NET_ADMIN to load non-netdev kernel modules"), we nevertheless still might want to leave the old autoloading behaviour in place as it could break old scripts, so for now, lets just remove the log message as Stephen suggests. Reference: http://thread.gmane.org/gmane.linux.kernel/1105168 Reported-by: Marcel Holtmann <marcel@holtmann.org> Suggested-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Vasiliy Kulikov <segoon@openwall.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-05 12:04:40 -07:00
Daniel Borkmann	60a3b2253c	net: bpf: make eBPF interpreter images read-only With eBPF getting more extended and exposure to user space is on it's way, hardening the memory range the interpreter uses to steer its command flow seems appropriate. This patch moves the to be interpreted bytecode to read-only pages. In case we execute a corrupted BPF interpreter image for some reason e.g. caused by an attacker which got past a verifier stage, it would not only provide arbitrary read/write memory access but arbitrary function calls as well. After setting up the BPF interpreter image, its contents do not change until destruction time, thus we can setup the image on immutable made pages in order to mitigate modifications to that code. The idea is derived from commit `314beb9bca` ("x86: bpf_jit_comp: secure bpf jit against spraying attacks"). This is possible because bpf_prog is not part of sk_filter anymore. After setup bpf_prog cannot be altered during its life-time. This prevents any modifications to the entire bpf_prog structure (incl. function/JIT image pointer). Every eBPF program (including classic BPF that are migrated) have to call bpf_prog_select_runtime() to select either interpreter or a JIT image as a last setup step, and they all are being freed via bpf_prog_free(), including non-JIT. Therefore, we can easily integrate this into the eBPF life-time, plus since we directly allocate a bpf_prog, we have no performance penalty. Tested with seccomp and test_bpf testsuite in JIT/non-JIT mode and manual inspection of kernel_page_tables. Brad Spengler proposed the same idea via Twitter during development of this patch. Joint work with Hannes Frederic Sowa. 
Suggested-by: Brad Spengler <spender@grsecurity.net> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Kees Cook <keescook@chromium.org> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-05 12:02:48 -07:00
Sabrina Dubroca	a9ed4a2986	ipv6: fix rtnl locking in setsockopt for anycast and multicast Calling setsockopt with IPV6_JOIN_ANYCAST or IPV6_LEAVE_ANYCAST triggers the assertion in addrconf_join_solict()/addrconf_leave_solict() ipv6_sock_ac_join(), ipv6_sock_ac_drop(), ipv6_sock_ac_close() need to take RTNL before calling ipv6_dev_ac_inc/dec. Same thing with ipv6_sock_mc_join(), ipv6_sock_mc_drop(), ipv6_sock_mc_close() before calling ipv6_dev_mc_inc/dec. This patch moves ASSERT_RTNL() up a level in the call stack. Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reported-by: Tommi Rantala <tt.rantala@gmail.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-05 11:52:28 -07:00
Eyal Shapira	f3000e1b43	mac80211: fix broken use of VHT/20Mhz with some APs commit "mac80211: disable 40MHz support in case of 20MHz AP" broke working VHT in 20Mhz with APs like Netgear R6300v2 which do not publish support for 40Mhz but allow use of VHT in 20Mhz. The break is because VHT is disabled once no HT cap doesn't indicate support for 40Mhz. This causes the assoc request to be sent without any VHT IE and the association is only HT due to this. For more details check out commit `4a817aa7` "mac80211: allow VHT with peers not capable of 40MHz" Fixes: `53b954ee4a` ("mac80211: disable 40MHz support in case of 20MHz AP") Signed-off-by: Eyal Shapira <eyalx.shapira@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-09-05 13:54:07 +02:00
Lorenzo Bianconi	a4bcaf5556	mac80211: extend set_coverage_class signature Extend mac80211 set_coverage_class API in order to enable ACK timeout estimation algorithm (dynack) passing coverage class equals to -1 to lower drivers. Synchronize set_coverage_class routine signature with mac80211 function pointer for p54, ath9k, ath9k_htc and ath5k drivers. Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi83@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-09-05 13:54:07 +02:00
Lorenzo Bianconi	3057dbfdab	cfg80211: enable dynack through nl80211 Enable ACK timeout estimation algorithm (dynack) using mac80211 set_coverage_class API. Dynack is activated passing coverage class equals to -1 to lower drivers and it is automatically disabled setting valid value for coverage class. Define NL80211_ATTR_WIPHY_DYN_ACK flag attribute to enable dynack from userspace. In order to activate dynack NL80211_FEATURE_ACKTO_ESTIMATION feature flag must be set by lower drivers to indicate dynack capability. Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi83@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-09-05 13:54:03 +02:00
Eliad Peller	eaa336b0f5	mac80211: combine roc with the "next roc" if possible If the remaining time in the current roc is not long enough, mac80211 adds the new roc right after it (if they have similar params). However, in case of multiple rocs, the "next roc" is not considered, resulting in multiple rocs, each one with its own duration. Refactor the code a bit and consider the next roc, so a single max roc will be used instead of multiple rocs (which might last much longer). Signed-off-by: Eliad Peller <eliadx.peller@intel.com> Reviewed-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-09-05 13:52:09 +02:00
Eliad Peller	24ecd45e2e	mac80211: adjust roc duration when combining ROCs The new duration (remaining duration after the current ROC ends) was calculated but not used, making the optimization worthless. Signed-off-by: Eliad Peller <eliadx.peller@intel.com> Reviewed-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-09-05 13:52:08 +02:00
Eliad Peller	a62a1aed37	cfg80211: avoid duplicate entries on regdomain intersection The regdom intersection code simply tries intersecting each rule of the source with each rule of the target. Since the resulting intersections are not observed as a whole, this can result in multiple overlapping/duplicate entries. Make the rule addition a bit more smarter, by looking for rules that can be contained within other rules, and adding only extended ones. Signed-off-by: Eliad Peller <eliad@wizery.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-09-05 13:52:08 +02:00
Assaf Krauss	cd2f5dd709	mac80211: Add RRM support to assoc request In case of a RRM-supporting connection, in the association request frame: set the RRM capability flag, and add the required IEs. Signed-off-by: Assaf Krauss <assaf.krauss@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-09-05 13:52:08 +02:00
Assaf Krauss	bab5ab7d2a	nl80211: Add flag attribute for RRM connections Add a flag attribute to use in associations, for tagging the target connection as supporting RRM. It is the responsibility of upper layers to set this flag only if both the underlying device, and the target network indeed support RRM. To be used in ASSOCIATE and CONNECT commands. Signed-off-by: Assaf Krauss <assaf.krauss@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-09-05 13:52:08 +02:00
Liad Kaufman	6188c271f0	mac80211: fix description comment of ieee80211_subif_start_xmit The function description claimed that on error the skb isn't freed even though it is, and stated return values that are different than what really happens in the code. Signed-off-by: Liad Kaufman <liad.kaufman@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-09-05 13:52:07 +02:00
Johannes Berg	2740f0cf8e	cfg80211: add Intel Mobile Communications copyright Our legal structure changed at some point (see wikipedia), but we forgot to immediately switch over to the new copyright notice. For files that we have modified in the time since the change, add the proper copyright notice now. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-09-05 13:52:06 +02:00
Johannes Berg	d98ad83ee8	mac80211: add Intel Mobile Communications copyright Our legal structure changed at some point (see wikipedia), but we forgot to immediately switch over to the new copyright notice. For files that we have modified in the time since the change, add the proper copyright notice now. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-09-05 13:52:06 +02:00
Emmanuel Grumbach	785e21a89d	mac80211: use bss_conf->dtim_period instead of conf.ps_dtim_period sta_set_sinfo is obviously takes data for specific station. This specific station is attached to a specific virtual interface. Hence we should use the dtim_period from this virtual interface rather than the system wide dtim_period. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-09-05 13:41:08 +02:00
Johannes Berg	c10a19930f	mac80211: clean up ieee80211_i.h Not sure how the declaration of ieee80211_tdls_peer_del_work landed after the double inclusion protection end. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-09-05 10:56:50 +02:00
Hannes Frederic Sowa	a9fe8e2994	ipv4: implement igmp_qrv sysctl to tune igmp robustness variable As in IPv6 people might increase the igmp query robustness variable to make sure unsolicited state change reports aren't lost on the network. Add and document this new knob to igmp code. RFCs allow tuning this parameter back to first IGMP RFC, so we also use this setting for all counters, including source specific multicast. Also take over sysctl value when upping the interface and don't reuse the last one seen on the interface. Cc: Flavio Leitner <fbl@redhat.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-04 22:26:14 -07:00
Hannes Frederic Sowa	2f711939d2	ipv6: add sysctl_mld_qrv to configure query robustness variable This patch adds a new sysctl_mld_qrv knob to configure the mldv1/v2 query robustness variable. It specifies how many retransmit of unsolicited mld retransmit should happen. Admins might want to tune this on lossy links. Also reset mld state on interface down/up, so we pick up new sysctl settings during interface up event. IPv6 certification requests this knob to be available. I didn't make this knob netns specific, as it is mostly a setting in a physical environment and should be per host. Cc: Flavio Leitner <fbl@redhat.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-04 22:26:14 -07:00
John W. Linville	ef4ead3f29	Merge tag 'mac80211-next-for-john-2014-08-29' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg <johannes@sipsolutions.net> says: "Not that much content this time. Some RCU cleanups, crypto performance improvements, and various patches all over, rather than listing them one might as well look into the git log instead." Signed-off-by: John W. Linville <linville@tuxdriver.com> Conflicts: drivers/net/wireless/ath/wil6210/wmi.c	2014-09-04 13:41:33 -04:00
John W. Linville	190355cc06	Merge tag 'mac80211-for-john-2014-08-29' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg <johannes@sipsolutions.net> says: "Here are a few fixes for mac80211. One has been discussed for a while and adds a terminating NUL-byte to the alpha2 sent to userspace, which shouldn't be necessary but since many places treat it as a string we couldn't move to just sending two bytes. In addition to that, we have two VLAN fixes from Felix, a mesh fix, a fix for the recently introduced RX aggregation offload, a revert for a broken patch (that luckily didn't really cause any harm) and a small fix for alignment in debugfs." Signed-off-by: John W. Linville <linville@redhat.com>	2014-09-04 13:08:24 -04:00
Li RongQing	c5eba0b6f8	openvswitch: distinguish between the dropped and consumed skb distinguish between the dropped and consumed skb, not assume the skb is consumed always Cc: Thomas Graf <tgraf@noironetworks.com> Cc: Pravin Shelar <pshelar@nicira.com> Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-03 20:50:51 -07:00
Jesper Dangaard Brouer	1f59533f9c	qdisc: validate frames going through the direct_xmit path In commit `50cbe9ab5f` ("net: Validate xmit SKBs right when we pull them out of the qdisc") the validation code was moved out of dev_hard_start_xmit and into dequeue_skb. However this overlooked the fact that we do not always enqueue the skb onto a qdisc. First situation is if qdisc have flag TCQ_F_CAN_BYPASS and qdisc is empty. Second situation is if there is no qdisc on the device, which is a common case for software devices. Originally spotted and inital patch by Alexander Duyck. As a result Alex was seeing issues trying to connect to a vhost_net interface after commit `50cbe9ab5f` was applied. Added a call to validate_xmit_skb() in __dev_xmit_skb(), in the code path for qdiscs with TCQ_F_CAN_BYPASS flag, and in __dev_queue_xmit() when no qdisc. Also handle the error situation where dev_hard_start_xmit() could return a skb list, and does not return dev_xmit_complete(rc) and falls through to the kfree_skb(), in that situation it should call kfree_skb_list(). Fixes: `50cbe9ab5f` ("net: Validate xmit SKBs right when we pull them out of the qdisc") Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-03 20:41:42 -07:00
Jesper Dangaard Brouer	3f3c7eec60	qdisc: exit case fixes for skb list handling in qdisc layer More minor fixes to merge commit `53fda7f7f9` (Merge branch 'xmit_list') that allows us to work with a list of SKBs. Fixing exit cases in qdisc_reset() and qdisc_destroy(), where a leftover requeued SKB (qdisc->gso_skb) can have the potential of being a skb list, thus use kfree_skb_list(). This is a followup to commit `10770bc2d1` ("qdisc: adjustments for API allowing skb list xmits"). Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-03 20:41:42 -07:00
Pablo Neira Ayuso	cbb8125eb4	netfilter: nfnetlink: deliver netlink errors on batch completion We have to wait until the full batch has been processed to deliver the netlink error messages to userspace. Otherwise, we may deliver duplicated errors to userspace in case that we need to abort and replay the transaction if any of the required modules needs to be autoloaded. A simple way to reproduce this (assumming nft_meta is not loaded) with the following test file: add table filter add chain filter test add chain bad test # intentional wrong unexistent table add rule filter test meta mark 0 Then, when trying to load the batch: # nft -f test test:4:1-19: Error: Could not process rule: No such file or directory add chain bad test ^^^^^^^^^^^^^^^^^^^ test:4:1-19: Error: Could not process rule: No such file or directory add chain bad test ^^^^^^^^^^^^^^^^^^^ The error is reported twice, once when the batch is aborted due to missing nft_meta and another when it is fully processed. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-09-03 16:56:23 +02:00
Michal Kazior	4549cf2b18	mac80211: fix offloaded BA session traffic after hw restart When starting an offloaded BA session it is unknown what starting sequence number should be used. Using last_seq worked in most cases except after hw restart. When hw restart is requested last_seq is (rightfully so) kept unmodified. This ended up with BA sessions being restarted with an aribtrary BA window values resulting in dropped frames until sequence numbers caught up. Instead of last_seq pick seqno of a first Rxed frame of a given BA session. This fixes stalled traffic after hw restart with offloaded BA sessions (currently only ath10k). Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-09-03 13:40:38 +02:00
Johannes Berg	bd8c78e78d	nl80211: clear skb cb before passing to netlink In testmode and vendor command reply/event SKBs we use the skb cb data to store nl80211 parameters between allocation and sending. This causes the code for CONFIG_NETLINK_MMAP to get confused, because it takes ownership of the skb cb data when the SKB is handed off to netlink, and it doesn't explicitly clear it. Clear the skb cb explicitly when we're done and before it gets passed to netlink to avoid this issue. Cc: stable@vger.kernel.org [this goes way back] Reported-by: Assaf Azulay <assaf.azulay@intel.com> Reported-by: David Spinadel <david.spinadel@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-09-03 11:13:14 +02:00
Pablo Neira Ayuso	d99407f42f	netfilter: nft_rbtree: no need for spinlock from set destroy path The sets are released from the rcu callback, after the rule is removed from the chain list, which implies that nfnetlink cannot update the rbtree and no packets are walking on the set anymore. Thus, we can get rid of the spinlock in the set destroy path there. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Reviewied-by: Thomas Graf <tgraf@suug.ch>	2014-09-03 10:57:08 +02:00
Pablo Neira Ayuso	39f390167e	netfilter: nft_hash: no need for rcu in the hash set destroy path The sets are released from the rcu callback, after the rule is removed from the chain list, which implies that nfnetlink cannot update the hashes (thus, no resizing may occur) and no packets are walking on the set anymore. This resolves a lockdep splat in the nft_hash_destroy() path since the nfnl mutex is not held there. =============================== [ INFO: suspicious RCU usage. ] 3.16.0-rc2+ #168 Not tainted ------------------------------- net/netfilter/nft_hash.c:362 suspicious rcu_dereference_protected() usage! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 1 1 lock held by ksoftirqd/0/3: #0: (rcu_callback){......}, at: [<ffffffff81096393>] rcu_process_callbacks+0x27e/0x4c7 stack backtrace: CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 3.16.0-rc2+ #168 Hardware name: LENOVO 23259H1/23259H1, BIOS G2ET32WW (1.12 ) 05/30/2012 0000000000000001 ffff88011769bb98 ffffffff8142c922 0000000000000006 ffff880117694090 ffff88011769bbc8 ffffffff8107c3ff ffff8800cba52400 ffff8800c476bea8 ffff8800c476bea8 ffff8800cba52400 ffff88011769bc08 Call Trace: [<ffffffff8142c922>] dump_stack+0x4e/0x68 [<ffffffff8107c3ff>] lockdep_rcu_suspicious+0xfa/0x103 [<ffffffffa079931e>] nft_hash_destroy+0x50/0x137 [nft_hash] [<ffffffffa078cd57>] nft_set_destroy+0x11/0x2a [nf_tables] Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Thomas Graf <tgraf@suug.ch>	2014-09-03 10:57:06 +02:00
Jesper Dangaard Brouer	10770bc2d1	qdisc: adjustments for API allowing skb list xmits Minor adjustments for merge commit `53fda7f7f9` (Merge branch 'xmit_list') that allows us to work with a list of SKBs. Update code doc to function sch_direct_xmit(). In handle_dev_cpu_collision() use kfree_skb_list() in error handling. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-02 14:06:17 -07:00
Li RongQing	4ee45ea05c	openvswitch: fix a memory leak The user_skb maybe be leaked if the operation on it failed and codes skipped into the label "out:" without calling genlmsg_unicast. Cc: Pravin Shelar <pshelar@nicira.com> Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-02 14:01:21 -07:00
Pablo Neira	41ad82f7f8	netfilter: fix missing dependencies in NETFILTER_XT_TARGET_LOG make defconfig reports: warning: (NETFILTER_XT_TARGET_LOG) selects NF_LOG_IPV6 which has unmet direct dependencies (NET && INET && IPV6 && NETFILTER && NETFILTER_ADVANCED) Fixes: `d79a61d` netfilter: NETFILTER_XT_TARGET_LOG selects NF_LOG_* Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-02 13:59:54 -07:00
David S. Miller	abccc5878a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== pull request: Netfilter/IPVS fixes for net The following patchset contains seven Netfilter fixes for your net tree, they are: 1) Make the NAT infrastructure independent of x_tables, some users are already starting to test nf_tables with NAT without enabling x_tables. Without this patch for Kconfig, there's a superfluous dependency between NAT and x_tables. 2) Allow to use 0 in the cgroup match, the kernel rejects with -EINVAL with no good reason. From Daniel Borkmann. 3) Select CONFIG_NF_NAT from the nf_tables NAT expression, this also resolves another NAT dependency with x_tables. 4) Use HAVE_JUMP_LABEL instead of CONFIG_JUMP_LABEL in the Netfilter hook code as elsewhere in the kernel to resolve toolchain problems, from Zhouyi Zhou. 5) Use iptunnel_handle_offloads() to set up tunnel encapsulation depending on the offload capabilities, reported by Alex Gartrell patch from Julian Anastasov. 6) Fix wrong family when registering the ip_vs_local_reply6() hook, also from Julian. 7) Select the NF_LOG_* symbols from NETFILTER_XT_TARGET_LOG. Rafał Miłecki reported that when jumping from 3.16 to 3.17-rc, his log target is not selected anymore due to changes in the previous development cycle to accomodate the full logging support for nf_tables. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-02 13:56:30 -07:00
Nicolas Dichtel	ba9989069f	rtnl/do_setlink(): notify when a netdev is modified Depending on which parameters were updated, the changes were not propagated via the notifier chain and netlink. The new flag has been set only when the change did not cause a call to the notifier chain and/or to the netlink notification functions. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-02 12:57:04 -07:00
Nicolas Dichtel	90c325e3bf	rtnl/do_setlink(): last arg is now a set of flags There is no functional changes with this commit, it only prepares the next one. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-02 12:57:04 -07:00
Nicolas Dichtel	1889b0e7ef	rtnl/do_setlink(): set modified when IFLA_LINKMODE is updated The only effect of this patch is to print a warning if IFLA_LINKMODE is updated and a following change fails. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-02 12:57:04 -07:00
Nicolas Dichtel	5d1180fcac	rtnl/do_setlink(): set modified when IFLA_TXQLEN is updated The only effect of this patch is to print a warning if IFLA_TXQLEN is updated and a following change fails. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-02 12:57:04 -07:00
Pablo Neira Ayuso	65cd90ac76	netfilter: nft_chain_nat_ipv4: use generic IPv4 NAT code from core Use the exported IPv4 NAT functions that are provided by the core. This removes duplicated code so iptables and nft use the same NAT codebase. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-09-02 17:14:11 +02:00
Pablo Neira Ayuso	30766f4c2d	netfilter: nat: move specific NAT IPv4 to core Move the specific NAT IPv4 core functions that are called from the hooks from iptable_nat.c to nf_nat_l3proto_ipv4.c. This prepares the ground to allow iptables and nft to use the same NAT engine code that comes in a follow up patch. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-09-02 17:14:10 +02:00
Christophe Gouault	880a6fab8f	xfrm: configure policy hash table thresholds by netlink Enable to specify local and remote prefix length thresholds for the policy hash table via a netlink XFRM_MSG_NEWSPDINFO message. prefix length thresholds are specified by XFRMA_SPD_IPV4_HTHRESH and XFRMA_SPD_IPV6_HTHRESH optional attributes (struct xfrmu_spdhthresh). example: struct xfrmu_spdhthresh thresh4 = { .lbits = 0; .rbits = 24; }; struct xfrmu_spdhthresh thresh6 = { .lbits = 0; .rbits = 56; }; struct nlmsghdr hdr; struct nl_msg msg; msg = nlmsg_alloc(); hdr = nlmsg_put(msg, NL_AUTO_PORT, NL_AUTO_SEQ, XFRMA_SPD_IPV4_HTHRESH, sizeof(__u32), NLM_F_REQUEST); nla_put(msg, XFRMA_SPD_IPV4_HTHRESH, sizeof(thresh4), &thresh4); nla_put(msg, XFRMA_SPD_IPV6_HTHRESH, sizeof(thresh6), &thresh6); nla_send_auto(sk, msg); The numbers are the policy selector minimum prefix lengths to put a policy in the hash table. - lbits is the local threshold (source address for out policies, destination address for in and fwd policies). - rbits is the remote threshold (destination address for out policies, source address for in and fwd policies). The default values are: XFRMA_SPD_IPV4_HTHRESH: 32 32 XFRMA_SPD_IPV6_HTHRESH: 128 128 Dynamic re-building of the SPD is performed when the thresholds values are changed. The current thresholds can be read via a XFRM_MSG_GETSPDINFO request: the kernel replies to XFRM_MSG_GETSPDINFO requests by an XFRM_MSG_NEWSPDINFO message, with both attributes XFRMA_SPD_IPV4_HTHRESH and XFRMA_SPD_IPV6_HTHRESH. Signed-off-by: Christophe Gouault <christophe.gouault@6wind.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-09-02 13:37:56 +02:00
Christophe Gouault	b58555f176	xfrm: hash prefixed policies based on preflen thresholds The idea is an extension of the current policy hashing. Today only non-prefixed policies are stored in a hash table. This patch relaxes the constraints, and hashes policies whose prefix lengths are greater than or equal to a configurable threshold. Each hash table (one per direction) maintains its own set of IPv4 and IPv6 thresholds (dbits4, sbits4, dbits6, sbits6), by default (32, 32, 128, 128). For example, if the output hash table is configured with values (16, 24, 56, 64): ip xfrm policy add dir out src 10.22.0.0/20 dst 10.24.1.0/24 ... => hashed ip xfrm policy add dir out src 10.22.0.0/16 dst 10.24.1.1/32 ... => hashed ip xfrm policy add dir out src 10.22.0.0/16 dst 10.24.0.0/16 ... => unhashed ip xfrm policy add dir out \ src 3ffe:304:124:2200::/60 dst 3ffe:304:124:2401::/64 ... => hashed ip xfrm policy add dir out \ src 3ffe:304:124:2200::/56 dst 3ffe:304:124:2401::2/128 ... => hashed ip xfrm policy add dir out \ src 3ffe:304:124:2200::/56 dst 3ffe:304:124:2400::/56 ... => unhashed The high order bits of the addresses (up to the threshold) are used to compute the hash key. Signed-off-by: Christophe Gouault <christophe.gouault@6wind.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-09-02 13:29:44 +02:00
Willem de Bruijn	364a9e9324	sock: deduplicate errqueue dequeue sk->sk_error_queue is dequeued in four locations. All share the exact same logic. Deduplicate. Also collapse the two critical sections for dequeue (at the top of the recv handler) and signal (at the bottom). This moves signal generation for the next packet forward, which should be harmless. It also changes the behavior if the recv handler exits early with an error. Previously, a signal for follow-up packets on the errqueue would then not be scheduled. The new behavior, to always signal, is arguably a bug fix. For rxrpc, the change causes the same function to be called repeatedly for each queued packet (because the recv handler == sk_error_report). It is likely that all packets will fail for the same reason (e.g., memory exhaustion). This code runs without sk_lock held, so it is not safe to trust that sk->sk_err is immutable in between releasing q->lock and the subsequent test. Introduce int err just to avoid this potential race. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-01 21:49:08 -07:00
Tom Herbert	72297c59f7	l2tp: Enable checksum unnecessary conversions for l2tp/UDP sockets Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-01 21:36:28 -07:00
Tom Herbert	884d338c04	gre: Add support for checksum unnecessary conversions Call skb_checksum_try_convert and skb_gro_checksum_try_convert after checksum is found present and validated in the GRE header for normal and GRO paths respectively. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-01 21:36:28 -07:00
Tom Herbert	2abb7cdc0d	udp: Add support for doing checksum unnecessary conversion Add support for doing CHECKSUM_UNNECESSARY to CHECKSUM_COMPLETE conversion in UDP tunneling path. In the normal UDP path, we call skb_checksum_try_convert after locating the UDP socket. The check is that checksum conversion is enabled for the socket (new flag in UDP socket) and that checksum field is non-zero. In the UDP GRO path, we call skb_gro_checksum_try_convert after checksum is validated and checksum field is non-zero. Since this is already in GRO we assume that checksum conversion is always wanted. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-01 21:36:28 -07:00
Tom Herbert	5a21232983	net: Support for csum_bad in skbuff This flag indicates that an invalid checksum was detected in the packet. __skb_mark_checksum_bad helper function was added to set this. Checksums can be marked bad from a driver or the GRO path (the latter is implemented in this patch). csum_bad is checked in __skb_checksum_validate_complete (i.e. calling that when ip_summed == CHECKSUM_NONE). csum_bad works in conjunction with ip_summed value. In the case that ip_summed is CHECKSUM_NONE and csum_bad is set, this implies that the first (or next) checksum encountered in the packet is bad. When ip_summed is CHECKSUM_UNNECESSARY, the first checksum after the last one validated is bad. For example, if ip_summed == CHECKSUM_UNNECESSARY, csum_level == 1, and csum_bad is set-- then the third checksum in the packet is bad. In the normal path, the packet will be dropped when processing the protocol layer of the bad checksum: __skb_decr_checksum_unnecessary called twice for the good checksums changing ip_summed to CHECKSUM_NONE so that __skb_checksum_validate_complete is called to validate the third checksum and that will fail since csum_bad is set. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-01 21:36:27 -07:00
Florian Fainelli	61b7363ffa	net: dsa: make dsa_pack_type static net/dsa/dsa.c:624:20: sparse: symbol 'dsa_pack_type' was not declared. Should it be static? Fixes: `3e8a72d1da` ("net: dsa: reduce number of protocol hooks") Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-01 20:41:45 -07:00
stephen hemminger	688d1945bc	tcp: whitespace fixes Fix places where there is space before tab, long lines, and awkward if(){, double spacing etc. Add blank line after declaration/initialization. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-01 18:12:45 -07:00
David S. Miller	377655fe6b	Merge tag 'master-2014-08-25' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless John W. Linville says: ==================== pull request: wireless 2014-08-28 Please pull this batch of fixes intended for the 3.17 stream. For the Bluetooth/6LowPAN/802.15.4 bits, Johan says: 'It contains a connection reference counting fix for LE where a connection might stay up even though it should get disconnected. The other 802.15.4 6LoWPAN related patches were sent to the bluetooth tree by Alexander Aring and described as follows by him: " these patches contains patches for the bluetooth branch. This series includes memory leak fixes and an errno value fix. Also there are two patches for sending and receiving 1280 6LoWPAN packets, which makes the IEEE 802.15.4 6LoWPAN stack more RFC compliant. "' Along with that... Alexey Khoroshilov fixes a use-after-free bug on at76c50x-usb. Hauke Mehrtens adds a PCI ID to bcma. Himangi Saraogi fixes a silly "A || A" test in rtlwifi. Larry Finger adds a device ID to rtl8192cu. Maks Naumov fixes a strncmp argument in ath9k. Álvaro Fernández Rojas adds a PCI ID to ssb. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-01 18:08:11 -07:00
Jesper Dangaard Brouer	afb84b6261	pktgen: add flag NO_TIMESTAMP to disable timestamping When testing the TX limits of the stack, it is useful to be able to disable the do_gettimeofday() timestamping on every packet. This implements a pktgen flag NO_TIMESTAMP which will disable this call to do_gettimeofday(). The performance change (on my system, an E5-2695) with skb_clone=0 goes from TX 2,423,751 pps to 2,567,165 pps with flag NO_TIMESTAMP. Thus, the cost of do_gettimeofday(), i.e. the saving per packet, is approx 23 nanosec. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-01 18:06:59 -07:00
Erik Hugne	a5325ae5b8	tipc: add name distributor resiliency queue TIPC name table updates are distributed asynchronously in a cluster, entailing a risk of certain race conditions. E.g., if two nodes simultaneously issue conflicting (overlapping) publications, this may not be detected until both publications have reached a third node, in which case one of the publications will be silently dropped on that node. Hence, we end up with an inconsistent name table. In most cases this conflict is just a temporary race, e.g., one node is issuing a publication under the assumption that a previous, conflicting, publication has already been withdrawn by the other node. However, because of the (rtt related) distributed update delay, this may not yet hold true on all nodes. The symptom of this failure is a syslog message: "tipc: Cannot publish {%u,%u,%u}, overlap error". In this commit we add a resiliency queue at the receiving end of the name table distributor. When insertion of an arriving publication fails, we retain it in this queue for a short amount of time, assuming that another update will arrive very soon and clear the conflict. If that happens, we insert the publication; otherwise we drop it. The (configurable) retention value defaults to 2000 ms. Knowing from experience that the situation described above is extremely rare, there is no risk that the queue will accumulate any large number of items. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-01 17:51:48 -07:00
Erik Hugne	f4ad8a4b8b	tipc: refactor name table updates out of named packet receive routine We need to perform the same actions when processing deferred name table updates, so this functionality is moved to a separate function. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-01 17:51:48 -07:00
David S. Miller	8dcda22a5d	net: xmit_list() becomes dev_hard_start_xmit(). Now fundamentally we can process lists of SKBs as cheaply as single packets. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-01 17:39:56 -07:00
David S. Miller	ce93718fb7	net: Don't keep around original SKB when we software segment GSO frames. Just maintain the list properly by returning the head of the remaining SKB list from dev_hard_start_xmit(). Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-01 17:39:56 -07:00
David S. Miller	50cbe9ab5f	net: Validate xmit SKBs right when we pull them out of the qdisc. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-01 17:39:56 -07:00
David S. Miller	eae3f88ee4	net: Separate out SKB validation logic from transmit path. dev_hard_start_xmit() does two things, it first validates and canonicalizes the SKB, then it actually sends it. Make a set of helper functions for doing the first part. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-01 17:39:55 -07:00
David S. Miller	95f6b3dda2	net: Have xmit_list() signal more==true when appropriate. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-01 17:39:55 -07:00
David S. Miller	fa2dbdc253	net: Pass a "more" indication down into netdev_start_xmit() code paths. For now it will always be false. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-01 17:39:55 -07:00
David S. Miller	7f2e870f2a	net: Move main gso loop out of dev_hard_start_xmit() into helper. There is a slight policy change happening here as well. The previous code would drop the entire rest of the GSO skb if any of them got, for example, a congestion notification. That makes no sense, anything NET_XMIT_MASK and below is something like congestion or policing. And in the congestion case it doesn't even mean the packet was actually dropped. Just continue until dev_xmit_complete() evaluates to false. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-01 17:39:55 -07:00
David S. Miller	2ea2551375	net: Create xmit_one() helper for dev_hard_start_xmit() Hopefully making the code a bit easier to read and digest. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-01 17:39:55 -07:00
David S. Miller	10b3ad8c21	net: Do txq_trans_update() in netdev_start_xmit() That way we don't have to audit every call site to make sure it is doing this properly. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-01 17:39:55 -07:00
Vincent Cuissard	83724c3329	NFC: NCI: Fix NCI RF FRAME interface usage NCI RF FRAME interface is used for all kinds of tags except ISODEP ones. So for all other kinds of tags the status byte has to be removed. Signed-off-by: Vincent Cuissard <cuissard@marvell.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2014-09-01 14:40:43 +02:00
Vincent Cuissard	3c1c0f5dc8	NFC: NCI: Fix nci_register_device init sequence All contexts have to be initialized before calling nfc_register_device, otherwise it is possible to call nci_dev_up before the nci_register_device function has finished. In such a case the kernel will crash on uninitialized variables. Signed-off-by: Vincent Cuissard <cuissard@marvell.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2014-09-01 14:40:37 +02:00
Vincent Cuissard	cfdbeeafdb	NFC: NCI: Add support of ISO15693 Update nci.h to respect latest NCI specification proposal (stop using proprietary opcodes). Handle ISO15693 parameters in NCI_RF_ACTIVATED_NTF handler. Signed-off-by: Vincent Cuissard <cuissard@marvell.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2014-09-01 14:40:31 +02:00
Pablo Neira Ayuso	d79a61d646	netfilter: NETFILTER_XT_TARGET_LOG selects NF_LOG_* CONFIG_NETFILTER_XT_TARGET_LOG is not selected anymore when jumping from 3.16 to 3.17-rc1 if you don't turn on the new NF_LOG_IPV4 and NF_LOG_IPV6 switches. Change this to select the three new symbols NF_LOG_COMMON, NF_LOG_IPV4 and NF_LOG_IPV6 instead, so NETFILTER_XT_TARGET_LOG remains enabled when moving from old to new kernels. Reported-by: Rafał Miłecki <zajec5@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-09-01 13:46:31 +02:00
Masanari Iida	1a84db567a	treewide: fix errors in printk This patch fixes spelling typos in printk messages. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2014-09-01 11:18:25 +02:00
Mark A. Greer	dddb3da046	NFC: digital: Add Inititor-side PSL support In order to operate at the fastest bit rate possible, add initiator-side support for PSL REQ while in P2P mode. The PSL REQ will switch the RF technology to 424F whenever possible. Reviewed-by: Thierry Escande <thierry.escande@linux.intel.com> Tested-by: Thierry Escande <thierry.escande@linux.intel.com> Signed-off-by: Mark A. Greer <mgreer@animalcreek.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2014-08-31 22:15:37 +02:00
Tom Herbert	202863fe4c	sctp: Change sctp to implement csum_levels CHECKSUM_UNNECESSARY may be applied to the SCTP CRC so we need to appropriately account for this by decrementing csum_level. This is done by calling __skb_decr_checksum_unnecessary. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-29 20:41:11 -07:00
Tom Herbert	662880f442	net: Allow GRO to use and set levels of checksum unnecessary Allow GRO path to "consume" checksums provided in CHECKSUM_UNNECESSARY and to report new checksums verified for use in fallback to normal path. Change GRO checksum path to track csum_level using a csum_cnt field in NAPI_GRO_CB. On GRO initialization, if ip_summed is CHECKSUM_UNNECESSARY set NAPI_GRO_CB(skb)->csum_cnt to skb->csum_level + 1. For each checksum verified, decrement NAPI_GRO_CB(skb)->csum_cnt while it is greater than zero. If a checksum is verified and NAPI_GRO_CB(skb)->csum_cnt == 0, we have verified a deeper checksum than originally indicated in skbuff so increment csum_level (or initialize to CHECKSUM_UNNECESSARY if ip_summed is CHECKSUM_NONE or CHECKSUM_COMPLETE). Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-29 20:41:11 -07:00
Tom Herbert	77cffe23c1	net: Clarification of CHECKSUM_UNNECESSARY This patch: - Clarifies the specific requirements of devices returning CHECKSUM_UNNECESSARY (comments in skbuff.h). - Adds csum_level field to skbuff. This is used to express how many checksums are covered by CHECKSUM_UNNECESSARY (stores n - 1). This replaces the overloading of skb->encapsulation; that field is now only used to indicate inner headers are valid. - Change __skb_checksum_validate_needed to "consume" each checksum as indicated by csum_level as layers of the packet are parsed. - Remove skb_pop_rcv_encapsulation, no longer needed in the new csum_level model. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-29 20:41:11 -07:00
Daniel Borkmann	38ab1fa981	net: sctp: fix ABI mismatch through sctp_assoc_to_state helper Since SCTP day 1, that is, 19b55a2af145 ("Initial commit") from lksctp tree, the official <netinet/sctp.h> header carries a copy of enum sctp_sstat_state that looks like (compared to the current in-kernel enumeration): User definition: Kernel definition: enum sctp_sstat_state { typedef enum { SCTP_EMPTY = 0, <removed> SCTP_CLOSED = 1, SCTP_STATE_CLOSED = 0, SCTP_COOKIE_WAIT = 2, SCTP_STATE_COOKIE_WAIT = 1, SCTP_COOKIE_ECHOED = 3, SCTP_STATE_COOKIE_ECHOED = 2, SCTP_ESTABLISHED = 4, SCTP_STATE_ESTABLISHED = 3, SCTP_SHUTDOWN_PENDING = 5, SCTP_STATE_SHUTDOWN_PENDING = 4, SCTP_SHUTDOWN_SENT = 6, SCTP_STATE_SHUTDOWN_SENT = 5, SCTP_SHUTDOWN_RECEIVED = 7, SCTP_STATE_SHUTDOWN_RECEIVED = 6, SCTP_SHUTDOWN_ACK_SENT = 8, SCTP_STATE_SHUTDOWN_ACK_SENT = 7, }; } sctp_state_t; This header was later on also placed into the uapi, so that user space programs can compile without having <netinet/sctp.h>, but with the shipped <linux/sctp.h> instead. While RFC6458 under 8.2.1. Association Status (SCTP_STATUS) says that sstat_state can range from SCTP_CLOSED to SCTP_SHUTDOWN_ACK_SENT, we nevertheless have what appears to be a dummy SCTP_EMPTY state from the very early days. While it seems to do just nothing, commit `0b8f9e25b0` ("sctp: remove completely unsed EMPTY state") did the right thing and removed this dead code. That, however, causes an off-by-one when the user asks the SCTP stack via SCTP_STATUS API and checks for the current socket state, thus yielding possibly undefined behaviour in applications as they expect the kernel to tell the right thing. The enumeration had to be changed however as based on the current socket state, we access a function pointer lookup-table through this. Therefore, I think the best way to deal with this is just to add a helper function sctp_assoc_to_state() to encapsulate the off-by-one quirk. Reported-by: Tristan Su <sooqing@gmail.com> Fixes: `0b8f9e25b0` ("sctp: remove completely unsed EMPTY state") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-29 20:31:08 -07:00
Eric Dumazet	d9b2938aab	net: attempt a single high order allocation In commit `ed98df3361` ("net: use __GFP_NORETRY for high order allocations") we tried to address one issue caused by order-3 allocations. We still observe high latencies and system overhead in situations where compaction is not successful. Instead of trying order-3, order-2, and order-1, do a single order-3 best effort and immediately fall back to plain order-0. This mimics slub strategy to fall back to slab min order if the high order allocation used for performance failed. Order-3 allocations give a performance boost only if they can be done without recurring and expensive memory scan. Quoting David: The page allocator relies on synchronous (sync light) memory compaction after direct reclaim for allocations that don't retry and deferred compaction doesn't work with this strategy because the allocation order is always decreasing from the previous failed attempt. This means sync light compaction will always be encountered if memory cannot be defragmented or reclaimed several times during the skb_page_frag_refill() iteration. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-29 20:28:23 -07:00
Ying Xue	cc086fcf92	tipc: fix a potential oops Commit `6c9808ce09` ("tipc: remove port_lock") accidentally introduced a potential bug: when no tipc socket instance (tsk) is found for the given reference number in tipc_sk_get(), tsk is set to NULL. We subsequently jump to the exit label, where tipc_sk_put() decrements the socket reference counter through the tsk pointer. However, as tsk is now NULL, an oops may happen because of the NULL pointer dereference. Signed-off-by: Ying Xue <ying.xue@windriver.com> Acked-by: Erik Hugne <erik.hugne@ericsson.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-29 20:22:43 -07:00
Daniel Borkmann	10c51b5623	net: add skb_get_tx_queue() helper Replace occurrences of skb_get_queue_mapping() and follow-up netdev_get_tx_queue() with an actual helper function. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-29 20:02:07 -07:00
Mika Westerberg	d0616613d9	net: rfkill: gpio: Add more Broadcom bluetooth ACPI IDs This adds one more ACPI ID of a Broadcom bluetooth chip. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-08-29 13:10:44 +02:00
Michal Kazior	a00f4f6e04	mac80211: fix chantype recalc warning When a device driver is unloaded local->interfaces list is cleared. If there was more than 1 interface running and connected (bound to a chanctx) then chantype recalc was called and it ended up with compat being NULL causing a call trace warning. Warn if compat becomes NULL as a result of incompatible bss_conf.chandef of interfaces bound to a given channel context only. The call trace looked like this: WARNING: CPU: 2 PID: 2594 at /devel/src/linux/net/mac80211/chan.c:557 ieee80211_recalc_chanctx_chantype+0x2cd/0x2e0() Modules linked in: ath10k_pci(-) ath10k_core ath CPU: 2 PID: 2594 Comm: rmmod Tainted: G W 3.16.0-rc1+ #150 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 0000000000000009 ffff88001ea279c0 ffffffff818dfa93 0000000000000000 ffff88001ea279f8 ffffffff810514a8 ffff88001ce09cd0 ffff88001e03cc58 0000000000000000 ffff88001ce08840 ffff88001ce09cd0 ffff88001ea27a08 Call Trace: [<ffffffff818dfa93>] dump_stack+0x4d/0x66 [<ffffffff810514a8>] warn_slowpath_common+0x78/0xa0 [<ffffffff81051585>] warn_slowpath_null+0x15/0x20 [<ffffffff818a407d>] ieee80211_recalc_chanctx_chantype+0x2cd/0x2e0 [<ffffffff818a3dda>] ? ieee80211_recalc_chanctx_chantype+0x2a/0x2e0 [<ffffffff818a4919>] ieee80211_assign_vif_chanctx+0x1a9/0x770 [<ffffffff818a6220>] __ieee80211_vif_release_channel+0x70/0x130 [<ffffffff818a6dd3>] ieee80211_vif_release_channel+0x43/0xb0 [<ffffffff81885f4e>] ieee80211_stop_ap+0x21e/0x5a0 [<ffffffff8184b9b5>] __cfg80211_stop_ap+0x85/0x520 [<ffffffff8181c188>] __cfg80211_leave+0x68/0x120 [<ffffffff8181c268>] cfg80211_leave+0x28/0x40 [<ffffffff8181c5f3>] cfg80211_netdev_notifier_call+0x373/0x6b0 [<ffffffff8107f965>] notifier_call_chain+0x55/0x110 [<ffffffff8107fa41>] raw_notifier_call_chain+0x11/0x20 [<ffffffff816a8dc0>] call_netdevice_notifiers_info+0x30/0x60 [<ffffffff816a8eb9>] __dev_close_many+0x59/0xf0 [<ffffffff816a9021>] dev_close_many+0x81/0x120 [<ffffffff816aa1c5>] rollback_registered_many+0x115/0x2a0 [<ffffffff816aa3a6>] unregister_netdevice_many+0x16/0xa0 [<ffffffff8187d841>] ieee80211_remove_interfaces+0x121/0x1b0 [<ffffffff8185e0e6>] ieee80211_unregister_hw+0x56/0x110 [<ffffffffa0011ac4>] ath10k_mac_unregister+0x14/0x60 [ath10k_core] [<ffffffffa0014fe7>] ath10k_core_unregister+0x27/0x40 [ath10k_core] [<ffffffffa003b1f4>] ath10k_pci_remove+0x44/0xa0 [ath10k_pci] [<ffffffff81373138>] pci_device_remove+0x28/0x60 [<ffffffff814cb534>] __device_release_driver+0x64/0xd0 [<ffffffff814cbcc8>] driver_detach+0xb8/0xc0 [<ffffffff814cb23a>] bus_remove_driver+0x4a/0xb0 [<ffffffff814cc697>] driver_unregister+0x27/0x50 Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-08-29 13:06:01 +02:00
Ying Xue	0244790c8a	xfrm: remove useless hash_resize_mutex locks In xfrm_state.c, hash_resize_mutex is defined as a local variable and only used in xfrm_hash_resize() which is declared as a work handler of xfrm.state_hash_work. But when the xfrm.state_hash_work work is put in the global workqueue (system_wq) with schedule_work(), the work is only inserted in the global workqueue if it was not already queued; otherwise, it is left in the same position on the global workqueue. This means the xfrm_hash_resize() work handler is only executed once at any time no matter how many times its work is scheduled, that is, xfrm_hash_resize() is not called concurrently at all, so hash_resize_mutex is redundant for us. Cc: Christophe Gouault <christophe.gouault@6wind.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-08-29 11:40:03 +02:00
Chuck Lever	71efecb3f5	sunrpc: fix byte-swapping of displayed XID xprt_lookup_rqst() and bc_send_request() display a byte-swapped XID, but receive_cb_reply() does not. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-28 16:00:07 -04:00
J. Bruce Fields	ae89254da6	SUNRPC: Fix compile on non-x86 current_task appears to be x86-only, oops. Let's just delete this check entirely: Any developer that adds a new user without setting rq_task will get a crash the first time they test it. I also don't think there are normally any important locks held here, and I can't see any other reason why killing a server thread would bring the whole box down. So the effort to fail gracefully here looks like overkill. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Fixes: `983c684466` "SUNRPC: get rid of the request wait queue" Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-28 15:51:35 -04:00
Jesper Dangaard Brouer	d7cdb96808	treewide: fix synchronize_rcu() in comments Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2014-08-28 15:01:24 +02:00
Florian Fainelli	246d7f773c	net: dsa: add Broadcom SF2 switch driver Add support for the Broadcom Starfighter 2 switch chip using a DSA driver. This switch driver supports the following features: - configuration of the external switch port interface: MII, RevMII, RGMII and RGMII_NO_ID are supported - support for the per-port MIB counters - support for link interrupts for special ports (e.g: MoCA) - powering up/down of switch memories to conserve power when ports are unused Finally, update the compatible property for the DSA core code to match our switch top-level compatible node. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-27 22:59:40 -07:00
Florian Fainelli	5037d532b8	net: dsa: add Broadcom tag RX/TX handler Add support for the 4-byte Broadcom tag that built-in switches such as the Starfighter 2 might insert when receiving packets, or that we need to insert while targeting specific switch ports. We use a fake local EtherType value for this 4-byte switch tag: ETH_P_BRCMTAG to make sure we can assign DSA-specific network operations within the DSA drivers. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-27 22:59:40 -07:00
Florian Fainelli	ce31b31c68	net: dsa: allow updating fixed PHY link information Allow switch drivers to hook a PHY link update callback to perform port-specific link work. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-27 22:59:40 -07:00
Florian Fainelli	ec9436baed	net: dsa: allow drivers to do link adjustment Whenever libphy determines that the link status of a given PHY/port has changed, allow calling into the switch driver link adjustment callback so that the switch driver can take proper action upon link notification. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-27 22:59:40 -07:00
Florian Fainelli	5aed85cec2	net: dsa: allow switches to work without tagging In case switch port tagging is disabled (voluntarily, or the switch just does not support it), allow us to continue using the defined set of dsa_device_ops in net/dsa/slave.c. We introduce dsa_protocol_is_tagged() to check whether we need to override skb->protocol and go through the DSA-specific packet_type function, or if we just go on and receive the SKB through the normal path. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-27 22:59:40 -07:00
Florian Fainelli	0d8bcdd383	net: dsa: allow for more complex PHY setups Modify the DSA slave interface to be bound to an arbitrary PHY, not just the ones that are available as child PHY devices of the switch MDIO bus. This allows us, for instance, to have external PHYs connected to a separate MDIO bus, but yet also connected to a given switch port. Under certain configurations, the physical port mask might not be a 1:1 mapping to the MII PHYs mask. This is the case if, e.g., Port 1 of the switch is used and connects to a PHY at an MDIO address different than 1. Introduce a phys_mii_mask variable which allows drivers to implement and divert their own MDIO read/write operations for a subset of the MDIO PHY addresses. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-27 22:59:40 -07:00
Florian Fainelli	bd47497a01	net: dsa: retain a per-port device_node pointer We will later use the per-port device_node pointer to fetch a bunch of port-specific properties. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-27 22:59:40 -07:00
Florian Fainelli	fa981d9af8	net: dsa: provide a switch device device tree node pointer We might need to fetch additional resources from the device tree node pointer, such as register ranges or other properties. Keep a device_node pointer around for this. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-27 22:59:39 -07:00
Florian Fainelli	3e8a72d1da	net: dsa: reduce number of protocol hooks DSA is currently registering one packet_type function per EtherType it needs to intercept in the receive path of a DSA-enabled Ethernet device. Right now we have three of them: trailer, DSA and eDSA, and there might be more in the future, this will not scale to the addition of new protocols. This patch proceeds with adding a new layer of abstraction and two new functions: dsa_switch_rcv() which will dispatch into the tag-protocol specific receive function implemented by net/dsa/tag_.c dsa_slave_xmit() which will dispatch into the tag-protocol specific transmit function implemented by net/dsa/tag_.c When we do create the per-port slave network devices, we iterate over the switch protocol to assign the DSA-specific receive and transmit operations. A new fake ethertype value is used: ETH_P_XDSA to illustrate the fact that this is no longer going to look like ETH_P_DSA or ETH_P_TRAILER like it used to be. This allows us to greatly simplify the check in eth_type_trans() and always override the skb->protocol with ETH_P_XDSA for Ethernet switches tagged protocol, while also reducing the number of repetitive slave netdevice_ops assignments. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-27 22:59:39 -07:00
Julian Anastasov	eb90b0c734	ipvs: fix ipv6 hook registration for local replies commit `fc60476761` ("ipvs: changes for local real server") from 2.6.37 introduced DNAT support to local real server but the IPv6 LOCAL_OUT handler ip_vs_local_reply6() is registered incorrectly as IPv4 hook causing any outgoing IPv4 traffic to be dropped depending on the IP header values. Chris tracked down the problem to CONFIG_IP_VS_IPV6=y Bug report: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1349768 Reported-by: Chris J Arges <chris.j.arges@canonical.com> Tested-by: Chris J Arges <chris.j.arges@canonical.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2014-08-28 10:52:37 +09:00
Florian Westphal	253ff51635	tcp: syncookies: mark cookie_secret read_mostly only written once. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-27 16:30:49 -07:00
Andreea-Cristina Bernat	2688eba9d5	mac80211: Replace rcu_dereference() with rcu_access_pointer() The "rcu_dereference()" calls are used directly in conditions. Since their return values are never dereferenced it is recommended to use "rcu_access_pointer()" instead of "rcu_dereference()". Therefore, this patch makes the replacements. The following Coccinelle semantic patch was used: @@ @@ ( if( (<+... - rcu_dereference + rcu_access_pointer (...) ...+>)) {...} | while( (<+... - rcu_dereference + rcu_access_pointer (...) ...+>)) {...} ) Signed-off-by: Andreea-Cristina Bernat <bernat.ada@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-08-27 12:14:10 +02:00
Julian Anastasov	ea1d5d7755	ipvs: properly declare tunnel encapsulation The tunneling method should properly use tunnel encapsulation. Fixes problem with CHECKSUM_PARTIAL packets when TCP/UDP csum offload is supported. Thanks to Alex Gartrell for reporting the problem, providing solution and for all suggestions. Reported-by: Alex Gartrell <agartrell@fb.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Alex Gartrell <agartrell@fb.com> Signed-off-by: Simon Horman <horms@verge.net.au>	2014-08-27 14:31:56 +09:00
Alexey Perevalov	f111f780ae	netfilter: nfnetlink_acct: add filter support to nfacct counter list/reset You can use this to skip accounting objects when listing/resetting via NFNL_MSG_ACCT_GET/NFNL_MSG_ACCT_GET_CTRZERO messages with the NLM_F_DUMP netlink flag. The filtering covers the following cases: 1. No filter specified. In this case, the client will get old behaviour, 2. List/reset counter object only: In this case, you have to use NFACCT_F_QUOTA as mask and value 0. 3. List/reset quota objects only: You have to use NFACCT_F_QUOTA_PKTS as mask and value - the same, for byte based quota mask should be NFACCT_F_QUOTA_BYTES and value - the same. If you want to obtain the object with any quota type (i.e. NFACCT_F_QUOTA_PKTS|NFACCT_F_QUOTA_BYTES), you need to perform two dump requests, one to obtain NFACCT_F_QUOTA_PKTS objects and another for NFACCT_F_QUOTA_BYTES. Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-08-26 21:36:19 +02:00
Christoph Lameter	903ceff7ca	net: Replace get_cpu_var through this_cpu_ptr Replace uses of get_cpu_var for address calculation through this_cpu_ptr. Cc: netdev@vger.kernel.org Cc: Eric Dumazet <edumazet@google.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-08-26 13:45:47 -04:00
Andreea-Cristina Bernat	ad053a962f	mac80211: scan: Replace rcu_assign_pointer() with RCU_INIT_POINTER() The use of "rcu_assign_pointer()" is NULLing out the pointer. According to RCU_INIT_POINTER()'s block comment: "1. This use of RCU_INIT_POINTER() is NULLing out the pointer" it is better to use it instead of rcu_assign_pointer() because it has a smaller overhead. The following Coccinelle semantic patch was used: @@ @@ - rcu_assign_pointer + RCU_INIT_POINTER (..., NULL) Signed-off-by: Andreea-Cristina Bernat <bernat.ada@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-08-26 11:16:31 +02:00
Johannes Berg	5bc8c1f2b0	cfg80211: allow passing frame type to cfg80211_inform_bss() When using the cfg80211_inform_bss[_width]() functions drivers cannot currently indicate whether the data was received in a beacon or probe response. Fix that by passing a new enum that indicates such (or unknown). For good measure, use it in ath6kl. Acked-by: Kalle Valo <kvalo@qca.qualcomm.com> [ath6kl] Acked-by: Arend van Spriel <arend@broadcom.com> [brcmfmac] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-08-26 11:16:02 +02:00
Johannes Berg	0e227084ae	cfg80211: clarify BSS probe response vs. beacon data There are a few possible cases of where BSS data came from: 1) only a beacon has been received 2) only a probe response has been received 3) the driver didn't report what it received (this happens when using cfg80211_inform_bss[_width]()) 4) both probe response and beacon data have been received Unfortunately, in the userspace API, a few things weren't there: a) there was no way to differentiate cases 1) and 4) above without comparing the data of the IEs b) the TSF was always from the last frame, instead of being exposed for beacon/probe response separately like IEs Fix this by i) exporting a new flag attribute that indicates whether or not probe response data has been received - this addresses (a) ii) exporting a BEACON_TSF attribute that holds the beacon's TSF if a beacon has been received iii) not exporting the beacon attributes in case (3) above as that would just lead userspace into thinking the data actually came from a beacon when that isn't clear To implement this, track inside the IEs struct whether or not it (definitely) came from a beacon. Reported-by: William Seto Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-08-26 11:16:01 +02:00
Michal Kazior	f41ef64853	cfg80211: re-enable CSA for drivers that support it This reverts commit `dda444d524`. Channel switching code has been reworked and improved significantly since the time original locking issues were found. Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-08-26 11:16:01 +02:00
Ido Yariv	c70f59a2a0	mac80211: don't resize skbs needlessly Header-less cloned skbs with sufficient headroom need not be cloned unless the tailroom is going to be modified. Fix ieee80211_skb_resize so it would only resize cloned skbs if either the header isn't released or the tailroom is going to be modified. Some drivers might have assumed that skbs are never cloned, so add a HW flag that explicitly permits cloned TX skbs. Drivers which do not modify TX skbs should set this flag to avoid copying skbs. Signed-off-by: Ido Yariv <idox.yariv@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-08-26 11:16:00 +02:00
Ido Yariv	ca34e3b5c8	mac80211: Fix accounting of the tailroom-needed counter When hw acceleration is enabled, the GENERATE_IV or PUT_IV_SPACE flags will only require headroom space. Consequently, the tailroom-needed counter can safely be decremented. Signed-off-by: Ido Yariv <idox.yariv@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-08-26 11:15:59 +02:00
Vladimir Kondratiev	970fdfa89b	cfg80211: remove @gfp parameter from cfg80211_rx_mgmt() In cfg80211_rx_mgmt(), the @gfp parameter was used for memory allocation. However, the memory is allocated under spin_lock_bh(), which implies atomic context, so one can't use GFP_KERNEL, only variants without __GFP_WAIT. In practice, all callers pass GFP_ATOMIC (wil6210 uses GFP_KERNEL by mistake), and it has to be this way or the memory allocation code triggers a warning. Remove the @gfp parameter, since no actual choice exists, and use a hard-coded GFP_ATOMIC for the allocation. Signed-off-by: Vladimir Kondratiev <qca_vkondrat@qca.qualcomm.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-08-26 11:15:58 +02:00
Johannes Berg	649b2a4da5	mac80211: make ieee80211_vif_use_reserved_switch static Reorder some code to make ieee80211_vif_use_reserved_switch() static, no other changes. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-08-26 11:15:35 +02:00
Bob Copeland	f8134fed83	mac80211: mesh_plink: use get_unaligned_le16 instead of memcpy Use get_unaligned_le16 to access llid/plid. Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-08-26 11:15:34 +02:00
Johannes Berg	14b058bbce	mac80211: fix agg_status debugfs file alignment The "RX active" string is too long, so the columns get shifted. Change it to just "RX" to avoid this. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-08-26 11:13:37 +02:00
Denton Gentry	c7dcb45fac	mac80211: fix start_seq_num in Rx reorder offload sta->last_seq_ctrl is the seq_ctrl field from the last header seen, need to shift it 4 bits to extract the sequence number. Otherwise the ieee80211_sn_less() check at the top of ieee80211_sta_manage_reorder_buf drops frames until the sequence number catches up. Cc: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Denton Gentry <denton.gentry@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-08-26 11:13:32 +02:00
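The shift described in c7dcb45fac can be illustrated with a small sketch. This is not the mac80211 code: the Python names below are hypothetical, and only the bit layout (low 4 bits fragment number, upper 12 bits sequence number) and the modulo-4096 "less than" comparison are taken from the 802.11 sequence control format that the commit message refers to.

```python
# Illustrative sketch only; names are hypothetical, not the kernel's.
SCTL_SEQ_MASK = 0xFFF0   # upper 12 bits of seq_ctrl: sequence number
SN_MODULO = 4096         # sequence numbers wrap at 2**12


def seq_ctrl_to_sn(seq_ctrl: int) -> int:
    """Extract the 12-bit sequence number by masking and shifting 4 bits."""
    return (seq_ctrl & SCTL_SEQ_MASK) >> 4


def sn_less(sn1: int, sn2: int) -> bool:
    """Wrap-around comparison: True if sn1 is 'before' sn2 modulo 4096."""
    return ((sn1 - sn2) & (SN_MODULO - 1)) > (SN_MODULO >> 1)


last_seq_ctrl = 0x0123                 # sequence number 0x012, fragment 3
start = seq_ctrl_to_sn(last_seq_ctrl)  # 18
next_frame_sn = start + 1              # 19

# With the shifted value the next frame is accepted ...
accepted_with_shift = not sn_less(next_frame_sn, start)
# ... but with the raw seq_ctrl value it looks "old" and would be dropped.
dropped_without_shift = sn_less(next_frame_sn, last_seq_ctrl)
```

Using the raw field as the starting sequence number makes it roughly sixteen times too large, which is why frames are rejected until the real sequence number catches up.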
Bob Copeland	6c6fa49649	mac80211: mesh_plink: handle confirm frames with new plid The 802.11 standard says when processing a plink confirm frame: "If the peerLinkID in the mesh peering instance has not been set, the Local Link ID field of the Mesh Peering Confirm request shall be copied into the peerLinkID in the mesh peering instance." We were only doing this when receiving an open peering frame, but it could happen that the open frame gets lost and so we should handle this case rather than rejecting the confirm and failing the whole peering process. Reported-by: Yu Niiro <yu.niiro@gmail.com> Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-08-26 11:12:55 +02:00
Felix Fietkau	3918edb0e6	mac80211: fix smps mode check for AP_VLAN In ieee80211_sta_ps_deliver_wakeup, sdata->smps_mode is checked. This is initialized only for the base AP interface, not the individual VLANs. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-08-26 11:12:44 +02:00
Felix Fietkau	0e67c13667	mac80211: ignore AP_VLAN in ieee80211_recalc_chanctx_chantype When bringing down the AP, a WARN_ON is hit because the bss config chandef is empty here. Since AP_VLAN channel settings do not matter for anything chanctx related (always inherits the settings from the AP interface), let's just ignore it here. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-08-26 11:12:37 +02:00
Johannes Berg	bb512ad073	Revert "mac80211: disable uAPSD if all ACs are under ACM" This reverts commit `24aa11ab8a`. That commit was wrong since it uses data that hasn't even been set up yet, but might be a hold-over from a previous connection. Additionally, it seems like a driver-specific workaround that shouldn't have been in mac80211 to start with. Cc: stable@vger.kernel.org Fixes: `24aa11ab8a` ("mac80211: disable uAPSD if all ACs are under ACM") Reviewed-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-08-26 09:45:35 +02:00
Masanari Iida	9b13494c91	treewide: Fix typo in printk This patch fixes spelling typos in printk within various parts of the code. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2014-08-26 09:35:54 +02:00
Michal Kubeček	db115037bb	net: fix checksum features handling in netif_skb_features() This is follow-up to `da08143b85` ("vlan: more careful checksum features handling") which introduced more careful feature intersection in vlan code, taking into account that HW_CSUM should be considered superset of IP_CSUM/IPV6_CSUM. The same is needed in netif_skb_features() in order to avoid offloading mismatch warning when vlan is created on top of a bond consisting of slaves supporting IP/IPv6 checksumming but not vlan Tx offloading. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-25 17:23:03 -07:00
WANG Cong	453a940ea7	net: make skb an optional parameter for __skb_flow_dissect() Fixes: commit `690e36e726` (net: Allow raw buffers to be passed into the flow dissector) Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-25 17:21:26 -07:00
WANG Cong	6451b3f59a	net: fix comments for __skb_flow_get_ports() Fixes: commit `690e36e726` (net: Allow raw buffers to be passed into the flow dissector) Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-25 17:21:26 -07:00
Alexander Y. Fomichev	4c75431ac3	net: prevent of emerging cross-namespace symlinks Code manipulating sysfs symlinks on adjacent net_device(s) currently doesn't take into account that devices potentially belong to different namespaces. This patch tries to fix the issue as follows: - Check for net_ns before creating / deleting a symlink. For now only netdev_adjacent_rename_links and __netdev_adjacent_dev_remove are affected; afaics __netdev_adjacent_dev_insert implies both net_devs belong to the same namespace. - Drop all existing symlinks to / from all adj_devs before switching namespace and recreate them just after. Signed-off-by: Alexander Y. Fomichev <git.user@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-25 15:17:43 -07:00
Tomasz Bursztyka	a796dac9a6	wireless: core: Reorder wiphy_register() notifications relevantly Currently it can send a regulatory domain change notification before any NEW_WIPHY notification. Moreover, if rfkill_register() fails, calling wiphy_unregister() will send a DEL_WIPHY though no NEW_WIPHY had been sent previously. Thus reorder so that it properly notifies NEW_WIPHY before any other. Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2014-08-25 16:17:41 -04:00
John W. Linville	07bc788424	Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next	2014-08-25 15:58:02 -04:00
Mika Westerberg	fb70118c0e	net: rfkill: gpio: Add more Broadcom bluetooth ACPI IDs This adds one more ACPI ID of a Broadcom bluetooth chip. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2014-08-25 15:39:23 -04:00
John W. Linville	0fdcaa5948	Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth	2014-08-25 15:35:20 -04:00
Zhouyi Zhou	d1c85c2ebe	netfilter: HAVE_JUMP_LABEL instead of CONFIG_JUMP_LABEL Use HAVE_JUMP_LABEL as elsewhere in the kernel to ensure that the toolchain has the required support in addition to CONFIG_JUMP_LABEL being set. Signed-off-by: Zhouyi Zhou <yizhouzhou@ict.ac.cn> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-08-25 10:45:28 +02:00
David S. Miller	4798248e4e	net: Add ops->ndo_xmit_flush() Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-24 23:02:45 -07:00
Ian Morris	4c83acbc56	ipv6: White-space cleansing : gaps between function and symbol export This patch makes no changes to the logic of the code but simply addresses coding style issues as detected by checkpatch. Both objdump and diff -w show no differences. This patch removes some blank lines between the end of a function definition and the EXPORT_SYMBOL_GPL macro in order to prevent checkpatch warning that EXPORT_SYMBOL must immediately follow a function. Signed-off-by: Ian Morris <ipm@chirality.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-24 22:37:52 -07:00
Ian Morris	cc24becae3	ipv6: White-space cleansing : Structure layouts This patch makes no changes to the logic of the code but simply addresses coding style issues as detected by checkpatch. Both objdump and diff -w show no differences. This patch addresses structure definitions, specifically it cleanses the brace placement and replaces spaces with tabs in a few places. Signed-off-by: Ian Morris <ipm@chirality.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-24 22:37:52 -07:00
Ian Morris	67ba4152e8	ipv6: White-space cleansing : Line Layouts This patch makes no changes to the logic of the code but simply addresses coding style issues as detected by checkpatch. Both objdump and diff -w show no differences. A number of items are addressed in this patch: * Multiple spaces converted to tabs * Spaces before tabs removed. * Spaces in pointer typing cleansed `(char *)foo` etc. Remove space after sizeof * Ensure spacing around comparators such as if statements. Signed-off-by: Ian Morris <ipm@chirality.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-24 22:37:52 -07:00
Tom Herbert	48a5fc7731	gre: When GRE csum is present count as encap layer wrt csum In GRE demux, if the GRE checksum is present, pop rcv encapsulation so that any encapsulated checksums are treated as tunnel checksums. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-24 18:09:24 -07:00
Tom Herbert	57c67ff4bd	udp: additional GRO support Implement GRO for UDPv6. Add UDP checksum verification in gro_receive for both UDP4 and UDP6 calling skb_gro_checksum_validate_zero_check. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-24 18:09:24 -07:00
Tom Herbert	149d0774a7	tcp: Call skb_gro_checksum_validate In tcp[64]_gro_receive call skb_gro_checksum_validate to validate TCP checksum in the gro context. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-24 18:09:24 -07:00
Tom Herbert	758f75d1ff	gre: call skb_gro_checksum_simple_validate Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-24 18:09:23 -07:00
Tom Herbert	573e8fca25	net: skb_gro_checksum_* functions Add skb_gro_checksum_validate, skb_gro_checksum_validate_zero_check, and skb_gro_checksum_simple_validate, and __skb_gro_checksum_complete. These are the cognates of the normal checksum functions but are used in the gro_receive path and operate on GRO related fields in sk_buffs. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-24 18:09:23 -07:00
Jozsef Kadlecsik	1b05756c48	netfilter: ipset: Fix warn: integer overflows 'sizeof(map) + size set->dsize' Dan Carpenter reported that the static checker emits the warning net/netfilter/ipset/ip_set_list_set.c:600 init_list_set() warn: integer overflows 'sizeof(map) + size set->dsize' Limit the maximal number of elements in list type of sets. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2014-08-24 19:33:10 +02:00
Mark Rustad	94729f8a1e	netfilter: ipset: Resolve missing-field-initializer warnings Resolve missing-field-initializer warnings by providing a directed initializer. Signed-off-by: Mark Rustad <mark.d.rustad@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2014-08-24 19:32:34 +02:00
Sergey Popovich	6e41ee684e	netfilter: ipset: netnet,netportnet: Fix value range support for IPv4 Ranges of values are broken with hash:net,net and hash:net,port,net. hash:net,net ============ # ipset create test-nn hash:net,net # ipset add test-nn 10.0.10.1-10.0.10.127,10.0.0.0/8 # ipset list test-nn Name: test-nn Type: hash:net,net Revision: 0 Header: family inet hashsize 1024 maxelem 65536 Size in memory: 16960 References: 0 Members: 10.0.10.1,10.0.0.0/8 # ipset test test-nn 10.0.10.65,10.0.0.1 10.0.10.65,10.0.0.1 is NOT in set test-nn. # ipset test test-nn 10.0.10.1,10.0.0.1 10.0.10.1,10.0.0.1 is in set test-nn. hash:net,port,net ================= # ipset create test-npn hash:net,port,net # ipset add test-npn 10.0.10.1-10.0.10.127,tcp:80,10.0.0.0/8 # ipset list test-npn Name: test-npn Type: hash:net,port,net Revision: 0 Header: family inet hashsize 1024 maxelem 65536 Size in memory: 17344 References: 0 Members: 10.0.10.8/29,tcp:80,10.0.0.0 10.0.10.16/28,tcp:80,10.0.0.0 10.0.10.2/31,tcp:80,10.0.0.0 10.0.10.64/26,tcp:80,10.0.0.0 10.0.10.32/27,tcp:80,10.0.0.0 10.0.10.4/30,tcp:80,10.0.0.0 10.0.10.1,tcp:80,10.0.0.0 # ipset list test-npn # ipset test test-npn 10.0.10.126,tcp:80,10.0.0.2 10.0.10.126,tcp:80,10.0.0.2 is NOT in set test-npn. # ipset test test-npn 10.0.10.126,tcp:80,10.0.0.0 10.0.10.126,tcp:80,10.0.0.0 is in set test-npn. # ipset create test-npn hash:net,port,net # ipset add test-npn 10.0.10.0/24,tcp:80-81,10.0.0.0/8 # ipset list test-npn Name: test-npn Type: hash:net,port,net Revision: 0 Header: family inet hashsize 1024 maxelem 65536 Size in memory: 17024 References: 0 Members: 10.0.10.0,tcp:80,10.0.0.0 10.0.10.0,tcp:81,10.0.0.0 # ipset test test-npn 10.0.10.126,tcp:80,10.0.0.0 10.0.10.126,tcp:80,10.0.0.0 is NOT in set test-npn. # ipset test test-npn 10.0.10.0,tcp:80,10.0.0.0 10.0.10.0,tcp:80,10.0.0.0 is in set test-npn. Correctly set up the from..to variables where no IPSET_ATTR_IP_TO{,2} attribute is given, so that in the range processing loop we construct a proper cidr value. Check whether we have no ranges and can short cut in hash:net,net properly. Use unlikely() where appropriate, to comply with other modules. Signed-off-by: Sergey Popovich <popovich_sergei@mail.ru> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2014-08-24 19:32:05 +02:00
Vytas Dauksa	ecc245c2bd	netfilter: ipset: Removed invalid IPSET_ATTR_MARKMASK validation Markmask is a u32, hence it can't be greater than 4294967295 (i.e. 0xffffffff). This was causing a smatch warning: net/netfilter/ipset/ip_set_hash_gen.h:1084 hash_ipmark_create() warn: impossible condition '(markmask > 4294967295) => (0-u32max > u32max)' Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2014-08-24 19:31:34 +02:00
Ana Rey	afc5be3079	netfilter: nft_meta: Add cpu attribute support Add cpu support to the meta expression. This allows you to match packets by cpu number. Signed-off-by: Ana Rey <anarey@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-08-24 14:08:46 +02:00
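The range handling that 6e41ee684e repairs depends on splitting an address range into CIDR blocks. As a rough illustration, using Python's standard `ipaddress` module rather than ipset's own code, the range from the commit message decomposes as follows; the helper names are hypothetical.

```python
# Illustrative sketch only: shows the range-to-CIDR split with the Python
# standard library, not the ipset kernel implementation.
import ipaddress


def range_to_cidrs(first: str, last: str) -> list[str]:
    """Split an inclusive IPv4 range into the minimal list of CIDR blocks."""
    blocks = ipaddress.summarize_address_range(
        ipaddress.IPv4Address(first), ipaddress.IPv4Address(last)
    )
    return [str(block) for block in blocks]


def covered(addr: str, cidrs: list[str]) -> bool:
    """True if addr falls inside any of the given CIDR blocks."""
    ip = ipaddress.IPv4Address(addr)
    return any(ip in ipaddress.IPv4Network(cidr) for cidr in cidrs)


cidrs = range_to_cidrs("10.0.10.1", "10.0.10.127")
# ['10.0.10.1/32', '10.0.10.2/31', '10.0.10.4/30', '10.0.10.8/29',
#  '10.0.10.16/28', '10.0.10.32/27', '10.0.10.64/26']
```

These are the same seven blocks that appear (in hash order) in the `hash:net,port,net` listing of the commit message. In the broken `hash:net,net` case only `10.0.10.1` was stored, so `10.0.10.65` did not match even though it lies inside the requested range.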
Ana Rey	e2a093ff0d	netfilter: nft_meta: add pkttype support Add pkttype support for ip, ipv6 and inet families of tables. This allows you to fetch the meta packet type based on the link layer information. The loopback traffic is a special case, the packet type is guessed from the network layer header. No special handling for bridge and arp since we're not going to see such traffic in the loopback interface. Joint work with Alvaro Neira Ayuso <alvaroneay@gmail.com> Signed-off-by: Alvaro Neira Ayuso <alvaroneay@gmail.com> Signed-off-by: Ana Rey <anarey@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-08-24 14:06:39 +02:00
Daniel Borkmann	8fc54f6891	net: use reciprocal_scale() helper Replace open codings of (((u64) <x> * <y>) >> 32) with reciprocal_scale(). Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 12:21:21 -07:00
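The open-coded expression replaced in 8fc54f6891, `(((u64) <x> * <y>) >> 32)`, maps a 32-bit value onto the interval `[0, y)` without a division. A minimal Python sketch of the same arithmetic follows; the function mirrors the helper's name for readability but is only an illustration, and the range checks are an addition that C gets for free from its `u32` types.

```python
# Illustrative sketch of the multiply-and-shift scaling; not the kernel helper.
def reciprocal_scale(val: int, ep_ro: int) -> int:
    """Scale a 32-bit value into [0, ep_ro) using a 64-bit multiply and shift."""
    if not (0 <= val < 2**32 and 0 <= ep_ro < 2**32):
        raise ValueError("both arguments must fit in an unsigned 32-bit integer")
    return (val * ep_ro) >> 32


# A 32-bit hash spread over 10 buckets:
lowest = reciprocal_scale(0x00000000, 10)    # 0
middle = reciprocal_scale(0x80000000, 10)    # 5
highest = reciprocal_scale(0xFFFFFFFF, 10)   # 9
```

Compared with `val % ep_ro`, this avoids a division and uses the high-order bits of the input, which is why it suits values that are already uniformly distributed, such as hashes.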
David S. Miller	690e36e726	net: Allow raw buffers to be passed into the flow dissector. Drivers, and perhaps other entities we have not yet considered, sometimes want to know how deep the protocol headers go before deciding how large of an SKB to allocate and how much of the packet to place into the linear SKB area. For example, consider a driver which has a device which DMAs into pools of pages and then tells the driver where the data went in the DMA descriptor(s). The driver can then build an SKB and reference most of the data via SKB fragments (which are page/offset/length triplets). However at least some of the front of the packet should be placed into the linear SKB area, which comes before the fragments, so that packet processing can get at the headers efficiently. The first thing each protocol layer is going to do is a "pskb_may_pull()" so we might as well aggregate as much of this as possible while we're building the SKB in the driver. Part of supporting this is that we don't have an SKB yet, so we want to be able to let the flow dissector operate on a raw buffer in order to compute the offset of the end of the headers. So now we have a __skb_flow_dissect() which takes an explicit data pointer and length. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 12:13:41 -07:00
Jon Paul Maloy	301bae56f2	tipc: merge struct tipc_port into struct tipc_sock We complete the merging of the port and socket layer by aggregating the fields of struct tipc_port directly into struct tipc_sock, and moving the combined structure into socket.c. We also move all functions and macros that are not any longer exposed to the rest of the stack into socket.c, and rename them accordingly. Despite the size of this commit, there are no functional changes. We have only made such changes that are necessary due to the removal of struct tipc_port. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:35 -07:00
Jon Paul Maloy	808d90f9c5	tipc: remove files ref.h and ref.c The reference table is now 'socket aware' instead of being generic, and has in reality become a socket internal table. In order to be able to minimize the API exposed by the socket layer towards the rest of the stack, we now move the reference table definitions and functions into the file socket.c, and rename the functions accordingly. There are no functional changes in this commit. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:35 -07:00
Jon Paul Maloy	2e84c60b77	tipc: remove include file port.h We move the inline functions in the file port.h to socket.c, and modify their names accordingly. We move struct tipc_port and some macros to socket.h. Finally, we remove the file port.h. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:35 -07:00
Jon Paul Maloy	0fc87aaebd	tipc: remove source file port.c In this commit, we move the remaining functions in port.c to socket.c, and give them new names that correspond to their new location. We then remove the file port.c. There are only cosmetic changes to the moved functions. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:35 -07:00
Jon Paul Maloy	6c9808ce09	tipc: remove port_lock In previous commits we have reduced usage of port_lock to a minimum, and complemented it with usage of bh_lock_sock() at the remaining locations. The purpose has been to remove this lock altogether, since it largely duplicates the role of bh_lock_sock. We are now ready to do this. However, we still need to protect the BH callers from inadvertent release of the socket while they hold a reference to it. We do this by replacing port_lock by a combination of a rw-lock protecting the reference table as such, and updating the socket reference counter while the socket is referenced from BH. This technique is more standard and comprehensible than the previous approach, and turns out to have a positive effect on overall performance. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:34 -07:00
Jon Paul Maloy	9b50fd087a	tipc: replace port pointer with socket pointer in registry In order to make tipc_sock the only entity referencable from other parts of the stack, we add a tipc_sock pointer instead of a tipc_port pointer to the registry. As a consequence, we also let the function tipc_port_lock() return a pointer to a tipc_sock instead of a tipc_port. We keep the function's name for now, since the lock still is owned by the port. This is another step in the direction of eliminating port_lock, replacing its usage with lock_sock() and bh_lock_sock(). Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:34 -07:00
Jon Paul Maloy	5a9ee0be33	tipc: use registry when scanning sockets The functions tipc_port_get_ports() and tipc_port_reinit() scan over all sockets/ports to access each of them. This is done by using a dedicated linked list, 'tipc_socks' where all sockets are members. The list is in turn protected by a spinlock, 'port_list_lock', while each socket is locked by using port_lock at the moment of access. In order to reduce complexity and risk of deadlock, we want to get rid of the linked list and the accompanying spinlock. This is what we do in this commit. Instead of the linked list, we use the port registry to scan across the sockets. We also add usage of bh_lock_sock() inside the scope of port_lock in both functions, as a preparation for the complete removal of port_lock. Finally, we move the functions from port.c to socket.c, and rename them to tipc_sk_sock_show() and tipc_sk_reinit() respectively. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:34 -07:00
Jon Paul Maloy	5b8fa7ce82	tipc: eliminate functions tipc_port_init and tipc_port_destroy After the latest changes to the socket/port layer the existence of the functions tipc_port_init() and tipc_port_destroy() cannot be justified. They are both called only once, from tipc_sk_create() and tipc_sk_delete() respectively, and their functionality can better be merged into the latter two functions. This also entails that all remaining references to port_lock now are made from inside socket.c, something that will make it easier to remove this lock. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:34 -07:00
Jon Paul Maloy	739f5e4efc	tipc: redefine message acknowledge function The function tipc_acknowledge() is a remnant from the obsolete native API. Currently, it grabs port_lock, before building an acknowledge message and sending it to the peer. Since all access to socket members now is protected by the socket lock, it has become unnecessary to grab port_lock here. In this commit, we remove the usage of port_lock, simplify the function, and move it to socket.c, renaming it to tipc_sk_send_ack(). Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:34 -07:00
Jon Paul Maloy	dadebc0029	tipc: eliminate port_connect()/port_disconnect() functions tipc_port_connect()/tipc_port_disconnect() are remnants of the obsolete native API. Their only task is to grab port_lock and call the functions __tipc_port_connect()/__tipc_port_disconnect() respectively, which will perform the actual state change. Since socket/port execution now is single-threaded the use of port_lock is not needed any more, so we can safely replace the two functions with their lock-free counterparts. In this commit, we remove the two functions. Furthermore, the contents of __tipc_port_disconnect() are so trivial that we choose to eliminate that function too, expanding its functionality into tipc_shutdown(). __tipc_port_connect() is simplified, moved to socket.c, and given the more correct name tipc_sk_finish_conn(). Finally, we eliminate the function auto_connect(), and expand its contents into filter_connect(). Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:34 -07:00
Jon Paul Maloy	80e44c2225	tipc: eliminate function tipc_port_shutdown() tipc_port_shutdown() is a remnant from the now obsolete native interface. As such it grabs port_lock in order to protect itself from concurrent BH processing. However, after the recent changes to the port/socket upcalls, sockets are now basically single-threaded, and all execution, except the read-only tipc_sk_timer(), is executing within the protection of lock_sock(). So the use of port_lock is not needed here. In this commit we eliminate the whole function, and merge it into its only caller, tipc_shutdown(). Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:34 -07:00
Jon Paul Maloy	5728901581	tipc: clean up socket timer function The last remaining BH upcall to the socket, apart from the message reception function tipc_sk_rcv(), is the timer function. We prefer to let this function continue executing in BH, since it only does read access to semi-permanent data, but we make three changes to it: 1) We introduce a bh_lock_sock()/bh_unlock_sock() inside the scope of port_lock. This is a preparation for replacing port_lock with bh_lock_sock() at the locations where it is still used. 2) We move the function from port.c to socket.c, as a further step of eliminating the port code level altogether. 3) We let it make use of the newly introduced tipc_msg_create() function. This enables us to get rid of three context specific functions (port_create_self_abort_msg() etc.) in port.c Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:33 -07:00
Jon Paul Maloy	02be61a981	tipc: use message to abort connections when losing contact to node In the current implementation, each 'struct tipc_node' instance keeps a linked list of those ports/sockets that are connected to the node represented by that struct. The purpose of this is to let the node object know which sockets to alert when it loses contact with its peer node, i.e., which sockets need to have their connections aborted. This entails an unwanted direct reference from the node structure back to the port/socket structure, and a need to grab port_lock when we have to make an upcall to the port. We want to get rid of this unnecessary BH entry point into the socket, and also eliminate its use of port_lock. In this commit, we instead let the node struct keep a list of "connected socket" structs, which each represents a connected socket, but is allocated independently by the node at the moment of connection. If the node loses contact with its peer node, the list is traversed, and a "connection abort" message is created for each entry in the list. The message is sent to its respective connected socket using the ordinary data path, and the receiving socket aborts its connections upon reception of the message. This enables us to get rid of the direct reference from 'struct node' to 'struct port', and another unwanted BH access point to the latter. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:33 -07:00
Jon Paul Maloy	50100a5e39	tipc: use pseudo message to wake up sockets after link congestion The current link implementation keeps a linked list of blocked ports/ sockets that is populated when there is link congestion. The purpose of this is to let the link know which users to wake up when the congestion abates. This adds unnecessary complexity to the data structure and the code, since it forces us to involve the link each time we want to delete a socket. It also forces us to grab the spinlock port_lock within the scope of node_lock. We want to get rid of this direct dependence, as well as the deadlock hazard resulting from the usage of port_lock. In this commit, we instead let the link keep a list of "wakeup" pseudo messages for use in such situations. Those messages are sent to the pending sockets via the ordinary message reception path, and wake up the socket's owner when they are received. This enables us to get rid of the 'waiting_ports' linked lists in struct tipc_port that manifest this direct reference. As a consequence, we can eliminate another BH entry into the socket, and hence the need to grab port_lock. This is a further step in our effort to remove port_lock altogether. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:33 -07:00
Jon Paul Maloy	1dd0bd2b14	tipc: introduce new function tipc_msg_create() The function tipc_msg_init() has turned out to be of limited value in many cases. It takes too few parameters to be usable for creating a complete message, it makes too many assumptions about what the message should be used for, and it does not allocate any buffer to be returned to the caller. Therefore, we now introduce the new function tipc_msg_create(), which takes all the parameters needed to create a full message, and returns a buffer of the requested size. The new function will be very useful for the changes we will be doing in later commits in this series. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:33 -07:00
David S. Miller	f9474ddfaa	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pulling to get some TIPC fixes that a net-next series depends upon. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:12:08 -07:00
Yuchung Cheng	989e04c5bc	tcp: improve undo on timeout Upon timeout, undo (via both timestamps/Eifel and DSACKs) was disabled if any retransmits were still in flight. The concern was perhaps that spurious retransmission sent in a previous recovery episode may trigger DSACKs to falsely undo the current recovery. However, this inadvertently misses undo opportunities (using either TCP timestamps or DSACKs) when timeout occurs during a loss episode, i.e. recurring timeouts or timeout during fast recovery. In these cases some retransmissions will be in flight but we should allow undo. Furthermore, we should only reset undo_marker and undo_retrans upon timeout if we are starting a new recovery episode. Finally, when we do reset our undo state, we now do so in a manner similar to tcp_enter_recovery(), so that we require a DSACK for each of the outstanding retransmissions. This will achieve the original goal by requiring that we receive the same number of DSACKs as retransmissions. This patch increases the undo events by 50% on Google servers. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-22 21:28:02 -07:00
Eric Dumazet	884cf705c7	net: remove dead code after sk_data_ready change As a followup to commit `676d23690f` ("net: Fix use after free by removing length arg from sk_data_ready callbacks"), we can remove some useless code in sock_queue_rcv_skb() and rxrpc_queue_rcv_skb() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-22 21:08:50 -07:00
Eric Dumazet	d2de875c6d	net: use ktime_get_ns() and ktime_get_real_ns() helpers ktime_get_ns() replaces ktime_to_ns(ktime_get()) ktime_get_real_ns() replaces ktime_to_ns(ktime_get_real()) Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-22 19:57:23 -07:00
Michal Kazior	47e4df94d1	mac80211: fix channel switch for chanctx-based drivers The new_ctx pointer is set only for non-chanctx drivers. This yielded a crash for chanctx-based drivers during channel switch finalization: BUG: unable to handle kernel NULL pointer dereference at 0000000000000020 IP: ieee80211_vif_use_reserved_switch+0x71c/0xb00 [mac80211] Use an adequate chanctx pointer to fix this. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-08-22 14:45:49 -07:00
Himangi Saraogi	c0b802367b	af_decnet: Use time_after_eq The functions time_before, time_before_eq, time_after, and time_after_eq are more robust for comparing jiffies against other values. A simplified version of the Coccinelle semantic patch making this change is as follows: @change@ expression E1,E2,E3; @@ - jiffies - E1 >= (E2*E3) + time_after_eq(jiffies, E1+E2*E3) Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Acked-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-22 12:23:11 -07:00
Himangi Saraogi	8b1b1eb521	decnet: Use time_after_eq The functions time_before, time_before_eq, time_after, and time_after_eq are more robust for comparing jiffies against other values. A simplified version of the Coccinelle semantic patch making this change is as follows: @change@ expression E1,E2; @@ - (jiffies - E1) >= E2 + time_after_eq(jiffies, E1+E2) Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Acked-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-22 12:23:11 -07:00
Himangi Saraogi	c72c95a064	ipconfig: Use time_before The functions time_before, time_before_eq, time_after, and time_after_eq are more robust for comparing jiffies against other values. A simplified version of the Coccinelle semantic patch making this change is as follows: @change@ expression E1,E2; @@ - jiffies - E1 < E2 + time_before(jiffies, E1+E2) Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Acked-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-22 12:23:11 -07:00
Himangi Saraogi	b5c5c36d36	dn_dev: Use time_before The functions time_before, time_before_eq, time_after, and time_after_eq are more robust for comparing jiffies against other values. A simplified version of the Coccinelle semantic patch making this change is as follows: @change@ expression E1,E2; @@ ( - (jiffies - E1) < E2 + time_before(jiffies, E1+E2) ) Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Acked-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-22 12:23:11 -07:00
Andreea-Cristina Bernat	0932997e34	br_multicast: Replace rcu_assign_pointer() with RCU_INIT_POINTER() The use of "rcu_assign_pointer()" is NULLing out the pointer. According to RCU_INIT_POINTER()'s block comment: "1. This use of RCU_INIT_POINTER() is NULLing out the pointer" it is better to use it instead of rcu_assign_pointer() because it has a smaller overhead. The following Coccinelle semantic patch was used: @@ @@ - rcu_assign_pointer + RCU_INIT_POINTER (..., NULL) Signed-off-by: Andreea-Cristina Bernat <bernat.ada@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-22 12:23:11 -07:00
Andreea-Cristina Bernat	8c6b00c816	net/openvswitch/flow.c: Replace rcu_dereference() with rcu_access_pointer() The "rcu_dereference()" call is used directly in a condition. Since its return value is never dereferenced it is recommended to use "rcu_access_pointer()" instead of "rcu_dereference()". Therefore, this patch makes the replacement. The following Coccinelle semantic patch was used: @@ @@ ( if( (<+... - rcu_dereference + rcu_access_pointer (...) ...+>)) {...} \| while( (<+... - rcu_dereference + rcu_access_pointer (...) ...+>)) {...} ) Signed-off-by: Andreea-Cristina Bernat <bernat.ada@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-22 12:23:10 -07:00
Andreea-Cristina Bernat	e6b688838e	net/ipv4/igmp.c: Replace rcu_dereference() with rcu_access_pointer() The "rcu_dereference()" call is used directly in a condition. Since its return value is never dereferenced it is recommended to use "rcu_access_pointer()" instead of "rcu_dereference()". Therefore, this patch makes the replacement. The following Coccinelle semantic patch was used: @@ @@ ( if( (<+... - rcu_dereference + rcu_access_pointer (...) ...+>)) {...} \| while( (<+... - rcu_dereference + rcu_access_pointer (...) ...+>)) {...} ) Signed-off-by: Andreea-Cristina Bernat <bernat.ada@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-22 12:23:10 -07:00
Sébastien Barré	1dced6a854	ipv4: Restore accept_local behaviour in fib_validate_source() Commit `7a9bc9b81a` ("ipv4: Elide fib_validate_source() completely when possible.") introduced a short-circuit to avoid calling fib_validate_source when not needed. That change took rp_filter into account, but not accept_local. This resulted in a change of behaviour: with rp_filter and accept_local off, incoming packets with a local address in the source field should be dropped. Here is how to reproduce the change pre/post `7a9bc9b81a` commit: -configure the same IPv4 address on hosts A and B. -try to send an ARP request from B to A. -The ARP request will be dropped before that commit, but accepted and answered after that commit. This adds a check for ACCEPT_LOCAL, to maintain full fib validation in case it is 0. We also leave __fib_validate_source() earlier when possible, based on the same check as fib_validate_source(), once the accept_local stuff is verified. Cc: Gregory Detal <gregory.detal@uclouvain.be> Cc: Christoph Paasch <christoph.paasch@uclouvain.be> Cc: Hannes Frederic Sowa <hannes@redhat.com> Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: Sébastien Barré <sebastien.barre@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-22 12:23:10 -07:00
Daniel Borkmann	aa4a83ee8b	net: sctp: fix suboptimal edge-case on non-active active/retrans path selection In SCTP, selection of active (T.ACT) and retransmission (T.RET) transports is being done whenever transport control operations (UP, DOWN, PF, ...) are engaged through sctp_assoc_control_transport(). Commits `4c47af4d5e` ("net: sctp: rework multihoming retransmission path selection to rfc4960") and `a7288c4dd5` ("net: sctp: improve sctp_select_active_and_retran_path selection") have both improved it towards a more fine-grained and optimal path selection. Currently, the selection algorithm for T.ACT and T.RET is as follows: 1) Elect the two most recently used ACTIVE transports T1, T2 for T.ACT, T.RET, where T.ACT<-T1 and T1 is most recently used 2) In case primary path T.PRI not in {T1, T2} but ACTIVE, set T.ACT<-T.PRI and T.RET<-T1 3) If only T1 is ACTIVE from the set, set T.ACT<-T1 and T.RET<-T1 4) If none is ACTIVE, set T.ACT<-best(T.PRI, T.RET, T3) where T3 is the most recently used (if avail) in PF, set T.RET<-T.PRI Prior to above commits, 4) was simply a camp on T.ACT<-T.PRI and T.RET<-T.PRI, ignoring possible paths in PF. Camping on T.PRI is still slightly suboptimal as it can lead to the following scenario: Setup: <A> <B> T1: p1p1 (10.0.10.10) <==> .'`) <==> p1p1 (10.0.10.12) <= T.PRI T2: p1p2 (10.0.10.20) <==> (_ . ) <==> p1p2 (10.0.10.22) net.sctp.rto_min = 1000 net.sctp.path_max_retrans = 2 net.sctp.pf_retrans = 0 net.sctp.hb_interval = 1000 T.PRI is permanently down, T2 is put briefly into PF state (e.g. due to link flapping). Here, the first time transmission is sent over PF path T2 as it's the only non-INACTIVE path, but the retransmitted data-chunks are sent over the INACTIVE path T1 (T.PRI), which is not good. 
After the patch, it's choosing better transports in both cases by modifying step 4): 4) If none is ACTIVE, set T.ACT_new<-best(T.ACT_old, T3) where T3 is the most recently used (if avail) in PF, set T.RET<-T.ACT_new This will still select a best possible path in PF if available (which can also include T.PRI/T.RET), and set both T.ACT/T.RET to it. In case sctp_assoc_control_transport() just put T.ACT_old into INACTIVE as it transitioned from ACTIVE->PF->INACTIVE and stays in INACTIVE just for a very short while before going back ACTIVE, it will guarantee that this path will be reselected for T.ACT/T.RET since T3 (PF) is not available. Previously, this was not possible, as we would only select between T.PRI and T.RET, and a possible T3 would be NULL due to the fact that we have just transitioned T3 in sctp_assoc_control_transport() from PF->INACTIVE and would select a suboptimal path when T.PRI/T.RET have worse properties. In the case that T.ACT_old permanently went to INACTIVE during this transition and there's no PF path available, plus T.PRI and T.RET are INACTIVE as well, we would now camp on T.ACT_old, but if everything is being INACTIVE there's really not much we can do except hoping for a successful HB to bring one of the transports back up again and, thus cause a new selection through sctp_assoc_control_transport(). Now both tests work fine: Case 1: 1. T1 S(ACTIVE) T.ACT T2 S(ACTIVE) T.RET 2. T1 S(ACTIVE) T.ACT, T.RET T2 S(PF) 3. T1 S(ACTIVE) T.ACT, T.RET T2 S(INACTIVE) 5. T1 S(PF) T.ACT, T.RET T2 S(INACTIVE) [ 5.1 T1 S(INACTIVE) T.ACT, T.RET T2 S(INACTIVE) ] 6. T1 S(ACTIVE) T.ACT, T.RET T2 S(INACTIVE) 7. T1 S(ACTIVE) T.ACT T2 S(ACTIVE) T.RET Case 2: 1. T1 S(ACTIVE) T.ACT T2 S(ACTIVE) T.RET 2. T1 S(PF) T2 S(ACTIVE) T.ACT, T.RET 3. T1 S(INACTIVE) T2 S(ACTIVE) T.ACT, T.RET 5. T1 S(INACTIVE) T2 S(PF) T.ACT, T.RET [ 5.1 T1 S(INACTIVE) T2 S(INACTIVE) T.ACT, T.RET ] 6. T1 S(INACTIVE) T2 S(ACTIVE) T.ACT, T.RET 7. 
T1 S(ACTIVE) T.ACT T2 S(ACTIVE) T.RET Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-22 11:31:30 -07:00
Daniel Borkmann	ea4f19c1f8	net: sctp: spare unnecessary comparison in sctp_trans_elect_best When both transports are the same, we don't have to go down that road only to realize that we will return the very same transport. We are guaranteed that curr is always non-NULL. Therefore, just short-circuit this special case. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-22 11:31:30 -07:00
Jiri Benc	2ba5af42a7	openvswitch: fix panic with multiple vlan headers When there are multiple vlan headers present in a received frame, the first one is put into vlan_tci and protocol is set to ETH_P_8021Q. Anything in the skb beyond the VLAN TPID may be still non-linear, including the inner TCI and ethertype. While ovs_flow_extract takes care of IP and IPv6 headers, it does nothing with ETH_P_8021Q. Later, if OVS_ACTION_ATTR_POP_VLAN is executed, __pop_vlan_tci pulls the next vlan header into vlan_tci. This leads to two things: 1. Part of the resulting ethernet header is in the non-linear part of the skb. When eth_type_trans is called later as the result of OVS_ACTION_ATTR_OUTPUT, kernel BUGs in __skb_pull. Also, __pop_vlan_tci is in fact accessing random data when it reads past the TPID. 2. network_header points into the ethernet header instead of behind it. mac_len is set to a wrong value (10), too. Reported-by: Yulong Pei <ypei@redhat.com> Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-22 11:24:04 -07:00
Benjamin Block	793c3b4000	net: ipv6: fib: don't sleep inside atomic lock The function fib6_commit_metrics() allocates a piece of memory in mode GFP_KERNEL while holding an atomic lock from higher up in the stack, in the function __ip6_ins_rt(). This produces the following BUG: > BUG: sleeping function called from invalid context at mm/slub.c:1250 > in_atomic(): 1, irqs_disabled(): 0, pid: 2909, name: dhcpcd > 2 locks held by dhcpcd/2909: > #0: (rtnl_mutex){+.+.+.}, at: [<ffffffff81978e67>] rtnl_lock+0x17/0x20 > #1: (&tb->tb6_lock){++--+.}, at: [<ffffffff81a6951a>] ip6_route_add+0x65a/0x800 > CPU: 1 PID: 2909 Comm: dhcpcd Not tainted 3.17.0-rc1 #1 > Hardware name: ASUS All Series/Q87T, BIOS 0216 10/16/2013 > 0000000000000008 ffff8800c8f13858 ffffffff81af135a 0000000000000000 > ffff880212202430 ffff8800c8f13878 ffffffff810f8d3a ffff880212202c98 > 0000000000000010 ffff8800c8f138c8 ffffffff8121ad0e 0000000000000001 > Call Trace: > [<ffffffff81af135a>] dump_stack+0x4e/0x68 > [<ffffffff810f8d3a>] __might_sleep+0x10a/0x120 > [<ffffffff8121ad0e>] kmem_cache_alloc_trace+0x4e/0x190 > [<ffffffff81a6bcd6>] ? fib6_commit_metrics+0x66/0x110 > [<ffffffff81a6bcd6>] fib6_commit_metrics+0x66/0x110 > [<ffffffff81a6cbf3>] fib6_add+0x883/0xa80 > [<ffffffff81a6951a>] ? ip6_route_add+0x65a/0x800 > [<ffffffff81a69535>] ip6_route_add+0x675/0x800 > [<ffffffff81a68f2a>] ? ip6_route_add+0x6a/0x800 > [<ffffffff81a6990c>] inet6_rtm_newroute+0x5c/0x80 > [<ffffffff8197cf01>] rtnetlink_rcv_msg+0x211/0x260 > [<ffffffff81978e67>] ? rtnl_lock+0x17/0x20 > [<ffffffff81119708>] ? lock_release_holdtime+0x28/0x180 > [<ffffffff81978e67>] ? rtnl_lock+0x17/0x20 > [<ffffffff8197ccf0>] ? __rtnl_unlock+0x20/0x20 > [<ffffffff819a989e>] netlink_rcv_skb+0x6e/0xd0 > [<ffffffff81978ee5>] rtnetlink_rcv+0x25/0x40 > [<ffffffff819a8e59>] netlink_unicast+0xd9/0x180 > [<ffffffff819a9600>] netlink_sendmsg+0x700/0x770 > [<ffffffff81103735>] ? 
local_clock+0x25/0x30 > [<ffffffff8194e83c>] sock_sendmsg+0x6c/0x90 > [<ffffffff811f98e3>] ? might_fault+0xa3/0xb0 > [<ffffffff8195ca6d>] ? verify_iovec+0x7d/0xf0 > [<ffffffff8194ec3e>] ___sys_sendmsg+0x37e/0x3b0 > [<ffffffff8111ef15>] ? trace_hardirqs_on_caller+0x185/0x220 > [<ffffffff81af979e>] ? mutex_unlock+0xe/0x10 > [<ffffffff819a55ec>] ? netlink_insert+0xbc/0xe0 > [<ffffffff819a65e5>] ? netlink_autobind.isra.30+0x125/0x150 > [<ffffffff819a6520>] ? netlink_autobind.isra.30+0x60/0x150 > [<ffffffff819a84f9>] ? netlink_bind+0x159/0x230 > [<ffffffff811f989a>] ? might_fault+0x5a/0xb0 > [<ffffffff8194f25e>] ? SYSC_bind+0x7e/0xd0 > [<ffffffff8194f8cd>] __sys_sendmsg+0x4d/0x80 > [<ffffffff8194f912>] SyS_sendmsg+0x12/0x20 > [<ffffffff81afc692>] system_call_fastpath+0x16/0x1b Fixing this by replacing the mode GFP_KERNEL with GFP_ATOMIC. Signed-off-by: Benjamin Block <bebl@mageta.org> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-22 10:54:49 -07:00
zhuyj	061079ac0b	sctp: not send SCTP_PEER_ADDR_CHANGE notifications with failed probe Since the transport has always been in state SCTP_UNCONFIRMED, it wasn't active before and hasn't been used before, so it is unnecessary to bug the user with a notification. Reported-by: Deepak Khandelwal <khandelwal.deepak.1987@gmail.com> Suggested-by: Vlad Yasevich <vyasevich@gmail.com> Suggested-by: Michael Tuexen <tuexen@fh-muenster.de> Suggested-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Zhu Yanjun <Yanjun.Zhu@windriver.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-21 21:33:17 -07:00
Eric Dumazet	dc808110bb	packet: handle too big packets for PACKET_V3 af_packet can currently overwrite kernel memory by out of bound accesses, because it assumed a [new] block can always hold one frame. This is not generally the case, even if most existing tools do it right. This patch clamps too long frames as the API permits, and issues a one time error on syslog. [ 394.357639] tpacket_rcv: packet too big, clamped from 5042 to 3966. macoff=82 In this example, packet header tp_snaplen was set to 3966, and tp_len was set to 5042 (skb->len) Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: `f6fb8f100b` ("af-packet: TPACKET_V3 flexible buffer implementation.") Acked-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-21 16:44:28 -07:00
chas williams - CONTRACTOR	6df378d2d1	lec: Use rtnl lock/unlock when updating MTU The LECS response contains the MTU that should be used. Correctly synchronize with other layers when updating. Signed-off-by: Chas Williams - CONTRACTOR <chas@cmf.nrl.navy.mil> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-21 16:31:23 -07:00
Johan Hedberg	f161dd4122	Bluetooth: Fix hci_conn reference counting for auto-connections Recently the LE passive scanning and auto-connections feature was introduced. It uses the hci_connect_le() API which returns a hci_conn along with a reference count to that object. All previous users would tie this returned reference to some existing object, such as an L2CAP channel, and there'd be no leaked references this way. For auto-connections however the reference was returned but not stored anywhere, leaving established connections with one higher reference count than they should have. Instead of playing special tricks with hci_conn_hold/drop this patch associates the returned reference from hci_connect_le() with the object that in practice does own this reference, i.e. the hci_conn_params struct that caused us to initiate a connection in the first place. Once the connection is established or fails to establish this reference is removed appropriately. One extra thing needed is to call hci_pend_le_actions_clear() before calling hci_conn_hash_flush() so that the reference is cleared before the hci_conn objects are fully removed. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-08-20 21:57:39 +03:00
Pablo Neira Ayuso	1e8430f30b	netfilter: nf_tables: nat expression must select CONFIG_NF_NAT This enables the netfilter NAT engine in the first place, otherwise you cannot ever select the nf_tables nat expression if iptables is not selected. Reported-by: Matteo Croce <technoboy85@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-08-19 21:42:45 +02:00
Daniel Borkmann	caa8ad94ed	netfilter: x_tables: allow to use default cgroup match There's actually no good reason why we cannot use cgroup id 0, so let's just remove this artificial barrier. Reported-by: Alexey Perevalov <a.perevalov@samsung.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Tested-by: Alexey Perevalov <a.perevalov@samsung.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-08-19 21:38:55 +02:00
David S. Miller	02784f1b05	tipc: Fix build. Missing semicolon in range check fix. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-19 11:16:38 -07:00
Vasily Averin	7201c1ddf7	cbq: now_rt removal Now q->now_rt is identical to q->now and is not required anymore. Signed-off-by: Vasily Averin <vvs@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-19 10:58:44 -07:00
Vasily Averin	73d0f37ac4	cbq: incorrectly low bandwidth setting blocks limited traffic Mainstream commit `f0f6ee1f70` ("cbq: incorrect processing of high limits") has a side effect: if the cbq bandwidth setting is lower than the real interface throughput, non-limited traffic can delay limited traffic for a very long time. This happens because q->now changes incorrectly in cbq_dequeue(): in the described scenario L2T is much greater than the real time delay, and q->now gets an extra boost for each transmitted packet. The accumulated boost prevents q->now from being updated, and a blocked class can wait a very long time until (q->now >= cl->undertime) becomes true again. To fix the problem, the patch updates q->now on each cbq_update() call. The L2T-related pre-modification of q->now was moved to cbq_update(). My testing confirmed that it fixes the problem and did not discover any side effects. Fixes: `f0f6ee1f70` ("cbq: incorrect processing of high limits") Signed-off-by: Vasily Averin <vvs@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-19 10:58:44 -07:00
Martin Townsend	6697dabe27	ieee802154: 6lowpan: ensure MTU of 1280 for 6lowpan This patch drops the userspace accessible sysfs entry for the maximum datagram size of a 6LoWPAN fragment packet. A fragment should not have a datagram size value greater than 1280 bytes. Instead of making this value configurable, we only accept fragment packets whose datagram size does not exceed 1280 bytes. Signed-off-by: Martin Townsend <martin.townsend@xsilon.com> Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-08-19 19:17:42 +02:00
Alexander Aring	685d632804	ieee802154: 6lowpan: ensure of sending 1280 packets This patch changes the 1281 MTU to 1280. Other stacks have only a 1280 byte array for uncompressed 6LoWPAN packets; this avoids an overflow in those stacks. Sending 1281 byte uncompressed 6LoWPAN packets also isn't RFC compliant. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-08-19 19:17:41 +02:00
Martin Townsend	6e361d6ffe	ieee802154: mac802154: handle the reserved dest mode by dropping the packet If a received frame contains the reserved destination address mode, the frame should be dropped and the skb freed. Signed-off-by: Martin Townsend <martin.townsend@xsilon.com> Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-08-19 19:17:41 +02:00
Alexander Aring	c4cb901ac6	ieee802154: 6lowpan_rtnl: fix correct errno value This patch corrects the return value of lowpan_alloc_frag if an error occurs. Errno numbers should always be negative. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-08-19 19:17:41 +02:00
Martin Townsend	7629d1eaf3	mac802154: fixed potential skb leak with mac802154_parse_frame_start This patch fixes a memory leak that occurred when a received frame could not be parsed. Signed-off-by: Martin Townsend <martin.townsend@xsilon.com> Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-08-19 19:17:41 +02:00
Pablo Neira Ayuso	8993cf8edf	netfilter: move NAT Kconfig switches out of the iptables scope Currently, the NAT configs depend on iptables and ip6tables. However, users should be capable of enabling NAT for nft without having to switch on iptables. Fix this by adding new specific IP_NF_NAT and IP6_NF_NAT config switches for iptables and ip6tables NAT support. I have also moved the original NF_NAT_IPV4 and NF_NAT_IPV6 configs out of the scope of iptables to make them independent of it. This patch also adds NETFILTER_XT_NAT which selects the xt_nat combo that provides snat/dnat for iptables. We cannot use NF_NAT anymore since nf_tables can select this. Reported-by: Matteo Croce <technoboy85@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-08-18 21:55:54 +02:00
Trond Myklebust	f8d1ff47b6	SUNRPC: Optimise away svc_recv_available We really do not want to do ioctls in the server's fast path. Instead, let's use the fact that we managed to read a full record as the indicator that we should try to read the socket again. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-17 12:00:11 -04:00
Trond Myklebust	0c0746d03e	SUNRPC: More optimisations of svc_xprt_enqueue() Just move the transport locking out of the spin lock protected area altogether. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-17 12:00:11 -04:00
Trond Myklebust	a4aa8054a6	SUNRPC: Fix broken kthread_should_stop test in svc_get_next_xprt We should definitely not be exiting svc_get_next_xprt() with the thread enqueued. Fix this by ensuring that we fall through to the dequeue. Also move the test itself outside the spin lock protected section. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-17 12:00:11 -04:00
Trond Myklebust	983c684466	SUNRPC: get rid of the request wait queue We're always _only_ waking up tasks from within the sp_threads list, so we know that they are enqueued and alive. The rq_wait waitqueue is just a distraction with extra atomic semantics. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-17 12:00:11 -04:00
Trond Myklebust	106f359cf4	SUNRPC: Do not grab pool->sp_lock unnecessarily in svc_get_next_xprt Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-17 12:00:10 -04:00
Trond Myklebust	9e5b208dc9	SUNRPC: Do not override wspace tests in svc_handle_xprt We already determined that there was enough wspace when we called svc_xprt_enqueue. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-17 12:00:10 -04:00
Erik Hugne	ac32c7f705	tipc: fix message importance range check Commit `3b4f302d85` ("tipc: eliminate redundant locking") introduced a bug by removing the sanity check for message importance, allowing programs to assign any value to the msg_user field. This will mess up the packet reception logic and may cause random link resets. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-16 20:17:34 -07:00
Sven Eckelmann	e050dbeb0d	batman-adv: Fix parameter order of hlist_add_behind `1d023284c3` ("list: fix order of arguments for hlist_add_after(_rcu)") was incorrectly rebased on top of `d9124268d8` ("batman-adv: Fix out-of-order fragmentation support"). The parameter order change of the rebased patch was not re-applied as expected. This causes a memory leak and can cause crashes when out-of-order packets are received. hlist_add_behind will try to access the uninitialized list pointers of frag_entry_new to find the previous/next entry and may modify/read random memory locations. Signed-off-by: Sven Eckelmann <sven@narfation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-16 19:19:08 -07:00
Eliad Peller	53b954ee4a	mac80211: disable 40MHz support in case of 20MHz AP If the AP only advertises support for 20MHz (in the ht operation ie), disable 40MHz and VHT. This can improve interoperability with APs that don't like stations exceeding their own advertised capabilities. Signed-off-by: Eliad Peller <eliadx.peller@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-08-15 14:38:08 +02:00
Johannes Berg	a74a8c846f	mac80211: don't duplicate station QoS capability data We currently track the QoS capability twice: for all peer stations in the WLAN_STA_WME flag, and for any clients associated to an AP interface separately for drivers in the sta->sta.wme field. Remove the WLAN_STA_WME flag and track the capability only in the driver-visible field, getting rid of the limitation that the field is only valid in AP mode. Reviewed-by: Arik Nemtsov <arik@wizery.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-08-15 14:38:08 +02:00
Thomas Graf	9ce12eb16f	netlink: Annotate RCU locking for seq_file walker Silences the following sparse warnings: net/netlink/af_netlink.c:2926:21: warning: context imbalance in 'netlink_seq_start' - wrong count at exit net/netlink/af_netlink.c:2972:13: warning: context imbalance in 'netlink_seq_stop' - unexpected unlock Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-14 15:13:40 -07:00
Neal Cardwell	0c9ab09223	tcp: fix ssthresh and undo for consecutive short FRTO episodes Fix TCP FRTO logic so that it always notices when snd_una advances, indicating that any RTO after that point will be a new and distinct loss episode. Previously there was a very specific sequence that could cause FRTO to fail to notice a new loss episode had started: (1) RTO timer fires, enter FRTO and retransmit packet 1 in write queue (2) receiver ACKs packet 1 (3) FRTO sends 2 more packets (4) RTO timer fires again (should start a new loss episode) The problem was in step (3) above, where tcp_process_loss() returned early (in the spot marked "Step 2.b"), so that it never got to the logic to clear icsk_retransmits. Thus icsk_retransmits stayed non-zero. Thus in step (4) tcp_enter_loss() would see the non-zero icsk_retransmits, decide that this RTO is not a new episode, and decide not to cut ssthresh and remember the current cwnd and ssthresh for undo. There were two main consequences to the bug that we have observed. First, ssthresh was not decreased in step (4). Second, when there was a series of such FRTO (1-4) sequences that happened to be followed by an FRTO undo, we would restore the cwnd and ssthresh from before the entire series started (instead of the cwnd and ssthresh from before the most recent RTO). This could result in cwnd and ssthresh being restored to values much bigger than the proper values. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Fixes: `e33099f96d` ("tcp: implement RFC5682 F-RTO") Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-14 14:38:55 -07:00
Hannes Frederic Sowa	a26552afe8	tcp: don't allow syn packets without timestamps to pass tcp_tw_recycle logic tcp_tw_recycle heavily relies on tcp timestamps to build a per-host ordering of incoming connections and teardowns without the need to hold state on a specific quadruple for TCP_TIMEWAIT_LEN, but only for the last measured RTO. To do so, we keep the last seen timestamp in a per-host indexed data structure and verify if the incoming timestamp in a connection request is strictly greater than the saved one during last connection teardown. Thus we can verify later on that no old data packets will be accepted by the new connection. During moving a socket to time-wait state we already verify if timestamps were seen on a connection. Only if that was the case we let the time-wait socket expire after the RTO, otherwise normal TCP_TIMEWAIT_LEN will be used. But we don't verify this on incoming SYN packets. If a connection teardown was less than TCP_PAWS_MSL seconds in the past we cannot guarantee to not accept data packets from an old connection if no timestamps are present. We should drop this SYN packet. This patch closes this loophole. Please note, this patch does not make tcp_tw_recycle in any way more usable but only adds another safety check: Sporadic drops of SYN packets because of reordering in the network or in the socket backlog queues can happen. Users behind NAT trying to connect to a tcp_tw_recycle enabled server can get caught in blackholes and their connection requests may regularly get dropped because hosts behind an address translator don't have synchronized tcp timestamp clocks. tcp_tw_recycle cannot work if peers don't have tcp timestamps enabled. In general, use of tcp_tw_recycle is disadvised. Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Florian Westphal <fw@strlen.de> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-14 14:38:54 -07:00
Neal Cardwell	4fab907195	tcp: fix tcp_release_cb() to dispatch via address family for mtu_reduced() Make sure we use the correct address-family-specific function for handling MTU reductions from within tcp_release_cb(). Previously AF_INET6 sockets were incorrectly always using the IPv6 code path when sometimes they were handling IPv4 traffic and thus had an IPv4 dst. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Diagnosed-by: Willem de Bruijn <willemb@google.com> Fixes: `563d34d057` ("tcp: dont drop MTU reduction indications") Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-14 14:38:54 -07:00
Shmulik Ladkani	bc8fc7b8f8	sit: Fix ipip6_tunnel_lookup device matching criteria As of `4fddbf5d78` ("sit: strictly restrict incoming traffic to tunnel link device"), when looking up a tunnel, tunnel's underlying interface (t->parms.link) is verified to match incoming traffic's ingress device. However the comparison was incorrectly based on skb->dev->iflink. Instead, dev->ifindex should be used, which correctly represents the interface from which the IP stack hands the ipip6 packets. This allows setting up sit tunnels bound to vlan interfaces (otherwise incoming ipip6 traffic on the vlan interface was dropped due to ipip6_tunnel_lookup match failure). Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-14 14:38:54 -07:00
Andrey Vagin	9d186cac7f	tcp: don't use timestamp from repaired skb-s to calculate RTT (v2) We don't know the right timestamp for repaired skb-s. Wrong RTT estimations aren't good, because some congestion modules heavily depend on them. This patch adds the TCPCB_REPAIRED flag, which is included in TCPCB_RETRANS. Thanks to Eric for the advice how to fix this issue. This patch fixes the warning: [ 879.562947] WARNING: CPU: 0 PID: 2825 at net/ipv4/tcp_input.c:3078 tcp_ack+0x11f5/0x1380() [ 879.567253] CPU: 0 PID: 2825 Comm: socket-tcpbuf-l Not tainted 3.16.0-next-20140811 #1 [ 879.567829] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 879.568177] 0000000000000000 00000000c532680c ffff880039643d00 ffffffff817aa2d2 [ 879.568776] 0000000000000000 ffff880039643d38 ffffffff8109afbd ffff880039d6ba80 [ 879.569386] ffff88003a449800 000000002983d6bd 0000000000000000 000000002983d6bc [ 879.569982] Call Trace: [ 879.570264] [<ffffffff817aa2d2>] dump_stack+0x4d/0x66 [ 879.570599] [<ffffffff8109afbd>] warn_slowpath_common+0x7d/0xa0 [ 879.570935] [<ffffffff8109b0ea>] warn_slowpath_null+0x1a/0x20 [ 879.571292] [<ffffffff816d0a05>] tcp_ack+0x11f5/0x1380 [ 879.571614] [<ffffffff816d10bd>] tcp_rcv_established+0x1ed/0x710 [ 879.571958] [<ffffffff816dc9da>] tcp_v4_do_rcv+0x10a/0x370 [ 879.572315] [<ffffffff81657459>] release_sock+0x89/0x1d0 [ 879.572642] [<ffffffff816c81a0>] do_tcp_setsockopt.isra.36+0x120/0x860 [ 879.573000] [<ffffffff8110a52e>] ? rcu_read_lock_held+0x6e/0x80 [ 879.573352] [<ffffffff816c8912>] tcp_setsockopt+0x32/0x40 [ 879.573678] [<ffffffff81654ac4>] sock_common_setsockopt+0x14/0x20 [ 879.574031] [<ffffffff816537b0>] SyS_setsockopt+0x80/0xf0 [ 879.574393] [<ffffffff817b40a9>] system_call_fastpath+0x16/0x1b [ 879.574730] ---[ end trace a17cbc38eb8c5c00 ]--- v2: moving setting of skb->when for repaired skb-s in tcp_write_xmit, where it's set for other skb-s. Fixes: `431a91242d` ("tcp: timestamp SYN+DATA messages") Fixes: `740b0f1841` ("tcp: switch rtt estimations to usec resolution") Cc: Eric Dumazet <edumazet@google.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-14 14:38:54 -07:00
Lukasz Rymanowski	13cac15296	Bluetooth: Fix ERTM L2CAP resend packet An I-Frame that is going to be resent already has the FCS field added and set (if it was required). Adding an additional FCS field calculated from data + old FCS in the resend function is incorrect. This patch fixes that. The issue was found during PTS testing. Signed-off-by: Lukasz Rymanowski <lukasz.rymanowski@tieto.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-08-14 09:47:02 +02:00
Lukasz Rymanowski	069cb27017	Bluetooth: Improve data packing in SAR mode There is no need to decrease pdu size with L2CAP SDU length in Start L2CAP SDU frame. Start packet is just 2 bytes longer as specified and we can keep payload as long as possible. When testing SAR L2CAP against PTS, the L2CAP channel is usually configured so that SDU = MPS * 3. PTS then expects 3 I-Frames from IUT: Start, Continuation and End frame. Without this fix, we sent 4 I-Frames. We could pass a test by using -b option in l2test and send just two bytes less than SDU length. With this patch there is no need to use the -b option. Signed-off-by: Lukasz Rymanowski <lukasz.rymanowski@tieto.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-08-14 08:49:26 +02:00
Varka Bhadram	f55889128a	mac802154: common tx error path This patch introduces a common error path on Tx failure by inserting the label 'err_tx'. Signed-off-by: Varka Bhadram <varkab@cdac.in> Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-08-14 08:49:25 +02:00
Alexander Aring	0ba1f94e72	ieee802154: 6lowpan: remove unused function Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-08-14 08:49:25 +02:00
Varka Bhadram	b288a4963f	mac802154: common error path Introduce the label 'fail' to create a common error path for mac802154_llsec_decrypt() and the packet type default case. Signed-off-by: Varka Bhadram <varkab@cdac.in> Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-08-14 08:49:25 +02:00
Varka Bhadram	24bbd44a96	mac802154: cleanup in rx path This patch replaces sizeof(struct rx_work) with sizeof(*work) and passes the skb directly to mac802154_subif_rx(). Signed-off-by: Varka Bhadram <varkab@cdac.in> Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-08-14 08:49:24 +02:00
Johan Hedberg	6f48e260a9	Bluetooth: Make smp_chan_destroy() private to smp.c There are no external users of smp_chan_destroy() so make it private to smp.c. The patch also moves the function higher up in the c-file in order to avoid forward declarations. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-08-14 08:49:24 +02:00
Johan Hedberg	86d1407cb9	Bluetooth: Always call smp_distribute_keys() from a workqueue The smp_distribute_keys() function calls smp_notify_keys() which in turn calls l2cap_conn_update_id_addr(). The l2cap_conn_update_id_addr() function will iterate through all L2CAP channels for the respective connection: lock the channel, update the address information and unlock the channel. Since SMP is now using l2cap_chan callbacks each callback is called with the channel lock held. Therefore, calling l2cap_conn_update_id_addr() would cause a deadlock calling l2cap_chan_lock() on the SMP channel. This patch moves calling smp_distribute_keys() through a workqueue so that it is never called from an L2CAP channel callback. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-08-14 08:49:24 +02:00
Johan Hedberg	109ec2309e	Bluetooth: Move canceling security_timer into smp_chan_destroy() All places needing to cancel the security timer also call smp_chan_destroy() in the same go. To eliminate the need to do these two calls in multiple places simply move the timer cancellation into smp_chan_destroy(). Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-08-14 08:49:24 +02:00
Johan Hedberg	276d807317	Bluetooth: Remove unused l2cap_conn->security_timer Now that there are no-longer any users for l2cap_conn->security_timer we can go ahead and simply remove it. The patch makes initialization of the conn->info_timer unconditional since it's better not to leave any l2cap_conn data structures uninitialized no matter what the underlying transport. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-08-14 08:49:24 +02:00
Johan Hedberg	b68fda6848	Bluetooth: Add SMP-internal timeout callback This patch adds an SMP-internal timeout callback to remove the dependency on (the soon to be removed) l2cap_conn->security_timer. The behavior is the same as with l2cap_conn->security_timer except that the new l2cap_conn_shutdown() public function is used instead of the L2CAP core internal l2cap_conn_del(). Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-08-14 08:49:23 +02:00
Johan Hedberg	8ae9b9845b	Bluetooth: Fix double free of SMP data skb In the case that the SMP recv callback returns error the calling code in l2cap_core.c expects that it still owns the skb and will try to free it. The SMP code should therefore not try to free the skb if it returns an error. This patch fixes such behavior in the SMP command handler function. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-08-14 08:49:23 +02:00
Johan Hedberg	4befb867b9	Bluetooth: Call l2cap_conn_shutdown() when SMP recv callback fails To restore pre-l2cap_chan functionality we should be trying to disconnect the connection when receiving garbage SMP data (i.e. when the SMP command handler fails). This patch renames the command handler back to smp_sig_channel() and adds a smp_recv_cb() wrapper function for calling it. If smp_sig_channel() fails the code calls l2cap_conn_shutdown(). Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-08-14 08:49:22 +02:00
Johan Hedberg	dec5b49235	Bluetooth: Add public l2cap_conn_shutdown() API to request disconnection Since we no-longer do special handling of SMP within l2cap_core.c we don't have any code for calling l2cap_conn_del() when smp.c doesn't like the data it gets. At the same time we cannot simply export l2cap_conn_del() since it will try to lock the channels it calls into whereas we already hold the lock in the smp.c l2cap_chan callbacks (i.e. it'd lead to a deadlock). This patch adds a new l2cap_conn_shutdown() API which is very similar to l2cap_conn_del() except that it defers the call to l2cap_conn_del() through a workqueue, thereby making it safe to use it from an L2CAP channel callback. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-08-14 08:49:21 +02:00
Johan Hedberg	44f1a7ab51	Bluetooth: Use L2CAP resume callback to call smp_distribute_keys There's no need to export the smp_distribute_keys() function since the resume callback is called in the same scenario. This patch makes the smp_notify_keys function private (at the same time moving it higher up in smp.c to avoid forward declarations) and adds a resume callback for SMP to call it from there instead. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-08-14 08:49:21 +02:00
Johan Hedberg	5d88cc73dd	Bluetooth: Convert SMP to use l2cap_chan infrastructure Now that we have all the necessary pieces in place we can fully convert SMP to use the L2CAP channel infrastructure. This patch adds the necessary callbacks and removes the now unneeded conn->smp_chan pointer. One notable behavioral change in this patch comes from the following code snippet: - case L2CAP_CID_SMP: - if (smp_sig_channel(conn, skb)) - l2cap_conn_del(conn->hcon, EACCES); This piece of code was essentially forcing a disconnection if garbage SMP data was received. The l2cap_conn_del() function is private to l2cap_conn.c so we don't have access to it anymore when using the L2CAP channel callbacks. Therefore, the behavior of the new code is simply to return errors in the recv() callback (which is simply the old smp_sig_channel()), but no disconnection will occur. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-08-14 08:49:19 +02:00
Johan Hedberg	defce9e836	Bluetooth: Make AES crypto context private to SMP Now that we have per-adapter SMP data thanks to the root SMP L2CAP channel we can take advantage of it and attach the AES crypto context (only used for SMP) to it. This means that the smp_irk_matches() and smp_generate_rpa() function can be converted to internally handle the AES context. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-08-14 08:49:19 +02:00
Johan Hedberg	70db83c4bc	Bluetooth: Add SMP L2CAP channel skeleton This patch creates the initial SMP L2CAP channels and a skeleton for their callbacks. There is one per-adapter channel created upon adapter registration, and then one channel per-connection created through the new_connection callback. The channels are registered with the reserved CID 0x1f for now in order to not conflict with existing SMP functionality. Once everything is in place the value can be changed to what it should be, i.e. L2CAP_CID_SMP. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-08-14 08:49:18 +02:00
Johan Hedberg	711eafe345	Bluetooth: Move SMP (de)initialization to smp.c As preparation for moving SMP to use l2cap_chan infrastructure we need to move the (de)initialization functions to smp.c (where they'll eventually need access to the local L2CAP channel callbacks). Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-08-14 08:49:18 +02:00
Johan Hedberg	5450691805	Bluetooth: Move SMP initialization after HCI init First of all, it's wasteful to initialize SMP if it's never going to be used (e.g. on non-LE controllers). Second of all, when we move to use l2cap_chan we need to know the real local address, meaning we must have completed at least part of the HCI init. This patch moves the SMP initialization to after the HCI init procedure and makes it depend on whether the controller actually supports LE. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-08-14 08:49:17 +02:00
Johan Hedberg	222916e3e5	Bluetooth: Refactor SMP (de)initialization into separate functions As preparation for converting SMP to use the l2cap_chan infrastructure refactor the (de)initialization into separate functions. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-08-14 08:49:17 +02:00
Johan Hedberg	893ededeb1	Bluetooth: Fix IRK lookup when tfm_aes is not available If the AES crypto has not been initialized properly we should cleanly return from the hci_find_irk_by_rpa() function. Right now this will not happen in practice, but once (in subsequent patches) SMP init is moved to after the HCI init procedure it is possible that the pointer is NULL. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-08-14 08:49:16 +02:00
Johan Hedberg	fabed38fcf	Bluetooth: Fix hci_update_random_address() error return for no crypto If the AES crypto context is not available we cannot generate new RPAs. We should therefore cleanly return an error from the function responsible for updating the random address. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-08-14 08:49:16 +02:00
Johan Hedberg	d336860559	Bluetooth: Fix using HCI_CONN_LE_SMP_PEND to check for SMP context The code is consistently using the HCI_CONN_LE_SMP_PEND flag check for the existence of the SMP context, with the exception of this one place in smp_sig_channel(). This patch converts the place to use the flag just like all other instances. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-08-14 08:49:16 +02:00
Johan Hedberg	79a0572736	Bluetooth: Call l2cap_le_conn_ready after notifying channels For most cases it makes no difference whether l2cap_le_conn_ready() is called before or after calling the channel ready() callbacks, however for upcoming SMP code we need this as the ready() callback initializes certain structures that a call to smp_conn_security() from l2cap_le_conn_ready() depends on. Therefore, move the call to l2cap_le_conn_ready() after iterating through and notifying channels. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-08-14 08:49:15 +02:00
Johan Hedberg	72847ce021	Bluetooth: Call L2CAP teardown callback before clearing chan->conn L2CAP channel implementations may want to still access the chan->conn pointer. This will particularly be the case for SMP that will want to clear a reference to the SMP channel in the l2cap_conn structure. The only user of the teardown callback so far is l2cap_sock.c and for the code there it makes no difference whether the callback is called before or after clearing the chan->conn pointer. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-08-14 08:49:15 +02:00
Johan Hedberg	148243087b	Bluetooth: Move parts of fixed channel initialization to l2cap_add_scid The l2cap_add_scid function is used for registering a fixed L2CAP channel. Instead of having separate initialization of the channel type and outgoing MTU in l2cap_sock.c it's more intuitive to do these things in the l2cap_add_scid function itself (and thereby make the functionality available to other users besides l2cap_sock.c). Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-08-14 08:49:15 +02:00
Johan Hedberg	06171e0546	Bluetooth: Remove special ATT data channel handling Now that we've got the fixed channel infrastructure cleaned up in a generic way there's no longer a need to have a dedicated function for handling data on the ATT channel. Instead the generic l2cap_data_channel() handler will be able to do the exact same thing. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-08-14 08:49:14 +02:00
Johan Hedberg	54a1b626c9	Bluetooth: Improve fixed channel lookup based on link type When notifying global fixed channels of new connections it doesn't make sense to consider channels meant for a different link type than the one available. This patch adds an extra parameter to the l2cap_global_fixed_chan() lookup function and ensures that only channels matching the current hci_conn type are looked up. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-08-14 08:49:14 +02:00
Johan Hedberg	e760ec1213	Bluetooth: Move L2CAP fixed channel creation into l2cap_conn_cfm In order to remove special handling of fixed L2CAP channels we need to start creating them in a single place instead of having per-channel exceptions. The most natural place is the l2cap_conn_cfm() function which is called whenever there is a new baseband link. The only really special case so far has been the ATT socket, so in order not to break the code in between this patch removes the ATT special handling at the same time as it adds the generic fixed channel handling from l2cap_le_conn_ready() into the hci_conn_cfm() function. As a related change the channel locking in l2cap_conn_ready() becomes simpler and we can thereby move the smp_conn_security() call into the l2cap_le_conn_ready() function. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-08-14 08:49:14 +02:00
Johan Hedberg	dc0f508818	Bluetooth: Refactor l2cap_connect_cfm This patch is a simple refactoring of l2cap_connect_cfm to allow easier extension of the function. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-08-14 08:49:13 +02:00
Johan Hedberg	191eb398c6	Bluetooth: Remove special handling of ATT in l2cap_security_cfm() With the update to sk->resume() and __l2cap_no_conn_pending() we no-longer need to have special handling of ATT channels in the l2cap_security_cfm() function. The chan->sec_level update when encryption has been enabled is safe to do for any kind of channel, and the loop takes later care of calling chan->ready() or chan->resume() if necessary. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-08-14 08:49:13 +02:00
Johan Hedberg	d52deb1748	Bluetooth: Resume BT_CONNECTED state after LE security elevation The LE ATT socket uses a special trick where it temporarily sets BT_CONFIG state for the duration of a security level elevation. In order to not require special hacks for going back to BT_CONNECTED state in the l2cap_core.c code the most reasonable place to resume the state is the resume callback. This patch adds a new flag to track the pending security level change and ensures that the state is set back to BT_CONNECTED in the resume callback in case the flag is set. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-08-14 08:49:12 +02:00
Johan Hedberg	5ff6f34d42	Bluetooth: Fix __l2cap_no_conn_pending() usage with all channels The __l2cap_no_conn_pending() function would previously only return a meaningful value for connection oriented channels and was therefore not useful for anything else. As preparation of making the L2CAP code more generic allow the function to be called for other channel types as well by returning a meaningful value for them. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-08-14 08:49:12 +02:00
Johan Hedberg	a24cce144b	Bluetooth: Fix reference counting of global L2CAP channels When looking up entries from the global L2CAP channel list there needs to be a guarantee that other code doesn't go and remove the entry after a channel has been returned by the lookup function. This patch makes sure that the channel reference is incremented before the read lock is released in the global channel lookup functions. The patch also adds the corresponding l2cap_chan_put() calls once the channels pointers are no-longer needed. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-08-14 08:49:11 +02:00
Johan Hedberg	2b29349044	Bluetooth: Fix confusion between parent and child channel for 6lowpan The new_connection L2CAP channel callback creates a new channel based on the provided parent channel. The 6lowpan code was confusingly naming the child channel "pchan" and the parent channel "chan". This patch swaps the names. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-08-14 08:49:11 +02:00
Johan Hedberg	5fcb934756	Bluetooth: Remove redundant check for remote_key_dist In the smp_cmd_sign_info() function the SMP_DIST_SIGN bit is explicitly cleared early on in the function. This means that there's no need to check for it again before calling smp_distribute_keys(). Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-08-14 08:49:10 +02:00
Johan Hedberg	22f433dcf7	Bluetooth: Disable page scan if all whitelisted devices are connected When we're not connectable and all whitelisted (BR/EDR) devices are connected it doesn't make sense to keep page scan enabled. This patch adds code to check for any disconnected whitelist devices and if there are none take the appropriate action in the hci_update_page_scan() function to disable page scan. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-08-14 08:49:10 +02:00
Johan Hedberg	432df05eb1	Bluetooth: Create unified helper function for updating page scan Similar to our hci_update_background_scan() function we can simplify a lot of code by creating a unified helper function for doing page scan updates. This patch adds such a function to hci_core.c and updates all the relevant places to use it. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-08-14 08:49:09 +02:00
Johan Hedberg	84c61d92bb	Bluetooth: Add convenience function to check for pending power off There are several situations where we're interested in knowing whether we're currently in the process of powering off an adapter. This patch adds a convenience function for the purpose and makes it public since we'll soon need to access it from hci_event.c as well. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-08-14 08:49:08 +02:00
Willem de Bruijn	490cc7d03c	net-timestamp: fix missing tcp fragmentation cases Bytestream timestamps are correlated with a single byte in the skbuff, recorded in skb_shinfo(skb)->tskey. When fragmenting skbuffs, ensure that the tskey is set for the fragment in which the tskey falls (seqno <= tskey < end_seqno). The original implementation did not address fragmentation in tcp_fragment or tso_fragment. Add code to inspect the sequence numbers and move both tskey and the relevant tx_flags if necessary. Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-13 20:06:06 -07:00
Willem de Bruijn	712a72213f	net-timestamp: fix missing ACK timestamp ACK timestamps are generated in tcp_clean_rtx_queue. The TSO datapath can break out early, causing the timestamp code to be skipped. Move the code up before the break. Reported-by: David S. Miller <davem@davemloft.net> Also fix a boundary condition: tp->snd_una is the next unacknowledged byte and between tests inclusive (a <= b <= c), so generate an ACK timestamp if (prior_snd_una <= tskey <= tp->snd_una - 1). Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-13 20:06:06 -07:00
Maks Naumov	efd5029010	irda: Fix rd_frame control field initialization in irlap_send_rd_frame() Signed-off-by: Maks Naumov <maksqwe1@ukr.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-13 20:05:52 -07:00
chas williams - CONTRACTOR	8356f9d564	lec: Fix bug introduced by `b67bfe0d42` `b67bfe0d42` (hlist: drop the node parameter from iterators) dropped the node parameter from iterators which lec_tbl_walk() was using to iterate the list. Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-13 20:04:46 -07:00
chas williams - CONTRACTOR	de713b5794	atm/svc: Fix blocking in wait loop One should not call blocking primitives inside a wait loop, since both require task_struct::state to sleep, so the inner will destroy the outer state. sigd_enq() will possibly sleep for alloc_skb(). Move sigd_enq() before prepare_to_wait() to avoid sleeping while waiting interruptibly. You do not actually need to call sigd_enq() after the initial prepare_to_wait() because we test the termination condition before calling schedule(). Based on suggestions from Peter Zijlstra. Signed-off-by: Chas Williams <chas@cmf.n4rl.navy.mil> Acked-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-13 20:04:46 -07:00
Christoph Jaeger	3791b3f6fb	openvswitch: Fix memory leak in ovs_vport_alloc() error path ovs_vport_alloc() bails out without freeing the memory 'vport' points to. Picked up by Coverity - CID 1230503. Fixes: `5cd667b0a4` ("openvswitch: Allow each vport to have an array of 'port_id's.") Signed-off-by: Christoph Jaeger <cj@linux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-13 20:04:46 -07:00
Linus Torvalds	f0094b28f3	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: "Several networking final fixes and tidies for the merge window: 1) Changes during the merge window unintentionally took away the ability to build bluetooth modular, fix from Geert Uytterhoeven. 2) Several phy_node reference count bug fixes from Uwe Kleine-König. 3) Fix ucc_geth build failures, also from Uwe Kleine-König. 4) Fix klog false positives when netlink messages go to network taps, by properly resetting the network header. Fix from Daniel Borkmann. 5) Sizing estimate of VF netlink messages is too small, from Jiri Benc. 6) New APM X-Gene SoC ethernet driver, from Iyappan Subramanian. 7) VLAN untagging is erroneously dependent upon whether the VLAN module is loaded or not, but there are generic dependencies that matter wrt what can be expected as the SKB enters the stack. Make the basic untagging generic code, and do it unconditionally. From Vlad Yasevich. 8) xen-netfront only has so many slots in its transmit queue so linearize packets that have too many frags. From Zoltan Kiss.
9) Fix suspend/resume PHY handling in bcmgenet driver, from Florian Fainelli" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (55 commits) net: bcmgenet: correctly resume adapter from Wake-on-LAN net: bcmgenet: update UMAC_CMD only when link is detected net: bcmgenet: correctly suspend and resume PHY device net: bcmgenet: request and enable main clock earlier net: ethernet: myricom: myri10ge: myri10ge.c: Cleaning up missing null-terminate after strncpy call xen-netfront: Fix handling packets on compound pages with skb_linearize net: fec: Support phys probed from devicetree and fixed-link smsc: replace WARN_ON() with WARN_ON_SMP() xen-netback: Don't deschedule NAPI when carrier off net: ethernet: qlogic: qlcnic: Remove duplicate object file from Makefile wan: wanxl: Remove typedefs from struct names m68k/atari: EtherNEC - ethernet support (ne) net: ethernet: ti: cpmac.c: Cleaning up missing null-terminate after strncpy call hdlc: Remove typedefs from struct names airo_cs: Remove typedef local_info_t atmel: Remove typedef atmel_priv_ioctl com20020_cs: Remove typedef com20020_dev_t ethernet: amd: Remove typedef local_info_t net: Always untag vlan-tagged traffic on input. drivers: net: Add APM X-Gene SoC ethernet driver support. ...	2014-08-13 18:27:40 -06:00
Linus Torvalds	06b8ab5528	Merge tag 'nfs-for-3.17-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client updates from Trond Myklebust: "Highlights include: - stable fix for a bug in nfs3_list_one_acl() - speed up NFS path walks by supporting LOOKUP_RCU - more read/write code cleanups - pNFS fixes for layout return on close - fixes for the RCU handling in the rpcsec_gss code - more NFS/RDMA fixes" * tag 'nfs-for-3.17-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (79 commits) nfs: reject changes to resvport and sharecache during remount NFS: Avoid infinite loop when RELEASE_LOCKOWNER getting expired error SUNRPC: remove all refcounting of groupinfo from rpcauth_lookupcred NFS: fix two problems in lookup_revalidate in RCU-walk NFS: allow lockless access to access_cache NFS: teach nfs_lookup_verify_inode to handle LOOKUP_RCU NFS: teach
nfs_neg_need_reval to understand LOOKUP_RCU NFS: support RCU_WALK in nfs_permission() sunrpc/auth: allow lockless (rcu) lookup of credential cache. NFS: prepare for RCU-walk support but pushing tests later in code. NFS: nfs4_lookup_revalidate: only evaluate parent if it will be used. NFS: add checks for returned value of try_module_get() nfs: clear_request_commit while holding i_lock pnfs: add pnfs_put_lseg_async pnfs: find swapped pages on pnfs commit lists too nfs: fix comment and add warn_on for PG_INODE_REF nfs: check wait_on_bit_lock err in page_group_lock sunrpc: remove "ec" argument from encrypt_v2 operation sunrpc: clean up sparse endianness warnings in gss_krb5_wrap.c sunrpc: clean up sparse endianness warnings in gss_krb5_seal.c ...	2014-08-13 18:13:19 -06:00
Linus Torvalds	8d2d441ac4	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph updates from Sage Weil: "There is a lot of refactoring and hardening of the libceph and rbd code here from Ilya that fix various smaller bugs, and a few more important fixes with clone overlap. The main fix is a critical change to the request_fn handling to not sleep that was exposed by the recent mutex changes (which will also go to the 3.16 stable series). Yan Zheng has several fixes in here for CephFS fixing ACL handling, time stamps, and request resends when the MDS restarts. Finally, there are a few cleanups from Himangi Saraogi based on Coccinelle" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (39 commits) libceph: set last_piece in ceph_msg_data_pages_cursor_init() correctly rbd: remove extra newlines from rbd_warn() messages rbd: allocate img_request with GFP_NOIO instead GFP_ATOMIC rbd: rework rbd_request_fn() ceph: fix kick_requests() ceph: fix append mode write ceph: fix sizeof(struct tYpO *) typo ceph: remove redundant memset(0) rbd: take snap_id into account when reading in parent info rbd: do not read in parent info before snap context rbd: update mapping size only on refresh rbd: harden rbd_dev_refresh() and callers a bit rbd: split rbd_dev_spec_update() into two functions rbd: remove unnecessary asserts in rbd_dev_image_probe() rbd: introduce rbd_dev_header_info() rbd: show the entire chain of parent images ceph: replace comma with a semicolon rbd: use rbd_segment_name_free() instead of kfree() ceph: check zero length in ceph_sync_read() ceph: reset r_resend_mds after receiving -ESTALE ...	2014-08-13 17:43:29 -06:00
Vlad Yasevich	0d5501c1c8	net: Always untag vlan-tagged traffic on input. Currently the functionality to untag traffic on input resides as part of the vlan module and is built only when VLAN support is enabled in the kernel. When VLAN is disabled, the function vlan_untag() turns into a stub and doesn't really untag the packets. This seems to create an interesting interaction between VMs supporting checksum offloading and some network drivers. There are some drivers that do not allow the user to change tx-vlan-offload feature of the driver. These drivers also seem to assume that any VLAN-tagged traffic they transmit will have the vlan information in the vlan_tci and not in the vlan header already in the skb. When transmitting skbs that already have tagged data with partial checksum set, the checksum doesn't appear to be updated correctly by the card thus resulting in a failure to establish TCP connections. The following is a packet trace taken on the receiver where a sender is a VM with a VLAN configured. The host the VM is running on does not have VLAN support and the outgoing interface on the host is tg3: 10:12:43.503055 52:54:00:ae:42:3f > 28:d2:44:7d:c2:de, ethertype 802.1Q (0x8100), length 78: vlan 100, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 27243, offset 0, flags [DF], proto TCP (6), length 60) 10.0.100.1.58545 > 10.0.100.10.ircu-2: Flags [S], cksum 0xdc39 (incorrect -> 0x48d9), seq 1069378582, win 29200, options [mss 1460,sackOK,TS val 4294837885 ecr 0,nop,wscale 7], length 0 10:12:44.505556 52:54:00:ae:42:3f > 28:d2:44:7d:c2:de, ethertype 802.1Q (0x8100), length 78: vlan 100, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 27244, offset 0, flags [DF], proto TCP (6), length 60) 10.0.100.1.58545 > 10.0.100.10.ircu-2: Flags [S], cksum 0xdc39 (incorrect -> 0x44ee), seq 1069378582, win 29200, options [mss 1460,sackOK,TS val 4294838888 ecr 0,nop,wscale 7], length 0 This connection finally times out.
I've only access to the TG3 hardware in this configuration thus have only tested this with TG3 driver. There are a lot of other drivers that do not permit user changes to vlan acceleration features, and I don't know if they all suffer from a similar issue. The patch attempts to fix this another way. It moves the vlan header stripping code out of the vlan module and always builds it into the kernel network core. This way, even if vlan is not supported on a virtualization host, the virtual machines running on top of such host will still work with VLANs enabled. CC: Patrick McHardy <kaber@trash.net> CC: Nithin Nayak Sujir <nsujir@broadcom.com> CC: Michael Chan <mchan@broadcom.com> CC: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-11 12:16:51 -07:00
David S. Miller	f00439e2e3	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains fixes for your net tree, they are: 1) Uninitialize the set element key and data from the commit path, otherwise this leaks chain refcount if the transaction is aborted, reported by Thomas Graf. 2) Fix crash when updating chains without counters in nf_tables, this slipped through in the new transaction infrastructure, reported by Matteo Croce. 3) Replace all mutex_lock_interruptible() by mutex_lock() in the Netfilter tree, suggested by Patrick McHardy. This implicitly fixes the problem that Eric Dumazet reported in: http://patchwork.ozlabs.org/patch/373076/ 4) Fix error return code in nf_tables when deleting set element in nf_tables if the transaction cannot be allocated, from Julia Lawall. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-11 10:26:15 -07:00
Linus Torvalds	77e40aae76	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull namespace updates from Eric Biederman: "This is a bunch of small changes built against 3.16-rc6. The most significant change for users is the first patch which makes setns dramatically faster by removing unneeded rcu handling. The next chunk of changes are so that "mount -o remount,.." will not allow the user namespace root to drop flags on a mount set by the system wide root. Aka this forces read-only mounts to stay read-only, no-dev mounts to stay no-dev, no-suid mounts to stay no-suid, no-exec mounts to stay no-exec and it prevents unprivileged users from messing with a mount's atime settings. I have included my test case as the last patch in this series so people performing backports can verify this change works correctly. The next change fixes a bug in NFS that was discovered while auditing nsproxy users for the first optimization. Today you can oops the kernel by reading /proc/fs/nfsfs/{servers,volumes} if you are clever with pid namespaces. I rebased and fixed the build of the !CONFIG_NFS_FS case yesterday when a build bot caught my typo. Given that no one to my knowledge bases anything on my tree fixing the typo in place seems more responsible than requiring a typo-fix to be backported as well. The last change is a small semantic cleanup introducing /proc/thread-self and pointing /proc/mounts and /proc/net at it. This prevents several kinds of problematic corner cases. It is a user-visible change so it has a minute chance of causing regressions so the change to /proc/mounts and /proc/net are individual one line commits that can be trivially reverted. Unfortunately I lost and could not find the email of the original reporter so he is not credited.
From at least one perspective this change to /proc/net is a regression fix to allow pthread /proc/net uses that were broken by the introduction of the network namespace" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: proc: Point /proc/mounts at /proc/thread-self/mounts instead of /proc/self/mounts proc: Point /proc/net at /proc/thread-self/net instead of /proc/self/net proc: Implement /proc/thread-self to point at the directory of the current thread proc: Have net show up under /proc/<tgid>/task/<tid> NFS: Fix /proc/fs/nfsfs/servers and /proc/fs/nfsfs/volumes mnt: Add tests for unprivileged remount cases that have found to be faulty mnt: Change the default remount atime from relatime to the existing value mnt: Correct permission checks in do_remount mnt: Move the test for MNT_LOCK_READONLY from change_mount_flags into do_remount mnt: Only change user settable mount flags in remount namespaces: Use task_lock and not rcu to protect nsproxy	2014-08-09 17:10:41 -07:00
Linus Torvalds	0d10c2c170	Merge branch 'for-3.17' of git://linux-nfs.org/~bfields/linux Pull nfsd updates from Bruce Fields: "This includes a major rewrite of the NFSv4 state code, which has always depended on a single mutex. As an example, open creates are no longer serialized, fixing a performance regression on NFSv3->NFSv4 upgrades. Thanks to Jeff, Trond, and Benny, and to Christoph for review. Also some RDMA fixes from Chuck Lever and Steve Wise, and miscellaneous fixes from Kinglong Mee and others" * 'for-3.17' of git://linux-nfs.org/~bfields/linux: (167 commits) svcrdma: remove rdma_create_qp() failure recovery logic nfsd: add some comments to the nfsd4 object definitions nfsd: remove the client_mutex and the nfs4_lock/unlock_state wrappers nfsd: remove nfs4_lock_state: nfs4_state_shutdown_net nfsd: remove nfs4_lock_state: nfs4_laundromat nfsd: Remove nfs4_lock_state(): reclaim_complete() nfsd: Remove nfs4_lock_state(): setclientid, setclientid_confirm, renew nfsd: Remove nfs4_lock_state(): exchange_id, create/destroy_session() nfsd: Remove nfs4_lock_state(): nfsd4_open and nfsd4_open_confirm nfsd: Remove nfs4_lock_state(): nfsd4_delegreturn() nfsd: Remove nfs4_lock_state(): nfsd4_open_downgrade + nfsd4_close nfsd: Remove nfs4_lock_state(): nfsd4_lock/locku/lockt() nfsd: Remove nfs4_lock_state(): nfsd4_release_lockowner nfsd: Remove nfs4_lock_state(): nfsd4_test_stateid/nfsd4_free_stateid nfsd: Remove nfs4_lock_state(): nfs4_preprocess_stateid_op() nfsd: remove old fault injection infrastructure nfsd: add more granular locking to *_delegations fault injectors nfsd: add more granular locking to forget_openowners fault injector nfsd: add more granular locking to forget_locks fault injector nfsd: add a list_head arg to nfsd_foreach_client_lock ...	2014-08-09 14:31:18 -07:00
Ilya Dryomov	5f740d7e15	libceph: set last_piece in ceph_msg_data_pages_cursor_init() correctly Determining ->last_piece based on the value of ->page_offset + length is incorrect because length here is the length of the entire message. ->last_piece set to false even if page array data item length is <= PAGE_SIZE, which results in invalid length passed to ceph_tcp_{send,recv}page() and causes various asserts to fire. # cat pages-cursor-init.sh #!/bin/bash rbd create --size 10 --image-format 2 foo FOO_DEV=$(rbd map foo) dd if=/dev/urandom of=$FOO_DEV bs=1M &>/dev/null rbd snap create foo@snap rbd snap protect foo@snap rbd clone foo@snap bar # rbd_resize calls librbd rbd_resize(), size is in bytes ./rbd_resize bar $(((4 << 20) + 512)) rbd resize --size 10 bar BAR_DEV=$(rbd map bar) # trigger a 512-byte copyup -- 512-byte page array data item dd if=/dev/urandom of=$BAR_DEV bs=1M count=1 seek=5 The problem exists only in ceph_msg_data_pages_cursor_init(), ceph_msg_data_pages_advance() does the right thing. The size_t cast is unnecessary. Cc: stable@vger.kernel.org # 3.10+ Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@redhat.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-08-09 11:27:32 +04:00
Jiri Benc	945a36761f	rtnetlink: fix VF info size Commit `1d8faf48c7` ("net/core: Add VF link state control") added new attribute to IFLA_VF_INFO group in rtnl_fill_ifinfo but did not adjust size of the allocated memory in if_nlmsg_size/rtnl_vfinfo_size. As the result, we may trigger warnings in rtnl_getlink and similar functions when many VF links are enabled, as the information does not fit into the allocated skb. Fixes: `1d8faf48c7` ("net/core: Add VF link state control") Reported-by: Yulong Pei <ypei@redhat.com> Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-08 10:28:09 -07:00
Niv Yehezkel	b7a71b51ee	ipv4: removed redundant conditional Since fib_lookup can no longer return ESRCH, checking for this error code is no longer necessary. Signed-off-by: Niv Yehezkel <executerx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-08 10:22:22 -07:00
Julia Lawall	609ccf0877	netfilter: nf_tables: fix error return code Convert a zero return value on error to a negative one, as returned elsewhere in the function. A simplified version of the semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ identifier ret; expression e1,e2; @@ ( if (\(ret < 0\|ret != 0\)) { ... return ret; } | ret = 0 ) ... when != ret = e1 when != &ret *if(...) { ... when != ret = e2 when forall return ret; } // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-08-08 16:47:29 +02:00
Pablo Neira Ayuso	7926dbfa4b	netfilter: don't use mutex_lock_interruptible() Eric Dumazet reports that getsockopt() or setsockopt() sometimes returns -EINTR instead of -ENOPROTOOPT, causing headaches to application developers. This patch replaces all the mutex_lock_interruptible() by mutex_lock() in the netfilter tree, as there is no reason we should sleep for a long time there. Reported-by: Eric Dumazet <edumazet@google.com> Suggested-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Julian Anastasov <ja@ssi.bg>	2014-08-08 16:47:23 +02:00
Pablo Neira Ayuso	b88825de85	netfilter: nf_tables: don't update chain with unset counters Fix possible replacement of the per-cpu chain counters by null pointer when updating an existing chain in the commit path. Reported-by: Matteo Croce <technoboy85@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-08-08 15:38:50 +02:00
Pablo Neira Ayuso	a3716e70e1	netfilter: nf_tables: uninitialize element key/data from the commit path This should happen once the element has been effectively released in the commit path, not before. This fixes a possible chain refcount leak if the transaction is aborted. Reported-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-08-08 15:38:46 +02:00
Daniel Borkmann	4e48ed883c	netlink: reset network header before passing to taps netlink doesn't set any network header offset thus when the skb is being passed to tap devices via dev_queue_xmit_nit(), it emits klog false positives due to it being unset like: ... [ 124.990397] protocol 0000 is buggy, dev nlmon0 [ 124.990411] protocol 0000 is buggy, dev nlmon0 ... So just reset the network header before passing to the device; for packet sockets that just means nothing will change - mac and net offset hold the same value just as before. Reported-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-07 16:02:58 -07:00
Jean Sacren	0a4dd0d786	batman: fix duplicate #include of multicast.h The header multicast.h was included twice, so delete one of them. Signed-off-by: Jean Sacren <sakiwit@gmail.com> Cc: Marek Lindner <mareklindner@neomailbox.ch> Cc: Simon Wunderlich <sw@simonwunderlich.de> Cc: Antonio Quartulli <antonio@meshcoding.com> Cc: b.a.t.m.a.n@lists.open-mesh.org Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-07 16:02:57 -07:00
Jean Sacren	2072ec846a	openvswitch: fix duplicate #include headers The #include headers net/genetlink.h and linux/genetlink.h were both included twice, so delete each of the duplicates. Signed-off-by: Jean Sacren <sakiwit@gmail.com> Cc: Pravin Shelar <pshelar@nicira.com> Cc: dev@openvswitch.org Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-07 16:02:57 -07:00
Geert Uytterhoeven	2d177f3113	6lowpan: Allow 6LoWPAN to be modular Change config symbol 6LOWPAN from type bool to type tristate, so 6LoWPAN can be built modular, just like IPV6 Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-07 11:44:18 -07:00
Linus Torvalds	33caee3992	Merge branch 'akpm' (patchbomb from Andrew Morton) Merge incoming from Andrew Morton: - Various misc things. - arch/sh updates. - Part of ocfs2. Review is slow. - Slab updates. - Most of -mm. - printk updates. - lib/ updates. - checkpatch updates. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (226 commits) checkpatch: update $declaration_macros, add uninitialized_var checkpatch: warn on missing spaces in broken up quoted checkpatch: fix false positives for --strict "space after cast" test checkpatch: fix false positive MISSING_BREAK warnings with --file checkpatch: add test for native c90 types in unusual order checkpatch: add signed generic types checkpatch: add short int to c variable types checkpatch: add for_each tests to indentation and brace tests checkpatch: fix brace style misuses of else and while checkpatch: add --fix option for a couple OPEN_BRACE misuses checkpatch: use the correct indentation for which() checkpatch: add fix_insert_line and fix_delete_line helpers checkpatch: add ability to insert and delete lines to patch/file checkpatch: add an index variable for fixed lines checkpatch: warn on break after goto or return with same tab indentation checkpatch: emit a warning on file add/move/delete checkpatch: add test for commit id formatting style in commit log checkpatch: emit fewer kmalloc_array/kcalloc conversion warnings checkpatch: improve "no space after cast" test checkpatch: allow multiple const * types ...	2014-08-06 21:14:42 -07:00
Thomas Graf	6c8f7e7083	netlink: hold nl_sock_hash_lock during diag dump Although RCU protection would be possible during diag dump, doing so allows for concurrent table mutations which can render the in-table offset between individual Netlink messages invalid and thus cause legitimate sockets to be skipped in the dump. Since the diag dump is relatively low volume and consistency is more important than performance, the table mutex is held during dump. Reported-by: Andrey Wagin <avagin@gmail.com> Signed-off-by: Thomas Graf <tgraf@suug.ch> Fixes: `e341694e3e` ("netlink: Convert netlink_lookup() to use RCU protected hash table") Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-06 19:17:44 -07:00
Ken Helias	1d023284c3	list: fix order of arguments for hlist_add_after(_rcu) All other add functions for lists have the new item as first argument and the position where it is added as second argument. This was changed for no good reason in this function and makes using it unnecessarily confusing. The name was changed to hlist_add_behind() to cause unconverted code to generate a compile error instead of using the wrong parameter order. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Ken Helias <kenhelias@firemail.de> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> [intel driver bits] Cc: Hugh Dickins <hughd@google.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-08-06 18:01:24 -07:00
Dmitry Popov	9ea88a1530	tcp: md5: check md5 signature without socket lock Since `a8afca032` (tcp: md5: protects md5sig_info with RCU) tcp_md5_do_lookup doesn't require socket lock, rcu_read_lock is enough. Therefore socket lock is no longer required for tcp_v{4,6}_inbound_md5_hash too, so we can move these calls (wrapped with rcu_read_{,un}lock) before bh_lock_sock: from tcp_v{4,6}_do_rcv to tcp_v{4,6}_rcv. Signed-off-by: Dmitry Popov <ixaphire@qrator.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-06 16:00:20 -07:00
Willem de Bruijn	f066e2b091	net-timestamp: cumulative tcp timestamping fixes A set of small fixes pointed out just after the merge: - make tcp_tx_timestamp static - make tcp_gso_tstamp static - use before() to compare TCP seqno, instead of cast to u64 - add tstamp to tx_flags in GSO, instead of overwrite tx_flags - record skb_shinfo(skb)->tskey for all timestamps, also HW. - optimization in tcp_tx_timestamp: call sock_tx_timestamp only if a tstamp option is set. Signed-off-by: Willem de Bruijn <willemb@google.com> Fixes: `4ed2d765df` ("net-timestamp: TCP timestamping") Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-06 14:09:01 -07:00
Eric Dumazet	140c55d4b5	net-timestamp: sock_tx_timestamp() fix sock_tx_timestamp() should not ignore initial tx_flags value, as TCP stack can store SKBTX_SHARED_FRAG in it. Also first argument (struct sock *) can be const. Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: `4ed2d765df` ("net-timestamp: TCP timestamping") Cc: Willem de Bruijn <willemb@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-06 12:38:07 -07:00
Linus Torvalds	ae045e2455	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Pull networking updates from David Miller: "Highlights: 1) Steady transitioning of the BPF infrastructure to a generic spot so all kernel subsystems can make use of it, from Alexei Starovoitov. 2) SFC driver supports busy polling, from Alexandre Rames. 3) Take advantage of hash table in UDP multicast delivery, from David Held. 4) Lighten locking, in particular by getting rid of the LRU lists, in inet frag handling. From Florian Westphal. 5) Add support for various RFC6458 control messages in SCTP, from Geir Ola Vaagland. 6) Allow to filter bridge forwarding database dumps by device, from Jamal Hadi Salim. 7) virtio-net also now supports busy polling, from Jason Wang. 8) Some low level optimization tweaks in pktgen from Jesper Dangaard Brouer. 9) Add support for ipv6 address generation modes, so that userland can have some input into the process. From Jiri Pirko. 10) Consolidate common TCP connection request code in ipv4 and ipv6, from Octavian Purdila. 11) New ARP packet logger in netfilter, from Pablo Neira Ayuso. 12) Generic resizable RCU hash table, with initial users in netlink and nftables. From Thomas Graf. 13) Maintain a name assignment type so that userspace can see where a network device name came from (enumerated by kernel, assigned explicitly by userspace, etc.) From Tom Gundersen. 14) Automatic flow label generation on transmit in ipv6, from Tom Herbert. 15) New packet timestamping facilities from Willem de Bruijn, meant to assist in measuring latencies going into/out-of the packet scheduler, latency from TCP data transmission to ACK, etc" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1536 commits) cxgb4 : Disable recursive mailbox commands when enabling vi net: reduce USB network driver config options.
tg3: Modify tg3_tso_bug() to handle multiple TX rings amd-xgbe: Perform phy connect/disconnect at dev open/stop amd-xgbe: Use dma_set_mask_and_coherent to set DMA mask net: sun4i-emac: fix memory leak on bad packet sctp: fix possible seqlock seadlock in sctp_packet_transmit() Revert "net: phy: Set the driver when registering an MDIO bus device" cxgb4vf: Turn off SGE RX/TX Callback Timers and interrupts in PCI shutdown routine team: Simplify return path of team_newlink bridge: Update outdated comment on promiscuous mode net-timestamp: ACK timestamp for bytestreams net-timestamp: TCP timestamping net-timestamp: SCHED timestamp on entering packet scheduler net-timestamp: add key to disambiguate concurrent datagrams net-timestamp: move timestamp flags out of sk_flags net-timestamp: extend SCM_TIMESTAMPING ancillary data struct cxgb4i : Move stray CPL definitions to cxgb4 driver tcp: reduce spurious retransmits due to transient SACK reneging qlcnic: Initialize dcbnl_ops before register_netdev ...	2014-08-06 09:38:14 -07:00
Linus Torvalds	bb2cbf5e93	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security subsystem updates from James Morris: "In this release: - PKCS#7 parser for the key management subsystem from David Howells - appoint Kees Cook as seccomp maintainer - bugfixes and general maintenance across the subsystem" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (94 commits) X.509: Need to export x509_request_asymmetric_key() netlabel: shorter names for the NetLabel catmap funcs/structs netlabel: fix the catmap walking functions netlabel: fix the horribly broken catmap functions netlabel: fix a problem when setting bits below the previously lowest bit PKCS#7: X.509 certificate issuer and subject are mandatory fields in the ASN.1 tpm: simplify code by using %*phN specifier tpm: Provide a generic means to override the chip returned timeouts tpm: missing tpm_chip_put in tpm_get_random() tpm: Properly clean sysfs entries in error path tpm: Add missing tpm_do_selftest to ST33 I2C driver PKCS#7: Use x509_request_asymmetric_key() Revert "selinux: fix the default socket labeling in sock_graft()" X.509: x509_request_asymmetric_keys() doesn't need string length arguments PKCS#7: fix sparse non static symbol warning KEYS: revert encrypted key change ima: add support for measuring and appraising firmware firmware_class: perform new LSM checks security: introduce kernel_fw_from_file hook PKCS#7: Missing inclusion of linux/err.h ...	2014-08-06 08:06:39 -07:00
David S. Miller	d247b6ab3c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/Makefile net/ipv6/sysctl_net_ipv6.c Two ipv6_table_template[] additions overlap, so the index of the ipv6_table[x] assignments needed to be adjusted. In the drivers/net/Makefile case, we've gotten rid of the garbage whereby we had to list every single USB networking driver in the top-level Makefile, there is just one "USB_NETWORKING" that guards everything. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-05 18:46:26 -07:00
Eric Dumazet	757efd32d5	sctp: fix possible seqlock seadlock in sctp_packet_transmit() Dave reported following splat, caused by improper use of IP_INC_STATS_BH() in process context. BUG: using __this_cpu_add() in preemptible [00000000] code: trinity-c117/14551 caller is __this_cpu_preempt_check+0x13/0x20 CPU: 3 PID: 14551 Comm: trinity-c117 Not tainted 3.16.0+ #33 ffffffff9ec898f0 0000000047ea7e23 ffff88022d32f7f0 ffffffff9e7ee207 0000000000000003 ffff88022d32f818 ffffffff9e397eaa ffff88023ee70b40 ffff88022d32f970 ffff8801c026d580 ffff88022d32f828 ffffffff9e397ee3 Call Trace: [<ffffffff9e7ee207>] dump_stack+0x4e/0x7a [<ffffffff9e397eaa>] check_preemption_disabled+0xfa/0x100 [<ffffffff9e397ee3>] __this_cpu_preempt_check+0x13/0x20 [<ffffffffc0839872>] sctp_packet_transmit+0x692/0x710 [sctp] [<ffffffffc082a7f2>] sctp_outq_flush+0x2a2/0xc30 [sctp] [<ffffffff9e0d985c>] ? mark_held_locks+0x7c/0xb0 [<ffffffff9e7f8c6d>] ? _raw_spin_unlock_irqrestore+0x5d/0x80 [<ffffffffc082b99a>] sctp_outq_uncork+0x1a/0x20 [sctp] [<ffffffffc081e112>] sctp_cmd_interpreter.isra.23+0x1142/0x13f0 [sctp] [<ffffffffc081c86b>] sctp_do_sm+0xdb/0x330 [sctp] [<ffffffff9e0b8f1b>] ? preempt_count_sub+0xab/0x100 [<ffffffffc083b350>] ? sctp_cname+0x70/0x70 [sctp] [<ffffffffc08389ca>] sctp_primitive_ASSOCIATE+0x3a/0x50 [sctp] [<ffffffffc083358f>] sctp_sendmsg+0x88f/0xe30 [sctp] [<ffffffff9e0d673a>] ? lock_release_holdtime.part.28+0x9a/0x160 [<ffffffff9e0d62ce>] ? put_lock_stats.isra.27+0xe/0x30 [<ffffffff9e73b624>] inet_sendmsg+0x104/0x220 [<ffffffff9e73b525>] ? inet_sendmsg+0x5/0x220 [<ffffffff9e68ac4e>] sock_sendmsg+0x9e/0xe0 [<ffffffff9e1c0c09>] ? might_fault+0xb9/0xc0 [<ffffffff9e1c0bae>] ? might_fault+0x5e/0xc0 [<ffffffff9e68b234>] SYSC_sendto+0x124/0x1c0 [<ffffffff9e0136b0>] ? 
syscall_trace_enter+0x250/0x330 [<ffffffff9e68c3ce>] SyS_sendto+0xe/0x10 [<ffffffff9e7f9be4>] tracesys+0xdd/0xe2 This is a followup of commits `f1d8cba61c` ("inet: fix possible seqlock deadlocks") and `7f88c6b23a` ("ipv6: fix possible seqlock deadlock in ip6_finish_output2") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Reported-by: Dave Jones <davej@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-05 16:42:55 -07:00
David S. Miller	6ff4e36f8b	Included changes: - kmalloc_array instead of kmalloc when possible - avoid log spam due to useless net_ratelimit() invocations - increase default metric hop penalty from 15 to 30 - update internal version number -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJT4Ir0AAoJEJgn97Bh2u9eYG8P/2Bx9uCbajNQSCBvQ20OXW4o YzubqMETLel6GLoVjL2XFsmJATaQGGWYWGSytDF2sRa1C7/dXXfUrwgfFNzw2L9Z nZSd8WbhiBW0LK/ECzTQ12M3VnTdjgj3IP9A58N9zqQf5uM1onQR6gR7IQGARn9S D5onFdoRz5bWcVffyf4YN9irMkvAFjOB7OzNtBaHRbH5oObITS10LcQ7rVLAic/0 lyMCX1ioBnFpbH1YfII0dcSFjY22m9QTjJDj4dx6LbjDXaxv9BMiJ7bfjCt5wn1o 0ju5X898fhTX0L4Z67pGGHzawByyXtrf1r9INot8K8oqftq5vkHpIJdn9n0GrYGa 9FVoQ0hzDVSp7mxnBKnbXyaf47HX8RYRBZegHEDW8zrpO0zj7cyruRKifDL3q9R9 cqGIRYSFPaXEGF6HfmxWXCjst1VpddcTBgD7OWlKuTi/Juk2ZoWI4O+h3WlotOv8 niTiG0DiNoGlrLXjFlya6TdfL43zg3sE8O4oukcVO4uejo+wdL+Rc7ZGuO7CbO9k V+ORQ/8Efg3XD8xRSmzi0pgyBfU/C6HTTWCIv8DIMfl0B5zoeIx75quDP9/CKIfz UTF0R1+mA0o6UmSiIngqRTFEEB9cWrMvCjM1drMb1V99NzI+/fg1skO7pcnJRddC rbPgeTHDAYgMsPj74rhs =EIGl -----END PGP SIGNATURE----- Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge Antonio Quartulli says: ==================== pull request: batman-adv 2014-08-05 this is a pull request intended for net-next/linux-3.17 (yeah..it's really late). Patches 1, 2 and 4 are really minor changes: - kmalloc_array is substituted to kmalloc when possible (as suggested by checkpatch); - net_ratelimited() is now used properly and the "suppressed" message is not printed anymore if not needed; - the internal version number has been increased to reflect our current version. Patch 3 instead is introducing a change in the metric computation function by changing the penalty applied at each mesh hop from 15/255 (~6%) to 30/255 (~11%). This change is introduced by Simon Wunderlich after having observed a performance improvement in several networks when using the new value. ==================== Signed-off-by: David S. 
Miller <davem@davemloft.net>	2014-08-05 16:39:47 -07:00
Toshiaki Makita	fdb0a6626e	bridge: Update outdated comment on promiscuous mode Now bridge ports can be non-promiscuous, vlan_vid_add() is no longer an unnecessary operation. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-05 16:37:10 -07:00
Willem de Bruijn	e1c8a607b2	net-timestamp: ACK timestamp for bytestreams Add SOF_TIMESTAMPING_TX_ACK, a request for a tstamp when the last byte in the send() call is acknowledged. It implements the feature for TCP. The timestamp is generated when the TCP socket cumulative ACK is moved beyond the tracked seqno for the first time. The feature ignores SACK and FACK, because those acknowledge the specific byte, but not necessarily the entire contents of the buffer up to that byte. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-05 16:35:54 -07:00
Willem de Bruijn	4ed2d765df	net-timestamp: TCP timestamping TCP timestamping extends SO_TIMESTAMPING to bytestreams. Bytestreams do not have a 1:1 relationship between send() buffers and network packets. The feature interprets a send call on a bytestream as a request for a timestamp for the last byte in that send() buffer. The choice corresponds to a request for a timestamp when all bytes in the buffer have been sent. That assumption depends on in-order kernel transmission. This is the common case. That said, it is possible to construct a traffic shaping tree that would result in reordering. The guarantee is strong, then, but not ironclad. This implementation supports send and sendpages (splice). GSO replaces one large packet with multiple smaller packets. This patch also copies the option into the correct smaller packet. This patch does not yet support timestamping on data in an initial TCP Fast Open SYN, because that takes a very different data path. If ID generation in ee_data is enabled, bytestream timestamps return a byte offset, instead of the packet counter for datagrams. The implementation supports a single timestamp per packet. It silently replaces requests for previous timestamps. To avoid missing tstamps, flush the tcp queue by disabling Nagle, cork and autocork. Missing tstamps can be detected by offset when the ee_data ID is enabled. Implementation details: - On GSO, the timestamping code can be included in the main loop. I moved it into its own loop to reduce the impact on the common case to a single branch. - To avoid leaking the absolute seqno to userspace, the offset returned in ee_data must always be relative. It is an offset between an skb and sk field. The first is always set (also for GSO & ACK). The second must also never be uninitialized. Only allow the ID option on sockets in the ESTABLISHED state, for which the seqno is available. Never reset it to zero (instead, move it to the current seqno when reenabling the option). 
Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-05 16:35:54 -07:00
Willem de Bruijn	e7fd288538	net-timestamp: SCHED timestamp on entering packet scheduler Kernel transmit latency is often incurred in the packet scheduler. Introduce a new timestamp on transmission just before entering the scheduler. When data travels through multiple devices (bonding, tunneling, ...) each device will export an individual timestamp. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-05 16:35:54 -07:00
Willem de Bruijn	09c2d251b7	net-timestamp: add key to disambiguate concurrent datagrams Datagrams timestamped on transmission can coexist in the kernel stack and be reordered in packet scheduling. When reading looped datagrams from the socket error queue it is not always possible to uniquely correlate looped data with the original send() call (for application level retransmits). Even if possible, it may be expensive and complex, requiring packet inspection. Introduce a data-independent ID mechanism to associate timestamps with send calls. Pass an ID alongside the timestamp in field ee_data of sock_extended_err. The ID is a simple 32 bit unsigned int that is associated with the socket and incremented on each send() call for which software tx timestamp generation is enabled. The feature is enabled only if SOF_TIMESTAMPING_OPT_ID is set, to avoid changing ee_data for existing applications that expect it 0. The counter is reset each time the flag is reenabled. Reenabling does not change the ID of already submitted data. It is possible to receive out of order IDs if the timestamp stream is not quiesced first. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-05 16:35:54 -07:00
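The ID behaviour described in the commit above can be modelled in a few lines. This is only a toy sketch in Python, not the kernel code: the class and method names are hypothetical, and the starting value of 0 after a reset is an assumption that the message does not state.

```python
class TimestampIdCounter:
    """Toy model of the per-socket timestamp ID (hypothetical names)."""

    MASK = 0xFFFFFFFF  # the commit describes a 32-bit unsigned counter

    def __init__(self):
        self.enabled = False
        self.next_id = 0  # assumption: counting starts at 0

    def set_opt_id(self, enabled):
        # Re-enabling the option resets the counter. IDs that were
        # already returned for earlier sends are not touched.
        if enabled and not self.enabled:
            self.next_id = 0
        self.enabled = enabled

    def on_send(self):
        """Return the value this send() would report in ee_data."""
        if not self.enabled:
            return 0  # existing applications expect ee_data to be 0
        current = self.next_id
        self.next_id = (self.next_id + 1) & self.MASK  # wrap at 2**32
        return current


counter = TimestampIdCounter()
counter.set_opt_id(True)
ids = [counter.on_send() for _ in range(3)]
print(ids)
```

Because the ID depends only on the order of send() calls, an application can match a timestamp to its send without inspecting the looped packet data.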
Willem de Bruijn	b9f40e21ef	net-timestamp: move timestamp flags out of sk_flags sk_flags is reaching its limit. New timestamping options will not fit. Move all of them into a new field sk->sk_tsflags. Added benefit is that this removes boilerplate code to convert between SOF_TIMESTAMPING_.. and SOCK_TIMESTAMPING_.. in getsockopt/setsockopt. SOCK_TIMESTAMPING_RX_SOFTWARE is also used to toggle the receive timestamp logic (netstamp_needed). That can be simplified and this last key removed, but will leave that for a separate patch. Signed-off-by: Willem de Bruijn <willemb@google.com> ---- The u16 in sock can be moved into a 16-bit hole below sk_gso_max_segs, though that scatters tstamp fields throughout the struct. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-05 16:35:54 -07:00
Willem de Bruijn	f24b9be595	net-timestamp: extend SCM_TIMESTAMPING ancillary data struct Applications that request kernel tx timestamps with SO_TIMESTAMPING read timestamps as recvmsg() ancillary data. The response is defined implicitly as timespec[3]. 1) define struct scm_timestamping explicitly and 2) add support for new tstamp types. On tx, scm_timestamping always accompanies a sock_extended_err. Define previously unused field ee_info to signal the type of ts[0]. Introduce SCM_TSTAMP_SND to define the existing behavior. The reception path is not modified. On rx, no struct similar to sock_extended_err is passed along with SCM_TIMESTAMPING. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-05 16:35:53 -07:00
Neal Cardwell	5ae344c949	tcp: reduce spurious retransmits due to transient SACK reneging This commit reduces spurious retransmits due to apparent SACK reneging by only reacting to SACK reneging that persists for a short delay. When a sequence space hole at snd_una is filled, some TCP receivers send a series of ACKs as they apparently scan their out-of-order queue and cumulatively ACK all the packets that have now been consecutively received. This is essentially misbehavior B in "Misbehaviors in TCP SACK generation" ACM SIGCOMM Computer Communication Review, April 2011, so we suspect that this is from several common OSes (Windows 2000, Windows Server 2003, Windows XP). However, this issue has also been seen in other cases, e.g. the netdev thread "TCP being hoodwinked into spurious retransmissions by lack of timestamps?" from March 2014, where the receiver was thought to be a BSD box. Since snd_una would temporarily be adjacent to a previously SACKed range in these scenarios, this receiver behavior triggered the Linux SACK reneging code path in the sender. This led the sender to clear the SACK scoreboard, enter CA_Loss, and spuriously retransmit (potentially) every packet from the entire write queue at line rate just a few milliseconds before the ACK for each packet arrives at the sender. To avoid such situations, now when a sender sees apparent reneging it does not yet retransmit, but rather adjusts the RTO timer to give the receiver a little time (max(RTT/2, 10ms)) to send us some more ACKs that will restore sanity to the SACK scoreboard. If the reneging persists until this RTO then, as before, we clear the SACK scoreboard and enter CA_Loss. A 10ms delay tolerates a receiver sending such a stream of ACKs at 56Kbit/sec. And to allow for receivers with slower or more congested paths, we wait for at least RTT/2. 
We validated the resulting max(RTT/2, 10ms) delay formula with a mix of North American and South American Google web server traffic, and found that for ACKs displaying transient reneging: (1) 90% of inter-ACK delays were less than 10ms (2) 99% of inter-ACK delays were less than RTT/2 In tests on Google web servers this commit reduced reneging events by 75%-90% (as measured by the TcpExtTCPSACKReneging counter), without any measurable impact on latency for user HTTP and SPDY requests. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-05 16:29:33 -07:00
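The grace period from the commit above, max(RTT/2, 10ms), is simple enough to check by hand. A minimal Python sketch of just that arithmetic follows; the function name and the millisecond units are illustrative choices, not taken from the patch.

```python
def reneging_delay_ms(rtt_ms):
    """Grace period before reacting to apparent SACK reneging.

    Implements the max(RTT/2, 10ms) formula quoted in the commit
    message; the name and units are assumptions for illustration.
    """
    return max(rtt_ms / 2, 10.0)


# Short-RTT paths are held at the 10 ms floor,
# long-RTT paths wait for half a round trip.
for rtt in (8, 20, 100):
    print(rtt, reneging_delay_ms(rtt))
```

The floor covers receivers that emit the ACK series slowly on a fast path, while the RTT/2 term scales the wait for slower or more congested paths.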
David S. Miller	aef4f5b6db	Merge tag 'master-2014-07-31' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next Conflicts: net/6lowpan/iphc.c Minor conflicts in iphc.c were changes overlapping with some style cleanups. John W. Linville says: ==================== Please pull this last(?) batch of wireless changes intended for the 3.17 stream... For the NFC bits, Samuel says: "This is a rather quiet one, we have: - A new driver from ST Microelectronics for their NCI ST21NFCB, including device tree support. - p2p support for the ST21NFCA driver - A few fixes and enhancements for the NFC digital layer" For the Atheros bits, Kalle says: "Michal and Janusz did some important RX aggregation fixes, basically we were missing RX reordering altogether. The 10.1 firmware doesn't support Ad-Hoc mode and Michal fixed ath10k so that it doesn't advertise Ad-Hoc support with that firmware. Also he implemented a workaround for a KVM issue." For the Bluetooth bits, Gustavo and Johan say: "To quote Gustavo from his previous request: 'Some last minute fixes for -next. We have a fix for a use after free in RFCOMM, another fix to an issue with ADV_DIRECT_IND and one for ADV_IND with auto-connection handling. Last, we added support for reading the codec and MWS setting for controllers that support these features.' Additionally there are fixes to LE scanning, an update to conform to the 4.1 core specification as well as fixes for tracking the page scan state. All of these fixes are important for 3.17." And, "We've got: - 6lowpan fixes/cleanups - A couple crash fixes, one for the Marvell HCI driver and another in LE SMP. - Fix for an incorrect connected state check - Fix for the bondable requirement during pairing (an issue which had crept in because of using "pairable" when in fact the actual meaning was "bondable" (these have different meanings in Bluetooth)" Along with those are some late-breaking hardware support patches in brcmfmac and b43 as well as a stray ath9k patch. 
==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-05 13:18:20 -07:00
Steve Wise	d1e458fe67	svcrdma: remove rdma_create_qp() failure recovery logic In svc_rdma_accept(), if rdma_create_qp() fails, there is useless logic to try and call rdma_create_qp() again with reduced sge depths. The assumption, I guess, was that perhaps the initial sge depths chosen were too big. However, the initial depths are selected based on the rdma device attribute max_sge returned from ib_query_device(). If rdma_create_qp() fails, it would not be because the max_send_sge and max_recv_sge values passed in exceed the device's max. So just remove this code. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-05 16:09:21 -04:00
Simon Wunderlich	71b75d0e95	batman-adv: Start new development cycle Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>	2014-08-05 09:42:17 +02:00
Simon Wunderlich	e03366ea6b	batman-adv: increase default hop penalty The default hop penalty is currently set to 15, which is applied like that for multi interface devices (e.g. dual band APs). Single band devices will still use an effective penalty of 30 (hop penalty + wifi penalty). After receiving reports of too long paths in mesh networks with dual band APs which were fixed by increasing the hop penalty, we'd like to suggest to increase that default value in the default setting as well. We've evaluated that increase in a handful of medium sized mesh networks (5-20 nodes) with single and dual band devices, with changes for the better (shorter routes, higher throughput) or no change at all. This patch changes the hop penalty to 30, which will give an effective penalty of 60 on single band devices (hop penalty + wifi penalty). Signed-off-by: Simon Wunderlich <simon@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>	2014-08-05 09:42:13 +02:00
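To see what doubling the hop penalty does over several hops, here is a rough Python sketch. It assumes the penalty scales a 0-255 link-quality value by (255 - penalty)/255 at each hop, which is inferred from the 15/255 and 30/255 fractions quoted in the related pull request rather than read from the patch; the function names are hypothetical.

```python
TQ_MAX = 255  # assumed full-scale link quality value


def apply_hop_penalty(tq, hop_penalty):
    """Scale a quality value by (TQ_MAX - penalty) / TQ_MAX (assumed form)."""
    return tq * (TQ_MAX - hop_penalty) // TQ_MAX


def quality_after_hops(hops, hop_penalty, tq=TQ_MAX):
    """Apply the penalty once per hop, starting from a perfect link."""
    for _ in range(hops):
        tq = apply_hop_penalty(tq, hop_penalty)
    return tq


for penalty in (15, 30):
    print(penalty, [quality_after_hops(h, penalty) for h in range(1, 4)])
```

Under this assumption a larger penalty makes each extra hop cost more, so long paths lose out to shorter ones sooner, which matches the shorter routes reported in the commit message.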
André Gaul	23c4ec10f4	batman-adv: remove unnecessary logspam This patch removes unnecessary logspam which resulted from superfluous calls to net_ratelimit(). With the supplied patch, net_ratelimit() is called after the loglevel has been checked. Signed-off-by: André Gaul <gaul@web-yard.de> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>	2014-08-05 09:41:54 +02:00
Sven Eckelmann	d9124268d8	batman-adv: Fix out-of-order fragmentation support batadv_frag_insert_packet was unable to handle out-of-order packets because it dropped them directly. This is caused by the way the fragmentation list is checked for the correct place to insert a fragmentation entry. The fragmentation code keeps the fragments in lists. The fragmentation entries are kept in descending order of sequence number. The list is traversed and each entry is compared with the new fragment. If the current entry has a smaller sequence number than the new fragment then the new one has to be inserted before the current entry. This ensures that the list is still in descending order. An out-of-order packet with a smaller sequence number than all entries in the list still has to be added to the end of the list. The used hlist has no information about the last entry in the list inside hlist_head and thus the last entry has to be calculated differently. Currently the code assumes that the iterator variable of hlist_for_each_entry can be used for this purpose after the hlist_for_each_entry finished. This is obviously wrong because the iterator variable is always NULL when the list was completely traversed. Instead the information about the last entry has to be stored in a different variable. This problem was introduced in `610bfc6bc9` ("batman-adv: Receive fragmented packets and merge"). Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>	2014-08-05 09:12:16 +02:00
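The insertion rule described above can be sketched as follows. This is a simplified Python model, not the kernel function: it uses a plain list of sequence numbers instead of an hlist, ignores sequence-number wraparound and duplicate handling, and the function name is hypothetical. The point it illustrates is the fix itself: the last visited entry is remembered in its own variable instead of being read from the loop cursor after the traversal ends.

```python
def insert_fragment(fragments, seqno):
    """Insert seqno into a list kept in descending order (in place)."""
    last_index = None  # separate record of the last entry visited
    for index, current in enumerate(fragments):
        if current < seqno:
            # New fragment is larger: it goes before the current entry.
            fragments.insert(index, seqno)
            return
        last_index = index
    # Traversal finished without inserting: the new fragment is the
    # smallest so far (or the list was empty), so it goes at the end.
    if last_index is None:
        fragments.append(seqno)
    else:
        fragments.insert(last_index + 1, seqno)


queue = []
for seqno in (5, 9, 7, 1):  # fragment 1 arrives out of order, last
    insert_fragment(queue, seqno)
print(queue)
```

In the kernel the cursor of hlist_for_each_entry is NULL once the whole list has been walked, which is why the tail position has to be tracked separately there.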
Eric Dumazet	67a24ac18b	netlink: fix lockdep splats With netlink_lookup() conversion to RCU, we need to use appropriate rcu dereference in netlink_seq_socket_idx() & netlink_seq_next() Reported-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: `e341694e3e` ("netlink: Convert netlink_lookup() to use RCU protected hash table") Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-04 22:58:06 -07:00
Linus Torvalds	79eb238c76	TTY / Serial driver update for 3.17-rc1 Here's the big tty / serial driver update for 3.17-rc1. Nothing major, just a number of fixes and new features for different serial drivers, and some more tty core fixes and documentation of the tty locks. All of these have been in linux-next for a while. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iEYEABECAAYFAlPf2C4ACgkQMUfUDdst+yllVgCgtZl/Mcr/LlxPgjsg2C1AE7nX YJ4An3o4N112bkdGqhZ7RjAE6K/8YILx =rPhE -----END PGP SIGNATURE----- Merge tag 'tty-3.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty / serial driver update from Greg KH: "Here's the big tty / serial driver update for 3.17-rc1. Nothing major, just a number of fixes and new features for different serial drivers, and some more tty core fixes and documentation of the tty locks. All of these have been in linux-next for a while" * tag 'tty-3.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (82 commits) tty/n_gsm.c: fix a memory leak in gsmld_open pch_uart: don't hardcode PCI slot to get DMA device tty: n_gsm, use setup_timer Revert "ARC: [arcfpga] stdout-path now suffices for earlycon/console" serial: sc16is7xx: Correct initialization of s->clk serial: 8250_dw: Add support for deferred probing serial: 8250_dw: Add optional reset control support serial: st-asc: Fix overflow in baudrate calculation serial: st-asc: Don't call BUG in asc_console_setup() tty: serial: msm: Make of_device_id array const tty/n_gsm.c: get gsm->num after gsm_activate_mux serial/core: Fix too big allocation for attribute member drivers/tty/serial: use correct type for dma_map/unmap serial: altera_jtaguart: Fix putchar function passed to uart_console_write() serial/uart/8250: Add tunable RX interrupt trigger I/F of FIFO buffers Serial: allow port drivers to have a default attribute group tty: kgdb_nmi: Automatically manage tty enable serial: altera_jtaguart: Adpot 
uart_console_write() serial: samsung: improve code clarity by defining a variable serial: samsung: correct the case and default order in switch ...	2014-08-04 18:51:19 -07:00
Linus Torvalds	98959948a7	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: - Move the nohz kick code out of the scheduler tick to a dedicated IPI, from Frederic Weisbecker. This necessitated quite some background infrastructure rework, including: * Clean up some irq-work internals * Implement remote irq-work * Implement nohz kick on top of remote irq-work * Move full dynticks timer enqueue notification to new kick * Move multi-task notification to new kick * Remove unnecessary barriers on multi-task notification - Remove proliferation of wait_on_bit() action functions and allow wait_on_bit_action() functions to support a timeout. (Neil Brown) - Another round of sched/numa improvements, cleanups and fixes. (Rik van Riel) - Implement fast idling of CPUs when the system is partially loaded, for better scalability. (Tim Chen) - Restructure and fix the CPU hotplug handling code that may leave cfs_rq and rt_rq's throttled when tasks are migrated away from a dead cpu. (Kirill Tkhai) - Robustify the sched topology setup code. (Peter Zijlstra) - Improve sched_feat() handling wrt. static_keys (Jason Baron) - Misc fixes. 
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits) sched/fair: Fix 'make xmldocs' warning caused by missing description sched: Use macro for magic number of -1 for setparam sched: Robustify topology setup sched: Fix sched_setparam() policy == -1 logic sched: Allow wait_on_bit_action() functions to support a timeout sched: Remove proliferation of wait_on_bit() action functions sched/numa: Revert "Use effective_load() to balance NUMA loads" sched: Fix static_key race with sched_feat() sched: Remove extra static_key*() function indirection sched/rt: Fix replenish_dl_entity() comments to match the current upstream code sched: Transform resched_task() into resched_curr() sched/deadline: Kill task_struct->pi_top_task sched: Rework check_for_tasks() sched/rt: Enqueue just unthrottled rt_rq back on the stack in __disable_runtime() sched/fair: Disable runtime_enabled on dying rq sched/numa: Change scan period code to match intent sched/numa: Rework best node setting in task_numa_migrate() sched/numa: Examine a task move when examining a task swap sched/numa: Simplify task_numa_compare() sched/numa: Use effective_load() to balance NUMA loads ...	2014-08-04 16:23:30 -07:00
Dmitry Popov	64a124edcc	tcp: md5: remove unneeded check in tcp_v4_parse_md5_keys tcpm_key is an array inside struct tcp_md5sig, there is no need to check it against NULL. Signed-off-by: Dmitry Popov <ixaphire@qrator.net> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-04 15:09:45 -07:00
Michael S. Tsirkin	4b7a9168e1	bridge: remove a useless comment commit `6cbdceeb1c` bridge: Dump vlan information from a bridge port introduced a comment in an attempt to explain the code logic. The comment is unfinished so it confuses more than it explains, remove it. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-04 12:46:51 -07:00
Linus Torvalds	47dfe4037e	Merge branch 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup changes from Tejun Heo: "Mostly changes to get the v2 interface ready. The core features are mostly ready now and I think it's reasonable to expect to drop the devel mask in one or two devel cycles at least for a subset of controllers. - cgroup added a controller dependency mechanism so that block cgroup can depend on memory cgroup. This will be used to finally support IO provisioning on the writeback traffic, which is currently being implemented. - The v2 interface now uses a separate table so that the interface files for the new interface are explicitly declared in one place. Each controller will explicitly review and add the files for the new interface. - cpuset is getting ready for the hierarchical behavior which is in the similar style with other controllers so that an ancestor's configuration change doesn't change the descendants' configurations irreversibly and processes aren't silently migrated when a CPU or node goes down. 
All the changes are to the new interface and no behavior changed for the multiple hierarchies" * 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (29 commits) cpuset: fix the WARN_ON() in update_nodemasks_hier() cgroup: initialize cgrp_dfl_root_inhibit_ss_mask from !->dfl_files test cgroup: make CFTYPE_ONLY_ON_DFL and CFTYPE_NO_ internal to cgroup core cgroup: distinguish the default and legacy hierarchies when handling cftypes cgroup: replace cgroup_add_cftypes() with cgroup_add_legacy_cftypes() cgroup: rename cgroup_subsys->base_cftypes to ->legacy_cftypes cgroup: split cgroup_base_files[] into cgroup_{dfl\|legacy}_base_files[] cpuset: export effective masks to userspace cpuset: allow writing offlined masks to cpuset.cpus/mems cpuset: enable onlined cpu/node in effective masks cpuset: refactor cpuset_hotplug_update_tasks() cpuset: make cs->{cpus, mems}_allowed as user-configured masks cpuset: apply cs->effective_{cpus,mems} cpuset: initialize top_cpuset's configured masks at mount cpuset: use effective cpumask to build sched domains cpuset: inherit ancestor's masks if effective_{cpus, mems} becomes empty cpuset: update cs->effective_{cpus, mems} when config changes cpuset: update cpuset->effective_{cpus,mems} at hotplug cpuset: add cs->effective_cpus and cs->effective_mems cgroup: clean up sane_behavior handling ...	2014-08-04 10:11:28 -07:00
Antonio Quartulli	0185dda640	batman-adv: prefer kmalloc_array to kmalloc when possible Reported by checkpatch with the following warning: WARNING: Prefer kmalloc_array over kmalloc with multiply Signed-off-by: Antonio Quartulli <antonio@meshcoding.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>	2014-08-04 16:02:10 +02:00
NeilBrown	122a8cda6a	SUNRPC: remove all refcounting of groupinfo from rpcauth_lookupcred current_cred() can only be changed by 'current', and cred->group_info is never changed. If a new group_info is needed, a new 'cred' is created. Consequently it is always safe to access current_cred()->group_info without taking any further references. So drop the refcounting and the incorrect rcu_dereference(). Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-08-04 09:22:08 -04:00
NeilBrown	bd95608053	sunrpc/auth: allow lockless (rcu) lookup of credential cache. The new flag RPCAUTH_LOOKUP_RCU to credential lookup avoids locking, does not take a reference on the returned credential, and returns -ECHILD if a simple lookup was not possible. The returned value can only be used within an rcu_read_lock protected region. The main user of this is the new rpc_lookup_cred_nonblock() which returns a pointer to the current credential which is only rcu-safe (no ref-count held), and might return -ECHILD if allocation was required. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-08-03 17:14:12 -04:00
Jeff Layton	ec25422c66	sunrpc: remove "ec" argument from encrypt_v2 operation It's always 0. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-08-03 17:05:24 -04:00
Jeff Layton	b36e9c44af	sunrpc: clean up sparse endianness warnings in gss_krb5_wrap.c Fix the endianness handling in gss_wrap_kerberos_v1 and drop the memset call there in favor of setting the filler bytes directly. In gss_wrap_kerberos_v2, get rid of the "ec" variable which is always zero, and drop the endianness conversion of 0. Sparse handles 0 as a special case, so it's not necessary. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-08-03 17:05:24 -04:00
Jeff Layton	6ac0fbbfc1	sunrpc: clean up sparse endianness warnings in gss_krb5_seal.c Use u16 pointer in setup_token and setup_token_v2. None of the fields are actually handled as __be16, so this simplifies the code a bit. Also get rid of some unneeded pointer increments. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-08-03 17:05:23 -04:00
Jeff Layton	c5e6aecd03	sunrpc: fix RCU handling of gc_ctx field The handling of the gc_ctx pointer only seems to be partially RCU-safe. The assignment and freeing are done using RCU, but many places in the code seem to dereference that pointer without proper RCU safeguards. Fix them to use rcu_dereference and to rcu_read_lock/unlock, and to properly handle the case where the pointer is NULL. Cc: Arnd Bergmann <arnd@arndb.de> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-08-03 17:05:23 -04:00
Trond Myklebust	9806755c56	Merge branch 'nfs-rdma' of git://git.linux-nfs.org/projects/anna/nfs-rdma into linux-next * 'nfs-rdma' of git://git.linux-nfs.org/projects/anna/nfs-rdma: (916 commits) xprtrdma: Handle additional connection events xprtrdma: Remove RPCRDMA_PERSISTENT_REGISTRATION macro xprtrdma: Make rpcrdma_ep_disconnect() return void xprtrdma: Schedule reply tasklet once per upcall xprtrdma: Allocate each struct rpcrdma_mw separately xprtrdma: Rename frmr_wr xprtrdma: Disable completions for LOCAL_INV Work Requests xprtrdma: Disable completions for FAST_REG_MR Work Requests xprtrdma: Don't post a LOCAL_INV in rpcrdma_register_frmr_external() xprtrdma: Reset FRMRs after a flushed LOCAL_INV Work Request xprtrdma: Reset FRMRs when FAST_REG_MR is flushed by a disconnect xprtrdma: Properly handle exhaustion of the rb_mws list xprtrdma: Chain together all MWs in same buffer pool xprtrdma: Back off rkey when FAST_REG_MR fails xprtrdma: Unclutter struct rpcrdma_mr_seg xprtrdma: Don't invalidate FRMRs if registration fails xprtrdma: On disconnect, don't ignore pending CQEs xprtrdma: Update rkeys after transport reconnect xprtrdma: Limit data payload size for ALLPHYSICAL xprtrdma: Protect ia->ri_id when unmapping/invalidating MRs ...	2014-08-03 17:04:51 -04:00
Trond Myklebust	bae6746ff3	SUNRPC: Enforce an upper limit on the number of cached credentials In some cases where the credentials are not often reused, we may want to limit their total number just in order to make the negative lookups in the hash table more manageable. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-08-03 16:02:50 -04:00
Thomas Graf	cfe4a9dda0	nftables: Convert nft_hash to use generic rhashtable The sizing of the hash table and the practice of requiring a lookup to retrieve the pprev to be stored in the element cookie before the deletion of an entry is left intact. Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Patrick McHardy <kaber@trash.net> Reviewed-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-02 19:49:38 -07:00
Thomas Graf	e341694e3e	netlink: Convert netlink_lookup() to use RCU protected hash table Heavy Netlink users such as Open vSwitch spend a considerable amount of time in netlink_lookup() due to the read-lock on nl_table_lock. Use of RCU relieves the lock contention. Makes use of the new resizable hash table to avoid locking on the lookup. The hash table will grow if entries exceeds 75% of table size up to a total table size of 64K. It will automatically shrink if usage falls below 30%. Also splits nl_table_lock into a separate mutex to protect hash table mutations and allow synchronize_rcu() to sleep while waiting for readers during expansion and shrinking. Before: 9.16% kpktgend_0 [openvswitch] [k] masked_flow_lookup 6.42% kpktgend_0 [pktgen] [k] mod_cur_headers 6.26% kpktgend_0 [pktgen] [k] pktgen_thread_worker 6.23% kpktgend_0 [kernel.kallsyms] [k] memset 4.79% kpktgend_0 [kernel.kallsyms] [k] netlink_lookup 4.37% kpktgend_0 [kernel.kallsyms] [k] memcpy 3.60% kpktgend_0 [openvswitch] [k] ovs_flow_extract 2.69% kpktgend_0 [kernel.kallsyms] [k] jhash2 After: 15.26% kpktgend_0 [openvswitch] [k] masked_flow_lookup 8.12% kpktgend_0 [pktgen] [k] pktgen_thread_worker 7.92% kpktgend_0 [pktgen] [k] mod_cur_headers 5.11% kpktgend_0 [kernel.kallsyms] [k] memset 4.11% kpktgend_0 [openvswitch] [k] ovs_flow_extract 4.06% kpktgend_0 [kernel.kallsyms] [k] _raw_spin_lock 3.90% kpktgend_0 [kernel.kallsyms] [k] jhash2 [...] 0.67% kpktgend_0 [kernel.kallsyms] [k] netlink_lookup Signed-off-by: Thomas Graf <tgraf@suug.ch> Reviewed-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-02 19:49:38 -07:00
David S. Miller	ae8694fa8a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net The following patchset contains Netfilter/IPVS fixes for your net tree, they are: 1) Maintain all DSCP and ECN bits for IPv6 tun forwarding. This resolves an inconsistency between IPv4 and IPv6 behaviour. Patch from Alex Gartrell via Simon Horman. 2) Fix unnoticeable blink in xt_LED when the led-always-blink option is used, from Jiri Prchal. 3) Add missing return in nft_del_setelem(), otherwise this results in a double call of nft_data_uninit() in the nf_tables code, from Thomas Graf. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-02 16:43:04 -07:00
Hannes Frederic Sowa	166bd890a3	ipv6: data of fwmark_reflect sysctl needs to be updated on netns construction Fixes: `e110861f86` ("net: add a sysctl to reflect the fwmark on replies") Cc: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-02 16:16:54 -07:00
Nikolay Aleksandrov	d4ad4d22e7	inet: frags: use kmem_cache for inet_frag_queue Use kmem_cache to allocate/free inet_frag_queue objects since they're all the same size per inet_frags user and are alloced/freed in high volumes thus making it a perfect case for kmem_cache. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-02 15:31:31 -07:00
Nikolay Aleksandrov	2e404f632f	inet: frags: use INET_FRAG_EVICTED to prevent icmp messages Now that we have INET_FRAG_EVICTED we might as well use it to stop sending icmp messages in the "frag_expire" functions instead of stripping INET_FRAG_FIRST_IN from their flags when evicting. Also fix the comment style in ip6_expire_frag_queue(). Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-02 15:31:31 -07:00
Nikolay Aleksandrov	f926e23660	inet: frags: fix function declaration alignments in inet_fragment Fix a couple of functions' declaration alignments. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-02 15:31:31 -07:00
Nikolay Aleksandrov	06aa8b8a03	inet: frags: rename last_in to flags The last_in field has been used to store various flags different from first/last frag in so give it a more descriptive name: flags. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-02 15:31:31 -07:00
Nikolay Aleksandrov	d2373862b3	inet: frags: use INC_STATS_BH in the ipv6 reassembly code Softirqs are already disabled so no need to do it again, thus let's be consistent and use the IP6_INC_STATS_BH variant. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-02 15:31:31 -07:00
Duan Jiong	188b1210f3	ipv4: remove nested rcu_read_lock/unlock ip_local_deliver_finish() already have a rcu_read_lock/unlock, so the rcu_read_lock/unlock is unnecessary. See the stack below: ip_local_deliver_finish | | ->icmp_rcv | | ->icmp_socket_deliver Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-02 15:27:35 -07:00
Alexei Starovoitov	7ae457c1e5	net: filter: split 'struct sk_filter' into socket and bpf parts clean up names related to socket filtering and bpf in the following way: - everything that deals with sockets keeps 'sk_' prefix - everything that is pure BPF is changed to 'bpf_' prefix split 'struct sk_filter' into struct sk_filter { atomic_t refcnt; struct rcu_head rcu; struct bpf_prog *prog; }; and struct bpf_prog { u32 jited:1, len:31; struct sock_fprog_kern *orig_prog; unsigned int (*bpf_func)(const struct sk_buff *skb, const struct bpf_insn *filter); union { struct sock_filter insns[0]; struct bpf_insn insnsi[0]; struct work_struct work; }; }; so that 'struct bpf_prog' can be used independent of sockets and cleans up 'unattached' bpf use cases split SK_RUN_FILTER macro into: SK_RUN_FILTER to be used with 'struct sk_filter *' and BPF_PROG_RUN to be used with 'struct bpf_prog *' __sk_filter_release(struct sk_filter *) gains __bpf_prog_release(struct bpf_prog *) helper function also perform related renames for the functions that work with 'struct bpf_prog *', since they're on the same lines: sk_filter_size -> bpf_prog_size sk_filter_select_runtime -> bpf_prog_select_runtime sk_filter_free -> bpf_prog_free sk_unattached_filter_create -> bpf_prog_create sk_unattached_filter_destroy -> bpf_prog_destroy sk_store_orig_filter -> bpf_prog_store_orig_filter sk_release_orig_filter -> bpf_release_orig_filter __sk_migrate_filter -> bpf_migrate_filter __sk_prepare_filter -> bpf_prepare_filter API for attaching classic BPF to a socket stays the same: sk_attach_filter(prog, struct sock *)/sk_detach_filter(struct sock *) and SK_RUN_FILTER(struct sk_filter *, ctx) to execute a program which is used by sockets, tun, af_packet API for 'unattached' BPF programs becomes: bpf_prog_create(struct bpf_prog **)/bpf_prog_destroy(struct bpf_prog *) and BPF_PROG_RUN(struct bpf_prog *, ctx) to execute a program which is used by isdn, ppp, team, seccomp, ptp, xt_bpf, cls_bpf, test_bpf Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-02 15:03:58 -07:00
Alexei Starovoitov	8fb575ca39	net: filter: rename sk_convert_filter() -> bpf_convert_filter() to indicate that this function is converting classic BPF into eBPF and not related to sockets Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-02 15:02:38 -07:00
Alexei Starovoitov	4df95ff488	net: filter: rename sk_chk_filter() -> bpf_check_classic() trivial rename to indicate that this functions performs classic BPF checking Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-02 15:02:38 -07:00
Alexei Starovoitov	009937e78a	net: filter: rename sk_filter_proglen -> bpf_classic_proglen trivial rename to better match semantics of macro Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-02 15:02:38 -07:00
Alexei Starovoitov	278571baca	net: filter: simplify socket charging attaching bpf program to a socket involves multiple socket memory arithmetic, since size of 'sk_filter' is changing when classic BPF is converted to eBPF. Also common path of program creation has to deal with two ways of freeing the memory. Simplify the code by delaying socket charging until program is ready and its size is known Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-02 15:02:37 -07:00
James Morris	103ae675b1	Merge branch 'next' of git://git.infradead.org/users/pcmoore/selinux into next	2014-08-02 22:58:02 +10:00
Thomas Graf	0dc1362562	netfilter: nf_tables: Avoid duplicate call to nft_data_uninit() for same key nft_del_setelem() currently calls nft_data_uninit() twice on the same key. Once to release the key which is guaranteed to be NFT_DATA_VALUE and a second time in the error path to which it falls through. The second call has been harmless so far though because the type passed is always NFT_DATA_VALUE which is currently a no-op. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-08-01 18:14:49 +02:00
Paul Moore	4fbe63d1c7	netlabel: shorter names for the NetLabel catmap funcs/structs Historically the NetLabel LSM secattr catmap functions and data structures have had very long names which makes a mess of the NetLabel code and anyone who uses NetLabel. This patch renames the catmap functions and structures from "_secattr_catmap_" to just "_catmap_" which improves things greatly. There are no substantial code or logic changes in this patch. Signed-off-by: Paul Moore <pmoore@redhat.com> Tested-by: Casey Schaufler <casey@schaufler-ca.com>	2014-08-01 11:17:37 -04:00
Paul Moore	d960a6184a	netlabel: fix the catmap walking functions The two NetLabel LSM secattr catmap walk functions didn't handle certain edge conditions correctly, causing incorrect security labels to be generated in some cases. This patch corrects these problems and converts the functions to use the new _netlbl_secattr_catmap_getnode() function in order to reduce the amount of repeated code. Cc: stable@vger.kernel.org Signed-off-by: Paul Moore <pmoore@redhat.com> Tested-by: Casey Schaufler <casey@schaufler-ca.com>	2014-08-01 11:17:29 -04:00
Paul Moore	4b8feff251	netlabel: fix the horribly broken catmap functions The NetLabel secattr catmap functions, and the SELinux import/export glue routines, were broken in many horrible ways and the SELinux glue code fiddled with the NetLabel catmap structures in ways that we probably shouldn't allow. At some point this "worked", but that was likely due to a bit of dumb luck and sub-par testing (both inflicted by yours truly). This patch corrects these problems by basically gutting the code in favor of something less obtuse and restoring the NetLabel abstractions in the SELinux catmap glue code. Everything is working now, and if it decides to break itself in the future this code will be much easier to debug than the code it replaces. One noteworthy side effect of the changes is that it is no longer necessary to allocate a NetLabel catmap before calling one of the NetLabel APIs to set a bit in the catmap. NetLabel will automatically allocate the catmap nodes when needed, resulting in less allocations when the lowest bit is greater than 255 and less code in the LSMs. Cc: stable@vger.kernel.org Reported-by: Christian Evans <frodox@zoho.com> Signed-off-by: Paul Moore <pmoore@redhat.com> Tested-by: Casey Schaufler <casey@schaufler-ca.com>	2014-08-01 11:17:17 -04:00
Paul Moore	41c3bd2039	netlabel: fix a problem when setting bits below the previously lowest bit The NetLabel category (catmap) functions have a problem in that they assume categories will be set in an increasing manner, e.g. the next category set will always be larger than the last. Unfortunately, this is not a valid assumption and could result in problems when attempting to set categories less than the startbit in the lowest catmap node. In some cases kernel panics and other nasties can result. This patch corrects the problem by checking for this and allocating a new catmap node instance and placing it at the front of the list. Cc: stable@vger.kernel.org Reported-by: Christian Evans <frodox@zoho.com> Signed-off-by: Paul Moore <pmoore@redhat.com> Tested-by: Casey Schaufler <casey@schaufler-ca.com>	2014-08-01 11:17:03 -04:00
Duan Jiong	4330487acf	net: use inet6_iif instead of IP6CB()->iif Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-31 22:37:06 -07:00
Vlad Yasevich	fcdfe3a7fa	net: Correctly set segment mac_len in skb_segment(). When performing segmentation, the mac_len value is copied right out of the original skb. However, this value is not always set correctly (like when the packet is VLAN-tagged) and we'll end up copying a bad value. One way to demonstrate this is to configure a VM which tags packets internally and turn off VLAN acceleration on the forwarding bridge port. The packets show up corrupt like this: 16:18:24.985548 52:54:00:ab:be:25 > 52:54:00:26:ce:a3, ethertype 802.1Q (0x8100), length 1518: vlan 100, p 0, ethertype 0x05e0, 0x0000: 8cdb 1c7c 8cdb 0064 4006 b59d 0a00 6402 ...|...d@.....d. 0x0010: 0a00 6401 9e0d b441 0a5e 64ec 0330 14fa ..d....A.^d..0.. 0x0020: 29e3 01c9 f871 0000 0101 080a 000a e833 )....q.........3 0x0030: 000f 8c75 6e65 7470 6572 6600 6e65 7470 ...unetperf.netp 0x0040: 6572 6600 6e65 7470 6572 6600 6e65 7470 erf.netperf.netp 0x0050: 6572 6600 6e65 7470 6572 6600 6e65 7470 erf.netperf.netp 0x0060: 6572 6600 6e65 7470 6572 6600 6e65 7470 erf.netperf.netp ... This also leads to awful throughput as GSO packets are dropped and cause retransmissions. The solution is to set the mac_len using the values already available in the new skb. We've already adjusted all of the header offset, so we might as well correctly figure out the mac_len using skb_reset_mac_len(). After this change, packets are segmented correctly and performance is restored. CC: Eric Dumazet <edumazet@google.com> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-31 22:28:39 -07:00
Tobias Klauser	74e83b23f2	netlink: Use PAGE_ALIGNED macro Use PAGE_ALIGNED(...) instead of IS_ALIGNED(..., PAGE_SIZE). Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-31 22:05:28 -07:00
Duan Jiong	7304fe4681	net: fix the counter ICMP_MIB_INERRORS/ICMP6_MIB_INERRORS When dealing with ICMPv[46] Error Message, function icmp_socket_deliver() and icmpv6_notify() do some valid checks on packet's length, but then some protocols check packet's length redundantly. So remove those duplicated statements, and increase counter ICMP_MIB_INERRORS/ICMP6_MIB_INERRORS in function icmp_socket_deliver() and icmpv6_notify() respectively. In addition, add missed counter in udp6/udplite6 when socket is NULL. Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-31 22:04:18 -07:00
Jason Gunthorpe	299ee123e1	sctp: Fixup v4mapped behaviour to comply with Sock API The SCTP socket extensions API document describes the v4mapping option as follows: 8.1.15. Set/Clear IPv4 Mapped Addresses (SCTP_I_WANT_MAPPED_V4_ADDR) This socket option is a Boolean flag which turns on or off the mapping of IPv4 addresses. If this option is turned on, then IPv4 addresses will be mapped to V6 representation. If this option is turned off, then no mapping will be done of V4 addresses and a user will receive both PF_INET6 and PF_INET type addresses on the socket. See [RFC3542] for more details on mapped V6 addresses. This description isn't really in line with what the code does though. Introduce addr_to_user (renamed addr_v4map), which should be called before any sockaddr is passed back to user space. The new function places the sockaddr into the correct format depending on the SCTP_I_WANT_MAPPED_V4_ADDR option. Audit all places that touched v4mapped and either sanely construct a v4 or v6 address then call addr_to_user, or drop the unnecessary v4mapped check entirely. Audit all places that call addr_to_user and verify they are on a syscall return path. Add a custom getname that formats the address properly. Several bugs are addressed: - SCTP_I_WANT_MAPPED_V4_ADDR=0 often returned garbage for addresses to user space - The addr_len returned from recvmsg was not correct when returning AF_INET on a v6 socket - flowlabel and scope_id were not zeroed when promoting a v4 to v6 - Some syscalls like bind and connect behaved differently depending on v4mapped Tested bind, getpeername, getsockname, connect, and recvmsg for proper behaviour in v4mapped = 1 and 0 cases. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Tested-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-31 21:49:06 -07:00
David S. Miller	a173e550c2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains netfilter updates for net-next, they are: 1) Add the reject expression for the nf_tables bridge family, this allows us to send explicit reject (TCP RST / ICMP dest unrech) to the packets matching a rule. 2) Simplify and consolidate the nf_tables set dumping logic. This uses netlink control->data to filter out depending on the request. 3) Perform garbage collection in xt_hashlimit using a workqueue instead of a timer, which is problematic when many entries are in place in the tables, from Eric Dumazet. 4) Remove leftover code from the removed ulog target support, from Paul Bolle. 5) Dump unmodified flags in the netfilter packet accounting when resetting counters, so userspace knows that a counter was in overquota situation, from Alexey Perevalov. 6) Fix wrong usage of the bitwise functions in nfnetlink_acct, also from Alexey. 7) Fix a crash when adding new set element with an empty NFTA_SET_ELEM_LIST attribute. This patchset also includes a couple of cleanups for xt_LED from Duan Jiong and for nf_conntrack_ipv4 (using coccinelle) from Himangi Saraogi. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-31 14:09:14 -07:00
Banerjee, Debabrata	388070faa1	tcp: don't require root to read tcp_metrics commit `d23ff7016` (tcp: add generic netlink support for tcp_metrics) introduced netlink support for the new tcp_metrics, however it restricted getting of tcp_metrics to root user only. This is a change from how these values could have been fetched when in the old route cache. Unless there's a legitimate reason to restrict the reading of these values it would be better if normal users could fetch them. Cc: Julian Anastasov <ja@ssi.bg> Cc: linux-kernel@vger.kernel.org Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-31 14:07:37 -07:00
Chuck Lever	8079fb785e	xprtrdma: Handle additional connection events Commit `38ca83a5` added RDMA_CM_EVENT_TIMEWAIT_EXIT. But that status is relevant only for consumers that re-use their QPs on new connections. xprtrdma creates a fresh QP on reconnection, so that event should be explicitly ignored. Squelch the alarming "unexpected CM event" message. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2014-07-31 16:22:59 -04:00
Chuck Lever	a779ca5fa7	xprtrdma: Remove RPCRDMA_PERSISTENT_REGISTRATION macro Clean up. RPCRDMA_PERSISTENT_REGISTRATION was a compile-time switch between RPCRDMA_REGISTER mode and RPCRDMA_ALLPHYSICAL mode. Since RPCRDMA_REGISTER has been removed, there's no need for the extra conditional compilation. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2014-07-31 16:22:59 -04:00
Chuck Lever	282191cb72	xprtrdma: Make rpcrdma_ep_disconnect() return void Clean up: The return code is used only for dprintk's that are already redundant. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2014-07-31 16:22:58 -04:00
Chuck Lever	bb96193d91	xprtrdma: Schedule reply tasklet once per upcall Minor optimization: grab rpcrdma_tk_lock_g and disable hard IRQs just once after clearing the receive completion queue. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2014-07-31 16:22:58 -04:00
Chuck Lever	2e84522c2e	xprtrdma: Allocate each struct rpcrdma_mw separately Currently rpcrdma_buffer_create() allocates struct rpcrdma_mw's as a single contiguous area of memory. It amounts to quite a bit of memory, and there's no requirement for these to be carved from a single piece of contiguous memory. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2014-07-31 16:22:57 -04:00
Chuck Lever	f590e878c5	xprtrdma: Rename frmr_wr Clean up: Name frmr_wr after the opcode of the Work Request, consistent with the send and local invalidation paths. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2014-07-31 16:22:57 -04:00
Chuck Lever	dab7e3b8da	xprtrdma: Disable completions for LOCAL_INV Work Requests Instead of relying on a completion to change the state of an FRMR to FRMR_IS_INVALID, set it in advance. If an error occurs, a completion will fire anyway and mark the FRMR FRMR_IS_STALE. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2014-07-31 16:22:57 -04:00
Chuck Lever	050557220e	xprtrdma: Disable completions for FAST_REG_MR Work Requests Instead of relying on a completion to change the state of an FRMR to FRMR_IS_VALID, set it in advance. If an error occurs, a completion will fire anyway and mark the FRMR FRMR_IS_STALE. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2014-07-31 16:22:56 -04:00
Chuck Lever	440ddad51b	xprtrdma: Don't post a LOCAL_INV in rpcrdma_register_frmr_external() Any FRMR arriving in rpcrdma_register_frmr_external() is now guaranteed to be either invalid, or to be targeted by a queued LOCAL_INV that will invalidate it before the adapter processes the FAST_REG_MR being built here. The problem with current arrangement of chaining a LOCAL_INV to the FAST_REG_MR is that if the transport is not connected, the LOCAL_INV is flushed and the FAST_REG_MR is flushed. This leaves the FRMR valid with the old rkey. But rpcrdma_register_frmr_external() has already bumped the in-memory rkey. Next time through rpcrdma_register_frmr_external(), a LOCAL_INV and FAST_REG_MR is attempted again because the FRMR is still valid. But the rkey no longer matches the hardware's rkey, and a memory management operation error occurs. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2014-07-31 16:22:56 -04:00
Chuck Lever	ddb6bebcc6	xprtrdma: Reset FRMRs after a flushed LOCAL_INV Work Request When a LOCAL_INV Work Request is flushed, it leaves an FRMR in the VALID state. This FRMR can be returned by rpcrdma_buffer_get(), and must be knocked down in rpcrdma_register_frmr_external() before it can be re-used. Instead, capture these in rpcrdma_buffer_get(), and reset them. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2014-07-31 16:22:55 -04:00
Chuck Lever	9f9d802a28	xprtrdma: Reset FRMRs when FAST_REG_MR is flushed by a disconnect FAST_REG_MR Work Requests update a Memory Region's rkey. Rkey's are used to block unwanted access to the memory controlled by an MR. The rkey is passed to the receiver (the NFS server, in our case), and is also used by xprtrdma to invalidate the MR when the RPC is complete. When a FAST_REG_MR Work Request is flushed after a transport disconnect, xprtrdma cannot tell whether the WR actually hit the adapter or not. So it is indeterminant at that point whether the existing rkey is still valid. After the transport connection is re-established, the next FAST_REG_MR or LOCAL_INV Work Request against that MR can sometimes fail because the rkey value does not match what xprtrdma expects. The only reliable way to recover in this case is to deregister and register the MR before it is used again. These operations can be done only in a process context, so handle it in the transport connect worker. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2014-07-31 16:22:55 -04:00
Chuck Lever	c2922c0235	xprtrdma: Properly handle exhaustion of the rb_mws list If the rb_mws list is exhausted, clean up and return NULL so that call_allocate() will delay and try again. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2014-07-31 16:22:55 -04:00
Chuck Lever	3111d72c7c	xprtrdma: Chain together all MWs in same buffer pool During connection loss recovery, need to visit every MW in a buffer pool. Any MW that is in use by an RPC will not be on the rb_mws list. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2014-07-31 16:22:54 -04:00
Chuck Lever	c93e986a29	xprtrdma: Back off rkey when FAST_REG_MR fails If posting a FAST_REG_MR Work Request fails, revert the rkey update to avoid subsequent IB_WC_MW_BIND_ERR completions. Suggested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2014-07-31 16:22:54 -04:00
Chuck Lever	0dbb4108a6	xprtrdma: Unclutter struct rpcrdma_mr_seg Clean ups: - make it obvious that the rl_mw field is a pointer -- allocated separately, not as part of struct rpcrdma_mr_seg - promote "struct {} frmr;" to a named type - promote the state enum to a named type - name the MW state field the same way other fields in rpcrdma_mw are named Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2014-07-31 16:22:54 -04:00
Chuck Lever	539431a437	xprtrdma: Don't invalidate FRMRs if registration fails If FRMR registration fails, it's likely to transition the QP to the error state. Or, registration may have failed because the QP is _already_ in ERROR. Thus calling rpcrdma_deregister_external() in rpcrdma_create_chunks() is useless in FRMR mode: the LOCAL_INVs just get flushed. It is safe to leave existing registrations: when FRMR registration is tried again, rpcrdma_register_frmr_external() checks if each FRMR is already/still VALID, and knocks it down first if it is. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2014-07-31 16:22:53 -04:00
Chuck Lever	a7bc211ac9	xprtrdma: On disconnect, don't ignore pending CQEs xprtrdma is currently throwing away queued completions during a reconnect. RPC replies posted just before connection loss, or successful completions that change the state of an FRMR, can be missed. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2014-07-31 16:22:53 -04:00
Chuck Lever	6ab59945f2	xprtrdma: Update rkeys after transport reconnect Various reports of: rpcrdma_qp_async_error_upcall: QP error 3 on device mlx4_0 ep ffff8800bfd3e848 Ensure that rkeys in already-marshalled RPC/RDMA headers are refreshed after the QP has been replaced by a reconnect. BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=249 Suggested-by: Selvin Xavier <Selvin.Xavier@Emulex.Com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2014-07-31 16:22:53 -04:00
Chuck Lever	43e9598817	xprtrdma: Limit data payload size for ALLPHYSICAL When the client uses physical memory registration, each page in the payload gets its own array entry in the RPC/RDMA header's chunk list. Therefore, don't advertise a maximum payload size that would require more array entries than can fit in the RPC buffer where RPC/RDMA headers are built. BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=248 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2014-07-31 16:22:52 -04:00
Chuck Lever	73806c8832	xprtrdma: Protect ia->ri_id when unmapping/invalidating MRs Ensure ia->ri_id remains valid while invoking dma_unmap_page() or posting LOCAL_INV during a transport reconnect. Otherwise, ia->ri_id->device or ia->ri_id->qp is NULL, which triggers a panic. BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=259 Fixes: `ec62f40` 'xprtrdma: Ensure ia->ri_id->qp is not NULL when reconnecting' Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2014-07-31 16:22:52 -04:00
Chuck Lever	5fc83f470d	xprtrdma: Fix panic in rpcrdma_register_frmr_external() seg1->mr_nsegs is not yet initialized when it is used to unmap segments during an error exit. Use the same unmapping logic for all error exits. "if (frmr_wr.wr.fast_reg.length < len) {" used to be a BUG_ON check. The broken code will never be executed under normal operation. Fixes: `c977dea` (xprtrdma: Remove BUG_ON() call sites) Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2014-07-31 16:22:52 -04:00
Toshiaki Makita	47fab41ab5	bridge: Don't include NDA_VLAN for FDB entries with vid 0 An FDB entry with vlan_id 0 doesn't mean it is used in vlan 0, but used when vlan_filtering is disabled. There is inconsistency around NDA_VLAN whose payload is 0 - even if we add an entry by RTM_NEWNEIGH without any NDA_VLAN, and even though adding an entry with NDA_VLAN 0 is prohibited, we get an entry with NDA_VLAN 0 by RTM_GETNEIGH. Dumping an FDB entry with vlan_id 0 shouldn't include NDA_VLAN. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-31 12:18:44 -07:00
Pablo Neira Ayuso	7d5570ca89	netfilter: nf_tables: check for unset NFTA_SET_ELEM_LIST_ELEMENTS attribute Otherwise, the kernel oopses in nla_for_each_nested when iterating over the unset attribute NFTA_SET_ELEM_LIST_ELEMENTS in the nf_tables_{new,del}setelem() path. netlink: 65524 bytes leftover after parsing attributes in process `nft'. [...] Oops: 0000 [#1] SMP [...] CPU: 2 PID: 6287 Comm: nft Not tainted 3.16.0-rc2+ #169 RIP: 0010:[<ffffffffa0526e61>] [<ffffffffa0526e61>] nf_tables_newsetelem+0x82/0xec [nf_tables] [...] Call Trace: [<ffffffffa05178c4>] nfnetlink_rcv+0x2e7/0x3d7 [nfnetlink] [<ffffffffa0517939>] ? nfnetlink_rcv+0x35c/0x3d7 [nfnetlink] [<ffffffff8137d300>] netlink_unicast+0xf8/0x17a [<ffffffff8137d6a5>] netlink_sendmsg+0x323/0x351 [...] Fix this by returning -EINVAL if this attribute is not set, which doesn't make sense at all since those commands are there to add and to delete elements from the set. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-07-31 21:11:43 +02:00
Alexey Perevalov	b6d0468804	netfilter: nfnetlink_acct: avoid using NFACCT_F_OVERQUOTA with bit helper functions Bit helper functions were used for manipulation with NFACCT_F_OVERQUOTA, but they accept a bit position, not a bit mask. As a result, not the third bit for NFACCT_F_OVERQUOTA was set, but the fourth. Such behaviour was dangerous and could lead to an unexpected overquota report result. Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-07-31 19:55:47 +02:00
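
The mask-versus-position mix-up described in b6d0468804 is easy to reproduce in user space. The sketch below uses a hypothetical flag `DEMO_F_OVERQUOTA` defined as a mask and a non-atomic stand-in `demo_set_bit()` that, like the kernel's `set_bit()`, takes a bit number. The concrete flag value is an assumption chosen for illustration.

```c
#include <assert.h>

/* Hypothetical flag defined as a mask: value 0x4, bit number 2. */
#define DEMO_F_OVERQUOTA (1UL << 2)

/* Non-atomic stand-in for a set_bit()-style helper: nr is a bit NUMBER. */
static void demo_set_bit(unsigned int nr, unsigned long *addr)
{
	*addr |= 1UL << nr;
}

/* Buggy pattern: the mask (4) is passed where a bit number is expected. */
static unsigned long mark_overquota_buggy(unsigned long flags)
{
	demo_set_bit(DEMO_F_OVERQUOTA, &flags);	/* sets bit 4, i.e. 0x10 */
	return flags;
}

/* Fixed pattern: treat the flag as the mask it is. */
static unsigned long mark_overquota_fixed(unsigned long flags)
{
	return flags | DEMO_F_OVERQUOTA;
}
```

In the buggy variant a later check of the form `flags & DEMO_F_OVERQUOTA` never sees the flag, because a different bit was set.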
David S. Miller	ccda4a77f3	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2014-07-30 This is the last pull request for ipsec-next before I'll be off for two weeks starting on friday. David, can you please take urgent ipsec patches directly into net/net-next during this time? 1) Error handling simplifications for vti and vti6. From Mathias Krause. 2) Remove a duplicate semicolon after a return statement. From Christoph Paasch. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-30 20:05:54 -07:00
Pablo Neira	34c5bd66e5	net: filter: don't release unattached filter through call_rcu() sk_unattached_filter_destroy() does not always need to release the filter object via rcu. Since this filter is never attached to the socket, the caller should be responsible for releasing the filter in a safe way, which may not necessarily imply rcu. This is a short summary of clients of this function: 1) xt_bpf.c and cls_bpf.c use the bpf matchers from rules, these rules are removed from the packet path before the filter is released. Thus, the framework makes sure the filter is safely removed. 2) In the ppp driver, the ppp_lock ensures serialization between the xmit and filter attachment/detachment path. This doesn't use rcu so deferred release via rcu makes no sense. 3) In the isdn/ppp driver, it is called from isdn_ppp_release() the isdn_ppp_ioctl(). This driver uses mutex and spinlocks, no rcu. Thus, deferred rcu makes no sense to me either, the deferred releases may be just masking the effects of wrong locking strategy, which should be fixed in the driver itself. 4) In the team driver, this is the only place where the rcu synchronization with unattached filter is used. Therefore, this patch introduces synchronize_rcu() which is called from the genetlink path to make sure the filter doesn't go away while packets are still walking over it. I think we can revisit this once struct bpf_prog (that only wraps specific bpf code bits) is in place, then add some specific struct rcu_head in the scope of the team driver if Jiri thinks this is needed. Deferred rcu release for unattached filters was originally introduced in `302d663` ("filter: Allow to create sk-unattached filters"). Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-30 19:56:27 -07:00
Thomas Graf	80019d310f	net: Remove unlikely() for WARN_ON() conditions No need for the unlikely(), WARN_ON() and BUG_ON() internally use unlikely() on the condition. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-30 17:41:47 -07:00
Christoph Paasch	1f74e613de	tcp: Fix integer-overflow in TCP vegas In vegas we do a multiplication of the cwnd and the rtt. This may overflow and thus their result is stored in a u64. However, we first need to cast the cwnd so that actually 64-bit arithmetic is done. Then, we need to do do_div to allow this to be used on 32-bit arches. Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Neal Cardwell <ncardwell@google.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David Laight <David.Laight@ACULAB.COM> Cc: Doug Leith <doug.leith@nuim.ie> Fixes: `8d3a564da3` (tcp: tcp_vegas cong avoid fix) Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-30 17:31:06 -07:00
Christoph Paasch	45a07695bc	tcp: Fix integer-overflows in TCP veno In veno we do a multiplication of the cwnd and the rtt. This may overflow and thus their result is stored in a u64. However, we first need to cast the cwnd so that actually 64-bit arithmetic is done. A first attempt at fixing `76f1017757` ([TCP]: TCP Veno congestion control) was made by `159131149c` (tcp: Overflow bug in Vegas), but it failed to add the required cast in tcp_veno_cong_avoid(). Fixes: `76f1017757` ([TCP]: TCP Veno congestion control) Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-30 17:31:06 -07:00
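
The overflow described in the Vegas and Veno fixes above comes down to where the widening cast sits. The following is a minimal user-space sketch, not the kernel code: the function names are hypothetical, and plain `/` is used where the kernel would need `do_div()` on 32-bit architectures.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Storing the result in a 64-bit variable is not enough: both operands are
 * 32-bit, so the multiplication is done in 32 bits and wraps before it is
 * widened.
 */
static uint64_t product_truncated(uint32_t cwnd, uint32_t rtt)
{
	return cwnd * rtt;
}

/* Casting one operand first forces 64-bit arithmetic. */
static uint64_t product_widened(uint32_t cwnd, uint32_t rtt)
{
	return (uint64_t)cwnd * rtt;
}

/* Hypothetical target-window computation: cwnd * base_rtt / rtt. */
static uint64_t demo_target_cwnd(uint32_t cwnd, uint32_t base_rtt, uint32_t rtt)
{
	return (uint64_t)cwnd * base_rtt / rtt;
}
```

For example, 100000 * 100000 does not fit in 32 bits, so only the widened variant returns 10000000000.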
Anish Bhatt	16eecd9be4	dcbnl : Fix misleading dcb_app->priority explanation Current explanation of dcb_app->priority is wrong. It says priority is expected to be a 3-bit unsigned integer which is only true when working with DCBx-IEEE. Use of dcb_app->priority by DCBx-CEE expects it to be 802.1p user priority bitmap. Updated accordingly This affects the cxgb4 driver, but I will post those changes as part of a larger changeset shortly. Fixes: `3e29027af4` ("dcbnl: add support for ieee8021Qaz attributes") Signed-off-by: Anish Bhatt <anish@chelsio.com> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-30 17:21:05 -07:00
Dmitry Popov	95cb574598	ip_tunnel(ipv4): fix tunnels with "local any remote $remote_ip" Ipv4 tunnels created with "local any remote $ip" didn't work properly since `7d442fab0` (ipv4: Cache dst in tunnels). 99% of packets sent via those tunnels had src addr = 0.0.0.0. That was because only dst_entry was cached, although fl4.saddr has to be cached too. Every time ip_tunnel_xmit used cached dst_entry (tunnel_rtable_get returned non-NULL), fl4.saddr was initialized with tnl_params->saddr (= 0 in our case), and wasn't changed until iptunnel_xmit(). This patch adds saddr to ip_tunnel->dst_cache, fixing this issue. Reported-by: Sergey Popov <pinkbyte@gentoo.org> Signed-off-by: Dmitry Popov <ixaphire@qrator.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-30 15:18:58 -07:00
David S. Miller	f139c74a8d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-30 13:25:49 -07:00
Johan Hedberg	82c295b1b0	Bluetooth: Always use non-bonding requirement when not bondable When we're not bondable we should never send any other SSP authentication requirement besides one of the non-bonding ones. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-07-30 19:28:41 +02:00
Johan Hedberg	b2939475eb	Bluetooth: Rename pairable mgmt setting to bondable This setting maps to the HCI_BONDABLE flag which tracks whether we're bondable or not. Therefore, rename the mgmt setting and respective command accordingly. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-07-30 19:28:41 +02:00
Johan Hedberg	b6ae8457ac	Bluetooth: Rename HCI_PAIRABLE to HCI_BONDABLE The HCI_PAIRABLE flag isn't actually controlling whether we're pairable but whether we're bondable. Therefore, rename it accordingly. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-07-30 19:28:41 +02:00
Marcel Holtmann	bdb9434664	Bluetooth: Fix sparse warning from HID new leds handling The new leds bit handling produces this sparse warning. CHECK net/bluetooth/hidp/core.c net/bluetooth/hidp/core.c:156:60: warning: dubious: x \| !y Just fix it by doing an explicit x << 0 shift operation. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2014-07-30 19:28:41 +02:00
Johan Hedberg	6f78fd4bb9	Bluetooth: Fix check for connected state when pairing Both BT_CONNECTED and BT_CONFIG state mean that we have a baseband link available. We should therefore check for either of these when pairing and deciding whether to call hci_conn_security() directly. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-07-30 19:28:41 +02:00
Marcel Holtmann	3fa71fe0b9	6lowpan: iphc: Fix parenthesis alignments which off-by-one CHECK: Alignment should match open parenthesis + if (((hdr->flow_lbl[0] & 0x0F) == 0) && + (hdr->flow_lbl[1] == 0) && (hdr->flow_lbl[2] == 0)) { CHECK: Alignment should match open parenthesis + if ((hdr->priority == 0) && + ((hdr->flow_lbl[0] & 0xF0) == 0)) { CHECK: Alignment should match open parenthesis + if ((hdr->priority == 0) && + ((hdr->flow_lbl[0] & 0xF0) == 0)) { Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2014-07-30 19:28:40 +02:00
Marcel Holtmann	9ab9bb009c	6lowpan: iphc: Fix missing braces for if statement CHECK: braces {} should be used on all arms of this statement + if ((iphc0 & 0x03) != LOWPAN_IPHC_TTL_I) [...] + else { [...] Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2014-07-30 19:28:40 +02:00
Marcel Holtmann	26fff593cd	6lowpan: iphc: Fix missing blank line after variable declarations WARNING: Missing a blank line after declarations + struct sk_buff *new; + if (uncompress_udp_header(skb, &uh)) Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2014-07-30 19:28:40 +02:00
Marcel Holtmann	7fc4cfda75	6lowpan: iphc: Fix issues with alignment matching open parenthesis This patch fixes all the issues with alignment matching of open parenthesis found by checkpatch.pl and makes them follow the network coding style now. CHECK: Alignment should match open parenthesis +static int uncompress_addr(struct sk_buff skb, + struct in6_addr ipaddr, const u8 address_mode, CHECK: Alignment should match open parenthesis +static int uncompress_context_based_src_addr(struct sk_buff skb, + struct in6_addr ipaddr, CHECK: Alignment should match open parenthesis +static int skb_deliver(struct sk_buff skb, struct ipv6hdr hdr, + struct net_device dev, skb_delivery_cb deliver_skb) CHECK: Alignment should match open parenthesis + new = skb_copy_expand(skb, sizeof(struct ipv6hdr), skb_tailroom(skb), + GFP_ATOMIC); CHECK: Alignment should match open parenthesis + raw_dump_table(__func__, "raw skb data dump before receiving", + new->data, new->len); CHECK: Alignment should match open parenthesis +lowpan_uncompress_multicast_daddr(struct sk_buff skb, + struct in6_addr ipaddr, CHECK: Alignment should match open parenthesis + raw_dump_inline(NULL, "Reconstructed ipv6 multicast addr is", + ipaddr->s6_addr, 16); CHECK: Alignment should match open parenthesis +int lowpan_process_data(struct sk_buff skb, struct net_device dev, + const u8 saddr, const u8 saddr_type, const u8 saddr_len, CHECK: Alignment should match open parenthesis + raw_dump_table(__func__, "raw skb data dump uncompressed", + skb->data, skb->len); CHECK: Alignment should match open parenthesis + err = uncompress_addr(skb, &hdr.saddr, tmp, saddr, + saddr_type, saddr_len); CHECK: Alignment should match open parenthesis + err = uncompress_addr(skb, &hdr.daddr, tmp, daddr, + daddr_type, daddr_len); CHECK: Alignment should match open parenthesis + pr_debug("dest: stateless compression mode %d dest %pI6c\n", + tmp, &hdr.daddr); CHECK: Alignment should match open parenthesis + raw_dump_table(__func__, "raw UDP 
header dump", + (u8 )&uh, sizeof(uh)); CHECK: Alignment should match open parenthesis + raw_dump_table(__func__, "raw header dump", (u8 )&hdr, + sizeof(hdr)); CHECK: Alignment should match open parenthesis +int lowpan_header_compress(struct sk_buff skb, struct net_device dev, + unsigned short type, const void *_daddr, CHECK: Alignment should match open parenthesis + raw_dump_table(__func__, "raw skb network header dump", + skb_network_header(skb), sizeof(struct ipv6hdr)); CHECK: Alignment should match open parenthesis + raw_dump_table(__func__, + "sending raw skb network uncompressed packet", CHECK: Alignment should match open parenthesis + if (((hdr->flow_lbl[0] & 0x0F) == 0) && + (hdr->flow_lbl[1] == 0) && (hdr->flow_lbl[2] == 0)) { WARNING: quoted string split across lines + pr_debug("dest address unicast link-local %pI6c " + "iphc1 0x%02x\n", &hdr->daddr, iphc1); CHECK: Alignment should match open parenthesis + raw_dump_table(__func__, "raw skb data dump compressed", + skb->data, skb->len); Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2014-07-30 19:28:40 +02:00
Marcel Holtmann	89f534905a	6lowpan: iphc: Fix block comments to match networking style This patch fixes all the block comment issues found by checkpatch.pl and makes them match the network style now. WARNING: networking block comments don't use an empty /* line, use /* Comment... +/* + * Based on patches from Jon Smirl <jonsmirl@gmail.com> WARNING: networking block comments don't use an empty /* line, use /* Comment... +/* + * Uncompress address function for source and WARNING: networking block comments don't use an empty /* line, use /* Comment... +/* + * Uncompress address function for source context WARNING: networking block comments don't use an empty /* line, use /* Comment... + /* + * UDP lenght needs to be infered from the lower layers WARNING: networking block comments don't use an empty /* line, use /* Comment... + /* + * Traffic Class and FLow Label carried in-line WARNING: networking block comments don't use an empty /* line, use /* Comment... + /* + * Traffic class carried in-line WARNING: networking block comments don't use an empty /* line, use /* Comment... + /* + * Flow Label carried in-line WARNING: networking block comments don't use an empty /* line, use /* Comment... + /* + * replace the compressed UDP head by the uncompressed UDP WARNING: networking block comments don't use an empty /* line, use /* Comment... + /* + * As we copy some bit-length fields, in the IPHC encoding bytes, WARNING: networking block comments don't use an empty /* line, use /* Comment... + /* + * Traffic class, flow label WARNING: networking block comments don't use an empty /* line, use /* Comment... + /* + * Hop limit Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2014-07-30 19:28:40 +02:00
Alexander Aring	b2e3a479a6	6lowpan: iphc: remove check on null This memory is placed on stack and can't be null so remove the check on null. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-07-30 19:28:39 +02:00
Alexander Aring	556a5bfc03	6lowpan: iphc: use ipv6 api to check address scope This patch removes the custom checks for link-layer, broadcast and any address types and uses the IPv6 API for that. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-07-30 19:28:39 +02:00
Alexander Aring	85c71240a3	6lowpan: iphc: cleanup use of lowpan_push_hc_data This patch uses the lowpan_push_hc_data functions in several places where we can use it. The lowpan_push_hc_data was introduced in some previous patches. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-07-30 19:28:39 +02:00
Alexander Aring	4ebc960f94	6lowpan: iphc: cleanup use of lowpan_fetch_skb We introduced the lowpan_fetch_skb function in some previous patches for 6lowpan to have a generic fetch function. This patch drops the old function and uses the generic lowpan_fetch_skb one. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-07-30 19:28:39 +02:00
Alexander Aring	8ec1d9be32	6lowpan: iphc: use sizeof in udp uncompression Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-07-30 19:28:39 +02:00
Alexander Aring	84ca5e036f	6lowpan: iphc: rename hc06_ptr pointer to hc_ptr The hc06_ptr pointer variable stands for header compression draft-06. We are mostly RFC compliant. This patch renames the variable to normal hc_ptr. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-07-30 19:28:39 +02:00
Johan Hedberg	616d55be4c	Bluetooth: Fix SMP context tracking leading to a kernel crash The HCI_CONN_LE_SMP_PEND flag is supposed to indicate whether we have an SMP context or not. If the context creation fails, or some other error is indicated between setting the flag and creating the context the flag must be cleared first. This patch ensures that smp_chan_create() clears the flag in case of allocation failure as well as reorders code in smp_cmd_security_req() that could lead to returning an error between setting the flag and creating the context. Without the patch the following kind of kernel crash could be observed (this one because of unacceptable authentication requirements in a Security Request): [ +0.000855] kernel BUG at net/bluetooth/smp.c:606! [ +0.000000] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC [ +0.000000] CPU: 0 PID: 58 Comm: kworker/u5:2 Tainted: G W 3.16.0-rc1+ #785 [ +0.008391] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ +0.000000] Workqueue: hci0 hci_rx_work [ +0.000000] task: f4dc8f90 ti: f4ef0000 task.ti: f4ef0000 [ +0.000000] EIP: 0060:[<c13432b6>] EFLAGS: 00010246 CPU: 0 [ +0.000000] EIP is at smp_chan_destroy+0x1e/0x145 [ +0.000709] EAX: f46db870 EBX: 00000000 ECX: 00000000 EDX: 00000005 [ +0.000000] ESI: f46db870 EDI: f46db870 EBP: f4ef1dc0 ESP: f4ef1db0 [ +0.000000] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 [ +0.000000] CR0: 8005003b CR2: b666b0b0 CR3: 00022000 CR4: 00000690 [ +0.000000] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 [ +0.000000] DR6: fffe0ff0 DR7: 00000400 [ +0.000000] Stack: [ +0.000000] 00000005 f17b7840 f46db870 f4ef1dd4 f4ef1de4 c1343441 c134342e 00000000 [ +0.000000] c1343441 00000005 00000002 00000000 f17b7840 f4ef1e38 c134452a 00002aae [ +0.000000] 01ef1e00 00002aae f46bd980 f46db870 00000039 ffffffff 00000007 f4ef1e34 [ +0.000000] Call Trace: [ +0.000000] [<c1343441>] smp_failure+0x64/0x6c [ +0.000000] [<c134342e>] ? smp_failure+0x51/0x6c [ +0.000000] [<c1343441>] ? 
smp_failure+0x64/0x6c [ +0.000000] [<c134452a>] smp_sig_channel+0xad6/0xafc [ +0.000000] [<c1053b61>] ? vprintk_emit+0x343/0x366 [ +0.000000] [<c133f34e>] l2cap_recv_frame+0x1337/0x1ac4 [ +0.000000] [<c133f34e>] ? l2cap_recv_frame+0x1337/0x1ac4 [ +0.000000] [<c1172307>] ? __dynamic_pr_debug+0x3e/0x40 [ +0.000000] [<c11702a1>] ? debug_smp_processor_id+0x12/0x14 [ +0.000000] [<c1340bc9>] l2cap_recv_acldata+0xe8/0x239 [ +0.000000] [<c1340bc9>] ? l2cap_recv_acldata+0xe8/0x239 [ +0.000000] [<c1169931>] ? __const_udelay+0x1a/0x1c [ +0.000000] [<c131f120>] hci_rx_work+0x1a1/0x286 [ +0.000000] [<c137244e>] ? mutex_unlock+0x8/0xa [ +0.000000] [<c131f120>] ? hci_rx_work+0x1a1/0x286 [ +0.000000] [<c1038fe5>] process_one_work+0x128/0x1df [ +0.000000] [<c1038fe5>] ? process_one_work+0x128/0x1df [ +0.000000] [<c10392df>] worker_thread+0x222/0x2de [ +0.000000] [<c10390bd>] ? process_scheduled_works+0x21/0x21 [ +0.000000] [<c103d34c>] kthread+0x82/0x87 [ +0.000000] [<c1040000>] ? create_new_namespaces+0x90/0x105 [ +0.000000] [<c13738e1>] ret_from_kernel_thread+0x21/0x30 [ +0.000000] [<c103d2ca>] ? __kthread_parkme+0x50/0x50 [ +0.000000] Code: 65 f4 89 f0 5b 5e 5f 5d 8d 67 f8 5f c3 57 8d 7c 24 08 83 e4 f8 ff 77 fc 55 89 e5 57 89 c7 56 53 52 8b 98 e0 00 00 00 85 db 75 02 <0f> 0b 8b b3 80 00 00 00 8b 00 c1 ee 03 83 e6 01 89 f2 e8 ef 09 [ +0.000000] EIP: [<c13432b6>] smp_chan_destroy+0x1e/0x145 SS:ESP 0068:f4ef1db0 Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-07-30 19:28:38 +02:00
Alexey Perevalov	d24675cb1f	netfilter: nfnetlink_acct: dump unmodified nfacct flags NFNL_MSG_ACCT_GET_CTRZERO modifies dumped flags, in this case client see unmodified (uncleared) counter value and cleared overquota state - end user doesn't know anything about overquota state, unless end user subscribed on overquota report. Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-07-30 18:16:56 +02:00
Eric W. Biederman	728dba3a39	namespaces: Use task_lock and not rcu to protect nsproxy The synchronous synchronize_rcu in switch_task_namespaces makes setns a sufficiently expensive system call that people have complained. Upon inspection nsproxy no longer needs rcu protection for remote reads. Remote reads are rare. So optimize for same process reads and writes by switching to task_lock instead. This yields a simpler to understand lock, and a faster setns system call. In particular this fixes a performance regression observed by Rafael David Tinoco <rafael.tinoco@canonical.com>. This is effectively a revert of Pavel Emelyanov's commit `cf7b708c8d` Make access to task's nsproxy lighter from 2007. The race this originally fixed no longer exists as do_notify_parent uses task_active_pid_ns(parent) instead of parent->nsproxy. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2014-07-29 18:08:50 -07:00
Karoly Kemeny	c54a5e0247	ipv4: clean up cast warning in do_ip_getsockopt Sparse warns because of implicit pointer cast. v2: subject line correction, space between "void" and "*" Signed-off-by: Karoly Kemeny <karoly.kemeny@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-29 16:31:16 -07:00
Wei Yongjun	ad025a56a5	tipc: remove duplicated include from socket.c Remove duplicated include. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-29 15:51:14 -07:00
Himangi Saraogi	27446442a8	net/udp_offload: Use IS_ERR_OR_NULL This patch introduces the use of the macro IS_ERR_OR_NULL in place of tests for NULL and IS_ERR. The following Coccinelle semantic patch was used for making the change: @@ expression e; @@ - e == NULL \|\| IS_ERR(e) + IS_ERR_OR_NULL(e) \|\| ... Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Acked-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-29 15:31:56 -07:00
Himangi Saraogi	d0e992aa02	openvswitch: Use IS_ERR_OR_NULL This patch introduces the use of the macro IS_ERR_OR_NULL in place of tests for NULL and IS_ERR. The following Coccinelle semantic patch was used for making the change: @@ expression e; @@ - e == NULL \|\| IS_ERR(e) + IS_ERR_OR_NULL(e) \|\| ... Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Acked-by: Julia Lawall <julia.lawall@lip6.fr> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-29 15:31:56 -07:00
Himangi Saraogi	5a8dbf03dd	net/ipv4: Use IS_ERR_OR_NULL This patch introduces the use of the macro IS_ERR_OR_NULL in place of tests for NULL and IS_ERR. The following Coccinelle semantic patch was used for making the change: @@ expression e; @@ - e == NULL \|\| IS_ERR(e) + IS_ERR_OR_NULL(e) \|\| ... Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Acked-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-29 15:31:56 -07:00
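
The three IS_ERR_OR_NULL conversions above rely on the kernel convention of encoding small negative errno values in pointers. A user-space sketch of that convention is shown below. The `demo_*` helpers are hypothetical re-implementations for illustration, and the 4095 limit is assumed to mirror the kernel's `MAX_ERRNO`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed upper bound on errno values that can be encoded in a pointer. */
#define DEMO_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static inline void *demo_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* True if the pointer lies in the top DEMO_MAX_ERRNO addresses. */
static inline bool demo_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}

/* Single check replacing the open-coded "e == NULL || IS_ERR(e)". */
static inline bool demo_is_err_or_null(const void *ptr)
{
	return ptr == NULL || demo_is_err(ptr);
}
```

The combined helper does not change behaviour; it only folds the two tests that the Coccinelle patch matches into one call.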
Trond Myklebust	518776800c	SUNRPC: Allow svc_reserve() to notify TCP socket that space has been freed Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-29 16:10:20 -04:00
Trond Myklebust	c7fb3f0631	SUNRPC: svc_tcp_write_space: don't clear SOCK_NOSPACE prematurely If requests are queued in the socket inbuffer waiting for an svc_tcp_has_wspace() requirement to be satisfied, then we do not want to clear the SOCK_NOSPACE flag until we've satisfied that requirement. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-29 16:10:19 -04:00
Trond Myklebust	0971374e28	SUNRPC: Reduce contention in svc_xprt_enqueue() Ensure that all calls to svc_xprt_enqueue() except svc_xprt_received() check the value of XPT_BUSY, before attempting to grab spinlocks etc. This is to avoid situations such as the following "perf" trace, which shows heavy contention on the pool spinlock: 54.15% nfsd [kernel.kallsyms] [k] _raw_spin_lock_bh \| --- _raw_spin_lock_bh \| \|--71.43%-- svc_xprt_enqueue \| \| \| \|--50.31%-- svc_reserve \| \| \| \|--31.35%-- svc_xprt_received \| \| \| \|--18.34%-- svc_tcp_data_ready ... Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-29 16:10:15 -04:00
Andrey Ryabinin	40eea803c6	net: sendmsg: fix NULL pointer dereference Sasha's report: > While fuzzing with trinity inside a KVM tools guest running the latest -next > kernel with the KASAN patchset, I've stumbled on the following spew: > > [ 4448.949424] ================================================================== > [ 4448.951737] AddressSanitizer: user-memory-access on address 0 > [ 4448.952988] Read of size 2 by thread T19638: > [ 4448.954510] CPU: 28 PID: 19638 Comm: trinity-c76 Not tainted 3.16.0-rc4-next-20140711-sasha-00046-g07d3099-dirty #813 > [ 4448.956823] ffff88046d86ca40 0000000000000000 ffff880082f37e78 ffff880082f37a40 > [ 4448.958233] ffffffffb6e47068 ffff880082f37a68 ffff880082f37a58 ffffffffb242708d > [ 4448.959552] 0000000000000000 ffff880082f37a88 ffffffffb24255b1 0000000000000000 > [ 4448.961266] Call Trace: > [ 4448.963158] dump_stack (lib/dump_stack.c:52) > [ 4448.964244] kasan_report_user_access (mm/kasan/report.c:184) > [ 4448.965507] __asan_load2 (mm/kasan/kasan.c:352) > [ 4448.966482] ? netlink_sendmsg (net/netlink/af_netlink.c:2339) > [ 4448.967541] netlink_sendmsg (net/netlink/af_netlink.c:2339) > [ 4448.968537] ? get_parent_ip (kernel/sched/core.c:2555) > [ 4448.970103] sock_sendmsg (net/socket.c:654) > [ 4448.971584] ? might_fault (mm/memory.c:3741) > [ 4448.972526] ? might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3740) > [ 4448.973596] ? verify_iovec (net/core/iovec.c:64) > [ 4448.974522] ___sys_sendmsg (net/socket.c:2096) > [ 4448.975797] ? put_lock_stats.isra.13 (./arch/x86/include/asm/preempt.h:98 kernel/locking/lockdep.c:254) > [ 4448.977030] ? lock_release_holdtime (kernel/locking/lockdep.c:273) > [ 4448.978197] ? lock_release_non_nested (kernel/locking/lockdep.c:3434 (discriminator 1)) > [ 4448.979346] ? check_chain_key (kernel/locking/lockdep.c:2188) > [ 4448.980535] __sys_sendmmsg (net/socket.c:2181) > [ 4448.981592] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2600) > [ 4448.982773] ? trace_hardirqs_on (kernel/locking/lockdep.c:2607) > [ 4448.984458] ? syscall_trace_enter (arch/x86/kernel/ptrace.c:1500 (discriminator 2)) > [ 4448.985621] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2600) > [ 4448.986754] SyS_sendmmsg (net/socket.c:2201) > [ 4448.987708] tracesys (arch/x86/kernel/entry_64.S:542) > [ 4448.988929] ================================================================== This report means that we've come to netlink_sendmsg() with msg->msg_name == NULL and msg->msg_namelen > 0. After this report there was no usual "Unable to handle kernel NULL pointer dereference" and this gave me a clue that address 0 is mapped and contains a valid socket address structure. This bug was introduced in `f3d3342602` (net: rework recvmsg handler msg_name and msg_namelen logic). Commit message states that: "Set msg->msg_name = NULL if user specified a NULL in msg_name but had a non-null msg_namelen in verify_iovec/verify_compat_iovec. This doesn't affect sendto as it would bail out earlier while trying to copy-in the address." But in fact this affects sendto when address 0 is mapped and contains a socket address structure. In that case the copy-in of the address will succeed, and the verify_iovec() function will successfully exit with msg->msg_namelen > 0 and msg->msg_name == NULL. This patch fixes it by setting msg_namelen to 0 if msg_name == NULL. Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Eric Dumazet <edumazet@google.com> Cc: <stable@vger.kernel.org> Reported-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-29 12:20:22 -07:00
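The fix described in `40eea803c6` (force the name length to 0 when no name pointer was supplied) can be sketched in plain C. This is a user-space model only, not the kernel code: `struct msg_model` and `normalize_name()` are hypothetical stand-ins for `struct msghdr` and the check the patch adds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's struct msghdr. */
struct msg_model {
    void *msg_name;    /* optional socket address, may be NULL */
    int   msg_namelen; /* length of that address in bytes */
};

/*
 * If the caller passed no address, zero the length as well, so that
 * later code never reads msg_namelen bytes through a NULL msg_name.
 */
static void normalize_name(struct msg_model *m)
{
    if (m->msg_name == NULL)
        m->msg_namelen = 0;
}
```

With this invariant in place, a handler only needs to test `msg_namelen` to decide whether an address is present.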
WANG Cong	9c5ff24f96	vlan: fail early when creating netdev named config Similarly, vlan will create /proc/net/vlan/<dev>, so when we create dev with name "config", it will conflict with /proc/net/vlan/config. Reported-by: Stephane Chazelas <stephane.chazelas@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-29 11:43:50 -07:00
WANG Cong	a317a2f19d	ipv6: fail early when creating netdev named all or default We create a proc dir for each network device, this will cause conflicts when the devices have name "all" or "default". Rather than emitting an ugly kernel warning, we could just fail earlier by checking the device name. Reported-by: Stephane Chazelas <stephane.chazelas@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-29 11:43:50 -07:00
WANG Cong	20e61da7ff	ipv4: fail early when creating netdev named all or default We create a proc dir for each network device, this will cause conflicts when the devices have name "all" or "default". Rather than emitting an ugly kernel warning, we could just fail earlier by checking the device name. Reported-by: Stephane Chazelas <stephane.chazelas@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-29 11:43:50 -07:00
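The "fail early" check in the ipv4 and ipv6 commits above amounts to rejecting reserved device names before any proc directory is created. A minimal sketch, assuming a hypothetical helper `name_is_allowed()` (the name and signature are illustrative, not taken from the patches):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: "all" and "default" are reserved because the
 * per-device proc directories would collide with existing entries of
 * the same name.
 */
static bool name_is_allowed(const char *name)
{
    return strcmp(name, "default") != 0 && strcmp(name, "all") != 0;
}
```

A registration path would call such a helper first and return an error when it reports false, instead of letting the proc registration emit a kernel warning later.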
Willem de Bruijn	4d276eb6a4	net: remove deprecated syststamp timestamp The SO_TIMESTAMPING API defines three types of timestamps: software, hardware in raw format (hwtstamp) and hardware converted to system format (syststamp). The last has been deprecated in favor of combining hwtstamp with a PTP clock driver. There are no active users in the kernel. The option was device driver dependent. If set, but without hardware support, the correct behavior is to return zero in the relevant field in the SCM_TIMESTAMPING ancillary message. Without device drivers implementing the option, this field is effectively always zero. Remove the internal plumbing to dissuade new drivers from implementing the feature. Keep the SOF_TIMESTAMPING_SYS_HARDWARE flag, however, to avoid breaking existing applications that request the timestamp. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-29 11:39:50 -07:00
Willem de Bruijn	68a360e82e	packet: remove deprecated syststamp timestamp No device driver will ever return an skb_shared_info structure with syststamp non-zero, so remove the branch that tests for this and optionally marks the packet timestamp as TP_STATUS_TS_SYS_HARDWARE. Do not remove the definition TP_STATUS_TS_SYS_HARDWARE, as processes may refer to it. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-29 11:39:50 -07:00
John W. Linville	a1ae52c203	Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next	2014-07-29 10:32:36 -04:00
John W. Linville	ec87652694	Merge tag 'nfc-next-3.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next Samuel Ortiz <sameo@linux.intel.com> says: "NFC: 3.17 pull request This is the NFC pull request for 3.17. This is a rather quiet one, we have: - A new driver from ST Microelectronics for their NCI ST21NFCB, including device tree support. - p2p support for the ST21NFCA driver - A few fixes and enhancements for the NFC digital layer" Signed-off-by: John W. Linville <linville@tuxdriver.com>	2014-07-29 10:31:20 -04:00
Eric Dumazet	04ca6973f7	ip: make IP identifiers less predictable In "Counting Packets Sent Between Arbitrary Internet Hosts", Jeffrey and Jedidiah describe ways of exploiting Linux IP identifier generation to infer whether two machines are exchanging packets. With commit `73f156a6e8` ("inetpeer: get rid of ip_id_count"), we changed IP id generation, but this does not really prevent this side-channel technique. This patch adds a random amount of perturbation so that IP identifiers for a given destination [1] are no longer monotonically increasing after an idle period. Note that prandom_u32_max(1) returns 0, so if the generator is used at most once per jiffy, this patch inserts no hole in the ID suite and does not increase collision probability. This is jiffies based, so in the worst case (HZ=1000), the id can rollover after ~65 seconds of idle time, which should be fine. We also change the hash used in __ip_select_ident() to not only hash on daddr, but also saddr and protocol, so that ICMP probes can not be used to infer information for other protocols. For IPv6, adds saddr into the hash as well, but not nexthdr. If I ping the patched target, we can see ID are now hard to predict. 21:57:11.008086 IP (...) A > target: ICMP echo request, seq 1, length 64 21:57:11.010752 IP (... id 2081 ...) target > A: ICMP echo reply, seq 1, length 64 21:57:12.013133 IP (...) A > target: ICMP echo request, seq 2, length 64 21:57:12.015737 IP (... id 3039 ...) target > A: ICMP echo reply, seq 2, length 64 21:57:13.016580 IP (...) A > target: ICMP echo request, seq 3, length 64 21:57:13.019251 IP (... id 3437 ...) target > A: ICMP echo reply, seq 3, length 64 [1] TCP sessions use a per flow ID generator not changed by this patch. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Jeffrey Knockel <jeffk@cs.unm.edu> Reported-by: Jedidiah R. Crandall <crandall@cs.unm.edu> Cc: Willy Tarreau <w@1wt.eu> Cc: Hannes Frederic Sowa <hannes@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-28 18:46:34 -07:00
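The perturbation idea in `04ca6973f7` can be modelled in user space as follows. This is a rough sketch under stated assumptions, not the kernel implementation: `struct id_gen`, `next_ip_id()` and `rand_below()` are hypothetical names, `rand()` stands in for `prandom_u32_max()`, and the atomic operations a real per-destination generator needs are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for prandom_u32_max(): a value in [0, bound); 0 if bound is 0. */
static uint32_t rand_below(uint32_t bound)
{
    return bound ? (uint32_t)rand() % bound : 0;
}

/* Hypothetical per-destination generator state. */
struct id_gen {
    uint32_t last_tick; /* tick (jiffy) of the previous use */
    uint32_t next_id;   /* next identifier to hand out */
};

/*
 * Skip a random number of identifiers bounded by the idle time in ticks.
 * After exactly one tick the bound is 1, so the skip is 0 and no hole
 * appears in the sequence; after a long idle period the next ID is
 * hard to predict.
 */
static uint16_t next_ip_id(struct id_gen *g, uint32_t now)
{
    uint32_t delta = 0;

    if (g->last_tick != now) {
        delta = rand_below(now - g->last_tick);
        g->last_tick = now;
    }

    uint32_t id = g->next_id + delta;
    g->next_id = id + 1;
    return (uint16_t)id; /* the IP ID field is 16 bits wide */
}
```

Because the result is truncated to 16 bits, a sufficiently long idle period can wrap the identifier, which matches the rollover caveat in the commit message.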
Jon Paul Maloy	13e9b9972f	tipc: make tipc_buf_append() more robust As per comment from David Miller, we try to make the buffer reassembly function more resilient to user errors than it is today. - We check that the "buf" parameter always is set, since this is mandatory input. - We ensure that buf->next always is set to NULL before linking in the buffer, instead of relying of the caller to have done this. - We ensure that the "tail" pointer in the head buffer's control block is initialized to NULL when the first fragment arrives. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-28 18:34:01 -07:00
Jun Zhao	545469f7a5	neighbour : fix ndm_type type error issue ndm_type means L3 address type, in neighbour proxy and vxlan, it's RTN_UNICAST. NDA_DST is for netlink TLV type, hence it's not right value in this context. Signed-off-by: Jun Zhao <mypopydev@gmail.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-28 17:52:17 -07:00
David S. Miller	3fd0202a0d	Merge tag 'master-2014-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next John W. Linville says: ==================== pull request: wireless-next 2014-07-25 Please pull this batch of updates intended for the 3.17 stream! For the mac80211 bits, Johannes says: "We have a lot of TDLS patches, among them a fix that should make hwsim tests happy again. The rest, this time, is mostly small fixes." For the Bluetooth bits, Gustavo says: "Some more patches for 3.17. The most important change here is the move of the 6lowpan code to net/6lowpan. It has been agreed with Davem that this change will go through the bluetooth tree. The rest are mostly clean up and fixes." and, "Here follows some more patches for 3.17. These are mostly fixes to what we've sent to you before for next merge window." For the iwlwifi bits, Emmanuel says: "I have the usual amount of BT Coex stuff. Arik continues to work on TDLS and Ariej contributes a few things for HS2.0. I added a few more things to the firmware debugging infrastructure. Eran fixes a small bug - pretty normal content." And for the Atheros bits, Kalle says: "For ath6kl me and Jessica added support for ar6004 hw3.0, our latest version of ar6004. For ath10k Janusz added a printout so that it's easier to check what ath10k kconfig options are enabled. He also added a debugfs file to configure maximum amsdu and ampdu values. Also we had few fixes as usual." On top of that is the usual large batch of various driver updates -- brcmfmac, mwifiex, the TI drivers, and wil6210 all get some action. Rafał has also been very busy with b43 and related updates. Also, I pulled the wireless tree into this in order to resolve a merge conflict... P.S. The change to fs/compat_ioctl.c reflects a name change in a Bluetooth header file... ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-28 17:36:25 -07:00
Johan Hedberg	3bd2724010	Bluetooth: Fix incorrectly disabling page scan when toggling connectable If we have entries in the whitelist we shouldn't disable page scanning when disabling connectable mode. This patch adds the necessary check to the Set Connectable command handler. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-07-28 20:13:32 +02:00
Johan Hedberg	204e399003	Bluetooth: Fix clearing HCI_PSCAN flag This patch fixes a typo in the hci_cc_write_scan_enable() function where we want to clear the HCI_PSCAN flag if the SCAN_PAGE bit of the HCI command parameter was not set. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-07-28 16:50:52 +02:00
Ingo Molnar	ca5bc6cd5d	Merge branch 'sched/urgent' into sched/core, to merge fixes before applying new changes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-07-28 10:03:00 +02:00
Nikolay Aleksandrov	1bab4c7507	inet: frag: set limits and make init_net's high_thresh limit global This patch makes init_net's high_thresh limit to be the maximum for all namespaces, thus introducing a global memory limit threshold equal to the sum of the individual high_thresh limits which are capped. It also introduces some sane minimums for low_thresh as it shouldn't be able to drop below 0 (or > high_thresh in the unsigned case), and overall low_thresh should not ever be above high_thresh, so we make the following relations for a namespace: init_net: high_thresh - max(not capped), min(init_net low_thresh) low_thresh - max(init_net high_thresh), min (0) all other namespaces: high_thresh = max(init_net high_thresh), min(namespace's low_thresh) low_thresh = max(namespace's high_thresh), min(0) The major issue with having low_thresh > high_thresh is that we'll schedule eviction but never evict anything and thus rely only on the timers. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-27 22:34:36 -07:00
Florian Westphal	ab1c724f63	inet: frag: use seqlock for hash rebuild rehash is rare operation, don't force readers to take the read-side rwlock. Instead, we only have to detect the (rare) case where the secret was altered while we are trying to insert a new inetfrag queue into the table. If it was changed, drop the bucket lock and recompute the hash to get the 'new' chain bucket that we have to insert into. Joint work with Nikolay Aleksandrov. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-27 22:34:36 -07:00
Florian Westphal	e3a57d18b0	inet: frag: remove periodic secret rebuild timer merge functionality into the eviction workqueue. Instead of rebuilding every n seconds, take advantage of the upper hash chain length limit. If we hit it, mark table for rebuild and schedule workqueue. To prevent frequent rebuilds when we're completely overloaded, don't rebuild more than once every 5 seconds. ipfrag_secret_interval sysctl is now obsolete and has been marked as deprecated, it still can be changed so scripts won't be broken but it won't have any effect. A comment is left above each unused secret_timer variable to avoid confusion. Joint work with Nikolay Aleksandrov. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-27 22:34:36 -07:00
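The "don't rebuild more than once every 5 seconds" rule in `e3a57d18b0` is a simple rate limit. A minimal sketch, assuming time is counted in whole seconds and using the hypothetical names `struct rebuild_state`, `request_rebuild()` and `MIN_REBUILD_INTERVAL` (the kernel works in jiffies and uses its own helpers):

```c
#include <assert.h>
#include <stdint.h>

#define MIN_REBUILD_INTERVAL 5u /* seconds; value taken from the commit text */

/* Hypothetical bookkeeping for the hash table. */
struct rebuild_state {
    uint32_t last_rebuild; /* time of the last accepted rebuild */
    uint32_t rebuilds;     /* number of rebuilds accepted so far */
};

/*
 * Returns 1 and records the time if a rebuild may run now, 0 if the
 * previous rebuild was too recent. Unsigned subtraction keeps the
 * comparison correct across counter wraparound.
 */
static int request_rebuild(struct rebuild_state *s, uint32_t now)
{
    if ((uint32_t)(now - s->last_rebuild) < MIN_REBUILD_INTERVAL)
        return 0;
    s->last_rebuild = now;
    s->rebuilds++;
    return 1;
}
```

A caller that hits the chain length limit would invoke this check and only schedule the rebuild work when it returns 1.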
Florian Westphal	3fd588eb90	inet: frag: remove lru list no longer used. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-27 22:34:36 -07:00
Florian Westphal	434d305405	inet: frag: don't account number of fragment queues The 'nqueues' counter is protected by the lru list lock; once that's removed this needs to be converted to an atomic counter. Given this isn't used for anything except for reporting it to userspace via /proc, just remove it. We still report the memory currently used by fragment reassembly queues. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-27 22:34:36 -07:00
Florian Westphal	b13d3cbfb8	inet: frag: move eviction of queues to work queue When the high_thresh limit is reached we try to toss the 'oldest' incomplete fragment queues until memory limits are below the low_thresh value. This happens in softirq/packet processing context. This has two drawbacks: 1) processors might evict a queue that was about to be completed by another cpu, because they will compete wrt. resource usage and resource reclaim. 2) LRU list maintenance is expensive. But when constantly overloaded, even the 'least recently used' element is recent, so removing 'lru' queue first is not 'fairer' than removing any other fragment queue. This moves eviction out of the fast path: When the low threshold is reached, a work queue is scheduled which then iterates over the table and removes the queues that exceed the memory limits of the namespace. It sets a new flag called INET_FRAG_EVICTED on the evicted queues so the proper counters will get incremented when the queue is forcefully expired. When the high threshold is reached, no more fragment queues are created until we're below the limit again. The LRU list is now unused and will be removed in a followup patch. Joint work with Nikolay Aleksandrov. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-27 22:34:35 -07:00
Florian Westphal	86e93e470c	inet: frag: move evictor calls into frag_find function First step to move eviction handling into a work queue. We lose two spots that accounted evicted fragments in MIB counters. Accounting will be restored since the upcoming work-queue evictor invokes the frag queue timer callbacks instead. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-27 22:34:35 -07:00
Florian Westphal	fb3cfe6e75	inet: frag: remove hash size assumptions from callers hide actual hash size from individual users: The _find function will now fold the given hash value into the required range. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-27 22:34:35 -07:00
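Folding a hash value into the table range, as `fb3cfe6e75` moves into the `_find` function, is commonly done with a mask when the table size is a power of two. A small sketch, assuming a hypothetical `TABLE_SIZE` of 1024 and a hypothetical helper `fold_hash()`:

```c
#include <assert.h>

#define TABLE_SIZE 1024u /* hypothetical; must be a power of two */

/*
 * Callers may pass any 32-bit hash; the lookup side reduces it to a
 * valid bucket index, so callers no longer need to know the table size.
 */
static unsigned int fold_hash(unsigned int hash)
{
    return hash & (TABLE_SIZE - 1);
}
```

Keeping the mask in one place means the table size can change without touching the hash functions of individual users.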
Florian Westphal	36c7778218	inet: frag: constify match, hashfn and constructor arguments Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-27 22:34:35 -07:00
Marcel Holtmann	32226e4f1a	Bluetooth: Set Simultaneous LE and BR/EDR controller option to zero With the Bluetooth 4.1 specification the Simultaneous LE and BR/EDR controller option has been deprecated. It shall be set to zero and ignored otherwise. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2014-07-27 10:25:52 +03:00
Georg Lukas	729a1051da	Bluetooth: Expose default LE advertising interval via debugfs Expose the default values for minimum and maximum LE advertising interval via debugfs for testing purposes. Signed-off-by: Georg Lukas <georg@op-co.de> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-07-26 19:05:10 +02:00
Georg Lukas	628531c9e9	Bluetooth: Provide defaults for LE advertising interval Store the default values for minimum and maximum advertising interval with all the other controller defaults. These values are sent to the adapter whenever advertising is (re)enabled. Signed-off-by: Georg Lukas <georg@op-co.de> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-07-26 19:05:09 +02:00
Marcel Holtmann	66d8e837ab	Bluetooth: Fix white list handling with resolvable private addresses Devices using resolvable private addresses are required to provide an identity resolving key. These devices can not be found using the current controller white list support. This means if the kernel knows about any devices with an identity resolving key, the white list filtering must be disabled. However so far the kernel kept identity resolving keys around even for devices that are not using resolvable private addresses. The notification to userspace clearly hints to not store the key and so it is best to just remove the key from the kernel as well at that point. With this it easy now to detect when using the white list is possible or when kernel side resolving of addresses is required. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2014-07-26 14:13:19 +03:00
Marcel Holtmann	8540f6c036	Bluetooth: Add support for using controller white list filtering The Bluetooth controller can use a white list filter when scanning to avoid waking up the host for devices that are of no interest. Devices marked as reporting, direct connection (incoming) or general connection are now added to the controller white list. The update of the white list happens just before enabling passive scanning. In case the white list is full and can not hold all devices, the white list is not used and the filter policy set to accept all advertisements. Using the white list for scanning allows for power saving with controllers that do not handle the duplicate filtering correctly. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2014-07-26 14:13:17 +03:00
John W. Linville	9a244409d0	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless Conflicts: net/mac80211/cfg.c Signed-off-by: John W. Linville <linville@tuxdriver.com>	2014-07-25 10:22:36 -04:00
Jiri Prchal	8452e6ff3e	netfilter: xt_LED: fix too short led-always-blink If led-always-blink is set, there is almost zero time between switching the LED off and on, so the blink is invisible. This uses a oneshot LED trigger with a fixed time of 50ms, which is enough to see the blink. Signed-off-by: Jiri Prchal <jiri.prchal@aksignal.cz> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-07-25 16:09:28 +02:00
Paul Bolle	d4da843e6f	netfilter: kill remnants of ulog targets The ulog targets were recently killed. A few references to the Kconfig macros CONFIG_IP_NF_TARGET_ULOG and CONFIG_BRIDGE_EBT_ULOG were left untouched. Kill these too. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-07-25 14:55:44 +02:00
Duan Jiong	a2b60c75fa	netfilter: xt_LED: don't output error message redundantly The function led_trigger_register() will only return -EEXIST when error arises. Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-07-25 14:55:33 +02:00
Himangi Saraogi	5bd3a76f4b	netfilter: nf_conntrack: remove exceptional & on function name In this file, function names are otherwise used as pointers without &. A simplified version of the Coccinelle semantic patch that makes this change is as follows: // <smpl> @r@ identifier f; @@ f(...) { ... } @@ identifier r.f; @@ - &f + f // </smpl> Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Acked-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-07-25 14:50:58 +02:00
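The Coccinelle cleanup above (and the identical ones for net_sched, neigh and igmp below) is behavior-preserving because, in C, a function name used as a value already converts to a pointer to that function, so `&f` and `f` yield the same pointer. A short illustration with hypothetical names:

```c
#include <assert.h>

typedef int (*handler_fn)(int);

static int double_it(int x)
{
    return 2 * x;
}

/* Both helpers return the same function pointer. */
static handler_fn with_ampersand(void)
{
    return &double_it; /* explicit address-of */
}

static handler_fn without_ampersand(void)
{
    return double_it;  /* function designator decays to a pointer */
}
```

Dropping the `&` is therefore purely a style change that makes the file consistent with its other uses of function names as pointers.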
Li RongQing	ac3d2e5a9e	ipv6: remove obsolete comment in ip6_append_data() After 11878b40e[net-timestamp: SOCK_RAW and PING timestamping], this comment becomes obsolete since the codes check not only UDP socket, but also RAW sock; and the codes are clear, not need the comments Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-24 23:47:04 -07:00
WANG Cong	6b53dafe23	net: do not name the pointer to struct net_device net "net" is normally for struct net*, pointer to struct net_device should be named to either "dev" or "ndev" etc. Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-24 23:33:55 -07:00
Alexei Starovoitov	2695fb552c	net: filter: rename 'struct sock_filter_int' into 'struct bpf_insn' eBPF is used by socket filtering, seccomp and soon by tracing and exposed to userspace, therefore 'sock_filter_int' name is not accurate. Rename it to 'bpf_insn' Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-24 23:27:17 -07:00
Himangi Saraogi	e40f5c7234	net_sched: remove exceptional & on function name In this file, function names are otherwise used as pointers without &. A simplified version of the Coccinelle semantic patch that makes this change is as follows: // <smpl> @r@ identifier f; @@ f(...) { ... } @@ identifier r.f; @@ - &f + f // </smpl> Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Acked-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-24 23:23:32 -07:00
Himangi Saraogi	56ec0fb10c	neigh: remove exceptional & on function name In this file, function names are otherwise used as pointers without &. A simplified version of the Coccinelle semantic patch that makes this change is as follows: // <smpl> @r@ identifier f; @@ f(...) { ... } @@ identifier r.f; @@ - &f + f // </smpl> Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Acked-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-24 23:23:31 -07:00
Himangi Saraogi	179542a548	igmp: remove exceptional & on function name In this file, function names are otherwise used as pointers without &. A simplified version of the Coccinelle semantic patch that makes this change is as follows: // <smpl> @r@ identifier f; @@ f(...) { ... } @@ identifier r.f; @@ - &f + f // </smpl> Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Acked-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-24 23:23:31 -07:00
David S. Miller	e62f77579c	Merge tag 'master-2014-07-23' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless John W. Linville says: ==================== pull request: wireless 2014-07-24 Please pull this batch of fixes intended for the 3.16 stream... For the mac80211 fixes, Johannes says: "I have two fixes: one for tracing that fixes a long-standing NULL pointer dereference, and one for a mac80211 issue that causes iwlmvm to send invalid frames during authentication/association." and, "One more fix - for a bug in the newly introduced code that obtains rate control information for stations." For the iwlwifi fixes, Emmanuel says: "It includes a merge damage fix. This region has been changed in -next and -fixes quite a few times and apparently, I failed to handle it properly, so here the fix. Along with that I have a fix from Eliad to properly handle overlapping BSS in AP mode." On top of that, Felix provides and ath9k fix for Tx stalls that happen after an aggregation session failure. Please let me know if there are problems! There are some changes here that will cause merge conflicts in -next. Once you merge this I can pull it into wireless-next and resolve those issues. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-24 23:22:15 -07:00
David S. Miller	29be618076	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== Via Simon Horman, I received the following one-liner for your net tree: 1) Fix crash when exiting from netns that uses IPVS and conntrack, from Julian Anastasov via Simon Horman. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-24 18:00:05 -07:00
Andy Zhou	d9e0ecb814	openvswitch: Add skb_clone NULL check for the sampling action. Fix a bug where skb_clone() NULL check is missing in sample action implementation. Signed-off-by: Andy Zhou <azhou@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>	2014-07-24 09:37:22 -07:00
Simon Horman	651887b0c2	openvswitch: Sample action without side effects The sample action is rather generic, allowing arbitrary actions to be executed based on a probability. However its use within the Open vSwitch code-base is limited: only a single user-space action is ever nested. A consequence of the current implementation of sample actions is that depending on whether the sample action is executed (due to its probability) any side-effects of nested actions may or may not be present before executing subsequent actions. This has the potential to complicate verification of valid actions by the (kernel) datapath. Indeed, adding support for push and pop MPLS actions inside sample actions is one such case. In order to allow all supported actions to continue to be nested inside sample actions without the potential need for complex verification code, this patch changes the implementation of the sample action in the kernel datapath so that sample actions are more like a function call and any side effects of nested actions are not present when executing subsequent actions. With the above in mind the motivation for this change is twofold: * To contain side-effects of the sample action in the hope of making it easier to deal with in the future and; * To avoid some rather complex verification code introduced in the MPLS datapath patch. Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>	2014-07-24 09:37:21 -07:00
Andy Zhou	f53e38317d	openvswitch: Avoid memory corruption in queue_userspace_packet() In queue_userspace_packet(), the ovs_nla_put_flow return value is not checked. This is fine as long as key_attr_size() returns the correct value. In case it does not, the current code may corrupt buffer memory. Add a run-time assertion to catch this case and avoid silent failure. Reported-by: Ben Pfaff <blp@nicira.com> Signed-off-by: Andy Zhou <azhou@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>	2014-07-24 09:37:20 -07:00
Eric Dumazet	7bd8490eef	netfilter: xt_hashlimit: perform garbage collection from process context xt_hashlimit cannot be used with large hash tables, because garbage collector is run from a timer. If table is really big, its possible to hold cpu for more than 500 msec, which is unacceptable. Switch to a work queue, and use proper scheduling points to remove latencies spikes. Later, we also could switch to a smoother garbage collection done at lookup time, one bucket at a time... Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Florian Westphal <fw@strlen.de> Cc: Patrick McHardy <kaber@trash.net> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-07-24 13:07:25 +02:00
Pravin B Shelar	f6eec614d2	openvswitch: Enable tunnel GSO for OVS bridge. The following patch enables all available tunnel GSO features for the OVS bridge device so that OVS can use hardware offloads available to the underlying device. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Andy Zhou <azhou@nicira.com>	2014-07-24 01:15:04 -07:00
Alex Wang	5cd667b0a4	openvswitch: Allow each vport to have an array of 'port_id's. In order to allow handlers directly read upcalls from datapath, we need to support per-handler netlink socket for each vport in datapath. This commit makes this happen. Also, it is guaranteed to be backward compatible with previous branch. Signed-off-by: Alex Wang <alexw@nicira.com> Acked-by: Thomas Graf <tgraf@redhat.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>	2014-07-24 01:15:04 -07:00
David S. Miller	11f1fb3459	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2014-07-23 Just two fixes this time, both are stable candidates. 1) Fix the dst_entry refcount on socket policy usage. 2) Fix a wrong SPI check that prevents AH SAs from getting installed, dependent on the SPI. From Tobias Brunner. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-23 21:56:36 -07:00
Alexei Starovoitov	f5bffecda9	net: filter: split filter.c into two files BPF is used in several kernel components. This split creates logical boundary between generic eBPF core and the rest kernel/bpf/core.c: eBPF interpreter net/core/filter.c: classic->eBPF converter, classic verifiers, socket filters This patch only moves functions. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-23 21:06:22 -07:00
Quentin Armitage	f5220d6399	ipv4: Make IP_MULTICAST_ALL and IP_MSFILTER work on raw sockets Currently, although IP_MULTICAST_ALL and IP_MSFILTER ioctl calls succeed on raw sockets, there is no code to implement the functionality on received packets; it is only implemented for UDP sockets. The raw(7) man page states: "In addition, all ip(7) IPPROTO_IP socket options valid for datagram sockets are supported", which implies these ioctls should work on raw sockets. To fix this, add a call to ip_mc_sf_allow on raw sockets. This should not break any existing code, since the current position of not calling ip_mc_sf_filter makes it behave as if neither the IP_MULTICAST_ALL nor the IP_MSFILTER ioctl had been called. Adding the call to ip_mc_sf_allow will therefore maintain the current behaviour so long as IP_MULTICAST_ALL and IP_MSFILTER ioctls are not called. Any code that currently is calling IP_MULTICAST_ALL or IP_MSFILTER ioctls on raw sockets presumably is wanting the filter to be applied, although no filtering will currently be occurring. Signed-off-by: Quentin Armitage <quentin@armitage.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-23 15:13:26 -07:00
Marcel Holtmann	4b9e7e7516	Bluetooth: Fix issue with ADV_IND reports and auto-connection handling When adding remote devices to the kernel using the Add Device management command, these devices are explicitly allowed to connect. This kind of incoming connection is possible even when the controller itself is not connectable. For BR/EDR this distinction is pretty simple since there is only one type of incoming connection. With LE this is not that simple anymore since there are ADV_IND and ADV_DIRECT_IND advertising events. The ADV_DIRECT_IND advertising events are sent for incoming (slave initiated) connections only. And this is the only thing the kernel should allow when adding devices using action 0x01. This meaning of incoming connections comes from BR/EDR and needs to be mapped to LE the same way. Supporting the auto-connection of devices using ADV_IND advertising events is an important feature as well. However it does not map to incoming connections. So introduce a new action 0x02 that allows the kernel to connect to devices using ADV_DIRECT_IND and in addition ADV_IND advertising reports. This difference is represented by the new HCI_AUTO_CONN_DIRECT value for only connecting to ADV_DIRECT_IND. For connection to ADV_IND and ADV_DIRECT_IND the old value HCI_AUTO_CONN_ALWAYS is used. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2014-07-24 00:37:23 +03:00
Marcel Holtmann	cd4d567138	Bluetooth: Ignore ADV_DIRECT_IND attempts from unknown devices Unconditionally connecting to devices sending ADV_DIRECT_IND when the controller is in CONNECTABLE mode is a feature that is not fully working. The background scanning trigger for this has been removed, but the statement allowing it to happen in case some other part triggers is still present. So remove that code part as well to avoid unwanted connections. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2014-07-24 00:37:20 +03:00
Sorin Dumitru	274f482d33	sock: remove skb argument from sk_rcvqueues_full It hasn't been used since commit 0fd7bac(net: relax rcvbuf limits). Signed-off-by: Sorin Dumitru <sorin@returnze.ro> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-23 13:23:06 -07:00
Marcel Holtmann	f4fe73ed56	Bluetooth: Get MWS transport configuration of the controller If the Bluetooth controller supports Get MWS Transport Layer Configuration command, then issue it during initialization. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2014-07-23 20:34:08 +03:00
Marcel Holtmann	109e319193	Bluetooth: Read list of local codecs supported by the controller If the Bluetooth controller supports Read Local Supported Codecs command, then issue it during initialization so that the list of codecs is known. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2014-07-23 20:34:06 +03:00
John W. Linville	3b8de07492	Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211	2014-07-23 13:01:14 -04:00
Daniel Borkmann	1be9a950c6	net: sctp: inherit auth_capable on INIT collisions Jason reported an oops caused by SCTP on his ARM machine with SCTP authentication enabled: Internal error: Oops: 17 [#1] ARM CPU: 0 PID: 104 Comm: sctp-test Not tainted 3.13.0-68744-g3632f30c9b20-dirty #1 task: c6eefa40 ti: c6f52000 task.ti: c6f52000 PC is at sctp_auth_calculate_hmac+0xc4/0x10c LR is at sg_init_table+0x20/0x38 pc : [<c024bb80>] lr : [<c00f32dc>] psr: 40000013 sp : c6f538e8 ip : 00000000 fp : c6f53924 r10: c6f50d80 r9 : 00000000 r8 : 00010000 r7 : 00000000 r6 : c7be4000 r5 : 00000000 r4 : c6f56254 r3 : c00c8170 r2 : 00000001 r1 : 00000008 r0 : c6f1e660 Flags: nZcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user Control: 0005397f Table: 06f28000 DAC: 00000015 Process sctp-test (pid: 104, stack limit = 0xc6f521c0) Stack: (0xc6f538e8 to 0xc6f54000) [...] Backtrace: [<c024babc>] (sctp_auth_calculate_hmac+0x0/0x10c) from [<c0249af8>] (sctp_packet_transmit+0x33c/0x5c8) [<c02497bc>] (sctp_packet_transmit+0x0/0x5c8) from [<c023e96c>] (sctp_outq_flush+0x7fc/0x844) [<c023e170>] (sctp_outq_flush+0x0/0x844) from [<c023ef78>] (sctp_outq_uncork+0x24/0x28) [<c023ef54>] (sctp_outq_uncork+0x0/0x28) from [<c0234364>] (sctp_side_effects+0x1134/0x1220) [<c0233230>] (sctp_side_effects+0x0/0x1220) from [<c02330b0>] (sctp_do_sm+0xac/0xd4) [<c0233004>] (sctp_do_sm+0x0/0xd4) from [<c023675c>] (sctp_assoc_bh_rcv+0x118/0x160) [<c0236644>] (sctp_assoc_bh_rcv+0x0/0x160) from [<c023d5bc>] (sctp_inq_push+0x6c/0x74) [<c023d550>] (sctp_inq_push+0x0/0x74) from [<c024a6b0>] (sctp_rcv+0x7d8/0x888) While we already had various kind of bugs in that area `ec0223ec48` ("net: sctp: fix sctp_sf_do_5_1D_ce to verify if we/peer is AUTH capable") and `b14878ccb7` ("net: sctp: cache auth_enable per endpoint"), this one is a bit of a different kind. Giving a bit more background on why SCTP authentication is needed can be found in RFC4895: SCTP uses 32-bit verification tags to protect itself against blind attackers. 
These values are not changed during the lifetime of an SCTP association. Looking at new SCTP extensions, there is the need to have a method of proving that an SCTP chunk(s) was really sent by the original peer that started the association and not by a malicious attacker. To cause this bug, we're triggering an INIT collision between peers; normal SCTP handshake where both sides intent to authenticate packets contains RANDOM; CHUNKS; HMAC-ALGO parameters that are being negotiated among peers: ---------- INIT[RANDOM; CHUNKS; HMAC-ALGO] ----------> <------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] --------- -------------------- COOKIE-ECHO --------------------> <-------------------- COOKIE-ACK --------------------- RFC4895 says that each endpoint therefore knows its own random number and the peer's random number after the association has been established. The local and peer's random number along with the shared key are then part of the secret used for calculating the HMAC in the AUTH chunk. Now, in our scenario, we have 2 threads with 1 non-blocking SEQ_PACKET socket each, setting up common shared SCTP_AUTH_KEY and SCTP_AUTH_ACTIVE_KEY properly, and each of them calling sctp_bindx(3), listen(2) and connect(2) against each other, thus the handshake looks similar to this, e.g.: ---------- INIT[RANDOM; CHUNKS; HMAC-ALGO] ----------> <------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] --------- <--------- INIT[RANDOM; CHUNKS; HMAC-ALGO] ----------- -------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] --------> ... Since such collisions can also happen with verification tags, the RFC4895 for AUTH rather vaguely says under section 6.1: In case of INIT collision, the rules governing the handling of this Random Number follow the same pattern as those for the Verification Tag, as explained in Section 5.2.4 of RFC 2960 [5]. Therefore, each endpoint knows its own Random Number and the peer's Random Number after the association has been established. 
In RFC2960, section 5.2.4, we're eventually hitting Action B: B) In this case, both sides may be attempting to start an association at about the same time but the peer endpoint started its INIT after responding to the local endpoint's INIT. Thus it may have picked a new Verification Tag not being aware of the previous Tag it had sent this endpoint. The endpoint should stay in or enter the ESTABLISHED state but it MUST update its peer's Verification Tag from the State Cookie, stop any init or cookie timers that may running and send a COOKIE ACK. In other words, the handling of the Random parameter is the same as behavior for the Verification Tag as described in Action B of section 5.2.4. Looking at the code, we exactly hit the sctp_sf_do_dupcook_b() case which triggers an SCTP_CMD_UPDATE_ASSOC command to the side effect interpreter, and in fact it properly copies over peer_{random, hmacs, chunks} parameters from the newly created association to update the existing one. Also, the old asoc_shared_key is being released and based on the new params, sctp_auth_asoc_init_active_key() updated. However, the issue observed in this case is that the previous asoc->peer.auth_capable was 0, and has not been updated, so that instead of creating a new secret, we're doing an early return from the function sctp_auth_asoc_init_active_key() leaving asoc->asoc_shared_key as NULL. However, we now have to authenticate chunks from the updated chunk list (e.g. COOKIE-ACK). That in fact causes the server side when responding with ... <------------------ AUTH; COOKIE-ACK ----------------- ... to trigger a NULL pointer dereference, since in sctp_packet_transmit(), it discovers that an AUTH chunk is being queued for xmit, and thus it calls sctp_auth_calculate_hmac(). Since the asoc->active_key_id is still inherited from the endpoint, and the same as encoded into the chunk, it uses asoc->asoc_shared_key, which is still NULL, as an asoc_key and dereferences it in ... 
crypto_hash_setkey(desc.tfm, &asoc_key->data[0], asoc_key->len) ... causing an oops. All this happens because sctp_make_cookie_ack() called with the new association has the peer.auth_capable=1 and therefore marks the chunk with auth=1 after checking sctp_auth_send_cid(), but it is actually sent later on over the then updated association's transport that didn't initialize its shared key due to peer.auth_capable=0. Since control chunks in that case are not sent by the temporary association which are scheduled for deletion, they are issued for xmit via SCTP_CMD_REPLY in the interpreter with the context of the updated association. peer.auth_capable was 0 in the updated association (which went from COOKIE_WAIT into ESTABLISHED state), since all previous processing that performed sctp_process_init() was being done on temporary associations, that we eventually throw away each time. The correct fix is to update to the new peer.auth_capable value as well in the collision case via sctp_assoc_update(), so that in case the collision migrated from 0 -> 1, sctp_auth_asoc_init_active_key() can properly recalculate the secret. This therefore fixes the observed server panic. Fixes: `730fc3d05c` ("[SCTP]: Implete SCTP-AUTH parameter processing") Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Tested-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-22 19:56:58 -07:00
Mark A. Greer	bf30a67c94	NFC: digital: Add 'tg_listen_md' and 'tg_get_rf_tech' driver hooks The digital layer of the NFC subsystem currently supports a 'tg_listen_mdaa' driver hook that supports devices that can do mode detection and automatic anticollision. However, there are some devices that can do mode detection but not automatic anticollision, so add the 'tg_listen_md' hook to support those devices. In order for the digital layer to get the RF technology detected by the device from the driver, add the 'tg_get_rf_tech' hook. It is only valid to call this hook immediately after a successful call to 'tg_listen_md'. CC: Thierry Escande <thierry.escande@linux.intel.com> Signed-off-by: Mark A. Greer <mgreer@animalcreek.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2014-07-23 01:17:31 +02:00
Christophe Ricard	95f7687b20	NFC: hci: Add stop_poll HCI operand. stop_poll allows stopping CLF reader polling. Some other operations might be necessary for some CLFs to stop polling, for example in card mode. Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2014-07-23 01:04:31 +02:00
Christophe Ricard	bb15b2170c	NFC: nci: Add T1T support notification Add T1T matching with Jewel during notification. It was causing "the target found does not have the desired protocol" to show up. Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2014-07-23 00:49:36 +02:00
David Howells	633706a2ee	Merge branch 'keys-fixes' into keys-next Signed-off-by: David Howells <dhowells@redhat.com>	2014-07-22 21:55:45 +01:00
David Howells	8a7a3eb4dd	KEYS: RxRPC: Use key preparsing Make use of key preparsing in the RxRPC protocol so that quota size determination can take place prior to keyring locking when a key is being added. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Steve Dickson <steved@redhat.com>	2014-07-22 21:46:41 +01:00
David Howells	d46d494214	KEYS: DNS: Use key preparsing Make use of key preparsing in the DNS resolver so that quota size determination can take place prior to keyring locking when a key is being added. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Steve Dickson <steved@redhat.com> Acked-by: Jeff Layton <jlayton@primarydata.com>	2014-07-22 21:46:36 +01:00
David Howells	7c3bec0a1f	KEYS: Ceph: Use user_match() Ceph can use user_match() instead of defining its own identical function. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Steve Dickson <steved@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com> cc: Tommi Virtanen <tommi.virtanen@dreamhost.com>	2014-07-22 21:46:30 +01:00
David Howells	efa64c0978	KEYS: Ceph: Use key preparsing Make use of key preparsing in Ceph so that quota size determination can take place prior to keyring locking when a key is being added. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Steve Dickson <steved@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com> cc: Tommi Virtanen <tommi.virtanen@dreamhost.com>	2014-07-22 21:46:23 +01:00
Chuck Lever	e560e3b510	svcrdma: Add zero padding if the client doesn't send it See RFC 5666 section 3.7: clients don't have to send zero XDR padding. BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=246 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-22 16:40:21 -04:00
David Laight	526cbef778	net: sctp: Rename SCTP_XMIT_NAGLE_DELAY to SCTP_XMIT_DELAY MSG_MORE and 'corking' a socket would require that the transmit of a data chunk be delayed. Rename the return value to be less specific. Signed-off-by: David Laight <david.laight@aculab.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-22 13:32:11 -07:00
David Laight	723189faca	net: sctp: Open out the check for Nagle The check for Nagle contains 6 separate checks all of which must be true before a data packet is delayed. Separate out each into its own 'if (test) return SCTP_XMIT_OK' so that the reasons can be individually described. Also return directly with SCTP_XMIT_RWND_FULL. Delete the now-unused 'retval' variable and 'finish' label from sctp_packet_can_append_data(). Signed-off-by: David Laight <david.laight@aculab.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-22 13:32:10 -07:00
Felix Fietkau	fa8f136fe9	mac80211: fix crash on getting sta info with uninitialized rate control If the expected throughput is queried before rate control has been initialized, the minstrel op for it will crash while trying to access the rate table. Check for WLAN_STA_RATE_CONTROL before attempting to use the rate control op. Reported-by: Jean-Pierre Tosoni <jp.tosoni@acksys.fr> Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-07-22 22:17:17 +02:00
Yan Burman	bf858ab0ad	xprtrdma: Fix DMA-API-DEBUG warning by checking dma_map result Fix the following warning when DMA-API debug is enabled by checking ib_dma_map_single result: [ 1455.345548] ------------[ cut here ]------------ [ 1455.346863] WARNING: CPU: 3 PID: 3929 at /home/yanb/kernel/net-next/lib/dma-debug.c:1140 check_unmap+0x4e5/0x990() [ 1455.349350] mlx4_core 0000:00:07.0: DMA-API: device driver failed to check map error[device address=0x000000007c9f2090] [size=2656 bytes] [mapped as single] [ 1455.349350] Modules linked in: xprtrdma netconsole configfs nfsv3 nfs_acl ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm autofs4 auth_rpcgss oid_registry nfsv4 nfs fscache lockd sunrpc dm_mirror dm_region_hash dm_log microcode pcspkr mlx4_ib ib_sa ib_mad ib_core ib_addr mlx4_en ipv6 ptp pps_core vxlan mlx4_core virtio_balloon cirrus ttm drm_kms_helper drm sysimgblt sysfillrect syscopyarea i2c_piix4 i2c_core button ext3 jbd virtio_blk virtio_net virtio_pci virtio_ring virtio uhci_hcd ata_generic ata_piix libata [ 1455.349350] CPU: 3 PID: 3929 Comm: mount.nfs Not tainted 3.15.0-rc1-dbg+ #13 [ 1455.349350] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007 [ 1455.349350] 0000000000000474 ffff880069dcf628 ffffffff8151c341 ffffffff817b69d8 [ 1455.349350] ffff880069dcf678 ffff880069dcf668 ffffffff8105b5fc 0000000069dcf658 [ 1455.349350] ffff880069dcf778 ffff88007b0c9f00 ffffffff8255ec40 0000000000000a60 [ 1455.349350] Call Trace: [ 1455.349350] [<ffffffff8151c341>] dump_stack+0x52/0x81 [ 1455.349350] [<ffffffff8105b5fc>] warn_slowpath_common+0x8c/0xc0 [ 1455.349350] [<ffffffff8105b6e6>] warn_slowpath_fmt+0x46/0x50 [ 1455.349350] [<ffffffff812e6305>] check_unmap+0x4e5/0x990 [ 1455.349350] [<ffffffff81521fb0>] ? 
_raw_spin_unlock_irq+0x30/0x60 [ 1455.349350] [<ffffffff812e6a0a>] debug_dma_unmap_page+0x5a/0x60 [ 1455.349350] [<ffffffffa0389583>] rpcrdma_deregister_internal+0xb3/0xd0 [xprtrdma] [ 1455.349350] [<ffffffffa038a639>] rpcrdma_buffer_destroy+0x69/0x170 [xprtrdma] [ 1455.349350] [<ffffffffa03872ff>] xprt_rdma_destroy+0x3f/0xb0 [xprtrdma] [ 1455.349350] [<ffffffffa04a95ff>] xprt_destroy+0x6f/0x80 [sunrpc] [ 1455.349350] [<ffffffffa04a9625>] xprt_put+0x15/0x20 [sunrpc] [ 1455.349350] [<ffffffffa04a899a>] rpc_free_client+0x8a/0xe0 [sunrpc] [ 1455.349350] [<ffffffffa04a8a58>] rpc_release_client+0x68/0xa0 [sunrpc] [ 1455.349350] [<ffffffffa04a9060>] rpc_shutdown_client+0xb0/0xc0 [sunrpc] [ 1455.349350] [<ffffffffa04a8f5d>] ? rpc_ping+0x5d/0x70 [sunrpc] [ 1455.349350] [<ffffffffa04a91ab>] rpc_create_xprt+0xbb/0xd0 [sunrpc] [ 1455.349350] [<ffffffffa04a9273>] rpc_create+0xb3/0x160 [sunrpc] [ 1455.349350] [<ffffffff81129749>] ? __probe_kernel_read+0x69/0xb0 [ 1455.349350] [<ffffffffa053851c>] nfs_create_rpc_client+0xdc/0x100 [nfs] [ 1455.349350] [<ffffffffa0538cfa>] nfs_init_client+0x3a/0x90 [nfs] [ 1455.349350] [<ffffffffa05391c8>] nfs_get_client+0x478/0x5b0 [nfs] [ 1455.349350] [<ffffffffa0538e50>] ? nfs_get_client+0x100/0x5b0 [nfs] [ 1455.349350] [<ffffffff81172c6d>] ? kmem_cache_alloc_trace+0x24d/0x260 [ 1455.349350] [<ffffffffa05393f3>] nfs_create_server+0xf3/0x4c0 [nfs] [ 1455.349350] [<ffffffffa0545ff0>] ? nfs_request_mount+0xf0/0x1a0 [nfs] [ 1455.349350] [<ffffffffa031c0c3>] nfs3_create_server+0x13/0x30 [nfsv3] [ 1455.349350] [<ffffffffa0546293>] nfs_try_mount+0x1f3/0x230 [nfs] [ 1455.349350] [<ffffffff8108ea21>] ? get_parent_ip+0x11/0x50 [ 1455.349350] [<ffffffff812d6343>] ? __this_cpu_preempt_check+0x13/0x20 [ 1455.349350] [<ffffffff810d632b>] ? try_module_get+0x6b/0x190 [ 1455.349350] [<ffffffffa05449f7>] nfs_fs_mount+0x187/0x9d0 [nfs] [ 1455.349350] [<ffffffffa0545940>] ? nfs_clone_super+0x140/0x140 [nfs] [ 1455.349350] [<ffffffffa0543b20>] ? 
nfs_auth_info_match+0x40/0x40 [nfs] [ 1455.349350] [<ffffffff8117e360>] mount_fs+0x20/0xe0 [ 1455.349350] [<ffffffff811a1c16>] vfs_kern_mount+0x76/0x160 [ 1455.349350] [<ffffffff811a29a8>] do_mount+0x428/0xae0 [ 1455.349350] [<ffffffff811a30f0>] SyS_mount+0x90/0xe0 [ 1455.349350] [<ffffffff8152af52>] system_call_fastpath+0x16/0x1b [ 1455.349350] ---[ end trace f1f31572972e211d ]--- Signed-off-by: Yan Burman <yanb@mellanox.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2014-07-22 13:55:30 -04:00
John W. Linville	bd6fb31fd3	Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next	2014-07-22 13:50:23 -04:00
John W. Linville	a006827a15	Merge git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next	2014-07-22 13:49:34 -04:00
Vignesh Raman	32333edb82	Bluetooth: Avoid use of session socket after the session gets freed The commits `08c30aca9e` "Bluetooth: Remove RFCOMM session refcnt" and `8ff52f7d04` "Bluetooth: Return RFCOMM session ptrs to avoid freed session" allow rfcomm_recv_ua and rfcomm_session_close to delete the session (and free the corresponding socket) and propagate a NULL session pointer to the upper callers. An additional fix is required to terminate the loop in the rfcomm_process_rx function to avoid use of freed 'sk' memory. The issue is only reproducible with the kernel option CONFIG_PAGE_POISONING enabled, which fills freed memory with a fixed char value used to unmask use-after-free issues. Signed-off-by: Vignesh Raman <Vignesh_Raman@mentor.com> Signed-off-by: Vitaly Kuzmichev <Vitaly_Kuzmichev@mentor.com> Acked-by: Dean Jenkins <Dean_Jenkins@mentor.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Cc: stable@vger.kernel.org	2014-07-22 16:07:31 +02:00
Pablo Neira Ayuso	5b96af7713	netfilter: nf_tables: simplify set dump through netlink This patch uses the cb->data pointer that allows us to store the context when dumping the set list. Thus, we don't need to parse the original netlink message containing the dump request for each recvmsg() call when dumping the set list. The different function flavours depending on the dump criteria has been also merged into one single generic function. This saves us ~100 lines of code. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-07-22 12:08:54 +02:00
Pablo Neira Ayuso	85f5b3086a	netfilter: bridge: add reject support So you can reject IPv4 and IPv6 packets from bridge tables. If the ether proto is not known, default to dropping the packet instead. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-07-22 12:00:22 +02:00
David S. Miller	8fd90bb889	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/infiniband/hw/cxgb4/device.c The cxgb4 conflict was simply overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-22 00:44:59 -07:00
Ursula Braun	1042cab862	af_iucv: avoid path quiesce of severed path in shutdown() An af_iucv stress test showed -EPIPE results for sendmsg() calls. They are caused by quiescing a path even though it has been already severed by peer. For IUCV transport shutdown() consists of 2 steps: (1) sending the shutdown message to peer (2) quiescing the iucv path If the iucv path between these 2 steps is severed due to peer closing the path, the quiesce step is no longer needed. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com> Reported-by: Philipp Hachtmann <phacht@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-21 20:21:40 -07:00
David S. Miller	850717ef00	Included fixes: - recognise and drop Bridge Loop Avoidance packets even if they are encapsulated in the 802.1q header multiple times. Forwarding them into the mesh creates issues on other nodes. - properly handle VLAN private objects in order to avoid race conditions upon fast VLAN deletion-addition. Such conditions create an unrecoverable inconsistency in the TT database of the nodes. Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge Antonio Quartulli says: ==================== pull request [net]: batman-adv 20140721 here you have two fixes that we have been testing for quite some time (this is why they arrived a bit late in the rc cycle). Patch 1) ensures that BLA packets get dropped and not forwarded to the mesh even if they reach batman-adv within QinQ frames. Forwarding them into the mesh means messing up with the TT database of other nodes which can generate all kind of unexpected behaviours during route computation. Patch 2) avoids a couple of race conditions triggered upon fast VLAN deletion-addition. 
Such race conditions are pretty dangerous because they not only create inconsistencies in the TT database of the nodes in the network, but such scenario is also unrecoverable (unless nodes are rebooted). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-21 20:19:09 -07:00
Eric Dumazet	10ec9472f0	ipv4: fix buffer overflow in ip_options_compile() There is a benign buffer overflow in ip_options_compile spotted by AddressSanitizer[1] : Its benign because we always can access one extra byte in skb->head (because header is followed by struct skb_shared_info), and in this case this byte is not even used. [28504.910798] ================================================================== [28504.912046] AddressSanitizer: heap-buffer-overflow in ip_options_compile [28504.913170] Read of size 1 by thread T15843: [28504.914026] [<ffffffff81802f91>] ip_options_compile+0x121/0x9c0 [28504.915394] [<ffffffff81804a0d>] ip_options_get_from_user+0xad/0x120 [28504.916843] [<ffffffff8180dedf>] do_ip_setsockopt.isra.15+0x8df/0x1630 [28504.918175] [<ffffffff8180ec60>] ip_setsockopt+0x30/0xa0 [28504.919490] [<ffffffff8181e59b>] tcp_setsockopt+0x5b/0x90 [28504.920835] [<ffffffff8177462f>] sock_common_setsockopt+0x5f/0x70 [28504.922208] [<ffffffff817729c2>] SyS_setsockopt+0xa2/0x140 [28504.923459] [<ffffffff818cfb69>] system_call_fastpath+0x16/0x1b [28504.924722] [28504.925106] Allocated by thread T15843: [28504.925815] [<ffffffff81804995>] ip_options_get_from_user+0x35/0x120 [28504.926884] [<ffffffff8180dedf>] do_ip_setsockopt.isra.15+0x8df/0x1630 [28504.927975] [<ffffffff8180ec60>] ip_setsockopt+0x30/0xa0 [28504.929175] [<ffffffff8181e59b>] tcp_setsockopt+0x5b/0x90 [28504.930400] [<ffffffff8177462f>] sock_common_setsockopt+0x5f/0x70 [28504.931677] [<ffffffff817729c2>] SyS_setsockopt+0xa2/0x140 [28504.932851] [<ffffffff818cfb69>] system_call_fastpath+0x16/0x1b [28504.934018] [28504.934377] The buggy address ffff880026382828 is located 0 bytes to the right [28504.934377] of 40-byte region [ffff880026382800, ffff880026382828) [28504.937144] [28504.937474] Memory state around the buggy address: [28504.938430] ffff880026382300: ........ 
rrrrrrrr rrrrrrrr rrrrrrrr [28504.939884] ffff880026382400: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28504.941294] ffff880026382500: .....rrr rrrrrrrr rrrrrrrr rrrrrrrr [28504.942504] ffff880026382600: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28504.943483] ffff880026382700: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28504.944511] >ffff880026382800: .....rrr rrrrrrrr rrrrrrrr rrrrrrrr [28504.945573] ^ [28504.946277] ffff880026382900: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28505.094949] ffff880026382a00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28505.096114] ffff880026382b00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28505.097116] ffff880026382c00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28505.098472] ffff880026382d00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28505.099804] Legend: [28505.100269] f - 8 freed bytes [28505.100884] r - 8 redzone bytes [28505.101649] . - 8 allocated bytes [28505.102406] x=1..7 - x allocated bytes + (8-x) redzone bytes [28505.103637] ================================================================== [1] https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-21 20:16:26 -07:00
Michal Kazior	08cf42e843	mac80211: add support for Rx reordering offloading Some drivers may be performing most of Tx/Rx aggregation on their own (e.g. in firmware) including AddBa/DelBa negotiations but may otherwise require Rx reordering assistance. The patch exports 2 new functions for establishing Rx aggregation sessions on the assumption that the device driver has taken care of the necessary negotiations. Signed-off-by: Michal Kazior <michal.kazior@tieto.com> [fix endian bug] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-07-21 17:42:07 +02:00
Michal Kazior	83eb935ec7	mac80211: fix Rx reordering with RX_FLAG_AMSDU_MORE Some drivers (e.g. ath10k) report A-MSDU subframes individually with identical seqno. The A-MPDU Rx reorder code did not account for that which made it practically unusable with drivers using RX_FLAG_AMSDU_MORE because it would end up dropping a lot of frames resulting in confusion in upper network transport layers. Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-07-21 16:17:26 +02:00
Eytan Lifshitz	60e83deb4c	mac80211: remove useless NULL checks sdata can't be NULL, and key being NULL is really not possible unless the code is modified. The sdata check made a static analyzer (klocwork) unhappy because we would get a pointer to local (sdata->local) and only then check if sdata is non-NULL. Signed-off-by: Eytan Lifshitz <eytan.lifshitz@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> [remove !key check as well] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-07-21 16:04:27 +02:00
Johan Hedberg	27f70f3e62	Bluetooth: Prefer sizeof(*ptr) when allocating memory It's safer practice to use sizeof(*ptr) instead of sizeof(ptr_type) when allocating memory in case the type changes. This also fixes the following style of warnings from static analyzers: CHECK: Prefer kzalloc(sizeof(*ie)...) over kzalloc(sizeof(struct inquiry_entry)...) + ie = kzalloc(sizeof(struct inquiry_entry), GFP_KERNEL); Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-07-21 12:59:38 +02:00
Max Stepanov	aeb136c5b4	mac80211: fix a potential NULL access in ieee80211_crypto_hw_decrypt The NULL pointer access could happen when ieee80211_crypto_hw_decrypt is called from ieee80211_rx_h_decrypt with the following condition: 1. rx->key->conf.cipher is not WEP, CCMP, TKIP or AES_CMAC 2. rx->sta is NULL When ieee80211_crypto_hw_decrypt is called, it verifies rx->sta->cipher_scheme and it will cause an Oops if rx->sta is NULL. This patch adds an additional rx->sta == NULL verification in ieee80211_crypto_hw_decrypt for this case. Signed-off-by: Max Stepanov <Max.Stepanov@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-07-21 12:34:08 +02:00
Luis R. Rodriguez	fa96aabb6a	wireless: fixup genregdb.awk for remove of antenna gain from wireless-regd Since "wireless-regdb: remove antenna gain" was merged in the wireless-regdb tree, the awk script parser has been incompatible with the 'official' regulatory database. This fixes that up. Without this change the max EIRP is set to 0 making 802.11 devices useless. The fragile nature of the awk parser must be replaced, but ideas over how to do that in the most scalable way are being reviewed. In the meantime update the documentation for CFG80211_INTERNAL_REGDB so folks are aware of expectations for now. Reported-by: John Walker <john@x109.net> Reported-by: Krishna Chaitanya <chaitanya.mgit@gmail.com> Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-07-21 12:24:20 +02:00
Luciano Coelho	3e2a0226c6	mac80211: remove redundant IEEE80211_STA_CSA_RECEIVED flag The csa_active flag was added in sdata a while ago and made IEEE80211_STA_CSA_RECEIVED redundant. The new flag is also used to mark when CSA is ongoing on other iftypes and took over the old one as the preferred method for checking whether we're in the middle of a channel switch. Remove the old, redundant flag. Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-07-21 12:21:26 +02:00
Liad Kaufman	bb3f848608	mac80211: make sure TDLS teardown packet is sent on time Since the teardown packet is created while the queues are stopped, it isn't sent immediately, but rather is pending. To be sure that when we flush the queues prior to destroying the station we also send this packet - the tasklet handling pending packets is invoked to flush the packets. Signed-off-by: Liad Kaufman <liad.kaufman@intel.com> Reviewed-by: ArikX Nemtsov <arik@wizery.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-07-21 12:14:05 +02:00
Arik Nemtsov	db8e173245	mac80211: ignore frames between TDLS peers when operating as AP If the AP receives action frames destined for other peers, it may mistakenly toggle BA-sessions from itself to a peer. Ignore TDLS data packets as well - the AP should not handle them. Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-07-21 12:14:05 +02:00
Arik Nemtsov	c72e114046	cfg80211: fix TDLS setup with VHT peers Some VHT TDLS peers (Google Nexus 5) include the VHT-AID IE in their TDLS setup request/response. Usermode passes this aid as the station aid, causing it to fail verification, since this happens in the "set_station" stage. Make an exception for the TDLS use-case. Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-07-21 12:14:04 +02:00
Arik Nemtsov	bed766bd4c	mac80211: disable VHT for TDLS TDLS VHT support requires some more information elements during setup. While these are not there, mask out the peer's VHT capabilities so that VHT rates are not mistakenly used. Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-07-21 12:14:04 +02:00
Arik Nemtsov	dc5943d540	mac80211: set Rx highest rate in ht_cap Set for completeness mostly, currently unused in the code. Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-07-21 12:14:04 +02:00
Arik Nemtsov	13cc8a4a1d	mac80211: support HT for TDLS stations Add the HT capabilities and HT operation information elements to TDLS setup packets where appropriate. Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-07-21 12:14:04 +02:00
Arik Nemtsov	81dd2b8822	mac80211: move TDLS data to mgd private part We can only be a station for TDLS connections. Also fix a bug where a delayed work could be left scheduled if the station interface was brought down during TDLS setup. Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-07-21 12:14:04 +02:00
Arik Nemtsov	6f7eaa47e1	mac80211: add TDLS QoS param IE on setup-confirm When TDLS QoS is supported by the peer and the local card, add the WMM parameter IE to the setup-confirm frame. Take the QoS settings from the current AP, or if unsupported, use the default values from the specification. This behavior is mandated by IEEE802.11-2012 section 10.22.4. Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Reviewed-by: Liad Kaufman <liad.kaufman@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-07-21 12:14:04 +02:00
Arik Nemtsov	40b861a0ee	mac80211: add QoS IE during TDLS setup start If QoS is supported by the card, add an appropriate IE to TDLS setup- request and setup-response frames. Consolidate the setting of the WMM info IE across mac80211. Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Reviewed-by: Liad Kaufman <liad.kaufman@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-07-21 12:14:04 +02:00
Arik Nemtsov	dd8c0b03d3	mac80211: set TDLS capab to zero on failure frames When sending setup-failure frames, set the capability field to zero, as mandated by the specification (IEEE802.11-2012 8.5.13). Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Reviewed-by: Liad Kaufman <liad.kaufman@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-07-21 12:14:03 +02:00
Arik Nemtsov	1606ef4a9d	mac80211: avoid adding some IEs on TDLS setup failure packets Most setup-specific information elements are not to be added when a setup frame is sent with an error status code. Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Reviewed-by: Liad Kaufman <liad.kaufman@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-07-21 12:14:03 +02:00
Arik Nemtsov	f09a87d274	mac80211: split extra TDLS IEs in setup frames When building TDLS setup frames, use the IE order mandated in the specification, splitting extra IEs coming from usermode. Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-07-21 12:14:03 +02:00
Arik Nemtsov	46792a2dfc	mac80211: consolidate TDLS IE treatment Add all information elements for TDLS discovery and setup in the same function. Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Reviewed-by: Liad Kaufman <liad.kaufman@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-07-21 12:14:03 +02:00
Arik Nemtsov	6ae32e5d28	mac80211: fix error path for TDLS setup The patch "8f02e6b mac80211: make sure TDLS peer STA exists during setup" broke TDLS error paths where the STA doesn't exist when sending the error. Fix it by only testing for STA existence during a non-error flow. Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-07-21 12:14:03 +02:00
Arik Nemtsov	626911cc60	mac80211: track TDLS initiator internally Infer the TDLS initiator and track it in mac80211 via a STA flag. This avoids breaking old userspace that doesn't pass it via nl80211 APIs. The only case where userspace will need to pass the initiator is when the STA is removed due to unreachability before a teardown packet is sent. Support for unreachability was only recently added to wpa_supplicant, so it won't be a problem in practice. Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-07-21 12:14:03 +02:00
Antonio Quartulli	35df3b298f	batman-adv: fix TT VLAN inconsistency on VLAN re-add When a VLAN interface (on top of batX) is removed and re-added within a short timeframe TT does not have enough time to properly cleanup. This creates an internal TT state mismatch as the newly created softif_vlan will be initialized from scratch with a TT client count of zero (even if TT entries for this VLAN still exist). The resulting TT messages are bogus due to the counter / tt client listing mismatch, thus creating inconsistencies on every node in the network To fix this issue destroy_vlan() has to not free the VLAN object immediately but it has to be kept alive until all the TT entries for this VLAN have been removed. destroy_vlan() still removes the sysfs folder so that the user has the feeling that everything went fine. If the same VLAN is re-added before the old object is free'd, then the latter is resurrected and re-used. Implement such behaviour by increasing the reference counter of a softif_vlan object every time a new local TT entry for such VLAN is created and remove the object from the list only when all the TT entries have been destroyed. Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>	2014-07-21 09:49:30 +02:00
Simon Wunderlich	d46b6bfa76	batman-adv: drop QinQ claim frames in bridge loop avoidance Since bridge loop avoidance only supports untagged or simple 802.1q tagged VLAN claim frames, claim frames with stacked VLAN headers (QinQ) should be detected and dropped. Transporting them over the mesh may cause problems on the receivers, or create bogus entries in the local tt tables. Reported-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Simon Wunderlich <simon@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>	2014-07-21 09:05:31 +02:00
Ben Hutchings	640d7efe4c	dns_resolver: Null-terminate the right string `*_result[len]` is parsed as `*(_result[len])` which is not at all what we want to touch here. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Fixes: `84a7c0b1db` ("dns_resolver: assure that dns_query() result is null-terminated") Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-20 22:33:32 -07:00
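The precedence problem behind this fix can be shown in a small userspace sketch. In C, the subscript operator binds tighter than unary `*`, so indexing a `char **` out-parameter without parentheses indexes the pointer-to-pointer, not the string. The helper `copy_result()` below is hypothetical and only mimics the shape of a function that returns a buffer through `_result`.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: copy len bytes of data into a freshly allocated
 * buffer, hand it back through *_result, and NUL-terminate it.
 */
static int copy_result(const char *data, size_t len, char **_result)
{
	*_result = malloc(len + 1);
	if (!*_result)
		return -1;

	memcpy(*_result, data, len);

	/*
	 * Without parentheses, *_result[len] means *(_result[len]): it reads
	 * a pointer len slots past _result and writes through it, which is
	 * out of bounds for any len > 0. The parentheses make the index
	 * apply to the string that _result points to.
	 */
	(*_result)[len] = '\0';
	return 0;
}
```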
Wei Yongjun	52f50ce556	tipc: fix sparse non static symbol warnings Fixes the following sparse warnings: net/tipc/socket.c:545:5: warning: symbol 'tipc_sk_proto_rcv' was not declared. Should it be static? net/tipc/socket.c:2015:5: warning: symbol 'tipc_ioctl' was not declared. Should it be static? Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-20 22:19:04 -07:00
Andrey Utkin	fa4eff44a6	net/rxrpc/ar-key.c: drop negativity check on unsigned value Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=80611 Reported-by: David Binderman <dcb314@hotmail.com> Signed-off-by: Andrey Utkin <andrey.krieger.utkin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-20 21:25:56 -07:00
David S. Miller	a8138f42d4	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains updates for your net-next tree, they are: 1) Use kvfree() helper function from x_tables, from Eric Dumazet. 2) Remove extra timer from the conntrack ecache extension, use a workqueue instead to redeliver lost events to userspace instead, from Florian Westphal. 3) Removal of the ulog targets for ebtables and iptables. The nflog infrastructure superseded this almost 9 years ago, time to get rid of this code. 4) Replace the list of loggers by an array now that we can only have two possible non-overlapping logger flavours, ie. kernel ring buffer and netlink logging. 5) Move Eric Dumazet's log buffer code to nf_log to reuse it from all of the supported per-family loggers. 6) Consolidate nf_log_packet() as an unified interface for packet logging. After this patch, if the struct nf_loginfo is available, it explicitly selects the logger that is used. 7) Move ip and ip6 logging code from xt_LOG to the corresponding per-family loggers. Thus, x_tables and nf_tables share the same code for packet logging. 8) Add generic ARP packet logger, which is used by nf_tables. The format aims to be consistent with the output of xt_LOG. 9) Add generic bridge packet logger. Again, this is used by nf_tables and it routes the packets to the real family loggers. As a result, we get consistent logging format for the bridge family. The ebt_log logging code has been intentionally left in place not to break backward compatibility since the logging output differs from xt_LOG. 10) Update nft_log to explicitly request the required family logger when needed. 11) Finish nft_log so it supports arp, ip, ip6, bridge and inet families. Allowing selection between netlink and kernel buffer ring logging. 12) Several fixes coming after the netfilter core logging changes spotted by robots. 
13) Use IS_ENABLED() macros whenever possible in the netfilter tree, from Duan Jiong. 14) Removal of a couple of unnecessary branch before kfree, from Fabian Frederick. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-20 21:01:43 -07:00
Cong Wang	7801db8aec	net_sched: avoid generating same handle for u32 filters When kernel generates a handle for a u32 filter, it tries to start from the max in the bucket. So when we have a filter with the max (fff) handle, it will cause kernel always generates the same handle for new filters. This can be shown by the following command: tc qdisc add dev eth0 ingress tc filter add dev eth0 parent ffff: protocol ip pref 770 handle 800::fff u32 match ip protocol 1 0xff tc filter add dev eth0 parent ffff: protocol ip pref 770 u32 match ip protocol 1 0xff ... we will get some u32 filters with same handle: # tc filter show dev eth0 parent ffff: filter protocol ip pref 770 u32 filter protocol ip pref 770 u32 fh 800: ht divisor 1 filter protocol ip pref 770 u32 fh 800::fff order 4095 key ht 800 bkt 0 match 00010000/00ff0000 at 8 filter protocol ip pref 770 u32 fh 800::fff order 4095 key ht 800 bkt 0 match 00010000/00ff0000 at 8 filter protocol ip pref 770 u32 fh 800::fff order 4095 key ht 800 bkt 0 match 00010000/00ff0000 at 8 filter protocol ip pref 770 u32 fh 800::fff order 4095 key ht 800 bkt 0 match 00010000/00ff0000 at 8 handles should be unique. This patch fixes it by looking up a bitmap, so that can guarantee the handle is as unique as possible. For compatibility, we still start from 0x800. Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-20 20:49:17 -07:00
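The bitmap lookup mentioned in the commit above could look roughly like the following userspace sketch. It is an illustration under assumptions, not the kernel implementation: the 12-bit id space, the names `pick_node_id()`, `NR_NODES` and `START_ID`, and the wrap-around search order are choices made for the example. Only the starting point of 0x800 comes from the commit message.

```c
#include <stdbool.h>
#include <stddef.h>

#define NODE_BITS 12
#define NR_NODES  (1u << NODE_BITS)	/* assumed id space: 0x000..0xfff */
#define START_ID  0x800u		/* start of the search, per the commit */

/*
 * Mark every id already in use, then return the first free id at or
 * above START_ID. If that range is full, fall back to the ids below it.
 * Returns NR_NODES when no id is free.
 */
static unsigned int pick_node_id(const unsigned int *used_ids, size_t n)
{
	bool used[NR_NODES] = { false };
	unsigned int id;
	size_t i;

	for (i = 0; i < n; i++)
		used[used_ids[i] & (NR_NODES - 1)] = true;

	for (id = START_ID; id < NR_NODES; id++)
		if (!used[id])
			return id;

	for (id = 1; id < START_ID; id++)
		if (!used[id])
			return id;

	return NR_NODES;
}
```

With this approach an existing filter at the maximum id (0xfff) no longer forces every new filter onto the same value, because the search looks for a free slot instead of starting from the current maximum.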
Veaceslav Falico	6fe82a39e5	net: print a notification on device rename Currently it's done silently (from the kernel part), and thus it might be hard to track the renames from logs. Add a simple netdev_info() to notify the rename, but only in case the previous name was valid. CC: "David S. Miller" <davem@davemloft.net> CC: Eric Dumazet <edumazet@google.com> CC: Vlad Yasevich <vyasevic@redhat.com> CC: stephen hemminger <stephen@networkplumber.org> CC: Jerry Chu <hkchu@google.com> CC: Ben Hutchings <bhutchings@solarflare.com> CC: David Laight <David.Laight@ACULAB.COM> Signed-off-by: Veaceslav Falico <vfalico@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-20 20:44:25 -07:00
Veaceslav Falico	ccc7f4968a	net: print net_device reg_state in netdev_* unless it's registered This way we'll always know in what status the device is, unless it's running normally (i.e. NETDEV_REGISTERED). Also, emit a warning once in case of a bad reg_state. CC: "David S. Miller" <davem@davemloft.net> CC: Jason Baron <jbaron@akamai.com> CC: Eric Dumazet <edumazet@google.com> CC: Vlad Yasevich <vyasevic@redhat.com> CC: stephen hemminger <stephen@networkplumber.org> CC: Jerry Chu <hkchu@google.com> CC: Ben Hutchings <bhutchings@solarflare.com> CC: Joe Perches <joe@perches.com> Signed-off-by: Veaceslav Falico <vfalico@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-20 20:38:43 -07:00
Cong Wang	224e923cd9	net_sched: hold tcf_lock in netdevice notifier We modify mirred action (m->tcfm_dev) in netdev event, we need to prevent on-going mirred actions from reading freed m->tcfm_dev. So we need to acquire this spin lock. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-20 20:31:42 -07:00
Mark A. Greer	55537c7e7d	NFC: digital: Add digital framing calls when in target mode Add new "NFC_DIGITAL_FRAMING_*" calls to the digital layer so the driver can make the necessary adjustments when performing anticollision while in target mode. The driver must ensure that the effect of these calls happens after the following response has been sent but before reception of the next request begins. Acked-by: Thierry Escande <thierry.escande@linux.intel.com> Signed-off-by: Mark A. Greer <mgreer@animalcreek.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2014-07-21 00:45:21 +02:00
Mark A. Greer	0529a7adf3	NFC: digital: Clear poll_tech_count before activating target Currently, digital_target_found() has a race between the events started by calling nfc_targets_found() (which ultimately expect ddev->poll_tech_count to be zero) and setting ddev->poll_tech_count to zero after the call to nfc_targets_found(). When the race is "lost" (i.e., ddev->poll_tech_count is found to not be zero by the events started by nfc_targets_found()), an error message is printed and the target is not found. A similar race exists when digital_tg_recv_atr_req() calls nfc_tm_activated(). Fix this by first saving the current value of ddev->poll_tech_count and then clearing it before calling nfc_targets_found()/nfc_tm_activated(). Clearing ddev->poll_tech_count before calling nfc_targets_found()/nfc_tm_activated() eliminates the race. Saving the value is required so it can be restored when nfc_targets_found()/nfc_tm_activated() fails and polling needs to continue. Acked-by: Thierry Escande <thierry.escande@linux.intel.com> Signed-off-by: Mark A. Greer <mgreer@animalcreek.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2014-07-21 00:45:11 +02:00
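The save, clear, and restore sequence described in this commit can be sketched as follows. This is a single-threaded illustration of the ordering only. `struct demo_dev`, `activate_target()` and the two callbacks are hypothetical; the callbacks stand in for `nfc_targets_found()`/`nfc_tm_activated()` and the work they trigger.

```c
/* Hypothetical stand-in for the digital device state. */
struct demo_dev {
	int poll_tech_count;
};

/* Stand-in for an activation path that expects the count to be zero. */
static int notify_ok(struct demo_dev *ddev)
{
	return ddev->poll_tech_count == 0 ? 0 : -1;
}

/* Stand-in for an activation path that always fails. */
static int notify_fail(struct demo_dev *ddev)
{
	(void)ddev;
	return -1;
}

static int activate_target(struct demo_dev *ddev,
			   int (*notify)(struct demo_dev *))
{
	int saved = ddev->poll_tech_count;	/* keep it for the error path */
	int rc;

	/* Clear before notifying, so the notified code sees zero. */
	ddev->poll_tech_count = 0;

	rc = notify(ddev);
	if (rc)
		ddev->poll_tech_count = saved;	/* polling must continue */

	return rc;
}
```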
Mark A. Greer	4b4dbca5e4	NFC: digital: Check for NFC-DEP before checking for Type 4 tag In digital_in_recv_sel_res(), the code that determines the tag type will interpret bits 7:6 (lsb being b1 as per the Digital Specification) of a SEL RES set to 11b as a Type 4 tag. This is okay except that the neard will interpret the same value as an NFC-DEP device (in src/tag.c:set_tag_type() in the neard source). Make the digital layer's interpretation match neard's interpretation by changing the order of the checks in digital_in_recv_sel_res() so that a value of 11b in bits 7:6 is interpreted as an NFC-DEP device instead of a Type 4 tag. Acked-by: Thierry Escande <thierry.escande@linux.intel.com> Signed-off-by: Mark A. Greer <mgreer@animalcreek.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2014-07-21 00:45:03 +02:00
Marcel Holtmann	0a961a440d	Bluetooth: Remove unneeded variable assignment in hmac_sha256 The variable ret does not need to be assigned when declaring it. So remove this initial assignment. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2014-07-20 19:53:11 +03:00
Johan Hedberg	d1d588c181	Bluetooth: Disable HCI_CONNECTABLE based passive scanning for now When HCI_CONNECTABLE is set the code has been enabling passive scanning in order to be consistent with BR/EDR and accept connections from any device doing directed advertising to us. However, some hardware (particularly CSR) can get very noisy even when doing duplicates filtering, making this feature waste resources. Considering that the feature is for fairly corner-case use (devices who'd use directed advertising would likely be in the whitelist anyway) it's better to disable it for now. It may still be brought back later, possibly with a better implementation (e.g. through improved scan parameters). Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-07-20 16:15:38 +02:00
John W. Linville	fd29d2cdd5	Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next	2014-07-18 13:35:45 -04:00
John W. Linville	7fc9427222	Merge git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211	2014-07-18 12:55:45 -04:00
Trond Myklebust	22cb43855d	SUNRPC: xdr_get_next_encode_buffer should be declared static Quell another sparse warning. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-18 11:35:46 -04:00
Chuck Lever	3c45ddf823	svcrdma: Select NFSv4.1 backchannel transport based on forward channel The current code always selects XPRT_TRANSPORT_BC_TCP for the back channel, even when the forward channel was not TCP (eg, RDMA). When a 4.1 mount is attempted with RDMA, the server panics in the TCP BC code when trying to send CB_NULL. Instead, construct the transport protocol number from the forward channel transport or'd with XPRT_TRANSPORT_BC. Transports that do not support bi-directional RPC will not have registered a "BC" transport, causing create_backchannel_client() to fail immediately. Fixes: https://bugzilla.linux-nfs.org/show_bug.cgi?id=265 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-18 11:35:45 -04:00
Johan Hedberg	beb19e4c07	Bluetooth: Use EOPNOTSUPP instead of ENOTSUPP The EOPNOTSUPP and ENOTSUPP errors are very similar in meaning, but ENOTSUPP is a fairly new addition to POSIX. Not all libc versions know about the value the kernel uses for ENOTSUPP so it's better to use EOPNOTSUPP to ensure understandable error messages. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-07-18 11:11:38 +02:00
Eliad Peller	8c26d45839	cfg80211: fix mic_failure tracing tsc can be NULL (mac80211 currently always passes NULL), resulting in NULL-dereference. check before copying it. Cc: stable@vger.kernel.org Signed-off-by: Eliad Peller <eliadx.peller@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-07-18 09:53:56 +02:00
Johannes Berg	1d4cc30c86	mac80211: suppress unused variable warning without lockdep When lockdep isn't compiled, a local variable isn't used (it's only in a macro argument), annotate it to suppress the compiler warning. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-07-18 09:47:26 +02:00
Anish Bhatt	c2659479f7	Update setapp/getapp prototypes in dcbnl_rtnl_ops to return int instead of u8 v2: fixed issue with checking return of dcbnl_rtnl_ops->getapp() Signed-off-by: Anish Bhatt <anish@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-17 16:02:29 -07:00
Cong Wang	9cc63db5e1	net_sched: cancel nest attribute on failure in tcf_exts_dump() Like other places, we need to cancel the nest attribute after we start. Fortunately the netlink message will not be sent on failure, so it's not a big problem at all. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-17 14:58:52 -07:00
David Howells	0c7774abb4	KEYS: Allow special keys (eg. DNS results) to be invalidated by CAP_SYS_ADMIN Special kernel keys, such as those used to hold DNS results for AFS, CIFS and NFS and those used to hold idmapper results for NFS, used to be 'invalidateable' with key_revoke(). However, since the default permissions for keys were reduced: Commit: `96b5c8fea6` KEYS: Reduce initial permissions on keys it has become impossible to do this. Add a key flag (KEY_FLAG_ROOT_CAN_INVAL) that will permit a key to be invalidated by root. This should not be used for system keyrings as the garbage collector will try and remove any invalidated key. For system keyrings, KEY_FLAG_ROOT_CAN_CLEAR can be used instead. After this, from userspace, keyctl_invalidate() and "keyctl invalidate" can be used by any possessor of CAP_SYS_ADMIN (typically root) to invalidate DNS and idmapper keys. Invalidated keys are immediately garbage collected and will be immediately rerequested if needed again. Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Steve Dickson <steved@redhat.com>	2014-07-17 20:45:08 +01:00
Johan Hedberg	2f407f0afb	Bluetooth: Fix allowing initiating pairing when not pairable When we're not pairable we should still allow us to act as initiators for pairing, i.e. the HCI_PAIRABLE flag should only be affecting incoming pairing attempts. This patch fixes the relevant checks for the hci_io_capa_request_evt() and hci_pin_code_request_evt() functions. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-07-17 14:39:40 +02:00
Johan Hedberg	977f8fce02	Bluetooth: Introduce a flag to track who really initiates authentication Even though our side requests authentication, the original action that caused it may be remotely triggered, such as an incoming L2CAP or RFCOMM connect request. To track this information introduce a new hci_conn flag called HCI_CONN_AUTH_INITIATOR. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-07-17 14:39:40 +02:00
Johan Hedberg	e7cafc4525	Bluetooth: Pass initiator/acceptor information to hci_conn_security() We're interested in whether an authentication request is because of a remote or local action. So far hci_conn_security() has been used both for incoming and outgoing actions (e.g. RFCOMM or L2CAP connect requests) so without some modifications it cannot know which peer is responsible for requesting authentication. This patch adds a new "bool initiator" parameter to hci_conn_security() to indicate which side is responsible for the request and updates the current users to pass this information correspondingly. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-07-17 14:39:39 +02:00
Johan Hedberg	c1d4fa7aa8	Bluetooth: Fix resetting remote authentication requirement after pairing When a new hci_conn object is created the remote SSP authentication requirement is set to the invalid value 0xff to indicate that it is unknown. Once pairing completes however the code was leaving it as-is. In case a new pairing happens over the same connection it is important that we reset the value back to unknown so that the pairing code doesn't make false assumptions about the requirements. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-07-17 14:28:09 +02:00
Vladimir Davydov	093facf363	Bluetooth: never linger on process exit If the current process is exiting, lingering on socket close will make it unkillable, so we should avoid it. Reproducer: #include <sys/types.h> #include <sys/socket.h> #define BTPROTO_L2CAP 0 #define BTPROTO_SCO 2 #define BTPROTO_RFCOMM 3 int main() { int fd; struct linger ling; fd = socket(PF_BLUETOOTH, SOCK_STREAM, BTPROTO_RFCOMM); //or: fd = socket(PF_BLUETOOTH, SOCK_DGRAM, BTPROTO_L2CAP); //or: fd = socket(PF_BLUETOOTH, SOCK_SEQPACKET, BTPROTO_SCO); ling.l_onoff = 1; ling.l_linger = 1000000000; setsockopt(fd, SOL_SOCKET, SO_LINGER, &ling, sizeof(ling)); return 0; } Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Cc: stable@vger.kernel.org	2014-07-17 12:13:06 +02:00
Johan Hedberg	02f3e25457	Bluetooth: Don't bother user space without IO capabilities If user space has a NoInputNoOutput IO capability it makes no sense to bother it with confirmation requests. This patch updates both SSP and SMP to check for the local IO capability before sending a user confirmation request to user space. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-07-17 11:43:06 +02:00
Johan Hedberg	9f743d7499	Bluetooth: Fix using uninitialized variable when pairing Commit `6c53823ae0` reshuffled the way the authentication requirement gets set in the hci_io_capa_request_evt() function, but at the same time it failed to update an if-statement where cp.authentication is used before it has been initialized. The correct value the code should be looking for in this if-statement is conn->auth_type. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Cc: stable@vger.kernel.org # 3.16	2014-07-17 11:38:00 +02:00
stephen hemminger	48e48a70c0	openvswitch: make generic netlink group const Generic netlink tables can be const. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-16 23:41:13 -07:00
David Held	2dc41cff75	udp: Use hash2 for long hash1 chains in __udp*_lib_mcast_deliver. Many multicast sources can have the same port which can result in a very large list when hashing by port only. Hash by address and port instead if this is the case. This makes multicast more similar to unicast. On a 24-core machine receiving from 500 multicast sockets on the same port, before this patch 80% of system CPU was used up by spin locking and only ~25% of packets were successfully delivered. With this patch, all packets are delivered and kernel overhead is ~8% system CPU on spinlocks. Signed-off-by: David Held <drheld@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-16 23:29:52 -07:00
David Held	5cf3d46192	udp: Simplify __udp*_lib_mcast_deliver. Switch to using sk_nulls_for_each which shortens the code and makes it easier to update. Signed-off-by: David Held <drheld@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-16 23:29:52 -07:00
Varka Bhadram	498044bb2b	netlink: remove bool varible This patch removes the bool variable 'pass'. If the switch case exists, return true; otherwise return false. Signed-off-by: Varka Bhadram <varkab@cdac.in> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-16 23:15:00 -07:00
Alexander Duyck	c8a89c4a1d	rtnetlink: Drop unnecessary return value from ndo_dflt_fdb_del This change cleans up ndo_dflt_fdb_del to drop the ENOTSUPP return value since that isn't actually returned anywhere in the code. As a result we are able to drop a few lines by just defaulting this to -EINVAL. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-16 23:13:26 -07:00
françois romieu	a40e0a664b	net: remove open-coded skb_cow_head. Signed-off-by: Francois Romieu <romieu@fr.zoreil.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-16 22:42:32 -07:00
Jon Paul Maloy	6f92ee54b3	tipc: ensure sequential message delivery across dual bearers When we run broadcast packets over dual bearers/interfaces, the current transmission code is flipping bearers between each sent packet, with the purpose of leveraging the double bandwidth available. The receiving bclink is resequencing the packets if needed, so all messages are delivered upwards from the broadcast link in the correct order, even if they may arrive in concurrent interrupts. However, at the moment of delivery upwards to the socket, we release all spinlocks (bclink_lock, node_lock), so it is still possible that arriving messages bypass each other before they reach the socket queue. We fix this by applying the same technique we are using for unicast traffic. We use a link selector (i.e., the last bit of sending port number) to ensure that messages from the same sender socket always are sent over the same bearer. This guarantees sequential delivery between socket pairs, which is sufficient to satisfy the protocol spec, as well as all known user requirements. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-16 21:38:19 -07:00
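The link selector idea in the commit above reduces to a one-line mapping, sketched here under assumptions. With two bearers, the last bit of the sending port number picks the bearer, so every message from one sender socket takes the same path and cannot overtake its predecessors. `select_bearer()` is a hypothetical name, not a TIPC function.

```c
/*
 * Map a sending port number to one of two bearers (0 or 1) using its
 * lowest bit. The result is stable for a given port.
 */
static unsigned int select_bearer(unsigned int port)
{
	return port & 1u;
}
```

Traffic from different sockets still spreads across both bearers, while ordering is preserved per sender socket.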
Jon Paul Maloy	9fbfb8b120	tipc: rename temporarily named functions After the previous commit, we can now give the functions with temporary names, such as tipc_link_xmit2(), tipc_msg_build2() etc., their proper names. There are no functional changes in this commit. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-16 21:38:19 -07:00
Jon Paul Maloy	c4116e1057	tipc: remove unreferenced functions We can now remove a number of functions which have become obsolete and unreferenced through this commit series. There are no functional changes in this commit. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-16 21:38:19 -07:00
Jon Paul Maloy	0abd8ff21f	tipc: start using the new multicast functions In this commit, we convert the socket multicast send function to directly call the new multicast/broadcast function (tipc_bclink_xmit2()) introduced in the previous commit. We do this instead of letting the call go via the now obsolete tipc_port_mcast_xmit(), hence saving a call level and some code complexity. We also remove the initial destination lookup at the message sending side, and replace that with an unconditional lookup at the receiving side, including on the sending node itself. This makes the destination lookup and message transfer more uniform than before. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-16 21:38:18 -07:00
Jon Paul Maloy	078bec826f	tipc: add new functions for multicast and broadcast distribution We add a new broadcast link transmit function in bclink.c and a new receive function in socket.c. The purpose is to move the branching between external and internal destination down to the link layer, just as we have done with unicast in earlier commits. We also make use of the new link-independent fragmentation support that was introduced in an earlier commit series. This gives a shorter and simpler code path, and makes it possible to obtain copy-free buffer delivery to all node local destination sockets. The new transmission code is added in parallel with the existing one, and will be used by the socket multicast send function in the next commit in this series. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-16 21:38:18 -07:00
Jon Paul Maloy	25b660c7e2	tipc: let internal link users call the new link send function We convert the link internal users (changeover protocol, broadcast synchronization) to use the new packet send function. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-16 21:38:18 -07:00
Jon Paul Maloy	dbdf6d24ad	tipc: make name table distributor use new send function In a previous commit series ("tipc: new unicast transmission code") we introduced a new message sending function, tipc_link_xmit2(), and moved the unicast data users over to use that function. We now let the internal name table distributor do the same. The interaction between the name distributor and the node/link layer also becomes significantly simpler, so we can eliminate the function tipc_link_names_xmit(). Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-16 21:38:18 -07:00
Alex Gartrell	76f084bc10	ipvs: Maintain all DSCP and ECN bits for ipv6 tun forwarding Previously, only the four high bits of the tclass were maintained in the ipv6 case. This matches the behavior of ipv4, though whether or not we should reflect ECN bits may be up for debate. Signed-off-by: Alex Gartrell <agartrell@fb.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2014-07-17 12:53:54 +09:00
David S. Miller	38a4dfcf80	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== Netfilter/nf_tables fixes The following patchset contains nf_tables fixes, they are: 1) Fix wrong transaction handling when the table flags are not modified. 2) Fix missing rcu read_lock section in the netlink dump path, which is not protected by the nfnl_lock. 3) Set NLM_F_DUMP_INTR in the netlink dump path to indicate interferences with updates. 4) Fix 64 bits chain counters when they are retrieved from a 32 bits arch, from Eric Dumazet. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-16 15:27:16 -07:00
Jerry Chu	c3caf1192f	net-gre-gro: Fix a bug that breaks the forwarding path Fixed a bug that was introduced by my GRE-GRO patch (`bf5a755f5e` net-gre-gro: Add GRE support to the GRO stack) that breaks the forwarding path because various GSO related fields were not set. On the egress path, the bug causes either the GSO code to fail or GRE-TSO capable (NETIF_F_GSO_GRE) NICs to choke. The following fix has been tested for both cases. Signed-off-by: H.K. Jerry Chu <hkchu@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-16 14:45:26 -07:00
Daniel Borkmann	bbbea41d5e	net: sctp: deprecate rfc6458, 5.3.2. SCTP_SNDRCV support With support of SCTP_SNDINFO/SCTP_RCVINFO as described in RFC6458, 5.3.4/5.3.5, we can now deprecate SCTP_SNDRCV. The RFC already declares it as deprecated: This structure mixes the send and receive path. SCTP_SNDINFO (described in Section 5.3.4) and SCTP_RCVINFO (described in Section 5.3.5) split this information. These structures should be used, when possible, since SCTP_SNDRCV is deprecated. So whenever a user tries to subscribe to sctp_data_io_event via setsockopt(2) which triggers inclusion of SCTP_SNDRCV cmsg_type, issue a warning in the log. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-16 14:40:04 -07:00
Geir Ola Vaagland	6b3fd5f3a2	net: sctp: implement rfc6458, 8.1.31. SCTP_DEFAULT_SNDINFO support This patch implements section 8.1.31. of RFC6458, which adds support for setting/retrieving SCTP_DEFAULT_SNDINFO: Applications that wish to use the sendto() system call may wish to specify a default set of parameters that would normally be supplied through the inclusion of ancillary data. This socket option allows such an application to set the default sctp_sndinfo structure. The application that wishes to use this socket option simply passes the sctp_sndinfo structure (defined in Section 5.3.4) to this call. The input parameters accepted by this call include snd_sid, snd_flags, snd_ppid, and snd_context. The snd_flags parameter is composed of a bitwise OR of SCTP_UNORDERED, SCTP_EOF, and SCTP_SENDALL. The snd_assoc_id field specifies the association to which to apply the parameters. For a one-to-many style socket, any of the predefined constants are also allowed in this field. The field is ignored for one-to-one style sockets. Joint work with Daniel Borkmann. Signed-off-by: Geir Ola Vaagland <geirola@gmail.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-16 14:40:04 -07:00
Geir Ola Vaagland	2347c80ff1	net: sctp: implement rfc6458, 5.3.6. SCTP_NXTINFO cmsg support This patch implements section 5.3.6. of RFC6458, that is, support for 'SCTP Next Receive Information Structure' (SCTP_NXTINFO) which is placed into ancillary data cmsghdr structure for each recvmsg() call, if this information is already available when delivering the current message. This option can be enabled/disabled via setsockopt(2) on SOL_SCTP level by setting an int value with 1/0 for SCTP_RECVNXTINFO in user space applications as per RFC6458, section 8.1.30. The sctp_nxtinfo structure is defined as per RFC as below ... struct sctp_nxtinfo { uint16_t nxt_sid; uint16_t nxt_flags; uint32_t nxt_ppid; uint32_t nxt_length; sctp_assoc_t nxt_assoc_id; }; ... and provided under cmsg_level IPPROTO_SCTP, cmsg_type SCTP_NXTINFO, while cmsg_data[] contains struct sctp_nxtinfo. Joint work with Daniel Borkmann. Signed-off-by: Geir Ola Vaagland <geirola@gmail.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-16 14:40:03 -07:00
Geir Ola Vaagland	0d3a421d28	net: sctp: implement rfc6458, 5.3.5. SCTP_RCVINFO cmsg support This patch implements section 5.3.5. of RFC6458, that is, support for 'SCTP Receive Information Structure' (SCTP_RCVINFO) which is placed into ancillary data cmsghdr structure for each recvmsg() call. This option can be enabled/disabled via setsockopt(2) on SOL_SCTP level by setting an int value with 1/0 for SCTP_RECVRCVINFO in user space applications as per RFC6458, section 8.1.29. The sctp_rcvinfo structure is defined as per RFC as below ... struct sctp_rcvinfo { uint16_t rcv_sid; uint16_t rcv_ssn; uint16_t rcv_flags; <-- 2 bytes hole --> uint32_t rcv_ppid; uint32_t rcv_tsn; uint32_t rcv_cumtsn; uint32_t rcv_context; sctp_assoc_t rcv_assoc_id; }; ... and provided under cmsg_level IPPROTO_SCTP, cmsg_type SCTP_RCVINFO, while cmsg_data[] contains struct sctp_rcvinfo. An sctp_rcvinfo item always corresponds to the data in msg_iov. Joint work with Daniel Borkmann. Signed-off-by: Geir Ola Vaagland <geirola@gmail.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-16 14:40:03 -07:00
Geir Ola Vaagland	63b949382c	net: sctp: implement rfc6458, 5.3.4. SCTP_SNDINFO cmsg support This patch implements section 5.3.4. of RFC6458, that is, support for 'SCTP Send Information Structure' (SCTP_SNDINFO) which can be placed into ancillary data cmsghdr structure for sendmsg() calls. The sctp_sndinfo structure is defined as per RFC as below ... struct sctp_sndinfo { uint16_t snd_sid; uint16_t snd_flags; uint32_t snd_ppid; uint32_t snd_context; sctp_assoc_t snd_assoc_id; }; ... and supplied under cmsg_level IPPROTO_SCTP, cmsg_type SCTP_SNDINFO, while cmsg_data[] contains struct sctp_sndinfo. An sctp_sndinfo item always corresponds to the data in msg_iov. Joint work with Daniel Borkmann. Signed-off-by: Geir Ola Vaagland <geirola@gmail.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-16 14:40:03 -07:00
David S. Miller	1a98c69af1	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-16 14:09:34 -07:00
Johan Hedberg	46c4c941a4	Bluetooth: Fix always checking the blacklist for incoming connections We should check the blacklist no matter what, meaning also when we're not connectable. This patch fixes the respective logic in the function making the decision whether to accept a connection or not. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-07-16 15:21:53 +02:00
NeilBrown	c1221321b7	sched: Allow wait_on_bit_action() functions to support a timeout It is currently not possible for various wait_on_bit functions to implement a timeout. While the "action" function that is called to do the waiting could certainly use schedule_timeout(), there is no way to carry forward the remaining timeout after a false wake-up. As false wake-ups are clearly possible, at least due to possible hash collisions in bit_waitqueue(), this is a real problem. The 'action' function is currently passed a pointer to the word containing the bit being waited on. No current action functions use this pointer. So changing it to something else will be a little noisy but will have no immediate effect. This patch changes the 'action' function to take a pointer to the "struct wait_bit_key", which contains a pointer to the word containing the bit so nothing is really lost. It also adds a 'private' field to "struct wait_bit_key", which is initialized to zero. An action function can now implement a timeout with something like static int timed_out_waiter(struct wait_bit_key *key) { unsigned long waited; if (key->private == 0) { key->private = jiffies; if (key->private == 0) key->private -= 1; } waited = jiffies - key->private; if (waited > 10 * HZ) return -EAGAIN; schedule_timeout(10 * HZ - waited); return 0; } If any other need for context in a waiter were found it would be easy to use ->private for some other purpose, or even extend "struct wait_bit_key". My particular need is to support timeouts in nfs_release_page() to avoid deadlocks with loopback mounted NFS. While wait_on_bit_timeout() would be a cleaner interface, it will not meet my need. I need the timeout to be sensitive to the state of the connection with the server, which could change. So I need to use an 'action' interface.
Signed-off-by: NeilBrown <neilb@suse.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steve French <sfrench@samba.org> Cc: David Howells <dhowells@redhat.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20140707051604.28027.41257.stgit@notabene.brown Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-07-16 15:10:41 +02:00
NeilBrown	743162013d	sched: Remove proliferation of wait_on_bit() action functions The current "wait_on_bit" interface requires an 'action' function to be provided which does the actual waiting. There are over 20 such functions, many of them identical. Most cases can be satisfied by one of just two functions, one which uses io_schedule() and one which just uses schedule(). So: Rename wait_on_bit and wait_on_bit_lock to wait_on_bit_action and wait_on_bit_lock_action to make it explicit that they need an action function. Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io which are not given an action function but implicitly use a standard one. The decision to error-out if a signal is pending is now made based on the 'mode' argument rather than being encoded in the action function. All instances of the old wait_on_bit and wait_on_bit_lock which can use the new version have been changed accordingly and their action functions have been discarded. wait_on_bit{_lock} does not return any specific error code in the event of a signal so the caller must check for non-zero and interpolate their own error code as appropriate. The wait_on_bit() call in __fscache_wait_on_invalidate() was ambiguous as it specified TASK_UNINTERRUPTIBLE but used fscache_wait_bit_interruptible as an action function. David Howells confirms this should be uniformly "uninterruptible". The main remaining user of wait_on_bit{,_lock}_action is NFS which needs to use a freezer-aware schedule() call. A comment in fs/gfs2/glock.c notes that having multiple 'action' functions is useful as they display differently in the 'wchan' field of 'ps' (and /proc/$PID/wchan). As the new bit_wait{,_io} functions are tagged "__sched", they will not show up at all, but something higher in the stack. So the distinction will still be visible, only with different function names (gfs2_glock_wait versus gfs2_glock_dq_wait in the gfs2/glock.c case).
Since the first version of this patch (against 3.15) two new action functions appeared, one in NFS and one in CIFS. CIFS also now uses an action function that makes the same freezer aware schedule call as NFS. Signed-off-by: NeilBrown <neilb@suse.de> Acked-by: David Howells <dhowells@redhat.com> (fscache, keys) Acked-by: Steven Whitehouse <swhiteho@redhat.com> (gfs2) Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steve French <sfrench@samba.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brown Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-07-16 15:10:39 +02:00
Johan Hedberg	f99353cf9c	Bluetooth: Fix trying to initiate connections when acting as LE slave When we have at least one LE slave connection most (probably all) controllers will refuse to initiate any new connections. To avoid unnecessary failures simply check for this situation up-front and skip the connection attempt. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-07-16 11:58:04 +02:00
Johan Hedberg	a5c4e309b9	Bluetooth: Add a role parameter to hci_conn_add() We need to be able to track slave vs master LE connections in hci_conn_hash, and to be able to do that we need to know the role of the connection by the time hci_conn_hash_add() is called. This means in practice the hci_conn_add() call that creates the hci_conn object. This patch adds a new role parameter to the hci_conn_add() function to give the object its initial role value, and updates the callers to pass the appropriate role to it. Since the function now takes care of initializing both conn->role and conn->out values we can remove some other unnecessary assignments. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-07-16 11:58:03 +02:00
Johan Hedberg	e804d25d4a	Bluetooth: Use explicit role instead of a bool in function parameters To make the code more understandable it makes sense to use the new HCI defines for connection role instead of a "bool master" parameter. This makes it immediately clear when looking at the function calls what the last parameter is describing. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-07-16 11:04:23 +02:00
Johan Hedberg	40bef302f6	Bluetooth: Convert HCI_CONN_MASTER flag to a conn->role variable Having a dedicated u8 role variable in the hci_conn struct greatly simplifies tracking of the role, since this is the native way that it's represented on the HCI level. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-07-16 11:04:23 +02:00
Johan Hedberg	ba165a90b5	Bluetooth: Add proper defines for HCI connection role All HCI commands and events, including LE ones, use 0x00 for master role and 0x01 for slave role. It makes therefore sense to add generic defines for these instead of the current LE_CONN_ROLE_MASTER. Having clean defines will also make it possible to provide simpler internal APIs. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-07-16 11:04:23 +02:00
Yannick Brosseau	16ea4c6b9d	ipvs: Remove dead debug code This code section cannot compile as it refers to a non-existent variable. It also pre-dates git history. Signed-off-by: Yannick Brosseau <scientist@fb.com> Signed-off-by: Simon Horman <horms@verge.net.au>	2014-07-16 10:07:11 +09:00
Fabian Frederick	b734427a4f	ipvs: remove null test before kfree Fix checkpatch warning: WARNING: kfree(NULL) is safe this check is probably not required Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Simon Horman <horms@verge.net.au>	2014-07-16 10:07:01 +09:00
Julian Anastasov	2627b7e15c	ipvs: avoid netns exit crash on ip_vs_conn_drop_conntrack commit `8f4e0a1868` ("IPVS netns exit causes crash in conntrack") added a second ip_vs_conn_drop_conntrack call instead of just adding the needed check. As a result, the first call can still cause a crash on netns exit. Remove it. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Hans Schillstrom <hans@schillstrom.com> Signed-off-by: Simon Horman <horms@verge.net.au>	2014-07-16 09:39:28 +09:00
Willem de Bruijn	11878b40ed	net-timestamp: SOCK_RAW and PING timestamping Add SO_TIMESTAMPING to sockets of type PF_INET[6]/SOCK_RAW: Add the necessary sock_tx_timestamp calls to the datapath for RAW sockets (ping sockets already had these calls). Fix the IP output path to pass the timestamp flags on the first fragment also for these sockets. The existing code relies on transhdrlen != 0 to indicate a first fragment. For these sockets, that assumption does not hold. This fixes http://bugzilla.kernel.org/show_bug.cgi?id=77221 Tested SOCK_RAW on IPv4 and IPv6, not PING. Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-15 16:32:45 -07:00
Fabian Frederick	138ce91024	net: sctp: remove unnecessary break after return/goto Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-15 16:27:01 -07:00
Fabian Frederick	84dbee1869	ieee802154: remove unnecessary break after goto Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-15 16:27:01 -07:00
Fabian Frederick	f301f22f36	irda: remove unnecessary break after return Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-15 16:27:01 -07:00
Fabian Frederick	1f7a316f9b	caif: remove unnecessary break after goto Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-15 16:27:01 -07:00
Fabian Frederick	6c4c170105	NFC: remove unnecessary break after goto Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-15 16:27:01 -07:00
Fabian Frederick	66635dc121	ipv6: remove unnecessary break after return Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-15 16:27:01 -07:00
Fabian Frederick	d518825eab	netfilter: remove unnecessary break after return Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-15 16:27:00 -07:00
Fabian Frederick	4ecf1dc7ad	af_key: remove unnecessary break after return Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-15 16:27:00 -07:00
Fabian Frederick	be45dff290	mac80211: remove unnecessary break after return Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-15 16:27:00 -07:00
Fabian Frederick	4d3520cb52	drop_monitor: remove unnecessary break after return Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-15 16:27:00 -07:00
Fabian Frederick	aee944ddf8	pktgen: remove unnecessary break after goto Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-15 16:27:00 -07:00
Fabian Frederick	0947611d16	netlabel: remove unnecessary break after goto Signed-off-by: Fabian Frederick <fabf@skynet.be> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-15 16:27:00 -07:00
Fabian Frederick	089b03227a	af_iucv: remove unnecessary break after goto Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-15 16:27:00 -07:00
Fabian Frederick	d8282ea05a	9P: remove unnecessary break after return Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-15 16:27:00 -07:00
Fabian Frederick	7ceaa583be	tipc: remove unnecessary break after return Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-15 16:26:59 -07:00
Fabian Frederick	fe8c0f4ac2	packet: remove unnecessary break after return Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-15 16:26:59 -07:00
Christoph Paasch	5ee2c941b5	tcp: Remove unnecessary arg from tcp_enter_cwr and tcp_init_cwnd_reduction Since Yuchung's `9b44190dc1` (tcp: refactor F-RTO), tcp_enter_cwr is always called with set_ssthresh = 1. Thus, we can remove this argument from tcp_enter_cwr. Further, as we remove this one, tcp_init_cwnd_reduction is then always called with set_ssthresh = true, and so we can get rid of this argument as well. Cc: Yuchung Cheng <ycheng@google.com> Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-15 16:19:36 -07:00
Tom Gundersen	5517750f05	net: rtnetlink - make create_link take name_assign_type This passes down NET_NAME_USER (or NET_NAME_ENUM) to alloc_netdev(), for any device created over rtnetlink. v9: restore reverse-christmas-tree order of local variables Signed-off-by: Tom Gundersen <teg@jklm.no> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-15 16:13:07 -07:00
Tom Gundersen	c835a67733	net: set name_assign_type in alloc_netdev() Extend alloc_netdev{,_mq{,s}}() to take name_assign_type as argument, and convert all users to pass NET_NAME_UNKNOWN. Coccinelle patch: @@ expression sizeof_priv, name, setup, txqs, rxqs, count; @@ ( -alloc_netdev_mqs(sizeof_priv, name, setup, txqs, rxqs) +alloc_netdev_mqs(sizeof_priv, name, NET_NAME_UNKNOWN, setup, txqs, rxqs) \| -alloc_netdev_mq(sizeof_priv, name, setup, count) +alloc_netdev_mq(sizeof_priv, name, NET_NAME_UNKNOWN, setup, count) \| -alloc_netdev(sizeof_priv, name, setup) +alloc_netdev(sizeof_priv, name, NET_NAME_UNKNOWN, setup) ) v9: move comments here from the wrong commit Signed-off-by: Tom Gundersen <teg@jklm.no> Reviewed-by: David Herrmann <dh.herrmann@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-15 16:12:48 -07:00
Tom Gundersen	238fa3623a	net: set name assign type for renamed devices Based on a patch from David Herrmann. This is the only place devices can be renamed. v9: restore reverse-christmas-tree order of local variables Signed-off-by: Tom Gundersen <teg@jklm.no> Reviewed-by: David Herrmann <dh.herrmann@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-15 16:12:01 -07:00
Tom Gundersen	685343fc3b	net: add name_assign_type netdev attribute Based on a patch by David Herrmann. The name_assign_type attribute gives hints where the interface name of a given net-device comes from. These values are currently defined: NET_NAME_ENUM: The ifname is provided by the kernel with an enumerated suffix, typically based on order of discovery. Names may be reused and unpredictable. NET_NAME_PREDICTABLE: The ifname has been assigned by the kernel in a predictable way that is guaranteed to avoid reuse and always be the same for a given device. Examples include statically created devices like the loopback device and names deduced from hardware properties (including being given explicitly by the firmware). Names depending on the order of discovery, or in any other way on the existence of other devices, must not be marked as PREDICTABLE. NET_NAME_USER: The ifname was provided by user-space during net-device setup. NET_NAME_RENAMED: The net-device has been renamed from userspace. Once this type is set, it cannot change again. NET_NAME_UNKNOWN: This is an internal placeholder to indicate that we haven't yet categorized the name. It will not be exposed to userspace, rather -EINVAL is returned. The aim of these patches is to improve user-space renaming of interfaces. As a general rule, userspace must rename interfaces to guarantee that names stay the same every time a given piece of hardware appears (at boot, or when attaching it). However, there are several situations where userspace should not perform the renaming, and that depends on both the policy of the local admin, but crucially also on the nature of the current interface name. If an interface was created in response to a userspace request, and userspace already provided a name, we most probably want to leave that name alone. The main instance of this is wifi-P2P devices created over nl80211, which currently have a long-standing bug where they are getting renamed by udev.
We label such names NET_NAME_USER. If an interface, unbeknown to us, has already been renamed from userspace, we most probably want to leave that alone as well. This will typically happen when third-party plugins (for instance to udev, but the interface is generic so could be from anywhere) rename the interface without informing udev about it. A typical situation is when you switch root from an installer or an initrd to the real system and the new instance of udev does not know what happened before the switch. These types of problems have caused repeated issues in the past. To solve this, once an interface has been renamed, its name is labelled NET_NAME_RENAMED. In many cases, the kernel is actually able to name interfaces in such a way that there is no need for userspace to rename them. This is the case when the enumeration order of devices, or in fact any other (non-parent) device on the system, cannot influence the name of the interface. Examples include statically created devices, or any naming schemes based on hardware properties of the interface. In this case the admin may prefer to use the kernel-provided names, and to make that possible we label such names NET_NAME_PREDICTABLE. We want the kernel to have the possibility of performing predictable interface naming itself (and exposing to userspace that it has), as the information necessary for a proper naming scheme for a certain class of devices may not be exposed to userspace. The case where renaming is almost certainly desired is when the kernel has given the interface a name using global device enumeration based on order of discovery (ethX, wlanY, etc). These naming schemes are labelled NET_NAME_ENUM. Lastly, a fallback is left as NET_NAME_UNKNOWN, to indicate that a driver has not yet been ported. This is mostly useful as a transitionary measure, allowing us to label the various naming schemes bit by bit.
v8: minor documentation fixes v9: move comment to the right commit Signed-off-by: Tom Gundersen <teg@jklm.no> Reviewed-by: David Herrmann <dh.herrmann@gmail.com> Reviewed-by: Kay Sievers <kay@vrfy.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-15 16:12:01 -07:00
Linus Torvalds	5615f9f822	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) Bluetooth pairing fixes from Johan Hedberg. 2) ieee80211_send_auth() doesn't allocate enough tail room for the SKB, from Max Stepanov. 3) New iwlwifi chip IDs, from Oren Givon. 4) bnx2x driver reads wrong PCI config space MSI register, from Yijing Wang. 5) IPV6 MLD Query validation isn't strong enough, from Hangbin Liu. 6) Fix double SKB free in openvswitch, from Andy Zhou. 7) Fix sk_dst_set() being racey with UDP sockets, leading to strange crashes, from Eric Dumazet. 8) Interpret the NAPI budget correctly in the new systemport driver, from Florian Fainelli. 9) VLAN code frees percpu stats in the wrong place, leading to crashes in the get stats handler. From Eric Dumazet. 10) TCP sockets doing a repair can crash with a divide by zero, because we invoke tcp_push() with an MSS value of zero. Just skip that part of the sendmsg paths in repair mode. From Christoph Paasch. 11) IRQ affinity bug fixes in mlx4 driver from Amir Vadai. 12) Don't ignore path MTU icmp messages with a zero mtu, machines out there still spit them out, and all of our per-protocol handlers for PMTU can cope with it just fine. From Edward Allcutt. 13) Some NETDEV_CHANGE notifier invocations were not passing in the correct kind of cookie as the argument, from Loic Prylli. 14) Fix crashes in long multicast/broadcast reassembly, from Jon Paul Maloy. 15) ip_tunnel_lookup() doesn't interpret wildcard keys correctly, fix from Dmitry Popov. 16) Fix skb->sk assigned without taking a reference to 'sk' in appletalk, from Andrey Utkin. 17) Fix some info leaks in ULP event signalling to userspace in SCTP, from Daniel Borkmann. 18) Fix deadlocks in HSO driver, from Olivier Sobrie. 
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (93 commits) hso: fix deadlock when receiving bursts of data hso: remove unused workqueue net: ppp: don't call sk_chk_filter twice mlx4: mark napi id for gro_skb bonding: fix ad_select module param check net: pppoe: use correct channel MTU when using Multilink PPP neigh: sysctl - simplify address calculation of gc_* variables net: sctp: fix information leaks in ulpevent layer MAINTAINERS: update r8169 maintainer net: bcmgenet: fix RGMII_MODE_EN bit tipc: clear 'next'-pointer of message fragments before reassembly r8152: fix r8152_csum_workaround function be2net: set EQ DB clear-intr bit in be_open() GRE: enable offloads for GRE farsync: fix invalid memory accesses in fst_add_one() and fst_init_card() igb: do a reset on SR-IOV re-init if device is down igb: Workaround for i210 Errata 25: Slow System Clock usbnet: smsc95xx: add reset_resume function with reset operation dp83640: Always decode received status frames r8169: disable L23 ...	2014-07-15 08:42:52 -07:00
Tejun Heo	2cf669a58d	cgroup: replace cgroup_add_cftypes() with cgroup_add_legacy_cftypes() Currently, cftypes added by cgroup_add_cftypes() are used for both the unified default hierarchy and legacy ones and subsystems can mark each file with either CFTYPE_ONLY_ON_DFL or CFTYPE_INSANE if it has to appear only on one of them. This is quite hairy and error-prone. Also, we may end up exposing interface files to the default hierarchy without thinking it through. cgroup_subsys will grow two separate cftype addition functions and apply each only on the hierarchies of the matching type. This will allow organizing cftypes in a lot clearer way and encourage subsystems to scrutinize the interface which is being exposed in the new default hierarchy. In preparation, this patch adds cgroup_add_legacy_cftypes() which currently is a simple wrapper around cgroup_add_cftypes() and replaces all cgroup_add_cftypes() usages with it. While at it, this patch drops a completely spurious return from __hugetlb_cgroup_file_init(). This patch doesn't introduce any functional differences. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Li Zefan <lizefan@huawei.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2014-07-15 11:05:09 -04:00
Tejun Heo	5577964e64	cgroup: rename cgroup_subsys->base_cftypes to ->legacy_cftypes Currently, cgroup_subsys->base_cftypes is used for both the unified default hierarchy and legacy ones and subsystems can mark each file with either CFTYPE_ONLY_ON_DFL or CFTYPE_INSANE if it has to appear only on one of them. This is quite hairy and error-prone. Also, we may end up exposing interface files to the default hierarchy without thinking it through. cgroup_subsys will grow two separate cftype arrays and apply each only on the hierarchies of the matching type. This will allow organizing cftypes in a lot clearer way and encourage subsystems to scrutinize the interface which is being exposed in the new default hierarchy. In preparation, this patch renames cgroup_subsys->base_cftypes to cgroup_subsys->legacy_cftypes. This patch is pure rename. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Li Zefan <lizefan@huawei.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Aristeu Rozanski <aris@redhat.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2014-07-15 11:05:09 -04:00
Johan Hedberg	2d3c2260e7	Bluetooth: Don't try to reject failed LE connections The check for the blacklist in hci_le_conn_complete_evt() should be when we know that we have an actual successful connection (ev->status being zero). This patch fixes this ordering. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-07-15 10:59:30 +02:00
Johan Hedberg	3a19b6feb2	Bluetooth: Remove unnecessary params variable from process_adv_report() The params variable was just used for storing the return value from the hci_pend_le_action_lookup() function and then checking whether it's NULL or not. We can simplify the code by checking the return value directly. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-07-15 08:01:19 +02:00
Sasha Levin	3cf521f7dc	net/l2tp: don't fall back on UDP [get|set]sockopt The l2tp [get|set]sockopt() code has fallen back to the UDP functions for socket option levels != SOL_PPPOL2TP since day one, but that has never actually worked, since the l2tp socket isn't an inet socket. As David Miller points out: "If we wanted this to work, it'd have to look up the tunnel and then use tunnel->sk, but I wonder how useful that would be" Since this can never have worked so nobody could possibly have depended on that functionality, just remove the broken code and return -EINVAL. Reported-by: Sasha Levin <sasha.levin@oracle.com> Acked-by: James Chapman <jchapman@katalix.com> Acked-by: David Miller <davem@davemloft.net> Cc: Phil Turnbull <phil.turnbull@oracle.com> Cc: Vegard Nossum <vegard.nossum@oracle.com> Cc: Willy Tarreau <w@1wt.eu> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-07-14 17:02:31 -07:00
Tom Herbert	155e010edb	udp: Move udp_tunnel_segment into udp_offload.c Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-14 16:12:15 -07:00
Tom Herbert	85644b4d0c	l2tp: Call udp_sock_create In l2tp driver call common function udp_sock_create to create the listener UDP port. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-14 16:12:15 -07:00
Tom Herbert	8024e02879	udp: Add udp_sock_create for UDP tunnels to open listener socket Added udp_tunnel.c which can contain some common functions for UDP tunnels. The first function in this is udp_sock_create which is used to open the listener port for a UDP tunnel. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-14 16:12:15 -07:00
Mathias Krause	9ecf07a1d8	neigh: sysctl - simplify address calculation of gc_* variables The code in neigh_sysctl_register() relies on a specific layout of struct neigh_table, namely that the 'gc_' variables are directly following the 'parms' member in a specific order. The code, though, expresses this in the most ugly way. Get rid of the ugly casts and use the 'tbl' pointer to get a handle to the table. This way we can refer to the 'gc_' variables directly. Similarly seen in the grsecurity patch, written by Brad Spengler. Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Brad Spengler <spender@grsecurity.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-14 14:32:51 -07:00
Daniel Borkmann	8f2e5ae40e	net: sctp: fix information leaks in ulpevent layer While working on some other SCTP code, I noticed that some structures shared with user space are leaking uninitialized stack or heap buffer. In particular, struct sctp_sndrcvinfo has a 2 bytes hole between .sinfo_flags and .sinfo_ppid that remains unfilled by us in sctp_ulpevent_read_sndrcvinfo() when putting this into cmsg. But also struct sctp_remote_error contains a 2 bytes hole that we don't fill but place into a skb through skb_copy_expand() via sctp_ulpevent_make_remote_error(). Both structures are defined by the IETF in RFC6458: * Section 5.3.2. SCTP Header Information Structure: The sctp_sndrcvinfo structure is defined below: struct sctp_sndrcvinfo { uint16_t sinfo_stream; uint16_t sinfo_ssn; uint16_t sinfo_flags; <-- 2 bytes hole --> uint32_t sinfo_ppid; uint32_t sinfo_context; uint32_t sinfo_timetolive; uint32_t sinfo_tsn; uint32_t sinfo_cumtsn; sctp_assoc_t sinfo_assoc_id; }; * 6.1.3. SCTP_REMOTE_ERROR: A remote peer may send an Operation Error message to its peer. This message indicates a variety of error conditions on an association. The entire ERROR chunk as it appears on the wire is included in an SCTP_REMOTE_ERROR event. Please refer to the SCTP specification [RFC4960] and any extensions for a list of possible error formats. An SCTP error notification has the following format: struct sctp_remote_error { uint16_t sre_type; uint16_t sre_flags; uint32_t sre_length; uint16_t sre_error; <-- 2 bytes hole --> sctp_assoc_t sre_assoc_id; uint8_t sre_data[]; }; Fix this by setting both to 0 before filling them out. We also have other structures shared between user and kernel space in SCTP that contains holes (e.g. struct sctp_paddrthlds), but we copy that buffer over from user space first and thus don't need to care about it in that cases. 
While at it, we can also remove lengthy comments copied from the draft, instead, we update the comment with the correct RFC number where one can look it up. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-14 14:18:56 -07:00
Himangi Saraogi	4d042654af	Bluetooth: cmtp: Remove unnecessary null test This patch removes the null test on ctrl. ctrl is initialized at the beginning of the function to &session->ctrl. Since session is dereferenced prior to the null test, session must be a valid pointer, and &session->ctrl cannot be null. The following Coccinelle script is used for detecting the change: @r@ expression e,f; identifier g,y; statement S1,S2; @@ e = &f->g <+... f->y ...+> if (e != NULL || ...) S1 else S2 Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Acked-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-07-14 23:00:13 +02:00
Johan Hedberg	b2d5e254eb	Bluetooth: Fix trying LTK re-encryption when we don't have an LTK In the case that the key distribution bits cause us not to generate a local LTK we should not try to re-encrypt if we're currently encrypted with an STK. This patch fixes the check for this in the smp_sufficient_security function. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-07-14 13:37:10 +02:00
Marcel Holtmann	eb5a4de80f	Bluetooth: Remove sco_chan_get helper function The sco_chan_get helper function is only used in two places and really only protects conn->sk with a lock. So instead of hiding that fact, just put the actual code in place where it is used. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2014-07-14 13:10:57 +03:00
Eric Dumazet	ce355e209f	netfilter: nf_tables: 64bit stats need some extra synchronization Use generic u64_stats_sync infrastructure to get proper 64bit stats, even on 32bit arches, at no extra cost for 64bit arches. Without this fix, 32bit arches can have some wrong counters at the time the carry is propagated into upper word. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-07-14 12:00:17 +02:00
Pablo Neira Ayuso	38e029f14a	netfilter: nf_tables: set NLM_F_DUMP_INTR if netlink dumping is stale An updater may interfere with the dumping of any of the object lists. Fix this by using a per-net generation counter and use the nl_dump_check_consistent() interface so the NLM_F_DUMP_INTR flag is set to notify userspace that it has to restart the dump since an updater has interfered. This patch also replaces the existing consistency checking code in the rule dumping path since it is broken. Basically, the value that the dump callback returns is not propagated to userspace via netlink_dump_start(). Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-07-14 12:00:16 +02:00
Pablo Neira Ayuso	e688a7f8c6	netfilter: nf_tables: safe RCU iteration on list when dumping The dump operation through netlink is not protected by the nfnl_lock. Thus, a reader process can be dumping any of the existing object lists while another process can be updating the list content. This patch resolves this situation by protecting all the object lists with RCU in the netlink dump path which is the reader side. The updater path is already protected via nfnl_lock, so use list manipulation RCU-safe operations. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-07-14 11:20:45 +02:00
Eric Dumazet	ec31a05c4d	net: filter: sk_chk_filter() no longer mangles filter Add const attribute to filter argument to make clear it is no longer modified. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-13 23:27:41 -07:00
David S. Miller	66568b3925	Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next John W. Linville says: ==================== Please pull this batch of updates intended for the 3.17 stream... This is primarily a Bluetooth pull. Gustavo says: "A lot of patches to 3.17. The bulk of changes here are for LE support. The 6loWPAN over Bluetooth now has it own module, we also have support for background auto-connection and passive scanning, Bluetooth device address provisioning, support for reading Bluetooth clock values and LE connection parameters plus many many fixes." The balance is just a pull of the wireless.git tree, to avoid some pending merge problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-13 22:42:17 -07:00
Marcel Holtmann	5a54e7c85b	Bluetooth: Convert L2CAP ident spinlock into a mutex The spinlock protecting the L2CAP ident number can be converted into a mutex since the whole processing is run in a workqueue. So instead of using a spinlock, just use a mutex here. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2014-07-13 22:32:45 +03:00
Marcel Holtmann	e03ab5199d	Bluetooth: Remove unneeded forward declaration of sco_chan_del The forward declaration of sco_chan_del is not needed and thus just remove it. Move sco_chan_del into the proper location. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2014-07-13 21:39:38 +03:00
Marcel Holtmann	015b01cbca	Bluetooth: Remove unneeded forward declaration of __sco_chan_add The forward declaration of __sco_chan_add is not needed and thus just remove it. Move __sco_chan_add into the proper location. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2014-07-13 21:39:37 +03:00
Marcel Holtmann	395365eaf1	Bluetooth: Allocate struct inquiry_entry with GFP_KERNEL The allocation of inquiry cache entries is triggered as a result of processing HCI events. Since the processing is done in the context of a workqueue, there is no need to allocate with GFP_ATOMIC in that case. Switch it to GFP_KERNEL. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2014-07-13 21:39:34 +03:00
Marcel Holtmann	4d6c705bbd	Bluetooth: Enable LE Long Term Key Request event only when supported The support for LE encryption is optional and with that also the LE Long Term Key Request event. If encryption is not supported, then do not bother enabling this event. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2014-07-13 08:49:58 +03:00

36499 Commits (24d4e7f642882a8a13da170b4ba86eec8fa91bf2)