summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2016-03-22Merge branch 'akpm' (patches from Andrew)Linus Torvalds2-2/+2
Merge third patch-bomb from Andrew Morton: - more ocfs2 changes - a few hotfixes - Andy's compat cleanups - misc fixes to fatfs, ptrace, coredump, cpumask, creds, eventfd, panic, ipmi, kgdb, profile, kfifo, ubsan, etc. - many rapidio updates: fixes, new drivers. - kcov: kernel code coverage feature. Like gcov, but not "prohibitively expensive". - extable code consolidation for various archs * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (81 commits) ia64/extable: use generic search and sort routines x86/extable: use generic search and sort routines s390/extable: use generic search and sort routines alpha/extable: use generic search and sort routines kernel/...: convert pr_warning to pr_warn drivers: dma-coherent: use memset_io for DMA_MEMORY_IO mappings drivers: dma-coherent: use MEMREMAP_WC for DMA_MEMORY_MAP memremap: add MEMREMAP_WC flag memremap: don't modify flags kernel/signal.c: add compile-time check for __ARCH_SI_PREAMBLE_SIZE mm/mprotect.c: don't imply PROT_EXEC on non-exec fs ipc/sem: make semctl setting sempid consistent ubsan: fix tree-wide -Wmaybe-uninitialized false positives kfifo: fix sparse complaints scripts/gdb: account for changes in module data structure scripts/gdb: add cmdline reader command scripts/gdb: add version command kernel: add kcov code coverage profile: hide unused functions when !CONFIG_PROC_FS hpwdt: use nmi_panic() when kernel panics in NMI handler ...
2016-03-22Merge tag 'for-linus' of ↵Linus Torvalds1-0/+36
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull more rdma updates from Doug Ledford: "Round two of 4.6 merge window patches. This is a monster pull request. I held off on the hfi1 driver updates (the hfi1 driver is intimately tied to the qib driver and the new rdmavt software library that was created to help both of them) in my first pull request. The hfi1/qib/rdmavt update is probably 90% of this pull request. The hfi1 driver is being left in staging so that it can be fixed up in regards to the API that Al and yourself didn't like. Intel has agreed to do the work, but in the meantime, this clears out 300+ patches in the backlog queue and brings my tree and their tree closer to sync. This also includes about 10 patches to the core and a few to mlx5 to create an infrastructure for configuring SRIOV ports on IB devices. That series includes one patch to the net core that we sent to netdev@ and Dave Miller with each of the three revisions to the series. We didn't get any response to the patch, so we took that as implicit approval. Finally, this series includes Intel's new iWARP driver for their x722 cards. It's not nearly the beast as the hfi1 driver. It also has a linux-next merge issue, but that has been resolved and it now passes just fine. Summary: - A few minor core fixups needed for the next patch series - The IB SRIOV series. This has bounced around for several versions. Of note is the fact that the first patch in this series effects the net core. It was directed to netdev and DaveM for each iteration of the series (three versions total). Dave did not object, but did not respond either. I've taken this as permission to move forward with the series. - The new Intel X722 iWARP driver - A huge set of updates to the Intel hfi1 driver. Of particular interest here is that we have left the driver in staging since it still has an API that people object to. Intel is working on a fix, but getting these patches in now helps keep me sane as the upstream and Intel's trees were over 300 patches apart" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (362 commits) IB/ipoib: Allow mcast packets from other VFs IB/mlx5: Implement callbacks for manipulating VFs net/mlx5_core: Implement modify HCA vport command net/mlx5_core: Add VF param when querying vport counter IB/ipoib: Add ndo operations for configuring VFs IB/core: Add interfaces to control VF attributes IB/core: Support accessing SA in virtualized environment IB/core: Add subnet prefix to port info IB/mlx5: Fix decision on using MAD_IFC net/core: Add support for configuring VF GUIDs IB/{core, ulp} Support above 32 possible device capability flags IB/core: Replace setting the zero values in ib_uverbs_ex_query_device net/mlx5_core: Introduce offload arithmetic hardware capabilities net/mlx5_core: Refactor device capability function net/mlx5_core: Fix caching ATOMIC endian mode capability ib_srpt: fix a WARN_ON() message i40iw: Replace the obsolete crypto hash interface with shash IB/hfi1: Add SDMA cache eviction algorithm IB/hfi1: Switch to using the pin query function IB/hfi1: Specify mm when releasing pages ...
2016-03-22net/xfrm_user: use in_compat_syscall to deny compat syscallsAndy Lutomirski1-1/+1
The code wants to prevent compat code from receiving messages. Use in_compat_syscall for this. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-22net/sctp: use in_compat_syscall for sctp_getsockopt_connectx3Andy Lutomirski1-1/+1
SCTP unfortunately has a different ABI for SCTP_SOCKOPT_CONNECTX3 for 32-bit and 64-bit callers. Use in_compat_syscall to correctly distinguish them on all architectures. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-22AF_VSOCK: Shrink the area influenced by prepare_to_waitClaudio Imbrenda1-73/+85
When a thread is prepared for waiting by calling prepare_to_wait, sleeping is not allowed until either the wait has taken place or finish_wait has been called. The existing code in af_vsock imposed unnecessary no-sleep assumptions to a broad list of backend functions. This patch shrinks the influence of prepare_to_wait to the area where it is strictly needed, therefore relaxing the no-sleep restriction there. Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-22Revert "vsock: Fix blocking ops call in prepare_to_wait"Claudio Imbrenda1-6/+13
This reverts commit 5988818008257ca42010d6b43a3e0e48afec9898 ("vsock: Fix blocking ops call in prepare_to_wait") The commit reverted with this patch caused us to potentially miss wakeups. Since the condition is not checked between the prepare_to_wait and the schedule(), if a wakeup happens after the condition is checked but before the sleep happens, we will miss it. ( A description of the problem can be found here: http://www.makelinux.net/ldd3/chp-6-sect-2 ). By reverting the patch, the behaviour is still incorrect (since we shouldn't sleep between the prepare_to_wait and the schedule) but at least it will not miss wakeups. The next patch in the series actually fixes the behaviour. Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-22Merge tag 'nfs-for-4.6-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds14-346/+1020
Pull NFS client updates from Trond Myklebust: "Highlights include: Features: - Add support for multiple NFSv4.1 callbacks in flight - Initial patchset for RPC multipath support - Adapt RPC/RDMA to use the new completion queue API Bugfixes and cleanups: - nfs4: nfs4_ff_layout_prepare_ds should return NULL if connection failed - Cleanups to remove nfs_inode_dio_wait and nfs4_file_fsync - Fix RPC/RDMA credit accounting - Properly handle RDMA_ERROR replies - xprtrdma: Do not wait if ib_post_send() fails - xprtrdma: Segment head and tail XDR buffers on page boundaries - xprtrdma cleanups for dprintk, physical_op_map and unused macros" * tag 'nfs-for-4.6-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (35 commits) nfs/blocklayout: make sure making a aligned read request nfs4: nfs4_ff_layout_prepare_ds should return NULL if connection failed nfs: remove nfs_inode_dio_wait nfs: remove nfs4_file_fsync xprtrdma: Use new CQ API for RPC-over-RDMA client send CQs xprtrdma: Use an anonymous union in struct rpcrdma_mw xprtrdma: Use new CQ API for RPC-over-RDMA client receive CQs xprtrdma: Serialize credit accounting again xprtrdma: Properly handle RDMA_ERROR replies rpcrdma: Add RPCRDMA_HDRLEN_ERR xprtrdma: Do not wait if ib_post_send() fails xprtrdma: Segment head and tail XDR buffers on page boundaries xprtrdma: Clean up dprintk format string containing a newline xprtrdma: Clean up physical_op_map() xprtrdma: Clean up unused RPCRDMA_INLINE_PAD_THRESH macro NFS add callback_ops to nfs4_proc_bind_conn_to_session_callback pnfs/NFSv4.1: Add multipath capabilities to pNFS flexfiles servers over NFSv3 SUNRPC: Allow addition of new transports to a struct rpc_clnt NFSv4.1: nfs4_proc_bind_conn_to_session must iterate over all connections SUNRPC: Make NFS swap work with multipath ...
2016-03-22ipv4: initialize flowi4_flags before calling fib_lookup()Lance Richardson1-9/+7
Field fl4.flowi4_flags is not initialized in fib_compute_spec_dst() before calling fib_lookup(), which means fib_table_lookup() is using non-deterministic data at this line: if (!(flp->flowi4_flags & FLOWI_FLAG_SKIP_NH_OIF)) { Fix by initializing the entire fl4 structure, which will prevent similar issues as fields are added in the future by ensuring that all fields are initialized to zero unless explicitly initialized to another value. Fixes: 58189ca7b2741 ("net: Fix vti use case with oif in dst lookups") Suggested-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: Lance Richardson <lrichard@redhat.com> Acked-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-22ipv4: fix broadcast packets receptionPaolo Abeni1-4/+8
Currently, ingress ipv4 broadcast datagrams are dropped since, in udp_v4_early_demux(), ip_check_mc_rcu() is invoked even on bcast packets. This patch addresses the issue, invoking ip_check_mc_rcu() only for mcast packets. Fixes: 6e5403093261 ("ipv4/udp: Verify multicast group is ours in upd_v4_early_demux()") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-22netlink: add support for NIC driver ioctlsDavid Decotigny1-1/+9
By returning -ENOIOCTLCMD, sock_do_ioctl() falls back to calling dev_ioctl(), which provides support for NIC driver ioctls, which includes ethtool support. This is similar to the way ioctls are handled in udp.c or tcp.c. This removes the requirement that ethtool for example be tied to the support of a specific L3 protocol (ethtool uses an AF_INET socket today). Signed-off-by: David Decotigny <decot@googlers.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-21net: ipv4: Fix truncated timestamp returned by inet_current_timestamp()Deepa Dinamani1-1/+1
The millisecond timestamps returned by the function is converted to network byte order by making a call to htons(). htons() only returns __be16 while __be32 is required here. This was identified by the sparse warning from the buildbot: net/ipv4/af_inet.c:1405:16: sparse: incorrect type in return expression (different base types) net/ipv4/af_inet.c:1405:16: expected restricted __be32 net/ipv4/af_inet.c:1405:16: got restricted __be16 [usertype] <noident> Change the function to use htonl() to return the correct __be32 type instead so that the millisecond value doesn't get truncated. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: James Morris <jmorris@namei.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Arnd Bergmann <arnd@arndb.de> Fixes: 822c868532ca ("net: ipv4: Convert IP network timestamps to be y2038 safe") Reported-by: Fengguang Wu <fengguang.wu@intel.com> [0-day test robot] Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-21Make DST_CACHE a silent config optionDave Jones1-1/+1
commit 911362c70d ("net: add dst_cache support") added a new kconfig option that gets selected by other networking options. It seems the intent wasn't to offer this as a user-selectable option given the lack of help text, so this patch converts it to a silent option. Signed-off-by: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-21net/core: Add support for configuring VF GUIDsEli Cohen1-0/+36
Add two new NLAs to support configuration of Infiniband node or port GUIDs. New applications can choose to use this interface to configure GUIDs with iproute2 with commands such as: ip link set dev ib0 vf 0 node_guid 00:02:c9:03:00:21:6e:70 ip link set dev ib0 vf 0 port_guid 00:02:c9:03:00:21:6e:78 A new ndo, ndo_sef_vf_guid is introduced to notify the net device of the request to change the GUID. Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21bridge: update max_gso_segs and max_gso_sizeEric Dumazet1-0/+16
It can be useful to lower max_gso_segs on NIC with very low number of TX descriptors like bcmgenet. However, this is defeated by bridge since it does not propagate the lower value of max_gso_segs and max_gso_size. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Petri Gynther <pgynther@google.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-21net/rtnetlink: add IFLA_GSO_MAX_SEGS and IFLA_GSO_MAX_SIZE attributesEric Dumazet1-0/+4
It can be useful to report dev->gso_max_segs and dev->gso_max_size so that "ip -d link" can display them to help debugging. For the moment, these attributes are read-only. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Petri Gynther <pgynther@google.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-21net: add description for len argument of dev_get_phys_port_nameLuis de Bethencourt1-0/+1
When the function dev_get_phys_port_name was added it missed a description for it's len argument. Adding it. Fixes: db24a9044ee1 ("net: add support for phys_port_name") Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-20Merge branch 'mm-pkeys-for-linus' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 protection key support from Ingo Molnar: "This tree adds support for a new memory protection hardware feature that is available in upcoming Intel CPUs: 'protection keys' (pkeys). There's a background article at LWN.net: https://lwn.net/Articles/643797/ The gist is that protection keys allow the encoding of user-controllable permission masks in the pte. So instead of having a fixed protection mask in the pte (which needs a system call to change and works on a per page basis), the user can map a (handful of) protection mask variants and can change the masks runtime relatively cheaply, without having to change every single page in the affected virtual memory range. This allows the dynamic switching of the protection bits of large amounts of virtual memory, via user-space instructions. It also allows more precise control of MMU permission bits: for example the executable bit is separate from the read bit (see more about that below). This tree adds the MM infrastructure and low level x86 glue needed for that, plus it adds a high level API to make use of protection keys - if a user-space application calls: mmap(..., PROT_EXEC); or mprotect(ptr, sz, PROT_EXEC); (note PROT_EXEC-only, without PROT_READ/WRITE), the kernel will notice this special case, and will set a special protection key on this memory range. It also sets the appropriate bits in the Protection Keys User Rights (PKRU) register so that the memory becomes unreadable and unwritable. So using protection keys the kernel is able to implement 'true' PROT_EXEC on x86 CPUs: without protection keys PROT_EXEC implies PROT_READ as well. Unreadable executable mappings have security advantages: they cannot be read via information leaks to figure out ASLR details, nor can they be scanned for ROP gadgets - and they cannot be used by exploits for data purposes either. We know about no user-space code that relies on pure PROT_EXEC mappings today, but binary loaders could start making use of this new feature to map binaries and libraries in a more secure fashion. There is other pending pkeys work that offers more high level system call APIs to manage protection keys - but those are not part of this pull request. Right now there's a Kconfig that controls this feature (CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) that is default enabled (like most x86 CPU feature enablement code that has no runtime overhead), but it's not user-configurable at the moment. If there's any serious problem with this then we can make it configurable and/or flip the default" * 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits) x86/mm/pkeys: Fix mismerge of protection keys CPUID bits mm/pkeys: Fix siginfo ABI breakage caused by new u64 field x86/mm/pkeys: Fix access_error() denial of writes to write-only VMA mm/core, x86/mm/pkeys: Add execute-only protection keys support x86/mm/pkeys: Create an x86 arch_calc_vm_prot_bits() for VMA flags x86/mm/pkeys: Allow kernel to modify user pkey rights register x86/fpu: Allow setting of XSAVE state x86/mm: Factor out LDT init from context init mm/core, x86/mm/pkeys: Add arch_validate_pkey() mm/core, arch, powerpc: Pass a protection key in to calc_vm_flag_bits() x86/mm/pkeys: Actually enable Memory Protection Keys in the CPU x86/mm/pkeys: Add Kconfig prompt to existing config option x86/mm/pkeys: Dump pkey from VMA in /proc/pid/smaps x86/mm/pkeys: Dump PKRU with other kernel registers mm/core, x86/mm/pkeys: Differentiate instruction fetches x86/mm/pkeys: Optimize fault handling in access_error() mm/core: Do not enforce PKEY permissions on remote mm access um, pkeys: Add UML arch_*_access_permitted() methods mm/gup, x86/mm/pkeys: Check VMAs and PTEs for protection keys x86/mm/gup: Simplify get_user_pages() PTE bit handling ...
2016-03-20net: sched: Add description for cpu_bstats argumentLuis de Bethencourt1-0/+2
Commit 22e0f8b9322c ("net: sched: make bstats per cpu and estimator RCU safe") added the argument cpu_bstats to functions gen_new_estimator and gen_replace_estimator and now the descriptions of these are missing for the documentation. Adding them. Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-20gen_stats.c: Add description for cpu argumentLuis de Bethencourt1-0/+1
Function gnet_stats_copy_basic is missing the description of the cpu argument in the documentation. Adding it. Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-20tunnels: Remove encapsulation offloads on decap.Jesse Gross3-5/+17
If a packet is either locally encapsulated or processed through GRO it is marked with the offloads that it requires. However, when it is decapsulated these tunnel offload indications are not removed. This means that if we receive an encapsulated TCP packet, aggregate it with GRO, decapsulate, and retransmit the resulting frame on a NIC that does not support encapsulation, we won't be able to take advantage of hardware offloads even though it is just a simple TCP packet at this point. This fixes the problem by stripping off encapsulation offload indications when packets are decapsulated. The performance impacts of this bug are significant. In a test where a Geneve encapsulated TCP stream is sent to a hypervisor, GRO'ed, decapsulated, and bridged to a VM performance is improved by 60% (5Gbps->8Gbps) as a result of avoiding unnecessary segmentation at the VM tap interface. Reported-by: Ramu Ramamurthy <sramamur@linux.vnet.ibm.com> Fixes: 68c33163 ("v4 GRE: Add TCP segmentation offload for GRE") Signed-off-by: Jesse Gross <jesse@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-20tunnels: Don't apply GRO to multiple layers of encapsulation.Jesse Gross5-6/+37
When drivers express support for TSO of encapsulated packets, they only mean that they can do it for one layer of encapsulation. Supporting additional levels would mean updating, at a minimum, more IP length fields and they are unaware of this. No encapsulation device expresses support for handling offloaded encapsulated packets, so we won't generate these types of frames in the transmit path. However, GRO doesn't have a check for multiple levels of encapsulation and will attempt to build them. UDP tunnel GRO actually does prevent this situation but it only handles multiple UDP tunnels stacked on top of each other. This generalizes that solution to prevent any kind of tunnel stacking that would cause problems. Fixes: bf5a755f ("net-gre-gro: Add GRE support to the GRO stack") Signed-off-by: Jesse Gross <jesse@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-20ipip: Properly mark ipip GRO packets as encapsulated.Jesse Gross1-1/+8
ipip encapsulated packets can be merged together by GRO but the result does not have the proper GSO type set or even marked as being encapsulated at all. Later retransmission of these packets will likely fail if the device does not support ipip offloads. This is similar to the issue resolved in IPv6 sit in feec0cb3 ("ipv6: gro: support sit protocol"). Reported-by: Patrick Boutilier <boutilpj@ednet.ns.ca> Fixes: 9667e9bb ("ipip: Add gro callbacks to ipip offload") Tested-by: Patrick Boutilier <boutilpj@ednet.ns.ca> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jesse Gross <jesse@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-20sctp: align MTU to a wordMarcelo Ricardo Leitner3-4/+6
SCTP is a protocol that is aligned to a word (4 bytes). Thus using bare MTU can sometimes return values that are not aligned, like for loopback, which is 65536 but ipv4_mtu() limits that to 65535. This mis-alignment will cause the last non-aligned bytes to never be used and can cause issues with congestion control. So it's better to just consider a lower MTU and keep congestion control calcs saner as they are based on PMTU. Same applies to icmp frag needed messages, which is also fixed by this patch. One other effect of this is the inability to send MTU-sized packet without queueing or fragmentation and without hitting Nagle. As the check performed at sctp_packet_can_append_data(): if (chunk->skb->len + q->out_qlen >= transport->pathmtu - packet->overhead) /* Enough data queued to fill a packet */ return SCTP_XMIT_OK; with the above example of MTU, if there are no other messages queued, one cannot send a packet that just fits one packet (65532 bytes) and without causing DATA chunk fragmentation or a delay. v2: - Added WORD_TRUNC macro Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-20sctp: do not leak chunks that are sent to unconfirmed pathsMarcelo Ricardo Leitner1-1/+5
Currently, if a chunk is scheduled to be sent through a transport that is currently unconfirmed, it will be leaked as it is dequeued from outq and is not re-queued nor freed. As I'm not aware of any situation that may lead to this situation, I'm fixing this by freeing the chunk and also logging a trace so that we can fix the other bug if it ever happens. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-20sctp: do not update a_rwnd if we are not issuing a sackMarcelo Ricardo Leitner1-1/+5
The SACK can be lost pretty much elsewhere, but if its allocation fail, we know we are not sending it, so it is better to revert a_rwnd to its previous value as this may give it a chance to issue a window update later. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-20net: remove a dubious unlikely() clauseEric Dumazet1-1/+1
TCP protocol is still used these days, and TCP uses clones in its transmit path. We can not optimize linux stack assuming it is mostly used in routers, or that TCP is dead. Fixes: 795bb1c00d ("net: bulk free infrastructure for NAPI context, use napi_consume_skb") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linuxDavid S. Miller423-12295/+19159
2016-03-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds402-11643/+18375
Pull networking updates from David Miller: "Highlights: 1) Support more Realtek wireless chips, from Jes Sorenson. 2) New BPF types for per-cpu hash and arrap maps, from Alexei Starovoitov. 3) Make several TCP sysctls per-namespace, from Nikolay Borisov. 4) Allow the use of SO_REUSEPORT in order to do per-thread processing of incoming TCP/UDP connections. The muxing can be done using a BPF program which hashes the incoming packet. From Craig Gallek. 5) Add a multiplexer for TCP streams, to provide a messaged based interface. BPF programs can be used to determine the message boundaries. From Tom Herbert. 6) Add 802.1AE MACSEC support, from Sabrina Dubroca. 7) Avoid factorial complexity when taking down an inetdev interface with lots of configured addresses. We were doing things like traversing the entire address less for each address removed, and flushing the entire netfilter conntrack table for every address as well. 8) Add and use SKB bulk free infrastructure, from Jesper Brouer. 9) Allow offloading u32 classifiers to hardware, and implement for ixgbe, from John Fastabend. 10) Allow configuring IRQ coalescing parameters on a per-queue basis, from Kan Liang. 11) Extend ethtool so that larger link mode masks can be supported. From David Decotigny. 12) Introduce devlink, which can be used to configure port link types (ethernet vs Infiniband, etc.), port splitting, and switch device level attributes as a whole. From Jiri Pirko. 13) Hardware offload support for flower classifiers, from Amir Vadai. 14) Add "Local Checksum Offload". Basically, for a tunneled packet the checksum of the outer header is 'constant' (because with the checksum field filled into the inner protocol header, the payload of the outer frame checksums to 'zero'), and we can take advantage of that in various ways. From Edward Cree" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1548 commits) bonding: fix bond_get_stats() net: bcmgenet: fix dma api length mismatch net/mlx4_core: Fix backward compatibility on VFs phy: mdio-thunder: Fix some Kconfig typos lan78xx: add ndo_get_stats64 lan78xx: handle statistics counter rollover RDS: TCP: Remove unused constant RDS: TCP: Add sysctl tunables for sndbuf/rcvbuf on rds-tcp socket net: smc911x: convert pxa dma to dmaengine team: remove duplicate set of flag IFF_MULTICAST bonding: remove duplicate set of flag IFF_MULTICAST net: fix a comment typo ethernet: micrel: fix some error codes ip_tunnels, bpf: define IP_TUNNEL_OPTS_MAX and use it bpf, dst: add and use dst_tclassid helper bpf: make skb->tc_classid also readable net: mvneta: bm: clarify dependencies cls_bpf: reset class and reuse major in da ldmvsw: Checkpatch sunvnet.c and sunvnet_common.c ldmvsw: Add ldmvsw.c driver code ...
2016-03-18Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-1/+1
Merge second patch-bomb from Andrew Morton: - a couple of hotfixes - the rest of MM - a new timer slack control in procfs - a couple of procfs fixes - a few misc things - some printk tweaks - lib/ updates, notably to radix-tree. - add my and Nick Piggin's old userspace radix-tree test harness to tools/testing/radix-tree/. Matthew said it was a godsend during the radix-tree work he did. - a few code-size improvements, switching to __always_inline where gcc screwed up. - partially implement character sets in sscanf * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (118 commits) sscanf: implement basic character sets lib/bug.c: use common WARN helper param: convert some "on"/"off" users to strtobool lib: add "on"/"off" support to kstrtobool lib: update single-char callers of strtobool() lib: move strtobool() to kstrtobool() include/linux/unaligned: force inlining of byteswap operations include/uapi/linux/byteorder, swab: force inlining of some byteswap operations include/asm-generic/atomic-long.h: force inlining of some atomic_long operations usb: common: convert to use match_string() helper ide: hpt366: convert to use match_string() helper ata: hpt366: convert to use match_string() helper power: ab8500: convert to use match_string() helper power: charger_manager: convert to use match_string() helper drm/edid: convert to use match_string() helper pinctrl: convert to use match_string() helper device property: convert to use match_string() helper lib/string: introduce match_string() helper radix-tree tests: add test for radix_tree_iter_next radix-tree tests: add regression3 test ...
2016-03-18RDS: TCP: Remove unused constantSowmini Varadhan1-2/+0
RDS_TCP_DEFAULT_BUFSIZE has been unused since commit 1edd6a14d24f ("RDS-TCP: Do not bloat sndbuf/rcvbuf in rds_tcp_tune"). Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-18RDS: TCP: Add sysctl tunables for sndbuf/rcvbuf on rds-tcp socketSowmini Varadhan1-10/+135
Add per-net sysctl tunables to set the size of sndbuf and rcvbuf on the kernel tcp socket. The tunables are added at /proc/sys/net/rds/tcp/rds_tcp_sndbuf and /proc/sys/net/rds/tcp/rds_tcp_rcvbuf. These values must be set before accept() or connect(), and there may be an arbitrary number of existing rds-tcp sockets when the tunable is modified. To make sure that all connections in the netns pick up the same value for the tunable, we reset existing rds-tcp connections in the netns, so that they can reconnect with the new parameters. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-18ip_tunnels, bpf: define IP_TUNNEL_OPTS_MAX and use itDaniel Borkmann3-8/+9
eBPF defines this as BPF_TUNLEN_MAX and OVS just uses the hard-coded value inside struct sw_flow_key. Thus, add and use IP_TUNNEL_OPTS_MAX for this, which makes the code a bit more generic and allows to remove BPF_TUNLEN_MAX from eBPF code. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-18bpf, dst: add and use dst_tclassid helperDaniel Borkmann1-8/+1
We can just add a small helper dst_tclassid() for retrieving the dst->tclassid value. It makes the code a bit better in that we can get rid of the ifdef from filter.c by moving this into the header. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-18bpf: make skb->tc_classid also readableDaniel Borkmann1-6/+6
Currently, the tc_classid from eBPF skb context is write-only, but there's no good reason for tc programs to limit it to write-only. For example, it can be used to transfer its state via tail calls where the resulting tc_classid gets filled gradually. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-18cls_bpf: reset class and reuse major in daDaniel Borkmann1-5/+8
There are two issues with the current code. First one is that we need to set res->class to 0 in case we use non-default classid matching. This is important for the case where cls_bpf was initially set up with an optional binding to a default class with tcf_bind_filter(), where the underlying qdisc implements bind_tcf() that fills res->class and tests for it later on when doing the classification. Convention for these cases is that after tc_classify() was called, such qdiscs (atm, drr, qfq, cbq, hfsc, htb) first test class, and if 0, then they lookup based on classid. Second, there's a bug with da mode, where res->classid is only assigned a 16 bit minor, but it needs to expand to the full 32 bit major/minor combination instead, therefore we need to expand with the bound major. This is fine as classes belonging to a classful qdisc must share the same major. Fixes: 045efa82ff56 ("cls_bpf: introduce integrated actions") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-18openvswitch: allow output of MPLS packets on tunnel vportsSimon Horman1-3/+0
Currently output of MPLS packets on tunnel vports is not allowed by Open vSwitch. This is because historically encapsulation was done in such a way that the inner_protocol field of the skb needed to hold the inner protocol for both MPLS and tunnel encapsulation in order for GSO segmentation to be performed correctly. Since b2acd1dc3949 ("openvswitch: Use regular GRE net_device instead of vport") Open vSwitch makes use of lwt to output to tunnel netdevs which perform encapsulation. As no drivers expose support for MPLS offloads this means that GSO packets are segmented in software by validate_xmit_skb(), which is called from __dev_queue_xmit(), before tunnel encapsulation occurs. This means that the inner protocol of MPLS is no longer needed by the time encapsulation occurs and the contention on the inner_protocol field of the skb no longer occurs. Thus it is now safe to output MPLS to tunnel vports. Signed-off-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Jesse Gross <jesse@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-18ovs: internal_set_rx_headroom() can be staticWu Fengguang1-1/+1
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-18net: dst_cache_per_cpu_dst_set() can be staticWu Fengguang1-4/+4
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-18Merge tag 'for-linus' of ↵Linus Torvalds1-55/+31
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull rdma updates from Doug Ledford: "Initial roundup of 4.6 merge window patches. This is the first of two pull requests. It is the smaller request, but touches for more different things (this is everything but what is in or going into staging). The pull request for the code in staging/rdma is on hold until after we decide what to do on the write/writev API issue and may be partially deferred until 4.7 as a result. Summary: - cxgb4 updates - nes updates - unification of iwarp portmapper code to core - add drain_cq API - various ib_core updates - minor ipoib updates - minor mlx4 updates - more significant mlx5 updates (including a minor merge conflict with net-next tree...merge is simple to resolve and Stephen's resolution was confirmed by Mellanox) - trivial net/9p rdma conversion - ocrdma RoCEv2 update - srpt updates" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (85 commits) iwpm: crash fix for large connections test iw_cxgb3: support for iWARP port mapping iw_cxgb4: remove port mapper related code iw_nes: remove port mapper related code iwcm: common code for port mapper net/9p: convert to new CQ API IB/mlx5: Add support for don't trap rules net/mlx5_core: Introduce forward to next priority action net/mlx5_core: Create anchor of last flow table iser: Accept arbitrary sg lists mapping if the device supports it mlx5: Add arbitrary sg list support IB/core: Add arbitrary sg_list support IB/mlx5: Expose correct max_fast_reg_page_list_len IB/mlx5: Make coding style more consistent IB/mlx5: Convert UMR CQ to new CQ API IB/ocrdma: Skip using unneeded intermediate variable IB/ocrdma: Skip using unneeded intermediate variable IB/ocrdma: Delete unnecessary variable initialisations in 11 functions IB/core: Documentation fix in the MAD header file IB/core: trivial prink cleanup. ...
2016-03-17Merge branch 'for-linus' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree updates from Jiri Kosina. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: drivers/rtc: broken link fix drm/i915 Fix typos in i915_gem_fence.c Docs: fix missing word in REPORTING-BUGS lib+mm: fix few spelling mistakes MAINTAINERS: add git URL for APM driver treewide: Fix typo in printk
2016-03-17tcp/dccp: remove obsolete WARN_ON() in icmp handlersEric Dumazet2-4/+0
Now SYN_RECV request sockets are installed in ehash table, an ICMP handler can find a request socket while another cpu handles an incoming packet transforming this SYN_RECV request socket into an ESTABLISHED socket. We need to remove the now obsolete WARN_ON(req->sk), since req->sk is set when a new child is created and added into listener accept queue. If this race happens, the ICMP will do nothing special. Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Ben Lazarus <blazarus@google.com> Reported-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-17vlan: propagate gso_max_segsEric Dumazet2-0/+2
vlan drivers lack proper propagation of gso_max_segs from lower device. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-17mm: introduce page reference manipulation functionsJoonsoo Kim1-1/+1
The success of CMA allocation largely depends on the success of migration and key factor of it is page reference count. Until now, page reference is manipulated by direct calling atomic functions so we cannot follow up who and where manipulate it. Then, it is hard to find actual reason of CMA allocation failure. CMA allocation should be guaranteed to succeed so finding offending place is really important. In this patch, call sites where page reference is manipulated are converted to introduced wrapper function. This is preparation step to add tracepoint to each page reference manipulation function. With this facility, we can easily find reason of CMA allocation failure. There is no functional change in this patch. In addition, this patch also converts reference read sites. It will help a second step that renames page._count to something else and prevents later attempt to direct access to it (Suggested by Andrew). Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Michal Nazarewicz <mina86@mina86.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Minchan Kim <minchan@kernel.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17Merge tag 'tty-4.6-rc1' of ↵Linus Torvalds3-27/+10
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial updates from Greg KH: "Here's the big tty/serial driver pull request for 4.6-rc1. Lots of changes in here, Peter has been on a tear again, with lots of refactoring and bugs fixes, many thanks to the great work he has been doing. Lots of driver updates and fixes as well, full details in the shortlog. All have been in linux-next for a while with no reported issues" * tag 'tty-4.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (220 commits) serial: 8250: describe CONFIG_SERIAL_8250_RSA serial: samsung: optimize UART rx fifo access routine serial: pl011: add mark/space parity support serial: sa1100: make sa1100_register_uart_fns a function tty: serial: 8250: add MOXA Smartio MUE boards support serial: 8250: convert drivers to use up_to_u8250p() serial: 8250/mediatek: fix building with SERIAL_8250=m serial: 8250/ingenic: fix building with SERIAL_8250=m serial: 8250/uniphier: fix modular build Revert "drivers/tty/serial: make 8250/8250_ingenic.c explicitly non-modular" Revert "drivers/tty/serial: make 8250/8250_mtk.c explicitly non-modular" serial: mvebu-uart: initial support for Armada-3700 serial port serial: mctrl_gpio: Add missing module license serial: ifx6x60: avoid uninitialized variable use tty/serial: at91: fix bad offset for UART timeout register tty/serial: at91: restore dynamic driver binding serial: 8250: Add hardware dependency to RT288X option TTY, devpts: document pty count limiting tty: goldfish: support platform_device with id -1 drivers: tty: goldfish: Add device tree bindings ...
2016-03-17sunrpc/cache: drop reference when sunrpc_cache_pipe_upcall() detects a raceNeilBrown1-3/+3
sunrpc_cache_pipe_upcall() can detect a race if CACHE_PENDING is no longer set. In this case it aborts the queuing of the upcall. However it has already taken a new counted reference on "h" and doesn't "put" it, even though it frees the data structure holding the reference. So let's delay the "cache_get" until we know we need it. Fixes: f9e1aedc6c79 ("sunrpc/cache: remove races with queuing an upcall.") Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-17Merge branch 'linus' of ↵Linus Torvalds23-562/+737
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto update from Herbert Xu: "Here is the crypto update for 4.6: API: - Convert remaining crypto_hash users to shash or ahash, also convert blkcipher/ablkcipher users to skcipher. - Remove crypto_hash interface. - Remove crypto_pcomp interface. - Add crypto engine for async cipher drivers. - Add akcipher documentation. - Add skcipher documentation. Algorithms: - Rename crypto/crc32 to avoid name clash with lib/crc32. - Fix bug in keywrap where we zero the wrong pointer. Drivers: - Support T5/M5, T7/M7 SPARC CPUs in n2 hwrng driver. - Add PIC32 hwrng driver. - Support BCM6368 in bcm63xx hwrng driver. - Pack structs for 32-bit compat users in qat. - Use crypto engine in omap-aes. - Add support for sama5d2x SoCs in atmel-sha. - Make atmel-sha available again. - Make sahara hashing available again. - Make ccp hashing available again. - Make sha1-mb available again. - Add support for multiple devices in ccp. - Improve DMA performance in caam. - Add hashing support to rockchip" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (116 commits) crypto: qat - remove redundant arbiter configuration crypto: ux500 - fix checks of error code returned by devm_ioremap_resource() crypto: atmel - fix checks of error code returned by devm_ioremap_resource() crypto: qat - Change the definition of icp_qat_uof_regtype hwrng: exynos - use __maybe_unused to hide pm functions crypto: ccp - Add abstraction for device-specific calls crypto: ccp - CCP versioning support crypto: ccp - Support for multiple CCPs crypto: ccp - Remove check for x86 family and model crypto: ccp - memset request context to zero during import lib/mpi: use "static inline" instead of "extern inline" lib/mpi: avoid assembler warning hwrng: bcm63xx - fix non device tree compatibility crypto: testmgr - allow rfc3686 aes-ctr variants in fips mode. crypto: qat - The AE id should be less than the maximal AE number lib/mpi: Endianness fix crypto: rockchip - add hash support for crypto engine in rk3288 crypto: xts - fix compile errors crypto: doc - add skcipher API documentation crypto: doc - update AEAD AD handling ...
2016-03-16ethtool: Set cmd field in ETHTOOL_GLINKSETTINGS response to wrong nwordsBen Hutchings1-1/+1
When the ETHTOOL_GLINKSETTINGS implementation finds that userland is using the wrong number of words of link mode bitmaps (or is trying to find out the right numbers) it sets the cmd field to 0 in the response structure. This is inconsistent with the implementation of every other ethtool command, so let's remove that inconsistency before it gets into a stable release. Fixes: 3f1ac7a700d03 ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API") Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-16sctp: consolidate local_bh_disable/enable + spin_lock/unlock to _bh variantNicholas Mc Guire1-4/+2
local_bh_disable() + spin_lock() is equivalent to spin_lock_bh(), same for the unlock/enable case, so replace the calls by the appropriate wrappers. Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-16Merge tag 'nfs-rdma-4.6-1' of git://git.linux-nfs.org/projects/anna/nfs-rdmaTrond Myklebust58-410/+708
NFS: NFSoRDMA Client Side Changes These patches include several bugfixes and cleanups for the NFSoRDMA client. This includes bugfixes for NFS v4.1, proper RDMA_ERROR handling, and fixes from the recent workqueue swicchover. These patches also switch xprtrdma to use the new CQ API Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> * tag 'nfs-rdma-4.6-1' of git://git.linux-nfs.org/projects/anna/nfs-rdma: (787 commits) xprtrdma: Use new CQ API for RPC-over-RDMA client send CQs xprtrdma: Use an anonymous union in struct rpcrdma_mw xprtrdma: Use new CQ API for RPC-over-RDMA client receive CQs xprtrdma: Serialize credit accounting again xprtrdma: Properly handle RDMA_ERROR replies rpcrdma: Add RPCRDMA_HDRLEN_ERR xprtrdma: Do not wait if ib_post_send() fails xprtrdma: Segment head and tail XDR buffers on page boundaries xprtrdma: Clean up dprintk format string containing a newline xprtrdma: Clean up physical_op_map() xprtrdma: Clean up unused RPCRDMA_INLINE_PAD_THRESH macro
2016-03-15tags: Fix DEFINE_PER_CPU expansionsPeter Zijlstra3-6/+4
$ make tags GEN tags ctags: Warning: drivers/acpi/processor_idle.c:64: null expansion of name pattern "\1" ctags: Warning: drivers/xen/events/events_2l.c:41: null expansion of name pattern "\1" ctags: Warning: kernel/locking/lockdep.c:151: null expansion of name pattern "\1" ctags: Warning: kernel/rcu/rcutorture.c:133: null expansion of name pattern "\1" ctags: Warning: kernel/rcu/rcutorture.c:135: null expansion of name pattern "\1" ctags: Warning: kernel/workqueue.c:323: null expansion of name pattern "\1" ctags: Warning: net/ipv4/syncookies.c:53: null expansion of name pattern "\1" ctags: Warning: net/ipv6/syncookies.c:44: null expansion of name pattern "\1" ctags: Warning: net/rds/page.c:45: null expansion of name pattern "\1" Which are all the result of the DEFINE_PER_CPU pattern: scripts/tags.sh:200: '/\<DEFINE_PER_CPU([^,]*, *\([[:alnum:]_]*\)/\1/v/' scripts/tags.sh:201: '/\<DEFINE_PER_CPU_SHARED_ALIGNED([^,]*, *\([[:alnum:]_]*\)/\1/v/' The below cures them. All except the workqueue one are within reasonable distance of the 80 char limit. TJ do you have any preference on how to fix the wq one, or shall we just not care its too long? Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Tejun Heo <tj@kernel.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>