summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2018-10-31Merge tag 'ceph-for-4.20-rc1' of git://github.com/ceph/ceph-clientLinus Torvalds4-169/+348
Pull ceph updates from Ilya Dryomov: "The highlights are: - a series that fixes some old memory allocation issues in libceph (myself). We no longer allocate memory in places where allocation failures cannot be handled and BUG when the allocation fails. - support for copy_file_range() syscall (Luis Henriques). If size and alignment conditions are met, it leverages RADOS copy-from operation. Otherwise, a local copy is performed. - a patch that reduces memory requirement of ceph_sync_read() from the size of the entire read to the size of one object (Zheng Yan). - fallocate() syscall is now restricted to FALLOC_FL_PUNCH_HOLE (Luis Henriques)" * tag 'ceph-for-4.20-rc1' of git://github.com/ceph/ceph-client: (25 commits) ceph: new mount option to disable usage of copy-from op ceph: support copy_file_range file operation libceph: support the RADOS copy-from operation ceph: add non-blocking parameter to ceph_try_get_caps() libceph: check reply num_data_items in setup_request_data() libceph: preallocate message data items libceph, rbd, ceph: move ceph_osdc_alloc_messages() calls libceph: introduce alloc_watch_request() libceph: assign cookies in linger_submit() libceph: enable fallback to ceph_msg_new() in ceph_msgpool_get() ceph: num_ops is off by one in ceph_aio_retry_work() libceph: no need to call osd_req_opcode_valid() in osd_req_encode_op() ceph: set timeout conditionally in __cap_delay_requeue libceph: don't consume a ref on pagelist in ceph_msg_data_add_pagelist() libceph: introduce ceph_pagelist_alloc() libceph: osd_req_op_cls_init() doesn't need to take opcode libceph: bump CEPH_MSG_MAX_DATA_LEN ceph: only allow punch hole mode in fallocate ceph: refactor ceph_sync_read() ceph: check if LOOKUPNAME request was aborted when filling trace ...
2018-10-31Merge branch 'akpm' (patches from Andrew)Linus Torvalds5-5/+5
Merge more updates from Andrew Morton: - the rest of MM - lib/bitmap updates - hfs updates - fatfs updates - various other misc things * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (94 commits) mm/gup.c: fix __get_user_pages_fast() comment mm: Fix warning in insert_pfn() memory-hotplug.rst: add some details about locking internals powerpc/powernv: hold device_hotplug_lock when calling memtrace_offline_pages() powerpc/powernv: hold device_hotplug_lock when calling device_online() mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock mm/memory_hotplug: make add_memory() take the device_hotplug_lock mm/memory_hotplug: make remove_memory() take the device_hotplug_lock mm/memblock.c: warn if zero alignment was requested memblock: stop using implicit alignment to SMP_CACHE_BYTES docs/boot-time-mm: remove bootmem documentation mm: remove include/linux/bootmem.h memblock: replace BOOTMEM_ALLOC_* with MEMBLOCK variants mm: remove nobootmem memblock: rename __free_pages_bootmem to memblock_free_pages memblock: rename free_all_bootmem to memblock_free_all memblock: replace free_bootmem_late with memblock_free_late memblock: replace free_bootmem{_node} with memblock_free mm: nobootmem: remove bootmem allocation APIs memblock: replace alloc_bootmem with memblock_alloc ...
2018-10-31mm: remove include/linux/bootmem.hMike Rapoport5-5/+5
Move remaining definitions and declarations from include/linux/bootmem.h into include/linux/memblock.h and remove the redundant header. The includes were replaced with the semantic patch below and then semi-automated removal of duplicated '#include <linux/memblock.h> @@ @@ - #include <linux/bootmem.h> + #include <linux/memblock.h> [sfr@canb.auug.org.au: dma-direct: fix up for the removal of linux/bootmem.h] Link: http://lkml.kernel.org/r/20181002185342.133d1680@canb.auug.org.au [sfr@canb.auug.org.au: powerpc: fix up for removal of linux/bootmem.h] Link: http://lkml.kernel.org/r/20181005161406.73ef8727@canb.auug.org.au [sfr@canb.auug.org.au: x86/kaslr, ACPI/NUMA: fix for linux/bootmem.h removal] Link: http://lkml.kernel.org/r/20181008190341.5e396491@canb.auug.org.au Link: http://lkml.kernel.org/r/1536927045-23536-30-git-send-email-rppt@linux.vnet.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Zankel <chris@zankel.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Ingo Molnar <mingo@redhat.com> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Jonas Bonn <jonas@southpole.se> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Ley Foon Tan <lftan@altera.com> Cc: Mark Salter <msalter@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Paul Burton <paul.burton@mips.com> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Serge Semin <fancer.lancer@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-30Merge tag 'nfsd-4.20' of git://linux-nfs.org/~bfields/linuxLinus Torvalds8-161/+219
Pull nfsd updates from Bruce Fields: "Olga added support for the NFSv4.2 asynchronous copy protocol. We already supported COPY, by copying a limited amount of data and then returning a short result, letting the client resend. The asynchronous protocol should offer better performance at the expense of some complexity. The other highlight is Trond's work to convert the duplicate reply cache to a red-black tree, and to move it and some other server caches to RCU. (Previously these have meant taking global spinlocks on every RPC) Otherwise, some RDMA work and miscellaneous bugfixes" * tag 'nfsd-4.20' of git://linux-nfs.org/~bfields/linux: (30 commits) lockd: fix access beyond unterminated strings in prints nfsd: Fix an Oops in free_session() nfsd: correctly decrement odstate refcount in error path svcrdma: Increase the default connection credit limit svcrdma: Remove try_module_get from backchannel svcrdma: Remove ->release_rqst call in bc reply handler svcrdma: Reduce max_send_sges nfsd: fix fall-through annotations knfsd: Improve lookup performance in the duplicate reply cache using an rbtree knfsd: Further simplify the cache lookup knfsd: Simplify NFS duplicate replay cache knfsd: Remove dead code from nfsd_cache_lookup SUNRPC: Simplify TCP receive code SUNRPC: Replace the cache_detail->hash_lock with a regular spinlock SUNRPC: Remove non-RCU protected lookup NFS: Fix up a typo in nfs_dns_ent_put NFS: Lockless DNS lookups knfsd: Lockless lookup of NFSv4 identities. SUNRPC: Lockless server RPCSEC_GSS context lookup knfsd: Allow lockless lookups of the exports ...
2018-10-29nfsd: Fix an Oops in free_session()Trond Myklebust1-1/+1
In call_xpt_users(), we delete the entry from the list, but we do not reinitialise it. This triggers the list poisoning when we later call unregister_xpt_user() in nfsd4_del_conns(). Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-10-29svcrdma: Remove try_module_get from backchannelChuck Lever1-14/+0
Since commit ffe1f0df5862 ("rpcrdma: Merge svcrdma and xprtrdma modules into one"), the forward and backchannel components are part of the same kernel module. A separate try_module_get() call in the backchannel code is no longer necessary. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-10-29svcrdma: Remove ->release_rqst call in bc reply handlerChuck Lever1-5/+4
Similar to a change made in the client's forward channel reply handler: The xprt_release_rqst_cong() call is not necessary. Also, release xprt->recv_lock when taking xprt->transport_lock to avoid disabling and enabling BH's while holding another spin lock. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-10-29svcrdma: Reduce max_send_sgesChuck Lever1-4/+6
There's no need to request a large number of send SGEs because the inline threshold already constrains the number of SGEs per Send. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-10-29SUNRPC: Simplify TCP receive codeTrond Myklebust1-39/+14
Use the fact that the iov iterators already have functionality for skipping a base offset. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-10-29SUNRPC: Replace the cache_detail->hash_lock with a regular spinlockTrond Myklebust1-23/+23
Now that the reader functions are all RCU protected, use a regular spinlock rather than a reader/writer lock. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-10-29SUNRPC: Remove non-RCU protected lookupTrond Myklebust1-57/+4
Clean up the cache code by removing the non-RCU protected lookup. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-10-29SUNRPC: Lockless server RPCSEC_GSS context lookupTrond Myklebust1-6/+26
Use RCU protection for looking up the RPCSEC_GSS context. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-10-29SUNRPC: Make server side AUTH_UNIX use lockless lookupsTrond Myklebust1-6/+8
Convert structs ip_map and unix_gid to use RCU protected lookups. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-10-29SUNRPC: Allow cache lookups to use RCU protection rather than the r/w spinlockTrond Myklebust1-14/+79
Instead of the reader/writer spinlock, allow cache lookups to use RCU for looking up entries. This is more efficient since modifications can occur while other entries are being looked up. Note that for now, we keep the reader/writer spinlock until all users have been converted to use RCU-safe freeing of their cache entries. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-10-29Merge tag '9p-for-4.20' of git://github.com/martinetd/linuxLinus Torvalds9-478/+405
Pull 9p updates from Dominique Martinet: "Highlights this time around are the end of Matthew's work to remove the custom 9p request cache and use a slab directly for requests, with some extra patches on my end to not degrade performance, but it's a very good cleanup. Tomas and I fixed a few more syzkaller bugs (refcount is the big one), and I had a go at the coverity bugs and at some of the bugzilla reports we had open for a while. I'm a bit disappointed that I couldn't get much reviews for a few of my own patches, but the big ones got some and it's all been soaking in linux-next for quite a while so I think it should be OK. Summary: - Finish removing the custom 9p request cache mechanism - Embed part of the fcall in the request to have better slab performance (msize usually is power of two aligned) - syzkaller fixes: * add a refcount to 9p requests to avoid use after free * a few double free issues - A few coverity fixes - Some old patches that were in the bugzilla: * do not trust pdu content for size header * mount option for lock retry interval" * tag '9p-for-4.20' of git://github.com/martinetd/linux: (21 commits) 9p/trans_fd: put worker reqs on destroy 9p/trans_fd: abort p9_read_work if req status changed 9p: potential NULL dereference 9p locks: fix glock.client_id leak in do_lock 9p: p9dirent_read: check network-provided name length 9p/rdma: remove useless check in cm_event_handler 9p: acl: fix uninitialized iattr access 9p locks: add mount option for lock retry interval 9p: do not trust pdu content for stat item size 9p: Rename req to rreq in trans_fd 9p: fix spelling mistake in fall-through annotation 9p/rdma: do not disconnect on down_interruptible EAGAIN 9p: Add refcount to p9_req_t 9p: rename p9_free_req() function 9p: add a per-client fcall kmem_cache 9p: embed fcall in req to round down buffer allocs 9p: Remove p9_idpool 9p: Use a slab for allocating requests 9p: clear dangling pointers in p9stat_free v9fs_dir_readdir: fix double-free on p9stat_read error ...
2018-10-28net: diag: document swapped src/dst in udp_dump_one.Lorenzo Colitti1-0/+1
Since its inception, udp_dump_one has had a bug where userspace needs to swap src and dst addresses and ports in order to find the socket it wants. This is because it passes the socket source address to __udp[46]_lib_lookup's saddr argument, but those functions are intended to find local sockets matching received packets, so saddr is the remote address, not the local address. This can no longer be fixed for backwards compatibility reasons, so add a brief comment explaining that this is the case. This will avoid confusion and help ensure SOCK_DIAG implementations of new protocols don't have the same problem. Fixes: a925aa00a55 ("udp_diag: Implement the get_exact dumping functionality") Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-28net: sched: gred: pass the right attribute to gred_change_table_def()Jakub Kicinski1-1/+1
gred_change_table_def() takes a pointer to TCA_GRED_DPS attribute, and expects it will be able to interpret its contents as struct tc_gred_sopt. Pass the correct gred attribute, instead of TCA_OPTIONS. This bug meant the table definition could never be changed after Qdisc was initialized (unless whatever TCA_OPTIONS contained both passed netlink validation and was a valid struct tc_gred_sopt...). Old behaviour: $ ip link add type dummy $ tc qdisc replace dev dummy0 parent root handle 7: \ gred setup vqs 4 default 0 $ tc qdisc replace dev dummy0 parent root handle 7: \ gred setup vqs 4 default 0 RTNETLINK answers: Invalid argument Now: $ ip link add type dummy $ tc qdisc replace dev dummy0 parent root handle 7: \ gred setup vqs 4 default 0 $ tc qdisc replace dev dummy0 parent root handle 7: \ gred setup vqs 4 default 0 $ tc qdisc replace dev dummy0 parent root handle 7: \ gred setup vqs 4 default 0 Fixes: f62d6b936df5 ("[PKT_SCHED]: GRED: Use central VQ change procedure") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-28net: bridge: remove ipv6 zero address check in mcast queriesNikolay Aleksandrov1-2/+1
Recently a check was added which prevents marking of routers with zero source address, but for IPv6 that cannot happen as the relevant RFCs actually forbid such packets: RFC 2710 (MLDv1): "To be valid, the Query message MUST come from a link-local IPv6 Source Address, be at least 24 octets long, and have a correct MLD checksum." Same goes for RFC 3810. And also it can be seen as a requirement in ipv6_mc_check_mld_query() which is used by the bridge to validate the message before processing it. Thus any queries with :: source address won't be processed anyway. So just remove the check for zero IPv6 source address from the query processing function. Fixes: 5a2de63fd1a5 ("bridge: do not add port to router list when receives query with source 0.0.0.0") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-28net: Properly unlink GRO packets on overflow.David S. Miller1-1/+1
Just like with normal GRO processing, we have to initialize skb->next to NULL when we unlink overflow packets from the GRO hash lists. Fixes: d4546c2509b1 ("net: Convert GRO SKB handling to list_head.") Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller2-6/+25
Daniel Borkmann says: ==================== pull-request: bpf 2018-10-27 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Fix toctou race in BTF header validation, from Martin and Wenwen. 2) Fix devmap interface comparison in notifier call which was neglecting netns, from Taehee. 3) Several fixes in various places, for example, correcting direct packet access and helper function availability, from Daniel. 4) Fix BPF kselftest config fragment to include af_xdp and sockmap, from Naresh. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds17-41/+103
Pull networking fixes from David Miller: "What better way to start off a weekend than with some networking bug fixes: 1) net namespace leak in dump filtering code of ipv4 and ipv6, fixed by David Ahern and Bjørn Mork. 2) Handle bad checksums from hardware when using CHECKSUM_COMPLETE properly in UDP, from Sean Tranchetti. 3) Remove TCA_OPTIONS from policy validation, it turns out we don't consistently use nested attributes for this across all packet schedulers. From David Ahern. 4) Fix SKB corruption in cadence driver, from Tristram Ha. 5) Fix broken WoL handling in r8169 driver, from Heiner Kallweit. 6) Fix OOPS in pneigh_dump_table(), from Eric Dumazet" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (28 commits) net/neigh: fix NULL deref in pneigh_dump_table() net: allow traceroute with a specified interface in a vrf bridge: do not add port to router list when receives query with source 0.0.0.0 net/smc: fix smc_buf_unuse to use the lgr pointer ipv6/ndisc: Preserve IPv6 control buffer if protocol error handlers are called net/{ipv4,ipv6}: Do not put target net if input nsid is invalid lan743x: Remove SPI dependency from Microchip group. drivers: net: remove <net/busy_poll.h> inclusion when not needed net: phy: genphy_10g_driver: Avoid NULL pointer dereference r8169: fix broken Wake-on-LAN from S5 (poweroff) octeontx2-af: Use GFP_ATOMIC under spin lock net: ethernet: cadence: fix socket buffer corruption problem net/ipv6: Allow onlink routes to have a device mismatch if it is the default route net: sched: Remove TCA_OPTIONS from policy ice: Poll for link status change ice: Allocate VF interrupts and set queue map ice: Introduce ice_dev_onetime_setup net: hns3: Fix for warning uninitialized symbol hw_err_lst3 octeontx2-af: Copy the right amount of memory net: udp: fix handling of CHECKSUM_COMPLETE packets ...
2018-10-26net/neigh: fix NULL deref in pneigh_dump_table()Eric Dumazet1-2/+2
pneigh can have NULL device pointer, so we need to make neigh_master_filtered() and neigh_ifindex_filtered() more robust. syzbot report : kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] PREEMPT SMP KASAN CPU: 0 PID: 15867 Comm: syz-executor2 Not tainted 4.19.0+ #276 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:__read_once_size include/linux/compiler.h:179 [inline] RIP: 0010:list_empty include/linux/list.h:203 [inline] RIP: 0010:netdev_master_upper_dev_get+0xa1/0x250 net/core/dev.c:6467 RSP: 0018:ffff8801bfaf7220 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: 0000000000000001 RCX: ffffc90005e92000 RDX: 0000000000000016 RSI: ffffffff860b44d9 RDI: 0000000000000005 RBP: ffff8801bfaf72b0 R08: ffff8801c4c84080 R09: fffffbfff139a580 R10: fffffbfff139a580 R11: ffffffff89cd2c07 R12: 1ffff10037f5ee45 R13: 0000000000000000 R14: ffff8801bfaf7288 R15: 00000000000000b0 FS: 00007f65cc68d700(0000) GS:ffff8801dae00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b33a21000 CR3: 00000001c6116000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: neigh_master_filtered net/core/neighbour.c:2367 [inline] pneigh_dump_table net/core/neighbour.c:2456 [inline] neigh_dump_info+0x7a9/0x1ce0 net/core/neighbour.c:2577 netlink_dump+0x606/0x1080 net/netlink/af_netlink.c:2244 __netlink_dump_start+0x59a/0x7c0 net/netlink/af_netlink.c:2352 netlink_dump_start include/linux/netlink.h:216 [inline] rtnetlink_rcv_msg+0x809/0xc20 net/core/rtnetlink.c:4898 netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2477 rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4953 netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline] netlink_unicast+0x5a5/0x760 net/netlink/af_netlink.c:1336 netlink_sendmsg+0xa18/0xfc0 net/netlink/af_netlink.c:1917 sock_sendmsg_nosec net/socket.c:621 [inline] sock_sendmsg+0xd5/0x120 net/socket.c:631 sock_write_iter+0x35e/0x5c0 net/socket.c:900 call_write_iter include/linux/fs.h:1808 [inline] new_sync_write fs/read_write.c:474 [inline] __vfs_write+0x6b8/0x9f0 fs/read_write.c:487 vfs_write+0x1fc/0x560 fs/read_write.c:549 ksys_write+0x101/0x260 fs/read_write.c:598 __do_sys_write fs/read_write.c:610 [inline] __se_sys_write fs/read_write.c:607 [inline] __x64_sys_write+0x73/0xb0 fs/read_write.c:607 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x457569 Fixes: 6f52f80e8530 ("net/neigh: Extend dump filter to proxy neighbor dumps") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: David Ahern <dsahern@gmail.com> Reported-by: syzbot <syzkaller@googlegroups.com> Reviewed-by: David Ahern <dsahern@gmail.com> Tested-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-26bpf: fix wrong helper enablement in cgroup local storageDaniel Borkmann1-4/+0
Commit cd3394317653 ("bpf: introduce the bpf_get_local_storage() helper function") enabled the bpf_get_local_storage() helper also for BPF program types where it does not make sense to use them. They have been added both in sk_skb_func_proto() and sk_msg_func_proto() even though both program types are not invoked in combination with cgroups, and neither through BPF_PROG_RUN_ARRAY(). In the latter the bpf_cgroup_storage_set() is set shortly before BPF program invocation. Later, the helper bpf_get_local_storage() retrieves this prior set up per-cpu pointer and hands the buffer to the BPF program. The map argument in there solely retrieves the enum bpf_cgroup_storage_type from a local storage map associated with the program and based on the type returns either the global or per-cpu storage. However, there is no specific association between the program's map and the actual content in bpf_cgroup_storage[]. Meaning, any BPF program that would have been properly run from the cgroup side through BPF_PROG_RUN_ARRAY() where bpf_cgroup_storage_set() was performed, and that is later unloaded such that prog / maps are teared down will cause a use after free if that pointer is retrieved from programs that are not run through BPF_PROG_RUN_ARRAY() but have the cgroup local storage helper enabled in their func proto. Lets just remove it from the two sock_map program types to fix it. Auditing through the types where this helper is enabled, it appears that these are the only ones where it was mistakenly allowed. Fixes: cd3394317653 ("bpf: introduce the bpf_get_local_storage() helper function") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Roman Gushchin <guro@fb.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Roman Gushchin <guro@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-26net: allow traceroute with a specified interface in a vrfMike Manning2-3/+3
Traceroute executed in a vrf succeeds if no device is given or if the vrf is given as the device, but fails if the interface is given as the device. This is for default UDP probes, it succeeds for TCP SYN or ICMP ECHO probes. As the skb bound dev is the interface and the sk dev is the vrf, sk lookup fails for ICMP_DEST_UNREACH and ICMP_TIME_EXCEEDED messages. The solution is for the secondary dev to be passed so that the interface is available for the device match to succeed, in the same way as is already done for non-error cases. Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-26bridge: do not add port to router list when receives query with source 0.0.0.0Hangbin Liu1-1/+9
Based on RFC 4541, 2.1.1. IGMP Forwarding Rules The switch supporting IGMP snooping must maintain a list of multicast routers and the ports on which they are attached. This list can be constructed in any combination of the following ways: a) This list should be built by the snooping switch sending Multicast Router Solicitation messages as described in IGMP Multicast Router Discovery [MRDISC]. It may also snoop Multicast Router Advertisement messages sent by and to other nodes. b) The arrival port for IGMP Queries (sent by multicast routers) where the source address is not 0.0.0.0. We should not add the port to router list when receives query with source 0.0.0.0. Reported-by: Ying Xu <yinxu@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-26net/smc: fix smc_buf_unuse to use the lgr pointerKarsten Graul1-13/+12
The pointer to the link group is unset in the smc connection structure right before the call to smc_buf_unuse. Provide the lgr pointer to smc_buf_unuse explicitly. And move the call to smc_lgr_schedule_free_work to the end of smc_conn_free. Fixes: a6920d1d130c ("net/smc: handle unregistered buffers") Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-26ipv6/ndisc: Preserve IPv6 control buffer if protocol error handlers are calledStefano Brivio1-2/+1
Commit a61bbcf28a8c ("[NET]: Store skb->timestamp as offset to a base timestamp") introduces a neighbour control buffer and zeroes it out in ndisc_rcv(), as ndisc_recv_ns() uses it. Commit f2776ff04722 ("[IPV6]: Fix address/interface handling in UDP and DCCP, according to the scoping architecture.") introduces the usage of the IPv6 control buffer in protocol error handlers (e.g. inet6_iif() in present-day __udp6_lib_err()). Now, with commit b94f1c0904da ("ipv6: Use icmpv6_notify() to propagate redirect, instead of rt6_redirect()."), we call protocol error handlers from ndisc_redirect_rcv(), after the control buffer is already stolen and some parts are already zeroed out. This implies that inet6_iif() on this path will always return zero. This gives unexpected results on UDP socket lookup in __udp6_lib_err(), as we might actually need to match sockets for a given interface. Instead of always claiming the control buffer in ndisc_rcv(), do that only when needed. Fixes: b94f1c0904da ("ipv6: Use icmpv6_notify() to propagate redirect, instead of rt6_redirect().") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-26Merge tag 'nfs-for-4.20-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds26-1590/+1921
Pull NFS client updates from Trond Myklebust: "Highlights include: Stable fixes: - Fix the NFSv4.1 r/wsize sanity checking - Reset the RPC/RDMA credit grant properly after a disconnect - Fix a missed page unlock after pg_doio() Features and optimisations: - Overhaul of the RPC client socket code to eliminate a locking bottleneck and reduce the latency when transmitting lots of requests in parallel. - Allow parallelisation of the RPCSEC_GSS encoding of an RPC request. - Convert the RPC client socket receive code to use iovec_iter() for improved efficiency. - Convert several NFS and RPC lookup operations to use RCU instead of taking global locks. - Avoid the need for BH-safe locks in the RPC/RDMA back channel. Bugfixes and cleanups: - Fix lock recovery during NFSv4 delegation recalls - Fix the NFSv4 + NFSv4.1 "lookup revalidate + open file" case. - Fixes for the RPC connection metrics - Various RPC client layer cleanups to consolidate stream based sockets - RPC/RDMA connection cleanups - Simplify the RPC/RDMA cleanup after memory operation failures - Clean ups for NFS v4.2 copy completion and NFSv4 open state reclaim" * tag 'nfs-for-4.20-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (97 commits) SUNRPC: Convert the auth cred cache to use refcount_t SUNRPC: Convert auth creds to use refcount_t SUNRPC: Simplify lookup code SUNRPC: Clean up the AUTH cache code NFS: change sign of nfs_fh length sunrpc: safely reallow resvport min/max inversion nfs: remove redundant call to nfs_context_set_write_error() nfs: Fix a missed page unlock after pg_doio() SUNRPC: Fix a compile warning for cmpxchg64() NFSv4.x: fix lock recovery during delegation recall SUNRPC: use cmpxchg64() in gss_seq_send64_fetch_and_inc() xprtrdma: Squelch a sparse warning xprtrdma: Clean up xprt_rdma_disconnect_inject xprtrdma: Add documenting comments xprtrdma: Report when there were zero posted Receives xprtrdma: Move rb_flags initialization xprtrdma: Don't disable BH's in backchannel server xprtrdma: Remove memory address of "ep" from an error message xprtrdma: Rename rpcrdma_qp_async_error_upcall xprtrdma: Simplify RPC wake-ups on connect ...
2018-10-25Merge branch 'for-4.20' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: "All trivial changes - simplification, typo fix and adding cond_resched() in a netclassid update loop" * 'for-4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup, netclassid: add a preemption point to write_classid rdmacg: fix a typo in rdmacg documentation cgroup: Simplify cgroup_ancestor
2018-10-25bpf: add bpf_jit_limit knob to restrict unpriv allocationsDaniel Borkmann1-2/+8
Rick reported that the BPF JIT could potentially fill the entire module space with BPF programs from unprivileged users which would prevent later attempts to load normal kernel modules or privileged BPF programs, for example. If JIT was enabled but unsuccessful to generate the image, then before commit 290af86629b2 ("bpf: introduce BPF_JIT_ALWAYS_ON config") we would always fall back to the BPF interpreter. Nowadays in the case where the CONFIG_BPF_JIT_ALWAYS_ON could be set, then the load will abort with a failure since the BPF interpreter was compiled out. Add a global limit and enforce it for unprivileged users such that in case of BPF interpreter compiled out we fail once the limit has been reached or we fall back to BPF interpreter earlier w/o using module mem if latter was compiled in. In a next step, fair share among unprivileged users can be resolved in particular for the case where we would fail hard once limit is reached. Fixes: 290af86629b2 ("bpf: introduce BPF_JIT_ALWAYS_ON config") Fixes: 0a14842f5a3c ("net: filter: Just In Time compiler for x86-64") Co-Developed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Jann Horn <jannh@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: LKML <linux-kernel@vger.kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-25bpf: make direct packet write unclone more robustDaniel Borkmann1-0/+11
Given this seems to be quite fragile and can easily slip through the cracks, lets make direct packet write more robust by requiring that future program types which allow for such write must provide a prologue callback. In case of XDP and sk_msg it's noop, thus add a generic noop handler there. The latter starts out with NULL data/data_end unconditionally when sg pages are shared. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-25bpf: disallow direct packet access for unpriv in cg_skbDaniel Borkmann1-0/+6
Commit b39b5f411dcf ("bpf: add cg_skb_is_valid_access for BPF_PROG_TYPE_CGROUP_SKB") added support for returning pkt pointers for direct packet access. Given this program type is allowed for both unprivileged and privileged users, we shouldn't allow unprivileged ones to use it, e.g. besides others one reason would be to avoid any potential speculation on the packet test itself, thus guard this for root only. Fixes: b39b5f411dcf ("bpf: add cg_skb_is_valid_access for BPF_PROG_TYPE_CGROUP_SKB") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Cc: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-25Merge branch 'linus' of ↵Linus Torvalds11-133/+132
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "API: - Remove VLA usage - Add cryptostat user-space interface - Add notifier for new crypto algorithms Algorithms: - Add OFB mode - Remove speck Drivers: - Remove x86/sha*-mb as they are buggy - Remove pcbc(aes) from x86/aesni - Improve performance of arm/ghash-ce by up to 85% - Implement CTS-CBC in arm64/aes-blk, faster by up to 50% - Remove PMULL based arm64/crc32 driver - Use PMULL in arm64/crct10dif - Add aes-ctr support in s5p-sss - Add caam/qi2 driver Others: - Pick better transform if one becomes available in crc-t10dif" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (124 commits) crypto: chelsio - Update ntx queue received from cxgb4 crypto: ccree - avoid implicit enum conversion crypto: caam - add SPDX license identifier to all files crypto: caam/qi - simplify CGR allocation, freeing crypto: mxs-dcp - make symbols 'sha1_null_hash' and 'sha256_null_hash' static crypto: arm64/aes-blk - ensure XTS mask is always loaded crypto: testmgr - fix sizeof() on COMP_BUF_SIZE crypto: chtls - remove set but not used variable 'csk' crypto: axis - fix platform_no_drv_owner.cocci warnings crypto: x86/aes-ni - fix build error following fpu template removal crypto: arm64/aes - fix handling sub-block CTS-CBC inputs crypto: caam/qi2 - avoid double export crypto: mxs-dcp - Fix AES issues crypto: mxs-dcp - Fix SHA null hashes and output length crypto: mxs-dcp - Implement sha import/export crypto: aegis/generic - fix for big endian systems crypto: morus/generic - fix for big endian systems crypto: lrw - fix rebase error after out of bounds fix crypto: cavium/nitrox - use pci_alloc_irq_vectors() while enabling MSI-X. crypto: cavium/nitrox - NITROX command queue changes. ...
2018-10-25net/{ipv4,ipv6}: Do not put target net if input nsid is invalidBjørn Mork2-0/+2
The cleanup path will put the target net when netnsid is set. So we must reset netnsid if the input is invalid. Fixes: d7e38611b81e ("net/ipv4: Put target net when address dump fails due to bad attributes") Fixes: 242afaa6968c ("net/ipv6: Put target net when address dump fails due to bad attributes") Cc: David Ahern <dsahern@gmail.com> Signed-off-by: Bjørn Mork <bjorn@mork.no> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-25Merge branch 'work.compat' of ↵Linus Torvalds5-49/+80
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull compat_ioctl fixes from Al Viro: "A bunch of compat_ioctl fixes, mostly in bluetooth. Hopefully, most of fs/compat_ioctl.c will get killed off over the next few cycles; between this, tty series already merged and Arnd's work this cycle ought to take a good chunk out of the damn thing..." * 'work.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: hidp: fix compat_ioctl hidp: constify hidp_connection_add() cmtp: fix compat_ioctl bnep: fix compat_ioctl compat_ioctl: trim the pointless includes
2018-10-25Merge branch 'timers-core-for-linus' of ↵Linus Torvalds2-15/+13
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timekeeping updates from Thomas Gleixner: "The timers and timekeeping departement provides: - Another large y2038 update with further preparations for providing the y2038 safe timespecs closer to the syscalls. - An overhaul of the SHCMT clocksource driver - SPDX license identifier updates - Small cleanups and fixes all over the place" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits) tick/sched : Remove redundant cpu_online() check clocksource/drivers/dw_apb: Add reset control clocksource: Remove obsolete CLOCKSOURCE_OF_DECLARE clocksource/drivers: Unify the names to timer-* format clocksource/drivers/sh_cmt: Add R-Car gen3 support dt-bindings: timer: renesas: cmt: document R-Car gen3 support clocksource/drivers/sh_cmt: Properly line-wrap sh_cmt_of_table[] initializer clocksource/drivers/sh_cmt: Fix clocksource width for 32-bit machines clocksource/drivers/sh_cmt: Fixup for 64-bit machines clocksource/drivers/sh_tmu: Convert to SPDX identifiers clocksource/drivers/sh_mtu2: Convert to SPDX identifiers clocksource/drivers/sh_cmt: Convert to SPDX identifiers clocksource/drivers/renesas-ostm: Convert to SPDX identifiers clocksource: Convert to using %pOFn instead of device_node.name tick/broadcast: Remove redundant check RISC-V: Request newstat syscalls y2038: signal: Change rt_sigtimedwait to use __kernel_timespec y2038: socket: Change recvmmsg to use __kernel_timespec y2038: sched: Change sched_rr_get_interval to use __kernel_timespec y2038: utimes: Rework #ifdef guards for compat syscalls ...
2018-10-24net/ipv6: Allow onlink routes to have a device mismatch if it is the default ↵David Ahern1-0/+2
route The intent of ip6_route_check_nh_onlink is to make sure the gateway given for an onlink route is not actually on a connected route for a different interface (e.g., 2001:db8:1::/64 is on dev eth1 and then an onlink route has a via 2001:db8:1::1 dev eth2). If the gateway lookup hits the default route then it most likely will be a different interface than the onlink route which is ok. Update ip6_route_check_nh_onlink to disregard the device mismatch if the gateway lookup hits the default route. Turns out the existing onlink tests are passing because there is no default route or it is an unreachable default, so update the onlink tests to have a default route other than unreachable. Fixes: fc1e64e1092f6 ("net/ipv6: Add support for onlink flag") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-24net: sched: Remove TCA_OPTIONS from policyDavid Ahern1-1/+0
Marco reported an error with hfsc: root@Calimero:~# tc qdisc add dev eth0 root handle 1:0 hfsc default 1 Error: Attribute failed policy validation. Apparently a few implementations pass TCA_OPTIONS as a binary instead of nested attribute, so drop TCA_OPTIONS from the policy. Fixes: 8b4c3cdd9dd8 ("net: sched: Add policy validation for tc attributes") Reported-by: Marco Berizzi <pupilla@libero.it> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-24net: udp: fix handling of CHECKSUM_COMPLETE packetsSean Tranchetti3-6/+39
Current handling of CHECKSUM_COMPLETE packets by the UDP stack is incorrect for any packet that has an incorrect checksum value. udp4/6_csum_init() will both make a call to __skb_checksum_validate_complete() to initialize/validate the csum field when receiving a CHECKSUM_COMPLETE packet. When this packet fails validation, skb->csum will be overwritten with the pseudoheader checksum so the packet can be fully validated by software, but the skb->ip_summed value will be left as CHECKSUM_COMPLETE so that way the stack can later warn the user about their hardware spewing bad checksums. Unfortunately, leaving the SKB in this state can cause problems later on in the checksum calculation. Since the the packet is still marked as CHECKSUM_COMPLETE, udp_csum_pull_header() will SUBTRACT the checksum of the UDP header from skb->csum instead of adding it, leaving us with a garbage value in that field. Once we try to copy the packet to userspace in the udp4/6_recvmsg(), we'll make a call to skb_copy_and_csum_datagram_msg() to checksum the packet data and add it in the garbage skb->csum value to perform our final validation check. Since the value we're validating is not the proper checksum, it's possible that the folded value could come out to 0, causing us not to drop the packet. Instead, we believe that the packet was checksummed incorrectly by hardware since skb->ip_summed is still CHECKSUM_COMPLETE, and we attempt to warn the user with netdev_rx_csum_fault(skb->dev); Unfortunately, since this is the UDP path, skb->dev has been overwritten by skb->dev_scratch and is no longer a valid pointer, so we end up reading invalid memory. This patch addresses this problem in two ways: 1) Do not use the dev pointer when calling netdev_rx_csum_fault() from skb_copy_and_csum_datagram_msg(). Since this gets called from the UDP path where skb->dev has been overwritten, we have no way of knowing if the pointer is still valid. Also for the sake of consistency with the other uses of netdev_rx_csum_fault(), don't attempt to call it if the packet was checksummed by software. 2) Add better CHECKSUM_COMPLETE handling to udp4/6_csum_init(). If we receive a packet that's CHECKSUM_COMPLETE that fails verification (i.e. skb->csum_valid == 0), check who performed the calculation. It's possible that the checksum was done in software by the network stack earlier (such as Netfilter's CONNTRACK module), and if that says the checksum is bad, we can drop the packet immediately instead of waiting until we try and copy it to userspace. Otherwise, we need to mark the SKB as CHECKSUM_NONE, since the skb->csum field no longer contains the full packet checksum after the call to __skb_checksum_validate_complete(). Fixes: e6afc8ace6dd ("udp: remove headers from UDP packets before queueing") Fixes: c84d949057ca ("udp: copy skb->truesize in the first cache line") Cc: Sam Kumar <samanthakumar@google.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Sean Tranchetti <stranche@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-24net: rtnl_dump_all needs to propagate error from dumpit functionDavid Ahern1-2/+4
If an address, route or netconf dump request is sent for AF_UNSPEC, then rtnl_dump_all is used to do the dump across all address families. If one of the dumpit functions fails (e.g., invalid attributes in the dump request) then rtnl_dump_all needs to propagate that error so the user gets an appropriate response instead of just getting no data. Fixes: effe67926624 ("net: Enable kernel side filtering of route dumps") Fixes: 5fcd266a9f64 ("net/ipv4: Add support for dumping addresses for a specific device") Fixes: 6371a71f3a3b ("net/ipv6: Add support for dumping addresses for a specific device") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-24net: Don't return invalid table id error when dumping all familiesDavid Ahern4-0/+13
When doing a route dump across all address families, do not error out if the table does not exist. This allows a route dump for AF_UNSPEC with a table id that may only exist for some of the families. Do return the table does not exist error if dumping routes for a specific family and the table does not exist. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-24net/ipv6: Put target net when address dump fails due to bad attributesDavid Ahern1-6/+8
If tgt_net is set based on IFA_TARGET_NETNSID attribute in the dump request, make sure all error paths call put_net. Fixes: 6371a71f3a3b ("net/ipv6: Add support for dumping addresses for a specific device") Fixes: ed6eff11790a ("net/ipv6: Update inet6_dump_addr for strict data checking") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-24net/ipv4: Put target net when address dump fails due to bad attributesDavid Ahern1-5/+8
If tgt_net is set based on IFA_TARGET_NETNSID attribute in the dump request, make sure all error paths call put_net. Fixes: 5fcd266a9f64 ("net/ipv4: Add support for dumping addresses for a specific device") Fixes: c33078e3dfb1 ("net/ipv4: Update inet_dump_ifaddr for strict data checking") Reported-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-24Merge tag 'docs-4.20' of git://git.lwn.net/linuxLinus Torvalds1-1/+1
Pull documentation updates from Jonathan Corbet: "This is a fairly typical cycle for documentation. There's some welcome readability improvements for the formatted output, some LICENSES updates including the addition of the ISC license, the removal of the unloved and unmaintained 00-INDEX files, the deprecated APIs document from Kees, more MM docs from Mike Rapoport, and the usual pile of typo fixes and corrections" * tag 'docs-4.20' of git://git.lwn.net/linux: (41 commits) docs: Fix typos in histogram.rst docs: Introduce deprecated APIs list kernel-doc: fix declaration type determination doc: fix a typo in adding-syscalls.rst docs/admin-guide: memory-hotplug: remove table of contents doc: printk-formats: Remove bogus kobject references for device nodes Documentation: preempt-locking: Use better example dm flakey: Document "error_writes" feature docs/completion.txt: Fix a couple of punctuation nits LICENSES: Add ISC license text LICENSES: Add note to CDDL-1.0 license that it should not be used docs/core-api: memory-hotplug: add some details about locking internals docs/core-api: rename memory-hotplug-notifier to memory-hotplug docs: improve readability for people with poorer eyesight yama: clarify ptrace_scope=2 in Yama documentation docs/vm: split memory hotplug notifier description to Documentation/core-api docs: move memory hotplug description into admin-guide/mm doc: Fix acronym "FEKEK" in ecryptfs docs: fix some broken documentation references iommu: Fix passthrough option documentation ...
2018-10-24Merge branch 'work.tty-ioctl' of ↵Linus Torvalds2-12/+1
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull tty ioctl updates from Al Viro: "This is the compat_ioctl work related to tty ioctls. Quite a bit of dead code taken out, all tty-related stuff gone from fs/compat_ioctl.c. A bunch of compat bugs fixed - some still remain, but all more or less generic tty-related ioctls should be covered (remaining issues are in things like driver-private ioctls in a pcmcia serial card driver not getting properly handled in 32bit processes on 64bit host, etc)" * 'work.tty-ioctl' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (53 commits) kill TIOCSERGSTRUCT change semantics of ldisc ->compat_ioctl() kill TIOCSER[SG]WILD synclink_gt(): fix compat_ioctl() pty: fix compat ioctls compat_ioctl - kill keyboard ioctl handling gigaset: add ->compat_ioctl() vt_compat_ioctl(): clean up, use compat_ptr() properly gigaset: don't try to printk userland buffer contents dgnc: don't bother with (empty) stub for TCXONC dgnc: leave TIOC[GS]SOFTCAR to ldisc remove fallback to drivers for TIOCGICOUNT dgnc: break-related ioctls won't reach ->ioctl() kill the rest of tty COMPAT_IOCTL() entries dgnc: TIOCM... won't reach ->ioctl() isdn_tty: TCSBRK{,P} won't reach ->ioctl() kill capinc_tty_ioctl() take compat TIOC[SG]SERIAL treatment into tty_compat_ioctl() synclink: reduce pointless checks in ->ioctl() complete ->[sg]et_serial() switchover ...
2018-10-23tcp: add tcp_reset_xmit_timer() helperEric Dumazet2-12/+14
With EDT model, SRTT no longer is inflated by pacing delays. This means that RTO and some other xmit timers might be setup incorrectly. This is particularly visible with either : - Very small enforced pacing rates (SO_MAX_PACING_RATE) - Reduced rto (from the default 200 ms) This can lead to TCP flows aborts in the worst case, or spurious retransmits in other cases. For example, this session gets far more throughput than the requested 80kbit : $ netperf -H 127.0.0.2 -l 100 -- -q 10000 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 127.0.0.2 () port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 540000 262144 262144 104.00 2.66 With the fix : $ netperf -H 127.0.0.2 -l 100 -- -q 10000 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 127.0.0.2 () port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 540000 262144 262144 104.00 0.12 EDT allows for better control of rtx timers, since TCP has a better idea of the earliest departure time of each skb in the rtx queue. We only have to eventually add to the timer the difference of the EDT time with current time. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-23cgroup, netclassid: add a preemption point to write_classidMichal Hocko1-0/+1
We have seen a customer complaining about soft lockups on !PREEMPT kernel config with 4.4 based kernel [1072141.435366] NMI watchdog: BUG: soft lockup - CPU#21 stuck for 22s! [systemd:1] [1072141.444090] Modules linked in: mpt3sas raid_class binfmt_misc af_packet 8021q garp mrp stp llc xfs libcrc32c bonding iscsi_ibft iscsi_boot_sysfs msr ext4 crc16 jbd2 mbcache cdc_ether usbnet mii joydev hid_generic usbhid intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ipmi_ssif mgag200 i2c_algo_bit ttm ipmi_devintf drbg ixgbe drm_kms_helper vxlan ansi_cprng ip6_udp_tunnel drm aesni_intel udp_tunnel aes_x86_64 iTCO_wdt syscopyarea ptp xhci_pci lrw iTCO_vendor_support pps_core gf128mul ehci_pci glue_helper sysfillrect mdio pcspkr sb_edac ablk_helper cryptd ehci_hcd sysimgblt xhci_hcd fb_sys_fops edac_core mei_me lpc_ich ses usbcore enclosure dca mfd_core ipmi_si mei i2c_i801 scsi_transport_sas usb_common ipmi_msghandler shpchp fjes wmi processor button acpi_pad btrfs xor raid6_pq sd_mod crc32c_intel megaraid_sas sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod md_mod autofs4 [1072141.444146] Supported: Yes [1072141.444149] CPU: 21 PID: 1 Comm: systemd Not tainted 4.4.121-92.80-default #1 [1072141.444150] Hardware name: LENOVO Lenovo System x3650 M5 -[5462P4U]- -[5462P4U]-/01GR451, BIOS -[TCE136H-2.70]- 06/13/2018 [1072141.444151] task: ffff880191bd0040 ti: ffff880191bd4000 task.ti: ffff880191bd4000 [1072141.444153] RIP: 0010:[<ffffffff815229f9>] [<ffffffff815229f9>] update_classid_sock+0x29/0x40 [1072141.444157] RSP: 0018:ffff880191bd7d58 EFLAGS: 00000286 [1072141.444158] RAX: ffff883b177cb7c0 RBX: 0000000000000000 RCX: 0000000000000000 [1072141.444159] RDX: 00000000000009c7 RSI: ffff880191bd7d5c RDI: ffff8822e29bb200 [1072141.444160] RBP: ffff883a72230980 R08: 0000000000000101 R09: 0000000000000000 [1072141.444161] R10: 0000000000000008 R11: f000000000000000 R12: ffffffff815229d0 [1072141.444162] R13: 0000000000000000 R14: ffff881fd0a47ac0 R15: ffff880191bd7f28 [1072141.444163] FS: 00007f3e2f1eb8c0(0000) GS:ffff882000340000(0000) knlGS:0000000000000000 [1072141.444164] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [1072141.444165] CR2: 00007f3e2f200000 CR3: 0000001ffea4e000 CR4: 00000000001606f0 [1072141.444166] Stack: [1072141.444166] ffffffa800000246 00000000000009c7 ffffffff8121d583 ffff8818312a05c0 [1072141.444168] ffff8818312a1100 ffff880197c3b280 ffff881861422858 ffffffffffffffea [1072141.444170] ffffffff81522b1c ffffffff81d0ca20 ffff8817fa17b950 ffff883fdd8121e0 [1072141.444171] Call Trace: [1072141.444179] [<ffffffff8121d583>] iterate_fd+0x53/0x80 [1072141.444182] [<ffffffff81522b1c>] write_classid+0x4c/0x80 [1072141.444187] [<ffffffff8111328b>] cgroup_file_write+0x9b/0x100 [1072141.444193] [<ffffffff81278bcb>] kernfs_fop_write+0x11b/0x150 [1072141.444198] [<ffffffff81201566>] __vfs_write+0x26/0x100 [1072141.444201] [<ffffffff81201bed>] vfs_write+0x9d/0x190 [1072141.444203] [<ffffffff812028c2>] SyS_write+0x42/0xa0 [1072141.444207] [<ffffffff815f58c3>] entry_SYSCALL_64_fastpath+0x1e/0xca [1072141.445490] DWARF2 unwinder stuck at entry_SYSCALL_64_fastpath+0x1e/0xca If a cgroup has many tasks with many open file descriptors then we would end up in a large loop without any rescheduling point throught the operation. Add cond_resched once per task. Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2018-10-23Revert "net: simplify sock_poll_wait"Karsten Graul11-12/+12
This reverts commit dd979b4df817e9976f18fb6f9d134d6bc4a3c317. This broke tcp_poll for SMC fallback: An AF_SMC socket establishes an internal TCP socket for the initial handshake with the remote peer. Whenever the SMC connection can not be established this TCP socket is used as a fallback. All socket operations on the SMC socket are then forwarded to the TCP socket. In case of poll, the file->private_data pointer references the SMC socket because the TCP socket has no file assigned. This causes tcp_poll to wait on the wrong socket. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-23SUNRPC: Convert the auth cred cache to use refcount_tTrond Myklebust5-8/+8
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-10-23SUNRPC: Convert auth creds to use refcount_tTrond Myklebust2-8/+8
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>