summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2017-12-03tcp: add tcp_v4_fill_cb()/tcp_v4_restore_cb()Eric Dumazet2-23/+46
James Morris reported kernel stack corruption bug [1] while running the SELinux testsuite, and bisected to a recent commit bffa72cf7f9d ("net: sk_buff rbnode reorg") We believe this commit is fine, but exposes an older bug. SELinux code runs from tcp_filter() and might send an ICMP, expecting IP options to be found in skb->cb[] using regular IPCB placement. We need to defer TCP mangling of skb->cb[] after tcp_filter() calls. This patch adds tcp_v4_fill_cb()/tcp_v4_restore_cb() in a very similar way we added them for IPv6. [1] [ 339.806024] SELinux: failure in selinux_parse_skb(), unable to parse packet [ 339.822505] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff81745af5 [ 339.822505] [ 339.852250] CPU: 4 PID: 3642 Comm: client Not tainted 4.15.0-rc1-test #15 [ 339.868498] Hardware name: LENOVO 10FGS0VA1L/30BC, BIOS FWKT68A 01/19/2017 [ 339.885060] Call Trace: [ 339.896875] <IRQ> [ 339.908103] dump_stack+0x63/0x87 [ 339.920645] panic+0xe8/0x248 [ 339.932668] ? ip_push_pending_frames+0x33/0x40 [ 339.946328] ? icmp_send+0x525/0x530 [ 339.958861] ? kfree_skbmem+0x60/0x70 [ 339.971431] __stack_chk_fail+0x1b/0x20 [ 339.984049] icmp_send+0x525/0x530 [ 339.996205] ? netlbl_skbuff_err+0x36/0x40 [ 340.008997] ? selinux_netlbl_err+0x11/0x20 [ 340.021816] ? selinux_socket_sock_rcv_skb+0x211/0x230 [ 340.035529] ? security_sock_rcv_skb+0x3b/0x50 [ 340.048471] ? sk_filter_trim_cap+0x44/0x1c0 [ 340.061246] ? tcp_v4_inbound_md5_hash+0x69/0x1b0 [ 340.074562] ? tcp_filter+0x2c/0x40 [ 340.086400] ? tcp_v4_rcv+0x820/0xa20 [ 340.098329] ? ip_local_deliver_finish+0x71/0x1a0 [ 340.111279] ? ip_local_deliver+0x6f/0xe0 [ 340.123535] ? ip_rcv_finish+0x3a0/0x3a0 [ 340.135523] ? ip_rcv_finish+0xdb/0x3a0 [ 340.147442] ? ip_rcv+0x27c/0x3c0 [ 340.158668] ? inet_del_offload+0x40/0x40 [ 340.170580] ? __netif_receive_skb_core+0x4ac/0x900 [ 340.183285] ? rcu_accelerate_cbs+0x5b/0x80 [ 340.195282] ? __netif_receive_skb+0x18/0x60 [ 340.207288] ? process_backlog+0x95/0x140 [ 340.218948] ? net_rx_action+0x26c/0x3b0 [ 340.230416] ? __do_softirq+0xc9/0x26a [ 340.241625] ? do_softirq_own_stack+0x2a/0x40 [ 340.253368] </IRQ> [ 340.262673] ? do_softirq+0x50/0x60 [ 340.273450] ? __local_bh_enable_ip+0x57/0x60 [ 340.285045] ? ip_finish_output2+0x175/0x350 [ 340.296403] ? ip_finish_output+0x127/0x1d0 [ 340.307665] ? nf_hook_slow+0x3c/0xb0 [ 340.318230] ? ip_output+0x72/0xe0 [ 340.328524] ? ip_fragment.constprop.54+0x80/0x80 [ 340.340070] ? ip_local_out+0x35/0x40 [ 340.350497] ? ip_queue_xmit+0x15c/0x3f0 [ 340.361060] ? __kmalloc_reserve.isra.40+0x31/0x90 [ 340.372484] ? __skb_clone+0x2e/0x130 [ 340.382633] ? tcp_transmit_skb+0x558/0xa10 [ 340.393262] ? tcp_connect+0x938/0xad0 [ 340.403370] ? ktime_get_with_offset+0x4c/0xb0 [ 340.414206] ? tcp_v4_connect+0x457/0x4e0 [ 340.424471] ? __inet_stream_connect+0xb3/0x300 [ 340.435195] ? inet_stream_connect+0x3b/0x60 [ 340.445607] ? SYSC_connect+0xd9/0x110 [ 340.455455] ? __audit_syscall_entry+0xaf/0x100 [ 340.466112] ? syscall_trace_enter+0x1d0/0x2b0 [ 340.476636] ? __audit_syscall_exit+0x209/0x290 [ 340.487151] ? SyS_connect+0xe/0x10 [ 340.496453] ? do_syscall_64+0x67/0x1b0 [ 340.506078] ? entry_SYSCALL64_slow_path+0x25/0x25 Fixes: 971f10eca186 ("tcp: better TCP_SKB_CB layout to reduce cache line misses") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: James Morris <james.l.morris@oracle.com> Tested-by: James Morris <james.l.morris@oracle.com> Tested-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-03rxrpc: Fix the MAINTAINERS recordDavid Howells1-3/+15
Fix the MAINTAINERS record so that it's more obvious who the maintainer for AF_RXRPC is. Reported-by: Joe Perches <joe@perches.com> Reported-by: David Miller <davem@davemloft.net> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-03rxrpc: Use correct netns source in rxrpc_release_sock()David Howells1-2/+3
In rxrpc_release_sock() there may be no rx->local value to access, so we can't unconditionally follow it to the rxrpc network namespace information to poke the connection reapers. Instead, use the socket's namespace pointer to find the namespace. This unfixed code causes the following static checker warning: net/rxrpc/af_rxrpc.c:898 rxrpc_release_sock() error: we previously assumed 'rx->local' could be null (see line 887) Fixes: 3d18cbb7fd0c ("rxrpc: Fix conn expiry timers") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-03liquidio: fix incorrect indentation of assignment statementColin Ian King1-1/+1
Remove one extraneous level of indentation on assignment statement. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-03Merge tag 'linux-can-fixes-for-4.15-20171201' of ↵David S. Miller6-11/+25
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2017-12-01 this is a pull for net consisting of nine patches. The first three patches are by Jimmy Assarsson for the kvaser_usb driver and add the missing free()s in some error path, a signed/unsigned comparison and ratelimit the error messages in case of incomplete messages. Oliver Stäbler's patch for the ti_hecc driver fix the napi poll function's return value. The return values of the probe function of the peak_canfd and peak_pci PCI drivers are fixed by Stephane Grosjean's patch. Two patches by me for the flexcan driver update the bugs/features/quirks overview table and fix the error state transition for the VF610 SoC. The two patches by Martin Kelly for the mcba_usb driver fix a typo and a device disconnect bug. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-03stmmac: reset last TSO segment size after device openLars Persson1-0/+1
The mss variable tracks the last max segment size sent to the TSO engine. We do not update the hardware as long as we receive skb:s with the same value in gso_size. During a network device down/up cycle (mapped to stmmac_release() and stmmac_open() callbacks) we issue a reset to the hardware and it forgets the setting for mss. However we did not zero out our mss variable so the next transmission of a gso packet happens with an undefined hardware setting. This triggers a hang in the TSO engine and eventuelly the netdev watchdog will bark. Fixes: f748be531d70 ("stmmac: support new GMAC4") Signed-off-by: Lars Persson <larper@axis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-03ipvlan: Add the skb->mark as flow4's member to lookup routeGao Feng1-0/+1
Current codes don't use skb->mark to assign flowi4_mark, it would make the policy route rule with fwmark doesn't work as expected. Signed-off-by: Gao Feng <gfree.wind@vip.163.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-02Merge branch 's390-qeth-fixes'David S. Miller4-4/+41
Julian Wiedmann says: ==================== s390/qeth: fixes 2017-12-01 please apply the following three fixes for 4.15. These should also go back to stable. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-02s390/qeth: build max size GSO skbs on L2 devicesJulian Wiedmann2-4/+2
The current GSO skb size limit was copy&pasted over from the L3 path, where it is needed due to a TSO limitation. As L2 devices don't offer TSO support (and thus all GSO skbs are segmented before they reach the driver), there's no reason to restrict the stack in how large it may build the GSO skbs. Fixes: d52aec97e5bc ("qeth: enable scatter/gather in layer 2 mode") Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-02s390/qeth: fix GSO throughput regressionJulian Wiedmann4-0/+38
Using GSO with small MTUs currently results in a substantial throughput regression - which is caused by how qeth needs to map non-linear skbs into its IO buffer elements: compared to a linear skb, each GSO-segmented skb effectively consumes twice as many buffer elements (ie two instead of one) due to the additional header-only part. This causes the Output Queue to be congested with low-utilized IO buffers. Fix this as follows: If the MSS is low enough so that a non-SG GSO segmentation produces order-0 skbs (currently ~3500 byte), opt out from NETIF_F_SG. This is where we anticipate the biggest savings, since an SG-enabled GSO segmentation produces skbs that always consume at least two buffer elements. Larger MSS values continue to get a SG-enabled GSO segmentation, since 1) the relative overhead of the additional header-only buffer element becomes less noticeable, and 2) the linearization overhead increases. With the throughput regression fixed, re-enable NETIF_F_SG by default to reap the significant CPU savings of GSO. Fixes: 5722963a8e83 ("qeth: do not turn on SG per default") Reported-by: Nils Hoppmann <niho@de.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-02s390/qeth: fix thinko in IPv4 multicast address trackingJulian Wiedmann1-0/+1
Commit 5f78e29ceebf ("qeth: optimize IP handling in rx_mode callback") reworked how secondary addresses are managed for qeth devices. Instead of dropping & subsequently re-adding all addresses on every ndo_set_rx_mode() call, qeth now keeps track of the addresses that are currently registered with the HW. On a ndo_set_rx_mode(), we thus only need to do (de-)registration requests for the addresses that have actually changed. On L3 devices, the lookup for IPv4 Multicast addresses checks the wrong hashtable - and thus never finds a match. As a result, we first delete *all* such addresses, and then re-add them again. So each set_rx_mode() causes a short period where the IPv4 Multicast addresses are not registered, and the card stops forwarding inbound traffic for them. Fix this by setting the ->is_multicast flag on the lookup object, thus enabling qeth_l3_ip_from_hash() to search the correct hashtable and find a match there. Fixes: 5f78e29ceebf ("qeth: optimize IP handling in rx_mode callback") Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-02Merge branch 'vhost-skb-leaks'David S. Miller3-20/+38
Wei Xu says: ==================== vhost: fix a few skb leaks Matthew found a roughly 40% tcp throughput regression with commit c67df11f(vhost_net: try batch dequing from skb array) as discussed in the following thread: https://www.mail-archive.com/netdev@vger.kernel.org/msg187936.html v4: - fix zero iov iterator count in tap/tap_do_read()(Jason) - don't put tun in case of EBADFD(Jason) - Replace msg->msg_control with new 'skb' when calling tun/tap_do_read() v3: - move freeing skb from vhost to tun/tap recvmsg() to not confuse the callers. v2: - add Matthew as the reporter, thanks matthew. - moving zero headcount check ahead instead of defer consuming skb due to jason and mst's comment. - add freeing skb in favor of recvmsg() fails. ==================== Acked-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-02tap: free skb if flags errorWei Xu1-4/+10
tap_recvmsg() supports accepting skb by msg_control after commit 3b4ba04acca8 ("tap: support receiving skb from msg_control"), the skb if presented should be freed within the function, otherwise it would be leaked. Signed-off-by: Wei Xu <wexu@redhat.com> Reported-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-02tun: free skb in early errorsWei Xu1-6/+18
tun_recvmsg() supports accepting skb by msg_control after commit ac77cfd4258f ("tun: support receiving skb through msg_control"), the skb if presented should be freed no matter how far it can go along, otherwise it would be leaked. This patch fixes several missed cases. Signed-off-by: Wei Xu <wexu@redhat.com> Reported-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-02vhost: fix skb leak in handle_rx()Wei Xu1-10/+10
Matthew found a roughly 40% tcp throughput regression with commit c67df11f(vhost_net: try batch dequing from skb array) as discussed in the following thread: https://www.mail-archive.com/netdev@vger.kernel.org/msg187936.html Eventually we figured out that it was a skb leak in handle_rx() when sending packets to the VM. This usually happens when a guest can not drain out vq as fast as vhost fills in, afterwards it sets off the traffic jam and leaks skb(s) which occurs as no headcount to send on the vq from vhost side. This can be avoided by making sure we have got enough headcount before actually consuming a skb from the batched rx array while transmitting, which is simply done by moving checking the zero headcount a bit ahead. Signed-off-by: Wei Xu <wexu@redhat.com> Reported-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-02Merge branch 'bnxt_en-fixes'David S. Miller2-29/+31
Michael Chan says: ==================== bnxt_en: Fixes. A shutdown fix for SMARTNIC, 2 fixes related to TC Flower vxlan filters, and the last one fixes an out-of-scope variable when sending short firmware messages. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-02bnxt_en: Fix a variable scoping in bnxt_hwrm_do_send_msg()Vasundhara Volam1-1/+1
short_input variable is assigned to another data pointer which is referred out of its scope. Fix it by moving short_input definition to the beginning of bnxt_hwrm_do_send_msg() function. No failure has been reported so far due to this issue. Fixes: e605db801bde ("bnxt_en: Support for Short Firmware Message") Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-02bnxt_en: fix dst/src fid for vxlan encap/decap actionsSathya Perla1-22/+26
For flows that involve a vxlan encap action, the vxlan sock interface may be specified as the outgoing interface. The driver must resolve the outgoing PF interface used by this socket and use the dst_fid of the PF in the hwrm_cfa_encap_record_alloc cmd. Similarily for flows that have a vxlan decap action, the fid of the incoming PF interface must be used as the src_fid in the hwrm_cfa_decap_filter_alloc cmd. Fixes: 8c95f773b4a3 ("bnxt_en: add support for Flower based vxlan encap/decap offload") Signed-off-by: Sathya Perla <sathya.perla@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-02bnxt_en: wildcard smac while creating tunnel decap filterSunil Challa1-5/+2
While creating a decap filter the tunnel smac need not (and must not) be specified as we cannot ascertain the neighbor in the recv path. 'ttl' match is also not needed for the decap filter and must be wild-carded. Fixes: f484f6782e01 ("bnxt_en: add hwrm FW cmds for cfa_encap_record and decap_filter") Signed-off-by: Sunil Challa <sunilkumar.challa@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-02bnxt_en: Need to unconditionally shut down RoCE in bnxt_shutdownRay Jui1-1/+2
The current 'bnxt_shutdown' implementation only invokes 'bnxt_ulp_shutdown' to shut down RoCE in the case when the system is in the path of power off (SYSTEM_POWER_OFF). While this may work in most cases, it does not work in the smart NIC case, when Linux 'reboot' command is initiated from the Linux that runs on the ARM cores of the NIC card. In this particular case, Linux 'reboot' results in a system 'L3' level reset where the entire ARM and associated subsystems are being reset, but at the same time, Nitro core is being kept in sane state (to allow external PCIe connected servers to continue to work). Without properly shutting down RoCE and freeing all associated resources, it results in the ARM core to hang immediately after the 'reboot' By always invoking 'bnxt_ulp_shutdown' in 'bnxt_shutdown', it fixes the above issue Fixes: 0efd2fc65c92 ("bnxt_en: Add a callback to inform RDMA driver during PCI shutdown.") Signed-off-by: Ray Jui <ray.jui@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-01Merge branch 'sfp-phylink-fixes'David S. Miller2-11/+31
Russell King says: ==================== SFP/phylink fixes Here are four phylink fixes: - the "options" is a big-endian value, we must test the bits taking the endian-ness into account. - improve the handling of RX_LOS polarity, taking no RX_LOS polarity bits set to mean there is no RX_LOS functionality provided. - do not report modules that require the address mode switching as supporting SFF8472. - ensure that the mac_link_down() function is called when phylink_stop() is called. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-01phylink: ensure we take the link down when phylink_stop() is calledRussell King1-0/+1
Ensure that we tell the MAC to take the link down when phylink_stop() is called, and that this completes prior to phylink_stop() returns. Reported-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-01sfp: warn about modules requiring address change sequenceRussell King1-1/+7
We do not support SFP modules which require the address change sequence as detailed by SFF 8472 revision 1.22 section 8.9. Warn when these modules are inserted, and treat them as SFF8079 modules for ethtool. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-01sfp: improve RX_LOS handlingRussell King1-11/+22
There are two bits in the option word for the RX_LOS signal. One reports that the RX_LOS signal is active high, the other reports that it is active low. When both or neither are set, the result is not well defined in the specification. Rather than assuming that neither set means normal RX_LOS, take this as meaning no RX_LOS signal available, thereby ignoring the signal. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-01sfp: fix RX_LOS signal handlingRussell King1-3/+5
The options word is a be16 quantity, so we need to test the flags having converted the endian-ness. Convert the flag bits to be16, which can be optimised by the compiler, rather than converting a variable at runtime. Reported-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-01net: phy-micrel: check return code in flp center functionMax Uvarov1-2/+4
Fix obvious typo that first return value is set but not checked. Signed-off-by: Max Uvarov <muvarov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-01tipc: call tipc_rcv() only if bearer is up in tipc_udp_recv()Tommi Rantala1-4/+0
Remove the second tipc_rcv() call in tipc_udp_recv(). We have just checked that the bearer is not up, and calling tipc_rcv() with a bearer that is not up leads to a TIPC div-by-zero crash in tipc_node_calculate_timer(). The crash is rare in practice, but can happen like this: We're enabling a bearer, but it's not yet up and fully initialized. At the same time we receive a discovery packet, and in tipc_udp_recv() we end up calling tipc_rcv() with the not-yet-initialized bearer, causing later the div-by-zero crash in tipc_node_calculate_timer(). Jon Maloy explains the impact of removing the second tipc_rcv() call: "link setup in the worst case will be delayed until the next arriving discovery messages, 1 sec later, and this is an acceptable delay." As the tipc_rcv() call is removed, just leave the function via the rcu_out label, so that we will kfree_skb(). [ 12.590450] Own node address <1.1.1>, network identity 1 [ 12.668088] divide error: 0000 [#1] SMP [ 12.676952] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.14.2-dirty #1 [ 12.679225] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014 [ 12.682095] task: ffff8c2a761edb80 task.stack: ffffa41cc0cac000 [ 12.684087] RIP: 0010:tipc_node_calculate_timer.isra.12+0x45/0x60 [tipc] [ 12.686486] RSP: 0018:ffff8c2a7fc838a0 EFLAGS: 00010246 [ 12.688451] RAX: 0000000000000000 RBX: ffff8c2a5b382600 RCX: 0000000000000000 [ 12.691197] RDX: 0000000000000000 RSI: ffff8c2a5b382600 RDI: ffff8c2a5b382600 [ 12.693945] RBP: ffff8c2a7fc838b0 R08: 0000000000000001 R09: 0000000000000001 [ 12.696632] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8c2a5d8949d8 [ 12.699491] R13: ffffffff95ede400 R14: 0000000000000000 R15: ffff8c2a5d894800 [ 12.702338] FS: 0000000000000000(0000) GS:ffff8c2a7fc80000(0000) knlGS:0000000000000000 [ 12.705099] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 12.706776] CR2: 0000000001bb9440 CR3: 00000000bd009001 CR4: 00000000003606e0 [ 12.708847] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 12.711016] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 12.712627] Call Trace: [ 12.713390] <IRQ> [ 12.714011] tipc_node_check_dest+0x2e8/0x350 [tipc] [ 12.715286] tipc_disc_rcv+0x14d/0x1d0 [tipc] [ 12.716370] tipc_rcv+0x8b0/0xd40 [tipc] [ 12.717396] ? minmax_running_min+0x2f/0x60 [ 12.718248] ? dst_alloc+0x4c/0xa0 [ 12.718964] ? tcp_ack+0xaf1/0x10b0 [ 12.719658] ? tipc_udp_is_known_peer+0xa0/0xa0 [tipc] [ 12.720634] tipc_udp_recv+0x71/0x1d0 [tipc] [ 12.721459] ? dst_alloc+0x4c/0xa0 [ 12.722130] udp_queue_rcv_skb+0x264/0x490 [ 12.722924] __udp4_lib_rcv+0x21e/0x990 [ 12.723670] ? ip_route_input_rcu+0x2dd/0xbf0 [ 12.724442] ? tcp_v4_rcv+0x958/0xa40 [ 12.725039] udp_rcv+0x1a/0x20 [ 12.725587] ip_local_deliver_finish+0x97/0x1d0 [ 12.726323] ip_local_deliver+0xaf/0xc0 [ 12.726959] ? ip_route_input_noref+0x19/0x20 [ 12.727689] ip_rcv_finish+0xdd/0x3b0 [ 12.728307] ip_rcv+0x2ac/0x360 [ 12.728839] __netif_receive_skb_core+0x6fb/0xa90 [ 12.729580] ? udp4_gro_receive+0x1a7/0x2c0 [ 12.730274] __netif_receive_skb+0x1d/0x60 [ 12.730953] ? __netif_receive_skb+0x1d/0x60 [ 12.731637] netif_receive_skb_internal+0x37/0xd0 [ 12.732371] napi_gro_receive+0xc7/0xf0 [ 12.732920] receive_buf+0x3c3/0xd40 [ 12.733441] virtnet_poll+0xb1/0x250 [ 12.733944] net_rx_action+0x23e/0x370 [ 12.734476] __do_softirq+0xc5/0x2f8 [ 12.734922] irq_exit+0xfa/0x100 [ 12.735315] do_IRQ+0x4f/0xd0 [ 12.735680] common_interrupt+0xa2/0xa2 [ 12.736126] </IRQ> [ 12.736416] RIP: 0010:native_safe_halt+0x6/0x10 [ 12.736925] RSP: 0018:ffffa41cc0cafe90 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff4d [ 12.737756] RAX: 0000000000000000 RBX: ffff8c2a761edb80 RCX: 0000000000000000 [ 12.738504] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ 12.739258] RBP: ffffa41cc0cafe90 R08: 0000014b5b9795e5 R09: ffffa41cc12c7e88 [ 12.740118] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002 [ 12.740964] R13: ffff8c2a761edb80 R14: 0000000000000000 R15: 0000000000000000 [ 12.741831] default_idle+0x2a/0x100 [ 12.742323] arch_cpu_idle+0xf/0x20 [ 12.742796] default_idle_call+0x28/0x40 [ 12.743312] do_idle+0x179/0x1f0 [ 12.743761] cpu_startup_entry+0x1d/0x20 [ 12.744291] start_secondary+0x112/0x120 [ 12.744816] secondary_startup_64+0xa5/0xa5 [ 12.745367] Code: b9 f4 01 00 00 48 89 c2 48 c1 ea 02 48 3d d3 07 00 00 48 0f 47 d1 49 8b 0c 24 48 39 d1 76 07 49 89 14 24 48 89 d1 31 d2 48 89 df <48> f7 f1 89 c6 e8 81 6e ff ff 5b 41 5c 5d c3 66 90 66 2e 0f 1f [ 12.747527] RIP: tipc_node_calculate_timer.isra.12+0x45/0x60 [tipc] RSP: ffff8c2a7fc838a0 [ 12.748555] ---[ end trace 1399ab83390650fd ]--- [ 12.749296] Kernel panic - not syncing: Fatal exception in interrupt [ 12.750123] Kernel Offset: 0x13200000 from 0xffffffff82000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [ 12.751215] Rebooting in 60 seconds.. Fixes: c9b64d492b1f ("tipc: add replicast peer discovery") Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com> Cc: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-01tcp/dccp: block bh before arming time_wait timerEric Dumazet2-0/+12
Maciej Żenczykowski reported some panics in tcp_twsk_destructor() that might be caused by the following bug. timewait timer is pinned to the cpu, because we want to transition timwewait refcount from 0 to 4 in one go, once everything has been initialized. At the time commit ed2e92394589 ("tcp/dccp: fix timewait races in timer handling") was merged, TCP was always running from BH habdler. After commit 5413d1babe8f ("net: do not block BH while processing socket backlog") we definitely can run tcp_time_wait() from process context. We need to block BH in the critical section so that the pinned timer has still its purpose. This bug is more likely to happen under stress and when very small RTO are used in datacenter flows. Fixes: 5413d1babe8f ("net: do not block BH while processing socket backlog") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Maciej Żenczykowski <maze@google.com> Acked-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-01Merge branch 'sctp-prsctp-chunk-fixes'David S. Miller3-7/+26
Xin Long says: ==================== sctp: a couple of fixes for chunks abandoned in prsctp Now when abandoning chunks in prsctp, it doesn't consider for frags in one msg, which would cause peer can never receive the whole frags for one msg to get them reassembled, these pieces of this msg will stay in the reasm queue forever and block the following chunks' receiving. This patchset is to fix them in patch 2 and 3, and also fix another issue for prsctp in patch 1. ==================== Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-01sctp: do not abandon the other frags in unsent outq if one msg has ↵Xin Long2-1/+6
outstanding frags Now for the abandoned chunks in unsent outq, it would just free the chunks. Because no tsn is assigned to them yet, there's no need to send fwd tsn to peer, unlike for the abandoned chunks in sent outq. The problem is when parts of the msg have been sent and the other frags are still in unsent outq, if they are abandoned/dropped, the peer would never get this msg reassembled. So these frags in unsent outq can't be dropped if this msg already has outstanding frags. This patch does the check in sctp_chunk_abandoned and sctp_prsctp_prune_unsent. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-01sctp: abandon the whole msg if one part of a fragmented message is abandonedXin Long3-5/+17
As rfc3758#section-3.1 demands: A3) When a TSN is "abandoned", if it is part of a fragmented message, all other TSN's within that fragmented message MUST be abandoned at the same time. Besides, if it couldn't handle this, the rest frags would never get assembled in peer side. This patch supports it by adding abandoned flag in sctp_datamsg, when one chunk is being abandoned, set chunk->msg->abandoned as well. Next time when checking for abandoned, go checking chunk->msg->abandoned first. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-01sctp: only update outstanding_bytes for transmitted queue when doing ↵Xin Long1-2/+4
prsctp_prune Now outstanding_bytes is only increased when appending chunks into one packet and sending it at 1st time, while decreased when it is about to move into retransmit queue. It means outstanding_bytes value is already decreased for all chunks in retransmit queue. However sctp_prsctp_prune_sent is a common function to check the chunks in both transmitted and retransmit queue, it decrease outstanding_bytes when moving a chunk into abandoned queue from either of them. It could cause outstanding_bytes underflow, as it also decreases it's value for the chunks in retransmit queue. This patch fixes it by only updating outstanding_bytes for transmitted queue when pruning queues for prsctp prio policy, the same fix is also needed in sctp_check_transmitted. Fixes: 8dbdf1f5b09c ("sctp: implement prsctp PRIO policy") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-01can: mcba_usb: fix device disconnect bugMartin Kelly1-0/+1
Currently, when you disconnect the device, the driver infinitely resubmits all URBs, so you see: Rx URB aborted (-32) in an infinite loop. Fix this by catching -EPIPE (what we get in urb->status when the device disconnects) and not resubmitting. With this patch, I can plug and unplug many times and the driver recovers correctly. Signed-off-by: Martin Kelly <mkelly@xevo.com> Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2017-12-01can: mcba_usb: fix typoMartin Kelly1-1/+1
Fix typo "analizer" --> "Analyzer". Signed-off-by: Martin Kelly <mkelly@xevo.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2017-12-01can: flexcan: fix VF610 state transition issueMarc Kleine-Budde1-2/+3
Enable FLEXCAN_QUIRK_BROKEN_PERR_STATE for VF610 to report correct state transitions. Tested-by: Mirza Krak <mirza.krak@gmail.com> Cc: linux-stable <stable@vger.kernel.org> # >= v4.11 Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2017-12-01can: flexcan: Update IRQ Err Passive informationMarc Kleine-Budde1-2/+2
The flexcan IP cores used on MX25 and MX35 do not generate Error Passive IRQs. Update the IP core overview table in the driver accordingly. Suggested-by: ZHU Yi (ST-FIR/ENG1-Zhu) <Yi.Zhu5@cn.bosch.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2017-12-01can: peak/pci: fix potential bug when probe() failsStephane Grosjean2-2/+8
PCI/PCIe drivers for PEAK-System CAN/CAN-FD interfaces do some access to the PCI config during probing. In case one of these accesses fails, a POSITIVE PCIBIOS_xxx error code is returned back. This POSITIVE error code MUST be converted into a NEGATIVE errno for the probe() function to indicate it failed. Using the pcibios_err_to_errno() function, we make sure that the return code will always be negative. Signed-off-by: Stephane Grosjean <s.grosjean@peak-system.com> Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2017-12-01can: ti_hecc: Fix napi poll return value for repollOliver Stäbler1-0/+3
After commit d75b1ade567f ("net: less interrupt masking in NAPI") napi repoll is done only when work_done == budget. So we need to return budget if there are still packets to receive. Signed-off-by: Oliver Stäbler <oliver.staebler@bytesatwork.ch> Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2017-12-01can: kvaser_usb: ratelimit errors if incomplete messages are receivedJimmy Assarsson1-3/+4
Avoid flooding the kernel log with "Formate error", if incomplete message are received. Signed-off-by: Jimmy Assarsson <jimmyassarsson@gmail.com> Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2017-12-01can: kvaser_usb: Fix comparison bug in kvaser_usb_read_bulk_callback()Jimmy Assarsson1-1/+1
The conditon in the while-loop becomes true when actual_length is less than 2 (MSG_HEADER_LEN). In best case we end up with a former, already dispatched msg, that got msg->len greater than actual_length. This will result in a "Format error" error printout. Problem seen when unplugging a Kvaser USB device connected to a vbox guest. warning: comparison between signed and unsigned integer expressions [-Wsign-compare] Signed-off-by: Jimmy Assarsson <jimmyassarsson@gmail.com> Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2017-12-01can: kvaser_usb: free buf in error pathsJimmy Assarsson1-0/+2
The allocated buffer was not freed if usb_submit_urb() failed. Signed-off-by: Jimmy Assarsson <jimmyassarsson@gmail.com> Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2017-11-30net: dsa: bcm_sf2: Set correct CHAIN_ID and slice number maskFlorian Fainelli1-2/+2
When configuring an IPv6 address mask, we should use SLICE_NUM_MASK as the mask in order to make sure all bits are masked by the hardware. Also, we want matching entries to have a CHAIN_ID value set to the same value as the rule index we return to user-space for convenience, so fix that too. Fixes: ba0696c22e7c ("net: dsa: bcm_sf2: Add support for IPv6 CFP rules") Fixes: dd8eff68343d ("net: dsa: bcm_sf2: Allow matching arbitrary IPv6 masks/lengths") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-30sit: update frag_off infoHangbin Liu1-0/+1
After parsing the sit netlink change info, we forget to update frag_off in ipip6_tunnel_update(). Fix it by assigning frag_off with new value. Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-30tcp: remove buggy call to tcp_v6_restore_cb()Eric Dumazet1-1/+0
tcp_v6_send_reset() expects to receive an skb with skb->cb[] layout as used in TCP stack. MD5 lookup uses tcp_v6_iif() and tcp_v6_sdif() and thus TCP_SKB_CB(skb)->header.h6 This patch probably fixes RST packets sent on behalf of a timewait md5 ipv6 socket. Before Florian patch, tcp_v6_restore_cb() was needed before jumping to no_tcp_socket label. Fixes: 271c3b9b7bda ("tcp: honour SO_BINDTODEVICE for TW_RST case too") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Florian Westphal <fw@strlen.de> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-30act_sample: get rid of tcf_sample_cleanup_rcu()Cong Wang2-12/+3
Similar to commit d7fb60b9cafb ("net_sched: get rid of tcfa_rcu"), TC actions don't need to respect RCU grace period, because it is either just detached from tc filter (standalone case) or it is removed together with tc filter (bound case) in which case RCU grace period is already respected at filter layer. Fixes: 5c5670fae430 ("net/sched: Introduce sample tc action") Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Cc: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-30Merge tag 'rxrpc-fixes-20171129' of ↵David S. Miller5-27/+35
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== rxrpc: Fixes Here are three patches for AF_RXRPC. One removes some whitespace, one fixes terminal ACK generation and the third makes a couple of places actually use the timeout value just determined rather than ignoring it. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-30net: stmmac: dwmac-sun8i: fix allwinner,leds-active-low handlingCorentin Labbe1-2/+1
The driver expect "allwinner,leds-active-low" to be in PHY node, but the binding doc expect it to be in MAC node. Since all board DT use it also in MAC node, the driver need to search allwinner,leds-active-low in MAC node. Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-30skbuff: Grammar s/are can/can/, s/change/changes/Geert Uytterhoeven1-2/+1
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-30net: mvpp2: allocate zeroed tx descriptorsYan Markman1-1/+1
Reserved and unused fields in the Tx descriptors should be 0. The PPv2 driver doesn't clear them at run-time (for performance reasons) but these descriptors aren't zeroed when allocated, which can lead to unpredictable behaviors. This patch fixes this by using dma_zalloc_coherent instead of dma_alloc_coherent. Fixes: 3f518509dedc ("ethernet: Add new driver for Marvell Armada 375 network unit") Signed-off-by: Yan Markman <ymarkman@marvell.com> [Antoine: commit message] Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-29Merge tag 'nfsd-4.15-1' of git://linux-nfs.org/~bfields/linuxLinus Torvalds15-132/+270
Pull nfsd fixes from Bruce Fields: "I screwed up my merge window pull request; I only sent half of what I meant to. There were no new features, just bugfixes of various importance and some very minor cleanup, so I think it's all still appropriate for -rc2. Highlights: - Fixes from Trond for some races in the NFSv4 state code. - Fix from Naofumi Honda for a typo in the blocked lock notificiation code - Fixes from Vasily Averin for some problems starting and stopping lockd especially in network namespaces" * tag 'nfsd-4.15-1' of git://linux-nfs.org/~bfields/linux: (23 commits) lockd: fix "list_add double add" caused by legacy signal interface nlm_shutdown_hosts_net() cleanup race of nfsd inetaddr notifiers vs nn->nfsd_serv change race of lockd inetaddr notifiers vs nlmsvc_rqst change SUNRPC: make cache_detail structures const NFSD: make cache_detail structures const sunrpc: make the function arg as const nfsd: check for use of the closed special stateid nfsd: fix panic in posix_unblock_lock called from nfs4_laundromat lockd: lost rollback of set_grace_period() in lockd_down_net() lockd: added cleanup checks in exit_net hook grace: replace BUG_ON by WARN_ONCE in exit_net hook nfsd: fix locking validator warning on nfs4_ol_stateid->st_mutex class lockd: remove net pointer from messages nfsd: remove net pointer from debug messages nfsd: Fix races with check_stateid_generation() nfsd: Ensure we check stateid validity in the seqid operation checks nfsd: Fix race in lock stateid creation nfsd4: move find_lock_stateid nfsd: Ensure we don't recognise lock stateids after freeing them ...