summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2020-06-01sfc: add missing annotation for efx_ef10_try_update_nic_stats_vf()Jules Irenge1-0/+1
Sparse reports a warning at efx_ef10_try_update_nic_stats_vf() warning: context imbalance in efx_ef10_try_update_nic_stats_vf() - unexpected unlock The root cause is the missing annotation at efx_ef10_try_update_nic_stats_vf() Add the missing _must_hold(&efx->stats_lock) annotation Signed-off-by: Jules Irenge <jbi.octave@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01crypto/chtls: IPv6 support for inline TLSVinay Kumar Yadav4-43/+168
Extends support to IPv6 for Inline TLS server. Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com> v1->v2: - cc'd tcp folks. v2->v3: - changed EXPORT_SYMBOL() to EXPORT_SYMBOL_GPL() Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01Merge branch 'chelsio-crypto-fixes'David S. Miller2-7/+6
Ayush Sawal says: ==================== Fixing compilation warnings and errors Patch 1: Fixes the warnings seen when compiling using sparse tool. Patch 2: Fixes a cocci check error introduced after commit 567be3a5d227 ("crypto: chelsio - Use multiple txq/rxq per tfm to process the requests"). V1->V2 patch1: Avoid type casting by using get_unaligned_be32() and put_unaligned_be16/32() functions. patch2: Modified subject of the patch. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01Crypto/chcr: Fixes a coccinile check errorAyush Sawal1-0/+1
This fixes an error observed after running coccinile check. drivers/crypto/chelsio/chcr_algo.c:1462:5-8: Unneeded variable: "err". Return "0" on line 1480 This line is missed in the commit 567be3a5d227 ("crypto: chelsio - Use multiple txq/rxq per tfm to process the requests"). Fixes: 567be3a5d227 ("crypto: chelsio - Use multiple txq/rxq per tfm to process the requests"). V1->V2 -Modified subject. Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01Crypto/chcr: Fixes compilations warningsAyush Sawal2-7/+5
This patch fixes the compilation warnings displayed by sparse tool for chcr driver. V1->V2 Avoid type casting by using get_unaligned_be32() and put_unaligned_be16/32() functions. The key which comes from stack is an u8 byte stream so we store it in an unsigned char array(ablkctx->key). The function get_aes_decrypt_key() is a used to calculate the reverse round key for decryption, for this operation the key has to be divided into 4 bytes, so to extract 4 bytes from an u8 byte stream and store it in an u32 variable, get_aligned_be32() is used. Similarly for copying back the key from u32 variable to the original u8 key stream, put_aligned_be32() is used. Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01crypto/chcr: IPV6 code needs to be in CONFIG_IPV6Rohit Maheshwari1-15/+33
Error messages seen while building kernel with CONFIG_IPV6 disabled. Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01cxgb4/chcr: Enable ktls settings at run timeRohit Maheshwari10-85/+195
Current design enables ktls setting from start, which is not efficient. Now the feature will be enabled when user demands TLS offload on any interface. v1->v2: - taking ULD module refcount till any single connection exists. - taking rtnl_lock() before clearing tls_devops. v2->v3: - cxgb4 is now registering to tlsdev_ops. - module refcount inc/dec in chcr. - refcount is only for connections. - removed new code from cxgb_set_feature(). v3->v4: - fixed warning message. Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01ipv6: fix IPV6_ADDRFORM operation logicHangbin Liu1-6/+7
Socket option IPV6_ADDRFORM supports UDP/UDPLITE and TCP at present. Previously the checking logic looks like: if (sk->sk_protocol == IPPROTO_UDP || sk->sk_protocol == IPPROTO_UDPLITE) do_some_check; else if (sk->sk_protocol != IPPROTO_TCP) break; After commit b6f6118901d1 ("ipv6: restrict IPV6_ADDRFORM operation"), TCP was blocked as the logic changed to: if (sk->sk_protocol == IPPROTO_UDP || sk->sk_protocol == IPPROTO_UDPLITE) do_some_check; else if (sk->sk_protocol == IPPROTO_TCP) do_some_check; break; else break; Then after commit 82c9ae440857 ("ipv6: fix restrict IPV6_ADDRFORM operation") UDP/UDPLITE were blocked as the logic changed to: if (sk->sk_protocol == IPPROTO_UDP || sk->sk_protocol == IPPROTO_UDPLITE) do_some_check; if (sk->sk_protocol == IPPROTO_TCP) do_some_check; if (sk->sk_protocol != IPPROTO_TCP) break; Fix it by using Eric's code and simply remove the break in TCP check, which looks like: if (sk->sk_protocol == IPPROTO_UDP || sk->sk_protocol == IPPROTO_UDPLITE) do_some_check; else if (sk->sk_protocol == IPPROTO_TCP) do_some_check; else break; Fixes: 82c9ae440857 ("ipv6: fix restrict IPV6_ADDRFORM operation") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01tipc: Fix NULL pointer dereference in __tipc_sendstream()YueHaibing1-2/+6
tipc_sendstream() may send zero length packet, then tipc_msg_append() do not alloc skb, skb_peek_tail() will get NULL, msg_set_ack_required will trigger NULL pointer dereference. Reported-by: syzbot+8eac6d030e7807c21d32@syzkaller.appspotmail.com Fixes: 0a3e060f340d ("tipc: add test for Nagle algorithm effectiveness") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01net: dsa: sja1105: suppress -Wmissing-prototypes in sja1105_vl.cVladimir Oltean2-1/+3
Newer C compilers are complaining about the fact that there are no function prototypes in sja1105_vl.c for the non-static functions. Give them what they want. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01Merge branch '100GbE' of ↵David S. Miller16-156/+142
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== 100GbE Intel Wired LAN Driver Updates 2020-05-31 This series contains updates to the ice driver only. Brett modifies the driver to allow users to clear a VF's administratively set MAC address on the PF. Fixes the driver to recognize an existing VLAN tag when DMAC/SMAC is enabled in a packet. Fixes an issue, so that VF's are reset after any VF port VLAN modifications are made on the PF. Made sure the register QRXFLXP_CNTXT is cleared before writing a new value to ensure the previous value is not passed forward. Updates the PF to allow the VF to request a reset as soon as it has been initialized. Fixes an issue to ensure when a VSI is created, it uses the current coalesce value, not the default value. Paul allows untrusted VF's to add 16 filters. Dan increases the timeout needed after a PFR to allow ample time for package download. Chinh adjust the define value for the number of PHY speeds we currently support. Changes the driver to ignore EMODE error when configuring the PHY. Jesse fixes an issue which was preventing a user from configuring the interface before bringing it up. Henry fixes the logic for adding back perfect flows after flow director filter does a deletion. Bruce fixes line wrappings to make it more consistent. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01vxlan: fix dereference of nexthop group in nexthop update pathRoopa Prabhu1-2/+2
fix dereference of nexthop group in fdb nexthop group update validation path. Fixes: 1274e1cc4226 ("vxlan: ecmp support for mac fdb entries") Reported-by: Ido Schimmel <idosch@idosch.org> Suggested-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01switch cmsghdr_from_user_compat_to_kern() to copy_from_user()Al Viro1-7/+8
no point getting compat_cmsghdr field-by-field Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01Merge branch 'dpaa2-eth-add-PFC-support'David S. Miller10-57/+787
Ioana Ciornei says: ==================== dpaa2-eth: add PFC support This patch set adds support for Priority Flow Control in DPAA2 Ethernet devices. The first patch make the necessary changes so that multiple traffic classes are configured. The dequeue priority of the maximum 8 traffic classes is configured to be equal. The second patch adds a static distribution to said traffic classes based on the VLAN PCP field. In the future, this could be extended through the .setapp() DCB callback for dynamic configuration. Also, add support for the congestion group taildrop mechanism that allows us to control the number of frames that can accumulate on a group of Rx frame queues belonging to the same traffic class. The basic subset of the DCB ops is implemented so that the user can query the number of PFC capable traffic classes, their state and reconfigure them if necessary. Changes in v3: - add patches 6-7 which add the PFC functionality - patch 2/7: revert to explicitly cast mask to u16 * to not get into sparse warnings Changes in v4: - really fix the sparse warnings in 2/7 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01dpaa2-eth: Keep congestion group taildrop enabled when PFC onIoana Ciornei3-10/+34
Leave congestion group taildrop enabled for all traffic classes when PFC is enabled. Notification threshold is low enough such that it will be hit first and this also ensures that FQs on traffic classes which are not PFC enabled won't drain the buffer pool. FQ taildrop threshold is kept disabled as long as any form of flow control is on. Since FQ taildrop works with bytes, not number of frames, we can't guarantee it will not interfere with the congestion notification mechanism for all frame sizes. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01dpaa2-eth: Add PFC support through DCB opsIoana Ciornei8-0/+316
Add support in dpaa2-eth for PFC (Priority Flow Control) through the DCB ops. Instruct the hardware to respond to received PFC frames. Current firmware doesn't allow us to selectively enable PFC on the Rx side for some priorities only, so we will react to all incoming PFC frames (and stop transmitting on the traffic classes specified in the frame). Also, configure the hardware to generate PFC frames based on Rx congestion notifications. When a certain number of frames accumulate in the ingress queues corresponding to a traffic class, priority flow control frames are generated for that TC. The number of PFC traffic classes available can be queried through lldptool. Also, which of those traffic classes have PFC enabled is also controlled through the same dcbnl_rtnl_ops callbacks. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01dpaa2-eth: Update FQ taildrop threshold and buffer pool countIoana Radulescu1-12/+11
Now that we have congestion group taildrop configured at all times, we can afford to increase the frame queue taildrop threshold; this will ensure a better response when receiving bursts of large-sized frames. Also decouple the buffer pool count from the Rx FQ taildrop threshold, as above change would increase it too much. Instead, keep the old count as a hardcoded value. With the new limits, we try to ensure that: * we allow enough leeway for large frame bursts (by buffering enough of them in queues to avoid heavy dropping in case of bursty traffic, but when overall ingress bandwidth is manageable) * allow pending frames to be evenly spread between ingress FQs, regardless of frame size * avoid dropping frames due to the buffer pool being empty; this is not a bad behaviour per se, but system overall response is more linear and predictable when frames are dropped at frame queue/group level. Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com> Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01dpaa2-eth: Add congestion group taildropIoana Radulescu2-10/+38
The increase in number of ingress frame queues means we now risk depleting the buffer pool before the FQ taildrop kicks in. Congestion group taildrop allows us to control the number of frames that can accumulate on a group of Rx frame queues belonging to the same traffic class. This setting coexists with the frame queue based taildrop: whichever limit gets hit first triggers the frame drop. Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com> Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01dpaa2-eth: Add helper functionsIoana Radulescu3-5/+14
Add convenient helper functions that determines whether Rx/Tx pause frames are enabled based on link state flags received from firmware. Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com> Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01dpaa2-eth: Distribute ingress frames based on VLAN prioIoana Radulescu5-0/+318
Configure static ingress classification based on VLAN PCP field. If the DPNI doesn't have enough traffic classes to accommodate all priority levels, the lowest ones end up on TC 0 (default on miss). Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com> Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01dpaa2-eth: Add support for Rx traffic classesIoana Radulescu4-32/+68
The firmware reserves for each DPNI a number of RX frame queues equal to the number of configured flows x number of configured traffic classes. Current driver configuration directs all incoming traffic to FQs corresponding to TC0, leaving all other priority levels unused. Start adding support for multiple ingress traffic classes, by configuring the FQs associated with all priority levels, not just TC0. All settings that are per-TC, such as those related to hashing and flow steering, are also updated. Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com> Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01net: phy: broadcom: don't export RDB/legacy access methodsMichael Walle1-4/+2
Don't export __bcm_phy_enable_rdb_access() and __bcm_phy_enable_legacy_access() functions. They aren't used outside this module and it was forgotten to provide a prototype for these functions. Just make them static for now. Fixes: 11ecf8c55b91 ("net: phy: broadcom: add cable test support") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Michael Walle <michael@walle.cc> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01tun: correct header offsets in napi frags modeWillem de Bruijn1-4/+10
Tun in IFF_NAPI_FRAGS mode calls napi_gro_frags. Unlike netif_rx and netif_gro_receive, this expects skb->data to point to the mac layer. But skb_probe_transport_header, __skb_get_hash_symmetric, and xdp_do_generic in tun_get_user need skb->data to point to the network header. Flow dissection also needs skb->protocol set, so eth_type_trans has to be called. Ensure the link layer header lies in linear as eth_type_trans pulls ETH_HLEN. Then take the same code paths for frags as for not frags. Push the link layer header back just before calling napi_gro_frags. By pulling up to ETH_HLEN from frag0 into linear, this disables the frag0 optimization in the special case when IFF_NAPI_FRAGS is used with zero length iov[0] (and thus empty skb->linear). Fixes: 90e33d459407 ("tun: enable napi_gro_frags() for TUN/TAP driver") Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Petar Penkov <ppenkov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01cls_flower: remove mpls_opts_policyGuillaume Nault1-5/+0
Compiling with W=1 gives the following warning: net/sched/cls_flower.c:731:1: warning: ‘mpls_opts_policy’ defined but not used [-Wunused-const-variable=] The TCA_FLOWER_KEY_MPLS_OPTS contains a list of TCA_FLOWER_KEY_MPLS_OPTS_LSE. Therefore, the attributes all have the same type and we can't parse the list with nla_parse*() and have the attributes validated automatically using an nla_policy. fl_set_key_mpls_opts() properly verifies that all attributes in the list are TCA_FLOWER_KEY_MPLS_OPTS_LSE. Then fl_set_key_mpls_lse() uses nla_parse_nested() on all these attributes, thus verifying that they have the NLA_F_NESTED flag. So we can safely drop the mpls_opts_policy. Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01Merge branch 'bridge-mrp-Add-support-for-MRA-role'David S. Miller7-33/+182
Horatiu Vultur says: ==================== bridge: mrp: Add support for MRA role This patch series extends the MRP with the MRA role. A node that has the MRA role can behave as a MRM or as a MRC. In case there are multiple nodes in the topology that has the MRA role then only one node can behave as MRM and all the others need to be have as MRC. The node that has the higher priority(lower value) will behave as MRM. A node that has the MRA role and behaves as MRC, it just needs to forward the MRP_Test frames between the ring ports but also it needs to detect in case it stops receiving MRP_Test frames. In that case it would try to behave as MRM. v2: - add new patch that fixes sparse warnings - fix parsing of prio attribute ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01bridge: mrp: Add support for role MRAHoratiu Vultur7-21/+159
A node that has the MRA role, it can behave as MRM or MRC. Initially it starts as MRM and sends MRP_Test frames on both ring ports. If it detects that there are MRP_Test send by another MRM, then it checks if these frames have a lower priority than itself. In this case it would send MRP_Nack frames to notify the other node that it needs to stop sending MRP_Test frames. If it receives a MRP_Nack frame then it stops sending MRP_Test frames and starts to behave as a MRC but it would continue to monitor the MRP_Test frames send by MRM. If at a point the MRM stops to send MRP_Test frames it would get the MRM role and start to send MRP_Test frames. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01bridge: mrp: Set the priority of MRP instanceHoratiu Vultur6-1/+12
Each MRP instance has a priority, a lower value means a higher priority. The priority of MRP instance is stored in MRP_Test frame in this way all the MRP nodes in the ring can see other nodes priority. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01bridge: mrp: Update MRP frame typeHoratiu Vultur1-11/+11
Replace u16/u32 with be16/be32 in the MRP frame types. This fixes sparse warnings like: warning: cast to restricted __be16 Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01net: vmxnet3: fix possible buffer overflow caused by bad DMA value in ↵Jia-Ju Bai1-0/+2
vmxnet3_get_rss() The value adapter->rss_conf is stored in DMA memory, and it is assigned to rssConf, so rssConf->indTableSize can be modified at anytime by malicious hardware. Because rssConf->indTableSize is assigned to n, buffer overflow may occur when the code "rssConf->indTable[n]" is executed. To fix this possible bug, n is checked after being used. Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01flow_dissector: work around stack frame size warningArnd Bergmann1-9/+8
The fl_flow_key structure is around 500 bytes, so having two of them on the stack in one function now exceeds the warning limit after an otherwise correct change: net/sched/cls_flower.c:298:12: error: stack frame size of 1056 bytes in function 'fl_classify' [-Werror,-Wframe-larger-than=] I suspect the fl_classify function could be reworked to only have one of them on the stack and modify it in place, but I could not work out how to do that. As a somewhat hacky workaround, move one of them into an out-of-line function to reduce its scope. This does not necessarily reduce the stack usage of the outer function, but at least the second copy is removed from the stack during most of it and does not add up to whatever is called from there. I now see 552 bytes of stack usage for fl_classify(), plus 528 bytes for fl_mask_lookup(). Fixes: 58cff782cc55 ("flow_dissector: Parse multiple MPLS Label Stack Entries") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01lan743x: Added fixed link and RGMII supportRoelof Berg4-13/+80
Microchip lan7431 is frequently connected to a phy. However, it can also be directly connected to a MII remote peer without any phy in between. For supporting such a phyless hardware setup in Linux we utilized phylib, which supports a fixed-link configuration via the device tree. And we added support for defining the connection type R/GMII in the device tree. New behavior: ------------- . The automatic speed and duplex detection of the lan743x silicon between mac and phy is disabled. Instead phylib is used like in other typical Linux drivers. The usage of phylib allows to specify fixed-link parameters in the device tree. . The device tree entry phy-connection-type is supported now with the modes RGMII or (G)MII (default). Development state: ------------------ . Tested with fixed-phy configurations. Not yet tested in normal configurations with phy. Microchip kindly offered testing as soon as the Corona measures allow this. . All review findings of Andrew Lunn are included Example: -------- &pcie { status = "okay"; host@0 { reg = <0 0 0 0 0>; #address-cells = <3>; #size-cells = <2>; ethernet@0 { compatible = "weyland-yutani,noscom1", "microchip,lan743x"; status = "okay"; reg = <0 0 0 0 0>; phy-connection-type = "rgmii"; fixed-link { speed = <100>; full-duplex; }; }; }; }; Signed-off-by: Roelof Berg <rberg@berg-solutions.de> Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01Merge branch 'devlink-Add-support-for-control-packet-traps'David S. Miller11-168/+1781
Ido Schimmel says: ==================== devlink: Add support for control packet traps So far device drivers were only able to register drop and exception packet traps with devlink. These traps are used for packets that were either dropped by the underlying device or encountered an exception (e.g., missing neighbour entry) during forwarding. However, in the steady state, the majority of the packets being trapped to the CPU are packets that are required for the correct functioning of the control plane. For example, ARP request and IGMP query packets. This patch set allows device drivers to register such control traps with devlink and expose their default control plane policy to user space. User space can then tune the packet trap policer settings according to its needs, as with existing packet traps. In a similar fashion to exception traps, the action associated with such traps cannot be changed as it can easily break the control plane. Unlike drop and exception traps, packets trapped via control traps are not reported to the kernel's drop monitor as they are not indicative of any problem. Patch set overview: Patches #1-#3 break out layer 3 exceptions to a different group to provide better granularity. A future patch set will make this completely configurable. Patch #4 adds a new trap action ('mirror') that is used for packets that are forwarded by the device and sent to the CPU. Such packets are marked by device drivers with 'skb->offload_fwd_mark = 1' in order to prevent the kernel from forwarding them again. Patch #5 adds the new trap type, 'control'. Patches #6-#8 gradually add various control traps to devlink with proper documentation. Patch #9 adds a few control traps to netdevsim, which are automatically exercised by existing devlink-trap selftest. Patches #10 performs small refactoring in mlxsw. Patches #11-#13 change mlxsw to register its existing control traps with devlink. Patch #14 adds a selftest over mlxsw that exercises all the registered control traps. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01selftests: mlxsw: Add test for control packetsIdo Schimmel2-0/+711
Generate packets matching the various control traps and check that the traps' stats increase accordingly. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01mlxsw: spectrum_trap: Register ACL control trapsIdo Schimmel3-23/+55
In a similar fashion to other control traps, register ACL control traps with devlink. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01mlxsw: spectrum_trap: Register layer 3 control trapsIdo Schimmel3-94/+318
In a similar fashion to layer 2 control traps, register layer 3 control traps with devlink. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01mlxsw: spectrum_trap: Register layer 2 control trapsIdo Schimmel3-31/+159
In a similar fashion to other traps, register layer 2 control traps with devlink. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01mlxsw: spectrum_trap: Factor out common Rx listener functionIdo Schimmel1-5/+24
We currently have an Rx listener function for exception traps that marks received skbs with 'offload_fwd_mark' and injects them to the kernel's Rx path. The marking is done because all these exceptions occur during L3 forwarding, after the packets were potentially flooded at L2. A subsequent patch will add support for control traps. Packets received via some of these control traps need different handling: 1. Packets might not need to be marked with 'offload_fwd_mark'. For example, if packet was trapped before L2 forwarding 2. Packets might not need to be injected to the kernel's Rx path. For example, sampled packets are reported to user space via the psample module Factor out a common Rx listener function that only reports trapped packets to devlink. Call it from mlxsw_sp_rx_no_mark_listener() and mlxsw_sp_rx_mark_listener() that will inject the packets to the kernel's Rx path, without and with the marking, respectively. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01netdevsim: Register control trapsIdo Schimmel1-0/+7
Register two control traps with devlink. The existing selftest at tools/testing/selftests/drivers/net/netdevsim/devlink_trap.sh iterates over all registered traps and checks that the action of non-drop traps cannot be changed. Up until now only exception traps were tested, now control traps will be tested as well. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01devlink: Add ACL control packet trapsIdo Schimmel3-0/+30
Add packet traps for packets that are sampled / trapped by ACLs, so that capable drivers could register them with devlink. Add documentation for every added packet trap and packet trap group. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01devlink: Add layer 3 control packet trapsIdo Schimmel3-0/+311
Add layer 3 control packet traps such as ARP and DHCP, so that capable device drivers could register them with devlink. Add documentation for every added packet trap and packet trap group. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01devlink: Add layer 2 control packet trapsIdo Schimmel3-0/+109
Add layer 2 control packet traps such as STP and IGMP query, so that capable device drivers could register them with devlink. Add documentation for every added packet trap and packet trap group. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01devlink: Add 'control' trap typeIdo Schimmel3-1/+20
This type is used for traps that trap control packets such as ARP request and IGMP query to the CPU. Do not report such packets to the kernel's drop monitor as they were not dropped by the device no encountered an exception during forwarding. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01devlink: Add 'mirror' trap actionIdo Schimmel3-1/+7
The action is used by control traps such as IGMP query. The packet is flooded by the device, but also trapped to the CPU in order for the software bridge to mark the receiving port as a multicast router port. Such packets are marked with 'skb->offload_fwd_mark = 1' in order to prevent the software bridge from flooding them again. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01netdevsim: Move layer 3 exceptions to exceptions trap groupIdo Schimmel1-1/+2
The layer 3 exceptions are still subject to the same trap policer, so nothing changes, but user space can choose to assign a different one. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01mlxsw: spectrum_trap: Move layer 3 exceptions to exceptions trap groupIdo Schimmel2-16/+25
The layer 3 exceptions are still subject to the same trap policer, so nothing changes, but user space can choose to assign a different one. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01devlink: Create dedicated trap group for layer 3 exceptionsIdo Schimmel3-2/+9
Packets that hit exceptions during layer 3 forwarding must be trapped to the CPU for the control plane to function properly. Create a dedicated group for them, so that user space could choose to assign a different policer for them. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller9-137/+670
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next to extend ctnetlink and the flowtable infrastructure: 1) Extend ctnetlink kernel side netlink dump filtering capabilities, from Romain Bellan. 2) Generalise the flowtable hook parser to take a hook list. 3) Pass a hook list to the flowtable hook registration/unregistration. 4) Add a helper function to release the flowtable hook list. 5) Update the flowtable event notifier to pass a flowtable hook list. 6) Allow users to add new devices to an existing flowtables. 7) Allow users to remove devices to an existing flowtables. 8) Allow for registering a flowtable with no initial devices. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01net: fec: disable correct clk in the err path of fec_enet_clk_enableLiu Xiang1-2/+6
When enable clk_ref failed, clk_ptp should be disabled rather than clk_ref itself. Signed-off-by: Liu Xiang <liuxiang_1999@126.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01net: octeon: mgmt: Repair filling of RX ringAlexander Sverdlin1-0/+5
The removal of mips_swiotlb_ops exposed a problem in octeon_mgmt Ethernet driver. mips_swiotlb_ops had an mb() after most of the operations and the removal of the ops had broken the receive functionality of the driver. My code inspection has shown no other places except octeon_mgmt_rx_fill_ring() where an explicit barrier would be obviously missing. The latter function however has to make sure that "ringing the bell" doesn't happen before RX ring entry is really written. The patch has been successfully tested on Octeon II. Fixes: a999933db9ed ("MIPS: remove mips_swiotlb_ops") Cc: stable@vger.kernel.org Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01Merge branch 'fix-indirect-flow_block-infrastructure'David S. Miller16-595/+249
Pablo Neira Ayuso says: ==================== the indirect flow_block infrastructure, revisited This series fixes b5140a36da78 ("netfilter: flowtable: add indr block setup support") that adds support for the indirect block for the flowtable. This patch crashes the kernel with the TC CT action. [ 630.908086] BUG: kernel NULL pointer dereference, address: 00000000000000f0 [ 630.908233] #PF: error_code(0x0000) - not-present page [ 630.908304] PGD 800000104addd067 P4D 800000104addd067 PUD 104311d067 PMD 0 [ 630.908380] Oops: 0000 [#1] SMP PTI [ 630.908615] RIP: 0010:nf_flow_table_indr_block_cb+0xc0/0x190 [nf_flow_table] [ 630.908690] Code: 5b 41 5c 41 5d 41 5e 41 5f 5d c3 4c 89 75 a0 4c 89 65 a8 4d 89 ee 49 89 dd 4c 89 fe 48 c7 c7 b7 64 36 a0 31 c0 e8 ce ed d8 e0 <49> 8b b7 f0 00 00 00 48 c7 c7 c8 64 36 a0 31 c0 e8 b9 ed d8 e0 49[ 630.908790] RSP: 0018:ffffc9000895f8c0 EFLAGS: 00010246 [...] [ 630.910774] Call Trace: [ 630.911192] ? mlx5e_rep_indr_setup_block+0x270/0x270 [mlx5_core] [ 630.911621] ? mlx5e_rep_indr_setup_block+0x270/0x270 [mlx5_core] [ 630.912040] ? mlx5e_rep_indr_setup_block+0x270/0x270 [mlx5_core] [ 630.912443] flow_block_cmd+0x51/0x80 [ 630.912844] __flow_indr_block_cb_register+0x26c/0x510 [ 630.913265] mlx5e_nic_rep_netdevice_event+0x9e/0x110 [mlx5_core] [ 630.913665] notifier_call_chain+0x53/0xa0 [ 630.914063] raw_notifier_call_chain+0x16/0x20 [ 630.914466] call_netdevice_notifiers_info+0x39/0x90 [ 630.914859] register_netdevice+0x484/0x550 [ 630.915256] __ip_tunnel_create+0x12b/0x1f0 [ip_tunnel] [ 630.915661] ip_tunnel_init_net+0x116/0x180 [ip_tunnel] [ 630.916062] ipgre_tap_init_net+0x22/0x30 [ip_gre] [ 630.916458] ops_init+0x44/0x110 [ 630.916851] register_pernet_operations+0x112/0x200 A workaround patch to cure this crash has been proposed. However, there is another problem: The indirect flow_block still does not work for the new TC CT action. The problem is that the existing flow_indr_block_entry callback assumes you can look up for the flowtable from the netdevice to get the flow_block. This flow_block allows you to offload the flows via TC_SETUP_CLSFLOWER. Unfortunately, it is not possible to get the flow_block from the TC CT flowtables because they are _not_ bound to any specific netdevice. = What is the indirect flow_block infrastructure? The indirect flow_block infrastructure allows drivers to offload tc/netfilter rules that belong to software tunnel netdevices, e.g. vxlan. This indirect flow_block infrastructure relates tunnel netdevices with drivers because there is no obvious way to relate these two things from the control plane. = How does the indirect flow_block work before this patchset? Front-ends register the indirect block callback through flow_indr_add_block_cb() if they support for offloading tunnel netdevices. == Setting up an indirect block 1) Drivers track tunnel netdevices via NETDEV_{REGISTER,UNREGISTER} events. If there is a new tunnel netdevice that the driver can offload, then the driver invokes __flow_indr_block_cb_register() with the new tunnel netdevice and the driver callback. The __flow_indr_block_cb_register() call iterates over the list of the front-end callbacks. 2) The front-end callback sets up the flow_block_offload structure and it invokes the driver callback to set up the flow_block. 3) The driver callback now registers the flow_block structure and it returns the flow_block back to the front-end. 4) The front-end gets the flow_block object and it is now ready to offload rules for this tunnel netdevice. A simplified callgraph is represented below. Front-end Driver NETDEV_REGISTER | __flow_indr_block_cb_register(netdev, cb_priv, driver_cb) | [1] .--------------frontend_indr_block_cb(cb_priv, driver_cb) | . setup_flow_block_offload(bo) | [2] driver_cb(bo, cb_priv) -----------. | \/ set up flow_blocks [3] | add rules to flow_block <---------- TC_SETUP_CLSFLOWER [4] == Releasing the indirect flow_block There are two possibilities, either tunnel netdevice is removed or a netdevice (port representor) is removed. === Tunnel netdevice is removed Driver waits for the NETDEV_UNREGISTER event that announces the tunnel netdevice removal. Then, it calls __flow_indr_block_cb_unregister() to remove the flow_block and rules. Callgraph is very similar to the one described above. === Netdevice is removed (port representor) Driver calls __flow_indr_block_cb_unregister() to remove the existing netfilter/tc rule that belong to the tunnel netdevice. = How does the indirect flow_block work after this patchset? Drivers register the indirect flow_block setup callback through flow_indr_dev_register() if they support for offloading tunnel netdevices. == Setting up an indirect flow_block 1) Frontends check if dev->netdev_ops->ndo_setup_tc is unset. If so, frontends call flow_indr_dev_setup_offload(). This call invokes the drivers' indirect flow_block setup callback. 2) The indirect flow_block setup callback sets up a flow_block structure which relates the tunnel netdevice and the driver. 3) The front-end uses flow_block and offload the rules. Note that the operational to set up (non-indirect) flow_block is very similar. == Releasing the indirect flow_block === Tunnel netdevice is removed This calls flow_indr_dev_setup_offload() to set down the flow_block and remove the offloaded rules. This alternate path is exercised if dev->netdev_ops->ndo_setup_tc is unset. === Netdevice is removed (port representor) If a netdevice is removed, then it might need to to clean up the offloaded tc/netfilter rules that belongs to the tunnel netdevice: 1) The driver invokes flow_indr_dev_unregister() when a netdevice is removed. 2) This call iterates over the existing indirect flow_blocks and it invokes the cleanup callback to let the front-end remove the tc/netfilter rules. The cleanup callback already provides the flow_block that the front-end needs to clean up. Front-end Driver | flow_indr_dev_unregister(...) | iterate over list of indirect flow_block and invoke cleanup callback | .----------------------------- | . frontend_flow_block_cleanup(flow_block) . | \/ remove rules to flow_block TC_SETUP_CLSFLOWER = About this patchset This patchset aims to address the existing TC CT problem while simplifying the indirect flow_block infrastructure. Saving 300 LoC in the flow_offload core and the drivers. The operational gets aligned with the (non-indirect) flow_blocks logic. Patchset is composed of: Patch #1 add nf_flow_table_gc_cleanup() which is required by the netfilter's flowtable new indirect flow_block approach. Patch #2 adds the flow_block_indr object which is actually part of of the flow_block object. This stores the indirect flow_block metadata such as the tunnel netdevice owner and the cleanup callback (in case the tunnel netdevice goes away). This patch adds flow_indr_dev_{un}register() to allow drivers to offer netdevice tunnel hardware offload to the front-ends. Then, front-ends call flow_indr_dev_setup_offload() to invoke the drivers to set up the (indirect) flow_block. Patch #3 add the tcf_block_offload_init() helper function, this is a preparation patch to adapt the tc front-end to use this new indirect flow_block infrastructure. Patch #4 updates the tc and netfilter front-ends to use the new indirect flow_block infrastructure. Patch #5 updates the mlx5 driver to use the new indirect flow_block infrastructure. Patch #6 updates the nfp driver to use the new indirect flow_block infrastructure. Patch #7 updates the bnxt driver to use the new indirect flow_block infrastructure. Patch #8 removes the indirect flow_block infrastructure version 1, now that frontends and drivers have been translated to version 2 (coming in this patchset). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>