summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2017-09-26mlxsw: spectrum: Add support for setting counters on nexthopsArkadi Sharshevsky2-3/+51
Add support for setting counters on nexthops based on dpipe's adjacency table counter status. This patch also adds the ability for getting the counter value, which will be used by the dpipe adjacency table dump implementation in the next patches. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26mlxsw: reg: Add support for counters on RATRArkadi Sharshevsky1-9/+35
In order to add the ability for setting counters on nexthops the RATR register should be extended. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26mlxsw: spectrum_dpipe: Add initial support for the router adjacency tableArkadi Sharshevsky2-1/+100
Add initial support for router adjacency table. The table does lookup based on the nexthop-group index and the local nexthop offset. After locating the nexthop entry it sets the destination MAC address and the egress RIF. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26mlxsw: spectrum_router: Add helpers for nexthop accessArkadi Sharshevsky2-0/+83
This is done as a preparation before introducing the ability to dump the adjacency table via dpipe, and to count the table size. The current table implementation avoids tunnel entries, thus a helper for checking if the nexthop group contains tunnel entries is also provided. The mlxsw's nexthop representative struct stays private to the router module. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26mlxsw: spectrum_router: Use helper to check for last neighborArkadi Sharshevsky1-1/+1
Use list_is_last helper to check for last neighbor. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26mlxsw: spectrum_router: Keep nexthops in a linked listArkadi Sharshevsky1-0/+9
Keep nexthops in a linked list for easy access. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26mlxsw: Add fields for mlxsw's meta header for adjacency tableArkadi Sharshevsky1-0/+12
This patch adds field for mlxsw's meta header which will be used to describe the match/action behavior of the adjacency table. The fields are: 1. Adj_index - The global index of the nexthop group in the adjacency table. 2. Adj_hash_index - Local index offset which is based on packets hash mod the nexthop group size. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26mlxsw: spectrum_dpipe: Fix indentation in header descriptionArkadi Sharshevsky1-10/+13
Fix indentation in mlxsw_meta header's description. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26Merge branch 'bpf-metadata-direct-access'David S. Miller29-104/+759
Daniel Borkmann says: ==================== BPF metadata for direct access This work enables generic transfer of metadata from XDP into skb, meaning the packet has a flexible and programmable room for meta data, which can later be used by BPF to set various skb members when passing up the stack. For details, please see second patch. Support has been implemented and tested with two drivers, and should be straight forward to add to other drivers as well which properly support head adjustment already. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26bpf, ixgbe: add meta data supportDaniel Borkmann1-4/+26
Implement support for transferring XDP meta data into skb for ixgbe driver; before calling into the program, xdp.data_meta points to xdp.data, where on program return with pass verdict, we call into skb_metadata_set(). We implement this for the default ixgbe_build_skb() variant. For the ixgbe_construct_skb() that is used when legacy-rx buffer mananagement mode is turned on via ethtool, I found that XDP gets 0 headroom, so neither xdp_adjust_head() nor xdp_adjust_meta() can be used with this. Just add a comment with explanation for this operating mode. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26bpf, nfp: add meta data supportDaniel Borkmann1-25/+15
Implement support for transferring XDP meta data into skb for nfp driver; before calling into the program, xdp.data_meta points to xdp.data, where on program return with pass verdict, we call into skb_metadata_set(). Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26bpf: improve selftests and add tests for meta pointerDaniel Borkmann5-4/+370
Add various test_verifier selftests, and a simple xdp/tc functional test that is being attached to veths. Also let new versions of clang use the recently added -mcpu=probe support [1] for the BPF target, so that it can probe the underlying kernel for BPF insn set extensions. We could also just set this options always, where older versions just ignore it and give a note to the user that the -mcpu value is not supported, but given emitting the note cannot be turned off from clang side lets not confuse users running selftests with it, thus fallback to the default generic one when we see that clang doesn't support it. Also allow CPU option to be overridden in the Makefile from command line. [1] https://github.com/llvm-mirror/llvm/commit/d7276a40d87b89aed89978dec6457a5b8b3a0db5 Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26bpf: update bpf.h uapi header for toolsDaniel Borkmann1-13/+32
Looks like a couple of updates missed to get carried into tools/include/uapi/, so copy the bpf.h header as usual to pull in latest updates. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26bpf: add meta pointer for direct accessDaniel Borkmann19-42/+297
This work enables generic transfer of metadata from XDP into skb. The basic idea is that we can make use of the fact that the resulting skb must be linear and already comes with a larger headroom for supporting bpf_xdp_adjust_head(), which mangles xdp->data. Here, we base our work on a similar principle and introduce a small helper bpf_xdp_adjust_meta() for adjusting a new pointer called xdp->data_meta. Thus, the packet has a flexible and programmable room for meta data, followed by the actual packet data. struct xdp_buff is therefore laid out that we first point to data_hard_start, then data_meta directly prepended to data followed by data_end marking the end of packet. bpf_xdp_adjust_head() takes into account whether we have meta data already prepended and if so, memmove()s this along with the given offset provided there's enough room. xdp->data_meta is optional and programs are not required to use it. The rationale is that when we process the packet in XDP (e.g. as DoS filter), we can push further meta data along with it for the XDP_PASS case, and give the guarantee that a clsact ingress BPF program on the same device can pick this up for further post-processing. Since we work with skb there, we can also set skb->mark, skb->priority or other skb meta data out of BPF, thus having this scratch space generic and programmable allows for more flexibility than defining a direct 1:1 transfer of potentially new XDP members into skb (it's also more efficient as we don't need to initialize/handle each of such new members). The facility also works together with GRO aggregation. The scratch space at the head of the packet can be multiple of 4 byte up to 32 byte large. Drivers not yet supporting xdp->data_meta can simply be set up with xdp->data_meta as xdp->data + 1 as bpf_xdp_adjust_meta() will detect this and bail out, such that the subsequent match against xdp->data for later access is guaranteed to fail. The verifier treats xdp->data_meta/xdp->data the same way as we treat xdp->data/xdp->data_end pointer comparisons. The requirement for doing the compare against xdp->data is that it hasn't been modified from it's original address we got from ctx access. It may have a range marking already from prior successful xdp->data/xdp->data_end pointer comparisons though. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26bpf: rename bpf_compute_data_end into bpf_compute_data_pointersDaniel Borkmann7-18/+21
Just do the rename into bpf_compute_data_pointers() as we'll add one more pointer here to recompute. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26net: bcm63xx_enet: Use setup_timer and mod_timerHimanshu Jha1-5/+2
Use setup_timer and mod_timer API instead of structure assignments. This is done using Coccinelle and semantic patch used for this as follows: @@ expression x,y,z,a,b; @@ -init_timer (&x); +setup_timer (&x, y, z); +mod_timer (&a, b); -x.function = y; -x.data = z; -x.expires = b; -add_timer(&a); Signed-off-by: Himanshu Jha <himanshujha199640@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26Merge branch 'qed-iWARP-fixes-and-enhancements'David S. Miller8-14/+90
Michal Kalderon says: ==================== qed: iWARP fixes and enhancements This patch series includes several fixes and enhancements related to iWARP. Patch #1 is actually the last of the initial iWARP submission. It has been delayed until now as I wanted to make sure that qedr supports iWARP prior to enabling iWARP device detection. iWARP changes in RDMA tree have been accepted and targeted at kernel 4.15, therefore, all iWARP fixes for this cycle are submitted to net-next. Changes from v1->v2 - Added "Fixes:" tag to commit message of patch #3 ==================== Signed-off by: Michal.Kalderon@cavium.com Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26qed: iWARP - Add check for errors on a SYN packetMichal Kalderon3-0/+10
A SYN packet which arrives with errors from FW should be dropped. This required adding an additional field to the ll2 rx completion data. Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26qed: Fix maximum number of CQs for iWARPMichal Kalderon1-6/+6
The maximum number of CQs supported is bound to the number of connections supported, which differs between RoCE and iWARP. This fixes a crash that occurred in iWARP when running 1000 sessions using perftest. Fixes: 67b40dccc45 ("qed: Implement iWARP initialization, teardown and qp operations") Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26qed: Add iWARP out of order supportMichal Kalderon3-3/+59
iWARP requires OOO support which is already provided by the ll2 interface (until now was used only for iSCSI offload). The changes mostly include opening a ll2 dedicated connection for OOO and notifiying the FW about the handle id. Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26qed: Add iWARP enablement supportMichal Kalderon4-5/+15
This patch is the last of the initial iWARP patch series. It adds the possiblity to actually detect iWARP from the device and enable it in the critical locations which basically make iWARP available. It wasn't submitted until now as iWARP hadn't been accepted into the rdma tree. Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26ldmvsw: Remove redundant unlikely()Tobias Klauser1-1/+1
IS_ERR() already implies unlikely(), so it can be omitted. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26net/mlx5: Remove redundant unlikely()Tobias Klauser1-1/+1
IS_ERR() already implies unlikely(), so it can be omitted. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26bnxt_en: Remove redundant unlikely()Tobias Klauser1-1/+1
IS_ERR() already implies unlikely(), so it can be omitted. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26kcm: Remove redundant unlikely()Tobias Klauser1-1/+1
IS_ERR() already implies unlikely(), so it can be omitted. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26ipv6: Remove redundant unlikely()Tobias Klauser1-1/+1
IS_ERR() already implies unlikely(), so it can be omitted. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26datagram: Remove redundant unlikely()Tobias Klauser1-1/+1
IS_ERR() already implies unlikely(), so it can be omitted. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26net: ena: Remove redundant unlikely()Tobias Klauser1-2/+2
IS_ERR() already implies unlikely(), so it can be omitted. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-25neigh: make strucrt neigh_table::entry_size unsigned intAlexey Dobriyan3-12/+12
Key length can't be negative. Leave comparisons against nla_len() signed just in case truncated attribute can sneak in there. Space savings: add/remove: 0/0 grow/shrink: 0/7 up/down: 0/-7 (-7) function old new delta pneigh_delete 273 272 -1 mlx5e_rep_netevent_event 1415 1414 -1 mlx5e_create_encap_header_ipv6 1194 1193 -1 mlx5e_create_encap_header_ipv4 1071 1070 -1 cxgb4_l2t_get 1104 1103 -1 __pneigh_lookup 69 68 -1 __neigh_create 2452 2451 -1 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-25neigh: make struct neigh_table::entry_size unsigned intAlexey Dobriyan1-1/+1
Neigh entry size can't be negative. Space savings: add/remove: 0/0 grow/shrink: 0/5 up/down: 0/-7 (-7) function old new delta lowpan_neigh_construct 25 24 -1 clip_seq_sub_iter 152 151 -1 clip_ioctl 1475 1474 -1 clip_constructor 93 92 -1 __neigh_create 2455 2452 -3 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-25net: speed up skb_rbtree_purge()Eric Dumazet1-4/+7
As measured in my prior patch ("sch_netem: faster rb tree removal"), rbtree_postorder_for_each_entry_safe() is nice looking but much slower than using rb_next() directly, except when tree is small enough to fit in CPU caches (then the cost is the same) Also note that there is not even an increase of text size : $ size net/core/skbuff.o.before net/core/skbuff.o text data bss dec hex filename 40711 1298 0 42009 a419 net/core/skbuff.o.before 40711 1298 0 42009 a419 net/core/skbuff.o From: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-25sch_netem: faster rb tree removalEric Dumazet1-3/+4
While running TCP tests involving netem storing millions of packets, I had the idea to speed up tfifo_reset() and did experiments. I tried the rbtree_postorder_for_each_entry_safe() method that is used in skb_rbtree_purge() but discovered it was slower than the current tfifo_reset() method. I measured time taken to release skbs with three occupation levels : 10^4, 10^5 and 10^6 skbs with three methods : 1) (current 'naive' method) while ((p = rb_first(&q->t_root))) { struct sk_buff *skb = netem_rb_to_skb(p); rb_erase(p, &q->t_root); rtnl_kfree_skbs(skb, skb); } 2) Use rb_next() instead of rb_first() in the loop : p = rb_first(&q->t_root); while (p) { struct sk_buff *skb = netem_rb_to_skb(p); p = rb_next(p); rb_erase(&skb->rbnode, &q->t_root); rtnl_kfree_skbs(skb, skb); } 3) "optimized" method using rbtree_postorder_for_each_entry_safe() struct sk_buff *skb, *next; rbtree_postorder_for_each_entry_safe(skb, next, &q->t_root, rbnode) { rtnl_kfree_skbs(skb, skb); } q->t_root = RB_ROOT; Results : method_1:while (rb_first()) rb_erase() 10000 skbs in 690378 ns (69 ns per skb) method_2:rb_first; while (p) { p = rb_next(p); ...} 10000 skbs in 541846 ns (54 ns per skb) method_3:rbtree_postorder_for_each_entry_safe() 10000 skbs in 868307 ns (86 ns per skb) method_1:while (rb_first()) rb_erase() 99996 skbs in 7804021 ns (78 ns per skb) method_2:rb_first; while (p) { p = rb_next(p); ...} 100000 skbs in 5942456 ns (59 ns per skb) method_3:rbtree_postorder_for_each_entry_safe() 100000 skbs in 11584940 ns (115 ns per skb) method_1:while (rb_first()) rb_erase() 1000000 skbs in 108577838 ns (108 ns per skb) method_2:rb_first; while (p) { p = rb_next(p); ...} 1000000 skbs in 82619635 ns (82 ns per skb) method_3:rbtree_postorder_for_each_entry_safe() 1000000 skbs in 127328743 ns (127 ns per skb) Method 2) is simply faster, probably because it maintains a smaller working size set. Note that this is the method we use in tcp_ofo_queue() already. I will also change skb_rbtree_purge() in a second patch. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-25tun: delete original tun_get() and rename __tun_get() to tun_get()yuan linyu1-15/+11
it seems no need to keep tun_get() and __tun_get() at same time. Signed-off-by: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-25cxgb4: do DCB state reset in couple of placesGanesh Goudar3-6/+20
reset the driver's DCB state in couple of places where it was missing. Signed-off-by: Casey Leedom <leedom@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-25Merge branch 'liquidio-fw-loading'David S. Miller4-28/+84
Rick Farrington says: ==================== liquidio: firmware loading 1. Allow host driver parameter to override auto-loaded firmware (in flash). 2. Verify version of firmware that is auto-loaded from flash. 3. Change value of fw_type module parameter to reflect default firmware image name that is loaded by host driver (in /sys/module/liquidio/...) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-25liquidio: update module parameter fw_type to reflect firmware type loadedRick Farrington1-2/+4
Signed-off-by: Rick Farrington <ricardo.farrington@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-25liquidio: verify firmware version when auto-loaded from flash.Rick Farrington1-1/+17
Signed-off-by: Rick Farrington <ricardo.farrington@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-25liquidio: allow override of firmware present in flashRick Farrington4-26/+64
Signed-off-by: Rick Farrington <ricardo.farrington@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-25Merge branch 'dsa-port-enabling'David S. Miller3-17/+38
Vivien Didelot says: ==================== net: dsa: port enabling This patchset makes slave open and close symmetrical and provides helpers for enabling or disabling a given DSA port. Changes in v3: - save the phy_device change for a future patchset Changes in v2: - do not remove the phy argument from port enable/disable ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-25net: dsa: add port enable and disable helpersVivien Didelot3-16/+37
Provide dsa_port_enable and dsa_port_disable helpers to respectively enable and disable a switch port. This makes the dsa_port_set_state_now helper static. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-25net: dsa: make slave close symmetrical to openVivien Didelot1-5/+5
The DSA slave open function configures the unicast MAC addresses on the master device, enable the switch port, change its STP state, then start the PHY device. Make the close function symmetric, by first stopping the PHY device, then changing the STP state, disabling the switch port and restore the master device. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-25hv_netvsc: Fix the real number of queues of non-vRSS casesHaiyang Zhang1-0/+6
For older hosts without multi-channel (vRSS) support, and some error cases, we still need to set the real number of queues to one. This patch adds this missing setting. Fixes: 8195b1396ec8 ("hv_netvsc: fix deadlock on hotplug") Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-25Merge branch 'tun-NAPI-and-gro'David S. Miller2-18/+245
Petar Penkov says: ==================== net: Improve code coverage of syzkaller This patch series is intended to improve code coverage of syzkaller on the early receive path, specifically including flow dissector, GRO, and GRO with frags parts of the networking stack. Syzkaller exercises the stack through the TUN driver and this is therefore where changes reside. Current coverage through netif_receive_skb() is limited as it does not touch on any of the aforementioned code paths. Furthermore, for full coverage, it is necessary to have more flexibility over the linear and non-linear data of the skbs. The following patches address this by providing the user(syzkaller) with the ability to send via napi_gro_receive() and napi_gro_frags(). Additionally, syzkaller can specify how many fragments there are and how much data per fragment there is. This is done by exploiting the convenient structure of iovecs. Finally, this patch series adds support for exercising the flow dissector during fuzzing. The code path including napi_gro_receive() can be enabled via the IFF_NAPI flag. The remainder of the changes in this patch series give the user significantly more control over packets entering the kernel. To avoid potential security vulnerabilities, hide the ability to send custom skbs and the flow dissector code paths behind a capable(CAP_NET_ADMIN) check to require special user privileges. Changes since v2 based on feedback from Willem de Bruijn and Mahesh Bandewar: Patch 1/ No changes. Patch 2/ Check if the preconditions for IFF_NAPI_FRAGS (IFF_NAPI and IFF_TAP) are met before opening/attaching rather than after. If they are not, change the behavior from discarding the flag to rejecting the command with EINVAL. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-25tun: enable napi_gro_frags() for TUN/TAP driverPetar Penkov2-6/+129
Add a TUN/TAP receive mode that exercises the napi_gro_frags() interface. This mode is available only in TAP mode, as the interface expects packets with Ethernet headers. Furthermore, packets follow the layout of the iovec_iter that was received. The first iovec is the linear data, and every one after the first is a fragment. If there are more fragments than the max number, drop the packet. Additionally, invoke eth_get_headlen() to exercise flow dissector code and to verify that the header resides in the linear data. The napi_gro_frags() mode requires setting the IFF_NAPI_FRAGS option. This is imposed because this mode is intended for testing via tools like syzkaller and packetdrill, and the increased flexibility it provides can introduce security vulnerabilities. This flag is accepted only if the device is in TAP mode and has the IFF_NAPI flag set as well. This is done because both of these are explicit requirements for correct operation in this mode. Signed-off-by: Petar Penkov <peterpenkov96@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Mahesh Bandewar <maheshb@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: davem@davemloft.net Cc: ppenkov@stanford.edu Acked-by: Mahesh Bandewar <maheshb@google,com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-25tun: enable NAPI for TUN/TAP driverPetar Penkov2-15/+119
Changes TUN driver to use napi_gro_receive() upon receiving packets rather than netif_rx_ni(). Adds flag IFF_NAPI that enables these changes and operation is not affected if the flag is disabled. SKBs are constructed upon packet arrival and are queued to be processed later. The new path was evaluated with a benchmark with the following setup: Open two tap devices and a receiver thread that reads in a loop for each device. Start one sender thread and pin all threads to different CPUs. Send 1M minimum UDP packets to each device and measure sending time for each of the sending methods: napi_gro_receive(): 4.90s netif_rx_ni(): 4.90s netif_receive_skb(): 7.20s Signed-off-by: Petar Penkov <peterpenkov96@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Mahesh Bandewar <maheshb@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: davem@davemloft.net Cc: ppenkov@stanford.edu Acked-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-25net: remove MTU limits for dummy and ifb deviceZhang Shengju2-1/+4
These two drivers (dummy and ifb) call ether_setup(), after commit 61e84623ace3 ("net: centralize net_device min/max MTU checking"), the range of mtu is [min_mtu, max_mtu], which is [68, 1500] by default. These two devices should not have limits on MTU. This patch set their min_mtu/max_mtu to 0. So that dev_set_mtu() will not check the mtu range, and can be set with any value. CC: Eric Dumazet <edumazet@google.com> CC: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-25hv_netvsc: make const array ver_list static, reduces object code sizeColin Ian King1-1/+1
Don't populate const array ver_list on the stack, instead make it static. Makes the object code smaller by over 400 bytes: Before: text data bss dec hex filename 18444 3168 320 21932 55ac drivers/net/hyperv/netvsc.o After: text data bss dec hex filename 17950 3224 320 21494 53f6 drivers/net/hyperv/netvsc.o (gcc 6.3.0, x86-64) Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-25bpf: Optimize lpm trie deleteCraig Gallek1-28/+43
Before the delete operator was added, this datastructure maintained an invariant that intermediate nodes were only present when necessary to build the tree. This patch updates the delete operation to reinstate that invariant by removing unnecessary intermediate nodes after a node is removed and thus keeping the tree structure at a minimal size. Suggested-by: Daniel Mack <daniel@zonque.org> Signed-off-by: Craig Gallek <kraig@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-25net: nfc: llcp_core: use setup_timer() helper.Allen Pais1-6/+4
Use setup_timer function instead of initializing timer with the function and data fields. Signed-off-by: Allen Pais <allen.lkml@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-25net: nfc: hci: llc_shdlc: use setup_timer() helper.Allen Pais1-9/+6
Use setup_timer function instead of initializing timer with the function and data fields. Signed-off-by: Allen Pais <allen.lkml@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>