diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2022-12-13 15:47:48 -0800 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2022-12-13 15:47:48 -0800 |
commit | 7e68dd7d07a28faa2e6574dd6b9dbd90cdeaae91 (patch) | |
tree | ae0427c5a3b905f24b3a44b510a9bcf35d9b67a3 /net/dsa/tag.h | |
parent | 1ca06f1c1acecbe02124f14a37cce347b8c1a90c (diff) | |
parent | 7c4a6309e27f411743817fe74a832ec2d2798a4b (diff) |
Merge tag 'net-next-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni:
"Core:
- Allow live renaming when an interface is up
- Add retpoline wrappers for tc, improving considerably the
performances of complex queue discipline configurations
- Add inet drop monitor support
- A few GRO performance improvements
- Add infrastructure for atomic dev stats, addressing long standing
data races
- De-duplicate common code between OVS and conntrack offloading
infrastructure
- A bunch of UBSAN_BOUNDS/FORTIFY_SOURCE improvements
- Netfilter: introduce packet parser for tunneled packets
- Replace IPVS timer-based estimators with kthreads to scale up the
workload with the number of available CPUs
- Add the helper support for connection-tracking OVS offload
BPF:
- Support for user defined BPF objects: the use case is to allocate
own objects, build own object hierarchies and use the building
blocks to build own data structures flexibly, for example, linked
lists in BPF
- Make cgroup local storage available to non-cgroup attached BPF
programs
- Avoid unnecessary deadlock detection and failures wrt BPF task
storage helpers
- A relevant bunch of BPF verifier fixes and improvements
- Veristat tool improvements to support custom filtering, sorting,
and replay of results
- Add LLVM disassembler as default library for dumping JITed code
- Lots of new BPF documentation for various BPF maps
- Add bpf_rcu_read_{,un}lock() support for sleepable programs
- Add RCU grace period chaining to BPF to wait for the completion of
access from both sleepable and non-sleepable BPF programs
- Add support storing struct task_struct objects as kptrs in maps
- Improve helper UAPI by explicitly defining BPF_FUNC_xxx integer
values
- Add libbpf *_opts API-variants for bpf_*_get_fd_by_id() functions
Protocols:
- TCP: implement Protective Load Balancing across switch links
- TCP: allow dynamically disabling TCP-MD5 static key, reverting back
to fast[er]-path
- UDP: Introduce optional per-netns hash lookup table
- IPv6: simplify and cleanup sockets disposal
- Netlink: support different type policies for each generic netlink
operation
- MPTCP: add MSG_FASTOPEN and FastOpen listener side support
- MPTCP: add netlink notification support for listener sockets events
- SCTP: add VRF support, allowing sctp sockets binding to VRF devices
- Add bridging MAC Authentication Bypass (MAB) support
- Extensions for Ethernet VPN bridging implementation to better
support multicast scenarios
- More work for Wi-Fi 7 support, comprising conversion of all the
existing drivers to internal TX queue usage
- IPSec: introduce a new offload type (packet offload) allowing
complete header processing and crypto offloading
- IPSec: extended ack support for more descriptive XFRM error
reporting
- RXRPC: increase SACK table size and move processing into a
per-local endpoint kernel thread, reducing considerably the
required locking
- IEEE 802154: synchronous send frame and extended filtering support,
initial support for scanning available 15.4 networks
- Tun: bump the link speed from 10Mbps to 10Gbps
- Tun/VirtioNet: implement UDP segmentation offload support
Driver API:
- PHY/SFP: improve power level switching between standard level 1 and
the higher power levels
- New API for netdev <-> devlink_port linkage
- PTP: convert existing drivers to new frequency adjustment
implementation
- DSA: add support for rx offloading
- Autoload DSA tagging driver when dynamically changing protocol
- Add new PCP and APPTRUST attributes to Data Center Bridging
- Add configuration support for 800Gbps link speed
- Add devlink port function attribute to enable/disable RoCE and
migratable
- Extend devlink-rate to support strict prioriry and weighted fair
queuing
- Add devlink support to directly reading from region memory
- New device tree helper to fetch MAC address from nvmem
- New big TCP helper to simplify temporary header stripping
New hardware / drivers:
- Ethernet:
- Marvel Octeon CNF95N and CN10KB Ethernet Switches
- Marvel Prestera AC5X Ethernet Switch
- WangXun 10 Gigabit NIC
- Motorcomm yt8521 Gigabit Ethernet
- Microchip ksz9563 Gigabit Ethernet Switch
- Microsoft Azure Network Adapter
- Linux Automation 10Base-T1L adapter
- PHY:
- Aquantia AQR112 and AQR412
- Motorcomm YT8531S
- PTP:
- Orolia ART-CARD
- WiFi:
- MediaTek Wi-Fi 7 (802.11be) devices
- RealTek rtw8821cu, rtw8822bu, rtw8822cu and rtw8723du USB
devices
- Bluetooth:
- Broadcom BCM4377/4378/4387 Bluetooth chipsets
- Realtek RTL8852BE and RTL8723DS
- Cypress.CYW4373A0 WiFi + Bluetooth combo device
Drivers:
- CAN:
- gs_usb: bus error reporting support
- kvaser_usb: listen only and bus error reporting support
- Ethernet NICs:
- Intel (100G):
- extend action skbedit to RX queue mapping
- implement devlink-rate support
- support direct read from memory
- nVidia/Mellanox (mlx5):
- SW steering improvements, increasing rules update rate
- Support for enhanced events compression
- extend H/W offload packet manipulation capabilities
- implement IPSec packet offload mode
- nVidia/Mellanox (mlx4):
- better big TCP support
- Netronome Ethernet NICs (nfp):
- IPsec offload support
- add support for multicast filter
- Broadcom:
- RSS and PTP support improvements
- AMD/SolarFlare:
- netlink extened ack improvements
- add basic flower matches to offload, and related stats
- Virtual NICs:
- ibmvnic: introduce affinity hint support
- small / embedded:
- FreeScale fec: add initial XDP support
- Marvel mv643xx_eth: support MII/GMII/RGMII modes for Kirkwood
- TI am65-cpsw: add suspend/resume support
- Mediatek MT7986: add RX wireless wthernet dispatch support
- Realtek 8169: enable GRO software interrupt coalescing per
default
- Ethernet high-speed switches:
- Microchip (sparx5):
- add support for Sparx5 TC/flower H/W offload via VCAP
- Mellanox mlxsw:
- add 802.1X and MAC Authentication Bypass offload support
- add ip6gre support
- Embedded Ethernet switches:
- Mediatek (mtk_eth_soc):
- improve PCS implementation, add DSA untag support
- enable flow offload support
- Renesas:
- add rswitch R-Car Gen4 gPTP support
- Microchip (lan966x):
- add full XDP support
- add TC H/W offload via VCAP
- enable PTP on bridge interfaces
- Microchip (ksz8):
- add MTU support for KSZ8 series
- Qualcomm 802.11ax WiFi (ath11k):
- support configuring channel dwell time during scan
- MediaTek WiFi (mt76):
- enable Wireless Ethernet Dispatch (WED) offload support
- add ack signal support
- enable coredump support
- remain_on_channel support
- Intel WiFi (iwlwifi):
- enable Wi-Fi 7 Extremely High Throughput (EHT) PHY capabilities
- 320 MHz channels support
- RealTek WiFi (rtw89):
- new dynamic header firmware format support
- wake-over-WLAN support"
* tag 'net-next-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2002 commits)
ipvs: fix type warning in do_div() on 32 bit
net: lan966x: Remove a useless test in lan966x_ptp_add_trap()
net: ipa: add IPA v4.7 support
dt-bindings: net: qcom,ipa: Add SM6350 compatible
bnxt: Use generic HBH removal helper in tx path
IPv6/GRO: generic helper to remove temporary HBH/jumbo header in driver
selftests: forwarding: Add bridge MDB test
selftests: forwarding: Rename bridge_mdb test
bridge: mcast: Support replacement of MDB port group entries
bridge: mcast: Allow user space to specify MDB entry routing protocol
bridge: mcast: Allow user space to add (*, G) with a source list and filter mode
bridge: mcast: Add support for (*, G) with a source list and filter mode
bridge: mcast: Avoid arming group timer when (S, G) corresponds to a source
bridge: mcast: Add a flag for user installed source entries
bridge: mcast: Expose __br_multicast_del_group_src()
bridge: mcast: Expose br_multicast_new_group_src()
bridge: mcast: Add a centralized error path
bridge: mcast: Place netlink policy before validation functions
bridge: mcast: Split (*, G) and (S, G) addition into different functions
bridge: mcast: Do not derive entry type from its filter mode
...
Diffstat (limited to 'net/dsa/tag.h')
-rw-r--r-- | net/dsa/tag.h | 310 |
1 files changed, 310 insertions, 0 deletions
diff --git a/net/dsa/tag.h b/net/dsa/tag.h new file mode 100644 index 000000000000..7cfbca824f1c --- /dev/null +++ b/net/dsa/tag.h @@ -0,0 +1,310 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ + +#ifndef __DSA_TAG_H +#define __DSA_TAG_H + +#include <linux/if_vlan.h> +#include <linux/list.h> +#include <linux/types.h> +#include <net/dsa.h> + +#include "port.h" +#include "slave.h" + +struct dsa_tag_driver { + const struct dsa_device_ops *ops; + struct list_head list; + struct module *owner; +}; + +extern struct packet_type dsa_pack_type; + +const struct dsa_device_ops *dsa_tag_driver_get_by_id(int tag_protocol); +const struct dsa_device_ops *dsa_tag_driver_get_by_name(const char *name); +void dsa_tag_driver_put(const struct dsa_device_ops *ops); +const char *dsa_tag_protocol_to_str(const struct dsa_device_ops *ops); + +static inline int dsa_tag_protocol_overhead(const struct dsa_device_ops *ops) +{ + return ops->needed_headroom + ops->needed_tailroom; +} + +static inline struct net_device *dsa_master_find_slave(struct net_device *dev, + int device, int port) +{ + struct dsa_port *cpu_dp = dev->dsa_ptr; + struct dsa_switch_tree *dst = cpu_dp->dst; + struct dsa_port *dp; + + list_for_each_entry(dp, &dst->ports, list) + if (dp->ds->index == device && dp->index == port && + dp->type == DSA_PORT_TYPE_USER) + return dp->slave; + + return NULL; +} + +/* If under a bridge with vlan_filtering=0, make sure to send pvid-tagged + * frames as untagged, since the bridge will not untag them. + */ +static inline struct sk_buff *dsa_untag_bridge_pvid(struct sk_buff *skb) +{ + struct dsa_port *dp = dsa_slave_to_port(skb->dev); + struct net_device *br = dsa_port_bridge_dev_get(dp); + struct net_device *dev = skb->dev; + struct net_device *upper_dev; + u16 vid, pvid, proto; + int err; + + if (!br || br_vlan_enabled(br)) + return skb; + + err = br_vlan_get_proto(br, &proto); + if (err) + return skb; + + /* Move VLAN tag from data to hwaccel */ + if (!skb_vlan_tag_present(skb) && skb->protocol == htons(proto)) { + skb = skb_vlan_untag(skb); + if (!skb) + return NULL; + } + + if (!skb_vlan_tag_present(skb)) + return skb; + + vid = skb_vlan_tag_get_id(skb); + + /* We already run under an RCU read-side critical section since + * we are called from netif_receive_skb_list_internal(). + */ + err = br_vlan_get_pvid_rcu(dev, &pvid); + if (err) + return skb; + + if (vid != pvid) + return skb; + + /* The sad part about attempting to untag from DSA is that we + * don't know, unless we check, if the skb will end up in + * the bridge's data path - br_allowed_ingress() - or not. + * For example, there might be an 8021q upper for the + * default_pvid of the bridge, which will steal VLAN-tagged traffic + * from the bridge's data path. This is a configuration that DSA + * supports because vlan_filtering is 0. In that case, we should + * definitely keep the tag, to make sure it keeps working. + */ + upper_dev = __vlan_find_dev_deep_rcu(br, htons(proto), vid); + if (upper_dev) + return skb; + + __vlan_hwaccel_clear_tag(skb); + + return skb; +} + +/* For switches without hardware support for DSA tagging to be able + * to support termination through the bridge. + */ +static inline struct net_device * +dsa_find_designated_bridge_port_by_vid(struct net_device *master, u16 vid) +{ + struct dsa_port *cpu_dp = master->dsa_ptr; + struct dsa_switch_tree *dst = cpu_dp->dst; + struct bridge_vlan_info vinfo; + struct net_device *slave; + struct dsa_port *dp; + int err; + + list_for_each_entry(dp, &dst->ports, list) { + if (dp->type != DSA_PORT_TYPE_USER) + continue; + + if (!dp->bridge) + continue; + + if (dp->stp_state != BR_STATE_LEARNING && + dp->stp_state != BR_STATE_FORWARDING) + continue; + + /* Since the bridge might learn this packet, keep the CPU port + * affinity with the port that will be used for the reply on + * xmit. + */ + if (dp->cpu_dp != cpu_dp) + continue; + + slave = dp->slave; + + err = br_vlan_get_info_rcu(slave, vid, &vinfo); + if (err) + continue; + + return slave; + } + + return NULL; +} + +/* If the ingress port offloads the bridge, we mark the frame as autonomously + * forwarded by hardware, so the software bridge doesn't forward in twice, back + * to us, because we already did. However, if we're in fallback mode and we do + * software bridging, we are not offloading it, therefore the dp->bridge + * pointer is not populated, and flooding needs to be done by software (we are + * effectively operating in standalone ports mode). + */ +static inline void dsa_default_offload_fwd_mark(struct sk_buff *skb) +{ + struct dsa_port *dp = dsa_slave_to_port(skb->dev); + + skb->offload_fwd_mark = !!(dp->bridge); +} + +/* Helper for removing DSA header tags from packets in the RX path. + * Must not be called before skb_pull(len). + * skb->data + * | + * v + * | | | | | | | | | | | | | | | | | | | + * +-----------------------+-----------------------+---------------+-------+ + * | Destination MAC | Source MAC | DSA header | EType | + * +-----------------------+-----------------------+---------------+-------+ + * | | + * <----- len -----> <----- len -----> + * | + * >>>>>>> v + * >>>>>>> | | | | | | | | | | | | | | | + * >>>>>>> +-----------------------+-----------------------+-------+ + * >>>>>>> | Destination MAC | Source MAC | EType | + * +-----------------------+-----------------------+-------+ + * ^ + * | + * skb->data + */ +static inline void dsa_strip_etype_header(struct sk_buff *skb, int len) +{ + memmove(skb->data - ETH_HLEN, skb->data - ETH_HLEN - len, 2 * ETH_ALEN); +} + +/* Helper for creating space for DSA header tags in TX path packets. + * Must not be called before skb_push(len). + * + * Before: + * + * <<<<<<< | | | | | | | | | | | | | | | + * ^ <<<<<<< +-----------------------+-----------------------+-------+ + * | <<<<<<< | Destination MAC | Source MAC | EType | + * | +-----------------------+-----------------------+-------+ + * <----- len -----> + * | + * | + * skb->data + * + * After: + * + * | | | | | | | | | | | | | | | | | | | + * +-----------------------+-----------------------+---------------+-------+ + * | Destination MAC | Source MAC | DSA header | EType | + * +-----------------------+-----------------------+---------------+-------+ + * ^ | | + * | <----- len -----> + * skb->data + */ +static inline void dsa_alloc_etype_header(struct sk_buff *skb, int len) +{ + memmove(skb->data, skb->data + len, 2 * ETH_ALEN); +} + +/* On RX, eth_type_trans() on the DSA master pulls ETH_HLEN bytes starting from + * skb_mac_header(skb), which leaves skb->data pointing at the first byte after + * what the DSA master perceives as the EtherType (the beginning of the L3 + * protocol). Since DSA EtherType header taggers treat the EtherType as part of + * the DSA tag itself, and the EtherType is 2 bytes in length, the DSA header + * is located 2 bytes behind skb->data. Note that EtherType in this context + * means the first 2 bytes of the DSA header, not the encapsulated EtherType + * that will become visible after the DSA header is stripped. + */ +static inline void *dsa_etype_header_pos_rx(struct sk_buff *skb) +{ + return skb->data - 2; +} + +/* On TX, skb->data points to skb_mac_header(skb), which means that EtherType + * header taggers start exactly where the EtherType is (the EtherType is + * treated as part of the DSA header). + */ +static inline void *dsa_etype_header_pos_tx(struct sk_buff *skb) +{ + return skb->data + 2 * ETH_ALEN; +} + +/* Create 2 modaliases per tagging protocol, one to auto-load the module + * given the ID reported by get_tag_protocol(), and the other by name. + */ +#define DSA_TAG_DRIVER_ALIAS "dsa_tag:" +#define MODULE_ALIAS_DSA_TAG_DRIVER(__proto, __name) \ + MODULE_ALIAS(DSA_TAG_DRIVER_ALIAS __name); \ + MODULE_ALIAS(DSA_TAG_DRIVER_ALIAS "id-" \ + __stringify(__proto##_VALUE)) + +void dsa_tag_drivers_register(struct dsa_tag_driver *dsa_tag_driver_array[], + unsigned int count, + struct module *owner); +void dsa_tag_drivers_unregister(struct dsa_tag_driver *dsa_tag_driver_array[], + unsigned int count); + +#define dsa_tag_driver_module_drivers(__dsa_tag_drivers_array, __count) \ +static int __init dsa_tag_driver_module_init(void) \ +{ \ + dsa_tag_drivers_register(__dsa_tag_drivers_array, __count, \ + THIS_MODULE); \ + return 0; \ +} \ +module_init(dsa_tag_driver_module_init); \ + \ +static void __exit dsa_tag_driver_module_exit(void) \ +{ \ + dsa_tag_drivers_unregister(__dsa_tag_drivers_array, __count); \ +} \ +module_exit(dsa_tag_driver_module_exit) + +/** + * module_dsa_tag_drivers() - Helper macro for registering DSA tag + * drivers + * @__ops_array: Array of tag driver structures + * + * Helper macro for DSA tag drivers which do not do anything special + * in module init/exit. Each module may only use this macro once, and + * calling it replaces module_init() and module_exit(). + */ +#define module_dsa_tag_drivers(__ops_array) \ +dsa_tag_driver_module_drivers(__ops_array, ARRAY_SIZE(__ops_array)) + +#define DSA_TAG_DRIVER_NAME(__ops) dsa_tag_driver ## _ ## __ops + +/* Create a static structure we can build a linked list of dsa_tag + * drivers + */ +#define DSA_TAG_DRIVER(__ops) \ +static struct dsa_tag_driver DSA_TAG_DRIVER_NAME(__ops) = { \ + .ops = &__ops, \ +} + +/** + * module_dsa_tag_driver() - Helper macro for registering a single DSA tag + * driver + * @__ops: Single tag driver structures + * + * Helper macro for DSA tag drivers which do not do anything special + * in module init/exit. Each module may only use this macro once, and + * calling it replaces module_init() and module_exit(). + */ +#define module_dsa_tag_driver(__ops) \ +DSA_TAG_DRIVER(__ops); \ + \ +static struct dsa_tag_driver *dsa_tag_driver_array[] = { \ + &DSA_TAG_DRIVER_NAME(__ops) \ +}; \ +module_dsa_tag_drivers(dsa_tag_driver_array) + +#endif |