Age | Commit message (Collapse) | Author | Files | Lines |
|
A latter patch will postpone the delivery_time clearing until the stack
knows the skb is being delivered locally. That will allow other kernel
forwarding path (e.g. ip[6]_forward) to keep the delivery_time also.
An earlier attempt was to do skb_clear_delivery_time() in
ip_local_deliver() and ip6_input(). The discussion [0] requested
to move it one step later into ip_local_deliver_finish()
and ip6_input_finish() so that the delivery_time can be kept
for the ip_vs forwarding path also.
To do that, this patch also needs to take care of the (rcv) timestamp
usecase in ip_is_fragment(). It needs to expect delivery_time in
the skb->tstamp, so it needs to save the mono_delivery_time bit in
inet_frag_queue such that the delivery_time (if any) can be restored
in the final defragmented skb.
[Note that it will only happen when the locally generated skb is looping
from egress to ingress over a virtual interface (e.g. veth, loopback...),
skb->tstamp may have the delivery time before it is known that it will
be delivered locally and received by another sk.]
[0]: https://lore.kernel.org/netdev/ca728d81-80e8-3767-d5e-d44f6ad96e43@ssi.bg/
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The previous patches handled the delivery_time before sch_handle_ingress().
This patch can now set the skb->mono_delivery_time to flag the skb->tstamp
is used as the mono delivery_time (EDT) instead of the (rcv) timestamp
and also clear it with skb_clear_delivery_time() after
sch_handle_ingress(). This will make the bpf_redirect_*()
to keep the mono delivery_time and used by a qdisc (fq) of
the egress-ing interface.
A latter patch will postpone the skb_clear_delivery_time() until the
stack learns that the skb is being delivered locally and that will
make other kernel forwarding paths (ip[6]_forward) able to keep
the delivery_time also. Thus, like the previous patches on using
the skb->mono_delivery_time bit, calling skb_clear_delivery_time()
is not limited within the CONFIG_NET_INGRESS to avoid too many code
churns among this set.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In __skb_tstamp_tx(), it may clone the egress skb and queues the clone to
the sk_error_queue. The outgoing skb may have the mono delivery_time
while the (rcv) timestamp is expected for the clone, so the
skb->mono_delivery_time bit needs to be cleared from the clone.
This patch adds the skb->mono_delivery_time clearing to the existing
__net_timestamp() and use it in __skb_tstamp_tx().
The __net_timestamp() fast path usage in dev.c is changed to directly
call ktime_get_real() since the mono_delivery_time bit is not set at
that point.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A latter patch will set the skb->mono_delivery_time to flag the skb->tstamp
is used as the mono delivery_time (EDT) instead of the (rcv) timestamp.
skb_clear_tstamp() will then keep this delivery_time during forwarding.
This patch is to make the network tapping (with af_packet) to handle
the delivery_time stored in skb->tstamp.
Regardless of tapping at the ingress or egress, the tapped skb is
received by the af_packet socket, so it is ingress to the af_packet
socket and it expects the (rcv) timestamp.
When tapping at egress, dev_queue_xmit_nit() is used. It has already
expected skb->tstamp may have delivery_time, so it does
skb_clone()+net_timestamp_set() to ensure the cloned skb has
the (rcv) timestamp before passing to the af_packet sk.
This patch only adds to clear the skb->mono_delivery_time
bit in net_timestamp_set().
When tapping at ingress, it currently expects the skb->tstamp is either 0
or the (rcv) timestamp. Meaning, the tapping at ingress path
has already expected the skb->tstamp could be 0 and it will get
the (rcv) timestamp by ktime_get_real() when needed.
There are two cases for tapping at ingress:
One case is af_packet queues the skb to its sk_receive_queue.
The skb is either not shared or new clone created. The newly
added skb_clear_delivery_time() is called to clear the
delivery_time (if any) and set the (rcv) timestamp if
needed before the skb is queued to the sk_receive_queue.
Another case, the ingress skb is directly copied to the rx_ring
and tpacket_get_timestamp() is used to get the (rcv) timestamp.
The newly added skb_tstamp() is used in tpacket_get_timestamp()
to check the skb->mono_delivery_time bit before returning skb->tstamp.
As mentioned earlier, the tapping@ingress has already expected
the skb may not have the (rcv) timestamp (because no sk has asked
for it) and has handled this case by directly calling ktime_get_real().
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Right now, skb->tstamp is reset to 0 whenever the skb is forwarded.
If skb->tstamp has the mono delivery_time, clearing it can hurt
the performance when it finally transmits out to fq@phy-dev.
The earlier patch added a skb->mono_delivery_time bit to
flag the skb->tstamp carrying the mono delivery_time.
This patch adds skb_clear_tstamp() helper which keeps
the mono delivery_time and clears everything else.
The delivery_time clearing will be postponed until the stack knows the
skb will be delivered locally. It will be done in a latter patch.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
(rcv) timestamp
skb->tstamp was first used as the (rcv) timestamp.
The major usage is to report it to the user (e.g. SO_TIMESTAMP).
Later, skb->tstamp is also set as the (future) delivery_time (e.g. EDT in TCP)
during egress and used by the qdisc (e.g. sch_fq) to make decision on when
the skb can be passed to the dev.
Currently, there is no way to tell skb->tstamp having the (rcv) timestamp
or the delivery_time, so it is always reset to 0 whenever forwarded
between egress and ingress.
While it makes sense to always clear the (rcv) timestamp in skb->tstamp
to avoid confusing sch_fq that expects the delivery_time, it is a
performance issue [0] to clear the delivery_time if the skb finally
egress to a fq@phy-dev. For example, when forwarding from egress to
ingress and then finally back to egress:
tcp-sender => veth@netns => veth@hostns => fq@eth0@hostns
^ ^
reset rest
This patch adds one bit skb->mono_delivery_time to flag the skb->tstamp
is storing the mono delivery_time (EDT) instead of the (rcv) timestamp.
The current use case is to keep the TCP mono delivery_time (EDT) and
to be used with sch_fq. A latter patch will also allow tc-bpf@ingress
to read and change the mono delivery_time.
In the future, another bit (e.g. skb->user_delivery_time) can be added
for the SCM_TXTIME where the clock base is tracked by sk->sk_clockid.
[ This patch is a prep work. The following patches will
get the other parts of the stack ready first. Then another patch
after that will finally set the skb->mono_delivery_time. ]
skb_set_delivery_time() function is added. It is used by the tcp_output.c
and during ip[6] fragmentation to assign the delivery_time to
the skb->tstamp and also set the skb->mono_delivery_time.
A note on the change in ip_send_unicast_reply() in ip_output.c.
It is only used by TCP to send reset/ack out of a ctl_sk.
Like the new skb_set_delivery_time(), this patch sets
the skb->mono_delivery_time to 0 for now as a place
holder. It will be enabled in a latter patch.
A similar case in tcp_ipv6 can be done with
skb_set_delivery_time() in tcp_v6_send_response().
[0] (slide 22): https://linuxplumbersconf.org/event/11/contributions/953/attachments/867/1658/LPC_2021_BPF_Datapath_Extensions.pdf
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The "ocelot" and "ocelot-8021q" tagging protocols make use of different
hardware resources, and host FDB entries have different destination
ports in the switch analyzer module, practically speaking.
So when the user requests a tagging protocol change, the driver must
migrate all host FDB and MDB entries from the NPI port (in fact CPU port
module) towards the same physical port, but this time used as a regular
port.
It is pointless for the felix driver to keep a copy of the host
addresses, when we can create and export DSA helpers for walking through
the addresses that it already needs to keep on the CPU port, for
refcounting purposes.
felix_classify_db() is moved up to avoid a forward declaration.
We pass "bool change" because dp->fdbs and dp->mdbs are uninitialized
lists when felix_setup() first calls felix_set_tag_protocol(), so we
need to avoid calling dsa_port_walk_fdbs() during probe time.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The offloaded HW stats are designed to allow per-netdevice enablement and
disablement. Add an attribute, IFLA_STATS_SET_OFFLOAD_XSTATS_L3_STATS,
which should be carried by the RTM_SETSTATS message, and expresses a desire
to toggle L3 offload xstats on or off.
As part of the above, add an exported function rtnl_offload_xstats_notify()
that drivers can use when they have installed or deinstalled the counters
backing the HW stats.
At this point, it is possible to enable, disable and query L3 offload
xstats on netdevices. (However there is no driver actually implementing
these.)
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The offloaded HW stats are designed to allow per-netdevice enablement and
disablement. These stats are only accessible through RTM_GETSTATS, and
therefore should be toggled by a RTM_SETSTATS message. Add it, and the
necessary skeleton handler.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a new IFLA_STATS_LINK_OFFLOAD_XSTATS child attribute,
IFLA_OFFLOAD_XSTATS_L3_STATS, to carry statistics for traffic that takes
place in a HW router.
The offloaded HW stats are designed to allow per-netdevice enablement and
disablement. Additionally, as a netdevice is configured, it may become or
cease being suitable for binding of a HW counter. Both of these aspects
need to be communicated to the userspace. To that end, add another child
attribute, IFLA_OFFLOAD_XSTATS_HW_S_INFO:
- attr nest IFLA_OFFLOAD_XSTATS_HW_S_INFO
- attr nest IFLA_OFFLOAD_XSTATS_L3_STATS
- attr IFLA_OFFLOAD_XSTATS_HW_S_INFO_REQUEST
- {0,1} as u8
- attr IFLA_OFFLOAD_XSTATS_HW_S_INFO_USED
- {0,1} as u8
Thus this one attribute is a nest that can be used to carry information
about various types of HW statistics, and indexing is very simply done by
wrapping the information for a given statistics suite into the attribute
that carries the suite is the RTM_GETSTATS query. At the same time, because
_HW_S_INFO is nested directly below IFLA_STATS_LINK_OFFLOAD_XSTATS, it is
possible through filtering to request only the metadata about individual
statistics suites, without having to hit the HW to get the actual counters.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Offloading switch device drivers may be able to collect statistics of the
traffic taking place in the HW datapath that pertains to a certain soft
netdevice, such as VLAN. Add the necessary infrastructure to allow exposing
these statistics to the offloaded netdevice in question. The API was shaped
by the following considerations:
- Collection of HW statistics is not free: there may be a finite number of
counters, and the act of counting may have a performance impact. It is
therefore necessary to allow toggling whether HW counting should be done
for any particular SW netdevice.
- As the drivers are loaded and removed, a particular device may get
offloaded and unoffloaded again. At the same time, the statistics values
need to stay monotonic (modulo the eventual 64-bit wraparound),
increasing only to reflect traffic measured in the device.
To that end, the netdevice keeps around a lazily-allocated copy of struct
rtnl_link_stats64. Device drivers then contribute to the values kept
therein at various points. Even as the driver goes away, the struct stays
around to maintain the statistics values.
- Different HW devices may be able to count different things. The
motivation behind this patch in particular is exposure of HW counters on
Nvidia Spectrum switches, where the only practical approach to counting
traffic on offloaded soft netdevices currently is to use router interface
counters, and count L3 traffic. Correspondingly that is the statistics
suite added in this patch.
Other devices may be able to measure different kinds of traffic, and for
that reason, the APIs are built to allow uniform access to different
statistics suites.
- Because soft netdevices and offloading drivers are only loosely bound, a
netdevice uses a notifier chain to communicate with the drivers. Several
new notifiers, NETDEV_OFFLOAD_XSTATS_*, have been added to carry messages
to the offloading drivers.
- Devices can have various conditions for when a particular counter is
available. As the device is configured and reconfigured, the device
offload may become or cease being suitable for counter binding. A
netdevice can use a notifier type NETDEV_OFFLOAD_XSTATS_REPORT_USED to
ping offloading drivers and determine whether anyone currently implements
a given statistics suite. This information can then be propagated to user
space.
When the driver decides to unoffload a netdevice, it can use a
newly-added function, netdev_offload_xstats_report_delta(), to record
outstanding collected statistics, before destroying the HW counter.
This patch adds a helper, call_netdevice_notifiers_info_robust(), for
dispatching a notifier with the possibility of unwind when one of the
consumers bails. Given the wish to eventually get rid of the global
notifier block altogether, this helper only invokes the per-netns notifier
block.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The filter_mask field of RTM_GETSTATS header determines which top-level
attributes should be included in the netlink response. This saves
processing time by only including the bits that the user cares about
instead of always dumping everything. This is doubly important for
HW-backed statistics that would typically require a trip to the device to
fetch the stats.
So far there was only one HW-backed stat suite per attribute. However,
IFLA_STATS_LINK_OFFLOAD_XSTATS is a nest, and will gain a new stat suite in
the following patches. It would therefore be advantageous to be able to
filter within that nest, and select just one or the other HW-backed
statistics suite.
Extend rtnetlink so that RTM_GETSTATS permits attributes in the payload.
The scheme is as follows:
- RTM_GETSTATS
- struct if_stats_msg
- attr nest IFLA_STATS_GET_FILTERS
- attr IFLA_STATS_LINK_OFFLOAD_XSTATS
- u32 filter_mask
This scheme reuses the existing enumerators by nesting them in a dedicated
context attribute. This is covered by policies as usual, therefore a
gradual opt-in is possible. Currently only IFLA_STATS_LINK_OFFLOAD_XSTATS
nest has filtering enabled, because for the SW counters the issue does not
seem to be that important.
rtnl_offload_xstats_get_size() and _fill() are extended to observe the
requested filters.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Adds a function page_pool_get_stats which can be used by drivers to obtain
stats for a specified page_pool.
Signed-off-by: Joe Damato <jdamato@fastly.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add per-cpu stats tracking page pool recycling events:
- cached: recycling placed page in the page pool cache
- cache_full: page pool cache was full
- ring: page placed into the ptr ring
- ring_full: page released from page pool because the ptr ring was full
- released_refcnt: page released (and not recycled) because refcnt > 1
Signed-off-by: Joe Damato <jdamato@fastly.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add per-pool statistics counters for the allocation path of a page pool.
These stats are incremented in softirq context, so no locking or per-cpu
variables are needed.
This code is disabled by default and a kernel config option is provided for
users who wish to enable them.
The statistics added are:
- fast: successful fast path allocations
- slow: slow path order-0 allocations
- slow_high_order: slow path high order allocations
- empty: ptr ring is empty, so a slow path allocation was forced.
- refill: an allocation which triggered a refill of the cache
- waive: pages obtained from the ptr ring that cannot be added to
the cache due to a NUMA mismatch.
Signed-off-by: Joe Damato <jdamato@fastly.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Last tcp_write_queue_head() use was removed in commit
114f39feab36 ("tcp: restore autocorking"), so remove it.
Signed-off-by: Tao Chen <chentao3@hotmail.com>
Link: https://lore.kernel.org/r/SYZP282MB33317DEE1253B37C0F57231E86029@SYZP282MB3331.AUSP282.PROD.OUTLOOK.COM
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Network drivers such as igb or igc call eth_get_headlen() to determine the
header length for their to be constructed skbs in receive path.
When running HSR on top of these drivers, it results in triggering BUG_ON() in
skb_pull(). The reason is the skb headlen is not sufficient for HSR to work
correctly. skb_pull() notices that.
For instance, eth_get_headlen() returns 14 bytes for TCP traffic over HSR which
is not correct. The problem is, the flow dissection code does not take HSR into
account. Therefore, add support for it.
Reported-by: Anthony Harivel <anthony.harivel@linutronix.de>
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Link: https://lore.kernel.org/r/20220228195856.88187-1-kurt@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add the Ethertype for EtherCAT protocol.
Signed-off-by: Daniel Braunwarth <daniel@braunwarth.dev>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add the Ethertype for PROFINET protocol.
Signed-off-by: Daniel Braunwarth <daniel@braunwarth.dev>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This add a new sysctl: net.smc.autocorking_size
We can dynamically change the behaviour of autocorking
by change the value of autocorking_size.
Setting to 0 disables autocorking in SMC
Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch add sysctl interface to support container environment
for SMC as we talk in the mail list.
Link: https://lore.kernel.org/netdev/20220224020253.GF5443@linux.alibaba.com
Co-developed-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add support for VXLAN vni filter entries' stats dumping
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add per-vni statistics for vni filter mode. Counting Rx/Tx
bytes/packets/drops/errors at the appropriate places.
This patch changes vxlan_vs_find_vni to also return the
vxlan_vni_node in cases where the vni belongs to a vni
filtering vxlan device
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Roopa Prabhu <roopa@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds vnifiltering support to collect metadata device.
Motivation:
You can only use a single vxlan collect metadata device for a given
vxlan udp port in the system today. The vxlan collect metadata device
terminates all received vxlan packets. As shown in the below diagram,
there are use-cases where you need to support multiple such vxlan devices in
independent bridge domains. Each vxlan device must terminate the vni's
it is configured for.
Example usecase: In a service provider network a service provider
typically supports multiple bridge domains with overlapping vlans.
One bridge domain per customer. Vlans in each bridge domain are
mapped to globally unique vxlan ranges assigned to each customer.
vnifiltering support in collect metadata devices terminates only configured
vnis. This is similar to vlan filtering in bridge driver. The vni filtering
capability is provided by a new flag on collect metadata device.
In the below pic:
- customer1 is mapped to br1 bridge domain
- customer2 is mapped to br2 bridge domain
- customer1 vlan 10-11 is mapped to vni 1001-1002
- customer2 vlan 10-11 is mapped to vni 2001-2002
- br1 and br2 are vlan filtering bridges
- vxlan1 and vxlan2 are collect metadata devices with
vnifiltering enabled
┌──────────────────────────────────────────────────────────────────┐
│ switch │
│ │
│ ┌───────────┐ ┌───────────┐ │
│ │ │ │ │ │
│ │ br1 │ │ br2 │ │
│ └┬─────────┬┘ └──┬───────┬┘ │
│ vlans│ │ vlans │ │ │
│ 10,11│ │ 10,11│ │ │
│ │ vlanvnimap: │ vlanvnimap: │
│ │ 10-1001,11-1002 │ 10-2001,11-2002 │
│ │ │ │ │ │
│ ┌──────┴┐ ┌──┴─────────┐ ┌───┴────┐ │ │
│ │ swp1 │ │vxlan1 │ │ swp2 │ ┌┴─────────────┐ │
│ │ │ │ vnifilter:│ │ │ │vxlan2 │ │
│ └───┬───┘ │ 1001,1002│ └───┬────┘ │ vnifilter: │ │
│ │ └────────────┘ │ │ 2001,2002 │ │
│ │ │ └──────────────┘ │
│ │ │ │
└───────┼──────────────────────────────────┼───────────────────────┘
│ │
│ │
┌─────┴───────┐ │
│ customer1 │ ┌─────┴──────┐
│ host/VM │ │customer2 │
└─────────────┘ │ host/VM │
└────────────┘
With this implementation, vxlan dst metadata device can
be associated with range of vnis.
struct vxlan_vni_node is introduced to represent
a configured vni. We start with vni and its
associated remote_ip in this structure. This
structure can be extended to bring in other
per vni attributes if there are usecases for it.
A vni inherits an attribute from the base vxlan device
if there is no per vni attributes defined.
struct vxlan_dev gets a new rhashtable for
vnis called vxlan_vni_group. vxlan_vnifilter.c
implements the necessary netlink api, notifications
and helper functions to process and manage lifecycle
of vxlan_vni_node.
This patch also adds new helper functions in vxlan_multicast.c
to handle per vni remote_ip multicast groups which are part
of vxlan_vni_group.
Fix build problems:
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Roopa Prabhu <roopa@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds new rtm tunnel msg and api for tunnel id
filtering in dst_metadata devices. First dst_metadata
device to use the api is vxlan driver with AF_BRIDGE
family.
This and later changes add ability in vxlan driver to do
tunnel id filtering (or vni filtering) on dst_metadata
devices. This is similar to vlan api in the vlan filtering bridge.
this patch includes selinux nlmsg_route_perms support for RTM_*TUNNEL
api from Benjamin Poirier.
Signed-off-by: Roopa Prabhu <roopa@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Saeed Mahameed says:
====================
mlx5-next 2022-22-02
The following PR includes updates to mlx5-next branch:
Headlines:
==========
1) Jakub cleans up unused static inline functions
2) I did some low level firmware command interface return status changes to
provide the caller with full visibility on the error/status returned by
the Firmware.
3) Use the new command interface in RDMA DEVX usecases to avoid flooding
dmesg with some "expected" user error prone use cases.
4) Moshe also uses the new command interface to grab the specific error
code from MFRL register command to provide the exact error reason for
why SW reset couldn't perform internally in FW.
5) From Mark Bloch: Lag, drop packets in hardware when possible
In active-backup mode the inactive interface's packets are dropped by the
bond device. In switchdev where TC rules are offloaded to the FDB
this can lead to packets being hit in the FDB where without offload
they would have been dropped before reaching TC rules in the kernel.
Create a drop rule to make sure packets on inactive ports are dropped
before reaching the FDB.
Listen on NETDEV_CHANGEUPPER / NETDEV_CHANGEINFODATA events and record
the inactive state and offload accordingly.
* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
net/mlx5: Add clarification on sync reset failure
net/mlx5: Add reset_state field to MFRL register
RDMA/mlx5: Use new command interface API
net/mlx5: cmdif, Refactor error handling and reporting of async commands
net/mlx5: Use mlx5_cmd_do() in core create_{cq,dct}
net/mlx5: cmdif, Add new api for command execution
net/mlx5: cmdif, cmd_check refactoring
net/mlx5: cmdif, Return value improvements
net/mlx5: Lag, offload active-backup drops to hardware
net/mlx5: Lag, record inactive state of bond device
net/mlx5: Lag, don't use magic numbers for ports
net/mlx5: Lag, use local variable already defined to access E-Switch
net/mlx5: E-switch, add drop rule support to ingress ACL
net/mlx5: E-switch, remove special uplink ingress ACL handling
net/mlx5: E-Switch, reserve and use same uplink metadata across ports
net/mlx5: Add ability to insert to specific flow group
mlx5: remove unused static inlines
====================
Link: https://lore.kernel.org/r/20220223233930.319301-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Mark Brown says:
====================
spi: Make remove() return void
This series from Uwe Kleine-König converts the spi remove function to
return void since there is nothing useful that we can do with a failure
and it as more buses are converted it'll enable further work on the
driver core.
====================
Link: https://lore.kernel.org/r/20220228173957.1262628-2-broonie@kernel.org/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
As all users of phylink_set_pcs() have now been updated to use the
mac_select_pcs() method, it can be removed from the phylink kernel
API and its functionality moved into phylink_major_config().
Removing phylink_set_pcs() gives us a single approach for attaching
a PCS within phylink.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As more police parameters are passed to flow_offload, driver can check
them to make sure hardware handles packets in the way indicated by tc.
The conform-exceed control should be drop/pipe or drop/ok. Besides,
for drop/ok, the police should be the last action. As hardware can't
configure peakrate/avrate/overhead, offload should not be supported if
any of them is configured.
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The current police offload action entry is missing exceed/notexceed
actions and parameters that can be configured by tc police action.
Add the missing parameters as a pre-step for offloading police actions
to hardware.
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently ocelot uses a pvid of 0 for standalone ports and ports under a
VLAN-unaware bridge, and the pvid of the bridge for ports under a
VLAN-aware bridge. Standalone ports do not perform learning, but packets
received on them are still subject to FDB lookups. So if the MAC DA that
a standalone port receives has been also learned on a VLAN-unaware
bridge port, ocelot will attempt to forward to that port, even though it
can't, so it will drop packets.
So there is a desire to avoid that, and isolate the FDBs of different
bridges from one another, and from standalone ports.
The ocelot switch library has two distinct entry points: the felix DSA
driver and the ocelot switchdev driver.
We need to code up a minimal bridge_num allocation in the ocelot
switchdev driver too, this is copied from DSA with the exception that
ocelot does not care about DSA trees, cross-chip bridging etc. So it
only looks at its own ports that are already in the same bridge.
The ocelot switchdev driver uses the bridge_num it has allocated itself,
while the felix driver uses the bridge_num allocated by DSA. They are
both stored inside ocelot_port->bridge_num by the common function
ocelot_port_bridge_join() which receives the bridge_num passed by value.
Once we have a bridge_num, we can only use it to enforce isolation
between VLAN-unaware bridges. As far as I can see, ocelot does not have
anything like a FID that further makes VLAN 100 from a port be different
to VLAN 100 from another port with regard to FDB lookup. So we simply
deny multiple VLAN-aware bridges.
For VLAN-unaware bridges, we crop the 4000-4095 VLAN region and we
allocate a VLAN for each bridge_num. This will be used as the pvid of
each port that is under that VLAN-unaware bridge, for as long as that
bridge is VLAN-unaware.
VID 0 remains only for standalone ports. It is okay if all standalone
ports use the same VID 0, since they perform no address learning, the
FDB will contain no entry in VLAN 0, so the packets will always be
flooded to the only possible destination, the CPU port.
The CPU port module doesn't need to be member of the VLANs to receive
packets, but if we use the DSA tag_8021q protocol, those packets are
part of the data plane as far as ocelot is concerned, so there it needs
to. Just ensure that the DSA tag_8021q CPU port is a member of all
reserved VLANs when it is created, and is removed when it is deleted.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As FDB isolation cannot be enforced between VLAN-aware bridges in lack
of hardware assistance like extra FID bits, it seems plausible that many
DSA switches cannot do it. Therefore, they need to reject configurations
with multiple VLAN-aware bridges from the two code paths that can
transition towards that state:
- joining a VLAN-aware bridge
- toggling VLAN awareness on an existing bridge
The .port_vlan_filtering method already propagates the netlink extack to
the driver, let's propagate it from .port_bridge_join too, to make sure
that the driver can use the same function for both.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
For DSA, to encourage drivers to perform FDB isolation simply means to
track which bridge does each FDB and MDB entry belong to. It then
becomes the driver responsibility to use something that makes the FDB
entry from one bridge not match the FDB lookup of ports from other
bridges.
The top-level functions where the bridge is determined are:
- dsa_port_fdb_{add,del}
- dsa_port_host_fdb_{add,del}
- dsa_port_mdb_{add,del}
- dsa_port_host_mdb_{add,del}
aka the pre-crosschip-notifier functions.
Changing the API to pass a reference to a bridge is not superfluous, and
looking at the passed bridge argument is not the same as having the
driver look at dsa_to_port(ds, port)->bridge from the ->port_fdb_add()
method.
DSA installs FDB and MDB entries on shared (CPU and DSA) ports as well,
and those do not have any dp->bridge information to retrieve, because
they are not in any bridge - they are merely the pipes that serve the
user ports that are in one or multiple bridges.
The struct dsa_bridge associated with each FDB/MDB entry is encapsulated
in a larger "struct dsa_db" database. Although only databases associated
to bridges are notified for now, this API will be the starting point for
implementing IFF_UNICAST_FLT in DSA. There, the idea is to install FDB
entries on the CPU port which belong to the corresponding user port's
port database. These are supposed to match only when the port is
standalone.
It is better to introduce the API in its expected final form than to
introduce it for bridges first, then to have to change drivers which may
have made one or more assumptions.
Drivers can use the provided bridge.num, but they can also use a
different numbering scheme that is more convenient.
DSA must perform refcounting on the CPU and DSA ports by also taking
into account the bridge number. So if two bridges request the same local
address, DSA must notify the driver twice, once for each bridge.
In fact, if the driver supports FDB isolation, DSA must perform
refcounting per bridge, but if the driver doesn't, DSA must refcount
host addresses across all bridges, otherwise it would be telling the
driver to delete an FDB entry for a bridge and the driver would delete
it for all bridges. So introduce a bool fdb_isolation in drivers which
would make all bridge databases passed to the cross-chip notifier have
the same number (0). This makes dsa_mac_addr_find() -> dsa_db_equal()
say that all bridge databases are the same database - which is
essentially the legacy behavior.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The dsa_8021q_bridge_tx_fwd_offload_vid is no longer used just for
bridge TX forwarding offload, it is the private VLAN reserved for
VLAN-unaware bridging in a way that is compatible with FDB isolation.
So just rename it dsa_tag_8021q_bridge_vid.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In the old Shared VLAN Learning mode of operation that tag_8021q
previously used for forwarding, we needed to have distinct concepts for
an RX and a TX VLAN.
An RX VLAN could be installed on all ports that were members of a given
bridge, so that autonomous forwarding could still work, while a TX VLAN
was dedicated for precise packet steering, so it just contained the CPU
port and one egress port.
Now that tag_8021q uses Independent VLAN Learning and imprecise RX/TX
all over, those lines have been blurred and we no longer have the need
to do precise TX towards a port that is in a bridge. As for standalone
ports, it is fine to use the same VLAN ID for both RX and TX.
This patch changes the tag_8021q format by shifting the VLAN range it
reserves, and halving it. Previously, our DIR bits were encoding the
VLAN direction (RX/TX) and were set to either 1 or 2. This meant that
tag_8021q reserved 2K VLANs, or 50% of the available range.
Change the DIR bits to a hardcoded value of 3 now, which makes tag_8021q
reserve only 1K VLANs, and a different range now (the last 1K). This is
done so that we leave the old format in place in case we need to return
to it.
In terms of code, the vid_is_dsa_8021q_rxvlan and vid_is_dsa_8021q_txvlan
functions go away. Any vid_is_dsa_8021q is both a TX and an RX VLAN, and
they are no longer distinct. For example, felix which did different
things for different VLAN types, now needs to handle the RX and the TX
logic for the same VLAN.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The sja1105 switch can't populate the PORT field of the tag_8021q header
when sending a frame to the CPU with a non-zero VBID.
Similar to dsa_find_designated_bridge_port_by_vid() which performs
imprecise RX for VLAN-aware bridges, let's introduce a helper in
tag_8021q for performing imprecise RX based on the VLAN that it has
allocated for a VLAN-unaware bridge.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
For VLAN-unaware bridging, tag_8021q uses something perhaps a bit too
tied with the sja1105 switch: each port uses the same pvid which is also
used for standalone operation (a unique one from which the source port
and device ID can be retrieved when packets from that port are forwarded
to the CPU). Since each port has a unique pvid when performing
autonomous forwarding, the switch must be configured for Shared VLAN
Learning (SVL) such that the VLAN ID itself is ignored when performing
FDB lookups. Without SVL, packets would always be flooded, since FDB
lookup in the source port's VLAN would never find any entry.
First of all, to make tag_8021q more palatable to switches which might
not support Shared VLAN Learning, let's just use a common VLAN for all
ports that are under the same bridge.
Secondly, using Shared VLAN Learning means that FDB isolation can never
be enforced. But if all ports under the same VLAN-unaware bridge share
the same VLAN ID, it can.
The disadvantage is that the CPU port can no longer perform precise
source port identification for these packets. But at least we have a
mechanism which has proven to be adequate for that situation: imprecise
RX (dsa_find_designated_bridge_port_by_vid), which is what we use for
termination on VLAN-aware bridges.
The VLAN ID that VLAN-unaware bridges will use with tag_8021q is the
same one as we were previously using for imprecise TX (bridge TX
forwarding offload). It is already allocated, it is just a matter of
using it.
Note that because now all ports under the same bridge share the same
VLAN, the complexity of performing a tag_8021q bridge join decreases
dramatically. We no longer have to install the RX VLAN of a newly
joining port into the port membership of the existing bridge ports.
The newly joining port just becomes a member of the VLAN corresponding
to that bridge, and the other ports are already members of it from when
they joined the bridge themselves. So forwarding works properly.
This means that we can unhook dsa_tag_8021q_bridge_{join,leave} from the
cross-chip notifier level dsa_switch_bridge_{join,leave}. We can put
these calls directly into the sja1105 driver.
With this new mode of operation, a port controlled by tag_8021q can have
two pvids whereas before it could only have one. The pvid for standalone
operation is different from the pvid used for VLAN-unaware bridging.
This is done, again, so that FDB isolation can be enforced.
Let tag_8021q manage this by deleting the standalone pvid when a port
joins a bridge, and restoring it when it leaves it.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org
Signed-off-by: Dimitris Michailidis <dmichail@fungible.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Replace kfree_skb() used in __neigh_event_send() with
kfree_skb_reason(). Following drop reasons are added:
SKB_DROP_REASON_NEIGH_FAILED
SKB_DROP_REASON_NEIGH_QUEUEFULL
SKB_DROP_REASON_NEIGH_DEAD
The first two reasons above should be the hot path that skb drops
in neighbour subsystem.
Reviewed-by: Mengen Sun <mengensun@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Replace kfree_skb() which is used in the packet egress path of IP layer
with kfree_skb_reason(). Functions that are involved include:
__ip_queue_xmit()
ip_finish_output()
ip_mc_finish_output()
ip6_output()
ip6_finish_output()
ip6_finish_output2()
Following new drop reasons are introduced:
SKB_DROP_REASON_IP_OUTNOROUTES
SKB_DROP_REASON_BPF_CGROUP_EGRESS
SKB_DROP_REASON_IPV6DISABLED
SKB_DROP_REASON_NEIGH_CREATEFAIL
Reviewed-by: Mengen Sun <mengensun@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This change adds a new OpenFlow field OFPXMT_OFB_IPV6_EXTHDR and
packets can be filtered using ipv6_ext flag.
Signed-off-by: Toms Atteka <cpp.code.lv@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The functions do essentially the same work to verify TCP-MD5 sign.
Code can be merged into one family-independent function in order to
reduce copy'n'paste and generated code.
Later with TCP-AO option added, this will allow to create one function
that's responsible for segment verification, that will have all the
different checks for MD5/AO/non-signed packets, which in turn will help
to see checks for all corner-cases in one function, rather than spread
around different families and functions.
Cc: Eric Dumazet <edumazet@google.com>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220223175740.452397-1-dima@arista.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This adds the logic in the Felix DSA driver and Ocelot switch library.
For Ocelot switches, the DEST_IDX that is the output of the MAC table
lookup is a logical port (equal to physical port, if no LAG is used, or
a dynamically allocated number otherwise). The allocation we have in
place for LAG IDs is different from DSA's, so we can't use that:
- DSA allocates a continuous range of LAG IDs starting from 1
- Ocelot appears to require that physical ports and LAG IDs are in the
same space of [0, num_phys_ports), and additionally, ports that aren't
in a LAG must have physical port id == logical port id
The implication is that an FDB entry towards a LAG might need to be
deleted and reinstalled when the LAG ID changes.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This change introduces support for installing static FDB entries towards
a bridge port that is a LAG of multiple DSA switch ports, as well as
support for filtering towards the CPU local FDB entries emitted for LAG
interfaces that are bridge ports.
Conceptually, host addresses on LAG ports are identical to what we do
for plain bridge ports. Whereas FDB entries _towards_ a LAG can't simply
be replicated towards all member ports like we do for multicast, or VLAN.
Instead we need new driver API. Hardware usually considers a LAG to be a
"logical port", and sets the entire LAG as the forwarding destination.
The physical egress port selection within the LAG is made by hashing
policy, as usual.
To represent the logical port corresponding to the LAG, we pass by value
a copy of the dsa_lag structure to all switches in the tree that have at
least one port in that LAG.
To illustrate why a refcounted list of FDB entries is needed in struct
dsa_lag, it is enough to say that:
- a LAG may be a bridge port and may therefore receive FDB events even
while it isn't yet offloaded by any DSA interface
- DSA interfaces may be removed from a LAG while that is a bridge port;
we don't want FDB entries lingering around, but we don't want to
remove entries that are still in use, either
For all the cases below to work, the idea is to always keep an FDB entry
on a LAG with a reference count equal to the DSA member ports. So:
- if a port joins a LAG, it requests the bridge to replay the FDB, and
the FDB entries get created, or their refcount gets bumped by one
- if a port leaves a LAG, the FDB replay deletes or decrements refcount
by one
- if an FDB is installed towards a LAG with ports already present, that
entry is created (if it doesn't exist) and its refcount is bumped by
the amount of ports already present in the LAG
echo "Adding FDB entry to bond with existing ports"
ip link del bond0
ip link add bond0 type bond mode 802.3ad
ip link set swp1 down && ip link set swp1 master bond0 && ip link set swp1 up
ip link set swp2 down && ip link set swp2 master bond0 && ip link set swp2 up
ip link del br0
ip link add br0 type bridge
ip link set bond0 master br0
bridge fdb add dev bond0 00:01:02:03:04:05 master static
ip link del br0
ip link del bond0
echo "Adding FDB entry to empty bond"
ip link del bond0
ip link add bond0 type bond mode 802.3ad
ip link del br0
ip link add br0 type bridge
ip link set bond0 master br0
bridge fdb add dev bond0 00:01:02:03:04:05 master static
ip link set swp1 down && ip link set swp1 master bond0 && ip link set swp1 up
ip link set swp2 down && ip link set swp2 master bond0 && ip link set swp2 up
ip link del br0
ip link del bond0
echo "Adding FDB entry to empty bond, then removing ports one by one"
ip link del bond0
ip link add bond0 type bond mode 802.3ad
ip link del br0
ip link add br0 type bridge
ip link set bond0 master br0
bridge fdb add dev bond0 00:01:02:03:04:05 master static
ip link set swp1 down && ip link set swp1 master bond0 && ip link set swp1 up
ip link set swp2 down && ip link set swp2 master bond0 && ip link set swp2 up
ip link set swp1 nomaster
ip link set swp2 nomaster
ip link del br0
ip link del bond0
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When the switchdev_handle_fdb_event_to_device() event replication helper
was created, my original thought was that FDB events on LAG interfaces
should most likely be special-cased, not just replicated towards all
switchdev ports beneath that LAG. So this replication helper currently
does not recurse through switchdev lower interfaces of LAG bridge ports,
but rather calls the lag_mod_cb() if that was provided.
No switchdev driver uses this helper for FDB events on LAG interfaces
yet, so that was an assumption which was yet to be tested. It is
certainly usable for that purpose, as my RFC series shows:
https://patchwork.kernel.org/project/netdevbpf/cover/20220210125201.2859463-1-vladimir.oltean@nxp.com/
however this approach is slightly convoluted because:
- the switchdev driver gets a "dev" that isn't its own net device, but
rather the LAG net device. It must call switchdev_lower_dev_find(dev)
in order to get a handle of any of its own net devices (the ones that
pass check_cb).
- in order for FDB entries on LAG ports to be correctly refcounted per
the number of switchdev ports beneath that LAG, we haven't escaped the
need to iterate through the LAG's lower interfaces. Except that is now
the responsibility of the switchdev driver, because the replication
helper just stopped half-way.
So, even though yes, FDB events on LAG bridge ports must be
special-cased, in the end it's simpler to let switchdev_handle_fdb_*
just iterate through the LAG port's switchdev lowers, and let the
switchdev driver figure out that those physical ports are under a LAG.
The switchdev_handle_fdb_event_to_device() helper takes a
"foreign_dev_check" callback so it can figure out whether @dev can
autonomously forward to @foreign_dev. DSA fills this method properly:
if the LAG is offloaded by another port in the same tree as @dev, then
it isn't foreign. If it is a software LAG, it is foreign - forwarding
happens in software.
Whether an interface is foreign or not decides whether the replication
helper will go through the LAG's switchdev lowers or not. Since the
lan966x doesn't properly fill this out, FDB events on software LAG
uppers will get called. By changing lan966x_foreign_dev_check(), we can
suppress them.
Whereas DSA will now start receiving FDB events for its offloaded LAG
uppers, so we need to return -EOPNOTSUPP, since we currently don't do
the right thing for them.
Cc: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The main purpose of this change is to create a data structure for a LAG
as seen by DSA. This is similar to what we have for bridging - we pass a
copy of this structure by value to ->port_lag_join and ->port_lag_leave.
For now we keep the lag_dev, id and a reference count in it. Future
patches will add a list of FDB entries for the LAG (these also need to
be refcounted to work properly).
The LAG structure is created using dsa_port_lag_create() and destroyed
using dsa_port_lag_destroy(), just like we have for bridging.
Because now, the dsa_lag itself is refcounted, we can simplify
dsa_lag_map() and dsa_lag_unmap(). These functions need to keep a LAG in
the dst->lags array only as long as at least one port uses it. The
refcounting logic inside those functions can be removed now - they are
called only when we should perform the operation.
dsa_lag_dev() is renamed to dsa_lag_by_id() and now returns the dsa_lag
structure instead of the lag_dev net_device.
dsa_lag_foreach_port() now takes the dsa_lag structure as argument.
dst->lags holds an array of dsa_lag structures.
dsa_lag_map() now also saves the dsa_lag->id value, so that linear
walking of dst->lags in drivers using dsa_lag_id() is no longer
necessary. They can just look at lag.id.
dsa_port_lag_id_get() is a helper, similar to dsa_port_bridge_num_get(),
which can be used by drivers to get the LAG ID assigned by DSA to a
given port.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The DSA LAG API will be changed to become more similar with the bridge
data structures, where struct dsa_bridge holds an unsigned int num,
which is generated by DSA and is one-based. We have a similar thing
going with the DSA LAG, except that isn't stored anywhere, it is
calculated dynamically by dsa_lag_id() by iterating through dst->lags.
The idea of encoding an invalid (or not requested) LAG ID as zero for
the purpose of simplifying checks in drivers means that the LAG IDs
passed by DSA to drivers need to be one-based too. So back-and-forth
conversion is needed when indexing the dst->lags array, as well as in
drivers which assume a zero-based index.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In preparation of converting struct net_device *dp->lag_dev into a
struct dsa_lag *dp->lag, we need to rename, for consistency purposes,
all occurrences of the "lag" variable in the DSA core to "lag_dev".
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
tools/testing/selftests/net/mptcp/mptcp_join.sh
34aa6e3bccd8 ("selftests: mptcp: add ip mptcp wrappers")
857898eb4b28 ("selftests: mptcp: add missing join check")
6ef84b1517e0 ("selftests: mptcp: more robust signal race test")
https://lore.kernel.org/all/20220221131842.468893-1-broonie@kernel.org/
drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/act.h
drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/ct.c
fb7e76ea3f3b6 ("net/mlx5e: TC, Skip redundant ct clear actions")
c63741b426e11 ("net/mlx5e: Fix MPLSoUDP encap to use MPLS action information")
09bf97923224f ("net/mlx5e: TC, Move pedit_headers_action to parse_attr")
84ba8062e383 ("net/mlx5e: Test CT and SAMPLE on flow attr")
efe6f961cd2e ("net/mlx5e: CT, Don't set flow flag CT for ct clear flow")
3b49a7edec1d ("net/mlx5e: TC, Reject rules with multiple CT actions")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bpf and netfilter.
Current release - regressions:
- bpf: fix crash due to out of bounds access into reg2btf_ids
- mvpp2: always set port pcs ops, avoid null-deref
- eth: marvell: fix driver load from initrd
- eth: intel: revert "Fix reset bw limit when DCB enabled with 1 TC"
Current release - new code bugs:
- mptcp: fix race in overlapping signal events
Previous releases - regressions:
- xen-netback: revert hotplug-status changes causing devices to not
be configured
- dsa:
- avoid call to __dev_set_promiscuity() while rtnl_mutex isn't
held
- fix panic when removing unoffloaded port from bridge
- dsa: microchip: fix bridging with more than two member ports
Previous releases - always broken:
- bpf:
- fix crash due to incorrect copy_map_value when both spin lock
and timer are present in a single value
- fix a bpf_timer initialization issue with clang
- do not try bpf_msg_push_data with len 0
- add schedule points in batch ops
- nf_tables:
- unregister flowtable hooks on netns exit
- correct flow offload action array size
- fix a couple of memory leaks
- vsock: don't check owner in vhost_vsock_stop() while releasing
- gso: do not skip outer ip header in case of ipip and net_failover
- smc: use a mutex for locking "struct smc_pnettable"
- openvswitch: fix setting ipv6 fields causing hw csum failure
- mptcp: fix race in incoming ADD_ADDR option processing
- sysfs: add check for netdevice being present to speed_show
- sched: act_ct: fix flow table lookup after ct clear or switching
zones
- eth: intel: fixes for SR-IOV forwarding offloads
- eth: broadcom: fixes for selftests and error recovery
- eth: mellanox: flow steering and SR-IOV forwarding fixes
Misc:
- make __pskb_pull_tail() & pskb_carve_frag_list() drop_monitor
friends not report freed skbs as drops
- force inlining of checksum functions in net/checksum.h"
* tag 'net-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (85 commits)
net: mv643xx_eth: process retval from of_get_mac_address
ping: remove pr_err from ping_lookup
Revert "i40e: Fix reset bw limit when DCB enabled with 1 TC"
openvswitch: Fix setting ipv6 fields causing hw csum failure
ipv6: prevent a possible race condition with lifetimes
net/smc: Use a mutex for locking "struct smc_pnettable"
bnx2x: fix driver load from initrd
Revert "xen-netback: Check for hotplug-status existence before watching"
Revert "xen-netback: remove 'hotplug-status' once it has served its purpose"
net/mlx5e: Fix VF min/max rate parameters interchange mistake
net/mlx5e: Add missing increment of count
net/mlx5e: MPLSoUDP decap, fix check for unsupported matches
net/mlx5e: Fix MPLSoUDP encap to use MPLS action information
net/mlx5e: Add feature check for set fec counters
net/mlx5e: TC, Skip redundant ct clear actions
net/mlx5e: TC, Reject rules with forward and drop actions
net/mlx5e: TC, Reject rules with drop and modify hdr action
net/mlx5e: kTLS, Use CHECKSUM_UNNECESSARY for device-offloaded packets
net/mlx5e: Fix wrong return value on ioctl EEPROM query failure
net/mlx5: Fix possible deadlock on rule deletion
...
|