Age | Commit message (Collapse) | Author | Files | Lines |
|
Vport number is 16-bit field in hardware. Make it u16.
Move location of vport in the structure so that it reduces a hole
in the structure.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Maor Gottlieb says:
====================
This series from Maor extends MEMIC to support atomic operations from the
host in addition to already supported regular read/write.
====================
* 'memic_ops':
RDMA/mlx5: Expose UAPI to query DM
RDMA/mlx5: Add support in MEMIC operations
RDMA/mlx5: Add support to MODIFY_MEMIC command
RDMA/mlx5: Re-organize the DM code
RDMA/mlx5: Move all DM logic to separate file
RDMA/uverbs: Make UVERBS_OBJECT_METHODS to consider line number
net/mlx5: Add MEMIC operations related bits
|
|
Add the MEMIC operations bits and structures to the mlx5_ifc file.
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Saeed Mahameed says:
====================
This pr contains changes from mlx5-next branch,
already reviewed on netdev and rdma mailing lists, links below.
1) From Leon, Dynamically assign MSI-X vectors count
Already Acked by Bjorn Helgaas.
https://patchwork.kernel.org/project/netdevbpf/cover/20210314124256.70253-1-leon@kernel.org/
2) Cleanup series:
https://patchwork.kernel.org/project/netdevbpf/cover/20210311070915.321814-1-saeed@kernel.org/
From Mark, E-Switch cleanups and refactoring, and the addition
of single FDB mode needed HW bits.
From Mikhael, Remove unused struct field
From Saeed, Cleanup W=1 prototype warning
From Zheng, Esw related cleanup
From Tariq, User order-0 page allocation for EQs
====================
* mlx5-next:
net/mlx5: Implement sriov_get_vf_total_msix/count() callbacks
net/mlx5: Dynamically assign MSI-X vectors count
net/mlx5: Add dynamic MSI-X capabilities bits
PCI/IOV: Add sysfs MSI-X vector assignment interface
net/mlx5: Use order-0 allocations for EQs
net/mlx5: Add IFC bits needed for single FDB mode
net/mlx5: E-Switch, Refactor send to vport to be more generic
RDMA/mlx5: Use representor E-Switch when getting netdev and metadata
net/mlx5: E-Switch, Add eswitch pointer to each representor
net/mlx5: E-Switch, Add match on vhca id to default send rules
net/mlx5: Remove unused mlx5_core_health member recover_work
net/mlx5: simplify the return expression of mlx5_esw_offloads_pair()
net/mlx5: Cleanup prototype warning
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Allow the driver to recognise DSFP transceiver module ID and therefore
allow its EEPROM dumps using ethtool.
Signed-off-by: Vladyslav Tarasiuk <vladyslavt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Implement ethtool_ops::get_module_eeprom_by_page() to enable
support of new SFP standards.
Signed-off-by: Vladyslav Tarasiuk <vladyslavt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Prepare for ethtool_ops::get_module_eeprom_data() implementation by
extracting common part of mlx5_query_module_eeprom() into a separate
function.
Signed-off-by: Vladyslav Tarasiuk <vladyslavt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Conflicts:
MAINTAINERS
- keep Chandrasekar
drivers/net/ethernet/mellanox/mlx5/core/en_main.c
- simple fix + trust the code re-added to param.c in -next is fine
include/linux/bpf.h
- trivial
include/linux/ethtool.h
- trivial, fix kdoc while at it
include/linux/skmsg.h
- move to relevant place in tcp.c, comment re-wrapped
net/core/skmsg.c
- add the sk = sk // sk = NULL around calls
net/tipc/crypto.c
- trivial
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Saeed Mahameed says:
====================
mlx5-next 2021-04-09
This pr contains changes from mlx5-next branch,
already reviewed on netdev and rdma mailing lists, links below.
1) From Leon, Dynamically assign MSI-X vectors count
Already Acked by Bjorn Helgaas.
https://patchwork.kernel.org/project/netdevbpf/cover/20210314124256.70253-1-leon@kernel.org/
2) Cleanup series:
https://patchwork.kernel.org/project/netdevbpf/cover/20210311070915.321814-1-saeed@kernel.org/
From Mark, E-Switch cleanups and refactoring, and the addition
of single FDB mode needed HW bits.
From Mikhael, Remove unused struct field
From Saeed, Cleanup W=1 prototype warning
From Zheng, Esw related cleanup
From Tariq, User order-0 page allocation for EQs
* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
net/mlx5: Implement sriov_get_vf_total_msix/count() callbacks
net/mlx5: Dynamically assign MSI-X vectors count
net/mlx5: Add dynamic MSI-X capabilities bits
PCI/IOV: Add sysfs MSI-X vector assignment interface
net/mlx5: Use order-0 allocations for EQs
net/mlx5: Add IFC bits needed for single FDB mode
net/mlx5: E-Switch, Refactor send to vport to be more generic
RDMA/mlx5: Use representor E-Switch when getting netdev and metadata
net/mlx5: E-Switch, Add eswitch pointer to each representor
net/mlx5: E-Switch, Add match on vhca id to default send rules
net/mlx5: Remove unused mlx5_core_health member recover_work
net/mlx5: simplify the return expression of mlx5_esw_offloads_pair()
net/mlx5: Cleanup prototype warning
====================
Link: https://lore.kernel.org/r/20210409200704.10886-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Currently reg_c0 lower 16 bits and reg_b are used to store the chain
id that missed in FDB and NIC tables accordingly. However, the
registers' values may index a restore object, rather than a single u32
value. Different object types can be used to restore mutually exclusive
contexts such as chain id and sample group id.
Use the mapping object to associate an index with a restore object
as a prestep for supporting additional restore types.
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Add reserved mapping to cover all the register in order to avoid setting
arbitrary values to newer FW which implements the reserved fields.
Fixes: 50b4a3c23646 ("net/mlx5: PPTB and PBMC register firmware command support")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Add reserved mapping to cover all the register in order to avoid
setting arbitrary values to newer FW which implements the reserved
fields.
Fixes: a58837f52d43 ("net/mlx5e: Expose FEC feilds and related capability bit")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
The cited commit wrongly placed log_max_flow_counter field of
mlx5_ifc_flow_table_prop_layout_bits, align it to the HW spec intended
placement.
Fixes: 16f1c5bb3ed7 ("net/mlx5: Check device capability for maximum flow counters")
Signed-off-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
These new fields declare the number of MSI-X vectors that is possible to
allocate on the VF through PF configuration.
Value must be in range defined by min_dynamic_vf_msix_table_size and
max_dynamic_vf_msix_table_size.
The driver should continue to query its MSI-X table through PCI
configuration header.
Link: https://lore.kernel.org/linux-pci/20210314124256.70253-3-leon@kernel.org
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
A device supports 128 rate limiters. A static table allocation consumes
8KB of memory even when rate is not configured.
Instead, allocate the table when at least one rate is configured.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
mlx5_rl_entry structure is not properly packed as shown below. Due to this
an array of size 9144 bytes allocated which is aligned to 16Kbytes.
Hence, pack the structure and avoid the wastage.
This offers 8Kbytes of saving per mlx5_core_dev struct.
pahole -C mlx5_rl_entry drivers/net/ethernet/mellanox/mlx5/core/en_main.o
Existing layout:
struct mlx5_rl_entry {
u8 rl_raw[48]; /* 0 48 */
u16 index; /* 48 2 */
/* XXX 6 bytes hole, try to pack */
u64 refcount; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
u16 uid; /* 64 2 */
u8 dedicated:1; /* 66: 0 1 */
/* size: 72, cachelines: 2, members: 5 */
/* sum members: 60, holes: 1, sum holes: 6 */
/* sum bitfield members: 1 bits (0 bytes) */
/* padding: 5 */
/* bit_padding: 7 bits */
/* last cacheline: 8 bytes */
};
After alignment:
struct mlx5_rl_entry {
u8 rl_raw[48]; /* 0 48 */
u64 refcount; /* 48 8 */
u16 index; /* 56 2 */
u16 uid; /* 58 2 */
u8 dedicated:1; /* 60: 0 1 */
/* size: 64, cachelines: 1, members: 5 */
/* padding: 3 */
/* bit_padding: 7 bits */
};
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When switching modes between legacy and switchdev and back, do not
reload ethernet interfaces. just change the profile from nic profile
to uplink rep profile in switchdev mode.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
We re-use the native NIC port net device instance for the Uplink
representor, and the devlink port.
When changing profiles we reset the mlx5e priv but we should still
use the devlink port so move it to mlx5e resources.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
This is to separate between resources attributes and other
attributes we will want to use.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Support matching on ICMPv4/6 type and code parameters using misc3
section of match parameters.
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Currently we are allocating high-order page for EQs. In case of
fragmented system, VF hot remove/add in VMs for example, there isn't
enough contiguous memory for EQs allocation, which results in crashing
of the VM.
Therefore, use order-0 fragments for the EQ allocations instead.
Performance tests:
ConnectX-5 100Gbps, CPU: Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz
Performance tests show no sensible degradation.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Currently we operate in a mode where each eswitch manager has a separate
FDB. In order to combine these multiple FDBs we expose new caps to allow
this:
- Set root flow table which isn't native.
- Set FDB a different selection mode when in LAG mode.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Now that each representor stores a pointer to the managing E-Switch
use that information when creating the send-to-vport rules.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Store the managing E-Switch of each representor. This will be used
when a representor is created on eswitch manager 0 but the vport
belongs to eswitch manager 1.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
The code related to health->recover_work was removed in
commit 63cbc552eebf ("net/mlx5: Handle SW reset of FW in error flow")
Fix struct mlx5_core_health accordingly.
Signed-off-by: Mikhael Goikhman <migo@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
mlx5_is_roce_enabled returns the devlink RoCE init value, therefore it
should be used only when driver is loaded. Instead we just need to read
the roce_en field.
In addition, rename mlx5_is_roce_enabled to mlx5_is_roce_init_enabled.
Fixes: 7a58779edd75 ("IB/mlx5: Improve query port for representor port")
Link: https://lore.kernel.org/r/20210304124517.1100608-2-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
QPs which don't care from timestamp mode, should set the ts_format
to default, otherwise the QP creation could be failed if the timestamp
mode is not supported.
Fixes: 2fe8d4b87802 ("RDMA/mlx5: Fail QP creation if the device can not support the CQE TS")
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Pull rdma updates from Jason Gunthorpe:
"This is quite a small cycle, if not for Lee's 70 patches cleaning the
kdocs it would be well below typical for patch count.
Most of the interesting work here was in the HNS and rxe drivers which
got fairly major internal changes.
Summary:
- Driver updates and bug fixes: siw, hns, bnxt_re, mlx5, efa
- Significant rework in rxe to get it ready to have XRC support added
- Several rts bug fixes
- Big series to get to 'make W=1' cleanness, primarily updating kdocs
- Support for creating a RDMA MR from a DMABUF fd to allow PCI peer
to peer transfers to GPU VRAM
- Device disassociation now works properly with umad
- Work to support more than 255 ports on a RDMA device
- Further support for the new HNS HIP09 hardware
- Coding style cleanups: comma to semicolon, unneded semicolon/blank
lines, remove 'h' printk format, don't check for NULL before kfree,
use true/false for bool"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (205 commits)
RDMA/rtrs-srv: Do not pass a valid pointer to PTR_ERR()
RDMA/srp: Fix support for unpopulated and unbalanced NUMA nodes
RDMA/mlx5: Fail QP creation if the device can not support the CQE TS
RDMA/mlx5: Allow CQ creation without attached EQs
RDMA/rtrs-srv-sysfs: fix missing put_device
RDMA/rtrs-srv: fix memory leak by missing kobject free
RDMA/rtrs: Only allow addition of path to an already established session
RDMA/rtrs-srv: Fix stack-out-of-bounds
RDMA/rxe: Remove unused pkt->offset
RDMA/ucma: Fix use-after-free bug in ucma_create_uevent
RDMA/core: Fix kernel doc warnings for ib_port_immutable_read()
RDMA/qedr: Use true and false for bool variable
RDMA/hns: Adjust definition of FRMR fields
RDMA/hns: Refactor process of posting CMDQ
RDMA/hns: Adjust fields and variables about CMDQ tail/head
RDMA/hns: Remove redundant operations on CMDQ
RDMA/hns: Fixes missing error code of CMDQ
RDMA/hns: Remove unused member and variable of CMDQ
RDMA/ipoib: Remove racy Subnet Manager sendonly join checks
RDMA/mlx5: Support 400Gbps IB rate in mlx5 driver
...
|
|
Linux 5.11
Merged to resolve conflicts with RDMA rc commits
- drivers/infiniband/sw/rxe/rxe_net.c
The final logic is to call rxe_get_dev_from_net() again with the master
netdev if the packet was rx'd on a vlan. To keep the elimination of the
local variables requires a trivial edit to the code in -rc
Link: https://lore.kernel.org/r/20210210131542.215ea67c@canb.auug.org.au
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Saeed Mahameed says:
====================
pull-request: mlx5-next 2021-02-16
The patches in this pr are already submitted and reviewed through the
netdev and rdma mailing lists.
The series includes mlx5 HW bits and definitions for mlx5 real time clock
translation and handling in the mlx5 driver clock module to enable and
support such mode [1]
[1] https://patchwork.kernel.org/project/netdevbpf/patch/20210212223042.449816-7-saeed@kernel.org/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Internal timer mode (SW clock) requires some PTP clock related metadata
structs. Real time mode (HW clock) will not need these metadata structs.
This separation emphasize the different interfaces for HW clock and SW
clock.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Add needed structure layouts and defines for MTUTC (Management UTC)
register. MTUTC will be used for cyc2time HW translation.
In addition, add cyc2time modify capability bit and init segment HCA
real time address.
Finally, add capability bits indicating which time-stamping format is
supported per SQ and RQ. Add ts_format in the queue's context layout to
allow configuration.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Leon Romanovsky says:
====================
Add an extra timestamp format for mlx5_ib device.
====================
Based on the mlx5-next branch at
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
due to dependencies.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
* branch 'mlx5_timestamp':
RDMA/mlx5: Fail QP creation if the device can not support the CQE TS
net/mlx5: Add new timestamp mode bits
|
|
The traditional DevX CQ creation flow goes through mlx5_core_create_cq()
which checks that the given EQN corresponds to an existing EQ and attaches
a devx handler to the EQN for the CQ.
In some cases the EQ will not be a kernel EQ, but will be controlled by
modify CQ, don't block creating these just because the EQN can't be found
in the kernel.
Link: https://lore.kernel.org/r/20210211085549.1277674-1-leon@kernel.org
Signed-off-by: Tal Gilboa <talgi@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
These fields declare which timestamp mode is supported by the device
per RQ/SQ/QP.
In addition add the ts_format field to the select the mode for
RQ/SQ/QP.
Link: https://lore.kernel.org/r/20210209131107.698833-2-leon@kernel.org
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
Device list is not stored in mlx5_priv anymore, so delete it as it's not
used.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Cleanup the synchronize_srcu() from the ODP flow as it was found to be a
very heavy time consumer as part of dereg_mr.
For example de-registration of 10000 ODP MRs each with size of 2M hugepage
took 19.6 sec comparing de-registration of same number of non ODP MRs that
took 172 ms.
The new locking scheme uses the wait_event() mechanism which follows the
use count of the MR instead of using synchronize_srcu().
By that change, the time required for the above test took 95 ms which is
even better than the non ODP flow.
Once fully dropped the srcu usage, had to come with a lock to protect the
XA access.
As part of using the above mechanism we could also clean the
num_deferred_work stuff and follow the use count instead.
Link: https://lore.kernel.org/r/20210202071309.2057998-1-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Previous patch in series that implements stack devices RX path implements
indirect table rules that match on tunnel VNI. After such rule is created
all tunnel traffic is recirculated to root table. However, recirculated
packet might not match on any rules installed in the table (for example,
when IP traffic follows ARP traffic). In that case packets appear on
representor of tunnel endpoint VF instead being redirected to the VF
itself.
Extend slow table with additional flow group that matches on reg_c0 (source
port value set by indirect tables implemented by previous patch in series)
and reg_c1 (special 0xFFF mark). When creating offloads fdb tables, install
one rule per VF vport to match on recirculated miss packets and redirect
them to appropriate VF vport. Modify indirect tables code to also rewrite
reg_c1 with special 0xFFF mark.
Implementation reuses reg_c1 tunnel id bits. This is safe to do because
recirculated packets are always matched before decapsulation.
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Following patch in series uses reg_c1 in eswitch code. To use reg_c1
helpers in both TC and eswitch code, refactor existing helpers according to
similar use case of reg_c0 and move the functionality into eswitch.h.
Calculate reg mappings length from new defines to ensure that they are
always in sync and only need to be changed in single place.
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
When tunnel endpoint is on VF, driver still assumes that endpoint is on
uplink and incorrectly configures encap rule offload according to that
assumption. As a result, traffic is sent directly to the uplink and rules
installed on representor of tunnel endpoint VF are ignored.
Implement following changes to allow offloading tx traffic with tunnel
endpoint on VF:
- For tunneling flows perform route lookup on route and out devices pair.
If out device is uplink and route device is VF of same physical port, then
modify packet reg_c_0 metadata register (source port) with the value of VF
vport. Use eswitch vhca_id->vport mapping introduced in one of previous
patches in the series to obtain vport from route netdevice.
- Recirculate encapsulated packets to VF vport in order to apply any flow
rules installed on VF representor that match on encapsulated traffic.
Only enable support for this functionality when all following conditions
are true:
- Hardware advertises capability to preserve reg_c_0 value on packet
recirculation.
- Vport metadata matching is enabled.
- Termination tables are to be used by the flow.
Example TC rules for VF tunnel traffic:
1. Rule that redirects packets from UL to VF rep that has the tunnel
endpoint IP address:
$ tc -s filter show dev enp8s0f0 ingress
filter protocol ip pref 4 flower chain 0
filter protocol ip pref 4 flower chain 0 handle 0x1
dst_mac 16:c9:a0:2d:69:2c
src_mac 0c:42:a1:58:ab:e4
eth_type ipv4
ip_flags nofrag
in_hw in_hw_count 1
action order 1: mirred (Egress Redirect to device enp8s0f0_0) stolen
index 3 ref 1 bind 1 installed 377 sec used 0 sec
Action statistics:
Sent 114096 bytes 952 pkt (dropped 0, overlimits 0 requeues 0)
Sent software 0 bytes 0 pkt
Sent hardware 114096 bytes 952 pkt
backlog 0b 0p requeues 0
cookie 878fa48d8c423fc08c3b6ca599b50a97
no_percpu
used_hw_stats delayed
2. Rule that decapsulates the tunneled flow and redirects to destination VF
representor:
$ tc -s filter show dev vxlan_sys_4789 ingress
filter protocol ip pref 4 flower chain 0
filter protocol ip pref 4 flower chain 0 handle 0x1
dst_mac ca:2e:a7:3f:f5:0f
src_mac 0a:40:bd:30:89:99
eth_type ipv4
enc_dst_ip 7.7.7.5
enc_src_ip 7.7.7.1
enc_key_id 98
enc_dst_port 4789
enc_tos 0
ip_flags nofrag
in_hw in_hw_count 1
action order 1: tunnel_key unset pipe
index 2 ref 1 bind 1 installed 434 sec used 434 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
used_hw_stats delayed
action order 2: mirred (Egress Redirect to device enp8s0f0_1) stolen
index 4 ref 1 bind 1 installed 434 sec used 0 sec
Action statistics:
Sent 129936 bytes 1082 pkt (dropped 0, overlimits 0 requeues 0)
Sent software 0 bytes 0 pkt
Sent hardware 129936 bytes 1082 pkt
backlog 0b 0p requeues 0
cookie ac17cf398c4c69e4a5b2f7aabd1b88ff
no_percpu
used_hw_stats delayed
Co-developed-by: Dmytro Linkin <dlinkin@nvidia.com>
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
mlx5_port_caps are RDMA specific capabilities. These are not used by the
mlx5_core_device at all. Move them to mlx5_ib_dev where it is used and
reduce the scope of it to multiple drivers.
Link: https://lore.kernel.org/r/20210203130133.4057329-2-leon@kernel.org
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
drivers/net/can/dev.c
b552766c872f ("can: dev: prevent potential information leak in can_fill_info()")
3e77f70e7345 ("can: dev: move driver related infrastructure into separate subdir")
0a042c6ec991 ("can: dev: move netlink related code into seperate file")
Code move.
drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
57ac4a31c483 ("net/mlx5e: Correctly handle changing the number of queues when the interface is down")
214baf22870c ("net/mlx5e: Support HTB offload")
Adjacent code changes
net/switchdev/switchdev.c
20776b465c0c ("net: switchdev: don't set port_obj_info->handled true when -EOPNOTSUPP")
ffb68fc58e96 ("net: switchdev: remove the transaction structure from port object notifiers")
bae33f2b5afe ("net: switchdev: remove the transaction structure from port attributes")
Transaction parameter gets dropped otherwise keep the fix.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5 subfunction support
Parav Pandit says:
This patchset introduces support for mlx5 subfunction (SF).
A subfunction is a lightweight function that has a parent PCI function on
which it is deployed. mlx5 subfunction has its own function capabilities
and its own resources. This means a subfunction has its own dedicated
queues(txq, rxq, cq, eq). These queues are neither shared nor stolen from
the parent PCI function.
When subfunction is RDMA capable, it has its own QP1, GID table and rdma
resources neither shared nor stolen from the parent PCI function.
A subfunction has dedicated window in PCI BAR space that is not shared
with the other subfunctions or parent PCI function. This ensures that all
class devices of the subfunction accesses only assigned PCI BAR space.
A Subfunction supports eswitch representation through which it supports tc
offloads. User must configure eswitch to send/receive packets from/to
subfunction port.
Subfunctions share PCI level resources such as PCI MSI-X IRQs with
their other subfunctions and/or with its parent PCI function.
Subfunction support is discussed in detail in RFC [1] and [2].
RFC [1] and extension [2] describes requirements, design and proposed
plumbing using devlink, auxiliary bus and sysfs for systemd/udev
support. Functionality of this patchset is best explained using real
examples further below.
overview:
--------
A subfunction can be created and deleted by a user using devlink port
add/delete interface.
A subfunction can be configured using devlink port function attribute
before its activated.
When a subfunction is activated, it results in an auxiliary device on
the host PCI device where it is deployed. A driver binds to the
auxiliary device that further creates supported class devices.
example subfunction usage sequence:
-----------------------------------
Change device to switchdev mode:
$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev
Add a devlink port of subfunction flavour:
$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
Configure mac address of the port function:
$ devlink port function set ens2f0npf0sf88 hw_addr 00:00:00:00:88:88
Now activate the function:
$ devlink port function set ens2f0npf0sf88 state active
Now use the auxiliary device and class devices:
$ devlink dev show
pci/0000:06:00.0
auxiliary/mlx5_core.sf.4
$ ip link show
127: ens2f0np0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 24:8a:07:b3:d1:12 brd ff:ff:ff:ff:ff:ff
altname enp6s0f0np0
129: p0sf88: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 00:00:00:00:88:88 brd ff:ff:ff:ff:ff:ff
$ rdma dev show
43: rdmap6s0f0: node_type ca fw 16.29.0550 node_guid 248a:0703:00b3:d112 sys_image_guid 248a:0703:00b3:d112
44: mlx5_0: node_type ca fw 16.29.0550 node_guid 0000:00ff:fe00:8888 sys_image_guid 248a:0703:00b3:d112
After use inactivate the function:
$ devlink port function set ens2f0npf0sf88 state inactive
Now delete the subfunction port:
$ devlink port del ens2f0npf0sf88
[1] https://lore.kernel.org/netdev/20200519092258.GF4655@nanopsycho/
[2] https://marc.info/?l=linux-netdev&m=158555928517777&w=2
=================
* tag 'mlx5-updates-2021-01-13' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5: Add devlink subfunction port documentation
devlink: Extend devlink port documentation for subfunctions
devlink: Add devlink port documentation
net/mlx5: SF, Port function state change support
net/mlx5: SF, Add port add delete functionality
net/mlx5: E-switch, Add eswitch helpers for SF vport
net/mlx5: E-switch, Prepare eswitch to handle SF vport
net/mlx5: SF, Add auxiliary device driver
net/mlx5: SF, Add auxiliary device support
net/mlx5: Introduce vhca state event notifier
devlink: Support get and set state of port function
devlink: Support add and delete devlink port
devlink: Introduce PCI SF port flavour and port attribute
devlink: Prepare code to fill multiple port function attributes
====================
Link: https://lore.kernel.org/r/20210122193658.282884-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Introduce mlx5e_trap which includes a dedicated RQ and NAPI for trapped
packets. Trap-RQ processes packets that were destined to be dropped,
but for debug and visibility sake these packets are trapped and reported
to devlink.
Trap-RQ connects between the HW and the driver and is not a part of a
channel. Open mlx5e_create_rq() and mlx5_core_destroy_rq() as API and
add dedicate RQ handlers which report to devlink of trapped packets.
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In order to allow mlx5 core driver to trigger synchronous operations to
its consumers, add a blocking events handler. Add wrappers to
blocking_notifier_[call_chain/chain_register/chain_unregister]. Add trap
callback for action set and notify about this change. Following patches
in the set add a listener for this event.
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add devlink traps infra-structure to mlx5 core driver. Add traps list to
mlx5_priv and corresponding API:
- mlx5_devlink_trap_report() to wrap trap reports to devlink
- mlx5_devlink_trap_get_num_active() to decide whether to open/close trap
resources.
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This commit adds support for HTB offload in the mlx5e driver.
Performance:
NIC: Mellanox ConnectX-6 Dx
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz (24 cores with HT)
100 Gbit/s line rate, 500 UDP streams @ ~200 Mbit/s each
48 traffic classes, flower used for steering
No shaping (rate limits set to 4 Gbit/s per TC) - checking for max
throughput.
Baseline: 98.7 Gbps, 8.25 Mpps
HTB: 6.7 Gbps, 0.56 Mpps
HTB offload: 95.6 Gbps, 8.00 Mpps
Limitations:
1. 256 leaf nodes, 3 levels of depth.
2. Granularity for ceil is 1 Mbit/s. Rates are converted to weights, and
the bandwidth is split among the siblings according to these weights.
Other parameters for classes are not supported.
Ethtool statistics support for QoS SQs are also added. The counters are
called qos_txN_*, where N is the QoS queue number (starting from 0, the
numeration is separate from the normal SQs), and * is the counter name
(the counters are the same as for the normal SQs).
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
To handle SF port management outside of the eswitch as independent
software layer, introduce eswitch notifier APIs so that mlx5 upper
layer who wish to support sf port management in switchdev mode can
perform its task whenever eswitch mode is set to switchdev or before
eswitch is disabled.
Initialize sf port table on such eswitch event.
Add SF port add and delete functionality in switchdev mode.
Destroy all SF ports when eswitch is disabled.
Expose SF port add and delete to user via devlink commands.
$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev
$ devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false
$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
pci/0000:06:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
function:
hw_addr 00:00:00:00:00:00 state inactive opstate detached
$ devlink port show ens2f0npf0sf88
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
function:
hw_addr 00:00:00:00:00:00 state inactive opstate detached
or by its unique port index:
$ devlink port show pci/0000:06:00.0/32768
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
function:
hw_addr 00:00:00:00:00:00 state inactive opstate detached
$ devlink port show ens2f0npf0sf88 -jp
{
"port": {
"pci/0000:06:00.0/32768": {
"type": "eth",
"netdev": "ens2f0npf0sf88",
"flavour": "pcisf",
"controller": 0,
"pfnum": 0,
"sfnum": 88,
"external": false,
"splittable": false,
"function": {
"hw_addr": "00:00:00:00:00:00",
"state": "inactive",
"opstate": "detached"
}
}
}
}
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Add auxiliary device driver for mlx5 subfunction auxiliary device.
A mlx5 subfunction is similar to PCI PF and VF. For a subfunction
an auxiliary device is created.
As a result, when mlx5 SF auxiliary device binds to the driver,
its netdev and rdma device are created, they appear as
$ ls -l /sys/bus/auxiliary/devices/
mlx5_core.sf.4 -> ../../../devices/pci0000:00/0000:00:03.0/0000:06:00.0/mlx5_core.sf.4
$ ls -l /sys/class/net/eth1/device
/sys/class/net/eth1/device -> ../../../mlx5_core.sf.4
$ cat /sys/bus/auxiliary/devices/mlx5_core.sf.4/sfnum
88
$ devlink dev show
pci/0000:06:00.0
auxiliary/mlx5_core.sf.4
$ devlink port show auxiliary/mlx5_core.sf.4/1
auxiliary/mlx5_core.sf.4/1: type eth netdev p0sf88 flavour virtual port 0 splittable false
$ rdma link show mlx5_0/1
link mlx5_0/1 state ACTIVE physical_state LINK_UP netdev p0sf88
$ rdma dev show
8: rocep6s0f1: node_type ca fw 16.29.0550 node_guid 248a:0703:00b3:d113 sys_image_guid 248a:0703:00b3:d112
13: mlx5_0: node_type ca fw 16.29.0550 node_guid 0000:00ff:fe00:8888 sys_image_guid 248a:0703:00b3:d112
In future, devlink device instance name will adapt to have sfnum
annotation using either an alias or as devlink instance name described
in RFC [1].
[1] https://lore.kernel.org/netdev/20200519092258.GF4655@nanopsycho/
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|