path: root/drivers/infiniband
Age | Commit message | Author | Files | Lines
3 daysMerge tag 'driver-core-6.13-rc1' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is a small set of driver core changes for 6.13-rc1. Nothing major for this merge cycle, except for the two simple merge conflicts are here just to make life interesting. Included in here are: - sysfs core changes and preparations for more sysfs api cleanups that can come through all driver trees after -rc1 is out - fw_devlink fixes based on many reports and debugging sessions - list_for_each_reverse() removal, no one was using it! - last-minute seq_printf() format string bug found and fixed in many drivers all at once. - minor bugfixes and changes full details in the shortlog" * tag 'driver-core-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (35 commits) Fix a potential abuse of seq_printf() format string in drivers cpu: Remove spurious NULL in attribute_group definition s390/con3215: Remove spurious NULL in attribute_group definition perf: arm-ni: Remove spurious NULL in attribute_group definition driver core: Constify bin_attribute definitions sysfs: attribute_group: allow registration of const bin_attribute firmware_loader: Fix possible resource leak in fw_log_firmware_info() drivers: core: fw_devlink: Fix excess parameter description in docstring driver core: class: Correct WARN() message in APIs class_(for_each|find)_device() cacheinfo: Use of_property_present() for non-boolean properties cdx: Fix cdx_mmap_resource() after constifying attr in ->mmap() drivers: core: fw_devlink: Make the error message a bit more useful phy: tegra: xusb: Set fwnode for xusb port devices drm: display: Set fwnode for aux bus devices driver core: fw_devlink: Stop trying to optimize cycle detection logic driver core: Constify attribute arguments of binary attributes sysfs: bin_attribute: add const read/write callback variants sysfs: implement all BIN_ATTR_* macros in terms of __BIN_ATTR() sysfs: treewide: constify attribute callback of 
bin_attribute::llseek() sysfs: treewide: constify attribute callback of bin_attribute::mmap() ...
10 daysMerge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds51-492/+1821
Pull rdma updates from Jason Gunthorpe: "Several fixes scattered across the drivers and a few new features: - Minor updates and bug fixes to hfi1, efa, ipoib, bnxt, hns - Force disassociate the userspace FD when hns does an async reset - bnxt new features for optimized modify QP to skip certain states, CQ coalescing, better debug dumping - mlx5 new data placement ordering feature - Faster destruction of mlx5 devx HW objects - Improvements to RDMA CM mad handling" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (51 commits) RDMA/bnxt_re: Correct the sequence of device suspend RDMA/bnxt_re: Use the default mode of congestion control RDMA/bnxt_re: Support different traffic class IB/cm: Rework sending DREQ when destroying a cm_id IB/cm: Do not hold reference on cm_id unless needed IB/cm: Explicitly mark if a response MAD is a retransmission RDMA/mlx5: Move events notifier registration to be after device registration RDMA/bnxt_re: Cache MSIx info to a local structure RDMA/bnxt_re: Refurbish CQ to NQ hash calculation RDMA/bnxt_re: Refactor NQ allocation RDMA/bnxt_re: Fail probe early when not enough MSI-x vectors are reserved RDMA/hns: Fix different dgids mapping to the same dip_idx RDMA/bnxt_re: Add set_func_resources support for P5/P7 adapters RDMA/bnxt_re: Enhance RoCE SRIOV resource configuration design bnxt_en: Add support for RoCE sriov configuration RDMA/hns: Fix NULL pointer derefernce in hns_roce_map_mr_sg() RDMA/hns: Fix out-of-order issue of requester when setting FENCE RDMA/nldev: Add IB device and net device rename events RDMA/mlx5: Add implementation for ufile_hw_cleanup device operation RDMA/core: Move ib_uverbs_file struct to uverbs_types.h ...
2024-11-18Merge tag 'pull-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds2-18/+9
Pull 'struct fd' class updates from Al Viro: "The bulk of struct fd memory safety stuff Making sure that struct fd instances are destroyed in the same scope where they'd been created, getting rid of reassignments and passing them by reference, converting to CLASS(fd{,_pos,_raw}). We are getting very close to having the memory safety of that stuff trivial to verify" * tag 'pull-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (28 commits) deal with the last remaing boolean uses of fd_file() css_set_fork(): switch to CLASS(fd_raw, ...) memcg_write_event_control(): switch to CLASS(fd) assorted variants of irqfd setup: convert to CLASS(fd) do_pollfd(): convert to CLASS(fd) convert do_select() convert vfs_dedupe_file_range(). convert cifs_ioctl_copychunk() convert media_request_get_by_fd() convert spu_run(2) switch spufs_calls_{get,put}() to CLASS() use convert cachestat(2) convert do_preadv()/do_pwritev() fdget(), more trivial conversions fdget(), trivial conversions privcmd_ioeventfd_assign(): don't open-code eventfd_ctx_fdget() o2hb_region_dev_store(): avoid goto around fdget()/fdput() introduce "fd_pos" class, convert fdget_pos() users to it. fdget_raw() users: switch to CLASS(fd_raw) convert vmsplice() to CLASS(fd) ...
2024-11-17RDMA/bnxt_re: Correct the sequence of device suspendKalesh AP1-23/+5
When in fatal error condition, mark device as detached first and then complete all pending HWRM commands as firmware is not going to process them and eventually time out. Move the device to error only if suspend is called when device is in Fatal state. Also, remove some outdated comments. Remove the stop_irq call which is no longer required. Fixes: cc5b9b48d447 ("RDMA/bnxt_re: Recover the device when FW error is detected") Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1731660464-27838-4-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-17RDMA/bnxt_re: Use the default mode of congestion controlKalesh AP1-3/+2
Instead of driver setting the congestion mode, use the default values setup by Firmware. Enable the tos_ecn field in FW. Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1731660464-27838-3-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-17RDMA/bnxt_re: Support different traffic classChandramohan Akula2-2/+12
Adding support for different traffic class passed to driver. Fix the traffic class setting in modify_qp by skipping the ECN bits. Pass the service level received from applications to the firmware. Signed-off-by: Chandramohan Akula <chandramohan.akula@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1731660464-27838-2-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
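The ECN handling mentioned above follows from the layout of the 8-bit traffic class (IP ToS) byte: the upper six bits carry the DSCP and the lower two bits carry ECN. A minimal sketch in plain C of separating the two fields; the helper names `tclass_to_dscp` and `tclass_to_ecn` are hypothetical and are not taken from the bnxt_re driver:

```c
#include <assert.h>
#include <stdint.h>

/* The two least significant bits of the traffic class byte are ECN. */
#define TCLASS_ECN_BITS 2
#define TCLASS_ECN_MASK 0x3u

/* Hypothetical helper: skip the ECN bits and keep the 6-bit DSCP. */
static inline uint8_t tclass_to_dscp(uint8_t tclass)
{
    return (uint8_t)(tclass >> TCLASS_ECN_BITS);
}

/* Hypothetical helper: keep only the 2-bit ECN field. */
static inline uint8_t tclass_to_ecn(uint8_t tclass)
{
    return (uint8_t)(tclass & TCLASS_ECN_MASK);
}

/* Usage example: 0xB9 is DSCP 46 (Expedited Forwarding) with ECN 1. */
static void tclass_example(void)
{
    assert(tclass_to_dscp(0xB9) == 46);
    assert(tclass_to_ecn(0xB9) == 1);
}
```

Whether the driver shifts or masks at exactly this point is not stated in the commit message; the sketch only shows why the low two bits must not be treated as part of the traffic class value.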
2024-11-17IB/cm: Rework sending DREQ when destroying a cm_idSean Hefty1-21/+32
A DREQ is sent in 2 situations: 1. When requested by the user. This DREQ has to wait for a DREP, which will be routed to the user. 2. When the cm_id is destroyed. This DREQ is generated by the CM to notify the peer that the connection has been destroyed. In the latter case, any DREP that is received will be discarded. There's no need to hold a reference on the cm_id. Today, both situations are covered by the same function: cm_send_dreq_locked(). When invoked in the cm_id destroy path, the cm_id reference would be held until the DREQ completes, blocking the destruction. Because it could take several seconds to minutes before the DREQ receives a DREP, the destroy call posts a send for the DREQ then immediately cancels the MAD. However, cancellation is not immediate in the MAD layer. There could still be a delay before the MAD layer returns the DREQ to the CM. Moreover, the only guarantee is that the DREQ will be sent at most once. Introduce a separate flow for sending a DREQ when destroying the cm_id. The new flow will not hold a reference on the cm_id, allowing it to be cleaned up immediately. The cancellation trick is no longer needed. The MAD layer will send the DREQ exactly once. Signed-off-by: Sean Hefty <shefty@nvidia.com> Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Signed-off-by: Vlad Dumitrescu <vdumitrescu@nvidia.com> Link: https://patch.msgid.link/a288a098b8e0550305755fd4a7937431699317f4.1731495873.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-17IB/cm: Do not hold reference on cm_id unless neededSean Hefty1-42/+24
Typically, when the CM sends a MAD it bumps a reference count on the associated cm_id. There are some exceptions, such as when the MAD is a direct response to a receive MAD. For example, the CM may generate an MRA in response to a duplicate REQ. But, in general, if a MAD may be sent as a result of the user invoking an API call (e.g. ib_send_cm_rep(), ib_send_cm_rtu(), etc.), a reference is taken on the cm_id. This reference is necessary if the MAD requires a response. The reference allows routing a response MAD back to the cm_id, or, if no response is received, allows updating the cm_id state to reflect the failure. For MADs which do not generate a response from the target, however, there's no need to hold a reference on the cm_id. Such MADs will not be retried by the MAD layer and their completions do not change the state of the cm_id. There are 2 internal calls used to allocate MADs which take a reference on the cm_id: cm_alloc_msg() and cm_alloc_priv_msg(). The latter calls the former. It turns out that all other places where cm_alloc_msg() is called are for MADs that do not generate a response from the target: sending an RTU, DREP, REJ, MRA, or SIDR REP. In all of these cases, there's no need to hold a reference on the cm_id. The benefit of dropping unneeded references is that it allows destruction of the cm_id to proceed immediately. Currently, the cm_destroy_id() call blocks as long as there's a reference held on the cm_id. Worse, is that cm_destroy_id() will send MADs, which it then needs to complete. Sending the MADs is beneficial, as they notify the peer that a connection is being destroyed. However, since the MADs hold a reference on the cm_id, they block destruction and cannot be retried. Move cm_id referencing from cm_alloc_msg() to cm_alloc_priv_msg(). The latter should hold a reference on the cm_id in all cases but one, which will be handled in a separate patch. 
cm_alloc_priv_msg() is used when sending a REQ, REP, DREQ, and SIDR REQ, all of which require a response. Also, merge common code into cm_alloc_priv_msg() and combine the freeing of all messages which do not need a response. Signed-off-by: Sean Hefty <shefty@nvidia.com> Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Signed-off-by: Vlad Dumitrescu <vdumitrescu@nvidia.com> Link: https://patch.msgid.link/1f0f96acace72790ecf89087fc765dead960189e.1731495873.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
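The rule described in this commit message (hold a cm_id reference only for MADs that expect a response) can be modeled in a few lines of userspace C. This is a toy sketch, not the kernel code: `toy_cm_id`, `toy_alloc_msg` and the enum are hypothetical names, and the destroy-path DREQ exception mentioned above is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* CM message types named in the commit message. */
enum cm_msg_kind {
    MSG_REQ, MSG_REP, MSG_DREQ, MSG_SIDR_REQ,          /* expect a response */
    MSG_RTU, MSG_DREP, MSG_REJ, MSG_MRA, MSG_SIDR_REP  /* no response */
};

/* Toy stand-in for a cm_id: only the reference count is modeled. */
struct toy_cm_id {
    int refcount;
};

static bool msg_expects_response(enum cm_msg_kind kind)
{
    switch (kind) {
    case MSG_REQ:
    case MSG_REP:
    case MSG_DREQ:
    case MSG_SIDR_REQ:
        return true;
    default:
        return false;
    }
}

/*
 * Take a reference only when a response has to be routed back to the
 * cm_id. Returns true if a reference was taken.
 */
static bool toy_alloc_msg(struct toy_cm_id *id, enum cm_msg_kind kind)
{
    assert(id != NULL);
    if (!msg_expects_response(kind))
        return false;
    id->refcount++;
    return true;
}
```

With this split, sending an RTU or a REJ leaves the count untouched, so a destroy path that waits for the count to drop is not delayed by those messages.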
2024-11-17IB/cm: Explicitly mark if a response MAD is a retransmissionSean Hefty1-20/+31
In several situations the CM may send a reply to a received MAD without the reply being directly linked with a cm_id. For example, it may send a REJ in response to a REQ which does not match a listener. Or, it may send a DREP in response to a DREQ if the cm_id has already been destroyed. This can happen if the original DREP was lost and the DREQ was retried. When such a response MAD completes, it updates a counter tracking how many MADs were retried. However, not all response MADs issued directly by the CM may be retries. The REJ mentioned in the example above is such a case. To distinguish between responses which were retries versus those that are not, the send_handler performs the following check: is a retry if the response is not associated with a cm_id and the response is not a REJ message. Replace this indirect method of checking if a response is a retry with an explicit check. Note that these retries are generated directly by the CM, rather than retried by the MAD layer. This change will be needed by later changes which would otherwise break the indirect check. Signed-off-by: Sean Hefty <shefty@nvidia.com> Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Signed-off-by: Vlad Dumitrescu <vdumitrescu@nvidia.com> Link: https://patch.msgid.link/1ee6e2a68f8de1992b9da23aa1d7e3f9f25e0036.1731495873.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-14RDMA/mlx5: Move events notifier registration to be after device registrationPatrisious Haddad2-22/+20
Move pkey change work initialization and cleanup from device resources stage to notifier stage, since this is the stage which handles this work events. Fix a race between the device deregistration and pkey change work by moving MLX5_IB_STAGE_DEVICE_NOTIFIER to be after MLX5_IB_STAGE_IB_REG in order to ensure that the notifier is deregistered before the device during cleanup. Which ensures there are no works that are being executed after the device has already unregistered which can cause the panic below. BUG: kernel NULL pointer dereference, address: 0000000000000000 PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 1 PID: 630071 Comm: kworker/1:2 Kdump: loaded Tainted: G W OE --------- --- 5.14.0-162.6.1.el9_1.x86_64 #1 Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008 02/27/2023 Workqueue: events pkey_change_handler [mlx5_ib] RIP: 0010:setup_qp+0x38/0x1f0 [mlx5_ib] Code: ee 41 54 45 31 e4 55 89 f5 53 48 89 fb 48 83 ec 20 8b 77 08 65 48 8b 04 25 28 00 00 00 48 89 44 24 18 48 8b 07 48 8d 4c 24 16 <4c> 8b 38 49 8b 87 80 0b 00 00 4c 89 ff 48 8b 80 08 05 00 00 8b 40 RSP: 0018:ffffbcc54068be20 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffff954054494128 RCX: ffffbcc54068be36 RDX: ffff954004934000 RSI: 0000000000000001 RDI: ffff954054494128 RBP: 0000000000000023 R08: ffff954001be2c20 R09: 0000000000000001 R10: ffff954001be2c20 R11: ffff9540260133c0 R12: 0000000000000000 R13: 0000000000000023 R14: 0000000000000000 R15: ffff9540ffcb0905 FS: 0000000000000000(0000) GS:ffff9540ffc80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000010625c001 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: mlx5_ib_gsi_pkey_change+0x20/0x40 [mlx5_ib] process_one_work+0x1e8/0x3c0 worker_thread+0x50/0x3b0 ? rescuer_thread+0x380/0x380 kthread+0x149/0x170 ? 
set_kthread_struct+0x50/0x50 ret_from_fork+0x22/0x30 Modules linked in: rdma_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_ib(OE) mlx5_fwctl(OE) fwctl(OE) ib_uverbs(OE) mlx5_core(OE) mlxdevm(OE) ib_core(OE) mlx_compat(OE) psample mlxfw(OE) tls knem(OE) netconsole nfsv3 nfs_acl nfs lockd grace fscache netfs qrtr rfkill sunrpc intel_rapl_msr intel_rapl_common rapl hv_balloon hv_utils i2c_piix4 pcspkr joydev fuse ext4 mbcache jbd2 sr_mod sd_mod cdrom t10_pi sg ata_generic pci_hyperv pci_hyperv_intf hyperv_drm drm_shmem_helper drm_kms_helper hv_storvsc syscopyarea hv_netvsc sysfillrect sysimgblt hid_hyperv fb_sys_fops scsi_transport_fc hyperv_keyboard drm ata_piix crct10dif_pclmul crc32_pclmul crc32c_intel libata ghash_clmulni_intel hv_vmbus serio_raw [last unloaded: ib_core] CR2: 0000000000000000 ---[ end trace f6f8be4eae12f7bc ]--- Fixes: 7722f47e71e5 ("IB/mlx5: Create GSI transmission QPs when P_Key table is changed") Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://patch.msgid.link/d271ceeff0c08431b3cbbbb3e2d416f09b6d1621.1731496944.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-14RDMA/bnxt_re: Cache MSIx info to a local structureKalesh AP2-8/+11
The L2 driver allocates the vectors for RoCE and passes them to RoCE through the en_dev structure. During probe, cache the MSIx related info to a local structure. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/1731577748-1804-5-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-14RDMA/bnxt_re: Refurbish CQ to NQ hash calculationKalesh AP5-11/+32
In a few use cases a CQ is created and destroyed before the CQ is re-created. This pattern disturbs the round-robin (RR) distribution, and all the active CQs end up mapped alternately to only 2 NQs. Fix the CQ to NQ hash calculation by implementing a quick load sorting mechanism under a mutex. With this, even if a CQ was allocated and destroyed before being used, the NQ selection algorithm still obtains the least loaded NQ, thus balancing the load on the NQs. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/1731577748-1804-4-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
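A least-loaded selection of the kind described here can be sketched with a per-NQ load counter. This is an illustrative userspace model under stated assumptions: `NUM_NQ`, `nq_load`, `nq_get_least_loaded` and `nq_put` are hypothetical names, a linear scan stands in for the driver's sorting mechanism, and the mutex the commit mentions is omitted.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_NQ 4 /* assumed number of notification queues */

/* Number of CQs currently mapped to each NQ. */
static unsigned int nq_load[NUM_NQ];

/*
 * Pick the NQ with the fewest CQs (ties go to the lowest index) and
 * account for the new CQ. The caller would hold a lock in real code.
 */
static size_t nq_get_least_loaded(void)
{
    size_t best = 0;

    for (size_t i = 1; i < NUM_NQ; i++)
        if (nq_load[i] < nq_load[best])
            best = i;
    nq_load[best]++;
    return best;
}

/* Release the slot when the CQ is destroyed. */
static void nq_put(size_t idx)
{
    assert(idx < NUM_NQ && nq_load[idx] > 0);
    nq_load[idx]--;
}
```

Because the choice depends on the current load rather than on a running counter, a CQ that is created and immediately destroyed does not shift later CQs onto a different NQ.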
2024-11-14RDMA/bnxt_re: Refactor NQ allocationKalesh AP3-33/+60
Move NQ related data structures from rdev to a new structure named "struct bnxt_re_nq_record", keeping a pointer to it in the rdev structure. Allocate the memory for it dynamically. This change is needed for subsequent patches in the series. Also, remove the nq_task variable from the rdev structure, as it is redundant and no longer used. This change also helps reduce the size of the driver private structure. Reviewed-by: Chandramohan Akula <chandramohan.akula@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1731577748-1804-3-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-14RDMA/bnxt_re: Fail probe early when not enough MSI-x vectors are reservedKalesh AP2-10/+14
L2 driver allocates and populates the MSI-x vector details for RoCE in the en_dev structure. RoCE driver requires minimum 2 MSIx vectors. Hence during probe, driver has to check and bail out if there are not enough MSI-x vectors reserved for it before proceeding further initialization. Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com> Reviewed-by: Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com> Reviewed-by: Kashyap Desai <kashyap.desai@broadcom.com> Reviewed-by: Chandramohan Akula <chandramohan.akula@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1731577748-1804-2-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-14RDMA/hns: Fix different dgids mapping to the same dip_idxFeng Fang5-44/+75
The DIP algorithm requires a one-to-one mapping between dgid and dip_idx. Currently a queue 'spare_idx' is used to store the QPNs of QPs that use the DIP algorithm. For a new dgid, a QPN from spare_idx is used as dip_idx. This method lacks a mechanism for deduplicating QPNs, which may result in different dgids sharing the same dip_idx and break the one-to-one mapping requirement. This patch replaces spare_idx with an xarray and introduces a refcnt for each dip_idx to indicate the number of QPs that are using this dip_idx. The state machine for dip_idx management is implemented as:
* The entry at an index in xarray is empty -- This indicates that the corresponding dip_idx hasn't been created.
* The entry at an index in xarray is not empty but with 0 refcnt -- This indicates that the corresponding dip_idx has been created but not used as dip_idx yet.
* The entry at an index in xarray is not empty and with non-0 refcnt -- This indicates that the corresponding dip_idx is being used by refcnt number of DIP QPs.
Fixes: eb653eda1e91 ("RDMA/hns: Bugfix for incorrect association between dip_idx and dgid") Fixes: f91696f2f053 ("RDMA/hns: Support congestion control type selection according to the FW") Signed-off-by: Feng Fang <fangfeng4@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20241112055553.3681129-1-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
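The three states listed above (empty, created with zero refcnt, in use with non-zero refcnt) can be modeled in userspace C. This is a simplified sketch and not the hns driver code: a small fixed array stands in for the xarray, and `dip_entry`, `dip_create`, `dip_get` and `dip_put` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DIP_SLOTS 4 /* assumed table size; the driver uses an xarray */
#define GID_LEN 16

struct dip_entry {
    int present;          /* 0: empty, the dip_idx has not been created */
    unsigned int refcnt;  /* 0: created but unused; >0: QPs using it */
    uint8_t dgid[GID_LEN];
};

static struct dip_entry dip_table[DIP_SLOTS];

/* Mark a dip_idx as created (not empty, refcnt 0). */
static void dip_create(unsigned int idx)
{
    assert(idx < DIP_SLOTS);
    dip_table[idx].present = 1;
    dip_table[idx].refcnt = 0;
}

/*
 * Return the dip_idx for dgid: reuse the in-use entry with the same dgid,
 * otherwise claim the first created-but-unused entry. Returns -1 if no
 * entry is available.
 */
static int dip_get(const uint8_t *dgid)
{
    int unused = -1;

    for (int i = 0; i < DIP_SLOTS; i++) {
        struct dip_entry *e = &dip_table[i];

        if (!e->present)
            continue;
        if (e->refcnt > 0 && memcmp(e->dgid, dgid, GID_LEN) == 0) {
            e->refcnt++;
            return i;
        }
        if (e->refcnt == 0 && unused < 0)
            unused = i;
    }
    if (unused < 0)
        return -1;
    memcpy(dip_table[unused].dgid, dgid, GID_LEN);
    dip_table[unused].refcnt = 1;
    return unused;
}

/* Drop one QP's use of the dip_idx; at refcnt 0 it becomes reusable. */
static void dip_put(int idx)
{
    assert(idx >= 0 && idx < DIP_SLOTS && dip_table[idx].refcnt > 0);
    dip_table[idx].refcnt--;
}
```

Checking for an existing in-use entry before claiming a free one is what keeps the dgid to dip_idx mapping one-to-one in this model.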
2024-11-12Revert "RDMA/core: Fix ENODEV error for iWARP test over vlan"Leon Romanovsky1-2/+0
The commit cited in the Fixes line caused a regression in the udaddy [1] application: it no longer works over VLANs.
Client:
    ifconfig eth2 1.1.1.1
    ip link add link eth2 name p0.3597 type vlan protocol 802.1Q id 3597
    ip link set dev p0.3597 up
    ip addr add 2.2.2.2/16 dev p0.3597
    udaddy -S 847 -C 220 -c 2 -t 0 -s 2.2.2.3 -b 2.2.2.2
Server:
    ifconfig eth2 1.1.1.3
    ip link add link eth2 name p0.3597 type vlan protocol 802.1Q id 3597
    ip link set dev p0.3597 up
    ip addr add 2.2.2.3/16 dev p0.3597
    udaddy -S 847 -C 220 -c 2 -t 0 -b 2.2.2.3
[1] https://github.com/linux-rdma/rdma-core/blob/master/librdmacm/examples/udaddy.c Fixes: 5069d7e202f6 ("RDMA/core: Fix ENODEV error for iWARP test over vlan") Reported-by: Leon Romanovsky <leonro@nvidia.com> Closes: https://lore.kernel.org/all/20241110130746.GA48891@unreal Link: https://patch.msgid.link/bb9d403419b2b9566da5b8bf0761fa8377927e49.1731401658.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2024-11-12RDMA/bnxt_re: Add set_func_resources support for P5/P7 adaptersKalesh AP2-15/+7
Enable set_func_resources for P5 and P7 adapters to handle VF resource distribution. Remove setting max resources per VF during PF initialization. This change is required for firmware that does not support RoCE VF resource management by the NIC driver. The code is now the same for all adapters. Reviewed-by: Stephen Shi <stephen.shi@broadcom.com> Reviewed-by: Rukhsana Ansari <rukhsana.ansari@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1730882676-24434-4-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-12RDMA/bnxt_re: Enhance RoCE SRIOV resource configuration designBhargava Chenna Marreddy4-5/+14
Refine RoCE SRIOV resource configuration design, using the INITIALIZE_FW's flag as an indication for the new design to the firmware. RoCE driver does not have to provision resources to VF when firmware advertises support for RoCE resource management by NIC driver. Signed-off-by: Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com> Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com> CC: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1730882676-24434-3-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-10RDMA/hns: Fix NULL pointer derefernce in hns_roce_map_mr_sg()Junxian Huang1-3/+4
ib_map_mr_sg() allows ULPs to specify NULL as the sg_offset argument. The driver needs to check whether it is a NULL pointer before dereferencing it. Fixes: d387d4b54eb8 ("RDMA/hns: Fix missing pagesize and alignment check in FRMR") Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20241108075743.2652258-3-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
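The fix pattern described here is a plain NULL guard before reading through an optional pointer. A minimal standalone sketch; `sg_offset_or_zero` is a hypothetical helper name, and treating a NULL offset as zero is an assumption of this sketch rather than a statement about the hns patch:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Read an optional offset argument. Callers may pass NULL, so the
 * pointer must be checked before it is dereferenced.
 */
static unsigned int sg_offset_or_zero(const unsigned int *sg_offset)
{
    return sg_offset ? *sg_offset : 0;
}

/* Usage example covering both the NULL and the non-NULL case. */
static void sg_offset_example(void)
{
    unsigned int off = 512;

    assert(sg_offset_or_zero(NULL) == 0);
    assert(sg_offset_or_zero(&off) == 512);
}
```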
2024-11-10RDMA/hns: Fix out-of-order issue of requester when setting FENCEJunxian Huang2-1/+2
The FENCE indicator in hns WQE doesn't ensure that response data from a previous Read/Atomic operation has been written to the requester's memory before the subsequent Send/Write operation is processed. This may result in the subsequent Send/Write operation accessing the original data in memory instead of the expected response data. Unlike FENCE, the SO (Strong Order) indicator blocks the subsequent operation until the previous response data is written to memory and a bresp is returned. Set the SO indicator instead of FENCE to maintain strict order. Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver") Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20241108075743.2652258-2-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-05sysfs: treewide: constify attribute callback of bin_is_visible()Thomas Weißschuh1-1/+1
The is_bin_visible() callbacks should not modify the struct bin_attribute passed as argument. Enforce this by marking the argument as const. As there are not many callback implementers perform this change throughout the tree at once. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Ira Weiny <ira.weiny@intel.com> Acked-by: Krzysztof Wilczyński <kw@linux.com> Link: https://lore.kernel.org/r/20241103-sysfs-const-bin_attr-v2-5-71110628844c@weissschuh.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-11-04RDMA/nldev: Add IB device and net device rename eventsChiara Meiohas2-2/+76
Implement event sending for IB device rename and IB device port associated netdevice rename. In iproute2, rdma monitor displays the IB device name, port and the netdevice name when displaying event info. Since users can modify these names, we track and notify on renaming events. Note: In order to receive netdevice rename events, drivers must use the ib_device_set_netdev() API when attaching net devices to IB devices.
    $ rdma monitor
    $ rmmod mlx5_ib
    [UNREGISTER] dev 1 rocep8s0f1
    [UNREGISTER] dev 0 rocep8s0f0
    $ modprobe mlx5_ib
    [REGISTER] dev 2 mlx5_0
    [NETDEV_ATTACH] dev 2 mlx5_0 port 1 netdev 4 eth2
    [REGISTER] dev 3 mlx5_1
    [NETDEV_ATTACH] dev 3 mlx5_1 port 1 netdev 5 eth3
    [RENAME] dev 2 rocep8s0f0
    [RENAME] dev 3 rocep8s0f1
    $ devlink dev eswitch set pci/0000:08:00.0 mode switchdev
    [UNREGISTER] dev 2 rocep8s0f0
    [REGISTER] dev 4 mlx5_0
    [NETDEV_ATTACH] dev 4 mlx5_0 port 30 netdev 4 eth2
    [RENAME] dev 4 rdmap8s0f0
    $ echo 4 > /sys/class/net/eth2/device/sriov_numvfs
    [NETDEV_ATTACH] dev 4 rdmap8s0f0 port 2 netdev 7 eth4
    [NETDEV_ATTACH] dev 4 rdmap8s0f0 port 3 netdev 8 eth5
    [NETDEV_ATTACH] dev 4 rdmap8s0f0 port 4 netdev 9 eth6
    [NETDEV_ATTACH] dev 4 rdmap8s0f0 port 5 netdev 10 eth7
    [REGISTER] dev 5 mlx5_0
    [NETDEV_ATTACH] dev 5 mlx5_0 port 1 netdev 11 eth8
    [REGISTER] dev 6 mlx5_1
    [NETDEV_ATTACH] dev 6 mlx5_1 port 1 netdev 12 eth9
    [RENAME] dev 5 rocep8s0f0v0
    [RENAME] dev 6 rocep8s0f0v1
    [REGISTER] dev 7 mlx5_0
    [NETDEV_ATTACH] dev 7 mlx5_0 port 1 netdev 13 eth10
    [RENAME] dev 7 rocep8s0f0v2
    [REGISTER] dev 8 mlx5_0
    [NETDEV_ATTACH] dev 8 mlx5_0 port 1 netdev 14 eth11
    [RENAME] dev 8 rocep8s0f0v3
    $ ip link set eth2 name myeth2
    [NETDEV_RENAME] netdev 4 myeth2
    $ ip link set eth1 name myeth1
    ** no events received, because eth1 is not attached to an IB device **
Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com> Link: https://patch.msgid.link/093c978ef2766fd3ab4ff8798eeb68f2f11582f6.1730367038.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-04RDMA/mlx5: Add implementation for ufile_hw_cleanup device operationPatrisious Haddad3-1/+97
Implement the device API for ufile_hw_cleanup operation, which iterates over the ufile uobjects lists, and attempts to destroy DevX QPs, by issuing up to 8 commands in parallel. This function is responsible only for cleaning the FW resources of the QP, and doesn't necessarily cleanup all of its resources. Hence the normal serialized cleanup flow is still executed after it in __uverbs_cleanup_ufile() to cleanup the remaining resources and handle the cleanup of SW objects. In order to avoid double cleanup for the FW resources, new DevX flag was added DEVX_OBJ_FLAGS_HW_FREED, which marks the object's FW resources as already freed. Since QP destruction is the most time-consuming operation in FW, parallelizing it reduces the cleanup time of applications that use DevX QPs. Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Link: https://patch.msgid.link/2f82675d0412542cba1c47a6b86f589521ae41e1.1730373303.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-04RDMA/core: Move ib_uverbs_file struct to uverbs_types.hPatrisious Haddad2-33/+3
In light of the previous commit, make the ib_uverbs_file accessible to drivers by moving its definition to uverbs_types.h, to allow drivers to freely access the struct argument and create a personalized cleanup flow. For the same reason expose uverbs_try_lock_object function to allow driver to safely access the uverbs objects. Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Link: https://patch.msgid.link/29b718e0dca35daa5f496320a39284fc1f5a1722.1730373303.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-04RDMA/core: Add device ufile cleanup operationPatrisious Haddad2-1/+7
Add a driver operation to allow preemptive cleanup of ufile HW resources before the standard ufile cleanup flow begins. Thus, expediting the final cleanup phase which leads to fast teardown overall. This allows the use of driver specific clean up procedures to make the cleanup process more efficient. Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Link: https://patch.msgid.link/cabe00d75132b5732cb515944e3c500a01fb0b4a.1730373303.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-04RDMA/mlx5: Ensure active slave attachment to the bond IB deviceChiara Meiohas1-10/+18
Fix a race condition when creating a lag bond in active backup mode where after the bond creation the backup slave was attached to the IB device, instead of the active slave. This caused stale entries in the GID table, as the gid updating mechanism relies on ib_device_get_netdev(), which would return the backup slave. Send an MLX5_DRIVER_EVENT_ACTIVE_BACKUP_LAG_CHANGE_LOWERSTATE event when activating the lag, additionally to when modifying the lag. This ensures that eventually the active netdevice is stored in the bond IB device. When handling this event remove the GIDs of the previously attached netdevice in this port and rescan the GIDs of the newly attached netdevice. This ensures that eventually the active slave netdevice is correctly stored in the IB device port. While there might be a brief moment where the backup slave GIDs appear in the GID table, it will eventually stabilize with the correct GIDs (of the bond and the active slave). Fixes: 8d159eb2117b ("RDMA/mlx5: Use IB set_netdev and get_netdev functions") Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com> Link: https://patch.msgid.link/91fc2cb24f63add266a528c1c702668a80416d9f.1730381292.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-04RDMA/core: Implement RoCE GID port rescan and export delete functionChiara Meiohas1-4/+26
rdma_roce_rescan_port() scans all network devices in the system and adds the gids if relevant to the RoCE device port. When not in bonding mode it adds the GIDs of the netdevice in this port. When in bonding mode it adds the GIDs of both the port's netdevice and the bond master netdevice. Export roce_del_all_netdev_gids(), which removes all GIDs associated with a specific netdevice for a given port. Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com> Link: https://patch.msgid.link/674d498da4637a1503ff1367e28bd09ff942fd5e.1730381292.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-04 | RDMA/mlx5: Call dev_put() after the blocking notifier | Chiara Meiohas | 1 | -1/+0
Move dev_put() call to occur directly after the blocking notifier, instead of within the event handler. Fixes: 8d159eb2117b ("RDMA/mlx5: Use IB set_netdev and get_netdev functions") Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com> Link: https://patch.msgid.link/342ff94b3dcbb07da1c7dab862a73933d604b717.1730381292.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-04 | RDMA/mlx5: Support querying per-plane IB PortCounters | Mark Zhang | 1 | -1/+7
On a SMI device, set requested plane_num when querying PPCNT register with the PortCounters Attribute group. Signed-off-by: Mark Zhang <markzhang@nvidia.com> Reviewed-by: Maher Sanalla <msanalla@nvidia.com> Link: https://patch.msgid.link/828d57444a0a41042556bb0a4394ecf2fcaed639.1730368052.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-04 | RDMA/mlx5: Support OOO RX WQE consumption | Edward Srouji | 3 | -5/+55
Support QP with out-of-order (OOO) capabilities enabled. This allows WRs on the receiver side of the QP to be consumed OOO, permitting the sender side to transmit messages without guaranteeing arrival order on the receiver side. When enabled, the completion ordering of WRs remains in-order, regardless of the Receive WRs consumption order. RDMA Read and RDMA Atomic operations on the responder side continue to be executed in-order, while the ordering of data placement for RDMA Write and Send operations is not guaranteed. Atomic operations larger than 8 bytes are currently not supported. Therefore, when this feature is enabled, the created QP restricts its atomic support to 8 bytes at most. In addition, when querying the device, a new flag is returned in response to indicate that the Kernel supports OOO QP. Signed-off-by: Edward Srouji <edwards@nvidia.com> Reviewed-by: Yishai Hadas <yishaih@nvidia.com> Link: https://patch.msgid.link/06ac609a5f358c8fb0a090d22c61a2f9329d82e6.1725362773.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-04 | RDMA/bnxt_re: Add debugfs hook in the driver | Kalesh AP | 7 | -2/+180
Adding support for a per device debugfs folder for exporting some of the device specific debug information. Added support to get QP info for now. The same folder can be used to export other debug features in future. Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1730428483-17841-5-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-04 | RDMA/bnxt_re: Support raw data query for each resources | Kashyap Desai | 1 | -0/+118
Support interfaces to get the raw data for each of the resources. Use this interface to get some of the HW structures from active resources. Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1730428483-17841-4-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-04 | RDMA/bnxt_re: Add support for querying HW contexts | Kashyap Desai | 5 | -0/+98
Implements support for querying the hardware resource contexts. This raw data can be used for the debugging of the field issues. Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1730428483-17841-3-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-04 | RDMA/bnxt_re: Support driver specific data collection using rdma tool | Kashyap Desai | 1 | -0/+141
Allow users to dump driver specific resource details when queried through rdma tool. This supports the driver data for QP, CQ, MR and SRQ. Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1730428483-17841-2-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-04 | RDMA/rxe: Set queue pair cur_qp_state when being queried | Liu Jian | 1 | -0/+1
Same with commit e375b9c92985 ("RDMA/cxgb4: Set queue pair state when being queried"). The API for ib_query_qp requires the driver to set cur_qp_state on return, add the missing set. Fixes: 8700e3e7c485 ("Soft RoCE driver") Signed-off-by: Liu Jian <liujian56@huawei.com> Link: https://patch.msgid.link/20241031092019.2138467-1-liujian56@huawei.com Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-03 | RDMA/bnxt_re: Remove some dead code | Christophe JAILLET | 1 | -19/+0
If the probe succeeds, then auxiliary_get_drvdata() can't return a NULL pointer. So several NULL checks can be removed to simplify code. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://patch.msgid.link/f02eb630734ee530315dce9f60b078f631ae93d0.1730477345.git.christophe.jaillet@wanadoo.fr Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-03 | RDMA/bnxt_re: Fix some error handling paths in bnxt_re_probe() | Christophe JAILLET | 1 | -0/+8
If bnxt_re_add_device() fails, 'en_info' still needs to be freed, as already done in the .remove() function. The commit in Fixes incorrectly removed this call, certainly because it was expecting the .remove() function was called anyway. But if the probe fails, the remove function is not called. There is no need to call bnxt_re_remove() as it was done before, kfree() is enough. Fixes: a5e099e0c464 ("RDMA/bnxt_re: Fix an error path in bnxt_re_add_device") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://patch.msgid.link/9e48ff955ae55fc39a9eb1eb590d374539eab5ba.1730477345.git.christophe.jaillet@wanadoo.fr Signed-off-by: Leon Romanovsky <leon@kernel.org>
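The error path described above can be sketched in userspace C. This is only an illustration of the pattern (free what probe allocated when a later step fails, because remove is not called after a failed probe); every name below is hypothetical and none of it is taken from the bnxt_re driver.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver's private data and device. */
struct sketch_en_info { int placeholder; };
struct sketch_dev { struct sketch_en_info *drvdata; };

/* Counts outstanding allocations so the sketch can check for leaks. */
static int sketch_live_allocs;

/* Stand-in for the "add device" step; fails on request. */
static int sketch_add_device(struct sketch_dev *dev, int should_fail)
{
	(void)dev;
	return should_fail ? -1 : 0;
}

static int sketch_probe(struct sketch_dev *dev, int add_should_fail)
{
	struct sketch_en_info *info;
	int rc;

	assert(dev);
	info = calloc(1, sizeof(*info));
	if (!info)
		return -1;
	sketch_live_allocs++;
	dev->drvdata = info;

	rc = sketch_add_device(dev, add_should_fail);
	if (rc)
		goto err_free;
	return 0;

err_free:
	/* remove() is not invoked after a failed probe, so free here. */
	dev->drvdata = NULL;
	free(info);
	sketch_live_allocs--;
	return rc;
}

/* Only reached after a successful probe. */
static void sketch_remove(struct sketch_dev *dev)
{
	assert(dev);
	if (dev->drvdata) {
		free(dev->drvdata);
		dev->drvdata = NULL;
		sketch_live_allocs--;
	}
}
```

In this model a failed probe leaves no allocation behind, and a successful probe is balanced by the later remove call.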
2024-11-03 | deal with the last remaing boolean uses of fd_file() | Al Viro | 1 | -5/+3
Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-11-03 | fdget(), more trivial conversions | Al Viro | 1 | -13/+6
all failure exits prior to fdget() leave the scope, all matching fdput() are immediately followed by leaving the scope. [xfs_ioc_commit_range() chunk moved here as well] Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-10-30 | RDMA/efa: Report link speed according to device attributes | Michael Margolin | 4 | -2/+54
Set port link speed and width based on max bandwidth acquired from the device instead of using constant 100 Gbps. Use a default value in case the device didn't set the field. Reviewed-by: Daniel Kranzdorf <dkkranzd@amazon.com> Reviewed-by: Firas Jahjah <firasj@amazon.com> Signed-off-by: Michael Margolin <mrgolin@amazon.com> Link: https://patch.msgid.link/20241030093006.21352-1-mrgolin@amazon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-10-30 | RDMA/bnxt_re: Check cqe flags to know imm_data vs inv_irkey | Kashyap Desai | 2 | -3/+6
Invalidate rkey is cpu endian and immediate data is in big endian format. Both the immediate data and the invalidated remote key returned by HW are in little endian format. While handling the commit in the Fixes tag, the difference between immediate data and invalidate rkey endianness was not considered. Without the changes in this patch, the kernel ULP was failing while processing inv_rkey. dmesg log snippet - nvme nvme0: Bogus remote invalidation for rkey 0x2000019. Fix in this patch: do the endianness conversion based on the completion queue entry flag. Also, the HW completions are already converted to host endianness in bnxt_qplib_cq_process_res_rc and bnxt_qplib_cq_process_res_ud and there is no need to convert them again in bnxt_re_poll_cq. Modified the union to hold the correct data type. Fixes: 95b087f87b78 ("bnxt_re: Fix imm_data endianness") Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1730110014-20755-1-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
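The flag-based conversion can be illustrated with a small userspace sketch. It assumes, as the message states, that the raw field arrives in little endian, that immediate data is handed up in big endian, and that an invalidated rkey is handed up in CPU byte order. The flag values, struct layout, and function names are hypothetical and are not the driver's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical completion flags; the real driver has its own definitions. */
#define SKETCH_CQE_FLAG_IMM 0x1u /* completion carries immediate data */
#define SKETCH_CQE_FLAG_INV 0x2u /* completion carries an invalidated rkey */

struct sketch_wc {
	int has_imm;
	int has_inv;
	uint8_t imm_data_be[4];   /* immediate data, big-endian byte order */
	uint32_t invalidate_rkey; /* rkey, CPU byte order */
};

/* Decode a 32-bit little-endian field independently of host endianness. */
static uint32_t sketch_le32_to_cpu(const uint8_t raw[4])
{
	return (uint32_t)raw[0] | ((uint32_t)raw[1] << 8) |
	       ((uint32_t)raw[2] << 16) | ((uint32_t)raw[3] << 24);
}

/* Fill the work completion according to the completion flags. */
static void sketch_fill_wc(struct sketch_wc *wc, const uint8_t raw[4],
			   unsigned int flags)
{
	uint32_t val;

	assert(wc && raw);
	*wc = (struct sketch_wc){ 0 };
	val = sketch_le32_to_cpu(raw);

	if (flags & SKETCH_CQE_FLAG_IMM) {
		/* Most significant byte first, i.e. big endian. */
		wc->imm_data_be[0] = (uint8_t)(val >> 24);
		wc->imm_data_be[1] = (uint8_t)(val >> 16);
		wc->imm_data_be[2] = (uint8_t)(val >> 8);
		wc->imm_data_be[3] = (uint8_t)val;
		wc->has_imm = 1;
	} else if (flags & SKETCH_CQE_FLAG_INV) {
		/* The rkey stays in CPU byte order. */
		wc->invalidate_rkey = val;
		wc->has_inv = 1;
	}
}
```

The point of the sketch is that one raw field needs two different treatments, and only the completion flag says which one applies.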
2024-10-30 | RDMA/rxe: Fix the qp flush warnings in req | Zhu Yanjun | 1 | -2/+4
When the qp is in error state, the status of WQEs in the queue should be set to error. Or else the following will appear. [ 920.617269] WARNING: CPU: 1 PID: 21 at drivers/infiniband/sw/rxe/rxe_comp.c:756 rxe_completer+0x989/0xcc0 [rdma_rxe] [ 920.617744] Modules linked in: rnbd_client(O) rtrs_client(O) rtrs_core(O) rdma_ucm rdma_cm iw_cm ib_cm crc32_generic rdma_rxe ip6_udp_tunnel udp_tunnel ib_uverbs ib_core loop brd null_blk ipv6 [ 920.618516] CPU: 1 PID: 21 Comm: ksoftirqd/1 Tainted: G O 6.1.113-storage+ #65 [ 920.618986] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 [ 920.619396] RIP: 0010:rxe_completer+0x989/0xcc0 [rdma_rxe] [ 920.619658] Code: 0f b6 84 24 3a 02 00 00 41 89 84 24 44 04 00 00 e9 2a f7 ff ff 39 ca bb 03 00 00 00 b8 0e 00 00 00 48 0f 45 d8 e9 15 f7 ff ff <0f> 0b e9 cb f8 ff ff 41 bf f5 ff ff ff e9 08 f8 ff ff 49 8d bc 24 [ 920.620482] RSP: 0018:ffff97b7c00bbc38 EFLAGS: 00010246 [ 920.620817] RAX: 0000000000000000 RBX: 000000000000000c RCX: 0000000000000008 [ 920.621183] RDX: ffff960dc396ebc0 RSI: 0000000000005400 RDI: ffff960dc4e2fbac [ 920.621548] RBP: 0000000000000000 R08: 0000000000000001 R09: ffffffffac406450 [ 920.621884] R10: ffffffffac4060c0 R11: 0000000000000001 R12: ffff960dc4e2f800 [ 920.622254] R13: ffff960dc4e2f928 R14: ffff97b7c029c580 R15: 0000000000000000 [ 920.622609] FS: 0000000000000000(0000) GS:ffff960ef7d00000(0000) knlGS:0000000000000000 [ 920.622979] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 920.623245] CR2: 00007fa056965e90 CR3: 00000001107f1000 CR4: 00000000000006e0 [ 920.623680] Call Trace: [ 920.623815] <TASK> [ 920.623933] ? __warn+0x79/0xc0 [ 920.624116] ? rxe_completer+0x989/0xcc0 [rdma_rxe] [ 920.624356] ? report_bug+0xfb/0x150 [ 920.624594] ? handle_bug+0x3c/0x60 [ 920.624796] ? exc_invalid_op+0x14/0x70 [ 920.624976] ? asm_exc_invalid_op+0x16/0x20 [ 920.625203] ? rxe_completer+0x989/0xcc0 [rdma_rxe] [ 920.625474] ? 
rxe_completer+0x329/0xcc0 [rdma_rxe] [ 920.625749] rxe_do_task+0x80/0x110 [rdma_rxe] [ 920.626037] rxe_requester+0x625/0xde0 [rdma_rxe] [ 920.626310] ? rxe_cq_post+0xe2/0x180 [rdma_rxe] [ 920.626583] ? do_complete+0x18d/0x220 [rdma_rxe] [ 920.626812] ? rxe_completer+0x1a3/0xcc0 [rdma_rxe] [ 920.627050] rxe_do_task+0x80/0x110 [rdma_rxe] [ 920.627285] tasklet_action_common.constprop.0+0xa4/0x120 [ 920.627522] handle_softirqs+0xc2/0x250 [ 920.627728] ? sort_range+0x20/0x20 [ 920.627942] run_ksoftirqd+0x1f/0x30 [ 920.628158] smpboot_thread_fn+0xc7/0x1b0 [ 920.628334] kthread+0xd6/0x100 [ 920.628504] ? kthread_complete_and_exit+0x20/0x20 [ 920.628709] ret_from_fork+0x1f/0x30 [ 920.628892] </TASK> Fixes: ae720bdb703b ("RDMA/rxe: Generate error completion for error requester QP state") Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Link: https://patch.msgid.link/20241025152036.121417-1-yanjun.zhu@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-10-30 | RDMA/hns: Fix cpu stuck caused by printings during reset | wenglianfa | 5 | -48/+41
During reset, cmd to destroy resources such as qp, cq, and mr may fail, and error logs will be printed. When a large number of resources are destroyed, there will be lots of printings, and it may lead to a cpu stuck. Delete some unnecessary printings and replace other printing functions in these paths with the ratelimited version. Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver") Fixes: c7bcb13442e1 ("RDMA/hns: Add SRQ support for hip08 kernel mode") Fixes: 70f92521584f ("RDMA/hns: Use the reserved loopback QPs to free MR before destroying MPT") Fixes: 926a01dc000d ("RDMA/hns: Add QP operations support for hip08 SoC") Signed-off-by: wenglianfa <wenglianfa@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20241024124000.2931869-6-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
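To show why a rate-limited print bounds the log volume, here is a minimal userspace model of an interval/burst limiter. It is a sketch of the general idea only, not the kernel's ratelimit implementation; the struct, field names, and the explicit `now` tick parameter are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limiter state: at most `burst` messages per `interval`. */
struct sketch_ratelimit {
	uint64_t interval;    /* window length, in arbitrary ticks */
	unsigned int burst;   /* messages allowed per window */
	uint64_t begin;       /* start of the current window */
	unsigned int printed; /* messages emitted in the current window */
	unsigned int missed;  /* messages suppressed so far */
};

/* Return true if the caller may print at time `now`. */
static bool sketch_ratelimit_ok(struct sketch_ratelimit *rs, uint64_t now)
{
	assert(rs && rs->interval > 0);

	/* Open a new window once the current one has expired. */
	if (now - rs->begin >= rs->interval) {
		rs->begin = now;
		rs->printed = 0;
	}

	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}

	rs->missed++;
	return false;
}
```

With an interval of 100 ticks and a burst of 2, a loop that fails thousands of destroy commands in one window emits two messages and only counts the rest.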
2024-10-30 | RDMA/hns: Use dev_* printings in hem code instead of ibdev_* | Junxian Huang | 1 | -22/+22
The hem code is executed before ib_dev is registered, so use dev_* printing instead of ibdev_* to avoid log like this: (null): set HEM address to HW failed! Fixes: 2f49de21f3e9 ("RDMA/hns: Optimize mhop get flow for multi-hop addressing") Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20241024124000.2931869-5-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-10-30 | RDMA/hns: Modify debugfs name | Yuyu Li | 1 | -1/+2
The sub-directory of hns_roce debugfs is named after the device's kernel name currently, but it will be inconvenient to use when the device is renamed. Modify the name to pci name as users can always easily find the correspondence between an RDMA device and its pci name. Fixes: eb7854d63db5 ("RDMA/hns: Support SW stats with debugfs") Signed-off-by: Yuyu Li <liyuyu6@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20241024124000.2931869-4-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-10-30 | RDMA/hns: Fix flush cqe error when racing with destroy qp | wenglianfa | 3 | -2/+22
QP needs to be modified to IB_QPS_ERROR to trigger HW flush cqe. But when this process races with destroy qp, the destroy-qp process may modify the QP to IB_QPS_RESET first. In this case flush cqe will fail since it is invalid to modify qp from IB_QPS_RESET to IB_QPS_ERROR. Add lock and bit flag to make sure pending flush cqe work is completed first and no more new works will be added. Fixes: ffd541d45726 ("RDMA/hns: Add the workqueue framework for flush cqe handler") Signed-off-by: wenglianfa <wenglianfa@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20241024124000.2931869-3-huangjunxian6@hisilicon.com Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-10-30 | RDMA/hns: Fix an AEQE overflow error caused by untimely update of eq_db_ci | wenglianfa | 4 | -44/+91
eq_db_ci is updated only after all AEQEs are processed in the AEQ interrupt handler, which is not timely enough and may result in AEQ overflow. Two optimization methods are proposed: 1. Set an upper limit for AEQE processing. 2. Move time-consuming operations such as printings to the bottom half of the interrupt. cmd events and flush_cqe events are still fully processed in the top half to ensure timely handling. Fixes: a5073d6054f7 ("RDMA/hns: Add eq support of hip08") Signed-off-by: wenglianfa <wenglianfa@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20241024124000.2931869-2-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
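The two measures listed above can be modelled in a few lines of userspace C: a per-interrupt budget that caps how many entries are consumed before the consumer index is reported, and a split between events handled immediately and events deferred. The ring depth, budget, event types, and all names are hypothetical; the hns driver's real structures differ.

```c
#include <assert.h>

#define SKETCH_EQ_DEPTH   8u /* hypothetical ring size */
#define SKETCH_POLL_LIMIT 4u /* hypothetical per-interrupt budget */

enum sketch_event { SKETCH_EV_CMD, SKETCH_EV_FLUSH_CQE, SKETCH_EV_OTHER };

struct sketch_eq {
	enum sketch_event ring[SKETCH_EQ_DEPTH];
	unsigned int prod;        /* next slot the "hardware" writes */
	unsigned int cons;        /* next slot software reads */
	unsigned int db_ci;       /* consumer index last reported back */
	unsigned int handled_now; /* events fully handled in the top half */
	unsigned int deferred;    /* events handed to the bottom half */
};

/* Model of the top half: consume at most SKETCH_POLL_LIMIT entries. */
static unsigned int sketch_eq_irq(struct sketch_eq *eq)
{
	unsigned int n = 0;

	assert(eq);
	while (eq->cons != eq->prod && n < SKETCH_POLL_LIMIT) {
		enum sketch_event ev = eq->ring[eq->cons % SKETCH_EQ_DEPTH];

		/* cmd and flush_cqe events are handled right away. */
		if (ev == SKETCH_EV_CMD || ev == SKETCH_EV_FLUSH_CQE)
			eq->handled_now++;
		else
			eq->deferred++; /* slow work goes to the bottom half */

		eq->cons++;
		n++;
	}

	/* Report progress after every bounded batch, not after a full drain. */
	eq->db_ci = eq->cons;
	return n;
}
```

Because the consumer index is reported after each bounded batch, the producer sees freed slots sooner than it would if the handler drained the whole queue first.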
2024-10-30 | RDMA: Use ethtool string helpers | Rosen Penev | 2 | -9/+4
Avoids having to manually increment the pointer. Signed-off-by: Rosen Penev <rosenp@gmail.com> Link: https://patch.msgid.link/20241021011543.5922-1-rosenp@gmail.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
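As an illustration of what "not having to manually increment the pointer" means, the sketch below models a helper that writes one fixed-width string slot and advances the caller's cursor itself. It is a userspace approximation with hypothetical names (`sketch_puts`, `SKETCH_GSTRING_LEN`), not the kernel helper's implementation; the 32-byte slot width is an assumption chosen for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_GSTRING_LEN 32 /* assumed fixed slot width for the sketch */

/*
 * Copy `str` into the slot at *data (truncating if needed, always
 * NUL-terminated) and advance *data to the next slot.
 */
static void sketch_puts(unsigned char **data, const char *str)
{
	assert(data && *data && str);
	memset(*data, 0, SKETCH_GSTRING_LEN);
	snprintf((char *)*data, SKETCH_GSTRING_LEN, "%s", str);
	*data += SKETCH_GSTRING_LEN;
}
```

A caller that fills a table of statistic names then only issues one call per name and never touches the cursor arithmetic, which removes a class of off-by-one-slot mistakes.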
2024-10-28 | RDMA/bnxt_re: Fix access flags for MR and QP modify | Hongguang Gao | 1 | -9/+50
Access flag definition in MR and QP is different in FW. Currently both reg/bind MR and modify/query QP uses the same flags. Add a different function to map the QP access flags for newer adapters. Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Hongguang Gao <hongguang.gao@broadcom.com> Reviewed-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1729065346-1364-6-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-10-28 | RDMA/bnxt_re: Add support for modify_device hook | Kalesh AP | 3 | -0/+20
Adds support for modify_device in the driver for node desc changes. Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1729065346-1364-5-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>