summaryrefslogtreecommitdiff
path: root/drivers/infiniband
AgeCommit message (Collapse)AuthorFilesLines
2018-12-12IB/core: Fix oops in netdev_next_upper_dev_rcu()Mark Zhang1-0/+3
When support for bonding of RoCE devices was added, there was necessarily a link between the RoCE device and the paired netdevice that was part of the bond. If you remove the mlx4_en module, that paired association is broken (the RoCE device is still present but the paired netdevice has been released). We need to account for this in is_upper_ndev_bond_master_filter() and filter out those links with a broken pairing or else we later oops in netdev_next_upper_dev_rcu(). Fixes: 408f1242d940 ("IB/core: Delete lower netdevice default GID entries in bonding scenario") Signed-off-by: Mark Zhang <markz@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-12-06IB/mlx5: Block DEVX umem from the non applicable casesYishai Hadas1-1/+3
Blocks creating a DEVX UMEM with the non applicable access flags as of ODP, MW_BIND, etc. Specifically when an ODP flag is used below WARN call trace is issued. [ 2510.404131] RIP: 0010:__mlx5_ib_populate_pas+0x207/0x220 [mlx5_ib] ... [ 2510.404143] Call Trace: [ 2510.404150] ? __kmalloc_node+0x1b3/0x280 [ 2510.404156] ? _uverbs_alloc+0x63/0x90 [ib_uverbs] [ 2510.404158] ? _uverbs_alloc+0x63/0x90 [ib_uverbs] [ 2510.404162] mlx5_ib_populate_pas+0x53/0x60 [mlx5_ib] [ 2510.404167] mlx5_ib_handler_MLX5_IB_METHOD_DEVX_UMEM_REG+0x273/0x3f0 [mlx5_ib] Fixes: aeae94579caf ("IB/mlx5: Add DEVX support for memory registration") Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-12-03IB/mlx5: Fix implicit ODP interrupted page faultArtemy Kovalyov1-5/+4
Since any page fault may be interrupted by a MMU invalidation and implicit leaf MR may be released during this process. The check for parent value is unreliable condition for an implicit MR. Use other condition that we can rely on to determine if MR is implicit. Fixes: b4cfe447d47b ("IB/mlx5: Implement on demand paging by adding support for MMU notifiers") Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-12-03IB/hfi1: Fix an out-of-bounds access in get_hw_statsPiotr Stankiewicz3-2/+5
When running with KASAN, the following trace is produced: [ 62.535888] ================================================================== [ 62.544930] BUG: KASAN: slab-out-of-bounds in gut_hw_stats+0x122/0x230 [hfi1] [ 62.553856] Write of size 8 at addr ffff88080e8d6330 by task kworker/0:1/14 [ 62.565333] CPU: 0 PID: 14 Comm: kworker/0:1 Not tainted 4.19.0-test-build-kasan+ #8 [ 62.575087] Hardware name: Intel Corporation S2600KPR/S2600KPR, BIOS SE5C610.86B.01.01.0019.101220160604 10/12/2016 [ 62.587951] Workqueue: events work_for_cpu_fn [ 62.594050] Call Trace: [ 62.598023] dump_stack+0xc6/0x14c [ 62.603089] ? dump_stack_print_info.cold.1+0x2f/0x2f [ 62.610041] ? kmsg_dump_rewind_nolock+0x59/0x59 [ 62.616615] ? get_hw_stats+0x122/0x230 [hfi1] [ 62.622985] print_address_description+0x6c/0x23c [ 62.629744] ? get_hw_stats+0x122/0x230 [hfi1] [ 62.636108] kasan_report.cold.6+0x241/0x308 [ 62.642365] get_hw_stats+0x122/0x230 [hfi1] [ 62.648703] ? hfi1_alloc_rn+0x40/0x40 [hfi1] [ 62.655088] ? __kmalloc+0x110/0x240 [ 62.660695] ? hfi1_alloc_rn+0x40/0x40 [hfi1] [ 62.667142] setup_hw_stats+0xd8/0x430 [ib_core] [ 62.673972] ? show_hfi+0x50/0x50 [hfi1] [ 62.680026] ib_device_register_sysfs+0x165/0x180 [ib_core] [ 62.687995] ib_register_device+0x5a2/0xa10 [ib_core] [ 62.695340] ? show_hfi+0x50/0x50 [hfi1] [ 62.701421] ? ib_unregister_device+0x2e0/0x2e0 [ib_core] [ 62.709222] ? __vmalloc_node_range+0x2d0/0x380 [ 62.716131] ? rvt_driver_mr_init+0x11f/0x2d0 [rdmavt] [ 62.723735] ? vmalloc_node+0x5c/0x70 [ 62.729697] ? rvt_driver_mr_init+0x11f/0x2d0 [rdmavt] [ 62.737347] ? rvt_driver_mr_init+0x1f5/0x2d0 [rdmavt] [ 62.744998] ? __rvt_alloc_mr+0x110/0x110 [rdmavt] [ 62.752315] ? rvt_rc_error+0x140/0x140 [rdmavt] [ 62.759434] ? rvt_vma_open+0x30/0x30 [rdmavt] [ 62.766364] ? mutex_unlock+0x1d/0x40 [ 62.772445] ? kmem_cache_create_usercopy+0x15d/0x230 [ 62.780115] rvt_register_device+0x1f6/0x360 [rdmavt] [ 62.787823] ? rvt_get_port_immutable+0x180/0x180 [rdmavt] [ 62.796058] ? __get_txreq+0x400/0x400 [hfi1] [ 62.802969] ? memcpy+0x34/0x50 [ 62.808611] hfi1_register_ib_device+0xde6/0xeb0 [hfi1] [ 62.816601] ? hfi1_get_npkeys+0x10/0x10 [hfi1] [ 62.823760] ? hfi1_init+0x89f/0x9a0 [hfi1] [ 62.830469] ? hfi1_setup_eagerbufs+0xad0/0xad0 [hfi1] [ 62.838204] ? pcie_capability_clear_and_set_word+0xcd/0xe0 [ 62.846429] ? pcie_capability_read_word+0xd0/0xd0 [ 62.853791] ? hfi1_pcie_init+0x187/0x4b0 [hfi1] [ 62.860958] init_one+0x67f/0xae0 [hfi1] [ 62.867301] ? hfi1_init+0x9a0/0x9a0 [hfi1] [ 62.873876] ? wait_woken+0x130/0x130 [ 62.879860] ? read_word_at_a_time+0xe/0x20 [ 62.886329] ? strscpy+0x14b/0x280 [ 62.891998] ? hfi1_init+0x9a0/0x9a0 [hfi1] [ 62.898405] local_pci_probe+0x70/0xd0 [ 62.904295] ? pci_device_shutdown+0x90/0x90 [ 62.910833] work_for_cpu_fn+0x29/0x40 [ 62.916750] process_one_work+0x584/0x960 [ 62.922974] ? rcu_work_rcufn+0x40/0x40 [ 62.928991] ? __schedule+0x396/0xdc0 [ 62.934806] ? __sched_text_start+0x8/0x8 [ 62.941020] ? pick_next_task_fair+0x68b/0xc60 [ 62.947674] ? run_rebalance_domains+0x260/0x260 [ 62.954471] ? __list_add_valid+0x29/0xa0 [ 62.960607] ? move_linked_works+0x1c7/0x230 [ 62.967077] ? trace_event_raw_event_workqueue_execute_start+0x140/0x140 [ 62.976248] ? mutex_lock+0xa6/0x100 [ 62.982029] ? __mutex_lock_slowpath+0x10/0x10 [ 62.988795] ? __switch_to+0x37a/0x710 [ 62.994731] worker_thread+0x62e/0x9d0 [ 63.000602] ? max_active_store+0xf0/0xf0 [ 63.006828] ? __switch_to_asm+0x40/0x70 [ 63.012932] ? __switch_to_asm+0x34/0x70 [ 63.019013] ? __switch_to_asm+0x40/0x70 [ 63.025042] ? __switch_to_asm+0x34/0x70 [ 63.031030] ? __switch_to_asm+0x40/0x70 [ 63.037006] ? __schedule+0x396/0xdc0 [ 63.042660] ? kmem_cache_alloc_trace+0xf3/0x1f0 [ 63.049323] ? kthread+0x59/0x1d0 [ 63.054594] ? ret_from_fork+0x35/0x40 [ 63.060257] ? __sched_text_start+0x8/0x8 [ 63.066212] ? schedule+0xcf/0x250 [ 63.071529] ? __wake_up_common+0x110/0x350 [ 63.077794] ? __schedule+0xdc0/0xdc0 [ 63.083348] ? wait_woken+0x130/0x130 [ 63.088963] ? finish_task_switch+0x1f1/0x520 [ 63.095258] ? kasan_unpoison_shadow+0x30/0x40 [ 63.101792] ? __init_waitqueue_head+0xa0/0xd0 [ 63.108183] ? replenish_dl_entity.cold.60+0x18/0x18 [ 63.115151] ? _raw_spin_lock_irqsave+0x25/0x50 [ 63.121754] ? max_active_store+0xf0/0xf0 [ 63.127753] kthread+0x1ae/0x1d0 [ 63.132894] ? kthread_bind+0x30/0x30 [ 63.138422] ret_from_fork+0x35/0x40 [ 63.146973] Allocated by task 14: [ 63.152077] kasan_kmalloc+0xbf/0xe0 [ 63.157471] __kmalloc+0x110/0x240 [ 63.162804] init_cntrs+0x34d/0xdf0 [hfi1] [ 63.168883] hfi1_init_dd+0x29a3/0x2f90 [hfi1] [ 63.175244] init_one+0x551/0xae0 [hfi1] [ 63.181065] local_pci_probe+0x70/0xd0 [ 63.186759] work_for_cpu_fn+0x29/0x40 [ 63.192310] process_one_work+0x584/0x960 [ 63.198163] worker_thread+0x62e/0x9d0 [ 63.203843] kthread+0x1ae/0x1d0 [ 63.208874] ret_from_fork+0x35/0x40 [ 63.217203] Freed by task 1: [ 63.221844] __kasan_slab_free+0x12e/0x180 [ 63.227844] kfree+0x92/0x1a0 [ 63.232570] single_release+0x3a/0x60 [ 63.238024] __fput+0x1d9/0x480 [ 63.242911] task_work_run+0x139/0x190 [ 63.248440] exit_to_usermode_loop+0x191/0x1a0 [ 63.254814] do_syscall_64+0x301/0x330 [ 63.260283] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 63.270199] The buggy address belongs to the object at ffff88080e8d5500 which belongs to the cache kmalloc-4096 of size 4096 [ 63.287247] The buggy address is located 3632 bytes inside of 4096-byte region [ffff88080e8d5500, ffff88080e8d6500) [ 63.303564] The buggy address belongs to the page: [ 63.310447] page:ffffea00203a3400 count:1 mapcount:0 mapping:ffff88081380e840 index:0x0 compound_mapcount: 0 [ 63.323102] flags: 0x2fffff80008100(slab|head) [ 63.329775] raw: 002fffff80008100 0000000000000000 0000000100000001 ffff88081380e840 [ 63.340175] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000 [ 63.350564] page dumped because: kasan: bad access detected [ 63.361974] Memory state around the buggy address: [ 63.369137] ffff88080e8d6200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 63.379082] ffff88080e8d6280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 63.389032] >ffff88080e8d6300: 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc [ 63.398944] ^ [ 63.406141] ffff88080e8d6380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 63.416109] ffff88080e8d6400: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 63.426099] ================================================================== The trace happens because get_hw_stats() assumes there is room in the memory allocated in init_cntrs() to accommodate the driver counters. Unfortunately, that routine only allocated space for the device counters. Fix by insuring the allocation has room for the additional driver counters. Cc: <Stable@vger.kernel.org> # v4.14+ Fixes: b7481944b06e9 ("IB/hfi1: Show statistics counters under IB stats interface") Reviewed-by: Mike Marciniczyn <mike.marciniszyn@intel.com> Reviewed-by: Mike Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-12-03IB/hfi1: Fix a latency issue for small messagesMichael J. Ruhl1-0/+7
A recent performance enhancement introduced a latency issue in the HFI message path. The new algorithm removed a forced call send for PIO messages and added a forced schedule event for messages larger than the MTU. For PIO, the schedule path can introduce thrashing that can significantly impact the throughput for small messages. If a message size is within the PIO threshold, always take the send path. Fixes: 0b79b27748cb ("IB/{hfi1, qib, rdmavt}: Schedule multi RC/UC packets instead of posting") Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-11-29RDMA/mlx5: Initialize return variable in case pagefault was skippedLeon Romanovsky1-0/+1
Pagefaults occurred in non-ODP MR are completely valid events, so initialize return variable to 0. Fixes: 4d5422a309de ("IB/mlx5: Skip non-ODP MR when handling a page fault") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-11-26IB/mlx5: Fix page fault handling for MWArtemy Kovalyov1-0/+1
Memory windows are implemented with an indirect MKey, when a page fault event comes for a MW Mkey we need to find the MR at the end of the list of the indirect MKeys by iterating on all items from the first to the last. The offset calculated during this process has to be zeroed after the first iteration or the next iteration will start from a wrong address, resulting incorrect ODP faulting behavior. Fixes: db570d7deafb ("IB/mlx5: Add ODP support to MW") Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-11-26IB/umem: Set correct address to the invalidation functionArtemy Kovalyov1-14/+6
The invalidate range was using PAGE_SIZE instead of the computed 'end', and had the wrong transformation of page_index due the weird construction. This can trigger during error unwind and would cause malfunction. Inline the code and correct the math. Fixes: 403cd12e2cf7 ("IB/umem: Add contiguous ODP support") Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-11-26IB/mlx5: Skip non-ODP MR when handling a page faultArtemy Kovalyov1-0/+8
It is possible that we call pagefault_single_data_segment() with a MKey that belongs to a memory region which is not on demand (i.e. pinned pages). This can happen if, for instance, a WQE that points to multiple MRs where some of them are ODP MRs and some are not. In this case we don't need to handle this MR in the ODP context besides reporting success. Otherwise the code will call pagefault_mr() which will do to_ib_umem_odp() on a non-ODP MR and thus access out of bounds. Fixes: 7bdf65d411c1 ("IB/mlx5: Handle page faults") Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-11-23RDMA/hns: Bugfix pbl configuration for rereg mrYixian Liu1-68/+60
Current hns driver assigned the first two PBL page addresses from previous registered MR to the hardware when reregister MR changing the memory locations occurred. This will lead to PBL addressing error as the PBL has already been released. This patch fixes this wrong assignment by using the page address from new allocated PBL. Fixes: a2c80b7b4119 ("RDMA/hns: Add rereg mr support for hip08") Signed-off-by: Yixian Liu <liuyixian@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-11-21iser: set sector for ambiguous mr status errorsSagi Grimberg1-4/+3
If for some reason we failed to query the mr status, we need to make sure to provide sufficient information for an ambiguous error (guard error on sector 0). Fixes: 0a7a08ad6f5f ("IB/iser: Implement check_protection") Cc: <stable@vger.kernel.org> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-11-21RDMA/rdmavt: Fix rvt_create_ah function signatureKamal Heib2-2/+5
rdmavt uses a crazy system that looses the type checking when assinging functions to struct ib_device function pointers. Because of this the signature to this function was not changed when the below commit revised things. Fix the signature so we are not calling a function pointer with a mismatched signature. Fixes: 477864c8fcd9 ("IB/core: Let create_ah return extended response to user") Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-11-21IB/mlx5: Avoid load failure due to unknown link widthMichael Guralnik1-18/+11
If the firmware reports a connection width that is not 1x, 4x, 8x or 12x it causes the driver to fail during initialization. To prevent this failure every time a new width is introduced to the RDMA stack, we will set a default 4x width for these widths which ar unknown to the driver. This is needed to allow to run old kernels with new firmware. Cc: <stable@vger.kernel.org> # 4.1 Fixes: 1b5daf11b015 ("IB/mlx5: Avoid using the MAD_IFC command under ISSI > 0 mode") Signed-off-by: Michael Guralnik <michaelgur@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-11-21IB/mlx5: Fix XRC QP support after introducing extended atomicYonatan Cohen1-2/+1
Extended atomics are supported with RC and XRC QP types, but the commit citied in the Fixes line added an unneeded check to to_mlx5_access_flags. This broke XRC QPs. The following ib_atomic_bw invocation over XRC reproduces the issue: ib_atomic_bw -d mlx5_1 --connection=XRC --atomic_type=FETCH_AND_ADD It is safe to remove such checks because the QP type was already checked in ib_modify_qp_is_ok(), which was previously called from mlx5_ib_modify_qp. Fixes: a60109dc9a95 ("IB/mlx5: Add support for extended atomic operations") Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-11-21RDMA/bnxt_re: Avoid accessing the device structure after it is freedSelvin Xavier1-0/+2
When bnxt_re_ib_reg returns failure, the device structure gets freed. Driver tries to access the device pointer after it is freed. [ 4871.034744] Failed to register with netedev: 0xffffffa1 [ 4871.034765] infiniband (null): Failed to register with IB: 0xffffffea [ 4871.046430] ================================================================== [ 4871.046437] BUG: KASAN: use-after-free in bnxt_re_task+0x63/0x180 [bnxt_re] [ 4871.046439] Write of size 4 at addr ffff880fa8406f48 by task kworker/u48:2/17813 [ 4871.046443] CPU: 20 PID: 17813 Comm: kworker/u48:2 Kdump: loaded Tainted: G B OE 4.20.0-rc1+ #42 [ 4871.046444] Hardware name: Dell Inc. PowerEdge R730/0599V5, BIOS 1.0.4 08/28/2014 [ 4871.046447] Workqueue: bnxt_re bnxt_re_task [bnxt_re] [ 4871.046449] Call Trace: [ 4871.046454] dump_stack+0x91/0xeb [ 4871.046458] print_address_description+0x6a/0x2a0 [ 4871.046461] kasan_report+0x176/0x2d0 [ 4871.046463] ? bnxt_re_task+0x63/0x180 [bnxt_re] [ 4871.046466] bnxt_re_task+0x63/0x180 [bnxt_re] [ 4871.046470] process_one_work+0x216/0x5b0 [ 4871.046471] ? process_one_work+0x189/0x5b0 [ 4871.046475] worker_thread+0x4e/0x3d0 [ 4871.046479] kthread+0x10e/0x140 [ 4871.046480] ? process_one_work+0x5b0/0x5b0 [ 4871.046482] ? kthread_stop+0x220/0x220 [ 4871.046486] ret_from_fork+0x3a/0x50 [ 4871.046492] The buggy address belongs to the page: [ 4871.046494] page:ffffea003ea10180 count:0 mapcount:0 mapping:0000000000000000 index:0x0 [ 4871.046495] flags: 0x57ffffc0000000() [ 4871.046498] raw: 0057ffffc0000000 0000000000000000 ffffea003ea10188 0000000000000000 [ 4871.046500] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 [ 4871.046501] page dumped because: kasan: bad access detected Avoid accessing the device structure once it is freed. Fixes: 497158aa5f52 ("RDMA/bnxt_re: Fix the ib_reg failure cleanup") Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-11-21RDMA/bnxt_re: Fix system hang when registration with L2 driver failsSelvin Xavier1-0/+1
Driver doesn't release rtnl lock if registration with L2 driver (bnxt_re_register_netdev) fais and this causes hang while requesting for the next lock. [ 371.635416] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 371.635417] kworker/u48:1 D 0 634 2 0x80000000 [ 371.635423] Workqueue: bnxt_re bnxt_re_task [bnxt_re] [ 371.635424] Call Trace: [ 371.635426] ? __schedule+0x36b/0xbd0 [ 371.635429] schedule+0x39/0x90 [ 371.635430] schedule_preempt_disabled+0x11/0x20 [ 371.635431] __mutex_lock+0x45b/0x9c0 [ 371.635433] ? __mutex_lock+0x16d/0x9c0 [ 371.635435] ? bnxt_re_ib_reg+0x2b/0xb30 [bnxt_re] [ 371.635438] ? wake_up_klogd+0x37/0x40 [ 371.635442] bnxt_re_ib_reg+0x2b/0xb30 [bnxt_re] [ 371.635447] bnxt_re_task+0xfd/0x180 [bnxt_re] [ 371.635449] process_one_work+0x216/0x5b0 [ 371.635450] ? process_one_work+0x189/0x5b0 [ 371.635453] worker_thread+0x4e/0x3d0 [ 371.635455] kthread+0x10e/0x140 [ 371.635456] ? process_one_work+0x5b0/0x5b0 [ 371.635458] ? kthread_stop+0x220/0x220 [ 371.635460] ret_from_fork+0x3a/0x50 [ 371.635477] INFO: task NetworkManager:1228 blocked for more than 120 seconds. [ 371.635478] Tainted: G B OE 4.20.0-rc1+ #42 [ 371.635479] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Release the rtnl_lock correctly in the failure path. Fixes: de5c95d0f518 ("RDMA/bnxt_re: Fix system crash during RDMA resource initialization") Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-11-21RDMA/core: Add GIDs while changing MAC addr only for registered ndevParav Pandit1-2/+4
Currently when MAC address is changed, regardless of the netdev reg_state, GID entries are removed and added to reflect the new MAC address and new default GID entries. When a bonding device is used and the underlying PCI device is removed several netdevice events are generated. Two events of the interest are CHANGEADDR and UNREGISTER event on lower(slave) netdevice of the bond netdevice. Sometimes CHANGEADDR event is generated when netdev state is UNREGISTERING (after UNREGISTER event is generated). In this scenario, GID entries for default GIDs are added and never deleted because GID entries are deleted only when netdev state is < UNREGISTERED. This leads to non zero reference count on the netdevice. Due to this, PCI device unbind operation is getting stuck. To avoid it, when changing mac address, add GID entries only if netdev is in REGISTERED state. Fixes: 03db3a2d81e6 ("IB/core: Add RoCE GID table management") Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-11-21RDMA/mlx5: Fix fence type for IB_WR_LOCAL_INV WRMajd Dibbiny1-9/+10
Currently, for IB_WR_LOCAL_INV WR, when the next fence is None, the current fence will be SMALL instead of Normal Fence. Without this patch krping doesn't work on CX-5 devices and throws following error: The error messages are from CX5 driver are: (from server side) [ 710.434014] mlx5_0:dump_cqe:278:(pid 2712): dump error cqe [ 710.434016] 00000000 00000000 00000000 00000000 [ 710.434016] 00000000 00000000 00000000 00000000 [ 710.434017] 00000000 00000000 00000000 00000000 [ 710.434018] 00000000 93003204 100000b8 000524d2 [ 710.434019] krping: cq completion failed with wr_id 0 status 4 opcode 128 vender_err 32 Fixed the logic to set the correct fence type. Fixes: 6e8484c5cf07 ("RDMA/mlx5: set UMR wqe fence according to HCA cap") Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-26Revert "mm, mmu_notifier: annotate mmu notifiers with blockable invalidate ↵Michal Hocko1-1/+0
callbacks" Revert 5ff7091f5a2ca ("mm, mmu_notifier: annotate mmu notifiers with blockable invalidate callbacks"). MMU_INVALIDATE_DOES_NOT_BLOCK flags was the only one used and it is no longer needed since 93065ac753e4 ("mm, oom: distinguish blockable mode for mmu notifiers"). We now have a full support for per range !blocking behavior so we can drop the stop gap workaround which the per notifier flag was used for. Link: http://lkml.kernel.org/r/20180827112623.8992-4-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Cc: David Rientjes <rientjes@google.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-26Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds182-5018/+7128
Pull rdma updates from Jason Gunthorpe: "This has been a smaller cycle with many of the commits being smallish code fixes and improvements across the drivers. - Driver updates for bnxt_re, cxgb4, hfi1, hns, mlx5, nes, qedr, and rxe - Memory window support in hns - mlx5 user API 'flow mutate/steering' allows accessing the full packet mangling and matching machinery from user space - Support inter-working with verbs API calls in the 'devx' mlx5 user API, and provide options to use devx with less privilege - Modernize the use of syfs and the device interface to use attribute groups and cdev properly for uverbs, and clean up some of the core code's device list management - More progress on net namespaces for RDMA devices - Consolidate driver BAR mmapping support into core code helpers and rework how RDMA holds poitners to mm_struct for get_user_pages cases - First pass to use 'dev_name' instead of ib_device->name - Device renaming for RDMA devices" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (242 commits) IB/mlx5: Add support for extended atomic operations RDMA/core: Fix comment for hw stats init for port == 0 RDMA/core: Refactor ib_register_device() function RDMA/core: Fix unwinding flow in case of error to register device ib_srp: Remove WARN_ON in srp_terminate_io() IB/mlx5: Allow scatter to CQE without global signaled WRs IB/mlx5: Verify that driver supports user flags IB/mlx5: Support scatter to CQE for DC transport type RDMA/drivers: Use core provided API for registering device attributes RDMA/core: Allow existing drivers to set one sysfs group per device IB/rxe: Remove unnecessary enum values RDMA/umad: Use kernel API to allocate umad indexes RDMA/uverbs: Use kernel API to allocate uverbs indexes RDMA/core: Increase total number of RDMA ports across all devices IB/mlx4: Add port and TID to MAD debug print IB/mlx4: Enable debug print of SMPs RDMA/core: Rename ports_parent to ports_kobj RDMA/core: Do not expose unsupported counters IB/mlx4: Refer to the device kobject instead of ports_parent RDMA/nldev: Allow IB device rename through RDMA netlink ...
2018-10-25Merge tag 'pci-v4.20-changes' of ↵Linus Torvalds5-10/+15
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: - Fix ASPM link_state teardown on removal (Lukas Wunner) - Fix misleading _OSC ASPM message (Sinan Kaya) - Make _OSC optional for PCI (Sinan Kaya) - Don't initialize ASPM link state when ACPI_FADT_NO_ASPM is set (Patrick Talbert) - Remove x86 and arm64 node-local allocation for host bridge structures (Punit Agrawal) - Pay attention to device-specific _PXM node values (Jonathan Cameron) - Support new Immediate Readiness bit (Felipe Balbi) - Differentiate between pciehp surprise and safe removal (Lukas Wunner) - Remove unnecessary pciehp includes (Lukas Wunner) - Drop pciehp hotplug_slot_ops wrappers (Lukas Wunner) - Tolerate PCIe Slot Presence Detect being hardwired to zero to workaround broken hardware, e.g., the Wilocity switch/wireless device (Lukas Wunner) - Unify pciehp controller & slot structs (Lukas Wunner) - Constify hotplug_slot_ops (Lukas Wunner) - Drop hotplug_slot_info (Lukas Wunner) - Embed hotplug_slot struct into users instead of allocating it separately (Lukas Wunner) - Initialize PCIe port service drivers directly instead of relying on initcall ordering (Keith Busch) - Restore PCI config state after a slot reset (Keith Busch) - Save/restore DPC config state along with other PCI config state (Keith Busch) - Reference count devices during AER handling to avoid race issue with concurrent hot removal (Keith Busch) - If an Upstream Port reports ERR_FATAL, don't try to read the Port's config space because it is probably unreachable (Keith Busch) - During error handling, use slot-specific reset instead of secondary bus reset to avoid link up/down issues on hotplug ports (Keith Busch) - Restore previous AER/DPC handling that does not remove and re-enumerate devices on ERR_FATAL (Keith Busch) - Notify all drivers that may be affected by error recovery resets (Keith Busch) - Always generate error recovery uevents, even if a driver doesn't have error callbacks (Keith Busch) - Make PCIe link active reporting detection generic (Keith Busch) - Support D3cold in PCIe hierarchies during system sleep and runtime, including hotplug and Thunderbolt ports (Mika Westerberg) - Handle hpmemsize/hpiosize kernel parameters uniformly, whether slots are empty or occupied (Jon Derrick) - Remove duplicated include from pci/pcie/err.c and unused variable from cpqphp (YueHaibing) - Remove driver pci_cleanup_aer_uncorrect_error_status() calls (Oza Pawandeep) - Uninline PCI bus accessors for better ftracing (Keith Busch) - Remove unused AER Root Port .error_resume method (Keith Busch) - Use kfifo in AER instead of a local version (Keith Busch) - Use threaded IRQ in AER bottom half (Keith Busch) - Use managed resources in AER core (Keith Busch) - Reuse pcie_port_find_device() for AER injection (Keith Busch) - Abstract AER interrupt handling to disconnect error injection (Keith Busch) - Refactor AER injection callbacks to simplify future improvments (Keith Busch) - Remove unused Netronome NFP32xx Device IDs (Jakub Kicinski) - Use bitmap_zalloc() for dma_alias_mask (Andy Shevchenko) - Add switch fall-through annotations (Gustavo A. R. Silva) - Remove unused Switchtec quirk variable (Joshua Abraham) - Fix pci.c kernel-doc warning (Randy Dunlap) - Remove trivial PCI wrappers for DMA APIs (Christoph Hellwig) - Add Intel GPU device IDs to spurious interrupt quirk (Bin Meng) - Run Switchtec DMA aliasing quirk only on NTB endpoints to avoid useless dmesg errors (Logan Gunthorpe) - Update Switchtec NTB documentation (Wesley Yung) - Remove redundant "default n" from Kconfig (Bartlomiej Zolnierkiewicz) - Avoid panic when drivers enable MSI/MSI-X twice (Tonghao Zhang) - Add PCI support for peer-to-peer DMA (Logan Gunthorpe) - Add sysfs group for PCI peer-to-peer memory statistics (Logan Gunthorpe) - Add PCI peer-to-peer DMA scatterlist mapping interface (Logan Gunthorpe) - Add PCI configfs/sysfs helpers for use by peer-to-peer users (Logan Gunthorpe) - Add PCI peer-to-peer DMA driver writer's documentation (Logan Gunthorpe) - Add block layer flag to indicate driver support for PCI peer-to-peer DMA (Logan Gunthorpe) - Map Infiniband scatterlists for peer-to-peer DMA if they contain P2P memory (Logan Gunthorpe) - Register nvme-pci CMB buffer as PCI peer-to-peer memory (Logan Gunthorpe) - Add nvme-pci support for PCI peer-to-peer memory in requests (Logan Gunthorpe) - Use PCI peer-to-peer memory in nvme (Stephen Bates, Steve Wise, Christoph Hellwig, Logan Gunthorpe) - Cache VF config space size to optimize enumeration of many VFs (KarimAllah Ahmed) - Remove unnecessary <linux/pci-ats.h> include (Bjorn Helgaas) - Fix VMD AERSID quirk Device ID matching (Jon Derrick) - Fix Cadence PHY handling during probe (Alan Douglas) - Signal Cadence Endpoint interrupts via AXI region 0 instead of last region (Alan Douglas) - Write Cadence Endpoint MSI interrupts with 32 bits of data (Alan Douglas) - Remove redundant controller tests for "device_type == pci" (Rob Herring) - Document R-Car E3 (R8A77990) bindings (Tho Vu) - Add device tree support for R-Car r8a7744 (Biju Das) - Drop unused mvebu PCIe capability code (Thomas Petazzoni) - Add shared PCI bridge emulation code (Thomas Petazzoni) - Convert mvebu to use shared PCI bridge emulation (Thomas Petazzoni) - Add aardvark Root Port emulation (Thomas Petazzoni) - Support 100MHz/200MHz refclocks for i.MX6 (Lucas Stach) - Add initial power management for i.MX7 (Leonard Crestez) - Add PME_Turn_Off support for i.MX7 (Leonard Crestez) - Fix qcom runtime power management error handling (Bjorn Andersson) - Update TI dra7xx unaligned access errata workaround for host mode as well as endpoint mode (Vignesh R) - Fix kirin section mismatch warning (Nathan Chancellor) - Remove iproc PAXC slot check to allow VF support (Jitendra Bhivare) - Quirk Keystone K2G to limit MRRS to 256 (Kishon Vijay Abraham I) - Update Keystone to use MRRS quirk for host bridge instead of open coding (Kishon Vijay Abraham I) - Refactor Keystone link establishment (Kishon Vijay Abraham I) - Simplify and speed up Keystone link training (Kishon Vijay Abraham I) - Remove unused Keystone host_init argument (Kishon Vijay Abraham I) - Merge Keystone driver files into one (Kishon Vijay Abraham I) - Remove redundant Keystone platform_set_drvdata() (Kishon Vijay Abraham I) - Rename Keystone functions for uniformity (Kishon Vijay Abraham I) - Add Keystone device control module DT binding (Kishon Vijay Abraham I) - Use SYSCON API to get Keystone control module device IDs (Kishon Vijay Abraham I) - Clean up Keystone PHY handling (Kishon Vijay Abraham I) - Use runtime PM APIs to enable Keystone clock (Kishon Vijay Abraham I) - Clean up Keystone config space access checks (Kishon Vijay Abraham I) - Get Keystone outbound window count from DT (Kishon Vijay Abraham I) - Clean up Keystone outbound window configuration (Kishon Vijay Abraham I) - Clean up Keystone DBI setup (Kishon Vijay Abraham I) - Clean up Keystone ks_pcie_link_up() (Kishon Vijay Abraham I) - Fix Keystone IRQ status checking (Kishon Vijay Abraham I) - Add debug messages for all Keystone errors (Kishon Vijay Abraham I) - Clean up Keystone includes and macros (Kishon Vijay Abraham I) - Fix Mediatek unchecked return value from devm_pci_remap_iospace() (Gustavo A. R. Silva) - Fix Mediatek endpoint/port matching logic (Honghui Zhang) - Change Mediatek Root Port Class Code to PCI_CLASS_BRIDGE_PCI (Honghui Zhang) - Remove redundant Mediatek PM domain check (Honghui Zhang) - Convert Mediatek to pci_host_probe() (Honghui Zhang) - Fix Mediatek MSI enablement (Honghui Zhang) - Add Mediatek system PM support for MT2712 and MT7622 (Honghui Zhang) - Add Mediatek loadable module support (Honghui Zhang) - Detach VMD resources after stopping root bus to prevent orphan resources (Jon Derrick) - Convert pcitest build process to that used by other tools (iio, perf, etc) (Gustavo Pimentel) * tag 'pci-v4.20-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (140 commits) PCI/AER: Refactor error injection fallbacks PCI/AER: Abstract AER interrupt handling PCI/AER: Reuse existing pcie_port_find_device() interface PCI/AER: Use managed resource allocations PCI: pcie: Remove redundant 'default n' from Kconfig PCI: aardvark: Implement emulated root PCI bridge config space PCI: mvebu: Convert to PCI emulated bridge config space PCI: mvebu: Drop unused PCI express capability code PCI: Introduce PCI bridge emulated config space common logic PCI: vmd: Detach resources after stopping root bus nvmet: Optionally use PCI P2P memory nvmet: Introduce helper functions to allocate and free request SGLs nvme-pci: Add support for P2P memory in requests nvme-pci: Use PCI p2pmem subsystem to manage the CMB IB/core: Ensure we map P2P memory correctly in rdma_rw_ctx_[init|destroy]() block: Add PCI P2P flag for request queue PCI/P2PDMA: Add P2P DMA driver writer's documentation docs-rst: Add a new directory for PCI documentation PCI/P2PDMA: Introduce configfs/sysfs enable attribute helpers PCI/P2PDMA: Add PCI p2pmem DMA mappings to adjust the bus offset ...
2018-10-20Merge branch 'pci/peer-to-peer'Bjorn Helgaas1-2/+9
- Add PCI support for peer-to-peer DMA (Logan Gunthorpe) - Add sysfs group for PCI peer-to-peer memory statistics (Logan Gunthorpe) - Add PCI peer-to-peer DMA scatterlist mapping interface (Logan Gunthorpe) - Add PCI configfs/sysfs helpers for use by peer-to-peer users (Logan Gunthorpe) - Add PCI peer-to-peer DMA driver writer's documentation (Logan Gunthorpe) - Add block layer flag to indicate driver support for PCI peer-to-peer DMA (Logan Gunthorpe) - Map Infiniband scatterlists for peer-to-peer DMA if they contain P2P memory (Logan Gunthorpe) - Register nvme-pci CMB buffer as PCI peer-to-peer memory (Logan Gunthorpe) - Add nvme-pci support for PCI peer-to-peer memory in requests (Logan Gunthorpe) - Use PCI peer-to-peer memory in nvme (Stephen Bates, Steve Wise, Christoph Hellwig, Logan Gunthorpe) * pci/peer-to-peer: nvmet: Optionally use PCI P2P memory nvmet: Introduce helper functions to allocate and free request SGLs nvme-pci: Add support for P2P memory in requests nvme-pci: Use PCI p2pmem subsystem to manage the CMB IB/core: Ensure we map P2P memory correctly in rdma_rw_ctx_[init|destroy]() block: Add PCI P2P flag for request queue PCI/P2PDMA: Add P2P DMA driver writer's documentation docs-rst: Add a new directory for PCI documentation PCI/P2PDMA: Introduce configfs/sysfs enable attribute helpers PCI/P2PDMA: Add PCI p2pmem DMA mappings to adjust the bus offset PCI/P2PDMA: Add sysfs group to display p2pmem stats PCI/P2PDMA: Support peer-to-peer memory
2018-10-20Merge branch 'pci/misc'Bjorn Helgaas2-6/+6
- Remove unused Netronome NFP32xx Device IDs (Jakub Kicinski) - Use bitmap_zalloc() for dma_alias_mask (Andy Shevchenko) - Add switch fall-through annotations (Gustavo A. R. Silva) - Remove unused Switchtec quirk variable (Joshua Abraham) - Fix pci.c kernel-doc warning (Randy Dunlap) - Remove trivial PCI wrappers for DMA APIs (Christoph Hellwig) - Add Intel GPU device IDs to spurious interrupt quirk (Bin Meng) - Run Switchtec DMA aliasing quirk only on NTB endpoints to avoid useless dmesg errors (Logan Gunthorpe) - Update Switchtec NTB documentation (Wesley Yung) - Remove redundant "default n" from Kconfig (Bartlomiej Zolnierkiewicz) * pci/misc: PCI: pcie: Remove redundant 'default n' from Kconfig NTB: switchtec_ntb: Update switchtec documentation with prerequisites for NTB PCI: Fix Switchtec DMA aliasing quirk dmesg noise PCI: Add macro for Switchtec quirk declarations PCI: Add Device IDs for Intel GPU "spurious interrupt" quirk PCI: Remove pci_set_dma_max_seg_size() PCI: Remove pci_set_dma_seg_boundary() PCI: Remove pci_unmap_addr() wrappers for DMA API PCI / ACPI: Mark expected switch fall-through PCI: Remove set but unused variable PCI: Fix pci.c kernel-doc parameter warning PCI: Allocate dma_alias_mask with bitmap_zalloc() PCI: Remove unused NFP32xx IDs
2018-10-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2-0/+6
net/sched/cls_api.c has overlapping changes to a call to nlmsg_parse(), one (from 'net') added rtm_tca_policy instead of NULL to the 5th argument, and another (from 'net-next') added cb->extack instead of NULL to the 6th argument. net/ipv4/ipmr_base.c is a case of a bug fix in 'net' being done to code which moved (to mr_table_dump)) in 'net-next'. Thanks to David Ahern for the heads up. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-19Merge tag 'for-gkh' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaGreg Kroah-Hartman2-0/+6
Doug writes: "Really final for-rc pull request for 4.19 Ok, so last week I thought we had sent our final pull request for 4.19. Well, wouldn't ya know someone went and found a couple Spectre v1 fixes were needed :-/. So, a couple *very* small specter patches for this (hopefully) final -rc week." * tag 'for-gkh' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/ucma: Fix Spectre v1 vulnerability IB/ucm: Fix Spectre v1 vulnerability
2018-10-18net/mlx5: Refactor fragmented buffer struct fields and init flowTariq Toukan2-18/+14
Take struct mlx5_frag_buf out of mlx5_frag_buf_ctrl, as it is not needed to manage and control the datapath of the fragmented buffers API. struct mlx5_frag_buf contains control info to manage the allocation and de-allocation of the fragmented buffer. Its fields are not relevant for datapath, so here I take them out of the struct mlx5_frag_buf_ctrl, except for the fragments array itself. In addition, modified mlx5_fill_fbc to initialise the frags pointers as well. This implies that the buffer must be allocated before the function is called. A set of type-specific *_get_byte_size() functions are replaced by a generic one. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-17net/mlx5: Add a no-append flow insertion modePaul Blakey1-3/+3
If no-append flag is set, we will add a new FTE, instead of appending the actions of the inserted rule when the same match already exists. While here, move the has_flow_tag boolean indicator to be a flag too. This patch doesn't change any functionality. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanmox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-17net/mlx5: Use flow counter IDs and not the wrapping cache objectMark Bloch1-2/+5
Currently, when a flow rule is created using the FS core layer, the caller has to pass the entire flow counter object and not just the counter HW handle (ID). This requires both the FS core and the caller to have knowledge about the inner implementation of the FS layer flow counters cache and limits the possible users. Move to use the counter ID across the place when dealing with flows. Doing this decoupling, now can we privatize the inner implementation of the flow counters. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-17Merge branch 'mlx5-next' of ↵Saeed Mahameed2-5/+5
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux into net-next mlx5 updates for both net-next and rdma-next * 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: (21 commits) net/mlx5: Expose DC scatter to CQE capability bit net/mlx5: Update mlx5_ifc with DEVX UID bits net/mlx5: Set uid as part of DCT commands net/mlx5: Set uid as part of SRQ commands net/mlx5: Set uid as part of SQ commands net/mlx5: Set uid as part of RQ commands net/mlx5: Set uid as part of QP commands net/mlx5: Set uid as part of CQ commands net/mlx5: Rename incorrect naming in IFC file net/mlx5: Export packet reformat alloc/dealloc functions net/mlx5: Pass a namespace for packet reformat ID allocation net/mlx5: Expose new packet reformat capabilities {net, RDMA}/mlx5: Rename encap to reformat packet net/mlx5: Move header encap type to IFC header file net/mlx5: Break encap/decap into two separated flow table creation flags net/mlx5: Add support for more namespaces when allocating modify header net/mlx5: Export modify header alloc/dealloc functions net/mlx5: Add proper NIC TX steering flow tables support net/mlx5: Cleanup flow namespace getter switch logic net/mlx5: Add memic command opcode to command checker ... Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-17IB/core: Ensure we map P2P memory correctly in rdma_rw_ctx_[init|destroy]()Logan Gunthorpe1-2/+9
In order to use PCI P2P memory the pci_p2pmem_map_sg() function must be called to map the correct PCI bus address. To do this, check the first page in the scatter list to see if it is P2P memory or not. At the moment, scatter lists that contain P2P memory must be homogeneous so if the first page is P2P the entire SGL should be P2P. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
2018-10-17IB/mlx5: Add support for extended atomic operationsYonatan Cohen1-12/+84
Extended atomic operations cmp&swp and fetch&add is a Mellanox feature extending the standard atomic operation to use, varied operand sizes, as apposed to normal atomic operation that use an 8 byte operand only. Extended atomics allows masking the results and arguments. This patch configures QP to support extended atomic operation with the maximum size possible, as exposed by HCA capabilities. Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com> Reviewed-by: Guy Levi <guyle@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-10-17RDMA/core: Fix comment for hw stats init for port == 0Parav Pandit1-3/+3
When add_port() is done for port == 0, it indicates that ports hardware counters initialization should be skipped. Reflect so in the comment. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-10-17RDMA/core: Refactor ib_register_device() functionParav Pandit1-51/+75
ib_register_device() does several allocation and initialization steps. Split it into smaller more readable functions for easy review and maintenance. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-10-17RDMA/core: Fix unwinding flow in case of error to register deviceParav Pandit1-2/+4
If port pkey list initialization fails, free the port_immutable memory during cleanup path. Currently it is missed out. If cache setup fails, free the pkey list during cleanup path. Fixes: d291f1a65 ("IB/core: Enforce PKey security on QPs") Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-10-17ib_srp: Remove WARN_ON in srp_terminate_io()Hannes Reinecke1-9/+0
The WARN_ON() is pointless as the rport is placed in SDEV_TRANSPORT_OFFLINE at that time, so no new commands can be submitted via srp_queuecommand() Signed-off-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.com> Acked-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-10-17IB/mlx5: Allow scatter to CQE without global signaled WRsYonatan Cohen1-3/+11
Requester scatter to CQE is restricted to QPs configured to signal all WRs. This patch adds ability to enable scatter to cqe (force enable) in the requester without sig_all, for users who do not want all WRs signaled but rather just the ones whose data found in the CQE. Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com> Reviewed-by: Guy Levi <guyle@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-10-17IB/mlx5: Verify that driver supports user flagsYonatan Cohen1-0/+14
Flags sent down from user might not be supported by running driver. This might lead to unwanted bugs. To solve this, added macro to test for unsupported flags. Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-10-17IB/mlx5: Support scatter to CQE for DC transport typeYonatan Cohen3-21/+54
Scatter to CQE is a HW offload that saves PCI writes by scattering the payload to the CQE. This patch extends already existing functionality to support DC transport type. Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com> Reviewed-by: Guy Levi <guyle@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-10-17RDMA/drivers: Use core provided API for registering device attributesParav Pandit23-546/+367
Use rdma_set_device_sysfs_group() to register device attributes and simplify the driver. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-17IB/rxe: Remove unnecessary enum valuesNathan Chancellor3-12/+7
Clang warns when an emumerated type is implicitly converted to another. drivers/infiniband/sw/rxe/rxe.c:106:27: warning: implicit conversion from enumeration type 'enum rxe_device_param' to different enumeration type 'enum ib_atomic_cap' [-Wenum-conversion] rxe->attr.atomic_cap = RXE_ATOMIC_CAP; ~ ^~~~~~~~~~~~~~ drivers/infiniband/sw/rxe/rxe.c:131:22: warning: implicit conversion from enumeration type 'enum rxe_port_param' to different enumeration type 'enum ib_port_state' [-Wenum-conversion] port->attr.state = RXE_PORT_STATE; ~ ^~~~~~~~~~~~~~ drivers/infiniband/sw/rxe/rxe.c:132:24: warning: implicit conversion from enumeration type 'enum rxe_port_param' to different enumeration type 'enum ib_mtu' [-Wenum-conversion] port->attr.max_mtu = RXE_PORT_MAX_MTU; ~ ^~~~~~~~~~~~~~~~ drivers/infiniband/sw/rxe/rxe.c:133:27: warning: implicit conversion from enumeration type 'enum rxe_port_param' to different enumeration type 'enum ib_mtu' [-Wenum-conversion] port->attr.active_mtu = RXE_PORT_ACTIVE_MTU; ~ ^~~~~~~~~~~~~~~~~~~ drivers/infiniband/sw/rxe/rxe.c:151:24: warning: implicit conversion from enumeration type 'enum rxe_port_param' to different enumeration type 'enum ib_mtu' [-Wenum-conversion] ib_mtu_enum_to_int(RXE_PORT_ACTIVE_MTU); ~~~~~~~~~~~~~~~~~~ ^~~~~~~~~~~~~~~~~~~ 5 warnings generated. Use the appropriate values from the expected enumerated type so no conversion needs to happen then remove the unneeded definitions. Reported-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-16RDMA/umad: Use kernel API to allocate umad indexesLeon Romanovsky1-6/+5
Replace custom code to allocate indexes to generic kernel API. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-10-16RDMA/uverbs: Use kernel API to allocate uverbs indexesLeon Romanovsky1-6/+6
Replace custom code to allocate indexes to generic kernel API. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-10-16RDMA/core: Increase total number of RDMA ports across all devicesLeon Romanovsky1-1/+1
IDA adds overhead to store IDs bitmap with maximal value of IDA can be upto 2099202 (IDA_MAX = 0x80000000U / IDA_BITMAP_BITS - 1). However, there is no need to add such enormous number of devices and it is enough for now to limit it to be 8192. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-10-16IB/mlx4: Add port and TID to MAD debug printHåkon Bugge1-8/+10
Add said information and make the debug print format consistent. Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-10-16IB/mlx4: Enable debug print of SMPsHåkon Bugge1-1/+1
IB Subnet Management Packets (SMPs) were excluded from debug prints. Fixed by enabling print even on QP0 MADs. Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-10-16RDMA/core: Rename ports_parent to ports_kobjParav Pandit1-5/+4
Normally kobj objects have kobj suffix to reflect it. Rename ports_parent to ports_kobj. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-10-16RDMA/core: Do not expose unsupported countersParav Pandit1-7/+12
If the provider driver (such as rdma_rxe) doesn't support pma counters, avoid exposing its directory similar to optional hw_counters directory. If core fails to read the PMA counter, return an error so that user can retry later if needed. Fixes: 35c4cbb17811 ("IB/core: Create get_perf_mad function in sysfs.c") Reported-by: Holger Hoffstätte <holger@applied-asynchrony.com> Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-10-16IB/mlx4: Refer to the device kobject instead of ports_parentParav Pandit1-5/+1
iov sysfs tree is created under ib device at /sys/class/infiniband/mlx4_0/iov. And, ibdev->ports_parent->parent = &ibdev->dev. Therefore, refer to device's kobject directly instead of indirect access to it. Additionally, iov entries are created under device kobject and deleted before device is removed. There is no need to hold additional reference to device kobject in provider driver. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-10-16RDMA/nldev: Allow IB device rename through RDMA netlinkLeon Romanovsky1-0/+34
Provide an option to rename IB device name through RDMA netlink and limit it to users with ADMIN capability only. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-10-16RDMA/core: Implement IB device rename functionLeon Romanovsky2-0/+26
Generic implementation of IB device rename function. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>