summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/mellanox
AgeCommit message (Collapse)AuthorFilesLines
2016-12-29net/mlx4_core: Fix raw qp flow steering rules under SRIOVJack Morgenstein2-21/+19
Demoting simple flow steering rule priority (for DPDK) was achieved by wrapping FW commands MLX4_QP_FLOW_STEERING_ATTACH/DETACH for the PF as well, and forcing the priority to MLX4_DOMAIN_NIC in the wrapper function for the PF and all VFs. In function mlx4_ib_create_flow(), this change caused the main rule creation for the PF to be wrapped, while it left the associated tunnel steering rule creation unwrapped for the PF. This mismatch caused rule deletion failures in mlx4_ib_destroy_flow() for the PF when the detach wrapper function did not find the associated tunnel-steering rule (since creation of that rule for the PF did not go through the wrapper function). Fix this by setting MLX4_QP_FLOW_STEERING_ATTACH/DETACH to be "native" (so that the PF invocation does not go through the wrapper), and perform the required priority demotion for the PF in the mlx4_ib_create_flow() code path. Fixes: 48564135cba8 ("net/mlx4_core: Demote simple multicast and broadcast flow steering rules") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-29net/mlx4_en: Fix type mismatch for 32-bit systemsSlava Shwartsman1-6/+2
is_power_of_2 expects unsigned long and we pass u64 max_val_cycles, this will be truncated on 32 bit systems, and the result is not what we were expecting. div_u64 expects u32 as a second argument and we pass max_val_cycles_rounded which is u64 hence it will always be truncated. Fix was tested on both 64 and 32 bit systems and got same results for max_val_cycles and max_val_cycles_rounded. Fixes: 4850cf458157 ("net/mlx4_en: Resolve dividing by zero in 32-bit system") Signed-off-by: Slava Shwartsman <slavash@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-29net/mlx4: Remove BUG_ON from ICM allocation routineLeon Romanovsky1-1/+6
This patch removes BUG_ON() macro from mlx4_alloc_icm_coherent() by checking DMA address alignment in advance and performing proper folding in case of error. Fixes: 5b0bf5e25efe ("mlx4_core: Support ICM tables in coherent memory") Reported-by: Ozgur Karatas <okaratas@member.fsf.org> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-29net/mlx4_en: Fix bad WQE issueEugenia Emantayev1-1/+7
Single send WQE in RX buffer should be stamped with software ownership in order to prevent the flow of QP in error in FW once UPDATE_QP is called. Fixes: 9f519f68cfff ('mlx4_en: Not using Shared Receive Queues') Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-29net/mlx4_core: Use-after-free causes a resource leak in flow-steering detachJack Morgenstein1-2/+4
mlx4_QP_FLOW_STEERING_DETACH_wrapper first removes the steering rule (which results in freeing the rule structure), and then references a field in this struct (the qp number) when releasing the busy-status on the rule's qp. Since this memory was freed, it could reallocated and changed. Therefore, the qp number in the struct may be incorrect, so that we are releasing the incorrect qp. This leaves the rule's qp in the busy state (and could possibly release an incorrect qp as well). Fix this by saving the qp number in a local variable, for use after removing the steering rule. Fixes: 2c473ae7e582 ("net/mlx4_core: Disallow releasing VF QPs which have steering rules") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-28net/mlx5e: Disable netdev after closeSaeed Mahameed1-4/+4
Disable netdev should come after it was closed, although no harm of doing it before -hence the MLX5E_STATE_DESTROYING bit- but it is more natural this way. Fixes: 26e59d8077a3 ("net/mlx5e: Implement mlx5e interface attach/detach callbacks") Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-28net/mlx5e: Don't sync netdev state when not registeredSaeed Mahameed1-7/+12
Skip setting netdev vxlan ports and netdev rx_mode on driver load when netdev is not yet registered. Synchronizing with netdev state is needed only on reset flow where the netdev remains registered for the whole reset period. This also fixes an access before initialization of net_device.addr_list_lock - which for some reason initialized on register_netdev - where we queued set_rx_mode work on driver load before netdev registration. Fixes: 26e59d8077a3 ("net/mlx5e: Implement mlx5e interface attach/detach callbacks") Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Reviewed-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-28net/mlx5e: Check ets capability before initializing ets settingsHuy Nguyen1-0/+3
During the initial setup, the ets command is sent to firmware without checking if the HCA supports ets. This causes the invalid command error. Add the ets capiblity check before sending firmware command to initialize ets settings. Fixes: e207b7e99176 ("net/mlx5e: ConnectX-4 firmware support for DCBX") Signed-off-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-28Revert "net/mlx5e: Expose PCIe statistics to ethtool"Gal Pressman3-72/+1
This reverts commit 9c7262399ba12825f3ca4b00a76d8d5e77c720f5. PCIe counters were introduced in a new firmware version, as a result users with old firmware encountered a syndrome every 200ms due to update stats work. This feature will be re-introduced later with appropriate capabilities infrastructure. Fixes: 9c7262399ba1 ("net/mlx5e: Expose PCIe statistics to ethtool") Signed-off-by: Gal Pressman <galp@mellanox.com> Reported-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-28net/mlx5: Prevent setting multicast macs for VFsMohamad Haj Yahia1-1/+1
Need to check that VF mac address entered by the admin user is either zero or unicast mac. Multicast mac addresses are prohibited. Fixes: 77256579c6b4 ('net/mlx5: E-Switch, Introduce Vport administration functions') Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-28net/mlx5: Release FTE lock in error flowMaor Gottlieb1-0/+1
Release the FTE lock when adding rule to the FTE has failed. Fixes: 0fd758d6112f ('net/mlx5: Don't unlock fte while still using it') Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-28net/mlx5: Mask destination mac value in ethtool steering rulesMaor Gottlieb1-0/+1
We need to mask the destination mac value with the destination mac mask when adding steering rule via ethtool. Fixes: 1174fce8d1410 ('net/mlx5e: Support l3/l4 flow type specs in ethtool flow steering') Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-28net/mlx5: Avoid shadowing numa_nodeEli Cohen1-2/+1
Avoid using a local variable named numa_node to avoid shadowing a public one. Fixes: db058a186f98 ('net/mlx5_core: Set irq affinity hints') Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-28net/mlx5: Cancel recovery work in remove flowDaniel Jurgens1-2/+3
If there is pending delayed work for health recovery it must be canceled if the device is being unloaded. Fixes: 05ac2c0b7438 ("net/mlx5: Fix race between PCI error handlers and health work") Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-28net/mlx5: Check FW limitations on log_max_qp before setting itNoa Osherovich1-0/+7
When setting HCA capabilities, set log_max_qp to be the minimum between the selected profile's value and the HCA limitation. Fixes: 938fe83c8dcb ('net/mlx5_core: New device capabilities...') Signed-off-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-28net/mlx5: Disable RoCE on the e-switch management port under switchdev modeOr Gerlitz1-0/+11
Under the switchdev/offloads mode, packets that don't match any e-switch steering rule are sent towards the e-switch management port. We use a NIC HW steering rule set per vport (uplink and VFs) to make them be received into the host OS through the respective vport representor netdevice. Currnetly such missed RoCE packets will not get to this NIC steering rule, and hence VF RoCE will not work over the slow path of the offloads mode. This is b/c these packets will be matched by a steering rule added by the firmware that serves RoCE traffic set on the PF NIC vport which is also the e-switch management port under SRIOV. Disabling RoCE on the e-switch management vport when we are in the offloads mode, will signal to the firmware to remove their RoCE rule, and then the missed RoCE packets will be matched by the representor NIC steering rule as any other missed packets. To achieve that, we disable RoCE on the PF vport. We do that by removing (hot-unplugging) the IB device instance associated with the PF. This is also required by our current model where the PF serves as the uplink representor and hence only SW switching (TC, bridge, OVS) applications and slow path vport mlx5e net-device should be running over that vport. Fixes: c930a3ad7453 ('net/mlx5e: Add devlink based SRIOV mode changes') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds1-1/+2
Pull networking fixes from David Miller: 1) Various ipvlan fixes from Eric Dumazet and Mahesh Bandewar. The most important is to not assume the packet is RX just because the destination address matches that of the device. Such an assumption causes problems when an interface is put into loopback mode. 2) If we retry when creating a new tc entry (because we dropped the RTNL mutex in order to load a module, for example) we end up with -EAGAIN and then loop trying to replay the request. But we didn't reset some state when looping back to the top like this, and if another thread meanwhile inserted the same tc entry we were trying to, we re-link it creating an enless loop in the tc chain. Fix from Daniel Borkmann. 3) There are two different WRITE bits in the MDIO address register for the stmmac chip, depending upon the chip variant. Due to a bug we could set them both, fix from Hock Leong Kweh. 4) Fix mlx4 bug in XDP_TX handling, from Tariq Toukan. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: net: stmmac: fix incorrect bit set in gmac4 mdio addr register r8169: add support for RTL8168 series add-on card. net: xdp: remove unused bfp_warn_invalid_xdp_buffer() openvswitch: upcall: Fix vlan handling. ipv4: Namespaceify tcp_tw_reuse knob net: korina: Fix NAPI versus resources freeing net, sched: fix soft lockup in tc_classify net/mlx4_en: Fix user prio field in XDP forward tipc: don't send FIN message from connectionless socket ipvlan: fix multicast processing ipvlan: fix various issues in ipvlan_process_multicast()
2016-12-25clocksource: Use a plain u64 instead of cycle_tThomas Gleixner5-7/+7
There is no point in having an extra type for extra confusion. u64 is unambiguous. Conversion was done with the following coccinelle script: @rem@ @@ -typedef u64 cycle_t; @fix@ typedef cycle_t; @@ -cycle_t +u64 Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org>
2016-12-23net/mlx4_en: Fix user prio field in XDP forwardTariq Toukan1-1/+2
The user prio field is wrong (and overflows) in the XDP forward flow. This is a result of a bad value for num_tx_rings_p_up, which should account all XDP TX rings, as they operate for the same user prio. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reported-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-23mlxsw: spectrum_router: Correctly remove nexthop groupsIdo Schimmel1-0/+3
At the end of the nexthop initialization process we determine whether the nexthop should be offloaded or not based on the NUD state of the neighbour representing it. After all the nexthops were initialized we refresh the nexthop group and potentially offload it to the device, in case some of the nexthops were resolved. Make the destruction of a nexthop group symmetric with its creation by marking all nexthops as invalid and then refresh the nexthop group to make sure it was removed from the device's tables. Fixes: b2157149b0b0 ("mlxsw: spectrum_router: Add the nexthop neigh activity update") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-23mlxsw: spectrum_router: Don't reflect dead neighsIdo Schimmel1-4/+6
When a neighbour is considered to be dead, we should remove it from the device's table regardless of its NUD state. Without this patch, after setting a port to be administratively down we get the following errors when we periodically try to update the kernel about neighbours activity: [ 461.947268] mlxsw_spectrum 0000:03:00.0 sw1p3: Failed to find matching neighbour for IP=192.168.100.2 Fixes: a6bf9e933daf ("mlxsw: spectrum_router: Offload neighbours based on NUD state change") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-20net/mlx5: use rb_entry()Geliang Tang1-1/+1
To make the code clearer, use rb_entry() instead of container_of() to deal with rbtree. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds1-1/+1
Pull networking fixes and cleanups from David Miller: 1) Revert bogus nla_ok() change, from Alexey Dobriyan. 2) Various bpf validator fixes from Daniel Borkmann. 3) Add some necessary SET_NETDEV_DEV() calls to hsis_femac and hip04 drivers, from Dongpo Li. 4) Several ethtool ksettings conversions from Philippe Reynes. 5) Fix bugs in inet port management wrt. soreuseport, from Tom Herbert. 6) XDP support for virtio_net, from John Fastabend. 7) Fix NAT handling within a vrf, from David Ahern. 8) Endianness fixes in dpaa_eth driver, from Claudiu Manoil * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (63 commits) net: mv643xx_eth: fix build failure isdn: Constify some function parameters mlxsw: spectrum: Mark split ports as such cgroup: Fix CGROUP_BPF config qed: fix old-style function definition net: ipv6: check route protocol when deleting routes r6040: move spinlock in r6040_close as SOFTIRQ-unsafe lock order detected irda: w83977af_ir: cleanup an indent issue net: sfc: use new api ethtool_{get|set}_link_ksettings net: davicom: dm9000: use new api ethtool_{get|set}_link_ksettings net: cirrus: ep93xx: use new api ethtool_{get|set}_link_ksettings net: chelsio: cxgb3: use new api ethtool_{get|set}_link_ksettings net: chelsio: cxgb2: use new api ethtool_{get|set}_link_ksettings bpf: fix mark_reg_unknown_value for spilled regs on map value marking bpf: fix overflow in prog accounting bpf: dynamically allocate digest scratch buffer gtp: Fix initialization of Flags octet in GTPv1 header gtp: gtp_check_src_ms_ipv4() always return success net/x25: use designated initializers isdn: use designated initializers ...
2016-12-17mlxsw: spectrum: Mark split ports as suchIdo Schimmel1-1/+1
When a port is split we should mark it as such, as otherwise the split ports aren't renamed correctly (e.g. sw1p3 -> sw1p3s1) and the unsplit operation fails: $ devlink port split sw1p3 count 4 $ devlink port unsplit eth0 devlink answers: Invalid argument [ 598.565307] mlxsw_spectrum 0000:03:00.0 eth0: Port wasn't split Fixes: 67963a33b4fd ("mlxsw: Make devlink port instances independent of spectrum/switchx2 port instances") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Tamir Winetroub <tamirw@mellanox.com> Reviewed-by: Elad Raz <eladr@mellanox.com> Tested-by: Tamir Winetroub <tamirw@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-15Merge tag 'pci-v4.10-changes' of ↵Linus Torvalds1-41/+43
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "PCI changes: - add support for PCI on ARM64 boxes with ACPI. We already had this for theoretical spec-compliant hardware; now we're adding quirks for the actual hardware (Cavium, HiSilicon, Qualcomm, X-Gene) - add runtime PM support for hotplug ports - enable runtime suspend for Intel UHCI that uses platform-specific wakeup signaling - add yet another host bridge registration interface. We hope this is extensible enough to subsume the others - expose device revision in sysfs for DRM - to avoid device conflicts, make sure any VF BAR updates are done before enabling the VF - avoid unnecessary link retrains for ASPM - allow INTx masking on Mellanox devices that support it - allow access to non-standard VPD for Chelsio devices - update Broadcom iProc support for PAXB v2, PAXC v2, inbound DMA, etc - update Rockchip support for max-link-speed - add NVIDIA Tegra210 support - add Layerscape LS1046a support - update R-Car compatibility strings - add Qualcomm MSM8996 support - remove some uninformative bootup messages" * tag 'pci-v4.10-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (115 commits) PCI: Enable access to non-standard VPD for Chelsio devices (cxgb3) PCI: Expand "VPD access disabled" quirk message PCI: pciehp: Remove loading message PCI: hotplug: Remove hotplug core message PCI: Remove service driver load/unload messages PCI/AER: Log AER IRQ when claiming Root Port PCI/AER: Log errors with PCI device, not PCIe service device PCI/AER: Remove unused version macros PCI/PME: Log PME IRQ when claiming Root Port PCI/PME: Drop unused support for PMEs from Root Complex Event Collectors PCI: Move config space size macros to pci_regs.h x86/platform/intel-mid: Constify mid_pci_platform_pm PCI/ASPM: Don't retrain link if ASPM not possible PCI: iproc: Skip check for legacy IRQ on PAXC buses PCI: pciehp: Leave power indicator on when enabling already-enabled slot PCI: pciehp: Prioritize data-link event over presence detect PCI: rcar: Add gen3 fallback compatibility string for pcie-rcar PCI: rcar: Use gen2 fallback compatibility last PCI: rcar-gen2: Use gen2 fallback compatibility last PCI: rockchip: Move the deassert of pm/aclk/pclk after phy_init() ..
2016-12-12Merge branch 'timers-core-for-linus' of ↵Linus Torvalds2-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "The time/timekeeping/timer folks deliver with this update: - Fix a reintroduced signed/unsigned issue and cleanup the whole signed/unsigned mess in the timekeeping core so this wont happen accidentaly again. - Add a new trace clock based on boot time - Prevent injection of random sleep times when PM tracing abuses the RTC for storage - Make posix timers configurable for real tiny systems - Add tracepoints for the alarm timer subsystem so timer based suspend wakeups can be instrumented - The usual pile of fixes and updates to core and drivers" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits) timekeeping: Use mul_u64_u32_shr() instead of open coding it timekeeping: Get rid of pointless typecasts timekeeping: Make the conversion call chain consistently unsigned timekeeping_Force_unsigned_clocksource_to_nanoseconds_conversion alarmtimer: Add tracepoints for alarm timers trace: Update documentation for mono, mono_raw and boot clock trace: Add an option for boot clock as trace clock timekeeping: Add a fast and NMI safe boot clock timekeeping/clocksource_cyc2ns: Document intended range limitation timekeeping: Ignore the bogus sleep time if pm_trace is enabled selftests/timers: Fix spelling mistake "Asyncrhonous" -> "Asynchronous" clocksource/drivers/bcm2835_timer: Unmap region obtained by of_iomap clocksource/drivers/arm_arch_timer: Map frame with of_io_request_and_map() arm64: dts: rockchip: Arch counter doesn't tick in system suspend clocksource/drivers/arm_arch_timer: Don't assume clock runs in suspend posix-timers: Make them configurable posix_cpu_timers: Move the add_device_randomness() call to a proper place timer: Move sys_alarm from timer.c to itimer.c ptp_clock: Allow for it to be optional Kconfig: Regenerate *.c_shipped files after previous changes ...
2016-12-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-2/+0
2016-12-09net: mlx5: Fix Kconfig help textChristopher Covington1-2/+0
Since the following commit, Infiniband and Ethernet have not been mutually exclusive. Fixes: 4aa17b28 mlx5: Enable mutual support for IB and Ethernet Signed-off-by: Christopher Covington <cov@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-08net/mlx5e: use %pad format string for dma_addr_tArnd Bergmann1-2/+2
On 32-bit ARM with 64-bit dma_addr_t I get this warning about an incorrect format string: In file included from /git/arm-soc/drivers/net/ethernet/mellanox/mlx5/core/alloc.c:42:0: drivers/net/ethernet/mellanox/mlx5/core/alloc.c: In function ‘mlx5_frag_buf_alloc_node’: drivers/net/ethernet/mellanox/mlx5/core/alloc.c:134:12: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast] We have the special %pad format for printing dma_addr_t, so use that to print the correct address and avoid the warning. Fixes: 1c1b522808a1 ("net/mlx5e: Implement Fragmented Work Queue (WQ)") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-08mlx4: xdp: Reserve headroom for receiving packet when XDP prog is activeMartin KaFai Lau4-17/+27
Reserve XDP_PACKET_HEADROOM for packet and enable bpf_xdp_adjust_head() support. This patch only affects the code path when XDP is active. After testing, the tx_dropped counter is incremented if the xdp_prog sends more than wire MTU. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-08mlx4: xdp: Allow raising MTU up to one page minus eth and vlan hdrsMartin KaFai Lau2-30/+44
When XDP is active in mlx4, mlx4 is using one page/pkt. At the same time (i.e. when XDP is active), it is currently limiting MTU to be FRAG_SZ0 - ETH_HLEN - (2 * VLAN_HLEN) which is 1514 in x86. AFAICT, we can at least raise the MTU limit up to PAGE_SIZE - ETH_HLEN - (2 * VLAN_HLEN) which this patch is doing. It will be useful in the next patch which allows XDP program to extend the packet by adding new header(s). Note: In the earlier XDP patches, there is already existing guard to ensure the page/pkt scheme only applies when XDP is active in mlx4. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-08bpf: xdp: Allow head adjustment in XDP progMartin KaFai Lau2-0/+10
This patch allows XDP prog to extend/remove the packet data at the head (like adding or removing header). It is done by adding a new XDP helper bpf_xdp_adjust_head(). It also renames bpf_helper_changes_skb_data() to bpf_helper_changes_pkt_data() to better reflect that XDP prog does not work on skb. This patch adds one "xdp_adjust_head" bit to bpf_prog for the XDP-capable driver to check if the XDP prog requires bpf_xdp_adjust_head() support. The driver can then decide to error out during XDP_SETUP_PROG. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-08net/mlx5e: Offload TC matching on packets being IP fragmentsOr Gerlitz1-0/+12
Enable offloading of matching on packets being fragments. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller8-42/+53
2016-12-06net/mlx5e: Change the SQ/RQ operational state to positive logicMohamad Haj Yahia5-13/+17
When using the negative logic (i.e. FLUSH state), after the RQ/SQ reopen we will have a time interval that the RQ/SQ is not really ready and the state indicates that its not in FLUSH state because the initial SQ/RQ struct memory starts as zeros. Now we changed the state to indicate if the SQ/RQ is opened and we will set the READY state after finishing preparing all the SQ/RQ resources. Fixes: 6e8dd6d6f4bd ("net/mlx5e: Don't wait for SQ completions on close") Fixes: f2fde18c52a7 ("net/mlx5e: Don't wait for RQ completions on close") Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-06net/mlx5e: Don't flush SQ on errorSaeed Mahameed1-1/+0
We are doing SQ descriptors cleanup in driver. Fixes: 6e8dd6d6f4bd ("net/mlx5e: Don't wait for SQ completions on close") Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-06net/mlx5e: Don't notify HW when filling the edge of ICO SQSaeed Mahameed1-1/+1
We are going to do this a couple of steps ahead anyway. Fixes: d3c9bc2743dc ("net/mlx5e: Added ICO SQs") Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-06net/mlx5: Fix query ISSI flowKamal Heib3-11/+14
In old FWs query ISSI command is not supported and for some of those FWs it might fail with status other than "MLX5_CMD_STAT_BAD_OP_ERR". In such case instead of failing the driver load, we will treat any FW status other than 0 for Query ISSI FW command as ISSI not supported and assume ISSI=0 (most basic driver/FW interface). In case of driver syndrom (query ISSI failure by driver) we will fail driver load. Fixes: f62b8bb8f2d3 ('net/mlx5: Extend mlx5_core to support ConnectX-4 Ethernet functionality') Signed-off-by: Kamal Heib <kamalh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-06net/mlx5: Remove duplicate pci dev name printKamal Heib1-4/+4
Remove duplicate pci dev name printing from mlx5_core_warn/dbg. Fixes: 5a7883989b1c ('net/mlx5_core: Improve mlx5 messages') Signed-off-by: Kamal Heib <kamalh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-06net/mlx5: Verify module parametersKamal Heib2-12/+17
Verify the mlx5_core module parameters by making sure that they are in the expected range and if they aren't restore them to their default values. Fixes: 9603b61de1ee ('mlx5: Move pci device handling from mlx5_ib to mlx5_core') Signed-off-by: Kamal Heib <kamalh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03ipv4: fib: Replay events when registering FIB notifierIdo Schimmel1-1/+19
Commit b90eb7549499 ("fib: introduce FIB notification infrastructure") introduced a new notification chain to notify listeners (f.e., switchdev drivers) about addition and deletion of routes. However, upon registration to the chain the FIB tables can already be populated, which means potential listeners will have an incomplete view of the tables. Solve that by dumping the FIB tables and replaying the events to the passed notification block. The dump itself is done using RCU in order not to starve consumers that need RTNL to make progress. The integrity of the dump is ensured by reading the FIB change sequence counter before and after the dump under RTNL. This allows us to avoid the problematic situation in which the dumping process sends a ENTRY_ADD notification following ENTRY_DEL generated by another process holding RTNL. Callers of the registration function may pass a callback that is executed in case the dump was inconsistent with current FIB tables. The number of retries until a consistent dump is achieved is set to a fixed number to prevent callers from looping for long periods of time. In case current limit proves to be problematic in the future, it can be easily converted to be configurable using a sysctl. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03mlxsw: spectrum_router: Implement FIB offload in deferred workIdo Schimmel1-10/+62
FIB offload is currently done in process context with RTNL held, but we're about to dump the FIB tables in RCU critical section, so we can no longer sleep. Instead, defer the operation to process context using deferred work. Make sure fib info isn't freed while the work is queued by taking a reference on it and releasing it after the operation is done. Deferring the operation is valid because the upper layers always assume the operation was successful. If it's not, then the driver-specific abort mechanism is called and all routed traffic is directed to slow path. The work items are submitted to an ordered workqueue to prevent a mismatch between the kernel's FIB table and the device's. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03mlxsw: core: Create an ordered workqueue for FIB offloadIdo Schimmel2-0/+24
We're going to start processing FIB entries addition / deletion events in deferred work. These work items must be processed in the order they were submitted or otherwise we can have differences between the kernel's FIB table and the device's. Solve this by creating an ordered workqueue to which these work items will be submitted to. Note that we can't simply convert the current workqueue to be ordered, as EMADs re-transmissions are also processed in deferred work. Later on, we can migrate other work items to this workqueue, such as FDB notification processing and nexthop resolution, since they all take the same lock anyway. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03mlx4: use reset to set mac headerZhang Shengju1-1/+1
Since offset is zero, it's not necessary to use set function. Reset function is straightforward, and will remove the unnecessary add operation in set function. Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller3-20/+9
Couple conflicts resolved here: 1) In the MACB driver, a bug fix to properly initialize the RX tail pointer properly overlapped with some changes to support variable sized rings. 2) In XGBE we had a "CONFIG_PM" --> "CONFIG_PM_SLEEP" fix overlapping with a reorganization of the driver to support ACPI, OF, as well as PCI variants of the chip. 3) In 'net' we had several probe error path bug fixes to the stmmac driver, meanwhile a lot of this code was cleaned up and reorganized in 'net-next'. 4) The cls_flower classifier obtained a helper function in 'net-next' called __fl_delete() and this overlapped with Daniel Borkamann's bug fix to use RCU for object destruction in 'net'. It also overlapped with Jiri's change to guard the rhashtable_remove_fast() call with a check against tc_skip_sw(). 5) In mlx4, a revert bug fix in 'net' overlapped with some unrelated changes in 'net-next'. 6) In geneve, a stale header pointer after pskb_expand_head() bug fix in 'net' overlapped with a large reorganization of the same code in 'net-next'. Since the 'net-next' code no longer had the bug in question, there was nothing to do other than to simply take the 'net-next' hunks. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02mlx4: fix use-after-free in mlx4_en_fold_software_stats()Eric Dumazet2-1/+5
My recent commit to get more precise rx/tx counters in ndo_get_stats64() can lead to crashes at device dismantle, as Jesper found out. We must prevent mlx4_en_fold_software_stats() trying to access tx/rx rings if they are deleted. Fix this by adding a test against priv->port_up in mlx4_en_fold_software_stats() Calling mlx4_en_fold_software_stats() from mlx4_en_stop_port() allows us to eventually broadcast the latest/current counters to rtnetlink monitors. Fixes: 40931b85113d ("mlx4: give precise rx/tx bytes/packets counters") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-and-bisected-by: Jesper Dangaard Brouer <brouer@redhat.com> Tested-by: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Tariq Toukan <tariqt@mellanox.com> Cc: Saeed Mahameed <saeedm@dev.mellanox.co.il> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02net/mlx5e: Support adding ingress tc rule when egress device flag is setHadar Hen Zion1-0/+8
When ndo_setup_tc is called with an egress_dev flag set, it means that the ndo call was executed on the mirred action (egress) device and not on the ingress device. In order to support this kind of ndo_setup_tc call, and insert the correct decap rule to the hardware, the uplink device on the same eswitch should be found. Currently, we use this resolution between the mirred device and the uplink on the same eswitch to offload vxlan shared device decap rules. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02net/mlx5e: Save the represntor netdevice as part of the representorHadar Hen Zion4-10/+22
Replace the representor private data to a net_device pointer holding the representor netdevice, instead of void pointer holding mlx5e_priv. It will be used by a new eswitch service function, returning the uplink representor netdevice. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02net/mlx5e: Bring back representor's ndos that were accidentally removedHadar Hen Zion1-0/+2
The VF Representor udp tunnel ndo entries were removed by mistake, return them. Fixes: 370bad0f9a52 ('net/mlx5e: Support HW (offloaded) and SW counters for SRIOV switchdev mode') Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02net/mlx5e: skip loopback selftest with !CONFIG_INETArnd Bergmann1-1/+9
When CONFIG_INET is disabled, the new selftest results in a link error: drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.o: In function `mlx5e_test_loopback': en_selftest.c:(.text.mlx5e_test_loopback+0x2ec): undefined reference to `ip_send_check' en_selftest.c:(.text.mlx5e_test_loopback+0x34c): undefined reference to `udp4_hwcsum' This hides the specific test in that configuration. Fixes: 0952da791c97 ("net/mlx5e: Add support for loopback selftest") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>