summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2013-12-17bonding: add fail_over_mac attribute netlink supportsfeldma@cumulusnetworks.com5-18/+40
Add IFLA_BOND_FAIL_OVER_MAC to allow get/set of bonding parameter fail_over_mac via netlink. Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17bonding: add primary_select attribute netlink supportsfeldma@cumulusnetworks.com5-16/+41
Add IFLA_BOND_PRIMARY_SELECT to allow get/set of bonding parameter primary_select via netlink. Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17bonding: add primary attribute netlink supportsfeldma@cumulusnetworks.com5-43/+83
Add IFLA_BOND_PRIMARY to allow get/set of bonding parameter primary via netlink. Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17pkt_sched: fq: more robust memory allocationEric Dumazet1-6/+28
This patch brings NUMA support and automatic fallback to vmalloc() in case kmalloc() failed to allocate FQ hash table. NUMA support depends on XPS being setup for the device before qdisc allocation. After a XPS change, it might be worth creating qdisc hierarchy again. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17bondnl: use be32 nla put/get for be32 valuesJiri Pirko1-2/+2
Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17bnad: make local variable staticstephen hemminger2-2/+1
Compile tested only. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Rasesh Mody <rmody@brocade.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17tcp: refine TSO splitsEric Dumazet2-46/+54
While investigating performance problems on small RPC workloads, I noticed linux TCP stack was always splitting the last TSO skb into two parts (skbs). One being a multiple of MSS, and a small one with the Push flag. This split is done even if TCP_NODELAY is set, or if no small packet is in flight. Example with request/response of 4K/4K IP A > B: . ack 68432 win 2783 <nop,nop,timestamp 6524593 6525001> IP A > B: . 65537:68433(2896) ack 69632 win 2783 <nop,nop,timestamp 6524593 6525001> IP A > B: P 68433:69633(1200) ack 69632 win 2783 <nop,nop,timestamp 6524593 6525001> IP B > A: . ack 68433 win 2768 <nop,nop,timestamp 6525001 6524593> IP B > A: . 69632:72528(2896) ack 69633 win 2768 <nop,nop,timestamp 6525001 6524593> IP B > A: P 72528:73728(1200) ack 69633 win 2768 <nop,nop,timestamp 6525001 6524593> IP A > B: . ack 72528 win 2783 <nop,nop,timestamp 6524593 6525001> IP A > B: . 69633:72529(2896) ack 73728 win 2783 <nop,nop,timestamp 6524593 6525001> IP A > B: P 72529:73729(1200) ack 73728 win 2783 <nop,nop,timestamp 6524593 6525001> We can avoid this split by including the Nagle tests at the right place. Note : If some NIC had trouble sending TSO packets with a partial last segment, we would have hit the problem in GRO/forwarding workload already. tcp_minshall_update() is moved to tcp_output.c and is updated as we might feed a TSO packet with a partial last segment. This patch tremendously improves performance, as the traffic now looks like : IP A > B: . ack 98304 win 2783 <nop,nop,timestamp 6834277 6834685> IP A > B: P 94209:98305(4096) ack 98304 win 2783 <nop,nop,timestamp 6834277 6834685> IP B > A: . ack 98305 win 2768 <nop,nop,timestamp 6834686 6834277> IP B > A: P 98304:102400(4096) ack 98305 win 2768 <nop,nop,timestamp 6834686 6834277> IP A > B: . ack 102400 win 2783 <nop,nop,timestamp 6834279 6834686> IP A > B: P 98305:102401(4096) ack 102400 win 2783 <nop,nop,timestamp 6834279 6834686> IP B > A: . ack 102401 win 2768 <nop,nop,timestamp 6834687 6834279> IP B > A: P 102400:106496(4096) ack 102401 win 2768 <nop,nop,timestamp 6834687 6834279> IP A > B: . ack 106496 win 2783 <nop,nop,timestamp 6834280 6834687> IP A > B: P 102401:106497(4096) ack 106496 win 2783 <nop,nop,timestamp 6834280 6834687> IP B > A: . ack 106497 win 2768 <nop,nop,timestamp 6834688 6834280> IP B > A: P 106496:110592(4096) ack 106497 win 2768 <nop,nop,timestamp 6834688 6834280> Before : lpq83:~# nstat >/dev/null;perf stat ./super_netperf 200 -t TCP_RR -H lpq84 -l 20 -- -r 4K,4K 280774 Performance counter stats for './super_netperf 200 -t TCP_RR -H lpq84 -l 20 -- -r 4K,4K': 205719.049006 task-clock # 9.278 CPUs utilized 8,449,968 context-switches # 0.041 M/sec 1,935,997 CPU-migrations # 0.009 M/sec 160,541 page-faults # 0.780 K/sec 548,478,722,290 cycles # 2.666 GHz [83.20%] 455,240,670,857 stalled-cycles-frontend # 83.00% frontend cycles idle [83.48%] 272,881,454,275 stalled-cycles-backend # 49.75% backend cycles idle [66.73%] 166,091,460,030 instructions # 0.30 insns per cycle # 2.74 stalled cycles per insn [83.39%] 29,150,229,399 branches # 141.699 M/sec [83.30%] 1,943,814,026 branch-misses # 6.67% of all branches [83.32%] 22.173517844 seconds time elapsed lpq83:~# nstat | egrep "IpOutRequests|IpExtOutOctets" IpOutRequests 16851063 0.0 IpExtOutOctets 23878580777 0.0 After patch : lpq83:~# nstat >/dev/null;perf stat ./super_netperf 200 -t TCP_RR -H lpq84 -l 20 -- -r 4K,4K 280877 Performance counter stats for './super_netperf 200 -t TCP_RR -H lpq84 -l 20 -- -r 4K,4K': 107496.071918 task-clock # 4.847 CPUs utilized 5,635,458 context-switches # 0.052 M/sec 1,374,707 CPU-migrations # 0.013 M/sec 160,920 page-faults # 0.001 M/sec 281,500,010,924 cycles # 2.619 GHz [83.28%] 228,865,069,307 stalled-cycles-frontend # 81.30% frontend cycles idle [83.38%] 142,462,742,658 stalled-cycles-backend # 50.61% backend cycles idle [66.81%] 95,227,712,566 instructions # 0.34 insns per cycle # 2.40 stalled cycles per insn [83.43%] 16,209,868,171 branches # 150.795 M/sec [83.20%] 874,252,952 branch-misses # 5.39% of all branches [83.37%] 22.175821286 seconds time elapsed lpq83:~# nstat | egrep "IpOutRequests|IpExtOutOctets" IpOutRequests 11239428 0.0 IpExtOutOctets 23595191035 0.0 Indeed, the occupancy of tx skbs (IpExtOutOctets/IpOutRequests) is higher : 2099 instead of 1417, thus helping GRO to be more efficient when using FQ packet scheduler. Many thanks to Neal for review and ideas. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Nandita Dukkipati <nanditad@google.com> Cc: Van Jacobson <vanj@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Tested-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17net: remove dead code for add/del multiplestephen hemminger2-107/+1
These function to manipulate multiple addresses are not used anywhere in current net-next tree. Some out of tree code maybe using these but too bad; they should submit their code upstream.. Also, make __hw_addr_flush local since only used by dev_addr_lists.c Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17Merge branch 'for-davem' of ↵David S. Miller222-4227/+4954
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next John W. Linville says: ==================== Please pull this batch of updates for the 3.14 stream... For the Bluetooth bits, Gustavo says: "This is the first batch of patches intended for 3.14. There is nothing big here. Most of the code are refactors, clean up, small fixes, plus some new device id support." And... "More patches to 3.14. Here we have the support for Low Energy Connection Oriented Channels (LE CoC). Basically, as the name says, this adds supports for connection oriented channels in the same way we already have them for BR/EDR connections so profiles/protocols that work on top of BR/EDR can now work on LE plus a plenty of new possibilities for LE." For the ath10k bits, Kalle says: "Janusz and Marek implemented DFS support to ath10k, but the code is not enabled yet due to missing cfg80211/mac80211 patches (it will be enabled in the next pull request). Michal did some device reset fixes and made it possible for ath10k to share an interrupt with another device. And lots of smaller fixes from different people." For the iwlwifi bits, Emmanuel says: "I have here a big rework of the rate control by Eyal. This is obviously the biggest part of this batch. I also have enhancement of protection flags by Avri and a few bits for WoWLAN by Eliad and Luca. Johannes cleans up the debugfs plus a few fixes. I provided a few things for Bluetooth coexistence. Besides this we have an implementation for low priority scan." Along with all that, there are big batches of updates to mwifiex and ath9k, Jeff Kirsher's FSF address fix patches, and a handful of other bits here and there. Please let me know if there are problems! ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17Merge branch 'phy_power'David S. Miller5-2/+59
Sebastian Hesselbarth says: ==================== net: phy: Ethernet PHY powerdown optimization This is v2 of the ethernet PHY power optimization patches to reduce power consumption of network PHYs with link that are either unused or the corresponding netdev is down. Compared to the last version, this patch set drops a patch to disable unused PHYs after late initcall, as it is not compatible with a modular mdio bus [1]. I'll investigate different ways to have a modular mdio bus driver get notified when driver loading is done. Again, a branch with v2 applied to v3.13-rc2 can also be found at https://github.com/shesselba/linux-dove.git topic/ethphy-power-v2 [1] http://www.spinics.net/lists/arm-kernel/msg293028.html ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17net: phy: suspend phydev when going to HALTEDSebastian Hesselbarth1-1/+5
When phydev is going to HALTED state, we can try to suspend it to safe more power. phy_suspend helper will check if PHY can be suspended, so just call it when entering HALTED state. Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com> Acked-by: Mugunthan V N <mugunthanvnm@ti.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17net: phy: resume/suspend PHYs on attach/detachSebastian Hesselbarth1-0/+3
This ensures PHYs are resumed on attach and suspended on detach. Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com> Acked-by: Mugunthan V N <mugunthanvnm@ti.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17net: phy: provide phy_resume/phy_suspend helpersSebastian Hesselbarth2-0/+26
This adds helper functions to resume and suspend a given phy_device by calling the corresponding driver callbacks if available. Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com> Acked-by: Mugunthan V N <mugunthanvnm@ti.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17net: phy: marvell: provide genphy suspend/resumeSebastian Hesselbarth1-0/+22
Marvell PHYs support generic PHY suspend/resume, so provide those callbacks to all marvell specific drivers. Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com> Acked-by: Mugunthan V N <mugunthanvnm@ti.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17net: mv643xx_eth: properly start/stop phy deviceSebastian Hesselbarth1-1/+3
When using phydev, it should be phy_start/phy_stop'ed properly. This driver doesn't do that, so add the corresponding calls to port_start/ stop respectively. Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com> Acked-by: Mugunthan V N <mugunthanvnm@ti.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17sctp: Reorder 'struc association' members to reduce its sizewangweidong1-30/+29
Members of 'struct association' are not in appropriate order to reuse compiler added padding on 64bit architectures. In this patch we reorder those struct members and help reduce the size of the structure from 2776 bytes to 2720 bytes on 64 bit architectures. Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17Merge branch 'master' of ↵David S. Miller8-90/+469
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next Jeff Kirsher says: ==================== Intel Wired LAN Driver Updates This series contains updates to i40e only (again). Jesse provides a fix for when tx_rings structure is NULL and we do not want to panic. Then refactors the flow control set up and disables L2 flow control by default. Provides some trivial fixes as well as prevent compiler warnings. Then to align to similar behaviour in ixgbe, use the total number of CPUs in the system to suggest the number of transmit and receive queue pairs. Shannon provides a i40e ethtool fix to get some more reasonable information reports back out to the ethtool. In addition, fixes PF reset after offline test, where it reorders the test to put the register test last as it is the only one that needs a reset, and we wait to trigger the reset until after we clear the testing bit. Lastly provides basic support for handling suspend and resume for now, later on Wake-On-LAN support will be added. Anjali provides changes to tell the stack about our actual number of queues in order for RFS/RPS/XFS to work correctly. Then provides several patches to implement dynamically changing the queue count for the main VSI. Adds basic support for get/set channels for RSS so that the number of receive and transmit queue pair can be changed via ethtool. Cleans up the use of rtnl_lock in the reset patch since it runs from a work time. Neerav Parikh cleans up the VF interface to remove FCoE code as this feature will not be supported on VF interfaces. v2: - submitted patch 1 to net (since it was a fix needed for net), so dropped from this series (this patch will get added to net-next when Dave syncs his trees) - Dropped patches 4 & 11 from previous submission because of feedback received from Ben Hutchings and Sergei Shtylyov. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17Merge branch 'ovs_hash'David S. Miller8-4/+182
Francesco Fusco says: ==================== ovs: introduce arch-specific fast hashing improvements From: Daniel Borkmann <dborkman@redhat.com> We are introducing a fast hash function (see patch1) that can be used in the context of OpenVSwitch to reduce the hashing footprint (patch2). For details, please see individual patches! v1->v2: - Make hash generic and place it under lib ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17net: ovs: use CRC32 accelerated flow hash if availableFrancesco Fusco1-2/+2
Currently OVS uses jhash2() for calculating flow hashes in its internal flow_hash() function. The performance of the flow_hash() function is critical, as the input data can be hundreds of bytes long. OVS is largely deployed in x86_64 based datacenters. Therefore, we argue that the performance critical fast path of OVS should exploit underlying CPU features in order to reduce the per packet processing costs. We replace jhash2 with the hash implementation provided by the kernel hash lib, which exploits the crc32l instruction to achieve high performance Our patch greatly reduces the hash footprint from ~200 cycles of jhash2() to around ~90 cycles in case of ovs_flow_hash_crc() (measured with rdtsc over maximum length flow keys on an i7 Intel CPU). Additionally, we wrote a microbenchmark to stress the flow table performance. The benchmark inserts random flows into the flow hash and then performs lookups. Our hash deployed on a CRC32 capable CPU reduces the lookup for 1000 flows, 100 masks from ~10,100us to ~6,700us, for example. Thus, simply use the newly introduced arch_fast_hash2() as a drop-in replacement. Signed-off-by: Francesco Fusco <ffusco@redhat.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Thomas Graf <tgraf@redhat.com> Acked-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17lib: introduce arch optimized hash libraryFrancesco Fusco7-2/+180
We introduce a new hashing library that is meant to be used in the contexts where speed is more important than uniformity of the hashed values. The hash library leverages architecture specific implementation to achieve high performance and fall backs to jhash() for the generic case. On Intel-based x86 architectures, the library can exploit the crc32l instruction, part of the Intel SSE4.2 instruction set, if the instruction is supported by the processor. This implementation is twice as fast as the jhash() implementation on an i7 processor. Additional architectures, such as Arm64 provide instructions for accelerating the computation of CRC, so they could be added as well in follow-up work. Signed-off-by: Francesco Fusco <ffusco@redhat.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Thomas Graf <tgraf@redhat.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-16fddi: cleanup unsigned to unsigned int/shorttanxiaojun4-73/+73
Use "unsigned int/short" instead of "unsigned", and change the type of iteration variable "i" to "unsigned int". Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-16tipc: change lock_sock order in connect()wangweidong1-4/+2
Instead of reaquiring the socket lock and taking the normal exit path when a connection times out, we bail out early with a return -ETIMEDOUT. Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-16tipc: Use <linux/uaccess.h> instead of <asm/uaccess.h>wangweidong1-1/+1
As warned by checkpatch.pl, use #include <linux/uaccess.h> instead of <asm/uaccess.h> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-16tipc: kill unnecessary goto'swangweidong1-8/+6
Remove a number of needless 'goto exit' in send_stream when the socket is in an unconnected state. This patch is cosmetic and does not alter the operation of TIPC in any way. Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-16tipc: remove unnecessary variables and conditionswangweidong3-17/+8
We remove a number of unnecessary variables and branches in TIPC. This patch is cosmetic and does not change the operation of TIPC in any way. Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-16i40e: Remove FCoE in i40e_virtchnl_pf.c codeNeerav Parikh1-27/+2
Remove FCoE code from the VF interface, as the feature will not be supported on VF interfaces. Change-Id: Ie9db04fa2e37fa14ac3e73a9c20980348d931357 Signed-off-by: Neerav Parikh <Neerav.Parikh@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-12-16i40e: support for suspend and resumeShannon Nelson2-1/+97
Add basic support for handling suspend and resume. We'll add Wake-on-LAN support later. Change-Id: Iea5e11c81bd9289a5bdbf086de8f626911a0b5ce Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-12-16i40e: rtnl_lock in reset path fixesAnjali Singhai Jain3-5/+23
Any user-initiated path which eventually calls reset needs to hold the rtnl_lock, so add functionality to do that. Be careful not to use the safe reset when cleaning up from the diagnostic tests, which avoids rtnl_lock recursion from ethtool. Protect the reset_task with rtnl_lock, since it runs from a work item. Change-Id: Ib6e7a3fb2966809db2daf35fd5a123ccdf6f6f0f Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-12-16i40e: Add basic support for get/set channels for RSSAnjali Singhai Jain1-0/+90
Implement the number of receive/transmit queue pair being changed on the fly by ethtool. Change-Id: I70df2363f1ca40b63610baa561c5b6b92b81bca7 Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-12-16i40e: function to reconfigure RSS queues and rebuildAnjali Singhai Jain2-8/+50
This is the second of 3 patches that allows for changing the number of queues in the driver on the fly. This patch adds a function that calls the reinit flow for the main VSI after making changes to the RSS queue count as requested by the user. Change-Id: I82dee91e9fe90eeb4e84a7369f4b8b342155dd85 Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-12-16i40e: reinit flow for the main VSIAnjali Singhai Jain1-21/+97
This patch is the first in a 3 series patchset to implement dynamically changing the queue count for the main VSI. This patch starts by adding a reinit flow. This flow is designed to be able to change just the queue count and not the number of interrupt vectors that the device originally came up with. Change-Id: I0634aaebf7dc4dd6c66af8f9dbbef89d7beac438 Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-12-16i40e: use same number of queues as CPUsJesse Brandeburg1-4/+3
The current driver default sets the number of transmit/receive queue pairs based on the current node's CPU count. A better method is to use the total number of CPUs in the system to suggest the number of queue pairs, which aligns better with the behavior of ixgbe, and also with the expectations of the kernel XPS and other subsystems in the stack. Change-Id: If3e20c7f100f13e51d69762594d948f247ffe0c8 Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-12-16i40e: trivial fixesJesse Brandeburg3-3/+3
Prevent some compiler warnings and implement some other trivial fixes. Change-Id: I7f49d79b91b94df1ad4a8306a0410ed72238845f Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-12-16i40e: init flow control settings to disabledJesse Brandeburg2-9/+60
Refactor flow control set up and disable L2 flow control by default. Change-Id: I2fe257b80df6d9a1e37deb4df118da8f8467040d Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-12-16i40e: Tell the stack about our actual number of queuesAnjali Singhai Jain1-0/+10
Call the netif_set_real* functions in order to make sure the stack knows about how many queues we have, in order for RFS/RPS/XFS to work correctly. Change-Id: Ib7a7b2792f80c5eef210dedf42cc6607d63953d2 Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-12-16i40e: fix pf reset after offline testShannon Nelson1-6/+8
When the ethtool testing starts it sets the I40E_TESTING state bit, which blocks new netdev opens so that things don't get confused, while the testing might be messing with register and other things. Unfortunately, that was keeping the PF resets after the register test from working correctly because the netdev would not get reopened. This patch reorders the tests to put the register test last as it is the only one that needs a reset, and we wait to trigger the reset until after we clear the I40E_TESTING bit. Change-Id: Ieaa18d74264250ac336b0656b490125ee8a22d2a Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-12-16i40e: fix up some of the ethtool connection reportingJesse Brandeburg2-8/+28
Get some more reasonable information reported back out to ethtool for the different types of connections supported. Change-Id: I57b153f86b9cdd04ad7cb5bf7d1c45873c196a7a Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-12-15net-ipv6: Fix alleged compiler warning in ipv6_exthdrs_len()Jerry Chu1-8/+4
It was reported that Commit 299603e8370a93dd5d8e8d800f0dff1ce2c53d36 ("net-gro: Prepare GRO stack for the upcoming tunneling support") triggered a compiler warning in ipv6_exthdrs_len(): net/ipv6/ip6_offload.c: In function ‘ipv6_gro_complete’: net/ipv6/ip6_offload.c:178:24: warning: ‘optlen’ may be used uninitialized in this function [-Wmaybe-u opth = (void *)opth + optlen; ^ net/ipv6/ip6_offload.c:164:22: note: ‘optlen’ was declared here int len = 0, proto, optlen; ^ Note that there was no real bug here - optlen was never uninitialized before use. (Was the version of gcc I used smarter to not complain?) Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: H.K. Jerry Chu <hkchu@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-14Merge branch 'for-davem' of ↵David S. Miller10-217/+797
git://git.kernel.org/pub/scm/linux/kernel/git/bwh/sfc-next Ben Hutchings says: ==================== 1. Change PTP clock name to 'sfc'. 2. Complete support for hardware timestamping and PTP clock on the SFC9100 family. 3. Various cleanups for the PTP code. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-14ipv6: fix compiler warning in ipv6_exthdrs_lenHannes Frederic Sowa1-4/+5
Commit 299603e8370a93dd5d8e8d800f0dff1ce2c53d36 ("net-gro: Prepare GRO stack for the upcoming tunneling support") used an uninitialized variable which leads to the following compiler warning: net/ipv6/ip6_offload.c: In function ‘ipv6_gro_complete’: net/ipv6/ip6_offload.c:178:24: warning: ‘optlen’ may be used uninitialized in this function [-Wmaybe-uninitialized] opth = (void *)opth + optlen; ^ net/ipv6/ip6_offload.c:164:22: note: ‘optlen’ was declared here int len = 0, proto, optlen; ^ Fix it up. Cc: Jerry Chu <hkchu@google.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-14Merge branch 'bonding_rcu'David S. Miller8-147/+130
Ding Tianhong says: ==================== bonding: rebuild the lock use for bond monitor Now the bond slave list is not protected by bond lock, only by RTNL, but the monitor still use the bond lock to protect the slave list, it is useless, according to the Veaceslav's opinion, there were three way to fix the protect problem: 1. add bond_master_upper_dev_link() and bond_upper_dev_unlink() in bond->lock, but it is unsafe to call call_netdevice_notifiers() in write lock. 2. remove unused bond->lock for monitor function, only use the exist rtnl lock(), it will take performance loss in fast path. 3. use RCU to protect the slave list, of course, performance is better, but in slow path, it is ignored. obviously the solution 1 is not fit here, I will consider the 2 and 3 solution. My principle is simple, if in fast path, RCU is better, otherwise in slow path, both is well, but according to the Jay Vosburgh's opinion, the monitor will loss performace if use RTNL to protect the all slave list, so remove the bond lock and replace with RCU. The second problem is the curr_slave_lock for bond, it is too old and unwanted in many place, because the curr_active_slave would only be changed in 3 place: 1. enslave slave. 2. release slave. 3. change active slave. all above were already holding bond lock, RTNL and curr_slave_lock together, it is tedious and no need to add so mach lock, when change the curr_active_slave, you have to hold the RTNL and curr_slave_lock together, and when you read the curr_active_slave, RTNL or curr_slave_lock, any one of them is no problem. for the stability, I did not change the logic for the monitor, all change is clear and simple, I have test the patch set for lockdep, it work well and stability. v2. accept the Jay Vosburgh's opinion, remove the RTNL and replace with RCU, also add some rcu function for bond use, so the patch set reach 10. v3. accept the Nikolay Aleksandrov's opinion, remove no needed bond_has_slave_rcu(), add protection for several 3ad mode handler functions and current_arp_slave. rebuild the bond_first_slave_rcu(), make it more clear. v4. because the struct netdev_adjacent should not be exist in netdevice.h, so I have to make a new function to support micro bond_first_slave_rcu(). also add a new patch to simplify the bond_resend_igmp_join_requests_delayed(). v5. according the Jay Vosburgh's opinion, in patch 2 and 6, the calling of notify peer is hardly to happen with the bond_xxx_commit() when the monitoring is running, so the performance impact about make two round trips to one trip on RTNL is minimal, no need to do that,the reason is very clear, so modify the patch 2 and 6, recover the notify peer in RTNL alone. ==================== Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-14bonding: rebuild the bond_resend_igmp_join_requests_delayed()dingtianhong1-16/+5
The bond_resend_igmp_join_requests_delayed() and bond_resend_igmp_join_requests() should be integrated, because the bond_resend_igmp_join_requests_delayed() did nothing except bond_resend_igmp_join_requests(). The bond igmp_retrans could only be changed in bond_change_active_slave and here, bond_change_active_slave will be called in RTNL and curr_slave_lock, the bond_resend_igmp_join_requests already hold RTNL, so no need to free RTNL and hold curr_slave_lock again, it may be a small optimization, so move the igmp_retrans in RTNL and remove the curr_slave_lock. Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-14bonding: remove unwanted lock for bond_store_primaryxxx()dingtianhong1-4/+0
The bond_select_active_slave() will not release and acquire bond lock, so it is no need to read the bond lock for them, and the bond_store_primaryxxx() is already in RTNL, so remove the unwanted lock. Suggested-by: Jay Vosburgh <fubar@us.ibm.com> Suggested-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-14bonding: remove unwanted lock for bond_option_active_slave_set()dingtianhong1-2/+0
The bond_option_active_slave_set() is always called in RTNL, the RTNL could protect bond slave list, so remove the unwanted bond lock. Suggested-by: Jay Vosburgh <fubar@us.ibm.com> Suggested-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-14bonding: add RCU for bond_3ad_state_machine_handler()dingtianhong2-26/+35
The bond_3ad_state_machine_handler() use the bond lock to protect the bond slave list and slave port together, but it is not enough, the bond slave list was link and unlink in RTNL, not bond lock, so I add RCU to protect the slave list from leaving. The bond lock is still used here, because when the slave has been removed from the list by the time the state machine runs, it appears to be possible for both function to manupulate the same aggregator->lag_ports by finding the aggregator via two different ports that are both members of that aggregator (i.e., port A of the agg is being unbound, and port B of the agg is runing its state machine). If I remove the bond lock, there are nothing to mutex changes to aggregator->lag_ports between bond_3ad_state_machine_handler and bond_3ad_unbind_slave, So the bond lock is the simplest way to protect aggregator->lag_ports. There was a lot of function need RCU protect, I have two choice to make the function in RCU-safe, (1) create new similar functions and make the bond slave list in RCU. (2) modify the existed functions and make them in read-side critical section, because the RCU read-side critical sections may be nested. I choose (2) because it is no need to create more similar functions. The nots in the function is still too old, clean up the nots. Suggested-by: Nikolay Aleksandrov <nikolay@redhat.com> Suggested-by: Jay Vosburgh <fubar@us.ibm.com> Suggested-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-14bonding: remove unwanted lock for bond enslave and releasedingtianhong1-21/+6
The bond_change_active_slave() and bond_select_active_slave() do't need bond lock anymore, so remove the unwanted bond lock for these two functions. The bond_select_active_slave() will release and acquire curr_slave_lock, so the curr_slave_lock need to protect the function. In bond enslave and bond release, the bond slave list is also protected by RTNL, so bond lock is no need to exist, remove the lock and clean the functions. Suggested-by: Jay Vosburgh <fubar@us.ibm.com> Suggested-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-14bonding: rebuild the lock use for bond_activebackup_arp_mon()dingtianhong1-25/+20
The bond_activebackup_arp_mon() use the bond lock for read to protect the slave list, it is no effect, and the RTNL is only called for bond_ab_arp_commit() and peer notify, for the performance better, use RCU to replace with the bond lock, to the bond slave list need to called in RCU, add a new bond_first_slave_rcu() to get the first slave in RCU protection. In bond_ab_arp_probe(), the bond->current_arp_slave may changd if bond release slave, just like: bond_ab_arp_probe() bond_release() cpu 0 cpu 1 ... if (bond->current_arp_slave...) ... ... bond->current_arp_slave = NULl bond->current_arp_slave->dev->name ... So the current_arp_slave need to dereference in the section. Suggested-by: Nikolay Aleksandrov <nikolay@redhat.com> Suggested-by: Jay Vosburgh <fubar@us.ibm.com> Suggested-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-14bonding: create bond_first_slave_rcu()dingtianhong3-0/+26
The bond_first_slave_rcu() will be used to instead of bond_first_slave() in rcu_read_lock(). According to the Jay Vosburgh's suggestion, the struct netdev_adjacent should hide from users who wanted to use it directly. so I package a new function to get the first slave of the bond. Suggested-by: Nikolay Aleksandrov <nikolay@redhat.com> Suggested-by: Jay Vosburgh <fubar@us.ibm.com> Suggested-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-14bonding: rebuild the lock use for bond_loadbalance_arp_mon()dingtianhong1-6/+12
The bond_loadbalance_arp_mon() use the bond lock to protect the bond slave list, it is no effect, so I could use RTNL or RCU to replace it, considering the performance impact, the RCU is more better here, so the bond lock replace with the RCU. The bond_select_active_slave() need RTNL and curr_slave_lock together, but there is no RTNL lock here, so add a rtnl_rtylock. Suggested-by: Jay Vosburgh <fubar@us.ibm.com> Suggested-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-14bonding: rebuild the lock use for bond_alb_monitor()dingtianhong1-13/+9
The bond_alb_monitor use bond lock to protect the bond slave list, it is no effect here, we need to use RTNL or RCU to replace bond lock, the bond_alb_monitor will called 10 times one second, RTNL may loss performance here, so I replace bond lock with RCU to protect the bond slave list, also the RTNL is preserved, the logic of the monitor did not changed. Suggested-by: Nikolay Aleksandrov <nikolay@redhat.com> Suggested-by: Jay Vosburgh <fubar@us.ibm.com> Suggested-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>