summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2011-10-21dev: use name hash for dev_seq_opsMihai Maruseac1-15/+69
Instead of using the dev->next chain and trying to resync at each call to dev_seq_start, use the name hash, keeping the bucket and the offset in seq->private field. Tests revealed the following results for ifconfig > /dev/null * 1000 interfaces: * 0.114s without patch * 0.089s with patch * 3000 interfaces: * 0.489s without patch * 0.110s with patch * 5000 interfaces: * 1.363s without patch * 0.250s with patch * 128000 interfaces (other setup): * ~100s without patch * ~30s with patch Signed-off-by: Mihai Maruseac <mmaruseac@ixiacom.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-21macvtap: Fix the minor device number allocationEric W. Biederman2-16/+72
On systems that create and delete lots of dynamic devices the 31bit linux ifindex fails to fit in the 16bit macvtap minor, resulting in unusable macvtap devices. I have systems running automated tests that that hit this condition in just a few days. Use a linux idr allocator to track which mavtap minor numbers are available and and to track the association between macvtap minor numbers and macvtap network devices. Remove the unnecessary unneccessary check to see if the network device we have found is indeed a macvtap device. With macvtap specific data structures it is impossible to find any other kind of networking device. Increase the macvtap minor range from 65536 to the full 20 bits that is supported by linux device numbers. It doesn't solve the original problem but there is no penalty for a larger minor device range. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-21macvtap: Rewrite macvtap_newlink so the error handling works.Eric W. Biederman1-24/+49
Place macvlan_common_newlink at the end of macvtap_newlink because failing in newlink after registering your network device is not supported. Move device_create into a netdevice creation notifier. The network device notifier is the only hook that is called after the network device has been registered with the device layer and before register_network_device returns success. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-21macvtap: Don't leak unreceived packets when we delete a macvtap device.Eric W. Biederman1-0/+6
To avoid leaking packets in the receive queue. Add a socket destructor that will run whenever destroy a macvtap socket. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-21macvtap: Fix macvtap_open races in the zero copy enable code.Eric W. Biederman1-6/+5
To see if it is appropriate to enable the macvtap zero copy feature don't test the lowerdev network device flags. Instead test the macvtap network device flags which are a direct copy of the lowerdev flags. This is important because nothing holds a reference to lowerdev and on a very bad day we lowerdev could be a pointer to stale memory. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-21macvtap: Close a race between macvtap_open and macvtap_dellink.Eric W. Biederman1-0/+2
There is a small window in macvtap_open between looking up a networking device and calling macvtap_set_queue in which macvtap_del_queues called from macvtap_dellink. After calling macvtap_del_queues it is totally incorrect to allow macvtap_set_queue to proceed so prevent success by reporting that all of the available queues are in use. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-21virtio_net: fix truesize underestimationEric Dumazet1-1/+1
We must account in skb->truesize, the size of the fragments, not the used part of them. Doing this work is important to avoid unexpected OOM situations. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Rusty Russell <rusty@rustcorp.com.au> CC: "Michael S. Tsirkin" <mst@redhat.com> CC: virtualization@lists.linux-foundation.org CC: Krishna Kumar <krkumar2@in.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-21bnx2x: fix skb truesize underestimationEric Dumazet1-1/+1
bnx2x allocates a full page per fragment. We must account in skb->truesize, the size of the fragment, not the used part of it. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-21net: add opaque struct around skb frag pageIan Campbell2-7/+9
I've split this bit out of the skb frag destructor patch since it helps enforce the use of the fragment API. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-21cxgbi: convert to SKB paged frag API.Ian Campbell2-14/+16
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: "James E.J. Bottomley" <JBottomley@parallels.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: James Bottomley <James.Bottomley@suse.de> Cc: Karen Xie <kxie@chelsio.com> Cc: linux-scsi@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-21cxgb4vf: convert to SKB paged frag API.Ian Campbell2-53/+41
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: Casey Leedom <leedom@chelsio.com> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-21cxgb4: convert to SKB paged frag API.Ian Campbell2-23/+24
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: Dimitris Michailidis <dm@chelsio.com> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-21mlx4: convert to SKB paged frag API.Ian Campbell2-32/+20
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-20net: allow CAP_NET_RAW to set socket options IP{,V6}_TRANSPARENTMaciej Żenczykowski3-3/+4
Up till now the IP{,V6}_TRANSPARENT socket options (which actually set the same bit in the socket struct) have required CAP_NET_ADMIN privileges to set or clear the option. - we make clearing the bit not require any privileges. - we allow CAP_NET_ADMIN to set the bit (as before this change) - we allow CAP_NET_RAW to set this bit, because raw sockets already pretty much effectively allow you to emulate socket transparency. Signed-off-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-20net: constify skbuff and Qdisc elementsEric Dumazet2-20/+21
Preliminary patch before tcp constification Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-20tcp: remove unused tcp_fin() parametersEric Dumazet1-3/+3
tcp_fin() only needs socket pointer, we can remove skb and th params. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-20ll_temac: Add support for ethtoolRicardo1-0/+27
This patch enables the ethtool interface. The implementation is done using the libphy helper functions. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-20igb: fix a compile warningRongQing Li1-0/+3
control these three function declarations and definitions with same macro CONFIG_PCI_IOV drivers/net/ethernet/intel/igb/igb_main.c:165: warning: ‘igb_vf_configure’ declared ‘static’ but never defined drivers/net/ethernet/intel/igb/igb_main.c:166: warning: ‘igb_find_enabled_vfs’ declared ‘static’ but never defined drivers/net/ethernet/intel/igb/igb_main.c:167: warning: ‘igb_check_vf_assignment’ declared ‘static’ but never defined Signed-off-by: RongQing Li <roy.qing.li@gmail.com> Acked-by: Greg Rose <gregory.v.rose@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-20myri10ge: fix truesize underestimationEric Dumazet1-1/+2
skb->truesize must account for allocated memory, not the used part of it. Doing this work is important to avoid unexpected OOM situations. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Jon Mason <mason@myri.com> Acked-by: Jon Mason <mason@myri.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-20igbvf: fix truesize underestimationEric Dumazet1-1/+1
igbvf allocates half a page per skb fragment. We must account PAGE_SIZE/2 increments on skb->truesize, not the actual frag length. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-20pktgen: remove ndelay() callEric Dumazet1-4/+7
Daniel Turull reported inaccuracies in pktgen when using low packet rates, because we call ndelay(val) with values bigger than 20000. Instead of calling ndelay() for delays < 100us, we can instead loop calling ktime_now() only. Reported-by: Daniel Turull <daniel.turull@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-20tcp: use TCP_DEFAULT_INIT_RCVWND in tcp_fixup_rcvbuf()Eric Dumazet1-8/+15
Since commit 356f039822b (TCP: increase default initial receive window.), we allow sender to send 10 (TCP_DEFAULT_INIT_RCVWND) segments. Change tcp_fixup_rcvbuf() to reflect this change, even if no real change is expected, since sysctl_tcp_rmem[1] = 87380 and this value is bigger than tcp_fixup_rcvbuf() computed rcvmem (~23720) Note: Since commit 356f039822b limited default window to maximum of 10*1460 and 2*MSS, we use same heuristic in this patch. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-20mm: add a "struct page_frag" type containing a page, offset and lengthIan Campbell1-0/+11
A few network drivers currently use skb_frag_struct for this purpose but I have patches which add additional fields and semantics there which these other uses do not want. A structure for reference sub-page regions seems like a generally useful thing so do so instead of adding a network subsystem specific structure. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Jens Axboe <jaxboe@fusionio.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-20mlx4_en: fix skb truesize underestimationEric Dumazet1-6/+5
skb->truesize must account for allocated memory, not the used part of it. Doing this work is important to avoid unexpected OOM situations. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-20virtio_net: Clean up set_skb_frag()Krishna Kumar1-8/+5
Remove manual initialization in set_skb_frag, and instead use __skb_fill_page_desc() to do the same. Patch tested on net-next. Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19net: do not take an additional reference in skb_frag_set_pageIan Campbell2-1/+1
I audited all of the callers in the tree and only one of them (pktgen) expects it to do so. Taking this reference is pretty obviously confusing and error prone. In particular I looked at the following commits which switched callers of (__)skb_frag_set_page to the skb paged fragment api: 6a930b9f163d7e6d9ef692e05616c4ede65038ec cxgb3: convert to SKB paged frag API. 5dc3e196ea21e833128d51eb5b788a070fea1f28 myri10ge: convert to SKB paged frag API. 0e0634d20dd670a89af19af2a686a6cce943ac14 vmxnet3: convert to SKB paged frag API. 86ee8130a46769f73f8f423f99dbf782a09f9233 virtionet: convert to SKB paged frag API. 4a22c4c919c201c2a7f4ee09e672435a3072d875 sfc: convert to SKB paged frag API. 18324d690d6a5028e3c174fc1921447aedead2b8 cassini: convert to SKB paged frag API. b061b39e3ae18ad75466258cf2116e18fa5bbd80 benet: convert to SKB paged frag API. b7b6a688d217936459ff5cf1087b2361db952509 bnx2: convert to SKB paged frag API. 804cf14ea5ceca46554d5801e2817bba8116b7e5 net: xfrm: convert to SKB frag APIs ea2ab69379a941c6f8884e290fdd28c93936a778 net: convert core to skb paged frag APIs Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19neigh: fix rcu splat in neigh_update()roy.qing.li@gmail.com1-0/+4
when use dst_get_neighbour to get neighbour, we need rcu_read_lock to protect, since dst_get_neighbour uses rcu_dereference. The bug was reported by Ari Savolainen <ari.m.savolainen@gmail.com> [ 105.612095] [ 105.612096] =================================================== [ 105.612100] [ INFO: suspicious rcu_dereference_check() usage. ] [ 105.612101] --------------------------------------------------- [ 105.612103] include/net/dst.h:91 invoked rcu_dereference_check() without protection! [ 105.612105] [ 105.612106] other info that might help us debug this: [ 105.612106] [ 105.612108] [ 105.612108] rcu_scheduler_active = 1, debug_locks = 0 [ 105.612110] 1 lock held by dnsmasq/2618: [ 105.612111] #0: (rtnl_mutex){+.+.+.}, at: [<ffffffff815df8c7>] rtnl_lock+0x17/0x20 [ 105.612120] [ 105.612121] stack backtrace: [ 105.612123] Pid: 2618, comm: dnsmasq Not tainted 3.1.0-rc1 #41 [ 105.612125] Call Trace: [ 105.612129] [<ffffffff810ccdcb>] lockdep_rcu_dereference+0xbb/0xc0 [ 105.612132] [<ffffffff815dc5a9>] neigh_update+0x4f9/0x5f0 [ 105.612135] [<ffffffff815da001>] ? neigh_lookup+0xe1/0x220 [ 105.612139] [<ffffffff81639298>] arp_req_set+0xb8/0x230 [ 105.612142] [<ffffffff8163a59f>] arp_ioctl+0x1bf/0x310 [ 105.612146] [<ffffffff810baa40>] ? lock_hrtimer_base.isra.26+0x30/0x60 [ 105.612150] [<ffffffff8163fb75>] inet_ioctl+0x85/0x90 [ 105.612154] [<ffffffff815b5520>] sock_do_ioctl+0x30/0x70 [ 105.612157] [<ffffffff815b55d3>] sock_ioctl+0x73/0x280 [ 105.612162] [<ffffffff811b7698>] do_vfs_ioctl+0x98/0x570 [ 105.612165] [<ffffffff811a5c40>] ? fget_light+0x340/0x3a0 [ 105.612168] [<ffffffff811b7bbf>] sys_ioctl+0x4f/0x80 [ 105.612172] [<ffffffff816fdcab>] system_call_fastpath+0x16/0x1b Reported-by: Ari Savolainen <ari.m.savolainen@gmail.com> Signed-off-by: RongQing <roy.qing.li@gmail.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19filter: use unsigned int to silence static checker warningDan Carpenter2-2/+2
This is just a cleanup. My testing version of Smatch warns about this: net/core/filter.c +380 check_load_and_stores(6) warn: check 'flen' for negative values flen comes from the user. We try to clamp the values here between 1 and BPF_MAXINSNS but the clamp doesn't work because it could be negative. This is a bug, but it's not exploitable. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19NET: asix: fix ethtool -e for AX88178 USB dongleGrant Grundler1-0/+3
"ethtool -e ethX" dumps EEPROM data. Patch sets EEPROM length for device. Ethtool works alot better when the kernel believes the length is > 0. From: Allan Chou <allan@asix.com.tw> Signed-off-by: Grant Grundler <grundler@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19cleanup: remove unnecessary include.Kevin Wilson1-4/+0
This cleanup patch removes unnecessary include from net/ipv6/ip6_fib.c. Signed-off-by: Kevin Wilson <wkevils@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19ipv4: compat_ioctl is local to af_inet.c, make it staticGerrit Renker1-1/+1
ipv4: compat_ioctl is local to af_inet.c, make it static Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19stmmac: limit max_mtu in case of 4KiB and use __netdev_alloc_skb (V2)Giuseppe CAVALLARO1-2/+4
Problem using big mtu around 4096 bytes is you end allocating (4096 +NET_SKB_PAD + NET_IP_ALIGN + sizeof(struct skb_shared_info) bytes -> 8192 bytes : order-1 pages It's better to limit the mtu to SKB_MAX_HEAD(NET_SKB_PAD), to have no more than one page per skb. Also the patch changes the netdev_alloc_skb_ip_align() done in init_dma_desc_rings() and uses a variant allowing GFP_KERNEL allocations allowing the driver to load even in case of memory pressure. Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19stmmac: add CHAINED descriptor mode support (V4)Giuseppe CAVALLARO10-105/+491
This patch enhances the STMMAC driver to support CHAINED mode of descriptor. STMMAC supports DMA descriptor to operate both in dual buffer(RING) and linked-list(CHAINED) mode. In RING mode (default) each descriptor points to two data buffer pointers whereas in CHAINED mode they point to only one data buffer pointer. In CHAINED mode each descriptor will have pointer to next descriptor in the list, hence creating the explicit chaining in the descriptor itself, whereas such explicit chaining is not possible in RING mode. First version of this work has been done by Rayagond. Then the patch has been reworked avoiding ifdef inside the C code. A new header file has been added to define all the functions needed for managing enhanced and normal descriptors. In fact, these have to be specialized according to the ring/chain usage. Two new C files have been also added to implement the helper routines needed to manage: jumbo frames, chain and ring setup (i.e. desc3). Signed-off-by: Rayagond Kokatanur <rayagond@vayavyalabs.com> Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19stmmac: allow mmc usage only if feature actually available (V4)Giuseppe CAVALLARO2-11/+16
Enable the MMC support if it is actually available from the HW capability register. Signed-off-by: Rayagond Kokatanur <rayagond@vayavyalabs.com> Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19stmmac: use predefined macros for HW cap register fields (V4)Rayagond Kokatanur2-21/+63
Signed-off-by: Rayagond Kokatanur <rayagond@vayavyalabs.com> Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19stmmac: allow mtu bigger than 1500 in case of normal desc (V4)Giuseppe CAVALLARO2-3/+9
This patch allows to set the mtu bigger than 1500 in case of normal descriptors. This is helping some SPEAr customers. Signed-off-by: Deepak SIKRI <deepak.sikri@st.com> Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19stmmac: update the driver version and doc (V4)Giuseppe CAVALLARO2-2/+11
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19stmmac: protect tx process with lock (V4)Giuseppe CAVALLARO2-0/+9
This patch fixes a problem raised on Orly ARM SMP platform where, in case of fragmented frames, the descriptors in the TX ring resulted broken. This was due to a missing lock protection in the tx process. Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Tested-by: Srinivas Kandagatla <srinivas.kandagatla@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19stmmac: Stop advertising 1000Base capabilties for non GMII iface (V4).Srinivas Kandagatla1-3/+10
This patch stops advertising 1000Base capablities if GMAC is either configured for MII or RMII mode and on board there is a GPHY plugged on. Without this patch if an GBit switch is connected on MII interface, Ethernet stops working at all. Discovered as part of https://bugzilla.stlinux.com/show_bug.cgi?id=14148 triage Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@st.com> Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19sysfs: Reject with a warning invalid uses of tagged directories.Eric W. Biederman2-3/+14
sysfs is a core piece of ifrastructure that many people use and few people have all of the rules in their head on how to use it correctly. Add warnings for people using tagged directories improperly to that any misuses can be caught and diagnosed quickly. A single inexpensive test in sysfs_find_dirent is almost sufficient to catch all possible misuses. An additional warning is needed in sysfs_add_dirent so that we actually fail when attempting to add an untagged dirent in a tagged directory. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19sysfs: Remove support for tagged directories with untagged members.Eric W. Biederman2-5/+3
Now that /sys/class/net/bonding_masters is implemented as a tagged sysfs file we can remove support for untagged files in tagged directories. This change removes any ambiguity of what a NULL namespace value means. A NULL namespace parameter after this patch means that we are talking about an untagged sysfs dirent. This makes the sysfs code much less prone to mistakes when during maintenance. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19bonding: Use a per netns implementation of /sys/class/net/bonding_masters.Eric W. Biederman3-21/+38
This fixes a network namespace misfeature that bonding_masters looked at current instead of the remembering the context where in which /sys/class/net/bonding_masters was opened in to see which network namespace to act upon. This removes the need for sysfs to handle tagged directories with untagged members allowing for a conceptually simpler sysfs implementation. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19class: Implement support for class attrs in tagged sysfs directories.Eric W. Biederman2-2/+17
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19sysfs: Implement support for tagged files in sysfs.Eric W. Biederman2-2/+52
Looking up files in sysfs is hard to understand and analyize because we currently allow placing untagged files in tagged directories. In the implementation of that we have two subtly different meanings of NULL. NULL meaning there is no tag on a directory entry and NULL meaning we don't care which namespace the lookup is performed for. This multiple uses of NULL have resulted in subtle bugs (since fixed) in the code. Currently it is only the bonding driver that needs to have an untagged file in a tagged directory. To untagle this mess I am adding support for tagged files to sysfs. Modifying the bonding driver to implement bonding_masters as a tagged file. Registering bonding_masters once for each network namespace. Then I am removing support for untagged entries in tagged sysfs directories. Resulting in code that is much easier to reason about. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19bonding: fix wrong port enabling in 802.3adFlavio Leitner1-7/+0
The port shouldn't be enabled unless its current MUX state is DISTRIBUTING which is correctly handled by ad_mux_machine(), otherwise the packet sent can be lost because the other end may not be ready. The issue happens on every port initialization, but as the ports are expected to move quickly to DISTRIBUTING, it doesn't cause much problem. However, it does cause constant packet loss if the other peer has the port configured to stay in STANDBY (i.e. SYNC set to OFF). Signed-off-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19net: validate HWTSTAMP ioctl parametersRichard Cochran2-2/+60
This patch adds a sanity check on the values provided by user space for the hardware time stamping configuration. If the values lie outside of the absolute limits, then the ioctl request will be denied. Signed-off-by: Richard Cochran <richard.cochran@omicron.at> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19net: Move rcu_barrier from rollback_registered_many to netdev_run_todo.Eric W. Biederman1-1/+7
This patch moves the rcu_barrier from rollback_registered_many (inside the rtnl_lock) into netdev_run_todo (just outside the rtnl_lock). This allows us to gain the full benefit of sychronize_net calling synchronize_rcu_expedited when the rtnl_lock is held. The rcu_barrier in rollback_registered_many was originally a synchronize_net but was promoted to be a rcu_barrier() when it was found that people were unnecessarily hitting the 250ms wait in netdev_wait_allrefs(). Changing the rcu_barrier back to a synchronize_net is therefore safe. Since we only care about waiting for the rcu callbacks before we get to netdev_wait_allrefs() it is also safe to move the wait into netdev_run_todo. This was tested by creating and destroying 1000 tap devices and observing /proc/lock_stat. /proc/lock_stat reports this change reduces the hold times of the rtnl_lock by a factor of 10. There was no observable difference in the amount of time it takes to destroy a network device. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19tcp: use TCP_INIT_CWND in tcp_fixup_sndbuf()Eric Dumazet1-5/+3
Initial cwnd being 10 (TCP_INIT_CWND) instead of 3, change tcp_fixup_sndbuf() to get more than 16384 bytes (sysctl_tcp_wmem[1]) in initial sk_sndbuf Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19phylib: Modify Vitesse RGMII skew settingsAndy Fleming1-12/+22
The Vitesse driver was using the RGMII_ID interface type to determine if skew was necessary. However, we want to move away from using that interface type, as it's really a property of the board's PHY connection. However, some boards depend on it, so we want to support it, while allowing new boards to use the more flexible "fixups" approach. To do this, we extract the code which adds skew into its own function, and call that function when RGMII_ID has been selected. Another side-effect of this change is that if your PHY has skew set already, it doesn't clear it. This way, the fixup code can modify the register without config_init then clearing it. Signed-off-by: Andy Fleming <afleming@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19net: Allow skb_recycle_check to be done in stagesAndy Fleming2-25/+47
skb_recycle_check resets the skb if it's eligible for recycling. However, there are times when a driver might want to optionally manipulate the skb data with the skb before resetting the skb, but after it has determined eligibility. We do this by splitting the eligibility check from the skb reset, creating two inline functions to accomplish that task. Signed-off-by: Andy Fleming <afleming@freescale.com> Acked-by: David Daney <david.daney@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>