summaryrefslogtreecommitdiff
path: root/net/core/rtnetlink.c
AgeCommit message (Collapse)AuthorFilesLines
2010-02-28Merge branch 'master' of /home/davem/src/GIT/linux-2.6/David S. Miller1-0/+8
Conflicts: drivers/firmware/iscsi_ibft.c
2010-02-27rtnetlink: support specifying device flags on device creationPatrick McHardy1-9/+43
commit e8469ed959c373c2ff9e6f488aa5a14971aebe1f Author: Patrick McHardy <kaber@trash.net> Date: Tue Feb 23 20:41:30 2010 +0100 Support specifying the initial device flags when creating a device though rtnl_link. Devices allocated by rtnl_create_link() are marked as INITIALIZING in order to surpress netlink registration notifications. To complete setup, rtnl_configure_link() must be called, which performs the device flag changes and invokes the deferred notifiers if everything went well. Two examples: # add macvlan to eth0 # $ ip link add link eth0 up allmulticast on type macvlan [LINK]11: macvlan0@eth0: <BROADCAST,MULTICAST,ALLMULTI,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN link/ether 26:f8:84:02:f9:2a brd ff:ff:ff:ff:ff:ff [ROUTE]ff00::/8 dev macvlan0 table local metric 256 mtu 1500 advmss 1440 hoplimit 0 [ROUTE]fe80::/64 dev macvlan0 proto kernel metric 256 mtu 1500 advmss 1440 hoplimit 0 [LINK]11: macvlan0@eth0: <BROADCAST,MULTICAST,ALLMULTI,UP,LOWER_UP> mtu 1500 link/ether 26:f8:84:02:f9:2a [ADDR]11: macvlan0 inet6 fe80::24f8:84ff:fe02:f92a/64 scope link valid_lft forever preferred_lft forever [ROUTE]local fe80::24f8:84ff:fe02:f92a via :: dev lo table local proto none metric 0 mtu 16436 advmss 16376 hoplimit 0 [ROUTE]default via fe80::215:e9ff:fef0:10f8 dev macvlan0 proto kernel metric 1024 mtu 1500 advmss 1440 hoplimit 0 [NEIGH]fe80::215:e9ff:fef0:10f8 dev macvlan0 lladdr 00:15:e9:f0:10:f8 router STALE [ROUTE]2001:6f8:974::/64 dev macvlan0 proto kernel metric 256 expires 0sec mtu 1500 advmss 1440 hoplimit 0 [PREFIX]prefix 2001:6f8:974::/64 dev macvlan0 onlink autoconf valid 14400 preferred 131084 [ADDR]11: macvlan0 inet6 2001:6f8:974:0:24f8:84ff:fe02:f92a/64 scope global dynamic valid_lft 86399sec preferred_lft 14399sec # add VLAN to eth1, eth1 is down # $ ip link add link eth1 up type vlan id 1000 RTNETLINK answers: Network is down <no events> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-27dev: support deferring device flag change notificationsPatrick McHardy1-2/+0
Split dev_change_flags() into two functions: __dev_change_flags() to perform the actual changes and __dev_notify_flags() to invoke netdevice notifiers. This will be used by rtnl_link to defer netlink notifications until the device has been fully configured. This changes ordering of some operations, in particular: - netlink notifications are sent after all changes have been performed. As a side effect this surpresses one unnecessary netlink message when the IFF_UP and other flags are changed simultaneously. - The NETDEV_UP/NETDEV_DOWN and NETDEV_CHANGE notifiers are invoked after all changes have been performed. Their relative is unchanged. - net_dmaengine_put() is invoked before the NETDEV_DOWN notifier instead of afterwards. This should not make any difference since both RX and TX are already shut down at this point. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-27rtnetlink: handle rtnl_link netlink notifications manuallyPatrick McHardy1-3/+1
In order to support specifying device flags during device creation, we must be able to roll back device registration in case setting the flags fails without sending any notifications related to the device to userspace. This patch changes rollback_registered_many() and register_netdevice() to manually send netlink notifications for devices not handled by rtnl_link and allows to defer notifications for devices handled by rtnl_link until setup is complete. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-27rtnetlink: ignore NETDEV_PRE_UP notifier in rtnetlink_event()Patrick McHardy1-0/+1
Commit 3b8bcfd (net: introduce pre-up netdev notifier) added a new notifier which is run before a device is set UP for use by cfg80211. The patch missed to add the new notifier to the ignore list in rtnetlink_event(), so we currently get an unnecessary netlink notification before a device is set UP. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-26rtnetlink: clean up SR-IOV config interfaceWilliams, Mitch A1-8/+5
This patch consists of a few minor cleanups to the SR-IOV configurion code in rtnetlink. - Remove unneccesary lock - Remove unneccesary casts - Return correct error code for no driver support These changes are based on comments from Patrick McHardy Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-25net: Add checking to rcu_dereference() primitivesPaul E. McKenney1-0/+8
Update rcu_dereference() primitives to use new lockdep-based checking. The rcu_dereference() in __in6_dev_get() may be protected either by rcu_read_lock() or RTNL, per Eric Dumazet. The rcu_dereference() in __sk_free() is protected by the fact that it is never reached if an update could change it. Check for this by using rcu_dereference_check() to verify that the struct sock's ->sk_wmem_alloc counter is zero. Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: mathieu.desnoyers@polymtl.ca Cc: josh@joshtriplett.org Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org Cc: Valdis.Kletnieks@vt.edu Cc: dhowells@redhat.com LKML-Reference: <1266887105-1528-5-git-send-email-paulmck@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-02-12rtnetlink: Add VF config code to rtnetlinkWilliams, Mitch A1-0/+67
Add code to allow rtnetlink clients to query and set VF information through the PF driver. Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-01-17net: spread __net_init, __net_exitAlexey Dobriyan1-2/+2
__net_init/__net_exit are apparently not going away, so use them to full extent. In some cases __net_init was removed, because it was called from __net_exit code. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-13net: Fix userspace RTM_NEWLINK notifications.Eric W. Biederman1-3/+3
I received some bug reports about userspace programs having problems because after RTM_NEWLINK was received they could not immediate access files under /proc/sys/net/ because they had not been registered yet. The original problem was trivially fixed by moving the userspace notification from rtnetlink_event() to the end of register_netdevice(). When testing that change I discovered I was still getting RTM_NEWLINK events before I could access proc and I was also getting RTM_NEWLINK events after I was seeing RTM_DELLINK. Things practically guaranteed to confuse userspace. After a little more investigation these extra notifications proved to be from the new notifiers NETDEV_POST_INIT and NETDEV_UNREGISTER_BATCH hitting the default case in rtnetlink_event, and triggering unnecessary RTM_NEWLINK messages. rtnetlink_event now explicitly handles NETDEV_UNREGISTER_BATCH and NETDEV_POST_INIT to avoid sending the incorrect userspace notifications. Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-08net: Support specifying the network namespace upon device creation.Eric W. Biederman1-11/+27
There is no good reason to not support userspace specifying the network namespace during device creation, and it makes it easier to create a network device and pass it to a child network namespace with a well known name. We have to be careful to ensure that the target network namespace for the new device exists through the life of the call. To keep that logic clear I have factored out the network namespace grabbing logic into rtnl_link_get_net. In addtion we need to continue to pass the source network namespace to the rtnl_link_ops.newlink method so that we can find the base device source network namespace. Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
2009-11-07rtnetlink: CleanupsEric Dumazet1-31/+23
Pure cleanups patch Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-28net: add a list_head parameter to dellink() methodEric Dumazet1-7/+7
Adding a list_head parameter to rtnl_link_ops->dellink() methods allow us to queue devices on a list, in order to dismantle them all at once. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-24rtnetlink: speedup rtnl_dump_ifinfo()Eric Dumazet1-13/+24
When handling large number of netdevice, rtnl_dump_ifinfo() is very slow because it has O(N^2) complexity. Instead of scanning one single list, we can use the 256 sub lists of the dev_index hash table. This considerably speedups "ip link" operations Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-22rtnetlink: rtnl_setlink() and rtnl_getlink() changesEric Dumazet1-19/+19
rtnl_getlink() & rtnl_setlink() run with RTNL held, we can use __dev_get_by_index() and __dev_get_by_name() variants and avoid dev_hold()/dev_put() Adds to rtnl_getlink() the capability to find a device by its name, not only by its index. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-06net_sched: reintroduce dev->qdisc for use by sch_apiPatrick McHardy1-4/+2
Currently the multiqueue integration with the qdisc API suffers from a few problems: - with multiple queues, all root qdiscs use the same handle. This means they can't be exposed to userspace in a backwards compatible fashion. - all API operations always refer to queue number 0. Newly created qdiscs are automatically shared between all queues, its not possible to address individual queues or restore multiqueue behaviour once a shared qdisc has been attached. - Dumps only contain the root qdisc of queue 0, in case of non-shared qdiscs this means the statistics are incomplete. This patch reintroduces dev->qdisc, which points to the (single) root qdisc from userspace's point of view. Currently it either points to the first (non-shared) default qdisc, or a qdisc shared between all queues. The following patches will introduce a classful dummy qdisc, which will be used as root qdisc and contain the per-queue qdiscs as children. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-02vlan: multiqueue vlan deviceEric Dumazet1-1/+9
vlan devices are currently not multi-queue capable. We can do that with a new rtnl_link_ops method, get_tx_queues(), called from rtnl_create_link() This new method gets num_tx_queues/real_num_tx_queues from real device. register_vlan_device() is also handled. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-12net: move and export get_net_ns_by_pidJohannes Berg1-20/+1
The function get_net_ns_by_pid(), to get a network namespace from a pid_t, will be required in cfg80211 as well. Therefore, let's move it to net_namespace.c and export it. We can't make it a static inline in the !NETNS case because it needs to verify that the given pid even exists (and return -ESRCH). Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-24netlink: change nlmsg_notify() return value logicPablo Neira Ayuso1-4/+5
This patch changes the return value of nlmsg_notify() as follows: If NETLINK_BROADCAST_ERROR is set by any of the listeners and an error in the delivery happened, return the broadcast error; else if there are no listeners apart from the socket that requested a change with the echo flag, return the result of the unicast notification. Thus, with this patch, the unicast notification is handled in the same way of a broadcast listener that has set the NETLINK_BROADCAST_ERROR socket flag. This patch is useful in case that the caller of nlmsg_notify() wants to know the result of the delivery of a netlink notification (including the broadcast delivery) and take any action in case that the delivery failed. For example, ctnetlink can drop packets if the event delivery failed to provide reliable logging and state-synchronization at the cost of dropping packets. This patch also modifies the rtnetlink code to ignore the return value of rtnl_notify() in all callers. The function rtnl_notify() (before this patch) returned the error of the unicast notification which makes rtnl_set_sk_err() reports errors to all listeners. This is not of any help since the origin of the change (the socket that requested the echoing) notices the ENOBUFS error if the notification fails and should resync itself. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-19netdev: introduce dev_get_stats()Stephen Hemminger1-3/+3
In order for the network device ops get_stats call to be immutable, the handling of the default internal network device stats block has to be changed. Add a new helper function which replaces the old use of internal_get_stats. Note: change return code to make it clear that the caller should not go changing the returned statistics. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-19netdev: network device operations infrastructureStephen Hemminger1-4/+5
This patch changes the network device internal API to move adminstrative operations out of the network device structure and into a separate structure. This patch involves some hackery to maintain compatablity between the new and old model, so all 300+ drivers don't have to be changed at once. For drivers that aren't converted yet, the netdevice_ops virt function list still resides in the net_device structure. For old protocols, the new net_device_ops are copied out to the old net_device pointers. After the transistion is completed the nag message can be changed to an WARN_ON, and the compatiablity code can be made configurable. Some function pointers aren't moved: * destructor can't be in net_device_ops because it may need to be referenced after the module is unloaded. * neighbor setup is manipulated in a couple of places that need special consideration * hard_start_xmit is in the fast path for transmit. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-16rtnetlink: propagate error from dev_change_flags in do_setlink()Johannes Berg1-1/+3
Unlike ifconfig, iproute doesn't report an error when setting an interface up fails: (example: put wireless network mac80211 interface into repeater mode with iwconfig but do not set a peer MAC address, it should fail with -ENOLINK) without patch: # ip link set wlan0 up ; echo $? 0 # with patch: # ip link set wlan0 up ; echo $? RTNETLINK answers: Link has been severed 2 # Propagate the return value from dev_change_flags() to fix this. Signed-off-by: Patrick McHardy <kaber@trash.net> Tested-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-16net: Remove CONFIG_KMOD from net/ (towards removing CONFIG_KMOD entirely)Johannes Berg1-2/+2
Some code here depends on CONFIG_KMOD to not try to load protocol modules or similar, replace by CONFIG_MODULES where more than just request_module depends on CONFIG_KMOD and and also use try_then_request_module in ebtables. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-08Merge branch 'master' of ↵David S. Miller1-1/+1
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/e1000e/ich8lan.c drivers/net/e1000e/netdev.c
2008-10-07net: Fix netdev_run_todo dead-lockHerbert Xu1-1/+1
Benjamin Thery tracked down a bug that explains many instances of the error unregister_netdevice: waiting for %s to become free. Usage count = %d It turns out that netdev_run_todo can dead-lock with itself if a second instance of it is run in a thread that will then free a reference to the device waited on by the first instance. The problem is really quite silly. We were trying to create parallelism where none was required. As netdev_run_todo always follows a RTNL section, and that todo tasks can only be added with the RTNL held, by definition you should only need to wait for the very ones that you've added and be done with it. There is no need for a second mutex or spinlock. This is exactly what the following patch does. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-22net: network device name ifalias supportStephen Hemminger1-0/+13
This patch add support for keeping an additional character alias associated with an network interface. This is useful for maintaining the SNMP ifAlias value which is a user defined value. Routers use this to hold information like which circuit or line it is connected to. It is just an arbitrary text label on the network device. There are two exposed interfaces with this patch, the value can be read/written either via netlink or sysfs. This could be maintained just by the snmp daemon, but it is more generally useful for other management tools, and the kernel is good place to act as an agreed upon interface to store it. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17netdev: Allocate multiple queues for TX.David S. Miller1-1/+1
alloc_netdev_mq() now allocates an array of netdev_queue structures for TX, based upon the queue_count argument. Furthermore, all accesses to the TX queues are now vectored through the netdev_get_tx_queue() and netdev_for_each_tx_queue() interfaces. This makes it easy to grep the tree for all things that want to get to a TX queue of a net device. Problem spots which are not really multiqueue aware yet, and only work with one queue, can easily be spotted by grepping for all netdev_get_tx_queue() calls that pass in a zero index. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08netdev: Move rest of qdisc state into struct netdev_queueDavid S. Miller1-2/+4
Now qdisc, qdisc_sleeping, and qdisc_list also live there. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-10Merge branch 'master' of ↵David S. Miller1-1/+2
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/tg3.c drivers/net/wireless/rt2x00/rt2x00dev.c net/mac80211/ieee80211_i.h
2008-06-03netlink: Improve returned error codesThomas Graf1-1/+2
Make nlmsg_trim(), nlmsg_cancel(), genlmsg_cancel(), and nla_nest_cancel() void functions. Return -EMSGSIZE instead of -1 if the provided message buffer is not big enough. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-21net: The dev->get_stats pointer is not NULL nowadays.Pavel Emelyanov1-12/+8
And so does the pointer is returns, but sysfs and netlinks still check for both cases. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-23[RTNETLINK]: Fix bogus ASSERT_RTNL warningPatrick McHardy1-0/+6
ASSERT_RTNL uses mutex_trylock to test whether the rtnl_mutex is held. This bogus warnings when running in atomic context, which f.e. happens when adding secondary unicast addresses through macvlan or vlan or when synchronizing multicast addresses from wireless devices. Mid-term we might want to consider moving all address updates to process context since the locking seems overly complicated, for now just fix the bogus warning by changing ASSERT_RTNL to use mutex_is_locked(). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-16[RTNL]: Introduce the rtnl_kill_links helper.Pavel Emelyanov1-8/+21
This one is responsible for calling ->dellink on each net device found in net to help with vlan net_exit hook in the nearest future. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-16[RTNL]: Relax for_each_netdev_safe in __rtnl_link_unregister.Pavel Emelyanov1-2/+2
Each potential list_del (happening from inside a ->dellink call) is followed by goto restart, so there's no need in _safe iteration. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-26[NET] NETNS: Omit sock->sk_net without CONFIG_NET_NS.YOSHIFUJI Hideaki1-6/+6
Introduce per-sock inlines: sock_net(), sock_net_set() and per-inet_timewait_sock inlines: twsk_net(), twsk_net_set(). Without CONFIG_NET_NS, no namespace other than &init_net exists. Let's explicitly define them to help compiler optimizations. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-26[NET] NETNS: Omit net_device->nd_net without CONFIG_NET_NS.YOSHIFUJI Hideaki1-2/+2
Introduce per-net_device inlines: dev_net(), dev_net_set(). Without CONFIG_NET_NS, no namespace other than &init_net exists. Let's explicitly define them to help compiler optimizations. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-02-23[RTNL]: Validate hardware and broadcast address attribute for RTM_NEWLINKThomas Graf1-6/+19
RTM_NEWLINK allows for already existing links to be modified. For this purpose do_setlink() is called which expects address attributes with a payload length of at least dev->addr_len. This patch adds the necessary validation for the RTM_NEWLINK case. The address length for links to be created is not checked for now as the actual attribute length is used when copying the address to the netdevice structure. It might make sense to report an error if less than addr_len bytes are provided but enforcing this might break drivers trying to be smart with not transmitting all zero addresses. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-19[RTNL]: Add missing link netlink attribute policy definitionsThomas Graf1-0/+2
IFLA_LINK is no longer a write-only attribute on the kernel side and must thus be validated. Same goes for the newly introduced IFLA_LINKINFO. Fixes undefined behaviour if either of the attributes are not well formed. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-17Revert "[RTNETLINK]: Send a single notification on device state changes."David S. Miller1-26/+10
This reverts commit 45b503548210fe6f23e92b856421c2a3f05fd034. It break locking around dev->link_mode as well as cause other bootup problems. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-12[RTNETLINK]: Send a single notification on device state changes.Laszlo Attila Toth1-10/+26
In do_setlink() a single notification is sent at the end of the function if any modification occured. If the address has been changed, another notification is sent. Both of them is required because originally only the NETDEV_CHANGEADDR notification was sent and although device state change implies address change, some programs may expect the original notification. It remains for compatibity. If set_operstate() is called from do_setlink(), it doesn't send a notification, only if it is called from rtnl_create_link() as earlier. Signed-off-by: Laszlo Attila Toth <panther@balabit.hu> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-05[NET] rtnetlink.c: remove no longer used functionsAdrian Bunk1-44/+0
This patch removes the following no longer used functions: - rtattr_parse() - rtattr_strlcpy() - __rtattr_parse_nested_compat() Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETNS]: Namespace stop vs 'ip r l' race.Denis V. Lunev1-13/+2
During network namespace stop process kernel side netlink sockets belonging to a namespace should be closed. They should not prevent namespace to stop, so they do not increment namespace usage counter. Though this counter will be put during last sock_put. The raplacement of the correct netns for init_ns solves the problem only partial as socket to be stoped until proper stop is a valid netlink kernel socket and can be looked up by the user processes. This is not a problem until it resides in initial namespace (no processes inside this net), but this is not true for init_net. So, hold the referrence for a socket, remove it from lookup tables and only after that change namespace and perform a last put. Signed-off-by: Denis V. Lunev <den@openvz.org> Tested-by: Alexey Dobriyan <adobriyan@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETNS]: Consolidate kernel netlink socket destruction.Denis V. Lunev1-1/+1
Create a specific helper for netlink kernel socket disposal. This just let the code look better and provides a ground for proper disposal inside a namespace. Signed-off-by: Denis V. Lunev <den@openvz.org> Tested-by: Alexey Dobriyan <adobriyan@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETNS]: Memory leak on network namespace stop.Denis V. Lunev1-1/+1
Network namespace allocates 2 kernel netlink sockets, fibnl & rtnl. These sockets should be disposed properly, i.e. by sock_release. Plain sock_put is not enough. Signed-off-by: Denis V. Lunev <den@openvz.org> Tested-by: Alexey Dobriyan <adobriyan@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NET]: Make the netlink methods in rtnetlink handle multiple network namespacesEric W. Biederman1-24/+3
After the previous prep work this just consists of removing checks limiting the code to work in the initial network namespace, and updating rtmsg_ifinfo so we can generate events for devices in something other then the initial network namespace. Referring to network other network devices like the IFLA_LINK and IFLA_MASTER attributes do, gets interesting if those network devices happen to be in other network namespaces. Currently ifindex numbers are allocated globally so I have taken the path of least resistance and not still report the information even though the devices they are talking about are invisible. If applications start getting confused or when ifindex numbers become local to the network namespace we may need to do something different in the future. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Denis V. Lunev <den@openz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NET]: Make rtnetlink infrastructure network namespace aware (v3)Denis V. Lunev1-11/+52
After this patch none of the netlink callback support anything except the initial network namespace but the rtnetlink infrastructure now handles multiple network namespaces. Changes from v2: - IPv6 addrlabel processing Changes from v1: - no need for special rtnl_unlock handling - fixed IPv6 ndisc Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NET]: Modify all rtnetlink methods to only work in the initial namespace (v2)Denis V. Lunev1-0/+19
Before I can enable rtnetlink to work in all network namespaces I need to be certain that something won't break. So this patch deliberately disables all of the rtnletlink methods in everything except the initial network namespace. After the methods have been audited this extra check can be disabled. Changes from v1: - added IPv6 addrlabel protection Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2008-01-20[NET]: rtnl_link: fix use-after-freePatrick McHardy1-1/+4
When unregistering the rtnl_link_ops, all existing devices using the ops are destroyed. With nested devices this may lead to a use-after-free despite the use of for_each_netdev_safe() in case the upper device is next in the device list and is destroyed by the NETDEV_UNREGISTER notifier. The easy fix is to restart scanning the device list after removing a device. Alternatively we could add new devices to the front of the list to avoid having dependant devices follow the device they depend on. A third option would be to only restart scanning if dev->iflink of the next device matches dev->ifindex of the current one. For now this seems like the safest solution. With this patch, the veth rtnl_link_ops unregistration can use rtnl_link_unregister() directly since it now also handles destruction of multiple devices at once. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-26[NETNS]: Fix get_net_ns_by_pidEric W. Biederman1-1/+1
The pid namespace patches changed the semantics of find_task_by_pid without breaking the compile resulting in get_net_ns_by_pid doing the wrong thing. So switch to using the intended find_task_by_vpid. Combined with Denis' earlier patch to make netlink traffic fully synchronous the inadvertent race I introduced with accessing current is actually removed. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-19Make access to task's nsproxy lighterPavel Emelyanov1-4/+4
When someone wants to deal with some other taks's namespaces it has to lock the task and then to get the desired namespace if the one exists. This is slow on read-only paths and may be impossible in some cases. E.g. Oleg recently noticed a race between unshare() and the (sent for review in cgroups) pid namespaces - when the task notifies the parent it has to know the parent's namespace, but taking the task_lock() is impossible there - the code is under write locked tasklist lock. On the other hand switching the namespace on task (daemonize) and releasing the namespace (after the last task exit) is rather rare operation and we can sacrifice its speed to solve the issues above. The access to other task namespaces is proposed to be performed like this: rcu_read_lock(); nsproxy = task_nsproxy(tsk); if (nsproxy != NULL) { / * * work with the namespaces here * e.g. get the reference on one of them * / } / * * NULL task_nsproxy() means that this task is * almost dead (zombie) * / rcu_read_unlock(); This patch has passed the review by Eric and Oleg :) and, of course, tested. [clg@fr.ibm.com: fix unshare()] [ebiederm@xmission.com: Update get_net_ns_by_pid] Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>