Age | Commit message (Collapse) | Author | Files | Lines |
|
Clean up: The logic to wait for write space is common to a bunch of
the encoding helper functions. Lift it out and put it in the tail
of rpcrdma_marshal_req().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
The use of -EAGAIN in rpcrdma_convert_iovs() is a latent bug: the
transport never calls xprt_write_space() when more pages become
available. -ENOBUFS will trigger the correct "delay briefly and call
again" logic.
Fixes: 7a89f9c626e3 ("xprtrdma: Honor ->send_request API contract")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: stable@vger.kernel.org # 4.8+
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
After commit f6cc9c054e77, the following conf is broken (note that the
default loopback mtu is 65536, ie IP_MAX_MTU + 1):
$ ip tunnel add gre1 mode gre local 10.125.0.1 remote 10.125.0.2 dev lo
add tunnel "gre0" failed: Invalid argument
$ ip l a type dummy
$ ip l s dummy1 up
$ ip l s dummy1 mtu 65535
$ ip tunnel add gre1 mode gre local 10.125.0.1 remote 10.125.0.2 dev dummy1
add tunnel "gre0" failed: Invalid argument
dev_set_mtu() doesn't allow to set a mtu which is too large.
First, let's cap the mtu returned by ip_tunnel_bind_dev(). Second, remove
the magic value 0xFFF8 and use IP_MAX_MTU instead.
0xFFF8 seems to be there for ages, I don't know why this value was used.
With a recent kernel, it's also possible to set a mtu > IP_MAX_MTU:
$ ip l s dummy1 mtu 66000
After that patch, it's also possible to bind an ip tunnel on that kind of
interface.
CC: Petr Machata <petrm@mellanox.com>
CC: Ido Schimmel <idosch@mellanox.com>
Link: https://git.kernel.org/pub/scm/linux/kernel/git/davem/netdev-vger-cvs.git/commit/?id=e5afd356a411a
Fixes: f6cc9c054e77 ("ip_tunnel: Emit events for post-register MTU changes")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:
====================
pull request (net): ipsec 2018-05-31
1) Avoid possible overflow of the offset variable
in _decode_session6(), this fixes an infinite
lookp there. From Eric Dumazet.
2) We may use an error pointer in the error path of
xfrm_bundle_create(). Fix this by returning this
pointer directly to the caller.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
tc_ctl_tfilter handles three netlink message types: RTM_NEWTFILTER,
RTM_DELTFILTER, RTM_GETTFILTER. However, implementation of this function
involves a lot of branching on specific message type because most of the
code is message-specific. This significantly complicates adding new
functionality and doesn't provide much benefit of code reuse.
Split tc_ctl_tfilter to three standalone functions that handle filter new,
delete and get requests.
The only truly protocol independent part of tc_ctl_tfilter is code that
looks up queue, class, and block. Refactor this code to standalone
tcf_block_find function that is used by all three new handlers.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In rtnl_newlink(), NULL check is performed on m_ops however member of
ops is accessed. Fixed by accessing member of m_ops instead of ops.
[ 345.432629] BUG: KASAN: null-ptr-deref in rtnl_newlink+0x400/0x1110
[ 345.432629] Read of size 4 at addr 0000000000000088 by task ip/986
[ 345.432629]
[ 345.432629] CPU: 1 PID: 986 Comm: ip Not tainted 4.17.0-rc6+ #9
[ 345.432629] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[ 345.432629] Call Trace:
[ 345.432629] dump_stack+0xc6/0x150
[ 345.432629] ? dump_stack_print_info.cold.0+0x1b/0x1b
[ 345.432629] ? kasan_report+0xb4/0x410
[ 345.432629] kasan_report.cold.4+0x8f/0x91
[ 345.432629] ? rtnl_newlink+0x400/0x1110
[ 345.432629] rtnl_newlink+0x400/0x1110
[...]
Fixes: ccf8dbcd062a ("rtnetlink: Remove VLA usage")
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
(resend for properly queueing in patchwork)
kcm_clone() creates kernel socket, which does not take net counter.
Thus, the net may die before the socket is completely destructed,
i.e. kcm_exit_net() is executed before kcm_done().
Reported-by: syzbot+5f1a04e374a635efc426@syzkaller.appspotmail.com
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add support for FTP commands with extended format (RFC 2428):
- FTP EPRT: IPv4 and IPv6, active mode, similar to PORT
- FTP EPSV: IPv4 and IPv6, passive mode, similar to PASV.
EPSV response usually contains only port but we allow real
server to provide different address
We restrict control and data connection to be from same
address family.
Allow the "(" and ")" to be optional in PASV response.
Also, add ipvsh argument to the pkt_in/pkt_out handlers to better
access the payload after transport header.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Prepare NFCT to support IPv6 for FTP:
- Do not restrict the expectation callback to PF_INET
- Split the debug messages, so that the 160-byte limitation
in IP_VS_DBG_BUF is not exceeded when printing many IPv6
addresses. This means no more than 3 addresses in one message,
i.e. 1 tuple with 2 addresses or 1 connection with 3 addresses.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This allows us to forward packets from the netdev family via neighbour
layer, so you don't need an explicit link-layer destination when using
this expression from rules. The ttl/hop_limit field is decremented.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
In the quest to remove all stack VLA usage from the kernel[1], this
allocates the maximum size expected for all possible attrs and adds
sanity-checks at both registration and usage to make sure nothing
gets out of sync.
[1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Some drivers, such as vxlan and wireguard, use the skb's dst in order to
determine things like PMTU. They therefore loose functionality when flow
offloading is enabled. So, we ensure the skb has it before xmit'ing it
in the offloading path.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The following ruleset:
add table ip filter
add chain ip filter input { type filter hook input priority 4; }
add chain ip filter ap
add rule ip filter input jump ap
add rule ip filter ap masquerade
results in a panic, because the masquerade extension should be rejected
from the filter chain. The existing validation is missing a chain
dependency check when the rule is added to the non-base chain.
This patch fixes the problem by walking down the rules from the
basechains, searching for either immediate or lookup expressions, then
jumping to non-base chains and again walking down the rules to perform
the expression validation, so we make sure the full ruleset graph is
validated. This is done only once from the commit phase, in case of
problem, we abort the transaction and perform fine grain validation for
error reporting. This patch requires 003087911af2 ("netfilter:
nfnetlink: allow commit to fail") to achieve this behaviour.
This patch also adds a cleanup callback to nfnl batch interface to reset
the validate state from the exit path.
As a result of this patch, nf_tables_check_loops() doesn't use
->validate to check for loops, instead it just checks for immediate
expressions.
Reported-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This extends log statement to support the behaviour achieved with
AUDIT target in iptables.
Audit logging is enabled via a pseudo log level 8. In this case any
other settings like log prefix are ignored since audit log format is
fixed.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Now it can only match the transparent flag of an ip/ipv6 socket.
Signed-off-by: Máté Eckl <ecklm94@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
net/netfilter/nft_numgen.c:117:1-3: WARNING: PTR_ERR_OR_ZERO can be used
net/netfilter/nft_hash.c:180:1-3: WARNING: PTR_ERR_OR_ZERO can be used
net/netfilter/nft_hash.c:223:1-3: WARNING: PTR_ERR_OR_ZERO can be used
Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR
Generated by: scripts/coccinelle/api/ptr_ret.cocci
Fixes: b9ccc07e3f31 ("netfilter: nft_hash: add map lookups for hashing operations")
Fixes: d734a2888922 ("netfilter: nft_numgen: add map lookups for numgen statements")
CC: Laura Garcia Liebana <nevola@gmail.com>
Signed-off-by: kbuild test robot <fengguang.wu@intel.com>
Acked-by: Laura Garcia Liebana <nevola@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This patch reorders the error cases in showing the XPS configuration so
that we hold off on memory allocation until after we have verified that we
can support XPS on a given ring.
Fixes: 184c449f91fe ("net: Add support for XPS with QoS via traffic classes")
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In the quest to remove all stack VLA usage from the kernel[1], this
allocates the maximum size expected for all possible types and adds
sanity-checks at both registration and usage to make sure nothing gets
out of sync. This matches the proposed VLA solution for nfnetlink[2]. The
values chosen here were based on finding assignments for .maxtype and
.slave_maxtype and manually counting the enums:
slave_maxtype (max 33):
IFLA_BRPORT_MAX 33
IFLA_BOND_SLAVE_MAX 9
maxtype (max 45):
IFLA_BOND_MAX 28
IFLA_BR_MAX 45
__IFLA_CAIF_HSI_MAX 8
IFLA_CAIF_MAX 4
IFLA_CAN_MAX 16
IFLA_GENEVE_MAX 12
IFLA_GRE_MAX 25
IFLA_GTP_MAX 5
IFLA_HSR_MAX 7
IFLA_IPOIB_MAX 4
IFLA_IPTUN_MAX 21
IFLA_IPVLAN_MAX 3
IFLA_MACSEC_MAX 15
IFLA_MACVLAN_MAX 7
IFLA_PPP_MAX 2
__IFLA_RMNET_MAX 4
IFLA_VLAN_MAX 6
IFLA_VRF_MAX 2
IFLA_VTI_MAX 7
IFLA_VXLAN_MAX 28
VETH_INFO_MAX 2
VXCAN_INFO_MAX 2
This additionally changes maxtype and slave_maxtype fields to unsigned,
since they're only ever using positive values.
[1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com
[2] https://patchwork.kernel.org/patch/10439647/
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
With CONFIG_CC_STACKPROTECTOR enabled the kernel panics as below when
parsing a NCSI_CMD_PKG_INFO command:
[ 150.149711] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: 805cff08
[ 150.149711]
[ 150.159919] CPU: 0 PID: 1301 Comm: ncsi-netlink Not tainted 4.13.16-468cbec6d2c91239332cb91b1f0a73aafcb6f0c6 #1
[ 150.170004] Hardware name: Generic DT based system
[ 150.174852] [<80109930>] (unwind_backtrace) from [<80106bc4>] (show_stack+0x20/0x24)
[ 150.182641] [<80106bc4>] (show_stack) from [<805d36e4>] (dump_stack+0x20/0x28)
[ 150.189888] [<805d36e4>] (dump_stack) from [<801163ac>] (panic+0xdc/0x278)
[ 150.196780] [<801163ac>] (panic) from [<801162cc>] (__stack_chk_fail+0x20/0x24)
[ 150.204111] [<801162cc>] (__stack_chk_fail) from [<805cff08>] (ncsi_pkg_info_all_nl+0x244/0x258)
[ 150.212912] [<805cff08>] (ncsi_pkg_info_all_nl) from [<804f939c>] (genl_lock_dumpit+0x3c/0x54)
[ 150.221535] [<804f939c>] (genl_lock_dumpit) from [<804f873c>] (netlink_dump+0xf8/0x284)
[ 150.229550] [<804f873c>] (netlink_dump) from [<804f8d44>] (__netlink_dump_start+0x124/0x17c)
[ 150.237992] [<804f8d44>] (__netlink_dump_start) from [<804f9880>] (genl_rcv_msg+0x1c8/0x3d4)
[ 150.246440] [<804f9880>] (genl_rcv_msg) from [<804f9174>] (netlink_rcv_skb+0xd8/0x134)
[ 150.254361] [<804f9174>] (netlink_rcv_skb) from [<804f96a4>] (genl_rcv+0x30/0x44)
[ 150.261850] [<804f96a4>] (genl_rcv) from [<804f7790>] (netlink_unicast+0x198/0x234)
[ 150.269511] [<804f7790>] (netlink_unicast) from [<804f7ffc>] (netlink_sendmsg+0x368/0x3b0)
[ 150.277783] [<804f7ffc>] (netlink_sendmsg) from [<804abea4>] (sock_sendmsg+0x24/0x34)
[ 150.285625] [<804abea4>] (sock_sendmsg) from [<804ac1dc>] (___sys_sendmsg+0x244/0x260)
[ 150.293556] [<804ac1dc>] (___sys_sendmsg) from [<804ad98c>] (__sys_sendmsg+0x5c/0x9c)
[ 150.301400] [<804ad98c>] (__sys_sendmsg) from [<804ad9e4>] (SyS_sendmsg+0x18/0x1c)
[ 150.308984] [<804ad9e4>] (SyS_sendmsg) from [<80102640>] (ret_fast_syscall+0x0/0x3c)
[ 150.316743] ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: 805cff08
This turns out to be because the attrs array in ncsi_pkg_info_all_nl()
is initialised to a length of NCSI_ATTR_MAX which is the maximum
attribute number, not the number of attributes.
Fixes: 955dc68cb9b2 ("net/ncsi: Add generic netlink family")
Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When we fail to modify a rule, we incorrectly release the idr handle
of the unmodified old rule.
Fix that by checking if we need to release it.
Fixes: fe2502e49b58 ("net_sched: remove cls_flower idr on failure")
Reported-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A driver might need to react to changes in settings of brentry VLANs.
Therefore send switchdev port notifications for these as well. Reuse
SWITCHDEV_OBJ_ID_PORT_VLAN for this purpose. Listeners should use
netif_is_bridge_master() on orig_dev to determine whether the
notification is about a bridge port or a bridge.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A follow-up patch enables emitting VLAN notifications for the bridge CPU
port in addition to the existing slave port notifications. These
notifications have orig_dev set to the bridge in question.
Because there's no specific support for these VLANs, just ignore the
notifications to maintain the current behavior.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Extract the code that deals with adding a preexisting VLAN to bridge CPU
port to a separate function. A follow-up patch introduces a need to roll
back operations in this block due to an error, and this split will make
the error-handling code clearer.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A call to switchdev_port_obj_add() or switchdev_port_obj_del() involves
initializing a struct switchdev_obj_port_vlan, a piece of code that
repeats on each call site almost verbatim. While in the current codebase
there is just one duplicated add call, the follow-up patches add more of
both add and del calls.
Thus to remove the duplication, extract the repetition into named
functions and reuse.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Checking netif_xmit_frozen_or_stopped() at the end of sch_direct_xmit()
is being bypassed. This is because "ret" from sch_direct_xmit() will be
either NETDEV_TX_OK or NETDEV_TX_BUSY, and only ret == NETDEV_TX_OK == 0
will reach the condition:
if (ret && netif_xmit_frozen_or_stopped(txq))
return false;
This patch cleans up the code by removing the whole condition.
For more discussion about this, please refer to
https://marc.info/?t=152727195700008
Signed-off-by: Song Liu <songliubraving@fb.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This is additional to the
commit ea1627c20c34 ("tcp: minor optimizations around tcp_hdr() usage").
At this point, skb->data is same with tcp_hdr() as tcp header has not
been pulled yet. So use the less expensive one to get the tcp header.
Remove the third parameter of tcp_rcv_established() and put it into
the function body.
Furthermore, the local variables are listed as a reverse christmas tree :)
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We may derference an invalid pointer in the error path of
xfrm_bundle_create(). Fix this by returning this error
pointer directly instead of assigning it to xdst0.
Fixes: 45b018beddb6 ("ipsec: Create and use new helpers for dst child access.")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
families
Update bpf_fib_lookup to return -EAFNOSUPPORT for unsupported address
families. Allows userspace to probe for support as more are added
(e.g., AF_MPLS).
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Re-use kstrtobool_from_user() instead of open coded variant.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
|
|
Verify flags argument contains only known flags. Allows programs to probe
for support as more are added.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
The local variable is only used while CONFIG_IPV6 enabled
net/core/filter.c: In function ‘sk_msg_convert_ctx_access’:
net/core/filter.c:6489:6: warning: unused variable ‘off’ [-Wunused-variable]
int off;
^
This puts it into #ifdef.
Fixes: 303def35f64e ("bpf: allow sk_msg programs to read sock fields")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
gcc-7.3.0 report following err:
HOSTCC net/bpfilter/main.o
In file included from net/bpfilter/main.c:9:0:
./include/uapi/linux/bpf.h:12:10: fatal error: linux/bpf_common.h: No such file or directory
#include <linux/bpf_common.h>
remove it by adding a include path.
Fixes: d2ba09c17a06 ("net: add skeleton of bpfilter kernel module")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add support for IFA_RT_PRIORITY to ipv6 addresses.
If the metric is changed on an existing address then the new route
is inserted before removing the old one. Since the metric is one
of the route keys, the prefix route can not be atomically replaced.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add support for IFA_RT_PRIORITY to ipv4 addresses.
If the metric is changed on an existing address then the new route
is inserted before removing the old one. Since the metric is one
of the route keys, the prefix route can not be replaced.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Update inet6_addr_modify to take ifa6_config argument versus a parameter
list. This is an argument move only; no functional change intended.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Move the creation of struct ifa6_config up to callers of inet6_addr_add.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Move config parameters for adding an ipv6 address to a struct. struct
names stem from inet6_rtm_newaddr which is the modern handler for
adding an address.
Start the conversion to ifa6_config with ipv6_add_addr. This is an argument
move only; no functional change intended. Mapping of variable changes:
addr --> cfg->pfx
peer_addr --> cfg->peer_pfx
pfxlen --> cfg->plen
flags --> cfg->ifa_flags
scope, valid_lft, prefered_lft have the same names within cfg
(with corrected spelling).
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
the message be freed immediately, no need to trim it
back to the previous size.
Inspired by commit 7a9b3ec1e19f ("nl80211: remove unnecessary genlmsg_cancel() calls")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fixes the following sparse warnings:
net/ipv4/bpfilter/sockopt.c:13:5: warning:
symbol 'bpfilter_mbox_request' was not declared. Should it be static?
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
MQ doesn't hold any statistics on its own, however, statistic
from offloads are requested starting from the root, hence MQ
will read the old values for its sums. Call into the drivers,
because of the additive nature of the stats drivers are aware
of how much "pending updates" they have to children of the MQ.
Since MQ reset its stats on every dump we can simply offset
the stats, predicting how stats of offloaded children will
change.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
mq offload is trivial, we just need to let the device know
that the root qdisc is mq. Alternative approach would be
to export qdisc_lookup() and make drivers check the root
type themselves, but notification via ndo_setup_tc is more
in line with other qdiscs.
Note that mq doesn't hold any stats on it's own, it just
adds up stats of its children.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The comment and trace_loginfo are not used anymore.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
We can make all dumps and lookups lockless.
Dumps currently only hold the nfnl mutex on the dump request itself.
Dumps can span multiple syscalls, dump continuation doesn't acquire the
nfnl mutex anywhere, i.e. the dump callbacks in nf_tables already use
rcu and never rely on nfnl mutex being held.
So, just switch all dumpers to rcu.
This requires taking a module reference before dropping the rcu lock
so rmmod is blocked, we also need to hold module reference over
the entire dump operation sequence. netlink already supports this
via the .module member in the netlink_dump_control struct.
For the non-dump case (i.e. lookup of a specific tables, chains, etc),
we need to swtich to _rcu list iteration primitive and make sure we
use GFP_ATOMIC.
This patch also adds the new nft_netlink_dump_start_rcu() helper that
takes care of the get_ref, drop-rcu-lock,start dump,
get-rcu-lock,put-ref sequence.
The helper will be reused for all dumps.
Rationale in all dump requests is:
- use the nft_netlink_dump_start_rcu helper added in first patch
- use GFP_ATOMIC and rcu list iteration
- switch to .call_rcu
... thus making all dumps in nf_tables not depend on the
nfnl mutex anymore.
In the nf_tables_getgen: This callback just fetches the current base
sequence, there is no need to serialize this with nfnl nft mutex.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
abort batch processing and return so task can exit faster.
Otherwise even SIGKILL has no immediate effect.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
harmless, but it avoids sparse warnings:
nf_tables_api.c:2813:16: warning: incorrect type in return expression (different base types)
nf_tables_api.c:2863:47: warning: incorrect type in argument 3 (different base types)
nf_tables_api.c:3524:47: warning: incorrect type in argument 3 (different base types)
nf_tables_api.c:3538:55: warning: incorrect type in argument 3 (different base types)
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Just use .call_rcu instead. We can drop the rcu read lock
after obtaining a reference and re-acquire on return.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Fixes the following sparse warning:
net/netfilter/nf_nat_core.c:1039:20: warning:
symbol 'nat_hook' was not declared. Should it be static?
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
synchronize_rcu() is expensive.
The commit phase currently enforces an unconditional
synchronize_rcu() after incrementing the generation counter.
This is to make sure that a packet always sees a consistent chain, either
nft_do_chain is still using old generation (it will skip the newly added
rules), or the new one (it will skip old ones that might still be linked
into the list).
We could just remove the synchronize_rcu(), it would not cause a crash but
it could cause us to evaluate a rule that was removed and new rule for the
same packet, instead of either-or.
To resolve this, add rule pointer array holding two generations, the
current one and the future generation.
In commit phase, allocate the rule blob and populate it with the rules that
will be active in the new generation.
Then, make this rule blob public, replacing the old generation pointer.
Then the generation counter can be incremented.
nft_do_chain() will either continue to use the current generation
(in case loop was invoked right before increment), or the new one.
Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
bpfilter_process_sockopt is a callback that gets called from
ip_setsockopt() and ip_getsockopt(). However, when CONFIG_INET is
disabled, it never gets called at all, and assigning a function to the
callback pointer results in a link failure:
net/bpfilter/bpfilter_kern.o: In function `__stop_umh':
bpfilter_kern.c:(.text.unlikely+0x3): undefined reference to `bpfilter_process_sockopt'
net/bpfilter/bpfilter_kern.o: In function `load_umh':
bpfilter_kern.c:(.init.text+0x73): undefined reference to `bpfilter_process_sockopt'
Since there is no caller in this configuration, I assume we can
simply make the assignment conditional.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
seg6_do_srh_encap and seg6_do_srh_inline can possibly do an
out-of-bounds access when adding the SRH to the packet. This no longer
happen when expanding the skb not only by the size of the SRH (+
outer IPv6 header), but also by skb->mac_len.
[ 53.793056] BUG: KASAN: use-after-free in seg6_do_srh_encap+0x284/0x620
[ 53.794564] Write of size 14 at addr ffff88011975ecfa by task ping/674
[ 53.796665] CPU: 0 PID: 674 Comm: ping Not tainted 4.17.0-rc3-ARCH+ #90
[ 53.796670] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.11.0-20171110_100015-anatol 04/01/2014
[ 53.796673] Call Trace:
[ 53.796679] <IRQ>
[ 53.796689] dump_stack+0x71/0xab
[ 53.796700] print_address_description+0x6a/0x270
[ 53.796707] kasan_report+0x258/0x380
[ 53.796715] ? seg6_do_srh_encap+0x284/0x620
[ 53.796722] memmove+0x34/0x50
[ 53.796730] seg6_do_srh_encap+0x284/0x620
[ 53.796741] ? seg6_do_srh+0x29b/0x360
[ 53.796747] seg6_do_srh+0x29b/0x360
[ 53.796756] seg6_input+0x2e/0x2e0
[ 53.796765] lwtunnel_input+0x93/0xd0
[ 53.796774] ipv6_rcv+0x690/0x920
[ 53.796783] ? ip6_input+0x170/0x170
[ 53.796791] ? eth_gro_receive+0x2d0/0x2d0
[ 53.796800] ? ip6_input+0x170/0x170
[ 53.796809] __netif_receive_skb_core+0xcc0/0x13f0
[ 53.796820] ? netdev_info+0x110/0x110
[ 53.796827] ? napi_complete_done+0xb6/0x170
[ 53.796834] ? e1000_clean+0x6da/0xf70
[ 53.796845] ? process_backlog+0x129/0x2a0
[ 53.796853] process_backlog+0x129/0x2a0
[ 53.796862] net_rx_action+0x211/0x5c0
[ 53.796870] ? napi_complete_done+0x170/0x170
[ 53.796887] ? run_rebalance_domains+0x11f/0x150
[ 53.796891] __do_softirq+0x10e/0x39e
[ 53.796894] do_softirq_own_stack+0x2a/0x40
[ 53.796895] </IRQ>
[ 53.796898] do_softirq.part.16+0x54/0x60
[ 53.796900] __local_bh_enable_ip+0x5b/0x60
[ 53.796903] ip6_finish_output2+0x416/0x9f0
[ 53.796906] ? ip6_dst_lookup_flow+0x110/0x110
[ 53.796909] ? ip6_sk_dst_lookup_flow+0x390/0x390
[ 53.796911] ? __rcu_read_unlock+0x66/0x80
[ 53.796913] ? ip6_mtu+0x44/0xf0
[ 53.796916] ? ip6_output+0xfc/0x220
[ 53.796918] ip6_output+0xfc/0x220
[ 53.796921] ? ip6_finish_output+0x2b0/0x2b0
[ 53.796923] ? memcpy+0x34/0x50
[ 53.796926] ip6_send_skb+0x43/0xc0
[ 53.796929] rawv6_sendmsg+0x1216/0x1530
[ 53.796932] ? __orc_find+0x6b/0xc0
[ 53.796934] ? rawv6_rcv_skb+0x160/0x160
[ 53.796937] ? __rcu_read_unlock+0x66/0x80
[ 53.796939] ? __rcu_read_unlock+0x66/0x80
[ 53.796942] ? is_bpf_text_address+0x1e/0x30
[ 53.796944] ? kernel_text_address+0xec/0x100
[ 53.796946] ? __kernel_text_address+0xe/0x30
[ 53.796948] ? unwind_get_return_address+0x2f/0x50
[ 53.796950] ? __save_stack_trace+0x92/0x100
[ 53.796954] ? save_stack+0x89/0xb0
[ 53.796956] ? kasan_kmalloc+0xa0/0xd0
[ 53.796958] ? kmem_cache_alloc+0xd2/0x1f0
[ 53.796961] ? prepare_creds+0x23/0x160
[ 53.796963] ? __x64_sys_capset+0x252/0x3e0
[ 53.796966] ? do_syscall_64+0x69/0x160
[ 53.796968] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 53.796971] ? __alloc_pages_nodemask+0x170/0x380
[ 53.796973] ? __alloc_pages_slowpath+0x12c0/0x12c0
[ 53.796977] ? tty_vhangup+0x20/0x20
[ 53.796979] ? policy_nodemask+0x1a/0x90
[ 53.796982] ? __mod_node_page_state+0x8d/0xa0
[ 53.796986] ? __check_object_size+0xe7/0x240
[ 53.796989] ? __sys_sendto+0x229/0x290
[ 53.796991] ? rawv6_rcv_skb+0x160/0x160
[ 53.796993] __sys_sendto+0x229/0x290
[ 53.796996] ? __ia32_sys_getpeername+0x50/0x50
[ 53.796999] ? commit_creds+0x2de/0x520
[ 53.797002] ? security_capset+0x57/0x70
[ 53.797004] ? __x64_sys_capset+0x29f/0x3e0
[ 53.797007] ? __x64_sys_rt_sigsuspend+0xe0/0xe0
[ 53.797011] ? __do_page_fault+0x664/0x770
[ 53.797014] __x64_sys_sendto+0x74/0x90
[ 53.797017] do_syscall_64+0x69/0x160
[ 53.797019] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 53.797022] RIP: 0033:0x7f43b7a6714a
[ 53.797023] RSP: 002b:00007ffd891bd368 EFLAGS: 00000246 ORIG_RAX:
000000000000002c
[ 53.797026] RAX: ffffffffffffffda RBX: 00000000006129c0 RCX: 00007f43b7a6714a
[ 53.797028] RDX: 0000000000000040 RSI: 00000000006129c0 RDI: 0000000000000004
[ 53.797029] RBP: 00007ffd891be640 R08: 0000000000610940 R09: 000000000000001c
[ 53.797030] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 53.797032] R13: 000000000060e6a0 R14: 0000000000008004 R15: 000000000040b661
[ 53.797171] Allocated by task 642:
[ 53.797460] kasan_kmalloc+0xa0/0xd0
[ 53.797463] kmem_cache_alloc+0xd2/0x1f0
[ 53.797465] getname_flags+0x40/0x210
[ 53.797467] user_path_at_empty+0x1d/0x40
[ 53.797469] do_faccessat+0x12a/0x320
[ 53.797471] do_syscall_64+0x69/0x160
[ 53.797473] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 53.797607] Freed by task 642:
[ 53.797869] __kasan_slab_free+0x130/0x180
[ 53.797871] kmem_cache_free+0xa8/0x230
[ 53.797872] filename_lookup+0x15b/0x230
[ 53.797874] do_faccessat+0x12a/0x320
[ 53.797876] do_syscall_64+0x69/0x160
[ 53.797878] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 53.798014] The buggy address belongs to the object at ffff88011975e600
which belongs to the cache names_cache of size 4096
[ 53.799043] The buggy address is located 1786 bytes inside of
4096-byte region [ffff88011975e600, ffff88011975f600)
[ 53.800013] The buggy address belongs to the page:
[ 53.800414] page:ffffea000465d600 count:1 mapcount:0
mapping:0000000000000000 index:0x0 compound_mapcount: 0
[ 53.801259] flags: 0x17fff0000008100(slab|head)
[ 53.801640] raw: 017fff0000008100 0000000000000000 0000000000000000
0000000100070007
[ 53.803147] raw: dead000000000100 dead000000000200 ffff88011b185a40
0000000000000000
[ 53.803787] page dumped because: kasan: bad access detected
[ 53.804384] Memory state around the buggy address:
[ 53.804788] ffff88011975eb80: fb fb fb fb fb fb fb fb fb fb fb fb
fb fb fb fb
[ 53.805384] ffff88011975ec00: fb fb fb fb fb fb fb fb fb fb fb fb
fb fb fb fb
[ 53.805979] >ffff88011975ec80: fb fb fb fb fb fb fb fb fb fb fb fb
fb fb fb fb
[ 53.806577] ^
[ 53.807165] ffff88011975ed00: fb fb fb fb fb fb fb fb fb fb fb fb
fb fb fb fb
[ 53.807762] ffff88011975ed80: fb fb fb fb fb fb fb fb fb fb fb fb
fb fb fb fb
[ 53.808356] ==================================================================
[ 53.808949] Disabling lock debugging due to kernel taint
Fixes: 6c8702c60b88 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")
Signed-off-by: David Lebrun <dlebrun@google.com>
Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|