author | Linus Torvalds <torvalds@linux-foundation.org> | 2018-08-15 15:04:25 -0700 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2018-08-15 15:04:25 -0700 |
commit | 9a76aba02a37718242d7cdc294f0a3901928aa57 (patch) | |
tree | 2040d038f85d2120f21af83b0793efd5af1864e3 /net/netfilter | |
parent | 0a957467c5fd46142bc9c52758ffc552d4c5e2f7 (diff) | |
parent | 26a1ccc6c117be8e33e0410fce8c5298b0015b99 (diff) |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
"Highlights:
- Gustavo A. R. Silva keeps working on the implicit switch fallthru
changes.
- Support 802.11ax High-Efficiency wireless in cfg80211 et al., from
Luca Coelho.
- Re-enable ASPM in r8169, from Kai-Heng Feng.
- Add virtual XFRM interfaces, which avoids all of the limitations of
existing IPSEC tunnels. From Steffen Klassert.
- Convert GRO over to use a hash table, so that when we have many
flows active we don't traverse a long list during accumulation.
- Many new self tests for routing, TC, tunnels, etc. Too many
contributors to mention them all, but I'm really happy to keep
seeing this stuff.
- Hardware timestamping support for dpaa_eth/fsl-fman from Yangbo Lu.
- Lots of cleanups and fixes in L2TP code from Guillaume Nault.
- Add IPSEC offload support to netdevsim, from Shannon Nelson.
- Add support for slotting with non-uniform distribution to netem
packet scheduler, from Yousuk Seung.
- Add UDP GSO support to mlx5e, from Boris Pismenny.
- Support offloading of Team LAG in NFP, from John Hurley.
- Allow configuring TX queue selection based upon RX queue, from
Amritha Nambiar.
- Support ethtool ring size configuration in aquantia, from Anton
Mikaev.
- Support DSCP and flowlabel per-transport in SCTP, from Xin Long.
- Support list based batching and stack traversal of SKBs, this is
very exciting work. From Edward Cree.
- Busyloop optimizations in vhost_net, from Toshiaki Makita.
- Introduce the ETF qdisc, which allows time based transmissions. IGB
can offload this in hardware. From Vinicius Costa Gomes.
- Add parameter support to devlink, from Moshe Shemesh.
- Several multiplication and division optimizations for BPF JIT in
nfp driver, from Jiong Wang.
- Lots of preparatory work to make more of the packet scheduler layer
lockless, when possible, from Vlad Buslov.
- Add ACK filter and NAT awareness to sch_cake packet scheduler, from
Toke Høiland-Jørgensen.
- Support regions and region snapshots in devlink, from Alex Vesker.
- Allow attaching XDP programs to both HW and SW at the same time on
a given device, with initial support in nfp. From Jakub Kicinski.
- Add TLS RX offload and support in mlx5, from Ilya Lesokhin.
- Use PHYLIB in r8169 driver, from Heiner Kallweit.
- All sorts of changes to support Spectrum 2 in mlxsw driver, from
Ido Schimmel.
- PTP support in mv88e6xxx DSA driver, from Andrew Lunn.
- Make TCP_USER_TIMEOUT socket option more accurate, from Jon
Maxwell.
- Support for templates in packet scheduler classifier, from Jiri
Pirko.
- IPV6 support in RDS, from Ka-Cheong Poon.
- Native tproxy support in nf_tables, from Máté Eckl.
- Maintain IP fragment queue in an rbtree, but optimize properly for
in-order frags. From Peter Oskolkov.
- Improve handling of ACKs on hole repairs, from Yuchung Cheng"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1996 commits)
bpf: test: fix spelling mistake "REUSEEPORT" -> "REUSEPORT"
hv/netvsc: Fix NULL dereference at single queue mode fallback
net: filter: mark expected switch fall-through
xen-netfront: fix warn message as irq device name has '/'
cxgb4: Add new T5 PCI device ids 0x50af and 0x50b0
net: dsa: mv88e6xxx: missing unlock on error path
rds: fix building with IPV6=m
inet/connection_sock: prefer _THIS_IP_ to current_text_addr
net: dsa: mv88e6xxx: bitwise vs logical bug
net: sock_diag: Fix spectre v1 gadget in __sock_diag_cmd()
ieee802154: hwsim: using right kind of iteration
net: hns3: Add vlan filter setting by ethtool command -K
net: hns3: Set tx ring' tc info when netdev is up
net: hns3: Remove tx ring BD len register in hns3_enet
net: hns3: Fix desc num set to default when setting channel
net: hns3: Fix for phy link issue when using marvell phy driver
net: hns3: Fix for information of phydev lost problem when down/up
net: hns3: Fix for command format parsing error in hclge_is_all_function_id_zero
net: hns3: Add support for serdes loopback selftest
bnxt_en: take coredump_record structure off stack
...
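For readers unfamiliar with the TCP_USER_TIMEOUT item in the highlights above: it is a per-socket option, set from userspace, that bounds how long transmitted data may stay unacknowledged before the kernel drops the connection. The following is a minimal userspace sketch, not code from this merge. The helper names set_tcp_user_timeout() and get_tcp_user_timeout() are hypothetical and only wrap the standard setsockopt()/getsockopt() calls; the value is in milliseconds.

```c
/* Userspace sketch (not part of this merge): set and read back the
 * TCP_USER_TIMEOUT socket option. Helper names are hypothetical.
 */
#include <assert.h>
#include <stddef.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Set the maximum time, in milliseconds, that transmitted data may
 * remain unacknowledged before the connection is forcibly closed.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int set_tcp_user_timeout(int fd, unsigned int timeout_ms)
{
	return setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT,
			  &timeout_ms, sizeof(timeout_ms));
}

/* Read the currently configured value back into *timeout_ms.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int get_tcp_user_timeout(int fd, unsigned int *timeout_ms)
{
	socklen_t len = sizeof(*timeout_ms);

	assert(timeout_ms != NULL);
	return getsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT,
			  timeout_ms, &len);
}
```

A caller would typically create the socket with socket(AF_INET, SOCK_STREAM, 0) and call set_tcp_user_timeout(fd, 30000) before connect(). How closely the kernel honors the configured value is what the change mentioned in the highlights is about; this sketch does not depend on that behavior.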
Diffstat (limited to 'net/netfilter')
58 files changed, 4269 insertions, 1376 deletions
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig index f0a1c536ef15..71709c104081 100644 --- a/net/netfilter/Kconfig +++ b/net/netfilter/Kconfig @@ -46,9 +46,19 @@ config NETFILTER_NETLINK_LOG and is also scheduled to replace the old syslog-based ipt_LOG and ip6t_LOG modules. +config NETFILTER_NETLINK_OSF + tristate "Netfilter OSF over NFNETLINK interface" + depends on NETFILTER_ADVANCED + select NETFILTER_NETLINK + help + If this option is enabled, the kernel will include support + for passive OS fingerprint via NFNETLINK. + config NF_CONNTRACK tristate "Netfilter connection tracking support" default m if NETFILTER_ADVANCED=n + select NF_DEFRAG_IPV4 + select NF_DEFRAG_IPV6 if IPV6 != n help Connection tracking keeps a record of what packets have passed through your machine, in order to figure out how they are related @@ -96,7 +106,6 @@ config NF_CONNTRACK_SECMARK config NF_CONNTRACK_ZONES bool 'Connection tracking zones' depends on NETFILTER_ADVANCED - depends on NETFILTER_XT_TARGET_CT help This option enables support for connection tracking zones. Normally, each connection needs to have a unique system wide @@ -148,10 +157,11 @@ config NF_CONNTRACK_TIMESTAMP If unsure, say `N'. config NF_CONNTRACK_LABELS - bool + bool "Connection tracking labels" help This option enables support for assigning user-defined flag bits - to connection tracking entries. It selected by the connlabel match. + to connection tracking entries. It can be used with xtables connlabel + match and the nftables ct expression. config NF_CT_PROTO_DCCP bool 'DCCP protocol connection tracking support' @@ -355,6 +365,7 @@ config NF_CT_NETLINK_TIMEOUT tristate 'Connection tracking timeout tuning via Netlink' select NETFILTER_NETLINK depends on NETFILTER_ADVANCED + depends on NF_CONNTRACK_TIMEOUT help This option enables support for connection tracking timeout fine-grain tuning. 
This allows you to attach specific timeout @@ -440,9 +451,6 @@ config NETFILTER_SYNPROXY endif # NF_CONNTRACK -config NF_OSF - tristate - config NF_TABLES select NETFILTER_NETLINK tristate "Netfilter nf_tables support" @@ -551,6 +559,12 @@ config NFT_NAT This option adds the "nat" expression that you can use to perform typical Network Address Translation (NAT) packet transformations. +config NFT_TUNNEL + tristate "Netfilter nf_tables tunnel module" + help + This option adds the "tunnel" expression that you can use to set + tunneling policies. + config NFT_OBJREF tristate "Netfilter nf_tables stateful object reference module" help @@ -615,11 +629,28 @@ config NFT_SOCKET tristate "Netfilter nf_tables socket match support" depends on IPV6 || IPV6=n select NF_SOCKET_IPV4 - select NF_SOCKET_IPV6 if IPV6 + select NF_SOCKET_IPV6 if NF_TABLES_IPV6 help This option allows matching for the presence or absence of a corresponding socket and its attributes. +config NFT_OSF + tristate "Netfilter nf_tables passive OS fingerprint support" + depends on NETFILTER_ADVANCED + select NETFILTER_NETLINK_OSF + help + This option allows matching packets from an specific OS. + +config NFT_TPROXY + tristate "Netfilter nf_tables tproxy support" + depends on IPV6 || IPV6=n + select NF_DEFRAG_IPV4 + select NF_DEFRAG_IPV6 if NF_TABLES_IPV6 + select NF_TPROXY_IPV4 + select NF_TPROXY_IPV6 if NF_TABLES_IPV6 + help + This makes transparent proxy support available in nftables. 
+ if NF_TABLES_NETDEV config NF_DUP_NETDEV @@ -881,7 +912,7 @@ config NETFILTER_XT_TARGET_LOG tristate "LOG target support" select NF_LOG_COMMON select NF_LOG_IPV4 - select NF_LOG_IPV6 if IPV6 + select NF_LOG_IPV6 if IP6_NF_IPTABLES default m if NETFILTER_ADVANCED=n help This option adds a `LOG' target, which allows you to create rules in @@ -973,7 +1004,7 @@ config NETFILTER_XT_TARGET_TEE depends on IPV6 || IPV6=n depends on !NF_CONNTRACK || NF_CONNTRACK select NF_DUP_IPV4 - select NF_DUP_IPV6 if IPV6 + select NF_DUP_IPV6 if IP6_NF_IPTABLES ---help--- This option adds a "TEE" target with which a packet can be cloned and this clone be rerouted to another nexthop. @@ -1366,8 +1397,8 @@ config NETFILTER_XT_MATCH_NFACCT config NETFILTER_XT_MATCH_OSF tristate '"osf" Passive OS fingerprint match' - depends on NETFILTER_ADVANCED && NETFILTER_NETLINK - select NF_OSF + depends on NETFILTER_ADVANCED + select NETFILTER_NETLINK_OSF help This option selects the Passive OS Fingerprinting match module that allows to passively match the remote operating system by @@ -1481,8 +1512,8 @@ config NETFILTER_XT_MATCH_SOCKET depends on NETFILTER_ADVANCED depends on IPV6 || IPV6=n depends on IP6_NF_IPTABLES || IP6_NF_IPTABLES=n - depends on NF_SOCKET_IPV4 - depends on NF_SOCKET_IPV6 + select NF_SOCKET_IPV4 + select NF_SOCKET_IPV6 if IP6_NF_IPTABLES select NF_DEFRAG_IPV4 select NF_DEFRAG_IPV6 if IP6_NF_IPTABLES != n help diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile index 8a76dced974d..16895e045b66 100644 --- a/net/netfilter/Makefile +++ b/net/netfilter/Makefile @@ -1,7 +1,12 @@ # SPDX-License-Identifier: GPL-2.0 netfilter-objs := core.o nf_log.o nf_queue.o nf_sockopt.o utils.o -nf_conntrack-y := nf_conntrack_core.o nf_conntrack_standalone.o nf_conntrack_expect.o nf_conntrack_helper.o nf_conntrack_proto.o nf_conntrack_l3proto_generic.o nf_conntrack_proto_generic.o nf_conntrack_proto_tcp.o nf_conntrack_proto_udp.o nf_conntrack_extend.o nf_conntrack_acct.o 
nf_conntrack_seqadj.o +nf_conntrack-y := nf_conntrack_core.o nf_conntrack_standalone.o nf_conntrack_expect.o nf_conntrack_helper.o \ + nf_conntrack_proto.o nf_conntrack_proto_generic.o nf_conntrack_proto_tcp.o nf_conntrack_proto_udp.o \ + nf_conntrack_proto_icmp.o \ + nf_conntrack_extend.o nf_conntrack_acct.o nf_conntrack_seqadj.o + +nf_conntrack-$(subst m,y,$(CONFIG_IPV6)) += nf_conntrack_proto_icmpv6.o nf_conntrack-$(CONFIG_NF_CONNTRACK_TIMEOUT) += nf_conntrack_timeout.o nf_conntrack-$(CONFIG_NF_CONNTRACK_TIMESTAMP) += nf_conntrack_timestamp.o nf_conntrack-$(CONFIG_NF_CONNTRACK_EVENTS) += nf_conntrack_ecache.o @@ -15,6 +20,7 @@ obj-$(CONFIG_NETFILTER_NETLINK) += nfnetlink.o obj-$(CONFIG_NETFILTER_NETLINK_ACCT) += nfnetlink_acct.o obj-$(CONFIG_NETFILTER_NETLINK_QUEUE) += nfnetlink_queue.o obj-$(CONFIG_NETFILTER_NETLINK_LOG) += nfnetlink_log.o +obj-$(CONFIG_NETFILTER_NETLINK_OSF) += nfnetlink_osf.o # connection tracking obj-$(CONFIG_NF_CONNTRACK) += nf_conntrack.o @@ -95,6 +101,7 @@ obj-$(CONFIG_NFT_QUEUE) += nft_queue.o obj-$(CONFIG_NFT_QUOTA) += nft_quota.o obj-$(CONFIG_NFT_REJECT) += nft_reject.o obj-$(CONFIG_NFT_REJECT_INET) += nft_reject_inet.o +obj-$(CONFIG_NFT_TUNNEL) += nft_tunnel.o obj-$(CONFIG_NFT_COUNTER) += nft_counter.o obj-$(CONFIG_NFT_LOG) += nft_log.o obj-$(CONFIG_NFT_MASQ) += nft_masq.o @@ -103,8 +110,9 @@ obj-$(CONFIG_NFT_HASH) += nft_hash.o obj-$(CONFIG_NFT_FIB) += nft_fib.o obj-$(CONFIG_NFT_FIB_INET) += nft_fib_inet.o obj-$(CONFIG_NFT_FIB_NETDEV) += nft_fib_netdev.o -obj-$(CONFIG_NF_OSF) += nf_osf.o obj-$(CONFIG_NFT_SOCKET) += nft_socket.o +obj-$(CONFIG_NFT_OSF) += nft_osf.o +obj-$(CONFIG_NFT_TPROXY) += nft_tproxy.o # nf_tables netdev obj-$(CONFIG_NFT_DUP_NETDEV) += nft_dup_netdev.o diff --git a/net/netfilter/core.c b/net/netfilter/core.c index 168af54db975..dc240cb47ddf 100644 --- a/net/netfilter/core.c +++ b/net/netfilter/core.c @@ -603,6 +603,21 @@ void nf_conntrack_destroy(struct nf_conntrack *nfct) } EXPORT_SYMBOL(nf_conntrack_destroy); 
+bool nf_ct_get_tuple_skb(struct nf_conntrack_tuple *dst_tuple, + const struct sk_buff *skb) +{ + struct nf_ct_hook *ct_hook; + bool ret = false; + + rcu_read_lock(); + ct_hook = rcu_dereference(nf_ct_hook); + if (ct_hook) + ret = ct_hook->get_tuple_skb(dst_tuple, skb); + rcu_read_unlock(); + return ret; +} +EXPORT_SYMBOL(nf_ct_get_tuple_skb); + /* Built-in default zone used e.g. by modules. */ const struct nf_conntrack_zone nf_ct_zone_dflt = { .id = NF_CT_DEFAULT_ZONE_ID, diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c index 99e0aa350dc5..0edc62910ebf 100644 --- a/net/netfilter/ipvs/ip_vs_conn.c +++ b/net/netfilter/ipvs/ip_vs_conn.c @@ -825,12 +825,23 @@ static void ip_vs_conn_expire(struct timer_list *t) /* Unlink conn if not referenced anymore */ if (likely(ip_vs_conn_unlink(cp))) { + struct ip_vs_conn *ct = cp->control; + /* delete the timer if it is activated by other users */ del_timer(&cp->timer); /* does anybody control me? */ - if (cp->control) + if (ct) { ip_vs_control_del(cp); + /* Drop CTL or non-assured TPL if not used anymore */ + if (!cp->timeout && !atomic_read(&ct->n_control) && + (!(ct->flags & IP_VS_CONN_F_TEMPLATE) || + !(ct->state & IP_VS_CTPL_S_ASSURED))) { + IP_VS_DBG(4, "drop controlling connection\n"); + ct->timeout = 0; + ip_vs_conn_expire_now(ct); + } + } if ((cp->flags & IP_VS_CONN_F_NFCT) && !(cp->flags & IP_VS_CONN_F_ONE_PACKET)) { @@ -872,6 +883,10 @@ static void ip_vs_conn_expire(struct timer_list *t) /* Modify timer, so that it expires as soon as possible. * Can be called without reference only if under RCU lock. + * We can have such chain of conns linked with ->control: DATA->CTL->TPL + * - DATA (eg. 
FTP) and TPL (persistence) can be present depending on setup + * - cp->timeout=0 indicates all conns from chain should be dropped but + * TPL is not dropped if in assured state */ void ip_vs_conn_expire_now(struct ip_vs_conn *cp) { @@ -1107,7 +1122,7 @@ static int ip_vs_conn_seq_show(struct seq_file *seq, void *v) &cp->caddr.in6, ntohs(cp->cport), &cp->vaddr.in6, ntohs(cp->vport), dbuf, ntohs(cp->dport), - ip_vs_state_name(cp->protocol, cp->state), + ip_vs_state_name(cp), (cp->timer.expires-jiffies)/HZ, pe_data); else #endif @@ -1118,7 +1133,7 @@ static int ip_vs_conn_seq_show(struct seq_file *seq, void *v) ntohl(cp->caddr.ip), ntohs(cp->cport), ntohl(cp->vaddr.ip), ntohs(cp->vport), dbuf, ntohs(cp->dport), - ip_vs_state_name(cp->protocol, cp->state), + ip_vs_state_name(cp), (cp->timer.expires-jiffies)/HZ, pe_data); } return 0; @@ -1169,7 +1184,7 @@ static int ip_vs_conn_sync_seq_show(struct seq_file *seq, void *v) &cp->caddr.in6, ntohs(cp->cport), &cp->vaddr.in6, ntohs(cp->vport), dbuf, ntohs(cp->dport), - ip_vs_state_name(cp->protocol, cp->state), + ip_vs_state_name(cp), ip_vs_origin_name(cp->flags), (cp->timer.expires-jiffies)/HZ); else @@ -1181,7 +1196,7 @@ static int ip_vs_conn_sync_seq_show(struct seq_file *seq, void *v) ntohl(cp->caddr.ip), ntohs(cp->cport), ntohl(cp->vaddr.ip), ntohs(cp->vport), dbuf, ntohs(cp->dport), - ip_vs_state_name(cp->protocol, cp->state), + ip_vs_state_name(cp), ip_vs_origin_name(cp->flags), (cp->timer.expires-jiffies)/HZ); } @@ -1197,8 +1212,11 @@ static const struct seq_operations ip_vs_conn_sync_seq_ops = { #endif -/* - * Randomly drop connection entries before running out of memory +/* Randomly drop connection entries before running out of memory + * Can be used for DATA and CTL conns. 
For TPL conns there are exceptions: + * - traffic for services in OPS mode increases ct->in_pkts, so it is supported + * - traffic for services not in OPS mode does not increase ct->in_pkts in + * all cases, so it is not supported */ static inline int todrop_entry(struct ip_vs_conn *cp) { @@ -1242,7 +1260,7 @@ static inline bool ip_vs_conn_ops_mode(struct ip_vs_conn *cp) void ip_vs_random_dropentry(struct netns_ipvs *ipvs) { int idx; - struct ip_vs_conn *cp, *cp_c; + struct ip_vs_conn *cp; rcu_read_lock(); /* @@ -1254,13 +1272,15 @@ void ip_vs_random_dropentry(struct netns_ipvs *ipvs) hlist_for_each_entry_rcu(cp, &ip_vs_conn_tab[hash], c_list) { if (cp->ipvs != ipvs) continue; + if (atomic_read(&cp->n_control)) + continue; if (cp->flags & IP_VS_CONN_F_TEMPLATE) { - if (atomic_read(&cp->n_control) || - !ip_vs_conn_ops_mode(cp)) - continue; - else - /* connection template of OPS */ + /* connection template of OPS */ + if (ip_vs_conn_ops_mode(cp)) goto try_drop; + if (!(cp->state & IP_VS_CTPL_S_ASSURED)) + goto drop; + continue; } if (cp->protocol == IPPROTO_TCP) { switch(cp->state) { @@ -1294,15 +1314,10 @@ try_drop: continue; } - IP_VS_DBG(4, "del connection\n"); +drop: + IP_VS_DBG(4, "drop connection\n"); + cp->timeout = 0; ip_vs_conn_expire_now(cp); - cp_c = cp->control; - /* cp->control is valid only with reference to cp */ - if (cp_c && __ip_vs_conn_get(cp)) { - IP_VS_DBG(4, "del conn template\n"); - ip_vs_conn_expire_now(cp_c); - __ip_vs_conn_put(cp); - } } cond_resched_rcu(); } @@ -1325,15 +1340,19 @@ flush_again: hlist_for_each_entry_rcu(cp, &ip_vs_conn_tab[idx], c_list) { if (cp->ipvs != ipvs) continue; - IP_VS_DBG(4, "del connection\n"); - ip_vs_conn_expire_now(cp); + /* As timers are expired in LIFO order, restart + * the timer of controlling connection first, so + * that it is expired after us. 
+ */ cp_c = cp->control; /* cp->control is valid only with reference to cp */ if (cp_c && __ip_vs_conn_get(cp)) { - IP_VS_DBG(4, "del conn template\n"); + IP_VS_DBG(4, "del controlling connection\n"); ip_vs_conn_expire_now(cp_c); __ip_vs_conn_put(cp); } + IP_VS_DBG(4, "del connection\n"); + ip_vs_conn_expire_now(cp); } cond_resched_rcu(); } diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c index dd21782e2f12..62eefea48973 100644 --- a/net/netfilter/ipvs/ip_vs_ctl.c +++ b/net/netfilter/ipvs/ip_vs_ctl.c @@ -134,7 +134,7 @@ static void update_defense_level(struct netns_ipvs *ipvs) } else { atomic_set(&ipvs->dropentry, 0); ipvs->sysctl_drop_entry = 1; - }; + } break; case 3: atomic_set(&ipvs->dropentry, 1); diff --git a/net/netfilter/ipvs/ip_vs_proto.c b/net/netfilter/ipvs/ip_vs_proto.c index ca880a3ad033..54ee84adf0bd 100644 --- a/net/netfilter/ipvs/ip_vs_proto.c +++ b/net/netfilter/ipvs/ip_vs_proto.c @@ -42,6 +42,11 @@ static struct ip_vs_protocol *ip_vs_proto_table[IP_VS_PROTO_TAB_SIZE]; +/* States for conn templates: NONE or words separated with ",", max 15 chars */ +static const char *ip_vs_ctpl_state_name_table[IP_VS_CTPL_S_LAST] = { + [IP_VS_CTPL_S_NONE] = "NONE", + [IP_VS_CTPL_S_ASSURED] = "ASSURED", +}; /* * register an ipvs protocol @@ -193,12 +198,20 @@ ip_vs_create_timeout_table(int *table, int size) } -const char * ip_vs_state_name(__u16 proto, int state) +const char *ip_vs_state_name(const struct ip_vs_conn *cp) { - struct ip_vs_protocol *pp = ip_vs_proto_get(proto); + unsigned int state = cp->state; + struct ip_vs_protocol *pp; + + if (cp->flags & IP_VS_CONN_F_TEMPLATE) { + if (state >= IP_VS_CTPL_S_LAST) + return "ERR!"; + return ip_vs_ctpl_state_name_table[state] ? : "?"; + } + pp = ip_vs_proto_get(cp->protocol); if (pp == NULL || pp->state_name == NULL) - return (IPPROTO_IP == proto) ? "NONE" : "ERR!"; + return (cp->protocol == IPPROTO_IP) ? 
"NONE" : "ERR!"; return pp->state_name(state); } diff --git a/net/netfilter/ipvs/ip_vs_proto_sctp.c b/net/netfilter/ipvs/ip_vs_proto_sctp.c index 3250c4a1111e..b0cd7d08f2a7 100644 --- a/net/netfilter/ipvs/ip_vs_proto_sctp.c +++ b/net/netfilter/ipvs/ip_vs_proto_sctp.c @@ -461,6 +461,8 @@ set_sctp_state(struct ip_vs_proto_data *pd, struct ip_vs_conn *cp, cp->flags &= ~IP_VS_CONN_F_INACTIVE; } } + if (next_state == IP_VS_SCTP_S_ESTABLISHED) + ip_vs_control_assure_ct(cp); } if (likely(pd)) cp->timeout = pd->timeout_table[cp->state = next_state]; diff --git a/net/netfilter/ipvs/ip_vs_proto_tcp.c b/net/netfilter/ipvs/ip_vs_proto_tcp.c index 80d10ad12a15..1770fc6ce960 100644 --- a/net/netfilter/ipvs/ip_vs_proto_tcp.c +++ b/net/netfilter/ipvs/ip_vs_proto_tcp.c @@ -569,6 +569,8 @@ set_tcp_state(struct ip_vs_proto_data *pd, struct ip_vs_conn *cp, cp->flags &= ~IP_VS_CONN_F_INACTIVE; } } + if (new_state == IP_VS_TCP_S_ESTABLISHED) + ip_vs_control_assure_ct(cp); } if (likely(pd)) diff --git a/net/netfilter/ipvs/ip_vs_proto_udp.c b/net/netfilter/ipvs/ip_vs_proto_udp.c index e0ef11c3691e..0f53c49025f8 100644 --- a/net/netfilter/ipvs/ip_vs_proto_udp.c +++ b/net/netfilter/ipvs/ip_vs_proto_udp.c @@ -460,6 +460,8 @@ udp_state_transition(struct ip_vs_conn *cp, int direction, } cp->timeout = pd->timeout_table[IP_VS_UDP_S_NORMAL]; + if (direction == IP_VS_DIR_OUTPUT) + ip_vs_control_assure_ct(cp); } static int __udp_init(struct netns_ipvs *ipvs, struct ip_vs_proto_data *pd) diff --git a/net/netfilter/ipvs/ip_vs_sync.c b/net/netfilter/ipvs/ip_vs_sync.c index 001501e25625..d4020c5e831d 100644 --- a/net/netfilter/ipvs/ip_vs_sync.c +++ b/net/netfilter/ipvs/ip_vs_sync.c @@ -1003,12 +1003,9 @@ static void ip_vs_process_message_v0(struct netns_ipvs *ipvs, const char *buffer continue; } } else { - /* protocol in templates is not used for state/timeout */ - if (state > 0) { - IP_VS_DBG(2, "BACKUP v0, Invalid template state %u\n", - state); - state = 0; - } + if (state >= IP_VS_CTPL_S_LAST) + 
IP_VS_DBG(7, "BACKUP v0, Invalid tpl state %u\n", + state); } ip_vs_conn_fill_param(ipvs, AF_INET, s->protocol, @@ -1166,12 +1163,9 @@ static inline int ip_vs_proc_sync_conn(struct netns_ipvs *ipvs, __u8 *p, __u8 *m goto out; } } else { - /* protocol in templates is not used for state/timeout */ - if (state > 0) { - IP_VS_DBG(3, "BACKUP, Invalid template state %u\n", - state); - state = 0; - } + if (state >= IP_VS_CTPL_S_LAST) + IP_VS_DBG(7, "BACKUP, Invalid tpl state %u\n", + state); } if (ip_vs_conn_fill_param_sync(ipvs, af, s, &param, pe_data, pe_data_len, pe_name, pe_name_len)) { diff --git a/net/netfilter/nf_conncount.c b/net/netfilter/nf_conncount.c index 510039862aa9..02ca7df793f5 100644 --- a/net/netfilter/nf_conncount.c +++ b/net/netfilter/nf_conncount.c @@ -44,17 +44,19 @@ /* we will save the tuples of all connections we care about */ struct nf_conncount_tuple { - struct hlist_node node; + struct list_head node; struct nf_conntrack_tuple tuple; struct nf_conntrack_zone zone; int cpu; u32 jiffies32; + struct rcu_head rcu_head; }; struct nf_conncount_rb { struct rb_node node; - struct hlist_head hhead; /* connections/hosts in same subnet */ + struct nf_conncount_list list; u32 key[MAX_KEYLEN]; + struct rcu_head rcu_head; }; static spinlock_t nf_conncount_locks[CONNCOUNT_LOCK_SLOTS] __cacheline_aligned_in_smp; @@ -62,6 +64,10 @@ static spinlock_t nf_conncount_locks[CONNCOUNT_LOCK_SLOTS] __cacheline_aligned_i struct nf_conncount_data { unsigned int keylen; struct rb_root root[CONNCOUNT_SLOTS]; + struct net *net; + struct work_struct gc_work; + unsigned long pending_trees[BITS_TO_LONGS(CONNCOUNT_SLOTS)]; + unsigned int gc_tree; }; static u_int32_t conncount_rnd __read_mostly; @@ -82,26 +88,70 @@ static int key_diff(const u32 *a, const u32 *b, unsigned int klen) return memcmp(a, b, klen * sizeof(u32)); } -bool nf_conncount_add(struct hlist_head *head, - const struct nf_conntrack_tuple *tuple, - const struct nf_conntrack_zone *zone) +enum nf_conncount_list_add 
+nf_conncount_add(struct nf_conncount_list *list, + const struct nf_conntrack_tuple *tuple, + const struct nf_conntrack_zone *zone) { struct nf_conncount_tuple *conn; + if (WARN_ON_ONCE(list->count > INT_MAX)) + return NF_CONNCOUNT_ERR; + conn = kmem_cache_alloc(conncount_conn_cachep, GFP_ATOMIC); if (conn == NULL) - return false; + return NF_CONNCOUNT_ERR; + conn->tuple = *tuple; conn->zone = *zone; conn->cpu = raw_smp_processor_id(); conn->jiffies32 = (u32)jiffies; - hlist_add_head(&conn->node, head); - return true; + spin_lock(&list->list_lock); + if (list->dead == true) { + kmem_cache_free(conncount_conn_cachep, conn); + spin_unlock(&list->list_lock); + return NF_CONNCOUNT_SKIP; + } + list_add_tail(&conn->node, &list->head); + list->count++; + spin_unlock(&list->list_lock); + return NF_CONNCOUNT_ADDED; } EXPORT_SYMBOL_GPL(nf_conncount_add); +static void __conn_free(struct rcu_head *h) +{ + struct nf_conncount_tuple *conn; + + conn = container_of(h, struct nf_conncount_tuple, rcu_head); + kmem_cache_free(conncount_conn_cachep, conn); +} + +static bool conn_free(struct nf_conncount_list *list, + struct nf_conncount_tuple *conn) +{ + bool free_entry = false; + + spin_lock(&list->list_lock); + + if (list->count == 0) { + spin_unlock(&list->list_lock); + return free_entry; + } + + list->count--; + list_del_rcu(&conn->node); + if (list->count == 0) + free_entry = true; + + spin_unlock(&list->list_lock); + call_rcu(&conn->rcu_head, __conn_free); + return free_entry; +} + static const struct nf_conntrack_tuple_hash * -find_or_evict(struct net *net, struct nf_conncount_tuple *conn) +find_or_evict(struct net *net, struct nf_conncount_list *list, + struct nf_conncount_tuple *conn, bool *free_entry) { const struct nf_conntrack_tuple_hash *found; unsigned long a, b; @@ -121,34 +171,37 @@ find_or_evict(struct net *net, struct nf_conncount_tuple *conn) */ age = a - b; if (conn->cpu == cpu || age >= 2) { - hlist_del(&conn->node); - kmem_cache_free(conncount_conn_cachep, conn); 
+ *free_entry = conn_free(list, conn); return ERR_PTR(-ENOENT); } return ERR_PTR(-EAGAIN); } -unsigned int nf_conncount_lookup(struct net *net, struct hlist_head *head, - const struct nf_conntrack_tuple *tuple, - const struct nf_conntrack_zone *zone, - bool *addit) +void nf_conncount_lookup(struct net *net, + struct nf_conncount_list *list, + const struct nf_conntrack_tuple *tuple, + const struct nf_conntrack_zone *zone, + bool *addit) { const struct nf_conntrack_tuple_hash *found; - struct nf_conncount_tuple *conn; + struct nf_conncount_tuple *conn, *conn_n; struct nf_conn *found_ct; - struct hlist_node *n; - unsigned int length = 0; + unsigned int collect = 0; + bool free_entry = false; + /* best effort only */ *addit = tuple ? true : false; /* check the saved connections */ - hlist_for_each_entry_safe(conn, n, head, node) { - found = find_or_evict(net, conn); + list_for_each_entry_safe(conn, conn_n, &list->head, node) { + if (collect > CONNCOUNT_GC_MAX_NODES) + break; + + found = find_or_evict(net, list, conn, &free_entry); if (IS_ERR(found)) { /* Not found, but might be about to be confirmed */ if (PTR_ERR(found) == -EAGAIN) { - length++; if (!tuple) continue; @@ -156,7 +209,8 @@ unsigned int nf_conncount_lookup(struct net *net, struct hlist_head *head, nf_ct_zone_id(&conn->zone, conn->zone.dir) == nf_ct_zone_id(zone, zone->dir)) *addit = false; - } + } else if (PTR_ERR(found) == -ENOENT) + collect++; continue; } @@ -165,9 +219,10 @@ unsigned int nf_conncount_lookup(struct net *net, struct hlist_head *head, if (tuple && nf_ct_tuple_equal(&conn->tuple, tuple) && nf_ct_zone_equal(found_ct, zone, zone->dir)) { /* - * Just to be sure we have it only once in the list. * We should not see tuples twice unless someone hooks * this into a table without "-p tcp --syn". + * + * Attempt to avoid a re-add in this case. 
*/ *addit = false; } else if (already_closed(found_ct)) { @@ -176,19 +231,75 @@ unsigned int nf_conncount_lookup(struct net *net, struct hlist_head *head, * closed already -> ditch it */ nf_ct_put(found_ct); - hlist_del(&conn->node); - kmem_cache_free(conncount_conn_cachep, conn); + conn_free(list, conn); + collect++; continue; } nf_ct_put(found_ct); - length++; } - - return length; } EXPORT_SYMBOL_GPL(nf_conncount_lookup); +void nf_conncount_list_init(struct nf_conncount_list *list) +{ + spin_lock_init(&list->list_lock); + INIT_LIST_HEAD(&list->head); + list->count = 1; + list->dead = false; +} +EXPORT_SYMBOL_GPL(nf_conncount_list_init); + +/* Return true if the list is empty */ +bool nf_conncount_gc_list(struct net *net, + struct nf_conncount_list *list) +{ + const struct nf_conntrack_tuple_hash *found; + struct nf_conncount_tuple *conn, *conn_n; + struct nf_conn *found_ct; + unsigned int collected = 0; + bool free_entry = false; + + list_for_each_entry_safe(conn, conn_n, &list->head, node) { + found = find_or_evict(net, list, conn, &free_entry); + if (IS_ERR(found)) { + if (PTR_ERR(found) == -ENOENT) { + if (free_entry) + return true; + collected++; + } + continue; + } + + found_ct = nf_ct_tuplehash_to_ctrack(found); + if (already_closed(found_ct)) { + /* + * we do not care about connections which are + * closed already -> ditch it + */ + nf_ct_put(found_ct); + if (conn_free(list, conn)) + return true; + collected++; + continue; + } + + nf_ct_put(found_ct); + if (collected > CONNCOUNT_GC_MAX_NODES) + return false; + } + return false; +} +EXPORT_SYMBOL_GPL(nf_conncount_gc_list); + +static void __tree_nodes_free(struct rcu_head *h) +{ + struct nf_conncount_rb *rbconn; + + rbconn = container_of(h, struct nf_conncount_rb, rcu_head); + kmem_cache_free(conncount_rb_cachep, rbconn); +} + static void tree_nodes_free(struct rb_root *root, struct nf_conncount_rb *gc_nodes[], unsigned int gc_count) @@ -197,32 +308,46 @@ static void tree_nodes_free(struct rb_root *root, 
 	while (gc_count) {
 		rbconn = gc_nodes[--gc_count];
-		rb_erase(&rbconn->node, root);
-		kmem_cache_free(conncount_rb_cachep, rbconn);
+		spin_lock(&rbconn->list.list_lock);
+		if (rbconn->list.count == 0 && rbconn->list.dead == false) {
+			rbconn->list.dead = true;
+			rb_erase(&rbconn->node, root);
+			call_rcu(&rbconn->rcu_head, __tree_nodes_free);
+		}
+		spin_unlock(&rbconn->list.list_lock);
 	}
 }
 
+static void schedule_gc_worker(struct nf_conncount_data *data, int tree)
+{
+	set_bit(tree, data->pending_trees);
+	schedule_work(&data->gc_work);
+}
+
 static unsigned int
-count_tree(struct net *net, struct rb_root *root,
-	   const u32 *key, u8 keylen,
-	   const struct nf_conntrack_tuple *tuple,
-	   const struct nf_conntrack_zone *zone)
+insert_tree(struct net *net,
+	    struct nf_conncount_data *data,
+	    struct rb_root *root,
+	    unsigned int hash,
+	    const u32 *key,
+	    u8 keylen,
+	    const struct nf_conntrack_tuple *tuple,
+	    const struct nf_conntrack_zone *zone)
 {
+	enum nf_conncount_list_add ret;
 	struct nf_conncount_rb *gc_nodes[CONNCOUNT_GC_MAX_NODES];
 	struct rb_node **rbnode, *parent;
 	struct nf_conncount_rb *rbconn;
 	struct nf_conncount_tuple *conn;
-	unsigned int gc_count;
-	bool no_gc = false;
+	unsigned int count = 0, gc_count = 0;
+	bool node_found = false;
+
+	spin_lock_bh(&nf_conncount_locks[hash % CONNCOUNT_LOCK_SLOTS]);
 
- restart:
-	gc_count = 0;
 	parent = NULL;
 	rbnode = &(root->rb_node);
 	while (*rbnode) {
 		int diff;
-		bool addit;
-
 		rbconn = rb_entry(*rbnode, struct nf_conncount_rb, node);
 
 		parent = *rbnode;
@@ -232,33 +357,30 @@ count_tree(struct net *net, struct rb_root *root,
 		} else if (diff > 0) {
 			rbnode = &((*rbnode)->rb_right);
 		} else {
-			/* same source network -> be counted! */
-			unsigned int count;
-
-			count = nf_conncount_lookup(net, &rbconn->hhead, tuple,
-						    zone, &addit);
-
-			tree_nodes_free(root, gc_nodes, gc_count);
-			if (!addit)
-				return count;
-
-			if (!nf_conncount_add(&rbconn->hhead, tuple, zone))
-				return 0; /* hotdrop */
-
-			return count + 1;
+			/* unlikely: other cpu added node already */
+			node_found = true;
+			ret = nf_conncount_add(&rbconn->list, tuple, zone);
+			if (ret == NF_CONNCOUNT_ERR) {
+				count = 0; /* hotdrop */
+			} else if (ret == NF_CONNCOUNT_ADDED) {
+				count = rbconn->list.count;
+			} else {
+				/* NF_CONNCOUNT_SKIP, rbconn is already
+				 * reclaimed by gc, insert a new tree node
+				 */
+				node_found = false;
+			}
+			break;
 		}
 
-		if (no_gc || gc_count >= ARRAY_SIZE(gc_nodes))
+		if (gc_count >= ARRAY_SIZE(gc_nodes))
 			continue;
 
-		/* only used for GC on hhead, retval and 'addit' ignored */
-		nf_conncount_lookup(net, &rbconn->hhead, tuple, zone, &addit);
-		if (hlist_empty(&rbconn->hhead))
+		if (nf_conncount_gc_list(net, &rbconn->list))
 			gc_nodes[gc_count++] = rbconn;
 	}
 
 	if (gc_count) {
-		no_gc = true;
 		tree_nodes_free(root, gc_nodes, gc_count);
 		/* tree_node_free before new allocation permits
 		 * allocator to re-use newly free'd object.
@@ -266,58 +388,146 @@ count_tree(struct net *net, struct rb_root *root,
 		 * This is a rare event; in most cases we will find
 		 * existing node to re-use. (or gc_count is 0).
 		 */
-		goto restart;
+
+		if (gc_count >= ARRAY_SIZE(gc_nodes))
+			schedule_gc_worker(data, hash);
 	}
 
-	if (!tuple)
-		return 0;
+	if (node_found)
+		goto out_unlock;
 
-	/* no match, need to insert new node */
+	/* expected case: match, insert new node */
 	rbconn = kmem_cache_alloc(conncount_rb_cachep, GFP_ATOMIC);
 	if (rbconn == NULL)
-		return 0;
+		goto out_unlock;
 
 	conn = kmem_cache_alloc(conncount_conn_cachep, GFP_ATOMIC);
 	if (conn == NULL) {
 		kmem_cache_free(conncount_rb_cachep, rbconn);
-		return 0;
+		goto out_unlock;
 	}
 
 	conn->tuple = *tuple;
 	conn->zone = *zone;
 	memcpy(rbconn->key, key, sizeof(u32) * keylen);
 
-	INIT_HLIST_HEAD(&rbconn->hhead);
-	hlist_add_head(&conn->node, &rbconn->hhead);
+	nf_conncount_list_init(&rbconn->list);
+	list_add(&conn->node, &rbconn->list.head);
+	count = 1;
 
 	rb_link_node(&rbconn->node, parent, rbnode);
 	rb_insert_color(&rbconn->node, root);
-	return 1;
+out_unlock:
+	spin_unlock_bh(&nf_conncount_locks[hash % CONNCOUNT_LOCK_SLOTS]);
+	return count;
 }
 
-/* Count and return number of conntrack entries in 'net' with particular 'key'.
- * If 'tuple' is not null, insert it into the accounting data structure.
- */
-unsigned int nf_conncount_count(struct net *net,
-				struct nf_conncount_data *data,
-				const u32 *key,
-				const struct nf_conntrack_tuple *tuple,
-				const struct nf_conntrack_zone *zone)
+static unsigned int
+count_tree(struct net *net,
+	   struct nf_conncount_data *data,
+	   const u32 *key,
+	   const struct nf_conntrack_tuple *tuple,
+	   const struct nf_conntrack_zone *zone)
 {
+	enum nf_conncount_list_add ret;
 	struct rb_root *root;
-	int count;
-	u32 hash;
+	struct rb_node *parent;
+	struct nf_conncount_rb *rbconn;
+	unsigned int hash;
+	u8 keylen = data->keylen;
 
 	hash = jhash2(key, data->keylen, conncount_rnd) % CONNCOUNT_SLOTS;
 	root = &data->root[hash];
 
-	spin_lock_bh(&nf_conncount_locks[hash % CONNCOUNT_LOCK_SLOTS]);
+	parent = rcu_dereference_raw(root->rb_node);
+	while (parent) {
+		int diff;
+		bool addit;
 
-	count = count_tree(net, root, key, data->keylen, tuple, zone);
+		rbconn = rb_entry(parent, struct nf_conncount_rb, node);
 
-	spin_unlock_bh(&nf_conncount_locks[hash % CONNCOUNT_LOCK_SLOTS]);
+		diff = key_diff(key, rbconn->key, keylen);
+		if (diff < 0) {
+			parent = rcu_dereference_raw(parent->rb_left);
+		} else if (diff > 0) {
+			parent = rcu_dereference_raw(parent->rb_right);
+		} else {
+			/* same source network -> be counted! */
+			nf_conncount_lookup(net, &rbconn->list, tuple, zone,
+					    &addit);
 
-	return count;
+			if (!addit)
+				return rbconn->list.count;
+
+			ret = nf_conncount_add(&rbconn->list, tuple, zone);
+			if (ret == NF_CONNCOUNT_ERR) {
+				return 0; /* hotdrop */
+			} else if (ret == NF_CONNCOUNT_ADDED) {
+				return rbconn->list.count;
+			} else {
+				/* NF_CONNCOUNT_SKIP, rbconn is already
+				 * reclaimed by gc, insert a new tree node
+				 */
+				break;
+			}
+		}
+	}
+
+	if (!tuple)
+		return 0;
+
+	return insert_tree(net, data, root, hash, key, keylen, tuple, zone);
+}
+
+static void tree_gc_worker(struct work_struct *work)
+{
+	struct nf_conncount_data *data = container_of(work, struct nf_conncount_data, gc_work);
+	struct nf_conncount_rb *gc_nodes[CONNCOUNT_GC_MAX_NODES], *rbconn;
+	struct rb_root *root;
+	struct rb_node *node;
+	unsigned int tree, next_tree, gc_count = 0;
+
+	tree = data->gc_tree % CONNCOUNT_LOCK_SLOTS;
+	root = &data->root[tree];
+
+	rcu_read_lock();
+	for (node = rb_first(root); node != NULL; node = rb_next(node)) {
+		rbconn = rb_entry(node, struct nf_conncount_rb, node);
+		if (nf_conncount_gc_list(data->net, &rbconn->list))
+			gc_nodes[gc_count++] = rbconn;
+	}
+	rcu_read_unlock();
+
+	spin_lock_bh(&nf_conncount_locks[tree]);
+
+	if (gc_count) {
+		tree_nodes_free(root, gc_nodes, gc_count);
+	}
+
+	clear_bit(tree, data->pending_trees);
+
+	next_tree = (tree + 1) % CONNCOUNT_SLOTS;
+	next_tree = find_next_bit(data->pending_trees, next_tree, CONNCOUNT_SLOTS);
+
+	if (next_tree < CONNCOUNT_SLOTS) {
+		data->gc_tree = next_tree;
+		schedule_work(work);
+	}
+
+	spin_unlock_bh(&nf_conncount_locks[tree]);
+}
+
+/* Count and return number of conntrack entries in 'net' with particular 'key'.
+ * If 'tuple' is not null, insert it into the accounting data structure.
+ * Call with RCU read lock.
+ */
+unsigned int nf_conncount_count(struct net *net,
+				struct nf_conncount_data *data,
+				const u32 *key,
+				const struct nf_conntrack_tuple *tuple,
+				const struct nf_conntrack_zone *zone)
+{
+	return count_tree(net, data, key, tuple, zone);
 }
 EXPORT_SYMBOL_GPL(nf_conncount_count);
 
@@ -348,17 +558,18 @@ struct nf_conncount_data *nf_conncount_init(struct net *net, unsigned int family
 		data->root[i] = RB_ROOT;
 
 	data->keylen = keylen / sizeof(u32);
+	data->net = net;
+	INIT_WORK(&data->gc_work, tree_gc_worker);
 
 	return data;
 }
 EXPORT_SYMBOL_GPL(nf_conncount_init);
 
-void nf_conncount_cache_free(struct hlist_head *hhead)
+void nf_conncount_cache_free(struct nf_conncount_list *list)
 {
-	struct nf_conncount_tuple *conn;
-	struct hlist_node *n;
+	struct nf_conncount_tuple *conn, *conn_n;
 
-	hlist_for_each_entry_safe(conn, n, hhead, node)
+	list_for_each_entry_safe(conn, conn_n, &list->head, node)
 		kmem_cache_free(conncount_conn_cachep, conn);
 }
 EXPORT_SYMBOL_GPL(nf_conncount_cache_free);
@@ -373,7 +584,7 @@ static void destroy_tree(struct rb_root *r)
 
 		rb_erase(node, r);
 
-		nf_conncount_cache_free(&rbconn->hhead);
+		nf_conncount_cache_free(&rbconn->list);
 
 		kmem_cache_free(conncount_rb_cachep, rbconn);
 	}
@@ -384,6 +595,7 @@ void nf_conncount_destroy(struct net *net, unsigned int family,
 {
 	unsigned int i;
 
+	cancel_work_sync(&data->gc_work);
 	nf_ct_netns_put(net, family);
 
 	for (i = 0; i < ARRAY_SIZE(data->root); ++i)
diff --git a/net/netfilter/nf_conntrack_broadcast.c b/net/netfilter/nf_conntrack_broadcast.c
index a1086bdec242..5423b197d98a 100644
--- a/net/netfilter/nf_conntrack_broadcast.c
+++ b/net/netfilter/nf_conntrack_broadcast.c
@@ -32,7 +32,7 @@ int nf_conntrack_broadcast_help(struct sk_buff *skb,
 	__be32 mask = 0;
 
 	/* we're only interested in locally generated packets */
-	if (skb->sk == NULL)
+	if (skb->sk == NULL || !net_eq(nf_ct_net(ct), sock_net(skb->sk)))
 		goto out;
 	if (rt == NULL || !(rt->rt_flags & RTCF_BROADCAST))
 		goto out;
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 3d5280425027..a676d5f76bdc 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -37,7 +37,6 @@
 #include <linux/rculist_nulls.h>
 
 #include <net/netfilter/nf_conntrack.h>
-#include <net/netfilter/nf_conntrack_l3proto.h>
 #include <net/netfilter/nf_conntrack_l4proto.h>
 #include <net/netfilter/nf_conntrack_expect.h>
 #include <net/netfilter/nf_conntrack_helper.h>
@@ -55,6 +54,7 @@
 #include <net/netfilter/nf_nat_core.h>
 #include <net/netfilter/nf_nat_helper.h>
 #include <net/netns/hash.h>
+#include <net/ip.h>
 
 #include "nf_internals.h"
 
@@ -222,7 +222,7 @@ static u32 hash_conntrack(const struct net *net,
 	return scale_hash(hash_conntrack_raw(tuple, net));
 }
 
-bool
+static bool
 nf_ct_get_tuple(const struct sk_buff *skb,
 		unsigned int nhoff,
 		unsigned int dataoff,
@@ -230,37 +230,151 @@ nf_ct_get_tuple(const struct sk_buff *skb,
 		u_int8_t protonum,
 		struct net *net,
 		struct nf_conntrack_tuple *tuple,
-		const struct nf_conntrack_l3proto *l3proto,
 		const struct nf_conntrack_l4proto *l4proto)
 {
+	unsigned int size;
+	const __be32 *ap;
+	__be32 _addrs[8];
+	struct {
+		__be16 sport;
+		__be16 dport;
+	} _inet_hdr, *inet_hdr;
+
 	memset(tuple, 0, sizeof(*tuple));
 
 	tuple->src.l3num = l3num;
-	if (l3proto->pkt_to_tuple(skb, nhoff, tuple) == 0)
+	switch (l3num) {
+	case NFPROTO_IPV4:
+		nhoff += offsetof(struct iphdr, saddr);
+		size = 2 * sizeof(__be32);
+		break;
+	case NFPROTO_IPV6:
+		nhoff += offsetof(struct ipv6hdr, saddr);
+		size = sizeof(_addrs);
+		break;
+	default:
+		return true;
+	}
+
+	ap = skb_header_pointer(skb, nhoff, size, _addrs);
+	if (!ap)
 		return false;
 
+	switch (l3num) {
+	case NFPROTO_IPV4:
+		tuple->src.u3.ip = ap[0];
+		tuple->dst.u3.ip = ap[1];
+		break;
+	case NFPROTO_IPV6:
+		memcpy(tuple->src.u3.ip6, ap, sizeof(tuple->src.u3.ip6));
+		memcpy(tuple->dst.u3.ip6, ap + 4, sizeof(tuple->dst.u3.ip6));
+		break;
+	}
+
 	tuple->dst.protonum = protonum;
 	tuple->dst.dir = IP_CT_DIR_ORIGINAL;
 
-	return l4proto->pkt_to_tuple(skb, dataoff, net, tuple);
+	if (unlikely(l4proto->pkt_to_tuple))
+		return l4proto->pkt_to_tuple(skb, dataoff, net, tuple);
+
+	/* Actually only need first 4 bytes to get ports. */
+	inet_hdr = skb_header_pointer(skb, dataoff, sizeof(_inet_hdr), &_inet_hdr);
+	if (!inet_hdr)
+		return false;
+
+	tuple->src.u.udp.port = inet_hdr->sport;
+	tuple->dst.u.udp.port = inet_hdr->dport;
+	return true;
+}
+
+static int ipv4_get_l4proto(const struct sk_buff *skb, unsigned int nhoff,
+			    u_int8_t *protonum)
+{
+	int dataoff = -1;
+	const struct iphdr *iph;
+	struct iphdr _iph;
+
+	iph = skb_header_pointer(skb, nhoff, sizeof(_iph), &_iph);
+	if (!iph)
+		return -1;
+
+	/* Conntrack defragments packets, we might still see fragments
+	 * inside ICMP packets though.
+	 */
+	if (iph->frag_off & htons(IP_OFFSET))
+		return -1;
+
+	dataoff = nhoff + (iph->ihl << 2);
+	*protonum = iph->protocol;
+
+	/* Check bogus IP headers */
+	if (dataoff > skb->len) {
+		pr_debug("bogus IPv4 packet: nhoff %u, ihl %u, skblen %u\n",
+			 nhoff, iph->ihl << 2, skb->len);
+		return -1;
+	}
+	return dataoff;
+}
+
+#if IS_ENABLED(CONFIG_IPV6)
+static int ipv6_get_l4proto(const struct sk_buff *skb, unsigned int nhoff,
+			    u8 *protonum)
+{
+	int protoff = -1;
+	unsigned int extoff = nhoff + sizeof(struct ipv6hdr);
+	__be16 frag_off;
+	u8 nexthdr;
+
+	if (skb_copy_bits(skb, nhoff + offsetof(struct ipv6hdr, nexthdr),
+			  &nexthdr, sizeof(nexthdr)) != 0) {
+		pr_debug("can't get nexthdr\n");
+		return -1;
+	}
+	protoff = ipv6_skip_exthdr(skb, extoff, &nexthdr, &frag_off);
+	/*
+	 * (protoff == skb->len) means the packet has not data, just
+	 * IPv6 and possibly extensions headers, but it is tracked anyway
+	 */
+	if (protoff < 0 || (frag_off & htons(~0x7)) != 0) {
+		pr_debug("can't find proto in pkt\n");
+		return -1;
+	}
+
+	*protonum = nexthdr;
+	return protoff;
+}
+#endif
+
+static int get_l4proto(const struct sk_buff *skb,
+		       unsigned int nhoff, u8 pf, u8 *l4num)
+{
+	switch (pf) {
+	case NFPROTO_IPV4:
+		return ipv4_get_l4proto(skb, nhoff, l4num);
+#if IS_ENABLED(CONFIG_IPV6)
+	case NFPROTO_IPV6:
+		return ipv6_get_l4proto(skb, nhoff, l4num);
+#endif
+	default:
+		*l4num = 0;
+		break;
+	}
+	return -1;
 }
-EXPORT_SYMBOL_GPL(nf_ct_get_tuple);
 
 bool nf_ct_get_tuplepr(const struct sk_buff *skb, unsigned int nhoff,
 		       u_int16_t l3num,
 		       struct net *net, struct nf_conntrack_tuple *tuple)
 {
-	const struct nf_conntrack_l3proto *l3proto;
 	const struct nf_conntrack_l4proto *l4proto;
-	unsigned int protoff;
-	u_int8_t protonum;
+	u8 protonum;
+	int protoff;
 	int ret;
 
 	rcu_read_lock();
 
-	l3proto = __nf_ct_l3proto_find(l3num);
-	ret = l3proto->get_l4proto(skb, nhoff, &protoff, &protonum);
-	if (ret != NF_ACCEPT) {
+	protoff = get_l4proto(skb, nhoff, l3num, &protonum);
+	if (protoff <= 0) {
 		rcu_read_unlock();
 		return false;
 	}
@@ -268,7 +382,7 @@ bool nf_ct_get_tuplepr(const struct sk_buff *skb, unsigned int nhoff,
 	l4proto = __nf_ct_l4proto_find(l3num, protonum);
 
 	ret = nf_ct_get_tuple(skb, nhoff, protoff, l3num, protonum, net, tuple,
-			      l3proto, l4proto);
+			      l4proto);
 
 	rcu_read_unlock();
 	return ret;
@@ -278,19 +392,35 @@ EXPORT_SYMBOL_GPL(nf_ct_get_tuplepr);
 bool
 nf_ct_invert_tuple(struct nf_conntrack_tuple *inverse,
 		   const struct nf_conntrack_tuple *orig,
-		   const struct nf_conntrack_l3proto *l3proto,
 		   const struct nf_conntrack_l4proto *l4proto)
 {
 	memset(inverse, 0, sizeof(*inverse));
 
 	inverse->src.l3num = orig->src.l3num;
-	if (l3proto->invert_tuple(inverse, orig) == 0)
-		return false;
+
+	switch (orig->src.l3num) {
+	case NFPROTO_IPV4:
+		inverse->src.u3.ip = orig->dst.u3.ip;
+		inverse->dst.u3.ip = orig->src.u3.ip;
+		break;
+	case NFPROTO_IPV6:
+		inverse->src.u3.in6 = orig->dst.u3.in6;
+		inverse->dst.u3.in6 = orig->src.u3.in6;
+		break;
+	default:
+		break;
+	}
 
 	inverse->dst.dir = !orig->dst.dir;
 
 	inverse->dst.protonum = orig->dst.protonum;
-	return l4proto->invert_tuple(inverse, orig);
+
+	if (unlikely(l4proto->invert_tuple))
+		return l4proto->invert_tuple(inverse, orig);
+
+	inverse->src.u.all = orig->dst.u.all;
+	inverse->dst.u.all = orig->src.u.all;
+	return true;
 }
 EXPORT_SYMBOL_GPL(nf_ct_invert_tuple);
 
@@ -502,6 +632,18 @@ nf_ct_key_equal(struct nf_conntrack_tuple_hash *h,
 	       net_eq(net, nf_ct_net(ct));
 }
 
+static inline bool
+nf_ct_match(const struct nf_conn *ct1, const struct nf_conn *ct2)
+{
+	return nf_ct_tuple_equal(&ct1->tuplehash[IP_CT_DIR_ORIGINAL].tuple,
+				 &ct2->tuplehash[IP_CT_DIR_ORIGINAL].tuple) &&
+	       nf_ct_tuple_equal(&ct1->tuplehash[IP_CT_DIR_REPLY].tuple,
+				 &ct2->tuplehash[IP_CT_DIR_REPLY].tuple) &&
+	       nf_ct_zone_equal(ct1, nf_ct_zone(ct2), IP_CT_DIR_ORIGINAL) &&
+	       nf_ct_zone_equal(ct1, nf_ct_zone(ct2), IP_CT_DIR_REPLY) &&
+	       net_eq(nf_ct_net(ct1), nf_ct_net(ct2));
+}
+
 /* caller must hold rcu readlock and none of the nf_conntrack_locks */
 static void nf_ct_gc_expired(struct nf_conn *ct)
 {
@@ -695,19 +837,21 @@ static int nf_ct_resolve_clash(struct net *net, struct sk_buff *skb,
 	/* This is the conntrack entry already in hashes that won race. */
 	struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h);
 	const struct nf_conntrack_l4proto *l4proto;
+	enum ip_conntrack_info oldinfo;
+	struct nf_conn *loser_ct = nf_ct_get(skb, &oldinfo);
 
 	l4proto = __nf_ct_l4proto_find(nf_ct_l3num(ct), nf_ct_protonum(ct));
 	if (l4proto->allow_clash &&
-	    ((ct->status & IPS_NAT_DONE_MASK) == 0) &&
 	    !nf_ct_is_dying(ct) &&
 	    atomic_inc_not_zero(&ct->ct_general.use)) {
-		enum ip_conntrack_info oldinfo;
-		struct nf_conn *loser_ct = nf_ct_get(skb, &oldinfo);
-
-		nf_ct_acct_merge(ct, ctinfo, loser_ct);
-		nf_conntrack_put(&loser_ct->ct_general);
-		nf_ct_set(skb, ct, oldinfo);
-		return NF_ACCEPT;
+		if (((ct->status & IPS_NAT_DONE_MASK) == 0) ||
+		    nf_ct_match(ct, loser_ct)) {
+			nf_ct_acct_merge(ct, ctinfo, loser_ct);
+			nf_conntrack_put(&loser_ct->ct_general);
+			nf_ct_set(skb, ct, oldinfo);
+			return NF_ACCEPT;
+		}
+		nf_ct_put(ct);
 	}
 	NF_CT_STAT_INC(net, drop);
 	return NF_DROP;
@@ -1195,7 +1339,6 @@ EXPORT_SYMBOL_GPL(nf_conntrack_free);
 static noinline struct nf_conntrack_tuple_hash *
 init_conntrack(struct net *net, struct nf_conn *tmpl,
 	       const struct nf_conntrack_tuple *tuple,
-	       const struct nf_conntrack_l3proto *l3proto,
 	       const struct nf_conntrack_l4proto *l4proto,
 	       struct sk_buff *skb,
 	       unsigned int dataoff, u32 hash)
@@ -1208,9 +1351,8 @@ init_conntrack(struct net *net, struct nf_conn *tmpl,
 	const struct nf_conntrack_zone *zone;
 	struct nf_conn_timeout *timeout_ext;
 	struct nf_conntrack_zone tmp;
-	unsigned int *timeouts;
 
-	if (!nf_ct_invert_tuple(&repl_tuple, tuple, l3proto, l4proto)) {
+	if (!nf_ct_invert_tuple(&repl_tuple, tuple, l4proto)) {
 		pr_debug("Can't invert tuple.\n");
 		return NULL;
 	}
@@ -1227,15 +1369,8 @@ init_conntrack(struct net *net, struct nf_conn *tmpl,
 	}
 
 	timeout_ext = tmpl ? nf_ct_timeout_find(tmpl) : NULL;
-	if (timeout_ext) {
-		timeouts = nf_ct_timeout_data(timeout_ext);
-		if (unlikely(!timeouts))
-			timeouts = l4proto->get_timeouts(net);
-	} else {
-		timeouts = l4proto->get_timeouts(net);
-	}
 
-	if (!l4proto->new(ct, skb, dataoff, timeouts)) {
+	if (!l4proto->new(ct, skb, dataoff)) {
 		nf_conntrack_free(ct);
 		pr_debug("can't track with proto module\n");
 		return NULL;
@@ -1266,8 +1401,7 @@ init_conntrack(struct net *net, struct nf_conn *tmpl,
 			/* exp->master safe, refcnt bumped in nf_ct_find_expectation */
 			ct->master = exp->master;
 			if (exp->helper) {
-				help = nf_ct_helper_ext_add(ct, exp->helper,
-							    GFP_ATOMIC);
+				help = nf_ct_helper_ext_add(ct, GFP_ATOMIC);
 				if (help)
 					rcu_assign_pointer(help->helper, exp->helper);
 			}
@@ -1307,7 +1441,6 @@ resolve_normal_ct(struct net *net, struct nf_conn *tmpl,
 		  unsigned int dataoff,
 		  u_int16_t l3num,
 		  u_int8_t protonum,
-		  const struct nf_conntrack_l3proto *l3proto,
 		  const struct nf_conntrack_l4proto *l4proto)
 {
 	const struct nf_conntrack_zone *zone;
@@ -1319,8 +1452,7 @@ resolve_normal_ct(struct net *net, struct nf_conn *tmpl,
 	u32 hash;
 
 	if (!nf_ct_get_tuple(skb, skb_network_offset(skb),
-			     dataoff, l3num, protonum, net, &tuple, l3proto,
-			     l4proto)) {
+			     dataoff, l3num, protonum, net, &tuple, l4proto)) {
 		pr_debug("Can't get tuple\n");
 		return 0;
 	}
@@ -1330,7 +1462,7 @@ resolve_normal_ct(struct net *net, struct nf_conn *tmpl,
 	hash = hash_conntrack_raw(&tuple, net);
 	h = __nf_conntrack_find_get(net, zone, &tuple, hash);
 	if (!h) {
-		h = init_conntrack(net, tmpl, &tuple, l3proto, l4proto,
+		h = init_conntrack(net, tmpl, &tuple, l4proto,
 				   skb, dataoff, hash);
 		if (!h)
 			return 0;
@@ -1363,14 +1495,11 @@ unsigned int
 nf_conntrack_in(struct net *net, u_int8_t pf, unsigned int hooknum,
 		struct sk_buff *skb)
 {
-	const struct nf_conntrack_l3proto *l3proto;
 	const struct nf_conntrack_l4proto *l4proto;
 	struct nf_conn *ct, *tmpl;
 	enum ip_conntrack_info ctinfo;
-	unsigned int *timeouts;
-	unsigned int dataoff;
 	u_int8_t protonum;
-	int ret;
+	int dataoff, ret;
 
 	tmpl = nf_ct_get(skb, &ctinfo);
 	if (tmpl || ctinfo == IP_CT_UNTRACKED) {
@@ -1384,14 +1513,12 @@ nf_conntrack_in(struct net *net, u_int8_t pf, unsigned int hooknum,
 	}
 
 	/* rcu_read_lock()ed by nf_hook_thresh */
-	l3proto = __nf_ct_l3proto_find(pf);
-	ret = l3proto->get_l4proto(skb, skb_network_offset(skb),
-				   &dataoff, &protonum);
-	if (ret <= 0) {
+	dataoff = get_l4proto(skb, skb_network_offset(skb), pf, &protonum);
+	if (dataoff <= 0) {
 		pr_debug("not prepared to track yet or error occurred\n");
 		NF_CT_STAT_INC_ATOMIC(net, error);
 		NF_CT_STAT_INC_ATOMIC(net, invalid);
-		ret = -ret;
+		ret = NF_ACCEPT;
 		goto out;
 	}
 
@@ -1413,8 +1540,7 @@ nf_conntrack_in(struct net *net, u_int8_t pf, unsigned int hooknum,
 		goto out;
 	}
 repeat:
-	ret = resolve_normal_ct(net, tmpl, skb, dataoff, pf, protonum,
-				l3proto, l4proto);
+	ret = resolve_normal_ct(net, tmpl, skb, dataoff, pf, protonum, l4proto);
 	if (ret < 0) {
 		/* Too stressed to deal. */
 		NF_CT_STAT_INC_ATOMIC(net, drop);
@@ -1430,10 +1556,7 @@ repeat:
 		goto out;
 	}
 
-	/* Decide what timeout policy we want to apply to this flow. */
-	timeouts = nf_ct_timeout_lookup(net, ct, l4proto);
-
-	ret = l4proto->packet(ct, skb, dataoff, ctinfo, timeouts);
+	ret = l4proto->packet(ct, skb, dataoff, ctinfo);
 	if (ret <= 0) {
 		/* Invalid: inverse of the return code tells
 		 * the netfilter core what to do */
@@ -1471,7 +1594,6 @@ bool nf_ct_invert_tuplepr(struct nf_conntrack_tuple *inverse,
 
 	rcu_read_lock();
 	ret = nf_ct_invert_tuple(inverse, orig,
-				 __nf_ct_l3proto_find(orig->src.l3num),
 				 __nf_ct_l4proto_find(orig->src.l3num,
 						      orig->dst.protonum));
 	rcu_read_unlock();
@@ -1609,14 +1731,14 @@ static void nf_conntrack_attach(struct sk_buff *nskb, const struct sk_buff *skb)
 
 static int nf_conntrack_update(struct net *net, struct sk_buff *skb)
 {
-	const struct nf_conntrack_l3proto *l3proto;
 	const struct nf_conntrack_l4proto *l4proto;
 	struct nf_conntrack_tuple_hash *h;
 	struct nf_conntrack_tuple tuple;
 	enum ip_conntrack_info ctinfo;
 	struct nf_nat_hook *nat_hook;
-	unsigned int dataoff, status;
+	unsigned int status;
 	struct nf_conn *ct;
+	int dataoff;
 	u16 l3num;
 	u8 l4num;
 
@@ -1625,16 +1747,15 @@ static int nf_conntrack_update(struct net *net, struct sk_buff *skb)
 		return 0;
 
 	l3num = nf_ct_l3num(ct);
-	l3proto = nf_ct_l3proto_find_get(l3num);
 
-	if (l3proto->get_l4proto(skb, skb_network_offset(skb), &dataoff,
-				 &l4num) <= 0)
+	dataoff = get_l4proto(skb, skb_network_offset(skb), l3num, &l4num);
+	if (dataoff <= 0)
 		return -1;
 
 	l4proto = nf_ct_l4proto_find_get(l3num, l4num);
 
 	if (!nf_ct_get_tuple(skb, skb_network_offset(skb), dataoff, l3num,
-			     l4num, net, &tuple, l3proto, l4proto))
+			     l4num, net, &tuple, l4proto))
 		return -1;
 
 	if (ct->status & IPS_SRC_NAT) {
@@ -1683,6 +1804,41 @@ static int nf_conntrack_update(struct net *net, struct sk_buff *skb)
 	return 0;
 }
 
+static bool nf_conntrack_get_tuple_skb(struct nf_conntrack_tuple *dst_tuple,
+				       const struct sk_buff *skb)
+{
+	const struct nf_conntrack_tuple *src_tuple;
+	const struct nf_conntrack_tuple_hash *hash;
+	struct nf_conntrack_tuple srctuple;
+	enum ip_conntrack_info ctinfo;
+	struct nf_conn *ct;
+
+	ct = nf_ct_get(skb, &ctinfo);
+	if (ct) {
+		src_tuple = nf_ct_tuple(ct, CTINFO2DIR(ctinfo));
+		memcpy(dst_tuple, src_tuple, sizeof(*dst_tuple));
+		return true;
+	}
+
+	if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb),
+			       NFPROTO_IPV4, dev_net(skb->dev),
+			       &srctuple))
+		return false;
+
+	hash = nf_conntrack_find_get(dev_net(skb->dev),
+				     &nf_ct_zone_dflt,
+				     &srctuple);
+	if (!hash)
+		return false;
+
+	ct = nf_ct_tuplehash_to_ctrack(hash);
+	src_tuple = nf_ct_tuple(ct, !hash->tuple.dst.dir);
+	memcpy(dst_tuple, src_tuple, sizeof(*dst_tuple));
+	nf_ct_put(ct);
+
+	return true;
+}
+
 /* Bring out ya dead! */
 static struct nf_conn *
 get_next_corpse(int (*iter)(struct nf_conn *i, void *data),
@@ -1866,16 +2022,6 @@ static int kill_all(struct nf_conn *i, void *data)
 	return net_eq(nf_ct_net(i), data);
 }
 
-void nf_ct_free_hashtable(void *hash, unsigned int size)
-{
-	if (is_vmalloc_addr(hash))
-		vfree(hash);
-	else
-		free_pages((unsigned long)hash,
-			   get_order(sizeof(struct hlist_head) * size));
-}
-EXPORT_SYMBOL_GPL(nf_ct_free_hashtable);
-
 void nf_conntrack_cleanup_start(void)
 {
 	conntrack_gc_work.exiting = true;
@@ -1886,7 +2032,7 @@ void nf_conntrack_cleanup_end(void)
 {
 	RCU_INIT_POINTER(nf_ct_hook, NULL);
 	cancel_delayed_work_sync(&conntrack_gc_work.dwork);
-	nf_ct_free_hashtable(nf_conntrack_hash, nf_conntrack_htable_size);
+	kvfree(nf_conntrack_hash);
 
 	nf_conntrack_proto_fini();
 	nf_conntrack_seqadj_fini();
@@ -1952,7 +2098,6 @@ void *nf_ct_alloc_hashtable(unsigned int *sizep, int nulls)
 {
 	struct hlist_nulls_head *hash;
 	unsigned int nr_slots, i;
-	size_t sz;
 
 	if (*sizep > (UINT_MAX / sizeof(struct hlist_nulls_head)))
 		return NULL;
@@ -1960,14 +2105,8 @@ void *nf_ct_alloc_hashtable(unsigned int *sizep, int nulls)
 	BUILD_BUG_ON(sizeof(struct hlist_nulls_head) != sizeof(struct hlist_head));
 	nr_slots = *sizep = roundup(*sizep, PAGE_SIZE / sizeof(struct hlist_nulls_head));
 
-	if (nr_slots > (UINT_MAX / sizeof(struct hlist_nulls_head)))
-		return NULL;
-
-	sz = nr_slots * sizeof(struct hlist_nulls_head);
-	hash = (void *)__get_free_pages(GFP_KERNEL | __GFP_NOWARN | __GFP_ZERO,
-					get_order(sz));
-	if (!hash)
-		hash = vzalloc(sz);
+	hash = kvmalloc_array(nr_slots, sizeof(struct hlist_nulls_head),
+			      GFP_KERNEL | __GFP_ZERO);
 
 	if (hash && nulls)
 		for (i = 0; i < nr_slots; i++)
@@ -1994,7 +2133,7 @@ int nf_conntrack_hash_resize(unsigned int hashsize)
 
 	old_size = nf_conntrack_htable_size;
 	if (old_size == hashsize) {
-		nf_ct_free_hashtable(hash, hashsize);
+		kvfree(hash);
 		return 0;
 	}
 
@@ -2030,7 +2169,7 @@ int nf_conntrack_hash_resize(unsigned int hashsize)
 	local_bh_enable();
 
 	synchronize_net();
-	nf_ct_free_hashtable(old_hash, old_size);
+	kvfree(old_hash);
 	return 0;
 }
 
@@ -2054,9 +2193,6 @@ int nf_conntrack_set_hashsize(const char *val, const struct kernel_param *kp)
 }
 EXPORT_SYMBOL_GPL(nf_conntrack_set_hashsize);
 
-module_param_call(hashsize, nf_conntrack_set_hashsize, param_get_uint,
-		  &nf_conntrack_htable_size, 0600);
-
 static __always_inline unsigned int total_extension_size(void)
 {
 	/* remember to add new extensions below */
@@ -2197,13 +2333,14 @@ err_acct:
 err_expect:
 	kmem_cache_destroy(nf_conntrack_cachep);
 err_cachep:
-	nf_ct_free_hashtable(nf_conntrack_hash, nf_conntrack_htable_size);
+	kvfree(nf_conntrack_hash);
 	return ret;
 }
 
 static struct nf_ct_hook nf_conntrack_hook = {
 	.update = nf_conntrack_update,
 	.destroy = destroy_conntrack,
+	.get_tuple_skb = nf_conntrack_get_tuple_skb,
 };
 
 void nf_conntrack_init_end(void)
diff --git a/net/netfilter/nf_conntrack_expect.c b/net/netfilter/nf_conntrack_expect.c
index 853b23206bb7..27b84231db10 100644
--- a/net/netfilter/nf_conntrack_expect.c
+++ b/net/netfilter/nf_conntrack_expect.c
@@ -610,7 +610,6 @@ static int exp_seq_show(struct seq_file *s, void *v)
 		   expect->tuple.src.l3num,
 		   expect->tuple.dst.protonum);
 	print_tuple(s, &expect->tuple,
-		    __nf_ct_l3proto_find(expect->tuple.src.l3num),
 		    __nf_ct_l4proto_find(expect->tuple.src.l3num,
 				       expect->tuple.dst.protonum));
 
@@ -713,5 +712,5 @@ void nf_conntrack_expect_fini(void)
 {
 	rcu_barrier(); /* Wait for call_rcu() before destroy */
 	kmem_cache_destroy(nf_ct_expect_cachep);
-	nf_ct_free_hashtable(nf_ct_expect_hash, nf_ct_expect_hsize);
+	kvfree(nf_ct_expect_hash);
 }
diff --git a/net/netfilter/nf_conntrack_helper.c b/net/netfilter/nf_conntrack_helper.c
index a75b11c39312..e24b762ffa1d 100644
--- a/net/netfilter/nf_conntrack_helper.c
+++ b/net/netfilter/nf_conntrack_helper.c
@@ -24,7 +24,6 @@
 #include <linux/rtnetlink.h>
 
 #include <net/netfilter/nf_conntrack.h>
-#include <net/netfilter/nf_conntrack_l3proto.h>
 #include <net/netfilter/nf_conntrack_l4proto.h>
 #include <net/netfilter/nf_conntrack_helper.h>
 #include <net/netfilter/nf_conntrack_core.h>
@@ -193,8 +192,7 @@ void nf_conntrack_helper_put(struct nf_conntrack_helper *helper)
 EXPORT_SYMBOL_GPL(nf_conntrack_helper_put);
 
 struct nf_conn_help *
-nf_ct_helper_ext_add(struct nf_conn *ct,
-		     struct nf_conntrack_helper *helper, gfp_t gfp)
+nf_ct_helper_ext_add(struct nf_conn *ct, gfp_t gfp)
 {
 	struct nf_conn_help *help;
 
@@ -263,7 +261,7 @@ int __nf_ct_try_assign_helper(struct nf_conn *ct, struct nf_conn *tmpl,
 	}
 
 	if (help == NULL) {
-		help = nf_ct_helper_ext_add(ct, helper, flags);
+		help = nf_ct_helper_ext_add(ct, flags);
 		if (help == NULL)
 			return -ENOMEM;
 	} else {
@@ -564,12 +562,12 @@ int nf_conntrack_helper_init(void)
 
 	return 0;
 out_extend:
-	nf_ct_free_hashtable(nf_ct_helper_hash, nf_ct_helper_hsize);
+	kvfree(nf_ct_helper_hash);
 	return ret;
 }
 
 void nf_conntrack_helper_fini(void)
 {
 	nf_ct_extend_unregister(&helper_extend);
-	nf_ct_free_hashtable(nf_ct_helper_hash, nf_ct_helper_hsize);
+	kvfree(nf_ct_helper_hash);
 }
diff --git a/net/netfilter/nf_conntrack_l3proto_generic.c b/net/netfilter/nf_conntrack_l3proto_generic.c
deleted file mode 100644
index 397e6911214f..000000000000
--- a/net/netfilter/nf_conntrack_l3proto_generic.c
+++ /dev/null
@@ -1,66 +0,0 @@
-/*
- * (C) 2003,2004 USAGI/WIDE Project <http://www.linux-ipv6.org>
- *
- * Based largely upon the original ip_conntrack code which
- * had the following copyright information:
- *
- * (C) 1999-2001 Paul `Rusty' Russell
- * (C) 2002-2004 Netfilter Core Team <coreteam@netfilter.org>
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- *
- * Author:
- * Yasuyuki Kozakai @USAGI <yasuyuki.kozakai@toshiba.co.jp>
- */
-
-#include <linux/types.h>
-#include <linux/ip.h>
-#include <linux/netfilter.h>
-#include <linux/module.h>
-#include <linux/skbuff.h>
-#include <linux/icmp.h>
-#include <linux/sysctl.h>
-#include <net/ip.h>
-
-#include <linux/netfilter_ipv4.h>
-#include <net/netfilter/nf_conntrack.h>
-#include <net/netfilter/nf_conntrack_l4proto.h>
-#include <net/netfilter/nf_conntrack_l3proto.h>
-#include <net/netfilter/nf_conntrack_core.h>
-#include <net/netfilter/ipv4/nf_conntrack_ipv4.h>
-
-static bool generic_pkt_to_tuple(const struct sk_buff *skb, unsigned int nhoff,
-				 struct nf_conntrack_tuple *tuple)
-{
-	memset(&tuple->src.u3, 0, sizeof(tuple->src.u3));
-	memset(&tuple->dst.u3, 0, sizeof(tuple->dst.u3));
-
-	return true;
-}
-
-static bool generic_invert_tuple(struct nf_conntrack_tuple *tuple,
-				 const struct nf_conntrack_tuple *orig)
-{
-	memset(&tuple->src.u3, 0, sizeof(tuple->src.u3));
-	memset(&tuple->dst.u3, 0, sizeof(tuple->dst.u3));
-
-	return true;
-}
-
-static int generic_get_l4proto(const struct sk_buff *skb, unsigned int nhoff,
-			       unsigned int *dataoff, u_int8_t *protonum)
-{
-	/* Never track !!! */
-	return -NF_ACCEPT;
-}
-
-
-struct nf_conntrack_l3proto nf_conntrack_l3proto_generic __read_mostly = {
-	.l3proto = PF_UNSPEC,
-	.pkt_to_tuple = generic_pkt_to_tuple,
-	.invert_tuple = generic_invert_tuple,
-	.get_l4proto = generic_get_l4proto,
-};
-EXPORT_SYMBOL_GPL(nf_conntrack_l3proto_generic);
diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index 20a2e37c76d1..f981bfa8db72 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -38,7 +38,6 @@
 #include <net/netfilter/nf_conntrack_expect.h>
 #include <net/netfilter/nf_conntrack_helper.h>
 #include <net/netfilter/nf_conntrack_seqadj.h>
-#include <net/netfilter/nf_conntrack_l3proto.h>
 #include <net/netfilter/nf_conntrack_l4proto.h>
 #include <net/netfilter/nf_conntrack_tuple.h>
 #include <net/netfilter/nf_conntrack_acct.h>
@@ -81,9 +80,26 @@ nla_put_failure:
 	return -1;
 }
 
+static int ipv4_tuple_to_nlattr(struct sk_buff *skb,
+				const struct nf_conntrack_tuple *tuple)
+{
+	if (nla_put_in_addr(skb, CTA_IP_V4_SRC, tuple->src.u3.ip) ||
+	    nla_put_in_addr(skb, CTA_IP_V4_DST, tuple->dst.u3.ip))
+		return -EMSGSIZE;
+	return 0;
+}
+
+static int ipv6_tuple_to_nlattr(struct sk_buff *skb,
+				const struct nf_conntrack_tuple *tuple)
+{
+	if (nla_put_in6_addr(skb, CTA_IP_V6_SRC, &tuple->src.u3.in6) ||
+	    nla_put_in6_addr(skb, CTA_IP_V6_DST, &tuple->dst.u3.in6))
+		return -EMSGSIZE;
+	return 0;
+}
+
 static int ctnetlink_dump_tuples_ip(struct sk_buff *skb,
-				    const struct nf_conntrack_tuple *tuple,
-				    const struct nf_conntrack_l3proto *l3proto)
+				    const struct nf_conntrack_tuple *tuple)
 {
 	int ret = 0;
 	struct nlattr *nest_parms;
@@ -92,8 +108,14 @@ static int ctnetlink_dump_tuples_ip(struct sk_buff *skb,
 	if (!nest_parms)
 		goto nla_put_failure;
 
-	if (likely(l3proto->tuple_to_nlattr))
-		ret = l3proto->tuple_to_nlattr(skb, tuple);
+	switch (tuple->src.l3num) {
+	case NFPROTO_IPV4:
+		ret = ipv4_tuple_to_nlattr(skb, tuple);
+		break;
+	case NFPROTO_IPV6:
+		ret = ipv6_tuple_to_nlattr(skb, tuple);
+		break;
+	}
 
 	nla_nest_end(skb, nest_parms);
 
@@ -106,13 +128,11 @@ nla_put_failure:
 static int ctnetlink_dump_tuples(struct sk_buff *skb,
 				 const struct nf_conntrack_tuple *tuple)
 {
-	const struct nf_conntrack_l3proto *l3proto;
 	const struct nf_conntrack_l4proto *l4proto;
 	int ret;
 
 	rcu_read_lock();
-	l3proto = __nf_ct_l3proto_find(tuple->src.l3num);
-	ret = ctnetlink_dump_tuples_ip(skb, tuple, l3proto);
+	ret = ctnetlink_dump_tuples_ip(skb, tuple);
 
 	if (ret >= 0) {
 		l4proto = __nf_ct_l4proto_find(tuple->src.l3num,
@@ -556,15 +576,20 @@ nla_put_failure:
 	return -1;
 }
 
+static const struct nla_policy cta_ip_nla_policy[CTA_IP_MAX + 1] = {
+	[CTA_IP_V4_SRC] = { .type = NLA_U32 },
+	[CTA_IP_V4_DST] = { .type = NLA_U32 },
+	[CTA_IP_V6_SRC] = { .len = sizeof(__be32) * 4 },
+	[CTA_IP_V6_DST] = { .len = sizeof(__be32) * 4 },
+};
+
 #if defined(CONFIG_NETFILTER_NETLINK_GLUE_CT) || defined(CONFIG_NF_CONNTRACK_EVENTS)
 static size_t ctnetlink_proto_size(const struct nf_conn *ct)
 {
-	const struct nf_conntrack_l3proto *l3proto;
 	const struct nf_conntrack_l4proto *l4proto;
 	size_t len, len4 = 0;
 
-	l3proto = __nf_ct_l3proto_find(nf_ct_l3num(ct));
-	len = l3proto->nla_size;
+	len = nla_policy_len(cta_ip_nla_policy, CTA_IP_MAX + 1);
 	len *= 3u; /* ORIG, REPLY, MASTER */
 
 	l4proto = __nf_ct_l4proto_find(nf_ct_l3num(ct), nf_ct_protonum(ct));
@@ -936,29 +961,54 @@ out:
 	return skb->len;
 }
 
+static int ipv4_nlattr_to_tuple(struct nlattr *tb[],
+				struct nf_conntrack_tuple *t)
+{
+	if (!tb[CTA_IP_V4_SRC] || !tb[CTA_IP_V4_DST])
+		return -EINVAL;
+
+	t->src.u3.ip = nla_get_in_addr(tb[CTA_IP_V4_SRC]);
+	t->dst.u3.ip = nla_get_in_addr(tb[CTA_IP_V4_DST]);
+
+	return 0;
+}
+
+static int ipv6_nlattr_to_tuple(struct nlattr *tb[],
+				struct nf_conntrack_tuple *t)
+{
+	if (!tb[CTA_IP_V6_SRC] || !tb[CTA_IP_V6_DST])
+		return -EINVAL;
+
+	t->src.u3.in6 = nla_get_in6_addr(tb[CTA_IP_V6_SRC]);
+	t->dst.u3.in6 = nla_get_in6_addr(tb[CTA_IP_V6_DST]);
+
+	return 0;
+}
+
 static int ctnetlink_parse_tuple_ip(struct nlattr *attr,
 				    struct nf_conntrack_tuple *tuple)
 {
 	struct nlattr *tb[CTA_IP_MAX+1];
-	struct nf_conntrack_l3proto *l3proto;
 	int ret = 0;
 
 	ret = nla_parse_nested(tb, CTA_IP_MAX, attr, NULL, NULL);
 	if (ret < 0)
 		return ret;
 
-	rcu_read_lock();
-	l3proto = __nf_ct_l3proto_find(tuple->src.l3num);
+	ret = nla_validate_nested(attr, CTA_IP_MAX,
+				  cta_ip_nla_policy, NULL);
+	if (ret)
+		return ret;
 
-	if (likely(l3proto->nlattr_to_tuple)) {
-		ret = nla_validate_nested(attr, CTA_IP_MAX,
-					  l3proto->nla_policy, NULL);
-		if (ret == 0)
-			ret = l3proto->nlattr_to_tuple(tb, tuple);
+	switch (tuple->src.l3num) {
+	case NFPROTO_IPV4:
+		ret = ipv4_nlattr_to_tuple(tb, tuple);
+		break;
+	case NFPROTO_IPV6:
+		ret = ipv6_nlattr_to_tuple(tb, tuple);
+		break;
 	}
 
-	rcu_read_unlock();
-
 	return ret;
 }
 
@@ -1897,7 +1947,7 @@ ctnetlink_create_conntrack(struct net *net,
 		} else {
 			struct nf_conn_help *help;
 
-			help = nf_ct_helper_ext_add(ct, helper, GFP_ATOMIC);
+			help = nf_ct_helper_ext_add(ct, GFP_ATOMIC);
 			if (help == NULL) {
 				err = -ENOMEM;
 				goto err2;
@@ -2581,7 +2631,6 @@ static int ctnetlink_exp_dump_mask(struct sk_buff *skb,
 				   const struct nf_conntrack_tuple *tuple,
 				   const struct nf_conntrack_tuple_mask *mask)
 {
-	const struct nf_conntrack_l3proto *l3proto;
 	const struct nf_conntrack_l4proto *l4proto;
 	struct nf_conntrack_tuple m;
 	struct nlattr *nest_parms;
@@ -2597,8 +2646,7 @@ static int ctnetlink_exp_dump_mask(struct sk_buff *skb,
 		goto nla_put_failure;
 
 	rcu_read_lock();
-	l3proto = __nf_ct_l3proto_find(tuple->src.l3num);
-	ret = ctnetlink_dump_tuples_ip(skb, &m, l3proto);
+	ret = ctnetlink_dump_tuples_ip(skb, &m);
 	if (ret >= 0) {
 		l4proto = __nf_ct_l4proto_find(tuple->src.l3num,
 					       tuple->dst.protonum);
diff --git a/net/netfilter/nf_conntrack_proto.c b/net/netfilter/nf_conntrack_proto.c
index d88841fbc560..30070732ee50 100644
--- a/net/netfilter/nf_conntrack_proto.c
+++ b/net/netfilter/nf_conntrack_proto.c
@@ -1,14 +1,4 @@
-/* L3/L4 protocol support for nf_conntrack. */
-
-/* (C) 1999-2001 Paul `Rusty' Russell
- * (C) 2002-2006 Netfilter Core Team <coreteam@netfilter.org>
- * (C) 2003,2004 USAGI/WIDE Project <http://www.linux-ipv6.org>
- * (C) 2006-2012 Patrick McHardy <kaber@trash.net>
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- */
+// SPDX-License-Identifier: GPL-2.0
 
 #include <linux/types.h>
 #include <linux/netfilter.h>
@@ -24,14 +14,36 @@
 #include <linux/netdevice.h>
 
 #include <net/netfilter/nf_conntrack.h>
-#include <net/netfilter/nf_conntrack_l3proto.h>
 #include <net/netfilter/nf_conntrack_l4proto.h>
 #include <net/netfilter/nf_conntrack_core.h>
 #include <net/netfilter/nf_log.h>
 
+#include <linux/ip.h>
+#include <linux/icmp.h>
+#include <linux/sysctl.h>
+#include <net/route.h>
+#include <net/ip.h>
+
+#include <linux/netfilter_ipv4.h>
+#include <linux/netfilter_ipv6.h>
+#include <linux/netfilter_ipv6/ip6_tables.h>
+#include <net/netfilter/nf_conntrack_helper.h>
+#include <net/netfilter/nf_conntrack_zones.h>
+#include <net/netfilter/nf_conntrack_seqadj.h>
+#include <net/netfilter/ipv4/nf_conntrack_ipv4.h>
+#include <net/netfilter/ipv6/nf_conntrack_ipv6.h>
+#include <net/netfilter/nf_nat_helper.h>
+#include <net/netfilter/ipv4/nf_defrag_ipv4.h>
+#include <net/netfilter/ipv6/nf_defrag_ipv6.h>
+
+#include <linux/ipv6.h>
+#include <linux/in6.h>
+#include <net/ipv6.h>
+#include <net/inet_frag.h>
+
+extern unsigned int nf_conntrack_net_id;
+
 static struct nf_conntrack_l4proto __rcu **nf_ct_protos[NFPROTO_NUMPROTO] __read_mostly;
-struct nf_conntrack_l3proto __rcu *nf_ct_l3protos[NFPROTO_NUMPROTO] __read_mostly;
-EXPORT_SYMBOL_GPL(nf_ct_l3protos);
 
 static DEFINE_MUTEX(nf_ct_proto_mutex);
 
@@ -122,137 +134,6 @@ __nf_ct_l4proto_find(u_int16_t l3proto, u_int8_t l4proto)
 }
 EXPORT_SYMBOL_GPL(__nf_ct_l4proto_find);
 
-/* this is guaranteed to always return a valid protocol helper, since
- * it falls back to generic_protocol */
-const struct nf_conntrack_l3proto *
-nf_ct_l3proto_find_get(u_int16_t l3proto)
-{
-	struct nf_conntrack_l3proto *p;
-
-	rcu_read_lock();
-	p = __nf_ct_l3proto_find(l3proto);
-	if (!try_module_get(p->me))
-		p = &nf_conntrack_l3proto_generic;
-	rcu_read_unlock();
-
-	return p;
-}
-EXPORT_SYMBOL_GPL(nf_ct_l3proto_find_get);
-
-int
-nf_ct_l3proto_try_module_get(unsigned short l3proto)
-{
-	const struct nf_conntrack_l3proto *p;
-	int ret;
-
-retry: p = nf_ct_l3proto_find_get(l3proto);
-	if (p == &nf_conntrack_l3proto_generic) {
-		ret = request_module("nf_conntrack-%d", l3proto);
-		if (!ret)
-			goto retry;
-
-		return -EPROTOTYPE;
-	}
-
-	return 0;
-}
-EXPORT_SYMBOL_GPL(nf_ct_l3proto_try_module_get);
-
-void nf_ct_l3proto_module_put(unsigned short l3proto)
-{
-	struct nf_conntrack_l3proto *p;
-
-	/* rcu_read_lock not necessary since the caller holds a reference, but
-	 * taken anyways to avoid lockdep warnings in __nf_ct_l3proto_find()
-	 */
-	rcu_read_lock();
-	p = __nf_ct_l3proto_find(l3proto);
-	module_put(p->me);
-	rcu_read_unlock();
-}
-EXPORT_SYMBOL_GPL(nf_ct_l3proto_module_put);
-
-static int nf_ct_netns_do_get(struct net *net, u8 nfproto)
-{
-	const struct nf_conntrack_l3proto *l3proto;
-	int ret;
-
-	might_sleep();
-
-	ret = nf_ct_l3proto_try_module_get(nfproto);
-	if (ret < 0)
-		return ret;
-
-	/* we already have a reference, can't fail */
-	rcu_read_lock();
-	l3proto = __nf_ct_l3proto_find(nfproto);
-	rcu_read_unlock();
-
-	if (!l3proto->net_ns_get)
-		return 0;
-
-	ret = l3proto->net_ns_get(net);
-	if (ret < 0)
-		nf_ct_l3proto_module_put(nfproto);
-
-	return ret;
-}
-
-int nf_ct_netns_get(struct net *net, u8 nfproto)
-{
-	int err;
-
-	if (nfproto == NFPROTO_INET) {
-		err = nf_ct_netns_do_get(net, NFPROTO_IPV4);
-		if (err < 0)
-			goto err1;
-		err = nf_ct_netns_do_get(net, NFPROTO_IPV6);
-		if (err < 0)
-			goto err2;
-	} else {
-		err = nf_ct_netns_do_get(net, nfproto);
-		if (err < 0)
-			goto err1;
-	}
-	return 0;
-
-err2:
-
nf_ct_netns_put(net, NFPROTO_IPV4); -err1: - return err; -} -EXPORT_SYMBOL_GPL(nf_ct_netns_get); - -static void nf_ct_netns_do_put(struct net *net, u8 nfproto) -{ - const struct nf_conntrack_l3proto *l3proto; - - might_sleep(); - - /* same as nf_conntrack_netns_get(), reference assumed */ - rcu_read_lock(); - l3proto = __nf_ct_l3proto_find(nfproto); - rcu_read_unlock(); - - if (WARN_ON(!l3proto)) - return; - - if (l3proto->net_ns_put) - l3proto->net_ns_put(net); - - nf_ct_l3proto_module_put(nfproto); -} - -void nf_ct_netns_put(struct net *net, uint8_t nfproto) -{ - if (nfproto == NFPROTO_INET) { - nf_ct_netns_do_put(net, NFPROTO_IPV4); - nf_ct_netns_do_put(net, NFPROTO_IPV6); - } else - nf_ct_netns_do_put(net, nfproto); -} -EXPORT_SYMBOL_GPL(nf_ct_netns_put); - const struct nf_conntrack_l4proto * nf_ct_l4proto_find_get(u_int16_t l3num, u_int8_t l4num) { @@ -274,11 +155,6 @@ void nf_ct_l4proto_put(const struct nf_conntrack_l4proto *p) } EXPORT_SYMBOL_GPL(nf_ct_l4proto_put); -static int kill_l3proto(struct nf_conn *i, void *data) -{ - return nf_ct_l3num(i) == ((const struct nf_conntrack_l3proto *)data)->l3proto; -} - static int kill_l4proto(struct nf_conn *i, void *data) { const struct nf_conntrack_l4proto *l4proto; @@ -287,52 +163,6 @@ static int kill_l4proto(struct nf_conn *i, void *data) nf_ct_l3num(i) == l4proto->l3proto; } -int nf_ct_l3proto_register(const struct nf_conntrack_l3proto *proto) -{ - int ret = 0; - struct nf_conntrack_l3proto *old; - - if (proto->l3proto >= NFPROTO_NUMPROTO) - return -EBUSY; -#if IS_ENABLED(CONFIG_NF_CT_NETLINK) - if (proto->tuple_to_nlattr && proto->nla_size == 0) - return -EINVAL; -#endif - mutex_lock(&nf_ct_proto_mutex); - old = rcu_dereference_protected(nf_ct_l3protos[proto->l3proto], - lockdep_is_held(&nf_ct_proto_mutex)); - if (old != &nf_conntrack_l3proto_generic) { - ret = -EBUSY; - goto out_unlock; - } - - rcu_assign_pointer(nf_ct_l3protos[proto->l3proto], proto); - -out_unlock: - mutex_unlock(&nf_ct_proto_mutex); - return 
ret; - -} -EXPORT_SYMBOL_GPL(nf_ct_l3proto_register); - -void nf_ct_l3proto_unregister(const struct nf_conntrack_l3proto *proto) -{ - BUG_ON(proto->l3proto >= NFPROTO_NUMPROTO); - - mutex_lock(&nf_ct_proto_mutex); - BUG_ON(rcu_dereference_protected(nf_ct_l3protos[proto->l3proto], - lockdep_is_held(&nf_ct_proto_mutex) - ) != proto); - rcu_assign_pointer(nf_ct_l3protos[proto->l3proto], - &nf_conntrack_l3proto_generic); - mutex_unlock(&nf_ct_proto_mutex); - - synchronize_rcu(); - /* Remove all contrack entries for this protocol */ - nf_ct_iterate_destroy(kill_l3proto, (void*)proto); -} -EXPORT_SYMBOL_GPL(nf_ct_l3proto_unregister); - static struct nf_proto_net *nf_ct_l4proto_net(struct net *net, const struct nf_conntrack_l4proto *l4proto) { @@ -499,8 +329,23 @@ void nf_ct_l4proto_pernet_unregister_one(struct net *net, } EXPORT_SYMBOL_GPL(nf_ct_l4proto_pernet_unregister_one); -int nf_ct_l4proto_register(const struct nf_conntrack_l4proto * const l4proto[], - unsigned int num_proto) +static void +nf_ct_l4proto_unregister(const struct nf_conntrack_l4proto * const l4proto[], + unsigned int num_proto) +{ + mutex_lock(&nf_ct_proto_mutex); + while (num_proto-- != 0) + __nf_ct_l4proto_unregister_one(l4proto[num_proto]); + mutex_unlock(&nf_ct_proto_mutex); + + synchronize_net(); + /* Remove all contrack entries for this protocol */ + nf_ct_iterate_destroy(kill_l4proto, (void *)l4proto); +} + +static int +nf_ct_l4proto_register(const struct nf_conntrack_l4proto * const l4proto[], + unsigned int num_proto) { int ret = -EINVAL, ver; unsigned int i; @@ -518,7 +363,6 @@ int nf_ct_l4proto_register(const struct nf_conntrack_l4proto * const l4proto[], } return ret; } -EXPORT_SYMBOL_GPL(nf_ct_l4proto_register); int nf_ct_l4proto_pernet_register(struct net *net, const struct nf_conntrack_l4proto *const l4proto[], @@ -542,20 +386,6 @@ int nf_ct_l4proto_pernet_register(struct net *net, } EXPORT_SYMBOL_GPL(nf_ct_l4proto_pernet_register); -void nf_ct_l4proto_unregister(const struct 
nf_conntrack_l4proto * const l4proto[], - unsigned int num_proto) -{ - mutex_lock(&nf_ct_proto_mutex); - while (num_proto-- != 0) - __nf_ct_l4proto_unregister_one(l4proto[num_proto]); - mutex_unlock(&nf_ct_proto_mutex); - - synchronize_net(); - /* Remove all contrack entries for this protocol */ - nf_ct_iterate_destroy(kill_l4proto, (void *)l4proto); -} -EXPORT_SYMBOL_GPL(nf_ct_l4proto_unregister); - void nf_ct_l4proto_pernet_unregister(struct net *net, const struct nf_conntrack_l4proto *const l4proto[], unsigned int num_proto) @@ -565,6 +395,562 @@ void nf_ct_l4proto_pernet_unregister(struct net *net, } EXPORT_SYMBOL_GPL(nf_ct_l4proto_pernet_unregister); +static unsigned int ipv4_helper(void *priv, + struct sk_buff *skb, + const struct nf_hook_state *state) +{ + struct nf_conn *ct; + enum ip_conntrack_info ctinfo; + const struct nf_conn_help *help; + const struct nf_conntrack_helper *helper; + + /* This is where we call the helper: as the packet goes out. */ + ct = nf_ct_get(skb, &ctinfo); + if (!ct || ctinfo == IP_CT_RELATED_REPLY) + return NF_ACCEPT; + + help = nfct_help(ct); + if (!help) + return NF_ACCEPT; + + /* rcu_read_lock()ed by nf_hook_thresh */ + helper = rcu_dereference(help->helper); + if (!helper) + return NF_ACCEPT; + + return helper->help(skb, skb_network_offset(skb) + ip_hdrlen(skb), + ct, ctinfo); +} + +static unsigned int ipv4_confirm(void *priv, + struct sk_buff *skb, + const struct nf_hook_state *state) +{ + struct nf_conn *ct; + enum ip_conntrack_info ctinfo; + + ct = nf_ct_get(skb, &ctinfo); + if (!ct || ctinfo == IP_CT_RELATED_REPLY) + goto out; + + /* adjust seqs for loopback traffic only in outgoing direction */ + if (test_bit(IPS_SEQ_ADJUST_BIT, &ct->status) && + !nf_is_loopback_packet(skb)) { + if (!nf_ct_seq_adjust(skb, ct, ctinfo, ip_hdrlen(skb))) { + NF_CT_STAT_INC_ATOMIC(nf_ct_net(ct), drop); + return NF_DROP; + } + } +out: + /* We've seen it coming out the other side: confirm it */ + return nf_conntrack_confirm(skb); +} + +static 
unsigned int ipv4_conntrack_in(void *priv, + struct sk_buff *skb, + const struct nf_hook_state *state) +{ + return nf_conntrack_in(state->net, PF_INET, state->hook, skb); +} + +static unsigned int ipv4_conntrack_local(void *priv, + struct sk_buff *skb, + const struct nf_hook_state *state) +{ + if (ip_is_fragment(ip_hdr(skb))) { /* IP_NODEFRAG setsockopt set */ + enum ip_conntrack_info ctinfo; + struct nf_conn *tmpl; + + tmpl = nf_ct_get(skb, &ctinfo); + if (tmpl && nf_ct_is_template(tmpl)) { + /* when skipping ct, clear templates to avoid fooling + * later targets/matches + */ + skb->_nfct = 0; + nf_ct_put(tmpl); + } + return NF_ACCEPT; + } + + return nf_conntrack_in(state->net, PF_INET, state->hook, skb); +} + +/* Connection tracking may drop packets, but never alters them, so + * make it the first hook. + */ +static const struct nf_hook_ops ipv4_conntrack_ops[] = { + { + .hook = ipv4_conntrack_in, + .pf = NFPROTO_IPV4, + .hooknum = NF_INET_PRE_ROUTING, + .priority = NF_IP_PRI_CONNTRACK, + }, + { + .hook = ipv4_conntrack_local, + .pf = NFPROTO_IPV4, + .hooknum = NF_INET_LOCAL_OUT, + .priority = NF_IP_PRI_CONNTRACK, + }, + { + .hook = ipv4_helper, + .pf = NFPROTO_IPV4, + .hooknum = NF_INET_POST_ROUTING, + .priority = NF_IP_PRI_CONNTRACK_HELPER, + }, + { + .hook = ipv4_confirm, + .pf = NFPROTO_IPV4, + .hooknum = NF_INET_POST_ROUTING, + .priority = NF_IP_PRI_CONNTRACK_CONFIRM, + }, + { + .hook = ipv4_helper, + .pf = NFPROTO_IPV4, + .hooknum = NF_INET_LOCAL_IN, + .priority = NF_IP_PRI_CONNTRACK_HELPER, + }, + { + .hook = ipv4_confirm, + .pf = NFPROTO_IPV4, + .hooknum = NF_INET_LOCAL_IN, + .priority = NF_IP_PRI_CONNTRACK_CONFIRM, + }, +}; + +/* Fast function for those who don't want to parse /proc (and I don't + * blame them). + * Reversing the socket's dst/src point of view gives us the reply + * mapping. 
+ */ +static int +getorigdst(struct sock *sk, int optval, void __user *user, int *len) +{ + const struct inet_sock *inet = inet_sk(sk); + const struct nf_conntrack_tuple_hash *h; + struct nf_conntrack_tuple tuple; + + memset(&tuple, 0, sizeof(tuple)); + + lock_sock(sk); + tuple.src.u3.ip = inet->inet_rcv_saddr; + tuple.src.u.tcp.port = inet->inet_sport; + tuple.dst.u3.ip = inet->inet_daddr; + tuple.dst.u.tcp.port = inet->inet_dport; + tuple.src.l3num = PF_INET; + tuple.dst.protonum = sk->sk_protocol; + release_sock(sk); + + /* We only do TCP and SCTP at the moment: is there a better way? */ + if (tuple.dst.protonum != IPPROTO_TCP && + tuple.dst.protonum != IPPROTO_SCTP) { + pr_debug("SO_ORIGINAL_DST: Not a TCP/SCTP socket\n"); + return -ENOPROTOOPT; + } + + if ((unsigned int)*len < sizeof(struct sockaddr_in)) { + pr_debug("SO_ORIGINAL_DST: len %d not %zu\n", + *len, sizeof(struct sockaddr_in)); + return -EINVAL; + } + + h = nf_conntrack_find_get(sock_net(sk), &nf_ct_zone_dflt, &tuple); + if (h) { + struct sockaddr_in sin; + struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h); + + sin.sin_family = AF_INET; + sin.sin_port = ct->tuplehash[IP_CT_DIR_ORIGINAL] + .tuple.dst.u.tcp.port; + sin.sin_addr.s_addr = ct->tuplehash[IP_CT_DIR_ORIGINAL] + .tuple.dst.u3.ip; + memset(sin.sin_zero, 0, sizeof(sin.sin_zero)); + + pr_debug("SO_ORIGINAL_DST: %pI4 %u\n", + &sin.sin_addr.s_addr, ntohs(sin.sin_port)); + nf_ct_put(ct); + if (copy_to_user(user, &sin, sizeof(sin)) != 0) + return -EFAULT; + else + return 0; + } + pr_debug("SO_ORIGINAL_DST: Can't find %pI4/%u-%pI4/%u.\n", + &tuple.src.u3.ip, ntohs(tuple.src.u.tcp.port), + &tuple.dst.u3.ip, ntohs(tuple.dst.u.tcp.port)); + return -ENOENT; +} + +static struct nf_sockopt_ops so_getorigdst = { + .pf = PF_INET, + .get_optmin = SO_ORIGINAL_DST, + .get_optmax = SO_ORIGINAL_DST + 1, + .get = getorigdst, + .owner = THIS_MODULE, +}; + +#if IS_ENABLED(CONFIG_IPV6) +static int +ipv6_getorigdst(struct sock *sk, int optval, void __user *user, int 
*len) +{ + struct nf_conntrack_tuple tuple = { .src.l3num = NFPROTO_IPV6 }; + const struct ipv6_pinfo *inet6 = inet6_sk(sk); + const struct inet_sock *inet = inet_sk(sk); + const struct nf_conntrack_tuple_hash *h; + struct sockaddr_in6 sin6; + struct nf_conn *ct; + __be32 flow_label; + int bound_dev_if; + + lock_sock(sk); + tuple.src.u3.in6 = sk->sk_v6_rcv_saddr; + tuple.src.u.tcp.port = inet->inet_sport; + tuple.dst.u3.in6 = sk->sk_v6_daddr; + tuple.dst.u.tcp.port = inet->inet_dport; + tuple.dst.protonum = sk->sk_protocol; + bound_dev_if = sk->sk_bound_dev_if; + flow_label = inet6->flow_label; + release_sock(sk); + + if (tuple.dst.protonum != IPPROTO_TCP && + tuple.dst.protonum != IPPROTO_SCTP) + return -ENOPROTOOPT; + + if (*len < 0 || (unsigned int)*len < sizeof(sin6)) + return -EINVAL; + + h = nf_conntrack_find_get(sock_net(sk), &nf_ct_zone_dflt, &tuple); + if (!h) { + pr_debug("IP6T_SO_ORIGINAL_DST: Can't find %pI6c/%u-%pI6c/%u.\n", + &tuple.src.u3.ip6, ntohs(tuple.src.u.tcp.port), + &tuple.dst.u3.ip6, ntohs(tuple.dst.u.tcp.port)); + return -ENOENT; + } + + ct = nf_ct_tuplehash_to_ctrack(h); + + sin6.sin6_family = AF_INET6; + sin6.sin6_port = ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.dst.u.tcp.port; + sin6.sin6_flowinfo = flow_label & IPV6_FLOWINFO_MASK; + memcpy(&sin6.sin6_addr, + &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.dst.u3.in6, + sizeof(sin6.sin6_addr)); + + nf_ct_put(ct); + sin6.sin6_scope_id = ipv6_iface_scope_id(&sin6.sin6_addr, bound_dev_if); + return copy_to_user(user, &sin6, sizeof(sin6)) ? 
-EFAULT : 0; +} + +static struct nf_sockopt_ops so_getorigdst6 = { + .pf = NFPROTO_IPV6, + .get_optmin = IP6T_SO_ORIGINAL_DST, + .get_optmax = IP6T_SO_ORIGINAL_DST + 1, + .get = ipv6_getorigdst, + .owner = THIS_MODULE, +}; + +static unsigned int ipv6_confirm(void *priv, + struct sk_buff *skb, + const struct nf_hook_state *state) +{ + struct nf_conn *ct; + enum ip_conntrack_info ctinfo; + unsigned char pnum = ipv6_hdr(skb)->nexthdr; + int protoff; + __be16 frag_off; + + ct = nf_ct_get(skb, &ctinfo); + if (!ct || ctinfo == IP_CT_RELATED_REPLY) + goto out; + + protoff = ipv6_skip_exthdr(skb, sizeof(struct ipv6hdr), &pnum, + &frag_off); + if (protoff < 0 || (frag_off & htons(~0x7)) != 0) { + pr_debug("proto header not found\n"); + goto out; + } + + /* adjust seqs for loopback traffic only in outgoing direction */ + if (test_bit(IPS_SEQ_ADJUST_BIT, &ct->status) && + !nf_is_loopback_packet(skb)) { + if (!nf_ct_seq_adjust(skb, ct, ctinfo, protoff)) { + NF_CT_STAT_INC_ATOMIC(nf_ct_net(ct), drop); + return NF_DROP; + } + } +out: + /* We've seen it coming out the other side: confirm it */ + return nf_conntrack_confirm(skb); +} + +static unsigned int ipv6_conntrack_in(void *priv, + struct sk_buff *skb, + const struct nf_hook_state *state) +{ + return nf_conntrack_in(state->net, PF_INET6, state->hook, skb); +} + +static unsigned int ipv6_conntrack_local(void *priv, + struct sk_buff *skb, + const struct nf_hook_state *state) +{ + return nf_conntrack_in(state->net, PF_INET6, state->hook, skb); +} + +static unsigned int ipv6_helper(void *priv, + struct sk_buff *skb, + const struct nf_hook_state *state) +{ + struct nf_conn *ct; + const struct nf_conn_help *help; + const struct nf_conntrack_helper *helper; + enum ip_conntrack_info ctinfo; + __be16 frag_off; + int protoff; + u8 nexthdr; + + /* This is where we call the helper: as the packet goes out. 
*/ + ct = nf_ct_get(skb, &ctinfo); + if (!ct || ctinfo == IP_CT_RELATED_REPLY) + return NF_ACCEPT; + + help = nfct_help(ct); + if (!help) + return NF_ACCEPT; + /* rcu_read_lock()ed by nf_hook_thresh */ + helper = rcu_dereference(help->helper); + if (!helper) + return NF_ACCEPT; + + nexthdr = ipv6_hdr(skb)->nexthdr; + protoff = ipv6_skip_exthdr(skb, sizeof(struct ipv6hdr), &nexthdr, + &frag_off); + if (protoff < 0 || (frag_off & htons(~0x7)) != 0) { + pr_debug("proto header not found\n"); + return NF_ACCEPT; + } + + return helper->help(skb, protoff, ct, ctinfo); +} + +static const struct nf_hook_ops ipv6_conntrack_ops[] = { + { + .hook = ipv6_conntrack_in, + .pf = NFPROTO_IPV6, + .hooknum = NF_INET_PRE_ROUTING, + .priority = NF_IP6_PRI_CONNTRACK, + }, + { + .hook = ipv6_conntrack_local, + .pf = NFPROTO_IPV6, + .hooknum = NF_INET_LOCAL_OUT, + .priority = NF_IP6_PRI_CONNTRACK, + }, + { + .hook = ipv6_helper, + .pf = NFPROTO_IPV6, + .hooknum = NF_INET_POST_ROUTING, + .priority = NF_IP6_PRI_CONNTRACK_HELPER, + }, + { + .hook = ipv6_confirm, + .pf = NFPROTO_IPV6, + .hooknum = NF_INET_POST_ROUTING, + .priority = NF_IP6_PRI_LAST, + }, + { + .hook = ipv6_helper, + .pf = NFPROTO_IPV6, + .hooknum = NF_INET_LOCAL_IN, + .priority = NF_IP6_PRI_CONNTRACK_HELPER, + }, + { + .hook = ipv6_confirm, + .pf = NFPROTO_IPV6, + .hooknum = NF_INET_LOCAL_IN, + .priority = NF_IP6_PRI_LAST - 1, + }, +}; +#endif + +static int nf_ct_netns_do_get(struct net *net, u8 nfproto) +{ + struct nf_conntrack_net *cnet = net_generic(net, nf_conntrack_net_id); + int err = 0; + + mutex_lock(&nf_ct_proto_mutex); + + switch (nfproto) { + case NFPROTO_IPV4: + cnet->users4++; + if (cnet->users4 > 1) + goto out_unlock; + err = nf_defrag_ipv4_enable(net); + if (err) { + cnet->users4 = 0; + goto out_unlock; + } + + err = nf_register_net_hooks(net, ipv4_conntrack_ops, + ARRAY_SIZE(ipv4_conntrack_ops)); + if (err) + cnet->users4 = 0; + break; +#if IS_ENABLED(CONFIG_IPV6) + case NFPROTO_IPV6: + cnet->users6++; + if 
(cnet->users6 > 1) + goto out_unlock; + err = nf_defrag_ipv6_enable(net); + if (err < 0) { + cnet->users6 = 0; + goto out_unlock; + } + + err = nf_register_net_hooks(net, ipv6_conntrack_ops, + ARRAY_SIZE(ipv6_conntrack_ops)); + if (err) + cnet->users6 = 0; + break; +#endif + default: + err = -EPROTO; + break; + } + out_unlock: + mutex_unlock(&nf_ct_proto_mutex); + return err; +} + +static void nf_ct_netns_do_put(struct net *net, u8 nfproto) +{ + struct nf_conntrack_net *cnet = net_generic(net, nf_conntrack_net_id); + + mutex_lock(&nf_ct_proto_mutex); + switch (nfproto) { + case NFPROTO_IPV4: + if (cnet->users4 && (--cnet->users4 == 0)) + nf_unregister_net_hooks(net, ipv4_conntrack_ops, + ARRAY_SIZE(ipv4_conntrack_ops)); + break; +#if IS_ENABLED(CONFIG_IPV6) + case NFPROTO_IPV6: + if (cnet->users6 && (--cnet->users6 == 0)) + nf_unregister_net_hooks(net, ipv6_conntrack_ops, + ARRAY_SIZE(ipv6_conntrack_ops)); + break; +#endif + } + + mutex_unlock(&nf_ct_proto_mutex); +} + +int nf_ct_netns_get(struct net *net, u8 nfproto) +{ + int err; + + if (nfproto == NFPROTO_INET) { + err = nf_ct_netns_do_get(net, NFPROTO_IPV4); + if (err < 0) + goto err1; + err = nf_ct_netns_do_get(net, NFPROTO_IPV6); + if (err < 0) + goto err2; + } else { + err = nf_ct_netns_do_get(net, nfproto); + if (err < 0) + goto err1; + } + return 0; + +err2: + nf_ct_netns_put(net, NFPROTO_IPV4); +err1: + return err; +} +EXPORT_SYMBOL_GPL(nf_ct_netns_get); + +void nf_ct_netns_put(struct net *net, uint8_t nfproto) +{ + if (nfproto == NFPROTO_INET) { + nf_ct_netns_do_put(net, NFPROTO_IPV4); + nf_ct_netns_do_put(net, NFPROTO_IPV6); + } else { + nf_ct_netns_do_put(net, nfproto); + } +} +EXPORT_SYMBOL_GPL(nf_ct_netns_put); + +static const struct nf_conntrack_l4proto * const builtin_l4proto[] = { + &nf_conntrack_l4proto_tcp4, + &nf_conntrack_l4proto_udp4, + &nf_conntrack_l4proto_icmp, +#ifdef CONFIG_NF_CT_PROTO_DCCP + &nf_conntrack_l4proto_dccp4, +#endif +#ifdef CONFIG_NF_CT_PROTO_SCTP + 
&nf_conntrack_l4proto_sctp4, +#endif +#ifdef CONFIG_NF_CT_PROTO_UDPLITE + &nf_conntrack_l4proto_udplite4, +#endif +#if IS_ENABLED(CONFIG_IPV6) + &nf_conntrack_l4proto_tcp6, + &nf_conntrack_l4proto_udp6, + &nf_conntrack_l4proto_icmpv6, +#ifdef CONFIG_NF_CT_PROTO_DCCP + &nf_conntrack_l4proto_dccp6, +#endif +#ifdef CONFIG_NF_CT_PROTO_SCTP + &nf_conntrack_l4proto_sctp6, +#endif +#ifdef CONFIG_NF_CT_PROTO_UDPLITE + &nf_conntrack_l4proto_udplite6, +#endif +#endif /* CONFIG_IPV6 */ +}; + +int nf_conntrack_proto_init(void) +{ + int ret = 0; + + ret = nf_register_sockopt(&so_getorigdst); + if (ret < 0) + return ret; + +#if IS_ENABLED(CONFIG_IPV6) + ret = nf_register_sockopt(&so_getorigdst6); + if (ret < 0) + goto cleanup_sockopt; +#endif + ret = nf_ct_l4proto_register(builtin_l4proto, + ARRAY_SIZE(builtin_l4proto)); + if (ret < 0) + goto cleanup_sockopt2; + + return ret; +cleanup_sockopt2: + nf_unregister_sockopt(&so_getorigdst); +#if IS_ENABLED(CONFIG_IPV6) +cleanup_sockopt: + nf_unregister_sockopt(&so_getorigdst6); +#endif + return ret; +} + +void nf_conntrack_proto_fini(void) +{ + unsigned int i; + + nf_unregister_sockopt(&so_getorigdst); +#if IS_ENABLED(CONFIG_IPV6) + nf_unregister_sockopt(&so_getorigdst6); +#endif + /* No need to call nf_ct_l4proto_unregister(), the register + * tables are free'd here anyway. 
+ */ + for (i = 0; i < ARRAY_SIZE(nf_ct_protos); i++) + kfree(nf_ct_protos[i]); +} + int nf_conntrack_proto_pernet_init(struct net *net) { int err; @@ -581,6 +967,14 @@ int nf_conntrack_proto_pernet_init(struct net *net) if (err < 0) return err; + err = nf_ct_l4proto_pernet_register(net, builtin_l4proto, + ARRAY_SIZE(builtin_l4proto)); + if (err < 0) { + nf_ct_l4proto_unregister_sysctl(net, pn, + &nf_conntrack_l4proto_generic); + return err; + } + pn->users++; return 0; } @@ -590,25 +984,19 @@ void nf_conntrack_proto_pernet_fini(struct net *net) struct nf_proto_net *pn = nf_ct_l4proto_net(net, &nf_conntrack_l4proto_generic); + nf_ct_l4proto_pernet_unregister(net, builtin_l4proto, + ARRAY_SIZE(builtin_l4proto)); pn->users--; nf_ct_l4proto_unregister_sysctl(net, pn, &nf_conntrack_l4proto_generic); } -int nf_conntrack_proto_init(void) -{ - unsigned int i; - for (i = 0; i < NFPROTO_NUMPROTO; i++) - rcu_assign_pointer(nf_ct_l3protos[i], - &nf_conntrack_l3proto_generic); - return 0; -} -void nf_conntrack_proto_fini(void) -{ - unsigned int i; - /* free l3proto protocol tables */ - for (i = 0; i < ARRAY_SIZE(nf_ct_protos); i++) - kfree(nf_ct_protos[i]); -} +module_param_call(hashsize, nf_conntrack_set_hashsize, param_get_uint, + &nf_conntrack_htable_size, 0600); + +MODULE_ALIAS("ip_conntrack"); +MODULE_ALIAS("nf_conntrack-" __stringify(AF_INET)); +MODULE_ALIAS("nf_conntrack-" __stringify(AF_INET6)); +MODULE_LICENSE("GPL"); diff --git a/net/netfilter/nf_conntrack_proto_dccp.c b/net/netfilter/nf_conntrack_proto_dccp.c index 9ce6336d1e55..8c58f96b59e7 100644 --- a/net/netfilter/nf_conntrack_proto_dccp.c +++ b/net/netfilter/nf_conntrack_proto_dccp.c @@ -23,6 +23,7 @@ #include <net/netfilter/nf_conntrack.h> #include <net/netfilter/nf_conntrack_l4proto.h> #include <net/netfilter/nf_conntrack_ecache.h> +#include <net/netfilter/nf_conntrack_timeout.h> #include <net/netfilter/nf_log.h> /* Timeouts are based on values from RFC4340: @@ -388,31 +389,8 @@ static inline struct 
nf_dccp_net *dccp_pernet(struct net *net) return &net->ct.nf_ct_proto.dccp; } -static bool dccp_pkt_to_tuple(const struct sk_buff *skb, unsigned int dataoff, - struct net *net, struct nf_conntrack_tuple *tuple) -{ - struct dccp_hdr _hdr, *dh; - - /* Actually only need first 4 bytes to get ports. */ - dh = skb_header_pointer(skb, dataoff, 4, &_hdr); - if (dh == NULL) - return false; - - tuple->src.u.dccp.port = dh->dccph_sport; - tuple->dst.u.dccp.port = dh->dccph_dport; - return true; -} - -static bool dccp_invert_tuple(struct nf_conntrack_tuple *inv, - const struct nf_conntrack_tuple *tuple) -{ - inv->src.u.dccp.port = tuple->dst.u.dccp.port; - inv->dst.u.dccp.port = tuple->src.u.dccp.port; - return true; -} - static bool dccp_new(struct nf_conn *ct, const struct sk_buff *skb, - unsigned int dataoff, unsigned int *timeouts) + unsigned int dataoff) { struct net *net = nf_ct_net(ct); struct nf_dccp_net *dn; @@ -460,19 +438,14 @@ static u64 dccp_ack_seq(const struct dccp_hdr *dh) ntohl(dhack->dccph_ack_nr_low); } -static unsigned int *dccp_get_timeouts(struct net *net) -{ - return dccp_pernet(net)->dccp_timeout; -} - static int dccp_packet(struct nf_conn *ct, const struct sk_buff *skb, - unsigned int dataoff, enum ip_conntrack_info ctinfo, - unsigned int *timeouts) + unsigned int dataoff, enum ip_conntrack_info ctinfo) { enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo); struct dccp_hdr _dh, *dh; u_int8_t type, old_state, new_state; enum ct_dccp_roles role; + unsigned int *timeouts; dh = skb_header_pointer(skb, dataoff, sizeof(_dh), &_dh); BUG_ON(dh == NULL); @@ -546,6 +519,9 @@ static int dccp_packet(struct nf_conn *ct, const struct sk_buff *skb, if (new_state != old_state) nf_conntrack_event_cache(IPCT_PROTOINFO, ct); + timeouts = nf_ct_timeout_lookup(ct); + if (!timeouts) + timeouts = dccp_pernet(nf_ct_net(ct))->dccp_timeout; nf_ct_refresh_acct(ct, ctinfo, skb, timeouts[new_state]); return NF_ACCEPT; @@ -864,11 +840,8 @@ static struct nf_proto_net 
*dccp_get_net_proto(struct net *net) const struct nf_conntrack_l4proto nf_conntrack_l4proto_dccp4 = { .l3proto = AF_INET, .l4proto = IPPROTO_DCCP, - .pkt_to_tuple = dccp_pkt_to_tuple, - .invert_tuple = dccp_invert_tuple, .new = dccp_new, .packet = dccp_packet, - .get_timeouts = dccp_get_timeouts, .error = dccp_error, .can_early_drop = dccp_can_early_drop, #ifdef CONFIG_NF_CONNTRACK_PROCFS @@ -900,11 +873,8 @@ EXPORT_SYMBOL_GPL(nf_conntrack_l4proto_dccp4); const struct nf_conntrack_l4proto nf_conntrack_l4proto_dccp6 = { .l3proto = AF_INET6, .l4proto = IPPROTO_DCCP, - .pkt_to_tuple = dccp_pkt_to_tuple, - .invert_tuple = dccp_invert_tuple, .new = dccp_new, .packet = dccp_packet, - .get_timeouts = dccp_get_timeouts, .error = dccp_error, .can_early_drop = dccp_can_early_drop, #ifdef CONFIG_NF_CONNTRACK_PROCFS diff --git a/net/netfilter/nf_conntrack_proto_generic.c b/net/netfilter/nf_conntrack_proto_generic.c index 6c6896d21cd7..ac4a0b296dcd 100644 --- a/net/netfilter/nf_conntrack_proto_generic.c +++ b/net/netfilter/nf_conntrack_proto_generic.c @@ -11,6 +11,7 @@ #include <linux/timer.h> #include <linux/netfilter.h> #include <net/netfilter/nf_conntrack_l4proto.h> +#include <net/netfilter/nf_conntrack_timeout.h> static const unsigned int nf_ct_generic_timeout = 600*HZ; @@ -41,34 +42,24 @@ static bool generic_pkt_to_tuple(const struct sk_buff *skb, return true; } -static bool generic_invert_tuple(struct nf_conntrack_tuple *tuple, - const struct nf_conntrack_tuple *orig) -{ - tuple->src.u.all = 0; - tuple->dst.u.all = 0; - - return true; -} - -static unsigned int *generic_get_timeouts(struct net *net) -{ - return &(generic_pernet(net)->timeout); -} - /* Returns verdict for packet, or -1 for invalid. 
*/ static int generic_packet(struct nf_conn *ct, const struct sk_buff *skb, unsigned int dataoff, - enum ip_conntrack_info ctinfo, - unsigned int *timeout) + enum ip_conntrack_info ctinfo) { + const unsigned int *timeout = nf_ct_timeout_lookup(ct); + + if (!timeout) + timeout = &generic_pernet(nf_ct_net(ct))->timeout; + nf_ct_refresh_acct(ct, ctinfo, skb, *timeout); return NF_ACCEPT; } /* Called when a new connection for this protocol found. */ static bool generic_new(struct nf_conn *ct, const struct sk_buff *skb, - unsigned int dataoff, unsigned int *timeouts) + unsigned int dataoff) { bool ret; @@ -87,8 +78,11 @@ static bool generic_new(struct nf_conn *ct, const struct sk_buff *skb, static int generic_timeout_nlattr_to_obj(struct nlattr *tb[], struct net *net, void *data) { - unsigned int *timeout = data; struct nf_generic_net *gn = generic_pernet(net); + unsigned int *timeout = data; + + if (!timeout) + timeout = &gn->timeout; if (tb[CTA_TIMEOUT_GENERIC_TIMEOUT]) *timeout = @@ -168,9 +162,7 @@ const struct nf_conntrack_l4proto nf_conntrack_l4proto_generic = .l3proto = PF_UNSPEC, .l4proto = 255, .pkt_to_tuple = generic_pkt_to_tuple, - .invert_tuple = generic_invert_tuple, .packet = generic_packet, - .get_timeouts = generic_get_timeouts, .new = generic_new, #if IS_ENABLED(CONFIG_NF_CT_NETLINK_TIMEOUT) .ctnl_timeout = { diff --git a/net/netfilter/nf_conntrack_proto_gre.c b/net/netfilter/nf_conntrack_proto_gre.c index d049ea5a3770..d1632252bf5b 100644 --- a/net/netfilter/nf_conntrack_proto_gre.c +++ b/net/netfilter/nf_conntrack_proto_gre.c @@ -39,6 +39,7 @@ #include <net/netfilter/nf_conntrack_l4proto.h> #include <net/netfilter/nf_conntrack_helper.h> #include <net/netfilter/nf_conntrack_core.h> +#include <net/netfilter/nf_conntrack_timeout.h> #include <linux/netfilter/nf_conntrack_proto_gre.h> #include <linux/netfilter/nf_conntrack_pptp.h> @@ -179,15 +180,6 @@ EXPORT_SYMBOL_GPL(nf_ct_gre_keymap_destroy); /* PUBLIC CONNTRACK PROTO HELPER FUNCTIONS */ -/* invert gre 
part of tuple */
-static bool gre_invert_tuple(struct nf_conntrack_tuple *tuple,
-			     const struct nf_conntrack_tuple *orig)
-{
-	tuple->dst.u.gre.key = orig->src.u.gre.key;
-	tuple->src.u.gre.key = orig->dst.u.gre.key;
-	return true;
-}
-
 /* gre hdr info to tuple */
 static bool gre_pkt_to_tuple(const struct sk_buff *skb, unsigned int dataoff,
 			     struct net *net, struct nf_conntrack_tuple *tuple)
@@ -243,8 +235,7 @@ static unsigned int *gre_get_timeouts(struct net *net)
 static int gre_packet(struct nf_conn *ct,
 		      const struct sk_buff *skb,
 		      unsigned int dataoff,
-		      enum ip_conntrack_info ctinfo,
-		      unsigned int *timeouts)
+		      enum ip_conntrack_info ctinfo)
 {
 	/* If we've seen traffic both ways, this is a GRE connection.
 	 * Extend timeout. */
@@ -263,8 +254,13 @@ static int gre_packet(struct nf_conn *ct,
 
 /* Called when a new connection for this protocol found. */
 static bool gre_new(struct nf_conn *ct, const struct sk_buff *skb,
-		    unsigned int dataoff, unsigned int *timeouts)
+		    unsigned int dataoff)
 {
+	unsigned int *timeouts = nf_ct_timeout_lookup(ct);
+
+	if (!timeouts)
+		timeouts = gre_get_timeouts(nf_ct_net(ct));
+
 	pr_debug(": ");
 	nf_ct_dump_tuple(&ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple);
 
@@ -300,6 +296,8 @@ static int gre_timeout_nlattr_to_obj(struct nlattr *tb[],
 	unsigned int *timeouts = data;
 	struct netns_proto_gre *net_gre = gre_pernet(net);
 
+	if (!timeouts)
+		timeouts = gre_get_timeouts(net);
 	/* set default timeouts for GRE. */
 	timeouts[GRE_CT_UNREPLIED] = net_gre->gre_timeouts[GRE_CT_UNREPLIED];
 	timeouts[GRE_CT_REPLIED] = net_gre->gre_timeouts[GRE_CT_REPLIED];
@@ -356,11 +354,9 @@ static const struct nf_conntrack_l4proto nf_conntrack_l4proto_gre4 = {
 	.l3proto	 = AF_INET,
 	.l4proto	 = IPPROTO_GRE,
 	.pkt_to_tuple	 = gre_pkt_to_tuple,
-	.invert_tuple	 = gre_invert_tuple,
 #ifdef CONFIG_NF_CONNTRACK_PROCFS
 	.print_conntrack = gre_print_conntrack,
 #endif
-	.get_timeouts    = gre_get_timeouts,
 	.packet		 = gre_packet,
 	.new		 = gre_new,
 	.destroy	 = gre_destroy,
diff --git a/net/netfilter/nf_conntrack_proto_icmp.c b/net/netfilter/nf_conntrack_proto_icmp.c
new file mode 100644
index 000000000000..036670b38282
--- /dev/null
+++ b/net/netfilter/nf_conntrack_proto_icmp.c
@@ -0,0 +1,388 @@
+/* (C) 1999-2001 Paul `Rusty' Russell
+ * (C) 2002-2004 Netfilter Core Team <coreteam@netfilter.org>
+ * (C) 2006-2010 Patrick McHardy <kaber@trash.net>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/types.h>
+#include <linux/timer.h>
+#include <linux/netfilter.h>
+#include <linux/in.h>
+#include <linux/icmp.h>
+#include <linux/seq_file.h>
+#include <net/ip.h>
+#include <net/checksum.h>
+#include <linux/netfilter_ipv4.h>
+#include <net/netfilter/nf_conntrack_tuple.h>
+#include <net/netfilter/nf_conntrack_l4proto.h>
+#include <net/netfilter/nf_conntrack_core.h>
+#include <net/netfilter/nf_conntrack_timeout.h>
+#include <net/netfilter/nf_conntrack_zones.h>
+#include <net/netfilter/nf_log.h>
+
+static const unsigned int nf_ct_icmp_timeout = 30*HZ;
+
+static inline struct nf_icmp_net *icmp_pernet(struct net *net)
+{
+	return &net->ct.nf_ct_proto.icmp;
+}
+
+static bool icmp_pkt_to_tuple(const struct sk_buff *skb, unsigned int dataoff,
+			      struct net *net, struct nf_conntrack_tuple *tuple)
+{
+	const struct icmphdr *hp;
+	struct icmphdr _hdr;
+
+	hp = skb_header_pointer(skb, dataoff, sizeof(_hdr), &_hdr);
+	if (hp == NULL)
+		return false;
+
+	tuple->dst.u.icmp.type = hp->type;
+	tuple->src.u.icmp.id = hp->un.echo.id;
+	tuple->dst.u.icmp.code = hp->code;
+
+	return true;
+}
+
+/* Add 1; spaces filled with 0. */
+static const u_int8_t invmap[] = {
+	[ICMP_ECHO] = ICMP_ECHOREPLY + 1,
+	[ICMP_ECHOREPLY] = ICMP_ECHO + 1,
+	[ICMP_TIMESTAMP] = ICMP_TIMESTAMPREPLY + 1,
+	[ICMP_TIMESTAMPREPLY] = ICMP_TIMESTAMP + 1,
+	[ICMP_INFO_REQUEST] = ICMP_INFO_REPLY + 1,
+	[ICMP_INFO_REPLY] = ICMP_INFO_REQUEST + 1,
+	[ICMP_ADDRESS] = ICMP_ADDRESSREPLY + 1,
+	[ICMP_ADDRESSREPLY] = ICMP_ADDRESS + 1
+};
+
+static bool icmp_invert_tuple(struct nf_conntrack_tuple *tuple,
+			      const struct nf_conntrack_tuple *orig)
+{
+	if (orig->dst.u.icmp.type >= sizeof(invmap) ||
+	    !invmap[orig->dst.u.icmp.type])
+		return false;
+
+	tuple->src.u.icmp.id = orig->src.u.icmp.id;
+	tuple->dst.u.icmp.type = invmap[orig->dst.u.icmp.type] - 1;
+	tuple->dst.u.icmp.code = orig->dst.u.icmp.code;
+	return true;
+}
+
+static unsigned int *icmp_get_timeouts(struct net *net)
+{
+	return &icmp_pernet(net)->timeout;
+}
+
+/* Returns verdict for packet, or -1 for invalid. */
+static int icmp_packet(struct nf_conn *ct,
+		       const struct sk_buff *skb,
+		       unsigned int dataoff,
+		       enum ip_conntrack_info ctinfo)
+{
+	/* Do not immediately delete the connection after the first
+	   successful reply to avoid excessive conntrackd traffic
+	   and also to handle correctly ICMP echo reply duplicates. */
+	unsigned int *timeout = nf_ct_timeout_lookup(ct);
+
+	if (!timeout)
+		timeout = icmp_get_timeouts(nf_ct_net(ct));
+
+	nf_ct_refresh_acct(ct, ctinfo, skb, *timeout);
+
+	return NF_ACCEPT;
+}
+
+/* Called when a new connection for this protocol found. */
+static bool icmp_new(struct nf_conn *ct, const struct sk_buff *skb,
+		     unsigned int dataoff)
+{
+	static const u_int8_t valid_new[] = {
+		[ICMP_ECHO] = 1,
+		[ICMP_TIMESTAMP] = 1,
+		[ICMP_INFO_REQUEST] = 1,
+		[ICMP_ADDRESS] = 1
+	};
+
+	if (ct->tuplehash[0].tuple.dst.u.icmp.type >= sizeof(valid_new) ||
+	    !valid_new[ct->tuplehash[0].tuple.dst.u.icmp.type]) {
+		/* Can't create a new ICMP `conn' with this. */
+		pr_debug("icmp: can't create new conn with type %u\n",
+			 ct->tuplehash[0].tuple.dst.u.icmp.type);
+		nf_ct_dump_tuple_ip(&ct->tuplehash[0].tuple);
+		return false;
+	}
+	return true;
+}
+
+/* Returns conntrack if it dealt with ICMP, and filled in skb fields */
+static int
+icmp_error_message(struct net *net, struct nf_conn *tmpl, struct sk_buff *skb,
+		   unsigned int hooknum)
+{
+	struct nf_conntrack_tuple innertuple, origtuple;
+	const struct nf_conntrack_l4proto *innerproto;
+	const struct nf_conntrack_tuple_hash *h;
+	const struct nf_conntrack_zone *zone;
+	enum ip_conntrack_info ctinfo;
+	struct nf_conntrack_zone tmp;
+
+	WARN_ON(skb_nfct(skb));
+	zone = nf_ct_zone_tmpl(tmpl, skb, &tmp);
+
+	/* Are they talking about one of our connections? */
+	if (!nf_ct_get_tuplepr(skb,
+			       skb_network_offset(skb) + ip_hdrlen(skb)
+						       + sizeof(struct icmphdr),
+			       PF_INET, net, &origtuple)) {
+		pr_debug("icmp_error_message: failed to get tuple\n");
+		return -NF_ACCEPT;
+	}
+
+	/* rcu_read_lock()ed by nf_hook_thresh */
+	innerproto = __nf_ct_l4proto_find(PF_INET, origtuple.dst.protonum);
+
+	/* Ordinarily, we'd expect the inverted tupleproto, but it's
+	   been preserved inside the ICMP. */
+	if (!nf_ct_invert_tuple(&innertuple, &origtuple, innerproto)) {
+		pr_debug("icmp_error_message: no match\n");
+		return -NF_ACCEPT;
+	}
+
+	ctinfo = IP_CT_RELATED;
+
+	h = nf_conntrack_find_get(net, zone, &innertuple);
+	if (!h) {
+		pr_debug("icmp_error_message: no match\n");
+		return -NF_ACCEPT;
+	}
+
+	if (NF_CT_DIRECTION(h) == IP_CT_DIR_REPLY)
+		ctinfo += IP_CT_IS_REPLY;
+
+	/* Update skb to refer to this connection */
+	nf_ct_set(skb, nf_ct_tuplehash_to_ctrack(h), ctinfo);
+	return NF_ACCEPT;
+}
+
+static void icmp_error_log(const struct sk_buff *skb, struct net *net,
+			   u8 pf, const char *msg)
+{
+	nf_l4proto_log_invalid(skb, net, pf, IPPROTO_ICMP, "%s", msg);
+}
+
+/* Small and modified version of icmp_rcv */
+static int
+icmp_error(struct net *net, struct nf_conn *tmpl,
+	   struct sk_buff *skb, unsigned int dataoff,
+	   u8 pf, unsigned int hooknum)
+{
+	const struct icmphdr *icmph;
+	struct icmphdr _ih;
+
+	/* Not enough header? */
+	icmph = skb_header_pointer(skb, ip_hdrlen(skb), sizeof(_ih), &_ih);
+	if (icmph == NULL) {
+		icmp_error_log(skb, net, pf, "short packet");
+		return -NF_ACCEPT;
+	}
+
+	/* See ip_conntrack_proto_tcp.c */
+	if (net->ct.sysctl_checksum && hooknum == NF_INET_PRE_ROUTING &&
+	    nf_ip_checksum(skb, hooknum, dataoff, 0)) {
+		icmp_error_log(skb, net, pf, "bad hw icmp checksum");
+		return -NF_ACCEPT;
+	}
+
+	/*
+	 *	18 is the highest 'known' ICMP type. Anything else is a mystery
+	 *
+	 *	RFC 1122: 3.2.2  Unknown ICMP messages types MUST be silently
+	 *		  discarded.
+	 */
+	if (icmph->type > NR_ICMP_TYPES) {
+		icmp_error_log(skb, net, pf, "invalid icmp type");
+		return -NF_ACCEPT;
+	}
+
+	/* Need to track icmp error message? */
+	if (icmph->type != ICMP_DEST_UNREACH &&
+	    icmph->type != ICMP_SOURCE_QUENCH &&
+	    icmph->type != ICMP_TIME_EXCEEDED &&
+	    icmph->type != ICMP_PARAMETERPROB &&
+	    icmph->type != ICMP_REDIRECT)
+		return NF_ACCEPT;
+
+	return icmp_error_message(net, tmpl, skb, hooknum);
+}
+
+#if IS_ENABLED(CONFIG_NF_CT_NETLINK)
+
+#include <linux/netfilter/nfnetlink.h>
+#include <linux/netfilter/nfnetlink_conntrack.h>
+
+static int icmp_tuple_to_nlattr(struct sk_buff *skb,
+				const struct nf_conntrack_tuple *t)
+{
+	if (nla_put_be16(skb, CTA_PROTO_ICMP_ID, t->src.u.icmp.id) ||
+	    nla_put_u8(skb, CTA_PROTO_ICMP_TYPE, t->dst.u.icmp.type) ||
+	    nla_put_u8(skb, CTA_PROTO_ICMP_CODE, t->dst.u.icmp.code))
+		goto nla_put_failure;
+	return 0;
+
+nla_put_failure:
+	return -1;
+}
+
+static const struct nla_policy icmp_nla_policy[CTA_PROTO_MAX+1] = {
+	[CTA_PROTO_ICMP_TYPE]	= { .type = NLA_U8 },
+	[CTA_PROTO_ICMP_CODE]	= { .type = NLA_U8 },
+	[CTA_PROTO_ICMP_ID]	= { .type = NLA_U16 },
+};
+
+static int icmp_nlattr_to_tuple(struct nlattr *tb[],
+				struct nf_conntrack_tuple *tuple)
+{
+	if (!tb[CTA_PROTO_ICMP_TYPE] ||
+	    !tb[CTA_PROTO_ICMP_CODE] ||
+	    !tb[CTA_PROTO_ICMP_ID])
+		return -EINVAL;
+
+	tuple->dst.u.icmp.type = nla_get_u8(tb[CTA_PROTO_ICMP_TYPE]);
+	tuple->dst.u.icmp.code = nla_get_u8(tb[CTA_PROTO_ICMP_CODE]);
+	tuple->src.u.icmp.id = nla_get_be16(tb[CTA_PROTO_ICMP_ID]);
+
+	if (tuple->dst.u.icmp.type >= sizeof(invmap) ||
+	    !invmap[tuple->dst.u.icmp.type])
+		return -EINVAL;
+
+	return 0;
+}
+
+static unsigned int icmp_nlattr_tuple_size(void)
+{
+	static unsigned int size __read_mostly;
+
+	if (!size)
+		size = nla_policy_len(icmp_nla_policy, CTA_PROTO_MAX + 1);
+
+	return size;
+}
+#endif
+
+#if IS_ENABLED(CONFIG_NF_CT_NETLINK_TIMEOUT)
+
+#include <linux/netfilter/nfnetlink.h>
+#include <linux/netfilter/nfnetlink_cttimeout.h>
+
+static int icmp_timeout_nlattr_to_obj(struct nlattr *tb[],
+				      struct net *net, void *data)
+{
+	unsigned int *timeout = data;
+	struct nf_icmp_net *in = icmp_pernet(net);
+
+	if (tb[CTA_TIMEOUT_ICMP_TIMEOUT]) {
+		if (!timeout)
+			timeout = &in->timeout;
+		*timeout =
+			ntohl(nla_get_be32(tb[CTA_TIMEOUT_ICMP_TIMEOUT])) * HZ;
+	} else if (timeout) {
+		/* Set default ICMP timeout. */
+		*timeout = in->timeout;
+	}
+	return 0;
+}
+
+static int
+icmp_timeout_obj_to_nlattr(struct sk_buff *skb, const void *data)
+{
+	const unsigned int *timeout = data;
+
+	if (nla_put_be32(skb, CTA_TIMEOUT_ICMP_TIMEOUT, htonl(*timeout / HZ)))
+		goto nla_put_failure;
+	return 0;
+
+nla_put_failure:
+	return -ENOSPC;
+}
+
+static const struct nla_policy
+icmp_timeout_nla_policy[CTA_TIMEOUT_ICMP_MAX+1] = {
+	[CTA_TIMEOUT_ICMP_TIMEOUT]	= { .type = NLA_U32 },
+};
+#endif /* CONFIG_NF_CT_NETLINK_TIMEOUT */
+
+#ifdef CONFIG_SYSCTL
+static struct ctl_table icmp_sysctl_table[] = {
+	{
+		.procname	= "nf_conntrack_icmp_timeout",
+		.maxlen		= sizeof(unsigned int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_jiffies,
+	},
+	{ }
+};
+#endif /* CONFIG_SYSCTL */
+
+static int icmp_kmemdup_sysctl_table(struct nf_proto_net *pn,
+				     struct nf_icmp_net *in)
+{
+#ifdef CONFIG_SYSCTL
+	pn->ctl_table = kmemdup(icmp_sysctl_table,
+				sizeof(icmp_sysctl_table),
+				GFP_KERNEL);
+	if (!pn->ctl_table)
+		return -ENOMEM;
+
+	pn->ctl_table[0].data = &in->timeout;
+#endif
+	return 0;
+}
+
+static int icmp_init_net(struct net *net, u_int16_t proto)
+{
+	struct nf_icmp_net *in = icmp_pernet(net);
+	struct nf_proto_net *pn = &in->pn;
+
+	in->timeout = nf_ct_icmp_timeout;
+
+	return icmp_kmemdup_sysctl_table(pn, in);
+}
+
+static struct nf_proto_net *icmp_get_net_proto(struct net *net)
+{
+	return &net->ct.nf_ct_proto.icmp.pn;
+}
+
+const struct nf_conntrack_l4proto nf_conntrack_l4proto_icmp =
+{
+	.l3proto		= PF_INET,
+	.l4proto		= IPPROTO_ICMP,
+	.pkt_to_tuple		= icmp_pkt_to_tuple,
+	.invert_tuple		= icmp_invert_tuple,
+	.packet			= icmp_packet,
+	.new			= icmp_new,
+	.error			= icmp_error,
+	.destroy		= NULL,
+	.me			= NULL,
+#if IS_ENABLED(CONFIG_NF_CT_NETLINK)
+	.tuple_to_nlattr	= icmp_tuple_to_nlattr,
+	.nlattr_tuple_size	= icmp_nlattr_tuple_size,
+	.nlattr_to_tuple	= icmp_nlattr_to_tuple,
+	.nla_policy		= icmp_nla_policy,
+#endif
+#if IS_ENABLED(CONFIG_NF_CT_NETLINK_TIMEOUT)
+	.ctnl_timeout		= {
+		.nlattr_to_obj	= icmp_timeout_nlattr_to_obj,
+		.obj_to_nlattr	= icmp_timeout_obj_to_nlattr,
+		.nlattr_max	= CTA_TIMEOUT_ICMP_MAX,
+		.obj_size	= sizeof(unsigned int),
+		.nla_policy	= icmp_timeout_nla_policy,
+	},
+#endif /* CONFIG_NF_CT_NETLINK_TIMEOUT */
+	.init_net		= icmp_init_net,
+	.get_net_proto		= icmp_get_net_proto,
+};
diff --git a/net/netfilter/nf_conntrack_proto_icmpv6.c b/net/netfilter/nf_conntrack_proto_icmpv6.c
new file mode 100644
index 000000000000..bed07b998a10
--- /dev/null
+++ b/net/netfilter/nf_conntrack_proto_icmpv6.c
@@ -0,0 +1,387 @@
+/*
+ * Copyright (C)2003,2004 USAGI/WIDE Project
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Author:
+ *	Yasuyuki Kozakai @USAGI <yasuyuki.kozakai@toshiba.co.jp>
+ */
+
+#include <linux/types.h>
+#include <linux/timer.h>
+#include <linux/module.h>
+#include <linux/netfilter.h>
+#include <linux/in6.h>
+#include <linux/icmpv6.h>
+#include <linux/ipv6.h>
+#include <net/ipv6.h>
+#include <net/ip6_checksum.h>
+#include <linux/seq_file.h>
+#include <linux/netfilter_ipv6.h>
+#include <net/netfilter/nf_conntrack_tuple.h>
+#include <net/netfilter/nf_conntrack_l4proto.h>
+#include <net/netfilter/nf_conntrack_core.h>
+#include <net/netfilter/nf_conntrack_timeout.h>
+#include <net/netfilter/nf_conntrack_zones.h>
+#include <net/netfilter/ipv6/nf_conntrack_icmpv6.h>
+#include <net/netfilter/nf_log.h>
+
+static const unsigned int nf_ct_icmpv6_timeout = 30*HZ;
+
+static inline struct nf_icmp_net *icmpv6_pernet(struct net *net)
+{
+	return &net->ct.nf_ct_proto.icmpv6;
+}
+
+static bool icmpv6_pkt_to_tuple(const struct sk_buff *skb,
+				unsigned int dataoff,
+				struct net *net,
+				struct nf_conntrack_tuple *tuple)
+{
+	const struct icmp6hdr *hp;
+	struct icmp6hdr _hdr;
+
+	hp = skb_header_pointer(skb, dataoff, sizeof(_hdr), &_hdr);
+	if (hp == NULL)
+		return false;
+	tuple->dst.u.icmp.type = hp->icmp6_type;
+	tuple->src.u.icmp.id = hp->icmp6_identifier;
+	tuple->dst.u.icmp.code = hp->icmp6_code;
+
+	return true;
+}
+
+/* Add 1; spaces filled with 0. */
+static const u_int8_t invmap[] = {
+	[ICMPV6_ECHO_REQUEST - 128]	= ICMPV6_ECHO_REPLY + 1,
+	[ICMPV6_ECHO_REPLY - 128]	= ICMPV6_ECHO_REQUEST + 1,
+	[ICMPV6_NI_QUERY - 128]		= ICMPV6_NI_REPLY + 1,
+	[ICMPV6_NI_REPLY - 128]		= ICMPV6_NI_QUERY + 1
+};
+
+static const u_int8_t noct_valid_new[] = {
+	[ICMPV6_MGM_QUERY - 130] = 1,
+	[ICMPV6_MGM_REPORT - 130] = 1,
+	[ICMPV6_MGM_REDUCTION - 130] = 1,
+	[NDISC_ROUTER_SOLICITATION - 130] = 1,
+	[NDISC_ROUTER_ADVERTISEMENT - 130] = 1,
+	[NDISC_NEIGHBOUR_SOLICITATION - 130] = 1,
+	[NDISC_NEIGHBOUR_ADVERTISEMENT - 130] = 1,
+	[ICMPV6_MLD2_REPORT - 130] = 1
+};
+
+static bool icmpv6_invert_tuple(struct nf_conntrack_tuple *tuple,
+				const struct nf_conntrack_tuple *orig)
+{
+	int type = orig->dst.u.icmp.type - 128;
+	if (type < 0 || type >= sizeof(invmap) || !invmap[type])
+		return false;
+
+	tuple->src.u.icmp.id   = orig->src.u.icmp.id;
+	tuple->dst.u.icmp.type = invmap[type] - 1;
+	tuple->dst.u.icmp.code = orig->dst.u.icmp.code;
+	return true;
+}
+
+static unsigned int *icmpv6_get_timeouts(struct net *net)
+{
+	return &icmpv6_pernet(net)->timeout;
+}
+
+/* Returns verdict for packet, or -1 for invalid. */
+static int icmpv6_packet(struct nf_conn *ct,
+			 const struct sk_buff *skb,
+			 unsigned int dataoff,
+			 enum ip_conntrack_info ctinfo)
+{
+	unsigned int *timeout = nf_ct_timeout_lookup(ct);
+
+	if (!timeout)
+		timeout = icmpv6_get_timeouts(nf_ct_net(ct));
+
+	/* Do not immediately delete the connection after the first
+	   successful reply to avoid excessive conntrackd traffic
+	   and also to handle correctly ICMP echo reply duplicates. */
+	nf_ct_refresh_acct(ct, ctinfo, skb, *timeout);
+
+	return NF_ACCEPT;
+}
+
+/* Called when a new connection for this protocol found. */
+static bool icmpv6_new(struct nf_conn *ct, const struct sk_buff *skb,
+		       unsigned int dataoff)
+{
+	static const u_int8_t valid_new[] = {
+		[ICMPV6_ECHO_REQUEST - 128] = 1,
+		[ICMPV6_NI_QUERY - 128] = 1
+	};
+	int type = ct->tuplehash[0].tuple.dst.u.icmp.type - 128;
+
+	if (type < 0 || type >= sizeof(valid_new) || !valid_new[type]) {
+		/* Can't create a new ICMPv6 `conn' with this. */
+		pr_debug("icmpv6: can't create new conn with type %u\n",
+			 type + 128);
+		nf_ct_dump_tuple_ipv6(&ct->tuplehash[0].tuple);
+		return false;
+	}
+	return true;
+}
+
+static int
+icmpv6_error_message(struct net *net, struct nf_conn *tmpl,
+		     struct sk_buff *skb,
+		     unsigned int icmp6off)
+{
+	struct nf_conntrack_tuple intuple, origtuple;
+	const struct nf_conntrack_tuple_hash *h;
+	const struct nf_conntrack_l4proto *inproto;
+	enum ip_conntrack_info ctinfo;
+	struct nf_conntrack_zone tmp;
+
+	WARN_ON(skb_nfct(skb));
+
+	/* Are they talking about one of our connections? */
+	if (!nf_ct_get_tuplepr(skb,
+			       skb_network_offset(skb)
+				+ sizeof(struct ipv6hdr)
+				+ sizeof(struct icmp6hdr),
+			       PF_INET6, net, &origtuple)) {
+		pr_debug("icmpv6_error: Can't get tuple\n");
+		return -NF_ACCEPT;
+	}
+
+	/* rcu_read_lock()ed by nf_hook_thresh */
+	inproto = __nf_ct_l4proto_find(PF_INET6, origtuple.dst.protonum);
+
+	/* Ordinarily, we'd expect the inverted tupleproto, but it's
+	   been preserved inside the ICMP. */
+	if (!nf_ct_invert_tuple(&intuple, &origtuple, inproto)) {
+		pr_debug("icmpv6_error: Can't invert tuple\n");
+		return -NF_ACCEPT;
+	}
+
+	ctinfo = IP_CT_RELATED;
+
+	h = nf_conntrack_find_get(net, nf_ct_zone_tmpl(tmpl, skb, &tmp),
+				  &intuple);
+	if (!h) {
+		pr_debug("icmpv6_error: no match\n");
+		return -NF_ACCEPT;
+	} else {
+		if (NF_CT_DIRECTION(h) == IP_CT_DIR_REPLY)
+			ctinfo += IP_CT_IS_REPLY;
+	}
+
+	/* Update skb to refer to this connection */
+	nf_ct_set(skb, nf_ct_tuplehash_to_ctrack(h), ctinfo);
+	return NF_ACCEPT;
+}
+
+static void icmpv6_error_log(const struct sk_buff *skb, struct net *net,
+			     u8 pf, const char *msg)
+{
+	nf_l4proto_log_invalid(skb, net, pf, IPPROTO_ICMPV6, "%s", msg);
+}
+
+static int
+icmpv6_error(struct net *net, struct nf_conn *tmpl,
+	     struct sk_buff *skb, unsigned int dataoff,
+	     u8 pf, unsigned int hooknum)
+{
+	const struct icmp6hdr *icmp6h;
+	struct icmp6hdr _ih;
+	int type;
+
+	icmp6h = skb_header_pointer(skb, dataoff, sizeof(_ih), &_ih);
+	if (icmp6h == NULL) {
+		icmpv6_error_log(skb, net, pf, "short packet");
+		return -NF_ACCEPT;
+	}
+
+	if (net->ct.sysctl_checksum && hooknum == NF_INET_PRE_ROUTING &&
+	    nf_ip6_checksum(skb, hooknum, dataoff, IPPROTO_ICMPV6)) {
+		icmpv6_error_log(skb, net, pf, "ICMPv6 checksum failed");
+		return -NF_ACCEPT;
+	}
+
+	type = icmp6h->icmp6_type - 130;
+	if (type >= 0 && type < sizeof(noct_valid_new) &&
+	    noct_valid_new[type]) {
+		nf_ct_set(skb, NULL, IP_CT_UNTRACKED);
+		return NF_ACCEPT;
+	}
+
+	/* is not error message ? */
+	if (icmp6h->icmp6_type >= 128)
+		return NF_ACCEPT;
+
+	return icmpv6_error_message(net, tmpl, skb, dataoff);
+}
+
+#if IS_ENABLED(CONFIG_NF_CT_NETLINK)
+
+#include <linux/netfilter/nfnetlink.h>
+#include <linux/netfilter/nfnetlink_conntrack.h>
+static int icmpv6_tuple_to_nlattr(struct sk_buff *skb,
+				  const struct nf_conntrack_tuple *t)
+{
+	if (nla_put_be16(skb, CTA_PROTO_ICMPV6_ID, t->src.u.icmp.id) ||
+	    nla_put_u8(skb, CTA_PROTO_ICMPV6_TYPE, t->dst.u.icmp.type) ||
+	    nla_put_u8(skb, CTA_PROTO_ICMPV6_CODE, t->dst.u.icmp.code))
+		goto nla_put_failure;
+	return 0;
+
+nla_put_failure:
+	return -1;
+}
+
+static const struct nla_policy icmpv6_nla_policy[CTA_PROTO_MAX+1] = {
+	[CTA_PROTO_ICMPV6_TYPE]	= { .type = NLA_U8 },
+	[CTA_PROTO_ICMPV6_CODE]	= { .type = NLA_U8 },
+	[CTA_PROTO_ICMPV6_ID]	= { .type = NLA_U16 },
+};
+
+static int icmpv6_nlattr_to_tuple(struct nlattr *tb[],
+				  struct nf_conntrack_tuple *tuple)
+{
+	if (!tb[CTA_PROTO_ICMPV6_TYPE] ||
+	    !tb[CTA_PROTO_ICMPV6_CODE] ||
+	    !tb[CTA_PROTO_ICMPV6_ID])
+		return -EINVAL;
+
+	tuple->dst.u.icmp.type = nla_get_u8(tb[CTA_PROTO_ICMPV6_TYPE]);
+	tuple->dst.u.icmp.code = nla_get_u8(tb[CTA_PROTO_ICMPV6_CODE]);
+	tuple->src.u.icmp.id = nla_get_be16(tb[CTA_PROTO_ICMPV6_ID]);
+
+	if (tuple->dst.u.icmp.type < 128 ||
+	    tuple->dst.u.icmp.type - 128 >= sizeof(invmap) ||
+	    !invmap[tuple->dst.u.icmp.type - 128])
+		return -EINVAL;
+
+	return 0;
+}
+
+static unsigned int icmpv6_nlattr_tuple_size(void)
+{
+	static unsigned int size __read_mostly;
+
+	if (!size)
+		size = nla_policy_len(icmpv6_nla_policy, CTA_PROTO_MAX + 1);
+
+	return size;
+}
+#endif
+
+#if IS_ENABLED(CONFIG_NF_CT_NETLINK_TIMEOUT)
+
+#include <linux/netfilter/nfnetlink.h>
+#include <linux/netfilter/nfnetlink_cttimeout.h>
+
+static int icmpv6_timeout_nlattr_to_obj(struct nlattr *tb[],
+					struct net *net, void *data)
+{
+	unsigned int *timeout = data;
+	struct nf_icmp_net *in = icmpv6_pernet(net);
+
+	if (!timeout)
+		timeout = icmpv6_get_timeouts(net);
+	if (tb[CTA_TIMEOUT_ICMPV6_TIMEOUT]) {
+		*timeout =
+		    ntohl(nla_get_be32(tb[CTA_TIMEOUT_ICMPV6_TIMEOUT])) * HZ;
+	} else {
+		/* Set default ICMPv6 timeout. */
+		*timeout = in->timeout;
+	}
+	return 0;
+}
+
+static int
+icmpv6_timeout_obj_to_nlattr(struct sk_buff *skb, const void *data)
+{
+	const unsigned int *timeout = data;
+
+	if (nla_put_be32(skb, CTA_TIMEOUT_ICMPV6_TIMEOUT, htonl(*timeout / HZ)))
+		goto nla_put_failure;
+	return 0;
+
+nla_put_failure:
+	return -ENOSPC;
+}
+
+static const struct nla_policy
+icmpv6_timeout_nla_policy[CTA_TIMEOUT_ICMPV6_MAX+1] = {
+	[CTA_TIMEOUT_ICMPV6_TIMEOUT]	= { .type = NLA_U32 },
+};
+#endif /* CONFIG_NF_CT_NETLINK_TIMEOUT */
+
+#ifdef CONFIG_SYSCTL
+static struct ctl_table icmpv6_sysctl_table[] = {
+	{
+		.procname	= "nf_conntrack_icmpv6_timeout",
+		.maxlen		= sizeof(unsigned int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_jiffies,
+	},
+	{ }
+};
+#endif /* CONFIG_SYSCTL */
+
+static int icmpv6_kmemdup_sysctl_table(struct nf_proto_net *pn,
+				       struct nf_icmp_net *in)
+{
+#ifdef CONFIG_SYSCTL
+	pn->ctl_table = kmemdup(icmpv6_sysctl_table,
+				sizeof(icmpv6_sysctl_table),
+				GFP_KERNEL);
+	if (!pn->ctl_table)
+		return -ENOMEM;
+
+	pn->ctl_table[0].data = &in->timeout;
+#endif
+	return 0;
+}
+
+static int icmpv6_init_net(struct net *net, u_int16_t proto)
+{
+	struct nf_icmp_net *in = icmpv6_pernet(net);
+	struct nf_proto_net *pn = &in->pn;
+
+	in->timeout = nf_ct_icmpv6_timeout;
+
+	return icmpv6_kmemdup_sysctl_table(pn, in);
+}
+
+static struct nf_proto_net *icmpv6_get_net_proto(struct net *net)
+{
+	return &net->ct.nf_ct_proto.icmpv6.pn;
+}
+
+const struct nf_conntrack_l4proto nf_conntrack_l4proto_icmpv6 =
+{
+	.l3proto		= PF_INET6,
+	.l4proto		= IPPROTO_ICMPV6,
+	.pkt_to_tuple		= icmpv6_pkt_to_tuple,
+	.invert_tuple		= icmpv6_invert_tuple,
+	.packet			= icmpv6_packet,
+	.new			= icmpv6_new,
+	.error			= icmpv6_error,
+#if IS_ENABLED(CONFIG_NF_CT_NETLINK)
+	.tuple_to_nlattr	= icmpv6_tuple_to_nlattr,
+	.nlattr_tuple_size	= icmpv6_nlattr_tuple_size,
+	.nlattr_to_tuple	= icmpv6_nlattr_to_tuple,
+	.nla_policy		= icmpv6_nla_policy,
+#endif
+#if IS_ENABLED(CONFIG_NF_CT_NETLINK_TIMEOUT)
+	.ctnl_timeout		= {
+		.nlattr_to_obj	= icmpv6_timeout_nlattr_to_obj,
+		.obj_to_nlattr	= icmpv6_timeout_obj_to_nlattr,
+		.nlattr_max	= CTA_TIMEOUT_ICMP_MAX,
+		.obj_size	= sizeof(unsigned int),
+		.nla_policy	= icmpv6_timeout_nla_policy,
+	},
+#endif /* CONFIG_NF_CT_NETLINK_TIMEOUT */
+	.init_net		= icmpv6_init_net,
+	.get_net_proto		= icmpv6_get_net_proto,
+};
diff --git a/net/netfilter/nf_conntrack_proto_sctp.c b/net/netfilter/nf_conntrack_proto_sctp.c
index fb9a35d16069..8d1e085fc14a 100644
--- a/net/netfilter/nf_conntrack_proto_sctp.c
+++ b/net/netfilter/nf_conntrack_proto_sctp.c
@@ -28,6 +28,7 @@
 #include <net/netfilter/nf_conntrack.h>
 #include <net/netfilter/nf_conntrack_l4proto.h>
 #include <net/netfilter/nf_conntrack_ecache.h>
+#include <net/netfilter/nf_conntrack_timeout.h>
 
 /* FIXME: Examine ipfilter's timeouts and conntrack transitions more
    closely. They're more complex. --RR
@@ -150,30 +151,6 @@ static inline struct nf_sctp_net *sctp_pernet(struct net *net)
 	return &net->ct.nf_ct_proto.sctp;
 }
 
-static bool sctp_pkt_to_tuple(const struct sk_buff *skb, unsigned int dataoff,
-			      struct net *net, struct nf_conntrack_tuple *tuple)
-{
-	const struct sctphdr *hp;
-	struct sctphdr _hdr;
-
-	/* Actually only need first 4 bytes to get ports. */
-	hp = skb_header_pointer(skb, dataoff, 4, &_hdr);
-	if (hp == NULL)
-		return false;
-
-	tuple->src.u.sctp.port = hp->source;
-	tuple->dst.u.sctp.port = hp->dest;
-	return true;
-}
-
-static bool sctp_invert_tuple(struct nf_conntrack_tuple *tuple,
-			      const struct nf_conntrack_tuple *orig)
-{
-	tuple->src.u.sctp.port = orig->dst.u.sctp.port;
-	tuple->dst.u.sctp.port = orig->src.u.sctp.port;
-	return true;
-}
-
 #ifdef CONFIG_NF_CONNTRACK_PROCFS
 /* Print out the private part of the conntrack. */
 static void sctp_print_conntrack(struct seq_file *s, struct nf_conn *ct)
@@ -296,17 +273,11 @@ static int sctp_new_state(enum ip_conntrack_dir dir,
 	return sctp_conntracks[dir][i][cur_state];
 }
 
-static unsigned int *sctp_get_timeouts(struct net *net)
-{
-	return sctp_pernet(net)->timeouts;
-}
-
 /* Returns verdict for packet, or -NF_ACCEPT for invalid. */
 static int sctp_packet(struct nf_conn *ct,
 		       const struct sk_buff *skb,
 		       unsigned int dataoff,
-		       enum ip_conntrack_info ctinfo,
-		       unsigned int *timeouts)
+		       enum ip_conntrack_info ctinfo)
 {
 	enum sctp_conntrack new_state, old_state;
 	enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);
@@ -315,6 +286,7 @@ static int sctp_packet(struct nf_conn *ct,
 	const struct sctp_chunkhdr *sch;
 	struct sctp_chunkhdr _sch;
 	u_int32_t offset, count;
+	unsigned int *timeouts;
 	unsigned long map[256 / sizeof(unsigned long)] = { 0 };
 
 	sh = skb_header_pointer(skb, dataoff, sizeof(_sctph), &_sctph);
@@ -403,6 +375,10 @@ static int sctp_packet(struct nf_conn *ct,
 	}
 	spin_unlock_bh(&ct->lock);
 
+	timeouts = nf_ct_timeout_lookup(ct);
+	if (!timeouts)
+		timeouts = sctp_pernet(nf_ct_net(ct))->timeouts;
+
 	nf_ct_refresh_acct(ct, ctinfo, skb, timeouts[new_state]);
 
 	if (old_state == SCTP_CONNTRACK_COOKIE_ECHOED &&
@@ -423,7 +399,7 @@ out:
 
 /* Called when a new connection for this protocol found. */
 static bool sctp_new(struct nf_conn *ct, const struct sk_buff *skb,
-		     unsigned int dataoff, unsigned int *timeouts)
+		     unsigned int dataoff)
 {
 	enum sctp_conntrack new_state;
 	const struct sctphdr *sh;
@@ -780,13 +756,10 @@ static struct nf_proto_net *sctp_get_net_proto(struct net *net)
 const struct nf_conntrack_l4proto nf_conntrack_l4proto_sctp4 = {
 	.l3proto		= PF_INET,
 	.l4proto		= IPPROTO_SCTP,
-	.pkt_to_tuple		= sctp_pkt_to_tuple,
-	.invert_tuple		= sctp_invert_tuple,
 #ifdef CONFIG_NF_CONNTRACK_PROCFS
 	.print_conntrack	= sctp_print_conntrack,
 #endif
 	.packet			= sctp_packet,
-	.get_timeouts		= sctp_get_timeouts,
 	.new			= sctp_new,
 	.error			= sctp_error,
 	.can_early_drop		= sctp_can_early_drop,
@@ -817,13 +790,10 @@ EXPORT_SYMBOL_GPL(nf_conntrack_l4proto_sctp4);
 const struct nf_conntrack_l4proto nf_conntrack_l4proto_sctp6 = {
 	.l3proto		= PF_INET6,
 	.l4proto		= IPPROTO_SCTP,
-	.pkt_to_tuple		= sctp_pkt_to_tuple,
-	.invert_tuple		= sctp_invert_tuple,
 #ifdef CONFIG_NF_CONNTRACK_PROCFS
 	.print_conntrack	= sctp_print_conntrack,
 #endif
 	.packet			= sctp_packet,
-	.get_timeouts		= sctp_get_timeouts,
 	.new			= sctp_new,
 	.error			= sctp_error,
 	.can_early_drop		= sctp_can_early_drop,
diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
index 8e67910185a0..d80d322b9d8b 100644
--- a/net/netfilter/nf_conntrack_proto_tcp.c
+++ b/net/netfilter/nf_conntrack_proto_tcp.c
@@ -29,6 +29,7 @@
 #include <net/netfilter/nf_conntrack_ecache.h>
 #include <net/netfilter/nf_conntrack_seqadj.h>
 #include <net/netfilter/nf_conntrack_synproxy.h>
+#include <net/netfilter/nf_conntrack_timeout.h>
 #include <net/netfilter/nf_log.h>
 #include <net/netfilter/ipv4/nf_conntrack_ipv4.h>
 #include <net/netfilter/ipv6/nf_conntrack_ipv6.h>
@@ -276,31 +277,6 @@ static inline struct nf_tcp_net *tcp_pernet(struct net *net)
 	return &net->ct.nf_ct_proto.tcp;
 }
 
-static bool tcp_pkt_to_tuple(const struct sk_buff *skb, unsigned int dataoff,
-			     struct net *net, struct nf_conntrack_tuple *tuple)
-{
-	const struct tcphdr *hp;
-	struct tcphdr _hdr;
-
-	/* Actually only need first 4 bytes to get ports. */
-	hp = skb_header_pointer(skb, dataoff, 4, &_hdr);
-	if (hp == NULL)
-		return false;
-
-	tuple->src.u.tcp.port = hp->source;
-	tuple->dst.u.tcp.port = hp->dest;
-
-	return true;
-}
-
-static bool tcp_invert_tuple(struct nf_conntrack_tuple *tuple,
-			     const struct nf_conntrack_tuple *orig)
-{
-	tuple->src.u.tcp.port = orig->dst.u.tcp.port;
-	tuple->dst.u.tcp.port = orig->src.u.tcp.port;
-	return true;
-}
-
 #ifdef CONFIG_NF_CONNTRACK_PROCFS
 /* Print out the private part of the conntrack. */
 static void tcp_print_conntrack(struct seq_file *s, struct nf_conn *ct)
@@ -793,27 +769,21 @@ static int tcp_error(struct net *net, struct nf_conn *tmpl,
 	return NF_ACCEPT;
 }
 
-static unsigned int *tcp_get_timeouts(struct net *net)
-{
-	return tcp_pernet(net)->timeouts;
-}
-
 /* Returns verdict for packet, or -1 for invalid. */
 static int tcp_packet(struct nf_conn *ct,
 		      const struct sk_buff *skb,
 		      unsigned int dataoff,
-		      enum ip_conntrack_info ctinfo,
-		      unsigned int *timeouts)
+		      enum ip_conntrack_info ctinfo)
 {
 	struct net *net = nf_ct_net(ct);
 	struct nf_tcp_net *tn = tcp_pernet(net);
 	struct nf_conntrack_tuple *tuple;
 	enum tcp_conntrack new_state, old_state;
+	unsigned int index, *timeouts;
 	enum ip_conntrack_dir dir;
 	const struct tcphdr *th;
 	struct tcphdr _tcph;
 	unsigned long timeout;
-	unsigned int index;
 
 	th = skb_header_pointer(skb, dataoff, sizeof(_tcph), &_tcph);
 	BUG_ON(th == NULL);
@@ -1046,6 +1016,10 @@ static int tcp_packet(struct nf_conn *ct,
 	    && new_state == TCP_CONNTRACK_FIN_WAIT)
 		ct->proto.tcp.seen[dir].flags |= IP_CT_TCP_FLAG_CLOSE_INIT;
 
+	timeouts = nf_ct_timeout_lookup(ct);
+	if (!timeouts)
+		timeouts = tn->timeouts;
+
 	if (ct->proto.tcp.retrans >= tn->tcp_max_retrans &&
 	    timeouts[new_state] > timeouts[TCP_CONNTRACK_RETRANS])
 		timeout = timeouts[TCP_CONNTRACK_RETRANS];
@@ -1095,7 +1069,7 @@ static int tcp_packet(struct nf_conn *ct,
 
 /* Called when a new connection for this protocol found. */
 static bool tcp_new(struct nf_conn *ct, const struct sk_buff *skb,
-		    unsigned int dataoff, unsigned int *timeouts)
+		    unsigned int dataoff)
 {
 	enum tcp_conntrack new_state;
 	const struct tcphdr *th;
@@ -1313,10 +1287,12 @@ static unsigned int tcp_nlattr_tuple_size(void)
 static int tcp_timeout_nlattr_to_obj(struct nlattr *tb[],
 				     struct net *net, void *data)
 {
-	unsigned int *timeouts = data;
 	struct nf_tcp_net *tn = tcp_pernet(net);
+	unsigned int *timeouts = data;
 	int i;
 
+	if (!timeouts)
+		timeouts = tn->timeouts;
 	/* set default TCP timeouts. */
 	for (i=0; i<TCP_CONNTRACK_TIMEOUT_MAX; i++)
 		timeouts[i] = tn->timeouts[i];
@@ -1559,13 +1535,10 @@ const struct nf_conntrack_l4proto nf_conntrack_l4proto_tcp4 =
 {
 	.l3proto		= PF_INET,
 	.l4proto		= IPPROTO_TCP,
-	.pkt_to_tuple		= tcp_pkt_to_tuple,
-	.invert_tuple		= tcp_invert_tuple,
 #ifdef CONFIG_NF_CONNTRACK_PROCFS
 	.print_conntrack	= tcp_print_conntrack,
 #endif
 	.packet			= tcp_packet,
-	.get_timeouts		= tcp_get_timeouts,
 	.new			= tcp_new,
 	.error			= tcp_error,
 	.can_early_drop		= tcp_can_early_drop,
@@ -1597,13 +1570,10 @@ const struct nf_conntrack_l4proto nf_conntrack_l4proto_tcp6 =
 {
 	.l3proto		= PF_INET6,
 	.l4proto		= IPPROTO_TCP,
-	.pkt_to_tuple		= tcp_pkt_to_tuple,
-	.invert_tuple		= tcp_invert_tuple,
 #ifdef CONFIG_NF_CONNTRACK_PROCFS
 	.print_conntrack	= tcp_print_conntrack,
 #endif
 	.packet			= tcp_packet,
-	.get_timeouts		= tcp_get_timeouts,
 	.new			= tcp_new,
 	.error			= tcp_error,
 	.can_early_drop		= tcp_can_early_drop,
diff --git a/net/netfilter/nf_conntrack_proto_udp.c b/net/netfilter/nf_conntrack_proto_udp.c
index fe7243970aa4..7a1b8988a931 100644
--- a/net/netfilter/nf_conntrack_proto_udp.c
+++ b/net/netfilter/nf_conntrack_proto_udp.c
@@ -22,6 +22,7 @@
 #include <linux/netfilter_ipv6.h>
 #include <net/netfilter/nf_conntrack_l4proto.h>
 #include <net/netfilter/nf_conntrack_ecache.h>
+#include <net/netfilter/nf_conntrack_timeout.h>
 #include <net/netfilter/nf_log.h>
 #include <net/netfilter/ipv4/nf_conntrack_ipv4.h>
 #include <net/netfilter/ipv6/nf_conntrack_ipv6.h>
@@ -36,33 +37,6 @@ static inline struct nf_udp_net *udp_pernet(struct net *net)
 	return &net->ct.nf_ct_proto.udp;
 }
 
-static bool udp_pkt_to_tuple(const struct sk_buff *skb,
-			     unsigned int dataoff,
-			     struct net *net,
-			     struct nf_conntrack_tuple *tuple)
-{
-	const struct udphdr *hp;
-	struct udphdr _hdr;
-
-	/* Actually only need first 4 bytes to get ports. */
-	hp = skb_header_pointer(skb, dataoff, 4, &_hdr);
-	if (hp == NULL)
-		return false;
-
-	tuple->src.u.udp.port = hp->source;
-	tuple->dst.u.udp.port = hp->dest;
-
-	return true;
-}
-
-static bool udp_invert_tuple(struct nf_conntrack_tuple *tuple,
-			     const struct nf_conntrack_tuple *orig)
-{
-	tuple->src.u.udp.port = orig->dst.u.udp.port;
-	tuple->dst.u.udp.port = orig->src.u.udp.port;
-	return true;
-}
-
 static unsigned int *udp_get_timeouts(struct net *net)
 {
 	return udp_pernet(net)->timeouts;
@@ -72,9 +46,14 @@ static unsigned int *udp_get_timeouts(struct net *net)
 static int udp_packet(struct nf_conn *ct,
 		      const struct sk_buff *skb,
 		      unsigned int dataoff,
-		      enum ip_conntrack_info ctinfo,
-		      unsigned int *timeouts)
+		      enum ip_conntrack_info ctinfo)
 {
+	unsigned int *timeouts;
+
+	timeouts = nf_ct_timeout_lookup(ct);
+	if (!timeouts)
+		timeouts = udp_get_timeouts(nf_ct_net(ct));
+
 	/* If we've seen traffic both ways, this is some kind of UDP
 	   stream. Extend timeout. */
 	if (test_bit(IPS_SEEN_REPLY_BIT, &ct->status)) {
@@ -92,7 +71,7 @@ static int udp_packet(struct nf_conn *ct,
 
 /* Called when a new connection for this protocol found. */
 static bool udp_new(struct nf_conn *ct, const struct sk_buff *skb,
-		    unsigned int dataoff, unsigned int *timeouts)
+		    unsigned int dataoff)
 {
 	return true;
 }
@@ -203,6 +182,9 @@ static int udp_timeout_nlattr_to_obj(struct nlattr *tb[],
 	unsigned int *timeouts = data;
 	struct nf_udp_net *un = udp_pernet(net);
 
+	if (!timeouts)
+		timeouts = un->timeouts;
+
 	/* set default timeouts for UDP. */
 	timeouts[UDP_CT_UNREPLIED] = un->timeouts[UDP_CT_UNREPLIED];
 	timeouts[UDP_CT_REPLIED] = un->timeouts[UDP_CT_REPLIED];
@@ -301,10 +283,7 @@ const struct nf_conntrack_l4proto nf_conntrack_l4proto_udp4 =
 	.l3proto		= PF_INET,
 	.l4proto		= IPPROTO_UDP,
 	.allow_clash		= true,
-	.pkt_to_tuple		= udp_pkt_to_tuple,
-	.invert_tuple		= udp_invert_tuple,
 	.packet			= udp_packet,
-	.get_timeouts		= udp_get_timeouts,
 	.new			= udp_new,
 	.error			= udp_error,
 #if IS_ENABLED(CONFIG_NF_CT_NETLINK)
@@ -333,10 +312,7 @@ const struct nf_conntrack_l4proto nf_conntrack_l4proto_udplite4 =
 	.l3proto		= PF_INET,
 	.l4proto		= IPPROTO_UDPLITE,
 	.allow_clash		= true,
-	.pkt_to_tuple		= udp_pkt_to_tuple,
-	.invert_tuple		= udp_invert_tuple,
 	.packet			= udp_packet,
-	.get_timeouts		= udp_get_timeouts,
 	.new			= udp_new,
 	.error			= udplite_error,
 #if IS_ENABLED(CONFIG_NF_CT_NETLINK)
@@ -365,10 +341,7 @@ const struct nf_conntrack_l4proto nf_conntrack_l4proto_udp6 =
 	.l3proto		= PF_INET6,
 	.l4proto		= IPPROTO_UDP,
 	.allow_clash		= true,
-	.pkt_to_tuple		= udp_pkt_to_tuple,
-	.invert_tuple		= udp_invert_tuple,
 	.packet			= udp_packet,
-	.get_timeouts		= udp_get_timeouts,
 	.new			= udp_new,
 	.error			= udp_error,
 #if IS_ENABLED(CONFIG_NF_CT_NETLINK)
@@ -397,10 +370,7 @@ const struct nf_conntrack_l4proto nf_conntrack_l4proto_udplite6 =
 	.l3proto		= PF_INET6,
 	.l4proto		= IPPROTO_UDPLITE,
 	.allow_clash		= true,
-	.pkt_to_tuple		= udp_pkt_to_tuple,
-	.invert_tuple		= udp_invert_tuple,
 	.packet			= udp_packet,
-	.get_timeouts		= udp_get_timeouts,
 	.new			= udp_new,
 	.error			= udplite_error,
 #if IS_ENABLED(CONFIG_NF_CT_NETLINK)
@@ -423,3 +393,4 @@ const struct nf_conntrack_l4proto nf_conntrack_l4proto_udplite6 =
 };
 EXPORT_SYMBOL_GPL(nf_conntrack_l4proto_udplite6);
 #endif
+#include <net/netfilter/nf_conntrack_timeout.h>
diff --git a/net/netfilter/nf_conntrack_standalone.c b/net/netfilter/nf_conntrack_standalone.c
index b642c0b2495c..13279f683da9 100644
--- a/net/netfilter/nf_conntrack_standalone.c
+++ b/net/netfilter/nf_conntrack_standalone.c
@@ -1,12 +1,4 @@
-/* (C) 1999-2001 Paul `Rusty' Russell
- * (C) 2002-2004 Netfilter Core Team <coreteam@netfilter.org>
- * (C) 2005-2012 Patrick McHardy <kaber@trash.net>
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- */
-
+// SPDX-License-Identifier: GPL-2.0
 #include <linux/types.h>
 #include <linux/netfilter.h>
 #include <linux/slab.h>
@@ -24,7 +16,6 @@
 
 #include <net/netfilter/nf_conntrack.h>
 #include <net/netfilter/nf_conntrack_core.h>
-#include <net/netfilter/nf_conntrack_l3proto.h>
 #include <net/netfilter/nf_conntrack_l4proto.h>
 #include <net/netfilter/nf_conntrack_expect.h>
 #include <net/netfilter/nf_conntrack_helper.h>
@@ -33,15 +24,14 @@
 #include <net/netfilter/nf_conntrack_timestamp.h>
 #include <linux/rculist_nulls.h>
 
-MODULE_LICENSE("GPL");
+unsigned int nf_conntrack_net_id __read_mostly;
 
 #ifdef CONFIG_NF_CONNTRACK_PROCFS
 void
 print_tuple(struct seq_file *s, const struct nf_conntrack_tuple *tuple,
-	    const struct nf_conntrack_l3proto *l3proto,
 	    const struct nf_conntrack_l4proto *l4proto)
 {
-	switch (l3proto->l3proto) {
+	switch (tuple->src.l3num) {
 	case NFPROTO_IPV4:
 		seq_printf(s, "src=%pI4 dst=%pI4 ",
 			   &tuple->src.u3.ip, &tuple->dst.u3.ip);
@@ -282,7 +272,6 @@ static int ct_seq_show(struct seq_file *s, void *v)
 {
 	struct nf_conntrack_tuple_hash *hash = v;
 	struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(hash);
-	const struct nf_conntrack_l3proto *l3proto;
 	const struct nf_conntrack_l4proto *l4proto;
 	struct net *net = seq_file_net(s);
 	int ret = 0;
@@ -303,14 +292,12 @@ static int ct_seq_show(struct seq_file *s, void *v)
 	if (!net_eq(nf_ct_net(ct), net))
 		goto release;
 
-	l3proto = __nf_ct_l3proto_find(nf_ct_l3num(ct));
-	WARN_ON(!l3proto);
 	l4proto = __nf_ct_l4proto_find(nf_ct_l3num(ct), nf_ct_protonum(ct));
 	WARN_ON(!l4proto);
 
 	ret = -ENOSPC;
 	seq_printf(s, "%-8s %u %-8s %u ",
-		   l3proto_name(l3proto->l3proto), nf_ct_l3num(ct),
+
l3proto_name(nf_ct_l3num(ct)), nf_ct_l3num(ct), l4proto_name(l4proto->l4proto), nf_ct_protonum(ct)); if (!test_bit(IPS_OFFLOAD_BIT, &ct->status)) @@ -320,7 +307,7 @@ static int ct_seq_show(struct seq_file *s, void *v) l4proto->print_conntrack(s, ct); print_tuple(s, &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple, - l3proto, l4proto); + l4proto); ct_show_zone(s, ct, NF_CT_ZONE_DIR_ORIG); @@ -333,8 +320,7 @@ static int ct_seq_show(struct seq_file *s, void *v) if (!(test_bit(IPS_SEEN_REPLY_BIT, &ct->status))) seq_puts(s, "[UNREPLIED] "); - print_tuple(s, &ct->tuplehash[IP_CT_DIR_REPLY].tuple, - l3proto, l4proto); + print_tuple(s, &ct->tuplehash[IP_CT_DIR_REPLY].tuple, l4proto); ct_show_zone(s, ct, NF_CT_ZONE_DIR_REPL); @@ -680,6 +666,8 @@ static void nf_conntrack_pernet_exit(struct list_head *net_exit_list) static struct pernet_operations nf_conntrack_net_ops = { .init = nf_conntrack_pernet_init, .exit_batch = nf_conntrack_pernet_exit, + .id = &nf_conntrack_net_id, + .size = sizeof(struct nf_conntrack_net), }; static int __init nf_conntrack_standalone_init(void) diff --git a/net/netfilter/nf_conntrack_timeout.c b/net/netfilter/nf_conntrack_timeout.c index 46aee65f339b..91fbd183da2d 100644 --- a/net/netfilter/nf_conntrack_timeout.c +++ b/net/netfilter/nf_conntrack_timeout.c @@ -24,13 +24,30 @@ #include <net/netfilter/nf_conntrack_extend.h> #include <net/netfilter/nf_conntrack_timeout.h> -struct ctnl_timeout * +struct nf_ct_timeout * (*nf_ct_timeout_find_get_hook)(struct net *net, const char *name) __read_mostly; EXPORT_SYMBOL_GPL(nf_ct_timeout_find_get_hook); -void (*nf_ct_timeout_put_hook)(struct ctnl_timeout *timeout) __read_mostly; +void (*nf_ct_timeout_put_hook)(struct nf_ct_timeout *timeout) __read_mostly; EXPORT_SYMBOL_GPL(nf_ct_timeout_put_hook); +static int untimeout(struct nf_conn *ct, void *timeout) +{ + struct nf_conn_timeout *timeout_ext = nf_ct_timeout_find(ct); + + if (timeout_ext && (!timeout || timeout_ext->timeout == timeout)) + 
RCU_INIT_POINTER(timeout_ext->timeout, NULL); + + /* We are not intended to delete this conntrack. */ + return 0; +} + +void nf_ct_untimeout(struct net *net, struct nf_ct_timeout *timeout) +{ + nf_ct_iterate_cleanup_net(net, untimeout, timeout, 0, 0); +} +EXPORT_SYMBOL_GPL(nf_ct_untimeout); + static const struct nf_ct_ext_type timeout_extend = { .len = sizeof(struct nf_conn_timeout), .align = __alignof__(struct nf_conn_timeout), diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c index eb0d1658ac05..d8125616edc7 100644 --- a/net/netfilter/nf_flow_table_core.c +++ b/net/netfilter/nf_flow_table_core.c @@ -107,11 +107,12 @@ static void flow_offload_fixup_tcp(struct ip_ct_tcp *tcp) tcp->seen[1].td_maxwin = 0; } +#define NF_FLOWTABLE_TCP_PICKUP_TIMEOUT (120 * HZ) +#define NF_FLOWTABLE_UDP_PICKUP_TIMEOUT (30 * HZ) + static void flow_offload_fixup_ct_state(struct nf_conn *ct) { const struct nf_conntrack_l4proto *l4proto; - struct net *net = nf_ct_net(ct); - unsigned int *timeouts; unsigned int timeout; int l4num; @@ -123,14 +124,10 @@ static void flow_offload_fixup_ct_state(struct nf_conn *ct) if (!l4proto) return; - timeouts = l4proto->get_timeouts(net); - if (!timeouts) - return; - if (l4num == IPPROTO_TCP) - timeout = timeouts[TCP_CONNTRACK_ESTABLISHED]; + timeout = NF_FLOWTABLE_TCP_PICKUP_TIMEOUT; else if (l4num == IPPROTO_UDP) - timeout = timeouts[UDP_CT_REPLIED]; + timeout = NF_FLOWTABLE_UDP_PICKUP_TIMEOUT; else return; diff --git a/net/netfilter/nf_log_common.c b/net/netfilter/nf_log_common.c index dc61399e30be..a8c5c846aec1 100644 --- a/net/netfilter/nf_log_common.c +++ b/net/netfilter/nf_log_common.c @@ -132,9 +132,10 @@ int nf_log_dump_tcp_header(struct nf_log_buf *m, const struct sk_buff *skb, } EXPORT_SYMBOL_GPL(nf_log_dump_tcp_header); -void nf_log_dump_sk_uid_gid(struct nf_log_buf *m, struct sock *sk) +void nf_log_dump_sk_uid_gid(struct net *net, struct nf_log_buf *m, + struct sock *sk) { - if (!sk || !sk_fullsock(sk)) + if 
(!sk || !sk_fullsock(sk) || !net_eq(net, sock_net(sk))) return; read_lock_bh(&sk->sk_callback_lock); diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c index 46f9df99d276..e2b196054dfc 100644 --- a/net/netfilter/nf_nat_core.c +++ b/net/netfilter/nf_nat_core.c @@ -28,7 +28,6 @@ #include <net/netfilter/nf_nat_helper.h> #include <net/netfilter/nf_conntrack_helper.h> #include <net/netfilter/nf_conntrack_seqadj.h> -#include <net/netfilter/nf_conntrack_l3proto.h> #include <net/netfilter/nf_conntrack_zones.h> #include <linux/netfilter/nf_nat.h> @@ -108,6 +107,7 @@ int nf_xfrm_me_harder(struct net *net, struct sk_buff *skb, unsigned int family) struct flowi fl; unsigned int hh_len; struct dst_entry *dst; + struct sock *sk = skb->sk; int err; err = xfrm_decode_session(skb, &fl, family); @@ -119,7 +119,10 @@ int nf_xfrm_me_harder(struct net *net, struct sk_buff *skb, unsigned int family) dst = ((struct xfrm_dst *)dst)->route; dst_hold(dst); - dst = xfrm_lookup(net, dst, &fl, skb->sk, 0); + if (sk && !net_eq(net, sock_net(sk))) + sk = NULL; + + dst = xfrm_lookup(net, dst, &fl, sk, 0); if (IS_ERR(dst)) return PTR_ERR(dst); @@ -739,12 +742,6 @@ EXPORT_SYMBOL_GPL(nf_nat_l4proto_unregister); int nf_nat_l3proto_register(const struct nf_nat_l3proto *l3proto) { - int err; - - err = nf_ct_l3proto_try_module_get(l3proto->l3proto); - if (err < 0) - return err; - mutex_lock(&nf_nat_proto_mutex); RCU_INIT_POINTER(nf_nat_l4protos[l3proto->l3proto][IPPROTO_TCP], &nf_nat_l4proto_tcp); @@ -777,7 +774,6 @@ void nf_nat_l3proto_unregister(const struct nf_nat_l3proto *l3proto) synchronize_rcu(); nf_nat_l3proto_clean(l3proto->l3proto); - nf_ct_l3proto_module_put(l3proto->l3proto); } EXPORT_SYMBOL_GPL(nf_nat_l3proto_unregister); @@ -1060,7 +1056,7 @@ static int __init nf_nat_init(void) ret = nf_ct_extend_register(&nat_extend); if (ret < 0) { - nf_ct_free_hashtable(nf_nat_bysource, nf_nat_htable_size); + kvfree(nf_nat_bysource); pr_err("Unable to register extension\n"); return 
ret; } @@ -1098,7 +1094,7 @@ static void __exit nf_nat_cleanup(void) for (i = 0; i < NFPROTO_NUMPROTO; i++) kfree(nf_nat_l4protos[i]); synchronize_net(); - nf_ct_free_hashtable(nf_nat_bysource, nf_nat_htable_size); + kvfree(nf_nat_bysource); unregister_pernet_subsys(&nat_net_ops); } diff --git a/net/netfilter/nf_osf.c b/net/netfilter/nf_osf.c deleted file mode 100644 index 5ba5c7bef2f9..000000000000 --- a/net/netfilter/nf_osf.c +++ /dev/null @@ -1,218 +0,0 @@ -#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt -#include <linux/module.h> -#include <linux/kernel.h> - -#include <linux/capability.h> -#include <linux/if.h> -#include <linux/inetdevice.h> -#include <linux/ip.h> -#include <linux/list.h> -#include <linux/rculist.h> -#include <linux/skbuff.h> -#include <linux/slab.h> -#include <linux/tcp.h> - -#include <net/ip.h> -#include <net/tcp.h> - -#include <linux/netfilter/nfnetlink.h> -#include <linux/netfilter/x_tables.h> -#include <net/netfilter/nf_log.h> -#include <linux/netfilter/nf_osf.h> - -static inline int nf_osf_ttl(const struct sk_buff *skb, - const struct nf_osf_info *info, - unsigned char f_ttl) -{ - const struct iphdr *ip = ip_hdr(skb); - - if (info->flags & NF_OSF_TTL) { - if (info->ttl == NF_OSF_TTL_TRUE) - return ip->ttl == f_ttl; - if (info->ttl == NF_OSF_TTL_NOCHECK) - return 1; - else if (ip->ttl <= f_ttl) - return 1; - else { - struct in_device *in_dev = __in_dev_get_rcu(skb->dev); - int ret = 0; - - for_ifa(in_dev) { - if (inet_ifa_match(ip->saddr, ifa)) { - ret = (ip->ttl == f_ttl); - break; - } - } - endfor_ifa(in_dev); - - return ret; - } - } - - return ip->ttl == f_ttl; -} - -bool -nf_osf_match(const struct sk_buff *skb, u_int8_t family, - int hooknum, struct net_device *in, struct net_device *out, - const struct nf_osf_info *info, struct net *net, - const struct list_head *nf_osf_fingers) -{ - const unsigned char *optp = NULL, *_optp = NULL; - unsigned int optsize = 0, check_WSS = 0; - int fmatch = FMATCH_WRONG, fcount = 0; - const struct iphdr *ip 
= ip_hdr(skb); - const struct nf_osf_user_finger *f; - unsigned char opts[MAX_IPOPTLEN]; - const struct nf_osf_finger *kf; - u16 window, totlen, mss = 0; - const struct tcphdr *tcp; - struct tcphdr _tcph; - bool df; - - tcp = skb_header_pointer(skb, ip_hdrlen(skb), sizeof(struct tcphdr), &_tcph); - if (!tcp) - return false; - - if (!tcp->syn) - return false; - - totlen = ntohs(ip->tot_len); - df = ntohs(ip->frag_off) & IP_DF; - window = ntohs(tcp->window); - - if (tcp->doff * 4 > sizeof(struct tcphdr)) { - optsize = tcp->doff * 4 - sizeof(struct tcphdr); - - _optp = optp = skb_header_pointer(skb, ip_hdrlen(skb) + - sizeof(struct tcphdr), optsize, opts); - } - - list_for_each_entry_rcu(kf, &nf_osf_fingers[df], finger_entry) { - int foptsize, optnum; - - f = &kf->finger; - - if (!(info->flags & NF_OSF_LOG) && strcmp(info->genre, f->genre)) - continue; - - optp = _optp; - fmatch = FMATCH_WRONG; - - if (totlen != f->ss || !nf_osf_ttl(skb, info, f->ttl)) - continue; - - /* - * Should not happen if userspace parser was written correctly. 
- */ - if (f->wss.wc >= OSF_WSS_MAX) - continue; - - /* Check options */ - - foptsize = 0; - for (optnum = 0; optnum < f->opt_num; ++optnum) - foptsize += f->opt[optnum].length; - - if (foptsize > MAX_IPOPTLEN || - optsize > MAX_IPOPTLEN || - optsize != foptsize) - continue; - - check_WSS = f->wss.wc; - - for (optnum = 0; optnum < f->opt_num; ++optnum) { - if (f->opt[optnum].kind == (*optp)) { - __u32 len = f->opt[optnum].length; - const __u8 *optend = optp + len; - - fmatch = FMATCH_OK; - - switch (*optp) { - case OSFOPT_MSS: - mss = optp[3]; - mss <<= 8; - mss |= optp[2]; - - mss = ntohs((__force __be16)mss); - break; - case OSFOPT_TS: - break; - } - - optp = optend; - } else - fmatch = FMATCH_OPT_WRONG; - - if (fmatch != FMATCH_OK) - break; - } - - if (fmatch != FMATCH_OPT_WRONG) { - fmatch = FMATCH_WRONG; - - switch (check_WSS) { - case OSF_WSS_PLAIN: - if (f->wss.val == 0 || window == f->wss.val) - fmatch = FMATCH_OK; - break; - case OSF_WSS_MSS: - /* - * Some smart modems decrease mangle MSS to - * SMART_MSS_2, so we check standard, decreased - * and the one provided in the fingerprint MSS - * values. 
- */ -#define SMART_MSS_1 1460 -#define SMART_MSS_2 1448 - if (window == f->wss.val * mss || - window == f->wss.val * SMART_MSS_1 || - window == f->wss.val * SMART_MSS_2) - fmatch = FMATCH_OK; - break; - case OSF_WSS_MTU: - if (window == f->wss.val * (mss + 40) || - window == f->wss.val * (SMART_MSS_1 + 40) || - window == f->wss.val * (SMART_MSS_2 + 40)) - fmatch = FMATCH_OK; - break; - case OSF_WSS_MODULO: - if ((window % f->wss.val) == 0) - fmatch = FMATCH_OK; - break; - } - } - - if (fmatch != FMATCH_OK) - continue; - - fcount++; - - if (info->flags & NF_OSF_LOG) - nf_log_packet(net, family, hooknum, skb, - in, out, NULL, - "%s [%s:%s] : %pI4:%d -> %pI4:%d hops=%d\n", - f->genre, f->version, f->subtype, - &ip->saddr, ntohs(tcp->source), - &ip->daddr, ntohs(tcp->dest), - f->ttl - ip->ttl); - - if ((info->flags & NF_OSF_LOG) && - info->loglevel == NF_OSF_LOGLEVEL_FIRST) - break; - } - - if (!fcount && (info->flags & NF_OSF_LOG)) - nf_log_packet(net, family, hooknum, skb, in, out, NULL, - "Remote OS is not known: %pI4:%u -> %pI4:%u\n", - &ip->saddr, ntohs(tcp->source), - &ip->daddr, ntohs(tcp->dest)); - - if (fcount) - fmatch = FMATCH_OK; - - return fmatch == FMATCH_OK; -} -EXPORT_SYMBOL_GPL(nf_osf_match); - -MODULE_LICENSE("GPL"); diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index f5745e4c6513..67cdd5c4f4f5 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -14,6 +14,7 @@ #include <linux/skbuff.h> #include <linux/netlink.h> #include <linux/vmalloc.h> +#include <linux/rhashtable.h> #include <linux/netfilter.h> #include <linux/netfilter/nfnetlink.h> #include <linux/netfilter/nf_tables.h> @@ -455,20 +456,59 @@ __nf_tables_chain_type_lookup(const struct nlattr *nla, u8 family) return NULL; } +/* + * Loading a module requires dropping mutex that guards the + * transaction. + * We first need to abort any pending transactions as once + * mutex is unlocked a different client could start a new + * transaction. 
It must not see any 'future generation' + * changes * as these changes will never happen. + */ +#ifdef CONFIG_MODULES +static int __nf_tables_abort(struct net *net); + +static void nft_request_module(struct net *net, const char *fmt, ...) +{ + char module_name[MODULE_NAME_LEN]; + va_list args; + int ret; + + __nf_tables_abort(net); + + va_start(args, fmt); + ret = vsnprintf(module_name, MODULE_NAME_LEN, fmt, args); + va_end(args); + if (WARN(ret >= MODULE_NAME_LEN, "truncated: '%s' (len %d)", module_name, ret)) + return; + + mutex_unlock(&net->nft.commit_mutex); + request_module("%s", module_name); + mutex_lock(&net->nft.commit_mutex); +} +#endif + +static void lockdep_nfnl_nft_mutex_not_held(void) +{ +#ifdef CONFIG_PROVE_LOCKING + WARN_ON_ONCE(lockdep_nfnl_is_held(NFNL_SUBSYS_NFTABLES)); +#endif +} + static const struct nft_chain_type * -nf_tables_chain_type_lookup(const struct nlattr *nla, u8 family, bool autoload) +nf_tables_chain_type_lookup(struct net *net, const struct nlattr *nla, + u8 family, bool autoload) { const struct nft_chain_type *type; type = __nf_tables_chain_type_lookup(nla, family); if (type != NULL) return type; + + lockdep_nfnl_nft_mutex_not_held(); #ifdef CONFIG_MODULES if (autoload) { - nfnl_unlock(NFNL_SUBSYS_NFTABLES); - request_module("nft-chain-%u-%.*s", family, - nla_len(nla), (const char *)nla_data(nla)); - nfnl_lock(NFNL_SUBSYS_NFTABLES); + nft_request_module(net, "nft-chain-%u-%.*s", family, + nla_len(nla), (const char *)nla_data(nla)); type = __nf_tables_chain_type_lookup(nla, family); if (type != NULL) return ERR_PTR(-EAGAIN); @@ -772,6 +812,7 @@ static int nf_tables_newtable(struct net *net, struct sock *nlsk, struct nft_ctx ctx; int err; + lockdep_assert_held(&net->nft.commit_mutex); attr = nla[NFTA_TABLE_NAME]; table = nft_table_lookup(net, attr, family, genmask); if (IS_ERR(table)) { @@ -1012,7 +1053,17 @@ nft_chain_lookup_byhandle(const struct nft_table *table, u64 handle, u8 genmask) return ERR_PTR(-ENOENT); } -static struct 
nft_chain *nft_chain_lookup(struct nft_table *table, +static bool lockdep_commit_lock_is_held(struct net *net) +{ +#ifdef CONFIG_PROVE_LOCKING + return lockdep_is_held(&net->nft.commit_mutex); +#else + return true; +#endif +} + +static struct nft_chain *nft_chain_lookup(struct net *net, + struct nft_table *table, const struct nlattr *nla, u8 genmask) { char search[NFT_CHAIN_MAXNAMELEN + 1]; @@ -1025,7 +1076,7 @@ static struct nft_chain *nft_chain_lookup(struct nft_table *table, nla_strlcpy(search, nla, sizeof(search)); WARN_ON(!rcu_read_lock_held() && - !lockdep_nfnl_is_held(NFNL_SUBSYS_NFTABLES)); + !lockdep_commit_lock_is_held(net)); chain = ERR_PTR(-ENOENT); rcu_read_lock(); @@ -1265,7 +1316,7 @@ static int nf_tables_getchain(struct net *net, struct sock *nlsk, return PTR_ERR(table); } - chain = nft_chain_lookup(table, nla[NFTA_CHAIN_NAME], genmask); + chain = nft_chain_lookup(net, table, nla[NFTA_CHAIN_NAME], genmask); if (IS_ERR(chain)) { NL_SET_BAD_ATTR(extack, nla[NFTA_CHAIN_NAME]); return PTR_ERR(chain); @@ -1391,13 +1442,16 @@ struct nft_chain_hook { static int nft_chain_parse_hook(struct net *net, const struct nlattr * const nla[], struct nft_chain_hook *hook, u8 family, - bool create) + bool autoload) { struct nlattr *ha[NFTA_HOOK_MAX + 1]; const struct nft_chain_type *type; struct net_device *dev; int err; + lockdep_assert_held(&net->nft.commit_mutex); + lockdep_nfnl_nft_mutex_not_held(); + err = nla_parse_nested(ha, NFTA_HOOK_MAX, nla[NFTA_CHAIN_HOOK], nft_hook_policy, NULL); if (err < 0) @@ -1412,8 +1466,8 @@ static int nft_chain_parse_hook(struct net *net, type = chain_type[family][NFT_CHAIN_T_DEFAULT]; if (nla[NFTA_CHAIN_TYPE]) { - type = nf_tables_chain_type_lookup(nla[NFTA_CHAIN_TYPE], - family, create); + type = nf_tables_chain_type_lookup(net, nla[NFTA_CHAIN_TYPE], + family, autoload); if (IS_ERR(type)) return PTR_ERR(type); } @@ -1480,7 +1534,7 @@ static struct nft_rule **nf_tables_chain_alloc_rules(const struct nft_chain *cha } static int 
nf_tables_addchain(struct nft_ctx *ctx, u8 family, u8 genmask, - u8 policy, bool create) + u8 policy) { const struct nlattr * const *nla = ctx->nla; struct nft_table *table = ctx->table; @@ -1498,7 +1552,7 @@ static int nf_tables_addchain(struct nft_ctx *ctx, u8 family, u8 genmask, struct nft_chain_hook hook; struct nf_hook_ops *ops; - err = nft_chain_parse_hook(net, nla, &hook, family, create); + err = nft_chain_parse_hook(net, nla, &hook, family, true); if (err < 0) return err; @@ -1589,8 +1643,7 @@ err1: return err; } -static int nf_tables_updchain(struct nft_ctx *ctx, u8 genmask, u8 policy, - bool create) +static int nf_tables_updchain(struct nft_ctx *ctx, u8 genmask, u8 policy) { const struct nlattr * const *nla = ctx->nla; struct nft_table *table = ctx->table; @@ -1607,7 +1660,7 @@ static int nf_tables_updchain(struct nft_ctx *ctx, u8 genmask, u8 policy, return -EBUSY; err = nft_chain_parse_hook(ctx->net, nla, &hook, ctx->family, - create); + false); if (err < 0) return err; @@ -1631,7 +1684,8 @@ static int nf_tables_updchain(struct nft_ctx *ctx, u8 genmask, u8 policy, nla[NFTA_CHAIN_NAME]) { struct nft_chain *chain2; - chain2 = nft_chain_lookup(table, nla[NFTA_CHAIN_NAME], genmask); + chain2 = nft_chain_lookup(ctx->net, table, + nla[NFTA_CHAIN_NAME], genmask); if (!IS_ERR(chain2)) return -EEXIST; } @@ -1706,9 +1760,8 @@ static int nf_tables_newchain(struct net *net, struct sock *nlsk, u8 policy = NF_ACCEPT; struct nft_ctx ctx; u64 handle = 0; - bool create; - create = nlh->nlmsg_flags & NLM_F_CREATE ? 
true : false; + lockdep_assert_held(&net->nft.commit_mutex); table = nft_table_lookup(net, nla[NFTA_CHAIN_TABLE], family, genmask); if (IS_ERR(table)) { @@ -1728,7 +1781,7 @@ static int nf_tables_newchain(struct net *net, struct sock *nlsk, } attr = nla[NFTA_CHAIN_HANDLE]; } else { - chain = nft_chain_lookup(table, attr, genmask); + chain = nft_chain_lookup(net, table, attr, genmask); if (IS_ERR(chain)) { if (PTR_ERR(chain) != -ENOENT) { NL_SET_BAD_ATTR(extack, attr); @@ -1771,10 +1824,10 @@ static int nf_tables_newchain(struct net *net, struct sock *nlsk, if (nlh->nlmsg_flags & NLM_F_REPLACE) return -EOPNOTSUPP; - return nf_tables_updchain(&ctx, genmask, policy, create); + return nf_tables_updchain(&ctx, genmask, policy); } - return nf_tables_addchain(&ctx, family, genmask, policy, create); + return nf_tables_addchain(&ctx, family, genmask, policy); } static int nf_tables_delchain(struct net *net, struct sock *nlsk, @@ -1806,7 +1859,7 @@ static int nf_tables_delchain(struct net *net, struct sock *nlsk, chain = nft_chain_lookup_byhandle(table, handle, genmask); } else { attr = nla[NFTA_CHAIN_NAME]; - chain = nft_chain_lookup(table, attr, genmask); + chain = nft_chain_lookup(net, table, attr, genmask); } if (IS_ERR(chain)) { NL_SET_BAD_ATTR(extack, attr); @@ -1891,7 +1944,8 @@ static const struct nft_expr_type *__nft_expr_type_get(u8 family, return NULL; } -static const struct nft_expr_type *nft_expr_type_get(u8 family, +static const struct nft_expr_type *nft_expr_type_get(struct net *net, + u8 family, struct nlattr *nla) { const struct nft_expr_type *type; @@ -1903,19 +1957,16 @@ static const struct nft_expr_type *nft_expr_type_get(u8 family, if (type != NULL && try_module_get(type->owner)) return type; + lockdep_nfnl_nft_mutex_not_held(); #ifdef CONFIG_MODULES if (type == NULL) { - nfnl_unlock(NFNL_SUBSYS_NFTABLES); - request_module("nft-expr-%u-%.*s", family, - nla_len(nla), (char *)nla_data(nla)); - nfnl_lock(NFNL_SUBSYS_NFTABLES); + nft_request_module(net, 
"nft-expr-%u-%.*s", family, + nla_len(nla), (char *)nla_data(nla)); if (__nft_expr_type_get(family, nla)) return ERR_PTR(-EAGAIN); - nfnl_unlock(NFNL_SUBSYS_NFTABLES); - request_module("nft-expr-%.*s", - nla_len(nla), (char *)nla_data(nla)); - nfnl_lock(NFNL_SUBSYS_NFTABLES); + nft_request_module(net, "nft-expr-%.*s", + nla_len(nla), (char *)nla_data(nla)); if (__nft_expr_type_get(family, nla)) return ERR_PTR(-EAGAIN); } @@ -1984,7 +2035,7 @@ static int nf_tables_expr_parse(const struct nft_ctx *ctx, if (err < 0) return err; - type = nft_expr_type_get(ctx->family, tb[NFTA_EXPR_NAME]); + type = nft_expr_type_get(ctx->net, ctx->family, tb[NFTA_EXPR_NAME]); if (IS_ERR(type)) return PTR_ERR(type); @@ -2349,7 +2400,7 @@ static int nf_tables_getrule(struct net *net, struct sock *nlsk, return PTR_ERR(table); } - chain = nft_chain_lookup(table, nla[NFTA_RULE_CHAIN], genmask); + chain = nft_chain_lookup(net, table, nla[NFTA_RULE_CHAIN], genmask); if (IS_ERR(chain)) { NL_SET_BAD_ATTR(extack, nla[NFTA_RULE_CHAIN]); return PTR_ERR(chain); @@ -2383,6 +2434,7 @@ static void nf_tables_rule_destroy(const struct nft_ctx *ctx, { struct nft_expr *expr; + lockdep_assert_held(&ctx->net->nft.commit_mutex); /* * Careful: some expressions might not be initialized in case this * is called on error from nf_tables_newrule(). 
@@ -2454,8 +2506,6 @@ static int nft_table_validate(struct net *net, const struct nft_table *table) #define NFT_RULE_MAXEXPRS 128 -static struct nft_expr_info *info; - static int nf_tables_newrule(struct net *net, struct sock *nlsk, struct sk_buff *skb, const struct nlmsghdr *nlh, const struct nlattr * const nla[], @@ -2463,6 +2513,7 @@ static int nf_tables_newrule(struct net *net, struct sock *nlsk, { const struct nfgenmsg *nfmsg = nlmsg_data(nlh); u8 genmask = nft_genmask_next(net); + struct nft_expr_info *info = NULL; int family = nfmsg->nfgen_family; struct nft_table *table; struct nft_chain *chain; @@ -2474,10 +2525,9 @@ static int nf_tables_newrule(struct net *net, struct sock *nlsk, struct nlattr *tmp; unsigned int size, i, n, ulen = 0, usize = 0; int err, rem; - bool create; u64 handle, pos_handle; - create = nlh->nlmsg_flags & NLM_F_CREATE ? true : false; + lockdep_assert_held(&net->nft.commit_mutex); table = nft_table_lookup(net, nla[NFTA_RULE_TABLE], family, genmask); if (IS_ERR(table)) { @@ -2485,7 +2535,7 @@ static int nf_tables_newrule(struct net *net, struct sock *nlsk, return PTR_ERR(table); } - chain = nft_chain_lookup(table, nla[NFTA_RULE_CHAIN], genmask); + chain = nft_chain_lookup(net, table, nla[NFTA_RULE_CHAIN], genmask); if (IS_ERR(chain)) { NL_SET_BAD_ATTR(extack, nla[NFTA_RULE_CHAIN]); return PTR_ERR(chain); @@ -2508,7 +2558,8 @@ static int nf_tables_newrule(struct net *net, struct sock *nlsk, else return -EOPNOTSUPP; } else { - if (!create || nlh->nlmsg_flags & NLM_F_REPLACE) + if (!(nlh->nlmsg_flags & NLM_F_CREATE) || + nlh->nlmsg_flags & NLM_F_REPLACE) return -EINVAL; handle = nf_tables_alloc_handle(table); @@ -2533,6 +2584,12 @@ static int nf_tables_newrule(struct net *net, struct sock *nlsk, n = 0; size = 0; if (nla[NFTA_RULE_EXPRESSIONS]) { + info = kvmalloc_array(NFT_RULE_MAXEXPRS, + sizeof(struct nft_expr_info), + GFP_KERNEL); + if (!info) + return -ENOMEM; + nla_for_each_nested(tmp, nla[NFTA_RULE_EXPRESSIONS], rem) { err = -EINVAL; 
if (nla_type(tmp) != NFTA_LIST_ELEM) @@ -2625,6 +2682,7 @@ static int nf_tables_newrule(struct net *net, struct sock *nlsk, list_add_rcu(&rule->list, &chain->rules); } } + kvfree(info); chain->use++; if (net->nft.validate_state == NFT_VALIDATE_DO) @@ -2638,6 +2696,7 @@ err1: if (info[i].ops != NULL) module_put(info[i].ops->type->owner); } + kvfree(info); return err; } @@ -2677,7 +2736,8 @@ static int nf_tables_delrule(struct net *net, struct sock *nlsk, } if (nla[NFTA_RULE_CHAIN]) { - chain = nft_chain_lookup(table, nla[NFTA_RULE_CHAIN], genmask); + chain = nft_chain_lookup(net, table, nla[NFTA_RULE_CHAIN], + genmask); if (IS_ERR(chain)) { NL_SET_BAD_ATTR(extack, nla[NFTA_RULE_CHAIN]); return PTR_ERR(chain); @@ -2769,11 +2829,11 @@ nft_select_set_ops(const struct nft_ctx *ctx, const struct nft_set_type *type; u32 flags = 0; + lockdep_assert_held(&ctx->net->nft.commit_mutex); + lockdep_nfnl_nft_mutex_not_held(); #ifdef CONFIG_MODULES if (list_empty(&nf_tables_set_types)) { - nfnl_unlock(NFNL_SUBSYS_NFTABLES); - request_module("nft-set"); - nfnl_lock(NFNL_SUBSYS_NFTABLES); + nft_request_module(ctx->net, "nft-set"); if (!list_empty(&nf_tables_set_types)) return ERR_PTR(-EAGAIN); } @@ -3295,7 +3355,6 @@ static int nf_tables_newset(struct net *net, struct sock *nlsk, struct nft_ctx ctx; char *name; unsigned int size; - bool create; u64 timeout; u32 ktype, dtype, flags, policy, gc_int, objtype; struct nft_set_desc desc; @@ -3396,8 +3455,6 @@ static int nf_tables_newset(struct net *net, struct sock *nlsk, return err; } - create = nlh->nlmsg_flags & NLM_F_CREATE ? 
true : false; - table = nft_table_lookup(net, nla[NFTA_SET_TABLE], family, genmask); if (IS_ERR(table)) { NL_SET_BAD_ATTR(extack, nla[NFTA_SET_TABLE]); @@ -3963,7 +4020,6 @@ static int nft_get_set_elem(struct nft_ctx *ctx, struct nft_set *set, const struct nlattr *attr) { struct nlattr *nla[NFTA_SET_ELEM_MAX + 1]; - const struct nft_set_ext *ext; struct nft_data_desc desc; struct nft_set_elem elem; struct sk_buff *skb; @@ -3997,7 +4053,6 @@ static int nft_get_set_elem(struct nft_ctx *ctx, struct nft_set *set, return PTR_ERR(priv); elem.priv = priv; - ext = nft_set_elem_ext(set, &elem); err = -ENOMEM; skb = nlmsg_new(NLMSG_GOODSIZE, GFP_ATOMIC); @@ -4818,7 +4873,8 @@ static const struct nft_object_type *__nft_obj_type_get(u32 objtype) return NULL; } -static const struct nft_object_type *nft_obj_type_get(u32 objtype) +static const struct nft_object_type * +nft_obj_type_get(struct net *net, u32 objtype) { const struct nft_object_type *type; @@ -4826,11 +4882,10 @@ static const struct nft_object_type *nft_obj_type_get(u32 objtype) if (type != NULL && try_module_get(type->owner)) return type; + lockdep_nfnl_nft_mutex_not_held(); #ifdef CONFIG_MODULES if (type == NULL) { - nfnl_unlock(NFNL_SUBSYS_NFTABLES); - request_module("nft-obj-%u", objtype); - nfnl_lock(NFNL_SUBSYS_NFTABLES); + nft_request_module(net, "nft-obj-%u", objtype); if (__nft_obj_type_get(objtype)) return ERR_PTR(-EAGAIN); } @@ -4882,7 +4937,7 @@ static int nf_tables_newobj(struct net *net, struct sock *nlsk, nft_ctx_init(&ctx, net, skb, nlh, family, table, NULL, nla); - type = nft_obj_type_get(objtype); + type = nft_obj_type_get(net, objtype); if (IS_ERR(type)) return PTR_ERR(type); @@ -5372,7 +5427,8 @@ static const struct nf_flowtable_type *__nft_flowtable_type_get(u8 family) return NULL; } -static const struct nf_flowtable_type *nft_flowtable_type_get(u8 family) +static const struct nf_flowtable_type * +nft_flowtable_type_get(struct net *net, u8 family) { const struct nf_flowtable_type *type; @@ 
-5380,11 +5436,10 @@ static const struct nf_flowtable_type *nft_flowtable_type_get(u8 family) if (type != NULL && try_module_get(type->owner)) return type; + lockdep_nfnl_nft_mutex_not_held(); #ifdef CONFIG_MODULES if (type == NULL) { - nfnl_unlock(NFNL_SUBSYS_NFTABLES); - request_module("nf-flowtable-%u", family); - nfnl_lock(NFNL_SUBSYS_NFTABLES); + nft_request_module(net, "nf-flowtable-%u", family); if (__nft_flowtable_type_get(family)) return ERR_PTR(-EAGAIN); } @@ -5464,7 +5519,7 @@ static int nf_tables_newflowtable(struct net *net, struct sock *nlsk, goto err1; } - type = nft_flowtable_type_get(family); + type = nft_flowtable_type_get(net, family); if (IS_ERR(type)) { err = PTR_ERR(type); goto err2; @@ -5874,13 +5929,13 @@ static int nf_tables_flowtable_event(struct notifier_block *this, if (!net) return 0; - nfnl_lock(NFNL_SUBSYS_NFTABLES); + mutex_lock(&net->nft.commit_mutex); list_for_each_entry(table, &net->nft.tables, list) { list_for_each_entry(flowtable, &table->flowtables, list) { nft_flowtable_event(event, dev, flowtable); } } - nfnl_unlock(NFNL_SUBSYS_NFTABLES); + mutex_unlock(&net->nft.commit_mutex); put_net(net); return NOTIFY_DONE; } @@ -6232,9 +6287,9 @@ static void nf_tables_commit_chain_active(struct net *net, struct nft_chain *cha next_genbit = nft_gencursor_next(net); g0 = rcu_dereference_protected(chain->rules_gen_0, - lockdep_nfnl_is_held(NFNL_SUBSYS_NFTABLES)); + lockdep_commit_lock_is_held(net)); g1 = rcu_dereference_protected(chain->rules_gen_1, - lockdep_nfnl_is_held(NFNL_SUBSYS_NFTABLES)); + lockdep_commit_lock_is_held(net)); /* No changes to this chain? 
*/ if (chain->rules_next == NULL) { @@ -6444,6 +6499,7 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb) nf_tables_commit_release(net); nf_tables_gen_notify(net, skb, NFT_MSG_NEWGEN); + mutex_unlock(&net->nft.commit_mutex); return 0; } @@ -6595,12 +6651,25 @@ static void nf_tables_cleanup(struct net *net) static int nf_tables_abort(struct net *net, struct sk_buff *skb) { - return __nf_tables_abort(net); + int ret = __nf_tables_abort(net); + + mutex_unlock(&net->nft.commit_mutex); + + return ret; } static bool nf_tables_valid_genid(struct net *net, u32 genid) { - return net->nft.base_seq == genid; + bool genid_ok; + + mutex_lock(&net->nft.commit_mutex); + + genid_ok = genid == 0 || net->nft.base_seq == genid; + if (!genid_ok) + mutex_unlock(&net->nft.commit_mutex); + + /* else, commit mutex has to be released by commit or abort function */ + return genid_ok; } static const struct nfnetlink_subsystem nf_tables_subsys = { @@ -6612,6 +6681,7 @@ static const struct nfnetlink_subsystem nf_tables_subsys = { .abort = nf_tables_abort, .cleanup = nf_tables_cleanup, .valid_genid = nf_tables_valid_genid, + .owner = THIS_MODULE, }; int nft_chain_validate_dependency(const struct nft_chain *chain, @@ -6931,8 +7001,8 @@ static int nft_verdict_init(const struct nft_ctx *ctx, struct nft_data *data, case NFT_GOTO: if (!tb[NFTA_VERDICT_CHAIN]) return -EINVAL; - chain = nft_chain_lookup(ctx->table, tb[NFTA_VERDICT_CHAIN], - genmask); + chain = nft_chain_lookup(ctx->net, ctx->table, + tb[NFTA_VERDICT_CHAIN], genmask); if (IS_ERR(chain)) return PTR_ERR(chain); if (nft_is_base_chain(chain)) @@ -7177,6 +7247,7 @@ static int __net_init nf_tables_init_net(struct net *net) { INIT_LIST_HEAD(&net->nft.tables); INIT_LIST_HEAD(&net->nft.commit_list); + mutex_init(&net->nft.commit_mutex); net->nft.base_seq = 1; net->nft.validate_state = NFT_VALIDATE_SKIP; @@ -7185,11 +7256,11 @@ static int __net_init nf_tables_init_net(struct net *net) static void __net_exit 
nf_tables_exit_net(struct net *net) { - nfnl_lock(NFNL_SUBSYS_NFTABLES); + mutex_lock(&net->nft.commit_mutex); if (!list_empty(&net->nft.commit_list)) __nf_tables_abort(net); __nft_release_tables(net); - nfnl_unlock(NFNL_SUBSYS_NFTABLES); + mutex_unlock(&net->nft.commit_mutex); WARN_ON_ONCE(!list_empty(&net->nft.tables)); } @@ -7204,29 +7275,19 @@ static int __init nf_tables_module_init(void) nft_chain_filter_init(); - info = kmalloc_array(NFT_RULE_MAXEXPRS, sizeof(struct nft_expr_info), - GFP_KERNEL); - if (info == NULL) { - err = -ENOMEM; - goto err1; - } - err = nf_tables_core_module_init(); if (err < 0) - goto err2; + return err; err = nfnetlink_subsys_register(&nf_tables_subsys); if (err < 0) - goto err3; + goto err; register_netdevice_notifier(&nf_tables_flowtable_notifier); return register_pernet_subsys(&nf_tables_net_ops); -err3: +err: nf_tables_core_module_exit(); -err2: - kfree(info); -err1: return err; } @@ -7238,7 +7299,6 @@ static void __exit nf_tables_module_exit(void) unregister_pernet_subsys(&nf_tables_net_ops); rcu_barrier(); nf_tables_core_module_exit(); - kfree(info); } module_init(nf_tables_module_init); diff --git a/net/netfilter/nf_tables_core.c b/net/netfilter/nf_tables_core.c index 8de912ca53d3..ffd5c0f9412b 100644 --- a/net/netfilter/nf_tables_core.c +++ b/net/netfilter/nf_tables_core.c @@ -120,6 +120,20 @@ struct nft_jumpstack { struct nft_rule *const *rules; }; +static void expr_call_ops_eval(const struct nft_expr *expr, + struct nft_regs *regs, + struct nft_pktinfo *pkt) +{ + unsigned long e = (unsigned long)expr->ops->eval; + + if (e == (unsigned long)nft_meta_get_eval) + nft_meta_get_eval(expr, regs, pkt); + else if (e == (unsigned long)nft_lookup_eval) + nft_lookup_eval(expr, regs, pkt); + else + expr->ops->eval(expr, regs, pkt); +} + unsigned int nft_do_chain(struct nft_pktinfo *pkt, void *priv) { @@ -153,7 +167,7 @@ next_rule: nft_cmp_fast_eval(expr, &regs); else if (expr->ops != &nft_payload_fast_ops || !nft_payload_fast_eval(expr, 
&regs, pkt)) - expr->ops->eval(expr, &regs, pkt); + expr_call_ops_eval(expr, &regs, pkt); if (regs.verdict.code != NFT_CONTINUE) break; diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c index e1b6be29848d..916913454624 100644 --- a/net/netfilter/nfnetlink.c +++ b/net/netfilter/nfnetlink.c @@ -331,18 +331,27 @@ replay: } } - if (!ss->commit || !ss->abort) { + if (!ss->valid_genid || !ss->commit || !ss->abort) { nfnl_unlock(subsys_id); netlink_ack(oskb, nlh, -EOPNOTSUPP, NULL); return kfree_skb(skb); } - if (genid && ss->valid_genid && !ss->valid_genid(net, genid)) { + if (!try_module_get(ss->owner)) { + nfnl_unlock(subsys_id); + netlink_ack(oskb, nlh, -EOPNOTSUPP, NULL); + return kfree_skb(skb); + } + + if (!ss->valid_genid(net, genid)) { + module_put(ss->owner); nfnl_unlock(subsys_id); netlink_ack(oskb, nlh, -ERESTART, NULL); return kfree_skb(skb); } + nfnl_unlock(subsys_id); + while (skb->len >= nlmsg_total_size(0)) { int msglen, type; @@ -464,14 +473,10 @@ ack: } done: if (status & NFNL_BATCH_REPLAY) { - const struct nfnetlink_subsystem *ss2; - - ss2 = nfnl_dereference_protected(subsys_id); - if (ss2 == ss) - ss->abort(net, oskb); + ss->abort(net, oskb); nfnl_err_reset(&err_list); - nfnl_unlock(subsys_id); kfree_skb(skb); + module_put(ss->owner); goto replay; } else if (status == NFNL_BATCH_DONE) { err = ss->commit(net, oskb); @@ -489,8 +494,8 @@ done: ss->cleanup(net); nfnl_err_deliver(&err_list, oskb); - nfnl_unlock(subsys_id); kfree_skb(skb); + module_put(ss->owner); } static const struct nla_policy nfnl_batch_policy[NFNL_BATCH_MAX + 1] = { diff --git a/net/netfilter/nfnetlink_cttimeout.c b/net/netfilter/nfnetlink_cttimeout.c index 9ee5fa551fa6..d46a236cdf31 100644 --- a/net/netfilter/nfnetlink_cttimeout.c +++ b/net/netfilter/nfnetlink_cttimeout.c @@ -26,7 +26,6 @@ #include <net/sock.h> #include <net/netfilter/nf_conntrack.h> #include <net/netfilter/nf_conntrack_core.h> -#include <net/netfilter/nf_conntrack_l3proto.h> #include 
<net/netfilter/nf_conntrack_l4proto.h> #include <net/netfilter/nf_conntrack_tuple.h> #include <net/netfilter/nf_conntrack_timeout.h> @@ -47,7 +46,7 @@ static const struct nla_policy cttimeout_nla_policy[CTA_TIMEOUT_MAX+1] = { }; static int -ctnl_timeout_parse_policy(void *timeouts, +ctnl_timeout_parse_policy(void *timeout, const struct nf_conntrack_l4proto *l4proto, struct net *net, const struct nlattr *attr) { @@ -68,7 +67,7 @@ ctnl_timeout_parse_policy(void *timeouts, if (ret < 0) goto err; - ret = l4proto->ctnl_timeout.nlattr_to_obj(tb, net, timeouts); + ret = l4proto->ctnl_timeout.nlattr_to_obj(tb, net, timeout); err: kfree(tb); @@ -114,13 +113,13 @@ static int cttimeout_new_timeout(struct net *net, struct sock *ctnl, /* You cannot replace one timeout policy by another of * different kind, sorry. */ - if (matching->l3num != l3num || - matching->l4proto->l4proto != l4num) + if (matching->timeout.l3num != l3num || + matching->timeout.l4proto->l4proto != l4num) return -EINVAL; - return ctnl_timeout_parse_policy(&matching->data, - matching->l4proto, net, - cda[CTA_TIMEOUT_DATA]); + return ctnl_timeout_parse_policy(&matching->timeout.data, + matching->timeout.l4proto, + net, cda[CTA_TIMEOUT_DATA]); } return -EBUSY; @@ -141,14 +140,14 @@ static int cttimeout_new_timeout(struct net *net, struct sock *ctnl, goto err_proto_put; } - ret = ctnl_timeout_parse_policy(&timeout->data, l4proto, net, + ret = ctnl_timeout_parse_policy(&timeout->timeout.data, l4proto, net, cda[CTA_TIMEOUT_DATA]); if (ret < 0) goto err; strcpy(timeout->name, nla_data(cda[CTA_TIMEOUT_NAME])); - timeout->l3num = l3num; - timeout->l4proto = l4proto; + timeout->timeout.l3num = l3num; + timeout->timeout.l4proto = l4proto; refcount_set(&timeout->refcnt, 1); list_add_tail_rcu(&timeout->head, &net->nfct_timeout_list); @@ -167,7 +166,7 @@ ctnl_timeout_fill_info(struct sk_buff *skb, u32 portid, u32 seq, u32 type, struct nlmsghdr *nlh; struct nfgenmsg *nfmsg; unsigned int flags = portid ? 
NLM_F_MULTI : 0; - const struct nf_conntrack_l4proto *l4proto = timeout->l4proto; + const struct nf_conntrack_l4proto *l4proto = timeout->timeout.l4proto; event = nfnl_msg_type(NFNL_SUBSYS_CTNETLINK_TIMEOUT, event); nlh = nlmsg_put(skb, portid, seq, event, sizeof(*nfmsg), flags); @@ -180,8 +179,9 @@ ctnl_timeout_fill_info(struct sk_buff *skb, u32 portid, u32 seq, u32 type, nfmsg->res_id = 0; if (nla_put_string(skb, CTA_TIMEOUT_NAME, timeout->name) || - nla_put_be16(skb, CTA_TIMEOUT_L3PROTO, htons(timeout->l3num)) || - nla_put_u8(skb, CTA_TIMEOUT_L4PROTO, timeout->l4proto->l4proto) || + nla_put_be16(skb, CTA_TIMEOUT_L3PROTO, + htons(timeout->timeout.l3num)) || + nla_put_u8(skb, CTA_TIMEOUT_L4PROTO, l4proto->l4proto) || nla_put_be32(skb, CTA_TIMEOUT_USE, htonl(refcount_read(&timeout->refcnt)))) goto nla_put_failure; @@ -195,7 +195,8 @@ ctnl_timeout_fill_info(struct sk_buff *skb, u32 portid, u32 seq, u32 type, if (!nest_parms) goto nla_put_failure; - ret = l4proto->ctnl_timeout.obj_to_nlattr(skb, &timeout->data); + ret = l4proto->ctnl_timeout.obj_to_nlattr(skb, + &timeout->timeout.data); if (ret < 0) goto nla_put_failure; @@ -298,22 +299,6 @@ static int cttimeout_get_timeout(struct net *net, struct sock *ctnl, return ret; } -static int untimeout(struct nf_conn *ct, void *timeout) -{ - struct nf_conn_timeout *timeout_ext = nf_ct_timeout_find(ct); - - if (timeout_ext && (!timeout || timeout_ext->timeout == timeout)) - RCU_INIT_POINTER(timeout_ext->timeout, NULL); - - /* We are not intended to delete this conntrack. */ - return 0; -} - -static void ctnl_untimeout(struct net *net, struct ctnl_timeout *timeout) -{ - nf_ct_iterate_cleanup_net(net, untimeout, timeout, 0, 0); -} - /* try to delete object, fail if it is still in use. 
*/ static int ctnl_timeout_try_del(struct net *net, struct ctnl_timeout *timeout) { @@ -325,8 +310,8 @@ static int ctnl_timeout_try_del(struct net *net, struct ctnl_timeout *timeout) if (refcount_dec_if_one(&timeout->refcnt)) { /* We are protected by nfnl mutex. */ list_del_rcu(&timeout->head); - nf_ct_l4proto_put(timeout->l4proto); - ctnl_untimeout(net, timeout); + nf_ct_l4proto_put(timeout->timeout.l4proto); + nf_ct_untimeout(net, &timeout->timeout); kfree_rcu(timeout, rcu_head); } else { ret = -EBUSY; @@ -373,7 +358,6 @@ static int cttimeout_default_set(struct net *net, struct sock *ctnl, struct netlink_ext_ack *extack) { const struct nf_conntrack_l4proto *l4proto; - unsigned int *timeouts; __u16 l3num; __u8 l4num; int ret; @@ -393,9 +377,7 @@ static int cttimeout_default_set(struct net *net, struct sock *ctnl, goto err; } - timeouts = l4proto->get_timeouts(net); - - ret = ctnl_timeout_parse_policy(timeouts, l4proto, net, + ret = ctnl_timeout_parse_policy(NULL, l4proto, net, cda[CTA_TIMEOUT_DATA]); if (ret < 0) goto err; @@ -432,7 +414,6 @@ cttimeout_default_fill_info(struct net *net, struct sk_buff *skb, u32 portid, if (likely(l4proto->ctnl_timeout.obj_to_nlattr)) { struct nlattr *nest_parms; - unsigned int *timeouts = l4proto->get_timeouts(net); int ret; nest_parms = nla_nest_start(skb, @@ -440,7 +421,7 @@ cttimeout_default_fill_info(struct net *net, struct sk_buff *skb, u32 portid, if (!nest_parms) goto nla_put_failure; - ret = l4proto->ctnl_timeout.obj_to_nlattr(skb, timeouts); + ret = l4proto->ctnl_timeout.obj_to_nlattr(skb, NULL); if (ret < 0) goto nla_put_failure; @@ -508,7 +489,6 @@ err: return err; } -#ifdef CONFIG_NF_CONNTRACK_TIMEOUT static struct ctnl_timeout * ctnl_timeout_find_get(struct net *net, const char *name) { @@ -532,14 +512,16 @@ err: return matching; } -static void ctnl_timeout_put(struct ctnl_timeout *timeout) +static void ctnl_timeout_put(struct nf_ct_timeout *t) { + struct ctnl_timeout *timeout = + container_of(t, struct ctnl_timeout, 
timeout); + if (refcount_dec_and_test(&timeout->refcnt)) kfree_rcu(timeout, rcu_head); module_put(THIS_MODULE); } -#endif /* CONFIG_NF_CONNTRACK_TIMEOUT */ static const struct nfnl_callback cttimeout_cb[IPCTNL_MSG_TIMEOUT_MAX] = { [IPCTNL_MSG_TIMEOUT_NEW] = { .call = cttimeout_new_timeout, @@ -580,11 +562,11 @@ static void __net_exit cttimeout_net_exit(struct net *net) struct ctnl_timeout *cur, *tmp; nf_ct_unconfirmed_destroy(net); - ctnl_untimeout(net, NULL); + nf_ct_untimeout(net, NULL); list_for_each_entry_safe(cur, tmp, &net->nfct_timeout_list, head) { list_del_rcu(&cur->head); - nf_ct_l4proto_put(cur->l4proto); + nf_ct_l4proto_put(cur->timeout.l4proto); if (refcount_dec_and_test(&cur->refcnt)) kfree_rcu(cur, rcu_head); @@ -610,10 +592,8 @@ static int __init cttimeout_init(void) "nfnetlink.\n"); goto err_out; } -#ifdef CONFIG_NF_CONNTRACK_TIMEOUT RCU_INIT_POINTER(nf_ct_timeout_find_get_hook, ctnl_timeout_find_get); RCU_INIT_POINTER(nf_ct_timeout_put_hook, ctnl_timeout_put); -#endif /* CONFIG_NF_CONNTRACK_TIMEOUT */ return 0; err_out: @@ -626,11 +606,9 @@ static void __exit cttimeout_exit(void) nfnetlink_subsys_unregister(&cttimeout_subsys); unregister_pernet_subsys(&cttimeout_ops); -#ifdef CONFIG_NF_CONNTRACK_TIMEOUT RCU_INIT_POINTER(nf_ct_timeout_find_get_hook, NULL); RCU_INIT_POINTER(nf_ct_timeout_put_hook, NULL); synchronize_rcu(); -#endif /* CONFIG_NF_CONNTRACK_TIMEOUT */ } module_init(cttimeout_init); diff --git a/net/netfilter/nfnetlink_osf.c b/net/netfilter/nfnetlink_osf.c new file mode 100644 index 000000000000..00db27dfd2ff --- /dev/null +++ b/net/netfilter/nfnetlink_osf.c @@ -0,0 +1,436 @@ +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt +#include <linux/module.h> +#include <linux/kernel.h> + +#include <linux/capability.h> +#include <linux/if.h> +#include <linux/inetdevice.h> +#include <linux/ip.h> +#include <linux/list.h> +#include <linux/rculist.h> +#include <linux/skbuff.h> +#include <linux/slab.h> +#include <linux/tcp.h> + +#include <net/ip.h> 
+#include <net/tcp.h> + +#include <linux/netfilter/nfnetlink.h> +#include <linux/netfilter/x_tables.h> +#include <net/netfilter/nf_log.h> +#include <linux/netfilter/nfnetlink_osf.h> + +/* + * Indexed by dont-fragment bit. + * It is the only constant value in the fingerprint. + */ +struct list_head nf_osf_fingers[2]; +EXPORT_SYMBOL_GPL(nf_osf_fingers); + +static inline int nf_osf_ttl(const struct sk_buff *skb, + int ttl_check, unsigned char f_ttl) +{ + const struct iphdr *ip = ip_hdr(skb); + + if (ttl_check != -1) { + if (ttl_check == NF_OSF_TTL_TRUE) + return ip->ttl == f_ttl; + if (ttl_check == NF_OSF_TTL_NOCHECK) + return 1; + else if (ip->ttl <= f_ttl) + return 1; + else { + struct in_device *in_dev = __in_dev_get_rcu(skb->dev); + int ret = 0; + + for_ifa(in_dev) { + if (inet_ifa_match(ip->saddr, ifa)) { + ret = (ip->ttl == f_ttl); + break; + } + } + endfor_ifa(in_dev); + + return ret; + } + } + + return ip->ttl == f_ttl; +} + +struct nf_osf_hdr_ctx { + bool df; + u16 window; + u16 totlen; + const unsigned char *optp; + unsigned int optsize; +}; + +static bool nf_osf_match_one(const struct sk_buff *skb, + const struct nf_osf_user_finger *f, + int ttl_check, + struct nf_osf_hdr_ctx *ctx) +{ + unsigned int check_WSS = 0; + int fmatch = FMATCH_WRONG; + int foptsize, optnum; + u16 mss = 0; + + if (ctx->totlen != f->ss || !nf_osf_ttl(skb, ttl_check, f->ttl)) + return false; + + /* + * Should not happen if userspace parser was written correctly. 
+ */ + if (f->wss.wc >= OSF_WSS_MAX) + return false; + + /* Check options */ + + foptsize = 0; + for (optnum = 0; optnum < f->opt_num; ++optnum) + foptsize += f->opt[optnum].length; + + if (foptsize > MAX_IPOPTLEN || + ctx->optsize > MAX_IPOPTLEN || + ctx->optsize != foptsize) + return false; + + check_WSS = f->wss.wc; + + for (optnum = 0; optnum < f->opt_num; ++optnum) { + if (f->opt[optnum].kind == *ctx->optp) { + __u32 len = f->opt[optnum].length; + const __u8 *optend = ctx->optp + len; + + fmatch = FMATCH_OK; + + switch (*ctx->optp) { + case OSFOPT_MSS: + mss = ctx->optp[3]; + mss <<= 8; + mss |= ctx->optp[2]; + + mss = ntohs((__force __be16)mss); + break; + case OSFOPT_TS: + break; + } + + ctx->optp = optend; + } else + fmatch = FMATCH_OPT_WRONG; + + if (fmatch != FMATCH_OK) + break; + } + + if (fmatch != FMATCH_OPT_WRONG) { + fmatch = FMATCH_WRONG; + + switch (check_WSS) { + case OSF_WSS_PLAIN: + if (f->wss.val == 0 || ctx->window == f->wss.val) + fmatch = FMATCH_OK; + break; + case OSF_WSS_MSS: + /* + * Some smart modems decrease mangle MSS to + * SMART_MSS_2, so we check standard, decreased + * and the one provided in the fingerprint MSS + * values. 
+ */ +#define SMART_MSS_1 1460 +#define SMART_MSS_2 1448 + if (ctx->window == f->wss.val * mss || + ctx->window == f->wss.val * SMART_MSS_1 || + ctx->window == f->wss.val * SMART_MSS_2) + fmatch = FMATCH_OK; + break; + case OSF_WSS_MTU: + if (ctx->window == f->wss.val * (mss + 40) || + ctx->window == f->wss.val * (SMART_MSS_1 + 40) || + ctx->window == f->wss.val * (SMART_MSS_2 + 40)) + fmatch = FMATCH_OK; + break; + case OSF_WSS_MODULO: + if ((ctx->window % f->wss.val) == 0) + fmatch = FMATCH_OK; + break; + } + } + + return fmatch == FMATCH_OK; +} + +static const struct tcphdr *nf_osf_hdr_ctx_init(struct nf_osf_hdr_ctx *ctx, + const struct sk_buff *skb, + const struct iphdr *ip, + unsigned char *opts) +{ + const struct tcphdr *tcp; + struct tcphdr _tcph; + + tcp = skb_header_pointer(skb, ip_hdrlen(skb), sizeof(struct tcphdr), &_tcph); + if (!tcp) + return NULL; + + if (!tcp->syn) + return NULL; + + ctx->totlen = ntohs(ip->tot_len); + ctx->df = ntohs(ip->frag_off) & IP_DF; + ctx->window = ntohs(tcp->window); + + if (tcp->doff * 4 > sizeof(struct tcphdr)) { + ctx->optsize = tcp->doff * 4 - sizeof(struct tcphdr); + + ctx->optp = skb_header_pointer(skb, ip_hdrlen(skb) + + sizeof(struct tcphdr), ctx->optsize, opts); + } + + return tcp; +} + +bool +nf_osf_match(const struct sk_buff *skb, u_int8_t family, + int hooknum, struct net_device *in, struct net_device *out, + const struct nf_osf_info *info, struct net *net, + const struct list_head *nf_osf_fingers) +{ + const struct iphdr *ip = ip_hdr(skb); + const struct nf_osf_user_finger *f; + unsigned char opts[MAX_IPOPTLEN]; + const struct nf_osf_finger *kf; + int fcount = 0, ttl_check; + int fmatch = FMATCH_WRONG; + struct nf_osf_hdr_ctx ctx; + const struct tcphdr *tcp; + + memset(&ctx, 0, sizeof(ctx)); + + tcp = nf_osf_hdr_ctx_init(&ctx, skb, ip, opts); + if (!tcp) + return false; + + ttl_check = (info->flags & NF_OSF_TTL) ? 
info->ttl : -1; + + list_for_each_entry_rcu(kf, &nf_osf_fingers[ctx.df], finger_entry) { + + f = &kf->finger; + + if (!(info->flags & NF_OSF_LOG) && strcmp(info->genre, f->genre)) + continue; + + if (!nf_osf_match_one(skb, f, ttl_check, &ctx)) + continue; + + fmatch = FMATCH_OK; + + fcount++; + + if (info->flags & NF_OSF_LOG) + nf_log_packet(net, family, hooknum, skb, + in, out, NULL, + "%s [%s:%s] : %pI4:%d -> %pI4:%d hops=%d\n", + f->genre, f->version, f->subtype, + &ip->saddr, ntohs(tcp->source), + &ip->daddr, ntohs(tcp->dest), + f->ttl - ip->ttl); + + if ((info->flags & NF_OSF_LOG) && + info->loglevel == NF_OSF_LOGLEVEL_FIRST) + break; + } + + if (!fcount && (info->flags & NF_OSF_LOG)) + nf_log_packet(net, family, hooknum, skb, in, out, NULL, + "Remote OS is not known: %pI4:%u -> %pI4:%u\n", + &ip->saddr, ntohs(tcp->source), + &ip->daddr, ntohs(tcp->dest)); + + if (fcount) + fmatch = FMATCH_OK; + + return fmatch == FMATCH_OK; +} +EXPORT_SYMBOL_GPL(nf_osf_match); + +const char *nf_osf_find(const struct sk_buff *skb, + const struct list_head *nf_osf_fingers) +{ + const struct iphdr *ip = ip_hdr(skb); + const struct nf_osf_user_finger *f; + unsigned char opts[MAX_IPOPTLEN]; + const struct nf_osf_finger *kf; + struct nf_osf_hdr_ctx ctx; + const struct tcphdr *tcp; + const char *genre = NULL; + + memset(&ctx, 0, sizeof(ctx)); + + tcp = nf_osf_hdr_ctx_init(&ctx, skb, ip, opts); + if (!tcp) + return NULL; + + list_for_each_entry_rcu(kf, &nf_osf_fingers[ctx.df], finger_entry) { + f = &kf->finger; + if (!nf_osf_match_one(skb, f, -1, &ctx)) + continue; + + genre = f->genre; + break; + } + + return genre; +} +EXPORT_SYMBOL_GPL(nf_osf_find); + +static const struct nla_policy nfnl_osf_policy[OSF_ATTR_MAX + 1] = { + [OSF_ATTR_FINGER] = { .len = sizeof(struct nf_osf_user_finger) }, +}; + +static int nfnl_osf_add_callback(struct net *net, struct sock *ctnl, + struct sk_buff *skb, const struct nlmsghdr *nlh, + const struct nlattr * const osf_attrs[], + struct netlink_ext_ack 
*extack) +{ + struct nf_osf_user_finger *f; + struct nf_osf_finger *kf = NULL, *sf; + int err = 0; + + if (!capable(CAP_NET_ADMIN)) + return -EPERM; + + if (!osf_attrs[OSF_ATTR_FINGER]) + return -EINVAL; + + if (!(nlh->nlmsg_flags & NLM_F_CREATE)) + return -EINVAL; + + f = nla_data(osf_attrs[OSF_ATTR_FINGER]); + + kf = kmalloc(sizeof(struct nf_osf_finger), GFP_KERNEL); + if (!kf) + return -ENOMEM; + + memcpy(&kf->finger, f, sizeof(struct nf_osf_user_finger)); + + list_for_each_entry(sf, &nf_osf_fingers[!!f->df], finger_entry) { + if (memcmp(&sf->finger, f, sizeof(struct nf_osf_user_finger))) + continue; + + kfree(kf); + kf = NULL; + + if (nlh->nlmsg_flags & NLM_F_EXCL) + err = -EEXIST; + break; + } + + /* + * We are protected by nfnl mutex. + */ + if (kf) + list_add_tail_rcu(&kf->finger_entry, &nf_osf_fingers[!!f->df]); + + return err; +} + +static int nfnl_osf_remove_callback(struct net *net, struct sock *ctnl, + struct sk_buff *skb, + const struct nlmsghdr *nlh, + const struct nlattr * const osf_attrs[], + struct netlink_ext_ack *extack) +{ + struct nf_osf_user_finger *f; + struct nf_osf_finger *sf; + int err = -ENOENT; + + if (!capable(CAP_NET_ADMIN)) + return -EPERM; + + if (!osf_attrs[OSF_ATTR_FINGER]) + return -EINVAL; + + f = nla_data(osf_attrs[OSF_ATTR_FINGER]); + + list_for_each_entry(sf, &nf_osf_fingers[!!f->df], finger_entry) { + if (memcmp(&sf->finger, f, sizeof(struct nf_osf_user_finger))) + continue; + + /* + * We are protected by nfnl mutex. 
+ */ + list_del_rcu(&sf->finger_entry); + kfree_rcu(sf, rcu_head); + + err = 0; + break; + } + + return err; +} + +static const struct nfnl_callback nfnl_osf_callbacks[OSF_MSG_MAX] = { + [OSF_MSG_ADD] = { + .call = nfnl_osf_add_callback, + .attr_count = OSF_ATTR_MAX, + .policy = nfnl_osf_policy, + }, + [OSF_MSG_REMOVE] = { + .call = nfnl_osf_remove_callback, + .attr_count = OSF_ATTR_MAX, + .policy = nfnl_osf_policy, + }, +}; + +static const struct nfnetlink_subsystem nfnl_osf_subsys = { + .name = "osf", + .subsys_id = NFNL_SUBSYS_OSF, + .cb_count = OSF_MSG_MAX, + .cb = nfnl_osf_callbacks, +}; + +static int __init nfnl_osf_init(void) +{ + int err = -EINVAL; + int i; + + for (i = 0; i < ARRAY_SIZE(nf_osf_fingers); ++i) + INIT_LIST_HEAD(&nf_osf_fingers[i]); + + err = nfnetlink_subsys_register(&nfnl_osf_subsys); + if (err < 0) { + pr_err("Failed to register OSF nsfnetlink helper (%d)\n", err); + goto err_out_exit; + } + return 0; + +err_out_exit: + return err; +} + +static void __exit nfnl_osf_fini(void) +{ + struct nf_osf_finger *f; + int i; + + nfnetlink_subsys_unregister(&nfnl_osf_subsys); + + rcu_read_lock(); + for (i = 0; i < ARRAY_SIZE(nf_osf_fingers); ++i) { + list_for_each_entry_rcu(f, &nf_osf_fingers[i], finger_entry) { + list_del_rcu(&f->finger_entry); + kfree_rcu(f, rcu_head); + } + } + rcu_read_unlock(); + + rcu_barrier(); +} + +module_init(nfnl_osf_init); +module_exit(nfnl_osf_fini); + +MODULE_LICENSE("GPL"); diff --git a/net/netfilter/nft_chain_filter.c b/net/netfilter/nft_chain_filter.c index d21834bed805..ea5b7c4944f6 100644 --- a/net/netfilter/nft_chain_filter.c +++ b/net/netfilter/nft_chain_filter.c @@ -322,7 +322,7 @@ static int nf_tables_netdev_event(struct notifier_block *this, if (!ctx.net) return NOTIFY_DONE; - nfnl_lock(NFNL_SUBSYS_NFTABLES); + mutex_lock(&ctx.net->nft.commit_mutex); list_for_each_entry(table, &ctx.net->nft.tables, list) { if (table->family != NFPROTO_NETDEV) continue; @@ -337,7 +337,7 @@ static int nf_tables_netdev_event(struct 
notifier_block *this, nft_netdev_event(event, dev, &ctx); } } - nfnl_unlock(NFNL_SUBSYS_NFTABLES); + mutex_unlock(&ctx.net->nft.commit_mutex); put_net(ctx.net); return NOTIFY_DONE; diff --git a/net/netfilter/nft_connlimit.c b/net/netfilter/nft_connlimit.c index a832c59f0a9c..b90d96ba4a12 100644 --- a/net/netfilter/nft_connlimit.c +++ b/net/netfilter/nft_connlimit.c @@ -14,10 +14,9 @@ #include <net/netfilter/nf_conntrack_zones.h> struct nft_connlimit { - spinlock_t lock; - struct hlist_head hhead; - u32 limit; - bool invert; + struct nf_conncount_list list; + u32 limit; + bool invert; }; static inline void nft_connlimit_do_eval(struct nft_connlimit *priv, @@ -45,21 +44,19 @@ static inline void nft_connlimit_do_eval(struct nft_connlimit *priv, return; } - spin_lock_bh(&priv->lock); - count = nf_conncount_lookup(nft_net(pkt), &priv->hhead, tuple_ptr, zone, - &addit); + nf_conncount_lookup(nft_net(pkt), &priv->list, tuple_ptr, zone, + &addit); + count = priv->list.count; if (!addit) goto out; - if (!nf_conncount_add(&priv->hhead, tuple_ptr, zone)) { + if (nf_conncount_add(&priv->list, tuple_ptr, zone) == NF_CONNCOUNT_ERR) { regs->verdict.code = NF_DROP; - spin_unlock_bh(&priv->lock); return; } count++; out: - spin_unlock_bh(&priv->lock); if ((count > priv->limit) ^ priv->invert) { regs->verdict.code = NFT_BREAK; @@ -87,8 +84,7 @@ static int nft_connlimit_do_init(const struct nft_ctx *ctx, invert = true; } - spin_lock_init(&priv->lock); - INIT_HLIST_HEAD(&priv->hhead); + nf_conncount_list_init(&priv->list); priv->limit = limit; priv->invert = invert; @@ -99,7 +95,7 @@ static void nft_connlimit_do_destroy(const struct nft_ctx *ctx, struct nft_connlimit *priv) { nf_ct_netns_put(ctx->net, ctx->family); - nf_conncount_cache_free(&priv->hhead); + nf_conncount_cache_free(&priv->list); } static int nft_connlimit_do_dump(struct sk_buff *skb, @@ -212,8 +208,7 @@ static int nft_connlimit_clone(struct nft_expr *dst, const struct nft_expr *src) struct nft_connlimit *priv_dst = 
nft_expr_priv(dst); struct nft_connlimit *priv_src = nft_expr_priv(src); - spin_lock_init(&priv_dst->lock); - INIT_HLIST_HEAD(&priv_dst->hhead); + nf_conncount_list_init(&priv_dst->list); priv_dst->limit = priv_src->limit; priv_dst->invert = priv_src->invert; @@ -225,21 +220,14 @@ static void nft_connlimit_destroy_clone(const struct nft_ctx *ctx, { struct nft_connlimit *priv = nft_expr_priv(expr); - nf_conncount_cache_free(&priv->hhead); + nf_conncount_cache_free(&priv->list); } static bool nft_connlimit_gc(struct net *net, const struct nft_expr *expr) { struct nft_connlimit *priv = nft_expr_priv(expr); - bool addit, ret; - spin_lock_bh(&priv->lock); - nf_conncount_lookup(net, &priv->hhead, NULL, &nf_ct_zone_dflt, &addit); - - ret = hlist_empty(&priv->hhead); - spin_unlock_bh(&priv->lock); - - return ret; + return nf_conncount_gc_list(net, &priv->list); } static struct nft_expr_type nft_connlimit_type; diff --git a/net/netfilter/nft_ct.c b/net/netfilter/nft_ct.c index 1435ffc5f57e..4855d4ce1c8f 100644 --- a/net/netfilter/nft_ct.c +++ b/net/netfilter/nft_ct.c @@ -22,6 +22,8 @@ #include <net/netfilter/nf_conntrack_helper.h> #include <net/netfilter/nf_conntrack_ecache.h> #include <net/netfilter/nf_conntrack_labels.h> +#include <net/netfilter/nf_conntrack_timeout.h> +#include <net/netfilter/nf_conntrack_l4proto.h> struct nft_ct { enum nft_ct_keys key:8; @@ -765,6 +767,194 @@ static struct nft_expr_type nft_notrack_type __read_mostly = { .owner = THIS_MODULE, }; +#ifdef CONFIG_NF_CONNTRACK_TIMEOUT +static int +nft_ct_timeout_parse_policy(void *timeouts, + const struct nf_conntrack_l4proto *l4proto, + struct net *net, const struct nlattr *attr) +{ + struct nlattr **tb; + int ret = 0; + + if (!l4proto->ctnl_timeout.nlattr_to_obj) + return 0; + + tb = kcalloc(l4proto->ctnl_timeout.nlattr_max + 1, sizeof(*tb), + GFP_KERNEL); + + if (!tb) + return -ENOMEM; + + ret = nla_parse_nested(tb, l4proto->ctnl_timeout.nlattr_max, + attr, l4proto->ctnl_timeout.nla_policy, + NULL); + if 
(ret < 0) + goto err; + + ret = l4proto->ctnl_timeout.nlattr_to_obj(tb, net, timeouts); + +err: + kfree(tb); + return ret; +} + +struct nft_ct_timeout_obj { + struct nf_conn *tmpl; + u8 l4proto; +}; + +static void nft_ct_timeout_obj_eval(struct nft_object *obj, + struct nft_regs *regs, + const struct nft_pktinfo *pkt) +{ + const struct nft_ct_timeout_obj *priv = nft_obj_data(obj); + struct nf_conn *ct = (struct nf_conn *)skb_nfct(pkt->skb); + struct sk_buff *skb = pkt->skb; + + if (ct || + priv->l4proto != pkt->tprot) + return; + + nf_ct_set(skb, priv->tmpl, IP_CT_NEW); +} + +static int nft_ct_timeout_obj_init(const struct nft_ctx *ctx, + const struct nlattr * const tb[], + struct nft_object *obj) +{ + const struct nf_conntrack_zone *zone = &nf_ct_zone_dflt; + struct nft_ct_timeout_obj *priv = nft_obj_data(obj); + const struct nf_conntrack_l4proto *l4proto; + struct nf_conn_timeout *timeout_ext; + struct nf_ct_timeout *timeout; + int l3num = ctx->family; + struct nf_conn *tmpl; + __u8 l4num; + int ret; + + if (!tb[NFTA_CT_TIMEOUT_L3PROTO] || + !tb[NFTA_CT_TIMEOUT_L4PROTO] || + !tb[NFTA_CT_TIMEOUT_DATA]) + return -EINVAL; + + l3num = ntohs(nla_get_be16(tb[NFTA_CT_TIMEOUT_L3PROTO])); + l4num = nla_get_u8(tb[NFTA_CT_TIMEOUT_L4PROTO]); + priv->l4proto = l4num; + + l4proto = nf_ct_l4proto_find_get(l3num, l4num); + + if (l4proto->l4proto != l4num) { + ret = -EOPNOTSUPP; + goto err_proto_put; + } + + timeout = kzalloc(sizeof(struct nf_ct_timeout) + + l4proto->ctnl_timeout.obj_size, GFP_KERNEL); + if (timeout == NULL) { + ret = -ENOMEM; + goto err_proto_put; + } + + ret = nft_ct_timeout_parse_policy(&timeout->data, l4proto, ctx->net, + tb[NFTA_CT_TIMEOUT_DATA]); + if (ret < 0) + goto err_free_timeout; + + timeout->l3num = l3num; + timeout->l4proto = l4proto; + tmpl = nf_ct_tmpl_alloc(ctx->net, zone, GFP_ATOMIC); + if (!tmpl) { + ret = -ENOMEM; + goto err_free_timeout; + } + + timeout_ext = nf_ct_timeout_ext_add(tmpl, timeout, GFP_ATOMIC); + if (!timeout_ext) { + ret = 
-ENOMEM; + goto err_free_tmpl; + } + + ret = nf_ct_netns_get(ctx->net, ctx->family); + if (ret < 0) + goto err_free_tmpl; + + priv->tmpl = tmpl; + + return 0; + +err_free_tmpl: + nf_ct_tmpl_free(tmpl); +err_free_timeout: + kfree(timeout); +err_proto_put: + nf_ct_l4proto_put(l4proto); + return ret; +} + +static void nft_ct_timeout_obj_destroy(const struct nft_ctx *ctx, + struct nft_object *obj) +{ + struct nft_ct_timeout_obj *priv = nft_obj_data(obj); + struct nf_conn_timeout *t = nf_ct_timeout_find(priv->tmpl); + struct nf_ct_timeout *timeout; + + timeout = rcu_dereference_raw(t->timeout); + nf_ct_untimeout(ctx->net, timeout); + nf_ct_l4proto_put(timeout->l4proto); + nf_ct_netns_put(ctx->net, ctx->family); + nf_ct_tmpl_free(priv->tmpl); +} + +static int nft_ct_timeout_obj_dump(struct sk_buff *skb, + struct nft_object *obj, bool reset) +{ + const struct nft_ct_timeout_obj *priv = nft_obj_data(obj); + const struct nf_conn_timeout *t = nf_ct_timeout_find(priv->tmpl); + const struct nf_ct_timeout *timeout = rcu_dereference_raw(t->timeout); + struct nlattr *nest_params; + int ret; + + if (nla_put_u8(skb, NFTA_CT_TIMEOUT_L4PROTO, timeout->l4proto->l4proto) || + nla_put_be16(skb, NFTA_CT_TIMEOUT_L3PROTO, htons(timeout->l3num))) + return -1; + + nest_params = nla_nest_start(skb, NFTA_CT_TIMEOUT_DATA | NLA_F_NESTED); + if (!nest_params) + return -1; + + ret = timeout->l4proto->ctnl_timeout.obj_to_nlattr(skb, &timeout->data); + if (ret < 0) + return -1; + nla_nest_end(skb, nest_params); + return 0; +} + +static const struct nla_policy nft_ct_timeout_policy[NFTA_CT_TIMEOUT_MAX + 1] = { + [NFTA_CT_TIMEOUT_L3PROTO] = {.type = NLA_U16 }, + [NFTA_CT_TIMEOUT_L4PROTO] = {.type = NLA_U8 }, + [NFTA_CT_TIMEOUT_DATA] = {.type = NLA_NESTED }, +}; + +static struct nft_object_type nft_ct_timeout_obj_type; + +static const struct nft_object_ops nft_ct_timeout_obj_ops = { + .type = &nft_ct_timeout_obj_type, + .size = sizeof(struct nft_ct_timeout_obj), + .eval = nft_ct_timeout_obj_eval, + 
.init = nft_ct_timeout_obj_init, + .destroy = nft_ct_timeout_obj_destroy, + .dump = nft_ct_timeout_obj_dump, +}; + +static struct nft_object_type nft_ct_timeout_obj_type __read_mostly = { + .type = NFT_OBJECT_CT_TIMEOUT, + .ops = &nft_ct_timeout_obj_ops, + .maxattr = NFTA_CT_TIMEOUT_MAX, + .policy = nft_ct_timeout_policy, + .owner = THIS_MODULE, +}; +#endif /* CONFIG_NF_CONNTRACK_TIMEOUT */ + static int nft_ct_helper_obj_init(const struct nft_ctx *ctx, const struct nlattr * const tb[], struct nft_object *obj) @@ -773,6 +963,7 @@ static int nft_ct_helper_obj_init(const struct nft_ctx *ctx, struct nf_conntrack_helper *help4, *help6; char name[NF_CT_HELPER_NAME_LEN]; int family = ctx->family; + int err; if (!tb[NFTA_CT_HELPER_NAME] || !tb[NFTA_CT_HELPER_L4PROTO]) return -EINVAL; @@ -823,7 +1014,18 @@ static int nft_ct_helper_obj_init(const struct nft_ctx *ctx, priv->helper4 = help4; priv->helper6 = help6; + err = nf_ct_netns_get(ctx->net, ctx->family); + if (err < 0) + goto err_put_helper; + return 0; + +err_put_helper: + if (priv->helper4) + nf_conntrack_helper_put(priv->helper4); + if (priv->helper6) + nf_conntrack_helper_put(priv->helper6); + return err; } static void nft_ct_helper_obj_destroy(const struct nft_ctx *ctx, @@ -835,6 +1037,8 @@ static void nft_ct_helper_obj_destroy(const struct nft_ctx *ctx, nf_conntrack_helper_put(priv->helper4); if (priv->helper6) nf_conntrack_helper_put(priv->helper6); + + nf_ct_netns_put(ctx->net, ctx->family); } static void nft_ct_helper_obj_eval(struct nft_object *obj, @@ -870,7 +1074,7 @@ static void nft_ct_helper_obj_eval(struct nft_object *obj, if (test_bit(IPS_HELPER_BIT, &ct->status)) return; - help = nf_ct_helper_ext_add(ct, to_assign, GFP_ATOMIC); + help = nf_ct_helper_ext_add(ct, GFP_ATOMIC); if (help) { rcu_assign_pointer(help->helper, to_assign); set_bit(IPS_HELPER_BIT, &ct->status); @@ -949,9 +1153,17 @@ static int __init nft_ct_module_init(void) err = nft_register_obj(&nft_ct_helper_obj_type); if (err < 0) goto err2; 
- +#ifdef CONFIG_NF_CONNTRACK_TIMEOUT + err = nft_register_obj(&nft_ct_timeout_obj_type); + if (err < 0) + goto err3; +#endif return 0; +#ifdef CONFIG_NF_CONNTRACK_TIMEOUT +err3: + nft_unregister_obj(&nft_ct_helper_obj_type); +#endif err2: nft_unregister_expr(&nft_notrack_type); err1: @@ -961,6 +1173,9 @@ err1: static void __exit nft_ct_module_exit(void) { +#ifdef CONFIG_NF_CONNTRACK_TIMEOUT + nft_unregister_obj(&nft_ct_timeout_obj_type); +#endif nft_unregister_obj(&nft_ct_helper_obj_type); nft_unregister_expr(&nft_notrack_type); nft_unregister_expr(&nft_ct_type); @@ -974,3 +1189,4 @@ MODULE_AUTHOR("Patrick McHardy <kaber@trash.net>"); MODULE_ALIAS_NFT_EXPR("ct"); MODULE_ALIAS_NFT_EXPR("notrack"); MODULE_ALIAS_NFT_OBJ(NFT_OBJECT_CT_HELPER); +MODULE_ALIAS_NFT_OBJ(NFT_OBJECT_CT_TIMEOUT); diff --git a/net/netfilter/nft_dynset.c b/net/netfilter/nft_dynset.c index 27d7e4598ab6..81184c244d1a 100644 --- a/net/netfilter/nft_dynset.c +++ b/net/netfilter/nft_dynset.c @@ -118,6 +118,8 @@ static int nft_dynset_init(const struct nft_ctx *ctx, u64 timeout; int err; + lockdep_assert_held(&ctx->net->nft.commit_mutex); + if (tb[NFTA_DYNSET_SET_NAME] == NULL || tb[NFTA_DYNSET_OP] == NULL || tb[NFTA_DYNSET_SREG_KEY] == NULL) diff --git a/net/netfilter/nft_lookup.c b/net/netfilter/nft_lookup.c index c2a1d84cdfc4..ad13e8643599 100644 --- a/net/netfilter/nft_lookup.c +++ b/net/netfilter/nft_lookup.c @@ -26,9 +26,9 @@ struct nft_lookup { struct nft_set_binding binding; }; -static void nft_lookup_eval(const struct nft_expr *expr, - struct nft_regs *regs, - const struct nft_pktinfo *pkt) +void nft_lookup_eval(const struct nft_expr *expr, + struct nft_regs *regs, + const struct nft_pktinfo *pkt) { const struct nft_lookup *priv = nft_expr_priv(expr); const struct nft_set *set = priv->set; diff --git a/net/netfilter/nft_meta.c b/net/netfilter/nft_meta.c index 1105a23bda5e..297fe7d97c18 100644 --- a/net/netfilter/nft_meta.c +++ b/net/netfilter/nft_meta.c @@ -41,9 +41,9 @@ static 
DEFINE_PER_CPU(struct rnd_state, nft_prandom_state); #include "../bridge/br_private.h" #endif -static void nft_meta_get_eval(const struct nft_expr *expr, - struct nft_regs *regs, - const struct nft_pktinfo *pkt) +void nft_meta_get_eval(const struct nft_expr *expr, + struct nft_regs *regs, + const struct nft_pktinfo *pkt) { const struct nft_meta *priv = nft_expr_priv(expr); const struct sk_buff *skb = pkt->skb; @@ -107,7 +107,8 @@ static void nft_meta_get_eval(const struct nft_expr *expr, break; case NFT_META_SKUID: sk = skb_to_full_sk(skb); - if (!sk || !sk_fullsock(sk)) + if (!sk || !sk_fullsock(sk) || + !net_eq(nft_net(pkt), sock_net(sk))) goto err; read_lock_bh(&sk->sk_callback_lock); @@ -123,7 +124,8 @@ static void nft_meta_get_eval(const struct nft_expr *expr, break; case NFT_META_SKGID: sk = skb_to_full_sk(skb); - if (!sk || !sk_fullsock(sk)) + if (!sk || !sk_fullsock(sk) || + !net_eq(nft_net(pkt), sock_net(sk))) goto err; read_lock_bh(&sk->sk_callback_lock); @@ -214,7 +216,8 @@ static void nft_meta_get_eval(const struct nft_expr *expr, #ifdef CONFIG_CGROUP_NET_CLASSID case NFT_META_CGROUP: sk = skb_to_full_sk(skb); - if (!sk || !sk_fullsock(sk)) + if (!sk || !sk_fullsock(sk) || + !net_eq(nft_net(pkt), sock_net(sk))) goto err; *dest = sock_cgroup_classid(&sk->sk_cgrp_data); break; diff --git a/net/netfilter/nft_numgen.c b/net/netfilter/nft_numgen.c index 1f4d0854cf70..649d1700ec5b 100644 --- a/net/netfilter/nft_numgen.c +++ b/net/netfilter/nft_numgen.c @@ -237,10 +237,8 @@ static int nft_ng_random_map_init(const struct nft_ctx *ctx, priv->map = nft_set_lookup_global(ctx->net, ctx->table, tb[NFTA_NG_SET_NAME], tb[NFTA_NG_SET_ID], genmask); - if (IS_ERR(priv->map)) - return PTR_ERR(priv->map); - return 0; + return PTR_ERR_OR_ZERO(priv->map); } static int nft_ng_random_dump(struct sk_buff *skb, const struct nft_expr *expr) diff --git a/net/netfilter/nft_osf.c b/net/netfilter/nft_osf.c new file mode 100644 index 000000000000..5af74b37f423 --- /dev/null +++ 
b/net/netfilter/nft_osf.c @@ -0,0 +1,104 @@ +#include <net/ip.h> +#include <net/tcp.h> + +#include <net/netfilter/nf_tables.h> +#include <linux/netfilter/nfnetlink_osf.h> + +struct nft_osf { + enum nft_registers dreg:8; +}; + +static const struct nla_policy nft_osf_policy[NFTA_OSF_MAX + 1] = { + [NFTA_OSF_DREG] = { .type = NLA_U32 }, +}; + +static void nft_osf_eval(const struct nft_expr *expr, struct nft_regs *regs, + const struct nft_pktinfo *pkt) +{ + struct nft_osf *priv = nft_expr_priv(expr); + u32 *dest = &regs->data[priv->dreg]; + struct sk_buff *skb = pkt->skb; + const struct tcphdr *tcp; + struct tcphdr _tcph; + const char *os_name; + + tcp = skb_header_pointer(skb, ip_hdrlen(skb), + sizeof(struct tcphdr), &_tcph); + if (!tcp) { + regs->verdict.code = NFT_BREAK; + return; + } + if (!tcp->syn) { + regs->verdict.code = NFT_BREAK; + return; + } + + os_name = nf_osf_find(skb, nf_osf_fingers); + if (!os_name) + strncpy((char *)dest, "unknown", NFT_OSF_MAXGENRELEN); + else + strncpy((char *)dest, os_name, NFT_OSF_MAXGENRELEN); +} + +static int nft_osf_init(const struct nft_ctx *ctx, + const struct nft_expr *expr, + const struct nlattr * const tb[]) +{ + struct nft_osf *priv = nft_expr_priv(expr); + int err; + + priv->dreg = nft_parse_register(tb[NFTA_OSF_DREG]); + err = nft_validate_register_store(ctx, priv->dreg, NULL, + NFTA_DATA_VALUE, NFT_OSF_MAXGENRELEN); + if (err < 0) + return err; + + return 0; +} + +static int nft_osf_dump(struct sk_buff *skb, const struct nft_expr *expr) +{ + const struct nft_osf *priv = nft_expr_priv(expr); + + if (nft_dump_register(skb, NFTA_OSF_DREG, priv->dreg)) + goto nla_put_failure; + + return 0; + +nla_put_failure: + return -1; +} + +static struct nft_expr_type nft_osf_type; +static const struct nft_expr_ops nft_osf_op = { + .eval = nft_osf_eval, + .size = NFT_EXPR_SIZE(sizeof(struct nft_osf)), + .init = nft_osf_init, + .dump = nft_osf_dump, + .type = &nft_osf_type, +}; + +static struct nft_expr_type nft_osf_type __read_mostly = { 
+ .ops = &nft_osf_op, + .name = "osf", + .owner = THIS_MODULE, + .policy = nft_osf_policy, + .maxattr = NFTA_OSF_MAX, +}; + +static int __init nft_osf_module_init(void) +{ + return nft_register_expr(&nft_osf_type); +} + +static void __exit nft_osf_module_exit(void) +{ + return nft_unregister_expr(&nft_osf_type); +} + +module_init(nft_osf_module_init); +module_exit(nft_osf_module_exit); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Fernando Fernandez <ffmancera@riseup.net>"); +MODULE_ALIAS_NFT_EXPR("osf"); diff --git a/net/netfilter/nft_socket.c b/net/netfilter/nft_socket.c index 74e1b3bd6954..d7f3776dfd71 100644 --- a/net/netfilter/nft_socket.c +++ b/net/netfilter/nft_socket.c @@ -23,12 +23,15 @@ static void nft_socket_eval(const struct nft_expr *expr, struct sock *sk = skb->sk; u32 *dest = &regs->data[priv->dreg]; + if (sk && !net_eq(nft_net(pkt), sock_net(sk))) + sk = NULL; + if (!sk) switch(nft_pf(pkt)) { case NFPROTO_IPV4: sk = nf_sk_lookup_slow_v4(nft_net(pkt), skb, nft_in(pkt)); break; -#if IS_ENABLED(CONFIG_NF_SOCKET_IPV6) +#if IS_ENABLED(CONFIG_NF_TABLES_IPV6) case NFPROTO_IPV6: sk = nf_sk_lookup_slow_v6(nft_net(pkt), skb, nft_in(pkt)); break; @@ -39,8 +42,8 @@ static void nft_socket_eval(const struct nft_expr *expr, return; } - if(!sk) { - nft_reg_store8(dest, 0); + if (!sk) { + regs->verdict.code = NFT_BREAK; return; } @@ -51,6 +54,14 @@ static void nft_socket_eval(const struct nft_expr *expr, case NFT_SOCKET_TRANSPARENT: nft_reg_store8(dest, inet_sk_transparent(sk)); break; + case NFT_SOCKET_MARK: + if (sk_fullsock(sk)) { + *dest = sk->sk_mark; + } else { + regs->verdict.code = NFT_BREAK; + return; + } + break; default: WARN_ON(1); regs->verdict.code = NFT_BREAK; @@ -74,7 +85,7 @@ static int nft_socket_init(const struct nft_ctx *ctx, switch(ctx->family) { case NFPROTO_IPV4: -#if IS_ENABLED(CONFIG_NF_SOCKET_IPV6) +#if IS_ENABLED(CONFIG_NF_TABLES_IPV6) case NFPROTO_IPV6: #endif case NFPROTO_INET: @@ -88,6 +99,9 @@ static int nft_socket_init(const struct nft_ctx 
*ctx, case NFT_SOCKET_TRANSPARENT: len = sizeof(u8); break; + case NFT_SOCKET_MARK: + len = sizeof(u32); + break; default: return -EOPNOTSUPP; } diff --git a/net/netfilter/nft_tproxy.c b/net/netfilter/nft_tproxy.c new file mode 100644 index 000000000000..eff99dffc842 --- /dev/null +++ b/net/netfilter/nft_tproxy.c @@ -0,0 +1,316 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#include <linux/module.h> +#include <linux/netfilter/nf_tables.h> +#include <net/netfilter/nf_tables.h> +#include <net/netfilter/nf_tables_core.h> +#include <net/netfilter/nf_tproxy.h> +#include <net/inet_sock.h> +#include <net/tcp.h> +#include <linux/if_ether.h> +#include <net/netfilter/ipv4/nf_defrag_ipv4.h> +#if IS_ENABLED(CONFIG_NF_TABLES_IPV6) +#include <net/netfilter/ipv6/nf_defrag_ipv6.h> +#endif + +struct nft_tproxy { + enum nft_registers sreg_addr:8; + enum nft_registers sreg_port:8; + u8 family; +}; + +static void nft_tproxy_eval_v4(const struct nft_expr *expr, + struct nft_regs *regs, + const struct nft_pktinfo *pkt) +{ + const struct nft_tproxy *priv = nft_expr_priv(expr); + struct sk_buff *skb = pkt->skb; + const struct iphdr *iph = ip_hdr(skb); + struct udphdr _hdr, *hp; + __be32 taddr = 0; + __be16 tport = 0; + struct sock *sk; + + hp = skb_header_pointer(skb, ip_hdrlen(skb), sizeof(_hdr), &_hdr); + if (!hp) { + regs->verdict.code = NFT_BREAK; + return; + } + + /* check if there's an ongoing connection on the packet addresses, this + * happens if the redirect already happened and the current packet + * belongs to an already established connection + */ + sk = nf_tproxy_get_sock_v4(nft_net(pkt), skb, iph->protocol, + iph->saddr, iph->daddr, + hp->source, hp->dest, + skb->dev, NF_TPROXY_LOOKUP_ESTABLISHED); + + if (priv->sreg_addr) + taddr = regs->data[priv->sreg_addr]; + taddr = nf_tproxy_laddr4(skb, taddr, iph->daddr); + + if (priv->sreg_port) + tport = regs->data[priv->sreg_port]; + if (!tport) + tport = hp->dest; + + /* UDP has no TCP_TIME_WAIT state, so we never enter here */ + if 
(sk && sk->sk_state == TCP_TIME_WAIT) { + /* reopening a TIME_WAIT connection needs special handling */ + sk = nf_tproxy_handle_time_wait4(nft_net(pkt), skb, taddr, tport, sk); + } else if (!sk) { + /* no, there's no established connection, check if + * there's a listener on the redirected addr/port + */ + sk = nf_tproxy_get_sock_v4(nft_net(pkt), skb, iph->protocol, + iph->saddr, taddr, + hp->source, tport, + skb->dev, NF_TPROXY_LOOKUP_LISTENER); + } + + if (sk && nf_tproxy_sk_is_transparent(sk)) + nf_tproxy_assign_sock(skb, sk); + else + regs->verdict.code = NFT_BREAK; +} + +#if IS_ENABLED(CONFIG_NF_TABLES_IPV6) +static void nft_tproxy_eval_v6(const struct nft_expr *expr, + struct nft_regs *regs, + const struct nft_pktinfo *pkt) +{ + const struct nft_tproxy *priv = nft_expr_priv(expr); + struct sk_buff *skb = pkt->skb; + const struct ipv6hdr *iph = ipv6_hdr(skb); + struct in6_addr taddr = {0}; + int thoff = pkt->xt.thoff; + struct udphdr _hdr, *hp; + __be16 tport = 0; + struct sock *sk; + int l4proto; + + if (!pkt->tprot_set) { + regs->verdict.code = NFT_BREAK; + return; + } + l4proto = pkt->tprot; + + hp = skb_header_pointer(skb, thoff, sizeof(_hdr), &_hdr); + if (hp == NULL) { + regs->verdict.code = NFT_BREAK; + return; + } + + /* check if there's an ongoing connection on the packet addresses, this + * happens if the redirect already happened and the current packet + * belongs to an already established connection + */ + sk = nf_tproxy_get_sock_v6(nft_net(pkt), skb, thoff, l4proto, + &iph->saddr, &iph->daddr, + hp->source, hp->dest, + nft_in(pkt), NF_TPROXY_LOOKUP_ESTABLISHED); + + if (priv->sreg_addr) + memcpy(&taddr, &regs->data[priv->sreg_addr], sizeof(taddr)); + taddr = *nf_tproxy_laddr6(skb, &taddr, &iph->daddr); + + if (priv->sreg_port) + tport = regs->data[priv->sreg_port]; + if (!tport) + tport = hp->dest; + + /* UDP has no TCP_TIME_WAIT state, so we never enter here */ + if (sk && sk->sk_state == TCP_TIME_WAIT) { + /* reopening a TIME_WAIT connection needs 
special handling */ + sk = nf_tproxy_handle_time_wait6(skb, l4proto, thoff, + nft_net(pkt), + &taddr, + tport, + sk); + } else if (!sk) { + /* no there's no established connection, check if + * there's a listener on the redirected addr/port + */ + sk = nf_tproxy_get_sock_v6(nft_net(pkt), skb, thoff, + l4proto, &iph->saddr, &taddr, + hp->source, tport, + nft_in(pkt), NF_TPROXY_LOOKUP_LISTENER); + } + + /* NOTE: assign_sock consumes our sk reference */ + if (sk && nf_tproxy_sk_is_transparent(sk)) + nf_tproxy_assign_sock(skb, sk); + else + regs->verdict.code = NFT_BREAK; +} +#endif + +static void nft_tproxy_eval(const struct nft_expr *expr, + struct nft_regs *regs, + const struct nft_pktinfo *pkt) +{ + const struct nft_tproxy *priv = nft_expr_priv(expr); + + switch (nft_pf(pkt)) { + case NFPROTO_IPV4: + switch (priv->family) { + case NFPROTO_IPV4: + case NFPROTO_UNSPEC: + nft_tproxy_eval_v4(expr, regs, pkt); + return; + } + break; +#if IS_ENABLED(CONFIG_NF_TABLES_IPV6) + case NFPROTO_IPV6: + switch (priv->family) { + case NFPROTO_IPV6: + case NFPROTO_UNSPEC: + nft_tproxy_eval_v6(expr, regs, pkt); + return; + } +#endif + } + regs->verdict.code = NFT_BREAK; +} + +static const struct nla_policy nft_tproxy_policy[NFTA_TPROXY_MAX + 1] = { + [NFTA_TPROXY_FAMILY] = { .type = NLA_U32 }, + [NFTA_TPROXY_REG_ADDR] = { .type = NLA_U32 }, + [NFTA_TPROXY_REG_PORT] = { .type = NLA_U32 }, +}; + +static int nft_tproxy_init(const struct nft_ctx *ctx, + const struct nft_expr *expr, + const struct nlattr * const tb[]) +{ + struct nft_tproxy *priv = nft_expr_priv(expr); + unsigned int alen = 0; + int err; + + if (!tb[NFTA_TPROXY_FAMILY] || + (!tb[NFTA_TPROXY_REG_ADDR] && !tb[NFTA_TPROXY_REG_PORT])) + return -EINVAL; + + priv->family = ntohl(nla_get_be32(tb[NFTA_TPROXY_FAMILY])); + + switch (ctx->family) { + case NFPROTO_IPV4: + if (priv->family != NFPROTO_IPV4) + return -EINVAL; + break; +#if IS_ENABLED(CONFIG_NF_TABLES_IPV6) + case NFPROTO_IPV6: + if (priv->family != NFPROTO_IPV6) + 
return -EINVAL; + break; +#endif + case NFPROTO_INET: + break; + default: + return -EOPNOTSUPP; + } + + /* Address is specified but the rule family is not set accordingly */ + if (priv->family == NFPROTO_UNSPEC && tb[NFTA_TPROXY_REG_ADDR]) + return -EINVAL; + + switch (priv->family) { + case NFPROTO_IPV4: + alen = FIELD_SIZEOF(union nf_inet_addr, in); + err = nf_defrag_ipv4_enable(ctx->net); + if (err) + return err; + break; +#if IS_ENABLED(CONFIG_NF_TABLES_IPV6) + case NFPROTO_IPV6: + alen = FIELD_SIZEOF(union nf_inet_addr, in6); + err = nf_defrag_ipv6_enable(ctx->net); + if (err) + return err; + break; +#endif + case NFPROTO_UNSPEC: + /* No address is specified here */ + err = nf_defrag_ipv4_enable(ctx->net); + if (err) + return err; +#if IS_ENABLED(CONFIG_NF_TABLES_IPV6) + err = nf_defrag_ipv6_enable(ctx->net); + if (err) + return err; +#endif + break; + default: + return -EOPNOTSUPP; + } + + if (tb[NFTA_TPROXY_REG_ADDR]) { + priv->sreg_addr = nft_parse_register(tb[NFTA_TPROXY_REG_ADDR]); + err = nft_validate_register_load(priv->sreg_addr, alen); + if (err < 0) + return err; + } + + if (tb[NFTA_TPROXY_REG_PORT]) { + priv->sreg_port = nft_parse_register(tb[NFTA_TPROXY_REG_PORT]); + err = nft_validate_register_load(priv->sreg_port, sizeof(u16)); + if (err < 0) + return err; + } + + return 0; +} + +static int nft_tproxy_dump(struct sk_buff *skb, + const struct nft_expr *expr) +{ + const struct nft_tproxy *priv = nft_expr_priv(expr); + + if (nla_put_be32(skb, NFTA_TPROXY_FAMILY, htonl(priv->family))) + return -1; + + if (priv->sreg_addr && + nft_dump_register(skb, NFTA_TPROXY_REG_ADDR, priv->sreg_addr)) + return -1; + + if (priv->sreg_port && + nft_dump_register(skb, NFTA_TPROXY_REG_PORT, priv->sreg_port)) + return -1; + + return 0; +} + +static struct nft_expr_type nft_tproxy_type; +static const struct nft_expr_ops nft_tproxy_ops = { + .type = &nft_tproxy_type, + .size = NFT_EXPR_SIZE(sizeof(struct nft_tproxy)), + .eval = nft_tproxy_eval, + .init = nft_tproxy_init, 
+ .dump = nft_tproxy_dump, +}; + +static struct nft_expr_type nft_tproxy_type __read_mostly = { + .name = "tproxy", + .ops = &nft_tproxy_ops, + .policy = nft_tproxy_policy, + .maxattr = NFTA_TPROXY_MAX, + .owner = THIS_MODULE, +}; + +static int __init nft_tproxy_module_init(void) +{ + return nft_register_expr(&nft_tproxy_type); +} + +static void __exit nft_tproxy_module_exit(void) +{ + nft_unregister_expr(&nft_tproxy_type); +} + +module_init(nft_tproxy_module_init); +module_exit(nft_tproxy_module_exit); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Máté Eckl"); +MODULE_DESCRIPTION("nf_tables tproxy support module"); +MODULE_ALIAS_NFT_EXPR("tproxy"); diff --git a/net/netfilter/nft_tunnel.c b/net/netfilter/nft_tunnel.c new file mode 100644 index 000000000000..3a15f219e4e7 --- /dev/null +++ b/net/netfilter/nft_tunnel.c @@ -0,0 +1,566 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#include <linux/kernel.h> +#include <linux/init.h> +#include <linux/module.h> +#include <linux/seqlock.h> +#include <linux/netlink.h> +#include <linux/netfilter.h> +#include <linux/netfilter/nf_tables.h> +#include <net/netfilter/nf_tables.h> +#include <net/dst_metadata.h> +#include <net/ip_tunnels.h> +#include <net/vxlan.h> +#include <net/erspan.h> + +struct nft_tunnel { + enum nft_tunnel_keys key:8; + enum nft_registers dreg:8; +}; + +static void nft_tunnel_get_eval(const struct nft_expr *expr, + struct nft_regs *regs, + const struct nft_pktinfo *pkt) +{ + const struct nft_tunnel *priv = nft_expr_priv(expr); + u32 *dest = &regs->data[priv->dreg]; + struct ip_tunnel_info *tun_info; + + tun_info = skb_tunnel_info(pkt->skb); + + switch (priv->key) { + case NFT_TUNNEL_PATH: + nft_reg_store8(dest, !!tun_info); + break; + case NFT_TUNNEL_ID: + if (!tun_info) { + regs->verdict.code = NFT_BREAK; + return; + } + *dest = ntohl(tunnel_id_to_key32(tun_info->key.tun_id)); + break; + default: + WARN_ON(1); + regs->verdict.code = NFT_BREAK; + } +} + +static const struct nla_policy nft_tunnel_policy[NFTA_TUNNEL_MAX + 
1] = { + [NFTA_TUNNEL_KEY] = { .type = NLA_U32 }, + [NFTA_TUNNEL_DREG] = { .type = NLA_U32 }, +}; + +static int nft_tunnel_get_init(const struct nft_ctx *ctx, + const struct nft_expr *expr, + const struct nlattr * const tb[]) +{ + struct nft_tunnel *priv = nft_expr_priv(expr); + u32 len; + + if (!tb[NFTA_TUNNEL_KEY] && + !tb[NFTA_TUNNEL_DREG]) + return -EINVAL; + + priv->key = ntohl(nla_get_be32(tb[NFTA_TUNNEL_KEY])); + switch (priv->key) { + case NFT_TUNNEL_PATH: + len = sizeof(u8); + break; + case NFT_TUNNEL_ID: + len = sizeof(u32); + break; + default: + return -EOPNOTSUPP; + } + + priv->dreg = nft_parse_register(tb[NFTA_TUNNEL_DREG]); + + return nft_validate_register_store(ctx, priv->dreg, NULL, + NFT_DATA_VALUE, len); +} + +static int nft_tunnel_get_dump(struct sk_buff *skb, + const struct nft_expr *expr) +{ + const struct nft_tunnel *priv = nft_expr_priv(expr); + + if (nla_put_be32(skb, NFTA_TUNNEL_KEY, htonl(priv->key))) + goto nla_put_failure; + if (nft_dump_register(skb, NFTA_TUNNEL_DREG, priv->dreg)) + goto nla_put_failure; + return 0; + +nla_put_failure: + return -1; +} + +static struct nft_expr_type nft_tunnel_type; +static const struct nft_expr_ops nft_tunnel_get_ops = { + .type = &nft_tunnel_type, + .size = NFT_EXPR_SIZE(sizeof(struct nft_tunnel)), + .eval = nft_tunnel_get_eval, + .init = nft_tunnel_get_init, + .dump = nft_tunnel_get_dump, +}; + +static struct nft_expr_type nft_tunnel_type __read_mostly = { + .name = "tunnel", + .ops = &nft_tunnel_get_ops, + .policy = nft_tunnel_policy, + .maxattr = NFTA_TUNNEL_MAX, + .owner = THIS_MODULE, +}; + +struct nft_tunnel_opts { + union { + struct vxlan_metadata vxlan; + struct erspan_metadata erspan; + } u; + u32 len; + __be16 flags; +}; + +struct nft_tunnel_obj { + struct metadata_dst *md; + struct nft_tunnel_opts opts; +}; + +static const struct nla_policy nft_tunnel_ip_policy[NFTA_TUNNEL_KEY_IP_MAX + 1] = { + [NFTA_TUNNEL_KEY_IP_SRC] = { .type = NLA_U32 }, + [NFTA_TUNNEL_KEY_IP_DST] = { .type = NLA_U32 }, 
+}; + +static int nft_tunnel_obj_ip_init(const struct nft_ctx *ctx, + const struct nlattr *attr, + struct ip_tunnel_info *info) +{ + struct nlattr *tb[NFTA_TUNNEL_KEY_IP_MAX + 1]; + int err; + + err = nla_parse_nested(tb, NFTA_TUNNEL_KEY_IP_MAX, attr, + nft_tunnel_ip_policy, NULL); + if (err < 0) + return err; + + if (!tb[NFTA_TUNNEL_KEY_IP_DST]) + return -EINVAL; + + if (tb[NFTA_TUNNEL_KEY_IP_SRC]) + info->key.u.ipv4.src = nla_get_be32(tb[NFTA_TUNNEL_KEY_IP_SRC]); + if (tb[NFTA_TUNNEL_KEY_IP_DST]) + info->key.u.ipv4.dst = nla_get_be32(tb[NFTA_TUNNEL_KEY_IP_DST]); + + return 0; +} + +static const struct nla_policy nft_tunnel_ip6_policy[NFTA_TUNNEL_KEY_IP6_MAX + 1] = { + [NFTA_TUNNEL_KEY_IP6_SRC] = { .len = sizeof(struct in6_addr), }, + [NFTA_TUNNEL_KEY_IP6_DST] = { .len = sizeof(struct in6_addr), }, + [NFTA_TUNNEL_KEY_IP6_FLOWLABEL] = { .type = NLA_U32, } +}; + +static int nft_tunnel_obj_ip6_init(const struct nft_ctx *ctx, + const struct nlattr *attr, + struct ip_tunnel_info *info) +{ + struct nlattr *tb[NFTA_TUNNEL_KEY_IP6_MAX + 1]; + int err; + + err = nla_parse_nested(tb, NFTA_TUNNEL_KEY_IP6_MAX, attr, + nft_tunnel_ip6_policy, NULL); + if (err < 0) + return err; + + if (!tb[NFTA_TUNNEL_KEY_IP6_DST]) + return -EINVAL; + + if (tb[NFTA_TUNNEL_KEY_IP6_SRC]) { + memcpy(&info->key.u.ipv6.src, + nla_data(tb[NFTA_TUNNEL_KEY_IP6_SRC]), + sizeof(struct in6_addr)); + } + if (tb[NFTA_TUNNEL_KEY_IP6_DST]) { + memcpy(&info->key.u.ipv6.dst, + nla_data(tb[NFTA_TUNNEL_KEY_IP6_DST]), + sizeof(struct in6_addr)); + } + if (tb[NFTA_TUNNEL_KEY_IP6_FLOWLABEL]) + info->key.label = nla_get_be32(tb[NFTA_TUNNEL_KEY_IP6_FLOWLABEL]); + + info->mode |= IP_TUNNEL_INFO_IPV6; + + return 0; +} + +static const struct nla_policy nft_tunnel_opts_vxlan_policy[NFTA_TUNNEL_KEY_VXLAN_MAX + 1] = { + [NFTA_TUNNEL_KEY_VXLAN_GBP] = { .type = NLA_U32 }, +}; + +static int nft_tunnel_obj_vxlan_init(const struct nlattr *attr, + struct nft_tunnel_opts *opts) +{ + struct nlattr *tb[NFTA_TUNNEL_KEY_VXLAN_MAX + 
1]; + int err; + + err = nla_parse_nested(tb, NFTA_TUNNEL_KEY_VXLAN_MAX, attr, + nft_tunnel_opts_vxlan_policy, NULL); + if (err < 0) + return err; + + if (!tb[NFTA_TUNNEL_KEY_VXLAN_GBP]) + return -EINVAL; + + opts->u.vxlan.gbp = ntohl(nla_get_be32(tb[NFTA_TUNNEL_KEY_VXLAN_GBP])); + + opts->len = sizeof(struct vxlan_metadata); + opts->flags = TUNNEL_VXLAN_OPT; + + return 0; +} + +static const struct nla_policy nft_tunnel_opts_erspan_policy[NFTA_TUNNEL_KEY_ERSPAN_MAX + 1] = { + [NFTA_TUNNEL_KEY_ERSPAN_V1_INDEX] = { .type = NLA_U32 }, + [NFTA_TUNNEL_KEY_ERSPAN_V2_DIR] = { .type = NLA_U8 }, + [NFTA_TUNNEL_KEY_ERSPAN_V2_HWID] = { .type = NLA_U8 }, +}; + +static int nft_tunnel_obj_erspan_init(const struct nlattr *attr, + struct nft_tunnel_opts *opts) +{ + struct nlattr *tb[NFTA_TUNNEL_KEY_ERSPAN_MAX + 1]; + uint8_t hwid, dir; + int err, version; + + err = nla_parse_nested(tb, NFTA_TUNNEL_KEY_ERSPAN_MAX, attr, + nft_tunnel_opts_erspan_policy, NULL); + if (err < 0) + return err; + + version = ntohl(nla_get_be32(tb[NFTA_TUNNEL_KEY_ERSPAN_VERSION])); + switch (version) { + case ERSPAN_VERSION: + if (!tb[NFTA_TUNNEL_KEY_ERSPAN_V1_INDEX]) + return -EINVAL; + + opts->u.erspan.u.index = + nla_get_be32(tb[NFTA_TUNNEL_KEY_ERSPAN_V1_INDEX]); + break; + case ERSPAN_VERSION2: + if (!tb[NFTA_TUNNEL_KEY_ERSPAN_V2_DIR] || + !tb[NFTA_TUNNEL_KEY_ERSPAN_V2_HWID]) + return -EINVAL; + + hwid = nla_get_u8(tb[NFTA_TUNNEL_KEY_ERSPAN_V2_HWID]); + dir = nla_get_u8(tb[NFTA_TUNNEL_KEY_ERSPAN_V2_DIR]); + + set_hwid(&opts->u.erspan.u.md2, hwid); + opts->u.erspan.u.md2.dir = dir; + break; + default: + return -EOPNOTSUPP; + } + opts->u.erspan.version = version; + + opts->len = sizeof(struct erspan_metadata); + opts->flags = TUNNEL_ERSPAN_OPT; + + return 0; +} + +static const struct nla_policy nft_tunnel_opts_policy[NFTA_TUNNEL_KEY_OPTS_MAX + 1] = { + [NFTA_TUNNEL_KEY_OPTS_VXLAN] = { .type = NLA_NESTED, }, + [NFTA_TUNNEL_KEY_OPTS_ERSPAN] = { .type = NLA_NESTED, }, +}; + +static int 
nft_tunnel_obj_opts_init(const struct nft_ctx *ctx, + const struct nlattr *attr, + struct ip_tunnel_info *info, + struct nft_tunnel_opts *opts) +{ + struct nlattr *tb[NFTA_TUNNEL_KEY_OPTS_MAX + 1]; + int err; + + err = nla_parse_nested(tb, NFTA_TUNNEL_KEY_OPTS_MAX, attr, + nft_tunnel_opts_policy, NULL); + if (err < 0) + return err; + + if (tb[NFTA_TUNNEL_KEY_OPTS_VXLAN]) { + err = nft_tunnel_obj_vxlan_init(tb[NFTA_TUNNEL_KEY_OPTS_VXLAN], + opts); + } else if (tb[NFTA_TUNNEL_KEY_OPTS_ERSPAN]) { + err = nft_tunnel_obj_erspan_init(tb[NFTA_TUNNEL_KEY_OPTS_ERSPAN], + opts); + } else { + return -EOPNOTSUPP; + } + + return err; +} + +static const struct nla_policy nft_tunnel_key_policy[NFTA_TUNNEL_KEY_MAX + 1] = { + [NFTA_TUNNEL_KEY_IP] = { .type = NLA_NESTED, }, + [NFTA_TUNNEL_KEY_IP6] = { .type = NLA_NESTED, }, + [NFTA_TUNNEL_KEY_ID] = { .type = NLA_U32, }, + [NFTA_TUNNEL_KEY_FLAGS] = { .type = NLA_U32, }, + [NFTA_TUNNEL_KEY_TOS] = { .type = NLA_U8, }, + [NFTA_TUNNEL_KEY_TTL] = { .type = NLA_U8, }, + [NFTA_TUNNEL_KEY_OPTS] = { .type = NLA_NESTED, }, +}; + +static int nft_tunnel_obj_init(const struct nft_ctx *ctx, + const struct nlattr * const tb[], + struct nft_object *obj) +{ + struct nft_tunnel_obj *priv = nft_obj_data(obj); + struct ip_tunnel_info info; + struct metadata_dst *md; + int err; + + if (!tb[NFTA_TUNNEL_KEY_ID]) + return -EINVAL; + + memset(&info, 0, sizeof(info)); + info.mode = IP_TUNNEL_INFO_TX; + info.key.tun_id = key32_to_tunnel_id(nla_get_be32(tb[NFTA_TUNNEL_KEY_ID])); + info.key.tun_flags = TUNNEL_KEY | TUNNEL_CSUM | TUNNEL_NOCACHE; + + if (tb[NFTA_TUNNEL_KEY_IP]) { + err = nft_tunnel_obj_ip_init(ctx, tb[NFTA_TUNNEL_KEY_IP], &info); + if (err < 0) + return err; + } else if (tb[NFTA_TUNNEL_KEY_IP6]) { + err = nft_tunnel_obj_ip6_init(ctx, tb[NFTA_TUNNEL_KEY_IP6], &info); + if (err < 0) + return err; + } else { + return -EINVAL; + } + + if (tb[NFTA_TUNNEL_KEY_SPORT]) { + info.key.tp_src = nla_get_be16(tb[NFTA_TUNNEL_KEY_SPORT]); + } + if 
(tb[NFTA_TUNNEL_KEY_DPORT]) { + info.key.tp_dst = nla_get_be16(tb[NFTA_TUNNEL_KEY_DPORT]); + } + + if (tb[NFTA_TUNNEL_KEY_FLAGS]) { + u32 tun_flags; + + tun_flags = ntohl(nla_get_be32(tb[NFTA_TUNNEL_KEY_FLAGS])); + if (tun_flags & ~NFT_TUNNEL_F_MASK) + return -EOPNOTSUPP; + + if (tun_flags & NFT_TUNNEL_F_ZERO_CSUM_TX) + info.key.tun_flags &= ~TUNNEL_CSUM; + if (tun_flags & NFT_TUNNEL_F_DONT_FRAGMENT) + info.key.tun_flags |= TUNNEL_DONT_FRAGMENT; + if (tun_flags & NFT_TUNNEL_F_SEQ_NUMBER) + info.key.tun_flags |= TUNNEL_SEQ; + } + if (tb[NFTA_TUNNEL_KEY_TOS]) + info.key.tos = nla_get_u8(tb[NFTA_TUNNEL_KEY_TOS]); + if (tb[NFTA_TUNNEL_KEY_TTL]) + info.key.ttl = nla_get_u8(tb[NFTA_TUNNEL_KEY_TTL]); + else + info.key.ttl = U8_MAX; + + if (tb[NFTA_TUNNEL_KEY_OPTS]) { + err = nft_tunnel_obj_opts_init(ctx, tb[NFTA_TUNNEL_KEY_OPTS], + &info, &priv->opts); + if (err < 0) + return err; + } + + md = metadata_dst_alloc(priv->opts.len, METADATA_IP_TUNNEL, GFP_KERNEL); + if (!md) + return -ENOMEM; + + memcpy(&md->u.tun_info, &info, sizeof(info)); + ip_tunnel_info_opts_set(&md->u.tun_info, &priv->opts.u, priv->opts.len, + priv->opts.flags); + priv->md = md; + + return 0; +} + +static inline void nft_tunnel_obj_eval(struct nft_object *obj, + struct nft_regs *regs, + const struct nft_pktinfo *pkt) +{ + struct nft_tunnel_obj *priv = nft_obj_data(obj); + struct sk_buff *skb = pkt->skb; + + skb_dst_drop(skb); + dst_hold((struct dst_entry *) priv->md); + skb_dst_set(skb, (struct dst_entry *) priv->md); +} + +static int nft_tunnel_ip_dump(struct sk_buff *skb, struct ip_tunnel_info *info) +{ + struct nlattr *nest; + + if (info->mode & IP_TUNNEL_INFO_IPV6) { + nest = nla_nest_start(skb, NFTA_TUNNEL_KEY_IP6); + if (!nest) + return -1; + + if (nla_put_in6_addr(skb, NFTA_TUNNEL_KEY_IP6_SRC, &info->key.u.ipv6.src) < 0 || + nla_put_in6_addr(skb, NFTA_TUNNEL_KEY_IP6_DST, &info->key.u.ipv6.dst) < 0 || + nla_put_be32(skb, NFTA_TUNNEL_KEY_IP6_FLOWLABEL, info->key.label)) + return -1; + + 
nla_nest_end(skb, nest); + } else { + nest = nla_nest_start(skb, NFTA_TUNNEL_KEY_IP); + if (!nest) + return -1; + + if (nla_put_in_addr(skb, NFTA_TUNNEL_KEY_IP_SRC, info->key.u.ipv4.src) < 0 || + nla_put_in_addr(skb, NFTA_TUNNEL_KEY_IP_DST, info->key.u.ipv4.dst) < 0) + return -1; + + nla_nest_end(skb, nest); + } + + return 0; +} + +static int nft_tunnel_opts_dump(struct sk_buff *skb, + struct nft_tunnel_obj *priv) +{ + struct nft_tunnel_opts *opts = &priv->opts; + struct nlattr *nest; + + nest = nla_nest_start(skb, NFTA_TUNNEL_KEY_OPTS); + if (!nest) + return -1; + + if (opts->flags & TUNNEL_VXLAN_OPT) { + if (nla_put_be32(skb, NFTA_TUNNEL_KEY_VXLAN_GBP, + htonl(opts->u.vxlan.gbp))) + return -1; + } else if (opts->flags & TUNNEL_ERSPAN_OPT) { + switch (opts->u.erspan.version) { + case ERSPAN_VERSION: + if (nla_put_be32(skb, NFTA_TUNNEL_KEY_ERSPAN_V1_INDEX, + opts->u.erspan.u.index)) + return -1; + break; + case ERSPAN_VERSION2: + if (nla_put_u8(skb, NFTA_TUNNEL_KEY_ERSPAN_V2_HWID, + get_hwid(&opts->u.erspan.u.md2)) || + nla_put_u8(skb, NFTA_TUNNEL_KEY_ERSPAN_V2_DIR, + opts->u.erspan.u.md2.dir)) + return -1; + break; + } + } + nla_nest_end(skb, nest); + + return 0; +} + +static int nft_tunnel_ports_dump(struct sk_buff *skb, + struct ip_tunnel_info *info) +{ + if (nla_put_be16(skb, NFTA_TUNNEL_KEY_SPORT, htons(info->key.tp_src)) < 0 || + nla_put_be16(skb, NFTA_TUNNEL_KEY_DPORT, htons(info->key.tp_dst)) < 0) + return -1; + + return 0; +} + +static int nft_tunnel_flags_dump(struct sk_buff *skb, + struct ip_tunnel_info *info) +{ + u32 flags = 0; + + if (info->key.tun_flags & TUNNEL_DONT_FRAGMENT) + flags |= NFT_TUNNEL_F_DONT_FRAGMENT; + if (!(info->key.tun_flags & TUNNEL_CSUM)) + flags |= NFT_TUNNEL_F_ZERO_CSUM_TX; + if (info->key.tun_flags & TUNNEL_SEQ) + flags |= NFT_TUNNEL_F_SEQ_NUMBER; + + if (nla_put_be32(skb, NFTA_TUNNEL_KEY_FLAGS, htonl(flags)) < 0) + return -1; + + return 0; +} + +static int nft_tunnel_obj_dump(struct sk_buff *skb, + struct nft_object *obj, bool 
reset) +{ + struct nft_tunnel_obj *priv = nft_obj_data(obj); + struct ip_tunnel_info *info = &priv->md->u.tun_info; + + if (nla_put_be32(skb, NFTA_TUNNEL_KEY_ID, + tunnel_id_to_key32(info->key.tun_id)) || + nft_tunnel_ip_dump(skb, info) < 0 || + nft_tunnel_ports_dump(skb, info) < 0 || + nft_tunnel_flags_dump(skb, info) < 0 || + nla_put_u8(skb, NFTA_TUNNEL_KEY_TOS, info->key.tos) || + nla_put_u8(skb, NFTA_TUNNEL_KEY_TTL, info->key.ttl) || + nft_tunnel_opts_dump(skb, priv) < 0) + goto nla_put_failure; + + return 0; + +nla_put_failure: + return -1; +} + +static void nft_tunnel_obj_destroy(const struct nft_ctx *ctx, + struct nft_object *obj) +{ + struct nft_tunnel_obj *priv = nft_obj_data(obj); + + metadata_dst_free(priv->md); +} + +static struct nft_object_type nft_tunnel_obj_type; +static const struct nft_object_ops nft_tunnel_obj_ops = { + .type = &nft_tunnel_obj_type, + .size = sizeof(struct nft_tunnel_obj), + .eval = nft_tunnel_obj_eval, + .init = nft_tunnel_obj_init, + .destroy = nft_tunnel_obj_destroy, + .dump = nft_tunnel_obj_dump, +}; + +static struct nft_object_type nft_tunnel_obj_type __read_mostly = { + .type = NFT_OBJECT_TUNNEL, + .ops = &nft_tunnel_obj_ops, + .maxattr = NFTA_TUNNEL_KEY_MAX, + .policy = nft_tunnel_key_policy, + .owner = THIS_MODULE, +}; + +static int __init nft_tunnel_module_init(void) +{ + int err; + + err = nft_register_expr(&nft_tunnel_type); + if (err < 0) + return err; + + err = nft_register_obj(&nft_tunnel_obj_type); + if (err < 0) + nft_unregister_expr(&nft_tunnel_type); + + return err; +} + +static void __exit nft_tunnel_module_exit(void) +{ + nft_unregister_obj(&nft_tunnel_obj_type); + nft_unregister_expr(&nft_tunnel_type); +} + +module_init(nft_tunnel_module_init); +module_exit(nft_tunnel_module_exit); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Pablo Neira Ayuso <pablo@netfilter.org>"); +MODULE_ALIAS_NFT_EXPR("tunnel"); +MODULE_ALIAS_NFT_OBJ(NFT_OBJECT_TUNNEL); diff --git a/net/netfilter/utils.c b/net/netfilter/utils.c index 
0b660c568156..e8da9a9bba73 100644
--- a/net/netfilter/utils.c
+++ b/net/netfilter/utils.c
@@ -1,14 +1,128 @@
+// SPDX-License-Identifier: GPL-2.0
 #include <linux/kernel.h>
 #include <linux/netfilter.h>
 #include <linux/netfilter_ipv4.h>
 #include <linux/netfilter_ipv6.h>
 #include <net/netfilter/nf_queue.h>
+#include <net/ip6_checksum.h>
+
+#ifdef CONFIG_INET
+__sum16 nf_ip_checksum(struct sk_buff *skb, unsigned int hook,
+		       unsigned int dataoff, u8 protocol)
+{
+	const struct iphdr *iph = ip_hdr(skb);
+	__sum16 csum = 0;
+
+	switch (skb->ip_summed) {
+	case CHECKSUM_COMPLETE:
+		if (hook != NF_INET_PRE_ROUTING && hook != NF_INET_LOCAL_IN)
+			break;
+		if ((protocol == 0 && !csum_fold(skb->csum)) ||
+		    !csum_tcpudp_magic(iph->saddr, iph->daddr,
+				       skb->len - dataoff, protocol,
+				       skb->csum)) {
+			skb->ip_summed = CHECKSUM_UNNECESSARY;
+			break;
+		}
+		/* fall through */
+	case CHECKSUM_NONE:
+		if (protocol == 0)
+			skb->csum = 0;
+		else
+			skb->csum = csum_tcpudp_nofold(iph->saddr, iph->daddr,
+						       skb->len - dataoff,
+						       protocol, 0);
+		csum = __skb_checksum_complete(skb);
+	}
+	return csum;
+}
+EXPORT_SYMBOL(nf_ip_checksum);
+#endif
+
+static __sum16 nf_ip_checksum_partial(struct sk_buff *skb, unsigned int hook,
+				      unsigned int dataoff, unsigned int len,
+				      u8 protocol)
+{
+	const struct iphdr *iph = ip_hdr(skb);
+	__sum16 csum = 0;
+
+	switch (skb->ip_summed) {
+	case CHECKSUM_COMPLETE:
+		if (len == skb->len - dataoff)
+			return nf_ip_checksum(skb, hook, dataoff, protocol);
+		/* fall through */
+	case CHECKSUM_NONE:
+		skb->csum = csum_tcpudp_nofold(iph->saddr, iph->daddr, protocol,
+					       skb->len - dataoff, 0);
+		skb->ip_summed = CHECKSUM_NONE;
+		return __skb_checksum_complete_head(skb, dataoff + len);
+	}
+	return csum;
+}
+
+__sum16 nf_ip6_checksum(struct sk_buff *skb, unsigned int hook,
+			unsigned int dataoff, u8 protocol)
+{
+	const struct ipv6hdr *ip6h = ipv6_hdr(skb);
+	__sum16 csum = 0;
+
+	switch (skb->ip_summed) {
+	case CHECKSUM_COMPLETE:
+		if (hook != NF_INET_PRE_ROUTING && hook != NF_INET_LOCAL_IN)
+			break;
+		if (!csum_ipv6_magic(&ip6h->saddr, &ip6h->daddr,
+				     skb->len - dataoff, protocol,
+				     csum_sub(skb->csum,
+					      skb_checksum(skb, 0,
+							   dataoff, 0)))) {
+			skb->ip_summed = CHECKSUM_UNNECESSARY;
+			break;
+		}
+		/* fall through */
+	case CHECKSUM_NONE:
+		skb->csum = ~csum_unfold(
+				csum_ipv6_magic(&ip6h->saddr, &ip6h->daddr,
+						skb->len - dataoff,
+						protocol,
+						csum_sub(0,
+							 skb_checksum(skb, 0,
+								      dataoff, 0))));
+		csum = __skb_checksum_complete(skb);
+	}
+	return csum;
+}
+EXPORT_SYMBOL(nf_ip6_checksum);
+
+static __sum16 nf_ip6_checksum_partial(struct sk_buff *skb, unsigned int hook,
+				       unsigned int dataoff, unsigned int len,
+				       u8 protocol)
+{
+	const struct ipv6hdr *ip6h = ipv6_hdr(skb);
+	__wsum hsum;
+	__sum16 csum = 0;
+
+	switch (skb->ip_summed) {
+	case CHECKSUM_COMPLETE:
+		if (len == skb->len - dataoff)
+			return nf_ip6_checksum(skb, hook, dataoff, protocol);
+		/* fall through */
+	case CHECKSUM_NONE:
+		hsum = skb_checksum(skb, 0, dataoff, 0);
+		skb->csum = ~csum_unfold(csum_ipv6_magic(&ip6h->saddr,
+							 &ip6h->daddr,
+							 skb->len - dataoff,
+							 protocol,
+							 csum_sub(0, hsum)));
+		skb->ip_summed = CHECKSUM_NONE;
+		return __skb_checksum_complete_head(skb, dataoff + len);
+	}
+	return csum;
+};

 __sum16 nf_checksum(struct sk_buff *skb, unsigned int hook,
-		    unsigned int dataoff, u_int8_t protocol,
+		    unsigned int dataoff, u8 protocol,
 		    unsigned short family)
 {
-	const struct nf_ipv6_ops *v6ops;
 	__sum16 csum = 0;

 	switch (family) {
@@ -16,9 +130,7 @@ __sum16 nf_checksum(struct sk_buff *skb, unsigned int hook,
 		csum = nf_ip_checksum(skb, hook, dataoff, protocol);
 		break;
 	case AF_INET6:
-		v6ops = rcu_dereference(nf_ipv6_ops);
-		if (v6ops)
-			csum = v6ops->checksum(skb, hook, dataoff, protocol);
+		csum = nf_ip6_checksum(skb, hook, dataoff, protocol);
 		break;
 	}

@@ -28,9 +140,8 @@ EXPORT_SYMBOL_GPL(nf_checksum);

 __sum16 nf_checksum_partial(struct sk_buff *skb, unsigned int hook,
 			    unsigned int dataoff, unsigned int len,
-			    u_int8_t protocol, unsigned short family)
+			    u8 protocol, unsigned short family)
 {
-	const struct nf_ipv6_ops *v6ops;
 	__sum16 csum = 0;

 	switch (family) {
@@ -39,10 +150,8 @@ __sum16 nf_checksum_partial(struct sk_buff *skb, unsigned int hook,
 					      protocol);
 		break;
 	case AF_INET6:
-		v6ops = rcu_dereference(nf_ipv6_ops);
-		if (v6ops)
-			csum = v6ops->checksum_partial(skb, hook, dataoff, len,
-						       protocol);
+		csum = nf_ip6_checksum_partial(skb, hook, dataoff, len,
+					       protocol);
 		break;
 	}

diff --git a/net/netfilter/xt_CT.c b/net/netfilter/xt_CT.c
index 03b9a50ec93b..89457efd2e00 100644
--- a/net/netfilter/xt_CT.c
+++ b/net/netfilter/xt_CT.c
@@ -93,7 +93,7 @@ xt_ct_set_helper(struct nf_conn *ct, const char *helper_name,
 		return -ENOENT;
 	}

-	help = nf_ct_helper_ext_add(ct, helper, GFP_KERNEL);
+	help = nf_ct_helper_ext_add(ct, GFP_KERNEL);
 	if (help == NULL) {
 		nf_conntrack_helper_put(helper);
 		return -ENOMEM;
@@ -104,7 +104,7 @@ xt_ct_set_helper(struct nf_conn *ct, const char *helper_name,
 }

 #ifdef CONFIG_NF_CONNTRACK_TIMEOUT
-static void __xt_ct_tg_timeout_put(struct ctnl_timeout *timeout)
+static void __xt_ct_tg_timeout_put(struct nf_ct_timeout *timeout)
 {
 	typeof(nf_ct_timeout_put_hook) timeout_put;

@@ -121,7 +121,7 @@ xt_ct_set_timeout(struct nf_conn *ct, const struct xt_tgchk_param *par,
 #ifdef CONFIG_NF_CONNTRACK_TIMEOUT
 	typeof(nf_ct_timeout_find_get_hook) timeout_find_get;
 	const struct nf_conntrack_l4proto *l4proto;
-	struct ctnl_timeout *timeout;
+	struct nf_ct_timeout *timeout;
 	struct nf_conn_timeout *timeout_ext;
 	const char *errmsg = NULL;
 	int ret = 0;
diff --git a/net/netfilter/xt_TEE.c b/net/netfilter/xt_TEE.c
index 475957cfcf50..0d0d68c989df 100644
--- a/net/netfilter/xt_TEE.c
+++ b/net/netfilter/xt_TEE.c
@@ -38,7 +38,7 @@ tee_tg4(struct sk_buff *skb, const struct xt_action_param *par)
 	return XT_CONTINUE;
 }

-#if IS_ENABLED(CONFIG_IPV6)
+#if IS_ENABLED(CONFIG_IP6_NF_IPTABLES)
 static unsigned int
 tee_tg6(struct sk_buff *skb, const struct xt_action_param *par)
 {
@@ -141,7 +141,7 @@ static struct xt_target tee_tg_reg[] __read_mostly = {
 		.destroy = tee_tg_destroy,
 		.me = THIS_MODULE,
 	},
-#if IS_ENABLED(CONFIG_IPV6)
+#if IS_ENABLED(CONFIG_IP6_NF_IPTABLES)
 	{
 		.name = "TEE",
 		.revision = 1,
diff --git a/net/netfilter/xt_TPROXY.c b/net/netfilter/xt_TPROXY.c
index d76550a8b642..ad7420cdc439 100644
--- a/net/netfilter/xt_TPROXY.c
+++ b/net/netfilter/xt_TPROXY.c
@@ -36,15 +36,6 @@
 #include <net/netfilter/nf_tproxy.h>
 #include <linux/netfilter/xt_TPROXY.h>

-/* assign a socket to the skb -- consumes sk */
-static void
-nf_tproxy_assign_sock(struct sk_buff *skb, struct sock *sk)
-{
-	skb_orphan(skb);
-	skb->sk = sk;
-	skb->destructor = sock_edemux;
-}
-
 static unsigned int
 tproxy_tg4(struct net *net, struct sk_buff *skb, __be32 laddr, __be16 lport,
 	   u_int32_t mark_mask, u_int32_t mark_value)
diff --git a/net/netfilter/xt_cgroup.c b/net/netfilter/xt_cgroup.c
index 7df2dece57d3..5d92e1781980 100644
--- a/net/netfilter/xt_cgroup.c
+++ b/net/netfilter/xt_cgroup.c
@@ -72,8 +72,9 @@ static bool
 cgroup_mt_v0(const struct sk_buff *skb, struct xt_action_param *par)
 {
 	const struct xt_cgroup_info_v0 *info = par->matchinfo;
+	struct sock *sk = skb->sk;

-	if (skb->sk == NULL || !sk_fullsock(skb->sk))
+	if (!sk || !sk_fullsock(sk) || !net_eq(xt_net(par), sock_net(sk)))
 		return false;

 	return (info->id == sock_cgroup_classid(&skb->sk->sk_cgrp_data)) ^
@@ -85,8 +86,9 @@ static bool cgroup_mt_v1(const struct sk_buff *skb, struct xt_action_param *par)
 	const struct xt_cgroup_info_v1 *info = par->matchinfo;
 	struct sock_cgroup_data *skcd = &skb->sk->sk_cgrp_data;
 	struct cgroup *ancestor = info->priv;
+	struct sock *sk = skb->sk;

-	if (!skb->sk || !sk_fullsock(skb->sk))
+	if (!sk || !sk_fullsock(sk) || !net_eq(xt_net(par), sock_net(sk)))
 		return false;

 	if (ancestor)
diff --git a/net/netfilter/xt_connlimit.c b/net/netfilter/xt_connlimit.c
index 6275106ccf50..bc6c8ab0fa62 100644
--- a/net/netfilter/xt_connlimit.c
+++ b/net/netfilter/xt_connlimit.c
@@ -93,10 +93,8 @@ static int connlimit_mt_check(const struct xt_mtchk_param *par)

 	/* init private data */
 	info->data = nf_conncount_init(par->net, par->family, keylen);
-	if (IS_ERR(info->data))
-		return PTR_ERR(info->data);

-	return 0;
+	return PTR_ERR_OR_ZERO(info->data);
 }

 static void connlimit_mt_destroy(const struct xt_mtdtor_param *par)
diff --git a/net/netfilter/xt_osf.c b/net/netfilter/xt_osf.c
index 9cfef73b4107..bf7bba80e24c 100644
--- a/net/netfilter/xt_osf.c
+++ b/net/netfilter/xt_osf.c
@@ -37,118 +37,6 @@
 #include <net/netfilter/nf_log.h>
 #include <linux/netfilter/xt_osf.h>

-/*
- * Indexed by dont-fragment bit.
- * It is the only constant value in the fingerprint.
- */
-static struct list_head xt_osf_fingers[2];
-
-static const struct nla_policy xt_osf_policy[OSF_ATTR_MAX + 1] = {
-	[OSF_ATTR_FINGER] = { .len = sizeof(struct xt_osf_user_finger) },
-};
-
-static int xt_osf_add_callback(struct net *net, struct sock *ctnl,
-			       struct sk_buff *skb, const struct nlmsghdr *nlh,
-			       const struct nlattr * const osf_attrs[],
-			       struct netlink_ext_ack *extack)
-{
-	struct xt_osf_user_finger *f;
-	struct xt_osf_finger *kf = NULL, *sf;
-	int err = 0;
-
-	if (!capable(CAP_NET_ADMIN))
-		return -EPERM;
-
-	if (!osf_attrs[OSF_ATTR_FINGER])
-		return -EINVAL;
-
-	if (!(nlh->nlmsg_flags & NLM_F_CREATE))
-		return -EINVAL;
-
-	f = nla_data(osf_attrs[OSF_ATTR_FINGER]);
-
-	kf = kmalloc(sizeof(struct xt_osf_finger), GFP_KERNEL);
-	if (!kf)
-		return -ENOMEM;
-
-	memcpy(&kf->finger, f, sizeof(struct xt_osf_user_finger));
-
-	list_for_each_entry(sf, &xt_osf_fingers[!!f->df], finger_entry) {
-		if (memcmp(&sf->finger, f, sizeof(struct xt_osf_user_finger)))
-			continue;
-
-		kfree(kf);
-		kf = NULL;
-
-		if (nlh->nlmsg_flags & NLM_F_EXCL)
-			err = -EEXIST;
-		break;
-	}
-
-	/*
-	 * We are protected by nfnl mutex.
-	 */
-	if (kf)
-		list_add_tail_rcu(&kf->finger_entry, &xt_osf_fingers[!!f->df]);
-
-	return err;
-}
-
-static int xt_osf_remove_callback(struct net *net, struct sock *ctnl,
-				  struct sk_buff *skb,
-				  const struct nlmsghdr *nlh,
-				  const struct nlattr * const osf_attrs[],
-				  struct netlink_ext_ack *extack)
-{
-	struct xt_osf_user_finger *f;
-	struct xt_osf_finger *sf;
-	int err = -ENOENT;
-
-	if (!capable(CAP_NET_ADMIN))
-		return -EPERM;
-
-	if (!osf_attrs[OSF_ATTR_FINGER])
-		return -EINVAL;
-
-	f = nla_data(osf_attrs[OSF_ATTR_FINGER]);
-
-	list_for_each_entry(sf, &xt_osf_fingers[!!f->df], finger_entry) {
-		if (memcmp(&sf->finger, f, sizeof(struct xt_osf_user_finger)))
-			continue;
-
-		/*
-		 * We are protected by nfnl mutex.
-		 */
-		list_del_rcu(&sf->finger_entry);
-		kfree_rcu(sf, rcu_head);
-
-		err = 0;
-		break;
-	}
-
-	return err;
-}
-
-static const struct nfnl_callback xt_osf_nfnetlink_callbacks[OSF_MSG_MAX] = {
-	[OSF_MSG_ADD] = {
-		.call = xt_osf_add_callback,
-		.attr_count = OSF_ATTR_MAX,
-		.policy = xt_osf_policy,
-	},
-	[OSF_MSG_REMOVE] = {
-		.call = xt_osf_remove_callback,
-		.attr_count = OSF_ATTR_MAX,
-		.policy = xt_osf_policy,
-	},
-};
-
-static const struct nfnetlink_subsystem xt_osf_nfnetlink = {
-	.name = "osf",
-	.subsys_id = NFNL_SUBSYS_OSF,
-	.cb_count = OSF_MSG_MAX,
-	.cb = xt_osf_nfnetlink_callbacks,
-};
-
 static bool
 xt_osf_match_packet(const struct sk_buff *skb, struct xt_action_param *p)
 {
@@ -159,7 +47,7 @@ xt_osf_match_packet(const struct sk_buff *skb, struct xt_action_param *p)
 		return false;

 	return nf_osf_match(skb, xt_family(p), xt_hooknum(p), xt_in(p),
-			    xt_out(p), info, net, xt_osf_fingers);
+			    xt_out(p), info, net, nf_osf_fingers);
 }

 static struct xt_match xt_osf_match = {
@@ -177,52 +65,21 @@ static struct xt_match xt_osf_match = {

 static int __init xt_osf_init(void)
 {
-	int err = -EINVAL;
-	int i;
-
-	for (i=0; i<ARRAY_SIZE(xt_osf_fingers); ++i)
-		INIT_LIST_HEAD(&xt_osf_fingers[i]);
-
-	err = nfnetlink_subsys_register(&xt_osf_nfnetlink);
-	if (err < 0) {
-		pr_err("Failed to register OSF nsfnetlink helper (%d)\n", err);
-		goto err_out_exit;
-	}
+	int err;

 	err = xt_register_match(&xt_osf_match);
 	if (err) {
 		pr_err("Failed to register OS fingerprint "
 		       "matching module (%d)\n", err);
-		goto err_out_remove;
+		return err;
 	}

 	return 0;
-
-err_out_remove:
-	nfnetlink_subsys_unregister(&xt_osf_nfnetlink);
-err_out_exit:
-	return err;
 }

 static void __exit xt_osf_fini(void)
 {
-	struct xt_osf_finger *f;
-	int i;
-
-	nfnetlink_subsys_unregister(&xt_osf_nfnetlink);
 	xt_unregister_match(&xt_osf_match);
-
-	rcu_read_lock();
-	for (i=0; i<ARRAY_SIZE(xt_osf_fingers); ++i) {
-
-		list_for_each_entry_rcu(f, &xt_osf_fingers[i], finger_entry) {
-			list_del_rcu(&f->finger_entry);
-			kfree_rcu(f, rcu_head);
-		}
-	}
-	rcu_read_unlock();
-
-	rcu_barrier();
 }

 module_init(xt_osf_init);
diff --git a/net/netfilter/xt_owner.c b/net/netfilter/xt_owner.c
index 3d705c688a27..46686fb73784 100644
--- a/net/netfilter/xt_owner.c
+++ b/net/netfilter/xt_owner.c
@@ -67,7 +67,7 @@ owner_mt(const struct sk_buff *skb, struct xt_action_param *par)
 	struct sock *sk = skb_to_full_sk(skb);
 	struct net *net = xt_net(par);

-	if (sk == NULL || sk->sk_socket == NULL)
+	if (!sk || !sk->sk_socket || !net_eq(net, sock_net(sk)))
 		return (info->match ^ info->invert) == 0;
 	else if (info->match & info->invert & XT_OWNER_SOCKET)
 		/*
diff --git a/net/netfilter/xt_recent.c b/net/netfilter/xt_recent.c
index 07085c22b19c..f44de4bc2100 100644
--- a/net/netfilter/xt_recent.c
+++ b/net/netfilter/xt_recent.c
@@ -265,7 +265,8 @@ recent_mt(const struct sk_buff *skb, struct xt_action_param *par)
 	}

 	/* use TTL as seen before forwarding */
-	if (xt_out(par) != NULL && skb->sk == NULL)
+	if (xt_out(par) != NULL &&
+	    (!skb->sk || !net_eq(net, sock_net(skb->sk))))
 		ttl++;

 	spin_lock_bh(&recent_lock);
diff --git a/net/netfilter/xt_socket.c b/net/netfilter/xt_socket.c
index 5c0779c4fa3c..0472f3472842 100644
--- a/net/netfilter/xt_socket.c
+++ b/net/netfilter/xt_socket.c
@@ -56,8 +56,12 @@
socket_match(const struct sk_buff *skb, struct xt_action_param *par,
 	struct sk_buff *pskb = (struct sk_buff *)skb;
 	struct sock *sk = skb->sk;

+	if (!net_eq(xt_net(par), sock_net(sk)))
+		sk = NULL;
+
 	if (!sk)
 		sk = nf_sk_lookup_slow_v4(xt_net(par), skb, xt_in(par));
+
 	if (sk) {
 		bool wildcard;
 		bool transparent = true;
@@ -113,8 +117,12 @@ socket_mt6_v1_v2_v3(const struct sk_buff *skb, struct xt_action_param *par)
 	struct sk_buff *pskb = (struct sk_buff *)skb;
 	struct sock *sk = skb->sk;

+	if (!net_eq(xt_net(par), sock_net(sk)))
+		sk = NULL;
+
 	if (!sk)
 		sk = nf_sk_lookup_slow_v6(xt_net(par), skb, xt_in(par));
+
 	if (sk) {
 		bool wildcard;
 		bool transparent = true; |
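A note on the nf_ip_checksum() hunk in net/netfilter/utils.c: in the CHECKSUM_NONE case the function seeds skb->csum with the unfolded IPv4 pseudo-header sum (csum_tcpudp_nofold) and then lets __skb_checksum_complete() add the packet bytes and fold the result, so a return value of 0 means the checksum verified. The userspace sketch below illustrates only that arithmetic. It is not kernel code: csum_add_bytes, csum_fold16, pseudo_hdr_sum, l4_csum_verify and csum_demo are hypothetical names chosen for this example, addresses are passed in host byte order for simplicity, and nothing about sk_buff handling, hooks or hardware checksum offload is modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add the buffer to a running sum as 16-bit big-endian words. */
static uint64_t csum_add_bytes(uint64_t sum, const uint8_t *buf, size_t len)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8; /* odd tail byte, zero padded */
	return sum;
}

/* Fold carries back into the low 16 bits and return the complement. */
static uint16_t csum_fold16(uint64_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Unfolded IPv4 pseudo-header sum: addresses, protocol, L4 length. */
static uint64_t pseudo_hdr_sum(uint32_t saddr, uint32_t daddr,
			       uint8_t protocol, uint16_t len)
{
	uint64_t sum = 0;

	sum += saddr >> 16;
	sum += saddr & 0xffff;
	sum += daddr >> 16;
	sum += daddr & 0xffff;
	sum += protocol;
	sum += len;
	return sum;
}

/*
 * Seed with the pseudo-header, add the L4 header and payload, fold.
 * Returns 0 when the checksum field inside l4 is consistent. With the
 * field zeroed, the return value is the checksum to store in it.
 */
static uint16_t l4_csum_verify(uint32_t saddr, uint32_t daddr,
			       uint8_t protocol,
			       const uint8_t *l4, size_t len)
{
	uint64_t sum = pseudo_hdr_sum(saddr, daddr, protocol, (uint16_t)len);

	return csum_fold16(csum_add_bytes(sum, l4, len));
}

/* Build a small UDP segment, fill in its checksum, verify, then corrupt it. */
static void csum_demo(void)
{
	/* src port 12345, dst port 53, length 12, checksum 0, payload "ping" */
	uint8_t seg[12] = { 0x30, 0x39, 0x00, 0x35, 0x00, 0x0c, 0x00, 0x00,
			    'p', 'i', 'n', 'g' };
	const uint32_t saddr = 0xc0a80001; /* 192.168.0.1 */
	const uint32_t daddr = 0xc0a80002; /* 192.168.0.2 */
	uint16_t c = l4_csum_verify(saddr, daddr, 17, seg, sizeof(seg));

	seg[6] = (uint8_t)(c >> 8);
	seg[7] = (uint8_t)(c & 0xff);
	assert(l4_csum_verify(saddr, daddr, 17, seg, sizeof(seg)) == 0);

	seg[8] ^= 0x01; /* flip one payload bit */
	assert(l4_csum_verify(saddr, daddr, 17, seg, sizeof(seg)) != 0);
}
```

Calling csum_demo() exercises both outcomes: the intact segment folds to 0 and the corrupted one does not. The CHECKSUM_COMPLETE branch in the diff reaches the same answer without walking the packet again, because the device has already supplied the sum over the data in skb->csum.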