From 746ce767128598711a00d8df5713d4c3b3d9e9a7 Mon Sep 17 00:00:00 2001 From: Dave Thaler Date: Mon, 20 Feb 2023 22:37:42 +0000 Subject: bpf, docs: Add explanation of endianness Document the discussion from the email thread on the IETF bpf list, where it was explained that the raw format varies by endianness of the processor. Signed-off-by: Dave Thaler Acked-by: David Vernet Link: https://lore.kernel.org/r/20230220223742.1347-1-dthaler1968@googlemail.com Signed-off-by: Alexei Starovoitov --- Documentation/bpf/instruction-set.rst | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/bpf/instruction-set.rst b/Documentation/bpf/instruction-set.rst index af515de5fc38..01802ed9b29b 100644 --- a/Documentation/bpf/instruction-set.rst +++ b/Documentation/bpf/instruction-set.rst @@ -38,8 +38,9 @@ eBPF has two instruction encodings: * the wide instruction encoding, which appends a second 64-bit immediate (i.e., constant) value after the basic instruction for a total of 128 bits. -The basic instruction encoding is as follows, where MSB and LSB mean the most significant -bits and least significant bits, respectively: +The basic instruction encoding looks as follows for a little-endian processor, +where MSB and LSB mean the most significant bits and least significant bits, +respectively: ============= ======= ======= ======= ============ 32 bits (MSB) 16 bits 4 bits 4 bits 8 bits (LSB) @@ -63,6 +64,17 @@ imm offset src_reg dst_reg opcode **opcode** operation to perform +and as follows for a big-endian processor: + +============= ======= ======= ======= ============ +32 bits (MSB) 16 bits 4 bits 4 bits 8 bits (LSB) +============= ======= ======= ======= ============ +imm offset dst_reg src_reg opcode +============= ======= ======= ======= ============ + +Multi-byte fields ('imm' and 'offset') are similarly stored in +the byte order of the processor. + Note that most instructions do not use all of the fields. 
Unused fields shall be cleared to zero. -- cgit v1.2.3 From 332ea1f697be148bd5e66475d82b5ecc5084da65 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Wed, 22 Feb 2023 15:29:12 -1000 Subject: bpf: Add bpf_cgroup_from_id() kfunc A cgroup ID is a userspace-visible 64-bit value uniquely identifying a given cgroup. As the IDs are used widely, it's useful to be able to look up the matching cgroups. Add bpf_cgroup_from_id(). v2: Separate out selftest into its own patch as suggested by Alexei. Signed-off-by: Tejun Heo Link: https://lore.kernel.org/r/Y/bBaG96t0/gQl9/@slm.duckdns.org Signed-off-by: Alexei Starovoitov --- Documentation/bpf/kfuncs.rst | 10 +++++++--- kernel/bpf/helpers.c | 18 ++++++++++++++++++ 2 files changed, 25 insertions(+), 3 deletions(-) (limited to 'Documentation') diff --git a/Documentation/bpf/kfuncs.rst b/Documentation/bpf/kfuncs.rst index ca96ef3f6896..226313747be5 100644 --- a/Documentation/bpf/kfuncs.rst +++ b/Documentation/bpf/kfuncs.rst @@ -583,13 +583,17 @@ Here's an example of how it can be used: ---- -Another kfunc available for interacting with ``struct cgroup *`` objects is -bpf_cgroup_ancestor(). This allows callers to access the ancestor of a cgroup, -and return it as a cgroup kptr. +Other kfuncs available for interacting with ``struct cgroup *`` objects are +bpf_cgroup_ancestor() and bpf_cgroup_from_id(), allowing callers to access +the ancestor of a cgroup and find a cgroup by its ID, respectively. Both +return a cgroup kptr. .. kernel-doc:: kernel/bpf/helpers.c :identifiers: bpf_cgroup_ancestor +.. kernel-doc:: kernel/bpf/helpers.c + :identifiers: bpf_cgroup_from_id + Eventually, BPF should be updated to allow this to happen with a normal memory load in the program itself. This is currently not possible without more work in the verifier. 
bpf_cgroup_ancestor() can be used as follows: diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index 5b278a38ae58..a784be6f8bac 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -2101,6 +2101,23 @@ __bpf_kfunc struct cgroup *bpf_cgroup_ancestor(struct cgroup *cgrp, int level) cgroup_get(ancestor); return ancestor; } + +/** + * bpf_cgroup_from_id - Find a cgroup from its ID. A cgroup returned by this + * kfunc which is not subsequently stored in a map, must be released by calling + * bpf_cgroup_release(). + * @cgid: The userspace-visible 64-bit ID of the cgroup + * to look up. + */ +__bpf_kfunc struct cgroup *bpf_cgroup_from_id(u64 cgid) +{ + struct cgroup *cgrp; + + cgrp = cgroup_get_from_id(cgid); + if (IS_ERR(cgrp)) + return NULL; + return cgrp; +} #endif /* CONFIG_CGROUPS */ /** @@ -2167,6 +2184,7 @@ BTF_ID_FLAGS(func, bpf_cgroup_acquire, KF_ACQUIRE | KF_TRUSTED_ARGS) BTF_ID_FLAGS(func, bpf_cgroup_kptr_get, KF_ACQUIRE | KF_KPTR_GET | KF_RET_NULL) BTF_ID_FLAGS(func, bpf_cgroup_release, KF_RELEASE) BTF_ID_FLAGS(func, bpf_cgroup_ancestor, KF_ACQUIRE | KF_TRUSTED_ARGS | KF_RET_NULL) +BTF_ID_FLAGS(func, bpf_cgroup_from_id, KF_ACQUIRE | KF_RET_NULL) #endif BTF_ID_FLAGS(func, bpf_task_from_pid, KF_ACQUIRE | KF_RET_NULL) BTF_SET8_END(generic_btf_ids) -- cgit v1.2.3 From ae256f95478e07d49dae5036bb83c09dfbd686d4 Mon Sep 17 00:00:00 2001 From: "Jose E. Marchesi" Date: Tue, 28 Feb 2023 10:51:29 +0100 Subject: bpf, docs: Document BPF insn encoding in term of stored bytes [Changes from V4: - s/regs:16/regs:8 in figure.] [Changes from V3: - Back to src_reg and dst_reg, since they denote register numbers as opposed to the values stored in these registers.] [Changes from V2: - Use src and dst consistently in the document. - Use a more graphical depiction of the 128-bit instruction. - Remove `Where:' fragment. - Clarify that unused bits are reserved and shall be zeroed.] [Changes from V1: - Use rst literal blocks for figures. 
- Avoid using | in the basic instruction/pseudo instruction figure. - Rebased to today's bpf-next master branch.] This patch modifies instruction-set.rst so it documents the encoding of BPF instructions in terms of how the bytes are stored (be it in an ELF file or as bytes in a memory buffer to be loaded into the kernel or some other BPF consumer) as opposed to what the instruction looks like once loaded. This should be easier to understand for implementors looking to generate and/or consume the bytes that make up BPF instructions. The patch also clarifies that the unused bytes in a pseudo-instruction shall be cleared to zero. Signed-off-by: Jose E. Marchesi Acked-by: Yonghong Song Acked-by: David Vernet Link: https://lore.kernel.org/r/87h6v6i0da.fsf_-_@oracle.com Signed-off-by: Alexei Starovoitov --- Documentation/bpf/instruction-set.rst | 46 ++++++++++++++++++----------------- 1 file changed, 24 insertions(+), 22 deletions(-) (limited to 'Documentation') diff --git a/Documentation/bpf/instruction-set.rst b/Documentation/bpf/instruction-set.rst index 01802ed9b29b..db8789e6969e 100644 --- a/Documentation/bpf/instruction-set.rst +++ b/Documentation/bpf/instruction-set.rst @@ -38,15 +38,11 @@ eBPF has two instruction encodings: * the wide instruction encoding, which appends a second 64-bit immediate (i.e., constant) value after the basic instruction for a total of 128 bits. -The basic instruction encoding looks as follows for a little-endian processor, -where MSB and LSB mean the most significant bits and least significant bits, -respectively: +The fields conforming an encoded basic instruction are stored in the +following order:: -============= ======= ======= ======= ============ -32 bits (MSB) 16 bits 4 bits 4 bits 8 bits (LSB) -============= ======= ======= ======= ============ -imm offset src_reg dst_reg opcode -============= ======= ======= ======= ============ + opcode:8 src_reg:4 dst_reg:4 offset:16 imm:32 // In little-endian BPF. 
+ opcode:8 dst_reg:4 src_reg:4 offset:16 imm:32 // In big-endian BPF. **imm** signed integer immediate value @@ -64,16 +60,17 @@ imm offset src_reg dst_reg opcode **opcode** operation to perform -and as follows for a big-endian processor: +Note that the contents of multi-byte fields ('imm' and 'offset') are +stored using big-endian byte ordering in big-endian BPF and +little-endian byte ordering in little-endian BPF. -============= ======= ======= ======= ============ -32 bits (MSB) 16 bits 4 bits 4 bits 8 bits (LSB) -============= ======= ======= ======= ============ -imm offset dst_reg src_reg opcode -============= ======= ======= ======= ============ +For example:: -Multi-byte fields ('imm' and 'offset') are similarly stored in -the byte order of the processor. + opcode offset imm assembly + src_reg dst_reg + 07 0 1 00 00 44 33 22 11 r1 += 0x11223344 // little + dst_reg src_reg + 07 1 0 00 00 11 22 33 44 r1 += 0x11223344 // big Note that most instructions do not use all of the fields. Unused fields shall be cleared to zero. @@ -84,18 +81,23 @@ The 64 bits following the basic instruction contain a pseudo instruction using the same format but with opcode, dst_reg, src_reg, and offset all set to zero, and imm containing the high 32 bits of the immediate value. -================= ================== -64 bits (MSB) 64 bits (LSB) -================= ================== -basic instruction pseudo instruction -================= ================== +This is depicted in the following figure:: + + basic_instruction + .-----------------------------. + | | + code:8 regs:8 offset:16 imm:32 unused:32 imm:32 + | | + '--------------' + pseudo instruction Thus the 64-bit immediate value is constructed as follows: imm64 = (next_imm << 32) | imm where 'next_imm' refers to the imm value of the pseudo instruction -following the basic instruction. +following the basic instruction. The unused bytes in the pseudo +instruction are reserved and shall be cleared to zero. 
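Note on the encoding described in the hunk above: the stored byte order and the imm64 rule can be sketched in C as below. This is only an illustration under stated assumptions, not code from the patch or the kernel. The names insn_fields, encode_insn(), make_imm64() and check_documented_example() are hypothetical, and the sketch assumes that 'imm' contributes its low 32 bits without sign extension when imm64 is formed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container for this sketch; it is not the kernel's struct bpf_insn. */
struct insn_fields {
	uint8_t opcode;
	uint8_t dst_reg;	/* register number, only the low 4 bits are used */
	uint8_t src_reg;	/* register number, only the low 4 bits are used */
	int16_t offset;
	int32_t imm;
};

/* Store one basic instruction as 8 bytes in the field order described above. */
static void encode_insn(const struct insn_fields *f, int big_endian, uint8_t out[8])
{
	uint8_t dst = f->dst_reg & 0x0f;
	uint8_t src = f->src_reg & 0x0f;
	uint16_t off = (uint16_t)f->offset;
	uint32_t imm = (uint32_t)f->imm;
	int i;

	out[0] = f->opcode;
	/* Register byte: src_reg:4 dst_reg:4 (little) or dst_reg:4 src_reg:4 (big). */
	out[1] = big_endian ? (uint8_t)((dst << 4) | src)
			    : (uint8_t)((src << 4) | dst);

	/* Multi-byte fields follow the byte order of the BPF flavour. */
	for (i = 0; i < 2; i++) {
		int shift = big_endian ? 8 * (1 - i) : 8 * i;

		out[2 + i] = (uint8_t)(off >> shift);
	}
	for (i = 0; i < 4; i++) {
		int shift = big_endian ? 8 * (3 - i) : 8 * i;

		out[4 + i] = (uint8_t)(imm >> shift);
	}
}

/* imm64 = (next_imm << 32) | imm, treating both halves as unsigned (assumption). */
static uint64_t make_imm64(int32_t imm, int32_t next_imm)
{
	return ((uint64_t)(uint32_t)next_imm << 32) | (uint64_t)(uint32_t)imm;
}

/* Re-checks the "r1 += 0x11223344" byte sequences shown in the hunk above. */
static void check_documented_example(void)
{
	static const uint8_t want_le[8] = { 0x07, 0x01, 0x00, 0x00, 0x44, 0x33, 0x22, 0x11 };
	static const uint8_t want_be[8] = { 0x07, 0x10, 0x00, 0x00, 0x11, 0x22, 0x33, 0x44 };
	struct insn_fields add = {
		.opcode = 0x07, .dst_reg = 1, .src_reg = 0, .offset = 0, .imm = 0x11223344,
	};
	uint8_t buf[8];

	encode_insn(&add, 0, buf);
	assert(memcmp(buf, want_le, sizeof(buf)) == 0);
	encode_insn(&add, 1, buf);
	assert(memcmp(buf, want_be, sizeof(buf)) == 0);
	assert(make_imm64(0x55667788, 0x11223344) == 0x1122334455667788ULL);
}
```

Calling check_documented_example() from a test program exercises both byte orders against the documented example. The wide encoding would append a second 8-byte pseudo instruction whose opcode, registers and offset are zero and whose imm holds the high 32 bits.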
Instruction classes ------------------- -- cgit v1.2.3 From d96d937d7c5c12237dce1f14bf0fc9900cabba09 Mon Sep 17 00:00:00 2001 From: Joanne Koong Date: Wed, 1 Mar 2023 07:49:49 -0800 Subject: bpf: Add __uninit kfunc annotation This patch adds __uninit as a kfunc annotation. This will be useful for scenarios such as for example in dynptrs, indicating whether the dynptr should be checked by the verifier as an initialized or an uninitialized dynptr. Without this annotation, the alternative would be needing to hard-code in the verifier the specific kfunc to indicate that arg should be treated as an uninitialized arg. Signed-off-by: Joanne Koong Link: https://lore.kernel.org/r/20230301154953.641654-7-joannelkoong@gmail.com Signed-off-by: Alexei Starovoitov --- Documentation/bpf/kfuncs.rst | 17 +++++++++++++++++ kernel/bpf/verifier.c | 18 ++++++++++++++++-- 2 files changed, 33 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/bpf/kfuncs.rst b/Documentation/bpf/kfuncs.rst index 226313747be5..9a78533d25ac 100644 --- a/Documentation/bpf/kfuncs.rst +++ b/Documentation/bpf/kfuncs.rst @@ -100,6 +100,23 @@ Hence, whenever a constant scalar argument is accepted by a kfunc which is not a size parameter, and the value of the constant matters for program safety, __k suffix should be used. +2.2.2 __uninit Annotation +-------------------- + +This annotation is used to indicate that the argument will be treated as +uninitialized. + +An example is given below:: + + __bpf_kfunc int bpf_dynptr_from_skb(..., struct bpf_dynptr_kern *ptr__uninit) + { + ... + } + +Here, the dynptr will be treated as an uninitialized dynptr. Without this +annotation, the verifier will reject the program if the dynptr passed in is +not initialized. + .. 
_BPF_kfunc_nodef: 2.3 Using an existing kernel function diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 8fd2f26a8977..d052aa5800de 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -8727,6 +8727,11 @@ static bool is_kfunc_arg_alloc_obj(const struct btf *btf, const struct btf_param return __kfunc_param_match_suffix(btf, arg, "__alloc"); } +static bool is_kfunc_arg_uninit(const struct btf *btf, const struct btf_param *arg) +{ + return __kfunc_param_match_suffix(btf, arg, "__uninit"); +} + static bool is_kfunc_arg_scalar_with_name(const struct btf *btf, const struct btf_param *arg, const char *name) @@ -9662,17 +9667,26 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_ return ret; break; case KF_ARG_PTR_TO_DYNPTR: + { + enum bpf_arg_type dynptr_arg_type = ARG_PTR_TO_DYNPTR; + if (reg->type != PTR_TO_STACK && reg->type != CONST_PTR_TO_DYNPTR) { verbose(env, "arg#%d expected pointer to stack or dynptr_ptr\n", i); return -EINVAL; } - ret = process_dynptr_func(env, regno, insn_idx, - ARG_PTR_TO_DYNPTR | MEM_RDONLY); + if (reg->type == CONST_PTR_TO_DYNPTR) + dynptr_arg_type |= MEM_RDONLY; + + if (is_kfunc_arg_uninit(btf, &args[i])) + dynptr_arg_type |= MEM_UNINIT; + + ret = process_dynptr_func(env, regno, insn_idx, dynptr_arg_type); if (ret < 0) return ret; break; + } case KF_ARG_PTR_TO_LIST_HEAD: if (reg->type != PTR_TO_MAP_VALUE && reg->type != (PTR_TO_BTF_ID | MEM_ALLOC)) { -- cgit v1.2.3 From db52b587c67f40e4bd6e8167f2334d4500617bdc Mon Sep 17 00:00:00 2001 From: David Vernet Date: Wed, 1 Mar 2023 13:49:10 -0600 Subject: bpf, docs: Fix __uninit kfunc doc section In commit d96d937d7c5c ("bpf: Add __uninit kfunc annotation"), the __uninit kfunc annotation was documented in kfuncs.rst. You have to fully underline a section in rst, or the build will issue a warning that the title underline is too short: ./Documentation/bpf/kfuncs.rst:104: WARNING: Title underline too short. 
2.2.2 __uninit Annotation -------------------- This patch fixes that title underline. Fixes: d96d937d7c5c ("bpf: Add __uninit kfunc annotation") Signed-off-by: David Vernet Link: https://lore.kernel.org/r/20230301194910.602738-2-void@manifault.com Signed-off-by: Alexei Starovoitov --- Documentation/bpf/kfuncs.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/bpf/kfuncs.rst b/Documentation/bpf/kfuncs.rst index 9a78533d25ac..9d85bbc3b771 100644 --- a/Documentation/bpf/kfuncs.rst +++ b/Documentation/bpf/kfuncs.rst @@ -101,7 +101,7 @@ size parameter, and the value of the constant matters for program safety, __k suffix should be used. 2.2.2 __uninit Annotation --------------------- +------------------------- This annotation is used to indicate that the argument will be treated as uninitialized. -- cgit v1.2.3 From d56b0c461d19dae917fa0bba76cbe8ad7a44712e Mon Sep 17 00:00:00 2001 From: David Vernet Date: Thu, 2 Mar 2023 12:39:17 -0600 Subject: bpf, docs: Fix link to netdev-FAQ target The BPF devel Q&A documentation page makes frequent reference to the netdev-QA page via the netdev-FAQ rst link. This link is currently broken, as is evidenced by the build output when making BPF docs: ./Documentation/bpf/bpf_devel_QA.rst:150: WARNING: undefined label: 'netdev-faq' ./Documentation/bpf/bpf_devel_QA.rst:206: WARNING: undefined label: 'netdev-faq' ./Documentation/bpf/bpf_devel_QA.rst:231: WARNING: undefined label: 'netdev-faq' ./Documentation/bpf/bpf_devel_QA.rst:396: WARNING: undefined label: 'netdev-faq' ./Documentation/bpf/bpf_devel_QA.rst:412: WARNING: undefined label: 'netdev-faq' Fix the links to point to the actual netdev-faq page. 
Signed-off-by: David Vernet Link: https://lore.kernel.org/r/20230302183918.54190-1-void@manifault.com Signed-off-by: Alexei Starovoitov --- Documentation/bpf/bpf_devel_QA.rst | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) (limited to 'Documentation') diff --git a/Documentation/bpf/bpf_devel_QA.rst b/Documentation/bpf/bpf_devel_QA.rst index 03d4993eda6f..5f5f9ccc3862 100644 --- a/Documentation/bpf/bpf_devel_QA.rst +++ b/Documentation/bpf/bpf_devel_QA.rst @@ -128,7 +128,7 @@ into the bpf-next tree will make their way into net-next tree. net and net-next are both run by David S. Miller. From there, they will go into the kernel mainline tree run by Linus Torvalds. To read up on the process of net and net-next being merged into the mainline tree, see -the :ref:`netdev-FAQ` +the `netdev-FAQ`_. @@ -147,7 +147,7 @@ request):: Q: How do I indicate which tree (bpf vs. bpf-next) my patch should be applied to? --------------------------------------------------------------------------------- -A: The process is the very same as described in the :ref:`netdev-FAQ`, +A: The process is the very same as described in the `netdev-FAQ`_, so please read up on it. The subject line must indicate whether the patch is a fix or rather "next-like" content in order to let the maintainers know whether it is targeted at bpf or bpf-next. @@ -206,7 +206,7 @@ ii) run extensive BPF test suite and Once the BPF pull request was accepted by David S. Miller, then the patches end up in net or net-next tree, respectively, and make their way from there further into mainline. Again, see the -:ref:`netdev-FAQ` for additional information e.g. on how often they are +`netdev-FAQ`_ for additional information e.g. on how often they are merged to mainline. Q: How long do I need to wait for feedback on my BPF patches? @@ -230,7 +230,7 @@ Q: Are patches applied to bpf-next when the merge window is open? 
----------------------------------------------------------------- A: For the time when the merge window is open, bpf-next will not be processed. This is roughly analogous to net-next patch processing, -so feel free to read up on the :ref:`netdev-FAQ` about further details. +so feel free to read up on the `netdev-FAQ`_ about further details. During those two weeks of merge window, we might ask you to resend your patch series once bpf-next is open again. Once Linus released @@ -394,7 +394,7 @@ netdev kernel mailing list in Cc and ask for the fix to be queued up: netdev@vger.kernel.org The process in general is the same as on netdev itself, see also the -:ref:`netdev-FAQ`. +`netdev-FAQ`_. Q: Do you also backport to kernels not currently maintained as stable? ---------------------------------------------------------------------- @@ -410,7 +410,7 @@ Q: The BPF patch I am about to submit needs to go to stable as well What should I do? A: The same rules apply as with netdev patch submissions in general, see -the :ref:`netdev-FAQ`. +the `netdev-FAQ`_. Never add "``Cc: stable@vger.kernel.org``" to the patch description, but ask the BPF maintainers to queue the patches instead. This can be done @@ -685,7 +685,7 @@ when: .. Links .. _Documentation/process/: https://www.kernel.org/doc/html/latest/process/ -.. _netdev-FAQ: Documentation/process/maintainer-netdev.rst +.. _netdev-FAQ: https://www.kernel.org/doc/html/latest/process/maintainer-netdev.html .. _selftests: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/bpf/ .. _Documentation/dev-tools/kselftest.rst: -- cgit v1.2.3 From cacad346f67ce9604dcc9db10f1f1769dabb3891 Mon Sep 17 00:00:00 2001 From: David Vernet Date: Thu, 2 Mar 2023 12:39:18 -0600 Subject: bpf, docs: Fix final bpf docs build failure maps.rst in the BPF documentation links to the /userspace-api/ebpf/syscall document (Documentation/userspace-api/ebpf/syscall.rst). 
For some reason, if you try to reference the document with :doc:, the docs build emits the following warning: ./Documentation/bpf/maps.rst:13: WARNING: \ unknown document: '/userspace-api/ebpf/syscall' It appears that other places in the docs tree also don't support using :doc:. Elsewhere in the BPF documentation, we just reference the kernel docs page directly. Let's do that here to clean up the last remaining noise in the docs build. Signed-off-by: David Vernet Link: https://lore.kernel.org/r/20230302183918.54190-2-void@manifault.com Signed-off-by: Alexei Starovoitov --- Documentation/bpf/maps.rst | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) (limited to 'Documentation') diff --git a/Documentation/bpf/maps.rst b/Documentation/bpf/maps.rst index 4906ff0f8382..6f069f3d6f4b 100644 --- a/Documentation/bpf/maps.rst +++ b/Documentation/bpf/maps.rst @@ -11,9 +11,9 @@ maps are accessed from BPF programs via BPF helpers which are documented in the `man-pages`_ for `bpf-helpers(7)`_. BPF maps are accessed from user space via the ``bpf`` syscall, which provides -commands to create maps, lookup elements, update elements and delete -elements. More details of the BPF syscall are available in -:doc:`/userspace-api/ebpf/syscall` and in the `man-pages`_ for `bpf(2)`_. +commands to create maps, lookup elements, update elements and delete elements. +More details of the BPF syscall are available in `ebpf-syscall`_ and in the +`man-pages`_ for `bpf(2)`_. Map Types ========= @@ -79,3 +79,4 @@ Find and delete element by key in a given map using ``attr->map_fd``, .. _man-pages: https://www.kernel.org/doc/man-pages/ .. _bpf(2): https://man7.org/linux/man-pages/man2/bpf.2.html .. _bpf-helpers(7): https://man7.org/linux/man-pages/man7/bpf-helpers.7.html +.. 
_ebpf-syscall: https://docs.kernel.org/userspace-api/ebpf/syscall.html -- cgit v1.2.3 From 03b77e17aeb22a5935ea20d585ca6a1f2947e62b Mon Sep 17 00:00:00 2001 From: Alexei Starovoitov Date: Thu, 2 Mar 2023 20:14:41 -0800 Subject: bpf: Rename __kptr_ref -> __kptr and __kptr -> __kptr_untrusted. __kptr meant to store PTR_UNTRUSTED kernel pointers inside bpf maps. The concept felt useful, but didn't get much traction, since bpf_rdonly_cast() was added soon after and bpf programs received a simpler way to access PTR_UNTRUSTED kernel pointers without going through restrictive __kptr usage. Rename __kptr_ref -> __kptr and __kptr -> __kptr_untrusted to indicate its intended usage. The main goal of __kptr_untrusted was to read/write such pointers directly while bpf_kptr_xchg was a mechanism to access refcnted kernel pointers. The next patch will allow RCU protected __kptr access with direct read. At that point __kptr_untrusted will be deprecated. Signed-off-by: Alexei Starovoitov Signed-off-by: Daniel Borkmann Acked-by: David Vernet Link: https://lore.kernel.org/bpf/20230303041446.3630-2-alexei.starovoitov@gmail.com --- Documentation/bpf/bpf_design_QA.rst | 4 ++-- Documentation/bpf/cpumasks.rst | 4 ++-- Documentation/bpf/kfuncs.rst | 2 +- kernel/bpf/btf.c | 4 ++-- tools/lib/bpf/bpf_helpers.h | 2 +- tools/testing/selftests/bpf/progs/cb_refs.c | 2 +- .../selftests/bpf/progs/cgrp_kfunc_common.h | 2 +- tools/testing/selftests/bpf/progs/cpumask_common.h | 2 +- tools/testing/selftests/bpf/progs/jit_probe_mem.c | 2 +- tools/testing/selftests/bpf/progs/lru_bug.c | 2 +- tools/testing/selftests/bpf/progs/map_kptr.c | 4 ++-- tools/testing/selftests/bpf/progs/map_kptr_fail.c | 6 +++--- .../selftests/bpf/progs/task_kfunc_common.h | 2 +- tools/testing/selftests/bpf/test_verifier.c | 22 +++++++++++----------- 14 files changed, 30 insertions(+), 30 deletions(-) (limited to 'Documentation') diff --git a/Documentation/bpf/bpf_design_QA.rst b/Documentation/bpf/bpf_design_QA.rst index 
bfff0e7e37c2..38372a956d65 100644 --- a/Documentation/bpf/bpf_design_QA.rst +++ b/Documentation/bpf/bpf_design_QA.rst @@ -314,7 +314,7 @@ Q: What is the compatibility story for special BPF types in map values? Q: Users are allowed to embed bpf_spin_lock, bpf_timer fields in their BPF map values (when using BTF support for BPF maps). This allows to use helpers for such objects on these fields inside map values. Users are also allowed to embed -pointers to some kernel types (with __kptr and __kptr_ref BTF tags). Will the +pointers to some kernel types (with __kptr_untrusted and __kptr BTF tags). Will the kernel preserve backwards compatibility for these features? A: It depends. For bpf_spin_lock, bpf_timer: YES, for kptr and everything else: @@ -324,7 +324,7 @@ For struct types that have been added already, like bpf_spin_lock and bpf_timer, the kernel will preserve backwards compatibility, as they are part of UAPI. For kptrs, they are also part of UAPI, but only with respect to the kptr -mechanism. The types that you can use with a __kptr and __kptr_ref tagged +mechanism. The types that you can use with a __kptr_untrusted and __kptr tagged pointer in your struct are NOT part of the UAPI contract. The supported types can and will change across kernel releases. However, operations like accessing kptr fields and bpf_kptr_xchg() helper will continue to be supported across kernel diff --git a/Documentation/bpf/cpumasks.rst b/Documentation/bpf/cpumasks.rst index 24bef9cbbeee..75344cd230e5 100644 --- a/Documentation/bpf/cpumasks.rst +++ b/Documentation/bpf/cpumasks.rst @@ -51,7 +51,7 @@ For example: .. code-block:: c struct cpumask_map_value { - struct bpf_cpumask __kptr_ref * cpumask; + struct bpf_cpumask __kptr * cpumask; }; struct array_map { @@ -128,7 +128,7 @@ Here is an example of a ``struct bpf_cpumask *`` being retrieved from a map: /* struct containing the struct bpf_cpumask kptr which is stored in the map. 
*/ struct cpumasks_kfunc_map_value { - struct bpf_cpumask __kptr_ref * bpf_cpumask; + struct bpf_cpumask __kptr * bpf_cpumask; }; /* The map containing struct cpumasks_kfunc_map_value entries. */ diff --git a/Documentation/bpf/kfuncs.rst b/Documentation/bpf/kfuncs.rst index 9d85bbc3b771..b5d9b0d446bc 100644 --- a/Documentation/bpf/kfuncs.rst +++ b/Documentation/bpf/kfuncs.rst @@ -544,7 +544,7 @@ Here's an example of how it can be used: /* struct containing the struct task_struct kptr which is actually stored in the map. */ struct __cgroups_kfunc_map_value { - struct cgroup __kptr_ref * cgroup; + struct cgroup __kptr * cgroup; }; /* The map containing struct __cgroups_kfunc_map_value entries. */ diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c index ef2d8969ed1f..c5e1d6955491 100644 --- a/kernel/bpf/btf.c +++ b/kernel/bpf/btf.c @@ -3288,9 +3288,9 @@ static int btf_find_kptr(const struct btf *btf, const struct btf_type *t, /* Reject extra tags */ if (btf_type_is_type_tag(btf_type_by_id(btf, t->type))) return -EINVAL; - if (!strcmp("kptr", __btf_name_by_offset(btf, t->name_off))) + if (!strcmp("kptr_untrusted", __btf_name_by_offset(btf, t->name_off))) type = BPF_KPTR_UNREF; - else if (!strcmp("kptr_ref", __btf_name_by_offset(btf, t->name_off))) + else if (!strcmp("kptr", __btf_name_by_offset(btf, t->name_off))) type = BPF_KPTR_REF; else return -EINVAL; diff --git a/tools/lib/bpf/bpf_helpers.h b/tools/lib/bpf/bpf_helpers.h index 5ec1871acb2f..7d12d3e620cc 100644 --- a/tools/lib/bpf/bpf_helpers.h +++ b/tools/lib/bpf/bpf_helpers.h @@ -174,8 +174,8 @@ enum libbpf_tristate { #define __kconfig __attribute__((section(".kconfig"))) #define __ksym __attribute__((section(".ksyms"))) +#define __kptr_untrusted __attribute__((btf_type_tag("kptr_untrusted"))) #define __kptr __attribute__((btf_type_tag("kptr"))) -#define __kptr_ref __attribute__((btf_type_tag("kptr_ref"))) #ifndef ___bpf_concat #define ___bpf_concat(a, b) a ## b diff --git 
a/tools/testing/selftests/bpf/progs/cb_refs.c b/tools/testing/selftests/bpf/progs/cb_refs.c index 7653df1bc787..ce96b33e38d6 100644 --- a/tools/testing/selftests/bpf/progs/cb_refs.c +++ b/tools/testing/selftests/bpf/progs/cb_refs.c @@ -4,7 +4,7 @@ #include struct map_value { - struct prog_test_ref_kfunc __kptr_ref *ptr; + struct prog_test_ref_kfunc __kptr *ptr; }; struct { diff --git a/tools/testing/selftests/bpf/progs/cgrp_kfunc_common.h b/tools/testing/selftests/bpf/progs/cgrp_kfunc_common.h index 2f8de933b957..d0b7cd0d09d7 100644 --- a/tools/testing/selftests/bpf/progs/cgrp_kfunc_common.h +++ b/tools/testing/selftests/bpf/progs/cgrp_kfunc_common.h @@ -10,7 +10,7 @@ #include struct __cgrps_kfunc_map_value { - struct cgroup __kptr_ref * cgrp; + struct cgroup __kptr * cgrp; }; struct hash_map { diff --git a/tools/testing/selftests/bpf/progs/cpumask_common.h b/tools/testing/selftests/bpf/progs/cpumask_common.h index ad34f3b602be..65e5496ca1b2 100644 --- a/tools/testing/selftests/bpf/progs/cpumask_common.h +++ b/tools/testing/selftests/bpf/progs/cpumask_common.h @@ -10,7 +10,7 @@ int err; struct __cpumask_map_value { - struct bpf_cpumask __kptr_ref * cpumask; + struct bpf_cpumask __kptr * cpumask; }; struct array_map { diff --git a/tools/testing/selftests/bpf/progs/jit_probe_mem.c b/tools/testing/selftests/bpf/progs/jit_probe_mem.c index 2d2e61470794..13f00ca2ed0a 100644 --- a/tools/testing/selftests/bpf/progs/jit_probe_mem.c +++ b/tools/testing/selftests/bpf/progs/jit_probe_mem.c @@ -4,7 +4,7 @@ #include #include -static struct prog_test_ref_kfunc __kptr_ref *v; +static struct prog_test_ref_kfunc __kptr *v; long total_sum = -1; extern struct prog_test_ref_kfunc *bpf_kfunc_call_test_acquire(unsigned long *sp) __ksym; diff --git a/tools/testing/selftests/bpf/progs/lru_bug.c b/tools/testing/selftests/bpf/progs/lru_bug.c index 687081a724b3..ad73029cb1e3 100644 --- a/tools/testing/selftests/bpf/progs/lru_bug.c +++ b/tools/testing/selftests/bpf/progs/lru_bug.c @@ -4,7 
+4,7 @@ #include struct map_value { - struct task_struct __kptr *ptr; + struct task_struct __kptr_untrusted *ptr; }; struct { diff --git a/tools/testing/selftests/bpf/progs/map_kptr.c b/tools/testing/selftests/bpf/progs/map_kptr.c index a24d17bc17eb..3fe7cde4cbfd 100644 --- a/tools/testing/selftests/bpf/progs/map_kptr.c +++ b/tools/testing/selftests/bpf/progs/map_kptr.c @@ -4,8 +4,8 @@ #include struct map_value { - struct prog_test_ref_kfunc __kptr *unref_ptr; - struct prog_test_ref_kfunc __kptr_ref *ref_ptr; + struct prog_test_ref_kfunc __kptr_untrusted *unref_ptr; + struct prog_test_ref_kfunc __kptr *ref_ptr; }; struct array_map { diff --git a/tools/testing/selftests/bpf/progs/map_kptr_fail.c b/tools/testing/selftests/bpf/progs/map_kptr_fail.c index 760e41e1a632..e19e2a5f38cf 100644 --- a/tools/testing/selftests/bpf/progs/map_kptr_fail.c +++ b/tools/testing/selftests/bpf/progs/map_kptr_fail.c @@ -7,9 +7,9 @@ struct map_value { char buf[8]; - struct prog_test_ref_kfunc __kptr *unref_ptr; - struct prog_test_ref_kfunc __kptr_ref *ref_ptr; - struct prog_test_member __kptr_ref *ref_memb_ptr; + struct prog_test_ref_kfunc __kptr_untrusted *unref_ptr; + struct prog_test_ref_kfunc __kptr *ref_ptr; + struct prog_test_member __kptr *ref_memb_ptr; }; struct array_map { diff --git a/tools/testing/selftests/bpf/progs/task_kfunc_common.h b/tools/testing/selftests/bpf/progs/task_kfunc_common.h index c0ffd171743e..4c2a4b0e3a25 100644 --- a/tools/testing/selftests/bpf/progs/task_kfunc_common.h +++ b/tools/testing/selftests/bpf/progs/task_kfunc_common.h @@ -10,7 +10,7 @@ #include struct __tasks_kfunc_map_value { - struct task_struct __kptr_ref * task; + struct task_struct __kptr * task; }; struct hash_map { diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c index 8b9949bb833d..49a70d9beb0b 100644 --- a/tools/testing/selftests/bpf/test_verifier.c +++ b/tools/testing/selftests/bpf/test_verifier.c @@ -699,13 +699,13 @@ static int 
create_cgroup_storage(bool percpu) * struct bpf_timer t; * }; * struct btf_ptr { + * struct prog_test_ref_kfunc __kptr_untrusted *ptr; * struct prog_test_ref_kfunc __kptr *ptr; - * struct prog_test_ref_kfunc __kptr_ref *ptr; - * struct prog_test_member __kptr_ref *ptr; + * struct prog_test_member __kptr *ptr; * } */ static const char btf_str_sec[] = "\0bpf_spin_lock\0val\0cnt\0l\0bpf_timer\0timer\0t" - "\0btf_ptr\0prog_test_ref_kfunc\0ptr\0kptr\0kptr_ref" + "\0btf_ptr\0prog_test_ref_kfunc\0ptr\0kptr\0kptr_untrusted" "\0prog_test_member"; static __u32 btf_raw_types[] = { /* int */ @@ -724,20 +724,20 @@ static __u32 btf_raw_types[] = { BTF_MEMBER_ENC(41, 4, 0), /* struct bpf_timer t; */ /* struct prog_test_ref_kfunc */ /* [6] */ BTF_STRUCT_ENC(51, 0, 0), - BTF_STRUCT_ENC(89, 0, 0), /* [7] */ + BTF_STRUCT_ENC(95, 0, 0), /* [7] */ + /* type tag "kptr_untrusted" */ + BTF_TYPE_TAG_ENC(80, 6), /* [8] */ /* type tag "kptr" */ - BTF_TYPE_TAG_ENC(75, 6), /* [8] */ - /* type tag "kptr_ref" */ - BTF_TYPE_TAG_ENC(80, 6), /* [9] */ - BTF_TYPE_TAG_ENC(80, 7), /* [10] */ + BTF_TYPE_TAG_ENC(75, 6), /* [9] */ + BTF_TYPE_TAG_ENC(75, 7), /* [10] */ BTF_PTR_ENC(8), /* [11] */ BTF_PTR_ENC(9), /* [12] */ BTF_PTR_ENC(10), /* [13] */ /* struct btf_ptr */ /* [14] */ BTF_STRUCT_ENC(43, 3, 24), - BTF_MEMBER_ENC(71, 11, 0), /* struct prog_test_ref_kfunc __kptr *ptr; */ - BTF_MEMBER_ENC(71, 12, 64), /* struct prog_test_ref_kfunc __kptr_ref *ptr; */ - BTF_MEMBER_ENC(71, 13, 128), /* struct prog_test_member __kptr_ref *ptr; */ + BTF_MEMBER_ENC(71, 11, 0), /* struct prog_test_ref_kfunc __kptr_untrusted *ptr; */ + BTF_MEMBER_ENC(71, 12, 64), /* struct prog_test_ref_kfunc __kptr *ptr; */ + BTF_MEMBER_ENC(71, 13, 128), /* struct prog_test_member __kptr *ptr; */ }; static char bpf_vlog[UINT_MAX >> 8]; -- cgit v1.2.3 From 20c09d92faeefb8536f705d3a4629e0dc314c8a1 Mon Sep 17 00:00:00 2001 From: Alexei Starovoitov Date: Thu, 2 Mar 2023 20:14:43 -0800 Subject: bpf: Introduce kptr_rcu. 
The lifetime of certain kernel structures like 'struct cgroup' is protected by RCU.
Hence it's safe to dereference them directly from __kptr tagged pointers in bpf maps.
The resulting pointer is MEM_RCU and can be passed to kfuncs that expect KF_RCU.
Dereference of other kptr-s returns PTR_UNTRUSTED.

For example:
struct map_value {
   struct cgroup __kptr *cgrp;
};

SEC("tp_btf/cgroup_mkdir")
int BPF_PROG(test_cgrp_get_ancestors, struct cgroup *cgrp_arg, const char *path)
{
  struct cgroup *cg, *cg2;

  cg = bpf_cgroup_acquire(cgrp_arg); // cg is PTR_TRUSTED and ref_obj_id > 0
  bpf_kptr_xchg(&v->cgrp, cg);

  cg2 = v->cgrp; // This is a new feature introduced by this patch.
  // cg2 is PTR_MAYBE_NULL | MEM_RCU.
  // When cg2 != NULL, it's a valid cgroup, but its percpu_ref could be zero

  if (cg2)
    bpf_cgroup_ancestor(cg2, level); // safe to do.
}

Signed-off-by: Alexei Starovoitov
Signed-off-by: Daniel Borkmann
Acked-by: Tejun Heo
Acked-by: David Vernet
Link: https://lore.kernel.org/bpf/20230303041446.3630-4-alexei.starovoitov@gmail.com
---
 Documentation/bpf/kfuncs.rst | 12 +++--
 include/linux/btf.h | 2 +-
 kernel/bpf/helpers.c | 6 ++-
 kernel/bpf/verifier.c | 55 ++++++++++++++++++----
 net/bpf/test_run.c | 3 +-
 .../selftests/bpf/progs/cgrp_kfunc_failure.c | 2 +-
 tools/testing/selftests/bpf/progs/map_kptr_fail.c | 4 +-
 tools/testing/selftests/bpf/verifier/calls.c | 2 +-
 tools/testing/selftests/bpf/verifier/map_kptr.c | 2 +-
 9 files changed, 65 insertions(+), 23 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/bpf/kfuncs.rst b/Documentation/bpf/kfuncs.rst
index b5d9b0d446bc..69eccf6f98ef 100644
--- a/Documentation/bpf/kfuncs.rst
+++ b/Documentation/bpf/kfuncs.rst
@@ -249,11 +249,13 @@ added later.
 2.4.8 KF_RCU flag
 -----------------
 
-The KF_RCU flag is used for kfuncs which have a rcu ptr as its argument.
-When used together with KF_ACQUIRE, it indicates the kfunc should have a
-single argument which must be a trusted argument or a MEM_RCU pointer.
-The argument may have reference count of 0 and the kfunc must take this -into consideration. +The KF_RCU flag is a weaker version of KF_TRUSTED_ARGS. The kfuncs marked with +KF_RCU expect either PTR_TRUSTED or MEM_RCU arguments. The verifier guarantees +that the objects are valid and there is no use-after-free. The pointers are not +NULL, but the object's refcount could have reached zero. The kfuncs need to +consider doing refcnt != 0 check, especially when returning a KF_ACQUIRE +pointer. Note as well that a KF_ACQUIRE kfunc that is KF_RCU should very likely +also be KF_RET_NULL. .. _KF_deprecated_flag: diff --git a/include/linux/btf.h b/include/linux/btf.h index 49e0fe6d8274..556b3e2e7471 100644 --- a/include/linux/btf.h +++ b/include/linux/btf.h @@ -70,7 +70,7 @@ #define KF_TRUSTED_ARGS (1 << 4) /* kfunc only takes trusted pointer arguments */ #define KF_SLEEPABLE (1 << 5) /* kfunc may sleep */ #define KF_DESTRUCTIVE (1 << 6) /* kfunc performs destructive actions */ -#define KF_RCU (1 << 7) /* kfunc only takes rcu pointer arguments */ +#define KF_RCU (1 << 7) /* kfunc takes either rcu or trusted pointer arguments */ /* * Tag marking a kernel function as a kfunc. 
This is meant to minimize the diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index 12f12e879bcf..637ac4e92e75 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -2163,8 +2163,10 @@ __bpf_kfunc struct cgroup *bpf_cgroup_ancestor(struct cgroup *cgrp, int level) if (level > cgrp->level || level < 0) return NULL; + /* cgrp's refcnt could be 0 here, but ancestors can still be accessed */ ancestor = cgrp->ancestors[level]; - cgroup_get(ancestor); + if (!cgroup_tryget(ancestor)) + return NULL; return ancestor; } @@ -2382,7 +2384,7 @@ BTF_ID_FLAGS(func, bpf_rbtree_first, KF_RET_NULL) BTF_ID_FLAGS(func, bpf_cgroup_acquire, KF_ACQUIRE | KF_TRUSTED_ARGS) BTF_ID_FLAGS(func, bpf_cgroup_kptr_get, KF_ACQUIRE | KF_KPTR_GET | KF_RET_NULL) BTF_ID_FLAGS(func, bpf_cgroup_release, KF_RELEASE) -BTF_ID_FLAGS(func, bpf_cgroup_ancestor, KF_ACQUIRE | KF_TRUSTED_ARGS | KF_RET_NULL) +BTF_ID_FLAGS(func, bpf_cgroup_ancestor, KF_ACQUIRE | KF_RCU | KF_RET_NULL) BTF_ID_FLAGS(func, bpf_cgroup_from_id, KF_ACQUIRE | KF_RET_NULL) #endif BTF_ID_FLAGS(func, bpf_task_from_pid, KF_ACQUIRE | KF_RET_NULL) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index b834f3d2d81a..a095055d7ef4 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -4218,7 +4218,7 @@ static int map_kptr_match_type(struct bpf_verifier_env *env, struct bpf_reg_state *reg, u32 regno) { const char *targ_name = kernel_type_name(kptr_field->kptr.btf, kptr_field->kptr.btf_id); - int perm_flags = PTR_MAYBE_NULL | PTR_TRUSTED; + int perm_flags = PTR_MAYBE_NULL | PTR_TRUSTED | MEM_RCU; const char *reg_name = ""; /* Only unreferenced case accepts untrusted pointers */ @@ -4285,6 +4285,34 @@ bad_type: return -EINVAL; } +/* The non-sleepable programs and sleepable programs with explicit bpf_rcu_read_lock() + * can dereference RCU protected pointers and result is PTR_TRUSTED. 
+ */ +static bool in_rcu_cs(struct bpf_verifier_env *env) +{ + return env->cur_state->active_rcu_lock || !env->prog->aux->sleepable; +} + +/* Once GCC supports btf_type_tag the following mechanism will be replaced with tag check */ +BTF_SET_START(rcu_protected_types) +BTF_ID(struct, prog_test_ref_kfunc) +BTF_ID(struct, cgroup) +BTF_SET_END(rcu_protected_types) + +static bool rcu_protected_object(const struct btf *btf, u32 btf_id) +{ + if (!btf_is_kernel(btf)) + return false; + return btf_id_set_contains(&rcu_protected_types, btf_id); +} + +static bool rcu_safe_kptr(const struct btf_field *field) +{ + const struct btf_field_kptr *kptr = &field->kptr; + + return field->type == BPF_KPTR_REF && rcu_protected_object(kptr->btf, kptr->btf_id); +} + static int check_map_kptr_access(struct bpf_verifier_env *env, u32 regno, int value_regno, int insn_idx, struct btf_field *kptr_field) @@ -4319,7 +4347,10 @@ static int check_map_kptr_access(struct bpf_verifier_env *env, u32 regno, * value from map as PTR_TO_BTF_ID, with the correct type. */ mark_btf_ld_reg(env, cur_regs(env), value_regno, PTR_TO_BTF_ID, kptr_field->kptr.btf, - kptr_field->kptr.btf_id, PTR_MAYBE_NULL | PTR_UNTRUSTED); + kptr_field->kptr.btf_id, + rcu_safe_kptr(kptr_field) && in_rcu_cs(env) ? + PTR_MAYBE_NULL | MEM_RCU : + PTR_MAYBE_NULL | PTR_UNTRUSTED); /* For mark_ptr_or_null_reg */ val_reg->id = ++env->id_gen; } else if (class == BPF_STX) { @@ -5163,10 +5194,17 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env, * An RCU-protected pointer can also be deemed trusted if we are in an * RCU read region. This case is handled below. */ - if (nested_ptr_is_trusted(env, reg, off)) + if (nested_ptr_is_trusted(env, reg, off)) { flag |= PTR_TRUSTED; - else + /* + * task->cgroups is trusted. It provides a stronger guarantee + * than __rcu tag on 'cgroups' field in 'struct task_struct'. + * Clear MEM_RCU in such case. 
+ */ + flag &= ~MEM_RCU; + } else { flag &= ~PTR_TRUSTED; + } if (flag & MEM_RCU) { /* Mark value register as MEM_RCU only if it is protected by @@ -5175,11 +5213,10 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env, * read lock region. Also mark rcu pointer as PTR_MAYBE_NULL since * it could be null in some cases. */ - if (!env->cur_state->active_rcu_lock || - !(is_trusted_reg(reg) || is_rcu_reg(reg))) - flag &= ~MEM_RCU; - else + if (in_rcu_cs(env) && (is_trusted_reg(reg) || is_rcu_reg(reg))) flag |= PTR_MAYBE_NULL; + else + flag &= ~MEM_RCU; } else if (reg->type & MEM_RCU) { /* ptr (reg) is marked as MEM_RCU, but the struct field is not tagged * with __rcu. Mark the flag as PTR_UNTRUSTED conservatively. @@ -9676,7 +9713,7 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_ return -EINVAL; } - if (is_kfunc_trusted_args(meta) && + if ((is_kfunc_trusted_args(meta) || is_kfunc_rcu(meta)) && (register_is_null(reg) || type_may_be_null(reg->type))) { verbose(env, "Possibly NULL pointer passed to trusted arg%d\n", i); return -EACCES; diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c index 6f3d654b3339..6a8b33a103a4 100644 --- a/net/bpf/test_run.c +++ b/net/bpf/test_run.c @@ -737,6 +737,7 @@ __bpf_kfunc void bpf_kfunc_call_test_mem_len_fail2(u64 *mem, int len) __bpf_kfunc void bpf_kfunc_call_test_ref(struct prog_test_ref_kfunc *p) { + /* p != NULL, but p->cnt could be 0 */ } __bpf_kfunc void bpf_kfunc_call_test_destructive(void) @@ -784,7 +785,7 @@ BTF_ID_FLAGS(func, bpf_kfunc_call_test_fail3) BTF_ID_FLAGS(func, bpf_kfunc_call_test_mem_len_pass1) BTF_ID_FLAGS(func, bpf_kfunc_call_test_mem_len_fail1) BTF_ID_FLAGS(func, bpf_kfunc_call_test_mem_len_fail2) -BTF_ID_FLAGS(func, bpf_kfunc_call_test_ref, KF_TRUSTED_ARGS) +BTF_ID_FLAGS(func, bpf_kfunc_call_test_ref, KF_TRUSTED_ARGS | KF_RCU) BTF_ID_FLAGS(func, bpf_kfunc_call_test_destructive, KF_DESTRUCTIVE) BTF_ID_FLAGS(func, bpf_kfunc_call_test_static_unused_arg) 
BTF_SET8_END(test_sk_check_kfunc_ids) diff --git a/tools/testing/selftests/bpf/progs/cgrp_kfunc_failure.c b/tools/testing/selftests/bpf/progs/cgrp_kfunc_failure.c index 4ad7fe24966d..b42291ed9586 100644 --- a/tools/testing/selftests/bpf/progs/cgrp_kfunc_failure.c +++ b/tools/testing/selftests/bpf/progs/cgrp_kfunc_failure.c @@ -205,7 +205,7 @@ int BPF_PROG(cgrp_kfunc_get_unreleased, struct cgroup *cgrp, const char *path) } SEC("tp_btf/cgroup_mkdir") -__failure __msg("arg#0 is untrusted_ptr_or_null_ expected ptr_ or socket") +__failure __msg("expects refcounted") int BPF_PROG(cgrp_kfunc_release_untrusted, struct cgroup *cgrp, const char *path) { struct __cgrps_kfunc_map_value *v; diff --git a/tools/testing/selftests/bpf/progs/map_kptr_fail.c b/tools/testing/selftests/bpf/progs/map_kptr_fail.c index e19e2a5f38cf..08f9ec18c345 100644 --- a/tools/testing/selftests/bpf/progs/map_kptr_fail.c +++ b/tools/testing/selftests/bpf/progs/map_kptr_fail.c @@ -281,7 +281,7 @@ int reject_kptr_get_bad_type_match(struct __sk_buff *ctx) } SEC("?tc") -__failure __msg("R1 type=untrusted_ptr_or_null_ expected=percpu_ptr_") +__failure __msg("R1 type=rcu_ptr_or_null_ expected=percpu_ptr_") int mark_ref_as_untrusted_or_null(struct __sk_buff *ctx) { struct map_value *v; @@ -316,7 +316,7 @@ int reject_untrusted_store_to_ref(struct __sk_buff *ctx) } SEC("?tc") -__failure __msg("R2 type=untrusted_ptr_ expected=ptr_") +__failure __msg("R2 must be referenced") int reject_untrusted_xchg(struct __sk_buff *ctx) { struct prog_test_ref_kfunc *p; diff --git a/tools/testing/selftests/bpf/verifier/calls.c b/tools/testing/selftests/bpf/verifier/calls.c index 289ed202ec66..9a326a800e5c 100644 --- a/tools/testing/selftests/bpf/verifier/calls.c +++ b/tools/testing/selftests/bpf/verifier/calls.c @@ -243,7 +243,7 @@ }, .result_unpriv = REJECT, .result = REJECT, - .errstr = "R1 must be referenced", + .errstr = "R1 must be", }, { "calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID", diff --git 
a/tools/testing/selftests/bpf/verifier/map_kptr.c b/tools/testing/selftests/bpf/verifier/map_kptr.c index 6914904344c0..d775ccb01989 100644 --- a/tools/testing/selftests/bpf/verifier/map_kptr.c +++ b/tools/testing/selftests/bpf/verifier/map_kptr.c @@ -336,7 +336,7 @@ .prog_type = BPF_PROG_TYPE_SCHED_CLS, .fixup_map_kptr = { 1 }, .result = REJECT, - .errstr = "R1 type=untrusted_ptr_or_null_ expected=percpu_ptr_", + .errstr = "R1 type=rcu_ptr_or_null_ expected=percpu_ptr_", }, { "map_kptr: ref: reject off != 0", -- cgit v1.2.3 From 7d8c48917a9576b5fc8871aa4946149b0e4a4927 Mon Sep 17 00:00:00 2001 From: Arınç ÜNAL Date: Tue, 7 Mar 2023 12:56:19 +0300 Subject: dt-bindings: net: dsa: mediatek,mt7530: change some descriptions to literal MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The line endings must be preserved on gpio-controller, io-supply, and reset-gpios properties to look proper when the YAML file is parsed. Currently it's interpreted as a single line when parsed. Change the style of the description of these properties to literal style to preserve the line endings. Signed-off-by: Arınç ÜNAL Acked-by: Rob Herring Signed-off-by: David S. Miller --- Documentation/devicetree/bindings/net/dsa/mediatek,mt7530.yaml | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/net/dsa/mediatek,mt7530.yaml b/Documentation/devicetree/bindings/net/dsa/mediatek,mt7530.yaml index 449ee0735012..5ae9cd8f99a2 100644 --- a/Documentation/devicetree/bindings/net/dsa/mediatek,mt7530.yaml +++ b/Documentation/devicetree/bindings/net/dsa/mediatek,mt7530.yaml @@ -93,7 +93,7 @@ properties: gpio-controller: type: boolean - description: + description: | If defined, LED controller of the MT7530 switch will run on GPIO mode. There are 15 controllable pins. 
@@ -112,7 +112,7 @@ properties: maxItems: 1 io-supply: - description: + description: | Phandle to the regulator node necessary for the I/O power. See Documentation/devicetree/bindings/regulator/mt6323-regulator.txt for details for the regulator setup on these boards. @@ -124,7 +124,7 @@ properties: switch is a part of the multi-chip module. reset-gpios: - description: + description: | GPIO to reset the switch. Use this if mediatek,mcm is not used. This property is optional because some boards share the reset line with other components which makes it impossible to probe the switch if the -- cgit v1.2.3 From c1f9e14e3b676eb88fe1c9488c0b5f4fc9108a1c Mon Sep 17 00:00:00 2001 From: Dave Thaler Date: Wed, 8 Mar 2023 20:53:03 +0000 Subject: bpf, docs: Explain helper functions Add brief text about existence of helper functions, with details to go in separate psABI text. Note that text about runtime functions (kfuncs) is part of a separate patch, not this one. Signed-off-by: Dave Thaler Link: https://lore.kernel.org/r/20230308205303.1308-1-dthaler1968@googlemail.com Signed-off-by: Alexei Starovoitov --- Documentation/bpf/clang-notes.rst | 6 ++++++ Documentation/bpf/instruction-set.rst | 9 ++++++++- Documentation/bpf/linux-notes.rst | 8 ++++++++ 3 files changed, 22 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/bpf/clang-notes.rst b/Documentation/bpf/clang-notes.rst index 528feddf2db9..2c872a1ee08e 100644 --- a/Documentation/bpf/clang-notes.rst +++ b/Documentation/bpf/clang-notes.rst @@ -20,6 +20,12 @@ Arithmetic instructions For CPU versions prior to 3, Clang v7.0 and later can enable ``BPF_ALU`` support with ``-Xclang -target-feature -Xclang +alu32``. In CPU version 3, support is automatically included. +Jump instructions +================= + +If ``-O0`` is used, Clang will generate the ``BPF_CALL | BPF_X | BPF_JMP`` (0x8d) +instruction, which is not supported by the Linux kernel verifier. 
+ Atomic operations ================= diff --git a/Documentation/bpf/instruction-set.rst b/Documentation/bpf/instruction-set.rst index db8789e6969e..5e43e14abe80 100644 --- a/Documentation/bpf/instruction-set.rst +++ b/Documentation/bpf/instruction-set.rst @@ -253,7 +253,7 @@ BPF_JSET 0x40 PC += off if dst & src BPF_JNE 0x50 PC += off if dst != src BPF_JSGT 0x60 PC += off if dst > src signed BPF_JSGE 0x70 PC += off if dst >= src signed -BPF_CALL 0x80 function call +BPF_CALL 0x80 function call see `Helper functions`_ BPF_EXIT 0x90 function / program return BPF_JMP only BPF_JLT 0xa0 PC += off if dst < src unsigned BPF_JLE 0xb0 PC += off if dst <= src unsigned @@ -264,6 +264,13 @@ BPF_JSLE 0xd0 PC += off if dst <= src signed The eBPF program needs to store the return value into register R0 before doing a BPF_EXIT. +Helper functions +~~~~~~~~~~~~~~~~ + +Helper functions are a concept whereby BPF programs can call into a +set of function calls exposed by the runtime. Each helper +function is identified by an integer used in a ``BPF_CALL`` instruction. +The available helper functions may differ for each program type. Load and store instructions =========================== diff --git a/Documentation/bpf/linux-notes.rst b/Documentation/bpf/linux-notes.rst index 956b0c86699d..f43b9c797bcb 100644 --- a/Documentation/bpf/linux-notes.rst +++ b/Documentation/bpf/linux-notes.rst @@ -12,6 +12,14 @@ Byte swap instructions ``BPF_FROM_LE`` and ``BPF_FROM_BE`` exist as aliases for ``BPF_TO_LE`` and ``BPF_TO_BE`` respectively. +Jump instructions +================= + +``BPF_CALL | BPF_X | BPF_JMP`` (0x8d), where the helper function +integer would be read from a specified register, is not currently supported +by the verifier. Any programs with this instruction will fail to load +until such support is added. 
+ Legacy BPF Packet access instructions ===================================== -- cgit v1.2.3 From aacaf7b3d19daaa91528ab0c598b89a7f82aa47d Mon Sep 17 00:00:00 2001 From: Siddharth Vadapalli Date: Thu, 9 Mar 2023 13:06:11 +0530 Subject: dt-bindings: net: ti: k3-am654-cpsw-nuss: Document Serdes PHY Update bindings to include Serdes PHY as an optional PHY, in addition to the existing CPSW MAC's PHY. The CPSW MAC's PHY is required while the Serdes PHY is optional. The Serdes PHY handle has to be provided only when the Serdes is being configured in a Single-Link protocol. Using the name "serdes-phy" to represent the Serdes PHY handle, the am65-cpsw-nuss driver can obtain the Serdes PHY and request the Serdes to be configured. Signed-off-by: Siddharth Vadapalli Reviewed-by: Krzysztof Kozlowski Signed-off-by: Jakub Kicinski --- .../devicetree/bindings/net/ti,k3-am654-cpsw-nuss.yaml | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/net/ti,k3-am654-cpsw-nuss.yaml b/Documentation/devicetree/bindings/net/ti,k3-am654-cpsw-nuss.yaml index 900063411a20..628d63e1eb1f 100644 --- a/Documentation/devicetree/bindings/net/ti,k3-am654-cpsw-nuss.yaml +++ b/Documentation/devicetree/bindings/net/ti,k3-am654-cpsw-nuss.yaml @@ -126,8 +126,18 @@ properties: description: CPSW port number phys: - maxItems: 1 - description: phandle on phy-gmii-sel PHY + minItems: 1 + items: + - description: CPSW MAC's PHY. + - description: Serdes PHY. Serdes PHY is required only if + the Serdes has to be configured in the + Single-Link configuration. 
+ + phy-names: + minItems: 1 + items: + - const: mac + - const: serdes label: description: label associated with this port -- cgit v1.2.3 From b9fe8e8d03d0df28b2431e3aaf8e115cf7bf2f65 Mon Sep 17 00:00:00 2001 From: Dave Thaler Date: Fri, 10 Mar 2023 23:38:14 +0000 Subject: bpf, docs: Add signed comparison example Improve clarity by adding an example of a signed comparison instruction Signed-off-by: Dave Thaler Acked-by: David Vernet Acked-by: John Fastabend Link: https://lore.kernel.org/r/20230310233814.4641-1-dthaler1968@googlemail.com Signed-off-by: Alexei Starovoitov --- Documentation/bpf/instruction-set.rst | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/bpf/instruction-set.rst b/Documentation/bpf/instruction-set.rst index 5e43e14abe80..b44640589055 100644 --- a/Documentation/bpf/instruction-set.rst +++ b/Documentation/bpf/instruction-set.rst @@ -11,7 +11,8 @@ Documentation conventions ========================= For brevity, this document uses the type notion "u64", "u32", etc. -to mean an unsigned integer whose width is the specified number of bits. +to mean an unsigned integer whose width is the specified number of bits, +and "s32", etc. to mean a signed integer of the specified number of bits. Registers and calling convention ================================ @@ -264,6 +265,14 @@ BPF_JSLE 0xd0 PC += off if dst <= src signed The eBPF program needs to store the return value into register R0 before doing a BPF_EXIT. +Example: + +``BPF_JSGE | BPF_X | BPF_JMP32`` (0x7e) means:: + + if (s32)dst s>= (s32)src goto +offset + +where 's>=' indicates a signed '>=' comparison. + Helper functions ~~~~~~~~~~~~~~~~ -- cgit v1.2.3 From 1bffcea42926b26e092045ac398850e80d950bb2 Mon Sep 17 00:00:00 2001 From: Gal Pressman Date: Mon, 13 Mar 2023 22:42:30 -0700 Subject: net/mlx5e: Add devlink hairpin queues parameters We refer to a TC NIC rule that involves forwarding as "hairpin". 
Hairpin queues are mlx5 hardware specific implementation for hardware forwarding of such packets. Per the discussion in [1], move the hairpin queues control (number and size) from debugfs to devlink. Expose two devlink params: - hairpin_num_queues: control the number of hairpin queues - hairpin_queue_size: control the size (in packets) of the hairpin queues [1] https://lore.kernel.org/all/20230111194608.7f15b9a1@kernel.org/ Signed-off-by: Gal Pressman Reviewed-by: Tariq Toukan Signed-off-by: Saeed Mahameed Link: https://lore.kernel.org/r/20230314054234.267365-12-saeed@kernel.org Signed-off-by: Jakub Kicinski --- .../ethernet/mellanox/mlx5/devlink.rst | 35 ++++++++++++ Documentation/networking/devlink/mlx5.rst | 12 ++++ drivers/net/ethernet/mellanox/mlx5/core/devlink.c | 66 ++++++++++++++++++++++ drivers/net/ethernet/mellanox/mlx5/core/devlink.h | 2 + drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 50 +++++++--------- 5 files changed, 134 insertions(+), 31 deletions(-) (limited to 'Documentation') diff --git a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/devlink.rst b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/devlink.rst index 9b5c40ba7f0d..0995e4e5acd7 100644 --- a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/devlink.rst +++ b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/devlink.rst @@ -122,6 +122,41 @@ users try to enable them. $ devlink dev eswitch set pci/0000:06:00.0 mode switchdev +hairpin_num_queues: Number of hairpin queues +-------------------------------------------- +We refer to a TC NIC rule that involves forwarding as "hairpin". + +Hairpin queues are mlx5 hardware specific implementation for hardware +forwarding of such packets. 
+ +- Show the number of hairpin queues:: + + $ devlink dev param show pci/0000:06:00.0 name hairpin_num_queues + pci/0000:06:00.0: + name hairpin_num_queues type driver-specific + values: + cmode driverinit value 2 + +- Change the number of hairpin queues:: + + $ devlink dev param set pci/0000:06:00.0 name hairpin_num_queues value 4 cmode driverinit + +hairpin_queue_size: Size of the hairpin queues +---------------------------------------------- +Control the size of the hairpin queues. + +- Show the size of the hairpin queues:: + + $ devlink dev param show pci/0000:06:00.0 name hairpin_queue_size + pci/0000:06:00.0: + name hairpin_queue_size type driver-specific + values: + cmode driverinit value 1024 + +- Change the size (in packets) of the hairpin queues:: + + $ devlink dev param set pci/0000:06:00.0 name hairpin_queue_size value 512 cmode driverinit + Health reporters ================ diff --git a/Documentation/networking/devlink/mlx5.rst b/Documentation/networking/devlink/mlx5.rst index 3321117cf605..202798d6501e 100644 --- a/Documentation/networking/devlink/mlx5.rst +++ b/Documentation/networking/devlink/mlx5.rst @@ -72,6 +72,18 @@ parameters. Default: disabled + * - ``hairpin_num_queues`` + - u32 + - driverinit + - We refer to a TC NIC rule that involves forwarding as "hairpin". + Hairpin queues are mlx5 hardware specific implementation for hardware + forwarding of such packets. + + Control the number of hairpin queues. + * - ``hairpin_queue_size`` + - u32 + - driverinit + - Control the size (in packets) of the hairpin queues. 
The ``mlx5`` driver supports reloading via ``DEVLINK_CMD_RELOAD`` diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c index b7784e02c2dd..1ee2a472e1d2 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c @@ -494,6 +494,61 @@ static int mlx5_devlink_eq_depth_validate(struct devlink *devlink, u32 id, return (val.vu32 >= 64 && val.vu32 <= 4096) ? 0 : -EINVAL; } +static int +mlx5_devlink_hairpin_num_queues_validate(struct devlink *devlink, u32 id, + union devlink_param_value val, + struct netlink_ext_ack *extack) +{ + return val.vu32 ? 0 : -EINVAL; +} + +static int +mlx5_devlink_hairpin_queue_size_validate(struct devlink *devlink, u32 id, + union devlink_param_value val, + struct netlink_ext_ack *extack) +{ + struct mlx5_core_dev *dev = devlink_priv(devlink); + u32 val32 = val.vu32; + + if (!is_power_of_2(val32)) { + NL_SET_ERR_MSG_MOD(extack, "Value is not power of two"); + return -EINVAL; + } + + if (val32 > BIT(MLX5_CAP_GEN(dev, log_max_hairpin_num_packets))) { + NL_SET_ERR_MSG_FMT_MOD( + extack, "Maximum hairpin queue size is %lu", + BIT(MLX5_CAP_GEN(dev, log_max_hairpin_num_packets))); + return -EINVAL; + } + + return 0; +} + +static void mlx5_devlink_hairpin_params_init_values(struct devlink *devlink) +{ + struct mlx5_core_dev *dev = devlink_priv(devlink); + union devlink_param_value value; + u64 link_speed64; + u32 link_speed; + + /* set hairpin pair per each 50Gbs share of the link */ + mlx5_port_max_linkspeed(dev, &link_speed); + link_speed = max_t(u32, link_speed, 50000); + link_speed64 = link_speed; + do_div(link_speed64, 50000); + + value.vu32 = link_speed64; + devl_param_driverinit_value_set( + devlink, MLX5_DEVLINK_PARAM_ID_HAIRPIN_NUM_QUEUES, value); + + value.vu32 = + BIT(min_t(u32, 16 - MLX5_MPWRQ_MIN_LOG_STRIDE_SZ(dev), + MLX5_CAP_GEN(dev, log_max_hairpin_num_packets))); + devl_param_driverinit_value_set( + devlink, 
MLX5_DEVLINK_PARAM_ID_HAIRPIN_QUEUE_SIZE, value); +} + static const struct devlink_param mlx5_devlink_params[] = { DEVLINK_PARAM_GENERIC(ENABLE_ROCE, BIT(DEVLINK_PARAM_CMODE_DRIVERINIT), NULL, NULL, mlx5_devlink_enable_roce_validate), @@ -547,6 +602,14 @@ static void mlx5_devlink_set_params_init_values(struct devlink *devlink) static const struct devlink_param mlx5_devlink_eth_params[] = { DEVLINK_PARAM_GENERIC(ENABLE_ETH, BIT(DEVLINK_PARAM_CMODE_DRIVERINIT), NULL, NULL, NULL), + DEVLINK_PARAM_DRIVER(MLX5_DEVLINK_PARAM_ID_HAIRPIN_NUM_QUEUES, + "hairpin_num_queues", DEVLINK_PARAM_TYPE_U32, + BIT(DEVLINK_PARAM_CMODE_DRIVERINIT), NULL, NULL, + mlx5_devlink_hairpin_num_queues_validate), + DEVLINK_PARAM_DRIVER(MLX5_DEVLINK_PARAM_ID_HAIRPIN_QUEUE_SIZE, + "hairpin_queue_size", DEVLINK_PARAM_TYPE_U32, + BIT(DEVLINK_PARAM_CMODE_DRIVERINIT), NULL, NULL, + mlx5_devlink_hairpin_queue_size_validate), }; static int mlx5_devlink_eth_params_register(struct devlink *devlink) @@ -567,6 +630,9 @@ static int mlx5_devlink_eth_params_register(struct devlink *devlink) devl_param_driverinit_value_set(devlink, DEVLINK_PARAM_GENERIC_ID_ENABLE_ETH, value); + + mlx5_devlink_hairpin_params_init_values(devlink); + return 0; } diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.h b/drivers/net/ethernet/mellanox/mlx5/core/devlink.h index 212b12424146..5dcfb4d86d8a 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.h @@ -12,6 +12,8 @@ enum mlx5_devlink_param_id { MLX5_DEVLINK_PARAM_ID_ESW_LARGE_GROUP_NUM, MLX5_DEVLINK_PARAM_ID_ESW_PORT_METADATA, MLX5_DEVLINK_PARAM_ID_ESW_MULTIPORT, + MLX5_DEVLINK_PARAM_ID_HAIRPIN_NUM_QUEUES, + MLX5_DEVLINK_PARAM_ID_HAIRPIN_QUEUE_SIZE, }; struct mlx5_trap_ctx { diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c index 79dd8ad5ede7..2e6351ef4d9c 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +++ 
b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c @@ -44,6 +44,7 @@ #include #include #include +#include "devlink.h" #include "en.h" #include "en/tc/post_act.h" #include "en/tc/act_stats.h" @@ -73,12 +74,6 @@ #define MLX5E_TC_TABLE_NUM_GROUPS 4 #define MLX5E_TC_TABLE_MAX_GROUP_SIZE BIT(18) -struct mlx5e_hairpin_params { - struct mlx5_core_dev *mdev; - u32 num_queues; - u32 queue_size; -}; - struct mlx5e_tc_table { /* Protects the dynamic assignment of the t parameter * which is the nic tc root table. @@ -101,7 +96,6 @@ struct mlx5e_tc_table { struct mlx5_tc_ct_priv *ct; struct mapping_ctx *mapping; - struct mlx5e_hairpin_params hairpin_params; struct dentry *dfs_root; /* tc action stats */ @@ -1099,33 +1093,15 @@ static void mlx5e_tc_debugfs_init(struct mlx5e_tc_table *tc, &debugfs_hairpin_table_dump_fops); } -static void -mlx5e_hairpin_params_init(struct mlx5e_hairpin_params *hairpin_params, - struct mlx5_core_dev *mdev) -{ - u64 link_speed64; - u32 link_speed; - - hairpin_params->mdev = mdev; - /* set hairpin pair per each 50Gbs share of the link */ - mlx5_port_max_linkspeed(mdev, &link_speed); - link_speed = max_t(u32, link_speed, 50000); - link_speed64 = link_speed; - do_div(link_speed64, 50000); - hairpin_params->num_queues = link_speed64; - - hairpin_params->queue_size = - BIT(min_t(u32, 16 - MLX5_MPWRQ_MIN_LOG_STRIDE_SZ(mdev), - MLX5_CAP_GEN(mdev, log_max_hairpin_num_packets))); -} - static int mlx5e_hairpin_flow_add(struct mlx5e_priv *priv, struct mlx5e_tc_flow *flow, struct mlx5e_tc_flow_parse_attr *parse_attr, struct netlink_ext_ack *extack) { struct mlx5e_tc_table *tc = mlx5e_fs_get_tc(priv->fs); + struct devlink *devlink = priv_to_devlink(priv->mdev); int peer_ifindex = parse_attr->mirred_ifindex[0]; + union devlink_param_value val = {}; struct mlx5_hairpin_params params; struct mlx5_core_dev *peer_mdev; struct mlx5e_hairpin_entry *hpe; @@ -1182,7 +1158,14 @@ static int mlx5e_hairpin_flow_add(struct mlx5e_priv *priv, hash_hairpin_info(peer_id, 
match_prio)); mutex_unlock(&tc->hairpin_tbl_lock); - params.log_num_packets = ilog2(tc->hairpin_params.queue_size); + err = devl_param_driverinit_value_get( + devlink, MLX5_DEVLINK_PARAM_ID_HAIRPIN_QUEUE_SIZE, &val); + if (err) { + err = -ENOMEM; + goto out_err; + } + + params.log_num_packets = ilog2(val.vu32); params.log_data_size = clamp_t(u32, params.log_num_packets + @@ -1191,7 +1174,14 @@ static int mlx5e_hairpin_flow_add(struct mlx5e_priv *priv, MLX5_CAP_GEN(priv->mdev, log_max_hairpin_wq_data_sz)); params.q_counter = priv->q_counter; - params.num_channels = tc->hairpin_params.num_queues; + err = devl_param_driverinit_value_get( + devlink, MLX5_DEVLINK_PARAM_ID_HAIRPIN_NUM_QUEUES, &val); + if (err) { + err = -ENOMEM; + goto out_err; + } + + params.num_channels = val.vu32; hp = mlx5e_hairpin_create(priv, &params, peer_ifindex); hpe->hp = hp; @@ -5289,8 +5279,6 @@ int mlx5e_tc_nic_init(struct mlx5e_priv *priv) tc->ct = mlx5_tc_ct_init(priv, tc->chains, &tc->mod_hdr, MLX5_FLOW_NAMESPACE_KERNEL, tc->post_act); - mlx5e_hairpin_params_init(&tc->hairpin_params, dev); - tc->netdevice_nb.notifier_call = mlx5e_tc_netdev_event; err = register_netdevice_notifier_dev_net(priv->netdev, &tc->netdevice_nb, -- cgit v1.2.3 From fec2c6d14fd5001e7d24a2ae44f0e9aea82a6149 Mon Sep 17 00:00:00 2001 From: David Vernet Date: Thu, 16 Mar 2023 00:40:28 -0500 Subject: bpf,docs: Remove bpf_cpumask_kptr_get() from documentation Now that the kfunc no longer exists, we can remove it and instead describe how RCU can be used to get a struct bpf_cpumask from a map value. This patch updates the BPF documentation accordingly.
Signed-off-by: David Vernet Link: https://lore.kernel.org/r/20230316054028.88924-6-void@manifault.com Signed-off-by: Alexei Starovoitov --- Documentation/bpf/cpumasks.rst | 30 ++++++++++-------------------- 1 file changed, 10 insertions(+), 20 deletions(-) (limited to 'Documentation') diff --git a/Documentation/bpf/cpumasks.rst b/Documentation/bpf/cpumasks.rst index 75344cd230e5..41efd8874eeb 100644 --- a/Documentation/bpf/cpumasks.rst +++ b/Documentation/bpf/cpumasks.rst @@ -117,12 +117,7 @@ For example: As mentioned and illustrated above, these ``struct bpf_cpumask *`` objects can also be stored in a map and used as kptrs. If a ``struct bpf_cpumask *`` is in a map, the reference can be removed from the map with bpf_kptr_xchg(), or -opportunistically acquired with bpf_cpumask_kptr_get(): - -.. kernel-doc:: kernel/bpf/cpumask.c - :identifiers: bpf_cpumask_kptr_get - -Here is an example of a ``struct bpf_cpumask *`` being retrieved from a map: +opportunistically acquired using RCU: .. code-block:: c @@ -144,7 +139,7 @@ Here is an example of a ``struct bpf_cpumask *`` being retrieved from a map: /** * A simple example tracepoint program showing how a * struct bpf_cpumask * kptr that is stored in a map can - * be acquired using the bpf_cpumask_kptr_get() kfunc. + * be passed to kfuncs using RCU protection. */ SEC("tp_btf/cgroup_mkdir") int BPF_PROG(cgrp_ancestor_example, struct cgroup *cgrp, const char *path) @@ -158,26 +153,21 @@ Here is an example of a ``struct bpf_cpumask *`` being retrieved from a map: if (!v) return -ENOENT; + bpf_rcu_read_lock(); /* Acquire a reference to the bpf_cpumask * kptr that's already stored in the map. */ - kptr = bpf_cpumask_kptr_get(&v->cpumask); - if (!kptr) + kptr = v->cpumask; + if (!kptr) { /* If no bpf_cpumask was present in the map, it's because * we're racing with another CPU that removed it with * bpf_kptr_xchg() between the bpf_map_lookup_elem() - * above, and our call to bpf_cpumask_kptr_get(). 
- * bpf_cpumask_kptr_get() internally safely handles this - * race, and will return NULL if the cpumask is no longer - * present in the map by the time we invoke the kfunc. + * above, and our load of the pointer from the map. */ + bpf_rcu_read_unlock(); return -EBUSY; + } - /* Free the reference we just took above. Note that the - * original struct bpf_cpumask * kptr is still in the map. It will - * be freed either at a later time if another context deletes - * it from the map, or automatically by the BPF subsystem if - * it's still present when the map is destroyed. - */ - bpf_cpumask_release(kptr); + bpf_cpumask_setall(kptr); + bpf_rcu_read_unlock(); return 0; } -- cgit v1.2.3 From 40235edeadf58e4232bfcf8bf15be453cfe233b7 Mon Sep 17 00:00:00 2001 From: Siddharth Vadapalli Date: Wed, 15 Mar 2023 13:29:47 +0530 Subject: dt-bindings: net: ti: k3-am654-cpsw-nuss: Fix compatible order Reorder compatibles to follow alphanumeric order. Signed-off-by: Siddharth Vadapalli Signed-off-by: David S. 
Miller --- Documentation/devicetree/bindings/net/ti,k3-am654-cpsw-nuss.yaml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/net/ti,k3-am654-cpsw-nuss.yaml b/Documentation/devicetree/bindings/net/ti,k3-am654-cpsw-nuss.yaml index 628d63e1eb1f..6f56add1919b 100644 --- a/Documentation/devicetree/bindings/net/ti,k3-am654-cpsw-nuss.yaml +++ b/Documentation/devicetree/bindings/net/ti,k3-am654-cpsw-nuss.yaml @@ -54,11 +54,11 @@ properties: compatible: enum: + - ti,am642-cpsw-nuss - ti,am654-cpsw-nuss - ti,j7200-cpswxg-nuss - ti,j721e-cpsw-nuss - ti,j721e-cpswxg-nuss - - ti,am642-cpsw-nuss reg: maxItems: 1 @@ -215,8 +215,8 @@ allOf: compatible: contains: enum: - - ti,j721e-cpswxg-nuss - ti,j7200-cpswxg-nuss + - ti,j721e-cpswxg-nuss then: properties: ethernet-ports: -- cgit v1.2.3 From e0c9c2a7dd738120c2fbc155c6fba1066f109be0 Mon Sep 17 00:00:00 2001 From: Siddharth Vadapalli Date: Wed, 15 Mar 2023 13:29:48 +0530 Subject: dt-bindings: net: ti: k3-am654-cpsw-nuss: Add J784S4 CPSW9G support Update bindings for TI K3 J784S4 SoC which contains 9 ports (8 external ports) CPSW9G module and add compatible for it. Signed-off-by: Siddharth Vadapalli Signed-off-by: David S. 
Miller --- Documentation/devicetree/bindings/net/ti,k3-am654-cpsw-nuss.yaml | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/net/ti,k3-am654-cpsw-nuss.yaml b/Documentation/devicetree/bindings/net/ti,k3-am654-cpsw-nuss.yaml index 6f56add1919b..306709bcc9e9 100644 --- a/Documentation/devicetree/bindings/net/ti,k3-am654-cpsw-nuss.yaml +++ b/Documentation/devicetree/bindings/net/ti,k3-am654-cpsw-nuss.yaml @@ -59,6 +59,7 @@ properties: - ti,j7200-cpswxg-nuss - ti,j721e-cpsw-nuss - ti,j721e-cpswxg-nuss + - ti,j784s4-cpswxg-nuss reg: maxItems: 1 @@ -197,7 +198,9 @@ allOf: properties: compatible: contains: - const: ti,j721e-cpswxg-nuss + enum: + - ti,j721e-cpswxg-nuss + - ti,j784s4-cpswxg-nuss then: properties: ethernet-ports: @@ -217,6 +220,7 @@ allOf: enum: - ti,j7200-cpswxg-nuss - ti,j721e-cpswxg-nuss + - ti,j784s4-cpswxg-nuss then: properties: ethernet-ports: -- cgit v1.2.3 From 74bf6477c18b2904936763132e9224a41b8da13a Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Wed, 15 Mar 2023 21:49:13 -0700 Subject: netlink-specs: add partial specification for devlink Devlink is quite complex but put in the very basics so we can incrementally fill in the commands as needed. $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/devlink.yaml \ --dump get [{'bus-name': 'netdevsim', 'dev-name': 'netdevsim1', 'dev-stats': {'reload-stats': {'reload-action-info': {'reload-action': 1, 'reload-action-stats': {'reload-stats-entry': [{'reload-stats-limit': 0, 'reload-stats-value': 0}]}}}, 'remote-reload-stats': {'reload-action-info': {'reload-action': 2, 'reload-action-stats': {'reload-stats-entry': [{'reload-stats-limit': 0, 'reload-stats-value': 0}, {'reload-stats-limit': 1, 'reload-stats-value': 0}]}}}}, 'reload-failed': 0}] Signed-off-by: Jakub Kicinski Signed-off-by: David S. 
Miller --- Documentation/netlink/specs/devlink.yaml | 198 +++++++++++++++++++++++++++++++ 1 file changed, 198 insertions(+) create mode 100644 Documentation/netlink/specs/devlink.yaml (limited to 'Documentation') diff --git a/Documentation/netlink/specs/devlink.yaml b/Documentation/netlink/specs/devlink.yaml new file mode 100644 index 000000000000..90641668232e --- /dev/null +++ b/Documentation/netlink/specs/devlink.yaml @@ -0,0 +1,198 @@ +# SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) + +name: devlink + +protocol: genetlink-legacy + +doc: Partial family for Devlink. + +attribute-sets: + - + name: devlink + attributes: + - + name: bus-name + type: string + value: 1 + - + name: dev-name + type: string + - + name: port-index + type: u32 + + # TODO: fill in the attributes in between + + - + name: info-driver-name + type: string + value: 98 + - + name: info-serial-number + type: string + - + name: info-version-fixed + type: nest + multi-attr: true + nested-attributes: dl-info-version + - + name: info-version-running + type: nest + multi-attr: true + nested-attributes: dl-info-version + - + name: info-version-stored + type: nest + multi-attr: true + nested-attributes: dl-info-version + - + name: info-version-name + type: string + - + name: info-version-value + type: string + + # TODO: fill in the attributes in between + + - + name: reload-failed + type: u8 + value: 136 + + # TODO: fill in the attributes in between + + - + name: reload-action + type: u8 + value: 153 + + # TODO: fill in the attributes in between + + - + name: dev-stats + type: nest + value: 156 + nested-attributes: dl-dev-stats + - + name: reload-stats + type: nest + nested-attributes: dl-reload-stats + - + name: reload-stats-entry + type: nest + multi-attr: true + nested-attributes: dl-reload-stats-entry + - + name: reload-stats-limit + type: u8 + - + name: reload-stats-value + type: u32 + - + name: remote-reload-stats + type: nest + nested-attributes: dl-reload-stats + - 
+ name: reload-action-info + type: nest + nested-attributes: dl-reload-act-info + - + name: reload-action-stats + type: nest + nested-attributes: dl-reload-act-stats + - + name: dl-dev-stats + subset-of: devlink + attributes: + - + name: reload-stats + type: nest + - + name: remote-reload-stats + type: nest + - + name: dl-reload-stats + subset-of: devlink + attributes: + - + name: reload-action-info + type: nest + - + name: dl-reload-act-info + subset-of: devlink + attributes: + - + name: reload-action + type: u8 + - + name: reload-action-stats + type: nest + - + name: dl-reload-act-stats + subset-of: devlink + attributes: + - + name: reload-stats-entry + type: nest + - + name: dl-reload-stats-entry + subset-of: devlink + attributes: + - + name: reload-stats-limit + type: u8 + - + name: reload-stats-value + type: u32 + - + name: dl-info-version + subset-of: devlink + attributes: + - + name: info-version-name + type: string + - + name: info-version-value + type: string + +operations: + enum-model: directional + list: + - + name: get + doc: Get devlink instances. + attribute-set: devlink + + do: + request: + value: 1 + attributes: &dev-id-attrs + - bus-name + - dev-name + reply: &get-reply + value: 3 + attributes: + - bus-name + - dev-name + - reload-failed + - reload-action + - dev-stats + dump: + reply: *get-reply + + # TODO: fill in the operations in between + + - + name: info-get + doc: Get device information, like driver name, hardware and firmware versions etc. + attribute-set: devlink + + do: + request: + value: 51 + attributes: *dev-id-attrs + reply: + value: 51 + attributes: + - bus-name + - dev-name -- cgit v1.2.3 From 82b3297009b6831dfe47f0f38ed4043e39f58c9f Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Wed, 15 Mar 2023 21:50:27 -0700 Subject: netlink: specs: allow uapi-header in genetlink Chuck wanted to put the UAPI header in linux/net/ which seems reasonable, allow genetlink families to choose the location. 
It doesn't really matter for non-C-like languages. Signed-off-by: Jakub Kicinski Signed-off-by: David S. Miller --- Documentation/netlink/genetlink-c.yaml | 2 +- Documentation/netlink/genetlink-legacy.yaml | 2 +- Documentation/netlink/genetlink.yaml | 3 +++ 3 files changed, 5 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/netlink/genetlink-c.yaml b/Documentation/netlink/genetlink-c.yaml index f082a5ad7cf1..c83643d403b7 100644 --- a/Documentation/netlink/genetlink-c.yaml +++ b/Documentation/netlink/genetlink-c.yaml @@ -33,10 +33,10 @@ properties: protocol: description: Schema compatibility level. Default is "genetlink". enum: [ genetlink, genetlink-c ] - # Start genetlink-c uapi-header: description: Path to the uAPI header, default is linux/${family-name}.h type: string + # Start genetlink-c c-family-name: description: Name of the define for the family name. type: string diff --git a/Documentation/netlink/genetlink-legacy.yaml b/Documentation/netlink/genetlink-legacy.yaml index c6b8c77f7d12..792875dd7ed1 100644 --- a/Documentation/netlink/genetlink-legacy.yaml +++ b/Documentation/netlink/genetlink-legacy.yaml @@ -33,10 +33,10 @@ properties: protocol: description: Schema compatibility level. Default is "genetlink". enum: [ genetlink, genetlink-c, genetlink-legacy ] # Trim - # Start genetlink-c uapi-header: description: Path to the uAPI header, default is linux/${family-name}.h type: string + # Start genetlink-c c-family-name: description: Name of the define for the family name. type: string diff --git a/Documentation/netlink/genetlink.yaml b/Documentation/netlink/genetlink.yaml index b2d56ab9e615..8952e84ff207 100644 --- a/Documentation/netlink/genetlink.yaml +++ b/Documentation/netlink/genetlink.yaml @@ -33,6 +33,9 @@ properties: protocol: description: Schema compatibility level. Default is "genetlink". 
enum: [ genetlink ] + uapi-header: + description: Path to the uAPI header, default is linux/${family-name}.h + type: string definitions: description: List of type and constant definitions (enums, flags, defines). -- cgit v1.2.3 From 0f10f647f45545004ea50b73a7a7c5c3309ff286 Mon Sep 17 00:00:00 2001 From: Bagas Sanjaya Date: Tue, 14 Mar 2023 14:44:49 +0700 Subject: bpf, docs: Use internal linking for link to netdev subsystem doc Commit d56b0c461d19da ("bpf, docs: Fix link to netdev-FAQ target") attempts to fix linking problem to undefined "netdev-FAQ" label introduced in 287f4fa99a5281 ("docs: Update references to netdev-FAQ") by changing internal cross reference to netdev subsystem documentation (Documentation/process/maintainer-netdev.rst) to external one at docs.kernel.org. However, the linking problem is still not resolved, as the generated link points to non-existent netdev-FAQ section of the external doc, which when clicked, will instead going to the top of the doc. Revert back to internal linking by simply mention the doc path while massaging the leading text to the link, since the netdev subsystem doc contains no FAQs but rather general information about the subsystem. Fixes: d56b0c461d19 ("bpf, docs: Fix link to netdev-FAQ target") Fixes: 287f4fa99a52 ("docs: Update references to netdev-FAQ") Signed-off-by: Bagas Sanjaya Signed-off-by: Daniel Borkmann Link: https://lore.kernel.org/bpf/20230314074449.23620-1-bagasdotme@gmail.com --- Documentation/bpf/bpf_devel_QA.rst | 20 ++++++++++++-------- 1 file changed, 12 insertions(+), 8 deletions(-) (limited to 'Documentation') diff --git a/Documentation/bpf/bpf_devel_QA.rst b/Documentation/bpf/bpf_devel_QA.rst index 5f5f9ccc3862..e151e61dff38 100644 --- a/Documentation/bpf/bpf_devel_QA.rst +++ b/Documentation/bpf/bpf_devel_QA.rst @@ -128,7 +128,8 @@ into the bpf-next tree will make their way into net-next tree. net and net-next are both run by David S. Miller. 
From there, they will go into the kernel mainline tree run by Linus Torvalds. To read up on the process of net and net-next being merged into the mainline tree, see -the `netdev-FAQ`_. +the documentation on netdev subsystem at +Documentation/process/maintainer-netdev.rst. @@ -147,7 +148,8 @@ request):: Q: How do I indicate which tree (bpf vs. bpf-next) my patch should be applied to? --------------------------------------------------------------------------------- -A: The process is the very same as described in the `netdev-FAQ`_, +A: The process is the very same as described in the netdev subsystem +documentation at Documentation/process/maintainer-netdev.rst, so please read up on it. The subject line must indicate whether the patch is a fix or rather "next-like" content in order to let the maintainers know whether it is targeted at bpf or bpf-next. @@ -206,8 +208,9 @@ ii) run extensive BPF test suite and Once the BPF pull request was accepted by David S. Miller, then the patches end up in net or net-next tree, respectively, and make their way from there further into mainline. Again, see the -`netdev-FAQ`_ for additional information e.g. on how often they are -merged to mainline. +documentation for netdev subsystem at +Documentation/process/maintainer-netdev.rst for additional information +e.g. on how often they are merged to mainline. Q: How long do I need to wait for feedback on my BPF patches? ------------------------------------------------------------- @@ -230,7 +233,8 @@ Q: Are patches applied to bpf-next when the merge window is open? ----------------------------------------------------------------- A: For the time when the merge window is open, bpf-next will not be processed. This is roughly analogous to net-next patch processing, -so feel free to read up on the `netdev-FAQ`_ about further details. +so feel free to read up on the netdev docs at +Documentation/process/maintainer-netdev.rst about further details. 
During those two weeks of merge window, we might ask you to resend your patch series once bpf-next is open again. Once Linus released @@ -394,7 +398,8 @@ netdev kernel mailing list in Cc and ask for the fix to be queued up: netdev@vger.kernel.org The process in general is the same as on netdev itself, see also the -`netdev-FAQ`_. +the documentation on networking subsystem at +Documentation/process/maintainer-netdev.rst. Q: Do you also backport to kernels not currently maintained as stable? ---------------------------------------------------------------------- @@ -410,7 +415,7 @@ Q: The BPF patch I am about to submit needs to go to stable as well What should I do? A: The same rules apply as with netdev patch submissions in general, see -the `netdev-FAQ`_. +the netdev docs at Documentation/process/maintainer-netdev.rst. Never add "``Cc: stable@vger.kernel.org``" to the patch description, but ask the BPF maintainers to queue the patches instead. This can be done @@ -685,7 +690,6 @@ when: .. Links .. _Documentation/process/: https://www.kernel.org/doc/html/latest/process/ -.. _netdev-FAQ: https://www.kernel.org/doc/html/latest/process/maintainer-netdev.html .. _selftests: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/bpf/ .. _Documentation/dev-tools/kselftest.rst: -- cgit v1.2.3 From 0de10fd6eb94259a749d558ee0d34083ae010a1d Mon Sep 17 00:00:00 2001 From: Alex Elder Date: Wed, 15 Mar 2023 14:43:05 -0500 Subject: dt-bindings: net: qcom,ipa: add SDX65 compatible Add support for SDX65, which uses IPA v5.0. 
Reviewed-by: Simon Horman Signed-off-by: Alex Elder Link: https://lore.kernel.org/r/20230315194305.1647311-1-elder@linaro.org Signed-off-by: Jakub Kicinski --- Documentation/devicetree/bindings/net/qcom,ipa.yaml | 1 + 1 file changed, 1 insertion(+) (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/net/qcom,ipa.yaml b/Documentation/devicetree/bindings/net/qcom,ipa.yaml index 4aeda379726f..2d5e4ffb2f9e 100644 --- a/Documentation/devicetree/bindings/net/qcom,ipa.yaml +++ b/Documentation/devicetree/bindings/net/qcom,ipa.yaml @@ -49,6 +49,7 @@ properties: - qcom,sc7280-ipa - qcom,sdm845-ipa - qcom,sdx55-ipa + - qcom,sdx65-ipa - qcom,sm6350-ipa - qcom,sm8350-ipa -- cgit v1.2.3 From 08ff1c9f3e927ba3701c113dda70953a6f4afffa Mon Sep 17 00:00:00 2001 From: Sreevani Sreejith Date: Wed, 15 Mar 2023 12:54:05 -0700 Subject: bpf, docs: Libbpf overview documentation This patch documents overview of libbpf, including its features for developing BPF programs. Signed-off-by: Sreevani Sreejith Signed-off-by: Andrii Nakryiko Acked-by: David Vernet Link: https://lore.kernel.org/bpf/20230315195405.2051559-1-ssreevani@meta.com --- Documentation/bpf/libbpf/index.rst | 25 ++- Documentation/bpf/libbpf/libbpf_overview.rst | 228 +++++++++++++++++++++++++++ 2 files changed, 245 insertions(+), 8 deletions(-) create mode 100644 Documentation/bpf/libbpf/libbpf_overview.rst (limited to 'Documentation') diff --git a/Documentation/bpf/libbpf/index.rst b/Documentation/bpf/libbpf/index.rst index f9b3b252e28f..7545a2049692 100644 --- a/Documentation/bpf/libbpf/index.rst +++ b/Documentation/bpf/libbpf/index.rst @@ -2,23 +2,32 @@ .. _libbpf: +====== libbpf ====== +If you are looking to develop BPF applications using the libbpf library, this +directory contains important documentation that you should read. + +To get started, it is recommended to begin with the :doc:`libbpf Overview +` document, which provides a high-level understanding of the +libbpf APIs and their usage. 
This will give you a solid foundation to start +exploring and utilizing the various features of libbpf to develop your BPF +applications. + .. toctree:: :maxdepth: 1 + libbpf_overview API Documentation program_types libbpf_naming_convention libbpf_build -This is documentation for libbpf, a userspace library for loading and -interacting with bpf programs. -All general BPF questions, including kernel functionality, libbpf APIs and -their application, should be sent to bpf@vger.kernel.org mailing list. -You can `subscribe `_ to the -mailing list search its `archive `_. -Please search the archive before asking new questions. It very well might -be that this was already addressed or answered before. +All general BPF questions, including kernel functionality, libbpf APIs and their +application, should be sent to bpf@vger.kernel.org mailing list. You can +`subscribe `_ to the mailing list +search its `archive `_. Please search the archive +before asking new questions. It may be that this was already addressed or +answered before. diff --git a/Documentation/bpf/libbpf/libbpf_overview.rst b/Documentation/bpf/libbpf/libbpf_overview.rst new file mode 100644 index 000000000000..f36a2d4ffea2 --- /dev/null +++ b/Documentation/bpf/libbpf/libbpf_overview.rst @@ -0,0 +1,228 @@ +.. SPDX-License-Identifier: GPL-2.0 + +=============== +libbpf Overview +=============== + +libbpf is a C-based library containing a BPF loader that takes compiled BPF +object files and prepares and loads them into the Linux kernel. libbpf takes the +heavy lifting of loading, verifying, and attaching BPF programs to various +kernel hooks, allowing BPF application developers to focus only on BPF program +correctness and performance. + +The following are the high-level features supported by libbpf: + +* Provides high-level and low-level APIs for user space programs to interact + with BPF programs. 
The low-level APIs wrap all the bpf system call + functionality, which is useful when users need more fine-grained control + over the interactions between user space and BPF programs. +* Provides overall support for the BPF object skeleton generated by bpftool. + The skeleton file simplifies the process for the user space programs to access + global variables and work with BPF programs. +* Provides BPF-side APIS, including BPF helper definitions, BPF maps support, + and tracing helpers, allowing developers to simplify BPF code writing. +* Supports BPF CO-RE mechanism, enabling BPF developers to write portable + BPF programs that can be compiled once and run across different kernel + versions. + +This document will delve into the above concepts in detail, providing a deeper +understanding of the capabilities and advantages of libbpf and how it can help +you develop BPF applications efficiently. + +BPF App Lifecycle and libbpf APIs +================================== + +A BPF application consists of one or more BPF programs (either cooperating or +completely independent), BPF maps, and global variables. The global +variables are shared between all BPF programs, which allows them to cooperate on +a common set of data. libbpf provides APIs that user space programs can use to +manipulate the BPF programs by triggering different phases of a BPF application +lifecycle. + +The following section provides a brief overview of each phase in the BPF life +cycle: + +* **Open phase**: In this phase, libbpf parses the BPF + object file and discovers BPF maps, BPF programs, and global variables. After + a BPF app is opened, user space apps can make additional adjustments + (setting BPF program types, if necessary; pre-setting initial values for + global variables, etc.) before all the entities are created and loaded. + +* **Load phase**: In the load phase, libbpf creates BPF + maps, resolves various relocations, and verifies and loads BPF programs into + the kernel. 
At this point, libbpf validates all the parts of a BPF application + and loads the BPF program into the kernel, but no BPF program has yet been + executed. After the load phase, it’s possible to set up the initial BPF map + state without racing with the BPF program code execution. + +* **Attachment phase**: In this phase, libbpf + attaches BPF programs to various BPF hook points (e.g., tracepoints, kprobes, + cgroup hooks, network packet processing pipeline, etc.). During this + phase, BPF programs perform useful work such as processing + packets, or updating BPF maps and global variables that can be read from user + space. + +* **Tear down phase**: In the tear down phase, + libbpf detaches BPF programs and unloads them from the kernel. BPF maps are + destroyed, and all the resources used by the BPF app are freed. + +BPF Object Skeleton File +======================== + +BPF skeleton is an alternative interface to libbpf APIs for working with BPF +objects. Skeleton code abstract away generic libbpf APIs to significantly +simplify code for manipulating BPF programs from user space. Skeleton code +includes a bytecode representation of the BPF object file, simplifying the +process of distributing your BPF code. With BPF bytecode embedded, there are no +extra files to deploy along with your application binary. + +You can generate the skeleton header file ``(.skel.h)`` for a specific object +file by passing the BPF object to the bpftool. 
The generated BPF skeleton +provides the following custom functions that correspond to the BPF lifecycle, +each of them prefixed with the specific object name: + +* ``<name>__open()`` – creates and opens BPF application (``<name>`` stands for + the specific bpf object name) +* ``<name>__load()`` – instantiates, loads,and verifies BPF application parts +* ``<name>__attach()`` – attaches all auto-attachable BPF programs (it’s + optional, you can have more control by using libbpf APIs directly) +* ``<name>__destroy()`` – detaches all BPF programs and + frees up all used resources + +Using the skeleton code is the recommended way to work with bpf programs. Keep +in mind, BPF skeleton provides access to the underlying BPF object, so whatever +was possible to do with generic libbpf APIs is still possible even when the BPF +skeleton is used. It's an additive convenience feature, with no syscalls, and no +cumbersome code. + +Other Advantages of Using Skeleton File +--------------------------------------- + +* BPF skeleton provides an interface for user space programs to work with BPF + global variables. The skeleton code memory maps global variables as a struct + into user space. The struct interface allows user space programs to initialize + BPF programs before the BPF load phase and fetch and update data from user + space afterward. + +* The ``skel.h`` file reflects the object file structure by listing out the + available maps, programs, etc. BPF skeleton provides direct access to all the + BPF maps and BPF programs as struct fields. This eliminates the need for + string-based lookups with ``bpf_object_find_map_by_name()`` and + ``bpf_object_find_program_by_name()`` APIs, reducing errors due to BPF source + code and user-space code getting out of sync. + +* The embedded bytecode representation of the object file ensures that the + skeleton and the BPF object file are always in sync. + +BPF Helpers +=========== + +libbpf provides BPF-side APIs that BPF programs can use to interact with the +system. 
The BPF helpers definition allows developers to use them in BPF code as +any other plain C function. For example, there are helper functions to print +debugging messages, get the time since the system was booted, interact with BPF +maps, manipulate network packets, etc. + +For a complete description of what the helpers do, the arguments they take, and +the return value, see the `bpf-helpers +`_ man page. + +BPF CO-RE (Compile Once – Run Everywhere) +========================================= + +BPF programs work in the kernel space and have access to kernel memory and data +structures. One limitation that BPF applications come across is the lack of +portability across different kernel versions and configurations. `BCC +`_ is one of the solutions for BPF +portability. However, it comes with runtime overhead and a large binary size +from embedding the compiler with the application. + +libbpf steps up the BPF program portability by supporting the BPF CO-RE concept. +BPF CO-RE brings together BTF type information, libbpf, and the compiler to +produce a single executable binary that you can run on multiple kernel versions +and configurations. + +To make BPF programs portable libbpf relies on the BTF type information of the +running kernel. Kernel also exposes this self-describing authoritative BTF +information through ``sysfs`` at ``/sys/kernel/btf/vmlinux``. + +You can generate the BTF information for the running kernel with the following +command: + +:: + + $ bpftool btf dump file /sys/kernel/btf/vmlinux format c > vmlinux.h + +The command generates a ``vmlinux.h`` header file with all kernel types +(:doc:`BTF types <../btf>`) that the running kernel uses. Including +``vmlinux.h`` in your BPF program eliminates dependency on system-wide kernel +headers. + +libbpf enables portability of BPF programs by looking at the BPF program’s +recorded BTF type and relocation information and matching them to BTF +information (vmlinux) provided by the running kernel. 
libbpf then resolves and +matches all the types and fields, and updates necessary offsets and other +relocatable data to ensure that BPF program’s logic functions correctly for a +specific kernel on the host. BPF CO-RE concept thus eliminates overhead +associated with BPF development and allows developers to write portable BPF +applications without modifications and runtime source code compilation on the +target machine. + +The following code snippet shows how to read the parent field of a kernel +``task_struct`` using BPF CO-RE and libbf. The basic helper to read a field in a +CO-RE relocatable manner is ``bpf_core_read(dst, sz, src)``, which will read +``sz`` bytes from the field referenced by ``src`` into the memory pointed to by +``dst``. + +.. code-block:: C + :emphasize-lines: 6 + + //... + struct task_struct *task = (void *)bpf_get_current_task(); + struct task_struct *parent_task; + int err; + + err = bpf_core_read(&parent_task, sizeof(void *), &task->parent); + if (err) { + /* handle error */ + } + + /* parent_task contains the value of task->parent pointer */ + +In the code snippet, we first get a pointer to the current ``task_struct`` using +``bpf_get_current_task()``. We then use ``bpf_core_read()`` to read the parent +field of task struct into the ``parent_task`` variable. ``bpf_core_read()`` is +just like ``bpf_probe_read_kernel()`` BPF helper, except it records information +about the field that should be relocated on the target kernel. i.e, if the +``parent`` field gets shifted to a different offset within +``struct task_struct`` due to some new field added in front of it, libbpf will +automatically adjust the actual offset to the proper value. + +Getting Started with libbpf +=========================== + +Check out the `libbpf-bootstrap `_ +repository with simple examples of using libbpf to build various BPF +applications. + +See also `libbpf API documentation +`_. 
+ +libbpf and Rust +=============== + +If you are building BPF applications in Rust, it is recommended to use the +`Libbpf-rs `_ library instead of bindgen +bindings directly to libbpf. Libbpf-rs wraps libbpf functionality in +Rust-idiomatic interfaces and provides libbpf-cargo plugin to handle BPF code +compilation and skeleton generation. Using Libbpf-rs will make building user +space part of the BPF application easier. Note that the BPF program themselves +must still be written in plain C. + +Additional Documentation +======================== + +* `Program types and ELF Sections `_ +* `API naming convention `_ +* `Building libbpf `_ +* `API documentation Convention `_ -- cgit v1.2.3 From e485f3a6eae0849f83b94936778a2325f72a0c89 Mon Sep 17 00:00:00 2001 From: Tony Nguyen Date: Fri, 17 Mar 2023 13:09:03 -0700 Subject: ixgb: Remove ixgb driver There are likely no users of this driver as the hardware has been discontinued since 2010. Remove the driver and all references to it in documentation. Suggested-by: Jakub Kicinski Signed-off-by: Tony Nguyen Acked-by: Jesse Brandeburg Signed-off-by: David S. 
Miller --- Documentation/PCI/pci-error-recovery.rst | 1 - .../networking/device_drivers/ethernet/index.rst | 1 - .../device_drivers/ethernet/intel/ixgb.rst | 468 ---- arch/loongarch/configs/loongson3_defconfig | 1 - arch/mips/configs/loongson2k_defconfig | 1 - arch/mips/configs/loongson3_defconfig | 1 - arch/mips/configs/mtx1_defconfig | 1 - arch/powerpc/configs/powernv_defconfig | 1 - arch/powerpc/configs/ppc64_defconfig | 1 - arch/powerpc/configs/ppc64e_defconfig | 1 - arch/powerpc/configs/ppc6xx_defconfig | 1 - arch/powerpc/configs/pseries_defconfig | 1 - arch/powerpc/configs/skiroot_defconfig | 1 - drivers/net/ethernet/intel/Kconfig | 17 - drivers/net/ethernet/intel/Makefile | 1 - drivers/net/ethernet/intel/ixgb/Makefile | 9 - drivers/net/ethernet/intel/ixgb/ixgb.h | 179 -- drivers/net/ethernet/intel/ixgb/ixgb_ee.c | 580 ----- drivers/net/ethernet/intel/ixgb/ixgb_ee.h | 79 - drivers/net/ethernet/intel/ixgb/ixgb_ethtool.c | 642 ------ drivers/net/ethernet/intel/ixgb/ixgb_hw.c | 1229 ----------- drivers/net/ethernet/intel/ixgb/ixgb_hw.h | 767 ------- drivers/net/ethernet/intel/ixgb/ixgb_ids.h | 23 - drivers/net/ethernet/intel/ixgb/ixgb_main.c | 2285 -------------------- drivers/net/ethernet/intel/ixgb/ixgb_osdep.h | 39 - drivers/net/ethernet/intel/ixgb/ixgb_param.c | 442 ---- 26 files changed, 6772 deletions(-) delete mode 100644 Documentation/networking/device_drivers/ethernet/intel/ixgb.rst delete mode 100644 drivers/net/ethernet/intel/ixgb/Makefile delete mode 100644 drivers/net/ethernet/intel/ixgb/ixgb.h delete mode 100644 drivers/net/ethernet/intel/ixgb/ixgb_ee.c delete mode 100644 drivers/net/ethernet/intel/ixgb/ixgb_ee.h delete mode 100644 drivers/net/ethernet/intel/ixgb/ixgb_ethtool.c delete mode 100644 drivers/net/ethernet/intel/ixgb/ixgb_hw.c delete mode 100644 drivers/net/ethernet/intel/ixgb/ixgb_hw.h delete mode 100644 drivers/net/ethernet/intel/ixgb/ixgb_ids.h delete mode 100644 drivers/net/ethernet/intel/ixgb/ixgb_main.c delete mode 100644 
drivers/net/ethernet/intel/ixgb/ixgb_osdep.h delete mode 100644 drivers/net/ethernet/intel/ixgb/ixgb_param.c (limited to 'Documentation') diff --git a/Documentation/PCI/pci-error-recovery.rst b/Documentation/PCI/pci-error-recovery.rst index bdafeb4b66dc..9981d330da8f 100644 --- a/Documentation/PCI/pci-error-recovery.rst +++ b/Documentation/PCI/pci-error-recovery.rst @@ -418,7 +418,6 @@ That is, the recovery API only requires that: - drivers/next/e100.c - drivers/net/e1000 - drivers/net/e1000e - - drivers/net/ixgb - drivers/net/ixgbe - drivers/net/cxgb3 - drivers/net/s2io.c diff --git a/Documentation/networking/device_drivers/ethernet/index.rst b/Documentation/networking/device_drivers/ethernet/index.rst index 392969ac88ad..6e9e7012d000 100644 --- a/Documentation/networking/device_drivers/ethernet/index.rst +++ b/Documentation/networking/device_drivers/ethernet/index.rst @@ -31,7 +31,6 @@ Contents: intel/fm10k intel/igb intel/igbvf - intel/ixgb intel/ixgbe intel/ixgbevf intel/i40e diff --git a/Documentation/networking/device_drivers/ethernet/intel/ixgb.rst b/Documentation/networking/device_drivers/ethernet/intel/ixgb.rst deleted file mode 100644 index c6a233e68ad6..000000000000 --- a/Documentation/networking/device_drivers/ethernet/intel/ixgb.rst +++ /dev/null @@ -1,468 +0,0 @@ -.. SPDX-License-Identifier: GPL-2.0+ - -===================================================================== -Linux Base Driver for 10 Gigabit Intel(R) Ethernet Network Connection -===================================================================== - -October 1, 2018 - - -Contents -======== - -- In This Release -- Identifying Your Adapter -- Command Line Parameters -- Improving Performance -- Additional Configurations -- Known Issues/Troubleshooting -- Support - - - -In This Release -=============== - -This file describes the ixgb Linux Base Driver for the 10 Gigabit Intel(R) -Network Connection. This driver includes support for Itanium(R)2-based -systems. 
- -For questions related to hardware requirements, refer to the documentation -supplied with your 10 Gigabit adapter. All hardware requirements listed apply -to use with Linux. - -The following features are available in this kernel: - - Native VLANs - - Channel Bonding (teaming) - - SNMP - -Channel Bonding documentation can be found in the Linux kernel source: -/Documentation/networking/bonding.rst - -The driver information previously displayed in the /proc filesystem is not -supported in this release. Alternatively, you can use ethtool (version 1.6 -or later), lspci, and iproute2 to obtain the same information. - -Instructions on updating ethtool can be found in the section "Additional -Configurations" later in this document. - - -Identifying Your Adapter -======================== - -The following Intel network adapters are compatible with the drivers in this -release: - -+------------+------------------------------+----------------------------------+ -| Controller | Adapter Name | Physical Layer | -+============+==============================+==================================+ -| 82597EX | Intel(R) PRO/10GbE LR/SR/CX4 | - 10G Base-LR (fiber) | -| | Server Adapters | - 10G Base-SR (fiber) | -| | | - 10G Base-CX4 (copper) | -+------------+------------------------------+----------------------------------+ - -For more information on how to identify your adapter, go to the Adapter & -Driver ID Guide at: - - https://support.intel.com - - -Command Line Parameters -======================= - -If the driver is built as a module, the following optional parameters are -used by entering them on the command line with the modprobe command using -this syntax:: - - modprobe ixgb [