[bpf-next,v3] bpf/docs: Document kfunc lifecycle / stability expectations
Commit Message
BPF kernel <-> kernel API stability has been discussed at length over
the last several weeks and months. Now that we've largely aligned over
kfuncs being the way forward, and BPF helpers being considered
functionally frozen, it's time to document the expectations for kfunc
lifecycles and stability so that everyone (BPF users, kfunc developers,
and maintainers) is aligned, and has a crystal-clear understanding
of the expectations surrounding kfuncs.
To do that, this patch adds that documentation to the main kfuncs
documentation page via a new 'kfunc lifecycle expectations' section. The
patch describes how decisions are made in the kernel regarding whether
to include, keep, deprecate, or change / remove a kfunc. As described
very overtly in the patch itself, but likely worth highlighting here:
"kfunc stability" does not mean, nor ever will mean, "BPF APIs may block
development elsewhere in the kernel".
Rather, the intention and expectation is for kfuncs to be treated like
EXPORT_SYMBOL_GPL symbols in the kernel. The goal is for kfuncs to be a
safe and valuable option for maintainers and kfunc developers to extend
the kernel, without tying anyone's hands, or imposing any kind of
restrictions on maintainers in the same way that UAPI changes do.
In addition to the 'kfunc lifecycle expectations' section, this patch
also adds documentation for a new KF_DEPRECATED kfunc flag which kfunc
authors or maintainers can choose to add to kfuncs if and when they
decide to deprecate them. Note that as described in the patch itself, a
kfunc need not be deprecated before being changed or removed -- this
flag is simply provided as an available deprecation mechanism for those
that want to provide a deprecation story / timeline to their users.
When necessary, kfuncs may be changed or removed to accommodate changes
elsewhere in the kernel without any deprecation at all.
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Co-developed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David Vernet <void@manifault.com>
---
Documentation/bpf/kfuncs.rst | 125 +++++++++++++++++++++++++++++++++--
1 file changed, 120 insertions(+), 5 deletions(-)
Comments
On Fri, Feb 3, 2023 at 7:57 AM David Vernet <void@manifault.com> wrote:
> [...]
David, Toke,
Thanks a lot for writing it down.
It certainly captures the main points.
Applied.
Hello:
This patch was applied to bpf/bpf-next.git (master)
by Alexei Starovoitov <ast@kernel.org>:
On Fri, 3 Feb 2023 09:57:27 -0600 you wrote:
> BPF kernel <-> kernel API stability has been discussed at length over
> the last several weeks and months. Now that we've largely aligned over
> kfuncs being the way forward, and BPF helpers being considered
> functionally frozen, it's time to document the expectations for kfunc
> lifecycles and stability so that everyone (BPF users, kfunc developers,
> and maintainers) are all aligned, and have a crystal-clear understanding
> of the expectations surrounding kfuncs.
>
> [...]
Here is the summary with links:
- [bpf-next,v3] bpf/docs: Document kfunc lifecycle / stability expectations
https://git.kernel.org/bpf/bpf-next/c/16c294a6aad8
On Fri, Feb 03, 2023 at 09:57:27AM -0600, David Vernet wrote:
> BPF kernel <-> kernel API stability has been discussed at length over
> the last several weeks and months. Now that we've largely aligned over
> kfuncs being the way forward, and BPF helpers being considered
> functionally frozen, it's time to document the expectations for kfunc
> lifecycles and stability so that everyone (BPF users, kfunc developers,
> and maintainers) are all aligned, and have a crystal-clear understanding
> of the expectations surrounding kfuncs.
>
> To do that, this patch adds that documentation to the main kfuncs
> documentation page via a new 'kfunc lifecycle expectations' section. The
> patch describes how decisions are made in the kernel regarding whether
> to include, keep, deprecate, or change / remove a kfunc. As described
> very overtly in the patch itself, but likely worth highlighting here:
>
> "kfunc stability" does not mean, nor ever will mean, "BPF APIs may block
> development elsewhere in the kernel".
>
> Rather, the intention and expectation is for kfuncs to be treated like
> EXPORT_SYMBOL_GPL symbols in the kernel. The goal is for kfuncs to be a
> safe and valuable option for maintainers and kfunc developers to extend
> the kernel, without tying anyone's hands, or imposing any kind of
> restrictions on maintainers in the same way that UAPI changes do.
I think they are still different: kernel modules are still considered
part of kernel development, while eBPF code is not supposed to be
kernel development, or is at least much further from it. Treating them
alike is misleading, IMHO.
>
> In addition to the 'kfunc lifecycle expectations' section, this patch
> also adds documentation for a new KF_DEPRECATED kfunc flag which kfunc
> authors or maintainers can choose to add to kfuncs if and when they
> decide to deprecate them. Note that as described in the patch itself, a
> kfunc need not be deprecated before being changed or removed -- this
> flag is simply provided as an available deprecation mechanism for those
> that want to provide a deprecation story / timeline to their users.
> When necessary, kfuncs may be changed or removed to accommodate changes
> elsewhere in the kernel without any deprecation at all.
This fundamentally contradicts Compile-Once-Run-Everywhere (CO-RE):
https://facebookmicrosites.github.io/bpf/blog/2020/02/19/bpf-portability-and-co-re.html
Could you add some clarification for this too? Especially how we could
respect CO-RE while deprecating kfuncs?
BTW, not related to compatibility, but still a kfunc-related confusion:
it also contradicts Documentation/bpf/bpf_design_QA.rst:
"
Q: Can BPF functionality such as new program or map types, new
helpers, etc be added out of kernel module code?
A: NO.
"
The conntrack kfuncs like bpf_skb_ct_alloc() reside in a kernel module.
Thanks!
Cong Wang <xiyou.wangcong@gmail.com> writes:
> On Fri, Feb 03, 2023 at 09:57:27AM -0600, David Vernet wrote:
>> BPF kernel <-> kernel API stability has been discussed at length over
>> the last several weeks and months. Now that we've largely aligned over
>> kfuncs being the way forward, and BPF helpers being considered
>> functionally frozen, it's time to document the expectations for kfunc
>> lifecycles and stability so that everyone (BPF users, kfunc developers,
>> and maintainers) are all aligned, and have a crystal-clear understanding
>> of the expectations surrounding kfuncs.
>>
>> To do that, this patch adds that documentation to the main kfuncs
>> documentation page via a new 'kfunc lifecycle expectations' section. The
>> patch describes how decisions are made in the kernel regarding whether
>> to include, keep, deprecate, or change / remove a kfunc. As described
>> very overtly in the patch itself, but likely worth highlighting here:
>>
>> "kfunc stability" does not mean, nor ever will mean, "BPF APIs may block
>> development elsewhere in the kernel".
>>
>> Rather, the intention and expectation is for kfuncs to be treated like
>> EXPORT_SYMBOL_GPL symbols in the kernel. The goal is for kfuncs to be a
>> safe and valuable option for maintainers and kfunc developers to extend
>> the kernel, without tying anyone's hands, or imposing any kind of
>> restrictions on maintainers in the same way that UAPI changes do.
>
> I think they are still different: kernel modules are still considered
> part of kernel development, while eBPF code is not supposed to be
> kernel development, or is at least much further from it. Treating them
> alike is misleading, IMHO.
If you read the actual documentation text added to kfuncs.rst, this
difference is indeed called out. But you're right that "treated like" in
the commit message is probably a bit strong.
>> In addition to the 'kfunc lifecycle expectations' section, this patch
>> also adds documentation for a new KF_DEPRECATED kfunc flag which kfunc
>> authors or maintainers can choose to add to kfuncs if and when they
>> decide to deprecate them. Note that as described in the patch itself, a
>> kfunc need not be deprecated before being changed or removed -- this
>> flag is simply provided as an available deprecation mechanism for those
>> that want to provide a deprecation story / timeline to their users.
>> When necessary, kfuncs may be changed or removed to accommodate changes
>> elsewhere in the kernel without any deprecation at all.
>
> This fundamentally contradicts Compile-Once-Run-Everywhere (CO-RE):
> https://facebookmicrosites.github.io/bpf/blog/2020/02/19/bpf-portability-and-co-re.html
> Could you add some clarification for this too? Especially how we could
> respect CO-RE while deprecating kfuncs?
Well, CO-RE doesn't work for kfuncs, currently, so... :)
What do you mean "respect CO-RE", though? CO-RE is a tool to make BPF
programs more portable, so not sure how one would "respect" that?
> BTW, not related to compatibility, but still a kfunc-related confusion:
> it also contradicts Documentation/bpf/bpf_design_QA.rst:
>
> "
> Q: Can BPF functionality such as new program or map types, new
> helpers, etc be added out of kernel module code?
>
> A: NO.
> "
>
> The conntrack kfuncs like bpf_skb_ct_alloc() reside in a kernel
> module.
Yup, good point, we should update that. I'll send a patch...
-Toke
On Sun, Feb 05, 2023 at 12:42:03PM -0800, Cong Wang wrote:
> On Fri, Feb 03, 2023 at 09:57:27AM -0600, David Vernet wrote:
> > BPF kernel <-> kernel API stability has been discussed at length over
> > the last several weeks and months. Now that we've largely aligned over
> > kfuncs being the way forward, and BPF helpers being considered
> > functionally frozen, it's time to document the expectations for kfunc
> > lifecycles and stability so that everyone (BPF users, kfunc developers,
> > and maintainers) are all aligned, and have a crystal-clear understanding
> > of the expectations surrounding kfuncs.
> >
> > To do that, this patch adds that documentation to the main kfuncs
> > documentation page via a new 'kfunc lifecycle expectations' section. The
> > patch describes how decisions are made in the kernel regarding whether
> > to include, keep, deprecate, or change / remove a kfunc. As described
> > very overtly in the patch itself, but likely worth highlighting here:
> >
> > "kfunc stability" does not mean, nor ever will mean, "BPF APIs may block
> > development elsewhere in the kernel".
> >
> > Rather, the intention and expectation is for kfuncs to be treated like
> > EXPORT_SYMBOL_GPL symbols in the kernel. The goal is for kfuncs to be a
> > safe and valuable option for maintainers and kfunc developers to extend
> > the kernel, without tying anyone's hands, or imposing any kind of
> > restrictions on maintainers in the same way that UAPI changes do.
>
> I think they are still different: kernel modules are still considered
> part of kernel development, while eBPF code is not supposed to be
> kernel development, or is at least much further from it. Treating them
> alike is misleading, IMHO.
I'm not following. How is a BPF program not kernel development?
> >
> > In addition to the 'kfunc lifecycle expectations' section, this patch
> > also adds documentation for a new KF_DEPRECATED kfunc flag which kfunc
> > authors or maintainers can choose to add to kfuncs if and when they
> > decide to deprecate them. Note that as described in the patch itself, a
> > kfunc need not be deprecated before being changed or removed -- this
> > flag is simply provided as an available deprecation mechanism for those
> > that want to provide a deprecation story / timeline to their users.
> > When necessary, kfuncs may be changed or removed to accommodate changes
> > elsewhere in the kernel without any deprecation at all.
>
> This fundamentally contradicts Compile-Once-Run-Everywhere (CO-RE):
> https://facebookmicrosites.github.io/bpf/blog/2020/02/19/bpf-portability-and-co-re.html
Sorry, but again, I'm not following your point. What exactly about this
"fundamentally contradicts" with CO-RE? Please elaborate if you're going
to claim that something is a fundamental contradiction.
> Could you add some clarification for this too? Especially how we could
> respect CO-RE while deprecating kfuncs?
I don't know what you mean by "respecting CO-RE". You can compile a BPF
program that calls a kfunc and invoke it on different kernels, as long as
whatever kernel you're running on provides that kfunc with the same BTF
encoding. This is no different than e.g. accessing a struct element on
two kernel versions.
Also, CO-RE doesn't provide any ironclad guarantees either. If you
access a struct element in a BPF program, and then a kernel version
removes that element from the struct, that BPF program will fail to load
on that kernel.
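To make the load-time dependency above concrete, here is a toy model in
plain C. It is only a sketch under stated assumptions: it is not kernel,
verifier, or libbpf code; toy_kfunc, toy_kernel, toy_load() and the kfunc
name bpf_toy_lookup used below are all hypothetical; and a plain string
stands in for a kfunc's BTF encoding. It illustrates the rule that a
program calling a kfunc loads only on a kernel that provides that kfunc
with the same encoding:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one kfunc exposed by a kernel: its name plus
 * a string that stands in for its BTF-encoded prototype. */
struct toy_kfunc {
	const char *name;
	const char *proto;
};

/* Hypothetical stand-in for a kernel: just the set of kfuncs it exposes. */
struct toy_kernel {
	const struct toy_kfunc *kfuncs;
	size_t nr_kfuncs;
};

enum toy_load_result {
	TOY_LOAD_OK,       /* same name, same encoding: program loads */
	TOY_LOAD_MISSING,  /* kfunc was removed (or never existed) */
	TOY_LOAD_MISMATCH, /* kfunc exists, but its encoding changed */
};

/* Resolve a single kfunc call against a kernel. In this toy model a
 * program loads only if the kernel exposes a kfunc with the wanted name
 * and an identical prototype string. */
static enum toy_load_result toy_load(const struct toy_kernel *kernel,
				     const struct toy_kfunc *wanted)
{
	for (size_t i = 0; i < kernel->nr_kfuncs; i++) {
		const struct toy_kfunc *have = &kernel->kfuncs[i];

		if (strcmp(have->name, wanted->name) != 0)
			continue;
		if (strcmp(have->proto, wanted->proto) == 0)
			return TOY_LOAD_OK;
		return TOY_LOAD_MISMATCH;
	}
	return TOY_LOAD_MISSING;
}
```

In this model, a program that wants { "bpf_toy_lookup", "int(int)" } gets
TOY_LOAD_OK on a kernel exposing exactly that entry, TOY_LOAD_MISMATCH on a
kernel where the prototype string differs, and TOY_LOAD_MISSING on a kernel
that no longer lists the kfunc. The checks a real kernel performs are far
richer; the point is only that the dependency is resolved per kernel at
load time, much like a CO-RE struct field access.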
> BTW, not related to compatibility, but still a kfunc-related confusion:
> it also contradicts Documentation/bpf/bpf_design_QA.rst:
>
> "
> Q: Can BPF functionality such as new program or map types, new
> helpers, etc be added out of kernel module code?
>
> A: NO.
Agreed, we should improve the QA to mention that you can load kfuncs
from a module -- thanks for pointing that out!
> "
>
> The conntrack kfuncs like bpf_skb_ct_alloc() reside in a kernel module.
>
> Thanks!
@@ -13,7 +13,7 @@ BPF Kernel Functions or more commonly known as kfuncs are functions in the Linux
kernel which are exposed for use by BPF programs. Unlike normal BPF helpers,
kfuncs do not have a stable interface and can change from one kernel release to
another. Hence, BPF programs need to be updated in response to changes in the
-kernel.
+kernel. See :ref:`BPF_kfunc_lifecycle_expectations` for more information.
2. Defining a kfunc
===================
@@ -238,6 +238,28 @@ single argument which must be a trusted argument or a MEM_RCU pointer.
The argument may have reference count of 0 and the kfunc must take this
into consideration.
+.. _KF_deprecated_flag:
+
+2.4.9 KF_DEPRECATED flag
+------------------------
+
+The KF_DEPRECATED flag is used for kfuncs which are scheduled to be
+changed or removed in a subsequent kernel release. A kfunc that is
+marked with KF_DEPRECATED should also have any relevant information
+captured in its kernel doc. Such information typically includes the
+kfunc's expected remaining lifespan, a recommendation for new
+functionality that can replace it if any is available, and possibly a
+rationale for why it is being removed.
+
+Note that while on some occasions, a KF_DEPRECATED kfunc may continue to be
+supported and have its KF_DEPRECATED flag removed, it is likely to be far more
+difficult to remove a KF_DEPRECATED flag after it's been added than it is to
+prevent it from being added in the first place. As described in
+:ref:`BPF_kfunc_lifecycle_expectations`, users that rely on specific kfuncs are
+encouraged to make their use-cases known as early as possible, and participate
+in upstream discussions regarding whether to keep, change, deprecate, or remove
+those kfuncs if and when such discussions occur.
+
2.5 Registering the kfuncs
--------------------------
@@ -304,14 +326,107 @@ In order to accommodate such requirements, the verifier will enforce strict
PTR_TO_BTF_ID type matching if two types have the exact same name, with one
being suffixed with ``___init``.
-3. Core kfuncs
+.. _BPF_kfunc_lifecycle_expectations:
+
+3. kfunc lifecycle expectations
+===============================
+
+kfuncs provide a kernel <-> kernel API, and thus are not bound by any of the
+strict stability restrictions associated with kernel <-> user UAPIs. This means
+they can be thought of as similar to EXPORT_SYMBOL_GPL, and can therefore be
+modified or removed by a maintainer of the subsystem they're defined in when
+it's deemed necessary.
+
+Like any other change to the kernel, maintainers will not change or remove a
+kfunc without having a reasonable justification. Whether or not they'll choose
+to change a kfunc will ultimately depend on a variety of factors, such as how
+widely used the kfunc is, how long the kfunc has been in the kernel, whether an
+alternative kfunc exists, what the norm is in terms of stability for the
+subsystem in question, and of course what the technical cost is of continuing
+to support the kfunc.
+
+There are several implications of this:
+
+a) kfuncs that are widely used or have been in the kernel for a long time will
+ be more difficult to justify being changed or removed by a maintainer. In
+ other words, kfuncs that are known to have a lot of users and provide
+ significant value provide stronger incentives for maintainers to invest the
+ time and complexity in supporting them. It is therefore important for
+ developers that are using kfuncs in their BPF programs to communicate and
+ explain how and why those kfuncs are being used, and to participate in
+ discussions regarding those kfuncs when they occur upstream.
+
+b) Unlike regular kernel symbols marked with EXPORT_SYMBOL_GPL, BPF programs
+ that call kfuncs are generally not part of the kernel tree. This means that
+ refactoring cannot typically change callers in-place when a kfunc changes,
+ as is done for e.g. an upstreamed driver being updated in place when a
+ kernel symbol is changed.
+
+ Unlike with regular kernel symbols, this is expected behavior for BPF
+ symbols, and out-of-tree BPF programs that use kfuncs should be considered
+ relevant to discussions and decisions around modifying and removing those
+ kfuncs. The BPF community will take an active role in participating in
+ upstream discussions when necessary to ensure that the perspectives of such
+ users are taken into account.
+
+c) A kfunc will never have any hard stability guarantees. BPF APIs cannot and
+ will not ever hard-block a change in the kernel purely for stability
+ reasons. That being said, kfuncs are features that are meant to solve
+ problems and provide value to users. The decision of whether to change or
+ remove a kfunc is a multivariate technical decision that is made on a
+ case-by-case basis, and which is informed by data points such as those
+ mentioned above. It is expected that a kfunc being removed or changed with
+ no warning will not be a common occurrence or take place without sound
+ justification, but it is a possibility that must be accepted if one is to
+ use kfuncs.
+
+3.1 kfunc deprecation
+---------------------
+
+As described above, while sometimes a maintainer may find that a kfunc must be
+changed or removed immediately to accommodate some changes in their subsystem,
+usually kfuncs will be able to accommodate a longer and more measured
+deprecation process. For example, if a new kfunc comes along which provides
+superior functionality to an existing kfunc, the existing kfunc may be
+deprecated for some period of time to allow users to migrate their BPF programs
+to use the new one. Or, if a kfunc has no known users, a decision may be made
+to remove the kfunc (without providing an alternative API) after some
+deprecation period so as to provide users with a window to notify the kfunc
+maintainer if it turns out that the kfunc is actually being used.
+
+It's expected that the common case will be that kfuncs will go through a
+deprecation period rather than being changed or removed without warning. As
+described in :ref:`KF_deprecated_flag`, the kfunc framework provides the
+KF_DEPRECATED flag to kfunc developers to signal to users that a kfunc has been
+deprecated. Once a kfunc has been marked with KF_DEPRECATED, the following
+procedure is followed for removal:
+
+1. Any relevant information for deprecated kfuncs is documented in the kfunc's
+ kernel docs. This documentation will typically include the kfunc's expected
+ remaining lifespan, a recommendation for new functionality that can replace
+ the usage of the deprecated function (or an explanation as to why no such
+ replacement exists), etc.
+
+2. The deprecated kfunc is kept in the kernel for some period of time after it
+ was first marked as deprecated. This time period will be chosen on a
+ case-by-case basis, and will typically depend on how widespread the use of
+ the kfunc is, how long it has been in the kernel, and how hard it is to move
+ to alternatives. This deprecation time period is "best effort", and as
+ described :ref:`above<BPF_kfunc_lifecycle_expectations>`, circumstances may
+ sometimes dictate that the kfunc be removed before the full intended
+ deprecation period has elapsed.
+
+3. After the deprecation period the kfunc will be removed. At this point, BPF
+ programs calling the kfunc will be rejected by the verifier.
+
+4. Core kfuncs
==============
The BPF subsystem provides a number of "core" kfuncs that are potentially
applicable to a wide variety of different possible use cases and programs.
Those kfuncs are documented here.
-3.1 struct task_struct * kfuncs
+4.1 struct task_struct * kfuncs
-------------------------------
There are a number of kfuncs that allow ``struct task_struct *`` objects to be
@@ -387,7 +502,7 @@ Here is an example of it being used:
return 0;
}
-3.2 struct cgroup * kfuncs
+4.2 struct cgroup * kfuncs
--------------------------
``struct cgroup *`` objects also have acquire and release functions:
@@ -502,7 +617,7 @@ the verifier. bpf_cgroup_ancestor() can be used as follows:
return 0;
}
-3.3 struct cpumask * kfuncs
+4.3 struct cpumask * kfuncs
---------------------------
BPF provides a set of kfuncs that can be used to query, allocate, mutate, and