[bpf-next,v2] bpf/docs: Document kfunc lifecycle / stability expectations

Message ID 20230202223557.744110-2-void@manifault.com
State New
Headers
Series [bpf-next,v2] bpf/docs: Document kfunc lifecycle / stability expectations |

Commit Message

David Vernet Feb. 2, 2023, 10:35 p.m. UTC
  BPF kernel <-> kernel API stability has been discussed at length over
the last several weeks and months. Now that we've largely aligned over
kfuncs being the way forward, and BPF helpers being considered frozen,
it's time to document the expectations for kfunc lifecycles and
stability so that everyone (BPF users, kfunc developers, and
maintainers) are all aligned, and have a crystal-clear understanding of
the expectations surrounding kfuncs.

To do that, this patch adds that documentation to the main kfuncs
documentation page via a new 'kfunc lifecycle expectations' section. The
patch describes how decisions are made in the kernel regarding whether
to include, keep, deprecate, or change / remove a kfunc. As described
very overtly in the patch itself, but likely worth highlighting here:

"kfunc stability" does not mean, nor ever will mean, "BPF APIs may block
development elsewhere in the kernel".

Rather, the intention and expectation is for kfuncs to be treated like
EXPORT_SYMBOL_GPL symbols in the kernel. The goal is for kfuncs to be a
safe and valuable option for maintainers and kfunc developers to extend
the kernel, without tying anyone's hands, or imposing any kind of
restrictions on maintainers in the same way that UAPI changes do.

In addition to the 'kfunc lifecycle expectations' section, this patch
also adds documentation for a new KF_DEPRECATED kfunc flag which kfunc
authors or maintainers can choose to add to kfuncs if and when they
decide to deprecate them. Note that as described in the patch itself, a
kfunc need not be deprecated before being changed or removed -- this
flag is simply provided as an available deprecation mechanism for those
that want to provide a deprecation story / timeline to their users.
When necessary, kfuncs may be changed or removed to accommodate changes
elsewhere in the kernel without any deprecation at all.

Signed-off-by: David Vernet <void@manifault.com>
---
 Documentation/bpf/kfuncs.rst | 138 +++++++++++++++++++++++++++++++++--
 1 file changed, 133 insertions(+), 5 deletions(-)
  

Comments

Bagas Sanjaya Feb. 3, 2023, 2:14 a.m. UTC | #1
On 2/3/23 05:35, David Vernet wrote:
> diff --git a/Documentation/bpf/kfuncs.rst b/Documentation/bpf/kfuncs.rst
> index 0bd07b39c2a4..4135f3111b67 100644
> --- a/Documentation/bpf/kfuncs.rst
> +++ b/Documentation/bpf/kfuncs.rst
> @@ -13,7 +13,7 @@ BPF Kernel Functions or more commonly known as kfuncs are functions in the Linux
>  kernel which are exposed for use by BPF programs. Unlike normal BPF helpers,
>  kfuncs do not have a stable interface and can change from one kernel release to
>  another. Hence, BPF programs need to be updated in response to changes in the
> -kernel.
> +kernel. See :ref:`BPF_kfunc_lifecycle_expectations` for more information.
>  
>  2. Defining a kfunc
>  ===================
> @@ -238,6 +238,32 @@ single argument which must be a trusted argument or a MEM_RCU pointer.
>  The argument may have reference count of 0 and the kfunc must take this
>  into consideration.
>  
> +.. _KF_deprecated_flag:
> +
> +2.4.9 KF_DEPRECATED flag
> +------------------------
> +
> +The KF_DEPRECATED flag is used for kfuncs which are expected to be changed or
> +removed in a subsequent kernel release. Deprecated kfuncs may be removed at any
> +time, though if possible (and when applicable), developers are encouraged to
> +provide users with a deprecation window to ease the burden of migrating off of
> +the kfunc.
> +
> +A kfunc that is marked with KF_DEPRECATED should also have any relevant
> +information captured in its kernel doc. Such information typically includes the
> +kfunc's expected remaining lifespan, a recommendation for new functionality
> +that can replace it if any is available, and possibly a rationale for why it is
> +being removed.
> +
> +Note that while on some occasions, a KF_DEPRECATED kfunc may continue to be
> +supported and have its KF_DEPRECATED flag removed, it is likely to be far more
> +difficult to remove a KF_DEPRECATED flag after it's been added than it is to
> +prevent it from being added in the first place. As described in
> +:ref:`BPF_kfunc_lifecycle_expectations`, users that rely on specific kfuncs are
> +highly encouraged to make their use-cases known as early as possible, and
> +participate in upstream discussions regarding whether to keep, change,
> +deprecate, or remove those kfuncs if and when such discussions occur.
> +
>  2.5 Registering the kfuncs
>  --------------------------
>  
> @@ -304,14 +330,116 @@ In order to accommodate such requirements, the verifier will enforce strict
>  PTR_TO_BTF_ID type matching if two types have the exact same name, with one
>  being suffixed with ``___init``.
>  
> -3. Core kfuncs
> +.. _BPF_kfunc_lifecycle_expectations:
> +
> +3. kfunc lifecycle expectations
> +===============================
> +
> +kfuncs provide a kernel <-> kernel API, and thus are not bound by any of the
> +strict stability restrictions associated with kernel <-> user UAPIs. Instead,
> +they're modeled more similarly to EXPORT_SYMBOL_GPL, and can therefore be
> +modified or removed by a maintainer of the subsystem they're defined in when
> +it's deemed necessary.
> +
> +Like any other change to the kernel, maintainers will not change or remove a
> +kfunc without having a reasonable justification.  Whether or not they'll choose
> +to change a kfunc will ultimately depend on a variety of factors, such as how
> +widely used the kfunc is, how long the kfunc has been in the kernel, whether an
> +alternative kfunc exists, what the norm is in terms of stability for the
> +subsystem in question, and of course what the technical cost is of continuing
> +to support the kfunc.
> +
> +There are several implications of this:
> +
> +a) kfuncs that are widely used or have been in the kernel for a long time will
> +   be more difficult to justify being changed or removed by a maintainer. Said
> +   in a different way, kfuncs that are known to have a lot of users and provide
> +   significant value provide stronger incentives for maintainers to invest the
> +   time and complexity in supporting them. It is therefore important for
> +   developers that are using kfuncs in their BPF programs to communicate and
> +   explain how and why those kfuncs are being used, and to participate in
> +   discussions regarding those kfuncs when they occur upstream.
> +
> +b) Because many BPF programs are not upstreamed as part of the kernel tree, it
> +   is often not possible to change them in-place when a kfunc changes, as it is
> +   for e.g. an upstreamed driver being updated in place when an
> +   EXPORT_SYMBOL_GPL symbol is changed. Distributions that bundle BPF programs
> +   that use kfuncs must therefore ensure that those BPF programs are linking
> +   against the kfuncs that are supported by the kernel version being used for
> +   any given release. Additionally, BPF developers are encouraged to upstream
> +   their BPF programs so they can enjoy the same benefits as upstreamed
> +   modules, and avoid code churn.
> +
> +   On the other hand, while the hope is that it will become the norm to
> +   upstream BPF programs, the reality is that most BPF programs are still
> +   out-of-tree. This means that users with out-of-tree BPF programs that use
> +   kfuncs should be considered relevant to discussions and decisions around
> +   modifying and removing kfuncs, despite that not being the norm for
> +   out-of-tree kernel modules. The BPF community will take an active role in
> +   participating in upstream discussions when necessary to ensure that the
> +   perspectives of such users are taken into account.
> +
> +c) A kfunc will never have any hard stability guarantees. BPF APIs cannot and
> +   will not ever hard-block a change in the kernel purely for stability
> +   reasons. In other words, kfuncs have the same stability guarantees as any
> +   other kernel API, such as those provided by EXPORT_SYMBOL_GPL, though with
> +   perhaps less burden than EXPORT_SYMBOL_GPL changes thanks to BPF CO-RE.
> +
> +   That being said, kfuncs are features that are meant to solve problems and
> +   provide value to users. The decision of whether to change or remove a kfunc
> +   is a multivariate technical decision that is made on a case-by-case basis,
> +   and which is informed by data points such as those mentioned above. It is
> +   expected that a kfunc being removed or changed with no warning will not be a
> +   common occurrence or take place without sound justification, but it is a
> +   possibility that must be accepted if one is to use kfuncs.
> +
> +3.1 kfunc deprecation
> +---------------------
> +
> +As described above, while sometimes a maintainer may find that a kfunc must be
> +changed or removed immediately to accommodate some changes in their subsystem,
> +other kfuncs may be able to accommodate a longer and more measured deprecation
> +process. For example, if a new kfunc comes along which provides superior
> +functionality to an existing kfunc, the existing kfunc may be deprecated for
> +some period of time to allow users to migrate their BPF programs to use the new
> +one. Or, if a kfunc has no known users, a decision may be made to remove the
> +kfunc (without providing an alternative API) after some deprecation period
> +period so as to provide users with a window to notify the kfunc maintainer if
> +it turns out that the kfunc is actually being used.
> +
> +kfuncs being deprecated (rather than changed or removed with no warning) is
> +expected to be the common case, and as described in :ref:`KF_deprecated_flag`,
> +the kfunc framework provides the KF_DEPRECATED flag to kfunc developers to
> +signal to users that a kfunc has been deprecated. Once a kfunc has been marked
> +with KF_DEPRECATED, the following procedure is followed for removal:
> +
> +1. Any relevant information for deprecated kfuncs is documented in the kfunc's
> +   kernel docs. This documentation will typically include the kfunc's expected
> +   remaining lifespan,  a recommendation for new functionality that can replace
> +   the usage of the deprecated function (or an explanation as to why no such
> +   replacement exists), etc.
> +
> +2. The deprecated kfunc is kept in the kernel for some period of time after it
> +   was first marked as deprecated. This time period will be chosen on a
> +   case-by-case basis, and will typically depend on how widespread the use of
> +   the kfunc is, how long it has been in the kernel, and how hard it is to move
> +   to alternatives. This deprecation time period is "best effort", and as
> +   described :ref:`above<BPF_kfunc_lifecycle_expectations>`, circumstances may
> +   sometimes dictate that the kfunc be removed before the full intended
> +   deprecation period has elapsed.
> +
> +3. After the deprecation period, or sometimes earlier if necessary, the kfunc
> +   will be removed. At this point, BPF programs calling the kfunc will be
> +   rejected by the verifier.
> +
> +4. Core kfuncs
>  ==============
>  
>  The BPF subsystem provides a number of "core" kfuncs that are potentially
>  applicable to a wide variety of different possible use cases and programs.
>  Those kfuncs are documented here.
>  
> -3.1 struct task_struct * kfuncs
> +4.1 struct task_struct * kfuncs
>  -------------------------------
>  
>  There are a number of kfuncs that allow ``struct task_struct *`` objects to be
> @@ -387,7 +515,7 @@ Here is an example of it being used:
>  		return 0;
>  	}
>  
> -3.2 struct cgroup * kfuncs
> +4.2 struct cgroup * kfuncs
>  --------------------------
>  
>  ``struct cgroup *`` objects also have acquire and release functions:
> @@ -502,7 +630,7 @@ the verifier. bpf_cgroup_ancestor() can be used as follows:
>  		return 0;
>  	}
>  
> -3.3 struct cpumask * kfuncs
> +4.3 struct cpumask * kfuncs
>  ---------------------------
>  
>  BPF provides a set of kfuncs that can be used to query, allocate, mutate, and

LGTM, thanks!

Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
  
Donald Hunter Feb. 3, 2023, 10:48 a.m. UTC | #2
David Vernet <void@manifault.com> writes:

> +3. kfunc lifecycle expectations
> +===============================
> +
> +kfuncs provide a kernel <-> kernel API, and thus are not bound by any of the
> +strict stability restrictions associated with kernel <-> user UAPIs. Instead,
> +they're modeled more similarly to EXPORT_SYMBOL_GPL, and can therefore be
> +modified or removed by a maintainer of the subsystem they're defined in when
> +it's deemed necessary.
> +
> +Like any other change to the kernel, maintainers will not change or remove a
> +kfunc without having a reasonable justification.  Whether or not they'll choose
> +to change a kfunc will ultimately depend on a variety of factors, such as how
> +widely used the kfunc is, how long the kfunc has been in the kernel, whether an
> +alternative kfunc exists, what the norm is in terms of stability for the
> +subsystem in question, and of course what the technical cost is of continuing
> +to support the kfunc.
> +
> +There are several implications of this:
> +
> +a) kfuncs that are widely used or have been in the kernel for a long time will
> +   be more difficult to justify being changed or removed by a maintainer. Said
> +   in a different way, kfuncs that are known to have a lot of users and provide
> +   significant value provide stronger incentives for maintainers to invest the
> +   time and complexity in supporting them. It is therefore important for
> +   developers that are using kfuncs in their BPF programs to communicate and
> +   explain how and why those kfuncs are being used, and to participate in
> +   discussions regarding those kfuncs when they occur upstream.
> +
> +b) Because many BPF programs are not upstreamed as part of the kernel tree, it
> +   is often not possible to change them in-place when a kfunc changes, as it is
> +   for e.g. an upstreamed driver being updated in place when an
> +   EXPORT_SYMBOL_GPL symbol is changed. Distributions that bundle BPF programs
> +   that use kfuncs must therefore ensure that those BPF programs are linking
> +   against the kfuncs that are supported by the kernel version being used for
> +   any given release. Additionally, BPF developers are encouraged to upstream
> +   their BPF programs so they can enjoy the same benefits as upstreamed
> +   modules, and avoid code churn.

It seems unrealistic to wish for BPF programs to be upstreamed, for
several reasons. A key benefit of BPF programs is that they are
decoupled from the kernel lifecycle and packaging. BPF programs are
often more tightly coupled with the application they are part of and
need to be maintained alongside those applications. There does not seem
to be any desire, process or incentive to maintain BPF programs in tree.

> +   On the other hand, while the hope is that it will become the norm to
> +   upstream BPF programs, the reality is that most BPF programs are still
> +   out-of-tree. This means that users with out-of-tree BPF programs that use
> +   kfuncs should be considered relevant to discussions and decisions around
> +   modifying and removing kfuncs, despite that not being the norm for
> +   out-of-tree kernel modules. The BPF community will take an active role in
> +   participating in upstream discussions when necessary to ensure that the
> +   perspectives of such users are taken into account.
> +
> +c) A kfunc will never have any hard stability guarantees. BPF APIs cannot and
> +   will not ever hard-block a change in the kernel purely for stability
> +   reasons. In other words, kfuncs have the same stability guarantees as any
> +   other kernel API, such as those provided by EXPORT_SYMBOL_GPL, though with
> +   perhaps less burden than EXPORT_SYMBOL_GPL changes thanks to BPF CO-RE.
> +
> +   That being said, kfuncs are features that are meant to solve problems and
> +   provide value to users. The decision of whether to change or remove a kfunc
> +   is a multivariate technical decision that is made on a case-by-case basis,
> +   and which is informed by data points such as those mentioned above. It is
> +   expected that a kfunc being removed or changed with no warning will not be a
> +   common occurrence or take place without sound justification, but it is a
> +   possibility that must be accepted if one is to use kfuncs.
> +
> +3.1 kfunc deprecation
> +---------------------
> +
> +As described above, while sometimes a maintainer may find that a kfunc must be
> +changed or removed immediately to accommodate some changes in their subsystem,
> +other kfuncs may be able to accommodate a longer and more measured deprecation

How about replacing 'other kfuncs may' with 'usually kfuncs will' to
re-emphasise that this would be the more common scenario.

> +process. For example, if a new kfunc comes along which provides superior
> +functionality to an existing kfunc, the existing kfunc may be deprecated for
> +some period of time to allow users to migrate their BPF programs to use the new
> +one. Or, if a kfunc has no known users, a decision may be made to remove the
> +kfunc (without providing an alternative API) after some deprecation period
> +period so as to provide users with a window to notify the kfunc maintainer if

Duplicate 'period'.

> +it turns out that the kfunc is actually being used.
> +
> +kfuncs being deprecated (rather than changed or removed with no warning) is
> +expected to be the common case, and as described in :ref:`KF_deprecated_flag`,
> +the kfunc framework provides the KF_DEPRECATED flag to kfunc developers to
> +signal to users that a kfunc has been deprecated. Once a kfunc has been marked
> +with KF_DEPRECATED, the following procedure is followed for removal:
> +
> +1. Any relevant information for deprecated kfuncs is documented in the kfunc's
> +   kernel docs. This documentation will typically include the kfunc's expected
> +   remaining lifespan,  a recommendation for new functionality that can replace
> +   the usage of the deprecated function (or an explanation as to why no such
> +   replacement exists), etc.
> +
> +2. The deprecated kfunc is kept in the kernel for some period of time after it
> +   was first marked as deprecated. This time period will be chosen on a
> +   case-by-case basis, and will typically depend on how widespread the use of
> +   the kfunc is, how long it has been in the kernel, and how hard it is to move
> +   to alternatives. This deprecation time period is "best effort", and as
> +   described :ref:`above<BPF_kfunc_lifecycle_expectations>`, circumstances may
> +   sometimes dictate that the kfunc be removed before the full intended
> +   deprecation period has elapsed.
> +
> +3. After the deprecation period, or sometimes earlier if necessary, the kfunc
> +   will be removed. At this point, BPF programs calling the kfunc will be
> +   rejected by the verifier.
> +
  
Toke Høiland-Jørgensen Feb. 3, 2023, 1:04 p.m. UTC | #3
David Vernet <void@manifault.com> writes:

> BPF kernel <-> kernel API stability has been discussed at length over
> the last several weeks and months. Now that we've largely aligned over
> kfuncs being the way forward, and BPF helpers being considered frozen,
> it's time to document the expectations for kfunc lifecycles and
> stability so that everyone (BPF users, kfunc developers, and
> maintainers) are all aligned, and have a crystal-clear understanding of
> the expectations surrounding kfuncs.
>
> To do that, this patch adds that documentation to the main kfuncs
> documentation page via a new 'kfunc lifecycle expectations' section. The
> patch describes how decisions are made in the kernel regarding whether
> to include, keep, deprecate, or change / remove a kfunc. As described
> very overtly in the patch itself, but likely worth highlighting here:
>
> "kfunc stability" does not mean, nor ever will mean, "BPF APIs may block
> development elsewhere in the kernel".
>
> Rather, the intention and expectation is for kfuncs to be treated like
> EXPORT_SYMBOL_GPL symbols in the kernel. The goal is for kfuncs to be a
> safe and valuable option for maintainers and kfunc developers to extend
> the kernel, without tying anyone's hands, or imposing any kind of
> restrictions on maintainers in the same way that UAPI changes do.
>
> In addition to the 'kfunc lifecycle expectations' section, this patch
> also adds documentation for a new KF_DEPRECATED kfunc flag which kfunc
> authors or maintainers can choose to add to kfuncs if and when they
> decide to deprecate them. Note that as described in the patch itself, a
> kfunc need not be deprecated before being changed or removed -- this
> flag is simply provided as an available deprecation mechanism for those
> that want to provide a deprecation story / timeline to their users.
> When necessary, kfuncs may be changed or removed to accommodate changes
> elsewhere in the kernel without any deprecation at all.
>
> Signed-off-by: David Vernet <void@manifault.com>

Some comments below, but otherwise please add my:

Co-developed-by: Toke Høiland-Jørgensen <toke@redhat.com>

should we Cc the next version to linux-api@vger as well just to get a
bit more visibility in case others have comments?

> ---
>  Documentation/bpf/kfuncs.rst | 138 +++++++++++++++++++++++++++++++++--
>  1 file changed, 133 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/bpf/kfuncs.rst b/Documentation/bpf/kfuncs.rst
> index 0bd07b39c2a4..4135f3111b67 100644
> --- a/Documentation/bpf/kfuncs.rst
> +++ b/Documentation/bpf/kfuncs.rst
> @@ -13,7 +13,7 @@ BPF Kernel Functions or more commonly known as kfuncs are functions in the Linux
>  kernel which are exposed for use by BPF programs. Unlike normal BPF helpers,
>  kfuncs do not have a stable interface and can change from one kernel release to
>  another. Hence, BPF programs need to be updated in response to changes in the
> -kernel.
> +kernel. See :ref:`BPF_kfunc_lifecycle_expectations` for more information.
>  
>  2. Defining a kfunc
>  ===================
> @@ -238,6 +238,32 @@ single argument which must be a trusted argument or a MEM_RCU pointer.
>  The argument may have reference count of 0 and the kfunc must take this
>  into consideration.
>  
> +.. _KF_deprecated_flag:
> +
> +2.4.9 KF_DEPRECATED flag
> +------------------------
> +
> +The KF_DEPRECATED flag is used for kfuncs which are expected to be changed or
> +removed in a subsequent kernel release. Deprecated kfuncs may be removed at any
> +time, though if possible (and when applicable), developers are encouraged to
> +provide users with a deprecation window to ease the burden of migrating off of
> +the kfunc.



I think the "may be removed at any time" is a bit odd here. If someone
wants to just remove a kfunc, why bother with the deprecation flag at
all? Besides, that whole "deprecation is optional" bit is explained
below, in this section we're just explaining the flag. So I'd just drop
this bit and combine the first two paragraphs as:

"The KF_DEPRECATED flag is used for kfuncs which are scheduled to be
changed or removed in a subsequent kernel release. A kfunc that is
marked with KF_DEPRECATED should also have any relevant information
captured in its kernel doc. Such information typically includes the
kfunc's expected remaining lifespan, a recommendation for new
functionality that can replace it if any is available, and possibly a
rationale for why it is being removed."

> +Note that while on some occasions, a KF_DEPRECATED kfunc may continue to be
> +supported and have its KF_DEPRECATED flag removed, it is likely to be far more
> +difficult to remove a KF_DEPRECATED flag after it's been added than it is to
> +prevent it from being added in the first place. As described in
> +:ref:`BPF_kfunc_lifecycle_expectations`, users that rely on specific kfuncs are
> +highly encouraged to make their use-cases known as early as possible, and

nit: "highly encouraged" reads a bit like overuse of "very" - just "encouraged"?

> +participate in upstream discussions regarding whether to keep, change,
> +deprecate, or remove those kfuncs if and when such discussions occur.
> +
>  2.5 Registering the kfuncs
>  --------------------------
>  
> @@ -304,14 +330,116 @@ In order to accommodate such requirements, the verifier will enforce strict
>  PTR_TO_BTF_ID type matching if two types have the exact same name, with one
>  being suffixed with ``___init``.
>  
> -3. Core kfuncs
> +.. _BPF_kfunc_lifecycle_expectations:
> +
> +3. kfunc lifecycle expectations
> +===============================
> +
> +kfuncs provide a kernel <-> kernel API, and thus are not bound by any of the
> +strict stability restrictions associated with kernel <-> user UAPIs. Instead,
> +they're modeled more similarly to EXPORT_SYMBOL_GPL, and can therefore be

nit: "Instead, they're modeled more similarly to" -> "This means they
can be thought of as similar to" ? ("more similarly" is terrible :P)

> +modified or removed by a maintainer of the subsystem they're defined in when
> +it's deemed necessary.
> +
> +Like any other change to the kernel, maintainers will not change or remove a
> +kfunc without having a reasonable justification.  Whether or not they'll choose
> +to change a kfunc will ultimately depend on a variety of factors, such as how
> +widely used the kfunc is, how long the kfunc has been in the kernel, whether an
> +alternative kfunc exists, what the norm is in terms of stability for the
> +subsystem in question, and of course what the technical cost is of continuing
> +to support the kfunc.
> +
> +There are several implications of this:
> +
> +a) kfuncs that are widely used or have been in the kernel for a long time will
> +   be more difficult to justify being changed or removed by a maintainer. Said
> +   in a different way, kfuncs that are known to have a lot of users and provide

nit: "said in a different way" -> "in other words" ?

> +   significant value provide stronger incentives for maintainers to invest the
> +   time and complexity in supporting them. It is therefore important for
> +   developers that are using kfuncs in their BPF programs to communicate and
> +   explain how and why those kfuncs are being used, and to participate in
> +   discussions regarding those kfuncs when they occur upstream.
> +
> +b) Because many BPF programs are not upstreamed as part of the kernel tree, it
> +   is often not possible to change them in-place when a kfunc changes, as it is
> +   for e.g. an upstreamed driver being updated in place when an
> +   EXPORT_SYMBOL_GPL symbol is changed. Distributions that bundle BPF programs
> +   that use kfuncs must therefore ensure that those BPF programs are linking
> +   against the kfuncs that are supported by the kernel version being used for
> +   any given release. Additionally, BPF developers are encouraged to upstream
> +   their BPF programs so they can enjoy the same benefits as upstreamed
> +   modules, and avoid code churn.
> +
> +   On the other hand, while the hope is that it will become the norm to
> +   upstream BPF programs, the reality is that most BPF programs are still
> +   out-of-tree. This means that users with out-of-tree BPF programs that use
> +   kfuncs should be considered relevant to discussions and decisions around
> +   modifying and removing kfuncs, despite that not being the norm for
> +   out-of-tree kernel modules. The BPF community will take an active role in
> +   participating in upstream discussions when necessary to ensure that the
> +   perspectives of such users are taken into account.

As I said in a previous email, I really don't think encouraging people
to upstream BPF programs are either realistic of desirable. I think we
should drop that and change this point to something like:

b) Unlike regular kernel symbols marked with EXPORT_SYMBOL_GPL, BPF
   programs that call kfuncs are generally not part of the kernel tree.
   This means that refactoring can not generally change callers in-place
   when a kfunc changes, as it is done for e.g. an upstreamed driver being
   updated in place when kernel symbol is changed.

   Unlike with regular kernel symbols, this is expected behaviour for
   BPF symbols, and out-of-tree BPF programs that use kfuncs should be
   considered relevant to discussions and decisions around modifying and
   removing kfuncs. The BPF community will take an active role in
   participating in upstream discussions when necessary to ensure that
   the perspectives of such users are taken into account.

> +c) A kfunc will never have any hard stability guarantees. BPF APIs cannot and
> +   will not ever hard-block a change in the kernel purely for stability
> +   reasons. In other words, kfuncs have the same stability guarantees as any
> +   other kernel API, such as those provided by EXPORT_SYMBOL_GPL, though with
> +   perhaps less burden than EXPORT_SYMBOL_GPL changes thanks to BPF CO-RE.

I'd drop the last sentence (from "In other words..."). It's not true
that kfuncs have "the same stability guarantees", we just said above
that out-of-tree BPF programs are relevant. Also, other than that I
don't think having this sentence here adds anything that's not already
explained below, so I'd just drop it and merge the below paragraph into
the above.

> +   That being said, kfuncs are features that are meant to solve problems and
> +   provide value to users. The decision of whether to change or remove a kfunc
> +   is a multivariate technical decision that is made on a case-by-case basis,
> +   and which is informed by data points such as those mentioned above. It is
> +   expected that a kfunc being removed or changed with no warning will not be a
> +   common occurrence or take place without sound justification, but it is a
> +   possibility that must be accepted if one is to use kfuncs.
> +
> +3.1 kfunc deprecation
> +---------------------
> +
> +As described above, while sometimes a maintainer may find that a kfunc must be
> +changed or removed immediately to accommodate some changes in their subsystem,
> +other kfuncs may be able to accommodate a longer and more measured deprecation
> +process. For example, if a new kfunc comes along which provides superior
> +functionality to an existing kfunc, the existing kfunc may be deprecated for
> +some period of time to allow users to migrate their BPF programs to use the new
> +one. Or, if a kfunc has no known users, a decision may be made to remove the
> +kfunc (without providing an alternative API) after some deprecation period
> +period so as to provide users with a window to notify the kfunc maintainer if
> +it turns out that the kfunc is actually being used.
> +
> +kfuncs being deprecated (rather than changed or removed with no warning) is
> +expected to be the common case, and as described in :ref:`KF_deprecated_flag`,

reword as: "It's expected that the common case will be that kfuncs will
go through a deprecation period rather than being changed or removed
with not warning. As described in..."

> +the kfunc framework provides the KF_DEPRECATED flag to kfunc developers to
> +signal to users that a kfunc has been deprecated. Once a kfunc has been marked
> +with KF_DEPRECATED, the following procedure is followed for removal:
> +
> +1. Any relevant information for deprecated kfuncs is documented in the kfunc's
> +   kernel docs. This documentation will typically include the kfunc's expected
> +   remaining lifespan,  a recommendation for new functionality that can replace
> +   the usage of the deprecated function (or an explanation as to why no such
> +   replacement exists), etc.
> +
> +2. The deprecated kfunc is kept in the kernel for some period of time after it
> +   was first marked as deprecated. This time period will be chosen on a
> +   case-by-case basis, and will typically depend on how widespread the use of
> +   the kfunc is, how long it has been in the kernel, and how hard it is to move
> +   to alternatives. This deprecation time period is "best effort", and as
> +   described :ref:`above<BPF_kfunc_lifecycle_expectations>`, circumstances may
> +   sometimes dictate that the kfunc be removed before the full intended
> +   deprecation period has elapsed.
> +
> +3. After the deprecation period, or sometimes earlier if necessary, the kfunc

drop "or sometimes earlier if necessary" - the deprecation period ends
when the kfunc is removed, that's what a deprecation period means. If
some factor means that the deprecation period is shortened, that's still
the end of the deprecation period, by definition :)



-Toke
  
David Vernet Feb. 3, 2023, 2:14 p.m. UTC | #4
On Fri, Feb 03, 2023 at 10:48:35AM +0000, Donald Hunter wrote:
> David Vernet <void@manifault.com> writes:
> 
> > +3. kfunc lifecycle expectations
> > +===============================
> > +
> > +kfuncs provide a kernel <-> kernel API, and thus are not bound by any of the
> > +strict stability restrictions associated with kernel <-> user UAPIs. Instead,
> > +they're modeled more similarly to EXPORT_SYMBOL_GPL, and can therefore be
> > +modified or removed by a maintainer of the subsystem they're defined in when
> > +it's deemed necessary.
> > +
> > +Like any other change to the kernel, maintainers will not change or remove a
> > +kfunc without having a reasonable justification.  Whether or not they'll choose
> > +to change a kfunc will ultimately depend on a variety of factors, such as how
> > +widely used the kfunc is, how long the kfunc has been in the kernel, whether an
> > +alternative kfunc exists, what the norm is in terms of stability for the
> > +subsystem in question, and of course what the technical cost is of continuing
> > +to support the kfunc.
> > +
> > +There are several implications of this:
> > +
> > +a) kfuncs that are widely used or have been in the kernel for a long time will
> > +   be more difficult to justify being changed or removed by a maintainer. Said
> > +   in a different way, kfuncs that are known to have a lot of users and provide
> > +   significant value provide stronger incentives for maintainers to invest the
> > +   time and complexity in supporting them. It is therefore important for
> > +   developers that are using kfuncs in their BPF programs to communicate and
> > +   explain how and why those kfuncs are being used, and to participate in
> > +   discussions regarding those kfuncs when they occur upstream.
> > +
> > +b) Because many BPF programs are not upstreamed as part of the kernel tree, it
> > +   is often not possible to change them in-place when a kfunc changes, as it is
> > +   for e.g. an upstreamed driver being updated in place when an
> > +   EXPORT_SYMBOL_GPL symbol is changed. Distributions that bundle BPF programs
> > +   that use kfuncs must therefore ensure that those BPF programs are linking
> > +   against the kfuncs that are supported by the kernel version being used for
> > +   any given release. Additionally, BPF developers are encouraged to upstream
> > +   their BPF programs so they can enjoy the same benefits as upstreamed
> > +   modules, and avoid code churn.
> 
> It seems unrealistic to wish for BPF programs to be upstreamed, for
> several reasons. A key benefit of BPF programs is that they are
> decoupled from the kernel lifecycle and packaging. BPF programs are
> often more tightly coupled with the application they are part of and
> need to be maintained alongside those applications. There does not seem
> to be any desire, process or incentive to maintain BPF programs in tree.

I think this is slowly becoming less true with the "modern BPF"
described by Alexei in [0], but I agree that it's certainly the vast
majority of cases now, and I'm fine with removing this from the doc to
decouple it from the larger kfunc lifecycle discussion. Toke mentioned
something similar in [1], so I'll remove this in v3.

[0]: https://lwn.net/Articles/909095/
[1]: https://lore.kernel.org/all/87cz6qew8l.fsf@toke.dk/

> 
> > +   On the other hand, while the hope is that it will become the norm to
> > +   upstream BPF programs, the reality is that most BPF programs are still
> > +   out-of-tree. This means that users with out-of-tree BPF programs that use
> > +   kfuncs should be considered relevant to discussions and decisions around
> > +   modifying and removing kfuncs, despite that not being the norm for
> > +   out-of-tree kernel modules. The BPF community will take an active role in
> > +   participating in upstream discussions when necessary to ensure that the
> > +   perspectives of such users are taken into account.
> > +
> > +c) A kfunc will never have any hard stability guarantees. BPF APIs cannot and
> > +   will not ever hard-block a change in the kernel purely for stability
> > +   reasons. In other words, kfuncs have the same stability guarantees as any
> > +   other kernel API, such as those provided by EXPORT_SYMBOL_GPL, though with
> > +   perhaps less burden than EXPORT_SYMBOL_GPL changes thanks to BPF CO-RE.
> > +
> > +   That being said, kfuncs are features that are meant to solve problems and
> > +   provide value to users. The decision of whether to change or remove a kfunc
> > +   is a multivariate technical decision that is made on a case-by-case basis,
> > +   and which is informed by data points such as those mentioned above. It is
> > +   expected that a kfunc being removed or changed with no warning will not be a
> > +   common occurrence or take place without sound justification, but it is a
> > +   possibility that must be accepted if one is to use kfuncs.
> > +
> > +3.1 kfunc deprecation
> > +---------------------
> > +
> > +As described above, while sometimes a maintainer may find that a kfunc must be
> > +changed or removed immediately to accommodate some changes in their subsystem,
> > +other kfuncs may be able to accommodate a longer and more measured deprecation
> 
> How about replacing 'other kfuncs may' with 'usually kfuncs will' to
> re-emphasise that this would be the more common scenario.

Good suggestion, will do.

> 
> > +process. For example, if a new kfunc comes along which provides superior
> > +functionality to an existing kfunc, the existing kfunc may be deprecated for
> > +some period of time to allow users to migrate their BPF programs to use the new
> > +one. Or, if a kfunc has no known users, a decision may be made to remove the
> > +kfunc (without providing an alternative API) after some deprecation period
> > +period so as to provide users with a window to notify the kfunc maintainer if
> 
> Duplicate 'period'.

Ack, good catch.

> 
> > +it turns out that the kfunc is actually being used.
> > +
> > +kfuncs being deprecated (rather than changed or removed with no warning) is
> > +expected to be the common case, and as described in :ref:`KF_deprecated_flag`,
> > +the kfunc framework provides the KF_DEPRECATED flag to kfunc developers to
> > +signal to users that a kfunc has been deprecated. Once a kfunc has been marked
> > +with KF_DEPRECATED, the following procedure is followed for removal:
> > +
> > +1. Any relevant information for deprecated kfuncs is documented in the kfunc's
> > +   kernel docs. This documentation will typically include the kfunc's expected
> > +   remaining lifespan,  a recommendation for new functionality that can replace
> > +   the usage of the deprecated function (or an explanation as to why no such
> > +   replacement exists), etc.
> > +
> > +2. The deprecated kfunc is kept in the kernel for some period of time after it
> > +   was first marked as deprecated. This time period will be chosen on a
> > +   case-by-case basis, and will typically depend on how widespread the use of
> > +   the kfunc is, how long it has been in the kernel, and how hard it is to move
> > +   to alternatives. This deprecation time period is "best effort", and as
> > +   described :ref:`above<BPF_kfunc_lifecycle_expectations>`, circumstances may
> > +   sometimes dictate that the kfunc be removed before the full intended
> > +   deprecation period has elapsed.
> > +
> > +3. After the deprecation period, or sometimes earlier if necessary, the kfunc
> > +   will be removed. At this point, BPF programs calling the kfunc will be
> > +   rejected by the verifier.
> > +
  
David Vernet Feb. 3, 2023, 2:47 p.m. UTC | #5
On Fri, Feb 03, 2023 at 02:04:10PM +0100, Toke Høiland-Jørgensen wrote:
> David Vernet <void@manifault.com> writes:
> 
> > BPF kernel <-> kernel API stability has been discussed at length over
> > the last several weeks and months. Now that we've largely aligned over
> > kfuncs being the way forward, and BPF helpers being considered frozen,
> > it's time to document the expectations for kfunc lifecycles and
> > stability so that everyone (BPF users, kfunc developers, and
> > maintainers) are all aligned, and have a crystal-clear understanding of
> > the expectations surrounding kfuncs.
> >
> > To do that, this patch adds that documentation to the main kfuncs
> > documentation page via a new 'kfunc lifecycle expectations' section. The
> > patch describes how decisions are made in the kernel regarding whether
> > to include, keep, deprecate, or change / remove a kfunc. As described
> > very overtly in the patch itself, but likely worth highlighting here:
> >
> > "kfunc stability" does not mean, nor ever will mean, "BPF APIs may block
> > development elsewhere in the kernel".
> >
> > Rather, the intention and expectation is for kfuncs to be treated like
> > EXPORT_SYMBOL_GPL symbols in the kernel. The goal is for kfuncs to be a
> > safe and valuable option for maintainers and kfunc developers to extend
> > the kernel, without tying anyone's hands, or imposing any kind of
> > restrictions on maintainers in the same way that UAPI changes do.
> >
> > In addition to the 'kfunc lifecycle expectations' section, this patch
> > also adds documentation for a new KF_DEPRECATED kfunc flag which kfunc
> > authors or maintainers can choose to add to kfuncs if and when they
> > decide to deprecate them. Note that as described in the patch itself, a
> > kfunc need not be deprecated before being changed or removed -- this
> > flag is simply provided as an available deprecation mechanism for those
> > that want to provide a deprecation story / timeline to their users.
> > When necessary, kfuncs may be changed or removed to accommodate changes
> > elsewhere in the kernel without any deprecation at all.
> >
> > Signed-off-by: David Vernet <void@manifault.com>
> 
> Some comments below, but otherwise please add my:
> 
> Co-developed-by: Toke Høiland-Jørgensen <toke@redhat.com>
> 
> should we Cc the next version to linux-api@vger as well just to get a
> bit more visibility in case others have comments?

Yes, good idea, will do.

> 
> > ---
> >  Documentation/bpf/kfuncs.rst | 138 +++++++++++++++++++++++++++++++++--
> >  1 file changed, 133 insertions(+), 5 deletions(-)
> >
> > diff --git a/Documentation/bpf/kfuncs.rst b/Documentation/bpf/kfuncs.rst
> > index 0bd07b39c2a4..4135f3111b67 100644
> > --- a/Documentation/bpf/kfuncs.rst
> > +++ b/Documentation/bpf/kfuncs.rst
> > @@ -13,7 +13,7 @@ BPF Kernel Functions or more commonly known as kfuncs are functions in the Linux
> >  kernel which are exposed for use by BPF programs. Unlike normal BPF helpers,
> >  kfuncs do not have a stable interface and can change from one kernel release to
> >  another. Hence, BPF programs need to be updated in response to changes in the
> > -kernel.
> > +kernel. See :ref:`BPF_kfunc_lifecycle_expectations` for more information.
> >  
> >  2. Defining a kfunc
> >  ===================
> > @@ -238,6 +238,32 @@ single argument which must be a trusted argument or a MEM_RCU pointer.
> >  The argument may have reference count of 0 and the kfunc must take this
> >  into consideration.
> >  
> > +.. _KF_deprecated_flag:
> > +
> > +2.4.9 KF_DEPRECATED flag
> > +------------------------
> > +
> > +The KF_DEPRECATED flag is used for kfuncs which are expected to be changed or
> > +removed in a subsequent kernel release. Deprecated kfuncs may be removed at any
> > +time, though if possible (and when applicable), developers are encouraged to
> > +provide users with a deprecation window to ease the burden of migrating off of
> > +the kfunc.
> 
> 
> 
> I think the "may be removed at any time" is a bit odd here. If someone
> wants to just remove a kfunc, why bother with the deprecation flag at
> all? Besides, that whole "deprecation is optional" bit is explained

I definitely agree that the phrasing could be improved, but my intention
with that phrase was actually to answer the exact nuanced question you
posed. I think we need to clarify that KF_DEPRECATED is an optional tool
that developers can use to provide a softer deprecation story to their
users, rather than a flag that exists as part of a larger, strict
deprecation policy. Otherwise, I think it would be unclear to someone
reading this when and why they would need to use the flag. I liked your
suggestion below, and proposed a bit of extra text to try and capture
this point. Lmk what you think.

> below, in this section we're just explaining the flag. So I'd just drop
> this bit and combine the first two paragraphs as:
> 
> "The KF_DEPRECATED flag is used for kfuncs which are scheduled to be
> changed or removed in a subsequent kernel release. A kfunc that is
> marked with KF_DEPRECATED should also have any relevant information
> captured in its kernel doc. Such information typically includes the
> kfunc's expected remaining lifespan, a recommendation for new
> functionality that can replace it if any is available, and possibly a
> rationale for why it is being removed."

I like this -- are you OK with adding this in a small subsequent
paragraph to address the point I made above?

Note that KF_DEPRECATED is simply a tool that developers can choose to
use to ease their users' burden of migrating off of a kfunc. While
developers are encouraged to use KF_DEPRECATED to provide a
transitionary deprecation period to users of their kfunc, doing so is
not strictly required, and the flag does not exist as part of any larger
kfunc deprecation policy.

> 
> > +Note that while on some occasions, a KF_DEPRECATED kfunc may continue to be
> > +supported and have its KF_DEPRECATED flag removed, it is likely to be far more
> > +difficult to remove a KF_DEPRECATED flag after it's been added than it is to
> > +prevent it from being added in the first place. As described in
> > +:ref:`BPF_kfunc_lifecycle_expectations`, users that rely on specific kfuncs are
> > +highly encouraged to make their use-cases known as early as possible, and
> 
> nit: "highly encouraged" reads a bit like overuse of "very" - just "encouraged"?

Agreed, I'll use "encouraged" here.

> 
> > +participate in upstream discussions regarding whether to keep, change,
> > +deprecate, or remove those kfuncs if and when such discussions occur.
> > +
> >  2.5 Registering the kfuncs
> >  --------------------------
> >  
> > @@ -304,14 +330,116 @@ In order to accommodate such requirements, the verifier will enforce strict
> >  PTR_TO_BTF_ID type matching if two types have the exact same name, with one
> >  being suffixed with ``___init``.
> >  
> > -3. Core kfuncs
> > +.. _BPF_kfunc_lifecycle_expectations:
> > +
> > +3. kfunc lifecycle expectations
> > +===============================
> > +
> > +kfuncs provide a kernel <-> kernel API, and thus are not bound by any of the
> > +strict stability restrictions associated with kernel <-> user UAPIs. Instead,
> > +they're modeled more similarly to EXPORT_SYMBOL_GPL, and can therefore be
> 
> nit: "Instead, they're modeled more similarly to" -> "This means they
> can be thought of as similar to" ? ("more similarly" is terrible :P)

Heh, I agree that it doesn't sound right, so much so that I googled it
because of how odd it sounded in my head after repeating it a few times:
[1]. Even though it's apparently correct grammar, I'll still apply your
suggestion to v3 because I think it reads more clearly.

[1]: https://english.stackexchange.com/questions/253073/is-using-more-similar-incorrect

> 
> > +modified or removed by a maintainer of the subsystem they're defined in when
> > +it's deemed necessary.
> > +
> > +Like any other change to the kernel, maintainers will not change or remove a
> > +kfunc without having a reasonable justification.  Whether or not they'll choose
> > +to change a kfunc will ultimately depend on a variety of factors, such as how
> > +widely used the kfunc is, how long the kfunc has been in the kernel, whether an
> > +alternative kfunc exists, what the norm is in terms of stability for the
> > +subsystem in question, and of course what the technical cost is of continuing
> > +to support the kfunc.
> > +
> > +There are several implications of this:
> > +
> > +a) kfuncs that are widely used or have been in the kernel for a long time will
> > +   be more difficult to justify being changed or removed by a maintainer. Said
> > +   in a different way, kfuncs that are known to have a lot of users and provide
> 
> nit: "said in a different way" -> "in other words" ?

Ack

> 
> > +   significant value provide stronger incentives for maintainers to invest the
> > +   time and complexity in supporting them. It is therefore important for
> > +   developers that are using kfuncs in their BPF programs to communicate and
> > +   explain how and why those kfuncs are being used, and to participate in
> > +   discussions regarding those kfuncs when they occur upstream.
> > +
> > +b) Because many BPF programs are not upstreamed as part of the kernel tree, it
> > +   is often not possible to change them in-place when a kfunc changes, as it is
> > +   for e.g. an upstreamed driver being updated in place when an
> > +   EXPORT_SYMBOL_GPL symbol is changed. Distributions that bundle BPF programs
> > +   that use kfuncs must therefore ensure that those BPF programs are linking
> > +   against the kfuncs that are supported by the kernel version being used for
> > +   any given release. Additionally, BPF developers are encouraged to upstream
> > +   their BPF programs so they can enjoy the same benefits as upstreamed
> > +   modules, and avoid code churn.
> > +
> > +   On the other hand, while the hope is that it will become the norm to
> > +   upstream BPF programs, the reality is that most BPF programs are still
> > +   out-of-tree. This means that users with out-of-tree BPF programs that use
> > +   kfuncs should be considered relevant to discussions and decisions around
> > +   modifying and removing kfuncs, despite that not being the norm for
> > +   out-of-tree kernel modules. The BPF community will take an active role in
> > +   participating in upstream discussions when necessary to ensure that the
> > +   perspectives of such users are taken into account.
> 
> As I said in a previous email, I really don't think encouraging people
> to upstream BPF programs are either realistic of desirable. I think we
> should drop that and change this point to something like:

Fair enough. Donald said something similar in [2] as well. I'd like to
see us eventually change this paradigm over time, but we can (and
probably should) certainly decouple that from this patch.

[2]: https://lore.kernel.org/all/m2sffnvxbw.fsf@gmail.com/

> 
> b) Unlike regular kernel symbols marked with EXPORT_SYMBOL_GPL, BPF
>    programs that call kfuncs are generally not part of the kernel tree.
>    This means that refactoring can not generally change callers in-place
>    when a kfunc changes, as it is done for e.g. an upstreamed driver being
>    updated in place when kernel symbol is changed.
> 
>    Unlike with regular kernel symbols, this is expected behaviour for
>    BPF symbols, and out-of-tree BPF programs that use kfuncs should be
>    considered relevant to discussions and decisions around modifying and
>    removing kfuncs. The BPF community will take an active role in
>    participating in upstream discussions when necessary to ensure that
>    the perspectives of such users are taken into account.

Well put, will apply in v3.

> 
> > +c) A kfunc will never have any hard stability guarantees. BPF APIs cannot and
> > +   will not ever hard-block a change in the kernel purely for stability
> > +   reasons. In other words, kfuncs have the same stability guarantees as any
> > +   other kernel API, such as those provided by EXPORT_SYMBOL_GPL, though with
> > +   perhaps less burden than EXPORT_SYMBOL_GPL changes thanks to BPF CO-RE.
> 
> I'd drop the last sentence (from "In other words..."). It's not true
> that kfuncs have "the same stability guarantees", we just said above
> that out-of-tree BPF programs are relevant. Also, other than that I
> don't think having this sentence here adds anything that's not already
> explained below, so I'd just drop it and merge the below paragraph into
> the above.

Ack, will do.

> 
> > +   That being said, kfuncs are features that are meant to solve problems and
> > +   provide value to users. The decision of whether to change or remove a kfunc
> > +   is a multivariate technical decision that is made on a case-by-case basis,
> > +   and which is informed by data points such as those mentioned above. It is
> > +   expected that a kfunc being removed or changed with no warning will not be a
> > +   common occurrence or take place without sound justification, but it is a
> > +   possibility that must be accepted if one is to use kfuncs.
> > +
> > +3.1 kfunc deprecation
> > +---------------------
> > +
> > +As described above, while sometimes a maintainer may find that a kfunc must be
> > +changed or removed immediately to accommodate some changes in their subsystem,
> > +other kfuncs may be able to accommodate a longer and more measured deprecation
> > +process. For example, if a new kfunc comes along which provides superior
> > +functionality to an existing kfunc, the existing kfunc may be deprecated for
> > +some period of time to allow users to migrate their BPF programs to use the new
> > +one. Or, if a kfunc has no known users, a decision may be made to remove the
> > +kfunc (without providing an alternative API) after some deprecation period
> > +period so as to provide users with a window to notify the kfunc maintainer if
> > +it turns out that the kfunc is actually being used.
> > +
> > +kfuncs being deprecated (rather than changed or removed with no warning) is
> > +expected to be the common case, and as described in :ref:`KF_deprecated_flag`,
> 
> reword as: "It's expected that the common case will be that kfuncs will
> go through a deprecation period rather than being changed or removed
> with not warning. As described in..."

Ack

> 
> > +the kfunc framework provides the KF_DEPRECATED flag to kfunc developers to
> > +signal to users that a kfunc has been deprecated. Once a kfunc has been marked
> > +with KF_DEPRECATED, the following procedure is followed for removal:
> > +
> > +1. Any relevant information for deprecated kfuncs is documented in the kfunc's
> > +   kernel docs. This documentation will typically include the kfunc's expected
> > +   remaining lifespan,  a recommendation for new functionality that can replace
> > +   the usage of the deprecated function (or an explanation as to why no such
> > +   replacement exists), etc.
> > +
> > +2. The deprecated kfunc is kept in the kernel for some period of time after it
> > +   was first marked as deprecated. This time period will be chosen on a
> > +   case-by-case basis, and will typically depend on how widespread the use of
> > +   the kfunc is, how long it has been in the kernel, and how hard it is to move
> > +   to alternatives. This deprecation time period is "best effort", and as
> > +   described :ref:`above<BPF_kfunc_lifecycle_expectations>`, circumstances may
> > +   sometimes dictate that the kfunc be removed before the full intended
> > +   deprecation period has elapsed.
> > +
> > +3. After the deprecation period, or sometimes earlier if necessary, the kfunc
> 
> drop "or sometimes earlier if necessary" - the deprecation period ends
> when the kfunc is removed, that's what a deprecation period means. If
> some factor means that the deprecation period is shortened, that's still
> the end of the deprecation period, by definition :)

Fair enough, will change in v3.

Thanks for all of the suggestions. I'll send out v3 later today, along
with your edits and Co-developed-by tag.

- David
  
David Vernet Feb. 3, 2023, 2:56 p.m. UTC | #6
On Fri, Feb 03, 2023 at 08:47:21AM -0600, David Vernet wrote:
> On Fri, Feb 03, 2023 at 02:04:10PM +0100, Toke Høiland-Jørgensen wrote:
> > David Vernet <void@manifault.com> writes:
> > 
> > > BPF kernel <-> kernel API stability has been discussed at length over
> > > the last several weeks and months. Now that we've largely aligned over
> > > kfuncs being the way forward, and BPF helpers being considered frozen,
> > > it's time to document the expectations for kfunc lifecycles and
> > > stability so that everyone (BPF users, kfunc developers, and
> > > maintainers) are all aligned, and have a crystal-clear understanding of
> > > the expectations surrounding kfuncs.
> > >
> > > To do that, this patch adds that documentation to the main kfuncs
> > > documentation page via a new 'kfunc lifecycle expectations' section. The
> > > patch describes how decisions are made in the kernel regarding whether
> > > to include, keep, deprecate, or change / remove a kfunc. As described
> > > very overtly in the patch itself, but likely worth highlighting here:
> > >
> > > "kfunc stability" does not mean, nor ever will mean, "BPF APIs may block
> > > development elsewhere in the kernel".
> > >
> > > Rather, the intention and expectation is for kfuncs to be treated like
> > > EXPORT_SYMBOL_GPL symbols in the kernel. The goal is for kfuncs to be a
> > > safe and valuable option for maintainers and kfunc developers to extend
> > > the kernel, without tying anyone's hands, or imposing any kind of
> > > restrictions on maintainers in the same way that UAPI changes do.
> > >
> > > In addition to the 'kfunc lifecycle expectations' section, this patch
> > > also adds documentation for a new KF_DEPRECATED kfunc flag which kfunc
> > > authors or maintainers can choose to add to kfuncs if and when they
> > > decide to deprecate them. Note that as described in the patch itself, a
> > > kfunc need not be deprecated before being changed or removed -- this
> > > flag is simply provided as an available deprecation mechanism for those
> > > that want to provide a deprecation story / timeline to their users.
> > > When necessary, kfuncs may be changed or removed to accommodate changes
> > > elsewhere in the kernel without any deprecation at all.
> > >
> > > Signed-off-by: David Vernet <void@manifault.com>
> > 
> > Some comments below, but otherwise please add my:
> > 
> > Co-developed-by: Toke Høiland-Jørgensen <toke@redhat.com>
> > 
> > should we Cc the next version to linux-api@vger as well just to get a
> > bit more visibility in case others have comments?
> 
> Yes, good idea, will do.
> 
> > 
> > > ---
> > >  Documentation/bpf/kfuncs.rst | 138 +++++++++++++++++++++++++++++++++--
> > >  1 file changed, 133 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/Documentation/bpf/kfuncs.rst b/Documentation/bpf/kfuncs.rst
> > > index 0bd07b39c2a4..4135f3111b67 100644
> > > --- a/Documentation/bpf/kfuncs.rst
> > > +++ b/Documentation/bpf/kfuncs.rst
> > > @@ -13,7 +13,7 @@ BPF Kernel Functions or more commonly known as kfuncs are functions in the Linux
> > >  kernel which are exposed for use by BPF programs. Unlike normal BPF helpers,
> > >  kfuncs do not have a stable interface and can change from one kernel release to
> > >  another. Hence, BPF programs need to be updated in response to changes in the
> > > -kernel.
> > > +kernel. See :ref:`BPF_kfunc_lifecycle_expectations` for more information.
> > >  
> > >  2. Defining a kfunc
> > >  ===================
> > > @@ -238,6 +238,32 @@ single argument which must be a trusted argument or a MEM_RCU pointer.
> > >  The argument may have reference count of 0 and the kfunc must take this
> > >  into consideration.
> > >  
> > > +.. _KF_deprecated_flag:
> > > +
> > > +2.4.9 KF_DEPRECATED flag
> > > +------------------------
> > > +
> > > +The KF_DEPRECATED flag is used for kfuncs which are expected to be changed or
> > > +removed in a subsequent kernel release. Deprecated kfuncs may be removed at any
> > > +time, though if possible (and when applicable), developers are encouraged to
> > > +provide users with a deprecation window to ease the burden of migrating off of
> > > +the kfunc.
> > 
> > 
> > 
> > I think the "may be removed at any time" is a bit odd here. If someone
> > wants to just remove a kfunc, why bother with the deprecation flag at
> > all? Besides, that whole "deprecation is optional" bit is explained
> 
> I definitely agree that the phrasing could be improved, but my intention
> with that phrase was actually to answer the exact nuanced question you
> posed. I think we need to clarify that KF_DEPRECATED is an optional tool
> that developers can use to provide a softer deprecation story to their
> users, rather than a flag that exists as part of a larger, strict
> deprecation policy. Otherwise, I think it would be unclear to someone
> reading this when and why they would need to use the flag. I liked your
> suggestion below, and proposed a bit of extra text to try and capture
> this point. Lmk what you think.
> 
> > below, in this section we're just explaining the flag. So I'd just drop
> > this bit and combine the first two paragraphs as:
> > 
> > "The KF_DEPRECATED flag is used for kfuncs which are scheduled to be
> > changed or removed in a subsequent kernel release. A kfunc that is
> > marked with KF_DEPRECATED should also have any relevant information
> > captured in its kernel doc. Such information typically includes the
> > kfunc's expected remaining lifespan, a recommendation for new
> > functionality that can replace it if any is available, and possibly a
> > rationale for why it is being removed."
> 
> I like this -- are you OK with adding this in a small subsequent
> paragraph to address the point I made above?
> 
> Note that KF_DEPRECATED is simply a tool that developers can choose to
> use to ease their users' burden of migrating off of a kfunc. While
> developers are encouraged to use KF_DEPRECATED to provide a
> transitionary deprecation period to users of their kfunc, doing so is
> not strictly required, and the flag does not exist as part of any larger
> kfunc deprecation policy.

Nevermind, after reading this a few more times, I think what you
proposed above is sufficient. Will not be including this extra paragraph
in v3.

[...]
  

Patch

diff --git a/Documentation/bpf/kfuncs.rst b/Documentation/bpf/kfuncs.rst
index 0bd07b39c2a4..4135f3111b67 100644
--- a/Documentation/bpf/kfuncs.rst
+++ b/Documentation/bpf/kfuncs.rst
@@ -13,7 +13,7 @@  BPF Kernel Functions or more commonly known as kfuncs are functions in the Linux
 kernel which are exposed for use by BPF programs. Unlike normal BPF helpers,
 kfuncs do not have a stable interface and can change from one kernel release to
 another. Hence, BPF programs need to be updated in response to changes in the
-kernel.
+kernel. See :ref:`BPF_kfunc_lifecycle_expectations` for more information.
 
 2. Defining a kfunc
 ===================
@@ -238,6 +238,32 @@  single argument which must be a trusted argument or a MEM_RCU pointer.
 The argument may have reference count of 0 and the kfunc must take this
 into consideration.
 
+.. _KF_deprecated_flag:
+
+2.4.9 KF_DEPRECATED flag
+------------------------
+
+The KF_DEPRECATED flag is used for kfuncs which are expected to be changed or
+removed in a subsequent kernel release. Deprecated kfuncs may be removed at any
+time, though if possible (and when applicable), developers are encouraged to
+provide users with a deprecation window to ease the burden of migrating off of
+the kfunc.
+
+A kfunc that is marked with KF_DEPRECATED should also have any relevant
+information captured in its kernel doc. Such information typically includes the
+kfunc's expected remaining lifespan, a recommendation for new functionality
+that can replace it if any is available, and possibly a rationale for why it is
+being removed.
+
+Note that while on some occasions, a KF_DEPRECATED kfunc may continue to be
+supported and have its KF_DEPRECATED flag removed, it is likely to be far more
+difficult to remove a KF_DEPRECATED flag after it's been added than it is to
+prevent it from being added in the first place. As described in
+:ref:`BPF_kfunc_lifecycle_expectations`, users that rely on specific kfuncs are
+highly encouraged to make their use-cases known as early as possible, and
+participate in upstream discussions regarding whether to keep, change,
+deprecate, or remove those kfuncs if and when such discussions occur.
+
 2.5 Registering the kfuncs
 --------------------------
 
@@ -304,14 +330,116 @@  In order to accommodate such requirements, the verifier will enforce strict
 PTR_TO_BTF_ID type matching if two types have the exact same name, with one
 being suffixed with ``___init``.
 
-3. Core kfuncs
+.. _BPF_kfunc_lifecycle_expectations:
+
+3. kfunc lifecycle expectations
+===============================
+
+kfuncs provide a kernel <-> kernel API, and thus are not bound by any of the
+strict stability restrictions associated with kernel <-> user UAPIs. Instead,
+they're modeled more similarly to EXPORT_SYMBOL_GPL, and can therefore be
+modified or removed by a maintainer of the subsystem they're defined in when
+it's deemed necessary.
+
+Like any other change to the kernel, maintainers will not change or remove a
+kfunc without having a reasonable justification.  Whether or not they'll choose
+to change a kfunc will ultimately depend on a variety of factors, such as how
+widely used the kfunc is, how long the kfunc has been in the kernel, whether an
+alternative kfunc exists, what the norm is in terms of stability for the
+subsystem in question, and of course what the technical cost is of continuing
+to support the kfunc.
+
+There are several implications of this:
+
+a) kfuncs that are widely used or have been in the kernel for a long time will
+   be more difficult to justify being changed or removed by a maintainer. Said
+   in a different way, kfuncs that are known to have a lot of users and provide
+   significant value provide stronger incentives for maintainers to invest the
+   time and complexity in supporting them. It is therefore important for
+   developers that are using kfuncs in their BPF programs to communicate and
+   explain how and why those kfuncs are being used, and to participate in
+   discussions regarding those kfuncs when they occur upstream.
+
+b) Because many BPF programs are not upstreamed as part of the kernel tree, it
+   is often not possible to change them in-place when a kfunc changes, as it is
+   for e.g. an upstreamed driver being updated in place when an
+   EXPORT_SYMBOL_GPL symbol is changed. Distributions that bundle BPF programs
+   that use kfuncs must therefore ensure that those BPF programs are linking
+   against the kfuncs that are supported by the kernel version being used for
+   any given release. Additionally, BPF developers are encouraged to upstream
+   their BPF programs so they can enjoy the same benefits as upstreamed
+   modules, and avoid code churn.
+
+   On the other hand, while the hope is that it will become the norm to
+   upstream BPF programs, the reality is that most BPF programs are still
+   out-of-tree. This means that users with out-of-tree BPF programs that use
+   kfuncs should be considered relevant to discussions and decisions around
+   modifying and removing kfuncs, despite that not being the norm for
+   out-of-tree kernel modules. The BPF community will take an active role in
+   participating in upstream discussions when necessary to ensure that the
+   perspectives of such users are taken into account.
+
+c) A kfunc will never have any hard stability guarantees. BPF APIs cannot and
+   will not ever hard-block a change in the kernel purely for stability
+   reasons. In other words, kfuncs have the same stability guarantees as any
+   other kernel API, such as those provided by EXPORT_SYMBOL_GPL, though with
+   perhaps less burden than EXPORT_SYMBOL_GPL changes thanks to BPF CO-RE.
+
+   That being said, kfuncs are features that are meant to solve problems and
+   provide value to users. The decision of whether to change or remove a kfunc
+   is a multivariate technical decision that is made on a case-by-case basis,
+   and which is informed by data points such as those mentioned above. It is
+   expected that a kfunc being removed or changed with no warning will not be a
+   common occurrence or take place without sound justification, but it is a
+   possibility that must be accepted if one is to use kfuncs.
+
+3.1 kfunc deprecation
+---------------------
+
+As described above, while sometimes a maintainer may find that a kfunc must be
+changed or removed immediately to accommodate some changes in their subsystem,
+other kfuncs may be able to accommodate a longer and more measured deprecation
+process. For example, if a new kfunc comes along which provides superior
+functionality to an existing kfunc, the existing kfunc may be deprecated for
+some period of time to allow users to migrate their BPF programs to use the new
+one. Or, if a kfunc has no known users, a decision may be made to remove the
+kfunc (without providing an alternative API) after some deprecation period
+period so as to provide users with a window to notify the kfunc maintainer if
+it turns out that the kfunc is actually being used.
+
+kfuncs being deprecated (rather than changed or removed with no warning) is
+expected to be the common case, and as described in :ref:`KF_deprecated_flag`,
+the kfunc framework provides the KF_DEPRECATED flag to kfunc developers to
+signal to users that a kfunc has been deprecated. Once a kfunc has been marked
+with KF_DEPRECATED, the following procedure is followed for removal:
+
+1. Any relevant information for deprecated kfuncs is documented in the kfunc's
+   kernel docs. This documentation will typically include the kfunc's expected
+   remaining lifespan,  a recommendation for new functionality that can replace
+   the usage of the deprecated function (or an explanation as to why no such
+   replacement exists), etc.
+
+2. The deprecated kfunc is kept in the kernel for some period of time after it
+   was first marked as deprecated. This time period will be chosen on a
+   case-by-case basis, and will typically depend on how widespread the use of
+   the kfunc is, how long it has been in the kernel, and how hard it is to move
+   to alternatives. This deprecation time period is "best effort", and as
+   described :ref:`above<BPF_kfunc_lifecycle_expectations>`, circumstances may
+   sometimes dictate that the kfunc be removed before the full intended
+   deprecation period has elapsed.
+
+3. After the deprecation period, or sometimes earlier if necessary, the kfunc
+   will be removed. At this point, BPF programs calling the kfunc will be
+   rejected by the verifier.
+
+4. Core kfuncs
 ==============
 
 The BPF subsystem provides a number of "core" kfuncs that are potentially
 applicable to a wide variety of different possible use cases and programs.
 Those kfuncs are documented here.
 
-3.1 struct task_struct * kfuncs
+4.1 struct task_struct * kfuncs
 -------------------------------
 
 There are a number of kfuncs that allow ``struct task_struct *`` objects to be
@@ -387,7 +515,7 @@  Here is an example of it being used:
 		return 0;
 	}
 
-3.2 struct cgroup * kfuncs
+4.2 struct cgroup * kfuncs
 --------------------------
 
 ``struct cgroup *`` objects also have acquire and release functions:
@@ -502,7 +630,7 @@  the verifier. bpf_cgroup_ancestor() can be used as follows:
 		return 0;
 	}
 
-3.3 struct cpumask * kfuncs
+4.3 struct cpumask * kfuncs
 ---------------------------
 
 BPF provides a set of kfuncs that can be used to query, allocate, mutate, and