[ipsec-next,v2,3/6] libbpf: Add BPF_CORE_WRITE_BITFIELD() macro

Message ID ed7920365daf5eff1c82892b57e918d3db786ac7.1701193577.git.dxu@dxuuu.xyz
State New
Series Add bpf_xdp_get_xfrm_state() kfunc

Commit Message

Daniel Xu Nov. 28, 2023, 5:54 p.m. UTC
Similar to reading from CO-RE bitfields, we need a CO-RE aware bitfield
writing wrapper to make the verifier happy.

Two alternatives to this approach are:

1. Use the upcoming `preserve_static_offset` [0] attribute to disable
   CO-RE on specific structs.
2. Use broader byte-sized writes to write to bitfields.

(1) is a bit hard to use. It requires specific and
not-very-obvious annotations to bpftool generated vmlinux.h. It's also
not generally available in released LLVM versions yet.

(2) makes the code quite hard to read and write. And especially if
BPF_CORE_READ_BITFIELD() is already being used, it makes more sense to
have an inverse helper for writing.

[0]: https://reviews.llvm.org/D133361
From: Eduard Zingerman <eddyz87@gmail.com>

Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
---
 tools/lib/bpf/bpf_core_read.h | 36 +++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)
  

Comments

Eduard Zingerman Nov. 28, 2023, 5:59 p.m. UTC | #1
On Tue, 2023-11-28 at 10:54 -0700, Daniel Xu wrote:
> Similar to reading from CO-RE bitfields, we need a CO-RE aware bitfield
> writing wrapper to make the verifier happy.
> 
> Two alternatives to this approach are:
> 
> 1. Use the upcoming `preserve_static_offset` [0] attribute to disable
>    CO-RE on specific structs.
> 2. Use broader byte-sized writes to write to bitfields.
> 
> (1) is a bit hard to use. It requires specific and
> not-very-obvious annotations to bpftool generated vmlinux.h. It's also
> not generally available in released LLVM versions yet.
> 
> (2) makes the code quite hard to read and write. And especially if
> BPF_CORE_READ_BITFIELD() is already being used, it makes more sense to
> have an inverse helper for writing.
> 
> [0]: https://reviews.llvm.org/D133361
> From: Eduard Zingerman <eddyz87@gmail.com>
> 
> Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
> ---

Could you please also add a selftest (or several) using __retval()
annotation for this macro?
  
Daniel Xu Nov. 28, 2023, 7:15 p.m. UTC | #2
On Tue, Nov 28, 2023 at 07:59:01PM +0200, Eduard Zingerman wrote:
> On Tue, 2023-11-28 at 10:54 -0700, Daniel Xu wrote:
> > Similar to reading from CO-RE bitfields, we need a CO-RE aware bitfield
> > writing wrapper to make the verifier happy.
> > 
> > Two alternatives to this approach are:
> > 
> > 1. Use the upcoming `preserve_static_offset` [0] attribute to disable
> >    CO-RE on specific structs.
> > 2. Use broader byte-sized writes to write to bitfields.
> > 
> > (1) is a bit hard to use. It requires specific and
> > not-very-obvious annotations to bpftool generated vmlinux.h. It's also
> > not generally available in released LLVM versions yet.
> > 
> > (2) makes the code quite hard to read and write. And especially if
> > BPF_CORE_READ_BITFIELD() is already being used, it makes more sense to
> > have an inverse helper for writing.
> > 
> > [0]: https://reviews.llvm.org/D133361
> > From: Eduard Zingerman <eddyz87@gmail.com>
> > 
> > Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
> > ---
> 
> Could you please also add a selftest (or several) using __retval()
> annotation for this macro?

Sure, I'll take a look.

Thanks,
Daniel
  
Daniel Xu Dec. 1, 2023, 1:33 a.m. UTC | #3
On Tue, Nov 28, 2023 at 07:59:01PM +0200, Eduard Zingerman wrote:
> On Tue, 2023-11-28 at 10:54 -0700, Daniel Xu wrote:
> > Similar to reading from CO-RE bitfields, we need a CO-RE aware bitfield
> > writing wrapper to make the verifier happy.
> > 
> > Two alternatives to this approach are:
> > 
> > 1. Use the upcoming `preserve_static_offset` [0] attribute to disable
> >    CO-RE on specific structs.
> > 2. Use broader byte-sized writes to write to bitfields.
> > 
> > (1) is a bit hard to use. It requires specific and
> > not-very-obvious annotations to bpftool generated vmlinux.h. It's also
> > not generally available in released LLVM versions yet.
> > 
> > (2) makes the code quite hard to read and write. And especially if
> > BPF_CORE_READ_BITFIELD() is already being used, it makes more sense to
> > have an inverse helper for writing.
> > 
> > [0]: https://reviews.llvm.org/D133361
> > From: Eduard Zingerman <eddyz87@gmail.com>
> > 
> > Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
> > ---
> 
> Could you please also add a selftest (or several) using __retval()
> annotation for this macro?

Good call about adding tests -- I found a few bugs with the code from
the other thread. But boy did they take a lot of brain cells to figure
out.

There was some 6th grade algebra involved too -- I'll do my best to
explain it in the commit msg for v3.


Here are the fixes in case you are curious:

diff --git a/tools/lib/bpf/bpf_core_read.h b/tools/lib/bpf/bpf_core_read.h
index 7a764f65d299..8f02c558c0ff 100644
--- a/tools/lib/bpf/bpf_core_read.h
+++ b/tools/lib/bpf/bpf_core_read.h
@@ -120,7 +120,9 @@ enum bpf_enum_value_kind {
        unsigned int byte_size = __CORE_RELO(s, field, BYTE_SIZE);      \
        unsigned int lshift = __CORE_RELO(s, field, LSHIFT_U64);        \
        unsigned int rshift = __CORE_RELO(s, field, RSHIFT_U64);        \
-       unsigned int bit_size = (rshift - lshift);                      \
+       unsigned int bit_size = (64 - rshift);                          \
+       unsigned int hi_size = lshift;                                  \
+       unsigned int lo_size = (rshift - lshift);                       \
        unsigned long long nval, val, hi, lo;                           \
                                                                        \
        asm volatile("" : "+r"(p));                                     \
@@ -131,13 +133,13 @@ enum bpf_enum_value_kind {
        case 4: val = *(unsigned int *)p; break;                        \
        case 8: val = *(unsigned long long *)p; break;                  \
        }                                                               \
-       hi = val >> (bit_size + rshift);                                \
-       hi <<= bit_size + rshift;                                       \
-       lo = val << (bit_size + lshift);                                \
-       lo >>= bit_size + lshift;                                       \
+       hi = val >> (64 - hi_size);                                     \
+       hi <<= 64 - hi_size;                                            \
+       lo = val << (64 - lo_size);                                     \
+       lo >>= 64 - lo_size;                                            \
        nval = new_val;                                                 \
-       nval <<= lshift;                                                \
-       nval >>= rshift;                                                \
+       nval <<= (64 - bit_size);                                       \
+       nval >>= (64 - bit_size - lo_size);                             \
        val = hi | nval | lo;                                           \
        switch (byte_size) {                                            \
        case 1: *(unsigned char *)p      = val; break;                  \


Thanks,
Daniel
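
For readers following the arithmetic in the fix above, here is a minimal
userspace sketch of the same hi/nval/lo recombination. It is not the libbpf
macro: lshift_u64() and rshift_u64() are hypothetical stand-ins for the
LSHIFT_U64/RSHIFT_U64 relocations and assume a little-endian layout, and the
shl()/shr() helpers are an addition of this sketch, because shifting a 64-bit
value by 64 (which happens when hi_size or lo_size is 0) is undefined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the CO-RE relocation values, assuming a
 * little-endian target and a field of bit_size bits that starts
 * bit_off bits above the least significant bit of the loaded word. */
static unsigned int lshift_u64(unsigned int bit_off, unsigned int bit_size)
{
	return 64 - (bit_off + bit_size);
}

static unsigned int rshift_u64(unsigned int bit_size)
{
	return 64 - bit_size;
}

/* Shifts that treat a count of 64 as "every bit shifted out". */
static uint64_t shl(uint64_t v, unsigned int n) { return n >= 64 ? 0 : v << n; }
static uint64_t shr(uint64_t v, unsigned int n) { return n >= 64 ? 0 : v >> n; }

/* Replace the field inside val with the low bit_size bits of new_val. */
static uint64_t write_bitfield_split(uint64_t val, unsigned int bit_off,
				     unsigned int bit_size, uint64_t new_val)
{
	unsigned int lshift, rshift, hi_size, lo_size;
	uint64_t hi, lo, nval;

	assert(bit_size >= 1 && bit_off + bit_size <= 64);

	lshift = lshift_u64(bit_off, bit_size);
	rshift = rshift_u64(bit_size);
	hi_size = lshift;		/* bits above the field */
	lo_size = rshift - lshift;	/* bits below the field */

	/* keep only the bits above the field */
	hi = shl(shr(val, 64 - hi_size), 64 - hi_size);
	/* keep only the bits below the field */
	lo = shr(shl(val, 64 - lo_size), 64 - lo_size);
	/* truncate new_val to bit_size bits, then move it into position */
	nval = shr(shl(new_val, 64 - bit_size), 64 - bit_size - lo_size);

	return hi | nval | lo;
}
```

With a 3-bit field at bit offset 4, lshift is 57 and rshift is 61, so
hi_size is 57, lo_size is 4 and bit_size is 3, which add up to 64.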
  
Eduard Zingerman Dec. 1, 2023, 4:13 p.m. UTC | #4
On Thu, 2023-11-30 at 18:33 -0700, Daniel Xu wrote:
[...]
> Good call about adding tests -- I found a few bugs with the code from
> the other thread. But boy did they take a lot of brain cells to figure
> out.
> 
> There was some 6th grade algebra involved too -- I'll do my best to
> explain it in the commit msg for v3.
> 
> Here are the fixes in case you are curious:

Ouch, I knew my code from 3am can't be trusted, sorry for that.
Your math seems to make sense, thank you.

[...]
  
Andrii Nakryiko Dec. 1, 2023, 7:11 p.m. UTC | #5
On Thu, Nov 30, 2023 at 5:33 PM Daniel Xu <dxu@dxuuu.xyz> wrote:
>
> On Tue, Nov 28, 2023 at 07:59:01PM +0200, Eduard Zingerman wrote:
> > On Tue, 2023-11-28 at 10:54 -0700, Daniel Xu wrote:
> > > Similar to reading from CO-RE bitfields, we need a CO-RE aware bitfield
> > > writing wrapper to make the verifier happy.
> > >
> > > Two alternatives to this approach are:
> > >
> > > 1. Use the upcoming `preserve_static_offset` [0] attribute to disable
> > >    CO-RE on specific structs.
> > > 2. Use broader byte-sized writes to write to bitfields.
> > >
> > > (1) is a bit hard to use. It requires specific and
> > > not-very-obvious annotations to bpftool generated vmlinux.h. It's also
> > > not generally available in released LLVM versions yet.
> > >
> > > (2) makes the code quite hard to read and write. And especially if
> > > BPF_CORE_READ_BITFIELD() is already being used, it makes more sense to
> > > have an inverse helper for writing.
> > >
> > > [0]: https://reviews.llvm.org/D133361
> > > From: Eduard Zingerman <eddyz87@gmail.com>
> > >
> > > Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
> > > ---
> >
> > Could you please also add a selftest (or several) using __retval()
> > annotation for this macro?
>
> Good call about adding tests -- I found a few bugs with the code from
> the other thread. But boy did they take a lot of brain cells to figure
> out.
>
> There was some 6th grade algebra involved too -- I'll do my best to
> explain it in the commit msg for v3.
>
>
> Here are the fixes in case you are curious:
>
> diff --git a/tools/lib/bpf/bpf_core_read.h b/tools/lib/bpf/bpf_core_read.h
> index 7a764f65d299..8f02c558c0ff 100644
> --- a/tools/lib/bpf/bpf_core_read.h
> +++ b/tools/lib/bpf/bpf_core_read.h
> @@ -120,7 +120,9 @@ enum bpf_enum_value_kind {
>         unsigned int byte_size = __CORE_RELO(s, field, BYTE_SIZE);      \
>         unsigned int lshift = __CORE_RELO(s, field, LSHIFT_U64);        \
>         unsigned int rshift = __CORE_RELO(s, field, RSHIFT_U64);        \
> -       unsigned int bit_size = (rshift - lshift);                      \
> +       unsigned int bit_size = (64 - rshift);                          \
> +       unsigned int hi_size = lshift;                                  \
> +       unsigned int lo_size = (rshift - lshift);                       \

nit: let's drop unnecessary ()

>         unsigned long long nval, val, hi, lo;                           \
>                                                                         \
>         asm volatile("" : "+r"(p));                                     \
> @@ -131,13 +133,13 @@ enum bpf_enum_value_kind {
>         case 4: val = *(unsigned int *)p; break;                        \
>         case 8: val = *(unsigned long long *)p; break;                  \
>         }                                                               \
> -       hi = val >> (bit_size + rshift);                                \
> -       hi <<= bit_size + rshift;                                       \
> -       lo = val << (bit_size + lshift);                                \
> -       lo >>= bit_size + lshift;                                       \
> +       hi = val >> (64 - hi_size);                                     \
> +       hi <<= 64 - hi_size;                                            \
> +       lo = val << (64 - lo_size);                                     \
> +       lo >>= 64 - lo_size;                                            \
>         nval = new_val;                                                 \
> -       nval <<= lshift;                                                \
> -       nval >>= rshift;                                                \
> +       nval <<= (64 - bit_size);                                       \
> +       nval >>= (64 - bit_size - lo_size);                             \
>         val = hi | nval | lo;                                           \

this looks.. unusual. I'd imagine we calculate a mask, mask out bits
we are replacing, and then OR with new values, roughly (assuming all
the right left/right shift values and stuff)

/* clear bits */
val &= ~(bitfield_mask << shift);
/* set bits */
val |= (nval & bitfield_mask) << shift;

?

>         switch (byte_size) {                                            \
>         case 1: *(unsigned char *)p      = val; break;                  \
>
>
> Thanks,
> Daniel
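
The clear-then-set idea suggested above can be sketched in plain C as follows.
This is only an illustration under stated assumptions: set_field() is a
hypothetical helper, shift is taken to be the field's offset from the least
significant bit, and bit_size is passed in directly rather than derived from
CO-RE relocations.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: overwrite bit_size bits of val, starting shift
 * bits above the least significant bit, with the low bits of nval. */
static uint64_t set_field(uint64_t val, unsigned int shift,
			  unsigned int bit_size, uint64_t nval)
{
	uint64_t bitfield_mask;

	assert(bit_size >= 1 && bit_size <= 64 && shift <= 64 - bit_size);

	/* bit_size low bits set; built with a right shift so that a
	 * 64-bit field does not need an undefined shift by 64 */
	bitfield_mask = ~0ULL >> (64 - bit_size);

	/* clear bits */
	val &= ~(bitfield_mask << shift);
	/* set bits */
	val |= (nval & bitfield_mask) << shift;

	return val;
}
```

Masking nval before shifting means an oversized new value cannot spill into
neighbouring fields.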
  
Andrii Nakryiko Dec. 1, 2023, 7:13 p.m. UTC | #6
On Fri, Dec 1, 2023 at 11:11 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Nov 30, 2023 at 5:33 PM Daniel Xu <dxu@dxuuu.xyz> wrote:
> >
> > On Tue, Nov 28, 2023 at 07:59:01PM +0200, Eduard Zingerman wrote:
> > > On Tue, 2023-11-28 at 10:54 -0700, Daniel Xu wrote:
> > > > Similar to reading from CO-RE bitfields, we need a CO-RE aware bitfield
> > > > writing wrapper to make the verifier happy.
> > > >
> > > > Two alternatives to this approach are:
> > > >
> > > > 1. Use the upcoming `preserve_static_offset` [0] attribute to disable
> > > >    CO-RE on specific structs.
> > > > 2. Use broader byte-sized writes to write to bitfields.
> > > >
> > > > (1) is a bit hard to use. It requires specific and
> > > > not-very-obvious annotations to bpftool generated vmlinux.h. It's also
> > > > not generally available in released LLVM versions yet.
> > > >
> > > > (2) makes the code quite hard to read and write. And especially if
> > > > BPF_CORE_READ_BITFIELD() is already being used, it makes more sense to
> > > > have an inverse helper for writing.
> > > >
> > > > [0]: https://reviews.llvm.org/D133361
> > > > From: Eduard Zingerman <eddyz87@gmail.com>
> > > >
> > > > Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
> > > > ---
> > >
> > > Could you please also add a selftest (or several) using __retval()
> > > annotation for this macro?
> >
> > Good call about adding tests -- I found a few bugs with the code from
> > the other thread. But boy did they take a lot of brain cells to figure
> > out.
> >
> > There was some 6th grade algebra involved too -- I'll do my best to
> > explain it in the commit msg for v3.
> >
> >
> > Here are the fixes in case you are curious:
> >
> > diff --git a/tools/lib/bpf/bpf_core_read.h b/tools/lib/bpf/bpf_core_read.h
> > index 7a764f65d299..8f02c558c0ff 100644
> > --- a/tools/lib/bpf/bpf_core_read.h
> > +++ b/tools/lib/bpf/bpf_core_read.h
> > @@ -120,7 +120,9 @@ enum bpf_enum_value_kind {
> >         unsigned int byte_size = __CORE_RELO(s, field, BYTE_SIZE);      \
> >         unsigned int lshift = __CORE_RELO(s, field, LSHIFT_U64);        \
> >         unsigned int rshift = __CORE_RELO(s, field, RSHIFT_U64);        \
> > -       unsigned int bit_size = (rshift - lshift);                      \
> > +       unsigned int bit_size = (64 - rshift);                          \
> > +       unsigned int hi_size = lshift;                                  \
> > +       unsigned int lo_size = (rshift - lshift);                       \
>
> nit: let's drop unnecessary ()
>
> >         unsigned long long nval, val, hi, lo;                           \
> >                                                                         \
> >         asm volatile("" : "+r"(p));                                     \
> > @@ -131,13 +133,13 @@ enum bpf_enum_value_kind {
> >         case 4: val = *(unsigned int *)p; break;                        \
> >         case 8: val = *(unsigned long long *)p; break;                  \
> >         }                                                               \
> > -       hi = val >> (bit_size + rshift);                                \
> > -       hi <<= bit_size + rshift;                                       \
> > -       lo = val << (bit_size + lshift);                                \
> > -       lo >>= bit_size + lshift;                                       \
> > +       hi = val >> (64 - hi_size);                                     \
> > +       hi <<= 64 - hi_size;                                            \
> > +       lo = val << (64 - lo_size);                                     \
> > +       lo >>= 64 - lo_size;                                            \
> >         nval = new_val;                                                 \
> > -       nval <<= lshift;                                                \
> > -       nval >>= rshift;                                                \
> > +       nval <<= (64 - bit_size);                                       \
> > +       nval >>= (64 - bit_size - lo_size);                             \
> >         val = hi | nval | lo;                                           \
>
> this looks.. unusual. I'd imagine we calculate a mask, mask out bits
> we are replacing, and then OR with new values, roughly (assuming all
> the right left/right shift values and stuff)
>
> /* clear bits */
> val &= ~(bitfield_mask << shift);

we can also calculate shifted mask with just

bitfield_mask = (-1ULL) << some_left_shift >> some_right_shift;
val &= ~bitfield_mask;

> /* set bits */
> val |= (nval & bitfield_mask) << shift;
>
> ?
>
> >         switch (byte_size) {                                            \
> >         case 1: *(unsigned char *)p      = val; break;                  \
> >
> >
> > Thanks,
> > Daniel
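
One way to read the pre-shifted mask suggested above is to build it directly
from the two relocation shifts. The sketch below assumes that
some_left_shift is rshift and some_right_shift is lshift on a little-endian
layout. That mapping, the rpad variable, and the function name are
assumptions of this sketch and are not stated in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: lshift and rshift play the role of the
 * LSHIFT_U64 and RSHIFT_U64 relocation values for the target field. */
static uint64_t write_bitfield_masked(uint64_t val, unsigned int lshift,
				      unsigned int rshift, uint64_t new_val)
{
	unsigned int rpad;
	uint64_t mask;

	assert(lshift <= rshift && rshift < 64);

	/* number of bits below the field */
	rpad = rshift - lshift;
	/* ones exactly over the field, already in position */
	mask = (~0ULL << rshift) >> lshift;

	/* clear the field, then OR in the new value moved into place */
	return (val & ~mask) | ((new_val << rpad) & mask);
}
```

For a 3-bit field at bit offset 4 (lshift 57, rshift 61), the mask comes out
as 0x70. Both shift counts stay below 64, so no shift is undefined.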
  
Daniel Xu Dec. 1, 2023, 8:05 p.m. UTC | #7
On Fri, Dec 01, 2023 at 11:13:13AM -0800, Andrii Nakryiko wrote:
> On Fri, Dec 1, 2023 at 11:11 AM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Thu, Nov 30, 2023 at 5:33 PM Daniel Xu <dxu@dxuuu.xyz> wrote:
> > >
> > > On Tue, Nov 28, 2023 at 07:59:01PM +0200, Eduard Zingerman wrote:
> > > > On Tue, 2023-11-28 at 10:54 -0700, Daniel Xu wrote:
> > > > > Similar to reading from CO-RE bitfields, we need a CO-RE aware bitfield
> > > > > writing wrapper to make the verifier happy.
> > > > >
> > > > > Two alternatives to this approach are:
> > > > >
> > > > > 1. Use the upcoming `preserve_static_offset` [0] attribute to disable
> > > > >    CO-RE on specific structs.
> > > > > 2. Use broader byte-sized writes to write to bitfields.
> > > > >
> > > > > (1) is a bit hard to use. It requires specific and
> > > > > not-very-obvious annotations to bpftool generated vmlinux.h. It's also
> > > > > not generally available in released LLVM versions yet.
> > > > >
> > > > > (2) makes the code quite hard to read and write. And especially if
> > > > > BPF_CORE_READ_BITFIELD() is already being used, it makes more sense to
> > > > > have an inverse helper for writing.
> > > > >
> > > > > [0]: https://reviews.llvm.org/D133361
> > > > > From: Eduard Zingerman <eddyz87@gmail.com>
> > > > >
> > > > > Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
> > > > > ---
> > > >
> > > > Could you please also add a selftest (or several) using __retval()
> > > > annotation for this macro?
> > >
> > > Good call about adding tests -- I found a few bugs with the code from
> > > the other thread. But boy did they take a lot of brain cells to figure
> > > out.
> > >
> > > There was some 6th grade algebra involved too -- I'll do my best to
> > > explain it in the commit msg for v3.
> > >
> > >
> > > Here are the fixes in case you are curious:
> > >
> > > diff --git a/tools/lib/bpf/bpf_core_read.h b/tools/lib/bpf/bpf_core_read.h
> > > index 7a764f65d299..8f02c558c0ff 100644
> > > --- a/tools/lib/bpf/bpf_core_read.h
> > > +++ b/tools/lib/bpf/bpf_core_read.h
> > > @@ -120,7 +120,9 @@ enum bpf_enum_value_kind {
> > >         unsigned int byte_size = __CORE_RELO(s, field, BYTE_SIZE);      \
> > >         unsigned int lshift = __CORE_RELO(s, field, LSHIFT_U64);        \
> > >         unsigned int rshift = __CORE_RELO(s, field, RSHIFT_U64);        \
> > > -       unsigned int bit_size = (rshift - lshift);                      \
> > > +       unsigned int bit_size = (64 - rshift);                          \
> > > +       unsigned int hi_size = lshift;                                  \
> > > +       unsigned int lo_size = (rshift - lshift);                       \
> >
> > nit: let's drop unnecessary ()
> >
> > >         unsigned long long nval, val, hi, lo;                           \
> > >                                                                         \
> > >         asm volatile("" : "+r"(p));                                     \
> > > @@ -131,13 +133,13 @@ enum bpf_enum_value_kind {
> > >         case 4: val = *(unsigned int *)p; break;                        \
> > >         case 8: val = *(unsigned long long *)p; break;                  \
> > >         }                                                               \
> > > -       hi = val >> (bit_size + rshift);                                \
> > > -       hi <<= bit_size + rshift;                                       \
> > > -       lo = val << (bit_size + lshift);                                \
> > > -       lo >>= bit_size + lshift;                                       \
> > > +       hi = val >> (64 - hi_size);                                     \
> > > +       hi <<= 64 - hi_size;                                            \
> > > +       lo = val << (64 - lo_size);                                     \
> > > +       lo >>= 64 - lo_size;                                            \
> > >         nval = new_val;                                                 \
> > > -       nval <<= lshift;                                                \
> > > -       nval >>= rshift;                                                \
> > > +       nval <<= (64 - bit_size);                                       \
> > > +       nval >>= (64 - bit_size - lo_size);                             \
> > >         val = hi | nval | lo;                                           \
> >
> > this looks.. unusual. I'd imagine we calculate a mask, mask out bits
> > we are replacing, and then OR with new values, roughly (assuming all
> > the right left/right shift values and stuff)
> >
> > /* clear bits */
> > val &= ~(bitfield_mask << shift);
> 
> we can also calculate shifted mask with just
> 
> bitfield_mask = (-1ULL) << some_left_shift >> some_right_shift;
> val &= ~bitfield_mask;

Yeah I was chatting w/ JonathanL about this and I've got basically that
code ready to send for v3.

Thanks,
Daniel
  

Patch

diff --git a/tools/lib/bpf/bpf_core_read.h b/tools/lib/bpf/bpf_core_read.h
index 1ac57bb7ac55..7a764f65d299 100644
--- a/tools/lib/bpf/bpf_core_read.h
+++ b/tools/lib/bpf/bpf_core_read.h
@@ -111,6 +111,42 @@  enum bpf_enum_value_kind {
 	val;								      \
 })
 
+/*
+ * Write to a bitfield, identified by s->field.
+ * This is the inverse of BPF_CORE_READ_BITFIELD().
+ */
+#define BPF_CORE_WRITE_BITFIELD(s, field, new_val) ({			\
+	void *p = (void *)s + __CORE_RELO(s, field, BYTE_OFFSET);	\
+	unsigned int byte_size = __CORE_RELO(s, field, BYTE_SIZE);	\
+	unsigned int lshift = __CORE_RELO(s, field, LSHIFT_U64);	\
+	unsigned int rshift = __CORE_RELO(s, field, RSHIFT_U64);	\
+	unsigned int bit_size = (rshift - lshift);			\
+	unsigned long long nval, val, hi, lo;				\
+									\
+	asm volatile("" : "+r"(p));					\
+									\
+	switch (byte_size) {						\
+	case 1: val = *(unsigned char *)p; break;			\
+	case 2: val = *(unsigned short *)p; break;			\
+	case 4: val = *(unsigned int *)p; break;			\
+	case 8: val = *(unsigned long long *)p; break;			\
+	}								\
+	hi = val >> (bit_size + rshift);				\
+	hi <<= bit_size + rshift;					\
+	lo = val << (bit_size + lshift);				\
+	lo >>= bit_size + lshift;					\
+	nval = new_val;							\
+	nval <<= lshift;						\
+	nval >>= rshift;						\
+	val = hi | nval | lo;						\
+	switch (byte_size) {						\
+	case 1: *(unsigned char *)p      = val; break;			\
+	case 2: *(unsigned short *)p     = val; break;			\
+	case 4: *(unsigned int *)p       = val; break;			\
+	case 8: *(unsigned long long *)p = val; break;			\
+	}								\
+})
+
 #define ___bpf_field_ref1(field)	(field)
 #define ___bpf_field_ref2(type, field)	(((typeof(type) *)0)->field)
 #define ___bpf_field_ref(args...)					    \