[v5,2/5] lib/test_bitmap: add tests for bitmap_{read,write}()

Message ID 20230922080848.1261487-3-glider@google.com
State New
Headers
Series Implement MTE tag compression for swapped pages |

Commit Message

Alexander Potapenko Sept. 22, 2023, 8:08 a.m. UTC
  Add basic tests ensuring that values can be added at arbitrary positions
of the bitmap, including those spanning into the adjacent unsigned
longs.

Signed-off-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>

---
This patch was previously called
"lib/test_bitmap: add tests for bitmap_{set,get}_value()"
(https://lore.kernel.org/lkml/20230720173956.3674987-3-glider@google.com/)
and
"lib/test_bitmap: add tests for bitmap_{set,get}_value_unaligned"
(https://lore.kernel.org/lkml/20230713125706.2884502-3-glider@google.com/)

v5:
 - update patch title
 - address Yury Norov's comments:
   - rename the test cases
   - factor out test_bitmap_write_helper() to test writing over
     different background patterns;
   - add a test case copying a nontrivial value bit-by-bit;
   - drop volatile

v4:
 - Address comments by Andy Shevchenko: added Reviewed-by: and a link to
   the previous discussion
 - Address comments by Yury Norov:
   - expand the bitmap to catch more corner cases
   - add code testing that bitmap_set_value() does not touch adjacent
     bits
   - add code testing the nbits==0 case
   - rename bitmap_{get,set}_value() to bitmap_{read,write}()

v3:
 - switch to using bitmap_{set,get}_value()
 - change the expected bit pattern in test_set_get_value(),
   as the test was incorrectly assuming 0 is the LSB.
---
 lib/test_bitmap.c | 115 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 115 insertions(+)
  

Comments

Alexander Potapenko Sept. 25, 2023, 12:16 p.m. UTC | #1
>
> +/*
> + * Test bitmap should be big enough to include the cases when start is not in
> + * the first word, and start+nbits lands in the following word.
> + */
> +#define TEST_BIT_LEN (1000)

Dunno why this didn't fire previously, but CONFIG_CPU_BIG_ENDIAN=y
kernel reports mismatches here, presumably because the last quad word
ends up partially initialized.
The easiest fix is to make TEST_BIT_LEN divisible by 64 (e.g. 1024)
  
Andy Shevchenko Sept. 25, 2023, 12:23 p.m. UTC | #2
On Mon, Sep 25, 2023 at 02:16:37PM +0200, Alexander Potapenko wrote:

...

> > +/*
> > + * Test bitmap should be big enough to include the cases when start is not in
> > + * the first word, and start+nbits lands in the following word.
> > + */
> > +#define TEST_BIT_LEN (1000)
> 
> Dunno why this didn't fire previously, but CONFIG_CPU_BIG_ENDIAN=y
> kernel reports mismatches here, presumably because the last quad word
> ends up partially initialized.

Hmm... But if designed and used correctly it shouldn't be the issue,
and 1000, I believe, is carefully chosen to be specifically not dividable
by pow-of-2 value.

> The easiest fix is to make TEST_BIT_LEN divisible by 64 (e.g. 1024)

Wouldn't it hide the real issue somewhere?
  
Alexander Potapenko Sept. 25, 2023, 1:09 p.m. UTC | #3
On Mon, Sep 25, 2023 at 2:23 PM Andy Shevchenko
<andriy.shevchenko@linux.intel.com> wrote:
>
> On Mon, Sep 25, 2023 at 02:16:37PM +0200, Alexander Potapenko wrote:
>
> ...
>
> > > +/*
> > > + * Test bitmap should be big enough to include the cases when start is not in
> > > + * the first word, and start+nbits lands in the following word.
> > > + */
> > > +#define TEST_BIT_LEN (1000)
> >
> > Dunno why this didn't fire previously, but CONFIG_CPU_BIG_ENDIAN=y
> > kernel reports mismatches here, presumably because the last quad word
> > ends up partially initialized.
>
> Hmm... But if designed and used correctly it shouldn't be the issue,
> and 1000, I believe, is carefully chosen to be specifically not dividable
> by pow-of-2 value.
>

The problem manifests already right after initialization:

static void __init test_bit_len_1000(void)
{
        DECLARE_BITMAP(bitmap, TEST_BIT_LEN);
        DECLARE_BITMAP(exp_bitmap, TEST_BIT_LEN);
        memset(bitmap, 0x00, TEST_BYTE_LEN);
        memset(exp_bitmap, 0x00, TEST_BYTE_LEN);
        expect_eq_bitmap(exp_bitmap, bitmap, TEST_BIT_LEN);
}

...
[   29.601614][    T1] test_bitmap: [lib/test_bitmap.c:1250] bitmaps
contents differ: expected
"960-963,966-967,969,971-973,976,978-979,981", got "963"
...

So it's probably expect_eq_bitmap() that is incorrectly rounding up
the bitmap length somewhere (or maybe it is not supposed to be used
for non-aligned bitmaps?)
Looking further...
  
Alexander Potapenko Sept. 25, 2023, 2:54 p.m. UTC | #4
On Mon, Sep 25, 2023 at 3:09 PM Alexander Potapenko <glider@google.com> wrote:
>
> On Mon, Sep 25, 2023 at 2:23 PM Andy Shevchenko
> <andriy.shevchenko@linux.intel.com> wrote:
> >
> > On Mon, Sep 25, 2023 at 02:16:37PM +0200, Alexander Potapenko wrote:
> >
> > ...
> >
> > > > +/*
> > > > + * Test bitmap should be big enough to include the cases when start is not in
> > > > + * the first word, and start+nbits lands in the following word.
> > > > + */
> > > > +#define TEST_BIT_LEN (1000)
> > >
> > > Dunno why this didn't fire previously, but CONFIG_CPU_BIG_ENDIAN=y
> > > kernel reports mismatches here, presumably because the last quad word
> > > ends up partially initialized.
> >
> > Hmm... But if designed and used correctly it shouldn't be the issue,
> > and 1000, I believe, is carefully chosen to be specifically not dividable
> > by pow-of-2 value.
> >
>
> The problem manifests already right after initialization:
>
> static void __init test_bit_len_1000(void)
> {
>         DECLARE_BITMAP(bitmap, TEST_BIT_LEN);
>         DECLARE_BITMAP(exp_bitmap, TEST_BIT_LEN);
>         memset(bitmap, 0x00, TEST_BYTE_LEN);
>         memset(exp_bitmap, 0x00, TEST_BYTE_LEN);
>         expect_eq_bitmap(exp_bitmap, bitmap, TEST_BIT_LEN);
> }

The problem is that there's no direct analog of memset() that can be
used to initialize bitmaps on both BE and LE systems.
bitmap_zero() and bitmap_set() work by rounding up the bitmap size to
BITS_TO_LONGS(nbits), but there's no bitmap_memset() that would do the
same for an arbitrary byte pattern.
We could call memset(..., ..., BITS_TO_LONGS(TEST_BIT_LEN)), but that
would be similar to declaring a bigger bitmap and not testing the last
24 bits.

Overall, unless allocating and initializing bitmaps with size
divisible by sizeof(long), most of bitmap.c is undefined behavior, so
I don't think it makes much sense to specifically test this case here
(given that we do not extend bitmap_equal() in the patch set).
  
Yury Norov Sept. 25, 2023, 4:06 p.m. UTC | #5
On Mon, Sep 25, 2023 at 04:54:00PM +0200, Alexander Potapenko wrote:
> On Mon, Sep 25, 2023 at 3:09 PM Alexander Potapenko <glider@google.com> wrote:
> >
> > On Mon, Sep 25, 2023 at 2:23 PM Andy Shevchenko
> > <andriy.shevchenko@linux.intel.com> wrote:
> > >
> > > On Mon, Sep 25, 2023 at 02:16:37PM +0200, Alexander Potapenko wrote:
> > >
> > > ...
> > >
> > > > > +/*
> > > > > + * Test bitmap should be big enough to include the cases when start is not in
> > > > > + * the first word, and start+nbits lands in the following word.
> > > > > + */
> > > > > +#define TEST_BIT_LEN (1000)
> > > >
> > > > Dunno why this didn't fire previously, but CONFIG_CPU_BIG_ENDIAN=y
> > > > kernel reports mismatches here, presumably because the last quad word
> > > > ends up partially initialized.
> > >
> > > Hmm... But if designed and used correctly it shouldn't be the issue,
> > > and 1000, I believe, is carefully chosen to be specifically not dividable
> > > by pow-of-2 value.
> > >
> >
> > The problem manifests already right after initialization:
> >
> > static void __init test_bit_len_1000(void)
> > {
> >         DECLARE_BITMAP(bitmap, TEST_BIT_LEN);
> >         DECLARE_BITMAP(exp_bitmap, TEST_BIT_LEN);
> >         memset(bitmap, 0x00, TEST_BYTE_LEN);
> >         memset(exp_bitmap, 0x00, TEST_BYTE_LEN);
> >         expect_eq_bitmap(exp_bitmap, bitmap, TEST_BIT_LEN);
> > }
> 
> The problem is that there's no direct analog of memset() that can be
> used to initialize bitmaps on both BE and LE systems.

memset fills an array of chars with the same value. In bitmap world we operate
on array of bits, and there are only 2 possible values: '0' and '1'. For those
we've got bitmap_zero() and bitmap_fill().

> bitmap_zero() and bitmap_set() work by rounding up the bitmap size to
> BITS_TO_LONGS(nbits), but there's no bitmap_memset() that would do the
> same for an arbitrary byte pattern.
> We could call memset(..., ..., BITS_TO_LONGS(TEST_BIT_LEN)), but that
> would be similar to declaring a bigger bitmap and not testing the last
> 24 bits.

No, you couldn't. On the test level, bitmap should be considered as a
black box. memset()'ing it may (and did) damage internal structure.

If you have some pattern in mind, you can use bitmap_parselist(). For example,
you can set every 2nd bit in your bitmap like this:

        bitmap_parselist("all:1/2", bitmap, 1000);

Check for almost 100 examples of bitmap_parselist usage in the test for
bitmap_parselist in the same file.

> Overall, unless allocating and initializing bitmaps with size
> divisible by sizeof(long), most of bitmap.c is undefined behavior, so
> I don't think it makes much sense to specifically test this case here
> (given that we do not extend bitmap_equal() in the patch set).

This is wrong statement. People spent huge amount of time making bitmap
API working well for all combinations of lengths and endiannesses,
including Andy and me.

NAK for this and for ignoring my other comment to v4.
  
Alexander Potapenko Sept. 25, 2023, 5:16 p.m. UTC | #6
On Mon, Sep 25, 2023 at 6:06 PM Yury Norov <yury.norov@gmail.com> wrote:
>
> On Mon, Sep 25, 2023 at 04:54:00PM +0200, Alexander Potapenko wrote:
> > On Mon, Sep 25, 2023 at 3:09 PM Alexander Potapenko <glider@google.com> wrote:
> > >
> > > On Mon, Sep 25, 2023 at 2:23 PM Andy Shevchenko
> > > <andriy.shevchenko@linux.intel.com> wrote:
> > > >
> > > > On Mon, Sep 25, 2023 at 02:16:37PM +0200, Alexander Potapenko wrote:
> > > >
> > > > ...
> > > >
> > > > > > +/*
> > > > > > + * Test bitmap should be big enough to include the cases when start is not in
> > > > > > + * the first word, and start+nbits lands in the following word.
> > > > > > + */
> > > > > > +#define TEST_BIT_LEN (1000)
> > > > >
> > > > > Dunno why this didn't fire previously, but CONFIG_CPU_BIG_ENDIAN=y
> > > > > kernel reports mismatches here, presumably because the last quad word
> > > > > ends up partially initialized.
> > > >
> > > > Hmm... But if designed and used correctly it shouldn't be the issue,
> > > > and 1000, I believe, is carefully chosen to be specifically not dividable
> > > > by pow-of-2 value.
> > > >
> > >
> > > The problem manifests already right after initialization:
> > >
> > > static void __init test_bit_len_1000(void)
> > > {
> > >         DECLARE_BITMAP(bitmap, TEST_BIT_LEN);
> > >         DECLARE_BITMAP(exp_bitmap, TEST_BIT_LEN);
> > >         memset(bitmap, 0x00, TEST_BYTE_LEN);
> > >         memset(exp_bitmap, 0x00, TEST_BYTE_LEN);
> > >         expect_eq_bitmap(exp_bitmap, bitmap, TEST_BIT_LEN);
> > > }
> >
> > The problem is that there's no direct analog of memset() that can be
> > used to initialize bitmaps on both BE and LE systems.
>
> memset fills an array of chars with the same value. In bitmap world we operate
> on array of bits, and there are only 2 possible values: '0' and '1'. For those
> we've got bitmap_zero() and bitmap_fill().
>
> > bitmap_zero() and bitmap_set() work by rounding up the bitmap size to
> > BITS_TO_LONGS(nbits), but there's no bitmap_memset() that would do the
> > same for an arbitrary byte pattern.
> > We could call memset(..., ..., BITS_TO_LONGS(TEST_BIT_LEN)), but that
> > would be similar to declaring a bigger bitmap and not testing the last
> > 24 bits.
>
> No, you couldn't. On the test level, bitmap should be considered as a
> black box. memset()'ing it may (and did) damage internal structure.

You are right about this. bitmap_zero() and bitmap_fill() are calling
memset() under the hood, but I shouldn't have assumed doing raw memset
is safe.
Unfortunately lib/test_bitmap.c does a bunch of memsets already, which
probably led to the confusion.


> If you have some pattern in mind, you can use bitmap_parselist(). For example,
> you can set every 2nd bit in your bitmap like this:
>
>         bitmap_parselist("all:1/2", bitmap, 1000);
>
> Check for almost 100 examples of bitmap_parselist usage in the test for
> bitmap_parselist in the same file.

Thanks! This solves my problem.
I am planning to use an intermediate bitmap to avoid parsing the
pattern multiple times.

>
> > Overall, unless allocating and initializing bitmaps with size
> > divisible by sizeof(long), most of bitmap.c is undefined behavior, so
> > I don't think it makes much sense to specifically test this case here
> > (given that we do not extend bitmap_equal() in the patch set).
>
> This is wrong statement. People spent huge amount of time making bitmap
> API working well for all combinations of lengths and endiannesses,
> including Andy and me.

Please accept my apologies for that, I didn't mean to insult you,
Andy, or anyone else.
My understanding of the comment at the top of lib/bitmap.c was that
the bitmap API is expected to work in the presence of uninitialized
values, which is actually not what the comment says.
bitmap_zero() and such do ensure that none of the tail bytes remain
uninitialized, so we are safe as long as those functions are used.


>
> NAK for this and for ignoring my other comment to v4.

If you are talking about this comment:
https://lore.kernel.org/lkml/ZQ2WboApqYyEkjjG@yury-ThinkPad/, I was
going to incorporate it in v6 (as it was sent after I published v5).
I am fine with not mandating the return value for reading/writing zero bytes.
  
David Laight Sept. 27, 2023, 7:51 a.m. UTC | #7
...
> Overall, unless allocating and initializing bitmaps with size
> divisible by sizeof(long), most of bitmap.c is undefined behavior, so
> I don't think it makes much sense to specifically test this case here
> (given that we do not extend bitmap_equal() in the patch set).

Bitmaps are arrays of unsigned long.
Using any of the APIs on anything else is a bug.
So it is always wrong to try to initialise 'a number of bytes'.
The size used in the definition need not be a multiple of 8 (on 64bit)
but the allocated data is always a multiple of 8.

Any calls to the functions that have a cast of the bitmap
parameter are likely to be buggy.
And yes, there are loads of them, and many are buggy.

On LE you mostly get away with shorter memory allocations.
But still get errors when trying to do locked operations
on misaligned addresses.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
  
Alexander Potapenko Sept. 28, 2023, 2:19 p.m. UTC | #8
On Wed, Sep 27, 2023 at 9:51 AM David Laight <David.Laight@aculab.com> wrote:
>
> ...
> > Overall, unless allocating and initializing bitmaps with size
> > divisible by sizeof(long), most of bitmap.c is undefined behavior, so
> > I don't think it makes much sense to specifically test this case here
> > (given that we do not extend bitmap_equal() in the patch set).
>
> Bitmaps are arrays of unsigned long.
> Using any of the APIs on anything else is a bug.
> So it is always wrong to try to initialise 'a number of bytes'.
> The size used in the definition need not be a multiple of 8 (on 64bit)
> but the allocated data is always a multiple of 8.
>
> Any calls to the functions that have a cast of the bitmap
> parameter are likely to be buggy.
> And yes, there are loads of them, and many are buggy.

I got rid of the casts in the bitmap test, but they remain in
mtecomp.c, where 16-, 32-, 64-byte buffers allocated by
kmem_cache_alloc() are treated as bitmaps:
https://lore.kernel.org/linux-arm-kernel/20230922080848.1261487-6-glider@google.com/T/#mdb0d636d2d357f8ffe6ac79cef1145df3440f659

Having them allocated by bitmap_alloc() won't work, because on Android
bitmap_alloc() will allocate the buffers from the kmalloc-64 cache,
defeating the purpose of the compression.

Would it be better to extend the bitmap.h API so that it is possible
to allocate from a kmem cache (which would in turn require
bitmap_kmem_cache_create() to ensure the alignment requirements)?

> On LE you mostly get away with shorter memory allocations.
> But still get errors when trying to do locked operations
> on misaligned addresses.
>
>         David
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)



--
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
  
Alexander Potapenko Sept. 28, 2023, 3:14 p.m. UTC | #9
On Thu, Sep 28, 2023 at 4:43 PM Yury Norov <yury.norov@gmail.com> wrote:
>
>
>
> On Thu, Sep 28, 2023, 10:20 AM Alexander Potapenko <glider@google.com> wrote:
>>
>> On Wed, Sep 27, 2023 at 9:51 AM David Laight <David.Laight@aculab.com> wrote:
>> >
>> > ...
>> > > Overall, unless allocating and initializing bitmaps with size
>> > > divisible by sizeof(long), most of bitmap.c is undefined behavior, so
>> > > I don't think it makes much sense to specifically test this case here
>> > > (given that we do not extend bitmap_equal() in the patch set).
>> >
>> > Bitmaps are arrays of unsigned long.
>> > Using any of the APIs on anything else is a bug.
>> > So it is always wrong to try to initialise 'a number of bytes'.
>> > The size used in the definition need not be a multiple of 8 (on 64bit)
>> > but the allocated data is always a multiple of 8.
>> >
>> > Any calls to the functions that have a cast of the bitmap
>> > parameter are likely to be buggy.
>> > And yes, there are loads of them, and many are buggy.
>>
>> I got rid of the casts in the bitmap test, but they remain in
>> mtecomp.c, where 16-, 32-, 64-byte buffers allocated by
>> kmem_cache_alloc() are treated as bitmaps:
>> https://lore.kernel.org/linux-arm-kernel/20230922080848.1261487-6-glider@google.com/T/#mdb0d636d2d357f8ffe6ac79cef1145df3440f659
>>
>> Having them allocated by bitmap_alloc() won't work, because on Android
>> bitmap_alloc() will allocate the buffers from the kmalloc-64 cache,
>> defeating the purpose of the compression.
>>
>> Would it be better to extend the bitmap.h API so that it is possible
>> to allocate from a kmem cache (which would in turn require
>> bitmap_kmem_cache_create() to ensure the alignment requirements)?
>
>
> So all that is wrong then. Bad on me, I'd spend more time looking into your driver code...
>
> We already have bitmap_(from,to)_u(64,32),
> And you can use them. For 16-bit you have to add helpers yourself. But it's not a rocket science.
>

So e.g. for compressing something into a 16-byte buffer using bitmaps
I'd need to:

1) Allocate the buffer: buf = kmem_cache_alloc(...)
2) Allocate the bitmap: bitmap = bitmap_alloc(16*8, ...)
3) Fill the bitmap: mte_compress_to_buf(..., bitmap, 16)
4) Copy the bitmap contents to the buffer: bitmap_to_arr64(buf, bitmap, 16*8)
5) Deallocate the bitmap: bitmap_free(bitmap)

instead of:

buf = kmem_cache_alloc(...)
mte_compress_to_buf(..., (unsigned long *)buf, 16)

, correct?

Given that the buffer contents are opaque and its size is aligned on 8
bytes, could it be possible to somehow adopt the `buf` pointer
instead?

> I'm AFK at the moment. I'll take a close look at your machinery at the weekend.

Thanks and take your time!
  
Yury Norov Sept. 28, 2023, 7:59 p.m. UTC | #10
On Thu, Sep 28, 2023 at 05:14:55PM +0200, Alexander Potapenko wrote:
> On Thu, Sep 28, 2023 at 4:43 PM Yury Norov <yury.norov@gmail.com> wrote:
> >
> >
> >
> > On Thu, Sep 28, 2023, 10:20 AM Alexander Potapenko <glider@google.com> wrote:
> >>
> >> On Wed, Sep 27, 2023 at 9:51 AM David Laight <David.Laight@aculab.com> wrote:
> >> >
> >> > ...
> >> > > Overall, unless allocating and initializing bitmaps with size
> >> > > divisible by sizeof(long), most of bitmap.c is undefined behavior, so
> >> > > I don't think it makes much sense to specifically test this case here
> >> > > (given that we do not extend bitmap_equal() in the patch set).
> >> >
> >> > Bitmaps are arrays of unsigned long.
> >> > Using any of the APIs on anything else is a bug.
> >> > So it is always wrong to try to initialise 'a number of bytes'.
> >> > The size used in the definition need not be a multiple of 8 (on 64bit)
> >> > but the allocated data is always a multiple of 8.
> >> >
> >> > Any calls to the functions that have a cast of the bitmap
> >> > parameter are likely to be buggy.
> >> > And yes, there are loads of them, and many are buggy.
> >>
> >> I got rid of the casts in the bitmap test, but they remain in
> >> mtecomp.c, where 16-, 32-, 64-byte buffers allocated by
> >> kmem_cache_alloc() are treated as bitmaps:
> >> https://lore.kernel.org/linux-arm-kernel/20230922080848.1261487-6-glider@google.com/T/#mdb0d636d2d357f8ffe6ac79cef1145df3440f659
> >>
> >> Having them allocated by bitmap_alloc() won't work, because on Android
> >> bitmap_alloc() will allocate the buffers from the kmalloc-64 cache,
> >> defeating the purpose of the compression.
> >>
> >> Would it be better to extend the bitmap.h API so that it is possible
> >> to allocate from a kmem cache (which would in turn require
> >> bitmap_kmem_cache_create() to ensure the alignment requirements)?
> >
> >
> > So all that is wrong then. Bad on me, I'd spend more time looking into your driver code...
> >
> > We already have bitmap_(from,to)_u(64,32),
> > And you can use them. For 16-bit you have to add helpers yourself. But it's not a rocket science.
> >
> 
> So e.g. for compressing something into a 16-byte buffer using bitmaps
> I'd need to:
> 
> 1) Allocate the buffer: buf = kmem_cache_alloc(...)
> 2) Allocate the bitmap: bitmap = bitmap_alloc(16*8, ...)
> 3) Fill the bitmap: mte_compress_to_buf(..., bitmap, 16)
> 4) Copy the bitmap contents to the buffer: bitmap_to_arr64(buf, bitmap, 16*8)
> 5) Deallocate the bitmap: bitmap_free(bitmap)
> 
> instead of:
> 
> buf = kmem_cache_alloc(...)
> mte_compress_to_buf(..., (unsigned long *)buf, 16)
> 
> , correct?
> 
> Given that the buffer contents are opaque and its size is aligned on 8
> bytes, could it be possible to somehow adopt the `buf` pointer
> instead?

I didn't find an explicit typecasting where you're using
mte_compress_to_buf(), but now after hard 2nd look I see...

Firstly, now that in the documentation you are explicitly describing the
return value of mte_compress() as 64-bit frame, the right way to go would
be declaring the function as: u64 mte_compress(u8 *tags).

And the general pattern should be like this:

  unsigned long mte_compress(u8 *tags)
  {
          DECLARE_BITMAP(tmp, MTECOMP_CACHES_MAXBITS);
          void *storage;
          ...
          if (alloc_size < MTE_PAGE_TAG_STORAGE) {
                  storage = kmem_cache_alloc(cache, GFP_KERNEL);
                  mte_compress_to_buf(r_len, r_tags, r_sizes, tmp, alloc_size);
        
                  switch (alloc_size) {
                  case 16:
                          bitmap_to_arr16(storage, tmp, 16);
                          break;
                  case 32:
                          bitmap_to_arr32(storage, tmp, 32);
                          break;
                  case 64:
                          bitmap_to_arr64(storage, tmp, 64);
                          break;
                  default:
                          pr_err("error\n");
                  }
                  result = ((u64)storage | cache_id) & MTE_HANDLE_MASK;
                  goto ret;
          }
          ...
  }
        
Yeah, it looks cumbersome, but this is the right way to go if you need a
reliable BE-compatible driver. I think it will be less scary if you wrap
the switch with a helper, and/or move it inside mte_compress_to_buf(),
so that the mte_compress will stay unchanged.

Anyways, hope the above helped.

Thanks,
Yury
  
Alexander Potapenko Sept. 29, 2023, 8:54 a.m. UTC | #11
On Thu, Sep 28, 2023 at 10:02 PM Yury Norov <yury.norov@gmail.com> wrote:
>
> On Thu, Sep 28, 2023 at 05:14:55PM +0200, Alexander Potapenko wrote:
> >
> > So e.g. for compressing something into a 16-byte buffer using bitmaps
> > I'd need to:
> >
> > 1) Allocate the buffer: buf = kmem_cache_alloc(...)
> > 2) Allocate the bitmap: bitmap = bitmap_alloc(16*8, ...)
> > 3) Fill the bitmap: mte_compress_to_buf(..., bitmap, 16)
> > 4) Copy the bitmap contents to the buffer: bitmap_to_arr64(buf, bitmap, 16*8)
> > 5) Deallocate the bitmap: bitmap_free(bitmap)
> >
> > instead of:
> >
> > buf = kmem_cache_alloc(...)
> > mte_compress_to_buf(..., (unsigned long *)buf, 16)
> >
> > , correct?
> >
> > Given that the buffer contents are opaque and its size is aligned on 8
> > bytes, could it be possible to somehow adopt the `buf` pointer
> > instead?
>
> I didn't find an explicit typecasting where you're using
> mte_compress_to_buf(), but now after hard 2nd look I see...
>
> Firstly, now that in the documentation you are explicitly describing the
> return value of mte_compress() as 64-bit frame, the right way to go would
> be declaring the function as: u64 mte_compress(u8 *tags).

Ack.

> And the general pattern should be like this:
>
>   unsigned long mte_compress(u8 *tags)
>   {
>           DECLARE_BITMAP(tmp, MTECOMP_CACHES_MAXBITS);
>           void *storage;
>           ...
>           if (alloc_size < MTE_PAGE_TAG_STORAGE) {
>                   storage = kmem_cache_alloc(cache, GFP_KERNEL);
>                   mte_compress_to_buf(r_len, r_tags, r_sizes, tmp, alloc_size);
>
>                   switch (alloc_size) {
>                   case 16:
>                           bitmap_to_arr16(storage, tmp, 16);

I might be missing something, but why do we need the switch at all?
The buffers we are allocating always contain a whole number of u64's -
cannot we just always call bitmap_to_arr64()?

Note that for cases where alloc_size is > 8 we never make any
assumptions about the contents of @storage, and don't care much about
the byte order as long as swap decompression is done with the same
endianness (which is always the case).
(The case where alloc_size==8 is somewhat special, and needs more
accurate handling, because we do make assumptions about the bit layout
there).

>                           break;
>                   case 32:
>                           bitmap_to_arr32(storage, tmp, 32);
>                           break;
>                   case 64:
>                           bitmap_to_arr64(storage, tmp, 64);
>                           break;
>                   default:
>                           pr_err("error\n");
>                   }
>                   result = ((u64)storage | cache_id) & MTE_HANDLE_MASK;
>                   goto ret;
>           }
>           ...
>   }
>
> Yeah, it looks cumbersome, but this is the right way to go if you need a
> reliable BE-compatible driver.

What is the BE compatibility problem that you are anticipating here?

The issue that came up during testing was that on systems with
different endianness we cannot treat unaligned buffers as bitmaps,
because bitmap_write() might be touching memory past the allocation
size.
I agree therefore with the general approach of encapsulating all
bitmap operations in bitmap.h and hiding the implementation details
behind the API.
But in this particular case it seems to complicate the otherwise
trivial application of bitmap_read()/bitmap_write() to an external
buffer with guaranteed 64-bit alignment: we'll end up allocating this
temporary bitmap and doing a hard-to-optimize memcpy() between it and
the final storage.

A straightforward solution for this would be to fork u64* versions of
bitmap_read()/bitmap_write() implementations in mtecomp.c and apply
them to u64* buffers allocated in that file.
This would remove the need for the temporary bitmap (because there's
no encapsulation that mandates using bitmap.h now) and the extra
memcpy().
But I don't like this either, because forking code that exists in
headers is just wrong.

> I think it will be less scary if you wrap
> the switch with a helper, and/or move it inside mte_compress_to_buf(),
> so that the mte_compress will stay unchanged.
>
> Anyways, hope the above helped.
>
> Thanks,
> Yury


--
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
  
Yury Norov Oct. 2, 2023, 2:44 a.m. UTC | #12
On Fri, Sep 29, 2023 at 10:54:59AM +0200, Alexander Potapenko wrote:
> On Thu, Sep 28, 2023 at 10:02 PM Yury Norov <yury.norov@gmail.com> wrote:
> >
> > On Thu, Sep 28, 2023 at 05:14:55PM +0200, Alexander Potapenko wrote:
> > >
> > > So e.g. for compressing something into a 16-byte buffer using bitmaps
> > > I'd need to:
> > >
> > > 1) Allocate the buffer: buf = kmem_cache_alloc(...)
> > > 2) Allocate the bitmap: bitmap = bitmap_alloc(16*8, ...)
> > > 3) Fill the bitmap: mte_compress_to_buf(..., bitmap, 16)
> > > 4) Copy the bitmap contents to the buffer: bitmap_to_arr64(buf, bitmap, 16*8)
> > > 5) Deallocate the bitmap: bitmap_free(bitmap)
> > >
> > > instead of:
> > >
> > > buf = kmem_cache_alloc(...)
> > > mte_compress_to_buf(..., (unsigned long *)buf, 16)
> > >
> > > , correct?
> > >
> > > Given that the buffer contents are opaque and its size is aligned on 8
> > > bytes, could it be possible to somehow adopt the `buf` pointer
> > > instead?
> >
> > I didn't find an explicit typecasting where you're using
> > mte_compress_to_buf(), but now after hard 2nd look I see...
> >
> > Firstly, now that in the documentation you are explicitly describing the
> > return value of mte_compress() as 64-bit frame, the right way to go would
> > be declaring the function as: u64 mte_compress(u8 *tags).
> 
> Ack.
> 
> > And the general pattern should be like this:
> >
> >   unsigned long mte_compress(u8 *tags)
> >   {
> >           DECLARE_BITMAP(tmp, MTECOMP_CACHES_MAXBITS);
> >           void *storage;
> >           ...
> >           if (alloc_size < MTE_PAGE_TAG_STORAGE) {
> >                   storage = kmem_cache_alloc(cache, GFP_KERNEL);
> >                   mte_compress_to_buf(r_len, r_tags, r_sizes, tmp, alloc_size);
> >
> >                   switch (alloc_size) {
> >                   case 16:
> >                           bitmap_to_arr16(storage, tmp, 16);
> 
> I might be missing something, but why do we need the switch at all?
> The buffers we are allocating always contain a whole number of u64's -
> cannot we just always call bitmap_to_arr64()?
> 
> Note that for cases where alloc_size is > 8 we never make any
> assumptions about the contents of @storage, and don't care much about
> the byte order as long as swap decompression is done with the same
> endianness (which is always the case).
> (The case where alloc_size==8 is somewhat special, and needs more
> accurate handling, because we do make assumptions about the bit layout
> there).

So, this is my fault, and I'm really sorry. I read that 16-byte as
16-bit, and mistaken everything else. Please scratch the above.

If you allocate word-aligned memory, and it's a multiple of words,
which is your case, and access it only using bitmap API like
bitmap_read/write, everything should be fine.

Sorry again.
  
Alexander Potapenko Oct. 2, 2023, 7:34 a.m. UTC | #13
On Mon, Oct 2, 2023 at 4:44 AM Yury Norov <yury.norov@gmail.com> wrote:
>
> On Fri, Sep 29, 2023 at 10:54:59AM +0200, Alexander Potapenko wrote:
> > On Thu, Sep 28, 2023 at 10:02 PM Yury Norov <yury.norov@gmail.com> wrote:
> > >
> > > On Thu, Sep 28, 2023 at 05:14:55PM +0200, Alexander Potapenko wrote:
> > > >
> > > > So e.g. for compressing something into a 16-byte buffer using bitmaps
> > > > I'd need to:
> > > >
> > > > 1) Allocate the buffer: buf = kmem_cache_alloc(...)
> > > > 2) Allocate the bitmap: bitmap = bitmap_alloc(16*8, ...)
> > > > 3) Fill the bitmap: mte_compress_to_buf(..., bitmap, 16)
> > > > 4) Copy the bitmap contents to the buffer: bitmap_to_arr64(buf, bitmap, 16*8)
> > > > 5) Deallocate the bitmap: bitmap_free(bitmap)
> > > >
> > > > instead of:
> > > >
> > > > buf = kmem_cache_alloc(...)
> > > > mte_compress_to_buf(..., (unsigned long *)buf, 16)
> > > >
> > > > , correct?
> > > >
> > > > Given that the buffer contents are opaque and its size is aligned on 8
> > > > bytes, could it be possible to somehow adopt the `buf` pointer
> > > > instead?
> > >
> > > I didn't find an explicit typecasting where you're using
> > > mte_compress_to_buf(), but now after hard 2nd look I see...
> > >
> > > Firstly, now that in the documentation you are explicitly describing the
> > > return value of mte_compress() as 64-bit frame, the right way to go would
> > > be declaring the function as: u64 mte_compress(u8 *tags).
> >
> > Ack.
> >
> > > And the general pattern should be like this:
> > >
> > >   unsigned long mte_compress(u8 *tags)
> > >   {
> > >           DECLARE_BITMAP(tmp, MTECOMP_CACHES_MAXBITS);
> > >           void *storage;
> > >           ...
> > >           if (alloc_size < MTE_PAGE_TAG_STORAGE) {
> > >                   storage = kmem_cache_alloc(cache, GFP_KERNEL);
> > >                   mte_compress_to_buf(r_len, r_tags, r_sizes, tmp, alloc_size);
> > >
> > >                   switch (alloc_size) {
> > >                   case 16:
> > >                           bitmap_to_arr16(storage, tmp, 16);
> >
> > I might be missing something, but why do we need the switch at all?
> > The buffers we are allocating always contain a whole number of u64's -
> > cannot we just always call bitmap_to_arr64()?
> >
> > Note that for cases where alloc_size is > 8 we never make any
> > assumptions about the contents of @storage, and don't care much about
> > the byte order as long as swap decompression is done with the same
> > endianness (which is always the case).
> > (The case where alloc_size==8 is somewhat special, and needs more
> > accurate handling, because we do make assumptions about the bit layout
> > there).
>
> So, this is my fault, and I'm really sorry. I read that 16-byte as
> 16-bit, and mistaken everything else. Please scratch the above.
>
> If you allocate word-aligned memory, and it's a multiple of words,
> which is your case, and access it only using bitmap API like
> bitmap_read/write, everything should be fine.

Ok, fine, I'll stick to the current implementation then.
  

Patch

diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c
index 187f5b2db4cf1..ea66e39b29a4e 100644
--- a/lib/test_bitmap.c
+++ b/lib/test_bitmap.c
@@ -71,6 +71,17 @@  __check_eq_uint(const char *srcfile, unsigned int line,
 	return true;
 }
 
+static bool __init
+__check_eq_ulong(const char *srcfile, unsigned int line,
+		 const unsigned long exp_ulong, unsigned long x)
+{
+	if (exp_ulong != x) {
+		pr_err("[%s:%u] expected %lu, got %lu\n",
+			srcfile, line, exp_ulong, x);
+		return false;
+	}
+	return true;
+}
 
 static bool __init
 __check_eq_bitmap(const char *srcfile, unsigned int line,
@@ -186,6 +197,7 @@  __check_eq_str(const char *srcfile, unsigned int line,
 	})
 
 #define expect_eq_uint(...)		__expect_eq(uint, ##__VA_ARGS__)
+#define expect_eq_ulong(...)		__expect_eq(ulong, ##__VA_ARGS__)
 #define expect_eq_bitmap(...)		__expect_eq(bitmap, ##__VA_ARGS__)
 #define expect_eq_pbl(...)		__expect_eq(pbl, ##__VA_ARGS__)
 #define expect_eq_u32_array(...)	__expect_eq(u32_array, ##__VA_ARGS__)
@@ -1222,6 +1234,108 @@  static void __init test_bitmap_const_eval(void)
 	BUILD_BUG_ON(~var != ~BIT(25));
 }
 
+/*
+ * Test bitmap should be big enough to include the cases when start is not in
+ * the first word, and start+nbits lands in the following word.
+ */
+#define TEST_BIT_LEN (1000)
+#define TEST_BYTE_LEN (BITS_TO_BYTES(TEST_BIT_LEN))
+
+/*
+ * Helper function to test bitmap_write() overwriting the chosen byte pattern.
+ */
+static void __init test_bitmap_write_helper(unsigned char pattern)
+{
+	DECLARE_BITMAP(bitmap, TEST_BIT_LEN);
+	DECLARE_BITMAP(exp_bitmap, TEST_BIT_LEN);
+	unsigned long w, r, bit;
+	int i, n, nbits;
+
+	/*
+	 * Check that setting a single bit does not accidentally touch the
+	 * adjacent bits.
+	 */
+	for (i = 0; i < TEST_BIT_LEN; i++) {
+		/*
+		 * A 0b10101010 pattern to catch both 0s replaced to 1s
+		 * and vice versa.
+		 */
+		memset(bitmap, pattern, TEST_BYTE_LEN);
+		memset(exp_bitmap, pattern, TEST_BYTE_LEN);
+		for (bit = 0; bit <= 1; bit++) {
+			bitmap_write(bitmap, bit, i, 1);
+			__assign_bit(i, exp_bitmap, bit);
+			expect_eq_bitmap(exp_bitmap, bitmap,
+					 TEST_BIT_LEN);
+		}
+	}
+
+	/* Ensure setting 0 bits does not change anything. */
+	memset(bitmap, pattern, TEST_BYTE_LEN);
+	memset(exp_bitmap, pattern, TEST_BYTE_LEN);
+	for (i = 0; i < TEST_BIT_LEN; i++) {
+		bitmap_write(bitmap, ~0UL, i, 0);
+		expect_eq_bitmap(exp_bitmap, bitmap, TEST_BIT_LEN);
+	}
+
+	for (nbits = BITS_PER_LONG; nbits >= 1; nbits--) {
+		w = 0xdeadbeefdeadbeefUL >> (BITS_PER_LONG - nbits);
+		for (i = 0; i <= TEST_BIT_LEN - nbits; i++) {
+			memset(bitmap, pattern, TEST_BYTE_LEN);
+			memset(exp_bitmap, pattern, TEST_BYTE_LEN);
+			for (n = 0; n < nbits; n++)
+				__assign_bit(i + n, exp_bitmap, w & BIT(n));
+			bitmap_write(bitmap, w, i, nbits);
+			expect_eq_bitmap(exp_bitmap, bitmap, TEST_BIT_LEN);
+			r = bitmap_read(bitmap, i, nbits);
+			expect_eq_ulong(r, w);
+		}
+	}
+}
+
+static void __init test_bitmap_read_write(void)
+{
+	unsigned char pattern[3] = {0x00, 0xaa, 0xff};
+	DECLARE_BITMAP(bitmap, TEST_BIT_LEN);
+	unsigned long zero_bits = 0;
+	unsigned long val;
+	int i, pi;
+
+	/*
+	 * Setting/getting zero bytes should not crash the kernel.
+	 * READ_ONCE() prevents constant folding.
+	 */
+	bitmap_write(NULL, 0, 0, READ_ONCE(zero_bits));
+	val = bitmap_read(NULL, 0, READ_ONCE(zero_bits));
+	expect_eq_ulong(0, val);
+
+	/*
+	 * Ensure that bitmap_read() reads the same value that was previously
+	 * written, and two consequent values are correctly merged.
+	 * The resulting bit pattern is asymmetric to rule out possible issues
+	 * with bit numeration order.
+	 */
+	for (i = 0; i < TEST_BIT_LEN - 7; i++) {
+		bitmap_zero(bitmap, TEST_BIT_LEN);
+
+		bitmap_write(bitmap, 0b10101UL, i, 5);
+		val = bitmap_read(bitmap, i, 5);
+		expect_eq_ulong(0b10101UL, val);
+
+		bitmap_write(bitmap, 0b101UL, i + 5, 3);
+		val = bitmap_read(bitmap, i + 5, 3);
+		expect_eq_ulong(0b101UL, val);
+
+		val = bitmap_read(bitmap, i, 8);
+		expect_eq_ulong(0b10110101UL, val);
+	}
+
+	for (pi = 0; pi < sizeof(pattern); pi++)
+		test_bitmap_write_helper(pattern[pi]);
+}
+#undef TEST_BYTE_LEN
+#undef TEST_BIT_LEN
+
 static void __init selftest(void)
 {
 	test_zero_clear();
@@ -1237,6 +1351,7 @@  static void __init selftest(void)
 	test_bitmap_cut();
 	test_bitmap_print_buf();
 	test_bitmap_const_eval();
+	test_bitmap_read_write();
 
 	test_find_nth_bit();
 	test_for_each_set_bit();