[v2,06/12] mm/execmem: introduce execmem_data_alloc()

Message ID 20230616085038.4121892-7-rppt@kernel.org
State New
Headers
Series mm: jit/text allocator |

Commit Message

Mike Rapoport June 16, 2023, 8:50 a.m. UTC
  From: "Mike Rapoport (IBM)" <rppt@kernel.org>

Data related to code allocations, such as module data section, need to
comply with architecture constraints for its placement and its
allocation right now was done using execmem_text_alloc().

Create a dedicated API for allocating data related to code allocations
and allow architectures to define address ranges for data allocations.

Since currently this is only relevant for powerpc variants that use the
VMALLOC address space for module data allocations, automatically reuse
address ranges defined for text unless address range for data is
explicitly defined by an architecture.

With separation of code and data allocations, data sections of the
modules are now mapped as PAGE_KERNEL rather than PAGE_KERNEL_EXEC which
was a default on many architectures.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
---
 arch/powerpc/kernel/module.c |  8 +++++++
 include/linux/execmem.h      | 18 +++++++++++++++
 kernel/module/main.c         | 15 +++----------
 mm/execmem.c                 | 43 ++++++++++++++++++++++++++++++++++++
 4 files changed, 72 insertions(+), 12 deletions(-)
  

Comments

Edgecombe, Rick P June 16, 2023, 4:55 p.m. UTC | #1
On Fri, 2023-06-16 at 11:50 +0300, Mike Rapoport wrote:
> From: "Mike Rapoport (IBM)" <rppt@kernel.org>
> 
> Data related to code allocations, such as module data section, need
> to
> comply with architecture constraints for its placement and its
> allocation right now was done using execmem_text_alloc().
> 
> Create a dedicated API for allocating data related to code
> allocations
> and allow architectures to define address ranges for data
> allocations.

Right now the cross-arch way to specify kernel memory permissions is
encoded in the function names of all the set_memory_foo()'s. You can't
just have unified prot names because some arch's have NX and some have
X bits, etc. CPA wouldn't know if it needs to set or unset a bit if you
pass in a PROT.

But then you end up with a new function for *each* combination (i.e.
set_memory_rox()). I wish CPA has flags like mmap() does, and I wonder
if it makes sense here instead of execmem_data_alloc().

Maybe that is an overhaul for another day though...
  
Song Liu June 16, 2023, 8:01 p.m. UTC | #2
On Fri, Jun 16, 2023 at 1:51 AM Mike Rapoport <rppt@kernel.org> wrote:
>
> From: "Mike Rapoport (IBM)" <rppt@kernel.org>
>
> Data related to code allocations, such as module data section, need to
> comply with architecture constraints for its placement and its
> allocation right now was done using execmem_text_alloc().
>
> Create a dedicated API for allocating data related to code allocations
> and allow architectures to define address ranges for data allocations.
>
> Since currently this is only relevant for powerpc variants that use the
> VMALLOC address space for module data allocations, automatically reuse
> address ranges defined for text unless address range for data is
> explicitly defined by an architecture.
>
> With separation of code and data allocations, data sections of the
> modules are now mapped as PAGE_KERNEL rather than PAGE_KERNEL_EXEC which
> was a default on many architectures.
>
> Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
[...]
>  static void free_mod_mem(struct module *mod)
> diff --git a/mm/execmem.c b/mm/execmem.c
> index a67acd75ffef..f7bf496ad4c3 100644
> --- a/mm/execmem.c
> +++ b/mm/execmem.c
> @@ -63,6 +63,20 @@ void *execmem_text_alloc(size_t size)
>                              fallback_start, fallback_end, kasan);
>  }
>
> +void *execmem_data_alloc(size_t size)
> +{
> +       unsigned long start = execmem_params.modules.data.start;
> +       unsigned long end = execmem_params.modules.data.end;
> +       pgprot_t pgprot = execmem_params.modules.data.pgprot;
> +       unsigned int align = execmem_params.modules.data.alignment;
> +       unsigned long fallback_start = execmem_params.modules.data.fallback_start;
> +       unsigned long fallback_end = execmem_params.modules.data.fallback_end;
> +       bool kasan = execmem_params.modules.flags & EXECMEM_KASAN_SHADOW;
> +
> +       return execmem_alloc(size, start, end, align, pgprot,
> +                            fallback_start, fallback_end, kasan);
> +}
> +
>  void execmem_free(void *ptr)
>  {
>         /*
> @@ -101,6 +115,28 @@ static bool execmem_validate_params(struct execmem_params *p)
>         return true;
>  }
>
> +static void execmem_init_missing(struct execmem_params *p)

Shall we call this execmem_default_init_data?

> +{
> +       struct execmem_modules_range *m = &p->modules;
> +
> +       if (!pgprot_val(execmem_params.modules.data.pgprot))
> +               execmem_params.modules.data.pgprot = PAGE_KERNEL;

Do we really need to check each of these? IOW, can we do:

if (!pgprot_val(execmem_params.modules.data.pgprot)) {
       execmem_params.modules.data.pgprot = PAGE_KERNEL;
       execmem_params.modules.data.alignment = m->text.alignment;
       execmem_params.modules.data.start = m->text.start;
       execmem_params.modules.data.end = m->text.end;
       execmem_params.modules.data.fallback_start = m->text.fallback_start;
      execmem_params.modules.data.fallback_end = m->text.fallback_end;
}

Thanks,
Song

[...]
  
Mike Rapoport June 17, 2023, 6:44 a.m. UTC | #3
On Fri, Jun 16, 2023 at 04:55:53PM +0000, Edgecombe, Rick P wrote:
> On Fri, 2023-06-16 at 11:50 +0300, Mike Rapoport wrote:
> > From: "Mike Rapoport (IBM)" <rppt@kernel.org>
> > 
> > Data related to code allocations, such as module data section, need
> > to
> > comply with architecture constraints for its placement and its
> > allocation right now was done using execmem_text_alloc().
> > 
> > Create a dedicated API for allocating data related to code
> > allocations
> > and allow architectures to define address ranges for data
> > allocations.
> 
> Right now the cross-arch way to specify kernel memory permissions is
> encoded in the function names of all the set_memory_foo()'s. You can't
> just have unified prot names because some arch's have NX and some have
> X bits, etc. CPA wouldn't know if it needs to set or unset a bit if you
> pass in a PROT.

The idea is that allocator never uses CPA. It allocates with the
permissions defined by architecture at the first place and then the callers
might use CPA to change them. Ideally, that shouldn't be needed for
anything but ro_after_init, but we are far from there.

> But then you end up with a new function for *each* combination (i.e.
> set_memory_rox()). I wish CPA has flags like mmap() does, and I wonder
> if it makes sense here instead of execmem_data_alloc().

I don't think execmem should handle all the combinations. The code is
always allocated as ROX for architectures that support text poking and as
RWX for those that don't.

Maybe execmem_data_alloc() will need to able to handle RW and RO data
differently at some point, but that's the only variant I can think of, but
even then I don't expect CPA will be used by execmem. 

We also might move resetting the permissions to default from vmalloc to
execmem, but again, we are far from there.
 
> Maybe that is an overhaul for another day though...

CPA surely needs some love :)
  
Mike Rapoport June 17, 2023, 6:51 a.m. UTC | #4
On Fri, Jun 16, 2023 at 01:01:08PM -0700, Song Liu wrote:
> On Fri, Jun 16, 2023 at 1:51 AM Mike Rapoport <rppt@kernel.org> wrote:
> >
> > From: "Mike Rapoport (IBM)" <rppt@kernel.org>
> >
> > Data related to code allocations, such as module data section, need to
> > comply with architecture constraints for its placement and its
> > allocation right now was done using execmem_text_alloc().
> >
> > Create a dedicated API for allocating data related to code allocations
> > and allow architectures to define address ranges for data allocations.
> >
> > Since currently this is only relevant for powerpc variants that use the
> > VMALLOC address space for module data allocations, automatically reuse
> > address ranges defined for text unless address range for data is
> > explicitly defined by an architecture.
> >
> > With separation of code and data allocations, data sections of the
> > modules are now mapped as PAGE_KERNEL rather than PAGE_KERNEL_EXEC which
> > was a default on many architectures.
> >
> > Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
> [...]
> >  static void free_mod_mem(struct module *mod)
> > diff --git a/mm/execmem.c b/mm/execmem.c
> > index a67acd75ffef..f7bf496ad4c3 100644
> > --- a/mm/execmem.c
> > +++ b/mm/execmem.c
> > @@ -63,6 +63,20 @@ void *execmem_text_alloc(size_t size)
> >                              fallback_start, fallback_end, kasan);
> >  }
> >
> > +void *execmem_data_alloc(size_t size)
> > +{
> > +       unsigned long start = execmem_params.modules.data.start;
> > +       unsigned long end = execmem_params.modules.data.end;
> > +       pgprot_t pgprot = execmem_params.modules.data.pgprot;
> > +       unsigned int align = execmem_params.modules.data.alignment;
> > +       unsigned long fallback_start = execmem_params.modules.data.fallback_start;
> > +       unsigned long fallback_end = execmem_params.modules.data.fallback_end;
> > +       bool kasan = execmem_params.modules.flags & EXECMEM_KASAN_SHADOW;
> > +
> > +       return execmem_alloc(size, start, end, align, pgprot,
> > +                            fallback_start, fallback_end, kasan);
> > +}
> > +
> >  void execmem_free(void *ptr)
> >  {
> >         /*
> > @@ -101,6 +115,28 @@ static bool execmem_validate_params(struct execmem_params *p)
> >         return true;
> >  }
> >
> > +static void execmem_init_missing(struct execmem_params *p)
> 
> Shall we call this execmem_default_init_data?

This also fills in jit.text (next patch), so _data doesn't work here :)
And it's not really a default, the defaults are set explicitly for arches
that don't have execmem_arch_params.
 
> > +{
> > +       struct execmem_modules_range *m = &p->modules;
> > +
> > +       if (!pgprot_val(execmem_params.modules.data.pgprot))
> > +               execmem_params.modules.data.pgprot = PAGE_KERNEL;
> 
> Do we really need to check each of these? IOW, can we do:
> 
> if (!pgprot_val(execmem_params.modules.data.pgprot)) {
>        execmem_params.modules.data.pgprot = PAGE_KERNEL;
>        execmem_params.modules.data.alignment = m->text.alignment;
>        execmem_params.modules.data.start = m->text.start;
>        execmem_params.modules.data.end = m->text.end;
>        execmem_params.modules.data.fallback_start = m->text.fallback_start;
>       execmem_params.modules.data.fallback_end = m->text.fallback_end;
> }

Yes, we can have a single check here.
 
> Thanks,
> Song
> 
> [...]
  
Thomas Gleixner June 18, 2023, 10:32 p.m. UTC | #5
Mike!

Sorry for being late on this ...

On Fri, Jun 16 2023 at 11:50, Mike Rapoport wrote:
>  
> +void *execmem_data_alloc(size_t size)
> +{
> +	unsigned long start = execmem_params.modules.data.start;
> +	unsigned long end = execmem_params.modules.data.end;
> +	pgprot_t pgprot = execmem_params.modules.data.pgprot;
> +	unsigned int align = execmem_params.modules.data.alignment;
> +	unsigned long fallback_start = execmem_params.modules.data.fallback_start;
> +	unsigned long fallback_end = execmem_params.modules.data.fallback_end;
> +	bool kasan = execmem_params.modules.flags & EXECMEM_KASAN_SHADOW;

While I know for sure that you read up on the discussion I had with Song
about data structures, it seems you completely failed to understand it.

> +	return execmem_alloc(size, start, end, align, pgprot,
> +			     fallback_start, fallback_end, kasan);

Having _seven_ intermediate variables to fill _eight_ arguments of a
function instead of handing in @size and a proper struct pointer is
tasteless and disgusting at best.

Six out of those seven parameters are from:

    execmem_params.module.data

while the KASAN shadow part is retrieved from

    execmem_params.module.flags

So what prevents you from having a uniform data structure, which is
extensible and decribes _all_ types of allocations?

Absolutely nothing. The flags part can either be in the type dependend
part or you make the type configs an array as I had suggested originally
and then execmem_alloc() becomes:

void *execmem_alloc(type, size)

and

static inline void *execmem_data_alloc(size_t size)
{
        return execmem_alloc(EXECMEM_TYPE_DATA, size);
}

which gets the type independent parts from @execmem_param.

Just read through your own series and watch the evolution of
execmem_alloc():

  static void *execmem_alloc(size_t size)

  static void *execmem_alloc(size_t size, unsigned long start,
                             unsigned long end, unsigned int align,
                             pgprot_t pgprot)

  static void *execmem_alloc(size_t len, unsigned long start,
                             unsigned long end, unsigned int align,
                             pgprot_t pgprot,
                             unsigned long fallback_start,
                             unsigned long fallback_end,
                             bool kasan)

In a month from now this function will have _ten_ parameters and tons of
horrible wrappers which convert an already existing data structure into
individual function arguments.

Seriously?

If you want this function to be [ab]used outside of the exec_param
configuration space for whatever non-sensical reasons then this still
can be either:

void *execmem_alloc(params, type, size)

static inline void *execmem_data_alloc(size_t size)
{
        return execmem_alloc(&exec_param, EXECMEM_TYPE_DATA, size);
}

or

void *execmem_alloc(type_params, size);

static inline void *execmem_data_alloc(size_t size)
{
        return execmem_alloc(&exec_param.data, size);
}

which both allows you to provide alternative params, right?

Coming back to my conversation with Song:

   "Bad programmers worry about the code. Good programmers worry about
    data structures and their relationships."

You might want to reread:

    https://lore.kernel.org/r/87lenuukj0.ffs@tglx

and the subsequent thread.

The fact that my suggestions had a 'mod_' namespace prefix does not make
any of my points moot.

Song did an extremly good job in abstracting things out, but you decided
to ditch his ground work instead of building on it and keeping the good
parts. That's beyond sad.

Worse, you went the full 'not invented here' path with an outcome which is
even worse than the original hackery from Song which started the above
referenced thread.

I don't know what caused you to decide that ad hoc hackery is better
than proper data structure based design patterns. I actually don't want
to know.

As much as my voice counts:

  NAK-ed-by: Thomas Gleixner <tglx@linutronix.de>

Thanks,

        tglx
  
Kent Overstreet June 18, 2023, 11:14 p.m. UTC | #6
On Mon, Jun 19, 2023 at 12:32:55AM +0200, Thomas Gleixner wrote:
> Mike!
> 
> Sorry for being late on this ...
> 
> On Fri, Jun 16 2023 at 11:50, Mike Rapoport wrote:
> >  
> > +void *execmem_data_alloc(size_t size)
> > +{
> > +	unsigned long start = execmem_params.modules.data.start;
> > +	unsigned long end = execmem_params.modules.data.end;
> > +	pgprot_t pgprot = execmem_params.modules.data.pgprot;
> > +	unsigned int align = execmem_params.modules.data.alignment;
> > +	unsigned long fallback_start = execmem_params.modules.data.fallback_start;
> > +	unsigned long fallback_end = execmem_params.modules.data.fallback_end;
> > +	bool kasan = execmem_params.modules.flags & EXECMEM_KASAN_SHADOW;
> 
> While I know for sure that you read up on the discussion I had with Song
> about data structures, it seems you completely failed to understand it.
> 
> > +	return execmem_alloc(size, start, end, align, pgprot,
> > +			     fallback_start, fallback_end, kasan);
> 
> Having _seven_ intermediate variables to fill _eight_ arguments of a
> function instead of handing in @size and a proper struct pointer is
> tasteless and disgusting at best.
> 
> Six out of those seven parameters are from:
> 
>     execmem_params.module.data
> 
> while the KASAN shadow part is retrieved from
> 
>     execmem_params.module.flags
> 
> So what prevents you from having a uniform data structure, which is
> extensible and decribes _all_ types of allocations?
> 
> Absolutely nothing. The flags part can either be in the type dependend
> part or you make the type configs an array as I had suggested originally
> and then execmem_alloc() becomes:
> 
> void *execmem_alloc(type, size)
> 
> and
> 
> static inline void *execmem_data_alloc(size_t size)
> {
>         return execmem_alloc(EXECMEM_TYPE_DATA, size);
> }
> 
> which gets the type independent parts from @execmem_param.
> 
> Just read through your own series and watch the evolution of
> execmem_alloc():
> 
>   static void *execmem_alloc(size_t size)
> 
>   static void *execmem_alloc(size_t size, unsigned long start,
>                              unsigned long end, unsigned int align,
>                              pgprot_t pgprot)
> 
>   static void *execmem_alloc(size_t len, unsigned long start,
>                              unsigned long end, unsigned int align,
>                              pgprot_t pgprot,
>                              unsigned long fallback_start,
>                              unsigned long fallback_end,
>                              bool kasan)
> 
> In a month from now this function will have _ten_ parameters and tons of
> horrible wrappers which convert an already existing data structure into
> individual function arguments.
> 
> Seriously?
> 
> If you want this function to be [ab]used outside of the exec_param
> configuration space for whatever non-sensical reasons then this still
> can be either:
> 
> void *execmem_alloc(params, type, size)
> 
> static inline void *execmem_data_alloc(size_t size)
> {
>         return execmem_alloc(&exec_param, EXECMEM_TYPE_DATA, size);
> }
> 
> or
> 
> void *execmem_alloc(type_params, size);
> 
> static inline void *execmem_data_alloc(size_t size)
> {
>         return execmem_alloc(&exec_param.data, size);
> }
> 
> which both allows you to provide alternative params, right?
> 
> Coming back to my conversation with Song:
> 
>    "Bad programmers worry about the code. Good programmers worry about
>     data structures and their relationships."

Thomas, you're confusing an internal interface with external, I made the
same mistake reviewing Song's patchset...
  
Thomas Gleixner June 19, 2023, 12:43 a.m. UTC | #7
Kent!

On Sun, Jun 18 2023 at 19:14, Kent Overstreet wrote:
> On Mon, Jun 19, 2023 at 12:32:55AM +0200, Thomas Gleixner wrote:
>
> Thomas, you're confusing an internal interface with external

No. I am not.

Whether that's an internal function or not does not make any difference
at all.

Having a function growing _eight_ parameters where _six_ of them are
derived from a well defined data structure is a clear sign of design
fail.

It's not rocket science to do:

struct generic_allocation_info {
       ....
};

struct execmem_info {
       ....
       struct generic_allocation_info alloc_info;
;

struct execmem_param {
       ...
       struct execmem_info[NTYPES];
};

and having a function which can either operate on execmem_param and type
or on generic_allocation_info itself. It does not matter as long as the
data structure which is handed into this internal function is
describing it completely or needs a supplementary argument, i.e. flags.

Having tons of wrappers which do:

       a = generic_info.a;
       b = generic_info.b;
       ....
       n = generic_info.n;

       internal_func(a, b, ....,, n);

is just hillarious and to repeat myself tasteless and therefore
disgusting.

That's CS course first semester hackery, but TBH, I can only tell from
imagination because I did not take CS courses - maybe that's the
problem...

Data structure driven design works not from the usage site down to the
internals. It's the other way round:

  1) Define a data structure which describes what the internal function
     needs to know

  2) Implement use case specific variants which describe that

  3) Hand the use case specific variant to the internal function
     eventually with some minimal supplementary information.

Object oriented basics, right?

There is absolutely nothing wrong with having

      internal_func(generic_info *, modifier);

but having

      internal_func(a, b, ... n)

is fundamentally wrong in the context of an extensible and future proof
internal function.

Guess what. Today it's sufficient to have _eight_ arguments and we are
happy to have 10 nonsensical wrappers around this internal
function. Tomorrow there happens to be a use case which needs another
argument so you end up:

  Changing 10 wrappers plus the function declaration and definition in
  one go

instead of adding

  One new naturally 0 initialized member to generic_info and be done
  with it.

Look at the evolution of execmem_alloc() in this very pathset which I
pointed out. That very patchset covers _two_ of at least _six_ cases
Song and myself identified. It already had _three_ steps of evolution
from _one_ to _five_ to _eight_ parameters.

C is not like e.g. python where you can "solve" that problem by simply
doing:

- internal_func(a, b, c):
+ internal_func(a, b, c, d=None, ..., n=None):

But that's not a solution either. That's a horrible workaround even in
python once your parameter space gets sufficiently large. The way out in
python is to have **kwargs. But that's not an option in C, and not
necessarily the best option for python either.

Even in python or any other object oriented language you get to the
point where you have to rethink your approach, go back to the drawing
board and think about data representation.

But creating a new interface based on "let's see what we need over
time and add parameters as we see fit" is simply wrong to begin with
independent of the programming language.

Even if the _eight_ parameters are the end of the range, then they are
beyond justifyable because that's way beyond the natural register
argument space of any architecture and you are offloading your lazyness
to wrappers and the compiler to emit pointlessly horrible code.

There is a reason why new syscalls which need more than a few parameters
are based on 'struct DATA_WHICH_I_NEED_TO_KNOW' and 'flags'.

We've got burned on the non-extensibilty often enough. Why would a new
internal function have any different requirements especially as it is
neither implemented to the full extent nor a hotpath function?

Now you might argue that it _is_ a "hotpath" due to the BPF usage, but
then even more so as any intermediate wrapper which converts from one
data representation to another data representation is not going to
increase performance, right?

> ... I made the same mistake reviewing Song's patchset...

Songs series had rough edges, but was way more data structure driven
and palatable than this hackery.

The fact that you made a mistake while reviewing Songs series has
absolutely nothing to do with the above or my previous reply to Mike.

Thanks,

        tglx
  
Kent Overstreet June 19, 2023, 2:12 a.m. UTC | #8
On Mon, Jun 19, 2023 at 02:43:58AM +0200, Thomas Gleixner wrote:
> Kent!

Hi Thomas :)

> No. I am not.

Ok.

> Whether that's an internal function or not does not make any difference
> at all.

Well, at the risk of this discussion going completely off the rails, I
have to disagree with you there. External interfaces and high level
semantics are more important to get right from the outset, internal
implementation details can be cleaned up later, within reason.

And the discussion on this patchset has been more focused on those
external interfaces, which seems like the right approach to me.

> > ... I made the same mistake reviewing Song's patchset...
> 
> Songs series had rough edges, but was way more data structure driven
> and palatable than this hackery.

I liked that aspect of Song's patchset too, and I'm actually inclined to
agree with you that this patchset might get a bit cleaner with more of
that, but really, this semes like just quibbling over calling convention
for an internal helper function.
  
Mike Rapoport June 19, 2023, 3:23 p.m. UTC | #9
On Mon, Jun 19, 2023 at 12:32:55AM +0200, Thomas Gleixner wrote:
> Mike!
> 
> Sorry for being late on this ...
> 
> On Fri, Jun 16 2023 at 11:50, Mike Rapoport wrote:
> 
> The fact that my suggestions had a 'mod_' namespace prefix does not make
> any of my points moot.

The prefix does not matter. What matters is what we are trying to abstract.
Your suggestion is based of the memory used by modules. I'm abstracting
address spaces for different types of executable and related memory. They
are similar, yes, but they are not the same.

The TEXT, INIT_TEXT and *_DATA do not match to what we have from arch POV.
They have modules with text, rw data, ro data and ro after init data and
the memory for the generated code. The memory for modules and memory for
other users have different restrictions for their placement, so using a
single TEXT type for them is semantically wrong. BPF and kprobes do not
necessarily must be at the same address range as modules and init text does
not differ from normal text.

> Song did an extremly good job in abstracting things out, but you decided
> to ditch his ground work instead of building on it and keeping the good
> parts. That's beyond sad.

Actually not. The core idea to describe address range suitable for code
allocations with a structure and have arch code initialize this structure
at boot and be done with it is the same. But I don't think vmalloc
parameters belong there, they should be completely encapsulated in the
allocator. Having fallback range named explicitly is IMO clearer than an
array of address spaces.

I accept your point that the structures describing ranges for different
types should be unified and I've got carried away with making the wrappers
to convert that structure to parameters to the core allocation function.

I've chosen to define ranges as fields in the containing structure rather
than enum with types and an array because I strongly feel that the callers
should not care about these parameters. These parameters are defined by
architecture and the callers should not need to know how each and every
arch defines restrictions suitable for modules, bpf or kprobes.

That's also the reason to have different names for API calls, exactly to
avoid having alloc(KPROBES,...), alloc(BPF, ...), alloc(MODULES, ...) an so
on.

All in all, if I filter all the ranting, this boils down to having a
unified structure for all the address ranges and passing this structure
from the wrappers to the core alloc as is rather that translating it to
separate parameters, with which I agree.

> Thanks,
> 
>         tglx
  
Steven Rostedt June 20, 2023, 2:51 p.m. UTC | #10
On Mon, 19 Jun 2023 02:43:58 +0200
Thomas Gleixner <tglx@linutronix.de> wrote:

> Now you might argue that it _is_ a "hotpath" due to the BPF usage, but
> then even more so as any intermediate wrapper which converts from one
> data representation to another data representation is not going to
> increase performance, right?

Just as a side note. BPF can not attach its return calling code to
functions that have more than 6 parameters (3 on 32 bit x86), because of
the way BPF return path trampoline works. It is a requirement that all
parameters live in registers, and none on the stack.

-- Steve
  
Alexei Starovoitov June 20, 2023, 3:32 p.m. UTC | #11
On Tue, Jun 20, 2023 at 7:51 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Mon, 19 Jun 2023 02:43:58 +0200
> Thomas Gleixner <tglx@linutronix.de> wrote:
>
> > Now you might argue that it _is_ a "hotpath" due to the BPF usage, but
> > then even more so as any intermediate wrapper which converts from one
> > data representation to another data representation is not going to
> > increase performance, right?
>
> Just as a side note. BPF can not attach its return calling code to
> functions that have more than 6 parameters (3 on 32 bit x86), because of
> the way BPF return path trampoline works. It is a requirement that all
> parameters live in registers, and none on the stack.

It's actually 7 and that restriction is being lifted.
The patch set to attach to <= 12 is being discussed.
  

Patch

diff --git a/arch/powerpc/kernel/module.c b/arch/powerpc/kernel/module.c
index ba7abff77d98..4c6c15bf3947 100644
--- a/arch/powerpc/kernel/module.c
+++ b/arch/powerpc/kernel/module.c
@@ -103,6 +103,10 @@  struct execmem_params __init *execmem_arch_params(void)
 {
 	pgprot_t prot = strict_module_rwx_enabled() ? PAGE_KERNEL : PAGE_KERNEL_EXEC;
 
+	/*
+	 * BOOK3S_32 and 8xx define MODULES_VADDR for text allocations and
+	 * allow allocating data in the entire vmalloc space
+	 */
 #ifdef MODULES_VADDR
 	unsigned long limit = (unsigned long)_etext - SZ_32M;
 
@@ -116,6 +120,10 @@  struct execmem_params __init *execmem_arch_params(void)
 		execmem_params.modules.text.start = MODULES_VADDR;
 		execmem_params.modules.text.end = MODULES_END;
 	}
+	execmem_params.modules.data.start = VMALLOC_START;
+	execmem_params.modules.data.end = VMALLOC_END;
+	execmem_params.modules.data.pgprot = PAGE_KERNEL;
+	execmem_params.modules.data.alignment = 1;
 #else
 	execmem_params.modules.text.start = VMALLOC_START;
 	execmem_params.modules.text.end = VMALLOC_END;
diff --git a/include/linux/execmem.h b/include/linux/execmem.h
index b9a97fcdf3c5..2e1221310d13 100644
--- a/include/linux/execmem.h
+++ b/include/linux/execmem.h
@@ -44,10 +44,12 @@  enum execmem_module_flags {
  *				  space
  * @flags:	options for module memory allocations
  * @text:	address range for text allocations
+ * @data:	address range for data allocations
  */
 struct execmem_modules_range {
 	enum execmem_module_flags flags;
 	struct execmem_range text;
+	struct execmem_range data;
 };
 
 /**
@@ -89,6 +91,22 @@  struct execmem_params *execmem_arch_params(void);
  */
 void *execmem_text_alloc(size_t size);
 
+/**
+ * execmem_data_alloc - allocate memory for data coupled to code
+ * @size: how many bytes of memory are required
+ *
+ * Allocates memory that will contain data copupled with executable code,
+ * like data sections in kernel modules.
+ *
+ * The memory will have protections defined by architecture.
+ *
+ * The allocated memory will reside in an area that does not impose
+ * restrictions on the addressing modes.
+ *
+ * Return: a pointer to the allocated memory or %NULL
+ */
+void *execmem_data_alloc(size_t size);
+
 /**
  * execmem_free - free executable memory
  * @ptr: pointer to the memory that should be freed
diff --git a/kernel/module/main.c b/kernel/module/main.c
index b445c5ad863a..d6582bfec1f6 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -1195,25 +1195,16 @@  void __weak module_arch_freeing_init(struct module *mod)
 {
 }
 
-static bool mod_mem_use_vmalloc(enum mod_mem_type type)
-{
-	return IS_ENABLED(CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC) &&
-		mod_mem_type_is_core_data(type);
-}
-
 static void *module_memory_alloc(unsigned int size, enum mod_mem_type type)
 {
-	if (mod_mem_use_vmalloc(type))
-		return vzalloc(size);
+	if (mod_mem_type_is_data(type))
+		return execmem_data_alloc(size);
 	return execmem_text_alloc(size);
 }
 
 static void module_memory_free(void *ptr, enum mod_mem_type type)
 {
-	if (mod_mem_use_vmalloc(type))
-		vfree(ptr);
-	else
-		execmem_free(ptr);
+	execmem_free(ptr);
 }
 
 static void free_mod_mem(struct module *mod)
diff --git a/mm/execmem.c b/mm/execmem.c
index a67acd75ffef..f7bf496ad4c3 100644
--- a/mm/execmem.c
+++ b/mm/execmem.c
@@ -63,6 +63,20 @@  void *execmem_text_alloc(size_t size)
 			     fallback_start, fallback_end, kasan);
 }
 
+void *execmem_data_alloc(size_t size)
+{
+	unsigned long start = execmem_params.modules.data.start;
+	unsigned long end = execmem_params.modules.data.end;
+	pgprot_t pgprot = execmem_params.modules.data.pgprot;
+	unsigned int align = execmem_params.modules.data.alignment;
+	unsigned long fallback_start = execmem_params.modules.data.fallback_start;
+	unsigned long fallback_end = execmem_params.modules.data.fallback_end;
+	bool kasan = execmem_params.modules.flags & EXECMEM_KASAN_SHADOW;
+
+	return execmem_alloc(size, start, end, align, pgprot,
+			     fallback_start, fallback_end, kasan);
+}
+
 void execmem_free(void *ptr)
 {
 	/*
@@ -101,6 +115,28 @@  static bool execmem_validate_params(struct execmem_params *p)
 	return true;
 }
 
+static void execmem_init_missing(struct execmem_params *p)
+{
+	struct execmem_modules_range *m = &p->modules;
+
+	if (!pgprot_val(execmem_params.modules.data.pgprot))
+		execmem_params.modules.data.pgprot = PAGE_KERNEL;
+
+	if (!execmem_params.modules.data.alignment)
+		execmem_params.modules.data.alignment = m->text.alignment;
+
+	if (!execmem_params.modules.data.start) {
+		execmem_params.modules.data.start = m->text.start;
+		execmem_params.modules.data.end = m->text.end;
+	}
+
+	if (!execmem_params.modules.data.fallback_start &&
+	    execmem_params.modules.text.fallback_start) {
+		execmem_params.modules.data.fallback_start = m->text.fallback_start;
+		execmem_params.modules.data.fallback_end = m->text.fallback_end;
+	}
+}
+
 void __init execmem_init(void)
 {
 	struct execmem_params *p = execmem_arch_params();
@@ -112,6 +148,11 @@  void __init execmem_init(void)
 		p->modules.text.pgprot = PAGE_KERNEL_EXEC;
 		p->modules.text.alignment = 1;
 
+		p->modules.data.start = VMALLOC_START;
+		p->modules.data.end = VMALLOC_END;
+		p->modules.data.pgprot = PAGE_KERNEL;
+		p->modules.data.alignment = 1;
+
 		return;
 	}
 
@@ -119,4 +160,6 @@  void __init execmem_init(void)
 		return;
 
 	execmem_params = *p;
+
+	execmem_init_missing(p);
 }