[0/3] Type aware module allocator

Message ID 20230526051529.3387103-1-song@kernel.org
Headers
Series Type aware module allocator |

Message

Song Liu May 26, 2023, 5:15 a.m. UTC
  This set implements the second part of module type aware allocator
(module_alloc_type), which was discussed in [1]. This part contains the
interface of the new allocator, as well as changes in x86 code to use the
new allocator (modules, BPF, ftrace, kprobe).

The set does not contain a binpack allocator to enable sharing huge pages
among different allocations. But this set defines the interface used by
the binpack allocator. [2] has some discussion on different options of the
binpack allocator.

[1] https://lore.kernel.org/linux-mm/20221107223921.3451913-1-song@kernel.org/
[2] https://lore.kernel.org/all/20230308094106.227365-1-rppt@kernel.org/

Song Liu (3):
  module: Introduce module_alloc_type
  ftrace: Add swap_func to ftrace_process_locs()
  x86/module: Use module_alloc_type

 arch/x86/kernel/alternative.c  |  37 ++++--
 arch/x86/kernel/ftrace.c       |  44 ++++---
 arch/x86/kernel/kprobes/core.c |   8 +-
 arch/x86/kernel/module.c       | 114 +++++++++++------
 arch/x86/kernel/unwind_orc.c   |  13 +-
 arch/x86/net/bpf_jit_comp.c    |  22 +++-
 include/linux/ftrace.h         |   2 +
 include/linux/module.h         |   6 +
 include/linux/moduleloader.h   |  75 ++++++++++++
 init/main.c                    |   1 +
 kernel/bpf/bpf_struct_ops.c    |  10 +-
 kernel/bpf/core.c              |  26 ++--
 kernel/bpf/trampoline.c        |   6 +-
 kernel/kprobes.c               |   6 +-
 kernel/module/internal.h       |   3 +
 kernel/module/main.c           | 217 +++++++++++++++++++++++++++++++--
 kernel/module/strict_rwx.c     |   4 +
 kernel/trace/ftrace.c          |  13 +-
 18 files changed, 493 insertions(+), 114 deletions(-)

--
2.34.1
  

Comments

Kent Overstreet May 27, 2023, 7:04 a.m. UTC | #1
On Thu, May 25, 2023 at 10:15:26PM -0700, Song Liu wrote:
> This set implements the second part of module type aware allocator
> (module_alloc_type), which was discussed in [1]. This part contains the
> interface of the new allocator, as well as changes in x86 code to use the
> new allocator (modules, BPF, ftrace, kprobe).
> 
> The set does not contain a binpack allocator to enable sharing huge pages
> among different allocations. But this set defines the interface used by
> the binpack allocator. [2] has some discussion on different options of the
> binpack allocator.

I'm afraid the more I look at this patchset the more it appears to be
complete nonsense.

The exposed interface appears to be both for specifying architecture
dependent options (which should be hidden inside the allocator
internals!) _and_ a general purpose interface for module/bpf/kprobes -
but it's not very clear, and the rational appears to be completely
missing.

I think this needs to back to the drawing board and we need something
simpler just targeted at executable memory; architecture specific
options should definitely _not_ be part of the exposed interface.

The memory protection interface also needs to go, we've got a better
interface to model after (text_poke(), although that code needs work
too!). And the instruction fill features need a thorough justification
if they're to be included.
  
Song Liu May 28, 2023, 5:58 a.m. UTC | #2
On Sat, May 27, 2023 at 12:04 AM Kent Overstreet
<kent.overstreet@linux.dev> wrote:
>
> On Thu, May 25, 2023 at 10:15:26PM -0700, Song Liu wrote:
> > This set implements the second part of module type aware allocator
> > (module_alloc_type), which was discussed in [1]. This part contains the
> > interface of the new allocator, as well as changes in x86 code to use the
> > new allocator (modules, BPF, ftrace, kprobe).
> >
> > The set does not contain a binpack allocator to enable sharing huge pages
> > among different allocations. But this set defines the interface used by
> > the binpack allocator. [2] has some discussion on different options of the
> > binpack allocator.
>
> I'm afraid the more I look at this patchset the more it appears to be
> complete nonsense.

I don't think it is nonsense, as you clearly got most of the points here. :)

>
> The exposed interface appears to be both for specifying architecture
> dependent options (which should be hidden inside the allocator
> internals!) _and_ a general purpose interface for module/bpf/kprobes -
> but it's not very clear, and the rational appears to be completely
> missing.

The rationale is to have something to replace module_alloc(). Therefore,
it needs to handle architecture specific requirements, and provide
interface to all current users of module_alloc(). It may look a little weird
at the moment, because the actual allocator logic is very thin. But that's
where we will plug in the next step of the work.

>
> I think this needs to back to the drawing board and we need something
> simpler just targeted at executable memory; architecture specific
> options should definitely _not_ be part of the exposed interface.

I don't think we are exposing architecture specific options to users.
Some layer need to handle arch specifics. If the new allocator is
built on top of module_alloc, module_alloc is handling that. If the new
allocator is to replace module_alloc, it needs to handle arch specifics.

>
> The memory protection interface also needs to go, we've got a better
> interface to model after (text_poke(), although that code needs work
> too!). And the instruction fill features need a thorough justification
> if they're to be included.

I guess the first step to use text_poke() is to make it available on all
archs? That doesn't seem easy to me.

Thanks,
Song
  
Mike Rapoport May 29, 2023, 10:45 a.m. UTC | #3
On Sat, May 27, 2023 at 10:58:37PM -0700, Song Liu wrote:
> On Sat, May 27, 2023 at 12:04 AM Kent Overstreet
> <kent.overstreet@linux.dev> wrote:
> >
> > I think this needs to back to the drawing board and we need something
> > simpler just targeted at executable memory; architecture specific
> > options should definitely _not_ be part of the exposed interface.
> 
> I don't think we are exposing architecture specific options to users.
> Some layer need to handle arch specifics. If the new allocator is
> built on top of module_alloc, module_alloc is handling that. If the new
> allocator is to replace module_alloc, it needs to handle arch specifics.
 
I'm for creating a new allocator that will replace module_alloc(). This
will give us a clean abstraction that modules and all the rest will use and
it will make easier to plug binpack or another allocator instead of
vmalloc.
Another point is with a new allocator we won't have weird dependencies on
CONFIG_MODULE in e.g. bpf and kprobes.

I'll have something ready to post as an RFC in a few days.

> Thanks,
> Song
  
Kent Overstreet May 29, 2023, 6:25 p.m. UTC | #4
On Sat, May 27, 2023 at 10:58:37PM -0700, Song Liu wrote:
> I don't think we are exposing architecture specific options to users.
> Some layer need to handle arch specifics. If the new allocator is
> built on top of module_alloc, module_alloc is handling that. If the new
> allocator is to replace module_alloc, it needs to handle arch specifics.

Ok, I went back and read more thoroughly, I got this part wrong. The
actual interface is the mod_mem_type enum, not mod_alloc_params or
vmalloc_params.

So this was my main complaint, but this actually looks ok now.

It would be better to have those structs in a .c file, not the header
file - it looks like those are the public interface the way you have it.

> > The memory protection interface also needs to go, we've got a better
> > interface to model after (text_poke(), although that code needs work
> > too!). And the instruction fill features need a thorough justification
> > if they're to be included.
> 
> I guess the first step to use text_poke() is to make it available on all
> archs? That doesn't seem easy to me.

We just need a helper that either calls text_poke() or does the page
permission dance in a single place.

If we do the same thing for other mod_mem_types, we could potentially
allow them to be shared on hugepages too.
  
Song Liu May 30, 2023, 10:37 p.m. UTC | #5
On Mon, May 29, 2023 at 3:45 AM Mike Rapoport <rppt@kernel.org> wrote:
>
> On Sat, May 27, 2023 at 10:58:37PM -0700, Song Liu wrote:
> > On Sat, May 27, 2023 at 12:04 AM Kent Overstreet
> > <kent.overstreet@linux.dev> wrote:
> > >
> > > I think this needs to back to the drawing board and we need something
> > > simpler just targeted at executable memory; architecture specific
> > > options should definitely _not_ be part of the exposed interface.
> >
> > I don't think we are exposing architecture specific options to users.
> > Some layer need to handle arch specifics. If the new allocator is
> > built on top of module_alloc, module_alloc is handling that. If the new
> > allocator is to replace module_alloc, it needs to handle arch specifics.
>
> I'm for creating a new allocator that will replace module_alloc(). This
> will give us a clean abstraction that modules and all the rest will use and
> it will make easier to plug binpack or another allocator instead of
> vmalloc.
>
> Another point is with a new allocator we won't have weird dependencies on
> CONFIG_MODULE in e.g. bpf and kprobes.
>
> I'll have something ready to post as an RFC in a few days.

I guess this RFC is similar to unmapped_alloc()? If it replaces
vmalloc, we can probably trim this set down a bit (remove
mod_alloc_params and vmalloc_params, etc.).

Thanks,
Song
  
Song Liu May 30, 2023, 10:48 p.m. UTC | #6
On Mon, May 29, 2023 at 11:25 AM Kent Overstreet
<kent.overstreet@linux.dev> wrote:
>
> On Sat, May 27, 2023 at 10:58:37PM -0700, Song Liu wrote:
> > I don't think we are exposing architecture specific options to users.
> > Some layer need to handle arch specifics. If the new allocator is
> > built on top of module_alloc, module_alloc is handling that. If the new
> > allocator is to replace module_alloc, it needs to handle arch specifics.
>
> Ok, I went back and read more thoroughly, I got this part wrong. The
> actual interface is the mod_mem_type enum, not mod_alloc_params or
> vmalloc_params.
>
> So this was my main complaint, but this actually looks ok now.
>
> It would be better to have those structs in a .c file, not the header
> file - it looks like those are the public interface the way you have it.

Thanks for this suggestion. It makes a lot of sense. But I am not quite
sure how we can avoid putting it in the header yet. I will take a closer
look. OTOH, if we plan to use Mike's new allocator to replace vmalloc,
we probably don't need this part.

>
> > > The memory protection interface also needs to go, we've got a better
> > > interface to model after (text_poke(), although that code needs work
> > > too!). And the instruction fill features need a thorough justification
> > > if they're to be included.
> >
> > I guess the first step to use text_poke() is to make it available on all
> > archs? That doesn't seem easy to me.
>
> We just need a helper that either calls text_poke() or does the page
> permission dance in a single place.

AFAICT, we don't have a global text_poke() API yet. I can take a look
into it (if it makes sense).

>
> If we do the same thing for other mod_mem_types, we could potentially
> allow them to be shared on hugepages too.

Yeah, that's part of the goal to extend the scope from executable to all
types.

Thanks,
Song
  
Mike Rapoport May 31, 2023, 1:51 p.m. UTC | #7
On Tue, May 30, 2023 at 03:37:24PM -0700, Song Liu wrote:
> On Mon, May 29, 2023 at 3:45 AM Mike Rapoport <rppt@kernel.org> wrote:
> >
> > On Sat, May 27, 2023 at 10:58:37PM -0700, Song Liu wrote:
> > > On Sat, May 27, 2023 at 12:04 AM Kent Overstreet
> > > <kent.overstreet@linux.dev> wrote:
> > > >
> > > > I think this needs to back to the drawing board and we need something
> > > > simpler just targeted at executable memory; architecture specific
> > > > options should definitely _not_ be part of the exposed interface.
> > >
> > > I don't think we are exposing architecture specific options to users.
> > > Some layer need to handle arch specifics. If the new allocator is
> > > built on top of module_alloc, module_alloc is handling that. If the new
> > > allocator is to replace module_alloc, it needs to handle arch specifics.
> >
> > I'm for creating a new allocator that will replace module_alloc(). This
> > will give us a clean abstraction that modules and all the rest will use and
> > it will make easier to plug binpack or another allocator instead of
> > vmalloc.
> >
> > Another point is with a new allocator we won't have weird dependencies on
> > CONFIG_MODULE in e.g. bpf and kprobes.
> >
> > I'll have something ready to post as an RFC in a few days.
> 
> I guess this RFC is similar to unmapped_alloc()? If it replaces
> vmalloc, we can probably trim this set down a bit (remove
> mod_alloc_params and vmalloc_params, etc.).

No, it's not a new allocator. I'm trying to create an API for code
allocations that can accommodate all the architectures and it won't be a
part of modules code. The modules will use the new API just like every
other subsystem that needs to allocate code.

I've got a core part of it here:

https://git.kernel.org/pub/scm/linux/kernel/git/rppt/linux.git/log/?h=jitalloc/v1

and I hope I'll get it ready to post this week.

> Thanks,
> Song
  
Song Liu May 31, 2023, 5:03 p.m. UTC | #8
On Wed, May 31, 2023 at 6:51 AM Mike Rapoport <rppt@kernel.org> wrote:
>
> On Tue, May 30, 2023 at 03:37:24PM -0700, Song Liu wrote:
> > On Mon, May 29, 2023 at 3:45 AM Mike Rapoport <rppt@kernel.org> wrote:
> > >
> > > On Sat, May 27, 2023 at 10:58:37PM -0700, Song Liu wrote:
> > > > On Sat, May 27, 2023 at 12:04 AM Kent Overstreet
> > > > <kent.overstreet@linux.dev> wrote:
> > > > >
> > > > > I think this needs to back to the drawing board and we need something
> > > > > simpler just targeted at executable memory; architecture specific
> > > > > options should definitely _not_ be part of the exposed interface.
> > > >
> > > > I don't think we are exposing architecture specific options to users.
> > > > Some layer need to handle arch specifics. If the new allocator is
> > > > built on top of module_alloc, module_alloc is handling that. If the new
> > > > allocator is to replace module_alloc, it needs to handle arch specifics.
> > >
> > > I'm for creating a new allocator that will replace module_alloc(). This
> > > will give us a clean abstraction that modules and all the rest will use and
> > > it will make easier to plug binpack or another allocator instead of
> > > vmalloc.
> > >
> > > Another point is with a new allocator we won't have weird dependencies on
> > > CONFIG_MODULE in e.g. bpf and kprobes.
> > >
> > > I'll have something ready to post as an RFC in a few days.
> >
> > I guess this RFC is similar to unmapped_alloc()? If it replaces
> > vmalloc, we can probably trim this set down a bit (remove
> > mod_alloc_params and vmalloc_params, etc.).
>
> No, it's not a new allocator. I'm trying to create an API for code
> allocations that can accommodate all the architectures and it won't be a
> part of modules code. The modules will use the new API just like every
> other subsystem that needs to allocate code.
>
> I've got a core part of it here:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/rppt/linux.git/log/?h=jitalloc/v1

This branch looks like the same scope as this set (but with different
implementation). So it will still use vmalloc, right?

Thanks,
Song
  
Mike Rapoport May 31, 2023, 7:54 p.m. UTC | #9
On Wed, May 31, 2023 at 10:03:58AM -0700, Song Liu wrote:
> On Wed, May 31, 2023 at 6:51 AM Mike Rapoport <rppt@kernel.org> wrote:
> >
> > On Tue, May 30, 2023 at 03:37:24PM -0700, Song Liu wrote:
> > > On Mon, May 29, 2023 at 3:45 AM Mike Rapoport <rppt@kernel.org> wrote:
> > > >
> > > > On Sat, May 27, 2023 at 10:58:37PM -0700, Song Liu wrote:
> > > > > On Sat, May 27, 2023 at 12:04 AM Kent Overstreet
> > > > > <kent.overstreet@linux.dev> wrote:
> > > > > >
> > > > > > I think this needs to back to the drawing board and we need something
> > > > > > simpler just targeted at executable memory; architecture specific
> > > > > > options should definitely _not_ be part of the exposed interface.
> > > > >
> > > > > I don't think we are exposing architecture specific options to users.
> > > > > Some layer need to handle arch specifics. If the new allocator is
> > > > > built on top of module_alloc, module_alloc is handling that. If the new
> > > > > allocator is to replace module_alloc, it needs to handle arch specifics.
> > > >
> > > > I'm for creating a new allocator that will replace module_alloc(). This
> > > > will give us a clean abstraction that modules and all the rest will use and
> > > > it will make easier to plug binpack or another allocator instead of
> > > > vmalloc.
> > > >
> > > > Another point is with a new allocator we won't have weird dependencies on
> > > > CONFIG_MODULE in e.g. bpf and kprobes.
> > > >
> > > > I'll have something ready to post as an RFC in a few days.
> > >
> > > I guess this RFC is similar to unmapped_alloc()? If it replaces
> > > vmalloc, we can probably trim this set down a bit (remove
> > > mod_alloc_params and vmalloc_params, etc.).
> >
> > No, it's not a new allocator. I'm trying to create an API for code
> > allocations that can accommodate all the architectures and it won't be a
> > part of modules code. The modules will use the new API just like every
> > other subsystem that needs to allocate code.
> >
> > I've got a core part of it here:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/rppt/linux.git/log/?h=jitalloc/v1
> 
> This branch looks like the same scope as this set (but with different
> implementation). So it will still use vmalloc, right?

Yes, it still uses vmalloc. The idea is to decouple code allocations from
modules from one side and make it handle all the variants expected by the
architectures based on a set of parameters each architecture provides.

The first few commits essentially shuffle the code around and replace
arch::module_alloc() with arch::jit_alloc_params.

The commits on top enable some bits that are not available today, like ROX
executable memory and DYNAMIC_FTRACE without modules for x86.
 
> Thanks,
> Song
  
Kent Overstreet June 1, 2023, 5:53 p.m. UTC | #10
On Tue, May 30, 2023 at 03:48:51PM -0700, Song Liu wrote:
> On Mon, May 29, 2023 at 11:25 AM Kent Overstreet
> <kent.overstreet@linux.dev> wrote:
> >
> > On Sat, May 27, 2023 at 10:58:37PM -0700, Song Liu wrote:
> > > I don't think we are exposing architecture specific options to users.
> > > Some layer need to handle arch specifics. If the new allocator is
> > > built on top of module_alloc, module_alloc is handling that. If the new
> > > allocator is to replace module_alloc, it needs to handle arch specifics.
> >
> > Ok, I went back and read more thoroughly, I got this part wrong. The
> > actual interface is the mod_mem_type enum, not mod_alloc_params or
> > vmalloc_params.
> >
> > So this was my main complaint, but this actually looks ok now.
> >
> > It would be better to have those structs in a .c file, not the header
> > file - it looks like those are the public interface the way you have it.
> 
> Thanks for this suggestion. It makes a lot of sense. But I am not quite
> sure how we can avoid putting it in the header yet. I will take a closer
> look. OTOH, if we plan to use Mike's new allocator to replace vmalloc,
> we probably don't need this part.

The architectures previously exported constants that were used by
module_alloc(), why not stick with that?

> AFAICT, we don't have a global text_poke() API yet. I can take a look
> into it (if it makes sense).

Great

> Yeah, that's part of the goal to extend the scope from executable to all
> types.

Yeah it took me a bit to wrap my head around how this all makes sense -
it started out as just a better module_alloc(), then there was lots of
talk about hugepages.

I like it more now, looking forward to see how it fits together with
Mike's work :)
  
Kent Overstreet June 1, 2023, 6:21 p.m. UTC | #11
On Tue, May 30, 2023 at 03:48:51PM -0700, Song Liu wrote:
> On Mon, May 29, 2023 at 11:25 AM Kent Overstreet
> <kent.overstreet@linux.dev> wrote:
> >
> > On Sat, May 27, 2023 at 10:58:37PM -0700, Song Liu wrote:
> > > I don't think we are exposing architecture specific options to users.
> > > Some layer need to handle arch specifics. If the new allocator is
> > > built on top of module_alloc, module_alloc is handling that. If the new
> > > allocator is to replace module_alloc, it needs to handle arch specifics.
> >
> > Ok, I went back and read more thoroughly, I got this part wrong. The
> > actual interface is the mod_mem_type enum, not mod_alloc_params or
> > vmalloc_params.
> >
> > So this was my main complaint, but this actually looks ok now.
> >
> > It would be better to have those structs in a .c file, not the header
> > file - it looks like those are the public interface the way you have it.
> 
> Thanks for this suggestion. It makes a lot of sense. But I am not quite
> sure how we can avoid putting it in the header yet. I will take a closer
> look. OTOH, if we plan to use Mike's new allocator to replace vmalloc,
> we probably don't need this part.

Well, right now module_alloc() uses arch-exported constants, we could
keep doing that if it makes sense.

Or the structs could go in a separate header file that better indicates
what they're for. The main point is just that - when we're writing new
code and creating a new interface, it's very helpful if we can have the
header file basically be the documentation for what the external
interface is. Put your big kernel doc "what is this thing about" comment
in that file, too :)

> > We just need a helper that either calls text_poke() or does the page
> > permission dance in a single place.
> 
> AFAICT, we don't have a global text_poke() API yet. I can take a look
> into it (if it makes sense).

I don't think anyone's working on this yet, so if you're interested I
think it might be helpful.

> > If we do the same thing for other mod_mem_types, we could potentially
> > allow them to be shared on hugepages too.
> 
> Yeah, that's part of the goal to extend the scope from executable to all
> types.

Yeah, I think the "enumerate types of allocations that want similar page
permission handling" is a good thing to focus on.