[v9,2/8] kbuild: add modules_thick.builtin

Message ID 20221109134132.9052-3-nick.alcock@oracle.com
State New
Headers
Series [v9,1/8] kbuild: bring back tristate.conf |

Commit Message

Nick Alcock Nov. 9, 2022, 1:41 p.m. UTC
  This is similar to modules.builtin, and constructed in a similar way to
the way that used to be built before commit
8b41fc4454e36fbfdbb23f940d023d4dece2de29, via tristate.conf inclusion
and recursive concatenation up the tree. Unlike modules.builtin,
modules_thick.builtin gives the names of the object files that make up
modules that are comprised of more than one object file, using a syntax
similar to that of makefiles, e.g.:

crypto/crypto.o: crypto/api.o crypto/cipher.o crypto/compress.o crypto/memneq.o
crypto/crypto_algapi.o: crypto/algapi.o crypto/proc.o crypto/scatterwalk.o
crypto/aead.o:
crypto/geniv.o:

(where the latter two are single-file modules).

An upcoming commit will use this mapping to populate /proc/kallmodsyms.

A parser is included that yields a stram of (module, objfile name[])
mappings: it's a bit baroque, but then parsing text files in C is quite
painful, and I'd rather put the complexity in here than in its callers.
The parser is not built in this commit, nor does it have any callers
yet; nor is any rule added that causes modules_thick.builtin to actually
be constructed.  (Again, see a later commit for that.)

I am not wedded to the approach used to construct this file, but I don't
see any other way to do it despite spending a week or so trying to tie
it into Kbuild without using a separate Makefile.modbuiltin: unlike the
names of builtin modules (which are also recorded in the source files
themseves via MODULE_*() macros) the mapping from object file name to
built-in module name is not recorded anywhere but in the makefiles
themselves, so we have to at least reparse them with something to
indicate the builtin-ness of each module (i.e., tristate.conf) if we are
to figure out which modules are built-in and which are not.

Signed-off-by: Nick Alcock <nick.alcock@oracle.com>
Reviewed-by: Kris Van Hees <kris.van.hees@oracle.com>
---

Notes:
    v9: move modules_thick.builtin generation into the top-level Kbuild

 .gitignore                  |   1 +
 Documentation/dontdiff      |   1 +
 Kbuild                      |  16 +++
 Makefile                    |   2 +-
 scripts/Kbuild.include      |   6 ++
 scripts/Makefile.modbuiltin |  56 ++++++++++
 scripts/modules_thick.c     | 200 ++++++++++++++++++++++++++++++++++++
 scripts/modules_thick.h     |  48 +++++++++
 8 files changed, 329 insertions(+), 1 deletion(-)
 create mode 100644 scripts/Makefile.modbuiltin
 create mode 100644 scripts/modules_thick.c
 create mode 100644 scripts/modules_thick.h
  

Comments

Luis Chamberlain Nov. 10, 2022, 3:58 a.m. UTC | #1
On Wed, Nov 09, 2022 at 01:41:26PM +0000, Nick Alcock wrote:
> I am not wedded to the approach used to construct this file, but I don't
> see any other way to do it despite spending a week or so trying to tie
> it into Kbuild without using a separate Makefile.modbuiltin: unlike the
> names of builtin modules (which are also recorded in the source files
> themseves via MODULE_*() macros) the mapping from object file name to
> built-in module name is not recorded anywhere but in the makefiles
> themselves, so we have to at least reparse them with something to
> indicate the builtin-ness of each module (i.e., tristate.conf) if we are
> to figure out which modules are built-in and which are not.

Please try this patch, unless I am not understanding perhaps we may be
able to replace the first two patches with this one? It also seems
to capture a bit more data than the modules_thick.builtin file.

From 3e5ce9141c25fd4eb23b5423d9aa9f1c4ebe4e8e Mon Sep 17 00:00:00 2001
From: Luis Chamberlain <mcgrof@kernel.org>
Date: Wed, 9 Nov 2022 17:58:57 -0800
Subject: [PATCH] kbuild: add modules.builtin.objs

The file modules.builtin contains all modules that are built into
the kernel and this is used by modprobe to not fail when trying to
load something builtin. But for tools which want to see which object
files come from what modules we want to help them with such a mapping
as it is not easy to get this otherwise.

We do this by just extending scripts/Makefile.lib with a new variable
and define to capture all possible objects, and stuff this into a new
modinfo.

Note that this doesn't bloat the kernel as all because as you can see
in include/asm-generic/vmlinux.lds.h, the .modinfo section is discarded
at the link stage.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 .gitignore                      |  2 +-
 Documentation/dontdiff          |  2 +-
 Documentation/kbuild/kbuild.rst |  5 +++++
 Makefile                        | 10 +++++++---
 include/linux/module.h          |  4 +++-
 scripts/Makefile.lib            |  5 ++++-
 scripts/Makefile.vmlinux_o      | 15 ++++++++++++++-
 7 files changed, 35 insertions(+), 8 deletions(-)

diff --git a/.gitignore b/.gitignore
index 5da004814678..ef8665c64f21 100644
--- a/.gitignore
+++ b/.gitignore
@@ -67,7 +67,7 @@ modules.order
 /System.map
 /Module.markers
 /modules.builtin
-/modules.builtin.modinfo
+/modules.builtin.*
 /modules.nsdeps
 
 #
diff --git a/Documentation/dontdiff b/Documentation/dontdiff
index 352ff53a2306..ed1fbc711f33 100644
--- a/Documentation/dontdiff
+++ b/Documentation/dontdiff
@@ -180,7 +180,7 @@ mkutf8data
 modpost
 modules-only.symvers
 modules.builtin
-modules.builtin.modinfo
+modules.builtin.*
 modules.nsdeps
 modules.order
 modversions.h*
diff --git a/Documentation/kbuild/kbuild.rst b/Documentation/kbuild/kbuild.rst
index 08f575e6236c..504f31517cb4 100644
--- a/Documentation/kbuild/kbuild.rst
+++ b/Documentation/kbuild/kbuild.rst
@@ -17,6 +17,11 @@ modules.builtin
 This file lists all modules that are built into the kernel. This is used
 by modprobe to not fail when trying to load something builtin.
 
+modules.builtin.objs
+-----------------------
+This file contains object mapping of modules that are built into the kernel
+to their corresponding object files used to build the module.
+
 modules.builtin.modinfo
 -----------------------
 This file contains modinfo from all modules that are built into the kernel.
diff --git a/Makefile b/Makefile
index d148a55bfd0f..cd563d9713c9 100644
--- a/Makefile
+++ b/Makefile
@@ -1228,7 +1228,11 @@ PHONY += vmlinux_o
 vmlinux_o: vmlinux.a $(KBUILD_VMLINUX_LIBS)
 	$(Q)$(MAKE) -f $(srctree)/scripts/Makefile.vmlinux_o
 
-vmlinux.o modules.builtin.modinfo modules.builtin: vmlinux_o
+MODULES_BUILTIN := modules.builtin.modinfo
+MODULES_BUILTIN += modules.builtin
+MODULES_BUILTIN += modules.builtin.objs
+
+vmlinux.o $(MODULES_BUILTIN): vmlinux_o
 	@:
 
 PHONY += vmlinux
@@ -1558,7 +1562,7 @@ __modinst_pre:
 	fi
 	@sed 's:^:kernel/:' modules.order > $(MODLIB)/modules.order
 	@cp -f modules.builtin $(MODLIB)/
-	@cp -f $(objtree)/modules.builtin.modinfo $(MODLIB)/
+	@cp -f $(objtree)/modules.builtin.* $(MODLIB)/
 
 endif # CONFIG_MODULES
 
@@ -1571,7 +1575,7 @@ endif # CONFIG_MODULES
 
 # Directories & files removed with 'make clean'
 CLEAN_FILES += include/ksym vmlinux.symvers modules-only.symvers \
-	       modules.builtin modules.builtin.modinfo modules.nsdeps \
+	       modules.builtin modules.builtin.* modules.nsdeps \
 	       compile_commands.json .thinlto-cache rust/test rust/doc \
 	       .vmlinux.objs .vmlinux.export.c
 
diff --git a/include/linux/module.h b/include/linux/module.h
index ec61fb53979a..ce044cffd43d 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -179,7 +179,9 @@ extern void cleanup_module(void);
 #ifdef MODULE
 #define MODULE_FILE
 #else
-#define MODULE_FILE	MODULE_INFO(file, KBUILD_MODFILE);
+#define MODULE_FILE					                      \
+			MODULE_INFO(file, KBUILD_MODFILE);                    \
+			MODULE_INFO(objs, KBUILD_MODOBJS);
 #endif
 
 /*
diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
index 3aa384cec76b..f7e5a83572fa 100644
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -112,6 +112,8 @@ modname-multi = $(sort $(foreach m,$(multi-obj-ym),\
 __modname = $(or $(modname-multi),$(basetarget))
 
 modname = $(subst $(space),:,$(__modname))
+modname-objs = $($(modname)-objs) $($(modname)-y) $($(modname)-Y)
+modname-objs-prefixed = $(sort $(strip $(addprefix $(obj)/, $(modname-objs))))
 modfile = $(addprefix $(obj)/,$(__modname))
 
 # target with $(obj)/ and its suffix stripped
@@ -125,7 +127,8 @@ name-fix = $(call stringify,$(call name-fix-token,$1))
 basename_flags = -DKBUILD_BASENAME=$(call name-fix,$(basetarget))
 modname_flags  = -DKBUILD_MODNAME=$(call name-fix,$(modname)) \
 		 -D__KBUILD_MODNAME=kmod_$(call name-fix-token,$(modname))
-modfile_flags  = -DKBUILD_MODFILE=$(call stringify,$(modfile))
+modfile_flags  = -DKBUILD_MODFILE=$(call stringify,$(modfile)) \
+                 -DKBUILD_MODOBJS=$(call stringify,$(modfile).o:$(subst $(space),|,$(modname-objs-prefixed)))
 
 _c_flags       = $(filter-out $(CFLAGS_REMOVE_$(target-stem).o), \
                      $(filter-out $(ccflags-remove-y), \
diff --git a/scripts/Makefile.vmlinux_o b/scripts/Makefile.vmlinux_o
index 0edfdb40364b..9b4ca83f0695 100644
--- a/scripts/Makefile.vmlinux_o
+++ b/scripts/Makefile.vmlinux_o
@@ -1,7 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0-only
 
 PHONY := __default
-__default: vmlinux.o modules.builtin.modinfo modules.builtin
+__default: vmlinux.o modules.builtin.modinfo modules.builtin modules.builtin.objs
 
 include include/config/auto.conf
 include $(srctree)/scripts/Kbuild.include
@@ -86,6 +86,19 @@ targets += modules.builtin
 modules.builtin: modules.builtin.modinfo FORCE
 	$(call if_changed,modules_builtin)
 
+# module.builtin.objs
+# ---------------------------------------------------------------------------
+quiet_cmd_modules_builtin_objs = GEN     $@
+      cmd_modules_builtin_objs = \
+	tr '\0' '\n' < $< | \
+	sed -n 's/^[[:alnum:]:_]*\.objs=//p' | \
+	tr ' ' '\n' | uniq | sed -e 's|:|: |' -e 's:|: :g' | \
+	tr -s ' ' > $@
+
+targets += modules.builtin.objs
+modules.builtin.objs: modules.builtin.modinfo FORCE
+	$(call if_changed,modules_builtin_objs)
+
 # Add FORCE to the prequisites of a target to force it to be always rebuilt.
 # ---------------------------------------------------------------------------
  
Nick Alcock Nov. 11, 2022, 1:47 p.m. UTC | #2
On 10 Nov 2022, Luis Chamberlain uttered the following:

> On Wed, Nov 09, 2022 at 01:41:26PM +0000, Nick Alcock wrote:
>> I am not wedded to the approach used to construct this file, but I don't
>> see any other way to do it despite spending a week or so trying to tie
>> it into Kbuild without using a separate Makefile.modbuiltin: unlike the
>> names of builtin modules (which are also recorded in the source files
>> themseves via MODULE_*() macros) the mapping from object file name to
>> built-in module name is not recorded anywhere but in the makefiles
>> themselves, so we have to at least reparse them with something to
>> indicate the builtin-ness of each module (i.e., tristate.conf) if we are
>> to figure out which modules are built-in and which are not.
>
> Please try this patch, unless I am not understanding perhaps we may be
> able to replace the first two patches with this one? It also seems
> to capture a bit more data than the modules_thick.builtin file.

Oh thank you! At first sight this looks really good. I had trouble
believing it could be that much simpler :)

...

But... it's not quite doing the same thing, so perhaps it can't be that
much simpler. Picking the first item that appears in my test build of
this but not in modules_thick.builtin:

+arch/x86/crypto/libblake2s-x86_64.o: arch/x86/crypto/blake2s-core.o arch/x86/crypto/blake2s-glue.o

But...

obj-$(CONFIG_CRYPTO_BLAKE2S_X86) += libblake2s-x86_64.o
libblake2s-x86_64-y := blake2s-core.o blake2s-glue.o

config CRYPTO_BLAKE2S_X86
        bool "Hash functions: BLAKE2s (SSSE3/AVX-512)"

This cannot be built as a module. The point of modules_thick.builtin was
not to capture things that can be built into the kernel or left
unconfigured entirely (though that is *also* a nice thing to capture,
and should probably be captured regardless) but to capture *those things
that can possibly be built as modules*, i.e. those things which are
tristates in Kbuild and might possibly get built into .ko's. That was
the whole reason we needed the tristate stuff in the first place, and
I'm still not sure how to do it without that.

(The reason: if your tracer or whatever has a distinct notation used by
users for things in named modules, then you'd usually like to keep that
notation the same if you choose to build something into the kernel that
might otherwise be a module. Things that cannot be built as modules
could just use the in-the-core-kernel notation unconditionally, because
there's no way they can ever be found anywhere else: and users are
likely to expect that. At least, when I broke it in DTrace years ago, I
got complaints!)
  
Nick Alcock Nov. 11, 2022, 2:03 p.m. UTC | #3
On 11 Nov 2022, Nick Alcock said:

> But... it's not quite doing the same thing, so perhaps it can't be that
> much simpler. Picking the first item that appears in my test build of
> this but not in modules_thick.builtin:
>
> +arch/x86/crypto/libblake2s-x86_64.o: arch/x86/crypto/blake2s-core.o arch/x86/crypto/blake2s-glue.o
>
> But...
>
> obj-$(CONFIG_CRYPTO_BLAKE2S_X86) += libblake2s-x86_64.o
> libblake2s-x86_64-y := blake2s-core.o blake2s-glue.o
>
> config CRYPTO_BLAKE2S_X86
>         bool "Hash functions: BLAKE2s (SSSE3/AVX-512)"
>
> This cannot be built as a module. The point of modules_thick.builtin was
> not to capture things that can be built into the kernel or left
> unconfigured entirely (though that is *also* a nice thing to capture,
> and should probably be captured regardless) but to capture *those things
> that can possibly be built as modules*, i.e. those things which are
> tristates in Kbuild and might possibly get built into .ko's. That was
> the whole reason we needed the tristate stuff in the first place, and
> I'm still not sure how to do it without that.

OK, I think I should be able to combine your patch with (a variant of)
the tristate stuff and get the best of both worlds (only-modular plus
the efficiency of building and simplicity of your patch, with no nasty
separate recursion like I had for modules_thick.builtin). Working on
that...
  
Luis Chamberlain Nov. 11, 2022, 3:12 p.m. UTC | #4
On Fri, Nov 11, 2022 at 01:47:03PM +0000, Nick Alcock wrote:
> +arch/x86/crypto/libblake2s-x86_64.o: arch/x86/crypto/blake2s-core.o arch/x86/crypto/blake2s-glue.o
> 
> But...
> 
> obj-$(CONFIG_CRYPTO_BLAKE2S_X86) += libblake2s-x86_64.o
> libblake2s-x86_64-y := blake2s-core.o blake2s-glue.o
> 
> config CRYPTO_BLAKE2S_X86
>         bool "Hash functions: BLAKE2s (SSSE3/AVX-512)"
> 
> This cannot be built as a module.

mcgrof@fulton ~/linux (git::modules-next)$ git grep MODULE_LICENSE arch/x86/crypto/blake2s-*
arch/x86/crypto/blake2s-glue.c:MODULE_LICENSE("GPL v2");

Try removing that.

  Luis
  
Nick Alcock Nov. 14, 2022, 5:49 p.m. UTC | #5
On 11 Nov 2022, Luis Chamberlain outgrape:

> On Fri, Nov 11, 2022 at 01:47:03PM +0000, Nick Alcock wrote:
>> +arch/x86/crypto/libblake2s-x86_64.o: arch/x86/crypto/blake2s-core.o arch/x86/crypto/blake2s-glue.o
>> 
>> But...
>> 
>> obj-$(CONFIG_CRYPTO_BLAKE2S_X86) += libblake2s-x86_64.o
>> libblake2s-x86_64-y := blake2s-core.o blake2s-glue.o
>> 
>> config CRYPTO_BLAKE2S_X86
>>         bool "Hash functions: BLAKE2s (SSSE3/AVX-512)"
>> 
>> This cannot be built as a module.
>
> mcgrof@fulton ~/linux (git::modules-next)$ git grep MODULE_LICENSE arch/x86/crypto/blake2s-*
> arch/x86/crypto/blake2s-glue.c:MODULE_LICENSE("GPL v2");
>
> Try removing that.

OK, that works!

So if we're using the presence of MODULE_LICENSE to indicate that
something is potentially modular, I guess this means I need to do a
sweep through the kernel and find everywhere that cites a MODULE_LICENSE
and cannot be built as a module before this will say things are modules
that really are.

Should be easy enough to do semiautomatically, I hope (by comparison
with the output of the old thing using tristate on a make allyesconfig
build).

Hm. Maybe we need some sort of checking target that can use something
like the old tristate recursor to detect this problem to stop it
recurring. (Said target would only be run when requested, so it wouldn't
slow the build down the way it used to.)

... assuming that people even think the presence of a MODULE_LICENSE in
things that can't be modular is actually a problem. If it's not, my
chances of getting fixes for all these cases in seems low.
  
Luis Chamberlain Nov. 15, 2022, 9:21 p.m. UTC | #6
On Mon, Nov 14, 2022 at 05:49:58PM +0000, Nick Alcock wrote:
> On 11 Nov 2022, Luis Chamberlain outgrape:
> 
> > On Fri, Nov 11, 2022 at 01:47:03PM +0000, Nick Alcock wrote:
> >> +arch/x86/crypto/libblake2s-x86_64.o: arch/x86/crypto/blake2s-core.o arch/x86/crypto/blake2s-glue.o
> >> 
> >> But...
> >> 
> >> obj-$(CONFIG_CRYPTO_BLAKE2S_X86) += libblake2s-x86_64.o
> >> libblake2s-x86_64-y := blake2s-core.o blake2s-glue.o
> >> 
> >> config CRYPTO_BLAKE2S_X86
> >>         bool "Hash functions: BLAKE2s (SSSE3/AVX-512)"
> >> 
> >> This cannot be built as a module.
> >
> > mcgrof@fulton ~/linux (git::modules-next)$ git grep MODULE_LICENSE arch/x86/crypto/blake2s-*
> > arch/x86/crypto/blake2s-glue.c:MODULE_LICENSE("GPL v2");
> >
> > Try removing that.
> 
> OK, that works!
> 
> So if we're using the presence of MODULE_LICENSE to indicate that
> something is potentially modular, I guess this means I need to do a
> sweep through the kernel and find everywhere that cites a MODULE_LICENSE
> and cannot be built as a module before this will say things are modules
> that really are.

Yes, make allyesconfig builds + a verifier for tristate would be nice.
scripts/kconfig/streamline_config.pl has an iterator over kconfig files
and also objects which you might find useful.

At build time such a thing could nag about issues like the above.

  Luis
  
Nick Alcock Nov. 21, 2022, 3:21 p.m. UTC | #7
On 15 Nov 2022, Luis Chamberlain uttered the following:

> On Mon, Nov 14, 2022 at 05:49:58PM +0000, Nick Alcock wrote:
>> On 11 Nov 2022, Luis Chamberlain outgrape:
>> 
>> > On Fri, Nov 11, 2022 at 01:47:03PM +0000, Nick Alcock wrote:
>> >> +arch/x86/crypto/libblake2s-x86_64.o: arch/x86/crypto/blake2s-core.o arch/x86/crypto/blake2s-glue.o
>> >> 
>> >> But...
>> >> 
>> >> obj-$(CONFIG_CRYPTO_BLAKE2S_X86) += libblake2s-x86_64.o
>> >> libblake2s-x86_64-y := blake2s-core.o blake2s-glue.o
>> >> 
>> >> config CRYPTO_BLAKE2S_X86
>> >>         bool "Hash functions: BLAKE2s (SSSE3/AVX-512)"
>> >> 
>> >> This cannot be built as a module.
>> >
>> > mcgrof@fulton ~/linux (git::modules-next)$ git grep MODULE_LICENSE arch/x86/crypto/blake2s-*
>> > arch/x86/crypto/blake2s-glue.c:MODULE_LICENSE("GPL v2");
>> >
>> > Try removing that.
>> 
>> OK, that works!
>> 
>> So if we're using the presence of MODULE_LICENSE to indicate that
>> something is potentially modular, I guess this means I need to do a
>> sweep through the kernel and find everywhere that cites a MODULE_LICENSE
>> and cannot be built as a module before this will say things are modules
>> that really are.
>
> Yes, make allyesconfig builds + a verifier for tristate would be nice.
> scripts/kconfig/streamline_config.pl has an iterator over kconfig files
> and also objects which you might find useful.
>
> At build time such a thing could nag about issues like the above.

OK, I've got enough of this working now that weird edge cases are
popping out and it's starting to become fun rather than frustrating!

Some of them are bugs in the tristate-checker side of things, e.g. in
net/bridge/Makefile we see

bridge-$(subst m,y,$(CONFIG_BRIDGE_NETFILTER)) += br_nf_core.o

and my code is tripping over because while it runs
CONFIG_BRIDGE_NETFILTER comes from tristate.conf and has an uppercase
value. This sort of thing is rare (a dozen cases at most across the
tree) and easily fixable in the individual makefiles as part of the
tristate checker commit, so I'll just do that.

But some are I think obscure bugs in the new machinery. e.g. in
security/keys/encrypted-keys/Makefile we have this wonderfully crazy
thing:

masterkey-$(CONFIG_TRUSTED_KEYS)-$(CONFIG_ENCRYPTED_KEYS) := masterkey_trusted.o

This produces things that look like masterkey-y-y, which does not match
$(modname)-y. I'll come up with a more complete list shortly (a make
allyesconfig is spinning now). We can probably hack around these by also
expanding $(modname)-y-y etc, for as many steps out as it seems people
are actually doing in the tree right now. I can submit a patch that does
that if you like. (If there's a less gross and fragile approach, I'd be
happy to do that instead.)


One question: do you think it's worthwhile me submitting patches to
de-MODULE_* things that need it? It looks like I need to tear all the
MODULE_ lines out before the .modinfo section goes away and this gives
accurate results. This isn't actually deleting copyright lines, but it's
making me a bit worried anyway: it feels morally equivalent somehow.

Is it actually considered OK to just delete MODULE_LICENSE,
MODULE_AUTHOR etc for non-modules when they were written by someone
else? (Commenting them out, leaving them as documentation, is presumably
less likely to anger people but might be considered ugly.)
  
Luis Chamberlain Nov. 21, 2022, 7:12 p.m. UTC | #8
On Mon, Nov 21, 2022 at 03:21:10PM +0000, Nick Alcock wrote:
> One question: do you think it's worthwhile me submitting patches to
> de-MODULE_* things that need it?

100% yes.

Yes please remove all that module declration helpers for things that are
not modules, and after you add your helper which will nag at build time
when it finds new ones.

For justification just mention in the commit log that after commit
8b41fc4454e ("kbuild: create modules.builtin without Makefile.modbuiltin or
tristate.conf") we rely on the module license tag to generate the
modules.builtin file and so built-in code which uses module helpers
just need to be removed.

  Luis
  
Luis Chamberlain Nov. 21, 2022, 7:18 p.m. UTC | #9
On Mon, Nov 21, 2022 at 11:12:52AM -0800, Luis Chamberlain wrote:
> On Mon, Nov 21, 2022 at 03:21:10PM +0000, Nick Alcock wrote:
> > One question: do you think it's worthwhile me submitting patches to
> > de-MODULE_* things that need it?
> 
> 100% yes.
> 
> Yes please remove all that module declration helpers for things that are
> not modules, and after you add your helper which will nag at build time
> when it finds new ones.
> 
> For justification just mention in the commit log that after commit
> 8b41fc4454e ("kbuild: create modules.builtin without Makefile.modbuiltin or
> tristate.conf") we rely on the module license tag to generate the
> modules.builtin file and so built-in code which uses module helpers
> just need to be removed.

You should also mention what modules.builtin is used for as per our
Documentation/kbuild/kbuild.rst, and that it is only used for
modprobe to *not* fail when trying to load a module which is
built-in.

And so, the removing these tags doesn't fix anything critical
in particular, but it does fix false positive uses of userspace
modprobe use against built-in symbols.

That would *complete* the commit log with a clear justification and
evaluation of impact.

How many of these are we talking about? I'm happy to take them
via modules-next. I'd hope to not run accross many conflicts against
other trees.

  Luis
  
Nick Alcock Nov. 21, 2022, 9:14 p.m. UTC | #10
On 21 Nov 2022, Luis Chamberlain spake thusly:

> On Mon, Nov 21, 2022 at 11:12:52AM -0800, Luis Chamberlain wrote:
>> On Mon, Nov 21, 2022 at 03:21:10PM +0000, Nick Alcock wrote:
>> > One question: do you think it's worthwhile me submitting patches to
>> > de-MODULE_* things that need it?
>> 
>> 100% yes.
>> 
>> Yes please remove all that module declration helpers for things that are
>> not modules, and after you add your helper which will nag at build time
>> when it finds new ones.
>> 
>> For justification just mention in the commit log that after commit
>> 8b41fc4454e ("kbuild: create modules.builtin without Makefile.modbuiltin or
>> tristate.conf") we rely on the module license tag to generate the
>> modules.builtin file and so built-in code which uses module helpers
>> just need to be removed.
>
> You should also mention what modules.builtin is used for as per our
> Documentation/kbuild/kbuild.rst, and that it is only used for
> modprobe to *not* fail when trying to load a module which is
> built-in.

Yep! (Which is an extremely damn useful improvement, I must say. I'm
relying on it already.)

Only two remaining problems in your patch that I can see (hacked around
in the checker, but consumers shouldn't have to hack around this sort of
thing). First, some very strange lines like this in modules.builtin.objs:

drivers/hid/hid-uclogic.o: drivers/hid/hid-uclogic-core.o drivers/hid/hid-uclogic-params.o drivers/hid/hid-uclogic-rdesc.o
drivers/hid/hid-uclogic
drivers/hid/hid-uclogic-test.o: 

(note the line with no .o or colon at all.)

This seems to be a consequence of lines like

hid-uclogic-objs                := hid-uclogic-core.o \
                                   hid-uclogic-rdesc.o \
                                   hid-uclogic-params.o
obj-$(CONFIG_HID_UCLOGIC)       += hid-uclogic.o

i.e. use of -objs as a completely random variable for holding object
files which might or might not be in a module. This seems a bit.. risky
to me. Looking for a fix... maybe we can just ignore *-objs on the
grounds that if it matters it will always land in some other variable
too? ... maybe?


One other definite problem: drivers/staging/media/atomisp/Makefile says:

obj-$(CONFIG_VIDEO_ATOMISP) += pci/atomisp_gmin_platform.o

This subdirectory is lost from KBUILD_MODOBJS, leading to the file entry
in modinfo and thus the resulting modules.builtin.objs pointing to a
file named drivers/staging/media/atomisp/atomisp_gmin_platform.o, which
does not exist. (This is also seen in some non-staging directories, e.g.
kernel/trace/rv.)

> How many of these are we talking about? I'm happy to take them
> via modules-next. I'd hope to not run accross many conflicts against
> other trees.

Going by an x86 allyesconfig run, 169 total (probably plus a few given
errors like the one above), plus no doubt a few more for other arches.
So not a vast number, but enough that hacking up a checker was clearly
not a waste of time.

If there turn out to be any conflicts that aren't spurious I'd be very
surprised. Hardly anyone ever even adjusts their email address in
MODULE_AUTHOR when they change it :P
  

Patch

diff --git a/.gitignore b/.gitignore
index 5da004814678..f129bf52cbd4 100644
--- a/.gitignore
+++ b/.gitignore
@@ -52,6 +52,7 @@ 
 *.zst
 Module.symvers
 modules.order
+modules_thick.builtin
 
 #
 # Top-level generic files
diff --git a/Documentation/dontdiff b/Documentation/dontdiff
index 352ff53a2306..077d43b9675d 100644
--- a/Documentation/dontdiff
+++ b/Documentation/dontdiff
@@ -183,6 +183,7 @@  modules.builtin
 modules.builtin.modinfo
 modules.nsdeps
 modules.order
+modules_thick.builtin
 modversions.h*
 nconf
 nconf-cfg
diff --git a/Kbuild b/Kbuild
index 464b34a08f51..a84e8312c174 100644
--- a/Kbuild
+++ b/Kbuild
@@ -97,3 +97,19 @@  obj-$(CONFIG_SAMPLES)	+= samples/
 obj-$(CONFIG_NET)	+= net/
 obj-y			+= virt/
 obj-y			+= $(ARCH_DRIVERS)
+
+# Generate modules_thick.builtin if needed.
+#
+# modules_thick.builtin maps from kernel modules (or rather the object file
+# names they would have had had they not been built in) to their constituent
+# object files: we can use this to determine which modules any given object
+# file is part of.  (We cannot eliminate the slight redundancy here without
+# double-expansion.)
+
+modthickbuiltin-files := $(addsuffix modules_thick.builtin, $(filter %/,$(obj-y)))
+
+$(modthickbuiltin-files): include/config/tristate.conf
+	$(Q)$(MAKE) $(modbuiltin)=$(patsubst %/modules_thick.builtin,%,$@) builtin-file=modules_thick.builtin
+
+modules_thick.builtin: $(modthickbuiltin-files) $(obj-y)
+	$(Q)$(AWK) '!x[$$0]++' $(patsubst %/built-in.a, %/$@, $(filter %/built-in.a,$(obj-y))) > $@
diff --git a/Makefile b/Makefile
index 5d26447fecb8..21117f9d4202 100644
--- a/Makefile
+++ b/Makefile
@@ -2008,7 +2008,7 @@  clean: $(clean-dirs)
 		-o -name '*.lex.c' -o -name '*.tab.[ch]' \
 		-o -name '*.asn1.[ch]' \
 		-o -name '*.symtypes' -o -name 'modules.order' \
-		-o -name '.tmp_*' \
+		-o -name '.tmp_*' -o -name modules_thick.builtin \
 		-o -name '*.c.[012]*.*' \
 		-o -name '*.ll' \
 		-o -name '*.gcno' \
diff --git a/scripts/Kbuild.include b/scripts/Kbuild.include
index 2bc08ace38a3..28d5eb7a9b61 100644
--- a/scripts/Kbuild.include
+++ b/scripts/Kbuild.include
@@ -78,6 +78,12 @@  endef
 # $(Q)$(MAKE) $(build)=dir
 build := -f $(srctree)/scripts/Makefile.build obj
 
+###
+# Shorthand for $(Q)$(MAKE) -f scripts/Makefile.modbuiltin obj=
+# Usage:
+# $(Q)$(MAKE) $(modbuiltin)=dir
+modbuiltin := -f $(srctree)/scripts/Makefile.modbuiltin obj
+
 ###
 # Shorthand for $(Q)$(MAKE) -f scripts/Makefile.dtbinst obj=
 # Usage:
diff --git a/scripts/Makefile.modbuiltin b/scripts/Makefile.modbuiltin
new file mode 100644
index 000000000000..a27b692ea795
--- /dev/null
+++ b/scripts/Makefile.modbuiltin
@@ -0,0 +1,56 @@ 
+# SPDX-License-Identifier: GPL-2.0
+# ==========================================================================
+# Generating modules_thick.builtin
+# ==========================================================================
+
+src := $(obj)
+
+PHONY := __modbuiltin
+__modbuiltin:
+
+include include/config/auto.conf
+# tristate.conf sets tristate variables to uppercase 'Y' or 'M'
+# That way, we get the list of built-in modules in obj-Y
+include include/config/tristate.conf
+
+include scripts/Kbuild.include
+
+ifdef building_out_of_srctree
+# Create output directory if not already present
+_dummy := $(shell [ -d $(obj) ] || mkdir -p $(obj))
+endif
+
+# The filename Kbuild has precedence over Makefile
+kbuild-dir := $(if $(filter /%,$(src)),$(src),$(srctree)/$(src))
+kbuild-file := $(if $(wildcard $(kbuild-dir)/Kbuild),$(kbuild-dir)/Kbuild,$(kbuild-dir)/Makefile)
+include $(kbuild-file)
+
+include scripts/Makefile.lib
+
+modthickbuiltin-subdirs := $(patsubst %,%/modules_thick.builtin, $(subdir-ym))
+modthickbuiltin-target  := $(obj)/modules_thick.builtin
+
+__modbuiltin: $(obj)/$(builtin-file) $(subdir-ym)
+	@:
+
+$(modthickbuiltin-target): $(subdir-ym) FORCE
+	$(Q) rm -f $@
+	$(Q) $(foreach mod-o, $(filter %.o,$(obj-Y)),\
+		printf "%s:" $(addprefix $(obj)/,$(mod-o)) >> $@; \
+		printf " %s" $(sort $(strip $(addprefix $(obj)/,$($(mod-o:.o=-objs)) \
+			$($(mod-o:.o=-y)) $($(mod-o:.o=-Y))))) >> $@; \
+		printf "\n" >> $@; ) \
+	cat /dev/null $(modthickbuiltin-subdirs) >> $@;
+
+PHONY += FORCE
+
+FORCE:
+
+# Descending
+# ---------------------------------------------------------------------------
+
+PHONY += $(subdir-ym)
+$(subdir-ym):
+	$(Q)$(MAKE) $(modbuiltin)=$@ builtin-file=$(builtin-file)
+
+.PHONY: $(PHONY)
diff --git a/scripts/modules_thick.c b/scripts/modules_thick.c
new file mode 100644
index 000000000000..9a15e99c1330
--- /dev/null
+++ b/scripts/modules_thick.c
@@ -0,0 +1,200 @@ 
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * A simple modules_thick reader.
+ *
+ * (C) 2014, 2021 Oracle, Inc.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <errno.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include "modules_thick.h"
+
+/*
+ * Read a modules_thick.builtin file and translate it into a stream of
+ * name / module-name pairs.
+ */
+
+/*
+ * Construct a modules_thick.builtin iterator.
+ */
+struct modules_thick_iter *
+modules_thick_iter_new(const char *modules_thick_file)
+{
+	struct modules_thick_iter *i;
+
+	i = calloc(1, sizeof(struct modules_thick_iter));
+	if (i == NULL)
+		return NULL;
+
+	i->f = fopen(modules_thick_file, "r");
+
+	if (i->f == NULL) {
+		fprintf(stderr, "Cannot open builtin module file %s: %s\n",
+			modules_thick_file, strerror(errno));
+		return NULL;
+	}
+
+	return i;
+}
+
+/*
+ * Iterate, returning a new null-terminated array of object file names, and a
+ * new dynamically-allocated module name.  (The module name passed in is freed.)
+ *
+ * The array of object file names should be freed by the caller: the strings it
+ * points to are owned by the iterator, and should not be freed.
+ */
+
+char ** __attribute__((__nonnull__))
+modules_thick_iter_next(struct modules_thick_iter *i, char **module_name)
+{
+	size_t npaths = 1;
+	char **module_paths;
+	char *last_slash;
+	char *last_dot;
+	char *trailing_linefeed;
+	char *object_name = i->line;
+	char *dash;
+	int composite = 0;
+
+	/*
+	 * Read in all module entries, computing the suffixless, pathless name
+	 * of the module and building the next arrayful of object file names for
+	 * return.
+	 *
+	 * Modules can consist of multiple files: in this case, the portion
+	 * before the colon is the path to the module (as before): the portion
+	 * after the colon is a space-separated list of files that should be
+	 * considered part of this module.  In this case, the portion before the
+	 * name is an "object file" that does not actually exist: it is merged
+	 * into built-in.a without ever being written out.
+	 *
+	 * All module names have - translated to _, to match what is done to the
+	 * names of the same things when built as modules.
+	 */
+
+	/*
+	 * Reinvocation of exhausted iterator. Return NULL, once.
+	 */
+retry:
+	if (getline(&i->line, &i->line_size, i->f) < 0) {
+		if (ferror(i->f)) {
+			fprintf(stderr, "Error reading from modules_thick file:"
+				" %s\n", strerror(errno));
+			exit(1);
+		}
+		rewind(i->f);
+		return NULL;
+	}
+
+	if (i->line[0] == '\0')
+		goto retry;
+
+	/*
+	 * Slice the line in two at the colon, if any.  If there is anything
+	 * past the ': ', this is a composite module.  (We allow for no colon
+	 * for robustness, even though one should always be present.)
+	 */
+	if (strchr(i->line, ':') != NULL) {
+		char *name_start;
+
+		object_name = strchr(i->line, ':');
+		*object_name = '\0';
+		object_name++;
+		name_start = object_name + strspn(object_name, " \n");
+		if (*name_start != '\0') {
+			composite = 1;
+			object_name = name_start;
+		}
+	}
+
+	/*
+	 * Figure out the module name.
+	 */
+	last_slash = strrchr(i->line, '/');
+	last_slash = (!last_slash) ? i->line :
+		last_slash + 1;
+	free(*module_name);
+	*module_name = strdup(last_slash);
+	dash = *module_name;
+
+	while (dash != NULL) {
+		dash = strchr(dash, '-');
+		if (dash != NULL)
+			*dash = '_';
+	}
+
+	last_dot = strrchr(*module_name, '.');
+	if (last_dot != NULL)
+		*last_dot = '\0';
+
+	trailing_linefeed = strchr(object_name, '\n');
+	if (trailing_linefeed != NULL)
+		*trailing_linefeed = '\0';
+
+	/*
+	 * Multifile separator? Object file names explicitly stated:
+	 * slice them up and shuffle them in.
+	 *
+	 * The array size may be an overestimate if any object file
+	 * names start or end with spaces (very unlikely) but cannot be
+	 * an underestimate.  (Check for it anyway.)
+	 */
+	if (composite) {
+		char *one_object;
+
+		for (npaths = 0, one_object = object_name;
+		     one_object != NULL;
+		     npaths++, one_object = strchr(one_object + 1, ' '));
+	}
+
+	module_paths = malloc((npaths + 1) * sizeof(char *));
+	if (!module_paths) {
+		fprintf(stderr, "%s: out of memory on module %s\n", __func__,
+			*module_name);
+		exit(1);
+	}
+
+	if (composite) {
+		char *one_object;
+		size_t i = 0;
+
+		while ((one_object = strsep(&object_name, " ")) != NULL) {
+			if (i >= npaths) {
+				fprintf(stderr, "%s: num_objs overflow on module "
+					"%s: this is a bug.\n", __func__,
+					*module_name);
+				exit(1);
+			}
+
+			module_paths[i++] = one_object;
+		}
+	} else
+		module_paths[0] = i->line;	/* untransformed module name */
+
+	module_paths[npaths] = NULL;
+
+	return module_paths;
+}
+
+/*
+ * Free an iterator. Can be called while iteration is underway, so even
+ * state that is freed at the end of iteration must be freed here too.
+ */
+void
+modules_thick_iter_free(struct modules_thick_iter *i)
+{
+	if (i == NULL)
+		return;
+	fclose(i->f);
+	free(i->line);
+	free(i);
+}
diff --git a/scripts/modules_thick.h b/scripts/modules_thick.h
new file mode 100644
index 000000000000..f5edcaf9550c
--- /dev/null
+++ b/scripts/modules_thick.h
@@ -0,0 +1,48 @@ 
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * A simple modules_thick reader.
+ *
+ * (C) 2014, 2021 Oracle, Inc.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#ifndef _LINUX_MODULES_THICK_H
+#define _LINUX_MODULES_THICK_H
+
+#include <stdio.h>
+#include <stddef.h>
+
+/*
+ * modules_thick.builtin iteration state.
+ */
+struct modules_thick_iter {
+	FILE *f;
+	char *line;
+	size_t line_size;
+};
+
+/*
+ * Construct a modules_thick.builtin iterator.
+ */
+struct modules_thick_iter *
+modules_thick_iter_new(const char *modules_thick_file);
+
+/*
+ * Iterate, returning a new null-terminated array of object file names, and a
+ * new dynamically-allocated module name.  (The module name passed in is freed.)
+ *
+ * The array of object file names should be freed by the caller: the strings it
+ * points to are owned by the iterator, and should not be freed.
+ */
+
+char ** __attribute__((__nonnull__))
+modules_thick_iter_next(struct modules_thick_iter *i, char **module_name);
+
+void
+modules_thick_iter_free(struct modules_thick_iter *i);
+
+#endif