[v2,1/3] objtool: Make objtool check actually fatal upon fatal errors

Message ID 20231213134303.2302285-2-dimitri.ledkov@canonical.com
State New
Headers
Series [v2,1/3] objtool: Make objtool check actually fatal upon fatal errors |

Commit Message

Dimitri John Ledkov Dec. 13, 2023, 1:43 p.m. UTC
  Currently function calls within check() are sensitive to fatal errors
(negative return codes) and abort execution prematurely. However, in
all such cases the check() function still returns 0, and thus
resulting in a successful kernel build.

The only correct code paths were the ones that escpae the control flow
with `return ret`.

Make the check() function return `ret` status code, and make all
negative return codes goto that instruction. This makes fatal errors
(not warnings) from various function calls actually fail the
build. E.g. if create_retpoline_sites_sections() fails to create elf
section pair retpoline_sites the tool now exits with an error code.

Signed-off-by: Dimitri John Ledkov <dimitri.ledkov@canonical.com>
---
 tools/objtool/check.c | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)
  

Comments

Ingo Molnar Jan. 8, 2024, 9:15 a.m. UTC | #1
* Dimitri John Ledkov <dimitri.ledkov@canonical.com> wrote:

> Currently function calls within check() are sensitive to fatal errors
> (negative return codes) and abort execution prematurely. However, in
> all such cases the check() function still returns 0, and thus
> resulting in a successful kernel build.
> 
> The only correct code paths were the ones that escpae the control flow
> with `return ret`.
> 
> Make the check() function return `ret` status code, and make all
> negative return codes goto that instruction. This makes fatal errors
> (not warnings) from various function calls actually fail the
> build. E.g. if create_retpoline_sites_sections() fails to create elf
> section pair retpoline_sites the tool now exits with an error code.
> 
> Signed-off-by: Dimitri John Ledkov <dimitri.ledkov@canonical.com>

So, is this not expected to be the case anymore:

>  out:
> -	/*
> -	 *  For now, don't fail the kernel build on fatal warnings.  These
> -	 *  errors are still fairly common due to the growing matrix of
> -	 *  supported toolchains and their recent pace of change.
> -	 */
> -	return 0;

?

How about making it only fatal if CONFIG_WERROR=y, ie. an analogue to our 
treatment of compiler warnings?

Thanks,

	Ingo
  
Josh Poimboeuf Jan. 9, 2024, 7:24 p.m. UTC | #2
On Mon, Jan 08, 2024 at 10:15:34AM +0100, Ingo Molnar wrote:
> 
> * Dimitri John Ledkov <dimitri.ledkov@canonical.com> wrote:
> 
> > Currently function calls within check() are sensitive to fatal errors
> > (negative return codes) and abort execution prematurely. However, in
> > all such cases the check() function still returns 0, and thus
> > resulting in a successful kernel build.
> > 
> > The only correct code paths were the ones that escpae the control flow
> > with `return ret`.
> > 
> > Make the check() function return `ret` status code, and make all
> > negative return codes goto that instruction. This makes fatal errors
> > (not warnings) from various function calls actually fail the
> > build. E.g. if create_retpoline_sites_sections() fails to create elf
> > section pair retpoline_sites the tool now exits with an error code.
> > 
> > Signed-off-by: Dimitri John Ledkov <dimitri.ledkov@canonical.com>
> 
> So, is this not expected to be the case anymore:
> 
> >  out:
> > -	/*
> > -	 *  For now, don't fail the kernel build on fatal warnings.  These
> > -	 *  errors are still fairly common due to the growing matrix of
> > -	 *  supported toolchains and their recent pace of change.
> > -	 */
> > -	return 0;
> 
> ?
> 
> How about making it only fatal if CONFIG_WERROR=y, ie. an analogue to our 
> treatment of compiler warnings?

Objtool has two classes of warnings:

1) "fatal"

   - allocation failures
   - CFG recreation failures
   - annotation parsing errors
   - other objtool bugs

2) non-"fatal":

  - missing security features (retpolines, IBT, SLS INT3)
  - unreachable instructions (note this warning may indicate more
    serious issues like an incomplete or buggy objtool CFG)

The first class of "warning" is actually an error.  It means objtool
couldn't reasonably continue, so it exited early.  I'm thinking this
should always fail the build so it can be reported and fixed ASAP.

We tried doing that before, but ending up reverting it (for raisins).
We should try again (as per the above patch).

The second class of warning, though it doesn't abort objtool, can still
be quite serious.  Ignoring it can fail the boot, or can expose the user
to certain attacks.

My proposal would be to always fail for #1, and to make #2 dependent on
CONFIG_WERROR.

Note the latter may be problematic at the moment due to some outstanding
warnings reported by Arnd and randconfig build bots.  I try to fix those
when I can, but any help would be appreciated.
  
Dimitri John Ledkov Jan. 9, 2024, 7:48 p.m. UTC | #3
Hi,

On Tue, 9 Jan 2024 at 19:24, Josh Poimboeuf <jpoimboe@kernel.org> wrote:
>
> On Mon, Jan 08, 2024 at 10:15:34AM +0100, Ingo Molnar wrote:
> >
> > * Dimitri John Ledkov <dimitri.ledkov@canonical.com> wrote:
> >
> > > Currently function calls within check() are sensitive to fatal errors
> > > (negative return codes) and abort execution prematurely. However, in
> > > all such cases the check() function still returns 0, and thus
> > > resulting in a successful kernel build.
> > >
> > > The only correct code paths were the ones that escpae the control flow
> > > with `return ret`.
> > >
> > > Make the check() function return `ret` status code, and make all
> > > negative return codes goto that instruction. This makes fatal errors
> > > (not warnings) from various function calls actually fail the
> > > build. E.g. if create_retpoline_sites_sections() fails to create elf
> > > section pair retpoline_sites the tool now exits with an error code.
> > >
> > > Signed-off-by: Dimitri John Ledkov <dimitri.ledkov@canonical.com>
> >
> > So, is this not expected to be the case anymore:
> >
> > >  out:
> > > -   /*
> > > -    *  For now, don't fail the kernel build on fatal warnings.  These
> > > -    *  errors are still fairly common due to the growing matrix of
> > > -    *  supported toolchains and their recent pace of change.
> > > -    */
> > > -   return 0;
> >
> > ?
> >
> > How about making it only fatal if CONFIG_WERROR=y, ie. an analogue to our
> > treatment of compiler warnings?
>
> Objtool has two classes of warnings:
>
> 1) "fatal"
>
>    - allocation failures
>    - CFG recreation failures
>    - annotation parsing errors
>    - other objtool bugs
>
> 2) non-"fatal":
>
>   - missing security features (retpolines, IBT, SLS INT3)
>   - unreachable instructions (note this warning may indicate more
>     serious issues like an incomplete or buggy objtool CFG)
>
> The first class of "warning" is actually an error.  It means objtool
> couldn't reasonably continue, so it exited early.  I'm thinking this
> should always fail the build so it can be reported and fixed ASAP.
>
> We tried doing that before, but ending up reverting it (for raisins).
> We should try again (as per the above patch).
>
> The second class of warning, though it doesn't abort objtool, can still
> be quite serious.  Ignoring it can fail the boot, or can expose the user
> to certain attacks.
>

As a distribution kernel maintainer, my primary motivation was
enforcement of the security features.
Specifically because these security features are config opt-in, and
when one opts into having them, they better be enfoced.
If something fails, well, you cannot claim that one has retpoline, sls
mitigrations enabled.

> My proposal would be to always fail for #1, and to make #2 dependent on
> CONFIG_WERROR.

I worry about bundling unrelated things. I still want to enforce
security relevant features separately from compiler changes.

How are you proposing to deal with toolchain upgrades? A given kernel
might be WERROR clean with gcc-11 and gcc-12 -march=generic, but not
with -march=x86-64-v3 (as recently found) or for example with
clang-snapshot.

Whilst at the same time, the security relevant things should be enforced.

From a distribution maintainer perspective, if a kernel is compiled
with CONFIG_SLS I want to be able to enforce that for both kernel and
dkms builds. I cannot reasonably keep up with toolchain updates and
dkms code quality w.r.t. WERROR beyond what is required for security
sensitive code paths like SLS and RETPOLINE.

>
> Note the latter may be problematic at the moment due to some outstanding
> warnings reported by Arnd and randconfig build bots.  I try to fix those
> when I can, but any help would be appreciated.

I guess we can override CONFIG_WERROR during dkms compilations, but
then i still want to ensofrce OBJTOOL errors.

How abou introducing CONFIG_WERROR_OBJTOOL and make CONFIG_WERROR
select CONFIG_WERROR_OBJTOOL? such that distributions can start to opt
into CONFIG_WERROR_OBJTOOL for production builds, even if they cannot
yet make the jump to CONFIG_WERROR?
  
Josh Poimboeuf Jan. 9, 2024, 8:20 p.m. UTC | #4
On Tue, Jan 09, 2024 at 07:48:30PM +0000, Dimitri John Ledkov wrote:
> > Note the latter may be problematic at the moment due to some outstanding
> > warnings reported by Arnd and randconfig build bots.  I try to fix those
> > when I can, but any help would be appreciated.
> 
> I guess we can override CONFIG_WERROR during dkms compilations, but
> then i still want to ensofrce OBJTOOL errors.
> 
> How abou introducing CONFIG_WERROR_OBJTOOL and make CONFIG_WERROR
> select CONFIG_WERROR_OBJTOOL? such that distributions can start to opt
> into CONFIG_WERROR_OBJTOOL for production builds, even if they cannot
> yet make the jump to CONFIG_WERROR?

Seems reasonable.

Though, keep in mind that even compiler warnings can cause boot failures
and security bugs.  So I wouldn't recommend sleeping at night unless
your distro also has CONFIG_WERROR=y.
  

Patch

diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index e94756e09c..15df4afae2 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -4669,166 +4669,163 @@  static void free_insns(struct objtool_file *file)
 int check(struct objtool_file *file)
 {
 	int ret, warnings = 0;
 
 	arch_initial_func_cfi_state(&initial_func_cfi);
 	init_cfi_state(&init_cfi);
 	init_cfi_state(&func_cfi);
 	set_func_state(&func_cfi);
 	init_cfi_state(&force_undefined_cfi);
 	force_undefined_cfi.force_undefined = true;
 
-	if (!cfi_hash_alloc(1UL << (file->elf->symbol_bits - 3)))
+	if (!cfi_hash_alloc(1UL << (file->elf->symbol_bits - 3))) {
+		ret = -1;
 		goto out;
+	}
 
 	cfi_hash_add(&init_cfi);
 	cfi_hash_add(&func_cfi);
 
 	ret = decode_sections(file);
 	if (ret < 0)
 		goto out;
 
 	warnings += ret;
 
 	if (!nr_insns)
 		goto out;
 
 	if (opts.retpoline) {
 		ret = validate_retpoline(file);
 		if (ret < 0)
-			return ret;
+			goto out;
 		warnings += ret;
 	}
 
 	if (opts.stackval || opts.orc || opts.uaccess) {
 		ret = validate_functions(file);
 		if (ret < 0)
 			goto out;
 		warnings += ret;
 
 		ret = validate_unwind_hints(file, NULL);
 		if (ret < 0)
 			goto out;
 		warnings += ret;
 
 		if (!warnings) {
 			ret = validate_reachable_instructions(file);
 			if (ret < 0)
 				goto out;
 			warnings += ret;
 		}
 
 	} else if (opts.noinstr) {
 		ret = validate_noinstr_sections(file);
 		if (ret < 0)
 			goto out;
 		warnings += ret;
 	}
 
 	if (opts.unret) {
 		/*
 		 * Must be after validate_branch() and friends, it plays
 		 * further games with insn->visited.
 		 */
 		ret = validate_unrets(file);
 		if (ret < 0)
-			return ret;
+			goto out;
 		warnings += ret;
 	}
 
 	if (opts.ibt) {
 		ret = validate_ibt(file);
 		if (ret < 0)
 			goto out;
 		warnings += ret;
 	}
 
 	if (opts.sls) {
 		ret = validate_sls(file);
 		if (ret < 0)
 			goto out;
 		warnings += ret;
 	}
 
 	if (opts.static_call) {
 		ret = create_static_call_sections(file);
 		if (ret < 0)
 			goto out;
 		warnings += ret;
 	}
 
 	if (opts.retpoline) {
 		ret = create_retpoline_sites_sections(file);
 		if (ret < 0)
 			goto out;
 		warnings += ret;
 	}
 
 	if (opts.cfi) {
 		ret = create_cfi_sections(file);
 		if (ret < 0)
 			goto out;
 		warnings += ret;
 	}
 
 	if (opts.rethunk) {
 		ret = create_return_sites_sections(file);
 		if (ret < 0)
 			goto out;
 		warnings += ret;
 
 		if (opts.hack_skylake) {
 			ret = create_direct_call_sections(file);
 			if (ret < 0)
 				goto out;
 			warnings += ret;
 		}
 	}
 
 	if (opts.mcount) {
 		ret = create_mcount_loc_sections(file);
 		if (ret < 0)
 			goto out;
 		warnings += ret;
 	}
 
 	if (opts.prefix) {
 		ret = add_prefix_symbols(file);
 		if (ret < 0)
-			return ret;
+			goto out;
 		warnings += ret;
 	}
 
 	if (opts.ibt) {
 		ret = create_ibt_endbr_seal_sections(file);
 		if (ret < 0)
 			goto out;
 		warnings += ret;
 	}
 
 	if (opts.orc && nr_insns) {
 		ret = orc_create(file);
 		if (ret < 0)
 			goto out;
 		warnings += ret;
 	}
 
 	free_insns(file);
 
 	if (opts.verbose)
 		disas_warned_funcs(file);
 
 	if (opts.stats) {
 		printf("nr_insns_visited: %ld\n", nr_insns_visited);
 		printf("nr_cfi: %ld\n", nr_cfi);
 		printf("nr_cfi_reused: %ld\n", nr_cfi_reused);
 		printf("nr_cfi_cache: %ld\n", nr_cfi_cache);
 	}
 
 out:
-	/*
-	 *  For now, don't fail the kernel build on fatal warnings.  These
-	 *  errors are still fairly common due to the growing matrix of
-	 *  supported toolchains and their recent pace of change.
-	 */
-	return 0;
+	return ret < 0 ? ret : 0;
 }