[v6,03/20] modpost: detect section mismatch for R_ARM_MOVW_ABS_NC and R_ARM_MOVT_ABS
Message ID | 20230521160426.1881124-4-masahiroy@kernel.org |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp978927vqo; Sun, 21 May 2023 10:18:28 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ4w67hhRDmgTRdI0H+PoExvQbdIfj7SFq18lDObShmmjsZewYFWXGFpUJ7JUJE8SuKj5bW6 X-Received: by 2002:a05:6a00:2e9e:b0:64c:ae1c:33ac with SMTP id fd30-20020a056a002e9e00b0064cae1c33acmr11495623pfb.25.1684689507868; Sun, 21 May 2023 10:18:27 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1684689507; cv=none; d=google.com; s=arc-20160816; b=FfOvYnYokXnqs80msUdUiPJ0BZg2dG6+gaonsvez27Pk0xxCtAFUso0BQAm9YsrY0s LNLegRMAwfkx6nqLmvalM0WJLGX6F3mcmkUbG7VGn09lUDZ7cf/Y4GGgvIHHrG+ViRGY PhbbJ2GTMD4arCWVm+yZTuhznY7Sd1tPPM+iVyGKEHT0Oq5YJfD2nDKXc4bzAoYT3y3T 5cSjko5xXz32HpG2vtBdSldeEuL3KmfqZcPHhCfqAQtw6BOKxOBC/5LmJxOFmuANM3oC 2nqDxzxv8HMzdpAKEK2W4urjnMMcDd5Y9o3lGniDtNtW5cWXc2E3oeBHxoWD071UjcGy MVZA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=lh4F5mYvW06Yykdy7tZ5hqY9GSfajE0RfKXMIhB5uCM=; b=HFnXDbIPKIgFbiHI0J+Lo+bU7IIjv4yfVCbxk7T3mcXDNkzNg2VhFB67w3XNhWw2J+ ybvE8lkhHqn0YIJwzIcK7SpTYbd00p++n4XFQXw5f9GzdH1m2/7z2Ctf3scTiB2nlaGi 2FhCV2nOYFaw5XsTBDa4JmBWX3PJMs8yv7rOivk12pPT1adhjyOlHY/kkvAJ4KHUhy+6 LX7DJsZU+PNLKgdzRcYTIHyjynBJ8A0E276X6Fl4mLY8B8Qd/qizFhXzrkO4C23lNfQJ tLFs5RZC2XyWDdgGg/NCenf0jewW7bDE6EniLKMjas7W32RVeNWZr9PMWo4lVt4vrmXK s4Eg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b="jSg/z3aG"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id z28-20020aa79e5c000000b0064354230c2asi3353453pfq.367.2023.05.21.10.18.13; Sun, 21 May 2023 10:18:27 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b="jSg/z3aG"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230284AbjEUQF3 (ORCPT <rfc822;cscallsign@gmail.com> + 99 others); Sun, 21 May 2023 12:05:29 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55132 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230153AbjEUQFV (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Sun, 21 May 2023 12:05:21 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5206ECD; Sun, 21 May 2023 09:05:20 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id D7A6F60FA7; Sun, 21 May 2023 16:05:19 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id D86ECC433D2; Sun, 21 May 2023 16:05:17 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1684685119; bh=WHBV7iLszGwzqmoBiBCGvksDMYSMzg3ZUVZdErHe9Yg=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=jSg/z3aGOvkuyn5c5Urq2Q/TCkD0gnjBzHJ/S1qxaUK6+Fl/neoCMcMq5PMDX2t3U x9DBmCetn/NGYRUL1voVIqs4beFG+xlcjSGp79Lz+hdx+0uO+1mC+PYjv6jYLQkfnW EAOnq0H7IpKBFNsauz47zEInl1/v0r2Ng3o5w+2/USxCuOt9JCXpCWnYxMlDJlLAVn OudINlJvQf4Zpe7uXDiCMXBYD++Farbf5mi1SC7h7kBCmQJxecGbt4NuY4yAk1zoGN jHeFGEqZwxWAWyrbtkdvBXTvCiew1GQqsHcSYC2Buwx0HvDtVtXzUS0ZP9cE4Appn5 jbpzCkZPaVyzA== From: Masahiro Yamada <masahiroy@kernel.org> To: linux-kbuild@vger.kernel.org Cc: linux-kernel@vger.kernel.org, Nathan Chancellor <nathan@kernel.org>, Nick Desaulniers <ndesaulniers@google.com>, Nicolas Schier <nicolas@fjasle.eu>, Masahiro Yamada <masahiroy@kernel.org> Subject: [PATCH v6 03/20] modpost: detect section mismatch for R_ARM_MOVW_ABS_NC and R_ARM_MOVT_ABS Date: Mon, 22 May 2023 01:04:08 +0900 Message-Id: <20230521160426.1881124-4-masahiroy@kernel.org> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20230521160426.1881124-1-masahiroy@kernel.org> References: <20230521160426.1881124-1-masahiroy@kernel.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1766524985427736290?= X-GMAIL-MSGID: =?utf-8?q?1766524985427736290?= |
Series |
Unify <linux/export.h> and <asm/export.h>, remove EXPORT_DATA_SYMBOL(), faster TRIM_UNUSED_KSYMS
|
|
Commit Message
Masahiro Yamada
May 21, 2023, 4:04 p.m. UTC
ARM defconfig misses to detect some section mismatches.
[test code]
#include <linux/init.h>
int __initdata foo;
int get_foo(int x) { return foo; }
It is apparently a bad reference, but modpost does not report anything
for ARM defconfig (i.e. multi_v7_defconfig).
The test code above produces the following relocations.
Relocation section '.rel.text' at offset 0x200 contains 2 entries:
Offset Info Type Sym.Value Sym. Name
00000000 0000062b R_ARM_MOVW_ABS_NC 00000000 .LANCHOR0
00000004 0000062c R_ARM_MOVT_ABS 00000000 .LANCHOR0
Relocation section '.rel.ARM.exidx' at offset 0x210 contains 2 entries:
Offset Info Type Sym.Value Sym. Name
00000000 0000022a R_ARM_PREL31 00000000 .text
00000000 00001000 R_ARM_NONE 00000000 __aeabi_unwind_cpp_pr0
Currently, R_ARM_MOVW_ABS_NC and R_ARM_MOVT_ABS are just skipped.
Add code to handle them. I checked arch/arm/kernel/module.c to learn
how the offset is encoded in the instruction.
The referenced symbol in relocation might be a local anchor.
If is_valid_name() returns false, let's search for a better symbol name.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
---
scripts/mod/modpost.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
Comments
+ linux-arm-kernel On Sun, May 21, 2023 at 9:05 AM Masahiro Yamada <masahiroy@kernel.org> wrote: > > ARM defconfig misses to detect some section mismatches. > > [test code] > > #include <linux/init.h> > > int __initdata foo; > int get_foo(int x) { return foo; } > > It is apparently a bad reference, but modpost does not report anything > for ARM defconfig (i.e. multi_v7_defconfig). > > The test code above produces the following relocations. > > Relocation section '.rel.text' at offset 0x200 contains 2 entries: > Offset Info Type Sym.Value Sym. Name > 00000000 0000062b R_ARM_MOVW_ABS_NC 00000000 .LANCHOR0 > 00000004 0000062c R_ARM_MOVT_ABS 00000000 .LANCHOR0 > > Relocation section '.rel.ARM.exidx' at offset 0x210 contains 2 entries: > Offset Info Type Sym.Value Sym. Name > 00000000 0000022a R_ARM_PREL31 00000000 .text > 00000000 00001000 R_ARM_NONE 00000000 __aeabi_unwind_cpp_pr0 > > Currently, R_ARM_MOVW_ABS_NC and R_ARM_MOVT_ABS are just skipped. > > Add code to handle them. I checked arch/arm/kernel/module.c to learn > how the offset is encoded in the instruction. > > The referenced symbol in relocation might be a local anchor. > If is_valid_name() returns false, let's search for a better symbol name. > > Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> > --- > > scripts/mod/modpost.c | 12 ++++++++++-- > 1 file changed, 10 insertions(+), 2 deletions(-) > > diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c > index 34fbbd85bfde..ed2301e951a9 100644 > --- a/scripts/mod/modpost.c > +++ b/scripts/mod/modpost.c > @@ -1108,7 +1108,7 @@ static inline int is_valid_name(struct elf_info *elf, Elf_Sym *sym) > /** > * Find symbol based on relocation record info. > * In some cases the symbol supplied is a valid symbol so > - * return refsym. If st_name != 0 we assume this is a valid symbol. > + * return refsym. If is_valid_name() == true, we assume this is a valid symbol. > * In other cases the symbol needs to be looked up in the symbol table > * based on section and address. > * **/ > @@ -1121,7 +1121,7 @@ static Elf_Sym *find_tosym(struct elf_info *elf, Elf64_Sword addr, > Elf64_Sword d; > unsigned int relsym_secindex; > > - if (relsym->st_name != 0) > + if (is_valid_name(elf, relsym)) > return relsym; > > /* > @@ -1312,11 +1312,19 @@ static int addend_arm_rel(struct elf_info *elf, Elf_Shdr *sechdr, Elf_Rela *r) > unsigned int r_typ = ELF_R_TYPE(r->r_info); > Elf_Sym *sym = elf->symtab_start + ELF_R_SYM(r->r_info); > unsigned int inst = TO_NATIVE(*reloc_location(elf, sechdr, r)); > + int offset; > > switch (r_typ) { > case R_ARM_ABS32: > r->r_addend = inst + sym->st_value; > break; > + case R_ARM_MOVW_ABS_NC: > + case R_ARM_MOVT_ABS: > + offset = ((inst & 0xf0000) >> 4) | (inst & 0xfff); > + offset = (offset ^ 0x8000) - 0x8000; The code in arch/arm/kernel/module.c then right shifts the offset by 16 for R_ARM_MOVT_ABS. Is that necessary? > + offset += sym->st_value; > + r->r_addend = offset; > + break; > case R_ARM_PC24: > case R_ARM_CALL: > case R_ARM_JUMP24: > -- > 2.39.2 >
On Mon, 22 May 2023 at 20:03, Nick Desaulniers <ndesaulniers@google.com> wrote: > > + linux-arm-kernel > > On Sun, May 21, 2023 at 9:05 AM Masahiro Yamada <masahiroy@kernel.org> wrote: > > > > ARM defconfig misses to detect some section mismatches. > > > > [test code] > > > > #include <linux/init.h> > > > > int __initdata foo; > > int get_foo(int x) { return foo; } > > > > It is apparently a bad reference, but modpost does not report anything > > for ARM defconfig (i.e. multi_v7_defconfig). > > > > The test code above produces the following relocations. > > > > Relocation section '.rel.text' at offset 0x200 contains 2 entries: > > Offset Info Type Sym.Value Sym. Name > > 00000000 0000062b R_ARM_MOVW_ABS_NC 00000000 .LANCHOR0 > > 00000004 0000062c R_ARM_MOVT_ABS 00000000 .LANCHOR0 > > > > Relocation section '.rel.ARM.exidx' at offset 0x210 contains 2 entries: > > Offset Info Type Sym.Value Sym. Name > > 00000000 0000022a R_ARM_PREL31 00000000 .text > > 00000000 00001000 R_ARM_NONE 00000000 __aeabi_unwind_cpp_pr0 > > > > Currently, R_ARM_MOVW_ABS_NC and R_ARM_MOVT_ABS are just skipped. > > > > Add code to handle them. I checked arch/arm/kernel/module.c to learn > > how the offset is encoded in the instruction. > > > > The referenced symbol in relocation might be a local anchor. > > If is_valid_name() returns false, let's search for a better symbol name. > > > > Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> > > --- > > > > scripts/mod/modpost.c | 12 ++++++++++-- > > 1 file changed, 10 insertions(+), 2 deletions(-) > > > > diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c > > index 34fbbd85bfde..ed2301e951a9 100644 > > --- a/scripts/mod/modpost.c > > +++ b/scripts/mod/modpost.c > > @@ -1108,7 +1108,7 @@ static inline int is_valid_name(struct elf_info *elf, Elf_Sym *sym) > > /** > > * Find symbol based on relocation record info. > > * In some cases the symbol supplied is a valid symbol so > > - * return refsym. If st_name != 0 we assume this is a valid symbol. > > + * return refsym. If is_valid_name() == true, we assume this is a valid symbol. > > * In other cases the symbol needs to be looked up in the symbol table > > * based on section and address. > > * **/ > > @@ -1121,7 +1121,7 @@ static Elf_Sym *find_tosym(struct elf_info *elf, Elf64_Sword addr, > > Elf64_Sword d; > > unsigned int relsym_secindex; > > > > - if (relsym->st_name != 0) > > + if (is_valid_name(elf, relsym)) > > return relsym; > > > > /* > > @@ -1312,11 +1312,19 @@ static int addend_arm_rel(struct elf_info *elf, Elf_Shdr *sechdr, Elf_Rela *r) > > unsigned int r_typ = ELF_R_TYPE(r->r_info); > > Elf_Sym *sym = elf->symtab_start + ELF_R_SYM(r->r_info); > > unsigned int inst = TO_NATIVE(*reloc_location(elf, sechdr, r)); > > + int offset; > > > > switch (r_typ) { > > case R_ARM_ABS32: > > r->r_addend = inst + sym->st_value; > > break; > > + case R_ARM_MOVW_ABS_NC: > > + case R_ARM_MOVT_ABS: > > + offset = ((inst & 0xf0000) >> 4) | (inst & 0xfff); > > + offset = (offset ^ 0x8000) - 0x8000; > > The code in arch/arm/kernel/module.c then right shifts the offset by > 16 for R_ARM_MOVT_ABS. Is that necessary? > MOVW/MOVT pairs are limited to an addend of -/+ 32 KiB, and the same value must be encoded in both instructions. When constructing the actual immediate value from the symbol value and the addend, only the top 16 bits are used in MOVT and the bottom 16 bits in MOVW. However, this code seems to borrow the Elf_Rela::addend field (which ARM does not use natively) to record the intermediate value, which would need to be split if it is used to fix up instruction opcodes. Btw the Thumb2 encodings of MOVT and MOVW seem to be missing here. > > + offset += sym->st_value; > > + r->r_addend = offset; > > + break; > > case R_ARM_PC24: > > case R_ARM_CALL: > > case R_ARM_JUMP24: > > -- > > 2.39.2 > > > > > -- > Thanks, > ~Nick Desaulniers
On Tue, May 23, 2023 at 6:50 AM Ard Biesheuvel <ardb@kernel.org> wrote: > > On Mon, 22 May 2023 at 20:03, Nick Desaulniers <ndesaulniers@google.com> wrote: > > > > + linux-arm-kernel > > > > On Sun, May 21, 2023 at 9:05 AM Masahiro Yamada <masahiroy@kernel.org> wrote: > > > > > > ARM defconfig misses to detect some section mismatches. > > > > > > [test code] > > > > > > #include <linux/init.h> > > > > > > int __initdata foo; > > > int get_foo(int x) { return foo; } > > > > > > It is apparently a bad reference, but modpost does not report anything > > > for ARM defconfig (i.e. multi_v7_defconfig). > > > > > > The test code above produces the following relocations. > > > > > > Relocation section '.rel.text' at offset 0x200 contains 2 entries: > > > Offset Info Type Sym.Value Sym. Name > > > 00000000 0000062b R_ARM_MOVW_ABS_NC 00000000 .LANCHOR0 > > > 00000004 0000062c R_ARM_MOVT_ABS 00000000 .LANCHOR0 > > > > > > Relocation section '.rel.ARM.exidx' at offset 0x210 contains 2 entries: > > > Offset Info Type Sym.Value Sym. Name > > > 00000000 0000022a R_ARM_PREL31 00000000 .text > > > 00000000 00001000 R_ARM_NONE 00000000 __aeabi_unwind_cpp_pr0 > > > > > > Currently, R_ARM_MOVW_ABS_NC and R_ARM_MOVT_ABS are just skipped. > > > > > > Add code to handle them. I checked arch/arm/kernel/module.c to learn > > > how the offset is encoded in the instruction. > > > > > > The referenced symbol in relocation might be a local anchor. > > > If is_valid_name() returns false, let's search for a better symbol name. > > > > > > Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> > > > --- > > > > > > scripts/mod/modpost.c | 12 ++++++++++-- > > > 1 file changed, 10 insertions(+), 2 deletions(-) > > > > > > diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c > > > index 34fbbd85bfde..ed2301e951a9 100644 > > > --- a/scripts/mod/modpost.c > > > +++ b/scripts/mod/modpost.c > > > @@ -1108,7 +1108,7 @@ static inline int is_valid_name(struct elf_info *elf, Elf_Sym *sym) > > > /** > > > * Find symbol based on relocation record info. > > > * In some cases the symbol supplied is a valid symbol so > > > - * return refsym. If st_name != 0 we assume this is a valid symbol. > > > + * return refsym. If is_valid_name() == true, we assume this is a valid symbol. > > > * In other cases the symbol needs to be looked up in the symbol table > > > * based on section and address. > > > * **/ > > > @@ -1121,7 +1121,7 @@ static Elf_Sym *find_tosym(struct elf_info *elf, Elf64_Sword addr, > > > Elf64_Sword d; > > > unsigned int relsym_secindex; > > > > > > - if (relsym->st_name != 0) > > > + if (is_valid_name(elf, relsym)) > > > return relsym; > > > > > > /* > > > @@ -1312,11 +1312,19 @@ static int addend_arm_rel(struct elf_info *elf, Elf_Shdr *sechdr, Elf_Rela *r) > > > unsigned int r_typ = ELF_R_TYPE(r->r_info); > > > Elf_Sym *sym = elf->symtab_start + ELF_R_SYM(r->r_info); > > > unsigned int inst = TO_NATIVE(*reloc_location(elf, sechdr, r)); > > > + int offset; > > > > > > switch (r_typ) { > > > case R_ARM_ABS32: > > > r->r_addend = inst + sym->st_value; > > > break; > > > + case R_ARM_MOVW_ABS_NC: > > > + case R_ARM_MOVT_ABS: > > > + offset = ((inst & 0xf0000) >> 4) | (inst & 0xfff); > > > + offset = (offset ^ 0x8000) - 0x8000; > > > > The code in arch/arm/kernel/module.c then right shifts the offset by > > 16 for R_ARM_MOVT_ABS. Is that necessary? > > > > MOVW/MOVT pairs are limited to an addend of -/+ 32 KiB, and the same > value must be encoded in both instructions. In my understanding, 'movt' loads the immediate value to the upper 16-bit of the register. I am just curious about the code in arch/arm/kernel/module.c. Please see 'case R_ARM_MOVT_ABS:' part. [1] 'offset' is the immediate value encoded in instruction [2] Add sym->st_value [3] Right-shift 'offset' by 16 [4] Write it back to the instruction So, the immediate value encoded in the instruction is divided by 65536. I guess we need something like the following? (left-shift by 16). if (ELF32_R_TYPE(rel->r_info) == R_ARM_MOVT_ABS || ELF32_R_TYPE(rel->r_info) == R_ARM_MOVT_PREL) offset <<= 16; > > When constructing the actual immediate value from the symbol value and > the addend, only the top 16 bits are used in MOVT and the bottom 16 > bits in MOVW. > > However, this code seems to borrow the Elf_Rela::addend field (which > ARM does not use natively) to record the intermediate value, which > would need to be split if it is used to fix up instruction opcodes. At first, modpost supported only RELA for section mismatch checks. Later, 2c1a51f39d95 ("[PATCH] kbuild: check SHT_REL sections") added REL support. But, the common code still used Elf_Rela. modpost does not need to write back the fixed instruction. modpost is only interested in the offset address. Currently, modpost saves the offset address in r->r_offset even for Rel. I do not like this code. So, I am trying to reduce the use of Elf_Rela. For example, this patch. https://patchwork.kernel.org/project/linux-kbuild/patch/20230521160426.1881124-8-masahiroy@kernel.org/ > Btw the Thumb2 encodings of MOVT and MOVW seem to be missing here. Right, if CONFIG_THUMB2_KERNEL=y, section mismatch check. Several relocation types are just skipped. > > > > > + offset += sym->st_value; > > > + r->r_addend = offset; > > > + break; > > > case R_ARM_PC24: > > > case R_ARM_CALL: > > > case R_ARM_JUMP24: > > > -- > > > 2.39.2 > > > > > > > > > -- > > Thanks, > > ~Nick Desaulniers -- Best Regards Masahiro Yamada
On Tue, 23 May 2023 at 13:59, Masahiro Yamada <masahiroy@kernel.org> wrote: > > On Tue, May 23, 2023 at 6:50 AM Ard Biesheuvel <ardb@kernel.org> wrote: > > > > On Mon, 22 May 2023 at 20:03, Nick Desaulniers <ndesaulniers@google.com> wrote: > > > > > > + linux-arm-kernel > > > > > > On Sun, May 21, 2023 at 9:05 AM Masahiro Yamada <masahiroy@kernel.org> wrote: > > > > > > > > ARM defconfig misses to detect some section mismatches. > > > > > > > > [test code] > > > > > > > > #include <linux/init.h> > > > > > > > > int __initdata foo; > > > > int get_foo(int x) { return foo; } > > > > > > > > It is apparently a bad reference, but modpost does not report anything > > > > for ARM defconfig (i.e. multi_v7_defconfig). > > > > > > > > The test code above produces the following relocations. > > > > > > > > Relocation section '.rel.text' at offset 0x200 contains 2 entries: > > > > Offset Info Type Sym.Value Sym. Name > > > > 00000000 0000062b R_ARM_MOVW_ABS_NC 00000000 .LANCHOR0 > > > > 00000004 0000062c R_ARM_MOVT_ABS 00000000 .LANCHOR0 > > > > > > > > Relocation section '.rel.ARM.exidx' at offset 0x210 contains 2 entries: > > > > Offset Info Type Sym.Value Sym. Name > > > > 00000000 0000022a R_ARM_PREL31 00000000 .text > > > > 00000000 00001000 R_ARM_NONE 00000000 __aeabi_unwind_cpp_pr0 > > > > > > > > Currently, R_ARM_MOVW_ABS_NC and R_ARM_MOVT_ABS are just skipped. > > > > > > > > Add code to handle them. I checked arch/arm/kernel/module.c to learn > > > > how the offset is encoded in the instruction. > > > > > > > > The referenced symbol in relocation might be a local anchor. > > > > If is_valid_name() returns false, let's search for a better symbol name. > > > > > > > > Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> > > > > --- > > > > > > > > scripts/mod/modpost.c | 12 ++++++++++-- > > > > 1 file changed, 10 insertions(+), 2 deletions(-) > > > > > > > > diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c > > > > index 34fbbd85bfde..ed2301e951a9 100644 > > > > --- a/scripts/mod/modpost.c > > > > +++ b/scripts/mod/modpost.c > > > > @@ -1108,7 +1108,7 @@ static inline int is_valid_name(struct elf_info *elf, Elf_Sym *sym) > > > > /** > > > > * Find symbol based on relocation record info. > > > > * In some cases the symbol supplied is a valid symbol so > > > > - * return refsym. If st_name != 0 we assume this is a valid symbol. > > > > + * return refsym. If is_valid_name() == true, we assume this is a valid symbol. > > > > * In other cases the symbol needs to be looked up in the symbol table > > > > * based on section and address. > > > > * **/ > > > > @@ -1121,7 +1121,7 @@ static Elf_Sym *find_tosym(struct elf_info *elf, Elf64_Sword addr, > > > > Elf64_Sword d; > > > > unsigned int relsym_secindex; > > > > > > > > - if (relsym->st_name != 0) > > > > + if (is_valid_name(elf, relsym)) > > > > return relsym; > > > > > > > > /* > > > > @@ -1312,11 +1312,19 @@ static int addend_arm_rel(struct elf_info *elf, Elf_Shdr *sechdr, Elf_Rela *r) > > > > unsigned int r_typ = ELF_R_TYPE(r->r_info); > > > > Elf_Sym *sym = elf->symtab_start + ELF_R_SYM(r->r_info); > > > > unsigned int inst = TO_NATIVE(*reloc_location(elf, sechdr, r)); > > > > + int offset; > > > > > > > > switch (r_typ) { > > > > case R_ARM_ABS32: > > > > r->r_addend = inst + sym->st_value; > > > > break; > > > > + case R_ARM_MOVW_ABS_NC: > > > > + case R_ARM_MOVT_ABS: > > > > + offset = ((inst & 0xf0000) >> 4) | (inst & 0xfff); > > > > + offset = (offset ^ 0x8000) - 0x8000; > > > > > > The code in arch/arm/kernel/module.c then right shifts the offset by > > > 16 for R_ARM_MOVT_ABS. Is that necessary? > > > > > > > MOVW/MOVT pairs are limited to an addend of -/+ 32 KiB, and the same > > value must be encoded in both instructions. > > > In my understanding, 'movt' loads the immediate value to > the upper 16-bit of the register. > Correct. It sets the upper 16 bits of a register without corrupting the lower 16 bits. > I am just curious about the code in arch/arm/kernel/module.c. > > Please see 'case R_ARM_MOVT_ABS:' part. > > [1] 'offset' is the immediate value encoded in instruction > [2] Add sym->st_value > [3] Right-shift 'offset' by 16 > [4] Write it back to the instruction > > So, the immediate value encoded in the instruction > is divided by 65536. > > I guess we need something like the following? > (left-shift by 16). > > if (ELF32_R_TYPE(rel->r_info) == R_ARM_MOVT_ABS || > ELF32_R_TYPE(rel->r_info) == R_ARM_MOVT_PREL) > offset <<= 16; > No. The addend is not encoded in the same way as the effective immediate value. The addend is limited to -/+ 32 KiB (range of s16), and the MOVT instruction must use the same addend value as the MOVW instruction it is paired with, without shifting. This is necessary because otherwise, there is no way to handle an addend/symbol combination that results in a carry between the lower and upper 16 bit words. This is a consequence of the use of REL format rather than RELA, where the addend is part of the relocation and not encoded in the instructions. > > > > > > > When constructing the actual immediate value from the symbol value and > > the addend, only the top 16 bits are used in MOVT and the bottom 16 > > bits in MOVW. > > > > However, this code seems to borrow the Elf_Rela::addend field (which > > ARM does not use natively) to record the intermediate value, which > > would need to be split if it is used to fix up instruction opcodes. > > At first, modpost supported only RELA for section mismatch checks. > > Later, 2c1a51f39d95 ("[PATCH] kbuild: check SHT_REL sections") > added REL support. > > But, the common code still used Elf_Rela. > > > modpost does not need to write back the fixed instruction. > modpost is only interested in the offset address. > > Currently, modpost saves the offset address in > r->r_offset even for Rel. I do not like this code. > > So, I am trying to reduce the use of Elf_Rela. > For example, this patch. > https://patchwork.kernel.org/project/linux-kbuild/patch/20230521160426.1881124-8-masahiroy@kernel.org/ > Yeah, that looks better to me. > > > Btw the Thumb2 encodings of MOVT and MOVW seem to be missing here. > > Right, if CONFIG_THUMB2_KERNEL=y, section mismatch check. > > Several relocation types are just skipped. > Skipped entirely? Or only for the diagnostic print that outputs the symbol name?
On Tue, May 23, 2023 at 9:21 PM Ard Biesheuvel <ardb@kernel.org> wrote: > > On Tue, 23 May 2023 at 13:59, Masahiro Yamada <masahiroy@kernel.org> wrote: > > > > On Tue, May 23, 2023 at 6:50 AM Ard Biesheuvel <ardb@kernel.org> wrote: > > > > > > On Mon, 22 May 2023 at 20:03, Nick Desaulniers <ndesaulniers@google.com> wrote: > > > > > > > > + linux-arm-kernel > > > > > > > > On Sun, May 21, 2023 at 9:05 AM Masahiro Yamada <masahiroy@kernel.org> wrote: > > > > > > > > > > ARM defconfig misses to detect some section mismatches. > > > > > > > > > > [test code] > > > > > > > > > > #include <linux/init.h> > > > > > > > > > > int __initdata foo; > > > > > int get_foo(int x) { return foo; } > > > > > > > > > > It is apparently a bad reference, but modpost does not report anything > > > > > for ARM defconfig (i.e. multi_v7_defconfig). > > > > > > > > > > The test code above produces the following relocations. > > > > > > > > > > Relocation section '.rel.text' at offset 0x200 contains 2 entries: > > > > > Offset Info Type Sym.Value Sym. Name > > > > > 00000000 0000062b R_ARM_MOVW_ABS_NC 00000000 .LANCHOR0 > > > > > 00000004 0000062c R_ARM_MOVT_ABS 00000000 .LANCHOR0 > > > > > > > > > > Relocation section '.rel.ARM.exidx' at offset 0x210 contains 2 entries: > > > > > Offset Info Type Sym.Value Sym. Name > > > > > 00000000 0000022a R_ARM_PREL31 00000000 .text > > > > > 00000000 00001000 R_ARM_NONE 00000000 __aeabi_unwind_cpp_pr0 > > > > > > > > > > Currently, R_ARM_MOVW_ABS_NC and R_ARM_MOVT_ABS are just skipped. > > > > > > > > > > Add code to handle them. I checked arch/arm/kernel/module.c to learn > > > > > how the offset is encoded in the instruction. > > > > > > > > > > The referenced symbol in relocation might be a local anchor. > > > > > If is_valid_name() returns false, let's search for a better symbol name. > > > > > > > > > > Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> > > > > > --- > > > > > > > > > > scripts/mod/modpost.c | 12 ++++++++++-- > > > > > 1 file changed, 10 insertions(+), 2 deletions(-) > > > > > > > > > > diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c > > > > > index 34fbbd85bfde..ed2301e951a9 100644 > > > > > --- a/scripts/mod/modpost.c > > > > > +++ b/scripts/mod/modpost.c > > > > > @@ -1108,7 +1108,7 @@ static inline int is_valid_name(struct elf_info *elf, Elf_Sym *sym) > > > > > /** > > > > > * Find symbol based on relocation record info. > > > > > * In some cases the symbol supplied is a valid symbol so > > > > > - * return refsym. If st_name != 0 we assume this is a valid symbol. > > > > > + * return refsym. If is_valid_name() == true, we assume this is a valid symbol. > > > > > * In other cases the symbol needs to be looked up in the symbol table > > > > > * based on section and address. > > > > > * **/ > > > > > @@ -1121,7 +1121,7 @@ static Elf_Sym *find_tosym(struct elf_info *elf, Elf64_Sword addr, > > > > > Elf64_Sword d; > > > > > unsigned int relsym_secindex; > > > > > > > > > > - if (relsym->st_name != 0) > > > > > + if (is_valid_name(elf, relsym)) > > > > > return relsym; > > > > > > > > > > /* > > > > > @@ -1312,11 +1312,19 @@ static int addend_arm_rel(struct elf_info *elf, Elf_Shdr *sechdr, Elf_Rela *r) > > > > > unsigned int r_typ = ELF_R_TYPE(r->r_info); > > > > > Elf_Sym *sym = elf->symtab_start + ELF_R_SYM(r->r_info); > > > > > unsigned int inst = TO_NATIVE(*reloc_location(elf, sechdr, r)); > > > > > + int offset; > > > > > > > > > > switch (r_typ) { > > > > > case R_ARM_ABS32: > > > > > r->r_addend = inst + sym->st_value; > > > > > break; > > > > > + case R_ARM_MOVW_ABS_NC: > > > > > + case R_ARM_MOVT_ABS: > > > > > + offset = ((inst & 0xf0000) >> 4) | (inst & 0xfff); > > > > > + offset = (offset ^ 0x8000) - 0x8000; > > > > > > > > The code in arch/arm/kernel/module.c then right shifts the offset by > > > > 16 for R_ARM_MOVT_ABS. Is that necessary? > > > > > > > > > > MOVW/MOVT pairs are limited to an addend of -/+ 32 KiB, and the same > > > value must be encoded in both instructions. > > > > > > In my understanding, 'movt' loads the immediate value to > > the upper 16-bit of the register. > > > > Correct. It sets the upper 16 bits of a register without corrupting > the lower 16 bits. > > > I am just curious about the code in arch/arm/kernel/module.c. > > > > Please see 'case R_ARM_MOVT_ABS:' part. > > > > [1] 'offset' is the immediate value encoded in instruction > > [2] Add sym->st_value > > [3] Right-shift 'offset' by 16 > > [4] Write it back to the instruction > > > > So, the immediate value encoded in the instruction > > is divided by 65536. > > > > I guess we need something like the following? > > (left-shift by 16). > > > > if (ELF32_R_TYPE(rel->r_info) == R_ARM_MOVT_ABS || > > ELF32_R_TYPE(rel->r_info) == R_ARM_MOVT_PREL) > > offset <<= 16; > > > > No. The addend is not encoded in the same way as the effective immediate value. > > The addend is limited to -/+ 32 KiB (range of s16), and the MOVT > instruction must use the same addend value as the MOVW instruction it > is paired with, without shifting. > > This is necessary because otherwise, there is no way to handle an > addend/symbol combination that results in a carry between the lower > and upper 16 bit words. This is a consequence of the use of REL format > rather than RELA, where the addend is part of the relocation and not > encoded in the instructions. Ah, OK. Now I understand. > > > > > > > > > > > > When constructing the actual immediate value from the symbol value and > > > the addend, only the top 16 bits are used in MOVT and the bottom 16 > > > bits in MOVW. > > > > > > However, this code seems to borrow the Elf_Rela::addend field (which > > > ARM does not use natively) to record the intermediate value, which > > > would need to be split if it is used to fix up instruction opcodes. > > > > At first, modpost supported only RELA for section mismatch checks. > > > > Later, 2c1a51f39d95 ("[PATCH] kbuild: check SHT_REL sections") > > added REL support. > > > > But, the common code still used Elf_Rela. > > > > > > modpost does not need to write back the fixed instruction. > > modpost is only interested in the offset address. > > > > Currently, modpost saves the offset address in > > r->r_offset even for Rel. I do not like this code. > > > > So, I am trying to reduce the use of Elf_Rela. > > For example, this patch. > > https://patchwork.kernel.org/project/linux-kbuild/patch/20230521160426.1881124-8-masahiroy@kernel.org/ > > > > Yeah, that looks better to me. > > > > > > Btw the Thumb2 encodings of MOVT and MOVW seem to be missing here. > > > > Right, if CONFIG_THUMB2_KERNEL=y, section mismatch check. > > > > Several relocation types are just skipped. > > > > Skipped entirely? Or only for the diagnostic print that outputs the symbol name? Skipped entirely. modpost cannot detect section mismatches if you enable CONFIG_THUMB2_KERNEL. -- Best Regards Masahiro Yamada
On Tue, May 23, 2023 at 3:03 AM Nick Desaulniers <ndesaulniers@google.com> wrote: > > + linux-arm-kernel > > On Sun, May 21, 2023 at 9:05 AM Masahiro Yamada <masahiroy@kernel.org> wrote: > > > > ARM defconfig misses to detect some section mismatches. > > > > [test code] > > > > #include <linux/init.h> > > > > int __initdata foo; > > int get_foo(int x) { return foo; } > > > > It is apparently a bad reference, but modpost does not report anything > > for ARM defconfig (i.e. multi_v7_defconfig). > > > > The test code above produces the following relocations. > > > > Relocation section '.rel.text' at offset 0x200 contains 2 entries: > > Offset Info Type Sym.Value Sym. Name > > 00000000 0000062b R_ARM_MOVW_ABS_NC 00000000 .LANCHOR0 > > 00000004 0000062c R_ARM_MOVT_ABS 00000000 .LANCHOR0 > > > > Relocation section '.rel.ARM.exidx' at offset 0x210 contains 2 entries: > > Offset Info Type Sym.Value Sym. Name > > 00000000 0000022a R_ARM_PREL31 00000000 .text > > 00000000 00001000 R_ARM_NONE 00000000 __aeabi_unwind_cpp_pr0 > > > > Currently, R_ARM_MOVW_ABS_NC and R_ARM_MOVT_ABS are just skipped. > > > > Add code to handle them. I checked arch/arm/kernel/module.c to learn > > how the offset is encoded in the instruction. > > > > The referenced symbol in relocation might be a local anchor. > > If is_valid_name() returns false, let's search for a better symbol name. > > > > Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> > > --- > > > > scripts/mod/modpost.c | 12 ++++++++++-- > > 1 file changed, 10 insertions(+), 2 deletions(-) > > > > diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c > > index 34fbbd85bfde..ed2301e951a9 100644 > > --- a/scripts/mod/modpost.c > > +++ b/scripts/mod/modpost.c > > @@ -1108,7 +1108,7 @@ static inline int is_valid_name(struct elf_info *elf, Elf_Sym *sym) > > /** > > * Find symbol based on relocation record info. > > * In some cases the symbol supplied is a valid symbol so > > - * return refsym. If st_name != 0 we assume this is a valid symbol. > > + * return refsym. If is_valid_name() == true, we assume this is a valid symbol. > > * In other cases the symbol needs to be looked up in the symbol table > > * based on section and address. > > * **/ > > @@ -1121,7 +1121,7 @@ static Elf_Sym *find_tosym(struct elf_info *elf, Elf64_Sword addr, > > Elf64_Sword d; > > unsigned int relsym_secindex; > > > > - if (relsym->st_name != 0) > > + if (is_valid_name(elf, relsym)) > > return relsym; > > > > /* > > @@ -1312,11 +1312,19 @@ static int addend_arm_rel(struct elf_info *elf, Elf_Shdr *sechdr, Elf_Rela *r) > > unsigned int r_typ = ELF_R_TYPE(r->r_info); > > Elf_Sym *sym = elf->symtab_start + ELF_R_SYM(r->r_info); > > unsigned int inst = TO_NATIVE(*reloc_location(elf, sechdr, r)); > > + int offset; > > > > switch (r_typ) { > > case R_ARM_ABS32: > > r->r_addend = inst + sym->st_value; > > break; > > + case R_ARM_MOVW_ABS_NC: > > + case R_ARM_MOVT_ABS: > > + offset = ((inst & 0xf0000) >> 4) | (inst & 0xfff); > > + offset = (offset ^ 0x8000) - 0x8000; > > The code in arch/arm/kernel/module.c then right shifts the offset by > 16 for R_ARM_MOVT_ABS. Is that necessary? I replied to Ard's email, but just in case. modpost does not need to write back the fixed instruction. modpost is only interested in the offset address. So, the right-shift by 16 is unneeded.
diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c index 34fbbd85bfde..ed2301e951a9 100644 --- a/scripts/mod/modpost.c +++ b/scripts/mod/modpost.c @@ -1108,7 +1108,7 @@ static inline int is_valid_name(struct elf_info *elf, Elf_Sym *sym) /** * Find symbol based on relocation record info. * In some cases the symbol supplied is a valid symbol so - * return refsym. If st_name != 0 we assume this is a valid symbol. + * return refsym. If is_valid_name() == true, we assume this is a valid symbol. * In other cases the symbol needs to be looked up in the symbol table * based on section and address. * **/ @@ -1121,7 +1121,7 @@ static Elf_Sym *find_tosym(struct elf_info *elf, Elf64_Sword addr, Elf64_Sword d; unsigned int relsym_secindex; - if (relsym->st_name != 0) + if (is_valid_name(elf, relsym)) return relsym; /* @@ -1312,11 +1312,19 @@ static int addend_arm_rel(struct elf_info *elf, Elf_Shdr *sechdr, Elf_Rela *r) unsigned int r_typ = ELF_R_TYPE(r->r_info); Elf_Sym *sym = elf->symtab_start + ELF_R_SYM(r->r_info); unsigned int inst = TO_NATIVE(*reloc_location(elf, sechdr, r)); + int offset; switch (r_typ) { case R_ARM_ABS32: r->r_addend = inst + sym->st_value; break; + case R_ARM_MOVW_ABS_NC: + case R_ARM_MOVT_ABS: + offset = ((inst & 0xf0000) >> 4) | (inst & 0xfff); + offset = (offset ^ 0x8000) - 0x8000; + offset += sym->st_value; + r->r_addend = offset; + break; case R_ARM_PC24: case R_ARM_CALL: case R_ARM_JUMP24: