Message ID | cover.1682673542.git.houwenlong.hwl@antgroup.com |
---|---|
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp812382vqo; Fri, 28 Apr 2023 02:53:20 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ5pojLzcd0+8emyNu7GuAEArZmzkxTFzlSh/m+s1w+AKBayhgWUdYSQREDzaBDZaxKQy+Jr X-Received: by 2002:a17:902:e883:b0:1a2:8c7e:f310 with SMTP id w3-20020a170902e88300b001a28c7ef310mr5153783plg.35.1682675599899; Fri, 28 Apr 2023 02:53:19 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1682675599; cv=none; d=google.com; s=arc-20160816; b=N4g1LaDFNjmUWBUBbi1J58TGukZAKeQXqnj/r+4PzXkDY07vFtGEXMYy3wAkHRdeIf yXHQcbNiZdRPdvTZzgJzxEZY3wTjHKMhjNveRgNbFaElCQK2EUcIevBnpGaqeGcTXBz3 8JO7fGMZUADxOp5OM4IbB4TvHnP4XKoR4WOAmigFGKpe04GNumAuEn8CWzDnbEwOL+d9 P8oAZG5Cvg/IRAHaiV0yjuv9T69dJYhb0LIIOWvWwQZr6R/KL1p2PrZLtFjkmkGrktPq efV7dqP7D2e0JSvweFOEXAnOVndJ1AmSmsruVFJXPkg/FVR0sYsNRKdNq9Gp4iy97cP1 tmOg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from; bh=1bRjA56wHms5M6IXaO2xgjI3P1A6r2ibDVhHq3856e0=; b=LcDNLRUR1g3SNpSH66Ts5f63BfDIXktyyNUByNQ0PpRI4rbZewlQgB9q1LkSQCJfGK lhjOXtN4T7/vhxyOt4zxXpjeBNIHNN62FR0SfEX0zlMgMmXVS6kLm638WzX/3vmIpWWd B+AljFUmMZvrSPT0DIPa2yTotw+/y8PsVYKjRd95GqzppgQK0wQvLmhUaqdy2FE0NnP0 8zaaFRVFE8d4jnzS/zeZWMpANl/nSwsUSvqVGl27VMvuPShfGk6RttHhymnml+LQVWEp JHzkUY4gjcGfT8uZRzKdUBGOH1b1dmWwTnKNtMLtyTgCQSfLMzws6QTrdPMwPt5RZHQf q3mQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=antgroup.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id l12-20020a170903120c00b0019ca1961bc1si22309673plh.108.2023.04.28.02.53.05; Fri, 28 Apr 2023 02:53:19 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=antgroup.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1345519AbjD1Jvu (ORCPT <rfc822;chrisjones.unixmen@gmail.com> + 99 others); Fri, 28 Apr 2023 05:51:50 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59884 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229556AbjD1Jvs (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Fri, 28 Apr 2023 05:51:48 -0400 Received: from out0-214.mail.aliyun.com (out0-214.mail.aliyun.com [140.205.0.214]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 713132701; Fri, 28 Apr 2023 02:51:44 -0700 (PDT) X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R141e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=ay29a033018047201;MF=houwenlong.hwl@antgroup.com;NM=1;PH=DS;RN=10;SR=0;TI=SMTPD_---.STCEPEV_1682675495; Received: from localhost(mailfrom:houwenlong.hwl@antgroup.com fp:SMTPD_---.STCEPEV_1682675495) by smtp.aliyun-inc.com; Fri, 28 Apr 2023 17:51:36 +0800 From: "Hou Wenlong" <houwenlong.hwl@antgroup.com> To: linux-kernel@vger.kernel.org Cc: "Thomas Garnier" <thgarnie@chromium.org>, "Lai Jiangshan" <jiangshan.ljs@antgroup.com>, "Kees Cook" <keescook@chromium.org>, "Hou Wenlong" <houwenlong.hwl@antgroup.com>, "Nathan Chancellor" <nathan@kernel.org>, "Nick Desaulniers" <ndesaulniers@google.com>, "Tom Rix" <trix@redhat.com>, <bpf@vger.kernel.org>, <llvm@lists.linux.dev> Subject: [PATCH RFC 00/43] x86/pie: Make kernel image's virtual address flexible Date: Fri, 28 Apr 2023 17:50:40 +0800 Message-Id: <cover.1682673542.git.houwenlong.hwl@antgroup.com> X-Mailer: git-send-email 2.31.1 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,SPF_HELO_NONE, SPF_PASS,T_SCC_BODY_TEXT_LINE,UNPARSEABLE_RELAY,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1764413250304603693?= X-GMAIL-MSGID: =?utf-8?q?1764413250304603693?= |
Series |
x86/pie: Make kernel image's virtual address flexible
|
|
Message
Hou Wenlong
April 28, 2023, 9:50 a.m. UTC
Purpose: These patches make the changes necessary to build the kernel as Position Independent Executable (PIE) on x86_64. A PIE kernel can be relocated below the top 2G of the virtual address space. And this patchset provides an example to allow kernel image to be relocated in top 512G of the address space. The ultimate purpose for PIE kernel is to increase the security of the the kernel and also the fleixbility of the kernel image's virtual address, which can be even in the low half of the address space. More locations the kernel can fit in, this means an attacker could guess harder. The patchset is based on Thomas Garnier's X86 PIE patchset v6[1] and v11[2]. However, some design changes are made and some bugs are fixed by testing with different configurations and compilers. Important changes: - For fixmap area, move vsyscall page out of fixmap area and unify __FIXADDR_TOP for x86. Then fixmap area could be relocated together with kernel image. - For compile-time base address of kernel image, keep it in top 2G of address space. Introduce a new variable to store the run-time base address and adapt for VA/PA transition during runtime. - For percpu section, keep it as zero mapping for SMP. Because compile-time base address of kernel image still resides in top 2G of address space, then RIP-relative reference can still be used when percpu section is zero mapping. However, when do relocation for percpu variable references, percpu variable should be treated as normal variable and absolute references should be relocated accordingly. In addition, the relocation offset should be subtracted from the GS base in order to ensure correct operation. - For x86/boot/head64.c, don't build it as mcmodel=large. Instead, use data relocation to acqiure global symbol's value and make fixup_pointer() as a nop when running in identity mapping. This is because not all global symbol references in the code use fixup_pointer(), e.g. variables in macro related to 5-level paging, which can be optimized by GCC as relative referencs. If build it as mcmodel=large, there will be more fixup_pointer() calls, resulting in uglier code. Actually, if build it as PIE even when CONFIG_X86_PIE is disabled, then all fixup_pointer() could be dropped. However stack protector would be broken if per-cpu stack protector is not supported. Limitations: - Since I am not familiar with XEN, it has been disabled for now as it is not adapted for PIE. This is due to the assignment of wrong pointers (with low address values) to x86_ini_ops when running in identity mapping. This issue can be resolved by building pagetable eraly and jumping to high kernel address as soon as possible. - It is not allowed to reference global variables in an alternative section since RIP-relative addressing is not fixed in apply_alternatives(). Fortunately, all disallowed relocations in the alternative section can be captured by objtool. I believe that this issue can also be fixed by using objtool. - For module loading, only allow to load module without GOT for simplicity. Only weak global variable referencs are using GOT. Tests: I only have tested booting with GCC 5.1.0 (min version), GCC 12.2.0 and CLANG 15.0.7. And I have also run the following tests for both default configuration and Ubuntu configuration. Performance/Size impact (GCC 12.2.0): Size of vmlinux (Default configuration): File size: - PIE disabled: +0.012% - PIE enabled: -2.219% instructions: - PIE disabled: same - PIE enabled: +1.383% .text section: - PIE disabled: same - PIE enabled: +0.589% Size of vmlinux (Ubuntu configuration): File size: - PIE disabled: same - PIE enabled: +2.391% instructions: - PIE disabled: +0.013% - PIE enabled: +1.566% .text section: - PIE disabled: same - PIE enabled: +0.055% The .text section size increase is due to more instructions required for PIE code. There are two reasons that have been mentioned in previous mailist. Firstly, switch folding is disabled under PIE [3]. Secondly, two instructions are needed for PIE to represent a single instruction with sign extension, such as when accessing an array element. While only one instruction is required when using mcmode=kernel, for PIE, it needs to use lea to get the base of the array first. Hackbench (50% and 1600% on thread/process for pipe/sockets): - PIE disabled: no significant change (avg -/+ 0.5% on default config). - PIE enabled: -2% to +2% in average (default config). Kernbench (average of 10 Half and Optimal runs): Elapsed Time: - PIE disabled: no significant change (avg -0.2% on ubuntu config) - PIE enabled: average -0.2% to +0.2% System Time: - PIE disabled: no significant change (avg -0.5% on ubuntu config) - PIE enabled: average -0.5% to +0.5% [1] https://lore.kernel.org/all/20190131192533.34130-1-thgarnie@chromium.org [2] https://lore.kernel.org/all/20200228000105.165012-1-thgarnie@chromium.org [3] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82303 Brian Gerst (1): x86-64: Use per-cpu stack canary if supported by compiler Hou Wenlong (29): x86/irq: Adapt assembly for PIE support x86,rethook: Adapt assembly for PIE support x86/paravirt: Use relative reference for original instruction x86/Kconfig: Introduce new Kconfig for PIE kernel building x86/PVH: Use fixed_percpu_data to set up GS base x86/pie: Enable stack protector only if per-cpu stack canary is supported x86/percpu: Use PC-relative addressing for percpu variable references x86/tools: Explicitly include autoconf.h for hostprogs x86/percpu: Adapt percpu references relocation for PIE support x86/ftrace: Adapt assembly for PIE support x86/pie: Force hidden visibility for all symbol references x86/boot/compressed: Adapt sed command to generate voffset.h when PIE is enabled x86/pie: Add .data.rel.* sections into link script KVM: x86: Adapt assembly for PIE support x86/PVH: Adapt PVH booting for PIE support x86/bpf: Adapt BPF_CALL JIT codegen for PIE support x86/modules: Adapt module loading for PIE support x86/boot/64: Use data relocation to get absloute address when PIE is enabled objtool: Add validation for x86 PIE support objtool: Adapt indirect call of __fentry__() for PIE support x86/pie: Build the kernel as PIE x86/vsyscall: Don't use set_fixmap() to map vsyscall page x86/xen: Pin up to VSYSCALL_ADDR when vsyscall page is out of fixmap area x86/fixmap: Move vsyscall page out of fixmap area x86/fixmap: Unify FIXADDR_TOP x86/boot: Fill kernel image puds dynamically x86/mm: Sort address_markers array when X86 PIE is enabled x86/pie: Allow kernel image to be relocated in top 512G x86/boot: Extend relocate range for PIE kernel image Thomas Garnier (13): x86/crypto: Adapt assembly for PIE support x86: Add macro to get symbol address for PIE support x86: relocate_kernel - Adapt assembly for PIE support x86/entry/64: Adapt assembly for PIE support x86: pm-trace: Adapt assembly for PIE support x86/CPU: Adapt assembly for PIE support x86/acpi: Adapt assembly for PIE support x86/boot/64: Adapt assembly for PIE support x86/power/64: Adapt assembly for PIE support x86/alternatives: Adapt assembly for PIE support x86/ftrace: Adapt ftrace nop patching for PIE support x86/mm: Make the x86 GOT read-only x86/relocs: Handle PIE relocations Documentation/x86/x86_64/mm.rst | 4 + arch/x86/Kconfig | 36 +++++- arch/x86/Makefile | 33 +++-- arch/x86/boot/compressed/Makefile | 2 +- arch/x86/boot/compressed/kaslr.c | 55 +++++++++ arch/x86/boot/compressed/misc.c | 4 +- arch/x86/boot/compressed/misc.h | 9 ++ arch/x86/crypto/aegis128-aesni-asm.S | 6 +- arch/x86/crypto/aesni-intel_asm.S | 2 +- arch/x86/crypto/aesni-intel_avx-x86_64.S | 3 +- arch/x86/crypto/aria-aesni-avx-asm_64.S | 30 ++--- arch/x86/crypto/camellia-aesni-avx-asm_64.S | 30 ++--- arch/x86/crypto/camellia-aesni-avx2-asm_64.S | 30 ++--- arch/x86/crypto/camellia-x86_64-asm_64.S | 8 +- arch/x86/crypto/cast5-avx-x86_64-asm_64.S | 50 ++++---- arch/x86/crypto/cast6-avx-x86_64-asm_64.S | 44 ++++--- arch/x86/crypto/crc32c-pcl-intel-asm_64.S | 3 +- arch/x86/crypto/des3_ede-asm_64.S | 96 ++++++++++----- arch/x86/crypto/ghash-clmulni-intel_asm.S | 4 +- arch/x86/crypto/sha256-avx2-asm.S | 18 ++- arch/x86/entry/calling.h | 17 ++- arch/x86/entry/entry_64.S | 22 +++- arch/x86/entry/vdso/Makefile | 2 +- arch/x86/entry/vsyscall/vsyscall_64.c | 7 +- arch/x86/include/asm/alternative.h | 6 +- arch/x86/include/asm/asm.h | 1 + arch/x86/include/asm/fixmap.h | 28 +---- arch/x86/include/asm/irq_stack.h | 2 +- arch/x86/include/asm/kmsan.h | 6 +- arch/x86/include/asm/nospec-branch.h | 10 +- arch/x86/include/asm/page_64.h | 8 +- arch/x86/include/asm/page_64_types.h | 8 ++ arch/x86/include/asm/paravirt.h | 17 ++- arch/x86/include/asm/paravirt_types.h | 12 +- arch/x86/include/asm/percpu.h | 29 ++++- arch/x86/include/asm/pgtable_64_types.h | 10 +- arch/x86/include/asm/pm-trace.h | 2 +- arch/x86/include/asm/processor.h | 17 ++- arch/x86/include/asm/sections.h | 5 + arch/x86/include/asm/stackprotector.h | 16 ++- arch/x86/include/asm/sync_core.h | 6 +- arch/x86/include/asm/vsyscall.h | 13 ++ arch/x86/kernel/acpi/wakeup_64.S | 31 ++--- arch/x86/kernel/alternative.c | 8 +- arch/x86/kernel/asm-offsets_64.c | 2 +- arch/x86/kernel/callthunks.c | 2 +- arch/x86/kernel/cpu/common.c | 15 ++- arch/x86/kernel/ftrace.c | 46 ++++++- arch/x86/kernel/ftrace_64.S | 9 +- arch/x86/kernel/head64.c | 77 +++++++++--- arch/x86/kernel/head_64.S | 68 ++++++++--- arch/x86/kernel/kvm.c | 21 +++- arch/x86/kernel/module.c | 27 +++++ arch/x86/kernel/paravirt.c | 4 + arch/x86/kernel/relocate_kernel_64.S | 2 +- arch/x86/kernel/rethook.c | 8 ++ arch/x86/kernel/setup.c | 6 + arch/x86/kernel/vmlinux.lds.S | 10 +- arch/x86/kvm/svm/vmenter.S | 10 +- arch/x86/kvm/vmx/vmenter.S | 2 +- arch/x86/lib/cmpxchg16b_emu.S | 8 +- arch/x86/mm/dump_pagetables.c | 36 +++++- arch/x86/mm/fault.c | 1 - arch/x86/mm/init_64.c | 10 +- arch/x86/mm/ioremap.c | 5 +- arch/x86/mm/kasan_init_64.c | 4 +- arch/x86/mm/pat/set_memory.c | 2 +- arch/x86/mm/pgtable.c | 13 ++ arch/x86/mm/pgtable_32.c | 3 - arch/x86/mm/physaddr.c | 14 +-- arch/x86/net/bpf_jit_comp.c | 17 ++- arch/x86/platform/efi/efi_thunk_64.S | 4 + arch/x86/platform/pvh/head.S | 29 ++++- arch/x86/power/hibernate_asm_64.S | 4 +- arch/x86/tools/Makefile | 4 +- arch/x86/tools/relocs.c | 113 ++++++++++++++++- arch/x86/xen/mmu_pv.c | 32 +++-- arch/x86/xen/xen-asm.S | 10 +- arch/x86/xen/xen-head.S | 14 ++- include/asm-generic/vmlinux.lds.h | 12 ++ scripts/Makefile.lib | 1 + scripts/recordmcount.c | 81 ++++++++----- tools/objtool/arch/x86/decode.c | 10 +- tools/objtool/builtin-check.c | 4 +- tools/objtool/check.c | 121 +++++++++++++++++++ tools/objtool/include/objtool/builtin.h | 1 + 86 files changed, 1202 insertions(+), 410 deletions(-) Patchset is based on tip/master. base-commit: 01cbd032298654fe4c85e153dd9a224e5bc10194 -- 2.31.1
Comments
For some raison I didn't get 0/n but did get all of the others. Please keep your Cc list consistent. On Fri, Apr 28, 2023 at 05:50:40PM +0800, Hou Wenlong wrote: > - It is not allowed to reference global variables in an alternative > section since RIP-relative addressing is not fixed in > apply_alternatives(). Fortunately, all disallowed relocations in the > alternative section can be captured by objtool. I believe that this > issue can also be fixed by using objtool. https://lkml.kernel.org/r/Y9py2a5Xw0xbB8ou@hirez.programming.kicks-ass.net
On Fri, 28 Apr 2023 at 11:55, Hou Wenlong <houwenlong.hwl@antgroup.com> wrote: > > From: Thomas Garnier <thgarnie@chromium.org> > > From: Thomas Garnier <thgarnie@chromium.org> > > The GOT is changed during early boot when relocations are applied. Make > it read-only directly. This table exists only for PIE binary. Since weak > symbol reference would always be GOT reference, there are 8 entries in > GOT, but only one entry for __fentry__() is in use. Other GOT > references have been optimized by linker. > > [Hou Wenlong: Change commit message and skip GOT size check] > > Signed-off-by: Thomas Garnier <thgarnie@chromium.org> > Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com> > Cc: Lai Jiangshan <jiangshan.ljs@antgroup.com> > Cc: Kees Cook <keescook@chromium.org> > --- > arch/x86/kernel/vmlinux.lds.S | 2 ++ > include/asm-generic/vmlinux.lds.h | 12 ++++++++++++ > 2 files changed, 14 insertions(+) > > diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S > index f02dcde9f8a8..fa4c6582663f 100644 > --- a/arch/x86/kernel/vmlinux.lds.S > +++ b/arch/x86/kernel/vmlinux.lds.S > @@ -462,6 +462,7 @@ SECTIONS > #endif > "Unexpected GOT/PLT entries detected!") > > +#ifndef CONFIG_X86_PIE > /* > * Sections that should stay zero sized, which is safer to > * explicitly check instead of blindly discarding. > @@ -470,6 +471,7 @@ SECTIONS > *(.got) *(.igot.*) > } > ASSERT(SIZEOF(.got) == 0, "Unexpected GOT entries detected!") > +#endif > > .plt : { > *(.plt) *(.plt.*) *(.iplt) > diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h > index d1f57e4868ed..438ed8b39896 100644 > --- a/include/asm-generic/vmlinux.lds.h > +++ b/include/asm-generic/vmlinux.lds.h > @@ -441,6 +441,17 @@ > __end_ro_after_init = .; > #endif > > +#ifdef CONFIG_X86_PIE > +#define RO_GOT_X86 Please don't put X86 specific stuff in generic code. > + .got : AT(ADDR(.got) - LOAD_OFFSET) { \ > + __start_got = .; \ > + *(.got) *(.igot.*); \ > + __end_got = .; \ > + } > +#else > +#define RO_GOT_X86 > +#endif > + I don't think it makes sense for this definition to be conditional. You can include it conditionally from the x86 code, but even that seems unnecessary, given that it will be empty otherwise. > /* > * .kcfi_traps contains a list KCFI trap locations. > */ > @@ -486,6 +497,7 @@ > BOUNDED_SECTION_PRE_LABEL(.pci_fixup_suspend_late, _pci_fixups_suspend_late, __start, __end) \ > } \ > \ > + RO_GOT_X86 \ > FW_LOADER_BUILT_IN_DATA \ > TRACEDATA \ > \ > -- > 2.31.1 >
On Fri, Apr 28, 2023 at 2:52 AM Hou Wenlong <houwenlong.hwl@antgroup.com> wrote: > > From: Brian Gerst <brgerst@gmail.com> > > From: Brian Gerst <brgerst@gmail.com> > > If the compiler supports it, use a standard per-cpu variable for the > stack protector instead of the old fixed location. Keep the fixed > location code for compatibility with older compilers. > > [Hou Wenlong: Disable it on Clang, adapt new code change and adapt > missing GS set up path in pvh_start_xen()] > > Signed-off-by: Brian Gerst <brgerst@gmail.com> > Co-developed-by: Hou Wenlong <houwenlong.hwl@antgroup.com> > Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com> > Cc: Thomas Garnier <thgarnie@chromium.org> > Cc: Lai Jiangshan <jiangshan.ljs@antgroup.com> > Cc: Kees Cook <keescook@chromium.org> > --- > arch/x86/Kconfig | 12 ++++++++++++ > arch/x86/Makefile | 21 ++++++++++++++------- > arch/x86/entry/entry_64.S | 6 +++++- > arch/x86/include/asm/processor.h | 17 ++++++++++++----- > arch/x86/include/asm/stackprotector.h | 16 +++++++--------- > arch/x86/kernel/asm-offsets_64.c | 2 +- > arch/x86/kernel/cpu/common.c | 15 +++++++-------- > arch/x86/kernel/head_64.S | 16 ++++++++++------ > arch/x86/kernel/vmlinux.lds.S | 4 +++- > arch/x86/platform/pvh/head.S | 8 ++++++++ > arch/x86/xen/xen-head.S | 14 +++++++++----- > 11 files changed, 88 insertions(+), 43 deletions(-) > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > index 68e5da464b96..55cce8cdf9bd 100644 > --- a/arch/x86/Kconfig > +++ b/arch/x86/Kconfig > @@ -410,6 +410,18 @@ config CC_HAS_SANE_STACKPROTECTOR > the compiler produces broken code or if it does not let us control > the segment on 32-bit kernels. > > +config CC_HAS_CUSTOMIZED_STACKPROTECTOR > + bool > + # Although clang supports -mstack-protector-guard-reg option, it > + # would generate GOT reference for __stack_chk_guard even with > + # -fno-PIE flag. > + default y if (!CC_IS_CLANG && $(cc-option,-mstack-protector-guard-reg=gs)) Hi Hou, I've filed this bug against LLVM and will work with LLVM folks at Intel to resolve: https://github.com/llvm/llvm-project/issues/62481 Can you please review that report and let me know here or there if I missed anything? Would you also mind including a link to that in the comments in the next version of this patch? Less relevant issues I filed looking at some related codegen: https://github.com/llvm/llvm-project/issues/62482 https://github.com/llvm/llvm-project/issues/62480 And we should probably look into: https://github.com/llvm/llvm-project/issues/22476
On 28.04.23 11:50, Hou Wenlong wrote: > From: Brian Gerst <brgerst@gmail.com> > > From: Brian Gerst <brgerst@gmail.com> > > If the compiler supports it, use a standard per-cpu variable for the > stack protector instead of the old fixed location. Keep the fixed > location code for compatibility with older compilers. > > [Hou Wenlong: Disable it on Clang, adapt new code change and adapt > missing GS set up path in pvh_start_xen()] > > Signed-off-by: Brian Gerst <brgerst@gmail.com> > Co-developed-by: Hou Wenlong <houwenlong.hwl@antgroup.com> > Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com> > Cc: Thomas Garnier <thgarnie@chromium.org> > Cc: Lai Jiangshan <jiangshan.ljs@antgroup.com> > Cc: Kees Cook <keescook@chromium.org> > --- > arch/x86/Kconfig | 12 ++++++++++++ > arch/x86/Makefile | 21 ++++++++++++++------- > arch/x86/entry/entry_64.S | 6 +++++- > arch/x86/include/asm/processor.h | 17 ++++++++++++----- > arch/x86/include/asm/stackprotector.h | 16 +++++++--------- > arch/x86/kernel/asm-offsets_64.c | 2 +- > arch/x86/kernel/cpu/common.c | 15 +++++++-------- > arch/x86/kernel/head_64.S | 16 ++++++++++------ > arch/x86/kernel/vmlinux.lds.S | 4 +++- > arch/x86/platform/pvh/head.S | 8 ++++++++ > arch/x86/xen/xen-head.S | 14 +++++++++----- > 11 files changed, 88 insertions(+), 43 deletions(-) > ... > diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S > index 643d02900fbb..09eaf59e8066 100644 > --- a/arch/x86/xen/xen-head.S > +++ b/arch/x86/xen/xen-head.S > @@ -51,15 +51,19 @@ SYM_CODE_START(startup_xen) > > leaq (__end_init_task - PTREGS_SIZE)(%rip), %rsp > > - /* Set up %gs. > - * > - * The base of %gs always points to fixed_percpu_data. If the > - * stack protector canary is enabled, it is located at %gs:40. > + /* > + * Set up GS base. > * Note that, on SMP, the boot cpu uses init data section until > * the per cpu areas are set up. > */ > movl $MSR_GS_BASE,%ecx > - movq $INIT_PER_CPU_VAR(fixed_percpu_data),%rax > +#if defined(CONFIG_STACKPROTECTOR_FIXED) > + leaq INIT_PER_CPU_VAR(fixed_percpu_data)(%rip), %rdx > +#elif defined(CONFIG_SMP) > + movabs $__per_cpu_load, %rdx Shouldn't above 2 targets be %rax? > +#else > + xorl %eax, %eax > +#endif > cdq > wrmsr > Juergen
On Thu, May 4, 2023 at 11:14 PM Hou Wenlong <houwenlong.hwl@antgroup.com> wrote: > > On Tue, May 02, 2023 at 01:27:53AM +0800, Nick Desaulniers wrote: > > On Fri, Apr 28, 2023 at 2:52 AM Hou Wenlong <houwenlong.hwl@antgroup.com> wrote: > > > > > > +config CC_HAS_CUSTOMIZED_STACKPROTECTOR > > > + bool > > > + # Although clang supports -mstack-protector-guard-reg option, it > > > + # would generate GOT reference for __stack_chk_guard even with > > > + # -fno-PIE flag. > > > + default y if (!CC_IS_CLANG && $(cc-option,-mstack-protector-guard-reg=gs)) > > > > Hi Hou, > > I've filed this bug against LLVM and will work with LLVM folks at > > Intel to resolve: > > https://github.com/llvm/llvm-project/issues/62481 > > Can you please review that report and let me know here or there if I > > missed anything? Would you also mind including a link to that in the > > comments in the next version of this patch? > > > Hi Nick, > > Thanks for your help, I'll include the link in the next version. > Actually, I had post an issue on github too when I test the patch on > LLVM. But no replies. :(. Ah, sorry about that. The issue tracker is pretty high volume and stuff gets missed. With many users comes many bug reports. We could be better about triage though. If it's specific to the Linux kernel, https://github.com/ClangBuiltLinux/linux/issues is a better issue tracker to use; we can move bug reports upstream to https://github.com/llvm/llvm-project/issues/ when necessary. It's linked off of clangbuiltlinux.github.io if you lose it. > https://github.com/llvm/llvm-project/issues/60116 > > There is another problem I met for this patch, some unexpected code > are generated: > > do_one_initcall: (init/main.o) > ...... > movq __stack_chk_guard(%rip), %rax > movq %rax,0x2b0(%rsp) > > The complier generates wrong instruction, no GOT reference and gs > register. I only see it in init/main.c file. I have tried to move the > function into another file and compiled it with same cflags. It could > generate right instruction for the function in another file. The wrong instruction or the wrong operand? This is loading the canary into the stack slot in the fn prolog. Perhaps the expected cflag is not getting set (or being removed) from init/main.c? You should be able to do: $ make LLVM=1 init/main.o V=1 to see how clang was invoked to see if the expected flag was there, or not. > > The LLVM chain toolsare built by myself: > clang version 15.0.7 (https://github.com/llvm/llvm-project.git > 8dfdcc7b7bf66834a761bd8de445840ef68e4d1a) Perhaps worth rebuilding with top of tree, which is clang 17. > > > Less relevant issues I filed looking at some related codegen: > > https://github.com/llvm/llvm-project/issues/62482 > > https://github.com/llvm/llvm-project/issues/62480 > > > > And we should probably look into: > > https://github.com/llvm/llvm-project/issues/22476 > > > > > > Except for per-cpu stack canary patch, there is another issue I post on > github: https://github.com/llvm/llvm-project/issues/60096 Thanks, I'll bring that up with Intel, too. > > The related patch is: > https://lore.kernel.org/lkml/175116f75c38c15d8d73a03301eab805fea13a0a.1682673543.git.houwenlong.hwl@antgroup.com/ > > I couldn't find the related documentation about that, hope you can help > me too. > > One more problem that I didn't post is: > https://lore.kernel.org/lkml/8d6bbaf66b90cf1a8fd2c5da98f5e094b9ffcb27.1682673543.git.houwenlong.hwl@antgroup.com/ Mind filing another bug for this in llvm's issue tracker? We can discuss there if LLD needs to be doing something different. Thanks for uncovering these and helping us get them fixed up!
On Fri, May 5, 2023 at 11:02 AM Nick Desaulniers <ndesaulniers@google.com> wrote: > > On Thu, May 4, 2023 at 11:14 PM Hou Wenlong <houwenlong.hwl@antgroup.com> wrote: > > > > On Tue, May 02, 2023 at 01:27:53AM +0800, Nick Desaulniers wrote: > > > On Fri, Apr 28, 2023 at 2:52 AM Hou Wenlong <houwenlong.hwl@antgroup.com> wrote: > > > > > > > > +config CC_HAS_CUSTOMIZED_STACKPROTECTOR > > > > + bool > > > > + # Although clang supports -mstack-protector-guard-reg option, it > > > > + # would generate GOT reference for __stack_chk_guard even with > > > > + # -fno-PIE flag. > > > > + default y if (!CC_IS_CLANG && $(cc-option,-mstack-protector-guard-reg=gs)) > > > > > > Hi Hou, > > > I've filed this bug against LLVM and will work with LLVM folks at > > > Intel to resolve: > > > https://github.com/llvm/llvm-project/issues/62481 > > > Can you please review that report and let me know here or there if I > > > missed anything? Would you also mind including a link to that in the > > > comments in the next version of this patch? > > > > > Hi Nick, > > > > Thanks for your help, I'll include the link in the next version. > > Actually, I had post an issue on github too when I test the patch on > > LLVM. But no replies. :(. > > Ah, sorry about that. The issue tracker is pretty high volume and > stuff gets missed. With many users comes many bug reports. We could > be better about triage though. If it's specific to the Linux kernel, > https://github.com/ClangBuiltLinux/linux/issues is a better issue > tracker to use; we can move bug reports upstream to > https://github.com/llvm/llvm-project/issues/ when necessary. It's > linked off of clangbuiltlinux.github.io if you lose it. > > > https://github.com/llvm/llvm-project/issues/60116 > > > > There is another problem I met for this patch, some unexpected code > > are generated: > > > > do_one_initcall: (init/main.o) > > ...... > > movq __stack_chk_guard(%rip), %rax > > movq %rax,0x2b0(%rsp) > > > > The complier generates wrong instruction, no GOT reference and gs > > register. I only see it in init/main.c file. I have tried to move the > > function into another file and compiled it with same cflags. It could > > generate right instruction for the function in another file. > > The wrong instruction or the wrong operand? This is loading the > canary into the stack slot in the fn prolog. Perhaps the expected > cflag is not getting set (or being removed) from init/main.c? You > should be able to do: > > $ make LLVM=1 init/main.o V=1 > > to see how clang was invoked to see if the expected flag was there, or not. > > > > > The LLVM chain toolsare built by myself: > > clang version 15.0.7 (https://github.com/llvm/llvm-project.git > > 8dfdcc7b7bf66834a761bd8de445840ef68e4d1a) > > Perhaps worth rebuilding with top of tree, which is clang 17. > > > > > > Less relevant issues I filed looking at some related codegen: > > > https://github.com/llvm/llvm-project/issues/62482 > > > https://github.com/llvm/llvm-project/issues/62480 > > > > > > And we should probably look into: > > > https://github.com/llvm/llvm-project/issues/22476 > > > > > > > > > > Except for per-cpu stack canary patch, there is another issue I post on > > github: https://github.com/llvm/llvm-project/issues/60096 > > Thanks, I'll bring that up with Intel, too. > > > > > The related patch is: > > https://lore.kernel.org/lkml/175116f75c38c15d8d73a03301eab805fea13a0a.1682673543.git.houwenlong.hwl@antgroup.com/ > > > > I couldn't find the related documentation about that, hope you can help > > me too. > > > > One more problem that I didn't post is: > > https://lore.kernel.org/lkml/8d6bbaf66b90cf1a8fd2c5da98f5e094b9ffcb27.1682673543.git.houwenlong.hwl@antgroup.com/ > > Mind filing another bug for this in llvm's issue tracker? We can > discuss there if LLD needs to be doing something different. > > Thanks for uncovering these and helping us get them fixed up! > -- > Thanks, > ~Nick Desaulniers In my opinion, Clang's behavior is working as intended. The Linux kernel should support R_X86_64_REX_GOTPCRELX, and the solution is straightforward: treat R_X86_64_REX_GOTPCRELX the same way as R_X86_64_PC32 (-shared -Bsymbolic), assuming that every symbol is defined, which means that every symbol is non-preemptible. Clang's `-fno-pic` option chooses `R_X86_64_REX_GOTPCRELX` which is correct, although it differs from GCC's `-fno-pic` option. The compiler doesn't know whether `__stack_chk_guard` will be provided by the main executable (`libc.a`) or a shared object (`libc.so`, available on some ports of glibc but not x86, on musl this is available for all ports). (Also see `__stack_chk_guard` on https://maskray.me/blog/2022-12-18-control-flow-integrity) If an `R_X86_64_32` relocation is used and `__stack_chk_guard` is defined by a shared object, copy relocation. We will need an ELF hack called [copy relocation](https://maskray.me/blog/2021-01-09-copy-relocations-canonical-plt-entries-and-protected). The instruction movq __stack_chk_guard@GOTPCREL(%rip), %rbx produces an R_X86_64_REX_GOTPCRELX relocation. If `__stack_chk_guard` is non-preemptible, linkers can [optimize the access to be direct](https://maskray.me/blog/2021-08-29-all-about-global-offset-table#got-optimization). Although we could technically use the `-fno-direct-access-external-data` option to switch between `R_X86_64_REX_GOTPCRELX` and `R_X86_64_32`, I think there is no justification to complicate the compiler.
On Fri, Apr 28, 2023 at 11:22:06PM +0800, Peter Zijlstra wrote: > > For some raison I didn't get 0/n but did get all of the others. Please > keep your Cc list consistent. > Sorry, I'll pay attention next time. > On Fri, Apr 28, 2023 at 05:50:40PM +0800, Hou Wenlong wrote: > > > - It is not allowed to reference global variables in an alternative > > section since RIP-relative addressing is not fixed in > > apply_alternatives(). Fortunately, all disallowed relocations in the > > alternative section can be captured by objtool. I believe that this > > issue can also be fixed by using objtool. > > https://lkml.kernel.org/r/Y9py2a5Xw0xbB8ou@hirez.programming.kicks-ass.net Thank you for your patch. However, it's more complicated for call depth tracking case. Although, the per-cpu variable in the alternative section is relocated, but the content of the "skl_call_thunk_template" is copied into another place, so the offset is still incorrect. I'm not sure if this case is common or not. Since the destination is clear, I could do relocation here as well, but it would make the code more complicated. Thanks!