Message ID | 20230526154630.289374-1-alexghiti@rivosinc.com |
---|---|
State | New |
Headers |
From: Alexandre Ghiti <alexghiti@rivosinc.com> To: Paul Walmsley <paul.walmsley@sifive.com>, Palmer Dabbelt <palmer@dabbelt.com>, Albert Ou <aou@eecs.berkeley.edu>, Andreas Schwab <schwab@linux-m68k.org>, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Subject: [PATCH -fixes] riscv: Fix relocatable kernels with early alternatives using -fno-pie Date: Fri, 26 May 2023 17:46:30 +0200 Message-Id: <20230526154630.289374-1-alexghiti@rivosinc.com> X-Mailer: git-send-email 2.39.2 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org |
Series |
[-fixes] riscv: Fix relocatable kernels with early alternatives using -fno-pie
|
|
Commit Message
Alexandre Ghiti
May 26, 2023, 3:46 p.m. UTC
Early alternatives are called with the mmu disabled, and then should not
access any global symbols through the GOT since it requires relocations,
relocations that we do before but *virtually*. So only use medany code
model for this early code.
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
---
Note that I'm not very happy with this fix, I think we need to put more
effort into "harmonizing" this very early code (ie before the mmu is
enabled) as it is spread between different locations and compiled
differently. I'll work on that later, but for now, this fix does what is
needed to work (from my testing at least). Any Tested-by on the Unmatched
and T-head boards is welcome!
arch/riscv/errata/Makefile | 4 ++++
arch/riscv/kernel/Makefile | 4 ++++
2 files changed, 8 insertions(+)
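One way to check whether an object built for the early code still goes through the GOT is to look at its relocations. The sketch below is not part of the patch: `count_got_relocs` is a hypothetical helper that counts `R_RISCV_GOT_HI20` entries in the text printed by `readelf -r`, and the two sample lines (including the symbol names) are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper, not part of the patch: count GOT-based relocations in
# the text that `readelf -r` prints for an object file (read from stdin).
# R_RISCV_GOT_HI20 is the relocation type used for a GOT-indirect access.
count_got_relocs() {
    grep -c 'R_RISCV_GOT_HI20'
}

# Illustrative input only: two made-up relocation lines in readelf's layout.
sample='000000000010  000500000014 R_RISCV_GOT_HI20  0000000000000000 some_global + 0
000000000024  000600000017 R_RISCV_PCREL_HI20  0000000000000000 other_global + 0'

# Only the first sample line is a GOT relocation, so this prints 1.
printf '%s\n' "$sample" | count_got_relocs
```

On a real build, something like `readelf -r arch/riscv/kernel/alternative.o | count_got_relocs` could be used, assuming a readelf that understands RISC-V objects; a count of 0 would suggest that the object no longer accesses symbols through the GOT.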
Comments
On Fri, May 26, 2023 at 05:46:30PM +0200, Alexandre Ghiti wrote:
> Early alternatives are called with the mmu disabled, and then should not
> access any global symbols through the GOT since it requires relocations,
> relocations that we do before but *virtually*. So only use medany code
> model for this early code.
>
> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> ---
>
> Note that I'm not very happy with this fix, I think we need to put more
> effort into "harmonizing" this very early code (ie before the mmu is
> enabled) as it is spread between different locations and compiled
> differently.

Totally & I'll happily spend the time trying to review that work.

> I'll work on that later, but for now, this fix does what is
> needed to work (from my testing at least). Any Tested-by on the Unmatched
> and T-head boards is welcome!

On 6.4-rc1 & v6.4-rc1 + this patch, with CONFIG_RELOCATABLE added to my
config, my Nezha fails to boot. There is no output whatsoever from the
kernel. Turning off CONFIG_RELOCATABLE boots again.

I didn't test on my unmatched.

Thanks,
Conor.
On Fri, May 26, 2023 at 05:24:41PM +0100, Conor Dooley wrote:
> On 6.4-rc1 & v6.4-rc1 + this patch, with CONFIG_RELOCATABLE added to my
> config, my Nezha fails to boot. There is no output whatsoever from the
> kernel. Turning off CONFIG_RELOCATABLE boots again.

I don't know if this is better or worse news, but same thing happens on
an icicle kit. What systems, other than QEMU, has the relocatable
series been tested with, btw?

Cheers,
Conor.
On 26/05/2023 18:24, Conor Dooley wrote:
> On 6.4-rc1 & v6.4-rc1 + this patch, with CONFIG_RELOCATABLE added to my
> config, my Nezha fails to boot. There is no output whatsoever from the
> kernel. Turning off CONFIG_RELOCATABLE boots again.

Damn, that's going to ruin my long week-end... Thanks though, I'll try to
figure out what's going on, too bad I don't have any thead boards!

Thanks again,

Alex

> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
On 26/05/2023 18:35, Conor Dooley wrote:
> I don't know if this is better or worse news, but same thing happens on
> an icicle kit. What systems, other than QEMU, has the relocatable
> series been tested with, btw?

I tested it on the Unmatched (Andreas did too).

Very weird it does not work on the icicle kit, there is no errata for this
soc, so what gets executed this early for this soc? Do you know where it
fails to boot? If you can debug, you should break on the address of the
entry point (usually 0x8020_0000) since this is the stvec address, so when
you get a trap, you will branch there, and then could you dump $sepc, $ra
and $stval when you get there?

Regarding the thead issue, I think the following should fix it:

diff --git a/arch/riscv/mm/Makefile b/arch/riscv/mm/Makefile
index b85e9e82f082..a9bf3f8c7cb4 100644
--- a/arch/riscv/mm/Makefile
+++ b/arch/riscv/mm/Makefile
@@ -3,6 +3,7 @@ CFLAGS_init.o := -mcmodel=medany
 ifdef CONFIG_RELOCATABLE
 CFLAGS_init.o += -fno-pie
+CFLAGS_dma-noncoherent.o += -fno-pie
 endif
 ifdef CONFIG_FTRACE
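The debugging step suggested above (break on the entry point, then dump sepc, ra and stval) could be scripted roughly as follows. This is only a sketch: `gdb_trap_script` is a hypothetical helper, the default address is the "usual" entry point from the mail written without the readability underscore, `localhost:1234` assumes the target is a QEMU started with `-s`, and it assumes the gdb stub exposes the CSRs under the names `sepc` and `stval`.

```shell
#!/bin/sh
# Hypothetical helper: print a gdb command script for the check described in
# the mail. The first argument is the entry point address (optional).
gdb_trap_script() {
    entry="${1:-0x80200000}"
    cat <<EOF
target remote localhost:1234
hbreak *${entry}
continue
info registers sepc ra stval
EOF
}

# Show the generated commands for the default entry address.
gdb_trap_script
```

The output could be saved to a file and passed to a RISC-V capable gdb with `-x`, together with the matching vmlinux so that the reported addresses can be resolved to symbols.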
On Sat, May 27, 2023 at 11:13:18AM +0200, Alexandre Ghiti wrote:
> I tested it on the Unmatched (Andreas did too).

Cool. I cracked out my unmatched and it has the same issue as the
icicle. Ditto my Visionfive v2. Here's my config.
https://raw.githubusercontent.com/ConchuOD/riscv-env/dev/conf/defconfig

A ~default qemu virt doesn't work either. (-m 2G -smp 5)

> Very weird it does not work on the icicle kit, there is no errata for this
> soc, so what gets executed this early for this soc? Do you know where it
> fails to boot? If you can debug, you should break on the address of the
> entry point (usually 0x8020_0000) since this is the stvec address, so when
> you get a trap, you will branch there, and then could you dump $sepc, $ra
> and $stval when you get there?

> Regarding the thead issue, I think the following should fix it:

It did not :/

Cheers,
Conor.
On Sat, May 27, 2023 at 12:02 PM Conor Dooley <conor@kernel.org> wrote:
> Cool. I cracked out my unmatched and it has the same issue as the
> icicle. Ditto my Visionfive v2. Here's my config.
> https://raw.githubusercontent.com/ConchuOD/riscv-env/dev/conf/defconfig
>
> A ~default qemu virt doesn't work either. (-m 2G -smp 5)

I can boot with this config using:

$ sudo ~/qemu/build/qemu-system-riscv64 -machine virt -cpu rv64,sv48=off -nographic -m 2G -smp 5 -kernel build_conor/arch/riscv/boot/Image -s

I noticed when trying to add this to our internal CI that I had local
failures that did not happen in the CI because the CI was not using
the same toolchain: can you give me the full .config? So that I can
see if the compiler added stack guards or some other things I did not
think of.

Thanks!
On Sun, May 28, 2023 at 03:00:57PM +0200, Alexandre Ghiti wrote:
> I can boot with this config using:
>
> $ sudo ~/qemu/build/qemu-system-riscv64 -machine virt -cpu
> rv64,sv48=off -nographic -m 2G -smp 5 -kernel
> build_conor/arch/riscv/boot/Image -s

Just in case, that is my normal config that I use for testing random
stuff on LKML, I added CONFIG_RELOCATABLE in addition to that.

> I noticed when trying to add this to our internal CI that I had local
> failures that did not happen in the CI because the CI was not using
> the same toolchain: can you give me the full .config? So that I can
> see if the compiler added stack guards or some other things I did not
> think of.

https://gist.githubusercontent.com/ConchuOD/655f9cc19fb3be63f1c9da7e7e3ab717/raw/a1aad3c0d307609b2062fd3a66705166aede9f9f/.config

90% of what I test for upstream stuff uses clang, since clang appears to
be a minority choice - but I could reproduce this with gcc-12 as well,
using the same defconfig as linked above + CONFIG_RELOCATABLE.

Cheers,
Conor.
On 28/05/2023 15:12, Conor Dooley wrote:
> https://gist.githubusercontent.com/ConchuOD/655f9cc19fb3be63f1c9da7e7e3ab717/raw/a1aad3c0d307609b2062fd3a66705166aede9f9f/.config
>
> 90% of what I test for upstream stuff uses clang, since clang appears to
> be a minority choice - but I could reproduce this with gcc-12 as well,
> using the same defconfig as linked above + CONFIG_RELOCATABLE.

Hmmm, it still works for me with both clang and gcc-9.

You don't have to do that now but is there a way I could get your compiled
image? With the sha1 used to build it? Sorry, I don't see what happens, I
need to get my hands dirty in some debug!

Thanks for being so quick Conor!
On Sun, May 28, 2023 at 03:42:59PM +0200, Alexandre Ghiti wrote:
> Hmmm, it still works for me with both clang and gcc-9.

gcc-9 is a bit of a relic, do you have more recent compilers lying
around? If not, I can try some older compilers at some point.

> You don't have to do that now but is there a way I could get your compiled
> image? With the sha1 used to build it? Sorry, I don't see what happens, I
> need to get my hands dirty in some debug!

What do you mean by "sha1"? It fails with v6.4-rc1 which is a stable
hash, if that's what you're looking for.

Otherwise,
https://github.com/ConchuOD/riscv-env/releases/download/v2022.03/vmlinux.bin
(ignore the release crap haha, too lazy to find a proper hosting
mechanism)

| git show
| commit 3bd124485ed55d8ee6c1ff3532c8f617b24aa6ef (HEAD)
| Author: Alexandre Ghiti <alexghiti@rivosinc.com>
| Date:   Fri May 26 17:46:30 2023 +0200
|
|     riscv: Fix relocatable kernels with early alternatives using -fno-pie
|
|     Early alternatives are called with the mmu disabled, and then should not
|     access any global symbols through the GOT since it requires relocations,
|     relocations that we do before but *virtually*. So only use medany code
|     model for this early code.
|
|     Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
|     Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
|
| diff --git a/arch/riscv/errata/Makefile b/arch/riscv/errata/Makefile
| index a1055965fbee..7b2637c8c332 100644
| --- a/arch/riscv/errata/Makefile
| +++ b/arch/riscv/errata/Makefile
| @@ -1,2 +1,6 @@
| +ifdef CONFIG_RELOCATABLE
| +KBUILD_CFLAGS += -fno-pie
| +endif
| +
|  obj-$(CONFIG_ERRATA_SIFIVE) += sifive/
|  obj-$(CONFIG_ERRATA_THEAD) += thead/
| diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
| index fbdccc21418a..153864e4f399 100644
| --- a/arch/riscv/kernel/Makefile
| +++ b/arch/riscv/kernel/Makefile
| @@ -23,6 +23,10 @@ ifdef CONFIG_FTRACE
|  CFLAGS_REMOVE_alternative.o = $(CC_FLAGS_FTRACE)
|  CFLAGS_REMOVE_cpufeature.o = $(CC_FLAGS_FTRACE)
|  endif
| +ifdef CONFIG_RELOCATABLE
| +CFLAGS_alternative.o += -fno-pie
| +CFLAGS_cpufeature.o += -fno-pie
| +endif
|  ifdef CONFIG_KASAN
|  KASAN_SANITIZE_alternative.o := n
|  KASAN_SANITIZE_cpufeature.o := n
On 28/05/2023 15:56, Conor Dooley wrote:
> What do you mean by "sha1"? It fails with v6.4-rc1 which is a stable
> hash, if that's what you're looking for.
>
> Otherwise,
> https://github.com/ConchuOD/riscv-env/releases/download/v2022.03/vmlinux.bin
> (ignore the release crap haha, too lazy to find a proper hosting
> mechanism)

Ok, I don't get much info without the symbols, can you also provide the
vmlinux please? But at least your image does not boot, not during the early
boot though because the mmu is enabled.

I tried with gcc-12 and it still works fine on my end, so frustrating!
On Mon, May 29, 2023 at 08:51:57PM +0200, Alexandre Ghiti wrote:
> Ok, I don't get much info without the symbols, can you also provide the
> vmlinux please? But at least your image does not boot, not during the early
> boot though because the mmu is enabled.

Do you see anything print when you try it? Cos I do not. Iff I have time
tomorrow, I'll go poking with gdb. I'm sorry I have not really done any
investigating, I have been really busy this last week or so with
dt-binding stuff but I should be freer again from tomorrow.

https://github.com/ConchuOD/riscv-env/releases/download/v2022.03/vmlinux

> I tried with gcc-12 and it still works fine on my end, so frustrating!

Crap! Also, should you not be enjoying a public holiday rather than
debugging?! Or maybe debugging is enjoyable for you...

Cheers,
Conor.
On 29/05/2023 21:06, Conor Dooley wrote:
> On Mon, May 29, 2023 at 08:51:57PM +0200, Alexandre Ghiti wrote:
>> Ok, I don't get much info without the symbols, can you also provide the
>> vmlinux please? But at least your image does not boot, not during the early
>> boot though because the mmu is enabled.
> Do you see anything print when you try it? Cos I do not. Iff I have time
> tomorrow, I'll go poking with gdb. I'm sorry I have not really done any
> investigating, I have been really busy this last week or so with
> dt-binding stuff but I should be freer again from tomorrow.
>
> https://github.com/ConchuOD/riscv-env/releases/download/v2022.03/vmlinux

Better, the trap happens in kasan_early_init() when it tries to access a
global symbol using the GOT but ends up with a NULL pointer, which is weird.
So to me, this is not related to kasan, it happens that kasan_early_init()
is the first function called after enabling the mmu, I think you may have an
issue with the filling of the relocations.

Sorry to bother you again, but if at some point you can recompile with
DEBUG_INFO enabled, that would be perfect! And also provide the
vmlinux.relocs file. Sorry for all that, too bad I can't reproduce it.

>> I tried with gcc-12 and it still works fine on my end, so frustrating!
> Crap! Also, should you not be enjoying a public holiday rather than
> debugging?! Or maybe debugging is enjoyable for you...

Ahah, this is what I enjoy doing when the kids finally sleep :)

Thank you again for your very quick feedback, really appreciated!
On Mon, May 29, 2023 at 09:37:28PM +0200, Alexandre Ghiti wrote:
> Better, the trap happens in kasan_early_init() when it tries to access a
> global symbol using the GOT but ends up with a NULL pointer, which is weird.
> So to me, this is not related to kasan, it happens that kasan_early_init()
> is the first function called after enabling the mmu, I think you may have an
> issue with the filling of the relocations.

Yeah, it reproduces without KASAN.

> Sorry to bother you again, but if
> at some point you can recompile with DEBUG_INFO enabled, that would be
> perfect! And also provide the vmlinux.relocs file. Sorry for all that, too
> bad I can't reproduce it.

New vmlinux & vmlinux.relocs here:
https://microchiptechnology-my.sharepoint.com/:u:/g/personal/conor_dooley_microchip_com/EZpFNxYYrnNAh5Z3c-rf0pUBBpdPGTLafqdtfcXRUUBkXw?e=7KKMHX
They're pretty massive unfortunately & hopefully that is not some
garbage internal-only link.
.config is a wee bit different, cos different build machine, but the
problem still manifests on an Icicle. I've added it to the tarball just
in case.

> Ahah, this is what I enjoy doing when the kids finally sleep :)
>
>
> Thank you again for your very quick feedback, really appreciated!

No worries.
On 30/05/2023 13:27, Conor Dooley wrote:
> New vmlinux & vmlinux.relocs here:
> https://microchiptechnology-my.sharepoint.com/:u:/g/personal/conor_dooley_microchip_com/EZpFNxYYrnNAh5Z3c-rf0pUBBpdPGTLafqdtfcXRUUBkXw?e=7KKMHX
> They're pretty massive unfortunately & hopefully that is not some
> garbage internal-only link.
> .config is a wee bit different, cos different build machine, but the
> problem still manifests on a icicle. I've added it to the tarball just
> in case.

Ok so I had to recreate the Image from the files you gave me and it boots
fine using qemu: is that expected? Because you only mention the icicle
above.

[    0.000000] Linux version 6.4.0-rc1 (conor@wendy) (ClangBuiltLinux clang version 15.0.7 (/home/conor/stuff/dev/llvm/clang 8dfdcc7b7bf66834a761bd8de445840ef68e4d1a), ClangBuiltLinux LLD 15.0.7) #1 SMP PREEMPT Tue May 30 12:13:12 IST 2023
[    0.000000] random: crng init done
[    0.000000] Machine model: riscv-virtio,qemu
[    0.000000] earlycon: ns16550a0 at MMIO 0x0000000010000000 (options '')
[    0.000000] printk: bootconsole [ns16550a0] enabled
[    0.000000] printk: debug: skip boot console de-registration.
[    0.000000] efi: UEFI not found.
[    0.000000] OF: reserved mem: 0x0000000080000000..0x000000008003ffff (256 KiB) map non-reusable mmode_resv0@80000000
[    0.000000] Zone ranges:
[    0.000000] DMA32 [mem 0x0000000080000000-0x00000000ffffffff]
[    0.000000] Normal [mem 0x0000000100000000-0x000000017fffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000] node 0: [mem 0x0000000080000000-0x000000017fffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000080000000-0x000000017fffffff]
[    0.000000] SBI specification v1.0 detected
[    0.000000] SBI implementation ID=0x1 Version=0x10002
[    0.000000] SBI TIME extension detected
[    0.000000] SBI IPI extension detected
[    0.000000] SBI RFENCE extension detected
[    0.000000] SBI SRST extension detected
[    0.000000] SBI HSM extension detected
[    0.000000] riscv: base ISA extensions acdfhim
[    0.000000] riscv: ELF capabilities acdfim
[    0.000000] percpu: Embedded 30 pages/cpu s83872 r8192 d30816 u122880
[    0.000000] Kernel command line: earlycon keep_bootcon root=/dev/mmcblk1p2 rootdelay=10 reboot=cold
[    0.000000] Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes, linear)
[    0.000000] Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes, linear)
[    0.000000] Built 1 zonelists, mobility grouping on. Total pages: 1034240
[    0.000000] mem auto-init: stack:all(zero), heap alloc:off, heap free:off
...
On Tue, May 30, 2023 at 04:33:45PM +0200, Alexandre Ghiti wrote:
> Ok so I had to recreate the Image from the files you gave me and it boots
> fine using qemu: is that expected? Because you only mention the icicle
> above.

Unfortunately you sent this one right as I left work..
I ssh'ed in though and ran the vmlinux.bin & had the same issues.
Silly question perhaps - is it just not possible to boot something that
has been hit with `objcopy -O binary vmlinux vmlinux.bin` with
CONFIG_RELOCATABLE? At this point that's the main thing that sticks out
to me as being different. You couldn't boot the vmlinux.bin that I sent
you either.

Cheers,
Conor.
On Tue, May 30, 2023 at 7:47 PM Conor Dooley <conor@kernel.org> wrote:
> Unfortunately you sent this one right as I left work..
> I ssh'ed in though and ran the vmlinux.bin & had the same issues.
> Silly question perhaps - is it just not possible to boot something that
> has been hit with `objcopy -O binary vmlinux vmlinux.bin` with
> CONFIG_RELOCATABLE? At this point that's the main thing that sticks out
> to me as being different. You couldn't boot the vmlinux.bin that I sent
> you either.

Ahah, I think we found the culprit!

With CONFIG_RELOCATABLE, vmlinux is actually stripped of all the
relocations (so that it can be shipped) and vmlinux.relocs is what you
should use instead, since it is just a copy of vmlinux before the
removal of the relocations!
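To make the failure mode above concrete, here is a minimal user-space sketch of what "filling of the relocations" amounts to for R_RISCV_RELATIVE entries. It is only loosely modelled on what a relocatable kernel does at boot and is not the kernel's code: `struct rela_entry`, `apply_relative_relocs()` and `read_slot()` are hypothetical names invented for this illustration, and the addresses used with them are made up. The point it tries to show is that a slot such as a GOT entry keeps its link-time content (often zero) unless a relocation table is present to patch it, which is one plausible reading of the NULL pointer seen when booting a binary made from the stripped vmlinux.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Relocation type number for R_RISCV_RELATIVE in the RISC-V ELF psABI. */
#define R_RISCV_RELATIVE 3

/* Hypothetical stand-in for Elf64_Rela, declared here to stay self-contained. */
struct rela_entry {
    uint64_t r_offset;  /* link-time address of the slot to patch */
    uint64_t r_info;    /* relocation type (symbol index is 0 for RELATIVE) */
    int64_t  r_addend;  /* link-time address the slot should point to */
};

/*
 * Patch every R_RISCV_RELATIVE slot in `image`.
 *   link_base: address the image was linked at
 *   load_base: address the image actually runs at
 * Returns the number of slots written. With count == 0 (no relocation
 * table available) nothing is patched and every slot keeps its old value.
 */
static size_t apply_relative_relocs(uint8_t *image, size_t image_size,
                                    uint64_t link_base, uint64_t load_base,
                                    const struct rela_entry *rela, size_t count)
{
    size_t patched = 0;

    for (size_t i = 0; i < count; i++) {
        if (rela[i].r_info != R_RISCV_RELATIVE)
            continue;  /* this sketch handles only RELATIVE relocations */

        uint64_t slot_off = rela[i].r_offset - link_base;
        assert(slot_off + sizeof(uint64_t) <= image_size);

        /* New slot value = link-time target + (load address - link address). */
        uint64_t value = (uint64_t)rela[i].r_addend + (load_base - link_base);
        memcpy(image + slot_off, &value, sizeof(value));
        patched++;
    }
    return patched;
}

/* Read a 64-bit slot (for example a GOT entry) given its link-time address. */
static uint64_t read_slot(const uint8_t *image, uint64_t link_base,
                          uint64_t slot_addr)
{
    uint64_t value;

    memcpy(&value, image + (slot_addr - link_base), sizeof(value));
    return value;
}
```

Calling `apply_relative_relocs()` with an empty table on a zero-filled image leaves the slot at 0, while passing a single entry that targets the same slot makes `read_slot()` return the relocated address. The real kernel code has extra conditions that are deliberately left out of this sketch.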
On Tue, May 30, 2023 at 08:04:17PM +0200, Alexandre Ghiti wrote:
>
> Ahah, I think we found the culprit!
>
> With CONFIG_RELOCATABLE, vmlinux is actually stripped from all the
> relocations (so that it can be shipped) and vmlinux.relocs is what you
> should use instead, since it is just a copy of vmlinux before the
> removal of the relocations!

That probably makes us both eejits for not realising sooner...

Tested-by: Conor Dooley <conor.dooley@microchip.com> # booted on nezha & unmatched

Thanks for your patience here Alex.
On 30/05/2023 22:22, Conor Dooley wrote:
> That probably makes us both eejits for not realising sooner...

Ahah, TIL a new word, thanks :)

> Tested-by: Conor Dooley <conor.dooley@microchip.com> # booted on nezha & unmatched
>
> Thanks for your patience here Alex.

So I checked again if the -fno-pie should be applied to mm/dma-noncoherent.c
as I suggested, but actually no: errata/thead/errata.c never reaches
riscv_noncoherent_supported() in early boot (you can see how 'fragile' it is
though and why something needs to be done...).

Oh and I realized that I forgot the Reported-by from Andreas and the Fixes
tags, so here they are:

Fixes: 39b33072941f ("riscv: Introduce CONFIG_RELOCATABLE")
Reported-by: Andreas Schwab <schwab@linux-m68k.org>

Thank you too for your patience and your quick answers!

Alex
On Wed, May 31, 2023 at 09:26:27AM +0200, Alexandre Ghiti wrote:
> So I checked again if the -fno-pie should be applied to mm/dma-noncoherent.c
> as I suggested, but actually no: errata/thead/errata.c never reaches
> riscv_noncoherent_supported() in early boot (you can see how 'fragile' it is
> though and why something needs to be done...).

I did make sure to check this patch itself, without the additional bit,
to see if it was needed.
But yeah, it is going to be super fragile - do you have any ideas about
how to circumvent that?
On 31/05/2023 11:32, Conor Dooley wrote:
> I did make sure to check this patch itself, without the additional bit,
> to see if it was needed.
> But yeah, it is going to be super fragile - do you have any ideas about
> how to circumvent that?

Yes, I was thinking about multiple solutions:

- All the early code could go into kernel/pi: all the dependencies of the
  early code are built in their own way (the symbols are actually
  'duplicated'). I see that a bit like the EFI stub. My first try failed
  with !CONFIG_RELOCATABLE, I have to dig further.

- Simply do a physical relocation before any early code, execute the early
  code, and then do the virtual relocation. But that does not solve the
  issue fixed by kernel/pi, which allows recompiling standard functions
  (like the string ones) without any instrumentation while keeping the
  instrumented versions for normal execution.

- Compile relocatable kernels without -fPIE (why can't we just use medany
  actually?). That won't fix certain types of situations where we need
  relocations, but that will limit the number of outliers that need to be
  compiled with -fno-pie and it will be easier to spot (we'll still have to
  be very careful though).

- Be very strict about what can/cannot be done in this pre-mmu stage, and
  document that...

The best solution would be the first I guess.

Any other ideas welcome :)
On Fri, 26 May 2023 17:46:30 +0200, Alexandre Ghiti wrote:
> Early alternatives are called with the mmu disabled, and then should not
> access any global symbols through the GOT since it requires relocations,
> relocations that we do before but *virtually*. So only use medany code
> model for this early code.
>
>

Applied, thanks!

[1/1] riscv: Fix relocatable kernels with early alternatives using -fno-pie
      https://git.kernel.org/palmer/c/8dc2a7e8027f

Best regards,
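The reasoning in the commit message quoted above can be checked with a little address arithmetic. The sketch below is a toy model, not kernel code: `struct early_ctx` and the three helper functions are hypothetical, and the base addresses used with them are invented for illustration. It assumes that a GOT slot, once relocated "virtually", holds a virtual address, whereas a pc-relative access of the kind the medany code model is expected to produce for non-PIE code computes its target from wherever the code is currently running.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one early-boot situation; values are made up. */
struct early_ctx {
    uint64_t phys_base;   /* where the image sits in RAM */
    uint64_t virt_base;   /* where it is mapped once the MMU is on */
    uint64_t image_size;  /* size of the image in bytes */
};

/*
 * GOT-style access: the address is whatever relocation stored in the slot.
 * After a virtual relocation that is a virtual address, no matter where the
 * code is currently executing from.
 */
static uint64_t addr_via_got(const struct early_ctx *c, uint64_t sym_off)
{
    assert(sym_off < c->image_size);  /* the symbol must live inside the image */
    return c->virt_base + sym_off;
}

/*
 * pc-relative access: address = current pc + link-time distance between the
 * instruction and the symbol, so it follows the code wherever it runs.
 */
static uint64_t addr_via_pcrel(uint64_t pc, uint64_t insn_off, uint64_t sym_off)
{
    return pc + (sym_off - insn_off);
}

/* With the MMU off, only addresses inside the physical image are usable. */
static bool usable_with_mmu_off(const struct early_ctx *c, uint64_t addr)
{
    return addr >= c->phys_base && addr - c->phys_base < c->image_size;
}
```

With a physical base of 0x80200000 and a virtual base of 0xffffffff80000000, an instruction executing at physical offset 0x100 reaches a symbol at offset 0x1000 correctly through `addr_via_pcrel()`, while the address returned by `addr_via_got()` fails `usable_with_mmu_off()`. That matches the motivation for building the early alternatives code with -fno-pie.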
Hello:

This patch was applied to riscv/linux.git (fixes)
by Palmer Dabbelt <palmer@rivosinc.com>:

On Fri, 26 May 2023 17:46:30 +0200 you wrote:
> Early alternatives are called with the mmu disabled, and then should not
> access any global symbols through the GOT since it requires relocations,
> relocations that we do before but *virtually*. So only use medany code
> model for this early code.
>
> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
>
> [...]

Here is the summary with links:
  - [-fixes] riscv: Fix relocatable kernels with early alternatives using -fno-pie
    https://git.kernel.org/riscv/c/8dc2a7e8027f

You are awesome, thank you!
diff --git a/arch/riscv/errata/Makefile b/arch/riscv/errata/Makefile
index a1055965fbee..7b2637c8c332 100644
--- a/arch/riscv/errata/Makefile
+++ b/arch/riscv/errata/Makefile
@@ -1,2 +1,6 @@
+ifdef CONFIG_RELOCATABLE
+KBUILD_CFLAGS += -fno-pie
+endif
+
 obj-$(CONFIG_ERRATA_SIFIVE) += sifive/
 obj-$(CONFIG_ERRATA_THEAD) += thead/
diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
index fbdccc21418a..153864e4f399 100644
--- a/arch/riscv/kernel/Makefile
+++ b/arch/riscv/kernel/Makefile
@@ -23,6 +23,10 @@ ifdef CONFIG_FTRACE
 CFLAGS_REMOVE_alternative.o = $(CC_FLAGS_FTRACE)
 CFLAGS_REMOVE_cpufeature.o = $(CC_FLAGS_FTRACE)
 endif
+ifdef CONFIG_RELOCATABLE
+CFLAGS_alternative.o += -fno-pie
+CFLAGS_cpufeature.o += -fno-pie
+endif
 ifdef CONFIG_KASAN
 KASAN_SANITIZE_alternative.o := n
 KASAN_SANITIZE_cpufeature.o := n