Message ID | 20230710054029.2026124-1-guoren@kernel.org |
---|---|
State | New |
Headers | From: guoren@kernel.org To: guoren@kernel.org, palmer@rivosinc.com, paul.walmsley@sifive.com, zong.li@sifive.com, atishp@atishpatra.org, alex@ghiti.fr, jszhang@kernel.org, bjorn@kernel.org Cc: linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org, Guo Ren <guoren@linux.alibaba.com> Subject: [PATCH V2] riscv: kexec: Fixup synchronization problem between init_mm and active_mm Date: Mon, 10 Jul 2023 01:40:29 -0400 Message-Id: <20230710054029.2026124-1-guoren@kernel.org> X-Mailer: git-send-email 2.36.1 List-ID: <linux-kernel.vger.kernel.org> |
Series | [V2] riscv: kexec: Fixup synchronization problem between init_mm and active_mm |
Commit Message
Guo Ren
July 10, 2023, 5:40 a.m. UTC
From: Guo Ren <guoren@linux.alibaba.com>

The machine_kexec() uses set_memory_x to modify the direct mapping
attributes from RW to RWX. But set_memory_x only changes the init_mm's
attributes, not current->active_mm, so when kexec jumps into
control_buffer, the instruction page fault happens, and there is no
minor_pagefault for it, then panic.

The bug is found on an MMU_sv39 machine, and the direct mapping used a
1GB PUD, the pgd entries. Here is the bug output:

 kexec_core: Starting new kernel
 Will call new kernel at 00300000 from hart id 0
 FDT image at 747c7000
 Bye...
 Unable to handle kernel paging request at virtual address ffffffda23b0d000
 Oops [#1]
 Modules linked in:
 CPU: 0 PID: 53 Comm: uinit Not tainted 6.4.0-rc6 #15
 Hardware name: Sophgo Mango (DT)
 epc : 0xffffffda23b0d000
  ra : machine_kexec+0xa6/0xb0
 epc : ffffffda23b0d000 ra : ffffffff80008272 sp : ffffffc80c173d10
  gp : ffffffff8150e1e0 tp : ffffffd9073d2c40 t0 : 0000000000000000
  t1 : 0000000000000042 t2 : 6567616d69205444 s0 : ffffffc80c173d50
  s1 : ffffffd9076c4800 a0 : ffffffd9076c4800 a1 : 0000000000300000
  a2 : 00000000747c7000 a3 : 0000000000000000 a4 : ffffffd800000000
  a5 : 0000000000000000 a6 : ffffffd903619c40 a7 : ffffffffffffffff
  s2 : ffffffda23b0d000 s3 : 0000000000300000 s4 : 00000000747c7000
  s5 : 0000000000000000 s6 : 0000000000000000 s7 : 0000000000000000
  s8 : 0000000000000000 s9 : 0000000000000000 s10: 0000000000000000
  s11: 0000003f940001a0 t3 : ffffffff815351af t4 : ffffffff815351af
  t5 : ffffffff815351b0 t6 : ffffffc80c173b50
 status: 0000000200000100 badaddr: ffffffda23b0d000 cause: 000000000000000c

The solution is to fix machine_kexec() to remap control code page outside
the linear mapping.

Fixes: 3335068f8721 ("riscv: Use PUD/P4D/PGD pages for the linear mapping")
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Cc: Alexandre Ghiti <alex@ghiti.fr>
---
Changelog:
V2:
 - Use vm_map_ram instead of modifying set_memory_x
 - Correct Fixes tag
---
 arch/riscv/include/asm/kexec.h    |  1 +
 arch/riscv/kernel/machine_kexec.c | 14 ++++++++++----
 2 files changed, 11 insertions(+), 4 deletions(-)
Comments
Hi Guo,

On 10/07/2023 07:40, guoren@kernel.org wrote:
> From: Guo Ren <guoren@linux.alibaba.com>
>
> The machine_kexec() uses set_memory_x to modify the direct mapping
> attributes from RW to RWX. But set_memory_x only changes the init_mm's
> attributes, not current->active_mm, so when kexec jumps into
> control_buffer, the instruction page fault happens, and there is no
> minor_pagefault for it, then panic.

I think it needs more details like this:

"The current implementation of set_memory_x does not split hugepages in
the linear mapping and then when a PGD mapping is used, the whole PGD is
marked as executable. But changing the permissions at the PGD level must
be propagated to all the page tables."

>
> The bug is found on an MMU_sv39 machine, and the direct mapping used a
> 1GB PUD, the pgd entries. Here is the bug output:
>
>  kexec_core: Starting new kernel
>  Will call new kernel at 00300000 from hart id 0
>  FDT image at 747c7000
>  Bye...
>  Unable to handle kernel paging request at virtual address ffffffda23b0d000
>  Oops [#1]
>  Modules linked in:
>  CPU: 0 PID: 53 Comm: uinit Not tainted 6.4.0-rc6 #15
>  Hardware name: Sophgo Mango (DT)
>  epc : 0xffffffda23b0d000
>   ra : machine_kexec+0xa6/0xb0
>  epc : ffffffda23b0d000 ra : ffffffff80008272 sp : ffffffc80c173d10
>   gp : ffffffff8150e1e0 tp : ffffffd9073d2c40 t0 : 0000000000000000
>   t1 : 0000000000000042 t2 : 6567616d69205444 s0 : ffffffc80c173d50
>   s1 : ffffffd9076c4800 a0 : ffffffd9076c4800 a1 : 0000000000300000
>   a2 : 00000000747c7000 a3 : 0000000000000000 a4 : ffffffd800000000
>   a5 : 0000000000000000 a6 : ffffffd903619c40 a7 : ffffffffffffffff
>   s2 : ffffffda23b0d000 s3 : 0000000000300000 s4 : 00000000747c7000
>   s5 : 0000000000000000 s6 : 0000000000000000 s7 : 0000000000000000
>   s8 : 0000000000000000 s9 : 0000000000000000 s10: 0000000000000000
>   s11: 0000003f940001a0 t3 : ffffffff815351af t4 : ffffffff815351af
>   t5 : ffffffff815351b0 t6 : ffffffc80c173b50
>  status: 0000000200000100 badaddr: ffffffda23b0d000 cause: 000000000000000c
>
> The solution is to fix machine_kexec() to remap control code page outside
> the linear mapping.

"Given the current flaw in the set_memory_x implementation, the simplest
solution is to ..."

>
> Fixes: 3335068f8721 ("riscv: Use PUD/P4D/PGD pages for the linear mapping")
> Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
> Signed-off-by: Guo Ren <guoren@kernel.org>
> Cc: Alexandre Ghiti <alex@ghiti.fr>
> ---
> Changelog:
> V2:
>  - Use vm_map_ram instead of modifying set_memory_x
>  - Correct Fixes tag
> ---
>  arch/riscv/include/asm/kexec.h    |  1 +
>  arch/riscv/kernel/machine_kexec.c | 14 ++++++++++----
>  2 files changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/arch/riscv/include/asm/kexec.h b/arch/riscv/include/asm/kexec.h
> index 2b56769cb530..17456e91476e 100644
> --- a/arch/riscv/include/asm/kexec.h
> +++ b/arch/riscv/include/asm/kexec.h
> @@ -41,6 +41,7 @@ crash_setup_regs(struct pt_regs *newregs,
>  struct kimage_arch {
>  	void *fdt; /* For CONFIG_KEXEC_FILE */
>  	unsigned long fdt_addr;
> +	void *control_code_buffer;
>  };
>
>  extern const unsigned char riscv_kexec_relocate[];
> diff --git a/arch/riscv/kernel/machine_kexec.c b/arch/riscv/kernel/machine_kexec.c
> index 2d139b724bc8..eeb209775107 100644
> --- a/arch/riscv/kernel/machine_kexec.c
> +++ b/arch/riscv/kernel/machine_kexec.c
> @@ -86,7 +86,14 @@ machine_kexec_prepare(struct kimage *image)
>
>  	/* Copy the assembler code for relocation to the control page */
>  	if (image->type != KEXEC_TYPE_CRASH) {
> -		control_code_buffer = page_address(image->control_code_page);
> +		control_code_buffer = vm_map_ram(&image->control_code_page,
> +						KEXEC_CONTROL_PAGE_SIZE/PAGE_SIZE,
> +						NUMA_NO_NODE);
> +		if (control_code_buffer == NULL) {
> +			pr_err("Failed to vm_map control page\n");
> +			return -ENOMEM;
> +		}
> +
>  		control_code_buffer_sz = page_size(image->control_code_page);
>
>  		if (unlikely(riscv_kexec_relocate_size > control_code_buffer_sz)) {
> @@ -97,8 +104,7 @@ machine_kexec_prepare(struct kimage *image)
>  		memcpy(control_code_buffer, riscv_kexec_relocate,
>  			riscv_kexec_relocate_size);
>
> -		/* Mark the control page executable */
> -		set_memory_x((unsigned long) control_code_buffer, 1);
> +		internal->control_code_buffer = control_code_buffer;

Where is this mapping marked as executable? I see that vm_map_ram() maps
the pages as PAGE_KERNEL, which does not set PAGE_EXEC.

>  	}
>
>  	return 0;
> @@ -211,7 +217,7 @@ machine_kexec(struct kimage *image)
>  	unsigned long this_cpu_id = __smp_processor_id();
>  	unsigned long this_hart_id = cpuid_to_hartid_map(this_cpu_id);
>  	unsigned long fdt_addr = internal->fdt_addr;
> -	void *control_code_buffer = page_address(image->control_code_page);
> +	void *control_code_buffer = internal->control_code_buffer;
>  	riscv_kexec_method kexec_method = NULL;
>
>  #ifdef CONFIG_SMP

Otherwise, you can add:

Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>

Thanks,

Alex
On Tue, 11 Jul 2023 04:07:22 PDT (-0700), alex@ghiti.fr wrote:
> Hi Guo,
>
> On 10/07/2023 07:40, guoren@kernel.org wrote:
>> From: Guo Ren <guoren@linux.alibaba.com>
>>
>> The machine_kexec() uses set_memory_x to modify the direct mapping
>> attributes from RW to RWX. But set_memory_x only changes the init_mm's
>> attributes, not current->active_mm, so when kexec jumps into
>> control_buffer, the instruction page fault happens, and there is no
>> minor_pagefault for it, then panic.
>
> I think it needs more details like this:
>
> "The current implementation of set_memory_x does not split hugepages in
> the linear mapping and then when a PGD mapping is used, the whole PGD is
> marked as executable. But changing the permissions at the PGD level must
> be propagated to all the page tables."
>
>> [...]
>> The solution is to fix machine_kexec() to remap control code page outside
>> the linear mapping.
>
> "Given the current flaw in the set_memory_x implementation, the simplest
> solution is to ..."
>
>> [...]
>> -		/* Mark the control page executable */
>> -		set_memory_x((unsigned long) control_code_buffer, 1);
>> +		internal->control_code_buffer = control_code_buffer;
>
> Where is this mapping marked as executable? I see that vm_map_ram() maps
> the pages as PAGE_KERNEL, which does not set PAGE_EXEC.
>
>> [...]
>
> Otherwise, you can add:
>
> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
>
> Thanks,
>
> Alex

Thanks for looking at this. Guo: do you have a re-spin that fixes the
issues Alex pointed out? Sorry if I just missed it.
On Wed, Jul 12, 2023 at 10:43 AM Palmer Dabbelt <palmer@rivosinc.com> wrote:
>
> On Tue, 11 Jul 2023 04:07:22 PDT (-0700), alex@ghiti.fr wrote:
> > [...]
> > Otherwise, you can add:
> >
> > Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> >
> > Thanks,
> >
> > Alex
>
> Thanks for looking at this. Guo: do you have a re-spin that fixes the
> issues Alex pointed out? Sorry if I just missed it.

Sorry for the late reply. Here is v3 of the patch:
https://lore.kernel.org/linux-riscv/20230713150758.2956316-1-guoren@kernel.org/
On Tue, Jul 11, 2023 at 7:07 AM Alexandre Ghiti <alex@ghiti.fr> wrote:
>
> Hi Guo,
>
> On 10/07/2023 07:40, guoren@kernel.org wrote:
> > From: Guo Ren <guoren@linux.alibaba.com>
> >
> > The machine_kexec() uses set_memory_x to modify the direct mapping
> > attributes from RW to RWX. But set_memory_x only changes the init_mm's
> > attributes, not current->active_mm, so when kexec jumps into
> > control_buffer, the instruction page fault happens, and there is no
> > minor_pagefault for it, then panic.
>
> I think it needs more details like this:
>
> "The current implementation of set_memory_x does not split hugepages in
> the linear mapping and then when a PGD mapping is used, the whole PGD is
> marked as executable. But changing the permissions at the PGD level must
> be propagated to all the page tables."

Okay.

> > [...]
> > The solution is to fix machine_kexec() to remap control code page outside
> > the linear mapping.
>
> "Given the current flaw in the set_memory_x implementation, the simplest
> solution is to ..."

Thx, it's better.

> > [...]
> > -		/* Mark the control page executable */
> > -		set_memory_x((unsigned long) control_code_buffer, 1);
> > +		internal->control_code_buffer = control_code_buffer;
>
> Where is this mapping marked as executable? I see that vm_map_ram() maps
> the pages as PAGE_KERNEL, which does not set PAGE_EXEC.

I shouldn't have deleted set_memory_x() when I made the patch.

> > [...]
>
> Otherwise, you can add:
>
> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
>
> Thanks,
>
> Alex
>
On Thu, Jul 13, 2023 at 11:11 PM Guo Ren <guoren@kernel.org> wrote: > > On Wed, Jul 12, 2023 at 10:43 AM Palmer Dabbelt <palmer@rivosinc.com> wrote: > > > > On Tue, 11 Jul 2023 04:07:22 PDT (-0700), alex@ghiti.fr wrote: > > > Hi Guo, > > > > > > > > > On 10/07/2023 07:40, guoren@kernel.org wrote: > > >> From: Guo Ren <guoren@linux.alibaba.com> > > >> > > >> The machine_kexec() uses set_memory_x to modify the direct mapping > > >> attributes from RW to RWX. But set_memory_x only changes the init_mm's > > >> attributes, not current->active_mm, so when kexec jumps into > > >> control_buffer, the instruction page fault happens, and there is no > > >> minor_pagefault for it, then panic. > > > > > > > > > I think it needs more details like this: > > > > > > "The current implementation of set_memory_x does not split hugepages in > > > the linear mapping and then when a PGD mapping is used, the whole PGD is > > > marked as executable. But changing the permissions at the PGD level must > > > be propagated to all the page tables." > > > > > > > > >> > > >> The bug is found on an MMU_sv39 machine, and the direct mapping used a > > >> 1GB PUD, the pgd entries. Here is the bug output: > > >> > > >> kexec_core: Starting new kernel > > >> Will call new kernel at 00300000 from hart id 0 > > >> FDT image at 747c7000 > > >> Bye... 
> > >> Unable to handle kernel paging request at virtual address ffffffda23b0d000
> > >> Oops [#1]
> > >> Modules linked in:
> > >> CPU: 0 PID: 53 Comm: uinit Not tainted 6.4.0-rc6 #15
> > >> Hardware name: Sophgo Mango (DT)
> > >> epc : 0xffffffda23b0d000
> > >> ra : machine_kexec+0xa6/0xb0
> > >> epc : ffffffda23b0d000 ra : ffffffff80008272 sp : ffffffc80c173d10
> > >> gp : ffffffff8150e1e0 tp : ffffffd9073d2c40 t0 : 0000000000000000
> > >> t1 : 0000000000000042 t2 : 6567616d69205444 s0 : ffffffc80c173d50
> > >> s1 : ffffffd9076c4800 a0 : ffffffd9076c4800 a1 : 0000000000300000
> > >> a2 : 00000000747c7000 a3 : 0000000000000000 a4 : ffffffd800000000
> > >> a5 : 0000000000000000 a6 : ffffffd903619c40 a7 : ffffffffffffffff
> > >> s2 : ffffffda23b0d000 s3 : 0000000000300000 s4 : 00000000747c7000
> > >> s5 : 0000000000000000 s6 : 0000000000000000 s7 : 0000000000000000
> > >> s8 : 0000000000000000 s9 : 0000000000000000 s10: 0000000000000000
> > >> s11: 0000003f940001a0 t3 : ffffffff815351af t4 : ffffffff815351af
> > >> t5 : ffffffff815351b0 t6 : ffffffc80c173b50
> > >> status: 0000000200000100 badaddr: ffffffda23b0d000 cause: 000000000000000c
> > >>
> > >> The solution is to fix machine_kexec() to remap control code page outside
> > >> the linear mapping.
> > >
> > >
> > > "Given the current flaw in the set_memory_x implementation, the simplest
> > > solution is to ..."
> > >
> > >
> > >>
> > >> Fixes: 3335068f8721 ("riscv: Use PUD/P4D/PGD pages for the linear mapping")
> > >> Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
> > >> Signed-off-by: Guo Ren <guoren@kernel.org>
> > >> Cc: Alexandre Ghiti <alex@ghiti.fr>
> > >> ---
> > >> Changelog:
> > >> V2:
> > >>  - Use vm_map_ram instead of modifying set_memory_x
> > >>  - Correct Fixes tag
> > >> ---
> > >>  arch/riscv/include/asm/kexec.h    |  1 +
> > >>  arch/riscv/kernel/machine_kexec.c | 14 ++++++++++----
> > >>  2 files changed, 11 insertions(+), 4 deletions(-)
> > >>
> > >> diff --git a/arch/riscv/include/asm/kexec.h b/arch/riscv/include/asm/kexec.h
> > >> index 2b56769cb530..17456e91476e 100644
> > >> --- a/arch/riscv/include/asm/kexec.h
> > >> +++ b/arch/riscv/include/asm/kexec.h
> > >> @@ -41,6 +41,7 @@ crash_setup_regs(struct pt_regs *newregs,
> > >>  struct kimage_arch {
> > >>  	void *fdt; /* For CONFIG_KEXEC_FILE */
> > >>  	unsigned long fdt_addr;
> > >> +	void *control_code_buffer;
> > >>  };
> > >>
> > >>  extern const unsigned char riscv_kexec_relocate[];
> > >> diff --git a/arch/riscv/kernel/machine_kexec.c b/arch/riscv/kernel/machine_kexec.c
> > >> index 2d139b724bc8..eeb209775107 100644
> > >> --- a/arch/riscv/kernel/machine_kexec.c
> > >> +++ b/arch/riscv/kernel/machine_kexec.c
> > >> @@ -86,7 +86,14 @@ machine_kexec_prepare(struct kimage *image)
> > >>
> > >>  	/* Copy the assembler code for relocation to the control page */
> > >>  	if (image->type != KEXEC_TYPE_CRASH) {
> > >> -		control_code_buffer = page_address(image->control_code_page);
> > >> +		control_code_buffer = vm_map_ram(&image->control_code_page,
> > >> +					KEXEC_CONTROL_PAGE_SIZE/PAGE_SIZE,
> > >> +					NUMA_NO_NODE);
> > >> +		if (control_code_buffer == NULL) {
> > >> +			pr_err("Failed to vm_map control page\n");
> > >> +			return -ENOMEM;
> > >> +		}
> > >> +
> > >>  		control_code_buffer_sz = page_size(image->control_code_page);
> > >>
> > >>  		if (unlikely(riscv_kexec_relocate_size > control_code_buffer_sz)) {
> > >> @@ -97,8 +104,7 @@ machine_kexec_prepare(struct kimage *image)
> > >>  		memcpy(control_code_buffer, riscv_kexec_relocate,
> > >>  			riscv_kexec_relocate_size);
> > >>
> > >> -		/* Mark the control page executable */
> > >> -		set_memory_x((unsigned long) control_code_buffer, 1);
> > >> +		internal->control_code_buffer = control_code_buffer;
> > >
> > >
> > > Where is this mapping marked as executable? I see that vm_map_ram() maps
> > > the pages as PAGE_KERNEL, which does not set PAGE_EXEC.
> > >
> > >
> > >>  	}
> > >>
> > >>  	return 0;
> > >> @@ -211,7 +217,7 @@ machine_kexec(struct kimage *image)
> > >>  	unsigned long this_cpu_id = __smp_processor_id();
> > >>  	unsigned long this_hart_id = cpuid_to_hartid_map(this_cpu_id);
> > >>  	unsigned long fdt_addr = internal->fdt_addr;
> > >> -	void *control_code_buffer = page_address(image->control_code_page);
> > >> +	void *control_code_buffer = internal->control_code_buffer;
> > >>  	riscv_kexec_method kexec_method = NULL;
> > >>
> > >>  #ifdef CONFIG_SMP
> > >
> > >
> > > Otherwise, you can add:
> > >
> > > Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> > >
> > > Thanks,
> > >
> > > Alex
> >
> > Thanks for looking at this. Guo: do you have a re-spin that fixes the
> > issues Alex pointed out? Sorry if I just missed it.
> Sorry for the late reply. Here is the patch of v3:
> https://lore.kernel.org/linux-riscv/20230713150758.2956316-1-guoren@kernel.org/

@Palmer Dabbelt

Above V3 has been abandoned; I've updated it to V4:
https://lore.kernel.org/linux-riscv/20230714103659.3146949-1-guoren@kernel.org/

Xing Xiaoguang has tested it:
https://lore.kernel.org/lkml/6b766b2b.2e5.189570f5ee6.Coremail.xingxg2008@163.com/

> >
>
> --
> Best Regards
> Guo Ren
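
For readers decoding the register dump quoted in the thread: `cause: 000000000000000c` is exception code 12, which the RISC-V privileged specification lists as an instruction page fault, and `epc` equals `badaddr`, so the hart trapped while fetching the first instruction of the control code page. The following is a small userspace sketch of that reading, not kernel code. The names `scause_name` and `faulted_on_fetch_at_epc` are hypothetical and exist only for illustration; the exception-code table is taken from the specification.

```c
#include <stdbool.h>
#include <stdint.h>

/* On RV64 the top bit of scause separates interrupts from exceptions. */
#define SCAUSE_INTERRUPT (UINT64_C(1) << 63)

/* Page-fault exception codes from the RISC-V privileged specification. */
enum {
	EXC_INST_PAGE_FAULT  = 12,
	EXC_LOAD_PAGE_FAULT  = 13,
	EXC_STORE_PAGE_FAULT = 15,
};

/* Hypothetical helper: translate a synchronous scause value to a name. */
const char *scause_name(uint64_t scause)
{
	if (scause & SCAUSE_INTERRUPT)
		return "interrupt";

	switch (scause) {
	case 0:  return "instruction address misaligned";
	case 1:  return "instruction access fault";
	case 2:  return "illegal instruction";
	case 3:  return "breakpoint";
	case 4:  return "load address misaligned";
	case 5:  return "load access fault";
	case 6:  return "store/AMO address misaligned";
	case 7:  return "store/AMO access fault";
	case 8:  return "environment call from U-mode";
	case 9:  return "environment call from S-mode";
	case EXC_INST_PAGE_FAULT:  return "instruction page fault";
	case EXC_LOAD_PAGE_FAULT:  return "load page fault";
	case EXC_STORE_PAGE_FAULT: return "store/AMO page fault";
	default: return "reserved or custom";
	}
}

/*
 * Hypothetical helper: true when the trap is an instruction page fault
 * whose faulting address is the program counter itself, i.e. the fetch
 * at epc could not be translated or was not permitted.
 */
bool faulted_on_fetch_at_epc(uint64_t scause, uint64_t epc, uint64_t badaddr)
{
	return scause == EXC_INST_PAGE_FAULT && epc == badaddr;
}
```

Calling `faulted_on_fetch_at_epc(0xc, 0xffffffda23b0d000, 0xffffffda23b0d000)` with the values from the oops returns true, which matches the description of the jump into the control code page failing.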
diff --git a/arch/riscv/include/asm/kexec.h b/arch/riscv/include/asm/kexec.h
index 2b56769cb530..17456e91476e 100644
--- a/arch/riscv/include/asm/kexec.h
+++ b/arch/riscv/include/asm/kexec.h
@@ -41,6 +41,7 @@ crash_setup_regs(struct pt_regs *newregs,
 struct kimage_arch {
 	void *fdt; /* For CONFIG_KEXEC_FILE */
 	unsigned long fdt_addr;
+	void *control_code_buffer;
 };
 
 extern const unsigned char riscv_kexec_relocate[];
diff --git a/arch/riscv/kernel/machine_kexec.c b/arch/riscv/kernel/machine_kexec.c
index 2d139b724bc8..eeb209775107 100644
--- a/arch/riscv/kernel/machine_kexec.c
+++ b/arch/riscv/kernel/machine_kexec.c
@@ -86,7 +86,14 @@ machine_kexec_prepare(struct kimage *image)
 
 	/* Copy the assembler code for relocation to the control page */
 	if (image->type != KEXEC_TYPE_CRASH) {
-		control_code_buffer = page_address(image->control_code_page);
+		control_code_buffer = vm_map_ram(&image->control_code_page,
+					KEXEC_CONTROL_PAGE_SIZE/PAGE_SIZE,
+					NUMA_NO_NODE);
+		if (control_code_buffer == NULL) {
+			pr_err("Failed to vm_map control page\n");
+			return -ENOMEM;
+		}
+
 		control_code_buffer_sz = page_size(image->control_code_page);
 
 		if (unlikely(riscv_kexec_relocate_size > control_code_buffer_sz)) {
@@ -97,8 +104,7 @@ machine_kexec_prepare(struct kimage *image)
 		memcpy(control_code_buffer, riscv_kexec_relocate,
 			riscv_kexec_relocate_size);
 
-		/* Mark the control page executable */
-		set_memory_x((unsigned long) control_code_buffer, 1);
+		internal->control_code_buffer = control_code_buffer;
 	}
 
 	return 0;
@@ -211,7 +217,7 @@ machine_kexec(struct kimage *image)
 	unsigned long this_cpu_id = __smp_processor_id();
 	unsigned long this_hart_id = cpuid_to_hartid_map(this_cpu_id);
 	unsigned long fdt_addr = internal->fdt_addr;
-	void *control_code_buffer = page_address(image->control_code_page);
+	void *control_code_buffer = internal->control_code_buffer;
 	riscv_kexec_method kexec_method = NULL;
 
 #ifdef CONFIG_SMP
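
The review question about where the new mapping becomes executable can be illustrated with the RISC-V PTE permission bits. The sketch below is a userspace model, not kernel code. It assumes that the kernel's `PAGE_KERNEL` on RISC-V is a valid, global, read/write mapping without the X bit and that `PAGE_KERNEL_EXEC` is the same mapping with X added; the `MODEL_` macros and the two helper functions are hypothetical names, while the bit positions follow the privileged specification.

```c
#include <stdbool.h>
#include <stdint.h>

/* PTE flag bits as laid out in the RISC-V privileged specification. */
#define MODEL_PTE_V (UINT64_C(1) << 0) /* valid */
#define MODEL_PTE_R (UINT64_C(1) << 1) /* readable */
#define MODEL_PTE_W (UINT64_C(1) << 2) /* writable */
#define MODEL_PTE_X (UINT64_C(1) << 3) /* executable */
#define MODEL_PTE_U (UINT64_C(1) << 4) /* user-accessible */
#define MODEL_PTE_G (UINT64_C(1) << 5) /* global */
#define MODEL_PTE_A (UINT64_C(1) << 6) /* accessed */
#define MODEL_PTE_D (UINT64_C(1) << 7) /* dirty */

/* Assumption: mirrors the kernel's PAGE_KERNEL protection on RISC-V. */
#define MODEL_PAGE_KERNEL \
	(MODEL_PTE_V | MODEL_PTE_R | MODEL_PTE_W | \
	 MODEL_PTE_G | MODEL_PTE_A | MODEL_PTE_D)

/* Assumption: mirrors PAGE_KERNEL_EXEC, i.e. PAGE_KERNEL plus X. */
#define MODEL_PAGE_KERNEL_EXEC (MODEL_PAGE_KERNEL | MODEL_PTE_X)

/*
 * Simplified check for a supervisor-mode instruction fetch: the PTE must
 * be valid, executable, and not a user page. Anything else would raise
 * an instruction page fault (cause 0xc).
 */
bool model_fetch_allowed(uint64_t pte)
{
	return (pte & MODEL_PTE_V) && (pte & MODEL_PTE_X) &&
	       !(pte & MODEL_PTE_U);
}

/* Simplified check for a supervisor-mode store to a kernel page. */
bool model_store_allowed(uint64_t pte)
{
	return (pte & MODEL_PTE_V) && (pte & MODEL_PTE_W) &&
	       !(pte & MODEL_PTE_U);
}
```

Under these assumptions, `model_store_allowed(MODEL_PAGE_KERNEL)` is true while `model_fetch_allowed(MODEL_PAGE_KERNEL)` is false: the `memcpy()` of the relocation code into the buffer would succeed, but jumping to it would fault unless the mapping is created with, or later changed to, an executable protection.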