Message ID | 20230308080612.122398-5-jeeheng.sia@starfivetech.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:5915:0:0:0:0:0 with SMTP id v21csp206046wrd; Wed, 8 Mar 2023 00:22:10 -0800 (PST) X-Google-Smtp-Source: AK7set//hgdmIbu55TPnveiTjt8HHkXiSSumYl0kT4QkwtqXS3QPSNcbDbYKc6sIYfyudl6dY6uC X-Received: by 2002:a17:902:bb8c:b0:19d:1bc1:ce22 with SMTP id m12-20020a170902bb8c00b0019d1bc1ce22mr15217419pls.5.1678263730148; Wed, 08 Mar 2023 00:22:10 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1678263730; cv=none; d=google.com; s=arc-20160816; b=GepyFKE7SEOKs+VBrXcWjz1G2HgbNrxfibiuyOynYZSMzQUTq49DVkFt37wn1AZMPJ eLp3QCQ5I0zdmb/3qOGtkpzS9INkpC9g3JUJum7RftIIo/5Yw9hfu0+7axtKxu+vEuYj ROkJq7YaCYi6UtRJDE16pHShJ7O/C5s9jOUKS/kr1B0GlXE73BgUoOrjpwVsAAQdWh9w Upvvgl7y1hnF6Je+2g3g6OkDhFER0Lle63p533elx9Q84Vk7XwbeaZEZDOGNBm7cU3J+ P3up5b711QS+XlLu69JaY65xWi+D4skSuQDNiwcOfGZFvFhAiMP8FuLiHsPxVnPSFje1 rtKw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from; bh=xc6F/TXjucnzLNrchWhEKMZlxhsihGI79O4clDy/BCc=; b=zzK4Cj0qj1U1p55/vjPOtXwHOChz0zN5w1Jb7XLMSfOiQWeNfPWPPoNHRYcWJFlDuM rhL+PSQ5yvuJbsOKNci2A2VohPI5KuZDHrtjVrvQWFVyJT7rXxjIpp4BQ1Hutaj6fxb2 Tx79XzOXZK7QaL+NItA1sWBghAfRx+FK67Wl9jPQMZLLEkVlqpRl35+Zx4iNO8fRaRka nrrnBZ9MCo3N8mgINdTiAbGdZb/TqbbRs76CGKyeH0AJVMFusGnfSPGqKz3bFaJjcwI0 JJmkjKRfrGleTmdy2ihEAC4YubbwB1q3wj80h2xVygkEGsb5tnmUUMsltwQzW6Pikx0m /jRQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id ki3-20020a170903068300b0019abd464dc8si13122253plb.277.2023.03.08.00.21.54; Wed, 08 Mar 2023 00:22:10 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229483AbjCHIGy convert rfc822-to-8bit (ORCPT <rfc822;toshivichauhan@gmail.com> + 99 others); Wed, 8 Mar 2023 03:06:54 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45008 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229657AbjCHIGt (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Wed, 8 Mar 2023 03:06:49 -0500 Received: from ex01.ufhost.com (ex01.ufhost.com [61.152.239.75]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8CACCABADD for <linux-kernel@vger.kernel.org>; Wed, 8 Mar 2023 00:06:40 -0800 (PST) Received: from EXMBX165.cuchost.com (unknown [175.102.18.54]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client CN "EXMBX165", Issuer "EXMBX165" (not verified)) by ex01.ufhost.com (Postfix) with ESMTP id 19D5424E1F3; Wed, 8 Mar 2023 16:06:39 +0800 (CST) Received: from EXMBX066.cuchost.com (172.16.7.66) by EXMBX165.cuchost.com (172.16.6.75) with Microsoft SMTP Server (TLS) id 15.0.1497.42; Wed, 8 Mar 2023 16:06:39 +0800 Received: from jsia-virtual-machine.localdomain (202.188.176.82) by EXMBX066.cuchost.com (172.16.6.66) with Microsoft SMTP Server (TLS) id 15.0.1497.42; Wed, 8 Mar 2023 16:06:36 +0800 From: Sia Jee Heng <jeeheng.sia@starfivetech.com> To: <paul.walmsley@sifive.com>, <palmer@dabbelt.com>, <aou@eecs.berkeley.edu> CC: <linux-riscv@lists.infradead.org>, <linux-kernel@vger.kernel.org>, <jeeheng.sia@starfivetech.com>, <leyfoon.tan@starfivetech.com>, <mason.huo@starfivetech.com> Subject: [PATCH v5 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk Date: Wed, 8 Mar 2023 16:06:12 +0800 Message-ID: <20230308080612.122398-5-jeeheng.sia@starfivetech.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230308080612.122398-1-jeeheng.sia@starfivetech.com> References: <20230308080612.122398-1-jeeheng.sia@starfivetech.com> MIME-Version: 1.0 Content-Type: text/plain X-Originating-IP: [202.188.176.82] X-ClientProxiedBy: EXCAS064.cuchost.com (172.16.6.24) To EXMBX066.cuchost.com (172.16.6.66) X-YovoleRuleAgent: yovoleflag Content-Transfer-Encoding: 8BIT X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,SPF_HELO_NONE, SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1759787069049921767?= X-GMAIL-MSGID: =?utf-8?q?1759787069049921767?= |
Series |
RISC-V Hibernation Support
|
|
Commit Message
JeeHeng Sia
March 8, 2023, 8:06 a.m. UTC
Low level Arch functions were created to support hibernation. swsusp_arch_suspend() relies code from __cpu_suspend_enter() to write cpu state onto the stack, then calling swsusp_save() to save the memory image. Arch specific hibernation header is implemented and is utilized by the arch_hibernation_header_restore() and arch_hibernation_header_save() functions. The arch specific hibernation header consists of satp, hartid, and the cpu_resume address. The kernel built version is also need to be saved into the hibernation image header to making sure only the same kernel is restore when resume. swsusp_arch_resume() creates a temporary page table that covering only the linear map. It copies the restore code to a 'safe' page, then start to restore the memory image. Once completed, it restores the original kernel's page table. It then calls into __hibernate_cpu_resume() to restore the CPU context. Finally, it follows the normal hibernation path back to the hibernation core. To enable hibernation/suspend to disk into RISCV, the below config need to be enabled: - CONFIG_ARCH_HIBERNATION_HEADER - CONFIG_ARCH_HIBERNATION_POSSIBLE Signed-off-by: Sia Jee Heng <jeeheng.sia@starfivetech.com> Reviewed-by: Ley Foon Tan <leyfoon.tan@starfivetech.com> Reviewed-by: Mason Huo <mason.huo@starfivetech.com> --- arch/riscv/Kconfig | 6 + arch/riscv/include/asm/assembler.h | 20 ++ arch/riscv/include/asm/suspend.h | 19 ++ arch/riscv/kernel/Makefile | 1 + arch/riscv/kernel/asm-offsets.c | 5 + arch/riscv/kernel/hibernate-asm.S | 77 ++++++ arch/riscv/kernel/hibernate.c | 424 +++++++++++++++++++++++++++++ 7 files changed, 552 insertions(+) create mode 100644 arch/riscv/kernel/hibernate-asm.S create mode 100644 arch/riscv/kernel/hibernate.c
Comments
On 3/8/23 09:06, Sia Jee Heng wrote: > Low level Arch functions were created to support hibernation. > swsusp_arch_suspend() relies code from __cpu_suspend_enter() to write > cpu state onto the stack, then calling swsusp_save() to save the memory > image. > > Arch specific hibernation header is implemented and is utilized by the > arch_hibernation_header_restore() and arch_hibernation_header_save() > functions. The arch specific hibernation header consists of satp, hartid, > and the cpu_resume address. The kernel built version is also need to be > saved into the hibernation image header to making sure only the same > kernel is restore when resume. > > swsusp_arch_resume() creates a temporary page table that covering only > the linear map. It copies the restore code to a 'safe' page, then start > to restore the memory image. Once completed, it restores the original > kernel's page table. It then calls into __hibernate_cpu_resume() > to restore the CPU context. Finally, it follows the normal hibernation > path back to the hibernation core. > > To enable hibernation/suspend to disk into RISCV, the below config > need to be enabled: > - CONFIG_ARCH_HIBERNATION_HEADER > - CONFIG_ARCH_HIBERNATION_POSSIBLE > > Signed-off-by: Sia Jee Heng <jeeheng.sia@starfivetech.com> > Reviewed-by: Ley Foon Tan <leyfoon.tan@starfivetech.com> > Reviewed-by: Mason Huo <mason.huo@starfivetech.com> > --- > arch/riscv/Kconfig | 6 + > arch/riscv/include/asm/assembler.h | 20 ++ > arch/riscv/include/asm/suspend.h | 19 ++ > arch/riscv/kernel/Makefile | 1 + > arch/riscv/kernel/asm-offsets.c | 5 + > arch/riscv/kernel/hibernate-asm.S | 77 ++++++ > arch/riscv/kernel/hibernate.c | 424 +++++++++++++++++++++++++++++ > 7 files changed, 552 insertions(+) > create mode 100644 arch/riscv/kernel/hibernate-asm.S > create mode 100644 arch/riscv/kernel/hibernate.c > > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig > index c5e42cc37604..473c2a1a6884 100644 > --- a/arch/riscv/Kconfig > +++ b/arch/riscv/Kconfig > @@ -707,6 +707,12 @@ menu "Power management options" > > source "kernel/power/Kconfig" > > +config ARCH_HIBERNATION_POSSIBLE > + def_bool y > + > +config ARCH_HIBERNATION_HEADER > + def_bool HIBERNATION > + > endmenu # "Power management options" > > menu "CPU Power Management" > diff --git a/arch/riscv/include/asm/assembler.h b/arch/riscv/include/asm/assembler.h > index ba59d38f8937..44b1457d3e95 100644 > --- a/arch/riscv/include/asm/assembler.h > +++ b/arch/riscv/include/asm/assembler.h > @@ -59,4 +59,24 @@ > REG_L s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0) > .endm > > +/* > + * copy_page - copy 1 page (4KB) of data from source to destination > + * @a0 - destination > + * @a1 - source > + */ > + .macro copy_page a0, a1 > + lui a2, 0x1 > + add a2, a2, a0 > +1 : > + REG_L t0, 0(a1) > + REG_L t1, SZREG(a1) > + > + REG_S t0, 0(a0) > + REG_S t1, SZREG(a0) > + > + addi a0, a0, 2 * SZREG > + addi a1, a1, 2 * SZREG > + bne a2, a0, 1b > + .endm > + > #endif /* __ASM_ASSEMBLER_H */ > diff --git a/arch/riscv/include/asm/suspend.h b/arch/riscv/include/asm/suspend.h > index 67e047445662..02f87867389a 100644 > --- a/arch/riscv/include/asm/suspend.h > +++ b/arch/riscv/include/asm/suspend.h > @@ -21,6 +21,11 @@ struct suspend_context { > #endif > }; > > +/* > + * Used by hibernation core and cleared during resume sequence > + */ > +extern int in_suspend; > + > /* Low-level CPU suspend entry function */ > int __cpu_suspend_enter(struct suspend_context *context); > > @@ -36,4 +41,18 @@ int __cpu_resume_enter(unsigned long hartid, unsigned long context); > /* Used to save and restore the CSRs */ > void suspend_save_csrs(struct suspend_context *context); > void suspend_restore_csrs(struct suspend_context *context); > + > +/* Low-level API to support hibernation */ > +int swsusp_arch_suspend(void); > +int swsusp_arch_resume(void); > +int arch_hibernation_header_save(void *addr, unsigned int max_size); > +int arch_hibernation_header_restore(void *addr); > +int __hibernate_cpu_resume(void); > + > +/* Used to resume on the CPU we hibernated on */ > +int hibernate_resume_nonboot_cpu_disable(void); > + > +asmlinkage void hibernate_restore_image(unsigned long resume_satp, unsigned long satp_temp, > + unsigned long cpu_resume); > +asmlinkage int hibernate_core_restore_code(void); > #endif > diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile > index 4cf303a779ab..daab341d55e4 100644 > --- a/arch/riscv/kernel/Makefile > +++ b/arch/riscv/kernel/Makefile > @@ -64,6 +64,7 @@ obj-$(CONFIG_MODULES) += module.o > obj-$(CONFIG_MODULE_SECTIONS) += module-sections.o > > obj-$(CONFIG_CPU_PM) += suspend_entry.o suspend.o > +obj-$(CONFIG_HIBERNATION) += hibernate.o hibernate-asm.o > > obj-$(CONFIG_FUNCTION_TRACER) += mcount.o ftrace.o > obj-$(CONFIG_DYNAMIC_FTRACE) += mcount-dyn.o > diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c > index df9444397908..d6a75aac1d27 100644 > --- a/arch/riscv/kernel/asm-offsets.c > +++ b/arch/riscv/kernel/asm-offsets.c > @@ -9,6 +9,7 @@ > #include <linux/kbuild.h> > #include <linux/mm.h> > #include <linux/sched.h> > +#include <linux/suspend.h> > #include <asm/kvm_host.h> > #include <asm/thread_info.h> > #include <asm/ptrace.h> > @@ -116,6 +117,10 @@ void asm_offsets(void) > > OFFSET(SUSPEND_CONTEXT_REGS, suspend_context, regs); > > + OFFSET(HIBERN_PBE_ADDR, pbe, address); > + OFFSET(HIBERN_PBE_ORIG, pbe, orig_address); > + OFFSET(HIBERN_PBE_NEXT, pbe, next); > + > OFFSET(KVM_ARCH_GUEST_ZERO, kvm_vcpu_arch, guest_context.zero); > OFFSET(KVM_ARCH_GUEST_RA, kvm_vcpu_arch, guest_context.ra); > OFFSET(KVM_ARCH_GUEST_SP, kvm_vcpu_arch, guest_context.sp); > diff --git a/arch/riscv/kernel/hibernate-asm.S b/arch/riscv/kernel/hibernate-asm.S > new file mode 100644 > index 000000000000..effaf5ca5da0 > --- /dev/null > +++ b/arch/riscv/kernel/hibernate-asm.S > @@ -0,0 +1,77 @@ > +/* SPDX-License-Identifier: GPL-2.0-only */ > +/* > + * Hibernation low level support for RISCV. > + * > + * Copyright (C) 2023 StarFive Technology Co., Ltd. > + * > + * Author: Jee Heng Sia <jeeheng.sia@starfivetech.com> > + */ > + > +#include <asm/asm.h> > +#include <asm/asm-offsets.h> > +#include <asm/assembler.h> > +#include <asm/csr.h> > + > +#include <linux/linkage.h> > + > +/* > + * int __hibernate_cpu_resume(void) > + * Switch back to the hibernated image's page table prior to restoring the CPU > + * context. > + * > + * Always returns 0 > + */ > +ENTRY(__hibernate_cpu_resume) > + /* switch to hibernated image's page table. */ > + csrw CSR_SATP, s0 > + sfence.vma > + > + REG_L a0, hibernate_cpu_context > + > + suspend_restore_csrs > + suspend_restore_regs > + > + /* Return zero value. */ > + mv a0, zero > + > + ret > +END(__hibernate_cpu_resume) > + > +/* > + * Prepare to restore the image. > + * a0: satp of saved page tables. > + * a1: satp of temporary page tables. > + * a2: cpu_resume. > + */ > +ENTRY(hibernate_restore_image) > + mv s0, a0 > + mv s1, a1 > + mv s2, a2 > + REG_L s4, restore_pblist > + REG_L a1, relocated_restore_code > + > + jalr a1 > +END(hibernate_restore_image) > + > +/* > + * The below code will be executed from a 'safe' page. > + * It first switches to the temporary page table, then starts to copy the pages > + * back to the original memory location. Finally, it jumps to __hibernate_cpu_resume() > + * to restore the CPU context. > + */ > +ENTRY(hibernate_core_restore_code) > + /* switch to temp page table. */ > + csrw satp, s1 > + sfence.vma > +.Lcopy: > + /* The below code will restore the hibernated image. */ > + REG_L a1, HIBERN_PBE_ADDR(s4) > + REG_L a0, HIBERN_PBE_ORIG(s4) > + > + copy_page a0, a1 > + > + REG_L s4, HIBERN_PBE_NEXT(s4) > + bnez s4, .Lcopy > + > + jalr s2 > +END(hibernate_core_restore_code) > diff --git a/arch/riscv/kernel/hibernate.c b/arch/riscv/kernel/hibernate.c > new file mode 100644 > index 000000000000..8fab50122e54 > --- /dev/null > +++ b/arch/riscv/kernel/hibernate.c > @@ -0,0 +1,424 @@ > +// SPDX-License-Identifier: GPL-2.0-only > +/* > + * Hibernation support for RISCV > + * > + * Copyright (C) 2023 StarFive Technology Co., Ltd. > + * > + * Author: Jee Heng Sia <jeeheng.sia@starfivetech.com> > + */ > + > +#include <asm/barrier.h> > +#include <asm/cacheflush.h> > +#include <asm/mmu_context.h> > +#include <asm/page.h> > +#include <asm/pgalloc.h> > +#include <asm/pgtable.h> > +#include <asm/sections.h> > +#include <asm/set_memory.h> > +#include <asm/smp.h> > +#include <asm/suspend.h> > + > +#include <linux/cpu.h> > +#include <linux/memblock.h> > +#include <linux/pm.h> > +#include <linux/sched.h> > +#include <linux/suspend.h> > +#include <linux/utsname.h> > + > +/* The logical cpu number we should resume on, initialised to a non-cpu number. */ > +static int sleep_cpu = -EINVAL; > + > +/* Pointer to the temporary resume page table. */ > +static pgd_t *resume_pg_dir; > + > +/* CPU context to be saved. */ > +struct suspend_context *hibernate_cpu_context; > +EXPORT_SYMBOL_GPL(hibernate_cpu_context); > + > +unsigned long relocated_restore_code; > +EXPORT_SYMBOL_GPL(relocated_restore_code); > + > +/** > + * struct arch_hibernate_hdr_invariants - container to store kernel build version. > + * @uts_version: to save the build number and date so that the we do not resume with > + * a different kernel. > + */ > +struct arch_hibernate_hdr_invariants { > + char uts_version[__NEW_UTS_LEN + 1]; > +}; > + > +/** > + * struct arch_hibernate_hdr - helper parameters that help us to restore the image. > + * @invariants: container to store kernel build version. > + * @hartid: to make sure same boot_cpu executes the hibernate/restore code. > + * @saved_satp: original page table used by the hibernated image. > + * @restore_cpu_addr: the kernel's image address to restore the CPU context. > + */ > +static struct arch_hibernate_hdr { > + struct arch_hibernate_hdr_invariants invariants; > + unsigned long hartid; > + unsigned long saved_satp; > + unsigned long restore_cpu_addr; > +} resume_hdr; > + > +static void arch_hdr_invariants(struct arch_hibernate_hdr_invariants *i) > +{ > + memset(i, 0, sizeof(*i)); > + memcpy(i->uts_version, init_utsname()->version, sizeof(i->uts_version)); > +} > + > +/* > + * Check if the given pfn is in the 'nosave' section. > + */ > +int pfn_is_nosave(unsigned long pfn) > +{ > + unsigned long nosave_begin_pfn = sym_to_pfn(&__nosave_begin); > + unsigned long nosave_end_pfn = sym_to_pfn(&__nosave_end - 1); > + > + return ((pfn >= nosave_begin_pfn) && (pfn <= nosave_end_pfn)); > +} > + > +void notrace save_processor_state(void) > +{ > + WARN_ON(num_online_cpus() != 1); > +} > + > +void notrace restore_processor_state(void) > +{ > +} > + > +/* > + * Helper parameters need to be saved to the hibernation image header. > + */ > +int arch_hibernation_header_save(void *addr, unsigned int max_size) > +{ > + struct arch_hibernate_hdr *hdr = addr; > + > + if (max_size < sizeof(*hdr)) > + return -EOVERFLOW; > + > + arch_hdr_invariants(&hdr->invariants); > + > + hdr->hartid = cpuid_to_hartid_map(sleep_cpu); > + hdr->saved_satp = csr_read(CSR_SATP); > + hdr->restore_cpu_addr = (unsigned long)__hibernate_cpu_resume; > + > + return 0; > +} > +EXPORT_SYMBOL_GPL(arch_hibernation_header_save); > + > +/* > + * Retrieve the helper parameters from the hibernation image header. > + */ > +int arch_hibernation_header_restore(void *addr) > +{ > + struct arch_hibernate_hdr_invariants invariants; > + struct arch_hibernate_hdr *hdr = addr; > + int ret = 0; > + > + arch_hdr_invariants(&invariants); > + > + if (memcmp(&hdr->invariants, &invariants, sizeof(invariants))) { > + pr_crit("Hibernate image not generated by this kernel!\n"); > + return -EINVAL; > + } > + > + sleep_cpu = riscv_hartid_to_cpuid(hdr->hartid); > + if (sleep_cpu < 0) { > + pr_crit("Hibernated on a CPU not known to this kernel!\n"); > + sleep_cpu = -EINVAL; > + return -EINVAL; > + } > + > +#ifdef CONFIG_SMP > + ret = bringup_hibernate_cpu(sleep_cpu); > + if (ret) { > + sleep_cpu = -EINVAL; > + return ret; > + } > +#endif > + resume_hdr = *hdr; > + > + return ret; > +} > +EXPORT_SYMBOL_GPL(arch_hibernation_header_restore); > + > +int swsusp_arch_suspend(void) > +{ > + int ret = 0; > + > + if (__cpu_suspend_enter(hibernate_cpu_context)) { > + sleep_cpu = smp_processor_id(); > + suspend_save_csrs(hibernate_cpu_context); > + ret = swsusp_save(); > + } else { > + suspend_restore_csrs(hibernate_cpu_context); > + flush_tlb_all(); > + flush_icache_all(); > + > + /* > + * Tell the hibernation core that we've just restored the memory. > + */ > + in_suspend = 0; > + sleep_cpu = -EINVAL; > + } > + > + return ret; > +} > + > +static int temp_pgtable_map_pte(pmd_t *dst_pmdp, pmd_t *src_pmdp, unsigned long start, > + unsigned long end, pgprot_t prot) > +{ > + pte_t *src_ptep; > + pte_t *dst_ptep; > + > + if (pmd_none(READ_ONCE(*dst_pmdp))) { > + dst_ptep = (pte_t *)get_safe_page(GFP_ATOMIC); > + if (!dst_ptep) > + return -ENOMEM; > + > + pmd_populate_kernel(NULL, dst_pmdp, dst_ptep); > + } > + > + dst_ptep = pte_offset_kernel(dst_pmdp, start); > + src_ptep = pte_offset_kernel(src_pmdp, start); > + > + do { > + pte_t pte = READ_ONCE(*src_ptep); > + > + if (pte_present(pte)) > + set_pte(dst_ptep, __pte(pte_val(pte) | pgprot_val(prot))); > + } while (dst_ptep++, src_ptep++, start += PAGE_SIZE, start < end); > + > + return 0; > +} > + > +static int temp_pgtable_map_pmd(pud_t *dst_pudp, pud_t *src_pudp, unsigned long start, > + unsigned long end, pgprot_t prot) > +{ > + unsigned long next; > + unsigned long ret; > + pmd_t *src_pmdp; > + pmd_t *dst_pmdp; > + > + if (pud_none(READ_ONCE(*dst_pudp))) { > + dst_pmdp = (pmd_t *)get_safe_page(GFP_ATOMIC); > + if (!dst_pmdp) > + return -ENOMEM; > + > + pud_populate(NULL, dst_pudp, dst_pmdp); > + } > + > + dst_pmdp = pmd_offset(dst_pudp, start); > + src_pmdp = pmd_offset(src_pudp, start); > + > + do { > + pmd_t pmd = READ_ONCE(*src_pmdp); > + > + next = pmd_addr_end(start, end); > + > + if (pmd_none(pmd)) > + continue; > + > + if (pmd_leaf(pmd)) { > + set_pmd(dst_pmdp, __pmd(pmd_val(pmd) | pgprot_val(prot))); > + } else { > + ret = temp_pgtable_map_pte(dst_pmdp, src_pmdp, start, next, prot); > + if (ret) > + return -ENOMEM; > + } > + } while (dst_pmdp++, src_pmdp++, start = next, start != end); > + > + return 0; > +} > + > +static int temp_pgtable_map_pud(p4d_t *dst_p4dp, p4d_t *src_p4dp, unsigned long start, > + unsigned long end, pgprot_t prot) > +{ > + unsigned long next; > + unsigned long ret; > + pud_t *dst_pudp; > + pud_t *src_pudp; > + > + if (p4d_none(READ_ONCE(*dst_p4dp))) { > + dst_pudp = (pud_t *)get_safe_page(GFP_ATOMIC); > + if (!dst_pudp) > + return -ENOMEM; > + > + p4d_populate(NULL, dst_p4dp, dst_pudp); > + } > + > + dst_pudp = pud_offset(dst_p4dp, start); > + src_pudp = pud_offset(src_p4dp, start); > + > + do { > + pud_t pud = READ_ONCE(*src_pudp); > + > + next = pud_addr_end(start, end); > + > + if (pud_none(pud)) > + continue; > + > + if (pud_leaf(pud)) { > + set_pud(dst_pudp, __pud(pud_val(pud) | pgprot_val(prot))); > + } else { > + ret = temp_pgtable_map_pmd(dst_pudp, src_pudp, start, next, prot); > + if (ret) > + return -ENOMEM; > + } > + } while (dst_pudp++, src_pudp++, start = next, start != end); > + > + return 0; > +} > + > +static int temp_pgtable_map_p4d(pgd_t *dst_pgdp, pgd_t *src_pgdp, unsigned long start, > + unsigned long end, pgprot_t prot) > +{ > + unsigned long next; > + unsigned long ret; > + p4d_t *dst_p4dp; > + p4d_t *src_p4dp; > + > + if (pgd_none(READ_ONCE(*dst_pgdp))) { > + dst_p4dp = (p4d_t *)get_safe_page(GFP_ATOMIC); > + if (!dst_p4dp) > + return -ENOMEM; > + > + pgd_populate(NULL, dst_pgdp, dst_p4dp); > + } > + > + dst_p4dp = p4d_offset(dst_pgdp, start); > + src_p4dp = p4d_offset(src_pgdp, start); > + > + do { > + p4d_t p4d = READ_ONCE(*src_p4dp); > + > + next = p4d_addr_end(start, end); > + > + if (p4d_none(p4d)) > + continue; > + > + if (p4d_leaf(p4d)) { > + set_p4d(dst_p4dp, __p4d(p4d_val(p4d) | pgprot_val(prot))); > + } else { > + ret = temp_pgtable_map_pud(dst_p4dp, src_p4dp, start, next, prot); > + if (ret) > + return -ENOMEM; > + } > + } while (dst_p4dp++, src_p4dp++, start = next, start != end); > + > + return 0; > +} > + > +static int temp_pgtable_mapping(pgd_t *pgdp, unsigned long start, unsigned long end, pgprot_t prot) > +{ > + pgd_t *dst_pgdp = pgd_offset_pgd(pgdp, start); > + pgd_t *src_pgdp = pgd_offset_k(start); > + unsigned long next; > + unsigned long ret; > + > + do { > + pgd_t pgd = READ_ONCE(*src_pgdp); > + > + next = pgd_addr_end(start, end); > + > + if (pgd_none(pgd)) > + continue; > + > + if (pgd_leaf(pgd)) { > + set_pgd(dst_pgdp, __pgd(pgd_val(pgd) | pgprot_val(prot))); > + } else { > + ret = temp_pgtable_map_p4d(dst_pgdp, src_pgdp, start, next, prot); > + if (ret) > + return -ENOMEM; > + } > + } while (dst_pgdp++, src_pgdp++, start = next, start != end); > + > + return 0; > +} > + > +static unsigned long relocate_restore_code(void) > +{ > + void *page = (void *)get_safe_page(GFP_ATOMIC); > + > + if (!page) > + return -ENOMEM; > + > + copy_page(page, hibernate_core_restore_code); > + > + /* Make the page containing the relocated code executable. */ > + set_memory_x((unsigned long)page, 1); > + > + return (unsigned long)page; > +} > + > +int swsusp_arch_resume(void) > +{ > + unsigned long end = (unsigned long)pfn_to_virt(max_low_pfn); > + unsigned long start = PAGE_OFFSET; > + int ret; > + > + /* > + * Memory allocated by get_safe_page() will be dealt with by the hibernation core, > + * we don't need to free it here. > + */ > + resume_pg_dir = (pgd_t *)get_safe_page(GFP_ATOMIC); > + if (!resume_pg_dir) > + return -ENOMEM; > + > + /* > + * Create a temporary page table and map the whole linear region as executable and > + * writable. > + */ > + ret = temp_pgtable_mapping(resume_pg_dir, start, end, __pgprot(_PAGE_WRITE | _PAGE_EXEC)); Hmm weird protections, PAGE_PRESENT and PAGE_READ are not even set. Shouldn't you use PAGE_KERNEL_EXEC instead? > + if (ret) > + return ret; > + > + /* Move the restore code to a new page so that it doesn't get overwritten by itself. */ > + relocated_restore_code = relocate_restore_code(); > + if (relocated_restore_code == -ENOMEM) > + return -ENOMEM; > + > + /* > + * Map the __hibernate_cpu_resume() address to the temporary page table so that the > + * restore code can jumps to it after finished restore the image. The next execution > + * code doesn't find itself in a different address space after switching over to the > + * original page table used by the hibernated image. > + * > + */ > + start = (unsigned long)resume_hdr.restore_cpu_addr; > + end = start + PAGE_SIZE; > + > + ret = temp_pgtable_mapping(resume_pg_dir, start, end, __pgprot(_PAGE_WRITE | _PAGE_EXEC)); > + if (ret) > + return ret; You set the linear mapping as executable so no need for this. > + > + hibernate_restore_image(resume_hdr.saved_satp, (PFN_DOWN(__pa(resume_pg_dir)) | satp_mode), > + resume_hdr.restore_cpu_addr); > + > + return 0; > +} > + > +#ifdef CONFIG_PM_SLEEP_SMP > +int hibernate_resume_nonboot_cpu_disable(void) > +{ > + if (sleep_cpu < 0) { > + pr_err("Failing to resume from hibernate on an unknown CPU\n"); > + return -ENODEV; > + } > + > + return freeze_secondary_cpus(sleep_cpu); > +} > +#endif > + > +static int __init riscv_hibernate_init(void) > +{ > + hibernate_cpu_context = kzalloc(sizeof(*hibernate_cpu_context), GFP_KERNEL); > + > + if (WARN_ON(!hibernate_cpu_context)) > + return -ENOMEM; > + > + return 0; > +} > + > +early_initcall(riscv_hibernate_init);
On 3/8/23 10:08, Alexandre Ghiti wrote: > > On 3/8/23 09:06, Sia Jee Heng wrote: >> Low level Arch functions were created to support hibernation. >> swsusp_arch_suspend() relies code from __cpu_suspend_enter() to write >> cpu state onto the stack, then calling swsusp_save() to save the memory >> image. >> >> Arch specific hibernation header is implemented and is utilized by the >> arch_hibernation_header_restore() and arch_hibernation_header_save() >> functions. The arch specific hibernation header consists of satp, >> hartid, >> and the cpu_resume address. The kernel built version is also need to be >> saved into the hibernation image header to making sure only the same >> kernel is restore when resume. >> >> swsusp_arch_resume() creates a temporary page table that covering only >> the linear map. It copies the restore code to a 'safe' page, then start >> to restore the memory image. Once completed, it restores the original >> kernel's page table. It then calls into __hibernate_cpu_resume() >> to restore the CPU context. Finally, it follows the normal hibernation >> path back to the hibernation core. >> >> To enable hibernation/suspend to disk into RISCV, the below config >> need to be enabled: >> - CONFIG_ARCH_HIBERNATION_HEADER >> - CONFIG_ARCH_HIBERNATION_POSSIBLE >> >> Signed-off-by: Sia Jee Heng <jeeheng.sia@starfivetech.com> >> Reviewed-by: Ley Foon Tan <leyfoon.tan@starfivetech.com> >> Reviewed-by: Mason Huo <mason.huo@starfivetech.com> >> --- >> arch/riscv/Kconfig | 6 + >> arch/riscv/include/asm/assembler.h | 20 ++ >> arch/riscv/include/asm/suspend.h | 19 ++ >> arch/riscv/kernel/Makefile | 1 + >> arch/riscv/kernel/asm-offsets.c | 5 + >> arch/riscv/kernel/hibernate-asm.S | 77 ++++++ >> arch/riscv/kernel/hibernate.c | 424 +++++++++++++++++++++++++++++ >> 7 files changed, 552 insertions(+) >> create mode 100644 arch/riscv/kernel/hibernate-asm.S >> create mode 100644 arch/riscv/kernel/hibernate.c >> >> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig >> index c5e42cc37604..473c2a1a6884 100644 >> --- a/arch/riscv/Kconfig >> +++ b/arch/riscv/Kconfig >> @@ -707,6 +707,12 @@ menu "Power management options" >> source "kernel/power/Kconfig" >> +config ARCH_HIBERNATION_POSSIBLE >> + def_bool y >> + >> +config ARCH_HIBERNATION_HEADER >> + def_bool HIBERNATION >> + >> endmenu # "Power management options" >> menu "CPU Power Management" >> diff --git a/arch/riscv/include/asm/assembler.h >> b/arch/riscv/include/asm/assembler.h >> index ba59d38f8937..44b1457d3e95 100644 >> --- a/arch/riscv/include/asm/assembler.h >> +++ b/arch/riscv/include/asm/assembler.h >> @@ -59,4 +59,24 @@ >> REG_L s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0) >> .endm >> +/* >> + * copy_page - copy 1 page (4KB) of data from source to destination >> + * @a0 - destination >> + * @a1 - source >> + */ >> + .macro copy_page a0, a1 >> + lui a2, 0x1 >> + add a2, a2, a0 >> +1 : >> + REG_L t0, 0(a1) >> + REG_L t1, SZREG(a1) >> + >> + REG_S t0, 0(a0) >> + REG_S t1, SZREG(a0) >> + >> + addi a0, a0, 2 * SZREG >> + addi a1, a1, 2 * SZREG >> + bne a2, a0, 1b >> + .endm >> + >> #endif /* __ASM_ASSEMBLER_H */ >> diff --git a/arch/riscv/include/asm/suspend.h >> b/arch/riscv/include/asm/suspend.h >> index 67e047445662..02f87867389a 100644 >> --- a/arch/riscv/include/asm/suspend.h >> +++ b/arch/riscv/include/asm/suspend.h >> @@ -21,6 +21,11 @@ struct suspend_context { >> #endif >> }; >> +/* >> + * Used by hibernation core and cleared during resume sequence >> + */ >> +extern int in_suspend; >> + >> /* Low-level CPU suspend entry function */ >> int __cpu_suspend_enter(struct suspend_context *context); >> @@ -36,4 +41,18 @@ int __cpu_resume_enter(unsigned long hartid, >> unsigned long context); >> /* Used to save and restore the CSRs */ >> void suspend_save_csrs(struct suspend_context *context); >> void suspend_restore_csrs(struct suspend_context *context); >> + >> +/* Low-level API to support hibernation */ >> +int swsusp_arch_suspend(void); >> +int swsusp_arch_resume(void); >> +int arch_hibernation_header_save(void *addr, unsigned int max_size); >> +int arch_hibernation_header_restore(void *addr); >> +int __hibernate_cpu_resume(void); >> + >> +/* Used to resume on the CPU we hibernated on */ >> +int hibernate_resume_nonboot_cpu_disable(void); >> + >> +asmlinkage void hibernate_restore_image(unsigned long resume_satp, >> unsigned long satp_temp, >> + unsigned long cpu_resume); >> +asmlinkage int hibernate_core_restore_code(void); >> #endif >> diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile >> index 4cf303a779ab..daab341d55e4 100644 >> --- a/arch/riscv/kernel/Makefile >> +++ b/arch/riscv/kernel/Makefile >> @@ -64,6 +64,7 @@ obj-$(CONFIG_MODULES) += module.o >> obj-$(CONFIG_MODULE_SECTIONS) += module-sections.o >> obj-$(CONFIG_CPU_PM) += suspend_entry.o suspend.o >> +obj-$(CONFIG_HIBERNATION) += hibernate.o hibernate-asm.o >> obj-$(CONFIG_FUNCTION_TRACER) += mcount.o ftrace.o >> obj-$(CONFIG_DYNAMIC_FTRACE) += mcount-dyn.o >> diff --git a/arch/riscv/kernel/asm-offsets.c >> b/arch/riscv/kernel/asm-offsets.c >> index df9444397908..d6a75aac1d27 100644 >> --- a/arch/riscv/kernel/asm-offsets.c >> +++ b/arch/riscv/kernel/asm-offsets.c >> @@ -9,6 +9,7 @@ >> #include <linux/kbuild.h> >> #include <linux/mm.h> >> #include <linux/sched.h> >> +#include <linux/suspend.h> >> #include <asm/kvm_host.h> >> #include <asm/thread_info.h> >> #include <asm/ptrace.h> >> @@ -116,6 +117,10 @@ void asm_offsets(void) >> OFFSET(SUSPEND_CONTEXT_REGS, suspend_context, regs); >> + OFFSET(HIBERN_PBE_ADDR, pbe, address); >> + OFFSET(HIBERN_PBE_ORIG, pbe, orig_address); >> + OFFSET(HIBERN_PBE_NEXT, pbe, next); >> + >> OFFSET(KVM_ARCH_GUEST_ZERO, kvm_vcpu_arch, guest_context.zero); >> OFFSET(KVM_ARCH_GUEST_RA, kvm_vcpu_arch, guest_context.ra); >> OFFSET(KVM_ARCH_GUEST_SP, kvm_vcpu_arch, guest_context.sp); >> diff --git a/arch/riscv/kernel/hibernate-asm.S >> b/arch/riscv/kernel/hibernate-asm.S >> new file mode 100644 >> index 000000000000..effaf5ca5da0 >> --- /dev/null >> +++ b/arch/riscv/kernel/hibernate-asm.S >> @@ -0,0 +1,77 @@ >> +/* SPDX-License-Identifier: GPL-2.0-only */ >> +/* >> + * Hibernation low level support for RISCV. >> + * >> + * Copyright (C) 2023 StarFive Technology Co., Ltd. >> + * >> + * Author: Jee Heng Sia <jeeheng.sia@starfivetech.com> >> + */ >> + >> +#include <asm/asm.h> >> +#include <asm/asm-offsets.h> >> +#include <asm/assembler.h> >> +#include <asm/csr.h> >> + >> +#include <linux/linkage.h> >> + >> +/* >> + * int __hibernate_cpu_resume(void) >> + * Switch back to the hibernated image's page table prior to >> restoring the CPU >> + * context. >> + * >> + * Always returns 0 >> + */ >> +ENTRY(__hibernate_cpu_resume) >> + /* switch to hibernated image's page table. */ >> + csrw CSR_SATP, s0 >> + sfence.vma >> + >> + REG_L a0, hibernate_cpu_context >> + >> + suspend_restore_csrs >> + suspend_restore_regs >> + >> + /* Return zero value. */ >> + mv a0, zero >> + >> + ret >> +END(__hibernate_cpu_resume) >> + >> +/* >> + * Prepare to restore the image. >> + * a0: satp of saved page tables. >> + * a1: satp of temporary page tables. >> + * a2: cpu_resume. >> + */ >> +ENTRY(hibernate_restore_image) >> + mv s0, a0 >> + mv s1, a1 >> + mv s2, a2 >> + REG_L s4, restore_pblist >> + REG_L a1, relocated_restore_code >> + >> + jalr a1 >> +END(hibernate_restore_image) >> + >> +/* >> + * The below code will be executed from a 'safe' page. >> + * It first switches to the temporary page table, then starts to >> copy the pages >> + * back to the original memory location. Finally, it jumps to >> __hibernate_cpu_resume() >> + * to restore the CPU context. >> + */ >> +ENTRY(hibernate_core_restore_code) >> + /* switch to temp page table. */ >> + csrw satp, s1 >> + sfence.vma >> +.Lcopy: >> + /* The below code will restore the hibernated image. */ >> + REG_L a1, HIBERN_PBE_ADDR(s4) >> + REG_L a0, HIBERN_PBE_ORIG(s4) >> + >> + copy_page a0, a1 >> + >> + REG_L s4, HIBERN_PBE_NEXT(s4) >> + bnez s4, .Lcopy >> + >> + jalr s2 >> +END(hibernate_core_restore_code) >> diff --git a/arch/riscv/kernel/hibernate.c >> b/arch/riscv/kernel/hibernate.c >> new file mode 100644 >> index 000000000000..8fab50122e54 >> --- /dev/null >> +++ b/arch/riscv/kernel/hibernate.c >> @@ -0,0 +1,424 @@ >> +// SPDX-License-Identifier: GPL-2.0-only >> +/* >> + * Hibernation support for RISCV >> + * >> + * Copyright (C) 2023 StarFive Technology Co., Ltd. >> + * >> + * Author: Jee Heng Sia <jeeheng.sia@starfivetech.com> >> + */ >> + >> +#include <asm/barrier.h> >> +#include <asm/cacheflush.h> >> +#include <asm/mmu_context.h> >> +#include <asm/page.h> >> +#include <asm/pgalloc.h> >> +#include <asm/pgtable.h> >> +#include <asm/sections.h> >> +#include <asm/set_memory.h> >> +#include <asm/smp.h> >> +#include <asm/suspend.h> >> + >> +#include <linux/cpu.h> >> +#include <linux/memblock.h> >> +#include <linux/pm.h> >> +#include <linux/sched.h> >> +#include <linux/suspend.h> >> +#include <linux/utsname.h> >> + >> +/* The logical cpu number we should resume on, initialised to a >> non-cpu number. */ >> +static int sleep_cpu = -EINVAL; >> + >> +/* Pointer to the temporary resume page table. */ >> +static pgd_t *resume_pg_dir; >> + >> +/* CPU context to be saved. */ >> +struct suspend_context *hibernate_cpu_context; >> +EXPORT_SYMBOL_GPL(hibernate_cpu_context); >> + >> +unsigned long relocated_restore_code; >> +EXPORT_SYMBOL_GPL(relocated_restore_code); >> + >> +/** >> + * struct arch_hibernate_hdr_invariants - container to store kernel >> build version. >> + * @uts_version: to save the build number and date so that the we do >> not resume with >> + * a different kernel. >> + */ >> +struct arch_hibernate_hdr_invariants { >> + char uts_version[__NEW_UTS_LEN + 1]; >> +}; >> + >> +/** >> + * struct arch_hibernate_hdr - helper parameters that help us to >> restore the image. >> + * @invariants: container to store kernel build version. >> + * @hartid: to make sure same boot_cpu executes the >> hibernate/restore code. >> + * @saved_satp: original page table used by the hibernated image. >> + * @restore_cpu_addr: the kernel's image address to restore the CPU >> context. >> + */ >> +static struct arch_hibernate_hdr { >> + struct arch_hibernate_hdr_invariants invariants; >> + unsigned long hartid; >> + unsigned long saved_satp; >> + unsigned long restore_cpu_addr; >> +} resume_hdr; >> + >> +static void arch_hdr_invariants(struct arch_hibernate_hdr_invariants >> *i) >> +{ >> + memset(i, 0, sizeof(*i)); >> + memcpy(i->uts_version, init_utsname()->version, >> sizeof(i->uts_version)); >> +} >> + >> +/* >> + * Check if the given pfn is in the 'nosave' section. >> + */ >> +int pfn_is_nosave(unsigned long pfn) >> +{ >> + unsigned long nosave_begin_pfn = sym_to_pfn(&__nosave_begin); >> + unsigned long nosave_end_pfn = sym_to_pfn(&__nosave_end - 1); >> + >> + return ((pfn >= nosave_begin_pfn) && (pfn <= nosave_end_pfn)); >> +} >> + >> +void notrace save_processor_state(void) >> +{ >> + WARN_ON(num_online_cpus() != 1); >> +} >> + >> +void notrace restore_processor_state(void) >> +{ >> +} >> + >> +/* >> + * Helper parameters need to be saved to the hibernation image header. >> + */ >> +int arch_hibernation_header_save(void *addr, unsigned int max_size) >> +{ >> + struct arch_hibernate_hdr *hdr = addr; >> + >> + if (max_size < sizeof(*hdr)) >> + return -EOVERFLOW; >> + >> + arch_hdr_invariants(&hdr->invariants); >> + >> + hdr->hartid = cpuid_to_hartid_map(sleep_cpu); >> + hdr->saved_satp = csr_read(CSR_SATP); >> + hdr->restore_cpu_addr = (unsigned long)__hibernate_cpu_resume; >> + >> + return 0; >> +} >> +EXPORT_SYMBOL_GPL(arch_hibernation_header_save); >> + >> +/* >> + * Retrieve the helper parameters from the hibernation image header. >> + */ >> +int arch_hibernation_header_restore(void *addr) >> +{ >> + struct arch_hibernate_hdr_invariants invariants; >> + struct arch_hibernate_hdr *hdr = addr; >> + int ret = 0; >> + >> + arch_hdr_invariants(&invariants); >> + >> + if (memcmp(&hdr->invariants, &invariants, sizeof(invariants))) { >> + pr_crit("Hibernate image not generated by this kernel!\n"); >> + return -EINVAL; >> + } >> + >> + sleep_cpu = riscv_hartid_to_cpuid(hdr->hartid); >> + if (sleep_cpu < 0) { >> + pr_crit("Hibernated on a CPU not known to this kernel!\n"); >> + sleep_cpu = -EINVAL; >> + return -EINVAL; >> + } >> + >> +#ifdef CONFIG_SMP >> + ret = bringup_hibernate_cpu(sleep_cpu); >> + if (ret) { >> + sleep_cpu = -EINVAL; >> + return ret; >> + } >> +#endif >> + resume_hdr = *hdr; >> + >> + return ret; >> +} >> +EXPORT_SYMBOL_GPL(arch_hibernation_header_restore); >> + >> +int swsusp_arch_suspend(void) >> +{ >> + int ret = 0; >> + >> + if (__cpu_suspend_enter(hibernate_cpu_context)) { >> + sleep_cpu = smp_processor_id(); >> + suspend_save_csrs(hibernate_cpu_context); >> + ret = swsusp_save(); >> + } else { >> + suspend_restore_csrs(hibernate_cpu_context); >> + flush_tlb_all(); >> + flush_icache_all(); >> + >> + /* >> + * Tell the hibernation core that we've just restored the >> memory. >> + */ >> + in_suspend = 0; >> + sleep_cpu = -EINVAL; >> + } >> + >> + return ret; >> +} >> + >> +static int temp_pgtable_map_pte(pmd_t *dst_pmdp, pmd_t *src_pmdp, >> unsigned long start, >> + unsigned long end, pgprot_t prot) >> +{ >> + pte_t *src_ptep; >> + pte_t *dst_ptep; >> + >> + if (pmd_none(READ_ONCE(*dst_pmdp))) { >> + dst_ptep = (pte_t *)get_safe_page(GFP_ATOMIC); >> + if (!dst_ptep) >> + return -ENOMEM; >> + >> + pmd_populate_kernel(NULL, dst_pmdp, dst_ptep); >> + } >> + >> + dst_ptep = pte_offset_kernel(dst_pmdp, start); >> + src_ptep = pte_offset_kernel(src_pmdp, start); >> + >> + do { >> + pte_t pte = READ_ONCE(*src_ptep); >> + >> + if (pte_present(pte)) >> + set_pte(dst_ptep, __pte(pte_val(pte) | pgprot_val(prot))); >> + } while (dst_ptep++, src_ptep++, start += PAGE_SIZE, start < end); >> + >> + return 0; >> +} >> + >> +static int temp_pgtable_map_pmd(pud_t *dst_pudp, pud_t *src_pudp, >> unsigned long start, >> + unsigned long end, pgprot_t prot) >> +{ >> + unsigned long next; >> + unsigned long ret; >> + pmd_t *src_pmdp; >> + pmd_t *dst_pmdp; >> + >> + if (pud_none(READ_ONCE(*dst_pudp))) { >> + dst_pmdp = (pmd_t *)get_safe_page(GFP_ATOMIC); >> + if (!dst_pmdp) >> + return -ENOMEM; >> + >> + pud_populate(NULL, dst_pudp, dst_pmdp); >> + } >> + >> + dst_pmdp = pmd_offset(dst_pudp, start); >> + src_pmdp = pmd_offset(src_pudp, start); >> + >> + do { >> + pmd_t pmd = READ_ONCE(*src_pmdp); >> + >> + next = pmd_addr_end(start, end); >> + >> + if (pmd_none(pmd)) >> + continue; >> + >> + if (pmd_leaf(pmd)) { >> + set_pmd(dst_pmdp, __pmd(pmd_val(pmd) | pgprot_val(prot))); >> + } else { >> + ret = temp_pgtable_map_pte(dst_pmdp, src_pmdp, start, >> next, prot); >> + if (ret) >> + return -ENOMEM; >> + } >> + } while (dst_pmdp++, src_pmdp++, start = next, start != end); >> + >> + return 0; >> +} >> + >> +static int temp_pgtable_map_pud(p4d_t *dst_p4dp, p4d_t *src_p4dp, >> unsigned long start, >> + unsigned long end, pgprot_t prot) >> +{ >> + unsigned long next; >> + unsigned long ret; >> + pud_t *dst_pudp; >> + pud_t *src_pudp; >> + >> + if (p4d_none(READ_ONCE(*dst_p4dp))) { >> + dst_pudp = (pud_t *)get_safe_page(GFP_ATOMIC); >> + if (!dst_pudp) >> + return -ENOMEM; >> + >> + p4d_populate(NULL, dst_p4dp, dst_pudp); >> + } >> + >> + dst_pudp = pud_offset(dst_p4dp, start); >> + src_pudp = pud_offset(src_p4dp, start); >> + >> + do { >> + pud_t pud = READ_ONCE(*src_pudp); >> + >> + next = pud_addr_end(start, end); >> + >> + if (pud_none(pud)) >> + continue; >> + >> + if (pud_leaf(pud)) { >> + set_pud(dst_pudp, __pud(pud_val(pud) | pgprot_val(prot))); >> + } else { >> + ret = temp_pgtable_map_pmd(dst_pudp, src_pudp, start, >> next, prot); >> + if (ret) >> + return -ENOMEM; >> + } >> + } while (dst_pudp++, src_pudp++, start = next, start != end); >> + >> + return 0; >> +} >> + >> +static int temp_pgtable_map_p4d(pgd_t *dst_pgdp, pgd_t *src_pgdp, >> unsigned long start, >> + unsigned long end, pgprot_t prot) >> +{ >> + unsigned long next; >> + unsigned long ret; >> + p4d_t *dst_p4dp; >> + p4d_t *src_p4dp; >> + >> + if (pgd_none(READ_ONCE(*dst_pgdp))) { >> + dst_p4dp = (p4d_t *)get_safe_page(GFP_ATOMIC); >> + if (!dst_p4dp) >> + return -ENOMEM; >> + >> + pgd_populate(NULL, dst_pgdp, dst_p4dp); >> + } >> + >> + dst_p4dp = p4d_offset(dst_pgdp, start); >> + src_p4dp = p4d_offset(src_pgdp, start); >> + >> + do { >> + p4d_t p4d = READ_ONCE(*src_p4dp); >> + >> + next = p4d_addr_end(start, end); >> + >> + if (p4d_none(p4d)) >> + continue; >> + >> + if (p4d_leaf(p4d)) { >> + set_p4d(dst_p4dp, __p4d(p4d_val(p4d) | pgprot_val(prot))); >> + } else { >> + ret = temp_pgtable_map_pud(dst_p4dp, src_p4dp, start, >> next, prot); >> + if (ret) >> + return -ENOMEM; >> + } >> + } while (dst_p4dp++, src_p4dp++, start = next, start != end); >> + >> + return 0; >> +} >> + >> +static int temp_pgtable_mapping(pgd_t *pgdp, unsigned long start, >> unsigned long end, pgprot_t prot) >> +{ >> + pgd_t *dst_pgdp = pgd_offset_pgd(pgdp, start); >> + pgd_t *src_pgdp = pgd_offset_k(start); >> + unsigned long next; >> + unsigned long ret; >> + >> + do { >> + pgd_t pgd = READ_ONCE(*src_pgdp); >> + >> + next = pgd_addr_end(start, end); >> + >> + if (pgd_none(pgd)) >> + continue; >> + >> + if (pgd_leaf(pgd)) { >> + set_pgd(dst_pgdp, __pgd(pgd_val(pgd) | pgprot_val(prot))); >> + } else { >> + ret = temp_pgtable_map_p4d(dst_pgdp, src_pgdp, start, >> next, prot); >> + if (ret) >> + return -ENOMEM; >> + } >> + } while (dst_pgdp++, src_pgdp++, start = next, start != end); >> + >> + return 0; >> +} >> + >> +static unsigned long relocate_restore_code(void) >> +{ >> + void *page = (void *)get_safe_page(GFP_ATOMIC); >> + >> + if (!page) >> + return -ENOMEM; >> + >> + copy_page(page, hibernate_core_restore_code); >> + >> + /* Make the page containing the relocated code executable. */ >> + set_memory_x((unsigned long)page, 1); >> + >> + return (unsigned long)page; >> +} >> + >> +int swsusp_arch_resume(void) >> +{ >> + unsigned long end = (unsigned long)pfn_to_virt(max_low_pfn); >> + unsigned long start = PAGE_OFFSET; >> + int ret; >> + >> + /* >> + * Memory allocated by get_safe_page() will be dealt with by the >> hibernation core, >> + * we don't need to free it here. >> + */ >> + resume_pg_dir = (pgd_t *)get_safe_page(GFP_ATOMIC); >> + if (!resume_pg_dir) >> + return -ENOMEM; >> + >> + /* >> + * Create a temporary page table and map the whole linear region >> as executable and >> + * writable. >> + */ >> + ret = temp_pgtable_mapping(resume_pg_dir, start, end, >> __pgprot(_PAGE_WRITE | _PAGE_EXEC)); > > > Hmm weird protections, PAGE_PRESENT and PAGE_READ are not even set. > Shouldn't you use PAGE_KERNEL_EXEC instead? Forget this one, that's the protections you add, so that's good :) > > >> + if (ret) >> + return ret; >> + >> + /* Move the restore code to a new page so that it doesn't get >> overwritten by itself. */ >> + relocated_restore_code = relocate_restore_code(); >> + if (relocated_restore_code == -ENOMEM) >> + return -ENOMEM; >> + >> + /* >> + * Map the __hibernate_cpu_resume() address to the temporary >> page table so that the >> + * restore code can jumps to it after finished restore the >> image. The next execution >> + * code doesn't find itself in a different address space after >> switching over to the >> + * original page table used by the hibernated image. >> + * >> + */ >> + start = (unsigned long)resume_hdr.restore_cpu_addr; >> + end = start + PAGE_SIZE; >> + >> + ret = temp_pgtable_mapping(resume_pg_dir, start, end, >> __pgprot(_PAGE_WRITE | _PAGE_EXEC)); >> + if (ret) >> + return ret; > > > You set the linear mapping as executable so no need for this. > > >> + >> + hibernate_restore_image(resume_hdr.saved_satp, >> (PFN_DOWN(__pa(resume_pg_dir)) | satp_mode), >> + resume_hdr.restore_cpu_addr); >> + >> + return 0; >> +} >> + >> +#ifdef CONFIG_PM_SLEEP_SMP >> +int hibernate_resume_nonboot_cpu_disable(void) >> +{ >> + if (sleep_cpu < 0) { >> + pr_err("Failing to resume from hibernate on an unknown CPU\n"); >> + return -ENODEV; >> + } >> + >> + return freeze_secondary_cpus(sleep_cpu); >> +} >> +#endif >> + >> +static int __init riscv_hibernate_init(void) >> +{ >> + hibernate_cpu_context = kzalloc(sizeof(*hibernate_cpu_context), >> GFP_KERNEL); >> + >> + if (WARN_ON(!hibernate_cpu_context)) >> + return -ENOMEM; >> + >> + return 0; >> +} >> + >> +early_initcall(riscv_hibernate_init); > > _______________________________________________ > linux-riscv mailing list > linux-riscv@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-riscv
> -----Original Message----- > From: Alexandre Ghiti <alex@ghiti.fr> > Sent: Wednesday, March 8, 2023 5:28 PM > To: JeeHeng Sia <jeeheng.sia@starfivetech.com>; paul.walmsley@sifive.com; palmer@dabbelt.com; aou@eecs.berkeley.edu > Cc: linux-riscv@lists.infradead.org; linux-kernel@vger.kernel.org; Leyfoon Tan <leyfoon.tan@starfivetech.com>; Mason Huo > <mason.huo@starfivetech.com> > Subject: Re: [PATCH v5 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk > > > On 3/8/23 10:08, Alexandre Ghiti wrote: > > > > On 3/8/23 09:06, Sia Jee Heng wrote: > >> Low level Arch functions were created to support hibernation. > >> swsusp_arch_suspend() relies code from __cpu_suspend_enter() to write > >> cpu state onto the stack, then calling swsusp_save() to save the memory > >> image. > >> > >> Arch specific hibernation header is implemented and is utilized by the > >> arch_hibernation_header_restore() and arch_hibernation_header_save() > >> functions. The arch specific hibernation header consists of satp, > >> hartid, > >> and the cpu_resume address. The kernel built version is also need to be > >> saved into the hibernation image header to making sure only the same > >> kernel is restore when resume. > >> > >> swsusp_arch_resume() creates a temporary page table that covering only > >> the linear map. It copies the restore code to a 'safe' page, then start > >> to restore the memory image. Once completed, it restores the original > >> kernel's page table. It then calls into __hibernate_cpu_resume() > >> to restore the CPU context. Finally, it follows the normal hibernation > >> path back to the hibernation core. > >> > >> To enable hibernation/suspend to disk into RISCV, the below config > >> need to be enabled: > >> - CONFIG_ARCH_HIBERNATION_HEADER > >> - CONFIG_ARCH_HIBERNATION_POSSIBLE > >> > >> Signed-off-by: Sia Jee Heng <jeeheng.sia@starfivetech.com> > >> Reviewed-by: Ley Foon Tan <leyfoon.tan@starfivetech.com> > >> Reviewed-by: Mason Huo <mason.huo@starfivetech.com> > >> --- > >> arch/riscv/Kconfig | 6 + > >> arch/riscv/include/asm/assembler.h | 20 ++ > >> arch/riscv/include/asm/suspend.h | 19 ++ > >> arch/riscv/kernel/Makefile | 1 + > >> arch/riscv/kernel/asm-offsets.c | 5 + > >> arch/riscv/kernel/hibernate-asm.S | 77 ++++++ > >> arch/riscv/kernel/hibernate.c | 424 +++++++++++++++++++++++++++++ > >> 7 files changed, 552 insertions(+) > >> create mode 100644 arch/riscv/kernel/hibernate-asm.S > >> create mode 100644 arch/riscv/kernel/hibernate.c > >> > >> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig > >> index c5e42cc37604..473c2a1a6884 100644 > >> --- a/arch/riscv/Kconfig > >> +++ b/arch/riscv/Kconfig > >> @@ -707,6 +707,12 @@ menu "Power management options" > >> source "kernel/power/Kconfig" > >> +config ARCH_HIBERNATION_POSSIBLE > >> + def_bool y > >> + > >> +config ARCH_HIBERNATION_HEADER > >> + def_bool HIBERNATION > >> + > >> endmenu # "Power management options" > >> menu "CPU Power Management" > >> diff --git a/arch/riscv/include/asm/assembler.h > >> b/arch/riscv/include/asm/assembler.h > >> index ba59d38f8937..44b1457d3e95 100644 > >> --- a/arch/riscv/include/asm/assembler.h > >> +++ b/arch/riscv/include/asm/assembler.h > >> @@ -59,4 +59,24 @@ > >> REG_L s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0) > >> .endm > >> +/* > >> + * copy_page - copy 1 page (4KB) of data from source to destination > >> + * @a0 - destination > >> + * @a1 - source > >> + */ > >> + .macro copy_page a0, a1 > >> + lui a2, 0x1 > >> + add a2, a2, a0 > >> +1 : > >> + REG_L t0, 0(a1) > >> + REG_L t1, SZREG(a1) > >> + > >> + REG_S t0, 0(a0) > >> + REG_S t1, SZREG(a0) > >> + > >> + addi a0, a0, 2 * SZREG > >> + addi a1, a1, 2 * SZREG > >> + bne a2, a0, 1b > >> + .endm > >> + > >> #endif /* __ASM_ASSEMBLER_H */ > >> diff --git a/arch/riscv/include/asm/suspend.h > >> b/arch/riscv/include/asm/suspend.h > >> index 67e047445662..02f87867389a 100644 > >> --- a/arch/riscv/include/asm/suspend.h > >> +++ b/arch/riscv/include/asm/suspend.h > >> @@ -21,6 +21,11 @@ struct suspend_context { > >> #endif > >> }; > >> +/* > >> + * Used by hibernation core and cleared during resume sequence > >> + */ > >> +extern int in_suspend; > >> + > >> /* Low-level CPU suspend entry function */ > >> int __cpu_suspend_enter(struct suspend_context *context); > >> @@ -36,4 +41,18 @@ int __cpu_resume_enter(unsigned long hartid, > >> unsigned long context); > >> /* Used to save and restore the CSRs */ > >> void suspend_save_csrs(struct suspend_context *context); > >> void suspend_restore_csrs(struct suspend_context *context); > >> + > >> +/* Low-level API to support hibernation */ > >> +int swsusp_arch_suspend(void); > >> +int swsusp_arch_resume(void); > >> +int arch_hibernation_header_save(void *addr, unsigned int max_size); > >> +int arch_hibernation_header_restore(void *addr); > >> +int __hibernate_cpu_resume(void); > >> + > >> +/* Used to resume on the CPU we hibernated on */ > >> +int hibernate_resume_nonboot_cpu_disable(void); > >> + > >> +asmlinkage void hibernate_restore_image(unsigned long resume_satp, > >> unsigned long satp_temp, > >> + unsigned long cpu_resume); > >> +asmlinkage int hibernate_core_restore_code(void); > >> #endif > >> diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile > >> index 4cf303a779ab..daab341d55e4 100644 > >> --- a/arch/riscv/kernel/Makefile > >> +++ b/arch/riscv/kernel/Makefile > >> @@ -64,6 +64,7 @@ obj-$(CONFIG_MODULES) += module.o > >> obj-$(CONFIG_MODULE_SECTIONS) += module-sections.o > >> obj-$(CONFIG_CPU_PM) += suspend_entry.o suspend.o > >> +obj-$(CONFIG_HIBERNATION) += hibernate.o hibernate-asm.o > >> obj-$(CONFIG_FUNCTION_TRACER) += mcount.o ftrace.o > >> obj-$(CONFIG_DYNAMIC_FTRACE) += mcount-dyn.o > >> diff --git a/arch/riscv/kernel/asm-offsets.c > >> b/arch/riscv/kernel/asm-offsets.c > >> index df9444397908..d6a75aac1d27 100644 > >> --- a/arch/riscv/kernel/asm-offsets.c > >> +++ b/arch/riscv/kernel/asm-offsets.c > >> @@ -9,6 +9,7 @@ > >> #include <linux/kbuild.h> > >> #include <linux/mm.h> > >> #include <linux/sched.h> > >> +#include <linux/suspend.h> > >> #include <asm/kvm_host.h> > >> #include <asm/thread_info.h> > >> #include <asm/ptrace.h> > >> @@ -116,6 +117,10 @@ void asm_offsets(void) > >> OFFSET(SUSPEND_CONTEXT_REGS, suspend_context, regs); > >> + OFFSET(HIBERN_PBE_ADDR, pbe, address); > >> + OFFSET(HIBERN_PBE_ORIG, pbe, orig_address); > >> + OFFSET(HIBERN_PBE_NEXT, pbe, next); > >> + > >> OFFSET(KVM_ARCH_GUEST_ZERO, kvm_vcpu_arch, guest_context.zero); > >> OFFSET(KVM_ARCH_GUEST_RA, kvm_vcpu_arch, guest_context.ra); > >> OFFSET(KVM_ARCH_GUEST_SP, kvm_vcpu_arch, guest_context.sp); > >> diff --git a/arch/riscv/kernel/hibernate-asm.S > >> b/arch/riscv/kernel/hibernate-asm.S > >> new file mode 100644 > >> index 000000000000..effaf5ca5da0 > >> --- /dev/null > >> +++ b/arch/riscv/kernel/hibernate-asm.S > >> @@ -0,0 +1,77 @@ > >> +/* SPDX-License-Identifier: GPL-2.0-only */ > >> +/* > >> + * Hibernation low level support for RISCV. > >> + * > >> + * Copyright (C) 2023 StarFive Technology Co., Ltd. > >> + * > >> + * Author: Jee Heng Sia <jeeheng.sia@starfivetech.com> > >> + */ > >> + > >> +#include <asm/asm.h> > >> +#include <asm/asm-offsets.h> > >> +#include <asm/assembler.h> > >> +#include <asm/csr.h> > >> + > >> +#include <linux/linkage.h> > >> + > >> +/* > >> + * int __hibernate_cpu_resume(void) > >> + * Switch back to the hibernated image's page table prior to > >> restoring the CPU > >> + * context. > >> + * > >> + * Always returns 0 > >> + */ > >> +ENTRY(__hibernate_cpu_resume) > >> + /* switch to hibernated image's page table. */ > >> + csrw CSR_SATP, s0 > >> + sfence.vma > >> + > >> + REG_L a0, hibernate_cpu_context > >> + > >> + suspend_restore_csrs > >> + suspend_restore_regs > >> + > >> + /* Return zero value. */ > >> + mv a0, zero > >> + > >> + ret > >> +END(__hibernate_cpu_resume) > >> + > >> +/* > >> + * Prepare to restore the image. > >> + * a0: satp of saved page tables. > >> + * a1: satp of temporary page tables. > >> + * a2: cpu_resume. > >> + */ > >> +ENTRY(hibernate_restore_image) > >> + mv s0, a0 > >> + mv s1, a1 > >> + mv s2, a2 > >> + REG_L s4, restore_pblist > >> + REG_L a1, relocated_restore_code > >> + > >> + jalr a1 > >> +END(hibernate_restore_image) > >> + > >> +/* > >> + * The below code will be executed from a 'safe' page. > >> + * It first switches to the temporary page table, then starts to > >> copy the pages > >> + * back to the original memory location. Finally, it jumps to > >> __hibernate_cpu_resume() > >> + * to restore the CPU context. > >> + */ > >> +ENTRY(hibernate_core_restore_code) > >> + /* switch to temp page table. */ > >> + csrw satp, s1 > >> + sfence.vma > >> +.Lcopy: > >> + /* The below code will restore the hibernated image. */ > >> + REG_L a1, HIBERN_PBE_ADDR(s4) > >> + REG_L a0, HIBERN_PBE_ORIG(s4) > >> + > >> + copy_page a0, a1 > >> + > >> + REG_L s4, HIBERN_PBE_NEXT(s4) > >> + bnez s4, .Lcopy > >> + > >> + jalr s2 > >> +END(hibernate_core_restore_code) > >> diff --git a/arch/riscv/kernel/hibernate.c > >> b/arch/riscv/kernel/hibernate.c > >> new file mode 100644 > >> index 000000000000..8fab50122e54 > >> --- /dev/null > >> +++ b/arch/riscv/kernel/hibernate.c > >> @@ -0,0 +1,424 @@ > >> +// SPDX-License-Identifier: GPL-2.0-only > >> +/* > >> + * Hibernation support for RISCV > >> + * > >> + * Copyright (C) 2023 StarFive Technology Co., Ltd. > >> + * > >> + * Author: Jee Heng Sia <jeeheng.sia@starfivetech.com> > >> + */ > >> + > >> +#include <asm/barrier.h> > >> +#include <asm/cacheflush.h> > >> +#include <asm/mmu_context.h> > >> +#include <asm/page.h> > >> +#include <asm/pgalloc.h> > >> +#include <asm/pgtable.h> > >> +#include <asm/sections.h> > >> +#include <asm/set_memory.h> > >> +#include <asm/smp.h> > >> +#include <asm/suspend.h> > >> + > >> +#include <linux/cpu.h> > >> +#include <linux/memblock.h> > >> +#include <linux/pm.h> > >> +#include <linux/sched.h> > >> +#include <linux/suspend.h> > >> +#include <linux/utsname.h> > >> + > >> +/* The logical cpu number we should resume on, initialised to a > >> non-cpu number. */ > >> +static int sleep_cpu = -EINVAL; > >> + > >> +/* Pointer to the temporary resume page table. */ > >> +static pgd_t *resume_pg_dir; > >> + > >> +/* CPU context to be saved. */ > >> +struct suspend_context *hibernate_cpu_context; > >> +EXPORT_SYMBOL_GPL(hibernate_cpu_context); > >> + > >> +unsigned long relocated_restore_code; > >> +EXPORT_SYMBOL_GPL(relocated_restore_code); > >> + > >> +/** > >> + * struct arch_hibernate_hdr_invariants - container to store kernel > >> build version. > >> + * @uts_version: to save the build number and date so that the we do > >> not resume with > >> + * a different kernel. > >> + */ > >> +struct arch_hibernate_hdr_invariants { > >> + char uts_version[__NEW_UTS_LEN + 1]; > >> +}; > >> + > >> +/** > >> + * struct arch_hibernate_hdr - helper parameters that help us to > >> restore the image. > >> + * @invariants: container to store kernel build version. > >> + * @hartid: to make sure same boot_cpu executes the > >> hibernate/restore code. > >> + * @saved_satp: original page table used by the hibernated image. > >> + * @restore_cpu_addr: the kernel's image address to restore the CPU > >> context. > >> + */ > >> +static struct arch_hibernate_hdr { > >> + struct arch_hibernate_hdr_invariants invariants; > >> + unsigned long hartid; > >> + unsigned long saved_satp; > >> + unsigned long restore_cpu_addr; > >> +} resume_hdr; > >> + > >> +static void arch_hdr_invariants(struct arch_hibernate_hdr_invariants > >> *i) > >> +{ > >> + memset(i, 0, sizeof(*i)); > >> + memcpy(i->uts_version, init_utsname()->version, > >> sizeof(i->uts_version)); > >> +} > >> + > >> +/* > >> + * Check if the given pfn is in the 'nosave' section. > >> + */ > >> +int pfn_is_nosave(unsigned long pfn) > >> +{ > >> + unsigned long nosave_begin_pfn = sym_to_pfn(&__nosave_begin); > >> + unsigned long nosave_end_pfn = sym_to_pfn(&__nosave_end - 1); > >> + > >> + return ((pfn >= nosave_begin_pfn) && (pfn <= nosave_end_pfn)); > >> +} > >> + > >> +void notrace save_processor_state(void) > >> +{ > >> + WARN_ON(num_online_cpus() != 1); > >> +} > >> + > >> +void notrace restore_processor_state(void) > >> +{ > >> +} > >> + > >> +/* > >> + * Helper parameters need to be saved to the hibernation image header. > >> + */ > >> +int arch_hibernation_header_save(void *addr, unsigned int max_size) > >> +{ > >> + struct arch_hibernate_hdr *hdr = addr; > >> + > >> + if (max_size < sizeof(*hdr)) > >> + return -EOVERFLOW; > >> + > >> + arch_hdr_invariants(&hdr->invariants); > >> + > >> + hdr->hartid = cpuid_to_hartid_map(sleep_cpu); > >> + hdr->saved_satp = csr_read(CSR_SATP); > >> + hdr->restore_cpu_addr = (unsigned long)__hibernate_cpu_resume; > >> + > >> + return 0; > >> +} > >> +EXPORT_SYMBOL_GPL(arch_hibernation_header_save); > >> + > >> +/* > >> + * Retrieve the helper parameters from the hibernation image header. > >> + */ > >> +int arch_hibernation_header_restore(void *addr) > >> +{ > >> + struct arch_hibernate_hdr_invariants invariants; > >> + struct arch_hibernate_hdr *hdr = addr; > >> + int ret = 0; > >> + > >> + arch_hdr_invariants(&invariants); > >> + > >> + if (memcmp(&hdr->invariants, &invariants, sizeof(invariants))) { > >> + pr_crit("Hibernate image not generated by this kernel!\n"); > >> + return -EINVAL; > >> + } > >> + > >> + sleep_cpu = riscv_hartid_to_cpuid(hdr->hartid); > >> + if (sleep_cpu < 0) { > >> + pr_crit("Hibernated on a CPU not known to this kernel!\n"); > >> + sleep_cpu = -EINVAL; > >> + return -EINVAL; > >> + } > >> + > >> +#ifdef CONFIG_SMP > >> + ret = bringup_hibernate_cpu(sleep_cpu); > >> + if (ret) { > >> + sleep_cpu = -EINVAL; > >> + return ret; > >> + } > >> +#endif > >> + resume_hdr = *hdr; > >> + > >> + return ret; > >> +} > >> +EXPORT_SYMBOL_GPL(arch_hibernation_header_restore); > >> + > >> +int swsusp_arch_suspend(void) > >> +{ > >> + int ret = 0; > >> + > >> + if (__cpu_suspend_enter(hibernate_cpu_context)) { > >> + sleep_cpu = smp_processor_id(); > >> + suspend_save_csrs(hibernate_cpu_context); > >> + ret = swsusp_save(); > >> + } else { > >> + suspend_restore_csrs(hibernate_cpu_context); > >> + flush_tlb_all(); > >> + flush_icache_all(); > >> + > >> + /* > >> + * Tell the hibernation core that we've just restored the > >> memory. > >> + */ > >> + in_suspend = 0; > >> + sleep_cpu = -EINVAL; > >> + } > >> + > >> + return ret; > >> +} > >> + > >> +static int temp_pgtable_map_pte(pmd_t *dst_pmdp, pmd_t *src_pmdp, > >> unsigned long start, > >> + unsigned long end, pgprot_t prot) > >> +{ > >> + pte_t *src_ptep; > >> + pte_t *dst_ptep; > >> + > >> + if (pmd_none(READ_ONCE(*dst_pmdp))) { > >> + dst_ptep = (pte_t *)get_safe_page(GFP_ATOMIC); > >> + if (!dst_ptep) > >> + return -ENOMEM; > >> + > >> + pmd_populate_kernel(NULL, dst_pmdp, dst_ptep); > >> + } > >> + > >> + dst_ptep = pte_offset_kernel(dst_pmdp, start); > >> + src_ptep = pte_offset_kernel(src_pmdp, start); > >> + > >> + do { > >> + pte_t pte = READ_ONCE(*src_ptep); > >> + > >> + if (pte_present(pte)) > >> + set_pte(dst_ptep, __pte(pte_val(pte) | pgprot_val(prot))); > >> + } while (dst_ptep++, src_ptep++, start += PAGE_SIZE, start < end); > >> + > >> + return 0; > >> +} > >> + > >> +static int temp_pgtable_map_pmd(pud_t *dst_pudp, pud_t *src_pudp, > >> unsigned long start, > >> + unsigned long end, pgprot_t prot) > >> +{ > >> + unsigned long next; > >> + unsigned long ret; > >> + pmd_t *src_pmdp; > >> + pmd_t *dst_pmdp; > >> + > >> + if (pud_none(READ_ONCE(*dst_pudp))) { > >> + dst_pmdp = (pmd_t *)get_safe_page(GFP_ATOMIC); > >> + if (!dst_pmdp) > >> + return -ENOMEM; > >> + > >> + pud_populate(NULL, dst_pudp, dst_pmdp); > >> + } > >> + > >> + dst_pmdp = pmd_offset(dst_pudp, start); > >> + src_pmdp = pmd_offset(src_pudp, start); > >> + > >> + do { > >> + pmd_t pmd = READ_ONCE(*src_pmdp); > >> + > >> + next = pmd_addr_end(start, end); > >> + > >> + if (pmd_none(pmd)) > >> + continue; > >> + > >> + if (pmd_leaf(pmd)) { > >> + set_pmd(dst_pmdp, __pmd(pmd_val(pmd) | pgprot_val(prot))); > >> + } else { > >> + ret = temp_pgtable_map_pte(dst_pmdp, src_pmdp, start, > >> next, prot); > >> + if (ret) > >> + return -ENOMEM; > >> + } > >> + } while (dst_pmdp++, src_pmdp++, start = next, start != end); > >> + > >> + return 0; > >> +} > >> + > >> +static int temp_pgtable_map_pud(p4d_t *dst_p4dp, p4d_t *src_p4dp, > >> unsigned long start, > >> + unsigned long end, pgprot_t prot) > >> +{ > >> + unsigned long next; > >> + unsigned long ret; > >> + pud_t *dst_pudp; > >> + pud_t *src_pudp; > >> + > >> + if (p4d_none(READ_ONCE(*dst_p4dp))) { > >> + dst_pudp = (pud_t *)get_safe_page(GFP_ATOMIC); > >> + if (!dst_pudp) > >> + return -ENOMEM; > >> + > >> + p4d_populate(NULL, dst_p4dp, dst_pudp); > >> + } > >> + > >> + dst_pudp = pud_offset(dst_p4dp, start); > >> + src_pudp = pud_offset(src_p4dp, start); > >> + > >> + do { > >> + pud_t pud = READ_ONCE(*src_pudp); > >> + > >> + next = pud_addr_end(start, end); > >> + > >> + if (pud_none(pud)) > >> + continue; > >> + > >> + if (pud_leaf(pud)) { > >> + set_pud(dst_pudp, __pud(pud_val(pud) | pgprot_val(prot))); > >> + } else { > >> + ret = temp_pgtable_map_pmd(dst_pudp, src_pudp, start, > >> next, prot); > >> + if (ret) > >> + return -ENOMEM; > >> + } > >> + } while (dst_pudp++, src_pudp++, start = next, start != end); > >> + > >> + return 0; > >> +} > >> + > >> +static int temp_pgtable_map_p4d(pgd_t *dst_pgdp, pgd_t *src_pgdp, > >> unsigned long start, > >> + unsigned long end, pgprot_t prot) > >> +{ > >> + unsigned long next; > >> + unsigned long ret; > >> + p4d_t *dst_p4dp; > >> + p4d_t *src_p4dp; > >> + > >> + if (pgd_none(READ_ONCE(*dst_pgdp))) { > >> + dst_p4dp = (p4d_t *)get_safe_page(GFP_ATOMIC); > >> + if (!dst_p4dp) > >> + return -ENOMEM; > >> + > >> + pgd_populate(NULL, dst_pgdp, dst_p4dp); > >> + } > >> + > >> + dst_p4dp = p4d_offset(dst_pgdp, start); > >> + src_p4dp = p4d_offset(src_pgdp, start); > >> + > >> + do { > >> + p4d_t p4d = READ_ONCE(*src_p4dp); > >> + > >> + next = p4d_addr_end(start, end); > >> + > >> + if (p4d_none(p4d)) > >> + continue; > >> + > >> + if (p4d_leaf(p4d)) { > >> + set_p4d(dst_p4dp, __p4d(p4d_val(p4d) | pgprot_val(prot))); > >> + } else { > >> + ret = temp_pgtable_map_pud(dst_p4dp, src_p4dp, start, > >> next, prot); > >> + if (ret) > >> + return -ENOMEM; > >> + } > >> + } while (dst_p4dp++, src_p4dp++, start = next, start != end); > >> + > >> + return 0; > >> +} > >> + > >> +static int temp_pgtable_mapping(pgd_t *pgdp, unsigned long start, > >> unsigned long end, pgprot_t prot) > >> +{ > >> + pgd_t *dst_pgdp = pgd_offset_pgd(pgdp, start); > >> + pgd_t *src_pgdp = pgd_offset_k(start); > >> + unsigned long next; > >> + unsigned long ret; > >> + > >> + do { > >> + pgd_t pgd = READ_ONCE(*src_pgdp); > >> + > >> + next = pgd_addr_end(start, end); > >> + > >> + if (pgd_none(pgd)) > >> + continue; > >> + > >> + if (pgd_leaf(pgd)) { > >> + set_pgd(dst_pgdp, __pgd(pgd_val(pgd) | pgprot_val(prot))); > >> + } else { > >> + ret = temp_pgtable_map_p4d(dst_pgdp, src_pgdp, start, > >> next, prot); > >> + if (ret) > >> + return -ENOMEM; > >> + } > >> + } while (dst_pgdp++, src_pgdp++, start = next, start != end); > >> + > >> + return 0; > >> +} > >> + > >> +static unsigned long relocate_restore_code(void) > >> +{ > >> + void *page = (void *)get_safe_page(GFP_ATOMIC); > >> + > >> + if (!page) > >> + return -ENOMEM; > >> + > >> + copy_page(page, hibernate_core_restore_code); > >> + > >> + /* Make the page containing the relocated code executable. */ > >> + set_memory_x((unsigned long)page, 1); > >> + > >> + return (unsigned long)page; > >> +} > >> + > >> +int swsusp_arch_resume(void) > >> +{ > >> + unsigned long end = (unsigned long)pfn_to_virt(max_low_pfn); > >> + unsigned long start = PAGE_OFFSET; > >> + int ret; > >> + > >> + /* > >> + * Memory allocated by get_safe_page() will be dealt with by the > >> hibernation core, > >> + * we don't need to free it here. > >> + */ > >> + resume_pg_dir = (pgd_t *)get_safe_page(GFP_ATOMIC); > >> + if (!resume_pg_dir) > >> + return -ENOMEM; > >> + > >> + /* > >> + * Create a temporary page table and map the whole linear region > >> as executable and > >> + * writable. > >> + */ > >> + ret = temp_pgtable_mapping(resume_pg_dir, start, end, > >> __pgprot(_PAGE_WRITE | _PAGE_EXEC)); > > > > > > Hmm weird protections, PAGE_PRESENT and PAGE_READ are not even set. > > Shouldn't you use PAGE_KERNEL_EXEC instead? > > > Forget this one, that's the protections you add, so that's good :) OK. > > > > > > > >> + if (ret) > >> + return ret; > >> + > >> + /* Move the restore code to a new page so that it doesn't get > >> overwritten by itself. */ > >> + relocated_restore_code = relocate_restore_code(); > >> + if (relocated_restore_code == -ENOMEM) > >> + return -ENOMEM; > >> + > >> + /* > >> + * Map the __hibernate_cpu_resume() address to the temporary > >> page table so that the > >> + * restore code can jumps to it after finished restore the > >> image. The next execution > >> + * code doesn't find itself in a different address space after > >> switching over to the > >> + * original page table used by the hibernated image. > >> + * > >> + */ > >> + start = (unsigned long)resume_hdr.restore_cpu_addr; > >> + end = start + PAGE_SIZE; > >> + > >> + ret = temp_pgtable_mapping(resume_pg_dir, start, end, > >> __pgprot(_PAGE_WRITE | _PAGE_EXEC)); > >> + if (ret) > >> + return ret; > > > > > > You set the linear mapping as executable so no need for this. We need this mapping because the kernel addresses and the linear addresses in riscv-64bit are in different boundary. For example: Linear address start with 0xff60000000000000 Kernel address start with 0xffffffff80000000 > > > > > >> + > >> + hibernate_restore_image(resume_hdr.saved_satp, > >> (PFN_DOWN(__pa(resume_pg_dir)) | satp_mode), > >> + resume_hdr.restore_cpu_addr); > >> + > >> + return 0; > >> +} > >> + > >> +#ifdef CONFIG_PM_SLEEP_SMP > >> +int hibernate_resume_nonboot_cpu_disable(void) > >> +{ > >> + if (sleep_cpu < 0) { > >> + pr_err("Failing to resume from hibernate on an unknown CPU\n"); > >> + return -ENODEV; > >> + } > >> + > >> + return freeze_secondary_cpus(sleep_cpu); > >> +} > >> +#endif > >> + > >> +static int __init riscv_hibernate_init(void) > >> +{ > >> + hibernate_cpu_context = kzalloc(sizeof(*hibernate_cpu_context), > >> GFP_KERNEL); > >> + > >> + if (WARN_ON(!hibernate_cpu_context)) > >> + return -ENOMEM; > >> + > >> + return 0; > >> +} > >> + > >> +early_initcall(riscv_hibernate_init); > > > > _______________________________________________ > > linux-riscv mailing list > > linux-riscv@lists.infradead.org > > http://lists.infradead.org/mailman/listinfo/linux-riscv
Hi JeeHeng, On 3/8/23 12:07, JeeHeng Sia wrote: > >> -----Original Message----- >> From: Alexandre Ghiti <alex@ghiti.fr> >> Sent: Wednesday, March 8, 2023 5:28 PM >> To: JeeHeng Sia <jeeheng.sia@starfivetech.com>; paul.walmsley@sifive.com; palmer@dabbelt.com; aou@eecs.berkeley.edu >> Cc: linux-riscv@lists.infradead.org; linux-kernel@vger.kernel.org; Leyfoon Tan <leyfoon.tan@starfivetech.com>; Mason Huo >> <mason.huo@starfivetech.com> >> Subject: Re: [PATCH v5 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk >> >> >> On 3/8/23 10:08, Alexandre Ghiti wrote: >>> On 3/8/23 09:06, Sia Jee Heng wrote: >>>> Low level Arch functions were created to support hibernation. >>>> swsusp_arch_suspend() relies code from __cpu_suspend_enter() to write >>>> cpu state onto the stack, then calling swsusp_save() to save the memory >>>> image. >>>> >>>> Arch specific hibernation header is implemented and is utilized by the >>>> arch_hibernation_header_restore() and arch_hibernation_header_save() >>>> functions. The arch specific hibernation header consists of satp, >>>> hartid, >>>> and the cpu_resume address. The kernel built version is also need to be >>>> saved into the hibernation image header to making sure only the same >>>> kernel is restore when resume. >>>> >>>> swsusp_arch_resume() creates a temporary page table that covering only >>>> the linear map. It copies the restore code to a 'safe' page, then start >>>> to restore the memory image. Once completed, it restores the original >>>> kernel's page table. It then calls into __hibernate_cpu_resume() >>>> to restore the CPU context. Finally, it follows the normal hibernation >>>> path back to the hibernation core. >>>> >>>> To enable hibernation/suspend to disk into RISCV, the below config >>>> need to be enabled: >>>> - CONFIG_ARCH_HIBERNATION_HEADER >>>> - CONFIG_ARCH_HIBERNATION_POSSIBLE >>>> >>>> Signed-off-by: Sia Jee Heng <jeeheng.sia@starfivetech.com> >>>> Reviewed-by: Ley Foon Tan <leyfoon.tan@starfivetech.com> >>>> Reviewed-by: Mason Huo <mason.huo@starfivetech.com> >>>> --- >>>> arch/riscv/Kconfig | 6 + >>>> arch/riscv/include/asm/assembler.h | 20 ++ >>>> arch/riscv/include/asm/suspend.h | 19 ++ >>>> arch/riscv/kernel/Makefile | 1 + >>>> arch/riscv/kernel/asm-offsets.c | 5 + >>>> arch/riscv/kernel/hibernate-asm.S | 77 ++++++ >>>> arch/riscv/kernel/hibernate.c | 424 +++++++++++++++++++++++++++++ >>>> 7 files changed, 552 insertions(+) >>>> create mode 100644 arch/riscv/kernel/hibernate-asm.S >>>> create mode 100644 arch/riscv/kernel/hibernate.c >>>> >>>> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig >>>> index c5e42cc37604..473c2a1a6884 100644 >>>> --- a/arch/riscv/Kconfig >>>> +++ b/arch/riscv/Kconfig >>>> @@ -707,6 +707,12 @@ menu "Power management options" >>>> source "kernel/power/Kconfig" >>>> +config ARCH_HIBERNATION_POSSIBLE >>>> + def_bool y >>>> + >>>> +config ARCH_HIBERNATION_HEADER >>>> + def_bool HIBERNATION >>>> + >>>> endmenu # "Power management options" >>>> menu "CPU Power Management" >>>> diff --git a/arch/riscv/include/asm/assembler.h >>>> b/arch/riscv/include/asm/assembler.h >>>> index ba59d38f8937..44b1457d3e95 100644 >>>> --- a/arch/riscv/include/asm/assembler.h >>>> +++ b/arch/riscv/include/asm/assembler.h >>>> @@ -59,4 +59,24 @@ >>>> REG_L s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0) >>>> .endm >>>> +/* >>>> + * copy_page - copy 1 page (4KB) of data from source to destination >>>> + * @a0 - destination >>>> + * @a1 - source >>>> + */ >>>> + .macro copy_page a0, a1 >>>> + lui a2, 0x1 >>>> + add a2, a2, a0 >>>> +1 : >>>> + REG_L t0, 0(a1) >>>> + REG_L t1, SZREG(a1) >>>> + >>>> + REG_S t0, 0(a0) >>>> + REG_S t1, SZREG(a0) >>>> + >>>> + addi a0, a0, 2 * SZREG >>>> + addi a1, a1, 2 * SZREG >>>> + bne a2, a0, 1b >>>> + .endm >>>> + >>>> #endif /* __ASM_ASSEMBLER_H */ >>>> diff --git a/arch/riscv/include/asm/suspend.h >>>> b/arch/riscv/include/asm/suspend.h >>>> index 67e047445662..02f87867389a 100644 >>>> --- a/arch/riscv/include/asm/suspend.h >>>> +++ b/arch/riscv/include/asm/suspend.h >>>> @@ -21,6 +21,11 @@ struct suspend_context { >>>> #endif >>>> }; >>>> +/* >>>> + * Used by hibernation core and cleared during resume sequence >>>> + */ >>>> +extern int in_suspend; >>>> + >>>> /* Low-level CPU suspend entry function */ >>>> int __cpu_suspend_enter(struct suspend_context *context); >>>> @@ -36,4 +41,18 @@ int __cpu_resume_enter(unsigned long hartid, >>>> unsigned long context); >>>> /* Used to save and restore the CSRs */ >>>> void suspend_save_csrs(struct suspend_context *context); >>>> void suspend_restore_csrs(struct suspend_context *context); >>>> + >>>> +/* Low-level API to support hibernation */ >>>> +int swsusp_arch_suspend(void); >>>> +int swsusp_arch_resume(void); >>>> +int arch_hibernation_header_save(void *addr, unsigned int max_size); >>>> +int arch_hibernation_header_restore(void *addr); >>>> +int __hibernate_cpu_resume(void); >>>> + >>>> +/* Used to resume on the CPU we hibernated on */ >>>> +int hibernate_resume_nonboot_cpu_disable(void); >>>> + >>>> +asmlinkage void hibernate_restore_image(unsigned long resume_satp, >>>> unsigned long satp_temp, >>>> + unsigned long cpu_resume); >>>> +asmlinkage int hibernate_core_restore_code(void); >>>> #endif >>>> diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile >>>> index 4cf303a779ab..daab341d55e4 100644 >>>> --- a/arch/riscv/kernel/Makefile >>>> +++ b/arch/riscv/kernel/Makefile >>>> @@ -64,6 +64,7 @@ obj-$(CONFIG_MODULES) += module.o >>>> obj-$(CONFIG_MODULE_SECTIONS) += module-sections.o >>>> obj-$(CONFIG_CPU_PM) += suspend_entry.o suspend.o >>>> +obj-$(CONFIG_HIBERNATION) += hibernate.o hibernate-asm.o >>>> obj-$(CONFIG_FUNCTION_TRACER) += mcount.o ftrace.o >>>> obj-$(CONFIG_DYNAMIC_FTRACE) += mcount-dyn.o >>>> diff --git a/arch/riscv/kernel/asm-offsets.c >>>> b/arch/riscv/kernel/asm-offsets.c >>>> index df9444397908..d6a75aac1d27 100644 >>>> --- a/arch/riscv/kernel/asm-offsets.c >>>> +++ b/arch/riscv/kernel/asm-offsets.c >>>> @@ -9,6 +9,7 @@ >>>> #include <linux/kbuild.h> >>>> #include <linux/mm.h> >>>> #include <linux/sched.h> >>>> +#include <linux/suspend.h> >>>> #include <asm/kvm_host.h> >>>> #include <asm/thread_info.h> >>>> #include <asm/ptrace.h> >>>> @@ -116,6 +117,10 @@ void asm_offsets(void) >>>> OFFSET(SUSPEND_CONTEXT_REGS, suspend_context, regs); >>>> + OFFSET(HIBERN_PBE_ADDR, pbe, address); >>>> + OFFSET(HIBERN_PBE_ORIG, pbe, orig_address); >>>> + OFFSET(HIBERN_PBE_NEXT, pbe, next); >>>> + >>>> OFFSET(KVM_ARCH_GUEST_ZERO, kvm_vcpu_arch, guest_context.zero); >>>> OFFSET(KVM_ARCH_GUEST_RA, kvm_vcpu_arch, guest_context.ra); >>>> OFFSET(KVM_ARCH_GUEST_SP, kvm_vcpu_arch, guest_context.sp); >>>> diff --git a/arch/riscv/kernel/hibernate-asm.S >>>> b/arch/riscv/kernel/hibernate-asm.S >>>> new file mode 100644 >>>> index 000000000000..effaf5ca5da0 >>>> --- /dev/null >>>> +++ b/arch/riscv/kernel/hibernate-asm.S >>>> @@ -0,0 +1,77 @@ >>>> +/* SPDX-License-Identifier: GPL-2.0-only */ >>>> +/* >>>> + * Hibernation low level support for RISCV. >>>> + * >>>> + * Copyright (C) 2023 StarFive Technology Co., Ltd. >>>> + * >>>> + * Author: Jee Heng Sia <jeeheng.sia@starfivetech.com> >>>> + */ >>>> + >>>> +#include <asm/asm.h> >>>> +#include <asm/asm-offsets.h> >>>> +#include <asm/assembler.h> >>>> +#include <asm/csr.h> >>>> + >>>> +#include <linux/linkage.h> >>>> + >>>> +/* >>>> + * int __hibernate_cpu_resume(void) >>>> + * Switch back to the hibernated image's page table prior to >>>> restoring the CPU >>>> + * context. >>>> + * >>>> + * Always returns 0 >>>> + */ >>>> +ENTRY(__hibernate_cpu_resume) >>>> + /* switch to hibernated image's page table. */ >>>> + csrw CSR_SATP, s0 >>>> + sfence.vma >>>> + >>>> + REG_L a0, hibernate_cpu_context >>>> + >>>> + suspend_restore_csrs >>>> + suspend_restore_regs >>>> + >>>> + /* Return zero value. */ >>>> + mv a0, zero >>>> + >>>> + ret >>>> +END(__hibernate_cpu_resume) >>>> + >>>> +/* >>>> + * Prepare to restore the image. >>>> + * a0: satp of saved page tables. >>>> + * a1: satp of temporary page tables. >>>> + * a2: cpu_resume. >>>> + */ >>>> +ENTRY(hibernate_restore_image) >>>> + mv s0, a0 >>>> + mv s1, a1 >>>> + mv s2, a2 >>>> + REG_L s4, restore_pblist >>>> + REG_L a1, relocated_restore_code >>>> + >>>> + jalr a1 >>>> +END(hibernate_restore_image) >>>> + >>>> +/* >>>> + * The below code will be executed from a 'safe' page. >>>> + * It first switches to the temporary page table, then starts to >>>> copy the pages >>>> + * back to the original memory location. Finally, it jumps to >>>> __hibernate_cpu_resume() >>>> + * to restore the CPU context. >>>> + */ >>>> +ENTRY(hibernate_core_restore_code) >>>> + /* switch to temp page table. */ >>>> + csrw satp, s1 >>>> + sfence.vma >>>> +.Lcopy: >>>> + /* The below code will restore the hibernated image. */ >>>> + REG_L a1, HIBERN_PBE_ADDR(s4) >>>> + REG_L a0, HIBERN_PBE_ORIG(s4) >>>> + >>>> + copy_page a0, a1 >>>> + >>>> + REG_L s4, HIBERN_PBE_NEXT(s4) >>>> + bnez s4, .Lcopy >>>> + >>>> + jalr s2 >>>> +END(hibernate_core_restore_code) >>>> diff --git a/arch/riscv/kernel/hibernate.c >>>> b/arch/riscv/kernel/hibernate.c >>>> new file mode 100644 >>>> index 000000000000..8fab50122e54 >>>> --- /dev/null >>>> +++ b/arch/riscv/kernel/hibernate.c >>>> @@ -0,0 +1,424 @@ >>>> +// SPDX-License-Identifier: GPL-2.0-only >>>> +/* >>>> + * Hibernation support for RISCV >>>> + * >>>> + * Copyright (C) 2023 StarFive Technology Co., Ltd. >>>> + * >>>> + * Author: Jee Heng Sia <jeeheng.sia@starfivetech.com> >>>> + */ >>>> + >>>> +#include <asm/barrier.h> >>>> +#include <asm/cacheflush.h> >>>> +#include <asm/mmu_context.h> >>>> +#include <asm/page.h> >>>> +#include <asm/pgalloc.h> >>>> +#include <asm/pgtable.h> >>>> +#include <asm/sections.h> >>>> +#include <asm/set_memory.h> >>>> +#include <asm/smp.h> >>>> +#include <asm/suspend.h> >>>> + >>>> +#include <linux/cpu.h> >>>> +#include <linux/memblock.h> >>>> +#include <linux/pm.h> >>>> +#include <linux/sched.h> >>>> +#include <linux/suspend.h> >>>> +#include <linux/utsname.h> >>>> + >>>> +/* The logical cpu number we should resume on, initialised to a >>>> non-cpu number. */ >>>> +static int sleep_cpu = -EINVAL; >>>> + >>>> +/* Pointer to the temporary resume page table. */ >>>> +static pgd_t *resume_pg_dir; >>>> + >>>> +/* CPU context to be saved. */ >>>> +struct suspend_context *hibernate_cpu_context; >>>> +EXPORT_SYMBOL_GPL(hibernate_cpu_context); >>>> + >>>> +unsigned long relocated_restore_code; >>>> +EXPORT_SYMBOL_GPL(relocated_restore_code); >>>> + >>>> +/** >>>> + * struct arch_hibernate_hdr_invariants - container to store kernel >>>> build version. >>>> + * @uts_version: to save the build number and date so that the we do >>>> not resume with >>>> + * a different kernel. >>>> + */ >>>> +struct arch_hibernate_hdr_invariants { >>>> + char uts_version[__NEW_UTS_LEN + 1]; >>>> +}; >>>> + >>>> +/** >>>> + * struct arch_hibernate_hdr - helper parameters that help us to >>>> restore the image. >>>> + * @invariants: container to store kernel build version. >>>> + * @hartid: to make sure same boot_cpu executes the >>>> hibernate/restore code. >>>> + * @saved_satp: original page table used by the hibernated image. >>>> + * @restore_cpu_addr: the kernel's image address to restore the CPU >>>> context. >>>> + */ >>>> +static struct arch_hibernate_hdr { >>>> + struct arch_hibernate_hdr_invariants invariants; >>>> + unsigned long hartid; >>>> + unsigned long saved_satp; >>>> + unsigned long restore_cpu_addr; >>>> +} resume_hdr; >>>> + >>>> +static void arch_hdr_invariants(struct arch_hibernate_hdr_invariants >>>> *i) >>>> +{ >>>> + memset(i, 0, sizeof(*i)); >>>> + memcpy(i->uts_version, init_utsname()->version, >>>> sizeof(i->uts_version)); >>>> +} >>>> + >>>> +/* >>>> + * Check if the given pfn is in the 'nosave' section. >>>> + */ >>>> +int pfn_is_nosave(unsigned long pfn) >>>> +{ >>>> + unsigned long nosave_begin_pfn = sym_to_pfn(&__nosave_begin); >>>> + unsigned long nosave_end_pfn = sym_to_pfn(&__nosave_end - 1); >>>> + >>>> + return ((pfn >= nosave_begin_pfn) && (pfn <= nosave_end_pfn)); >>>> +} >>>> + >>>> +void notrace save_processor_state(void) >>>> +{ >>>> + WARN_ON(num_online_cpus() != 1); >>>> +} >>>> + >>>> +void notrace restore_processor_state(void) >>>> +{ >>>> +} >>>> + >>>> +/* >>>> + * Helper parameters need to be saved to the hibernation image header. >>>> + */ >>>> +int arch_hibernation_header_save(void *addr, unsigned int max_size) >>>> +{ >>>> + struct arch_hibernate_hdr *hdr = addr; >>>> + >>>> + if (max_size < sizeof(*hdr)) >>>> + return -EOVERFLOW; >>>> + >>>> + arch_hdr_invariants(&hdr->invariants); >>>> + >>>> + hdr->hartid = cpuid_to_hartid_map(sleep_cpu); >>>> + hdr->saved_satp = csr_read(CSR_SATP); >>>> + hdr->restore_cpu_addr = (unsigned long)__hibernate_cpu_resume; >>>> + >>>> + return 0; >>>> +} >>>> +EXPORT_SYMBOL_GPL(arch_hibernation_header_save); >>>> + >>>> +/* >>>> + * Retrieve the helper parameters from the hibernation image header. >>>> + */ >>>> +int arch_hibernation_header_restore(void *addr) >>>> +{ >>>> + struct arch_hibernate_hdr_invariants invariants; >>>> + struct arch_hibernate_hdr *hdr = addr; >>>> + int ret = 0; >>>> + >>>> + arch_hdr_invariants(&invariants); >>>> + >>>> + if (memcmp(&hdr->invariants, &invariants, sizeof(invariants))) { >>>> + pr_crit("Hibernate image not generated by this kernel!\n"); >>>> + return -EINVAL; >>>> + } >>>> + >>>> + sleep_cpu = riscv_hartid_to_cpuid(hdr->hartid); >>>> + if (sleep_cpu < 0) { >>>> + pr_crit("Hibernated on a CPU not known to this kernel!\n"); >>>> + sleep_cpu = -EINVAL; >>>> + return -EINVAL; >>>> + } >>>> + >>>> +#ifdef CONFIG_SMP >>>> + ret = bringup_hibernate_cpu(sleep_cpu); >>>> + if (ret) { >>>> + sleep_cpu = -EINVAL; >>>> + return ret; >>>> + } >>>> +#endif >>>> + resume_hdr = *hdr; >>>> + >>>> + return ret; >>>> +} >>>> +EXPORT_SYMBOL_GPL(arch_hibernation_header_restore); >>>> + >>>> +int swsusp_arch_suspend(void) >>>> +{ >>>> + int ret = 0; >>>> + >>>> + if (__cpu_suspend_enter(hibernate_cpu_context)) { >>>> + sleep_cpu = smp_processor_id(); >>>> + suspend_save_csrs(hibernate_cpu_context); >>>> + ret = swsusp_save(); >>>> + } else { >>>> + suspend_restore_csrs(hibernate_cpu_context); >>>> + flush_tlb_all(); >>>> + flush_icache_all(); >>>> + >>>> + /* >>>> + * Tell the hibernation core that we've just restored the >>>> memory. >>>> + */ >>>> + in_suspend = 0; >>>> + sleep_cpu = -EINVAL; >>>> + } >>>> + >>>> + return ret; >>>> +} >>>> + >>>> +static int temp_pgtable_map_pte(pmd_t *dst_pmdp, pmd_t *src_pmdp, >>>> unsigned long start, >>>> + unsigned long end, pgprot_t prot) >>>> +{ >>>> + pte_t *src_ptep; >>>> + pte_t *dst_ptep; >>>> + >>>> + if (pmd_none(READ_ONCE(*dst_pmdp))) { >>>> + dst_ptep = (pte_t *)get_safe_page(GFP_ATOMIC); >>>> + if (!dst_ptep) >>>> + return -ENOMEM; >>>> + >>>> + pmd_populate_kernel(NULL, dst_pmdp, dst_ptep); >>>> + } >>>> + >>>> + dst_ptep = pte_offset_kernel(dst_pmdp, start); >>>> + src_ptep = pte_offset_kernel(src_pmdp, start); >>>> + >>>> + do { >>>> + pte_t pte = READ_ONCE(*src_ptep); >>>> + >>>> + if (pte_present(pte)) >>>> + set_pte(dst_ptep, __pte(pte_val(pte) | pgprot_val(prot))); >>>> + } while (dst_ptep++, src_ptep++, start += PAGE_SIZE, start < end); >>>> + >>>> + return 0; >>>> +} >>>> + >>>> +static int temp_pgtable_map_pmd(pud_t *dst_pudp, pud_t *src_pudp, >>>> unsigned long start, >>>> + unsigned long end, pgprot_t prot) >>>> +{ >>>> + unsigned long next; >>>> + unsigned long ret; >>>> + pmd_t *src_pmdp; >>>> + pmd_t *dst_pmdp; >>>> + >>>> + if (pud_none(READ_ONCE(*dst_pudp))) { >>>> + dst_pmdp = (pmd_t *)get_safe_page(GFP_ATOMIC); >>>> + if (!dst_pmdp) >>>> + return -ENOMEM; >>>> + >>>> + pud_populate(NULL, dst_pudp, dst_pmdp); >>>> + } >>>> + >>>> + dst_pmdp = pmd_offset(dst_pudp, start); >>>> + src_pmdp = pmd_offset(src_pudp, start); >>>> + >>>> + do { >>>> + pmd_t pmd = READ_ONCE(*src_pmdp); >>>> + >>>> + next = pmd_addr_end(start, end); >>>> + >>>> + if (pmd_none(pmd)) >>>> + continue; >>>> + >>>> + if (pmd_leaf(pmd)) { >>>> + set_pmd(dst_pmdp, __pmd(pmd_val(pmd) | pgprot_val(prot))); >>>> + } else { >>>> + ret = temp_pgtable_map_pte(dst_pmdp, src_pmdp, start, >>>> next, prot); >>>> + if (ret) >>>> + return -ENOMEM; >>>> + } >>>> + } while (dst_pmdp++, src_pmdp++, start = next, start != end); >>>> + >>>> + return 0; >>>> +} >>>> + >>>> +static int temp_pgtable_map_pud(p4d_t *dst_p4dp, p4d_t *src_p4dp, >>>> unsigned long start, >>>> + unsigned long end, pgprot_t prot) >>>> +{ >>>> + unsigned long next; >>>> + unsigned long ret; >>>> + pud_t *dst_pudp; >>>> + pud_t *src_pudp; >>>> + >>>> + if (p4d_none(READ_ONCE(*dst_p4dp))) { >>>> + dst_pudp = (pud_t *)get_safe_page(GFP_ATOMIC); >>>> + if (!dst_pudp) >>>> + return -ENOMEM; >>>> + >>>> + p4d_populate(NULL, dst_p4dp, dst_pudp); >>>> + } >>>> + >>>> + dst_pudp = pud_offset(dst_p4dp, start); >>>> + src_pudp = pud_offset(src_p4dp, start); >>>> + >>>> + do { >>>> + pud_t pud = READ_ONCE(*src_pudp); >>>> + >>>> + next = pud_addr_end(start, end); >>>> + >>>> + if (pud_none(pud)) >>>> + continue; >>>> + >>>> + if (pud_leaf(pud)) { >>>> + set_pud(dst_pudp, __pud(pud_val(pud) | pgprot_val(prot))); >>>> + } else { >>>> + ret = temp_pgtable_map_pmd(dst_pudp, src_pudp, start, >>>> next, prot); >>>> + if (ret) >>>> + return -ENOMEM; >>>> + } >>>> + } while (dst_pudp++, src_pudp++, start = next, start != end); >>>> + >>>> + return 0; >>>> +} >>>> + >>>> +static int temp_pgtable_map_p4d(pgd_t *dst_pgdp, pgd_t *src_pgdp, >>>> unsigned long start, >>>> + unsigned long end, pgprot_t prot) >>>> +{ >>>> + unsigned long next; >>>> + unsigned long ret; >>>> + p4d_t *dst_p4dp; >>>> + p4d_t *src_p4dp; >>>> + >>>> + if (pgd_none(READ_ONCE(*dst_pgdp))) { >>>> + dst_p4dp = (p4d_t *)get_safe_page(GFP_ATOMIC); >>>> + if (!dst_p4dp) >>>> + return -ENOMEM; >>>> + >>>> + pgd_populate(NULL, dst_pgdp, dst_p4dp); >>>> + } >>>> + >>>> + dst_p4dp = p4d_offset(dst_pgdp, start); >>>> + src_p4dp = p4d_offset(src_pgdp, start); >>>> + >>>> + do { >>>> + p4d_t p4d = READ_ONCE(*src_p4dp); >>>> + >>>> + next = p4d_addr_end(start, end); >>>> + >>>> + if (p4d_none(p4d)) >>>> + continue; >>>> + >>>> + if (p4d_leaf(p4d)) { >>>> + set_p4d(dst_p4dp, __p4d(p4d_val(p4d) | pgprot_val(prot))); >>>> + } else { >>>> + ret = temp_pgtable_map_pud(dst_p4dp, src_p4dp, start, >>>> next, prot); >>>> + if (ret) >>>> + return -ENOMEM; >>>> + } >>>> + } while (dst_p4dp++, src_p4dp++, start = next, start != end); >>>> + >>>> + return 0; >>>> +} >>>> + >>>> +static int temp_pgtable_mapping(pgd_t *pgdp, unsigned long start, >>>> unsigned long end, pgprot_t prot) >>>> +{ >>>> + pgd_t *dst_pgdp = pgd_offset_pgd(pgdp, start); >>>> + pgd_t *src_pgdp = pgd_offset_k(start); >>>> + unsigned long next; >>>> + unsigned long ret; >>>> + >>>> + do { >>>> + pgd_t pgd = READ_ONCE(*src_pgdp); >>>> + >>>> + next = pgd_addr_end(start, end); >>>> + >>>> + if (pgd_none(pgd)) >>>> + continue; >>>> + >>>> + if (pgd_leaf(pgd)) { >>>> + set_pgd(dst_pgdp, __pgd(pgd_val(pgd) | pgprot_val(prot))); >>>> + } else { >>>> + ret = temp_pgtable_map_p4d(dst_pgdp, src_pgdp, start, >>>> next, prot); >>>> + if (ret) >>>> + return -ENOMEM; >>>> + } >>>> + } while (dst_pgdp++, src_pgdp++, start = next, start != end); >>>> + >>>> + return 0; >>>> +} >>>> + >>>> +static unsigned long relocate_restore_code(void) >>>> +{ >>>> + void *page = (void *)get_safe_page(GFP_ATOMIC); >>>> + >>>> + if (!page) >>>> + return -ENOMEM; >>>> + >>>> + copy_page(page, hibernate_core_restore_code); >>>> + >>>> + /* Make the page containing the relocated code executable. */ >>>> + set_memory_x((unsigned long)page, 1); >>>> + >>>> + return (unsigned long)page; >>>> +} >>>> + >>>> +int swsusp_arch_resume(void) >>>> +{ >>>> + unsigned long end = (unsigned long)pfn_to_virt(max_low_pfn); >>>> + unsigned long start = PAGE_OFFSET; >>>> + int ret; >>>> + >>>> + /* >>>> + * Memory allocated by get_safe_page() will be dealt with by the >>>> hibernation core, >>>> + * we don't need to free it here. >>>> + */ >>>> + resume_pg_dir = (pgd_t *)get_safe_page(GFP_ATOMIC); >>>> + if (!resume_pg_dir) >>>> + return -ENOMEM; >>>> + >>>> + /* >>>> + * Create a temporary page table and map the whole linear region >>>> as executable and >>>> + * writable. >>>> + */ >>>> + ret = temp_pgtable_mapping(resume_pg_dir, start, end, >>>> __pgprot(_PAGE_WRITE | _PAGE_EXEC)); >>> >>> Hmm weird protections, PAGE_PRESENT and PAGE_READ are not even set. >>> Shouldn't you use PAGE_KERNEL_EXEC instead? >> >> Forget this one, that's the protections you add, so that's good :) > OK. >> >>> >>>> + if (ret) >>>> + return ret; >>>> + >>>> + /* Move the restore code to a new page so that it doesn't get >>>> overwritten by itself. */ >>>> + relocated_restore_code = relocate_restore_code(); >>>> + if (relocated_restore_code == -ENOMEM) >>>> + return -ENOMEM; >>>> + >>>> + /* >>>> + * Map the __hibernate_cpu_resume() address to the temporary >>>> page table so that the >>>> + * restore code can jumps to it after finished restore the >>>> image. The next execution >>>> + * code doesn't find itself in a different address space after >>>> switching over to the >>>> + * original page table used by the hibernated image. >>>> + * >>>> + */ >>>> + start = (unsigned long)resume_hdr.restore_cpu_addr; >>>> + end = start + PAGE_SIZE; >>>> + >>>> + ret = temp_pgtable_mapping(resume_pg_dir, start, end, >>>> __pgprot(_PAGE_WRITE | _PAGE_EXEC)); >>>> + if (ret) >>>> + return ret; >>> >>> You set the linear mapping as executable so no need for this. > We need this mapping because the kernel addresses and the linear addresses in riscv-64bit are in different boundary. For example: > Linear address start with 0xff60000000000000 > Kernel address start with 0xffffffff80000000 Ok I understand now that you need to map this into the temporary page table: one nit though, since you copy the kernel mapping, no need to use PAGE_WRITE | PAGE_EXEC, the protections are already correct. Thanks, Alex >>> >>>> + >>>> + hibernate_restore_image(resume_hdr.saved_satp, >>>> (PFN_DOWN(__pa(resume_pg_dir)) | satp_mode), >>>> + resume_hdr.restore_cpu_addr); >>>> + >>>> + return 0; >>>> +} >>>> + >>>> +#ifdef CONFIG_PM_SLEEP_SMP >>>> +int hibernate_resume_nonboot_cpu_disable(void) >>>> +{ >>>> + if (sleep_cpu < 0) { >>>> + pr_err("Failing to resume from hibernate on an unknown CPU\n"); >>>> + return -ENODEV; >>>> + } >>>> + >>>> + return freeze_secondary_cpus(sleep_cpu); >>>> +} >>>> +#endif >>>> + >>>> +static int __init riscv_hibernate_init(void) >>>> +{ >>>> + hibernate_cpu_context = kzalloc(sizeof(*hibernate_cpu_context), >>>> GFP_KERNEL); >>>> + >>>> + if (WARN_ON(!hibernate_cpu_context)) >>>> + return -ENOMEM; >>>> + >>>> + return 0; >>>> +} >>>> + >>>> +early_initcall(riscv_hibernate_init); >>> _______________________________________________ >>> linux-riscv mailing list >>> linux-riscv@lists.infradead.org >>> http://lists.infradead.org/mailman/listinfo/linux-riscv > _______________________________________________ > linux-riscv mailing list > linux-riscv@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-riscv
Hi Alex, > -----Original Message----- > From: Alexandre Ghiti <alex@ghiti.fr> > Sent: Friday, March 10, 2023 5:59 PM > To: JeeHeng Sia <jeeheng.sia@starfivetech.com>; paul.walmsley@sifive.com; palmer@dabbelt.com; aou@eecs.berkeley.edu > Cc: linux-riscv@lists.infradead.org; linux-kernel@vger.kernel.org; Leyfoon Tan <leyfoon.tan@starfivetech.com>; Mason Huo > <mason.huo@starfivetech.com> > Subject: Re: [PATCH v5 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk > > Hi JeeHeng, > > On 3/8/23 12:07, JeeHeng Sia wrote: > > > >> -----Original Message----- > >> From: Alexandre Ghiti <alex@ghiti.fr> > >> Sent: Wednesday, March 8, 2023 5:28 PM > >> To: JeeHeng Sia <jeeheng.sia@starfivetech.com>; paul.walmsley@sifive.com; palmer@dabbelt.com; aou@eecs.berkeley.edu > >> Cc: linux-riscv@lists.infradead.org; linux-kernel@vger.kernel.org; Leyfoon Tan <leyfoon.tan@starfivetech.com>; Mason Huo > >> <mason.huo@starfivetech.com> > >> Subject: Re: [PATCH v5 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk > >> > >> > >> On 3/8/23 10:08, Alexandre Ghiti wrote: > >>> On 3/8/23 09:06, Sia Jee Heng wrote: > >>>> Low level Arch functions were created to support hibernation. > >>>> swsusp_arch_suspend() relies code from __cpu_suspend_enter() to write > >>>> cpu state onto the stack, then calling swsusp_save() to save the memory > >>>> image. > >>>> > >>>> Arch specific hibernation header is implemented and is utilized by the > >>>> arch_hibernation_header_restore() and arch_hibernation_header_save() > >>>> functions. The arch specific hibernation header consists of satp, > >>>> hartid, > >>>> and the cpu_resume address. The kernel built version is also need to be > >>>> saved into the hibernation image header to making sure only the same > >>>> kernel is restore when resume. > >>>> > >>>> swsusp_arch_resume() creates a temporary page table that covering only > >>>> the linear map. It copies the restore code to a 'safe' page, then start > >>>> to restore the memory image. Once completed, it restores the original > >>>> kernel's page table. It then calls into __hibernate_cpu_resume() > >>>> to restore the CPU context. Finally, it follows the normal hibernation > >>>> path back to the hibernation core. > >>>> > >>>> To enable hibernation/suspend to disk into RISCV, the below config > >>>> need to be enabled: > >>>> - CONFIG_ARCH_HIBERNATION_HEADER > >>>> - CONFIG_ARCH_HIBERNATION_POSSIBLE > >>>> > >>>> Signed-off-by: Sia Jee Heng <jeeheng.sia@starfivetech.com> > >>>> Reviewed-by: Ley Foon Tan <leyfoon.tan@starfivetech.com> > >>>> Reviewed-by: Mason Huo <mason.huo@starfivetech.com> > >>>> --- > >>>> arch/riscv/Kconfig | 6 + > >>>> arch/riscv/include/asm/assembler.h | 20 ++ > >>>> arch/riscv/include/asm/suspend.h | 19 ++ > >>>> arch/riscv/kernel/Makefile | 1 + > >>>> arch/riscv/kernel/asm-offsets.c | 5 + > >>>> arch/riscv/kernel/hibernate-asm.S | 77 ++++++ > >>>> arch/riscv/kernel/hibernate.c | 424 +++++++++++++++++++++++++++++ > >>>> 7 files changed, 552 insertions(+) > >>>> create mode 100644 arch/riscv/kernel/hibernate-asm.S > >>>> create mode 100644 arch/riscv/kernel/hibernate.c > >>>> > >>>> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig > >>>> index c5e42cc37604..473c2a1a6884 100644 > >>>> --- a/arch/riscv/Kconfig > >>>> +++ b/arch/riscv/Kconfig > >>>> @@ -707,6 +707,12 @@ menu "Power management options" > >>>> source "kernel/power/Kconfig" > >>>> +config ARCH_HIBERNATION_POSSIBLE > >>>> + def_bool y > >>>> + > >>>> +config ARCH_HIBERNATION_HEADER > >>>> + def_bool HIBERNATION > >>>> + > >>>> endmenu # "Power management options" > >>>> menu "CPU Power Management" > >>>> diff --git a/arch/riscv/include/asm/assembler.h > >>>> b/arch/riscv/include/asm/assembler.h > >>>> index ba59d38f8937..44b1457d3e95 100644 > >>>> --- a/arch/riscv/include/asm/assembler.h > >>>> +++ b/arch/riscv/include/asm/assembler.h > >>>> @@ -59,4 +59,24 @@ > >>>> REG_L s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0) > >>>> .endm > >>>> +/* > >>>> + * copy_page - copy 1 page (4KB) of data from source to destination > >>>> + * @a0 - destination > >>>> + * @a1 - source > >>>> + */ > >>>> + .macro copy_page a0, a1 > >>>> + lui a2, 0x1 > >>>> + add a2, a2, a0 > >>>> +1 : > >>>> + REG_L t0, 0(a1) > >>>> + REG_L t1, SZREG(a1) > >>>> + > >>>> + REG_S t0, 0(a0) > >>>> + REG_S t1, SZREG(a0) > >>>> + > >>>> + addi a0, a0, 2 * SZREG > >>>> + addi a1, a1, 2 * SZREG > >>>> + bne a2, a0, 1b > >>>> + .endm > >>>> + > >>>> #endif /* __ASM_ASSEMBLER_H */ > >>>> diff --git a/arch/riscv/include/asm/suspend.h > >>>> b/arch/riscv/include/asm/suspend.h > >>>> index 67e047445662..02f87867389a 100644 > >>>> --- a/arch/riscv/include/asm/suspend.h > >>>> +++ b/arch/riscv/include/asm/suspend.h > >>>> @@ -21,6 +21,11 @@ struct suspend_context { > >>>> #endif > >>>> }; > >>>> +/* > >>>> + * Used by hibernation core and cleared during resume sequence > >>>> + */ > >>>> +extern int in_suspend; > >>>> + > >>>> /* Low-level CPU suspend entry function */ > >>>> int __cpu_suspend_enter(struct suspend_context *context); > >>>> @@ -36,4 +41,18 @@ int __cpu_resume_enter(unsigned long hartid, > >>>> unsigned long context); > >>>> /* Used to save and restore the CSRs */ > >>>> void suspend_save_csrs(struct suspend_context *context); > >>>> void suspend_restore_csrs(struct suspend_context *context); > >>>> + > >>>> +/* Low-level API to support hibernation */ > >>>> +int swsusp_arch_suspend(void); > >>>> +int swsusp_arch_resume(void); > >>>> +int arch_hibernation_header_save(void *addr, unsigned int max_size); > >>>> +int arch_hibernation_header_restore(void *addr); > >>>> +int __hibernate_cpu_resume(void); > >>>> + > >>>> +/* Used to resume on the CPU we hibernated on */ > >>>> +int hibernate_resume_nonboot_cpu_disable(void); > >>>> + > >>>> +asmlinkage void hibernate_restore_image(unsigned long resume_satp, > >>>> unsigned long satp_temp, > >>>> + unsigned long cpu_resume); > >>>> +asmlinkage int hibernate_core_restore_code(void); > >>>> #endif > >>>> diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile > >>>> index 4cf303a779ab..daab341d55e4 100644 > >>>> --- a/arch/riscv/kernel/Makefile > >>>> +++ b/arch/riscv/kernel/Makefile > >>>> @@ -64,6 +64,7 @@ obj-$(CONFIG_MODULES) += module.o > >>>> obj-$(CONFIG_MODULE_SECTIONS) += module-sections.o > >>>> obj-$(CONFIG_CPU_PM) += suspend_entry.o suspend.o > >>>> +obj-$(CONFIG_HIBERNATION) += hibernate.o hibernate-asm.o > >>>> obj-$(CONFIG_FUNCTION_TRACER) += mcount.o ftrace.o > >>>> obj-$(CONFIG_DYNAMIC_FTRACE) += mcount-dyn.o > >>>> diff --git a/arch/riscv/kernel/asm-offsets.c > >>>> b/arch/riscv/kernel/asm-offsets.c > >>>> index df9444397908..d6a75aac1d27 100644 > >>>> --- a/arch/riscv/kernel/asm-offsets.c > >>>> +++ b/arch/riscv/kernel/asm-offsets.c > >>>> @@ -9,6 +9,7 @@ > >>>> #include <linux/kbuild.h> > >>>> #include <linux/mm.h> > >>>> #include <linux/sched.h> > >>>> +#include <linux/suspend.h> > >>>> #include <asm/kvm_host.h> > >>>> #include <asm/thread_info.h> > >>>> #include <asm/ptrace.h> > >>>> @@ -116,6 +117,10 @@ void asm_offsets(void) > >>>> OFFSET(SUSPEND_CONTEXT_REGS, suspend_context, regs); > >>>> + OFFSET(HIBERN_PBE_ADDR, pbe, address); > >>>> + OFFSET(HIBERN_PBE_ORIG, pbe, orig_address); > >>>> + OFFSET(HIBERN_PBE_NEXT, pbe, next); > >>>> + > >>>> OFFSET(KVM_ARCH_GUEST_ZERO, kvm_vcpu_arch, guest_context.zero); > >>>> OFFSET(KVM_ARCH_GUEST_RA, kvm_vcpu_arch, guest_context.ra); > >>>> OFFSET(KVM_ARCH_GUEST_SP, kvm_vcpu_arch, guest_context.sp); > >>>> diff --git a/arch/riscv/kernel/hibernate-asm.S > >>>> b/arch/riscv/kernel/hibernate-asm.S > >>>> new file mode 100644 > >>>> index 000000000000..effaf5ca5da0 > >>>> --- /dev/null > >>>> +++ b/arch/riscv/kernel/hibernate-asm.S > >>>> @@ -0,0 +1,77 @@ > >>>> +/* SPDX-License-Identifier: GPL-2.0-only */ > >>>> +/* > >>>> + * Hibernation low level support for RISCV. > >>>> + * > >>>> + * Copyright (C) 2023 StarFive Technology Co., Ltd. > >>>> + * > >>>> + * Author: Jee Heng Sia <jeeheng.sia@starfivetech.com> > >>>> + */ > >>>> + > >>>> +#include <asm/asm.h> > >>>> +#include <asm/asm-offsets.h> > >>>> +#include <asm/assembler.h> > >>>> +#include <asm/csr.h> > >>>> + > >>>> +#include <linux/linkage.h> > >>>> + > >>>> +/* > >>>> + * int __hibernate_cpu_resume(void) > >>>> + * Switch back to the hibernated image's page table prior to > >>>> restoring the CPU > >>>> + * context. > >>>> + * > >>>> + * Always returns 0 > >>>> + */ > >>>> +ENTRY(__hibernate_cpu_resume) > >>>> + /* switch to hibernated image's page table. */ > >>>> + csrw CSR_SATP, s0 > >>>> + sfence.vma > >>>> + > >>>> + REG_L a0, hibernate_cpu_context > >>>> + > >>>> + suspend_restore_csrs > >>>> + suspend_restore_regs > >>>> + > >>>> + /* Return zero value. */ > >>>> + mv a0, zero > >>>> + > >>>> + ret > >>>> +END(__hibernate_cpu_resume) > >>>> + > >>>> +/* > >>>> + * Prepare to restore the image. > >>>> + * a0: satp of saved page tables. > >>>> + * a1: satp of temporary page tables. > >>>> + * a2: cpu_resume. > >>>> + */ > >>>> +ENTRY(hibernate_restore_image) > >>>> + mv s0, a0 > >>>> + mv s1, a1 > >>>> + mv s2, a2 > >>>> + REG_L s4, restore_pblist > >>>> + REG_L a1, relocated_restore_code > >>>> + > >>>> + jalr a1 > >>>> +END(hibernate_restore_image) > >>>> + > >>>> +/* > >>>> + * The below code will be executed from a 'safe' page. > >>>> + * It first switches to the temporary page table, then starts to > >>>> copy the pages > >>>> + * back to the original memory location. Finally, it jumps to > >>>> __hibernate_cpu_resume() > >>>> + * to restore the CPU context. > >>>> + */ > >>>> +ENTRY(hibernate_core_restore_code) > >>>> + /* switch to temp page table. */ > >>>> + csrw satp, s1 > >>>> + sfence.vma > >>>> +.Lcopy: > >>>> + /* The below code will restore the hibernated image. */ > >>>> + REG_L a1, HIBERN_PBE_ADDR(s4) > >>>> + REG_L a0, HIBERN_PBE_ORIG(s4) > >>>> + > >>>> + copy_page a0, a1 > >>>> + > >>>> + REG_L s4, HIBERN_PBE_NEXT(s4) > >>>> + bnez s4, .Lcopy > >>>> + > >>>> + jalr s2 > >>>> +END(hibernate_core_restore_code) > >>>> diff --git a/arch/riscv/kernel/hibernate.c > >>>> b/arch/riscv/kernel/hibernate.c > >>>> new file mode 100644 > >>>> index 000000000000..8fab50122e54 > >>>> --- /dev/null > >>>> +++ b/arch/riscv/kernel/hibernate.c > >>>> @@ -0,0 +1,424 @@ > >>>> +// SPDX-License-Identifier: GPL-2.0-only > >>>> +/* > >>>> + * Hibernation support for RISCV > >>>> + * > >>>> + * Copyright (C) 2023 StarFive Technology Co., Ltd. > >>>> + * > >>>> + * Author: Jee Heng Sia <jeeheng.sia@starfivetech.com> > >>>> + */ > >>>> + > >>>> +#include <asm/barrier.h> > >>>> +#include <asm/cacheflush.h> > >>>> +#include <asm/mmu_context.h> > >>>> +#include <asm/page.h> > >>>> +#include <asm/pgalloc.h> > >>>> +#include <asm/pgtable.h> > >>>> +#include <asm/sections.h> > >>>> +#include <asm/set_memory.h> > >>>> +#include <asm/smp.h> > >>>> +#include <asm/suspend.h> > >>>> + > >>>> +#include <linux/cpu.h> > >>>> +#include <linux/memblock.h> > >>>> +#include <linux/pm.h> > >>>> +#include <linux/sched.h> > >>>> +#include <linux/suspend.h> > >>>> +#include <linux/utsname.h> > >>>> + > >>>> +/* The logical cpu number we should resume on, initialised to a > >>>> non-cpu number. */ > >>>> +static int sleep_cpu = -EINVAL; > >>>> + > >>>> +/* Pointer to the temporary resume page table. */ > >>>> +static pgd_t *resume_pg_dir; > >>>> + > >>>> +/* CPU context to be saved. */ > >>>> +struct suspend_context *hibernate_cpu_context; > >>>> +EXPORT_SYMBOL_GPL(hibernate_cpu_context); > >>>> + > >>>> +unsigned long relocated_restore_code; > >>>> +EXPORT_SYMBOL_GPL(relocated_restore_code); > >>>> + > >>>> +/** > >>>> + * struct arch_hibernate_hdr_invariants - container to store kernel > >>>> build version. > >>>> + * @uts_version: to save the build number and date so that the we do > >>>> not resume with > >>>> + * a different kernel. > >>>> + */ > >>>> +struct arch_hibernate_hdr_invariants { > >>>> + char uts_version[__NEW_UTS_LEN + 1]; > >>>> +}; > >>>> + > >>>> +/** > >>>> + * struct arch_hibernate_hdr - helper parameters that help us to > >>>> restore the image. > >>>> + * @invariants: container to store kernel build version. > >>>> + * @hartid: to make sure same boot_cpu executes the > >>>> hibernate/restore code. > >>>> + * @saved_satp: original page table used by the hibernated image. > >>>> + * @restore_cpu_addr: the kernel's image address to restore the CPU > >>>> context. > >>>> + */ > >>>> +static struct arch_hibernate_hdr { > >>>> + struct arch_hibernate_hdr_invariants invariants; > >>>> + unsigned long hartid; > >>>> + unsigned long saved_satp; > >>>> + unsigned long restore_cpu_addr; > >>>> +} resume_hdr; > >>>> + > >>>> +static void arch_hdr_invariants(struct arch_hibernate_hdr_invariants > >>>> *i) > >>>> +{ > >>>> + memset(i, 0, sizeof(*i)); > >>>> + memcpy(i->uts_version, init_utsname()->version, > >>>> sizeof(i->uts_version)); > >>>> +} > >>>> + > >>>> +/* > >>>> + * Check if the given pfn is in the 'nosave' section. > >>>> + */ > >>>> +int pfn_is_nosave(unsigned long pfn) > >>>> +{ > >>>> + unsigned long nosave_begin_pfn = sym_to_pfn(&__nosave_begin); > >>>> + unsigned long nosave_end_pfn = sym_to_pfn(&__nosave_end - 1); > >>>> + > >>>> + return ((pfn >= nosave_begin_pfn) && (pfn <= nosave_end_pfn)); > >>>> +} > >>>> + > >>>> +void notrace save_processor_state(void) > >>>> +{ > >>>> + WARN_ON(num_online_cpus() != 1); > >>>> +} > >>>> + > >>>> +void notrace restore_processor_state(void) > >>>> +{ > >>>> +} > >>>> + > >>>> +/* > >>>> + * Helper parameters need to be saved to the hibernation image header. > >>>> + */ > >>>> +int arch_hibernation_header_save(void *addr, unsigned int max_size) > >>>> +{ > >>>> + struct arch_hibernate_hdr *hdr = addr; > >>>> + > >>>> + if (max_size < sizeof(*hdr)) > >>>> + return -EOVERFLOW; > >>>> + > >>>> + arch_hdr_invariants(&hdr->invariants); > >>>> + > >>>> + hdr->hartid = cpuid_to_hartid_map(sleep_cpu); > >>>> + hdr->saved_satp = csr_read(CSR_SATP); > >>>> + hdr->restore_cpu_addr = (unsigned long)__hibernate_cpu_resume; > >>>> + > >>>> + return 0; > >>>> +} > >>>> +EXPORT_SYMBOL_GPL(arch_hibernation_header_save); > >>>> + > >>>> +/* > >>>> + * Retrieve the helper parameters from the hibernation image header. > >>>> + */ > >>>> +int arch_hibernation_header_restore(void *addr) > >>>> +{ > >>>> + struct arch_hibernate_hdr_invariants invariants; > >>>> + struct arch_hibernate_hdr *hdr = addr; > >>>> + int ret = 0; > >>>> + > >>>> + arch_hdr_invariants(&invariants); > >>>> + > >>>> + if (memcmp(&hdr->invariants, &invariants, sizeof(invariants))) { > >>>> + pr_crit("Hibernate image not generated by this kernel!\n"); > >>>> + return -EINVAL; > >>>> + } > >>>> + > >>>> + sleep_cpu = riscv_hartid_to_cpuid(hdr->hartid); > >>>> + if (sleep_cpu < 0) { > >>>> + pr_crit("Hibernated on a CPU not known to this kernel!\n"); > >>>> + sleep_cpu = -EINVAL; > >>>> + return -EINVAL; > >>>> + } > >>>> + > >>>> +#ifdef CONFIG_SMP > >>>> + ret = bringup_hibernate_cpu(sleep_cpu); > >>>> + if (ret) { > >>>> + sleep_cpu = -EINVAL; > >>>> + return ret; > >>>> + } > >>>> +#endif > >>>> + resume_hdr = *hdr; > >>>> + > >>>> + return ret; > >>>> +} > >>>> +EXPORT_SYMBOL_GPL(arch_hibernation_header_restore); > >>>> + > >>>> +int swsusp_arch_suspend(void) > >>>> +{ > >>>> + int ret = 0; > >>>> + > >>>> + if (__cpu_suspend_enter(hibernate_cpu_context)) { > >>>> + sleep_cpu = smp_processor_id(); > >>>> + suspend_save_csrs(hibernate_cpu_context); > >>>> + ret = swsusp_save(); > >>>> + } else { > >>>> + suspend_restore_csrs(hibernate_cpu_context); > >>>> + flush_tlb_all(); > >>>> + flush_icache_all(); > >>>> + > >>>> + /* > >>>> + * Tell the hibernation core that we've just restored the > >>>> memory. > >>>> + */ > >>>> + in_suspend = 0; > >>>> + sleep_cpu = -EINVAL; > >>>> + } > >>>> + > >>>> + return ret; > >>>> +} > >>>> + > >>>> +static int temp_pgtable_map_pte(pmd_t *dst_pmdp, pmd_t *src_pmdp, > >>>> unsigned long start, > >>>> + unsigned long end, pgprot_t prot) > >>>> +{ > >>>> + pte_t *src_ptep; > >>>> + pte_t *dst_ptep; > >>>> + > >>>> + if (pmd_none(READ_ONCE(*dst_pmdp))) { > >>>> + dst_ptep = (pte_t *)get_safe_page(GFP_ATOMIC); > >>>> + if (!dst_ptep) > >>>> + return -ENOMEM; > >>>> + > >>>> + pmd_populate_kernel(NULL, dst_pmdp, dst_ptep); > >>>> + } > >>>> + > >>>> + dst_ptep = pte_offset_kernel(dst_pmdp, start); > >>>> + src_ptep = pte_offset_kernel(src_pmdp, start); > >>>> + > >>>> + do { > >>>> + pte_t pte = READ_ONCE(*src_ptep); > >>>> + > >>>> + if (pte_present(pte)) > >>>> + set_pte(dst_ptep, __pte(pte_val(pte) | pgprot_val(prot))); > >>>> + } while (dst_ptep++, src_ptep++, start += PAGE_SIZE, start < end); > >>>> + > >>>> + return 0; > >>>> +} > >>>> + > >>>> +static int temp_pgtable_map_pmd(pud_t *dst_pudp, pud_t *src_pudp, > >>>> unsigned long start, > >>>> + unsigned long end, pgprot_t prot) > >>>> +{ > >>>> + unsigned long next; > >>>> + unsigned long ret; > >>>> + pmd_t *src_pmdp; > >>>> + pmd_t *dst_pmdp; > >>>> + > >>>> + if (pud_none(READ_ONCE(*dst_pudp))) { > >>>> + dst_pmdp = (pmd_t *)get_safe_page(GFP_ATOMIC); > >>>> + if (!dst_pmdp) > >>>> + return -ENOMEM; > >>>> + > >>>> + pud_populate(NULL, dst_pudp, dst_pmdp); > >>>> + } > >>>> + > >>>> + dst_pmdp = pmd_offset(dst_pudp, start); > >>>> + src_pmdp = pmd_offset(src_pudp, start); > >>>> + > >>>> + do { > >>>> + pmd_t pmd = READ_ONCE(*src_pmdp); > >>>> + > >>>> + next = pmd_addr_end(start, end); > >>>> + > >>>> + if (pmd_none(pmd)) > >>>> + continue; > >>>> + > >>>> + if (pmd_leaf(pmd)) { > >>>> + set_pmd(dst_pmdp, __pmd(pmd_val(pmd) | pgprot_val(prot))); > >>>> + } else { > >>>> + ret = temp_pgtable_map_pte(dst_pmdp, src_pmdp, start, > >>>> next, prot); > >>>> + if (ret) > >>>> + return -ENOMEM; > >>>> + } > >>>> + } while (dst_pmdp++, src_pmdp++, start = next, start != end); > >>>> + > >>>> + return 0; > >>>> +} > >>>> + > >>>> +static int temp_pgtable_map_pud(p4d_t *dst_p4dp, p4d_t *src_p4dp, > >>>> unsigned long start, > >>>> + unsigned long end, pgprot_t prot) > >>>> +{ > >>>> + unsigned long next; > >>>> + unsigned long ret; > >>>> + pud_t *dst_pudp; > >>>> + pud_t *src_pudp; > >>>> + > >>>> + if (p4d_none(READ_ONCE(*dst_p4dp))) { > >>>> + dst_pudp = (pud_t *)get_safe_page(GFP_ATOMIC); > >>>> + if (!dst_pudp) > >>>> + return -ENOMEM; > >>>> + > >>>> + p4d_populate(NULL, dst_p4dp, dst_pudp); > >>>> + } > >>>> + > >>>> + dst_pudp = pud_offset(dst_p4dp, start); > >>>> + src_pudp = pud_offset(src_p4dp, start); > >>>> + > >>>> + do { > >>>> + pud_t pud = READ_ONCE(*src_pudp); > >>>> + > >>>> + next = pud_addr_end(start, end); > >>>> + > >>>> + if (pud_none(pud)) > >>>> + continue; > >>>> + > >>>> + if (pud_leaf(pud)) { > >>>> + set_pud(dst_pudp, __pud(pud_val(pud) | pgprot_val(prot))); > >>>> + } else { > >>>> + ret = temp_pgtable_map_pmd(dst_pudp, src_pudp, start, > >>>> next, prot); > >>>> + if (ret) > >>>> + return -ENOMEM; > >>>> + } > >>>> + } while (dst_pudp++, src_pudp++, start = next, start != end); > >>>> + > >>>> + return 0; > >>>> +} > >>>> + > >>>> +static int temp_pgtable_map_p4d(pgd_t *dst_pgdp, pgd_t *src_pgdp, > >>>> unsigned long start, > >>>> + unsigned long end, pgprot_t prot) > >>>> +{ > >>>> + unsigned long next; > >>>> + unsigned long ret; > >>>> + p4d_t *dst_p4dp; > >>>> + p4d_t *src_p4dp; > >>>> + > >>>> + if (pgd_none(READ_ONCE(*dst_pgdp))) { > >>>> + dst_p4dp = (p4d_t *)get_safe_page(GFP_ATOMIC); > >>>> + if (!dst_p4dp) > >>>> + return -ENOMEM; > >>>> + > >>>> + pgd_populate(NULL, dst_pgdp, dst_p4dp); > >>>> + } > >>>> + > >>>> + dst_p4dp = p4d_offset(dst_pgdp, start); > >>>> + src_p4dp = p4d_offset(src_pgdp, start); > >>>> + > >>>> + do { > >>>> + p4d_t p4d = READ_ONCE(*src_p4dp); > >>>> + > >>>> + next = p4d_addr_end(start, end); > >>>> + > >>>> + if (p4d_none(p4d)) > >>>> + continue; > >>>> + > >>>> + if (p4d_leaf(p4d)) { > >>>> + set_p4d(dst_p4dp, __p4d(p4d_val(p4d) | pgprot_val(prot))); > >>>> + } else { > >>>> + ret = temp_pgtable_map_pud(dst_p4dp, src_p4dp, start, > >>>> next, prot); > >>>> + if (ret) > >>>> + return -ENOMEM; > >>>> + } > >>>> + } while (dst_p4dp++, src_p4dp++, start = next, start != end); > >>>> + > >>>> + return 0; > >>>> +} > >>>> + > >>>> +static int temp_pgtable_mapping(pgd_t *pgdp, unsigned long start, > >>>> unsigned long end, pgprot_t prot) > >>>> +{ > >>>> + pgd_t *dst_pgdp = pgd_offset_pgd(pgdp, start); > >>>> + pgd_t *src_pgdp = pgd_offset_k(start); > >>>> + unsigned long next; > >>>> + unsigned long ret; > >>>> + > >>>> + do { > >>>> + pgd_t pgd = READ_ONCE(*src_pgdp); > >>>> + > >>>> + next = pgd_addr_end(start, end); > >>>> + > >>>> + if (pgd_none(pgd)) > >>>> + continue; > >>>> + > >>>> + if (pgd_leaf(pgd)) { > >>>> + set_pgd(dst_pgdp, __pgd(pgd_val(pgd) | pgprot_val(prot))); > >>>> + } else { > >>>> + ret = temp_pgtable_map_p4d(dst_pgdp, src_pgdp, start, > >>>> next, prot); > >>>> + if (ret) > >>>> + return -ENOMEM; > >>>> + } > >>>> + } while (dst_pgdp++, src_pgdp++, start = next, start != end); > >>>> + > >>>> + return 0; > >>>> +} > >>>> + > >>>> +static unsigned long relocate_restore_code(void) > >>>> +{ > >>>> + void *page = (void *)get_safe_page(GFP_ATOMIC); > >>>> + > >>>> + if (!page) > >>>> + return -ENOMEM; > >>>> + > >>>> + copy_page(page, hibernate_core_restore_code); > >>>> + > >>>> + /* Make the page containing the relocated code executable. */ > >>>> + set_memory_x((unsigned long)page, 1); > >>>> + > >>>> + return (unsigned long)page; > >>>> +} > >>>> + > >>>> +int swsusp_arch_resume(void) > >>>> +{ > >>>> + unsigned long end = (unsigned long)pfn_to_virt(max_low_pfn); > >>>> + unsigned long start = PAGE_OFFSET; > >>>> + int ret; > >>>> + > >>>> + /* > >>>> + * Memory allocated by get_safe_page() will be dealt with by the > >>>> hibernation core, > >>>> + * we don't need to free it here. > >>>> + */ > >>>> + resume_pg_dir = (pgd_t *)get_safe_page(GFP_ATOMIC); > >>>> + if (!resume_pg_dir) > >>>> + return -ENOMEM; > >>>> + > >>>> + /* > >>>> + * Create a temporary page table and map the whole linear region > >>>> as executable and > >>>> + * writable. > >>>> + */ > >>>> + ret = temp_pgtable_mapping(resume_pg_dir, start, end, > >>>> __pgprot(_PAGE_WRITE | _PAGE_EXEC)); > >>> > >>> Hmm weird protections, PAGE_PRESENT and PAGE_READ are not even set. > >>> Shouldn't you use PAGE_KERNEL_EXEC instead? > >> > >> Forget this one, that's the protections you add, so that's good :) > > OK. > >> > >>> > >>>> + if (ret) > >>>> + return ret; > >>>> + > >>>> + /* Move the restore code to a new page so that it doesn't get > >>>> overwritten by itself. */ > >>>> + relocated_restore_code = relocate_restore_code(); > >>>> + if (relocated_restore_code == -ENOMEM) > >>>> + return -ENOMEM; > >>>> + > >>>> + /* > >>>> + * Map the __hibernate_cpu_resume() address to the temporary > >>>> page table so that the > >>>> + * restore code can jumps to it after finished restore the > >>>> image. The next execution > >>>> + * code doesn't find itself in a different address space after > >>>> switching over to the > >>>> + * original page table used by the hibernated image. > >>>> + * > >>>> + */ > >>>> + start = (unsigned long)resume_hdr.restore_cpu_addr; > >>>> + end = start + PAGE_SIZE; > >>>> + > >>>> + ret = temp_pgtable_mapping(resume_pg_dir, start, end, > >>>> __pgprot(_PAGE_WRITE | _PAGE_EXEC)); > >>>> + if (ret) > >>>> + return ret; > >>> > >>> You set the linear mapping as executable so no need for this. > > We need this mapping because the kernel addresses and the linear addresses in riscv-64bit are in different boundary. For example: > > Linear address start with 0xff60000000000000 > > Kernel address start with 0xffffffff80000000 > > > Ok I understand now that you need to map this into the temporary page > table: one nit though, since you copy the kernel mapping, no need to use > PAGE_WRITE | PAGE_EXEC, the protections are already correct. Here is the thing. For riscv_64bit, I am agreed with you that we don’t need to use PAGE_WRITE | PAGE_EXEC. However, we need to use PAGE_WRITE | PAGE_EXEC for riscv_32bit because the linear addresses and kernel addresses are similar. The attribute becomes not writable if following the "original" page table. I am thinking to use #ifdef config_32bit to split the path for 64bit and 32bit but I just afraid the linear addresses and kernel addresses might change again in the future. To be on the safe side, I let both riscv_32bit & riscv_64bit to use the PAGE_WRITE | PAGE_EXEC. For example: Linear address and kernel address for 32bit started with 0xc0000000 Linear address for 64bit start with 0xff60000000000000 Kernel address for 64bit start with 0xffffffff80000000 Please let me know your thoughts. > > Thanks, > > Alex > > > >>> > >>>> + > >>>> + hibernate_restore_image(resume_hdr.saved_satp, > >>>> (PFN_DOWN(__pa(resume_pg_dir)) | satp_mode), > >>>> + resume_hdr.restore_cpu_addr); > >>>> + > >>>> + return 0; > >>>> +} > >>>> + > >>>> +#ifdef CONFIG_PM_SLEEP_SMP > >>>> +int hibernate_resume_nonboot_cpu_disable(void) > >>>> +{ > >>>> + if (sleep_cpu < 0) { > >>>> + pr_err("Failing to resume from hibernate on an unknown CPU\n"); > >>>> + return -ENODEV; > >>>> + } > >>>> + > >>>> + return freeze_secondary_cpus(sleep_cpu); > >>>> +} > >>>> +#endif > >>>> + > >>>> +static int __init riscv_hibernate_init(void) > >>>> +{ > >>>> + hibernate_cpu_context = kzalloc(sizeof(*hibernate_cpu_context), > >>>> GFP_KERNEL); > >>>> + > >>>> + if (WARN_ON(!hibernate_cpu_context)) > >>>> + return -ENOMEM; > >>>> + > >>>> + return 0; > >>>> +} > >>>> + > >>>> +early_initcall(riscv_hibernate_init); > >>> _______________________________________________ > >>> linux-riscv mailing list > >>> linux-riscv@lists.infradead.org > >>> http://lists.infradead.org/mailman/listinfo/linux-riscv > > _______________________________________________ > > linux-riscv mailing list > > linux-riscv@lists.infradead.org > > http://lists.infradead.org/mailman/listinfo/linux-riscv
On 3/10/23 11:43, JeeHeng Sia wrote: > Hi Alex, > >> -----Original Message----- >> From: Alexandre Ghiti <alex@ghiti.fr> >> Sent: Friday, March 10, 2023 5:59 PM >> To: JeeHeng Sia <jeeheng.sia@starfivetech.com>; paul.walmsley@sifive.com; palmer@dabbelt.com; aou@eecs.berkeley.edu >> Cc: linux-riscv@lists.infradead.org; linux-kernel@vger.kernel.org; Leyfoon Tan <leyfoon.tan@starfivetech.com>; Mason Huo >> <mason.huo@starfivetech.com> >> Subject: Re: [PATCH v5 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk >> >> Hi JeeHeng, >> >> On 3/8/23 12:07, JeeHeng Sia wrote: >>>> -----Original Message----- >>>> From: Alexandre Ghiti <alex@ghiti.fr> >>>> Sent: Wednesday, March 8, 2023 5:28 PM >>>> To: JeeHeng Sia <jeeheng.sia@starfivetech.com>; paul.walmsley@sifive.com; palmer@dabbelt.com; aou@eecs.berkeley.edu >>>> Cc: linux-riscv@lists.infradead.org; linux-kernel@vger.kernel.org; Leyfoon Tan <leyfoon.tan@starfivetech.com>; Mason Huo >>>> <mason.huo@starfivetech.com> >>>> Subject: Re: [PATCH v5 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk >>>> >>>> >>>> On 3/8/23 10:08, Alexandre Ghiti wrote: >>>>> On 3/8/23 09:06, Sia Jee Heng wrote: >>>>>> Low level Arch functions were created to support hibernation. >>>>>> swsusp_arch_suspend() relies code from __cpu_suspend_enter() to write >>>>>> cpu state onto the stack, then calling swsusp_save() to save the memory >>>>>> image. >>>>>> >>>>>> Arch specific hibernation header is implemented and is utilized by the >>>>>> arch_hibernation_header_restore() and arch_hibernation_header_save() >>>>>> functions. The arch specific hibernation header consists of satp, >>>>>> hartid, >>>>>> and the cpu_resume address. The kernel built version is also need to be >>>>>> saved into the hibernation image header to making sure only the same >>>>>> kernel is restore when resume. >>>>>> >>>>>> swsusp_arch_resume() creates a temporary page table that covering only >>>>>> the linear map. It copies the restore code to a 'safe' page, then start >>>>>> to restore the memory image. Once completed, it restores the original >>>>>> kernel's page table. It then calls into __hibernate_cpu_resume() >>>>>> to restore the CPU context. Finally, it follows the normal hibernation >>>>>> path back to the hibernation core. >>>>>> >>>>>> To enable hibernation/suspend to disk into RISCV, the below config >>>>>> need to be enabled: >>>>>> - CONFIG_ARCH_HIBERNATION_HEADER >>>>>> - CONFIG_ARCH_HIBERNATION_POSSIBLE >>>>>> >>>>>> Signed-off-by: Sia Jee Heng <jeeheng.sia@starfivetech.com> >>>>>> Reviewed-by: Ley Foon Tan <leyfoon.tan@starfivetech.com> >>>>>> Reviewed-by: Mason Huo <mason.huo@starfivetech.com> >>>>>> --- >>>>>> arch/riscv/Kconfig | 6 + >>>>>> arch/riscv/include/asm/assembler.h | 20 ++ >>>>>> arch/riscv/include/asm/suspend.h | 19 ++ >>>>>> arch/riscv/kernel/Makefile | 1 + >>>>>> arch/riscv/kernel/asm-offsets.c | 5 + >>>>>> arch/riscv/kernel/hibernate-asm.S | 77 ++++++ >>>>>> arch/riscv/kernel/hibernate.c | 424 +++++++++++++++++++++++++++++ >>>>>> 7 files changed, 552 insertions(+) >>>>>> create mode 100644 arch/riscv/kernel/hibernate-asm.S >>>>>> create mode 100644 arch/riscv/kernel/hibernate.c >>>>>> >>>>>> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig >>>>>> index c5e42cc37604..473c2a1a6884 100644 >>>>>> --- a/arch/riscv/Kconfig >>>>>> +++ b/arch/riscv/Kconfig >>>>>> @@ -707,6 +707,12 @@ menu "Power management options" >>>>>> source "kernel/power/Kconfig" >>>>>> +config ARCH_HIBERNATION_POSSIBLE >>>>>> + def_bool y >>>>>> + >>>>>> +config ARCH_HIBERNATION_HEADER >>>>>> + def_bool HIBERNATION >>>>>> + >>>>>> endmenu # "Power management options" >>>>>> menu "CPU Power Management" >>>>>> diff --git a/arch/riscv/include/asm/assembler.h >>>>>> b/arch/riscv/include/asm/assembler.h >>>>>> index ba59d38f8937..44b1457d3e95 100644 >>>>>> --- a/arch/riscv/include/asm/assembler.h >>>>>> +++ b/arch/riscv/include/asm/assembler.h >>>>>> @@ -59,4 +59,24 @@ >>>>>> REG_L s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0) >>>>>> .endm >>>>>> +/* >>>>>> + * copy_page - copy 1 page (4KB) of data from source to destination >>>>>> + * @a0 - destination >>>>>> + * @a1 - source >>>>>> + */ >>>>>> + .macro copy_page a0, a1 >>>>>> + lui a2, 0x1 >>>>>> + add a2, a2, a0 >>>>>> +1 : >>>>>> + REG_L t0, 0(a1) >>>>>> + REG_L t1, SZREG(a1) >>>>>> + >>>>>> + REG_S t0, 0(a0) >>>>>> + REG_S t1, SZREG(a0) >>>>>> + >>>>>> + addi a0, a0, 2 * SZREG >>>>>> + addi a1, a1, 2 * SZREG >>>>>> + bne a2, a0, 1b >>>>>> + .endm >>>>>> + >>>>>> #endif /* __ASM_ASSEMBLER_H */ >>>>>> diff --git a/arch/riscv/include/asm/suspend.h >>>>>> b/arch/riscv/include/asm/suspend.h >>>>>> index 67e047445662..02f87867389a 100644 >>>>>> --- a/arch/riscv/include/asm/suspend.h >>>>>> +++ b/arch/riscv/include/asm/suspend.h >>>>>> @@ -21,6 +21,11 @@ struct suspend_context { >>>>>> #endif >>>>>> }; >>>>>> +/* >>>>>> + * Used by hibernation core and cleared during resume sequence >>>>>> + */ >>>>>> +extern int in_suspend; >>>>>> + >>>>>> /* Low-level CPU suspend entry function */ >>>>>> int __cpu_suspend_enter(struct suspend_context *context); >>>>>> @@ -36,4 +41,18 @@ int __cpu_resume_enter(unsigned long hartid, >>>>>> unsigned long context); >>>>>> /* Used to save and restore the CSRs */ >>>>>> void suspend_save_csrs(struct suspend_context *context); >>>>>> void suspend_restore_csrs(struct suspend_context *context); >>>>>> + >>>>>> +/* Low-level API to support hibernation */ >>>>>> +int swsusp_arch_suspend(void); >>>>>> +int swsusp_arch_resume(void); >>>>>> +int arch_hibernation_header_save(void *addr, unsigned int max_size); >>>>>> +int arch_hibernation_header_restore(void *addr); >>>>>> +int __hibernate_cpu_resume(void); >>>>>> + >>>>>> +/* Used to resume on the CPU we hibernated on */ >>>>>> +int hibernate_resume_nonboot_cpu_disable(void); >>>>>> + >>>>>> +asmlinkage void hibernate_restore_image(unsigned long resume_satp, >>>>>> unsigned long satp_temp, >>>>>> + unsigned long cpu_resume); >>>>>> +asmlinkage int hibernate_core_restore_code(void); >>>>>> #endif >>>>>> diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile >>>>>> index 4cf303a779ab..daab341d55e4 100644 >>>>>> --- a/arch/riscv/kernel/Makefile >>>>>> +++ b/arch/riscv/kernel/Makefile >>>>>> @@ -64,6 +64,7 @@ obj-$(CONFIG_MODULES) += module.o >>>>>> obj-$(CONFIG_MODULE_SECTIONS) += module-sections.o >>>>>> obj-$(CONFIG_CPU_PM) += suspend_entry.o suspend.o >>>>>> +obj-$(CONFIG_HIBERNATION) += hibernate.o hibernate-asm.o >>>>>> obj-$(CONFIG_FUNCTION_TRACER) += mcount.o ftrace.o >>>>>> obj-$(CONFIG_DYNAMIC_FTRACE) += mcount-dyn.o >>>>>> diff --git a/arch/riscv/kernel/asm-offsets.c >>>>>> b/arch/riscv/kernel/asm-offsets.c >>>>>> index df9444397908..d6a75aac1d27 100644 >>>>>> --- a/arch/riscv/kernel/asm-offsets.c >>>>>> +++ b/arch/riscv/kernel/asm-offsets.c >>>>>> @@ -9,6 +9,7 @@ >>>>>> #include <linux/kbuild.h> >>>>>> #include <linux/mm.h> >>>>>> #include <linux/sched.h> >>>>>> +#include <linux/suspend.h> >>>>>> #include <asm/kvm_host.h> >>>>>> #include <asm/thread_info.h> >>>>>> #include <asm/ptrace.h> >>>>>> @@ -116,6 +117,10 @@ void asm_offsets(void) >>>>>> OFFSET(SUSPEND_CONTEXT_REGS, suspend_context, regs); >>>>>> + OFFSET(HIBERN_PBE_ADDR, pbe, address); >>>>>> + OFFSET(HIBERN_PBE_ORIG, pbe, orig_address); >>>>>> + OFFSET(HIBERN_PBE_NEXT, pbe, next); >>>>>> + >>>>>> OFFSET(KVM_ARCH_GUEST_ZERO, kvm_vcpu_arch, guest_context.zero); >>>>>> OFFSET(KVM_ARCH_GUEST_RA, kvm_vcpu_arch, guest_context.ra); >>>>>> OFFSET(KVM_ARCH_GUEST_SP, kvm_vcpu_arch, guest_context.sp); >>>>>> diff --git a/arch/riscv/kernel/hibernate-asm.S >>>>>> b/arch/riscv/kernel/hibernate-asm.S >>>>>> new file mode 100644 >>>>>> index 000000000000..effaf5ca5da0 >>>>>> --- /dev/null >>>>>> +++ b/arch/riscv/kernel/hibernate-asm.S >>>>>> @@ -0,0 +1,77 @@ >>>>>> +/* SPDX-License-Identifier: GPL-2.0-only */ >>>>>> +/* >>>>>> + * Hibernation low level support for RISCV. >>>>>> + * >>>>>> + * Copyright (C) 2023 StarFive Technology Co., Ltd. >>>>>> + * >>>>>> + * Author: Jee Heng Sia <jeeheng.sia@starfivetech.com> >>>>>> + */ >>>>>> + >>>>>> +#include <asm/asm.h> >>>>>> +#include <asm/asm-offsets.h> >>>>>> +#include <asm/assembler.h> >>>>>> +#include <asm/csr.h> >>>>>> + >>>>>> +#include <linux/linkage.h> >>>>>> + >>>>>> +/* >>>>>> + * int __hibernate_cpu_resume(void) >>>>>> + * Switch back to the hibernated image's page table prior to >>>>>> restoring the CPU >>>>>> + * context. >>>>>> + * >>>>>> + * Always returns 0 >>>>>> + */ >>>>>> +ENTRY(__hibernate_cpu_resume) >>>>>> + /* switch to hibernated image's page table. */ >>>>>> + csrw CSR_SATP, s0 >>>>>> + sfence.vma >>>>>> + >>>>>> + REG_L a0, hibernate_cpu_context >>>>>> + >>>>>> + suspend_restore_csrs >>>>>> + suspend_restore_regs >>>>>> + >>>>>> + /* Return zero value. */ >>>>>> + mv a0, zero >>>>>> + >>>>>> + ret >>>>>> +END(__hibernate_cpu_resume) >>>>>> + >>>>>> +/* >>>>>> + * Prepare to restore the image. >>>>>> + * a0: satp of saved page tables. >>>>>> + * a1: satp of temporary page tables. >>>>>> + * a2: cpu_resume. >>>>>> + */ >>>>>> +ENTRY(hibernate_restore_image) >>>>>> + mv s0, a0 >>>>>> + mv s1, a1 >>>>>> + mv s2, a2 >>>>>> + REG_L s4, restore_pblist >>>>>> + REG_L a1, relocated_restore_code >>>>>> + >>>>>> + jalr a1 >>>>>> +END(hibernate_restore_image) >>>>>> + >>>>>> +/* >>>>>> + * The below code will be executed from a 'safe' page. >>>>>> + * It first switches to the temporary page table, then starts to >>>>>> copy the pages >>>>>> + * back to the original memory location. Finally, it jumps to >>>>>> __hibernate_cpu_resume() >>>>>> + * to restore the CPU context. >>>>>> + */ >>>>>> +ENTRY(hibernate_core_restore_code) >>>>>> + /* switch to temp page table. */ >>>>>> + csrw satp, s1 >>>>>> + sfence.vma >>>>>> +.Lcopy: >>>>>> + /* The below code will restore the hibernated image. */ >>>>>> + REG_L a1, HIBERN_PBE_ADDR(s4) >>>>>> + REG_L a0, HIBERN_PBE_ORIG(s4) >>>>>> + >>>>>> + copy_page a0, a1 >>>>>> + >>>>>> + REG_L s4, HIBERN_PBE_NEXT(s4) >>>>>> + bnez s4, .Lcopy >>>>>> + >>>>>> + jalr s2 >>>>>> +END(hibernate_core_restore_code) >>>>>> diff --git a/arch/riscv/kernel/hibernate.c >>>>>> b/arch/riscv/kernel/hibernate.c >>>>>> new file mode 100644 >>>>>> index 000000000000..8fab50122e54 >>>>>> --- /dev/null >>>>>> +++ b/arch/riscv/kernel/hibernate.c >>>>>> @@ -0,0 +1,424 @@ >>>>>> +// SPDX-License-Identifier: GPL-2.0-only >>>>>> +/* >>>>>> + * Hibernation support for RISCV >>>>>> + * >>>>>> + * Copyright (C) 2023 StarFive Technology Co., Ltd. >>>>>> + * >>>>>> + * Author: Jee Heng Sia <jeeheng.sia@starfivetech.com> >>>>>> + */ >>>>>> + >>>>>> +#include <asm/barrier.h> >>>>>> +#include <asm/cacheflush.h> >>>>>> +#include <asm/mmu_context.h> >>>>>> +#include <asm/page.h> >>>>>> +#include <asm/pgalloc.h> >>>>>> +#include <asm/pgtable.h> >>>>>> +#include <asm/sections.h> >>>>>> +#include <asm/set_memory.h> >>>>>> +#include <asm/smp.h> >>>>>> +#include <asm/suspend.h> >>>>>> + >>>>>> +#include <linux/cpu.h> >>>>>> +#include <linux/memblock.h> >>>>>> +#include <linux/pm.h> >>>>>> +#include <linux/sched.h> >>>>>> +#include <linux/suspend.h> >>>>>> +#include <linux/utsname.h> >>>>>> + >>>>>> +/* The logical cpu number we should resume on, initialised to a >>>>>> non-cpu number. */ >>>>>> +static int sleep_cpu = -EINVAL; >>>>>> + >>>>>> +/* Pointer to the temporary resume page table. */ >>>>>> +static pgd_t *resume_pg_dir; >>>>>> + >>>>>> +/* CPU context to be saved. */ >>>>>> +struct suspend_context *hibernate_cpu_context; >>>>>> +EXPORT_SYMBOL_GPL(hibernate_cpu_context); >>>>>> + >>>>>> +unsigned long relocated_restore_code; >>>>>> +EXPORT_SYMBOL_GPL(relocated_restore_code); >>>>>> + >>>>>> +/** >>>>>> + * struct arch_hibernate_hdr_invariants - container to store kernel >>>>>> build version. >>>>>> + * @uts_version: to save the build number and date so that the we do >>>>>> not resume with >>>>>> + * a different kernel. >>>>>> + */ >>>>>> +struct arch_hibernate_hdr_invariants { >>>>>> + char uts_version[__NEW_UTS_LEN + 1]; >>>>>> +}; >>>>>> + >>>>>> +/** >>>>>> + * struct arch_hibernate_hdr - helper parameters that help us to >>>>>> restore the image. >>>>>> + * @invariants: container to store kernel build version. >>>>>> + * @hartid: to make sure same boot_cpu executes the >>>>>> hibernate/restore code. >>>>>> + * @saved_satp: original page table used by the hibernated image. >>>>>> + * @restore_cpu_addr: the kernel's image address to restore the CPU >>>>>> context. >>>>>> + */ >>>>>> +static struct arch_hibernate_hdr { >>>>>> + struct arch_hibernate_hdr_invariants invariants; >>>>>> + unsigned long hartid; >>>>>> + unsigned long saved_satp; >>>>>> + unsigned long restore_cpu_addr; >>>>>> +} resume_hdr; >>>>>> + >>>>>> +static void arch_hdr_invariants(struct arch_hibernate_hdr_invariants >>>>>> *i) >>>>>> +{ >>>>>> + memset(i, 0, sizeof(*i)); >>>>>> + memcpy(i->uts_version, init_utsname()->version, >>>>>> sizeof(i->uts_version)); >>>>>> +} >>>>>> + >>>>>> +/* >>>>>> + * Check if the given pfn is in the 'nosave' section. >>>>>> + */ >>>>>> +int pfn_is_nosave(unsigned long pfn) >>>>>> +{ >>>>>> + unsigned long nosave_begin_pfn = sym_to_pfn(&__nosave_begin); >>>>>> + unsigned long nosave_end_pfn = sym_to_pfn(&__nosave_end - 1); >>>>>> + >>>>>> + return ((pfn >= nosave_begin_pfn) && (pfn <= nosave_end_pfn)); >>>>>> +} >>>>>> + >>>>>> +void notrace save_processor_state(void) >>>>>> +{ >>>>>> + WARN_ON(num_online_cpus() != 1); >>>>>> +} >>>>>> + >>>>>> +void notrace restore_processor_state(void) >>>>>> +{ >>>>>> +} >>>>>> + >>>>>> +/* >>>>>> + * Helper parameters need to be saved to the hibernation image header. >>>>>> + */ >>>>>> +int arch_hibernation_header_save(void *addr, unsigned int max_size) >>>>>> +{ >>>>>> + struct arch_hibernate_hdr *hdr = addr; >>>>>> + >>>>>> + if (max_size < sizeof(*hdr)) >>>>>> + return -EOVERFLOW; >>>>>> + >>>>>> + arch_hdr_invariants(&hdr->invariants); >>>>>> + >>>>>> + hdr->hartid = cpuid_to_hartid_map(sleep_cpu); >>>>>> + hdr->saved_satp = csr_read(CSR_SATP); >>>>>> + hdr->restore_cpu_addr = (unsigned long)__hibernate_cpu_resume; >>>>>> + >>>>>> + return 0; >>>>>> +} >>>>>> +EXPORT_SYMBOL_GPL(arch_hibernation_header_save); >>>>>> + >>>>>> +/* >>>>>> + * Retrieve the helper parameters from the hibernation image header. >>>>>> + */ >>>>>> +int arch_hibernation_header_restore(void *addr) >>>>>> +{ >>>>>> + struct arch_hibernate_hdr_invariants invariants; >>>>>> + struct arch_hibernate_hdr *hdr = addr; >>>>>> + int ret = 0; >>>>>> + >>>>>> + arch_hdr_invariants(&invariants); >>>>>> + >>>>>> + if (memcmp(&hdr->invariants, &invariants, sizeof(invariants))) { >>>>>> + pr_crit("Hibernate image not generated by this kernel!\n"); >>>>>> + return -EINVAL; >>>>>> + } >>>>>> + >>>>>> + sleep_cpu = riscv_hartid_to_cpuid(hdr->hartid); >>>>>> + if (sleep_cpu < 0) { >>>>>> + pr_crit("Hibernated on a CPU not known to this kernel!\n"); >>>>>> + sleep_cpu = -EINVAL; >>>>>> + return -EINVAL; >>>>>> + } >>>>>> + >>>>>> +#ifdef CONFIG_SMP >>>>>> + ret = bringup_hibernate_cpu(sleep_cpu); >>>>>> + if (ret) { >>>>>> + sleep_cpu = -EINVAL; >>>>>> + return ret; >>>>>> + } >>>>>> +#endif >>>>>> + resume_hdr = *hdr; >>>>>> + >>>>>> + return ret; >>>>>> +} >>>>>> +EXPORT_SYMBOL_GPL(arch_hibernation_header_restore); >>>>>> + >>>>>> +int swsusp_arch_suspend(void) >>>>>> +{ >>>>>> + int ret = 0; >>>>>> + >>>>>> + if (__cpu_suspend_enter(hibernate_cpu_context)) { >>>>>> + sleep_cpu = smp_processor_id(); >>>>>> + suspend_save_csrs(hibernate_cpu_context); >>>>>> + ret = swsusp_save(); >>>>>> + } else { >>>>>> + suspend_restore_csrs(hibernate_cpu_context); >>>>>> + flush_tlb_all(); >>>>>> + flush_icache_all(); >>>>>> + >>>>>> + /* >>>>>> + * Tell the hibernation core that we've just restored the >>>>>> memory. >>>>>> + */ >>>>>> + in_suspend = 0; >>>>>> + sleep_cpu = -EINVAL; >>>>>> + } >>>>>> + >>>>>> + return ret; >>>>>> +} >>>>>> + >>>>>> +static int temp_pgtable_map_pte(pmd_t *dst_pmdp, pmd_t *src_pmdp, >>>>>> unsigned long start, >>>>>> + unsigned long end, pgprot_t prot) >>>>>> +{ >>>>>> + pte_t *src_ptep; >>>>>> + pte_t *dst_ptep; >>>>>> + >>>>>> + if (pmd_none(READ_ONCE(*dst_pmdp))) { >>>>>> + dst_ptep = (pte_t *)get_safe_page(GFP_ATOMIC); >>>>>> + if (!dst_ptep) >>>>>> + return -ENOMEM; >>>>>> + >>>>>> + pmd_populate_kernel(NULL, dst_pmdp, dst_ptep); >>>>>> + } >>>>>> + >>>>>> + dst_ptep = pte_offset_kernel(dst_pmdp, start); >>>>>> + src_ptep = pte_offset_kernel(src_pmdp, start); >>>>>> + >>>>>> + do { >>>>>> + pte_t pte = READ_ONCE(*src_ptep); >>>>>> + >>>>>> + if (pte_present(pte)) >>>>>> + set_pte(dst_ptep, __pte(pte_val(pte) | pgprot_val(prot))); >>>>>> + } while (dst_ptep++, src_ptep++, start += PAGE_SIZE, start < end); >>>>>> + >>>>>> + return 0; >>>>>> +} >>>>>> + >>>>>> +static int temp_pgtable_map_pmd(pud_t *dst_pudp, pud_t *src_pudp, >>>>>> unsigned long start, >>>>>> + unsigned long end, pgprot_t prot) >>>>>> +{ >>>>>> + unsigned long next; >>>>>> + unsigned long ret; >>>>>> + pmd_t *src_pmdp; >>>>>> + pmd_t *dst_pmdp; >>>>>> + >>>>>> + if (pud_none(READ_ONCE(*dst_pudp))) { >>>>>> + dst_pmdp = (pmd_t *)get_safe_page(GFP_ATOMIC); >>>>>> + if (!dst_pmdp) >>>>>> + return -ENOMEM; >>>>>> + >>>>>> + pud_populate(NULL, dst_pudp, dst_pmdp); >>>>>> + } >>>>>> + >>>>>> + dst_pmdp = pmd_offset(dst_pudp, start); >>>>>> + src_pmdp = pmd_offset(src_pudp, start); >>>>>> + >>>>>> + do { >>>>>> + pmd_t pmd = READ_ONCE(*src_pmdp); >>>>>> + >>>>>> + next = pmd_addr_end(start, end); >>>>>> + >>>>>> + if (pmd_none(pmd)) >>>>>> + continue; >>>>>> + >>>>>> + if (pmd_leaf(pmd)) { >>>>>> + set_pmd(dst_pmdp, __pmd(pmd_val(pmd) | pgprot_val(prot))); >>>>>> + } else { >>>>>> + ret = temp_pgtable_map_pte(dst_pmdp, src_pmdp, start, >>>>>> next, prot); >>>>>> + if (ret) >>>>>> + return -ENOMEM; >>>>>> + } >>>>>> + } while (dst_pmdp++, src_pmdp++, start = next, start != end); >>>>>> + >>>>>> + return 0; >>>>>> +} >>>>>> + >>>>>> +static int temp_pgtable_map_pud(p4d_t *dst_p4dp, p4d_t *src_p4dp, >>>>>> unsigned long start, >>>>>> + unsigned long end, pgprot_t prot) >>>>>> +{ >>>>>> + unsigned long next; >>>>>> + unsigned long ret; >>>>>> + pud_t *dst_pudp; >>>>>> + pud_t *src_pudp; >>>>>> + >>>>>> + if (p4d_none(READ_ONCE(*dst_p4dp))) { >>>>>> + dst_pudp = (pud_t *)get_safe_page(GFP_ATOMIC); >>>>>> + if (!dst_pudp) >>>>>> + return -ENOMEM; >>>>>> + >>>>>> + p4d_populate(NULL, dst_p4dp, dst_pudp); >>>>>> + } >>>>>> + >>>>>> + dst_pudp = pud_offset(dst_p4dp, start); >>>>>> + src_pudp = pud_offset(src_p4dp, start); >>>>>> + >>>>>> + do { >>>>>> + pud_t pud = READ_ONCE(*src_pudp); >>>>>> + >>>>>> + next = pud_addr_end(start, end); >>>>>> + >>>>>> + if (pud_none(pud)) >>>>>> + continue; >>>>>> + >>>>>> + if (pud_leaf(pud)) { >>>>>> + set_pud(dst_pudp, __pud(pud_val(pud) | pgprot_val(prot))); >>>>>> + } else { >>>>>> + ret = temp_pgtable_map_pmd(dst_pudp, src_pudp, start, >>>>>> next, prot); >>>>>> + if (ret) >>>>>> + return -ENOMEM; >>>>>> + } >>>>>> + } while (dst_pudp++, src_pudp++, start = next, start != end); >>>>>> + >>>>>> + return 0; >>>>>> +} >>>>>> + >>>>>> +static int temp_pgtable_map_p4d(pgd_t *dst_pgdp, pgd_t *src_pgdp, >>>>>> unsigned long start, >>>>>> + unsigned long end, pgprot_t prot) >>>>>> +{ >>>>>> + unsigned long next; >>>>>> + unsigned long ret; >>>>>> + p4d_t *dst_p4dp; >>>>>> + p4d_t *src_p4dp; >>>>>> + >>>>>> + if (pgd_none(READ_ONCE(*dst_pgdp))) { >>>>>> + dst_p4dp = (p4d_t *)get_safe_page(GFP_ATOMIC); >>>>>> + if (!dst_p4dp) >>>>>> + return -ENOMEM; >>>>>> + >>>>>> + pgd_populate(NULL, dst_pgdp, dst_p4dp); >>>>>> + } >>>>>> + >>>>>> + dst_p4dp = p4d_offset(dst_pgdp, start); >>>>>> + src_p4dp = p4d_offset(src_pgdp, start); >>>>>> + >>>>>> + do { >>>>>> + p4d_t p4d = READ_ONCE(*src_p4dp); >>>>>> + >>>>>> + next = p4d_addr_end(start, end); >>>>>> + >>>>>> + if (p4d_none(p4d)) >>>>>> + continue; >>>>>> + >>>>>> + if (p4d_leaf(p4d)) { >>>>>> + set_p4d(dst_p4dp, __p4d(p4d_val(p4d) | pgprot_val(prot))); >>>>>> + } else { >>>>>> + ret = temp_pgtable_map_pud(dst_p4dp, src_p4dp, start, >>>>>> next, prot); >>>>>> + if (ret) >>>>>> + return -ENOMEM; >>>>>> + } >>>>>> + } while (dst_p4dp++, src_p4dp++, start = next, start != end); >>>>>> + >>>>>> + return 0; >>>>>> +} >>>>>> + >>>>>> +static int temp_pgtable_mapping(pgd_t *pgdp, unsigned long start, >>>>>> unsigned long end, pgprot_t prot) >>>>>> +{ >>>>>> + pgd_t *dst_pgdp = pgd_offset_pgd(pgdp, start); >>>>>> + pgd_t *src_pgdp = pgd_offset_k(start); >>>>>> + unsigned long next; >>>>>> + unsigned long ret; >>>>>> + >>>>>> + do { >>>>>> + pgd_t pgd = READ_ONCE(*src_pgdp); >>>>>> + >>>>>> + next = pgd_addr_end(start, end); >>>>>> + >>>>>> + if (pgd_none(pgd)) >>>>>> + continue; >>>>>> + >>>>>> + if (pgd_leaf(pgd)) { >>>>>> + set_pgd(dst_pgdp, __pgd(pgd_val(pgd) | pgprot_val(prot))); >>>>>> + } else { >>>>>> + ret = temp_pgtable_map_p4d(dst_pgdp, src_pgdp, start, >>>>>> next, prot); >>>>>> + if (ret) >>>>>> + return -ENOMEM; >>>>>> + } >>>>>> + } while (dst_pgdp++, src_pgdp++, start = next, start != end); >>>>>> + >>>>>> + return 0; >>>>>> +} >>>>>> + >>>>>> +static unsigned long relocate_restore_code(void) >>>>>> +{ >>>>>> + void *page = (void *)get_safe_page(GFP_ATOMIC); >>>>>> + >>>>>> + if (!page) >>>>>> + return -ENOMEM; >>>>>> + >>>>>> + copy_page(page, hibernate_core_restore_code); >>>>>> + >>>>>> + /* Make the page containing the relocated code executable. */ >>>>>> + set_memory_x((unsigned long)page, 1); >>>>>> + >>>>>> + return (unsigned long)page; >>>>>> +} >>>>>> + >>>>>> +int swsusp_arch_resume(void) >>>>>> +{ >>>>>> + unsigned long end = (unsigned long)pfn_to_virt(max_low_pfn); >>>>>> + unsigned long start = PAGE_OFFSET; >>>>>> + int ret; >>>>>> + >>>>>> + /* >>>>>> + * Memory allocated by get_safe_page() will be dealt with by the >>>>>> hibernation core, >>>>>> + * we don't need to free it here. >>>>>> + */ >>>>>> + resume_pg_dir = (pgd_t *)get_safe_page(GFP_ATOMIC); >>>>>> + if (!resume_pg_dir) >>>>>> + return -ENOMEM; >>>>>> + >>>>>> + /* >>>>>> + * Create a temporary page table and map the whole linear region >>>>>> as executable and >>>>>> + * writable. >>>>>> + */ >>>>>> + ret = temp_pgtable_mapping(resume_pg_dir, start, end, >>>>>> __pgprot(_PAGE_WRITE | _PAGE_EXEC)); >>>>> Hmm weird protections, PAGE_PRESENT and PAGE_READ are not even set. >>>>> Shouldn't you use PAGE_KERNEL_EXEC instead? >>>> Forget this one, that's the protections you add, so that's good :) >>> OK. >>>>>> + if (ret) >>>>>> + return ret; >>>>>> + >>>>>> + /* Move the restore code to a new page so that it doesn't get >>>>>> overwritten by itself. */ >>>>>> + relocated_restore_code = relocate_restore_code(); >>>>>> + if (relocated_restore_code == -ENOMEM) >>>>>> + return -ENOMEM; >>>>>> + >>>>>> + /* >>>>>> + * Map the __hibernate_cpu_resume() address to the temporary >>>>>> page table so that the >>>>>> + * restore code can jumps to it after finished restore the >>>>>> image. The next execution >>>>>> + * code doesn't find itself in a different address space after >>>>>> switching over to the >>>>>> + * original page table used by the hibernated image. >>>>>> + * >>>>>> + */ >>>>>> + start = (unsigned long)resume_hdr.restore_cpu_addr; >>>>>> + end = start + PAGE_SIZE; >>>>>> + >>>>>> + ret = temp_pgtable_mapping(resume_pg_dir, start, end, >>>>>> __pgprot(_PAGE_WRITE | _PAGE_EXEC)); >>>>>> + if (ret) >>>>>> + return ret; >>>>> You set the linear mapping as executable so no need for this. >>> We need this mapping because the kernel addresses and the linear addresses in riscv-64bit are in different boundary. For example: >>> Linear address start with 0xff60000000000000 >>> Kernel address start with 0xffffffff80000000 >> >> Ok I understand now that you need to map this into the temporary page >> table: one nit though, since you copy the kernel mapping, no need to use >> PAGE_WRITE | PAGE_EXEC, the protections are already correct. > Here is the thing. For riscv_64bit, I am agreed with you that we don’t need to use PAGE_WRITE | PAGE_EXEC. > However, we need to use PAGE_WRITE | PAGE_EXEC for riscv_32bit because the linear addresses and kernel addresses are similar. The attribute becomes not writable if following the "original" page table. I am thinking to use #ifdef config_32bit to split the path for 64bit and 32bit but I just afraid the linear addresses and kernel addresses might change again in the future. To be on the safe side, I let both riscv_32bit & riscv_64bit to use the PAGE_WRITE | PAGE_EXEC. Indeed, you really thought about everything, well done! I would add a comment that explains just that above, no need for other ifdefs IMO. > For example: > Linear address and kernel address for 32bit started with 0xc0000000 > Linear address for 64bit start with 0xff60000000000000 > Kernel address for 64bit start with 0xffffffff80000000 > > Please let me know your thoughts. > >> Thanks, >> >> Alex >> >> >>>>>> + >>>>>> + hibernate_restore_image(resume_hdr.saved_satp, >>>>>> (PFN_DOWN(__pa(resume_pg_dir)) | satp_mode), >>>>>> + resume_hdr.restore_cpu_addr); >>>>>> + >>>>>> + return 0; >>>>>> +} >>>>>> + >>>>>> +#ifdef CONFIG_PM_SLEEP_SMP >>>>>> +int hibernate_resume_nonboot_cpu_disable(void) >>>>>> +{ >>>>>> + if (sleep_cpu < 0) { >>>>>> + pr_err("Failing to resume from hibernate on an unknown CPU\n"); >>>>>> + return -ENODEV; >>>>>> + } >>>>>> + >>>>>> + return freeze_secondary_cpus(sleep_cpu); >>>>>> +} >>>>>> +#endif >>>>>> + >>>>>> +static int __init riscv_hibernate_init(void) >>>>>> +{ >>>>>> + hibernate_cpu_context = kzalloc(sizeof(*hibernate_cpu_context), >>>>>> GFP_KERNEL); >>>>>> + >>>>>> + if (WARN_ON(!hibernate_cpu_context)) >>>>>> + return -ENOMEM; >>>>>> + >>>>>> + return 0; >>>>>> +} >>>>>> + >>>>>> +early_initcall(riscv_hibernate_init); >>>>> _______________________________________________ >>>>> linux-riscv mailing list >>>>> linux-riscv@lists.infradead.org >>>>> http://lists.infradead.org/mailman/listinfo/linux-riscv >>> _______________________________________________ >>> linux-riscv mailing list >>> linux-riscv@lists.infradead.org >>> http://lists.infradead.org/mailman/listinfo/linux-riscv
> -----Original Message----- > From: Alexandre Ghiti <alex@ghiti.fr> > Sent: Friday, March 10, 2023 6:50 PM > To: JeeHeng Sia <jeeheng.sia@starfivetech.com>; paul.walmsley@sifive.com; palmer@dabbelt.com; aou@eecs.berkeley.edu > Cc: linux-riscv@lists.infradead.org; linux-kernel@vger.kernel.org; Leyfoon Tan <leyfoon.tan@starfivetech.com>; Mason Huo > <mason.huo@starfivetech.com> > Subject: Re: [PATCH v5 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk > > > On 3/10/23 11:43, JeeHeng Sia wrote: > > Hi Alex, > > > >> -----Original Message----- > >> From: Alexandre Ghiti <alex@ghiti.fr> > >> Sent: Friday, March 10, 2023 5:59 PM > >> To: JeeHeng Sia <jeeheng.sia@starfivetech.com>; paul.walmsley@sifive.com; palmer@dabbelt.com; aou@eecs.berkeley.edu > >> Cc: linux-riscv@lists.infradead.org; linux-kernel@vger.kernel.org; Leyfoon Tan <leyfoon.tan@starfivetech.com>; Mason Huo > >> <mason.huo@starfivetech.com> > >> Subject: Re: [PATCH v5 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk > >> > >> Hi JeeHeng, > >> > >> On 3/8/23 12:07, JeeHeng Sia wrote: > >>>> -----Original Message----- > >>>> From: Alexandre Ghiti <alex@ghiti.fr> > >>>> Sent: Wednesday, March 8, 2023 5:28 PM > >>>> To: JeeHeng Sia <jeeheng.sia@starfivetech.com>; paul.walmsley@sifive.com; palmer@dabbelt.com; aou@eecs.berkeley.edu > >>>> Cc: linux-riscv@lists.infradead.org; linux-kernel@vger.kernel.org; Leyfoon Tan <leyfoon.tan@starfivetech.com>; Mason Huo > >>>> <mason.huo@starfivetech.com> > >>>> Subject: Re: [PATCH v5 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk > >>>> > >>>> > >>>> On 3/8/23 10:08, Alexandre Ghiti wrote: > >>>>> On 3/8/23 09:06, Sia Jee Heng wrote: > >>>>>> Low level Arch functions were created to support hibernation. > >>>>>> swsusp_arch_suspend() relies code from __cpu_suspend_enter() to write > >>>>>> cpu state onto the stack, then calling swsusp_save() to save the memory > >>>>>> image. > >>>>>> > >>>>>> Arch specific hibernation header is implemented and is utilized by the > >>>>>> arch_hibernation_header_restore() and arch_hibernation_header_save() > >>>>>> functions. The arch specific hibernation header consists of satp, > >>>>>> hartid, > >>>>>> and the cpu_resume address. The kernel built version is also need to be > >>>>>> saved into the hibernation image header to making sure only the same > >>>>>> kernel is restore when resume. > >>>>>> > >>>>>> swsusp_arch_resume() creates a temporary page table that covering only > >>>>>> the linear map. It copies the restore code to a 'safe' page, then start > >>>>>> to restore the memory image. Once completed, it restores the original > >>>>>> kernel's page table. It then calls into __hibernate_cpu_resume() > >>>>>> to restore the CPU context. Finally, it follows the normal hibernation > >>>>>> path back to the hibernation core. > >>>>>> > >>>>>> To enable hibernation/suspend to disk into RISCV, the below config > >>>>>> need to be enabled: > >>>>>> - CONFIG_ARCH_HIBERNATION_HEADER > >>>>>> - CONFIG_ARCH_HIBERNATION_POSSIBLE > >>>>>> > >>>>>> Signed-off-by: Sia Jee Heng <jeeheng.sia@starfivetech.com> > >>>>>> Reviewed-by: Ley Foon Tan <leyfoon.tan@starfivetech.com> > >>>>>> Reviewed-by: Mason Huo <mason.huo@starfivetech.com> > >>>>>> --- > >>>>>> arch/riscv/Kconfig | 6 + > >>>>>> arch/riscv/include/asm/assembler.h | 20 ++ > >>>>>> arch/riscv/include/asm/suspend.h | 19 ++ > >>>>>> arch/riscv/kernel/Makefile | 1 + > >>>>>> arch/riscv/kernel/asm-offsets.c | 5 + > >>>>>> arch/riscv/kernel/hibernate-asm.S | 77 ++++++ > >>>>>> arch/riscv/kernel/hibernate.c | 424 +++++++++++++++++++++++++++++ > >>>>>> 7 files changed, 552 insertions(+) > >>>>>> create mode 100644 arch/riscv/kernel/hibernate-asm.S > >>>>>> create mode 100644 arch/riscv/kernel/hibernate.c > >>>>>> > >>>>>> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig > >>>>>> index c5e42cc37604..473c2a1a6884 100644 > >>>>>> --- a/arch/riscv/Kconfig > >>>>>> +++ b/arch/riscv/Kconfig > >>>>>> @@ -707,6 +707,12 @@ menu "Power management options" > >>>>>> source "kernel/power/Kconfig" > >>>>>> +config ARCH_HIBERNATION_POSSIBLE > >>>>>> + def_bool y > >>>>>> + > >>>>>> +config ARCH_HIBERNATION_HEADER > >>>>>> + def_bool HIBERNATION > >>>>>> + > >>>>>> endmenu # "Power management options" > >>>>>> menu "CPU Power Management" > >>>>>> diff --git a/arch/riscv/include/asm/assembler.h > >>>>>> b/arch/riscv/include/asm/assembler.h > >>>>>> index ba59d38f8937..44b1457d3e95 100644 > >>>>>> --- a/arch/riscv/include/asm/assembler.h > >>>>>> +++ b/arch/riscv/include/asm/assembler.h > >>>>>> @@ -59,4 +59,24 @@ > >>>>>> REG_L s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0) > >>>>>> .endm > >>>>>> +/* > >>>>>> + * copy_page - copy 1 page (4KB) of data from source to destination > >>>>>> + * @a0 - destination > >>>>>> + * @a1 - source > >>>>>> + */ > >>>>>> + .macro copy_page a0, a1 > >>>>>> + lui a2, 0x1 > >>>>>> + add a2, a2, a0 > >>>>>> +1 : > >>>>>> + REG_L t0, 0(a1) > >>>>>> + REG_L t1, SZREG(a1) > >>>>>> + > >>>>>> + REG_S t0, 0(a0) > >>>>>> + REG_S t1, SZREG(a0) > >>>>>> + > >>>>>> + addi a0, a0, 2 * SZREG > >>>>>> + addi a1, a1, 2 * SZREG > >>>>>> + bne a2, a0, 1b > >>>>>> + .endm > >>>>>> + > >>>>>> #endif /* __ASM_ASSEMBLER_H */ > >>>>>> diff --git a/arch/riscv/include/asm/suspend.h > >>>>>> b/arch/riscv/include/asm/suspend.h > >>>>>> index 67e047445662..02f87867389a 100644 > >>>>>> --- a/arch/riscv/include/asm/suspend.h > >>>>>> +++ b/arch/riscv/include/asm/suspend.h > >>>>>> @@ -21,6 +21,11 @@ struct suspend_context { > >>>>>> #endif > >>>>>> }; > >>>>>> +/* > >>>>>> + * Used by hibernation core and cleared during resume sequence > >>>>>> + */ > >>>>>> +extern int in_suspend; > >>>>>> + > >>>>>> /* Low-level CPU suspend entry function */ > >>>>>> int __cpu_suspend_enter(struct suspend_context *context); > >>>>>> @@ -36,4 +41,18 @@ int __cpu_resume_enter(unsigned long hartid, > >>>>>> unsigned long context); > >>>>>> /* Used to save and restore the CSRs */ > >>>>>> void suspend_save_csrs(struct suspend_context *context); > >>>>>> void suspend_restore_csrs(struct suspend_context *context); > >>>>>> + > >>>>>> +/* Low-level API to support hibernation */ > >>>>>> +int swsusp_arch_suspend(void); > >>>>>> +int swsusp_arch_resume(void); > >>>>>> +int arch_hibernation_header_save(void *addr, unsigned int max_size); > >>>>>> +int arch_hibernation_header_restore(void *addr); > >>>>>> +int __hibernate_cpu_resume(void); > >>>>>> + > >>>>>> +/* Used to resume on the CPU we hibernated on */ > >>>>>> +int hibernate_resume_nonboot_cpu_disable(void); > >>>>>> + > >>>>>> +asmlinkage void hibernate_restore_image(unsigned long resume_satp, > >>>>>> unsigned long satp_temp, > >>>>>> + unsigned long cpu_resume); > >>>>>> +asmlinkage int hibernate_core_restore_code(void); > >>>>>> #endif > >>>>>> diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile > >>>>>> index 4cf303a779ab..daab341d55e4 100644 > >>>>>> --- a/arch/riscv/kernel/Makefile > >>>>>> +++ b/arch/riscv/kernel/Makefile > >>>>>> @@ -64,6 +64,7 @@ obj-$(CONFIG_MODULES) += module.o > >>>>>> obj-$(CONFIG_MODULE_SECTIONS) += module-sections.o > >>>>>> obj-$(CONFIG_CPU_PM) += suspend_entry.o suspend.o > >>>>>> +obj-$(CONFIG_HIBERNATION) += hibernate.o hibernate-asm.o > >>>>>> obj-$(CONFIG_FUNCTION_TRACER) += mcount.o ftrace.o > >>>>>> obj-$(CONFIG_DYNAMIC_FTRACE) += mcount-dyn.o > >>>>>> diff --git a/arch/riscv/kernel/asm-offsets.c > >>>>>> b/arch/riscv/kernel/asm-offsets.c > >>>>>> index df9444397908..d6a75aac1d27 100644 > >>>>>> --- a/arch/riscv/kernel/asm-offsets.c > >>>>>> +++ b/arch/riscv/kernel/asm-offsets.c > >>>>>> @@ -9,6 +9,7 @@ > >>>>>> #include <linux/kbuild.h> > >>>>>> #include <linux/mm.h> > >>>>>> #include <linux/sched.h> > >>>>>> +#include <linux/suspend.h> > >>>>>> #include <asm/kvm_host.h> > >>>>>> #include <asm/thread_info.h> > >>>>>> #include <asm/ptrace.h> > >>>>>> @@ -116,6 +117,10 @@ void asm_offsets(void) > >>>>>> OFFSET(SUSPEND_CONTEXT_REGS, suspend_context, regs); > >>>>>> + OFFSET(HIBERN_PBE_ADDR, pbe, address); > >>>>>> + OFFSET(HIBERN_PBE_ORIG, pbe, orig_address); > >>>>>> + OFFSET(HIBERN_PBE_NEXT, pbe, next); > >>>>>> + > >>>>>> OFFSET(KVM_ARCH_GUEST_ZERO, kvm_vcpu_arch, guest_context.zero); > >>>>>> OFFSET(KVM_ARCH_GUEST_RA, kvm_vcpu_arch, guest_context.ra); > >>>>>> OFFSET(KVM_ARCH_GUEST_SP, kvm_vcpu_arch, guest_context.sp); > >>>>>> diff --git a/arch/riscv/kernel/hibernate-asm.S > >>>>>> b/arch/riscv/kernel/hibernate-asm.S > >>>>>> new file mode 100644 > >>>>>> index 000000000000..effaf5ca5da0 > >>>>>> --- /dev/null > >>>>>> +++ b/arch/riscv/kernel/hibernate-asm.S > >>>>>> @@ -0,0 +1,77 @@ > >>>>>> +/* SPDX-License-Identifier: GPL-2.0-only */ > >>>>>> +/* > >>>>>> + * Hibernation low level support for RISCV. > >>>>>> + * > >>>>>> + * Copyright (C) 2023 StarFive Technology Co., Ltd. > >>>>>> + * > >>>>>> + * Author: Jee Heng Sia <jeeheng.sia@starfivetech.com> > >>>>>> + */ > >>>>>> + > >>>>>> +#include <asm/asm.h> > >>>>>> +#include <asm/asm-offsets.h> > >>>>>> +#include <asm/assembler.h> > >>>>>> +#include <asm/csr.h> > >>>>>> + > >>>>>> +#include <linux/linkage.h> > >>>>>> + > >>>>>> +/* > >>>>>> + * int __hibernate_cpu_resume(void) > >>>>>> + * Switch back to the hibernated image's page table prior to > >>>>>> restoring the CPU > >>>>>> + * context. > >>>>>> + * > >>>>>> + * Always returns 0 > >>>>>> + */ > >>>>>> +ENTRY(__hibernate_cpu_resume) > >>>>>> + /* switch to hibernated image's page table. */ > >>>>>> + csrw CSR_SATP, s0 > >>>>>> + sfence.vma > >>>>>> + > >>>>>> + REG_L a0, hibernate_cpu_context > >>>>>> + > >>>>>> + suspend_restore_csrs > >>>>>> + suspend_restore_regs > >>>>>> + > >>>>>> + /* Return zero value. */ > >>>>>> + mv a0, zero > >>>>>> + > >>>>>> + ret > >>>>>> +END(__hibernate_cpu_resume) > >>>>>> + > >>>>>> +/* > >>>>>> + * Prepare to restore the image. > >>>>>> + * a0: satp of saved page tables. > >>>>>> + * a1: satp of temporary page tables. > >>>>>> + * a2: cpu_resume. > >>>>>> + */ > >>>>>> +ENTRY(hibernate_restore_image) > >>>>>> + mv s0, a0 > >>>>>> + mv s1, a1 > >>>>>> + mv s2, a2 > >>>>>> + REG_L s4, restore_pblist > >>>>>> + REG_L a1, relocated_restore_code > >>>>>> + > >>>>>> + jalr a1 > >>>>>> +END(hibernate_restore_image) > >>>>>> + > >>>>>> +/* > >>>>>> + * The below code will be executed from a 'safe' page. > >>>>>> + * It first switches to the temporary page table, then starts to > >>>>>> copy the pages > >>>>>> + * back to the original memory location. Finally, it jumps to > >>>>>> __hibernate_cpu_resume() > >>>>>> + * to restore the CPU context. > >>>>>> + */ > >>>>>> +ENTRY(hibernate_core_restore_code) > >>>>>> + /* switch to temp page table. */ > >>>>>> + csrw satp, s1 > >>>>>> + sfence.vma > >>>>>> +.Lcopy: > >>>>>> + /* The below code will restore the hibernated image. */ > >>>>>> + REG_L a1, HIBERN_PBE_ADDR(s4) > >>>>>> + REG_L a0, HIBERN_PBE_ORIG(s4) > >>>>>> + > >>>>>> + copy_page a0, a1 > >>>>>> + > >>>>>> + REG_L s4, HIBERN_PBE_NEXT(s4) > >>>>>> + bnez s4, .Lcopy > >>>>>> + > >>>>>> + jalr s2 > >>>>>> +END(hibernate_core_restore_code) > >>>>>> diff --git a/arch/riscv/kernel/hibernate.c > >>>>>> b/arch/riscv/kernel/hibernate.c > >>>>>> new file mode 100644 > >>>>>> index 000000000000..8fab50122e54 > >>>>>> --- /dev/null > >>>>>> +++ b/arch/riscv/kernel/hibernate.c > >>>>>> @@ -0,0 +1,424 @@ > >>>>>> +// SPDX-License-Identifier: GPL-2.0-only > >>>>>> +/* > >>>>>> + * Hibernation support for RISCV > >>>>>> + * > >>>>>> + * Copyright (C) 2023 StarFive Technology Co., Ltd. > >>>>>> + * > >>>>>> + * Author: Jee Heng Sia <jeeheng.sia@starfivetech.com> > >>>>>> + */ > >>>>>> + > >>>>>> +#include <asm/barrier.h> > >>>>>> +#include <asm/cacheflush.h> > >>>>>> +#include <asm/mmu_context.h> > >>>>>> +#include <asm/page.h> > >>>>>> +#include <asm/pgalloc.h> > >>>>>> +#include <asm/pgtable.h> > >>>>>> +#include <asm/sections.h> > >>>>>> +#include <asm/set_memory.h> > >>>>>> +#include <asm/smp.h> > >>>>>> +#include <asm/suspend.h> > >>>>>> + > >>>>>> +#include <linux/cpu.h> > >>>>>> +#include <linux/memblock.h> > >>>>>> +#include <linux/pm.h> > >>>>>> +#include <linux/sched.h> > >>>>>> +#include <linux/suspend.h> > >>>>>> +#include <linux/utsname.h> > >>>>>> + > >>>>>> +/* The logical cpu number we should resume on, initialised to a > >>>>>> non-cpu number. */ > >>>>>> +static int sleep_cpu = -EINVAL; > >>>>>> + > >>>>>> +/* Pointer to the temporary resume page table. */ > >>>>>> +static pgd_t *resume_pg_dir; > >>>>>> + > >>>>>> +/* CPU context to be saved. */ > >>>>>> +struct suspend_context *hibernate_cpu_context; > >>>>>> +EXPORT_SYMBOL_GPL(hibernate_cpu_context); > >>>>>> + > >>>>>> +unsigned long relocated_restore_code; > >>>>>> +EXPORT_SYMBOL_GPL(relocated_restore_code); > >>>>>> + > >>>>>> +/** > >>>>>> + * struct arch_hibernate_hdr_invariants - container to store kernel > >>>>>> build version. > >>>>>> + * @uts_version: to save the build number and date so that the we do > >>>>>> not resume with > >>>>>> + * a different kernel. > >>>>>> + */ > >>>>>> +struct arch_hibernate_hdr_invariants { > >>>>>> + char uts_version[__NEW_UTS_LEN + 1]; > >>>>>> +}; > >>>>>> + > >>>>>> +/** > >>>>>> + * struct arch_hibernate_hdr - helper parameters that help us to > >>>>>> restore the image. > >>>>>> + * @invariants: container to store kernel build version. > >>>>>> + * @hartid: to make sure same boot_cpu executes the > >>>>>> hibernate/restore code. > >>>>>> + * @saved_satp: original page table used by the hibernated image. > >>>>>> + * @restore_cpu_addr: the kernel's image address to restore the CPU > >>>>>> context. > >>>>>> + */ > >>>>>> +static struct arch_hibernate_hdr { > >>>>>> + struct arch_hibernate_hdr_invariants invariants; > >>>>>> + unsigned long hartid; > >>>>>> + unsigned long saved_satp; > >>>>>> + unsigned long restore_cpu_addr; > >>>>>> +} resume_hdr; > >>>>>> + > >>>>>> +static void arch_hdr_invariants(struct arch_hibernate_hdr_invariants > >>>>>> *i) > >>>>>> +{ > >>>>>> + memset(i, 0, sizeof(*i)); > >>>>>> + memcpy(i->uts_version, init_utsname()->version, > >>>>>> sizeof(i->uts_version)); > >>>>>> +} > >>>>>> + > >>>>>> +/* > >>>>>> + * Check if the given pfn is in the 'nosave' section. > >>>>>> + */ > >>>>>> +int pfn_is_nosave(unsigned long pfn) > >>>>>> +{ > >>>>>> + unsigned long nosave_begin_pfn = sym_to_pfn(&__nosave_begin); > >>>>>> + unsigned long nosave_end_pfn = sym_to_pfn(&__nosave_end - 1); > >>>>>> + > >>>>>> + return ((pfn >= nosave_begin_pfn) && (pfn <= nosave_end_pfn)); > >>>>>> +} > >>>>>> + > >>>>>> +void notrace save_processor_state(void) > >>>>>> +{ > >>>>>> + WARN_ON(num_online_cpus() != 1); > >>>>>> +} > >>>>>> + > >>>>>> +void notrace restore_processor_state(void) > >>>>>> +{ > >>>>>> +} > >>>>>> + > >>>>>> +/* > >>>>>> + * Helper parameters need to be saved to the hibernation image header. > >>>>>> + */ > >>>>>> +int arch_hibernation_header_save(void *addr, unsigned int max_size) > >>>>>> +{ > >>>>>> + struct arch_hibernate_hdr *hdr = addr; > >>>>>> + > >>>>>> + if (max_size < sizeof(*hdr)) > >>>>>> + return -EOVERFLOW; > >>>>>> + > >>>>>> + arch_hdr_invariants(&hdr->invariants); > >>>>>> + > >>>>>> + hdr->hartid = cpuid_to_hartid_map(sleep_cpu); > >>>>>> + hdr->saved_satp = csr_read(CSR_SATP); > >>>>>> + hdr->restore_cpu_addr = (unsigned long)__hibernate_cpu_resume; > >>>>>> + > >>>>>> + return 0; > >>>>>> +} > >>>>>> +EXPORT_SYMBOL_GPL(arch_hibernation_header_save); > >>>>>> + > >>>>>> +/* > >>>>>> + * Retrieve the helper parameters from the hibernation image header. > >>>>>> + */ > >>>>>> +int arch_hibernation_header_restore(void *addr) > >>>>>> +{ > >>>>>> + struct arch_hibernate_hdr_invariants invariants; > >>>>>> + struct arch_hibernate_hdr *hdr = addr; > >>>>>> + int ret = 0; > >>>>>> + > >>>>>> + arch_hdr_invariants(&invariants); > >>>>>> + > >>>>>> + if (memcmp(&hdr->invariants, &invariants, sizeof(invariants))) { > >>>>>> + pr_crit("Hibernate image not generated by this kernel!\n"); > >>>>>> + return -EINVAL; > >>>>>> + } > >>>>>> + > >>>>>> + sleep_cpu = riscv_hartid_to_cpuid(hdr->hartid); > >>>>>> + if (sleep_cpu < 0) { > >>>>>> + pr_crit("Hibernated on a CPU not known to this kernel!\n"); > >>>>>> + sleep_cpu = -EINVAL; > >>>>>> + return -EINVAL; > >>>>>> + } > >>>>>> + > >>>>>> +#ifdef CONFIG_SMP > >>>>>> + ret = bringup_hibernate_cpu(sleep_cpu); > >>>>>> + if (ret) { > >>>>>> + sleep_cpu = -EINVAL; > >>>>>> + return ret; > >>>>>> + } > >>>>>> +#endif > >>>>>> + resume_hdr = *hdr; > >>>>>> + > >>>>>> + return ret; > >>>>>> +} > >>>>>> +EXPORT_SYMBOL_GPL(arch_hibernation_header_restore); > >>>>>> + > >>>>>> +int swsusp_arch_suspend(void) > >>>>>> +{ > >>>>>> + int ret = 0; > >>>>>> + > >>>>>> + if (__cpu_suspend_enter(hibernate_cpu_context)) { > >>>>>> + sleep_cpu = smp_processor_id(); > >>>>>> + suspend_save_csrs(hibernate_cpu_context); > >>>>>> + ret = swsusp_save(); > >>>>>> + } else { > >>>>>> + suspend_restore_csrs(hibernate_cpu_context); > >>>>>> + flush_tlb_all(); > >>>>>> + flush_icache_all(); > >>>>>> + > >>>>>> + /* > >>>>>> + * Tell the hibernation core that we've just restored the > >>>>>> memory. > >>>>>> + */ > >>>>>> + in_suspend = 0; > >>>>>> + sleep_cpu = -EINVAL; > >>>>>> + } > >>>>>> + > >>>>>> + return ret; > >>>>>> +} > >>>>>> + > >>>>>> +static int temp_pgtable_map_pte(pmd_t *dst_pmdp, pmd_t *src_pmdp, > >>>>>> unsigned long start, > >>>>>> + unsigned long end, pgprot_t prot) > >>>>>> +{ > >>>>>> + pte_t *src_ptep; > >>>>>> + pte_t *dst_ptep; > >>>>>> + > >>>>>> + if (pmd_none(READ_ONCE(*dst_pmdp))) { > >>>>>> + dst_ptep = (pte_t *)get_safe_page(GFP_ATOMIC); > >>>>>> + if (!dst_ptep) > >>>>>> + return -ENOMEM; > >>>>>> + > >>>>>> + pmd_populate_kernel(NULL, dst_pmdp, dst_ptep); > >>>>>> + } > >>>>>> + > >>>>>> + dst_ptep = pte_offset_kernel(dst_pmdp, start); > >>>>>> + src_ptep = pte_offset_kernel(src_pmdp, start); > >>>>>> + > >>>>>> + do { > >>>>>> + pte_t pte = READ_ONCE(*src_ptep); > >>>>>> + > >>>>>> + if (pte_present(pte)) > >>>>>> + set_pte(dst_ptep, __pte(pte_val(pte) | pgprot_val(prot))); > >>>>>> + } while (dst_ptep++, src_ptep++, start += PAGE_SIZE, start < end); > >>>>>> + > >>>>>> + return 0; > >>>>>> +} > >>>>>> + > >>>>>> +static int temp_pgtable_map_pmd(pud_t *dst_pudp, pud_t *src_pudp, > >>>>>> unsigned long start, > >>>>>> + unsigned long end, pgprot_t prot) > >>>>>> +{ > >>>>>> + unsigned long next; > >>>>>> + unsigned long ret; > >>>>>> + pmd_t *src_pmdp; > >>>>>> + pmd_t *dst_pmdp; > >>>>>> + > >>>>>> + if (pud_none(READ_ONCE(*dst_pudp))) { > >>>>>> + dst_pmdp = (pmd_t *)get_safe_page(GFP_ATOMIC); > >>>>>> + if (!dst_pmdp) > >>>>>> + return -ENOMEM; > >>>>>> + > >>>>>> + pud_populate(NULL, dst_pudp, dst_pmdp); > >>>>>> + } > >>>>>> + > >>>>>> + dst_pmdp = pmd_offset(dst_pudp, start); > >>>>>> + src_pmdp = pmd_offset(src_pudp, start); > >>>>>> + > >>>>>> + do { > >>>>>> + pmd_t pmd = READ_ONCE(*src_pmdp); > >>>>>> + > >>>>>> + next = pmd_addr_end(start, end); > >>>>>> + > >>>>>> + if (pmd_none(pmd)) > >>>>>> + continue; > >>>>>> + > >>>>>> + if (pmd_leaf(pmd)) { > >>>>>> + set_pmd(dst_pmdp, __pmd(pmd_val(pmd) | pgprot_val(prot))); > >>>>>> + } else { > >>>>>> + ret = temp_pgtable_map_pte(dst_pmdp, src_pmdp, start, > >>>>>> next, prot); > >>>>>> + if (ret) > >>>>>> + return -ENOMEM; > >>>>>> + } > >>>>>> + } while (dst_pmdp++, src_pmdp++, start = next, start != end); > >>>>>> + > >>>>>> + return 0; > >>>>>> +} > >>>>>> + > >>>>>> +static int temp_pgtable_map_pud(p4d_t *dst_p4dp, p4d_t *src_p4dp, > >>>>>> unsigned long start, > >>>>>> + unsigned long end, pgprot_t prot) > >>>>>> +{ > >>>>>> + unsigned long next; > >>>>>> + unsigned long ret; > >>>>>> + pud_t *dst_pudp; > >>>>>> + pud_t *src_pudp; > >>>>>> + > >>>>>> + if (p4d_none(READ_ONCE(*dst_p4dp))) { > >>>>>> + dst_pudp = (pud_t *)get_safe_page(GFP_ATOMIC); > >>>>>> + if (!dst_pudp) > >>>>>> + return -ENOMEM; > >>>>>> + > >>>>>> + p4d_populate(NULL, dst_p4dp, dst_pudp); > >>>>>> + } > >>>>>> + > >>>>>> + dst_pudp = pud_offset(dst_p4dp, start); > >>>>>> + src_pudp = pud_offset(src_p4dp, start); > >>>>>> + > >>>>>> + do { > >>>>>> + pud_t pud = READ_ONCE(*src_pudp); > >>>>>> + > >>>>>> + next = pud_addr_end(start, end); > >>>>>> + > >>>>>> + if (pud_none(pud)) > >>>>>> + continue; > >>>>>> + > >>>>>> + if (pud_leaf(pud)) { > >>>>>> + set_pud(dst_pudp, __pud(pud_val(pud) | pgprot_val(prot))); > >>>>>> + } else { > >>>>>> + ret = temp_pgtable_map_pmd(dst_pudp, src_pudp, start, > >>>>>> next, prot); > >>>>>> + if (ret) > >>>>>> + return -ENOMEM; > >>>>>> + } > >>>>>> + } while (dst_pudp++, src_pudp++, start = next, start != end); > >>>>>> + > >>>>>> + return 0; > >>>>>> +} > >>>>>> + > >>>>>> +static int temp_pgtable_map_p4d(pgd_t *dst_pgdp, pgd_t *src_pgdp, > >>>>>> unsigned long start, > >>>>>> + unsigned long end, pgprot_t prot) > >>>>>> +{ > >>>>>> + unsigned long next; > >>>>>> + unsigned long ret; > >>>>>> + p4d_t *dst_p4dp; > >>>>>> + p4d_t *src_p4dp; > >>>>>> + > >>>>>> + if (pgd_none(READ_ONCE(*dst_pgdp))) { > >>>>>> + dst_p4dp = (p4d_t *)get_safe_page(GFP_ATOMIC); > >>>>>> + if (!dst_p4dp) > >>>>>> + return -ENOMEM; > >>>>>> + > >>>>>> + pgd_populate(NULL, dst_pgdp, dst_p4dp); > >>>>>> + } > >>>>>> + > >>>>>> + dst_p4dp = p4d_offset(dst_pgdp, start); > >>>>>> + src_p4dp = p4d_offset(src_pgdp, start); > >>>>>> + > >>>>>> + do { > >>>>>> + p4d_t p4d = READ_ONCE(*src_p4dp); > >>>>>> + > >>>>>> + next = p4d_addr_end(start, end); > >>>>>> + > >>>>>> + if (p4d_none(p4d)) > >>>>>> + continue; > >>>>>> + > >>>>>> + if (p4d_leaf(p4d)) { > >>>>>> + set_p4d(dst_p4dp, __p4d(p4d_val(p4d) | pgprot_val(prot))); > >>>>>> + } else { > >>>>>> + ret = temp_pgtable_map_pud(dst_p4dp, src_p4dp, start, > >>>>>> next, prot); > >>>>>> + if (ret) > >>>>>> + return -ENOMEM; > >>>>>> + } > >>>>>> + } while (dst_p4dp++, src_p4dp++, start = next, start != end); > >>>>>> + > >>>>>> + return 0; > >>>>>> +} > >>>>>> + > >>>>>> +static int temp_pgtable_mapping(pgd_t *pgdp, unsigned long start, > >>>>>> unsigned long end, pgprot_t prot) > >>>>>> +{ > >>>>>> + pgd_t *dst_pgdp = pgd_offset_pgd(pgdp, start); > >>>>>> + pgd_t *src_pgdp = pgd_offset_k(start); > >>>>>> + unsigned long next; > >>>>>> + unsigned long ret; > >>>>>> + > >>>>>> + do { > >>>>>> + pgd_t pgd = READ_ONCE(*src_pgdp); > >>>>>> + > >>>>>> + next = pgd_addr_end(start, end); > >>>>>> + > >>>>>> + if (pgd_none(pgd)) > >>>>>> + continue; > >>>>>> + > >>>>>> + if (pgd_leaf(pgd)) { > >>>>>> + set_pgd(dst_pgdp, __pgd(pgd_val(pgd) | pgprot_val(prot))); > >>>>>> + } else { > >>>>>> + ret = temp_pgtable_map_p4d(dst_pgdp, src_pgdp, start, > >>>>>> next, prot); > >>>>>> + if (ret) > >>>>>> + return -ENOMEM; > >>>>>> + } > >>>>>> + } while (dst_pgdp++, src_pgdp++, start = next, start != end); > >>>>>> + > >>>>>> + return 0; > >>>>>> +} > >>>>>> + > >>>>>> +static unsigned long relocate_restore_code(void) > >>>>>> +{ > >>>>>> + void *page = (void *)get_safe_page(GFP_ATOMIC); > >>>>>> + > >>>>>> + if (!page) > >>>>>> + return -ENOMEM; > >>>>>> + > >>>>>> + copy_page(page, hibernate_core_restore_code); > >>>>>> + > >>>>>> + /* Make the page containing the relocated code executable. */ > >>>>>> + set_memory_x((unsigned long)page, 1); > >>>>>> + > >>>>>> + return (unsigned long)page; > >>>>>> +} > >>>>>> + > >>>>>> +int swsusp_arch_resume(void) > >>>>>> +{ > >>>>>> + unsigned long end = (unsigned long)pfn_to_virt(max_low_pfn); > >>>>>> + unsigned long start = PAGE_OFFSET; > >>>>>> + int ret; > >>>>>> + > >>>>>> + /* > >>>>>> + * Memory allocated by get_safe_page() will be dealt with by the > >>>>>> hibernation core, > >>>>>> + * we don't need to free it here. > >>>>>> + */ > >>>>>> + resume_pg_dir = (pgd_t *)get_safe_page(GFP_ATOMIC); > >>>>>> + if (!resume_pg_dir) > >>>>>> + return -ENOMEM; > >>>>>> + > >>>>>> + /* > >>>>>> + * Create a temporary page table and map the whole linear region > >>>>>> as executable and > >>>>>> + * writable. > >>>>>> + */ > >>>>>> + ret = temp_pgtable_mapping(resume_pg_dir, start, end, > >>>>>> __pgprot(_PAGE_WRITE | _PAGE_EXEC)); > >>>>> Hmm weird protections, PAGE_PRESENT and PAGE_READ are not even set. > >>>>> Shouldn't you use PAGE_KERNEL_EXEC instead? > >>>> Forget this one, that's the protections you add, so that's good :) > >>> OK. > >>>>>> + if (ret) > >>>>>> + return ret; > >>>>>> + > >>>>>> + /* Move the restore code to a new page so that it doesn't get > >>>>>> overwritten by itself. */ > >>>>>> + relocated_restore_code = relocate_restore_code(); > >>>>>> + if (relocated_restore_code == -ENOMEM) > >>>>>> + return -ENOMEM; > >>>>>> + > >>>>>> + /* > >>>>>> + * Map the __hibernate_cpu_resume() address to the temporary > >>>>>> page table so that the > >>>>>> + * restore code can jumps to it after finished restore the > >>>>>> image. The next execution > >>>>>> + * code doesn't find itself in a different address space after > >>>>>> switching over to the > >>>>>> + * original page table used by the hibernated image. > >>>>>> + * > >>>>>> + */ > >>>>>> + start = (unsigned long)resume_hdr.restore_cpu_addr; > >>>>>> + end = start + PAGE_SIZE; > >>>>>> + > >>>>>> + ret = temp_pgtable_mapping(resume_pg_dir, start, end, > >>>>>> __pgprot(_PAGE_WRITE | _PAGE_EXEC)); > >>>>>> + if (ret) > >>>>>> + return ret; > >>>>> You set the linear mapping as executable so no need for this. > >>> We need this mapping because the kernel addresses and the linear addresses in riscv-64bit are in different boundary. For > example: > >>> Linear address start with 0xff60000000000000 > >>> Kernel address start with 0xffffffff80000000 > >> > >> Ok I understand now that you need to map this into the temporary page > >> table: one nit though, since you copy the kernel mapping, no need to use > >> PAGE_WRITE | PAGE_EXEC, the protections are already correct. > > Here is the thing. For riscv_64bit, I am agreed with you that we don’t need to use PAGE_WRITE | PAGE_EXEC. > > However, we need to use PAGE_WRITE | PAGE_EXEC for riscv_32bit because the linear addresses and kernel addresses are similar. > The attribute becomes not writable if following the "original" page table. I am thinking to use #ifdef config_32bit to split the path for > 64bit and 32bit but I just afraid the linear addresses and kernel addresses might change again in the future. To be on the safe side, I > let both riscv_32bit & riscv_64bit to use the PAGE_WRITE | PAGE_EXEC. > > > Indeed, you really thought about everything, well done! I would add a > comment that explains just that above, no need for other ifdefs IMO. Sure, can add comment to explain the above situation. > > > > For example: > > Linear address and kernel address for 32bit started with 0xc0000000 > > Linear address for 64bit start with 0xff60000000000000 > > Kernel address for 64bit start with 0xffffffff80000000 > > > > Please let me know your thoughts. > > > >> Thanks, > >> > >> Alex > >> > >> > >>>>>> + > >>>>>> + hibernate_restore_image(resume_hdr.saved_satp, > >>>>>> (PFN_DOWN(__pa(resume_pg_dir)) | satp_mode), > >>>>>> + resume_hdr.restore_cpu_addr); > >>>>>> + > >>>>>> + return 0; > >>>>>> +} > >>>>>> + > >>>>>> +#ifdef CONFIG_PM_SLEEP_SMP > >>>>>> +int hibernate_resume_nonboot_cpu_disable(void) > >>>>>> +{ > >>>>>> + if (sleep_cpu < 0) { > >>>>>> + pr_err("Failing to resume from hibernate on an unknown CPU\n"); > >>>>>> + return -ENODEV; > >>>>>> + } > >>>>>> + > >>>>>> + return freeze_secondary_cpus(sleep_cpu); > >>>>>> +} > >>>>>> +#endif > >>>>>> + > >>>>>> +static int __init riscv_hibernate_init(void) > >>>>>> +{ > >>>>>> + hibernate_cpu_context = kzalloc(sizeof(*hibernate_cpu_context), > >>>>>> GFP_KERNEL); > >>>>>> + > >>>>>> + if (WARN_ON(!hibernate_cpu_context)) > >>>>>> + return -ENOMEM; > >>>>>> + > >>>>>> + return 0; > >>>>>> +} > >>>>>> + > >>>>>> +early_initcall(riscv_hibernate_init); > >>>>> _______________________________________________ > >>>>> linux-riscv mailing list > >>>>> linux-riscv@lists.infradead.org > >>>>> http://lists.infradead.org/mailman/listinfo/linux-riscv > >>> _______________________________________________ > >>> linux-riscv mailing list > >>> linux-riscv@lists.infradead.org > >>> http://lists.infradead.org/mailman/listinfo/linux-riscv
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig index c5e42cc37604..473c2a1a6884 100644 --- a/arch/riscv/Kconfig +++ b/arch/riscv/Kconfig @@ -707,6 +707,12 @@ menu "Power management options" source "kernel/power/Kconfig" +config ARCH_HIBERNATION_POSSIBLE + def_bool y + +config ARCH_HIBERNATION_HEADER + def_bool HIBERNATION + endmenu # "Power management options" menu "CPU Power Management" diff --git a/arch/riscv/include/asm/assembler.h b/arch/riscv/include/asm/assembler.h index ba59d38f8937..44b1457d3e95 100644 --- a/arch/riscv/include/asm/assembler.h +++ b/arch/riscv/include/asm/assembler.h @@ -59,4 +59,24 @@ REG_L s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0) .endm +/* + * copy_page - copy 1 page (4KB) of data from source to destination + * @a0 - destination + * @a1 - source + */ + .macro copy_page a0, a1 + lui a2, 0x1 + add a2, a2, a0 +1 : + REG_L t0, 0(a1) + REG_L t1, SZREG(a1) + + REG_S t0, 0(a0) + REG_S t1, SZREG(a0) + + addi a0, a0, 2 * SZREG + addi a1, a1, 2 * SZREG + bne a2, a0, 1b + .endm + #endif /* __ASM_ASSEMBLER_H */ diff --git a/arch/riscv/include/asm/suspend.h b/arch/riscv/include/asm/suspend.h index 67e047445662..02f87867389a 100644 --- a/arch/riscv/include/asm/suspend.h +++ b/arch/riscv/include/asm/suspend.h @@ -21,6 +21,11 @@ struct suspend_context { #endif }; +/* + * Used by hibernation core and cleared during resume sequence + */ +extern int in_suspend; + /* Low-level CPU suspend entry function */ int __cpu_suspend_enter(struct suspend_context *context); @@ -36,4 +41,18 @@ int __cpu_resume_enter(unsigned long hartid, unsigned long context); /* Used to save and restore the CSRs */ void suspend_save_csrs(struct suspend_context *context); void suspend_restore_csrs(struct suspend_context *context); + +/* Low-level API to support hibernation */ +int swsusp_arch_suspend(void); +int swsusp_arch_resume(void); +int arch_hibernation_header_save(void *addr, unsigned int max_size); +int arch_hibernation_header_restore(void *addr); +int __hibernate_cpu_resume(void); + +/* Used to resume on the CPU we hibernated on */ +int hibernate_resume_nonboot_cpu_disable(void); + +asmlinkage void hibernate_restore_image(unsigned long resume_satp, unsigned long satp_temp, + unsigned long cpu_resume); +asmlinkage int hibernate_core_restore_code(void); #endif diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile index 4cf303a779ab..daab341d55e4 100644 --- a/arch/riscv/kernel/Makefile +++ b/arch/riscv/kernel/Makefile @@ -64,6 +64,7 @@ obj-$(CONFIG_MODULES) += module.o obj-$(CONFIG_MODULE_SECTIONS) += module-sections.o obj-$(CONFIG_CPU_PM) += suspend_entry.o suspend.o +obj-$(CONFIG_HIBERNATION) += hibernate.o hibernate-asm.o obj-$(CONFIG_FUNCTION_TRACER) += mcount.o ftrace.o obj-$(CONFIG_DYNAMIC_FTRACE) += mcount-dyn.o diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c index df9444397908..d6a75aac1d27 100644 --- a/arch/riscv/kernel/asm-offsets.c +++ b/arch/riscv/kernel/asm-offsets.c @@ -9,6 +9,7 @@ #include <linux/kbuild.h> #include <linux/mm.h> #include <linux/sched.h> +#include <linux/suspend.h> #include <asm/kvm_host.h> #include <asm/thread_info.h> #include <asm/ptrace.h> @@ -116,6 +117,10 @@ void asm_offsets(void) OFFSET(SUSPEND_CONTEXT_REGS, suspend_context, regs); + OFFSET(HIBERN_PBE_ADDR, pbe, address); + OFFSET(HIBERN_PBE_ORIG, pbe, orig_address); + OFFSET(HIBERN_PBE_NEXT, pbe, next); + OFFSET(KVM_ARCH_GUEST_ZERO, kvm_vcpu_arch, guest_context.zero); OFFSET(KVM_ARCH_GUEST_RA, kvm_vcpu_arch, guest_context.ra); OFFSET(KVM_ARCH_GUEST_SP, kvm_vcpu_arch, guest_context.sp); diff --git a/arch/riscv/kernel/hibernate-asm.S b/arch/riscv/kernel/hibernate-asm.S new file mode 100644 index 000000000000..effaf5ca5da0 --- /dev/null +++ b/arch/riscv/kernel/hibernate-asm.S @@ -0,0 +1,77 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Hibernation low level support for RISCV. + * + * Copyright (C) 2023 StarFive Technology Co., Ltd. + * + * Author: Jee Heng Sia <jeeheng.sia@starfivetech.com> + */ + +#include <asm/asm.h> +#include <asm/asm-offsets.h> +#include <asm/assembler.h> +#include <asm/csr.h> + +#include <linux/linkage.h> + +/* + * int __hibernate_cpu_resume(void) + * Switch back to the hibernated image's page table prior to restoring the CPU + * context. + * + * Always returns 0 + */ +ENTRY(__hibernate_cpu_resume) + /* switch to hibernated image's page table. */ + csrw CSR_SATP, s0 + sfence.vma + + REG_L a0, hibernate_cpu_context + + suspend_restore_csrs + suspend_restore_regs + + /* Return zero value. */ + mv a0, zero + + ret +END(__hibernate_cpu_resume) + +/* + * Prepare to restore the image. + * a0: satp of saved page tables. + * a1: satp of temporary page tables. + * a2: cpu_resume. + */ +ENTRY(hibernate_restore_image) + mv s0, a0 + mv s1, a1 + mv s2, a2 + REG_L s4, restore_pblist + REG_L a1, relocated_restore_code + + jalr a1 +END(hibernate_restore_image) + +/* + * The below code will be executed from a 'safe' page. + * It first switches to the temporary page table, then starts to copy the pages + * back to the original memory location. Finally, it jumps to __hibernate_cpu_resume() + * to restore the CPU context. + */ +ENTRY(hibernate_core_restore_code) + /* switch to temp page table. */ + csrw satp, s1 + sfence.vma +.Lcopy: + /* The below code will restore the hibernated image. */ + REG_L a1, HIBERN_PBE_ADDR(s4) + REG_L a0, HIBERN_PBE_ORIG(s4) + + copy_page a0, a1 + + REG_L s4, HIBERN_PBE_NEXT(s4) + bnez s4, .Lcopy + + jalr s2 +END(hibernate_core_restore_code) diff --git a/arch/riscv/kernel/hibernate.c b/arch/riscv/kernel/hibernate.c new file mode 100644 index 000000000000..8fab50122e54 --- /dev/null +++ b/arch/riscv/kernel/hibernate.c @@ -0,0 +1,424 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Hibernation support for RISCV + * + * Copyright (C) 2023 StarFive Technology Co., Ltd. + * + * Author: Jee Heng Sia <jeeheng.sia@starfivetech.com> + */ + +#include <asm/barrier.h> +#include <asm/cacheflush.h> +#include <asm/mmu_context.h> +#include <asm/page.h> +#include <asm/pgalloc.h> +#include <asm/pgtable.h> +#include <asm/sections.h> +#include <asm/set_memory.h> +#include <asm/smp.h> +#include <asm/suspend.h> + +#include <linux/cpu.h> +#include <linux/memblock.h> +#include <linux/pm.h> +#include <linux/sched.h> +#include <linux/suspend.h> +#include <linux/utsname.h> + +/* The logical cpu number we should resume on, initialised to a non-cpu number. */ +static int sleep_cpu = -EINVAL; + +/* Pointer to the temporary resume page table. */ +static pgd_t *resume_pg_dir; + +/* CPU context to be saved. */ +struct suspend_context *hibernate_cpu_context; +EXPORT_SYMBOL_GPL(hibernate_cpu_context); + +unsigned long relocated_restore_code; +EXPORT_SYMBOL_GPL(relocated_restore_code); + +/** + * struct arch_hibernate_hdr_invariants - container to store kernel build version. + * @uts_version: to save the build number and date so that the we do not resume with + * a different kernel. + */ +struct arch_hibernate_hdr_invariants { + char uts_version[__NEW_UTS_LEN + 1]; +}; + +/** + * struct arch_hibernate_hdr - helper parameters that help us to restore the image. + * @invariants: container to store kernel build version. + * @hartid: to make sure same boot_cpu executes the hibernate/restore code. + * @saved_satp: original page table used by the hibernated image. + * @restore_cpu_addr: the kernel's image address to restore the CPU context. + */ +static struct arch_hibernate_hdr { + struct arch_hibernate_hdr_invariants invariants; + unsigned long hartid; + unsigned long saved_satp; + unsigned long restore_cpu_addr; +} resume_hdr; + +static void arch_hdr_invariants(struct arch_hibernate_hdr_invariants *i) +{ + memset(i, 0, sizeof(*i)); + memcpy(i->uts_version, init_utsname()->version, sizeof(i->uts_version)); +} + +/* + * Check if the given pfn is in the 'nosave' section. + */ +int pfn_is_nosave(unsigned long pfn) +{ + unsigned long nosave_begin_pfn = sym_to_pfn(&__nosave_begin); + unsigned long nosave_end_pfn = sym_to_pfn(&__nosave_end - 1); + + return ((pfn >= nosave_begin_pfn) && (pfn <= nosave_end_pfn)); +} + +void notrace save_processor_state(void) +{ + WARN_ON(num_online_cpus() != 1); +} + +void notrace restore_processor_state(void) +{ +} + +/* + * Helper parameters need to be saved to the hibernation image header. + */ +int arch_hibernation_header_save(void *addr, unsigned int max_size) +{ + struct arch_hibernate_hdr *hdr = addr; + + if (max_size < sizeof(*hdr)) + return -EOVERFLOW; + + arch_hdr_invariants(&hdr->invariants); + + hdr->hartid = cpuid_to_hartid_map(sleep_cpu); + hdr->saved_satp = csr_read(CSR_SATP); + hdr->restore_cpu_addr = (unsigned long)__hibernate_cpu_resume; + + return 0; +} +EXPORT_SYMBOL_GPL(arch_hibernation_header_save); + +/* + * Retrieve the helper parameters from the hibernation image header. + */ +int arch_hibernation_header_restore(void *addr) +{ + struct arch_hibernate_hdr_invariants invariants; + struct arch_hibernate_hdr *hdr = addr; + int ret = 0; + + arch_hdr_invariants(&invariants); + + if (memcmp(&hdr->invariants, &invariants, sizeof(invariants))) { + pr_crit("Hibernate image not generated by this kernel!\n"); + return -EINVAL; + } + + sleep_cpu = riscv_hartid_to_cpuid(hdr->hartid); + if (sleep_cpu < 0) { + pr_crit("Hibernated on a CPU not known to this kernel!\n"); + sleep_cpu = -EINVAL; + return -EINVAL; + } + +#ifdef CONFIG_SMP + ret = bringup_hibernate_cpu(sleep_cpu); + if (ret) { + sleep_cpu = -EINVAL; + return ret; + } +#endif + resume_hdr = *hdr; + + return ret; +} +EXPORT_SYMBOL_GPL(arch_hibernation_header_restore); + +int swsusp_arch_suspend(void) +{ + int ret = 0; + + if (__cpu_suspend_enter(hibernate_cpu_context)) { + sleep_cpu = smp_processor_id(); + suspend_save_csrs(hibernate_cpu_context); + ret = swsusp_save(); + } else { + suspend_restore_csrs(hibernate_cpu_context); + flush_tlb_all(); + flush_icache_all(); + + /* + * Tell the hibernation core that we've just restored the memory. + */ + in_suspend = 0; + sleep_cpu = -EINVAL; + } + + return ret; +} + +static int temp_pgtable_map_pte(pmd_t *dst_pmdp, pmd_t *src_pmdp, unsigned long start, + unsigned long end, pgprot_t prot) +{ + pte_t *src_ptep; + pte_t *dst_ptep; + + if (pmd_none(READ_ONCE(*dst_pmdp))) { + dst_ptep = (pte_t *)get_safe_page(GFP_ATOMIC); + if (!dst_ptep) + return -ENOMEM; + + pmd_populate_kernel(NULL, dst_pmdp, dst_ptep); + } + + dst_ptep = pte_offset_kernel(dst_pmdp, start); + src_ptep = pte_offset_kernel(src_pmdp, start); + + do { + pte_t pte = READ_ONCE(*src_ptep); + + if (pte_present(pte)) + set_pte(dst_ptep, __pte(pte_val(pte) | pgprot_val(prot))); + } while (dst_ptep++, src_ptep++, start += PAGE_SIZE, start < end); + + return 0; +} + +static int temp_pgtable_map_pmd(pud_t *dst_pudp, pud_t *src_pudp, unsigned long start, + unsigned long end, pgprot_t prot) +{ + unsigned long next; + unsigned long ret; + pmd_t *src_pmdp; + pmd_t *dst_pmdp; + + if (pud_none(READ_ONCE(*dst_pudp))) { + dst_pmdp = (pmd_t *)get_safe_page(GFP_ATOMIC); + if (!dst_pmdp) + return -ENOMEM; + + pud_populate(NULL, dst_pudp, dst_pmdp); + } + + dst_pmdp = pmd_offset(dst_pudp, start); + src_pmdp = pmd_offset(src_pudp, start); + + do { + pmd_t pmd = READ_ONCE(*src_pmdp); + + next = pmd_addr_end(start, end); + + if (pmd_none(pmd)) + continue; + + if (pmd_leaf(pmd)) { + set_pmd(dst_pmdp, __pmd(pmd_val(pmd) | pgprot_val(prot))); + } else { + ret = temp_pgtable_map_pte(dst_pmdp, src_pmdp, start, next, prot); + if (ret) + return -ENOMEM; + } + } while (dst_pmdp++, src_pmdp++, start = next, start != end); + + return 0; +} + +static int temp_pgtable_map_pud(p4d_t *dst_p4dp, p4d_t *src_p4dp, unsigned long start, + unsigned long end, pgprot_t prot) +{ + unsigned long next; + unsigned long ret; + pud_t *dst_pudp; + pud_t *src_pudp; + + if (p4d_none(READ_ONCE(*dst_p4dp))) { + dst_pudp = (pud_t *)get_safe_page(GFP_ATOMIC); + if (!dst_pudp) + return -ENOMEM; + + p4d_populate(NULL, dst_p4dp, dst_pudp); + } + + dst_pudp = pud_offset(dst_p4dp, start); + src_pudp = pud_offset(src_p4dp, start); + + do { + pud_t pud = READ_ONCE(*src_pudp); + + next = pud_addr_end(start, end); + + if (pud_none(pud)) + continue; + + if (pud_leaf(pud)) { + set_pud(dst_pudp, __pud(pud_val(pud) | pgprot_val(prot))); + } else { + ret = temp_pgtable_map_pmd(dst_pudp, src_pudp, start, next, prot); + if (ret) + return -ENOMEM; + } + } while (dst_pudp++, src_pudp++, start = next, start != end); + + return 0; +} + +static int temp_pgtable_map_p4d(pgd_t *dst_pgdp, pgd_t *src_pgdp, unsigned long start, + unsigned long end, pgprot_t prot) +{ + unsigned long next; + unsigned long ret; + p4d_t *dst_p4dp; + p4d_t *src_p4dp; + + if (pgd_none(READ_ONCE(*dst_pgdp))) { + dst_p4dp = (p4d_t *)get_safe_page(GFP_ATOMIC); + if (!dst_p4dp) + return -ENOMEM; + + pgd_populate(NULL, dst_pgdp, dst_p4dp); + } + + dst_p4dp = p4d_offset(dst_pgdp, start); + src_p4dp = p4d_offset(src_pgdp, start); + + do { + p4d_t p4d = READ_ONCE(*src_p4dp); + + next = p4d_addr_end(start, end); + + if (p4d_none(p4d)) + continue; + + if (p4d_leaf(p4d)) { + set_p4d(dst_p4dp, __p4d(p4d_val(p4d) | pgprot_val(prot))); + } else { + ret = temp_pgtable_map_pud(dst_p4dp, src_p4dp, start, next, prot); + if (ret) + return -ENOMEM; + } + } while (dst_p4dp++, src_p4dp++, start = next, start != end); + + return 0; +} + +static int temp_pgtable_mapping(pgd_t *pgdp, unsigned long start, unsigned long end, pgprot_t prot) +{ + pgd_t *dst_pgdp = pgd_offset_pgd(pgdp, start); + pgd_t *src_pgdp = pgd_offset_k(start); + unsigned long next; + unsigned long ret; + + do { + pgd_t pgd = READ_ONCE(*src_pgdp); + + next = pgd_addr_end(start, end); + + if (pgd_none(pgd)) + continue; + + if (pgd_leaf(pgd)) { + set_pgd(dst_pgdp, __pgd(pgd_val(pgd) | pgprot_val(prot))); + } else { + ret = temp_pgtable_map_p4d(dst_pgdp, src_pgdp, start, next, prot); + if (ret) + return -ENOMEM; + } + } while (dst_pgdp++, src_pgdp++, start = next, start != end); + + return 0; +} + +static unsigned long relocate_restore_code(void) +{ + void *page = (void *)get_safe_page(GFP_ATOMIC); + + if (!page) + return -ENOMEM; + + copy_page(page, hibernate_core_restore_code); + + /* Make the page containing the relocated code executable. */ + set_memory_x((unsigned long)page, 1); + + return (unsigned long)page; +} + +int swsusp_arch_resume(void) +{ + unsigned long end = (unsigned long)pfn_to_virt(max_low_pfn); + unsigned long start = PAGE_OFFSET; + int ret; + + /* + * Memory allocated by get_safe_page() will be dealt with by the hibernation core, + * we don't need to free it here. + */ + resume_pg_dir = (pgd_t *)get_safe_page(GFP_ATOMIC); + if (!resume_pg_dir) + return -ENOMEM; + + /* + * Create a temporary page table and map the whole linear region as executable and + * writable. + */ + ret = temp_pgtable_mapping(resume_pg_dir, start, end, __pgprot(_PAGE_WRITE | _PAGE_EXEC)); + if (ret) + return ret; + + /* Move the restore code to a new page so that it doesn't get overwritten by itself. */ + relocated_restore_code = relocate_restore_code(); + if (relocated_restore_code == -ENOMEM) + return -ENOMEM; + + /* + * Map the __hibernate_cpu_resume() address to the temporary page table so that the + * restore code can jumps to it after finished restore the image. The next execution + * code doesn't find itself in a different address space after switching over to the + * original page table used by the hibernated image. + * + */ + start = (unsigned long)resume_hdr.restore_cpu_addr; + end = start + PAGE_SIZE; + + ret = temp_pgtable_mapping(resume_pg_dir, start, end, __pgprot(_PAGE_WRITE | _PAGE_EXEC)); + if (ret) + return ret; + + hibernate_restore_image(resume_hdr.saved_satp, (PFN_DOWN(__pa(resume_pg_dir)) | satp_mode), + resume_hdr.restore_cpu_addr); + + return 0; +} + +#ifdef CONFIG_PM_SLEEP_SMP +int hibernate_resume_nonboot_cpu_disable(void) +{ + if (sleep_cpu < 0) { + pr_err("Failing to resume from hibernate on an unknown CPU\n"); + return -ENODEV; + } + + return freeze_secondary_cpus(sleep_cpu); +} +#endif + +static int __init riscv_hibernate_init(void) +{ + hibernate_cpu_context = kzalloc(sizeof(*hibernate_cpu_context), GFP_KERNEL); + + if (WARN_ON(!hibernate_cpu_context)) + return -ENOMEM; + + return 0; +} + +early_initcall(riscv_hibernate_init);