Message ID | 20230619094705.51337-1-alexghiti@rivosinc.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp2881681vqr; Mon, 19 Jun 2023 02:57:39 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ47GZSITwegoAFlyD4SaYUqSQdhsq7UTIc5LCouZ9t+bWRIAVqQZRQXPjwUQhufHy3uDI0e X-Received: by 2002:a05:622a:290:b0:3ff:1fba:2820 with SMTP id z16-20020a05622a029000b003ff1fba2820mr3169984qtw.58.1687168658751; Mon, 19 Jun 2023 02:57:38 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1687168658; cv=none; d=google.com; s=arc-20160816; b=Jww+PFrCxPWZa5jx7+1X7B7BbIwkmsH+aAmKY3ik62fzeV27WS8G4NdQ/JA7SZpDII EZJK7EqMGQAF4wxisb2fiU2dUNjPyvl0jVGeChPC6kAcQKU/bh34ssi7DEmzVgJXqa1w nTOYzEPIxzffMxxnjB1QJZMq7vlp9gpqfxwPnOfaT7bsSlDwCVtfm3Ut81O0+iCXBYUo 8fU5YqhfuCHadwB0r2rNzr5Bd3PdIBhkz0fbab2noAnZtNeobL8ZK9M04xLiBpDwqZrK OZTu6nJNBJpbcbjhfWKHn5q4ONRzy9xzB1/rh7hjNGcvw1LpHNR6h6mZY3eFBVKQabX2 Xfjw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from:dkim-signature; bh=pm5bDO7dAW0U1/ZbP5zCHPn8yVTZ8TIF+oUNu//7tHw=; b=g1jIBThCQN20ce8mNOuZ4OKJQtC+fMChJCgkHgNM8d8GFAVTK0dPs0RepgPj+QY+7/ qyaYuBMdBtvgHMtxWktJdyhMA6k408gyan0XTK1Zn/D8myU/5ryEvxIuWDz4nNrNb6jS 5+e9Za3pISqrJ99Kf9I/bQm1iJDPSohOEO1pvGMUCcJT9lRWS9TF+D0DA7h6SkSYH8yA fWOwPAOALXgm/7LB5kAshDsP4QOdYI0sBgMaVk4Gh3lATFtAq15U6RsgpMOPrHSaY93Q Lw5hihPIKfpqKDG1kB2VJAfRqSJDg6DYcCnAFEGNEefAkTEz37h4mHW/+1cYALKvkiGv IZXw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@rivosinc-com.20221208.gappssmtp.com header.s=20221208 header.b=eoKvIZdQ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id x10-20020aa79a4a000000b0064d2b0cc8f2si17610838pfj.150.2023.06.19.02.57.26; Mon, 19 Jun 2023 02:57:38 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@rivosinc-com.20221208.gappssmtp.com header.s=20221208 header.b=eoKvIZdQ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231252AbjFSJr0 (ORCPT <rfc822;duw91626@gmail.com> + 99 others); Mon, 19 Jun 2023 05:47:26 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36894 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229489AbjFSJrW (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Mon, 19 Jun 2023 05:47:22 -0400 Received: from mail-wr1-x42d.google.com (mail-wr1-x42d.google.com [IPv6:2a00:1450:4864:20::42d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 13A5518D for <linux-kernel@vger.kernel.org>; Mon, 19 Jun 2023 02:47:16 -0700 (PDT) Received: by mail-wr1-x42d.google.com with SMTP id ffacd0b85a97d-3113675d582so885478f8f.3 for <linux-kernel@vger.kernel.org>; Mon, 19 Jun 2023 02:47:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=rivosinc-com.20221208.gappssmtp.com; s=20221208; t=1687168034; x=1689760034; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:from:to:cc:subject:date:message-id:reply-to; bh=pm5bDO7dAW0U1/ZbP5zCHPn8yVTZ8TIF+oUNu//7tHw=; b=eoKvIZdQXEl0s4AQsuIPqutj8dZ3pqg+m54vtxstzp6Q0Cl1fFmcVYBGLRaGahbRvw 4WTUnVOpcgg1uKzRkVqNkc0cTFSTeg99LuqB6iFfQdZ4lcwONnwloO9/Uh7KKy6yxRo0 ID3jZMT9Jm13te2HaRdlSP3bZ4ka+kz9eBSG7NfZ6aN/qXeRSrxhqCOuT/i9I1x6CLsf ejknWxyg7oYUDdBmP2yQjBXnRKg+SxrNYPjgIpIXs1XK1kguhu0Xh36iOiKEHEFAjeBN T/00S5O8FK7uHrHczAvt9ujM+Gt1C6GFyFqzc/l46kBEjEkUfeBCqhBnhHzTaGQkvDVr AYbg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1687168034; x=1689760034; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=pm5bDO7dAW0U1/ZbP5zCHPn8yVTZ8TIF+oUNu//7tHw=; b=HH9HAZjDmU23g8lgLIUa8nxUKVEkCmuPHnYxMAWgMxdJNDY0lO9dqo8SbfOm5wUYr+ nMBOIBIVlL+7tGLpbrozM1P3KH3tahbvOntVYXNWYvQrWLabT07hAP2CtXYl2DOgKz+3 6Ha4jU0vR27DM1MIvm8c+c80QPuFT48QZ/MJTkUbiGBe+rE1g8Fzh3JicfOwM9uKYJfP BRAgIruqGkZkzw0/KNkQDekmVNONRHptaiAIPsEkAcvmA+U3CdnpkJqaFOFt/SGQ++e1 3jLAPcfGG5J74kA+6cfBXUBFadtBjmdc9fRCVsv8oYqLe0h7IWYFqOfe5mFbcUuWEAW8 46BQ== X-Gm-Message-State: AC+VfDwmgkxaBP+3wCTMhVFFqUTfkUGH+CaSb9WOJjxMk97DydZH2VjY W2nFyF1IAcPmnviaaxz08GbyBxOB0S93fbC05Uc= X-Received: by 2002:adf:f952:0:b0:311:162a:ce2a with SMTP id q18-20020adff952000000b00311162ace2amr5395662wrr.29.1687168034471; Mon, 19 Jun 2023 02:47:14 -0700 (PDT) Received: from localhost.localdomain (amontpellier-656-1-456-62.w92-145.abo.wanadoo.fr. [92.145.124.62]) by smtp.gmail.com with ESMTPSA id c1-20020a5d5281000000b0030aded83385sm30971199wrv.27.2023.06.19.02.47.13 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 19 Jun 2023 02:47:14 -0700 (PDT) From: Alexandre Ghiti <alexghiti@rivosinc.com> To: Jonathan Corbet <corbet@lwn.net>, Paul Walmsley <paul.walmsley@sifive.com>, Palmer Dabbelt <palmer@dabbelt.com>, Albert Ou <aou@eecs.berkeley.edu>, linux-doc@vger.kernel.org, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Subject: [PATCH 1/2] Documentation: riscv: Add early boot document Date: Mon, 19 Jun 2023 11:47:04 +0200 Message-Id: <20230619094705.51337-1-alexghiti@rivosinc.com> X-Mailer: git-send-email 2.39.2 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS, T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1769124563556470274?= X-GMAIL-MSGID: =?utf-8?q?1769124563556470274?= |
Series |
[1/2] Documentation: riscv: Add early boot document
|
|
Commit Message
Alexandre Ghiti
June 19, 2023, 9:47 a.m. UTC
This document describes the constraints and requirements of the early boot process in a RISC-V kernel. Szigned-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> --- Documentation/riscv/boot-image-header.rst | 3 - Documentation/riscv/boot.rst | 181 ++++++++++++++++++++++ Documentation/riscv/index.rst | 1 + 3 files changed, 182 insertions(+), 3 deletions(-) create mode 100644 Documentation/riscv/boot.rst
Comments
@Sunil V L Something about ACPI is more than welcome :) And thanks to @Björn Töpel and @Atish Kumar Patra for helping in writing this document! On Mon, Jun 19, 2023 at 11:47 AM Alexandre Ghiti <alexghiti@rivosinc.com> wrote: > > This document describes the constraints and requirements of the early > boot process in a RISC-V kernel. > > Szigned-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> > --- > Documentation/riscv/boot-image-header.rst | 3 - > Documentation/riscv/boot.rst | 181 ++++++++++++++++++++++ > Documentation/riscv/index.rst | 1 + > 3 files changed, 182 insertions(+), 3 deletions(-) > create mode 100644 Documentation/riscv/boot.rst > > diff --git a/Documentation/riscv/boot-image-header.rst b/Documentation/riscv/boot-image-header.rst > index d7752533865f..a4a45310c4c4 100644 > --- a/Documentation/riscv/boot-image-header.rst > +++ b/Documentation/riscv/boot-image-header.rst > @@ -7,9 +7,6 @@ Boot image header in RISC-V Linux > > This document only describes the boot image header details for RISC-V Linux. > > -TODO: > - Write a complete booting guide. > - > The following 64-byte header is present in decompressed Linux kernel image:: > > u32 code0; /* Executable code */ > diff --git a/Documentation/riscv/boot.rst b/Documentation/riscv/boot.rst > new file mode 100644 > index 000000000000..b02230818b79 > --- /dev/null > +++ b/Documentation/riscv/boot.rst > @@ -0,0 +1,181 @@ > +.. SPDX-License-Identifier: GPL-2.0 > + > +============================================= > +Early boot requirements/constraints on RISC-V > +============================================= > + > +:Author: Alexandre Ghiti <alexghiti@rivosinc.com> > +:Date: 23 May 2023 > + > +This document describes what the RISC-V kernel expects from the previous stages > +and the firmware, but also the constraints that any developer must have in mind > +when touching the early boot process, e.g. before the final virtual mapping is > +setup. > + > +Pre-kernel boot (Expectations from firmware) > +============================================ > + > +Registers state > +--------------- > + > +The RISC-V kernel expects: > + > + * `$a0` to contain the hartid of the current core. > + * `$a1` to contain the address of the device tree in memory. > + > +CSR state > +--------- > + > +The RISC-V kernel expects: > + > + * `$satp = 0`: the MMU must be disabled. > + > +Reserved memory for resident firmware > +------------------------------------- > + > +The RISC-V kernel expects the firmware to mark any resident memory with the > +`no-map` flag, thus the kernel won't map those regions in the direct mapping > +(avoiding issues with hibernation, speculative accesses and probably other > +subsystems). > + > +Kernel location > +--------------- > + > +The RISC-V kernel expects to be placed at a PMD boundary (2MB for rv64 and 4MB > +for rv32). Note though that the EFI stub will physically relocate the kernel if > +that's not the case. > + > +Device-tree > +----------- > + > +The RISC-V kernel always expects a device tree, it is: > + > +- either passed directly to the kernel from the previous stage using the `$a1` > + register, > +- or when booting with UEFI, the device tree will be retrieved by the EFI stub > + using the EFI configuration table or it will be created. > + > +Bootflow > +-------- > + > +There exist 2 methods to enter the kernel: > + > +- `RISCV_BOOT_SPINWAIT`: the firmware releases all harts in the kernel, one hart > + wins a lottery and executes the early boot code while the other harts are > + parked waiting for the initialization to finish. This method is now > + **deprecated**. > +- Ordered booting: the firmware releases only one hart that will execute the > + initialization phase and then will start all other harts using the SBI HSM > + extension. > + > +UEFI > +---- > + > +UEFI memory map > +~~~~~~~~~~~~~~~ > + > +When booting with UEFI, the RISC-V kernel will use only the EFI memory map to > +populate the system memory. > + > +The UEFI firmware must parse the subnodes of the `/reserved-memory` device tree > +node and abide by the device tree specification to convert the attributes of > +those subnodes (`no-map` and `reusable`) into their correct EFI equivalent > +(refer to section "3.5.4 /reserved-memory and UEFI" of the device tree > +specification). > + > +RISCV_EFI_BOOT_PROTOCOL > +~~~~~~~~~~~~~~~~~~~~~~~ > + > +When booting with UEFI, the EFI stub requires the boot hartid in order to pass > +it to the RISC-V kernel in `$a1`. The EFI stub retrieves the boot hartid using > +one of the following methods: > + > +- `RISCV_EFI_BOOT_PROTOCOL` (**preferred**). > +- `boot-hartid` device tree subnode (**deprecated**). > + > +Any new firmware must implement `RISCV_EFI_BOOT_PROTOCOL` as the device tree > +based approach is deprecated now. > + > +During kernel boot: (Kernel internals) > +====================================== > + > +EFI stub and device tree > +------------------------ > + > +When booting with UEFI, the device tree is supplemented by the EFI stub with the > +following parameters (largely shared with arm64 in Documentation/arm/uefi.rst): > + > +========================== ====== =========================================== > +Name Size Description > +========================== ====== =========================================== > +linux,uefi-system-table 64-bit Physical address of the UEFI System Table. > + > +linux,uefi-mmap-start 64-bit Physical address of the UEFI memory map, > + populated by the UEFI GetMemoryMap() call. > + > +linux,uefi-mmap-size 32-bit Size in bytes of the UEFI memory map > + pointed to in previous entry. > + > +linux,uefi-mmap-desc-size 32-bit Size in bytes of each entry in the UEFI > + memory map. > + > +linux,uefi-mmap-desc-ver 32-bit Version of the mmap descriptor format. > + > +kaslr-seed 64-bit Entropy used to randomize the kernel image > + base address location. > + > +bootargs Kernel command line > +========================== ====== =========================================== > + > +Virtual mapping setup > +--------------------- > + > +The installation of the virtual mapping is done in 2 steps in the RISC-V kernel: > + > +1. :c:func:`setup_vm` installs a temporary kernel mapping in > + :c:var:`early_pg_dir` which allows to discover the system memory: only the > + kernel text/data are mapped at this point. When establishing this mapping, > + no allocation can be done (since the system memory is not known yet), so > + :c:var:`early_pg_dir` page table is statically allocated (using only one > + table for each level). > + > +2. :c:func:`setup_vm_final` creates the final kernel mapping in > + :c:var:`swapper_pg_dir` and takes advantage of the discovered system memory > + to create the linear mapping. When establishing this mapping, the kernel > + can allocate memory but cannot access it directly (since the direct mapping > + is not present yet), so it uses temporary mappings in the fixmap region to > + be able to access the newly allocated page table levels. > + > +For :c:func:`virt_to_phys` and :c:func:`phys_to_virt` to be able to correctly > +convert direct mapping addresses to physical addresses, it needs to know the > +start of the DRAM: this happens after 1, right before 2 installs the direct > +mapping (see :c:func:`setup_bootmem` function in arch/riscv/mm/init.c). So > +any usage of those macros before the final virtual mapping is installed must be > +carefully examined. > + > +Device-tree mapping via fixmap > +------------------------------ > + > +The RISC-V kernel uses the fixmap region to map the device tree because the > +device tree virtual mapping must remain the same between :c:func:`setup_vm` and > +:c:func:`setup_vm_final` calls since :c:var:`reserved_mem` array is initialized > +with virtual addresses established by :c:func:`setup_vm` and used with the > +mapping established by :c:func:`setup_vm_final`. > + > +Pre-MMU execution > +----------------- > + > +Any code that executes before even the first virtual mapping is established > +must be very carefully compiled as: > + > +- `-fno-pie`: This is needed for relocatable kernels which use `-fPIE`, since > + otherwise, any access to a global symbol would go through the GOT which is > + only relocated virtually. > +- `-mcmodel=medany`: Any access to a global symbol must be PC-relative to avoid > + any relocations to happen before the MMU is setup. > +- Also note that *all* instrumentation must also be disabled (that includes > + KASAN, ftrace and others). > + > +As using a symbol from a different compilation unit requires this unit to be > +compiled with those flags, we advise, as much as possible, not to use external > +symbols. > diff --git a/Documentation/riscv/index.rst b/Documentation/riscv/index.rst > index 175a91db0200..1f66062def6d 100644 > --- a/Documentation/riscv/index.rst > +++ b/Documentation/riscv/index.rst > @@ -5,6 +5,7 @@ RISC-V architecture > .. toctree:: > :maxdepth: 1 > > + boot > boot-image-header > vm-layout > hwprobe > -- > 2.39.2 >
Alexandre Ghiti <alexghiti@rivosinc.com> writes: > This document describes the constraints and requirements of the early > boot process in a RISC-V kernel. > > Szigned-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Additional 'z'; "Signed-off-by:" Nice writeup! Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Hey Alex, Thanks for working on this :) I've got a mix of suggestions and questions below. Hopefully it is not too disjoint, since I didn't write them in order. On Mon, Jun 19, 2023 at 11:47:04AM +0200, Alexandre Ghiti wrote: > This document describes the constraints and requirements of the early > boot process in a RISC-V kernel. > > Szigned-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> > --- > Documentation/riscv/boot-image-header.rst | 3 - > Documentation/riscv/boot.rst | 181 ++++++++++++++++++++++ > Documentation/riscv/index.rst | 1 + > 3 files changed, 182 insertions(+), 3 deletions(-) > create mode 100644 Documentation/riscv/boot.rst > > diff --git a/Documentation/riscv/boot-image-header.rst b/Documentation/riscv/boot-image-header.rst > index d7752533865f..a4a45310c4c4 100644 > --- a/Documentation/riscv/boot-image-header.rst > +++ b/Documentation/riscv/boot-image-header.rst > @@ -7,9 +7,6 @@ Boot image header in RISC-V Linux > > This document only describes the boot image header details for RISC-V Linux. > > -TODO: > - Write a complete booting guide. > - > The following 64-byte header is present in decompressed Linux kernel image:: > > u32 code0; /* Executable code */ > diff --git a/Documentation/riscv/boot.rst b/Documentation/riscv/boot.rst > new file mode 100644 > index 000000000000..b02230818b79 > --- /dev/null > +++ b/Documentation/riscv/boot.rst > @@ -0,0 +1,181 @@ > +.. SPDX-License-Identifier: GPL-2.0 > + > +============================================= > +Early boot requirements/constraints on RISC-V > +============================================= Please use "title case", here and elsewhere in the doc. I'd also be inclined to drop the "Early" from here, as it permits more natural section headings. Perhaps "RISC-V Kernel Boot Requirements and Constraints"? > + > +:Author: Alexandre Ghiti <alexghiti@rivosinc.com> > +:Date: 23 May 2023 > + > +This document describes what the RISC-V kernel expects from the previous stages "the previous stages" is a bit vague IMO. You mean bootloader stages I assume, but I think it should be explicit. Perhaps: "...what a RISC-V kernel expects from bootloaders and firmware, and the constraints..." > +and the firmware, but also the constraints that any developer must have in mind > +when touching the early boot process, e.g. before the final virtual mapping is > +setup. s/setup./set up./ Do you mean to have "For example" here? Or is "before the final virtual mapping is set up" the definition or "early boot"? If the latter, I would reword this as something like: "...when modifying the early boot process. For the purposes of this document, the 'early boot process' refers to any code that runs before the final virtual mapping is set up." > +Pre-kernel boot (Expectations from firmware) Firmware or bootloaders? TBH, I would just drop the section in () and do something like: Pre-kernel Requirements and Constraints ======================================= The RISC-V kernel expects the following of bootloaders and platform firmware: > + > +Registers state s/Registers state/Register State/ > +--------------- > + > +The RISC-V kernel expects: > + > + * `$a0` to contain the hartid of the current core. > + * `$a1` to contain the address of the device tree in memory. > + > +CSR state > +--------- > + > +The RISC-V kernel expects: > + > + * `$satp = 0`: the MMU must be disabled. "the MMU, if present, must be disabled." ;) > + > +Reserved memory for resident firmware > +------------------------------------- > + > +The RISC-V kernel expects the firmware to mark any resident memory with the Should this be "...resident memory, or memory it has protected with PMPs, with..." ? > +`no-map` flag, thus the kernel won't map those regions in the direct mapping "no-map" is a DT specific term, should this section be moved down under DT, as a sub-section of that? > +(avoiding issues with hibernation, speculative accesses and probably other > +subsystems). I'm not sure that this () section is beneficial. To be honest, recent issues aside, this section here seems like a statement of the obvious... > + > +Kernel location > +--------------- > + > +The RISC-V kernel expects to be placed at a PMD boundary (2MB for rv64 and 4MB Would that be better worded as "(2 MB aligned for rv64 and 4 MB aligned for rv32)"? It might be overly explicit, but I figure there's no harm... > +for rv32). Note though that the EFI stub will physically relocate the kernel if s/though// > +that's not the case. > + > +Device-tree s/Device-tree/Devicetree/ and... > +----------- > + > +The RISC-V kernel always expects a device tree, it is: ...s/device tree/devicetree/ to match elsewhere in the kernel docs. Same applies to the other instances of "device tree" in this patch, please. > + > +- either passed directly to the kernel from the previous stage using the `$a1` > + register, > +- or when booting with UEFI, the device tree will be retrieved by the EFI stub > + using the EFI configuration table or it will be created. Can I suggest changing this around a little, pulling the "either" & dropping some boilerplate so that it reads (to me!) a little more naturally: The RISC-V kernel always expects a devicetree, it is either: - passed directly to the kernel from the previous stage using the `$a1` register, - retrieved by the EFI stub when booting with UEFI, using the EFI configuration table or it will be created by ____. Also, please elaborate on what it will be created by. > + > +Bootflow "Boot Flow", no? I am not sure that this is the "correct" heading for the content it describes, but I have nothing better to offer :/ > +-------- > + > +There exist 2 methods to enter the kernel: > + > +- `RISCV_BOOT_SPINWAIT`: the firmware releases all harts in the kernel, one hart > + wins a lottery and executes the early boot code while the other harts are > + parked waiting for the initialization to finish. This method is now nit: s/now// What do you mean by deprecated? There's no requirement to implement the HSM extension, right? > + **deprecated**. > +- Ordered booting: the firmware releases only one hart that will execute the > + initialization phase and then will start all other harts using the SBI HSM > + extension. > + > +UEFI > +---- > + > +UEFI memory map > +~~~~~~~~~~~~~~~ > + > +When booting with UEFI, the RISC-V kernel will use only the EFI memory map to > +populate the system memory. > + > +The UEFI firmware must parse the subnodes of the `/reserved-memory` device tree > +node and abide by the device tree specification to convert the attributes of > +those subnodes (`no-map` and `reusable`) into their correct EFI equivalent > +(refer to section "3.5.4 /reserved-memory and UEFI" of the device tree > +specification). > + > +RISCV_EFI_BOOT_PROTOCOL > +~~~~~~~~~~~~~~~~~~~~~~~ > + > +When booting with UEFI, the EFI stub requires the boot hartid in order to pass > +it to the RISC-V kernel in `$a1`. The EFI stub retrieves the boot hartid using > +one of the following methods: > + > +- `RISCV_EFI_BOOT_PROTOCOL` (**preferred**). > +- `boot-hartid` device tree subnode (**deprecated**). > + > +Any new firmware must implement `RISCV_EFI_BOOT_PROTOCOL` as the device tree > +based approach is deprecated now. > + > +During kernel boot: (Kernel internals) With the other section titles changed, this could be: Early Boot Requirements and Constraints ======================================= The RISC-V kernel's early boot process operates under the following constraints: Thoughts? > +====================================== > + > +EFI stub and device tree Same comments about "device tree" here etc. > +------------------------ > + > +When booting with UEFI, the device tree is supplemented by the EFI stub with the > +following parameters (largely shared with arm64 in Documentation/arm/uefi.rst): > + > +========================== ====== =========================================== > +Name Size Description > +========================== ====== =========================================== > +linux,uefi-system-table 64-bit Physical address of the UEFI System Table. nit: Hmm, I think for all of these sizes s/-bit/ bits/. > + > +linux,uefi-mmap-start 64-bit Physical address of the UEFI memory map, > + populated by the UEFI GetMemoryMap() call. > + > +linux,uefi-mmap-size 32-bit Size in bytes of the UEFI memory map > + pointed to in previous entry. > + > +linux,uefi-mmap-desc-size 32-bit Size in bytes of each entry in the UEFI > + memory map. > + > +linux,uefi-mmap-desc-ver 32-bit Version of the mmap descriptor format. > + > +kaslr-seed 64-bit Entropy used to randomize the kernel image > + base address location. > + > +bootargs Kernel command line > +========================== ====== =========================================== > + > +Virtual mapping setup nit: s/setup/Installation/ > +--------------------- > + > +The installation of the virtual mapping is done in 2 steps in the RISC-V kernel: > + > +1. :c:func:`setup_vm` installs a temporary kernel mapping in > + :c:var:`early_pg_dir` which allows to discover the system memory: only the s/to discover/discovery of/ s/: only/. Only/ > + kernel text/data are mapped at this point. When establishing this mapping, > + no allocation can be done (since the system memory is not known yet), so > + :c:var:`early_pg_dir` page table is statically allocated (using only one > + table for each level). > + > +2. :c:func:`setup_vm_final` creates the final kernel mapping in > + :c:var:`swapper_pg_dir` and takes advantage of the discovered system memory > + to create the linear mapping. When establishing this mapping, the kernel > + can allocate memory but cannot access it directly (since the direct mapping > + is not present yet), so it uses temporary mappings in the fixmap region to > + be able to access the newly allocated page table levels. > + > +For :c:func:`virt_to_phys` and :c:func:`phys_to_virt` to be able to correctly > +convert direct mapping addresses to physical addresses, it needs to know the nit: s/it/they/ > +start of the DRAM: this happens after 1, right before 2 installs the direct s/:/./ Also how about s/1/step 1/ & s/2/step 2/? > +mapping (see :c:func:`setup_bootmem` function in arch/riscv/mm/init.c). So s/So// > +any usage of those macros before the final virtual mapping is installed must be > +carefully examined. > + > +Device-tree mapping via fixmap > +------------------------------ > + > +The RISC-V kernel uses the fixmap region to map the device tree because the > +device tree virtual mapping must remain the same between :c:func:`setup_vm` and > +:c:func:`setup_vm_final` calls since :c:var:`reserved_mem` array is initialized Missing a "the" before reserved_mem. > +with virtual addresses established by :c:func:`setup_vm` and used with the > +mapping established by :c:func:`setup_vm_final`. > + > +Pre-MMU execution > +----------------- > + > +Any code that executes before even the first virtual mapping is established > +must be very carefully compiled as: Could you point out what the non-obvious examples of this code are? > +- `-fno-pie`: This is needed for relocatable kernels which use `-fPIE`, since Is there a reason why the capitalisation is different for the two compiler flags? > + otherwise, any access to a global symbol would go through the GOT which is > + only relocated virtually. > +- `-mcmodel=medany`: Any access to a global symbol must be PC-relative to avoid > + any relocations to happen before the MMU is setup. > +- Also note that *all* instrumentation must also be disabled (that includes nit: s/Also note that// > + KASAN, ftrace and others). > + > +As using a symbol from a different compilation unit requires this unit to be > +compiled with those flags, we advise, as much as possible, not to use external > +symbols. If the use of early alternatives grows, are we going to have to split the vendors early alternatives into a different compilation unit from their regular alternatives? Cheers, Conor.
On Mon, Jun 19, 2023 at 2:26 PM Conor Dooley <conor.dooley@microchip.com> wrote: > > Hey Alex, > > Thanks for working on this :) I've got a mix of suggestions and > questions below. Hopefully it is not too disjoint, since I didn't write > them in order. > > On Mon, Jun 19, 2023 at 11:47:04AM +0200, Alexandre Ghiti wrote: > > This document describes the constraints and requirements of the early > > boot process in a RISC-V kernel. > > > > Szigned-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> > > --- > > Documentation/riscv/boot-image-header.rst | 3 - > > Documentation/riscv/boot.rst | 181 ++++++++++++++++++++++ > > Documentation/riscv/index.rst | 1 + > > 3 files changed, 182 insertions(+), 3 deletions(-) > > create mode 100644 Documentation/riscv/boot.rst > > > > diff --git a/Documentation/riscv/boot-image-header.rst b/Documentation/riscv/boot-image-header.rst > > index d7752533865f..a4a45310c4c4 100644 > > --- a/Documentation/riscv/boot-image-header.rst > > +++ b/Documentation/riscv/boot-image-header.rst > > @@ -7,9 +7,6 @@ Boot image header in RISC-V Linux > > > > This document only describes the boot image header details for RISC-V Linux. > > > > -TODO: > > - Write a complete booting guide. > > - > > The following 64-byte header is present in decompressed Linux kernel image:: > > > > u32 code0; /* Executable code */ > > diff --git a/Documentation/riscv/boot.rst b/Documentation/riscv/boot.rst > > new file mode 100644 > > index 000000000000..b02230818b79 > > --- /dev/null > > +++ b/Documentation/riscv/boot.rst > > @@ -0,0 +1,181 @@ > > +.. SPDX-License-Identifier: GPL-2.0 > > + > > +============================================= > > +Early boot requirements/constraints on RISC-V > > +============================================= > > Please use "title case", here and elsewhere in the doc. You mean using "title: " instead of "===="? Or using uppercase for the first letter of each word? FYI I followed https://docs.kernel.org/doc-guide/sphinx.html?highlight=title#specific-guidelines-for-the-kernel-documentation > I'd also be inclined to drop the "Early" from here, as it permits more > natural section headings. Perhaps "RISC-V Kernel Boot Requirements and > Constraints"? Good suggestion, I'll go with that, thanks > > > + > > +:Author: Alexandre Ghiti <alexghiti@rivosinc.com> > > +:Date: 23 May 2023 > > + > > +This document describes what the RISC-V kernel expects from the previous stages > > "the previous stages" is a bit vague IMO. You mean bootloader stages I > assume, but I think it should be explicit. Perhaps: > "...what a RISC-V kernel expects from bootloaders and firmware, and the > constraints..." > > > +and the firmware, but also the constraints that any developer must have in mind > > +when touching the early boot process, e.g. before the final virtual mapping is > > +setup. > > s/setup./set up./ > > Do you mean to have "For example" here? Or is "before the final virtual > mapping is set up" the definition or "early boot"? If the latter, I > would reword this as something like: > "...when modifying the early boot process. For the purposes of this > document, the 'early boot process' refers to any code that runs before > the final virtual mapping is set up." Thanks, that's what I meant. > > > +Pre-kernel boot (Expectations from firmware) > > Firmware or bootloaders? TBH, I would just drop the section in () and > do something like: > Pre-kernel Requirements and Constraints > ======================================= > > The RISC-V kernel expects the following of bootloaders and platform > firmware: > Ok > > + > > +Registers state > > s/Registers state/Register State/ Ok > > > +--------------- > > + > > +The RISC-V kernel expects: > > + > > + * `$a0` to contain the hartid of the current core. > > + * `$a1` to contain the address of the device tree in memory. > > + > > +CSR state > > +--------- > > + > > +The RISC-V kernel expects: > > + > > + * `$satp = 0`: the MMU must be disabled. > > "the MMU, if present, must be disabled." ;) Ahah forgot the !mmu case, thanks :) > > > + > > +Reserved memory for resident firmware > > +------------------------------------- > > + > > +The RISC-V kernel expects the firmware to mark any resident memory with the > > Should this be > "...resident memory, or memory it has protected with PMPs, with..." > ? I used "resident" memory instead of "PMP" memory because it was more general. I mean you can have a region that is resident but not protected by PMP, and I don't think the kernel should ask for this resident memory to be protected with PMP right? > > > +`no-map` flag, thus the kernel won't map those regions in the direct mapping > > "no-map" is a DT specific term, should this section be moved down under > DT, as a sub-section of that? Maybe I can rephrase with something like that: "The RISC-V kernel must not map any resident memory in the direct mapping, so the firmware must correctly mark those regions as follows: - when using a devicetree, using the `no-map` flag, - when booting with UEFI without devicetree, either as `EfiRuntimeServicesData/Code` or `EfiReserved`." > > > +(avoiding issues with hibernation, speculative accesses and probably other > > +subsystems). > > I'm not sure that this () section is beneficial. To be honest, recent > issues aside, this section here seems like a statement of the obvious... But I made the mistake, so it was not that straightforward to me...I'll remove the (). > > > + > > +Kernel location > > +--------------- > > + > > +The RISC-V kernel expects to be placed at a PMD boundary (2MB for rv64 and 4MB > > Would that be better worded as "(2 MB aligned for rv64 and 4 MB aligned > for rv32)"? It might be overly explicit, but I figure there's no harm... Sure I can add that. > > > +for rv32). Note though that the EFI stub will physically relocate the kernel if > > s/though// Ok > > > +that's not the case. > > + > > +Device-tree Damn, missed this one, thanks! > > s/Device-tree/Devicetree/ and... > > > +----------- > > + > > +The RISC-V kernel always expects a device tree, it is: > > ...s/device tree/devicetree/ to match elsewhere in the kernel docs. > Same applies to the other instances of "device tree" in this patch, > please. Ok I'll do that (but I'm happy to say that I thought about that and it was intentional since "git grep "device tree" | wc -l" returns a significant number of instances :)). > > > + > > +- either passed directly to the kernel from the previous stage using the `$a1` > > + register, > > +- or when booting with UEFI, the device tree will be retrieved by the EFI stub > > + using the EFI configuration table or it will be created. > > Can I suggest changing this around a little, pulling the "either" & > dropping some boilerplate so that it reads (to me!) a little more > naturally: > The RISC-V kernel always expects a devicetree, it is either: > > - passed directly to the kernel from the previous stage using the `$a1` > register, > - retrieved by the EFI stub when booting with UEFI, using the EFI > configuration table or it will be created by ____. > > Also, please elaborate on what it will be created by. Is it better this way: "The RISC-V kernel always expects a devicetree, it is either: - passed directly to the kernel from the previous stage using the `$a1` register, - retrieved by the EFI stub if present in the EFI configuration table, - created by the EFI stub otherwise." > > > + > > +Bootflow > > "Boot Flow", no? > I am not sure that this is the "correct" heading for the content it > describes, but I have nothing better to offer :/ Yes you're right, what about simply "Kernel Entrance"? Not sure this is easily understandable though. > > > +-------- > > + > > +There exist 2 methods to enter the kernel: > > + > > +- `RISCV_BOOT_SPINWAIT`: the firmware releases all harts in the kernel, one hart > > + wins a lottery and executes the early boot code while the other harts are > > + parked waiting for the initialization to finish. This method is now > > nit: s/now// Ok > > What do you mean by deprecated? There's no requirement to implement the > HSM extension, right? I would say yes, you have to use a recent version of openSBI that supports the HSM extension. @Atish Kumar Patra WDYT? > > > + **deprecated**. > > +- Ordered booting: the firmware releases only one hart that will execute the > > + initialization phase and then will start all other harts using the SBI HSM > > + extension. > > + > > +UEFI > > +---- > > + > > +UEFI memory map > > +~~~~~~~~~~~~~~~ > > + > > +When booting with UEFI, the RISC-V kernel will use only the EFI memory map to > > +populate the system memory. > > + > > +The UEFI firmware must parse the subnodes of the `/reserved-memory` device tree > > +node and abide by the device tree specification to convert the attributes of > > +those subnodes (`no-map` and `reusable`) into their correct EFI equivalent > > +(refer to section "3.5.4 /reserved-memory and UEFI" of the device tree > > +specification). > > + > > +RISCV_EFI_BOOT_PROTOCOL > > +~~~~~~~~~~~~~~~~~~~~~~~ > > + > > +When booting with UEFI, the EFI stub requires the boot hartid in order to pass > > +it to the RISC-V kernel in `$a1`. The EFI stub retrieves the boot hartid using > > +one of the following methods: > > + > > +- `RISCV_EFI_BOOT_PROTOCOL` (**preferred**). > > +- `boot-hartid` device tree subnode (**deprecated**). > > + > > +Any new firmware must implement `RISCV_EFI_BOOT_PROTOCOL` as the device tree > > +based approach is deprecated now. > > + > > +During kernel boot: (Kernel internals) > > With the other section titles changed, this could be: > Early Boot Requirements and Constraints > ======================================= > > The RISC-V kernel's early boot process operates under the > following constraints: > > Thoughts? I think it's better as you propose, I changed it, thanks. > > > +====================================== > > + > > +EFI stub and device tree > > Same comments about "device tree" here etc. > > > +------------------------ > > + > > +When booting with UEFI, the device tree is supplemented by the EFI stub with the > > +following parameters (largely shared with arm64 in Documentation/arm/uefi.rst): > > + > > +========================== ====== =========================================== > > +Name Size Description > > +========================== ====== =========================================== > > +linux,uefi-system-table 64-bit Physical address of the UEFI System Table. > > nit: Hmm, I think for all of these sizes s/-bit/ bits/. That's a copy-paste from the link just above the table. But maybe I should have pointed to the doc and added only the "bootargs" stuff (maybe that's also present for arm64 actually). > > > + > > +linux,uefi-mmap-start 64-bit Physical address of the UEFI memory map, > > + populated by the UEFI GetMemoryMap() call. > > + > > +linux,uefi-mmap-size 32-bit Size in bytes of the UEFI memory map > > + pointed to in previous entry. > > + > > +linux,uefi-mmap-desc-size 32-bit Size in bytes of each entry in the UEFI > > + memory map. > > + > > +linux,uefi-mmap-desc-ver 32-bit Version of the mmap descriptor format. > > + > > +kaslr-seed 64-bit Entropy used to randomize the kernel image > > + base address location. > > + > > +bootargs Kernel command line > > +========================== ====== =========================================== > > + > > +Virtual mapping setup > > nit: s/setup/Installation/ Ok > > > +--------------------- > > + > > +The installation of the virtual mapping is done in 2 steps in the RISC-V kernel: > > + > > +1. :c:func:`setup_vm` installs a temporary kernel mapping in > > + :c:var:`early_pg_dir` which allows to discover the system memory: only the > > s/to discover/discovery of/ You mean "the discovery of" right? > s/: only/. Only/ Ok > > > + kernel text/data are mapped at this point. When establishing this mapping, > > + no allocation can be done (since the system memory is not known yet), so > > + :c:var:`early_pg_dir` page table is statically allocated (using only one > > + table for each level). > > + > > +2. :c:func:`setup_vm_final` creates the final kernel mapping in > > + :c:var:`swapper_pg_dir` and takes advantage of the discovered system memory > > + to create the linear mapping. When establishing this mapping, the kernel > > + can allocate memory but cannot access it directly (since the direct mapping > > + is not present yet), so it uses temporary mappings in the fixmap region to > > + be able to access the newly allocated page table levels. > > + > > +For :c:func:`virt_to_phys` and :c:func:`phys_to_virt` to be able to correctly > > +convert direct mapping addresses to physical addresses, it needs to know the > > nit: s/it/they/ Ok > > > +start of the DRAM: this happens after 1, right before 2 installs the direct > > s/:/./ Ahah, you really don't like long sentences :) But of course ok :) > Also how about s/1/step 1/ & s/2/step 2/? Way better, thanks > > > +mapping (see :c:func:`setup_bootmem` function in arch/riscv/mm/init.c). So > > s/So// > Ok > > +any usage of those macros before the final virtual mapping is installed must be > > +carefully examined. > > + > > +Device-tree mapping via fixmap > > +------------------------------ > > + > > +The RISC-V kernel uses the fixmap region to map the device tree because the > > +device tree virtual mapping must remain the same between :c:func:`setup_vm` and > > +:c:func:`setup_vm_final` calls since :c:var:`reserved_mem` array is initialized > > Missing a "the" before reserved_mem. Ok > > > +with virtual addresses established by :c:func:`setup_vm` and used with the > > +mapping established by :c:func:`setup_vm_final`. > > + > > +Pre-MMU execution > > +----------------- > > + > > +Any code that executes before even the first virtual mapping is established > > +must be very carefully compiled as: > > Could you point out what the non-obvious examples of this code are? I can do a list, yes > > > +- `-fno-pie`: This is needed for relocatable kernels which use `-fPIE`, since > > Is there a reason why the capitalisation is different for the two > compiler flags? No idea! > > > + otherwise, any access to a global symbol would go through the GOT which is > > + only relocated virtually. > > +- `-mcmodel=medany`: Any access to a global symbol must be PC-relative to avoid > > + any relocations to happen before the MMU is setup. > > +- Also note that *all* instrumentation must also be disabled (that includes > > nit: s/Also note that// Ok > > > + KASAN, ftrace and others). > > + > > +As using a symbol from a different compilation unit requires this unit to be > > +compiled with those flags, we advise, as much as possible, not to use external > > +symbols. > > If the use of early alternatives grows, are we going to have to split > the vendors early alternatives into a different compilation unit from > their regular alternatives? Indeed, that would relax this constraint for the non-early part of the file, but as we already talked, the goal is to remove all those constraints by moving all this code in kernel/pi (at least it is my goal :)). > > Cheers, > Conor. Thanks for your thorough review!
On Mon, Jun 19, 2023 at 04:04:52PM +0200, Alexandre Ghiti wrote: > On Mon, Jun 19, 2023 at 2:26 PM Conor Dooley <conor.dooley@microchip.com> wrote: > > On Mon, Jun 19, 2023 at 11:47:04AM +0200, Alexandre Ghiti wrote: > > > diff --git a/Documentation/riscv/boot.rst b/Documentation/riscv/boot.rst > > > new file mode 100644 > > > index 000000000000..b02230818b79 > > > --- /dev/null > > > +++ b/Documentation/riscv/boot.rst > > > @@ -0,0 +1,181 @@ > > > +.. SPDX-License-Identifier: GPL-2.0 > > > + > > > +============================================= > > > +Early boot requirements/constraints on RISC-V > > > +============================================= > > > > Please use "title case", here and elsewhere in the doc. > > You mean using "title: " instead of "===="? Or using uppercase for the > first letter of each word? FYI I followed > https://docs.kernel.org/doc-guide/sphinx.html?highlight=title#specific-guidelines-for-the-kernel-documentation The latter. That's weird, I guess it would be nice to see what Jon thinks about that. > > > +Reserved memory for resident firmware > > > +------------------------------------- > > > + > > > +The RISC-V kernel expects the firmware to mark any resident memory with the > > > > Should this be > > "...resident memory, or memory it has protected with PMPs, with..." > > ? > > I used "resident" memory instead of "PMP" memory because it was more > general. I mean you can have a region that is resident but not > protected by PMP, and I don't think the kernel should ask for this > resident memory to be protected with PMP right? Nah, I was thinking about the opposite. PMP protected regions that do not have memory-resident programs in them. > > > +`no-map` flag, thus the kernel won't map those regions in the direct mapping > > > > "no-map" is a DT specific term, should this section be moved down under > > DT, as a sub-section of that? > > Maybe I can rephrase with something like that: > > "The RISC-V kernel must not map any resident memory in the direct > mapping, so the firmware must correctly mark those regions as follows: > - when using a devicetree, using the `no-map` flag, > - when booting with UEFI without devicetree, either as > `EfiRuntimeServicesData/Code` or `EfiReserved`." I'm not sure you need to have a list to be honest. Could do it in free-form text if you like. But you've got 3 options there for Efi stuff, isn't only one of them a valid equivalent for "no-map"? > > > > +(avoiding issues with hibernation, speculative accesses and probably other > > > +subsystems). > > > > I'm not sure that this () section is beneficial. To be honest, recent > > issues aside, this section here seems like a statement of the obvious... > > But I made the mistake, so it was not that straightforward to > me...I'll remove the (). I know, hence the "recent issues aside" ;) > > > +that's not the case. > > > + > > > +Device-tree > > Damn, missed this one, thanks! > > > > > s/Device-tree/Devicetree/ and... > > > > > +----------- > > > + > > > +The RISC-V kernel always expects a device tree, it is: > > > > ...s/device tree/devicetree/ to match elsewhere in the kernel docs. > > Same applies to the other instances of "device tree" in this patch, > > please. > > Ok I'll do that (but I'm happy to say that I thought about that and it > was intentional since "git grep "device tree" | wc -l" returns a > significant number of instances :)). Yeah, I had the same thoughts recently too. It's completely mixed unfortunately, but I suppose I was going off of the headings in the DT docs that are in rst form. It's not a big deal obviously. > > > +- either passed directly to the kernel from the previous stage using the `$a1` > > > + register, > > > +- or when booting with UEFI, the device tree will be retrieved by the EFI stub > > > + using the EFI configuration table or it will be created. > > > > Can I suggest changing this around a little, pulling the "either" & > > dropping some boilerplate so that it reads (to me!) a little more > > naturally: > > The RISC-V kernel always expects a devicetree, it is either: > > > > - passed directly to the kernel from the previous stage using the `$a1` > > register, > > - retrieved by the EFI stub when booting with UEFI, using the EFI > > configuration table or it will be created by ____. > > > > Also, please elaborate on what it will be created by. > > Is it better this way: > > "The RISC-V kernel always expects a devicetree, it is either: > > - passed directly to the kernel from the previous stage using the > `$a1` > register, > - retrieved by the EFI stub if present in the EFI configuration table, > - created by the EFI stub otherwise." Nah, I think the 2 bullet structure was better. This 3 bullet mode implies that if not passed in a1, then the EFI stub will create it. Which is obviously not true > > > > > > + > > > +Bootflow > > > > "Boot Flow", no? > > I am not sure that this is the "correct" heading for the content it > > describes, but I have nothing better to offer :/ > > Yes you're right, what about simply "Kernel Entrance"? Not sure this > is easily understandable though. "Non-boot Hart Initialisation"? > > > +-------- > > > + > > > +There exist 2 methods to enter the kernel: > > > + > > > +- `RISCV_BOOT_SPINWAIT`: the firmware releases all harts in the kernel, one hart > > > + wins a lottery and executes the early boot code while the other harts are > > > + parked waiting for the initialization to finish. This method is now > > > > nit: s/now// > > Ok > > > > > What do you mean by deprecated? There's no requirement to implement the > > HSM extension, right? > > I would say yes, you have to use a recent version of openSBI that > supports the HSM extension. @Atish Kumar Patra WDYT? Uh, you don't need to use OpenSBI in the first place, let alone use a recent version of it, right? What am I missing? Also, what about !SMP systems? Although my suggested new section title kinda addresses that! > > > + **deprecated**. > > > +- Ordered booting: the firmware releases only one hart that will execute the > > > + initialization phase and then will start all other harts using the SBI HSM > > > + extension. > > > + > > > +UEFI > > > +---- > > > + > > > +UEFI memory map > > > +~~~~~~~~~~~~~~~ > > > + > > > +When booting with UEFI, the RISC-V kernel will use only the EFI memory map to > > > +populate the system memory. > > > + > > > +The UEFI firmware must parse the subnodes of the `/reserved-memory` device tree > > > +node and abide by the device tree specification to convert the attributes of > > > +those subnodes (`no-map` and `reusable`) into their correct EFI equivalent > > > +(refer to section "3.5.4 /reserved-memory and UEFI" of the device tree > > > +specification). > > > + > > > +RISCV_EFI_BOOT_PROTOCOL > > > +~~~~~~~~~~~~~~~~~~~~~~~ > > > + > > > +When booting with UEFI, the EFI stub requires the boot hartid in order to pass > > > +it to the RISC-V kernel in `$a1`. The EFI stub retrieves the boot hartid using > > > +one of the following methods: > > > + > > > +- `RISCV_EFI_BOOT_PROTOCOL` (**preferred**). > > > +- `boot-hartid` device tree subnode (**deprecated**). > > > + > > > +Any new firmware must implement `RISCV_EFI_BOOT_PROTOCOL` as the device tree > > > +based approach is deprecated now. > > > + > > > +During kernel boot: (Kernel internals) > > > > With the other section titles changed, this could be: > > Early Boot Requirements and Constraints > > ======================================= > > > > The RISC-V kernel's early boot process operates under the > > following constraints: > > > > Thoughts? > > I think it's better as you propose, I changed it, thanks. > > > > > > +====================================== > > > + > > > +EFI stub and device tree > > > > Same comments about "device tree" here etc. > > > > > +------------------------ > > > + > > > +When booting with UEFI, the device tree is supplemented by the EFI stub with the > > > +following parameters (largely shared with arm64 in Documentation/arm/uefi.rst): > > > + > > > +========================== ====== =========================================== > > > +Name Size Description > > > +========================== ====== =========================================== > > > +linux,uefi-system-table 64-bit Physical address of the UEFI System Table. > > > > nit: Hmm, I think for all of these sizes s/-bit/ bits/. > > That's a copy-paste from the link just above the table. > > But maybe I should have pointed to the doc and added only the > "bootargs" stuff (maybe that's also present for arm64 actually). If it is identical, sounds like a good idea. It's common code that does that stuff, right? > > > +--------------------- > > > + > > > +The installation of the virtual mapping is done in 2 steps in the RISC-V kernel: > > > + > > > +1. :c:func:`setup_vm` installs a temporary kernel mapping in > > > + :c:var:`early_pg_dir` which allows to discover the system memory: only the > > > > s/to discover/discovery of/ > > You mean "the discovery of" right? No? The "the" there would not be required. > > > +For :c:func:`virt_to_phys` and :c:func:`phys_to_virt` to be able to correctly > > > +convert direct mapping addresses to physical addresses, it needs to know the > > > > nit: s/it/they/ > > Ok > > > > > > +start of the DRAM: this happens after 1, right before 2 installs the direct > > > > s/:/./ > > Ahah, you really don't like long sentences :) I dunno if it is long sentences per se, I just think it is easier to follow. > But of course ok :) > > > Also how about s/1/step 1/ & s/2/step 2/? > > Way better, thanks > > > +Pre-MMU execution > > > +----------------- > > > + > > > +Any code that executes before even the first virtual mapping is established > > > +must be very carefully compiled as: > > > > Could you point out what the non-obvious examples of this code are? > > I can do a list, yes Not even a list, just something like "...established, for example early alternatives and foo, must be very..." > Thanks for your thorough review! NW chief.
Conor Dooley <conor@kernel.org> writes: > On Mon, Jun 19, 2023 at 04:04:52PM +0200, Alexandre Ghiti wrote: >> On Mon, Jun 19, 2023 at 2:26 PM Conor Dooley <conor.dooley@microchip.com> wrote: >> > On Mon, Jun 19, 2023 at 11:47:04AM +0200, Alexandre Ghiti wrote: > >> > > diff --git a/Documentation/riscv/boot.rst b/Documentation/riscv/boot.rst >> > > new file mode 100644 >> > > index 000000000000..b02230818b79 >> > > --- /dev/null >> > > +++ b/Documentation/riscv/boot.rst >> > > @@ -0,0 +1,181 @@ >> > > +.. SPDX-License-Identifier: GPL-2.0 >> > > + >> > > +============================================= >> > > +Early boot requirements/constraints on RISC-V >> > > +============================================= >> > >> > Please use "title case", here and elsewhere in the doc. >> >> You mean using "title: " instead of "===="? Or using uppercase for the >> first letter of each word? FYI I followed >> https://docs.kernel.org/doc-guide/sphinx.html?highlight=title#specific-guidelines-for-the-kernel-documentation > > The latter. That's weird, I guess it would be nice to see what Jon > thinks about that. I have Never Been Fond of Excessive Use of Capital Letters, so my preference would be capitalization as in a normal sentence. That said, I don't really feel that something like this is important enough that we need to define and enforce a policy around it. If the maintainers for specific subsystems feel differently, then I guess that's up to them. Thanks, jon
Hey Alex, Many thanks for writing this up! On Mon, Jun 19, 2023 at 11:49:13AM +0200, Alexandre Ghiti wrote: > @Sunil V L Something about ACPI is more than welcome :) > > And thanks to @Björn Töpel and @Atish Kumar Patra for helping in > writing this document! > > On Mon, Jun 19, 2023 at 11:47 AM Alexandre Ghiti <alexghiti@rivosinc.com> wrote: > > > > This document describes the constraints and requirements of the early > > boot process in a RISC-V kernel. > > > > Szigned-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> > > --- > > Documentation/riscv/boot-image-header.rst | 3 - > > Documentation/riscv/boot.rst | 181 ++++++++++++++++++++++ > > Documentation/riscv/index.rst | 1 + > > 3 files changed, 182 insertions(+), 3 deletions(-) > > create mode 100644 Documentation/riscv/boot.rst > > > > diff --git a/Documentation/riscv/boot-image-header.rst b/Documentation/riscv/boot-image-header.rst > > index d7752533865f..a4a45310c4c4 100644 > > --- a/Documentation/riscv/boot-image-header.rst > > +++ b/Documentation/riscv/boot-image-header.rst > > @@ -7,9 +7,6 @@ Boot image header in RISC-V Linux > > > > This document only describes the boot image header details for RISC-V Linux. > > > > -TODO: > > - Write a complete booting guide. > > - > > The following 64-byte header is present in decompressed Linux kernel image:: > > > > u32 code0; /* Executable code */ > > diff --git a/Documentation/riscv/boot.rst b/Documentation/riscv/boot.rst > > new file mode 100644 > > index 000000000000..b02230818b79 > > --- /dev/null > > +++ b/Documentation/riscv/boot.rst > > @@ -0,0 +1,181 @@ > > +.. SPDX-License-Identifier: GPL-2.0 > > + > > +============================================= > > +Early boot requirements/constraints on RISC-V > > +============================================= > > + > > +:Author: Alexandre Ghiti <alexghiti@rivosinc.com> > > +:Date: 23 May 2023 > > + > > +This document describes what the RISC-V kernel expects from the previous stages > > +and the firmware, but also the constraints that any developer must have in mind > > +when touching the early boot process, e.g. before the final virtual mapping is > > +setup. > > + > > +Pre-kernel boot (Expectations from firmware) > > +============================================ > > + > > +Registers state > > +--------------- > > + > > +The RISC-V kernel expects: > > + > > + * `$a0` to contain the hartid of the current core. > > + * `$a1` to contain the address of the device tree in memory. > > + > > +CSR state > > +--------- > > + > > +The RISC-V kernel expects: > > + > > + * `$satp = 0`: the MMU must be disabled. > > + > > +Reserved memory for resident firmware > > +------------------------------------- > > + > > +The RISC-V kernel expects the firmware to mark any resident memory with the > > +`no-map` flag, thus the kernel won't map those regions in the direct mapping > > +(avoiding issues with hibernation, speculative accesses and probably other > > +subsystems). > > + > > +Kernel location > > +--------------- > > + > > +The RISC-V kernel expects to be placed at a PMD boundary (2MB for rv64 and 4MB > > +for rv32). Note though that the EFI stub will physically relocate the kernel if > > +that's not the case. > > + > > +Device-tree > > +----------- > > + > > +The RISC-V kernel always expects a device tree, it is: > > + In general, the firmware can pass either DT or ACPI (not both at the same time) to the OS. So, I think the statement that kernel always expects DT probably causes confusion. Can we mention some thing like below? The firmware can pass either DeviceTree or ACPI tables to the RISC-V kernel. The DeviceTree is either passed directly to the kernel from the previous stage using the `$a1` register, or when booting with UEFI, it can be passed using the EFI configuration table. The ACPI tables are passed to the kernel using the EFI configuration table. In this case, a tiny DeviceTree is still created by the EFI stub. Please refer to "EFI stub and device" tree section below for details of the tiny DeviceTree. Thanks, Sunil
On Mon, Jun 19, 2023 at 04:04:52PM +0200, Alexandre Ghiti wrote: > On Mon, Jun 19, 2023 at 2:26 PM Conor Dooley <conor.dooley@microchip.com> wrote: > > > > Hey Alex, > > > > Thanks for working on this :) I've got a mix of suggestions and > > questions below. Hopefully it is not too disjoint, since I didn't write > > them in order. > > > > On Mon, Jun 19, 2023 at 11:47:04AM +0200, Alexandre Ghiti wrote: > > > This document describes the constraints and requirements of the early > > > boot process in a RISC-V kernel. > > > > > > Szigned-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> > > > --- > > > Documentation/riscv/boot-image-header.rst | 3 - > > > Documentation/riscv/boot.rst | 181 ++++++++++++++++++++++ > > > Documentation/riscv/index.rst | 1 + > > > 3 files changed, 182 insertions(+), 3 deletions(-) > > > create mode 100644 Documentation/riscv/boot.rst > > > > > > diff --git a/Documentation/riscv/boot-image-header.rst b/Documentation/riscv/boot-image-header.rst > > > index d7752533865f..a4a45310c4c4 100644 > > > --- a/Documentation/riscv/boot-image-header.rst > > > +++ b/Documentation/riscv/boot-image-header.rst > > > @@ -7,9 +7,6 @@ Boot image header in RISC-V Linux > > > > > > This document only describes the boot image header details for RISC-V Linux. > > > > > > -TODO: > > > - Write a complete booting guide. > > > - > > > The following 64-byte header is present in decompressed Linux kernel image:: > > > > > > u32 code0; /* Executable code */ > > > diff --git a/Documentation/riscv/boot.rst b/Documentation/riscv/boot.rst > > > new file mode 100644 > > > index 000000000000..b02230818b79 > > > --- /dev/null > > > +++ b/Documentation/riscv/boot.rst > > > @@ -0,0 +1,181 @@ > > > +.. SPDX-License-Identifier: GPL-2.0 > > > + > > > +============================================= > > > +Early boot requirements/constraints on RISC-V > > > +============================================= > > > > Please use "title case", here and elsewhere in the doc. > > You mean using "title: " instead of "===="? Or using uppercase for the > first letter of each word? FYI I followed > https://docs.kernel.org/doc-guide/sphinx.html?highlight=title#specific-guidelines-for-the-kernel-documentation > > > I'd also be inclined to drop the "Early" from here, as it permits more > > natural section headings. Perhaps "RISC-V Kernel Boot Requirements and > > Constraints"? > > Good suggestion, I'll go with that, thanks > > > > > > + > > > +:Author: Alexandre Ghiti <alexghiti@rivosinc.com> > > > +:Date: 23 May 2023 > > > + > > > +This document describes what the RISC-V kernel expects from the previous stages > > > > "the previous stages" is a bit vague IMO. You mean bootloader stages I > > assume, but I think it should be explicit. Perhaps: > > "...what a RISC-V kernel expects from bootloaders and firmware, and the > > constraints..." > > > > > +and the firmware, but also the constraints that any developer must have in mind > > > +when touching the early boot process, e.g. before the final virtual mapping is > > > +setup. > > > > s/setup./set up./ > > > > Do you mean to have "For example" here? Or is "before the final virtual > > mapping is set up" the definition or "early boot"? If the latter, I > > would reword this as something like: > > "...when modifying the early boot process. For the purposes of this > > document, the 'early boot process' refers to any code that runs before > > the final virtual mapping is set up." > > Thanks, that's what I meant. > > > > > > +Pre-kernel boot (Expectations from firmware) > > > > Firmware or bootloaders? TBH, I would just drop the section in () and > > do something like: > > Pre-kernel Requirements and Constraints > > ======================================= > > > > The RISC-V kernel expects the following of bootloaders and platform > > firmware: > > > > Ok > > > > + > > > +Registers state > > > > s/Registers state/Register State/ > > Ok > > > > > > +--------------- > > > + > > > +The RISC-V kernel expects: > > > + > > > + * `$a0` to contain the hartid of the current core. > > > + * `$a1` to contain the address of the device tree in memory. > > > + > > > +CSR state > > > +--------- > > > + > > > +The RISC-V kernel expects: > > > + > > > + * `$satp = 0`: the MMU must be disabled. > > > > "the MMU, if present, must be disabled." ;) > > Ahah forgot the !mmu case, thanks :) > > > > > > + > > > +Reserved memory for resident firmware > > > +------------------------------------- > > > + > > > +The RISC-V kernel expects the firmware to mark any resident memory with the > > > > Should this be > > "...resident memory, or memory it has protected with PMPs, with..." > > ? > > I used "resident" memory instead of "PMP" memory because it was more > general. I mean you can have a region that is resident but not > protected by PMP, and I don't think the kernel should ask for this > resident memory to be protected with PMP right? > > > > > > +`no-map` flag, thus the kernel won't map those regions in the direct mapping > > > > "no-map" is a DT specific term, should this section be moved down under > > DT, as a sub-section of that? > > Maybe I can rephrase with something like that: > > "The RISC-V kernel must not map any resident memory in the direct > mapping, so the firmware must correctly mark those regions as follows: > - when using a devicetree, using the `no-map` flag, > - when booting with UEFI without devicetree, either as > `EfiRuntimeServicesData/Code` or `EfiReserved`." > Hi Alex, I am not sure about the idea behind mentioning only UEFI boot without DT since UEFI boot is supported with DT also. Should we just mention that "when booting with UEFI, resident firmware ranges must be marked as per UEFI specification" ? Converting reserved-memory node in DT to UEFI memory map is anyway mentioned separately under UEFI memory map section right? Thanks, Sunil
On Tue, Jun 20, 2023 at 9:18 AM Sunil V L <sunilvl@ventanamicro.com> wrote: > > Hey Alex, > > Many thanks for writing this up! > > On Mon, Jun 19, 2023 at 11:49:13AM +0200, Alexandre Ghiti wrote: > > @Sunil V L Something about ACPI is more than welcome :) > > > > And thanks to @Björn Töpel and @Atish Kumar Patra for helping in > > writing this document! > > > > On Mon, Jun 19, 2023 at 11:47 AM Alexandre Ghiti <alexghiti@rivosinc.com> wrote: > > > > > > This document describes the constraints and requirements of the early > > > boot process in a RISC-V kernel. > > > > > > Szigned-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> > > > --- > > > Documentation/riscv/boot-image-header.rst | 3 - > > > Documentation/riscv/boot.rst | 181 ++++++++++++++++++++++ > > > Documentation/riscv/index.rst | 1 + > > > 3 files changed, 182 insertions(+), 3 deletions(-) > > > create mode 100644 Documentation/riscv/boot.rst > > > > > > diff --git a/Documentation/riscv/boot-image-header.rst b/Documentation/riscv/boot-image-header.rst > > > index d7752533865f..a4a45310c4c4 100644 > > > --- a/Documentation/riscv/boot-image-header.rst > > > +++ b/Documentation/riscv/boot-image-header.rst > > > @@ -7,9 +7,6 @@ Boot image header in RISC-V Linux > > > > > > This document only describes the boot image header details for RISC-V Linux. > > > > > > -TODO: > > > - Write a complete booting guide. > > > - > > > The following 64-byte header is present in decompressed Linux kernel image:: > > > > > > u32 code0; /* Executable code */ > > > diff --git a/Documentation/riscv/boot.rst b/Documentation/riscv/boot.rst > > > new file mode 100644 > > > index 000000000000..b02230818b79 > > > --- /dev/null > > > +++ b/Documentation/riscv/boot.rst > > > @@ -0,0 +1,181 @@ > > > +.. SPDX-License-Identifier: GPL-2.0 > > > + > > > +============================================= > > > +Early boot requirements/constraints on RISC-V > > > +============================================= > > > + > > > +:Author: Alexandre Ghiti <alexghiti@rivosinc.com> > > > +:Date: 23 May 2023 > > > + > > > +This document describes what the RISC-V kernel expects from the previous stages > > > +and the firmware, but also the constraints that any developer must have in mind > > > +when touching the early boot process, e.g. before the final virtual mapping is > > > +setup. > > > + > > > +Pre-kernel boot (Expectations from firmware) > > > +============================================ > > > + > > > +Registers state > > > +--------------- > > > + > > > +The RISC-V kernel expects: > > > + > > > + * `$a0` to contain the hartid of the current core. > > > + * `$a1` to contain the address of the device tree in memory. > > > + > > > +CSR state > > > +--------- > > > + > > > +The RISC-V kernel expects: > > > + > > > + * `$satp = 0`: the MMU must be disabled. > > > + > > > +Reserved memory for resident firmware > > > +------------------------------------- > > > + > > > +The RISC-V kernel expects the firmware to mark any resident memory with the > > > +`no-map` flag, thus the kernel won't map those regions in the direct mapping > > > +(avoiding issues with hibernation, speculative accesses and probably other > > > +subsystems). > > > + > > > +Kernel location > > > +--------------- > > > + > > > +The RISC-V kernel expects to be placed at a PMD boundary (2MB for rv64 and 4MB > > > +for rv32). Note though that the EFI stub will physically relocate the kernel if > > > +that's not the case. > > > + > > > +Device-tree > > > +----------- > > > + > > > +The RISC-V kernel always expects a device tree, it is: > > > + > In general, the firmware can pass either DT or ACPI (not both at the > same time) to the OS. So, I think the statement that kernel always > expects DT probably causes confusion. Can we mention some thing like > below? > > The firmware can pass either DeviceTree or ACPI tables to the RISC-V > kernel. > > The DeviceTree is either passed directly to the kernel from the > previous stage using the `$a1` register, or when booting with UEFI, it > can be passed using the EFI configuration table. > > The ACPI tables are passed to the kernel using the EFI configuration > table. In this case, a tiny DeviceTree is still created by the EFI > stub. Please refer to "EFI stub and device" tree section below for > details of the tiny DeviceTree. > Great, this is way better so I'll go with that: Hardware description -------------------- The firmware can pass either a devicetree or ACPI tables to the RISC-V kernel. The devicetree is either passed directly to the kernel from the previous stage using the `$a1` register, or when booting with UEFI, it can be passed using the EFI configuration table. The ACPI tables are passed to the kernel using the EFI configuration table. In this case, a tiny devicetree is still created by the EFI stub. Please refer to "EFI stub and devicetree" tree section below for details about this devicetree. Thanks! > Thanks, > Sunil
On Tue, Jun 20, 2023 at 9:41 AM Sunil V L <sunilvl@ventanamicro.com> wrote: > > On Mon, Jun 19, 2023 at 04:04:52PM +0200, Alexandre Ghiti wrote: > > On Mon, Jun 19, 2023 at 2:26 PM Conor Dooley <conor.dooley@microchip.com> wrote: > > > > > > Hey Alex, > > > > > > Thanks for working on this :) I've got a mix of suggestions and > > > questions below. Hopefully it is not too disjoint, since I didn't write > > > them in order. > > > > > > On Mon, Jun 19, 2023 at 11:47:04AM +0200, Alexandre Ghiti wrote: > > > > This document describes the constraints and requirements of the early > > > > boot process in a RISC-V kernel. > > > > > > > > Szigned-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> > > > > --- > > > > Documentation/riscv/boot-image-header.rst | 3 - > > > > Documentation/riscv/boot.rst | 181 ++++++++++++++++++++++ > > > > Documentation/riscv/index.rst | 1 + > > > > 3 files changed, 182 insertions(+), 3 deletions(-) > > > > create mode 100644 Documentation/riscv/boot.rst > > > > > > > > diff --git a/Documentation/riscv/boot-image-header.rst b/Documentation/riscv/boot-image-header.rst > > > > index d7752533865f..a4a45310c4c4 100644 > > > > --- a/Documentation/riscv/boot-image-header.rst > > > > +++ b/Documentation/riscv/boot-image-header.rst > > > > @@ -7,9 +7,6 @@ Boot image header in RISC-V Linux > > > > > > > > This document only describes the boot image header details for RISC-V Linux. > > > > > > > > -TODO: > > > > - Write a complete booting guide. > > > > - > > > > The following 64-byte header is present in decompressed Linux kernel image:: > > > > > > > > u32 code0; /* Executable code */ > > > > diff --git a/Documentation/riscv/boot.rst b/Documentation/riscv/boot.rst > > > > new file mode 100644 > > > > index 000000000000..b02230818b79 > > > > --- /dev/null > > > > +++ b/Documentation/riscv/boot.rst > > > > @@ -0,0 +1,181 @@ > > > > +.. SPDX-License-Identifier: GPL-2.0 > > > > + > > > > +============================================= > > > > +Early boot requirements/constraints on RISC-V > > > > +============================================= > > > > > > Please use "title case", here and elsewhere in the doc. > > > > You mean using "title: " instead of "===="? Or using uppercase for the > > first letter of each word? FYI I followed > > https://docs.kernel.org/doc-guide/sphinx.html?highlight=title#specific-guidelines-for-the-kernel-documentation > > > > > I'd also be inclined to drop the "Early" from here, as it permits more > > > natural section headings. Perhaps "RISC-V Kernel Boot Requirements and > > > Constraints"? > > > > Good suggestion, I'll go with that, thanks > > > > > > > > > + > > > > +:Author: Alexandre Ghiti <alexghiti@rivosinc.com> > > > > +:Date: 23 May 2023 > > > > + > > > > +This document describes what the RISC-V kernel expects from the previous stages > > > > > > "the previous stages" is a bit vague IMO. You mean bootloader stages I > > > assume, but I think it should be explicit. Perhaps: > > > "...what a RISC-V kernel expects from bootloaders and firmware, and the > > > constraints..." > > > > > > > +and the firmware, but also the constraints that any developer must have in mind > > > > +when touching the early boot process, e.g. before the final virtual mapping is > > > > +setup. > > > > > > s/setup./set up./ > > > > > > Do you mean to have "For example" here? Or is "before the final virtual > > > mapping is set up" the definition or "early boot"? If the latter, I > > > would reword this as something like: > > > "...when modifying the early boot process. For the purposes of this > > > document, the 'early boot process' refers to any code that runs before > > > the final virtual mapping is set up." > > > > Thanks, that's what I meant. > > > > > > > > > +Pre-kernel boot (Expectations from firmware) > > > > > > Firmware or bootloaders? TBH, I would just drop the section in () and > > > do something like: > > > Pre-kernel Requirements and Constraints > > > ======================================= > > > > > > The RISC-V kernel expects the following of bootloaders and platform > > > firmware: > > > > > > > Ok > > > > > > + > > > > +Registers state > > > > > > s/Registers state/Register State/ > > > > Ok > > > > > > > > > +--------------- > > > > + > > > > +The RISC-V kernel expects: > > > > + > > > > + * `$a0` to contain the hartid of the current core. > > > > + * `$a1` to contain the address of the device tree in memory. > > > > + > > > > +CSR state > > > > +--------- > > > > + > > > > +The RISC-V kernel expects: > > > > + > > > > + * `$satp = 0`: the MMU must be disabled. > > > > > > "the MMU, if present, must be disabled." ;) > > > > Ahah forgot the !mmu case, thanks :) > > > > > > > > > + > > > > +Reserved memory for resident firmware > > > > +------------------------------------- > > > > + > > > > +The RISC-V kernel expects the firmware to mark any resident memory with the > > > > > > Should this be > > > "...resident memory, or memory it has protected with PMPs, with..." > > > ? > > > > I used "resident" memory instead of "PMP" memory because it was more > > general. I mean you can have a region that is resident but not > > protected by PMP, and I don't think the kernel should ask for this > > resident memory to be protected with PMP right? > > > > > > > > > +`no-map` flag, thus the kernel won't map those regions in the direct mapping > > > > > > "no-map" is a DT specific term, should this section be moved down under > > > DT, as a sub-section of that? > > > > Maybe I can rephrase with something like that: > > > > "The RISC-V kernel must not map any resident memory in the direct > > mapping, so the firmware must correctly mark those regions as follows: > > - when using a devicetree, using the `no-map` flag, > > - when booting with UEFI without devicetree, either as > > `EfiRuntimeServicesData/Code` or `EfiReserved`." > > > Hi Alex, > > I am not sure about the idea behind mentioning only UEFI boot without > DT since UEFI boot is supported with DT also. Should we just mention > that "when booting with UEFI, resident firmware ranges must be marked as > per UEFI specification" ? Converting reserved-memory node in DT to UEFI > memory map is anyway mentioned separately under UEFI memory map section > right? Right, let's make it simple: "The RISC-V kernel must not map any resident memory in the direct mapping, so the firmware must correctly mark those regions as per the devicetree specification and/or the UEFI specification." Feel free to comment if that's still not right to you, Thanks, > > Thanks, > Sunil
On Mon, Jun 19, 2023 at 6:00 PM Conor Dooley <conor@kernel.org> wrote: > > On Mon, Jun 19, 2023 at 04:04:52PM +0200, Alexandre Ghiti wrote: > > On Mon, Jun 19, 2023 at 2:26 PM Conor Dooley <conor.dooley@microchip.com> wrote: > > > On Mon, Jun 19, 2023 at 11:47:04AM +0200, Alexandre Ghiti wrote: > > > > > diff --git a/Documentation/riscv/boot.rst b/Documentation/riscv/boot.rst > > > > new file mode 100644 > > > > index 000000000000..b02230818b79 > > > > --- /dev/null > > > > +++ b/Documentation/riscv/boot.rst > > > > @@ -0,0 +1,181 @@ > > > > +.. SPDX-License-Identifier: GPL-2.0 > > > > + > > > > +============================================= > > > > +Early boot requirements/constraints on RISC-V > > > > +============================================= > > > > > > Please use "title case", here and elsewhere in the doc. > > > > You mean using "title: " instead of "===="? Or using uppercase for the > > first letter of each word? FYI I followed > > https://docs.kernel.org/doc-guide/sphinx.html?highlight=title#specific-guidelines-for-the-kernel-documentation > > The latter. That's weird, I guess it would be nice to see what Jon > thinks about that. > > > > > +Reserved memory for resident firmware > > > > +------------------------------------- > > > > + > > > > +The RISC-V kernel expects the firmware to mark any resident memory with the > > > > > > Should this be > > > "...resident memory, or memory it has protected with PMPs, with..." > > > ? > > > > I used "resident" memory instead of "PMP" memory because it was more > > general. I mean you can have a region that is resident but not > > protected by PMP, and I don't think the kernel should ask for this > > resident memory to be protected with PMP right? > > Nah, I was thinking about the opposite. PMP protected regions that do > not have memory-resident programs in them. Ok, I'll add a reference to PMP regions like you mentioned the first time. > > > > > +`no-map` flag, thus the kernel won't map those regions in the direct mapping > > > > > > "no-map" is a DT specific term, should this section be moved down under > > > DT, as a sub-section of that? > > > > Maybe I can rephrase with something like that: > > > > "The RISC-V kernel must not map any resident memory in the direct > > mapping, so the firmware must correctly mark those regions as follows: > > - when using a devicetree, using the `no-map` flag, > > - when booting with UEFI without devicetree, either as > > `EfiRuntimeServicesData/Code` or `EfiReserved`." > > I'm not sure you need to have a list to be honest. Could do it in > free-form text if you like. But you've got 3 options there for Efi > stuff, isn't only one of them a valid equivalent for "no-map"? Indeed, the no-map attribute is (according to the dt spec) always converted to EfiReservedMemoryType and no-map is the only way to "reserve" such memory using devicetree. A firmware can also mark those regions as EfiRuntimeServicesData/Code so that they don't get mapped in the direct mapping. But ok, let's make it simple, I'll go with something like Sunil's proposition. > > > > > > +(avoiding issues with hibernation, speculative accesses and probably other > > > > +subsystems). > > > > > > I'm not sure that this () section is beneficial. To be honest, recent > > > issues aside, this section here seems like a statement of the obvious... > > > > But I made the mistake, so it was not that straightforward to > > me...I'll remove the (). > > I know, hence the "recent issues aside" ;) > > > > > +that's not the case. > > > > + > > > > +Device-tree > > > > Damn, missed this one, thanks! > > > > > > > > s/Device-tree/Devicetree/ and... > > > > > > > +----------- > > > > + > > > > +The RISC-V kernel always expects a device tree, it is: > > > > > > ...s/device tree/devicetree/ to match elsewhere in the kernel docs. > > > Same applies to the other instances of "device tree" in this patch, > > > please. > > > > Ok I'll do that (but I'm happy to say that I thought about that and it > > was intentional since "git grep "device tree" | wc -l" returns a > > significant number of instances :)). > > Yeah, I had the same thoughts recently too. It's completely mixed > unfortunately, but I suppose I was going off of the headings in the DT > docs that are in rst form. It's not a big deal obviously. > > > > > +- either passed directly to the kernel from the previous stage using the `$a1` > > > > + register, > > > > +- or when booting with UEFI, the device tree will be retrieved by the EFI stub > > > > + using the EFI configuration table or it will be created. > > > > > > Can I suggest changing this around a little, pulling the "either" & > > > dropping some boilerplate so that it reads (to me!) a little more > > > naturally: > > > The RISC-V kernel always expects a devicetree, it is either: > > > > > > - passed directly to the kernel from the previous stage using the `$a1` > > > register, > > > - retrieved by the EFI stub when booting with UEFI, using the EFI > > > configuration table or it will be created by ____. > > > > > > Also, please elaborate on what it will be created by. > > > > Is it better this way: > > > > "The RISC-V kernel always expects a devicetree, it is either: > > > > - passed directly to the kernel from the previous stage using the > > `$a1` > > register, > > - retrieved by the EFI stub if present in the EFI configuration table, > > - created by the EFI stub otherwise." > > Nah, I think the 2 bullet structure was better. This 3 bullet mode > implies that if not passed in a1, then the EFI stub will create it. > Which is obviously not true I'll go with Sunil's proposition which is way better than the list above! > > > > > > > > > + > > > > +Bootflow > > > > > > "Boot Flow", no? > > > I am not sure that this is the "correct" heading for the content it > > > describes, but I have nothing better to offer :/ > > > > Yes you're right, what about simply "Kernel Entrance"? Not sure this > > is easily understandable though. > > "Non-boot Hart Initialisation"? Hmmm not that great either (sorry for being direct here) > > > > > +-------- > > > > + > > > > +There exist 2 methods to enter the kernel: > > > > + > > > > +- `RISCV_BOOT_SPINWAIT`: the firmware releases all harts in the kernel, one hart > > > > + wins a lottery and executes the early boot code while the other harts are > > > > + parked waiting for the initialization to finish. This method is now > > > > > > nit: s/now// > > > > Ok > > > > > > > > What do you mean by deprecated? There's no requirement to implement the > > > HSM extension, right? > > > > I would say yes, you have to use a recent version of openSBI that > > supports the HSM extension. @Atish Kumar Patra WDYT? > > Uh, you don't need to use OpenSBI in the first place, let alone use a > recent version of it, right? What am I missing? You need a M-Mode firmware which follows the SBI specification and that implements the HSM extension. > Also, what about !SMP systems? Although my suggested new section title > kinda addresses that! I'll add "On SMP systems, there exist 2 methods to enter the kernel:....", I don't think we need to detail the !SMP case as it is obvious. > > > > > + **deprecated**. > > > > +- Ordered booting: the firmware releases only one hart that will execute the > > > > + initialization phase and then will start all other harts using the SBI HSM > > > > + extension. > > > > + > > > > +UEFI > > > > +---- > > > > + > > > > +UEFI memory map > > > > +~~~~~~~~~~~~~~~ > > > > + > > > > +When booting with UEFI, the RISC-V kernel will use only the EFI memory map to > > > > +populate the system memory. > > > > + > > > > +The UEFI firmware must parse the subnodes of the `/reserved-memory` device tree > > > > +node and abide by the device tree specification to convert the attributes of > > > > +those subnodes (`no-map` and `reusable`) into their correct EFI equivalent > > > > +(refer to section "3.5.4 /reserved-memory and UEFI" of the device tree > > > > +specification). > > > > + > > > > +RISCV_EFI_BOOT_PROTOCOL > > > > +~~~~~~~~~~~~~~~~~~~~~~~ > > > > + > > > > +When booting with UEFI, the EFI stub requires the boot hartid in order to pass > > > > +it to the RISC-V kernel in `$a1`. The EFI stub retrieves the boot hartid using > > > > +one of the following methods: > > > > + > > > > +- `RISCV_EFI_BOOT_PROTOCOL` (**preferred**). > > > > +- `boot-hartid` device tree subnode (**deprecated**). > > > > + > > > > +Any new firmware must implement `RISCV_EFI_BOOT_PROTOCOL` as the device tree > > > > +based approach is deprecated now. > > > > + > > > > +During kernel boot: (Kernel internals) > > > > > > With the other section titles changed, this could be: > > > Early Boot Requirements and Constraints > > > ======================================= > > > > > > The RISC-V kernel's early boot process operates under the > > > following constraints: > > > > > > Thoughts? > > > > I think it's better as you propose, I changed it, thanks. > > > > > > > > > +====================================== > > > > + > > > > +EFI stub and device tree > > > > > > Same comments about "device tree" here etc. > > > > > > > +------------------------ > > > > + > > > > +When booting with UEFI, the device tree is supplemented by the EFI stub with the > > > > +following parameters (largely shared with arm64 in Documentation/arm/uefi.rst): > > > > + > > > > +========================== ====== =========================================== > > > > +Name Size Description > > > > +========================== ====== =========================================== > > > > +linux,uefi-system-table 64-bit Physical address of the UEFI System Table. > > > > > > nit: Hmm, I think for all of these sizes s/-bit/ bits/. > > > > That's a copy-paste from the link just above the table. > > > > But maybe I should have pointed to the doc and added only the > > "bootargs" stuff (maybe that's also present for arm64 actually). > > If it is identical, sounds like a good idea. It's common code that does > that stuff, right? Yep, all of that is done in the same generic function in libstub (update_fdt()). I'll move the bootargs stuff to the arm documentation \o/ > > > > > +--------------------- > > > > + > > > > +The installation of the virtual mapping is done in 2 steps in the RISC-V kernel: > > > > + > > > > +1. :c:func:`setup_vm` installs a temporary kernel mapping in > > > > + :c:var:`early_pg_dir` which allows to discover the system memory: only the > > > > > > s/to discover/discovery of/ > > > > You mean "the discovery of" right? > > No? The "the" there would not be required. That was a genuine question, thanks > > > > > +For :c:func:`virt_to_phys` and :c:func:`phys_to_virt` to be able to correctly > > > > +convert direct mapping addresses to physical addresses, it needs to know the > > > > > > nit: s/it/they/ > > > > Ok > > > > > > > > > +start of the DRAM: this happens after 1, right before 2 installs the direct > > > > > > s/:/./ > > > > Ahah, you really don't like long sentences :) > > I dunno if it is long sentences per se, I just think it is easier to > follow. > > > But of course ok :) > > > > > Also how about s/1/step 1/ & s/2/step 2/? > > > > Way better, thanks > > > > > +Pre-MMU execution > > > > +----------------- > > > > + > > > > +Any code that executes before even the first virtual mapping is established > > > > +must be very carefully compiled as: > > > > > > Could you point out what the non-obvious examples of this code are? > > > > I can do a list, yes > > Not even a list, just something like "...established, for example early > alternatives and foo, must be very..." Done as follows: "A few pieces of code need to run before even the first virtual mapping is established, that comprises the installation of the first virtual mapping, the early alternatives and the early parsing of the kernel command line. That code must be very carefully compiled as:..." > > > Thanks for your thorough review! > > NW chief. >
On Mon, Jun 19, 2023 at 10:48:06AM -0600, Jonathan Corbet wrote: > Conor Dooley <conor@kernel.org> writes: > > > On Mon, Jun 19, 2023 at 04:04:52PM +0200, Alexandre Ghiti wrote: > >> On Mon, Jun 19, 2023 at 2:26 PM Conor Dooley <conor.dooley@microchip.com> wrote: > >> > On Mon, Jun 19, 2023 at 11:47:04AM +0200, Alexandre Ghiti wrote: > > > >> > > diff --git a/Documentation/riscv/boot.rst b/Documentation/riscv/boot.rst > >> > > new file mode 100644 > >> > > index 000000000000..b02230818b79 > >> > > --- /dev/null > >> > > +++ b/Documentation/riscv/boot.rst > >> > > @@ -0,0 +1,181 @@ > >> > > +.. SPDX-License-Identifier: GPL-2.0 > >> > > + > >> > > +============================================= > >> > > +Early boot requirements/constraints on RISC-V > >> > > +============================================= > >> > > >> > Please use "title case", here and elsewhere in the doc. > >> > >> You mean using "title: " instead of "===="? Or using uppercase for the > >> first letter of each word? FYI I followed > >> https://docs.kernel.org/doc-guide/sphinx.html?highlight=title#specific-guidelines-for-the-kernel-documentation > > > > The latter. That's weird, I guess it would be nice to see what Jon > > thinks about that. > > I have Never Been Fond of Excessive Use of Capital Letters, so my > preference would be capitalization as in a normal sentence. > > That said, I don't really feel that something like this is important > enough that we need to define and enforce a policy around it. If the > maintainers for specific subsystems feel differently, then I guess > that's up to them. Okay, thanks.
在 2023/6/19 17:47, Alexandre Ghiti 写道: > This document describes the constraints and requirements of the early > boot process in a RISC-V kernel. > > Szigned-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> > --- > Documentation/riscv/boot-image-header.rst | 3 - > Documentation/riscv/boot.rst | 181 ++++++++++++++++++++++ > Documentation/riscv/index.rst | 1 + > 3 files changed, 182 insertions(+), 3 deletions(-) > create mode 100644 Documentation/riscv/boot.rst > > diff --git a/Documentation/riscv/boot-image-header.rst b/Documentation/riscv/boot-image-header.rst > index d7752533865f..a4a45310c4c4 100644 > --- a/Documentation/riscv/boot-image-header.rst > +++ b/Documentation/riscv/boot-image-header.rst > @@ -7,9 +7,6 @@ Boot image header in RISC-V Linux > > This document only describes the boot image header details for RISC-V Linux. > > -TODO: > - Write a complete booting guide. > - > The following 64-byte header is present in decompressed Linux kernel image:: > > u32 code0; /* Executable code */ > diff --git a/Documentation/riscv/boot.rst b/Documentation/riscv/boot.rst > new file mode 100644 > index 000000000000..b02230818b79 > --- /dev/null > +++ b/Documentation/riscv/boot.rst > @@ -0,0 +1,181 @@ > +.. SPDX-License-Identifier: GPL-2.0 > + > +============================================= > +Early boot requirements/constraints on RISC-V > +============================================= > + > +:Author: Alexandre Ghiti <alexghiti@rivosinc.com> > +:Date: 23 May 2023 > + > +This document describes what the RISC-V kernel expects from the previous stages > +and the firmware, but also the constraints that any developer must have in mind > +when touching the early boot process, e.g. before the final virtual mapping is > +setup. > + > +Pre-kernel boot (Expectations from firmware) > +============================================ > + > +Registers state > +--------------- > + > +The RISC-V kernel expects: > + > + * `$a0` to contain the hartid of the current core. > + * `$a1` to contain the address of the device tree in memory. > + > +CSR state > +--------- > + > +The RISC-V kernel expects: > + > + * `$satp = 0`: the MMU must be disabled. > + > +Reserved memory for resident firmware > +------------------------------------- > + > +The RISC-V kernel expects the firmware to mark any resident memory with the > +`no-map` flag, thus the kernel won't map those regions in the direct mapping > +(avoiding issues with hibernation, speculative accesses and probably other > +subsystems). > + > +Kernel location > +--------------- > + > +The RISC-V kernel expects to be placed at a PMD boundary (2MB for rv64 and 4MB > +for rv32). Note though that the EFI stub will physically relocate the kernel if > +that's not the case. > + > +Device-tree > +----------- > + > +The RISC-V kernel always expects a device tree, it is: > + > +- either passed directly to the kernel from the previous stage using the `$a1` > + register, > +- or when booting with UEFI, the device tree will be retrieved by the EFI stub > + using the EFI configuration table or it will be created. > + > +Bootflow > +-------- > + > +There exist 2 methods to enter the kernel: > + > +- `RISCV_BOOT_SPINWAIT`: the firmware releases all harts in the kernel, one hart > + wins a lottery and executes the early boot code while the other harts are > + parked waiting for the initialization to finish. This method is now > + **deprecated**. > +- Ordered booting: the firmware releases only one hart that will execute the > + initialization phase and then will start all other harts using the SBI HSM > + extension. > + > +UEFI > +---- > + > +UEFI memory map > +~~~~~~~~~~~~~~~ > + > +When booting with UEFI, the RISC-V kernel will use only the EFI memory map to > +populate the system memory. > + > +The UEFI firmware must parse the subnodes of the `/reserved-memory` device tree > +node and abide by the device tree specification to convert the attributes of > +those subnodes (`no-map` and `reusable`) into their correct EFI equivalent > +(refer to section "3.5.4 /reserved-memory and UEFI" of the device tree > +specification). how about declare the version of Device Tree specification? like the devicetree-specification-v0.4-rc1 we recently reference > + > +RISCV_EFI_BOOT_PROTOCOL > +~~~~~~~~~~~~~~~~~~~~~~~ > + > +When booting with UEFI, the EFI stub requires the boot hartid in order to pass > +it to the RISC-V kernel in `$a1`. The EFI stub retrieves the boot hartid using > +one of the following methods: > + > +- `RISCV_EFI_BOOT_PROTOCOL` (**preferred**). > +- `boot-hartid` device tree subnode (**deprecated**). > + > +Any new firmware must implement `RISCV_EFI_BOOT_PROTOCOL` as the device tree > +based approach is deprecated now. > + > +During kernel boot: (Kernel internals) > +====================================== > + > +EFI stub and device tree > +------------------------ > + > +When booting with UEFI, the device tree is supplemented by the EFI stub with the > +following parameters (largely shared with arm64 in Documentation/arm/uefi.rst): > + > +========================== ====== =========================================== > +Name Size Description > +========================== ====== =========================================== > +linux,uefi-system-table 64-bit Physical address of the UEFI System Table. > + > +linux,uefi-mmap-start 64-bit Physical address of the UEFI memory map, > + populated by the UEFI GetMemoryMap() call. > + > +linux,uefi-mmap-size 32-bit Size in bytes of the UEFI memory map > + pointed to in previous entry. > + > +linux,uefi-mmap-desc-size 32-bit Size in bytes of each entry in the UEFI > + memory map. > + > +linux,uefi-mmap-desc-ver 32-bit Version of the mmap descriptor format. > + > +kaslr-seed 64-bit Entropy used to randomize the kernel image > + base address location. > + > +bootargs Kernel command line how about use "string" to declare the type of "bootargs" and replace the "Size" in header with "Type" > +========================== ====== =========================================== > + > +Virtual mapping setup > +--------------------- > + > +The installation of the virtual mapping is done in 2 steps in the RISC-V kernel: > + > +1. :c:func:`setup_vm` installs a temporary kernel mapping in > + :c:var:`early_pg_dir` which allows to discover the system memory: only the > + kernel text/data are mapped at this point. When establishing this mapping, > + no allocation can be done (since the system memory is not known yet), so > + :c:var:`early_pg_dir` page table is statically allocated (using only one > + table for each level). > + > +2. :c:func:`setup_vm_final` creates the final kernel mapping in > + :c:var:`swapper_pg_dir` and takes advantage of the discovered system memory > + to create the linear mapping. When establishing this mapping, the kernel > + can allocate memory but cannot access it directly (since the direct mapping > + is not present yet), so it uses temporary mappings in the fixmap region to > + be able to access the newly allocated page table levels. > + > +For :c:func:`virt_to_phys` and :c:func:`phys_to_virt` to be able to correctly > +convert direct mapping addresses to physical addresses, it needs to know the > +start of the DRAM: this happens after 1, right before 2 installs the direct > +mapping (see :c:func:`setup_bootmem` function in arch/riscv/mm/init.c). So > +any usage of those macros before the final virtual mapping is installed must be > +carefully examined. > + > +Device-tree mapping via fixmap > +------------------------------ > + > +The RISC-V kernel uses the fixmap region to map the device tree because the > +device tree virtual mapping must remain the same between :c:func:`setup_vm` and > +:c:func:`setup_vm_final` calls since :c:var:`reserved_mem` array is initialized > +with virtual addresses established by :c:func:`setup_vm` and used with the > +mapping established by :c:func:`setup_vm_final`. > + > +Pre-MMU execution > +----------------- > + > +Any code that executes before even the first virtual mapping is established > +must be very carefully compiled as: > + > +- `-fno-pie`: This is needed for relocatable kernels which use `-fPIE`, since > + otherwise, any access to a global symbol would go through the GOT which is > + only relocated virtually. > +- `-mcmodel=medany`: Any access to a global symbol must be PC-relative to avoid > + any relocations to happen before the MMU is setup. > +- Also note that *all* instrumentation must also be disabled (that includes > + KASAN, ftrace and others). > + > +As using a symbol from a different compilation unit requires this unit to be > +compiled with those flags, we advise, as much as possible, not to use external > +symbols. > diff --git a/Documentation/riscv/index.rst b/Documentation/riscv/index.rst > index 175a91db0200..1f66062def6d 100644 > --- a/Documentation/riscv/index.rst > +++ b/Documentation/riscv/index.rst > @@ -5,6 +5,7 @@ RISC-V architecture > .. toctree:: > :maxdepth: 1 > > + boot > boot-image-header > vm-layout > hwprobe
On Tue, Jun 20, 2023 at 11:09:47AM +0200, Alexandre Ghiti wrote: > On Mon, Jun 19, 2023 at 6:00 PM Conor Dooley <conor@kernel.org> wrote: > > On Mon, Jun 19, 2023 at 04:04:52PM +0200, Alexandre Ghiti wrote: > > > On Mon, Jun 19, 2023 at 2:26 PM Conor Dooley <conor.dooley@microchip.com> wrote: > > > > On Mon, Jun 19, 2023 at 11:47:04AM +0200, Alexandre Ghiti wrote: > > > > > +Bootflow > > > > > > > > "Boot Flow", no? > > > > I am not sure that this is the "correct" heading for the content it > > > > describes, but I have nothing better to offer :/ > > > > > > Yes you're right, what about simply "Kernel Entrance"? Not sure this > > > is easily understandable though. > > > > "Non-boot Hart Initialisation"? > > Hmmm not that great either (sorry for being direct here) lol, no need to apologise. > > > > > +There exist 2 methods to enter the kernel: > > > > > + > > > > > +- `RISCV_BOOT_SPINWAIT`: the firmware releases all harts in the kernel, one hart > > > > > + wins a lottery and executes the early boot code while the other harts are > > > > > + parked waiting for the initialization to finish. This method is now > > > > > > > > nit: s/now// > > > > > > Ok > > > > > > > > > > > What do you mean by deprecated? There's no requirement to implement the > > > > HSM extension, right? > > > > > > I would say yes, you have to use a recent version of openSBI that > > > supports the HSM extension. @Atish Kumar Patra WDYT? > > > > Uh, you don't need to use OpenSBI in the first place, let alone use a > > recent version of it, right? What am I missing? > > You need a M-Mode firmware which follows the SBI specification and > that implements the HSM extension. Firstly, and maybe I am showing my ignorance here, but we do support m-mode in Linux, and SMP is not disabled for m-mode kernels where it is required to use the spinwait method. Secondly, I don't think that we've actually noted that non-HSM booting is deprecated before now - at least not somewhere obviously. Things like the platform spec on github might require it & it may be deprecated in SBI implementations etc, but in the Kconfig option it is not described as deprecated. The Kconfig option only says that it "should be only enabled for M-mode Linux or platforms relying on older firmware without SBI HSM extension". I think marking it as deprecated here is not accurate, and instead we would be better off pointing out what the limitations of the method are and noting the limited situations when it should be used. > > Also, what about !SMP systems? Although my suggested new section title > > kinda addresses that! > > I'll add "On SMP systems, there exist 2 methods to enter the nit: s/exist/are/ > kernel:....", I don't think we need to detail the !SMP case as it is > obvious. That's fine. Maybe I am just a pedant, but I think it is good to be a bit over precise. > > > > > + **deprecated**. > > > > > +- Ordered booting: the firmware releases only one hart that will execute the > > > > > + initialization phase and then will start all other harts using the SBI HSM > > > > > + extension. > > > > > +--------------------- > > > > > + > > > > > +The installation of the virtual mapping is done in 2 steps in the RISC-V kernel: > > > > > + > > > > > +1. :c:func:`setup_vm` installs a temporary kernel mapping in > > > > > + :c:var:`early_pg_dir` which allows to discover the system memory: only the > > > > > > > > s/to discover/discovery of/ > > > > > > You mean "the discovery of" right? > > > > No? The "the" there would not be required. > > That was a genuine question, thanks Sorry if the "No?" came across as aggressive, I meant it inquisitively. Adding the "the" changes the meaning slightly, but not in a way that that is relevant here. > > > > > +Pre-MMU execution > > > > > +----------------- > > > > > + > > > > > +Any code that executes before even the first virtual mapping is established > > > > > +must be very carefully compiled as: > > > > > > > > Could you point out what the non-obvious examples of this code are? > > > > > > I can do a list, yes > > > > Not even a list, just something like "...established, for example early > > alternatives and foo, must be very..." > > Done as follows: > > "A few pieces of code need to run before even the first virtual mapping is > established, that comprises the installation of the first virtual mapping, the > early alternatives and the early parsing of the kernel command line. That code > must be very carefully compiled as:..." Changed slightly: "A few pieces of code need to run before even the first virtual mapping is established. These are the installation of the first virtual mapping itself, patching of early alternatives and the early parsing of the kernel command line. That code must be very carefully compiled as:..." Two minor suggestions there, one to make the it more obvious what the first section inside commas refers to & one to note what it is that we do with the alternatives. Cheers, Conor.
On Tue, Jun 20, 2023 at 12:33 PM Conor Dooley <conor.dooley@microchip.com> wrote: > > On Tue, Jun 20, 2023 at 11:09:47AM +0200, Alexandre Ghiti wrote: > > On Mon, Jun 19, 2023 at 6:00 PM Conor Dooley <conor@kernel.org> wrote: > > > On Mon, Jun 19, 2023 at 04:04:52PM +0200, Alexandre Ghiti wrote: > > > > On Mon, Jun 19, 2023 at 2:26 PM Conor Dooley <conor.dooley@microchip.com> wrote: > > > > > On Mon, Jun 19, 2023 at 11:47:04AM +0200, Alexandre Ghiti wrote: > > > > > > > +Bootflow > > > > > > > > > > "Boot Flow", no? > > > > > I am not sure that this is the "correct" heading for the content it > > > > > describes, but I have nothing better to offer :/ > > > > > > > > Yes you're right, what about simply "Kernel Entrance"? Not sure this > > > > is easily understandable though. > > > > > > "Non-boot Hart Initialisation"? > > > > Hmmm not that great either (sorry for being direct here) > > lol, no need to apologise. > > > > > > > +There exist 2 methods to enter the kernel: > > > > > > + > > > > > > +- `RISCV_BOOT_SPINWAIT`: the firmware releases all harts in the kernel, one hart > > > > > > + wins a lottery and executes the early boot code while the other harts are > > > > > > + parked waiting for the initialization to finish. This method is now > > > > > > > > > > nit: s/now// > > > > > > > > Ok > > > > > > > > > > > > > > What do you mean by deprecated? There's no requirement to implement the > > > > > HSM extension, right? > > > > > > > > I would say yes, you have to use a recent version of openSBI that > > > > supports the HSM extension. @Atish Kumar Patra WDYT? > > > > > > Uh, you don't need to use OpenSBI in the first place, let alone use a > > > recent version of it, right? What am I missing? > > > > You need a M-Mode firmware which follows the SBI specification and > > that implements the HSM extension. > > Firstly, and maybe I am showing my ignorance here, but we do support > m-mode in Linux, and SMP is not disabled for m-mode kernels where it is > required to use the spinwait method. You're right. > Secondly, I don't think that we've actually noted that non-HSM booting > is deprecated before now - at least not somewhere obviously. Things like > the platform spec on github might require it & it may be deprecated in > SBI implementations etc, but in the Kconfig option it is not described > as deprecated. The Kconfig option only says that it "should be only > enabled for M-mode Linux or platforms relying on older firmware without > SBI HSM extension". > I think marking it as deprecated here is not accurate, and instead we > would be better off pointing out what the limitations of the method are > and noting the limited situations when it should be used. You're right again, before this is officially deprecated, I'll point out the limitations of this method only as follows: Kernel entrance --------------- On SMP systems, there are 2 methods to enter the kernel: - `RISCV_BOOT_SPINWAIT`: the firmware releases all harts in the kernel, one hart wins a lottery and executes the early boot code while the other harts are parked waiting for the initialization to finish. This method is mostly used to support older firmwares without SBI HSM extension and M-mode RISC-V kernel. - `Ordered booting`: the firmware releases only one hart that will execute the initialization phase and then will start all other harts using the SBI HSM extension. The ordered booting method is the preferred booting method for booting the RISC-V kernel because it can support cpu hotplug and kexec. > > > > Also, what about !SMP systems? Although my suggested new section title > > > kinda addresses that! > > > > I'll add "On SMP systems, there exist 2 methods to enter the > > nit: s/exist/are/ > > > kernel:....", I don't think we need to detail the !SMP case as it is > > obvious. > > That's fine. Maybe I am just a pedant, but I think it is good to be a > bit over precise. > > > > > > > + **deprecated**. > > > > > > +- Ordered booting: the firmware releases only one hart that will execute the > > > > > > + initialization phase and then will start all other harts using the SBI HSM > > > > > > + extension. > > > > > > > +--------------------- > > > > > > + > > > > > > +The installation of the virtual mapping is done in 2 steps in the RISC-V kernel: > > > > > > + > > > > > > +1. :c:func:`setup_vm` installs a temporary kernel mapping in > > > > > > + :c:var:`early_pg_dir` which allows to discover the system memory: only the > > > > > > > > > > s/to discover/discovery of/ > > > > > > > > You mean "the discovery of" right? > > > > > > No? The "the" there would not be required. > > > > That was a genuine question, thanks > > Sorry if the "No?" came across as aggressive, I meant it inquisitively. > Adding the "the" changes the meaning slightly, but not in a way that > that is relevant here. No worries :) > > > > > > > +Pre-MMU execution > > > > > > +----------------- > > > > > > + > > > > > > +Any code that executes before even the first virtual mapping is established > > > > > > +must be very carefully compiled as: > > > > > > > > > > Could you point out what the non-obvious examples of this code are? > > > > > > > > I can do a list, yes > > > > > > Not even a list, just something like "...established, for example early > > > alternatives and foo, must be very..." > > > > Done as follows: > > > > "A few pieces of code need to run before even the first virtual mapping is > > established, that comprises the installation of the first virtual mapping, the > > early alternatives and the early parsing of the kernel command line. That code > > must be very carefully compiled as:..." > > Changed slightly: > "A few pieces of code need to run before even the first virtual mapping is > established. These are the installation of the first virtual mapping itself, > patching of early alternatives and the early parsing of the kernel command line. > That code must be very carefully compiled as:..." > > Two minor suggestions there, one to make the it more obvious what the first > section inside commas refers to & one to note what it is that we do with > the alternatives. > Thanks > Cheers, > Conor.
On Tue, Jun 20, 2023 at 12:32 PM Song Shuai <songshuaishuai@tinylab.org> wrote: > > > > 在 2023/6/19 17:47, Alexandre Ghiti 写道: > > This document describes the constraints and requirements of the early > > boot process in a RISC-V kernel. > > > > Szigned-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> > > --- > > Documentation/riscv/boot-image-header.rst | 3 - > > Documentation/riscv/boot.rst | 181 ++++++++++++++++++++++ > > Documentation/riscv/index.rst | 1 + > > 3 files changed, 182 insertions(+), 3 deletions(-) > > create mode 100644 Documentation/riscv/boot.rst > > > > diff --git a/Documentation/riscv/boot-image-header.rst b/Documentation/riscv/boot-image-header.rst > > index d7752533865f..a4a45310c4c4 100644 > > --- a/Documentation/riscv/boot-image-header.rst > > +++ b/Documentation/riscv/boot-image-header.rst > > @@ -7,9 +7,6 @@ Boot image header in RISC-V Linux > > > > This document only describes the boot image header details for RISC-V Linux. > > > > -TODO: > > - Write a complete booting guide. > > - > > The following 64-byte header is present in decompressed Linux kernel image:: > > > > u32 code0; /* Executable code */ > > diff --git a/Documentation/riscv/boot.rst b/Documentation/riscv/boot.rst > > new file mode 100644 > > index 000000000000..b02230818b79 > > --- /dev/null > > +++ b/Documentation/riscv/boot.rst > > @@ -0,0 +1,181 @@ > > +.. SPDX-License-Identifier: GPL-2.0 > > + > > +============================================= > > +Early boot requirements/constraints on RISC-V > > +============================================= > > + > > +:Author: Alexandre Ghiti <alexghiti@rivosinc.com> > > +:Date: 23 May 2023 > > + > > +This document describes what the RISC-V kernel expects from the previous stages > > +and the firmware, but also the constraints that any developer must have in mind > > +when touching the early boot process, e.g. before the final virtual mapping is > > +setup. > > + > > +Pre-kernel boot (Expectations from firmware) > > +============================================ > > + > > +Registers state > > +--------------- > > + > > +The RISC-V kernel expects: > > + > > + * `$a0` to contain the hartid of the current core. > > + * `$a1` to contain the address of the device tree in memory. > > + > > +CSR state > > +--------- > > + > > +The RISC-V kernel expects: > > + > > + * `$satp = 0`: the MMU must be disabled. > > + > > +Reserved memory for resident firmware > > +------------------------------------- > > + > > +The RISC-V kernel expects the firmware to mark any resident memory with the > > +`no-map` flag, thus the kernel won't map those regions in the direct mapping > > +(avoiding issues with hibernation, speculative accesses and probably other > > +subsystems). > > + > > +Kernel location > > +--------------- > > + > > +The RISC-V kernel expects to be placed at a PMD boundary (2MB for rv64 and 4MB > > +for rv32). Note though that the EFI stub will physically relocate the kernel if > > +that's not the case. > > + > > +Device-tree > > +----------- > > + > > +The RISC-V kernel always expects a device tree, it is: > > + > > +- either passed directly to the kernel from the previous stage using the `$a1` > > + register, > > +- or when booting with UEFI, the device tree will be retrieved by the EFI stub > > + using the EFI configuration table or it will be created. > > + > > +Bootflow > > +-------- > > + > > +There exist 2 methods to enter the kernel: > > + > > +- `RISCV_BOOT_SPINWAIT`: the firmware releases all harts in the kernel, one hart > > + wins a lottery and executes the early boot code while the other harts are > > + parked waiting for the initialization to finish. This method is now > > + **deprecated**. > > +- Ordered booting: the firmware releases only one hart that will execute the > > + initialization phase and then will start all other harts using the SBI HSM > > + extension. > > + > > +UEFI > > +---- > > + > > +UEFI memory map > > +~~~~~~~~~~~~~~~ > > + > > +When booting with UEFI, the RISC-V kernel will use only the EFI memory map to > > +populate the system memory. > > + > > +The UEFI firmware must parse the subnodes of the `/reserved-memory` device tree > > +node and abide by the device tree specification to convert the attributes of > > +those subnodes (`no-map` and `reusable`) into their correct EFI equivalent > > +(refer to section "3.5.4 /reserved-memory and UEFI" of the device tree > > +specification). > how about declare the version of Device Tree specification? > like the devicetree-specification-v0.4-rc1 we recently reference You're right, the section number refers to this version of the specification, so it should be noted. > > + > > +RISCV_EFI_BOOT_PROTOCOL > > +~~~~~~~~~~~~~~~~~~~~~~~ > > + > > +When booting with UEFI, the EFI stub requires the boot hartid in order to pass > > +it to the RISC-V kernel in `$a1`. The EFI stub retrieves the boot hartid using > > +one of the following methods: > > + > > +- `RISCV_EFI_BOOT_PROTOCOL` (**preferred**). > > +- `boot-hartid` device tree subnode (**deprecated**). > > + > > +Any new firmware must implement `RISCV_EFI_BOOT_PROTOCOL` as the device tree > > +based approach is deprecated now. > > + > > +During kernel boot: (Kernel internals) > > +====================================== > > + > > +EFI stub and device tree > > +------------------------ > > + > > +When booting with UEFI, the device tree is supplemented by the EFI stub with the > > +following parameters (largely shared with arm64 in Documentation/arm/uefi.rst): > > + > > +========================== ====== =========================================== > > +Name Size Description > > +========================== ====== =========================================== > > +linux,uefi-system-table 64-bit Physical address of the UEFI System Table. > > + > > +linux,uefi-mmap-start 64-bit Physical address of the UEFI memory map, > > + populated by the UEFI GetMemoryMap() call. > > + > > +linux,uefi-mmap-size 32-bit Size in bytes of the UEFI memory map > > + pointed to in previous entry. > > + > > +linux,uefi-mmap-desc-size 32-bit Size in bytes of each entry in the UEFI > > + memory map. > > + > > +linux,uefi-mmap-desc-ver 32-bit Version of the mmap descriptor format. > > + > > +kaslr-seed 64-bit Entropy used to randomize the kernel image > > + base address location. > > + > > +bootargs Kernel command line > how about use "string" to declare the type of "bootargs" > and replace the "Size" in header with "Type" Why not, this will be more accurate. > > +========================== ====== =========================================== > > + > > +Virtual mapping setup > > +--------------------- > > + > > +The installation of the virtual mapping is done in 2 steps in the RISC-V kernel: > > + > > +1. :c:func:`setup_vm` installs a temporary kernel mapping in > > + :c:var:`early_pg_dir` which allows to discover the system memory: only the > > + kernel text/data are mapped at this point. When establishing this mapping, > > + no allocation can be done (since the system memory is not known yet), so > > + :c:var:`early_pg_dir` page table is statically allocated (using only one > > + table for each level). > > + > > +2. :c:func:`setup_vm_final` creates the final kernel mapping in > > + :c:var:`swapper_pg_dir` and takes advantage of the discovered system memory > > + to create the linear mapping. When establishing this mapping, the kernel > > + can allocate memory but cannot access it directly (since the direct mapping > > + is not present yet), so it uses temporary mappings in the fixmap region to > > + be able to access the newly allocated page table levels. > > + > > +For :c:func:`virt_to_phys` and :c:func:`phys_to_virt` to be able to correctly > > +convert direct mapping addresses to physical addresses, it needs to know the > > +start of the DRAM: this happens after 1, right before 2 installs the direct > > +mapping (see :c:func:`setup_bootmem` function in arch/riscv/mm/init.c). So > > +any usage of those macros before the final virtual mapping is installed must be > > +carefully examined. > > + > > +Device-tree mapping via fixmap > > +------------------------------ > > + > > +The RISC-V kernel uses the fixmap region to map the device tree because the > > +device tree virtual mapping must remain the same between :c:func:`setup_vm` and > > +:c:func:`setup_vm_final` calls since :c:var:`reserved_mem` array is initialized > > +with virtual addresses established by :c:func:`setup_vm` and used with the > > +mapping established by :c:func:`setup_vm_final`. > > + > > +Pre-MMU execution > > +----------------- > > + > > +Any code that executes before even the first virtual mapping is established > > +must be very carefully compiled as: > > + > > +- `-fno-pie`: This is needed for relocatable kernels which use `-fPIE`, since > > + otherwise, any access to a global symbol would go through the GOT which is > > + only relocated virtually. > > +- `-mcmodel=medany`: Any access to a global symbol must be PC-relative to avoid > > + any relocations to happen before the MMU is setup. > > +- Also note that *all* instrumentation must also be disabled (that includes > > + KASAN, ftrace and others). > > + > > +As using a symbol from a different compilation unit requires this unit to be > > +compiled with those flags, we advise, as much as possible, not to use external > > +symbols. > > diff --git a/Documentation/riscv/index.rst b/Documentation/riscv/index.rst > > index 175a91db0200..1f66062def6d 100644 > > --- a/Documentation/riscv/index.rst > > +++ b/Documentation/riscv/index.rst > > @@ -5,6 +5,7 @@ RISC-V architecture > > .. toctree:: > > :maxdepth: 1 > > > > + boot > > boot-image-header > > vm-layout > > hwprobe > > -- > Thanks > Song Shuai > Thanks Song!
diff --git a/Documentation/riscv/boot-image-header.rst b/Documentation/riscv/boot-image-header.rst index d7752533865f..a4a45310c4c4 100644 --- a/Documentation/riscv/boot-image-header.rst +++ b/Documentation/riscv/boot-image-header.rst @@ -7,9 +7,6 @@ Boot image header in RISC-V Linux This document only describes the boot image header details for RISC-V Linux. -TODO: - Write a complete booting guide. - The following 64-byte header is present in decompressed Linux kernel image:: u32 code0; /* Executable code */ diff --git a/Documentation/riscv/boot.rst b/Documentation/riscv/boot.rst new file mode 100644 index 000000000000..b02230818b79 --- /dev/null +++ b/Documentation/riscv/boot.rst @@ -0,0 +1,181 @@ +.. SPDX-License-Identifier: GPL-2.0 + +============================================= +Early boot requirements/constraints on RISC-V +============================================= + +:Author: Alexandre Ghiti <alexghiti@rivosinc.com> +:Date: 23 May 2023 + +This document describes what the RISC-V kernel expects from the previous stages +and the firmware, but also the constraints that any developer must have in mind +when touching the early boot process, e.g. before the final virtual mapping is +setup. + +Pre-kernel boot (Expectations from firmware) +============================================ + +Registers state +--------------- + +The RISC-V kernel expects: + + * `$a0` to contain the hartid of the current core. + * `$a1` to contain the address of the device tree in memory. + +CSR state +--------- + +The RISC-V kernel expects: + + * `$satp = 0`: the MMU must be disabled. + +Reserved memory for resident firmware +------------------------------------- + +The RISC-V kernel expects the firmware to mark any resident memory with the +`no-map` flag, thus the kernel won't map those regions in the direct mapping +(avoiding issues with hibernation, speculative accesses and probably other +subsystems). + +Kernel location +--------------- + +The RISC-V kernel expects to be placed at a PMD boundary (2MB for rv64 and 4MB +for rv32). Note though that the EFI stub will physically relocate the kernel if +that's not the case. + +Device-tree +----------- + +The RISC-V kernel always expects a device tree, it is: + +- either passed directly to the kernel from the previous stage using the `$a1` + register, +- or when booting with UEFI, the device tree will be retrieved by the EFI stub + using the EFI configuration table or it will be created. + +Bootflow +-------- + +There exist 2 methods to enter the kernel: + +- `RISCV_BOOT_SPINWAIT`: the firmware releases all harts in the kernel, one hart + wins a lottery and executes the early boot code while the other harts are + parked waiting for the initialization to finish. This method is now + **deprecated**. +- Ordered booting: the firmware releases only one hart that will execute the + initialization phase and then will start all other harts using the SBI HSM + extension. + +UEFI +---- + +UEFI memory map +~~~~~~~~~~~~~~~ + +When booting with UEFI, the RISC-V kernel will use only the EFI memory map to +populate the system memory. + +The UEFI firmware must parse the subnodes of the `/reserved-memory` device tree +node and abide by the device tree specification to convert the attributes of +those subnodes (`no-map` and `reusable`) into their correct EFI equivalent +(refer to section "3.5.4 /reserved-memory and UEFI" of the device tree +specification). + +RISCV_EFI_BOOT_PROTOCOL +~~~~~~~~~~~~~~~~~~~~~~~ + +When booting with UEFI, the EFI stub requires the boot hartid in order to pass +it to the RISC-V kernel in `$a1`. The EFI stub retrieves the boot hartid using +one of the following methods: + +- `RISCV_EFI_BOOT_PROTOCOL` (**preferred**). +- `boot-hartid` device tree subnode (**deprecated**). + +Any new firmware must implement `RISCV_EFI_BOOT_PROTOCOL` as the device tree +based approach is deprecated now. + +During kernel boot: (Kernel internals) +====================================== + +EFI stub and device tree +------------------------ + +When booting with UEFI, the device tree is supplemented by the EFI stub with the +following parameters (largely shared with arm64 in Documentation/arm/uefi.rst): + +========================== ====== =========================================== +Name Size Description +========================== ====== =========================================== +linux,uefi-system-table 64-bit Physical address of the UEFI System Table. + +linux,uefi-mmap-start 64-bit Physical address of the UEFI memory map, + populated by the UEFI GetMemoryMap() call. + +linux,uefi-mmap-size 32-bit Size in bytes of the UEFI memory map + pointed to in previous entry. + +linux,uefi-mmap-desc-size 32-bit Size in bytes of each entry in the UEFI + memory map. + +linux,uefi-mmap-desc-ver 32-bit Version of the mmap descriptor format. + +kaslr-seed 64-bit Entropy used to randomize the kernel image + base address location. + +bootargs Kernel command line +========================== ====== =========================================== + +Virtual mapping setup +--------------------- + +The installation of the virtual mapping is done in 2 steps in the RISC-V kernel: + +1. :c:func:`setup_vm` installs a temporary kernel mapping in + :c:var:`early_pg_dir` which allows to discover the system memory: only the + kernel text/data are mapped at this point. When establishing this mapping, + no allocation can be done (since the system memory is not known yet), so + :c:var:`early_pg_dir` page table is statically allocated (using only one + table for each level). + +2. :c:func:`setup_vm_final` creates the final kernel mapping in + :c:var:`swapper_pg_dir` and takes advantage of the discovered system memory + to create the linear mapping. When establishing this mapping, the kernel + can allocate memory but cannot access it directly (since the direct mapping + is not present yet), so it uses temporary mappings in the fixmap region to + be able to access the newly allocated page table levels. + +For :c:func:`virt_to_phys` and :c:func:`phys_to_virt` to be able to correctly +convert direct mapping addresses to physical addresses, it needs to know the +start of the DRAM: this happens after 1, right before 2 installs the direct +mapping (see :c:func:`setup_bootmem` function in arch/riscv/mm/init.c). So +any usage of those macros before the final virtual mapping is installed must be +carefully examined. + +Device-tree mapping via fixmap +------------------------------ + +The RISC-V kernel uses the fixmap region to map the device tree because the +device tree virtual mapping must remain the same between :c:func:`setup_vm` and +:c:func:`setup_vm_final` calls since :c:var:`reserved_mem` array is initialized +with virtual addresses established by :c:func:`setup_vm` and used with the +mapping established by :c:func:`setup_vm_final`. + +Pre-MMU execution +----------------- + +Any code that executes before even the first virtual mapping is established +must be very carefully compiled as: + +- `-fno-pie`: This is needed for relocatable kernels which use `-fPIE`, since + otherwise, any access to a global symbol would go through the GOT which is + only relocated virtually. +- `-mcmodel=medany`: Any access to a global symbol must be PC-relative to avoid + any relocations to happen before the MMU is setup. +- Also note that *all* instrumentation must also be disabled (that includes + KASAN, ftrace and others). + +As using a symbol from a different compilation unit requires this unit to be +compiled with those flags, we advise, as much as possible, not to use external +symbols. diff --git a/Documentation/riscv/index.rst b/Documentation/riscv/index.rst index 175a91db0200..1f66062def6d 100644 --- a/Documentation/riscv/index.rst +++ b/Documentation/riscv/index.rst @@ -5,6 +5,7 @@ RISC-V architecture .. toctree:: :maxdepth: 1 + boot boot-image-header vm-layout hwprobe