From patchwork Fri Sep 15 08:28:27 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: tip-bot2 for Thomas Gleixner
X-Patchwork-Id: 140268
Date: Fri, 15 Sep 2023 08:28:27 -0000
From: "tip-bot2 for Kirill A. Shutemov"
Sender: tip-bot2@linutronix.de
Reply-to: linux-kernel@vger.kernel.org
To: linux-tip-commits@vger.kernel.org
Subject: [tip: x86/urgent] x86/boot/compressed: Reserve more memory for page tables
Cc: Aaron Lu, "Kirill A. Shutemov", Ingo Molnar, x86@kernel.org,
 linux-kernel@vger.kernel.org
In-Reply-To: <20230915070221.10266-1-kirill.shutemov@linux.intel.com>
References: <20230915070221.10266-1-kirill.shutemov@linux.intel.com>
Message-ID: <169476650726.27769.5534132274690328507.tip-bot2@tip-bot2>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID:     2768c8ca5cc768568e4dfca291b26caa652127cb
Gitweb:        https://git.kernel.org/tip/2768c8ca5cc768568e4dfca291b26caa652127cb
Author:        Kirill A. Shutemov
AuthorDate:    Fri, 15 Sep 2023 10:02:21 +03:00
Committer:     Ingo Molnar
CommitterDate: Fri, 15 Sep 2023 10:22:24 +02:00

x86/boot/compressed: Reserve more memory for page tables

The decompressor has a hard limit on the number of page tables it can
allocate. This limit is defined at compile-time and will cause boot
failure if it is reached.

The kernel is very strict and calculates the limit precisely for the
worst-case scenario based on the current configuration. However, it is
easy to forget to adjust the limit when a new use-case arises. The
worst-case scenario is rarely encountered during sanity checks.

In the case of enabling 5-level paging, a use-case was overlooked. The
limit needs to be increased by one to accommodate the additional level.
This oversight went unnoticed until Aaron attempted to run the kernel
via kexec with 5-level paging and unaccepted memory enabled.

Update worst-case calculations to include 5-level paging.

To address this issue, let's allocate some extra space for page tables.
128K should be sufficient for any use-case. The logic can be simplified
by using a single value for all kernel configurations.

[ Also add a warning, should this memory run low - by Dave Hansen. ]

Fixes: 34bbb0009f3b ("x86/boot/compressed: Enable 5-level paging during decompression stage")
Reported-by: Aaron Lu
Signed-off-by: Kirill A. Shutemov
Signed-off-by: Ingo Molnar
Link: https://lore.kernel.org/r/20230915070221.10266-1-kirill.shutemov@linux.intel.com
---
 arch/x86/boot/compressed/ident_map_64.c |  8 ++++-
 arch/x86/include/asm/boot.h             | 45 ++++++++++++++++--------
 2 files changed, 39 insertions(+), 14 deletions(-)

diff --git a/arch/x86/boot/compressed/ident_map_64.c b/arch/x86/boot/compressed/ident_map_64.c
index bcc956c..08f93b0 100644
--- a/arch/x86/boot/compressed/ident_map_64.c
+++ b/arch/x86/boot/compressed/ident_map_64.c
@@ -59,6 +59,14 @@ static void *alloc_pgt_page(void *context)
 		return NULL;
 	}
 
+	/* Consumed more tables than expected? */
+	if (pages->pgt_buf_offset == BOOT_PGT_SIZE_WARN) {
+		debug_putstr("pgt_buf running low in " __FILE__ "\n");
+		debug_putstr("Need to raise BOOT_PGT_SIZE?\n");
+		debug_putaddr(pages->pgt_buf_offset);
+		debug_putaddr(pages->pgt_buf_size);
+	}
+
 	entry = pages->pgt_buf + pages->pgt_buf_offset;
 	pages->pgt_buf_offset += PAGE_SIZE;
 
diff --git a/arch/x86/include/asm/boot.h b/arch/x86/include/asm/boot.h
index 4ae1433..b3a7cfb 100644
--- a/arch/x86/include/asm/boot.h
+++ b/arch/x86/include/asm/boot.h
@@ -40,23 +40,40 @@
 #ifdef CONFIG_X86_64
 # define BOOT_STACK_SIZE	0x4000
 
+/*
+ * Used by decompressor's startup_32() to allocate page tables for identity
+ * mapping of the 4G of RAM in 4-level paging mode:
+ * - 1 level4 table;
+ * - 1 level3 table;
+ * - 4 level2 table that maps everything with 2M pages;
+ *
+ * The additional level5 table needed for 5-level paging is allocated from
+ * trampoline_32bit memory.
+ */
 # define BOOT_INIT_PGT_SIZE	(6*4096)
-# ifdef CONFIG_RANDOMIZE_BASE
+
 /*
- * Assuming all cross the 512GB boundary:
- * 1 page for level4
- * (2+2)*4 pages for kernel, param, cmd_line, and randomized kernel
- * 2 pages for first 2M (video RAM: CONFIG_X86_VERBOSE_BOOTUP).
- * Total is 19 pages.
+ * Total number of page tables kernel_add_identity_map() can allocate,
+ * including page tables consumed by startup_32().
+ *
+ * Worst-case scenario:
+ *  - 5-level paging needs 1 level5 table;
+ *  - KASLR needs to map kernel, boot_params, cmdline and randomized kernel,
+ *    assuming all of them cross 256T boundary:
+ *    + 4*2 level4 table;
+ *    + 4*2 level3 table;
+ *    + 4*2 level2 table;
+ *  - X86_VERBOSE_BOOTUP needs to map the first 2M (video RAM):
+ *    + 1 level4 table;
+ *    + 1 level3 table;
+ *    + 1 level2 table;
+ * Total: 28 tables
+ *
+ * Add 4 spare table in case decompressor touches anything beyond what is
+ * accounted above. Warn if it happens.
  */
-#  ifdef CONFIG_X86_VERBOSE_BOOTUP
-#   define BOOT_PGT_SIZE	(19*4096)
-#  else /* !CONFIG_X86_VERBOSE_BOOTUP */
-#   define BOOT_PGT_SIZE	(17*4096)
-#  endif
-# else /* !CONFIG_RANDOMIZE_BASE */
-#  define BOOT_PGT_SIZE		BOOT_INIT_PGT_SIZE
-# endif
+# define BOOT_PGT_SIZE_WARN	(28*4096)
+# define BOOT_PGT_SIZE		(32*4096)
 
 #else /* !CONFIG_X86_64 */
 # define BOOT_STACK_SIZE	0x1000
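
Note on the arithmetic: the worst-case count in the boot.h comment and the
"128K" figure in the changelog can be checked with a small userspace sketch.
This is not kernel code. worst_case_tables(), struct pgt_buf_model and
model_alloc_pgt_page() are hypothetical names made up for illustration, and
the exhaustion check in the model is an assumption about how the real
alloc_pgt_page() bails out (the diff context only shows its "return NULL").
Only the two BOOT_PGT_SIZE* values and the "offset == BOOT_PGT_SIZE_WARN"
comparison are taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE		4096UL

/* Values taken from the patch. */
#define BOOT_PGT_SIZE_WARN	(28 * PAGE_SIZE)
#define BOOT_PGT_SIZE		(32 * PAGE_SIZE)

/*
 * Hypothetical helper: adds up the worst-case table count exactly as the
 * new boot.h comment itemizes it.
 */
static unsigned long worst_case_tables(void)
{
	unsigned long level5 = 1;		/* 5-level paging */
	unsigned long kaslr  = 4 * 2 * 3;	/* 4 objects, 2 tables each at level4/3/2 */
	unsigned long video  = 1 + 1 + 1;	/* first 2M: one table per level */

	return level5 + kaslr + video;
}

/* Hypothetical stand-in for the decompressor's page-table buffer state. */
struct pgt_buf_model {
	unsigned long size;	/* total bytes available, e.g. BOOT_PGT_SIZE */
	unsigned long offset;	/* bytes handed out so far */
	int warned;		/* set where the patch would print its warning */
};

/*
 * Bump allocator modelled on the patched alloc_pgt_page(). Returns the byte
 * offset of the page handed out, or -1 once the buffer is exhausted (the
 * exhaustion test is an assumption, see the note above).
 */
static long model_alloc_pgt_page(struct pgt_buf_model *buf)
{
	long entry;

	assert(buf->offset % PAGE_SIZE == 0);

	if (buf->offset + PAGE_SIZE > buf->size)
		return -1;

	/* Same comparison as the patch: fires once, on the 29th allocation. */
	if (buf->offset == BOOT_PGT_SIZE_WARN)
		buf->warned = 1;

	entry = (long)buf->offset;
	buf->offset += PAGE_SIZE;
	return entry;
}
```

With a buffer of BOOT_PGT_SIZE bytes, the first 28 calls stay silent, the
29th sets the warning flag, and the 33rd fails. The equality test means the
warning fires at most once, which seems to be the point of comparing with
"==" rather than ">=".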