From patchwork Thu Mar 30 11:49:53 2023
X-Patchwork-Submitter: "Kirill A. Shutemov"
X-Patchwork-Id: 77157
From: "Kirill A. Shutemov"
To: Borislav Petkov, Andy Lutomirski, Sean Christopherson, Andrew Morton,
	Joerg Roedel, Ard Biesheuvel
Cc: Andi Kleen, Kuppuswamy Sathyanarayanan, David Rientjes, Vlastimil Babka,
	Tom Lendacky, Thomas Gleixner, Peter Zijlstra, Paolo Bonzini,
	Ingo Molnar, Dario Faggioli, Dave Hansen, Mike Rapoport,
	David Hildenbrand, Mel Gorman, marcelo.cerri@canonical.com,
	tim.gardner@canonical.com, khalid.elmously@canonical.com,
	philip.cox@canonical.com, aarcange@redhat.com, peterx@redhat.com,
	x86@kernel.org, linux-mm@kvack.org, linux-coco@lists.linux.dev,
	linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org,
	"Kirill A. Shutemov", Dave Hansen
Shutemov" , Dave Hansen Subject: [PATCHv9 11/14] x86/mm: Avoid load_unaligned_zeropad() stepping into unaccepted memory Date: Thu, 30 Mar 2023 14:49:53 +0300 Message-Id: <20230330114956.20342-12-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20230330114956.20342-1-kirill.shutemov@linux.intel.com> References: <20230330114956.20342-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-2.4 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_EF,RCVD_IN_DNSWL_MED,RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761793866450752785?= X-GMAIL-MSGID: =?utf-8?q?1761793866450752785?= load_unaligned_zeropad() can lead to unwanted loads across page boundaries. The unwanted loads are typically harmless. But, they might be made to totally unrelated or even unmapped memory. load_unaligned_zeropad() relies on exception fixup (#PF, #GP and now #VE) to recover from these unwanted loads. But, this approach does not work for unaccepted memory. For TDX, a load from unaccepted memory will not lead to a recoverable exception within the guest. The guest will exit to the VMM where the only recourse is to terminate the guest. There are three parts to fix this issue and comprehensively avoid access to unaccepted memory. Together these ensure that an extra "guard" page is accepted in addition to the memory that needs to be used. 1. Implicitly extend the range_contains_unaccepted_memory(start, end) checks up to end+2M if 'end' is aligned on a 2M boundary. It may require checking 2M chunk beyond end of RAM. The bitmap allocation is modified to accommodate this. 2. Implicitly extend accept_memory(start, end) to end+2M if 'end' is aligned on a 2M boundary. 3. Set PageUnaccepted() on both memory that itself needs to be accepted *and* memory where the next page needs to be accepted. Essentially, make PageUnaccepted(page) a marker for whether work needs to be done to make 'page' usable. That work might include accepting pages in addition to 'page' itself. Side note: This leads to something strange. Pages which were accepted at boot, marked by the firmware as accepted and will never _need_ to be accepted might have PageUnaccepted() set on them. PageUnaccepted(page) is a cue to ensure that the next page is accepted before 'page' can be used. This is an actual, real-world problem which was discovered during TDX testing. Signed-off-by: Kirill A. Shutemov Reviewed-by: Dave Hansen --- arch/x86/mm/unaccepted_memory.c | 39 +++++++++++++++++++++++++ drivers/firmware/efi/libstub/x86-stub.c | 7 +++++ 2 files changed, 46 insertions(+) diff --git a/arch/x86/mm/unaccepted_memory.c b/arch/x86/mm/unaccepted_memory.c index 1df918b21469..a0a58486eb74 100644 --- a/arch/x86/mm/unaccepted_memory.c +++ b/arch/x86/mm/unaccepted_memory.c @@ -23,6 +23,38 @@ void accept_memory(phys_addr_t start, phys_addr_t end) bitmap = __va(boot_params.unaccepted_memory); range_start = start / PMD_SIZE; + /* + * load_unaligned_zeropad() can lead to unwanted loads across page + * boundaries. The unwanted loads are typically harmless. But, they + * might be made to totally unrelated or even unmapped memory. 
+	 * load_unaligned_zeropad() relies on exception fixup (#PF, #GP and now
+	 * #VE) to recover from these unwanted loads.
+	 *
+	 * But, this approach does not work for unaccepted memory. For TDX, a
+	 * load from unaccepted memory will not lead to a recoverable exception
+	 * within the guest. The guest will exit to the VMM where the only
+	 * recourse is to terminate the guest.
+	 *
+	 * There are three parts to fix this issue and comprehensively avoid
+	 * access to unaccepted memory. Together these ensure that an extra
+	 * "guard" page is accepted in addition to the memory that needs to be
+	 * used:
+	 *
+	 * 1. Implicitly extend the range_contains_unaccepted_memory(start, end)
+	 *    checks up to end+2M if 'end' is aligned on a 2M boundary.
+	 *
+	 * 2. Implicitly extend accept_memory(start, end) to end+2M if 'end' is
+	 *    aligned on a 2M boundary. (immediately following this comment)
+	 *
+	 * 3. Set PageUnaccepted() on both memory that itself needs to be
+	 *    accepted *and* memory where the next page needs to be accepted.
+	 *    Essentially, make PageUnaccepted(page) a marker for whether work
+	 *    needs to be done to make 'page' usable. That work might include
+	 *    accepting pages in addition to 'page' itself.
+	 */
+	if (!(end % PMD_SIZE))
+		end += PMD_SIZE;
+
 	spin_lock_irqsave(&unaccepted_memory_lock, flags);
 	for_each_set_bitrange_from(range_start, range_end, bitmap,
 				   DIV_ROUND_UP(end, PMD_SIZE)) {
@@ -46,6 +78,13 @@ bool range_contains_unaccepted_memory(phys_addr_t start, phys_addr_t end)
 
 	bitmap = __va(boot_params.unaccepted_memory);
 
+	/*
+	 * Also consider the unaccepted state of the *next* page. See fix #1 in
+	 * the comment on load_unaligned_zeropad() in accept_memory().
+	 */
+	if (!(end % PMD_SIZE))
+		end += PMD_SIZE;
+
 	spin_lock_irqsave(&unaccepted_memory_lock, flags);
 	while (start < end) {
 		if (test_bit(start / PMD_SIZE, bitmap)) {
diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index 1643ddbde249..1afe7b5b02e1 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -715,6 +715,13 @@ static efi_status_t allocate_unaccepted_bitmap(struct boot_params *params,
 		return EFI_SUCCESS;
 	}
 
+	/*
+	 * range_contains_unaccepted_memory() may need to check one 2M chunk
+	 * beyond the end of RAM to deal with load_unaligned_zeropad(). Make
+	 * sure that the bitmap is large enough to handle it.
+	 */
+	max_addr += PMD_SIZE;
+
 	/*
 	 * If unaccepted memory is present, allocate a bitmap to track what
 	 * memory has to be accepted before access.
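
[Editor's illustration, not part of the patch.] For readers who want to see the
guard-chunk arithmetic from parts 1 and 2 of the commit message in isolation,
here is a minimal user-space C sketch. The names TOY_PMD_SIZE, toy_unaccepted[],
toy_accept_memory() and toy_range_contains_unaccepted() are hypothetical stand-ins
for PMD_SIZE, the unaccepted-memory bitmap, accept_memory() and
range_contains_unaccepted_memory() in the diff above; the sketch only models the
"extend a 2M-aligned 'end' by one extra chunk" behaviour.

	/* Toy model of the guard-chunk widening; compile with: cc -std=c99 toy.c */
	#include <stdbool.h>
	#include <stdint.h>
	#include <stdio.h>

	#define TOY_PMD_SIZE	(2UL << 20)	/* 2M chunk, as in the patch */
	#define TOY_NCHUNKS	64		/* toy tracking covers 128M  */

	static unsigned long toy_unaccepted[TOY_NCHUNKS]; /* 1 = still unaccepted */

	static bool toy_range_contains_unaccepted(uint64_t start, uint64_t end)
	{
		/* Widen a 2M-aligned 'end' by one chunk: the "guard" chunk. */
		if (!(end % TOY_PMD_SIZE))
			end += TOY_PMD_SIZE;

		for (uint64_t a = start; a < end; a += TOY_PMD_SIZE) {
			if (toy_unaccepted[a / TOY_PMD_SIZE])
				return true;
		}
		return false;
	}

	static void toy_accept_memory(uint64_t start, uint64_t end)
	{
		/* Same widening, so the guard chunk gets accepted as well. */
		if (!(end % TOY_PMD_SIZE))
			end += TOY_PMD_SIZE;

		for (uint64_t a = start; a < end; a += TOY_PMD_SIZE)
			toy_unaccepted[a / TOY_PMD_SIZE] = 0;
	}

	int main(void)
	{
		/* Chunks 0..3 (0..8M) start out unaccepted. */
		for (int i = 0; i < 4; i++)
			toy_unaccepted[i] = 1;

		/* Accepting [0, 4M) also accepts the guard chunk [4M, 6M). */
		toy_accept_memory(0, 2 * TOY_PMD_SIZE);

		/* Check of [0, 4M) also looks at the guard chunk; prints 0. */
		printf("[0, 4M) unaccepted? %d\n",
		       toy_range_contains_unaccepted(0, 2 * TOY_PMD_SIZE));

		/* [0, 6M) widens to [0, 8M); chunk 3 was never accepted; prints 1. */
		printf("[0, 6M) unaccepted? %d\n",
		       toy_range_contains_unaccepted(0, 3 * TOY_PMD_SIZE));
		return 0;
	}

The point of the example is the symmetry: because both the check and the accept
path widen a 2M-aligned 'end' by one chunk, a load_unaligned_zeropad() that runs
off the end of an accepted range lands in memory that has already been accepted.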