[PATCHv11,6/9] efi/unaccepted: Avoid load_unaligned_zeropad() stepping into unaccepted memory
Message ID | 20230513220418.19357-7-kirill.shutemov@linux.intel.com
---|---
State | New
Series | mm, x86/cc, efi: Implement support for unaccepted memory
Commit Message
Kirill A. Shutemov
May 13, 2023, 10:04 p.m. UTC
load_unaligned_zeropad() can lead to unwanted loads across page boundaries.
The unwanted loads are typically harmless. But, they might be made to
totally unrelated or even unmapped memory. load_unaligned_zeropad()
relies on exception fixup (#PF, #GP and now #VE) to recover from these
unwanted loads.

But, this approach does not work for unaccepted memory. For TDX, a load
from unaccepted memory will not lead to a recoverable exception within
the guest. The guest will exit to the VMM where the only recourse is to
terminate the guest.

There are two parts to fix this issue and comprehensively avoid access
to unaccepted memory. Together these ensure that an extra "guard" page
is accepted in addition to the memory that needs to be used.

1. Implicitly extend the range_contains_unaccepted_memory(start, end)
   checks up to end+unit_size if 'end' is aligned on a unit_size
   boundary.
2. Implicitly extend accept_memory(start, end) to end+unit_size if 'end'
   is aligned on a unit_size boundary.

Side note: This leads to something strange. Pages which were accepted
at boot, marked by the firmware as accepted, and which will never _need_
to be accepted might be on the unaccepted_pages list. This is a cue to
ensure that the next page is accepted before 'page' can be used.

This is an actual, real-world problem which was discovered during TDX
testing.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
---
 drivers/firmware/efi/unaccepted_memory.c | 35 ++++++++++++++++++++++++
 1 file changed, 35 insertions(+)
Comments
On Sun, 14 May 2023 at 00:04, Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
>
> load_unaligned_zeropad() can lead to unwanted loads across page boundaries.
> The unwanted loads are typically harmless. But, they might be made to
> totally unrelated or even unmapped memory. load_unaligned_zeropad()
> relies on exception fixup (#PF, #GP and now #VE) to recover from these
> unwanted loads.
>
> But, this approach does not work for unaccepted memory. For TDX, a load
> from unaccepted memory will not lead to a recoverable exception within
> the guest. The guest will exit to the VMM where the only recourse is to
> terminate the guest.

Does this mean that the kernel maps memory before accepting it? As
otherwise, I would assume that such an access would page fault inside
the guest before triggering an exception related to the unaccepted
state.
On 5/16/23 11:08, Ard Biesheuvel wrote:
>> But, this approach does not work for unaccepted memory. For TDX, a load
>> from unaccepted memory will not lead to a recoverable exception within
>> the guest. The guest will exit to the VMM where the only recourse is to
>> terminate the guest.
>>
> Does this mean that the kernel maps memory before accepting it? As
> otherwise, I would assume that such an access would page fault inside
> the guest before triggering an exception related to the unaccepted
> state.

Yes, the kernel maps memory before accepting it (modulo things like
DEBUG_PAGEALLOC).
On Tue, May 16, 2023 at 08:08:37PM +0200, Ard Biesheuvel wrote:
> On Sun, 14 May 2023 at 00:04, Kirill A. Shutemov
> <kirill.shutemov@linux.intel.com> wrote:
> >
> > But, this approach does not work for unaccepted memory. For TDX, a load
> > from unaccepted memory will not lead to a recoverable exception within
> > the guest. The guest will exit to the VMM where the only recourse is to
> > terminate the guest.
>
> Does this mean that the kernel maps memory before accepting it? As
> otherwise, I would assume that such an access would page fault inside
> the guest before triggering an exception related to the unaccepted
> state.

Yes, the kernel maps all memory into the direct mapping whether it is
accepted or not [yet]. The problem is that an access to unaccepted memory
does not page fault on TDX. It causes an unrecoverable exit to the host,
so it must not happen on legitimate accesses, including a
load_unaligned_zeropad() overshoot.

For context: there is a way to configure a TDX environment to trigger #VE
on such accesses, and it is the default. But Linux requires such #VEs to
be disabled as they open an attack vector from the host to the guest: the
host can pull any private page from under the kernel at any point and
trigger such a #VE. If it happens at just the right time in the syscall
gap or NMI entry code, it can be exploitable. See also commits
9a22bf6debbf and 373e715e31bf.
On Tue, 16 May 2023 at 20:27, Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 5/16/23 11:08, Ard Biesheuvel wrote:
> > Does this mean that the kernel maps memory before accepting it? As
> > otherwise, I would assume that such an access would page fault inside
> > the guest before triggering an exception related to the unaccepted
> > state.
>
> Yes, the kernel maps memory before accepting it (modulo things like
> DEBUG_PAGEALLOC).

OK, and so the architecture stipulates that prefetching or other
speculative accesses must never deliver exceptions to the host
regarding such ranges?

If this all works as it should, then I'm ok with leaving this here,
but I imagine we may want to factor out some arch specific policy here
in the future, as I don't think this would work the same on ARM.
On Tue, May 16, 2023 at 08:35:27PM +0200, Ard Biesheuvel wrote:
> OK, and so the architecture stipulates that prefetching or other
> speculative accesses must never deliver exceptions to the host
> regarding such ranges?
>
> If this all works as it should, then I'm ok with leaving this here,
> but I imagine we may want to factor out some arch specific policy here
> in the future, as I don't think this would work the same on ARM.

Even if other architectures don't need this, it is harmless: we just
accept one unit ahead of time.
On 5/16/23 11:35, Ard Biesheuvel wrote:
> OK, and so the architecture stipulates that prefetching or other
> speculative accesses must never deliver exceptions to the host
> regarding such ranges?

I don't know of anywhere that this is explicitly written. It's probably
implicit _somewhere_ in the reams of VMX/TDX and base SDM docs, but heck
if I know where it is. :)

If this is something anyone wants to see added to the SEPT_VE_DISABLE
documentation, please speak up. I don't think it would be hard to get it
added and provide an explicit guarantee.
On Tue, May 16, 2023 at 01:03:32PM -0700, Dave Hansen wrote:
> On 5/16/23 11:35, Ard Biesheuvel wrote:
> > OK, and so the architecture stipulates that prefetching or other
> > speculative accesses must never deliver exceptions to the host
> > regarding such ranges?
>
> I don't know of anywhere that this is explicitly written. It's probably
> implicit _somewhere_ in the reams of VMX/TDX and base SDM docs, but heck
> if I know where it is. :)

It is not specific to TDX: on x86 (and all architectures with precise
exceptions) exception handling is delayed until instruction retirement
and will not happen if the speculation turned out to be wrong. And
prefetching never generates exceptions. But I failed to find this right
away in the 5000+ pages of the Intel Software Developer's Manual. :/
On 5/16/23 14:52, Kirill A. Shutemov wrote:
> It is not specific to TDX: on x86 (and all architectures with precise
> exceptions) exception handling is delayed until instruction retirement and
> will not happen if speculation turned out to be wrong. And prefetching
> never generates exceptions.

Not to be Debbie Downer too much here, but it's *totally* possible for
speculative execution to go read memory that causes you to machine
check. We've had such bugs in Linux.

We just happen to be lucky in this case that the unaccepted memory
exceptions don't generate machine checks *AND* TDX hardware does not
machine check on speculative accesses that would _just_ violate TDX
security properties.

You're right for normal, sane exceptions, though.
On Wed, 17 May 2023 at 00:00, Dave Hansen <dave.hansen@intel.com> wrote:
>
> Not to be Debbie Downer too much here, but it's *totally* possible for
> speculative execution to go read memory that causes you to machine
> check. We've had such bugs in Linux.
>
> We just happen to be lucky in this case that the unaccepted memory
> exceptions don't generate machine checks *AND* TDX hardware does not
> machine check on speculative accesses that would _just_ violate TDX
> security properties.
>
> You're right for normal, sane exceptions, though.

Same thing on ARM, although I'd have to check their RME stuff in more
detail to see how it behaves in this particular case. But Kirill is right
that it doesn't really matter for the logic in this patch - it just
accepts some additional pages. The relevant difference between
implementations will likely be whether unaccepted memory gets mapped
beforehand in the first place, but we'll deal with that once we have to.

As long as we only accept memory that appears in the bitmap as
'unaccepted', this kind of rounding seems safe and reasonable to me.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
On 5/16/23 11:33, Kirill A. Shutemov wrote:
> For context: there is a way to configure a TDX environment to trigger #VE
> on such accesses, and it is the default. But Linux requires such #VEs to
> be disabled as they open an attack vector from the host to the guest: the
> host can pull any private page from under the kernel at any point and
> trigger such a #VE. If it happens at just the right time in the syscall
> gap or NMI entry code, it can be exploitable.

I'm kinda uncomfortable with saying it's exploitable. It really boils
down to not wanting to deal with managing a new IST exception.

While the NMI IST implementation is about as good as we can get it, I
believe there are still holes in it (even if we consider only how it
interacts with #MC). The more IST users we add, the more holes there
are. You add the fact that an actual adversary can induce the exceptions
instead of (rare and mostly random) radiation that causes #MC, and it
makes me want to either curl up in a little ball or pursue a new career.

So, exploitable? Dunno. Do I want to touch a #VE/IST implementation? No
way, not with a 10 foot pole.
On 5/13/23 17:04, Kirill A. Shutemov wrote:
> load_unaligned_zeropad() can lead to unwanted loads across page boundaries.
> The unwanted loads are typically harmless. But, they might be made to
> totally unrelated or even unmapped memory. load_unaligned_zeropad()
> relies on exception fixup (#PF, #GP and now #VE) to recover from these
> unwanted loads.
>
> But, this approach does not work for unaccepted memory. For TDX, a load
> from unaccepted memory will not lead to a recoverable exception within
> the guest. The guest will exit to the VMM where the only recourse is to
> terminate the guest.
>
> This is an actual, real-world problem which was discovered during TDX
> testing.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>

Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
diff --git a/drivers/firmware/efi/unaccepted_memory.c b/drivers/firmware/efi/unaccepted_memory.c
index bb91c41f76fb..3d1ca60916dd 100644
--- a/drivers/firmware/efi/unaccepted_memory.c
+++ b/drivers/firmware/efi/unaccepted_memory.c
@@ -37,6 +37,34 @@ void accept_memory(phys_addr_t start, phys_addr_t end)
 	start -= unaccepted->phys_base;
 	end -= unaccepted->phys_base;
 
+	/*
+	 * load_unaligned_zeropad() can lead to unwanted loads across page
+	 * boundaries. The unwanted loads are typically harmless. But, they
+	 * might be made to totally unrelated or even unmapped memory.
+	 * load_unaligned_zeropad() relies on exception fixup (#PF, #GP and now
+	 * #VE) to recover from these unwanted loads.
+	 *
+	 * But, this approach does not work for unaccepted memory. For TDX, a
+	 * load from unaccepted memory will not lead to a recoverable exception
+	 * within the guest. The guest will exit to the VMM where the only
+	 * recourse is to terminate the guest.
+	 *
+	 * There are two parts to fix this issue and comprehensively avoid
+	 * access to unaccepted memory. Together these ensure that an extra
+	 * "guard" page is accepted in addition to the memory that needs to be
+	 * used:
+	 *
+	 * 1. Implicitly extend the range_contains_unaccepted_memory(start, end)
+	 *    checks up to end+unit_size if 'end' is aligned on a unit_size
+	 *    boundary.
+	 *
+	 * 2. Implicitly extend accept_memory(start, end) to end+unit_size if
+	 *    'end' is aligned on a unit_size boundary. (immediately following
+	 *    this comment)
+	 */
+	if (!(end % unit_size))
+		end += unit_size;
+
 	/* Make sure not to overrun the bitmap */
 	if (end > unaccepted->size * unit_size * BITS_PER_BYTE)
 		end = unaccepted->size * unit_size * BITS_PER_BYTE;
@@ -84,6 +112,13 @@ bool range_contains_unaccepted_memory(phys_addr_t start, phys_addr_t end)
 	start -= unaccepted->phys_base;
 	end -= unaccepted->phys_base;
 
+	/*
+	 * Also consider the unaccepted state of the *next* page. See fix #1 in
+	 * the comment on load_unaligned_zeropad() in accept_memory().
+	 */
+	if (!(end % unit_size))
+		end += unit_size;
+
 	/* Make sure not to overrun the bitmap */
 	if (end > unaccepted->size * unit_size * BITS_PER_BYTE)
 		end = unaccepted->size * unit_size * BITS_PER_BYTE;