Message ID | 20221104223604.29615-16-rick.p.edgecombe@intel.com |
---|---|
State | New |
Headers |
From: Rick Edgecombe <rick.p.edgecombe@intel.com>
To: x86@kernel.org, "H . Peter Anvin" <hpa@zytor.com>, Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann <arnd@arndb.de>, Andy Lutomirski <luto@kernel.org>, Balbir Singh <bsingharora@gmail.com>, Borislav Petkov <bp@alien8.de>, Cyrill Gorcunov <gorcunov@gmail.com>, Dave Hansen <dave.hansen@linux.intel.com>, Eugene Syromiatnikov <esyr@redhat.com>, Florian Weimer <fweimer@redhat.com>, "H . J . Lu" <hjl.tools@gmail.com>, Jann Horn <jannh@google.com>, Jonathan Corbet <corbet@lwn.net>, Kees Cook <keescook@chromium.org>, Mike Kravetz <mike.kravetz@oracle.com>, Nadav Amit <nadav.amit@gmail.com>, Oleg Nesterov <oleg@redhat.com>, Pavel Machek <pavel@ucw.cz>, Peter Zijlstra <peterz@infradead.org>, Randy Dunlap <rdunlap@infradead.org>, "Ravi V . Shankar" <ravi.v.shankar@intel.com>, Weijiang Yang <weijiang.yang@intel.com>, "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>, John Allen <john.allen@amd.com>, kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org
Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu <yu-cheng.yu@intel.com>
Subject: [PATCH v3 15/37] x86/mm: Check Shadow Stack page fault errors
Date: Fri, 4 Nov 2022 15:35:42 -0700
Message-Id: <20221104223604.29615-16-rick.p.edgecombe@intel.com>
In-Reply-To: <20221104223604.29615-1-rick.p.edgecombe@intel.com>
References: <20221104223604.29615-1-rick.p.edgecombe@intel.com> |
Series | Shadow stacks for userspace |
Commit Message
Edgecombe, Rick P
Nov. 4, 2022, 10:35 p.m. UTC
From: Yu-cheng Yu <yu-cheng.yu@intel.com>

The CPU performs "shadow stack accesses" when it expects to encounter shadow stack mappings. These accesses can be implicit (via CALL/RET instructions) or explicit (instructions like WRSS).

Shadow stack accesses to shadow stack mappings can fault in normal, valid operation just like regular accesses to regular mappings. Shadow stacks need some of the same features, like delayed allocation, swap, and copy-on-write, and the kernel needs to use faults to implement them.

The architecture has concepts of both shadow stack reads and shadow stack writes. Any shadow stack access to non-shadow-stack memory will generate a fault with the shadow stack error code bit set. This means that, unlike normal write protection, the fault handler needs to create a type of memory that can be written to (with instructions that generate shadow stack writes), even to fulfill a read access. So in the case of COW memory, the COW needs to take place even for a shadow stack read; otherwise the page would be left (shadow stack) writable in userspace. So, to trigger the appropriate behavior, set FAULT_FLAG_WRITE for shadow stack accesses, even if the access was a shadow stack read.

Shadow stack accesses can also result in errors, such as when a shadow stack overflows, or when a shadow stack access occurs to a non-shadow-stack mapping. Generate errors for these invalid shadow stack accesses.
Tested-by: Pengfei Xu <pengfei.xu@intel.com>
Tested-by: John Allen <john.allen@amd.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Co-developed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
v3:
 - Improve comment talking about using FAULT_FLAG_WRITE (Peterz)

v2:
 - Update commit log with verbiage/feedback from Dave Hansen
 - Clarify reasoning for FAULT_FLAG_WRITE for all shadow stack accesses
 - Update comments with some verbiage from Dave Hansen

Yu-cheng v30:
 - Update Subject line and add a verb

 arch/x86/include/asm/trap_pf.h |  2 ++
 arch/x86/mm/fault.c            | 26 ++++++++++++++++++++++++++
 2 files changed, 28 insertions(+)
Comments
On Fri, Nov 04, 2022 at 03:35:42PM -0700, Rick Edgecombe wrote:
> @@ -1331,6 +1345,18 @@ void do_user_addr_fault(struct pt_regs *regs,
>
> 	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
>
> +	/*
> +	 * To service shadow stack read faults, unlike normal read faults, the
> +	 * fault handler needs to create a type of memory that will also be
> +	 * writable (with instructions that generate shadow stack writes).
> +	 * In the case of COW memory, the COW needs to take place even with
> +	 * a shadow stack read. Otherwise the shared page will be left (shadow
> +	 * stack) writable in userspace. So to trigger the appropriate behavior
> +	 * by setting FAULT_FLAG_WRITE for shadow stack accesses, even if the
> +	 * access was a shadow stack read.
> +	 */

Clear as mud... So SS pages are 'Write=0,Dirty=1', which, per
construction, lack a RW bit. And these pages are writable (WRUSS).

pte_wrprotect() seems to do: _PAGE_DIRTY->_PAGE_COW (which is really
weird in this situation), resulting in: 'Write=0,Dirty=0,Cow=1'.

That's regular RO memory and won't raise read-faults.

But I'm thinking RET will trip #PF here when it tries to read the SS
because the SSP is not a proper shadow stack page?

And in that case you want to tickle pte_mkwrite() to undo the
pte_wrprotect() above?

So while the #PF is a 'read' fault due to RET not actually writing to
the shadow stack, you want to force a write fault so it will re-instate
the SS page.

Did I get that right?

> +	if (error_code & X86_PF_SHSTK)
> +		flags |= FAULT_FLAG_WRITE;
> 	if (error_code & X86_PF_WRITE)
> 		flags |= FAULT_FLAG_WRITE;
> 	if (error_code & X86_PF_INSTR)
> --
> 2.17.1
On Tue, 2022-11-15 at 12:47 +0100, Peter Zijlstra wrote:
> On Fri, Nov 04, 2022 at 03:35:42PM -0700, Rick Edgecombe wrote:
> > @@ -1331,6 +1345,18 @@ void do_user_addr_fault(struct pt_regs *regs,
> >
> > 	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
> >
> > +	/*
> > +	 * To service shadow stack read faults, unlike normal read faults, the
> > +	 * fault handler needs to create a type of memory that will also be
> > +	 * writable (with instructions that generate shadow stack writes).
> > +	 * In the case of COW memory, the COW needs to take place even with
> > +	 * a shadow stack read. Otherwise the shared page will be left (shadow
> > +	 * stack) writable in userspace. So to trigger the appropriate behavior
> > +	 * by setting FAULT_FLAG_WRITE for shadow stack accesses, even if the
> > +	 * access was a shadow stack read.
> > +	 */
>
> Clear as mud... So SS pages are 'Write=0,Dirty=1', which, per
> construction, lack a RW bit. And these pages are writable (WRUSS).
>
> pte_wrprotect() seems to do: _PAGE_DIRTY->_PAGE_COW (which is really
> weird in this situation), resulting in: 'Write=0,Dirty=0,Cow=1'.
>
> That's regular RO memory and won't raise read-faults.
>
> But I'm thinking RET will trip #PF here when it tries to read the SS
> because the SSP is not a proper shadow stack page?
>
> And in that case you want to tickle pte_mkwrite() to undo the
> pte_wrprotect() above?
>
> So while the #PF is a 'read' fault due to RET not actually writing to
> the shadow stack, you want to force a write fault so it will re-instate
> the SS page.
>
> Did I get that right?

That's right. I think the assumption that needs to be broken in the reader's head is that you can satisfy a read fault with a read-only PTE. This is kind of baked in all over the place with the zero-pfn, COW, etc. Maybe I should try to start with that.
On Tue, Nov 15, 2022 at 08:03:06PM +0000, Edgecombe, Rick P wrote:
> That's right. I think the assumption that needs to be broken in the
> reader's head is that you can satisfy a read fault with a read-only PTE.
> This is kind of baked in all over the place with the zero-pfn, COW,
> etc. Maybe I should try to start with that.

Maybe something like:

CoW -- pte_wrprotect() -- changes a SS page 'Write=0,Dirty=1' to
'Write=0,Dirty=0,CoW=1' which is a 'regular' RO page. A SS read from RET
will #PF because it expects a SS page. Make sure to break the CoW so it
can be restored to an SS page; as such, force the write path and tickle
pte_mkwrite().
On Tue, 2022-11-15 at 22:07 +0100, Peter Zijlstra wrote:
> On Tue, Nov 15, 2022 at 08:03:06PM +0000, Edgecombe, Rick P wrote:
> > That's right. I think the assumption that needs to be broken in the
> > reader's head is that you can satisfy a read fault with a read-only
> > PTE. This is kind of baked in all over the place with the zero-pfn,
> > COW, etc. Maybe I should try to start with that.
>
> Maybe something like:
>
> CoW -- pte_wrprotect() -- changes a SS page 'Write=0,Dirty=1' to
> 'Write=0,Dirty=0,CoW=1' which is a 'regular' RO page. A SS read from
> RET will #PF because it expects a SS page. Make sure to break the CoW
> so it can be restored to an SS page, as such force the write path and
> tickle pte_mkwrite().

Hmm, TBH I'm not sure it's more clear. I tried to take this and fill it out more. Does it sound better?

When a page becomes COW, it changes from a shadow stack permissioned page (Write=0,Dirty=1) to (Write=0,Dirty=0,CoW=1), which is simply read-only to the CPU. When shadow stack is enabled, a RET would normally pop the shadow stack by reading it with a "shadow stack read" access. However, in the COW case the shadow stack memory does not have shadow stack permissions; it is read-only. So it will generate a fault.

For conventionally writable pages, a read can be serviced with a read-only PTE, and COW would not have to happen. But for shadow stack, there is no concept of read-only shadow stack memory. If it is shadow stack permissioned, it can be modified via CALL and RET instructions. So COW needs to happen before any memory can be mapped with shadow stack permissions.

Shadow stack accesses (read or write) need to be serviced with shadow stack permissioned memory, so in the case of a shadow stack read access, treat it as a WRITE fault so both COW will happen and the write fault path will tickle maybe_mkwrite() and map the memory shadow stack.
On Tue, Nov 15, 2022 at 11:13:34PM +0000, Edgecombe, Rick P wrote:
> When a page becomes COW, it changes from a shadow stack permissioned
> page (Write=0,Dirty=1) to (Write=0,Dirty=0,CoW=1), which is simply
> read-only to the CPU. When shadow stack is enabled, a RET would
> normally pop the shadow stack by reading it with a "shadow stack read"
> access. However, in the COW case the shadow stack memory does not have
> shadow stack permissions; it is read-only. So it will generate a fault.
>
> For conventionally writable pages, a read can be serviced with a
> read-only PTE, and COW would not have to happen. But for shadow stack,
> there is no concept of read-only shadow stack memory. If it is shadow
> stack permissioned, it can be modified via CALL and RET instructions.
> So COW needs to happen before any memory can be mapped with shadow
> stack permissions.
>
> Shadow stack accesses (read or write) need to be serviced with shadow
> stack permissioned memory, so in the case of a shadow stack read
> access, treat it as a WRITE fault so both COW will happen and the write
> fault path will tickle maybe_mkwrite() and map the memory shadow stack.

ACK.
diff --git a/arch/x86/include/asm/trap_pf.h b/arch/x86/include/asm/trap_pf.h
index 10b1de500ab1..afa524325e55 100644
--- a/arch/x86/include/asm/trap_pf.h
+++ b/arch/x86/include/asm/trap_pf.h
@@ -11,6 +11,7 @@
  *   bit 3 ==	1: use of reserved bit detected
  *   bit 4 ==	1: fault was an instruction fetch
  *   bit 5 ==	1: protection keys block access
+ *   bit 6 ==	1: shadow stack access fault
  *   bit 15 ==	1: SGX MMU page-fault
  */
 enum x86_pf_error_code {
@@ -20,6 +21,7 @@ enum x86_pf_error_code {
 	X86_PF_RSVD	=		1 << 3,
 	X86_PF_INSTR	=		1 << 4,
 	X86_PF_PK	=		1 << 5,
+	X86_PF_SHSTK	=		1 << 6,
 	X86_PF_SGX	=		1 << 15,
 };
 
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 7b0d4ab894c8..0af3d7f52c2e 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1138,8 +1138,22 @@ access_error(unsigned long error_code, struct vm_area_struct *vma)
 				       (error_code & X86_PF_INSTR), foreign))
 		return 1;
 
+	/*
+	 * Shadow stack accesses (PF_SHSTK=1) are only permitted to
+	 * shadow stack VMAs. All other accesses result in an error.
+	 */
+	if (error_code & X86_PF_SHSTK) {
+		if (unlikely(!(vma->vm_flags & VM_SHADOW_STACK)))
+			return 1;
+		if (unlikely(!(vma->vm_flags & VM_WRITE)))
+			return 1;
+		return 0;
+	}
+
 	if (error_code & X86_PF_WRITE) {
 		/* write, present and write, not present: */
+		if (unlikely(vma->vm_flags & VM_SHADOW_STACK))
+			return 1;
 		if (unlikely(!(vma->vm_flags & VM_WRITE)))
 			return 1;
 		return 0;
@@ -1331,6 +1345,18 @@ void do_user_addr_fault(struct pt_regs *regs,
 
 	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
 
+	/*
+	 * To service shadow stack read faults, unlike normal read faults, the
+	 * fault handler needs to create a type of memory that will also be
+	 * writable (with instructions that generate shadow stack writes).
+	 * In the case of COW memory, the COW needs to take place even with
+	 * a shadow stack read. Otherwise the shared page will be left (shadow
+	 * stack) writable in userspace. So to trigger the appropriate behavior
+	 * by setting FAULT_FLAG_WRITE for shadow stack accesses, even if the
+	 * access was a shadow stack read.
+	 */
+	if (error_code & X86_PF_SHSTK)
+		flags |= FAULT_FLAG_WRITE;
 	if (error_code & X86_PF_WRITE)
 		flags |= FAULT_FLAG_WRITE;
 	if (error_code & X86_PF_INSTR)