Message ID | 20221203003606.6838-17-rick.p.edgecombe@intel.com |
---|---|
State | New |
Headers |
From: Rick Edgecombe <rick.p.edgecombe@intel.com>
To: x86@kernel.org, "H . Peter Anvin" <hpa@zytor.com>, Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann <arnd@arndb.de>, Andy Lutomirski <luto@kernel.org>, Balbir Singh <bsingharora@gmail.com>, Borislav Petkov <bp@alien8.de>, Cyrill Gorcunov <gorcunov@gmail.com>, Dave Hansen <dave.hansen@linux.intel.com>, Eugene Syromiatnikov <esyr@redhat.com>, Florian Weimer <fweimer@redhat.com>, "H . J . Lu" <hjl.tools@gmail.com>, Jann Horn <jannh@google.com>, Jonathan Corbet <corbet@lwn.net>, Kees Cook <keescook@chromium.org>, Mike Kravetz <mike.kravetz@oracle.com>, Nadav Amit <nadav.amit@gmail.com>, Oleg Nesterov <oleg@redhat.com>, Pavel Machek <pavel@ucw.cz>, Peter Zijlstra <peterz@infradead.org>, Randy Dunlap <rdunlap@infradead.org>, Weijiang Yang <weijiang.yang@intel.com>, "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>, John Allen <john.allen@amd.com>, kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com
Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu <yu-cheng.yu@intel.com>
Subject: [PATCH v4 16/39] x86/mm: Check Shadow Stack page fault errors
Date: Fri, 2 Dec 2022 16:35:43 -0800
Message-Id: <20221203003606.6838-17-rick.p.edgecombe@intel.com>
In-Reply-To: <20221203003606.6838-1-rick.p.edgecombe@intel.com>
References: <20221203003606.6838-1-rick.p.edgecombe@intel.com>
List-ID: <linux-kernel.vger.kernel.org> |
Series |
Shadow stacks for userspace
|
|
Commit Message
Edgecombe, Rick P
Dec. 3, 2022, 12:35 a.m. UTC
From: Yu-cheng Yu <yu-cheng.yu@intel.com>

The CPU performs "shadow stack accesses" when it expects to encounter
shadow stack mappings. These accesses can be implicit (via CALL/RET
instructions) or explicit (instructions like WRSS).

Shadow stack accesses to shadow-stack mappings can see faults in normal,
valid operation just like regular accesses to regular mappings. Shadow
stacks need some of the same features like delayed allocation, swap and
copy-on-write. The kernel needs to use faults to implement those features.

The architecture has concepts of both shadow stack reads and shadow stack
writes. Any shadow stack access to non-shadow stack memory will generate
a fault with the shadow stack error code bit set.

This means that, unlike normal write protection, the fault handler needs
to create a type of memory that can be written to (with instructions that
generate shadow stack writes), even to fulfill a read access. So in the
case of COW memory, the COW needs to take place even with a shadow stack
read. Otherwise the page will be left (shadow stack) writable in
userspace. So to trigger the appropriate behavior, set FAULT_FLAG_WRITE
for shadow stack accesses, even if the access was a shadow stack read.

Shadow stack accesses can also result in errors, such as when a shadow
stack overflows, or if a shadow stack access occurs to a non-shadow-stack
mapping. Also, generate the errors for invalid shadow stack accesses.
Tested-by: Pengfei Xu <pengfei.xu@intel.com>
Tested-by: John Allen <john.allen@amd.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Co-developed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---

v4:
 - Further improve comment talking about FAULT_FLAG_WRITE (Peterz)

v3:
 - Improve comment talking about using FAULT_FLAG_WRITE (Peterz)

v2:
 - Update commit log with verbiage/feedback from Dave Hansen
 - Clarify reasoning for FAULT_FLAG_WRITE for all shadow stack accesses
 - Update comments with some verbiage from Dave Hansen

Yu-cheng v30:
 - Update Subject line and add a verb

 arch/x86/include/asm/trap_pf.h |  2 ++
 arch/x86/mm/fault.c            | 38 ++++++++++++++++++++++++++++++++++
 2 files changed, 40 insertions(+)
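The flag handling described in the commit message can be sketched as a small user-space model. This is only an illustration, not the kernel code: the error-code bits follow trap_pf.h, while `model_fault_flags()` and the `MODEL_FLAG_*` values are hypothetical stand-ins chosen for this sketch.

```c
#include <assert.h>

/* Page fault error-code bits, as listed in arch/x86/include/asm/trap_pf.h. */
enum pf_error_code {
	PF_WRITE = 1 << 1,
	PF_INSTR = 1 << 4,
	PF_SHSTK = 1 << 6,	/* the bit this patch adds */
};

/* Stand-in flag values for this model only; not the kernel's definitions. */
enum model_fault_flag {
	MODEL_FLAG_WRITE       = 1 << 0,
	MODEL_FLAG_INSTRUCTION = 1 << 1,
};

static_assert((PF_SHSTK & (PF_WRITE | PF_INSTR)) == 0,
	      "error-code bits must be distinct");

/*
 * Hypothetical helper: translate a hardware error code into fault flags.
 * Any shadow stack access, read or write, is treated as a write fault so
 * that the generic fault path breaks COW before mapping the page.
 */
static unsigned int model_fault_flags(unsigned long error_code)
{
	unsigned int flags = 0;

	if (error_code & PF_SHSTK)
		flags |= MODEL_FLAG_WRITE;
	if (error_code & PF_WRITE)
		flags |= MODEL_FLAG_WRITE;
	if (error_code & PF_INSTR)
		flags |= MODEL_FLAG_INSTRUCTION;

	return flags;
}
```

In this model a shadow stack read (`PF_SHSTK` set, `PF_WRITE` clear) yields the same flags as an ordinary write fault, which is the behavior the commit message argues for.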
Comments
On Fri, Dec 02, 2022 at 04:35:43PM -0800, Rick Edgecombe wrote:
> From: Yu-cheng Yu <yu-cheng.yu@intel.com>
>
> The CPU performs "shadow stack accesses" when it expects to encounter
> shadow stack mappings. These accesses can be implicit (via CALL/RET
> instructions) or explicit (instructions like WRSS).
>
> Shadow stacks accesses to shadow-stack mappings can see faults in normal,
> valid operation just like regular accesses to regular mappings. Shadow
> stacks need some of the same features like delayed allocation, swap and
> copy-on-write. The kernel needs to use faults to implement those features.
>
> The architecture has concepts of both shadow stack reads and shadow stack
> writes. Any shadow stack access to non-shadow stack memory will generate
> a fault with the shadow stack error code bit set.

You lost me here: by "shadow stack access to non-shadow stack memory" you mean
the explicit one using WRU*SS?

> This means that, unlike normal write protection, the fault handler needs
> to create a type of memory that can be written to (with instructions that
> generate shadow stack writes), even to fulfill a read access. So in the
> case of COW memory, the COW needs to take place even with a shadow stack
> read.

I guess I'm missing an example here: are we talking here about a user process
getting its shadow stack pages allocated and them being COW first and on the
first shstk operation, it would generate that fault?

> @@ -1331,6 +1345,30 @@ void do_user_addr_fault(struct pt_regs *regs,
>  
>  	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
>  
> +	/*
> +	 * When a page becomes COW it changes from a shadow stack permissioned

Unknown word [permissioned] in comment.

> +	 * page (Write=0,Dirty=1) to (Write=0,Dirty=0,CoW=1), which is simply
> +	 * read-only to the CPU. When shadow stack is enabled, a RET would
> +	 * normally pop the shadow stack by reading it with a "shadow stack
> +	 * read" access. However, in the COW case the shadow stack memory does
> +	 * not have shadow stack permissions, it is read-only. So it will
> +	 * generate a fault.
> +	 *
> +	 * For conventionally writable pages, a read can be serviced with a
> +	 * read only PTE, and COW would not have to happen. But for shadow
> +	 * stack, there isn't the concept of read-only shadow stack memory.
> +	 * If it is shadow stack permissioned, it can be modified via CALL and

Ditto.

> +	 * RET instructions. So COW needs to happen before any memory can be
> +	 * mapped with shadow stack permissions.
> +	 *
> +	 * Shadow stack accesses (read or write) need to be serviced with
> +	 * shadow stack permissioned memory, so in the case of a shadow stack

Is this some new formulation I haven't heard about yet?

"Permissioned <something>"?

> +	 * read access, treat it as a WRITE fault so both COW will happen and
> +	 * the write fault path will tickle maybe_mkwrite() and map the memory
> +	 * shadow stack.
> +	 */
> +	if (error_code & X86_PF_SHSTK)
> +		flags |= FAULT_FLAG_WRITE;
>  	if (error_code & X86_PF_WRITE)
>  		flags |= FAULT_FLAG_WRITE;
>  	if (error_code & X86_PF_INSTR)
> --
> 2.17.1
>
On Wed, 2023-01-04 at 15:32 +0100, Borislav Petkov wrote:
> On Fri, Dec 02, 2022 at 04:35:43PM -0800, Rick Edgecombe wrote:
> > From: Yu-cheng Yu <yu-cheng.yu@intel.com>
> >
> > The CPU performs "shadow stack accesses" when it expects to encounter
> > shadow stack mappings. These accesses can be implicit (via CALL/RET
> > instructions) or explicit (instructions like WRSS).
> >
> > Shadow stacks accesses to shadow-stack mappings can see faults in
> > normal, valid operation just like regular accesses to regular
> > mappings. Shadow stacks need some of the same features like delayed
> > allocation, swap and copy-on-write. The kernel needs to use faults to
> > implement those features.
> >
> > The architecture has concepts of both shadow stack reads and shadow
> > stack writes. Any shadow stack access to non-shadow stack memory will
> > generate a fault with the shadow stack error code bit set.
>
> You lost me here: by "shadow stack access to non-shadow stack memory"
> you mean the explicit one using WRU*SS?

Shadow stack accesses can be WR*SS, shadow stack pushes/pops from
call/ret or incssp. Basically the instructions that intend to read or
write to a shadow stack.

> > This means that, unlike normal write protection, the fault handler
> > needs to create a type of memory that can be written to (with
> > instructions that generate shadow stack writes), even to fulfill a
> > read access. So in the case of COW memory, the COW needs to take
> > place even with a shadow stack read.
>
> I guess I'm missing an example here: are we talking here about a user
> process getting its shadow stack pages allocated and them being COW
> first and on the first shstk operation, it would generate that fault?

So say you have a shadow stack, then fork(), so the shadow stack PTEs
become read-only. Then you RET and the shadow stack gets popped to
compare it to the normal stack value. In this case it only needs to
read the shadow stack, but it does this with a "shadow stack read",
which will fault if memory is not shadow stack. So the fault is a read
fault, but it needs to make the PTE shadow stack in order to resolve
it. So it needs to trigger COW, otherwise the shared page would be
changeable from userspace.

Make sense? I guess I can add an example if you think it would help.

> > @@ -1331,6 +1345,30 @@ void do_user_addr_fault(struct pt_regs *regs,
> >  
> >  	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
> >  
> > +	/*
> > +	 * When a page becomes COW it changes from a shadow stack permissioned
>
> Unknown word [permissioned] in comment.

I can change it.

> > +	 * page (Write=0,Dirty=1) to (Write=0,Dirty=0,CoW=1), which is simply
> > +	 * read-only to the CPU. When shadow stack is enabled, a RET would
> > +	 * normally pop the shadow stack by reading it with a "shadow stack
> > +	 * read" access. However, in the COW case the shadow stack memory does
> > +	 * not have shadow stack permissions, it is read-only. So it will
> > +	 * generate a fault.
> > +	 *
> > +	 * For conventionally writable pages, a read can be serviced with a
> > +	 * read only PTE, and COW would not have to happen. But for shadow
> > +	 * stack, there isn't the concept of read-only shadow stack memory.
> > +	 * If it is shadow stack permissioned, it can be modified via CALL and
>
> Ditto.
>
> > +	 * RET instructions. So COW needs to happen before any memory can be
> > +	 * mapped with shadow stack permissions.
> > +	 *
> > +	 * Shadow stack accesses (read or write) need to be serviced with
> > +	 * shadow stack permissioned memory, so in the case of a shadow stack
>
> Is this some new formulation I haven't heard about yet?
>
> "Permissioned <something>"?

It looks like it's not an official dictionary word, but I've seen it
from time to time:
https://en.wiktionary.org/wiki/permissioned

> > +	 * read access, treat it as a WRITE fault so both COW will happen and
> > +	 * the write fault path will tickle maybe_mkwrite() and map the memory
> > +	 * shadow stack.
> > +	 */
> > +	if (error_code & X86_PF_SHSTK)
> > +		flags |= FAULT_FLAG_WRITE;
> >  	if (error_code & X86_PF_WRITE)
> >  		flags |= FAULT_FLAG_WRITE;
> >  	if (error_code & X86_PF_INSTR)
> > --
> > 2.17.1
> >
>
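The fork()/RET scenario discussed in this reply can be walked through with a toy model of the PTE bits named in the patch comment. Everything here is an assumption made for illustration: `struct model_pte`, the `page` id, and the `model_*` helpers are hypothetical and do not correspond to kernel types or functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified PTE: only the bits named in the patch comment, plus a page id. */
struct model_pte {
	bool write;
	bool dirty;
	bool cow;	/* software CoW bit, as written in the patch comment */
	int page;	/* hypothetical id of the backing page */
};

/* Shadow stack permission is encoded as Write=0,Dirty=1. */
static bool model_pte_is_shstk(struct model_pte pte)
{
	return !pte.write && pte.dirty;
}

/* A shadow stack read faults unless the PTE has shadow stack permission. */
static bool model_shstk_read_faults(struct model_pte pte)
{
	return !model_pte_is_shstk(pte);
}

/* fork(): (Write=0,Dirty=1) becomes (Write=0,Dirty=0,CoW=1), page is shared. */
static struct model_pte model_fork_wrprotect(struct model_pte pte)
{
	assert(model_pte_is_shstk(pte));
	pte.dirty = false;
	pte.cow = true;
	return pte;
}

/*
 * Resolve the fault as a write fault: break COW by switching to a private
 * copy (new_page), then restore shadow stack permission on the mapping.
 */
static struct model_pte model_resolve_as_write(struct model_pte pte,
					       int new_page)
{
	pte.page = new_page;
	pte.write = false;
	pte.dirty = true;
	pte.cow = false;
	return pte;
}
```

In this sketch, a PTE that went through `model_fork_wrprotect()` makes `model_shstk_read_faults()` return true, and only after `model_resolve_as_write()` gives the faulting process its own page does the shadow stack read succeed. The shared page never regains shadow stack permission while it is still shared.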
diff --git a/arch/x86/include/asm/trap_pf.h b/arch/x86/include/asm/trap_pf.h
index 10b1de500ab1..afa524325e55 100644
--- a/arch/x86/include/asm/trap_pf.h
+++ b/arch/x86/include/asm/trap_pf.h
@@ -11,6 +11,7 @@
  *   bit 3 == 1: use of reserved bit detected
  *   bit 4 == 1: fault was an instruction fetch
  *   bit 5 == 1: protection keys block access
+ *   bit 6 == 1: shadow stack access fault
  *   bit 15 == 1: SGX MMU page-fault
  */
 enum x86_pf_error_code {
@@ -20,6 +21,7 @@ enum x86_pf_error_code {
 	X86_PF_RSVD = 1 << 3,
 	X86_PF_INSTR = 1 << 4,
 	X86_PF_PK = 1 << 5,
+	X86_PF_SHSTK = 1 << 6,
 	X86_PF_SGX = 1 << 15,
 };
 
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 7b0d4ab894c8..3004ad044e9b 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1138,8 +1138,22 @@ access_error(unsigned long error_code, struct vm_area_struct *vma)
 				       (error_code & X86_PF_INSTR), foreign))
 		return 1;
 
+	/*
+	 * Shadow stack accesses (PF_SHSTK=1) are only permitted to
+	 * shadow stack VMAs. All other accesses result in an error.
+	 */
+	if (error_code & X86_PF_SHSTK) {
+		if (unlikely(!(vma->vm_flags & VM_SHADOW_STACK)))
+			return 1;
+		if (unlikely(!(vma->vm_flags & VM_WRITE)))
+			return 1;
+		return 0;
+	}
+
 	if (error_code & X86_PF_WRITE) {
 		/* write, present and write, not present: */
+		if (unlikely(vma->vm_flags & VM_SHADOW_STACK))
+			return 1;
 		if (unlikely(!(vma->vm_flags & VM_WRITE)))
 			return 1;
 		return 0;
@@ -1331,6 +1345,30 @@ void do_user_addr_fault(struct pt_regs *regs,
 
 	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
 
+	/*
+	 * When a page becomes COW it changes from a shadow stack permissioned
+	 * page (Write=0,Dirty=1) to (Write=0,Dirty=0,CoW=1), which is simply
+	 * read-only to the CPU. When shadow stack is enabled, a RET would
+	 * normally pop the shadow stack by reading it with a "shadow stack
+	 * read" access. However, in the COW case the shadow stack memory does
+	 * not have shadow stack permissions, it is read-only. So it will
+	 * generate a fault.
+	 *
+	 * For conventionally writable pages, a read can be serviced with a
+	 * read only PTE, and COW would not have to happen. But for shadow
+	 * stack, there isn't the concept of read-only shadow stack memory.
+	 * If it is shadow stack permissioned, it can be modified via CALL and
+	 * RET instructions. So COW needs to happen before any memory can be
+	 * mapped with shadow stack permissions.
+	 *
+	 * Shadow stack accesses (read or write) need to be serviced with
+	 * shadow stack permissioned memory, so in the case of a shadow stack
+	 * read access, treat it as a WRITE fault so both COW will happen and
+	 * the write fault path will tickle maybe_mkwrite() and map the memory
+	 * shadow stack.
+	 */
+	if (error_code & X86_PF_SHSTK)
+		flags |= FAULT_FLAG_WRITE;
 	if (error_code & X86_PF_WRITE)
 		flags |= FAULT_FLAG_WRITE;
 	if (error_code & X86_PF_INSTR)
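The two VMA checks that the access_error() hunk adds can be exercised in isolation with a user-space sketch. It mirrors only the shadow stack and write branches shown in the diff. The `MODEL_VM_*` values and `model_access_error()` are hypothetical stand-ins, and the protection-key, read, and instruction-fetch checks of the real function are left out.

```c
#include <assert.h>

/* Page fault error-code bits, as listed in arch/x86/include/asm/trap_pf.h. */
enum pf_error_code {
	PF_WRITE = 1 << 1,
	PF_SHSTK = 1 << 6,
};

/* Stand-in VMA flag values for this model only; not the kernel's values. */
enum model_vm_flag {
	MODEL_VM_WRITE        = 1 << 0,
	MODEL_VM_SHADOW_STACK = 1 << 1,
};

static_assert((PF_WRITE & PF_SHSTK) == 0, "error-code bits must be distinct");

/* Returns 1 if the access is an error for this VMA, 0 if it is allowed. */
static int model_access_error(unsigned long error_code, unsigned long vm_flags)
{
	/* Shadow stack accesses are only permitted to shadow stack VMAs. */
	if (error_code & PF_SHSTK) {
		if (!(vm_flags & MODEL_VM_SHADOW_STACK))
			return 1;
		if (!(vm_flags & MODEL_VM_WRITE))
			return 1;
		return 0;
	}

	/* Ordinary writes must not target a shadow stack VMA. */
	if (error_code & PF_WRITE) {
		if (vm_flags & MODEL_VM_SHADOW_STACK)
			return 1;
		if (!(vm_flags & MODEL_VM_WRITE))
			return 1;
		return 0;
	}

	/* Remaining checks of the real function are omitted in this sketch. */
	return 0;
}
```

Under these assumptions, a shadow stack access to a normal writable VMA is rejected, and so is a normal write to a shadow stack VMA, even though both VMAs carry the write flag.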