[v5,12/39] x86/mm: Update ptep_set_wrprotect() and pmdp_set_wrprotect() for transition from _PAGE_DIRTY to _PAGE_COW
Message ID | 20230119212317.8324-13-rick.p.edgecombe@intel.com |
---|---|
State | New |
Headers | From: Rick Edgecombe <rick.p.edgecombe@intel.com> To: x86@kernel.org, "H . Peter Anvin" <hpa@zytor.com>, Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann <arnd@arndb.de>, Andy Lutomirski <luto@kernel.org>, Balbir Singh <bsingharora@gmail.com>, Borislav Petkov <bp@alien8.de>, Cyrill Gorcunov <gorcunov@gmail.com>, Dave Hansen <dave.hansen@linux.intel.com>, Eugene Syromiatnikov <esyr@redhat.com>, Florian Weimer <fweimer@redhat.com>, "H . J . Lu" <hjl.tools@gmail.com>, Jann Horn <jannh@google.com>, Jonathan Corbet <corbet@lwn.net>, Kees Cook <keescook@chromium.org>, Mike Kravetz <mike.kravetz@oracle.com>, Nadav Amit <nadav.amit@gmail.com>, Oleg Nesterov <oleg@redhat.com>, Pavel Machek <pavel@ucw.cz>, Peter Zijlstra <peterz@infradead.org>, Randy Dunlap <rdunlap@infradead.org>, Weijiang Yang <weijiang.yang@intel.com>, "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>, John Allen <john.allen@amd.com>, kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu <yu-cheng.yu@intel.com> Subject: [PATCH v5 12/39] x86/mm: Update ptep_set_wrprotect() and pmdp_set_wrprotect() for transition from _PAGE_DIRTY to _PAGE_COW Date: Thu, 19 Jan 2023 13:22:50 -0800 Message-Id: <20230119212317.8324-13-rick.p.edgecombe@intel.com> In-Reply-To: <20230119212317.8324-1-rick.p.edgecombe@intel.com> References: <20230119212317.8324-1-rick.p.edgecombe@intel.com> List-ID: <linux-kernel.vger.kernel.org> |
Series | Shadow stacks for userspace |
Commit Message
Edgecombe, Rick P
Jan. 19, 2023, 9:22 p.m. UTC
From: Yu-cheng Yu <yu-cheng.yu@intel.com>

When shadow stack is in use, Write=0,Dirty=1 PTEs are reserved for
shadow stack. Copy-on-write PTEs then have Write=0,Cow=1.

When a PTE goes from Write=1,Dirty=1 to Write=0,Cow=1, it could
become a transient shadow stack PTE in two cases:

1. Some processors can start a write but end up seeing a Write=0 PTE by
   the time they get to the Dirty bit, creating a transient shadow stack
   PTE. However, this will not occur on processors supporting shadow
   stack, and a TLB flush is not necessary.

2. When _PAGE_DIRTY is replaced with _PAGE_COW non-atomically, a transient
   shadow stack PTE can be created as a result. Thus, prevent that with
   cmpxchg.

In the case of pmdp_set_wrprotect(), for nopmd configs the ->pmd operated
on does not exist and the logic would need to be different. Although the
extra functionality will normally be optimized out when user shadow
stacks are not configured, also exclude it in the preprocessor stage so
that it will still compile. User shadow stack is not supported there by
Linux anyway. Leave the cpu_feature_enabled() check so that the
functionality also gets disabled based on runtime detection of the
feature.

Similarly, compile it out in ptep_set_wrprotect() due to a clang warning
on i386. Like above, the code path should get optimized out on i386
since shadow stack is not supported on 32-bit kernels, but this makes
the compiler happy.

Dave Hansen, Jann Horn, Andy Lutomirski, and Peter Zijlstra provided many
insights into the issue. Jann Horn provided the cmpxchg solution.

Tested-by: Pengfei Xu <pengfei.xu@intel.com>
Tested-by: John Allen <john.allen@amd.com>
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Co-developed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
v5:
 - Commit log verbiage and formatting (Boris)
 - Remove capitalization on shadow stack (Boris)
 - Fix i386 warning on recent clang

v3:
 - Remove unnecessary #ifdef (Dave Hansen)

v2:
 - Compile out some code due to clang build error
 - Clarify commit log (dhansen)
 - Normalize PTE bit descriptions between patches (dhansen)
 - Update comment with text from (dhansen)

Yu-cheng v30:
 - Replace (pmdval_t) cast with CONFIG_PGTABLE_LEVELS > 2 (Borislav Petkov).

 arch/x86/include/asm/pgtable.h | 37 ++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)
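The cmpxchg approach described in the commit message can be modeled in user space with C11 atomics. The following is only a sketch under stated assumptions: the `MODEL_*` bit positions and the `model_*` helpers are hypothetical names made up for this illustration, `model_wrprotect()` is a simplified stand-in for pte_wrprotect() with shadow stack enabled, and `atomic_compare_exchange_weak()` stands in for the kernel's try_cmpxchg(). It is not the kernel implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions for this model only. */
#define MODEL_RW    (UINT64_C(1) << 1)
#define MODEL_DIRTY (UINT64_C(1) << 6)
#define MODEL_COW   (UINT64_C(1) << 58)

/* A shadow stack entry is encoded as Write=0,Dirty=1. */
static bool model_is_shstk(uint64_t pte)
{
	return !(pte & MODEL_RW) && (pte & MODEL_DIRTY);
}

/*
 * Simplified stand-in for pte_wrprotect() with shadow stack enabled:
 * clear Write and, if Dirty was set, move it to Cow.
 */
static uint64_t model_wrprotect(uint64_t pte)
{
	pte &= ~MODEL_RW;
	if (pte & MODEL_DIRTY) {
		pte &= ~MODEL_DIRTY;
		pte |= MODEL_COW;
	}
	return pte;
}

/*
 * Write-protect the entry with a compare-and-exchange loop, so the
 * stored value is always computed from the value it replaces.
 */
static uint64_t model_set_wrprotect(_Atomic uint64_t *ptep)
{
	uint64_t old = atomic_load(ptep);
	uint64_t next;

	do {
		next = model_wrprotect(old);
		/* On failure, 'old' is refreshed with the current value. */
	} while (!atomic_compare_exchange_weak(ptep, &old, next));

	/* The value that was stored is never Write=0,Dirty=1. */
	assert(!model_is_shstk(next));
	return next;
}
```

In this model, calling `model_set_wrprotect()` on an entry holding `MODEL_RW | MODEL_DIRTY` leaves it holding only `MODEL_COW`; no intermediate Write=0,Dirty=1 value is ever stored.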
Comments
On Thu, Jan 19, 2023 at 01:22:50PM -0800, Rick Edgecombe wrote:
> From: Yu-cheng Yu <yu-cheng.yu@intel.com>
>
> When shadow stack is in use, Write=0,Dirty=1 PTE are preserved for
> shadow stack. Copy-on-write PTEs then have Write=0,Cow=1.
>
> When a PTE goes from Write=1,Dirty=1 to Write=0,Cow=1, it could
> become a transient shadow stack PTE in two cases:
>
> 1. Some processors can start a write but end up seeing a Write=0 PTE by
>    the time they get to the Dirty bit, creating a transient shadow stack
>    PTE. However, this will not occur on processors supporting shadow
>    stack, and a TLB flush is not necessary.
>
> 2. When _PAGE_DIRTY is replaced with _PAGE_COW non-atomically, a transient
>    shadow stack PTE can be created as a result. Thus, prevent that with
>    cmpxchg.
>
> In the case of pmdp_set_wrprotect(), for nopmd configs the ->pmd operated
> on does not exist and the logic would need to be different. Although the
> extra functionality will normally be optimized out when user shadow
> stacks are not configured, also exclude it in the preprocessor stage so
> that it will still compile. User shadow stack is not supported there by
> Linux anyway. Leave the cpu_feature_enabled() check so that the
> functionality also gets disabled based on runtime detection of the
> feature.
>
> Similarly, compile it out in ptep_set_wrprotect() due to a clang warning
> on i386. Like above, the code path should get optimized out on i386
> since shadow stack is not supported on 32 bit kernels, but this makes
> the compiler happy.
>
> Dave Hansen, Jann Horn, Andy Lutomirski, and Peter Zijlstra provided many
> insights to the issue. Jann Horn provided the cmpxchg solution.
>
> Tested-by: Pengfei Xu <pengfei.xu@intel.com>
> Tested-by: John Allen <john.allen@amd.com>
> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>

Reviewed-by: Kees Cook <keescook@chromium.org>
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 7942eff2af50..c5047eb5f406 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1232,6 +1232,23 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm,
 static inline void ptep_set_wrprotect(struct mm_struct *mm,
 				      unsigned long addr, pte_t *ptep)
 {
+#ifdef CONFIG_X86_USER_SHADOW_STACK
+	/*
+	 * Avoid accidentally creating shadow stack PTEs
+	 * (Write=0,Dirty=1). Use cmpxchg() to prevent races with
+	 * the hardware setting Dirty=1.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) {
+		pte_t old_pte, new_pte;
+
+		old_pte = READ_ONCE(*ptep);
+		do {
+			new_pte = pte_wrprotect(old_pte);
+		} while (!try_cmpxchg(&ptep->pte, &old_pte.pte, new_pte.pte));
+
+		return;
+	}
+#endif
 	clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte);
 }
 
@@ -1284,6 +1301,26 @@ static inline pud_t pudp_huge_get_and_clear(struct mm_struct *mm,
 static inline void pmdp_set_wrprotect(struct mm_struct *mm,
 				      unsigned long addr, pmd_t *pmdp)
 {
+#ifdef CONFIG_X86_USER_SHADOW_STACK
+	/*
+	 * If shadow stack is enabled, pmd_wrprotect() moves _PAGE_DIRTY
+	 * to _PAGE_COW (see comments at pmd_wrprotect()).
+	 * When a thread reads a RW=1, Dirty=0 PMD and before changing it
+	 * to RW=0, Dirty=0, another thread could have written to the page
+	 * and the PMD is RW=1, Dirty=1 now.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) {
+		pmd_t old_pmd, new_pmd;
+
+		old_pmd = READ_ONCE(*pmdp);
+		do {
+			new_pmd = pmd_wrprotect(old_pmd);
+		} while (!try_cmpxchg(&pmdp->pmd, &old_pmd.pmd, new_pmd.pmd));
+
+		return;
+	}
+#endif
+
 	clear_bit(_PAGE_BIT_RW, (unsigned long *)pmdp);
 }
 
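The race described in the pmdp_set_wrprotect() comment (an entry read as RW=1,Dirty=0 becomes RW=1,Dirty=1 before the update lands) can be replayed deterministically in a single-threaded user-space model. This is a hedged sketch, not kernel code: the `SIM_*` bit positions, the `sim_*` helpers, and the `inject_race` parameter are all hypothetical, and the concurrent writer is simulated by a direct call placed between the read and the compare-and-exchange.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions for this simulation only. */
#define SIM_RW    (UINT64_C(1) << 1)
#define SIM_DIRTY (UINT64_C(1) << 6)
#define SIM_COW   (UINT64_C(1) << 58)

/* Simplified stand-in for pmd_wrprotect(): clear RW, move Dirty to Cow. */
static uint64_t sim_wrprotect(uint64_t v)
{
	v &= ~SIM_RW;
	if (v & SIM_DIRTY) {
		v &= ~SIM_DIRTY;
		v |= SIM_COW;
	}
	return v;
}

/* Simulated concurrent write: Dirty is set only while RW=1. */
static void sim_hw_set_dirty(_Atomic uint64_t *entry)
{
	if (atomic_load(entry) & SIM_RW)
		atomic_fetch_or(entry, SIM_DIRTY);
}

/*
 * Compare-and-exchange loop. With inject_race set, the simulated
 * writer dirties the entry after the first snapshot was taken, so the
 * first exchange fails and the loop recomputes from the fresh value.
 */
static uint64_t sim_set_wrprotect(_Atomic uint64_t *entry, bool inject_race,
				  unsigned *attempts)
{
	uint64_t old = atomic_load(entry);
	uint64_t next;

	*attempts = 0;
	do {
		next = sim_wrprotect(old);
		if (inject_race && *attempts == 0)
			sim_hw_set_dirty(entry);
		(*attempts)++;
	} while (!atomic_compare_exchange_strong(entry, &old, next));

	/* The stored value is never RW=0,Dirty=1. */
	assert((next & SIM_RW) || !(next & SIM_DIRTY));
	return next;
}
```

Starting from an entry holding `SIM_RW` with `inject_race` set, the first exchange fails because the entry now also has `SIM_DIRTY`; the second attempt recomputes from the refreshed value and stores `SIM_COW`, so the Dirty information is carried over rather than lost or left as RW=0,Dirty=1.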