From: Rick Edgecombe
Date: Fri, 4 Nov 2022 15:35:39 -0700
Subject: [PATCH v3 12/37] x86/mm: Update ptep_set_wrprotect() and pmdp_set_wrprotect() for transition from _PAGE_DIRTY to _PAGE_COW
Message-Id: <20221104223604.29615-13-rick.p.edgecombe@intel.com>
In-Reply-To: <20221104223604.29615-1-rick.p.edgecombe@intel.com>
From: Yu-cheng Yu

When Shadow Stack is in use, Write=0,Dirty=1 PTEs are reserved for shadow stack. Copy-on-write PTEs then have Write=0,Cow=1.

When a PTE goes from Write=1,Dirty=1 to Write=0,Cow=1, it could become a transient shadow stack PTE in two cases:

The first case is that some processors can start a write but end up seeing a Write=0 PTE by the time they get to the Dirty bit, creating a transient shadow stack PTE. However, this will not occur on processors supporting Shadow Stack, and a TLB flush is not necessary.

The second case is that when _PAGE_DIRTY is replaced with _PAGE_COW non-atomically, a transient shadow stack PTE can be created as a result. If Write is cleared while Dirty=1 is still set, the PTE is briefly Write=0,Dirty=1. Prevent that by computing the new value with pte_wrprotect()/pmd_wrprotect() and installing it with cmpxchg, retrying if the hardware sets Dirty=1 in the meantime.

In the case of pmdp_set_wrprotect(), for nopmd configs the ->pmd operated on does not exist and the logic would need to be different. Although the extra functionality will normally be optimized out when user shadow stacks are not configured, also exclude it in the preprocessor stage (#ifdef CONFIG_X86_USER_SHADOW_STACK) so that it will still compile. User shadow stack is not supported there by Linux anyway. Leave the cpu_feature_enabled() check so that the functionality is also disabled at runtime when the feature is not detected.

Dave Hansen, Jann Horn, Andy Lutomirski, and Peter Zijlstra provided many insights into the issue. Jann Horn provided the cmpxchg solution.

Tested-by: Pengfei Xu
Tested-by: John Allen
Signed-off-by: Yu-cheng Yu
Co-developed-by: Rick Edgecombe
Signed-off-by: Rick Edgecombe
---

v3:
 - Remove unnecessary #ifdef (Dave Hansen)

v2:
 - Compile out some code due to clang build error
 - Clarify commit log (dhansen)
 - Normalize PTE bit descriptions between patches (dhansen)
 - Update comment with text from (dhansen)

Yu-cheng v30:
 - Replace (pmdval_t) cast with CONFIG_PGTABLE_LEVELS > 2 (Borislav Petkov).
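The bit transitions described in the commit message can be modeled in user space. The sketch below is an illustrative model and is not the kernel's pte_wrprotect(). Write (bit 1) and Dirty (bit 6) use the real x86 PTE bit positions. SKETCH_PAGE_COW is a hypothetical stand-in for _PAGE_COW at an assumed bit 58, and every sketch_* name is hypothetical.

The model makes one point. Building the write-protected value in a local variable is harmless, because nothing else can observe it. Clearing Write on the live PTE first and moving Dirty to Cow in a second store is different: it exposes a Write=0,Dirty=1 value between the two stores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Real x86 PTE bit positions: Write (RW) is bit 1, Dirty is bit 6. */
#define SKETCH_PAGE_RW    (UINT64_C(1) << 1)
#define SKETCH_PAGE_DIRTY (UINT64_C(1) << 6)
/* Hypothetical: an assumed software bit standing in for _PAGE_COW. */
#define SKETCH_PAGE_COW   (UINT64_C(1) << 58)

/* Write=0,Dirty=1 is the encoding reserved for shadow stack PTEs. */
static bool sketch_is_shstk(uint64_t pte)
{
	return !(pte & SKETCH_PAGE_RW) && (pte & SKETCH_PAGE_DIRTY);
}

/*
 * Compute the write-protected value on a private copy: clear Write and,
 * if Dirty was set, move it to Cow. Intermediate states of the local
 * variable are invisible to anyone else.
 */
static uint64_t sketch_wrprotect(uint64_t pte)
{
	pte &= ~SKETCH_PAGE_RW;
	if (pte & SKETCH_PAGE_DIRTY) {
		pte &= ~SKETCH_PAGE_DIRTY;
		pte |= SKETCH_PAGE_COW;
	}
	assert(!sketch_is_shstk(pte));
	return pte;
}

/*
 * Non-atomic variant that modifies the live PTE with two separate stores.
 * Returns the value visible between the stores, which is what another
 * observer (e.g. the CPU walking page tables) could see at that moment.
 */
static uint64_t sketch_two_step_intermediate(uint64_t *ptep)
{
	*ptep &= ~SKETCH_PAGE_RW;	/* store 1: Write=0, Dirty may still be 1 */
	uint64_t visible = *ptep;
	if (*ptep & SKETCH_PAGE_DIRTY)	/* store 2: move Dirty to Cow */
		*ptep = (*ptep & ~SKETCH_PAGE_DIRTY) | SKETCH_PAGE_COW;
	return visible;
}
```

Both paths end at Write=0,Cow=1. Only the two-step path publishes a shadow-stack encoding along the way.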
 arch/x86/include/asm/pgtable.h | 35 ++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 81f388a5a5ab..f252c42f3ca1 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1289,6 +1289,21 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm,
 static inline void ptep_set_wrprotect(struct mm_struct *mm,
 				      unsigned long addr, pte_t *ptep)
 {
+	/*
+	 * Avoid accidentally creating shadow stack PTEs
+	 * (Write=0,Dirty=1). Use cmpxchg() to prevent races with
+	 * the hardware setting Dirty=1.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) {
+		pte_t old_pte, new_pte;
+
+		old_pte = READ_ONCE(*ptep);
+		do {
+			new_pte = pte_wrprotect(old_pte);
+		} while (!try_cmpxchg(&ptep->pte, &old_pte.pte, new_pte.pte));
+
+		return;
+	}
 	clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte);
 }
 
@@ -1341,6 +1356,26 @@ static inline pud_t pudp_huge_get_and_clear(struct mm_struct *mm,
 static inline void pmdp_set_wrprotect(struct mm_struct *mm,
 				      unsigned long addr, pmd_t *pmdp)
 {
+#ifdef CONFIG_X86_USER_SHADOW_STACK
+	/*
+	 * If Shadow Stack is enabled, pmd_wrprotect() moves _PAGE_DIRTY
+	 * to _PAGE_COW (see comments at pmd_wrprotect()).
+	 * When a thread reads a RW=1, Dirty=0 PMD and before changing it
+	 * to RW=0, Dirty=0, another thread could have written to the page
+	 * and the PMD is RW=1, Dirty=1 now.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) {
+		pmd_t old_pmd, new_pmd;
+
+		old_pmd = READ_ONCE(*pmdp);
+		do {
+			new_pmd = pmd_wrprotect(old_pmd);
+		} while (!try_cmpxchg(&pmdp->pmd, &old_pmd.pmd, new_pmd.pmd));
+
+		return;
+	}
+#endif
+
 	clear_bit(_PAGE_BIT_RW, (unsigned long *)pmdp);
 }
 
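The retry loop added to both helpers can be sketched with C11 atomics. This is a hedged user-space model, not kernel code. It uses the real x86 positions for Write and Dirty and a hypothetical Cow bit, and all sketch_* names, including the race hook, are hypothetical. atomic_compare_exchange_weak() plays the role of try_cmpxchg(): on failure, both reload the expected value with the current contents.

The hook simulates the CPU setting Dirty=1 between the initial read and the exchange. When that happens, the exchange fails and the loop recomputes the new value from the fresh contents. Dirty then ends up moved to Cow rather than being lost or left as Write=0,Dirty=1.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Real x86 positions for Write (RW) and Dirty; Cow is a hypothetical stand-in. */
#define SKETCH_PAGE_RW    (UINT64_C(1) << 1)
#define SKETCH_PAGE_DIRTY (UINT64_C(1) << 6)
#define SKETCH_PAGE_COW   (UINT64_C(1) << 58)

/* Model of the write-protect transform: clear Write, move Dirty to Cow. */
static uint64_t sketch_wrprotect_value(uint64_t pte)
{
	pte &= ~SKETCH_PAGE_RW;
	if (pte & SKETCH_PAGE_DIRTY)
		pte = (pte & ~SKETCH_PAGE_DIRTY) | SKETCH_PAGE_COW;
	return pte;
}

/* Hypothetical hook used only to simulate a concurrent hardware update. */
typedef void (*sketch_race_hook)(_Atomic uint64_t *ptep);

/* Simulates the CPU setting Dirty=1 after a write through a Write=1 PTE. */
static void sketch_hw_sets_dirty(_Atomic uint64_t *ptep)
{
	atomic_fetch_or(ptep, SKETCH_PAGE_DIRTY);
}

/*
 * Mirrors the READ_ONCE() + try_cmpxchg() loop: on failure,
 * atomic_compare_exchange_weak() reloads old_pte with the current value,
 * so the new value is recomputed from what the PTE holds now.
 */
static void sketch_set_wrprotect(_Atomic uint64_t *ptep, sketch_race_hook hook)
{
	uint64_t old_pte = atomic_load(ptep);
	uint64_t new_pte;

	do {
		new_pte = sketch_wrprotect_value(old_pte);
		if (hook != NULL) {
			hook(ptep);	/* fire once, between the read and the exchange */
			hook = NULL;
		}
	} while (!atomic_compare_exchange_weak(ptep, &old_pte, new_pte));

	/* The published value is never the Write=0,Dirty=1 encoding. */
	assert((new_pte & SKETCH_PAGE_RW) || !(new_pte & SKETCH_PAGE_DIRTY));
}
```

Suppose the first computed value (0, for a Write=1,Dirty=0 input) were written with a plain store after the simulated hardware update. That store would discard the Dirty bit set by the write. The compare-and-exchange detects the change and retries instead.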