From patchwork Tue Jun 13 00:10:27 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 106998 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp223525vqr; Mon, 12 Jun 2023 17:47:57 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ6skUSmUyxk1Gr6t+M416xMOHPGNr7GFeyJvnkIJPhtmWLS6RD6lgvVz+0LGkGJvaj2V/zE X-Received: by 2002:a17:906:9b8f:b0:96f:5cb3:df66 with SMTP id dd15-20020a1709069b8f00b0096f5cb3df66mr12815871ejc.18.1686617277514; Mon, 12 Jun 2023 17:47:57 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1686617277; cv=none; d=google.com; s=arc-20160816; b=CYE49+6CfCV9av19bJx8YU6HjAANNhOtZnQYRu8LqAWXXSFHM8SGjkmtvqkP3kr5ni 4D3WQZg6QlBwWEyjY42GGkgI/k0F0fXtoPdrou6fxKOvU6ZuUMqthEAVPOBpWYmykRUz t3i91lwaa9Ce9RJT54HznIt8dyAcbXgLokKyK+/a4By6zQwBUgAyMswmB0P5KB3RdPMF CtTYqPu235E2VpCswavNzHEmtHeKvKK3MXf9x7BR5T9DIhS3HFVrOe6eQJ8GhDPMEyn/ 0uDv5X3WCho5ltZweP0CgeQSRuAB/xuurgiMnsG0HC6LdBZ8a0YaETcBOt+JZOE3Nt8v TSRQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=r4qR69WOh9awfBvqereclECVp/zYAJs3umHYUyW05g8=; b=Vv9qLVOiV2GAGt+OJeI0Lnjp/IcM9EREBmQLPmvICXTpJz6Gik2BlhUnWSeGYcm8Sk jjQFEZMRHqtcQaMXthH0xLdnphnsGl3lJ4YbA2hmrjTmeUwt3EfVjP2n7Cgkerc1k2Qy eAq0swMuRWS0MFb8Enn/0qKlTzCnVO02P9XvNv/ehFOgJbnw3GcBt3OMOQmQSGWQwa7f +w6XUfY8bVMYI/dE27WIxNOZIfHKFKZxrZejsWU56AWMDgppSyZ1z+3QTPLoamK1C1Hw xGUXEdCUctPteB+P2KGyVwBjNTNJnNDffecIs1GQXruaY+6ijxuCYzAYcTvSmo9AIa2t JxFA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=LQp277AQ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id u14-20020a170906950e00b00977c6d3303asi5969339ejx.570.2023.06.12.17.47.34; Mon, 12 Jun 2023 17:47:57 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=LQp277AQ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231513AbjFMAMd (ORCPT + 99 others); Mon, 12 Jun 2023 20:12:33 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51896 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232682AbjFMAMW (ORCPT ); Mon, 12 Jun 2023 20:12:22 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1F580118; Mon, 12 Jun 2023 17:12:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615130; x=1718151130; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=+8Ah+gwuuWVdk3MaZgJFhmVmdzU+WHC34h/afb7gaAY=; b=LQp277AQF+AhGUDJYeavNrVreevL5C/aPRLewBH6YDBgycnSvl3F6Zic UdKOSVFsHGWZfRHgpMraTnzzOGiu8tssta+tkrce18NOoDtE+d+RTdpMx rUOQaxVuzGX58Y9Bcc+eqJV7qdXY1fxFZXWYCcy3hj4MxnlA+RE5eTn9o pTq8M+EE+civ9APJKvQlQIouV2x1ThvFxQ5fOkSkKtS8jmpDDP2fGj/7r h/qW7YxUCQiAtc0DfCJvs4ev2i/zqM4FEJ/jaoyzKPOpQjBb1YQyOX3L9 YLHHbCIhacZOluir8aC086+WlAgAp2Cyus0tp8W1QcnMuLQqbmryQAmY9 Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361556651" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361556651" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:07 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835670967" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835670967" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:06 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, linux-alpha@vger.kernel.org, linux-snps-arc@lists.infradead.org, linux-arm-kernel@lists.infradead.org, linux-csky@vger.kernel.org, linux-hexagon@vger.kernel.org, linux-ia64@vger.kernel.org, loongarch@lists.linux.dev, linux-m68k@lists.linux-m68k.org, Michal Simek , Dinh Nguyen , linux-mips@vger.kernel.org, openrisc@lists.librecores.org, linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-riscv@lists.infradead.org, linux-s390@vger.kernel.org, linux-sh@vger.kernel.org, sparclinux@vger.kernel.org, linux-um@lists.infradead.org, Linus Torvalds Subject: [PATCH v9 01/42] mm: Rename arch pte_mkwrite()'s to pte_mkwrite_novma() Date: Mon, 12 Jun 2023 17:10:27 -0700 Message-Id: <20230613001108.3040476-2-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1768546398543828619?= X-GMAIL-MSGID: =?utf-8?q?1768546398543828619?= The x86 Shadow stack feature includes a new type of memory called shadow stack. This shadow stack memory has some unusual properties, which requires some core mm changes to function properly. One of these unusual properties is that shadow stack memory is writable, but only in limited ways. These limits are applied via a specific PTE bit combination. Nevertheless, the memory is writable, and core mm code will need to apply the writable permissions in the typical paths that call pte_mkwrite(). Future patches will make pte_mkwrite() take a VMA, so that the x86 implementation of it can know whether to create regular writable memory or shadow stack memory. But there are a couple of challenges to this. Modifying the signatures of each arch pte_mkwrite() implementation would be error prone because some are generated with macros and would need to be re-implemented. Also, some pte_mkwrite() callers operate on kernel memory without a VMA. So this can be done in a three step process. First pte_mkwrite() can be renamed to pte_mkwrite_novma() in each arch, with a generic pte_mkwrite() added that just calls pte_mkwrite_novma(). Next callers without a VMA can be moved to pte_mkwrite_novma(). And lastly, pte_mkwrite() and all callers can be changed to take/pass a VMA. Start the process by renaming pte_mkwrite() to pte_mkwrite_novma() and adding the pte_mkwrite() wrapper in linux/pgtable.h. Apply the same pattern for pmd_mkwrite(). Since not all archs have a pmd_mkwrite_novma(), create a new arch config HAS_HUGE_PAGE that can be used to tell if pmd_mkwrite() should be defined. Otherwise in the !HAS_HUGE_PAGE cases the compiler would not be able to find pmd_mkwrite_novma(). No functional change. Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-alpha@vger.kernel.org Cc: linux-snps-arc@lists.infradead.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-csky@vger.kernel.org Cc: linux-hexagon@vger.kernel.org Cc: linux-ia64@vger.kernel.org Cc: loongarch@lists.linux.dev Cc: linux-m68k@lists.linux-m68k.org Cc: Michal Simek Cc: Dinh Nguyen Cc: linux-mips@vger.kernel.org Cc: openrisc@lists.librecores.org Cc: linux-parisc@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-riscv@lists.infradead.org Cc: linux-s390@vger.kernel.org Cc: linux-sh@vger.kernel.org Cc: sparclinux@vger.kernel.org Cc: linux-um@lists.infradead.org Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Suggested-by: Linus Torvalds Signed-off-by: Rick Edgecombe Link: https://lore.kernel.org/lkml/CAHk-=wiZjSu7c9sFYZb3q04108stgHff2wfbokGCCgW7riz+8Q@mail.gmail.com/ Acked-by: Geert Uytterhoeven Reviewed-by: Mike Rapoport (IBM) Acked-by: David Hildenbrand Acked-by: Helge Deller # parisc --- Hi Non-x86 Arch’s, x86 has a feature that allows for the creation of a special type of writable memory (shadow stack) that is only writable in limited specific ways. Previously, changes were proposed to core MM code to teach it to decide when to create normally writable memory or the special shadow stack writable memory, but David Hildenbrand suggested[0] to change pXX_mkwrite() to take a VMA, so awareness of shadow stack memory can be moved into x86 code. Later Linus suggested a less error-prone way[1] to go about this after the first attempt had a bug. Since pXX_mkwrite() is defined in every arch, it requires some tree-wide changes. So that is why you are seeing some patches out of a big x86 series pop up in your arch mailing list. There is no functional change. After this refactor, the shadow stack series goes on to use the arch helpers to push arch memory details inside arch/x86 and other arch's with upcoming shadow stack features. Testing was just 0-day build testing. Hopefully that is enough context. Thanks! [0] https://lore.kernel.org/lkml/0e29a2d0-08d8-bcd6-ff26-4bea0e4037b0@redhat.com/ [1] https://lore.kernel.org/lkml/CAHk-=wiZjSu7c9sFYZb3q04108stgHff2wfbokGCCgW7riz+8Q@mail.gmail.com/ --- Documentation/mm/arch_pgtable_helpers.rst | 6 ++++++ arch/Kconfig | 3 +++ arch/alpha/include/asm/pgtable.h | 2 +- arch/arc/include/asm/hugepage.h | 2 +- arch/arc/include/asm/pgtable-bits-arcv2.h | 2 +- arch/arm/include/asm/pgtable-3level.h | 2 +- arch/arm/include/asm/pgtable.h | 2 +- arch/arm64/include/asm/pgtable.h | 4 ++-- arch/csky/include/asm/pgtable.h | 2 +- arch/hexagon/include/asm/pgtable.h | 2 +- arch/ia64/include/asm/pgtable.h | 2 +- arch/loongarch/include/asm/pgtable.h | 4 ++-- arch/m68k/include/asm/mcf_pgtable.h | 2 +- arch/m68k/include/asm/motorola_pgtable.h | 2 +- arch/m68k/include/asm/sun3_pgtable.h | 2 +- arch/microblaze/include/asm/pgtable.h | 2 +- arch/mips/include/asm/pgtable.h | 6 +++--- arch/nios2/include/asm/pgtable.h | 2 +- arch/openrisc/include/asm/pgtable.h | 2 +- arch/parisc/include/asm/pgtable.h | 2 +- arch/powerpc/include/asm/book3s/32/pgtable.h | 2 +- arch/powerpc/include/asm/book3s/64/pgtable.h | 4 ++-- arch/powerpc/include/asm/nohash/32/pgtable.h | 4 ++-- arch/powerpc/include/asm/nohash/32/pte-8xx.h | 4 ++-- arch/powerpc/include/asm/nohash/64/pgtable.h | 2 +- arch/riscv/include/asm/pgtable.h | 6 +++--- arch/s390/include/asm/hugetlb.h | 2 +- arch/s390/include/asm/pgtable.h | 4 ++-- arch/sh/include/asm/pgtable_32.h | 4 ++-- arch/sparc/include/asm/pgtable_32.h | 2 +- arch/sparc/include/asm/pgtable_64.h | 6 +++--- arch/um/include/asm/pgtable.h | 2 +- arch/x86/include/asm/pgtable.h | 4 ++-- arch/xtensa/include/asm/pgtable.h | 2 +- include/asm-generic/hugetlb.h | 2 +- include/linux/pgtable.h | 14 ++++++++++++++ 36 files changed, 70 insertions(+), 47 deletions(-) diff --git a/Documentation/mm/arch_pgtable_helpers.rst b/Documentation/mm/arch_pgtable_helpers.rst index af3891f895b0..69ce1f2aa4d1 100644 --- a/Documentation/mm/arch_pgtable_helpers.rst +++ b/Documentation/mm/arch_pgtable_helpers.rst @@ -48,6 +48,9 @@ PTE Page Table Helpers +---------------------------+--------------------------------------------------+ | pte_mkwrite | Creates a writable PTE | +---------------------------+--------------------------------------------------+ +| pte_mkwrite_novma | Creates a writable PTE, of the conventional type | +| | of writable. | ++---------------------------+--------------------------------------------------+ | pte_wrprotect | Creates a write protected PTE | +---------------------------+--------------------------------------------------+ | pte_mkspecial | Creates a special PTE | @@ -120,6 +123,9 @@ PMD Page Table Helpers +---------------------------+--------------------------------------------------+ | pmd_mkwrite | Creates a writable PMD | +---------------------------+--------------------------------------------------+ +| pmd_mkwrite_novma | Creates a writable PMD, of the conventional type | +| | of writable. | ++---------------------------+--------------------------------------------------+ | pmd_wrprotect | Creates a write protected PMD | +---------------------------+--------------------------------------------------+ | pmd_mkspecial | Creates a special PMD | diff --git a/arch/Kconfig b/arch/Kconfig index 205fd23e0cad..3bc11c9a2ac1 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -919,6 +919,9 @@ config HAVE_ARCH_HUGE_VMALLOC config ARCH_WANT_HUGE_PMD_SHARE bool +config HAS_HUGE_PAGE + def_bool HAVE_ARCH_HUGE_VMAP || TRANSPARENT_HUGEPAGE || HUGETLBFS + config HAVE_ARCH_SOFT_DIRTY bool diff --git a/arch/alpha/include/asm/pgtable.h b/arch/alpha/include/asm/pgtable.h index ba43cb841d19..af1a13ab3320 100644 --- a/arch/alpha/include/asm/pgtable.h +++ b/arch/alpha/include/asm/pgtable.h @@ -256,7 +256,7 @@ extern inline int pte_young(pte_t pte) { return pte_val(pte) & _PAGE_ACCESSED; extern inline pte_t pte_wrprotect(pte_t pte) { pte_val(pte) |= _PAGE_FOW; return pte; } extern inline pte_t pte_mkclean(pte_t pte) { pte_val(pte) &= ~(__DIRTY_BITS); return pte; } extern inline pte_t pte_mkold(pte_t pte) { pte_val(pte) &= ~(__ACCESS_BITS); return pte; } -extern inline pte_t pte_mkwrite(pte_t pte) { pte_val(pte) &= ~_PAGE_FOW; return pte; } +extern inline pte_t pte_mkwrite_novma(pte_t pte){ pte_val(pte) &= ~_PAGE_FOW; return pte; } extern inline pte_t pte_mkdirty(pte_t pte) { pte_val(pte) |= __DIRTY_BITS; return pte; } extern inline pte_t pte_mkyoung(pte_t pte) { pte_val(pte) |= __ACCESS_BITS; return pte; } diff --git a/arch/arc/include/asm/hugepage.h b/arch/arc/include/asm/hugepage.h index 5001b796fb8d..ef8d4166370c 100644 --- a/arch/arc/include/asm/hugepage.h +++ b/arch/arc/include/asm/hugepage.h @@ -21,7 +21,7 @@ static inline pmd_t pte_pmd(pte_t pte) } #define pmd_wrprotect(pmd) pte_pmd(pte_wrprotect(pmd_pte(pmd))) -#define pmd_mkwrite(pmd) pte_pmd(pte_mkwrite(pmd_pte(pmd))) +#define pmd_mkwrite_novma(pmd) pte_pmd(pte_mkwrite_novma(pmd_pte(pmd))) #define pmd_mkdirty(pmd) pte_pmd(pte_mkdirty(pmd_pte(pmd))) #define pmd_mkold(pmd) pte_pmd(pte_mkold(pmd_pte(pmd))) #define pmd_mkyoung(pmd) pte_pmd(pte_mkyoung(pmd_pte(pmd))) diff --git a/arch/arc/include/asm/pgtable-bits-arcv2.h b/arch/arc/include/asm/pgtable-bits-arcv2.h index 6e9f8ca6d6a1..5c073d9f41c2 100644 --- a/arch/arc/include/asm/pgtable-bits-arcv2.h +++ b/arch/arc/include/asm/pgtable-bits-arcv2.h @@ -87,7 +87,7 @@ PTE_BIT_FUNC(mknotpresent, &= ~(_PAGE_PRESENT)); PTE_BIT_FUNC(wrprotect, &= ~(_PAGE_WRITE)); -PTE_BIT_FUNC(mkwrite, |= (_PAGE_WRITE)); +PTE_BIT_FUNC(mkwrite_novma, |= (_PAGE_WRITE)); PTE_BIT_FUNC(mkclean, &= ~(_PAGE_DIRTY)); PTE_BIT_FUNC(mkdirty, |= (_PAGE_DIRTY)); PTE_BIT_FUNC(mkold, &= ~(_PAGE_ACCESSED)); diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h index 106049791500..71c3add6417f 100644 --- a/arch/arm/include/asm/pgtable-3level.h +++ b/arch/arm/include/asm/pgtable-3level.h @@ -202,7 +202,7 @@ static inline pmd_t pmd_##fn(pmd_t pmd) { pmd_val(pmd) op; return pmd; } PMD_BIT_FUNC(wrprotect, |= L_PMD_SECT_RDONLY); PMD_BIT_FUNC(mkold, &= ~PMD_SECT_AF); -PMD_BIT_FUNC(mkwrite, &= ~L_PMD_SECT_RDONLY); +PMD_BIT_FUNC(mkwrite_novma, &= ~L_PMD_SECT_RDONLY); PMD_BIT_FUNC(mkdirty, |= L_PMD_SECT_DIRTY); PMD_BIT_FUNC(mkclean, &= ~L_PMD_SECT_DIRTY); PMD_BIT_FUNC(mkyoung, |= PMD_SECT_AF); diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h index a58ccbb406ad..f37ba2472eae 100644 --- a/arch/arm/include/asm/pgtable.h +++ b/arch/arm/include/asm/pgtable.h @@ -227,7 +227,7 @@ static inline pte_t pte_wrprotect(pte_t pte) return set_pte_bit(pte, __pgprot(L_PTE_RDONLY)); } -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite_novma(pte_t pte) { return clear_pte_bit(pte, __pgprot(L_PTE_RDONLY)); } diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index 0bd18de9fd97..7a3d62cb9bee 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -180,7 +180,7 @@ static inline pmd_t set_pmd_bit(pmd_t pmd, pgprot_t prot) return pmd; } -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite_novma(pte_t pte) { pte = set_pte_bit(pte, __pgprot(PTE_WRITE)); pte = clear_pte_bit(pte, __pgprot(PTE_RDONLY)); @@ -487,7 +487,7 @@ static inline int pmd_trans_huge(pmd_t pmd) #define pmd_cont(pmd) pte_cont(pmd_pte(pmd)) #define pmd_wrprotect(pmd) pte_pmd(pte_wrprotect(pmd_pte(pmd))) #define pmd_mkold(pmd) pte_pmd(pte_mkold(pmd_pte(pmd))) -#define pmd_mkwrite(pmd) pte_pmd(pte_mkwrite(pmd_pte(pmd))) +#define pmd_mkwrite_novma(pmd) pte_pmd(pte_mkwrite_novma(pmd_pte(pmd))) #define pmd_mkclean(pmd) pte_pmd(pte_mkclean(pmd_pte(pmd))) #define pmd_mkdirty(pmd) pte_pmd(pte_mkdirty(pmd_pte(pmd))) #define pmd_mkyoung(pmd) pte_pmd(pte_mkyoung(pmd_pte(pmd))) diff --git a/arch/csky/include/asm/pgtable.h b/arch/csky/include/asm/pgtable.h index d4042495febc..aa0cce4fc02f 100644 --- a/arch/csky/include/asm/pgtable.h +++ b/arch/csky/include/asm/pgtable.h @@ -176,7 +176,7 @@ static inline pte_t pte_mkold(pte_t pte) return pte; } -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite_novma(pte_t pte) { pte_val(pte) |= _PAGE_WRITE; if (pte_val(pte) & _PAGE_MODIFIED) diff --git a/arch/hexagon/include/asm/pgtable.h b/arch/hexagon/include/asm/pgtable.h index 59393613d086..fc2d2d83368d 100644 --- a/arch/hexagon/include/asm/pgtable.h +++ b/arch/hexagon/include/asm/pgtable.h @@ -300,7 +300,7 @@ static inline pte_t pte_wrprotect(pte_t pte) } /* pte_mkwrite - mark page as writable */ -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite_novma(pte_t pte) { pte_val(pte) |= _PAGE_WRITE; return pte; diff --git a/arch/ia64/include/asm/pgtable.h b/arch/ia64/include/asm/pgtable.h index 21c97e31a28a..f80aba7cad99 100644 --- a/arch/ia64/include/asm/pgtable.h +++ b/arch/ia64/include/asm/pgtable.h @@ -268,7 +268,7 @@ ia64_phys_addr_valid (unsigned long addr) * access rights: */ #define pte_wrprotect(pte) (__pte(pte_val(pte) & ~_PAGE_AR_RW)) -#define pte_mkwrite(pte) (__pte(pte_val(pte) | _PAGE_AR_RW)) +#define pte_mkwrite_novma(pte) (__pte(pte_val(pte) | _PAGE_AR_RW)) #define pte_mkold(pte) (__pte(pte_val(pte) & ~_PAGE_A)) #define pte_mkyoung(pte) (__pte(pte_val(pte) | _PAGE_A)) #define pte_mkclean(pte) (__pte(pte_val(pte) & ~_PAGE_D)) diff --git a/arch/loongarch/include/asm/pgtable.h b/arch/loongarch/include/asm/pgtable.h index d28fb9dbec59..8245cf367b31 100644 --- a/arch/loongarch/include/asm/pgtable.h +++ b/arch/loongarch/include/asm/pgtable.h @@ -390,7 +390,7 @@ static inline pte_t pte_mkdirty(pte_t pte) return pte; } -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite_novma(pte_t pte) { pte_val(pte) |= _PAGE_WRITE; if (pte_val(pte) & _PAGE_MODIFIED) @@ -490,7 +490,7 @@ static inline int pmd_write(pmd_t pmd) return !!(pmd_val(pmd) & _PAGE_WRITE); } -static inline pmd_t pmd_mkwrite(pmd_t pmd) +static inline pmd_t pmd_mkwrite_novma(pmd_t pmd) { pmd_val(pmd) |= _PAGE_WRITE; if (pmd_val(pmd) & _PAGE_MODIFIED) diff --git a/arch/m68k/include/asm/mcf_pgtable.h b/arch/m68k/include/asm/mcf_pgtable.h index d97fbb812f63..42ebea0488e3 100644 --- a/arch/m68k/include/asm/mcf_pgtable.h +++ b/arch/m68k/include/asm/mcf_pgtable.h @@ -211,7 +211,7 @@ static inline pte_t pte_mkold(pte_t pte) return pte; } -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite_novma(pte_t pte) { pte_val(pte) |= CF_PAGE_WRITABLE; return pte; diff --git a/arch/m68k/include/asm/motorola_pgtable.h b/arch/m68k/include/asm/motorola_pgtable.h index ec0dc19ab834..ba28ca4d219a 100644 --- a/arch/m68k/include/asm/motorola_pgtable.h +++ b/arch/m68k/include/asm/motorola_pgtable.h @@ -155,7 +155,7 @@ static inline int pte_young(pte_t pte) { return pte_val(pte) & _PAGE_ACCESSED; static inline pte_t pte_wrprotect(pte_t pte) { pte_val(pte) |= _PAGE_RONLY; return pte; } static inline pte_t pte_mkclean(pte_t pte) { pte_val(pte) &= ~_PAGE_DIRTY; return pte; } static inline pte_t pte_mkold(pte_t pte) { pte_val(pte) &= ~_PAGE_ACCESSED; return pte; } -static inline pte_t pte_mkwrite(pte_t pte) { pte_val(pte) &= ~_PAGE_RONLY; return pte; } +static inline pte_t pte_mkwrite_novma(pte_t pte){ pte_val(pte) &= ~_PAGE_RONLY; return pte; } static inline pte_t pte_mkdirty(pte_t pte) { pte_val(pte) |= _PAGE_DIRTY; return pte; } static inline pte_t pte_mkyoung(pte_t pte) { pte_val(pte) |= _PAGE_ACCESSED; return pte; } static inline pte_t pte_mknocache(pte_t pte) diff --git a/arch/m68k/include/asm/sun3_pgtable.h b/arch/m68k/include/asm/sun3_pgtable.h index e582b0484a55..4114eaff7404 100644 --- a/arch/m68k/include/asm/sun3_pgtable.h +++ b/arch/m68k/include/asm/sun3_pgtable.h @@ -143,7 +143,7 @@ static inline int pte_young(pte_t pte) { return pte_val(pte) & SUN3_PAGE_ACCESS static inline pte_t pte_wrprotect(pte_t pte) { pte_val(pte) &= ~SUN3_PAGE_WRITEABLE; return pte; } static inline pte_t pte_mkclean(pte_t pte) { pte_val(pte) &= ~SUN3_PAGE_MODIFIED; return pte; } static inline pte_t pte_mkold(pte_t pte) { pte_val(pte) &= ~SUN3_PAGE_ACCESSED; return pte; } -static inline pte_t pte_mkwrite(pte_t pte) { pte_val(pte) |= SUN3_PAGE_WRITEABLE; return pte; } +static inline pte_t pte_mkwrite_novma(pte_t pte){ pte_val(pte) |= SUN3_PAGE_WRITEABLE; return pte; } static inline pte_t pte_mkdirty(pte_t pte) { pte_val(pte) |= SUN3_PAGE_MODIFIED; return pte; } static inline pte_t pte_mkyoung(pte_t pte) { pte_val(pte) |= SUN3_PAGE_ACCESSED; return pte; } static inline pte_t pte_mknocache(pte_t pte) { pte_val(pte) |= SUN3_PAGE_NOCACHE; return pte; } diff --git a/arch/microblaze/include/asm/pgtable.h b/arch/microblaze/include/asm/pgtable.h index d1b8272abcd9..9108b33a7886 100644 --- a/arch/microblaze/include/asm/pgtable.h +++ b/arch/microblaze/include/asm/pgtable.h @@ -266,7 +266,7 @@ static inline pte_t pte_mkread(pte_t pte) \ { pte_val(pte) |= _PAGE_USER; return pte; } static inline pte_t pte_mkexec(pte_t pte) \ { pte_val(pte) |= _PAGE_USER | _PAGE_EXEC; return pte; } -static inline pte_t pte_mkwrite(pte_t pte) \ +static inline pte_t pte_mkwrite_novma(pte_t pte) \ { pte_val(pte) |= _PAGE_RW; return pte; } static inline pte_t pte_mkdirty(pte_t pte) \ { pte_val(pte) |= _PAGE_DIRTY; return pte; } diff --git a/arch/mips/include/asm/pgtable.h b/arch/mips/include/asm/pgtable.h index 574fa14ac8b2..40a54fd6e48d 100644 --- a/arch/mips/include/asm/pgtable.h +++ b/arch/mips/include/asm/pgtable.h @@ -309,7 +309,7 @@ static inline pte_t pte_mkold(pte_t pte) return pte; } -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite_novma(pte_t pte) { pte.pte_low |= _PAGE_WRITE; if (pte.pte_low & _PAGE_MODIFIED) { @@ -364,7 +364,7 @@ static inline pte_t pte_mkold(pte_t pte) return pte; } -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite_novma(pte_t pte) { pte_val(pte) |= _PAGE_WRITE; if (pte_val(pte) & _PAGE_MODIFIED) @@ -627,7 +627,7 @@ static inline pmd_t pmd_wrprotect(pmd_t pmd) return pmd; } -static inline pmd_t pmd_mkwrite(pmd_t pmd) +static inline pmd_t pmd_mkwrite_novma(pmd_t pmd) { pmd_val(pmd) |= _PAGE_WRITE; if (pmd_val(pmd) & _PAGE_MODIFIED) diff --git a/arch/nios2/include/asm/pgtable.h b/arch/nios2/include/asm/pgtable.h index 0f5c2564e9f5..cf1ffbc1a121 100644 --- a/arch/nios2/include/asm/pgtable.h +++ b/arch/nios2/include/asm/pgtable.h @@ -129,7 +129,7 @@ static inline pte_t pte_mkold(pte_t pte) return pte; } -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite_novma(pte_t pte) { pte_val(pte) |= _PAGE_WRITE; return pte; diff --git a/arch/openrisc/include/asm/pgtable.h b/arch/openrisc/include/asm/pgtable.h index 3eb9b9555d0d..828820c74fc5 100644 --- a/arch/openrisc/include/asm/pgtable.h +++ b/arch/openrisc/include/asm/pgtable.h @@ -250,7 +250,7 @@ static inline pte_t pte_mkold(pte_t pte) return pte; } -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite_novma(pte_t pte) { pte_val(pte) |= _PAGE_WRITE; return pte; diff --git a/arch/parisc/include/asm/pgtable.h b/arch/parisc/include/asm/pgtable.h index e715df5385d6..79d1cef2fd7c 100644 --- a/arch/parisc/include/asm/pgtable.h +++ b/arch/parisc/include/asm/pgtable.h @@ -331,7 +331,7 @@ static inline pte_t pte_mkold(pte_t pte) { pte_val(pte) &= ~_PAGE_ACCESSED; retu static inline pte_t pte_wrprotect(pte_t pte) { pte_val(pte) &= ~_PAGE_WRITE; return pte; } static inline pte_t pte_mkdirty(pte_t pte) { pte_val(pte) |= _PAGE_DIRTY; return pte; } static inline pte_t pte_mkyoung(pte_t pte) { pte_val(pte) |= _PAGE_ACCESSED; return pte; } -static inline pte_t pte_mkwrite(pte_t pte) { pte_val(pte) |= _PAGE_WRITE; return pte; } +static inline pte_t pte_mkwrite_novma(pte_t pte) { pte_val(pte) |= _PAGE_WRITE; return pte; } static inline pte_t pte_mkspecial(pte_t pte) { pte_val(pte) |= _PAGE_SPECIAL; return pte; } /* diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h b/arch/powerpc/include/asm/book3s/32/pgtable.h index 7bf1fe7297c6..67dfb674a4c1 100644 --- a/arch/powerpc/include/asm/book3s/32/pgtable.h +++ b/arch/powerpc/include/asm/book3s/32/pgtable.h @@ -498,7 +498,7 @@ static inline pte_t pte_mkpte(pte_t pte) return pte; } -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite_novma(pte_t pte) { return __pte(pte_val(pte) | _PAGE_RW); } diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h index 4acc9690f599..0328d917494a 100644 --- a/arch/powerpc/include/asm/book3s/64/pgtable.h +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h @@ -600,7 +600,7 @@ static inline pte_t pte_mkexec(pte_t pte) return __pte_raw(pte_raw(pte) | cpu_to_be64(_PAGE_EXEC)); } -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite_novma(pte_t pte) { /* * write implies read, hence set both @@ -1071,7 +1071,7 @@ static inline pte_t *pmdp_ptep(pmd_t *pmd) #define pmd_mkdirty(pmd) pte_pmd(pte_mkdirty(pmd_pte(pmd))) #define pmd_mkclean(pmd) pte_pmd(pte_mkclean(pmd_pte(pmd))) #define pmd_mkyoung(pmd) pte_pmd(pte_mkyoung(pmd_pte(pmd))) -#define pmd_mkwrite(pmd) pte_pmd(pte_mkwrite(pmd_pte(pmd))) +#define pmd_mkwrite_novma(pmd) pte_pmd(pte_mkwrite_novma(pmd_pte(pmd))) #ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY #define pmd_soft_dirty(pmd) pte_soft_dirty(pmd_pte(pmd)) diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h b/arch/powerpc/include/asm/nohash/32/pgtable.h index fec56d965f00..33213b31fcbb 100644 --- a/arch/powerpc/include/asm/nohash/32/pgtable.h +++ b/arch/powerpc/include/asm/nohash/32/pgtable.h @@ -170,8 +170,8 @@ void unmap_kernel_page(unsigned long va); #define pte_clear(mm, addr, ptep) \ do { pte_update(mm, addr, ptep, ~0, 0, 0); } while (0) -#ifndef pte_mkwrite -static inline pte_t pte_mkwrite(pte_t pte) +#ifndef pte_mkwrite_novma +static inline pte_t pte_mkwrite_novma(pte_t pte) { return __pte(pte_val(pte) | _PAGE_RW); } diff --git a/arch/powerpc/include/asm/nohash/32/pte-8xx.h b/arch/powerpc/include/asm/nohash/32/pte-8xx.h index 1a89ebdc3acc..21f681ee535a 100644 --- a/arch/powerpc/include/asm/nohash/32/pte-8xx.h +++ b/arch/powerpc/include/asm/nohash/32/pte-8xx.h @@ -101,12 +101,12 @@ static inline int pte_write(pte_t pte) #define pte_write pte_write -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite_novma(pte_t pte) { return __pte(pte_val(pte) & ~_PAGE_RO); } -#define pte_mkwrite pte_mkwrite +#define pte_mkwrite_novma pte_mkwrite_novma static inline bool pte_user(pte_t pte) { diff --git a/arch/powerpc/include/asm/nohash/64/pgtable.h b/arch/powerpc/include/asm/nohash/64/pgtable.h index 287e25864ffa..abe4fd82721e 100644 --- a/arch/powerpc/include/asm/nohash/64/pgtable.h +++ b/arch/powerpc/include/asm/nohash/64/pgtable.h @@ -85,7 +85,7 @@ #ifndef __ASSEMBLY__ /* pte_clear moved to later in this file */ -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite_novma(pte_t pte) { return __pte(pte_val(pte) | _PAGE_RW); } diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h index 2258b27173b0..b38faec98154 100644 --- a/arch/riscv/include/asm/pgtable.h +++ b/arch/riscv/include/asm/pgtable.h @@ -379,7 +379,7 @@ static inline pte_t pte_wrprotect(pte_t pte) /* static inline pte_t pte_mkread(pte_t pte) */ -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite_novma(pte_t pte) { return __pte(pte_val(pte) | _PAGE_WRITE); } @@ -665,9 +665,9 @@ static inline pmd_t pmd_mkyoung(pmd_t pmd) return pte_pmd(pte_mkyoung(pmd_pte(pmd))); } -static inline pmd_t pmd_mkwrite(pmd_t pmd) +static inline pmd_t pmd_mkwrite_novma(pmd_t pmd) { - return pte_pmd(pte_mkwrite(pmd_pte(pmd))); + return pte_pmd(pte_mkwrite_novma(pmd_pte(pmd))); } static inline pmd_t pmd_wrprotect(pmd_t pmd) diff --git a/arch/s390/include/asm/hugetlb.h b/arch/s390/include/asm/hugetlb.h index ccdbccfde148..f07267875a19 100644 --- a/arch/s390/include/asm/hugetlb.h +++ b/arch/s390/include/asm/hugetlb.h @@ -104,7 +104,7 @@ static inline int huge_pte_dirty(pte_t pte) static inline pte_t huge_pte_mkwrite(pte_t pte) { - return pte_mkwrite(pte); + return pte_mkwrite_novma(pte); } static inline pte_t huge_pte_mkdirty(pte_t pte) diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h index 6822a11c2c8a..699406036f30 100644 --- a/arch/s390/include/asm/pgtable.h +++ b/arch/s390/include/asm/pgtable.h @@ -1005,7 +1005,7 @@ static inline pte_t pte_wrprotect(pte_t pte) return set_pte_bit(pte, __pgprot(_PAGE_PROTECT)); } -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite_novma(pte_t pte) { pte = set_pte_bit(pte, __pgprot(_PAGE_WRITE)); if (pte_val(pte) & _PAGE_DIRTY) @@ -1488,7 +1488,7 @@ static inline pmd_t pmd_wrprotect(pmd_t pmd) return set_pmd_bit(pmd, __pgprot(_SEGMENT_ENTRY_PROTECT)); } -static inline pmd_t pmd_mkwrite(pmd_t pmd) +static inline pmd_t pmd_mkwrite_novma(pmd_t pmd) { pmd = set_pmd_bit(pmd, __pgprot(_SEGMENT_ENTRY_WRITE)); if (pmd_val(pmd) & _SEGMENT_ENTRY_DIRTY) diff --git a/arch/sh/include/asm/pgtable_32.h b/arch/sh/include/asm/pgtable_32.h index 21952b094650..165b4fd08152 100644 --- a/arch/sh/include/asm/pgtable_32.h +++ b/arch/sh/include/asm/pgtable_32.h @@ -359,11 +359,11 @@ static inline pte_t pte_##fn(pte_t pte) { pte.pte_##h op; return pte; } * kernel permissions), we attempt to couple them a bit more sanely here. */ PTE_BIT_FUNC(high, wrprotect, &= ~(_PAGE_EXT_USER_WRITE | _PAGE_EXT_KERN_WRITE)); -PTE_BIT_FUNC(high, mkwrite, |= _PAGE_EXT_USER_WRITE | _PAGE_EXT_KERN_WRITE); +PTE_BIT_FUNC(high, mkwrite_novma, |= _PAGE_EXT_USER_WRITE | _PAGE_EXT_KERN_WRITE); PTE_BIT_FUNC(high, mkhuge, |= _PAGE_SZHUGE); #else PTE_BIT_FUNC(low, wrprotect, &= ~_PAGE_RW); -PTE_BIT_FUNC(low, mkwrite, |= _PAGE_RW); +PTE_BIT_FUNC(low, mkwrite_novma, |= _PAGE_RW); PTE_BIT_FUNC(low, mkhuge, |= _PAGE_SZHUGE); #endif diff --git a/arch/sparc/include/asm/pgtable_32.h b/arch/sparc/include/asm/pgtable_32.h index d4330e3c57a6..a2d909446539 100644 --- a/arch/sparc/include/asm/pgtable_32.h +++ b/arch/sparc/include/asm/pgtable_32.h @@ -241,7 +241,7 @@ static inline pte_t pte_mkold(pte_t pte) return __pte(pte_val(pte) & ~SRMMU_REF); } -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite_novma(pte_t pte) { return __pte(pte_val(pte) | SRMMU_WRITE); } diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h index 5563efa1a19f..4dd4f6cdc670 100644 --- a/arch/sparc/include/asm/pgtable_64.h +++ b/arch/sparc/include/asm/pgtable_64.h @@ -517,7 +517,7 @@ static inline pte_t pte_mkclean(pte_t pte) return __pte(val); } -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite_novma(pte_t pte) { unsigned long val = pte_val(pte), mask; @@ -772,11 +772,11 @@ static inline pmd_t pmd_mkyoung(pmd_t pmd) return __pmd(pte_val(pte)); } -static inline pmd_t pmd_mkwrite(pmd_t pmd) +static inline pmd_t pmd_mkwrite_novma(pmd_t pmd) { pte_t pte = __pte(pmd_val(pmd)); - pte = pte_mkwrite(pte); + pte = pte_mkwrite_novma(pte); return __pmd(pte_val(pte)); } diff --git a/arch/um/include/asm/pgtable.h b/arch/um/include/asm/pgtable.h index a70d1618eb35..46f59a8bc812 100644 --- a/arch/um/include/asm/pgtable.h +++ b/arch/um/include/asm/pgtable.h @@ -207,7 +207,7 @@ static inline pte_t pte_mkyoung(pte_t pte) return(pte); } -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite_novma(pte_t pte) { if (unlikely(pte_get_bits(pte, _PAGE_RW))) return pte; diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 15ae4d6ba476..112e6060eafa 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -352,7 +352,7 @@ static inline pte_t pte_mkyoung(pte_t pte) return pte_set_flags(pte, _PAGE_ACCESSED); } -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite_novma(pte_t pte) { return pte_set_flags(pte, _PAGE_RW); } @@ -453,7 +453,7 @@ static inline pmd_t pmd_mkyoung(pmd_t pmd) return pmd_set_flags(pmd, _PAGE_ACCESSED); } -static inline pmd_t pmd_mkwrite(pmd_t pmd) +static inline pmd_t pmd_mkwrite_novma(pmd_t pmd) { return pmd_set_flags(pmd, _PAGE_RW); } diff --git a/arch/xtensa/include/asm/pgtable.h b/arch/xtensa/include/asm/pgtable.h index fc7a14884c6c..27e3ae38a5de 100644 --- a/arch/xtensa/include/asm/pgtable.h +++ b/arch/xtensa/include/asm/pgtable.h @@ -262,7 +262,7 @@ static inline pte_t pte_mkdirty(pte_t pte) { pte_val(pte) |= _PAGE_DIRTY; return pte; } static inline pte_t pte_mkyoung(pte_t pte) { pte_val(pte) |= _PAGE_ACCESSED; return pte; } -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite_novma(pte_t pte) { pte_val(pte) |= _PAGE_WRITABLE; return pte; } #define pgprot_noncached(prot) \ diff --git a/include/asm-generic/hugetlb.h b/include/asm-generic/hugetlb.h index d7f6335d3999..4da02798a00b 100644 --- a/include/asm-generic/hugetlb.h +++ b/include/asm-generic/hugetlb.h @@ -22,7 +22,7 @@ static inline unsigned long huge_pte_dirty(pte_t pte) static inline pte_t huge_pte_mkwrite(pte_t pte) { - return pte_mkwrite(pte); + return pte_mkwrite_novma(pte); } #ifndef __HAVE_ARCH_HUGE_PTE_WRPROTECT diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index c5a51481bbb9..ae271a307584 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -507,6 +507,20 @@ extern pud_t pudp_huge_clear_flush(struct vm_area_struct *vma, pud_t *pudp); #endif +#ifndef pte_mkwrite +static inline pte_t pte_mkwrite(pte_t pte) +{ + return pte_mkwrite_novma(pte); +} +#endif + +#if defined(CONFIG_HAS_HUGE_PAGE) && !defined(pmd_mkwrite) +static inline pmd_t pmd_mkwrite(pmd_t pmd) +{ + return pmd_mkwrite_novma(pmd); +} +#endif + #ifndef __HAVE_ARCH_PTEP_SET_WRPROTECT struct mm_struct; static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long address, pte_t *ptep) From patchwork Tue Jun 13 00:10:28 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 106956 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp211077vqr; Mon, 12 Jun 2023 17:14:43 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ7ymVYgDTUkljCsXnGb1u5cqIdfAb1oa2pU24ckW8XJqpq1bbvQoU8QmMoD5OJZ/TJnc3ly X-Received: by 2002:a17:906:eec5:b0:969:bac4:8e22 with SMTP id wu5-20020a170906eec500b00969bac48e22mr11195458ejb.26.1686615283094; Mon, 12 Jun 2023 17:14:43 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1686615283; cv=none; d=google.com; s=arc-20160816; b=kwhB+1P9P+/AMXJm34884qF20hUCVvP6LSFGcRTiP9i0WU4C1AgTUm7dYKfdFhOBAp Tn59Yzk3Z7pLbOL46ZtI4jivf0cxSABkgJGj5mvgqXll3PnuG4t+42JA219TLUgPT3Np M1Br0MEKw3TDDxkZ2Kib0hreiVEy5qUURoO/Dz6H8BSP4NXC8SimRVT3/KvvSwiCW2Sv jrAmCRPN8yNKRlxyKdrYQBzqZ7Esl8VsRu/wBvg+wf1/FrYXNE//XttsJgIn3PC1xsBQ Y6Ihn81qmaBlhTyvQVcssY5cueVZ4asaQb+2bB24urWFmn4pW15kItPn81imdPhHylH3 OIoA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=bhEM/cU7s2w8AGabEsZ/KJXI1WiWuC9Y86R+7+JHEQk=; b=XZrGkfkjVva5CiB89PI+HvJGBePE/rZQZs2+hmnU5f7yjm3CwD8B5dtSBZZeE9Xs7B 3YlJxry1M1MLrHO43pwLpbccdXlwZ0rtEt6mLJPloWIEBVsWSU55mnoZausQK5Vjv0nE VkFGQ4s8glEZU1dNZijVvl0lw1ZUGSXbqmk1xc4gwCENep/KDFQyUjfpXdI9MmTKzbla NmXiBr2nUN4WECACWx+PtbNrDqceLrYCU5sSsuEOK/UTPGbpPgHzN1UAwg4SdvA6nCDb 5Gcr+Uf/0ZVkRyfBqHpVS3yBo/+YtfXzJscmdxkQm0I60gtMD20Kt04E9iedBtKW8UKK NEfQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=b2BP33R2; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id b20-20020aa7c914000000b00510aed0c7acsi6804154edt.77.2023.06.12.17.13.51; Mon, 12 Jun 2023 17:14:43 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=b2BP33R2; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232547AbjFMAMY (ORCPT + 99 others); Mon, 12 Jun 2023 20:12:24 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51748 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238261AbjFMAML (ORCPT ); Mon, 12 Jun 2023 20:12:11 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6863C13E; Mon, 12 Jun 2023 17:12:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615130; x=1718151130; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=3c77rqX6YUy6Cyz5r8ePlQOYiEkqAYIhDQVjx5oVoZw=; b=b2BP33R2x4W89UJISrB9QYeO5rGrEZ1SyCKsZ3eaSks8xsjHUHy5tIeK wHWWbo3Suujw4Y70dHzl2ylpHVmW+Qc+nd3Ep2R+BH1I1lCGhvuro7HFz 67Pd9IaDXVWFhK+L6r5aVc4ubsMU8sdt5CqjH92DXGWcGke7qHFabqClo Bs0EmXjyLkqmaYBNROdV1+6WLxsCfdK3s14i7fdf+YrA9d5aM4lQW8XlH qGB0ZlSh1D+tLAU91QtyWOZPpR+hcWPT2YkQp1++pRtL1bwW5XGrAwy6M Uf3IbkLAeJGoC7/v41gKJzoOaQeX/RnYQT5Pc5M3ck8NMIz/1akdfjINs Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361556696" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361556696" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:08 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835670972" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835670972" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:07 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, linux-arm-kernel@lists.infradead.org, linux-s390@vger.kernel.org, xen-devel@lists.xenproject.org Subject: [PATCH v9 02/42] mm: Move pte/pmd_mkwrite() callers with no VMA to _novma() Date: Mon, 12 Jun 2023 17:10:28 -0700 Message-Id: <20230613001108.3040476-3-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1768544307100370478?= X-GMAIL-MSGID: =?utf-8?q?1768544307100370478?= The x86 Shadow stack feature includes a new type of memory called shadow stack. This shadow stack memory has some unusual properties, which requires some core mm changes to function properly. One of these unusual properties is that shadow stack memory is writable, but only in limited ways. These limits are applied via a specific PTE bit combination. Nevertheless, the memory is writable, and core mm code will need to apply the writable permissions in the typical paths that call pte_mkwrite(). Future patches will make pte_mkwrite() take a VMA, so that the x86 implementation of it can know whether to create regular writable memory or shadow stack memory. But there are a couple of challenges to this. Modifying the signatures of each arch pte_mkwrite() implementation would be error prone because some are generated with macros and would need to be re-implemented. Also, some pte_mkwrite() callers operate on kernel memory without a VMA. So this can be done in a three step process. First pte_mkwrite() can be renamed to pte_mkwrite_novma() in each arch, with a generic pte_mkwrite() added that just calls pte_mkwrite_novma(). Next callers without a VMA can be moved to pte_mkwrite_novma(). And lastly, pte_mkwrite() and all callers can be changed to take/pass a VMA. Previous patches have done the first step, so next move the callers that don't have a VMA to pte_mkwrite_novma(). Also do the same for pmd_mkwrite(). This will be ok for the shadow stack feature, as these callers are on kernel memory which will not need to be made shadow stack, and the other architectures only currently support one type of memory in pte_mkwrite() Cc: linux-doc@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-s390@vger.kernel.org Cc: xen-devel@lists.xenproject.org Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Signed-off-by: Rick Edgecombe Reviewed-by: Mike Rapoport (IBM) Acked-by: David Hildenbrand --- Hi Non-x86 Arch’s, x86 has a feature that allows for the creation of a special type of writable memory (shadow stack) that is only writable in limited specific ways. Previously, changes were proposed to core MM code to teach it to decide when to create normally writable memory or the special shadow stack writable memory, but David Hildenbrand suggested[0] to change pXX_mkwrite() to take a VMA, so awareness of shadow stack memory can be moved into x86 code. Later Linus suggested a less error-prone way[1] to go about this after the first attempt had a bug. Since pXX_mkwrite() is defined in every arch, it requires some tree-wide changes. So that is why you are seeing some patches out of a big x86 series pop up in your arch mailing list. There is no functional change. After this refactor, the shadow stack series goes on to use the arch helpers to push arch memory details inside arch/x86 and other arch's with upcoming shadow stack features. Testing was just 0-day build testing. Hopefully that is enough context. Thanks! [0] https://lore.kernel.org/lkml/0e29a2d0-08d8-bcd6-ff26-4bea0e4037b0@redhat.com/ [1] https://lore.kernel.org/lkml/CAHk-=wiZjSu7c9sFYZb3q04108stgHff2wfbokGCCgW7riz+8Q@mail.gmail.com/ --- arch/arm64/mm/trans_pgd.c | 4 ++-- arch/s390/mm/pageattr.c | 4 ++-- arch/x86/xen/mmu_pv.c | 2 +- 3 files changed, 5 insertions(+), 5 deletions(-) diff --git a/arch/arm64/mm/trans_pgd.c b/arch/arm64/mm/trans_pgd.c index 4ea2eefbc053..a01493f3a06f 100644 --- a/arch/arm64/mm/trans_pgd.c +++ b/arch/arm64/mm/trans_pgd.c @@ -40,7 +40,7 @@ static void _copy_pte(pte_t *dst_ptep, pte_t *src_ptep, unsigned long addr) * read only (code, rodata). Clear the RDONLY bit from * the temporary mappings we use during restore. */ - set_pte(dst_ptep, pte_mkwrite(pte)); + set_pte(dst_ptep, pte_mkwrite_novma(pte)); } else if (debug_pagealloc_enabled() && !pte_none(pte)) { /* * debug_pagealloc will removed the PTE_VALID bit if @@ -53,7 +53,7 @@ static void _copy_pte(pte_t *dst_ptep, pte_t *src_ptep, unsigned long addr) */ BUG_ON(!pfn_valid(pte_pfn(pte))); - set_pte(dst_ptep, pte_mkpresent(pte_mkwrite(pte))); + set_pte(dst_ptep, pte_mkpresent(pte_mkwrite_novma(pte))); } } diff --git a/arch/s390/mm/pageattr.c b/arch/s390/mm/pageattr.c index 5ba3bd8a7b12..6931d484d8a7 100644 --- a/arch/s390/mm/pageattr.c +++ b/arch/s390/mm/pageattr.c @@ -97,7 +97,7 @@ static int walk_pte_level(pmd_t *pmdp, unsigned long addr, unsigned long end, if (flags & SET_MEMORY_RO) new = pte_wrprotect(new); else if (flags & SET_MEMORY_RW) - new = pte_mkwrite(pte_mkdirty(new)); + new = pte_mkwrite_novma(pte_mkdirty(new)); if (flags & SET_MEMORY_NX) new = set_pte_bit(new, __pgprot(_PAGE_NOEXEC)); else if (flags & SET_MEMORY_X) @@ -155,7 +155,7 @@ static void modify_pmd_page(pmd_t *pmdp, unsigned long addr, if (flags & SET_MEMORY_RO) new = pmd_wrprotect(new); else if (flags & SET_MEMORY_RW) - new = pmd_mkwrite(pmd_mkdirty(new)); + new = pmd_mkwrite_novma(pmd_mkdirty(new)); if (flags & SET_MEMORY_NX) new = set_pmd_bit(new, __pgprot(_SEGMENT_ENTRY_NOEXEC)); else if (flags & SET_MEMORY_X) diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c index b3b8d289b9ab..63fced067057 100644 --- a/arch/x86/xen/mmu_pv.c +++ b/arch/x86/xen/mmu_pv.c @@ -150,7 +150,7 @@ void make_lowmem_page_readwrite(void *vaddr) if (pte == NULL) return; /* vaddr missing */ - ptev = pte_mkwrite(*pte); + ptev = pte_mkwrite_novma(*pte); if (HYPERVISOR_update_va_mapping(address, ptev, 0)) BUG(); From patchwork Tue Jun 13 00:10:29 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 106957 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp211094vqr; Mon, 12 Jun 2023 17:14:46 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ5z/HKeOE5IRPdi6gwJaXhwFbmJ6riKQdPPNu/MwpUoDEy8klbvuatHq0Mo88tMOm8/CJIE X-Received: by 2002:a17:907:3da7:b0:974:c32c:b484 with SMTP id he39-20020a1709073da700b00974c32cb484mr13055837ejc.72.1686615285906; Mon, 12 Jun 2023 17:14:45 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1686615285; cv=none; d=google.com; s=arc-20160816; b=MEHFJyqdwxMHWphtpE783HJt2xIrEmwEkUpUZhJMm9aFnUmd6XpGsrC5vxAKJVPFIt n7mKEDYyJHdbbxxmfWUOlOvpgOL+8xZ86ILYdZdsPAnkfP0TCTZ8N04P3x7hJ3EYC8WI tnsAPqxNdTs/Qa+HqEI7c9UzdgKUC9kv2LXRHgafVw1Q0/PRwAAiHJSpLh8jUm++ZoXh DrukERogFoEJeAeW1cXzyCwl/nMnw25IVJP2NUfc+BsmiafjB9MTehg5FskQJJTOrIX7 GSnYWznmaJFhocK+B0INcUMUkmIGQge5P9MkXdEqI7oF3iB8Km3Ev4xph89pL4/KYbhQ n9Gw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=VcuBReWpfxAIKx4wFfFaWaN8xL7BMpduuUswGcnNnAs=; b=gWpsWGFRXMRB2o+svwaCwfRdb8EMcteotjxPw8SQBwn3tV0U8MjXv2Ex8Gnkjb6Tct aQI2hRfYzRbUOIw0UuLfKpORxeSxk5imdIxsMoqoMwQ0Uxtm/AbBzJOCArvs1Tq5ymhz pvwAvt99M3JSYwz9mUCDyQLx8Lhi2t02RhOKh/NH7E9bqB9dTgO6by4nvCjitjooel18 pzxTK1MqQrlmxIzJ31aIw+f93Or4Zjx4JAEU3X7L9vBqFXkO6jOIOSewMrSHW+I9EKLV Nsduq5LkP38dHLInN0ucWZpyXgXZzbJFsg4ak4QUJfNL1Alz2t9TO/9aAzBtJNP8lkF8 gqGw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=WEe03L+d; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id m10-20020aa7d34a000000b0050ddbac1694si6128880edr.309.2023.06.12.17.14.01; Mon, 12 Jun 2023 17:14:45 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=WEe03L+d; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238763AbjFMAM1 (ORCPT + 99 others); Mon, 12 Jun 2023 20:12:27 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51758 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229454AbjFMAMN (ORCPT ); Mon, 12 Jun 2023 20:12:13 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D90FE197; Mon, 12 Jun 2023 17:12:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615132; x=1718151132; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=p3wIDF2hCv3eUQbed5EBHWb3M5fOtv++6fox2OA9QBk=; b=WEe03L+dip8DerGbMMAGDK4Ow/Xw3dy+cm3Ti1gMWsu0r8R69pWiUPgo yHvJlcXbTuopsQg2EcCGEDgz70Kdl9QU+6uBcXFqur2ZIud2W2/HSF2L8 GJF34qaIjUY1hH2/iOaPR2Ra5fqOnTyKPQD7My8Al5sqDtO6B/U5kiZ2p br+WLfaTzzdY3gmGGdF3C4RicDik8cpHfEjxNDgBRUMWq7J2sye/PfL4+ wBzTIn1f9weZPypHBlSu1KwlfZ8/6I2kqDTZ04buk+VO2LbNe0/fopnEX Ev26Em8f0ik2wR0lsoFrX0bOhClFj9zptlCu3RsbTBmwGt9GpsHX/3lyi Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361556717" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361556717" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:09 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835670975" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835670975" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:08 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com Subject: [PATCH v9 03/42] mm: Make pte_mkwrite() take a VMA Date: Mon, 12 Jun 2023 17:10:29 -0700 Message-Id: <20230613001108.3040476-4-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1768544310024632425?= X-GMAIL-MSGID: =?utf-8?q?1768544310024632425?= The x86 Shadow stack feature includes a new type of memory called shadow stack. This shadow stack memory has some unusual properties, which requires some core mm changes to function properly. One of these unusual properties is that shadow stack memory is writable, but only in limited ways. These limits are applied via a specific PTE bit combination. Nevertheless, the memory is writable, and core mm code will need to apply the writable permissions in the typical paths that call pte_mkwrite(). Future patches will make pte_mkwrite() take a VMA, so that the x86 implementation of it can know whether to create regular writable memory or shadow stack memory. But there are a couple of challenges to this. Modifying the signatures of each arch pte_mkwrite() implementation would be error prone because some are generated with macros and would need to be re-implemented. Also, some pte_mkwrite() callers operate on kernel memory without a VMA. So this can be done in a three step process. First pte_mkwrite() can be renamed to pte_mkwrite_novma() in each arch, with a generic pte_mkwrite() added that just calls pte_mkwrite_novma(). Next callers without a VMA can be moved to pte_mkwrite_novma(). And lastly, pte_mkwrite() and all callers can be changed to take/pass a VMA. In a previous patches, pte_mkwrite() was renamed pte_mkwrite_novma() and callers that don't have a VMA were changed to use pte_mkwrite_novma(). So now change pte_mkwrite() to take a VMA and change the remaining callers to pass a VMA. Apply the same changes for pmd_mkwrite(). No functional change. Suggested-by: David Hildenbrand Signed-off-by: Rick Edgecombe Reviewed-by: Mike Rapoport (IBM) Acked-by: David Hildenbrand --- Documentation/mm/arch_pgtable_helpers.rst | 6 ++++-- include/linux/mm.h | 2 +- include/linux/pgtable.h | 4 ++-- mm/debug_vm_pgtable.c | 12 ++++++------ mm/huge_memory.c | 10 +++++----- mm/memory.c | 4 ++-- mm/migrate.c | 2 +- mm/migrate_device.c | 2 +- mm/mprotect.c | 2 +- mm/userfaultfd.c | 2 +- 10 files changed, 24 insertions(+), 22 deletions(-) diff --git a/Documentation/mm/arch_pgtable_helpers.rst b/Documentation/mm/arch_pgtable_helpers.rst index 69ce1f2aa4d1..c82e3ee20e51 100644 --- a/Documentation/mm/arch_pgtable_helpers.rst +++ b/Documentation/mm/arch_pgtable_helpers.rst @@ -46,7 +46,8 @@ PTE Page Table Helpers +---------------------------+--------------------------------------------------+ | pte_mkclean | Creates a clean PTE | +---------------------------+--------------------------------------------------+ -| pte_mkwrite | Creates a writable PTE | +| pte_mkwrite | Creates a writable PTE of the type specified by | +| | the VMA. | +---------------------------+--------------------------------------------------+ | pte_mkwrite_novma | Creates a writable PTE, of the conventional type | | | of writable. | @@ -121,7 +122,8 @@ PMD Page Table Helpers +---------------------------+--------------------------------------------------+ | pmd_mkclean | Creates a clean PMD | +---------------------------+--------------------------------------------------+ -| pmd_mkwrite | Creates a writable PMD | +| pmd_mkwrite | Creates a writable PMD of the type specified by | +| | the VMA. | +---------------------------+--------------------------------------------------+ | pmd_mkwrite_novma | Creates a writable PMD, of the conventional type | | | of writable. | diff --git a/include/linux/mm.h b/include/linux/mm.h index 27ce77080c79..43701bf223d3 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1284,7 +1284,7 @@ void free_compound_page(struct page *page); static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma) { if (likely(vma->vm_flags & VM_WRITE)) - pte = pte_mkwrite(pte); + pte = pte_mkwrite(pte, vma); return pte; } diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index ae271a307584..0f3cf726812a 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -508,14 +508,14 @@ extern pud_t pudp_huge_clear_flush(struct vm_area_struct *vma, #endif #ifndef pte_mkwrite -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma) { return pte_mkwrite_novma(pte); } #endif #if defined(CONFIG_HAS_HUGE_PAGE) && !defined(pmd_mkwrite) -static inline pmd_t pmd_mkwrite(pmd_t pmd) +static inline pmd_t pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma) { return pmd_mkwrite_novma(pmd); } diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c index c54177aabebd..107e293904d3 100644 --- a/mm/debug_vm_pgtable.c +++ b/mm/debug_vm_pgtable.c @@ -109,10 +109,10 @@ static void __init pte_basic_tests(struct pgtable_debug_args *args, int idx) WARN_ON(!pte_same(pte, pte)); WARN_ON(!pte_young(pte_mkyoung(pte_mkold(pte)))); WARN_ON(!pte_dirty(pte_mkdirty(pte_mkclean(pte)))); - WARN_ON(!pte_write(pte_mkwrite(pte_wrprotect(pte)))); + WARN_ON(!pte_write(pte_mkwrite(pte_wrprotect(pte), args->vma))); WARN_ON(pte_young(pte_mkold(pte_mkyoung(pte)))); WARN_ON(pte_dirty(pte_mkclean(pte_mkdirty(pte)))); - WARN_ON(pte_write(pte_wrprotect(pte_mkwrite(pte)))); + WARN_ON(pte_write(pte_wrprotect(pte_mkwrite(pte, args->vma)))); WARN_ON(pte_dirty(pte_wrprotect(pte_mkclean(pte)))); WARN_ON(!pte_dirty(pte_wrprotect(pte_mkdirty(pte)))); } @@ -153,7 +153,7 @@ static void __init pte_advanced_tests(struct pgtable_debug_args *args) pte = pte_mkclean(pte); set_pte_at(args->mm, args->vaddr, args->ptep, pte); flush_dcache_page(page); - pte = pte_mkwrite(pte); + pte = pte_mkwrite(pte, args->vma); pte = pte_mkdirty(pte); ptep_set_access_flags(args->vma, args->vaddr, args->ptep, pte, 1); pte = ptep_get(args->ptep); @@ -199,10 +199,10 @@ static void __init pmd_basic_tests(struct pgtable_debug_args *args, int idx) WARN_ON(!pmd_same(pmd, pmd)); WARN_ON(!pmd_young(pmd_mkyoung(pmd_mkold(pmd)))); WARN_ON(!pmd_dirty(pmd_mkdirty(pmd_mkclean(pmd)))); - WARN_ON(!pmd_write(pmd_mkwrite(pmd_wrprotect(pmd)))); + WARN_ON(!pmd_write(pmd_mkwrite(pmd_wrprotect(pmd), args->vma))); WARN_ON(pmd_young(pmd_mkold(pmd_mkyoung(pmd)))); WARN_ON(pmd_dirty(pmd_mkclean(pmd_mkdirty(pmd)))); - WARN_ON(pmd_write(pmd_wrprotect(pmd_mkwrite(pmd)))); + WARN_ON(pmd_write(pmd_wrprotect(pmd_mkwrite(pmd, args->vma)))); WARN_ON(pmd_dirty(pmd_wrprotect(pmd_mkclean(pmd)))); WARN_ON(!pmd_dirty(pmd_wrprotect(pmd_mkdirty(pmd)))); /* @@ -253,7 +253,7 @@ static void __init pmd_advanced_tests(struct pgtable_debug_args *args) pmd = pmd_mkclean(pmd); set_pmd_at(args->mm, vaddr, args->pmdp, pmd); flush_dcache_page(page); - pmd = pmd_mkwrite(pmd); + pmd = pmd_mkwrite(pmd, args->vma); pmd = pmd_mkdirty(pmd); pmdp_set_access_flags(args->vma, vaddr, args->pmdp, pmd, 1); pmd = READ_ONCE(*args->pmdp); diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 624671aaa60d..37dd56b7b3d1 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -551,7 +551,7 @@ __setup("transparent_hugepage=", setup_transparent_hugepage); pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma) { if (likely(vma->vm_flags & VM_WRITE)) - pmd = pmd_mkwrite(pmd); + pmd = pmd_mkwrite(pmd, vma); return pmd; } @@ -1572,7 +1572,7 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf) pmd = pmd_modify(oldpmd, vma->vm_page_prot); pmd = pmd_mkyoung(pmd); if (writable) - pmd = pmd_mkwrite(pmd); + pmd = pmd_mkwrite(pmd, vma); set_pmd_at(vma->vm_mm, haddr, vmf->pmd, pmd); update_mmu_cache_pmd(vma, vmf->address, vmf->pmd); spin_unlock(vmf->ptl); @@ -1924,7 +1924,7 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, /* See change_pte_range(). */ if ((cp_flags & MM_CP_TRY_CHANGE_WRITABLE) && !pmd_write(entry) && can_change_pmd_writable(vma, addr, entry)) - entry = pmd_mkwrite(entry); + entry = pmd_mkwrite(entry, vma); ret = HPAGE_PMD_NR; set_pmd_at(mm, addr, pmd, entry); @@ -2234,7 +2234,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, } else { entry = mk_pte(page + i, READ_ONCE(vma->vm_page_prot)); if (write) - entry = pte_mkwrite(entry); + entry = pte_mkwrite(entry, vma); if (anon_exclusive) SetPageAnonExclusive(page + i); if (!young) @@ -3271,7 +3271,7 @@ void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new) if (pmd_swp_soft_dirty(*pvmw->pmd)) pmde = pmd_mksoft_dirty(pmde); if (is_writable_migration_entry(entry)) - pmde = pmd_mkwrite(pmde); + pmde = pmd_mkwrite(pmde, vma); if (pmd_swp_uffd_wp(*pvmw->pmd)) pmde = pmd_mkuffd_wp(pmde); if (!is_migration_entry_young(entry)) diff --git a/mm/memory.c b/mm/memory.c index f69fbc251198..c1b6fe944c20 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -4100,7 +4100,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf) entry = mk_pte(&folio->page, vma->vm_page_prot); entry = pte_sw_mkyoung(entry); if (vma->vm_flags & VM_WRITE) - entry = pte_mkwrite(pte_mkdirty(entry)); + entry = pte_mkwrite(pte_mkdirty(entry), vma); vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address, &vmf->ptl); @@ -4796,7 +4796,7 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf) pte = pte_modify(old_pte, vma->vm_page_prot); pte = pte_mkyoung(pte); if (writable) - pte = pte_mkwrite(pte); + pte = pte_mkwrite(pte, vma); ptep_modify_prot_commit(vma, vmf->address, vmf->pte, old_pte, pte); update_mmu_cache(vma, vmf->address, vmf->pte); pte_unmap_unlock(vmf->pte, vmf->ptl); diff --git a/mm/migrate.c b/mm/migrate.c index 01cac26a3127..8b46b722f1a4 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -219,7 +219,7 @@ static bool remove_migration_pte(struct folio *folio, if (folio_test_dirty(folio) && is_migration_entry_dirty(entry)) pte = pte_mkdirty(pte); if (is_writable_migration_entry(entry)) - pte = pte_mkwrite(pte); + pte = pte_mkwrite(pte, vma); else if (pte_swp_uffd_wp(*pvmw.pte)) pte = pte_mkuffd_wp(pte); diff --git a/mm/migrate_device.c b/mm/migrate_device.c index d30c9de60b0d..df3f5e9d5f76 100644 --- a/mm/migrate_device.c +++ b/mm/migrate_device.c @@ -646,7 +646,7 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate, } entry = mk_pte(page, vma->vm_page_prot); if (vma->vm_flags & VM_WRITE) - entry = pte_mkwrite(pte_mkdirty(entry)); + entry = pte_mkwrite(pte_mkdirty(entry), vma); } ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl); diff --git a/mm/mprotect.c b/mm/mprotect.c index 92d3d3ca390a..afdb6723782e 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -198,7 +198,7 @@ static long change_pte_range(struct mmu_gather *tlb, if ((cp_flags & MM_CP_TRY_CHANGE_WRITABLE) && !pte_write(ptent) && can_change_pte_writable(vma, addr, ptent)) - ptent = pte_mkwrite(ptent); + ptent = pte_mkwrite(ptent, vma); ptep_modify_prot_commit(vma, addr, pte, oldpte, ptent); if (pte_needs_flush(oldpte, ptent)) diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index e97a0b4889fc..6dea7f57026e 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -72,7 +72,7 @@ int mfill_atomic_install_pte(pmd_t *dst_pmd, if (page_in_cache && !vm_shared) writable = false; if (writable) - _dst_pte = pte_mkwrite(_dst_pte); + _dst_pte = pte_mkwrite(_dst_pte, dst_vma); if (flags & MFILL_ATOMIC_WP) _dst_pte = pte_mkuffd_wp(_dst_pte); From patchwork Tue Jun 13 00:10:30 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 106959 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp211219vqr; Mon, 12 Jun 2023 17:15:01 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ5j+kwTafGEkSPn/1URcSYaEHiwFRJpSE4wvRBSQ0qUsSASJLzpfj13/Lv7MVhT2Nm007La X-Received: by 2002:aa7:cf12:0:b0:516:7a2b:ff80 with SMTP id a18-20020aa7cf12000000b005167a2bff80mr6555012edy.11.1686615301394; Mon, 12 Jun 2023 17:15:01 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1686615301; cv=none; d=google.com; s=arc-20160816; b=xHS+3ku8ZGTVc4JGHaAwPzM1g0LWCHVuDM5eQ1j3XVCsP96JhYRNYvC7EEQF+rI5dJ GzPTCVSWLnCSh5Tte9tYsH+cabp0MDokTgz4aASUbaxdui9uf/H3X7dX+LSdHDkRZ8EB da5Wv+9+GwrELYSEc1+rdpnaqWXEXh5Pw5gzDgY+aAvezBV2URmGwSYNnotHkx0zBaPQ ICoqBFnlyuonGEoTjf5B4BKx8rS+6PoK8lAA7AQZ03CJ1AROYU+cLTWv/X+Gmac0TF0u L4DGc6sM3VaVUALZntU9Q3okHPBgD5ReUj+ozlxGvX5l7K4au+F0+P5W5w4VHVwu16W0 g1qg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=ueCxy0hEBQgGAIs/gsp+oRtbLmHLjLxT1XZ2AlFm+Qc=; b=PyBbCn/7MW1PBfY3YezZYfRjs7y/9xnDon51JxcDsO01pf6A6OpQVqBm7nmAIz6t7s mJswn3zDwrTl95+KFTa1gK9buej/ZyBe9fITW1fHt+xripUj9VQi/1PfXzX1FIV8JRAu B6lb1/erwjcBgGh8XpYvPCSCzihJD7qyXwd65dmlvnob/yZoYW1ZpUBFYTNm+EuyH41G +aR1PDvgUn/MTxi/yfV+odmz7B+viX832KZYTwP0WsifhbG4FqJjtfUbXuMqxNU6a4TT yjBKVEGClIUurGR5I+derWkB8ksHtlpBgpJvrETD0y6SzKgAe1fktqeh2HN2vvtCIR6M mvtQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=fCT1br1C; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id k6-20020aa7d8c6000000b005148ebb4cddsi6225280eds.461.2023.06.12.17.14.36; Mon, 12 Jun 2023 17:15:01 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=fCT1br1C; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238818AbjFMAM3 (ORCPT + 99 others); Mon, 12 Jun 2023 20:12:29 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51768 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229869AbjFMAMO (ORCPT ); Mon, 12 Jun 2023 20:12:14 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F095B19B; Mon, 12 Jun 2023 17:12:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615132; x=1718151132; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=tDE8EspiBAdPvpd4BkDoYawjxYWlY+HjRp9EjAaWS/U=; b=fCT1br1C/pf5ZkfVJejD1weJCG6FHypSMBy2efglxfDp/2Ck+CUXiu0w BWztVo6lEsqrwDqrk6NdNd+hEJ7eVUyqGQzMROCLsYGWAAxVEEscz8mFP W4xqPTRhtdBZbCsdi46QAwc6rqDCaMDVtVAtQKH9qLR/uL9of1knlk6jw AFInAMvK7XEM8JvWuo8TwEdq5JMArbChQGmUB35QPR75h+FUNeANvB+12 jo5riqGU+yD3LAS0LtDs4Q9tBqYd2FmYczfnBueV4otLx9Wa5q4aZJWhm To1xgjw/mRRudqf14CYQU1Zzgg9rwl1w8MWktgr2J5lgxwBJAOtVjubYR g==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361556728" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361556728" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:09 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835670979" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835670979" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:08 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu , Peter Collingbourne , Pengfei Xu Subject: [PATCH v9 04/42] mm: Re-introduce vm_flags to do_mmap() Date: Mon, 12 Jun 2023 17:10:30 -0700 Message-Id: <20230613001108.3040476-5-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1768544326135509369?= X-GMAIL-MSGID: =?utf-8?q?1768544326135509369?= From: Yu-cheng Yu There was no more caller passing vm_flags to do_mmap(), and vm_flags was removed from the function's input by: commit 45e55300f114 ("mm: remove unnecessary wrapper function do_mmap_pgoff()"). There is a new user now. Shadow stack allocation passes VM_SHADOW_STACK to do_mmap(). Thus, re-introduce vm_flags to do_mmap(). Signed-off-by: Yu-cheng Yu Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Peter Collingbourne Reviewed-by: Kees Cook Reviewed-by: Kirill A. Shutemov Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook Acked-by: David Hildenbrand Reviewed-by: Mark Brown Tested-by: Mark Brown --- fs/aio.c | 2 +- include/linux/mm.h | 3 ++- ipc/shm.c | 2 +- mm/mmap.c | 10 +++++----- mm/nommu.c | 4 ++-- mm/util.c | 2 +- 6 files changed, 12 insertions(+), 11 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index b0b17bd098bb..4a7576989719 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -558,7 +558,7 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events) ctx->mmap_base = do_mmap(ctx->aio_ring_file, 0, ctx->mmap_size, PROT_READ | PROT_WRITE, - MAP_SHARED, 0, &unused, NULL); + MAP_SHARED, 0, 0, &unused, NULL); mmap_write_unlock(mm); if (IS_ERR((void *)ctx->mmap_base)) { ctx->mmap_size = 0; diff --git a/include/linux/mm.h b/include/linux/mm.h index 43701bf223d3..9ec20cbb20c1 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3133,7 +3133,8 @@ extern unsigned long mmap_region(struct file *file, unsigned long addr, struct list_head *uf); extern unsigned long do_mmap(struct file *file, unsigned long addr, unsigned long len, unsigned long prot, unsigned long flags, - unsigned long pgoff, unsigned long *populate, struct list_head *uf); + vm_flags_t vm_flags, unsigned long pgoff, unsigned long *populate, + struct list_head *uf); extern int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm, unsigned long start, size_t len, struct list_head *uf, bool downgrade); diff --git a/ipc/shm.c b/ipc/shm.c index 60e45e7045d4..576a543b7cff 100644 --- a/ipc/shm.c +++ b/ipc/shm.c @@ -1662,7 +1662,7 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg, goto invalid; } - addr = do_mmap(file, addr, size, prot, flags, 0, &populate, NULL); + addr = do_mmap(file, addr, size, prot, flags, 0, 0, &populate, NULL); *raddr = addr; err = 0; if (IS_ERR_VALUE(addr)) diff --git a/mm/mmap.c b/mm/mmap.c index 13678edaa22c..afdf5f78432b 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1221,11 +1221,11 @@ static inline bool file_mmap_ok(struct file *file, struct inode *inode, */ unsigned long do_mmap(struct file *file, unsigned long addr, unsigned long len, unsigned long prot, - unsigned long flags, unsigned long pgoff, - unsigned long *populate, struct list_head *uf) + unsigned long flags, vm_flags_t vm_flags, + unsigned long pgoff, unsigned long *populate, + struct list_head *uf) { struct mm_struct *mm = current->mm; - vm_flags_t vm_flags; int pkey = 0; validate_mm(mm); @@ -1286,7 +1286,7 @@ unsigned long do_mmap(struct file *file, unsigned long addr, * to. we assume access permissions have been handled by the open * of the memory object, so we don't do any here. */ - vm_flags = calc_vm_prot_bits(prot, pkey) | calc_vm_flag_bits(flags) | + vm_flags |= calc_vm_prot_bits(prot, pkey) | calc_vm_flag_bits(flags) | mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC; if (flags & MAP_LOCKED) @@ -2903,7 +2903,7 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size, file = get_file(vma->vm_file); ret = do_mmap(vma->vm_file, start, size, - prot, flags, pgoff, &populate, NULL); + prot, flags, 0, pgoff, &populate, NULL); fput(file); out: mmap_write_unlock(mm); diff --git a/mm/nommu.c b/mm/nommu.c index f670d9979a26..138826c4a872 100644 --- a/mm/nommu.c +++ b/mm/nommu.c @@ -1002,6 +1002,7 @@ unsigned long do_mmap(struct file *file, unsigned long len, unsigned long prot, unsigned long flags, + vm_flags_t vm_flags, unsigned long pgoff, unsigned long *populate, struct list_head *uf) @@ -1009,7 +1010,6 @@ unsigned long do_mmap(struct file *file, struct vm_area_struct *vma; struct vm_region *region; struct rb_node *rb; - vm_flags_t vm_flags; unsigned long capabilities, result; int ret; VMA_ITERATOR(vmi, current->mm, 0); @@ -1029,7 +1029,7 @@ unsigned long do_mmap(struct file *file, /* we've determined that we can make the mapping, now translate what we * now know into VMA flags */ - vm_flags = determine_vm_flags(file, prot, flags, capabilities); + vm_flags |= determine_vm_flags(file, prot, flags, capabilities); /* we're going to need to record the mapping */ diff --git a/mm/util.c b/mm/util.c index dd12b9531ac4..8e7fc6cacab4 100644 --- a/mm/util.c +++ b/mm/util.c @@ -540,7 +540,7 @@ unsigned long vm_mmap_pgoff(struct file *file, unsigned long addr, if (!ret) { if (mmap_write_lock_killable(mm)) return -EINTR; - ret = do_mmap(file, addr, len, prot, flag, pgoff, &populate, + ret = do_mmap(file, addr, len, prot, flag, 0, pgoff, &populate, &uf); mmap_write_unlock(mm); userfaultfd_unmap_complete(mm, &uf); From patchwork Tue Jun 13 00:10:31 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 106999 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp223729vqr; Mon, 12 Jun 2023 17:48:34 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ7idqSQLvH05+B6lI9D4DfvpygjEXj1fMBMvjR3gGz0wtpKLKby1TOt9LqTA8ybC6uD50jP X-Received: by 2002:a05:6402:2055:b0:514:9df0:e3f3 with SMTP id bc21-20020a056402205500b005149df0e3f3mr6981355edb.0.1686617314465; Mon, 12 Jun 2023 17:48:34 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1686617314; cv=none; d=google.com; s=arc-20160816; b=MhKB5bWIJmQasney8+MvSnNjKvrq7h8py1UI9dIVhqbH+64stPcgBsSYFT1AJupvnG jIJ/4fbM+pQSGIto/c5ulaJbQiYpo1QZUAyFMWRD1EZUpgZsN2RqYNGbuuUZPs47grlM /5Pi7Q/WNC7UTszqOodYKRdNPWVCwk5yH0+MRTZuH6R4RJyPP0YKO5FYNY3V2Jb2DxSJ X4KVHXs5H0QzqtgW8S1e7ltHM8S4eiHx4JMmBHjp377kcL7reqrdo39+Tj76vLqkYvCb mtGsj94BAzJt72I5CyeVhl4MuNhb7KUg/yjrrK6EVHcCRxru5advxXB8734F4OsPcvk1 96OQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=GGsLCpG/NjQKTNutKZQhh/vSRH0Qv1WXqoRX0znCIbQ=; b=M0iCFWxzVMjWijx01jsHrlAo6iuv6SKkbRVE1Iq8ZFNRVNiXQe08KZETI8yByKrS/0 88gvr1LtmnQAimTq+NX+w5X1IH7HJn0aVc0suv4H5mXS31mEmP3RMP3fQ5hL5hEE/arF ofPPQCeAJDCjC06sziaCx5ic3UEee20iNIw4CUNS47WjFsQXnJiSyZirIv9UXwUg0zkJ xLT/YEWbodwcwZZFYdMZy8J5by0/jgTuJre9+ULB8rl2H7MA4cputHR/PiMB+4vwE6hf gtJVJG29XYcvxA8m7HtuvNMz9CIU3D2Z1baMq0ZxjrPi4CtduKL46oUauTb4gCI0rndx e6xg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=U8wH8Lhs; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id n20-20020aa7db54000000b00515f14fcf54si6516924edt.180.2023.06.12.17.48.08; Mon, 12 Jun 2023 17:48:34 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=U8wH8Lhs; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238724AbjFMAMj (ORCPT + 99 others); Mon, 12 Jun 2023 20:12:39 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51900 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237807AbjFMAMW (ORCPT ); Mon, 12 Jun 2023 20:12:22 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B9A6A19F; Mon, 12 Jun 2023 17:12:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615133; x=1718151133; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=HsIyEUzZjyBgsbaXIFC6qpwhsPOWh1g+KPlkVPDuH8M=; b=U8wH8LhsGtKZUHBcEkduZ4nQxDxYGp/RRA3NNcR0f538h25SL7Kazd8w W1LeUkIXENSJmrPyyf4P6GPVl4Nvr7mlFLUAQ4ENyzwhEK0QKP4GHEnbx duiqB68RpY9NYuyBMPKn8eOVu7k1axQoe0+lINjv7xq0re9LIYB5CMASC YNjB4pSNEUWN3DgYyaZXSBb+6eRTjA8TDAx0MIQ9hvBEH4KM25KaHrLXC jck7ynSAZwsWBoOpIEIOx+4M8VaeN3tmAAqRrCna8Z1DMFuzGGCv9xQDN eUnd7gK8NlW20M06T0Qj6VGl5rHUh2kD/yn33VZ/BTS+CFdUJKPYTa9Kn A==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361556752" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361556752" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:10 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835670984" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835670984" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:09 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu , Axel Rasmussen , Peter Xu , Pengfei Xu Subject: [PATCH v9 05/42] mm: Move VM_UFFD_MINOR_BIT from 37 to 38 Date: Mon, 12 Jun 2023 17:10:31 -0700 Message-Id: <20230613001108.3040476-6-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1768546437173191057?= X-GMAIL-MSGID: =?utf-8?q?1768546437173191057?= From: Yu-cheng Yu The x86 Control-flow Enforcement Technology (CET) feature includes a new type of memory called shadow stack. This shadow stack memory has some unusual properties, which requires some core mm changes to function properly. Future patches will introduce a new VM flag VM_SHADOW_STACK that will be VM_HIGH_ARCH_BIT_5. VM_HIGH_ARCH_BIT_1 through VM_HIGH_ARCH_BIT_4 are bits 32-36, and bit 37 is the unrelated VM_UFFD_MINOR_BIT. For the sake of order, make all VM_HIGH_ARCH_BITs stay together by moving VM_UFFD_MINOR_BIT from 37 to 38. This will allow VM_SHADOW_STACK to be introduced as 37. Signed-off-by: Yu-cheng Yu Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Reviewed-by: Axel Rasmussen Acked-by: Mike Rapoport (IBM) Acked-by: Peter Xu Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook Reviewed-by: David Hildenbrand --- include/linux/mm.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 9ec20cbb20c1..6f52c1e7c640 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -370,7 +370,7 @@ extern unsigned int kobjsize(const void *objp); #endif #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR -# define VM_UFFD_MINOR_BIT 37 +# define VM_UFFD_MINOR_BIT 38 # define VM_UFFD_MINOR BIT(VM_UFFD_MINOR_BIT) /* UFFD minor faults */ #else /* !CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */ # define VM_UFFD_MINOR VM_NONE From patchwork Tue Jun 13 00:10:32 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 106958 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp211206vqr; Mon, 12 Jun 2023 17:15:00 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ5zunKhYAzWgxlLOjhLv1Hbpvo/Of0cidDfajGfdh5oDtOpzGo8+vhgBUfcHrKWJyk8L245 X-Received: by 2002:a05:6402:120b:b0:514:9eae:b09c with SMTP id c11-20020a056402120b00b005149eaeb09cmr6420475edw.13.1686615299890; Mon, 12 Jun 2023 17:14:59 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1686615299; cv=none; d=google.com; s=arc-20160816; b=LolNtTAZo+Ghf5xn0qjY8jXO4F552tKjl/Fg8w/+7cmmtu9TRJd5faZhwCGfq+u3ex MGJbJ1QhjBmioPtG7alXQ0n8seSqb+olS1Azvj6UQulY7JAIZaceeejPSQcJck8V1NMN mff47PZXMTX6L+u56mVgm+VvMa+c/GMA2NCshsK4jhsBfnnVXaYGqNUinFptMMEFQqyn WPfRmZEw6W2uV9PID9pjweBI4KyANwbwJdGueqQJNE2rGysqJ4CGgQbfA6hCCz9adqkG 5MnNOz1S03M5udGd59l+6/QYUrmHle4mY2EmKxtXBy2cAO4wJUEOVRHFqjMsMcVJyJUz jH2g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=ryAUT3UXrAcsGxsWLBFRJE1PqULflYsxnmjTi3z/Wrs=; b=ORNjAH6dFxXIiHoFkYPnNfPAAvxdLyRLnCyCVM877cm5QTZuOCocVnnty6+uDpiF7w TsKlNe9OB158p4LCookfoh1sEncfPBpIFJtEy9GZkKoB0v71n58IEcC1g3Vby21pav8I XbZx7OuzGF4L87etqjGB6jn5bUsiuMjAxLajRy8BZ1ALk2HgHbyJaNxNrPWri/kTJjMF 2OSJySQmxr49Ibpq6u6AByL8YE70VqmpsDQdZdRhD9xiTk15TcqsNRMtSJZR25g0AfRn 6pKbxBGZgLjgJMhWm9/EaYtPQ7jvilmYyr5I7SN7OaiKM3lKQ3PFT6RhWVDm4ZE+giiF CCDA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=J5dsVrvs; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id ov15-20020a170906fc0f00b00969f259cac2si6124686ejb.640.2023.06.12.17.14.35; Mon, 12 Jun 2023 17:14:59 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=J5dsVrvs; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238996AbjFMANR (ORCPT + 99 others); Mon, 12 Jun 2023 20:13:17 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51904 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238475AbjFMAMW (ORCPT ); Mon, 12 Jun 2023 20:12:22 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 43A79171C; Mon, 12 Jun 2023 17:12:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615134; x=1718151134; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=VIRIy8IvN5ZfY5u3MVh2ThurjhiZOJC7RcXup7TqiYM=; b=J5dsVrvsQoEiGXjU2/F2mZxHX9Wtgd8eR1tZAx8sTiWaU7g0Nl12dqYK WC3qIDwv9pTqoHO0SUKRtQbFVgJJXVmXRdQq776vDZ4XBTtUinn7yXkpX ERI5enbupURzIofkxsAzQJggymoC2YwNLTDDS+h4d2p1nfMoSJCBZc/Kg eavVqt6vEYuEaRlEVK9WIqidIFBhMoFKjW197xXJdbTN5Yq4FpZ4taULP CCWpLI0rtXeHOqMTe1e4WkxVbWxAAihvoerV/Tg1OhCah5+fCvni+h7rs 76q2oZ2nMw/rCfj7YwiGzBlGan0ehh2rHw5r0hr9f0ABIGqfzZ0iSoLJv Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361556774" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361556774" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:11 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835670988" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835670988" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:10 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu , Pengfei Xu Subject: [PATCH v9 06/42] x86/shstk: Add Kconfig option for shadow stack Date: Mon, 12 Jun 2023 17:10:32 -0700 Message-Id: <20230613001108.3040476-7-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1768544324580452781?= X-GMAIL-MSGID: =?utf-8?q?1768544324580452781?= Shadow stack provides protection for applications against function return address corruption. It is active when the processor supports it, the kernel has CONFIG_X86_SHADOW_STACK enabled, and the application is built for the feature. This is only implemented for the 64-bit kernel. When it is enabled, legacy non-shadow stack applications continue to work, but without protection. Since there is another feature that utilizes CET (Kernel IBT) that will share implementation with shadow stacks, create CONFIG_CET to signify that at least one CET feature is configured. Co-developed-by: Yu-cheng Yu Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- arch/x86/Kconfig | 24 ++++++++++++++++++++++++ arch/x86/Kconfig.assembler | 5 +++++ 2 files changed, 29 insertions(+) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 53bab123a8ee..ce460d6b4e25 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1852,6 +1852,11 @@ config CC_HAS_IBT (CC_IS_CLANG && CLANG_VERSION >= 140000)) && \ $(as-instr,endbr64) +config X86_CET + def_bool n + help + CET features configured (Shadow stack or IBT) + config X86_KERNEL_IBT prompt "Indirect Branch Tracking" def_bool y @@ -1859,6 +1864,7 @@ config X86_KERNEL_IBT # https://github.com/llvm/llvm-project/commit/9d7001eba9c4cb311e03cd8cdc231f9e579f2d0f depends on !LD_IS_LLD || LLD_VERSION >= 140000 select OBJTOOL + select X86_CET help Build the kernel with support for Indirect Branch Tracking, a hardware support course-grain forward-edge Control Flow Integrity @@ -1952,6 +1958,24 @@ config X86_SGX If unsure, say N. +config X86_USER_SHADOW_STACK + bool "X86 userspace shadow stack" + depends on AS_WRUSS + depends on X86_64 + select ARCH_USES_HIGH_VMA_FLAGS + select X86_CET + help + Shadow stack protection is a hardware feature that detects function + return address corruption. This helps mitigate ROP attacks. + Applications must be enabled to use it, and old userspace does not + get protection "for free". + + CPUs supporting shadow stacks were first released in 2020. + + See Documentation/x86/shstk.rst for more information. + + If unsure, say N. + config EFI bool "EFI runtime service support" depends on ACPI diff --git a/arch/x86/Kconfig.assembler b/arch/x86/Kconfig.assembler index b88f784cb02e..8ad41da301e5 100644 --- a/arch/x86/Kconfig.assembler +++ b/arch/x86/Kconfig.assembler @@ -24,3 +24,8 @@ config AS_GFNI def_bool $(as-instr,vgf2p8mulb %xmm0$(comma)%xmm1$(comma)%xmm2) help Supported by binutils >= 2.30 and LLVM integrated assembler + +config AS_WRUSS + def_bool $(as-instr,wrussq %rax$(comma)(%rbx)) + help + Supported by binutils >= 2.31 and LLVM integrated assembler From patchwork Tue Jun 13 00:10:33 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 106997 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp223493vqr; Mon, 12 Jun 2023 17:47:52 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ6LFxiVhk33Soz7oGOMiVnS7A19WZhq5GVM8VJfQcViySj+n5a7LImhf1fOnYn6Ox+9dQab X-Received: by 2002:a17:907:960c:b0:96f:c0b0:f137 with SMTP id gb12-20020a170907960c00b0096fc0b0f137mr12226522ejc.16.1686617272425; Mon, 12 Jun 2023 17:47:52 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1686617272; cv=none; d=google.com; s=arc-20160816; b=W+869gC+LaSdwqnVidjKFeUmpRRINM40AspmqzX8N3Bo0elt7lxA6ZVbRtdL1DeCmw hDTo1Fg+lX/fqeWBKp+lMSsxgXo5fnjD8JktkX0WUSP7L9pAVRGnAyUyaItOBkCLPjx6 u3xQQ0Cow+EXw3BGJ7NO9spntYc9rwRf0sITo9GZPaeCunj+5PJCxBmkrkqfk5J5lJjW roZ3YYoKcWhis0+/j+5JaVrSNxIfGYgdLdXstpN6Gr647mzassoF4QDyBOGrtFGDCrVU xtjSjuVHlIrZp0H3PGPEkr/uQx+F+cqb2koW3nS7o/O3DIjOxXluMVp4EAZJzXB5kwkj XB+w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=4x3914htyKBGWYyTzQcGb3BHP/ZAi07IgspQYvXolFs=; b=yp20huVLEnDXWptX8cvDOaNgbGZPK4N7LwUA+9swmPawL94htVil/LgFqxGuhNl7fE q9cDmvrlFo7mcJFZZNomCwQyZekEs3VLk27/v9oFaWNDWVet+xr/6e0p9sOoTtgiiOLo kx9nKTl114lXmEFJTAR+L+27FyPuveE5TI/lSvfqxLT4xKlnkMTrMm69C/20VvQI9u1C 7sHlal0rjbpQ6GXdght87Wa8OnNiP15D+pU099rgJ0L1Ic/NtjtnhTrY5/u9F8NAyBp/ 1dFaCI46rGTHVqUJapgIjMekE8/2oDdjZqYW9gyOLifa07KPoqboPefIrv6k3IOO2Pjw Kj1w== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b="R3/b+BSD"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id v7-20020a056402184700b005149bc2c0absi6735692edy.634.2023.06.12.17.47.25; Mon, 12 Jun 2023 17:47:52 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b="R3/b+BSD"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238503AbjFMAMn (ORCPT + 99 others); Mon, 12 Jun 2023 20:12:43 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51916 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238496AbjFMAMX (ORCPT ); Mon, 12 Jun 2023 20:12:23 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 08EED1726; Mon, 12 Jun 2023 17:12:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615136; x=1718151136; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=1T/qgZrUQhYBEmir0bddzuZOwE3WskkI/v/mGEi34h4=; b=R3/b+BSDam0NqDlAT1HhuUrsopGL40H9zZf3quXKnqZM8STTQFcNglZ/ 9GVzOcJb/LA1ZIXm2HoBRfGZP69oQEndy38eukFBmj5Xw22YEHG3fWV2G V5Grxd7Uqp7qmOALzhjSEIIEKr0Kod1F0p3vGVgK22hn/fZ7UDhYX+wJI RsI+XOuVWRl5V+fPTnY9sYcNONw0hBPOSXta27XwFUrM3qvTHfebL/DXu EOq5y0Y5MPn25nC3rAQWSNogSRfcxNYmDQsiE9OyTagnSCXOLMyidMPHH wnMwndWjUKzgkuW/uaaketD1g4GXc2dRny1y/ZokKMefHTXN/l7Uxw9y7 w==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361556815" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361556815" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:15 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835670993" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835670993" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:11 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Pengfei Xu Subject: [PATCH v9 07/42] x86/traps: Move control protection handler to separate file Date: Mon, 12 Jun 2023 17:10:33 -0700 Message-Id: <20230613001108.3040476-8-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1768546393054632079?= X-GMAIL-MSGID: =?utf-8?q?1768546393054632079?= Today the control protection handler is defined in traps.c and used only for the kernel IBT feature. To reduce ifdeffery, move it to it's own file. In future patches, functionality will be added to make this handler also handle user shadow stack faults. So name the file cet.c. No functional change. Signed-off-by: Rick Edgecombe Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- arch/x86/kernel/Makefile | 2 ++ arch/x86/kernel/cet.c | 76 ++++++++++++++++++++++++++++++++++++++++ arch/x86/kernel/traps.c | 75 --------------------------------------- 3 files changed, 78 insertions(+), 75 deletions(-) create mode 100644 arch/x86/kernel/cet.c diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index 4070a01c11b7..abee0564b750 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -145,6 +145,8 @@ obj-$(CONFIG_CFI_CLANG) += cfi.o obj-$(CONFIG_CALL_THUNKS) += callthunks.o +obj-$(CONFIG_X86_CET) += cet.o + ### # 64 bit specific files ifeq ($(CONFIG_X86_64),y) diff --git a/arch/x86/kernel/cet.c b/arch/x86/kernel/cet.c new file mode 100644 index 000000000000..7ad22b705b64 --- /dev/null +++ b/arch/x86/kernel/cet.c @@ -0,0 +1,76 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include +#include +#include + +static __ro_after_init bool ibt_fatal = true; + +extern void ibt_selftest_ip(void); /* code label defined in asm below */ + +enum cp_error_code { + CP_EC = (1 << 15) - 1, + + CP_RET = 1, + CP_IRET = 2, + CP_ENDBR = 3, + CP_RSTRORSSP = 4, + CP_SETSSBSY = 5, + + CP_ENCL = 1 << 15, +}; + +DEFINE_IDTENTRY_ERRORCODE(exc_control_protection) +{ + if (!cpu_feature_enabled(X86_FEATURE_IBT)) { + pr_err("Unexpected #CP\n"); + BUG(); + } + + if (WARN_ON_ONCE(user_mode(regs) || (error_code & CP_EC) != CP_ENDBR)) + return; + + if (unlikely(regs->ip == (unsigned long)&ibt_selftest_ip)) { + regs->ax = 0; + return; + } + + pr_err("Missing ENDBR: %pS\n", (void *)instruction_pointer(regs)); + if (!ibt_fatal) { + printk(KERN_DEFAULT CUT_HERE); + __warn(__FILE__, __LINE__, (void *)regs->ip, TAINT_WARN, regs, NULL); + return; + } + BUG(); +} + +/* Must be noinline to ensure uniqueness of ibt_selftest_ip. */ +noinline bool ibt_selftest(void) +{ + unsigned long ret; + + asm (" lea ibt_selftest_ip(%%rip), %%rax\n\t" + ANNOTATE_RETPOLINE_SAFE + " jmp *%%rax\n\t" + "ibt_selftest_ip:\n\t" + UNWIND_HINT_FUNC + ANNOTATE_NOENDBR + " nop\n\t" + + : "=a" (ret) : : "memory"); + + return !ret; +} + +static int __init ibt_setup(char *str) +{ + if (!strcmp(str, "off")) + setup_clear_cpu_cap(X86_FEATURE_IBT); + + if (!strcmp(str, "warn")) + ibt_fatal = false; + + return 1; +} + +__setup("ibt=", ibt_setup); diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index 58b1f208eff5..6f666dfa97de 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -213,81 +213,6 @@ DEFINE_IDTENTRY(exc_overflow) do_error_trap(regs, 0, "overflow", X86_TRAP_OF, SIGSEGV, 0, NULL); } -#ifdef CONFIG_X86_KERNEL_IBT - -static __ro_after_init bool ibt_fatal = true; - -extern void ibt_selftest_ip(void); /* code label defined in asm below */ - -enum cp_error_code { - CP_EC = (1 << 15) - 1, - - CP_RET = 1, - CP_IRET = 2, - CP_ENDBR = 3, - CP_RSTRORSSP = 4, - CP_SETSSBSY = 5, - - CP_ENCL = 1 << 15, -}; - -DEFINE_IDTENTRY_ERRORCODE(exc_control_protection) -{ - if (!cpu_feature_enabled(X86_FEATURE_IBT)) { - pr_err("Unexpected #CP\n"); - BUG(); - } - - if (WARN_ON_ONCE(user_mode(regs) || (error_code & CP_EC) != CP_ENDBR)) - return; - - if (unlikely(regs->ip == (unsigned long)&ibt_selftest_ip)) { - regs->ax = 0; - return; - } - - pr_err("Missing ENDBR: %pS\n", (void *)instruction_pointer(regs)); - if (!ibt_fatal) { - printk(KERN_DEFAULT CUT_HERE); - __warn(__FILE__, __LINE__, (void *)regs->ip, TAINT_WARN, regs, NULL); - return; - } - BUG(); -} - -/* Must be noinline to ensure uniqueness of ibt_selftest_ip. */ -noinline bool ibt_selftest(void) -{ - unsigned long ret; - - asm (" lea ibt_selftest_ip(%%rip), %%rax\n\t" - ANNOTATE_RETPOLINE_SAFE - " jmp *%%rax\n\t" - "ibt_selftest_ip:\n\t" - UNWIND_HINT_FUNC - ANNOTATE_NOENDBR - " nop\n\t" - - : "=a" (ret) : : "memory"); - - return !ret; -} - -static int __init ibt_setup(char *str) -{ - if (!strcmp(str, "off")) - setup_clear_cpu_cap(X86_FEATURE_IBT); - - if (!strcmp(str, "warn")) - ibt_fatal = false; - - return 1; -} - -__setup("ibt=", ibt_setup); - -#endif /* CONFIG_X86_KERNEL_IBT */ - #ifdef CONFIG_X86_F00F_BUG void handle_invalid_op(struct pt_regs *regs) #else From patchwork Tue Jun 13 00:10:34 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 106983 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp222498vqr; Mon, 12 Jun 2023 17:45:23 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ7N3n1Pnz+gtX/VkSX4hjOa2rvTBEB+w424pmZtA4SXkmSh8QVplnrnSWOJD/R438PkwzY4 X-Received: by 2002:a17:907:7246:b0:977:d8bf:58e2 with SMTP id ds6-20020a170907724600b00977d8bf58e2mr8948602ejc.37.1686617123601; Mon, 12 Jun 2023 17:45:23 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1686617123; cv=none; d=google.com; s=arc-20160816; b=KtCEKqIAvvh3jb6S9EZS3MDmSBalldfu/xL/FFDCe5XsnfsqmUDUOOlmA5O+RMLlVj mrXv4DlAWOgLaJH7AWF1CH1EKyrx+JHvCgvBONMoluwf5YVn8RzHGWmJ71FPw2z8pFon pCVRbIcNhZS6QHRchBIava91h+94QK9egjI02/IgEFAjFuiiDz63pS20DFjmfh+H5x+e O62A1/gznolBASdZSQMhGRQ4s1hUJBXD3KN72nXaGdCddq+G9idgE/rwtmMuGq41LvNc RjtChJPQhTWDL1v5JsC5hfFi9GlVThk139ENV2rKJBOhvBe7YTHvTkcpFZ4t4O5gLYOh ntOw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=2/A0JR+pPzLRCqzuy8n+KkCV+4dlid7cWNKSH0w0Lto=; b=utDduQ7/vd9Iqu4V+iVJIG6x8DWHpt4zVSzDDVtQHpZuRtkedFx4/AK2QBZN4JcCye liis5PED2gPx4aqxx6OXZ4oUi0uW6n7VYOv76Tg0iuSs1h0oWr3qJPXjy2yKniKpe97h ixMgvKTht/GQvN/BdseNG4Sc4ickqNA7YTbgPXnYM1jJvPwgBOoRUAY7YN5p2bwgmx6H SfnKLdyae93E5D6g9PGbli8ydfiVwU/EGxeD2RbC7WDNvXIs3LaZoykIQp+keKk8s1m2 1zmrlvKvGUJjvLZBjB74o0qcAT+cplxwWDbT0looPepqKZ+/C1yyYtj0YllCIiUW4Y1O P6hQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=YwIeoSXk; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id e15-20020a1709067e0f00b0097456c647ffsi6314679ejr.447.2023.06.12.17.44.33; Mon, 12 Jun 2023 17:45:23 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=YwIeoSXk; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232682AbjFMAM4 (ORCPT + 99 others); Mon, 12 Jun 2023 20:12:56 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51948 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238629AbjFMAMY (ORCPT ); Mon, 12 Jun 2023 20:12:24 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BC0B01981; Mon, 12 Jun 2023 17:12:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615138; x=1718151138; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=sARsUThnVYVYYqaKHA46gpgv7OakP1T/BusMzOfelkg=; b=YwIeoSXkzRjb3keq9ckb0SycQRpJv0oOyV6yZB2MS/xN57lRQ2dT7MC0 0rAty46XsY3JhLO22NCzJQrDT/gwW4qJgh6o6XAeGNjUA1JpJobNMwnnk hGARQxAjxORBPNQlyUdD/uQM9Q9yc34v/SD1tadyB2kTp5nEKN3NokHXC wGCkkgYsrb5QxjBKyPAECjnfOZeuIriytnqUxemsd2mdH7ZE0wz3YFmSd zh8bONtzu7dOVuZe2e2EzqIKCgxZPNJbfs0JJ3+xgJRsL+qegCZ42oZB2 ubFwRFS0w0hjc89e18lJ7ytUU61X6oIyOgBraGc/OddEiFhxt+58QlXXv A==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361556836" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361556836" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:15 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835670996" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835670996" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:13 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu , Pengfei Xu Subject: [PATCH v9 08/42] x86/cpufeatures: Add CPU feature flags for shadow stacks Date: Mon, 12 Jun 2023 17:10:34 -0700 Message-Id: <20230613001108.3040476-9-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1768546236942067203?= X-GMAIL-MSGID: =?utf-8?q?1768546236942067203?= The Control-Flow Enforcement Technology contains two related features, one of which is Shadow Stacks. Future patches will utilize this feature for shadow stack support in KVM, so add a CPU feature flags for Shadow Stacks (CPUID.(EAX=7,ECX=0):ECX[bit 7]). To protect shadow stack state from malicious modification, the registers are only accessible in supervisor mode. This implementation context-switches the registers with XSAVES. Make X86_FEATURE_SHSTK depend on XSAVES. The shadow stack feature, enumerated by the CPUID bit described above, encompasses both supervisor and userspace support for shadow stack. In near future patches, only userspace shadow stack will be enabled. In expectation of future supervisor shadow stack support, create a software CPU capability to enumerate kernel utilization of userspace shadow stack support. This user shadow stack bit should depend on the HW "shstk" capability and that logic will be implemented in future patches. Co-developed-by: Yu-cheng Yu Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- arch/x86/include/asm/cpufeatures.h | 2 ++ arch/x86/include/asm/disabled-features.h | 8 +++++++- arch/x86/kernel/cpu/cpuid-deps.c | 1 + 3 files changed, 10 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index cb8ca46213be..d7215c8b7923 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -308,6 +308,7 @@ #define X86_FEATURE_MSR_TSX_CTRL (11*32+20) /* "" MSR IA32_TSX_CTRL (Intel) implemented */ #define X86_FEATURE_SMBA (11*32+21) /* "" Slow Memory Bandwidth Allocation */ #define X86_FEATURE_BMEC (11*32+22) /* "" Bandwidth Monitoring Event Configuration */ +#define X86_FEATURE_USER_SHSTK (11*32+23) /* Shadow stack support for user mode applications */ /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */ #define X86_FEATURE_AVX_VNNI (12*32+ 4) /* AVX VNNI instructions */ @@ -380,6 +381,7 @@ #define X86_FEATURE_OSPKE (16*32+ 4) /* OS Protection Keys Enable */ #define X86_FEATURE_WAITPKG (16*32+ 5) /* UMONITOR/UMWAIT/TPAUSE Instructions */ #define X86_FEATURE_AVX512_VBMI2 (16*32+ 6) /* Additional AVX512 Vector Bit Manipulation Instructions */ +#define X86_FEATURE_SHSTK (16*32+ 7) /* "" Shadow stack */ #define X86_FEATURE_GFNI (16*32+ 8) /* Galois Field New Instructions */ #define X86_FEATURE_VAES (16*32+ 9) /* Vector AES */ #define X86_FEATURE_VPCLMULQDQ (16*32+10) /* Carry-Less Multiplication Double Quadword */ diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h index fafe9be7a6f4..b9c7eae2e70f 100644 --- a/arch/x86/include/asm/disabled-features.h +++ b/arch/x86/include/asm/disabled-features.h @@ -105,6 +105,12 @@ # define DISABLE_TDX_GUEST (1 << (X86_FEATURE_TDX_GUEST & 31)) #endif +#ifdef CONFIG_X86_USER_SHADOW_STACK +#define DISABLE_USER_SHSTK 0 +#else +#define DISABLE_USER_SHSTK (1 << (X86_FEATURE_USER_SHSTK & 31)) +#endif + /* * Make sure to add features to the correct mask */ @@ -120,7 +126,7 @@ #define DISABLED_MASK9 (DISABLE_SGX) #define DISABLED_MASK10 0 #define DISABLED_MASK11 (DISABLE_RETPOLINE|DISABLE_RETHUNK|DISABLE_UNRET| \ - DISABLE_CALL_DEPTH_TRACKING) + DISABLE_CALL_DEPTH_TRACKING|DISABLE_USER_SHSTK) #define DISABLED_MASK12 (DISABLE_LAM) #define DISABLED_MASK13 0 #define DISABLED_MASK14 0 diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c index f6748c8bd647..e462c1d3800a 100644 --- a/arch/x86/kernel/cpu/cpuid-deps.c +++ b/arch/x86/kernel/cpu/cpuid-deps.c @@ -81,6 +81,7 @@ static const struct cpuid_dep cpuid_deps[] = { { X86_FEATURE_XFD, X86_FEATURE_XSAVES }, { X86_FEATURE_XFD, X86_FEATURE_XGETBV1 }, { X86_FEATURE_AMX_TILE, X86_FEATURE_XFD }, + { X86_FEATURE_SHSTK, X86_FEATURE_XSAVES }, {} }; From patchwork Tue Jun 13 00:10:35 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 106996 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp223496vqr; Mon, 12 Jun 2023 17:47:52 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ6+lPL1z65lZwOhWE5liKom3Lkwjm5f4beEEm/0vAfygM/oAlZg2Nnnp3CYJIIfg4Kk2uYc X-Received: by 2002:a17:907:705:b0:96f:cde5:5f5e with SMTP id xb5-20020a170907070500b0096fcde55f5emr10419963ejb.29.1686617272587; Mon, 12 Jun 2023 17:47:52 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1686617272; cv=none; d=google.com; s=arc-20160816; b=Uzh7ZKsT2CNDt4W3KWC0DuaZfQPK99WlO/pYUGc7RzuVuwLLKdZ8EduRnwPxO04pX9 kRhvzwEaO2IYAMVurf/K4IvIlOqB/EAUnD8CRlCfbRZOcHZCJElS8vHMxGKwCio151RA HpfMd8LnEIUHtx3kcaERTjU9LVZekkWxvyDppH8umLx87JqYgN94ZdpoLrJxebhtZmzF N+Z6KkvgSj2146eaCEIbQTegrj+v5O7jxurTpEzGkcLWxmCqM/90IqJRjZXxHvMYTUmo uIV8/CxsfaUZivG+WH9QvoP1zOIb43WHrIHm1AVfqTS9eYr/jeSlWbbL3Mlz3C6nzGt9 wpcQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=B5LlQiTXVucM+z0G5ThLSCfU2JFVpdZqLcXL708P6y0=; b=GuByIalvAfTTt/OxmAAEU3uhyYMOAJAifM2O0RMCHDIjrvz1fexi7wyNQ7/Y+ur76i Cl+ektYquVzRtphK8Oe+zmbD/nH6YK5/xkR4qwAbFgHdOSveoGyhIHeEe0XGiI87TXKd 5LsO7QxzRThL+eLE/VSEpXRTSI6H5kJThq3t/blCZk24iGD/JW+wRt5Gyu4xUwqOKXRw MPshGTvTSki9BFGRCS3/spHscJqIsUrbudUTg2m4Uyr4ZlQF3D7FokyZufbY8HkSiqTq CNv5CO5QLI1YJROK92/12td8tk0iRnRuLcYhOKvLVX5mfO6a2Q5JxRzZRGzC6GBoDvuy zWXg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=QefwEUnA; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id h5-20020a170906590500b0097668549c75si5989890ejq.391.2023.06.12.17.47.26; Mon, 12 Jun 2023 17:47:52 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=QefwEUnA; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232117AbjFMAMu (ORCPT + 99 others); Mon, 12 Jun 2023 20:12:50 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51902 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238592AbjFMAMX (ORCPT ); Mon, 12 Jun 2023 20:12:23 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A8371173F; Mon, 12 Jun 2023 17:12:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615138; x=1718151138; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=YP96Yn7FdNN68fPWKmeUZZw0SU1yEnE0kYU9/y2Lvpc=; b=QefwEUnAhkJ5xF6fSIomS1+EFsRVdKxpgTDF6YIqTUjTxp1QLWoEDZgU Ut4C46TN409YUgx2wYW23j+eUexqKg+lgZzfvmE1z5fR36mxceMNqFHHq a2IPzXlEkksvNKvkuKDC69SHpzFYFoubFjXwZ8oFHQ6hxiR+Eto7Vlws5 cyhcqXPmQa+ztulNzwkrRbGsymXV79MYeiJ67J1jNEe0DyXMyOM14UddW 2NzNIrPVGdzrsC+Uaed8ONPncvdhQMqVyByAWkI041ZXXYVwxDBbVq20V 7JMZsORUM1yMQGT07MmegA2/NprzxMZBvlwfKQrtsPbdEjWycVyJPuelV w==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361556844" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361556844" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:15 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835671000" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835671000" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:14 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu , Pengfei Xu Subject: [PATCH v9 09/42] x86/mm: Move pmd_write(), pud_write() up in the file Date: Mon, 12 Jun 2023 17:10:35 -0700 Message-Id: <20230613001108.3040476-10-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1768546393363310626?= X-GMAIL-MSGID: =?utf-8?q?1768546393363310626?= To prepare the introduction of _PAGE_SAVED_DIRTY, move pmd_write() and pud_write() up in the file, so that they can be used by other helpers below. No functional changes. Co-developed-by: Yu-cheng Yu Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Reviewed-by: Kirill A. Shutemov Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- arch/x86/include/asm/pgtable.h | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 112e6060eafa..768ee46782c9 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -160,6 +160,18 @@ static inline int pte_write(pte_t pte) return pte_flags(pte) & _PAGE_RW; } +#define pmd_write pmd_write +static inline int pmd_write(pmd_t pmd) +{ + return pmd_flags(pmd) & _PAGE_RW; +} + +#define pud_write pud_write +static inline int pud_write(pud_t pud) +{ + return pud_flags(pud) & _PAGE_RW; +} + static inline int pte_huge(pte_t pte) { return pte_flags(pte) & _PAGE_PSE; @@ -1120,12 +1132,6 @@ extern int pmdp_clear_flush_young(struct vm_area_struct *vma, unsigned long address, pmd_t *pmdp); -#define pmd_write pmd_write -static inline int pmd_write(pmd_t pmd) -{ - return pmd_flags(pmd) & _PAGE_RW; -} - #define __HAVE_ARCH_PMDP_HUGE_GET_AND_CLEAR static inline pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm, unsigned long addr, pmd_t *pmdp) @@ -1155,12 +1161,6 @@ static inline void pmdp_set_wrprotect(struct mm_struct *mm, clear_bit(_PAGE_BIT_RW, (unsigned long *)pmdp); } -#define pud_write pud_write -static inline int pud_write(pud_t pud) -{ - return pud_flags(pud) & _PAGE_RW; -} - #ifndef pmdp_establish #define pmdp_establish pmdp_establish static inline pmd_t pmdp_establish(struct vm_area_struct *vma, From patchwork Tue Jun 13 00:10:36 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 106961 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp211227vqr; Mon, 12 Jun 2023 17:15:02 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ4IDzElbCH3pMhwNfs+3WUCpD1ery/6TDFPql8aI2LRFG3d6GKjtYYyjFDuIkLZJIlIR/50 X-Received: by 2002:a17:906:d555:b0:978:b1fe:99b8 with SMTP id cr21-20020a170906d55500b00978b1fe99b8mr11780934ejc.56.1686615301925; Mon, 12 Jun 2023 17:15:01 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1686615301; cv=none; d=google.com; s=arc-20160816; b=haU691hkBlOlB80vBhpaIWmNF0ZelV0kCu73BwwUP8qCHuTUACdrnc4wOlHpoF/948 4XZ/JRG7YQFomts9N5Xu09KUg4TIzFDLx5+Jh2HpaHw0euyCdnQTe5suHwxyEib18XpD kh8LqVlUGH+mKJWoSci4XGpnO5x2kLJijBk0FQ+89AmN4mpw0uah9H1zM+tm6R9ilmlB 3fQ82J8oFendtYZsTKg1kLPecoRzHwGCepm54Pe5nsfA4QuZoaKAeoT0cSGzm5ZT4JZe LR/CN4zmf1JrO6/JSzw33hnIR6uxQaJMhCBkhnTZQVNnOBNATHTGIpC1eh02D7YfGVhS xHnw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=0S7yGeGIp3jHFXqgB9BBQQG2+YPsvsrr3HVQmGBoKGw=; b=h4pxBD0tW9o0EwdA7s6ZlotxoMW6osL1P3P80qcx1a4itXrRSE1d61l0oJ78udBfTW 6/3MWbSkjY94ST5M9nU8WwbPcfidO9AAPPCYLkf1UqzPaaRqDgI3nHR+5WqxM3H1+alV d9eVn8//dZPkNb2aAKJcHQq1oN0e+Gus9/PN1Y09C8sEB+Ov5kTYoJ7URe8KxZjiiqW/ 0sOFKycA4vMd14TTBzqb0cnljJp/GZazRNaspY0tzaT4NGYzW6ng9s5qnoktKLGujpJK aMoBeTSgEifPWW03xDu8wadEEF5MEzErFii/FLVhlN0BbyXgob9dhvCBMRw7oW8lCrC9 tQ6Q== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=loB2opO6; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id v7-20020a056402184700b005149bc2c0absi6707536edy.634.2023.06.12.17.14.36; Mon, 12 Jun 2023 17:15:01 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=loB2opO6; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238989AbjFMANO (ORCPT + 99 others); Mon, 12 Jun 2023 20:13:14 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51940 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238728AbjFMAM0 (ORCPT ); Mon, 12 Jun 2023 20:12:26 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BF43F199C; Mon, 12 Jun 2023 17:12:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615140; x=1718151140; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=n6G5K0m9WwVLI5yH5jOOTWumHDLO47Q1gA3TW71PVjE=; b=loB2opO6O87gwVNZFycyLg4y6ZdtdfzU/qRjxnCAsaar9EBTCJhlX1Tp OnROA2J8qh+hxo9aqhakfYp4Yw1MC2URHudZd6tFjsmthGKQ1GO0khw5v VAvOscEm1j67n7YylYi/qBbnDLsREinH9Tkd57skPB7xdmeLnXCO3A3kc LI2jHOVokAd6dBHwQaZkvizEUdSWXc5Sbo+Y1W/MO2KgrfvYaUj2x//tA pRczdsFGjVGUH6ELMBo7LAKKNjJT1pA+4glbEwwrngOMKttjp9IjdXYg9 lK7GrcDS8cc+OEQMyd6iV+q/TPG7KBWgQC5OLk3Cs8h5awfz9OvLGwke+ g==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361556869" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361556869" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:16 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835671005" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835671005" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:15 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu , Pengfei Xu Subject: [PATCH v9 10/42] x86/mm: Introduce _PAGE_SAVED_DIRTY Date: Mon, 12 Jun 2023 17:10:36 -0700 Message-Id: <20230613001108.3040476-11-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1768544326548979576?= X-GMAIL-MSGID: =?utf-8?q?1768544326548979576?= Some OSes have a greater dependence on software available bits in PTEs than Linux. That left the hardware architects looking for a way to represent a new memory type (shadow stack) within the existing bits. They chose to repurpose a lightly-used state: Write=0,Dirty=1. So in order to support shadow stack memory, Linux should avoid creating memory with this PTE bit combination unless it intends for it to be shadow stack. The reason it's lightly used is that Dirty=1 is normally set by HW _before_ a write. A write with a Write=0 PTE would typically only generate a fault, not set Dirty=1. Hardware can (rarely) both set Dirty=1 *and* generate the fault, resulting in a Write=0,Dirty=1 PTE. Hardware which supports shadow stacks will no longer exhibit this oddity. So that leaves Write=0,Dirty=1 PTEs created in software. To avoid inadvertently created shadow stack memory, in places where Linux normally creates Write=0,Dirty=1, it can use the software-defined _PAGE_SAVED_DIRTY in place of the hardware _PAGE_DIRTY. In other words, whenever Linux needs to create Write=0,Dirty=1, it instead creates Write=0,SavedDirty=1 except for shadow stack, which is Write=0,Dirty=1. There are six bits left available to software in the 64-bit PTE after consuming a bit for _PAGE_SAVED_DIRTY. For 32 bit, the same bit as _PAGE_BIT_UFFD_WP is used, since user fault fd is not supported on 32 bit. This leaves one unused software bit on 32 bit (_PAGE_BIT_SOFT_DIRTY, as this is also not supported on 32 bit). Implement only the infrastructure for _PAGE_SAVED_DIRTY. Changes to actually begin creating _PAGE_SAVED_DIRTY PTEs will follow once other pieces are in place. Since this SavedDirty shifting is done for all x86 CPUs, this leaves the possibility for the hardware oddity to still create Write=0,Dirty=1 PTEs in rare cases. Since these CPUs also don't support shadow stack, this will be harmless as it was before the introduction of SavedDirty. Implement the shifting logic to be branchless. Embed the logic of whether to do the shifting (including checking the Write bits) so that it can be called by future callers that would otherwise need additional branching logic. This efficiency allows the logic of when to do the shifting to be centralized, making the code easier to reason about. Co-developed-by: Yu-cheng Yu Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- v9: - Use bit shifting instead of conditionals (Linus) - Make saved dirty bit unconditional (Linus) - Add 32 bit support to make it extra unconditional - Don't re-order PAGE flags (Dave) --- arch/x86/include/asm/pgtable.h | 83 ++++++++++++++++++++++++++++ arch/x86/include/asm/pgtable_types.h | 38 +++++++++++-- arch/x86/include/asm/tlbflush.h | 3 +- 3 files changed, 119 insertions(+), 5 deletions(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 768ee46782c9..a95f872c7429 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -301,6 +301,53 @@ static inline pte_t pte_clear_flags(pte_t pte, pteval_t clear) return native_make_pte(v & ~clear); } +/* + * Write protection operations can result in Dirty=1,Write=0 PTEs. But in the + * case of X86_FEATURE_USER_SHSTK, these PTEs denote shadow stack memory. So + * when creating dirty, write-protected memory, a software bit is used: + * _PAGE_BIT_SAVED_DIRTY. The following functions take a PTE and transition the + * Dirty bit to SavedDirty, and vice-vesra. + * + * This shifting is only done if needed. In the case of shifting + * Dirty->SavedDirty, the condition is if the PTE is Write=0. In the case of + * shifting SavedDirty->Dirty, the condition is Write=1. + */ +static inline unsigned long mksaveddirty_shift(unsigned long v) +{ + unsigned long cond = !(v & (1 << _PAGE_BIT_RW)); + + v |= ((v >> _PAGE_BIT_DIRTY) & cond) << _PAGE_BIT_SAVED_DIRTY; + v &= ~(cond << _PAGE_BIT_DIRTY); + + return v; +} + +static inline unsigned long clear_saveddirty_shift(unsigned long v) +{ + unsigned long cond = !!(v & (1 << _PAGE_BIT_RW)); + + v |= ((v >> _PAGE_BIT_SAVED_DIRTY) & cond) << _PAGE_BIT_DIRTY; + v &= ~(cond << _PAGE_BIT_SAVED_DIRTY); + + return v; +} + +static inline pte_t pte_mksaveddirty(pte_t pte) +{ + pteval_t v = native_pte_val(pte); + + v = mksaveddirty_shift(v); + return native_make_pte(v); +} + +static inline pte_t pte_clear_saveddirty(pte_t pte) +{ + pteval_t v = native_pte_val(pte); + + v = clear_saveddirty_shift(v); + return native_make_pte(v); +} + static inline pte_t pte_wrprotect(pte_t pte) { return pte_clear_flags(pte, _PAGE_RW); @@ -413,6 +460,24 @@ static inline pmd_t pmd_clear_flags(pmd_t pmd, pmdval_t clear) return native_make_pmd(v & ~clear); } +/* See comments above mksaveddirty_shift() */ +static inline pmd_t pmd_mksaveddirty(pmd_t pmd) +{ + pmdval_t v = native_pmd_val(pmd); + + v = mksaveddirty_shift(v); + return native_make_pmd(v); +} + +/* See comments above mksaveddirty_shift() */ +static inline pmd_t pmd_clear_saveddirty(pmd_t pmd) +{ + pmdval_t v = native_pmd_val(pmd); + + v = clear_saveddirty_shift(v); + return native_make_pmd(v); +} + static inline pmd_t pmd_wrprotect(pmd_t pmd) { return pmd_clear_flags(pmd, _PAGE_RW); @@ -484,6 +549,24 @@ static inline pud_t pud_clear_flags(pud_t pud, pudval_t clear) return native_make_pud(v & ~clear); } +/* See comments above mksaveddirty_shift() */ +static inline pud_t pud_mksaveddirty(pud_t pud) +{ + pudval_t v = native_pud_val(pud); + + v = mksaveddirty_shift(v); + return native_make_pud(v); +} + +/* See comments above mksaveddirty_shift() */ +static inline pud_t pud_clear_saveddirty(pud_t pud) +{ + pudval_t v = native_pud_val(pud); + + v = clear_saveddirty_shift(v); + return native_make_pud(v); +} + static inline pud_t pud_mkold(pud_t pud) { return pud_clear_flags(pud, _PAGE_ACCESSED); diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h index 447d4bee25c4..ee6f8e57e115 100644 --- a/arch/x86/include/asm/pgtable_types.h +++ b/arch/x86/include/asm/pgtable_types.h @@ -21,7 +21,8 @@ #define _PAGE_BIT_SOFTW2 10 /* " */ #define _PAGE_BIT_SOFTW3 11 /* " */ #define _PAGE_BIT_PAT_LARGE 12 /* On 2MB or 1GB pages */ -#define _PAGE_BIT_SOFTW4 58 /* available for programmer */ +#define _PAGE_BIT_SOFTW4 57 /* available for programmer */ +#define _PAGE_BIT_SOFTW5 58 /* available for programmer */ #define _PAGE_BIT_PKEY_BIT0 59 /* Protection Keys, bit 1/4 */ #define _PAGE_BIT_PKEY_BIT1 60 /* Protection Keys, bit 2/4 */ #define _PAGE_BIT_PKEY_BIT2 61 /* Protection Keys, bit 3/4 */ @@ -34,6 +35,13 @@ #define _PAGE_BIT_SOFT_DIRTY _PAGE_BIT_SOFTW3 /* software dirty tracking */ #define _PAGE_BIT_DEVMAP _PAGE_BIT_SOFTW4 +#ifdef CONFIG_X86_64 +#define _PAGE_BIT_SAVED_DIRTY _PAGE_BIT_SOFTW5 /* Saved Dirty bit */ +#else +/* Shared with _PAGE_BIT_UFFD_WP which is not supported on 32 bit */ +#define _PAGE_BIT_SAVED_DIRTY _PAGE_BIT_SOFTW2 /* Saved Dirty bit */ +#endif + /* If _PAGE_BIT_PRESENT is clear, we use these: */ /* - if the user mapped it with PROT_NONE; pte_present gives true */ #define _PAGE_BIT_PROTNONE _PAGE_BIT_GLOBAL @@ -117,6 +125,22 @@ #define _PAGE_SOFTW4 (_AT(pteval_t, 0)) #endif +/* + * The hardware requires shadow stack to be Write=0,Dirty=1. However, + * there are valid cases where the kernel might create read-only PTEs that + * are dirty (e.g., fork(), mprotect(), uffd-wp(), soft-dirty tracking). In + * this case, the _PAGE_SAVED_DIRTY bit is used instead of the HW-dirty bit, + * to avoid creating a wrong "shadow stack" PTEs. Such PTEs have + * (Write=0,SavedDirty=1,Dirty=0) set. + */ +#ifdef CONFIG_X86_64 +#define _PAGE_SAVED_DIRTY (_AT(pteval_t, 1) << _PAGE_BIT_SAVED_DIRTY) +#else +#define _PAGE_SAVED_DIRTY (_AT(pteval_t, 0)) +#endif + +#define _PAGE_DIRTY_BITS (_PAGE_DIRTY | _PAGE_SAVED_DIRTY) + #define _PAGE_PROTNONE (_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE) /* @@ -125,9 +149,9 @@ * instance, and is *not* included in this mask since * pte_modify() does modify it. */ -#define _PAGE_CHG_MASK (PTE_PFN_MASK | _PAGE_PCD | _PAGE_PWT | \ - _PAGE_SPECIAL | _PAGE_ACCESSED | _PAGE_DIRTY | \ - _PAGE_SOFT_DIRTY | _PAGE_DEVMAP | _PAGE_ENC | \ +#define _PAGE_CHG_MASK (PTE_PFN_MASK | _PAGE_PCD | _PAGE_PWT | \ + _PAGE_SPECIAL | _PAGE_ACCESSED | _PAGE_DIRTY_BITS | \ + _PAGE_SOFT_DIRTY | _PAGE_DEVMAP | _PAGE_ENC | \ _PAGE_UFFD_WP) #define _HPAGE_CHG_MASK (_PAGE_CHG_MASK | _PAGE_PSE) @@ -188,10 +212,16 @@ enum page_cache_mode { #define __PAGE_KERNEL (__PP|__RW| 0|___A|__NX|___D| 0|___G) #define __PAGE_KERNEL_EXEC (__PP|__RW| 0|___A| 0|___D| 0|___G) + +/* + * Page tables needs to have Write=1 in order for any lower PTEs to be + * writable. This includes shadow stack memory (Write=0, Dirty=1) + */ #define _KERNPG_TABLE_NOENC (__PP|__RW| 0|___A| 0|___D| 0| 0) #define _KERNPG_TABLE (__PP|__RW| 0|___A| 0|___D| 0| 0| _ENC) #define _PAGE_TABLE_NOENC (__PP|__RW|_USR|___A| 0|___D| 0| 0) #define _PAGE_TABLE (__PP|__RW|_USR|___A| 0|___D| 0| 0| _ENC) + #define __PAGE_KERNEL_RO (__PP| 0| 0|___A|__NX|___D| 0|___G) #define __PAGE_KERNEL_ROX (__PP| 0| 0|___A| 0|___D| 0|___G) #define __PAGE_KERNEL_NOCACHE (__PP|__RW| 0|___A|__NX|___D| 0|___G| __NC) diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h index 75bfaa421030..965659d2c965 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -293,7 +293,8 @@ static inline bool pte_flags_need_flush(unsigned long oldflags, const pteval_t flush_on_clear = _PAGE_DIRTY | _PAGE_PRESENT | _PAGE_ACCESSED; const pteval_t software_flags = _PAGE_SOFTW1 | _PAGE_SOFTW2 | - _PAGE_SOFTW3 | _PAGE_SOFTW4; + _PAGE_SOFTW3 | _PAGE_SOFTW4 | + _PAGE_SAVED_DIRTY; const pteval_t flush_on_change = _PAGE_RW | _PAGE_USER | _PAGE_PWT | _PAGE_PCD | _PAGE_PSE | _PAGE_GLOBAL | _PAGE_PAT | _PAGE_PAT_LARGE | _PAGE_PKEY_BIT0 | _PAGE_PKEY_BIT1 | From patchwork Tue Jun 13 00:10:37 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 106976 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp221551vqr; Mon, 12 Jun 2023 17:42:56 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ6oY+XtNOi2JwzDheNJdBK3JpJOEBTd9j1/Mnc4hYF54Jm2cMxVQMBTmqQi8EgjhY4qnivg X-Received: by 2002:a17:907:a40f:b0:978:ae78:a77f with SMTP id sg15-20020a170907a40f00b00978ae78a77fmr9844497ejc.21.1686616976065; Mon, 12 Jun 2023 17:42:56 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1686616976; cv=none; d=google.com; s=arc-20160816; b=OvDk4AxBMiHSu4NLRLskC17AqKl5C5cqitY1xSpUlCWWrinpJH0rPLDsAq6CH/xVLt OJA5h/+0b5ttgkdY12TLlaKlMLz+5mXjeR/a7xQg0yLIhNbOB0kBKQlUAMpuarym9WH+ uDKB7kWB45hhTmQWw3ViLU0sq2Ci7Uh0Px2votU0mq63RdM0XxIW39qMzbOfe1E180uX bVLsGXBMGVZK7AEpjF1FWWTPcZB4qzp2vmdv4a18RH7Gc0buNcZtcM1f2iU3wcyjY2Ku 9hutwmqsF3K7zmZB63375kNgAxwtKBj8Ga+f8u6DdIwopJkEBzgpFbn8+F9yvE9S59tT GHeA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=PX1Ox8jo3hs6viUiDsDzJZ8BE9pewzLRSmXAF0uR7OU=; b=jvKeLxQWHDJUmzbtqod5Zqm3BXky1Jzswkr2gd6IF2xLqGfs+7BIYE4frgFRs0qqkm iFqmEenLl2d/9oY0Sg3eNqatFGMcrIGv1dY9cyxFXKVrpegeVFYbwRDFn4hYLHPqYAwS gNRvnbuu6xRARFpKf9lKluTkDpNOzvUs8nZwGXlS1HbHh5ICynqaOzgV0JANOqhh8djq 4F3EnUo5daQRdvj5fA0gOgkV6ZRlBv3f/vF03Keq1p9YcD6RCzNBEUoY4LERmUrW6VS7 GCXu9kVdmKrkwr2q4tBzQE79cRoUkv3hTUoKqNwQqE3QKHFjmmKmgjRuk2P5t3bjcvIt GfpQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=QY28iBE1; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id l4-20020a170906644400b0094f442c8a6dsi3305360ejn.291.2023.06.12.17.42.29; Mon, 12 Jun 2023 17:42:56 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=QY28iBE1; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238886AbjFMANA (ORCPT + 99 others); Mon, 12 Jun 2023 20:13:00 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51934 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238695AbjFMAMZ (ORCPT ); Mon, 12 Jun 2023 20:12:25 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 33AAE1996; Mon, 12 Jun 2023 17:12:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615139; x=1718151139; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=m52H3JZHS5dKvCQFGYyapfckCiGOGTRhvdDeSntBBxk=; b=QY28iBE1VRwIzr0w1+d1sPUyhwOeLbtzUqxmISgNndXNT8KXB+rmz4MG nEpBMDa1hMXmF0tm/naPI4x8811U9wPlZzw/w1R1q7TZnn4fb1vbT9HDl 0Fr/MY8kSvDWeJnFrPMrgivTbTcVrI4/QVo5WafbE/II+DSA/rUvi1LJF ARapMg4l5qGppyWqsVO1mnDEOrLWIrGxmEPGdRKRwJagiVWDjDUHk/qHk uOzeWvVb9VPd0qgsSXkCRKqQgU15spXPqPIh34NCpo4+dEinnJyog9jVh oNco9BeRWA0AF/Om/StG6V+JDnEZp2uwHKHi/JgFNDmmT85cp5P9x8QV5 A==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361556890" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361556890" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:17 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835671009" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835671009" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:16 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu , Pengfei Xu Subject: [PATCH v9 11/42] x86/mm: Update ptep/pmdp_set_wrprotect() for _PAGE_SAVED_DIRTY Date: Mon, 12 Jun 2023 17:10:37 -0700 Message-Id: <20230613001108.3040476-12-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1768546082241749541?= X-GMAIL-MSGID: =?utf-8?q?1768546082241749541?= When shadow stack is in use, Write=0,Dirty=1 PTE are preserved for shadow stack. Copy-on-write PTEs then have Write=0,SavedDirty=1. When a PTE goes from Write=1,Dirty=1 to Write=0,SavedDirty=1, it could become a transient shadow stack PTE in two cases: 1. Some processors can start a write but end up seeing a Write=0 PTE by the time they get to the Dirty bit, creating a transient shadow stack PTE. However, this will not occur on processors supporting shadow stack, and a TLB flush is not necessary. 2. When _PAGE_DIRTY is replaced with _PAGE_SAVED_DIRTY non-atomically, a transient shadow stack PTE can be created as a result. Prevent the second case when doing a write protection and Dirty->SavedDirty shift at the same time with a CMPXCHG loop. The first case Note, in the PAE case CMPXCHG will need to operate on 8 byte, but try_cmpxchg() will not use CMPXCHG8B, so it cannot operate on a full PAE PTE. However the exiting logic is not operating on a full 8 byte region either, and relies on the fact that the Write bit is in the first 4 bytes when doing the clear_bit(). Since both the Dirty, SavedDirty and Write bits are in the first 4 bytes, casting to a long will be similar to the existing behavior which also casts to a long. Dave Hansen, Jann Horn, Andy Lutomirski, and Peter Zijlstra provided many insights to the issue. Jann Horn provided the CMPXCHG solution. Co-developed-by: Yu-cheng Yu Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- v9: - Use bit shifting helpers that don't need any extra conditional logic. (Linus) - Always do the SavedDirty shifting (Linus) --- arch/x86/include/asm/pgtable.h | 24 ++++++++++++++++++++++-- 1 file changed, 22 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index a95f872c7429..99b54ab0a919 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -1189,7 +1189,17 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm, static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, pte_t *ptep) { - clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte); + /* + * Avoid accidentally creating shadow stack PTEs + * (Write=0,Dirty=1). Use cmpxchg() to prevent races with + * the hardware setting Dirty=1. + */ + pte_t old_pte, new_pte; + + old_pte = READ_ONCE(*ptep); + do { + new_pte = pte_wrprotect(old_pte); + } while (!try_cmpxchg((long *)&ptep->pte, (long *)&old_pte, *(long *)&new_pte)); } #define flush_tlb_fix_spurious_fault(vma, address, ptep) do { } while (0) @@ -1241,7 +1251,17 @@ static inline pud_t pudp_huge_get_and_clear(struct mm_struct *mm, static inline void pmdp_set_wrprotect(struct mm_struct *mm, unsigned long addr, pmd_t *pmdp) { - clear_bit(_PAGE_BIT_RW, (unsigned long *)pmdp); + /* + * Avoid accidentally creating shadow stack PTEs + * (Write=0,Dirty=1). Use cmpxchg() to prevent races with + * the hardware setting Dirty=1. + */ + pmd_t old_pmd, new_pmd; + + old_pmd = READ_ONCE(*pmdp); + do { + new_pmd = pmd_wrprotect(old_pmd); + } while (!try_cmpxchg((long *)pmdp, (long *)&old_pmd, *(long *)&new_pmd)); } #ifndef pmdp_establish From patchwork Tue Jun 13 00:10:38 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 106985 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp222679vqr; Mon, 12 Jun 2023 17:45:49 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ796C3nfmxi/j/Nk0WSLsPb+s2Vt2hgD0Hm2iA2CJiibOi1PsWMvWb8rKawKE+sgmJDBAwj X-Received: by 2002:a17:907:162a:b0:97e:ab93:b248 with SMTP id hb42-20020a170907162a00b0097eab93b248mr6583033ejc.47.1686617149071; Mon, 12 Jun 2023 17:45:49 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1686617149; cv=none; d=google.com; s=arc-20160816; b=bNKJyi/z1L3bgtam0IG6q6JeNt5pL7pGcZOLPTTX9P2sYugxnt9/H6Dl6Cgnqof+V/ obIbgbkoRedzBrLDGtdXttRZ0KR1R6bv/Gwj0VnzulmWyMkzq027RPhdW824CuG7/Vzh Xr+UQ2O+Hv/XQfxfg43Lp2L2c5SuGFBnkMutR7nX8pJFZmla0shwAyEEaHCmYGZebzcj 76W2/mgehrZFY4JJF8jm6j7FXUavoZnHf9xCEM0qapzoLuNJ3pKbYUOpy/p5Gj0bWO4C TkPCYWCIYo9ez3+u0fbpLYOKhArqx4wazqX0t3XGcnP1FWf2ucnT34vjhE6ZLgVVrPap 2fFQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=w6xmAlo1/6JBYZEdIzULXEdqQeAR1+gtIuehh+1XHvc=; b=LRELwinisyMBWoYw8riTAyf3YFUYJ589Za6JvK9lHP+sxkCedYeZQ/7d/WyZ9cbHBi wjhcUXYFCyCPrNQSUFR0Hw2Ld5+HLQ8gREFcrsgyTI27rW4T8ompz2UCbMrXIkTL8iMI izE9vKDxN7KXMUJr8Phny5E+eW7wkXHidrE+lHg/jMm5cAruKGx3aJBzIQZlw5expVw6 RNt9uikPvDGgk4fBn5a+t74wZY0Zbj+iE50YUKrMSWJjVE03Dl4MKxYNwUjJ+57IkM+K ekpHn710srPh2mwTT6NjJu5ewTrDDWuWbpA6wl9q4yfe9QyLyzUgFFGdtnGoPHJcC9QQ aRyQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=goZUV0P6; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id lu32-20020a170906fae000b00977e729a2f7si6157831ejb.951.2023.06.12.17.45.04; Mon, 12 Jun 2023 17:45:49 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=goZUV0P6; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238972AbjFMANG (ORCPT + 99 others); Mon, 12 Jun 2023 20:13:06 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51950 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238803AbjFMAM2 (ORCPT ); Mon, 12 Jun 2023 20:12:28 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9DF31197; Mon, 12 Jun 2023 17:12:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615143; x=1718151143; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=7eVLFaUcP7DtfYYGhz2IcxmOiAMntCwFG2k86EQ1hPY=; b=goZUV0P6gFKBYGIKAOm/P0F/uIFo+IlIBLC+CXnTZAKZTtfIaxNeMohx 4tz0GWKQ+EVG3QOAW9wWITARrbKuhMNG7ESA1iOIoDDNSa8YJNFnYnRsC r3EyEpyZiimgCLOXcgaUoOhdhZlLBft3QayzYw2AEXABMCUpSkUEC+9Y7 SIfHSFCpQ6pB5O7wngp8uqidh3HioAcL+B+VIMlI7vqBFk4eC/g0K0p/z I7RSv3/BjQFHzw3fJuYNhy46j8fl79eSiTTQOD/uUY4Dt2QqXIm+9KwHi 2hzm9LCaHCzBcNOZ5IGzKZIMtZwZrcoYHMJJofZoYNab2LUTKsB2D/P24 A==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361556918" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361556918" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:18 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835671013" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835671013" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:17 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu , Pengfei Xu Subject: [PATCH v9 12/42] x86/mm: Start actually marking _PAGE_SAVED_DIRTY Date: Mon, 12 Jun 2023 17:10:38 -0700 Message-Id: <20230613001108.3040476-13-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1768546263768077137?= X-GMAIL-MSGID: =?utf-8?q?1768546263768077137?= The recently introduced _PAGE_SAVED_DIRTY should be used instead of the HW Dirty bit whenever a PTE is Write=0, in order to not inadvertently create shadow stack PTEs. Update pte_mk*() helpers to do this, and apply the same changes to pmd and pud. Since there is no x86 version of pte_mkwrite() to hold this arch specific logic, create one. Add it to x86/mm/pgtable.c instead of x86/asm/include/pgtable.h as future patches will require it to live in pgtable.c and it will make the diff easier for reviewers. Since CPUs without shadow stack support could create Write=0,Dirty=1 PTEs, only return true for pte_shstk() if the CPU also supports shadow stack. This will prevent these HW creates PTEs as showing as true for pte_write(). For pte_modify() this is a bit trickier. It takes a "raw" pgprot_t which was not necessarily created with any of the existing PTE bit helpers. That means that it can return a pte_t with Write=0,Dirty=1, a shadow stack PTE, when it did not intend to create one. Modify it to also move _PAGE_DIRTY to _PAGE_SAVED_DIRTY. To avoid creating Write=0,Dirty=1 PTEs, pte_modify() needs to avoid: 1. Marking Write=0 PTEs Dirty=1 2. Marking Dirty=1 PTEs Write=0 The first case cannot happen as the existing behavior of pte_modify() is to filter out any Dirty bit passed in newprot. Handle the second case by shifting _PAGE_DIRTY=1 to _PAGE_SAVED_DIRTY=1 if the PTE was write protected by the pte_modify() call. Apply the same changes to pmd_modify(). Co-developed-by: Yu-cheng Yu Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- v9: - Use bit shifting helpers that don't need any extra conditional logic. (Linus) - Handle the harmless Write=0->Write=1 pte_modify() case with the shifting helpers. - Don't ever return true for pte_shstk() if the CPU does not support shadow stack. --- arch/x86/include/asm/pgtable.h | 151 ++++++++++++++++++++++++++++----- arch/x86/mm/pgtable.c | 14 +++ 2 files changed, 144 insertions(+), 21 deletions(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 99b54ab0a919..d8724f5b1202 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -124,9 +124,15 @@ extern pmdval_t early_pmd_flags; * The following only work if pte_present() is true. * Undefined behaviour if not.. */ -static inline int pte_dirty(pte_t pte) +static inline bool pte_dirty(pte_t pte) { - return pte_flags(pte) & _PAGE_DIRTY; + return pte_flags(pte) & _PAGE_DIRTY_BITS; +} + +static inline bool pte_shstk(pte_t pte) +{ + return cpu_feature_enabled(X86_FEATURE_SHSTK) && + (pte_flags(pte) & (_PAGE_RW | _PAGE_DIRTY)) == _PAGE_DIRTY; } static inline int pte_young(pte_t pte) @@ -134,9 +140,16 @@ static inline int pte_young(pte_t pte) return pte_flags(pte) & _PAGE_ACCESSED; } -static inline int pmd_dirty(pmd_t pmd) +static inline bool pmd_dirty(pmd_t pmd) { - return pmd_flags(pmd) & _PAGE_DIRTY; + return pmd_flags(pmd) & _PAGE_DIRTY_BITS; +} + +static inline bool pmd_shstk(pmd_t pmd) +{ + return cpu_feature_enabled(X86_FEATURE_SHSTK) && + (pmd_flags(pmd) & (_PAGE_RW | _PAGE_DIRTY | _PAGE_PSE)) == + (_PAGE_DIRTY | _PAGE_PSE); } #define pmd_young pmd_young @@ -145,9 +158,9 @@ static inline int pmd_young(pmd_t pmd) return pmd_flags(pmd) & _PAGE_ACCESSED; } -static inline int pud_dirty(pud_t pud) +static inline bool pud_dirty(pud_t pud) { - return pud_flags(pud) & _PAGE_DIRTY; + return pud_flags(pud) & _PAGE_DIRTY_BITS; } static inline int pud_young(pud_t pud) @@ -157,13 +170,21 @@ static inline int pud_young(pud_t pud) static inline int pte_write(pte_t pte) { - return pte_flags(pte) & _PAGE_RW; + /* + * Shadow stack pages are logically writable, but do not have + * _PAGE_RW. Check for them separately from _PAGE_RW itself. + */ + return (pte_flags(pte) & _PAGE_RW) || pte_shstk(pte); } #define pmd_write pmd_write static inline int pmd_write(pmd_t pmd) { - return pmd_flags(pmd) & _PAGE_RW; + /* + * Shadow stack pages are logically writable, but do not have + * _PAGE_RW. Check for them separately from _PAGE_RW itself. + */ + return (pmd_flags(pmd) & _PAGE_RW) || pmd_shstk(pmd); } #define pud_write pud_write @@ -350,7 +371,14 @@ static inline pte_t pte_clear_saveddirty(pte_t pte) static inline pte_t pte_wrprotect(pte_t pte) { - return pte_clear_flags(pte, _PAGE_RW); + pte = pte_clear_flags(pte, _PAGE_RW); + + /* + * Blindly clearing _PAGE_RW might accidentally create + * a shadow stack PTE (Write=0,Dirty=1). Move the hardware + * dirty value to the software bit, if present. + */ + return pte_mksaveddirty(pte); } #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP @@ -388,7 +416,7 @@ static inline pte_t pte_clear_uffd_wp(pte_t pte) static inline pte_t pte_mkclean(pte_t pte) { - return pte_clear_flags(pte, _PAGE_DIRTY); + return pte_clear_flags(pte, _PAGE_DIRTY_BITS); } static inline pte_t pte_mkold(pte_t pte) @@ -403,7 +431,16 @@ static inline pte_t pte_mkexec(pte_t pte) static inline pte_t pte_mkdirty(pte_t pte) { - return pte_set_flags(pte, _PAGE_DIRTY | _PAGE_SOFT_DIRTY); + pte = pte_set_flags(pte, _PAGE_DIRTY | _PAGE_SOFT_DIRTY); + + return pte_mksaveddirty(pte); +} + +static inline pte_t pte_mkwrite_shstk(pte_t pte) +{ + pte = pte_clear_flags(pte, _PAGE_RW); + + return pte_set_flags(pte, _PAGE_DIRTY); } static inline pte_t pte_mkyoung(pte_t pte) @@ -416,6 +453,10 @@ static inline pte_t pte_mkwrite_novma(pte_t pte) return pte_set_flags(pte, _PAGE_RW); } +struct vm_area_struct; +pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma); +#define pte_mkwrite pte_mkwrite + static inline pte_t pte_mkhuge(pte_t pte) { return pte_set_flags(pte, _PAGE_PSE); @@ -480,7 +521,14 @@ static inline pmd_t pmd_clear_saveddirty(pmd_t pmd) static inline pmd_t pmd_wrprotect(pmd_t pmd) { - return pmd_clear_flags(pmd, _PAGE_RW); + pmd = pmd_clear_flags(pmd, _PAGE_RW); + + /* + * Blindly clearing _PAGE_RW might accidentally create + * a shadow stack PMD (RW=0, Dirty=1). Move the hardware + * dirty value to the software bit. + */ + return pmd_mksaveddirty(pmd); } #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP @@ -507,12 +555,21 @@ static inline pmd_t pmd_mkold(pmd_t pmd) static inline pmd_t pmd_mkclean(pmd_t pmd) { - return pmd_clear_flags(pmd, _PAGE_DIRTY); + return pmd_clear_flags(pmd, _PAGE_DIRTY_BITS); } static inline pmd_t pmd_mkdirty(pmd_t pmd) { - return pmd_set_flags(pmd, _PAGE_DIRTY | _PAGE_SOFT_DIRTY); + pmd = pmd_set_flags(pmd, _PAGE_DIRTY | _PAGE_SOFT_DIRTY); + + return pmd_mksaveddirty(pmd); +} + +static inline pmd_t pmd_mkwrite_shstk(pmd_t pmd) +{ + pmd = pmd_clear_flags(pmd, _PAGE_RW); + + return pmd_set_flags(pmd, _PAGE_DIRTY); } static inline pmd_t pmd_mkdevmap(pmd_t pmd) @@ -535,6 +592,9 @@ static inline pmd_t pmd_mkwrite_novma(pmd_t pmd) return pmd_set_flags(pmd, _PAGE_RW); } +pmd_t pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma); +#define pmd_mkwrite pmd_mkwrite + static inline pud_t pud_set_flags(pud_t pud, pudval_t set) { pudval_t v = native_pud_val(pud); @@ -574,17 +634,26 @@ static inline pud_t pud_mkold(pud_t pud) static inline pud_t pud_mkclean(pud_t pud) { - return pud_clear_flags(pud, _PAGE_DIRTY); + return pud_clear_flags(pud, _PAGE_DIRTY_BITS); } static inline pud_t pud_wrprotect(pud_t pud) { - return pud_clear_flags(pud, _PAGE_RW); + pud = pud_clear_flags(pud, _PAGE_RW); + + /* + * Blindly clearing _PAGE_RW might accidentally create + * a shadow stack PUD (RW=0, Dirty=1). Move the hardware + * dirty value to the software bit. + */ + return pud_mksaveddirty(pud); } static inline pud_t pud_mkdirty(pud_t pud) { - return pud_set_flags(pud, _PAGE_DIRTY | _PAGE_SOFT_DIRTY); + pud = pud_set_flags(pud, _PAGE_DIRTY | _PAGE_SOFT_DIRTY); + + return pud_mksaveddirty(pud); } static inline pud_t pud_mkdevmap(pud_t pud) @@ -604,7 +673,9 @@ static inline pud_t pud_mkyoung(pud_t pud) static inline pud_t pud_mkwrite(pud_t pud) { - return pud_set_flags(pud, _PAGE_RW); + pud = pud_set_flags(pud, _PAGE_RW); + + return pud_clear_saveddirty(pud); } #ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY @@ -721,6 +792,7 @@ static inline u64 flip_protnone_guard(u64 oldval, u64 val, u64 mask); static inline pte_t pte_modify(pte_t pte, pgprot_t newprot) { pteval_t val = pte_val(pte), oldval = val; + pte_t pte_result; /* * Chop off the NX bit (if present), and add the NX portion of @@ -729,17 +801,54 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot) val &= _PAGE_CHG_MASK; val |= check_pgprot(newprot) & ~_PAGE_CHG_MASK; val = flip_protnone_guard(oldval, val, PTE_PFN_MASK); - return __pte(val); + + pte_result = __pte(val); + + /* + * To avoid creating Write=0,Dirty=1 PTEs, pte_modify() needs to avoid: + * 1. Marking Write=0 PTEs Dirty=1 + * 2. Marking Dirty=1 PTEs Write=0 + * + * The first case cannot happen because the _PAGE_CHG_MASK will filter + * out any Dirty bit passed in newprot. Handle the second case by + * going through the mksaveddirty exercise. Only do this if the old + * value was Write=1 to avoid doing this on Shadow Stack PTEs. + */ + if (oldval & _PAGE_RW) + pte_result = pte_mksaveddirty(pte_result); + else + pte_result = pte_clear_saveddirty(pte_result); + + return pte_result; } static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot) { pmdval_t val = pmd_val(pmd), oldval = val; + pmd_t pmd_result; - val &= _HPAGE_CHG_MASK; + val &= (_HPAGE_CHG_MASK & ~_PAGE_DIRTY); val |= check_pgprot(newprot) & ~_HPAGE_CHG_MASK; val = flip_protnone_guard(oldval, val, PHYSICAL_PMD_PAGE_MASK); - return __pmd(val); + + pmd_result = __pmd(val); + + /* + * To avoid creating Write=0,Dirty=1 PMDs, pte_modify() needs to avoid: + * 1. Marking Write=0 PMDs Dirty=1 + * 2. Marking Dirty=1 PMDs Write=0 + * + * The first case cannot happen because the _PAGE_CHG_MASK will filter + * out any Dirty bit passed in newprot. Handle the second case by + * going through the mksaveddirty exercise. Only do this if the old + * value was Write=1 to avoid doing this on Shadow Stack PTEs. + */ + if (oldval & _PAGE_RW) + pmd_result = pmd_mksaveddirty(pmd_result); + else + pmd_result = pmd_clear_saveddirty(pmd_result); + + return pmd_result; } /* diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c index e4f499eb0f29..0ad2c62ac0a8 100644 --- a/arch/x86/mm/pgtable.c +++ b/arch/x86/mm/pgtable.c @@ -880,3 +880,17 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr) #endif /* CONFIG_X86_64 */ #endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */ + +pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma) +{ + pte = pte_mkwrite_novma(pte); + + return pte_clear_saveddirty(pte); +} + +pmd_t pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma) +{ + pmd = pmd_mkwrite_novma(pmd); + + return pmd_clear_saveddirty(pmd); +} From patchwork Tue Jun 13 00:10:39 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 106960 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp211220vqr; Mon, 12 Jun 2023 17:15:01 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ4QY1DgFKch8/Rrje22cWWvVwAmPB1FjW97ZX4nKKn1eYGChkR/T97OkWXzthSdklKgWX8N X-Received: by 2002:aa7:cf13:0:b0:516:a277:48ce with SMTP id a19-20020aa7cf13000000b00516a27748cemr6492316edy.4.1686615301526; Mon, 12 Jun 2023 17:15:01 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1686615301; cv=none; d=google.com; s=arc-20160816; b=zAO8KWHC0Z8X0jSRFSZgCKqsnoiPTbXhKbqUrhKSei3JRgnNSqYmEYn+lyQ7UXIC4m wDS7t9jKI1+sScJnj+hCFAbh5n6esL7ucBvlmFOAmAe+aYsbOanMBhlqwgnZWw0NCTt+ +1ejz4SwnTJ4BE3fy252Kb+qF5YjEWV7+zRg5vCu+QwOemMLgvFwF53lVFIjnfqnaQg1 Vq4FBzXNf/gGRPukzKYG5cMGKdASBfhPFmWoZXjDt/0beGK8Esz2DffB7kW+2r6lpirF ksrWwpNhAh2iqkfh2tW86JmH4jm3Kt3Y2eIVEn0izikBHfR+guBoFLqq08ELkCFGrpSY hLcQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=IQk0l3wwsjhshYURQZ9ElMIHgqOA76qR/5uNDXp6uwI=; b=j9dptkFkrOhRD4Jpt45XdF894NwteOyMJ0/D/K51d8ojc0fn7NqiAfETuLrjTxsVHX uwDrdqiSEsvNrMcyAuHjUGN8+6+EByZ8qnodwVlJ61FCwUXsReAsRubeVmiKPmE14mYp 0+b6YcmXZ3zrquSKAznKHohDM8Ra7ehFJmN/S+wlWgLZOsAH2mZRMwhe0FhJeNMNCx2e l6jmtgg117xa+4NZQJEw7XAkeOFIn2aaeDljerloXVat+uUSb4QI8hDkwIaxena9widG X8ILw1eox06MYaaUcDPlCAAl9/7W4L9Dl3yCo9LEXIOdOjHXzVVy1CTvpwjgjQPfyT6f zxYw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=HQRuUlSo; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id n17-20020aa7d051000000b00514b3dadb55si6403980edo.440.2023.06.12.17.14.36; Mon, 12 Jun 2023 17:15:01 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=HQRuUlSo; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238978AbjFMANK (ORCPT + 99 others); Mon, 12 Jun 2023 20:13:10 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51946 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238738AbjFMAM0 (ORCPT ); Mon, 12 Jun 2023 20:12:26 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5C6C9F0; Mon, 12 Jun 2023 17:12:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615142; x=1718151142; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=qno7FNbVCOlU0l2tsMlcvtzg5iSqFMH8Q3FgqO3yA3s=; b=HQRuUlSox7Nv4w8nTyOBqx3/ZuiQHHppNSrNzzabwGyn30BldrP+ftmG kg+hWfBNLPIXsfkxqD55ZDOnylyC9LjyIGgnwwcfraLeNgRscX9h6KCno +mt/7Tq2xLyrKoK8bgA+iyqq2znz8MGvf/oU174MrggH4Ab2c8lIQtoYc TvcqvENX8sEdZZ5nnjECZgeg5biSBxZiJjBJXK+fjbU/+NRiq7enCkbEq 8oLJliAvQz6yeERjjLOOvfD1dcBK4viqdFyv3/UaRyF1iVAFvdDBm86RM h2f6WSpDWvFYM8CL8iUQqHNjYthEU6SvO1xtmtUu9K7/F+c9i3TrN3UCp Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361556946" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361556946" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:19 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835671016" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835671016" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:18 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu , Pengfei Xu Subject: [PATCH v9 13/42] x86/mm: Remove _PAGE_DIRTY from kernel RO pages Date: Mon, 12 Jun 2023 17:10:39 -0700 Message-Id: <20230613001108.3040476-14-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1768544326330551513?= X-GMAIL-MSGID: =?utf-8?q?1768544326330551513?= New processors that support Shadow Stack regard Write=0,Dirty=1 PTEs as shadow stack pages. In normal cases, it can be helpful to create Write=1 PTEs as also Dirty=1 if HW dirty tracking is not needed, because if the Dirty bit is not already set the CPU has to set Dirty=1 when the memory gets written to. This creates additional work for the CPU. So traditional wisdom was to simply set the Dirty bit whenever you didn't care about it. However, it was never really very helpful for read-only kernel memory. When CR4.CET=1 and IA32_S_CET.SH_STK_EN=1, some instructions can write to such supervisor memory. The kernel does not set IA32_S_CET.SH_STK_EN, so avoiding kernel Write=0,Dirty=1 memory is not strictly needed for any functional reason. But having Write=0,Dirty=1 kernel memory doesn't have any functional benefit either, so to reduce ambiguity between shadow stack and regular Write=0 pages, remove Dirty=1 from any kernel Write=0 PTEs. Co-developed-by: Yu-cheng Yu Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- arch/x86/include/asm/pgtable_types.h | 8 +++++--- arch/x86/mm/pat/set_memory.c | 4 ++-- 2 files changed, 7 insertions(+), 5 deletions(-) diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h index ee6f8e57e115..26f07d6d5758 100644 --- a/arch/x86/include/asm/pgtable_types.h +++ b/arch/x86/include/asm/pgtable_types.h @@ -222,10 +222,12 @@ enum page_cache_mode { #define _PAGE_TABLE_NOENC (__PP|__RW|_USR|___A| 0|___D| 0| 0) #define _PAGE_TABLE (__PP|__RW|_USR|___A| 0|___D| 0| 0| _ENC) -#define __PAGE_KERNEL_RO (__PP| 0| 0|___A|__NX|___D| 0|___G) -#define __PAGE_KERNEL_ROX (__PP| 0| 0|___A| 0|___D| 0|___G) +#define __PAGE_KERNEL_RO (__PP| 0| 0|___A|__NX| 0| 0|___G) +#define __PAGE_KERNEL_ROX (__PP| 0| 0|___A| 0| 0| 0|___G) +#define __PAGE_KERNEL (__PP|__RW| 0|___A|__NX|___D| 0|___G) +#define __PAGE_KERNEL_EXEC (__PP|__RW| 0|___A| 0|___D| 0|___G) #define __PAGE_KERNEL_NOCACHE (__PP|__RW| 0|___A|__NX|___D| 0|___G| __NC) -#define __PAGE_KERNEL_VVAR (__PP| 0|_USR|___A|__NX|___D| 0|___G) +#define __PAGE_KERNEL_VVAR (__PP| 0|_USR|___A|__NX| 0| 0|___G) #define __PAGE_KERNEL_LARGE (__PP|__RW| 0|___A|__NX|___D|_PSE|___G) #define __PAGE_KERNEL_LARGE_EXEC (__PP|__RW| 0|___A| 0|___D|_PSE|___G) #define __PAGE_KERNEL_WP (__PP|__RW| 0|___A|__NX|___D| 0|___G| __WP) diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c index 7159cf787613..fc627acfe40e 100644 --- a/arch/x86/mm/pat/set_memory.c +++ b/arch/x86/mm/pat/set_memory.c @@ -2073,12 +2073,12 @@ int set_memory_nx(unsigned long addr, int numpages) int set_memory_ro(unsigned long addr, int numpages) { - return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_RW), 0); + return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_RW | _PAGE_DIRTY), 0); } int set_memory_rox(unsigned long addr, int numpages) { - pgprot_t clr = __pgprot(_PAGE_RW); + pgprot_t clr = __pgprot(_PAGE_RW | _PAGE_DIRTY); if (__supported_pte_mask & _PAGE_NX) clr.pgprot |= _PAGE_NX; From patchwork Tue Jun 13 00:10:40 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 106981 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp222217vqr; Mon, 12 Jun 2023 17:44:40 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ5msHY6lGijmFxF3MfHkUF1ARe6bbWnS06Sz9geo4Y4vX8U/jP4okvHfacFcTN3EYMufnnn X-Received: by 2002:a17:907:6d1e:b0:977:b397:bbfa with SMTP id sa30-20020a1709076d1e00b00977b397bbfamr11604699ejc.6.1686617079475; Mon, 12 Jun 2023 17:44:39 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1686617079; cv=none; d=google.com; s=arc-20160816; b=MIzustIurYAoECorxhmGI8tAt0GOtf3mIS3KDSfMEyzEZoQJbT9IK8vZ2mkltaHRNM mA9yIVD+kFUuNFCxgpuTVNSsvIpjeQLVFkKatAFApyRo1GvotoPPMmO+gQlkDtPHNkwu H0GkbF0cjSBKPMOpdN4cootrUfPlSKlvvMzdfT19R7ECAtRlPv4Zr/Um2N7tpInl8MHc b+MgODNQKmpWsW6Vr4oouqirTA3UktXhZodPoE6PyhWLaJ9C+XTCvUmyp/GvLWQ5zL7l 0T5wnSchL4tX22VsE/dETfVKCabN6TdE+1CK0fR8YvJ45vHyX8wcqhGojSyEPujm+RJ7 jFEQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=FOFVMBXtKB8/Hcfbd8g/yK53EZfWd2ezBu+ZFVQ6/dM=; b=wNd0ZwJ2uXWovUuIJ4n9MQdARa18JvgagrRbH0rPYIP6APAiupykj0x0Xr0zK2MxKf PeSJuLWdopnJxRCNec6J1sXDyij2Qh4FoMXpwZePtNsb+guFq8nTRBKrXqAd4xNkq+1r 87o14HLE629L+1H31fMlxQIDjy+UkLXLBp1U2LQXAw2wDasu4kSTwAwFYo5Xgr+4XjpN /bGrjr6mr5WzkQkvRR2boHSeIN3B2RkPEp423l742QQt25V6oQBucGnK/bDyZq9o0vIv DrZx+lG+9KCTkiwxvipbYpU/+odzb21MWzn+aHpTBsqHPCQugwrmhh2lZnc0rkaf+Xr1 2TFg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=n8if8qdQ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id ko6-20020a170907986600b0097461d96871si6471228ejc.968.2023.06.12.17.44.15; Mon, 12 Jun 2023 17:44:39 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=n8if8qdQ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239003AbjFMANU (ORCPT + 99 others); Mon, 12 Jun 2023 20:13:20 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52140 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238845AbjFMAMe (ORCPT ); Mon, 12 Jun 2023 20:12:34 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B1FC51730; Mon, 12 Jun 2023 17:12:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615146; x=1718151146; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=svMWn5R9eThoiar+k0qdORAiS7Vwyk8B+BTeN3eLytc=; b=n8if8qdQukN6W8TFmTvv6muKDj4494uFDuN6u1F5bMSfjuF5qAXnnQTP ucPwzsXPUCkf/zNUJ/SubFkseQj/EOOSA1UaozanhV5uAp/yJukoXs6ha mnAjNh4Fw0yRlMxyzgKtEw0vmz3/QQxLIRsAUrelYmH+XNIH9W+gx6Ja6 BUbd26SHCizexRqcNBjfg/vpzbfjZ/NeuIuKlUV3bGLj8GYEiCCOGdH3L uJfwABlj6Fj9E6OX7umvycvJ+eD0Qp86f5lYriCkbUrqmekNIZjAO1yFT zM5j3yvLZ40VVMPqFe6poR5uhFC/MHWQadrCOEnwjns+FcE21gQQ4GdoU Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361556974" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361556974" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:20 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835671022" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835671022" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:19 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu , Pengfei Xu Subject: [PATCH v9 14/42] mm: Introduce VM_SHADOW_STACK for shadow stack memory Date: Mon, 12 Jun 2023 17:10:40 -0700 Message-Id: <20230613001108.3040476-15-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1768546190552984239?= X-GMAIL-MSGID: =?utf-8?q?1768546190552984239?= From: Yu-cheng Yu New hardware extensions implement support for shadow stack memory, such as x86 Control-flow Enforcement Technology (CET). Add a new VM flag to identify these areas, for example, to be used to properly indicate shadow stack PTEs to the hardware. Shadow stack VMA creation will be tightly controlled and limited to anonymous memory to make the implementation simpler and since that is all that is required. The solution will rely on pte_mkwrite() to create the shadow stack PTEs, so it will not be required for vm_get_page_prot() to learn how to create shadow stack memory. For this reason document that VM_SHADOW_STACK should not be mixed with VM_SHARED. Signed-off-by: Yu-cheng Yu Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Reviewed-by: Kirill A. Shutemov Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook Acked-by: David Hildenbrand Reviewed-by: Mark Brown Tested-by: Mark Brown --- Documentation/filesystems/proc.rst | 1 + fs/proc/task_mmu.c | 3 +++ include/linux/mm.h | 8 ++++++++ 3 files changed, 12 insertions(+) diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst index 7897a7dafcbc..6ccb57089a06 100644 --- a/Documentation/filesystems/proc.rst +++ b/Documentation/filesystems/proc.rst @@ -566,6 +566,7 @@ encoded manner. The codes are the following: mt arm64 MTE allocation tags are enabled um userfaultfd missing tracking uw userfaultfd wr-protect tracking + ss shadow stack page == ======================================= Note that there is no guarantee that every flag and associated mnemonic will diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 420510f6a545..38b19a757281 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -711,6 +711,9 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma) #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR [ilog2(VM_UFFD_MINOR)] = "ui", #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */ +#ifdef CONFIG_X86_USER_SHADOW_STACK + [ilog2(VM_SHADOW_STACK)] = "ss", +#endif }; size_t i; diff --git a/include/linux/mm.h b/include/linux/mm.h index 6f52c1e7c640..fb17cbd531ac 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -319,11 +319,13 @@ extern unsigned int kobjsize(const void *objp); #define VM_HIGH_ARCH_BIT_2 34 /* bit only usable on 64-bit architectures */ #define VM_HIGH_ARCH_BIT_3 35 /* bit only usable on 64-bit architectures */ #define VM_HIGH_ARCH_BIT_4 36 /* bit only usable on 64-bit architectures */ +#define VM_HIGH_ARCH_BIT_5 37 /* bit only usable on 64-bit architectures */ #define VM_HIGH_ARCH_0 BIT(VM_HIGH_ARCH_BIT_0) #define VM_HIGH_ARCH_1 BIT(VM_HIGH_ARCH_BIT_1) #define VM_HIGH_ARCH_2 BIT(VM_HIGH_ARCH_BIT_2) #define VM_HIGH_ARCH_3 BIT(VM_HIGH_ARCH_BIT_3) #define VM_HIGH_ARCH_4 BIT(VM_HIGH_ARCH_BIT_4) +#define VM_HIGH_ARCH_5 BIT(VM_HIGH_ARCH_BIT_5) #endif /* CONFIG_ARCH_USES_HIGH_VMA_FLAGS */ #ifdef CONFIG_ARCH_HAS_PKEYS @@ -339,6 +341,12 @@ extern unsigned int kobjsize(const void *objp); #endif #endif /* CONFIG_ARCH_HAS_PKEYS */ +#ifdef CONFIG_X86_USER_SHADOW_STACK +# define VM_SHADOW_STACK VM_HIGH_ARCH_5 /* Should not be set with VM_SHARED */ +#else +# define VM_SHADOW_STACK VM_NONE +#endif + #if defined(CONFIG_X86) # define VM_PAT VM_ARCH_1 /* PAT reserves whole VMA at once (x86) */ #elif defined(CONFIG_PPC) From patchwork Tue Jun 13 00:10:41 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 106963 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp211277vqr; Mon, 12 Jun 2023 17:15:07 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ4UQBngub9dXCaF8UKHM6pp5dJsQ26aBtmLJbQaNPTe1wHYyd9mXvVfmcrGKShvfVCAKLkr X-Received: by 2002:a17:907:6e88:b0:970:925:6564 with SMTP id sh8-20020a1709076e8800b0097009256564mr10065566ejc.35.1686615307204; Mon, 12 Jun 2023 17:15:07 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1686615307; cv=none; d=google.com; s=arc-20160816; b=kJECSUoKxpZeaqOIwg1rZlwklWs6u+K3qh3RPWtvyROtR/8Y4fF7WHdMx1crP1x/nT QswChU0XsNifjOzo12BEpmLADiK5nf7pFPMEMRxIU3/Rg4CyXlFMpd2wo7VQn1pZAWJa ykkNeO9xeQytbBETL0FurJ6f2LzkneZqjDSeFk5VMxixMGkWCyouRruHpsHTSHVSlcCx ao9MdtHoQ5Y2sJyGStCvyY8Ti9/d03aj47HuiCnGt8aRn5Y7llpLSIsc2nEjSJEjeRqW TKMwGegfB/Y/hOJTwm1R5Y3WXBCDArtJ15nPz6+OMJwjS4yrzRtRuFOwc8n4AtbWvz3F SZ3w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=5fI2fz3C60mSydVE+2mwy8UwGqfU5Pn7fjr7hidomLc=; b=Y6nBdMdFe3zsyLki5XkXhSxC4pthFqQDa5mhF2v8M2o99TCMSE+8gHLNlk5Ad15ppb FZZU+bRL1t7MHRH4loEaxsOvSpLoQ4XlFviw2fUxyhIVdlNTBQhDHwE/sFwn/yHuC0uQ byvdYRkk1AbJVGcpXUMWYUwhRrOeTqFjPtfg123GvhYOw+biAG3vJfqcvWCbjBb/aFNa G0mdhOCTxWvxUFnp44PxKOAWmZlG+fNz54ze35g0p5H5NIexAvJEn1NQ/WjPu0hKKBxJ UrPfBzpr2FuiJ5qq0LD8Qj4rDEYODoucT+j08FGjrCKSL5UOP/8jZ8fYeV9YSjKSBJOR U+lg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=MAkibPwx; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id qh1-20020a170906eca100b00965bb773354si5172848ejb.261.2023.06.12.17.14.41; Mon, 12 Jun 2023 17:15:07 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=MAkibPwx; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239014AbjFMAN0 (ORCPT + 99 others); Mon, 12 Jun 2023 20:13:26 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52162 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238870AbjFMAMf (ORCPT ); Mon, 12 Jun 2023 20:12:35 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CD5A81728; Mon, 12 Jun 2023 17:12:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615146; x=1718151146; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=VxsR8A3vgFZdU2Uya7vVGhJBClJyO0tAkA+BqmUO8Zs=; b=MAkibPwx6HjXO50QeclLoNDH6D2TGh73wITGile4ozU9mHcxnH+PheEm qOjOV8daMbTI/KuLiGHq4DTFRyRxFm676WzWhjn4w9iO9g845IDe9z26b fLqErGBXzqHHP/1PSPelFvMbtbKi/g/NPubKjam5QPWQAuZRmOJWDXiDi DjWXx/G850tGLUankXiSV3sx/AD+jF9uujteIqChg2rfcgOqef46XIszU jmuYjA/lcME4NaA6VS+8A3AViz/M9lJ7dOR1FP9CBYfFijfQsY8GfK1O3 lk4CwotlRYtaR+WONZpp2wd+jlQh8t2dx/ZYGlXPs7xRyo1an1mbNDzBZ g==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361556998" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361556998" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:21 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835671026" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835671026" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:20 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu , Pengfei Xu Subject: [PATCH v9 15/42] x86/mm: Check shadow stack page fault errors Date: Mon, 12 Jun 2023 17:10:41 -0700 Message-Id: <20230613001108.3040476-16-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1768544332269732588?= X-GMAIL-MSGID: =?utf-8?q?1768544332269732588?= The CPU performs "shadow stack accesses" when it expects to encounter shadow stack mappings. These accesses can be implicit (via CALL/RET instructions) or explicit (instructions like WRSS). Shadow stack accesses to shadow-stack mappings can result in faults in normal, valid operation just like regular accesses to regular mappings. Shadow stacks need some of the same features like delayed allocation, swap and copy-on-write. The kernel needs to use faults to implement those features. The architecture has concepts of both shadow stack reads and shadow stack writes. Any shadow stack access to non-shadow stack memory will generate a fault with the shadow stack error code bit set. This means that, unlike normal write protection, the fault handler needs to create a type of memory that can be written to (with instructions that generate shadow stack writes), even to fulfill a read access. So in the case of COW memory, the COW needs to take place even with a shadow stack read. Otherwise the page will be left (shadow stack) writable in userspace. So to trigger the appropriate behavior, set FAULT_FLAG_WRITE for shadow stack accesses, even if the access was a shadow stack read. For the purpose of making this clearer, consider the following example. If a process has a shadow stack, and forks, the shadow stack PTEs will become read-only due to COW. If the CPU in one process performs a shadow stack read access to the shadow stack, for example executing a RET and causing the CPU to read the shadow stack copy of the return address, then in order for the fault to be resolved the PTE will need to be set with shadow stack permissions. But then the memory would be changeable from userspace (from CALL, RET, WRSS, etc). So this scenario needs to trigger COW, otherwise the shared page would be changeable from both processes. Shadow stack accesses can also result in errors, such as when a shadow stack overflows, or if a shadow stack access occurs to a non-shadow-stack mapping. Also, generate the errors for invalid shadow stack accesses. Co-developed-by: Yu-cheng Yu Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- arch/x86/include/asm/trap_pf.h | 2 ++ arch/x86/mm/fault.c | 22 ++++++++++++++++++++++ 2 files changed, 24 insertions(+) diff --git a/arch/x86/include/asm/trap_pf.h b/arch/x86/include/asm/trap_pf.h index 10b1de500ab1..afa524325e55 100644 --- a/arch/x86/include/asm/trap_pf.h +++ b/arch/x86/include/asm/trap_pf.h @@ -11,6 +11,7 @@ * bit 3 == 1: use of reserved bit detected * bit 4 == 1: fault was an instruction fetch * bit 5 == 1: protection keys block access + * bit 6 == 1: shadow stack access fault * bit 15 == 1: SGX MMU page-fault */ enum x86_pf_error_code { @@ -20,6 +21,7 @@ enum x86_pf_error_code { X86_PF_RSVD = 1 << 3, X86_PF_INSTR = 1 << 4, X86_PF_PK = 1 << 5, + X86_PF_SHSTK = 1 << 6, X86_PF_SGX = 1 << 15, }; diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index e4399983c50c..fe68119ce2cc 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -1118,8 +1118,22 @@ access_error(unsigned long error_code, struct vm_area_struct *vma) (error_code & X86_PF_INSTR), foreign)) return 1; + /* + * Shadow stack accesses (PF_SHSTK=1) are only permitted to + * shadow stack VMAs. All other accesses result in an error. + */ + if (error_code & X86_PF_SHSTK) { + if (unlikely(!(vma->vm_flags & VM_SHADOW_STACK))) + return 1; + if (unlikely(!(vma->vm_flags & VM_WRITE))) + return 1; + return 0; + } + if (error_code & X86_PF_WRITE) { /* write, present and write, not present: */ + if (unlikely(vma->vm_flags & VM_SHADOW_STACK)) + return 1; if (unlikely(!(vma->vm_flags & VM_WRITE))) return 1; return 0; @@ -1311,6 +1325,14 @@ void do_user_addr_fault(struct pt_regs *regs, perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address); + /* + * Read-only permissions can not be expressed in shadow stack PTEs. + * Treat all shadow stack accesses as WRITE faults. This ensures + * that the MM will prepare everything (e.g., break COW) such that + * maybe_mkwrite() can create a proper shadow stack PTE. + */ + if (error_code & X86_PF_SHSTK) + flags |= FAULT_FLAG_WRITE; if (error_code & X86_PF_WRITE) flags |= FAULT_FLAG_WRITE; if (error_code & X86_PF_INSTR) From patchwork Tue Jun 13 00:10:42 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 106978 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp222042vqr; Mon, 12 Jun 2023 17:44:12 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ4XCAmMZVbVKDnBQurEfZ24LgMqu1QWX+QahVXn3nDV1YX4S1mZB9ABw9O53X9MlgrtFzHv X-Received: by 2002:a17:907:9811:b0:97e:aace:b6bc with SMTP id ji17-20020a170907981100b0097eaaceb6bcmr6916850ejc.53.1686617052541; Mon, 12 Jun 2023 17:44:12 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1686617052; cv=none; d=google.com; s=arc-20160816; b=wy7gSkdetmAR9QIcMqthpHzbNeLqeEcK/JEMxgKcDZbpXXPhgLYArICeMCDqTKBMBA K2NW9U+qNXlalfkq4bdHrS16TSj7XqLshLBrkXoziVkGoL7+N0w8fhikXQTNK1efWUBt m8ty9brMJkENNZexMs9kfg85Qw6LuDeVCaJe9aKtJ9yerrWACUJ0GEDKk24KYmbQ2Qln dJl8lETbdpoRz429iHrsuxEAW9RhNNBisJYtlLb8chw3MsqU4u10+reVaChtoF8d86rg 4InQdzWhs10vkXYWcmUGRcBxjB9h7SOJz6nVZWKBFGFws1yK/azVWr4T0O+DZeEpKWun ktGQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=hAQJJAeb9RDs/WyI+bBNY0zKz2ihMNwhX4fY+cexkp8=; b=XyKJyhyYYn/GojrhlMKUwodTEgn6EIhpqVXerMrcK2D+SrBxDNvF1ucOv8WaPVLq32 6Ok92zw9vLwAK+aLm0XQ/5pdVMkv46lrT3snMYtPWJZU/SXZb+b6rBgwgUV56DrRyk2b ZE4eGcQW3zjuUTuG7Ygh7eoB6HVbY2o8mX6CuYwuYzrVdFIJenjbCI/sQ09gjsdHS0JR cb/bDqT4Qf/T++IPkFEcAV94DBiaLdDJ36uDeaMLMHSZK7l9kbpMIz1AyeP70n6dxZXG fCOw+2KQS8LOEjs7B01/2A1X7btoB7GDeGYNN/p/fElDCrfdEaQmhaG/DBNiICxDCNg0 Gk8g== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=lgizI1y9; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id pg3-20020a170907204300b00977ed399758si6002895ejb.1052.2023.06.12.17.43.49; Mon, 12 Jun 2023 17:44:12 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=lgizI1y9; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239039AbjFMAOB (ORCPT + 99 others); Mon, 12 Jun 2023 20:14:01 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52218 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238907AbjFMAMh (ORCPT ); Mon, 12 Jun 2023 20:12:37 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C89F918C; Mon, 12 Jun 2023 17:12:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615148; x=1718151148; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=oZdKPqbrR7Qk+U72in77o4gzn1lGdOYqXWRfhhqI1mg=; b=lgizI1y9RJzdU8F/vZYEjcUSqeNizgbKGojD/5jhzbPP3PcuX5KnK5ii d/2nrGuUDxwfwyVKQUDkAbyv7TKi8qxI/vYQ2b/n99dj7A/zOXjLrImpZ lfG9XL7xBRXCqA1RJWGow0wK5i5z6koEgmo4WpKvdszPcFKWDjAc3Hkqs KCjqo1bzNVoDPWIyAdRtbYYdcTUPyu/SdPJmRN0jjhpx+0BgsPieHRGkQ YYibl6hyWXN8w302fN7uD9GD+Ht1vCJGpaum2tCC8HD058UT+e+hmoACy JOFWG9QIsSVGqUO1pEV9V4eI55Mqn3cF7AGnyUiqWxM9Rd/D8BWm+u3Ed w==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361557020" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361557020" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:21 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835671031" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835671031" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:20 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu , Pengfei Xu Subject: [PATCH v9 16/42] mm: Add guard pages around a shadow stack. Date: Mon, 12 Jun 2023 17:10:42 -0700 Message-Id: <20230613001108.3040476-17-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1768546162571091752?= X-GMAIL-MSGID: =?utf-8?q?1768546162571091752?= The x86 Control-flow Enforcement Technology (CET) feature includes a new type of memory called shadow stack. This shadow stack memory has some unusual properties, which requires some core mm changes to function properly. The architecture of shadow stack constrains the ability of userspace to move the shadow stack pointer (SSP) in order to prevent corrupting or switching to other shadow stacks. The RSTORSSP instruction can move the SSP to different shadow stacks, but it requires a specially placed token in order to do this. However, the architecture does not prevent incrementing the stack pointer to wander onto an adjacent shadow stack. To prevent this in software, enforce guard pages at the beginning of shadow stack VMAs, such that there will always be a gap between adjacent shadow stacks. Make the gap big enough so that no userspace SSP changing operations (besides RSTORSSP), can move the SSP from one stack to the next. The SSP can be incremented or decremented by CALL, RET and INCSSP. CALL and RET can move the SSP by a maximum of 8 bytes, at which point the shadow stack would be accessed. The INCSSP instruction can also increment the shadow stack pointer. It is the shadow stack analog of an instruction like: addq $0x80, %rsp However, there is one important difference between an ADD on %rsp and INCSSP. In addition to modifying SSP, INCSSP also reads from the memory of the first and last elements that were "popped". It can be thought of as acting like this: READ_ONCE(ssp); // read+discard top element on stack ssp += nr_to_pop * 8; // move the shadow stack READ_ONCE(ssp-8); // read+discard last popped stack element The maximum distance INCSSP can move the SSP is 2040 bytes, before it would read the memory. Therefore, a single page gap will be enough to prevent any operation from shifting the SSP to an adjacent stack, since it would have to land in the gap at least once, causing a fault. This could be accomplished by using VM_GROWSDOWN, but this has a downside. The behavior would allow shadow stacks to grow, which is unneeded and adds a strange difference to how most regular stacks work. In the maple tree code, there is some logic for retrying the unmapped area search if a guard gap is violated. This retry should happen for shadow stack guard gap violations as well. This logic currently only checks for VM_GROWSDOWN for start gaps. Since shadow stacks also have a start gap as well, create an new define VM_STARTGAP_FLAGS to hold all the VM flag bits that have start gaps, and make mmap use it. Co-developed-by: Yu-cheng Yu Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook Reviewed-by: Mark Brown --- v9: - Add logic needed to still have guard gaps with maple tree. --- include/linux/mm.h | 54 ++++++++++++++++++++++++++++++++++++++++------ mm/mmap.c | 4 ++-- 2 files changed, 50 insertions(+), 8 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index fb17cbd531ac..535c58d3b2e4 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -342,7 +342,36 @@ extern unsigned int kobjsize(const void *objp); #endif /* CONFIG_ARCH_HAS_PKEYS */ #ifdef CONFIG_X86_USER_SHADOW_STACK -# define VM_SHADOW_STACK VM_HIGH_ARCH_5 /* Should not be set with VM_SHARED */ +/* + * This flag should not be set with VM_SHARED because of lack of support + * core mm. It will also get a guard page. This helps userspace protect + * itself from attacks. The reasoning is as follows: + * + * The shadow stack pointer(SSP) is moved by CALL, RET, and INCSSPQ. The + * INCSSP instruction can increment the shadow stack pointer. It is the + * shadow stack analog of an instruction like: + * + * addq $0x80, %rsp + * + * However, there is one important difference between an ADD on %rsp + * and INCSSP. In addition to modifying SSP, INCSSP also reads from the + * memory of the first and last elements that were "popped". It can be + * thought of as acting like this: + * + * READ_ONCE(ssp); // read+discard top element on stack + * ssp += nr_to_pop * 8; // move the shadow stack + * READ_ONCE(ssp-8); // read+discard last popped stack element + * + * The maximum distance INCSSP can move the SSP is 2040 bytes, before + * it would read the memory. Therefore a single page gap will be enough + * to prevent any operation from shifting the SSP to an adjacent stack, + * since it would have to land in the gap at least once, causing a + * fault. + * + * Prevent using INCSSP to move the SSP between shadow stacks by + * having a PAGE_SIZE guard gap. + */ +# define VM_SHADOW_STACK VM_HIGH_ARCH_5 #else # define VM_SHADOW_STACK VM_NONE #endif @@ -405,6 +434,8 @@ extern unsigned int kobjsize(const void *objp); #define VM_STACK_DEFAULT_FLAGS VM_DATA_DEFAULT_FLAGS #endif +#define VM_STARTGAP_FLAGS (VM_GROWSDOWN | VM_SHADOW_STACK) + #ifdef CONFIG_STACK_GROWSUP #define VM_STACK VM_GROWSUP #else @@ -3235,15 +3266,26 @@ struct vm_area_struct *vma_lookup(struct mm_struct *mm, unsigned long addr) return mtree_load(&mm->mm_mt, addr); } +static inline unsigned long stack_guard_start_gap(struct vm_area_struct *vma) +{ + if (vma->vm_flags & VM_GROWSDOWN) + return stack_guard_gap; + + /* See reasoning around the VM_SHADOW_STACK definition */ + if (vma->vm_flags & VM_SHADOW_STACK) + return PAGE_SIZE; + + return 0; +} + static inline unsigned long vm_start_gap(struct vm_area_struct *vma) { + unsigned long gap = stack_guard_start_gap(vma); unsigned long vm_start = vma->vm_start; - if (vma->vm_flags & VM_GROWSDOWN) { - vm_start -= stack_guard_gap; - if (vm_start > vma->vm_start) - vm_start = 0; - } + vm_start -= gap; + if (vm_start > vma->vm_start) + vm_start = 0; return vm_start; } diff --git a/mm/mmap.c b/mm/mmap.c index afdf5f78432b..d4793600a8d4 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1570,7 +1570,7 @@ static unsigned long unmapped_area(struct vm_unmapped_area_info *info) gap = mas.index; gap += (info->align_offset - gap) & info->align_mask; tmp = mas_next(&mas, ULONG_MAX); - if (tmp && (tmp->vm_flags & VM_GROWSDOWN)) { /* Avoid prev check if possible */ + if (tmp && (tmp->vm_flags & VM_STARTGAP_FLAGS)) { /* Avoid prev check if possible */ if (vm_start_gap(tmp) < gap + length - 1) { low_limit = tmp->vm_end; mas_reset(&mas); @@ -1622,7 +1622,7 @@ static unsigned long unmapped_area_topdown(struct vm_unmapped_area_info *info) gap -= (gap - info->align_offset) & info->align_mask; gap_end = mas.last; tmp = mas_next(&mas, ULONG_MAX); - if (tmp && (tmp->vm_flags & VM_GROWSDOWN)) { /* Avoid prev check if possible */ + if (tmp && (tmp->vm_flags & VM_STARTGAP_FLAGS)) { /* Avoid prev check if possible */ if (vm_start_gap(tmp) <= gap_end) { high_limit = vm_start_gap(tmp); mas_reset(&mas); From patchwork Tue Jun 13 00:10:43 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 106991 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp222817vqr; Mon, 12 Jun 2023 17:46:10 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ7+XLJWwySSc+hreodANMiZCGhXq+FK1xLTtRDAzehQLISEaD9Ri0ugoNfC2dDDgNnVoLmo X-Received: by 2002:a17:907:9802:b0:971:fa86:27e with SMTP id ji2-20020a170907980200b00971fa86027emr13255660ejc.16.1686617149324; Mon, 12 Jun 2023 17:45:49 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1686617149; cv=none; d=google.com; s=arc-20160816; b=R5HwAKxaByll9ZQ+aaMsUxO30QBzXmy9ipGNjgy4TZrfRV1yToxo2oHRlZvWCwbcHj ZURwRHe7PO5ExmoME2RUCmYNiiNjOw7a1adTfX8ZqpTT4w7j1BNhxEF95L/96VMOLF6D 5e1QWI80+i/nBj9y1qbjNhfVXthHEEFdCMr5aaKipbqrPODUgDp7z8kmAQR0cUqmM98G pIbnvE9IdEmBoXCF8JSsyFQogbZJy4TYtjF2wy7EN/MpNAMpsRoSz7dApOzih6/uf5fY yBvd7IASofQwsqsbBm3kwTrXEbSebu0R3MgHsNhJ+h9CAnjl21EfSJDYOKAsK+o1wMII JJUg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=p4A9Ag3Rbq1a3tkkCFr8IkGFKL7/tKsugwm4Dh2bjeE=; b=Jm+GvO/mlEZRddHCfcvWhWteZKUr9s7um1N+rtZlYLgNW2QjOiQg3K9IyKi43JSEFk 8ZwozbTihtFc2cVTDR7bZEPWQYHHIOdZgKN70ZivUB7N6ze6CSJJHR3PSm465cwtWtg9 oto8Tk5F+/UH7ZoTXjHDSWgyPqcCztNNOVHt9cHXhJVHQ8h/kLkhs//saJ/qPZdncVfF 2FA6qvjj7tnNEWuojayh94epu4jraRcYmc3ahoIfKmZg5zZbAYw0gwzWLraF7eQo8kTE JnOQb5Yp2SghGaHux/uhGTNWDJsXquH1sJuS5SW3XpcrvVTpRH3OOK4mOwTFD5uN5cNU O3eA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=nKs6GDAf; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id ko6-20020a170907986600b0097461d96871si6471228ejc.968.2023.06.12.17.45.13; Mon, 12 Jun 2023 17:45:49 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=nKs6GDAf; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239128AbjFMAPH (ORCPT + 99 others); Mon, 12 Jun 2023 20:15:07 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52892 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239019AbjFMAN3 (ORCPT ); Mon, 12 Jun 2023 20:13:29 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AC2F91BD4; Mon, 12 Jun 2023 17:12:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615154; x=1718151154; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=dhZWE6YTdZ+FD9PI/U6uJC2MaFgGrtV9CcOxkJ3S4o4=; b=nKs6GDAfAmXL3lLEMiSLw4dNNNs7s5A0Stx5EeIE6X9n4SO805ioHHFN eFUI+pVLjs79BDazntQy/EVRVkoXCZsQXPey8yNjshgQxvR8UJTzwrbJO Ug4HNzoqjKRYyB2akgYKDOzPKJPxdvNEL6vMEdPTlKs/nv+4KxDu6IuxP aHrC41y37agZj2bNv7iNNls1mG0Bw2Xxtdi873dw8akSgzFl6dhDDUhWx jTikhY5vmUOvOyOPnYoOPRbSh+CagAtYlIdcB/vd3Sg46JvauC+3ycJYZ C30ZZJrDXSYoBb4IVhQs6I8z12z07fgXjHPw7LOplyOal1i72UTeoQ5kT g==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361557047" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361557047" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:22 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835671035" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835671035" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:21 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Pengfei Xu Subject: [PATCH v9 17/42] mm: Warn on shadow stack memory in wrong vma Date: Mon, 12 Jun 2023 17:10:43 -0700 Message-Id: <20230613001108.3040476-18-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1768546263620121819?= X-GMAIL-MSGID: =?utf-8?q?1768546263620121819?= The x86 Control-flow Enforcement Technology (CET) feature includes a new type of memory called shadow stack. This shadow stack memory has some unusual properties, which requires some core mm changes to function properly. One sharp edge is that PTEs that are both Write=0 and Dirty=1 are treated as shadow by the CPU, but this combination used to be created by the kernel on x86. Previous patches have changed the kernel to now avoid creating these PTEs unless they are for shadow stack memory. In case any missed corners of the kernel are still creating PTEs like this for non-shadow stack memory, and to catch any re-introductions of the logic, warn if any shadow stack PTEs (Write=0, Dirty=1) are found in non-shadow stack VMAs when they are being zapped. This won't catch transient cases but should have decent coverage. In order to check if a PTE is shadow stack in core mm code, add two arch breakouts arch_check_zapped_pte/pmd(). This will allow shadow stack specific code to be kept in arch/x86. Only do the check if shadow stack is supported by the CPU and configured because in rare cases older CPUs may write Dirty=1 to a Write=0 CPU on older CPUs. This check is handled in pte_shstk()/pmd_shstk(). Signed-off-by: Rick Edgecombe Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook Reviewed-by: Mark Brown --- v9: - Add comments about not doing the check on non-shadow stack CPUs --- arch/x86/include/asm/pgtable.h | 6 ++++++ arch/x86/mm/pgtable.c | 20 ++++++++++++++++++++ include/linux/pgtable.h | 14 ++++++++++++++ mm/huge_memory.c | 1 + mm/memory.c | 1 + 5 files changed, 42 insertions(+) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index d8724f5b1202..89cfa93d0ad6 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -1664,6 +1664,12 @@ static inline bool arch_has_hw_pte_young(void) return true; } +#define arch_check_zapped_pte arch_check_zapped_pte +void arch_check_zapped_pte(struct vm_area_struct *vma, pte_t pte); + +#define arch_check_zapped_pmd arch_check_zapped_pmd +void arch_check_zapped_pmd(struct vm_area_struct *vma, pmd_t pmd); + #ifdef CONFIG_XEN_PV #define arch_has_hw_nonleaf_pmd_young arch_has_hw_nonleaf_pmd_young static inline bool arch_has_hw_nonleaf_pmd_young(void) diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c index 0ad2c62ac0a8..101e721d74aa 100644 --- a/arch/x86/mm/pgtable.c +++ b/arch/x86/mm/pgtable.c @@ -894,3 +894,23 @@ pmd_t pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma) return pmd_clear_saveddirty(pmd); } + +void arch_check_zapped_pte(struct vm_area_struct *vma, pte_t pte) +{ + /* + * Hardware before shadow stack can (rarely) set Dirty=1 + * on a Write=0 PTE. So the below condition + * only indicates a software bug when shadow stack is + * supported by the HW. This checking is covered in + * pte_shstk(). + */ + VM_WARN_ON_ONCE(!(vma->vm_flags & VM_SHADOW_STACK) && + pte_shstk(pte)); +} + +void arch_check_zapped_pmd(struct vm_area_struct *vma, pmd_t pmd) +{ + /* See note in arch_check_zapped_pte() */ + VM_WARN_ON_ONCE(!(vma->vm_flags & VM_SHADOW_STACK) && + pmd_shstk(pmd)); +} diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 0f3cf726812a..feb1fd2c814f 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -291,6 +291,20 @@ static inline bool arch_has_hw_pte_young(void) } #endif +#ifndef arch_check_zapped_pte +static inline void arch_check_zapped_pte(struct vm_area_struct *vma, + pte_t pte) +{ +} +#endif + +#ifndef arch_check_zapped_pmd +static inline void arch_check_zapped_pmd(struct vm_area_struct *vma, + pmd_t pmd) +{ +} +#endif + #ifndef __HAVE_ARCH_PTEP_GET_AND_CLEAR static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long address, diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 37dd56b7b3d1..c3cc20c1b26c 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1681,6 +1681,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, */ orig_pmd = pmdp_huge_get_and_clear_full(vma, addr, pmd, tlb->fullmm); + arch_check_zapped_pmd(vma, orig_pmd); tlb_remove_pmd_tlb_entry(tlb, pmd, addr); if (vma_is_special_huge(vma)) { if (arch_needs_pgtable_deposit()) diff --git a/mm/memory.c b/mm/memory.c index c1b6fe944c20..40c0b233b61d 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1412,6 +1412,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb, continue; ptent = ptep_get_and_clear_full(mm, addr, pte, tlb->fullmm); + arch_check_zapped_pte(vma, ptent); tlb_remove_tlb_entry(tlb, pte, addr); zap_install_uffd_wp_if_needed(vma, addr, pte, details, ptent); From patchwork Tue Jun 13 00:10:44 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 106964 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp211840vqr; Mon, 12 Jun 2023 17:16:28 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ7413NO/RglDpkGx0aLm3j1/VjQklOFANnADQPkBPRXeksZOss3dQecmxR0M8/Zp6c9vl88 X-Received: by 2002:aa7:c744:0:b0:514:ad09:44df with SMTP id c4-20020aa7c744000000b00514ad0944dfmr6155438eds.28.1686615388463; Mon, 12 Jun 2023 17:16:28 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1686615388; cv=none; d=google.com; s=arc-20160816; b=R6oc8tYb2z+Eed8jjBixZNfhKpoNSXCWhfI/oPgvgWI4rHjmjwBC9rzKb0cCv0/lYi uvcJ8QNRAiHl5mEqxx/npv9Epx3VRHb5DZI4tIwVlWKwoSNWTBYqDvEqa2qApeLHCqR0 jIizOZb6W2anyCL45hWUAKlXjp/Znsgt8AMT0bzk/fSthKUdELBkzZvNbo5HmKLH5LLQ E+tO03rPb55Gzv0ByZku0oIzPLWAeWR9nCi1JXuqkI/Kv/9flH86SGhnaBAy3a/nNzju RZTbV7gzcOO4nKv9L/cccnLbzw+szmfammuUma/EE44UVnpyrhi9Uhay/8DMFnxAbtrr lVDQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=xbWC9KPiSk6+i/caVwFEQQ8XwmMbX5MPJXovsyxTZuw=; b=UPaHUHfSUKJZk8G9Q8JQD+veN2/bF8BsmpbbugWv1FAnntI6YWEb+9Zl8uT+gx9GlT s+Tb5n4su+pC2NY7rG7By2AV14W3oywp2+e0FR0+4ZhXPJS1GN39Ip58frzr5FXfoDis A0VkoDJ1RrVeHdPXNs/x9eGt50DAPIMnd4f6wJh20e+uRy3hmShrivHL4iAvWz+V8zQS seKGc+0QiZKgOFwTiz97EpBuI6w1mem8lUe590pyZrBKbdcSbuLMpkyqcfGrvfMae5on tJBX/g4FOG7UtO9hHShTGcgwvBr2nnPn2j3KXZF8Y6SUZQQZiJXDR/PvpNG6ViAjj/dP UGKg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=jrL0WHpc; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id by5-20020a0564021b0500b0050bfd34935bsi1966730edb.338.2023.06.12.17.16.02; Mon, 12 Jun 2023 17:16:28 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=jrL0WHpc; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239163AbjFMAPU (ORCPT + 99 others); Mon, 12 Jun 2023 20:15:20 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53206 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239022AbjFMAN4 (ORCPT ); Mon, 12 Jun 2023 20:13:56 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 43BA81BDC; Mon, 12 Jun 2023 17:12:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615155; x=1718151155; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=ioNZyjx88vkgqQos5HBjwuXN8Qld1af278Y586V7kGQ=; b=jrL0WHpcBkHj3QYgXv2hBY/Ru0gbYbl9aOhT0ffF8Yxx9A7RpOEezA7d 4drBnPu33Qj9l1tAypE4TJl0cAru7b1OxQckh31A5sYUja3AJyEnmipMG CzyT2JOxbh3IMStaROCNNwRVCQb4hWAyx99qG6MypQ2RsRRK2/lZGsd+Z f8jbOvNVBPoRgNMJ7E9C4PL2qZWhM22glD3JFWBywa6eznOzM6tDHSN5Q jsmnxb3Z2loQFm+wCVhCgY93pErg05k6vDxfcqSJrtEgWKXpfO5McKxd/ 3nnGLFPOEwTByd6Yy9785wKchaGjP0F3nU/TWWUzGyEMNmn4bjp8bcsHo g==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361557071" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361557071" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:23 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835671040" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835671040" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:22 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Pengfei Xu Subject: [PATCH v9 18/42] x86/mm: Warn if create Write=0,Dirty=1 with raw prot Date: Mon, 12 Jun 2023 17:10:44 -0700 Message-Id: <20230613001108.3040476-19-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1768544417456940539?= X-GMAIL-MSGID: =?utf-8?q?1768544417456940539?= When user shadow stack is in use, Write=0,Dirty=1 is treated by the CPU as shadow stack memory. So for shadow stack memory this bit combination is valid, but when Dirty=1,Write=1 (conventionally writable) memory is being write protected, the kernel has been taught to transition the Dirty=1 bit to SavedDirty=1, to avoid inadvertently creating shadow stack memory. It does this inside pte_wrprotect() because it knows the PTE is not intended to be a writable shadow stack entry, it is supposed to be write protected. However, when a PTE is created by a raw prot using mk_pte(), mk_pte() can't know whether to adjust Dirty=1 to SavedDirty=1. It can't distinguish between the caller intending to create a shadow stack PTE or needing the SavedDirty shift. The kernel has been updated to not do this, and so Write=0,Dirty=1 memory should only be created by the pte_mkfoo() helpers. Add a warning to make sure no new mk_pte() start doing this, like, for example, set_memory_rox() did. Signed-off-by: Rick Edgecombe Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- v9: - Always do the check since 32 bit now supports SavedDirty --- arch/x86/include/asm/pgtable.h | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 89cfa93d0ad6..5383f7282f89 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -1032,7 +1032,14 @@ static inline unsigned long pmd_page_vaddr(pmd_t pmd) * (Currently stuck as a macro because of indirect forward reference * to linux/mm.h:page_to_nid()) */ -#define mk_pte(page, pgprot) pfn_pte(page_to_pfn(page), (pgprot)) +#define mk_pte(page, pgprot) \ +({ \ + pgprot_t __pgprot = pgprot; \ + \ + WARN_ON_ONCE((pgprot_val(__pgprot) & (_PAGE_DIRTY | _PAGE_RW)) == \ + _PAGE_DIRTY); \ + pfn_pte(page_to_pfn(page), __pgprot); \ +}) static inline int pmd_bad(pmd_t pmd) { From patchwork Tue Jun 13 00:10:45 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 106965 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp212338vqr; Mon, 12 Jun 2023 17:17:35 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ6u++RKM8fuL+b4tVWHi/p8RlqlA2YPKkrW9r7mCHP61EcCnhKka2itESsYUuL8qC9OglK2 X-Received: by 2002:a17:907:7d8f:b0:97d:2bcc:47d5 with SMTP id oz15-20020a1709077d8f00b0097d2bcc47d5mr9409971ejc.49.1686615455352; Mon, 12 Jun 2023 17:17:35 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1686615455; cv=none; d=google.com; s=arc-20160816; b=qpsIZZ2I0X3Ziao1Tu6nvfHAHLntougkllrXDSyZWRVeRa2QIJVFBYTmggQRryuGQ9 x8s43dn81RA1HC90guT/hk/30Cmaj2lDlh9uDs903qbTmeOHR8+IeKVBp6dpwYZPihtV ojnMOBTIeYP30KL46gmqUZ5YTxu/wK6sWDl9aKlnURI+x/DUcYLA6FpdWiBhBIZODBqK lcSy5R6nhvGPgS4BnYzjqaffHCXyVBHaapha58IFrCu0RvOcAq49EwqoCftANdhurkcc /+j9IGCVAMG5QH+TzwcVvyxRVLxrsc0/O00l0vtmeWO4ty1B2ka2w4IVVfLUD7Jd/Erp 1uvw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=JK9+ERy4F36GfmVdpqghccuIPuJ16HTtI5Wly0MCYFI=; b=UN2nliY+kEjaycF3NLFNuJX04ZDthoIIuLDwRHjjhRnNkRKI6VZ7Wm9nfxZLSzWZNz hicJXqATvENklowjDQBr4/kW799otfUddBx0t5mFhJzNea30idXfrgYwsnP71FUTpY4E H8IJLGFOpiZmySf9sJ6pFDwf+eGyPKipDbQXhAmD2byD1m528lCGP89s8de+VhMg6Zor 5wY+rICznHRix0Xl4AaIlf44EJN7yxua+TZpLAOrNfwGcefeclP++KGOWwT0Q1m9tSp8 VA0LxxNn7JWCiN5BU+CrcJKONP7Bn3SYDN1CA0DF7AdSHeIFpn0vqv6RlB2A/nQvnGwY c5WQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=ReZ5iDxt; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id qh1-20020a170906eca100b00965bb773354si5172848ejb.261.2023.06.12.17.17.05; Mon, 12 Jun 2023 17:17:35 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=ReZ5iDxt; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239066AbjFMAPX (ORCPT + 99 others); Mon, 12 Jun 2023 20:15:23 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52288 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239064AbjFMAOF (ORCPT ); Mon, 12 Jun 2023 20:14:05 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 747A01BE5; Mon, 12 Jun 2023 17:12:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615157; x=1718151157; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=0w0eM3eQB07Ass1qJGPwxV0mmwv4IZmjBMBIbi82ENo=; b=ReZ5iDxtnRq/j9yOwTVVm8jfsnwBm5XBzlh2zJrrkvlhQ4Sw4nW3aQ0x FZQtGGk3lOA9NIbXbS7iz6esqfttDuhVz05nhygtkjTnQDYBEsSYcVrla gO/7pVN5RpeiK1tH42ldhPb3FBiDpz8gBe8WonZbzFy1fBUC8jf+tIn9G 9nTbtQo4hC4p+VZjDST0dKlQ3OJKzQo7ZwDcCyLkJzrtZ+wxV3Frhn0v+ OKhtfCOhz0i32BNjTL1smFYtrCD/KE+yjLF8BifgWD5xZabAkZ1W7+LIS 9FOX/bOIDh90mmht5l41dCvjRxos7UlYE4QqUfiNQql7zYHZGYmmenc+K w==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361557093" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361557093" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:24 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835671045" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835671045" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:23 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu , Pengfei Xu Subject: [PATCH v9 19/42] mm/mmap: Add shadow stack pages to memory accounting Date: Mon, 12 Jun 2023 17:10:45 -0700 Message-Id: <20230613001108.3040476-20-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1768544488110434225?= X-GMAIL-MSGID: =?utf-8?q?1768544488110434225?= The x86 Control-flow Enforcement Technology (CET) feature includes a new type of memory called shadow stack. This shadow stack memory has some unusual properties, which requires some core mm changes to function properly. Co-developed-by: Yu-cheng Yu Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Acked-by: David Hildenbrand Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- mm/internal.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/mm/internal.h b/mm/internal.h index 68410c6d97ac..dd2ded32d3d5 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -535,14 +535,14 @@ static inline bool is_exec_mapping(vm_flags_t flags) } /* - * Stack area - automatically grows in one direction + * Stack area (including shadow stacks) * * VM_GROWSUP / VM_GROWSDOWN VMAs are always private anonymous: * do_mmap() forbids all other combinations. */ static inline bool is_stack_mapping(vm_flags_t flags) { - return (flags & VM_STACK) == VM_STACK; + return ((flags & VM_STACK) == VM_STACK) || (flags & VM_SHADOW_STACK); } /* From patchwork Tue Jun 13 00:10:46 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 106982 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp222495vqr; Mon, 12 Jun 2023 17:45:23 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ5Rb4jlYj599g9yhziytoQkU9ja5M9M+RvFlbR9MwyugfEJXQeWe/u/juWYnbt/gKTqJxbj X-Received: by 2002:a17:907:1c88:b0:96f:1629:3e9a with SMTP id nb8-20020a1709071c8800b0096f16293e9amr11464148ejc.34.1686617122928; Mon, 12 Jun 2023 17:45:22 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1686617122; cv=none; d=google.com; s=arc-20160816; b=djhSphd1iqk6DZ5wOxO5hLF6r6/MAq+hEj65QdnYx2SVk1aBHmjlVVmTBUQMoFlZG6 aW793XrE5VJv9sT1MAFUC7NCQP/uF2SOlyDiAFQt/dO6Zcn43CsOZtRhhyJgLMvVOUN3 hFUP4zxg3xkmHQs28aBEE2T9K/buuJG5dIjQ49kzjtWNoYYcFUiPpsqQgoxVxjt8io5P wJncqPUT8AqJynm4/BQrjGjC8wVVV2SbZcE7njL6zyimVhDEHYoFkh7+pp9Y9uwuWba7 EChBUTm+D0uSNw7RA8Nz/Sd9SivjGdrUYo3ATVOZhgBUhiMRv7cN56J1VloABuMfkmfC 69ew== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=eSpfbzj81RJHeXXJwuREGCXBeYOUsrYFIe5C3KzovZA=; b=kHm/4TVrZjzuwr7FVcFrb+CRndWqN/R1oDd+GX9ZNY65U5CuuvTX0uhNNxl4FeE12n AhxFygNEGUzOWjOJMXb4eUux3oEkipaq+pppBZQ8SDWLJ1DMMeAd/UU7OyUR2+YmhG5m eF7NLtH5jGD2f7xmvpPLO53vhDhZ0oD0F4CTzLgDSlTIhH9CRSXG51lGoEx9E93wfMEl QIvO8L2ktRb7PTglJDEE95mNjnAE6yhfGYAyIAYceV+s3JS0oM3141lYgzRIniWl/v6u aA6QLgVjS/llYzGHP3g1SwP6g8JIF+STbo3OesS2u6Kr7sDgfHB4AFffmJqKJJnXcMMI SXXw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=cDL23HLN; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id n21-20020a170906725500b00978a964b146si6038210ejk.1047.2023.06.12.17.44.39; Mon, 12 Jun 2023 17:45:22 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=cDL23HLN; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239268AbjFMAPy (ORCPT + 99 others); Mon, 12 Jun 2023 20:15:54 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53522 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235018AbjFMAOY (ORCPT ); Mon, 12 Jun 2023 20:14:24 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3AA1F173C; Mon, 12 Jun 2023 17:12:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615175; x=1718151175; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=EOoWyIFrsGRkjg2/O3hP5P9SjQgryxLlQpl2aAV/598=; b=cDL23HLNne7N8bc6doKlwnM2fFAxzlxsDwkJ0dTESoLpYtOeUl39B27B 7QxMUmcJrDITpgoSyBmEG2SHEecLMu4A2u7e9jNQeT9qRhN5aga283Ag5 h4rRsN3T0ikADWQtc/iRdzzNPm4ndRkx75rxCGBzWL+PBB6YDqyApWuXd YI7809pftOFgxdNHxdi8/u+0GzxyOCJSkZANewLhQ9phzux8v2VmlENsW 8FOeh2ewdf8Vv+Jh4McLcAOHAIzgSgxu/aa2Mc/14xbFLOVB0jSbtUCZS XyIpXhFt+pgU0Qsqj8ylieTeN6SrO7B39hOJqXmuvbKVhX05X+KrO13EY g==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361557117" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361557117" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:25 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835671048" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835671048" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:24 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Pengfei Xu Subject: [PATCH v9 20/42] x86/mm: Introduce MAP_ABOVE4G Date: Mon, 12 Jun 2023 17:10:46 -0700 Message-Id: <20230613001108.3040476-21-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1768546236174147512?= X-GMAIL-MSGID: =?utf-8?q?1768546236174147512?= The x86 Control-flow Enforcement Technology (CET) feature includes a new type of memory called shadow stack. This shadow stack memory has some unusual properties, which require some core mm changes to function properly. One of the properties is that the shadow stack pointer (SSP), which is a CPU register that points to the shadow stack like the stack pointer points to the stack, can't be pointing outside of the 32 bit address space when the CPU is executing in 32 bit mode. It is desirable to prevent executing in 32 bit mode when shadow stack is enabled because the kernel can't easily support 32 bit signals. On x86 it is possible to transition to 32 bit mode without any special interaction with the kernel, by doing a "far call" to a 32 bit segment. So the shadow stack implementation can use this address space behavior as a feature, by enforcing that shadow stack memory is always mapped outside of the 32 bit address space. This way userspace will trigger a general protection fault which will in turn trigger a segfault if it tries to transition to 32 bit mode with shadow stack enabled. This provides a clean error generating border for the user if they try attempt to do 32 bit mode shadow stack, rather than leave the kernel in a half working state for userspace to be surprised by. So to allow future shadow stack enabling patches to map shadow stacks out of the 32 bit address space, introduce MAP_ABOVE4G. The behavior is pretty much like MAP_32BIT, except that it has the opposite address range. The are a few differences though. If both MAP_32BIT and MAP_ABOVE4G are provided, the kernel will use the MAP_ABOVE4G behavior. Like MAP_32BIT, MAP_ABOVE4G is ignored in a 32 bit syscall. Since the default search behavior is top down, the normal kaslr base can be used for MAP_ABOVE4G. This is unlike MAP_32BIT which has to add its own randomization in the bottom up case. For MAP_32BIT, only the bottom up search path is used. For MAP_ABOVE4G both are potentially valid, so both are used. In the bottomup search path, the default behavior is already consistent with MAP_ABOVE4G since mmap base should be above 4GB. Without MAP_ABOVE4G, the shadow stack will already normally be above 4GB. So without introducing MAP_ABOVE4G, trying to transition to 32 bit mode with shadow stack enabled would usually segfault anyway. This is already pretty decent guard rails. But the addition of MAP_ABOVE4G is some small complexity spent to make it make it more complete. Signed-off-by: Rick Edgecombe Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- arch/x86/include/uapi/asm/mman.h | 1 + arch/x86/kernel/sys_x86_64.c | 6 +++++- include/linux/mman.h | 4 ++++ 3 files changed, 10 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/uapi/asm/mman.h b/arch/x86/include/uapi/asm/mman.h index 775dbd3aff73..5a0256e73f1e 100644 --- a/arch/x86/include/uapi/asm/mman.h +++ b/arch/x86/include/uapi/asm/mman.h @@ -3,6 +3,7 @@ #define _ASM_X86_MMAN_H #define MAP_32BIT 0x40 /* only give out 32bit addresses */ +#define MAP_ABOVE4G 0x80 /* only map above 4GB */ #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS #define arch_calc_vm_prot_bits(prot, key) ( \ diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c index 8cc653ffdccd..c783aeb37dce 100644 --- a/arch/x86/kernel/sys_x86_64.c +++ b/arch/x86/kernel/sys_x86_64.c @@ -193,7 +193,11 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0, info.flags = VM_UNMAPPED_AREA_TOPDOWN; info.length = len; - info.low_limit = PAGE_SIZE; + if (!in_32bit_syscall() && (flags & MAP_ABOVE4G)) + info.low_limit = SZ_4G; + else + info.low_limit = PAGE_SIZE; + info.high_limit = get_mmap_base(0); /* diff --git a/include/linux/mman.h b/include/linux/mman.h index cee1e4b566d8..40d94411d492 100644 --- a/include/linux/mman.h +++ b/include/linux/mman.h @@ -15,6 +15,9 @@ #ifndef MAP_32BIT #define MAP_32BIT 0 #endif +#ifndef MAP_ABOVE4G +#define MAP_ABOVE4G 0 +#endif #ifndef MAP_HUGE_2MB #define MAP_HUGE_2MB 0 #endif @@ -50,6 +53,7 @@ | MAP_STACK \ | MAP_HUGETLB \ | MAP_32BIT \ + | MAP_ABOVE4G \ | MAP_HUGE_2MB \ | MAP_HUGE_1GB) From patchwork Tue Jun 13 00:10:47 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 106995 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp223271vqr; Mon, 12 Jun 2023 17:47:20 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ7TN6GpV5GOAQ6QdL27CKdpj05dYwr4IrItW1+jjLF/njzkLrJFvUTeZHRVSf4i8bVOmOKY X-Received: by 2002:a05:6402:1258:b0:518:7902:d244 with SMTP id l24-20020a056402125800b005187902d244mr36100edw.6.1686617240144; Mon, 12 Jun 2023 17:47:20 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1686617240; cv=none; d=google.com; s=arc-20160816; b=fQFYu8wWKUXEz6i1Abhq9xq68pbdTJXx7CJXw2gM3opwMZ1VA+cowWDyx0UBAPSsHK YsRU0pyamkzNkOXvMJJJE7GrBuGX4uvJTAlKzC2Tvk2NnDqHRDXkJUOnF2TovDATJmCL 8azMVEIuhcN/fYZ1WmDix3wQzMOs+Nfwj0MazBdowZUU7p3F+mFtgWivniqJ9s2mtDt0 6PUCyuHt30rN0Khe5AX5LfE2FYFXmYOFePjMWkJsdsjETZE7vzcnIaqXO0vcc5PLHca+ pgLIjUnSJeM2H2aH1ADCtJvoQK6GYqzkAJemwhwWMlc+5tySIDZlNZPW+NkSuj3wZqSV nJ1A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=PXl5GqZ7V0Lo7jE7kAx6s21MXFeCFmlgpAurMlsy448=; b=LgcW3RzquP8lTfQG8TB0PLi/Y8da4wcCGeTqva2lxCp8BMOzEGgc/2rPiTZH+MYje5 t9vuwMAIydW9F9hORrgQ1qOfvAf3FdGb43mGpaKgly/5urk/eeM+rfHd9IEM9UmwscQg AH1z+PKmkGCD5Pc5lqk3aS5d8mhioV+8sdCD37+o7xF8FY8k16euNHPSZEEXLkVXMMb+ YGc0aH9K3pmuwxpeO0QzvH4UXhmKPZ59F0u38ZdsQMzpEAEsvU3JMrq3qNhK+QFToZtK q/V66jfR1H/4keSJ5F1SLEH5FDOeQZD927o8W/hYApfhKOSh+xF0e6dxCSSJIe8ze3/a aj+A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=G03JU2S0; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id g4-20020a056402180400b00510d91f80aesi6645614edy.73.2023.06.12.17.46.56; Mon, 12 Jun 2023 17:47:20 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=G03JU2S0; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238924AbjFMAPt (ORCPT + 99 others); Mon, 12 Jun 2023 20:15:49 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52424 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233336AbjFMAOY (ORCPT ); Mon, 12 Jun 2023 20:14:24 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 17AA41FD4; Mon, 12 Jun 2023 17:12:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615176; x=1718151176; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=c7sunRbrNdSwt/1KmOuIzzKZkXv73NcNzDYIVaoBldo=; b=G03JU2S0Evi/fePKI8tofmFf6jCfbNuseYNk3pBFLsEjtODG0wjoLGX+ fTtSibEDt9OAP17er1I4Y2zOUV+gLWqXTiu7hsS6hUTYyhxkDSadT8Ihk C5CDBUZgCNPaY5h5CR/sbJg/4LKePeZkxee12wEJJxmjKh4tipSR+a9Qc w5tc+YhEQtoNcTjqPXbqzwshaVvGDWwcJzqPt9OrfXmTvLLO3tyz384MH 2E5LkLl5e2e07zNSx8KSt4z1kHIfUHdPl8zbXgCZdRIuKlsGxnubCYOtY P3ghlZM2+r1GM7YyFY84GurNIdlWtTMvT2smfRg87Qwe8bKcd9xoMsz4a w==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361557144" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361557144" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:26 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835671052" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835671052" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:25 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Pengfei Xu Subject: [PATCH v9 21/42] x86/mm: Teach pte_mkwrite() about stack memory Date: Mon, 12 Jun 2023 17:10:47 -0700 Message-Id: <20230613001108.3040476-22-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1768546359251974006?= X-GMAIL-MSGID: =?utf-8?q?1768546359251974006?= If a VMA has the VM_SHADOW_STACK flag, it is shadow stack memory. So when it is made writable with pte_mkwrite(), it should create shadow stack memory, not conventionally writable memory. Now that all the places where shadow stack memory might be created pass a VMA into pte_mkwrite(), it can know when it should do this. So make pte_mkwrite() create shadow stack memory when the VMA has the VM_SHADOW_STACK flag. Do the same thing for pmd_mkwrite(). This requires referencing VM_SHADOW_STACK in these functions, which are currently defined in pgtable.h, however mm.h (where VM_SHADOW_STACK is located) can't be pulled in without causing problems for files that reference pgtable.h. So also move pte/pmd_mkwrite() into pgtable.c, where they can safely reference VM_SHADOW_STACK. Signed-off-by: Rick Edgecombe Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Acked-by: Deepak Gupta Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- arch/x86/mm/pgtable.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c index 101e721d74aa..c4b222d3b1b4 100644 --- a/arch/x86/mm/pgtable.c +++ b/arch/x86/mm/pgtable.c @@ -883,6 +883,9 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr) pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma) { + if (vma->vm_flags & VM_SHADOW_STACK) + return pte_mkwrite_shstk(pte); + pte = pte_mkwrite_novma(pte); return pte_clear_saveddirty(pte); @@ -890,6 +893,9 @@ pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma) pmd_t pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma) { + if (vma->vm_flags & VM_SHADOW_STACK) + return pmd_mkwrite_shstk(pmd); + pmd = pmd_mkwrite_novma(pmd); return pmd_clear_saveddirty(pmd); From patchwork Tue Jun 13 00:10:48 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 106994 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp223260vqr; Mon, 12 Jun 2023 17:47:18 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ6agxEFQy4DmujcLmJt3o+9oIzU2dDIPlfDoFI4qNCYNZ6PG5gd7DNVvLohRYjyGiPwjDjl X-Received: by 2002:a17:907:6d98:b0:973:d8c4:a4df with SMTP id sb24-20020a1709076d9800b00973d8c4a4dfmr11233721ejc.52.1686617238231; Mon, 12 Jun 2023 17:47:18 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1686617238; cv=none; d=google.com; s=arc-20160816; b=g6quNRhEV9i+FYUv41E1GnpBAwXyzJCvC6PGzIlNXRdfnWLI7zzNunDOoP/VjJNTcW rrj45AAdmY1OYQ5/6GELsFUW1U/bHKkxciPPPeoliKg5i6KUosHjsrEhogLWYpgYfWun Z7tVFfXqd328AN8U7tEScXgapDhaosv1tqsXSuVeieLsCIrsJyEemXbIowgfSHRwaqJx X0uEPQTeqgOoPolpU9ZXqRvPfyYB9tsRXdkwwDITRkhAJsfGsDxkjdF0ZMCDWhsXaNG/ TSF/kaF2mhR7U6X4TLBCSeDv6Zqhrat8rfgxv2BTt1gjLo46peSFA+1DunL9Db9EChWJ 3DKw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=OTL6GqoGz5oweSC99z4UnMl2uF9TwgX18y0erkZrvzI=; b=Z1OE6dbvxd6uFTxzQcyGKsnfAguX9yeJxBdZxDWenosYVSWDuW9TSWD+iB8YnTjlZi OMVdrKizDpZLY08xvhIZgpuqDyw0u5Ha2rlMGN8DTEuqDGB0NjtmP6pDx4FxY0Co+mFr fITODEBwXIF5z4BAwaxwyeN0M/cJsBkhqF2ycUkMNZDx45d+JzWwG8bCE9355s/fW817 ECreN9wITwZtQt0lTiuZi53RnbRV2emBFyFjU0wD4xQbA3TQY6tGPUzphnBHP6aje4a/ v/BTWbC5OIUX/KVoxpQZnRt9ktxAiak0Ds5yOVBBX35ICOVLTw7LnmH35ibda8+GCNcs 9ouA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=VUA03PHE; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id l4-20020a170906644400b0094f442c8a6dsi3305360ejn.291.2023.06.12.17.46.52; Mon, 12 Jun 2023 17:47:18 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=VUA03PHE; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238920AbjFMAP5 (ORCPT + 99 others); Mon, 12 Jun 2023 20:15:57 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53202 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238630AbjFMAPO (ORCPT ); Mon, 12 Jun 2023 20:15:14 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 236E81FE5; Mon, 12 Jun 2023 17:12:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615179; x=1718151179; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=nq8FP2aUY00edE8aipvtg7A+Bog2Zlh3+3MlR+vKuX8=; b=VUA03PHEsKf7BGSuEYVz5E8l/PobAViMN6h/hju3XYNwYiRTtNZCV5AC x3LNNCJ13NUScqLV9UND5F2QFHvBf2/0O77HQrsBq7NF83s1B7jzlviKA G7yIGHLE/rXXSvj46HeXcd0JTRv8xmdNR8Rlq3md4QHaj3n0aFnzAzm+Y tgBDIPCs45SkOkTewLEkJHWRKWJU1nAY746E6y2ff/x46bFujzHEknfoh KdVA4n3jA1bkInpanNHtTZodo6mB1SrouubvfrLEArDgRgSMMpzX7MRAt aWp3omlLb1hLmZEf+7Eg7Qu4SfPLZP4rvp0FYOxggwz1OQp8jozOGkCJu g==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361557169" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361557169" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:27 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835671056" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835671056" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:26 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Pengfei Xu Subject: [PATCH v9 22/42] mm: Don't allow write GUPs to shadow stack memory Date: Mon, 12 Jun 2023 17:10:48 -0700 Message-Id: <20230613001108.3040476-23-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1768546357127373050?= X-GMAIL-MSGID: =?utf-8?q?1768546357127373050?= The x86 Control-flow Enforcement Technology (CET) feature includes a new type of memory called shadow stack. This shadow stack memory has some unusual properties, which requires some core mm changes to function properly. In userspace, shadow stack memory is writable only in very specific, controlled ways. However, since userspace can, even in the limited ways, modify shadow stack contents, the kernel treats it as writable memory. As a result, without additional work there would remain many ways for userspace to trigger the kernel to write arbitrary data to shadow stacks via get_user_pages(, FOLL_WRITE) based operations. To help userspace protect their shadow stacks, make this a little less exposed by blocking writable get_user_pages() operations for shadow stack VMAs. Still allow FOLL_FORCE to write through shadow stack protections, as it does for read-only protections. This is required for debugging use cases. Signed-off-by: Rick Edgecombe Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Acked-by: David Hildenbrand Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- arch/x86/include/asm/pgtable.h | 5 +++++ mm/gup.c | 2 +- 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 5383f7282f89..fce35f5d4a4e 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -1630,6 +1630,11 @@ static inline bool __pte_access_permitted(unsigned long pteval, bool write) { unsigned long need_pte_bits = _PAGE_PRESENT|_PAGE_USER; + /* + * Write=0,Dirty=1 PTEs are shadow stack, which the kernel + * shouldn't generally allow access to, but since they + * are already Write=0, the below logic covers both cases. + */ if (write) need_pte_bits |= _PAGE_RW; diff --git a/mm/gup.c b/mm/gup.c index bbe416236593..cc0dd5267509 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -978,7 +978,7 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags) return -EFAULT; if (write) { - if (!(vm_flags & VM_WRITE)) { + if (!(vm_flags & VM_WRITE) || (vm_flags & VM_SHADOW_STACK)) { if (!(gup_flags & FOLL_FORCE)) return -EFAULT; /* hugetlb does not support FOLL_FORCE|FOLL_WRITE. */ From patchwork Tue Jun 13 00:10:49 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 106966 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp215939vqr; Mon, 12 Jun 2023 17:27:29 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ4clp8bIy780jC1hkXatNW0f/bkZ7eqHYyKL/u+xsE6IPYFKdFc4gQ43Ycowxes9OItLg/I X-Received: by 2002:a17:907:c15:b0:94e:48ac:9a51 with SMTP id ga21-20020a1709070c1500b0094e48ac9a51mr11539613ejc.4.1686616049406; Mon, 12 Jun 2023 17:27:29 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1686616049; cv=none; d=google.com; s=arc-20160816; b=xV5xyGeUn94yaz7TZWdOjJ/1cR10v+8/L2A8Myzkn+cpXbGpu0LicHYww0eAg7xxci wvNvGvZwUGro6UWj7/gnwCF3b402CwDt/RNJj63fvpdnPp83KqeziRyAwQFPBEJgfKcd w1hd59er+T+Z20JQYzLxD/KMG5s2X9y2WlbTU2eLly9m/n9KnD2oYL7Fctr2O/rbHprf S+A6vzfpQcgDS97kQgPnien7Ygp2BQ+eqjSst5IJ+6Pngd3brJN8O9iqrnRmtxXJSAZg fDyIV+diO7qgD1QZLV/q3YBtmjVD6AuMPXTyHm0q0BJDkfec/3hYmiUBZ9FpsGayQA8P +WWg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=ngsNHJkwJ8rpoiE9+OlXKRzL3wHKwvwi/lLZa6Im/hc=; b=ltWq7Xgu8RMLb7iZiWH623JDxwk+ZzjHZSWTIK4pByCkdvBInvkRtrfbohFX+tymqN YBrsfpjuBx/Me4GhETiA6HnCLJk36EmFZDP2/f0BLC8Vway28pL++qGpk46kQjkshRZm k7hCbRNs01ANnqgQ/h7zEpZYXrv+qduNbWcRfwEWHeyWePmZXOyEUJQyQ4Y5CItgqHG3 TgcBKejtFaBQlZGv/AC9O+lxaYSZvXpv5D1fpBx1NljhGWPuha90xG4sZ5p/QNFQlMdq kyYE7Elyo6PsvXvX0PRhvkM8+dzgyEx4dh4s4sox/+HWEMfU6hz3kaJ8vaRutKo/46RC hyqA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=lysIj8lU; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id c6-20020a170906d18600b00977cc33b59dsi6047915ejz.737.2023.06.12.17.27.04; Mon, 12 Jun 2023 17:27:29 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=lysIj8lU; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239335AbjFMAQX (ORCPT + 99 others); Mon, 12 Jun 2023 20:16:23 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52406 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239231AbjFMAPg (ORCPT ); Mon, 12 Jun 2023 20:15:36 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 116F82681; Mon, 12 Jun 2023 17:13:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615199; x=1718151199; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=QMw9iVo5bgNcqwrdr19MOUpW+koohCohKti/401XfIw=; b=lysIj8lU3aMzLbUK29LV7QLkk/p0zxuNXCKzD8t3kBMUIZkY4ydgR0aH ZtYsKNAOcJB1cYIYPCyMDx9Rkz8gjC5EcGt3pvXl+JyaovqGGRBg1KN/m jJ+PCDL0Fkii6uPyoAZWDKJY9vK2Nw4YASS1mlToAscsvao34SyqsjD1F bfFlmRQjmSDu188/t9iXq/6iQtk8Zlv+mDlsCHrRzf8iArLL3+P8xiNen XodN2SvT+g18Xj5rO9Xxufww8VyN3SAJa7xmJXD9VFTXq5aujanviF5RQ 8EHK4jo8eUoyjxqKBIGZQr1R2MJIxGM+agsWwWrFo4ftkRN0D9OX5Q7F+ g==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361557195" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361557195" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:28 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835671062" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835671062" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:27 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu , Pengfei Xu Subject: [PATCH v9 23/42] Documentation/x86: Add CET shadow stack description Date: Mon, 12 Jun 2023 17:10:49 -0700 Message-Id: <20230613001108.3040476-24-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1768545110814277149?= X-GMAIL-MSGID: =?utf-8?q?1768545110814277149?= Introduce a new document on Control-flow Enforcement Technology (CET). Co-developed-by: Yu-cheng Yu Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook Reviewed-by: Szabolcs Nagy --- Documentation/arch/x86/index.rst | 1 + Documentation/arch/x86/shstk.rst | 169 +++++++++++++++++++++++++++++++ 2 files changed, 170 insertions(+) create mode 100644 Documentation/arch/x86/shstk.rst diff --git a/Documentation/arch/x86/index.rst b/Documentation/arch/x86/index.rst index c73d133fd37c..8ac64d7de4dc 100644 --- a/Documentation/arch/x86/index.rst +++ b/Documentation/arch/x86/index.rst @@ -22,6 +22,7 @@ x86-specific Documentation mtrr pat intel-hfi + shstk iommu intel_txt amd-memory-encryption diff --git a/Documentation/arch/x86/shstk.rst b/Documentation/arch/x86/shstk.rst new file mode 100644 index 000000000000..f09afa504ec0 --- /dev/null +++ b/Documentation/arch/x86/shstk.rst @@ -0,0 +1,169 @@ +.. SPDX-License-Identifier: GPL-2.0 + +====================================================== +Control-flow Enforcement Technology (CET) Shadow Stack +====================================================== + +CET Background +============== + +Control-flow Enforcement Technology (CET) covers several related x86 processor +features that provide protection against control flow hijacking attacks. CET +can protect both applications and the kernel. + +CET introduces shadow stack and indirect branch tracking (IBT). A shadow stack +is a secondary stack allocated from memory which cannot be directly modified by +applications. When executing a CALL instruction, the processor pushes the +return address to both the normal stack and the shadow stack. Upon +function return, the processor pops the shadow stack copy and compares it +to the normal stack copy. If the two differ, the processor raises a +control-protection fault. IBT verifies indirect CALL/JMP targets are intended +as marked by the compiler with 'ENDBR' opcodes. Not all CPU's have both Shadow +Stack and Indirect Branch Tracking. Today in the 64-bit kernel, only userspace +shadow stack and kernel IBT are supported. + +Requirements to use Shadow Stack +================================ + +To use userspace shadow stack you need HW that supports it, a kernel +configured with it and userspace libraries compiled with it. + +The kernel Kconfig option is X86_USER_SHADOW_STACK. When compiled in, shadow +stacks can be disabled at runtime with the kernel parameter: nousershstk. + +To build a user shadow stack enabled kernel, Binutils v2.29 or LLVM v6 or later +are required. + +At run time, /proc/cpuinfo shows CET features if the processor supports +CET. "user_shstk" means that userspace shadow stack is supported on the current +kernel and HW. + +Application Enabling +==================== + +An application's CET capability is marked in its ELF note and can be verified +from readelf/llvm-readelf output:: + + readelf -n | grep -a SHSTK + properties: x86 feature: SHSTK + +The kernel does not process these applications markers directly. Applications +or loaders must enable CET features using the interface described in section 4. +Typically this would be done in dynamic loader or static runtime objects, as is +the case in GLIBC. + +Enabling arch_prctl()'s +======================= + +Elf features should be enabled by the loader using the below arch_prctl's. They +are only supported in 64 bit user applications. These operate on the features +on a per-thread basis. The enablement status is inherited on clone, so if the +feature is enabled on the first thread, it will propagate to all the thread's +in an app. + +arch_prctl(ARCH_SHSTK_ENABLE, unsigned long feature) + Enable a single feature specified in 'feature'. Can only operate on + one feature at a time. + +arch_prctl(ARCH_SHSTK_DISABLE, unsigned long feature) + Disable a single feature specified in 'feature'. Can only operate on + one feature at a time. + +arch_prctl(ARCH_SHSTK_LOCK, unsigned long features) + Lock in features at their current enabled or disabled status. 'features' + is a mask of all features to lock. All bits set are processed, unset bits + are ignored. The mask is ORed with the existing value. So any feature bits + set here cannot be enabled or disabled afterwards. + +The return values are as follows. On success, return 0. On error, errno can +be:: + + -EPERM if any of the passed feature are locked. + -ENOTSUPP if the feature is not supported by the hardware or + kernel. + -EINVAL arguments (non existing feature, etc) + +The feature's bits supported are:: + + ARCH_SHSTK_SHSTK - Shadow stack + ARCH_SHSTK_WRSS - WRSS + +Currently shadow stack and WRSS are supported via this interface. WRSS +can only be enabled with shadow stack, and is automatically disabled +if shadow stack is disabled. + +Proc Status +=========== +To check if an application is actually running with shadow stack, the +user can read the /proc/$PID/status. It will report "wrss" or "shstk" +depending on what is enabled. The lines look like this:: + + x86_Thread_features: shstk wrss + x86_Thread_features_locked: shstk wrss + +Implementation of the Shadow Stack +================================== + +Shadow Stack Size +----------------- + +A task's shadow stack is allocated from memory to a fixed size of +MIN(RLIMIT_STACK, 4 GB). In other words, the shadow stack is allocated to +the maximum size of the normal stack, but capped to 4 GB. In the case +of the clone3 syscall, there is a stack size passed in and shadow stack +uses this instead of the rlimit. + +Signal +------ + +The main program and its signal handlers use the same shadow stack. Because +the shadow stack stores only return addresses, a large shadow stack covers +the condition that both the program stack and the signal alternate stack run +out. + +When a signal happens, the old pre-signal state is pushed on the stack. When +shadow stack is enabled, the shadow stack specific state is pushed onto the +shadow stack. Today this is only the old SSP (shadow stack pointer), pushed +in a special format with bit 63 set. On sigreturn this old SSP token is +verified and restored by the kernel. The kernel will also push the normal +restorer address to the shadow stack to help userspace avoid a shadow stack +violation on the sigreturn path that goes through the restorer. + +So the shadow stack signal frame format is as follows:: + + |1...old SSP| - Pointer to old pre-signal ssp in sigframe token format + (bit 63 set to 1) + | ...| - Other state may be added in the future + + +32 bit ABI signals are not supported in shadow stack processes. Linux prevents +32 bit execution while shadow stack is enabled by the allocating shadow stacks +outside of the 32 bit address space. When execution enters 32 bit mode, either +via far call or returning to userspace, a #GP is generated by the hardware +which, will be delivered to the process as a segfault. When transitioning to +userspace the register's state will be as if the userspace ip being returned to +caused the segfault. + +Fork +---- + +The shadow stack's vma has VM_SHADOW_STACK flag set; its PTEs are required +to be read-only and dirty. When a shadow stack PTE is not RO and dirty, a +shadow access triggers a page fault with the shadow stack access bit set +in the page fault error code. + +When a task forks a child, its shadow stack PTEs are copied and both the +parent's and the child's shadow stack PTEs are cleared of the dirty bit. +Upon the next shadow stack access, the resulting shadow stack page fault +is handled by page copy/re-use. + +When a pthread child is created, the kernel allocates a new shadow stack +for the new thread. New shadow stack creation behaves like mmap() with respect +to ASLR behavior. Similarly, on thread exit the thread's shadow stack is +disabled. + +Exec +---- + +On exec, shadow stack features are disabled by the kernel. At which point, +userspace can choose to re-enable, or lock them. From patchwork Tue Jun 13 00:10:50 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 106972 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp221171vqr; Mon, 12 Jun 2023 17:41:57 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ4OxIUGy8Ph+cCioampRj9cHNl7Z33AcNVAEbvHh5IrHKlXKJGCtOt1d+n6UOOKPMlHfWQo X-Received: by 2002:a17:907:7da5:b0:974:5a12:546 with SMTP id oz37-20020a1709077da500b009745a120546mr10473975ejc.23.1686616917466; Mon, 12 Jun 2023 17:41:57 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1686616917; cv=none; d=google.com; s=arc-20160816; b=hoIv7c2z3wES3AEltbdeV72ybpMZyUdofQL8izAX0Wd1nUARYn2PBuZIz0ctMdOUfT 2Y40/ygPoI/aSnYMigErTugZb0vurMQj+dPxDpop/tCVvTXnRw7VuvM4JsSkqSUUsjkn 67n5qJ8oOt3f+wRO3zSEmI2owSewmlgy9QkUrxX1XrCyiM3gNfjZAF/K6OVU/XlydZ1z RU1Qxy+DGnyavWC+xWYNtaapjGUZ7DOqJ+sGIQ/KFjvTENd5iwyewSatTxAgF/Nqi2jD F+R0SUYuqXZ5tSk9scTXaURirtkw0Va10adEuL6y8mfIzlEQVKVayOUTGsztTy2L77MF diJg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=caRAaSBciErwlEFgMtwnmdwmBKoQ4mAGZdQm0SayhA8=; b=d4hkxq1VE0lsMXlZJj5NAtkQNwLNZdiDdpHiiziIyzFNr8a1db2vYbClbUpYMtSWAn UFZgbqVCRYBnF4sss5yza7vUoNDfVRemT5b4dwFj8rpMwPY56WTgODT5UTu6kXqBn9tI bomMGxqUOdvRm/O9nZXxStXh1sZhblQ++fZW0VI39XsKkPK++qgLSny5GaNwgSG36uX2 Jg1R2nJ80rxgRm3Xxw18kKuVSlaL82u4Ld+ruFLVd53PVEThg2FrsKAxWscnStDbQreM oO9ic0CQq2+WN9MRsgsKPr8JBYeSRgnwEwpd4UpEr+fGxrX1w3b98HovvZkBy1GDFqZB 4BiQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=NDEwsYYH; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id e15-20020a1709067e0f00b0097456c647ffsi6314679ejr.447.2023.06.12.17.41.33; Mon, 12 Jun 2023 17:41:57 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=NDEwsYYH; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238718AbjFMAQ2 (ORCPT + 99 others); Mon, 12 Jun 2023 20:16:28 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53358 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238733AbjFMAPj (ORCPT ); Mon, 12 Jun 2023 20:15:39 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 46A6B2686; Mon, 12 Jun 2023 17:13:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615199; x=1718151199; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=68vHo4+oSqhV8kF+lXavlD4FeK2AToHr3tcBPqqVDm8=; b=NDEwsYYHx40Ep9OMIk9W1/iP7dJnke4QePI7CdyeGzxMV5ajNdl62T88 Y6TCFnKkVf8pkOeKQLhie1xRNENYtsGWviupO47phzUnvwVzSs81yICvs XpDhafxbU9NMXvQMM+LbtiuE7QJlr57p2t0RpcuGaa8TVjkOzFXqIw/Of qcsSWAQC5kJUHCyBJan0ySMjJAY7XPsZlz2iyrQTbMw56qiLGKGKoNnW7 3Ua13iIKiCiBaWVyBK+IxMC4kBBA5hivZkLj5DXgXKBXOSX3EuP/baD5p KIVHX+EBQbi4uzaMbav0H4BXGgnSqnURIkoh2KdGaOFmULaf25C/bclcm Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361557219" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361557219" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:29 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835671067" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835671067" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:28 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu , Pengfei Xu Subject: [PATCH v9 24/42] x86/fpu/xstate: Introduce CET MSR and XSAVES supervisor states Date: Mon, 12 Jun 2023 17:10:50 -0700 Message-Id: <20230613001108.3040476-25-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1768546020526351652?= X-GMAIL-MSGID: =?utf-8?q?1768546020526351652?= Shadow stack register state can be managed with XSAVE. The registers can logically be separated into two groups: * Registers controlling user-mode operation * Registers controlling kernel-mode operation The architecture has two new XSAVE state components: one for each group of those groups of registers. This lets an OS manage them separately if it chooses. Future patches for host userspace and KVM guests will only utilize the user-mode registers, so only configure XSAVE to save user-mode registers. This state will add 16 bytes to the xsave buffer size. Future patches will use the user-mode XSAVE area to save guest user-mode CET state. However, VMCS includes new fields for guest CET supervisor states. KVM can use these to save and restore guest supervisor state, so host supervisor XSAVE support is not required. Adding this exacerbates the already unwieldy if statement in check_xstate_against_struct() that handles warning about unimplemented xfeatures. So refactor these check's by having XCHECK_SZ() set a bool when it actually check's the xfeature. This ends up exceeding 80 chars, but was better on balance than other options explored. Pass the bool as pointer to make it clear that XCHECK_SZ() can change the variable. While configuring user-mode XSAVE, clarify kernel-mode registers are not managed by XSAVE by defining the xfeature in XFEATURE_MASK_SUPERVISOR_UNSUPPORTED, like is done for XFEATURE_MASK_PT. This serves more of a documentation as code purpose, and functionally, only enables a few safety checks. Both XSAVE state components are supervisor states, even the state controlling user-mode operation. This is a departure from earlier features like protection keys where the PKRU state is a normal user (non-supervisor) state. Having the user state be supervisor-managed ensures there is no direct, unprivileged access to it, making it harder for an attacker to subvert CET. To facilitate this privileged access, define the two user-mode CET MSRs, and the bits defined in those MSRs relevant to future shadow stack enablement patches. Co-developed-by: Yu-cheng Yu Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- arch/x86/include/asm/fpu/types.h | 16 +++++- arch/x86/include/asm/fpu/xstate.h | 6 ++- arch/x86/kernel/fpu/xstate.c | 90 +++++++++++++++---------------- 3 files changed, 61 insertions(+), 51 deletions(-) diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h index 7f6d858ff47a..eb810074f1e7 100644 --- a/arch/x86/include/asm/fpu/types.h +++ b/arch/x86/include/asm/fpu/types.h @@ -115,8 +115,8 @@ enum xfeature { XFEATURE_PT_UNIMPLEMENTED_SO_FAR, XFEATURE_PKRU, XFEATURE_PASID, - XFEATURE_RSRVD_COMP_11, - XFEATURE_RSRVD_COMP_12, + XFEATURE_CET_USER, + XFEATURE_CET_KERNEL_UNUSED, XFEATURE_RSRVD_COMP_13, XFEATURE_RSRVD_COMP_14, XFEATURE_LBR, @@ -138,6 +138,8 @@ enum xfeature { #define XFEATURE_MASK_PT (1 << XFEATURE_PT_UNIMPLEMENTED_SO_FAR) #define XFEATURE_MASK_PKRU (1 << XFEATURE_PKRU) #define XFEATURE_MASK_PASID (1 << XFEATURE_PASID) +#define XFEATURE_MASK_CET_USER (1 << XFEATURE_CET_USER) +#define XFEATURE_MASK_CET_KERNEL (1 << XFEATURE_CET_KERNEL_UNUSED) #define XFEATURE_MASK_LBR (1 << XFEATURE_LBR) #define XFEATURE_MASK_XTILE_CFG (1 << XFEATURE_XTILE_CFG) #define XFEATURE_MASK_XTILE_DATA (1 << XFEATURE_XTILE_DATA) @@ -252,6 +254,16 @@ struct pkru_state { u32 pad; } __packed; +/* + * State component 11 is Control-flow Enforcement user states + */ +struct cet_user_state { + /* user control-flow settings */ + u64 user_cet; + /* user shadow stack pointer */ + u64 user_ssp; +}; + /* * State component 15: Architectural LBR configuration state. * The size of Arch LBR state depends on the number of LBRs (lbr_depth). diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h index cd3dd170e23a..d4427b88ee12 100644 --- a/arch/x86/include/asm/fpu/xstate.h +++ b/arch/x86/include/asm/fpu/xstate.h @@ -50,7 +50,8 @@ #define XFEATURE_MASK_USER_DYNAMIC XFEATURE_MASK_XTILE_DATA /* All currently supported supervisor features */ -#define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_PASID) +#define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_PASID | \ + XFEATURE_MASK_CET_USER) /* * A supervisor state component may not always contain valuable information, @@ -77,7 +78,8 @@ * Unsupported supervisor features. When a supervisor feature in this mask is * supported in the future, move it to the supported supervisor feature mask. */ -#define XFEATURE_MASK_SUPERVISOR_UNSUPPORTED (XFEATURE_MASK_PT) +#define XFEATURE_MASK_SUPERVISOR_UNSUPPORTED (XFEATURE_MASK_PT | \ + XFEATURE_MASK_CET_KERNEL) /* All supervisor states including supported and unsupported states. */ #define XFEATURE_MASK_SUPERVISOR_ALL (XFEATURE_MASK_SUPERVISOR_SUPPORTED | \ diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c index 0bab497c9436..4fa4751912d9 100644 --- a/arch/x86/kernel/fpu/xstate.c +++ b/arch/x86/kernel/fpu/xstate.c @@ -39,26 +39,26 @@ */ static const char *xfeature_names[] = { - "x87 floating point registers" , - "SSE registers" , - "AVX registers" , - "MPX bounds registers" , - "MPX CSR" , - "AVX-512 opmask" , - "AVX-512 Hi256" , - "AVX-512 ZMM_Hi256" , - "Processor Trace (unused)" , + "x87 floating point registers", + "SSE registers", + "AVX registers", + "MPX bounds registers", + "MPX CSR", + "AVX-512 opmask", + "AVX-512 Hi256", + "AVX-512 ZMM_Hi256", + "Processor Trace (unused)", "Protection Keys User registers", "PASID state", - "unknown xstate feature" , - "unknown xstate feature" , - "unknown xstate feature" , - "unknown xstate feature" , - "unknown xstate feature" , - "unknown xstate feature" , - "AMX Tile config" , - "AMX Tile data" , - "unknown xstate feature" , + "Control-flow User registers", + "Control-flow Kernel registers (unused)", + "unknown xstate feature", + "unknown xstate feature", + "unknown xstate feature", + "unknown xstate feature", + "AMX Tile config", + "AMX Tile data", + "unknown xstate feature", }; static unsigned short xsave_cpuid_features[] __initdata = { @@ -73,6 +73,7 @@ static unsigned short xsave_cpuid_features[] __initdata = { [XFEATURE_PT_UNIMPLEMENTED_SO_FAR] = X86_FEATURE_INTEL_PT, [XFEATURE_PKRU] = X86_FEATURE_PKU, [XFEATURE_PASID] = X86_FEATURE_ENQCMD, + [XFEATURE_CET_USER] = X86_FEATURE_SHSTK, [XFEATURE_XTILE_CFG] = X86_FEATURE_AMX_TILE, [XFEATURE_XTILE_DATA] = X86_FEATURE_AMX_TILE, }; @@ -276,6 +277,7 @@ static void __init print_xstate_features(void) print_xstate_feature(XFEATURE_MASK_Hi16_ZMM); print_xstate_feature(XFEATURE_MASK_PKRU); print_xstate_feature(XFEATURE_MASK_PASID); + print_xstate_feature(XFEATURE_MASK_CET_USER); print_xstate_feature(XFEATURE_MASK_XTILE_CFG); print_xstate_feature(XFEATURE_MASK_XTILE_DATA); } @@ -344,6 +346,7 @@ static __init void os_xrstor_booting(struct xregs_state *xstate) XFEATURE_MASK_BNDREGS | \ XFEATURE_MASK_BNDCSR | \ XFEATURE_MASK_PASID | \ + XFEATURE_MASK_CET_USER | \ XFEATURE_MASK_XTILE) /* @@ -446,14 +449,15 @@ static void __init __xstate_dump_leaves(void) } \ } while (0) -#define XCHECK_SZ(sz, nr, nr_macro, __struct) do { \ - if ((nr == nr_macro) && \ - WARN_ONCE(sz != sizeof(__struct), \ - "%s: struct is %zu bytes, cpu state %d bytes\n", \ - __stringify(nr_macro), sizeof(__struct), sz)) { \ +#define XCHECK_SZ(sz, nr, __struct) ({ \ + if (WARN_ONCE(sz != sizeof(__struct), \ + "[%s]: struct is %zu bytes, cpu state %d bytes\n", \ + xfeature_names[nr], sizeof(__struct), sz)) { \ __xstate_dump_leaves(); \ } \ -} while (0) + true; \ +}) + /** * check_xtile_data_against_struct - Check tile data state size. @@ -527,36 +531,28 @@ static bool __init check_xstate_against_struct(int nr) * Ask the CPU for the size of the state. */ int sz = xfeature_size(nr); + /* * Match each CPU state with the corresponding software * structure. */ - XCHECK_SZ(sz, nr, XFEATURE_YMM, struct ymmh_struct); - XCHECK_SZ(sz, nr, XFEATURE_BNDREGS, struct mpx_bndreg_state); - XCHECK_SZ(sz, nr, XFEATURE_BNDCSR, struct mpx_bndcsr_state); - XCHECK_SZ(sz, nr, XFEATURE_OPMASK, struct avx_512_opmask_state); - XCHECK_SZ(sz, nr, XFEATURE_ZMM_Hi256, struct avx_512_zmm_uppers_state); - XCHECK_SZ(sz, nr, XFEATURE_Hi16_ZMM, struct avx_512_hi16_state); - XCHECK_SZ(sz, nr, XFEATURE_PKRU, struct pkru_state); - XCHECK_SZ(sz, nr, XFEATURE_PASID, struct ia32_pasid_state); - XCHECK_SZ(sz, nr, XFEATURE_XTILE_CFG, struct xtile_cfg); - - /* The tile data size varies between implementations. */ - if (nr == XFEATURE_XTILE_DATA) - check_xtile_data_against_struct(sz); - - /* - * Make *SURE* to add any feature numbers in below if - * there are "holes" in the xsave state component - * numbers. - */ - if ((nr < XFEATURE_YMM) || - (nr >= XFEATURE_MAX) || - (nr == XFEATURE_PT_UNIMPLEMENTED_SO_FAR) || - ((nr >= XFEATURE_RSRVD_COMP_11) && (nr <= XFEATURE_RSRVD_COMP_16))) { + switch (nr) { + case XFEATURE_YMM: return XCHECK_SZ(sz, nr, struct ymmh_struct); + case XFEATURE_BNDREGS: return XCHECK_SZ(sz, nr, struct mpx_bndreg_state); + case XFEATURE_BNDCSR: return XCHECK_SZ(sz, nr, struct mpx_bndcsr_state); + case XFEATURE_OPMASK: return XCHECK_SZ(sz, nr, struct avx_512_opmask_state); + case XFEATURE_ZMM_Hi256: return XCHECK_SZ(sz, nr, struct avx_512_zmm_uppers_state); + case XFEATURE_Hi16_ZMM: return XCHECK_SZ(sz, nr, struct avx_512_hi16_state); + case XFEATURE_PKRU: return XCHECK_SZ(sz, nr, struct pkru_state); + case XFEATURE_PASID: return XCHECK_SZ(sz, nr, struct ia32_pasid_state); + case XFEATURE_XTILE_CFG: return XCHECK_SZ(sz, nr, struct xtile_cfg); + case XFEATURE_CET_USER: return XCHECK_SZ(sz, nr, struct cet_user_state); + case XFEATURE_XTILE_DATA: check_xtile_data_against_struct(sz); return true; + default: XSTATE_WARN_ON(1, "No structure for xstate: %d\n", nr); return false; } + return true; } From patchwork Tue Jun 13 00:10:51 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 106977 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp222035vqr; Mon, 12 Jun 2023 17:44:12 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ7JUVvKKnuvMkDf9It0oV67wpQQRSZ413W30R8n3O7RuPxe3oFOyjyATkZ5Ojx7ErV6w577 X-Received: by 2002:aa7:cd6d:0:b0:518:7232:e63a with SMTP id ca13-20020aa7cd6d000000b005187232e63amr697511edb.36.1686617051762; Mon, 12 Jun 2023 17:44:11 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1686617051; cv=none; d=google.com; s=arc-20160816; b=JJKVR2x0XpdX5dPxqSG2gLH4F/lHyvGDyO8yhHJj6b4mX5xMHZCsbaFPKoKyuO3YdJ aHP/iaxicnipRsdlsV5twyX5ul36AlNh/4feg/QchvVwkf8WpcayRggkmn3ivuORVS0o FHG7FIt9TXxBwnyoLOs/T9I31TLGJzyeeynG1ZBPI33VzLxjfaPn0RyxcEleI/ubtCSw mb8FCgk15UpJiHaUDeSgz1DPITrx55qxGHzdU90XmQmE/+516mkyOOqUal6u96X6Ysyg XH6AaMUUqebi2O/t4ZXjfYayLR0iAif4o7Ord5RNW9ojFUE/kwNYUzHua+rDpwUhUu5D eTYg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=F2RW2ABIPgY9zcBNGEtdjwLLlBLz/ha0Ddgc0k6oYH0=; b=TjUcd37hAu84qiE27/zuIi0+76ckVNHKQKWwSDyVhh7Aacoed+CM9+UFiyycuytD0Q bxerC2GT4LianSAscHvTUG19/t668qS8nfA9B8qvJ588ORvpjFMup29iuFYzlcUPq4CL KnJ8Q6q0VpnY9JJX7SlzAlxd5tDdQDjje3gySC88f+6HoRquJhsAfKwvZYG04jAQmPUK G3rHHysXQWTK3PxP2dXMTD05WHiHR5qZzJA+bw2Pln30d08gh4TOFaEP4oVQElutP2cQ hhukqmvVGASxENmTScD2EhIGq4UREZBwBlTtCD4HlbOYOXGOOrcHepvdWhxvgnSYJez0 T7QA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=m2hAgSCI; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id e4-20020a056402088400b0050489449e77si2706010edy.569.2023.06.12.17.43.48; Mon, 12 Jun 2023 17:44:11 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=m2hAgSCI; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239374AbjFMAQc (ORCPT + 99 others); Mon, 12 Jun 2023 20:16:32 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52416 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238914AbjFMAPq (ORCPT ); Mon, 12 Jun 2023 20:15:46 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E03EC269D; Mon, 12 Jun 2023 17:13:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615202; x=1718151202; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=L2W4szzCW+qvvywxGcp6uHsWCFDTi4NqV9MxxSUxhGY=; b=m2hAgSCIvSKQe7x+o4sP0P0iMTZ8eLiENIOoIE8RAIkvuslNZ2qYuvTF +yfZfxnyqMLDiH4LSV3hezfYsQoTKxUx6OVJEpST/sv1yMeeZUSiGbz2L liCpQVq4v6XmO4n91rEpMHUONW3w2v2cBAJlmAzK3VGeUU/a6TL4wB5gb ug+NDKdeVH6SdRMYAyMthq34o7fifKHJDOZaF/lN1h3yBiDbNYIu+g4mV MoBhjFnDP3SimoTpcO8FCtbBqqcFiK4hR+I4jDd38dlxZKTPERcvx7Zz2 J6wII4D/38ew+5WTtadegZ8G7XCiuYOj7sByjMp2SlLrqP80BdLwTT8K+ g==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361557243" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361557243" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:29 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835671072" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835671072" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:28 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Pengfei Xu Subject: [PATCH v9 25/42] x86/fpu: Add helper for modifying xstate Date: Mon, 12 Jun 2023 17:10:51 -0700 Message-Id: <20230613001108.3040476-26-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1768546161739402773?= X-GMAIL-MSGID: =?utf-8?q?1768546161739402773?= Just like user xfeatures, supervisor xfeatures can be active in the registers or present in the task FPU buffer. If the registers are active, the registers can be modified directly. If the registers are not active, the modification must be performed on the task FPU buffer. When the state is not active, the kernel could perform modifications directly to the buffer. But in order for it to do that, it needs to know where in the buffer the specific state it wants to modify is located. Doing this is not robust against optimizations that compact the FPU buffer, as each access would require computing where in the buffer it is. The easiest way to modify supervisor xfeature data is to force restore the registers and write directly to the MSRs. Often times this is just fine anyway as the registers need to be restored before returning to userspace. Do this for now, leaving buffer writing optimizations for the future. Add a new function fpregs_lock_and_load() that can simultaneously call fpregs_lock() and do this restore. Also perform some extra sanity checks in this function since this will be used in non-fpu focused code. Suggested-by: Thomas Gleixner Signed-off-by: Rick Edgecombe Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- arch/x86/include/asm/fpu/api.h | 9 +++++++++ arch/x86/kernel/fpu/core.c | 18 ++++++++++++++++++ 2 files changed, 27 insertions(+) diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h index 503a577814b2..aadc6893dcaa 100644 --- a/arch/x86/include/asm/fpu/api.h +++ b/arch/x86/include/asm/fpu/api.h @@ -82,6 +82,15 @@ static inline void fpregs_unlock(void) preempt_enable(); } +/* + * FPU state gets lazily restored before returning to userspace. So when in the + * kernel, the valid FPU state may be kept in the buffer. This function will force + * restore all the fpu state to the registers early if needed, and lock them from + * being automatically saved/restored. Then FPU state can be modified safely in the + * registers, before unlocking with fpregs_unlock(). + */ +void fpregs_lock_and_load(void); + #ifdef CONFIG_X86_DEBUG_FPU extern void fpregs_assert_state_consistent(void); #else diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c index caf33486dc5e..f851558b673f 100644 --- a/arch/x86/kernel/fpu/core.c +++ b/arch/x86/kernel/fpu/core.c @@ -753,6 +753,24 @@ void switch_fpu_return(void) } EXPORT_SYMBOL_GPL(switch_fpu_return); +void fpregs_lock_and_load(void) +{ + /* + * fpregs_lock() only disables preemption (mostly). So modifying state + * in an interrupt could screw up some in progress fpregs operation. + * Warn about it. + */ + WARN_ON_ONCE(!irq_fpu_usable()); + WARN_ON_ONCE(current->flags & PF_KTHREAD); + + fpregs_lock(); + + fpregs_assert_state_consistent(); + + if (test_thread_flag(TIF_NEED_FPU_LOAD)) + fpregs_restore_userregs(); +} + #ifdef CONFIG_X86_DEBUG_FPU /* * If current FPU state according to its tracking (loaded FPU context on this From patchwork Tue Jun 13 00:10:52 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 106971 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp221167vqr; Mon, 12 Jun 2023 17:41:57 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ6o6bYcD69BxtfI1sIKuGpDDR812NzHU4CQPPcIt/WJSRCFB2rwOCmnn47XPre3BQomp98V X-Received: by 2002:aa7:c3ca:0:b0:514:9465:81ef with SMTP id l10-20020aa7c3ca000000b00514946581efmr5763031edr.21.1686616917450; Mon, 12 Jun 2023 17:41:57 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1686616917; cv=none; d=google.com; s=arc-20160816; b=t6dydoOdOfyEQQkHAblYn9CI8xArThCqNSoSRvlIu7jenrSml5NmNy2v8u1oXDaSkT XjL3gb8Gw4M1uW8XJ4kqI9Xgyinpnox7dpkdaRM62EyAksFMHwNsMAHB1bq2I258RV3+ 9pIYBC0hQbKivs5Rd8xMTqjmlAfjN5qTqtcsMB4/AQKs6NeufG4kpIRAVI+WN4jZUzGn TxGjK7rWKT8U6W+SUdiXugtCSDJwGhdQtVY6osgRNkChLuOFXjp83AHXBEVcjhrWd50v MnLH0voAAFO3KgIDrwRJBi/g0ju5Gvr7UK/0JAKM/bnsqjm3B1YGmrHrmSC6m2UW6ZXy 3jfg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=izcKFpB20xlWfjQ0sAp89VgdYkvyy6AEcctJjTNYnrk=; b=nFonn9qlclt+9HfM44/a7aJbfndv3GuQIotJ+hoSupPFa2MLFBeIyX7h7bZx3cFfPN FzpPgktu/mtVd1S2U8G+ai2kog29ddCMt+/IoNqIqKAgorHBbsPFZHPXk09mF1FXc2Lq yFU87TuH9U4XbLwReCkuuZtnsL/fD6O783OcjA2LfMjvP1HCK+1xa6pq0wygTbayQWXI FRxqrkNDcRpR01vD2WQNOM8n9wbrosfTGdLCz2idPi4ziPZF4vGZS3z3xgjB0QH9hP+6 PjMdls/s6peerwTbiNyWioe6A+qc4CwAywdFjuFRsDQQp/cBK6kxc1SCk6VM3iDCG1BS zyAg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=R6xK0VOk; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id e4-20020a056402088400b0050489449e77si2706010edy.569.2023.06.12.17.41.33; Mon, 12 Jun 2023 17:41:57 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=R6xK0VOk; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239411AbjFMAQ6 (ORCPT + 99 others); Mon, 12 Jun 2023 20:16:58 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52268 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238840AbjFMAQQ (ORCPT ); Mon, 12 Jun 2023 20:16:16 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 037C0173F; Mon, 12 Jun 2023 17:13:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615209; x=1718151209; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=FpsnqDmskVtuVWeAVhUxFxlMXuwPJqlBrZTKKFCDs5Y=; b=R6xK0VOk/CsKnPanl4rMbdEeh3v8uOxWqtL/Ssrl1fxL6P8qC444KXCo VO64BplBrjLIBbY2uer5dHGOecGDsRrFpOW+bT+yJSO2vxXhNgguXfq1Z SIezcfqlI2nGtaNj4d3gQr/GmtxzNrMUi9wtkiuwneTP2OiTRMSdqnEQa EY5u6mpQqNAZeUoO5OqG4x/BRfsRI2CMSqROi+7oaz/xZJggAXi1A5SOJ laxjpkUHP0jsdKehMuVS27OzGthsbrEJwxmqLZL5hsPBQDytMkOXCXRAY qWGhnEJ2DIBg5mQxKOHNv7x0/jzWZeyJt4t8MR5as9TW4av/hNLpi3K0m w==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361557269" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361557269" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:30 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835671078" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835671078" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:29 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Pengfei Xu Subject: [PATCH v9 26/42] x86: Introduce userspace API for shadow stack Date: Mon, 12 Jun 2023 17:10:52 -0700 Message-Id: <20230613001108.3040476-27-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1768546020352053279?= X-GMAIL-MSGID: =?utf-8?q?1768546020352053279?= Add three new arch_prctl() handles: - ARCH_SHSTK_ENABLE/DISABLE enables or disables the specified feature. Returns 0 on success or a negative value on error. - ARCH_SHSTK_LOCK prevents future disabling or enabling of the specified feature. Returns 0 on success or a negative value on error. The features are handled per-thread and inherited over fork(2)/clone(2), but reset on exec(). Co-developed-by: Kirill A. Shutemov Signed-off-by: Kirill A. Shutemov Signed-off-by: Rick Edgecombe Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- arch/x86/include/asm/processor.h | 6 +++++ arch/x86/include/asm/shstk.h | 21 +++++++++++++++ arch/x86/include/uapi/asm/prctl.h | 6 +++++ arch/x86/kernel/Makefile | 2 ++ arch/x86/kernel/process_64.c | 6 +++++ arch/x86/kernel/shstk.c | 44 +++++++++++++++++++++++++++++++ 6 files changed, 85 insertions(+) create mode 100644 arch/x86/include/asm/shstk.h create mode 100644 arch/x86/kernel/shstk.c diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index a1e4fa58b357..407d5551b6a7 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -28,6 +28,7 @@ struct vm86; #include #include #include +#include #include #include @@ -475,6 +476,11 @@ struct thread_struct { */ u32 pkru; +#ifdef CONFIG_X86_USER_SHADOW_STACK + unsigned long features; + unsigned long features_locked; +#endif + /* Floating point and extended processor state */ struct fpu fpu; /* diff --git a/arch/x86/include/asm/shstk.h b/arch/x86/include/asm/shstk.h new file mode 100644 index 000000000000..ec753809f074 --- /dev/null +++ b/arch/x86/include/asm/shstk.h @@ -0,0 +1,21 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_X86_SHSTK_H +#define _ASM_X86_SHSTK_H + +#ifndef __ASSEMBLY__ +#include + +struct task_struct; + +#ifdef CONFIG_X86_USER_SHADOW_STACK +long shstk_prctl(struct task_struct *task, int option, unsigned long features); +void reset_thread_features(void); +#else +static inline long shstk_prctl(struct task_struct *task, int option, + unsigned long arg2) { return -EINVAL; } +static inline void reset_thread_features(void) {} +#endif /* CONFIG_X86_USER_SHADOW_STACK */ + +#endif /* __ASSEMBLY__ */ + +#endif /* _ASM_X86_SHSTK_H */ diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h index e8d7ebbca1a4..1cd44ecc9ce0 100644 --- a/arch/x86/include/uapi/asm/prctl.h +++ b/arch/x86/include/uapi/asm/prctl.h @@ -23,9 +23,15 @@ #define ARCH_MAP_VDSO_32 0x2002 #define ARCH_MAP_VDSO_64 0x2003 +/* Don't use 0x3001-0x3004 because of old glibcs */ + #define ARCH_GET_UNTAG_MASK 0x4001 #define ARCH_ENABLE_TAGGED_ADDR 0x4002 #define ARCH_GET_MAX_TAG_BITS 0x4003 #define ARCH_FORCE_TAGGED_SVA 0x4004 +#define ARCH_SHSTK_ENABLE 0x5001 +#define ARCH_SHSTK_DISABLE 0x5002 +#define ARCH_SHSTK_LOCK 0x5003 + #endif /* _ASM_X86_PRCTL_H */ diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index abee0564b750..6b6bf47652ee 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -147,6 +147,8 @@ obj-$(CONFIG_CALL_THUNKS) += callthunks.o obj-$(CONFIG_X86_CET) += cet.o +obj-$(CONFIG_X86_USER_SHADOW_STACK) += shstk.o + ### # 64 bit specific files ifeq ($(CONFIG_X86_64),y) diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c index 3d181c16a2f6..0f89aa0186d1 100644 --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -515,6 +515,8 @@ start_thread_common(struct pt_regs *regs, unsigned long new_ip, load_gs_index(__USER_DS); } + reset_thread_features(); + loadsegment(fs, 0); loadsegment(es, _ds); loadsegment(ds, _ds); @@ -894,6 +896,10 @@ long do_arch_prctl_64(struct task_struct *task, int option, unsigned long arg2) else return put_user(LAM_U57_BITS, (unsigned long __user *)arg2); #endif + case ARCH_SHSTK_ENABLE: + case ARCH_SHSTK_DISABLE: + case ARCH_SHSTK_LOCK: + return shstk_prctl(task, option, arg2); default: ret = -EINVAL; break; diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c new file mode 100644 index 000000000000..41ed6552e0a5 --- /dev/null +++ b/arch/x86/kernel/shstk.c @@ -0,0 +1,44 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * shstk.c - Intel shadow stack support + * + * Copyright (c) 2021, Intel Corporation. + * Yu-cheng Yu + */ + +#include +#include +#include + +void reset_thread_features(void) +{ + current->thread.features = 0; + current->thread.features_locked = 0; +} + +long shstk_prctl(struct task_struct *task, int option, unsigned long features) +{ + if (option == ARCH_SHSTK_LOCK) { + task->thread.features_locked |= features; + return 0; + } + + /* Don't allow via ptrace */ + if (task != current) + return -EINVAL; + + /* Do not allow to change locked features */ + if (features & task->thread.features_locked) + return -EPERM; + + /* Only support enabling/disabling one feature at a time. */ + if (hweight_long(features) > 1) + return -EINVAL; + + if (option == ARCH_SHSTK_DISABLE) { + return -EINVAL; + } + + /* Handle ARCH_SHSTK_ENABLE */ + return -EINVAL; +} From patchwork Tue Jun 13 00:10:53 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 106973 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp221170vqr; Mon, 12 Jun 2023 17:41:57 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ4fw6R4zzvpXpayDhItiDUQ+Srl2AlEPARJfrTP+/0H2f2Fmgtzu3dJ4UmAqhxGjAW9kugk X-Received: by 2002:a17:907:169f:b0:96f:2b3f:61 with SMTP id hc31-20020a170907169f00b0096f2b3f0061mr12144691ejc.7.1686616917469; Mon, 12 Jun 2023 17:41:57 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1686616917; cv=none; d=google.com; s=arc-20160816; b=HgRN7+7DljTyGGviVTjSTByZVCU+wkDsntGBSTEd9d5IxQyvFyNai9jEC1dKvGYRtx 1dBi0I/qZ0pWskU+QpZZZM9j+dZ77gDeupsLCgxEB2h4Ov8ixI0pCJZ1YtH8erH58TOB tBZRSrT6UVVAgzdJdohwNgyqrghZ9+2zEQAHdeeYGxQiUBSARa0wbVIEV1lKqvmazWTL yNs+OEz+6ogZT0alH6aqKLQVjDWr7l29dsfV9TcREm7FFwpuoBlp6saTq4PZaaueQxY8 +33jB6zY7A63+qF0jN36t2lKCZQGnbsC6I/pvHozVn2FlCGzSdwOKZN7CCXUE+9mMK4+ nvrQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=qGs9rCBjozTM2AAPpzKdYfSTzD2KPWUcCiw9QM54jZM=; b=YpZnDehcDnrHs7Fpkv4rA/Il4FPFFeL3AIpSydaGldwXbQbqCbfhpgY/Wx4SK+lV/u pGLFxRGYvBAMATaKvFHPSKmtCiRHBKl99GJIcXafcSWrYCXxV7b/wR+oy2jzbHh5j5G4 YLT/o1DUmnVErb/+HtmQlGGu1a+UPynFkD7GrvI4QTRCX7m8p1mYH2Ee5hBBuvNwBJhk LRWlxYpvM17YNqeKdkjQRxhmtA41KMxiw45moUZTDakQOVNYrWLnIqyIbnDuRX4qkcHj O/lYnyUSy4NpB7duYey2E2TxmJhc7bO/YHICf+isqkfSI7H5GZHN6FSp/6HGIEE8CkrG 23uA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=Al0xAI+P; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id n15-20020a170906688f00b0096f5ebda2d9si5538515ejr.787.2023.06.12.17.41.04; Mon, 12 Jun 2023 17:41:57 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=Al0xAI+P; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239422AbjFMARD (ORCPT + 99 others); Mon, 12 Jun 2023 20:17:03 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52288 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239282AbjFMAQR (ORCPT ); Mon, 12 Jun 2023 20:16:17 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C7C9F2957; Mon, 12 Jun 2023 17:13:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615210; x=1718151210; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=WchKfqQNnYFXXNMxX3TqxAMzB4S9WkAXS/aRm3qIr7o=; b=Al0xAI+PUFsEVN9pTNbLZHllXr+EHXB/PJSOsmYJLEvdTGYC3TbIlpAW 2hsIMbn0kst0HAZL4+s3OS1bnAkD08OMjMzrFTbBc6dkIcETtFXA9Hkqc 2Zsu7jSgXvRuV1tBiUuv0bFUv2tfniDxMQjajOmnIy5fWHvEwctHXRF7g xcpBMZ9TUByTRHVWZtn3iMZX4qKTSrbUXMHRZXNvDmNPNHhvMUjWI/ri4 6QRLgNI0BpLF2v93dkuQbSctgOzmpYPtLDa+1WlB6OTOQWCtT8ylfHIqh hPygRV21tSvBgQUHpEJLxgaF5oshbpcwUqxoaA/h8jHvFsIXvU9RANpoL A==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361557292" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361557292" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:31 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835671087" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835671087" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:30 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu , Pengfei Xu Subject: [PATCH v9 27/42] x86/shstk: Add user control-protection fault handler Date: Mon, 12 Jun 2023 17:10:53 -0700 Message-Id: <20230613001108.3040476-28-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1768546020726762866?= X-GMAIL-MSGID: =?utf-8?q?1768546020726762866?= A control-protection fault is triggered when a control-flow transfer attempt violates Shadow Stack or Indirect Branch Tracking constraints. For example, the return address for a RET instruction differs from the copy on the shadow stack. There already exists a control-protection fault handler for handling kernel IBT faults. Refactor this fault handler into separate user and kernel handlers, like the page fault handler. Add a control-protection handler for usermode. To avoid ifdeffery, put them both in a new file cet.c, which is compiled in the case of either of the two CET features supported in the kernel: kernel IBT or user mode shadow stack. Move some static inline functions from traps.c into a header so they can be used in cet.c. Opportunistically fix a comment in the kernel IBT part of the fault handler that is on the end of the line instead of preceding it. Keep the same behavior for the kernel side of the fault handler, except for converting a BUG to a WARN in the case of a #CP happening when the feature is missing. This unifies the behavior with the new shadow stack code, and also prevents the kernel from crashing under this situation which is potentially recoverable. The control-protection fault handler works in a similar way as the general protection fault handler. It provides the si_code SEGV_CPERR to the signal handler. Co-developed-by: Yu-cheng Yu Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- arch/arm/kernel/signal.c | 2 +- arch/arm64/kernel/signal.c | 2 +- arch/arm64/kernel/signal32.c | 2 +- arch/sparc/kernel/signal32.c | 2 +- arch/sparc/kernel/signal_64.c | 2 +- arch/x86/include/asm/disabled-features.h | 8 +- arch/x86/include/asm/idtentry.h | 2 +- arch/x86/include/asm/traps.h | 12 +++ arch/x86/kernel/cet.c | 94 +++++++++++++++++++++--- arch/x86/kernel/idt.c | 2 +- arch/x86/kernel/signal_32.c | 2 +- arch/x86/kernel/signal_64.c | 2 +- arch/x86/kernel/traps.c | 12 --- arch/x86/xen/enlighten_pv.c | 2 +- arch/x86/xen/xen-asm.S | 2 +- include/uapi/asm-generic/siginfo.h | 3 +- 16 files changed, 117 insertions(+), 34 deletions(-) diff --git a/arch/arm/kernel/signal.c b/arch/arm/kernel/signal.c index e07f359254c3..9a3c9de5ac5e 100644 --- a/arch/arm/kernel/signal.c +++ b/arch/arm/kernel/signal.c @@ -681,7 +681,7 @@ asmlinkage void do_rseq_syscall(struct pt_regs *regs) */ static_assert(NSIGILL == 11); static_assert(NSIGFPE == 15); -static_assert(NSIGSEGV == 9); +static_assert(NSIGSEGV == 10); static_assert(NSIGBUS == 5); static_assert(NSIGTRAP == 6); static_assert(NSIGCHLD == 6); diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c index 2cfc810d0a5b..06d31731c8ed 100644 --- a/arch/arm64/kernel/signal.c +++ b/arch/arm64/kernel/signal.c @@ -1343,7 +1343,7 @@ void __init minsigstksz_setup(void) */ static_assert(NSIGILL == 11); static_assert(NSIGFPE == 15); -static_assert(NSIGSEGV == 9); +static_assert(NSIGSEGV == 10); static_assert(NSIGBUS == 5); static_assert(NSIGTRAP == 6); static_assert(NSIGCHLD == 6); diff --git a/arch/arm64/kernel/signal32.c b/arch/arm64/kernel/signal32.c index 4700f8522d27..bbd542704730 100644 --- a/arch/arm64/kernel/signal32.c +++ b/arch/arm64/kernel/signal32.c @@ -460,7 +460,7 @@ void compat_setup_restart_syscall(struct pt_regs *regs) */ static_assert(NSIGILL == 11); static_assert(NSIGFPE == 15); -static_assert(NSIGSEGV == 9); +static_assert(NSIGSEGV == 10); static_assert(NSIGBUS == 5); static_assert(NSIGTRAP == 6); static_assert(NSIGCHLD == 6); diff --git a/arch/sparc/kernel/signal32.c b/arch/sparc/kernel/signal32.c index dad38960d1a8..82da8a2d769d 100644 --- a/arch/sparc/kernel/signal32.c +++ b/arch/sparc/kernel/signal32.c @@ -751,7 +751,7 @@ asmlinkage int do_sys32_sigstack(u32 u_ssptr, u32 u_ossptr, unsigned long sp) */ static_assert(NSIGILL == 11); static_assert(NSIGFPE == 15); -static_assert(NSIGSEGV == 9); +static_assert(NSIGSEGV == 10); static_assert(NSIGBUS == 5); static_assert(NSIGTRAP == 6); static_assert(NSIGCHLD == 6); diff --git a/arch/sparc/kernel/signal_64.c b/arch/sparc/kernel/signal_64.c index 570e43e6fda5..b4e410976e0d 100644 --- a/arch/sparc/kernel/signal_64.c +++ b/arch/sparc/kernel/signal_64.c @@ -562,7 +562,7 @@ void do_notify_resume(struct pt_regs *regs, unsigned long orig_i0, unsigned long */ static_assert(NSIGILL == 11); static_assert(NSIGFPE == 15); -static_assert(NSIGSEGV == 9); +static_assert(NSIGSEGV == 10); static_assert(NSIGBUS == 5); static_assert(NSIGTRAP == 6); static_assert(NSIGCHLD == 6); diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h index b9c7eae2e70f..702d93fdd10e 100644 --- a/arch/x86/include/asm/disabled-features.h +++ b/arch/x86/include/asm/disabled-features.h @@ -111,6 +111,12 @@ #define DISABLE_USER_SHSTK (1 << (X86_FEATURE_USER_SHSTK & 31)) #endif +#ifdef CONFIG_X86_KERNEL_IBT +#define DISABLE_IBT 0 +#else +#define DISABLE_IBT (1 << (X86_FEATURE_IBT & 31)) +#endif + /* * Make sure to add features to the correct mask */ @@ -134,7 +140,7 @@ #define DISABLED_MASK16 (DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP| \ DISABLE_ENQCMD) #define DISABLED_MASK17 0 -#define DISABLED_MASK18 0 +#define DISABLED_MASK18 (DISABLE_IBT) #define DISABLED_MASK19 0 #define DISABLED_MASK20 0 #define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 21) diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h index b241af4ce9b4..61e0e6301f09 100644 --- a/arch/x86/include/asm/idtentry.h +++ b/arch/x86/include/asm/idtentry.h @@ -614,7 +614,7 @@ DECLARE_IDTENTRY_RAW_ERRORCODE(X86_TRAP_DF, xenpv_exc_double_fault); #endif /* #CP */ -#ifdef CONFIG_X86_KERNEL_IBT +#ifdef CONFIG_X86_CET DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_CP, exc_control_protection); #endif diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h index 47ecfff2c83d..75e0dabf0c45 100644 --- a/arch/x86/include/asm/traps.h +++ b/arch/x86/include/asm/traps.h @@ -47,4 +47,16 @@ void __noreturn handle_stack_overflow(struct pt_regs *regs, struct stack_info *info); #endif +static inline void cond_local_irq_enable(struct pt_regs *regs) +{ + if (regs->flags & X86_EFLAGS_IF) + local_irq_enable(); +} + +static inline void cond_local_irq_disable(struct pt_regs *regs) +{ + if (regs->flags & X86_EFLAGS_IF) + local_irq_disable(); +} + #endif /* _ASM_X86_TRAPS_H */ diff --git a/arch/x86/kernel/cet.c b/arch/x86/kernel/cet.c index 7ad22b705b64..cc10d8be9d74 100644 --- a/arch/x86/kernel/cet.c +++ b/arch/x86/kernel/cet.c @@ -4,10 +4,6 @@ #include #include -static __ro_after_init bool ibt_fatal = true; - -extern void ibt_selftest_ip(void); /* code label defined in asm below */ - enum cp_error_code { CP_EC = (1 << 15) - 1, @@ -20,15 +16,80 @@ enum cp_error_code { CP_ENCL = 1 << 15, }; -DEFINE_IDTENTRY_ERRORCODE(exc_control_protection) +static const char cp_err[][10] = { + [0] = "unknown", + [1] = "near ret", + [2] = "far/iret", + [3] = "endbranch", + [4] = "rstorssp", + [5] = "setssbsy", +}; + +static const char *cp_err_string(unsigned long error_code) { - if (!cpu_feature_enabled(X86_FEATURE_IBT)) { - pr_err("Unexpected #CP\n"); - BUG(); + unsigned int cpec = error_code & CP_EC; + + if (cpec >= ARRAY_SIZE(cp_err)) + cpec = 0; + return cp_err[cpec]; +} + +static void do_unexpected_cp(struct pt_regs *regs, unsigned long error_code) +{ + WARN_ONCE(1, "Unexpected %s #CP, error_code: %s\n", + user_mode(regs) ? "user mode" : "kernel mode", + cp_err_string(error_code)); +} + +static DEFINE_RATELIMIT_STATE(cpf_rate, DEFAULT_RATELIMIT_INTERVAL, + DEFAULT_RATELIMIT_BURST); + +static void do_user_cp_fault(struct pt_regs *regs, unsigned long error_code) +{ + struct task_struct *tsk; + unsigned long ssp; + + /* + * An exception was just taken from userspace. Since interrupts are disabled + * here, no scheduling should have messed with the registers yet and they + * will be whatever is live in userspace. So read the SSP before enabling + * interrupts so locking the fpregs to do it later is not required. + */ + rdmsrl(MSR_IA32_PL3_SSP, ssp); + + cond_local_irq_enable(regs); + + tsk = current; + tsk->thread.error_code = error_code; + tsk->thread.trap_nr = X86_TRAP_CP; + + /* Ratelimit to prevent log spamming. */ + if (show_unhandled_signals && unhandled_signal(tsk, SIGSEGV) && + __ratelimit(&cpf_rate)) { + pr_emerg("%s[%d] control protection ip:%lx sp:%lx ssp:%lx error:%lx(%s)%s", + tsk->comm, task_pid_nr(tsk), + regs->ip, regs->sp, ssp, error_code, + cp_err_string(error_code), + error_code & CP_ENCL ? " in enclave" : ""); + print_vma_addr(KERN_CONT " in ", regs->ip); + pr_cont("\n"); } - if (WARN_ON_ONCE(user_mode(regs) || (error_code & CP_EC) != CP_ENDBR)) + force_sig_fault(SIGSEGV, SEGV_CPERR, (void __user *)0); + cond_local_irq_disable(regs); +} + +static __ro_after_init bool ibt_fatal = true; + +/* code label defined in asm below */ +extern void ibt_selftest_ip(void); + +static void do_kernel_cp_fault(struct pt_regs *regs, unsigned long error_code) +{ + if ((error_code & CP_EC) != CP_ENDBR) { + do_unexpected_cp(regs, error_code); return; + } if (unlikely(regs->ip == (unsigned long)&ibt_selftest_ip)) { regs->ax = 0; @@ -74,3 +135,18 @@ static int __init ibt_setup(char *str) } __setup("ibt=", ibt_setup); + +DEFINE_IDTENTRY_ERRORCODE(exc_control_protection) +{ + if (user_mode(regs)) { + if (cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) + do_user_cp_fault(regs, error_code); + else + do_unexpected_cp(regs, error_code); + } else { + if (cpu_feature_enabled(X86_FEATURE_IBT)) + do_kernel_cp_fault(regs, error_code); + else + do_unexpected_cp(regs, error_code); + } +} diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c index a58c6bc1cd68..5074b8420359 100644 --- a/arch/x86/kernel/idt.c +++ b/arch/x86/kernel/idt.c @@ -107,7 +107,7 @@ static const __initconst struct idt_data def_idts[] = { ISTG(X86_TRAP_MC, asm_exc_machine_check, IST_INDEX_MCE), #endif -#ifdef CONFIG_X86_KERNEL_IBT +#ifdef CONFIG_X86_CET INTG(X86_TRAP_CP, asm_exc_control_protection), #endif diff --git a/arch/x86/kernel/signal_32.c b/arch/x86/kernel/signal_32.c index 9027fc088f97..c12624bc82a3 100644 --- a/arch/x86/kernel/signal_32.c +++ b/arch/x86/kernel/signal_32.c @@ -402,7 +402,7 @@ int ia32_setup_rt_frame(struct ksignal *ksig, struct pt_regs *regs) */ static_assert(NSIGILL == 11); static_assert(NSIGFPE == 15); -static_assert(NSIGSEGV == 9); +static_assert(NSIGSEGV == 10); static_assert(NSIGBUS == 5); static_assert(NSIGTRAP == 6); static_assert(NSIGCHLD == 6); diff --git a/arch/x86/kernel/signal_64.c b/arch/x86/kernel/signal_64.c index 13a1e6083837..0e808c72bf7e 100644 --- a/arch/x86/kernel/signal_64.c +++ b/arch/x86/kernel/signal_64.c @@ -403,7 +403,7 @@ void sigaction_compat_abi(struct k_sigaction *act, struct k_sigaction *oact) */ static_assert(NSIGILL == 11); static_assert(NSIGFPE == 15); -static_assert(NSIGSEGV == 9); +static_assert(NSIGSEGV == 10); static_assert(NSIGBUS == 5); static_assert(NSIGTRAP == 6); static_assert(NSIGCHLD == 6); diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index 6f666dfa97de..f358350624b2 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -77,18 +77,6 @@ DECLARE_BITMAP(system_vectors, NR_VECTORS); -static inline void cond_local_irq_enable(struct pt_regs *regs) -{ - if (regs->flags & X86_EFLAGS_IF) - local_irq_enable(); -} - -static inline void cond_local_irq_disable(struct pt_regs *regs) -{ - if (regs->flags & X86_EFLAGS_IF) - local_irq_disable(); -} - __always_inline int is_valid_bugaddr(unsigned long addr) { if (addr < TASK_SIZE_MAX) diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c index 093b78c8bbec..5a034a994682 100644 --- a/arch/x86/xen/enlighten_pv.c +++ b/arch/x86/xen/enlighten_pv.c @@ -640,7 +640,7 @@ static struct trap_array_entry trap_array[] = { TRAP_ENTRY(exc_coprocessor_error, false ), TRAP_ENTRY(exc_alignment_check, false ), TRAP_ENTRY(exc_simd_coprocessor_error, false ), -#ifdef CONFIG_X86_KERNEL_IBT +#ifdef CONFIG_X86_CET TRAP_ENTRY(exc_control_protection, false ), #endif }; diff --git a/arch/x86/xen/xen-asm.S b/arch/x86/xen/xen-asm.S index 08f1ceb9eb81..9e5e68008785 100644 --- a/arch/x86/xen/xen-asm.S +++ b/arch/x86/xen/xen-asm.S @@ -148,7 +148,7 @@ xen_pv_trap asm_exc_page_fault xen_pv_trap asm_exc_spurious_interrupt_bug xen_pv_trap asm_exc_coprocessor_error xen_pv_trap asm_exc_alignment_check -#ifdef CONFIG_X86_KERNEL_IBT +#ifdef CONFIG_X86_CET xen_pv_trap asm_exc_control_protection #endif #ifdef CONFIG_X86_MCE diff --git a/include/uapi/asm-generic/siginfo.h b/include/uapi/asm-generic/siginfo.h index ffbe4cec9f32..0f52d0ac47c5 100644 --- a/include/uapi/asm-generic/siginfo.h +++ b/include/uapi/asm-generic/siginfo.h @@ -242,7 +242,8 @@ typedef struct siginfo { #define SEGV_ADIPERR 7 /* Precise MCD exception */ #define SEGV_MTEAERR 8 /* Asynchronous ARM MTE error */ #define SEGV_MTESERR 9 /* Synchronous ARM MTE exception */ -#define NSIGSEGV 9 +#define SEGV_CPERR 10 /* Control protection fault */ +#define NSIGSEGV 10 /* * SIGBUS si_codes From patchwork Tue Jun 13 00:10:54 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 106980 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp222075vqr; Mon, 12 Jun 2023 17:44:18 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ5ZaEAQ83LpwiMWiXaHTPCTHv7DRons3h373mISmAk0KvkWAP6laosgIHc/MHwd6HWjPR2e X-Received: by 2002:a05:6402:4409:b0:514:a5cf:745b with SMTP id y9-20020a056402440900b00514a5cf745bmr7998219eda.3.1686617058577; Mon, 12 Jun 2023 17:44:18 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1686617058; cv=none; d=google.com; s=arc-20160816; b=ZMo6mfEyjDegXkJbkJFy3WU9wNRoeElze04plnmSAzaLJA+FKUjduDjnQyjp+j06Xw fb7C53BKz7zeFsmjT8mXSpD70cWC6LwQ1fI93XlCvu2kjtkY1fDx7d5RNyk0NW+u+t1w 20wxl5afWXxRQSVmk953ezKr7vPjyaEIu/0y2QmnttUa4wX1u1ILgiSjRJN7r1sPRCLy WM4UVYOmEaVYTO+MjtucpsGaTx+a/Ulm7pvGRdt06LTtJRftDNDYOq3XXj3evcCNrxz2 7uYysGQ2VPjMSdL7NAxeaO57rANDBqFjY9zexd8CVdszKbdJWb+iUcfdtex/52SWPqCp anOw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=p3jyjKsRiT5U7d0D/fXsaKNoPvKBDHZ161THvSejxrY=; b=jVkC7IaFgz3OMnB79YlJ6j78D6WWCOnFb57ZfJANsUOWLDJNgB38UkmpbwtGRzutvR c4Aso8DzuRDYO5zr5Eweu+gW94QYNb0IhjzhjZ+dhp4HvHQW5XINTfMie+LbHkDyGjGO mUEGfnCORnezlp6N//gHlyXPV0u7TezdQmyPLMQp/kW7BdFugqrtNb8/k3+6vyea0jyK covLiHWjTEck8x9trn8Tt/1BD+3njAPxpcQAIu44Uccjtf0xKe+OFjwlSHDf4GwFEWuZ muEQk/wP9K1FxFUa2Bdq7QwiFPhqb3fnNrQgQOPyJOSraRQ0V+n1sS6KT/teK984Csrd v45g== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=mpCLsY8T; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id d12-20020aa7d5cc000000b005169fe73ad0si6267100eds.600.2023.06.12.17.43.55; Mon, 12 Jun 2023 17:44:18 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=mpCLsY8T; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239282AbjFMARG (ORCPT + 99 others); Mon, 12 Jun 2023 20:17:06 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51902 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239064AbjFMAQS (ORCPT ); Mon, 12 Jun 2023 20:16:18 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 47D632959; Mon, 12 Jun 2023 17:13:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615211; x=1718151211; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=BugAl8HBv2gDjezo8BApRaTr6a0cYuyYBztBA+16s0k=; b=mpCLsY8TSF9hf/sLmIsPKiD8huZmdyRhzYOMig4HbAnedB0CabA+ST70 118AvCIZ518tWZ0k41NXJNYbSyKSHL0BW0Kp7Q6+4kqfSq3xHljhxN0vI 0KR0FbY4cnEG6UHbdalvPFGOIS56tsKhQyJU7ApiHRmtQi0XA/ZujqDif JgAkGZ5srqLg9DgNslzYF6PlvmpknF8RPFb0UCDS2NYUJsJ9+bH9pxHkp Z0D97iVpaqg50+0SHdMoXGApqBV66wuC1jPD7sx4XCO+e+gCyY+QqfmoA eRCQoHy/GeFoZI2ZME67HY21Zgcs0gsni4lUvoWlHNk2whTvzqq1Be7Lm g==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361557315" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361557315" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:32 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835671093" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835671093" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:31 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu , Pengfei Xu Subject: [PATCH v9 28/42] x86/shstk: Add user-mode shadow stack support Date: Mon, 12 Jun 2023 17:10:54 -0700 Message-Id: <20230613001108.3040476-29-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1768546168853613538?= X-GMAIL-MSGID: =?utf-8?q?1768546168853613538?= Introduce basic shadow stack enabling/disabling/allocation routines. A task's shadow stack is allocated from memory with VM_SHADOW_STACK flag and has a fixed size of min(RLIMIT_STACK, 4GB). Keep the task's shadow stack address and size in thread_struct. This will be copied when cloning new threads, but needs to be cleared during exec, so add a function to do this. 32 bit shadow stack is not expected to have many users and it will complicate the signal implementation. So do not support IA32 emulation or x32. Co-developed-by: Yu-cheng Yu Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- arch/x86/include/asm/processor.h | 2 + arch/x86/include/asm/shstk.h | 7 ++ arch/x86/include/uapi/asm/prctl.h | 3 + arch/x86/kernel/shstk.c | 145 ++++++++++++++++++++++++++++++ 4 files changed, 157 insertions(+) diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index 407d5551b6a7..2a5ec5750ba7 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -479,6 +479,8 @@ struct thread_struct { #ifdef CONFIG_X86_USER_SHADOW_STACK unsigned long features; unsigned long features_locked; + + struct thread_shstk shstk; #endif /* Floating point and extended processor state */ diff --git a/arch/x86/include/asm/shstk.h b/arch/x86/include/asm/shstk.h index ec753809f074..2b1f7c9b9995 100644 --- a/arch/x86/include/asm/shstk.h +++ b/arch/x86/include/asm/shstk.h @@ -8,12 +8,19 @@ struct task_struct; #ifdef CONFIG_X86_USER_SHADOW_STACK +struct thread_shstk { + u64 base; + u64 size; +}; + long shstk_prctl(struct task_struct *task, int option, unsigned long features); void reset_thread_features(void); +void shstk_free(struct task_struct *p); #else static inline long shstk_prctl(struct task_struct *task, int option, unsigned long arg2) { return -EINVAL; } static inline void reset_thread_features(void) {} +static inline void shstk_free(struct task_struct *p) {} #endif /* CONFIG_X86_USER_SHADOW_STACK */ #endif /* __ASSEMBLY__ */ diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h index 1cd44ecc9ce0..6a8e0e1bff4a 100644 --- a/arch/x86/include/uapi/asm/prctl.h +++ b/arch/x86/include/uapi/asm/prctl.h @@ -34,4 +34,7 @@ #define ARCH_SHSTK_DISABLE 0x5002 #define ARCH_SHSTK_LOCK 0x5003 +/* ARCH_SHSTK_ features bits */ +#define ARCH_SHSTK_SHSTK (1ULL << 0) + #endif /* _ASM_X86_PRCTL_H */ diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c index 41ed6552e0a5..3cb85224d856 100644 --- a/arch/x86/kernel/shstk.c +++ b/arch/x86/kernel/shstk.c @@ -8,14 +8,159 @@ #include #include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include #include +static bool features_enabled(unsigned long features) +{ + return current->thread.features & features; +} + +static void features_set(unsigned long features) +{ + current->thread.features |= features; +} + +static void features_clr(unsigned long features) +{ + current->thread.features &= ~features; +} + +static unsigned long alloc_shstk(unsigned long size) +{ + int flags = MAP_ANONYMOUS | MAP_PRIVATE | MAP_ABOVE4G; + struct mm_struct *mm = current->mm; + unsigned long addr, unused; + + mmap_write_lock(mm); + addr = do_mmap(NULL, addr, size, PROT_READ, flags, + VM_SHADOW_STACK | VM_WRITE, 0, &unused, NULL); + + mmap_write_unlock(mm); + + return addr; +} + +static unsigned long adjust_shstk_size(unsigned long size) +{ + if (size) + return PAGE_ALIGN(size); + + return PAGE_ALIGN(min_t(unsigned long long, rlimit(RLIMIT_STACK), SZ_4G)); +} + +static void unmap_shadow_stack(u64 base, u64 size) +{ + while (1) { + int r; + + r = vm_munmap(base, size); + + /* + * vm_munmap() returns -EINTR when mmap_lock is held by + * something else, and that lock should not be held for a + * long time. Retry it for the case. + */ + if (r == -EINTR) { + cond_resched(); + continue; + } + + /* + * For all other types of vm_munmap() failure, either the + * system is out of memory or there is bug. + */ + WARN_ON_ONCE(r); + break; + } +} + +static int shstk_setup(void) +{ + struct thread_shstk *shstk = ¤t->thread.shstk; + unsigned long addr, size; + + /* Already enabled */ + if (features_enabled(ARCH_SHSTK_SHSTK)) + return 0; + + /* Also not supported for 32 bit and x32 */ + if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK) || in_32bit_syscall()) + return -EOPNOTSUPP; + + size = adjust_shstk_size(0); + addr = alloc_shstk(size); + if (IS_ERR_VALUE(addr)) + return PTR_ERR((void *)addr); + + fpregs_lock_and_load(); + wrmsrl(MSR_IA32_PL3_SSP, addr + size); + wrmsrl(MSR_IA32_U_CET, CET_SHSTK_EN); + fpregs_unlock(); + + shstk->base = addr; + shstk->size = size; + features_set(ARCH_SHSTK_SHSTK); + + return 0; +} + void reset_thread_features(void) { + memset(¤t->thread.shstk, 0, sizeof(struct thread_shstk)); current->thread.features = 0; current->thread.features_locked = 0; } +void shstk_free(struct task_struct *tsk) +{ + struct thread_shstk *shstk = &tsk->thread.shstk; + + if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK) || + !features_enabled(ARCH_SHSTK_SHSTK)) + return; + + if (!tsk->mm) + return; + + unmap_shadow_stack(shstk->base, shstk->size); +} + +static int shstk_disable(void) +{ + if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) + return -EOPNOTSUPP; + + /* Already disabled? */ + if (!features_enabled(ARCH_SHSTK_SHSTK)) + return 0; + + fpregs_lock_and_load(); + /* Disable WRSS too when disabling shadow stack */ + wrmsrl(MSR_IA32_U_CET, 0); + wrmsrl(MSR_IA32_PL3_SSP, 0); + fpregs_unlock(); + + shstk_free(current); + features_clr(ARCH_SHSTK_SHSTK); + + return 0; +} + long shstk_prctl(struct task_struct *task, int option, unsigned long features) { if (option == ARCH_SHSTK_LOCK) { From patchwork Tue Jun 13 00:10:55 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 106967 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp219112vqr; Mon, 12 Jun 2023 17:36:00 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ6kx9m+1NrKjVyXiZvbQqSE68ap0/CD48G+Ao2YM9HrhXV6QRau+mavXxfoJ04S9igpxbu6 X-Received: by 2002:a17:906:5d09:b0:978:af9d:c004 with SMTP id g9-20020a1709065d0900b00978af9dc004mr9775341ejt.4.1686616560020; Mon, 12 Jun 2023 17:36:00 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1686616560; cv=none; d=google.com; s=arc-20160816; b=AiG6DvIvMS6SgXYsUo7lUN/efmcD5tIEQZj3gsrASbYUootiiueYZuLre4KqHF+8GH iTIGwe9YozWDOT7rH/byOEtoEYs+ZhdCeKYeEcuJir80Z+kB7ER6cQnMHV5+FYGIpSR2 /pKev2gf65XGO2Gg2mpMsnHIKEfaXvpP1gUQ/Hsgq3TWXQynaVkV+LOV0jq7egPOHGDv kE6SxaaPaEGZT7cYtzssOUAcBYUxT+h3ieLgoocLZaOjggMk1/8KYDL0IOI8H3HwNBPA GobXvamQB2NdKoV51edH//KKrJVxkeUinpg1p5S/FcSFP4NbX7OxG82wrUkiWvmTojv6 TApg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=6rHYz8SdjBH/6c3pJG6cDBMu51EcZSu0gv2NjEypa6I=; b=Drh9L6JRwJtW+djgyotuoXSJIWWp8hKKDrlKRUmVZhyEHqjqrQN+xxb0jnKpHC+aM0 5jfJSCktxrrrnrdk14bNhtouLRT7NHUeOyx70CfDdEgVWfIAFdCtIbOT5E/fY7gR6xAz RbJKTPub9KRwXwxg7fNiJZCVc62kMuINN/by1hxmvOB2bKdfsqpcRKPSaaaTMk78V9hg k1iNXDiYCERtNweGGqjkp39LYu6KNM6tGRe2KIMB47KsgpfH3EHqvFZ3Gz2h/Uk3H1QN kDEaGvfkZ3ZaT4JNBFIVCSqHQNs/KLITH/9ecBuisbxt//2RJdp6pt95yMhOqvj1uiUG Fr2Q== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=jR39Ku9U; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id sa22-20020a170906edb600b00977c8c91cbesi6326000ejb.638.2023.06.12.17.35.08; Mon, 12 Jun 2023 17:35:59 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=jR39Ku9U; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239448AbjFMARJ (ORCPT + 99 others); Mon, 12 Jun 2023 20:17:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52382 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238500AbjFMAQT (ORCPT ); Mon, 12 Jun 2023 20:16:19 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 27C93295E; Mon, 12 Jun 2023 17:13:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615213; x=1718151213; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=ZRAh96yHcl8Psj9OI1FtC3p64r7joGZ816QWoOgXk8M=; b=jR39Ku9U1UHMY40GLHwezd7NoEtNGQqQiNryPFKAXQRq0vrx7mD7gK1X rra6XZzU63Au2ucRB52lmNSTOfxHMzAH8TmzfkBCVYk7Fakw1m+J05JHR vkFrOj8CRiXkQdyynaqxOqiIfPT+W7/9BT6Gvyi5/SoYAFQIZdKuT7Adi WyaEvdSVY7Lwr7oTNNVrDqMQBtdY7eFShvcsA6rlMaah3f1bRzkVLt6T3 3rjRdtRy1Ighg2JD+rava7IUhJ5I2yCAeOpMBCRhT16lnaaM86Ns90dRz Kw8TBiAHUzzLJzV/vFco1k6Obm2vEoz1utsSwhW+Y7nxebZ9ssqcoklOe Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361557342" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361557342" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:33 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835671099" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835671099" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:32 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu , Pengfei Xu Subject: [PATCH v9 29/42] x86/shstk: Handle thread shadow stack Date: Mon, 12 Jun 2023 17:10:55 -0700 Message-Id: <20230613001108.3040476-30-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1768545645981127308?= X-GMAIL-MSGID: =?utf-8?q?1768545645981127308?= When a process is duplicated, but the child shares the address space with the parent, there is potential for the threads sharing a single stack to cause conflicts for each other. In the normal non-CET case this is handled in two ways. With regular CLONE_VM a new stack is provided by userspace such that the parent and child have different stacks. For vfork, the parent is suspended until the child exits. So as long as the child doesn't return from the vfork()/CLONE_VFORK calling function and sticks to a limited set of operations, the parent and child can share the same stack. For shadow stack, these scenarios present similar sharing problems. For the CLONE_VM case, the child and the parent must have separate shadow stacks. Instead of changing clone to take a shadow stack, have the kernel just allocate one and switch to it. Use stack_size passed from clone3() syscall for thread shadow stack size. A compat-mode thread shadow stack size is further reduced to 1/4. This allows more threads to run in a 32-bit address space. The clone() does not pass stack_size, which was added to clone3(). In that case, use RLIMIT_STACK size and cap to 4 GB. For shadow stack enabled vfork(), the parent and child can share the same shadow stack, like they can share a normal stack. Since the parent is suspended until the child terminates, the child will not interfere with the parent while executing as long as it doesn't return from the vfork() and overwrite up the shadow stack. The child can safely overwrite down the shadow stack, as the parent can just overwrite this later. So CET does not add any additional limitations for vfork(). Free the shadow stack on thread exit by doing it in mm_release(). Skip this when exiting a vfork() child since the stack is shared in the parent. During this operation, the shadow stack pointer of the new thread needs to be updated to point to the newly allocated shadow stack. Since the ability to do this is confined to the FPU subsystem, change fpu_clone() to take the new shadow stack pointer, and update it internally inside the FPU subsystem. This part was suggested by Thomas Gleixner. Suggested-by: Thomas Gleixner Co-developed-by: Yu-cheng Yu Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- arch/x86/include/asm/fpu/sched.h | 3 ++- arch/x86/include/asm/mmu_context.h | 2 ++ arch/x86/include/asm/shstk.h | 5 ++++ arch/x86/kernel/fpu/core.c | 36 +++++++++++++++++++++++++- arch/x86/kernel/process.c | 21 ++++++++++++++- arch/x86/kernel/shstk.c | 41 ++++++++++++++++++++++++++++-- 6 files changed, 103 insertions(+), 5 deletions(-) diff --git a/arch/x86/include/asm/fpu/sched.h b/arch/x86/include/asm/fpu/sched.h index c2d6cd78ed0c..3c2903bbb456 100644 --- a/arch/x86/include/asm/fpu/sched.h +++ b/arch/x86/include/asm/fpu/sched.h @@ -11,7 +11,8 @@ extern void save_fpregs_to_fpstate(struct fpu *fpu); extern void fpu__drop(struct fpu *fpu); -extern int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal); +extern int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal, + unsigned long shstk_addr); extern void fpu_flush_thread(void); /* diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h index 1d29dc791f5a..416901d406f8 100644 --- a/arch/x86/include/asm/mmu_context.h +++ b/arch/x86/include/asm/mmu_context.h @@ -186,6 +186,8 @@ do { \ #else #define deactivate_mm(tsk, mm) \ do { \ + if (!tsk->vfork_done) \ + shstk_free(tsk); \ load_gs_index(0); \ loadsegment(fs, 0); \ } while (0) diff --git a/arch/x86/include/asm/shstk.h b/arch/x86/include/asm/shstk.h index 2b1f7c9b9995..d4a5c7b10cb5 100644 --- a/arch/x86/include/asm/shstk.h +++ b/arch/x86/include/asm/shstk.h @@ -15,11 +15,16 @@ struct thread_shstk { long shstk_prctl(struct task_struct *task, int option, unsigned long features); void reset_thread_features(void); +unsigned long shstk_alloc_thread_stack(struct task_struct *p, unsigned long clone_flags, + unsigned long stack_size); void shstk_free(struct task_struct *p); #else static inline long shstk_prctl(struct task_struct *task, int option, unsigned long arg2) { return -EINVAL; } static inline void reset_thread_features(void) {} +static inline unsigned long shstk_alloc_thread_stack(struct task_struct *p, + unsigned long clone_flags, + unsigned long stack_size) { return 0; } static inline void shstk_free(struct task_struct *p) {} #endif /* CONFIG_X86_USER_SHADOW_STACK */ diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c index f851558b673f..aa4856b236b8 100644 --- a/arch/x86/kernel/fpu/core.c +++ b/arch/x86/kernel/fpu/core.c @@ -552,8 +552,36 @@ static inline void fpu_inherit_perms(struct fpu *dst_fpu) } } +/* A passed ssp of zero will not cause any update */ +static int update_fpu_shstk(struct task_struct *dst, unsigned long ssp) +{ +#ifdef CONFIG_X86_USER_SHADOW_STACK + struct cet_user_state *xstate; + + /* If ssp update is not needed. */ + if (!ssp) + return 0; + + xstate = get_xsave_addr(&dst->thread.fpu.fpstate->regs.xsave, + XFEATURE_CET_USER); + + /* + * If there is a non-zero ssp, then 'dst' must be configured with a shadow + * stack and the fpu state should be up to date since it was just copied + * from the parent in fpu_clone(). So there must be a valid non-init CET + * state location in the buffer. + */ + if (WARN_ON_ONCE(!xstate)) + return 1; + + xstate->user_ssp = (u64)ssp; +#endif + return 0; +} + /* Clone current's FPU state on fork */ -int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal) +int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal, + unsigned long ssp) { struct fpu *src_fpu = ¤t->thread.fpu; struct fpu *dst_fpu = &dst->thread.fpu; @@ -613,6 +641,12 @@ int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal) if (use_xsave()) dst_fpu->fpstate->regs.xsave.header.xfeatures &= ~XFEATURE_MASK_PASID; + /* + * Update shadow stack pointer, in case it changed during clone. + */ + if (update_fpu_shstk(dst, ssp)) + return 1; + trace_x86_fpu_copy_src(src_fpu); trace_x86_fpu_copy_dst(dst_fpu); diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index dac41a0072ea..3ab62ac98c2c 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -50,6 +50,7 @@ #include #include #include +#include #include "process.h" @@ -121,6 +122,7 @@ void exit_thread(struct task_struct *tsk) free_vm86(t); + shstk_free(tsk); fpu__drop(fpu); } @@ -142,6 +144,7 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args) struct inactive_task_frame *frame; struct fork_frame *fork_frame; struct pt_regs *childregs; + unsigned long new_ssp; int ret = 0; childregs = task_pt_regs(p); @@ -179,7 +182,16 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args) frame->flags = X86_EFLAGS_FIXED; #endif - fpu_clone(p, clone_flags, args->fn); + /* + * Allocate a new shadow stack for thread if needed. If shadow stack, + * is disabled, new_ssp will remain 0, and fpu_clone() will know not to + * update it. + */ + new_ssp = shstk_alloc_thread_stack(p, clone_flags, args->stack_size); + if (IS_ERR_VALUE(new_ssp)) + return PTR_ERR((void *)new_ssp); + + fpu_clone(p, clone_flags, args->fn, new_ssp); /* Kernel thread ? */ if (unlikely(p->flags & PF_KTHREAD)) { @@ -225,6 +237,13 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args) if (!ret && unlikely(test_tsk_thread_flag(current, TIF_IO_BITMAP))) io_bitmap_share(p); + /* + * If copy_thread() if failing, don't leak the shadow stack possibly + * allocated in shstk_alloc_thread_stack() above. + */ + if (ret) + shstk_free(p); + return ret; } diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c index 3cb85224d856..bd9cdc3a7338 100644 --- a/arch/x86/kernel/shstk.c +++ b/arch/x86/kernel/shstk.c @@ -47,7 +47,7 @@ static unsigned long alloc_shstk(unsigned long size) unsigned long addr, unused; mmap_write_lock(mm); - addr = do_mmap(NULL, addr, size, PROT_READ, flags, + addr = do_mmap(NULL, 0, size, PROT_READ, flags, VM_SHADOW_STACK | VM_WRITE, 0, &unused, NULL); mmap_write_unlock(mm); @@ -126,6 +126,37 @@ void reset_thread_features(void) current->thread.features_locked = 0; } +unsigned long shstk_alloc_thread_stack(struct task_struct *tsk, unsigned long clone_flags, + unsigned long stack_size) +{ + struct thread_shstk *shstk = &tsk->thread.shstk; + unsigned long addr, size; + + /* + * If shadow stack is not enabled on the new thread, skip any + * switch to a new shadow stack. + */ + if (!features_enabled(ARCH_SHSTK_SHSTK)) + return 0; + + /* + * For CLONE_VM, except vfork, the child needs a separate shadow + * stack. + */ + if ((clone_flags & (CLONE_VFORK | CLONE_VM)) != CLONE_VM) + return 0; + + size = adjust_shstk_size(stack_size); + addr = alloc_shstk(size); + if (IS_ERR_VALUE(addr)) + return addr; + + shstk->base = addr; + shstk->size = size; + + return addr + size; +} + void shstk_free(struct task_struct *tsk) { struct thread_shstk *shstk = &tsk->thread.shstk; @@ -134,7 +165,13 @@ void shstk_free(struct task_struct *tsk) !features_enabled(ARCH_SHSTK_SHSTK)) return; - if (!tsk->mm) + /* + * When fork() with CLONE_VM fails, the child (tsk) already has a + * shadow stack allocated, and exit_thread() calls this function to + * free it. In this case the parent (current) and the child share + * the same mm struct. + */ + if (!tsk->mm || tsk->mm != current->mm) return; unmap_shadow_stack(shstk->base, shstk->size); From patchwork Tue Jun 13 00:10:56 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 106988 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp222708vqr; Mon, 12 Jun 2023 17:45:52 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ6NuhlqIPXByNZs7nrl6G1hkrgccIAUagCvCBaw4yOezQtHulWzki4wS+QEboMY+ujU0ruj X-Received: by 2002:a17:907:3f83:b0:94b:ffe9:37fd with SMTP id hr3-20020a1709073f8300b0094bffe937fdmr10099018ejc.5.1686617151703; Mon, 12 Jun 2023 17:45:51 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1686617151; cv=none; d=google.com; s=arc-20160816; b=uoE48lZnM6G8NHl4FcgttJUQCc/1qIoQ5EMfgexVjKZ42Vmjnc8oNFABBuum5qNkgL nVZnp67r5omoiAnPWWjCLe9dGSBL/MaWtrCOIuNIT09Qo3IVNMbZX00n9fey9Q/XQARS 88kG4Hk4guqHkT5iSnFZgzDpy++Y1es1J6CjCZHHwd81UVmrS+qSqSfTMEfvoDoL/YGF kwxJXL1UFVmyVbCMIGBALY+EZ8oqbGHJ1P7zBVg2fhdUjfKQNiXLCS9ItlX20+7mg1/W siv1d1vWKtCsX1f0IU8mw1Lxhz5IHcHsZE7Bk2rAw7uUh8EL2h6sJUruV17XRHv3rvoj yqyw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=CCO8Fv6gzg+48ugIh6HaSUWv+RYPVlqFMXpZfWmPIDc=; b=zPruv5bgA69+uKWKUktyiOsTwJh9Dt2XfAjfrvk40Woaxqi3qZN2dTPCBaejxWHvam 9iopvqeB854IWSlNt9HwId4omN5NbqgIBblyRlhZlwiXnj8bRFLAHMbnpb+kSj8GrQ1Z UghGaoMX2Kw0svSe3vnUN09uoNjABNv2oq4kVSC4SAt5G1ZlMHaSx4gfIXJZWAN3Eg2J dyGnMK+WQyb2wfwe87iryOlGPs/VSAuYUfTBGK+GIiiwNQPd8O/5ZoLIR28ciSiUZpFr 2qGiWANOZvU9Y9uHBi0RlMVwyO6RboCfSGGmQIq+fq8QYg9pkzi0qfB8zyKdusB65KxX ENOA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=TuwYIIui; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id h5-20020a170906590500b0097668549c75si5989890ejq.391.2023.06.12.17.45.04; Mon, 12 Jun 2023 17:45:51 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=TuwYIIui; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239467AbjFMARj (ORCPT + 99 others); Mon, 12 Jun 2023 20:17:39 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53522 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238922AbjFMAQd (ORCPT ); Mon, 12 Jun 2023 20:16:33 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D349E2D52; Mon, 12 Jun 2023 17:13:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615222; x=1718151222; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=JRlODCr89ZW4tpE3v7Z5UBl8Adk88XgPugy9MazljtY=; b=TuwYIIuiLLu17rlsf1KPe1S2z0CKrbLXdnrtqChFKDBsp/Jd6h0FN435 78+KNK9LwB2oLRIPK6rzzk/VJ3SkGESBJqTm3Y4Q60Ifh7csCdFtQWFce +TvsWQa40rZmXA3RqsW7dSjVmC4DnwZ7ECZodTSDNvxkTpbaUfzlAFCHA 0YTDe4yO4ZUhptejG7xjA9Q0IG0wLruYH8UPivkGoKn6HxL3xeVBbzEXv HgLEw+gSHVVdmeOyXBeiH42mk7BVytPZHobTCnD4zV/9mQ1XkHngKxtHp NkUil2uSY335dxRlhPrR7IFCOSZOZSXK+TlvusbnSvK4Z7UV81dhKNQRJ w==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361557364" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361557364" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:34 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835671105" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835671105" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:33 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu , Pengfei Xu Subject: [PATCH v9 30/42] x86/shstk: Introduce routines modifying shstk Date: Mon, 12 Jun 2023 17:10:56 -0700 Message-Id: <20230613001108.3040476-31-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1768546266679780972?= X-GMAIL-MSGID: =?utf-8?q?1768546266679780972?= Shadow stacks are normally written to via CALL/RET or specific CET instructions like RSTORSSP/SAVEPREVSSP. However, sometimes the kernel will need to write to the shadow stack directly using the ring-0 only WRUSS instruction. A shadow stack restore token marks a restore point of the shadow stack, and the address in a token must point directly above the token, which is within the same shadow stack. This is distinctively different from other pointers on the shadow stack, since those pointers point to executable code area. Introduce token setup and verify routines. Also introduce WRUSS, which is a kernel-mode instruction but writes directly to user shadow stack. In future patches that enable shadow stack to work with signals, the kernel will need something to denote the point in the stack where sigreturn may be called. This will prevent attackers calling sigreturn at arbitrary places in the stack, in order to help prevent SROP attacks. To do this, something that can only be written by the kernel needs to be placed on the shadow stack. This can be accomplished by setting bit 63 in the frame written to the shadow stack. Userspace return addresses can't have this bit set as it is in the kernel range. It also can't be a valid restore token. Co-developed-by: Yu-cheng Yu Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- arch/x86/include/asm/special_insns.h | 13 +++++ arch/x86/kernel/shstk.c | 75 ++++++++++++++++++++++++++++ 2 files changed, 88 insertions(+) diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h index de48d1389936..d6cd9344f6c7 100644 --- a/arch/x86/include/asm/special_insns.h +++ b/arch/x86/include/asm/special_insns.h @@ -202,6 +202,19 @@ static inline void clwb(volatile void *__p) : [pax] "a" (p)); } +#ifdef CONFIG_X86_USER_SHADOW_STACK +static inline int write_user_shstk_64(u64 __user *addr, u64 val) +{ + asm_volatile_goto("1: wrussq %[val], (%[addr])\n" + _ASM_EXTABLE(1b, %l[fail]) + :: [addr] "r" (addr), [val] "r" (val) + :: fail); + return 0; +fail: + return -EFAULT; +} +#endif /* CONFIG_X86_USER_SHADOW_STACK */ + #define nop() asm volatile ("nop") static inline void serialize(void) diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c index bd9cdc3a7338..e22928c63ffc 100644 --- a/arch/x86/kernel/shstk.c +++ b/arch/x86/kernel/shstk.c @@ -25,6 +25,8 @@ #include #include +#define SS_FRAME_SIZE 8 + static bool features_enabled(unsigned long features) { return current->thread.features & features; @@ -40,6 +42,35 @@ static void features_clr(unsigned long features) current->thread.features &= ~features; } +/* + * Create a restore token on the shadow stack. A token is always 8-byte + * and aligned to 8. + */ +static int create_rstor_token(unsigned long ssp, unsigned long *token_addr) +{ + unsigned long addr; + + /* Token must be aligned */ + if (!IS_ALIGNED(ssp, 8)) + return -EINVAL; + + addr = ssp - SS_FRAME_SIZE; + + /* + * SSP is aligned, so reserved bits and mode bit are a zero, just mark + * the token 64-bit. + */ + ssp |= BIT(0); + + if (write_user_shstk_64((u64 __user *)addr, (u64)ssp)) + return -EFAULT; + + if (token_addr) + *token_addr = addr; + + return 0; +} + static unsigned long alloc_shstk(unsigned long size) { int flags = MAP_ANONYMOUS | MAP_PRIVATE | MAP_ABOVE4G; @@ -157,6 +188,50 @@ unsigned long shstk_alloc_thread_stack(struct task_struct *tsk, unsigned long cl return addr + size; } +static unsigned long get_user_shstk_addr(void) +{ + unsigned long long ssp; + + fpregs_lock_and_load(); + + rdmsrl(MSR_IA32_PL3_SSP, ssp); + + fpregs_unlock(); + + return ssp; +} + +#define SHSTK_DATA_BIT BIT(63) + +static int put_shstk_data(u64 __user *addr, u64 data) +{ + if (WARN_ON_ONCE(data & SHSTK_DATA_BIT)) + return -EINVAL; + + /* + * Mark the high bit so that the sigframe can't be processed as a + * return address. + */ + if (write_user_shstk_64(addr, data | SHSTK_DATA_BIT)) + return -EFAULT; + return 0; +} + +static int get_shstk_data(unsigned long *data, unsigned long __user *addr) +{ + unsigned long ldata; + + if (unlikely(get_user(ldata, addr))) + return -EFAULT; + + if (!(ldata & SHSTK_DATA_BIT)) + return -EINVAL; + + *data = ldata & ~SHSTK_DATA_BIT; + + return 0; +} + void shstk_free(struct task_struct *tsk) { struct thread_shstk *shstk = &tsk->thread.shstk; From patchwork Tue Jun 13 00:10:57 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 106987 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp222703vqr; Mon, 12 Jun 2023 17:45:51 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ4JogGvg16uS1swHQl90qG9MhQfsu9p+Ty+aDklUR8h5kWD6KQyLcaQbQU2RGEj4RnMewtF X-Received: by 2002:a05:6402:8d5:b0:506:8dba:bd71 with SMTP id d21-20020a05640208d500b005068dbabd71mr5867701edz.27.1686617151265; Mon, 12 Jun 2023 17:45:51 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1686617151; cv=none; d=google.com; s=arc-20160816; b=iuucMJRKLXN9aehS8UcG+5BCjOuEa7Twzjlzbk6lteGlWMwDlUBBbgdN/9o+XF+E/V gbZmX6JjoG6nqgLdhj/UEmfnTZ6huJE01OMtjY43Wup1GKY+juuWo6JTJ6uiTsU1Zpbp TXdl2Nay+BbAT/9OGhVvuZDTaSbDXC/wokBvFNdCvzhDcnNjCcI550Jjbj7Ulf0sW2Rw ZLNvUotXdrRWfgg0OX2+ORAOv+icOqvGxvF628j2ebUlgydVTicD6ldXUZ18i/o1W9hh ChEUVwoZxQ97t+gB8Q8Flpor2fgoGFjYyIJuAg6recr10qnc635lL1CM1OI36gZkLT+X 0Gag== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=MgDcxUbk8bavQieeuDWIimstnW9ktFErCUXNtfQmk0E=; b=FTV90/sB8msRSwUhPCgednzAK8GXy09ygGbJl0R3uqifWPUy/aBKuE3E2li5LrkYCB 8vu27DsiHuAOKWYhcVFNSaXDnABQzkwEDuNm4i9r/1+gtXgF4ZIgwVHJTo7oWMOZYH2n DHzO2JsVJFk6Crt0yEfCMAFj1QI8GSksvPNhjTNYVxP0egB5oOFRUb4b/zLmHLqbtlCW o6NRVhp0mFFCsEjI9oY6pdoevstX2+4Z3t+Z1GAWqzOnwP/QN0E/UmhpW/yGkXcK+HMR +ykga503JAHFzmx5XuX4RF6ji4ZzPaN6AYRr9Sd0I2tVNAZYS3Wosi/0hKrzUkM9H3lY 8CMQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=OwZ6peWi; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id ca20-20020aa7cd74000000b00516575b1ca4si6416502edb.38.2023.06.12.17.45.04; Mon, 12 Jun 2023 17:45:51 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=OwZ6peWi; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239483AbjFMARn (ORCPT + 99 others); Mon, 12 Jun 2023 20:17:43 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53202 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238929AbjFMAQs (ORCPT ); Mon, 12 Jun 2023 20:16:48 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8DDA430D2; Mon, 12 Jun 2023 17:13:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615226; x=1718151226; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=5PTalCu64urUNwCF3tCp/oZ88l0SJTltpYFQdwbjc9Y=; b=OwZ6peWipklmmu+7RE3G7V95NHEZpPAQWJ/VVDHmSyr4KC2S2pdTviQV wNjkfEqFJeDUaetYMci9QsJySWLJJ62L20Fca96YxFrRZDDzoHDq5DszD trylP+Q1npg9BzUfXA0jAE1CyRtZ453tGhTi+hreSGebmOSiANpbiQ94A V3knDybfMstsNkqNS0BziyN8I+pvtwzeZMlmdcVi/XlS35j8jR/Wmuxc7 2nTUUBEdt2EyK7wlHZcexq8ZNIuoLerP9mzKWwuo1PiwTjAcTrrfFWRmG k2PEHwQA5ddv5JDyYN6aV25YNKyKLhit82TT9W+IDTtwD+clDE8F2PM29 A==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361557387" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361557387" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:35 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835671109" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835671109" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:34 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu , Pengfei Xu Subject: [PATCH v9 31/42] x86/shstk: Handle signals for shadow stack Date: Mon, 12 Jun 2023 17:10:57 -0700 Message-Id: <20230613001108.3040476-32-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1768546265981930839?= X-GMAIL-MSGID: =?utf-8?q?1768546265981930839?= When a signal is handled, the context is pushed to the stack before handling it. For shadow stacks, since the shadow stack only tracks return addresses, there isn't any state that needs to be pushed. However, there are still a few things that need to be done. These things are visible to userspace and which will be kernel ABI for shadow stacks. One is to make sure the restorer address is written to shadow stack, since the signal handler (if not changing ucontext) returns to the restorer, and the restorer calls sigreturn. So add the restorer on the shadow stack before handling the signal, so there is not a conflict when the signal handler returns to the restorer. The other thing to do is to place some type of checkable token on the thread's shadow stack before handling the signal and check it during sigreturn. This is an extra layer of protection to hamper attackers calling sigreturn manually as in SROP-like attacks. For this token the shadow stack data format defined earlier can be used. Have the data pushed be the previous SSP. In the future the sigreturn might want to return back to a different stack. Storing the SSP (instead of a restore offset or something) allows for future functionality that may want to restore to a different stack. So, when handling a signal push - the SSP pointing in the shadow stack data format - the restorer address below the restore token. In sigreturn, verify SSP is stored in the data format and pop the shadow stack. Co-developed-by: Yu-cheng Yu Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- arch/x86/include/asm/shstk.h | 5 ++ arch/x86/kernel/shstk.c | 95 ++++++++++++++++++++++++++++++++++++ arch/x86/kernel/signal.c | 1 + arch/x86/kernel/signal_64.c | 6 +++ 4 files changed, 107 insertions(+) diff --git a/arch/x86/include/asm/shstk.h b/arch/x86/include/asm/shstk.h index d4a5c7b10cb5..ecb23a8ca47d 100644 --- a/arch/x86/include/asm/shstk.h +++ b/arch/x86/include/asm/shstk.h @@ -6,6 +6,7 @@ #include struct task_struct; +struct ksignal; #ifdef CONFIG_X86_USER_SHADOW_STACK struct thread_shstk { @@ -18,6 +19,8 @@ void reset_thread_features(void); unsigned long shstk_alloc_thread_stack(struct task_struct *p, unsigned long clone_flags, unsigned long stack_size); void shstk_free(struct task_struct *p); +int setup_signal_shadow_stack(struct ksignal *ksig); +int restore_signal_shadow_stack(void); #else static inline long shstk_prctl(struct task_struct *task, int option, unsigned long arg2) { return -EINVAL; } @@ -26,6 +29,8 @@ static inline unsigned long shstk_alloc_thread_stack(struct task_struct *p, unsigned long clone_flags, unsigned long stack_size) { return 0; } static inline void shstk_free(struct task_struct *p) {} +static inline int setup_signal_shadow_stack(struct ksignal *ksig) { return 0; } +static inline int restore_signal_shadow_stack(void) { return 0; } #endif /* CONFIG_X86_USER_SHADOW_STACK */ #endif /* __ASSEMBLY__ */ diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c index e22928c63ffc..f02e8ea4f1b5 100644 --- a/arch/x86/kernel/shstk.c +++ b/arch/x86/kernel/shstk.c @@ -232,6 +232,101 @@ static int get_shstk_data(unsigned long *data, unsigned long __user *addr) return 0; } +static int shstk_push_sigframe(unsigned long *ssp) +{ + unsigned long target_ssp = *ssp; + + /* Token must be aligned */ + if (!IS_ALIGNED(target_ssp, 8)) + return -EINVAL; + + *ssp -= SS_FRAME_SIZE; + if (put_shstk_data((void *__user)*ssp, target_ssp)) + return -EFAULT; + + return 0; +} + +static int shstk_pop_sigframe(unsigned long *ssp) +{ + unsigned long token_addr; + int err; + + err = get_shstk_data(&token_addr, (unsigned long __user *)*ssp); + if (unlikely(err)) + return err; + + /* Restore SSP aligned? */ + if (unlikely(!IS_ALIGNED(token_addr, 8))) + return -EINVAL; + + /* SSP in userspace? */ + if (unlikely(token_addr >= TASK_SIZE_MAX)) + return -EINVAL; + + *ssp = token_addr; + + return 0; +} + +int setup_signal_shadow_stack(struct ksignal *ksig) +{ + void __user *restorer = ksig->ka.sa.sa_restorer; + unsigned long ssp; + int err; + + if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK) || + !features_enabled(ARCH_SHSTK_SHSTK)) + return 0; + + if (!restorer) + return -EINVAL; + + ssp = get_user_shstk_addr(); + if (unlikely(!ssp)) + return -EINVAL; + + err = shstk_push_sigframe(&ssp); + if (unlikely(err)) + return err; + + /* Push restorer address */ + ssp -= SS_FRAME_SIZE; + err = write_user_shstk_64((u64 __user *)ssp, (u64)restorer); + if (unlikely(err)) + return -EFAULT; + + fpregs_lock_and_load(); + wrmsrl(MSR_IA32_PL3_SSP, ssp); + fpregs_unlock(); + + return 0; +} + +int restore_signal_shadow_stack(void) +{ + unsigned long ssp; + int err; + + if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK) || + !features_enabled(ARCH_SHSTK_SHSTK)) + return 0; + + ssp = get_user_shstk_addr(); + if (unlikely(!ssp)) + return -EINVAL; + + err = shstk_pop_sigframe(&ssp); + if (unlikely(err)) + return err; + + fpregs_lock_and_load(); + wrmsrl(MSR_IA32_PL3_SSP, ssp); + fpregs_unlock(); + + return 0; +} + void shstk_free(struct task_struct *tsk) { struct thread_shstk *shstk = &tsk->thread.shstk; diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c index 004cb30b7419..356253e85ce9 100644 --- a/arch/x86/kernel/signal.c +++ b/arch/x86/kernel/signal.c @@ -40,6 +40,7 @@ #include #include #include +#include static inline int is_ia32_compat_frame(struct ksignal *ksig) { diff --git a/arch/x86/kernel/signal_64.c b/arch/x86/kernel/signal_64.c index 0e808c72bf7e..cacf2ede6217 100644 --- a/arch/x86/kernel/signal_64.c +++ b/arch/x86/kernel/signal_64.c @@ -175,6 +175,9 @@ int x64_setup_rt_frame(struct ksignal *ksig, struct pt_regs *regs) frame = get_sigframe(ksig, regs, sizeof(struct rt_sigframe), &fp); uc_flags = frame_uc_flags(regs); + if (setup_signal_shadow_stack(ksig)) + return -EFAULT; + if (!user_access_begin(frame, sizeof(*frame))) return -EFAULT; @@ -260,6 +263,9 @@ SYSCALL_DEFINE0(rt_sigreturn) if (!restore_sigcontext(regs, &frame->uc.uc_mcontext, uc_flags)) goto badframe; + if (restore_signal_shadow_stack()) + goto badframe; + if (restore_altstack(&frame->uc.uc_stack)) goto badframe; From patchwork Tue Jun 13 00:10:58 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 106992 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp223251vqr; Mon, 12 Jun 2023 17:47:18 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ4jj9H/QGeS88L+jLvSFINii+othT+RPFdz7qBwA0v6uYQJp4FtBwIA7rJdz5TZFebEaJ9I X-Received: by 2002:a17:907:3687:b0:974:4f34:b04a with SMTP id bi7-20020a170907368700b009744f34b04amr10804749ejc.34.1686617238023; Mon, 12 Jun 2023 17:47:18 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1686617238; cv=none; d=google.com; s=arc-20160816; b=vMZ62v1S4EnLOg0sInGXdPx6FnsfZRF+tmgoJqqpiGhKirAht34yP6opwKhvFrNeHI zPDvipdX3rAJ1fQ4lxhPpkJKTSm/YJ9bi1RAJ+3GqeXmwqFA3lhSvLrBlbl3RIzpohgU C+V3tEnUZsBVi2lO/d3pkhrzn/uGoA/YMaFfXyX7p4w+TlkDQc5pHykNgo0/KiuG1fU4 AII1G4tcrZmINmCzeXO0Tmhd623iQ5sVCt7O92aoLtZt4GACJsMCMjPDD18icif9w/Kk MvAQttWuqZDuthVPaWh040w6DtIlRsa4nv/uqld/XhfljswUuTSZJTCr6Zb2gokKV8Jy Ik5g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=VJQfD0Gy0U01zHXthnUY9VvfAHAiqWwVCIxa02ED0FQ=; b=0ItUGLHNCNd3pFTr3z0fIjs4LNOW3TWBrkwWK64rsTbSt0pAn02szDrtTUknKg6XNC sbkZIWjYEzu+T1cDcvQfh0mdlOhOZRwWca+RK0+q0gHkZtt2Wf0DR9CCAO6suFZBypT9 5SM4bDs5oc9BKF+fSLBVxOfCQB1jNjWG0EXh4VJ2lSiml9qs+MBwLDMbqCU8zzUluxBI KQnbt8sxR2gGefpKx6YHD495925MIqw9Wjf7/oYjOTkGtbf8Qp/ax1S8QuFcnUs/uLeQ BmPky9/cTdUaQVnzbGWXctIgdcjsSTw5BiekrFokdb9CjdI/tWqX7HL4D9d14CVX3G/3 yd8Q== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=JAetgpkg; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id u14-20020a170906950e00b00977c6d3303asi5969339ejx.570.2023.06.12.17.46.52; Mon, 12 Jun 2023 17:47:18 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=JAetgpkg; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239505AbjFMASA (ORCPT + 99 others); Mon, 12 Jun 2023 20:18:00 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53208 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238930AbjFMAQv (ORCPT ); Mon, 12 Jun 2023 20:16:51 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8C84830E6; Mon, 12 Jun 2023 17:13:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615228; x=1718151228; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=qj0G4UwqkNNGQlR4Z3X67pikfZ5WOGk4HokEGvo8sro=; b=JAetgpkgtOB33AnSsLsZ6l+U4GY9rrlYZYi4fpJkFOIa7Kr9KAjCNHBY SV/PxNHe8T509ZqE4kY6TLxDxCPSjnm+/PiboySFlE+BGU14sWnH+iyJ3 1kSvKqdatfafqbNOrVnumH79hDvXzDG/7IDRCytLvnR4L7L5iwPK/FrX7 tZ29VfZcOZwPBQTF4eYNNOCLV6AahxgWbIlwDwj7j89ccJFJm0wl9t9O1 lE8wFzjrGRhmeM3UfPl9kY8ZuzEd9opmzaCuybQJjdryPargw0Q7Srwiq puDrwQ0mc0+eCy5ThXSDfw8yLTgpJSz9TiulNRpfUpM9ng1hTGrunyRm0 Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361557411" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361557411" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:36 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835671113" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835671113" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:35 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com Subject: [PATCH v9 32/42] x86/shstk: Check that SSP is aligned on sigreturn Date: Mon, 12 Jun 2023 17:10:58 -0700 Message-Id: <20230613001108.3040476-33-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1768546356733085233?= X-GMAIL-MSGID: =?utf-8?q?1768546356733085233?= The shadow stack signal frame is read by the kernel on sigreturn. It relies on shadow stack memory protections to prevent forgeries of this signal frame (which included the pre-signal SSP). It also relies on the shadow stack signal frame to have bit 63 set. Since this bit would not be set via typical shadow stack operations, so the kernel can assume it was a value it placed there. However, in order to support 32 bit shadow stack, the INCSSPD instruction can increment the shadow stack by 4 bytes. In this case SSP might be pointing to a region spanning two 8 byte shadow stack frames. It could confuse the checks described above. Since the kernel only supports shadow stack in 64 bit, just check that the SSP is 8 byte aligned in the sigreturn path. Signed-off-by: Rick Edgecombe --- v9: - New patch --- arch/x86/kernel/shstk.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c index f02e8ea4f1b5..a8705f7d966c 100644 --- a/arch/x86/kernel/shstk.c +++ b/arch/x86/kernel/shstk.c @@ -252,6 +252,9 @@ static int shstk_pop_sigframe(unsigned long *ssp) unsigned long token_addr; int err; + if (!IS_ALIGNED(*ssp, 8)) + return -EINVAL; + err = get_shstk_data(&token_addr, (unsigned long __user *)*ssp); if (unlikely(err)) return err; From patchwork Tue Jun 13 00:10:59 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 106986 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp222691vqr; Mon, 12 Jun 2023 17:45:50 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ7K4rqwfkvwlzG8t8+N1Ekn66LYe5gxNuskGJPpdSz7ukqWKixSk/MfuRffAfioZmYkLmMn X-Received: by 2002:a17:906:dc95:b0:973:d27e:bd87 with SMTP id cs21-20020a170906dc9500b00973d27ebd87mr10663278ejc.25.1686617150150; Mon, 12 Jun 2023 17:45:50 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1686617150; cv=none; d=google.com; s=arc-20160816; b=SCFDW6wn2WT6YD97gIPq0S6L4vhocP3iA0AoNM8OB0dSD/FXqDK6WKBPscJEYwB8sZ ExP3NuB4pXbz6pXk8CrDlJQezCbAy4fYhrpKvBOf0XDX5UcOVr+yCgrmxz2Seu8auK6O I6/8To+tSbauWH67znhaJECzX3UcHYP71o+lYM/2uq62bI078rr7g6fWl8gYYrUvGCrW 6Ap1YDw4G7NM4nOk44ohzKrQfDR6XGKNOErh3dXT2bCglr7MICts7iGjYipV/mfB7wgU mmOFgWFQdXfi7QwcUDBIlW2apPqHEjKJerXq2AOfovjuS9DbMqq3mfs7TW97UnbbrwXz /bgQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=j/S9+lJTj7lOJLdBjMtY21maw2CmhDPCo82KqZj1ygs=; b=urtHOLcld0Tl12MvmoVqWc2f1Y5cmAQPL5Js3Ci4kSvdmkVEFoqFSFU2lEFiP0PiD2 or5wBuhqo+oJ5XCIC9WpEhJhg508eMRBf0gNCTC+65IjzT02aV9HBlz+vpQshGCI1MCC vLw0TtyQtBDPK5MTsFJEIvB3T91okCy7NFbEa8swf+Ad0fFKwBwRWl6MKtfuHrKqYHSz hgWZSIBOaRoln82MwMngfAwnh+qmBv/et2/RG9Ssb2/j2CdShxd7+0qRKIFihitCBjAk 4m8iU4JUJBCB0wmZbeN1m4E/5BAjfQg6eDDmfTfgfJaATFMEonMxTA5cYsGXcQOFH/oc 70Jg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=VtwaUGGb; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id l17-20020a170906a41100b009782c389df9si5672863ejz.669.2023.06.12.17.44.59; Mon, 12 Jun 2023 17:45:50 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=VtwaUGGb; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239147AbjFMASJ (ORCPT + 99 others); Mon, 12 Jun 2023 20:18:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51914 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239019AbjFMAQz (ORCPT ); Mon, 12 Jun 2023 20:16:55 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6C39530F1; Mon, 12 Jun 2023 17:13:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615230; x=1718151230; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=T7gc3JiJMbmruUv2dTeFaPrm+ED2TBUgFQFH5yJf4mA=; b=VtwaUGGb52YlAWh8RBwn4COZxQR9V4cvBL0nuOeTgM0SrP74fEC6EW3Y NMK+5Qm3Ie0kdtQz8rzORUZQEPQXjzP03L6kIiTbi4f+wpmn/eKJV2K2e HGEXyqIURRLcS9CkhV4hm/GPqZP1HmTxQtBtuGfV4eVKPx+PWRefSxjHp 7doUuumQApGo+zlwyBonyBy84tDUw94TtCKjaqAcl16Yrqws6NcsEy9zs +9yeMQNhUNF/KTb8sZwIiI5gZSVKcqGH/dnWNt4xIXR3Dn/aok2hkOBQ0 JbbSrEAv+3B234bj4Q6pelx8q2oHc4lVfEmdCQmNmeCJFvUY9rf/Y5PtB Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361557436" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361557436" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:37 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835671121" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835671121" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:36 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com Subject: [PATCH v9 33/42] x86/shstk: Check that signal frame is shadow stack mem Date: Mon, 12 Jun 2023 17:10:59 -0700 Message-Id: <20230613001108.3040476-34-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1768546265019967336?= X-GMAIL-MSGID: =?utf-8?q?1768546265019967336?= The shadow stack signal frame is read by the kernel on sigreturn. It relies on shadow stack memory protections to prevent forgeries of this signal frame (which included the pre-signal SSP). This behavior helps userspace protect itself. However, using the INCSSP instruction userspace can adjust the SSP to 8 bytes beyond the end of a shadow stack. INCSSP performs shadow stack reads to make sure it doesn’t increment off of the shadow stack, but on the end position it actually reads 8 bytes below the new SSP. For the shadow stack HW operations, this situation (INCSSP off the end of a shadow stack by 8 bytes) would be fine. If the a RET is executed, the push to the shadow stack would fail to write to the shadow stack. If a CALL is executed, the SSP will be incremented back onto the stack and the return address will be written successfully to the very end. That is expected behavior around shadow stack underflow. However, the kernel doesn’t have a way to read shadow stack memory using shadow stack accesses. WRUSS can write to shadow stack memory with a shadow stack access which ensures the access is to shadow stack memory. But unfortunately for this case, there is no equivalent instruction for shadow stack reads. So when reading the shadow stack signal frames, the kernel currently assumes the SSP is pointing to the shadow stack and uses a normal read. The SSP pointing to shadow stack memory will be true in most cases, but as described above, in can be untrue by 8 bytes. So lookup the VMA of the shadow stack sigframe being read to verify it is shadow stack. Since the SSP can only be beyond the shadow stack by 8 bytes, and shadow stack memory is page aligned, this check only needs to be done when this type of relative position to a page boundary is encountered. So skip the extra work otherwise. Signed-off-by: Rick Edgecombe --- v9: - New patch --- arch/x86/kernel/shstk.c | 31 +++++++++++++++++++++++++++++-- 1 file changed, 29 insertions(+), 2 deletions(-) diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c index a8705f7d966c..50733a510446 100644 --- a/arch/x86/kernel/shstk.c +++ b/arch/x86/kernel/shstk.c @@ -249,15 +249,38 @@ static int shstk_push_sigframe(unsigned long *ssp) static int shstk_pop_sigframe(unsigned long *ssp) { + struct vm_area_struct *vma; unsigned long token_addr; - int err; + bool need_to_check_vma; + int err = 1; + /* + * It is possible for the SSP to be off the end of a shadow stack by 4 + * or 8 bytes. If the shadow stack is at the start of a page or 4 bytes + * before it, it might be this case, so check that the address being + * read is actually shadow stack. + */ if (!IS_ALIGNED(*ssp, 8)) return -EINVAL; + need_to_check_vma = PAGE_ALIGN(*ssp) == *ssp; + + if (need_to_check_vma) + mmap_read_lock_killable(current->mm); + err = get_shstk_data(&token_addr, (unsigned long __user *)*ssp); if (unlikely(err)) - return err; + goto out_err; + + if (need_to_check_vma) { + vma = find_vma(current->mm, *ssp); + if (!vma || !(vma->vm_flags & VM_SHADOW_STACK)) { + err = -EFAULT; + goto out_err; + } + + mmap_read_unlock(current->mm); + } /* Restore SSP aligned? */ if (unlikely(!IS_ALIGNED(token_addr, 8))) @@ -270,6 +293,10 @@ static int shstk_pop_sigframe(unsigned long *ssp) *ssp = token_addr; return 0; +out_err: + if (need_to_check_vma) + mmap_read_unlock(current->mm); + return err; } int setup_signal_shadow_stack(struct ksignal *ksig) From patchwork Tue Jun 13 00:11:00 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 106984 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp222497vqr; Mon, 12 Jun 2023 17:45:23 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ64fOCMRizDA27Gqg4CC0jJY5vAb/Ci3N0jNu9uaMPjgMFOvMAmd8vS4LxUN5kvS4qm6L9o X-Received: by 2002:a17:907:9811:b0:97e:aace:b6bc with SMTP id ji17-20020a170907981100b0097eaaceb6bcmr6920022ejc.53.1686617123606; Mon, 12 Jun 2023 17:45:23 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1686617123; cv=none; d=google.com; s=arc-20160816; b=cz5TpCvheni3lL/sVvu3ip+seb9yyqaopsgU5w+WSHuT1qiSN2j9Nz4v1WkJszXhxB M4koZtEGhgKrTaHJPU7o0ih9OmQbylkh85pBxY9fuMlLLCsTRTcMgflnRGKCeAfAqIw8 8nOiaDOhs7zTHjHym2UGzEH20qNrUOSPKesBs5WsGt4YR45/LTl94tbhdQ2vbA8cBzyi d6TMHXC627tc/hev9zKUkSXsNmMJsp1EmXiCwk2+33azNVLez7LCI/PtDKTQviI3THqN rTbtHqQPlNdDLvGFArl9ATPek5vjCVSuPuE6kNlyVxais9JUr9u/n4SYxKtuYlq8UVuV fCMg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=7bqbFXf7e8tAHfT5l+M8gmkEcZ4WmmRLHaSbxb3zw8o=; b=QO3/AJUbz2Qb5ByaHAQevnQhTWXnEtlWzFys4l39UD4VsZ5rsbyd8hTpEZRaEpDRho hTRt1C48l5HWhIHl9zq5igpYepWfAfatnN8BAdaj6Zm/VgFjgVKXDtgMUQRgSkyXF/ET XJCZOMcEifWWgbIvX7M1c0o1NGM35ApxET55CcRG9G9tU3/UGIFCNCMbg5V9tl9Sks06 Apxs1HnKewoeCzU72dlREHFjMya2wroDnEAiHOsqgVbC2wsTtKy7thJRayeGsNtIGgBQ 9ze6WT8qZ2QkgEbEDOtl/Iz4smx5Mo05YAld4wCabs/Oh5NeOAcSINP5KslrD+pKjMCr 4JiQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=TynvUVM7; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id jw14-20020a17090776ae00b009745308879dsi5517716ejc.587.2023.06.12.17.44.35; Mon, 12 Jun 2023 17:45:23 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=TynvUVM7; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239405AbjFMASN (ORCPT + 99 others); Mon, 12 Jun 2023 20:18:13 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53310 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239151AbjFMAQ4 (ORCPT ); Mon, 12 Jun 2023 20:16:56 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9F33930F8; Mon, 12 Jun 2023 17:13:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615231; x=1718151231; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=q/smTC4JyzbSbIdcW3lB/5FNt1IRicAxT9isRFlZelE=; b=TynvUVM7+REZHysTsxCjKulPBu1USkfx2Y5M1Ir7sMcLaUVgutwIEnP5 ff06+l+NsRzlOpUUZ9SFAMRV+2g7WXw2Nwi/LQ6kP53BJc1+G0gjwl8VT RxEWcFV34KtKFgsFfrb6YywDrM2JEYjam91XEE69zPwTAHr3Dp60fqrKU Kvm79yoLzx2cLNuqQw9RlL771Y/oxOT17emjX2cL9SHCFzJl1seg0OBp7 9LP0eCN9nYdx85I/rGAxLUCrQW2K2bk20f1GPvCC6PaNgFLKEeLTEmZKK JPZLHdbE+iIXZtiAc94fQB3W+UvM9HEcKnGkks2JO0O/Qxk35OTDJBqNe g==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361557464" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361557464" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:37 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835671125" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835671125" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:36 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Pengfei Xu Subject: [PATCH v9 34/42] x86/shstk: Introduce map_shadow_stack syscall Date: Mon, 12 Jun 2023 17:11:00 -0700 Message-Id: <20230613001108.3040476-35-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1768546236728602156?= X-GMAIL-MSGID: =?utf-8?q?1768546236728602156?= When operating with shadow stacks enabled, the kernel will automatically allocate shadow stacks for new threads, however in some cases userspace will need additional shadow stacks. The main example of this is the ucontext family of functions, which require userspace allocating and pivoting to userspace managed stacks. Unlike most other user memory permissions, shadow stacks need to be provisioned with special data in order to be useful. They need to be setup with a restore token so that userspace can pivot to them via the RSTORSSP instruction. But, the security design of shadow stacks is that they should not be written to except in limited circumstances. This presents a problem for userspace, as to how userspace can provision this special data, without allowing for the shadow stack to be generally writable. Previously, a new PROT_SHADOW_STACK was attempted, which could be mprotect()ed from RW permissions after the data was provisioned. This was found to not be secure enough, as other threads could write to the shadow stack during the writable window. The kernel can use a special instruction, WRUSS, to write directly to userspace shadow stacks. So the solution can be that memory can be mapped as shadow stack permissions from the beginning (never generally writable in userspace), and the kernel itself can write the restore token. First, a new madvise() flag was explored, which could operate on the PROT_SHADOW_STACK memory. This had a couple of downsides: 1. Extra checks were needed in mprotect() to prevent writable memory from ever becoming PROT_SHADOW_STACK. 2. Extra checks/vma state were needed in the new madvise() to prevent restore tokens being written into the middle of pre-used shadow stacks. It is ideal to prevent restore tokens being added at arbitrary locations, so the check was to make sure the shadow stack had never been written to. 3. It stood out from the rest of the madvise flags, as more of direct action than a hint at future desired behavior. So rather than repurpose two existing syscalls (mmap, madvise) that don't quite fit, just implement a new map_shadow_stack syscall to allow userspace to map and setup new shadow stacks in one step. While ucontext is the primary motivator, userspace may have other unforeseen reasons to setup its own shadow stacks using the WRSS instruction. Towards this provide a flag so that stacks can be optionally setup securely for the common case of ucontext without enabling WRSS. Or potentially have the kernel set up the shadow stack in some new way. The following example demonstrates how to create a new shadow stack with map_shadow_stack: void *shstk = map_shadow_stack(addr, stack_size, SHADOW_STACK_SET_TOKEN); Signed-off-by: Rick Edgecombe Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- arch/x86/entry/syscalls/syscall_64.tbl | 1 + arch/x86/include/uapi/asm/mman.h | 3 ++ arch/x86/kernel/shstk.c | 59 ++++++++++++++++++++++---- include/linux/syscalls.h | 1 + include/uapi/asm-generic/unistd.h | 2 +- kernel/sys_ni.c | 1 + 6 files changed, 58 insertions(+), 9 deletions(-) diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl index c84d12608cd2..f65c671ce3b1 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -372,6 +372,7 @@ 448 common process_mrelease sys_process_mrelease 449 common futex_waitv sys_futex_waitv 450 common set_mempolicy_home_node sys_set_mempolicy_home_node +451 64 map_shadow_stack sys_map_shadow_stack # # Due to a historical design error, certain syscalls are numbered differently diff --git a/arch/x86/include/uapi/asm/mman.h b/arch/x86/include/uapi/asm/mman.h index 5a0256e73f1e..8148bdddbd2c 100644 --- a/arch/x86/include/uapi/asm/mman.h +++ b/arch/x86/include/uapi/asm/mman.h @@ -13,6 +13,9 @@ ((key) & 0x8 ? VM_PKEY_BIT3 : 0)) #endif +/* Flags for map_shadow_stack(2) */ +#define SHADOW_STACK_SET_TOKEN (1ULL << 0) /* Set up a restore token in the shadow stack */ + #include #endif /* _ASM_X86_MMAN_H */ diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c index 50733a510446..04c37b33a625 100644 --- a/arch/x86/kernel/shstk.c +++ b/arch/x86/kernel/shstk.c @@ -17,6 +17,7 @@ #include #include #include +#include #include #include #include @@ -71,19 +72,31 @@ static int create_rstor_token(unsigned long ssp, unsigned long *token_addr) return 0; } -static unsigned long alloc_shstk(unsigned long size) +static unsigned long alloc_shstk(unsigned long addr, unsigned long size, + unsigned long token_offset, bool set_res_tok) { int flags = MAP_ANONYMOUS | MAP_PRIVATE | MAP_ABOVE4G; struct mm_struct *mm = current->mm; - unsigned long addr, unused; + unsigned long mapped_addr, unused; + + if (addr) + flags |= MAP_FIXED_NOREPLACE; mmap_write_lock(mm); - addr = do_mmap(NULL, 0, size, PROT_READ, flags, - VM_SHADOW_STACK | VM_WRITE, 0, &unused, NULL); - + mapped_addr = do_mmap(NULL, addr, size, PROT_READ, flags, + VM_SHADOW_STACK | VM_WRITE, 0, &unused, NULL); mmap_write_unlock(mm); - return addr; + if (!set_res_tok || IS_ERR_VALUE(mapped_addr)) + goto out; + + if (create_rstor_token(mapped_addr + token_offset, NULL)) { + vm_munmap(mapped_addr, size); + return -EINVAL; + } + +out: + return mapped_addr; } static unsigned long adjust_shstk_size(unsigned long size) @@ -134,7 +147,7 @@ static int shstk_setup(void) return -EOPNOTSUPP; size = adjust_shstk_size(0); - addr = alloc_shstk(size); + addr = alloc_shstk(0, size, 0, false); if (IS_ERR_VALUE(addr)) return PTR_ERR((void *)addr); @@ -178,7 +191,7 @@ unsigned long shstk_alloc_thread_stack(struct task_struct *tsk, unsigned long cl return 0; size = adjust_shstk_size(stack_size); - addr = alloc_shstk(size); + addr = alloc_shstk(0, size, 0, false); if (IS_ERR_VALUE(addr)) return addr; @@ -398,6 +411,36 @@ static int shstk_disable(void) return 0; } +SYSCALL_DEFINE3(map_shadow_stack, unsigned long, addr, unsigned long, size, unsigned int, flags) +{ + bool set_tok = flags & SHADOW_STACK_SET_TOKEN; + unsigned long aligned_size; + + if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) + return -EOPNOTSUPP; + + if (flags & ~SHADOW_STACK_SET_TOKEN) + return -EINVAL; + + /* If there isn't space for a token */ + if (set_tok && size < 8) + return -ENOSPC; + + if (addr && addr < SZ_4G) + return -ERANGE; + + /* + * An overflow would result in attempting to write the restore token + * to the wrong location. Not catastrophic, but just return the right + * error code and block it. + */ + aligned_size = PAGE_ALIGN(size); + if (aligned_size < size) + return -EOVERFLOW; + + return alloc_shstk(addr, aligned_size, size, set_tok); +} + long shstk_prctl(struct task_struct *task, int option, unsigned long features) { if (option == ARCH_SHSTK_LOCK) { diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index 33a0ee3bcb2e..392dc11e3556 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -1058,6 +1058,7 @@ asmlinkage long sys_memfd_secret(unsigned int flags); asmlinkage long sys_set_mempolicy_home_node(unsigned long start, unsigned long len, unsigned long home_node, unsigned long flags); +asmlinkage long sys_map_shadow_stack(unsigned long addr, unsigned long size, unsigned int flags); /* * Architecture-specific system calls diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h index 45fa180cc56a..b12940ec5926 100644 --- a/include/uapi/asm-generic/unistd.h +++ b/include/uapi/asm-generic/unistd.h @@ -887,7 +887,7 @@ __SYSCALL(__NR_futex_waitv, sys_futex_waitv) __SYSCALL(__NR_set_mempolicy_home_node, sys_set_mempolicy_home_node) #undef __NR_syscalls -#define __NR_syscalls 451 +#define __NR_syscalls 452 /* * 32 bit systems traditionally used different diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c index 860b2dcf3ac4..cb9aebd34646 100644 --- a/kernel/sys_ni.c +++ b/kernel/sys_ni.c @@ -381,6 +381,7 @@ COND_SYSCALL(vm86old); COND_SYSCALL(modify_ldt); COND_SYSCALL(vm86); COND_SYSCALL(kexec_file_load); +COND_SYSCALL(map_shadow_stack); /* s390 */ COND_SYSCALL(s390_pci_mmio_read); From patchwork Tue Jun 13 00:11:01 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 106989 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp222713vqr; Mon, 12 Jun 2023 17:45:52 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ6TNj2SZ6EB2FS4kuKL+zspRrrahdgRQalCJsDbZUavYMUBbPK2Ph5ygkInBWeoAJppSVYc X-Received: by 2002:aa7:c1c3:0:b0:514:ae6d:dc24 with SMTP id d3-20020aa7c1c3000000b00514ae6ddc24mr6538198edp.8.1686617152034; Mon, 12 Jun 2023 17:45:52 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1686617152; cv=none; d=google.com; s=arc-20160816; b=trd8xPOm7qnfcrMW/5dtK1Ne4b1lIEj/wH3RPr9W3hWet7OsqqPnXdE95FokkUpgFN QQyV7UQA39Qmu2Ucd2lgBGL9hOuSlikNBVvmmv+DStUj3zX/A9LRD/+ZTs6myw67Z8Uz lHjnNpymVZi8FLB/d/04uI08GQ3FDsflWnJlG066YKXpzTYAivqNFoos6Ai1IB7qP+8W VBPrhtNnd7Jt9zDqi6C9wIFACRbdZAkzwxs10LI46z89i77EuYxBypHzOIoqh+P5Et2Y au3YGLN3dzVlKlteUJacSzi3/Jh4DSxgOdTYk23WK+/RGy4o0TEnk9Se+MGBknQXcbr+ 4lLw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=DsGV+FArnCWzPmUm3Iwim6nAv4YxSXJQs9pogdjIv6c=; b=NDCtJzdJ55+GOSiNe/eilNUGWnm309+7OQVYX/mRO11PTUSYTwQw/2k3dThzsyG4Nu 0uz0+1sJ6uPIplz2HqWlLJS8vmsBM7Uiaoz9ZD+tZSNEp61RfYxdXT08hX/F9Ti4sQTu cmv7uDB+ER8ZJwynQk9dt6iZTYCiSqn8Nb4HgdL+28tnyjq+qWab7pUnDVRv5UxB25BU rfZudQdJagfAi90xQiy6GQb6om58IlgPnuGFF1SZFOMnd8eJtEPFGouUK9rARpgYVhOL wG8URNadS53BCKlUnJ0jv/YXJyC9I+fM9xNMKGtHELbLH3MC467Cuf9YT69TkIo4mWC0 4R5A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=QgMsEJA5; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id e4-20020a056402088400b0050489449e77si2706010edy.569.2023.06.12.17.44.50; Mon, 12 Jun 2023 17:45:52 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=QgMsEJA5; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239541AbjFMASw (ORCPT + 99 others); Mon, 12 Jun 2023 20:18:52 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53358 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239081AbjFMARe (ORCPT ); Mon, 12 Jun 2023 20:17:34 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7B91AF0; Mon, 12 Jun 2023 17:14:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615247; x=1718151247; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Src9+kIB+Tfk1ReeJh4vEiYrNG1G01r+lIh28P15+jA=; b=QgMsEJA5QLErJ9NSGDjEfYUTb4UJdi6QHUJiFIV9ZPaamI/8fZoskvLg EGQkV6D6Ons2i/3Cbxa7uKZlv9LPdIZjFiOdUs/8zMth95uT2TEQ40Cdm jGvsuKB04wUHNK/CQKlLUE/gICny38QWl8ekxu+jSYNRshJLPzpHyfvz6 AWC6oSsbGWDydV0BrjfOcSOPafkmiVh1nB/6uBcV8M3S9KEmkDuTJ9pV+ bz59jqqvK1l8jYtqIIjXCtknhx9mckcllIReBjBUaPEKa/JkGZPCIQ2+c kIw91sYppsJeoQu6M2r+25HotvQFjSKE2un8UpoVJrgitrljEUU3SCWME w==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361557487" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361557487" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:38 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835671131" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835671131" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:37 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Pengfei Xu Subject: [PATCH v9 35/42] x86/shstk: Support WRSS for userspace Date: Mon, 12 Jun 2023 17:11:01 -0700 Message-Id: <20230613001108.3040476-36-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1768546267007126329?= X-GMAIL-MSGID: =?utf-8?q?1768546267007126329?= For the current shadow stack implementation, shadow stacks contents can't easily be provisioned with arbitrary data. This property helps apps protect themselves better, but also restricts any potential apps that may want to do exotic things at the expense of a little security. The x86 shadow stack feature introduces a new instruction, WRSS, which can be enabled to write directly to shadow stack memory from userspace. Allow it to get enabled via the prctl interface. Only enable the userspace WRSS instruction, which allows writes to userspace shadow stacks from userspace. Do not allow it to be enabled independently of shadow stack, as HW does not support using WRSS when shadow stack is disabled. >From a fault handler perspective, WRSS will behave very similar to WRUSS, which is treated like a user access from a #PF err code perspective. Signed-off-by: Rick Edgecombe Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- arch/x86/include/uapi/asm/prctl.h | 1 + arch/x86/kernel/shstk.c | 43 ++++++++++++++++++++++++++++++- 2 files changed, 43 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h index 6a8e0e1bff4a..eedfde3b63be 100644 --- a/arch/x86/include/uapi/asm/prctl.h +++ b/arch/x86/include/uapi/asm/prctl.h @@ -36,5 +36,6 @@ /* ARCH_SHSTK_ features bits */ #define ARCH_SHSTK_SHSTK (1ULL << 0) +#define ARCH_SHSTK_WRSS (1ULL << 1) #endif /* _ASM_X86_PRCTL_H */ diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c index 04c37b33a625..ea0bf113f9cf 100644 --- a/arch/x86/kernel/shstk.c +++ b/arch/x86/kernel/shstk.c @@ -390,6 +390,47 @@ void shstk_free(struct task_struct *tsk) unmap_shadow_stack(shstk->base, shstk->size); } +static int wrss_control(bool enable) +{ + u64 msrval; + + if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) + return -EOPNOTSUPP; + + /* + * Only enable WRSS if shadow stack is enabled. If shadow stack is not + * enabled, WRSS will already be disabled, so don't bother clearing it + * when disabling. + */ + if (!features_enabled(ARCH_SHSTK_SHSTK)) + return -EPERM; + + /* Already enabled/disabled? */ + if (features_enabled(ARCH_SHSTK_WRSS) == enable) + return 0; + + fpregs_lock_and_load(); + rdmsrl(MSR_IA32_U_CET, msrval); + + if (enable) { + features_set(ARCH_SHSTK_WRSS); + msrval |= CET_WRSS_EN; + } else { + features_clr(ARCH_SHSTK_WRSS); + if (!(msrval & CET_WRSS_EN)) + goto unlock; + + msrval &= ~CET_WRSS_EN; + } + + wrmsrl(MSR_IA32_U_CET, msrval); + +unlock: + fpregs_unlock(); + + return 0; +} + static int shstk_disable(void) { if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) @@ -406,7 +447,7 @@ static int shstk_disable(void) fpregs_unlock(); shstk_free(current); - features_clr(ARCH_SHSTK_SHSTK); + features_clr(ARCH_SHSTK_SHSTK | ARCH_SHSTK_WRSS); return 0; } From patchwork Tue Jun 13 00:11:02 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 106969 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp219192vqr; Mon, 12 Jun 2023 17:36:12 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ5KWcIefjQlXJETw7xIYkn2JQ/M1b1xTTgSNqd5VzdScVDi+TWVIpHqrHhe87z9/h/hXYsq X-Received: by 2002:a17:907:6d1e:b0:977:b397:bbfa with SMTP id sa30-20020a1709076d1e00b00977b397bbfamr11592201ejc.6.1686616572482; Mon, 12 Jun 2023 17:36:12 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1686616572; cv=none; d=google.com; s=arc-20160816; b=Q88iZdtRIirOyPqwZQfem//mpGiOJey/lF7bGTEqxERW7Kq4BY58aDaovrZbAkaYDI L7OD/X6IWxj2PHgkoII8a4f8ffaIexUJpdI9vlBjBPpLBj3ajMacfJ/xWGOHKv7xMLAp CQuCGclLp+ogteQ+dPkNdyAi94JO2GLQ4dSTzX5Kwb9tC+ALHqbsgB+FQTE7ofFivVk7 H4r47KTfs/ywtQmX8ncyUcaS2gQxXndPv1l4BUddO5pBxQKcNBlh038GrXovNPV26DI7 0pJhnpyJ54Ajlbm0Vd96nIXWP70mWXL0clSu/OnJ/MiilctgLl3qAlO5QlChM3uNagSn DHrA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=0WlSnIOpYVVB1S6QKfwhyiJstnml2qma1WgJwX0ae3s=; b=UHPjQsMA61VuBVvyVk8vZ45dLg7hKa5HJXP6fZDOEOQRNIkNZ83cM+4gQGrkIb0ZYl HZ3nWTSLgWIBge3B1kA3kQaCBqQ47ersGOYi3OTxtuMvPe2izHj5SBObfvrGSfcPCvIO mty+IbYfxVFLKo8KE/x+T+kjnTH+0zIIjhfdc9MyOhQu1ZOfx8feImm4VYTbFbMJPsiq ymExDzwBPTk9n1yMJNETNCQJsOsIbKCso3HZG9EDFbEoLt22c4gU1Hj1VxT8T1BpGajb f5mo+rCt5M6AavQr1FY0h4k9OiseQknfSaTG+LiaHTWzWsrFxOQeWNhVx6fBueKPaVZE 5VnA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=MN4bLAP1; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id p9-20020a170906614900b00977cc3d4bf2si5730128ejl.1007.2023.06.12.17.35.12; Mon, 12 Jun 2023 17:36:12 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=MN4bLAP1; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239357AbjFMATJ (ORCPT + 99 others); Mon, 12 Jun 2023 20:19:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52418 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239183AbjFMARe (ORCPT ); Mon, 12 Jun 2023 20:17:34 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C8447198D; Mon, 12 Jun 2023 17:14:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615249; x=1718151249; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=x/pco2Xet/ihWAT7hysHs+5knwsBlg8L8hAHOHmdlgo=; b=MN4bLAP1UKupjLheS/bfz31PM8cXj7TLiVHKl7HIS1Hhdi369CtYqpgL MaIMhOAe9E3OjGCL9txRFET4Kkmf47SZXzdUZgM9IAV9l24NR8pU8y6x5 k7/oVLF5Cr0w1GiG+x94Oz8nq0Ys3qf0EBd4GrQnxjcOBu4X+wsO+jxeq 9qE0ySXyO0gSLmh0SCRE+26hnqRvTQhc0P6eW0G8fK2gyPrU9dctGHOlo Ed5Nwji1Y7Yi3b8uFq5vnP1rm2dRelLZSCk0K86qrwSgivl/NW1xNujNu khoRkBWvRXq7wbOKsUc+n8iwE7YdMGo9mRtMTIGDgI72GZhT3M9QZt9w2 Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361557511" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361557511" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:39 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835671140" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835671140" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:38 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Pengfei Xu Subject: [PATCH v9 36/42] x86: Expose thread features in /proc/$PID/status Date: Mon, 12 Jun 2023 17:11:02 -0700 Message-Id: <20230613001108.3040476-37-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1768545659227315735?= X-GMAIL-MSGID: =?utf-8?q?1768545659227315735?= Applications and loaders can have logic to decide whether to enable shadow stack. They usually don't report whether shadow stack has been enabled or not, so there is no way to verify whether an application actually is protected by shadow stack. Add two lines in /proc/$PID/status to report enabled and locked features. Since, this involves referring to arch specific defines in asm/prctl.h, implement an arch breakout to emit the feature lines. [Switched to CET, added to commit log] Co-developed-by: Kirill A. Shutemov Signed-off-by: Kirill A. Shutemov Signed-off-by: Rick Edgecombe Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- arch/x86/kernel/cpu/proc.c | 23 +++++++++++++++++++++++ fs/proc/array.c | 6 ++++++ include/linux/proc_fs.h | 2 ++ 3 files changed, 31 insertions(+) diff --git a/arch/x86/kernel/cpu/proc.c b/arch/x86/kernel/cpu/proc.c index 099b6f0d96bd..31c0e68f6227 100644 --- a/arch/x86/kernel/cpu/proc.c +++ b/arch/x86/kernel/cpu/proc.c @@ -4,6 +4,8 @@ #include #include #include +#include +#include #include "cpu.h" @@ -175,3 +177,24 @@ const struct seq_operations cpuinfo_op = { .stop = c_stop, .show = show_cpuinfo, }; + +#ifdef CONFIG_X86_USER_SHADOW_STACK +static void dump_x86_features(struct seq_file *m, unsigned long features) +{ + if (features & ARCH_SHSTK_SHSTK) + seq_puts(m, "shstk "); + if (features & ARCH_SHSTK_WRSS) + seq_puts(m, "wrss "); +} + +void arch_proc_pid_thread_features(struct seq_file *m, struct task_struct *task) +{ + seq_puts(m, "x86_Thread_features:\t"); + dump_x86_features(m, task->thread.features); + seq_putc(m, '\n'); + + seq_puts(m, "x86_Thread_features_locked:\t"); + dump_x86_features(m, task->thread.features_locked); + seq_putc(m, '\n'); +} +#endif /* CONFIG_X86_USER_SHADOW_STACK */ diff --git a/fs/proc/array.c b/fs/proc/array.c index d35bbf35a874..2c2efbe685d8 100644 --- a/fs/proc/array.c +++ b/fs/proc/array.c @@ -431,6 +431,11 @@ static inline void task_untag_mask(struct seq_file *m, struct mm_struct *mm) seq_printf(m, "untag_mask:\t%#lx\n", mm_untag_mask(mm)); } +__weak void arch_proc_pid_thread_features(struct seq_file *m, + struct task_struct *task) +{ +} + int proc_pid_status(struct seq_file *m, struct pid_namespace *ns, struct pid *pid, struct task_struct *task) { @@ -455,6 +460,7 @@ int proc_pid_status(struct seq_file *m, struct pid_namespace *ns, task_cpus_allowed(m, task); cpuset_task_status_allowed(m, task); task_context_switch_counts(m, task); + arch_proc_pid_thread_features(m, task); return 0; } diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h index 0260f5ea98fe..80ff8e533cbd 100644 --- a/include/linux/proc_fs.h +++ b/include/linux/proc_fs.h @@ -158,6 +158,8 @@ int proc_pid_arch_status(struct seq_file *m, struct pid_namespace *ns, struct pid *pid, struct task_struct *task); #endif /* CONFIG_PROC_PID_ARCH_STATUS */ +void arch_proc_pid_thread_features(struct seq_file *m, struct task_struct *task); + #else /* CONFIG_PROC_FS */ static inline void proc_root_init(void) From patchwork Tue Jun 13 00:11:03 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 106993 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp223253vqr; Mon, 12 Jun 2023 17:47:18 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ7xqA69F/Br0DfIrtRM2bIQSRWEMrhRvurNPZgKiDz5DbpX0SckyMlENuMiwq3oY1iOR+XW X-Received: by 2002:a17:907:162c:b0:978:8925:7a00 with SMTP id hb44-20020a170907162c00b0097889257a00mr12295606ejc.15.1686617238253; Mon, 12 Jun 2023 17:47:18 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1686617238; cv=none; d=google.com; s=arc-20160816; b=oMtsHkw+Q8ZDgZnR6IAKZZR3VPtMY3rSm30t06qtAJin79GRSK9lbAJrLtDJMxSS55 MMTA5o/P0wLKXhCI5pFSwWL46PFvpiD/4rSYU4A6+9tkRVAm6QNFRoQ6pDlGTl1loL3p 6+A0qNczYEon051Ief19eFsLE+5pKm7xHmjgWRTNUOfnUP14v+dpmONJWkt06ll+NKvv Ui6/ntHSJJcr7tR2qB88AvmAl7D4Si/DCfSfBrk8GPR9BvwWGffyeXL46iPWo7hvb5Us 0L6IoARNmbdF8kNYXAdjIr0QgERkJV6hf+odF4JwS5VomzmrP6H5i/ZBjtj/kcoYI/PI sS8A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=1F1/TF3ynb464OSVq+cAxAGFjNLdRbyulxUg251OjE0=; b=RhQRb8cMpUI/4m/SImzqNvXosP+qQulBHBC52X1uQ4ctZ+GUlCGnJ4qjaxG2hzi9+M xN16o1ntbOESrz3X4APJvShR2rXx9WURGQw6Pd9SvF6sHP0iPMD1on4OlgkSvaaqiX8l Fvb+7VmZvu8qAW3hEyvt7omYCXGzue6ZH9ACquOuwpw0fqo2IqE8qc4ehFPmnNknzUUD C5MpjturEHDjKREsXtISRgoeaNQwHroC0vztFCH9XoHEvSrca8sv2o+tFsDqPTG/LO32 YEBk4bXe6I2XQhKf5suLOKZz5dZgeqv250nkJISawyJMmjO9WLRZ/O7/gSXI/s5M9WAb sqxA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=hu8F+oRk; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id e15-20020a1709067e0f00b0097456c647ffsi6314679ejr.447.2023.06.12.17.46.54; Mon, 12 Jun 2023 17:47:18 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=hu8F+oRk; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238832AbjFMATE (ORCPT + 99 others); Mon, 12 Jun 2023 20:19:04 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52424 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239327AbjFMARg (ORCPT ); Mon, 12 Jun 2023 20:17:36 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5F87F199E; Mon, 12 Jun 2023 17:14:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615253; x=1718151253; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=xI4pbs3PiOhFQEbuDUq+4GewOhti2Nle1TQJwXiYqGs=; b=hu8F+oRkUA9SufUWd3i53q2VvFFsKCZw7iZl7c/I4VeIQoJKiyTlfduS F2YiwOgJvJOgmJpBVZ6VLsDMRjLJXhitb4CkdAZzs+ebTSgG7Rg/41+It Xx9GdUMWm1jfvOrHJLnSRhh8YZzN3+Bu9PmFs10hWoS0f/DKx/4qx9gWk WOgO+2Yduwx7KtzTwK0NnZ7qkRu9g7VKQjqPoIWmdgTTqSN043dfzDtRJ CB41KWAYViTKvcGK7TzR3I+cW+dGhxa+mfhRMvlSP/AeB3pru90PrCA2S sC6AxQDYieCiP7oi7/fKGBhKGbo1g+c9K5Q38YglvS/myT1D0j6Sb21V/ A==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361557540" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361557540" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:40 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835671152" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835671152" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:39 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Pengfei Xu Subject: [PATCH v9 37/42] x86/shstk: Wire in shadow stack interface Date: Mon, 12 Jun 2023 17:11:03 -0700 Message-Id: <20230613001108.3040476-38-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1768546357449515054?= X-GMAIL-MSGID: =?utf-8?q?1768546357449515054?= The kernel now has the main shadow stack functionality to support applications. Wire in the WRSS and shadow stack enable/disable functions into the existing shadow stack API skeleton. Signed-off-by: Rick Edgecombe Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- arch/x86/kernel/shstk.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c index ea0bf113f9cf..d723cdc93474 100644 --- a/arch/x86/kernel/shstk.c +++ b/arch/x86/kernel/shstk.c @@ -502,9 +502,17 @@ long shstk_prctl(struct task_struct *task, int option, unsigned long features) return -EINVAL; if (option == ARCH_SHSTK_DISABLE) { + if (features & ARCH_SHSTK_WRSS) + return wrss_control(false); + if (features & ARCH_SHSTK_SHSTK) + return shstk_disable(); return -EINVAL; } /* Handle ARCH_SHSTK_ENABLE */ + if (features & ARCH_SHSTK_SHSTK) + return shstk_setup(); + if (features & ARCH_SHSTK_WRSS) + return wrss_control(true); return -EINVAL; } From patchwork Tue Jun 13 00:11:04 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 106979 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp222046vqr; Mon, 12 Jun 2023 17:44:13 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ7id6gPZTAnT2L6u74FaOsl8DULZ587LMmgAxZ0VVo+rtWBWzJl2LOtdGGGjkfLlGNjY4gt X-Received: by 2002:a17:906:eec3:b0:96a:63d4:2493 with SMTP id wu3-20020a170906eec300b0096a63d42493mr11435160ejb.40.1686617053089; Mon, 12 Jun 2023 17:44:13 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1686617053; cv=none; d=google.com; s=arc-20160816; b=qDnlbQ3FKBLWx4uV+7oB21ue5bCufCXf0QQ5bK+Seb69Xj57Fpf7cTfBzRU2XsLYCL j8qg8hTca8wGtzUmY/qRkkt/sYKAA1D977N+T4VsvHEso+TRX7Z99nYMfsrZPczunvXy 3OjGWkpArdXYQMVI13UjrXxyjfbnIxeq9T+MI/UO8rKsGUncSJFt9/D/gEbimvKujYYR 1ZcwVc/1dKrKlSbYkN9qikZ7kKl7DFhzdx4yUEMu/1VvvFC4f/W0jn7ZzC3A/SPxg9dl W4+Xgi8LPoGh5caIpBps8CpRGMmb7CJxjKzH0e+RYj+N/wymoGNfdiUSf0kVAacSGc9j L1YQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=ggbaxEIISLmgT7I/30ANB4iH1d2bu4GVx9rE6ZdVsLQ=; b=Y4FaUERiXIiwzAnXl7g32UoPuuKTj67A/lemrOx0lDk9yRk9XgnL9JZfT9pqQzeD6s 9N051oTLgP02aUXUIYQk9TTFTy5aIVrY88+TPPCYiMATkKV9P2dvtgPhhlWWC33Wnyl+ 867IecYPNQdABNYPIVHaLZhq4aohBKEGb0OFumYo2UxAG3TkNFqiwuD46Zxg5//ie5zx gCSruss5SrSPMfGm0+aeTFBFaT4pZV7ZTGCUMssAxRZriKcfkFoVXlBmM7fhgMX7vYV0 FH+qmsTY03rpBmO6DVTKZdlVT7iFt944nRvT/DpQlAzvk8HQGzV1ce3jy1xHFRCnlXeT 8cwQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=MWPkJUdm; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id l14-20020a056402124e00b00514a97b6b81si6417487edw.308.2023.06.12.17.43.50; Mon, 12 Jun 2023 17:44:13 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=MWPkJUdm; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239603AbjFMATP (ORCPT + 99 others); Mon, 12 Jun 2023 20:19:15 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52884 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238376AbjFMARi (ORCPT ); Mon, 12 Jun 2023 20:17:38 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7BB2E3A98; Mon, 12 Jun 2023 17:14:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615255; x=1718151255; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Umo485lkaXi+aij5vBJ9ejxN8cdryMJacLabCk9dyEI=; b=MWPkJUdmtyjntK5lDrs/JDccnFDWtHqzqfZbFawreZbEia0neiUKvyIs L/Xkukjx4tFeYXalOUMN3smyv9Hko+wyJIj7AYHQc6bJtsg1yyIYI/2jD OuWmyqJbmkq6gLSvG2I4N0YBQ4szoPKEwEgT671iJ5M/5yO0y1C7pJ6Vp 9x2Fk+U+4IXfi8jETSAdXN5CMfJhfKzM2UapONY5T7z3ziGbbB+Tru8G/ ZRQ0HBefpxT/K+Su1gSkvM1uNwanzh+d9IlvQtaROm5K/ooGMfjLXBnUg ABrllXCJao/a3M5WjBMvVFo3z/TZjPbZhFsFhyB+b5NWWY23xpfKgIwWl g==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361557564" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361557564" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:41 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835671162" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835671162" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:40 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu , Pengfei Xu Subject: [PATCH v9 38/42] x86/cpufeatures: Enable CET CR4 bit for shadow stack Date: Mon, 12 Jun 2023 17:11:04 -0700 Message-Id: <20230613001108.3040476-39-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1768546163040321187?= X-GMAIL-MSGID: =?utf-8?q?1768546163040321187?= Setting CR4.CET is a prerequisite for utilizing any CET features, most of which also require setting MSRs. Kernel IBT already enables the CET CR4 bit when it detects IBT HW support and is configured with kernel IBT. However, future patches that enable userspace shadow stack support will need the bit set as well. So change the logic to enable it in either case. Clear MSR_IA32_U_CET in cet_disable() so that it can't live to see userspace in a new kexec-ed kernel that has CR4.CET set from kernel IBT. Co-developed-by: Yu-cheng Yu Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- arch/x86/kernel/cpu/common.c | 35 +++++++++++++++++++++++++++-------- 1 file changed, 27 insertions(+), 8 deletions(-) diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index 80710a68ef7d..3ea06b0b4570 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -601,27 +601,43 @@ __noendbr void ibt_restore(u64 save) static __always_inline void setup_cet(struct cpuinfo_x86 *c) { - u64 msr = CET_ENDBR_EN; + bool user_shstk, kernel_ibt; - if (!HAS_KERNEL_IBT || - !cpu_feature_enabled(X86_FEATURE_IBT)) + if (!IS_ENABLED(CONFIG_X86_CET)) return; - wrmsrl(MSR_IA32_S_CET, msr); + kernel_ibt = HAS_KERNEL_IBT && cpu_feature_enabled(X86_FEATURE_IBT); + user_shstk = cpu_feature_enabled(X86_FEATURE_SHSTK) && + IS_ENABLED(CONFIG_X86_USER_SHADOW_STACK); + + if (!kernel_ibt && !user_shstk) + return; + + if (user_shstk) + set_cpu_cap(c, X86_FEATURE_USER_SHSTK); + + if (kernel_ibt) + wrmsrl(MSR_IA32_S_CET, CET_ENDBR_EN); + else + wrmsrl(MSR_IA32_S_CET, 0); + cr4_set_bits(X86_CR4_CET); - if (!ibt_selftest()) { + if (kernel_ibt && !ibt_selftest()) { pr_err("IBT selftest: Failed!\n"); wrmsrl(MSR_IA32_S_CET, 0); setup_clear_cpu_cap(X86_FEATURE_IBT); - return; } } __noendbr void cet_disable(void) { - if (cpu_feature_enabled(X86_FEATURE_IBT)) - wrmsrl(MSR_IA32_S_CET, 0); + if (!(cpu_feature_enabled(X86_FEATURE_IBT) || + cpu_feature_enabled(X86_FEATURE_SHSTK))) + return; + + wrmsrl(MSR_IA32_S_CET, 0); + wrmsrl(MSR_IA32_U_CET, 0); } /* @@ -1483,6 +1499,9 @@ static void __init cpu_parse_early_param(void) if (cmdline_find_option_bool(boot_command_line, "noxsaves")) setup_clear_cpu_cap(X86_FEATURE_XSAVES); + if (cmdline_find_option_bool(boot_command_line, "nousershstk")) + setup_clear_cpu_cap(X86_FEATURE_USER_SHSTK); + arglen = cmdline_find_option(boot_command_line, "clearcpuid", arg, sizeof(arg)); if (arglen <= 0) return; From patchwork Tue Jun 13 00:11:05 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 106968 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp219180vqr; Mon, 12 Jun 2023 17:36:11 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ6XBVAhwmWbpde9H6g4R37osKCU4JYXdhN4PIlDYn/HSlobVmAVkYqbaBRzJOha4lB3YT16 X-Received: by 2002:a17:907:9304:b0:96f:45cd:6c24 with SMTP id bu4-20020a170907930400b0096f45cd6c24mr10392960ejc.23.1686616570991; Mon, 12 Jun 2023 17:36:10 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1686616570; cv=none; d=google.com; s=arc-20160816; b=ZuQxRrM8Dhsr9YpJIogxWKHupMQRVH9oq/1AQyVqRcEt1kFkRSP6Kv/Iu6zOPxM3sV yxpE9zDXSij4Lzwa28t4SOu5qSbaS3NvZABhFxypyUHPdqbtN+tPVnXkbBWdSYmfXFr/ qhkKrjm2KeYpwvR3qV+CI3wzs8xROv/aMYLSP9bCuEF+8hVy83kISY6C1ZaqE4KgT0NO NlBYU0DUKqU5tKddTdWDPrCOrT5WAdIC/y+c/uxSgAdnCEdJA0I2D2rRdsfksgoXqEXA EGTHzecwuAKfqBDAhFWzf69Kyg15vDY5Y/MK8J3PIQe9N6ITrR8ymzBt/ojaM2hswdmG h2ww== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=XIDUjPughZjFOXgJfHaTyjXWpD03lk8bxIhB3OEpD8I=; b=WrkY0N2ur0hBifmLNsalJ7/H4L+bsDKaPGueovkcEuS/k8Gj/5uvMG3cxihJk0YAaT Dthvp8CMhhHJ7IEy0uhh1CG/mOQ7r9YYuW7aZJ2JZ+TqrV1KmDxBNJgsT0T3JZqcSXqM KtzWA56zTSRgGU0LDATGm+gD+uERUE9cxiL1H4NWLL1qaOmhq/lA1GpUUT85DKTIwNVN EcVRrM5vj1+5p2AdE8ppZ2IqpYA9maqOdXDdArsh7BFAIf0oLeyUhuHlsilnXSb9EEam NT07g1MA6E8BDSiVKeq1GP4g8jpDPZkMAIYvBQKrTpeDdwMiase7vxo2mDbw5Zp1RKtc EfRA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=CBsEWX+s; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id oy30-20020a170907105e00b00978a23d0adcsi5920207ejb.211.2023.06.12.17.34.57; Mon, 12 Jun 2023 17:36:10 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=CBsEWX+s; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239616AbjFMATU (ORCPT + 99 others); Mon, 12 Jun 2023 20:19:20 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56358 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238592AbjFMARw (ORCPT ); Mon, 12 Jun 2023 20:17:52 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6CA783AA5; Mon, 12 Jun 2023 17:14:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615258; x=1718151258; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=BODs5YgrIYiUG9wa8tbCiXPQczkVVrNo7gO7I9Fm7W8=; b=CBsEWX+sWwTDuKBnaqL9IjQlVpnalydt+qBdk18c2u1/DuppBWs7dRI4 kPA+kSLM5BetH/RprxPBKAf06anLwIz8REUH6WexByMmJMF9aRcU7aRsE We7pR6MoH6W/kkZs7n+RQjHNPD1R1WYjqNmNeIzlvYF9HDKkpx10jDi7c js4MH6CLuRHqh1kZ7r9CQwpbZLi888C5Ne27VwN05q+rfPJqZrmHNmD90 /BiEuICecypWdp4bepLx/OkQ+oBv/u7mTDi4exk+p/ujqAfO8fyS3k94T 3LGmSfYXI8cbRW/j0f8fO6CJtwDv4pVPFmM86pRwUuskjIiqTZ5x1LpYV A==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361557588" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361557588" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:42 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835671173" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835671173" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:41 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu , Pengfei Xu Subject: [PATCH v9 39/42] selftests/x86: Add shadow stack test Date: Mon, 12 Jun 2023 17:11:05 -0700 Message-Id: <20230613001108.3040476-40-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1768545657911897025?= X-GMAIL-MSGID: =?utf-8?q?1768545657911897025?= Add a simple selftest for exercising some shadow stack behavior: - map_shadow_stack syscall and pivot - Faulting in shadow stack memory - Handling shadow stack violations - GUP of shadow stack memory - mprotect() of shadow stack memory - Userfaultfd on shadow stack memory - 32 bit segmentation - Guard gap test - Ptrace test Co-developed-by: Yu-cheng Yu Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- v9: - Fix race in userfaultfd test - Add guard gap test - Add ptrace test --- tools/testing/selftests/x86/Makefile | 2 +- .../testing/selftests/x86/test_shadow_stack.c | 884 ++++++++++++++++++ 2 files changed, 885 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/x86/test_shadow_stack.c diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile index 598135d3162b..7e8c937627dd 100644 --- a/tools/testing/selftests/x86/Makefile +++ b/tools/testing/selftests/x86/Makefile @@ -18,7 +18,7 @@ TARGETS_C_32BIT_ONLY := entry_from_vm86 test_syscall_vdso unwind_vdso \ test_FCMOV test_FCOMI test_FISTTP \ vdso_restorer TARGETS_C_64BIT_ONLY := fsgsbase sysret_rip syscall_numbering \ - corrupt_xstate_header amx lam + corrupt_xstate_header amx lam test_shadow_stack # Some selftests require 32bit support enabled also on 64bit systems TARGETS_C_32BIT_NEEDED := ldt_gdt ptrace_syscall diff --git a/tools/testing/selftests/x86/test_shadow_stack.c b/tools/testing/selftests/x86/test_shadow_stack.c new file mode 100644 index 000000000000..5ab788dc1ce6 --- /dev/null +++ b/tools/testing/selftests/x86/test_shadow_stack.c @@ -0,0 +1,884 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * This program test's basic kernel shadow stack support. It enables shadow + * stack manual via the arch_prctl(), instead of relying on glibc. It's + * Makefile doesn't compile with shadow stack support, so it doesn't rely on + * any particular glibc. As a result it can't do any operations that require + * special glibc shadow stack support (longjmp(), swapcontext(), etc). Just + * stick to the basics and hope the compiler doesn't do anything strange. + */ + +#define _GNU_SOURCE + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +/* + * Define the ABI defines if needed, so people can run the tests + * without building the headers. + */ +#ifndef __NR_map_shadow_stack +#define __NR_map_shadow_stack 451 + +#define SHADOW_STACK_SET_TOKEN (1ULL << 0) + +#define ARCH_SHSTK_ENABLE 0x5001 +#define ARCH_SHSTK_DISABLE 0x5002 +#define ARCH_SHSTK_LOCK 0x5003 +#define ARCH_SHSTK_UNLOCK 0x5004 +#define ARCH_SHSTK_STATUS 0x5005 + +#define ARCH_SHSTK_SHSTK (1ULL << 0) +#define ARCH_SHSTK_WRSS (1ULL << 1) + +#define NT_X86_SHSTK 0x204 +#endif + +#define SS_SIZE 0x200000 +#define PAGE_SIZE 0x1000 + +#if (__GNUC__ < 8) || (__GNUC__ == 8 && __GNUC_MINOR__ < 5) +int main(int argc, char *argv[]) +{ + printf("[SKIP]\tCompiler does not support CET.\n"); + return 0; +} +#else +void write_shstk(unsigned long *addr, unsigned long val) +{ + asm volatile("wrssq %[val], (%[addr])\n" + : "=m" (addr) + : [addr] "r" (addr), [val] "r" (val)); +} + +static inline unsigned long __attribute__((always_inline)) get_ssp(void) +{ + unsigned long ret = 0; + + asm volatile("xor %0, %0; rdsspq %0" : "=r" (ret)); + return ret; +} + +/* + * For use in inline enablement of shadow stack. + * + * The program can't return from the point where shadow stack gets enabled + * because there will be no address on the shadow stack. So it can't use + * syscall() for enablement, since it is a function. + * + * Based on code from nolibc.h. Keep a copy here because this can't pull in all + * of nolibc.h. + */ +#define ARCH_PRCTL(arg1, arg2) \ +({ \ + long _ret; \ + register long _num asm("eax") = __NR_arch_prctl; \ + register long _arg1 asm("rdi") = (long)(arg1); \ + register long _arg2 asm("rsi") = (long)(arg2); \ + \ + asm volatile ( \ + "syscall\n" \ + : "=a"(_ret) \ + : "r"(_arg1), "r"(_arg2), \ + "0"(_num) \ + : "rcx", "r11", "memory", "cc" \ + ); \ + _ret; \ +}) + +void *create_shstk(void *addr) +{ + return (void *)syscall(__NR_map_shadow_stack, addr, SS_SIZE, SHADOW_STACK_SET_TOKEN); +} + +void *create_normal_mem(void *addr) +{ + return mmap(addr, SS_SIZE, PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS, 0, 0); +} + +void free_shstk(void *shstk) +{ + munmap(shstk, SS_SIZE); +} + +int reset_shstk(void *shstk) +{ + return madvise(shstk, SS_SIZE, MADV_DONTNEED); +} + +void try_shstk(unsigned long new_ssp) +{ + unsigned long ssp; + + printf("[INFO]\tnew_ssp = %lx, *new_ssp = %lx\n", + new_ssp, *((unsigned long *)new_ssp)); + + ssp = get_ssp(); + printf("[INFO]\tchanging ssp from %lx to %lx\n", ssp, new_ssp); + + asm volatile("rstorssp (%0)\n":: "r" (new_ssp)); + asm volatile("saveprevssp"); + printf("[INFO]\tssp is now %lx\n", get_ssp()); + + /* Switch back to original shadow stack */ + ssp -= 8; + asm volatile("rstorssp (%0)\n":: "r" (ssp)); + asm volatile("saveprevssp"); +} + +int test_shstk_pivot(void) +{ + void *shstk = create_shstk(0); + + if (shstk == MAP_FAILED) { + printf("[FAIL]\tError creating shadow stack: %d\n", errno); + return 1; + } + try_shstk((unsigned long)shstk + SS_SIZE - 8); + free_shstk(shstk); + + printf("[OK]\tShadow stack pivot\n"); + return 0; +} + +int test_shstk_faults(void) +{ + unsigned long *shstk = create_shstk(0); + + /* Read shadow stack, test if it's zero to not get read optimized out */ + if (*shstk != 0) + goto err; + + /* Wrss memory that was already read. */ + write_shstk(shstk, 1); + if (*shstk != 1) + goto err; + + /* Page out memory, so we can wrss it again. */ + if (reset_shstk((void *)shstk)) + goto err; + + write_shstk(shstk, 1); + if (*shstk != 1) + goto err; + + printf("[OK]\tShadow stack faults\n"); + return 0; + +err: + return 1; +} + +unsigned long saved_ssp; +unsigned long saved_ssp_val; +volatile bool segv_triggered; + +void __attribute__((noinline)) violate_ss(void) +{ + saved_ssp = get_ssp(); + saved_ssp_val = *(unsigned long *)saved_ssp; + + /* Corrupt shadow stack */ + printf("[INFO]\tCorrupting shadow stack\n"); + write_shstk((void *)saved_ssp, 0); +} + +void segv_handler(int signum, siginfo_t *si, void *uc) +{ + printf("[INFO]\tGenerated shadow stack violation successfully\n"); + + segv_triggered = true; + + /* Fix shadow stack */ + write_shstk((void *)saved_ssp, saved_ssp_val); +} + +int test_shstk_violation(void) +{ + struct sigaction sa = {}; + + sa.sa_sigaction = segv_handler; + sa.sa_flags = SA_SIGINFO; + if (sigaction(SIGSEGV, &sa, NULL)) + return 1; + + segv_triggered = false; + + /* Make sure segv_triggered is set before violate_ss() */ + asm volatile("" : : : "memory"); + + violate_ss(); + + signal(SIGSEGV, SIG_DFL); + + printf("[OK]\tShadow stack violation test\n"); + + return !segv_triggered; +} + +/* Gup test state */ +#define MAGIC_VAL 0x12345678 +bool is_shstk_access; +void *shstk_ptr; +int fd; + +void reset_test_shstk(void *addr) +{ + if (shstk_ptr) + free_shstk(shstk_ptr); + shstk_ptr = create_shstk(addr); +} + +void test_access_fix_handler(int signum, siginfo_t *si, void *uc) +{ + printf("[INFO]\tViolation from %s\n", is_shstk_access ? "shstk access" : "normal write"); + + segv_triggered = true; + + /* Fix shadow stack */ + if (is_shstk_access) { + reset_test_shstk(shstk_ptr); + return; + } + + free_shstk(shstk_ptr); + create_normal_mem(shstk_ptr); +} + +bool test_shstk_access(void *ptr) +{ + is_shstk_access = true; + segv_triggered = false; + write_shstk(ptr, MAGIC_VAL); + + asm volatile("" : : : "memory"); + + return segv_triggered; +} + +bool test_write_access(void *ptr) +{ + is_shstk_access = false; + segv_triggered = false; + *(unsigned long *)ptr = MAGIC_VAL; + + asm volatile("" : : : "memory"); + + return segv_triggered; +} + +bool gup_write(void *ptr) +{ + unsigned long val; + + lseek(fd, (unsigned long)ptr, SEEK_SET); + if (write(fd, &val, sizeof(val)) < 0) + return 1; + + return 0; +} + +bool gup_read(void *ptr) +{ + unsigned long val; + + lseek(fd, (unsigned long)ptr, SEEK_SET); + if (read(fd, &val, sizeof(val)) < 0) + return 1; + + return 0; +} + +int test_gup(void) +{ + struct sigaction sa = {}; + int status; + pid_t pid; + + sa.sa_sigaction = test_access_fix_handler; + sa.sa_flags = SA_SIGINFO; + if (sigaction(SIGSEGV, &sa, NULL)) + return 1; + + segv_triggered = false; + + fd = open("/proc/self/mem", O_RDWR); + if (fd == -1) + return 1; + + reset_test_shstk(0); + if (gup_read(shstk_ptr)) + return 1; + if (test_shstk_access(shstk_ptr)) + return 1; + printf("[INFO]\tGup read -> shstk access success\n"); + + reset_test_shstk(0); + if (gup_write(shstk_ptr)) + return 1; + if (test_shstk_access(shstk_ptr)) + return 1; + printf("[INFO]\tGup write -> shstk access success\n"); + + reset_test_shstk(0); + if (gup_read(shstk_ptr)) + return 1; + if (!test_write_access(shstk_ptr)) + return 1; + printf("[INFO]\tGup read -> write access success\n"); + + reset_test_shstk(0); + if (gup_write(shstk_ptr)) + return 1; + if (!test_write_access(shstk_ptr)) + return 1; + printf("[INFO]\tGup write -> write access success\n"); + + close(fd); + + /* COW/gup test */ + reset_test_shstk(0); + pid = fork(); + if (!pid) { + fd = open("/proc/self/mem", O_RDWR); + if (fd == -1) + exit(1); + + if (gup_write(shstk_ptr)) { + close(fd); + exit(1); + } + close(fd); + exit(0); + } + waitpid(pid, &status, 0); + if (WEXITSTATUS(status)) { + printf("[FAIL]\tWrite in child failed\n"); + return 1; + } + if (*(unsigned long *)shstk_ptr == MAGIC_VAL) { + printf("[FAIL]\tWrite in child wrote through to shared memory\n"); + return 1; + } + + printf("[INFO]\tCow gup write -> write access success\n"); + + free_shstk(shstk_ptr); + + signal(SIGSEGV, SIG_DFL); + + printf("[OK]\tShadow gup test\n"); + + return 0; +} + +int test_mprotect(void) +{ + struct sigaction sa = {}; + + sa.sa_sigaction = test_access_fix_handler; + sa.sa_flags = SA_SIGINFO; + if (sigaction(SIGSEGV, &sa, NULL)) + return 1; + + segv_triggered = false; + + /* mprotect a shadow stack as read only */ + reset_test_shstk(0); + if (mprotect(shstk_ptr, SS_SIZE, PROT_READ) < 0) { + printf("[FAIL]\tmprotect(PROT_READ) failed\n"); + return 1; + } + + /* try to wrss it and fail */ + if (!test_shstk_access(shstk_ptr)) { + printf("[FAIL]\tShadow stack access to read-only memory succeeded\n"); + return 1; + } + + /* + * The shadow stack was reset above to resolve the fault, make the new one + * read-only. + */ + if (mprotect(shstk_ptr, SS_SIZE, PROT_READ) < 0) { + printf("[FAIL]\tmprotect(PROT_READ) failed\n"); + return 1; + } + + /* then back to writable */ + if (mprotect(shstk_ptr, SS_SIZE, PROT_WRITE | PROT_READ) < 0) { + printf("[FAIL]\tmprotect(PROT_WRITE) failed\n"); + return 1; + } + + /* then wrss to it and succeed */ + if (test_shstk_access(shstk_ptr)) { + printf("[FAIL]\tShadow stack access to mprotect() writable memory failed\n"); + return 1; + } + + free_shstk(shstk_ptr); + + signal(SIGSEGV, SIG_DFL); + + printf("[OK]\tmprotect() test\n"); + + return 0; +} + +char zero[4096]; + +static void *uffd_thread(void *arg) +{ + struct uffdio_copy req; + int uffd = *(int *)arg; + struct uffd_msg msg; + int ret; + + while (1) { + ret = read(uffd, &msg, sizeof(msg)); + if (ret > 0) + break; + else if (errno == EAGAIN) + continue; + return (void *)1; + } + + req.dst = msg.arg.pagefault.address; + req.src = (__u64)zero; + req.len = 4096; + req.mode = 0; + + if (ioctl(uffd, UFFDIO_COPY, &req)) + return (void *)1; + + return (void *)0; +} + +int test_userfaultfd(void) +{ + struct uffdio_register uffdio_register; + struct uffdio_api uffdio_api; + struct sigaction sa = {}; + pthread_t thread; + void *res; + int uffd; + + sa.sa_sigaction = test_access_fix_handler; + sa.sa_flags = SA_SIGINFO; + if (sigaction(SIGSEGV, &sa, NULL)) + return 1; + + uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK); + if (uffd < 0) { + printf("[SKIP]\tUserfaultfd unavailable.\n"); + return 0; + } + + reset_test_shstk(0); + + uffdio_api.api = UFFD_API; + uffdio_api.features = 0; + if (ioctl(uffd, UFFDIO_API, &uffdio_api)) + goto err; + + uffdio_register.range.start = (__u64)shstk_ptr; + uffdio_register.range.len = 4096; + uffdio_register.mode = UFFDIO_REGISTER_MODE_MISSING; + if (ioctl(uffd, UFFDIO_REGISTER, &uffdio_register)) + goto err; + + if (pthread_create(&thread, NULL, &uffd_thread, &uffd)) + goto err; + + reset_shstk(shstk_ptr); + test_shstk_access(shstk_ptr); + + if (pthread_join(thread, &res)) + goto err; + + if (test_shstk_access(shstk_ptr)) + goto err; + + free_shstk(shstk_ptr); + + signal(SIGSEGV, SIG_DFL); + + if (!res) + printf("[OK]\tUserfaultfd test\n"); + return !!res; +err: + free_shstk(shstk_ptr); + close(uffd); + signal(SIGSEGV, SIG_DFL); + return 1; +} + +/* Simple linked list for keeping track of mappings in test_guard_gap() */ +struct node { + struct node *next; + void *mapping; +}; + +/* + * This tests whether mmap will place other mappings in a shadow stack's guard + * gap. The steps are: + * 1. Finds an empty place by mapping and unmapping something. + * 2. Map a shadow stack in the middle of the known empty area. + * 3. Map a bunch of PAGE_SIZE mappings. These will use the search down + * direction, filling any gaps until it encounters the shadow stack's + * guard gap. + * 4. When a mapping lands below the shadow stack from step 2, then all + * of the above gaps are filled. The search down algorithm will have + * looked at the shadow stack gaps. + * 5. See if it landed in the gap. + */ +int test_guard_gap(void) +{ + void *free_area, *shstk, *test_map = (void *)0xFFFFFFFFFFFFFFFF; + struct node *head = NULL, *cur; + + free_area = mmap(0, SS_SIZE * 3, PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); + munmap(free_area, SS_SIZE * 3); + + shstk = create_shstk(free_area + SS_SIZE); + if (shstk == MAP_FAILED) + return 1; + + while (test_map > shstk) { + test_map = mmap(0, PAGE_SIZE, PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); + if (test_map == MAP_FAILED) + return 1; + cur = malloc(sizeof(*cur)); + cur->mapping = test_map; + + cur->next = head; + head = cur; + } + + while (head) { + cur = head; + head = cur->next; + munmap(cur->mapping, PAGE_SIZE); + free(cur); + } + + free_shstk(shstk); + + if (shstk - test_map - PAGE_SIZE != PAGE_SIZE) + return 1; + + printf("[OK]\tGuard gap test\n"); + + return 0; +} + +/* + * Too complicated to pull it out of the 32 bit header, but also get the + * 64 bit one needed above. Just define a copy here. + */ +#define __NR_compat_sigaction 67 + +/* + * Call 32 bit signal handler to get 32 bit signals ABI. Make sure + * to push the registers that will get clobbered. + */ +int sigaction32(int signum, const struct sigaction *restrict act, + struct sigaction *restrict oldact) +{ + register long syscall_reg asm("eax") = __NR_compat_sigaction; + register long signum_reg asm("ebx") = signum; + register long act_reg asm("ecx") = (long)act; + register long oldact_reg asm("edx") = (long)oldact; + int ret = 0; + + asm volatile ("int $0x80;" + : "=a"(ret), "=m"(oldact) + : "r"(syscall_reg), "r"(signum_reg), "r"(act_reg), + "r"(oldact_reg) + : "r8", "r9", "r10", "r11" + ); + + return ret; +} + +sigjmp_buf jmp_buffer; + +void segv_gp_handler(int signum, siginfo_t *si, void *uc) +{ + segv_triggered = true; + + /* + * To work with old glibc, this can't rely on siglongjmp working with + * shadow stack enabled, so disable shadow stack before siglongjmp(). + */ + ARCH_PRCTL(ARCH_SHSTK_DISABLE, ARCH_SHSTK_SHSTK); + siglongjmp(jmp_buffer, -1); +} + +/* + * Transition to 32 bit mode and check that a #GP triggers a segfault. + */ +int test_32bit(void) +{ + struct sigaction sa = {}; + struct sigaction *sa32; + + /* Create sigaction in 32 bit address range */ + sa32 = mmap(0, 4096, PROT_READ | PROT_WRITE, + MAP_32BIT | MAP_PRIVATE | MAP_ANONYMOUS, 0, 0); + sa32->sa_flags = SA_SIGINFO; + + sa.sa_sigaction = segv_gp_handler; + sa.sa_flags = SA_SIGINFO; + if (sigaction(SIGSEGV, &sa, NULL)) + return 1; + + + segv_triggered = false; + + /* Make sure segv_triggered is set before triggering the #GP */ + asm volatile("" : : : "memory"); + + /* + * Set handler to somewhere in 32 bit address space + */ + sa32->sa_handler = (void *)sa32; + if (sigaction32(SIGUSR1, sa32, NULL)) + return 1; + + if (!sigsetjmp(jmp_buffer, 1)) + raise(SIGUSR1); + + if (segv_triggered) + printf("[OK]\t32 bit test\n"); + + return !segv_triggered; +} + +void segv_handler_ptrace(int signum, siginfo_t *si, void *uc) +{ + /* The SSP adjustment caused a segfault. */ + exit(0); +} + +int test_ptrace(void) +{ + unsigned long saved_ssp, ssp = 0; + struct sigaction sa= {}; + struct iovec iov; + int status; + int pid; + + iov.iov_base = &ssp; + iov.iov_len = sizeof(ssp); + + pid = fork(); + if (!pid) { + ssp = get_ssp(); + + sa.sa_sigaction = segv_handler_ptrace; + sa.sa_flags = SA_SIGINFO; + if (sigaction(SIGSEGV, &sa, NULL)) + return 1; + + ptrace(PTRACE_TRACEME, NULL, NULL, NULL); + /* + * The parent will tweak the SSP and return from this function + * will #CP. + */ + raise(SIGTRAP); + + exit(1); + } + + while (waitpid(pid, &status, 0) != -1 && WSTOPSIG(status) != SIGTRAP); + + if (ptrace(PTRACE_GETREGSET, pid, NT_X86_SHSTK, &iov)) { + printf("[INFO]\tFailed to PTRACE_GETREGS\n"); + goto out_kill; + } + + if (!ssp) { + printf("[INFO]\tPtrace child SSP was 0\n"); + goto out_kill; + } + + saved_ssp = ssp; + + iov.iov_len = 0; + if (!ptrace(PTRACE_SETREGSET, pid, NT_X86_SHSTK, &iov)) { + printf("[INFO]\tToo small size accepted via PTRACE_SETREGS\n"); + goto out_kill; + } + + iov.iov_len = sizeof(ssp) + 1; + if (!ptrace(PTRACE_SETREGSET, pid, NT_X86_SHSTK, &iov)) { + printf("[INFO]\tToo large size accepted via PTRACE_SETREGS\n"); + goto out_kill; + } + + ssp += 1; + if (!ptrace(PTRACE_SETREGSET, pid, NT_X86_SHSTK, &iov)) { + printf("[INFO]\tUnaligned SSP written via PTRACE_SETREGS\n"); + goto out_kill; + } + + ssp = 0xFFFFFFFFFFFF0000; + if (!ptrace(PTRACE_SETREGSET, pid, NT_X86_SHSTK, &iov)) { + printf("[INFO]\tKernel range SSP written via PTRACE_SETREGS\n"); + goto out_kill; + } + + /* + * Tweak the SSP so the child with #CP when it resumes and returns + * from raise() + */ + ssp = saved_ssp + 8; + iov.iov_len = sizeof(ssp); + if (ptrace(PTRACE_SETREGSET, pid, NT_X86_SHSTK, &iov)) { + printf("[INFO]\tFailed to PTRACE_SETREGS\n"); + goto out_kill; + } + + if (ptrace(PTRACE_DETACH, pid, NULL, NULL)) { + printf("[INFO]\tFailed to PTRACE_DETACH\n"); + goto out_kill; + } + + waitpid(pid, &status, 0); + if (WEXITSTATUS(status)) + return 1; + + printf("[OK]\tPtrace test\n"); + return 0; + +out_kill: + kill(pid, SIGKILL); + return 1; +} + +int main(int argc, char *argv[]) +{ + int ret = 0; + + if (ARCH_PRCTL(ARCH_SHSTK_ENABLE, ARCH_SHSTK_SHSTK)) { + printf("[SKIP]\tCould not enable Shadow stack\n"); + return 1; + } + + if (ARCH_PRCTL(ARCH_SHSTK_DISABLE, ARCH_SHSTK_SHSTK)) { + ret = 1; + printf("[FAIL]\tDisabling shadow stack failed\n"); + } + + if (ARCH_PRCTL(ARCH_SHSTK_ENABLE, ARCH_SHSTK_SHSTK)) { + printf("[SKIP]\tCould not re-enable Shadow stack\n"); + return 1; + } + + if (ARCH_PRCTL(ARCH_SHSTK_ENABLE, ARCH_SHSTK_WRSS)) { + printf("[SKIP]\tCould not enable WRSS\n"); + ret = 1; + goto out; + } + + /* Should have succeeded if here, but this is a test, so double check. */ + if (!get_ssp()) { + printf("[FAIL]\tShadow stack disabled\n"); + return 1; + } + + if (test_shstk_pivot()) { + ret = 1; + printf("[FAIL]\tShadow stack pivot\n"); + goto out; + } + + if (test_shstk_faults()) { + ret = 1; + printf("[FAIL]\tShadow stack fault test\n"); + goto out; + } + + if (test_shstk_violation()) { + ret = 1; + printf("[FAIL]\tShadow stack violation test\n"); + goto out; + } + + if (test_gup()) { + ret = 1; + printf("[FAIL]\tShadow shadow stack gup\n"); + goto out; + } + + if (test_mprotect()) { + ret = 1; + printf("[FAIL]\tShadow shadow mprotect test\n"); + goto out; + } + + if (test_userfaultfd()) { + ret = 1; + printf("[FAIL]\tUserfaultfd test\n"); + goto out; + } + + if (test_guard_gap()) { + ret = 1; + printf("[FAIL]\tGuard gap test\n"); + goto out; + } + + if (test_ptrace()) { + ret = 1; + printf("[FAIL]\tptrace test\n"); + } + + if (test_32bit()) { + ret = 1; + printf("[FAIL]\t32 bit test\n"); + goto out; + } + + return ret; + +out: + /* + * Disable shadow stack before the function returns, or there will be a + * shadow stack violation. + */ + if (ARCH_PRCTL(ARCH_SHSTK_DISABLE, ARCH_SHSTK_SHSTK)) { + ret = 1; + printf("[FAIL]\tDisabling shadow stack failed\n"); + } + + return ret; +} +#endif From patchwork Tue Jun 13 00:11:06 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 106990 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp222715vqr; Mon, 12 Jun 2023 17:45:53 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ52zhp1LTH9+pv3EMWF5xWMzxX1I4Hu2tT3JwtkBdT30M7M1ubdXC7VlTjGCusjLt38eoRD X-Received: by 2002:aa7:d58c:0:b0:514:9c09:3c96 with SMTP id r12-20020aa7d58c000000b005149c093c96mr6445655edq.15.1686617152287; Mon, 12 Jun 2023 17:45:52 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1686617152; cv=none; d=google.com; s=arc-20160816; b=vsTrDDe/Yro8ZV9/mNYGWiTXd+Q6sB0+bfnFkwo+j0TR0iaNhGkrx5UmA0c9V+SYCs ujcUK6PNYm6nnemSKPnkAdd9INyTL+Pj7QZZmFHQh5/aXdnZ8hGpYMeIR5RljUMVPzsW o/VB7/JEbNBD9Bhb6vSxH6C2MCtL93AIvZ7mHJr0VRULFzpgcdR6VvLNYqKr39YmZfSU rVy5T6F+imdNlgEo5pZekAZOZ+QOcO05VCIZXKLAv1Id3e74Y6VZvEWDBqZNuY2/v9V2 +/NdeSF6cjOJZWssC7pqN7AnZ21ZF9WLXUXB7WKgitCKA7A9H/fm4sfcZ4XxM7w36QIf YZKw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=ym1IUf+WN11sfR5WNNfTjB6hlFyLrR1ZQd5U/3cGxxM=; b=lRlpyJ+YF4zvPj89X8dnfTdVQZeJOuj72ll4lFynCoCljgnc/uiUSL7GIgev7B/CSE 7+UB6ERkzWkZuFgF6EgCe+f/LKosqNiR858flhMq/s4jn5uTai3thkaGA/BpunBarotp 81k5D7kPbIP2JOWPJML/V0VscVo6uwVr2EKc/NZ/0IobkeBaWIv9djn6KoRmDEz9LqKp g26Ck7I8rfmM1rTjEREsVqaoBae6eswO8LBXPpI/jyrglxlToa+Sk9vcIfVN/eXJBFl+ G5TS87lz3UQhyInpx+xFMPu2a7SviqaO8vvu6F7kRqqMXlUdf7du+y+rvCYZf7AvRdax qxSQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=JM13llmt; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id h24-20020a50ed98000000b005187254fc4bsi352250edr.125.2023.06.12.17.45.27; Mon, 12 Jun 2023 17:45:52 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=JM13llmt; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239627AbjFMATW (ORCPT + 99 others); Mon, 12 Jun 2023 20:19:22 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56462 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239116AbjFMAR5 (ORCPT ); Mon, 12 Jun 2023 20:17:57 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B7AB319BF; Mon, 12 Jun 2023 17:14:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615261; x=1718151261; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=YlC9UAklwmTYEYA/IeAnmisSABN50M2A88ovR11LHkA=; b=JM13llmtFLVqxpUU/TalEJO7CymFGuC8WreiN5kmRlThp6PBegZsDYoM NtaYi1vmDEJfgdYPCW+DJJzuFprnxjxPxX4yXQoewOVyUSNJKjFBQzS+U o3MClS5pvD+uBJYfAUVtZcnaJgvbpRuKVxk0faqWbqT16vbNfU2SJUdYg dDR56quH1gYbl1nO9saYAy0F3Gmv9j2HKntdjY7EVuvnLPV9xOOD8hniw PLg8WwDjxfRWk1OA4OJk7uirl5upAnu6NHv2wh2VerHZMhJ3Odt9mhOzi evljsAEjMJL4uafiLMEIVdujI7iTPKhHdkFZHDnmQbVDLp12FAO/I3QD2 A==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361557610" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361557610" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:43 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835671179" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835671179" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:42 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu , Pengfei Xu Subject: [PATCH v9 40/42] x86: Add PTRACE interface for shadow stack Date: Mon, 12 Jun 2023 17:11:06 -0700 Message-Id: <20230613001108.3040476-41-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1768546266776261869?= X-GMAIL-MSGID: =?utf-8?q?1768546266776261869?= Some applications (like GDB) would like to tweak shadow stack state via ptrace. This allows for existing functionality to continue to work for seized shadow stack applications. Provide a regset interface for manipulating the shadow stack pointer (SSP). There is already ptrace functionality for accessing xstate, but this does not include supervisor xfeatures. So there is not a completely clear place for where to put the shadow stack state. Adding it to the user xfeatures regset would complicate that code, as it currently shares logic with signals which should not have supervisor features. Don't add a general supervisor xfeature regset like the user one, because it is better to maintain flexibility for other supervisor xfeatures to define their own interface. For example, an xfeature may decide not to expose all of it's state to userspace, as is actually the case for shadow stack ptrace functionality. A lot of enum values remain to be used, so just put it in dedicated shadow stack regset. The only downside to not having a generic supervisor xfeature regset, is that apps need to be enlightened of any new supervisor xfeature exposed this way (i.e. they can't try to have generic save/restore logic). But maybe that is a good thing, because they have to think through each new xfeature instead of encountering issues when a new supervisor xfeature was added. By adding a shadow stack regset, it also has the effect of including the shadow stack state in a core dump, which could be useful for debugging. The shadow stack specific xstate includes the SSP, and the shadow stack and WRSS enablement status. Enabling shadow stack or WRSS in the kernel involves more than just flipping the bit. The kernel is made aware that it has to do extra things when cloning or handling signals. That logic is triggered off of separate feature enablement state kept in the task struct. So the flipping on HW shadow stack enforcement without notifying the kernel to change its behavior would severely limit what an application could do without crashing, and the results would depend on kernel internal implementation details. There is also no known use for controlling this state via ptrace today. So only expose the SSP, which is something that userspace already has indirect control over. Co-developed-by: Yu-cheng Yu Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- v9: - Squash "Enforce only whole copies for ssp_set()" fix that previously was in tip. --- arch/x86/include/asm/fpu/regset.h | 7 +-- arch/x86/kernel/fpu/regset.c | 81 +++++++++++++++++++++++++++++++ arch/x86/kernel/ptrace.c | 12 +++++ include/uapi/linux/elf.h | 2 + 4 files changed, 99 insertions(+), 3 deletions(-) diff --git a/arch/x86/include/asm/fpu/regset.h b/arch/x86/include/asm/fpu/regset.h index 4f928d6a367b..697b77e96025 100644 --- a/arch/x86/include/asm/fpu/regset.h +++ b/arch/x86/include/asm/fpu/regset.h @@ -7,11 +7,12 @@ #include -extern user_regset_active_fn regset_fpregs_active, regset_xregset_fpregs_active; +extern user_regset_active_fn regset_fpregs_active, regset_xregset_fpregs_active, + ssp_active; extern user_regset_get2_fn fpregs_get, xfpregs_get, fpregs_soft_get, - xstateregs_get; + xstateregs_get, ssp_get; extern user_regset_set_fn fpregs_set, xfpregs_set, fpregs_soft_set, - xstateregs_set; + xstateregs_set, ssp_set; /* * xstateregs_active == regset_fpregs_active. Please refer to the comment diff --git a/arch/x86/kernel/fpu/regset.c b/arch/x86/kernel/fpu/regset.c index 6d056b68f4ed..6bc1eb2a21bd 100644 --- a/arch/x86/kernel/fpu/regset.c +++ b/arch/x86/kernel/fpu/regset.c @@ -8,6 +8,7 @@ #include #include #include +#include #include "context.h" #include "internal.h" @@ -174,6 +175,86 @@ int xstateregs_set(struct task_struct *target, const struct user_regset *regset, return ret; } +#ifdef CONFIG_X86_USER_SHADOW_STACK +int ssp_active(struct task_struct *target, const struct user_regset *regset) +{ + if (target->thread.features & ARCH_SHSTK_SHSTK) + return regset->n; + + return 0; +} + +int ssp_get(struct task_struct *target, const struct user_regset *regset, + struct membuf to) +{ + struct fpu *fpu = &target->thread.fpu; + struct cet_user_state *cetregs; + + if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) + return -ENODEV; + + sync_fpstate(fpu); + cetregs = get_xsave_addr(&fpu->fpstate->regs.xsave, XFEATURE_CET_USER); + if (WARN_ON(!cetregs)) { + /* + * This shouldn't ever be NULL because shadow stack was + * verified to be enabled above. This means + * MSR_IA32_U_CET.CET_SHSTK_EN should be 1 and so + * XFEATURE_CET_USER should not be in the init state. + */ + return -ENODEV; + } + + return membuf_write(&to, (unsigned long *)&cetregs->user_ssp, + sizeof(cetregs->user_ssp)); +} + +int ssp_set(struct task_struct *target, const struct user_regset *regset, + unsigned int pos, unsigned int count, + const void *kbuf, const void __user *ubuf) +{ + struct fpu *fpu = &target->thread.fpu; + struct xregs_state *xsave = &fpu->fpstate->regs.xsave; + struct cet_user_state *cetregs; + unsigned long user_ssp; + int r; + + if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK) || + !ssp_active(target, regset)) + return -ENODEV; + + if (pos != 0 || count != sizeof(user_ssp)) + return -EINVAL; + + r = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &user_ssp, 0, -1); + if (r) + return r; + + /* + * Some kernel instructions (IRET, etc) can cause exceptions in the case + * of disallowed CET register values. Just prevent invalid values. + */ + if (user_ssp >= TASK_SIZE_MAX || !IS_ALIGNED(user_ssp, 8)) + return -EINVAL; + + fpu_force_restore(fpu); + + cetregs = get_xsave_addr(xsave, XFEATURE_CET_USER); + if (WARN_ON(!cetregs)) { + /* + * This shouldn't ever be NULL because shadow stack was + * verified to be enabled above. This means + * MSR_IA32_U_CET.CET_SHSTK_EN should be 1 and so + * XFEATURE_CET_USER should not be in the init state. + */ + return -ENODEV; + } + + cetregs->user_ssp = user_ssp; + return 0; +} +#endif /* CONFIG_X86_USER_SHADOW_STACK */ + #if defined CONFIG_X86_32 || defined CONFIG_IA32_EMULATION /* diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c index dfaa270a7cc9..095f04bdabdc 100644 --- a/arch/x86/kernel/ptrace.c +++ b/arch/x86/kernel/ptrace.c @@ -58,6 +58,7 @@ enum x86_regset_64 { REGSET64_FP, REGSET64_IOPERM, REGSET64_XSTATE, + REGSET64_SSP, }; #define REGSET_GENERAL \ @@ -1267,6 +1268,17 @@ static struct user_regset x86_64_regsets[] __ro_after_init = { .active = ioperm_active, .regset_get = ioperm_get }, +#ifdef CONFIG_X86_USER_SHADOW_STACK + [REGSET64_SSP] = { + .core_note_type = NT_X86_SHSTK, + .n = 1, + .size = sizeof(u64), + .align = sizeof(u64), + .active = ssp_active, + .regset_get = ssp_get, + .set = ssp_set + }, +#endif }; static const struct user_regset_view user_x86_64_view = { diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h index ac3da855fb19..fa1ceeae2596 100644 --- a/include/uapi/linux/elf.h +++ b/include/uapi/linux/elf.h @@ -406,6 +406,8 @@ typedef struct elf64_shdr { #define NT_386_TLS 0x200 /* i386 TLS slots (struct user_desc) */ #define NT_386_IOPERM 0x201 /* x86 io permission bitmap (1=deny) */ #define NT_X86_XSTATE 0x202 /* x86 extended state using xsave */ +/* Old binutils treats 0x203 as a CET state */ +#define NT_X86_SHSTK 0x204 /* x86 SHSTK state */ #define NT_S390_HIGH_GPRS 0x300 /* s390 upper register halves */ #define NT_S390_TIMER 0x301 /* s390 timer register */ #define NT_S390_TODCMP 0x302 /* s390 TOD clock comparator register */ From patchwork Tue Jun 13 00:11:07 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 106974 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp221174vqr; Mon, 12 Jun 2023 17:41:57 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ7bw4IFMqYaPoeTgJ9WJ3xO15K0bPWXfSI94LUWcxj3+TwLYunAmqtv1uMqGUrR4QJZAuEn X-Received: by 2002:a17:907:9802:b0:971:fa86:27e with SMTP id ji2-20020a170907980200b00971fa86027emr13248351ejc.16.1686616917674; Mon, 12 Jun 2023 17:41:57 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1686616917; cv=none; d=google.com; s=arc-20160816; b=atnNaF7OUXFw1MMXWBkynG+AOVVN7YacwbY6xFLC2z27S/j6oeezRGKXcijsONhLLI 2ctDHrXwS0/25yjNE7O6OQZ6EIs4p/uawbKcxG+AnP59y1rihJC4fsarSQhRtxJSx5TL 29Tl8Jok2j90a92uNqfUXiBbaavQt5Nhm8tv4JJnVBxc3O5rA2Y/vcXeEgHvDrBcPdcn yVl7jhjXOPJ0s0nNYCU45LxvTVDFsMeSUgtA9NESg/p9lwdao2w0mIs4Dr3o12eIjvDA Tkqerevoqq2Fc6QT8N0A5pTApahRrZBqhKKQfD7rT0iuOdmGCd6wyQ5Fkpy1S0jXxYIJ f1Rg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=RkSEKvOu5kgy5PUfrb3PfcTMpS5rTzAByk+WnpSbRuQ=; b=Hcaunhcvxx5bcGgW9nFXXzFqwAU/Dt9xtyw1pvQKMe4gqSfs/FK24v2meUXIqMOgXb kWNBa/Fp3lt1hsATPr3Ij70e/s7FtvhZT02RnlSjLXaF4I/liVF926/wizQbzlqkzgVW h56iJKWFTTkTK/Mf78OQA2RdALqWtERXUixSFpRY34pygT377EqcS8V8EQJya4lsTDh9 2BHOtSV/FWnbeuPKJsbBDEiNmQBazaRnCp0hzNzcfmaEc84LpuofZUeO5HHjMJc1wVea Y+h2AhPqNw7ejd4gaLyFRKpyqYPiarV8xXzE4+eMSTcC1WFrYlPvASDJkVhMz1lqZNzd t4KQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=MqwCNVj5; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id l4-20020a170906644400b0094f442c8a6dsi3305360ejn.291.2023.06.12.17.41.33; Mon, 12 Jun 2023 17:41:57 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=MqwCNVj5; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238958AbjFMAUo (ORCPT + 99 others); Mon, 12 Jun 2023 20:20:44 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53332 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239426AbjFMAS3 (ORCPT ); Mon, 12 Jun 2023 20:18:29 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2D3A53C15; Mon, 12 Jun 2023 17:14:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615272; x=1718151272; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=hFHVnW9lxN9HUD3QIC9fY4x2IPOgeGJUGkzddQmtaZw=; b=MqwCNVj5T/NU7ZoDUQmmzV1sTTUzvs7OWVUAWXwB+/0uMoVL6Xyjoj/8 u/vgj+yaLVKVIBWmIOeXda8u7r+GW1RDCCrouC0JN8b95OK7/xKlzQz6l gpWjxA/uSlMNLmJf42V4ehtdtwZRFLapnariIo3Nm/Velx9WLADXonBsh WfIN4reP9wjmitmF/RUCzkFceAZJd0i5YjBBnaDZLZMe22W3AB9P2tyUl mGbF5cnFuVUF1c0SVwP2LWySl6yiMM+Iu5CFpxFpxZKGoaUyrVXxDKMsS 0+cMsfuA1Bh9wnPTbGkCmlpJGh0HhJvS9SSIZt3+CJcdIqdwpYczATJop A==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361557635" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361557635" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:44 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835671183" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835671183" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:43 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Mike Rapoport , Pengfei Xu Subject: [PATCH v9 41/42] x86/shstk: Add ARCH_SHSTK_UNLOCK Date: Mon, 12 Jun 2023 17:11:07 -0700 Message-Id: <20230613001108.3040476-42-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1768546020903882308?= X-GMAIL-MSGID: =?utf-8?q?1768546020903882308?= From: Mike Rapoport Userspace loaders may lock features before a CRIU restore operation has the chance to set them to whatever state is required by the process being restored. Allow a way for CRIU to unlock features. Add it as an arch_prctl() like the other shadow stack operations, but restrict it being called by the ptrace arch_pctl() interface. [Merged into recent API changes, added commit log and docs] Signed-off-by: Mike Rapoport Signed-off-by: Rick Edgecombe Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- Documentation/arch/x86/shstk.rst | 4 ++++ arch/x86/include/uapi/asm/prctl.h | 1 + arch/x86/kernel/process_64.c | 1 + arch/x86/kernel/shstk.c | 9 +++++++-- 4 files changed, 13 insertions(+), 2 deletions(-) diff --git a/Documentation/arch/x86/shstk.rst b/Documentation/arch/x86/shstk.rst index f09afa504ec0..f3553cc8c758 100644 --- a/Documentation/arch/x86/shstk.rst +++ b/Documentation/arch/x86/shstk.rst @@ -75,6 +75,10 @@ arch_prctl(ARCH_SHSTK_LOCK, unsigned long features) are ignored. The mask is ORed with the existing value. So any feature bits set here cannot be enabled or disabled afterwards. +arch_prctl(ARCH_SHSTK_UNLOCK, unsigned long features) + Unlock features. 'features' is a mask of all features to unlock. All + bits set are processed, unset bits are ignored. Only works via ptrace. + The return values are as follows. On success, return 0. On error, errno can be:: diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h index eedfde3b63be..3189c4a96468 100644 --- a/arch/x86/include/uapi/asm/prctl.h +++ b/arch/x86/include/uapi/asm/prctl.h @@ -33,6 +33,7 @@ #define ARCH_SHSTK_ENABLE 0x5001 #define ARCH_SHSTK_DISABLE 0x5002 #define ARCH_SHSTK_LOCK 0x5003 +#define ARCH_SHSTK_UNLOCK 0x5004 /* ARCH_SHSTK_ features bits */ #define ARCH_SHSTK_SHSTK (1ULL << 0) diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c index 0f89aa0186d1..e6db21c470aa 100644 --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -899,6 +899,7 @@ long do_arch_prctl_64(struct task_struct *task, int option, unsigned long arg2) case ARCH_SHSTK_ENABLE: case ARCH_SHSTK_DISABLE: case ARCH_SHSTK_LOCK: + case ARCH_SHSTK_UNLOCK: return shstk_prctl(task, option, arg2); default: ret = -EINVAL; diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c index d723cdc93474..d43b7a9c57ce 100644 --- a/arch/x86/kernel/shstk.c +++ b/arch/x86/kernel/shstk.c @@ -489,9 +489,14 @@ long shstk_prctl(struct task_struct *task, int option, unsigned long features) return 0; } - /* Don't allow via ptrace */ - if (task != current) + /* Only allow via ptrace */ + if (task != current) { + if (option == ARCH_SHSTK_UNLOCK && IS_ENABLED(CONFIG_CHECKPOINT_RESTORE)) { + task->thread.features_locked &= ~features; + return 0; + } return -EINVAL; + } /* Do not allow to change locked features */ if (features & task->thread.features_locked) From patchwork Tue Jun 13 00:11:08 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 107007 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp233691vqr; Mon, 12 Jun 2023 18:13:59 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ6b3FOw+y9GH9Po/SlCTvPsLlLMlaxDpz26LB3V502nootb4bZyID4tMTfaTK1+c7BCEKeF X-Received: by 2002:a17:907:9412:b0:973:d1ce:dbe8 with SMTP id dk18-20020a170907941200b00973d1cedbe8mr10312725ejc.46.1686618839725; Mon, 12 Jun 2023 18:13:59 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1686618839; cv=none; d=google.com; s=arc-20160816; b=PctlNjbQpeP1z9spFGGIqDqcW8/d2rz9fdwFDZeUNx1TxENRIIN2TTNJ3XT+rOj9Gv /ZV04YFJWZUoTQDwMpCg+hzNwqdg1Mq0YegSvxKzmE/JGEl7z4JjRz/XQ6hmSOfXqlop 3YubrZ7xRWhQ8qZVH95cbITa3ZP3pZqCCA/9CAavt7ke3OuRE4RCB/67/LU6/EcNuGIq yWkD7PMTnjWjGoJMF5ozUHW2Re2HfibZLwQ4SJjRUbgjSSBX7uHeOGzhNfVHRAkPJBDr SQU3mfNt+ZSU3BZjUscGmou85+zGU27tZkEMJW27xVNK/QHrTgorzvHKbIq73Htb9Sv9 o3+A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=ID5oQBvY8UKE1ZCeO9J0p0kT0pVp+VavLYCmip95cII=; b=NGYQAdQY1e9ipeTEESljsz2L42NnqWUFehSf/gh4pIXB80c2c8adKoQxT6TzKclAr1 OrTSLyfw7WNFBZZ/PvRFHP6sYIqyzcCJceBPeK0gWt63f5Aew9WAQS3/1nCHD+H0mHX5 HwS7fmAH93iu0cpLYqbiYDmVBI5EyjnF35xBAaUy1Lt9u+k1SXIDK7Fod3ucX18wlwsT vL4iuncemgfG0E5xauplcz9S0CNYjpAmOXb7nHqJH6InRpiQl+pzyQNk1E5mcqzj7Qn2 nSw2ogxlxEEJ8x0XdyQFAyYzFPp8KzPgwqrEOtfY55v662OG0YXM9etEOVzoG9aACg9l voDA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=QQ9X1yvE; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id r26-20020aa7da1a000000b005149150f572si6955683eds.643.2023.06.12.18.13.36; Mon, 12 Jun 2023 18:13:59 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=QQ9X1yvE; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239116AbjFMAUq (ORCPT + 99 others); Mon, 12 Jun 2023 20:20:46 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56888 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239447AbjFMASa (ORCPT ); Mon, 12 Jun 2023 20:18:30 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1AC003C1B; Mon, 12 Jun 2023 17:14:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615273; x=1718151273; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=MIQ1pWvE/RoBE8pKawMdEBQM6Q57ao+KDpsn7QGopL8=; b=QQ9X1yvEzmGzuyIHxFih972HlZMFv/ydI62a38WGtKVXsRctmlFgC+kM x3p9Z6E/pTCwhwVTuvamy4j4G5YxtLsz00mec9bkGVisvezERjAE41ny4 LnBYVirsMrB0DN3IDKRm22i45rN9qc/Y2JUBNq/5hNTr6sNIeHfk9tykd Keztd6faDV/GqJ9qKUFLDM1Mw/KdMEmTpX+m6yqxoYYq8r6jceUZSSkzN Ym0hnGOxRLue5gmD9IRkNGRu6SecQmNMWoc70nxt8a748hJK03HgC4Elz kjamzrkpqYdlaas38sZCXKCHBKh1A7IJh7U3cIQXp/cvip05wXKS5fGPE A==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361557660" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361557660" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:44 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835671189" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835671189" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:43 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Pengfei Xu Subject: [PATCH v9 42/42] x86/shstk: Add ARCH_SHSTK_STATUS Date: Mon, 12 Jun 2023 17:11:08 -0700 Message-Id: <20230613001108.3040476-43-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1768548036769374752?= X-GMAIL-MSGID: =?utf-8?q?1768548036769374752?= CRIU and GDB need to get the current shadow stack and WRSS enablement status. This information is already available via /proc/pid/status, but this is inconvenient for CRIU because it involves parsing the text output in an area of the code where this is difficult. Provide a status arch_prctl(), ARCH_SHSTK_STATUS for retrieving the status. Have arg2 be a userspace address, and make the new arch_prctl simply copy the features out to userspace. Suggested-by: Mike Rapoport Signed-off-by: Rick Edgecombe Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- Documentation/arch/x86/shstk.rst | 6 ++++++ arch/x86/include/asm/shstk.h | 2 +- arch/x86/include/uapi/asm/prctl.h | 1 + arch/x86/kernel/process_64.c | 1 + arch/x86/kernel/shstk.c | 8 +++++++- 5 files changed, 16 insertions(+), 2 deletions(-) diff --git a/Documentation/arch/x86/shstk.rst b/Documentation/arch/x86/shstk.rst index f3553cc8c758..60260e809baf 100644 --- a/Documentation/arch/x86/shstk.rst +++ b/Documentation/arch/x86/shstk.rst @@ -79,6 +79,11 @@ arch_prctl(ARCH_SHSTK_UNLOCK, unsigned long features) Unlock features. 'features' is a mask of all features to unlock. All bits set are processed, unset bits are ignored. Only works via ptrace. +arch_prctl(ARCH_SHSTK_STATUS, unsigned long addr) + Copy the currently enabled features to the address passed in addr. The + features are described using the bits passed into the others in + 'features'. + The return values are as follows. On success, return 0. On error, errno can be:: @@ -86,6 +91,7 @@ be:: -ENOTSUPP if the feature is not supported by the hardware or kernel. -EINVAL arguments (non existing feature, etc) + -EFAULT if could not copy information back to userspace The feature's bits supported are:: diff --git a/arch/x86/include/asm/shstk.h b/arch/x86/include/asm/shstk.h index ecb23a8ca47d..42fee8959df7 100644 --- a/arch/x86/include/asm/shstk.h +++ b/arch/x86/include/asm/shstk.h @@ -14,7 +14,7 @@ struct thread_shstk { u64 size; }; -long shstk_prctl(struct task_struct *task, int option, unsigned long features); +long shstk_prctl(struct task_struct *task, int option, unsigned long arg2); void reset_thread_features(void); unsigned long shstk_alloc_thread_stack(struct task_struct *p, unsigned long clone_flags, unsigned long stack_size); diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h index 3189c4a96468..384e2cc6ac19 100644 --- a/arch/x86/include/uapi/asm/prctl.h +++ b/arch/x86/include/uapi/asm/prctl.h @@ -34,6 +34,7 @@ #define ARCH_SHSTK_DISABLE 0x5002 #define ARCH_SHSTK_LOCK 0x5003 #define ARCH_SHSTK_UNLOCK 0x5004 +#define ARCH_SHSTK_STATUS 0x5005 /* ARCH_SHSTK_ features bits */ #define ARCH_SHSTK_SHSTK (1ULL << 0) diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c index e6db21c470aa..33b268747bb7 100644 --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -900,6 +900,7 @@ long do_arch_prctl_64(struct task_struct *task, int option, unsigned long arg2) case ARCH_SHSTK_DISABLE: case ARCH_SHSTK_LOCK: case ARCH_SHSTK_UNLOCK: + case ARCH_SHSTK_STATUS: return shstk_prctl(task, option, arg2); default: ret = -EINVAL; diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c index d43b7a9c57ce..b26810c7cd1c 100644 --- a/arch/x86/kernel/shstk.c +++ b/arch/x86/kernel/shstk.c @@ -482,8 +482,14 @@ SYSCALL_DEFINE3(map_shadow_stack, unsigned long, addr, unsigned long, size, unsi return alloc_shstk(addr, aligned_size, size, set_tok); } -long shstk_prctl(struct task_struct *task, int option, unsigned long features) +long shstk_prctl(struct task_struct *task, int option, unsigned long arg2) { + unsigned long features = arg2; + + if (option == ARCH_SHSTK_STATUS) { + return put_user(task->thread.features, (unsigned long __user *)arg2); + } + if (option == ARCH_SHSTK_LOCK) { task->thread.features_locked |= features; return 0;