From patchwork Tue Jun 13 00:10:43 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 106991 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp222817vqr; Mon, 12 Jun 2023 17:46:10 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ7+XLJWwySSc+hreodANMiZCGhXq+FK1xLTtRDAzehQLISEaD9Ri0ugoNfC2dDDgNnVoLmo X-Received: by 2002:a17:907:9802:b0:971:fa86:27e with SMTP id ji2-20020a170907980200b00971fa86027emr13255660ejc.16.1686617149324; Mon, 12 Jun 2023 17:45:49 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1686617149; cv=none; d=google.com; s=arc-20160816; b=R5HwAKxaByll9ZQ+aaMsUxO30QBzXmy9ipGNjgy4TZrfRV1yToxo2oHRlZvWCwbcHj ZURwRHe7PO5ExmoME2RUCmYNiiNjOw7a1adTfX8ZqpTT4w7j1BNhxEF95L/96VMOLF6D 5e1QWI80+i/nBj9y1qbjNhfVXthHEEFdCMr5aaKipbqrPODUgDp7z8kmAQR0cUqmM98G pIbnvE9IdEmBoXCF8JSsyFQogbZJy4TYtjF2wy7EN/MpNAMpsRoSz7dApOzih6/uf5fY yBvd7IASofQwsqsbBm3kwTrXEbSebu0R3MgHsNhJ+h9CAnjl21EfSJDYOKAsK+o1wMII JJUg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=p4A9Ag3Rbq1a3tkkCFr8IkGFKL7/tKsugwm4Dh2bjeE=; b=Jm+GvO/mlEZRddHCfcvWhWteZKUr9s7um1N+rtZlYLgNW2QjOiQg3K9IyKi43JSEFk 8ZwozbTihtFc2cVTDR7bZEPWQYHHIOdZgKN70ZivUB7N6ze6CSJJHR3PSm465cwtWtg9 oto8Tk5F+/UH7ZoTXjHDSWgyPqcCztNNOVHt9cHXhJVHQ8h/kLkhs//saJ/qPZdncVfF 2FA6qvjj7tnNEWuojayh94epu4jraRcYmc3ahoIfKmZg5zZbAYw0gwzWLraF7eQo8kTE JnOQb5Yp2SghGaHux/uhGTNWDJsXquH1sJuS5SW3XpcrvVTpRH3OOK4mOwTFD5uN5cNU O3eA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=nKs6GDAf; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id ko6-20020a170907986600b0097461d96871si6471228ejc.968.2023.06.12.17.45.13; Mon, 12 Jun 2023 17:45:49 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=nKs6GDAf; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239128AbjFMAPH (ORCPT + 99 others); Mon, 12 Jun 2023 20:15:07 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52892 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239019AbjFMAN3 (ORCPT ); Mon, 12 Jun 2023 20:13:29 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AC2F91BD4; Mon, 12 Jun 2023 17:12:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615154; x=1718151154; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=dhZWE6YTdZ+FD9PI/U6uJC2MaFgGrtV9CcOxkJ3S4o4=; b=nKs6GDAfAmXL3lLEMiSLw4dNNNs7s5A0Stx5EeIE6X9n4SO805ioHHFN eFUI+pVLjs79BDazntQy/EVRVkoXCZsQXPey8yNjshgQxvR8UJTzwrbJO Ug4HNzoqjKRYyB2akgYKDOzPKJPxdvNEL6vMEdPTlKs/nv+4KxDu6IuxP aHrC41y37agZj2bNv7iNNls1mG0Bw2Xxtdi873dw8akSgzFl6dhDDUhWx jTikhY5vmUOvOyOPnYoOPRbSh+CagAtYlIdcB/vd3Sg46JvauC+3ycJYZ C30ZZJrDXSYoBb4IVhQs6I8z12z07fgXjHPw7LOplyOal1i72UTeoQ5kT g==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361557047" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361557047" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:22 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835671035" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835671035" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:21 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Pengfei Xu Subject: [PATCH v9 17/42] mm: Warn on shadow stack memory in wrong vma Date: Mon, 12 Jun 2023 17:10:43 -0700 Message-Id: <20230613001108.3040476-18-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1768546263620121819?= X-GMAIL-MSGID: =?utf-8?q?1768546263620121819?= The x86 Control-flow Enforcement Technology (CET) feature includes a new type of memory called shadow stack. This shadow stack memory has some unusual properties, which requires some core mm changes to function properly. One sharp edge is that PTEs that are both Write=0 and Dirty=1 are treated as shadow by the CPU, but this combination used to be created by the kernel on x86. Previous patches have changed the kernel to now avoid creating these PTEs unless they are for shadow stack memory. In case any missed corners of the kernel are still creating PTEs like this for non-shadow stack memory, and to catch any re-introductions of the logic, warn if any shadow stack PTEs (Write=0, Dirty=1) are found in non-shadow stack VMAs when they are being zapped. This won't catch transient cases but should have decent coverage. In order to check if a PTE is shadow stack in core mm code, add two arch breakouts arch_check_zapped_pte/pmd(). This will allow shadow stack specific code to be kept in arch/x86. Only do the check if shadow stack is supported by the CPU and configured because in rare cases older CPUs may write Dirty=1 to a Write=0 CPU on older CPUs. This check is handled in pte_shstk()/pmd_shstk(). Signed-off-by: Rick Edgecombe Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook Reviewed-by: Mark Brown --- v9: - Add comments about not doing the check on non-shadow stack CPUs --- arch/x86/include/asm/pgtable.h | 6 ++++++ arch/x86/mm/pgtable.c | 20 ++++++++++++++++++++ include/linux/pgtable.h | 14 ++++++++++++++ mm/huge_memory.c | 1 + mm/memory.c | 1 + 5 files changed, 42 insertions(+) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index d8724f5b1202..89cfa93d0ad6 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -1664,6 +1664,12 @@ static inline bool arch_has_hw_pte_young(void) return true; } +#define arch_check_zapped_pte arch_check_zapped_pte +void arch_check_zapped_pte(struct vm_area_struct *vma, pte_t pte); + +#define arch_check_zapped_pmd arch_check_zapped_pmd +void arch_check_zapped_pmd(struct vm_area_struct *vma, pmd_t pmd); + #ifdef CONFIG_XEN_PV #define arch_has_hw_nonleaf_pmd_young arch_has_hw_nonleaf_pmd_young static inline bool arch_has_hw_nonleaf_pmd_young(void) diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c index 0ad2c62ac0a8..101e721d74aa 100644 --- a/arch/x86/mm/pgtable.c +++ b/arch/x86/mm/pgtable.c @@ -894,3 +894,23 @@ pmd_t pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma) return pmd_clear_saveddirty(pmd); } + +void arch_check_zapped_pte(struct vm_area_struct *vma, pte_t pte) +{ + /* + * Hardware before shadow stack can (rarely) set Dirty=1 + * on a Write=0 PTE. So the below condition + * only indicates a software bug when shadow stack is + * supported by the HW. This checking is covered in + * pte_shstk(). + */ + VM_WARN_ON_ONCE(!(vma->vm_flags & VM_SHADOW_STACK) && + pte_shstk(pte)); +} + +void arch_check_zapped_pmd(struct vm_area_struct *vma, pmd_t pmd) +{ + /* See note in arch_check_zapped_pte() */ + VM_WARN_ON_ONCE(!(vma->vm_flags & VM_SHADOW_STACK) && + pmd_shstk(pmd)); +} diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 0f3cf726812a..feb1fd2c814f 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -291,6 +291,20 @@ static inline bool arch_has_hw_pte_young(void) } #endif +#ifndef arch_check_zapped_pte +static inline void arch_check_zapped_pte(struct vm_area_struct *vma, + pte_t pte) +{ +} +#endif + +#ifndef arch_check_zapped_pmd +static inline void arch_check_zapped_pmd(struct vm_area_struct *vma, + pmd_t pmd) +{ +} +#endif + #ifndef __HAVE_ARCH_PTEP_GET_AND_CLEAR static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long address, diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 37dd56b7b3d1..c3cc20c1b26c 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1681,6 +1681,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, */ orig_pmd = pmdp_huge_get_and_clear_full(vma, addr, pmd, tlb->fullmm); + arch_check_zapped_pmd(vma, orig_pmd); tlb_remove_pmd_tlb_entry(tlb, pmd, addr); if (vma_is_special_huge(vma)) { if (arch_needs_pgtable_deposit()) diff --git a/mm/memory.c b/mm/memory.c index c1b6fe944c20..40c0b233b61d 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1412,6 +1412,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb, continue; ptent = ptep_get_and_clear_full(mm, addr, pte, tlb->fullmm); + arch_check_zapped_pte(vma, ptent); tlb_remove_tlb_entry(tlb, pte, addr); zap_install_uffd_wp_if_needed(vma, addr, pte, details, ptent);