From patchwork Tue Jun 13 00:10:41 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 106963 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp211277vqr; Mon, 12 Jun 2023 17:15:07 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ4UQBngub9dXCaF8UKHM6pp5dJsQ26aBtmLJbQaNPTe1wHYyd9mXvVfmcrGKShvfVCAKLkr X-Received: by 2002:a17:907:6e88:b0:970:925:6564 with SMTP id sh8-20020a1709076e8800b0097009256564mr10065566ejc.35.1686615307204; Mon, 12 Jun 2023 17:15:07 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1686615307; cv=none; d=google.com; s=arc-20160816; b=kJECSUoKxpZeaqOIwg1rZlwklWs6u+K3qh3RPWtvyROtR/8Y4fF7WHdMx1crP1x/nT QswChU0XsNifjOzo12BEpmLADiK5nf7pFPMEMRxIU3/Rg4CyXlFMpd2wo7VQn1pZAWJa ykkNeO9xeQytbBETL0FurJ6f2LzkneZqjDSeFk5VMxixMGkWCyouRruHpsHTSHVSlcCx ao9MdtHoQ5Y2sJyGStCvyY8Ti9/d03aj47HuiCnGt8aRn5Y7llpLSIsc2nEjSJEjeRqW TKMwGegfB/Y/hOJTwm1R5Y3WXBCDArtJ15nPz6+OMJwjS4yrzRtRuFOwc8n4AtbWvz3F SZ3w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=5fI2fz3C60mSydVE+2mwy8UwGqfU5Pn7fjr7hidomLc=; b=Y6nBdMdFe3zsyLki5XkXhSxC4pthFqQDa5mhF2v8M2o99TCMSE+8gHLNlk5Ad15ppb FZZU+bRL1t7MHRH4loEaxsOvSpLoQ4XlFviw2fUxyhIVdlNTBQhDHwE/sFwn/yHuC0uQ byvdYRkk1AbJVGcpXUMWYUwhRrOeTqFjPtfg123GvhYOw+biAG3vJfqcvWCbjBb/aFNa G0mdhOCTxWvxUFnp44PxKOAWmZlG+fNz54ze35g0p5H5NIexAvJEn1NQ/WjPu0hKKBxJ UrPfBzpr2FuiJ5qq0LD8Qj4rDEYODoucT+j08FGjrCKSL5UOP/8jZ8fYeV9YSjKSBJOR U+lg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=MAkibPwx; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id qh1-20020a170906eca100b00965bb773354si5172848ejb.261.2023.06.12.17.14.41; Mon, 12 Jun 2023 17:15:07 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=MAkibPwx; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239014AbjFMAN0 (ORCPT + 99 others); Mon, 12 Jun 2023 20:13:26 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52162 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238870AbjFMAMf (ORCPT ); Mon, 12 Jun 2023 20:12:35 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CD5A81728; Mon, 12 Jun 2023 17:12:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615146; x=1718151146; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=VxsR8A3vgFZdU2Uya7vVGhJBClJyO0tAkA+BqmUO8Zs=; b=MAkibPwx6HjXO50QeclLoNDH6D2TGh73wITGile4ozU9mHcxnH+PheEm qOjOV8daMbTI/KuLiGHq4DTFRyRxFm676WzWhjn4w9iO9g845IDe9z26b fLqErGBXzqHHP/1PSPelFvMbtbKi/g/NPubKjam5QPWQAuZRmOJWDXiDi DjWXx/G850tGLUankXiSV3sx/AD+jF9uujteIqChg2rfcgOqef46XIszU jmuYjA/lcME4NaA6VS+8A3AViz/M9lJ7dOR1FP9CBYfFijfQsY8GfK1O3 lk4CwotlRYtaR+WONZpp2wd+jlQh8t2dx/ZYGlXPs7xRyo1an1mbNDzBZ g==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361556998" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361556998" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:21 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835671026" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835671026" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:20 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu , Pengfei Xu Subject: [PATCH v9 15/42] x86/mm: Check shadow stack page fault errors Date: Mon, 12 Jun 2023 17:10:41 -0700 Message-Id: <20230613001108.3040476-16-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1768544332269732588?= X-GMAIL-MSGID: =?utf-8?q?1768544332269732588?= The CPU performs "shadow stack accesses" when it expects to encounter shadow stack mappings. These accesses can be implicit (via CALL/RET instructions) or explicit (instructions like WRSS). Shadow stack accesses to shadow-stack mappings can result in faults in normal, valid operation just like regular accesses to regular mappings. Shadow stacks need some of the same features like delayed allocation, swap and copy-on-write. The kernel needs to use faults to implement those features. The architecture has concepts of both shadow stack reads and shadow stack writes. Any shadow stack access to non-shadow stack memory will generate a fault with the shadow stack error code bit set. This means that, unlike normal write protection, the fault handler needs to create a type of memory that can be written to (with instructions that generate shadow stack writes), even to fulfill a read access. So in the case of COW memory, the COW needs to take place even with a shadow stack read. Otherwise the page will be left (shadow stack) writable in userspace. So to trigger the appropriate behavior, set FAULT_FLAG_WRITE for shadow stack accesses, even if the access was a shadow stack read. For the purpose of making this clearer, consider the following example. If a process has a shadow stack, and forks, the shadow stack PTEs will become read-only due to COW. If the CPU in one process performs a shadow stack read access to the shadow stack, for example executing a RET and causing the CPU to read the shadow stack copy of the return address, then in order for the fault to be resolved the PTE will need to be set with shadow stack permissions. But then the memory would be changeable from userspace (from CALL, RET, WRSS, etc). So this scenario needs to trigger COW, otherwise the shared page would be changeable from both processes. Shadow stack accesses can also result in errors, such as when a shadow stack overflows, or if a shadow stack access occurs to a non-shadow-stack mapping. Also, generate the errors for invalid shadow stack accesses. Co-developed-by: Yu-cheng Yu Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- arch/x86/include/asm/trap_pf.h | 2 ++ arch/x86/mm/fault.c | 22 ++++++++++++++++++++++ 2 files changed, 24 insertions(+) diff --git a/arch/x86/include/asm/trap_pf.h b/arch/x86/include/asm/trap_pf.h index 10b1de500ab1..afa524325e55 100644 --- a/arch/x86/include/asm/trap_pf.h +++ b/arch/x86/include/asm/trap_pf.h @@ -11,6 +11,7 @@ * bit 3 == 1: use of reserved bit detected * bit 4 == 1: fault was an instruction fetch * bit 5 == 1: protection keys block access + * bit 6 == 1: shadow stack access fault * bit 15 == 1: SGX MMU page-fault */ enum x86_pf_error_code { @@ -20,6 +21,7 @@ enum x86_pf_error_code { X86_PF_RSVD = 1 << 3, X86_PF_INSTR = 1 << 4, X86_PF_PK = 1 << 5, + X86_PF_SHSTK = 1 << 6, X86_PF_SGX = 1 << 15, }; diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index e4399983c50c..fe68119ce2cc 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -1118,8 +1118,22 @@ access_error(unsigned long error_code, struct vm_area_struct *vma) (error_code & X86_PF_INSTR), foreign)) return 1; + /* + * Shadow stack accesses (PF_SHSTK=1) are only permitted to + * shadow stack VMAs. All other accesses result in an error. + */ + if (error_code & X86_PF_SHSTK) { + if (unlikely(!(vma->vm_flags & VM_SHADOW_STACK))) + return 1; + if (unlikely(!(vma->vm_flags & VM_WRITE))) + return 1; + return 0; + } + if (error_code & X86_PF_WRITE) { /* write, present and write, not present: */ + if (unlikely(vma->vm_flags & VM_SHADOW_STACK)) + return 1; if (unlikely(!(vma->vm_flags & VM_WRITE))) return 1; return 0; @@ -1311,6 +1325,14 @@ void do_user_addr_fault(struct pt_regs *regs, perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address); + /* + * Read-only permissions can not be expressed in shadow stack PTEs. + * Treat all shadow stack accesses as WRITE faults. This ensures + * that the MM will prepare everything (e.g., break COW) such that + * maybe_mkwrite() can create a proper shadow stack PTE. + */ + if (error_code & X86_PF_SHSTK) + flags |= FAULT_FLAG_WRITE; if (error_code & X86_PF_WRITE) flags |= FAULT_FLAG_WRITE; if (error_code & X86_PF_INSTR)