From patchwork Sat Feb 18 21:14:11 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 59003 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp555314wrn; Sat, 18 Feb 2023 13:20:49 -0800 (PST) X-Google-Smtp-Source: AK7set94Xz6HMJUNf/Xb+Yjkiq0KqJVNrB9C56ZHntkqESwoW+R/s96DWuUhT6IxxFis4o6SGNSj X-Received: by 2002:a05:6a20:144d:b0:c7:7024:26a7 with SMTP id a13-20020a056a20144d00b000c7702426a7mr8719578pzi.59.1676755249337; Sat, 18 Feb 2023 13:20:49 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1676755249; cv=none; d=google.com; s=arc-20160816; b=p3cBw9oCsjv9YL+nfOnKyD/hcmLHJjHw5UpJRpt3HqBy62HCcb5uJhGQHj7qhHaW/6 xxh7bYOWBnqKvOgxDjePTD6mpShshRPwBmzBI20oU78WiGWb+aKmNWbvEACpy8Zk+yI/ /ltkbIYo5hJWIq+nuyfiO09SPxPnzzD9d16f2tXKJfj3qcZMs4hF3NRNaQ+YgJl7nKq2 +bD7WVcYM+Kos8sL5KlRv6noZnc7TuNdVyyFFnHL9bxxnyUeirGGe1kkpLpL1xV5YzyS J2x7gc7uSmCFuHZK0wFZSrFGEP/OCNxoRDpYkl+5EHqlkL8cFHYHtErX+FhpGcVbo1yW 7lgQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=/UpplPI4+cYRkVpg9K9b7HSmsSFR2XVceQoo1OLvz9A=; b=R2CQ7WTblCwr14B2/6k0Fbq2UD7j11XiNyn0CtgtOAV8Ihu0ivVFqNS1oOQS8lv0r1 C7dqz4jXMbDkk1qmpmKCUqoiDADWtFk6t41PEXa41eRZwdjoeEnsWxgGMyA7IE8xtwdx ibGBAJeJeLp7Bw7a698TEQIsPvo8CWbaJl3hiFBVAjcYRHXxjB6e2nSPt0ooz4jxmda4 aBP0LfqmgGOHxYtbGZztwder67eq0oSOnEabyog05rG34psltskwRWuSSNHeZLiuhMnn QVLgsLGAAPTE+f6kbFrFRV/BhSibcYxYgAbobvanz9Emdlm7wyKet5GNmSPyj6YUE4FD OPGw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b="Nk/zp/rw"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id m65-20020a632644000000b004fb8305e165si9048473pgm.64.2023.02.18.13.20.36; Sat, 18 Feb 2023 13:20:49 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b="Nk/zp/rw"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230059AbjBRVUM (ORCPT + 99 others); Sat, 18 Feb 2023 16:20:12 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43672 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229881AbjBRVTb (ORCPT ); Sat, 18 Feb 2023 16:19:31 -0500 Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6C7FF1714A; Sat, 18 Feb 2023 13:17:26 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1676755046; x=1708291046; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=jMpoJVpnciJcstvlDtXofQ3AOsuP5WXYYIAV42M/rF8=; b=Nk/zp/rwpe6mMFhlSl54qHjWptYex537DUQGrXtKMTyHQhMqZLpbG3OX nfkvQ2Mj0tGUMhm83PrzMHBtskU+SaFOJ9IbG4B+r2vWvImrVh4JQ7VZc ifWRUy4e9btzZ+yhe+dk4t0nMKhyup+4BuXt9ef7zip7CJgXVKYeVjyWd dM914t5GOM8JrjXsVjzDdecEuDh6nzEaRWxLedf7Bg3oj3lSPCGfoulLw 4qjLkJCGi+Jlb4v19R12KGA8ecGfd3Za8Etjp70/eMhsDAy7XWOLQMg1P AjppEhBnUtmVCFLOodKny6d4SIsddxm7vWx0iZncmHyJQDtwmz1Okj1ro Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10625"; a="418427501" X-IronPort-AV: E=Sophos;i="5.97,309,1669104000"; d="scan'208";a="418427501" Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Feb 2023 13:16:12 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10625"; a="664241664" X-IronPort-AV: E=Sophos;i="5.97,309,1669104000"; d="scan'208";a="664241664" Received: from adityava-mobl1.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.209.80.223]) by orsmga007-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Feb 2023 13:16:11 -0800 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v6 19/41] x86/mm: Check shadow stack page fault errors Date: Sat, 18 Feb 2023 13:14:11 -0800 Message-Id: <20230218211433.26859-20-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230218211433.26859-1-rick.p.edgecombe@intel.com> References: <20230218211433.26859-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1758205312223172698?= X-GMAIL-MSGID: =?utf-8?q?1758205312223172698?= From: Yu-cheng Yu The CPU performs "shadow stack accesses" when it expects to encounter shadow stack mappings. These accesses can be implicit (via CALL/RET instructions) or explicit (instructions like WRSS). Shadow stack accesses to shadow-stack mappings can result in faults in normal, valid operation just like regular accesses to regular mappings. Shadow stacks need some of the same features like delayed allocation, swap and copy-on-write. The kernel needs to use faults to implement those features. The architecture has concepts of both shadow stack reads and shadow stack writes. Any shadow stack access to non-shadow stack memory will generate a fault with the shadow stack error code bit set. This means that, unlike normal write protection, the fault handler needs to create a type of memory that can be written to (with instructions that generate shadow stack writes), even to fulfill a read access. So in the case of COW memory, the COW needs to take place even with a shadow stack read. Otherwise the page will be left (shadow stack) writable in userspace. So to trigger the appropriate behavior, set FAULT_FLAG_WRITE for shadow stack accesses, even if the access was a shadow stack read. For the purpose of making this clearer, consider the following example. If a process has a shadow stack, and forks, the shadow stack PTEs will become read-only due to COW. If the CPU in one process performs a shadow stack read access to the shadow stack, for example executing a RET and causing the CPU to read the shadow stack copy of the return address, then in order for the fault to be resolved the PTE will need to be set with shadow stack permissions. But then the memory would be changeable from userspace (from CALL, RET, WRSS, etc). So this scenario needs to trigger COW, otherwise the shared page would be changeable from both processes. Shadow stack accesses can also result in errors, such as when a shadow stack overflows, or if a shadow stack access occurs to a non-shadow-stack mapping. Also, generate the errors for invalid shadow stack accesses. Tested-by: Pengfei Xu Tested-by: John Allen Reviewed-by: Kees Cook Signed-off-by: Yu-cheng Yu Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe --- v6: - Update comment due to rename of Cow bit to SavedDirty v5: - Add description of COW example (Boris) - Replace "permissioned" (Boris) - Remove capitalization of shadow stack (Boris) v4: - Further improve comment talking about FAULT_FLAG_WRITE (Peterz) v3: - Improve comment talking about using FAULT_FLAG_WRITE (Peterz) --- arch/x86/include/asm/trap_pf.h | 2 ++ arch/x86/mm/fault.c | 38 ++++++++++++++++++++++++++++++++++ 2 files changed, 40 insertions(+) diff --git a/arch/x86/include/asm/trap_pf.h b/arch/x86/include/asm/trap_pf.h index 10b1de500ab1..afa524325e55 100644 --- a/arch/x86/include/asm/trap_pf.h +++ b/arch/x86/include/asm/trap_pf.h @@ -11,6 +11,7 @@ * bit 3 == 1: use of reserved bit detected * bit 4 == 1: fault was an instruction fetch * bit 5 == 1: protection keys block access + * bit 6 == 1: shadow stack access fault * bit 15 == 1: SGX MMU page-fault */ enum x86_pf_error_code { @@ -20,6 +21,7 @@ enum x86_pf_error_code { X86_PF_RSVD = 1 << 3, X86_PF_INSTR = 1 << 4, X86_PF_PK = 1 << 5, + X86_PF_SHSTK = 1 << 6, X86_PF_SGX = 1 << 15, }; diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index 7b0d4ab894c8..42885d8e2036 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -1138,8 +1138,22 @@ access_error(unsigned long error_code, struct vm_area_struct *vma) (error_code & X86_PF_INSTR), foreign)) return 1; + /* + * Shadow stack accesses (PF_SHSTK=1) are only permitted to + * shadow stack VMAs. All other accesses result in an error. + */ + if (error_code & X86_PF_SHSTK) { + if (unlikely(!(vma->vm_flags & VM_SHADOW_STACK))) + return 1; + if (unlikely(!(vma->vm_flags & VM_WRITE))) + return 1; + return 0; + } + if (error_code & X86_PF_WRITE) { /* write, present and write, not present: */ + if (unlikely(vma->vm_flags & VM_SHADOW_STACK)) + return 1; if (unlikely(!(vma->vm_flags & VM_WRITE))) return 1; return 0; @@ -1331,6 +1345,30 @@ void do_user_addr_fault(struct pt_regs *regs, perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address); + /* + * When a page becomes COW it changes from a shadow stack permission + * page (Write=0,Dirty=1) to (Write=0,Dirty=0,SavedDirty=1), which is simply + * read-only to the CPU. When shadow stack is enabled, a RET would + * normally pop the shadow stack by reading it with a "shadow stack + * read" access. However, in the COW case the shadow stack memory does + * not have shadow stack permissions, it is read-only. So it will + * generate a fault. + * + * For conventionally writable pages, a read can be serviced with a + * read only PTE, and COW would not have to happen. But for shadow + * stack, there isn't the concept of read-only shadow stack memory. + * If it is shadow stack permission, it can be modified via CALL and + * RET instructions. So COW needs to happen before any memory can be + * mapped with shadow stack permissions. + * + * Shadow stack accesses (read or write) need to be serviced with + * shadow stack permission memory, so in the case of a shadow stack + * read access, treat it as a WRITE fault so both COW will happen and + * the write fault path will tickle maybe_mkwrite() and map the memory + * shadow stack. + */ + if (error_code & X86_PF_SHSTK) + flags |= FAULT_FLAG_WRITE; if (error_code & X86_PF_WRITE) flags |= FAULT_FLAG_WRITE; if (error_code & X86_PF_INSTR)