From patchwork Sat Dec 3 00:35:28 2022
X-Patchwork-Submitter: "Edgecombe, Rick P"
X-Patchwork-Id: 29158
From: Rick Edgecombe
To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A .
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v4 01/39] Documentation/x86: Add CET shadow stack description Date: Fri, 2 Dec 2022 16:35:28 -0800 Message-Id: <20221203003606.6838-2-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221203003606.6838-1-rick.p.edgecombe@intel.com> References: <20221203003606.6838-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751151169426325145?= X-GMAIL-MSGID: =?utf-8?q?1751151169426325145?= From: Yu-cheng Yu Introduce a new document on Control-flow Enforcement Technology (CET). Tested-by: Pengfei Xu Tested-by: John Allen Signed-off-by: Yu-cheng Yu Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Cc: Kees Cook Reviewed-by: Kees Cook Reviewed-by: Bagas Sanjaya --- v4: - Drop clearcpuid piece (Boris) - Add some info about 32 bit v3: - Clarify kernel IBT is supported by the kernel. (Kees, Andrew Cooper) - Clarify which arch_prctl's can take multiple bits. (Kees) - Describe ASLR characteristics of thread shadow stacks. (Kees) - Add exec section. (Andrew Cooper) - Fix some capitalization (Bagas Sanjaya) - Update new location of enablement status proc. - Add info about new user_shstk software capability. - Add more info about what the kernel pushes to the shadow stack on signal. v2: - Updated to new arch_prctl() API - Add bit about new proc status v1: - Update and clarify the docs. - Moved kernel parameters documentation to other patch. Documentation/x86/index.rst | 1 + Documentation/x86/shstk.rst | 162 ++++++++++++++++++++++++++++++++++++ 2 files changed, 163 insertions(+) create mode 100644 Documentation/x86/shstk.rst diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst index c73d133fd37c..8ac64d7de4dc 100644 --- a/Documentation/x86/index.rst +++ b/Documentation/x86/index.rst @@ -22,6 +22,7 @@ x86-specific Documentation mtrr pat intel-hfi + shstk iommu intel_txt amd-memory-encryption diff --git a/Documentation/x86/shstk.rst b/Documentation/x86/shstk.rst new file mode 100644 index 000000000000..8e0b2fe83ef8 --- /dev/null +++ b/Documentation/x86/shstk.rst @@ -0,0 +1,162 @@ +.. SPDX-License-Identifier: GPL-2.0 + +====================================================== +Control-flow Enforcement Technology (CET) Shadow Stack +====================================================== + +CET Background +============== + +Control-flow Enforcement Technology (CET) is term referring to several +related x86 processor features that provides protection against control +flow hijacking attacks. The HW feature itself can be set up to protect +both applications and the kernel. + +CET introduces Shadow Stack and Indirect Branch Tracking (IBT). Shadow stack +is a secondary stack allocated from memory and cannot be directly modified by +applications. When executing a CALL instruction, the processor pushes the +return address to both the normal stack and the shadow stack. 
Upon +function return, the processor pops the shadow stack copy and compares it +to the normal stack copy. If the two differ, the processor raises a +control-protection fault. IBT verifies indirect CALL/JMP targets are intended +as marked by the compiler with 'ENDBR' opcodes. Not all CPU's have both Shadow +Stack and Indirect Branch Tracking. Today in the 64-bit kernel, only userspace +Shadow Stack and kernel IBT is supported. + +Requirements to use Shadow Stack +================================ + +To use userspace shadow stack you need HW that supports it, a kernel +configured with it and userspace libraries compiled with it. + +The kernel Kconfig option is X86_USER_SHADOW_STACK, and it can be disabled +with the kernel parameter: nousershstk. + +To build a user shadow stack enabled kernel, Binutils v2.29 or LLVM v6 or later +are required. + +At run time, /proc/cpuinfo shows CET features if the processor supports +CET. "shstk" and "ibt" relate to the individual HW features. "user_shstk" +relates to whether the userspace shadow stack specifically is supported. + +Application Enabling +==================== + +An application's CET capability is marked in its ELF note and can be verified +from readelf/llvm-readelf output: + + readelf -n | grep -a SHSTK + properties: x86 feature: SHSTK + +The kernel does not process these applications markers directly. Applications +or loaders must enable CET features using the interface described in section 4. +Typically this would be done in dynamic loader or static runtime objects, as is +the case in GLIBC. + +Enabling arch_prctl()'s +======================= + +Elf features should be enabled by the loader using the below arch_prctl's. They +are only supported in 64 bit user applciations. + +arch_prctl(ARCH_SHSTK_ENABLE, unsigned long feature) + Enable a single feature specified in 'feature'. Can only operate on + one feature at a time. + +arch_prctl(ARCH_SHSTK_DISABLE, unsigned long feature) + Disable a single feature specified in 'feature'. Can only operate on + one feature at a time. + +arch_prctl(ARCH_SHSTK_LOCK, unsigned long features) + Lock in features at their current enabled or disabled status. 'features' + is a mask of all features to lock. All bits set are processed, unset bits + are ignored. The mask is ORed with the existing value. So any feature bits + set here cannot be enabled or disabled afterwards. + +The return values are as following: + On success, return 0. On error, errno can be:: + + -EPERM if any of the passed feature are locked. + -EOPNOTSUPP if the feature is not supported by the hardware or + disabled by kernel parameter. + -EINVAL arguments (non existing feature, etc) + +The feature's bits supported are:: + + ARCH_SHSTK_SHSTK - Shadow stack + ARCH_SHSTK_WRSS - WRSS + +Currently shadow stack and WRSS are supported via this interface. WRSS +can only be enabled with shadow stack, and is automatically disabled +if shadow stack is disabled. + +Proc Status +=========== +To check if an application is actually running with shadow stack, the +user can read the /proc/$PID/status. It will report "wrss" or "shstk" +depending on what is enabled. The lines look like this:: + + x86_Thread_features: shstk wrss + x86_Thread_features_locked: shstk wrss + +Implementation of the Shadow Stack +================================== + +Shadow Stack Size +----------------- + +A task's shadow stack is allocated from memory to a fixed size of +MIN(RLIMIT_STACK, 4 GB). 
In other words, the shadow stack is allocated to +the maximum size of the normal stack, but capped to 4 GB. However, +a compat-mode application's address space is smaller, each of its thread's +shadow stack size is MIN(1/4 RLIMIT_STACK, 4 GB). + +Signal +------ + +By default, the main program and its signal handlers use the same shadow +stack. Because the shadow stack stores only return addresses, a large +shadow stack covers the condition that both the program stack and the +signal alternate stack run out. + +When a signal happens, the old pre-signal state is pushed on the stack. When +shadow stack is enabled, the shadow stack specific state is pushed onto the +shadow stack. Today this is only the old SSP (shadow stack pointer), pushed +in a special format with bit 63 set. On sigreturn this old SSP token is +verified and restored by the kernel. The kernel will also push the normal +restorer address to the shadow stack to help userspace avoid a shadow stack +violation on the sigreturn path that goes through the restorer. + +So the shadow stack signal frame format is as follows:: + + |1...old SSP| - Pointer to old pre-signal ssp in sigframe token format + (bit 63 set to 1) + | ...| - Other state may be added in the future + + +32 bit ABI signals are not supported in shadow stack processes. Linux prevents +this by clearing any 32 bit signals (those registered via a 32 bit syscall) +when shadow stack is enabled, and blocking any new ones from being added. + +Fork +---- + +The shadow stack's vma has VM_SHADOW_STACK flag set; its PTEs are required +to be read-only and dirty. When a shadow stack PTE is not RO and dirty, a +shadow access triggers a page fault with the shadow stack access bit set +in the page fault error code. + +When a task forks a child, its shadow stack PTEs are copied and both the +parent's and the child's shadow stack PTEs are cleared of the dirty bit. +Upon the next shadow stack access, the resulting shadow stack page fault +is handled by page copy/re-use. + +When a pthread child is created, the kernel allocates a new shadow stack +for the new thread. New shadow stack's behave like mmap() with respect to +ASLR behavior. + +Exec +---- + +On exec, shadow stack features are disabled by the kernel. At which point, +userspace can choose to re-enable, or lock them. 
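[Editorial note, not part of the patch] As a concrete illustration of the enabling interface and proc reporting described in the document above, below is a minimal, hedged userspace sketch. It assumes the ARCH_SHSTK_* arch_prctl() codes and feature bits that this series adds elsewhere (uapi asm/prctl.h); the numeric values are restated here only for illustration and should be taken from the real uapi header. The syscall is issued with an inline syscall instruction: a call into the libc syscall() wrapper made before enabling would have no shadow stack entry, so returning from it after enabling would raise a control-protection fault. For the same reason the example exits without returning past main().

    #include <unistd.h>
    #include <sys/syscall.h>

    #define ARCH_SHSTK_ENABLE	0x5001
    #define ARCH_SHSTK_DISABLE	0x5002
    #define ARCH_SHSTK_LOCK	0x5003

    #define ARCH_SHSTK_SHSTK	(1ULL << 0)
    #define ARCH_SHSTK_WRSS	(1ULL << 1)

    /*
     * Inline arch_prctl(): always_inline keeps the enabling call out of any
     * CALL/RET pair that straddles the moment shadow stack turns on.
     */
    static inline __attribute__((always_inline))
    long shstk_arch_prctl(long code, unsigned long arg)
    {
    	long ret;

    	asm volatile ("syscall"
    		      : "=a" (ret)
    		      : "0" ((long)SYS_arch_prctl), "D" (code), "S" (arg)
    		      : "rcx", "r11", "memory");
    	return ret;
    }

    int main(void)
    {
    	static const char err[] = "ARCH_SHSTK_ENABLE failed\n";

    	/* One feature per ARCH_SHSTK_ENABLE call, as described above. */
    	if (shstk_arch_prctl(ARCH_SHSTK_ENABLE, ARCH_SHSTK_SHSTK)) {
    		/* e.g. -EOPNOTSUPP without HW support or with nousershstk */
    		write(2, err, sizeof(err) - 1);
    		_exit(1);
    	}

    	/* Lock the current enabled/disabled state so it cannot change later. */
    	shstk_arch_prctl(ARCH_SHSTK_LOCK, ARCH_SHSTK_SHSTK | ARCH_SHSTK_WRSS);

    	/*
    	 * /proc/self/status should now report:
    	 *   x86_Thread_features:        shstk
    	 *   x86_Thread_features_locked: shstk wrss
    	 * Frames entered before enabling (e.g. __libc_start_main -> main)
    	 * have no shadow stack entries, so exit without unwinding them.
    	 */
    	_exit(0);
    }

A related runtime check, complementing /proc/$PID/status, is the RDSSPQ instruction: it copies the current shadow stack pointer into a register and is architecturally a no-op when shadow stack is not enabled, so it is safe to run anywhere. The sketch below assumes an assembler new enough for CET, per the Binutils/LLVM note in the document:

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
    	uint64_t ssp = 0;

    	/* RDSSPQ leaves the destination untouched when shadow stack is off. */
    	asm volatile ("rdsspq %0" : "+r" (ssp));

    	if (ssp)
    		printf("shadow stack active, SSP = %#llx\n",
    		       (unsigned long long)ssp);
    	else
    		printf("shadow stack not enabled for this thread\n");
    	return 0;
    }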
From patchwork Sat Dec 3 00:35:29 2022
X-Patchwork-Submitter: "Edgecombe, Rick P"
X-Patchwork-Id: 29159
From: Rick Edgecombe
To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A .
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v4 02/39] x86/shstk: Add Kconfig option for Shadow Stack Date: Fri, 2 Dec 2022 16:35:29 -0800 Message-Id: <20221203003606.6838-3-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221203003606.6838-1-rick.p.edgecombe@intel.com> References: <20221203003606.6838-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751151174435573467?= X-GMAIL-MSGID: =?utf-8?q?1751151174435573467?= From: Yu-cheng Yu Shadow Stack provides protection for applications against function return address corruption. It is active when the processor supports it, the kernel has CONFIG_X86_SHADOW_STACK enabled, and the application is built for the feature. This is only implemented for the 64-bit kernel. When it is enabled, legacy non-Shadow Stack applications continue to work, but without protection. Since there is another feature that utilizes CET (Kernel IBT) that will share implementation with Shadow Stacks, create CONFIG_CET to signify that at least one CET feature is configured. Tested-by: Pengfei Xu Tested-by: John Allen Signed-off-by: Yu-cheng Yu Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Cc: Kees Cook Reviewed-by: Kees Cook --- v3: - Add X86_CET (Kees) - Add back WRUSS dependency (Kees) - Fix verbiage (Dave) - Change from promt to bool (Kirill) - Add more to commit log v2: - Remove already wrong kernel size increase info (tlgx) - Change prompt to remove "Intel" (tglx) - Update line about what CPUs are supported (Dave) Yu-cheng v25: - Remove X86_CET and use X86_SHADOW_STACK directly. Yu-cheng v24: - Update for the splitting X86_CET to X86_SHADOW_STACK and X86_IBT. arch/x86/Kconfig | 24 ++++++++++++++++++++++++ arch/x86/Kconfig.assembler | 5 +++++ 2 files changed, 29 insertions(+) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index ae36661cfd81..3c4a8e47cf19 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1853,6 +1853,11 @@ config CC_HAS_IBT (CC_IS_CLANG && CLANG_VERSION >= 140000)) && \ $(as-instr,endbr64) +config X86_CET + def_bool n + help + CET features configured (Shadow Stack or IBT) + config X86_KERNEL_IBT prompt "Indirect Branch Tracking" def_bool y @@ -1860,6 +1865,7 @@ config X86_KERNEL_IBT # https://github.com/llvm/llvm-project/commit/9d7001eba9c4cb311e03cd8cdc231f9e579f2d0f depends on !LD_IS_LLD || LLD_VERSION >= 140000 select OBJTOOL + select X86_CET help Build the kernel with support for Indirect Branch Tracking, a hardware support course-grain forward-edge Control Flow Integrity @@ -1954,6 +1960,24 @@ config X86_SGX If unsure, say N. +config X86_USER_SHADOW_STACK + bool "X86 Userspace Shadow Stack" + depends on AS_WRUSS + depends on X86_64 + select ARCH_USES_HIGH_VMA_FLAGS + select X86_CET + help + Shadow Stack protection is a hardware feature that detects function + return address corruption. This helps mitigate ROP attacks. 
+ Applications must be enabled to use it, and old userspace does not + get protection "for free". + + CPUs supporting shadow stacks were first released in 2020. + + See Documentation/x86/cet.rst for more information. + + If unsure, say N. + config EFI bool "EFI runtime service support" depends on ACPI diff --git a/arch/x86/Kconfig.assembler b/arch/x86/Kconfig.assembler index 26b8c08e2fc4..00c79dd93651 100644 --- a/arch/x86/Kconfig.assembler +++ b/arch/x86/Kconfig.assembler @@ -19,3 +19,8 @@ config AS_TPAUSE def_bool $(as-instr,tpause %ecx) help Supported by binutils >= 2.31.1 and LLVM integrated assembler >= V7 + +config AS_WRUSS + def_bool $(as-instr,wrussq %rax$(comma)(%rbx)) + help + Supported by binutils >= 2.31 and LLVM integrated assembler
From patchwork Sat Dec 3 00:35:30 2022
X-Patchwork-Submitter: "Edgecombe, Rick P"
X-Patchwork-Id: 29169
From: Rick Edgecombe
To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A .
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v4 03/39] x86/cpufeatures: Add CPU feature flags for shadow stacks Date: Fri, 2 Dec 2022 16:35:30 -0800 Message-Id: <20221203003606.6838-4-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221203003606.6838-1-rick.p.edgecombe@intel.com> References: <20221203003606.6838-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751151303303711232?= X-GMAIL-MSGID: =?utf-8?q?1751151303303711232?= From: Yu-cheng Yu The Control-Flow Enforcement Technology contains two related features, one of which is Shadow Stacks. Future patches will utilize this feature for shadow stack support in KVM, so add a CPU feature flags for Shadow Stacks (CPUID.(EAX=7,ECX=0):ECX[bit 7]). To protect shadow stack state from malicious modification, the registers are only accessible in supervisor mode. This implementation context-switches the registers with XSAVES. Make X86_FEATURE_SHSTK depend on XSAVES. The shadow stack feature, enumerated by the CPUID bit described above, encompasses both supervisor and userspace support for shadow stack. In near future patches, only userspace shadow stack will be enabled. In expectation of future supervisor shadow stack support, create a software CPU capability to enumerate kernel utilization of userspace shadow stack support. This will also allow for userspace shadow stack to be disabled, while leaving the shadow stack hardware capability exposed in the cpuinfo proc. This user shadow stack bit should depend on the HW "shstk" capability and that logic will be implemented in future patches. Tested-by: Pengfei Xu Tested-by: John Allen Signed-off-by: Yu-cheng Yu Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Cc: Kees Cook Reviewed-by: Kees Cook --- v3: - Add user specific shadow stack cpu cap (Andrew Cooper) - Drop reviewed-bys from Boris and Kees due to the above change. v2: - Remove IBT reference in commit log (Kees) - Describe xsaves dependency using text from (Dave) v1: - Remove IBT, can be added in a follow on IBT series. Yu-cheng v25: - Make X86_FEATURE_IBT depend on X86_FEATURE_SHSTK. 
arch/x86/include/asm/cpufeatures.h | 2 ++ arch/x86/include/asm/disabled-features.h | 8 +++++++- arch/x86/kernel/cpu/cpuid-deps.c | 1 + 3 files changed, 10 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index 11a0e06362e4..aab7fa4104d7 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -307,6 +307,7 @@ #define X86_FEATURE_SGX_EDECCSSA (11*32+18) /* "" SGX EDECCSSA user leaf function */ #define X86_FEATURE_CALL_DEPTH (11*32+19) /* "" Call depth tracking for RSB stuffing */ #define X86_FEATURE_MSR_TSX_CTRL (11*32+20) /* "" MSR IA32_TSX_CTRL (Intel) implemented */ +#define X86_FEATURE_USER_SHSTK (11*32+21) /* Shadow stack support for user mode applications */ /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */ #define X86_FEATURE_AVX_VNNI (12*32+ 4) /* AVX VNNI instructions */ @@ -369,6 +370,7 @@ #define X86_FEATURE_OSPKE (16*32+ 4) /* OS Protection Keys Enable */ #define X86_FEATURE_WAITPKG (16*32+ 5) /* UMONITOR/UMWAIT/TPAUSE Instructions */ #define X86_FEATURE_AVX512_VBMI2 (16*32+ 6) /* Additional AVX512 Vector Bit Manipulation Instructions */ +#define X86_FEATURE_SHSTK (16*32+ 7) /* Shadow Stack */ #define X86_FEATURE_GFNI (16*32+ 8) /* Galois Field New Instructions */ #define X86_FEATURE_VAES (16*32+ 9) /* Vector AES */ #define X86_FEATURE_VPCLMULQDQ (16*32+10) /* Carry-Less Multiplication Double Quadword */ diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h index c44b56f7ffba..7a2954a16cb7 100644 --- a/arch/x86/include/asm/disabled-features.h +++ b/arch/x86/include/asm/disabled-features.h @@ -99,6 +99,12 @@ # define DISABLE_TDX_GUEST (1 << (X86_FEATURE_TDX_GUEST & 31)) #endif +#ifdef CONFIG_X86_USER_SHADOW_STACK +#define DISABLE_USER_SHSTK 0 +#else +#define DISABLE_USER_SHSTK (1 << (X86_FEATURE_USER_SHSTK & 31)) +#endif + /* * Make sure to add features to the correct mask */ @@ -114,7 +120,7 @@ #define DISABLED_MASK9 (DISABLE_SGX) #define DISABLED_MASK10 0 #define DISABLED_MASK11 (DISABLE_RETPOLINE|DISABLE_RETHUNK|DISABLE_UNRET| \ - DISABLE_CALL_DEPTH_TRACKING) + DISABLE_CALL_DEPTH_TRACKING|DISABLE_USER_SHSTK) #define DISABLED_MASK12 0 #define DISABLED_MASK13 0 #define DISABLED_MASK14 0 diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c index d95221117129..c3e4e5246df9 100644 --- a/arch/x86/kernel/cpu/cpuid-deps.c +++ b/arch/x86/kernel/cpu/cpuid-deps.c @@ -79,6 +79,7 @@ static const struct cpuid_dep cpuid_deps[] = { { X86_FEATURE_XFD, X86_FEATURE_XSAVES }, { X86_FEATURE_XFD, X86_FEATURE_XGETBV1 }, { X86_FEATURE_AMX_TILE, X86_FEATURE_XFD }, + { X86_FEATURE_SHSTK, X86_FEATURE_XSAVES }, {} }; From patchwork Sat Dec 3 00:35:31 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 29160 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp1141860wrr; Fri, 2 Dec 2022 16:38:37 -0800 (PST) X-Google-Smtp-Source: AA0mqf6i2HJzNugoqnE8LkPr4a+ZNgBayOPRjjrGBI66PR+f9I5q2fiOK27F5XOEsTEZg9vZWNDD X-Received: by 2002:a17:902:ead4:b0:189:68ed:6c0d with SMTP id p20-20020a170902ead400b0018968ed6c0dmr37945531pld.140.1670027917424; Fri, 02 Dec 2022 16:38:37 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670027917; cv=none; d=google.com; s=arc-20160816; b=p+FG8o14ks1TK3klDXiiL2Yg1M90kxmc/2EQc4oxr0FyJZYTDk4WEd31ZDACgDuOZe 
Received: from bgordon1-mobl1.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com)
([10.212.211.211]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2022 16:36:34 -0800 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v4 04/39] x86/cpufeatures: Enable CET CR4 bit for shadow stack Date: Fri, 2 Dec 2022 16:35:31 -0800 Message-Id: <20221203003606.6838-5-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221203003606.6838-1-rick.p.edgecombe@intel.com> References: <20221203003606.6838-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751151193796401677?= X-GMAIL-MSGID: =?utf-8?q?1751151193796401677?= From: Yu-cheng Yu Setting CR4.CET is a prerequisite for utilizing any CET features, most of which also require setting MSRs. Kernel IBT already enables the CET CR4 bit when it detects IBT HW support and is configured with kernel IBT. However, future patches that enable userspace shadow stack support will need the bit set as well. So change the logic to enable it in either case. Clear MSR_IA32_U_CET in cet_disable() so that it can't live to see userspace in a new kexec-ed kernel that has CR4.CET set from kernel IBT. Tested-by: Pengfei Xu Tested-by: John Allen Signed-off-by: Yu-cheng Yu Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Cc: Kees Cook Reviewed-by: Kees Cook --- v4: - Add back dedicated command line disable: "nousershtk" (Boris) v3: - Remove stay new line (Boris) - Simplify commit log (Andrew Cooper) v2: - In the shadow stack case, go back to only setting CR4.CET if the kernel is compiled with user shadow stack support. - Clear MSR_IA32_U_CET as well. (PeterZ) KVM refresh: - Set CR4.CET if SHSTK or IBT are supported by HW, so that KVM can support CET even if IBT is disabled. 
- Drop no_user_shstk (Dave Hansen) - Elaborate on what the CR4 bit does in the commit log - Integrate with Kernel IBT logic arch/x86/kernel/cpu/common.c | 37 ++++++++++++++++++++++++++++++------ 1 file changed, 31 insertions(+), 6 deletions(-) diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index 73cc546e024d..579f10222432 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -597,29 +597,51 @@ __noendbr void ibt_restore(u64 save) #endif +#ifdef CONFIG_X86_CET static __always_inline void setup_cet(struct cpuinfo_x86 *c) { - u64 msr = CET_ENDBR_EN; + bool kernel_ibt = HAS_KERNEL_IBT && cpu_feature_enabled(X86_FEATURE_IBT); + bool user_shstk; + u64 msr = 0; + + /* + * Enable user shadow stack only if the Linux defined user shadow stack + * cap was not cleared by command line. + */ + user_shstk = cpu_feature_enabled(X86_FEATURE_SHSTK) && + IS_ENABLED(CONFIG_X86_USER_SHADOW_STACK); - if (!HAS_KERNEL_IBT || - !cpu_feature_enabled(X86_FEATURE_IBT)) + if (!kernel_ibt && !user_shstk) return; + if (user_shstk) + set_cpu_cap(c, X86_FEATURE_USER_SHSTK); + + if (kernel_ibt) + msr = CET_ENDBR_EN; + wrmsrl(MSR_IA32_S_CET, msr); cr4_set_bits(X86_CR4_CET); - if (!ibt_selftest()) { + if (kernel_ibt && !ibt_selftest()) { pr_err("IBT selftest: Failed!\n"); wrmsrl(MSR_IA32_S_CET, 0); setup_clear_cpu_cap(X86_FEATURE_IBT); return; } } +#else /* CONFIG_X86_CET */ +static inline void setup_cet(struct cpuinfo_x86 *c) {} +#endif __noendbr void cet_disable(void) { - if (cpu_feature_enabled(X86_FEATURE_IBT)) - wrmsrl(MSR_IA32_S_CET, 0); + if (!(cpu_feature_enabled(X86_FEATURE_IBT) || + cpu_feature_enabled(X86_FEATURE_SHSTK))) + return; + + wrmsrl(MSR_IA32_S_CET, 0); + wrmsrl(MSR_IA32_U_CET, 0); } /* @@ -1470,6 +1492,9 @@ static void __init cpu_parse_early_param(void) if (cmdline_find_option_bool(boot_command_line, "noxsaves")) setup_clear_cpu_cap(X86_FEATURE_XSAVES); + if (cmdline_find_option_bool(boot_command_line, "nousershstk")) + setup_clear_cpu_cap(X86_FEATURE_USER_SHSTK); + arglen = cmdline_find_option(boot_command_line, "clearcpuid", arg, sizeof(arg)); if (arglen <= 0) return; From patchwork Sat Dec 3 00:35:32 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 29171 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp1142403wrr; Fri, 2 Dec 2022 16:40:44 -0800 (PST) X-Google-Smtp-Source: AA0mqf5IooK6f5QsMgr7BHk35vynYqA81kV8E0eKblW+BazDyGQGzZCalQ3/95bvLU1w+BNd4v/B X-Received: by 2002:a17:906:5f85:b0:7bc:6bb7:41c1 with SMTP id a5-20020a1709065f8500b007bc6bb741c1mr32557558eju.353.1670028044668; Fri, 02 Dec 2022 16:40:44 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670028044; cv=none; d=google.com; s=arc-20160816; b=K9sG1iShsHf8AaCAzjBSmB4V7/hYaYp9Qjx6eo6VLcU8jCW1M1q7YmC55KyOa1xS32 TzQmAjkfAyIW7rzCIaMFiYr9IQfK9CX+MRdEzSUcwrqx3/du1hYWGAb/p/Ll/qTZtGE7 LJjouZNZ2lO0ShVtuuGygv7Ti2/80ML/bHfZlytndspwr1r/LwvtLlnSmvEloR5kdjy1 vKY2/+g2ydhXhaRAsEFy42KBKJGhJa/HM1UYBTZai14h8WFHRUegBhMjA31noBt8tNxi d/RVwe8YDzEs8it75yMNQw6pyuCkyIAGRun6RgzbMFIqAy2al2dI9Kx44odf9aPpA+dH pMEw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=mYP7jZ6lGJgaUiTuR4LeVMVIte7xGeV8U7adpYfUdBQ=; b=xd9a5OPSPbOtGoUzmZLmr3Yhu5G5faeJpVkXkPFRHDQVr7lOzRJx2FWW8XKAJObw5h 
From: Rick Edgecombe
To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J .
Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v4 05/39] x86/fpu/xstate: Introduce CET MSR and XSAVES supervisor states Date: Fri, 2 Dec 2022 16:35:32 -0800 Message-Id: <20221203003606.6838-6-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221203003606.6838-1-rick.p.edgecombe@intel.com> References: <20221203003606.6838-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751151326674320677?= X-GMAIL-MSGID: =?utf-8?q?1751151326674320677?= From: Yu-cheng Yu Shadow stack register state can be managed with XSAVE. The registers can logically be separated into two groups: * Registers controlling user-mode operation * Registers controlling kernel-mode operation The architecture has two new XSAVE state components: one for each group of those groups of registers. This lets an OS manage them separately if it chooses. Future patches for host userspace and KVM guests will only utilize the user-mode registers, so only configure XSAVE to save user-mode registers. This state will add 16 bytes to the xsave buffer size. Future patches will use the user-mode XSAVE area to save guest user-mode CET state. However, VMCS includes new fields for guest CET supervisor states. KVM can use these to save and restore guest supervisor state, so host supervisor XSAVE support is not required. Adding this exacerbates the already unwieldy if statement in check_xstate_against_struct() that handles warning about un-implemented xfeatures. So refactor these check's by having XCHECK_SZ() set a bool when it actually check's the xfeature. This ends up exceeding 80 chars, but was better on balance than other options explored. Pass the bool as pointer to make it clear that XCHECK_SZ() can change the variable. While configuring user-mode XSAVE, clarify kernel-mode registers are not managed by XSAVE by defining the xfeature in XFEATURE_MASK_SUPERVISOR_UNSUPPORTED, like is done for XFEATURE_MASK_PT. This serves more of a documentation as code purpose, and functionally, only enables a few safety checks. Both XSAVE state components are supervisor states, even the state controlling user-mode operation. This is a departure from earlier features like protection keys where the PKRU state is a normal user (non-supervisor) state. Having the user state be supervisor-managed ensures there is no direct, unprivileged access to it, making it harder for an attacker to subvert CET. To facilitate this privileged access, define the two user-mode CET MSRs, and the bits defined in those MSRs relevant to future shadow stack enablement patches. 
Tested-by: Pengfei Xu Tested-by: John Allen Signed-off-by: Yu-cheng Yu Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Cc: Kees Cook Reviewed-by: Kees Cook --- v3: - Add missing "is" in commit log (Boris) - Change to case statement for struct size checking (Boris) - Adjust commas on xfeature_names (Kees, Boris) v2: - Change name to XFEATURE_CET_KERNEL_UNUSED (peterz) KVM refresh: - Reword commit log using some verbiage posted by Dave Hansen - Remove unlikely to be used supervisor cet xsave struct - Clarify that supervisor cet state is not saved by xsave - Remove unused supervisor MSRs v1: - Remove outdated reference to sigreturn checks on msr's. arch/x86/include/asm/fpu/types.h | 14 ++++- arch/x86/include/asm/fpu/xstate.h | 6 ++- arch/x86/kernel/fpu/xstate.c | 90 +++++++++++++++---------------- 3 files changed, 59 insertions(+), 51 deletions(-) diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h index eb7cd1139d97..344baad02b97 100644 --- a/arch/x86/include/asm/fpu/types.h +++ b/arch/x86/include/asm/fpu/types.h @@ -115,8 +115,8 @@ enum xfeature { XFEATURE_PT_UNIMPLEMENTED_SO_FAR, XFEATURE_PKRU, XFEATURE_PASID, - XFEATURE_RSRVD_COMP_11, - XFEATURE_RSRVD_COMP_12, + XFEATURE_CET_USER, + XFEATURE_CET_KERNEL_UNUSED, XFEATURE_RSRVD_COMP_13, XFEATURE_RSRVD_COMP_14, XFEATURE_LBR, @@ -138,6 +138,8 @@ enum xfeature { #define XFEATURE_MASK_PT (1 << XFEATURE_PT_UNIMPLEMENTED_SO_FAR) #define XFEATURE_MASK_PKRU (1 << XFEATURE_PKRU) #define XFEATURE_MASK_PASID (1 << XFEATURE_PASID) +#define XFEATURE_MASK_CET_USER (1 << XFEATURE_CET_USER) +#define XFEATURE_MASK_CET_KERNEL (1 << XFEATURE_CET_KERNEL_UNUSED) #define XFEATURE_MASK_LBR (1 << XFEATURE_LBR) #define XFEATURE_MASK_XTILE_CFG (1 << XFEATURE_XTILE_CFG) #define XFEATURE_MASK_XTILE_DATA (1 << XFEATURE_XTILE_DATA) @@ -252,6 +254,14 @@ struct pkru_state { u32 pad; } __packed; +/* + * State component 11 is Control-flow Enforcement user states + */ +struct cet_user_state { + u64 user_cet; /* user control-flow settings */ + u64 user_ssp; /* user shadow stack pointer */ +}; + /* * State component 15: Architectural LBR configuration state. * The size of Arch LBR state depends on the number of LBRs (lbr_depth). diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h index cd3dd170e23a..d4427b88ee12 100644 --- a/arch/x86/include/asm/fpu/xstate.h +++ b/arch/x86/include/asm/fpu/xstate.h @@ -50,7 +50,8 @@ #define XFEATURE_MASK_USER_DYNAMIC XFEATURE_MASK_XTILE_DATA /* All currently supported supervisor features */ -#define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_PASID) +#define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_PASID | \ + XFEATURE_MASK_CET_USER) /* * A supervisor state component may not always contain valuable information, @@ -77,7 +78,8 @@ * Unsupported supervisor features. When a supervisor feature in this mask is * supported in the future, move it to the supported supervisor feature mask. */ -#define XFEATURE_MASK_SUPERVISOR_UNSUPPORTED (XFEATURE_MASK_PT) +#define XFEATURE_MASK_SUPERVISOR_UNSUPPORTED (XFEATURE_MASK_PT | \ + XFEATURE_MASK_CET_KERNEL) /* All supervisor states including supported and unsupported states. 
*/ #define XFEATURE_MASK_SUPERVISOR_ALL (XFEATURE_MASK_SUPERVISOR_SUPPORTED | \ diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c index 714166cc25f2..13a80521dd51 100644 --- a/arch/x86/kernel/fpu/xstate.c +++ b/arch/x86/kernel/fpu/xstate.c @@ -39,26 +39,26 @@ */ static const char *xfeature_names[] = { - "x87 floating point registers" , - "SSE registers" , - "AVX registers" , - "MPX bounds registers" , - "MPX CSR" , - "AVX-512 opmask" , - "AVX-512 Hi256" , - "AVX-512 ZMM_Hi256" , - "Processor Trace (unused)" , + "x87 floating point registers", + "SSE registers", + "AVX registers", + "MPX bounds registers", + "MPX CSR", + "AVX-512 opmask", + "AVX-512 Hi256", + "AVX-512 ZMM_Hi256", + "Processor Trace (unused)", "Protection Keys User registers", "PASID state", - "unknown xstate feature" , - "unknown xstate feature" , - "unknown xstate feature" , - "unknown xstate feature" , - "unknown xstate feature" , - "unknown xstate feature" , - "AMX Tile config" , - "AMX Tile data" , - "unknown xstate feature" , + "Control-flow User registers", + "Control-flow Kernel registers (unused)", + "unknown xstate feature", + "unknown xstate feature", + "unknown xstate feature", + "unknown xstate feature", + "AMX Tile config", + "AMX Tile data", + "unknown xstate feature", }; static unsigned short xsave_cpuid_features[] __initdata = { @@ -73,6 +73,7 @@ static unsigned short xsave_cpuid_features[] __initdata = { [XFEATURE_PT_UNIMPLEMENTED_SO_FAR] = X86_FEATURE_INTEL_PT, [XFEATURE_PKRU] = X86_FEATURE_PKU, [XFEATURE_PASID] = X86_FEATURE_ENQCMD, + [XFEATURE_CET_USER] = X86_FEATURE_SHSTK, [XFEATURE_XTILE_CFG] = X86_FEATURE_AMX_TILE, [XFEATURE_XTILE_DATA] = X86_FEATURE_AMX_TILE, }; @@ -276,6 +277,7 @@ static void __init print_xstate_features(void) print_xstate_feature(XFEATURE_MASK_Hi16_ZMM); print_xstate_feature(XFEATURE_MASK_PKRU); print_xstate_feature(XFEATURE_MASK_PASID); + print_xstate_feature(XFEATURE_MASK_CET_USER); print_xstate_feature(XFEATURE_MASK_XTILE_CFG); print_xstate_feature(XFEATURE_MASK_XTILE_DATA); } @@ -344,6 +346,7 @@ static __init void os_xrstor_booting(struct xregs_state *xstate) XFEATURE_MASK_BNDREGS | \ XFEATURE_MASK_BNDCSR | \ XFEATURE_MASK_PASID | \ + XFEATURE_MASK_CET_USER | \ XFEATURE_MASK_XTILE) /* @@ -446,14 +449,15 @@ static void __init __xstate_dump_leaves(void) } \ } while (0) -#define XCHECK_SZ(sz, nr, nr_macro, __struct) do { \ - if ((nr == nr_macro) && \ - WARN_ONCE(sz != sizeof(__struct), \ - "%s: struct is %zu bytes, cpu state %d bytes\n", \ - __stringify(nr_macro), sizeof(__struct), sz)) { \ +#define XCHECK_SZ(sz, nr, __struct) ({ \ + if (WARN_ONCE(sz != sizeof(__struct), \ + "[%s]: struct is %zu bytes, cpu state %d bytes\n", \ + xfeature_names[nr], sizeof(__struct), sz)) { \ __xstate_dump_leaves(); \ } \ -} while (0) + true; \ +}) + /** * check_xtile_data_against_struct - Check tile data state size. @@ -527,36 +531,28 @@ static bool __init check_xstate_against_struct(int nr) * Ask the CPU for the size of the state. */ int sz = xfeature_size(nr); + /* * Match each CPU state with the corresponding software * structure. 
*/ - XCHECK_SZ(sz, nr, XFEATURE_YMM, struct ymmh_struct); - XCHECK_SZ(sz, nr, XFEATURE_BNDREGS, struct mpx_bndreg_state); - XCHECK_SZ(sz, nr, XFEATURE_BNDCSR, struct mpx_bndcsr_state); - XCHECK_SZ(sz, nr, XFEATURE_OPMASK, struct avx_512_opmask_state); - XCHECK_SZ(sz, nr, XFEATURE_ZMM_Hi256, struct avx_512_zmm_uppers_state); - XCHECK_SZ(sz, nr, XFEATURE_Hi16_ZMM, struct avx_512_hi16_state); - XCHECK_SZ(sz, nr, XFEATURE_PKRU, struct pkru_state); - XCHECK_SZ(sz, nr, XFEATURE_PASID, struct ia32_pasid_state); - XCHECK_SZ(sz, nr, XFEATURE_XTILE_CFG, struct xtile_cfg); - - /* The tile data size varies between implementations. */ - if (nr == XFEATURE_XTILE_DATA) - check_xtile_data_against_struct(sz); - - /* - * Make *SURE* to add any feature numbers in below if - * there are "holes" in the xsave state component - * numbers. - */ - if ((nr < XFEATURE_YMM) || - (nr >= XFEATURE_MAX) || - (nr == XFEATURE_PT_UNIMPLEMENTED_SO_FAR) || - ((nr >= XFEATURE_RSRVD_COMP_11) && (nr <= XFEATURE_RSRVD_COMP_16))) { + switch (nr) { + case XFEATURE_YMM: return XCHECK_SZ(sz, nr, struct ymmh_struct); + case XFEATURE_BNDREGS: return XCHECK_SZ(sz, nr, struct mpx_bndreg_state); + case XFEATURE_BNDCSR: return XCHECK_SZ(sz, nr, struct mpx_bndcsr_state); + case XFEATURE_OPMASK: return XCHECK_SZ(sz, nr, struct avx_512_opmask_state); + case XFEATURE_ZMM_Hi256: return XCHECK_SZ(sz, nr, struct avx_512_zmm_uppers_state); + case XFEATURE_Hi16_ZMM: return XCHECK_SZ(sz, nr, struct avx_512_hi16_state); + case XFEATURE_PKRU: return XCHECK_SZ(sz, nr, struct pkru_state); + case XFEATURE_PASID: return XCHECK_SZ(sz, nr, struct ia32_pasid_state); + case XFEATURE_XTILE_CFG: return XCHECK_SZ(sz, nr, struct xtile_cfg); + case XFEATURE_CET_USER: return XCHECK_SZ(sz, nr, struct cet_user_state); + case XFEATURE_XTILE_DATA: check_xtile_data_against_struct(sz); return true; + default: XSTATE_WARN_ON(1, "No structure for xstate: %d\n", nr); return false; } + return true; } From patchwork Sat Dec 3 00:35:33 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 29161 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp1141893wrr; Fri, 2 Dec 2022 16:38:49 -0800 (PST) X-Google-Smtp-Source: AA0mqf7lDIjFOStp8XHoJWEO9h9qA/onxeIKGs9XaSldE4HSMxxDGD3djh/FZpHUsMKCWhzObZn8 X-Received: by 2002:a17:906:6dd5:b0:78d:a633:b55 with SMTP id j21-20020a1709066dd500b0078da6330b55mr64804560ejt.106.1670027929048; Fri, 02 Dec 2022 16:38:49 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670027929; cv=none; d=google.com; s=arc-20160816; b=ypsY/ZP7a6oZ4wJdfsONcRAQruVvdZ6MxTHpxgnlMApjMf0HsKcTMjPB0mOuWRM2zy w5rYevZTTj0z/8phRuSc6LXAVBo+N8tCI9m08XdybeIXT4qOx1xeRfrAyjZl017IQQEp M6gfvIXAZ07XuIHVFo8gzBbqtrJ6KIayoLv7qhsuNPxLI6KZ1eO52MOoABgUELphd0tB nCOalmVKRD94A/ZrqcZq203FNweBYwLocGGG4FTlSq3jgOLoi2RFppmUvl2RPFsQzvqD 9uOIximt9e3sU9l8vkLMRPO/tdpjkn0eN7iqbXCuH3IMtfbTLZAR1Pgvv5BQT30ujMhs bTPw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=holp8FG1KuTwaCPNW2nBFLzGbV+C0253qOrO5vA4Zb4=; b=SW3o2O19i2OtoyjqD4Wp5VrRlazRMMwelBh6Qbc6/CQJsK3jRAsBEZ85dPdSPXn3bv iIQMqjtW9O0BIyVnDw+Av+x4fKYdOzW4T+yQLUzgdrvPkNJLoJsjfrOZv3B13QDeICPY rP4RRNZW0UfNPAwU+Otc4OffBD8Fyc0DiImt7ZZ4SV40jzNgcU3CaVqgg0JC9RUJJMZ5 XI0a46Sg4lhLQHNlC7NBbJEkMUjBlfPjHycehVzo4D98OvnbiblNP4N2/vwKCCUfnhKf 
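The size check above is easy to sanity-test from userspace: XFEATURE_CET_USER is xstate component 11, and CPUID leaf 0xD reports each component's size the same way the kernel's xfeature_size() does. Below is a minimal, hedged userspace sketch (GCC/Clang on x86; the struct is mirrored from the patch, and on hardware that does not enumerate CET the program simply reports the component as absent):

#include <stdint.h>
#include <stdio.h>
#include <cpuid.h>

/* Mirrors struct cet_user_state from this patch: two MSR-backed fields. */
struct cet_user_state {
	uint64_t user_cet;	/* user control-flow settings (IA32_U_CET) */
	uint64_t user_ssp;	/* user shadow stack pointer (IA32_PL3_SSP) */
};

int main(void)
{
	unsigned int eax, ebx, ecx, edx;

	/* CPUID.(EAX=0xD, ECX=11).EAX is the size in bytes of xstate
	 * component 11, i.e. XFEATURE_CET_USER. EAX=0 means the CPU does
	 * not enumerate the component at all. */
	if (!__get_cpuid_count(0x0d, 11, &eax, &ebx, &ecx, &edx) || !eax) {
		puts("xstate component 11 (CET user) not enumerated");
		return 0;
	}

	printf("CPU reports %u bytes, struct cet_user_state is %zu bytes: %s\n",
	       eax, sizeof(struct cet_user_state),
	       eax == sizeof(struct cet_user_state) ? "match" : "MISMATCH");
	return 0;
}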
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com Cc: rick.p.edgecombe@intel.com Subject: [PATCH v4 06/39] x86/fpu: Add helper for modifying xstate Date: Fri, 2 Dec 2022 16:35:33 -0800 Message-Id: <20221203003606.6838-7-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221203003606.6838-1-rick.p.edgecombe@intel.com> References: <20221203003606.6838-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751151205355114319?= X-GMAIL-MSGID: =?utf-8?q?1751151205355114319?= Just like user xfeatures, supervisor xfeatures can be active in the registers or present in the task FPU buffer. If the registers are active, the registers can be modified directly. If the registers are not active, the modification must be performed on the task FPU buffer. When the state is not active, the kernel could perform modifications directly to the buffer. But in order for it to do that, it needs to know where in the buffer the specific state it wants to modify is located. Doing this is not robust against optimizations that compact the FPU buffer, as each access would require computing where in the buffer it is. The easiest way to modify supervisor xfeature data is to force restore the registers and write directly to the MSRs. Often times this is just fine anyway as the registers need to be restored before returning to userspace. Do this for now, leaving buffer writing optimizations for the future. Add a new function fpregs_lock_and_load() that can simultaneously call fpregs_lock() and do this restore. Also perform some extra sanity checks in this function since this will be used in non-fpu focused code. Tested-by: Pengfei Xu Tested-by: John Allen Suggested-by: Thomas Gleixner Signed-off-by: Rick Edgecombe Reviewed-by: Kees Cook --- v3: - Rename to fpregs_lock_and_load() to match the unlocking fpregs_unlock(). (Kees) - Elaborate in comment about helper. (Dave) v2: - Drop optimization of writing directly the buffer, and change API accordingly. - fpregs_lock_and_load() suggested by tglx - Some commit log verbiage from dhansen v1: - New patch. arch/x86/include/asm/fpu/api.h | 9 +++++++++ arch/x86/kernel/fpu/core.c | 19 +++++++++++++++++++ 2 files changed, 28 insertions(+) diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h index 503a577814b2..aadc6893dcaa 100644 --- a/arch/x86/include/asm/fpu/api.h +++ b/arch/x86/include/asm/fpu/api.h @@ -82,6 +82,15 @@ static inline void fpregs_unlock(void) preempt_enable(); } +/* + * FPU state gets lazily restored before returning to userspace. So when in the + * kernel, the valid FPU state may be kept in the buffer. This function will force + * restore all the fpu state to the registers early if needed, and lock them from + * being automatically saved/restored. Then FPU state can be modified safely in the + * registers, before unlocking with fpregs_unlock(). 
+ */ +void fpregs_lock_and_load(void); + #ifdef CONFIG_X86_DEBUG_FPU extern void fpregs_assert_state_consistent(void); #else diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c index 9baa89a8877d..9af78e9d92a0 100644 --- a/arch/x86/kernel/fpu/core.c +++ b/arch/x86/kernel/fpu/core.c @@ -753,6 +753,25 @@ void switch_fpu_return(void) } EXPORT_SYMBOL_GPL(switch_fpu_return); +void fpregs_lock_and_load(void) +{ + /* + * fpregs_lock() only disables preemption (mostly). So modifing state + * in an interrupt could screw up some in progress fpregs operation, + * but appear to work. Warn about it. + */ + WARN_ON_ONCE(!irq_fpu_usable()); + WARN_ON_ONCE(current->flags & PF_KTHREAD); + + fpregs_lock(); + + fpregs_assert_state_consistent(); + + if (test_thread_flag(TIF_NEED_FPU_LOAD)) + fpregs_restore_userregs(); +} +EXPORT_SYMBOL_GPL(fpregs_lock_and_load); + #ifdef CONFIG_X86_DEBUG_FPU /* * If current FPU state according to its tracking (loaded FPU context on this From patchwork Sat Dec 3 00:35:34 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 29162 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp1141940wrr; Fri, 2 Dec 2022 16:39:02 -0800 (PST) X-Google-Smtp-Source: AA0mqf5H6WqQH1zlOdg7y7sbYClXcknOgbxujuncM8YWf0WcWJI9OhbRZZ2imaC9+tTec4J8pZTK X-Received: by 2002:a17:906:194e:b0:7c0:a6c2:fcef with SMTP id b14-20020a170906194e00b007c0a6c2fcefmr10769268eje.671.1670027941877; Fri, 02 Dec 2022 16:39:01 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670027941; cv=none; d=google.com; s=arc-20160816; b=0ZUMPXbeDDQTgrKsoR1ukXveUH+kTn+8KxAr19/6UlKzRQ7TjOjkp3TGvyM3Mm9sPr 2+eoPjTfy45L7W+oUmn/+0a8Bh5rpQhUJ6jDIGABO4M/kbgIp06A5xq1vFkvaXRfzPgS nbcTBttN0lTuBZU69wmado7YOk9+EOgWohAy+67LBmFNG+gqwzOTEbKizLPj+B+e9REZ wUiNFb9KNBQkbhhYQDoSwYAHNVogFef2CYfdqkweSICR4FZS1wvOO8M7q0/gMUjjvIjx ebcF36k+1kGJjbOT4RAYUoGsgB6+25aADMnIluRdIXPqBXMRsdWb3Z+sLWdzHoKNnm67 4lrw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=kVX6Hz4ddb7wVnifilOgBZSDk2JVJ8X4Tshin8icZlA=; b=r2RL2JOaAwpEnTlpvq1pGpjyrOkdS6yUOYhH4bK1rWyIFXTFC5n4I27SAZawEXRS+F 2fJDqo/TXk5o1rfAUVR5KTvQ2R7x6YF+LDjOwJvCi7UlnQ85vmG0FSmtgBSoUPQwuILc nG3OD4LaOlzwhfJa4Ds8AAeeHicZSUSYrEJa6iwpgK6+uG2eLJKVN0nJSQH5SVacLWcY vSo6gqHYfOAhvvT4udFvJplWEaSKzZus74lYmXKb7Zyav5m45//BcQL08RsezXrl7c6j h3cuTRtwRpkzJNvanCnYg162baZFNS+5R2b56uv8eGJ3ttXi7ErKHtObH+ElTgyRzWz0 rEaw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=mSg9eMOB; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. 
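For a sense of how the helper is meant to be used, here is a hedged kernel-context sketch of the pattern later patches in this series rely on when touching a CET MSR for the current task. The wrapper function name is made up for illustration and the snippet is not standalone code, but fpregs_lock_and_load()/fpregs_unlock(), wrmsrl() and MSR_IA32_PL3_SSP are the real interfaces involved:

/* Illustrative only: update the current task's user shadow stack pointer
 * with the xstate guaranteed to be live in registers for the duration. */
static int set_current_user_ssp(unsigned long ssp)
{
	if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK))
		return -EOPNOTSUPP;

	fpregs_lock_and_load();		/* restore fpregs if needed, then pin them */
	wrmsrl(MSR_IA32_PL3_SSP, ssp);	/* modify the live register state */
	fpregs_unlock();		/* allow lazy save/restore again */

	return 0;
}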
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu , Michael Kerrisk Subject: [PATCH v4 07/39] x86: Add user control-protection fault handler Date: Fri, 2 Dec 2022 16:35:34 -0800 Message-Id: <20221203003606.6838-8-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221203003606.6838-1-rick.p.edgecombe@intel.com> References: <20221203003606.6838-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751151218659403949?= X-GMAIL-MSGID: =?utf-8?q?1751151218659403949?= From: Yu-cheng Yu A control-protection fault is triggered when a control-flow transfer attempt violates Shadow Stack or Indirect Branch Tracking constraints. For example, the return address for a RET instruction differs from the copy on the shadow stack. There already exists a control-protection fault handler for handling kernel IBT. Refactor this fault handler into sparate user and kernel handlers, like the page fault handler. Add a control-protection handler for usermode. Keep the same behavior for the kernel side of the fault handler, except for converting a BUG to a WARN in the case of a #CP happening when !cpu_feature_enabled(). This unifies the behavior with the new shadow stack code, and also prevents the kernel from crashing under this situation which is potentially recoverable. The control-protection fault handler works in a similar way as the general protection fault handler. It provides the si_code SEGV_CPERR to the signal handler. Tested-by: Pengfei Xu Tested-by: John Allen Signed-off-by: Yu-cheng Yu Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Cc: Kees Cook Cc: Michael Kerrisk Reviewed-by: Kees Cook --- v3: - Shorten user/kernel #CP handler function names (peterz) - Restore CP_ENDBR check to kernel handler (peterz) - Utilize CONFIG_X86_CET (Kees) - Unify "unexpected" warnings (Andrew Cooper) - Use 2d array for error code chars (Andrew Cooper) - Add comment about why to read SSP MSR before enabling interrupts v2: - Integrate with kernel IBT fault handler - Update printed messages. (Dave) - Remove array_index_nospec() usage. (Dave) - Remove IBT messages. (Dave) - Add enclave error code bit processing it case it can get triggered somehow. - Add extra "unknown" in control_protection_err. v1: - Update static asserts for NSIGSEGV Yu-cheng v29: - Remove pr_emerg() since it is followed by die(). - Change boot_cpu_has() to cpu_feature_enabled(). 
arch/arm/kernel/signal.c | 2 +- arch/arm64/kernel/signal.c | 2 +- arch/arm64/kernel/signal32.c | 2 +- arch/sparc/kernel/signal32.c | 2 +- arch/sparc/kernel/signal_64.c | 2 +- arch/x86/include/asm/disabled-features.h | 8 +- arch/x86/include/asm/idtentry.h | 2 +- arch/x86/kernel/idt.c | 2 +- arch/x86/kernel/signal_compat.c | 2 +- arch/x86/kernel/traps.c | 107 ++++++++++++++++++++--- arch/x86/xen/enlighten_pv.c | 2 +- arch/x86/xen/xen-asm.S | 2 +- include/uapi/asm-generic/siginfo.h | 3 +- 13 files changed, 114 insertions(+), 24 deletions(-) diff --git a/arch/arm/kernel/signal.c b/arch/arm/kernel/signal.c index e07f359254c3..9a3c9de5ac5e 100644 --- a/arch/arm/kernel/signal.c +++ b/arch/arm/kernel/signal.c @@ -681,7 +681,7 @@ asmlinkage void do_rseq_syscall(struct pt_regs *regs) */ static_assert(NSIGILL == 11); static_assert(NSIGFPE == 15); -static_assert(NSIGSEGV == 9); +static_assert(NSIGSEGV == 10); static_assert(NSIGBUS == 5); static_assert(NSIGTRAP == 6); static_assert(NSIGCHLD == 6); diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c index 9ad911f1647c..81b13a21046e 100644 --- a/arch/arm64/kernel/signal.c +++ b/arch/arm64/kernel/signal.c @@ -1166,7 +1166,7 @@ void __init minsigstksz_setup(void) */ static_assert(NSIGILL == 11); static_assert(NSIGFPE == 15); -static_assert(NSIGSEGV == 9); +static_assert(NSIGSEGV == 10); static_assert(NSIGBUS == 5); static_assert(NSIGTRAP == 6); static_assert(NSIGCHLD == 6); diff --git a/arch/arm64/kernel/signal32.c b/arch/arm64/kernel/signal32.c index 4700f8522d27..bbd542704730 100644 --- a/arch/arm64/kernel/signal32.c +++ b/arch/arm64/kernel/signal32.c @@ -460,7 +460,7 @@ void compat_setup_restart_syscall(struct pt_regs *regs) */ static_assert(NSIGILL == 11); static_assert(NSIGFPE == 15); -static_assert(NSIGSEGV == 9); +static_assert(NSIGSEGV == 10); static_assert(NSIGBUS == 5); static_assert(NSIGTRAP == 6); static_assert(NSIGCHLD == 6); diff --git a/arch/sparc/kernel/signal32.c b/arch/sparc/kernel/signal32.c index dad38960d1a8..82da8a2d769d 100644 --- a/arch/sparc/kernel/signal32.c +++ b/arch/sparc/kernel/signal32.c @@ -751,7 +751,7 @@ asmlinkage int do_sys32_sigstack(u32 u_ssptr, u32 u_ossptr, unsigned long sp) */ static_assert(NSIGILL == 11); static_assert(NSIGFPE == 15); -static_assert(NSIGSEGV == 9); +static_assert(NSIGSEGV == 10); static_assert(NSIGBUS == 5); static_assert(NSIGTRAP == 6); static_assert(NSIGCHLD == 6); diff --git a/arch/sparc/kernel/signal_64.c b/arch/sparc/kernel/signal_64.c index 570e43e6fda5..b4e410976e0d 100644 --- a/arch/sparc/kernel/signal_64.c +++ b/arch/sparc/kernel/signal_64.c @@ -562,7 +562,7 @@ void do_notify_resume(struct pt_regs *regs, unsigned long orig_i0, unsigned long */ static_assert(NSIGILL == 11); static_assert(NSIGFPE == 15); -static_assert(NSIGSEGV == 9); +static_assert(NSIGSEGV == 10); static_assert(NSIGBUS == 5); static_assert(NSIGTRAP == 6); static_assert(NSIGCHLD == 6); diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h index 7a2954a16cb7..b7646b471537 100644 --- a/arch/x86/include/asm/disabled-features.h +++ b/arch/x86/include/asm/disabled-features.h @@ -105,6 +105,12 @@ #define DISABLE_USER_SHSTK (1 << (X86_FEATURE_USER_SHSTK & 31)) #endif +#ifdef CONFIG_X86_KERNEL_IBT +#define DISABLE_IBT 0 +#else +#define DISABLE_IBT (1 << (X86_FEATURE_IBT & 31)) +#endif + /* * Make sure to add features to the correct mask */ @@ -128,7 +134,7 @@ #define DISABLED_MASK16 (DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP| \ DISABLE_ENQCMD) #define 
DISABLED_MASK17 0 -#define DISABLED_MASK18 0 +#define DISABLED_MASK18 (DISABLE_IBT) #define DISABLED_MASK19 0 #define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 20) diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h index 72184b0b2219..69e26f48d027 100644 --- a/arch/x86/include/asm/idtentry.h +++ b/arch/x86/include/asm/idtentry.h @@ -618,7 +618,7 @@ DECLARE_IDTENTRY_RAW_ERRORCODE(X86_TRAP_DF, xenpv_exc_double_fault); #endif /* #CP */ -#ifdef CONFIG_X86_KERNEL_IBT +#ifdef CONFIG_X86_CET DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_CP, exc_control_protection); #endif diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c index a58c6bc1cd68..5074b8420359 100644 --- a/arch/x86/kernel/idt.c +++ b/arch/x86/kernel/idt.c @@ -107,7 +107,7 @@ static const __initconst struct idt_data def_idts[] = { ISTG(X86_TRAP_MC, asm_exc_machine_check, IST_INDEX_MCE), #endif -#ifdef CONFIG_X86_KERNEL_IBT +#ifdef CONFIG_X86_CET INTG(X86_TRAP_CP, asm_exc_control_protection), #endif diff --git a/arch/x86/kernel/signal_compat.c b/arch/x86/kernel/signal_compat.c index 879ef8c72f5c..d441804443d5 100644 --- a/arch/x86/kernel/signal_compat.c +++ b/arch/x86/kernel/signal_compat.c @@ -27,7 +27,7 @@ static inline void signal_compat_build_tests(void) */ BUILD_BUG_ON(NSIGILL != 11); BUILD_BUG_ON(NSIGFPE != 15); - BUILD_BUG_ON(NSIGSEGV != 9); + BUILD_BUG_ON(NSIGSEGV != 10); BUILD_BUG_ON(NSIGBUS != 5); BUILD_BUG_ON(NSIGTRAP != 6); BUILD_BUG_ON(NSIGCHLD != 6); diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index 8b83d8fbce71..e35c70dc1afb 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -213,12 +213,7 @@ DEFINE_IDTENTRY(exc_overflow) do_error_trap(regs, 0, "overflow", X86_TRAP_OF, SIGSEGV, 0, NULL); } -#ifdef CONFIG_X86_KERNEL_IBT - -static __ro_after_init bool ibt_fatal = true; - -extern void ibt_selftest_ip(void); /* code label defined in asm below */ - +#ifdef CONFIG_X86_CET enum cp_error_code { CP_EC = (1 << 15) - 1, @@ -231,15 +226,87 @@ enum cp_error_code { CP_ENCL = 1 << 15, }; -DEFINE_IDTENTRY_ERRORCODE(exc_control_protection) +static const char control_protection_err[][10] = { + [0] = "unknown", + [1] = "near ret", + [2] = "far/iret", + [3] = "endbranch", + [4] = "rstorssp", + [5] = "setssbsy", +}; + +static const char *cp_err_string(unsigned long error_code) +{ + unsigned int cpec = error_code & CP_EC; + + if (cpec >= ARRAY_SIZE(control_protection_err)) + cpec = 0; + return control_protection_err[cpec]; +} + +static void do_unexpected_cp(struct pt_regs *regs, unsigned long error_code) +{ + WARN_ONCE(1, "Unexpected %s #CP, error_code: %s\n", + user_mode(regs) ? "user mode" : "kernel mode", + cp_err_string(error_code)); +} +#endif /* CONFIG_X86_CET */ + +void do_user_cp_fault(struct pt_regs *regs, unsigned long error_code); + +#ifdef CONFIG_X86_USER_SHADOW_STACK +static DEFINE_RATELIMIT_STATE(cpf_rate, DEFAULT_RATELIMIT_INTERVAL, + DEFAULT_RATELIMIT_BURST); + +void do_user_cp_fault(struct pt_regs *regs, unsigned long error_code) { - if (!cpu_feature_enabled(X86_FEATURE_IBT)) { - pr_err("Unexpected #CP\n"); - BUG(); + struct task_struct *tsk; + unsigned long ssp; + + /* + * An exception was just taken from userspace. Since interrupts are disabled + * here, no scheduling should have messed with the registers yet and they + * will be whatever is live in userspace. So read the SSP before enabling + * interrupts so locking the fpregs to do it later is not required. 
+ */ + rdmsrl(MSR_IA32_PL3_SSP, ssp); + + cond_local_irq_enable(regs); + + tsk = current; + tsk->thread.error_code = error_code; + tsk->thread.trap_nr = X86_TRAP_CP; + + /* Ratelimit to prevent log spamming. */ + if (show_unhandled_signals && unhandled_signal(tsk, SIGSEGV) && + __ratelimit(&cpf_rate)) { + pr_emerg("%s[%d] control protection ip:%lx sp:%lx ssp:%lx error:%lx(%s)%s", + tsk->comm, task_pid_nr(tsk), + regs->ip, regs->sp, ssp, error_code, + cp_err_string(error_code), + error_code & CP_ENCL ? " in enclave" : ""); + print_vma_addr(KERN_CONT " in ", regs->ip); + pr_cont("\n"); } - if (WARN_ON_ONCE(user_mode(regs) || (error_code & CP_EC) != CP_ENDBR)) + force_sig_fault(SIGSEGV, SEGV_CPERR, (void __user *)0); + cond_local_irq_disable(regs); +} +#endif + +void do_kernel_cp_fault(struct pt_regs *regs, unsigned long error_code); + +#ifdef CONFIG_X86_KERNEL_IBT +static __ro_after_init bool ibt_fatal = true; + +extern void ibt_selftest_ip(void); /* code label defined in asm below */ + +void do_kernel_cp_fault(struct pt_regs *regs, unsigned long error_code) +{ + if ((error_code & CP_EC) != CP_ENDBR) { + do_unexpected_cp(regs, error_code); return; + } if (unlikely(regs->ip == (unsigned long)&ibt_selftest_ip)) { regs->ax = 0; @@ -285,9 +352,25 @@ static int __init ibt_setup(char *str) } __setup("ibt=", ibt_setup); - #endif /* CONFIG_X86_KERNEL_IBT */ +#ifdef CONFIG_X86_CET +DEFINE_IDTENTRY_ERRORCODE(exc_control_protection) +{ + if (user_mode(regs)) { + if (cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) + do_user_cp_fault(regs, error_code); + else + do_unexpected_cp(regs, error_code); + } else { + if (cpu_feature_enabled(X86_FEATURE_IBT)) + do_kernel_cp_fault(regs, error_code); + else + do_unexpected_cp(regs, error_code); + } +} +#endif /* CONFIG_X86_CET */ + #ifdef CONFIG_X86_F00F_BUG void handle_invalid_op(struct pt_regs *regs) #else diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c index a7d83c7800e4..e58d6cd30853 100644 --- a/arch/x86/xen/enlighten_pv.c +++ b/arch/x86/xen/enlighten_pv.c @@ -639,7 +639,7 @@ static struct trap_array_entry trap_array[] = { TRAP_ENTRY(exc_coprocessor_error, false ), TRAP_ENTRY(exc_alignment_check, false ), TRAP_ENTRY(exc_simd_coprocessor_error, false ), -#ifdef CONFIG_X86_KERNEL_IBT +#ifdef CONFIG_X86_CET TRAP_ENTRY(exc_control_protection, false ), #endif }; diff --git a/arch/x86/xen/xen-asm.S b/arch/x86/xen/xen-asm.S index 4a184f6e4e4d..7cdcb4ce6976 100644 --- a/arch/x86/xen/xen-asm.S +++ b/arch/x86/xen/xen-asm.S @@ -148,7 +148,7 @@ xen_pv_trap asm_exc_page_fault xen_pv_trap asm_exc_spurious_interrupt_bug xen_pv_trap asm_exc_coprocessor_error xen_pv_trap asm_exc_alignment_check -#ifdef CONFIG_X86_KERNEL_IBT +#ifdef CONFIG_X86_CET xen_pv_trap asm_exc_control_protection #endif #ifdef CONFIG_X86_MCE diff --git a/include/uapi/asm-generic/siginfo.h b/include/uapi/asm-generic/siginfo.h index ffbe4cec9f32..0f52d0ac47c5 100644 --- a/include/uapi/asm-generic/siginfo.h +++ b/include/uapi/asm-generic/siginfo.h @@ -242,7 +242,8 @@ typedef struct siginfo { #define SEGV_ADIPERR 7 /* Precise MCD exception */ #define SEGV_MTEAERR 8 /* Asynchronous ARM MTE error */ #define SEGV_MTESERR 9 /* Synchronous ARM MTE exception */ -#define NSIGSEGV 9 +#define SEGV_CPERR 10 /* Control protection fault */ +#define NSIGSEGV 10 /* * SIGBUS si_codes From patchwork Sat Dec 3 00:35:35 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 29163 Return-Path: Delivered-To: 
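From the userspace side, the only visible change is the new SIGSEGV si_code. A small, hedged example of a handler that distinguishes a control-protection fault from an ordinary segfault follows; the fallback #define is only needed while libc headers predate SEGV_CPERR, and nothing in the program deliberately raises a #CP:

#define _GNU_SOURCE
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#ifndef SEGV_CPERR
#define SEGV_CPERR 10	/* control protection fault, as defined by this patch */
#endif

static void segv_handler(int sig, siginfo_t *info, void *ucontext)
{
	(void)sig; (void)ucontext;

	/* write() is async-signal-safe, unlike printf() */
	if (info->si_code == SEGV_CPERR)
		write(2, "control-protection fault (#CP)\n", 31);
	else
		write(2, "ordinary SIGSEGV\n", 17);

	_exit(1);	/* do not return into the faulting context */
}

int main(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = segv_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);

	if (sigaction(SIGSEGV, &sa, NULL))
		perror("sigaction");

	puts("handler installed; SEGV_CPERR would indicate a shadow stack/IBT violation");
	return 0;
}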
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu , Christoph Hellwig Subject: [PATCH v4 08/39] x86/mm: Remove _PAGE_DIRTY from kernel RO pages Date: Fri, 2 Dec 2022 16:35:35 -0800 Message-Id: <20221203003606.6838-9-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221203003606.6838-1-rick.p.edgecombe@intel.com> References: <20221203003606.6838-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751151231548448060?= X-GMAIL-MSGID: =?utf-8?q?1751151231548448060?= From: Yu-cheng Yu New processors that support Shadow Stack regard Write=0,Dirty=1 PTEs as shadow stack pages. In normal cases, it can be helpful to create Write=1 PTEs as also Dirty=1 if HW dirty tracking is not needed, because if the Dirty bit is not already set the CPU has to set Dirty=1 when it the memory gets written to. This creates addiontal work for the CPU. So tradional wisdom was to simply set the Dirty bit whenever you didn't care about it. However, it was never really very helpful for read only kernel memory. When CR4.CET=1 and IA32_S_CET.SH_STK_EN=1, some instructions can write to such supervisor memory. The kernel does not set IA32_S_CET.SH_STK_EN, so avoiding kernel Write=0,Dirty=1 memory is not strictly needed for any functional reason. But having Write=0,Dirty=1 kernel memory doesn't have any functional benefit either, so to reduce ambiguity between shadow stack and regular Write=0 pages, removed Dirty=1 from any kernel Write=0 PTEs. Tested-by: Pengfei Xu Tested-by: John Allen Signed-off-by: Yu-cheng Yu Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Cc: "H. 
Peter Anvin" Cc: Kees Cook Cc: Thomas Gleixner Cc: Dave Hansen Cc: Christoph Hellwig Cc: Andy Lutomirski Cc: Ingo Molnar Cc: Borislav Petkov Cc: Peter Zijlstra Reviewed-by: Kees Cook --- v3: - Update commit log (Andrew Cooper, Peterz) v2: - Normalize PTE bit descriptions between patches arch/x86/include/asm/pgtable_types.h | 6 +++--- arch/x86/mm/pat/set_memory.c | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h index 447d4bee25c4..0646ad00178b 100644 --- a/arch/x86/include/asm/pgtable_types.h +++ b/arch/x86/include/asm/pgtable_types.h @@ -192,10 +192,10 @@ enum page_cache_mode { #define _KERNPG_TABLE (__PP|__RW| 0|___A| 0|___D| 0| 0| _ENC) #define _PAGE_TABLE_NOENC (__PP|__RW|_USR|___A| 0|___D| 0| 0) #define _PAGE_TABLE (__PP|__RW|_USR|___A| 0|___D| 0| 0| _ENC) -#define __PAGE_KERNEL_RO (__PP| 0| 0|___A|__NX|___D| 0|___G) -#define __PAGE_KERNEL_ROX (__PP| 0| 0|___A| 0|___D| 0|___G) +#define __PAGE_KERNEL_RO (__PP| 0| 0|___A|__NX| 0| 0|___G) +#define __PAGE_KERNEL_ROX (__PP| 0| 0|___A| 0| 0| 0|___G) #define __PAGE_KERNEL_NOCACHE (__PP|__RW| 0|___A|__NX|___D| 0|___G| __NC) -#define __PAGE_KERNEL_VVAR (__PP| 0|_USR|___A|__NX|___D| 0|___G) +#define __PAGE_KERNEL_VVAR (__PP| 0|_USR|___A|__NX| 0| 0|___G) #define __PAGE_KERNEL_LARGE (__PP|__RW| 0|___A|__NX|___D|_PSE|___G) #define __PAGE_KERNEL_LARGE_EXEC (__PP|__RW| 0|___A| 0|___D|_PSE|___G) #define __PAGE_KERNEL_WP (__PP|__RW| 0|___A|__NX|___D| 0|___G| __WP) diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c index 0db69514fe29..50e07e8493e0 100644 --- a/arch/x86/mm/pat/set_memory.c +++ b/arch/x86/mm/pat/set_memory.c @@ -2055,7 +2055,7 @@ int set_memory_nx(unsigned long addr, int numpages) int set_memory_ro(unsigned long addr, int numpages) { - return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_RW), 0); + return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_RW | _PAGE_DIRTY), 0); } int set_memory_rox(unsigned long addr, int numpages) From patchwork Sat Dec 3 00:35:36 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 29164 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp1142040wrr; Fri, 2 Dec 2022 16:39:23 -0800 (PST) X-Google-Smtp-Source: AA0mqf6+di5VQtNsXe7/0MKjr927SdwArg/3FJtWKWeK36wPdkh+oBKSD/skIMRlzPxsRv44gtci X-Received: by 2002:a17:906:9709:b0:7c0:cb51:887c with SMTP id k9-20020a170906970900b007c0cb51887cmr3336591ejx.620.1670027963826; Fri, 02 Dec 2022 16:39:23 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670027963; cv=none; d=google.com; s=arc-20160816; b=YFFGr6t+csqN5BiMGpcelTomiA4WZi7/y1D0bssc/ovT4Xzund5+VyOWlncnYPr86A 6FCM+9JY+D7v5/6Jo2ph2zT7VVJd3CSm6rujFJTqzLlvnyG5jjG25KPSjl0l3OFeboYF hFQrnRmZRZxKnMu3rEuU7ein2HDWU0BzNxx7qavpbpFPzQ9RSFAUPrmhoRsndtFuxZuc /8D0Jb4XlB5eL14YDp0PlAlGI0NIZr79RAvf+ZdurdK3RtoCRB++fn1Z/9qmA8nkLLj8 Ku+ZRghKTEUEq7pNkNvpfSaebmrFjpINuq2YFv3/Cvl8ZiRHj+X7B3Vxd20XgkGOacmA oJUA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=u+P+qa8kKTWg8AnGQCR7FIL5mFISVmI7FR/YQkztV68=; b=bNAO2WxJxdBzrT62Vn1DZbMjkeVZtJQvabWQ6W48zPUfPXPZTg5K9+ROhRMAoDrjWT 3sEkZ5Sk2og8LkM1PW2jQh2UHGvnrnSHkQ9WIPFDHLj7mhhsio2wpfa+fJrpP/z/woCb K7Q4FWv/0m2akKJeH4c+JQ20DtLBNdIY89O26OeMXEKOPTbMEDfBN5bDVjhh6nqxDDS/ 
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v4 09/39] x86/mm: Move pmd_write(), pud_write() up in the file Date: Fri, 2 Dec 2022 16:35:36 -0800 Message-Id: <20221203003606.6838-10-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221203003606.6838-1-rick.p.edgecombe@intel.com> References: <20221203003606.6838-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751151242091680838?= X-GMAIL-MSGID: =?utf-8?q?1751151242091680838?= From: Yu-cheng Yu To prepare the introduction of _PAGE_COW, move pmd_write() and pud_write() up in the file, so that they can be used by other helpers below. No functional changes. Tested-by: Pengfei Xu Tested-by: John Allen Reviewed-by: Kees Cook Signed-off-by: Yu-cheng Yu Reviewed-by: Kirill A. Shutemov Signed-off-by: Rick Edgecombe --- arch/x86/include/asm/pgtable.h | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 5059799bebe3..a1d6f121ee35 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -159,6 +159,18 @@ static inline int pte_write(pte_t pte) return pte_flags(pte) & _PAGE_RW; } +#define pmd_write pmd_write +static inline int pmd_write(pmd_t pmd) +{ + return pmd_flags(pmd) & _PAGE_RW; +} + +#define pud_write pud_write +static inline int pud_write(pud_t pud) +{ + return pud_flags(pud) & _PAGE_RW; +} + static inline int pte_huge(pte_t pte) { return pte_flags(pte) & _PAGE_PSE; @@ -1103,12 +1115,6 @@ extern int pmdp_clear_flush_young(struct vm_area_struct *vma, unsigned long address, pmd_t *pmdp); -#define pmd_write pmd_write -static inline int pmd_write(pmd_t pmd) -{ - return pmd_flags(pmd) & _PAGE_RW; -} - #define __HAVE_ARCH_PMDP_HUGE_GET_AND_CLEAR static inline pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm, unsigned long addr, pmd_t *pmdp) @@ -1138,12 +1144,6 @@ static inline void pmdp_set_wrprotect(struct mm_struct *mm, clear_bit(_PAGE_BIT_RW, (unsigned long *)pmdp); } -#define pud_write pud_write -static inline int pud_write(pud_t pud) -{ - return pud_flags(pud) & _PAGE_RW; -} - #ifndef pmdp_establish #define pmdp_establish pmdp_establish static inline pmd_t pmdp_establish(struct vm_area_struct *vma, From patchwork Sat Dec 3 00:35:37 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 29165 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp1142058wrr; Fri, 2 Dec 2022 16:39:28 -0800 (PST) X-Google-Smtp-Source: AA0mqf6za9af+J1AzeXjtlSLgqOzeHL9gkK3rNsYCbD43ZsB5/Mg8WMzlHTRjKc2DcNBteHNceco X-Received: by 2002:a17:907:6e23:b0:7c0:bd9a:b846 with SMTP id sd35-20020a1709076e2300b007c0bd9ab846mr5376718ejc.630.1670027967857; Fri, 02 Dec 2022 16:39:27 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; 
d="scan'208";a="787479803" Received: from bgordon1-mobl1.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.212.211.211]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2022 16:36:50 -0800 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v4 10/39] x86/mm: Introduce _PAGE_COW Date: Fri, 2 Dec 2022 16:35:37 -0800 Message-Id: <20221203003606.6838-11-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221203003606.6838-1-rick.p.edgecombe@intel.com> References: <20221203003606.6838-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751151246410611975?= X-GMAIL-MSGID: =?utf-8?q?1751151246410611975?= Some OSes have a greater dependence on software available bits in PTEs than Linux. That left the hardware architects looking for a way to represent a new memory type (shadow stack) within the existing bits. They chose to repurpose a lightly-used state: Write=0,Dirty=1. So in order to support shadow stack memory, Linux should avoid creating memory with this PTE bit combination unless it intends for it to be shadow stack. The reason it's lightly used is that Dirty=1 is normally set by HW _before_ a write. A write with a Write=0 PTE would typically only generate a fault, not set Dirty=1. Hardware can (rarely) both set Dirty=1 *and* generate the fault, resulting in a Write=0,Dirty=1 PTE. Hardware which supports shadow stacks will no longer exhibit this oddity. So that leaves Write=0,Dirty=1 PTEs created in software. To achieve this, in places where Linux normally creates Write=0,Dirty=1, it can use the software-defined _PAGE_COW in place of the hardware _PAGE_DIRTY. In other words, whenever Linux needs to create Write=0,Dirty=1, it instead creates Write=0,Cow=1 except for shadow stack, which is Write=0,Dirty=1. Further differentiated by VMA flags, these PTE bit combinations would be set as follows for various types of memory: (Write=0,Cow=1,Dirty=0): - A modified, copy-on-write (COW) page. Previously when a typical anonymous writable mapping was made COW via fork(), the kernel would mark it Write=0,Dirty=1. Now it will instead use the Cow bit. This happens in copy_present_pte(). - A R/O page that has been COW'ed. The user page is in a R/O VMA, and get_user_pages(FOLL_FORCE) needs a writable copy. 
The page fault handler creates a copy of the page and sets the new copy's PTE as Write=0 and Cow=1. - A shared shadow stack PTE. When a shadow stack page is being shared among processes (this happens at fork()), its PTE is made Dirty=0, so the next shadow stack access causes a fault, and the page is duplicated and Dirty=1 is set again. This is the COW equivalent for shadow stack pages, even though it's copy-on-access rather than copy-on-write. (Write=0,Cow=0,Dirty=1): - A shadow stack PTE. - A Cow PTE created when a processor without shadow stack support set Dirty=1. There are six bits left available to software in the 64-bit PTE after consuming a bit for _PAGE_COW. No space is consumed in 32-bit kernels because shadow stacks are not enabled there. This is a preparatory patch. Changes to actually start marking _PAGE_COW will follow once other pieces are in place. Tested-by: Pengfei Xu Tested-by: John Allen Co-developed-by: Yu-cheng Yu Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe Reviewed-by: Kees Cook --- v4: - Teach pte_flags_need_flush() about _PAGE_COW bit - Break apart patch for better bisectability v3: - Add comment around _PAGE_TABLE in response to comment from (Andrew Cooper) - Check for PSE in pmd_shstk (Andrew Cooper) - Get to the point quicker in commit log (Andrew Cooper) - Clarify and reorder commit log for why the PTE bit examples have multiple entries. Apply same changes for comment. (peterz) - Fix comment that implied dirty bit for COW was a specific x86 thing (peterz) - Fix swapping of Write/Dirty (PeterZ) v2: - Update commit log with comments (Dave Hansen) - Add comments in code to explain pte modification code better (Dave) - Clarify info on the meaning of various Write,Cow,Dirty combinations arch/x86/include/asm/pgtable.h | 78 ++++++++++++++++++++++++++++ arch/x86/include/asm/pgtable_types.h | 59 +++++++++++++++++++-- arch/x86/include/asm/tlbflush.h | 3 +- 3 files changed, 134 insertions(+), 6 deletions(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index a1d6f121ee35..ee5fbdc2615f 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -300,6 +300,44 @@ static inline pte_t pte_clear_flags(pte_t pte, pteval_t clear) return native_make_pte(v & ~clear); } +/* + * Normally COW memory can result in Dirty=1,Write=0 PTEs. But in the case + * of X86_FEATURE_USER_SHSTK, the software COW bit is used, since + * Dirty=1,Write=0 would result in the memory being treated as shadow stack + * by the HW. So when creating COW memory, a software bit, + * _PAGE_BIT_COW, is used instead. The following functions pte_mkcow() and pte_clear_cow() + * take a PTE marked conventionally COW (Dirty=1) and transition it to the + * shadow stack compatible version of COW (Cow=1). + */ + +static inline pte_t pte_mkcow(pte_t pte) +{ + if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) + return pte; + + pte = pte_clear_flags(pte, _PAGE_DIRTY); + return pte_set_flags(pte, _PAGE_COW); +} + +static inline pte_t pte_clear_cow(pte_t pte) +{ + /* + * _PAGE_COW is unnecessary on !X86_FEATURE_USER_SHSTK kernels. + * See the _PAGE_COW definition for more details. + */ + if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) + return pte; + + /* + * PTE is getting copied-on-write, so it will be dirtied + * if writable, or made shadow stack if shadow stack and + * being copied on access. Set the dirty bit for both + * cases.
+ */ + pte = pte_set_flags(pte, _PAGE_DIRTY); + return pte_clear_flags(pte, _PAGE_COW); +} + #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP static inline int pte_uffd_wp(pte_t pte) { @@ -396,6 +434,26 @@ static inline pmd_t pmd_clear_flags(pmd_t pmd, pmdval_t clear) return native_make_pmd(v & ~clear); } +/* See comments above pte_mkcow() */ +static inline pmd_t pmd_mkcow(pmd_t pmd) +{ + if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) + return pmd; + + pmd = pmd_clear_flags(pmd, _PAGE_DIRTY); + return pmd_set_flags(pmd, _PAGE_COW); +} + +/* See comments above pte_mkcow() */ +static inline pmd_t pmd_clear_cow(pmd_t pmd) +{ + if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) + return pmd; + + pmd = pmd_set_flags(pmd, _PAGE_DIRTY); + return pmd_clear_flags(pmd, _PAGE_COW); +} + #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP static inline int pmd_uffd_wp(pmd_t pmd) { @@ -467,6 +525,26 @@ static inline pud_t pud_clear_flags(pud_t pud, pudval_t clear) return native_make_pud(v & ~clear); } +/* See comments above pte_mkcow() */ +static inline pud_t pud_mkcow(pud_t pud) +{ + if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) + return pud; + + pud = pud_clear_flags(pud, _PAGE_DIRTY); + return pud_set_flags(pud, _PAGE_COW); +} + +/* See comments above pte_mkcow() */ +static inline pud_t pud_clear_cow(pud_t pud) +{ + if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) + return pud; + + pud = pud_set_flags(pud, _PAGE_DIRTY); + return pud_clear_flags(pud, _PAGE_COW); +} + static inline pud_t pud_mkold(pud_t pud) { return pud_clear_flags(pud, _PAGE_ACCESSED); diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h index 0646ad00178b..5c3f942865d9 100644 --- a/arch/x86/include/asm/pgtable_types.h +++ b/arch/x86/include/asm/pgtable_types.h @@ -21,7 +21,8 @@ #define _PAGE_BIT_SOFTW2 10 /* " */ #define _PAGE_BIT_SOFTW3 11 /* " */ #define _PAGE_BIT_PAT_LARGE 12 /* On 2MB or 1GB pages */ -#define _PAGE_BIT_SOFTW4 58 /* available for programmer */ +#define _PAGE_BIT_SOFTW4 57 /* available for programmer */ +#define _PAGE_BIT_SOFTW5 58 /* available for programmer */ #define _PAGE_BIT_PKEY_BIT0 59 /* Protection Keys, bit 1/4 */ #define _PAGE_BIT_PKEY_BIT1 60 /* Protection Keys, bit 2/4 */ #define _PAGE_BIT_PKEY_BIT2 61 /* Protection Keys, bit 3/4 */ @@ -34,6 +35,15 @@ #define _PAGE_BIT_SOFT_DIRTY _PAGE_BIT_SOFTW3 /* software dirty tracking */ #define _PAGE_BIT_DEVMAP _PAGE_BIT_SOFTW4 +/* + * Indicates a copy-on-write page. + */ +#ifdef CONFIG_X86_USER_SHADOW_STACK +#define _PAGE_BIT_COW _PAGE_BIT_SOFTW5 /* copy-on-write */ +#else +#define _PAGE_BIT_COW 0 +#endif + /* If _PAGE_BIT_PRESENT is clear, we use these: */ /* - if the user mapped it with PROT_NONE; pte_present gives true */ #define _PAGE_BIT_PROTNONE _PAGE_BIT_GLOBAL @@ -117,6 +127,40 @@ #define _PAGE_SOFTW4 (_AT(pteval_t, 0)) #endif +/* + * The hardware requires shadow stack to be read-only and Dirty. + * _PAGE_COW is a software-only bit used to separate copy-on-write PTEs + * from shadow stack PTEs: + * + * (Write=0,Cow=1,Dirty=0): + * - A modified, copy-on-write (COW) page. Previously when a typical + * anonymous writable mapping was made COW via fork(), the kernel would + * mark it Write=0,Dirty=1. Now it will instead use the Cow bit. This + * happens in copy_present_pte(). + * - A R/O page that has been COW'ed. The user page is in a R/O VMA, + * and get_user_pages(FOLL_FORCE) needs a writable copy. The page fault + * handler creates a copy of the page and sets the new copy's PTE as + * Write=0 and Cow=1. 
+ * - A shared shadow stack PTE. When a shadow stack page is being shared + * among processes (this happens at fork()), its PTE is made Dirty=0, so + * the next shadow stack access causes a fault, and the page is + * duplicated and Dirty=1 is set again. This is the COW equivalent for + * shadow stack pages, even though it's copy-on-access rather than + * copy-on-write. + * + * (Write=0,Cow=0,Dirty=1): + * - A shadow stack PTE. + * - A Cow PTE created when a processor without shadow stack support set + * Dirty=1. + */ +#ifdef CONFIG_X86_USER_SHADOW_STACK +#define _PAGE_COW (_AT(pteval_t, 1) << _PAGE_BIT_COW) +#else +#define _PAGE_COW (_AT(pteval_t, 0)) +#endif + +#define _PAGE_DIRTY_BITS (_PAGE_DIRTY | _PAGE_COW) + #define _PAGE_PROTNONE (_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE) /* @@ -186,12 +230,17 @@ enum page_cache_mode { #define PAGE_READONLY __pg(__PP| 0|_USR|___A|__NX| 0| 0| 0) #define PAGE_READONLY_EXEC __pg(__PP| 0|_USR|___A| 0| 0| 0| 0) -#define __PAGE_KERNEL (__PP|__RW| 0|___A|__NX|___D| 0|___G) -#define __PAGE_KERNEL_EXEC (__PP|__RW| 0|___A| 0|___D| 0|___G) -#define _KERNPG_TABLE_NOENC (__PP|__RW| 0|___A| 0|___D| 0| 0) -#define _KERNPG_TABLE (__PP|__RW| 0|___A| 0|___D| 0| 0| _ENC) +/* + * Page tables needs to have Write=1 in order for any lower PTEs to be + * writable. This includes shadow stack memory (Write=0, Dirty=1) + */ #define _PAGE_TABLE_NOENC (__PP|__RW|_USR|___A| 0|___D| 0| 0) #define _PAGE_TABLE (__PP|__RW|_USR|___A| 0|___D| 0| 0| _ENC) +#define _KERNPG_TABLE_NOENC (__PP|__RW| 0|___A| 0|___D| 0| 0) +#define _KERNPG_TABLE (__PP|__RW| 0|___A| 0|___D| 0| 0| _ENC) + +#define __PAGE_KERNEL (__PP|__RW| 0|___A|__NX|___D| 0|___G) +#define __PAGE_KERNEL_EXEC (__PP|__RW| 0|___A| 0|___D| 0|___G) #define __PAGE_KERNEL_RO (__PP| 0| 0|___A|__NX| 0| 0|___G) #define __PAGE_KERNEL_ROX (__PP| 0| 0|___A| 0| 0| 0|___G) #define __PAGE_KERNEL_NOCACHE (__PP|__RW| 0|___A|__NX|___D| 0|___G| __NC) diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h index 662598dea937..08d9b5fce4fd 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -283,7 +283,8 @@ static inline bool pte_flags_need_flush(unsigned long oldflags, const pteval_t flush_on_clear = _PAGE_DIRTY | _PAGE_PRESENT | _PAGE_ACCESSED; const pteval_t software_flags = _PAGE_SOFTW1 | _PAGE_SOFTW2 | - _PAGE_SOFTW3 | _PAGE_SOFTW4; + _PAGE_SOFTW3 | _PAGE_SOFTW4 | + _PAGE_COW; const pteval_t flush_on_change = _PAGE_RW | _PAGE_USER | _PAGE_PWT | _PAGE_PCD | _PAGE_PSE | _PAGE_GLOBAL | _PAGE_PAT | _PAGE_PAT_LARGE | _PAGE_PKEY_BIT0 | _PAGE_PKEY_BIT1 | From patchwork Sat Dec 3 00:35:38 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 29166 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp1142123wrr; Fri, 2 Dec 2022 16:39:43 -0800 (PST) X-Google-Smtp-Source: AA0mqf78ZZmgIC7aQia/6Un1cfRvXI0OwMwe7Ia8sWk5oiN1oOnRjK0kKjmLgFw6ngzkDbP2e4dv X-Received: by 2002:a17:906:c24d:b0:7ac:2e16:a8d2 with SMTP id bl13-20020a170906c24d00b007ac2e16a8d2mr51555523ejb.584.1670027982940; Fri, 02 Dec 2022 16:39:42 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670027982; cv=none; d=google.com; s=arc-20160816; b=sdcM8ViBl1YJCmZsUvtAFOp1wrps+liq/vpa/SlACWwPZBhh3LQklzPZ+ELJyNotb+ ksP4e8NkbuF/O8X1DYBqJBvz2tjGVonYlBkH4kVQk3HCy+DJj26Qp184XbuNqcbDH0Fo nFtLUbCsipAYS0pNr/84jNnnQ0pVLiwbQuqLUDDTWgX/XFmw8Jb0C5GZroS6lBUB/Ut0 
hE83f4DtfooI925EmrzqqSW5zYjlcVvU1kfzuZBvenKiJsgPIqfBqRTwW2aZyYY+A4HC jV/9Pmkzbx2WiK34I+banI4Caa187DPoLujqNElkY3JiXWdwEvbtmlfUQGR5rlbXAX3V toew== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=lU+0CnThAPH93LMRHhrbqzGYPcCfg0UH1GmNHKF9n6Q=; b=YlzhAIMACoUWcI/JbhnyQf4S5Kr3MFLQPO/mRUg6aDuAJxZGKi/5cRNsI8TY3PyBLP h3Dk1N7iPs4wL4yJc7StxMJ0OAfE8FAdvunQyYYf7//8SQLnuy0LJjz4ct4VY8yxivzt OxgOZqtYn2JflCYO9leoAD8XIv/uOtdrgPbGrjcib4dcAN6qGbn4NkHufpxr7QQmdNPX 2ALpuQ0VX1y1o1iiHbzlpv81NUvWvEpKdqbMdzjmSuPDRDosJkcBREkG60/HkyCYw93I Ai2++//UxVKoQhxfYcNpAWLIeBeJjv/iJGYuBqm8mu0ETlOYC7jizPZqYCjKlEHBmUAD whUg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=gjfbeWdt; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id e10-20020a17090658ca00b007be4dd9ef7csi7783946ejs.402.2022.12.02.16.39.20; Fri, 02 Dec 2022 16:39:42 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=gjfbeWdt; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235099AbiLCAii (ORCPT + 99 others); Fri, 2 Dec 2022 19:38:38 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41174 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235100AbiLCAhw (ORCPT ); Fri, 2 Dec 2022 19:37:52 -0500 Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 61CE3FB896; Fri, 2 Dec 2022 16:36:58 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1670027818; x=1701563818; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=VdMyMevhZAddtRomIPqld9oZXe9pcHrfJ+wDgFx8LIU=; b=gjfbeWdtx1oNZa+0OtnjeIbaXPdUTg6OfgceMuLjfFP6LiLnEX9GMsFr 7GSxW+hLK3uyM6hR2vYcs+py6P9nyHkiEIFNhvfbdSTsIqn7N53tYtK46 XOvq9ToE7VpPwQrifs/89GRKDaP5Kmv9D2lMJN/doVjsPxCyCbk9tWPIF DkYqr0VaLOUheF1MXC5+aOhGtC1N6neVWU8qaB8Hi7YRM0SQIdoooEW70 pvNUUYMpKzTDCSdT7KzddWDwLjMQPaKMfROWnmvSyzC2DYH+bJVD7TRJN PAuOGQmZo8ZX4yH4lBbLf/IXxVy9hjyIPHxRYIBzBcNz9UqxAY2DsokjO w==; X-IronPort-AV: E=McAfee;i="6500,9779,10549"; a="313710914" X-IronPort-AV: E=Sophos;i="5.96,213,1665471600"; d="scan'208";a="313710914" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2022 16:36:57 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10549"; a="787479818" X-IronPort-AV: E=Sophos;i="5.96,213,1665471600"; d="scan'208";a="787479818" Received: from bgordon1-mobl1.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.212.211.211]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2022 16:36:52 -0800 From: Rick Edgecombe To: 
x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v4 11/39] x86/mm: Update pte_modify for _PAGE_COW Date: Fri, 2 Dec 2022 16:35:38 -0800 Message-Id: <20221203003606.6838-12-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221203003606.6838-1-rick.p.edgecombe@intel.com> References: <20221203003606.6838-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751151262469412536?= X-GMAIL-MSGID: =?utf-8?q?1751151262469412536?= From: Yu-cheng Yu The Write=0,Dirty=1 PTE has been used to indicate copy-on-write pages. However, newer x86 processors also regard a Write=0,Dirty=1 PTE as a shadow stack page. In order to separate the two, the hardware _PAGE_DIRTY is replaced with the software-defined _PAGE_COW for the copy-on-write case, and the pte_*() helpers are updated to do this. pte_modify() takes a "raw" pgprot_t which was not necessarily created with any of the existing PTE bit helpers. That means it can return a pte_t with Write=0,Dirty=1, a shadow stack PTE, when it did not intend to create one. pte_modify() changes a PTE to 'newprot', but it doesn't use the pte_*() helpers. Modify it to also move _PAGE_DIRTY to _PAGE_COW. Apply the same changes to pmd_modify(). Tested-by: Pengfei Xu Tested-by: John Allen Signed-off-by: Yu-cheng Yu Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Reviewed-by: Kees Cook --- v4: - Fix an issue in soft-dirty test, where pte_modify() would detect _PAGE_COW in pte_dirty() and set the soft dirty bit in pte_mkdirty().
v2: - Update commit log with text and suggestions from (Dave Hansen) - Drop fixup_dirty_pte() in favor of clearing the HW dirty bit along with the _PAGE_CHG_MASK masking, then calling pte_mkdirty() (Dave Hansen) arch/x86/include/asm/pgtable.h | 40 +++++++++++++++++++++++++++++----- 1 file changed, 34 insertions(+), 6 deletions(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index ee5fbdc2615f..67bd2627c293 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -698,26 +698,54 @@ static inline u64 flip_protnone_guard(u64 oldval, u64 val, u64 mask); static inline pte_t pte_modify(pte_t pte, pgprot_t newprot) { + pteval_t _page_chg_mask_no_dirty = _PAGE_CHG_MASK & ~_PAGE_DIRTY; pteval_t val = pte_val(pte), oldval = val; + pte_t pte_result; /* * Chop off the NX bit (if present), and add the NX portion of * the newprot (if present): */ - val &= _PAGE_CHG_MASK; - val |= check_pgprot(newprot) & ~_PAGE_CHG_MASK; + val &= _page_chg_mask_no_dirty; + val |= check_pgprot(newprot) & ~_page_chg_mask_no_dirty; val = flip_protnone_guard(oldval, val, PTE_PFN_MASK); - return __pte(val); + + pte_result = __pte(val); + + /* + * Dirty bit is not preserved above so it can be done + * in a special way for the shadow stack case, where it + * needs to set _PAGE_COW. pte_mkcow() will do this in + * the case of shadow stack. + */ + if (pte_dirty(pte_result)) + pte_result = pte_mkcow(pte_result); + + return pte_result; } static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot) { + pteval_t _hpage_chg_mask_no_dirty = _HPAGE_CHG_MASK & ~_PAGE_DIRTY; pmdval_t val = pmd_val(pmd), oldval = val; + pmd_t pmd_result; - val &= _HPAGE_CHG_MASK; - val |= check_pgprot(newprot) & ~_HPAGE_CHG_MASK; + val &= _hpage_chg_mask_no_dirty; + val |= check_pgprot(newprot) & ~_hpage_chg_mask_no_dirty; val = flip_protnone_guard(oldval, val, PHYSICAL_PMD_PAGE_MASK); - return __pmd(val); + + pmd_result = __pmd(val); + + /* + * Dirty bit is not preserved above so it can be done + * in a special way for the shadow stack case, where it + * needs to set _PAGE_COW. pmd_mkcow() will do this in + * the case of shadow stack. 
+ */ + if (pmd_dirty(pmd_result)) + pmd_result = pmd_mkcow(pmd_result); + + return pmd_result; } /* From patchwork Sat Dec 3 00:35:39 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 29168 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp1142286wrr; Fri, 2 Dec 2022 16:40:15 -0800 (PST) X-Google-Smtp-Source: AA0mqf6HgRCpLtrQNTYZlq4egvnz1i5BmjoSB6XCwNnX3cfnmTIwthfqGa6qpjbNbi9EIOSpj8Mc X-Received: by 2002:a05:6402:3707:b0:467:6847:83d3 with SMTP id ek7-20020a056402370700b00467684783d3mr26153021edb.247.1670028015316; Fri, 02 Dec 2022 16:40:15 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670028015; cv=none; d=google.com; s=arc-20160816; b=L55cnOmweZk0fjggFrFgjpZrBOrQ4Q6tcwE/iMUzC1YrNQfxYCHTpYFR/YDHSmP6G7 HKEOeddJFVw34vAoNinmzJiDbDBjHguJI9hJNXCV4R6VN48gZNQsH5EeVx82HMQX9Cth 493AJVxUoPfZ4X1kmadnMfnuCmZP5bYr9CqkHyuYXptT1SWVxWnO8e6Xbz6fashFrXQV jh6ZknJOYE5usa2b9+OcSq+hw440rPHLJz1kL9Zh3fn1XTpG8nk/JRJ9MTzSzRCIuMoN V450vrD3CCNebObExh/BaK/EzYXjCq/xswHw8UDQ5WGAhJqA9A/WBMvzZ6WO6z7kqLcV In3A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=aR2IwIcBM4q2qnqTcEoPI9s+IKsCSRfLILwEqGZ+4FY=; b=iHPghgUVSFyxO20AOUThzPPTsj3L0qgYJbQA6ZEmky8ZJUb54KvO82jUqSteyGUIa8 LTGZNoqFr3orwFLuEj+rSbVM3sNghUr2RobRRTsZPkEt5eLt/q0AuctjWzvyRMZYGk3q 61kKtvuzj0ModVYPQZdv7Jr0zycY4ayrDGLbbI62SOsfaJEUNACwYFpbPP+q8xIxDwDN NDPaEw7cVnMpZT+x0b11bfiWryZyxe8taBxxyNZltYQ9a80t/ktwpUmX4X9NJ+MApfs0 QZ6LU3cxJFyGweHgsVae0TLXPz1/PsUXReJj9TH36ewccFT1Xs2Rza90dILXVOtzOFl3 MLMg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=amCcSKnC; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id m9-20020aa7d349000000b004615c5728e8si6682430edr.494.2022.12.02.16.39.52; Fri, 02 Dec 2022 16:40:15 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=amCcSKnC; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235019AbiLCAiu (ORCPT + 99 others); Fri, 2 Dec 2022 19:38:50 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41190 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235117AbiLCAhz (ORCPT ); Fri, 2 Dec 2022 19:37:55 -0500 Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 06337FB8B7; Fri, 2 Dec 2022 16:37:00 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1670027821; x=1701563821; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=mMUmCMQ2UpuIfMiP4n71ImLyKciupdTDYlrsPhVTKDs=; b=amCcSKnCNnD9eKsl8dqaLnknkLm6WjS38RqDJqNyuKmuvZ7mvlcXtx+R yG4l5Z20EWkQWPq5W86piPxsdzqj9JJ0DQmFKGpa9fBW2sSIS97xORznY 7HxIo5JF+Vv+wWxFqD7gfvhWm4FjY3KKCVvzaACb3dpAXvtu00TrFWl3i U5pvNDdrQExZZLuM4rFXG7jdKjmcd4vtPuRCxbRxba17PPL1a/NQ732hn XX0HiD/WLPcx71R37rq5+aLyfGh7S8DcDL3jf2HjmF+ju24lRNpLq1wNw CuXgZz7sA5f8Su1dI3x4RTNl9xIiRt5nKVGoEoleCmFEpI15PQH1k4j20 w==; X-IronPort-AV: E=McAfee;i="6500,9779,10549"; a="313710938" X-IronPort-AV: E=Sophos;i="5.96,213,1665471600"; d="scan'208";a="313710938" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2022 16:37:00 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10549"; a="787479830" X-IronPort-AV: E=Sophos;i="5.96,213,1665471600"; d="scan'208";a="787479830" Received: from bgordon1-mobl1.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.212.211.211]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2022 16:36:57 -0800 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . 
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v4 12/39] x86/mm: Update ptep_set_wrprotect() and pmdp_set_wrprotect() for transition from _PAGE_DIRTY to _PAGE_COW Date: Fri, 2 Dec 2022 16:35:39 -0800 Message-Id: <20221203003606.6838-13-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221203003606.6838-1-rick.p.edgecombe@intel.com> References: <20221203003606.6838-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751151296208181479?= X-GMAIL-MSGID: =?utf-8?q?1751151296208181479?= From: Yu-cheng Yu When Shadow Stack is in use, Write=0,Dirty=1 PTEs are reserved for shadow stack. Copy-on-write PTEs then have Write=0,Cow=1. When a PTE goes from Write=1,Dirty=1 to Write=0,Cow=1, it could become a transient shadow stack PTE in two cases: The first case is that some processors can start a write but end up seeing a Write=0 PTE by the time they get to the Dirty bit, creating a transient shadow stack PTE. However, this will not occur on processors supporting Shadow Stack, and a TLB flush is not necessary. The second case is that when _PAGE_DIRTY is replaced with _PAGE_COW non-atomically, a transient shadow stack PTE can be created as a result. Thus, prevent that with cmpxchg. In the case of pmdp_set_wrprotect(), for nopmd configs the ->pmd operated on does not exist and the logic would need to be different. Although the extra functionality will normally be optimized out when user shadow stacks are not configured, also exclude it in the preprocessor stage so that it will still compile. User shadow stack is not supported there by Linux anyway. Leave the cpu_feature_enabled() check so that the functionality also disables based on runtime detection of the feature. Dave Hansen, Jann Horn, Andy Lutomirski, and Peter Zijlstra provided many insights into the issue. Jann Horn provided the cmpxchg solution. Tested-by: Pengfei Xu Tested-by: John Allen Signed-off-by: Yu-cheng Yu Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Reviewed-by: Kees Cook --- v3: - Remove unnecessary #ifdef (Dave Hansen) v2: - Compile out some code due to clang build error - Clarify commit log (dhansen) - Normalize PTE bit descriptions between patches (dhansen) - Update comment with text from (dhansen) Yu-cheng v30: - Replace (pmdval_t) cast with CONFIG_PGTABLE_LEVELS > 2 (Borislav Petkov). arch/x86/include/asm/pgtable.h | 35 ++++++++++++++++++++++++++++++++++ 1 file changed, 35 insertions(+) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 67bd2627c293..b68428099932 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -1195,6 +1195,21 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm, static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, pte_t *ptep) { + /* + * Avoid accidentally creating shadow stack PTEs + * (Write=0,Dirty=1).
Use cmpxchg() to prevent races with + * the hardware setting Dirty=1. + */ + if (cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) { + pte_t old_pte, new_pte; + + old_pte = READ_ONCE(*ptep); + do { + new_pte = pte_wrprotect(old_pte); + } while (!try_cmpxchg(&ptep->pte, &old_pte.pte, new_pte.pte)); + + return; + } clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte); } @@ -1247,6 +1262,26 @@ static inline pud_t pudp_huge_get_and_clear(struct mm_struct *mm, static inline void pmdp_set_wrprotect(struct mm_struct *mm, unsigned long addr, pmd_t *pmdp) { +#ifdef CONFIG_X86_USER_SHADOW_STACK + /* + * If Shadow Stack is enabled, pmd_wrprotect() moves _PAGE_DIRTY + * to _PAGE_COW (see comments at pmd_wrprotect()). + * When a thread reads a RW=1, Dirty=0 PMD and before changing it + * to RW=0, Dirty=0, another thread could have written to the page + * and the PMD is RW=1, Dirty=1 now. + */ + if (cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) { + pmd_t old_pmd, new_pmd; + + old_pmd = READ_ONCE(*pmdp); + do { + new_pmd = pmd_wrprotect(old_pmd); + } while (!try_cmpxchg(&pmdp->pmd, &old_pmd.pmd, new_pmd.pmd)); + + return; + } +#endif + clear_bit(_PAGE_BIT_RW, (unsigned long *)pmdp); } From patchwork Sat Dec 3 00:35:40 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 29167 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp1142214wrr; Fri, 2 Dec 2022 16:39:58 -0800 (PST) X-Google-Smtp-Source: AA0mqf7+oots0valgwTW13C5eLd5bOS0UYApLcaM7TfO0GMBhHNwQPc0ANa3A2tNB5thSwMqvbBN X-Received: by 2002:a17:906:8e0a:b0:7b9:bef6:3eea with SMTP id rx10-20020a1709068e0a00b007b9bef63eeamr13469224ejc.487.1670027998717; Fri, 02 Dec 2022 16:39:58 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670027998; cv=none; d=google.com; s=arc-20160816; b=iOI8aVo0aAzQVgn4Iw8EuwcVpd6YSe3+rNafAz6qscaN+oQXN55higixR9DVu9gI0g VAxJ3acQPpEYgNquFwfLkiZst6gMaNXVlj1nRZyD6WtfbjG+2PbqZeRAACF4UezQVfW4 +hZKH/hM+KvjskVW1perhJVs1P3gAdP4Gmw6Nu2elA4UK7HrH1G+SJaRzqToRrUchr9c RryCIgTkTyf7rlMvP2nr4zJ0myDH+QvHqA+AIJXk6HQHKEgmk03eUcIKNp1dSOKwpxK7 f4193PoYV4XXI6iZoLCsayaNin3jd8eAJV0/mcOIFvftCBJzyQ6bqr+Sa2ak9+8EZAxK yaPg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=VsvIIjvPr3cEBisQsJ5YYViPV9gqjubxmF8YarrlHGs=; b=Z5sRQNEOgZGlu7i2Y8nrdcfW1Q5z9Lp7WKeGOoiSUIQE0t53KSc3pitXkZlPC9KVS7 Bjh8GAzb/408n7exHjTe+dRVPwzXSTK1illXXNMONVpChEZa8FLUFcO2iOUSpiUT7MN6 BogRBjULqgRrLZL6nGxWTORS5v/PYTUC6sej4E85x+xZbe51FKbbZxdfksDXE4p2F1N4 7YI2I2rRn4sClcIEkAWl0wR6dGi6KRBWxooMDrN8rNExD6G8BuG67hYW+A1jGZz7cgWP jToQFqJSoJLm5oMPhOv6pMimXUw2ki/yanO2h2gDPfWfyIlCnQkxBcs1ADvL1fQ5avbn MaOA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=UZloUitv; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id dr19-20020a170907721300b007bdba0ecae2si7522137ejc.944.2022.12.02.16.39.36; Fri, 02 Dec 2022 16:39:58 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=UZloUitv; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235077AbiLCAjA (ORCPT + 99 others); Fri, 2 Dec 2022 19:39:00 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41812 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235125AbiLCAiE (ORCPT ); Fri, 2 Dec 2022 19:38:04 -0500 Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8F604F8195; Fri, 2 Dec 2022 16:37:03 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1670027823; x=1701563823; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=kpMw19wz7f4XheX0HpHvGFy0c4NCHOFCF5F75RrJvt0=; b=UZloUitvpE/KjhPhTc2u9EYqWXi+DAH7auDpy1d5tlP5+NdBduu9f0/p 7dQjF5yCAgv0/tb0QCjF9UKDVMcNLXljTjnmzJikgSeXzdst9xokh1gnH E0AmdrRIJKWtc3/MfMqIVhIw26EHYnqdMSQpibQ9dzD2XSJba5YxewS4C gaPHy1FY6PbMcG4e6KSwMELN1j/wqrwPutOf7fujZE2zPc1exEjVyHYwu gGq6jWwvtOCfs6byTPfonG5jwCgeUUBE1o18NUTumxWSNTdGmRY1dEOo0 8C1vUvCF9wVLFqwsKUbOwLAcb2+7j5IE+anYIQ0a6t8kjE8h3UXNaaa4a Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10549"; a="313710962" X-IronPort-AV: E=Sophos;i="5.96,213,1665471600"; d="scan'208";a="313710962" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2022 16:37:02 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10549"; a="787479846" X-IronPort-AV: E=Sophos;i="5.96,213,1665471600"; d="scan'208";a="787479846" Received: from bgordon1-mobl1.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.212.211.211]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2022 16:37:00 -0800 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . 
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v4 13/39] x86/mm: Start actually marking _PAGE_COW Date: Fri, 2 Dec 2022 16:35:40 -0800 Message-Id: <20221203003606.6838-14-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221203003606.6838-1-rick.p.edgecombe@intel.com> References: <20221203003606.6838-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751151278328249517?= X-GMAIL-MSGID: =?utf-8?q?1751151278328249517?= The recently introduced _PAGE_COW should be used instead of the HW Dirty bit whenever a PTE is Write=0, in order to not inadvertently create shadow stack PTEs. Update pte_mk*() helpers to do this, and apply the same changes to pmd and pud. Tested-by: Pengfei Xu Tested-by: John Allen Co-developed-by: Yu-cheng Yu Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe Reviewed-by: Kees Cook --- v4: - Break part patch for better bisectability arch/x86/include/asm/pgtable.h | 133 ++++++++++++++++++++++++++++----- 1 file changed, 113 insertions(+), 20 deletions(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index b68428099932..4a149cec0c07 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -124,9 +124,17 @@ extern pmdval_t early_pmd_flags; * The following only work if pte_present() is true. * Undefined behaviour if not.. */ -static inline int pte_dirty(pte_t pte) +static inline bool pte_dirty(pte_t pte) { - return pte_flags(pte) & _PAGE_DIRTY; + return pte_flags(pte) & _PAGE_DIRTY_BITS; +} + +static inline bool pte_shstk(pte_t pte) +{ + if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) + return false; + + return (pte_flags(pte) & (_PAGE_RW | _PAGE_DIRTY)) == _PAGE_DIRTY; } static inline int pte_young(pte_t pte) @@ -134,9 +142,18 @@ static inline int pte_young(pte_t pte) return pte_flags(pte) & _PAGE_ACCESSED; } -static inline int pmd_dirty(pmd_t pmd) +static inline bool pmd_dirty(pmd_t pmd) { - return pmd_flags(pmd) & _PAGE_DIRTY; + return pmd_flags(pmd) & _PAGE_DIRTY_BITS; +} + +static inline bool pmd_shstk(pmd_t pmd) +{ + if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) + return false; + + return (pmd_flags(pmd) & (_PAGE_RW | _PAGE_DIRTY | _PAGE_PSE)) == + (_PAGE_DIRTY | _PAGE_PSE); } static inline int pmd_young(pmd_t pmd) @@ -144,9 +161,9 @@ static inline int pmd_young(pmd_t pmd) return pmd_flags(pmd) & _PAGE_ACCESSED; } -static inline int pud_dirty(pud_t pud) +static inline bool pud_dirty(pud_t pud) { - return pud_flags(pud) & _PAGE_DIRTY; + return pud_flags(pud) & _PAGE_DIRTY_BITS; } static inline int pud_young(pud_t pud) @@ -156,13 +173,21 @@ static inline int pud_young(pud_t pud) static inline int pte_write(pte_t pte) { - return pte_flags(pte) & _PAGE_RW; + /* + * Shadow stack pages are logically writable, but do not have + * _PAGE_RW. Check for them separately from _PAGE_RW itself. 
+ */ + return (pte_flags(pte) & _PAGE_RW) || pte_shstk(pte); } #define pmd_write pmd_write static inline int pmd_write(pmd_t pmd) { - return pmd_flags(pmd) & _PAGE_RW; + /* + * Shadow stack pages are logically writable, but do not have + * _PAGE_RW. Check for them separately from _PAGE_RW itself. + */ + return (pmd_flags(pmd) & _PAGE_RW) || pmd_shstk(pmd); } #define pud_write pud_write @@ -357,7 +382,7 @@ static inline pte_t pte_clear_uffd_wp(pte_t pte) static inline pte_t pte_mkclean(pte_t pte) { - return pte_clear_flags(pte, _PAGE_DIRTY); + return pte_clear_flags(pte, _PAGE_DIRTY_BITS); } static inline pte_t pte_mkold(pte_t pte) @@ -367,7 +392,16 @@ static inline pte_t pte_mkold(pte_t pte) static inline pte_t pte_wrprotect(pte_t pte) { - return pte_clear_flags(pte, _PAGE_RW); + pte = pte_clear_flags(pte, _PAGE_RW); + + /* + * Blindly clearing _PAGE_RW might accidentally create + * a shadow stack PTE (Write=0,Dirty=1). Move the hardware + * dirty value to the software bit. + */ + if (pte_dirty(pte)) + pte = pte_mkcow(pte); + return pte; } static inline pte_t pte_mkexec(pte_t pte) @@ -377,7 +411,19 @@ static inline pte_t pte_mkexec(pte_t pte) static inline pte_t pte_mkdirty(pte_t pte) { - return pte_set_flags(pte, _PAGE_DIRTY | _PAGE_SOFT_DIRTY); + pteval_t dirty = _PAGE_DIRTY; + + /* Avoid creating Dirty=1,Write=0 PTEs */ + if (cpu_feature_enabled(X86_FEATURE_USER_SHSTK) && !pte_write(pte)) + dirty = _PAGE_COW; + + return pte_set_flags(pte, dirty | _PAGE_SOFT_DIRTY); +} + +static inline pte_t pte_mkwrite_shstk(pte_t pte) +{ + /* pte_clear_cow() also sets Dirty=1 */ + return pte_clear_cow(pte); } static inline pte_t pte_mkyoung(pte_t pte) @@ -387,7 +433,12 @@ static inline pte_t pte_mkyoung(pte_t pte) static inline pte_t pte_mkwrite(pte_t pte) { - return pte_set_flags(pte, _PAGE_RW); + pte = pte_set_flags(pte, _PAGE_RW); + + if (pte_dirty(pte)) + pte = pte_clear_cow(pte); + + return pte; } static inline pte_t pte_mkhuge(pte_t pte) @@ -478,17 +529,36 @@ static inline pmd_t pmd_mkold(pmd_t pmd) static inline pmd_t pmd_mkclean(pmd_t pmd) { - return pmd_clear_flags(pmd, _PAGE_DIRTY); + return pmd_clear_flags(pmd, _PAGE_DIRTY_BITS); } static inline pmd_t pmd_wrprotect(pmd_t pmd) { - return pmd_clear_flags(pmd, _PAGE_RW); + pmd = pmd_clear_flags(pmd, _PAGE_RW); + /* + * Blindly clearing _PAGE_RW might accidentally create + * a shadow stack PMD (RW=0, Dirty=1). Move the hardware + * dirty value to the software bit. 
+ */ + if (pmd_dirty(pmd)) + pmd = pmd_mkcow(pmd); + return pmd; } static inline pmd_t pmd_mkdirty(pmd_t pmd) { - return pmd_set_flags(pmd, _PAGE_DIRTY | _PAGE_SOFT_DIRTY); + pmdval_t dirty = _PAGE_DIRTY; + + /* Avoid creating (HW)Dirty=1, Write=0 PMDs */ + if (cpu_feature_enabled(X86_FEATURE_USER_SHSTK) && !pmd_write(pmd)) + dirty = _PAGE_COW; + + return pmd_set_flags(pmd, dirty | _PAGE_SOFT_DIRTY); +} + +static inline pmd_t pmd_mkwrite_shstk(pmd_t pmd) +{ + return pmd_clear_cow(pmd); } static inline pmd_t pmd_mkdevmap(pmd_t pmd) @@ -508,7 +578,11 @@ static inline pmd_t pmd_mkyoung(pmd_t pmd) static inline pmd_t pmd_mkwrite(pmd_t pmd) { - return pmd_set_flags(pmd, _PAGE_RW); + pmd = pmd_set_flags(pmd, _PAGE_RW); + + if (pmd_dirty(pmd)) + pmd = pmd_clear_cow(pmd); + return pmd; } static inline pud_t pud_set_flags(pud_t pud, pudval_t set) @@ -552,17 +626,32 @@ static inline pud_t pud_mkold(pud_t pud) static inline pud_t pud_mkclean(pud_t pud) { - return pud_clear_flags(pud, _PAGE_DIRTY); + return pud_clear_flags(pud, _PAGE_DIRTY_BITS); } static inline pud_t pud_wrprotect(pud_t pud) { - return pud_clear_flags(pud, _PAGE_RW); + pud = pud_clear_flags(pud, _PAGE_RW); + + /* + * Blindly clearing _PAGE_RW might accidentally create + * a shadow stack PUD (RW=0, Dirty=1). Move the hardware + * dirty value to the software bit. + */ + if (pud_dirty(pud)) + pud = pud_mkcow(pud); + return pud; } static inline pud_t pud_mkdirty(pud_t pud) { - return pud_set_flags(pud, _PAGE_DIRTY | _PAGE_SOFT_DIRTY); + pudval_t dirty = _PAGE_DIRTY; + + /* Avoid creating (HW)Dirty=1, Write=0 PUDs */ + if (cpu_feature_enabled(X86_FEATURE_USER_SHSTK) && !pud_write(pud)) + dirty = _PAGE_COW; + + return pud_set_flags(pud, dirty | _PAGE_SOFT_DIRTY); } static inline pud_t pud_mkdevmap(pud_t pud) @@ -582,7 +671,11 @@ static inline pud_t pud_mkyoung(pud_t pud) static inline pud_t pud_mkwrite(pud_t pud) { - return pud_set_flags(pud, _PAGE_RW); + pud = pud_set_flags(pud, _PAGE_RW); + + if (pud_dirty(pud)) + pud = pud_clear_cow(pud); + return pud; } #ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY From patchwork Sat Dec 3 00:35:41 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 29170 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp1142352wrr; Fri, 2 Dec 2022 16:40:30 -0800 (PST) X-Google-Smtp-Source: AA0mqf7FqxUTzA9UFoQWqtZK9yF09mEA2oKVzN/blnRMoKAV3fp8lFYqxjA+FWIVvZZ86pNfKtag X-Received: by 2002:a17:906:fa98:b0:7c0:a8ff:3380 with SMTP id lt24-20020a170906fa9800b007c0a8ff3380mr10138406ejb.92.1670028029865; Fri, 02 Dec 2022 16:40:29 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670028029; cv=none; d=google.com; s=arc-20160816; b=dvRJAz95YejQ7b9590LJ52+7JzetHETyx6Vtyx1Ur0MYpEWsQfYvMld66uTrV6Vy26 ETXHgT9uG2bKNwAbHK5RllM4fUmiPSQfy82/F18mslkWMIF/s4xohjZqKF1CVp2Bfydy KDmd2vvsx10G740FXCLxowzuHb7ckGNLIX6JTv2WxZ0kkdeGij9iUX/9flRpcAnP6NmK 55JzBE8B22HyTzPc/j1VYjnwhstrZrZ226AOpJ+lC4ligOQ7C5BNwRMy02/ChuYtVCwF hKSOV6FqH3nrLRepfqrxK7dJnyYpJLIP6tgc5B/kupW3gEl7sSbDM8t1vha0GYjjTmtE XYfA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=TAdyPKl0YrWCKkqh8A7Y3gcggR6k9vOQgJyAAvPuKRg=; b=Ny/q9UXWs/me2oBrkISy2Wy/t2XAUTEBeSl7Wj4T2GLPeRKeVyfLOpHWM18I3htalc cZNETAJ6eLBiFf8JIMXa9txG9e7Qu4V9LrP8ziCNi7xLj9N0kNKniE4uyoVXmK5JFES8 
T5wQ9YQeW7SnQE/t5JOvAZQVKs1VMC2sOveRe7kSqHMCEXzW/g8Hy2DN48YJ4hNgvHR6 k1fPhgjKsgahvcrtedctYJc7wvLTnWcFJRRrjiDdT8DB1rJi7gBhwdeGQxsqHizlFXf8 q2UrS/tTEPqNx4Q/1b7v5TUp1tvsVFB+ZwwXD4RWfwi9CtmhhaFTrrJkZRUM9qOOXotZ g5FQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=FrVlq4W0; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id u24-20020aa7db98000000b0046356ff4d4esi558806edt.593.2022.12.02.16.40.07; Fri, 02 Dec 2022 16:40:29 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=FrVlq4W0; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234960AbiLCAjD (ORCPT + 99 others); Fri, 2 Dec 2022 19:39:03 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43224 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235079AbiLCAiQ (ORCPT ); Fri, 2 Dec 2022 19:38:16 -0500 Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EF118F7A20; Fri, 2 Dec 2022 16:37:08 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1670027828; x=1701563828; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=1FbLmykB0ip+0YSpvmtMRXYiZF0PQna5zAJx/OBtUAc=; b=FrVlq4W0/Bnk91a/svdqlZ5VSP3D2jXi+ztKT82zIp1bWAdVLMhuoNHC KJSp+FmTSmFHCtUrxVifeyVl7Zuy1rJNjQJDQzbfCHOiYwShXEPPcCdPB 7s7YZOy+M1ADim8qNo4oJA6zCdxZp+os9Gj6/9NmYfBeSz5/GBM8RjipH /pAMTgKIMHcgd6qB+t0c9PSvu2nGqmzMKUOTgJyboWwrWEOCBkpI9h5AA t/0vtMFyARpon0mKVOAJpQ3KxBhL5jjRVRk5b6GFZQ4jLiYy2OWlK/5qp Es5PnOOza6X/tYcLsezsZbZYYUOGM3smTLzissS+r/5iE+cDntqO/XWGj Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10549"; a="313710991" X-IronPort-AV: E=Sophos;i="5.96,213,1665471600"; d="scan'208";a="313710991" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2022 16:37:04 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10549"; a="787479865" X-IronPort-AV: E=Sophos;i="5.96,213,1665471600"; d="scan'208";a="787479865" Received: from bgordon1-mobl1.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.212.211.211]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2022 16:37:02 -0800 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . 
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu , Peter Xu Subject: [PATCH v4 14/39] mm: Move VM_UFFD_MINOR_BIT from 37 to 38 Date: Fri, 2 Dec 2022 16:35:41 -0800 Message-Id: <20221203003606.6838-15-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221203003606.6838-1-rick.p.edgecombe@intel.com> References: <20221203003606.6838-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751151311408260296?= X-GMAIL-MSGID: =?utf-8?q?1751151311408260296?= From: Yu-cheng Yu The x86 Control-flow Enforcement Technology (CET) feature includes a new type of memory called shadow stack. This shadow stack memory has some unusual properties, which requires some core mm changes to function properly. Future patches will introduce a new VM flag VM_SHADOW_STACK that will be VM_HIGH_ARCH_BIT_5. VM_HIGH_ARCH_BIT_1 through VM_HIGH_ARCH_BIT_4 are bits 32-36, and bit 37 is the unrelated VM_UFFD_MINOR_BIT. For the sake of order, make all VM_HIGH_ARCH_BITs stay together by moving VM_UFFD_MINOR_BIT from 37 to 38. This will allow VM_SHADOW_STACK to be introduced as 37. Tested-by: Pengfei Xu Tested-by: John Allen Reviewed-by: Kees Cook Acked-by: Peter Xu Signed-off-by: Yu-cheng Yu Reviewed-by: Axel Rasmussen Signed-off-by: Rick Edgecombe Cc: Peter Xu Cc: Mike Kravetz --- include/linux/mm.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index bfac5a166cb8..9ae6bc38d0b7 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -354,7 +354,7 @@ extern unsigned int kobjsize(const void *objp); #endif #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR -# define VM_UFFD_MINOR_BIT 37 +# define VM_UFFD_MINOR_BIT 38 # define VM_UFFD_MINOR BIT(VM_UFFD_MINOR_BIT) /* UFFD minor faults */ #else /* !CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */ # define VM_UFFD_MINOR VM_NONE From patchwork Sat Dec 3 00:35:42 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 29173 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp1142422wrr; Fri, 2 Dec 2022 16:40:48 -0800 (PST) X-Google-Smtp-Source: AA0mqf5thWGf31E00xfRyb+Bg16YG5yIHgpkH5WdfS+nPfzQCeGLNo2j8M5xkXNuWm9emEqbHe6f X-Received: by 2002:a17:906:5050:b0:7b2:8f2c:a877 with SMTP id e16-20020a170906505000b007b28f2ca877mr36090179ejk.90.1670028048561; Fri, 02 Dec 2022 16:40:48 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670028048; cv=none; d=google.com; s=arc-20160816; b=IOkhyqnwqLQJCmbrGsnN84pc2OMRxpVIcuY02JC1/onH0N99lc/MMVuLlQ2/Hk3ZnR JmWLeV9DCdgX/TssBIstMfmqcq7Enga14++m+m0z6HrS7Ld+QMG8B0yJvepnsbjcvtGU LSlSSgSZ4thX2xfBGKhrzXPE7ctkczqpJbk7UBWO9NdiDmlEaaH6mRJGVnM/Vi1jzyUJ scsJVWJtjm+79i5cftZCeLpu75fUMTPv3Zl0taFBEgD04LKh+iIuZ2eIVyrCOvhMwlkD s61ou5qEK6jhRctl9WzhCRA6W7Q/7easjJLRLjwawkFBlYoLpyVOxgRFCYxKgx6lxqBd Oksw== ARC-Message-Signature: i=1; a=rsa-sha256; 
c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=82MbcTbPfeyGqiJELrxIo7/pGq6Lnw3gQQiXTZFdsgM=; b=BniSGYL+SYU6iyH46p+FCW5cdIU22b6gzExPV0Ulg32wIVhNzGoLInUd9Z3PmhFYg2 WJhCBc6H+U1hxZBk6i2CkPirjG5Ajw6p2JCUECds8dtI404p+NRHx5YUDLlFWy0CtiGt BpK8txN9eBW2sH/eZt3ySNRGH9E1suVJpHpaCH9UTEK3WbA0db8n8+rGNXenY8hXAMGQ myGxwrBAhbpFsToClBpx+37BebVo4k3izGwLOUkIy7rLvJ17oLhFJAMaLV4D9t8jE8ju qbR1t3Z4SpP7/ZVpVzahaD7sbr1tmNUG4Lm68PS9W7oDz48SHzzWW5tPswigYeLqmBrn WMTg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=Ql96+Beg; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id bo28-20020a0564020b3c00b0046b807c1a03si1524621edb.77.2022.12.02.16.40.25; Fri, 02 Dec 2022 16:40:48 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=Ql96+Beg; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235161AbiLCAjU (ORCPT + 99 others); Fri, 2 Dec 2022 19:39:20 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41830 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235020AbiLCAib (ORCPT ); Fri, 2 Dec 2022 19:38:31 -0500 Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4B881FC2BB; Fri, 2 Dec 2022 16:37:14 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1670027834; x=1701563834; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=/8ygfG70rYB7SEC4SbN3orD5L3ZmTAS2ypn+csy1744=; b=Ql96+BegosplW52c00P7hjzptjjiv90ulfuDWJBg0CFGq1VeuprcR+be nW2IT3TSo1+bVP3YTaBeWHX2BtW7wFEmZ3ANIrlRVr7E69v8SFmwYMweh Ot7Pfp78GhBv47Z7QQ4GiiVsjD/1XkKuqbtlmD5f4nig4Ufw70T4ken67 XeCf7wFGTq2gcp2s0i2qxxqfxiKgA+HCpzwULdHNSl2O2vdLiML5UpBXG EfynPrY0JtjTzomHzF5WYmBjzWPcCQrC2aup3CExk16Y2eHl7ViRGDiK9 eLy4TNz4BXbBYKk1rY1dDCCgZFyMmum1cc0So2cr88wZSCyDYL7txquR/ g==; X-IronPort-AV: E=McAfee;i="6500,9779,10549"; a="313711017" X-IronPort-AV: E=Sophos;i="5.96,213,1665471600"; d="scan'208";a="313711017" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2022 16:37:07 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10549"; a="787479877" X-IronPort-AV: E=Sophos;i="5.96,213,1665471600"; d="scan'208";a="787479877" Received: from bgordon1-mobl1.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.212.211.211]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2022 16:37:04 -0800 From: Rick Edgecombe To: x86@kernel.org, "H . 
Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v4 15/39] mm: Introduce VM_SHADOW_STACK for shadow stack memory Date: Fri, 2 Dec 2022 16:35:42 -0800 Message-Id: <20221203003606.6838-16-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221203003606.6838-1-rick.p.edgecombe@intel.com> References: <20221203003606.6838-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751151330964225575?= X-GMAIL-MSGID: =?utf-8?q?1751151330964225575?= From: Yu-cheng Yu The x86 Control-flow Enforcement Technology (CET) feature includes a new type of memory called shadow stack. This shadow stack memory has some unusual properties, which requires some core mm changes to function properly. A shadow stack PTE must be read-only and have _PAGE_DIRTY set. However, read-only and Dirty PTEs also exist for copy-on-write (COW) pages. These two cases are handled differently for page faults. Introduce VM_SHADOW_STACK to track shadow stack VMAs. Tested-by: Pengfei Xu Tested-by: John Allen Signed-off-by: Yu-cheng Yu Reviewed-by: Kirill A. Shutemov Signed-off-by: Rick Edgecombe Cc: Kees Cook Reviewed-by: Kees Cook --- v3: - Drop arch specific change in arch_vma_name(). The memory can show as anonymous (Kirill) - Change CONFIG_ARCH_HAS_SHADOW_STACK to CONFIG_X86_USER_SHADOW_STACK in show_smap_vma_flags() (Boris) Documentation/filesystems/proc.rst | 1 + fs/proc/task_mmu.c | 3 +++ include/linux/mm.h | 8 ++++++++ 3 files changed, 12 insertions(+) diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst index 898c99eae8e4..05506dfa0480 100644 --- a/Documentation/filesystems/proc.rst +++ b/Documentation/filesystems/proc.rst @@ -560,6 +560,7 @@ encoded manner. 
The codes are the following: mt arm64 MTE allocation tags are enabled um userfaultfd missing tracking uw userfaultfd wr-protect tracking + ss shadow stack page == ======================================= Note that there is no guarantee that every flag and associated mnemonic will diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index fa3eea895210..9c17502e9746 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -703,6 +703,9 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma) #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR [ilog2(VM_UFFD_MINOR)] = "ui", #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */ +#ifdef CONFIG_X86_USER_SHADOW_STACK + [ilog2(VM_SHADOW_STACK)] = "ss", +#endif }; size_t i; diff --git a/include/linux/mm.h b/include/linux/mm.h index 9ae6bc38d0b7..b982c2749e7b 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -303,11 +303,13 @@ extern unsigned int kobjsize(const void *objp); #define VM_HIGH_ARCH_BIT_2 34 /* bit only usable on 64-bit architectures */ #define VM_HIGH_ARCH_BIT_3 35 /* bit only usable on 64-bit architectures */ #define VM_HIGH_ARCH_BIT_4 36 /* bit only usable on 64-bit architectures */ +#define VM_HIGH_ARCH_BIT_5 37 /* bit only usable on 64-bit architectures */ #define VM_HIGH_ARCH_0 BIT(VM_HIGH_ARCH_BIT_0) #define VM_HIGH_ARCH_1 BIT(VM_HIGH_ARCH_BIT_1) #define VM_HIGH_ARCH_2 BIT(VM_HIGH_ARCH_BIT_2) #define VM_HIGH_ARCH_3 BIT(VM_HIGH_ARCH_BIT_3) #define VM_HIGH_ARCH_4 BIT(VM_HIGH_ARCH_BIT_4) +#define VM_HIGH_ARCH_5 BIT(VM_HIGH_ARCH_BIT_5) #endif /* CONFIG_ARCH_USES_HIGH_VMA_FLAGS */ #ifdef CONFIG_ARCH_HAS_PKEYS @@ -323,6 +325,12 @@ extern unsigned int kobjsize(const void *objp); #endif #endif /* CONFIG_ARCH_HAS_PKEYS */ +#ifdef CONFIG_X86_USER_SHADOW_STACK +# define VM_SHADOW_STACK VM_HIGH_ARCH_5 +#else +# define VM_SHADOW_STACK VM_NONE +#endif + #if defined(CONFIG_X86) # define VM_PAT VM_ARCH_1 /* PAT reserves whole VMA at once (x86) */ #elif defined(CONFIG_PPC) From patchwork Sat Dec 3 00:35:43 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 29172 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp1142417wrr; Fri, 2 Dec 2022 16:40:48 -0800 (PST) X-Google-Smtp-Source: AA0mqf6kPHXpXbudoIYQW4dzO/FQ4i73B2mHE8Ozp2nefO/UiLFp/uEdY9yXes7MR3Azfi5j0KGj X-Received: by 2002:aa7:d417:0:b0:46b:203:f389 with SMTP id z23-20020aa7d417000000b0046b0203f389mr26807916edq.303.1670028048034; Fri, 02 Dec 2022 16:40:48 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670028048; cv=none; d=google.com; s=arc-20160816; b=SPz1Cj49BRj89Q+u6iklKx6/giwGP3xDHYZijZZy69nyxa99m29m7PdwcWF52NVGtV 0YSDPSkhUBn28wrC9+dI4/MF5J5w5PppUD50TOoiNpH/ZRUCn+9ZvKsC44EvRpq88k2m 0LcgCsow0Mx2pmxYq65IdhsjbIv2+eDM9/LJNJC20W4h40gkcM4+PAl5Xt1roPMeqhWR +264roV+d/NbNw4FyWYsmUBbc88pZw5aBWK/deTQpl71Cd3HMJEZtMLnXlxlIelhOQDv 3ogwJwVacHWJqqCi2I7SK4qoh+h4oJcrejxOtr5WnhewllxHahZBmgy7vvDBZgzYRdhK GAyA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=jQ2ub1Zd+S87JuwDoSuZjc6Rp4KCKe1O+p7UBx9N2so=; b=J1YAV6TPmZCsCCMm0SYmey/kUgjkfKb9tHKive72Q+m0iD53Tm2M5YawMXgLG6fDSp YDgTf6Skl3HK4oixt85FUqEvGfPrzsFIsBPEzWfLJReZlpYzXP1atbrOc3x+lAUAP6nD vn8tRy3dE4iMRAgbVvETUPcyrKcGBWuVBmnhrafukC5thUjkswWhM0qJsokjYYgJzgb/ GYSxMcAw6UjG3FPtezRiXXqRNZUwVyNzIQbJDwlsiw3m6oOo025c+IUAfDs5ZGIlqmlM 
mMDo018o7uo+yiblfMA039CZudqQWwoLyEjRXQMtTfjq0yoUV9y2tnjs+bUrwPtr0znc R4kA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=m4AGbYZr; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id br13-20020a170906d14d00b007c0bd0edd7esi2845912ejb.906.2022.12.02.16.40.25; Fri, 02 Dec 2022 16:40:48 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=m4AGbYZr; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235171AbiLCAje (ORCPT + 99 others); Fri, 2 Dec 2022 19:39:34 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44228 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235098AbiLCAii (ORCPT ); Fri, 2 Dec 2022 19:38:38 -0500 Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D68EEFCEB3; Fri, 2 Dec 2022 16:37:19 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1670027840; x=1701563840; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=47/VmWpEzBLHREjaOQMiegzkRhL0dQgFbtGsfMS6wRY=; b=m4AGbYZrW9vvbWywxI5oVFNfYsFtDcbn6xMF9CkNjmgeUuWKxAXJ/hw6 PeWnBc+uyvxYGLAaSNbAX7nhpQQLLaubC6F2yWpMmgyghahZoriDlZHUV MCd1DwA18kT9oN52v91qWKGR0J4q7mwZpXuywcotrcipYD34RHI/jHUMU +crExBgcK810gf2pVaWn58LbfMcTDZ94gSsv9lW0ZwONjVMD/z0ZwWZWA EfviuKc230Ln/KgzEwofkTV29rGi1OYwr6TQ3rXNRjJMUb8wyhu1lKMIi 6zhffN3N/DAgGKl0w2xbmXns3/VMHryzrJLkeRfpH69mVumfc+mU3MfP+ A==; X-IronPort-AV: E=McAfee;i="6500,9779,10549"; a="313711041" X-IronPort-AV: E=Sophos;i="5.96,213,1665471600"; d="scan'208";a="313711041" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2022 16:37:09 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10549"; a="787479882" X-IronPort-AV: E=Sophos;i="5.96,213,1665471600"; d="scan'208";a="787479882" Received: from bgordon1-mobl1.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.212.211.211]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2022 16:37:07 -0800 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . 
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v4 16/39] x86/mm: Check Shadow Stack page fault errors Date: Fri, 2 Dec 2022 16:35:43 -0800 Message-Id: <20221203003606.6838-17-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221203003606.6838-1-rick.p.edgecombe@intel.com> References: <20221203003606.6838-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751151330478741246?= X-GMAIL-MSGID: =?utf-8?q?1751151330478741246?= From: Yu-cheng Yu The CPU performs "shadow stack accesses" when it expects to encounter shadow stack mappings. These accesses can be implicit (via CALL/RET instructions) or explicit (instructions like WRSS). Shadow stacks accesses to shadow-stack mappings can see faults in normal, valid operation just like regular accesses to regular mappings. Shadow stacks need some of the same features like delayed allocation, swap and copy-on-write. The kernel needs to use faults to implement those features. The architecture has concepts of both shadow stack reads and shadow stack writes. Any shadow stack access to non-shadow stack memory will generate a fault with the shadow stack error code bit set. This means that, unlike normal write protection, the fault handler needs to create a type of memory that can be written to (with instructions that generate shadow stack writes), even to fulfill a read access. So in the case of COW memory, the COW needs to take place even with a shadow stack read. Otherwise the page will be left (shadow stack) writable in userspace. So to trigger the appropriate behavior, set FAULT_FLAG_WRITE for shadow stack accesses, even if the access was a shadow stack read. Shadow stack accesses can also result in errors, such as when a shadow stack overflows, or if a shadow stack access occurs to a non-shadow-stack mapping. Also, generate the errors for invalid shadow stack accesses. 
Tested-by: Pengfei Xu Tested-by: John Allen Reviewed-by: Kees Cook Signed-off-by: Yu-cheng Yu Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe --- v4: - Further improve comment talking about FAULT_FLAG_WRITE (Peterz) v3: - Improve comment talking about using FAULT_FLAG_WRITE (Peterz) v2: - Update commit log with verbiage/feedback from Dave Hansen - Clarify reasoning for FAULT_FLAG_WRITE for all shadow stack accesses - Update comments with some verbiage from Dave Hansen Yu-cheng v30: - Update Subject line and add a verb arch/x86/include/asm/trap_pf.h | 2 ++ arch/x86/mm/fault.c | 38 ++++++++++++++++++++++++++++++++++ 2 files changed, 40 insertions(+) diff --git a/arch/x86/include/asm/trap_pf.h b/arch/x86/include/asm/trap_pf.h index 10b1de500ab1..afa524325e55 100644 --- a/arch/x86/include/asm/trap_pf.h +++ b/arch/x86/include/asm/trap_pf.h @@ -11,6 +11,7 @@ * bit 3 == 1: use of reserved bit detected * bit 4 == 1: fault was an instruction fetch * bit 5 == 1: protection keys block access + * bit 6 == 1: shadow stack access fault * bit 15 == 1: SGX MMU page-fault */ enum x86_pf_error_code { @@ -20,6 +21,7 @@ enum x86_pf_error_code { X86_PF_RSVD = 1 << 3, X86_PF_INSTR = 1 << 4, X86_PF_PK = 1 << 5, + X86_PF_SHSTK = 1 << 6, X86_PF_SGX = 1 << 15, }; diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index 7b0d4ab894c8..3004ad044e9b 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -1138,8 +1138,22 @@ access_error(unsigned long error_code, struct vm_area_struct *vma) (error_code & X86_PF_INSTR), foreign)) return 1; + /* + * Shadow stack accesses (PF_SHSTK=1) are only permitted to + * shadow stack VMAs. All other accesses result in an error. + */ + if (error_code & X86_PF_SHSTK) { + if (unlikely(!(vma->vm_flags & VM_SHADOW_STACK))) + return 1; + if (unlikely(!(vma->vm_flags & VM_WRITE))) + return 1; + return 0; + } + if (error_code & X86_PF_WRITE) { /* write, present and write, not present: */ + if (unlikely(vma->vm_flags & VM_SHADOW_STACK)) + return 1; if (unlikely(!(vma->vm_flags & VM_WRITE))) return 1; return 0; @@ -1331,6 +1345,30 @@ void do_user_addr_fault(struct pt_regs *regs, perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address); + /* + * When a page becomes COW it changes from a shadow stack permissioned + * page (Write=0,Dirty=1) to (Write=0,Dirty=0,CoW=1), which is simply + * read-only to the CPU. When shadow stack is enabled, a RET would + * normally pop the shadow stack by reading it with a "shadow stack + * read" access. However, in the COW case the shadow stack memory does + * not have shadow stack permissions, it is read-only. So it will + * generate a fault. + * + * For conventionally writable pages, a read can be serviced with a + * read only PTE, and COW would not have to happen. But for shadow + * stack, there isn't the concept of read-only shadow stack memory. + * If it is shadow stack permissioned, it can be modified via CALL and + * RET instructions. So COW needs to happen before any memory can be + * mapped with shadow stack permissions. + * + * Shadow stack accesses (read or write) need to be serviced with + * shadow stack permissioned memory, so in the case of a shadow stack + * read access, treat it as a WRITE fault so both COW will happen and + * the write fault path will tickle maybe_mkwrite() and map the memory + * shadow stack. 
+ */ + if (error_code & X86_PF_SHSTK) flags |= FAULT_FLAG_WRITE; if (error_code & X86_PF_WRITE) flags |= FAULT_FLAG_WRITE; if (error_code & X86_PF_INSTR)
From patchwork Sat Dec 3 00:35:44 2022 X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 29174
From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A .
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v4 17/39] x86/mm: Update maybe_mkwrite() for shadow stack Date: Fri, 2 Dec 2022 16:35:44 -0800 Message-Id: <20221203003606.6838-18-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221203003606.6838-1-rick.p.edgecombe@intel.com> References: <20221203003606.6838-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751151337000188192?= X-GMAIL-MSGID: =?utf-8?q?1751151337000188192?= From: Yu-cheng Yu When serving a page fault, maybe_mkwrite() makes a PTE writable if there is a write access to it, and its vma has VM_WRITE. Shadow stack accesses to shadow stack vma's are also treated as write accesses by the fault handler. This is because setting shadow stack memory makes it writable via some instructions, so COW has to happen even for shadow stack reads. So maybe_mkwrite() should continue to set VM_WRITE vma's as normally writable, but also set VM_WRITE|VM_SHADOW_STACK vma's as shadow stack. Do this by adding a pte_mkwrite_shstk() and a cross-arch stub. Check for VM_SHADOW_STACK in maybe_mkwrite() and call pte_mkwrite_shstk() accordingly. Apply the same changes to maybe_pmd_mkwrite(). Tested-by: Pengfei Xu Tested-by: John Allen Signed-off-by: Yu-cheng Yu Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Cc: Kees Cook Reviewed-by: Kees Cook --- v3: - Remove unneeded define for maybe_mkwrite (Peterz) - Switch to cleaner version of maybe_mkwrite() (Peterz) v2: - Change to handle shadow stacks that are VM_WRITE|VM_SHADOW_STACK - Ditch arch specific maybe_mkwrite(), and make the code generic - Move do_anonymous_page() to next patch (Kirill) Yu-cheng v29: - Remove likely()'s. arch/x86/include/asm/pgtable.h | 2 ++ include/linux/mm.h | 13 ++++++++++--- include/linux/pgtable.h | 14 ++++++++++++++ mm/huge_memory.c | 10 +++++++--- 4 files changed, 33 insertions(+), 6 deletions(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 4a149cec0c07..e4530b39f378 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -420,6 +420,7 @@ static inline pte_t pte_mkdirty(pte_t pte) return pte_set_flags(pte, dirty | _PAGE_SOFT_DIRTY); } +#define pte_mkwrite_shstk pte_mkwrite_shstk static inline pte_t pte_mkwrite_shstk(pte_t pte) { /* pte_clear_cow() also sets Dirty=1 */ @@ -556,6 +557,7 @@ static inline pmd_t pmd_mkdirty(pmd_t pmd) return pmd_set_flags(pmd, dirty | _PAGE_SOFT_DIRTY); } +#define pmd_mkwrite_shstk pmd_mkwrite_shstk static inline pmd_t pmd_mkwrite_shstk(pmd_t pmd) { return pmd_clear_cow(pmd); diff --git a/include/linux/mm.h b/include/linux/mm.h index b982c2749e7b..f10797a1b236 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1004,12 +1004,19 @@ void free_compound_page(struct page *page); * servicing faults for write access. In the normal case, do always want * pte_mkwrite. 
But get_user_pages can cause write faults for mappings * that do not have writing enabled, when used by access_process_vm. + * + * If a vma is shadow stack (a type of writable memory), mark the pte shadow + * stack. */ static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma) { - if (likely(vma->vm_flags & VM_WRITE)) - pte = pte_mkwrite(pte); - return pte; + if (!(vma->vm_flags & VM_WRITE)) + return pte; + + if (vma->vm_flags & VM_SHADOW_STACK) + return pte_mkwrite_shstk(pte); + + return pte_mkwrite(pte); } vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page); diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 70e2a7e06a76..d8096578610a 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -524,6 +524,13 @@ static inline pte_t pte_sw_mkyoung(pte_t pte) #define pte_mk_savedwrite pte_mkwrite #endif +#ifndef pte_mkwrite_shstk +static inline pte_t pte_mkwrite_shstk(pte_t pte) +{ + return pte; +} +#endif + #ifndef pte_clear_savedwrite #define pte_clear_savedwrite pte_wrprotect #endif @@ -532,6 +539,13 @@ static inline pte_t pte_sw_mkyoung(pte_t pte) #define pmd_savedwrite pmd_write #endif +#ifndef pmd_mkwrite_shstk +static inline pmd_t pmd_mkwrite_shstk(pmd_t pmd) +{ + return pmd; +} +#endif + #ifndef pmd_mk_savedwrite #define pmd_mk_savedwrite pmd_mkwrite #endif diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 811d19b5c4f6..60451e588955 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -553,9 +553,13 @@ __setup("transparent_hugepage=", setup_transparent_hugepage); pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma) { - if (likely(vma->vm_flags & VM_WRITE)) - pmd = pmd_mkwrite(pmd); - return pmd; + if (!(vma->vm_flags & VM_WRITE)) + return pmd; + + if (vma->vm_flags & VM_SHADOW_STACK) + return pmd_mkwrite_shstk(pmd); + + return pmd_mkwrite(pmd); } #ifdef CONFIG_MEMCG From patchwork Sat Dec 3 00:35:45 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 29175 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp1142475wrr; Fri, 2 Dec 2022 16:41:04 -0800 (PST) X-Google-Smtp-Source: AA0mqf4JZ7y/yXw1RbGSBOp8QTYhT487HLWKNbv75mt+yePAU0A6+V60UGyhoxsOiQX8ufXLzzH9 X-Received: by 2002:a05:6402:10c4:b0:467:7827:232 with SMTP id p4-20020a05640210c400b0046778270232mr68321253edu.268.1670028064031; Fri, 02 Dec 2022 16:41:04 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670028064; cv=none; d=google.com; s=arc-20160816; b=IaK6MFik4ciCgf9u37+BB80S3qCS3Fs52ffFidTrK+XQV2o9ARogmBiJQ328Aotuqr 2VuPv3hzNIamebuYct7eDNdQ9afrLLQMrvlPweueRJGEls0ocuAkA87fR8QhVpFQ4jhv EuGi7uFkFKOxAVL0Zx8jx1hikq013eT3mvJmMFZGu4UtUWUkUxL1pc+L7ctm5vDVI0aa PWeMcdOlQJH/fI5dcMfOsTfg+5zfIYQoLU2UDlGbo1lwEjzhhn8B6tSWcRww0PE52GRn CIOxM/m8slgpp5FspJG+gdpSURFDQ+XMwp1C3lnyRbI0QeAwQ94BPm8fCU21ciChh5NM UQyg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=+rIRpWV7Hn8wRR72TTv6GnnkFnDDa1HlOJnEPDvvIkY=; b=LQ6CGhoDPwvJQYcCwDOX2g2OcZXXy5RlS/5RjRZd7RQIhqUZTmHTM4xUfY8zpg3u8x tAMzqLxSMTFP7HQinS73g3RMxp9GhZp5npkFwcPm1Wd5icn28fQgeJ7wxTDWyfYA2OSd C0BJAG6vxP1ykMgljO+s5+HUM8FK9Ww5uh8aUhZIfWLHyuJsKCnEr4wR5N9mke4c86TU 902MR4jVhJ2f+vGRVT25Ni95zqR+OZn64VdkJw473U+CAH2Daln1DkT6l8Rg8jigmvLd SF9G1sGiITn5Uybcwhc0IY/VW8cskepUZ627ZkSFlPgUTae9JvzNHpSGsUOx5j6L4y8S bUaA== ARC-Authentication-Results: i=1; 
From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A .
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v4 18/39] mm: Fixup places that call pte_mkwrite() directly Date: Fri, 2 Dec 2022 16:35:45 -0800 Message-Id: <20221203003606.6838-19-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221203003606.6838-1-rick.p.edgecombe@intel.com> References: <20221203003606.6838-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751151347228856331?= X-GMAIL-MSGID: =?utf-8?q?1751151347228856331?= From: Yu-cheng Yu The x86 Control-flow Enforcement Technology (CET) feature includes a new type of memory called shadow stack. This shadow stack memory has some unusual properties, which requires some core mm changes to function properly. With the introduction of shadow stack memory there are two ways a pte can be writable: regular writable memory and shadow stack memory. In past patches, maybe_mkwrite() has been updated to apply pte_mkwrite() or pte_mkwrite_shstk() depending on the VMA flag. This covers most cases where a PTE is made writable. However, there are places where pte_mkwrite() is called directly and the logic should now also create a shadow stack PTE in the case of a shadow stack VMA. - do_anonymous_page() and migrate_vma_insert_page() check VM_WRITE directly and call pte_mkwrite(). Teach it about pte_mkwrite_shstk() - When userfaultfd is creating a PTE after userspace handles the fault it calls pte_mkwrite() directly. Teach it about pte_mkwrite_shstk() To make the code cleaner, introduce is_shstk_write() which simplifies checking for VM_WRITE | VM_SHADOW_STACK together. In other cases where pte_mkwrite() is called directly, the VMA will not be VM_SHADOW_STACK, and so shadow stack memory should not be created. - In the case of pte_savedwrite(), shadow stack VMA's are excluded. - In the case of the "dirty_accountable" optimization in mprotect(), shadow stack VMA's won't be VM_SHARED, so it is not nessary. Tested-by: Pengfei Xu Tested-by: John Allen Signed-off-by: Yu-cheng Yu Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Cc: Kees Cook Reviewed-by: Kees Cook --- v3: - Restore do_anonymous_page() that accidetally moved commits (Kirill) - Open code maybe_mkwrite() cases from v2, so the behavior doesn't change to mark that non-writable PTEs dirty. (Nadav) v2: - Updated commit log with comment's from Dave Hansen - Dave also suggested (I understood) to maybe tweak vm_get_page_prot() to avoid having to call maybe_mkwrite(). After playing around with this I opted to *not* do this. Shadow stack memory memory is effectively writable, so having the default permissions be writable ended up mapping the zero page as writable and other surprises. So creating shadow stack memory needs to be done with manual logic like pte_mkwrite(). - Drop change in change_pte_range() because it couldn't actually trigger for shadow stack VMAs. - Clarify reasoning for skipped cases of pte_mkwrite(). 
Yu-cheng v25: - Apply same changes to do_huge_pmd_numa_page() as to do_numa_page(). arch/x86/include/asm/pgtable.h | 3 +++ arch/x86/mm/pgtable.c | 6 ++++++ include/linux/pgtable.h | 7 +++++++ mm/memory.c | 5 ++++- mm/migrate_device.c | 4 +++- mm/userfaultfd.c | 10 +++++++--- 6 files changed, 30 insertions(+), 5 deletions(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index e4530b39f378..a89dfa9174ae 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -918,6 +918,9 @@ static inline pgd_t pti_set_user_pgtbl(pgd_t *pgdp, pgd_t pgd) } #endif /* CONFIG_PAGE_TABLE_ISOLATION */ +#define is_shstk_write is_shstk_write +extern bool is_shstk_write(unsigned long vm_flags); + #endif /* __ASSEMBLY__ */ diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c index 8525f2876fb4..f0e536bea3ca 100644 --- a/arch/x86/mm/pgtable.c +++ b/arch/x86/mm/pgtable.c @@ -876,3 +876,9 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr) #endif /* CONFIG_X86_64 */ #endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */ + +bool is_shstk_write(unsigned long vm_flags) +{ + return (vm_flags & (VM_SHADOW_STACK | VM_WRITE)) == + (VM_SHADOW_STACK | VM_WRITE); +} diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index d8096578610a..b4a9d9936463 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -1586,6 +1586,13 @@ static inline bool arch_has_pfn_modify_check(void) } #endif /* !_HAVE_ARCH_PFN_MODIFY_ALLOWED */ +#ifndef is_shstk_write +static inline bool is_shstk_write(unsigned long vm_flags) +{ + return false; +} +#endif + /* * Architecture PAGE_KERNEL_* fallbacks * diff --git a/mm/memory.c b/mm/memory.c index 8a6d5c823f91..c02b6421241d 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -4128,7 +4128,10 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf) entry = mk_pte(page, vma->vm_page_prot); entry = pte_sw_mkyoung(entry); - if (vma->vm_flags & VM_WRITE) + + if (is_shstk_write(vma->vm_flags)) + entry = pte_mkwrite_shstk(pte_mkdirty(entry)); + else if (vma->vm_flags & VM_WRITE) entry = pte_mkwrite(pte_mkdirty(entry)); vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address, diff --git a/mm/migrate_device.c b/mm/migrate_device.c index 721b2365dbca..53d417683e01 100644 --- a/mm/migrate_device.c +++ b/mm/migrate_device.c @@ -645,7 +645,9 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate, goto abort; } entry = mk_pte(page, vma->vm_page_prot); - if (vma->vm_flags & VM_WRITE) + if (is_shstk_write(vma->vm_flags)) + entry = pte_mkwrite_shstk(pte_mkdirty(entry)); + else if (vma->vm_flags & VM_WRITE) entry = pte_mkwrite(pte_mkdirty(entry)); } diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index 3a8ff47943d5..1f6d102d069b 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -63,6 +63,7 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd, int ret; pte_t _dst_pte, *dst_pte; bool writable = dst_vma->vm_flags & VM_WRITE; + bool shstk = dst_vma->vm_flags & VM_SHADOW_STACK; bool vm_shared = dst_vma->vm_flags & VM_SHARED; bool page_in_cache = page_mapping(page); spinlock_t *ptl; @@ -83,9 +84,12 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd, writable = false; } - if (writable) - _dst_pte = pte_mkwrite(_dst_pte); - else + if (writable) { + if (shstk) + _dst_pte = pte_mkwrite_shstk(_dst_pte); + else + _dst_pte = pte_mkwrite(_dst_pte); + } else /* * We need this to make sure write bit removed; as mk_pte() * could return a pte with write bit set. 
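To illustrate the vm_flags test that the patch above adds, here is a standalone sketch (not kernel code); the flag values are stand-ins, with VM_SHADOW_STACK taken from the VM_HIGH_ARCH_BIT_5 definition earlier in the series, and a 64-bit build is assumed.

#include <stdbool.h>
#include <stdio.h>

#define VM_WRITE        (1ULL << 1)   /* matches the kernel's VM_WRITE value */
#define VM_SHADOW_STACK (1ULL << 37)  /* VM_HIGH_ARCH_BIT_5 in this series */

/* Same predicate as the is_shstk_write() helper added to arch/x86/mm/pgtable.c */
static bool is_shstk_write(unsigned long long vm_flags)
{
	return (vm_flags & (VM_SHADOW_STACK | VM_WRITE)) ==
	       (VM_SHADOW_STACK | VM_WRITE);
}

int main(void)
{
	printf("%d\n", is_shstk_write(VM_WRITE));                   /* 0: plain writable vma, pte_mkwrite() */
	printf("%d\n", is_shstk_write(VM_SHADOW_STACK));            /* 0: not VM_WRITE, stays read-only */
	printf("%d\n", is_shstk_write(VM_WRITE | VM_SHADOW_STACK)); /* 1: gets pte_mkwrite_shstk() */
	return 0;
}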
From patchwork Sat Dec 3 00:35:46 2022 X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 29181
From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v4 19/39] mm: Add guard pages around a shadow stack.
Date: Fri, 2 Dec 2022 16:35:46 -0800 Message-Id: <20221203003606.6838-20-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221203003606.6838-1-rick.p.edgecombe@intel.com> References: <20221203003606.6838-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751151411153984269?= X-GMAIL-MSGID: =?utf-8?q?1751151411153984269?=
From: Yu-cheng Yu
The x86 Control-flow Enforcement Technology (CET) feature includes a new type of memory called shadow stack. This shadow stack memory has some unusual properties, which requires some core mm changes to function properly.
The architecture of shadow stack constrains the ability of userspace to move the shadow stack pointer (SSP) in order to prevent corrupting or switching to other shadow stacks. The RSTORSSP instruction can move the SSP to different shadow stacks, but it requires a specially placed token in order to do this. However, the architecture does not prevent incrementing the stack pointer to wander onto an adjacent shadow stack. To prevent this in software, enforce guard pages at the beginning of shadow stack vmas, such that there will always be a gap between adjacent shadow stacks. Make the gap big enough so that no userspace SSP changing operations (besides RSTORSSP) can move the SSP from one stack to the next.
The SSP can be incremented or decremented by CALL, RET and INCSSP. CALL and RET can move the SSP by a maximum of 8 bytes, at which point the shadow stack would be accessed. The INCSSP instruction can also increment the shadow stack pointer. It is the shadow stack analog of an instruction like: addq $0x80, %rsp
However, there is one important difference between an ADD on %rsp and INCSSP. In addition to modifying SSP, INCSSP also reads from the memory of the first and last elements that were "popped". It can be thought of as acting like this:
READ_ONCE(ssp); // read+discard top element on stack
ssp += nr_to_pop * 8; // move the shadow stack
READ_ONCE(ssp-8); // read+discard last popped stack element
The maximum distance INCSSP can move the SSP is 2040 bytes (255 eight-byte entries), before it would read the memory. Therefore a single page gap will be enough to prevent any operation from shifting the SSP to an adjacent stack, since it would have to land in the gap at least once, causing a fault.
This could be accomplished by using VM_GROWSDOWN, but this has a downside. The behavior would allow shadow stacks to grow, which is unneeded and adds a strange difference to how most regular stacks work.
Tested-by: Pengfei Xu Tested-by: John Allen Reviewed-by: Kees Cook Signed-off-by: Yu-cheng Yu Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Cc: Kees Cook ---
v4:
- Drop references to 32 bit instructions
- Switch to generic code to drop __weak (Peterz)
v2:
- Use __weak instead of #ifdef (Dave Hansen)
- Only have start gap on shadow stack (Andy Luto)
- Create stack_guard_start_gap() to not duplicate code in an arch version of vm_start_gap() (Dave Hansen)
- Improve commit log partly with verbiage from Dave Hansen
Yu-cheng v25:
- Move SHADOW_STACK_GUARD_GAP to arch/x86/mm/mmap.c.
Yu-cheng v24: - Instead changing vm_*_gap(), create x86-specific versions. include/linux/mm.h | 31 ++++++++++++++++++++++++++----- 1 file changed, 26 insertions(+), 5 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index f10797a1b236..e0991d2fc5a8 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2821,15 +2821,36 @@ struct vm_area_struct *vma_lookup(struct mm_struct *mm, unsigned long addr) return mtree_load(&mm->mm_mt, addr); } +static inline unsigned long stack_guard_start_gap(struct vm_area_struct *vma) +{ + if (vma->vm_flags & VM_GROWSDOWN) + return stack_guard_gap; + + /* + * Shadow stack pointer is moved by CALL, RET, and INCSSPQ. + * INCSSPQ moves shadow stack pointer up to 255 * 8 = ~2 KB + * and touches the first and the last element in the range, which + * triggers a page fault if the range is not in a shadow stack. + * Because of this, creating 4-KB guard pages around a shadow + * stack prevents these instructions from going beyond. + * + * Creation of VM_SHADOW_STACK is tightly controlled, so a vma + * can't be both VM_GROWSDOWN and VM_SHADOW_STACK + */ + if (vma->vm_flags & VM_SHADOW_STACK) + return PAGE_SIZE; + + return 0; +} + static inline unsigned long vm_start_gap(struct vm_area_struct *vma) { + unsigned long gap = stack_guard_start_gap(vma); unsigned long vm_start = vma->vm_start; - if (vma->vm_flags & VM_GROWSDOWN) { - vm_start -= stack_guard_gap; - if (vm_start > vma->vm_start) - vm_start = 0; - } + vm_start -= gap; + if (vm_start > vma->vm_start) + vm_start = 0; return vm_start; } From patchwork Sat Dec 3 00:35:47 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 29179 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp1142706wrr; Fri, 2 Dec 2022 16:41:54 -0800 (PST) X-Google-Smtp-Source: AA0mqf66Q74bFUtC3gQHnn48SC4W++y/obGZQfdoxmXbDQgUqbn35VK3oo+ocSCnkK6ZJlTm1bcj X-Received: by 2002:a17:906:9f07:b0:7be:2fbd:828f with SMTP id fy7-20020a1709069f0700b007be2fbd828fmr26713993ejc.593.1670028114390; Fri, 02 Dec 2022 16:41:54 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670028114; cv=none; d=google.com; s=arc-20160816; b=xxVz4lS85/cNjDOHw65SSl5r7F0XD2TZW0nUGnEQeOhIH5vEKj9cEV26w++JeF5hXC HxSS1C8IlOzVmt7lIP8Je+a6KFTT7E9hmewW4UghPwpynSYrye2M4a8zCQcXJ9xdDr+Z dPuEiu9c0vpq8izl0iRS91LazfMlj1SUfMp62BOtrbwOiVs5wLZzCuiYpBD2/uzdMC63 Ge+UKyKBlmMqVVQnycB/UJ32R1wOnLE7XjxntnWHkgbK+ccyB0cd9l9tcVEoKpgJj3Me y8102elmlD/EQ0pIoEPqNIvUzLX80zNkXWImP042XwsAvNtN3a6rgfVi1oE/14ryKlzk 3ARw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=udV8Xc2KkKJNVzFok9LmZFYgqQbdMsOpmWfD5xkHaGg=; b=nkDP8s8yd52cA/IUzwFyC2CMwaQLaPZIS7T44tywucsNXsmXcROiFJbPJKxtRDu/cW I4W8H1Nab/uV62y7T+wqP9qFapd4/hmIYhjvAS8YRcMnBRFW8D2Xn3EJw812oc7/GwHI S2JlTwfQA7PFmKefwpHP7R1wVVMWlOYG6d9O1GkziRv+kr0KGRHkZRb9E+3ecRLVe6VE L2xurZmGqOnOPxah7q0JFyD2N18RcEIyxpBDdDO/aPgcw6EVghpIS2iYi3JUkfS/Ufol CNUipvhNGPLBj+YrV1uHNdM8NLpnzBLo+pBF1r7DhF3gcysRvrSqgXERtOjoyz6DK6pf OtAg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=XUwSQm9O; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from 
out1.vger.email
From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A .
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v4 20/39] mm/mmap: Add shadow stack pages to memory accounting Date: Fri, 2 Dec 2022 16:35:47 -0800 Message-Id: <20221203003606.6838-21-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221203003606.6838-1-rick.p.edgecombe@intel.com> References: <20221203003606.6838-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751151400066027092?= X-GMAIL-MSGID: =?utf-8?q?1751151400066027092?= From: Yu-cheng Yu The x86 Control-flow Enforcement Technology (CET) feature includes a new type of memory called shadow stack. This shadow stack memory has some unusual properties, which requires some core mm changes to function properly. Account shadow stack pages to stack memory. Tested-by: Pengfei Xu Tested-by: John Allen Signed-off-by: Yu-cheng Yu Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Cc: Kees Cook Reviewed-by: Kees Cook --- v3: - Remove unneeded VM_SHADOW_STACK check in accountable_mapping() (Kirill) v2: - Remove is_shadow_stack_mapping() and just change it to directly bitwise and VM_SHADOW_STACK. Yu-cheng v26: - Remove redundant #ifdef CONFIG_MMU. Yu-cheng v25: - Remove #ifdef CONFIG_ARCH_HAS_SHADOW_STACK for is_shadow_stack_mapping(). 
mm/mmap.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/mm/mmap.c b/mm/mmap.c index 146388feb72b..7ce04d2d5a10 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -3295,6 +3295,8 @@ void vm_stat_account(struct mm_struct *mm, vm_flags_t flags, long npages) mm->exec_vm += npages; else if (is_stack_mapping(flags)) mm->stack_vm += npages; + else if (flags & VM_SHADOW_STACK) + mm->stack_vm += npages; else if (is_data_mapping(flags)) mm->data_vm += npages; } From patchwork Sat Dec 3 00:35:48 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 29180 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp1142716wrr; Fri, 2 Dec 2022 16:41:59 -0800 (PST) X-Google-Smtp-Source: AA0mqf7u5csSZjtVubY17L2Su6yn+8r+uYenvktTfn9uKYNeqSKyY99T7xsc1lLQitGr5zANihmh X-Received: by 2002:a17:907:a04c:b0:7c0:98e5:f886 with SMTP id gz12-20020a170907a04c00b007c098e5f886mr13400804ejc.516.1670028118991; Fri, 02 Dec 2022 16:41:58 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670028118; cv=none; d=google.com; s=arc-20160816; b=z8mQtmwEZWrEbFZ6kBAOc5LTp69ovCiPgcH1iufCktNnSEJsNLEV8j83dYb1Pu6WdK JOjIIAvvdLOcGz0C6R/Qc2jJpF9KIwZEPFzwn7h4V2LTUCBYmgjTOTPIZ+92uQoY//BH aIKd+2p4o7I4xebOU1JkP4h1HvJ29amTHJTwGDu51CeGpmxjGZDk8K14XDw6D3HR+Jvz sND5vPqZRYSxwjfGREkrkLuYtmFWaz2viGdz7ZQQl1hNjwvtBLv6IZiVfjJ4BmmlzmF6 nSaLwms235NhnvaKHmCOcE3YgydUERybjkHznTKfkm6GEc2reqT3xIoaMTVNES/Ss9Tl e+Wg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=RuVeAKBdy5nPRl76k0Iqev2IB3RHW+4gMvSht8VTB/E=; b=H6h//bn/+lG4Od6sQ0sHImHxRDv5qKqwFR1WhYTgjc2uqnKrw8lecKptqmHKBtYhOM JBTF1NzpMS9QakyLj/4CD05Fq6WZBhWwAwftbMNBkKaRcc6SvySdop7qvAEoYPzyOel6 c/+ifxCOrIxjmVmNiYhQroNQDCbLqan1s0AKAilwJDTIzdjJ4FiqztptJ7GKh34my+Cq 9JvxeD9h58QRzx7S0rHhpTB66FA2gdYvdHyJ11cl6fYLssYB+Y8C5FVUIh4f9OEMBU1Y LjF0yosCZW0eeUoBzHqMDX6P79rWZ9J9RqedZcevQVOq9rlUCuoWEKElQbw2W7bUHIBu M6mw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=ReqmpWP1; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20])
From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A .
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v4 21/39] mm/mprotect: Exclude shadow stack from preserve_write Date: Fri, 2 Dec 2022 16:35:48 -0800 Message-Id: <20221203003606.6838-22-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221203003606.6838-1-rick.p.edgecombe@intel.com> References: <20221203003606.6838-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751151404742966865?= X-GMAIL-MSGID: =?utf-8?q?1751151404742966865?= From: Yu-cheng Yu The x86 Control-flow Enforcement Technology (CET) feature includes a new type of memory called shadow stack. This shadow stack memory has some unusual properties, which requires some core mm changes to function properly. In change_pte_range(), when a PTE is changed for prot_numa, _PAGE_RW is preserved to avoid the additional write fault after the NUMA hinting fault. However, pte_write() now includes both normal writable and shadow stack (Write=0, Dirty=1) PTEs, but the latter does not have _PAGE_RW and has no need to preserve it. Exclude shadow stack from preserve_write test, and apply the same change to change_huge_pmd(). Tested-by: Pengfei Xu Tested-by: John Allen Signed-off-by: Yu-cheng Yu Reviewed-by: Kirill A. Shutemov Signed-off-by: Rick Edgecombe Reviewed-by: Kees Cook --- v4: - Add "why" to comments in code (Peterz) Yu-cheng v25: - Move is_shadow_stack_mapping() to a separate line. Yu-cheng v24: - Change arch_shadow_stack_mapping() to is_shadow_stack_mapping(). mm/huge_memory.c | 8 ++++++++ mm/mprotect.c | 8 ++++++++ 2 files changed, 16 insertions(+) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 60451e588955..b6294c4ad471 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1803,6 +1803,14 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, return 0; preserve_write = prot_numa && pmd_write(*pmd); + + /* + * Preserve only normal writable huge PMD, but not shadow + * stack (RW=0, Dirty=1), so the restoring code doesn't + * make the shadow stack memory conventionally writable. + */ + if (vma->vm_flags & VM_SHADOW_STACK) + preserve_write = false; ret = 1; #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION diff --git a/mm/mprotect.c b/mm/mprotect.c index de7351631e21..026347f1f1ee 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -115,6 +115,14 @@ static unsigned long change_pte_range(struct mmu_gather *tlb, pte_t ptent; bool preserve_write = prot_numa && pte_write(oldpte); + /* + * Preserve only normal writable PTE, but not shadow + * stack (RW=0, Dirty=1), so the restoring code doesn't + * make the shadow stack memory conventionally writable. + */ + if (vma->vm_flags & VM_SHADOW_STACK) + preserve_write = false; + /* * Avoid trapping faults against the zero or KSM * pages. See similar comment in change_huge_pmd. 
From patchwork Sat Dec 3 00:35:49 2022 X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 29182
From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A .
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v4 22/39] mm: Re-introduce vm_flags to do_mmap() Date: Fri, 2 Dec 2022 16:35:49 -0800 Message-Id: <20221203003606.6838-23-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221203003606.6838-1-rick.p.edgecombe@intel.com> References: <20221203003606.6838-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751151434756221011?= X-GMAIL-MSGID: =?utf-8?q?1751151434756221011?= From: Yu-cheng Yu There was no more caller passing vm_flags to do_mmap(), and vm_flags was removed from the function's input by: commit 45e55300f114 ("mm: remove unnecessary wrapper function do_mmap_pgoff()"). There is a new user now. Shadow stack allocation passes VM_SHADOW_STACK to do_mmap(). Thus, re-introduce vm_flags to do_mmap(). Tested-by: Pengfei Xu Tested-by: John Allen Signed-off-by: Yu-cheng Yu Reviewed-by: Peter Collingbourne Reviewed-by: Kees Cook Reviewed-by: Kirill A. Shutemov Signed-off-by: Rick Edgecombe Cc: Andrew Morton Cc: Oleg Nesterov Cc: linux-mm@kvack.org --- fs/aio.c | 2 +- include/linux/mm.h | 3 ++- ipc/shm.c | 2 +- mm/mmap.c | 10 +++++----- mm/nommu.c | 4 ++-- mm/util.c | 2 +- 6 files changed, 12 insertions(+), 11 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index 5b2ff20ad322..66119297125a 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -554,7 +554,7 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events) ctx->mmap_base = do_mmap(ctx->aio_ring_file, 0, ctx->mmap_size, PROT_READ | PROT_WRITE, - MAP_SHARED, 0, &unused, NULL); + MAP_SHARED, 0, 0, &unused, NULL); mmap_write_unlock(mm); if (IS_ERR((void *)ctx->mmap_base)) { ctx->mmap_size = 0; diff --git a/include/linux/mm.h b/include/linux/mm.h index e0991d2fc5a8..7c10d2a7bbfd 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2731,7 +2731,8 @@ extern unsigned long mmap_region(struct file *file, unsigned long addr, struct list_head *uf); extern unsigned long do_mmap(struct file *file, unsigned long addr, unsigned long len, unsigned long prot, unsigned long flags, - unsigned long pgoff, unsigned long *populate, struct list_head *uf); + vm_flags_t vm_flags, unsigned long pgoff, unsigned long *populate, + struct list_head *uf); extern int do_mas_munmap(struct ma_state *mas, struct mm_struct *mm, unsigned long start, size_t len, struct list_head *uf, bool downgrade); diff --git a/ipc/shm.c b/ipc/shm.c index bd2fcc4d454e..1c5476bfec8b 100644 --- a/ipc/shm.c +++ b/ipc/shm.c @@ -1662,7 +1662,7 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg, goto invalid; } - addr = do_mmap(file, addr, size, prot, flags, 0, &populate, NULL); + addr = do_mmap(file, addr, size, prot, flags, 0, 0, &populate, NULL); *raddr = addr; err = 0; if (IS_ERR_VALUE(addr)) diff --git a/mm/mmap.c b/mm/mmap.c index 7ce04d2d5a10..a3ce13be0767 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1239,11 +1239,11 @@ static inline bool file_mmap_ok(struct file 
*file, struct inode *inode, */ unsigned long do_mmap(struct file *file, unsigned long addr, unsigned long len, unsigned long prot, - unsigned long flags, unsigned long pgoff, - unsigned long *populate, struct list_head *uf) + unsigned long flags, vm_flags_t vm_flags, + unsigned long pgoff, unsigned long *populate, + struct list_head *uf) { struct mm_struct *mm = current->mm; - vm_flags_t vm_flags; int pkey = 0; validate_mm(mm); @@ -1304,7 +1304,7 @@ unsigned long do_mmap(struct file *file, unsigned long addr, * to. we assume access permissions have been handled by the open * of the memory object, so we don't do any here. */ - vm_flags = calc_vm_prot_bits(prot, pkey) | calc_vm_flag_bits(flags) | + vm_flags |= calc_vm_prot_bits(prot, pkey) | calc_vm_flag_bits(flags) | mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC; if (flags & MAP_LOCKED) @@ -2877,7 +2877,7 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size, file = get_file(vma->vm_file); ret = do_mmap(vma->vm_file, start, size, - prot, flags, pgoff, &populate, NULL); + prot, flags, 0, pgoff, &populate, NULL); fput(file); out: mmap_write_unlock(mm); diff --git a/mm/nommu.c b/mm/nommu.c index 214c70e1d059..20ff1ec89091 100644 --- a/mm/nommu.c +++ b/mm/nommu.c @@ -1042,6 +1042,7 @@ unsigned long do_mmap(struct file *file, unsigned long len, unsigned long prot, unsigned long flags, + vm_flags_t vm_flags, unsigned long pgoff, unsigned long *populate, struct list_head *uf) @@ -1049,7 +1050,6 @@ unsigned long do_mmap(struct file *file, struct vm_area_struct *vma; struct vm_region *region; struct rb_node *rb; - vm_flags_t vm_flags; unsigned long capabilities, result; int ret; MA_STATE(mas, ¤t->mm->mm_mt, 0, 0); @@ -1069,7 +1069,7 @@ unsigned long do_mmap(struct file *file, /* we've determined that we can make the mapping, now translate what we * now know into VMA flags */ - vm_flags = determine_vm_flags(file, prot, flags, capabilities); + vm_flags |= determine_vm_flags(file, prot, flags, capabilities); /* we're going to need to record the mapping */ diff --git a/mm/util.c b/mm/util.c index 12984e76767e..aefe4fae7ecf 100644 --- a/mm/util.c +++ b/mm/util.c @@ -517,7 +517,7 @@ unsigned long vm_mmap_pgoff(struct file *file, unsigned long addr, if (!ret) { if (mmap_write_lock_killable(mm)) return -EINTR; - ret = do_mmap(file, addr, len, prot, flag, pgoff, &populate, + ret = do_mmap(file, addr, len, prot, flag, 0, pgoff, &populate, &uf); mmap_write_unlock(mm); userfaultfd_unmap_complete(mm, &uf); From patchwork Sat Dec 3 00:35:50 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 29183 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp1142906wrr; Fri, 2 Dec 2022 16:42:33 -0800 (PST) X-Google-Smtp-Source: AA0mqf7CuAhnT4Oqt4+PMWCzD3uJ4qYJYb7z0USZnWFcnmhNQqTlsojNX8nLH4ILdrw7O+RckKld X-Received: by 2002:a17:906:fa89:b0:7c0:bc68:c00a with SMTP id lt9-20020a170906fa8900b007c0bc68c00amr6091917ejb.665.1670028153787; Fri, 02 Dec 2022 16:42:33 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670028153; cv=none; d=google.com; s=arc-20160816; b=ZFcODKB10Mp01OTWrlXN8DfhlFI3/8EZO0dAwm8c9rRldL0Q4EoM4YbcB/ZLzoUwke e0+yh5I1yp0dP4h7hYSkqxyCWgzKQZb1wTVO48q/rcGSkKdKB5T4/oRfNG9mBTeTdCyC EELLjz+thW9NmbCNXO+bMZRkYZEq+pKGIwgPjSEFQ2wPx7kNHXRyJ7iBFwM3+zj4WyFG tZ1ILg75uNUnLqkYnpV8uPOFt6xVePLc0zCScKgQ5I60hi1vKMIWZbwkXXciYsI2LZY6 vRUrYMbjwfIk/t/7K9w+EzawOvxs2OuNSdU2rav+/ELh9mWJ05EXd/m3u+RZv2ITq7zH 
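To illustrate the re-introduced argument: a caller precomputes the extra VM_* bits and passes them straight into do_mmap(), which now ORs them into the flags it derives itself. The sketch below is modeled on the alloc_shstk() helper added later in this series; the wrapper name here is illustrative and locking/error handling are reduced to the essentials.

static unsigned long map_shadow_stack_area(unsigned long size)
{
	struct mm_struct *mm = current->mm;
	unsigned long addr, populate;

	mmap_write_lock(mm);
	/* Extra VM_* bits go in via the new vm_flags argument */
	addr = do_mmap(NULL, 0, size, PROT_READ, MAP_ANONYMOUS | MAP_PRIVATE,
		       VM_SHADOW_STACK | VM_WRITE, 0, &populate, NULL);
	mmap_write_unlock(mm);

	return addr;
}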
From: Rick Edgecombe To: x86@kernel.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org Cc: rick.p.edgecombe@intel.com
Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com Cc: rick.p.edgecombe@intel.com Subject: [PATCH v4 23/39] mm: Don't allow write GUPs to shadow stack memory Date: Fri, 2 Dec 2022 16:35:50 -0800 Message-Id: <20221203003606.6838-24-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221203003606.6838-1-rick.p.edgecombe@intel.com> References: <20221203003606.6838-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751151441105739152?= X-GMAIL-MSGID: =?utf-8?q?1751151441105739152?= The x86 Control-flow Enforcement Technology (CET) feature includes a new type of memory called shadow stack. This shadow stack memory has some unusual properties, which requires some core mm changes to function properly. Shadow stack memory is writable only in very specific, controlled ways. However, since it is writable, the kernel treats it as such. As a result there remain many ways for userspace to trigger the kernel to write to shadow stack's via get_user_pages(, FOLL_WRITE) operations. To make this a little less exposed, block writable GUPs for shadow stack VMAs. Still allow FOLL_FORCE to write through shadow stack protections, as it does for read-only protections. Tested-by: Pengfei Xu Tested-by: John Allen Signed-off-by: Rick Edgecombe Reviewed-by: Kees Cook --- v3: - Add comment in __pte_access_permitted() (Dave) - Remove unneeded shadow stack specific check in __pte_access_permitted() (Jann) arch/x86/include/asm/pgtable.h | 5 +++++ mm/gup.c | 2 +- 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index a89dfa9174ae..945d58681a87 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -1636,6 +1636,11 @@ static inline bool __pte_access_permitted(unsigned long pteval, bool write) { unsigned long need_pte_bits = _PAGE_PRESENT|_PAGE_USER; + /* + * Write=0,Dirty=1 PTEs are shadow stack, which the kernel + * shouldn't generally allow access to, but since they + * are already Write=0, the below logic covers both cases. 
+ */ if (write) need_pte_bits |= _PAGE_RW; diff --git a/mm/gup.c b/mm/gup.c index cdff87343884..75e8d3853ff3 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -1062,7 +1062,7 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags) return -EFAULT; if (write) { - if (!(vm_flags & VM_WRITE)) { + if (!(vm_flags & VM_WRITE) || (vm_flags & VM_SHADOW_STACK)) { if (!(gup_flags & FOLL_FORCE)) return -EFAULT; /* From patchwork Sat Dec 3 00:35:51 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 29184 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp1142920wrr; Fri, 2 Dec 2022 16:42:36 -0800 (PST) X-Google-Smtp-Source: AA0mqf6ucpi3GvlTbTqFtj0/ck828iQmyFOi8k5mJYYvlZco2jOohohMEZ2Lsr0aS40zhhh65I1w X-Received: by 2002:a05:6402:4c2:b0:461:3ae6:8bf2 with SMTP id n2-20020a05640204c200b004613ae68bf2mr12366324edw.396.1670028156093; Fri, 02 Dec 2022 16:42:36 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670028156; cv=none; d=google.com; s=arc-20160816; b=y9hePe7N+srVflOKfquUM/ei7aI4lNmukFJ786ci6mVk0T44ZjTfdMt8VOqAgmvcLT Lnfh6Iq1avtE8miO+xoKkgwVdIZsgPxqxFY//u/BuLQnQaVH6hTbZYgiuRU7nh6zirN4 n6lP+dSdHL0LIWXDsY9CHumNhE/31ZpJRfqM/cHYaHhs4XLq7j7X6RNQd3uscP4ctjJo yI8RFzv7Akfvb9kEQ9JeslMXfnqlxEynGXwZaRjjnE9sKm3QW2EEzX170qgy4HNAB9yp fQuMN8Am+P8yomw+LnJeUbqykM4KCXq9fpXRhf4/Wa7j3YqqRPbklBQF/WfxN910mJmZ AwdQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=jcY0rHrY/yCZl5CUB1EEdFWq5ffqnaDIfkwGGV24ND8=; b=k1YJmByudQfw37ANohbxXnCh7OiFgJ6pfFGZgZC9Ez4+UswD3ru1OfsTwfqf6Xb10Y gvpCk+qTA0RUCXPND0g/z+MNsgNRA2u3Zjrr18jJMkKcXvIIfmuCDN3S5M8rFYRfJHA7 DyiqzujXd6ecj3mTLyyHaw8rz40gO4WQn0cbUDyjCw1QeeeCyMsAxtHcz7tIY0eEYNqf mL2lQNg+y8RCYR7UW0HgNwxx6Ve3fiGRaogvssN7h1CUXDViB2A9V4NAWXVx9dNDOk2O gY9dDs/EVKXUUhl78whuMaso2OkGbqsc/6v0KwcOpNA7eoAcaY6p6C+b02t1R4Xnvnyr ytZA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=Ims0zhRT; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. 
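The mm/gup.c change above amounts to a small rule: a write GUP is refused when the VMA is either not writable or is a shadow stack VMA, unless the caller passed FOLL_FORCE. A simplified restatement of that rule (illustrative only; the real check_vma_flags() performs further checks after the FOLL_FORCE test, and the function name below is not the kernel's):

static bool write_gup_allowed(vm_flags_t vm_flags, unsigned int gup_flags)
{
	/* Shadow stack VMAs are treated like read-only VMAs for write GUPs */
	if (!(vm_flags & VM_WRITE) || (vm_flags & VM_SHADOW_STACK))
		return gup_flags & FOLL_FORCE;

	return true;
}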
From: Rick Edgecombe To: x86@kernel.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org Cc: rick.p.edgecombe@intel.com
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com Cc: rick.p.edgecombe@intel.com Subject: [PATCH v4 24/39] mm: Warn on shadow stack memory in wrong vma Date: Fri, 2 Dec 2022 16:35:51 -0800 Message-Id: <20221203003606.6838-25-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221203003606.6838-1-rick.p.edgecombe@intel.com> References: <20221203003606.6838-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751151443620152182?= X-GMAIL-MSGID: =?utf-8?q?1751151443620152182?= The x86 Control-flow Enforcement Technology (CET) feature includes a new type of memory called shadow stack. This shadow stack memory has some unusual properties, which requires some core mm changes to function properly. One sharp edge is that PTEs that are both Write=0 and Dirty=1 are treated as shadow by the CPU, but this combination used to be created by the kernel on x86. Previous patches have changed the kernel to now avoid creating these PTEs unless they are for shadow stack memory. In case any missed corners of the kernel are still creating PTEs like this for non-shadow stack memory, and to catch any re-introductions of the logic, warn if any shadow stack PTEs (Write=0, Dirty=1) are found in non-shadow stack VMAs when they are being zapped. This won't catch transient cases but should have decent coverage. It will be compiled out when shadow stack is not configured. In order to check if a pte is shadow stack in core mm code, add default implmentations for pte_shstk() and pmd_shstk(). 
Tested-by: Pengfei Xu Tested-by: John Allen Signed-off-by: Rick Edgecombe Reviewed-by: Kees Cook --- v3: - New patch arch/x86/include/asm/pgtable.h | 2 ++ include/linux/pgtable.h | 14 ++++++++++++++ mm/huge_memory.c | 2 ++ mm/memory.c | 2 ++ 4 files changed, 20 insertions(+) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 945d58681a87..519a266ace10 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -129,6 +129,7 @@ static inline bool pte_dirty(pte_t pte) return pte_flags(pte) & _PAGE_DIRTY_BITS; } +#define pte_shstk pte_shstk static inline bool pte_shstk(pte_t pte) { if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) @@ -147,6 +148,7 @@ static inline bool pmd_dirty(pmd_t pmd) return pmd_flags(pmd) & _PAGE_DIRTY_BITS; } +#define pmd_shstk pmd_shstk static inline bool pmd_shstk(pmd_t pmd) { if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index b4a9d9936463..34aa57941d57 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -531,6 +531,20 @@ static inline pte_t pte_mkwrite_shstk(pte_t pte) } #endif +#ifndef pte_shstk +static inline bool pte_shstk(pte_t pte) +{ + return false; +} +#endif + +#ifndef pmd_shstk +static inline bool pmd_shstk(pmd_t pte) +{ + return false; +} +#endif + #ifndef pte_clear_savedwrite #define pte_clear_savedwrite pte_wrprotect #endif diff --git a/mm/huge_memory.c b/mm/huge_memory.c index b6294c4ad471..64d81aa97a61 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1656,6 +1656,8 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, */ orig_pmd = pmdp_huge_get_and_clear_full(vma, addr, pmd, tlb->fullmm); + VM_WARN_ON_ONCE(!(vma->vm_flags & VM_SHADOW_STACK) && + pmd_shstk(orig_pmd)); tlb_remove_pmd_tlb_entry(tlb, pmd, addr); if (vma_is_special_huge(vma)) { if (arch_needs_pgtable_deposit()) diff --git a/mm/memory.c b/mm/memory.c index c02b6421241d..bae62b2d6696 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1437,6 +1437,8 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb, continue; ptent = ptep_get_and_clear_full(mm, addr, pte, tlb->fullmm); + VM_WARN_ON_ONCE(!(vma->vm_flags & VM_SHADOW_STACK) && + pte_shstk(ptent)); tlb_remove_tlb_entry(tlb, pte, addr); zap_install_uffd_wp_if_needed(vma, addr, pte, details, ptent); From patchwork Sat Dec 3 00:35:52 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 29185 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp1142993wrr; Fri, 2 Dec 2022 16:42:50 -0800 (PST) X-Google-Smtp-Source: AA0mqf7jhdn2DTXwSP7vGb4EO55hjt+v9Ldi8IVKYAukfh7soO8IoVdMhS+/uvwrSLaX0cEMB9jg X-Received: by 2002:a17:906:402:b0:7a6:fc0f:6fe6 with SMTP id d2-20020a170906040200b007a6fc0f6fe6mr59628159eja.694.1670028170495; Fri, 02 Dec 2022 16:42:50 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670028170; cv=none; d=google.com; s=arc-20160816; b=KrtCk0liUUCCcXfWfaX/qIKlAf91Yn22/YTdzUcwNyXGx5zPGhHsGg/QI5bsToSHEZ MFMraJNJE7moZMAr3aPoX0dmOuLaK2elu7k1surO4uXS1mxFiBjKoS1Mk3plalYsOW+W apq++iuX18IsWSk3zxLTOpbmhLDU5BABAjwdx6eH+9kXtD4xGYx8FTTCeY6UQ4osXH8n Dh2sazQTusOYHltyaK6YXMFI29h5Tg0NSDS1pmopH4q0rQ4GQ+1TJnGtnXAviA78iBbn V/gFqKPZWgu8B9m/qz3Ip+pPC31NVtopQarpGCOMn9c142ACvMI+QNTPjF6w2gEH4yZZ 9S2A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; 
From: Rick Edgecombe To: x86@kernel.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org Cc: rick.p.edgecombe@intel.com
Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com Cc: rick.p.edgecombe@intel.com Subject: [PATCH v4 25/39] x86: Introduce userspace API for shadow stack Date: Fri, 2 Dec 2022 16:35:52 -0800 Message-Id: <20221203003606.6838-26-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221203003606.6838-1-rick.p.edgecombe@intel.com> References: <20221203003606.6838-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751151459052009314?= X-GMAIL-MSGID: =?utf-8?q?1751151459052009314?= From: "Kirill A. Shutemov" Add three new arch_prctl() handles: - ARCH_SHSTK_ENABLE/DISABLE enables or disables the specified feature. Returns 0 on success or an error. - ARCH_SHSTK_LOCK prevents future disabling or enabling of the specified feature. Returns 0 on success or an error The features are handled per-thread and inherited over fork(2)/clone(2), but reset on exec(). This is preparation patch. It does not implement any features. Tested-by: Pengfei Xu Tested-by: John Allen Signed-off-by: Kirill A. Shutemov [tweaked with feedback from tglx] Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Reviewed-by: Kees Cook --- v4: - Remove references to CET and replace with shadow stack (Peterz) v3: - Move shstk.c Makefile changes earlier (Kees) - Add #ifdef around features_locked and features (Kees) - Encapsulate features reset earlier in reset_thread_features() so features and features_locked are not referenced in code that would be compiled !CONFIG_X86_USER_SHADOW_STACK. 
(Kees) - Fix typo in commit log (Kees) - Switch arch_prctl() numbers to avoid conflict with LAM v2: - Only allow one enable/disable per call (tglx) - Return error code like a normal arch_prctl() (Alexander Potapenko) - Make CET only (tglx) arch/x86/include/asm/processor.h | 5 ++++ arch/x86/include/asm/shstk.h | 21 +++++++++++++++ arch/x86/include/uapi/asm/prctl.h | 6 +++++ arch/x86/kernel/Makefile | 2 ++ arch/x86/kernel/process_64.c | 6 +++++ arch/x86/kernel/shstk.c | 44 +++++++++++++++++++++++++++++++ 6 files changed, 84 insertions(+) create mode 100644 arch/x86/include/asm/shstk.h create mode 100644 arch/x86/kernel/shstk.c diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index 4e35c66edeb7..ff1c0b1aca8c 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -475,6 +475,11 @@ struct thread_struct { */ u32 pkru; +#ifdef CONFIG_X86_USER_SHADOW_STACK + unsigned long features; + unsigned long features_locked; +#endif + /* Floating point and extended processor state */ struct fpu fpu; /* diff --git a/arch/x86/include/asm/shstk.h b/arch/x86/include/asm/shstk.h new file mode 100644 index 000000000000..58f9ee675be0 --- /dev/null +++ b/arch/x86/include/asm/shstk.h @@ -0,0 +1,21 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_X86_SHSTK_H +#define _ASM_X86_SHSTK_H + +#ifndef __ASSEMBLY__ +#include + +struct task_struct; + +#ifdef CONFIG_X86_USER_SHADOW_STACK +long shstk_prctl(struct task_struct *task, int option, unsigned long features); +void reset_thread_features(void); +#else +static inline long shstk_prctl(struct task_struct *task, int option, + unsigned long features) { return -EINVAL; } +static inline void reset_thread_features(void) {} +#endif /* CONFIG_X86_USER_SHADOW_STACK */ + +#endif /* __ASSEMBLY__ */ + +#endif /* _ASM_X86_SHSTK_H */ diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h index eb290d89cb32..8b427aea2345 100644 --- a/arch/x86/include/uapi/asm/prctl.h +++ b/arch/x86/include/uapi/asm/prctl.h @@ -20,9 +20,15 @@ #define ARCH_MAP_VDSO_32 0x2002 #define ARCH_MAP_VDSO_64 0x2003 +/* Don't use 0x3001-0x3004 because of old glibcs */ + #define ARCH_GET_UNTAG_MASK 0x4001 #define ARCH_ENABLE_TAGGED_ADDR 0x4002 #define ARCH_GET_MAX_TAG_BITS 0x4003 #define ARCH_FORCE_TAGGED_SVA 0x4004 +#define ARCH_SHSTK_ENABLE 0x5001 +#define ARCH_SHSTK_DISABLE 0x5002 +#define ARCH_SHSTK_LOCK 0x5003 + #endif /* _ASM_X86_PRCTL_H */ diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index 96d51bbc2bd4..2260891609b4 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -145,6 +145,8 @@ obj-$(CONFIG_CFI_CLANG) += cfi.o obj-$(CONFIG_CALL_THUNKS) += callthunks.o +obj-$(CONFIG_X86_USER_SHADOW_STACK) += shstk.o + ### # 64 bit specific files ifeq ($(CONFIG_X86_64),y) diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c index 084ec467dbb1..4ddd7d9209e1 100644 --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -514,6 +514,8 @@ start_thread_common(struct pt_regs *regs, unsigned long new_ip, load_gs_index(__USER_DS); } + reset_thread_features(); + loadsegment(fs, 0); loadsegment(es, _ds); loadsegment(ds, _ds); @@ -916,6 +918,10 @@ long do_arch_prctl_64(struct task_struct *task, int option, unsigned long arg2) return put_user(0, (unsigned long __user *)arg2); else return put_user(LAM_U57_BITS, (unsigned long __user *)arg2); + case ARCH_SHSTK_ENABLE: + case ARCH_SHSTK_DISABLE: + case ARCH_SHSTK_LOCK: + return shstk_prctl(task, option, arg2); 
default: ret = -EINVAL; break; diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c new file mode 100644 index 000000000000..41ed6552e0a5 --- /dev/null +++ b/arch/x86/kernel/shstk.c @@ -0,0 +1,44 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * shstk.c - Intel shadow stack support + * + * Copyright (c) 2021, Intel Corporation. + * Yu-cheng Yu + */ + +#include +#include +#include + +void reset_thread_features(void) +{ + current->thread.features = 0; + current->thread.features_locked = 0; +} + +long shstk_prctl(struct task_struct *task, int option, unsigned long features) +{ + if (option == ARCH_SHSTK_LOCK) { + task->thread.features_locked |= features; + return 0; + } + + /* Don't allow via ptrace */ + if (task != current) + return -EINVAL; + + /* Do not allow to change locked features */ + if (features & task->thread.features_locked) + return -EPERM; + + /* Only support enabling/disabling one feature at a time. */ + if (hweight_long(features) > 1) + return -EINVAL; + + if (option == ARCH_SHSTK_DISABLE) { + return -EINVAL; + } + + /* Handle ARCH_SHSTK_ENABLE */ + return -EINVAL; +} From patchwork Sat Dec 3 00:35:53 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 29186 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp1142998wrr; Fri, 2 Dec 2022 16:42:52 -0800 (PST) X-Google-Smtp-Source: AA0mqf4vom2aivcdy5DZUyGVJ8Q5lZhVhG7SY6Vqk5kY7Q1skuwiXA5kFBpvjiWrWG+PmxINKf/h X-Received: by 2002:a17:902:f812:b0:189:9a71:109b with SMTP id ix18-20020a170902f81200b001899a71109bmr21644806plb.171.1670028171959; Fri, 02 Dec 2022 16:42:51 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670028171; cv=none; d=google.com; s=arc-20160816; b=eczerViah3yWZGIt20qPNn1MWz7DJ1RjjhAuMWaAva9B+OqbkzeLWriJCudnpxtDO1 e/GikgdpE/CIfK5P+7ehGnx2rCQ2YB23nQApdnl5phob5iVu2DhLCD37cUraDvTrINKZ 2k7volAv2YT1OL2tl80B+mm318zIJsuAkbl1GXPIhC8Vh0GsmjXVD0Cgszy9ZWkE2iwG yVe1tiXaviWf4et5mqOyMe2ClB4VqgyZ0jw7i191uiUNSBLTQlBd9f2aasHiPYlLCdfb 7qxw1hINhUYELakBImdtdPdaqfudgp96X3je/ck/l+lRHP6VW4LFBnY8SZNZb3qPXHly 3B7Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=XGNkGuSK1plMLRwH5aRsfnsRLLazPR+OhkE/+qxYJGs=; b=H4DOt8OWzTOQ/WdlvtePf8HkITxp+cSVeStaE061rYAdGIfnwXmJ8A9/CSm5rRuf1P wVBmID807kRPtj1F3QD0R2hgHXpCri4ovkx/CNLVGqJfFNf4XUovxXT/2TFV1S3yVR7G vXd2Juu1j5dA9HuUIjn63qPph8DOttqxBU6bfMW+SQmU6t2UvDd6rXBcrSE/8X8J6efH pYCKj6jcyzlmJVaPF/Wy5JAThaWILEKxJH/OuFW4HqDK1zj8Ameo8Cl4fQo6q57Z9w9+ FZwqquCN2staDejc/ha5WeEk8qX4d8OYd/MUJS42rWhmuZ95rOlCkO5PpKoQ4zUs2Wdv efCg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b="H73p/5Br"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. 
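From userspace, the new handles are reached through the arch_prctl() syscall. A minimal sketch follows; the prctl values are the ones added above, ARCH_SHSTK_SHSTK is the feature bit defined in the next patch, and in this preparation patch ARCH_SHSTK_ENABLE still returns -EINVAL (in practice the loader, not main(), would normally do the enabling).

#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Values from this series; normally they come from <asm/prctl.h>. */
#define ARCH_SHSTK_ENABLE  0x5001
#define ARCH_SHSTK_DISABLE 0x5002
#define ARCH_SHSTK_LOCK    0x5003
#define ARCH_SHSTK_SHSTK   (1ULL << 0)

int main(void)
{
	/* Ask the kernel to enable a shadow stack for this thread. */
	if (syscall(SYS_arch_prctl, ARCH_SHSTK_ENABLE, ARCH_SHSTK_SHSTK)) {
		perror("ARCH_SHSTK_ENABLE");
		return 1;
	}

	/* Lock the feature so it can no longer be enabled or disabled. */
	if (syscall(SYS_arch_prctl, ARCH_SHSTK_LOCK, ARCH_SHSTK_SHSTK))
		perror("ARCH_SHSTK_LOCK");

	return 0;
}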
From: Rick Edgecombe To: x86@kernel.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, "Kirill A .
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v4 26/39] x86/shstk: Add user-mode shadow stack support Date: Fri, 2 Dec 2022 16:35:53 -0800 Message-Id: <20221203003606.6838-27-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221203003606.6838-1-rick.p.edgecombe@intel.com> References: <20221203003606.6838-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751151460452261101?= X-GMAIL-MSGID: =?utf-8?q?1751151460452261101?= From: Yu-cheng Yu Introduce basic shadow stack enabling/disabling/allocation routines. A task's shadow stack is allocated from memory with VM_SHADOW_STACK flag and has a fixed size of min(RLIMIT_STACK, 4GB). Keep the task's shadow stack address and size in thread_struct. This will be copied when cloning new threads, but needs to be cleared during exec, so add a function to do this. Do not support IA32 emulation or x32. Tested-by: Pengfei Xu Tested-by: John Allen Signed-off-by: Yu-cheng Yu Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Cc: Kees Cook Reviewed-by: Kees Cook --- v4: - Just set MSR_IA32_U_CET when disabling shadow stack, since we don't have IBT yet. (Peterz) v3: - Use define for set_clr_bits_msrl() (Kees) - Make some functions static (Kees) - Change feature_foo() to features_foo() (Kees) - Centralize shadow stack size rlimit checks (Kees) - Disable x32 support v2: - Get rid of unnessary shstk->base checks - Don't support IA32 emulation v1: - Switch to xsave helpers. - Expand commit log. arch/x86/include/asm/msr.h | 11 +++ arch/x86/include/asm/processor.h | 3 + arch/x86/include/asm/shstk.h | 7 ++ arch/x86/include/uapi/asm/prctl.h | 3 + arch/x86/kernel/shstk.c | 146 ++++++++++++++++++++++++++++++ 5 files changed, 170 insertions(+) diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h index 65ec1965cd28..a4b86eb537d6 100644 --- a/arch/x86/include/asm/msr.h +++ b/arch/x86/include/asm/msr.h @@ -310,6 +310,17 @@ void msrs_free(struct msr *msrs); int msr_set_bit(u32 msr, u8 bit); int msr_clear_bit(u32 msr, u8 bit); +/* Helper that can never get accidentally un-inlined. 
*/ +#define set_clr_bits_msrl(msr, set, clear) do { \ + u64 __val, __new_val; \ + \ + rdmsrl(msr, __val); \ + __new_val = (__val & ~(clear)) | (set); \ + \ + if (__new_val != __val) \ + wrmsrl(msr, __new_val); \ +} while (0) + #ifdef CONFIG_SMP int rdmsr_on_cpu(unsigned int cpu, u32 msr_no, u32 *l, u32 *h); int wrmsr_on_cpu(unsigned int cpu, u32 msr_no, u32 l, u32 h); diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index ff1c0b1aca8c..3c257a1a0757 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -28,6 +28,7 @@ struct vm86; #include #include #include +#include #include #include @@ -478,6 +479,8 @@ struct thread_struct { #ifdef CONFIG_X86_USER_SHADOW_STACK unsigned long features; unsigned long features_locked; + + struct thread_shstk shstk; #endif /* Floating point and extended processor state */ diff --git a/arch/x86/include/asm/shstk.h b/arch/x86/include/asm/shstk.h index 58f9ee675be0..f40414a982e8 100644 --- a/arch/x86/include/asm/shstk.h +++ b/arch/x86/include/asm/shstk.h @@ -8,12 +8,19 @@ struct task_struct; #ifdef CONFIG_X86_USER_SHADOW_STACK +struct thread_shstk { + u64 base; + u64 size; +}; + long shstk_prctl(struct task_struct *task, int option, unsigned long features); void reset_thread_features(void); +void shstk_free(struct task_struct *p); #else static inline long shstk_prctl(struct task_struct *task, int option, unsigned long features) { return -EINVAL; } static inline void reset_thread_features(void) {} +static inline void shstk_free(struct task_struct *p) {} #endif /* CONFIG_X86_USER_SHADOW_STACK */ #endif /* __ASSEMBLY__ */ diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h index 8b427aea2345..fc97ca7c4884 100644 --- a/arch/x86/include/uapi/asm/prctl.h +++ b/arch/x86/include/uapi/asm/prctl.h @@ -31,4 +31,7 @@ #define ARCH_SHSTK_DISABLE 0x5002 #define ARCH_SHSTK_LOCK 0x5003 +/* ARCH_SHSTK_ features bits */ +#define ARCH_SHSTK_SHSTK (1ULL << 0) + #endif /* _ASM_X86_PRCTL_H */ diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c index 41ed6552e0a5..64f2521cae23 100644 --- a/arch/x86/kernel/shstk.c +++ b/arch/x86/kernel/shstk.c @@ -8,14 +8,160 @@ #include #include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include #include +static bool features_enabled(unsigned long features) +{ + return current->thread.features & features; +} + +static void features_set(unsigned long features) +{ + current->thread.features |= features; +} + +static void features_clr(unsigned long features) +{ + current->thread.features &= ~features; +} + +static unsigned long alloc_shstk(unsigned long size) +{ + int flags = MAP_ANONYMOUS | MAP_PRIVATE; + struct mm_struct *mm = current->mm; + unsigned long addr, unused; + + mmap_write_lock(mm); + addr = do_mmap(NULL, addr, size, PROT_READ, flags, + VM_SHADOW_STACK | VM_WRITE, 0, &unused, NULL); + + mmap_write_unlock(mm); + + return addr; +} + +static unsigned long adjust_shstk_size(unsigned long size) +{ + if (size) + return PAGE_ALIGN(size); + + return PAGE_ALIGN(min_t(unsigned long long, rlimit(RLIMIT_STACK), SZ_4G)); +} + +static void unmap_shadow_stack(u64 base, u64 size) +{ + while (1) { + int r; + + r = vm_munmap(base, size); + + /* + * vm_munmap() returns -EINTR when mmap_lock is held by + * something else, and that lock should not be held for a + * long time. Retry it for the case. 
+ */ + if (r == -EINTR) { + cond_resched(); + continue; + } + + /* + * For all other types of vm_munmap() failure, either the + * system is out of memory or there is bug. + */ + WARN_ON_ONCE(r); + break; + } +} + +static int shstk_setup(void) +{ + struct thread_shstk *shstk = ¤t->thread.shstk; + unsigned long addr, size; + + /* Already enabled */ + if (features_enabled(ARCH_SHSTK_SHSTK)) + return 0; + + /* Also not supported for 32 bit and x32 */ + if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK) || in_32bit_syscall()) + return -EOPNOTSUPP; + + size = adjust_shstk_size(0); + addr = alloc_shstk(size); + if (IS_ERR_VALUE(addr)) + return PTR_ERR((void *)addr); + + fpregs_lock_and_load(); + wrmsrl(MSR_IA32_PL3_SSP, addr + size); + wrmsrl(MSR_IA32_U_CET, CET_SHSTK_EN); + fpregs_unlock(); + + shstk->base = addr; + shstk->size = size; + features_set(ARCH_SHSTK_SHSTK); + + return 0; +} + void reset_thread_features(void) { + memset(¤t->thread.shstk, 0, sizeof(struct thread_shstk)); current->thread.features = 0; current->thread.features_locked = 0; } +void shstk_free(struct task_struct *tsk) +{ + struct thread_shstk *shstk = &tsk->thread.shstk; + + if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK) || + !features_enabled(ARCH_SHSTK_SHSTK)) + return; + + if (!tsk->mm) + return; + + unmap_shadow_stack(shstk->base, shstk->size); +} + + +static int shstk_disable(void) +{ + if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) + return -EOPNOTSUPP; + + /* Already disabled? */ + if (!features_enabled(ARCH_SHSTK_SHSTK)) + return 0; + + fpregs_lock_and_load(); + /* Disable WRSS too when disabling shadow stack */ + wrmsrl(MSR_IA32_U_CET, 0); + wrmsrl(MSR_IA32_PL3_SSP, 0); + fpregs_unlock(); + + shstk_free(current); + features_clr(ARCH_SHSTK_SHSTK); + + return 0; +} + long shstk_prctl(struct task_struct *task, int option, unsigned long features) { if (option == ARCH_SHSTK_LOCK) { From patchwork Sat Dec 3 00:35:54 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 29187 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp1143104wrr; Fri, 2 Dec 2022 16:43:15 -0800 (PST) X-Google-Smtp-Source: AA0mqf6osI+wE061VkkCe+pUc2y/+SCs4YMjJgE4sPBsb5H2v2jnfi+lwVRONT9SM3PZxdEm+8Ls X-Received: by 2002:a17:903:2350:b0:189:8ea3:7455 with SMTP id c16-20020a170903235000b001898ea37455mr26626042plh.19.1670028195116; Fri, 02 Dec 2022 16:43:15 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670028195; cv=none; d=google.com; s=arc-20160816; b=eskSRQHBHaRaSBB1kVIEeGpj0qU8427ZgeUTzxP9UpXv3Le4lWYs0mS/jCAMm6xeSr eHjH4gA2zNj49KQd91rNBOeyTvynQ6GbPtRmgHrtnoAcgLuZ0vxxUMTqA5CG0oIyiwt9 1o63rjpkIdRYqmW/EGDYkwrCRMlBI9t9HmP7oE+UN27WZRbcKaxst+bFp5hO0abBQHaR +tkIP1+AlQbu2USGDujTEg0R4fmYbnY9gvhNU+dKX9wax/xb/4DX9/ouG5wwfRVAh538 vJvJbn18MmX9b9KjfLmlvymYnYkzyuPBcKE7qCmLqey4iZEE++cpWB7YdIaA56WlsGvT DzRQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=7aD77f3rF6759ZYyy9rjbECYaGyvHlcmEETjZAsnmBM=; b=LFH/Z2BJKUvLiklJjfnRX0jam0td3FaXi3BKgfuP7Ecom2fsTx2mxWrLFKo1uWMU0u fbGAO5z5mwj7YRYZ2O23yG9lUADqZg2ecxgiymOH/JjX8D8sTtMRhfIQokfWydD8njmF mDi8Cii6jAvRM9/hGu5bYICfz8IdYEpmeSaGijPIPZ7W3kZA2LicaBr6gT0PXbLbHePh pfOJ076VdYMe9HuSnuWByURVtFE/ZYD8lh2BfQgckyJzZoHoYIxPKpgLL43ebI+Q3Pqc Ob4e+O99TSmJcddvkkTioWoUf79+vLvN1O9IJfEVgs8Z0zobuKnf/zny9KCMD/CuQTjZ mMeQ== ARC-Authentication-Results: 
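The set_clr_bits_msrl() macro introduced with this patch is a read-modify-write helper for MSRs that skips the wrmsrl when the value is already what is wanted. Usage looks like the sketch below; MSR_IA32_U_CET and CET_SHSTK_EN are the names used in this patch, while the actual call sites for the macro are assumed to appear in later patches of the series.

/* Set the shadow stack enable bit in the user CET MSR, preserving other bits */
set_clr_bits_msrl(MSR_IA32_U_CET, CET_SHSTK_EN, 0);

/* Clear the same bit again; no wrmsrl is issued if it was already clear */
set_clr_bits_msrl(MSR_IA32_U_CET, 0, CET_SHSTK_EN);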
From: Rick Edgecombe To: x86@kernel.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, "Kirill A .
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v4 27/39] x86/shstk: Handle thread shadow stack Date: Fri, 2 Dec 2022 16:35:54 -0800 Message-Id: <20221203003606.6838-28-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221203003606.6838-1-rick.p.edgecombe@intel.com> References: <20221203003606.6838-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751151484473991922?= X-GMAIL-MSGID: =?utf-8?q?1751151484473991922?= From: Yu-cheng Yu When a process is duplicated, but the child shares the address space with the parent, there is potential for the threads sharing a single stack to cause conflicts for each other. In the normal non-cet case this is handled in two ways. With regular CLONE_VM a new stack is provided by userspace such that the parent and child have different stacks. For vfork, the parent is suspended until the child exits. So as long as the child doesn't return from the vfork()/CLONE_VFORK calling function and sticks to a limited set of operations, the parent and child can share the same stack. For shadow stack, these scenarios present similar sharing problems. For the CLONE_VM case, the child and the parent must have separate shadow stacks. Instead of changing clone to take a shadow stack, have the kernel just allocate one and switch to it. Use stack_size passed from clone3() syscall for thread shadow stack size. A compat-mode thread shadow stack size is further reduced to 1/4. This allows more threads to run in a 32-bit address space. The clone() does not pass stack_size, which was added to clone3(). In that case, use RLIMIT_STACK size and cap to 4 GB. For shadow stack enabled vfork(), the parent and child can share the same shadow stack, like they can share a normal stack. Since the parent is suspended until the child terminates, the child will not interfere with the parent while executing as long as it doesn't return from the vfork() and overwrite up the shadow stack. The child can safely overwrite down the shadow stack, as the parent can just overwrite this later. So CET does not add any additional limitations for vfork(). Userspace implementing posix vfork() can actually prevent the child from returning from the vfork() calling function, using CET. Glibc does this by adjusting the shadow stack pointer in the child, so that the child receives a #CP if it tries to return from vfork() calling function. Free the shadow stack on thread exit by doing it in mm_release(). Skip this when exiting a vfork() child since the stack is shared in the parent. During this operation, the shadow stack pointer of the new thread needs to be updated to point to the newly allocated shadow stack. Since the ability to do this is confined to the FPU subsystem, change fpu_clone() to take the new shadow stack pointer, and update it internally inside the FPU subsystem. This part was suggested by Thomas Gleixner. 
Tested-by: Pengfei Xu Tested-by: John Allen Suggested-by: Thomas Gleixner Signed-off-by: Yu-cheng Yu Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Reviewed-by: Kees Cook --- v3: - Fix update_fpu_shstk() stub (Mike Rapoport) - Fix chunks around alloc_shstk() in wrong patch (Kees) - Fix stack_size/flags swap (Kees) - Use centalized stack size logic (Kees) v2: - Have fpu_clone() take new shadow stack pointer and update SSP in xsave buffer for new task. (tglx) v1: - Expand commit log. - Add more comments. - Switch to xsave helpers. Yu-cheng v30: - Update comments about clone()/clone3(). (Borislav Petkov) arch/x86/include/asm/fpu/sched.h | 3 +- arch/x86/include/asm/mmu_context.h | 2 ++ arch/x86/include/asm/shstk.h | 7 +++++ arch/x86/kernel/fpu/core.c | 41 +++++++++++++++++++++++++++- arch/x86/kernel/process.c | 18 +++++++++++- arch/x86/kernel/shstk.c | 44 ++++++++++++++++++++++++++++-- 6 files changed, 110 insertions(+), 5 deletions(-) diff --git a/arch/x86/include/asm/fpu/sched.h b/arch/x86/include/asm/fpu/sched.h index b2486b2cbc6e..54c9c2fd1907 100644 --- a/arch/x86/include/asm/fpu/sched.h +++ b/arch/x86/include/asm/fpu/sched.h @@ -11,7 +11,8 @@ extern void save_fpregs_to_fpstate(struct fpu *fpu); extern void fpu__drop(struct fpu *fpu); -extern int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal); +extern int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal, + unsigned long shstk_addr); extern void fpu_flush_thread(void); /* diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h index 32483ab3f763..8cafa3ccc027 100644 --- a/arch/x86/include/asm/mmu_context.h +++ b/arch/x86/include/asm/mmu_context.h @@ -192,6 +192,8 @@ do { \ #else #define deactivate_mm(tsk, mm) \ do { \ + if (!tsk->vfork_done) \ + shstk_free(tsk); \ load_gs_index(0); \ loadsegment(fs, 0); \ } while (0) diff --git a/arch/x86/include/asm/shstk.h b/arch/x86/include/asm/shstk.h index f40414a982e8..172a69052770 100644 --- a/arch/x86/include/asm/shstk.h +++ b/arch/x86/include/asm/shstk.h @@ -15,11 +15,18 @@ struct thread_shstk { long shstk_prctl(struct task_struct *task, int option, unsigned long features); void reset_thread_features(void); +int shstk_alloc_thread_stack(struct task_struct *p, unsigned long clone_flags, + unsigned long stack_size, + unsigned long *shstk_addr); void shstk_free(struct task_struct *p); #else static inline long shstk_prctl(struct task_struct *task, int option, unsigned long features) { return -EINVAL; } static inline void reset_thread_features(void) {} +static inline int shstk_alloc_thread_stack(struct task_struct *p, + unsigned long clone_flags, + unsigned long stack_size, + unsigned long *shstk_addr) { return 0; } static inline void shstk_free(struct task_struct *p) {} #endif /* CONFIG_X86_USER_SHADOW_STACK */ diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c index 9af78e9d92a0..f5037dffee19 100644 --- a/arch/x86/kernel/fpu/core.c +++ b/arch/x86/kernel/fpu/core.c @@ -552,8 +552,41 @@ static inline void fpu_inherit_perms(struct fpu *dst_fpu) } } +#ifdef CONFIG_X86_USER_SHADOW_STACK +static int update_fpu_shstk(struct task_struct *dst, unsigned long ssp) +{ + struct cet_user_state *xstate; + + /* If ssp update is not needed. 
*/ + if (!ssp) + return 0; + + xstate = get_xsave_addr(&dst->thread.fpu.fpstate->regs.xsave, + XFEATURE_CET_USER); + + /* + * If there is a non-zero ssp, then 'dst' must be configured with a shadow + * stack and the fpu state should be up to date since it was just copied + * from the parent in fpu_clone(). So there must be a valid non-init CET + * state location in the buffer. + */ + if (WARN_ON_ONCE(!xstate)) + return 1; + + xstate->user_ssp = (u64)ssp; + + return 0; +} +#else +static int update_fpu_shstk(struct task_struct *dst, unsigned long shstk_addr) +{ + return 0; +} +#endif + /* Clone current's FPU state on fork */ -int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal) +int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal, + unsigned long ssp) { struct fpu *src_fpu = &current->thread.fpu; struct fpu *dst_fpu = &dst->thread.fpu; @@ -613,6 +646,12 @@ int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal) if (use_xsave()) dst_fpu->fpstate->regs.xsave.header.xfeatures &= ~XFEATURE_MASK_PASID; + /* + * Update shadow stack pointer, in case it changed during clone. + */ + if (update_fpu_shstk(dst, ssp)) + return 1; + trace_x86_fpu_copy_src(src_fpu); trace_x86_fpu_copy_dst(dst_fpu); diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index d1e83ba21130..16643f8fcded 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -48,6 +48,7 @@ #include #include #include +#include #include "process.h" @@ -119,6 +120,7 @@ void exit_thread(struct task_struct *tsk) free_vm86(t); + shstk_free(tsk); fpu__drop(fpu); } @@ -140,6 +142,7 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args) struct inactive_task_frame *frame; struct fork_frame *fork_frame; struct pt_regs *childregs; + unsigned long shstk_addr = 0; int ret = 0; childregs = task_pt_regs(p); @@ -174,7 +177,13 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args) frame->flags = X86_EFLAGS_FIXED; #endif - fpu_clone(p, clone_flags, args->fn); + /* Allocate a new shadow stack for pthread if needed */ + ret = shstk_alloc_thread_stack(p, clone_flags, args->stack_size, + &shstk_addr); + if (ret) + return ret; + + fpu_clone(p, clone_flags, args->fn, shstk_addr); /* Kernel thread ? */ if (unlikely(p->flags & PF_KTHREAD)) { @@ -220,6 +229,13 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args) if (!ret && unlikely(test_tsk_thread_flag(current, TIF_IO_BITMAP))) io_bitmap_share(p); + /* + * If copy_thread() is failing, don't leak the shadow stack possibly + * allocated in shstk_alloc_thread_stack() above.
+ */ + if (ret) + shstk_free(p); + return ret; } diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c index 64f2521cae23..35d69078230a 100644 --- a/arch/x86/kernel/shstk.c +++ b/arch/x86/kernel/shstk.c @@ -47,7 +47,7 @@ static unsigned long alloc_shstk(unsigned long size) unsigned long addr, unused; mmap_write_lock(mm); - addr = do_mmap(NULL, addr, size, PROT_READ, flags, + addr = do_mmap(NULL, 0, size, PROT_READ, flags, VM_SHADOW_STACK | VM_WRITE, 0, &unused, NULL); mmap_write_unlock(mm); @@ -126,6 +126,40 @@ void reset_thread_features(void) current->thread.features_locked = 0; } +int shstk_alloc_thread_stack(struct task_struct *tsk, unsigned long clone_flags, + unsigned long stack_size, unsigned long *shstk_addr) +{ + struct thread_shstk *shstk = &tsk->thread.shstk; + unsigned long addr, size; + + /* + * If shadow stack is not enabled on the new thread, skip any + * switch to a new shadow stack. + */ + if (!features_enabled(ARCH_SHSTK_SHSTK)) + return 0; + + /* + * For CLONE_VM, except vfork, the child needs a separate shadow + * stack. + */ + if ((clone_flags & (CLONE_VFORK | CLONE_VM)) != CLONE_VM) + return 0; + + + size = adjust_shstk_size(stack_size); + addr = alloc_shstk(size); + if (IS_ERR_VALUE(addr)) + return PTR_ERR((void *)addr); + + shstk->base = addr; + shstk->size = size; + + *shstk_addr = addr + size; + + return 0; +} + void shstk_free(struct task_struct *tsk) { struct thread_shstk *shstk = &tsk->thread.shstk; @@ -134,7 +168,13 @@ void shstk_free(struct task_struct *tsk) !features_enabled(ARCH_SHSTK_SHSTK)) return; - if (!tsk->mm) + /* + * When fork() with CLONE_VM fails, the child (tsk) already has a + * shadow stack allocated, and exit_thread() calls this function to + * free it. In this case the parent (current) and the child share + * the same mm struct. 
+ */ + if (!tsk->mm || tsk->mm != current->mm) return; unmap_shadow_stack(shstk->base, shstk->size); From patchwork Sat Dec 3 00:35:55 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 29188 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp1143111wrr; Fri, 2 Dec 2022 16:43:17 -0800 (PST) X-Google-Smtp-Source: AA0mqf4S4Rrq53xw2MSVXLzdVAcoatS3aGO5P+JBIoYOFWwei6hi69D0xZgiOCXPaLvuBI65Lc/W X-Received: by 2002:a17:90a:2b88:b0:219:a1e4:20e2 with SMTP id u8-20020a17090a2b8800b00219a1e420e2mr1720287pjd.182.1670028197466; Fri, 02 Dec 2022 16:43:17 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670028197; cv=none; d=google.com; s=arc-20160816; b=HdLXJT/ebandvDyEdjczGkkEy2M8E6jeZbeabh8gM697zJQyFQl0B43lCkAJnY503N 2/Fr9yacnXJAjonVoCidKqFlFipnrlLJlgMY9Ni2AmEMCKjJqARegBhFbr/jIMf+JpFd Ypjkq48MFE8h9V3XQF3JqKPEIPnpMIsUrYipyTtwxj+TQwzoyVClDK1GpEF5b/lvz+9C JKWebXPzJadAg349K1R0OkD8O0abPcqzTGA12U56wa+fVBzSHXT1tAKXRKofySQ07W5j Yql2XfPlCa4kcSunr7CQ8BFOL4DL/grAZ+zid78uchhBadHW3vZbcgWn+laQJmn6jI8E e98A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=Z2zMsk6o+LmeFUnlvBxidNzfnMHvSIIu3S6pPzKtTgk=; b=FpDDzmGgPmVN6uIBkHFIA+dxT21Zm1lP7qRCxLhTc423yA4/gq7qVTKZ5OqExLepKy 5EfMwbqOvlbo5SyPo8K7Ku5uKKlyT1m2oWkwquAp4YjszFIyqB/tPBmIieXt5nTwmLbr zLjGWnNjAkLl9VPB6jH7in113rCJwOAOmpFUFFAvLVYJUthe8RnaAor20nsti7rjXegr pFBVhkkN2/SzKtHCDRe9e0T0lNGpzq67+2ht+A4dlCZUrWtjl4Ifq3UrMf8BYCLJJBK2 aEu+RaxnfpQRUxK6dNm6vdtZ+qIC7x+bMuu0TZCCXkYlXLhIiS9YnwVgbfzo5lki0hl4 +ajw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=Bt0y84XS; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id lr11-20020a17090b4b8b00b0021969656c9esi9664366pjb.3.2022.12.02.16.43.04; Fri, 02 Dec 2022 16:43:17 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=Bt0y84XS; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235016AbiLCAmZ (ORCPT + 99 others); Fri, 2 Dec 2022 19:42:25 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43294 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234902AbiLCAl0 (ORCPT ); Fri, 2 Dec 2022 19:41:26 -0500 Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0C2EBD9690; Fri, 2 Dec 2022 16:38:15 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1670027895; x=1701563895; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=bClCIMMax+f0VpU6qshW357KZpyoajTh8eBuviYAcuc=; b=Bt0y84XSgRSclSTLQOaUO4m2UwlnT6IEV2P2fMj80UeAuxhsz/YcRbrr S2fpzrdp3ZghmxFjJYlgblPXXrJ3bgFJESCty4TZ0SyueMravhlRzr+GB mOJEh2JEfZc+6vYvKWZQWfUt6b0x1SwAH1B9mIqOSs1OFqffpyfMiBydE uKx2vjVohoAUy/4EA8Rn5DLGJc3ongeBpVd1oDi+6WUVhVl6mgAZIXstP qogI0NzjaU9CH3d9YiEi15MtyWiP3eBEPsZvfJEETfWJXB3igpSMJqfLC ZQpzzL06BBPRt8+x6xtoJ5EteOL+z809IB3ptkqWr+SS1AfCWxEeaInQ/ A==; X-IronPort-AV: E=McAfee;i="6500,9779,10549"; a="313711359" X-IronPort-AV: E=Sophos;i="5.96,213,1665471600"; d="scan'208";a="313711359" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2022 16:37:33 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10549"; a="787479971" X-IronPort-AV: E=Sophos;i="5.96,213,1665471600"; d="scan'208";a="787479971" Received: from bgordon1-mobl1.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.212.211.211]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2022 16:37:31 -0800 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . 
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v4 28/39] x86/shstk: Introduce routines modifying shstk Date: Fri, 2 Dec 2022 16:35:55 -0800 Message-Id: <20221203003606.6838-29-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221203003606.6838-1-rick.p.edgecombe@intel.com> References: <20221203003606.6838-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751151487463144006?= X-GMAIL-MSGID: =?utf-8?q?1751151487463144006?= From: Yu-cheng Yu Shadow stacks are normally written to via CALL/RET or specific CET instuctions like RSTORSSP/SAVEPREVSSP. However during some Linux operations the kernel will need to write to directly using the ring-0 only WRUSS instruction. A shadow stack restore token marks a restore point of the shadow stack, and the address in a token must point directly above the token, which is within the same shadow stack. This is distinctively different from other pointers on the shadow stack, since those pointers point to executable code area. Introduce token setup and verify routines. Also introduce WRUSS, which is a kernel-mode instruction but writes directly to user shadow stack. In future patches that enable shadow stack to work with signals, the kernel will need something to denote the point in the stack where sigreturn may be called. This will prevent attackers calling sigreturn at arbitrary places in the stack, in order to help prevent SROP attacks. To do this, something that can only be written by the kernel needs to be placed on the shadow stack. This can be accomplished by setting bit 63 in the frame written to the shadow stack. Userspace return addresses can't have this bit set as it is in the kernel range. It is also can't be a valid restore token. Tested-by: Pengfei Xu Tested-by: John Allen Signed-off-by: Yu-cheng Yu Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Cc: Kees Cook Reviewed-by: Kees Cook --- v3: - Drop shstk_check_rstor_token() - Fail put_shstk_data() if bit 63 is set in the data (Kees) - Add comment in create_rstor_token() (Kees) - Pull in create_rstor_token() changes from future patch (Kees) v2: - Add data helpers for writing to shadow stack. v1: - Use xsave helpers. Yu-cheng v30: - Update commit log, remove description about signals. - Update various comments. - Remove variable 'ssp' init and adjust return value accordingly. - Check get_user_shstk_addr() return value. - Replace 'ia32' with 'proc32'. 
arch/x86/include/asm/special_insns.h | 13 +++++ arch/x86/kernel/shstk.c | 73 ++++++++++++++++++++++++++++ 2 files changed, 86 insertions(+) diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h index 35f709f619fb..6d51a87aea7f 100644 --- a/arch/x86/include/asm/special_insns.h +++ b/arch/x86/include/asm/special_insns.h @@ -223,6 +223,19 @@ static inline void clwb(volatile void *__p) : [pax] "a" (p)); } +#ifdef CONFIG_X86_USER_SHADOW_STACK +static inline int write_user_shstk_64(u64 __user *addr, u64 val) +{ + asm_volatile_goto("1: wrussq %[val], (%[addr])\n" + _ASM_EXTABLE(1b, %l[fail]) + :: [addr] "r" (addr), [val] "r" (val) + :: fail); + return 0; +fail: + return -EFAULT; +} +#endif /* CONFIG_X86_USER_SHADOW_STACK */ + #define nop() asm volatile ("nop") static inline void serialize(void) diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c index 35d69078230a..64c60bc58520 100644 --- a/arch/x86/kernel/shstk.c +++ b/arch/x86/kernel/shstk.c @@ -25,6 +25,8 @@ #include #include +#define SS_FRAME_SIZE 8 + static bool features_enabled(unsigned long features) { return current->thread.features & features; @@ -40,6 +42,35 @@ static void features_clr(unsigned long features) current->thread.features &= ~features; } +/* + * Create a restore token on the shadow stack. A token is always 8-byte + * and aligned to 8. + */ +static int create_rstor_token(unsigned long ssp, unsigned long *token_addr) +{ + unsigned long addr; + + /* Token must be aligned */ + if (!IS_ALIGNED(ssp, 8)) + return -EINVAL; + + addr = ssp - SS_FRAME_SIZE; + + /* + * SSP is aligned, so reserved bits and mode bit are a zero, just mark + * the token 64-bit. + */ + ssp |= BIT(0); + + if (write_user_shstk_64((u64 __user *)addr, (u64)ssp)) + return -EFAULT; + + if (token_addr) + *token_addr = addr; + + return 0; +} + static unsigned long alloc_shstk(unsigned long size) { int flags = MAP_ANONYMOUS | MAP_PRIVATE; @@ -160,6 +191,48 @@ int shstk_alloc_thread_stack(struct task_struct *tsk, unsigned long clone_flags, return 0; } +static unsigned long get_user_shstk_addr(void) +{ + unsigned long long ssp; + + fpregs_lock_and_load(); + + rdmsrl(MSR_IA32_PL3_SSP, ssp); + + fpregs_unlock(); + + return ssp; +} + +static int put_shstk_data(u64 __user *addr, u64 data) +{ + if (WARN_ON_ONCE(data & BIT(63))) + return -EINVAL; + + /* + * Mark the high bit so that the sigframe can't be processed as a + * return address. 
+ */ + if (write_user_shstk_64(addr, data | BIT(63))) + return -EFAULT; + return 0; +} + +static int get_shstk_data(unsigned long *data, unsigned long __user *addr) +{ + unsigned long ldata; + + if (unlikely(get_user(ldata, addr))) + return -EFAULT; + + if (!(ldata & BIT(63))) + return -EINVAL; + + *data = ldata & ~BIT(63); + + return 0; +} + void shstk_free(struct task_struct *tsk) { struct thread_shstk *shstk = &tsk->thread.shstk; From patchwork Sat Dec 3 00:35:56 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 29189 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp1143172wrr; Fri, 2 Dec 2022 16:43:29 -0800 (PST) X-Google-Smtp-Source: AA0mqf5mnH48xiCuhUMAzLGTIYIGuV/OUcNOQC6aGh6fQfoHlf0JlfE1S3MaGUhH3rAbKcwsUvzD X-Received: by 2002:a17:902:8bc4:b0:187:2790:9bc2 with SMTP id r4-20020a1709028bc400b0018727909bc2mr57821923plo.61.1670028209238; Fri, 02 Dec 2022 16:43:29 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670028209; cv=none; d=google.com; s=arc-20160816; b=bHmZFpSjiRA7HEbYghlyG5ShzpKAdAEvnS5aAkj51oAsT1tgQU4xxjH5svBO6c6Ehz +8Kfu/EvmzSrRb66Bi7z+SkCMC6M8ZDGvPdnFTpR8v/QeZpQPfbYotomNnbn9KyQZBQb wgBm8mrKJtp4f6ze7zOw6WBFyaDSxzyavfVEX6zO4Ytk2f3Q45wDy8Zs78gd6lWakG3m UDUbFkuyZT/ij2pnKxrTOY2+7ixHl+xIYdcgUt5DTagSmsFirglYsjWFm5ExQfwLLl0p HoQa3Q8OjVbh5XHY4nenbXcgfGOCZAE9EDrDd5bAtEx3yc5Jq4fxwmQhJUYjvZD47Dog fw4Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=tUEpqccO2QmXG2NXYflEZ+/vIBPmIz44a/dQN76qctg=; b=RtYeOOwMDjIfihM7SDrz08DMEZNiSnmNQQgN+qzkPVMaOZTscDvy88kKpP5CPgxkw5 JqRPWYOVUXJLp1RExlvcBarqEuCKVELlNrSoy38mwPbwDil/S5/RfZDxdf5smFkLzqb2 f4xQfENwQhjIG9FP/RDkWEkFVYYo6UztoncJYRVGU4yHMYjfZQXQHo5JkmsxW16eo21K 0XEAMM9oIUwr24SDWi0B2MV+djJ5GYlvbSZwD8E7JNTZlfBVMJzlyY0RU8r2JtoGq6b4 YKR02FPlysEkH5BFWq8IOQvY1cIV68mW13uUIG7yWeYUi/xzt+ahw1bo3kzU1+yGW1AO DOOw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=OJMlcLXz; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id c28-20020a634e1c000000b0046eb96c4f90si8216944pgb.549.2022.12.02.16.43.16; Fri, 02 Dec 2022 16:43:29 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=OJMlcLXz; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235300AbiLCAmi (ORCPT + 99 others); Fri, 2 Dec 2022 19:42:38 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41826 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234994AbiLCAl2 (ORCPT ); Fri, 2 Dec 2022 19:41:28 -0500 Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A35D4BE6BC; Fri, 2 Dec 2022 16:38:15 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1670027895; x=1701563895; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=dW8pr7oLP1faANAdoTd00r5c08b8upZ7lrUCKaAAhx8=; b=OJMlcLXz+R+AQI6A0ne3zCDswmvAgbpTZST8YLvYG0ykdQ/H6lp6/Fta qWkNWLy6LW/DkkFVk4vERtmth/8v6BKKHLPIvpwvEOTtGPQVNY8d36ccc XJGqmuHjPhOTFjoudYMh2ytrHQspw//KfG8uqdroE3xGcvkqA90IlS3qs GghNQlsmnrW+gs3mqfaX+cbC/W49Ylx93Mfmq+idpXJGAi7DOo1JlGF4m oVPeYB2d9k5Y4polNyjeBUz7PNZTpJdnxPdaA/0V3x6HrZkrdXvpGgSds V1zRUhL8qPeRXtASPnvIpnDnYwj05L9ic88SuVy1pGhUBs8mnABUX54Gb Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10549"; a="313711387" X-IronPort-AV: E=Sophos;i="5.96,213,1665471600"; d="scan'208";a="313711387" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2022 16:37:35 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10549"; a="787479976" X-IronPort-AV: E=Sophos;i="5.96,213,1665471600"; d="scan'208";a="787479976" Received: from bgordon1-mobl1.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.212.211.211]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2022 16:37:33 -0800 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . 
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v4 29/39] x86/shstk: Handle signals for shadow stack Date: Fri, 2 Dec 2022 16:35:56 -0800 Message-Id: <20221203003606.6838-30-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221203003606.6838-1-rick.p.edgecombe@intel.com> References: <20221203003606.6838-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751151499507625554?= X-GMAIL-MSGID: =?utf-8?q?1751151499507625554?= From: Yu-cheng Yu When a signal is handled normally the context is pushed to the stack before handling it. For shadow stacks, since the shadow stack only track's return addresses, there isn't any state that needs to be pushed. However, there are still a few things that need to be done. These things are userspace visible and which will be kernel ABI for shadow stacks. One is to make sure the restorer address is written to shadow stack, since the signal handler (if not changing ucontext) returns to the restorer, and the restorer calls sigreturn. So add the restorer on the shadow stack before handling the signal, so there is not a conflict when the signal handler returns to the restorer. The other thing to do is to place some type of checkable token on the thread's shadow stack before handling the signal and check it during sigreturn. This is an extra layer of protection to hamper attackers calling sigreturn manually as in SROP-like attacks. For this token we can use the shadow stack data format defined earlier. Have the data pushed be the previous SSP. In the future the sigreturn might want to return back to a different stack. Storing the SSP (instead of a restore offset or something) allows for future functionality that may want to restore to a different stack. So, when handling a signal push - the SSP pointing in the shadow stack data format - the restorer address below the restore token. In sigreturn, verify SSP is stored in the data format and pop the shadow stack. Tested-by: Pengfei Xu Tested-by: John Allen Signed-off-by: Yu-cheng Yu Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Cc: Andy Lutomirski Cc: Cyrill Gorcunov Cc: Florian Weimer Cc: H. Peter Anvin Cc: Kees Cook Reviewed-by: Kees Cook --- v3: - Drop shstk_setup_rstor_token() (Kees) - Drop x32 signal support, since x32 support is dropped v2: - Switch to new shstk signal format v1: - Use xsave helpers. - Expand commit log. Yu-cheng v27: - Eliminate saving shadow stack pointer to signal context. 
arch/x86/include/asm/shstk.h | 5 ++ arch/x86/kernel/shstk.c | 98 ++++++++++++++++++++++++++++++++++++ arch/x86/kernel/signal.c | 1 + arch/x86/kernel/signal_64.c | 6 +++ 4 files changed, 110 insertions(+) diff --git a/arch/x86/include/asm/shstk.h b/arch/x86/include/asm/shstk.h index 172a69052770..746c040f7cb6 100644 --- a/arch/x86/include/asm/shstk.h +++ b/arch/x86/include/asm/shstk.h @@ -6,6 +6,7 @@ #include struct task_struct; +struct ksignal; #ifdef CONFIG_X86_USER_SHADOW_STACK struct thread_shstk { @@ -19,6 +20,8 @@ int shstk_alloc_thread_stack(struct task_struct *p, unsigned long clone_flags, unsigned long stack_size, unsigned long *shstk_addr); void shstk_free(struct task_struct *p); +int setup_signal_shadow_stack(struct ksignal *ksig); +int restore_signal_shadow_stack(void); #else static inline long shstk_prctl(struct task_struct *task, int option, unsigned long features) { return -EINVAL; } @@ -28,6 +31,8 @@ static inline int shstk_alloc_thread_stack(struct task_struct *p, unsigned long stack_size, unsigned long *shstk_addr) { return 0; } static inline void shstk_free(struct task_struct *p) {} +static inline int setup_signal_shadow_stack(struct ksignal *ksig) { return 0; } +static inline int restore_signal_shadow_stack(void) { return 0; } #endif /* CONFIG_X86_USER_SHADOW_STACK */ #endif /* __ASSEMBLY__ */ diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c index 64c60bc58520..e53225a8d39e 100644 --- a/arch/x86/kernel/shstk.c +++ b/arch/x86/kernel/shstk.c @@ -233,6 +233,104 @@ static int get_shstk_data(unsigned long *data, unsigned long __user *addr) return 0; } +static int shstk_push_sigframe(unsigned long *ssp) +{ + unsigned long target_ssp = *ssp; + + /* Token must be aligned */ + if (!IS_ALIGNED(*ssp, 8)) + return -EINVAL; + + if (!IS_ALIGNED(target_ssp, 8)) + return -EINVAL; + + *ssp -= SS_FRAME_SIZE; + if (put_shstk_data((void *__user)*ssp, target_ssp)) + return -EFAULT; + + return 0; +} + +static int shstk_pop_sigframe(unsigned long *ssp) +{ + unsigned long token_addr; + int err; + + err = get_shstk_data(&token_addr, (unsigned long __user *)*ssp); + if (unlikely(err)) + return err; + + /* Restore SSP aligned? */ + if (unlikely(!IS_ALIGNED(token_addr, 8))) + return -EINVAL; + + /* SSP in userspace? 
*/ + if (unlikely(token_addr >= TASK_SIZE_MAX)) + return -EINVAL; + + *ssp = token_addr; + + return 0; +} + +int setup_signal_shadow_stack(struct ksignal *ksig) +{ + void __user *restorer = ksig->ka.sa.sa_restorer; + unsigned long ssp; + int err; + + if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK) || + !features_enabled(ARCH_SHSTK_SHSTK)) + return 0; + + if (!restorer) + return -EINVAL; + + ssp = get_user_shstk_addr(); + if (unlikely(!ssp)) + return -EINVAL; + + err = shstk_push_sigframe(&ssp); + if (unlikely(err)) + return err; + + /* Push restorer address */ + ssp -= SS_FRAME_SIZE; + err = write_user_shstk_64((u64 __user *)ssp, (u64)restorer); + if (unlikely(err)) + return -EFAULT; + + fpregs_lock_and_load(); + wrmsrl(MSR_IA32_PL3_SSP, ssp); + fpregs_unlock(); + + return 0; +} + +int restore_signal_shadow_stack(void) +{ + unsigned long ssp; + int err; + + if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK) || + !features_enabled(ARCH_SHSTK_SHSTK)) + return 0; + + ssp = get_user_shstk_addr(); + if (unlikely(!ssp)) + return -EINVAL; + + err = shstk_pop_sigframe(&ssp); + if (unlikely(err)) + return err; + + fpregs_lock_and_load(); + wrmsrl(MSR_IA32_PL3_SSP, ssp); + fpregs_unlock(); + + return 0; +} + void shstk_free(struct task_struct *tsk) { struct thread_shstk *shstk = &tsk->thread.shstk; diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c index 1504eb8d25aa..b2c9853ce1c5 100644 --- a/arch/x86/kernel/signal.c +++ b/arch/x86/kernel/signal.c @@ -40,6 +40,7 @@ #include #include #include +#include static inline int is_ia32_compat_frame(struct ksignal *ksig) { diff --git a/arch/x86/kernel/signal_64.c b/arch/x86/kernel/signal_64.c index ff9c55064223..6708ec2b00a3 100644 --- a/arch/x86/kernel/signal_64.c +++ b/arch/x86/kernel/signal_64.c @@ -175,6 +175,9 @@ int x64_setup_rt_frame(struct ksignal *ksig, struct pt_regs *regs) frame = get_sigframe(ksig, regs, sizeof(struct rt_sigframe), &fp); uc_flags = frame_uc_flags(regs); + if (setup_signal_shadow_stack(ksig)) + return -EFAULT; + if (!user_access_begin(frame, sizeof(*frame))) return -EFAULT; @@ -260,6 +263,9 @@ SYSCALL_DEFINE0(rt_sigreturn) if (!restore_sigcontext(regs, &frame->uc.uc_mcontext, uc_flags)) goto badframe; + if (restore_signal_shadow_stack()) + goto badframe; + if (restore_altstack(&frame->uc.uc_stack)) goto badframe; From patchwork Sat Dec 3 00:35:57 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 29190 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp1143189wrr; Fri, 2 Dec 2022 16:43:33 -0800 (PST) X-Google-Smtp-Source: AA0mqf6lcFXlU4CpT3IikxpFfs0QSM1rvYSX1UGOlAk+3ZEa59Db1qtmkMXznP/50BBTPXiVaqfL X-Received: by 2002:a17:90b:944:b0:219:33a1:d05f with SMTP id dw4-20020a17090b094400b0021933a1d05fmr30164351pjb.116.1670028212969; Fri, 02 Dec 2022 16:43:32 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670028212; cv=none; d=google.com; s=arc-20160816; b=XkHsi28yPwyFSQHXR2NXqhzvkviZiqjyJoKJSStLLAeN2paVYogR490yY6dvCP8sQg g5qPx4c+590rEPEWrrlXVtqRljAC88rK6DBAmtACOecnaL5tHwwpg/ZwfRZRICA2RUoE mm6+lRh2aCUc7KhUsR1G0gShddY0H5afgmTFWw2HougHJxk7nCOyXKCfPu0m+G9qZvkV DrjP+Bc4g6gt8d1ZENwZd3mVLpn/9fWRCtYGEi0Y8Yb6R0ZllF3zJXUTtOz9wC+yAo8T F0ai5QR/amAWaW5IhnH7W8f+lp/kOQIhAwDUN/kwkEvd+jyEry7Eb77KZwfkDuNOos8y OGrw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject 
:cc:to:from:dkim-signature; bh=yDgWE7m7m83zt57tKPHHAyeG+e3Qj4dfkK4mwlEtfv8=; b=XvbijHIpt0Av85q8yA4ynlX48ex5tiGtLIRYdZiINtiLOfyLGxfHtzv+ifkVildyT3 4htU28wm2KPZfOMtYEsnKunGrdGBxLdZaYIid1kc0xAzRHpSjMGVKMdqY5GW9Efpcsvz DZU6KCsZPoh9f6KmzL+BWOHO4TiSlFZkfEarlBMU9lom3uNhIyEFaQB2cKM72kJyS9Ia IMnICmXgbuF+Yh6lNct7Nah+wzrnmjbhN3HZhxTlIjQBcI8Wsv5P+JUtl7NEf3bG6AGQ YAfbF9pvLTYuAdDqK6VghQlFuUuY/v1Cwj4i9oXSUSmFfn20tByPkEOG4rEFuGaFmTja OP3g== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=SutbyGyU; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id n3-20020a638f03000000b0043aebb63fc9si5038379pgd.732.2022.12.02.16.43.20; Fri, 02 Dec 2022 16:43:32 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=SutbyGyU; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235310AbiLCAnD (ORCPT + 99 others); Fri, 2 Dec 2022 19:43:03 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44018 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235246AbiLCAlg (ORCPT ); Fri, 2 Dec 2022 19:41:36 -0500 Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D44C9FA446; Fri, 2 Dec 2022 16:38:22 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1670027902; x=1701563902; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=70dcNQFQBMnN3c1Es5lWcCS9Jd/iJ6xPvGDlYE+0Wd8=; b=SutbyGyUPv+NagtwGNf+8yHAYfiyUW/V72uLD5j3uIUaDdxWS12d/xoH LIZ6efx97ye6gRDJt3oMKSZiRIgfsZAXn6VkCIHvfQTsiJTyq6kpodD6O ev2rMWeVPm4Cw4aMo4gM28cx7hRc3o7b9JGEbpJOmgGF/FGw1G3ntfm3g GL/MSRELVyvijYutnS0zhdTnkFxP9DvpQv6ysxVGpURURRIM54mz/3gMI WcBxpXZ3xIg4TxdW3ygA3PwBQ0NL1RN5LktbijCuz/Ds7WFPH1ur31pa0 GvMLBhBHBbcAs7oy9yJQXEnlukiacyAuRfsS3znJW+9QA4Ag4h76w3lFi g==; X-IronPort-AV: E=McAfee;i="6500,9779,10549"; a="313711414" X-IronPort-AV: E=Sophos;i="5.96,213,1665471600"; d="scan'208";a="313711414" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2022 16:37:37 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10549"; a="787479982" X-IronPort-AV: E=Sophos;i="5.96,213,1665471600"; d="scan'208";a="787479982" Received: from bgordon1-mobl1.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.212.211.211]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2022 16:37:35 -0800 From: Rick Edgecombe To: x86@kernel.org, "H . 
Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com Cc: rick.p.edgecombe@intel.com Subject: [PATCH v4 30/39] x86/shstk: Introduce map_shadow_stack syscall Date: Fri, 2 Dec 2022 16:35:57 -0800 Message-Id: <20221203003606.6838-31-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221203003606.6838-1-rick.p.edgecombe@intel.com> References: <20221203003606.6838-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751151503572680050?= X-GMAIL-MSGID: =?utf-8?q?1751151503572680050?= When operating with shadow stacks enabled, the kernel will automatically allocate shadow stacks for new threads, however in some cases userspace will need additional shadow stacks. The main example of this is the ucontext family of functions, which require userspace allocating and pivoting to userspace managed stacks. Unlike most other user memory permissions, shadow stacks need to be provisioned with special data in order to be useful. They need to be setup with a restore token so that userspace can pivot to them via the RSTORSSP instruction. But, the security design of shadow stack's is that they should not be written to except in limited circumstances. This presents a problem for userspace, as to how userspace can provision this special data, without allowing for the shadow stack to be generally writable. Previously, a new PROT_SHADOW_STACK was attempted, which could be mprotect()ed from RW permissions after the data was provisioned. This was found to not be secure enough, as other thread's could write to the shadow stack during the writable window. The kernel can use a special instruction, WRUSS, to write directly to userspace shadow stacks. So the solution can be that memory can be mapped as shadow stack permissions from the beginning (never generally writable in userspace), and the kernel itself can write the restore token. First, a new madvise() flag was explored, which could operate on the PROT_SHADOW_STACK memory. This had a couple downsides: 1. Extra checks were needed in mprotect() to prevent writable memory from ever becoming PROT_SHADOW_STACK. 2. Extra checks/vma state were needed in the new madvise() to prevent restore tokens being written into the middle of pre-used shadow stacks. It is ideal to prevent restore tokens being added at arbitrary locations, so the check was to make sure the shadow stack had never been written to. 3. 
It stood out from the rest of the madvise flags, as more of direct action than a hint at future desired behavior. So rather than repurpose two existing syscalls (mmap, madvise) that don't quite fit, just implement a new map_shadow_stack syscall to allow userspace to map and setup new shadow stacks in one step. While ucontext is the primary motivator, userspace may have other unforeseen reasons to setup it's own shadow stacks using the WRSS instruction. Towards this provide a flag so that stacks can be optionally setup securely for the common case of ucontext without enabling WRSS. Or potentially have the kernel set up the shadow stack in some new way. The following example demonstrates how to create a new shadow stack with map_shadow_stack: void *shstk = map_shadow_stack(addr, stack_size, SHADOW_STACK_SET_TOKEN); Tested-by: Pengfei Xu Tested-by: John Allen Signed-off-by: Rick Edgecombe --- v3: - Change syscall common -> 64 (Kees) - Use bit shift notation instead of 0x1 for uapi header (Kees) - Call do_mmap() with MAP_FIXED_NOREPLACE (Kees) - Block unsupported flags (Kees) - Require size >= 8 to set token (Kees) v2: - Change syscall to take address like mmap() for CRIU's usage v1: - New patch (replaces PROT_SHADOW_STACK). arch/x86/entry/syscalls/syscall_64.tbl | 1 + arch/x86/include/uapi/asm/mman.h | 3 ++ arch/x86/kernel/shstk.c | 56 ++++++++++++++++++++++---- include/linux/syscalls.h | 1 + include/uapi/asm-generic/unistd.h | 2 +- kernel/sys_ni.c | 1 + 6 files changed, 55 insertions(+), 9 deletions(-) diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl index c84d12608cd2..f65c671ce3b1 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -372,6 +372,7 @@ 448 common process_mrelease sys_process_mrelease 449 common futex_waitv sys_futex_waitv 450 common set_mempolicy_home_node sys_set_mempolicy_home_node +451 64 map_shadow_stack sys_map_shadow_stack # # Due to a historical design error, certain syscalls are numbered differently diff --git a/arch/x86/include/uapi/asm/mman.h b/arch/x86/include/uapi/asm/mman.h index 775dbd3aff73..15c5a1c4fc29 100644 --- a/arch/x86/include/uapi/asm/mman.h +++ b/arch/x86/include/uapi/asm/mman.h @@ -12,6 +12,9 @@ ((key) & 0x8 ? 
VM_PKEY_BIT3 : 0)) #endif +/* Flags for map_shadow_stack(2) */ +#define SHADOW_STACK_SET_TOKEN (1ULL << 0) /* Set up a restore token in the shadow stack */ + #include #endif /* _ASM_X86_MMAN_H */ diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c index e53225a8d39e..8f329c22728a 100644 --- a/arch/x86/kernel/shstk.c +++ b/arch/x86/kernel/shstk.c @@ -17,6 +17,7 @@ #include #include #include +#include #include #include #include @@ -71,19 +72,31 @@ static int create_rstor_token(unsigned long ssp, unsigned long *token_addr) return 0; } -static unsigned long alloc_shstk(unsigned long size) +static unsigned long alloc_shstk(unsigned long addr, unsigned long size, + unsigned long token_offset, bool set_res_tok) { int flags = MAP_ANONYMOUS | MAP_PRIVATE; struct mm_struct *mm = current->mm; - unsigned long addr, unused; + unsigned long mapped_addr, unused; - mmap_write_lock(mm); - addr = do_mmap(NULL, 0, size, PROT_READ, flags, - VM_SHADOW_STACK | VM_WRITE, 0, &unused, NULL); + if (addr) + flags |= MAP_FIXED_NOREPLACE; + mmap_write_lock(mm); + mapped_addr = do_mmap(NULL, addr, size, PROT_READ, flags, + VM_SHADOW_STACK | VM_WRITE, 0, &unused, NULL); mmap_write_unlock(mm); - return addr; + if (!set_res_tok || IS_ERR_VALUE(addr)) + goto out; + + if (create_rstor_token(mapped_addr + token_offset, NULL)) { + vm_munmap(mapped_addr, size); + return -EINVAL; + } + +out: + return mapped_addr; } static unsigned long adjust_shstk_size(unsigned long size) @@ -134,7 +147,7 @@ static int shstk_setup(void) return -EOPNOTSUPP; size = adjust_shstk_size(0); - addr = alloc_shstk(size); + addr = alloc_shstk(0, size, 0, false); if (IS_ERR_VALUE(addr)) return PTR_ERR((void *)addr); @@ -179,7 +192,7 @@ int shstk_alloc_thread_stack(struct task_struct *tsk, unsigned long clone_flags, size = adjust_shstk_size(stack_size); - addr = alloc_shstk(size); + addr = alloc_shstk(0, size, 0, false); if (IS_ERR_VALUE(addr)) return PTR_ERR((void *)addr); @@ -373,6 +386,33 @@ static int shstk_disable(void) return 0; } +SYSCALL_DEFINE3(map_shadow_stack, unsigned long, addr, unsigned long, size, unsigned int, flags) +{ + bool set_tok = flags & SHADOW_STACK_SET_TOKEN; + unsigned long aligned_size; + + if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) + return -ENOSYS; + + if (flags & ~SHADOW_STACK_SET_TOKEN) + return -EINVAL; + + /* If there isn't space for a token */ + if (set_tok && size < 8) + return -EINVAL; + + /* + * An overflow would result in attempting to write the restore token + * to the wrong location. Not catastrophic, but just return the right + * error code and block it. 
+ */ + aligned_size = PAGE_ALIGN(size); + if (aligned_size < size) + return -EOVERFLOW; + + return alloc_shstk(addr, aligned_size, size, set_tok); +} + long shstk_prctl(struct task_struct *task, int option, unsigned long features) { if (option == ARCH_SHSTK_LOCK) { diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index 33a0ee3bcb2e..392dc11e3556 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -1058,6 +1058,7 @@ asmlinkage long sys_memfd_secret(unsigned int flags); asmlinkage long sys_set_mempolicy_home_node(unsigned long start, unsigned long len, unsigned long home_node, unsigned long flags); +asmlinkage long sys_map_shadow_stack(unsigned long addr, unsigned long size, unsigned int flags); /* * Architecture-specific system calls diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h index 45fa180cc56a..b12940ec5926 100644 --- a/include/uapi/asm-generic/unistd.h +++ b/include/uapi/asm-generic/unistd.h @@ -887,7 +887,7 @@ __SYSCALL(__NR_futex_waitv, sys_futex_waitv) __SYSCALL(__NR_set_mempolicy_home_node, sys_set_mempolicy_home_node) #undef __NR_syscalls -#define __NR_syscalls 451 +#define __NR_syscalls 452 /* * 32 bit systems traditionally used different diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c index 860b2dcf3ac4..cb9aebd34646 100644 --- a/kernel/sys_ni.c +++ b/kernel/sys_ni.c @@ -381,6 +381,7 @@ COND_SYSCALL(vm86old); COND_SYSCALL(modify_ldt); COND_SYSCALL(vm86); COND_SYSCALL(kexec_file_load); +COND_SYSCALL(map_shadow_stack); /* s390 */ COND_SYSCALL(s390_pci_mmio_read); From patchwork Sat Dec 3 00:35:58 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 29191 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp1143277wrr; Fri, 2 Dec 2022 16:43:52 -0800 (PST) X-Google-Smtp-Source: AA0mqf5RaqxOz1QA3hwt0STtjme+3N1t1VyeUssWANGRjNIQw3dDt0zkdiJgSOQPDEpmH4PTIL5x X-Received: by 2002:a17:902:f2c5:b0:189:3c6e:d1b5 with SMTP id h5-20020a170902f2c500b001893c6ed1b5mr50313730plc.108.1670028232468; Fri, 02 Dec 2022 16:43:52 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670028232; cv=none; d=google.com; s=arc-20160816; b=t9qK8sgBwEXsGf0LaNiWksUMui97ql5MxOBXL0B/9oX5j6JpfGflHGV7nTeJXTIGtZ ngty8ckq+Zr/Wh8tD/EPW9+wM+2/QYH993ZlIlAa4fqDg/SlchUMrZA/+zhpYQPger+q 6uODNax7XbKiuFR8w09ds5j2xdczEArl61xTm1G3rj+NpMTR8MMJgdYy4DLG3XVmxhG0 ZwgRvGVUsdiSCMtGUaWc1pEjL0s99CTHRYoC8gDVSIMylKUowSBgEQUEmPs5t37KZGjt 22LsmPXD14aFwbvjKc+fx+bN8Da5XIbhFnJczxhdj8aLBCmtGCG9R1Sip7W8dWo+Nu7j PGpQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=+IV+lVothT0cFefShR6Wcug5827g8qJoedoXIFDoBck=; b=vtyRpHiwkKnB2zBTOEHvLV6Fybnwpuqb2M0qBjpTkyKrOLrF2IOidRW9HAfWgilL+H axOKFHM6f4FkrFVL//lX+dJHFn2sQLDtz72pmU5GFz20oTOeiKwKDhSgP1j82LvkZKE/ yQdFDekuLXh9PJZTPD5PrIozCm+eC6EdACdJFLfonqi5bqOBd0U9bYOOaOjBhIpGIVDE 0icdujGSXjai8E/Qfu8SsB9kdInej6hj7Cnss4r6SVnVrhsy1osUM84+C6mpAo2VwtKp gNuhzMWLlc4BV2llsTNh4oikiWGXHcxSvNv0ZThQ5YLX4zXnCcO8/CoDDwvaXRmaLW5s y1LQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=gcGGXLNd; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com 
Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id ik24-20020a170902ab1800b00189005c48aesi8047937plb.108.2022.12.02.16.43.39; Fri, 02 Dec 2022 16:43:52 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=gcGGXLNd; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235322AbiLCAnP (ORCPT + 99 others); Fri, 2 Dec 2022 19:43:15 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53936 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235255AbiLCAl5 (ORCPT ); Fri, 2 Dec 2022 19:41:57 -0500 Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2CD02F9E37; Fri, 2 Dec 2022 16:38:24 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1670027905; x=1701563905; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=+3jRGl6sfkxrMwbNfEDKpByFMedHLcrmmY7U/T1bBrQ=; b=gcGGXLNd5gWJNugJxL1SZmh7+VEoC6sNysNRb5y/Hq6c8WseCfNTZ+a+ GMOw7Y7lP2j00TYUAqb1m3UGitgWqKJU/iSIV0JX7JRhvcfMZ91M7UaZG vUdxvXcW1cZjwwGapDQumP76MhlGBjUQadbLqBzM6LjWYMlLehTnsCuGQ MbYRFjEyMjOGI3hEXYwqbgMev1bxlknKrT/r1T11WcJXRLshGPtjEpSwQ 6yfpeW8Hk9bPgvSo1/p0P7f5BNbKAnnyDpoy6DsmaSrMTIGV3YSdxB7ue mBeQGf0lwKnMsReeSz22eBOWYAIuIXt6e/KVnUNNitQK1Gs60fca41drQ A==; X-IronPort-AV: E=McAfee;i="6500,9779,10549"; a="313711452" X-IronPort-AV: E=Sophos;i="5.96,213,1665471600"; d="scan'208";a="313711452" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2022 16:37:38 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10549"; a="787479989" X-IronPort-AV: E=Sophos;i="5.96,213,1665471600"; d="scan'208";a="787479989" Received: from bgordon1-mobl1.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.212.211.211]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2022 16:37:37 -0800 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . 
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com Cc: rick.p.edgecombe@intel.com Subject: [PATCH v4 31/39] x86/shstk: Support wrss for userspace Date: Fri, 2 Dec 2022 16:35:58 -0800 Message-Id: <20221203003606.6838-32-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221203003606.6838-1-rick.p.edgecombe@intel.com> References: <20221203003606.6838-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751151523707880091?= X-GMAIL-MSGID: =?utf-8?q?1751151523707880091?= For the current shadow stack implementation, shadow stacks contents can't easily be provisioned with arbitrary data. This property helps apps protect themselves better, but also restricts any potential apps that may want to do exotic things at the expense of a little security. The x86 shadow stack feature introduces a new instruction, wrss, which can be enabled to write directly to shadow stack permissioned memory from userspace. Allow it to get enabled via the prctl interface. Only enable the userspace wrss instruction, which allows writes to userspace shadow stacks from userspace. Do not allow it to be enabled independently of shadow stack, as HW does not support using WRSS when shadow stack is disabled. From a fault handler perspective, WRSS will behave very similar to WRUSS, which is treated like a user access from a #PF err code perspective. Tested-by: Pengfei Xu Tested-by: John Allen Signed-off-by: Rick Edgecombe Reviewed-by: Kees Cook --- v3: - Make wrss_control() static - Fix verbiage in commit log (Kees) v2: - Add some commit log verbiage from (Dave Hansen) v1: - New patch. arch/x86/include/uapi/asm/prctl.h | 1 + arch/x86/kernel/shstk.c | 31 ++++++++++++++++++++++++++++++- 2 files changed, 31 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h index fc97ca7c4884..f13751c6bae4 100644 --- a/arch/x86/include/uapi/asm/prctl.h +++ b/arch/x86/include/uapi/asm/prctl.h @@ -33,5 +33,6 @@ /* ARCH_SHSTK_ features bits */ #define ARCH_SHSTK_SHSTK (1ULL << 0) +#define ARCH_SHSTK_WRSS (1ULL << 1) #endif /* _ASM_X86_PRCTL_H */ diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c index 8f329c22728a..e59544fec96d 100644 --- a/arch/x86/kernel/shstk.c +++ b/arch/x86/kernel/shstk.c @@ -364,6 +364,35 @@ void shstk_free(struct task_struct *tsk) unmap_shadow_stack(shstk->base, shstk->size); } +static int wrss_control(bool enable) +{ + if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) + return -EOPNOTSUPP; + + /* + * Only enable wrss if shadow stack is enabled. If shadow stack is not + * enabled, wrss will already be disabled, so don't bother clearing it + * when disabling. + */ + if (!features_enabled(ARCH_SHSTK_SHSTK)) + return -EPERM; + + /* Already enabled/disabled? 
*/ + if (features_enabled(ARCH_SHSTK_WRSS) == enable) + return 0; + + fpregs_lock_and_load(); + if (enable) { + set_clr_bits_msrl(MSR_IA32_U_CET, CET_WRSS_EN, 0); + features_set(ARCH_SHSTK_WRSS); + } else { + set_clr_bits_msrl(MSR_IA32_U_CET, 0, CET_WRSS_EN); + features_clr(ARCH_SHSTK_WRSS); + } + fpregs_unlock(); + + return 0; +} static int shstk_disable(void) { @@ -381,7 +410,7 @@ static int shstk_disable(void) fpregs_unlock(); shstk_free(current); - features_clr(ARCH_SHSTK_SHSTK); + features_clr(ARCH_SHSTK_SHSTK | ARCH_SHSTK_WRSS); return 0; } From patchwork Sat Dec 3 00:35:59 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 29193 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp1143311wrr; Fri, 2 Dec 2022 16:44:01 -0800 (PST) X-Google-Smtp-Source: AA0mqf5fOzDGGbSO+H9HSOKg6ezXPSRgbTYftaNy8ID4Ms2mOYp0ahEIelcJWNQrEwjxSYpk3xXI X-Received: by 2002:a63:5910:0:b0:42b:68a1:4207 with SMTP id n16-20020a635910000000b0042b68a14207mr53560353pgb.326.1670028241089; Fri, 02 Dec 2022 16:44:01 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670028241; cv=none; d=google.com; s=arc-20160816; b=uaXRTDXOSx3TBlUutfDy3/M9OEiohFaSHQXlxPsSfucvltBgVeMKltsBYeAGf83ep+ fE1NAwOggYfRS0lWqpAFjAWHeSZm75saSiIkzbbB4kukp+pU3YkIvkTyPUgyHG7VHV00 MSVPKyFOFM+7kmHr13XUk4+A3XEhr62C9yccWFWb7CAhWdNXQCuZZDSivAiJpl3+yPlK w4cDRIxiA48SR/kZqLrcaZrYinUHpQpYhax4x7Jkr3aLNhXeu9+B2EDFufF84J8LdQE3 kxHDpqUICKCujDC1VqtLLT2MR9w7czyo6RxZe/gHA/Yqr/ROaPAY6zkCm2xDADYKdrNH htBw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=sme6E7WId2b/OescfNOtoFZmHp2FGiGCXrZa7345wuM=; b=MRz/iO+dJSa+M247RCEQhwTZ+xWd7k/Gsk0v78k3qLZN+QRndhrSXFC89oSfI2h8MC 3FKZctixXJjTUrqMJWe0cTONU3M0oJXPZ3deoFq5RU7HInjET4/VKO9aaQ+GKsSgmscl kDITpU/45v9Ri7y6FBOSfor4jNpME03qDa84RxJf5xG74/OyNft5I21CBlkEpN46F9jX d6rAEk+hoB2a44/GbWKAaA9ZgdbwEMeieLHRMgd/3kWQje+7BOkl+cp6wjeb30vRJRWq HCvR4CZgus1tjQll3WztifEJhd3mv5WkeufSMXN9o67u2P58lVTymRWHALJRD0L9T4mC iAVg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=JLznq60w; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id c7-20020a056a000ac700b0056bf15d0cbfsi9564888pfl.308.2022.12.02.16.43.48; Fri, 02 Dec 2022 16:44:01 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=JLznq60w; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235132AbiLCAnZ (ORCPT + 99 others); Fri, 2 Dec 2022 19:43:25 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41864 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235279AbiLCAmD (ORCPT ); Fri, 2 Dec 2022 19:42:03 -0500 Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 299851042E0; Fri, 2 Dec 2022 16:38:27 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1670027908; x=1701563908; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=9Rgq6Is1ft2SHhT8Knmc6hxAfAY7NUj7tCn2AnWmhHU=; b=JLznq60wRLmOAHjqC+rkZsCOl7pFbD2wpjlzi9HyaQHHk/0UJhUMJSXh 96jhzCFrokGQJxiP9kg+E6JbcRDKwJOizY6Jetv5tL5pYQ4Sdd9Wouw84 hdvq2xzmb3ya8x+vd0PlzwoWhodqhdDSA6epCLxMMqow5ujDt1uzXeisy sSdCZNAUDr5z7ck1cHlvksd1oB69AebpTJxCWnMXIZArPqu/p+zCCJdbZ oRabDTvIrhDFU/6Y30qAvBJBvOh9UBQxVhkTmsZMCNi10iAEJ1Hbp9haz mDAvCfwqMNojKgsWSdY9ssqWj+XfMlzoDD1/vakepWxjllmo+sJS0bOrr w==; X-IronPort-AV: E=McAfee;i="6500,9779,10549"; a="313711482" X-IronPort-AV: E=Sophos;i="5.96,213,1665471600"; d="scan'208";a="313711482" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2022 16:37:40 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10549"; a="787479999" X-IronPort-AV: E=Sophos;i="5.96,213,1665471600"; d="scan'208";a="787479999" Received: from bgordon1-mobl1.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.212.211.211]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2022 16:37:38 -0800 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . 
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com Cc: rick.p.edgecombe@intel.com Subject: [PATCH v4 32/39] x86: Expose thread features in /proc/$PID/status Date: Fri, 2 Dec 2022 16:35:59 -0800 Message-Id: <20221203003606.6838-33-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221203003606.6838-1-rick.p.edgecombe@intel.com> References: <20221203003606.6838-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751151533025356032?= X-GMAIL-MSGID: =?utf-8?q?1751151533025356032?= Applications and loaders can have logic to decide whether to enable shadow stack. They usually don't report whether shadow stack has been enabled or not, so there is no way to verify whether an application actually is protected by shadow stack. Add two lines in /proc/$PID/status to report enabled and locked features. Since, this involves referring to arch specific defines in asm/prctl.h, implement an arch breakout to emit the feature lines. Tested-by: Pengfei Xu Tested-by: John Allen Signed-off-by: Kirill A. Shutemov [Switched to CET, added to commit log] Signed-off-by: Rick Edgecombe Reviewed-by: Kees Cook --- v4: - Remove "CET" references v3: - Move to /proc/pid/status (Kees) v2: - New patch arch/x86/kernel/cpu/proc.c | 23 +++++++++++++++++++++++ fs/proc/array.c | 6 ++++++ include/linux/proc_fs.h | 2 ++ 3 files changed, 31 insertions(+) diff --git a/arch/x86/kernel/cpu/proc.c b/arch/x86/kernel/cpu/proc.c index 099b6f0d96bd..31c0e68f6227 100644 --- a/arch/x86/kernel/cpu/proc.c +++ b/arch/x86/kernel/cpu/proc.c @@ -4,6 +4,8 @@ #include #include #include +#include +#include #include "cpu.h" @@ -175,3 +177,24 @@ const struct seq_operations cpuinfo_op = { .stop = c_stop, .show = show_cpuinfo, }; + +#ifdef CONFIG_X86_USER_SHADOW_STACK +static void dump_x86_features(struct seq_file *m, unsigned long features) +{ + if (features & ARCH_SHSTK_SHSTK) + seq_puts(m, "shstk "); + if (features & ARCH_SHSTK_WRSS) + seq_puts(m, "wrss "); +} + +void arch_proc_pid_thread_features(struct seq_file *m, struct task_struct *task) +{ + seq_puts(m, "x86_Thread_features:\t"); + dump_x86_features(m, task->thread.features); + seq_putc(m, '\n'); + + seq_puts(m, "x86_Thread_features_locked:\t"); + dump_x86_features(m, task->thread.features_locked); + seq_putc(m, '\n'); +} +#endif /* CONFIG_X86_USER_SHADOW_STACK */ diff --git a/fs/proc/array.c b/fs/proc/array.c index d2a94eafe9a3..96ee17d68b4c 100644 --- a/fs/proc/array.c +++ b/fs/proc/array.c @@ -433,6 +433,11 @@ static inline void task_untag_mask(struct seq_file *m, struct mm_struct *mm) seq_printf(m, "untag_mask:\t%#lx\n", mm_untag_mask(mm)); } +__weak void arch_proc_pid_thread_features(struct seq_file *m, + struct task_struct *task) +{ +} + int proc_pid_status(struct seq_file *m, struct pid_namespace *ns, struct pid *pid, struct task_struct *task) { @@ -457,6 +462,7 @@ int proc_pid_status(struct seq_file *m, struct pid_namespace *ns, task_cpus_allowed(m, task); cpuset_task_status_allowed(m, 
task); task_context_switch_counts(m, task); + arch_proc_pid_thread_features(m, task); return 0; } diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h index 81d6e4ec2294..5a8b21c0a587 100644 --- a/include/linux/proc_fs.h +++ b/include/linux/proc_fs.h @@ -158,6 +158,8 @@ int proc_pid_arch_status(struct seq_file *m, struct pid_namespace *ns, struct pid *pid, struct task_struct *task); #endif /* CONFIG_PROC_PID_ARCH_STATUS */ +void arch_proc_pid_thread_features(struct seq_file *m, struct task_struct *task); + #else /* CONFIG_PROC_FS */ static inline void proc_root_init(void) From patchwork Sat Dec 3 00:36:00 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 29192 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp1143296wrr; Fri, 2 Dec 2022 16:43:58 -0800 (PST) X-Google-Smtp-Source: AA0mqf6pCrs+nwvxSF0r8wJSjQwpK9C+T3AOFHJLyDOOow4D+Iu8P8c+20sIEpeancVEyWsPBvOZ X-Received: by 2002:a17:90b:a4b:b0:219:2c16:f1a1 with SMTP id gw11-20020a17090b0a4b00b002192c16f1a1mr32270143pjb.171.1670028237949; Fri, 02 Dec 2022 16:43:57 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670028237; cv=none; d=google.com; s=arc-20160816; b=pD2A44vrufZ8wf2sNf+uCecIRUNH9Prq8YRJgDJna0jSKbY0x9BTiSXnL6tTPWpcyE thENofII6ZQ8ZdkQfGYa7vVjlXZYmAwL9/fc5CsfaV+cUHMonw7BaVNb/j5g/HTdVyff cepJrhe7JqahA5TDMYJ/hhdmK5cGGR52R1jPoz+x5h+hwOwoW7XEHT8aLin93SEwiZl2 cP98SClH2PckqzQ4hGLVHjt+T1P7CVhXuWJFvYsq+HyLI6MMaJx1UDjiY1cVLLZbd/aL 2QOyiFByQRKfjp2go1q6B6ednNo8y3ebX8XB4+mo7YOcJtXIIAM1lYmhzy4bSzAB63vf FQyg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=+de84YIoYZqxK0JVMOFq+r9tX1VB9+lOs+4PRBWG3+w=; b=N9lEgdIWHdOpgxvBIoFvfSP1lZJe2iYnBM2UtCYj9ShWFKGbjsZpAPAdsL1oFV3b1q yVZjg6Pml7j40mOnFL2vO94XYH3M+D+y2maoJpxaKe9rQKlgJLXGFiKx2rOQCUFXUNWU q69kIkJ422vrg2V+jONMuDQ9zbotmrb3KGpFTNfp/uakeGI/iMnPEFvRWfJr8hyQ0nv5 yEECxV7PM5B0N8mpBPe3HLsl02bXo4VEv8T0gitvkT1KPJ+W3BfSb/92wkM5SK23LXM4 QfkKFY83MDboprh0L94jx6Talm/bLazq39ifBmHwhP5sruJjESRsxXSe2QiDVwsYhy0v 4fkA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=nk9tCHQw; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id u11-20020a6540cb000000b004788e156154si1919723pgp.360.2022.12.02.16.43.45; Fri, 02 Dec 2022 16:43:57 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=nk9tCHQw; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235330AbiLCAnU (ORCPT + 99 others); Fri, 2 Dec 2022 19:43:20 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44152 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235275AbiLCAmC (ORCPT ); Fri, 2 Dec 2022 19:42:02 -0500 Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 29A101042E1; Fri, 2 Dec 2022 16:38:27 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1670027908; x=1701563908; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=oXcpHnXlZARx8+yfiIaAkJ+dfiFM2oEqDcrBgXXzKas=; b=nk9tCHQwz3olEp/1u47Ox28tDtYJTMZQF8GEBMwjB6skfPV9SWCygZCL NT5CE1MTOKB5obYu178zJvAJwvkbelL1PDqyphFtXEyLnxTl6ZPfQ8x0Y Z0fykWezxIyhL+JfVX0nRF9i8WrEBIFkoueFCadIjqKy6oGz4RYT8lflq tuWDoCaOnqh5Pp5fovUUrfLzGSf4C3MKM19Ur7T4/1y/mPWhWKzbi51qL DWT/qBGaFHBCgcMNpiWDsgIes+CgCQNh1RvzFPFMTwEEAdL5+2BqJYXcp TAyzUPajlFEKMY+bQd3ddF21qufL2oRooZIpCGZqe/o+p7Iets4qmerrI A==; X-IronPort-AV: E=McAfee;i="6500,9779,10549"; a="313711510" X-IronPort-AV: E=Sophos;i="5.96,213,1665471600"; d="scan'208";a="313711510" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2022 16:37:42 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10549"; a="787480005" X-IronPort-AV: E=Sophos;i="5.96,213,1665471600"; d="scan'208";a="787480005" Received: from bgordon1-mobl1.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.212.211.211]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2022 16:37:40 -0800 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . 
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com Cc: rick.p.edgecombe@intel.com Subject: [PATCH v4 33/39] x86: Prevent 32 bit operations for 64 bit shstk tasks Date: Fri, 2 Dec 2022 16:36:00 -0800 Message-Id: <20221203003606.6838-34-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221203003606.6838-1-rick.p.edgecombe@intel.com> References: <20221203003606.6838-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751151529829902151?= X-GMAIL-MSGID: =?utf-8?q?1751151529829902151?= Shadow stack is only supported in 64 bit kernels. Since 64 bit kernels can run 32 bit applications using 32 bit emulation it would still be possible to support shadow stack for 32 bit applications. The challenge for this would be the shadow stack sigframe format uses noncannonical bits in the shadow stack frame, for which there are none in 32 bit. This means the 32 bit shadow stack sigframe would need either an alternate unknown solution or to not support any features that expand the shadow stack sigframe (alt shadow stack). It would add complexity to separate out these future features between the ABIs. But shadow stack support for 32 bit would not likely be very useful. 32 bit emulation is mostly to support old apps, which should not have enabled shadow stack. So this series disables support for 32 bit apps by blocking the arch_prctl()s when made via a 32 bit syscall. But PeterZ points out (see link) that some applications will utilize a 32 bit segment to run 32 bit code inside a 64 bit process. WINE does this to run 32 bit Windows apps. However, doing this with shadow stack is not likely to happen in practice since Windows does not support shadow stack for 32 bit applications either. Any preexisting applications that are marked with shadow stack are unlikely to make it to the point where they can exercise any 32 bit signal interactions anyway, because the HW requires that the shadow stack pointer register needs to be in the 32 bit range in this case, but shadow stack would have been allocated in the 64 bit address space. The shadow stack enabled app would #GP when doing the long jump into the 32 bit segment. So since 32 bit is not easy to support, and there are likely not many users. More cleanly don't support 32 bit signals in a 64 bit address space by not allowing 32 bit ABI signal handlers when shadow stack is enabled. Do this by clearing any 32 bit ABI signal handlers when shadow stack is enabled, and disallow any further 32 bit ABI signal handlers. Also, return an error code for the clone operations when in a 32 bit syscall. 
Link: https://lore.kernel.org/lkml/Y3S5AKhLaU+YuUpQ@hirez.programming.kicks-ass.net/#t Signed-off-by: Rick Edgecombe --- v4: - New patch arch/x86/include/asm/shstk.h | 12 ++++++++++++ arch/x86/include/asm/sighandling.h | 1 + arch/x86/kernel/shstk.c | 10 ++++++++-- arch/x86/kernel/signal.c | 20 ++++++++++++++++++++ arch/x86/kernel/signal_compat.c | 5 +++++ include/linux/ptrace.h | 1 + kernel/signal.c | 8 ++++++++ 7 files changed, 55 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/shstk.h b/arch/x86/include/asm/shstk.h index 746c040f7cb6..c82f22fd5e6d 100644 --- a/arch/x86/include/asm/shstk.h +++ b/arch/x86/include/asm/shstk.h @@ -4,6 +4,7 @@ #ifndef __ASSEMBLY__ #include +#include struct task_struct; struct ksignal; @@ -22,6 +23,12 @@ int shstk_alloc_thread_stack(struct task_struct *p, unsigned long clone_flags, void shstk_free(struct task_struct *p); int setup_signal_shadow_stack(struct ksignal *ksig); int restore_signal_shadow_stack(void); +bool features_enabled(unsigned long features); + +static inline bool shstk_enabled(void) +{ + return features_enabled(ARCH_SHSTK_SHSTK); +} #else static inline long shstk_prctl(struct task_struct *task, int option, unsigned long features) { return -EINVAL; } @@ -33,6 +40,11 @@ static inline int shstk_alloc_thread_stack(struct task_struct *p, static inline void shstk_free(struct task_struct *p) {} static inline int setup_signal_shadow_stack(struct ksignal *ksig) { return 0; } static inline int restore_signal_shadow_stack(void) { return 0; } + +static inline bool shstk_enabled(void) +{ + return false; +} #endif /* CONFIG_X86_USER_SHADOW_STACK */ #endif /* __ASSEMBLY__ */ diff --git a/arch/x86/include/asm/sighandling.h b/arch/x86/include/asm/sighandling.h index e770c4fc47f4..eba88c7a6446 100644 --- a/arch/x86/include/asm/sighandling.h +++ b/arch/x86/include/asm/sighandling.h @@ -24,4 +24,5 @@ int ia32_setup_rt_frame(struct ksignal *ksig, struct pt_regs *regs); int x64_setup_rt_frame(struct ksignal *ksig, struct pt_regs *regs); int x32_setup_rt_frame(struct ksignal *ksig, struct pt_regs *regs); +void flush_32bit_signals(struct task_struct *t); #endif /* _ASM_X86_SIGHANDLING_H */ diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c index e59544fec96d..3a7bcc01d985 100644 --- a/arch/x86/kernel/shstk.c +++ b/arch/x86/kernel/shstk.c @@ -24,11 +24,11 @@ #include #include #include -#include +#include #define SS_FRAME_SIZE 8 -static bool features_enabled(unsigned long features) +bool features_enabled(unsigned long features) { return current->thread.features & features; } @@ -146,6 +146,8 @@ static int shstk_setup(void) if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK) || in_32bit_syscall()) return -EOPNOTSUPP; + flush_32bit_signals(current); + size = adjust_shstk_size(0); addr = alloc_shstk(0, size, 0, false); if (IS_ERR_VALUE(addr)) @@ -183,6 +185,10 @@ int shstk_alloc_thread_stack(struct task_struct *tsk, unsigned long clone_flags, if (!features_enabled(ARCH_SHSTK_SHSTK)) return 0; + /* If shadow stack is enabled, 32 bit syscalls are not supported */ + if (in_32bit_syscall()) + return 1; + /* * For CLONE_VM, except vfork, the child needs a separate shadow * stack. 
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c index b2c9853ce1c5..721b326d61ec 100644 --- a/arch/x86/kernel/signal.c +++ b/arch/x86/kernel/signal.c @@ -352,6 +352,26 @@ void signal_fault(struct pt_regs *regs, void __user *frame, char *where) force_sig(SIGSEGV); } +void flush_32bit_signals(struct task_struct *t) +{ + const unsigned long flags_32 = SA_IA32_ABI | SA_X32_ABI; + struct k_sigaction *ka; + int i; + + spin_lock_irq(&t->sighand->siglock); + ka = &t->sighand->action[0]; + for (i = 0; i < _NSIG; i++) { + if (ka->sa.sa_flags & flags_32) { + ka->sa.sa_handler = SIG_DFL; + ka->sa.sa_flags = 0; + ka->sa.sa_restorer = NULL; + sigemptyset(&ka->sa.sa_mask); + } + ka++; + } + spin_unlock_irq(&t->sighand->siglock); +} + #ifdef CONFIG_DYNAMIC_SIGFRAME #ifdef CONFIG_STRICT_SIGALTSTACK_SIZE static bool strict_sigaltstack_size __ro_after_init = true; diff --git a/arch/x86/kernel/signal_compat.c b/arch/x86/kernel/signal_compat.c index d441804443d5..9c73435bc393 100644 --- a/arch/x86/kernel/signal_compat.c +++ b/arch/x86/kernel/signal_compat.c @@ -177,6 +177,11 @@ static inline void signal_compat_build_tests(void) /* any new si_fields should be added here */ } +bool sigaction_compat_invalid(void) +{ + return in_32bit_syscall() && shstk_enabled(); +} + void sigaction_compat_abi(struct k_sigaction *act, struct k_sigaction *oact) { signal_compat_build_tests(); diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h index c952c5ba8fab..30ec68f56caf 100644 --- a/include/linux/ptrace.h +++ b/include/linux/ptrace.h @@ -405,6 +405,7 @@ static inline void user_single_step_report(struct pt_regs *regs) extern int task_current_syscall(struct task_struct *target, struct syscall_info *info); extern void sigaction_compat_abi(struct k_sigaction *act, struct k_sigaction *oact); +bool sigaction_compat_invalid(void); /* * ptrace report for syscall entry and exit looks identical. 
diff --git a/kernel/signal.c b/kernel/signal.c index d140672185a4..a75351c8fc0e 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -4079,6 +4079,11 @@ void kernel_sigaction(int sig, __sighandler_t action) } EXPORT_SYMBOL(kernel_sigaction); +bool __weak sigaction_compat_invalid(void) +{ + return false; +} + void __weak sigaction_compat_abi(struct k_sigaction *act, struct k_sigaction *oact) { @@ -4093,6 +4098,9 @@ int do_sigaction(int sig, struct k_sigaction *act, struct k_sigaction *oact) if (!valid_signal(sig) || sig < 1 || (act && sig_kernel_only(sig))) return -EINVAL; + if (sigaction_compat_invalid()) + return -EINVAL; + k = &p->sighand->action[sig-1]; spin_lock_irq(&p->sighand->siglock); From patchwork Sat Dec 3 00:36:01 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 29194 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp1143377wrr; Fri, 2 Dec 2022 16:44:12 -0800 (PST) X-Google-Smtp-Source: AA0mqf4iKOUYJ6myaSNtSf2wbtwz/gxSoVUL/c67Y7DUy8EemuLlmmA7WbbBOQ4vJJwEd6biMvSa X-Received: by 2002:a17:902:ab8d:b0:17f:8232:257a with SMTP id f13-20020a170902ab8d00b0017f8232257amr55558367plr.138.1670028252299; Fri, 02 Dec 2022 16:44:12 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670028252; cv=none; d=google.com; s=arc-20160816; b=tKmhds30o9m9ex+62nLGjMQ2BRcj3j/sJ2SjzOa7g5Bqi685bdLdwrpKMek2W9jZxg pPn1e9FQ15GRwlVybVuca7Go407h/ffQ04UpY5kcyQP7v4IJ7rKV1fqRwzPda1gZlgXm +qhVw8U9t8LCHp0Xuoy72Dh/AflSwq4yCp376wwEXYCFSruSgsPnTKaROzg4k7Ma+x6K RBWLLH3Hv3PsvIIqv+/KHVYHwPa2FP0QGB44dYTSVksGNmVFG3pAvYWV4/5SB0IL6by9 X8dnJJPg9DOXCib0FNdq97YZhD/AIW7cZPI7jB+pRSNBc841lbMWItQqE66J7CsWCHPx IjNg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=uvxv1iubneua9nCs2uvSGpfhBBq803f1R3WCljoUdUo=; b=VpoPUsK1Uvcii8Mr/NtIva1v1Xwrd/1ZVArBW2SDkYlf+MbeBYblUvD2FNpKZjGoWh ML36Y4LoygKc47HHoZHQRAQvwyQmExPumItn83z4HxqQl21AkDs02gr1WwsLeXkWD2Ty YfkwvNSyxkQpWRooOpzDQF1WRvDxPzFNJc8clfQNexnA88m9uJ2BGO8wGrP8ln7OLolY y0GKLGgNwwxKzYPVpjS5vCWdNM8BKMwi052k7LicnZ/CUeCDtF4PHuGqqUSF4u7yG0Mn /Wa7Tvdg35WN/rDhZQjMAPZvs9QsvKJLvZVxOxPyoU3blarZyoabAx+Bkx6Hv8UUcGuQ o6hg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=MACDcLCH; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id h2-20020a170902704200b0018930d070bdsi7830128plt.72.2022.12.02.16.43.58; Fri, 02 Dec 2022 16:44:12 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=MACDcLCH; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235001AbiLCAnn (ORCPT + 99 others); Fri, 2 Dec 2022 19:43:43 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54102 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234980AbiLCAmF (ORCPT ); Fri, 2 Dec 2022 19:42:05 -0500 Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 596E3FA451; Fri, 2 Dec 2022 16:38:32 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1670027912; x=1701563912; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=tLQzvtFCG7gPr37x5ob+Amj80mPdYmHi/Bjvakhw+Z8=; b=MACDcLCHEH3KqhNgCMhCN9N6ImoxZTG/+c71XeRUzK2pvixCpU9+rdeg uLm/R98OeQWCQW95iYfcmQOTYpSCDCr62MiIxMJ9sUJDAN35mCmc/rYz3 ceBDHy914aIGzz93MHfZQxBuCr9xGdXLumvRvNbYxAWWjSpmDiidYN6KZ 1MwlNzHQmhwE2z7KQt6L3l3cw2FarSDWdC7P0QxIjtLe+rTBnHS3w/nL9 yOX5nW+5J22ek9XZoGzolFI5cEH/lJj5f4UpNAyUFa/uvIxHiHDdDpgAy v3PVil9NziIFXNJHlW1HQPBFa8d3WtHyoVegC7il+th6kt13FlsG8baCz g==; X-IronPort-AV: E=McAfee;i="6500,9779,10549"; a="313711538" X-IronPort-AV: E=Sophos;i="5.96,213,1665471600"; d="scan'208";a="313711538" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2022 16:37:44 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10549"; a="787480020" X-IronPort-AV: E=Sophos;i="5.96,213,1665471600"; d="scan'208";a="787480020" Received: from bgordon1-mobl1.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.212.211.211]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2022 16:37:42 -0800 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . 
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com Cc: rick.p.edgecombe@intel.com Subject: [PATCH v4 34/39] x86/shstk: Wire in shadow stack interface Date: Fri, 2 Dec 2022 16:36:01 -0800 Message-Id: <20221203003606.6838-35-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221203003606.6838-1-rick.p.edgecombe@intel.com> References: <20221203003606.6838-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751151544604757957?= X-GMAIL-MSGID: =?utf-8?q?1751151544604757957?= The kernel now has the main shadow stack functionality to support applications. Wire in the WRSS and shadow stack enable/disable functions into the existing shadow stack API skeleton. Tested-by: Pengfei Xu Tested-by: John Allen Reviewed-by: Kees Cook Signed-off-by: Rick Edgecombe --- v4: - Remove "CET" references v2: - Split from other patches arch/x86/kernel/shstk.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c index 3a7bcc01d985..5d91e653f77a 100644 --- a/arch/x86/kernel/shstk.c +++ b/arch/x86/kernel/shstk.c @@ -468,9 +468,17 @@ long shstk_prctl(struct task_struct *task, int option, unsigned long features) return -EINVAL; if (option == ARCH_SHSTK_DISABLE) { + if (features & ARCH_SHSTK_WRSS) + return wrss_control(false); + if (features & ARCH_SHSTK_SHSTK) + return shstk_disable(); return -EINVAL; } /* Handle ARCH_SHSTK_ENABLE */ + if (features & ARCH_SHSTK_SHSTK) + return shstk_setup(); + if (features & ARCH_SHSTK_WRSS) + return wrss_control(true); return -EINVAL; } From patchwork Sat Dec 3 00:36:02 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 29196 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp1143493wrr; Fri, 2 Dec 2022 16:44:37 -0800 (PST) X-Google-Smtp-Source: AA0mqf63Pa5DHwbf3Gh/w1DAPDqtngV49U1iP0oh69JC+rRbgmRF70RTpEiKi4Eb7923WIrrsTRV X-Received: by 2002:a63:3113:0:b0:478:99c3:7cf9 with SMTP id x19-20020a633113000000b0047899c37cf9mr957351pgx.616.1670028276865; Fri, 02 Dec 2022 16:44:36 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670028276; cv=none; d=google.com; s=arc-20160816; b=hcSE3ZUQwWohz8AtS0mp7BVuwaXGz7aVTCb2GyxhDvBRYZKCqgp3q5CfP2IXgmrCLc LTBEdXmgbtTEnmDZLI6RZyhSnD6HgndJtw1PbDmWmMahezs4Lciwyf99kkCQI0nPbbK9 WHhk3OIDVk3PiHmCVzNygwMoNQ7397uyLIXczSSecsaNzIK3T26llWy4PvwlSWUOTb7N J622eRgNvODemxiv5Lb+yUMkVsMgVwaHFKlGxruf3GDy82zSslkzyKue5WnBLZs7SsGc Kz7GF9LQJzU/2ME0JuDOvyirLNio9sn+9zKLoEoNqHXsy59x9Ig9zS5dVA3rPNCSuk9R Ngsw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=s+tOoCp62C8QcORDPC+o4rrawuoi5o/xVkoFtw6d+QI=; b=qnRIpUFqbANtg8tXXxhSDhlTEAN3BvVHYESK+p+Rh0ZIrMmdGfMSgIBlErlXECNFzs 4Y0Pxnt6gMdjrBY7OT4RH/a8iAgkNU6B3Mlhh65iyrzHr8SEpHt2NQ3J1YQfG0UweVb9 
v+kOhjsnyrP5ho4bU+2CREFF7GDN6F75buAqKRhszKPYIWeeMyyRU9sIOOt3IjfK0DLH TKcr75fXe3aYjzr8TSSqWFf3IuN5qegNHrts5yQJIi2Mg99/g7PjiplhwUgWwAUtTj0o 4viSfAESiGFVo5bD1WIzR2GiaDYd+IiZsMI36uK/lc2aDpw1jgc7CjTrB/JdK7S96PIu I1vw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=b7H+bGg4; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id nd11-20020a17090b4ccb00b00212fa424170si8576491pjb.130.2022.12.02.16.44.24; Fri, 02 Dec 2022 16:44:36 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=b7H+bGg4; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235340AbiLCAn5 (ORCPT + 99 others); Fri, 2 Dec 2022 19:43:57 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43040 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235141AbiLCAmI (ORCPT ); Fri, 2 Dec 2022 19:42:08 -0500 Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A2737FA45C; Fri, 2 Dec 2022 16:38:33 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1670027913; x=1701563913; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=efTvm24gcHxFzzeUdH8PlH6ucCNVivPdQJU6lJEiVBI=; b=b7H+bGg4WTodV/lApmuB4OWJmAWT4t5QrbDvaD5OxiPcs1u7UH/nlFdg dDQff4xf7Pg4vrSWk9JQO040Oh2aHqE/Ju48V6iUI29WORrD7dX2/gJ1U uaAs0cjg5L17ZrRVVv3AEexsyVwD3rwEyPd4Vizoj4VPYLyde4azwLjiJ eXwHtuWBrUxTq6xdS8sSd6XSunoqiQ4b65d77gr0rrrSKJYX6A6QsCV19 jBH2CQj1wjLLqgy4FTKa4N8HwbI/PJE90t1JU03nuIfiIoJa8Pwy0P67n aPbvf6N65WqbDPZ8ZTK6DBLvYXah2JWbBPpzL/7MVGLwbNSc8F4eTKKQh g==; X-IronPort-AV: E=McAfee;i="6500,9779,10549"; a="313711565" X-IronPort-AV: E=Sophos;i="5.96,213,1665471600"; d="scan'208";a="313711565" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2022 16:37:46 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10549"; a="787480029" X-IronPort-AV: E=Sophos;i="5.96,213,1665471600"; d="scan'208";a="787480029" Received: from bgordon1-mobl1.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.212.211.211]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2022 16:37:44 -0800 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . 
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v4 35/39] selftests/x86: Add shadow stack test Date: Fri, 2 Dec 2022 16:36:02 -0800 Message-Id: <20221203003606.6838-36-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221203003606.6838-1-rick.p.edgecombe@intel.com> References: <20221203003606.6838-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751151570369187260?= X-GMAIL-MSGID: =?utf-8?q?1751151570369187260?= Add a simple selftest for exercising some shadow stack behavior: - map_shadow_stack syscall and pivot - Faulting in shadow stack memory - Handling shadow stack violations - GUP of shadow stack memory - mprotect() of shadow stack memory - Userfaultfd on shadow stack memory Since this test exercises a recently added syscall manually, it needs to find the automatically created __NR_foo defines. Per the selftest documentation, KHDR_INCLUDES can be used to help the selftest Makefile's find the headers from the kernel source. This way the new selftest can be built inside the kernel source tree without installing the headers to the system. So also add KHDR_INCLUDES as described in the selftest docs, to facilitate this. Tested-by: Pengfei Xu Tested-by: John Allen Co-developed-by: Yu-cheng Yu Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe --- v4: - Add test for 32 bit signal ABI blocking v3: - Change "+m" to "=m" in write_shstk() (Andrew Cooper) - Fix userfaultfd test with transparent huge pages by doing a MADV_DONTNEED, since the token write faults in the while stack with huge pages. v2: - Change print statements to more align with other selftests - Add more tests - Add KHDR_INCLUDES to Makefile v1: - New patch. tools/testing/selftests/x86/Makefile | 4 +- .../testing/selftests/x86/test_shadow_stack.c | 685 ++++++++++++++++++ 2 files changed, 687 insertions(+), 2 deletions(-) create mode 100644 tools/testing/selftests/x86/test_shadow_stack.c diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile index c1a16a9d4f2f..7e8c937627dd 100644 --- a/tools/testing/selftests/x86/Makefile +++ b/tools/testing/selftests/x86/Makefile @@ -18,7 +18,7 @@ TARGETS_C_32BIT_ONLY := entry_from_vm86 test_syscall_vdso unwind_vdso \ test_FCMOV test_FCOMI test_FISTTP \ vdso_restorer TARGETS_C_64BIT_ONLY := fsgsbase sysret_rip syscall_numbering \ - corrupt_xstate_header amx lam + corrupt_xstate_header amx lam test_shadow_stack # Some selftests require 32bit support enabled also on 64bit systems TARGETS_C_32BIT_NEEDED := ldt_gdt ptrace_syscall @@ -34,7 +34,7 @@ BINARIES_64 := $(TARGETS_C_64BIT_ALL:%=%_64) BINARIES_32 := $(patsubst %,$(OUTPUT)/%,$(BINARIES_32)) BINARIES_64 := $(patsubst %,$(OUTPUT)/%,$(BINARIES_64)) -CFLAGS := -O2 -g -std=gnu99 -pthread -Wall +CFLAGS := -O2 -g -std=gnu99 -pthread -Wall $(KHDR_INCLUDES) # call32_from_64 in thunks.S uses absolute addresses. 
ifeq ($(CAN_BUILD_WITH_NOPIE),1) diff --git a/tools/testing/selftests/x86/test_shadow_stack.c b/tools/testing/selftests/x86/test_shadow_stack.c new file mode 100644 index 000000000000..ec5864f0e9ef --- /dev/null +++ b/tools/testing/selftests/x86/test_shadow_stack.c @@ -0,0 +1,685 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * This program test's basic kernel shadow stack support. It enables shadow + * stack manual via the arch_prctl(), instead of relying on glibc. It's + * Makefile doesn't compile with shadow stack support, so it doesn't rely on + * any particular glibc. As a result it can't do any operations that require + * special glibc shadow stack support (longjmp(), swapcontext(), etc). Just + * stick to the basics and hope the compiler doesn't do anything strange. + */ + +#define _GNU_SOURCE + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define SS_SIZE 0x200000 + +#if (__GNUC__ < 8) || (__GNUC__ == 8 && __GNUC_MINOR__ < 5) +int main(int argc, char *argv[]) +{ + printf("[SKIP]\tCompiler does not support CET.\n"); + return 0; +} +#else +void write_shstk(unsigned long *addr, unsigned long val) +{ + asm volatile("wrssq %[val], (%[addr])\n" + : "=m" (addr) + : [addr] "r" (addr), [val] "r" (val)); +} + +static inline unsigned long __attribute__((always_inline)) get_ssp(void) +{ + unsigned long ret = 0; + + asm volatile("xor %0, %0; rdsspq %0" : "=r" (ret)); + return ret; +} + +/* + * For use in inline enablement of shadow stack. + * + * The program can't return from the point where shadow stack get's enabled + * because there will be no address on the shadow stack. So it can't use + * syscall() for enablement, since it is a function. + * + * Based on code from nolibc.h. Keep a copy here because this can't pull in all + * of nolibc.h. 
+ */ +#define ARCH_PRCTL(arg1, arg2) \ +({ \ + long _ret; \ + register long _num asm("eax") = __NR_arch_prctl; \ + register long _arg1 asm("rdi") = (long)(arg1); \ + register long _arg2 asm("rsi") = (long)(arg2); \ + \ + asm volatile ( \ + "syscall\n" \ + : "=a"(_ret) \ + : "r"(_arg1), "r"(_arg2), \ + "0"(_num) \ + : "rcx", "r11", "memory", "cc" \ + ); \ + _ret; \ +}) + +void *create_shstk(void *addr) +{ + return (void *)syscall(__NR_map_shadow_stack, addr, SS_SIZE, SHADOW_STACK_SET_TOKEN); +} + +void *create_normal_mem(void *addr) +{ + return mmap(addr, SS_SIZE, PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS, 0, 0); +} + +void free_shstk(void *shstk) +{ + munmap(shstk, SS_SIZE); +} + +int reset_shstk(void *shstk) +{ + return madvise(shstk, SS_SIZE, MADV_DONTNEED); +} + +void try_shstk(unsigned long new_ssp) +{ + unsigned long ssp; + + printf("[INFO]\tnew_ssp = %lx, *new_ssp = %lx\n", + new_ssp, *((unsigned long *)new_ssp)); + + ssp = get_ssp(); + printf("[INFO]\tchanging ssp from %lx to %lx\n", ssp, new_ssp); + + asm volatile("rstorssp (%0)\n":: "r" (new_ssp)); + asm volatile("saveprevssp"); + printf("[INFO]\tssp is now %lx\n", get_ssp()); + + /* Switch back to original shadow stack */ + ssp -= 8; + asm volatile("rstorssp (%0)\n":: "r" (ssp)); + asm volatile("saveprevssp"); +} + +int test_shstk_pivot(void) +{ + void *shstk = create_shstk(0); + + if (shstk == MAP_FAILED) { + printf("[FAIL]\tError creating shadow stack: %d\n", errno); + return 1; + } + try_shstk((unsigned long)shstk + SS_SIZE - 8); + free_shstk(shstk); + + printf("[OK]\tShadow stack pivot\n"); + return 0; +} + +int test_shstk_faults(void) +{ + unsigned long *shstk = create_shstk(0); + + /* Read shadow stack, test if it's zero to not get read optimized out */ + if (*shstk != 0) + goto err; + + /* Wrss memory that was already read. */ + write_shstk(shstk, 1); + if (*shstk != 1) + goto err; + + /* Page out memory, so we can wrss it again. */ + if (reset_shstk((void *)shstk)) + goto err; + + write_shstk(shstk, 1); + if (*shstk != 1) + goto err; + + printf("[OK]\tShadow stack faults\n"); + return 0; + +err: + return 1; +} + +unsigned long saved_ssp; +unsigned long saved_ssp_val; +volatile bool segv_triggered; + +void __attribute__((noinline)) violate_ss(void) +{ + saved_ssp = get_ssp(); + saved_ssp_val = *(unsigned long *)saved_ssp; + + /* Corrupt shadow stack */ + printf("[INFO]\tCorrupting shadow stack\n"); + write_shstk((void *)saved_ssp, 0); +} + +void segv_handler(int signum, siginfo_t *si, void *uc) +{ + printf("[INFO]\tGenerated shadow stack violation successfully\n"); + + segv_triggered = true; + + /* Fix shadow stack */ + write_shstk((void *)saved_ssp, saved_ssp_val); +} + +int test_shstk_violation(void) +{ + struct sigaction sa; + + sa.sa_sigaction = segv_handler; + if (sigaction(SIGSEGV, &sa, NULL)) + return 1; + sa.sa_flags = SA_SIGINFO; + + segv_triggered = false; + + /* Make sure segv_triggered is set before violate_ss() */ + asm volatile("" : : : "memory"); + + violate_ss(); + + signal(SIGSEGV, SIG_DFL); + + printf("[OK]\tShadow stack violation test\n"); + + return !segv_triggered; +} + +/* Gup test state */ +#define MAGIC_VAL 0x12345678 +bool is_shstk_access; +void *shstk_ptr; +int fd; + +void reset_test_shstk(void *addr) +{ + if (shstk_ptr != NULL) + free_shstk(shstk_ptr); + shstk_ptr = create_shstk(addr); +} + +void test_access_fix_handler(int signum, siginfo_t *si, void *uc) +{ + printf("[INFO]\tViolation from %s\n", is_shstk_access ? 
"shstk access" : "normal write"); + + segv_triggered = true; + + /* Fix shadow stack */ + if (is_shstk_access) { + reset_test_shstk(shstk_ptr); + return; + } + + free_shstk(shstk_ptr); + create_normal_mem(shstk_ptr); +} + +bool test_shstk_access(void *ptr) +{ + is_shstk_access = true; + segv_triggered = false; + write_shstk(ptr, MAGIC_VAL); + + asm volatile("" : : : "memory"); + + return segv_triggered; +} + +bool test_write_access(void *ptr) +{ + is_shstk_access = false; + segv_triggered = false; + *(unsigned long *)ptr = MAGIC_VAL; + + asm volatile("" : : : "memory"); + + return segv_triggered; +} + +bool gup_write(void *ptr) +{ + unsigned long val; + + lseek(fd, (unsigned long)ptr, SEEK_SET); + if (write(fd, &val, sizeof(val)) < 0) + return 1; + + return 0; +} + +bool gup_read(void *ptr) +{ + unsigned long val; + + lseek(fd, (unsigned long)ptr, SEEK_SET); + if (read(fd, &val, sizeof(val)) < 0) + return 1; + + return 0; +} + +int test_gup(void) +{ + struct sigaction sa; + int status; + pid_t pid; + + sa.sa_sigaction = test_access_fix_handler; + if (sigaction(SIGSEGV, &sa, NULL)) + return 1; + sa.sa_flags = SA_SIGINFO; + + segv_triggered = false; + + fd = open("/proc/self/mem", O_RDWR); + if (fd == -1) + return 1; + + reset_test_shstk(0); + if (gup_read(shstk_ptr)) + return 1; + if (test_shstk_access(shstk_ptr)) + return 1; + printf("[INFO]\tGup read -> shstk access success\n"); + + reset_test_shstk(0); + if (gup_write(shstk_ptr)) + return 1; + if (test_shstk_access(shstk_ptr)) + return 1; + printf("[INFO]\tGup write -> shstk access success\n"); + + reset_test_shstk(0); + if (gup_read(shstk_ptr)) + return 1; + if (!test_write_access(shstk_ptr)) + return 1; + printf("[INFO]\tGup read -> write access success\n"); + + reset_test_shstk(0); + if (gup_write(shstk_ptr)) + return 1; + if (!test_write_access(shstk_ptr)) + return 1; + printf("[INFO]\tGup write -> write access success\n"); + + close(fd); + + /* COW/gup test */ + reset_test_shstk(0); + pid = fork(); + if (!pid) { + fd = open("/proc/self/mem", O_RDWR); + if (fd == -1) + exit(1); + + if (gup_write(shstk_ptr)) { + close(fd); + exit(1); + } + close(fd); + exit(0); + } + waitpid(pid, &status, 0); + if (WEXITSTATUS(status)) { + printf("[FAIL]\tWrite in child failed\n"); + return 1; + } + if (*(unsigned long *)shstk_ptr == MAGIC_VAL) { + printf("[FAIL]\tWrite in child wrote through to shared memory\n"); + return 1; + } + + printf("[INFO]\tCow gup write -> write access success\n"); + + free_shstk(shstk_ptr); + + signal(SIGSEGV, SIG_DFL); + + printf("[OK]\tShadow gup test\n"); + + return 0; +} + +int test_mprotect(void) +{ + struct sigaction sa; + + sa.sa_sigaction = test_access_fix_handler; + if (sigaction(SIGSEGV, &sa, NULL)) + return 1; + sa.sa_flags = SA_SIGINFO; + + segv_triggered = false; + + /* mprotect a shaodw stack as read only */ + reset_test_shstk(0); + if (mprotect(shstk_ptr, SS_SIZE, PROT_READ) < 0) { + printf("[FAIL]\tmprotect(PROT_READ) failed\n"); + return 1; + } + + /* try to wrss it and fail */ + if (!test_shstk_access(shstk_ptr)) { + printf("[FAIL]\tShadow stack access to read-only memory succeeded\n"); + return 1; + } + + /* then back to writable */ + if (mprotect(shstk_ptr, SS_SIZE, PROT_WRITE | PROT_READ) < 0) { + printf("[FAIL]\tmprotect(PROT_WRITE) failed\n"); + return 1; + } + + /* then pivot to it and succeed */ + if (test_shstk_access(shstk_ptr)) { + printf("[FAIL]\tShadow stack access to mprotect() writable memory failed\n"); + return 1; + } + + free_shstk(shstk_ptr); + + signal(SIGSEGV, SIG_DFL); + + 
printf("[OK]\tmprotect() test\n"); + + return 0; +} + +char zero[4096]; + +static void *uffd_thread(void *arg) +{ + struct uffdio_copy req; + int uffd = *(int *)arg; + struct uffd_msg msg; + + if (read(uffd, &msg, sizeof(msg)) <= 0) + return (void *)1; + + req.dst = msg.arg.pagefault.address; + req.src = (__u64)zero; + req.len = 4096; + req.mode = 0; + + if (ioctl(uffd, UFFDIO_COPY, &req)) + return (void *)1; + + return (void *)0; +} + +int test_userfaultfd(void) +{ + struct uffdio_register uffdio_register; + struct uffdio_api uffdio_api; + struct sigaction sa; + pthread_t thread; + void *res; + int uffd; + + sa.sa_sigaction = test_access_fix_handler; + if (sigaction(SIGSEGV, &sa, NULL)) + return 1; + sa.sa_flags = SA_SIGINFO; + + uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK); + if (uffd < 0) { + printf("[SKIP]\tUserfaultfd unavailable.\n"); + return 0; + } + + reset_test_shstk(0); + + uffdio_api.api = UFFD_API; + uffdio_api.features = 0; + if (ioctl(uffd, UFFDIO_API, &uffdio_api)) + goto err; + + uffdio_register.range.start = (__u64)shstk_ptr; + uffdio_register.range.len = 4096; + uffdio_register.mode = UFFDIO_REGISTER_MODE_MISSING; + if (ioctl(uffd, UFFDIO_REGISTER, &uffdio_register)) + goto err; + + if (pthread_create(&thread, NULL, &uffd_thread, &uffd)) + goto err; + + reset_shstk(shstk_ptr); + test_shstk_access(shstk_ptr); + + if (pthread_join(thread, &res)) + goto err; + + if (test_shstk_access(shstk_ptr)) + goto err; + + free_shstk(shstk_ptr); + + signal(SIGSEGV, SIG_DFL); + + if (!res) + printf("[OK]\tUserfaultfd test\n"); + return !!res; +err: + free_shstk(shstk_ptr); + close(uffd); + signal(SIGSEGV, SIG_DFL); + return 1; +} + +/* + * Too complicated to pull it out of the 32 bit header, but also get the + * 64 bit one needed above. Just define a copy here. + */ +#define __NR_compat_sigaction 67 + +/* + * Call 32 bit signal handler to get 32 bit signals ABI. Make sure + * to push the registers that will get clobbered. + */ +int sigaction32(int signum, const struct sigaction *restrict act, + struct sigaction *restrict oldact) +{ + register long syscall_reg asm("eax") = __NR_compat_sigaction; + register long signum_reg asm("ebx") = signum; + register long act_reg asm("ecx") = (long)act; + register long oldact_reg asm("edx") = (long)oldact; + int ret = 0; + + asm volatile ( "push %%r8;" \ + "push %%r9;" \ + "push %%r10;" \ + "push %%r11;" \ + "int $0x80;" \ + "pop %%r11;" + "pop %%r10;" + "pop %%r9;" + "pop %%r8;" + : "=a"(ret), "=m"(oldact) + : "r"(syscall_reg), "r"(signum_reg), "r"(act_reg), + "r"(oldact_reg) + : + ); + + return ret; +} + +/* + * 32 bit signal ABI is not compatible with shadow stack. This test + * checks that they are disabled properly. + */ +int test_32bit_signals_disabled(void) +{ + struct sigaction *act, old; + int ret = 0; + + /* Create sigaction in 32 bit address range */ + act = mmap(0, 4096, PROT_READ | PROT_WRITE, + MAP_32BIT | MAP_PRIVATE | MAP_ANONYMOUS, 0, 0); + act->sa_flags = SA_SIGINFO; + + /* + * Set handler to somewhere in 32 bit address space, doesn't matter it + * won't get called. + */ + act->sa_handler = (void *)act+1; + if (sigaction32(SIGUSR1, act, NULL)) + return 1; + + if (ARCH_PRCTL(ARCH_SHSTK_ENABLE, ARCH_SHSTK_SHSTK)) { + printf("[SKIP]\tCould not re-enable Shadow stack\n"); + return 1; + } + + /* + * Check that the shadow stack enabling above disabled the signal + * handler. + */ + if (sigaction(SIGUSR1, NULL, &old)) { + ret = 1; + goto out_disable; + } + + /* Did it not get disabled by the ARCH_SHSTK_ENABLE? 
*/ + if (old.sa_handler != SIG_DFL) { + ret = 1; + goto out_disable; + } + + /* Registering new 32 bit signals should fail. */ + if (!sigaction32(SIGUSR1, act, NULL)) { + ret = 1; + } + +out_disable: + if (ARCH_PRCTL(ARCH_SHSTK_DISABLE, ARCH_SHSTK_SHSTK)) { + ret = 1; + printf("[FAIL]\tDisabling shadow stack failed\n"); + } + + if (!ret) + printf("[OK]\t32 bit signal disable test\n"); + + return ret; +} + +int main(int argc, char *argv[]) +{ + int ret = 0; + + if (ARCH_PRCTL(ARCH_SHSTK_ENABLE, ARCH_SHSTK_SHSTK)) { + printf("[SKIP]\tCould not enable Shadow stack\n"); + return 1; + } + + if (ARCH_PRCTL(ARCH_SHSTK_DISABLE, ARCH_SHSTK_SHSTK)) { + ret = 1; + printf("[FAIL]\tDisabling shadow stack failed\n"); + } + + if (ARCH_PRCTL(ARCH_SHSTK_ENABLE, ARCH_SHSTK_SHSTK)) { + printf("[SKIP]\tCould not re-enable Shadow stack\n"); + return 1; + } + + if (ARCH_PRCTL(ARCH_SHSTK_ENABLE, ARCH_SHSTK_WRSS)) { + printf("[SKIP]\tCould not enable WRSS\n"); + ret = 1; + goto out; + } + + /* Should have succeeded if here, but this is a test, so double check. */ + if (!get_ssp()) { + printf("[FAIL]\tShadow stack disabled\n"); + return 1; + } + + if (test_shstk_pivot()) { + ret = 1; + printf("[FAIL]\tShadow stack pivot\n"); + goto out; + } + + if (test_shstk_faults()) { + ret = 1; + printf("[FAIL]\tShadow stack fault test\n"); + goto out; + } + + if (test_shstk_violation()) { + ret = 1; + printf("[FAIL]\tShadow stack violation test\n"); + goto out; + } + + if (test_gup()) { + ret = 1; + printf("[FAIL]\tShadow shadow stack gup\n"); + goto out; + } + + if (test_mprotect()) { + ret = 1; + printf("[FAIL]\tShadow shadow mprotect test\n"); + goto out; + } + + if (test_userfaultfd()) { + ret = 1; + printf("[FAIL]\tUserfaultfd test\n"); + goto out; + } + + if (ARCH_PRCTL(ARCH_SHSTK_DISABLE, ARCH_SHSTK_SHSTK)) { + printf("[SKIP]\tCould not disable Shadow stack\n"); + return 1; + } + + if (test_32bit_signals_disabled()) { + ret = 1; + printf("[FAIL]\t32 bit signal disable test\n"); + } + + return ret; + +out: + /* + * Disable shadow stack before the function returns, or there will be a + * shadow stack violation. 
+ */ + if (ARCH_PRCTL(ARCH_SHSTK_DISABLE, ARCH_SHSTK_SHSTK)) { + ret = 1; + printf("[FAIL]\tDisabling shadow stack failed\n"); + } + + return ret; +} +#endif From patchwork Sat Dec 3 00:36:03 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 29195 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp1143381wrr; Fri, 2 Dec 2022 16:44:14 -0800 (PST) X-Google-Smtp-Source: AA0mqf4emKdGXdpLxlQr9WyUvnrBQ8d7+BBTIDx5GOUfB42NnhxW1BI6XRYrwfbZS0fxWW4485lv X-Received: by 2002:a17:90a:e542:b0:219:9bdc:b0f with SMTP id ei2-20020a17090ae54200b002199bdc0b0fmr2960866pjb.220.1670028254162; Fri, 02 Dec 2022 16:44:14 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670028254; cv=none; d=google.com; s=arc-20160816; b=P0ehY0hHrIZlAHQzObF1g8ceuW+WsPZnQRIwLzhoT26nlwd+fJaB5eDUD9bYvruX/y jcY9VVLy95pIGRPIm7eIEv73LFLLbqvDDqR1u2OEpddzrMGM5tmhwfAgf8GZF8FWX/P0 RLbhY1JJaRQLawVKqpbrQ5wDgyXXj6WHMPBE9osWuIf8oGc0q1mGz0JQtLFRgiDnoWnn 8aO6QoEDpzm3957g0UBoFrcojg2cpDaQtZ+Mj7RtYpZW+/MoYiGJsrxkC0vpAoG2Uf1X oMh4wwqp4q4yK1dRmxsbbvclHZxKzjNPr8DD/J5ff5La1uY7838cyKfReIzZT3LcpVtc FcEA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=UIaubgF5cxlgy9UntqyzMGFahvGgA9BhdHUjWEg2kcA=; b=z4rwK1zUqBF3LSZlCUfx7R9GmGeqOaDbvBZbowFZIpGm4mrNONiqD1G2Gi1GWj7tXc gWpD8doDXa/SGIBp/wTAGZKqH+g7OcW4DtkHaTerbIDcOmnkN0dZiN0p8hMmgwj/gYr6 C3cRpBlQfNUFk+KGpOqSqopDVQ45f2J3Ae3IdyhCO5Wq1xNF/JuVWFGCqNUNWE5SR7BG sieIVs8OD/OGVC610se1UFKtxZwvTTgIbHOlBBA8Q9jMfWbD1caKFXH9Re9fiIwYt4IJ SEdzjqD7uqUVcxg8I+XH8NgkAlgpZUfvMa6UCoixHPIPW8wES4KFp+ZzAqMSt79GjQZA nn0w== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b="jBl/UXj5"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id q17-20020a056a00089100b0056e2b0193bdsi9539792pfj.141.2022.12.02.16.44.00; Fri, 02 Dec 2022 16:44:14 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b="jBl/UXj5"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235267AbiLCAnr (ORCPT + 99 others); Fri, 2 Dec 2022 19:43:47 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41808 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235094AbiLCAmG (ORCPT ); Fri, 2 Dec 2022 19:42:06 -0500 Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 303B4FA44A; Fri, 2 Dec 2022 16:38:35 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1670027915; x=1701563915; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=Y8HP+6SI6A4E79h0YyqSt+wn/k0j51I/AMnEXqlWOFs=; b=jBl/UXj5+Sgarm3jvEG5JfsuMBTkhPu3cbT20MlfLuo9PksxN9L3Uiey o0VwtFiNfE8ZjJHR8CvLNEIveoUW+ogJTy2tREo108glLf+GFVFeANIjt UBTxqwXa37QCT7tn8/ivH3YF46tTT5jDG3DLuF7VduE6fiQUyEe3Yt8Qe Pd60jMyWTONGCYVFpN1OGm+Dx/7NVgk9iqRemsZ0Osq95c8RQmXaacGQ3 GzXwxkwVheZnTwgofK67YIX8f4UYsGj4sS1JENpvxdFbHqCtDiCpFMHhE qr9luA5Hm/sQsJPAjBAl/Aw0utIIHuOeEJU2waA+ZMJDOnTc4zfZERQk9 w==; X-IronPort-AV: E=McAfee;i="6500,9779,10549"; a="313711600" X-IronPort-AV: E=Sophos;i="5.96,213,1665471600"; d="scan'208";a="313711600" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2022 16:37:50 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10549"; a="787480049" X-IronPort-AV: E=Sophos;i="5.96,213,1665471600"; d="scan'208";a="787480049" Received: from bgordon1-mobl1.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.212.211.211]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2022 16:37:46 -0800 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . 
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com Cc: rick.p.edgecombe@intel.com Subject: [PATCH v4 36/39] x86/fpu: Add helper for initing features Date: Fri, 2 Dec 2022 16:36:03 -0800 Message-Id: <20221203003606.6838-37-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221203003606.6838-1-rick.p.edgecombe@intel.com> References: <20221203003606.6838-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751151546537802786?= X-GMAIL-MSGID: =?utf-8?q?1751151546537802786?= If an xfeature is saved in a buffer, the xfeature's bit will be set in xsave->header.xfeatures. The CPU may opt to not save the xfeature if it is in it's init state. In this case the xfeature buffer address cannot be retrieved with get_xsave_addr(). Future patches will need to handle the case of writing to an xfeature that may not be saved. So provide helpers to init an xfeature in an xsave buffer. This could of course be done directly by reaching into the xsave buffer, however this would not be robust against future changes to optimize the xsave buffer by compacting it. In that case the xsave buffer would need to be re-arranged as well. So the logic properly belongs encapsulated in a helper where the logic can be unified. Tested-by: Pengfei Xu Tested-by: John Allen Signed-off-by: Rick Edgecombe --- v2: - New patch arch/x86/kernel/fpu/xstate.c | 58 +++++++++++++++++++++++++++++------- arch/x86/kernel/fpu/xstate.h | 6 ++++ 2 files changed, 53 insertions(+), 11 deletions(-) diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c index 13a80521dd51..3ff80be0a441 100644 --- a/arch/x86/kernel/fpu/xstate.c +++ b/arch/x86/kernel/fpu/xstate.c @@ -934,6 +934,24 @@ static void *__raw_xsave_addr(struct xregs_state *xsave, int xfeature_nr) return (void *)xsave + xfeature_get_offset(xcomp_bv, xfeature_nr); } +static int xsave_buffer_access_checks(int xfeature_nr) +{ + /* + * Do we even *have* xsave state? + */ + if (!boot_cpu_has(X86_FEATURE_XSAVE)) + return 1; + + /* + * We should not ever be requesting features that we + * have not enabled. + */ + if (WARN_ON_ONCE(!xfeature_enabled(xfeature_nr))) + return 1; + + return 0; +} + /* * Given the xsave area and a state inside, this function returns the * address of the state. @@ -954,17 +972,7 @@ static void *__raw_xsave_addr(struct xregs_state *xsave, int xfeature_nr) */ void *get_xsave_addr(struct xregs_state *xsave, int xfeature_nr) { - /* - * Do we even *have* xsave state? - */ - if (!boot_cpu_has(X86_FEATURE_XSAVE)) - return NULL; - - /* - * We should not ever be requesting features that we - * have not enabled. - */ - if (WARN_ON_ONCE(!xfeature_enabled(xfeature_nr))) + if (xsave_buffer_access_checks(xfeature_nr)) return NULL; /* @@ -984,6 +992,34 @@ void *get_xsave_addr(struct xregs_state *xsave, int xfeature_nr) return __raw_xsave_addr(xsave, xfeature_nr); } +/* + * Given the xsave area and a state inside, this function + * initializes an xfeature in the buffer. 
+ * + * get_xsave_addr() will return NULL if the feature bit is + * not present in the header. This function will make it so + * the xfeature buffer address is ready to be retrieved by + * get_xsave_addr(). + * + * Inputs: + * xstate: the thread's storage area for all FPU data + * xfeature_nr: state which is defined in xsave.h (e.g. XFEATURE_FP, + * XFEATURE_SSE, etc...) + * Output: + * 1 if the feature cannot be inited, 0 on success + */ +int init_xfeature(struct xregs_state *xsave, int xfeature_nr) +{ + if (xsave_buffer_access_checks(xfeature_nr)) + return 1; + + /* + * Mark the feature inited. + */ + xsave->header.xfeatures |= BIT_ULL(xfeature_nr); + return 0; +} + #ifdef CONFIG_ARCH_HAS_PKEYS /* diff --git a/arch/x86/kernel/fpu/xstate.h b/arch/x86/kernel/fpu/xstate.h index a4ecb04d8d64..dc06f63063ee 100644 --- a/arch/x86/kernel/fpu/xstate.h +++ b/arch/x86/kernel/fpu/xstate.h @@ -54,6 +54,12 @@ extern void fpu__init_cpu_xstate(void); extern void fpu__init_system_xstate(unsigned int legacy_size); extern void *get_xsave_addr(struct xregs_state *xsave, int xfeature_nr); +extern int init_xfeature(struct xregs_state *xsave, int xfeature_nr); + +static inline int xfeature_saved(struct xregs_state *xsave, int xfeature_nr) +{ + return xsave->header.xfeatures & BIT_ULL(xfeature_nr); +} static inline u64 xfeatures_mask_supervisor(void) { From patchwork Sat Dec 3 00:36:04 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 29199 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp1152895wrr; Fri, 2 Dec 2022 17:14:51 -0800 (PST) X-Google-Smtp-Source: AA0mqf7Sx+7o1/Ij3hm41JN8ZjD51pqdtBeZ0YBnzvCwh3pNEPvum35TI8WXViuXiPftTHqD3FE4 X-Received: by 2002:a17:902:e194:b0:189:56ab:ab68 with SMTP id y20-20020a170902e19400b0018956abab68mr46360503pla.144.1670030091333; Fri, 02 Dec 2022 17:14:51 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670030091; cv=none; d=google.com; s=arc-20160816; b=V01gl6pzNgG+GGGVRF3rUGSVBNTiUU6yReON3qO3Pu+b6TKbMMAQIbu5aMnIe7M5QR SKQ8N4RL9+bpX2IqOBX5m42eCfqJlFLxdXo4vN534emDd4aLl1qOBcoOte1iGA79UK4B jSPHsaRUqzClmqA2Qp/W84icWq0MhwV0Q8toE4EeUcDhumiZj39DhdrXZTtNOlDUxr2u D3rZ62ojVS6TVyU0n/UambGRdIwvk/OV/WEfwbO3ddrKYbiVdwyD3VIYZ/dRgcNMNcTT mdSB29rkL2CBtK5c4Jn3fNXZLcTPi4vdipxIwhDDx+wH3fIOVkwWhzaPBzf1y9zJ5G3t TB6w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=LMdH7Wv2QotlVcGU41q2cEKKvSdrLETZ74gO8hnYQgA=; b=dBZkyFHDBdvdKE1gvEIGTBotOsRXwKIu/5f0+7Bbd8KUPy5hhLYrEIJsKlnw5vr0l0 em+v/fxRU0TEUEUql7YOHpwWCsCFX2ZD4cLW3DAVe1uKaC53JtSSbqBvYWhKkN1evPva hO5aJ/sOTyPqVxdIeAk3/QMVgmv2p2XZ7FaRfHW+PQi343Z0I/5DyUr8cFVoezxwZiWx eX1EkoHlDAyG5uL30w8JZoqcVCSAXOiwHMakekQSBuj/N0JHItKNchwPH9IG4+OQbZLh idtQjvwyzinRkHC3Z2CsFdd9JK28tnWjBZnLcW7xtMT+40fs5jhWaiCLOnVVlON9zLNs lqtQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=l6Xe6Ggo; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id g22-20020a632016000000b0046b3ba2c806si8120629pgg.145.2022.12.02.17.14.37; Fri, 02 Dec 2022 17:14:51 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=l6Xe6Ggo; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235356AbiLCAoF (ORCPT + 99 others); Fri, 2 Dec 2022 19:44:05 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43048 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235037AbiLCAmI (ORCPT ); Fri, 2 Dec 2022 19:42:08 -0500 Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9EA54FA45E; Fri, 2 Dec 2022 16:38:35 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1670027915; x=1701563915; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=itoYo5wz6A01y1aA288M7ulOG+VVmtemkgoBfCLG4/Q=; b=l6Xe6GgowySAErGkAsv2AIeihi5cmdvpR/ZX3wO9sur+LOSOlWQUYHmO RagR47LFB7ajMvndbZzmMCNULYIcUZqC6BZ1Jey2i2+oEU69Lwq11D3Mv 8cmHN6QPMdabqVehQLCilsBDpFFFO4ET83FkKq2IlN7p7yD4wiPG7Dfft JSgBHw635rif34Rt5ESmN4gLMmDzaOVNQ6sDD37MFot3CqQER0M53CmYc dMtR7F3w1Ed9ing4ttMdDNlZY7mTJnfHnx/HEu5+Zzmon6Es0KQYNQrZN ejGj6N5dLQ+rbTXx5sJIKz2L6QrTjbWNMYW9PlJjF5ZC1QKlKyIrV+F8R g==; X-IronPort-AV: E=McAfee;i="6500,9779,10549"; a="313711629" X-IronPort-AV: E=Sophos;i="5.96,213,1665471600"; d="scan'208";a="313711629" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2022 16:37:52 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10549"; a="787480058" X-IronPort-AV: E=Sophos;i="5.96,213,1665471600"; d="scan'208";a="787480058" Received: from bgordon1-mobl1.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.212.211.211]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2022 16:37:50 -0800 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . 
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v4 37/39] x86: Add PTRACE interface for shadow stack Date: Fri, 2 Dec 2022 16:36:04 -0800 Message-Id: <20221203003606.6838-38-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221203003606.6838-1-rick.p.edgecombe@intel.com> References: <20221203003606.6838-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751153473459886877?= X-GMAIL-MSGID: =?utf-8?q?1751153473459886877?= From: Yu-cheng Yu Some applications (like GDB) would like to tweak shadow stack state via ptrace. This allows for existing functionality to continue to work for seized shadow stack applications. Provide an regset interface for manipulating the shadow stack pointer (SSP). There is already ptrace functionality for accessing xstate, but this does not include supervisor xfeatures. So there is not a completely clear place for where to put the shadow stack state. Adding it to the user xfeatures regset would complicate that code, as it currently shares logic with signals which should not have supervisor features. Don't add a general supervisor xfeature regset like the user one, because it is better to maintain flexibility for other supervisor xfeatures to define their own interface. For example, an xfeature may decide not to expose all of it's state to userspace, as is actually the case for shadow stack ptrace functionality. A lot of enum values remain to be used, so just put it in dedicated shadow stack regset. The only downside to not having a generic supervisor xfeature regset, is that apps need to be enlightened of any new supervisor xfeature exposed this way (i.e. they can't try to have generic save/restore logic). But maybe that is a good thing, because they have to think through each new xfeature instead of encountering issues when new a new supervisor xfeature was added. By adding a shadow stack regset, it also has the effect of including the shadow stack state in a core dump, which could be useful for debugging. The shadow stack specific xstate includes the SSP, and the shadow stack and WRSS enablement status. Enabling shadow stack or wrss in the kernel involves more than just flipping the bit. The kernel is made aware that it has to do extra things when cloning or handling signals. That logic is triggered off of separate feature enablement state kept in the task struct. So the flipping on HW shadow stack enforcement without notifying the kernel to change its behavior would severely limit what an application could do without crashing, and the results would depend on kernel internal implementation details. There is also no known use for controlling this state via prtace today. So only expose the SSP, which is something that userspace already has indirect control over. 
Tested-by: Pengfei Xu
Tested-by: John Allen
Co-developed-by: Rick Edgecombe
Signed-off-by: Rick Edgecombe
Signed-off-by: Yu-cheng Yu
Reviewed-by: Kees Cook
---
v4:
 - Make shadow stack only. Reduce to only supporting SSP register, and
   remove CET references (peterz)
 - Add comment to not use 0x203, because binutils already looks for it in
   coredumps. (Christina Schimpe)

v3:
 - Drop dependence on thread.shstk.size, and use thread.features bits
 - Drop 32 bit support

v2:
 - Check alignment on ssp.
 - Block IBT bits.
 - Handle init states instead of returning error.
 - Add verbose commit log justifying the design.

Yu-Cheng v12:
 - Return -ENODEV when CET registers are in INIT state.
 - Check reserved/non-support bits from user input.

 arch/x86/include/asm/fpu/regset.h |  7 +--
 arch/x86/kernel/fpu/regset.c      | 87 +++++++++++++++++++++++++++++++
 arch/x86/kernel/ptrace.c          | 12 +++++
 include/uapi/linux/elf.h          |  2 +
 4 files changed, 105 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/fpu/regset.h b/arch/x86/include/asm/fpu/regset.h
index 4f928d6a367b..697b77e96025 100644
--- a/arch/x86/include/asm/fpu/regset.h
+++ b/arch/x86/include/asm/fpu/regset.h
@@ -7,11 +7,12 @@
 
 #include
 
-extern user_regset_active_fn regset_fpregs_active, regset_xregset_fpregs_active;
+extern user_regset_active_fn regset_fpregs_active, regset_xregset_fpregs_active,
+				ssp_active;
 extern user_regset_get2_fn fpregs_get, xfpregs_get, fpregs_soft_get,
-				xstateregs_get;
+				xstateregs_get, ssp_get;
 extern user_regset_set_fn fpregs_set, xfpregs_set, fpregs_soft_set,
-				xstateregs_set;
+				xstateregs_set, ssp_set;
 
 /*
  * xstateregs_active == regset_fpregs_active. Please refer to the comment
diff --git a/arch/x86/kernel/fpu/regset.c b/arch/x86/kernel/fpu/regset.c
index 6d056b68f4ed..00f3d5c9b682 100644
--- a/arch/x86/kernel/fpu/regset.c
+++ b/arch/x86/kernel/fpu/regset.c
@@ -8,6 +8,7 @@
 #include
 #include
 #include
+#include
 
 #include "context.h"
 #include "internal.h"
@@ -174,6 +175,92 @@ int xstateregs_set(struct task_struct *target, const struct user_regset *regset,
 	return ret;
 }
+
+#ifdef CONFIG_X86_USER_SHADOW_STACK
+int ssp_active(struct task_struct *target, const struct user_regset *regset)
+{
+	if (shstk_enabled())
+		return regset->n;
+
+	return 0;
+}
+
+int ssp_get(struct task_struct *target, const struct user_regset *regset,
+	    struct membuf to)
+{
+	struct fpu *fpu = &target->thread.fpu;
+	struct cet_user_state *cetregs;
+
+	if (!boot_cpu_has(X86_FEATURE_USER_SHSTK))
+		return -ENODEV;
+
+	sync_fpstate(fpu);
+	cetregs = get_xsave_addr(&fpu->fpstate->regs.xsave, XFEATURE_CET_USER);
+	if (!cetregs) {
+		/*
+		 * The registers are in the init state. The init values for
+		 * these regs are zero, so just zero the output buffer.
+		 */
+		membuf_zero(&to, sizeof(cetregs->user_ssp));
+		return 0;
+	}
+
+	return membuf_write(&to, (unsigned long *)&cetregs->user_ssp,
+			    sizeof(cetregs->user_ssp));
+}
+
+int ssp_set(struct task_struct *target, const struct user_regset *regset,
+	    unsigned int pos, unsigned int count,
+	    const void *kbuf, const void __user *ubuf)
+{
+	struct fpu *fpu = &target->thread.fpu;
+	struct xregs_state *xsave = &fpu->fpstate->regs.xsave;
+	struct cet_user_state *cetregs;
+	unsigned long user_ssp;
+	int r;
+
+	if (!boot_cpu_has(X86_FEATURE_USER_SHSTK) ||
+	    !ssp_active(target, regset))
+		return -ENODEV;
+
+	r = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &user_ssp, 0, -1);
+	if (r)
+		return r;
+
+	/*
+	 * Some kernel instructions (IRET, etc) can cause exceptions in the case
+	 * of disallowed CET register values.
+	 * Just prevent invalid values.
+	 */
+	if ((user_ssp >= TASK_SIZE_MAX) || !IS_ALIGNED(user_ssp, 8))
+		return -EINVAL;
+
+	fpu_force_restore(fpu);
+
+	/*
+	 * Don't want to init the xfeature until the kernel will definitely
+	 * overwrite it, otherwise if it inits and then fails out, it would
+	 * end up initing it to random data.
+	 */
+	if (!xfeature_saved(xsave, XFEATURE_CET_USER) &&
+	    WARN_ON(init_xfeature(xsave, XFEATURE_CET_USER)))
+		return -ENODEV;
+
+	cetregs = get_xsave_addr(xsave, XFEATURE_CET_USER);
+	if (WARN_ON(!cetregs)) {
+		/*
+		 * This shouldn't ever be NULL because it was successfully
+		 * inited above if needed. The only scenario would be if an
+		 * xfeature was somehow saved in a buffer, but not enabled in
+		 * xsave.
+		 */
+		return -ENODEV;
+	}
+
+	cetregs->user_ssp = user_ssp;
+	return 0;
+}
+#endif /* CONFIG_X86_USER_SHADOW_STACK */
+
 #if defined CONFIG_X86_32 || defined CONFIG_IA32_EMULATION
 
 /*
diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index dfaa270a7cc9..095f04bdabdc 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -58,6 +58,7 @@ enum x86_regset_64 {
 	REGSET64_FP,
 	REGSET64_IOPERM,
 	REGSET64_XSTATE,
+	REGSET64_SSP,
 };
 
 #define REGSET_GENERAL \
@@ -1267,6 +1268,17 @@ static struct user_regset x86_64_regsets[] __ro_after_init = {
 		.active		= ioperm_active,
 		.regset_get	= ioperm_get
 	},
+#ifdef CONFIG_X86_USER_SHADOW_STACK
+	[REGSET64_SSP] = {
+		.core_note_type	= NT_X86_SHSTK,
+		.n		= 1,
+		.size		= sizeof(u64),
+		.align		= sizeof(u64),
+		.active		= ssp_active,
+		.regset_get	= ssp_get,
+		.set		= ssp_set
+	},
+#endif
 };
 
 static const struct user_regset_view user_x86_64_view = {
diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
index c7b056af9ef0..e9283f0641c4 100644
--- a/include/uapi/linux/elf.h
+++ b/include/uapi/linux/elf.h
@@ -406,6 +406,8 @@ typedef struct elf64_shdr {
 #define NT_386_TLS	0x200		/* i386 TLS slots (struct user_desc) */
 #define NT_386_IOPERM	0x201		/* x86 io permission bitmap (1=deny) */
 #define NT_X86_XSTATE	0x202		/* x86 extended state using xsave */
+/* Old binutils treats 0x203 as a CET state */
+#define NT_X86_SHSTK	0x204		/* x86 SHSTK state */
 #define NT_S390_HIGH_GPRS	0x300	/* s390 upper register halves */
 #define NT_S390_TIMER	0x301		/* s390 timer register */
 #define NT_S390_TODCMP	0x302		/* s390 TOD clock comparator register */
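Correspondingly, writing a new SSP from a tracer would go through PTRACE_SETREGSET. The following is a hedged sketch, not part of the patch: ssp_set() above rejects values that are not 8-byte aligned or that fall in the kernel address range, so the caller must pass a sane user address, and the tracee must be attached and stopped.

#include <elf.h>
#include <stdint.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/uio.h>

#ifndef NT_X86_SHSTK
#define NT_X86_SHSTK 0x204
#endif

/* Write a new shadow stack pointer into an attached, stopped tracee. */
static long write_ssp(pid_t pid, uint64_t new_ssp)
{
	struct iovec iov = {
		.iov_base = &new_ssp,
		.iov_len  = sizeof(new_ssp),
	};

	/* Mirror the kernel-side sanity check: the SSP must be 8-byte aligned. */
	if (new_ssp & 7)
		return -1;

	return ptrace(PTRACE_SETREGSET, pid, (void *)(uintptr_t)NT_X86_SHSTK, &iov);
}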
From patchwork Sat Dec 3 00:36:05 2022
X-Patchwork-Id: 29197
From: Rick Edgecombe
To: x86@kernel.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org
Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com Cc: rick.p.edgecombe@intel.com, Mike Rapoport Subject: [PATCH v4 38/39] x86/shstk: Add ARCH_SHSTK_UNLOCK Date: Fri, 2 Dec 2022 16:36:05 -0800 Message-Id: <20221203003606.6838-39-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221203003606.6838-1-rick.p.edgecombe@intel.com> References: <20221203003606.6838-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751151582627041631?= X-GMAIL-MSGID: =?utf-8?q?1751151582627041631?= From: Mike Rapoport Userspace loaders may lock features before a CRIU restore operation has the chance to set them to whatever state is required by the process being restored. Allow a way for CRIU to unlock features. Add it as an arch_prctl() like the other shadow stack operations, but restrict it being called by the ptrace arch_pctl() interface. Tested-by: Pengfei Xu Tested-by: John Allen Signed-off-by: Mike Rapoport [Merged into recent API changes, added commit log and docs] Signed-off-by: Rick Edgecombe Reviewed-by: Kees Cook --- v4: - Add to docs that it is ptrace only. - Remove "CET" references v3: - Depend on CONFIG_CHECKPOINT_RESTORE (Kees) Documentation/x86/shstk.rst | 4 ++++ arch/x86/include/uapi/asm/prctl.h | 1 + arch/x86/kernel/process_64.c | 1 + arch/x86/kernel/shstk.c | 9 +++++++-- 4 files changed, 13 insertions(+), 2 deletions(-) diff --git a/Documentation/x86/shstk.rst b/Documentation/x86/shstk.rst index 8e0b2fe83ef8..0d7d1ccfff06 100644 --- a/Documentation/x86/shstk.rst +++ b/Documentation/x86/shstk.rst @@ -73,6 +73,10 @@ arch_prctl(ARCH_SHSTK_LOCK, unsigned long features) are ignored. The mask is ORed with the existing value. So any feature bits set here cannot be enabled or disabled afterwards. +arch_prctl(ARCH_SHSTK_UNLOCK, unsigned long features) + Unlock features. 'features' is a mask of all features to unlock. All + bits set are processed, unset bits are ignored. Only works via ptrace. + The return values are as following: On success, return 0. 
    On error, errno can be::
diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h
index f13751c6bae4..0c95688cf58e 100644
--- a/arch/x86/include/uapi/asm/prctl.h
+++ b/arch/x86/include/uapi/asm/prctl.h
@@ -30,6 +30,7 @@
 #define ARCH_SHSTK_ENABLE		0x5001
 #define ARCH_SHSTK_DISABLE		0x5002
 #define ARCH_SHSTK_LOCK			0x5003
+#define ARCH_SHSTK_UNLOCK		0x5004
 
 /* ARCH_SHSTK_ features bits */
 #define ARCH_SHSTK_SHSTK		(1ULL << 0)
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 4ddd7d9209e1..2be6e01fb144 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -921,6 +921,7 @@ long do_arch_prctl_64(struct task_struct *task, int option, unsigned long arg2)
 	case ARCH_SHSTK_ENABLE:
 	case ARCH_SHSTK_DISABLE:
 	case ARCH_SHSTK_LOCK:
+	case ARCH_SHSTK_UNLOCK:
 		return shstk_prctl(task, option, arg2);
 	default:
 		ret = -EINVAL;
diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c
index 5d91e653f77a..95579f7bace3 100644
--- a/arch/x86/kernel/shstk.c
+++ b/arch/x86/kernel/shstk.c
@@ -455,9 +455,14 @@ long shstk_prctl(struct task_struct *task, int option, unsigned long features)
 		return 0;
 	}
 
-	/* Don't allow via ptrace */
-	if (task != current)
+	/* Only allow via ptrace */
+	if (task != current) {
+		if (option == ARCH_SHSTK_UNLOCK && IS_ENABLED(CONFIG_CHECKPOINT_RESTORE)) {
+			task->thread.features_locked &= ~features;
+			return 0;
+		}
 		return -EINVAL;
+	}
 
 	/* Do not allow to change locked features */
 	if (features & task->thread.features_locked)
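As a rough usage sketch (not part of the patch), a checkpoint/restore tool could unlock the shadow stack feature of a stopped tracee through the x86-64 PTRACE_ARCH_PRCTL request. The argument ordering below (arg2 in the ptrace addr slot, the arch_prctl option code in the data slot) and the locally defined constants are assumptions based on the hunks above and the existing PTRACE_ARCH_PRCTL plumbing; the kernel side additionally requires CONFIG_CHECKPOINT_RESTORE.

#include <stdint.h>
#include <sys/ptrace.h>
#include <sys/types.h>

/* Assumed values for illustration; PTRACE_ARCH_PRCTL normally comes from
 * <asm/ptrace-abi.h> and the ARCH_SHSTK_* constants from the uapi prctl.h
 * hunk above. */
#ifndef PTRACE_ARCH_PRCTL
#define PTRACE_ARCH_PRCTL	30
#endif
#define ARCH_SHSTK_UNLOCK	0x5004
#define ARCH_SHSTK_SHSTK	(1ULL << 0)

/* Unlock the shadow stack feature of an attached, stopped tracee so a
 * restorer (e.g. CRIU) can enable or disable it later. */
static long unlock_shstk(pid_t pid)
{
	return ptrace(PTRACE_ARCH_PRCTL, pid,
		      (void *)(uintptr_t)ARCH_SHSTK_SHSTK,
		      (void *)(uintptr_t)ARCH_SHSTK_UNLOCK);
}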
From patchwork Sat Dec 3 00:36:06 2022
X-Patchwork-Id: 29198
From: Rick Edgecombe
To: x86@kernel.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com Cc: rick.p.edgecombe@intel.com Subject: [PATCH v4 39/39] x86/shstk: Add ARCH_SHSTK_STATUS Date: Fri, 2 Dec 2022 16:36:06 -0800 Message-Id: <20221203003606.6838-40-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221203003606.6838-1-rick.p.edgecombe@intel.com> References: <20221203003606.6838-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751151665477533541?= X-GMAIL-MSGID: =?utf-8?q?1751151665477533541?= CRIU and GDB need to get the current shadow stack and WRSS enablement status. This information is already available via /proc/pid/status, but this is incovienent for CRIU because it involves parsing the text output in an area of the code where this is difficult. Provide a status arch_prctl(), ARCH_SHSTK_STATUS for retrieving the status. Have arg2 be a userspace address, and make the new arch_prctl simply copy the features out to userspace. Tested-by: Pengfei Xu Requested-by: Mike Rapoport Signed-off-by: Rick Edgecombe Reviewed-by: Kees Cook --- v4: - New patch Documentation/x86/shstk.rst | 6 ++++++ arch/x86/include/asm/shstk.h | 4 ++-- arch/x86/include/uapi/asm/prctl.h | 1 + arch/x86/kernel/process_64.c | 1 + arch/x86/kernel/shstk.c | 8 +++++++- 5 files changed, 17 insertions(+), 3 deletions(-) diff --git a/Documentation/x86/shstk.rst b/Documentation/x86/shstk.rst index 0d7d1ccfff06..b3eb87046c27 100644 --- a/Documentation/x86/shstk.rst +++ b/Documentation/x86/shstk.rst @@ -77,6 +77,11 @@ arch_prctl(ARCH_SHSTK_UNLOCK, unsigned long features) Unlock features. 'features' is a mask of all features to unlock. All bits set are processed, unset bits are ignored. Only works via ptrace. +arch_prctl(ARCH_SHSTK_STATUS, unsigned long addr) + Copy the currently enabled features to the address passed in addr. The + features are described using the bits passed into the others in + 'features'. + The return values are as following: On success, return 0. On error, errno can be:: @@ -84,6 +89,7 @@ The return values are as following: -EOPNOTSUPP if the feature is not supported by the hardware or disabled by kernel parameter. 
     -EINVAL arguments (non existing feature, etc)
+    -EFAULT if could not copy information back to userspace
 
 The feature's bits supported are::
diff --git a/arch/x86/include/asm/shstk.h b/arch/x86/include/asm/shstk.h
index c82f22fd5e6d..23bfb63c597d 100644
--- a/arch/x86/include/asm/shstk.h
+++ b/arch/x86/include/asm/shstk.h
@@ -15,7 +15,7 @@ struct thread_shstk {
 	u64	size;
 };
 
-long shstk_prctl(struct task_struct *task, int option, unsigned long features);
+long shstk_prctl(struct task_struct *task, int option, unsigned long arg2);
 void reset_thread_features(void);
 int shstk_alloc_thread_stack(struct task_struct *p, unsigned long clone_flags, unsigned long stack_size,
@@ -31,7 +31,7 @@ static inline bool shstk_enabled(void)
 }
 #else
 static inline long shstk_prctl(struct task_struct *task, int option,
-			       unsigned long features) { return -EINVAL; }
+			       unsigned long arg2) { return -EINVAL; }
 static inline void reset_thread_features(void) {}
 static inline int shstk_alloc_thread_stack(struct task_struct *p, unsigned long clone_flags,
diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h
index 0c95688cf58e..abe3fe6db6d2 100644
--- a/arch/x86/include/uapi/asm/prctl.h
+++ b/arch/x86/include/uapi/asm/prctl.h
@@ -31,6 +31,7 @@
 #define ARCH_SHSTK_DISABLE		0x5002
 #define ARCH_SHSTK_LOCK			0x5003
 #define ARCH_SHSTK_UNLOCK		0x5004
+#define ARCH_SHSTK_STATUS		0x5005
 
 /* ARCH_SHSTK_ features bits */
 #define ARCH_SHSTK_SHSTK		(1ULL << 0)
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 2be6e01fb144..5dcf5426241b 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -922,6 +922,7 @@ long do_arch_prctl_64(struct task_struct *task, int option, unsigned long arg2)
 	case ARCH_SHSTK_DISABLE:
 	case ARCH_SHSTK_LOCK:
 	case ARCH_SHSTK_UNLOCK:
+	case ARCH_SHSTK_STATUS:
 		return shstk_prctl(task, option, arg2);
 	default:
 		ret = -EINVAL;
diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c
index 95579f7bace3..05f8dcc19dbc 100644
--- a/arch/x86/kernel/shstk.c
+++ b/arch/x86/kernel/shstk.c
@@ -448,8 +448,14 @@ SYSCALL_DEFINE3(map_shadow_stack, unsigned long, addr, unsigned long, size, unsi
 	return alloc_shstk(addr, aligned_size, size, set_tok);
 }
 
-long shstk_prctl(struct task_struct *task, int option, unsigned long features)
+long shstk_prctl(struct task_struct *task, int option, unsigned long arg2)
 {
+	unsigned long features = arg2;
+
+	if (option == ARCH_SHSTK_STATUS) {
+		return put_user(task->thread.features, (unsigned long __user *)arg2);
+	}
+
 	if (option == ARCH_SHSTK_LOCK) {
 		task->thread.features_locked |= features;
 		return 0;