From patchwork Fri Nov 4 22:35:28 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 15827 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:6687:0:0:0:0:0 with SMTP id l7csp678655wru; Fri, 4 Nov 2022 15:41:36 -0700 (PDT) X-Google-Smtp-Source: AMsMyM5pForDAqPfXrNUutKj4dxQP+/Ng19E8nrIJ4vWvvgWPOhMuYKQCO/eeo2g8K2p49qecF27 X-Received: by 2002:a05:6a00:16c1:b0:563:177f:99ee with SMTP id l1-20020a056a0016c100b00563177f99eemr38138240pfc.7.1667601696436; Fri, 04 Nov 2022 15:41:36 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1667601696; cv=none; d=google.com; s=arc-20160816; b=eALQKP/qy1P3+WgDY1y2V453dPWM4z2QiqY7znhsV5G6QZnXLxMI7e2zuOwBJ5xUPG Vu6UqTyJyJiRX9giZ5tDK8Mz3o7bpaJyl6oMpEtqPn6zpGuZqgtlRJW0/b9jMGaT1MtN +f6/9Ukg3xDBCqD8nH8JeTDbwb4sO8EfuNEYTApH9MRG7eXJ3ciRX3ULq+HRVWiU2b3s HQAchiVYaHoKQuKalTQHscNRqonvUz46FeVdoN54X/K5JYEtSEr+y3S5bJtpCA8x1gUW 6P4m/yIQ2UjcRoiQtyAJzYgCwIl1ixY0t+eJZqfrUHbP7o1Bp1tAQvHGfQ6SaVCLYvis lp4w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=rWv83AKmsDFAFys9ysgq8gyLz4h9vKzTJKq3E6NVa/U=; b=DjDTB9/o40Nd3LrLyzpO3AK6BwUOR75A8zBqF3FvoitHTrttsqUh51ERCIP95Wlebm VtAC3uTZAy2TWQGFYyYScmgWFLC48Cv8kCae6rE/oKCzSzFbCyKMLWjBVaplb/bb8T2I uPAi0BP6l7po/dUorZJamXwMOQQi186w66j8OCT0D2XLWfEyjA4P6lWmftrlue/ECvsP +YVlJszykXS/xu9H1EA+biybr4173bmhbc+mr03cLWS74K9YHk96b2kkSO2fR0Vtvk23 8L1ibta3yQAvgb91aanhcUErMViYJGylE/aM8vAiecHSr6pBWx4BCDE09ILT0bAaDG8F jI6w== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=Tq2qM6Ve; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id u14-20020a63790e000000b0046eecbac482si757667pgc.415.2022.11.04.15.41.23; Fri, 04 Nov 2022 15:41:36 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=Tq2qM6Ve; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229556AbiKDWjb (ORCPT + 99 others); Fri, 4 Nov 2022 18:39:31 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46418 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229750AbiKDWjZ (ORCPT ); Fri, 4 Nov 2022 18:39:25 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DD3D324961; Fri, 4 Nov 2022 15:39:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1667601564; x=1699137564; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=HzY88wRuFQaKqUeaRtvIS4XEASKTUtROHQ9F/zxMQKY=; b=Tq2qM6Ve5Kb3WBrUkGLfucPkVLCc0LPFyRFwbEy1IGpftSzFHICnf0Ly oy6pcgBoBuorXAHlFc9ZOQB+c7DDIeYLRLf2XtrVZiPfiTnPeqRKbUprB lhRpewJdlbU+FjKgooJ06CC/9RyX0UEMsC7zn5IOemInnSt8W5bmylAan p0sAnLPN+hj1LFXdlSFJRZKuvgOtp6PJjJLSNkT39F0NuakfYLLtem3ht h+hgXJOATqHONK817KRExbcavwbX7zROAY4RdgYczQsFv+nr7veCv4TV/ lKj9J7Ur+1uwAa7gQ5nutu92nATbbut0Fp6AVh1a1iZV3SlIZW61dazfM A==; X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="311840473" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="311840473" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:23 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="668513914" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="668513914" Received: from adhjerms-mobl1.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.212.227.68]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:21 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v3 01/37] Documentation/x86: Add CET description Date: Fri, 4 Nov 2022 15:35:28 -0700 Message-Id: <20221104223604.29615-2-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221104223604.29615-1-rick.p.edgecombe@intel.com> References: <20221104223604.29615-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-5.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1748607116586228721?= X-GMAIL-MSGID: =?utf-8?q?1748607116586228721?= From: Yu-cheng Yu Introduce a new document on Control-flow Enforcement Technology (CET). Tested-by: Pengfei Xu Tested-by: John Allen Signed-off-by: Yu-cheng Yu Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Cc: Kees Cook --- v3: - Clarify kernel IBT is supported by the kernel. (Kees, Andrew Cooper) - Clarify which arch_prctl's can take multiple bits. (Kees) - Describe ASLR characteristics of thread shadow stacks. (Kees) - Add exec section. (Andrew Cooper) - Fix some capitalization (Bagas Sanjaya) - Update new location of enablement status proc. - Add info about new user_shstk software capability. - Add more info about what the kernel pushes to the shadow stack on signal. v2: - Updated to new arch_prctl() API - Add bit about new proc status v1: - Update and clarify the docs. - Moved kernel parameters documentation to other patch. Documentation/x86/cet.rst | 147 ++++++++++++++++++++++++++++++++++++ Documentation/x86/index.rst | 1 + 2 files changed, 148 insertions(+) create mode 100644 Documentation/x86/cet.rst diff --git a/Documentation/x86/cet.rst b/Documentation/x86/cet.rst new file mode 100644 index 000000000000..b56811566531 --- /dev/null +++ b/Documentation/x86/cet.rst @@ -0,0 +1,147 @@ +.. SPDX-License-Identifier: GPL-2.0 + +========================================= +Control-flow Enforcement Technology (CET) +========================================= + +Overview +======== + +Control-flow Enforcement Technology (CET) is term referring to several +related x86 processor features that provides protection against control +flow hijacking attacks. The HW feature itself can be set up to protect +both applications and the kernel. + +CET introduces Shadow Stack and Indirect Branch Tracking (IBT). Shadow stack +is a secondary stack allocated from memory and cannot be directly modified by +applications. When executing a CALL instruction, the processor pushes the +return address to both the normal stack and the shadow stack. Upon +function return, the processor pops the shadow stack copy and compares it +to the normal stack copy. If the two differ, the processor raises a +control-protection fault. IBT verifies indirect CALL/JMP targets are intended +as marked by the compiler with 'ENDBR' opcodes. Not all CPU's have both Shadow +Stack and Indirect Branch Tracking. Today in the 64-bit kernel, only userspace +Shadow Stack and kernel IBT is supported in the kernel. + +The Kconfig option is X86_USER_SHADOW_STACK, and it can be disabled with +the kernel parameter clearcpuid, like this: "clearcpuid=user_shstk". + +To build a user shadow stack enabled kernel, Binutils v2.29 or LLVM v6 or later +are required. + +At run time, /proc/cpuinfo shows CET features if the processor supports +CET. "shstk" and "ibt" relate to the individual HW features. "user_shstk" +relates to whether the userspace shadow stack specifically is supported. + +Application Enabling +==================== + +An application's CET capability is marked in its ELF note and can be verified +from readelf/llvm-readelf output: + + readelf -n | grep -a SHSTK + properties: x86 feature: SHSTK + +The kernel does not process these applications markers directly. Applications +or loaders must enable CET features using the interface described in section 4. +Typically this would be done in dynamic loader or static runtime objects, as is +the case in GLIBC. + +CET arch_prctl()'s +================== + +Elf features should be enabled by the loader using the below arch_prctl's. + +arch_prctl(ARCH_CET_ENABLE, unsigned int feature) + Enable a single feature specified in 'feature'. Can only operate on + one feature at a time. + +arch_prctl(ARCH_CET_DISABLE, unsigned int feature) + Disable a single feature specified in 'feature'. Can only operate on + one feature at a time. + +arch_prctl(ARCH_CET_LOCK, unsigned int features) + Lock in features at their current enabled or disabled status. 'features' + is a mask of all features to lock. All bits set are processed, unset bits + are ignored. The mask is ORed with the existing value. So any feature bits + set here cannot be enabled or disabled afterwards. + +The return values are as following: + On success, return 0. On error, errno can be:: + + -EPERM if any of the passed feature are locked. + -EOPNOTSUPP if the feature is not supported by the hardware or + disabled by kernel parameter. + -EINVAL arguments (non existing feature, etc) + +Currently shadow stack and WRSS are supported via this interface. WRSS +can only be enabled with shadow stack, and is automatically disabled +if shadow stack is disabled. + +Proc status +=========== +To check if an application is actually running with shadow stack, the +user can read the /proc/$PID/status. It will report "wrss" or "shstk" +depending on what is enabled. The lines look like this:: + + x86_Thread_features: shstk wrss + x86_Thread_features_locked: shstk wrss + +The implementation of the Shadow Stack +====================================== + +Shadow Stack size +----------------- + +A task's shadow stack is allocated from memory to a fixed size of +MIN(RLIMIT_STACK, 4 GB). In other words, the shadow stack is allocated to +the maximum size of the normal stack, but capped to 4 GB. However, +a compat-mode application's address space is smaller, each of its thread's +shadow stack size is MIN(1/4 RLIMIT_STACK, 4 GB). + +Signal +------ + +By default, the main program and its signal handlers use the same shadow +stack. Because the shadow stack stores only return addresses, a large +shadow stack covers the condition that both the program stack and the +signal alternate stack run out. + +When a signal happens, the old pre-signal state is pushed on the stack. When +shadow stack is enabled, the shadow stack specific state is pushed onto the +shadow stack. Today this is only the old SSP (shadow stack pointer), pushed +in a special format with bit 63 set. On sigreturn this old SSP token is +verified and restored by the kernel. The kernel will also push the normal +restorer address to the shadow stack to help userspace avoid a shadow stack +violation on the sigreturn path that goes through the restorer. + +So the shadow stack signal frame format is as follows:: + + |1...old SSP| - Pointer to old pre-signal ssp in sigframe token format + (bit 63 set to 1) + | ...| - Other state may be added in the future + + + +Fork +---- + +The shadow stack's vma has VM_SHADOW_STACK flag set; its PTEs are required +to be read-only and dirty. When a shadow stack PTE is not RO and dirty, a +shadow access triggers a page fault with the shadow stack access bit set +in the page fault error code. + +When a task forks a child, its shadow stack PTEs are copied and both the +parent's and the child's shadow stack PTEs are cleared of the dirty bit. +Upon the next shadow stack access, the resulting shadow stack page fault +is handled by page copy/re-use. + +When a pthread child is created, the kernel allocates a new shadow stack +for the new thread. New shadow stack's behave like mmap() with respect to +ASLR behavior. + +Exec +---- + +On exec, shadow stack features are disabled by the kernel. At which point, +userspace can choose to re-enable, or lock them. diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst index c73d133fd37c..9ac03055c4b5 100644 --- a/Documentation/x86/index.rst +++ b/Documentation/x86/index.rst @@ -22,6 +22,7 @@ x86-specific Documentation mtrr pat intel-hfi + cet iommu intel_txt amd-memory-encryption From patchwork Fri Nov 4 22:35:29 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 15817 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:6687:0:0:0:0:0 with SMTP id l7csp678242wru; Fri, 4 Nov 2022 15:40:41 -0700 (PDT) X-Google-Smtp-Source: AMsMyM5l0iK2GHueQfS0u9Ehhx5zbpTOuI/LaaiLQOtUIPbj+8Lvzv+NyDCyCwqwzlTkEgWug/w4 X-Received: by 2002:a05:6a00:420f:b0:56d:a89e:19e2 with SMTP id cd15-20020a056a00420f00b0056da89e19e2mr24247336pfb.85.1667601640940; Fri, 04 Nov 2022 15:40:40 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1667601640; cv=none; d=google.com; s=arc-20160816; b=zS0ZLOxivAuJcplwX53c0kvONr9eIjN+dCsMHoSKZYv5DJdLrVFRIZ/YlrwhEcY/Yn e1vtvtwvQjfIGqqwqxSEqu//s5QiEvZ5kn4XpVoNUYuuqLyGfWctTljwz7f1i9kaMk+q tbMTIaQ8QSm+hY1tHdtDrLYzWmad3KunkKdBFYfRnXrieZS7iPzvD49KM1TrI2jfd7/c WqM38foDNqnozdanq+WN6luwTNvkSKkU4MuA2z6S8cpV3iBBLj4ALm69TUOJELcHyrFs ScL+jCo5yPKFPXkS16GNpt6+EkCWbDRm8UwAlLmZZ18PDuINMKOUQAsADG7VA0nOlia/ 1jjQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=967J9TDbAc9q97askeHX02xV9Um4fuTwilA5aOHm5YY=; b=0DTr5sJMCVWMwC7qqj9UZatvfAV3C6V6Idb3A/Sp1U3i2nMz6DVgUBgI+YoGONoveX 2bzr8i63MsR85IsjQPzvpiS3WU+cZrMwv0wZBlZDGmDEXm8c3PKMRkVPGLIJxUrupX9O 0IzJ/u6mG0CVth5wejCkfrnkShVrjr8HDUNz+b1rcyIrZocOc7OYuP1y2BHKd1oWMDFq 9DQ63d4NeDpXtwJH5+9kcRBtrgXTOPLFsqFzjNeAvYwticaOPEtYSeG04CH8ykxkCngl 9rUp/oo9UHvgx0nJJNmbCEbm/kEV9U6iWjLsb+o5H1OwFOI/Ub26RU0OhGFkBJ9AU1P+ YVqA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=IQ0xgCAj; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id w12-20020a056a0014cc00b005672abf1284si619605pfu.161.2022.11.04.15.40.28; Fri, 04 Nov 2022 15:40:40 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=IQ0xgCAj; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229993AbiKDWjd (ORCPT + 99 others); Fri, 4 Nov 2022 18:39:33 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46428 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229883AbiKDWjZ (ORCPT ); Fri, 4 Nov 2022 18:39:25 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0FBF2D5F; Fri, 4 Nov 2022 15:39:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1667601565; x=1699137565; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=XjjEWtLkd8LTsXZrdDMhOoWhDojdmFCHoVnaTLWF/ew=; b=IQ0xgCAjrUstjmKeNoi8j14w/ZUf4p+uaoUy7YYZV07QP6BM08x21jI2 +CFeby87wkQu8NxxcL/A9QACwVBcvi2ZsUzLz5QMMLl6iRBo583fTwydl NQ0lZvv9stkt+FiSaEVnLfPP7GLFE6ErejH3uwNUUwIO/aQUKZFbumzXu /wrmEHrXWsxgg3zCPRDtxffUy1Iq+2/+hUOtdOCH9KSO4NHMf4pstb483 64FfJFUEfOUen2OhQJPyIbCQWVUqfxb1TWguDhCJ8Oz0OufRQvB55VIcr Fvg2gI6zeFriDbXhcB1gkZXPalh7YofFjPWid6fffNMiYveDHdRHMNBGp g==; X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="311840480" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="311840480" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:23 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="668513920" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="668513920" Received: from adhjerms-mobl1.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.212.227.68]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:22 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v3 02/37] x86/cet/shstk: Add Kconfig option for Shadow Stack Date: Fri, 4 Nov 2022 15:35:29 -0700 Message-Id: <20221104223604.29615-3-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221104223604.29615-1-rick.p.edgecombe@intel.com> References: <20221104223604.29615-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-5.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1748607058080215999?= X-GMAIL-MSGID: =?utf-8?q?1748607058080215999?= From: Yu-cheng Yu Shadow Stack provides protection for applications against function return address corruption. It is active when the processor supports it, the kernel has CONFIG_X86_SHADOW_STACK enabled, and the application is built for the feature. This is only implemented for the 64-bit kernel. When it is enabled, legacy non-Shadow Stack applications continue to work, but without protection. Since there is another feature that utilizes CET (Kernel IBT) that will share implementation with Shadow Stacks, create CONFIG_CET to signify that at least one CET feature is configured. Tested-by: Pengfei Xu Tested-by: John Allen Signed-off-by: Yu-cheng Yu Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Cc: Kees Cook --- v3: - Add X86_CET (Kees) - Add back WRUSS dependency (Kees) - Fix verbiage (Dave) - Change from promt to bool (Kirill) - Add more to commit log v2: - Remove already wrong kernel size increase info (tlgx) - Change prompt to remove "Intel" (tglx) - Update line about what CPUs are supported (Dave) Yu-cheng v25: - Remove X86_CET and use X86_SHADOW_STACK directly. Yu-cheng v24: - Update for the splitting X86_CET to X86_SHADOW_STACK and X86_IBT. arch/x86/Kconfig | 24 ++++++++++++++++++++++++ arch/x86/Kconfig.assembler | 5 +++++ 2 files changed, 29 insertions(+) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 67745ceab0db..f3d14f5accce 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1852,6 +1852,11 @@ config CC_HAS_IBT (CC_IS_CLANG && CLANG_VERSION >= 140000)) && \ $(as-instr,endbr64) +config X86_CET + def_bool n + help + CET features configured (Shadow Stack or IBT) + config X86_KERNEL_IBT prompt "Indirect Branch Tracking" bool @@ -1859,6 +1864,7 @@ config X86_KERNEL_IBT # https://github.com/llvm/llvm-project/commit/9d7001eba9c4cb311e03cd8cdc231f9e579f2d0f depends on !LD_IS_LLD || LLD_VERSION >= 140000 select OBJTOOL + select X86_CET help Build the kernel with support for Indirect Branch Tracking, a hardware support course-grain forward-edge Control Flow Integrity @@ -1953,6 +1959,24 @@ config X86_SGX If unsure, say N. +config X86_USER_SHADOW_STACK + bool "X86 Userspace Shadow Stack" + depends on AS_WRUSS + depends on X86_64 + select ARCH_USES_HIGH_VMA_FLAGS + select X86_CET + help + Shadow Stack protection is a hardware feature that detects function + return address corruption. This helps mitigate ROP attacks. + Applications must be enabled to use it, and old userspace does not + get protection "for free". + + CPUs supporting shadow stacks were first released in 2020. + + See Documentation/x86/cet.rst for more information. + + If unsure, say N. + config EFI bool "EFI runtime service support" depends on ACPI diff --git a/arch/x86/Kconfig.assembler b/arch/x86/Kconfig.assembler index 26b8c08e2fc4..00c79dd93651 100644 --- a/arch/x86/Kconfig.assembler +++ b/arch/x86/Kconfig.assembler @@ -19,3 +19,8 @@ config AS_TPAUSE def_bool $(as-instr,tpause %ecx) help Supported by binutils >= 2.31.1 and LLVM integrated assembler >= V7 + +config AS_WRUSS + def_bool $(as-instr,wrussq %rax$(comma)(%rbx)) + help + Supported by binutils >= 2.31 and LLVM integrated assembler From patchwork Fri Nov 4 22:35:30 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 15818 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:6687:0:0:0:0:0 with SMTP id l7csp678282wru; Fri, 4 Nov 2022 15:40:47 -0700 (PDT) X-Google-Smtp-Source: AMsMyM7TiA/Dd89vLdZhsw9S7Vo+dlO/jpBdpIcvlHrGCcZiJXIAS8gzDu5ddJ/1GFDxB/fWQvWP X-Received: by 2002:aa7:8b46:0:b0:56c:349f:191e with SMTP id i6-20020aa78b46000000b0056c349f191emr37347751pfd.29.1667601647370; Fri, 04 Nov 2022 15:40:47 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1667601647; cv=none; d=google.com; s=arc-20160816; b=v2Z96pPU9XXpZOXal1FWuAisb0GFqlwRNoLpD7Mqp3SimRNp7djOVeAbf0zhWW2Mfp j8Rb/xQGW85mpWeBhUGhsKhefPStzLGVIa4Ul8qJ3Wq90VWQ2Vgmdb/Bui65Rclxv7bL raOdg0xGULckqSXPvnBA4wnH69JtfuU/YkLVgxae2fgbPvbcDzUoyzmlxX6DQMDyuoCc IlYnWjAFDZv+RoX1LeYwystSe555mt2EdKo9FGkVmJxUwveBmeGFP8zMDAh2fOXbOzhf f/x+ukRePoEly24JbA9uhoCmr9O/bJCy0d4mTmwh3REB59aSsU6VeAewmaLvTFFVQo/v wNJQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=wfij3On5SR3Fk6sHG6SiK7Kiq6GdNSzM8aUPT131Pzc=; b=s8gQz99i3Ho47nEnrJdtLDP9SDTyqx4UXlj/f/8BsHwyRKWN7EAFSACfJH922tOq41 fJ+CS0RDU3XfyWKZfHNczCyoK+rKBks8Q9JB5qaGOMVmzKBkX5+0WDzlKSQc3idlWwTg NhLRJzsRfYvoESbWhW84u5sFE0U3LhR9i1V1y2TOCLWidvAjLdI11oWY//JW8tntiD1e +2x0PjknWJ7M2wCi0ejnuBzbbqV7kqkYwrVr9r1HW8veV9JiFiymW1NZCF0MrQHreYw6 mYmKOxCUh0S3j5kYpEVwG3rz9xrPxArGULmlJNEUy6mS60N9AMokR6FNPeRPx3M5HR2/ ohcA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=lcRNsHxN; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id c7-20020a17090a8d0700b0021305ca5933si4399583pjo.42.2022.11.04.15.40.35; Fri, 04 Nov 2022 15:40:47 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=lcRNsHxN; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230038AbiKDWjj (ORCPT + 99 others); Fri, 4 Nov 2022 18:39:39 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46438 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229898AbiKDWj0 (ORCPT ); Fri, 4 Nov 2022 18:39:26 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3043924BDA; Fri, 4 Nov 2022 15:39:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1667601565; x=1699137565; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=rczL0H8pPnMW1qW7foBVSx9Ii+JR6CtkXBQ8oZY4Am4=; b=lcRNsHxN8ztO7Cj01+/rsf6TrBL8e9w/NqBvKPDfA/c0ofxej19dif2B T+YqQiT6Gz6k9DardUQ1WziP06eOxs17/dezJIffMOdskrFFfcYPNw6zm 5DwRuutHjeCscyP+T2kKSXSPKTRJGRjFwutbW2BmEKE5SMjxugKn73uIS eSN71cZ878YlUY4g9eOmIO6sgBRDZPpM5c8Md2IMU6+aAteRZLXEq3/KP /3wnHRQIM8AkLLiVtJo/aOpdSHZMrrmDmaI/EPGyq29/WGjiEuENVYJ9E 9PJDrvct3ia+ukAXQhFqsQet8UKQL0HnmJuHWsugGngfR89KeQarxqNzm A==; X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="311840485" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="311840485" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:24 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="668513924" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="668513924" Received: from adhjerms-mobl1.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.212.227.68]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:23 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v3 03/37] x86/cpufeatures: Add CPU feature flags for shadow stacks Date: Fri, 4 Nov 2022 15:35:30 -0700 Message-Id: <20221104223604.29615-4-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221104223604.29615-1-rick.p.edgecombe@intel.com> References: <20221104223604.29615-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-5.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1748607065199206876?= X-GMAIL-MSGID: =?utf-8?q?1748607065199206876?= From: Yu-cheng Yu The Control-Flow Enforcement Technology contains two related features, one of which is Shadow Stacks. Future patches will utilize this feature for shadow stack support in KVM, so add a CPU feature flags for Shadow Stacks (CPUID.(EAX=7,ECX=0):ECX[bit 7]). To protect shadow stack state from malicious modification, the registers are only accessible in supervisor mode. This implementation context-switches the registers with XSAVES. Make X86_FEATURE_SHSTK depend on XSAVES. The shadow stack feature, enumerated by the CPUID bit described above, encompasses both supervisor and userspace support for shadow stack. In near future patches, only userspace shadow stack will be enabled. In expectation of future supervisor shadow stack support, create a software CPU capability to enumerate kernel utilization of userspace shadow stack support. This will also allow for userspace shadow stack to be disabled, while leaving the shadow stack hardware capability exposed in the cpuinfo proc. This user shadow stack bit should depend on the HW "shstk" capability and that logic will be implemented in future patches. Tested-by: Pengfei Xu Tested-by: John Allen Signed-off-by: Yu-cheng Yu Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Cc: Kees Cook --- v3: - Add user specific shadow stack cpu cap (Andrew Cooper) - Drop reviewed-bys from Boris and Kees due to the above change. v2: - Remove IBT reference in commit log (Kees) - Describe xsaves dependency using text from (Dave) v1: - Remove IBT, can be added in a follow on IBT series. Yu-cheng v25: - Make X86_FEATURE_IBT depend on X86_FEATURE_SHSTK. arch/x86/include/asm/cpufeatures.h | 2 ++ arch/x86/include/asm/disabled-features.h | 9 ++++++++- arch/x86/kernel/cpu/cpuid-deps.c | 1 + 3 files changed, 11 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index b71f4f2ecdd5..5626ecb8a080 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -304,6 +304,7 @@ #define X86_FEATURE_UNRET (11*32+15) /* "" AMD BTB untrain return */ #define X86_FEATURE_USE_IBPB_FW (11*32+16) /* "" Use IBPB during runtime firmware calls */ #define X86_FEATURE_RSB_VMEXIT_LITE (11*32+17) /* "" Fill RSB on VM exit when EIBRS is enabled */ +#define X86_FEATURE_USER_SHSTK (11*32+18) /* Shadow stack support for user mode applications */ /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */ #define X86_FEATURE_AVX_VNNI (12*32+ 4) /* AVX VNNI instructions */ @@ -365,6 +366,7 @@ #define X86_FEATURE_OSPKE (16*32+ 4) /* OS Protection Keys Enable */ #define X86_FEATURE_WAITPKG (16*32+ 5) /* UMONITOR/UMWAIT/TPAUSE Instructions */ #define X86_FEATURE_AVX512_VBMI2 (16*32+ 6) /* Additional AVX512 Vector Bit Manipulation Instructions */ +#define X86_FEATURE_SHSTK (16*32+ 7) /* Shadow Stack */ #define X86_FEATURE_GFNI (16*32+ 8) /* Galois Field New Instructions */ #define X86_FEATURE_VAES (16*32+ 9) /* Vector AES */ #define X86_FEATURE_VPCLMULQDQ (16*32+10) /* Carry-Less Multiplication Double Quadword */ diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h index 33d2cd04d254..30cd12905499 100644 --- a/arch/x86/include/asm/disabled-features.h +++ b/arch/x86/include/asm/disabled-features.h @@ -87,6 +87,12 @@ # define DISABLE_TDX_GUEST (1 << (X86_FEATURE_TDX_GUEST & 31)) #endif +#ifdef CONFIG_X86_USER_SHADOW_STACK +#define DISABLE_USER_SHSTK 0 +#else +#define DISABLE_USER_SHSTK (1 << (X86_FEATURE_USER_SHSTK & 31)) +#endif + /* * Make sure to add features to the correct mask */ @@ -101,7 +107,8 @@ #define DISABLED_MASK8 (DISABLE_TDX_GUEST) #define DISABLED_MASK9 (DISABLE_SGX) #define DISABLED_MASK10 0 -#define DISABLED_MASK11 (DISABLE_RETPOLINE|DISABLE_RETHUNK|DISABLE_UNRET) +#define DISABLED_MASK11 (DISABLE_RETPOLINE|DISABLE_RETHUNK|DISABLE_UNRET| \ + DISABLE_USER_SHSTK) #define DISABLED_MASK12 0 #define DISABLED_MASK13 0 #define DISABLED_MASK14 0 diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c index c881bcafba7d..bf1b55a1ba21 100644 --- a/arch/x86/kernel/cpu/cpuid-deps.c +++ b/arch/x86/kernel/cpu/cpuid-deps.c @@ -78,6 +78,7 @@ static const struct cpuid_dep cpuid_deps[] = { { X86_FEATURE_XFD, X86_FEATURE_XSAVES }, { X86_FEATURE_XFD, X86_FEATURE_XGETBV1 }, { X86_FEATURE_AMX_TILE, X86_FEATURE_XFD }, + { X86_FEATURE_SHSTK, X86_FEATURE_XSAVES }, {} }; From patchwork Fri Nov 4 22:35:31 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 15819 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:6687:0:0:0:0:0 with SMTP id l7csp678301wru; Fri, 4 Nov 2022 15:40:49 -0700 (PDT) X-Google-Smtp-Source: AMsMyM6OdmwAGbosA9OjLdFr5LJHuv5jfjIma47rres5H3iHFJxZQIUdXoiSkx6iTDL1DG0KDvdK X-Received: by 2002:a65:44c5:0:b0:439:4697:ead0 with SMTP id g5-20020a6544c5000000b004394697ead0mr31822156pgs.45.1667601649119; Fri, 04 Nov 2022 15:40:49 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1667601649; cv=none; d=google.com; s=arc-20160816; b=lL5xFgnlqIMM5klz/lH5cLFY9eO7aewH2NH/X9a+vR8bLeq6PLAy69YE3XXZDLfAoQ qXL4LVNUjG83JZufjGeYBSX1FJa/KqHK2MWy4auTSQ9marwH8wLsm1I8OdsjCKXk+znU hwYHjjq3lqLSJ/R9YwMKR0vXECV+CxWS8/Ubq9ua4KYDFUutfVs+wCTI0D2KeUnTahxB QtewTIJEGOzd/BsxO/9gu1TvmgihBW0WQrnGoF4zADAEpGUs3vsAJTsT7y9VLG9MHLF7 zFFZjZs7VVZjeEhvN2ZMOPAv5zR+HW0vwQ+bmnVEpahNKj3pMusy5u+UQHwhCgeu8w8e c4yg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=xJbAWMlrkB6SpT4AB6t4hEXHcv/BUbFClg/2bE0bDy0=; b=RtBrwNFcKyF23mHoHbAY1S8ZnFROifqKfCgjRpEFx9ULmsLSvG7A+/MTuTnDODMbV4 VMpbCty71pmzVTDloyIFppLqoiGV7m8j8mngqsegq/vBFlTbPP0PEd2kg7fDvaHxB+YY aYujWPQ5ftUk2e8idJWUlQk5uguwuUTJsPXb8j383CUJw1VioqMkZyqcLjuw3aAKQWTt wgR8jOmQyJZQNrT2fh7RzzR3nXLxeBw6UBo60OiVY7CDODJ9U/ZyXdSJpZEn4NT+vkqQ S05IA4IUNP7qRUB8L+Hn30icI/Zp3tjn9J53oVMhgFhhVFP2WIVSR11z+GTc/WUSKXno hlLg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=X6x4CBOp; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id q192-20020a632ac9000000b0046aabd03ce5si812479pgq.796.2022.11.04.15.40.36; Fri, 04 Nov 2022 15:40:49 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=X6x4CBOp; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229935AbiKDWkA (ORCPT + 99 others); Fri, 4 Nov 2022 18:40:00 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46452 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229919AbiKDWj0 (ORCPT ); Fri, 4 Nov 2022 18:39:26 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1B859275C0; Fri, 4 Nov 2022 15:39:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1667601566; x=1699137566; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=mTVhLRNKpSbmCscJuYNnJ4SsjQqGcAiqFTAkxD22q3I=; b=X6x4CBOppF3ITWJobXajbXNzYzIslHOhiTpt1rto7lkvYzBUykATWZNY njwmOETrsfOtqGBSmE4wm4m/1hlZT3RExJ2Dzmodglfe5gTlLzWM3FmLM JEqYZlQn46YypfR/DpYe3gEIEfB2tB9nQfGyralVpChPsSc8WkyQdNKOE TPVSWvD1qbUYJFdgnkCHuC9g+qGDAqYfEdl+VnqNuCf8xCTjd226Ge9EI KSZijs0LZpbunwfH5tP5ulaom/dMN2h9CDIPGrwb00qfjJVaHvYReLVyQ y8QUAOAEGpL4v6VrOAnM1AdBzoszppN58P969aDwIRn4N+V/NSN/cjflE g==; X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="311840489" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="311840489" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:25 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="668513927" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="668513927" Received: from adhjerms-mobl1.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.212.227.68]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:24 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v3 04/37] x86/cpufeatures: Enable CET CR4 bit for shadow stack Date: Fri, 4 Nov 2022 15:35:31 -0700 Message-Id: <20221104223604.29615-5-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221104223604.29615-1-rick.p.edgecombe@intel.com> References: <20221104223604.29615-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-5.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1748607066395017772?= X-GMAIL-MSGID: =?utf-8?q?1748607066395017772?= From: Yu-cheng Yu Setting CR4.CET is a prerequisite for utilizing any CET features, most of which also require setting MSRs. Kernel IBT already enables the CET CR4 bit when it detects IBT HW support and is configured with kernel IBT. However, future patches that enable userspace shadow stack support will need the bit set as well. So change the logic to enable it in either case. Clear MSR_IA32_U_CET in cet_disable() so that it can't live to see userspace in a new kexec-ed kernel that has CR4.CET set from kernel IBT. Tested-by: Pengfei Xu Tested-by: John Allen Signed-off-by: Yu-cheng Yu Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Cc: Kees Cook --- v3: - Remove stay new line (Boris) - Simplify commit log (Andrew Cooper) v2: - In the shadow stack case, go back to only setting CR4.CET if the kernel is compiled with user shadow stack support. - Clear MSR_IA32_U_CET as well. (PeterZ) KVM refresh: - Set CR4.CET if SHSTK or IBT are supported by HW, so that KVM can support CET even if IBT is disabled. - Drop no_user_shstk (Dave Hansen) - Elaborate on what the CR4 bit does in the commit log - Integrate with Kernel IBT logic v1: - Moved kernel-parameters.txt changes here from patch 1. arch/x86/kernel/cpu/common.c | 35 +++++++++++++++++++++++++++++------ 1 file changed, 29 insertions(+), 6 deletions(-) diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index 3e508f239098..0ba0a136adcb 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -596,28 +596,51 @@ __noendbr void ibt_restore(u64 save) #endif +#ifdef CONFIG_X86_CET static __always_inline void setup_cet(struct cpuinfo_x86 *c) { - u64 msr = CET_ENDBR_EN; + bool kernel_ibt = HAS_KERNEL_IBT && cpu_feature_enabled(X86_FEATURE_IBT); + bool user_shstk; + u64 msr = 0; - if (!HAS_KERNEL_IBT || - !cpu_feature_enabled(X86_FEATURE_IBT)) + /* + * Enable user shadow stack only if the Linux defined user shadow stack + * cap was not cleared by command line. + */ + user_shstk = cpu_feature_enabled(X86_FEATURE_SHSTK) && + IS_ENABLED(CONFIG_X86_USER_SHADOW_STACK) && + !test_bit(X86_FEATURE_USER_SHSTK, (unsigned long *)cpu_caps_cleared); + + if (!kernel_ibt && !user_shstk) return; + if (user_shstk) + set_cpu_cap(c, X86_FEATURE_USER_SHSTK); + + if (kernel_ibt) + msr = CET_ENDBR_EN; + wrmsrl(MSR_IA32_S_CET, msr); cr4_set_bits(X86_CR4_CET); - if (!ibt_selftest()) { + if (kernel_ibt && !ibt_selftest()) { pr_err("IBT selftest: Failed!\n"); setup_clear_cpu_cap(X86_FEATURE_IBT); return; } } +#else /* CONFIG_X86_CET */ +static inline void setup_cet(struct cpuinfo_x86 *c) {} +#endif __noendbr void cet_disable(void) { - if (cpu_feature_enabled(X86_FEATURE_IBT)) - wrmsrl(MSR_IA32_S_CET, 0); + if (!(cpu_feature_enabled(X86_FEATURE_IBT) || + cpu_feature_enabled(X86_FEATURE_SHSTK))) + return; + + wrmsrl(MSR_IA32_S_CET, 0); + wrmsrl(MSR_IA32_U_CET, 0); } /* From patchwork Fri Nov 4 22:35:32 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 15820 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:6687:0:0:0:0:0 with SMTP id l7csp678331wru; Fri, 4 Nov 2022 15:40:52 -0700 (PDT) X-Google-Smtp-Source: AMsMyM61RY8Emhh1PjelfEjQ0BaKYJTBDCL3Abz9NXIh0baxe5l16dK3yhQfoDVzYv4WGx1uPOem X-Received: by 2002:a05:6a00:1749:b0:56e:4586:4bc1 with SMTP id j9-20020a056a00174900b0056e45864bc1mr10980759pfc.41.1667601652631; Fri, 04 Nov 2022 15:40:52 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1667601652; cv=none; d=google.com; s=arc-20160816; b=YZB9UlPMTBKRayMXf+pzek35uO+NYuZh+oE+uUnGc1vHbVFiNjLR7WPTnSQYvqN0ZX YkRbngtmsy1GFYscUuTh49FTS+2ivW6EHdXHr+nTEVdVQG7eA/Bz05zcdt5NrYg03j2R euYSbycy6YMnIPEX0NSl0wrbNOcHJOWWdCe4ts5bjaKZmi0SshNX+SXo/fR7DcD/IlWN WOgs3zPvr2OQUMCs4DDNzM5Y3I4yVx6oSExNC8bSQbT/mr7chf1Fdk7/1L+KmP6g6aeK LWhNy6+qly6Qy8cYj4m+hoUx7HSxD8Ku0V0gxFaEtI30hjlz/2Ht0aD8B/OqjanhONVm +dOQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=wMwE20lpDvdskyQbOXxJt0ZSmmUoC3ZOGbGA4Ztww6I=; b=vQjvhCp5BZUI5o8yKNIp6Rszu+0I8c5sPsy9ck4+mvY2hc2f454oSHnb1cU86TfLxh XSw6CdM0ZnfVRY1DXSsjp6HnDMEuSSdYPJiPM9pPDQ+1/19hmOOglN49evLHrndDQRCO rNBnEzRkQfZO4vjUQT3WmTu3B8shpDTlYDxRdKI6biyMR/ZAEyegaZsJOocCLGrQhh1q eeqaemXkvskeq28oFd1y4wWZ3f9hOTdIFKJI3uE0CUpoxR/dLsfwGa8IxnY/aRk57wN1 iajNoTZ2ObLoyLXRcz5s2Dh2p4PxKUL47BSO/vlZK/wysLpDEYUjso82xo6XlYcdzvQ4 zMqQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=Qms8kBZb; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id d23-20020a17090ab31700b0021414ac5cc2si4609606pjr.131.2022.11.04.15.40.40; Fri, 04 Nov 2022 15:40:52 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=Qms8kBZb; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230070AbiKDWkK (ORCPT + 99 others); Fri, 4 Nov 2022 18:40:10 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46460 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229499AbiKDWj2 (ORCPT ); Fri, 4 Nov 2022 18:39:28 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D555AD5F; Fri, 4 Nov 2022 15:39:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1667601566; x=1699137566; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=No0hPiiWDHKVXrSZwrXRH0EcbmsozvEq6ptwayjKxic=; b=Qms8kBZbk2YeC6ox3yVNp/9VZXp/XT9t9KgCMjr6/gTuiEKg+kjacxlC 5Ra0vVv3qfhMp2gQC1xqWVyiNZmshd9r6Bdfs0k/52lIOWkQ1axTjkSDm MYPm7iFTYLplyvvSaDazdvobrYUUNKEnQVhstr1i1sKqRShD0l8L6q7kx vRBaJoxXRlZBTjyKD8/f4HMChMLJ6WKeu3nWamfmXJGGMCWxLjXNfCfNd qxkLG9wce/t/LWh/YjS07OrSUG1J3//xg/3oBZW/Vdbx3HxBY5uQh5uu0 +5DBQ/KWnwubphwPJY220RkF3xhBqcoMmEAdDXxadIC+pB3C38KmMv0IG w==; X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="311840496" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="311840496" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:26 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="668513942" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="668513942" Received: from adhjerms-mobl1.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.212.227.68]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:25 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v3 05/37] x86/fpu/xstate: Introduce CET MSR and XSAVES supervisor states Date: Fri, 4 Nov 2022 15:35:32 -0700 Message-Id: <20221104223604.29615-6-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221104223604.29615-1-rick.p.edgecombe@intel.com> References: <20221104223604.29615-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-5.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1748607070190415658?= X-GMAIL-MSGID: =?utf-8?q?1748607070190415658?= From: Yu-cheng Yu Shadow stack register state can be managed with XSAVE. The registers can logically be separated into two groups: * Registers controlling user-mode operation * Registers controlling kernel-mode operation The architecture has two new XSAVE state components: one for each group of those groups of registers. This lets an OS manage them separately if it chooses. Future patches for host userspace and KVM guests will only utilize the user-mode registers, so only configure XSAVE to save user-mode registers. This state will add 16 bytes to the xsave buffer size. Future patches will use the user-mode XSAVE area to save guest user-mode CET state. However, VMCS includes new fields for guest CET supervisor states. KVM can use these to save and restore guest supervisor state, so host supervisor XSAVE support is not required. Adding this exacerbates the already unwieldy if statement in check_xstate_against_struct() that handles warning about un-implemented xfeatures. So refactor these check's by having XCHECK_SZ() set a bool when it actually check's the xfeature. This ends up exceeding 80 chars, but was better on balance than other options explored. Pass the bool as pointer to make it clear that XCHECK_SZ() can change the variable. While configuring user-mode XSAVE, clarify kernel-mode registers are not managed by XSAVE by defining the xfeature in XFEATURE_MASK_SUPERVISOR_UNSUPPORTED, like is done for XFEATURE_MASK_PT. This serves more of a documentation as code purpose, and functionally, only enables a few safety checks. Both XSAVE state components are supervisor states, even the state controlling user-mode operation. This is a departure from earlier features like protection keys where the PKRU state is a normal user (non-supervisor) state. Having the user state be supervisor-managed ensures there is no direct, unprivileged access to it, making it harder for an attacker to subvert CET. To facilitate this privileged access, define the two user-mode CET MSRs, and the bits defined in those MSRs relevant to future shadow stack enablement patches. Tested-by: Pengfei Xu Tested-by: John Allen Signed-off-by: Yu-cheng Yu Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Cc: Kees Cook --- v3: - Add missing "is" in commit log (Boris) - Change to case statement for struct size checking (Boris) - Adjust commas on xfeature_names (Kees, Boris) v2: - Change name to XFEATURE_CET_KERNEL_UNUSED (peterz) KVM refresh: - Reword commit log using some verbiage posted by Dave Hansen - Remove unlikely to be used supervisor cet xsave struct - Clarify that supervisor cet state is not saved by xsave - Remove unused supervisor MSRs v1: - Remove outdated reference to sigreturn checks on msr's. arch/x86/include/asm/fpu/types.h | 14 ++++- arch/x86/include/asm/fpu/xstate.h | 6 ++- arch/x86/kernel/fpu/xstate.c | 90 +++++++++++++++---------------- 3 files changed, 59 insertions(+), 51 deletions(-) diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h index eb7cd1139d97..344baad02b97 100644 --- a/arch/x86/include/asm/fpu/types.h +++ b/arch/x86/include/asm/fpu/types.h @@ -115,8 +115,8 @@ enum xfeature { XFEATURE_PT_UNIMPLEMENTED_SO_FAR, XFEATURE_PKRU, XFEATURE_PASID, - XFEATURE_RSRVD_COMP_11, - XFEATURE_RSRVD_COMP_12, + XFEATURE_CET_USER, + XFEATURE_CET_KERNEL_UNUSED, XFEATURE_RSRVD_COMP_13, XFEATURE_RSRVD_COMP_14, XFEATURE_LBR, @@ -138,6 +138,8 @@ enum xfeature { #define XFEATURE_MASK_PT (1 << XFEATURE_PT_UNIMPLEMENTED_SO_FAR) #define XFEATURE_MASK_PKRU (1 << XFEATURE_PKRU) #define XFEATURE_MASK_PASID (1 << XFEATURE_PASID) +#define XFEATURE_MASK_CET_USER (1 << XFEATURE_CET_USER) +#define XFEATURE_MASK_CET_KERNEL (1 << XFEATURE_CET_KERNEL_UNUSED) #define XFEATURE_MASK_LBR (1 << XFEATURE_LBR) #define XFEATURE_MASK_XTILE_CFG (1 << XFEATURE_XTILE_CFG) #define XFEATURE_MASK_XTILE_DATA (1 << XFEATURE_XTILE_DATA) @@ -252,6 +254,14 @@ struct pkru_state { u32 pad; } __packed; +/* + * State component 11 is Control-flow Enforcement user states + */ +struct cet_user_state { + u64 user_cet; /* user control-flow settings */ + u64 user_ssp; /* user shadow stack pointer */ +}; + /* * State component 15: Architectural LBR configuration state. * The size of Arch LBR state depends on the number of LBRs (lbr_depth). diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h index cd3dd170e23a..d4427b88ee12 100644 --- a/arch/x86/include/asm/fpu/xstate.h +++ b/arch/x86/include/asm/fpu/xstate.h @@ -50,7 +50,8 @@ #define XFEATURE_MASK_USER_DYNAMIC XFEATURE_MASK_XTILE_DATA /* All currently supported supervisor features */ -#define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_PASID) +#define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_PASID | \ + XFEATURE_MASK_CET_USER) /* * A supervisor state component may not always contain valuable information, @@ -77,7 +78,8 @@ * Unsupported supervisor features. When a supervisor feature in this mask is * supported in the future, move it to the supported supervisor feature mask. */ -#define XFEATURE_MASK_SUPERVISOR_UNSUPPORTED (XFEATURE_MASK_PT) +#define XFEATURE_MASK_SUPERVISOR_UNSUPPORTED (XFEATURE_MASK_PT | \ + XFEATURE_MASK_CET_KERNEL) /* All supervisor states including supported and unsupported states. */ #define XFEATURE_MASK_SUPERVISOR_ALL (XFEATURE_MASK_SUPERVISOR_SUPPORTED | \ diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c index 59e543b95a3c..959d4dd64434 100644 --- a/arch/x86/kernel/fpu/xstate.c +++ b/arch/x86/kernel/fpu/xstate.c @@ -39,26 +39,26 @@ */ static const char *xfeature_names[] = { - "x87 floating point registers" , - "SSE registers" , - "AVX registers" , - "MPX bounds registers" , - "MPX CSR" , - "AVX-512 opmask" , - "AVX-512 Hi256" , - "AVX-512 ZMM_Hi256" , - "Processor Trace (unused)" , + "x87 floating point registers", + "SSE registers", + "AVX registers", + "MPX bounds registers", + "MPX CSR", + "AVX-512 opmask", + "AVX-512 Hi256", + "AVX-512 ZMM_Hi256", + "Processor Trace (unused)", "Protection Keys User registers", "PASID state", - "unknown xstate feature" , - "unknown xstate feature" , - "unknown xstate feature" , - "unknown xstate feature" , - "unknown xstate feature" , - "unknown xstate feature" , - "AMX Tile config" , - "AMX Tile data" , - "unknown xstate feature" , + "Control-flow User registers", + "Control-flow Kernel registers (unused)", + "unknown xstate feature", + "unknown xstate feature", + "unknown xstate feature", + "unknown xstate feature", + "AMX Tile config", + "AMX Tile data", + "unknown xstate feature", }; static unsigned short xsave_cpuid_features[] __initdata = { @@ -73,6 +73,7 @@ static unsigned short xsave_cpuid_features[] __initdata = { [XFEATURE_PT_UNIMPLEMENTED_SO_FAR] = X86_FEATURE_INTEL_PT, [XFEATURE_PKRU] = X86_FEATURE_PKU, [XFEATURE_PASID] = X86_FEATURE_ENQCMD, + [XFEATURE_CET_USER] = X86_FEATURE_SHSTK, [XFEATURE_XTILE_CFG] = X86_FEATURE_AMX_TILE, [XFEATURE_XTILE_DATA] = X86_FEATURE_AMX_TILE, }; @@ -276,6 +277,7 @@ static void __init print_xstate_features(void) print_xstate_feature(XFEATURE_MASK_Hi16_ZMM); print_xstate_feature(XFEATURE_MASK_PKRU); print_xstate_feature(XFEATURE_MASK_PASID); + print_xstate_feature(XFEATURE_MASK_CET_USER); print_xstate_feature(XFEATURE_MASK_XTILE_CFG); print_xstate_feature(XFEATURE_MASK_XTILE_DATA); } @@ -344,6 +346,7 @@ static __init void os_xrstor_booting(struct xregs_state *xstate) XFEATURE_MASK_BNDREGS | \ XFEATURE_MASK_BNDCSR | \ XFEATURE_MASK_PASID | \ + XFEATURE_MASK_CET_USER | \ XFEATURE_MASK_XTILE) /* @@ -446,14 +449,15 @@ static void __init __xstate_dump_leaves(void) } \ } while (0) -#define XCHECK_SZ(sz, nr, nr_macro, __struct) do { \ - if ((nr == nr_macro) && \ - WARN_ONCE(sz != sizeof(__struct), \ - "%s: struct is %zu bytes, cpu state %d bytes\n", \ - __stringify(nr_macro), sizeof(__struct), sz)) { \ +#define XCHECK_SZ(sz, nr, __struct) ({ \ + if (WARN_ONCE(sz != sizeof(__struct), \ + "[%s]: struct is %zu bytes, cpu state %d bytes\n", \ + xfeature_names[nr], sizeof(__struct), sz)) { \ __xstate_dump_leaves(); \ } \ -} while (0) + true; \ +}) + /** * check_xtile_data_against_struct - Check tile data state size. @@ -527,37 +531,29 @@ static bool __init check_xstate_against_struct(int nr) * Ask the CPU for the size of the state. */ int sz = xfeature_size(nr); + /* * Match each CPU state with the corresponding software * structure. */ - XCHECK_SZ(sz, nr, XFEATURE_YMM, struct ymmh_struct); - XCHECK_SZ(sz, nr, XFEATURE_BNDREGS, struct mpx_bndreg_state); - XCHECK_SZ(sz, nr, XFEATURE_BNDCSR, struct mpx_bndcsr_state); - XCHECK_SZ(sz, nr, XFEATURE_OPMASK, struct avx_512_opmask_state); - XCHECK_SZ(sz, nr, XFEATURE_ZMM_Hi256, struct avx_512_zmm_uppers_state); - XCHECK_SZ(sz, nr, XFEATURE_Hi16_ZMM, struct avx_512_hi16_state); - XCHECK_SZ(sz, nr, XFEATURE_PKRU, struct pkru_state); - XCHECK_SZ(sz, nr, XFEATURE_PASID, struct ia32_pasid_state); - XCHECK_SZ(sz, nr, XFEATURE_XTILE_CFG, struct xtile_cfg); - - /* The tile data size varies between implementations. */ - if (nr == XFEATURE_XTILE_DATA) - check_xtile_data_against_struct(sz); - - /* - * Make *SURE* to add any feature numbers in below if - * there are "holes" in the xsave state component - * numbers. - */ - if ((nr < XFEATURE_YMM) || - (nr >= XFEATURE_MAX) || - (nr == XFEATURE_PT_UNIMPLEMENTED_SO_FAR) || - ((nr >= XFEATURE_RSRVD_COMP_11) && (nr <= XFEATURE_RSRVD_COMP_16))) { + switch (nr) { + case XFEATURE_YMM: return XCHECK_SZ(sz, nr, struct ymmh_struct); + case XFEATURE_BNDREGS: return XCHECK_SZ(sz, nr, struct mpx_bndreg_state); + case XFEATURE_BNDCSR: return XCHECK_SZ(sz, nr, struct mpx_bndcsr_state); + case XFEATURE_OPMASK: return XCHECK_SZ(sz, nr, struct avx_512_opmask_state); + case XFEATURE_ZMM_Hi256: return XCHECK_SZ(sz, nr, struct avx_512_zmm_uppers_state); + case XFEATURE_Hi16_ZMM: return XCHECK_SZ(sz, nr, struct avx_512_hi16_state); + case XFEATURE_PKRU: return XCHECK_SZ(sz, nr, struct pkru_state); + case XFEATURE_PASID: return XCHECK_SZ(sz, nr, struct ia32_pasid_state); + case XFEATURE_XTILE_CFG: return XCHECK_SZ(sz, nr, struct xtile_cfg); + case XFEATURE_CET_USER: return XCHECK_SZ(sz, nr, struct cet_user_state); + case XFEATURE_XTILE_DATA: check_xtile_data_against_struct(sz); return true; + default: WARN_ONCE(1, "no structure for xstate: %d\n", nr); XSTATE_WARN_ON(1); return false; } + return true; } From patchwork Fri Nov 4 22:35:33 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 15821 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:6687:0:0:0:0:0 with SMTP id l7csp678441wru; Fri, 4 Nov 2022 15:41:09 -0700 (PDT) X-Google-Smtp-Source: AMsMyM4tgb+IlckP9RP0Xu2X+LQz4B2VzljTECdbtRmnW8pgMWLNH/SybxhgreFuf799IBq2LsbO X-Received: by 2002:a05:6a00:14d2:b0:56c:b1c5:c5da with SMTP id w18-20020a056a0014d200b0056cb1c5c5damr38701101pfu.2.1667601668925; Fri, 04 Nov 2022 15:41:08 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1667601668; cv=none; d=google.com; s=arc-20160816; b=QV098Qixq4S61SGF1BHZ9q/VlTLZsgkXCun8OpRo8c3PMlPTAoV4Vk8FzKHjk4P8n/ NBnCnUx0wL2ZQv5o+ma1TafQGD7fioXPw9IciNXk5jZCVt9TXzhkuIZTVKJImx15XJC9 pqb3/IHTHtzZbMJMoiZ0tSdZf1YIR0/fmNZhJyZTIL58sf0Ehe96+118a39EzdTdYhM7 DRh8RWKshqXkh3uPVTT17WgDPeNyh9UnWi2aSWmcRT1ZyNbIeWOzVQfXrK7/tG+UAWM8 pO72jRViWyWLaigl/1G5X/J2f3uLVRCfzdf1SyxQaCNbiKVdazM/sSBEhSQhdbDtKH8O 3Wfw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=AbmGId16tK0t2rGVD1R/lf6dwe+cYapm4W+Qlo/nZ2Y=; b=iYvAabJA3HcsmUw7HicAQj/hx/1g8TtNIEsX17MfaOX7g76kUrDqyQlYCcrJQNB6r4 CA4USRXAWCuDR2X7pkYTOHM9Hg8iwF71lhqeYTK6ao5nmQ8bDViLgaI8gdv+asaTOcdC w0soQP60/dqPJa2lFxt/115uKz2gWFGFbtnlDlkM8JDXUrTCRCsI0WQd7iB41JV3kgEI 0k3wSU/Etn4UG7XFvo1tyu2RtxtHL/huH5YP1eKY5MWXzemO2/aQIQ+UrM9ON5UU0voE 4V6yJp9kOLY6J4Ojv/+XBkahtS2y+6/xl3ygyBybktQKuXbuyPvjXJDN9LU973+MteBZ AlKQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=P75ynLi4; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id p20-20020a056a000b5400b005633c373dcbsi608107pfo.147.2022.11.04.15.40.54; Fri, 04 Nov 2022 15:41:08 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=P75ynLi4; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229531AbiKDWkP (ORCPT + 99 others); Fri, 4 Nov 2022 18:40:15 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46480 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229975AbiKDWj3 (ORCPT ); Fri, 4 Nov 2022 18:39:29 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4A2A432075; Fri, 4 Nov 2022 15:39:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1667601568; x=1699137568; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=4/pxGAyl97+5ZD5/gTFtyEZLTubfAza59sVFqeoJNMY=; b=P75ynLi4f3CdcPkP7sRcOKQ0h31R5kobQGfvmHpXcm9EBkELTSWoCmO/ pQFjJn0OUt6SD77u9ONGIOJTPzx0hwk7A8AVHmcah7/dNsyNcjM00ajdg Hr0SYxyDd0aTkddqBulwv19WPoKH1u7/YDfIx57PsJtVviuYQld8AtciS JvOMAuLXkf817hkB34EKTU2oS3u+1vpR3HZA+YBVjJ+IkUWBu+PAFyS4k TJuautssz1TTffvmTLM3tpClUDiFiA9e/9yVCkmN3N5bEY+qjuoI56lsX d60AbM/PHzfUJ/IGy/Btq9sEjxc99V/XthBXCFmzqDR2nFfGX1WABKsfn Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="311840500" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="311840500" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:27 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="668513959" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="668513959" Received: from adhjerms-mobl1.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.212.227.68]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:26 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org Cc: rick.p.edgecombe@intel.com Subject: [PATCH v3 06/37] x86/fpu: Add helper for modifying xstate Date: Fri, 4 Nov 2022 15:35:33 -0700 Message-Id: <20221104223604.29615-7-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221104223604.29615-1-rick.p.edgecombe@intel.com> References: <20221104223604.29615-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-5.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1748607087428677001?= X-GMAIL-MSGID: =?utf-8?q?1748607087428677001?= Just like user xfeatures, supervisor xfeatures can be active in the registers or present in the task FPU buffer. If the registers are active, the registers can be modified directly. If the registers are not active, the modification must be performed on the task FPU buffer. When the state is not active, the kernel could perform modifications directly to the buffer. But in order for it to do that, it needs to know where in the buffer the specific state it wants to modify is located. Doing this is not robust against optimizations that compact the FPU buffer, as each access would require computing where in the buffer it is. The easiest way to modify supervisor xfeature data is to force restore the registers and write directly to the MSRs. Often times this is just fine anyway as the registers need to be restored before returning to userspace. Do this for now, leaving buffer writing optimizations for the future. Add a new function fpregs_lock_and_load() that can simultaneously call fpregs_lock() and do this restore. Also perform some extra sanity checks in this function since this will be used in non-fpu focused code. Tested-by: Pengfei Xu Tested-by: John Allen Suggested-by: Thomas Gleixner Signed-off-by: Rick Edgecombe --- v3: - Rename to fpregs_lock_and_load() to match the unlocking fpregs_unlock(). (Kees) - Elaborate in comment about helper. (Dave) v2: - Drop optimization of writing directly the buffer, and change API accordingly. - fpregs_lock_and_load() suggested by tglx - Some commit log verbiage from dhansen v1: - New patch. arch/x86/include/asm/fpu/api.h | 9 +++++++++ arch/x86/kernel/fpu/core.c | 19 +++++++++++++++++++ 2 files changed, 28 insertions(+) diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h index 503a577814b2..aadc6893dcaa 100644 --- a/arch/x86/include/asm/fpu/api.h +++ b/arch/x86/include/asm/fpu/api.h @@ -82,6 +82,15 @@ static inline void fpregs_unlock(void) preempt_enable(); } +/* + * FPU state gets lazily restored before returning to userspace. So when in the + * kernel, the valid FPU state may be kept in the buffer. This function will force + * restore all the fpu state to the registers early if needed, and lock them from + * being automatically saved/restored. Then FPU state can be modified safely in the + * registers, before unlocking with fpregs_unlock(). + */ +void fpregs_lock_and_load(void); + #ifdef CONFIG_X86_DEBUG_FPU extern void fpregs_assert_state_consistent(void); #else diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c index 3b28c5b25e12..8b3162badab7 100644 --- a/arch/x86/kernel/fpu/core.c +++ b/arch/x86/kernel/fpu/core.c @@ -756,6 +756,25 @@ void switch_fpu_return(void) } EXPORT_SYMBOL_GPL(switch_fpu_return); +void fpregs_lock_and_load(void) +{ + /* + * fpregs_lock() only disables preemption (mostly). So modifing state + * in an interrupt could screw up some in progress fpregs operation, + * but appear to work. Warn about it. + */ + WARN_ON_ONCE(!irq_fpu_usable()); + WARN_ON_ONCE(current->flags & PF_KTHREAD); + + fpregs_lock(); + + fpregs_assert_state_consistent(); + + if (test_thread_flag(TIF_NEED_FPU_LOAD)) + fpregs_restore_userregs(); +} +EXPORT_SYMBOL_GPL(fpregs_lock_and_load); + #ifdef CONFIG_X86_DEBUG_FPU /* * If current FPU state according to its tracking (loaded FPU context on this From patchwork Fri Nov 4 22:35:34 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 15824 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:6687:0:0:0:0:0 with SMTP id l7csp678548wru; Fri, 4 Nov 2022 15:41:22 -0700 (PDT) X-Google-Smtp-Source: AMsMyM7k2lCJLOHYPNx4K2MGcUFrB9xQCMxYFz0XgNZ5KBMgNH//9bEsjPMpIDTpoKOw0rdUStPj X-Received: by 2002:a63:ee4f:0:b0:46f:87a8:97ab with SMTP id n15-20020a63ee4f000000b0046f87a897abmr29596193pgk.349.1667601682641; Fri, 04 Nov 2022 15:41:22 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1667601682; cv=none; d=google.com; s=arc-20160816; b=BteZ24AmyawEbvC0ngTaS6pT4XdhZCHFzQO9foCiScu8QU6XjJMZtIQJEPeb16in25 991rVXFWlqYDVgeAiVXy5tUuLva1WTCbFGWrwhh4rq6zHXdl6gDpXanswaomQdZBqubK 3KADtMrUkbD/Z3o4WNF8SlpxExyeNyKD/PCZuUD2rHqG+gIxsNpaS0LZ+zPwokhGL5Hx sZRaoo8PNeS+wP57HYC4jSSfcxdbz3PgxrQhO5kY1ghyk9Tl8fg1TA8VAni2oxWMgWbV kUnBqotXMqQB5C6/WFZBRRYWKCnrziJl9QxDdMkryVebM8eZc79SW+RGqz+b60PsYzsJ VlLQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=5cXRZ+GFeRiwqSbuVS4bPphYTB3HdTmzjcvqB/Pyra0=; b=ZYc0aML0e2WRyV38jb2HfICq7IgUM1stWr4sF+2xDf2LJ55NcIxjvOpF8ulP681f8R zoXDVjZkTeTy6K86iLgWmQNQ5jzHpEkiAtllb9w/sJBCAPcLi4RtPjE3DCHaHX48Ym1V tOhTYkiQ1HMlwS2UUT4wodEHV7oOZYPhmPcetnWzZO8/T2n+A56PTt6RqmMal+swoWlG L/y4JyoBtiutaY/IfccqFRW/JmK1eJpdMGxak2ig8y2WocPcJUGpETwP4fjZUPj75oZR 2gLonBvR1f+RtRf9uTbdpaRkOwLgFKeBiVUY+5HMMRFZF07Q4UNw61FtYaguuuyMLEH6 j4gQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=jFyUQeMr; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id c11-20020a65674b000000b0041b577f3356si925724pgu.720.2022.11.04.15.41.09; Fri, 04 Nov 2022 15:41:22 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=jFyUQeMr; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229975AbiKDWk1 (ORCPT + 99 others); Fri, 4 Nov 2022 18:40:27 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46636 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230003AbiKDWjg (ORCPT ); Fri, 4 Nov 2022 18:39:36 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AAE3E32BAC; Fri, 4 Nov 2022 15:39:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1667601569; x=1699137569; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=xVzcmLcOg1LYW5UNi4yo2WlRphPxkoAotUhaYFjXMBs=; b=jFyUQeMrJeUUn+y/c/zTaR8TL6nII1K6dY5uUVxVujjC9kV1OaitoNZU l6CBoG2MbsFVgpiD7vO+7AszqSiNA3m/xuipzS5fyYwxnzlA51vqktzGr 4BQgp6C1eeWmuPzlysfMOI/GbURgv3+PR+9VgY0XapqcNL1zoQLue9LYO U/3H9TwjrIYTt6gpqzjszHdMT/F0Hg99s6h5pzNRKMSJ4snB8oXWQB3sH Jik0neGS9duuE2JllqZJ6p0/mhbOiB3RugmPczaFQHlIY258g/s70zbJ2 zFKcBrYldMj7ofkbD7ZqhE0a4GY7uZeEZjoM6aZGijYPDJ2t9g4OL+k5L w==; X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="311840503" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="311840503" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:28 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="668513990" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="668513990" Received: from adhjerms-mobl1.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.212.227.68]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:27 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu , Michael Kerrisk Subject: [PATCH v3 07/37] x86/cet: Add user control-protection fault handler Date: Fri, 4 Nov 2022 15:35:34 -0700 Message-Id: <20221104223604.29615-8-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221104223604.29615-1-rick.p.edgecombe@intel.com> References: <20221104223604.29615-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-5.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1748607102056573840?= X-GMAIL-MSGID: =?utf-8?q?1748607102056573840?= From: Yu-cheng Yu A control-protection fault is triggered when a control-flow transfer attempt violates Shadow Stack or Indirect Branch Tracking constraints. For example, the return address for a RET instruction differs from the copy on the shadow stack. There already exists a control-protection fault handler for handling kernel IBT. Refactor this fault handler into sparate user and kernel handlers, like the page fault handler. Add a control-protection handler for usermode. Keep the same behavior for the kernel side of the fault handler, except for converting a BUG to a WARN in the case of a #CP happening when !cpu_feature_enabled(). This unifies the behavior with the new shadow stack code, and also prevents the kernel from crashing under this situation which is potentially recoverable. The control-protection fault handler works in a similar way as the general protection fault handler. It provides the si_code SEGV_CPERR to the signal handler. Tested-by: Pengfei Xu Tested-by: John Allen Signed-off-by: Yu-cheng Yu Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Cc: Kees Cook Cc: Michael Kerrisk --- v3: - Shorten user/kernel #CP handler function names (peterz) - Restore CP_ENDBR check to kernel handler (peterz) - Utilize CONFIG_X86_CET (Kees) - Unify "unexpected" warnings (Andrew Cooper) - Use 2d array for error code chars (Andrew Cooper) - Add comment about why to read SSP MSR before enabling interrupts v2: - Integrate with kernel IBT fault handler - Update printed messages. (Dave) - Remove array_index_nospec() usage. (Dave) - Remove IBT messages. (Dave) - Add enclave error code bit processing it case it can get triggered somehow. - Add extra "unknown" in control_protection_err. v1: - Update static asserts for NSIGSEGV Yu-cheng v29: - Remove pr_emerg() since it is followed by die(). - Change boot_cpu_has() to cpu_feature_enabled(). arch/arm/kernel/signal.c | 2 +- arch/arm64/kernel/signal.c | 2 +- arch/arm64/kernel/signal32.c | 2 +- arch/sparc/kernel/signal32.c | 2 +- arch/sparc/kernel/signal_64.c | 2 +- arch/x86/include/asm/disabled-features.h | 8 +- arch/x86/include/asm/idtentry.h | 2 +- arch/x86/kernel/idt.c | 2 +- arch/x86/kernel/signal_compat.c | 2 +- arch/x86/kernel/traps.c | 107 ++++++++++++++++++++--- arch/x86/xen/enlighten_pv.c | 2 +- arch/x86/xen/xen-asm.S | 2 +- include/uapi/asm-generic/siginfo.h | 3 +- 13 files changed, 114 insertions(+), 24 deletions(-) diff --git a/arch/arm/kernel/signal.c b/arch/arm/kernel/signal.c index e07f359254c3..9a3c9de5ac5e 100644 --- a/arch/arm/kernel/signal.c +++ b/arch/arm/kernel/signal.c @@ -681,7 +681,7 @@ asmlinkage void do_rseq_syscall(struct pt_regs *regs) */ static_assert(NSIGILL == 11); static_assert(NSIGFPE == 15); -static_assert(NSIGSEGV == 9); +static_assert(NSIGSEGV == 10); static_assert(NSIGBUS == 5); static_assert(NSIGTRAP == 6); static_assert(NSIGCHLD == 6); diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c index 9ad911f1647c..81b13a21046e 100644 --- a/arch/arm64/kernel/signal.c +++ b/arch/arm64/kernel/signal.c @@ -1166,7 +1166,7 @@ void __init minsigstksz_setup(void) */ static_assert(NSIGILL == 11); static_assert(NSIGFPE == 15); -static_assert(NSIGSEGV == 9); +static_assert(NSIGSEGV == 10); static_assert(NSIGBUS == 5); static_assert(NSIGTRAP == 6); static_assert(NSIGCHLD == 6); diff --git a/arch/arm64/kernel/signal32.c b/arch/arm64/kernel/signal32.c index 4700f8522d27..bbd542704730 100644 --- a/arch/arm64/kernel/signal32.c +++ b/arch/arm64/kernel/signal32.c @@ -460,7 +460,7 @@ void compat_setup_restart_syscall(struct pt_regs *regs) */ static_assert(NSIGILL == 11); static_assert(NSIGFPE == 15); -static_assert(NSIGSEGV == 9); +static_assert(NSIGSEGV == 10); static_assert(NSIGBUS == 5); static_assert(NSIGTRAP == 6); static_assert(NSIGCHLD == 6); diff --git a/arch/sparc/kernel/signal32.c b/arch/sparc/kernel/signal32.c index dad38960d1a8..82da8a2d769d 100644 --- a/arch/sparc/kernel/signal32.c +++ b/arch/sparc/kernel/signal32.c @@ -751,7 +751,7 @@ asmlinkage int do_sys32_sigstack(u32 u_ssptr, u32 u_ossptr, unsigned long sp) */ static_assert(NSIGILL == 11); static_assert(NSIGFPE == 15); -static_assert(NSIGSEGV == 9); +static_assert(NSIGSEGV == 10); static_assert(NSIGBUS == 5); static_assert(NSIGTRAP == 6); static_assert(NSIGCHLD == 6); diff --git a/arch/sparc/kernel/signal_64.c b/arch/sparc/kernel/signal_64.c index 570e43e6fda5..b4e410976e0d 100644 --- a/arch/sparc/kernel/signal_64.c +++ b/arch/sparc/kernel/signal_64.c @@ -562,7 +562,7 @@ void do_notify_resume(struct pt_regs *regs, unsigned long orig_i0, unsigned long */ static_assert(NSIGILL == 11); static_assert(NSIGFPE == 15); -static_assert(NSIGSEGV == 9); +static_assert(NSIGSEGV == 10); static_assert(NSIGBUS == 5); static_assert(NSIGTRAP == 6); static_assert(NSIGCHLD == 6); diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h index 30cd12905499..5ff93b8165ed 100644 --- a/arch/x86/include/asm/disabled-features.h +++ b/arch/x86/include/asm/disabled-features.h @@ -93,6 +93,12 @@ #define DISABLE_USER_SHSTK (1 << (X86_FEATURE_USER_SHSTK & 31)) #endif +#ifdef CONFIG_X86_KERNEL_IBT +#define DISABLE_IBT 0 +#else +#define DISABLE_IBT (1 << (X86_FEATURE_IBT & 31)) +#endif + /* * Make sure to add features to the correct mask */ @@ -116,7 +122,7 @@ #define DISABLED_MASK16 (DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP| \ DISABLE_ENQCMD) #define DISABLED_MASK17 0 -#define DISABLED_MASK18 0 +#define DISABLED_MASK18 (DISABLE_IBT) #define DISABLED_MASK19 0 #define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 20) diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h index 72184b0b2219..69e26f48d027 100644 --- a/arch/x86/include/asm/idtentry.h +++ b/arch/x86/include/asm/idtentry.h @@ -618,7 +618,7 @@ DECLARE_IDTENTRY_RAW_ERRORCODE(X86_TRAP_DF, xenpv_exc_double_fault); #endif /* #CP */ -#ifdef CONFIG_X86_KERNEL_IBT +#ifdef CONFIG_X86_CET DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_CP, exc_control_protection); #endif diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c index a58c6bc1cd68..5074b8420359 100644 --- a/arch/x86/kernel/idt.c +++ b/arch/x86/kernel/idt.c @@ -107,7 +107,7 @@ static const __initconst struct idt_data def_idts[] = { ISTG(X86_TRAP_MC, asm_exc_machine_check, IST_INDEX_MCE), #endif -#ifdef CONFIG_X86_KERNEL_IBT +#ifdef CONFIG_X86_CET INTG(X86_TRAP_CP, asm_exc_control_protection), #endif diff --git a/arch/x86/kernel/signal_compat.c b/arch/x86/kernel/signal_compat.c index 879ef8c72f5c..d441804443d5 100644 --- a/arch/x86/kernel/signal_compat.c +++ b/arch/x86/kernel/signal_compat.c @@ -27,7 +27,7 @@ static inline void signal_compat_build_tests(void) */ BUILD_BUG_ON(NSIGILL != 11); BUILD_BUG_ON(NSIGFPE != 15); - BUILD_BUG_ON(NSIGSEGV != 9); + BUILD_BUG_ON(NSIGSEGV != 10); BUILD_BUG_ON(NSIGBUS != 5); BUILD_BUG_ON(NSIGTRAP != 6); BUILD_BUG_ON(NSIGCHLD != 6); diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index 178015a820f0..1ba42c6118ce 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -212,12 +212,7 @@ DEFINE_IDTENTRY(exc_overflow) do_error_trap(regs, 0, "overflow", X86_TRAP_OF, SIGSEGV, 0, NULL); } -#ifdef CONFIG_X86_KERNEL_IBT - -static __ro_after_init bool ibt_fatal = true; - -extern void ibt_selftest_ip(void); /* code label defined in asm below */ - +#ifdef CONFIG_X86_CET enum cp_error_code { CP_EC = (1 << 15) - 1, @@ -230,15 +225,87 @@ enum cp_error_code { CP_ENCL = 1 << 15, }; -DEFINE_IDTENTRY_ERRORCODE(exc_control_protection) +static const char control_protection_err[][10] = { + [0] = "unknown", + [1] = "near ret", + [2] = "far/iret", + [3] = "endbranch", + [4] = "rstorssp", + [5] = "setssbsy", +}; + +static const char *cp_err_string(unsigned long error_code) +{ + unsigned int cpec = error_code & CP_EC; + + if (cpec >= ARRAY_SIZE(control_protection_err)) + cpec = 0; + return control_protection_err[cpec]; +} + +static void do_unexpected_cp(struct pt_regs *regs, unsigned long error_code) +{ + WARN_ONCE(1, "Unexpected %s #CP, error_code: %s\n", + user_mode(regs) ? "user mode" : "kernel mode", + cp_err_string(error_code)); +} +#endif /* CONFIG_X86_CET */ + +void do_user_cp_fault(struct pt_regs *regs, unsigned long error_code); + +#ifdef CONFIG_X86_USER_SHADOW_STACK +static DEFINE_RATELIMIT_STATE(cpf_rate, DEFAULT_RATELIMIT_INTERVAL, + DEFAULT_RATELIMIT_BURST); + +void do_user_cp_fault(struct pt_regs *regs, unsigned long error_code) { - if (!cpu_feature_enabled(X86_FEATURE_IBT)) { - pr_err("Unexpected #CP\n"); - BUG(); + struct task_struct *tsk; + unsigned long ssp; + + /* + * An exception was just taken from userspace. Since interrupts are disabled + * here, no scheduling should have messed with the registers yet and they + * will be whatever is live in userspace. So read the SSP before enabling + * interrupts so locking the fpregs to do it later is not required. + */ + rdmsrl(MSR_IA32_PL3_SSP, ssp); + + cond_local_irq_enable(regs); + + tsk = current; + tsk->thread.error_code = error_code; + tsk->thread.trap_nr = X86_TRAP_CP; + + /* Ratelimit to prevent log spamming. */ + if (show_unhandled_signals && unhandled_signal(tsk, SIGSEGV) && + __ratelimit(&cpf_rate)) { + pr_emerg("%s[%d] control protection ip:%lx sp:%lx ssp:%lx error:%lx(%s)%s", + tsk->comm, task_pid_nr(tsk), + regs->ip, regs->sp, ssp, error_code, + cp_err_string(error_code), + error_code & CP_ENCL ? " in enclave" : ""); + print_vma_addr(KERN_CONT " in ", regs->ip); + pr_cont("\n"); } - if (WARN_ON_ONCE(user_mode(regs) || (error_code & CP_EC) != CP_ENDBR)) + force_sig_fault(SIGSEGV, SEGV_CPERR, (void __user *)0); + cond_local_irq_disable(regs); +} +#endif + +void do_kernel_cp_fault(struct pt_regs *regs, unsigned long error_code); + +#ifdef CONFIG_X86_KERNEL_IBT +static __ro_after_init bool ibt_fatal = true; + +extern void ibt_selftest_ip(void); /* code label defined in asm below */ + +void do_kernel_cp_fault(struct pt_regs *regs, unsigned long error_code) +{ + if ((error_code & CP_EC) != CP_ENDBR) { + do_unexpected_cp(regs, error_code); return; + } if (unlikely(regs->ip == (unsigned long)&ibt_selftest_ip)) { regs->ax = 0; @@ -284,9 +351,25 @@ static int __init ibt_setup(char *str) } __setup("ibt=", ibt_setup); - #endif /* CONFIG_X86_KERNEL_IBT */ +#ifdef CONFIG_X86_CET +DEFINE_IDTENTRY_ERRORCODE(exc_control_protection) +{ + if (user_mode(regs)) { + if (cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) + do_user_cp_fault(regs, error_code); + else + do_unexpected_cp(regs, error_code); + } else { + if (cpu_feature_enabled(X86_FEATURE_IBT)) + do_kernel_cp_fault(regs, error_code); + else + do_unexpected_cp(regs, error_code); + } +} +#endif /* CONFIG_X86_CET */ + #ifdef CONFIG_X86_F00F_BUG void handle_invalid_op(struct pt_regs *regs) #else diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c index f82857e48815..cf4ee15e956e 100644 --- a/arch/x86/xen/enlighten_pv.c +++ b/arch/x86/xen/enlighten_pv.c @@ -638,7 +638,7 @@ static struct trap_array_entry trap_array[] = { TRAP_ENTRY(exc_coprocessor_error, false ), TRAP_ENTRY(exc_alignment_check, false ), TRAP_ENTRY(exc_simd_coprocessor_error, false ), -#ifdef CONFIG_X86_KERNEL_IBT +#ifdef CONFIG_X86_CET TRAP_ENTRY(exc_control_protection, false ), #endif }; diff --git a/arch/x86/xen/xen-asm.S b/arch/x86/xen/xen-asm.S index 6b4fdf6b9542..32f1b05b7a3c 100644 --- a/arch/x86/xen/xen-asm.S +++ b/arch/x86/xen/xen-asm.S @@ -148,7 +148,7 @@ xen_pv_trap asm_exc_page_fault xen_pv_trap asm_exc_spurious_interrupt_bug xen_pv_trap asm_exc_coprocessor_error xen_pv_trap asm_exc_alignment_check -#ifdef CONFIG_X86_KERNEL_IBT +#ifdef CONFIG_X86_CET xen_pv_trap asm_exc_control_protection #endif #ifdef CONFIG_X86_MCE diff --git a/include/uapi/asm-generic/siginfo.h b/include/uapi/asm-generic/siginfo.h index ffbe4cec9f32..0f52d0ac47c5 100644 --- a/include/uapi/asm-generic/siginfo.h +++ b/include/uapi/asm-generic/siginfo.h @@ -242,7 +242,8 @@ typedef struct siginfo { #define SEGV_ADIPERR 7 /* Precise MCD exception */ #define SEGV_MTEAERR 8 /* Asynchronous ARM MTE error */ #define SEGV_MTESERR 9 /* Synchronous ARM MTE exception */ -#define NSIGSEGV 9 +#define SEGV_CPERR 10 /* Control protection fault */ +#define NSIGSEGV 10 /* * SIGBUS si_codes From patchwork Fri Nov 4 22:35:35 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 15823 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:6687:0:0:0:0:0 with SMTP id l7csp678527wru; Fri, 4 Nov 2022 15:41:20 -0700 (PDT) X-Google-Smtp-Source: AMsMyM4t2ob2NJXCylA8q3oteafpoqVhJtXJ66LobHtfc0J01Blhd+7vW/71sqaXBuZu6lyqADFO X-Received: by 2002:a63:2c90:0:b0:439:ee2c:ab2f with SMTP id s138-20020a632c90000000b00439ee2cab2fmr33752814pgs.2.1667601679798; Fri, 04 Nov 2022 15:41:19 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1667601679; cv=none; d=google.com; s=arc-20160816; b=NMSo840MSOg1c/dtROvhZTz/8+XTPoa2qWSuytNADAwBLFXC+byo4fnM5mo8DO4fO0 j96U62cUbkDqc/aFPGML6daTL4lUmYF2e2q0nSw0L28zCGLklDCCdVIgIPke37gIUuFt W9S5l2pSofWkObj8c7X57UDUYV7gW7HFLtZIxxSrbc6ORrXHbE5+z158UWMJdwpaUf6R eIzQbI+kYQAeLrp9QzMYtXm1NcW07F7Mdk/spfATZSjqF8tKN+Yb9WcZ/W8S7U2KzYBL k7LjyHQWncK/BXwM4bnbaTQfa1sL+tbcmp0b3xjbeTaookAAi3X2ypH8DU+PFIyCzKzm SHYQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=XiM16Dm0rAaZdxXbCBWDj+qQN92LNq0rwVghQU9otjk=; b=xX1FbvhXkb6IrZDGD+QKY/F5MY8m1rGOnBn/Y/B2RYgjkxHr/moY1z3vMb3iLOP/7q w52Trv/rSacXsIF/U1BzafF7e8c9vbWrrRVbIL1OSDJw7jS5lM8saQaxLixljlyJNbVM /Z4F/5K+4LE7dUgAH7SyalwR8AJT/C+avCqTjqnKcIye2V1fEvNX6KXvOqniZ8ZiEiz3 RXdbKQfVxFxor3yS3Lt4H0NHqdMFtl8FwddTPM7uu4ylHL8LCGXw72ay/qTEi225ICz+ ZgxEGoebFBBaS/Ic4qrjTZX+YbYy3UYGQsfSv3VufuIdwMU3WVz1cmMt95LLAKcvjNsB gSEg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=delmJX49; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id ng17-20020a17090b1a9100b00213f1db312esi950313pjb.111.2022.11.04.15.41.05; Fri, 04 Nov 2022 15:41:19 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=delmJX49; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229781AbiKDWkY (ORCPT + 99 others); Fri, 4 Nov 2022 18:40:24 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47214 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229496AbiKDWj6 (ORCPT ); Fri, 4 Nov 2022 18:39:58 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DABAA4044F; Fri, 4 Nov 2022 15:39:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1667601570; x=1699137570; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=s4pESg4b+rdmfkzmthwTSS4lIUw19qtNeWA9uReG7HU=; b=delmJX49R61nYkAZk3MvRj3Jq9D/QyZ4NQF4GBg1+EalmV48rPnwzb8r PJdHnGdL+jhFfIV31xso1GnZRAOSWj4bpo3bi5FCtdN7Cwu904ypv00G3 0Z+204qWDF7JFk6XEm1HW72bbyhZ/dtX3j8yyvBk2Nwf4yUpMrstA7oZS ra4EJuqU6QNAutV9EtD4CMttoU0nTxTs7SyLhGL6V6Uafq+OE4ToypEpj /iOpVMsyZIK95MFpo1J/gPhZs9x4MZP3ZuySzmi/e1kb5ClvSqPN6tMbv dGMXgXWVwMZNspN2EpbXZhZVugmnOVZIksBmVhyWFH8bDcq8saa96Sx0G Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="311840506" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="311840506" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:29 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="668514010" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="668514010" Received: from adhjerms-mobl1.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.212.227.68]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:28 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu , Christoph Hellwig Subject: [PATCH v3 08/37] x86/mm: Remove _PAGE_DIRTY from kernel RO pages Date: Fri, 4 Nov 2022 15:35:35 -0700 Message-Id: <20221104223604.29615-9-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221104223604.29615-1-rick.p.edgecombe@intel.com> References: <20221104223604.29615-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-5.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1748607098991614524?= X-GMAIL-MSGID: =?utf-8?q?1748607098991614524?= From: Yu-cheng Yu New processors that support Shadow Stack regard Write=0,Dirty=1 PTEs as shadow stack pages. In normal cases, it can be helpful to create Write=1 PTEs as also Dirty=1 if HW dirty tracking is not needed, because if the Dirty bit is not already set the CPU has to set Dirty=1 when it the memory gets written to. This creates addiontal work for the CPU. So tradional wisdom was to simply set the Dirty bit whenever you didn't care about it. However, it was never really very helpful for read only kernel memory. When CR4.CET=1 and IA32_S_CET.SH_STK_EN=1, some instructions can write to such supervisor memory. The kernel does not set IA32_S_CET.SH_STK_EN, so avoiding kernel Write=0,Dirty=1 memory is not strictly needed for any functional reason. But having Write=0,Dirty=1 kernel memory doesn't have any functional benefit either, so to reduce ambiguity between shadow stack and regular Write=0 pages, removed Dirty=1 from any kernel Write=0 PTEs. Tested-by: Pengfei Xu Tested-by: John Allen Signed-off-by: Yu-cheng Yu Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Cc: "H. Peter Anvin" Cc: Kees Cook Cc: Thomas Gleixner Cc: Dave Hansen Cc: Christoph Hellwig Cc: Andy Lutomirski Cc: Ingo Molnar Cc: Borislav Petkov Cc: Peter Zijlstra --- v3: - Update commit log (Andrew Cooper, Peterz) v2: - Normalize PTE bit descriptions between patches arch/x86/include/asm/pgtable_types.h | 6 +++--- arch/x86/mm/pat/set_memory.c | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h index aa174fed3a71..ff82237e7b6b 100644 --- a/arch/x86/include/asm/pgtable_types.h +++ b/arch/x86/include/asm/pgtable_types.h @@ -192,10 +192,10 @@ enum page_cache_mode { #define _KERNPG_TABLE (__PP|__RW| 0|___A| 0|___D| 0| 0| _ENC) #define _PAGE_TABLE_NOENC (__PP|__RW|_USR|___A| 0|___D| 0| 0) #define _PAGE_TABLE (__PP|__RW|_USR|___A| 0|___D| 0| 0| _ENC) -#define __PAGE_KERNEL_RO (__PP| 0| 0|___A|__NX|___D| 0|___G) -#define __PAGE_KERNEL_ROX (__PP| 0| 0|___A| 0|___D| 0|___G) +#define __PAGE_KERNEL_RO (__PP| 0| 0|___A|__NX| 0| 0|___G) +#define __PAGE_KERNEL_ROX (__PP| 0| 0|___A| 0| 0| 0|___G) #define __PAGE_KERNEL_NOCACHE (__PP|__RW| 0|___A|__NX|___D| 0|___G| __NC) -#define __PAGE_KERNEL_VVAR (__PP| 0|_USR|___A|__NX|___D| 0|___G) +#define __PAGE_KERNEL_VVAR (__PP| 0|_USR|___A|__NX| 0| 0|___G) #define __PAGE_KERNEL_LARGE (__PP|__RW| 0|___A|__NX|___D|_PSE|___G) #define __PAGE_KERNEL_LARGE_EXEC (__PP|__RW| 0|___A| 0|___D|_PSE|___G) #define __PAGE_KERNEL_WP (__PP|__RW| 0|___A|__NX|___D| 0|___G| __WP) diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c index 2e5a045731de..af2267a9cdab 100644 --- a/arch/x86/mm/pat/set_memory.c +++ b/arch/x86/mm/pat/set_memory.c @@ -2026,7 +2026,7 @@ int set_memory_nx(unsigned long addr, int numpages) int set_memory_ro(unsigned long addr, int numpages) { - return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_RW), 0); + return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_RW | _PAGE_DIRTY), 0); } int set_memory_rw(unsigned long addr, int numpages) From patchwork Fri Nov 4 22:35:36 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 15822 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:6687:0:0:0:0:0 with SMTP id l7csp678483wru; Fri, 4 Nov 2022 15:41:13 -0700 (PDT) X-Google-Smtp-Source: AMsMyM5P3h3tbQU8rUituufienDC2MUR/XN0KotMFIrrmDQzd95mAu2lH3baIhELTB6XQqSDGBfB X-Received: by 2002:a63:5c21:0:b0:46f:cbcb:7617 with SMTP id q33-20020a635c21000000b0046fcbcb7617mr23422545pgb.199.1667601673173; Fri, 04 Nov 2022 15:41:13 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1667601673; cv=none; d=google.com; s=arc-20160816; b=q7IdwQ30rHnwfd1fLKWRurf2UdHVgRMOwnzDEV0pTLOUdhH24DMqOpKDKWv7oxgqRs VGxvel7Zzy2TqTs8iK3D6FDRRLZ1MNMN5/w/WpXh7qbDNVfT9eS5CcXfgsaeIyOe9wsb rNOxsoo8vvog6ASenrrD93kf5if+5ntuKWZBiZw1sRIYURsFapEP+ZUaWmneZvtfHhty xxS6GRFWupMpXb5Kt0fxt3/yJP/ENidh+YiccMA/ADYlCxbofSsq1Nrx+M345EuuB646 yXxHcoLNcktm4xeJDHrOZBn/fsIQb/q53vXyW7g0R6MgEIBGMbSP1RST9s1V7qV4Yaii 6S5w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=u+P+qa8kKTWg8AnGQCR7FIL5mFISVmI7FR/YQkztV68=; b=zpw1OHGcZQhS/HJIBs/aq0LhGbIqlc1dNM/NIGvLLGgeidSK4ELebAR6hnlJZScxuK jLbuIzCtNrRGTJ/BDSdei7lKvVZQd6D86+75L3QXyFUJ4nW5CEDLwRfOLJiQwuaGsBSp RxNS9r60S4T0g37WolT8rCgC6O0r1rWjl5aNHnLkfPWfYeUZ+xTTtH1WBzqsUr7K9JF/ YRxf19YAaOLb0uA7kOyNIviD1PyFVfd52RW5p0ijlKXTtpyXdxjzRMFEqMN/cBfCjQhz XxutSDXmJk8LdEPMIcPLN+36JeJEp4HLZS9WRixdHXry04SnnJ/AYy9K9qboEm9oUFNm lUsA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=UtajLHmc; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id g15-20020a63f40f000000b0046f41385afdsi929478pgi.118.2022.11.04.15.40.58; Fri, 04 Nov 2022 15:41:13 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=UtajLHmc; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229703AbiKDWkV (ORCPT + 99 others); Fri, 4 Nov 2022 18:40:21 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46452 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229492AbiKDWj6 (ORCPT ); Fri, 4 Nov 2022 18:39:58 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1646E41985; Fri, 4 Nov 2022 15:39:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1667601571; x=1699137571; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=poplwKY6u8XD6c+b4FqOVfxOhvQPYQQiQgDKo7YigJ8=; b=UtajLHmcYYmrIISyZVlFrFI6GpzOeK5R4Y5PuWZKMT74+P1uxt7QoewO qc7KZl4z0XWZhySG58qrnB66JLMUPOORFOaQhPZpyWAqP0q9QgOf6Uf56 NiXW8wSQK5Yynwdq73qBN97aIeYxAQZYgfliRmvy+vJuW20Zr6IpS33fV Whm7tpD6n3n2gDuxU5clwwFlMQrYsuZu/fwBfS8SpYqLk8/1Gwf0Fb1Oj ex+ExoDj7ZYAdRCb+wrA1kI6U469ZIa9e+3+0YF7sGwtZAKTbRYcDcKqz 7WUG9e45XIf+1qQTIilkWkgZLjGFVhMC135CbGfb6fW3ilL0Ilfxf2ZjH A==; X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="311840510" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="311840510" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:30 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="668514015" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="668514015" Received: from adhjerms-mobl1.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.212.227.68]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:29 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v3 09/37] x86/mm: Move pmd_write(), pud_write() up in the file Date: Fri, 4 Nov 2022 15:35:36 -0700 Message-Id: <20221104223604.29615-10-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221104223604.29615-1-rick.p.edgecombe@intel.com> References: <20221104223604.29615-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-5.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1748607091774642535?= X-GMAIL-MSGID: =?utf-8?q?1748607091774642535?= From: Yu-cheng Yu To prepare the introduction of _PAGE_COW, move pmd_write() and pud_write() up in the file, so that they can be used by other helpers below. No functional changes. Tested-by: Pengfei Xu Tested-by: John Allen Reviewed-by: Kees Cook Signed-off-by: Yu-cheng Yu Reviewed-by: Kirill A. Shutemov Signed-off-by: Rick Edgecombe --- arch/x86/include/asm/pgtable.h | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 5059799bebe3..a1d6f121ee35 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -159,6 +159,18 @@ static inline int pte_write(pte_t pte) return pte_flags(pte) & _PAGE_RW; } +#define pmd_write pmd_write +static inline int pmd_write(pmd_t pmd) +{ + return pmd_flags(pmd) & _PAGE_RW; +} + +#define pud_write pud_write +static inline int pud_write(pud_t pud) +{ + return pud_flags(pud) & _PAGE_RW; +} + static inline int pte_huge(pte_t pte) { return pte_flags(pte) & _PAGE_PSE; @@ -1103,12 +1115,6 @@ extern int pmdp_clear_flush_young(struct vm_area_struct *vma, unsigned long address, pmd_t *pmdp); -#define pmd_write pmd_write -static inline int pmd_write(pmd_t pmd) -{ - return pmd_flags(pmd) & _PAGE_RW; -} - #define __HAVE_ARCH_PMDP_HUGE_GET_AND_CLEAR static inline pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm, unsigned long addr, pmd_t *pmdp) @@ -1138,12 +1144,6 @@ static inline void pmdp_set_wrprotect(struct mm_struct *mm, clear_bit(_PAGE_BIT_RW, (unsigned long *)pmdp); } -#define pud_write pud_write -static inline int pud_write(pud_t pud) -{ - return pud_flags(pud) & _PAGE_RW; -} - #ifndef pmdp_establish #define pmdp_establish pmdp_establish static inline pmd_t pmdp_establish(struct vm_area_struct *vma, From patchwork Fri Nov 4 22:35:37 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 15825 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:6687:0:0:0:0:0 with SMTP id l7csp678625wru; Fri, 4 Nov 2022 15:41:32 -0700 (PDT) X-Google-Smtp-Source: AMsMyM5iIf29jDC44jG8hZnhCBqZcu+25GgwJhcD0nk8HQJeM6Yu8vtIDL17qPgMQSFwVq1n/ZyU X-Received: by 2002:a17:90b:4ac5:b0:213:e936:a843 with SMTP id mh5-20020a17090b4ac500b00213e936a843mr28964987pjb.156.1667601692095; Fri, 04 Nov 2022 15:41:32 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1667601692; cv=none; d=google.com; s=arc-20160816; b=BL/22YUp/Crsy8rH/j2/7Mq9s9bSPTBuLS5k3v3mRXvGDi+zzgCy6LF+Wg13YeiqyW 2iI3PziUwoGORygt2WaN/ALJLDPQQfFiN/rjhuP8/2n0tj9gOBCDEi1vV0ip7168GHO/ uzmfoUaorkoQMseDcQ+taOE3iDgC77w7u5OC+KNhtCtcOI+a0lO+8Xf48WBbTe7QQRwA 19pjVDDy3vLMxbr3OiSzDUR/yUhM0seOSPxVagDUaVozT3KzBm38himow3l4dQ+6G99h N19xk8PN+IAobRbMiz3MrNgpIopVbVdOxJb9WDdm/PzpSaKTvuVA6vsxJ9IAcSyVAeyf +xFg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=oWlN+JVyb8nINAJq2r92WC0WHs4zT051oVtBwlEGdwg=; b=cTwtcyuAMguE2WYLacXPB4ZTe4JnYxzslYwGGaMR9nFuuiv9w6j1PMKYazoz7dBYwZ 5B0MSaBbaktBwLTPfW5VgsIMhEmFxaUh/FqPeIoutuKrDY49YEvgKyEij2HNekV1ZkyC 4XUIY73MCaIPcFfEG1qeEDeagNhXF4MVG0s5AuP2lSzZizaOf2TP7tYgdlj6eeUQ3IMW AjhBGQ75qzm+zxB3zou1xOu9Tx3C5brFRfblH5BxGoPbRvna32U9JN3+sIle+e8Jmqg/ y7u3NvqNuVK3f1R6qAD1ZjQgmIMJqpV3e2+35Oj+gZYHSMJ6SFE8EFHeU8zidA8GXUS3 DLhQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=HyqxOKbs; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id u5-20020a170902b28500b00176b8830921si798794plr.294.2022.11.04.15.41.19; Fri, 04 Nov 2022 15:41:32 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=HyqxOKbs; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230031AbiKDWks (ORCPT + 99 others); Fri, 4 Nov 2022 18:40:48 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47386 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229988AbiKDWkI (ORCPT ); Fri, 4 Nov 2022 18:40:08 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EC2DA419AB; Fri, 4 Nov 2022 15:39:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1667601572; x=1699137572; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=yxqLIMvVIC0HjvdAAoTTOb3e0gYLZmNcBxDEff9qZxI=; b=HyqxOKbsfPwZMovWHT1TLH/0vFBJ9T+YRQK4yWrPndLDCdX+dCpbQjtK UEWdkbaweJHyawfXAVgL5hnDyo/dNPue/D/Z2QXh5EYFad01mDJLRyEp1 oIwRGtUNPXPiFoTrOcuXy3iPZVaed3ykKsAQ+7SUXHPOh+yZO5YagWO0G 2LA0f3MppqqPDDyRLrTcMQamd6Obskgu7LjTn2tC/pGonCEPuZ+XrP8ow 4DKJ2dzuSkgBNPavTUxUD8932RJ9+iK+lH5zZ6BeC13kbLEo6YEhvWecp PdfoZALsYin5XlgODgM6eQaL7si1yfteCeIdJQf3xxyE5mI4prIdv0ckz Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="311840513" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="311840513" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:31 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="668514022" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="668514022" Received: from adhjerms-mobl1.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.212.227.68]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:30 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v3 10/37] x86/mm: Introduce _PAGE_COW Date: Fri, 4 Nov 2022 15:35:37 -0700 Message-Id: <20221104223604.29615-11-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221104223604.29615-1-rick.p.edgecombe@intel.com> References: <20221104223604.29615-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-5.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1748607111864263473?= X-GMAIL-MSGID: =?utf-8?q?1748607111864263473?= From: Yu-cheng Yu Some OSes have a greater dependence on software available bits in PTEs than Linux. That left the hardware architects looking for a way to represent a new memory type (shadow stack) within the existing bits. They chose to repurpose a lightly-used state: Write=0,Dirty=1. So in order to support shadow stack memory, Linux should avoid creating memory with this PTE bit combination unless it intends for it to be shadow stack. The reason it's lightly used is that Dirty=1 is normally set by HW _before_ a write. A write with a Write=0 PTE would typically only generate a fault, not set Dirty=1. Hardware can (rarely) both set Dirty=1 *and* generate the fault, resulting in a Write=0,Dirty=1 PTE. Hardware which supports shadow stacks will no longer exhibit this oddity. So that leaves Write=0,Dirty=1 PTEs created in software. To achieve this, in places where Linux normally creates Write=0,Dirty=1, it can use the software-defined _PAGE_COW in place of the hardware _PAGE_DIRTY. In other words, whenever Linux needs to create Write=0,Dirty=1, it instead creates Write=0,Cow=1 except for shadow stack, which is Write=0,Dirty=1. Further differentiated by VMA flags, these PTE bit combinations would be set as follows for various types of memory: (Write=0,Cow=1,Dirty=0): - A modified, copy-on-write (COW) page. Previously when a typical anonymous writable mapping was made COW via fork(), the kernel would mark it Write=0,Dirty=1. Now it will instead use the Cow bit. This happens in copy_present_pte(). - A R/O page that has been COW'ed. The user page is in a R/O VMA, and get_user_pages(FOLL_FORCE) needs a writable copy. The page fault handler creates a copy of the page and sets the new copy's PTE as Write=0 and Cow=1. - A shared shadow stack PTE. When a shadow stack page is being shared among processes (this happens at fork()), its PTE is made Dirty=0, so the next shadow stack access causes a fault, and the page is duplicated and Dirty=1 is set again. This is the COW equivalent for shadow stack pages, even though it's copy-on-access rather than copy-on-write. (Write=0,Cow=0,Dirty=1): - A shadow stack PTE. - A Cow PTE created when a processor without shadow stack support set Dirty=1. Define _PAGE_COW and update pte_*() helpers and apply the same changes to pmd and pud. There are six bits left available to software in the 64-bit PTE after consuming a bit for _PAGE_COW. No space is consumed in 32-bit kernels because shadow stacks are not enabled there. Tested-by: Pengfei Xu Tested-by: John Allen Signed-off-by: Yu-cheng Yu Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe --- v3: - Add comment around _PAGE_TABLE in response to comment from (Andrew Cooper) - Check for PSE in pmd_shstk (Andrew Cooper) - Get to the point quicker in commit log (Andrew Cooper) - Clarify and reorder commit log for why the PTE bit examples have multiple entries. Apply same changes for comment. (peterz) - Fix comment that implied dirty bit for COW was a specific x86 thing (peterz) - Fix swapping of Write/Dirty (PeterZ) v2: - Update commit log with comments (Dave Hansen) - Add comments in code to explain pte modification code better (Dave) - Clarify info on the meaning of various Write,Cow,Dirty combinations arch/x86/include/asm/pgtable.h | 211 ++++++++++++++++++++++++--- arch/x86/include/asm/pgtable_types.h | 59 +++++++- 2 files changed, 245 insertions(+), 25 deletions(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index a1d6f121ee35..c284bb6f62a5 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -124,9 +124,17 @@ extern pmdval_t early_pmd_flags; * The following only work if pte_present() is true. * Undefined behaviour if not.. */ -static inline int pte_dirty(pte_t pte) +static inline bool pte_dirty(pte_t pte) { - return pte_flags(pte) & _PAGE_DIRTY; + return pte_flags(pte) & _PAGE_DIRTY_BITS; +} + +static inline bool pte_shstk(pte_t pte) +{ + if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) + return false; + + return (pte_flags(pte) & (_PAGE_RW | _PAGE_DIRTY)) == _PAGE_DIRTY; } static inline int pte_young(pte_t pte) @@ -134,9 +142,18 @@ static inline int pte_young(pte_t pte) return pte_flags(pte) & _PAGE_ACCESSED; } -static inline int pmd_dirty(pmd_t pmd) +static inline bool pmd_dirty(pmd_t pmd) { - return pmd_flags(pmd) & _PAGE_DIRTY; + return pmd_flags(pmd) & _PAGE_DIRTY_BITS; +} + +static inline bool pmd_shstk(pmd_t pmd) +{ + if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) + return false; + + return (pmd_flags(pmd) & (_PAGE_RW | _PAGE_DIRTY | _PAGE_PSE)) == + (_PAGE_DIRTY | _PAGE_PSE); } static inline int pmd_young(pmd_t pmd) @@ -144,9 +161,9 @@ static inline int pmd_young(pmd_t pmd) return pmd_flags(pmd) & _PAGE_ACCESSED; } -static inline int pud_dirty(pud_t pud) +static inline bool pud_dirty(pud_t pud) { - return pud_flags(pud) & _PAGE_DIRTY; + return pud_flags(pud) & _PAGE_DIRTY_BITS; } static inline int pud_young(pud_t pud) @@ -156,13 +173,21 @@ static inline int pud_young(pud_t pud) static inline int pte_write(pte_t pte) { - return pte_flags(pte) & _PAGE_RW; + /* + * Shadow stack pages are logically writable, but do not have + * _PAGE_RW. Check for them separately from _PAGE_RW itself. + */ + return (pte_flags(pte) & _PAGE_RW) || pte_shstk(pte); } #define pmd_write pmd_write static inline int pmd_write(pmd_t pmd) { - return pmd_flags(pmd) & _PAGE_RW; + /* + * Shadow stack pages are logically writable, but do not have + * _PAGE_RW. Check for them separately from _PAGE_RW itself. + */ + return (pmd_flags(pmd) & _PAGE_RW) || pmd_shstk(pmd); } #define pud_write pud_write @@ -300,6 +325,44 @@ static inline pte_t pte_clear_flags(pte_t pte, pteval_t clear) return native_make_pte(v & ~clear); } +/* + * Normally COW memory can result in Dirty=1,Write=0 PTs. But in the case + * of X86_FEATURE_USER_SHSTK, the software COW bit is used, since the + * Dirty=1,Write=0 will result in the memory being treated as shaodw stack + * by the HW. So when creating COW memory, a software bit is used + * _PAGE_BIT_COW. The following functions pte_mkcow() and pte_clear_cow() + * take a PTE marked conventially COW (Dirty=1) and transition it to the + * shadow stack compatible version of COW (Cow=1). + */ + +static inline pte_t pte_mkcow(pte_t pte) +{ + if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) + return pte; + + pte = pte_clear_flags(pte, _PAGE_DIRTY); + return pte_set_flags(pte, _PAGE_COW); +} + +static inline pte_t pte_clear_cow(pte_t pte) +{ + /* + * _PAGE_COW is unnecessary on !X86_FEATURE_USER_SHSTK kernels. + * See the _PAGE_COW definition for more details. + */ + if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) + return pte; + + /* + * PTE is getting copied-on-write, so it will be dirtied + * if writable, or made shadow stack if shadow stack and + * being copied on access. Set they dirty bit for both + * cases. + */ + pte = pte_set_flags(pte, _PAGE_DIRTY); + return pte_clear_flags(pte, _PAGE_COW); +} + #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP static inline int pte_uffd_wp(pte_t pte) { @@ -319,7 +382,7 @@ static inline pte_t pte_clear_uffd_wp(pte_t pte) static inline pte_t pte_mkclean(pte_t pte) { - return pte_clear_flags(pte, _PAGE_DIRTY); + return pte_clear_flags(pte, _PAGE_DIRTY_BITS); } static inline pte_t pte_mkold(pte_t pte) @@ -329,7 +392,16 @@ static inline pte_t pte_mkold(pte_t pte) static inline pte_t pte_wrprotect(pte_t pte) { - return pte_clear_flags(pte, _PAGE_RW); + pte = pte_clear_flags(pte, _PAGE_RW); + + /* + * Blindly clearing _PAGE_RW might accidentally create + * a shadow stack PTE (Write=0,Dirty=1). Move the hardware + * dirty value to the software bit. + */ + if (pte_dirty(pte)) + pte = pte_mkcow(pte); + return pte; } static inline pte_t pte_mkexec(pte_t pte) @@ -339,7 +411,19 @@ static inline pte_t pte_mkexec(pte_t pte) static inline pte_t pte_mkdirty(pte_t pte) { - return pte_set_flags(pte, _PAGE_DIRTY | _PAGE_SOFT_DIRTY); + pteval_t dirty = _PAGE_DIRTY; + + /* Avoid creating Dirty=1,Write=0 PTEs */ + if (cpu_feature_enabled(X86_FEATURE_USER_SHSTK) && !pte_write(pte)) + dirty = _PAGE_COW; + + return pte_set_flags(pte, dirty | _PAGE_SOFT_DIRTY); +} + +static inline pte_t pte_mkwrite_shstk(pte_t pte) +{ + /* pte_clear_cow() also sets Dirty=1 */ + return pte_clear_cow(pte); } static inline pte_t pte_mkyoung(pte_t pte) @@ -349,7 +433,12 @@ static inline pte_t pte_mkyoung(pte_t pte) static inline pte_t pte_mkwrite(pte_t pte) { - return pte_set_flags(pte, _PAGE_RW); + pte = pte_set_flags(pte, _PAGE_RW); + + if (pte_dirty(pte)) + pte = pte_clear_cow(pte); + + return pte; } static inline pte_t pte_mkhuge(pte_t pte) @@ -396,6 +485,26 @@ static inline pmd_t pmd_clear_flags(pmd_t pmd, pmdval_t clear) return native_make_pmd(v & ~clear); } +/* See comments above pte_mkcow() */ +static inline pmd_t pmd_mkcow(pmd_t pmd) +{ + if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) + return pmd; + + pmd = pmd_clear_flags(pmd, _PAGE_DIRTY); + return pmd_set_flags(pmd, _PAGE_COW); +} + +/* See comments above pte_mkcow() */ +static inline pmd_t pmd_clear_cow(pmd_t pmd) +{ + if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) + return pmd; + + pmd = pmd_set_flags(pmd, _PAGE_DIRTY); + return pmd_clear_flags(pmd, _PAGE_COW); +} + #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP static inline int pmd_uffd_wp(pmd_t pmd) { @@ -420,17 +529,36 @@ static inline pmd_t pmd_mkold(pmd_t pmd) static inline pmd_t pmd_mkclean(pmd_t pmd) { - return pmd_clear_flags(pmd, _PAGE_DIRTY); + return pmd_clear_flags(pmd, _PAGE_DIRTY_BITS); } static inline pmd_t pmd_wrprotect(pmd_t pmd) { - return pmd_clear_flags(pmd, _PAGE_RW); + pmd = pmd_clear_flags(pmd, _PAGE_RW); + /* + * Blindly clearing _PAGE_RW might accidentally create + * a shadow stack PMD (RW=0, Dirty=1). Move the hardware + * dirty value to the software bit. + */ + if (pmd_dirty(pmd)) + pmd = pmd_mkcow(pmd); + return pmd; } static inline pmd_t pmd_mkdirty(pmd_t pmd) { - return pmd_set_flags(pmd, _PAGE_DIRTY | _PAGE_SOFT_DIRTY); + pmdval_t dirty = _PAGE_DIRTY; + + /* Avoid creating (HW)Dirty=1, Write=0 PMDs */ + if (cpu_feature_enabled(X86_FEATURE_USER_SHSTK) && !pmd_write(pmd)) + dirty = _PAGE_COW; + + return pmd_set_flags(pmd, dirty | _PAGE_SOFT_DIRTY); +} + +static inline pmd_t pmd_mkwrite_shstk(pmd_t pmd) +{ + return pmd_clear_cow(pmd); } static inline pmd_t pmd_mkdevmap(pmd_t pmd) @@ -450,7 +578,11 @@ static inline pmd_t pmd_mkyoung(pmd_t pmd) static inline pmd_t pmd_mkwrite(pmd_t pmd) { - return pmd_set_flags(pmd, _PAGE_RW); + pmd = pmd_set_flags(pmd, _PAGE_RW); + + if (pmd_dirty(pmd)) + pmd = pmd_clear_cow(pmd); + return pmd; } static inline pud_t pud_set_flags(pud_t pud, pudval_t set) @@ -467,6 +599,26 @@ static inline pud_t pud_clear_flags(pud_t pud, pudval_t clear) return native_make_pud(v & ~clear); } +/* See comments above pte_mkcow() */ +static inline pud_t pud_mkcow(pud_t pud) +{ + if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) + return pud; + + pud = pud_clear_flags(pud, _PAGE_DIRTY); + return pud_set_flags(pud, _PAGE_COW); +} + +/* See comments above pte_mkcow() */ +static inline pud_t pud_clear_cow(pud_t pud) +{ + if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) + return pud; + + pud = pud_set_flags(pud, _PAGE_DIRTY); + return pud_clear_flags(pud, _PAGE_COW); +} + static inline pud_t pud_mkold(pud_t pud) { return pud_clear_flags(pud, _PAGE_ACCESSED); @@ -474,17 +626,32 @@ static inline pud_t pud_mkold(pud_t pud) static inline pud_t pud_mkclean(pud_t pud) { - return pud_clear_flags(pud, _PAGE_DIRTY); + return pud_clear_flags(pud, _PAGE_DIRTY_BITS); } static inline pud_t pud_wrprotect(pud_t pud) { - return pud_clear_flags(pud, _PAGE_RW); + pud = pud_clear_flags(pud, _PAGE_RW); + + /* + * Blindly clearing _PAGE_RW might accidentally create + * a shadow stack PUD (RW=0, Dirty=1). Move the hardware + * dirty value to the software bit. + */ + if (pud_dirty(pud)) + pud = pud_mkcow(pud); + return pud; } static inline pud_t pud_mkdirty(pud_t pud) { - return pud_set_flags(pud, _PAGE_DIRTY | _PAGE_SOFT_DIRTY); + pudval_t dirty = _PAGE_DIRTY; + + /* Avoid creating (HW)Dirty=1, Write=0 PUDs */ + if (cpu_feature_enabled(X86_FEATURE_USER_SHSTK) && !pud_write(pud)) + dirty = _PAGE_COW; + + return pud_set_flags(pud, dirty | _PAGE_SOFT_DIRTY); } static inline pud_t pud_mkdevmap(pud_t pud) @@ -504,7 +671,11 @@ static inline pud_t pud_mkyoung(pud_t pud) static inline pud_t pud_mkwrite(pud_t pud) { - return pud_set_flags(pud, _PAGE_RW); + pud = pud_set_flags(pud, _PAGE_RW); + + if (pud_dirty(pud)) + pud = pud_clear_cow(pud); + return pud; } #ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h index ff82237e7b6b..2bb49e62398d 100644 --- a/arch/x86/include/asm/pgtable_types.h +++ b/arch/x86/include/asm/pgtable_types.h @@ -21,7 +21,8 @@ #define _PAGE_BIT_SOFTW2 10 /* " */ #define _PAGE_BIT_SOFTW3 11 /* " */ #define _PAGE_BIT_PAT_LARGE 12 /* On 2MB or 1GB pages */ -#define _PAGE_BIT_SOFTW4 58 /* available for programmer */ +#define _PAGE_BIT_SOFTW4 57 /* available for programmer */ +#define _PAGE_BIT_SOFTW5 58 /* available for programmer */ #define _PAGE_BIT_PKEY_BIT0 59 /* Protection Keys, bit 1/4 */ #define _PAGE_BIT_PKEY_BIT1 60 /* Protection Keys, bit 2/4 */ #define _PAGE_BIT_PKEY_BIT2 61 /* Protection Keys, bit 3/4 */ @@ -34,6 +35,15 @@ #define _PAGE_BIT_SOFT_DIRTY _PAGE_BIT_SOFTW3 /* software dirty tracking */ #define _PAGE_BIT_DEVMAP _PAGE_BIT_SOFTW4 +/* + * Indicates a copy-on-write page. + */ +#ifdef CONFIG_X86_USER_SHADOW_STACK +#define _PAGE_BIT_COW _PAGE_BIT_SOFTW5 /* copy-on-write */ +#else +#define _PAGE_BIT_COW 0 +#endif + /* If _PAGE_BIT_PRESENT is clear, we use these: */ /* - if the user mapped it with PROT_NONE; pte_present gives true */ #define _PAGE_BIT_PROTNONE _PAGE_BIT_GLOBAL @@ -117,6 +127,40 @@ #define _PAGE_SOFTW4 (_AT(pteval_t, 0)) #endif +/* + * The hardware requires shadow stack to be read-only and Dirty. + * _PAGE_COW is a software-only bit used to separate copy-on-write PTEs + * from shadow stack PTEs: + * + * (Write=0,Cow=1,Dirty=0): + * - A modified, copy-on-write (COW) page. Previously when a typical + * anonymous writable mapping was made COW via fork(), the kernel would + * mark it Write=0,Dirty=1. Now it will instead use the Cow bit. This + * happens in copy_present_pte(). + * - A R/O page that has been COW'ed. The user page is in a R/O VMA, + * and get_user_pages(FOLL_FORCE) needs a writable copy. The page fault + * handler creates a copy of the page and sets the new copy's PTE as + * Write=0 and Cow=1. + * - A shared shadow stack PTE. When a shadow stack page is being shared + * among processes (this happens at fork()), its PTE is made Dirty=0, so + * the next shadow stack access causes a fault, and the page is + * duplicated and Dirty=1 is set again. This is the COW equivalent for + * shadow stack pages, even though it's copy-on-access rather than + * copy-on-write. + * + * (Write=0,Cow=0,Dirty=1): + * - A shadow stack PTE. + * - A Cow PTE created when a processor without shadow stack support set + * Dirty=1. + */ +#ifdef CONFIG_X86_USER_SHADOW_STACK +#define _PAGE_COW (_AT(pteval_t, 1) << _PAGE_BIT_COW) +#else +#define _PAGE_COW (_AT(pteval_t, 0)) +#endif + +#define _PAGE_DIRTY_BITS (_PAGE_DIRTY | _PAGE_COW) + #define _PAGE_PROTNONE (_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE) /* @@ -186,12 +230,17 @@ enum page_cache_mode { #define PAGE_READONLY __pg(__PP| 0|_USR|___A|__NX| 0| 0| 0) #define PAGE_READONLY_EXEC __pg(__PP| 0|_USR|___A| 0| 0| 0| 0) -#define __PAGE_KERNEL (__PP|__RW| 0|___A|__NX|___D| 0|___G) -#define __PAGE_KERNEL_EXEC (__PP|__RW| 0|___A| 0|___D| 0|___G) -#define _KERNPG_TABLE_NOENC (__PP|__RW| 0|___A| 0|___D| 0| 0) -#define _KERNPG_TABLE (__PP|__RW| 0|___A| 0|___D| 0| 0| _ENC) +/* + * Page tables needs to have Write=1 in order for any lower PTEs to be + * writable. This includes shadow stack memory (Write=0, Dirty=1) + */ #define _PAGE_TABLE_NOENC (__PP|__RW|_USR|___A| 0|___D| 0| 0) #define _PAGE_TABLE (__PP|__RW|_USR|___A| 0|___D| 0| 0| _ENC) +#define _KERNPG_TABLE_NOENC (__PP|__RW| 0|___A| 0|___D| 0| 0) +#define _KERNPG_TABLE (__PP|__RW| 0|___A| 0|___D| 0| 0| _ENC) + +#define __PAGE_KERNEL (__PP|__RW| 0|___A|__NX|___D| 0|___G) +#define __PAGE_KERNEL_EXEC (__PP|__RW| 0|___A| 0|___D| 0|___G) #define __PAGE_KERNEL_RO (__PP| 0| 0|___A|__NX| 0| 0|___G) #define __PAGE_KERNEL_ROX (__PP| 0| 0|___A| 0| 0| 0|___G) #define __PAGE_KERNEL_NOCACHE (__PP|__RW| 0|___A|__NX|___D| 0|___G| __NC) From patchwork Fri Nov 4 22:35:38 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 15826 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:6687:0:0:0:0:0 with SMTP id l7csp678635wru; Fri, 4 Nov 2022 15:41:33 -0700 (PDT) X-Google-Smtp-Source: AMsMyM4zPohOW3+PHecbcL4m2TgCy59XQad64hYR/bozBGcQOBhqHQVKxdLBJ+pP+ZiUiNvGbX9i X-Received: by 2002:a05:6a00:1149:b0:53e:62c8:10bc with SMTP id b9-20020a056a00114900b0053e62c810bcmr37303660pfm.49.1667601693250; Fri, 04 Nov 2022 15:41:33 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1667601693; cv=none; d=google.com; s=arc-20160816; b=QqvxgRVhSICzGl8glBKDuf1z/up/YAo9oUZgwjJAHuQ7SYRMU3Nly4koTHBfkoFb/4 3jKKPmnfI40/n2z33FekiQ6n8gcH3Xb7XsNUI5w7ygt0MsyV9o+gVHebN6kmR+2POiav 0Ugj72CtSqRYHLpGfwOlq0BtRplYyyCqBF0a7uo5j7wpwCNPw3HNtLnY8VU3knoI3HNY vrWI5vqcqiq3gSsR3x1b6J7qPZ8LFdmZKWvxG/+4sMEBJDHKFiJaY3963cms6k4QsMWf tdQrqRKWearrls+9axSaWxHJBVTMfOK4UTtppvLq4XJlNCzl8BVNGTMnV5lNqDFnLD5I MDOA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=M1brq0wHgICg9bwZzkT3y65gbxnx6ig+81ciG+lhKzQ=; b=oReN/UV1mnmReQM177m5Snl3daB49Nf5fYZxWwcRAITts6XN7Sd84lEEO/32UU5/6B 2frZWuAtMobzluToU+usLi82IgOM2rMe7F+GDXI2yl0rLb4F490Eb3kehIxojwAATV17 7yIJvb2Me+tlBtGLtwqJNOh071zu5GEDLwHn8wMBxPp2Xt8De3cHLzCH9MVUwnXoxosA EE+oq3E95dSovbcRhnU6gnyzf2ZBSgFuJXrwS1sERGzRtQw87Vv/MJ7ZgbqOvKO8ecIv elGRHC0Zsyvu2bYZgxd1KKEyvqBVPA0e7NDxWpWb4ud3ZJsb7MpIIpDXEAn7N8RV0SI3 liyg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=GePPTV4o; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id p1-20020a170902b08100b001854f631c8esi808917plr.221.2022.11.04.15.41.21; Fri, 04 Nov 2022 15:41:33 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=GePPTV4o; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230153AbiKDWkv (ORCPT + 99 others); Fri, 4 Nov 2022 18:40:51 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47394 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230045AbiKDWkJ (ORCPT ); Fri, 4 Nov 2022 18:40:09 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BB26C42F78; Fri, 4 Nov 2022 15:39:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1667601572; x=1699137572; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=7RPrghckrOaEzYfVScgc7PDmuhgD0p+mb/96hlyTVUg=; b=GePPTV4oZJGLtTa0qlOmr7fBM8oH/rMsRg6YSHixgU0edWrP0PkWYBwT Gxcs+oGx5RYjPOp+QcpWhjawWMTDT7xu8WLfXlH3Wv6XP+aOJk4sQm4dG Glh0HRQhQZuzYwVDee+khmSq15T0ZGQjdYKL/HBlVMlWd8EwPg1fob4+6 kMxNnE1vQ9IDMASvf0w/lQhu+/9UQ9bh3+Xzl2Ja/4ubrS+nspjIV5TFW etSmZJfgytY5+0iHxYxJUfiC4V1jI1KZ1oBn+g4UdeKcXPDgtCHs2HaZ7 gslELfuj7M8Bhl/CBbFPArttSN4JIh8tBwV8wZVH7m+QRqp97MnmURNdP g==; X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="311840516" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="311840516" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:32 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="668514028" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="668514028" Received: from adhjerms-mobl1.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.212.227.68]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:31 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v3 11/37] x86/mm: Update pte_modify for _PAGE_COW Date: Fri, 4 Nov 2022 15:35:38 -0700 Message-Id: <20221104223604.29615-12-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221104223604.29615-1-rick.p.edgecombe@intel.com> References: <20221104223604.29615-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-5.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1748607113106517338?= X-GMAIL-MSGID: =?utf-8?q?1748607113106517338?= From: Yu-cheng Yu The Write=0,Dirty=1 PTE has been used to indicate copy-on-write pages. However, newer x86 processors also regard a Write=0,Dirty=1 PTE as a shadow stack page. In order to separate the two, the software-defined _PAGE_DIRTY is changed to _PAGE_COW for the copy-on-write case, and pte_*() are updated to do this. pte_modify() takes a "raw" pgprot_t which was not necessarily created with any of the existing PTE bit helpers. That means that it can return a pte_t with Write=0,Dirty=1, a shadow stack PTE, when it did not intend to create one. However pte_modify() changes a PTE to 'newprot', but it doesn't use the pte_*(). Modify it to also move _PAGE_DIRTY to _PAGE_COW. Apply the same changes to pmd_modify(). Tested-by: Pengfei Xu Tested-by: John Allen Signed-off-by: Yu-cheng Yu Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe --- v2: - Update commit log with text and suggestions from (Dave Hansen) - Drop fixup_dirty_pte() in favor of clearing the HW dirty bit along with the _PAGE_CHG_MASK masking, then calling pte_mkdirty() (Dave Hansen) arch/x86/include/asm/pgtable.h | 41 +++++++++++++++++++++++++++++----- 1 file changed, 35 insertions(+), 6 deletions(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index c284bb6f62a5..81f388a5a5ab 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -791,26 +791,55 @@ static inline u64 flip_protnone_guard(u64 oldval, u64 val, u64 mask); static inline pte_t pte_modify(pte_t pte, pgprot_t newprot) { + pteval_t _page_chg_mask_no_dirty = _PAGE_CHG_MASK & ~_PAGE_DIRTY; pteval_t val = pte_val(pte), oldval = val; + pte_t pte_result; /* * Chop off the NX bit (if present), and add the NX portion of * the newprot (if present): */ - val &= _PAGE_CHG_MASK; - val |= check_pgprot(newprot) & ~_PAGE_CHG_MASK; + val &= _page_chg_mask_no_dirty; + val |= check_pgprot(newprot) & ~_page_chg_mask_no_dirty; val = flip_protnone_guard(oldval, val, PTE_PFN_MASK); - return __pte(val); + + pte_result = __pte(val); + + /* + * Dirty bit is not preserved above so it can be done + * in a special way for the shadow stack case, where it + * needs to set _PAGE_COW. pte_mkdirty() will do this in + * the case of shadow stack. + */ + if (pte_dirty(pte)) + pte_result = pte_mkdirty(pte_result); + + return pte_result; } static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot) { + pteval_t _hpage_chg_mask_no_dirty = _HPAGE_CHG_MASK & ~_PAGE_DIRTY; pmdval_t val = pmd_val(pmd), oldval = val; + pmd_t pmd_result; - val &= _HPAGE_CHG_MASK; - val |= check_pgprot(newprot) & ~_HPAGE_CHG_MASK; + val &= _hpage_chg_mask_no_dirty; + val |= check_pgprot(newprot) & ~_hpage_chg_mask_no_dirty; val = flip_protnone_guard(oldval, val, PHYSICAL_PMD_PAGE_MASK); - return __pmd(val); + + + pmd_result = __pmd(val); + + /* + * Dirty bit is not preserved above so it can be done + * specially for the shadow stack case. It needs to move + * the HW dirty bit to the software COW bit. Set in the + * result if it was set in the original value. + */ + if (pmd_dirty(pmd)) + pmd_result = pmd_mkdirty(pmd_result); + + return pmd_result; } /* From patchwork Fri Nov 4 22:35:39 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 15830 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:6687:0:0:0:0:0 with SMTP id l7csp679275wru; Fri, 4 Nov 2022 15:42:55 -0700 (PDT) X-Google-Smtp-Source: AMsMyM5FUkcxJZPAc+fLVpGv18tvKsqjJuf5kOuCcGZq+JSIt/lppSl1S9TIMA3g0BVOO49r0QRh X-Received: by 2002:a17:906:58c7:b0:722:f4bf:cb75 with SMTP id e7-20020a17090658c700b00722f4bfcb75mr36114039ejs.450.1667601775364; Fri, 04 Nov 2022 15:42:55 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1667601775; cv=none; d=google.com; s=arc-20160816; b=OhfT7hprF3y4u0p1dPZsLRTJO3hs7MkdjXuTgpYYnxlf3uAoPvRjCTuZRCu039dbOm tID22d8zA6AcuzZHSX5fPZJM1rFEaXmTa1Ci7ha0rH5djoKxpnNZJcT9zHxnluWTk5aC u19dH1pomDPOFWYqff17nFAqDIBqdiK+VTky8dVXv3sCR91Ldq99+zvmT2qvBO5vcgez x55prbfPVyQ5L8sJmvi62qgeYEaqXzHy0p12ZjeiTQLLaAps8OHuJJWLqpMC8qvkpfxE myLnE6mEd5zEkCWJPG48wAhXAI7R9ZJ7qmtRl4iuj4PGFNSU97ugRlInmH/uRTNWARCp OXtQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=Kjn6NeKXWh8OCzYwhdpWW/fhJG1nU6Z+OP15Pr5tU/I=; b=kcYqpngKj4vrdKxZEqqGXwVGM1i0QtMDVU4ox4039a+8XQcQVI/EocSyQcoeCWQDDT R033DFKeNhC6MYajM6Q+/IJO4pT3pIoeoDSfWVzexfnCUX/nsQ4Y9K9yhha+vNvM0Kan /ZDociKbwHpWZz+PX0xZmTRrS8NC2EBCGTZx9HdElBjItibZE8h7Lsu4DLd9O3ExXTIE cNSSzCCGgLGOljyCBCQrRfHuUO1X1PE9N/7d9sC4tR0pfUN3YXtXhb2VC1cp9CZnPK14 EnYMvhV2kZuN/DTtvLu8JLhPxZgeNfpVf39Qf3GvQNvOW1MQzAboHR0babFtVFRul0td aMFQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=hxvqphCQ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id gs8-20020a1709072d0800b007addff99f09si224377ejc.1004.2022.11.04.15.42.31; Fri, 04 Nov 2022 15:42:55 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=hxvqphCQ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230165AbiKDWlL (ORCPT + 99 others); Fri, 4 Nov 2022 18:41:11 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47436 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230072AbiKDWkK (ORCPT ); Fri, 4 Nov 2022 18:40:10 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F03D243AE2; Fri, 4 Nov 2022 15:39:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1667601574; x=1699137574; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=8jsAS8647WvaR/jj/jAxKS3gRgPtSsCKGZz+X7gLR1o=; b=hxvqphCQSCjS35NnftYr9QjHGkghCOg3XAstdgT/12tf30bdqcppV9qi z/LpepMAErUs6HRCtHzlmXoLCQ/CQAJxCroB5Uo4huIx6CGZfdKPHn47A V2SE3Dz0myWAbR/3A6wprxiFcqd8IcoCCuHfQfH3T7Ah7i4a/byZFkOYv oQrLoS5fIQXD80lEK7fD3pV7aL4Nramoi8t+fbKyYP81rNKOovFXZSjFi wIFNxEvl9WAozBPQI2Ie9a6G7Yg642UyqMgy9hl1d2gMwxrWdLjMXg66/ X30WSUvz7yYOWhHQdcO9KqReC+MpiywW2K8Dqzy+hPQJwdF5+DcDROEwd g==; X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="311840521" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="311840521" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:33 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="668514033" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="668514033" Received: from adhjerms-mobl1.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.212.227.68]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:32 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v3 12/37] x86/mm: Update ptep_set_wrprotect() and pmdp_set_wrprotect() for transition from _PAGE_DIRTY to _PAGE_COW Date: Fri, 4 Nov 2022 15:35:39 -0700 Message-Id: <20221104223604.29615-13-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221104223604.29615-1-rick.p.edgecombe@intel.com> References: <20221104223604.29615-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-5.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1748607199108992764?= X-GMAIL-MSGID: =?utf-8?q?1748607199108992764?= From: Yu-cheng Yu When Shadow Stack is in use, Write=0,Dirty=1 PTE are reserved for shadow stack. Copy-on-write PTes then have Write=0,Cow=1. When a PTE goes from Write=1,Dirty=1 to Write=0,Cow=1, it could become a transient shadow stack PTE in two cases: The first case is that some processors can start a write but end up seeing a Write=0 PTE by the time they get to the Dirty bit, creating a transient shadow stack PTE. However, this will not occur on processors supporting Shadow Stack, and a TLB flush is not necessary. The second case is that when _PAGE_DIRTY is replaced with _PAGE_COW non- atomically, a transient shadow stack PTE can be created as a result. Thus, prevent that with cmpxchg. In the case of pmdp_set_wrprotect(), for nopmd configs the ->pmd operated on does not exist and the logic would need to be different. Although the extra functionality will normally be optimized out when user shadow stacks are not configured, also exclude it in the preprocessor stage so that it will still compile. User shadow stack is not supported there by Linux anyway. Leave the cpu_feature_enabled() check so that the functionality also disables based on runtime detection of the feature. Dave Hansen, Jann Horn, Andy Lutomirski, and Peter Zijlstra provided many insights to the issue. Jann Horn provided the cmpxchg solution. Tested-by: Pengfei Xu Tested-by: John Allen Signed-off-by: Yu-cheng Yu Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe --- v3: - Remove unnecessary #ifdef (Dave Hansen) v2: - Compile out some code due to clang build error - Clarify commit log (dhansen) - Normalize PTE bit descriptions between patches (dhansen) - Update comment with text from (dhansen) Yu-cheng v30: - Replace (pmdval_t) cast with CONFIG_PGTABLE_LEVELES > 2 (Borislav Petkov). arch/x86/include/asm/pgtable.h | 35 ++++++++++++++++++++++++++++++++++ 1 file changed, 35 insertions(+) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 81f388a5a5ab..f252c42f3ca1 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -1289,6 +1289,21 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm, static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, pte_t *ptep) { + /* + * Avoid accidentally creating shadow stack PTEs + * (Write=0,Dirty=1). Use cmpxchg() to prevent races with + * the hardware setting Dirty=1. + */ + if (cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) { + pte_t old_pte, new_pte; + + old_pte = READ_ONCE(*ptep); + do { + new_pte = pte_wrprotect(old_pte); + } while (!try_cmpxchg(&ptep->pte, &old_pte.pte, new_pte.pte)); + + return; + } clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte); } @@ -1341,6 +1356,26 @@ static inline pud_t pudp_huge_get_and_clear(struct mm_struct *mm, static inline void pmdp_set_wrprotect(struct mm_struct *mm, unsigned long addr, pmd_t *pmdp) { +#ifdef CONFIG_X86_USER_SHADOW_STACK + /* + * If Shadow Stack is enabled, pmd_wrprotect() moves _PAGE_DIRTY + * to _PAGE_COW (see comments at pmd_wrprotect()). + * When a thread reads a RW=1, Dirty=0 PMD and before changing it + * to RW=0, Dirty=0, another thread could have written to the page + * and the PMD is RW=1, Dirty=1 now. + */ + if (cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) { + pmd_t old_pmd, new_pmd; + + old_pmd = READ_ONCE(*pmdp); + do { + new_pmd = pmd_wrprotect(old_pmd); + } while (!try_cmpxchg(&pmdp->pmd, &old_pmd.pmd, new_pmd.pmd)); + + return; + } +#endif + clear_bit(_PAGE_BIT_RW, (unsigned long *)pmdp); } From patchwork Fri Nov 4 22:35:40 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 15828 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:6687:0:0:0:0:0 with SMTP id l7csp678934wru; Fri, 4 Nov 2022 15:42:15 -0700 (PDT) X-Google-Smtp-Source: AMsMyM62oOYpgeRC5lfUQa7y7xXwiT+sRp44LqSpLpeYYeFdCiR0LZYIEs6hQELxnQsWtRzlAI1/ X-Received: by 2002:aa7:d341:0:b0:464:778:c3fe with SMTP id m1-20020aa7d341000000b004640778c3femr16411720edr.251.1667601734974; Fri, 04 Nov 2022 15:42:14 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1667601734; cv=none; d=google.com; s=arc-20160816; b=h7ZDY57yjIpJDHZDW3xSA2tzkTU7NNmUNbYqc7x1hrLp25X3UQ5MlV7xQJvmg3/eMf bWyRzLjBWdXSxst3BU2I/7yIuMRukiEQpG+lzjzEXMTiRIRSIvibUUdn7p2QeEWPgkub NUbYlqpZNkq0klSQkGP7LqCmHClkVM0ULy3Ooq3D63lnNQsxKYPo0uamxbkn+ECuKlvU UcR6tyEimJyBw0QOr8kY3Psr0R23GgcWji1ug0S88kNau/msPK48ElKoeLpikBJvsKqs LdbxsZWuBeWecZ/dE7euCof0XYuDS98gTfgwZ16UcbW8jw2n7d9HPEH/e3v0yEK0Om02 tzKQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=Ai+0YI2QFg3LVALS3FzlacVh2cbs76Lwk562aTEZcng=; b=gGmg942njUd5LBed2F4DHUqPeiSpZRRb2xGsIF/PcrDOVcPHSHmul6mpeLsDuRwfbA +f0b46ctUdYNgwiSeTma07GoO0GNXekGeFI/PxrTLdRxN2AZjFqqAa4RYaHRIKye/Xcb 2zqlZAcZcxmf3vkIYe+5nIawS1KnbpSIhoiP/EVjRIEGdOALfKjJamdySeL1R3+PXV0r 3QcDc7k+EnAo6/o+LUb6aXhTwrrbUTOss0gdQ6EHAChYCDUrgfOVMZd+MuXZvy5PbfUX K20U4isZkNPw3fm1B54P8IyEO7+tikhabP3HAQ+UzjcOFKhVgjp5gZRHrMCGy6W+RFSh gGXw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b="dS+Ey/oE"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id ho35-20020a1709070ea300b0078219af8361si278614ejc.883.2022.11.04.15.41.51; Fri, 04 Nov 2022 15:42:14 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b="dS+Ey/oE"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230197AbiKDWlf (ORCPT + 99 others); Fri, 4 Nov 2022 18:41:35 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47438 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230079AbiKDWkK (ORCPT ); Fri, 4 Nov 2022 18:40:10 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0823A3F050; Fri, 4 Nov 2022 15:39:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1667601575; x=1699137575; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=LWmw5Lvcc9LBkkGN/WFq1NU3hxCK/DdQKZluTUtVie8=; b=dS+Ey/oEUopXZY4dxwQ5Zh97Bdd5NDwcNp0EgJJ4lmqBdavxXylEHI2Y xOferj+/qs81gs0vp//mChVYTwGPsXl/hu957cdpYv67r/i1a2b8DVGu0 ZIsOEbyJwCQQbngRzekQL2hgWEMmPaXBhFbF36iXRyAG5dm6BBF3mr74U lb8yO1PQyQ0806SxmhDD39iA3JL5ny6otLhEGTa65RRpLnrIQ40TJXE2d 7usKaQoftZpHuU4PJjiLVlJeC6jTn/Yv5HYGRsa3eBJ5gVTGAHqVqIZ6d KXDYWCKKyF04mhADaKkFNm7KE3MEBPGciK7GFBzDh4iZqxn2EYoqbpMGJ w==; X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="311840526" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="311840526" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:34 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="668514039" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="668514039" Received: from adhjerms-mobl1.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.212.227.68]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:33 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu , Peter Xu Subject: [PATCH v3 13/37] mm: Move VM_UFFD_MINOR_BIT from 37 to 38 Date: Fri, 4 Nov 2022 15:35:40 -0700 Message-Id: <20221104223604.29615-14-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221104223604.29615-1-rick.p.edgecombe@intel.com> References: <20221104223604.29615-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-5.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1748607156961588933?= X-GMAIL-MSGID: =?utf-8?q?1748607156961588933?= From: Yu-cheng Yu To introduce VM_SHADOW_STACK as VM_HIGH_ARCH_BIT (37), and make all VM_HIGH_ARCH_BITs stay together, move VM_UFFD_MINOR_BIT from 37 to 38. Tested-by: Pengfei Xu Tested-by: John Allen Reviewed-by: Kees Cook Acked-by: Peter Xu Signed-off-by: Yu-cheng Yu Reviewed-by: Axel Rasmussen Signed-off-by: Rick Edgecombe Cc: Peter Xu Cc: Mike Kravetz --- include/linux/mm.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 8bbcccbc5565..5314ad0a342d 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -365,7 +365,7 @@ extern unsigned int kobjsize(const void *objp); #endif #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR -# define VM_UFFD_MINOR_BIT 37 +# define VM_UFFD_MINOR_BIT 38 # define VM_UFFD_MINOR BIT(VM_UFFD_MINOR_BIT) /* UFFD minor faults */ #else /* !CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */ # define VM_UFFD_MINOR VM_NONE From patchwork Fri Nov 4 22:35:41 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 15829 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:6687:0:0:0:0:0 with SMTP id l7csp679038wru; Fri, 4 Nov 2022 15:42:28 -0700 (PDT) X-Google-Smtp-Source: AMsMyM4Kuu5MBlMcy5Ppo1gNisy2UfKhDZdU+I5VQnb6LDROquE1iKLDq9yIhCsNkrsLBhXh2oP9 X-Received: by 2002:a17:90b:4a04:b0:213:587b:204e with SMTP id kk4-20020a17090b4a0400b00213587b204emr38564230pjb.98.1667601748160; Fri, 04 Nov 2022 15:42:28 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1667601748; cv=none; d=google.com; s=arc-20160816; b=vP7d5GiDIFo4Y/X1HrHXt/N6uC1oc9+4wx48QLNtGZvmtu/ng/voMGVGqbpXDLGCOr /IMA2JE6TEuKvZGUhdX1CHQxGXpKg2xvg61h986nRJqjFhza1+hNSSN3gytYdDDbTohv bIr6ffWAO2+1UAdE8eA+c4NTtjPP8lFFkWNi8u/cSN7elGWO9ePnfjpTcxrCmwNjt9CD U1ihmp83mn/yVmRnz2ZSfpfRMzjm8rWZTDlG/oszpuYvwpA6u84/6aiGHi/gpYGFL1D4 3vhhqpb0YgpGWUSBRE/Ct1YUfBzabWJ36pt+kGy3y3nx1kM4RkiKTlvtUa3tG2vyZlCm VpGA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=UooqySfvhz6nUCqODgK1IUgqPGwNtERM6mW36ipLFxs=; b=lPyu1k0+Zai5M7rq4Y0r07c/7xD5406uG2sAz0AFVzpFgGrcepc98HBXy0N7qQmbVN AxAejt8qoe1wuDYavI+NHF8vcQMCpwJqUt7xwZrWk4O66O6jFS7RC7akyO8n9lgVGdks iujMVRkXN+sNJHf3nX8N/B06cu4/t7C0VQ8C820ooY9lqU5rrmAAVs7RRvfulFv8BuXQ vRhA4Fsu/h//4qdyZSuNsCl3AE4y1ZHabtSzlBcj30VjhSKx/7ZA6Pd/b+jJCNo2thoa JokDDalv4S2i3htnUk6wEy/+4VTVDYpjlmTtMt7+hIAaYKFFFtVqDgE7SNqvzFgZg2Kq 8yhg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=kk86YClE; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id h1-20020a170902704100b001789a178e33si812230plt.428.2022.11.04.15.42.12; Fri, 04 Nov 2022 15:42:28 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=kk86YClE; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230236AbiKDWlo (ORCPT + 99 others); Fri, 4 Nov 2022 18:41:44 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47222 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230096AbiKDWkL (ORCPT ); Fri, 4 Nov 2022 18:40:11 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7978332BB2; Fri, 4 Nov 2022 15:39:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1667601575; x=1699137575; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=Ws9wJw09e/c09QwWvWkPhdHb+7+SXb6r8d6POcm1BXU=; b=kk86YClEaRq8P3oKEwBQ1n2kQFRNgF41DJSIFAqvAF3eJHiZWKSrKgoc CW92ZqvBZtHbwrmQwYEt7QwPQB4wUsiTGWJ0n5eeB1BQJzXPrtM7v3kZe h82ewwDdps3HgBiQYMmyhqjP7KRYDcZfu0fKHUpUZwIqd7traBmcfAIe1 KRgTc+8Swrcv3SfiaMmRKxJSymZ/nKvBAVSUNwBnbLSwd2Ai+rYyPTMw1 ByGdQUWdMphpW623KltMSL5EhvSfjhpNqZ1c00AywMyht42mA94bRzT0V PKhqULQgo06p0Bq/u0CtHt5xrvrONf1tMP2mR/h68M0UIRgSVsAiDWjbv Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="311840531" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="311840531" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:35 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="668514049" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="668514049" Received: from adhjerms-mobl1.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.212.227.68]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:34 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v3 14/37] mm: Introduce VM_SHADOW_STACK for shadow stack memory Date: Fri, 4 Nov 2022 15:35:41 -0700 Message-Id: <20221104223604.29615-15-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221104223604.29615-1-rick.p.edgecombe@intel.com> References: <20221104223604.29615-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-5.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1748607170938996207?= X-GMAIL-MSGID: =?utf-8?q?1748607170938996207?= From: Yu-cheng Yu The x86 Control-flow Enforcement Technology (CET) feature includes a new type of memory called shadow stack. This shadow stack memory has some unusual properties, which requires some core mm changes to function properly. A shadow stack PTE must be read-only and have _PAGE_DIRTY set. However, read-only and Dirty PTEs also exist for copy-on-write (COW) pages. These two cases are handled differently for page faults. Introduce VM_SHADOW_STACK to track shadow stack VMAs. Tested-by: Pengfei Xu Tested-by: John Allen Signed-off-by: Yu-cheng Yu Reviewed-by: Kirill A. Shutemov Signed-off-by: Rick Edgecombe Cc: Kees Cook --- v3: - Drop arch specific change in arch_vma_name(). The memory can show as anonymous (Kirill) - Change CONFIG_ARCH_HAS_SHADOW_STACK to CONFIG_X86_USER_SHADOW_STACK in show_smap_vma_flags() (Boris) Documentation/filesystems/proc.rst | 1 + fs/proc/task_mmu.c | 3 +++ include/linux/mm.h | 8 ++++++++ 3 files changed, 12 insertions(+) diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst index 898c99eae8e4..05506dfa0480 100644 --- a/Documentation/filesystems/proc.rst +++ b/Documentation/filesystems/proc.rst @@ -560,6 +560,7 @@ encoded manner. The codes are the following: mt arm64 MTE allocation tags are enabled um userfaultfd missing tracking uw userfaultfd wr-protect tracking + ss shadow stack page == ======================================= Note that there is no guarantee that every flag and associated mnemonic will diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 8a74cdcc9af0..7dee7afbb01b 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -703,6 +703,9 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma) #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR [ilog2(VM_UFFD_MINOR)] = "ui", #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */ +#ifdef CONFIG_X86_USER_SHADOW_STACK + [ilog2(VM_SHADOW_STACK)] = "ss", +#endif }; size_t i; diff --git a/include/linux/mm.h b/include/linux/mm.h index 5314ad0a342d..42c4e4bc972d 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -314,11 +314,13 @@ extern unsigned int kobjsize(const void *objp); #define VM_HIGH_ARCH_BIT_2 34 /* bit only usable on 64-bit architectures */ #define VM_HIGH_ARCH_BIT_3 35 /* bit only usable on 64-bit architectures */ #define VM_HIGH_ARCH_BIT_4 36 /* bit only usable on 64-bit architectures */ +#define VM_HIGH_ARCH_BIT_5 37 /* bit only usable on 64-bit architectures */ #define VM_HIGH_ARCH_0 BIT(VM_HIGH_ARCH_BIT_0) #define VM_HIGH_ARCH_1 BIT(VM_HIGH_ARCH_BIT_1) #define VM_HIGH_ARCH_2 BIT(VM_HIGH_ARCH_BIT_2) #define VM_HIGH_ARCH_3 BIT(VM_HIGH_ARCH_BIT_3) #define VM_HIGH_ARCH_4 BIT(VM_HIGH_ARCH_BIT_4) +#define VM_HIGH_ARCH_5 BIT(VM_HIGH_ARCH_BIT_5) #endif /* CONFIG_ARCH_USES_HIGH_VMA_FLAGS */ #ifdef CONFIG_ARCH_HAS_PKEYS @@ -334,6 +336,12 @@ extern unsigned int kobjsize(const void *objp); #endif #endif /* CONFIG_ARCH_HAS_PKEYS */ +#ifdef CONFIG_X86_USER_SHADOW_STACK +# define VM_SHADOW_STACK VM_HIGH_ARCH_5 +#else +# define VM_SHADOW_STACK VM_NONE +#endif + #if defined(CONFIG_X86) # define VM_PAT VM_ARCH_1 /* PAT reserves whole VMA at once (x86) */ #elif defined(CONFIG_PPC) From patchwork Fri Nov 4 22:35:42 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 15831 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:6687:0:0:0:0:0 with SMTP id l7csp679349wru; Fri, 4 Nov 2022 15:43:04 -0700 (PDT) X-Google-Smtp-Source: AMsMyM4voLGXnbO91OuhbWcpPk4YWVJCb4EAbjtk+fxohNPYKoDlVxPlHgkKQ1Ai1jyu4uIrWb1k X-Received: by 2002:a17:906:6a26:b0:7ad:975c:9785 with SMTP id qw38-20020a1709066a2600b007ad975c9785mr39405591ejc.25.1667601784335; Fri, 04 Nov 2022 15:43:04 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1667601784; cv=none; d=google.com; s=arc-20160816; b=xuvivPrx36/8BsGktcLDT5VsQciWjfklxIxKbVC/2Ulkv99X2LDgKojqoJIi/j+Kni U4Ai7mUDyMn6/cNRfy7dwR0+ALV3l1bPgtGUrlqbQscNnZuCiH7UZKDdfRklRsrhqvTN TLdnlVKmkeinX7c3okciPkXA/TNz6HcU+8jQgp8+XilXrwCP1LvsrcCqQhITrBeK/CSS e5LdQ0VCzSNsZv+FnNFqte75qbUN5qjQLv4nJMgmvvAd05Twr00TT7K8vT72XSTdKtGZ oX0HtGOSWNn3uDLS60CSAGw4Wfw2k8woCU1utiNNiJTdwxtPCAQKXFbTmFWKSz2Fkf+R Df3w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=5W9EdcBuQ+BX+iYQV757l9mPEjV+CHVF5qhLTO+Fgr8=; b=IeZXJv2/K2OtIU94/yM5GgR3vbKG5vjHEjxkRlt2e3A+/YGP/l2pLMq8UdmoN61z6v 4D148sfUshtBVKY48PKqCx6NL2g7mX0gyAnzNzSSs3drsmKdiOB/ZdPQrD20NWEYr4CM NbJVUUWCgfi/LUxc+1DetzN2wPrD8YjUrc6ZuTxnfVmDZ/RBd3msphm4JOeIqeLQUI7C 5O7jWEQKA6eYROP07KN9KKlTjDrRfFOPtf9O97rfQKiCsxZc7uomgjV0/ngvxYp2UcAR e874JWl5kSvCxr/tctCFxQDEOH6uEaiqod00+s5jrmmJ0LT9DrRgPIgMh7ix7NGrlfqU J5HQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=RD6QhiLk; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id mp33-20020a1709071b2100b007acbac0871csi344874ejc.420.2022.11.04.15.42.40; Fri, 04 Nov 2022 15:43:04 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=RD6QhiLk; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230241AbiKDWlt (ORCPT + 99 others); Fri, 4 Nov 2022 18:41:49 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46478 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229985AbiKDWkp (ORCPT ); Fri, 4 Nov 2022 18:40:45 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EE45B4AF30; Fri, 4 Nov 2022 15:39:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1667601577; x=1699137577; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=3/rM2KyC/x9FhcIHFIgAr562vZ6+pVG10d1tw9A8cn4=; b=RD6QhiLkzJyTfOFSedDqKGmLY8rWgQehYYU5uUWPqnyN3t5QN9DRucHD viVYFPrPJgrlVLQVsmERw3AU8B3ZePWtLsfpjhfrHKcde+BFMkvWtcQdl 3r/I+Q2qb03VV1zdheLB3cLv7bomtw0ezqII6OlUjJlKYyKypIVxgCx/q kdCFx8mN/Bl8R/tZn4CqotgXkbeSmhVGpI7o8MDSbGabW3FXjreg08KRr ogGDTwmFNjMUp4sImHZtk9XUlx1969SsjSiXRLZ0KBMrNgOt62EDNELZc oaVsgGGxnL7ufBkRWneUCUSdThHL98FJJCHMAGEN3L02ULuG7qPpfuOxi Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="311840538" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="311840538" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:36 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="668514058" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="668514058" Received: from adhjerms-mobl1.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.212.227.68]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:35 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v3 15/37] x86/mm: Check Shadow Stack page fault errors Date: Fri, 4 Nov 2022 15:35:42 -0700 Message-Id: <20221104223604.29615-16-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221104223604.29615-1-rick.p.edgecombe@intel.com> References: <20221104223604.29615-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-5.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1748607208883507700?= X-GMAIL-MSGID: =?utf-8?q?1748607208883507700?= From: Yu-cheng Yu The CPU performs "shadow stack accesses" when it expects to encounter shadow stack mappings. These accesses can be implicit (via CALL/RET instructions) or explicit (instructions like WRSS). Shadow stacks accesses to shadow-stack mappings can see faults in normal, valid operation just like regular accesses to regular mappings. Shadow stacks need some of the same features like delayed allocation, swap and copy-on-write. The kernel needs to use faults to implement those features. The architecture has concepts of both shadow stack reads and shadow stack writes. Any shadow stack access to non-shadow stack memory will generate a fault with the shadow stack error code bit set. This means that, unlike normal write protection, the fault handler needs to create a type of memory that can be written to (with instructions that generate shadow stack writes), even to fulfill a read access. So in the case of COW memory, the COW needs to take place even with a shadow stack read. Otherwise the page will be left (shadow stack) writable in userspace. So to trigger the appropriate behavior, set FAULT_FLAG_WRITE for shadow stack accesses, even if the access was a shadow stack read. Shadow stack accesses can also result in errors, such as when a shadow stack overflows, or if a shadow stack access occurs to a non-shadow-stack mapping. Also, generate the errors for invalid shadow stack accesses. Tested-by: Pengfei Xu Tested-by: John Allen Reviewed-by: Kees Cook Signed-off-by: Yu-cheng Yu Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe --- v3: - Improve comment talking about using FAULT_FLAG_WRITE (Peterz) v2: - Update commit log with verbiage/feedback from Dave Hansen - Clarify reasoning for FAULT_FLAG_WRITE for all shadow stack accesses - Update comments with some verbiage from Dave Hansen Yu-cheng v30: - Update Subject line and add a verb arch/x86/include/asm/trap_pf.h | 2 ++ arch/x86/mm/fault.c | 26 ++++++++++++++++++++++++++ 2 files changed, 28 insertions(+) diff --git a/arch/x86/include/asm/trap_pf.h b/arch/x86/include/asm/trap_pf.h index 10b1de500ab1..afa524325e55 100644 --- a/arch/x86/include/asm/trap_pf.h +++ b/arch/x86/include/asm/trap_pf.h @@ -11,6 +11,7 @@ * bit 3 == 1: use of reserved bit detected * bit 4 == 1: fault was an instruction fetch * bit 5 == 1: protection keys block access + * bit 6 == 1: shadow stack access fault * bit 15 == 1: SGX MMU page-fault */ enum x86_pf_error_code { @@ -20,6 +21,7 @@ enum x86_pf_error_code { X86_PF_RSVD = 1 << 3, X86_PF_INSTR = 1 << 4, X86_PF_PK = 1 << 5, + X86_PF_SHSTK = 1 << 6, X86_PF_SGX = 1 << 15, }; diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index 7b0d4ab894c8..0af3d7f52c2e 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -1138,8 +1138,22 @@ access_error(unsigned long error_code, struct vm_area_struct *vma) (error_code & X86_PF_INSTR), foreign)) return 1; + /* + * Shadow stack accesses (PF_SHSTK=1) are only permitted to + * shadow stack VMAs. All other accesses result in an error. + */ + if (error_code & X86_PF_SHSTK) { + if (unlikely(!(vma->vm_flags & VM_SHADOW_STACK))) + return 1; + if (unlikely(!(vma->vm_flags & VM_WRITE))) + return 1; + return 0; + } + if (error_code & X86_PF_WRITE) { /* write, present and write, not present: */ + if (unlikely(vma->vm_flags & VM_SHADOW_STACK)) + return 1; if (unlikely(!(vma->vm_flags & VM_WRITE))) return 1; return 0; @@ -1331,6 +1345,18 @@ void do_user_addr_fault(struct pt_regs *regs, perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address); + /* + * To service shadow stack read faults, unlike normal read faults, the + * fault handler needs to create a type of memory that will also be + * writable (with instructions that generate shadow stack writes). + * In the case of COW memory, the COW needs to take place even with + * a shadow stack read. Otherwise the shared page will be left (shadow + * stack) writable in userspace. So to trigger the appropriate behavior + * by setting FAULT_FLAG_WRITE for shadow stack accesses, even if the + * access was a shadow stack read. + */ + if (error_code & X86_PF_SHSTK) + flags |= FAULT_FLAG_WRITE; if (error_code & X86_PF_WRITE) flags |= FAULT_FLAG_WRITE; if (error_code & X86_PF_INSTR) From patchwork Fri Nov 4 22:35:43 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 15833 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:6687:0:0:0:0:0 with SMTP id l7csp679553wru; Fri, 4 Nov 2022 15:43:33 -0700 (PDT) X-Google-Smtp-Source: AMsMyM7TDUB8+viRmXPe7BBWigB7hYaM8YrtiNqUYKQfw9P4fV9LiJiC7V+EAgM6J67k6KjkfPpb X-Received: by 2002:a63:4383:0:b0:440:3e0d:b9ec with SMTP id q125-20020a634383000000b004403e0db9ecmr33382272pga.54.1667601812976; Fri, 04 Nov 2022 15:43:32 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1667601812; cv=none; d=google.com; s=arc-20160816; b=eDjmdbJoQcWZeDW5IARBRSatLKaf/HneaIAEi+uPQDhOvp+ZY7GUKM2t79xD7LfirW jvc4lNxKoOWJrr/pAPefZtwzjNCOvZOl7rAFqYWg1kZTKL/B1gW0mbIfvNgWqEX/XghN LQu685OweTa/PtpFyUXWXRPN0eCBALrtxzfu9/nbWmYwHIUA28kyR/6AlswDnmJ2Msow G45oXpzYtKpBkFjK3agcl4XvB9Amj/3jDjutT1qqUVKVZh6YMzXjoMZmSuNT8MDkJupy wIPulvJWg95Grsu/e0zEuhAYCfHLbT25hS3VBF8pZwH9xs+we1Kvv0WatZkTa5FiXwNj twVg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=nBTWAjcv66kbHX4AaVWrQ5311qyWLHUMgz/own7AHvA=; b=ygonP4gpGXchOYP48DkkmHJkWhtaNYgVQJMxsEkjgJBsjDdHGbB/JHJFnr9mv7Z2VI x5andt9dbo0ufMXPiH326XZs0pw3NwOtGLSVJlS2Oa5V2zuQIiJlnULf+yNVpa10Av8u nxP6vYzyIpQ7FIR3anmJOtmWLblukcwkNsnDRbas/beXowC+Nsu3KnJrXzcSQ4e5+0mb UvwrPL9C4t4f5Rmxmh0hUhP6qDND1GGYnmi6DfNj+RtA1aI3wTswQmGOdZYL0JhtuuZf FIf8Oqlc/6yqmIZ3a3mntUalm62aa7iOvW+qO85W/V9Pa1IsD1DYfH96XcllDJN77Nf3 gPag== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b="QbvRcO/+"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id p1-20020a170902b08100b001854f631c8esi808917plr.221.2022.11.04.15.42.56; Fri, 04 Nov 2022 15:43:32 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b="QbvRcO/+"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230261AbiKDWlz (ORCPT + 99 others); Fri, 4 Nov 2022 18:41:55 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46480 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230142AbiKDWkt (ORCPT ); Fri, 4 Nov 2022 18:40:49 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 599C84B99D; Fri, 4 Nov 2022 15:39:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1667601577; x=1699137577; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=5QjeO1hUpatOnGhS13rjzsuKy1aDnHsoqPiBGBWzrX0=; b=QbvRcO/+ZCWaHHF5YvwVQbUxea6vlKziKJAaalcgJdadAG6CRJZKPGyB TLt/z0BRPrDKC84A2xpLCHg8NEp8lwny5szzoFsIQsPpfrFxfEDRHnIHb IdlrCH+2lzSLBdUD74wKJqG6U4DnJhPOf0viALQFcEtM4mmTbhtf//N+4 lUluDI2Rn4M7K3OQFo4i/waFTMhpogSINCcbHYDcAAMDjga2T0epfGqPL SWSAtdxuWCJ2CNJfvTvDX+Bun+cYkmXOEZvN6URDdJj8R4hrW/GZgxt8k nuYWd0AAD3p095SAzKOfaMtI5llv66sQxotda4bvwmjn8pmVfkOfzLsl2 g==; X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="311840541" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="311840541" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:37 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="668514069" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="668514069" Received: from adhjerms-mobl1.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.212.227.68]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:36 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v3 16/37] x86/mm: Update maybe_mkwrite() for shadow stack Date: Fri, 4 Nov 2022 15:35:43 -0700 Message-Id: <20221104223604.29615-17-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221104223604.29615-1-rick.p.edgecombe@intel.com> References: <20221104223604.29615-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-5.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1748607238228742620?= X-GMAIL-MSGID: =?utf-8?q?1748607238228742620?= From: Yu-cheng Yu When serving a page fault, maybe_mkwrite() makes a PTE writable if there is a write access to it, and its vma has VM_WRITE. Shadow stack accesses to shadow stack vma's are also treated as write accesses by the fault handler. This is because setting shadow stack memory makes it writable via some instructions, so COW has to happen even for shadow stack reads. So maybe_mkwrite() should continue to set VM_WRITE vma's as normally writable, but also set VM_WRITE|VM_SHADOW_STACK vma's as shadow stack. Do this by adding a pte_mkwrite_shstk() and a cross-arch stub. Check for VM_SHADOW_STACK in maybe_mkwrite() and call pte_mkwrite_shstk() accordingly. Apply the same changes to maybe_pmd_mkwrite(). Tested-by: Pengfei Xu Tested-by: John Allen Signed-off-by: Yu-cheng Yu Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Cc: Kees Cook --- v3: - Remove unneeded define for maybe_mkwrite (Peterz) - Switch to cleaner version of maybe_mkwrite() (Peterz) v2: - Change to handle shadow stacks that are VM_WRITE|VM_SHADOW_STACK - Ditch arch specific maybe_mkwrite(), and make the code generic - Move do_anonymous_page() to next patch (Kirill) Yu-cheng v29: - Remove likely()'s. arch/x86/include/asm/pgtable.h | 2 ++ include/linux/mm.h | 13 ++++++++++--- include/linux/pgtable.h | 14 ++++++++++++++ mm/huge_memory.c | 10 +++++++--- 4 files changed, 33 insertions(+), 6 deletions(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index f252c42f3ca1..df67bcf9f69e 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -420,6 +420,7 @@ static inline pte_t pte_mkdirty(pte_t pte) return pte_set_flags(pte, dirty | _PAGE_SOFT_DIRTY); } +#define pte_mkwrite_shstk pte_mkwrite_shstk static inline pte_t pte_mkwrite_shstk(pte_t pte) { /* pte_clear_cow() also sets Dirty=1 */ @@ -556,6 +557,7 @@ static inline pmd_t pmd_mkdirty(pmd_t pmd) return pmd_set_flags(pmd, dirty | _PAGE_SOFT_DIRTY); } +#define pmd_mkwrite_shstk pmd_mkwrite_shstk static inline pmd_t pmd_mkwrite_shstk(pmd_t pmd) { return pmd_clear_cow(pmd); diff --git a/include/linux/mm.h b/include/linux/mm.h index 42c4e4bc972d..5d9536fa860a 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1015,12 +1015,19 @@ void free_compound_page(struct page *page); * servicing faults for write access. In the normal case, do always want * pte_mkwrite. But get_user_pages can cause write faults for mappings * that do not have writing enabled, when used by access_process_vm. + * + * If a vma is shadow stack (a type of writable memory), mark the pte shadow + * stack. */ static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma) { - if (likely(vma->vm_flags & VM_WRITE)) - pte = pte_mkwrite(pte); - return pte; + if (!(vma->vm_flags & VM_WRITE)) + return pte; + + if (vma->vm_flags & VM_SHADOW_STACK) + return pte_mkwrite_shstk(pte); + + return pte_mkwrite(pte); } vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page); diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index a108b60a6962..5ce6732a6b65 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -493,6 +493,13 @@ static inline pte_t pte_sw_mkyoung(pte_t pte) #define pte_mk_savedwrite pte_mkwrite #endif +#ifndef pte_mkwrite_shstk +static inline pte_t pte_mkwrite_shstk(pte_t pte) +{ + return pte; +} +#endif + #ifndef pte_clear_savedwrite #define pte_clear_savedwrite pte_wrprotect #endif @@ -501,6 +508,13 @@ static inline pte_t pte_sw_mkyoung(pte_t pte) #define pmd_savedwrite pmd_write #endif +#ifndef pmd_mkwrite_shstk +static inline pmd_t pmd_mkwrite_shstk(pmd_t pmd) +{ + return pmd; +} +#endif + #ifndef pmd_mk_savedwrite #define pmd_mk_savedwrite pmd_mkwrite #endif diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 561a42567477..73b9b78f8cf4 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -553,9 +553,13 @@ __setup("transparent_hugepage=", setup_transparent_hugepage); pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma) { - if (likely(vma->vm_flags & VM_WRITE)) - pmd = pmd_mkwrite(pmd); - return pmd; + if (!(vma->vm_flags & VM_WRITE)) + return pmd; + + if (vma->vm_flags & VM_SHADOW_STACK) + return pmd_mkwrite_shstk(pmd); + + return pmd_mkwrite(pmd); } #ifdef CONFIG_MEMCG From patchwork Fri Nov 4 22:35:44 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 15832 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:6687:0:0:0:0:0 with SMTP id l7csp679369wru; Fri, 4 Nov 2022 15:43:07 -0700 (PDT) X-Google-Smtp-Source: AMsMyM4zPXoHDQk/znSkqXwcFCVv49xxe9QI8sb17oDXx95K6cDsdpVtffn0Vr8MLQdkrwsWSQ3J X-Received: by 2002:a17:907:6e28:b0:7a0:b6b5:5103 with SMTP id sd40-20020a1709076e2800b007a0b6b55103mr35645874ejc.300.1667601787293; Fri, 04 Nov 2022 15:43:07 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1667601787; cv=none; d=google.com; s=arc-20160816; b=nMTcNto16ZMW3qU0TA/ERo1NCvvZ/R/3sDa8HQQcKMX3DLSn5HEyhliVNuiOkqY5aE N2Jwsr+/hzcVpGL0QC0/aWY/yaTLbxlMpT9y5JljJayzC2BFr/FDrMdzyMiUF6M2i/Kk bFqdQAMnH8qXyrUrm2yYb4i7ElrGT06+EhmCJzXlzmrcZ9kETpPxx1QhMOnygfekri+0 ukAkd7XYkuLpekZ2MPTlB9vKXIFmGZjsBvE+CSKM3Di6xmhXfsShIumnTOs8HSGxJ4Hz ycoia49lzdGAmXC5EOFtD7ZGaLhigvA8vaO4locS7w3NqLAWmCunofJoK/fAMGnRFKSL RneA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=ruL36AwKl4M6rJ5s+Gw99mn0gvmkSfEMEMTWFoPnbcw=; b=BAUVmWJsp3A/ny+9TiCin4GPkhxIhyLo9Qh8Nl5dYoNrmOznkkyYEbYi+ikIsLfLzN FvabtQfDIn1Wazj/TgL/qJ4mw588Pzvrj30GwwxoMgzwhR3mPBkSBYomCpK+CHPB5O2C q3h/V3VvorAHjFk3Rg5am5q/1euYrUyXbn6zKFegDcxo4esxVI9MRaGp4qB1kRAUvzyi TXeGXGGFRHKsHYksxIbgJA+Zc5SupN6YpGHJEc0gSYCrHINz8QL040HouV01Ebv/rZK7 ApB1qhg1k/fzCIHhFuvToFehmv6nx1KfxpFtUx9CUQyttH28G1lg3s4m1slJxaLum78D dZfQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=PRjBhwV6; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id b3-20020a170906728300b0078d550c0272si325349ejl.269.2022.11.04.15.42.41; Fri, 04 Nov 2022 15:43:07 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=PRjBhwV6; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229994AbiKDWmF (ORCPT + 99 others); Fri, 4 Nov 2022 18:42:05 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46636 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229884AbiKDWkz (ORCPT ); Fri, 4 Nov 2022 18:40:55 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6E7A843AE1; Fri, 4 Nov 2022 15:39:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1667601578; x=1699137578; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=zI9mlyn7kruoeLHzJpJEYX6FArWTErpn+ABnmp5ANPM=; b=PRjBhwV6Gy7i+7krXExXYYGgaqUeuBJywx+r8ygjReHhUgvIm8yDuxHm k7pN+kkP2KihBlv1hM43G8CsRwSbqqxkxmjjp96OzM33VpM5PXMiParON kdpAtcaz+aKF5qSREMLqAlS6NocE9kSxK2lmdjgr66qgv1T5Q31iRshY3 0c/1cRxnZ6pHrFBU64OYm7JRwh8SjeWyYF2N8I+UgwjxhReRPHkeQG3VO esnWFIef64Zn9kUW1R9dHBLwakke3XJ/6a2DV0zecZEKuCtnX85RVyxu6 i+pGJcd1kVSWmLClUkDwIdeFT4HuCoSMCD9F5ayGpJH/EMECYWFdxChYE g==; X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="311840544" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="311840544" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:38 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="668514073" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="668514073" Received: from adhjerms-mobl1.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.212.227.68]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:37 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v3 17/37] mm: Fixup places that call pte_mkwrite() directly Date: Fri, 4 Nov 2022 15:35:44 -0700 Message-Id: <20221104223604.29615-18-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221104223604.29615-1-rick.p.edgecombe@intel.com> References: <20221104223604.29615-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-5.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1748607211359982946?= X-GMAIL-MSGID: =?utf-8?q?1748607211359982946?= From: Yu-cheng Yu The x86 Control-flow Enforcement Technology (CET) feature includes a new type of memory called shadow stack. This shadow stack memory has some unusual properties, which requires some core mm changes to function properly. With the introduction of shadow stack memory there are two ways a pte can be writable: regular writable memory and shadow stack memory. In past patches, maybe_mkwrite() has been updated to apply pte_mkwrite() or pte_mkwrite_shstk() depending on the VMA flag. This covers most cases where a PTE is made writable. However, there are places where pte_mkwrite() is called directly and the logic should now also create a shadow stack PTE in the case of a shadow stack VMA. - do_anonymous_page() and migrate_vma_insert_page() check VM_WRITE directly and call pte_mkwrite(). Teach it about pte_mkwrite_shstk() - When userfaultfd is creating a PTE after userspace handles the fault it calls pte_mkwrite() directly. Teach it about pte_mkwrite_shstk() To make the code cleaner, introduce is_shstk_write() which simplifies checking for VM_WRITE | VM_SHADOW_STACK together. In other cases where pte_mkwrite() is called directly, the VMA will not be VM_SHADOW_STACK, and so shadow stack memory should not be created. - In the case of pte_savedwrite(), shadow stack VMA's are excluded. - In the case of the "dirty_accountable" optimization in mprotect(), shadow stack VMA's won't be VM_SHARED, so it is not nessary. Tested-by: Pengfei Xu Tested-by: John Allen Signed-off-by: Yu-cheng Yu Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Cc: Kees Cook --- v3: - Restore do_anonymous_page() that accidetally moved commits (Kirill) - Open code maybe_mkwrite() cases from v2, so the behavior doesn't change to mark that non-writable PTEs dirty. (Nadav) v2: - Updated commit log with comment's from Dave Hansen - Dave also suggested (I understood) to maybe tweak vm_get_page_prot() to avoid having to call maybe_mkwrite(). After playing around with this I opted to *not* do this. Shadow stack memory memory is effectively writable, so having the default permissions be writable ended up mapping the zero page as writable and other surprises. So creating shadow stack memory needs to be done with manual logic like pte_mkwrite(). - Drop change in change_pte_range() because it couldn't actually trigger for shadow stack VMAs. - Clarify reasoning for skipped cases of pte_mkwrite(). Yu-cheng v25: - Apply same changes to do_huge_pmd_numa_page() as to do_numa_page(). arch/x86/include/asm/pgtable.h | 3 +++ arch/x86/mm/pgtable.c | 6 ++++++ include/linux/pgtable.h | 7 +++++++ mm/memory.c | 5 ++++- mm/migrate_device.c | 4 +++- mm/userfaultfd.c | 10 +++++++--- 6 files changed, 30 insertions(+), 5 deletions(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index df67bcf9f69e..d57dc1b2d3e8 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -919,6 +919,9 @@ static inline pgd_t pti_set_user_pgtbl(pgd_t *pgdp, pgd_t pgd) } #endif /* CONFIG_PAGE_TABLE_ISOLATION */ +#define is_shstk_write is_shstk_write +extern bool is_shstk_write(unsigned long vm_flags); + #endif /* __ASSEMBLY__ */ diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c index 8525f2876fb4..f0e536bea3ca 100644 --- a/arch/x86/mm/pgtable.c +++ b/arch/x86/mm/pgtable.c @@ -876,3 +876,9 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr) #endif /* CONFIG_X86_64 */ #endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */ + +bool is_shstk_write(unsigned long vm_flags) +{ + return (vm_flags & (VM_SHADOW_STACK | VM_WRITE)) == + (VM_SHADOW_STACK | VM_WRITE); +} diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 5ce6732a6b65..36926a207b6d 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -1567,6 +1567,13 @@ static inline bool arch_has_pfn_modify_check(void) } #endif /* !_HAVE_ARCH_PFN_MODIFY_ALLOWED */ +#ifndef is_shstk_write +static inline bool is_shstk_write(unsigned long vm_flags) +{ + return false; +} +#endif + /* * Architecture PAGE_KERNEL_* fallbacks * diff --git a/mm/memory.c b/mm/memory.c index f88c351aecd4..b9bee283aad3 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -4128,7 +4128,10 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf) entry = mk_pte(page, vma->vm_page_prot); entry = pte_sw_mkyoung(entry); - if (vma->vm_flags & VM_WRITE) + + if (is_shstk_write(vma->vm_flags)) + entry = pte_mkwrite_shstk(pte_mkdirty(entry)); + else if (vma->vm_flags & VM_WRITE) entry = pte_mkwrite(pte_mkdirty(entry)); vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address, diff --git a/mm/migrate_device.c b/mm/migrate_device.c index 6fa682eef7a0..4c21c600bf46 100644 --- a/mm/migrate_device.c +++ b/mm/migrate_device.c @@ -641,7 +641,9 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate, goto abort; } entry = mk_pte(page, vma->vm_page_prot); - if (vma->vm_flags & VM_WRITE) + if (is_shstk_write(vma->vm_flags)) + entry = pte_mkwrite_shstk(pte_mkdirty(entry)); + else if (vma->vm_flags & VM_WRITE) entry = pte_mkwrite(pte_mkdirty(entry)); } diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index 3d0fef3980b3..503135b079b6 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -63,6 +63,7 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd, int ret; pte_t _dst_pte, *dst_pte; bool writable = dst_vma->vm_flags & VM_WRITE; + bool shstk = dst_vma->vm_flags & VM_SHADOW_STACK; bool vm_shared = dst_vma->vm_flags & VM_SHARED; bool page_in_cache = page->mapping; spinlock_t *ptl; @@ -83,9 +84,12 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd, writable = false; } - if (writable) - _dst_pte = pte_mkwrite(_dst_pte); - else + if (writable) { + if (shstk) + _dst_pte = pte_mkwrite_shstk(_dst_pte); + else + _dst_pte = pte_mkwrite(_dst_pte); + } else /* * We need this to make sure write bit removed; as mk_pte() * could return a pte with write bit set. From patchwork Fri Nov 4 22:35:45 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 15835 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:6687:0:0:0:0:0 with SMTP id l7csp679575wru; Fri, 4 Nov 2022 15:43:35 -0700 (PDT) X-Google-Smtp-Source: AMsMyM40TwTOA+SUgCDKh0qajB7bOAI/ZBZsQioax+tu3sFSji16ZqGygs5H42KHgzUuDmnezGVL X-Received: by 2002:a17:906:8592:b0:7ae:7b1:df4d with SMTP id v18-20020a170906859200b007ae07b1df4dmr14903374ejx.716.1667601815600; Fri, 04 Nov 2022 15:43:35 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1667601815; cv=none; d=google.com; s=arc-20160816; b=F7alpbeuxZ6tKFwuEDKIR5bjcfHcIYzL0EezBdb4DCsH3eDcrQa4qPRSUbbGj2VndH weqGeOMAVxzBaJeYf1+4TFmzfHMYjvbCyi0SixZhajN3yBaEMB16d7B/XKwF/Zz7t+DD VhoxuZqWguIIQLRhwppg/FM75X59hIbgToQH33Ft0DeDTCYRONFFUVV5UYW1qebHdVnD v00YpcmLhLe7MQN1rwu947UfG3lHDm/532D+Vaxd1QnPw7lkwQUXjEMJfxJozd5l8ayd RORqp+2b1kDnAH9hBB47Z8l4lyouBXdhmaubD61hKFpxnxk4NecbNryW51xrfw2vsKzz 3OCg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=DmuQx6Ay/1YJtKD1F62EnxEuTFUb7Q0JnaLc98wFWNY=; b=qXba2Pjn2y32qecJ3MStaHBSKzNv9hVAr/ZvtEYV7YfWk0DDjbzn4Ipqirl3iBY99+ Za4+7ZfL8oADeJ3ADS2ikK75iI5Rl92q7lf91j0pJuQeO4Zh/ywHKIVT6t77wuWDv6vy j5cSNN8Q9hagWnq2nvah5MYhWhS2cR2VQFyvYyJY+pxTKIEZ6gGkYQjJFy3kJScBGXkm M+RL7fqQMgOle0NoIE9gWD73A4wyINiN1kHw+WdX9iMptq/s46gsHLVicJwH7DasuqR3 lzpSbuEA5cPehPOidYESkCLxaaqp2KNj2h8yDzcUGEyW8WMm8n5uFBuJis4MXu0ie4wE khWg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=eDoBbnBJ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id qf26-20020a1709077f1a00b0079195d87013si357650ejc.713.2022.11.04.15.43.01; Fri, 04 Nov 2022 15:43:35 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=eDoBbnBJ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230318AbiKDWm1 (ORCPT + 99 others); Fri, 4 Nov 2022 18:42:27 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47220 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230182AbiKDWlf (ORCPT ); Fri, 4 Nov 2022 18:41:35 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7B88A4B9AF; Fri, 4 Nov 2022 15:39:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1667601580; x=1699137580; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=jeXRfJlKrlPUUKU7HYMbOtf6yG0qFBUWZwuPnRIvISw=; b=eDoBbnBJCuqQNGf6vc8cF6xjiDOnD/CQKDSrrEtF5JlfH0lnJpReNqDD +847OjwkI5JHCsH4Uf44j1U95ypgHsk3uI8WY2xxROfEr90bmqSfxMzOP iOpNmjPUnHfTXet7ZekE2QyoFHV7YekE0v7yyD2dEGi+Pm6z+KK4GZxOf pTUNBDZ6QWeBfn7vQ2XemQTZrbf6D6vr5qqUgIAi8C5RRWtAqQI8bxRMg RYUyLpFyyJCzCuw6Km6dWfvAL9bAQAoNViVk0AzURK2g3K8FykcIkpMQU 8rMKXihcLTFRVtbznihsnuPqU1aAoOCrMGbXarhdn3FIc2/waCO+7MHMp A==; X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="311840554" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="311840554" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:38 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="668514077" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="668514077" Received: from adhjerms-mobl1.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.212.227.68]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:37 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v3 18/37] mm: Add guard pages around a shadow stack. Date: Fri, 4 Nov 2022 15:35:45 -0700 Message-Id: <20221104223604.29615-19-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221104223604.29615-1-rick.p.edgecombe@intel.com> References: <20221104223604.29615-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-5.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1748607241840148265?= X-GMAIL-MSGID: =?utf-8?q?1748607241840148265?= From: Yu-cheng Yu The x86 Control-flow Enforcement Technology (CET) feature includes a new type of memory called shadow stack. This shadow stack memory has some unusual properties, which requires some core mm changes to function properly. The architecture of shadow stack constrains the ability of userspace to move the shadow stack pointer (SSP) in order to prevent corrupting or switching to other shadow stacks. The RSTORSSP can move the spp to different shadow stacks, but it requires a specially placed token in order to do this. However, the architecture does not prevent incrementing the stack pointer to wander onto an adjacent shadow stack. To prevent this in software, enforce guard pages at the beginning of shadow stack vmas, such that there will always be a gap between adjacent shadow stacks. Make the gap big enough so that no userspace SSP changing operations (besides RSTORSSP), can move the SSP from one stack to the next. The SSP can increment or decrement by CALL, RET and INCSSP. CALL and RET can move the SSP by a maximum of 8 bytes, at which point the shadow stack would be accessed. The INCSSP instruction can also increment the shadow stack pointer. It is the shadow stack analog of an instruction like: addq $0x80, %rsp However, there is one important difference between an ADD on %rsp and INCSSP. In addition to modifying SSP, INCSSP also reads from the memory of the first and last elements that were "popped". It can be thought of as acting like this: READ_ONCE(ssp); // read+discard top element on stack ssp += nr_to_pop * 8; // move the shadow stack READ_ONCE(ssp-8); // read+discard last popped stack element The maximum distance INCSSP can move the SSP is 2040 bytes, before it would read the memory. Therefore a single page gap will be enough to prevent any operation from shifting the SSP to an adjacent stack, since it would have to land in the gap at least once, causing a fault. This could be accomplished by using VM_GROWSDOWN, but this has a downside. The behavior would allow shadow stack's to grow, which is unneeded and adds a strange difference to how most regular stacks work. Tested-by: Pengfei Xu Tested-by: John Allen Reviewed-by: Kees Cook Signed-off-by: Yu-cheng Yu Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Cc: Kees Cook --- v2: - Use __weak instead of #ifdef (Dave Hansen) - Only have start gap on shadow stack (Andy Luto) - Create stack_guard_start_gap() to not duplicate code in an arch version of vm_start_gap() (Dave Hansen) - Improve commit log partly with verbiage from (Dave Hansen) Yu-cheng v25: - Move SHADOW_STACK_GUARD_GAP to arch/x86/mm/mmap.c. Yu-cheng v24: - Instead changing vm_*_gap(), create x86-specific versions. arch/x86/mm/mmap.c | 23 +++++++++++++++++++++++ include/linux/mm.h | 11 ++++++----- mm/mmap.c | 7 +++++++ 3 files changed, 36 insertions(+), 5 deletions(-) diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c index c90c20904a60..66da1f3298b0 100644 --- a/arch/x86/mm/mmap.c +++ b/arch/x86/mm/mmap.c @@ -248,3 +248,26 @@ bool pfn_modify_allowed(unsigned long pfn, pgprot_t prot) return false; return true; } + +unsigned long stack_guard_start_gap(struct vm_area_struct *vma) +{ + if (vma->vm_flags & VM_GROWSDOWN) + return stack_guard_gap; + + /* + * Shadow stack pointer is moved by CALL, RET, and INCSSP(Q/D). + * INCSSPQ moves shadow stack pointer up to 255 * 8 = ~2 KB + * (~1KB for INCSSPD) and touches the first and the last element + * in the range, which triggers a page fault if the range is not + * in a shadow stack. Because of this, creating 4-KB guard pages + * around a shadow stack prevents these instructions from going + * beyond. + * + * Creation of VM_SHADOW_STACK is tightly controlled, so a vma + * can't be both VM_GROWSDOWN and VM_SHADOW_STACK + */ + if (vma->vm_flags & VM_SHADOW_STACK) + return PAGE_SIZE; + + return 0; +} diff --git a/include/linux/mm.h b/include/linux/mm.h index 5d9536fa860a..0a3f7e2b32df 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2832,15 +2832,16 @@ struct vm_area_struct *vma_lookup(struct mm_struct *mm, unsigned long addr) return mtree_load(&mm->mm_mt, addr); } +unsigned long stack_guard_start_gap(struct vm_area_struct *vma); + static inline unsigned long vm_start_gap(struct vm_area_struct *vma) { + unsigned long gap = stack_guard_start_gap(vma); unsigned long vm_start = vma->vm_start; - if (vma->vm_flags & VM_GROWSDOWN) { - vm_start -= stack_guard_gap; - if (vm_start > vma->vm_start) - vm_start = 0; - } + vm_start -= gap; + if (vm_start > vma->vm_start) + vm_start = 0; return vm_start; } diff --git a/mm/mmap.c b/mm/mmap.c index 2def55555e05..f67606fbc464 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -281,6 +281,13 @@ SYSCALL_DEFINE1(brk, unsigned long, brk) return origbrk; } +unsigned long __weak stack_guard_start_gap(struct vm_area_struct *vma) +{ + if (vma->vm_flags & VM_GROWSDOWN) + return stack_guard_gap; + return 0; +} + #if defined(CONFIG_DEBUG_VM_MAPLE_TREE) extern void mt_validate(struct maple_tree *mt); extern void mt_dump(const struct maple_tree *mt); From patchwork Fri Nov 4 22:35:46 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 15834 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:6687:0:0:0:0:0 with SMTP id l7csp679573wru; Fri, 4 Nov 2022 15:43:35 -0700 (PDT) X-Google-Smtp-Source: AMsMyM7HYfwFq7+dCTW/RiUsx8a5VKzWFuZmM2l9U9QbBGOKG9IlH7Cphb0oze1VQp8Vw/nlDCja X-Received: by 2002:a17:907:c1e:b0:7ae:31a0:571e with SMTP id ga30-20020a1709070c1e00b007ae31a0571emr5573961ejc.690.1667601815221; Fri, 04 Nov 2022 15:43:35 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1667601815; cv=none; d=google.com; s=arc-20160816; b=LmPbEBHMUPYQO0FnXkFM/NK58+Wj/P9X+hh3h08sP+sSY0ffPb+2nihCo6X/hn70ee JMZ4TSM/R9am3K22uqVsqV50DUVmAWPeRix+paijfHW5MGvOTNWj3vF5Le3fCR54YUrJ YSFRk/xiI/LjKc3b12rN8Rqzpq5YIGvOCSchHicDXAHoJRAZb9NL+YPdEPNxs0gLqU3X 1guT1RizPpoxnL01I3Lyz/ajS1iQPUt9iCwQsIYAdYGgJTZmnMLS4aPA431mKYOqSkxV HJQ0J0zWQGzdl3EGBwwF6xmucDW8XN6KFmAjED2BFRJW/kOWQSRI0LzDAGS2qNiFooU3 otcQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=OS38q1UI76P3zo2mEE1/FuGGSmIk3DJz+B+bAUpkUYk=; b=Bm5yi60d92VVfgKvjbLnkfdCpqK2rRIl6d1mUK8OsnzCtvu2GSbFaDbamMMOPlpAQ5 ZoktpnfaQaiDt4Q3tZDbh0m9h58TWMAjC/xQb9GFGtnfl67Hc8HsryRUVugh21rAewG/ MvqLbZv1Z5bc8Ce1/lehJUWe5+OvQextLMiBWxt0pTH8r+N/fxrHyHKFFCRTUr5cBYx2 vYFIj3SYvCniBeBgwAqSdT1Dk1gMAoEhdViuYB3trZa0t2WWxctwEEHZR49dpU3DfL4w xxYpJtyZPk7xkrmowQkOdrYAMa2ouKJD7vgnNsnkqOSiq7s210Mgh5n7HjhKicIgPPqs 6nzg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=hVRKlIIS; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id fj21-20020a0564022b9500b0045d15f39bf4si777092edb.479.2022.11.04.15.43.05; Fri, 04 Nov 2022 15:43:35 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=hVRKlIIS; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230200AbiKDWmd (ORCPT + 99 others); Fri, 4 Nov 2022 18:42:33 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49510 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230189AbiKDWlf (ORCPT ); Fri, 4 Nov 2022 18:41:35 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A0F2C4E416; Fri, 4 Nov 2022 15:39:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1667601580; x=1699137580; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=QCRYLxuft9By+ea93NBBljihOI4kEYqg35nP/V5qBtI=; b=hVRKlIISj5c7QqVHNns+LxPHtM8v7CamYB0xN3pu3JkES5y33sKnwY1u 7PK6E/N2nH1HECFvM0BT2kVsi96U/+Bk8c2OD2mpBRnw/xlw6t9EZrLhc l1mkRI1FF5cvpVZiagyO2fMo5priKOKPCY3i7wIpBOFQycdzoqNdBvMat PYOGjRqmrFqTRGFr3Lw2bgv+pof3on4eEqHwz1LCongwNLSH6SD8TOpmH fWcnPMhd6wZZv8js63x90RqQ09pJfo5nQ1UK6TbbpEmalDyeddMB5Xkvs PedHk2Kq6IBXxFa3gDrbYupNt4YLEpCmH631pclIEf9vlPX/5vI7sdikz Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="311840556" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="311840556" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:39 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="668514085" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="668514085" Received: from adhjerms-mobl1.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.212.227.68]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:38 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v3 19/37] mm/mmap: Add shadow stack pages to memory accounting Date: Fri, 4 Nov 2022 15:35:46 -0700 Message-Id: <20221104223604.29615-20-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221104223604.29615-1-rick.p.edgecombe@intel.com> References: <20221104223604.29615-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-5.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1748607240861120764?= X-GMAIL-MSGID: =?utf-8?q?1748607240861120764?= From: Yu-cheng Yu The x86 Control-flow Enforcement Technology (CET) feature includes a new type of memory called shadow stack. This shadow stack memory has some unusual properties, which requires some core mm changes to function properly. Account shadow stack pages to stack memory. Tested-by: Pengfei Xu Tested-by: John Allen Signed-off-by: Yu-cheng Yu Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Cc: Kees Cook --- v3: - Remove unneeded VM_SHADOW_STACK check in accountable_mapping() (Kirill) v2: - Remove is_shadow_stack_mapping() and just change it to directly bitwise and VM_SHADOW_STACK. Yu-cheng v26: - Remove redundant #ifdef CONFIG_MMU. Yu-cheng v25: - Remove #ifdef CONFIG_ARCH_HAS_SHADOW_STACK for is_shadow_stack_mapping(). mm/mmap.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/mm/mmap.c b/mm/mmap.c index f67606fbc464..4dc157869b34 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -3298,6 +3298,8 @@ void vm_stat_account(struct mm_struct *mm, vm_flags_t flags, long npages) mm->exec_vm += npages; else if (is_stack_mapping(flags)) mm->stack_vm += npages; + else if (flags & VM_SHADOW_STACK) + mm->stack_vm += npages; else if (is_data_mapping(flags)) mm->data_vm += npages; } From patchwork Fri Nov 4 22:35:47 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 15836 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:6687:0:0:0:0:0 with SMTP id l7csp679741wru; Fri, 4 Nov 2022 15:43:55 -0700 (PDT) X-Google-Smtp-Source: AMsMyM4UosyyD/u5hB6NjMFK8uYBk8hZAImCEAIdvYIsZGnQjcWRR/pUPZJBwrQDVPb9m+GnsQkj X-Received: by 2002:a17:907:a02:b0:7ac:9a51:3403 with SMTP id bb2-20020a1709070a0200b007ac9a513403mr37270183ejc.220.1667601835040; Fri, 04 Nov 2022 15:43:55 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1667601835; cv=none; d=google.com; s=arc-20160816; b=yB+HmQEnUgP4piAC/ELterwqkz9Xl8Ok3mEE82W6uUhs9N6iSt876W8IG0Eoe8DR31 68lDtW7W0HH3OXMceh5Llirt5RzS/6f2LGChBP/rs9CzV3vuld2adQgLf8CR6LopZVNO lkCktTB6nAnPGs0UTXebPY17s9b7WwEn0QDmDzN8OB1fqws7wEtOibGow36IpI0YiODC uZrEIWg/mgQwB+4Fj3WfluAL1rnyz6uIfgSu2XuIL8sZvVrkFijNfrDQ6yibwrDqY7m4 5ft/SnU34A0Ui2GggXJrqg/UqL8xzKC3z7aATEBhH/ntxKvj1FvJ2qNRQ+OOfcSgjD+5 LMmw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=opoH7NecdpHP4Fcjl0236tgId5JC2cqlPok8ay5lAPU=; b=UAaUOj1d4kFhAPY0i6SLc8GORnrQTrYuDQoNGn0ma3jsHx4JrFmrMPyuKMF6NpjSHO 1xgdogvWyGAwP6+qdtmzCHcpqzUE3iUKJ19BJXuYc9VK3++oxR8rL3pCmxSYNi40mYZS FhfV1hJQ0qwJfXI4eSUbmfgL/XcKlsQrlBv6SW2c/clNhfzRePxqZNrkX2/byiCicoJS S78NH1bniuajO8shKg4C0L/C+v6GG/0HMycoBrJjKqI7cDNZKPU7fvVEuRHaWl+P7kYO V1G8RRiLqHRpGc1CoabcSo+1s/uIB1oK22cf2YDvnYg+XOHKFI5bnq9UGRGQlWOakNFU HsBA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=QOaLeC5h; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id qw8-20020a1709066a0800b0078d9f1f72bcsi329259ejc.726.2022.11.04.15.43.23; Fri, 04 Nov 2022 15:43:55 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=QOaLeC5h; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230239AbiKDWml (ORCPT + 99 others); Fri, 4 Nov 2022 18:42:41 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49548 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230214AbiKDWlh (ORCPT ); Fri, 4 Nov 2022 18:41:37 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 30E6E4E43F; Fri, 4 Nov 2022 15:39:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1667601582; x=1699137582; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=4aVCw5ODNp+SkJKgvPAW6TwS6KUC24UPNM4SuYiLUfg=; b=QOaLeC5hksoUbpgOqXEw0SygnmAo8GB4E2veLGgHokb83WqMo9qKmclN A5PkTqwoDrJ5DfvLC1pX3Gn5xaZe/IwPpSqXGrJbSYNHBVXD1v+bUcqOP Zib9IqyjDCv0z8EUx/Jj6muWAcQHcD7Tkwqq7vjJQTbmHt3CRI4vrABnY i9AU+UwabwTZk47lzfLJK7g7llXp2ZDr4RPf86aW6B1V2L/zILFlPHbMM VpiRpUzwVmBrcaGfqe3j0WoB1QC5mNbmCjrOPklGxP45H8FHajhL2ae19 VBEzDXbOddrcXqSutRnoECIEEz2gWBm+BBaHOUfyeUYoxbPP0ishNiA9F w==; X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="311840561" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="311840561" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:40 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="668514089" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="668514089" Received: from adhjerms-mobl1.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.212.227.68]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:39 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v3 20/37] mm/mprotect: Exclude shadow stack from preserve_write Date: Fri, 4 Nov 2022 15:35:47 -0700 Message-Id: <20221104223604.29615-21-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221104223604.29615-1-rick.p.edgecombe@intel.com> References: <20221104223604.29615-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-5.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1748607261921403744?= X-GMAIL-MSGID: =?utf-8?q?1748607261921403744?= From: Yu-cheng Yu The x86 Control-flow Enforcement Technology (CET) feature includes a new type of memory called shadow stack. This shadow stack memory has some unusual properties, which requires some core mm changes to function properly. In change_pte_range(), when a PTE is changed for prot_numa, _PAGE_RW is preserved to avoid the additional write fault after the NUMA hinting fault. However, pte_write() now includes both normal writable and shadow stack (Write=0, Dirty=1) PTEs, but the latter does not have _PAGE_RW and has no need to preserve it. Exclude shadow stack from preserve_write test, and apply the same change to change_huge_pmd(). Tested-by: Pengfei Xu Tested-by: John Allen Signed-off-by: Yu-cheng Yu Reviewed-by: Kirill A. Shutemov Signed-off-by: Rick Edgecombe --- Yu-cheng v25: - Move is_shadow_stack_mapping() to a separate line. Yu-cheng v24: - Change arch_shadow_stack_mapping() to is_shadow_stack_mapping(). mm/huge_memory.c | 7 +++++++ mm/mprotect.c | 7 +++++++ 2 files changed, 14 insertions(+) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 73b9b78f8cf4..7643a4db1b50 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1803,6 +1803,13 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, return 0; preserve_write = prot_numa && pmd_write(*pmd); + + /* + * Preserve only normal writable huge PMD, but not shadow + * stack (RW=0, Dirty=1). + */ + if (vma->vm_flags & VM_SHADOW_STACK) + preserve_write = false; ret = 1; #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION diff --git a/mm/mprotect.c b/mm/mprotect.c index 668bfaa6ed2a..ea82ce5f38fe 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -115,6 +115,13 @@ static unsigned long change_pte_range(struct mmu_gather *tlb, pte_t ptent; bool preserve_write = prot_numa && pte_write(oldpte); + /* + * Preserve only normal writable PTE, but not shadow + * stack (RW=0, Dirty=1). + */ + if (vma->vm_flags & VM_SHADOW_STACK) + preserve_write = false; + /* * Avoid trapping faults against the zero or KSM * pages. See similar comment in change_huge_pmd. From patchwork Fri Nov 4 22:35:48 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 15837 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:6687:0:0:0:0:0 with SMTP id l7csp679840wru; Fri, 4 Nov 2022 15:44:05 -0700 (PDT) X-Google-Smtp-Source: AMsMyM4BYbbccLSWLu6zecNnaxR9+F8FrBBXuhG122zDpYj18TckQ8j18OLO48UsIUiUawlHXtcL X-Received: by 2002:a17:907:7b95:b0:731:113a:d7a2 with SMTP id ne21-20020a1709077b9500b00731113ad7a2mr35492940ejc.377.1667601844907; Fri, 04 Nov 2022 15:44:04 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1667601844; cv=none; d=google.com; s=arc-20160816; b=SeklNJ7i80oXsewk5YzMyFBBre/2/rQshksuOXliPvZokRsY/dpWWzhMwyDjfrMSYM AXwVfI5TqbwK/BTt+9CjdgOB2M4p4QdeTi68WjexFsn/NqlMGOzc2X91cAiUNlSaccIE TewogA5J/xJoSu0wMNlPExUZoesHz37EFp8YZoct+x2gMcvwFyWiEnb0OVlR1K9eHeMY 0r+Kfp6zoq84xvWBNATkecz17wW5a9MIfKJrVnhykfa2XisTbmrTAR9BcNzGNno7CfTe CN8wm+cRKq2wH/rFakOdxnMVWSd0SSZIZZs4Knt1xUKXM2sdN43VJHA9/ceoJ7oLuiiY Zc7g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=A9KOgrg6TEFWegeB56vFKLF9KnYohsAxssC1Tpd+Ro4=; b=eX61UIBmVqvxlbNUSRnuKEUDgBMcAg5XFcuc3y1jWmqdwLykOdoqWN/YAyidUFFE8a iwCs+GyhfFpsRcwU9FulHzRNeGVVDLidK0j7D0gNIi7j4/h6dQb65IaGkeHLMUmKETO/ DLW93kmFq4kbNnKX+0ls1r/BFQM3ufwTCYFLgvk2Ng89gFd0Rmai/QCWwLBjMQqSLm2l jEJyBJJPIfspPfs75ASW1e0mGvyNfqRP+Q/oqqoEjCFpqZ+vZRyEMO4FGzRti93x0eaZ 7GcWO5mqV0Llwtk0WrHgYgvyG4HSOkeXgCLnzhUuS+5SW0pSs6xCc1WF9QE441TetMq5 Xwpg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=e6pBsCrx; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id u17-20020a50c2d1000000b00461be535949si673730edf.424.2022.11.04.15.43.41; Fri, 04 Nov 2022 15:44:04 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=e6pBsCrx; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230283AbiKDWnA (ORCPT + 99 others); Fri, 4 Nov 2022 18:43:00 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48676 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230285AbiKDWl7 (ORCPT ); Fri, 4 Nov 2022 18:41:59 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 503AA51C09; Fri, 4 Nov 2022 15:39:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1667601586; x=1699137586; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=IWLykOH+9RNL2fQqoktCgHl0/j5L54Xy0tPhl/NmdTI=; b=e6pBsCrxxdKIASNO3lGpF5P75UsBHPBLzVSNbGfbHWqocuNc1TSPakMM 4SgkEhNnRY3mp1BrdLv/wQOHMmefvdBCKDHvqPuLCG2oU0fkslP4JAjDt xFZl+0P9E+i+9yKCUIsjkFVTw6sdX9s9JZ8cJB/1lJTPd4XEiR1NXSOpF QEsuirjeIfJG7Ins6ayGZvf+VRdEtC7OrQwp0JfEoehreY083Yb9lonbz 6jh+596Efp5RZMh6XrSQ9EkJhH5SH5a18kiiPBVi2h8r2y7+j77TWgZH3 5FJFvk6F/+V+o/J71sC7qd1XVz+Zap8Yo2eRE4E1MwqzDsV8rSUmB5hoL w==; X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="311840565" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="311840565" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:41 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="668514093" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="668514093" Received: from adhjerms-mobl1.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.212.227.68]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:40 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v3 21/37] mm: Re-introduce vm_flags to do_mmap() Date: Fri, 4 Nov 2022 15:35:48 -0700 Message-Id: <20221104223604.29615-22-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221104223604.29615-1-rick.p.edgecombe@intel.com> References: <20221104223604.29615-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-5.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1748607272233613859?= X-GMAIL-MSGID: =?utf-8?q?1748607272233613859?= From: Yu-cheng Yu There was no more caller passing vm_flags to do_mmap(), and vm_flags was removed from the function's input by: commit 45e55300f114 ("mm: remove unnecessary wrapper function do_mmap_pgoff()"). There is a new user now. Shadow stack allocation passes VM_SHADOW_STACK to do_mmap(). Thus, re-introduce vm_flags to do_mmap(). Tested-by: Pengfei Xu Tested-by: John Allen Signed-off-by: Yu-cheng Yu Reviewed-by: Peter Collingbourne Reviewed-by: Kees Cook Reviewed-by: Kirill A. Shutemov Signed-off-by: Rick Edgecombe Cc: Andrew Morton Cc: Oleg Nesterov Cc: linux-mm@kvack.org --- fs/aio.c | 2 +- include/linux/mm.h | 3 ++- ipc/shm.c | 2 +- mm/mmap.c | 10 +++++----- mm/nommu.c | 4 ++-- mm/util.c | 2 +- 6 files changed, 12 insertions(+), 11 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index 5b2ff20ad322..66119297125a 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -554,7 +554,7 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events) ctx->mmap_base = do_mmap(ctx->aio_ring_file, 0, ctx->mmap_size, PROT_READ | PROT_WRITE, - MAP_SHARED, 0, &unused, NULL); + MAP_SHARED, 0, 0, &unused, NULL); mmap_write_unlock(mm); if (IS_ERR((void *)ctx->mmap_base)) { ctx->mmap_size = 0; diff --git a/include/linux/mm.h b/include/linux/mm.h index 0a3f7e2b32df..c9b387b905df 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2742,7 +2742,8 @@ extern unsigned long mmap_region(struct file *file, unsigned long addr, struct list_head *uf); extern unsigned long do_mmap(struct file *file, unsigned long addr, unsigned long len, unsigned long prot, unsigned long flags, - unsigned long pgoff, unsigned long *populate, struct list_head *uf); + vm_flags_t vm_flags, unsigned long pgoff, unsigned long *populate, + struct list_head *uf); extern int do_mas_munmap(struct ma_state *mas, struct mm_struct *mm, unsigned long start, size_t len, struct list_head *uf, bool downgrade); diff --git a/ipc/shm.c b/ipc/shm.c index 7d86f058fb86..11e98de7e522 100644 --- a/ipc/shm.c +++ b/ipc/shm.c @@ -1646,7 +1646,7 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg, goto invalid; } - addr = do_mmap(file, addr, size, prot, flags, 0, &populate, NULL); + addr = do_mmap(file, addr, size, prot, flags, 0, 0, &populate, NULL); *raddr = addr; err = 0; if (IS_ERR_VALUE(addr)) diff --git a/mm/mmap.c b/mm/mmap.c index 4dc157869b34..47a8a7d9c560 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1246,11 +1246,11 @@ static inline bool file_mmap_ok(struct file *file, struct inode *inode, */ unsigned long do_mmap(struct file *file, unsigned long addr, unsigned long len, unsigned long prot, - unsigned long flags, unsigned long pgoff, - unsigned long *populate, struct list_head *uf) + unsigned long flags, vm_flags_t vm_flags, + unsigned long pgoff, unsigned long *populate, + struct list_head *uf) { struct mm_struct *mm = current->mm; - vm_flags_t vm_flags; int pkey = 0; validate_mm(mm); @@ -1311,7 +1311,7 @@ unsigned long do_mmap(struct file *file, unsigned long addr, * to. we assume access permissions have been handled by the open * of the memory object, so we don't do any here. */ - vm_flags = calc_vm_prot_bits(prot, pkey) | calc_vm_flag_bits(flags) | + vm_flags |= calc_vm_prot_bits(prot, pkey) | calc_vm_flag_bits(flags) | mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC; if (flags & MAP_LOCKED) @@ -2880,7 +2880,7 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size, file = get_file(vma->vm_file); ret = do_mmap(vma->vm_file, start, size, - prot, flags, pgoff, &populate, NULL); + prot, flags, 0, pgoff, &populate, NULL); fput(file); out: mmap_write_unlock(mm); diff --git a/mm/nommu.c b/mm/nommu.c index 214c70e1d059..20ff1ec89091 100644 --- a/mm/nommu.c +++ b/mm/nommu.c @@ -1042,6 +1042,7 @@ unsigned long do_mmap(struct file *file, unsigned long len, unsigned long prot, unsigned long flags, + vm_flags_t vm_flags, unsigned long pgoff, unsigned long *populate, struct list_head *uf) @@ -1049,7 +1050,6 @@ unsigned long do_mmap(struct file *file, struct vm_area_struct *vma; struct vm_region *region; struct rb_node *rb; - vm_flags_t vm_flags; unsigned long capabilities, result; int ret; MA_STATE(mas, ¤t->mm->mm_mt, 0, 0); @@ -1069,7 +1069,7 @@ unsigned long do_mmap(struct file *file, /* we've determined that we can make the mapping, now translate what we * now know into VMA flags */ - vm_flags = determine_vm_flags(file, prot, flags, capabilities); + vm_flags |= determine_vm_flags(file, prot, flags, capabilities); /* we're going to need to record the mapping */ diff --git a/mm/util.c b/mm/util.c index 12984e76767e..aefe4fae7ecf 100644 --- a/mm/util.c +++ b/mm/util.c @@ -517,7 +517,7 @@ unsigned long vm_mmap_pgoff(struct file *file, unsigned long addr, if (!ret) { if (mmap_write_lock_killable(mm)) return -EINTR; - ret = do_mmap(file, addr, len, prot, flag, pgoff, &populate, + ret = do_mmap(file, addr, len, prot, flag, 0, pgoff, &populate, &uf); mmap_write_unlock(mm); userfaultfd_unmap_complete(mm, &uf); From patchwork Fri Nov 4 22:35:49 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 15838 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:6687:0:0:0:0:0 with SMTP id l7csp679891wru; Fri, 4 Nov 2022 15:44:08 -0700 (PDT) X-Google-Smtp-Source: AMsMyM7lHoaeEsLsX8OM9/UqACvilKCXaSIKxk6GMkCLToW7SQLzX+6XoWV5jKh+T2/gAHQEnhB8 X-Received: by 2002:a17:907:a423:b0:7ad:b5f1:8ff8 with SMTP id sg35-20020a170907a42300b007adb5f18ff8mr33912835ejc.241.1667601848436; Fri, 04 Nov 2022 15:44:08 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1667601848; cv=none; d=google.com; s=arc-20160816; b=bL5TlUqHwCUYCbWlJRejuihcgiymSSMPkgjuXHgAh8x1u5VLj0jrYcLxUA7F+4aB+Y 8+23tNUWcXAP6GhOZy0MkpI1LnHEf10DQxMk0mnukDYfAYm4ltHztMK6aTT5Gxr04hzl lCCU0PN36sf6cIcoJexUF+yy/nt+gI9WtAYARKoPKFR5so8I1f5iIPb2xkm1vupliBhO kNIF2UvYmzVijVJJzqLcbMqhxgB7u89gV5yScJnDD4kB5JIigAu+JBR6yMNHAo982pfa 92rd+++rAlV4pdFQa0MpH6AHgwbW5BLoi4dUa+tlK4nmN20WcpDIddjPWc5z7ulUp1ik 55hw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=RivJ7Jf0eqPqaFzyb06995WGh+0/by2/HPOSpRY/UWg=; b=qUS1CisMoNt5x5GdmuslvgcV39Y8IX38/e72wdxTgHCL+mET5l4j2FXb3rrCfe33c1 EbGIGwsyKvGK1pSyUH4VEeVcxTIhAMabkwetvIu60/TS2TBmQ84SL9VWIa9MMKHsErKl qhvu1sHSo0mtdCkKYDrNcIEWM5zIbMBBTTYLNGkSCiqNMvVopcDneNWzrMmxLJU8S6aX Rwxe4HSdScD/0ngNC6NKzON0vVENpdoPdVOQOExeY7bEepamcl3abYVk6vwwjzOjpWog Pfi7GVjELkMbhR4PVgCVsDLQEDn4iEvPXpJzTYk4E4Gl9Ufqtt5gdM6LPVXMukFFJwSE XT1g== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=DGAXFOrt; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id ec16-20020a170906b6d000b007ae4717bf1esi305496ejb.99.2022.11.04.15.43.45; Fri, 04 Nov 2022 15:44:08 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=DGAXFOrt; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230296AbiKDWnF (ORCPT + 99 others); Fri, 4 Nov 2022 18:43:05 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51520 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230298AbiKDWmW (ORCPT ); Fri, 4 Nov 2022 18:42:22 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6C2AD51C2C; Fri, 4 Nov 2022 15:39:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1667601587; x=1699137587; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=kPqO2kwM78EGnIAhrKsxSWDtILBo5g6MXomNZ5elCHU=; b=DGAXFOrtwsEH7TIFKWN3+OLNoiH39FOkhIVerZx8QcRLr4gUntxODpvF W1W/HX8GLOlwSZFE8h44wZy47pIiraGN1p1nolxc9up/kmGwQCEv89Mw0 y9hYICJpEn6Dv3qWpRolN42+qHKxJBejnuLnxRqQsSIwWiQVb3h14EaoV rK51yBCuB7AQqIPsnnPUN63tUZwAiTw0y/ld5/UOYYkX0DSrDqWrz+XgW TEKWSxaJdpz6cSnUtIMiaEZJzWtNZM9+XMC5qZ2CvfLVJ4AoXoQk+RNAM XH+mc8xBRs+D79t5EAJnAiUx7M6bLuqrx1MkTPkc7LResKBKXbcVDQv1O Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="311840568" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="311840568" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:42 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="668514096" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="668514096" Received: from adhjerms-mobl1.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.212.227.68]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:41 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org Cc: rick.p.edgecombe@intel.com Subject: [PATCH v3 22/37] mm: Don't allow write GUPs to shadow stack memory Date: Fri, 4 Nov 2022 15:35:49 -0700 Message-Id: <20221104223604.29615-23-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221104223604.29615-1-rick.p.edgecombe@intel.com> References: <20221104223604.29615-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-5.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1748607275779062718?= X-GMAIL-MSGID: =?utf-8?q?1748607275779062718?= The x86 Control-flow Enforcement Technology (CET) feature includes a new type of memory called shadow stack. This shadow stack memory has some unusual properties, which requires some core mm changes to function properly. Shadow stack memory is writable only in very specific, controlled ways. However, since it is writable, the kernel treats it as such. As a result there remain many ways for userspace to trigger the kernel to write to shadow stack's via get_user_pages(, FOLL_WRITE) operations. To make this a little less exposed, block writable GUPs for shadow stack VMAs. Still allow FOLL_FORCE to write through shadow stack protections, as it does for read-only protections. Tested-by: Pengfei Xu Tested-by: John Allen Signed-off-by: Rick Edgecombe --- v3: - Add comment in __pte_access_permitted() (Dave) - Remove unneeded shadow stack specific check in __pte_access_permitted() (Jann) arch/x86/include/asm/pgtable.h | 5 +++++ mm/gup.c | 2 +- 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index d57dc1b2d3e8..0d18f3a4373d 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -1637,6 +1637,11 @@ static inline bool __pte_access_permitted(unsigned long pteval, bool write) { unsigned long need_pte_bits = _PAGE_PRESENT|_PAGE_USER; + /* + * Write=0,Dirty=1 PTEs are shadow stack, which the kernel + * shouldn't generally allow access to, but since they + * are already Write=0, the below logic covers both cases. + */ if (write) need_pte_bits |= _PAGE_RW; diff --git a/mm/gup.c b/mm/gup.c index fe195d47de74..411befd4d431 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -1062,7 +1062,7 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags) return -EFAULT; if (write) { - if (!(vm_flags & VM_WRITE)) { + if (!(vm_flags & VM_WRITE) || (vm_flags & VM_SHADOW_STACK)) { if (!(gup_flags & FOLL_FORCE)) return -EFAULT; /* From patchwork Fri Nov 4 22:35:50 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 15839 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:6687:0:0:0:0:0 with SMTP id l7csp680051wru; Fri, 4 Nov 2022 15:44:26 -0700 (PDT) X-Google-Smtp-Source: AMsMyM6jO12C8FLIPDwctkbDeLnqNsmTXThhjEah3bNErhqPAPssqxIWWJkm0aEljwlyzxnu4Za9 X-Received: by 2002:a17:907:94c3:b0:78e:2866:f89f with SMTP id dn3-20020a17090794c300b0078e2866f89fmr35655503ejc.617.1667601866339; Fri, 04 Nov 2022 15:44:26 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1667601866; cv=none; d=google.com; s=arc-20160816; b=tiDjnL2mGMxnU6aZ63G+BEGzFkqt8AgowYw8USXkyzSK1P5MQoYwnJIq8YRlYnLA35 kgrU28gz/X5DCZFiKFOhREpQj2bKWnWibZkbnfXJ282iXoOJkiXOAcb3oVpPNoUorAQF ldPz58BQv1m27RqMUq72wm3+W5m2CddUPZ5GuH5NO4qbkp1WVgq2pz3OJbEQt+mW2U+u nMCNzFETLVw7625GcP/++TplY+UriYdjyucTE2six0HfD2KZ95byzEzIdK2ra+Qjp9Z6 EHBd6Nn9ndBYRNzxmXfIT931CjBIEfk3YwQaC4/nEftf/otGIPsO3WAbMcdf2NEAl1eA iChg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=4C0U4C/Am5K4X0WaSC2NtdD6FJYGChD7H8vpOqQxCM8=; b=bqu+A4omReoHufP0ymuW5ZHqw0zqs7r88V7XcGI+t8S7wdew6czvAppffGlMqvnxXZ WwwXKnonjZ8ei+VTmRmRnMaIg83A3FaBdR7RqopZEUKGsy7UY2cfMh3mNzx8ahYW0NtL /ZaHzUfTH3Rs7PrdFHBdQfPY82m6q8G+MNo7M7eBMJvhznxyb7cmK18CTklbo2pMpuIc K4vKXHzqtHcoFgAg68v2VpJX+6bqzLlA/9KSYwt/z9/nyvEXA1nbUFBTPF60coYl4i3Q CqFfZEznglLZStVpxrerwmIcT1C+iU/XPZIxeqGf8iqBob+Fl3uQL27dQqIJkJwXmPN8 XzkQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=UHBOlj4y; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id dn10-20020a17090794ca00b007acf3aed493si310302ejc.910.2022.11.04.15.44.03; Fri, 04 Nov 2022 15:44:26 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=UHBOlj4y; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230339AbiKDWnY (ORCPT + 99 others); Fri, 4 Nov 2022 18:43:24 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49534 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230322AbiKDWm1 (ORCPT ); Fri, 4 Nov 2022 18:42:27 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CD0FC51C22; Fri, 4 Nov 2022 15:39:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1667601589; x=1699137589; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=xaioYoGXdFCH6JyKO39LB3gea0qfxKRbZWJ+Tm5uBgg=; b=UHBOlj4ytEwvcBfpxBSgF8ylCmyjaOQ3RUgLA0v9ABEZuBCeQBcFGqEz BHUiXbvlEOjIz++ZUNUF16jm52Pvuw8yiED6ZWJk2Cr9xWmWYCLcPYOBq M5/D9jNgXB2ozyZuRKGDwdrJ7cjFBo/xtrBy5sCbkVJhSIj8DZw8SLim3 m2XLT7XjEWfpii+NVFe7iA8PLNonW+O1r5YU0p5Xxhu6wnDmWlJqMpdwo rCROKQ/wivWcjKQy/xIjHQxh1tSADNy+iImb/Pk+ja/w0eP37uqRPtxyP YG2pIGNhJACbjIyPjMEiQ5+XG/ubWtiCNtfchN/f5gSf4NnIRW74eVnwr w==; X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="311840573" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="311840573" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:43 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="668514099" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="668514099" Received: from adhjerms-mobl1.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.212.227.68]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:42 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org Cc: rick.p.edgecombe@intel.com Subject: [PATCH v3 23/37] mm: Warn on shadow stack memory in wrong vma Date: Fri, 4 Nov 2022 15:35:50 -0700 Message-Id: <20221104223604.29615-24-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221104223604.29615-1-rick.p.edgecombe@intel.com> References: <20221104223604.29615-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-5.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1748607294548575355?= X-GMAIL-MSGID: =?utf-8?q?1748607294548575355?= The x86 Control-flow Enforcement Technology (CET) feature includes a new type of memory called shadow stack. This shadow stack memory has some unusual properties, which requires some core mm changes to function properly. One sharp edge is that PTEs that are both Write=0 and Dirty=1 are treated as shadow by the CPU, but this combination used to be created by the kernel on x86. Previous patches have changed the kernel to now avoid creating these PTEs unless they are for shadow stack memory. In case any missed corners of the kernel are still creating PTEs like this for non-shadow stack memory, and to catch any re-introductions of the logic, warn if any shadow stack PTEs (Write=0, Dirty=1) are found in non-shadow stack VMAs when they are being zapped. This won't catch transient cases but should have decent coverage. It will be compiled out when shadow stack is not configured. In order to check if a pte is shadow stack in core mm code, add default implmentations for pte_shstk() and pmd_shstk(). Tested-by: Pengfei Xu Tested-by: John Allen Signed-off-by: Rick Edgecombe --- v3: - New patch arch/x86/include/asm/pgtable.h | 2 ++ include/linux/pgtable.h | 14 ++++++++++++++ mm/huge_memory.c | 2 ++ mm/memory.c | 2 ++ 4 files changed, 20 insertions(+) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 0d18f3a4373d..051c03e59468 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -129,6 +129,7 @@ static inline bool pte_dirty(pte_t pte) return pte_flags(pte) & _PAGE_DIRTY_BITS; } +#define pte_shstk pte_shstk static inline bool pte_shstk(pte_t pte) { if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) @@ -147,6 +148,7 @@ static inline bool pmd_dirty(pmd_t pmd) return pmd_flags(pmd) & _PAGE_DIRTY_BITS; } +#define pmd_shstk pmd_shstk static inline bool pmd_shstk(pmd_t pmd) { if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 36926a207b6d..dd84af70b434 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -500,6 +500,20 @@ static inline pte_t pte_mkwrite_shstk(pte_t pte) } #endif +#ifndef pte_shstk +static inline bool pte_shstk(pte_t pte) +{ + return false; +} +#endif + +#ifndef pmd_shstk +static inline bool pmd_shstk(pmd_t pte) +{ + return false; +} +#endif + #ifndef pte_clear_savedwrite #define pte_clear_savedwrite pte_wrprotect #endif diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 7643a4db1b50..2540f0d4c8ff 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1656,6 +1656,8 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, */ orig_pmd = pmdp_huge_get_and_clear_full(vma, addr, pmd, tlb->fullmm); + VM_WARN_ON_ONCE(!(vma->vm_flags & VM_SHADOW_STACK) && + pmd_shstk(orig_pmd)); tlb_remove_pmd_tlb_entry(tlb, pmd, addr); if (vma_is_special_huge(vma)) { if (arch_needs_pgtable_deposit()) diff --git a/mm/memory.c b/mm/memory.c index b9bee283aad3..4331f33a02d6 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1437,6 +1437,8 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb, continue; ptent = ptep_get_and_clear_full(mm, addr, pte, tlb->fullmm); + VM_WARN_ON_ONCE(!(vma->vm_flags & VM_SHADOW_STACK) && + pte_shstk(ptent)); tlb_remove_tlb_entry(tlb, pte, addr); zap_install_uffd_wp_if_needed(vma, addr, pte, details, ptent); From patchwork Fri Nov 4 22:35:51 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 15840 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:6687:0:0:0:0:0 with SMTP id l7csp680168wru; Fri, 4 Nov 2022 15:44:42 -0700 (PDT) X-Google-Smtp-Source: AMsMyM6saoc5GymxqF4jRx8Wo0brCqKIkaCXdYOsyKbIW7ouDasES6mF7Jq7r2rQi7RY1JjZHvYh X-Received: by 2002:aa7:c792:0:b0:453:98b7:213c with SMTP id n18-20020aa7c792000000b0045398b7213cmr37686097eds.159.1667601882322; Fri, 04 Nov 2022 15:44:42 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1667601882; cv=none; d=google.com; s=arc-20160816; b=DE+OkkIvnMV9BzOWSBx2gUWmus42k0LSNz0AfgnPt7LDS+N0ZX2KMuTDofSL+hMzdb 51+rhmZ9H5hCjzwSXzT7pdxp91PFCpPaHSyGkOal6f6ql/mH8QXSxxBmBLhVusx5ilr0 adjNwA1SI8fzUlJG8ACz7ryWczPwHm/4jZ4gDGBH07PzJkhB5vw7tiFWL2NNFf8uNCfS IzaIiABGV/fXDQM0GSOQQ5W9KkZk7K+V+MUVqJyGiAcksQRcYXS72ecHqpooNlx3evF8 raHuZPCB7NC9Z7+71b1yoe5ArFPIminnCzXwBQCWQl3VEhmmppciBoMOTeAQmzLSa7ta Yz4Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=UHIHbysnFHt1EQQMa1Yzx2h5UZfX5x8iqfxpSNvPs8A=; b=z78P2fRd3U7gi+zfX6bz/BY1f2O0zfyNggvgQDvYhUrMgGdnlfOYjxQhfCH3XiwMKw SaxYMVA1/mywZ+FfSz2sD4A2/DKq6PX699bnPlfXlwkkULYfET6N/VQc+9rVb3RtRAyh YkAa2w3hl0Kv9ppcbmMPGyf/snkzyYhF0L+5iWv2uBGp85X1wj/XOTvFWq156uLQ73fq feJjPiH6vZIvqZWj3eYjvpHBrOyZtlz7IvA8nknaUi2QfO9Zmx7n/svka8hl6IBHzyEl jvJTW1ObAE6p77dY8OhrxOeanOYYaLKpbKOGl4oFWZl4uKWK3pW6mpkQtICGq8MgMoUC MYiQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=l7kc6a2l; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id j27-20020a170906279b00b0078b41dcf4b8si234278ejc.479.2022.11.04.15.44.19; Fri, 04 Nov 2022 15:44:42 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=l7kc6a2l; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230280AbiKDWnl (ORCPT + 99 others); Fri, 4 Nov 2022 18:43:41 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47434 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230271AbiKDWnA (ORCPT ); Fri, 4 Nov 2022 18:43:00 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6BA6259FDF; Fri, 4 Nov 2022 15:39:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1667601598; x=1699137598; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=YH9KUY8mxtUoG/C7O+owx6kIzcNei0xchFyR6pwYJYk=; b=l7kc6a2lgzOgTJRzzXLDuIUJ5wNlXL0rkVlEyNEp+q9K6JjnXaK5JCNl 6cqhdu9R6mPtxGd/uLx1xou8AsbWAUj2/v0GHSd60NYZm85xYuFe0aQ/a ZDJtBstL6Jdpzlep1XLv43DGEZyd1O9YrlOjLcpPFSj5dYTZNIKEh9RoC q2jP2ZK9/ZNoPyNoJ9bRGCyOG7g5GrRDUK/POkcpjxgkwVAXxWx/ExJYG TfVYb4SfQJrwFFPpDebKbesFAyA31vZrPzgHPrT5cJq56fLf27H4IHOTl O20wgI6fNkStZSTI19COhaYVkjRp5zhbBIBGVasEMm+v3UuI+g8aPmFMV A==; X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="311840575" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="311840575" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:44 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="668514107" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="668514107" Received: from adhjerms-mobl1.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.212.227.68]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:43 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org Cc: rick.p.edgecombe@intel.com Subject: [PATCH v3 24/37] x86: Introduce userspace API for CET enabling Date: Fri, 4 Nov 2022 15:35:51 -0700 Message-Id: <20221104223604.29615-25-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221104223604.29615-1-rick.p.edgecombe@intel.com> References: <20221104223604.29615-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-5.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1748607311293832642?= X-GMAIL-MSGID: =?utf-8?q?1748607311293832642?= From: "Kirill A. Shutemov" Add three new arch_prctl() handles: - ARCH_CET_ENABLE/DISABLE enables or disables the specified feature. Returns 0 on success or an error. - ARCH_CET_LOCK prevents future disabling or enabling of the specified feature. Returns 0 on success or an error The features are handled per-thread and inherited over fork(2)/clone(2), but reset on exec(). This is preparation patch. It does not implement any features. Tested-by: Pengfei Xu Tested-by: John Allen Signed-off-by: Kirill A. Shutemov [tweaked with feedback from tglx] Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe --- v3: - Move shstk.c Makefile changes earlier (Kees) - Add #ifdef around features_locked and features (Kees) - Encapsulate features reset earlier in reset_thread_features() so features and features_locked are not referenced in code that would be compiled !CONFIG_X86_USER_SHADOW_STACK. (Kees) - Fix typo in commit log (Kees) - Switch arch_prctl() numbers to avoid conflict with LAM v2: - Only allow one enable/disable per call (tglx) - Return error code like a normal arch_prctl() (Alexander Potapenko) - Make CET only (tglx) arch/x86/include/asm/cet.h | 21 +++++++++++++++ arch/x86/include/asm/processor.h | 5 ++++ arch/x86/include/uapi/asm/prctl.h | 6 +++++ arch/x86/kernel/Makefile | 2 ++ arch/x86/kernel/process_64.c | 7 ++++- arch/x86/kernel/shstk.c | 44 +++++++++++++++++++++++++++++++ 6 files changed, 84 insertions(+), 1 deletion(-) create mode 100644 arch/x86/include/asm/cet.h create mode 100644 arch/x86/kernel/shstk.c diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h new file mode 100644 index 000000000000..a2f3c6e06ef5 --- /dev/null +++ b/arch/x86/include/asm/cet.h @@ -0,0 +1,21 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_X86_CET_H +#define _ASM_X86_CET_H + +#ifndef __ASSEMBLY__ +#include + +struct task_struct; + +#ifdef CONFIG_X86_USER_SHADOW_STACK +long cet_prctl(struct task_struct *task, int option, unsigned long features); +void reset_thread_features(void); +#else +static inline long cet_prctl(struct task_struct *task, int option, + unsigned long features) { return -EINVAL; } +static inline void reset_thread_features(void) {} +#endif /* CONFIG_X86_USER_SHADOW_STACK */ + +#endif /* __ASSEMBLY__ */ + +#endif /* _ASM_X86_CET_H */ diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index 67c9d73b31fa..ca66d320a263 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -530,6 +530,11 @@ struct thread_struct { */ u32 pkru; +#ifdef CONFIG_X86_USER_SHADOW_STACK + unsigned long features; + unsigned long features_locked; +#endif + /* Floating point and extended processor state */ struct fpu fpu; /* diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h index 500b96e71f18..2dae9997ee17 100644 --- a/arch/x86/include/uapi/asm/prctl.h +++ b/arch/x86/include/uapi/asm/prctl.h @@ -20,4 +20,10 @@ #define ARCH_MAP_VDSO_32 0x2002 #define ARCH_MAP_VDSO_64 0x2003 +/* Don't use 0x3001-0x3004 because of old glibcs */ + +#define ARCH_CET_ENABLE 0x5001 +#define ARCH_CET_DISABLE 0x5002 +#define ARCH_CET_LOCK 0x5003 + #endif /* _ASM_X86_PRCTL_H */ diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index f901658d9f7c..fbb1cb34188d 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -143,6 +143,8 @@ obj-$(CONFIG_AMD_MEM_ENCRYPT) += sev.o obj-$(CONFIG_CFI_CLANG) += cfi.o +obj-$(CONFIG_X86_USER_SHADOW_STACK) += shstk.o + ### # 64 bit specific files ifeq ($(CONFIG_X86_64),y) diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c index 6b3418bff326..17fec059317c 100644 --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -514,6 +514,8 @@ start_thread_common(struct pt_regs *regs, unsigned long new_ip, load_gs_index(__USER_DS); } + reset_thread_features(); + loadsegment(fs, 0); loadsegment(es, _ds); loadsegment(ds, _ds); @@ -830,7 +832,10 @@ long do_arch_prctl_64(struct task_struct *task, int option, unsigned long arg2) case ARCH_MAP_VDSO_64: return prctl_map_vdso(&vdso_image_64, arg2); #endif - + case ARCH_CET_ENABLE: + case ARCH_CET_DISABLE: + case ARCH_CET_LOCK: + return cet_prctl(task, option, arg2); default: ret = -EINVAL; break; diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c new file mode 100644 index 000000000000..ed6f25cc07c5 --- /dev/null +++ b/arch/x86/kernel/shstk.c @@ -0,0 +1,44 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * shstk.c - Intel shadow stack support + * + * Copyright (c) 2021, Intel Corporation. + * Yu-cheng Yu + */ + +#include +#include +#include + +void reset_thread_features(void) +{ + current->thread.features = 0; + current->thread.features_locked = 0; +} + +long cet_prctl(struct task_struct *task, int option, unsigned long features) +{ + if (option == ARCH_CET_LOCK) { + task->thread.features_locked |= features; + return 0; + } + + /* Don't allow via ptrace */ + if (task != current) + return -EINVAL; + + /* Do not allow to change locked features */ + if (features & task->thread.features_locked) + return -EPERM; + + /* Only support enabling/disabling one feature at a time. */ + if (hweight_long(features) > 1) + return -EINVAL; + + if (option == ARCH_CET_DISABLE) { + return -EINVAL; + } + + /* Handle ARCH_CET_ENABLE */ + return -EINVAL; +} From patchwork Fri Nov 4 22:35:52 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 15841 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:6687:0:0:0:0:0 with SMTP id l7csp680191wru; Fri, 4 Nov 2022 15:44:44 -0700 (PDT) X-Google-Smtp-Source: AMsMyM6CUgZWT861rrxdoae6fqHaSGkx5HcqtOHbr3qo4jgNsFmjAhOVKgQFy860qh/5UF2yebMi X-Received: by 2002:a17:906:b345:b0:7ae:8d4:4204 with SMTP id cd5-20020a170906b34500b007ae08d44204mr14778604ejb.746.1667601884402; Fri, 04 Nov 2022 15:44:44 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1667601884; cv=none; d=google.com; s=arc-20160816; b=dR+k797VydFsfuWiyfB/cauDNJFQ06EpK8TUVe13u4jTPwLhfctoNtht4to6Ubafje j6j75DPX6pRH5EThhGbCIJ1pTvnDuKnpr4fVCkGmlT4mpyNSsVomTkdVwfEcfOYcfp+Q LSMJbqbuG3YX6f1O2CS+kQ+aRzyqmU/gf9ww5/mlddues/wbu91zXypbVGXse/44SJl7 D546UelaJsBxU0cVP/JmVgD0otQxbDJnMbeC4Zfr9A6OztrkBLnD7fh2ZyuFJr4464w8 QxAedKjqIvm4Wo9UoT68u0qqXSXOT6VSlpuUT4EMC36vp9jkIpZge4eZ5oep8EFI7u9i jZlw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=Kgwor1DwbH3dwby19NJnXrf1FpJgn8/1r67POTZCv2g=; b=XKHAOlbHo+qbaQMIQLhwafM5+gR7nEHpFmR/zI0KYFfPPnlkvnBBdRzsCQOL85MYeA OuUhmDbKhEwZeBBj7luWtQxPJFDH1xPP4Z/8WGbRIaklO+YqnaV2B2YBpSYJ+bEVVhmS 0HHI8C+U6LBQZhRWE/OzekoG9eyQpvJr2b5pjAZ9raIOzM+pc/04y86j5JL5YzrgyP2U vbWJPzYwQqGtl+wJ/5O8fyfWYzCwg1UKIQ0kC3a1WJHXVSI8QqWA83mCg0Fh3Gx9L8gq qugXa+3K1ocPEaczgrBFTz/H50S9prrrZNUdSVIienBcrxd1FR5pcWEsaLJdLIgJKoVU hkGg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=R8qzkDLg; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id g7-20020a170906394700b007815e9c5b80si220189eje.617.2022.11.04.15.44.21; Fri, 04 Nov 2022 15:44:44 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=R8qzkDLg; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230174AbiKDWno (ORCPT + 99 others); Fri, 4 Nov 2022 18:43:44 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49464 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230298AbiKDWnR (ORCPT ); Fri, 4 Nov 2022 18:43:17 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8553E59FED; Fri, 4 Nov 2022 15:40:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1667601601; x=1699137601; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=of60iN5sOAyNYUtQIOruia5Wg5JZGtNeyl3TvdwitXo=; b=R8qzkDLg+mL96ra/ohxGtVrP9H660y+7JpK16wy0neLQFOoQjR+lXmCW 8IgEMyCTiRei8zyF3YuGBy3bqikt6ykThOvKdkzFOxKuPd7iaYK5M4no0 YCwSX/jUeYSX/S9akjoaNAghOqiyEnL9OmkClFaUm3cIpzvplDK3Hn3ec B1rSsEKjIOL3nbffteJjRwvWx1d0W+LqSgSngojgLRVKeMuu9u+cJFDQL Hnr+5+MD5hsCXJtZi2xmfhVlOzR5piMNI0WHLDFsk9E1FErcCy8cKDN8R 4XnC2MY+dM2/01TZxtdBQNWSygYZZnz9YH7VVi7JMeTs7cScpFx8nHXi+ w==; X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="311840579" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="311840579" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:45 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="668514117" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="668514117" Received: from adhjerms-mobl1.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.212.227.68]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:44 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v3 25/37] x86/shstk: Add user-mode shadow stack support Date: Fri, 4 Nov 2022 15:35:52 -0700 Message-Id: <20221104223604.29615-26-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221104223604.29615-1-rick.p.edgecombe@intel.com> References: <20221104223604.29615-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-5.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1748607313773515845?= X-GMAIL-MSGID: =?utf-8?q?1748607313773515845?= From: Yu-cheng Yu Introduce basic shadow stack enabling/disabling/allocation routines. A task's shadow stack is allocated from memory with VM_SHADOW_STACK flag and has a fixed size of min(RLIMIT_STACK, 4GB). Keep the task's shadow stack address and size in thread_struct. This will be copied when cloning new threads, but needs to be cleared during exec, so add a function to do this. Do not support IA32 emulation or x32. Tested-by: Pengfei Xu Tested-by: John Allen Signed-off-by: Yu-cheng Yu Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Cc: Kees Cook --- v3: - Use define for set_clr_bits_msrl() (Kees) - Make some functions static (Kees) - Change feature_foo() to features_foo() (Kees) - Centralize shadow stack size rlimit checks (Kees) - Disable x32 support v2: - Get rid of unnessary shstk->base checks - Don't support IA32 emulation v1: - Switch to xsave helpers. - Expand commit log. Yu-cheng v30: - Remove superfluous comments for struct thread_shstk. - Replace 'populate' with 'unused'. arch/x86/include/asm/cet.h | 7 ++ arch/x86/include/asm/msr.h | 11 +++ arch/x86/include/asm/processor.h | 3 + arch/x86/include/uapi/asm/prctl.h | 3 + arch/x86/kernel/shstk.c | 146 ++++++++++++++++++++++++++++++ 5 files changed, 170 insertions(+) diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h index a2f3c6e06ef5..cade110b2ea8 100644 --- a/arch/x86/include/asm/cet.h +++ b/arch/x86/include/asm/cet.h @@ -8,12 +8,19 @@ struct task_struct; #ifdef CONFIG_X86_USER_SHADOW_STACK +struct thread_shstk { + u64 base; + u64 size; +}; + long cet_prctl(struct task_struct *task, int option, unsigned long features); void reset_thread_features(void); +void shstk_free(struct task_struct *p); #else static inline long cet_prctl(struct task_struct *task, int option, unsigned long features) { return -EINVAL; } static inline void reset_thread_features(void) {} +static inline void shstk_free(struct task_struct *p) {} #endif /* CONFIG_X86_USER_SHADOW_STACK */ #endif /* __ASSEMBLY__ */ diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h index 65ec1965cd28..a4b86eb537d6 100644 --- a/arch/x86/include/asm/msr.h +++ b/arch/x86/include/asm/msr.h @@ -310,6 +310,17 @@ void msrs_free(struct msr *msrs); int msr_set_bit(u32 msr, u8 bit); int msr_clear_bit(u32 msr, u8 bit); +/* Helper that can never get accidentally un-inlined. */ +#define set_clr_bits_msrl(msr, set, clear) do { \ + u64 __val, __new_val; \ + \ + rdmsrl(msr, __val); \ + __new_val = (__val & ~(clear)) | (set); \ + \ + if (__new_val != __val) \ + wrmsrl(msr, __new_val); \ +} while (0) + #ifdef CONFIG_SMP int rdmsr_on_cpu(unsigned int cpu, u32 msr_no, u32 *l, u32 *h); int wrmsr_on_cpu(unsigned int cpu, u32 msr_no, u32 l, u32 h); diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index ca66d320a263..a6c414dfd10f 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -27,6 +27,7 @@ struct vm86; #include #include #include +#include #include #include @@ -533,6 +534,8 @@ struct thread_struct { #ifdef CONFIG_X86_USER_SHADOW_STACK unsigned long features; unsigned long features_locked; + + struct thread_shstk shstk; #endif /* Floating point and extended processor state */ diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h index 2dae9997ee17..dad5288bf086 100644 --- a/arch/x86/include/uapi/asm/prctl.h +++ b/arch/x86/include/uapi/asm/prctl.h @@ -26,4 +26,7 @@ #define ARCH_CET_DISABLE 0x5002 #define ARCH_CET_LOCK 0x5003 +/* ARCH_CET_ features bits */ +#define CET_SHSTK (1ULL << 0) + #endif /* _ASM_X86_PRCTL_H */ diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c index ed6f25cc07c5..20da2008e021 100644 --- a/arch/x86/kernel/shstk.c +++ b/arch/x86/kernel/shstk.c @@ -8,14 +8,160 @@ #include #include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include #include +static bool features_enabled(unsigned long features) +{ + return current->thread.features & features; +} + +static void features_set(unsigned long features) +{ + current->thread.features |= features; +} + +static void features_clr(unsigned long features) +{ + current->thread.features &= ~features; +} + +static unsigned long alloc_shstk(unsigned long size) +{ + int flags = MAP_ANONYMOUS | MAP_PRIVATE; + struct mm_struct *mm = current->mm; + unsigned long addr, unused; + + mmap_write_lock(mm); + addr = do_mmap(NULL, addr, size, PROT_READ, flags, + VM_SHADOW_STACK | VM_WRITE, 0, &unused, NULL); + + mmap_write_unlock(mm); + + return addr; +} + +static unsigned long adjust_shstk_size(unsigned long size) +{ + if (size) + return PAGE_ALIGN(size); + + return PAGE_ALIGN(min_t(unsigned long long, rlimit(RLIMIT_STACK), SZ_4G)); +} + +static void unmap_shadow_stack(u64 base, u64 size) +{ + while (1) { + int r; + + r = vm_munmap(base, size); + + /* + * vm_munmap() returns -EINTR when mmap_lock is held by + * something else, and that lock should not be held for a + * long time. Retry it for the case. + */ + if (r == -EINTR) { + cond_resched(); + continue; + } + + /* + * For all other types of vm_munmap() failure, either the + * system is out of memory or there is bug. + */ + WARN_ON_ONCE(r); + break; + } +} + +static int shstk_setup(void) +{ + struct thread_shstk *shstk = ¤t->thread.shstk; + unsigned long addr, size; + + /* Already enabled */ + if (features_enabled(CET_SHSTK)) + return 0; + + /* Also not supported for 32 bit and x32 */ + if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK) || in_32bit_syscall()) + return -EOPNOTSUPP; + + size = adjust_shstk_size(0); + addr = alloc_shstk(size); + if (IS_ERR_VALUE(addr)) + return PTR_ERR((void *)addr); + + fpregs_lock_and_load(); + wrmsrl(MSR_IA32_PL3_SSP, addr + size); + wrmsrl(MSR_IA32_U_CET, CET_SHSTK_EN); + fpregs_unlock(); + + shstk->base = addr; + shstk->size = size; + features_set(CET_SHSTK); + + return 0; +} + void reset_thread_features(void) { + memset(¤t->thread.shstk, 0, sizeof(struct thread_shstk)); current->thread.features = 0; current->thread.features_locked = 0; } +void shstk_free(struct task_struct *tsk) +{ + struct thread_shstk *shstk = &tsk->thread.shstk; + + if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK) || + !features_enabled(CET_SHSTK)) + return; + + if (!tsk->mm) + return; + + unmap_shadow_stack(shstk->base, shstk->size); +} + + +static int shstk_disable(void) +{ + if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) + return -EOPNOTSUPP; + + /* Already disabled? */ + if (!features_enabled(CET_SHSTK)) + return 0; + + fpregs_lock_and_load(); + /* Disable WRSS too when disabling shadow stack */ + set_clr_bits_msrl(MSR_IA32_U_CET, 0, CET_SHSTK_EN); + wrmsrl(MSR_IA32_PL3_SSP, 0); + fpregs_unlock(); + + shstk_free(current); + features_clr(CET_SHSTK); + + return 0; +} + long cet_prctl(struct task_struct *task, int option, unsigned long features) { if (option == ARCH_CET_LOCK) { From patchwork Fri Nov 4 22:35:53 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 15842 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:6687:0:0:0:0:0 with SMTP id l7csp680313wru; Fri, 4 Nov 2022 15:45:01 -0700 (PDT) X-Google-Smtp-Source: AMsMyM5FZB+pVbiaHOGHcJPj7WPIef8WZsZ6yEkUnyseFi5/j4sKMybdDWDHQYmZ1D6Omv96YpRB X-Received: by 2002:a17:906:cd11:b0:7ad:a911:eb97 with SMTP id oz17-20020a170906cd1100b007ada911eb97mr373650ejb.373.1667601901184; Fri, 04 Nov 2022 15:45:01 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1667601901; cv=none; d=google.com; s=arc-20160816; b=oHcrTr25sucrw5hSsNhSzP1MhUG4YnLgZ8LQP3umrXD4u2RABNmH9fY/Vo8g+3Ipwq 0knvq7Q7Q2Tu01XVX3W16CcpWPwCzSHOyM3qh1ePBpd0GvcVOL+eHjN+6OLzdR4rUtUU 3VYH/XcPpkKRVqme4UuHYHVAoozxVvrst9Tf/RmSoXILPu4WH6QfQnOT3iXNs3mpXtKg U+uX7d7oIfYnIBIp1YU4oxAa2xw5nWS2MznsBqgN/7fXP3tlRrPN+eCByqZiPDqgKbqD COg9dfU85hDK7vxuJ5oopKqiddQURJMHi+w6JaWpY+QCrPQRhriOpl/2NO0zPa1BMnUY nW7g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=lekw/rWxNO6MBXY20fcAjpGha7QIbS6HaI5Zi/Q3p9M=; b=L9vXsCE+kxKz+icM83wNWjsb3XU80vU8k451gtHCupprNrbJczxRCRbSmKVkMbVN+2 EPbkwvcKHMVpr2Tz0zHeB62fP8tbLkj8QsLRQR2yyNvW75diGaRXaWfhIw57slDVLkwj 6PzYvKniyf8gNbfVeDLFT+Vf4wTgRL1Vf3XdEqw9rGkTu2H7unANyRS6OAbx8znbb6TP qNbXnR4mz3WPUBVeQ3SZn8embwqJtjWg7OgwG5M3v5HrfNSHNymWE7lj9yaXYe6Et0i6 mfvB/btrPNdUX3c7GVOIEhnF2+VGEn5lXbRc4hqlyHIzlmIYEadioawCnECk6xV1m3vw PNGQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=h+KTf1+5; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id bb6-20020a1709070a0600b0078d93325645si334072ejc.405.2022.11.04.15.44.37; Fri, 04 Nov 2022 15:45:01 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=h+KTf1+5; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230072AbiKDWns (ORCPT + 99 others); Fri, 4 Nov 2022 18:43:48 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46476 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230196AbiKDWnV (ORCPT ); Fri, 4 Nov 2022 18:43:21 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B7DE565E4E; Fri, 4 Nov 2022 15:40:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1667601607; x=1699137607; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=4CYw8ElLWR442FlEhGEx1GtaP80qYWriBnDcauBz1vE=; b=h+KTf1+5KAE4fbeDlBQyVQFes+r7vsYGfIiZpukOxj7mtYctYhMY3njN 2tSfytzUmjG1a2vrugT95BrJuFkB+vnOffQ9shTZmKCMY+FK4pMDYhK+W IGbkj5S+SmwF8iRRZBIAVXirQw4TYiroYPl9CQMkhOzStYFVcHVznkJu4 lsx2w+nKjQvr0rNxBMITNVClsWmyp0BXfyJUTb/jbfG6M4SXYFmKTSNmi xq99jWjkOyVmoBG9L7gNo9OkLnQADXFff3Au/1ZF6itG0w/hb27bxPKGl lej8VB6IHJ1Y0K8DueH1lXGDI0Tmc4wIj9SHvvTfV1AujBA99nRINnAAh Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="311840583" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="311840583" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:46 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="668514123" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="668514123" Received: from adhjerms-mobl1.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.212.227.68]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:45 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v3 26/37] x86/shstk: Handle thread shadow stack Date: Fri, 4 Nov 2022 15:35:53 -0700 Message-Id: <20221104223604.29615-27-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221104223604.29615-1-rick.p.edgecombe@intel.com> References: <20221104223604.29615-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-5.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1748607331052233212?= X-GMAIL-MSGID: =?utf-8?q?1748607331052233212?= From: Yu-cheng Yu When a process is duplicated, but the child shares the address space with the parent, there is potential for the threads sharing a single stack to cause conflicts for each other. In the normal non-cet case this is handled in two ways. With regular CLONE_VM a new stack is provided by userspace such that the parent and child have different stacks. For vfork, the parent is suspended until the child exits. So as long as the child doesn't return from the vfork()/CLONE_VFORK calling function and sticks to a limited set of operations, the parent and child can share the same stack. For shadow stack, these scenarios present similar sharing problems. For the CLONE_VM case, the child and the parent must have separate shadow stacks. Instead of changing clone to take a shadow stack, have the kernel just allocate one and switch to it. Use stack_size passed from clone3() syscall for thread shadow stack size. A compat-mode thread shadow stack size is further reduced to 1/4. This allows more threads to run in a 32-bit address space. The clone() does not pass stack_size, which was added to clone3(). In that case, use RLIMIT_STACK size and cap to 4 GB. For shadow stack enabled vfork(), the parent and child can share the same shadow stack, like they can share a normal stack. Since the parent is suspended until the child terminates, the child will not interfere with the parent while executing as long as it doesn't return from the vfork() and overwrite up the shadow stack. The child can safely overwrite down the shadow stack, as the parent can just overwrite this later. So CET does not add any additional limitations for vfork(). Userspace implementing posix vfork() can actually prevent the child from returning from the vfork() calling function, using CET. Glibc does this by adjusting the shadow stack pointer in the child, so that the child receives a #CP if it tries to return from vfork() calling function. Free the shadow stack on thread exit by doing it in mm_release(). Skip this when exiting a vfork() child since the stack is shared in the parent. During this operation, the shadow stack pointer of the new thread needs to be updated to point to the newly allocated shadow stack. Since the ability to do this is confined to the FPU subsystem, change fpu_clone() to take the new shadow stack pointer, and update it internally inside the FPU subsystem. This part was suggested by Thomas Gleixner. Tested-by: Pengfei Xu Tested-by: John Allen Suggested-by: Thomas Gleixner Signed-off-by: Yu-cheng Yu Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe --- v3: - Fix update_fpu_shstk() stub (Mike Rapoport) - Fix chunks around alloc_shstk() in wrong patch (Kees) - Fix stack_size/flags swap (Kees) - Use centalized stack size logic (Kees) v2: - Have fpu_clone() take new shadow stack pointer and update SSP in xsave buffer for new task. (tglx) v1: - Expand commit log. - Add more comments. - Switch to xsave helpers. Yu-cheng v30: - Update comments about clone()/clone3(). (Borislav Petkov) arch/x86/include/asm/cet.h | 7 +++++ arch/x86/include/asm/fpu/sched.h | 3 +- arch/x86/include/asm/mmu_context.h | 2 ++ arch/x86/kernel/fpu/core.c | 41 +++++++++++++++++++++++++++- arch/x86/kernel/process.c | 18 +++++++++++- arch/x86/kernel/shstk.c | 44 ++++++++++++++++++++++++++++-- 6 files changed, 110 insertions(+), 5 deletions(-) diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h index cade110b2ea8..1a97223e7d2f 100644 --- a/arch/x86/include/asm/cet.h +++ b/arch/x86/include/asm/cet.h @@ -15,11 +15,18 @@ struct thread_shstk { long cet_prctl(struct task_struct *task, int option, unsigned long features); void reset_thread_features(void); +int shstk_alloc_thread_stack(struct task_struct *p, unsigned long clone_flags, + unsigned long stack_size, + unsigned long *shstk_addr); void shstk_free(struct task_struct *p); #else static inline long cet_prctl(struct task_struct *task, int option, unsigned long features) { return -EINVAL; } static inline void reset_thread_features(void) {} +static inline int shstk_alloc_thread_stack(struct task_struct *p, + unsigned long clone_flags, + unsigned long stack_size, + unsigned long *shstk_addr) { return 0; } static inline void shstk_free(struct task_struct *p) {} #endif /* CONFIG_X86_USER_SHADOW_STACK */ diff --git a/arch/x86/include/asm/fpu/sched.h b/arch/x86/include/asm/fpu/sched.h index b2486b2cbc6e..54c9c2fd1907 100644 --- a/arch/x86/include/asm/fpu/sched.h +++ b/arch/x86/include/asm/fpu/sched.h @@ -11,7 +11,8 @@ extern void save_fpregs_to_fpstate(struct fpu *fpu); extern void fpu__drop(struct fpu *fpu); -extern int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal); +extern int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal, + unsigned long shstk_addr); extern void fpu_flush_thread(void); /* diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h index b8d40ddeab00..d29988cbdf20 100644 --- a/arch/x86/include/asm/mmu_context.h +++ b/arch/x86/include/asm/mmu_context.h @@ -146,6 +146,8 @@ do { \ #else #define deactivate_mm(tsk, mm) \ do { \ + if (!tsk->vfork_done) \ + shstk_free(tsk); \ load_gs_index(0); \ loadsegment(fs, 0); \ } while (0) diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c index 8b3162badab7..3a2f37ac3005 100644 --- a/arch/x86/kernel/fpu/core.c +++ b/arch/x86/kernel/fpu/core.c @@ -555,8 +555,41 @@ static inline void fpu_inherit_perms(struct fpu *dst_fpu) } } +#ifdef CONFIG_X86_USER_SHADOW_STACK +static int update_fpu_shstk(struct task_struct *dst, unsigned long ssp) +{ + struct cet_user_state *xstate; + + /* If ssp update is not needed. */ + if (!ssp) + return 0; + + xstate = get_xsave_addr(&dst->thread.fpu.fpstate->regs.xsave, + XFEATURE_CET_USER); + + /* + * If there is a non-zero ssp, then 'dst' must be configured with a shadow + * stack and the fpu state should be up to date since it was just copied + * from the parent in fpu_clone(). So there must be a valid non-init CET + * state location in the buffer. + */ + if (WARN_ON_ONCE(!xstate)) + return 1; + + xstate->user_ssp = (u64)ssp; + + return 0; +} +#else +static int update_fpu_shstk(struct task_struct *dst, unsigned long shstk_addr) +{ + return 0; +} +#endif + /* Clone current's FPU state on fork */ -int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal) +int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal, + unsigned long ssp) { struct fpu *src_fpu = ¤t->thread.fpu; struct fpu *dst_fpu = &dst->thread.fpu; @@ -616,6 +649,12 @@ int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal) if (use_xsave()) dst_fpu->fpstate->regs.xsave.header.xfeatures &= ~XFEATURE_MASK_PASID; + /* + * Update shadow stack pointer, in case it changed during clone. + */ + if (update_fpu_shstk(dst, ssp)) + return 1; + trace_x86_fpu_copy_src(src_fpu); trace_x86_fpu_copy_dst(dst_fpu); diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index c21b7347a26d..12c28161867c 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -47,6 +47,7 @@ #include #include #include +#include #include "process.h" @@ -118,6 +119,7 @@ void exit_thread(struct task_struct *tsk) free_vm86(t); + shstk_free(tsk); fpu__drop(fpu); } @@ -139,6 +141,7 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args) struct inactive_task_frame *frame; struct fork_frame *fork_frame; struct pt_regs *childregs; + unsigned long shstk_addr = 0; int ret = 0; childregs = task_pt_regs(p); @@ -173,7 +176,13 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args) frame->flags = X86_EFLAGS_FIXED; #endif - fpu_clone(p, clone_flags, args->fn); + /* Allocate a new shadow stack for pthread if needed */ + ret = shstk_alloc_thread_stack(p, clone_flags, args->stack_size, + &shstk_addr); + if (ret) + return ret; + + fpu_clone(p, clone_flags, args->fn, shstk_addr); /* Kernel thread ? */ if (unlikely(p->flags & PF_KTHREAD)) { @@ -219,6 +228,13 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args) if (!ret && unlikely(test_tsk_thread_flag(current, TIF_IO_BITMAP))) io_bitmap_share(p); + /* + * If copy_thread() if failing, don't leak the shadow stack possibly + * allocated in shstk_alloc_thread_stack() above. + */ + if (ret) + shstk_free(p); + return ret; } diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c index 20da2008e021..a7a982924b9a 100644 --- a/arch/x86/kernel/shstk.c +++ b/arch/x86/kernel/shstk.c @@ -47,7 +47,7 @@ static unsigned long alloc_shstk(unsigned long size) unsigned long addr, unused; mmap_write_lock(mm); - addr = do_mmap(NULL, addr, size, PROT_READ, flags, + addr = do_mmap(NULL, 0, size, PROT_READ, flags, VM_SHADOW_STACK | VM_WRITE, 0, &unused, NULL); mmap_write_unlock(mm); @@ -126,6 +126,40 @@ void reset_thread_features(void) current->thread.features_locked = 0; } +int shstk_alloc_thread_stack(struct task_struct *tsk, unsigned long clone_flags, + unsigned long stack_size, unsigned long *shstk_addr) +{ + struct thread_shstk *shstk = &tsk->thread.shstk; + unsigned long addr, size; + + /* + * If shadow stack is not enabled on the new thread, skip any + * switch to a new shadow stack. + */ + if (!features_enabled(CET_SHSTK)) + return 0; + + /* + * For CLONE_VM, except vfork, the child needs a separate shadow + * stack. + */ + if ((clone_flags & (CLONE_VFORK | CLONE_VM)) != CLONE_VM) + return 0; + + + size = adjust_shstk_size(stack_size); + addr = alloc_shstk(size); + if (IS_ERR_VALUE(addr)) + return PTR_ERR((void *)addr); + + shstk->base = addr; + shstk->size = size; + + *shstk_addr = addr + size; + + return 0; +} + void shstk_free(struct task_struct *tsk) { struct thread_shstk *shstk = &tsk->thread.shstk; @@ -134,7 +168,13 @@ void shstk_free(struct task_struct *tsk) !features_enabled(CET_SHSTK)) return; - if (!tsk->mm) + /* + * When fork() with CLONE_VM fails, the child (tsk) already has a + * shadow stack allocated, and exit_thread() calls this function to + * free it. In this case the parent (current) and the child share + * the same mm struct. + */ + if (!tsk->mm || tsk->mm != current->mm) return; unmap_shadow_stack(shstk->base, shstk->size); From patchwork Fri Nov 4 22:35:54 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 15843 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:6687:0:0:0:0:0 with SMTP id l7csp680442wru; Fri, 4 Nov 2022 15:45:16 -0700 (PDT) X-Google-Smtp-Source: AMsMyM6gHqwWjCZi0Wx5ApZxj6CIDF5D52yR4HD3dBMBvgNVI5DoYgSqA4WKVyH9WNVwVSllh6JF X-Received: by 2002:a17:907:3e87:b0:7ae:46a8:af0a with SMTP id hs7-20020a1709073e8700b007ae46a8af0amr1218974ejc.554.1667601916116; Fri, 04 Nov 2022 15:45:16 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1667601916; cv=none; d=google.com; s=arc-20160816; b=qRWSfW9WVOEXeN3kvTcOn1IyIwfAlWfcd5tLkiEL8fnpZ59NQd4T8COL1O8S1VmT1I 8aFrYfmhb22CLQNM5fVuzMfC5FhiyuxbY+Hlw07kKFTwa/Uenz7itzZaVFmvZKuvwcU/ +vHq8nTIyokGL1uwdYK7E9IxhLeG5dEmSdp0E6ksnE0Zt/NGoAaH8PmTZV1tA2xz0qCP DWs9+UyFENI059XFP6YNTC+CBlRvR0CHnwrGYxmJof2f1ZvfTMAH6T346+Q9P6LY0I7R /I3i6u2YSmSVNVOBT4KdpQzsb2r9t7Yq3g46y0938KBZ3/JMp41S4NUedrLiE4xhVEBt G9hg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=zdawbPiNNL5pIHgsXa7uobdUrDBlQz3ICNnQKDs4NyM=; b=pKS0akmnr5S08K4SqMbxlkGAQpzF+USWg6xn0itKQwxIQVMcLg0W/hbvpm2DYlyvNX dZUTGziJasz2S+cL9RMe5lf8EDAHPT+xtP/Bv0LxqMlDYdOLyV5ujnfsSkh39Ha2LobY 7Y6AdG4fNwIs3PC/p89EV7KJmu1PEB4hq4/Pl4slZlQW3W7JQaeXEGcpNoVhIecX7GUk LFg2d1XHz5yKt3PU0LShBmX1pNZtq/Cs3mq9Ewiq8kUV6fPS1PcfUnY03k0N5eeZ3WxU BMMD4YvpL72eNJSnVnuBN3kAU6SHCMMeJFeoHU98Z3yHT+pcQOvcDuKgMcuD7CUHAzQW cecA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=JkIG6Cfu; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id q10-20020a056402518a00b00461bde34a12si923988edd.627.2022.11.04.15.44.52; Fri, 04 Nov 2022 15:45:16 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=JkIG6Cfu; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230142AbiKDWoT (ORCPT + 99 others); Fri, 4 Nov 2022 18:44:19 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47412 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230143AbiKDWn1 (ORCPT ); Fri, 4 Nov 2022 18:43:27 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C171065E65; Fri, 4 Nov 2022 15:40:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1667601619; x=1699137619; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=vmBs8zfUxV/v5rZQt4IrNbhAPa+GTPySpspTiNWW+cI=; b=JkIG6CfuXPI8XEX3Z/aJKwz0QGnBOCX0g6JDRENKobIyh1ranfI9EV2j ACFV+onvxo+52/HlIs8xRRO5EXsCu6ECllGWP8mLuH5eppLz7u4VYIicJ fqabWEBywtUGdc4Pc4NxnOOecscpJkZHqTLUELlLd9pbUqZNGXoYtC00o ha6UzHRMW4VIyjtybkHfUmFhcz+oUmzAU84y2KGXHRuiuH78wC8tEAy/A ckszskwYdiY/Dyuok6Uv9OkmYbLadENSpMchYillmUh/E4BXZaZUAhWTv ZfPfqb8bx9XPavrIj1QS38DYzB8CNbsG1F9LhmNpgOu/7QN5IEMsPCogQ g==; X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="311840586" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="311840586" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:47 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="668514127" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="668514127" Received: from adhjerms-mobl1.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.212.227.68]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:46 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v3 27/37] x86/shstk: Introduce routines modifying shstk Date: Fri, 4 Nov 2022 15:35:54 -0700 Message-Id: <20221104223604.29615-28-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221104223604.29615-1-rick.p.edgecombe@intel.com> References: <20221104223604.29615-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-5.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1748607346655001700?= X-GMAIL-MSGID: =?utf-8?q?1748607346655001700?= From: Yu-cheng Yu Shadow stack's are normally written to via CALL/RET or specific CET instuctions like RSTORSSP/SAVEPREVSSP. However during some Linux operations the kernel will need to write to directly using the ring-0 only WRUSS instruction. A shadow stack restore token marks a restore point of the shadow stack, and the address in a token must point directly above the token, which is within the same shadow stack. This is distinctively different from other pointers on the shadow stack, since those pointers point to executable code area. Introduce token setup and verify routines. Also introduce WRUSS, which is a kernel-mode instruction but writes directly to user shadow stack. In future patches that enable shadow stack to work with signals, the kernel will need something to denote the point in the stack where sigreturn may be called. This will prevent attackers calling sigreturn at arbitrary places in the stack, in order to help prevent SROP attacks. To do this, something that can only be written by the kernel needs to be placed on the shadow stack. This can be accomplished by setting bit 63 in the frame written to the shadow stack. Userspace return addresses can't have this bit set as it is in the kernel range. It is also can't be a valid restore token. Tested-by: Pengfei Xu Tested-by: John Allen Signed-off-by: Yu-cheng Yu Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Cc: Kees Cook --- v3: - Drop shstk_check_rstor_token() - Fail put_shstk_data() if bit 63 is set in the data (Kees) - Add comment in create_rstor_token() (Kees) - Pull in create_rstor_token() changes from future patch (Kees) v2: - Add data helpers for writing to shadow stack. v1: - Use xsave helpers. Yu-cheng v30: - Update commit log, remove description about signals. - Update various comments. - Remove variable 'ssp' init and adjust return value accordingly. - Check get_user_shstk_addr() return value. - Replace 'ia32' with 'proc32'. arch/x86/include/asm/special_insns.h | 13 +++++ arch/x86/kernel/shstk.c | 73 ++++++++++++++++++++++++++++ 2 files changed, 86 insertions(+) diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h index 35f709f619fb..6d51a87aea7f 100644 --- a/arch/x86/include/asm/special_insns.h +++ b/arch/x86/include/asm/special_insns.h @@ -223,6 +223,19 @@ static inline void clwb(volatile void *__p) : [pax] "a" (p)); } +#ifdef CONFIG_X86_USER_SHADOW_STACK +static inline int write_user_shstk_64(u64 __user *addr, u64 val) +{ + asm_volatile_goto("1: wrussq %[val], (%[addr])\n" + _ASM_EXTABLE(1b, %l[fail]) + :: [addr] "r" (addr), [val] "r" (val) + :: fail); + return 0; +fail: + return -EFAULT; +} +#endif /* CONFIG_X86_USER_SHADOW_STACK */ + #define nop() asm volatile ("nop") static inline void serialize(void) diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c index a7a982924b9a..755b4af40413 100644 --- a/arch/x86/kernel/shstk.c +++ b/arch/x86/kernel/shstk.c @@ -25,6 +25,8 @@ #include #include +#define SS_FRAME_SIZE 8 + static bool features_enabled(unsigned long features) { return current->thread.features & features; @@ -40,6 +42,35 @@ static void features_clr(unsigned long features) current->thread.features &= ~features; } +/* + * Create a restore token on the shadow stack. A token is always 8-byte + * and aligned to 8. + */ +static int create_rstor_token(unsigned long ssp, unsigned long *token_addr) +{ + unsigned long addr; + + /* Token must be aligned */ + if (!IS_ALIGNED(ssp, 8)) + return -EINVAL; + + addr = ssp - SS_FRAME_SIZE; + + /* + * SSP is aligned, so reserved bits and mode bit are a zero, just mark + * the token 64-bit. + */ + ssp |= BIT(0); + + if (write_user_shstk_64((u64 __user *)addr, (u64)ssp)) + return -EFAULT; + + if (token_addr) + *token_addr = addr; + + return 0; +} + static unsigned long alloc_shstk(unsigned long size) { int flags = MAP_ANONYMOUS | MAP_PRIVATE; @@ -160,6 +191,48 @@ int shstk_alloc_thread_stack(struct task_struct *tsk, unsigned long clone_flags, return 0; } +static unsigned long get_user_shstk_addr(void) +{ + unsigned long long ssp; + + fpregs_lock_and_load(); + + rdmsrl(MSR_IA32_PL3_SSP, ssp); + + fpregs_unlock(); + + return ssp; +} + +static int put_shstk_data(u64 __user *addr, u64 data) +{ + if (WARN_ON_ONCE(data & BIT(63))) + return -EINVAL; + + /* + * Mark the high bit so that the sigframe can't be processed as a + * return address. + */ + if (write_user_shstk_64(addr, data | BIT(63))) + return -EFAULT; + return 0; +} + +static int get_shstk_data(unsigned long *data, unsigned long __user *addr) +{ + unsigned long ldata; + + if (unlikely(get_user(ldata, addr))) + return -EFAULT; + + if (!(ldata & BIT(63))) + return -EINVAL; + + *data = ldata & ~BIT(63); + + return 0; +} + void shstk_free(struct task_struct *tsk) { struct thread_shstk *shstk = &tsk->thread.shstk; From patchwork Fri Nov 4 22:35:55 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 15844 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:6687:0:0:0:0:0 with SMTP id l7csp680476wru; Fri, 4 Nov 2022 15:45:19 -0700 (PDT) X-Google-Smtp-Source: AMsMyM65niV2Gym1CmNf+1EBXjTv5jMQL0nppc5ltUElAxeQdo1srwW2OZTXBg5K2nlSZEbXU5oT X-Received: by 2002:a05:6402:1e96:b0:462:89aa:d402 with SMTP id f22-20020a0564021e9600b0046289aad402mr37542900edf.190.1667601919574; Fri, 04 Nov 2022 15:45:19 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1667601919; cv=none; d=google.com; s=arc-20160816; b=OCsNTNJOBhpkfJClzoMtiPav+Ph0LhXHr29m6vYf3uzAwHEzYZ3lhUh2egf9dM7mIc qg2/9MRhWVG79gC9bB0DbVoTXXub1BlA7bdHq9XX1ImsPKCdOMiQKP8gSTjScmOIbvZi KQ/cfttdhvsyV9AzogpF0I+0KYmITCfroMWsch2q613cwScsSSZQgFLdm+fnDC45BsHa NJMZf7PMSXKUIm/rwHv/jq7k6Tyd7InGOZyt6O9k4sGUvWnYJdQFRsNbykZ3CZ3+aIMw I5ubj+ckwY69yPQI3P7xpwIXn/iizvoTSNHDQa7IQ/rK2WoQ5dN8h3jiE9m+EKEHy+kz wgtA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=WL/S2XkTKEgZuvPPSIshzgqU5bltaXv+qGtPyVmHzC0=; b=UERh9vAwH9txaqvXu6gG/6r1GI7uQXhcrcllF0aW0EXh32oGJJT99+yeFNc5/CivY2 fiC+6fLFlLMATJU0tlkkMedwsQdUmLP0KZpeTGZFjpDIJ62ZSm8282eXPVvRh8qS8IAw hgMV1rq0KL0oTtKmTVLcYpe5LAtCx3oIibfs/AWcl2a50bw4MJSg1iPxoCj4LNaqDSu4 CYLlzMtTWyfJ/WoGfaziCUNxGTwwqQ1+TT+43+kigERYOarG5odMP8iSA9cqeDIeDN/E eSXTQK+J/Hl1Z5S85bDPrJdzYY9ZkTAYG7x8SAnewnlm7zSbtF70umBjaWv5MHyxOGAB nmtg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=FFzddPLV; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id h12-20020aa7c60c000000b00461a7bb34c2si671364edq.473.2022.11.04.15.44.56; Fri, 04 Nov 2022 15:45:19 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=FFzddPLV; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230146AbiKDWoX (ORCPT + 99 others); Fri, 4 Nov 2022 18:44:23 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50420 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229651AbiKDWnf (ORCPT ); Fri, 4 Nov 2022 18:43:35 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9D97D4C256; Fri, 4 Nov 2022 15:40:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1667601622; x=1699137622; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=eHVT7p0xfZi4NC1zJnpLF/CK5uL8RAltvWhtj089zuI=; b=FFzddPLV9Gg3HHAR2WchA6lxZ9fmbaYCBUtNSVJ85Swk7JXRA3GIKwXS 7glfsUMkdorGO7pTumEUjh0ZrzZDd9Pf3nSBJVmm8rIDN/kfMxBHvvi6w LwHOOOVwk2F0Amy2DPHSgtCJJvWLzB3fukwyuc9h8LgdwG1DKKsOhXaa+ WEKVcJ2YejXs3LyfKtDu2hTMF0npoFjm8QQHRFnr5wtTonuxPaUPtB/yG wEtJjLKlRoS4NRsXjr1RYeyGvN4ExfV6w7czYWA7r/DXC1qmMbcz5GOlx mdv4zb3YpFTim62COLHyLoaKbG1fveBf3wQe3a9TE6HPimhglMJ0ahlrA w==; X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="311840590" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="311840590" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:47 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="668514132" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="668514132" Received: from adhjerms-mobl1.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.212.227.68]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:46 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v3 28/37] x86/shstk: Handle signals for shadow stack Date: Fri, 4 Nov 2022 15:35:55 -0700 Message-Id: <20221104223604.29615-29-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221104223604.29615-1-rick.p.edgecombe@intel.com> References: <20221104223604.29615-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-5.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1748607350483849128?= X-GMAIL-MSGID: =?utf-8?q?1748607350483849128?= From: Yu-cheng Yu When a signal is handled normally the context is pushed to the stack before handling it. For shadow stacks, since the shadow stack only track's return addresses, there isn't any state that needs to be pushed. However, there are still a few things that need to be done. These things are userspace visible and which will be kernel ABI for shadow stacks. One is to make sure the restorer address is written to shadow stack, since the signal handler (if not changing ucontext) returns to the restorer, and the restorer calls sigreturn. So add the restorer on the shadow stack before handling the signal, so there is not a conflict when the signal handler returns to the restorer. The other thing to do is to place some type of checkable token on the thread's shadow stack before handling the signal and check it during sigreturn. This is an extra layer of protection to hamper attackers calling sigreturn manually as in SROP-like attacks. For this token we can use the shadow stack data format defined earlier. Have the data pushed be the previous SSP. In the future the sigreturn might want to return back to a different stack. Storing the SSP (instead of a restore offset or something) allows for future functionality that may want to restore to a different stack. So, when handling a signal push - the SSP pointing in the shadow stack data format - the restorer address below the restore token. In sigreturn, verify SSP is stored in the data format and pop the shadow stack. Tested-by: Pengfei Xu Tested-by: John Allen Signed-off-by: Yu-cheng Yu Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Cc: Andy Lutomirski Cc: Cyrill Gorcunov Cc: Florian Weimer Cc: H. Peter Anvin Cc: Kees Cook --- v3: - Drop shstk_setup_rstor_token() (Kees) - Drop x32 signal support, since x32 support is dropped v2: - Switch to new shstk signal format v1: - Use xsave helpers. - Expand commit log. Yu-cheng v27: - Eliminate saving shadow stack pointer to signal context. Yu-cheng v25: - Update commit log/comments for the sc_ext struct. - Use restorer address already calculated. - Change CONFIG_X86_CET to CONFIG_X86_SHADOW_STACK. - Change X86_FEATURE_CET to X86_FEATURE_SHSTK. - Eliminate writing to MSR_IA32_U_CET for shadow stack. - Change wrmsrl() to wrmsrl_safe() and handle error. arch/x86/ia32/ia32_signal.c | 1 + arch/x86/include/asm/cet.h | 5 ++ arch/x86/kernel/shstk.c | 98 +++++++++++++++++++++++++++++++++++++ arch/x86/kernel/signal.c | 7 +++ 4 files changed, 111 insertions(+) diff --git a/arch/x86/ia32/ia32_signal.c b/arch/x86/ia32/ia32_signal.c index c9c3859322fa..88d71b9de616 100644 --- a/arch/x86/ia32/ia32_signal.c +++ b/arch/x86/ia32/ia32_signal.c @@ -34,6 +34,7 @@ #include #include #include +#include static inline void reload_segments(struct sigcontext_32 *sc) { diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h index 1a97223e7d2f..098e4ecfdf9b 100644 --- a/arch/x86/include/asm/cet.h +++ b/arch/x86/include/asm/cet.h @@ -6,6 +6,7 @@ #include struct task_struct; +struct ksignal; #ifdef CONFIG_X86_USER_SHADOW_STACK struct thread_shstk { @@ -19,6 +20,8 @@ int shstk_alloc_thread_stack(struct task_struct *p, unsigned long clone_flags, unsigned long stack_size, unsigned long *shstk_addr); void shstk_free(struct task_struct *p); +int setup_signal_shadow_stack(struct ksignal *ksig); +int restore_signal_shadow_stack(void); #else static inline long cet_prctl(struct task_struct *task, int option, unsigned long features) { return -EINVAL; } @@ -28,6 +31,8 @@ static inline int shstk_alloc_thread_stack(struct task_struct *p, unsigned long stack_size, unsigned long *shstk_addr) { return 0; } static inline void shstk_free(struct task_struct *p) {} +static inline int setup_signal_shadow_stack(struct ksignal *ksig) { return 0; } +static inline int restore_signal_shadow_stack(void) { return 0; } #endif /* CONFIG_X86_USER_SHADOW_STACK */ #endif /* __ASSEMBLY__ */ diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c index 755b4af40413..332b7c73a1af 100644 --- a/arch/x86/kernel/shstk.c +++ b/arch/x86/kernel/shstk.c @@ -233,6 +233,104 @@ static int get_shstk_data(unsigned long *data, unsigned long __user *addr) return 0; } +static int shstk_push_sigframe(unsigned long *ssp) +{ + unsigned long target_ssp = *ssp; + + /* Token must be aligned */ + if (!IS_ALIGNED(*ssp, 8)) + return -EINVAL; + + if (!IS_ALIGNED(target_ssp, 8)) + return -EINVAL; + + *ssp -= SS_FRAME_SIZE; + if (put_shstk_data((void *__user)*ssp, target_ssp)) + return -EFAULT; + + return 0; +} + +static int shstk_pop_sigframe(unsigned long *ssp) +{ + unsigned long token_addr; + int err; + + err = get_shstk_data(&token_addr, (unsigned long __user *)*ssp); + if (unlikely(err)) + return err; + + /* Restore SSP aligned? */ + if (unlikely(!IS_ALIGNED(token_addr, 8))) + return -EINVAL; + + /* SSP in userspace? */ + if (unlikely(token_addr >= TASK_SIZE_MAX)) + return -EINVAL; + + *ssp = token_addr; + + return 0; +} + +int setup_signal_shadow_stack(struct ksignal *ksig) +{ + void __user *restorer = ksig->ka.sa.sa_restorer; + unsigned long ssp; + int err; + + if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK) || + !features_enabled(CET_SHSTK)) + return 0; + + if (!restorer) + return -EINVAL; + + ssp = get_user_shstk_addr(); + if (unlikely(!ssp)) + return -EINVAL; + + err = shstk_push_sigframe(&ssp); + if (unlikely(err)) + return err; + + /* Push restorer address */ + ssp -= SS_FRAME_SIZE; + err = write_user_shstk_64((u64 __user *)ssp, (u64)restorer); + if (unlikely(err)) + return -EFAULT; + + fpregs_lock_and_load(); + wrmsrl(MSR_IA32_PL3_SSP, ssp); + fpregs_unlock(); + + return 0; +} + +int restore_signal_shadow_stack(void) +{ + unsigned long ssp; + int err; + + if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK) || + !features_enabled(CET_SHSTK)) + return 0; + + ssp = get_user_shstk_addr(); + if (unlikely(!ssp)) + return -EINVAL; + + err = shstk_pop_sigframe(&ssp); + if (unlikely(err)) + return err; + + fpregs_lock_and_load(); + wrmsrl(MSR_IA32_PL3_SSP, ssp); + fpregs_unlock(); + + return 0; +} + void shstk_free(struct task_struct *tsk) { struct thread_shstk *shstk = &tsk->thread.shstk; diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c index 9c7265b524c7..be25f7dce2d5 100644 --- a/arch/x86/kernel/signal.c +++ b/arch/x86/kernel/signal.c @@ -47,6 +47,7 @@ #include #include #include +#include #ifdef CONFIG_X86_64 /* @@ -472,6 +473,9 @@ static int __setup_rt_frame(int sig, struct ksignal *ksig, frame = get_sigframe(&ksig->ka, regs, sizeof(struct rt_sigframe), &fp); uc_flags = frame_uc_flags(regs); + if (setup_signal_shadow_stack(ksig)) + return -EFAULT; + if (!user_access_begin(frame, sizeof(*frame))) return -EFAULT; @@ -675,6 +679,9 @@ SYSCALL_DEFINE0(rt_sigreturn) if (!restore_sigcontext(regs, &frame->uc.uc_mcontext, uc_flags)) goto badframe; + if (restore_signal_shadow_stack()) + goto badframe; + if (restore_altstack(&frame->uc.uc_stack)) goto badframe; From patchwork Fri Nov 4 22:35:56 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 15845 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:6687:0:0:0:0:0 with SMTP id l7csp680657wru; Fri, 4 Nov 2022 15:45:41 -0700 (PDT) X-Google-Smtp-Source: AMsMyM5aP2IJ7pKP3mUOgdnBR9bE4+Vz3vNTS2GlJxsEHVt5L3FQDxlIWNdu9w3MYZVtqh61vxkY X-Received: by 2002:a05:6402:510c:b0:462:51ad:2af5 with SMTP id m12-20020a056402510c00b0046251ad2af5mr38509016edd.257.1667601941450; Fri, 04 Nov 2022 15:45:41 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1667601941; cv=none; d=google.com; s=arc-20160816; b=B510FcZP/DcyX7QJGSnzde4UfVqiQO3ERQT8/FOdo3A9/VkGbwxtet0CxrPbWiz64j P+smczwC+qJtbIBTSWcPTcbsZ5BOrakMv8KWSMx297+IZUmQ1ikDYiqFZq3nK/I2cNLx 8gCtxdi+cfYIdUn8WSwbKwFtbGerprc2YylJdP08ODvI1PnkCeJqc7ZEDqh1LhexP5Je cmOVEvZy8S9B8x2roDhnSFDVpWoww5izohnqoKegS/txR52fyYAUbOagrdiKcCQE9QAv 4u62tZrPmhzbenDjcyGpBfnYYOiZfZJWFzCOYdQZSor3rEoV25RMy8qymOJjzlIocx3U YNCg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=a/MtkaNgcwRlXBshFUBOH1/Bz1LyMHZ68yLZqMqLgd0=; b=t1hG90Cvw7Q7iv+7o4ZA+CUJV1QOs6AFceXpUWHwxIURw5N01V7XA0a5bSGTHxk1TJ FDVDvyIDSxTiGuK9DE/Qhl4iicFxZIpIls1A4K4o2H39M6quuZFzd/jAumD/khKqDCCf 22PFNncywHUzWSFy8BinfqmSCJsdrvbEc25fvMeUc7Js6qa5gjCRPn12eldsgebwBggc hXOSAtiFIG8Osai48iQ3AjPHiUjKwBH9OWREEtpJXlI1kogVY/l1T2KrgzQVUbXPV8yV ttDVrpBclhD5uO6LE0F5vhYnu2NDj3Yu1bbfg28raobQ/NG7XWbtgmFKwE9mDHlLgdRX m2Xg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=S9mz998r; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id go11-20020a1709070d8b00b0078db79317bbsi334834ejc.689.2022.11.04.15.45.18; Fri, 04 Nov 2022 15:45:41 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=S9mz998r; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230328AbiKDWog (ORCPT + 99 others); Fri, 4 Nov 2022 18:44:36 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49464 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230288AbiKDWnl (ORCPT ); Fri, 4 Nov 2022 18:43:41 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4A04066C8E; Fri, 4 Nov 2022 15:40:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1667601627; x=1699137627; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=WNhhiRqVHAvXool292BUKOHlXWpzFCY5sXrk4JXZZ5k=; b=S9mz998rx6iVxjbRFa0FZc8kibCHBGhrnr1VFD3oOyfEjA5W+gW0gDAh Cnbu/R1FCtARj1mjBrUfcsgoJuarxNf3pKGn2QMPUyyw+b1bdXq/Tb3L0 Qx/PCXL37OcSwgX2bLwB/Fo2GSdHDEiW/7a4/bl1exXHx0TDq3lOsNJkG 2VJqbYK9bz4UaqC7rfb7yIE026qHzDFXhnFwo7KcizDoeaxa9yw0LtLpB bd+vz3kpKxFXMgIKay36PLq8o9F+hhUSr8olQTXnl+CT2LfQBTX55+/6O +lIIZxYCqei8JfWsVVTGJ3qPsbf4s2vAl8FHLm9c2US5FlNguqyvWmN18 w==; X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="311840593" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="311840593" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:48 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="668514136" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="668514136" Received: from adhjerms-mobl1.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.212.227.68]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:47 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org Cc: rick.p.edgecombe@intel.com Subject: [PATCH v3 29/37] x86/shstk: Introduce map_shadow_stack syscall Date: Fri, 4 Nov 2022 15:35:56 -0700 Message-Id: <20221104223604.29615-30-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221104223604.29615-1-rick.p.edgecombe@intel.com> References: <20221104223604.29615-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-5.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1748607373413935312?= X-GMAIL-MSGID: =?utf-8?q?1748607373413935312?= When operating with shadow stacks enabled, the kernel will automatically allocate shadow stacks for new threads, however in some cases userspace will need additional shadow stacks. The main example of this is the ucontext family of functions, which require userspace allocating and pivoting to userspace managed stacks. Unlike most other user memory permissions, shadow stacks need to be provisioned with special data in order to be useful. They need to be setup with a restore token so that userspace can pivot to them via the RSTORSSP instruction. But, the security design of shadow stack's is that they should not be written to except in limited circumstances. This presents a problem for userspace, as to how userspace can provision this special data, without allowing for the shadow stack to be generally writable. Previously, a new PROT_SHADOW_STACK was attempted, which could be mprotect()ed from RW permissions after the data was provisioned. This was found to not be secure enough, as other thread's could write to the shadow stack during the writable window. The kernel can use a special instruction, WRUSS, to write directly to userspace shadow stacks. So the solution can be that memory can be mapped as shadow stack permissions from the beginning (never generally writable in userspace), and the kernel itself can write the restore token. First, a new madvise() flag was explored, which could operate on the PROT_SHADOW_STACK memory. This had a couple downsides: 1. Extra checks were needed in mprotect() to prevent writable memory from ever becoming PROT_SHADOW_STACK. 2. Extra checks/vma state were needed in the new madvise() to prevent restore tokens being written into the middle of pre-used shadow stacks. It is ideal to prevent restore tokens being added at arbitrary locations, so the check was to make sure the shadow stack had never been written to. 3. It stood out from the rest of the madvise flags, as more of direct action than a hint at future desired behavior. So rather than repurpose two existing syscalls (mmap, madvise) that don't quite fit, just implement a new map_shadow_stack syscall to allow userspace to map and setup new shadow stacks in one step. While ucontext is the primary motivator, userspace may have other unforeseen reasons to setup it's own shadow stacks using the WRSS instruction. Towards this provide a flag so that stacks can be optionally setup securely for the common case of ucontext without enabling WRSS. Or potentially have the kernel set up the shadow stack in some new way. The following example demonstrates how to create a new shadow stack with map_shadow_stack: void *shstk = map_shadow_stack(adrr, stack_size, SHADOW_STACK_SET_TOKEN); Tested-by: Pengfei Xu Tested-by: John Allen Signed-off-by: Rick Edgecombe --- v3: - Change syscall common -> 64 (Kees) - Use bit shift notation instead of 0x1 for uapi header (Kees) - Call do_mmap() with MAP_FIXED_NOREPLACE (Kees) - Block unsupported flags (Kees) - Require size >= 8 to set token (Kees) v2: - Change syscall to take address like mmap() for CRIU's usage v1: - New patch (replaces PROT_SHADOW_STACK). arch/x86/entry/syscalls/syscall_64.tbl | 1 + arch/x86/include/uapi/asm/mman.h | 3 ++ arch/x86/kernel/shstk.c | 57 ++++++++++++++++++++++---- include/linux/syscalls.h | 1 + include/uapi/asm-generic/unistd.h | 2 +- kernel/sys_ni.c | 1 + 6 files changed, 56 insertions(+), 9 deletions(-) diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl index c84d12608cd2..f65c671ce3b1 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -372,6 +372,7 @@ 448 common process_mrelease sys_process_mrelease 449 common futex_waitv sys_futex_waitv 450 common set_mempolicy_home_node sys_set_mempolicy_home_node +451 64 map_shadow_stack sys_map_shadow_stack # # Due to a historical design error, certain syscalls are numbered differently diff --git a/arch/x86/include/uapi/asm/mman.h b/arch/x86/include/uapi/asm/mman.h index 775dbd3aff73..15c5a1c4fc29 100644 --- a/arch/x86/include/uapi/asm/mman.h +++ b/arch/x86/include/uapi/asm/mman.h @@ -12,6 +12,9 @@ ((key) & 0x8 ? VM_PKEY_BIT3 : 0)) #endif +/* Flags for map_shadow_stack(2) */ +#define SHADOW_STACK_SET_TOKEN (1ULL << 0) /* Set up a restore token in the shadow stack */ + #include #endif /* _ASM_X86_MMAN_H */ diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c index 332b7c73a1af..9a025eea520f 100644 --- a/arch/x86/kernel/shstk.c +++ b/arch/x86/kernel/shstk.c @@ -17,6 +17,7 @@ #include #include #include +#include #include #include #include @@ -71,19 +72,31 @@ static int create_rstor_token(unsigned long ssp, unsigned long *token_addr) return 0; } -static unsigned long alloc_shstk(unsigned long size) +static unsigned long alloc_shstk(unsigned long addr, unsigned long size, + unsigned long token_offset, bool set_res_tok) { int flags = MAP_ANONYMOUS | MAP_PRIVATE; struct mm_struct *mm = current->mm; - unsigned long addr, unused; + unsigned long mapped_addr, unused; - mmap_write_lock(mm); - addr = do_mmap(NULL, 0, size, PROT_READ, flags, - VM_SHADOW_STACK | VM_WRITE, 0, &unused, NULL); + if (addr) + flags |= MAP_FIXED_NOREPLACE; + mmap_write_lock(mm); + mapped_addr = do_mmap(NULL, addr, size, PROT_READ, flags, + VM_SHADOW_STACK | VM_WRITE, 0, &unused, NULL); mmap_write_unlock(mm); - return addr; + if (!set_res_tok || IS_ERR_VALUE(addr)) + goto out; + + if (create_rstor_token(mapped_addr + token_offset, NULL)) { + vm_munmap(mapped_addr, size); + return -EINVAL; + } + +out: + return mapped_addr; } static unsigned long adjust_shstk_size(unsigned long size) @@ -134,7 +147,7 @@ static int shstk_setup(void) return -EOPNOTSUPP; size = adjust_shstk_size(0); - addr = alloc_shstk(size); + addr = alloc_shstk(0, size, 0, false); if (IS_ERR_VALUE(addr)) return PTR_ERR((void *)addr); @@ -179,7 +192,7 @@ int shstk_alloc_thread_stack(struct task_struct *tsk, unsigned long clone_flags, size = adjust_shstk_size(stack_size); - addr = alloc_shstk(size); + addr = alloc_shstk(0, size, 0, false); if (IS_ERR_VALUE(addr)) return PTR_ERR((void *)addr); @@ -373,6 +386,34 @@ static int shstk_disable(void) return 0; } + +SYSCALL_DEFINE3(map_shadow_stack, unsigned long, addr, unsigned long, size, unsigned int, flags) +{ + bool set_tok = flags & SHADOW_STACK_SET_TOKEN; + unsigned long aligned_size; + + if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) + return -ENOSYS; + + if (flags & ~SHADOW_STACK_SET_TOKEN) + return -EINVAL; + + /* If there isn't space for a token */ + if (set_tok && size < 8) + return -EINVAL; + + /* + * An overflow would result in attempting to write the restore token + * to the wrong location. Not catastrophic, but just return the right + * error code and block it. + */ + aligned_size = PAGE_ALIGN(size); + if (aligned_size < size) + return -EOVERFLOW; + + return alloc_shstk(addr, aligned_size, size, set_tok); +} + long cet_prctl(struct task_struct *task, int option, unsigned long features) { if (option == ARCH_CET_LOCK) { diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index a34b0f9a9972..3ae05cbdea5b 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -1056,6 +1056,7 @@ asmlinkage long sys_memfd_secret(unsigned int flags); asmlinkage long sys_set_mempolicy_home_node(unsigned long start, unsigned long len, unsigned long home_node, unsigned long flags); +asmlinkage long sys_map_shadow_stack(unsigned long addr, unsigned long size, unsigned int flags); /* * Architecture-specific system calls diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h index 45fa180cc56a..b12940ec5926 100644 --- a/include/uapi/asm-generic/unistd.h +++ b/include/uapi/asm-generic/unistd.h @@ -887,7 +887,7 @@ __SYSCALL(__NR_futex_waitv, sys_futex_waitv) __SYSCALL(__NR_set_mempolicy_home_node, sys_set_mempolicy_home_node) #undef __NR_syscalls -#define __NR_syscalls 451 +#define __NR_syscalls 452 /* * 32 bit systems traditionally used different diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c index 860b2dcf3ac4..cb9aebd34646 100644 --- a/kernel/sys_ni.c +++ b/kernel/sys_ni.c @@ -381,6 +381,7 @@ COND_SYSCALL(vm86old); COND_SYSCALL(modify_ldt); COND_SYSCALL(vm86); COND_SYSCALL(kexec_file_load); +COND_SYSCALL(map_shadow_stack); /* s390 */ COND_SYSCALL(s390_pci_mmio_read); From patchwork Fri Nov 4 22:35:57 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 15847 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:6687:0:0:0:0:0 with SMTP id l7csp680871wru; Fri, 4 Nov 2022 15:46:10 -0700 (PDT) X-Google-Smtp-Source: AMsMyM6OfN3Dke1mnlzzFn08cGxE3JsydHa0UIBo2RWjmzACjQCio6LiXUG/s1QPFdxE1gfn/zYU X-Received: by 2002:a17:906:d550:b0:7ad:d2f1:dba5 with SMTP id cr16-20020a170906d55000b007add2f1dba5mr28475124ejc.52.1667601969992; Fri, 04 Nov 2022 15:46:09 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1667601969; cv=none; d=google.com; s=arc-20160816; b=jxhxijpDJvWreE59rvFwOHzz6FB564o0frJoUcOYEj/3SO0nRv9VlVuoisbImP+NM2 v1Lbe6hfmJoLagOEoqCmlGXNd3xoMf/bCCqVKc0vHjEjUdemR3QWWoFdWtLz5emqiSIm vvMBlZldC7EUNcRKIKGv88HN4oSU/OU9tfdnpfngnGrgsTXxL+mLhZNrFwtkAkpAiW+/ nQ3+lCBQnDMx+XcfnPQRTSOLmZbn3qI8SC+lMp5rpF06m3IdPGxl20BsOBEXcCmFm1tC LDXect0J59TSOyUFFG3VTAWf+vV5/WAjdw924FylwbXFbqiaSX161GB+cBpzCagXPF9t 4+2w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=jYJvV5iMfPmkDGiHFT8a79PJw+Qr0OeWIjZvDhBR51Y=; b=OAHLb3TQJxZW9Q3isNJA3Ph+o4TPlI7Jiek40qyzpNdbhtx6claHbFgXAl/1kD/5QI agwJ7+Zm0hlem90cL/EDaJsChGX3SegwNKflRYE26dQhGzIAfHbjph+tvK7BZKw35Op9 J2Z/GBSeOfvl4qvrPsyQU0vWJVV9xZyHGhVqaXrCNVO/3fz7MzS1FmCY9anQUbDdcnao wikcIpW0OH1mREXD2q59AjkNjZHmw3qLtB1r9ptGHUhU5NSy+DjObp17rG1LQF8r/YAZ ggH/U2/OqJB2WyBDNQBWHvVLeo1sx9P5fiaTXVo3pGzESwiDCEpm6Z+suEtoyK/tRZQ5 4nnQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b="Y/9xKPT/"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id f12-20020a170906390c00b00780805b99ccsi221137eje.648.2022.11.04.15.45.47; Fri, 04 Nov 2022 15:46:09 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b="Y/9xKPT/"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230222AbiKDWow (ORCPT + 99 others); Fri, 4 Nov 2022 18:44:52 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46478 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230196AbiKDWnu (ORCPT ); Fri, 4 Nov 2022 18:43:50 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D8E1D66CB0; Fri, 4 Nov 2022 15:40:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1667601633; x=1699137633; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=xYUN5lPpKChBVK2npDRqZ1rxSFprNl3Bqg8OsGm5TD4=; b=Y/9xKPT/urutBBTIw034vEp2FgtuQIIzaKcVyTr/FhFEoHlRk2Hwz/4W 4T1CC4ovM8B0e8gyohRB1cpCIkMah1BTfOzJrBgkCfM8SRHDhSUifNI17 4X7VB6kO1FgfG81HtP62C0s/JZMagaUnDbpgi/WlbBtYQsgvWwI+CKq/q ix1NgPZOgQRH+oE9gJWUN4K3SIF7rOp56hvL5ZQb0eddrv4VroTa2TaLu 6QuVmODe1Dsg0A+mqD4p77VcbvY2yJv3S9MMkouP3wyuOIe832/Th5y94 ReN1yyqGHmQQUDiDFRGOIRsS/ubHtvFvLaeWMM3jMXMlsmaVOvykDXsG5 g==; X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="311840596" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="311840596" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:49 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="668514139" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="668514139" Received: from adhjerms-mobl1.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.212.227.68]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:48 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org Cc: rick.p.edgecombe@intel.com Subject: [PATCH v3 30/37] x86/shstk: Support wrss for userspace Date: Fri, 4 Nov 2022 15:35:57 -0700 Message-Id: <20221104223604.29615-31-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221104223604.29615-1-rick.p.edgecombe@intel.com> References: <20221104223604.29615-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-5.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1748607403231828745?= X-GMAIL-MSGID: =?utf-8?q?1748607403231828745?= For the current shadow stack implementation, shadow stacks contents can't easily be provisioned with arbitrary data. This property helps apps protect themselves better, but also restricts any potential apps that may want to do exotic things at the expense of a little security. The x86 shadow stack feature introduces a new instruction, wrss, which can be enabled to write directly to shadow stack permissioned memory from userspace. Allow it to get enabled via the prctl interface. Only enable the userspace wrss instruction, which allows writes to userspace shadow stacks from userspace. Do not allow it to be enabled independently of shadow stack, as HW does not support using WRSS when shadow stack is disabled. From a fault handler perspective, WRSS will behave very similar to WRUSS, which is treated like a user access from a #PF err code perspective. Tested-by: Pengfei Xu Tested-by: John Allen Signed-off-by: Rick Edgecombe --- v3: - Make wrss_control() static - Fix verbiage in commit log (Kees) v2: - Add some commit log verbiage from (Dave Hansen) v1: - New patch. arch/x86/include/uapi/asm/prctl.h | 1 + arch/x86/kernel/shstk.c | 33 +++++++++++++++++++++++++++++-- 2 files changed, 32 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h index dad5288bf086..5f1d3181e4a1 100644 --- a/arch/x86/include/uapi/asm/prctl.h +++ b/arch/x86/include/uapi/asm/prctl.h @@ -28,5 +28,6 @@ /* ARCH_CET_ features bits */ #define CET_SHSTK (1ULL << 0) +#define CET_WRSS (1ULL << 1) #endif /* _ASM_X86_PRCTL_H */ diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c index 9a025eea520f..cbd0970b26d7 100644 --- a/arch/x86/kernel/shstk.c +++ b/arch/x86/kernel/shstk.c @@ -364,6 +364,35 @@ void shstk_free(struct task_struct *tsk) unmap_shadow_stack(shstk->base, shstk->size); } +static int wrss_control(bool enable) +{ + if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) + return -EOPNOTSUPP; + + /* + * Only enable wrss if shadow stack is enabled. If shadow stack is not + * enabled, wrss will already be disabled, so don't bother clearing it + * when disabling. + */ + if (!features_enabled(CET_SHSTK)) + return -EPERM; + + /* Already enabled/disabled? */ + if (features_enabled(CET_WRSS) == enable) + return 0; + + fpregs_lock_and_load(); + if (enable) { + set_clr_bits_msrl(MSR_IA32_U_CET, CET_WRSS_EN, 0); + features_set(CET_WRSS); + } else { + set_clr_bits_msrl(MSR_IA32_U_CET, 0, CET_WRSS_EN); + features_clr(CET_WRSS); + } + fpregs_unlock(); + + return 0; +} static int shstk_disable(void) { @@ -376,12 +405,12 @@ static int shstk_disable(void) fpregs_lock_and_load(); /* Disable WRSS too when disabling shadow stack */ - set_clr_bits_msrl(MSR_IA32_U_CET, 0, CET_SHSTK_EN); + set_clr_bits_msrl(MSR_IA32_U_CET, 0, CET_SHSTK_EN | CET_WRSS_EN); wrmsrl(MSR_IA32_PL3_SSP, 0); fpregs_unlock(); shstk_free(current); - features_clr(CET_SHSTK); + features_clr(CET_SHSTK | CET_WRSS); return 0; } From patchwork Fri Nov 4 22:35:58 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 15846 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:6687:0:0:0:0:0 with SMTP id l7csp680801wru; Fri, 4 Nov 2022 15:46:00 -0700 (PDT) X-Google-Smtp-Source: AMsMyM49PhoBK2t/aTiDctZSC1Uf8eJsD+jbeVSBowZxhIDXw5tNQHPyDccaLc22ekCnAv/zeJ7C X-Received: by 2002:a05:6402:1acf:b0:464:46e7:503b with SMTP id ba15-20020a0564021acf00b0046446e7503bmr9500492edb.241.1667601960462; Fri, 04 Nov 2022 15:46:00 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1667601960; cv=none; d=google.com; s=arc-20160816; b=afXfDNCwQRgtvfESiRFSwDaEEbIxb8cPVF7C+3XJzglFz5xTdoMzb/r7mF4hh13kx/ 932ry58AusWX9Smbmx5IUnVSkTkz/8iNU6cnajggM3dmyuOoCp/icXyf3HSnrBUx7p+D 6iKjySjg0ubT8rtqpX4fc1b4qyhXl9UI3gmNFcjHAPWqyjE56fsPIx837y39Ybos57b5 zd//CcWrcKqLDoPCQxr/oaW374AB1STDUWJalexoGQu7gDNe4DofqvCSFhsbS6IwCff5 nW99EdLH2OPBTEHxePHif6N4Y9s0d6CHWPp5Ysa5TvlL8XcTEfODrOGwc+Yq/HycbF72 etPg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=Eu7eLlxVBWKEVHa42ldyG6KO0oyIM+LboIItdIy+tuY=; b=jRj6iEyRaagoj80PGwbRObtI7lTgzbYiuF3fKvZ+uzuHcG/Sklc3jV/zxA8NJiYk1I NfFeLvJeyGtfnGfugxrr8Brr28rSYBMFXtGh7hrwbW/oWJfIVQb+yFGUsN445oD3596a TrHQKFbE7fevqyWPA1W/imcSbGZlU5mR08h/VMvdJZ/hSa12GJ1lzvfOI6gRQoTaMTa4 mek09tUUVOO3QI3HczO3FANKzNzhJ4H+FX7RIQtwS3nyh65PHwuFyF+dZZe3b1VJ5yMx 5jlX7y8mn6Ao18pJ/pe5mayDZho2KQ7NOEoV/fRCWUdgyHqPK2ILUeFEzjFjft6UjUxr sc8g== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=UxWHCTtw; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id ty22-20020a170907c71600b0078dc5c888f1si329374ejc.135.2022.11.04.15.45.37; Fri, 04 Nov 2022 15:46:00 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=UxWHCTtw; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230403AbiKDWpB (ORCPT + 99 others); Fri, 4 Nov 2022 18:45:01 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50332 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230040AbiKDWoO (ORCPT ); Fri, 4 Nov 2022 18:44:14 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6A55F66CB1; Fri, 4 Nov 2022 15:40:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1667601635; x=1699137635; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=UXzhjhw4AECWj6QkxFrPwKgALHbK58V+F3EC9InWArw=; b=UxWHCTtwo6qvjZcO1ouMRx4w8+ikBSSu343rXTUR5f16MA7AJYw1qVct Wx/RVJZvQh+koGBoHJL6/9Hbg/jF46qM0MF+KNa0NBL8E2XMlsc6GNT4C Ws+YStziG1iLRJbXhNGZ9CoRab+UHsVtTrLlEq5HELRSOdTc8w+Xe1ioS AE5zI4F4rndcEiL4s6/66dr3vS1vPq7A0YWQxKrVWMxuxHdN0VjvGeaVw n+EMz8iId7jmCPDrREgOklzlnI+mloQxVJjuegBLVnSwNljOSrat/LPkV S4bEWgqhumCLA52c9ks2C1SOcBLDsZJKFqeLrmTKdAhBk4/26csXGj4Nx A==; X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="311840602" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="311840602" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:50 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="668514142" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="668514142" Received: from adhjerms-mobl1.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.212.227.68]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:49 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org Cc: rick.p.edgecombe@intel.com Subject: [PATCH v3 31/37] x86: Expose thread features in /proc/$PID/status Date: Fri, 4 Nov 2022 15:35:58 -0700 Message-Id: <20221104223604.29615-32-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221104223604.29615-1-rick.p.edgecombe@intel.com> References: <20221104223604.29615-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-5.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1748607393542745126?= X-GMAIL-MSGID: =?utf-8?q?1748607393542745126?= Applications and loaders can have logic to decide whether to enable CET. They usually don't report whether CET has been enabled or not, so there is no way to verify whether an application actually is protected by CET features. Add two lines in /proc/$PID/status to report enabled and locked features. Since, this involves referring to arch specific defines in asm/prctl.h, implement an arch breakout to emit the feature lines. Tested-by: Pengfei Xu Tested-by: John Allen Signed-off-by: Kirill A. Shutemov [Switched to CET, added to commit log] Signed-off-by: Rick Edgecombe --- v3: - Move to /proc/pid/status (Kees) v2: - New patch arch/x86/kernel/cpu/proc.c | 23 +++++++++++++++++++++++ fs/proc/array.c | 6 ++++++ include/linux/proc_fs.h | 2 ++ 3 files changed, 31 insertions(+) diff --git a/arch/x86/kernel/cpu/proc.c b/arch/x86/kernel/cpu/proc.c index 099b6f0d96bd..105587d43500 100644 --- a/arch/x86/kernel/cpu/proc.c +++ b/arch/x86/kernel/cpu/proc.c @@ -4,6 +4,8 @@ #include #include #include +#include +#include #include "cpu.h" @@ -175,3 +177,24 @@ const struct seq_operations cpuinfo_op = { .stop = c_stop, .show = show_cpuinfo, }; + +#ifdef CONFIG_X86_USER_SHADOW_STACK +static void dump_x86_features(struct seq_file *m, unsigned long features) +{ + if (features & CET_SHSTK) + seq_puts(m, "shstk "); + if (features & CET_WRSS) + seq_puts(m, "wrss "); +} + +void arch_proc_pid_thread_features(struct seq_file *m, struct task_struct *task) +{ + seq_puts(m, "x86_Thread_features:\t"); + dump_x86_features(m, task->thread.features); + seq_putc(m, '\n'); + + seq_puts(m, "x86_Thread_features_locked:\t"); + dump_x86_features(m, task->thread.features_locked); + seq_putc(m, '\n'); +} +#endif /* CONFIG_X86_USER_SHADOW_STACK */ diff --git a/fs/proc/array.c b/fs/proc/array.c index 49283b8103c7..7ac43ecda1c2 100644 --- a/fs/proc/array.c +++ b/fs/proc/array.c @@ -428,6 +428,11 @@ static inline void task_thp_status(struct seq_file *m, struct mm_struct *mm) seq_printf(m, "THP_enabled:\t%d\n", thp_enabled); } +__weak void arch_proc_pid_thread_features(struct seq_file *m, + struct task_struct *task) +{ +} + int proc_pid_status(struct seq_file *m, struct pid_namespace *ns, struct pid *pid, struct task_struct *task) { @@ -451,6 +456,7 @@ int proc_pid_status(struct seq_file *m, struct pid_namespace *ns, task_cpus_allowed(m, task); cpuset_task_status_allowed(m, task); task_context_switch_counts(m, task); + arch_proc_pid_thread_features(m, task); return 0; } diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h index 81d6e4ec2294..5a8b21c0a587 100644 --- a/include/linux/proc_fs.h +++ b/include/linux/proc_fs.h @@ -158,6 +158,8 @@ int proc_pid_arch_status(struct seq_file *m, struct pid_namespace *ns, struct pid *pid, struct task_struct *task); #endif /* CONFIG_PROC_PID_ARCH_STATUS */ +void arch_proc_pid_thread_features(struct seq_file *m, struct task_struct *task); + #else /* CONFIG_PROC_FS */ static inline void proc_root_init(void) From patchwork Fri Nov 4 22:35:59 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 15848 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:6687:0:0:0:0:0 with SMTP id l7csp680873wru; Fri, 4 Nov 2022 15:46:10 -0700 (PDT) X-Google-Smtp-Source: AMsMyM417yuQsCfHNG9mJbGIy821ulPfh4G4ND7XYzrrN7Fu1MKt64B85a4IByoG0YSnhcm/o39A X-Received: by 2002:a05:6402:1771:b0:463:c94d:c7d9 with SMTP id da17-20020a056402177100b00463c94dc7d9mr19788872edb.135.1667601970386; Fri, 04 Nov 2022 15:46:10 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1667601970; cv=none; d=google.com; s=arc-20160816; b=oXLBvVTwaJylU0x7viZl6pWah0kcfxxI88XOOHuKapZFMiZr2mXn61iP8YSZpOQ2KV hT0ACtflipK8D6Aa7xg0kpVuYGpu8BY/veVq/bMBV7CHO1W94K+ehdm7404yffHF0TWj FBkPOB3HlfRXkXR+sFE+R8u3uY/sBzaH6kr5OKPUpszdcxs15cicu8DL/vbDMN7eAbcD cD7t+PVOFdACUam6CyLm3xZ/Sp8FDj3AJXlj0Zym0IlNgWrDROy3IXSoV5pjU8z00sRD Q5oHXeXITU14sn1Tq1YEgH+t7+vg1qtf4si+NeabQX9xGIliqdsfCqDPKVceWsDa/2kH znHg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=Csl/uHSVigsX3hH5FQ/OeAa/No2MfkHUHzhB7SHX95Y=; b=TuIyhRaH8k1jp9BvuZTA3oLqXRpnFu8grJnwsjgrrfkTabGcCP01wUq5/RpnSo7dcy j91/wzJdw4WG2+0k4YcDzbdhjRBoEoLCKnOYEHwq0jbK+NoAfghdKcSs3hg78bsWzSKl AskJDO8er5FLcw5sSxQtpd9447aRij/Dp61lLKYWRR7T30NUf+0FXHeWnaJnfCalL1Sf KZkyt0xUVUitSXR9QWKDyE4AvunL5UU7JNlBg/scUZwBhI11Sx+DJo0B9XOZxgXeJd1o 985XAZ5uYGnkz8PwqnAe9B+N2KbqUXe44e/p9qF/kmANjgJOF0eWrJZO7tkTvYa6N/Wo w1Hw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=Vc4tgZY0; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id hd15-20020a170907968f00b00783a5f78700si407184ejc.226.2022.11.04.15.45.45; Fri, 04 Nov 2022 15:46:10 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=Vc4tgZY0; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230214AbiKDWpS (ORCPT + 99 others); Fri, 4 Nov 2022 18:45:18 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51504 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230051AbiKDWoS (ORCPT ); Fri, 4 Nov 2022 18:44:18 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B2DED69DC9; Fri, 4 Nov 2022 15:40:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1667601638; x=1699137638; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=Q2WzLNvi+uaoeKQh1K+WrXv9LEwAcETs+iDQvlLduko=; b=Vc4tgZY0ip44QOpxBsqsdiqaUZCO2swDxmZPeHTlUtrL13P8ilbtEUfp zi4mTrOAgCmFRkvQ9QSI4G7f5jCVRdeybQlcU7r5UWhN2PwdySxThqmHK Ks/WRpLl3SUOE8BqKvMok24fvI3MzPl5QpQwAeTbSrA09ht/GXd3mG/cS E7+7l+NNmrUlxyAXRZiHWZVitZLA3X3Cfj4TxFl+9AydC1iHMddOZY7Z/ kR8oGcR5oOFsTKqIFq4x8CnvWa3gxARn+PGbw5Dfiw2Nl9ZDriIkzyRKj Gxh+zBMRbSmA8nx+94ad1ERvYO8M90DPYh68HMsv6T2T/hoo/YycH6SH9 Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="311840607" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="311840607" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:51 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="668514147" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="668514147" Received: from adhjerms-mobl1.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.212.227.68]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:50 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org Cc: rick.p.edgecombe@intel.com Subject: [PATCH v3 32/37] x86/cet/shstk: Wire in CET interface Date: Fri, 4 Nov 2022 15:35:59 -0700 Message-Id: <20221104223604.29615-33-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221104223604.29615-1-rick.p.edgecombe@intel.com> References: <20221104223604.29615-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-5.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1748607403552116804?= X-GMAIL-MSGID: =?utf-8?q?1748607403552116804?= The kernel now has the main CET functionality to support applications. Wire in the WRSS and shadow stack enable/disable functions into the existing CET API skeleton. Tested-by: Pengfei Xu Tested-by: John Allen Reviewed-by: Kees Cook Signed-off-by: Rick Edgecombe --- v2: - Split from other patches arch/x86/kernel/shstk.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c index cbd0970b26d7..71620b77a654 100644 --- a/arch/x86/kernel/shstk.c +++ b/arch/x86/kernel/shstk.c @@ -463,9 +463,17 @@ long cet_prctl(struct task_struct *task, int option, unsigned long features) return -EINVAL; if (option == ARCH_CET_DISABLE) { + if (features & CET_WRSS) + return wrss_control(false); + if (features & CET_SHSTK) + return shstk_disable(); return -EINVAL; } /* Handle ARCH_CET_ENABLE */ + if (features & CET_SHSTK) + return shstk_setup(); + if (features & CET_WRSS) + return wrss_control(true); return -EINVAL; } From patchwork Fri Nov 4 22:36:00 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 15850 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:6687:0:0:0:0:0 with SMTP id l7csp681121wru; Fri, 4 Nov 2022 15:46:51 -0700 (PDT) X-Google-Smtp-Source: AMsMyM7ane0rlXwev0Gr3n9o1IflvGpjWVjz74k8L01DD1FbfvJOPNjM1Gb857kDj/pIrsdZRpi6 X-Received: by 2002:a05:6402:50d3:b0:461:ba8a:8779 with SMTP id h19-20020a05640250d300b00461ba8a8779mr38468785edb.411.1667602011266; Fri, 04 Nov 2022 15:46:51 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1667602011; cv=none; d=google.com; s=arc-20160816; b=s7RKSO7Ys5SLB4ZutTHQW3/ycZCP6tCFTHt7sYDef8NUuYaCEYUZtdGyJTsSYlAt4Z Gz584xA8rK/BhQ3chIsxxMz95NsS7UrKCo61zKuBwBmHrQLinFq7jQX7k5iuh0mN4TT8 ndtO9bMhQhEf53EZ+8+izejm5vC6c8e7EHTdZjIYsPhFk2lAvEX/e1eFDL9zst2zzNcB 3Sfl0E/oKA0Tzh02EoIlffl+Y/BY9Oh9nzZL2LKozN0bPFcuaAh5z+dt0wBmo3EVY9fx lf9Y/zN2mtQsUuPxwVi3jKpZwaqNoTqFD9e8EZcV/Vs6FUfZLKNvOKeLioOSNzWbkxP1 QVrw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=ohlSSQyn8wnK8BW4/jJcfV48b2euQ+g2Ism8OZdTAkg=; b=m5tRNHP2DfMrRbHz5Y7HHvkx793HnLS/X7caDaRmnUsOuJlWNacTIBwcgi1VblVFlG J6kHv4oViBdOjJzQHJYtdh97h1yBNvgcJrPYtuJsSG3/VV/0MBrfNK2+nN47K+0rA9Sb E/tW5bUjSzwu0JeH9dZVoJyxVZoBblTXx/+5NemhwFyQ57YJjg+m1pW350Uogjj62Coi vbHAi8YrIgkgS5nwwstKNsfaAkVIeb4ujQfDuwJfZD+tjbRKY1VvJgqWENomZxGtAEkG Dr8ed/1RB3n3ihR8xWRUFTx3zdnjyTbDGScCnnWEeQ6MlT+jnDEJxoM/TYB/aEtMwpBx oa2w== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=mMIlrtIW; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id hb42-20020a170907162a00b0078d9be7f100si316579ejc.852.2022.11.04.15.46.27; Fri, 04 Nov 2022 15:46:51 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=mMIlrtIW; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230344AbiKDWpo (ORCPT + 99 others); Fri, 4 Nov 2022 18:45:44 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50298 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230340AbiKDWow (ORCPT ); Fri, 4 Nov 2022 18:44:52 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F055B69DCE; Fri, 4 Nov 2022 15:40:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1667601647; x=1699137647; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=2bikX3W+wHwtlfkK+PPS8dtSt6AtvNxDtjkqyPE4gcw=; b=mMIlrtIWMWsFISx/9MJZRoeVdDGAfZdnelPsCOJZvTkSsv2mSOHNZzns ZahLQMKNWQ+gNwKT8+JjhYkZOMdx7ieJy61lc0mqiIAqM3wRnme/W38sf uE79542rL/tT051S21+Ij6uIRikPl++x7sssA79+9J0oZKWECpPgnEqvL 60x+4ArB4NqZom1py1RE2yUeBMNSR07Tud8qaB8Qj14XsRpkx6rHflaol u46MlKuAxOj8pgtf6aKe1jiq6rBMioCEg3WmjNsLPRd1SbI2i+keZgnDe NBt8fvHdrFK7u9+FDdMzzKOwgkmI7FiqKgYaomXOxKbwqHty5ypvPwI/R g==; X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="311840614" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="311840614" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:52 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="668514154" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="668514154" Received: from adhjerms-mobl1.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.212.227.68]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:51 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v3 33/37] selftests/x86: Add shadow stack test Date: Fri, 4 Nov 2022 15:36:00 -0700 Message-Id: <20221104223604.29615-34-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221104223604.29615-1-rick.p.edgecombe@intel.com> References: <20221104223604.29615-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-5.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1748607446735596391?= X-GMAIL-MSGID: =?utf-8?q?1748607446735596391?= Add a simple selftest for exercising some shadow stack behavior: - map_shadow_stack syscall and pivot - Faulting in shadow stack memory - Handling shadow stack violations - GUP of shadow stack memory - mprotect() of shadow stack memory - Userfaultfd on shadow stack memory Since this test exercises a recently added syscall manually, it needs to find the automatically created __NR_foo defines. Per the selftest documentation, KHDR_INCLUDES can be used to help the selftest Makefile's find the headers from the kernel source. This way the new selftest can be built inside the kernel source tree without installing the headers to the system. So also add KHDR_INCLUDES as described in the selftest docs, to facilitate this. Tested-by: Pengfei Xu Tested-by: John Allen Co-developed-by: Yu-cheng Yu Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe --- v3: - Change "+m" to "=m" in write_shstk() (Andrew Cooper) - Fix userfaultfd test with transparent huge pages by doing a MADV_DONTNEED, since the token write faults in the while stack with huge pages. v2: - Change print statements to more align with other selftests - Add more tests - Add KHDR_INCLUDES to Makefile v1: - New patch. tools/testing/selftests/x86/Makefile | 4 +- .../testing/selftests/x86/test_shadow_stack.c | 574 ++++++++++++++++++ 2 files changed, 576 insertions(+), 2 deletions(-) create mode 100644 tools/testing/selftests/x86/test_shadow_stack.c diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile index 0388c4d60af0..cfc8a26ad151 100644 --- a/tools/testing/selftests/x86/Makefile +++ b/tools/testing/selftests/x86/Makefile @@ -18,7 +18,7 @@ TARGETS_C_32BIT_ONLY := entry_from_vm86 test_syscall_vdso unwind_vdso \ test_FCMOV test_FCOMI test_FISTTP \ vdso_restorer TARGETS_C_64BIT_ONLY := fsgsbase sysret_rip syscall_numbering \ - corrupt_xstate_header amx + corrupt_xstate_header amx test_shadow_stack # Some selftests require 32bit support enabled also on 64bit systems TARGETS_C_32BIT_NEEDED := ldt_gdt ptrace_syscall @@ -34,7 +34,7 @@ BINARIES_64 := $(TARGETS_C_64BIT_ALL:%=%_64) BINARIES_32 := $(patsubst %,$(OUTPUT)/%,$(BINARIES_32)) BINARIES_64 := $(patsubst %,$(OUTPUT)/%,$(BINARIES_64)) -CFLAGS := -O2 -g -std=gnu99 -pthread -Wall +CFLAGS := -O2 -g -std=gnu99 -pthread -Wall $(KHDR_INCLUDES) # call32_from_64 in thunks.S uses absolute addresses. ifeq ($(CAN_BUILD_WITH_NOPIE),1) diff --git a/tools/testing/selftests/x86/test_shadow_stack.c b/tools/testing/selftests/x86/test_shadow_stack.c new file mode 100644 index 000000000000..b347447da317 --- /dev/null +++ b/tools/testing/selftests/x86/test_shadow_stack.c @@ -0,0 +1,574 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * This program test's basic kernel shadow stack support. It enables shadow + * stack manual via the arch_prctl(), instead of relying on glibc. It's + * Makefile doesn't compile with shadow stack support, so it doesn't rely on + * any particular glibc. As a result it can't do any operations that require + * special glibc shadow stack support (longjmp(), swapcontext(), etc). Just + * stick to the basics and hope the compiler doesn't do anything strange. + */ + +#define _GNU_SOURCE + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define SS_SIZE 0x200000 + +#if (__GNUC__ < 8) || (__GNUC__ == 8 && __GNUC_MINOR__ < 5) +int main(int argc, char *argv[]) +{ + printf("[SKIP]\tCompiler does not support CET.\n"); + return 0; +} +#else +void write_shstk(unsigned long *addr, unsigned long val) +{ + asm volatile("wrssq %[val], (%[addr])\n" + : "=m" (addr) + : [addr] "r" (addr), [val] "r" (val)); +} + +static inline unsigned long __attribute__((always_inline)) get_ssp(void) +{ + unsigned long ret = 0; + + asm volatile("xor %0, %0; rdsspq %0" : "=r" (ret)); + return ret; +} + +/* + * For use in inline enablement of shadow stack. + * + * The program can't return from the point where shadow stack get's enabled + * because there will be no address on the shadow stack. So it can't use + * syscall() for enablement, since it is a function. + * + * Based on code from nolibc.h. Keep a copy here because this can't pull in all + * of nolibc.h. + */ +#define ARCH_PRCTL(arg1, arg2) \ +({ \ + long _ret; \ + register long _num asm("eax") = __NR_arch_prctl; \ + register long _arg1 asm("rdi") = (long)(arg1); \ + register long _arg2 asm("rsi") = (long)(arg2); \ + \ + asm volatile ( \ + "syscall\n" \ + : "=a"(_ret) \ + : "r"(_arg1), "r"(_arg2), \ + "0"(_num) \ + : "rcx", "r11", "memory", "cc" \ + ); \ + _ret; \ +}) + +void *create_shstk(void *addr) +{ + return (void *)syscall(__NR_map_shadow_stack, addr, SS_SIZE, SHADOW_STACK_SET_TOKEN); +} + +void *create_normal_mem(void *addr) +{ + return mmap(addr, SS_SIZE, PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS, 0, 0); +} + +void free_shstk(void *shstk) +{ + munmap(shstk, SS_SIZE); +} + +int reset_shstk(void *shstk) +{ + return madvise(shstk, SS_SIZE, MADV_DONTNEED); +} + +void try_shstk(unsigned long new_ssp) +{ + unsigned long ssp; + + printf("[INFO]\tnew_ssp = %lx, *new_ssp = %lx\n", + new_ssp, *((unsigned long *)new_ssp)); + + ssp = get_ssp(); + printf("[INFO]\tchanging ssp from %lx to %lx\n", ssp, new_ssp); + + asm volatile("rstorssp (%0)\n":: "r" (new_ssp)); + asm volatile("saveprevssp"); + printf("[INFO]\tssp is now %lx\n", get_ssp()); + + /* Switch back to original shadow stack */ + ssp -= 8; + asm volatile("rstorssp (%0)\n":: "r" (ssp)); + asm volatile("saveprevssp"); +} + +int test_shstk_pivot(void) +{ + void *shstk = create_shstk(0); + + if (shstk == MAP_FAILED) { + printf("[FAIL]\tError creating shadow stack: %d\n", errno); + return 1; + } + try_shstk((unsigned long)shstk + SS_SIZE - 8); + free_shstk(shstk); + + printf("[OK]\tShadow stack pivot\n"); + return 0; +} + +int test_shstk_faults(void) +{ + unsigned long *shstk = create_shstk(0); + + /* Read shadow stack, test if it's zero to not get read optimized out */ + if (*shstk != 0) + goto err; + + /* Wrss memory that was already read. */ + write_shstk(shstk, 1); + if (*shstk != 1) + goto err; + + /* Page out memory, so we can wrss it again. */ + if (reset_shstk((void *)shstk)) + goto err; + + write_shstk(shstk, 1); + if (*shstk != 1) + goto err; + + printf("[OK]\tShadow stack faults\n"); + return 0; + +err: + return 1; +} + +unsigned long saved_ssp; +unsigned long saved_ssp_val; +volatile bool segv_triggered; + +void __attribute__((noinline)) violate_ss(void) +{ + saved_ssp = get_ssp(); + saved_ssp_val = *(unsigned long *)saved_ssp; + + /* Corrupt shadow stack */ + printf("[INFO]\tCorrupting shadow stack\n"); + write_shstk((void *)saved_ssp, 0); +} + +void segv_handler(int signum, siginfo_t *si, void *uc) +{ + printf("[INFO]\tGenerated shadow stack violation successfully\n"); + + segv_triggered = true; + + /* Fix shadow stack */ + write_shstk((void *)saved_ssp, saved_ssp_val); +} + +int test_shstk_violation(void) +{ + struct sigaction sa; + + sa.sa_sigaction = segv_handler; + if (sigaction(SIGSEGV, &sa, NULL)) + return 1; + sa.sa_flags = SA_SIGINFO; + + segv_triggered = false; + + /* Make sure segv_triggered is set before violate_ss() */ + asm volatile("" : : : "memory"); + + violate_ss(); + + signal(SIGSEGV, SIG_DFL); + + printf("[OK]\tShadow stack violation test\n"); + + return !segv_triggered; +} + +/* Gup test state */ +#define MAGIC_VAL 0x12345678 +bool is_shstk_access; +void *shstk_ptr; +int fd; + +void reset_test_shstk(void *addr) +{ + if (shstk_ptr != NULL) + free_shstk(shstk_ptr); + shstk_ptr = create_shstk(addr); +} + +void test_access_fix_handler(int signum, siginfo_t *si, void *uc) +{ + printf("[INFO]\tViolation from %s\n", is_shstk_access ? "shstk access" : "normal write"); + + segv_triggered = true; + + /* Fix shadow stack */ + if (is_shstk_access) { + reset_test_shstk(shstk_ptr); + return; + } + + free_shstk(shstk_ptr); + create_normal_mem(shstk_ptr); +} + +bool test_shstk_access(void *ptr) +{ + is_shstk_access = true; + segv_triggered = false; + write_shstk(ptr, MAGIC_VAL); + + asm volatile("" : : : "memory"); + + return segv_triggered; +} + +bool test_write_access(void *ptr) +{ + is_shstk_access = false; + segv_triggered = false; + *(unsigned long *)ptr = MAGIC_VAL; + + asm volatile("" : : : "memory"); + + return segv_triggered; +} + +bool gup_write(void *ptr) +{ + unsigned long val; + + lseek(fd, (unsigned long)ptr, SEEK_SET); + if (write(fd, &val, sizeof(val)) < 0) + return 1; + + return 0; +} + +bool gup_read(void *ptr) +{ + unsigned long val; + + lseek(fd, (unsigned long)ptr, SEEK_SET); + if (read(fd, &val, sizeof(val)) < 0) + return 1; + + return 0; +} + +int test_gup(void) +{ + struct sigaction sa; + int status; + pid_t pid; + + sa.sa_sigaction = test_access_fix_handler; + if (sigaction(SIGSEGV, &sa, NULL)) + return 1; + sa.sa_flags = SA_SIGINFO; + + segv_triggered = false; + + fd = open("/proc/self/mem", O_RDWR); + if (fd == -1) + return 1; + + reset_test_shstk(0); + if (gup_read(shstk_ptr)) + return 1; + if (test_shstk_access(shstk_ptr)) + return 1; + printf("[INFO]\tGup read -> shstk access success\n"); + + reset_test_shstk(0); + if (gup_write(shstk_ptr)) + return 1; + if (test_shstk_access(shstk_ptr)) + return 1; + printf("[INFO]\tGup write -> shstk access success\n"); + + reset_test_shstk(0); + if (gup_read(shstk_ptr)) + return 1; + if (!test_write_access(shstk_ptr)) + return 1; + printf("[INFO]\tGup read -> write access success\n"); + + reset_test_shstk(0); + if (gup_write(shstk_ptr)) + return 1; + if (!test_write_access(shstk_ptr)) + return 1; + printf("[INFO]\tGup write -> write access success\n"); + + close(fd); + + /* COW/gup test */ + reset_test_shstk(0); + pid = fork(); + if (!pid) { + fd = open("/proc/self/mem", O_RDWR); + if (fd == -1) + exit(1); + + if (gup_write(shstk_ptr)) { + close(fd); + exit(1); + } + close(fd); + exit(0); + } + waitpid(pid, &status, 0); + if (WEXITSTATUS(status)) { + printf("[FAIL]\tWrite in child failed\n"); + return 1; + } + if (*(unsigned long *)shstk_ptr == MAGIC_VAL) { + printf("[FAIL]\tWrite in child wrote through to shared memory\n"); + return 1; + } + + printf("[INFO]\tCow gup write -> write access success\n"); + + free_shstk(shstk_ptr); + + signal(SIGSEGV, SIG_DFL); + + printf("[OK]\tShadow gup test\n"); + + return 0; +} + +int test_mprotect(void) +{ + struct sigaction sa; + + sa.sa_sigaction = test_access_fix_handler; + if (sigaction(SIGSEGV, &sa, NULL)) + return 1; + sa.sa_flags = SA_SIGINFO; + + segv_triggered = false; + + /* mprotect a shaodw stack as read only */ + reset_test_shstk(0); + if (mprotect(shstk_ptr, SS_SIZE, PROT_READ) < 0) { + printf("[FAIL]\tmprotect(PROT_READ) failed\n"); + return 1; + } + + /* try to wrss it and fail */ + if (!test_shstk_access(shstk_ptr)) { + printf("[FAIL]\tShadow stack access to read-only memory succeeded\n"); + return 1; + } + + /* then back to writable */ + if (mprotect(shstk_ptr, SS_SIZE, PROT_WRITE | PROT_READ) < 0) { + printf("[FAIL]\tmprotect(PROT_WRITE) failed\n"); + return 1; + } + + /* then pivot to it and succeed */ + if (test_shstk_access(shstk_ptr)) { + printf("[FAIL]\tShadow stack access to mprotect() writable memory failed\n"); + return 1; + } + + free_shstk(shstk_ptr); + + signal(SIGSEGV, SIG_DFL); + + printf("[OK]\tmprotect() test\n"); + + return 0; +} + +char zero[4096]; + +static void *uffd_thread(void *arg) +{ + struct uffdio_copy req; + int uffd = *(int *)arg; + struct uffd_msg msg; + + if (read(uffd, &msg, sizeof(msg)) <= 0) + return (void *)1; + + req.dst = msg.arg.pagefault.address; + req.src = (__u64)zero; + req.len = 4096; + req.mode = 0; + + if (ioctl(uffd, UFFDIO_COPY, &req)) + return (void *)1; + + return (void *)0; +} + +int test_userfaultfd(void) +{ + struct uffdio_register uffdio_register; + struct uffdio_api uffdio_api; + struct sigaction sa; + pthread_t thread; + void *res; + int uffd; + + sa.sa_sigaction = test_access_fix_handler; + if (sigaction(SIGSEGV, &sa, NULL)) + return 1; + sa.sa_flags = SA_SIGINFO; + + uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK); + if (uffd < 0) { + printf("[SKIP]\tUserfaultfd unavailable.\n"); + return 0; + } + + reset_test_shstk(0); + + uffdio_api.api = UFFD_API; + uffdio_api.features = 0; + if (ioctl(uffd, UFFDIO_API, &uffdio_api)) + goto err; + + uffdio_register.range.start = (__u64)shstk_ptr; + uffdio_register.range.len = 4096; + uffdio_register.mode = UFFDIO_REGISTER_MODE_MISSING; + if (ioctl(uffd, UFFDIO_REGISTER, &uffdio_register)) + goto err; + + if (pthread_create(&thread, NULL, &uffd_thread, &uffd)) + goto err; + + reset_shstk(shstk_ptr); + test_shstk_access(shstk_ptr); + + if (pthread_join(thread, &res)) + goto err; + + if (test_shstk_access(shstk_ptr)) + goto err; + + free_shstk(shstk_ptr); + + signal(SIGSEGV, SIG_DFL); + + if (!res) + printf("[OK]\tUserfaultfd test\n"); + return !!res; +err: + free_shstk(shstk_ptr); + close(uffd); + signal(SIGSEGV, SIG_DFL); + return 1; +} + +int main(int argc, char *argv[]) +{ + int ret = 0; + + if (ARCH_PRCTL(ARCH_CET_ENABLE, CET_SHSTK)) { + printf("[SKIP]\tCould not enable Shadow stack\n"); + return 1; + } + + if (ARCH_PRCTL(ARCH_CET_DISABLE, CET_SHSTK)) { + ret = 1; + printf("[FAIL]\tDisabling shadow stack failed\n"); + } + + if (ARCH_PRCTL(ARCH_CET_ENABLE, CET_SHSTK)) { + printf("[SKIP]\tCould not re-enable Shadow stack\n"); + return 1; + } + + if (ARCH_PRCTL(ARCH_CET_ENABLE, CET_WRSS)) { + printf("[SKIP]\tCould not enable WRSS\n"); + ret = 1; + goto out; + } + + /* Should have succeeded if here, but this is a test, so double check. */ + if (!get_ssp()) { + printf("[FAIL]\tShadow stack disabled\n"); + return 1; + } + + if (test_shstk_pivot()) { + ret = 1; + printf("[FAIL]\tShadow stack pivot\n"); + goto out; + } + + if (test_shstk_faults()) { + ret = 1; + printf("[FAIL]\tShadow stack fault test\n"); + goto out; + } + + if (test_shstk_violation()) { + ret = 1; + printf("[FAIL]\tShadow stack violation test\n"); + goto out; + } + + if (test_gup()) { + ret = 1; + printf("[FAIL]\tShadow shadow stack gup\n"); + } + + if (test_mprotect()) { + ret = 1; + printf("[FAIL]\tShadow shadow mprotect test\n"); + } + + if (test_userfaultfd()) { + ret = 1; + printf("[FAIL]\tUserfaultfd test\n"); + } + +out: + /* + * Disable shadow stack before the function returns, or there will be a + * shadow stack violation. + */ + if (ARCH_PRCTL(ARCH_CET_DISABLE, CET_SHSTK)) { + ret = 1; + printf("[FAIL]\tDisabling shadow stack failed\n"); + } + + return ret; +} +#endif From patchwork Fri Nov 4 22:36:01 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 15849 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:6687:0:0:0:0:0 with SMTP id l7csp681111wru; Fri, 4 Nov 2022 15:46:49 -0700 (PDT) X-Google-Smtp-Source: AMsMyM7rvzQCdFOyB/Bjib1RKAFUGEdPlxPN4E8s2N8oCdHHLnH/tLjIvNk55IaaAzuy4VJrkpzo X-Received: by 2002:a17:906:5d04:b0:722:f46c:b891 with SMTP id g4-20020a1709065d0400b00722f46cb891mr36377683ejt.4.1667602009527; Fri, 04 Nov 2022 15:46:49 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1667602009; cv=none; d=google.com; s=arc-20160816; b=ino+ekmfc3dMhJMEqwAmgSJnHWdREqXwUN2dBjeeJuK9mSxa5GEUoWc0Lvtq1aLCNg km1GwGWWlFcAHaMrWFV+LtGk00c4JZ4D11FXgGREjaSnuzHu43a7hcDaGaHE8r5jT57B J/BU8zd8Ck8A2dlFg5QcqJQJwqVcupKZWhILLl2g9LPnwRUZnF5TCDQAJC0FjlqxX4Px k8VDG9bU2sUeTvcp44MzwHieLrhc9g6lX7FjkHXUX6RlqK3B762bOIASL2B8n/ZzlPse yovdmBFgLwqncFICX8u89xT+pkAF1UxCbLxLWK2ppVxDu9CWIVfX2xWXCMm+MYHNPF8s kJfA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=qrAb9WBcxZa2Ls4drWG5rzJJsVV/R9zzooExIZoY3XI=; b=uoqnsuwH6vE/xW6UUQsgSry3e8Sf+f36++wy85uH6h7aX26kIijjqPX4pswc6F2aLf c9QAGaV7Fxcj/Vsm4Pp9KFgmsJEYoYByrQ4ivpHvTzjFOjXUk4Nco3v8E8Si5iXzffrZ WPIZS3z+KXzuz5xgIkG0aD3Rb8nWK7LIzScdOInMFRF7HkZ9LcAyBl+/vFwVrX0taDVG ZH9SIoHgHVSv/kavUn5HKskIsalFjpvIGIOU5L2xJkAqqSjt4s15/J6keR+45SF38+BK zYZBqUMKLS6QT7suHtGWH/Six20CRgBw2MAc3/tlCrBU5xBdrmh8STXjJQ13W6fvU/vE mEJw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=Akbq8CuO; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id xh14-20020a170906da8e00b0078def76da94si364434ejb.437.2022.11.04.15.46.26; Fri, 04 Nov 2022 15:46:49 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=Akbq8CuO; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230342AbiKDWpf (ORCPT + 99 others); Fri, 4 Nov 2022 18:45:35 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47396 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230353AbiKDWow (ORCPT ); Fri, 4 Nov 2022 18:44:52 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 73C5D69DF4; Fri, 4 Nov 2022 15:40:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1667601648; x=1699137648; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=zjdmX1zB3aTbVFcJICXaDZe79u/RmhU8f5GNP/klTjA=; b=Akbq8CuOp8BYDkiuXEMTXwQDNmJ5RsIHOaAQkBdvZrULVEBgnqTlGh8o 33ORdXce+qNdS0ehtXrOVq7uNWX2UKX4jiCAjrKasy4ekhuvpeffpqHIu NZSWFgyZ0srE2u2ypN/XvbSZeznN9pITrjdPWQ+nvrqCdDWKql7yqSju+ 4pli0VavKq8wLS70AqTGQ8zR1RyzhV5vDQWJjRgnongPO3IVQkGjYSHy7 85eaaG27fS0vJ1sWV35Rk4QHKpT9IlxPqYtEsOK8o2r5ECfjfQBYvmjG1 3JdrRpCgHGEWUaIx5KilOtc0GLtBhDCsLy070FTKfaHBJWVK33Z9dIDjF Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="311840620" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="311840620" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:53 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="668514159" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="668514159" Received: from adhjerms-mobl1.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.212.227.68]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:52 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org Cc: rick.p.edgecombe@intel.com Subject: [PATCH v3 34/37] x86/fpu: Add helper for initing features Date: Fri, 4 Nov 2022 15:36:01 -0700 Message-Id: <20221104223604.29615-35-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221104223604.29615-1-rick.p.edgecombe@intel.com> References: <20221104223604.29615-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-5.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1748607444433540939?= X-GMAIL-MSGID: =?utf-8?q?1748607444433540939?= If an xfeature is saved in a buffer, the xfeature's bit will be set in xsave->header.xfeatures. The CPU may opt to not save the xfeature if it is in it's init state. In this case the xfeature buffer address cannot be retrieved with get_xsave_addr(). Future patches will need to handle the case of writing to an xfeature that may not be saved. So provide helpers to init an xfeature in an xsave buffer. This could of course be done directly by reaching into the xsave buffer, however this would not be robust against future changes to optimize the xsave buffer by compacting it. In that case the xsave buffer would need to be re-arranged as well. So the logic properly belongs encapsulated in a helper where the logic can be unified. Tested-by: Pengfei Xu Tested-by: John Allen Signed-off-by: Rick Edgecombe --- v2: - New patch arch/x86/kernel/fpu/xstate.c | 58 +++++++++++++++++++++++++++++------- arch/x86/kernel/fpu/xstate.h | 6 ++++ 2 files changed, 53 insertions(+), 11 deletions(-) diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c index 959d4dd64434..665737559a1f 100644 --- a/arch/x86/kernel/fpu/xstate.c +++ b/arch/x86/kernel/fpu/xstate.c @@ -934,6 +934,24 @@ static void *__raw_xsave_addr(struct xregs_state *xsave, int xfeature_nr) return (void *)xsave + xfeature_get_offset(xcomp_bv, xfeature_nr); } +static int xsave_buffer_access_checks(int xfeature_nr) +{ + /* + * Do we even *have* xsave state? + */ + if (!boot_cpu_has(X86_FEATURE_XSAVE)) + return 1; + + /* + * We should not ever be requesting features that we + * have not enabled. + */ + if (WARN_ON_ONCE(!xfeature_enabled(xfeature_nr))) + return 1; + + return 0; +} + /* * Given the xsave area and a state inside, this function returns the * address of the state. @@ -954,17 +972,7 @@ static void *__raw_xsave_addr(struct xregs_state *xsave, int xfeature_nr) */ void *get_xsave_addr(struct xregs_state *xsave, int xfeature_nr) { - /* - * Do we even *have* xsave state? - */ - if (!boot_cpu_has(X86_FEATURE_XSAVE)) - return NULL; - - /* - * We should not ever be requesting features that we - * have not enabled. - */ - if (WARN_ON_ONCE(!xfeature_enabled(xfeature_nr))) + if (xsave_buffer_access_checks(xfeature_nr)) return NULL; /* @@ -984,6 +992,34 @@ void *get_xsave_addr(struct xregs_state *xsave, int xfeature_nr) return __raw_xsave_addr(xsave, xfeature_nr); } +/* + * Given the xsave area and a state inside, this function + * initializes an xfeature in the buffer. + * + * get_xsave_addr() will return NULL if the feature bit is + * not present in the header. This function will make it so + * the xfeature buffer address is ready to be retrieved by + * get_xsave_addr(). + * + * Inputs: + * xstate: the thread's storage area for all FPU data + * xfeature_nr: state which is defined in xsave.h (e.g. XFEATURE_FP, + * XFEATURE_SSE, etc...) + * Output: + * 1 if the feature cannot be inited, 0 on success + */ +int init_xfeature(struct xregs_state *xsave, int xfeature_nr) +{ + if (xsave_buffer_access_checks(xfeature_nr)) + return 1; + + /* + * Mark the feature inited. + */ + xsave->header.xfeatures |= BIT_ULL(xfeature_nr); + return 0; +} + #ifdef CONFIG_ARCH_HAS_PKEYS /* diff --git a/arch/x86/kernel/fpu/xstate.h b/arch/x86/kernel/fpu/xstate.h index 5ad47031383b..fb8aae678e9f 100644 --- a/arch/x86/kernel/fpu/xstate.h +++ b/arch/x86/kernel/fpu/xstate.h @@ -54,6 +54,12 @@ extern void fpu__init_cpu_xstate(void); extern void fpu__init_system_xstate(unsigned int legacy_size); extern void *get_xsave_addr(struct xregs_state *xsave, int xfeature_nr); +extern int init_xfeature(struct xregs_state *xsave, int xfeature_nr); + +static inline int xfeature_saved(struct xregs_state *xsave, int xfeature_nr) +{ + return xsave->header.xfeatures & BIT_ULL(xfeature_nr); +} static inline u64 xfeatures_mask_supervisor(void) { From patchwork Fri Nov 4 22:36:02 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 15851 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:6687:0:0:0:0:0 with SMTP id l7csp681138wru; Fri, 4 Nov 2022 15:46:54 -0700 (PDT) X-Google-Smtp-Source: AMsMyM6fVwb5FjLJChk4EENKxuWbipvoH0BkOMvh2iEOBK8D96kQilL7irT9PfzFJ4Lt22mcEdf0 X-Received: by 2002:a17:907:6e06:b0:7ae:1219:7035 with SMTP id sd6-20020a1709076e0600b007ae12197035mr12216144ejc.30.1667602014714; Fri, 04 Nov 2022 15:46:54 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1667602014; cv=none; d=google.com; s=arc-20160816; b=eIGqBzBDF9UradyFe7Sn7C+5XdQmTxqrsiSWkl3mKc05jBGSQKzkc5DharxFw6Rzkn 5elRiGQgq6nZ4jf2SxaKcgVpmaABwyLH1gvsN5u1L/N+KxfuQQZX1P63MsSNP9YH+bQX 1ckTA5Am1Y5QsyISZ2OmywrS7YD7HIWH6RuZwCUb0IM+oh3LQr4BplCCuoyohGZFeBTn iEEpgZ/3CxAAZSSdoOrTgK+ZML729xFgCOiWg/Fz+yAZWRHArsg16hqLLrpFfCTgdTsQ Ecf7LabwuLvjlSSuW2K3LsjQvUOxZH9ftrehk8UBqqkMb1h2lZ37TK9IFwlt9w8hwSMA 7hZQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=sz0N0XOqhW51KlS9+6psk6mCNOC96yLY7IdTKmVGxJ4=; b=TTNni1V4s9iNlQQLwbqYdIbTRht/rAO8kgf07vdCalPi/u0/LdKlbmELjkZEc2pNN8 scULdnBzcfVWYyPY5LLTZH5MGlTY+jtKJNpTxc7dx1ztVGtJ0hqcwPSdGYvDh8ZXKZam rvjX8vZJS1r2JzRAeIcDhTk1I7DwnSrdNUYbeuQlwL0BorDGXoyLWCjGdZ+WA7oBpMgy QMTZPBo6cp+M/gjle37MVHT7FBOlBGFHSMH3YvRtfzSdiZv6Ubtc+4kxaYzCcjWUSj12 VfpTSXcb1dADGSFbKytR9RUma+InX2QnO8eUJwHeoYxQxt1tjWrJ+/0qL2hzHmFCwi0e /PIA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=hI+057rc; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id r19-20020aa7da13000000b004619e210510si813721eds.150.2022.11.04.15.46.31; Fri, 04 Nov 2022 15:46:54 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=hI+057rc; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230445AbiKDWpt (ORCPT + 99 others); Fri, 4 Nov 2022 18:45:49 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50420 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229953AbiKDWpO (ORCPT ); Fri, 4 Nov 2022 18:45:14 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7FAF6748E8; Fri, 4 Nov 2022 15:40:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1667601651; x=1699137651; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=1SbH7KoNxvEwp9S5kMeICfV9WFkC+yJHuL5TGX2hAx8=; b=hI+057rc4rP/YkRP6tRz9ZXkpPjLrxDP8DqPTZ/OY6p8cwh4YXNOVKgG UXX1giMy7bBK35+1SiojyiMkFOKfVUKrFdWoE4SCQS4N+oumZGYLSEo8S rrZGrIruO1ssOmZBfEoByiWyThyBNxVh3OjN/lTqpO1uyTY20NC6iQwPa qcXPpA01NuRawkjN3ZCavViewbtsguG2jVmRjSbraHBc28mWn5f+3oCpO yF+8xgQsj/bkNLObPBEvqqPgs1YodyCHRsNYq+wk45b7hT0zzOaNBP8cW wyeW0Fl1RTC2k3cvAp981ES9T2GIc3d9K9/99H/T83Zb3fnroIPDUUkvt Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="311840627" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="311840627" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:54 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="668514171" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="668514171" Received: from adhjerms-mobl1.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.212.227.68]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:53 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v3 35/37] x86/cet: Add PTRACE interface for CET Date: Fri, 4 Nov 2022 15:36:02 -0700 Message-Id: <20221104223604.29615-36-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221104223604.29615-1-rick.p.edgecombe@intel.com> References: <20221104223604.29615-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-5.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1748607450078254448?= X-GMAIL-MSGID: =?utf-8?q?1748607450078254448?= From: Yu-cheng Yu Some applications (like GDB and CRIU) would like to tweak CET state via ptrace. This allows for existing functionality to continue to work for seized CET applications. Provide an interface based on the xsave buffer format of CET, but filter unneeded states to make the kernel’s job easier. There is already ptrace functionality for accessing xstate, but this does not include supervisor xfeatures. So there is not a completely clear place for where to put the CET state. Adding it to the user xfeatures regset would complicate that code, as it currently shares logic with signals which should not have supervisor features. Don’t add a general supervisor xfeature regset like the user one, because it is better to maintain flexibility for other supervisor xfeatures to define their own interface. For example, an xfeature may decide not to expose all of it’s state to userspace. A lot of enum values remain to be used, so just put it in dedicated CET regset. The only downside to not having a generic supervisor xfeature regset, is that apps need to be enlightened of any new supervisor xfeature exposed this way (i.e. they can’t try to have generic save/restore logic). But maybe that is a good thing, because they have to think through each new xfeature instead of encountering issues when new a new supervisor xfeature was added. By adding a CET regset, it also has the effect of including the CET state in a core dump, which could be useful for debugging. Inside the setter CET regset, filter out invalid state. Today this includes states disallowed by the HW and states involving Indirect Branch Tracking which the kernel does not currently support for usersapce. So this leaves three pieces of data that can be set, shadow stack enablement, WRSS enablement and the shadow stack pointer. It is worth noting that this is separate than enabling shadow stack via the arch_prctl()s. Enabling shadow stack involves more than just flipping the bit. The kernel is made aware that it has to do extra things when cloning or handling signals. That logic is triggered off of separate feature enablement state kept in the task struct. So the flipping on HW shadow stack enforcement without notifying the kernel to change its behavior would severely limit what an application could do without crashing. Since there is likely no use for this, only allow the CET registers to be set if shadow stack is already enabled via the arch_prctl()s. This will let apps like GDB toggle shadow stack enforcement for apps that already have shadow stack enabled, and minimize scenarios the kernel has to worry about. Tested-by: Pengfei Xu Tested-by: John Allen Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Signed-off-by: Yu-cheng Yu --- v3: - Drop dependence on thread.shstk.size, and use thread.features bits - Drop 32 bit support v2: - Check alignment on ssp. - Block IBT bits. - Handle init states instead of returning error. - Add verbose commit log justifying the design. Yu-Cheng v12: - Return -ENODEV when CET registers are in INIT state. - Check reserved/non-support bits from user input. arch/x86/include/asm/fpu/regset.h | 7 +-- arch/x86/include/asm/msr-index.h | 5 ++ arch/x86/kernel/fpu/regset.c | 90 +++++++++++++++++++++++++++++++ arch/x86/kernel/ptrace.c | 20 +++++++ include/uapi/linux/elf.h | 1 + 5 files changed, 120 insertions(+), 3 deletions(-) diff --git a/arch/x86/include/asm/fpu/regset.h b/arch/x86/include/asm/fpu/regset.h index 4f928d6a367b..8622184d87f5 100644 --- a/arch/x86/include/asm/fpu/regset.h +++ b/arch/x86/include/asm/fpu/regset.h @@ -7,11 +7,12 @@ #include -extern user_regset_active_fn regset_fpregs_active, regset_xregset_fpregs_active; +extern user_regset_active_fn regset_fpregs_active, regset_xregset_fpregs_active, + cetregs_active; extern user_regset_get2_fn fpregs_get, xfpregs_get, fpregs_soft_get, - xstateregs_get; + xstateregs_get, cetregs_get; extern user_regset_set_fn fpregs_set, xfpregs_set, fpregs_soft_set, - xstateregs_set; + xstateregs_set, cetregs_set; /* * xstateregs_active == regset_fpregs_active. Please refer to the comment diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h index 10ac52705892..674c508798ee 100644 --- a/arch/x86/include/asm/msr-index.h +++ b/arch/x86/include/asm/msr-index.h @@ -437,6 +437,11 @@ #define CET_RESERVED (BIT_ULL(6) | BIT_ULL(7) | BIT_ULL(8) | BIT_ULL(9)) #define CET_SUPPRESS BIT_ULL(10) #define CET_WAIT_ENDBR BIT_ULL(11) +#define CET_EG_LEG_BITMAP_BASE_MASK GENMASK_ULL(63, 13) + +#define CET_U_IBT_MASK (CET_ENDBR_EN | CET_LEG_IW_EN | CET_NO_TRACK_EN | \ + CET_NO_TRACK_EN | CET_SUPPRESS_DISABLE | CET_SUPPRESS | \ + CET_WAIT_ENDBR | CET_EG_LEG_BITMAP_BASE_MASK) #define MSR_IA32_PL0_SSP 0x000006a4 /* ring-0 shadow stack pointer */ #define MSR_IA32_PL1_SSP 0x000006a5 /* ring-1 shadow stack pointer */ diff --git a/arch/x86/kernel/fpu/regset.c b/arch/x86/kernel/fpu/regset.c index 75ffaef8c299..21225b994b2d 100644 --- a/arch/x86/kernel/fpu/regset.c +++ b/arch/x86/kernel/fpu/regset.c @@ -8,6 +8,7 @@ #include #include #include +#include #include "context.h" #include "internal.h" @@ -174,6 +175,95 @@ int xstateregs_set(struct task_struct *target, const struct user_regset *regset, return ret; } +int cetregs_active(struct task_struct *target, const struct user_regset *regset) +{ +#ifdef CONFIG_X86_USER_SHADOW_STACK + if (target->thread.features & CET_SHSTK) + return regset->n; +#endif + return 0; +} + +int cetregs_get(struct task_struct *target, const struct user_regset *regset, + struct membuf to) +{ + struct fpu *fpu = &target->thread.fpu; + struct cet_user_state *cetregs; + + if (!boot_cpu_has(X86_FEATURE_USER_SHSTK)) + return -ENODEV; + + sync_fpstate(fpu); + cetregs = get_xsave_addr(&fpu->fpstate->regs.xsave, XFEATURE_CET_USER); + if (!cetregs) { + /* + * The registers are the in the init state. The init values for + * these regs are zero, so just zero the output buffer. + */ + membuf_zero(&to, sizeof(struct cet_user_state)); + return 0; + } + + return membuf_write(&to, cetregs, sizeof(struct cet_user_state)); +} + +int cetregs_set(struct task_struct *target, const struct user_regset *regset, + unsigned int pos, unsigned int count, + const void *kbuf, const void __user *ubuf) +{ + struct fpu *fpu = &target->thread.fpu; + struct xregs_state *xsave = &fpu->fpstate->regs.xsave; + struct cet_user_state *cetregs, tmp; + int r; + + if (!boot_cpu_has(X86_FEATURE_USER_SHSTK) || + !cetregs_active(target, regset)) + return -ENODEV; + + r = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &tmp, 0, -1); + if (r) + return r; + + /* + * Some kernel instructions (IRET, etc) can cause exceptions in the case + * of disallowed CET register values. Just prevent invalid values. + */ + if ((tmp.user_ssp >= TASK_SIZE_MAX) || !IS_ALIGNED(tmp.user_ssp, 8)) + return -EINVAL; + + /* + * Don't allow any IBT bits to be set because it is not supported by + * the kernel yet. Also don't allow reserved bits. + */ + if ((tmp.user_cet & CET_RESERVED) || (tmp.user_cet & CET_U_IBT_MASK)) + return -EINVAL; + + fpu_force_restore(fpu); + + /* + * Don't want to init the xfeature until the kernel will definetely + * overwrite it, otherwise if it inits and then fails out, it would + * end up initing it to random data. + */ + if (!xfeature_saved(xsave, XFEATURE_CET_USER) && + WARN_ON(init_xfeature(xsave, XFEATURE_CET_USER))) + return -ENODEV; + + cetregs = get_xsave_addr(xsave, XFEATURE_CET_USER); + if (WARN_ON(!cetregs)) { + /* + * This shouldn't ever be NULL because it was successfully + * inited above if needed. The only scenario would be if an + * xfeature was somehow saved in a buffer, but not enabled in + * xsave. + */ + return -ENODEV; + } + + memmove(cetregs, &tmp, sizeof(tmp)); + return 0; +} + #if defined CONFIG_X86_32 || defined CONFIG_IA32_EMULATION /* diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c index eed8a65d335d..f9e6635b69ce 100644 --- a/arch/x86/kernel/ptrace.c +++ b/arch/x86/kernel/ptrace.c @@ -51,6 +51,7 @@ enum x86_regset_32 { REGSET_XSTATE32, REGSET_TLS32, REGSET_IOPERM32, + REGSET_CET32, }; enum x86_regset_64 { @@ -58,6 +59,7 @@ enum x86_regset_64 { REGSET_FP64, REGSET_IOPERM64, REGSET_XSTATE64, + REGSET_CET64, }; #define REGSET_GENERAL \ @@ -1267,6 +1269,15 @@ static struct user_regset x86_64_regsets[] __ro_after_init = { .active = ioperm_active, .regset_get = ioperm_get }, + [REGSET_CET64] = { + .core_note_type = NT_X86_CET, + .n = sizeof(struct cet_user_state) / sizeof(u64), + .size = sizeof(u64), + .align = sizeof(u64), + .active = cetregs_active, + .regset_get = cetregs_get, + .set = cetregs_set + }, }; static const struct user_regset_view user_x86_64_view = { @@ -1336,6 +1347,15 @@ static struct user_regset x86_32_regsets[] __ro_after_init = { .active = ioperm_active, .regset_get = ioperm_get }, + [REGSET_CET32] = { + .core_note_type = NT_X86_CET, + .n = sizeof(struct cet_user_state) / sizeof(u64), + .size = sizeof(u64), + .align = sizeof(u64), + .active = cetregs_active, + .regset_get = cetregs_get, + .set = cetregs_set + }, }; static const struct user_regset_view user_x86_32_view = { diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h index c7b056af9ef0..11089731e2e9 100644 --- a/include/uapi/linux/elf.h +++ b/include/uapi/linux/elf.h @@ -406,6 +406,7 @@ typedef struct elf64_shdr { #define NT_386_TLS 0x200 /* i386 TLS slots (struct user_desc) */ #define NT_386_IOPERM 0x201 /* x86 io permission bitmap (1=deny) */ #define NT_X86_XSTATE 0x202 /* x86 extended state using xsave */ +#define NT_X86_CET 0x203 /* x86 CET state */ #define NT_S390_HIGH_GPRS 0x300 /* s390 upper register halves */ #define NT_S390_TIMER 0x301 /* s390 timer register */ #define NT_S390_TODCMP 0x302 /* s390 TOD clock comparator register */ From patchwork Fri Nov 4 22:36:03 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 15852 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:6687:0:0:0:0:0 with SMTP id l7csp683332wru; Fri, 4 Nov 2022 15:53:03 -0700 (PDT) X-Google-Smtp-Source: AMsMyM4prrhUTtoRlLkQ7vCQr4fkKp6BkieyBB4zkiZX5O77UoGx8vxUNgMq3i7Xb1ebN8xM4dnX X-Received: by 2002:aa7:9565:0:b0:56c:8860:bce1 with SMTP id x5-20020aa79565000000b0056c8860bce1mr37501264pfq.31.1667602382986; Fri, 04 Nov 2022 15:53:02 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1667602382; cv=none; d=google.com; s=arc-20160816; b=UjkiRwraFPxt8DJc68nfYTBq+PWe5LtNaEUMUEiEZC23cW/lXr62Uyj8azN+jcZyYz DUs4kavXtYExuH2ziufKhcs90gDq0fCJTo2gF/FOJDBc3P5LKsi/iNDH+c8ZO4xcoDZe Tbe+eEOOMu/qnP4nZ447d9Lq3+iyW+ficr1kw1uTJCk5eJi2hiDZwi5Nh7wmZ65mhibx 1N7C/2bJo/3Gy1HmTyXw4kHaI6SU905goQAoDy1sQOD/R3N41xPDUlFMZ+AwzHBKlvSp f2WSXWZMcfoCFKGNncdWOOzhdP0AWrgGrj9/jO7RUtE10bKR9UqydjEkKNH7gCRRNHdA 6liQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=4DbhJGL2YfsIhJmqdbV0qT0hHBtBUjxUx75J/ln81gM=; b=VnEUcG2X9JBl+VJ8UzdYObE76BvMgVeulUEmUnon6wUCBkoHpksb34ApjbiJElNbDG Y1DhVRuMSUrm6oC7jrlGCoqHoO+q9K0log5H91S42JUi31hvcAEBViKV8rGLrx3wL1Un Y4jZWpRVyq6pDYVz5xDRVz6D4vBo9YyEzxjP8vIpEKMTYMleovREcgVL4Y0IUndkGKGB YPYoXoNkhX3ql8M6zVskI+iY8feEjwtDN+HXpjg4G5D1STroDlRi8mlSkdlppqlpPS0c ZxLCScTGHvyXMPaxlUmEns2fW+VEDWq9zcWLFnLLBN6qBmVw3CgoS1eiswl1oNwhDJxc GHWA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=def8mlvq; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id l191-20020a6391c8000000b0046ec9ae4a2esi926171pge.31.2022.11.04.15.52.47; Fri, 04 Nov 2022 15:53:02 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=def8mlvq; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230173AbiKDWqZ (ORCPT + 99 others); Fri, 4 Nov 2022 18:46:25 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55484 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230354AbiKDWpb (ORCPT ); Fri, 4 Nov 2022 18:45:31 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BE37612126F; Fri, 4 Nov 2022 15:40:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1667601658; x=1699137658; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=mTDZ+f4KdYSqV9q+619GLZ5q/2YYY0xPkbFAnS8RyTs=; b=def8mlvqYPnYhc3nOZcvqfNWaA1k1m8BtlQQAr+YKbLlUqkbYUNrs2tO aN5yrwy65XdbOg+tEpabzei3ae/cJctLJUhxgyewZbEmAS5SszEh96WhH /8/skaDavFapsTWE0VP6JJXas5ADW3P/csS9viip+v+DTjE/jix0cX7Zh eSJc6HGVtQBEA+CWU2mtXCb2s+ugRFGHZ/I/jhBX1DtRjGR8SQ6xn4hr3 54LBW0R5CzMmHcbOCHdVnulJf5lCRrei6BH6zE0I9KZWwJVEZ8dGEQg9S uEenIOsZ1wqhqbF7JP1v/RmgK0BsgDuzbpksvAZMKxnXZw9Asyd8uhzCF g==; X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="311840633" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="311840633" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:55 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="668514181" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="668514181" Received: from adhjerms-mobl1.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.212.227.68]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:54 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org Cc: rick.p.edgecombe@intel.com, Mike Rapoport Subject: [PATCH v3 36/37] x86/cet/shstk: Add ARCH_CET_UNLOCK Date: Fri, 4 Nov 2022 15:36:03 -0700 Message-Id: <20221104223604.29615-37-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221104223604.29615-1-rick.p.edgecombe@intel.com> References: <20221104223604.29615-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-5.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1748607836314970506?= X-GMAIL-MSGID: =?utf-8?q?1748607836314970506?= From: Mike Rapoport Userspace loaders may lock features before a CRIU restore operation has the chance to set them to whatever state is required by the process being restored. Allow a way for CRIU to unlock features. Add it as an arch_prctl() like the other CET operations, but restrict it being called by the ptrace arch_pctl() interface. Tested-by: Pengfei Xu Tested-by: John Allen Signed-off-by: Mike Rapoport [Merged into recent API changes, added commit log and docs] Signed-off-by: Rick Edgecombe --- v3: - Depend on CONFIG_CHECKPOINT_RESTORE (Kees) Documentation/x86/cet.rst | 4 ++++ arch/x86/include/uapi/asm/prctl.h | 1 + arch/x86/kernel/process_64.c | 1 + arch/x86/kernel/shstk.c | 9 +++++++-- 4 files changed, 13 insertions(+), 2 deletions(-) diff --git a/Documentation/x86/cet.rst b/Documentation/x86/cet.rst index b56811566531..f69cafb1feff 100644 --- a/Documentation/x86/cet.rst +++ b/Documentation/x86/cet.rst @@ -66,6 +66,10 @@ arch_prctl(ARCH_CET_LOCK, unsigned int features) are ignored. The mask is ORed with the existing value. So any feature bits set here cannot be enabled or disabled afterwards. +arch_prctl(ARCH_CET_UNLOCK, unsigned int features) + Unlock features. 'features' is a mask of all features to unlock. All + bits set are processed, unset bits are ignored. + The return values are as following: On success, return 0. On error, errno can be:: diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h index 5f1d3181e4a1..0c37fd0ad8d9 100644 --- a/arch/x86/include/uapi/asm/prctl.h +++ b/arch/x86/include/uapi/asm/prctl.h @@ -25,6 +25,7 @@ #define ARCH_CET_ENABLE 0x5001 #define ARCH_CET_DISABLE 0x5002 #define ARCH_CET_LOCK 0x5003 +#define ARCH_CET_UNLOCK 0x5004 /* ARCH_CET_ features bits */ #define CET_SHSTK (1ULL << 0) diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c index 17fec059317c..03bc16c9cc19 100644 --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -835,6 +835,7 @@ long do_arch_prctl_64(struct task_struct *task, int option, unsigned long arg2) case ARCH_CET_ENABLE: case ARCH_CET_DISABLE: case ARCH_CET_LOCK: + case ARCH_CET_UNLOCK: return cet_prctl(task, option, arg2); default: ret = -EINVAL; diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c index 71620b77a654..bed7032d35f2 100644 --- a/arch/x86/kernel/shstk.c +++ b/arch/x86/kernel/shstk.c @@ -450,9 +450,14 @@ long cet_prctl(struct task_struct *task, int option, unsigned long features) return 0; } - /* Don't allow via ptrace */ - if (task != current) + /* Only allow via ptrace */ + if (task != current) { + if (option == ARCH_CET_UNLOCK && IS_ENABLED(CONFIG_CHECKPOINT_RESTORE)) { + task->thread.features_locked &= ~features; + return 0; + } return -EINVAL; + } /* Do not allow to change locked features */ if (features & task->thread.features_locked) From patchwork Fri Nov 4 22:36:04 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 15853 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:6687:0:0:0:0:0 with SMTP id l7csp683461wru; Fri, 4 Nov 2022 15:53:25 -0700 (PDT) X-Google-Smtp-Source: AMsMyM58ipO8hEZ5cpjoaD9goy5kVvKeD5fbWHepjBarzlZP7GreB8QIZukHQsUMGKW4DZAmwEAX X-Received: by 2002:a63:e64f:0:b0:43c:9db1:8096 with SMTP id p15-20020a63e64f000000b0043c9db18096mr31253954pgj.567.1667602405449; Fri, 04 Nov 2022 15:53:25 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1667602405; cv=none; d=google.com; s=arc-20160816; b=ZKbKlgcKjr0QJCpZxUEdthbKSkpX/Y6eDDwt6fZ8A73kAbZDPDbNbVHSH0dUjL49DN nSoOAEmrZozVCYdqG4Gss6OpN26/0iMTRHNeV7ZzigwTOLz7ooOrbSBiga6nIaWjajzZ WA1JPJ3HkUpNlPBs/5qvWrwaUoe0tR/azxQzIDCI9/tZdSKa8ZceH0bnX6ZY4rPrcCdE 0Rj8vb9Z+TB47ecMJl4MNZ6VqVZKbxnrTKGhn+k+/DOQAmgxoQyRCk9TWiiMQykjy4+V k0lTfzMbDzzEDIbavfIu+EtPqTwRJt0kswZgtLeuwFvistiuLPFpDJ0GEmZl/VmSj0bD ChQg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=6gYZU/qVJuoCl3Jd0ozzU40OjUNydsi9o6k71TVqkmE=; b=uimTEXacoiwzjWcUDCHcBFZidj1JtT+vzOEFmlBAvdz4ayvGsr3U1CAXkna5T38qLI AlTVurkO/xpXa6AeKik6HfRRCAIghJIoY+OrdV8dz+ZV0dSFaYw6jCu5vHnyitzYpYuD C5zF1Y26Xg4gcK9oonjhxGSkvFhXpBStGCGmIna7P7aLq/h2yx+heg/eLN86rt51ATbJ p8JlAZWf9wRTBkGdnq5o9nCyBJcKM7u1udQl5xIQuUuLEtN0sEz4eZBmDv60Qw4KH7xC isTYIAXy63bv0IYQwd3ReWroDof8590Grwf+fyY2YYjakUsz3K9DsaMB52nXSdcByqwg DJfg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=NpEYZrVi; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id s78-20020a632c51000000b00460c07c5542si919794pgs.407.2022.11.04.15.53.11; Fri, 04 Nov 2022 15:53:25 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=NpEYZrVi; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230407AbiKDWqa (ORCPT + 99 others); Fri, 4 Nov 2022 18:46:30 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55478 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230364AbiKDWpb (ORCPT ); Fri, 4 Nov 2022 18:45:31 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 65E3A121277; Fri, 4 Nov 2022 15:40:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1667601659; x=1699137659; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=oFyzrlClONZwQJyWr62ON/SaIrHd5uDBUG8KA9aMSoM=; b=NpEYZrVicVXA/mdbRMu0zSEuLe/l1kbTfOcOvBPzKLkHiqODijh6Lekq kygt779Ds04eyNz1FDU0JzxJcRgDklP/0LWHWfC+Sjy6eY1SezQwc7s9H Q41cX5p3o47iU2M1ewnlZTDNQF4NRYIJTkfwVrshRi1lmIGPyDrLjsZGW fZ25SdAzgt++wfOQtsLogwUIwjEuWAopuvu2Q/t1X3wn2wWixLCq8Sfwt PQdCk/w5wui5T8WhIGpvBAL3ova0i2CgXg2H081iHyCQXjWItXg/XAmL2 P2Zj6809WtjWdtDHRc8jYUVnQZbRQUztWi5V7buR0/g0PCO2b34aj7Vx3 w==; X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="311840637" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="311840637" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:55 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10521"; a="668514186" X-IronPort-AV: E=Sophos;i="5.96,138,1665471600"; d="scan'208";a="668514186" Received: from adhjerms-mobl1.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.212.227.68]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Nov 2022 15:39:55 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [RFC 37/37] fs/binfmt_elf: Block old shstk elf bit Date: Fri, 4 Nov 2022 15:36:04 -0700 Message-Id: <20221104223604.29615-38-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221104223604.29615-1-rick.p.edgecombe@intel.com> References: <20221104223604.29615-1-rick.p.edgecombe@intel.com> X-Spam-Status: No, score=-5.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1748607860066086091?= X-GMAIL-MSGID: =?utf-8?q?1748607860066086091?= The x86 Control-flow Enforcement Technology (CET) feature includes a new feature called shadow stacks that provides security enforcement of behavior that is rarely used by non-attackers. There exists a lurking compatibility problem for userspace shadow stack. Old binaries exist that are marked as supporting shadow stack in their elf header, but actually will crash if shadow stack is enabled. This would only happens if the loader chooses to call the kernel APIs that enable shadow stack. However, glibc plans to update to do just this. At which point the old apps will crash. In a lot of ways this is userspace's business, however the kernel could save the user from these crashes. It could do this by detecting the elf bit and blocking the shadow stack APIs, so that loader (glibc) will fail to enable shadow stack and the binary would then run without shadow stack. So implement this logic in the elf processing that happens during exec. This is a bit dirty, and implemented here just for discussion on whether the kernel should actually do something like this. The elf loading logic in the kernel has to do a little extra scanning through the elf header in order to find the shadow stack bit. Since some people may not mind if some apps crash, also create a Kconfig X86_USER_SHADOW_STACK_ALLOW_BROKEN to allow the old binaries to still have access to the shadow stack kernel APIs. This is based on an earlier patch by Yu-cheng Yu that was looking at elf bits on the interpreter instead of the execing binary. Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe --- arch/arm64/include/asm/elf.h | 5 +++++ arch/x86/Kconfig | 13 +++++++++++++ arch/x86/include/asm/cet.h | 2 ++ arch/x86/include/asm/elf.h | 11 +++++++++++ arch/x86/include/asm/processor.h | 1 + arch/x86/kernel/process_64.c | 33 ++++++++++++++++++++++++++++++++ arch/x86/kernel/shstk.c | 15 +++++++++++++++ fs/binfmt_elf.c | 24 ++++++++++++++++++++++- include/linux/elf.h | 6 ++++++ include/uapi/linux/elf.h | 15 +++++++++++++++ 10 files changed, 124 insertions(+), 1 deletion(-) diff --git a/arch/arm64/include/asm/elf.h b/arch/arm64/include/asm/elf.h index 97932fbf973d..1aa76ed02dda 100644 --- a/arch/arm64/include/asm/elf.h +++ b/arch/arm64/include/asm/elf.h @@ -279,6 +279,11 @@ static inline int arch_parse_elf_property(u32 type, const void *data, return 0; } +static inline int arch_process_elf_property(struct arch_elf_state *arch) +{ + return 0; +} + static inline int arch_elf_pt_proc(void *ehdr, void *phdr, struct file *f, bool is_interp, struct arch_elf_state *state) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index f3d14f5accce..da9e43aa91a3 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -28,6 +28,7 @@ config X86_64 select ARCH_HAS_GIGANTIC_PAGE select ARCH_SUPPORTS_INT128 if CC_HAS_INT128 select ARCH_USE_CMPXCHG_LOCKREF + select ARCH_USE_GNU_PROPERTY select HAVE_ARCH_SOFT_DIRTY select MODULES_USE_ELF_RELA select NEED_DMA_MAP_STATE @@ -60,6 +61,7 @@ config X86 select ACPI_LEGACY_TABLES_LOOKUP if ACPI select ACPI_SYSTEM_POWER_STATES_SUPPORT if ACPI select ARCH_32BIT_OFF_T if X86_32 + select ARCH_BINFMT_ELF_STATE select ARCH_CLOCKSOURCE_INIT select ARCH_CORRECT_STACKTRACE_ON_KRETPROBE select ARCH_ENABLE_HUGEPAGE_MIGRATION if X86_64 && HUGETLB_PAGE && MIGRATION @@ -1977,6 +1979,17 @@ config X86_USER_SHADOW_STACK If unsure, say N. +config X86_USER_SHADOW_STACK_ALLOW_BROKEN + bool "Allow enabling shadow stack for broken binaries" + depends on EXPERT + depends on X86_USER_SHADOW_STACK + help + There exist old binaries that are marked as compatible with shadow + stack, but actually aren't. The kernel blocks these binaries from + getting shadow stack enabled by default. But some working binaries + are also blocked. Select this option if you would like to allow these + binaries to run with shadow stack, and possibly crash. + config EFI bool "EFI runtime service support" depends on ACPI diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h index 098e4ecfdf9b..7f0cabb3db21 100644 --- a/arch/x86/include/asm/cet.h +++ b/arch/x86/include/asm/cet.h @@ -22,6 +22,7 @@ int shstk_alloc_thread_stack(struct task_struct *p, unsigned long clone_flags, void shstk_free(struct task_struct *p); int setup_signal_shadow_stack(struct ksignal *ksig); int restore_signal_shadow_stack(void); +void bad_cet_binary_disable(bool disable); #else static inline long cet_prctl(struct task_struct *task, int option, unsigned long features) { return -EINVAL; } @@ -33,6 +34,7 @@ static inline int shstk_alloc_thread_stack(struct task_struct *p, static inline void shstk_free(struct task_struct *p) {} static inline int setup_signal_shadow_stack(struct ksignal *ksig) { return 0; } static inline int restore_signal_shadow_stack(void) { return 0; } +static inline void bad_cet_binary_disable(bool disable) {}; #endif /* CONFIG_X86_USER_SHADOW_STACK */ #endif /* __ASSEMBLY__ */ diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h index cb0ff1055ab1..95ee133acffb 100644 --- a/arch/x86/include/asm/elf.h +++ b/arch/x86/include/asm/elf.h @@ -383,6 +383,17 @@ extern int compat_arch_setup_additional_pages(struct linux_binprm *bprm, extern bool arch_syscall_is_vdso_sigreturn(struct pt_regs *regs); +struct arch_elf_state { + unsigned int gnu_property; +}; + +#define INIT_ARCH_ELF_STATE { \ + .gnu_property = 0, \ +} + +#define arch_elf_pt_proc(ehdr, phdr, elf, interp, state) (0) +#define arch_check_elf(ehdr, interp, interp_ehdr, state) (0) + /* Do not change the values. See get_align_mask() */ enum align_flags { ALIGN_VA_32 = BIT(0), diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index a6c414dfd10f..4b333c801010 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -534,6 +534,7 @@ struct thread_struct { #ifdef CONFIG_X86_USER_SHADOW_STACK unsigned long features; unsigned long features_locked; + bool bad_cet_binary_disable; struct thread_shstk shstk; #endif diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c index 03bc16c9cc19..461b8e9468df 100644 --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -867,3 +867,36 @@ unsigned long KSTK_ESP(struct task_struct *task) { return task_pt_regs(task)->sp; } + +#ifdef CONFIG_X86_USER_SHADOW_STACK +int arch_parse_elf_property(u32 type, const void *data, size_t datasz, + bool compat, struct arch_elf_state *state) +{ + if (type != GNU_PROPERTY_X86_FEATURE_1_AND) + return 0; + + if (datasz != sizeof(unsigned int)) + return -ENOEXEC; + + state->gnu_property = *(unsigned int *)data; + return 0; +} + +int arch_process_elf_property(struct arch_elf_state *state) +{ + bad_cet_binary_disable(state->gnu_property & GNU_PROPERTY_X86_FEATURE_1_BAD); + return 0; +} +#else /* CONFIG_X86_USER_SHADOW_STACK */ +int arch_parse_elf_property(u32 type, const void *data, size_t datasz, + bool compat, struct arch_elf_state *state) +{ + return 0; +} + +int arch_process_elf_property(struct arch_elf_state *state) +{ + return 0; +} +#endif /* CONFIG_X86_USER_SHADOW_STACK */ + diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c index bed7032d35f2..cb105e69c840 100644 --- a/arch/x86/kernel/shstk.c +++ b/arch/x86/kernel/shstk.c @@ -445,6 +445,9 @@ SYSCALL_DEFINE3(map_shadow_stack, unsigned long, addr, unsigned long, size, unsi long cet_prctl(struct task_struct *task, int option, unsigned long features) { + if (task->thread.bad_cet_binary_disable) + return -EINVAL; + if (option == ARCH_CET_LOCK) { task->thread.features_locked |= features; return 0; @@ -482,3 +485,15 @@ long cet_prctl(struct task_struct *task, int option, unsigned long features) return wrss_control(true); return -EINVAL; } + +#ifdef CONFIG_X86_USER_SHADOW_STACK_ALLOW_BROKEN +void bad_cet_binary_disable(bool disable) +{ + current->thread.bad_cet_binary_disable = false; +} +#else /* CONFIG_X86_USER_SHADOW_STACK_ALLOW_BROKEN */ +void bad_cet_binary_disable(bool disable) +{ + current->thread.bad_cet_binary_disable = disable; +} +#endif /* CONFIG_X86_USER_SHADOW_STACK_ALLOW_BROKEN */ diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c index 6a11025e5850..8b6ae5e423fb 100644 --- a/fs/binfmt_elf.c +++ b/fs/binfmt_elf.c @@ -764,6 +764,8 @@ static int parse_elf_property(const char *data, size_t *off, size_t datasz, #define GNU_PROPERTY_TYPE_0_NAME "GNU" #define NOTE_NAME_SZ (sizeof(GNU_PROPERTY_TYPE_0_NAME)) + + static int parse_elf_properties(struct file *f, const struct elf_phdr *phdr, struct arch_elf_state *arch) { @@ -821,6 +823,18 @@ static int parse_elf_properties(struct file *f, const struct elf_phdr *phdr, return ret == -ENOENT ? 0 : ret; } +static int check_elf_properties(struct file *f, const struct elf_phdr *phdr) +{ + struct arch_elf_state arch_state = INIT_ARCH_ELF_STATE; + int retval; + + retval = parse_elf_properties(f, phdr, &arch_state); + if (retval) + return retval; + + return arch_process_elf_property(&arch_state); +} + static int load_elf_binary(struct linux_binprm *bprm) { struct file *interpreter = NULL; /* to shut gcc up */ @@ -920,13 +934,21 @@ static int load_elf_binary(struct linux_binprm *bprm) if (retval < 0) goto out_free_dentry; - break; + /* Quit if already found PT_GNU_PROPERTY */ + if (elf_property_phdata) + break; + + continue; out_free_interp: kfree(elf_interpreter); goto out_free_ph; } + retval = check_elf_properties(bprm->file, elf_property_phdata); + if (retval) + return retval; + elf_ppnt = elf_phdata; for (i = 0; i < elf_ex->e_phnum; i++, elf_ppnt++) switch (elf_ppnt->p_type) { diff --git a/include/linux/elf.h b/include/linux/elf.h index c9a46c4e183b..faf961b92a95 100644 --- a/include/linux/elf.h +++ b/include/linux/elf.h @@ -92,9 +92,15 @@ static inline int arch_parse_elf_property(u32 type, const void *data, { return 0; } + +static inline int arch_process_elf_property(struct arch_elf_state *arch) +{ + return 0; +} #else extern int arch_parse_elf_property(u32 type, const void *data, size_t datasz, bool compat, struct arch_elf_state *arch); +extern int arch_process_elf_property(struct arch_elf_state *arch); #endif #ifdef CONFIG_ARCH_HAVE_ELF_PROT diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h index 11089731e2e9..d9b58adce321 100644 --- a/include/uapi/linux/elf.h +++ b/include/uapi/linux/elf.h @@ -469,4 +469,19 @@ typedef struct elf64_note { /* Bits for GNU_PROPERTY_AARCH64_FEATURE_1_BTI */ #define GNU_PROPERTY_AARCH64_FEATURE_1_BTI (1U << 0) +/* + * See the x86 64 psABI at: + * https://gitlab.com/x86-psABIs/x86-64-ABI/-/wikis/x86-64-psABI + * .note.gnu.property types for x86: + */ +/* 0xc0000000 and 0xc0000001 are reserved */ +#define GNU_PROPERTY_X86_FEATURE_1_AND 0xc0000002 + +/* Bits for GNU_PROPERTY_X86_FEATURE_1_AND */ +#define GNU_PROPERTY_X86_FEATURE_1_IBT 0x00000001 +#define GNU_PROPERTY_X86_FEATURE_1_SHSTK 0x00000002 + +#define GNU_PROPERTY_X86_FEATURE_1_BAD (GNU_PROPERTY_X86_FEATURE_1_IBT | \ + GNU_PROPERTY_X86_FEATURE_1_SHSTK) + #endif /* _UAPI_LINUX_ELF_H */