From patchwork Thu Jan 19 21:22:38 2023
From: Rick Edgecombe
To: x86@kernel.org, "H . Peter Anvin", Thomas Gleixner, Ingo Molnar, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann, Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen, Eugene Syromiatnikov, Florian Weimer, "H . J . Lu", Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap, Weijiang Yang, "Kirill A . Shutemov", John Allen, kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com
Cc: rick.p.edgecombe@intel.com
Subject: [PATCH v5 00/39] Shadow stacks for userspace
Date: Thu, 19 Jan 2023 13:22:38 -0800
Message-Id: <20230119212317.8324-1-rick.p.edgecombe@intel.com>

Hi,

This series implements Shadow Stacks for userspace using x86's Control-flow Enforcement Technology
(CET). CET consists of two related security features: shadow stacks and indirect branch tracking. This series implements just the shadow stack part of this feature, and just for userspace.

The main use case for shadow stack is providing protection against return-oriented programming attacks. It works by maintaining a secondary (shadow) stack using a special memory type that has protections against modification. When executing a CALL instruction, the processor pushes the return address to both the normal stack and to the special-permission shadow stack. Upon RET, the processor pops the shadow stack copy and compares it to the normal stack copy. A mismatch raises a control-protection fault. For more details, see the cover letter from v1 [0].

The main change in this version is the removal of the attempt to prevent 32 bit signals from being registered with shadow stack enabled. Peterz originally raised the issue that shadow stack support in 32 bit signals was in a half working state. The reason was that 32 bit signals are not easy to support for shadow stack, and there is not a huge demand for shadow stack support in 32 bit apps using 32 bit emulation on 64 bit kernels. At that point the solution was to prevent shadow stack from being enabled on 32 bit processes. But Peterz pointed out that 64 bit apps can transition to 32 bit outside of kernel interaction by making a far call to a 32 bit segment. So the next solution was to prevent 32 bit signals from being registered when shadow stack was enabled. This turned out to be hard to do, due to signals being per-process and shadow stack being per-task.

But it turns out this far call scenario was already mostly not possible due to the HW not supporting shadow stacks located outside of the 32 bit address space when in 32 bit mode. During the transition to 32 bit mode with an SSP pointing outside of the 32 bit address space, HW generates a #GP which in turn triggers a segfault.
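
As a rough illustration only (this is not code from the series), the CALL/RET comparison and the 32 bit transition check described above can be modeled in plain C. The names toy_cpu, toy_call(), toy_ret() and toy_ssp_ok_for_32bit() are hypothetical, and on real hardware both checks are performed by the CPU rather than by software.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_DEPTH 16

/* Toy model only: names and layout are assumptions made for illustration. */
struct toy_cpu {
	uint64_t stack[TOY_DEPTH];   /* normal stack: writable by the program */
	uint64_t shadow[TOY_DEPTH];  /* shadow stack: written only by toy_call() */
	size_t depth;                /* number of return addresses currently pushed */
};

/* Model of CALL: push the return address to both stacks. */
static void toy_call(struct toy_cpu *cpu, uint64_t ret_addr)
{
	assert(cpu->depth < TOY_DEPTH);
	cpu->stack[cpu->depth] = ret_addr;
	cpu->shadow[cpu->depth] = ret_addr;
	cpu->depth++;
}

/*
 * Model of RET: pop both copies and compare them.
 * Returns true when they match. A false return stands in for the
 * fault the hardware would raise on a mismatch.
 */
static bool toy_ret(struct toy_cpu *cpu, uint64_t *ret_addr)
{
	assert(cpu->depth > 0);
	cpu->depth--;
	*ret_addr = cpu->stack[cpu->depth];
	return cpu->stack[cpu->depth] == cpu->shadow[cpu->depth];
}

/*
 * Model of the transition to 32 bit mode: the SSP must lie inside the
 * 32 bit address space. A false return stands in for the #GP.
 */
static bool toy_ssp_ok_for_32bit(uint64_t ssp)
{
	return ssp <= UINT32_MAX;
}
```

In this model, overwriting an entry in cpu.stack[] between toy_call() and toy_ret() makes toy_ret() return false, and an SSP of 0x100000000 or higher fails toy_ssp_ok_for_32bit().
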
So, for the most part, a barrier against this far call scenario is already in place. Creation of shadow stack memory is tightly controlled, so the solution in this version is just to *ensure* that shadow stacks can never be allocated in the 32 bit address space. For more information see the new patch: "x86/mm: Introduce MAP_ABOVE4G", and the documentation in patch 1.

Additionally:
 - A smattering of small changes from Boris and Kees
 - Fixed my spellcheck setup and then fixed a bunch of spelling issues in the commit logs.
 - An update to the pte_modify() PAGE_COW solution

I left tested-by tags in place per discussion with testers. Testers, please retest.

Previous version [1].

[0] https://lore.kernel.org/lkml/20220130211838.8382-1-rick.p.edgecombe@intel.com/
[1] https://lore.kernel.org/lkml/20221203003606.6838-1-rick.p.edgecombe@intel.com/

Kirill A. Shutemov (1):
  x86: Introduce userspace API for shadow stack

Mike Rapoport (1):
  x86/shstk: Add ARCH_SHSTK_UNLOCK

Rick Edgecombe (14):
  x86/fpu: Add helper for modifying xstate
  x86/mm: Introduce _PAGE_COW
  x86/mm: Start actually marking _PAGE_COW
  mm: Handle faultless write upgrades for shstk
  mm: Don't allow write GUPs to shadow stack memory
  x86/mm: Introduce MAP_ABOVE4G
  mm: Warn on shadow stack memory in wrong vma
  x86/shstk: Introduce map_shadow_stack syscall
  x86/shstk: Support WRSS for userspace
  x86: Expose thread features in /proc/$PID/status
  x86/shstk: Wire in shadow stack interface
  selftests/x86: Add shadow stack test
  x86/fpu: Add helper for initing features
  x86/shstk: Add ARCH_SHSTK_STATUS

Yu-cheng Yu (23):
  Documentation/x86: Add CET shadow stack description
  x86/shstk: Add Kconfig option for shadow stack
  x86/cpufeatures: Add CPU feature flags for shadow stacks
  x86/cpufeatures: Enable CET CR4 bit for shadow stack
  x86/fpu/xstate: Introduce CET MSR and XSAVES supervisor states
  x86: Add user control-protection fault handler
  x86/mm: Remove _PAGE_DIRTY from kernel RO pages
  x86/mm: Move pmd_write(), pud_write() up in the file
  x86/mm: Update pte_modify for _PAGE_COW
  x86/mm: Update ptep_set_wrprotect() and pmdp_set_wrprotect() for
    transition from _PAGE_DIRTY to _PAGE_COW
  mm: Move VM_UFFD_MINOR_BIT from 37 to 38
  mm: Introduce VM_SHADOW_STACK for shadow stack memory
  x86/mm: Check shadow stack page fault errors
  x86/mm: Update maybe_mkwrite() for shadow stack
  mm: Fixup places that call pte_mkwrite() directly
  mm: Add guard pages around a shadow stack.
  mm/mmap: Add shadow stack pages to memory accounting
  mm: Re-introduce vm_flags to do_mmap()
  x86/shstk: Add user-mode shadow stack support
  x86/shstk: Handle thread shadow stack
  x86/shstk: Introduce routines modifying shstk
  x86/shstk: Handle signals for shadow stack
  x86: Add PTRACE interface for shadow stack

 Documentation/filesystems/proc.rst            |   1 +
 Documentation/x86/index.rst                   |   1 +
 Documentation/x86/shstk.rst                   | 176 +++++
 arch/arm/kernel/signal.c                      |   2 +-
 arch/arm64/kernel/signal.c                    |   2 +-
 arch/arm64/kernel/signal32.c                  |   2 +-
 arch/sparc/kernel/signal32.c                  |   2 +-
 arch/sparc/kernel/signal_64.c                 |   2 +-
 arch/x86/Kconfig                              |  24 +
 arch/x86/Kconfig.assembler                    |   5 +
 arch/x86/entry/syscalls/syscall_64.tbl        |   1 +
 arch/x86/include/asm/cpufeatures.h            |   2 +
 arch/x86/include/asm/disabled-features.h      |  16 +-
 arch/x86/include/asm/fpu/api.h                |   9 +
 arch/x86/include/asm/fpu/regset.h             |   7 +-
 arch/x86/include/asm/fpu/sched.h              |   3 +-
 arch/x86/include/asm/fpu/types.h              |  16 +-
 arch/x86/include/asm/fpu/xstate.h             |   6 +-
 arch/x86/include/asm/idtentry.h               |   2 +-
 arch/x86/include/asm/mmu_context.h            |   2 +
 arch/x86/include/asm/msr.h                    |  11 +
 arch/x86/include/asm/pgtable.h                | 338 ++++++++-
 arch/x86/include/asm/pgtable_types.h          |  65 +-
 arch/x86/include/asm/processor.h              |   8 +
 arch/x86/include/asm/shstk.h                  |  40 ++
 arch/x86/include/asm/special_insns.h          |  13 +
 arch/x86/include/asm/tlbflush.h               |   3 +-
 arch/x86/include/asm/trap_pf.h                |   2 +
 arch/x86/include/asm/traps.h                  |  12 +
 arch/x86/include/uapi/asm/mman.h              |   4 +
 arch/x86/include/uapi/asm/prctl.h             |  12 +
 arch/x86/kernel/Makefile                      |   4 +
 arch/x86/kernel/cet.c                         | 152 ++++
 arch/x86/kernel/cpu/common.c                  |  35 +-
 arch/x86/kernel/cpu/cpuid-deps.c              |   1 +
 arch/x86/kernel/cpu/proc.c                    |  23 +
 arch/x86/kernel/fpu/core.c                    |  59 +-
 arch/x86/kernel/fpu/regset.c                  |  87 +++
 arch/x86/kernel/fpu/xstate.c                  | 148 ++--
 arch/x86/kernel/fpu/xstate.h                  |   6 +
 arch/x86/kernel/idt.c                         |   2 +-
 arch/x86/kernel/process.c                     |  18 +-
 arch/x86/kernel/process_64.c                  |   9 +-
 arch/x86/kernel/ptrace.c                      |  12 +
 arch/x86/kernel/shstk.c                       | 492 +++++++++++++
 arch/x86/kernel/signal.c                      |   1 +
 arch/x86/kernel/signal_32.c                   |   2 +-
 arch/x86/kernel/signal_64.c                   |   8 +-
 arch/x86/kernel/sys_x86_64.c                  |   6 +-
 arch/x86/kernel/traps.c                       |  87 ---
 arch/x86/mm/fault.c                           |  38 +
 arch/x86/mm/pat/set_memory.c                  |   2 +-
 arch/x86/mm/pgtable.c                         |   6 +
 arch/x86/xen/enlighten_pv.c                   |   2 +-
 arch/x86/xen/xen-asm.S                        |   2 +-
 fs/aio.c                                      |   2 +-
 fs/proc/array.c                               |   6 +
 fs/proc/task_mmu.c                            |   3 +
 include/linux/mm.h                            |  59 +-
 include/linux/mman.h                          |   4 +
 include/linux/pgtable.h                       |  35 +
 include/linux/proc_fs.h                       |   2 +
 include/linux/syscalls.h                      |   1 +
 include/uapi/asm-generic/siginfo.h            |   3 +-
 include/uapi/asm-generic/unistd.h             |   2 +-
 include/uapi/linux/elf.h                      |   2 +
 ipc/shm.c                                     |   2 +-
 kernel/sys_ni.c                               |   1 +
 mm/gup.c                                      |   2 +-
 mm/huge_memory.c                              |  12 +-
 mm/memory.c                                   |   7 +-
 mm/migrate_device.c                           |   4 +-
 mm/mmap.c                                     |  12 +-
 mm/nommu.c                                    |   4 +-
 mm/userfaultfd.c                              |  10 +-
 mm/util.c                                     |   2 +-
 tools/testing/selftests/x86/Makefile          |   4 +-
 .../testing/selftests/x86/test_shadow_stack.c | 667 ++++++++++++++++++
 78 files changed, 2578 insertions(+), 259 deletions(-)
 create mode 100644 Documentation/x86/shstk.rst
 create mode 100644 arch/x86/include/asm/shstk.h
 create mode 100644 arch/x86/kernel/cet.c
 create mode 100644 arch/x86/kernel/shstk.c
 create mode 100644 tools/testing/selftests/x86/test_shadow_stack.c

Tested-by: John Allen
Acked-by: Mike Rapoport (IBM)