Message ID | 20230227222957.24501-25-rick.p.edgecombe@intel.com |
---|---|
State | New |
Headers |
From: Rick Edgecombe <rick.p.edgecombe@intel.com>
To: x86@kernel.org, "H . Peter Anvin" <hpa@zytor.com>, Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann <arnd@arndb.de>, Andy Lutomirski <luto@kernel.org>, Balbir Singh <bsingharora@gmail.com>, Borislav Petkov <bp@alien8.de>, Cyrill Gorcunov <gorcunov@gmail.com>, Dave Hansen <dave.hansen@linux.intel.com>, Eugene Syromiatnikov <esyr@redhat.com>, Florian Weimer <fweimer@redhat.com>, "H . J . Lu" <hjl.tools@gmail.com>, Jann Horn <jannh@google.com>, Jonathan Corbet <corbet@lwn.net>, Kees Cook <keescook@chromium.org>, Mike Kravetz <mike.kravetz@oracle.com>, Nadav Amit <nadav.amit@gmail.com>, Oleg Nesterov <oleg@redhat.com>, Pavel Machek <pavel@ucw.cz>, Peter Zijlstra <peterz@infradead.org>, Randy Dunlap <rdunlap@infradead.org>, Weijiang Yang <weijiang.yang@intel.com>, "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>, John Allen <john.allen@amd.com>, kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com
Cc: rick.p.edgecombe@intel.com
Subject: [PATCH v7 24/41] mm: Don't allow write GUPs to shadow stack memory
Date: Mon, 27 Feb 2023 14:29:40 -0800
Message-Id: <20230227222957.24501-25-rick.p.edgecombe@intel.com>
In-Reply-To: <20230227222957.24501-1-rick.p.edgecombe@intel.com>
References: <20230227222957.24501-1-rick.p.edgecombe@intel.com>
List-ID: <linux-kernel.vger.kernel.org> |
Series |
Shadow stacks for userspace
|
|
Commit Message
Edgecombe, Rick P
Feb. 27, 2023, 10:29 p.m. UTC
The x86 Control-flow Enforcement Technology (CET) feature includes a new
type of memory called shadow stack. This shadow stack memory has some
unusual properties, which requires some core mm changes to function
properly.

Shadow stack memory is writable only in very specific, controlled ways.
However, since it is writable, the kernel treats it as such. As a result
there remain many ways for userspace to trigger the kernel to write to
shadow stack's via get_user_pages(, FOLL_WRITE) operations. To make this a
little less exposed, block writable GUPs for shadow stack VMAs.

Still allow FOLL_FORCE to write through shadow stack protections, as it
does for read-only protections.

Tested-by: Pengfei Xu <pengfei.xu@intel.com>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Kees Cook <keescook@chromium.org>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>

---
v3:
 - Add comment in __pte_access_permitted() (Dave)
 - Remove unneeded shadow stack specific check in __pte_access_permitted() (Jann)
---
 arch/x86/include/asm/pgtable.h | 5 +++++
 mm/gup.c                       | 2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)
Comments
On Mon, Feb 27, 2023 at 02:29:40PM -0800, Rick Edgecombe wrote:
> The x86 Control-flow Enforcement Technology (CET) feature includes a new
> type of memory called shadow stack. This shadow stack memory has some
> unusual properties, which requires some core mm changes to function
> properly.
>
> Shadow stack memory is writable only in very specific, controlled ways.
> However, since it is writable, the kernel treats it as such. As a result
                                                                          ^
                                                                          ,

> there remain many ways for userspace to trigger the kernel to write to
> shadow stack's via get_user_pages(, FOLL_WRITE) operations. To make this a

"stacks" or "to write to a shadow stack via..."

> little less exposed, block writable GUPs for shadow stack VMAs.

GUPs?

I suppose this means "prevent get_user_pages() from pinning pages to
which the corresponding VMA is a shadow stack one."? Or something like
that which is less mm-internal speak...
On Mon, Mar 6, 2023 at 5:10 AM Borislav Petkov <bp@alien8.de> wrote:
>
> On Mon, Feb 27, 2023 at 02:29:40PM -0800, Rick Edgecombe wrote:
> > The x86 Control-flow Enforcement Technology (CET) feature includes a new
> > type of memory called shadow stack. This shadow stack memory has some
> > unusual properties, which requires some core mm changes to function
> > properly.
> >
> > Shadow stack memory is writable only in very specific, controlled ways.
> > However, since it is writable, the kernel treats it as such. As a result
>                                                                           ^
>                                                                           ,
>
> > there remain many ways for userspace to trigger the kernel to write to
> > shadow stack's via get_user_pages(, FOLL_WRITE) operations. To make this a

Is there an alternate mechanism, or do we still want to allow
FOLL_FORCE so that debuggers can write it?

--Andy
On Mon, 2023-03-06 at 10:15 -0800, Andy Lutomirski wrote:
> On Mon, Mar 6, 2023 at 5:10 AM Borislav Petkov <bp@alien8.de> wrote:
> >
> > On Mon, Feb 27, 2023 at 02:29:40PM -0800, Rick Edgecombe wrote:
> > > [...]
> > > there remain many ways for userspace to trigger the kernel to write to
> > > shadow stack's via get_user_pages(, FOLL_WRITE) operations. To make this a
>
> Is there an alternate mechanism, or do we still want to allow
> FOLL_FORCE so that debuggers can write it?

Yes, GDB shadow stack support uses it via both ptrace poke and
/proc/pid/mem apparently. So some ability to write through is needed
for debuggers. But not CRIU actually. It uses WRSS.

There was also some discussion[0] previously about how apps might
prefer to block /proc/self/mem for general security reasons. Blocking
shadow stack writes while you allow text writes is probably not that
impactful security-wise. So I thought it would be better to leave the
logic simpler. Then when /proc/self/mem could be locked down per the
discussion, shadow stack can be locked down the same way.

[0] https://lore.kernel.org/lkml/E857CF98-EEB2-4F83-8305-0A52B463A661@kernel.org/
On Mon, Mar 6, 2023, at 10:33 AM, Edgecombe, Rick P wrote:
> On Mon, 2023-03-06 at 10:15 -0800, Andy Lutomirski wrote:
>> On Mon, Mar 6, 2023 at 5:10 AM Borislav Petkov <bp@alien8.de> wrote:
>> > On Mon, Feb 27, 2023 at 02:29:40PM -0800, Rick Edgecombe wrote:
>> > > [...]
>>
>> Is there an alternate mechanism, or do we still want to allow
>> FOLL_FORCE so that debuggers can write it?
>
> Yes, GDB shadow stack support uses it via both ptrace poke and
> /proc/pid/mem apparently. So some ability to write through is needed
> for debuggers. But not CRIU actually. It uses WRSS.
>
> There was also some discussion[0] previously about how apps might
> prefer to block /proc/self/mem for general security reasons. Blocking
> shadow stack writes while you allow text writes is probably not that
> impactful security-wise. So I thought it would be better to leave the
> logic simpler. Then when /proc/self/mem could be locked down per the
> discussion, shadow stack can be locked down the same way.

Ah, I am guilty of reading your changelog but not the code.

You said:

    Shadow stack memory is writable only in very specific, controlled ways.
    However, since it is writable, the kernel treats it as such. As a result
    there remain many ways for userspace to trigger the kernel to write to
    shadow stack's via get_user_pages(, FOLL_WRITE) operations. To make this a
    little less exposed, block writable GUPs for shadow stack VMAs.

I read that as *denying* FOLL_FORCE. Maybe clarify the changelog?

>
> [0]
> https://lore.kernel.org/lkml/E857CF98-EEB2-4F83-8305-0A52B463A661@kernel.org/
On Mon, 2023-03-06 at 10:57 -0800, Andy Lutomirski wrote:
> On Mon, Mar 6, 2023, at 10:33 AM, Edgecombe, Rick P wrote:
> > On Mon, 2023-03-06 at 10:15 -0800, Andy Lutomirski wrote:
> > > On Mon, Mar 6, 2023 at 5:10 AM Borislav Petkov <bp@alien8.de> wrote:
> > > > On Mon, Feb 27, 2023 at 02:29:40PM -0800, Rick Edgecombe wrote:
> > > > > [...]
> > >
> > > Is there an alternate mechanism, or do we still want to allow
> > > FOLL_FORCE so that debuggers can write it?
> >
> > Yes, GDB shadow stack support uses it via both ptrace poke and
> > /proc/pid/mem apparently. [...]
>
> Ah, I am guilty of reading your changelog but not the code.
>
> You said:
>
>     Shadow stack memory is writable only in very specific, controlled ways.
>     However, since it is writable, the kernel treats it as such. As a result
>     there remain many ways for userspace to trigger the kernel to write to
>     shadow stack's via get_user_pages(, FOLL_WRITE) operations. To make this a
>     little less exposed, block writable GUPs for shadow stack VMAs.
>
> I read that as *denying* FOLL_FORCE. Maybe clarify the changelog?

I think maybe some helpful text missed the quote in Boris comment about
other issues:

"Still allow FOLL_FORCE to write through shadow stack protections, as it
does for read-only protections."

But, yea, the tenses are hard to parse. Maybe something like this:

The x86 Control-flow Enforcement Technology (CET) feature includes a new
type of memory called shadow stack. This shadow stack memory has some
unusual properties, which requires some core mm changes to function
properly.

In userspace, shadow stack memory is writable only in very specific,
controlled ways. However, since userspace can, even in the limited ways,
modify shadow stack contents, the kernel treats it as writable memory. As
a result, without additional work there would remain many ways for
userspace to trigger the kernel to write arbitrary data to shadow stacks
via get_user_pages(, FOLL_WRITE) based operations. To help userspace
protect their shadow stacks, make this a little less exposed by blocking
writable get_user_pages() operations for shadow stack VMAs.

Still allow FOLL_FORCE to write through shadow stack protections, as it
does for read-only protections. This is required for debugging use cases.
On Mon, Feb 27, 2023 at 2:31 PM Rick Edgecombe <rick.p.edgecombe@intel.com> wrote:
> diff --git a/mm/gup.c b/mm/gup.c
> index eab18ba045db..e7c7bcc0e268 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -978,7 +978,7 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
>                 return -EFAULT;
>
>         if (write) {
> -               if (!(vm_flags & VM_WRITE)) {
> +               if (!(vm_flags & VM_WRITE) || (vm_flags & VM_SHADOW_STACK)) {

I think I missed this in the review. `VM_SHADOW_STACK` is an x86-specific
VMA flag to represent a shadow stack VMA. Since this is arch-agnostic code,
can we instead have `is_arch_shadow_stack_vma`, which consumes VMA flags and
returns true for a shadow stack VMA? This allows different architectures to
choose whatever encoding of the VMA flag to represent a shadow stack.

>                         if (!(gup_flags & FOLL_FORCE))
>                                 return -EFAULT;
>                         /* hugetlb does not support FOLL_FORCE|FOLL_WRITE. */
> --
> 2.17.1
>
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index d81e7ec27507..2e3d8cca1195 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1638,6 +1638,11 @@ static inline bool __pte_access_permitted(unsigned long pteval, bool write)
 {
 	unsigned long need_pte_bits = _PAGE_PRESENT|_PAGE_USER;
 
+	/*
+	 * Write=0,Dirty=1 PTEs are shadow stack, which the kernel
+	 * shouldn't generally allow access to, but since they
+	 * are already Write=0, the below logic covers both cases.
+	 */
 	if (write)
 		need_pte_bits |= _PAGE_RW;
 
diff --git a/mm/gup.c b/mm/gup.c
index eab18ba045db..e7c7bcc0e268 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -978,7 +978,7 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
 		return -EFAULT;
 
 	if (write) {
-		if (!(vm_flags & VM_WRITE)) {
+		if (!(vm_flags & VM_WRITE) || (vm_flags & VM_SHADOW_STACK)) {
 			if (!(gup_flags & FOLL_FORCE))
 				return -EFAULT;
 			/* hugetlb does not support FOLL_FORCE|FOLL_WRITE. */