Message ID | 20230613001108.3040476-3-rick.p.edgecombe@intel.com |
---|---|
State | New |
Headers |
From: Rick Edgecombe <rick.p.edgecombe@intel.com>
To: x86@kernel.org, "H . Peter Anvin" <hpa@zytor.com>, Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann <arnd@arndb.de>, Andy Lutomirski <luto@kernel.org>, Balbir Singh <bsingharora@gmail.com>, Borislav Petkov <bp@alien8.de>, Cyrill Gorcunov <gorcunov@gmail.com>, Dave Hansen <dave.hansen@linux.intel.com>, Eugene Syromiatnikov <esyr@redhat.com>, Florian Weimer <fweimer@redhat.com>, "H . J . Lu" <hjl.tools@gmail.com>, Jann Horn <jannh@google.com>, Jonathan Corbet <corbet@lwn.net>, Kees Cook <keescook@chromium.org>, Mike Kravetz <mike.kravetz@oracle.com>, Nadav Amit <nadav.amit@gmail.com>, Oleg Nesterov <oleg@redhat.com>, Pavel Machek <pavel@ucw.cz>, Peter Zijlstra <peterz@infradead.org>, Randy Dunlap <rdunlap@infradead.org>, Weijiang Yang <weijiang.yang@intel.com>, "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>, John Allen <john.allen@amd.com>, kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org
Cc: rick.p.edgecombe@intel.com, linux-arm-kernel@lists.infradead.org, linux-s390@vger.kernel.org, xen-devel@lists.xenproject.org
Subject: [PATCH v9 02/42] mm: Move pte/pmd_mkwrite() callers with no VMA to _novma()
Date: Mon, 12 Jun 2023 17:10:28 -0700
Message-Id: <20230613001108.3040476-3-rick.p.edgecombe@intel.com>
In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com>
References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> |
Series | Shadow stacks for userspace |
Commit Message
Edgecombe, Rick P
June 13, 2023, 12:10 a.m. UTC
The x86 Shadow stack feature includes a new type of memory called shadow
stack. This shadow stack memory has some unusual properties, which requires
some core mm changes to function properly.
One of these unusual properties is that shadow stack memory is writable,
but only in limited ways. These limits are applied via a specific PTE
bit combination. Nevertheless, the memory is writable, and core mm code
will need to apply the writable permissions in the typical paths that
call pte_mkwrite(). Future patches will make pte_mkwrite() take a VMA, so
that the x86 implementation of it can know whether to create regular
writable memory or shadow stack memory.
But there are a couple of challenges to this. Modifying the signatures of
each arch pte_mkwrite() implementation would be error prone because some
are generated with macros and would need to be re-implemented. Also, some
pte_mkwrite() callers operate on kernel memory without a VMA.
So this can be done in a three-step process. First pte_mkwrite() can be
renamed to pte_mkwrite_novma() in each arch, with a generic pte_mkwrite()
added that just calls pte_mkwrite_novma(). Next callers without a VMA can
be moved to pte_mkwrite_novma(). And lastly, pte_mkwrite() and all callers
can be changed to take/pass a VMA.
Previous patches have done the first step, so next move the callers that
don't have a VMA to pte_mkwrite_novma(). Also do the same for
pmd_mkwrite(). This will be ok for the shadow stack feature, as these
callers are on kernel memory which will not need to be made shadow stack,
and the other architectures only currently support one type of memory
in pte_mkwrite().
Cc: linux-doc@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-s390@vger.kernel.org
Cc: xen-devel@lists.xenproject.org
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
Hi non-x86 arches,
x86 has a feature that allows for the creation of a special type of
writable memory (shadow stack) that is only writable in limited specific
ways. Previously, changes were proposed to core MM code to teach it to
decide when to create normally writable memory or the special shadow stack
writable memory, but David Hildenbrand suggested[0] changing
pXX_mkwrite() to take a VMA, so awareness of shadow stack memory can be
moved into x86 code. Later Linus suggested a less error-prone way[1] to go
about this after the first attempt had a bug.
Since pXX_mkwrite() is defined in every arch, it requires some tree-wide
changes. So that is why you are seeing some patches out of a big x86
series pop up in your arch mailing list. There is no functional change.
After this refactor, the shadow stack series goes on to use the arch
helpers to push arch memory details inside arch/x86 and other arches
with upcoming shadow stack features.
Testing was just 0-day build testing.
Hopefully that is enough context. Thanks!
[0] https://lore.kernel.org/lkml/0e29a2d0-08d8-bcd6-ff26-4bea0e4037b0@redhat.com/
[1] https://lore.kernel.org/lkml/CAHk-=wiZjSu7c9sFYZb3q04108stgHff2wfbokGCCgW7riz+8Q@mail.gmail.com/
---
arch/arm64/mm/trans_pgd.c | 4 ++--
arch/s390/mm/pageattr.c | 4 ++--
arch/x86/xen/mmu_pv.c | 2 +-
3 files changed, 5 insertions(+), 5 deletions(-)
Comments
On Mon, Jun 12, 2023 at 05:10:28PM -0700, Rick Edgecombe wrote:
> The x86 Shadow stack feature includes a new type of memory called shadow
> stack. This shadow stack memory has some unusual properties, which requires
> some core mm changes to function properly.
>
> One of these unusual properties is that shadow stack memory is writable,
> but only in limited ways. These limits are applied via a specific PTE
> bit combination. Nevertheless, the memory is writable, and core mm code
> will need to apply the writable permissions in the typical paths that
> call pte_mkwrite(). Future patches will make pte_mkwrite() take a VMA, so
> that the x86 implementation of it can know whether to create regular
> writable memory or shadow stack memory.

Nit: ^ mappings?

> But there are a couple of challenges to this. Modifying the signatures of
> each arch pte_mkwrite() implementation would be error prone because some
> are generated with macros and would need to be re-implemented. Also, some
> pte_mkwrite() callers operate on kernel memory without a VMA.
>
> So this can be done in a three step process. First pte_mkwrite() can be
> renamed to pte_mkwrite_novma() in each arch, with a generic pte_mkwrite()
> added that just calls pte_mkwrite_novma(). Next callers without a VMA can
> be moved to pte_mkwrite_novma(). And lastly, pte_mkwrite() and all callers
> can be changed to take/pass a VMA.
>
> Previous patches have done the first step, so next move the callers that
> don't have a VMA to pte_mkwrite_novma(). Also do the same for

I hear x86 maintainers asking to drop "previous patches" ;-)

Maybe
This is the second step of the conversion that moves the callers ...

> pmd_mkwrite(). This will be ok for the shadow stack feature, as these
> callers are on kernel memory which will not need to be made shadow stack,
> and the other architectures only currently support one type of memory
> in pte_mkwrite()
>
> Cc: linux-doc@vger.kernel.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-s390@vger.kernel.org
> Cc: xen-devel@lists.xenproject.org
> Cc: linux-arch@vger.kernel.org
> Cc: linux-mm@kvack.org
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>

Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
On 13.06.23 02:10, Rick Edgecombe wrote:
> The x86 Shadow stack feature includes a new type of memory called shadow
> stack. This shadow stack memory has some unusual properties, which requires
> some core mm changes to function properly.
>
> One of these unusual properties is that shadow stack memory is writable,
> but only in limited ways. These limits are applied via a specific PTE
> bit combination. Nevertheless, the memory is writable, and core mm code
> will need to apply the writable permissions in the typical paths that
> call pte_mkwrite(). Future patches will make pte_mkwrite() take a VMA, so
> that the x86 implementation of it can know whether to create regular
> writable memory or shadow stack memory.
>
> But there are a couple of challenges to this. Modifying the signatures of
> each arch pte_mkwrite() implementation would be error prone because some
> are generated with macros and would need to be re-implemented. Also, some
> pte_mkwrite() callers operate on kernel memory without a VMA.
>
> So this can be done in a three step process. First pte_mkwrite() can be
> renamed to pte_mkwrite_novma() in each arch, with a generic pte_mkwrite()
> added that just calls pte_mkwrite_novma(). Next callers without a VMA can
> be moved to pte_mkwrite_novma(). And lastly, pte_mkwrite() and all callers
> can be changed to take/pass a VMA.
>
> Previous patches have done the first step, so next move the callers that
> don't have a VMA to pte_mkwrite_novma(). Also do the same for
> pmd_mkwrite(). This will be ok for the shadow stack feature, as these
> callers are on kernel memory which will not need to be made shadow stack,
> and the other architectures only currently support one type of memory
> in pte_mkwrite()
>
> Cc: linux-doc@vger.kernel.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-s390@vger.kernel.org
> Cc: xen-devel@lists.xenproject.org
> Cc: linux-arch@vger.kernel.org
> Cc: linux-mm@kvack.org
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> ---

Acked-by: David Hildenbrand <david@redhat.com>
On Tue, 2023-06-13 at 10:44 +0300, Mike Rapoport wrote:
> > Previous patches have done the first step, so next move the callers that
> > don't have a VMA to pte_mkwrite_novma(). Also do the same for
>
> I hear x86 maintainers asking to drop "previous patches" ;-)
>
> Maybe
> This is the second step of the conversion that moves the callers ...

Really? I've not heard that. Just a strong aversion to "this patch".
I've got feedback to say "previous patches" and not "the last patch" so
it doesn't get stale. I guess it could be "previous changes".

>
> > pmd_mkwrite(). This will be ok for the shadow stack feature, as these
> > callers are on kernel memory which will not need to be made shadow stack,
> > and the other architectures only currently support one type of memory
> > in pte_mkwrite()
> >
> > Cc: linux-doc@vger.kernel.org
> > Cc: linux-arm-kernel@lists.infradead.org
> > Cc: linux-s390@vger.kernel.org
> > Cc: xen-devel@lists.xenproject.org
> > Cc: linux-arch@vger.kernel.org
> > Cc: linux-mm@kvack.org
> > Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
>
> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>

Thanks!
On Tue, 2023-06-13 at 14:27 +0200, David Hildenbrand wrote:
> Acked-by: David Hildenbrand <david@redhat.com>
Thanks!
On 13.06.23 18:19, Edgecombe, Rick P wrote:
> On Tue, 2023-06-13 at 10:44 +0300, Mike Rapoport wrote:
>>> Previous patches have done the first step, so next move the callers that
>>> don't have a VMA to pte_mkwrite_novma(). Also do the same for
>>
>> I hear x86 maintainers asking to drop "previous patches" ;-)
>>
>> Maybe
>> This is the second step of the conversion that moves the callers ...
>
> Really? I've not heard that. Just a strong aversion to "this patch".
> I've got feedback to say "previous patches" and not "the last patch" so
> it doesn't get stale. I guess it could be "previous changes".

Talking about patches makes sense when discussing literal patches sent to
the mailing list. In the git log, it's commits, and "future commits" or
"follow-up work".

Yes, we use "patches" all of the time in commit logs, especially when we
include the cover letter in the commit message (as done frequently in
the -mm tree).
On Tue, 2023-06-13 at 19:00 +0200, David Hildenbrand wrote:
> On 13.06.23 18:19, Edgecombe, Rick P wrote:
> > On Tue, 2023-06-13 at 10:44 +0300, Mike Rapoport wrote:
> > > > Previous patches have done the first step, so next move the callers that
> > > > don't have a VMA to pte_mkwrite_novma(). Also do the same for
> > >
> > > I hear x86 maintainers asking to drop "previous patches" ;-)
> > >
> > > Maybe
> > > This is the second step of the conversion that moves the callers ...
> >
> > Really? I've not heard that. Just a strong aversion to "this patch".
> > I've got feedback to say "previous patches" and not "the last patch" so
> > it doesn't get stale. I guess it could be "previous changes".
>
> Talking about patches make sense when discussing literal patches sent to
> the mailing list. In the git log, it's commit, and "future commits" or
> "follow-up work".
>
> Yes, we use "patches" all of the time in commit logs, especially when we
> include the cover letter in the commit message (as done frequently in
> the -mm tree).

I think I'll switch over to talking about "changes". If you talk about
commits it doesn't make as much sense when they are still just patches.
Thanks.
diff --git a/arch/arm64/mm/trans_pgd.c b/arch/arm64/mm/trans_pgd.c
index 4ea2eefbc053..a01493f3a06f 100644
--- a/arch/arm64/mm/trans_pgd.c
+++ b/arch/arm64/mm/trans_pgd.c
@@ -40,7 +40,7 @@ static void _copy_pte(pte_t *dst_ptep, pte_t *src_ptep, unsigned long addr)
 		 * read only (code, rodata). Clear the RDONLY bit from
 		 * the temporary mappings we use during restore.
 		 */
-		set_pte(dst_ptep, pte_mkwrite(pte));
+		set_pte(dst_ptep, pte_mkwrite_novma(pte));
 	} else if (debug_pagealloc_enabled() && !pte_none(pte)) {
 		/*
 		 * debug_pagealloc will removed the PTE_VALID bit if
@@ -53,7 +53,7 @@ static void _copy_pte(pte_t *dst_ptep, pte_t *src_ptep, unsigned long addr)
 		 */
 		BUG_ON(!pfn_valid(pte_pfn(pte)));

-		set_pte(dst_ptep, pte_mkpresent(pte_mkwrite(pte)));
+		set_pte(dst_ptep, pte_mkpresent(pte_mkwrite_novma(pte)));
 	}
 }

diff --git a/arch/s390/mm/pageattr.c b/arch/s390/mm/pageattr.c
index 5ba3bd8a7b12..6931d484d8a7 100644
--- a/arch/s390/mm/pageattr.c
+++ b/arch/s390/mm/pageattr.c
@@ -97,7 +97,7 @@ static int walk_pte_level(pmd_t *pmdp, unsigned long addr, unsigned long end,
 		if (flags & SET_MEMORY_RO)
 			new = pte_wrprotect(new);
 		else if (flags & SET_MEMORY_RW)
-			new = pte_mkwrite(pte_mkdirty(new));
+			new = pte_mkwrite_novma(pte_mkdirty(new));
 		if (flags & SET_MEMORY_NX)
 			new = set_pte_bit(new, __pgprot(_PAGE_NOEXEC));
 		else if (flags & SET_MEMORY_X)
@@ -155,7 +155,7 @@ static void modify_pmd_page(pmd_t *pmdp, unsigned long addr,
 	if (flags & SET_MEMORY_RO)
 		new = pmd_wrprotect(new);
 	else if (flags & SET_MEMORY_RW)
-		new = pmd_mkwrite(pmd_mkdirty(new));
+		new = pmd_mkwrite_novma(pmd_mkdirty(new));
 	if (flags & SET_MEMORY_NX)
 		new = set_pmd_bit(new, __pgprot(_SEGMENT_ENTRY_NOEXEC));
 	else if (flags & SET_MEMORY_X)
diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
index b3b8d289b9ab..63fced067057 100644
--- a/arch/x86/xen/mmu_pv.c
+++ b/arch/x86/xen/mmu_pv.c
@@ -150,7 +150,7 @@ void make_lowmem_page_readwrite(void *vaddr)
 	if (pte == NULL)
 		return; /* vaddr missing */

-	ptev = pte_mkwrite(*pte);
+	ptev = pte_mkwrite_novma(*pte);

 	if (HYPERVISOR_update_va_mapping(address, ptev, 0))
 		BUG();