[v1,RFC,Zisslpcfi,11/20] mmu: maybe_mkwrite updated to manufacture shadow stack PTEs
Message ID | 20230213045351.3945824-12-debug@rivosinc.com |
---|---|
State | New |
Series | riscv control-flow integrity for U mode |
Commit Message
Deepak Gupta
Feb. 13, 2023, 4:53 a.m. UTC
maybe_mkwrite creates PTEs with WRITE encodings for the underlying arch if
VM_WRITE is turned on in vma->vm_flags. Shadow stack memory is writable
memory, except that it can only be written by certain specific
instructions. This patch allows maybe_mkwrite to create shadow stack PTEs
if the vma is a shadow stack VMA. Each arch can define which combination of
VMA flags means a shadow stack.

Additionally, pte_mkshdwstk must be provided by arch-specific PTE
construction headers (the arch-specific pgtable.h) to create shadow stack
PTEs.

This patch provides a dummy/stub pte_mkshdwstk if CONFIG_USER_SHADOW_STACK
is not selected.
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
---
include/linux/mm.h | 23 +++++++++++++++++++++--
include/linux/pgtable.h | 4 ++++
2 files changed, 25 insertions(+), 2 deletions(-)
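The dispatch this patch adds to maybe_mkwrite can be modeled in plain userspace C. Everything below is an illustrative mock (flag values, PTE bits, and the write-only-means-shadow-stack convention the cover letter describes for riscv), not the kernel's real definitions:

```c
#include <assert.h>
#include <stdbool.h>

/* Mock VMA flags and PTE bits -- illustrative only, not the kernel's. */
#define VM_READ  0x1UL
#define VM_WRITE 0x2UL
#define PTE_W    0x1UL  /* regular write encoding */
#define PTE_SS   0x2UL  /* shadow stack encoding */

struct vma { unsigned long vm_flags; };
typedef struct { unsigned long bits; } pte_t;

static pte_t pte_mkwrite(pte_t pte)   { pte.bits |= PTE_W;  return pte; }
static pte_t pte_mkshdwstk(pte_t pte) { pte.bits |= PTE_SS; return pte; }

/* riscv's proposed convention: a shadow stack VMA is write-only. */
static bool is_shadow_stack_vma(const struct vma *vma)
{
	return (vma->vm_flags & (VM_READ | VM_WRITE)) == VM_WRITE;
}

/* The patched dispatch: shadow stack VMAs get a shadow stack PTE,
 * ordinary writable VMAs get a regular writable PTE. */
static pte_t maybe_mkwrite(pte_t pte, const struct vma *vma)
{
	if (vma->vm_flags & VM_WRITE) {
		if (is_shadow_stack_vma(vma))
			pte = pte_mkshdwstk(pte);
		else
			pte = pte_mkwrite(pte);
	}
	return pte;
}
```

A read-write VMA yields a PTE with the write bit, a write-only (shadow stack) VMA yields the shadow stack encoding, and a VMA without VM_WRITE is left untouched.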
Comments
On 13.02.23 05:53, Deepak Gupta wrote:
> maybe_mkwrite creates PTEs with WRITE encodings for the underlying arch if
> VM_WRITE is turned on in vma->vm_flags. [...]
>
>  static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
>  {
> -	if (likely(vma->vm_flags & VM_WRITE))
> -		pte = pte_mkwrite(pte);
> +	if (likely(vma->vm_flags & VM_WRITE)) {
> +		if (unlikely(is_shadow_stack_vma(vma)))
> +			pte = pte_mkshdwstk(pte);
> +		else
> +			pte = pte_mkwrite(pte);
> +	}
>  	return pte;

Exactly what we are trying to avoid in the x86 approach right now. Please see the x86 series for details; we shouldn't try reinventing the wheel but rather find a core-mm approach that fits multiple architectures.

https://lkml.kernel.org/r/20230119212317.8324-1-rick.p.edgecombe@intel.com
On Mon, Feb 13, 2023 at 01:05:16PM +0100, David Hildenbrand wrote:
> Exactly what we are trying to avoid in the x86 approach right now.
> Please see the x86 series for details; we shouldn't try reinventing the
> wheel but rather find a core-mm approach that fits multiple architectures.
>
> https://lkml.kernel.org/r/20230119212317.8324-1-rick.p.edgecombe@intel.com

Thanks David for the comment here. I looked at the x86 approach. This patch is actually written in a way that is not re-inventing the wheel and follows a core-mm approach that fits multiple architectures.

The change above checks `is_shadow_stack_vma`, and only if it returns true does it manufacture a shadow stack PTE; otherwise it makes a regular writeable mapping.

Now if we look at the `is_shadow_stack_vma` implementation, it returns false if `CONFIG_USER_SHADOW_STACK` is not defined. If `CONFIG_USER_SHADOW_STACK` is defined, then it calls `arch_is_shadow_stack_vma`, which should be implemented by arch-specific code. This allows each architecture to define its own VMA flag encoding for shadow stack (riscv chooses the presence of only `VM_WRITE`, which is analogous to the chosen PTE encoding on riscv: W=1, R=0, X=0).

Additionally, pte_mkshdwstk will be a nop if not implemented by the architecture.

Let me know if this makes sense. If I am missing something here, let me know.
On 13.02.23 15:37, Deepak Gupta wrote:
> Thanks David for the comment here. I looked at the x86 approach. This patch
> is actually written in a way that is not re-inventing the wheel and follows
> a core-mm approach that fits multiple architectures.
> [...]
> Additionally, pte_mkshdwstk will be a nop if not implemented by the
> architecture.
>
> Let me know if this makes sense. If I am missing something here, let me know.

See the discussion in that thread. The idea is to pass a VMA to pte_mkwrite() and let it handle how to actually set it writable.
On Mon, Feb 13, 2023 at 03:56:22PM +0100, David Hildenbrand wrote:
> See the discussion in that thread. The idea is to pass a VMA to
> pte_mkwrite() and let it handle how to actually set it writable.

Thanks, I see. There are instances where `pte_mkwrite` is invoked directly after checking VM_WRITE, so instead of fixing all those instances, make pte_mkwrite itself take the vma flags or the vma.

I'll revise.
On 13.02.23 21:01, Deepak Gupta wrote:
> Thanks, I see. There are instances where `pte_mkwrite` is invoked directly
> after checking VM_WRITE, so instead of fixing all those instances, make
> pte_mkwrite itself take the vma flags or the vma.
>
> I'll revise.

Thanks, it would be great to discuss in the other threads what else you would need to make it work for you. I assume Rick will have something to play with soonish (Right, Rick? :) ).
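The direction David describes, pte_mkwrite() taking the VMA and deciding by itself how to make the PTE writable, can be sketched with the same kind of userspace mocks (all types, flag values, and bit encodings below are illustrative, not the kernel's):

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative mocks -- not the kernel's real types or bit layouts. */
#define VM_READ  0x1UL
#define VM_WRITE 0x2UL
#define PTE_W    0x1UL
#define PTE_SS   0x2UL

struct vma { unsigned long vm_flags; };
typedef struct { unsigned long bits; } pte_t;

static bool is_shadow_stack_vma(const struct vma *vma)
{
	return (vma->vm_flags & (VM_READ | VM_WRITE)) == VM_WRITE;
}

/*
 * pte_mkwrite() takes the VMA and decides whether "writable" means a
 * regular writable PTE or a shadow stack PTE for this mapping.
 */
static pte_t pte_mkwrite(pte_t pte, const struct vma *vma)
{
	pte.bits |= is_shadow_stack_vma(vma) ? PTE_SS : PTE_W;
	return pte;
}

/* maybe_mkwrite() then stays a single branch with no shadow stack logic,
 * and every other direct pte_mkwrite() caller gets the same behavior. */
static pte_t maybe_mkwrite(pte_t pte, const struct vma *vma)
{
	if (vma->vm_flags & VM_WRITE)
		pte = pte_mkwrite(pte, vma);
	return pte;
}
```

The observable result is the same as the patch's version; the difference is where the shadow stack decision lives, so callers that invoke pte_mkwrite() directly do not each need the check.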
On Tue, 2023-02-14 at 13:10 +0100, David Hildenbrand wrote:
> I assume Rick will have something to play with soonish (Right, Rick? :) ).

Yes, Deepak and I were discussing this on the x86 series. I haven't heard anything from 0-day for a few days, so it's looking good. There was discussion happening with Boris on the pte_modify() patch, so I might wait a day more to post a new version.
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8f857163ac89..a7705bc49bfe 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1093,6 +1093,21 @@ static inline unsigned long thp_size(struct page *page)
 void free_compound_page(struct page *page);
 
 #ifdef CONFIG_MMU
+
+#ifdef CONFIG_USER_SHADOW_STACK
+bool arch_is_shadow_stack_vma(struct vm_area_struct *vma);
+#endif
+
+static inline bool
+is_shadow_stack_vma(struct vm_area_struct *vma)
+{
+#ifdef CONFIG_USER_SHADOW_STACK
+	return arch_is_shadow_stack_vma(vma);
+#else
+	return false;
+#endif
+}
+
 /*
  * Do pte_mkwrite, but only if the vma says VM_WRITE.  We do this when
  * servicing faults for write access.  In the normal case, do always want
@@ -1101,8 +1116,12 @@ void free_compound_page(struct page *page);
  */
 static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
 {
-	if (likely(vma->vm_flags & VM_WRITE))
-		pte = pte_mkwrite(pte);
+	if (likely(vma->vm_flags & VM_WRITE)) {
+		if (unlikely(is_shadow_stack_vma(vma)))
+			pte = pte_mkshdwstk(pte);
+		else
+			pte = pte_mkwrite(pte);
+	}
 	return pte;
 }
 
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 1159b25b0542..94b157218c73 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1736,4 +1736,8 @@ pgprot_t vm_get_page_prot(unsigned long vm_flags)	\
 }									\
 EXPORT_SYMBOL(vm_get_page_prot);
 
+#ifndef CONFIG_USER_SHADOW_STACK
+#define pte_mkshdwstk(pte) pte
+#endif
+
 #endif /* _LINUX_PGTABLE_H */
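The pgtable.h hunk's fallback can be checked in isolation: when CONFIG_USER_SHADOW_STACK is not defined, the stub makes pte_mkshdwstk the identity, so the new branch in maybe_mkwrite compiles away to a plain writable mapping. A minimal userspace sketch (mock pte_t; the config macro is deliberately left undefined so the stub path is the one exercised):

```c
typedef struct { unsigned long bits; } pte_t;

/*
 * CONFIG_USER_SHADOW_STACK is intentionally not defined here, so the
 * stub from the pgtable.h hunk applies and pte_mkshdwstk is a no-op.
 */
#ifndef CONFIG_USER_SHADOW_STACK
#define pte_mkshdwstk(pte) pte
#endif
```

With the config off, passing a PTE through pte_mkshdwstk leaves every bit unchanged, which is exactly the behavior the stub is meant to guarantee for architectures without shadow stack support.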