Message ID | 20230518110727.2106156-5-ryan.roberts@arm.com |
---|---|
State | New |
Headers | From: Ryan Roberts <ryan.roberts@arm.com> To: Andrew Morton <akpm@linux-foundation.org>, SeongJae Park <sj@kernel.org>, Christoph Hellwig <hch@infradead.org> Cc: Ryan Roberts <ryan.roberts@arm.com>, "Matthew Wilcox (Oracle)" <willy@infradead.org>, "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>, Lorenzo Stoakes <lstoakes@gmail.com>, Uladzislau Rezki <urezki@gmail.com>, Zi Yan <ziy@nvidia.com>, linux-kernel@vger.kernel.org, linux-mm@kvack.org, damon@lists.linux.dev Subject: [PATCH v2 4/5] mm: Add new ptep_deref() helper to fully encapsulate pte_t Date: Thu, 18 May 2023 12:07:26 +0100 Message-Id: <20230518110727.2106156-5-ryan.roberts@arm.com> In-Reply-To: <20230518110727.2106156-1-ryan.roberts@arm.com> References: <20230518110727.2106156-1-ryan.roberts@arm.com> |
Series | Encapsulate PTE contents from non-arch code |
Commit Message
Ryan Roberts
May 18, 2023, 11:07 a.m. UTC
There are many call sites that directly dereference a pte_t pointer.
This makes it very difficult to properly encapsulate a page table in the
arch code without having to allocate shadow page tables. ptep_deref()
aims to solve this by replacing all direct dereferences with a call to
this function.
The default implementation continues to just dereference the pointer
(*ptep), so generated code should be exactly the same. However, it is
possible for the architecture to override the default with their own
implementation, that can (e.g.) hide certain bits from the core code, or
determine young/dirty status by mixing in state from another source.
While ptep_get() and ptep_get_lockless() already exist, these are
implemented as atomic accesses (e.g. READ_ONCE() in the default case).
So rather than using ptep_get() and risking performance regressions,
introduce a new variant.
Call sites will be converted to use the accessor in future commits.
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
include/linux/pgtable.h | 7 +++++++
1 file changed, 7 insertions(+)
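As a rough illustration of the override mechanism described in the commit message, here is a minimal userspace sketch. The pte_t stand-in, arch_ptep_deref() and HYPOTHETICAL_ARCH_PRIVATE_BIT are assumptions invented for this example and are not taken from any real architecture; only the `#ifndef ptep_deref` fallback shape mirrors the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's pte_t; the real type is arch-defined. */
typedef struct { uint64_t pte; } pte_t;

/* Hypothetical arch-private bit that should be hidden from core code. */
#define HYPOTHETICAL_ARCH_PRIVATE_BIT (1ULL << 55)

/*
 * "Arch header" (hypothetical): supply an implementation and define a macro
 * named ptep_deref so that the generic fallback below is compiled out.
 */
static inline pte_t arch_ptep_deref(pte_t *ptep)
{
	pte_t pte = *ptep;

	/* Mask out the private bit before handing the value to core code. */
	pte.pte &= ~HYPOTHETICAL_ARCH_PRIVATE_BIT;
	return pte;
}
#define ptep_deref arch_ptep_deref

/* "Generic header": fallback, used only when the arch did not override. */
#ifndef ptep_deref
static inline pte_t ptep_deref(pte_t *ptep)
{
	return *ptep;
}
#endif
```

With the override in place, a caller that writes ptep_deref(ptep) sees the entry with the private bit cleared, while the entry in memory is left untouched. Removing the `#define` makes the same call site fall back to the plain dereference.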
Comments
On Thu, May 18, 2023 at 5:07 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> There are many call sites that directly dereference a pte_t pointer.
> This makes it very difficult to properly encapsulate a page table in the
> arch code without having to allocate shadow page tables. ptep_deref()
> aims to solve this by replacing all direct dereferences with a call to
> this function.
>
> The default implementation continues to just dereference the pointer
> (*ptep), so generated code should be exactly the same. However, it is
> possible for the architecture to override the default with their own
> implementation, that can (e.g.) hide certain bits from the core code, or
> determine young/dirty status by mixing in state from another source.
>
> While ptep_get() and ptep_get_lockless() already exist, these are
> implemented as atomic accesses (e.g. READ_ONCE() in the default case).
> So rather than using ptep_get() and risking performance regressions,
> introduce a new variant.

We should reuse ptep_get():
1. I don't think READ_ONCE() can cause measurable regressions in this case.
2. It's technically wrong without it.
On 18/05/2023 20:28, Yu Zhao wrote:
> On Thu, May 18, 2023 at 5:07 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
>>
>> There are many call sites that directly dereference a pte_t pointer.
>> This makes it very difficult to properly encapsulate a page table in the
>> arch code without having to allocate shadow page tables. ptep_deref()
>> aims to solve this by replacing all direct dereferences with a call to
>> this function.
>>
>> The default implementation continues to just dereference the pointer
>> (*ptep), so generated code should be exactly the same. However, it is
>> possible for the architecture to override the default with their own
>> implementation, that can (e.g.) hide certain bits from the core code, or
>> determine young/dirty status by mixing in state from another source.
>>
>> While ptep_get() and ptep_get_lockless() already exist, these are
>> implemented as atomic accesses (e.g. READ_ONCE() in the default case).
>> So rather than using ptep_get() and risking performance regressions,
>> introduce a new variant.
>
> We should reuse ptep_get():
> 1. I don't think READ_ONCE() can cause measurable regressions in this case.
> 2. It's technically wrong without it.

Can you clarify what you mean by technically wrong? Are you saying that the
current code that does direct dereferencing is buggy?

I previously convinced myself that the potential for the compiler generating
multiple loads was safe because the code in question is under the PTL so there
are no concurrent stores. And we shouldn't see any tearing for the same reason.

That said, if there is consensus that we can just use ptep_get() (==
READ_ONCE()) everywhere, then I agree that would be cleaner. Does anyone object?
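To make the two access styles under discussion concrete, here is a small userspace sketch. It is only an approximation: the kernel's READ_ONCE() is a more elaborate macro, and pte_t, ptep_deref_plain(), ptep_get_sketch() and the SKETCH_* bits are stand-ins invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's pte_t; the real type is arch-defined. */
typedef struct { uint64_t pte; } pte_t;

/* Hypothetical bit positions, for illustration only. */
#define SKETCH_PRESENT (1ULL << 0)
#define SKETCH_DIRTY   (1ULL << 1)

/*
 * Plain dereference: the compiler is free to merge this load with others or
 * to reload the value later, which is harmless only if nothing else can
 * store to the entry in the meantime (e.g. because a lock is held).
 */
static inline pte_t ptep_deref_plain(pte_t *ptep)
{
	return *ptep;
}

/*
 * Volatile access: roughly the effect READ_ONCE() aims for. Each call is
 * one access in the abstract machine, so the compiler may neither elide
 * nor duplicate it.
 */
static inline pte_t ptep_get_sketch(pte_t *ptep)
{
	return *(const volatile pte_t *)ptep;
}

/* Take one snapshot, then test several bits of that same snapshot. */
static inline int pte_present_and_dirty(pte_t *ptep)
{
	pte_t pte = ptep_get_sketch(ptep);

	return (pte.pte & SKETCH_PRESENT) && (pte.pte & SKETCH_DIRTY);
}
```

In a single-threaded test both accessors return the same value; the difference only matters for what the compiler is allowed to do when another CPU might store to the entry concurrently.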
On Thu, May 18, 2023 at 12:07:26PM +0100, Ryan Roberts wrote:
> There are many call sites that directly dereference a pte_t pointer.
> This makes it very difficult to properly encapsulate a page table in the
> arch code without having to allocate shadow page tables. ptep_deref()
> aims to solve this by replacing all direct dereferences with a call to
> this function.
>
> The default implementation continues to just dereference the pointer
> (*ptep), so generated code should be exactly the same. However, it is
> possible for the architecture to override the default with their own
> implementation, that can (e.g.) hide certain bits from the core code, or
> determine young/dirty status by mixing in state from another source.
>
> While ptep_get() and ptep_get_lockless() already exist, these are
> implemented as atomic accesses (e.g. READ_ONCE() in the default case).
> So rather than using ptep_get() and risking performance regressions,
> introduce a new variant.
>
> Call sites will be converted to use the accessor in future commits.
>
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> ---
>  include/linux/pgtable.h | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index c5a51481bbb9..1161beab2492 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -204,6 +204,13 @@ static inline int pudp_set_access_flags(struct vm_area_struct *vma,
>  #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>  #endif
>
> +#ifndef ptep_deref
> +static inline pte_t ptep_deref(pte_t *ptep)
> +{
> +	return *(pte_t *)ptep;

Why do you need the casting here?

> +}
> +#endif
> +
>  #ifndef __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG
>  static inline int ptep_test_and_clear_young(struct vm_area_struct *vma,
>  					    unsigned long address,
> --
> 2.25.1
>
>
On 24/05/2023 20:06, Mike Rapoport wrote:
> On Thu, May 18, 2023 at 12:07:26PM +0100, Ryan Roberts wrote:
>> There are many call sites that directly dereference a pte_t pointer.
>> This makes it very difficult to properly encapsulate a page table in the
>> arch code without having to allocate shadow page tables. ptep_deref()
>> aims to solve this by replacing all direct dereferences with a call to
>> this function.
>>
>> The default implementation continues to just dereference the pointer
>> (*ptep), so generated code should be exactly the same. However, it is
>> possible for the architecture to override the default with their own
>> implementation, that can (e.g.) hide certain bits from the core code, or
>> determine young/dirty status by mixing in state from another source.
>>
>> While ptep_get() and ptep_get_lockless() already exist, these are
>> implemented as atomic accesses (e.g. READ_ONCE() in the default case).
>> So rather than using ptep_get() and risking performance regressions,
>> introduce a new variant.
>>
>> Call sites will be converted to use the accessor in future commits.
>>
>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>> ---
>>  include/linux/pgtable.h | 7 +++++++
>>  1 file changed, 7 insertions(+)
>>
>> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
>> index c5a51481bbb9..1161beab2492 100644
>> --- a/include/linux/pgtable.h
>> +++ b/include/linux/pgtable.h
>> @@ -204,6 +204,13 @@ static inline int pudp_set_access_flags(struct vm_area_struct *vma,
>>  #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>>  #endif
>>
>> +#ifndef ptep_deref
>> +static inline pte_t ptep_deref(pte_t *ptep)
>> +{
>> +	return *(pte_t *)ptep;
>
> Why do you need the casting here?

I don't - good spot. Will fix for v3. This is some residue from one of the
approaches I took to finding all the call sites, where I globally did
s/pte_t */pte_handle_t/ and typedef'ed pte_handle_t as a void*. Then the
compiler would error on any attempted dereferences, but I had to explicitly
cast in the places that could legitimately dereference.

Thanks for the reviews.

>
>> +}
>> +#endif
>> +
>>  #ifndef __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG
>>  static inline int ptep_test_and_clear_young(struct vm_area_struct *vma,
>>  					    unsigned long address,
>> --
>> 2.25.1
>>
>>
>
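The call-site-finding trick described above could look roughly like the following userspace sketch. It is a reconstruction from the description in this reply only, not the actual code used; the pte_t stand-in is an assumption, and pte_handle_t is defined the way the reply describes it.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's pte_t; the real type is arch-defined. */
typedef struct { uint64_t pte; } pte_t;

/*
 * Opaque handle replacing "pte_t *" everywhere. Because it is a void
 * pointer, any attempt to use the value of "*handle" is rejected by the
 * compiler, which flags every remaining direct dereference.
 */
typedef void *pte_handle_t;

/*
 * The one place allowed to look inside the handle. The explicit cast is
 * required here, which is where the leftover cast in the patch came from.
 */
static inline pte_t ptep_deref(pte_handle_t ptep)
{
	return *(pte_t *)ptep;
}

/*
 * A line such as the following would fail to build, because a void
 * expression cannot be used as a value:
 *
 *	pte_t bad = *handle;
 */
```

Once all call sites have been found and converted, the handle type can be switched back to a plain pte_t pointer, at which point the cast in the accessor becomes redundant.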
On 19/05/2023 10:12, Ryan Roberts wrote:
> On 18/05/2023 20:28, Yu Zhao wrote:
>> On Thu, May 18, 2023 at 5:07 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
>>>
>>> There are many call sites that directly dereference a pte_t pointer.
>>> This makes it very difficult to properly encapsulate a page table in the
>>> arch code without having to allocate shadow page tables. ptep_deref()
>>> aims to solve this by replacing all direct dereferences with a call to
>>> this function.
>>>
>>> The default implementation continues to just dereference the pointer
>>> (*ptep), so generated code should be exactly the same. However, it is
>>> possible for the architecture to override the default with their own
>>> implementation, that can (e.g.) hide certain bits from the core code, or
>>> determine young/dirty status by mixing in state from another source.
>>>
>>> While ptep_get() and ptep_get_lockless() already exist, these are
>>> implemented as atomic accesses (e.g. READ_ONCE() in the default case).
>>> So rather than using ptep_get() and risking performance regressions,
>>> introduce a new variant.
>>
>> We should reuse ptep_get():
>> 1. I don't think READ_ONCE() can cause measurable regressions in this case.
>> 2. It's technically wrong without it.
>
> Can you clarify what you mean by technically wrong? Are you saying that the
> current code that does direct dereferencing is buggy?
>
> I previously convinced myself that the potential for the compiler generating
> multiple loads was safe because the code in question is under the PTL so there
> are no concurrent stores. And we shouldn't see any tearing for the same reason.
>
> That said, if there is consensus that we can just use ptep_get() (==
> READ_ONCE()) everywhere, then I agree that would be cleaner. Does anyone object?

Hi all,

A polite bump: It would be great to hear opinions on this before I go ahead and
make the change.

Thanks,
Ryan
On Fri, May 19, 2023 at 3:12 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> On 18/05/2023 20:28, Yu Zhao wrote:
> > On Thu, May 18, 2023 at 5:07 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
> >>
> >> There are many call sites that directly dereference a pte_t pointer.
> >> This makes it very difficult to properly encapsulate a page table in the
> >> arch code without having to allocate shadow page tables. ptep_deref()
> >> aims to solve this by replacing all direct dereferences with a call to
> >> this function.
> >>
> >> The default implementation continues to just dereference the pointer
> >> (*ptep), so generated code should be exactly the same. However, it is
> >> possible for the architecture to override the default with their own
> >> implementation, that can (e.g.) hide certain bits from the core code, or
> >> determine young/dirty status by mixing in state from another source.
> >>
> >> While ptep_get() and ptep_get_lockless() already exist, these are
> >> implemented as atomic accesses (e.g. READ_ONCE() in the default case).
> >> So rather than using ptep_get() and risking performance regressions,
> >> introduce a new variant.
> >
> > We should reuse ptep_get():
> > 1. I don't think READ_ONCE() can cause measurable regressions in this case.
> > 2. It's technically wrong without it.
>
> Can you clarify what you mean by technically wrong? Are you saying that the
> current code that does direct dereferencing is buggy?

Sorry for not being clear.

I think we can agree that *ptep is volatile. Not being treated as such
seems a bad idea to me. I don't think it'd cause any real problems --
most warnings KCSAN reported didn't either, but we fixed them anyway.
So we should fix this case as well while we are at it.

> I previously convinced myself that the potential for the compiler generating
> multiple loads was safe because the code in question is under the PTL so there
> are no concurrent stores. And we shouldn't see any tearing for the same reason.
>
> That said, if there is consensus that we can just use ptep_get() (==
> READ_ONCE()) everywhere, then I agree that would be cleaner. Does anyone object?

(No objection to NOT using it either. Just a recommendation, since
it's already there.)
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index c5a51481bbb9..1161beab2492 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -204,6 +204,13 @@ static inline int pudp_set_access_flags(struct vm_area_struct *vma,
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 #endif
 
+#ifndef ptep_deref
+static inline pte_t ptep_deref(pte_t *ptep)
+{
+	return *(pte_t *)ptep;
+}
+#endif
+
 #ifndef __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG
 static inline int ptep_test_and_clear_young(struct vm_area_struct *vma,
 					    unsigned long address,