Message ID | 20230718234512.1690985-11-seanjc@google.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:c923:0:b0:3e4:2afc:c1 with SMTP id j3csp2097237vqt; Tue, 18 Jul 2023 17:17:00 -0700 (PDT) X-Google-Smtp-Source: APBJJlE5CZv8FCnYHWb6S40OdH8hyjretyWZfo2AhaytgF7C/S+8l5BRYUrObL9/TI1X9dRZRIKH X-Received: by 2002:a17:902:6942:b0:1b9:cca6:551b with SMTP id k2-20020a170902694200b001b9cca6551bmr3025224plt.7.1689725820341; Tue, 18 Jul 2023 17:17:00 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1689725820; cv=none; d=google.com; s=arc-20160816; b=T8zRKs7auuSwJPZDcsBPuGJ96v9NCoB+g2tmO7NA0qn7D1Z1Sw4I68IcZssbGeMfwu 2fkym5x/CKQ5YY5O2HwKPdwDAW4f6kH0xfQldP1Mim+2TBoyf4vMsXqHELPIk+qs/aJE iJqyXjtAgORjE9gJZzspuJi8YPzWmU3Xgg4RvJNHhEG5ehuEK9Ro9SNSRyWCjoDolQvP NpvaVRmbrvdNYeA9rXrczlTWeqSGHNY4xzRHGZzRmIM1g4tqOhvPWlvnmCxO/H0AAA6G qPpWeKTCY6jKDLcdWuGCxE9GuBq7meTcZ5Ny8RsyGofcEhETiS0ICMXLnILt2HDe6i5R mCsA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:reply-to:dkim-signature; bh=muvilKyoWvn1tarN80afquV4Wd3ZvyUZATzGD2ucPnA=; fh=D6d5aNryYQiYyjenfSFWOT3594jRd4xJmyqVXmIxeQo=; b=QkoTC0lEgEKy+YueUoKl4S/kJhNzHebUC1YpRbZ6OHpm+KqiyMYM/viOcMmodHKcmC FeGTQKWZam5kmZyW7Ss0BgryH/qnVmX64tZiaUOUqvEVW6bbR87NiQRcThwuwK0HGhhx m+U5e7CyUA0dHy4CTzYu62nRSBVrhNpGiwUXF5J9bpvU6HbgC7mPyDnnwu1ToQ2+oTCn SIN6TPhc+lCxHN0vkiIv21navel3qWszVqIOx9DjCmT5eYnuRUkqbH4ATGwYvMwrmT4k r9JmIxtdeZ6paguiNm0sH2oi5qU0DopF9yvSZNbe1ozrJqOrnzRedgCRUfnGPY5RMlKQ nkow== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20221208 header.b="dy1/NFlQ"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id ky15-20020a170902f98f00b001b5219acbbfsi2356127plb.359.2023.07.18.17.16.47; Tue, 18 Jul 2023 17:17:00 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20221208 header.b="dy1/NFlQ"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231148AbjGRXuo (ORCPT <rfc822;assdfgzxcv4@gmail.com> + 99 others); Tue, 18 Jul 2023 19:50:44 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49610 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231165AbjGRXty (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Tue, 18 Jul 2023 19:49:54 -0400 Received: from mail-oa1-x49.google.com (mail-oa1-x49.google.com [IPv6:2001:4860:4864:20::49]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 762AE210C for <linux-kernel@vger.kernel.org>; Tue, 18 Jul 2023 16:48:53 -0700 (PDT) Received: by mail-oa1-x49.google.com with SMTP id 586e51a60fabf-1ba5121da9eso6760057fac.1 for <linux-kernel@vger.kernel.org>; Tue, 18 Jul 2023 16:48:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1689724131; x=1692316131; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:from:to:cc:subject:date:message-id:reply-to; bh=muvilKyoWvn1tarN80afquV4Wd3ZvyUZATzGD2ucPnA=; b=dy1/NFlQGuJqswhd4jyp5/5Y2DKjXWq8ZhDnL7jyfQUkTP7A9Hvq/HvrZglV4JQg4M 0K4GnyosjYX3JEla9t9lm8LJXLHv4GUKKzg9xFm9cJWrlQ5Qmy5LmSHitAR/Xzi9H8/Y MjJIFDOPL8g3wik3LCZjllYyd45Gr2H0qgazCaqc+yiHTFw7sn+VQEAa9DzzRpRQoil4 OF2suL13dOrJbxrD1Es2a+3MSZoSnJ4hgva3/KXiELtegBBB1WkzGy/iAanliFh6iYW0 B8x8MynR4QN8rTTnTiUPKbxJnJ7ZIBs38bI6aS2sGXphM22HDrUDYnhUpurmEDIvnFzg o5tw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1689724131; x=1692316131; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=muvilKyoWvn1tarN80afquV4Wd3ZvyUZATzGD2ucPnA=; b=L2MgveajqFETMQgkP9zSMnajh6O3i3fkAc8OB3PE8k91JBiRnaMH3UyZsku9460rEb jb8yJ/iW1+p+m3RQcBPvIk4f/oVyNyLR0fxIlfhEaLVTQFyWB8xw3s/2uIUDBPfN1d7n oIVC6M7JR12ZxjsZtjCUYM2V8j5PUs8vbVkcVzu/AS2bsosfHb6vx3HlXJxLoMtXrRLc 5oirrRNwSUS0b0ti4Xyxdl4uvDMuYQwqsVaWVZJHetiD3ag7uotWS9R1oGx9l6o3Wi2I JxE8EbNN1yb5Y13oVmhOdlUt+TAahhYrzowAAIg3Aje16UJamYOYucKxDY/wiSYYnStx wBBQ== X-Gm-Message-State: ABy/qLZKHrf7UC4fxpX+d2jeeXK1cnXZ63lK9DqVgmiDvyxZUD1wQDJH SkiDkJum+pPU2iRIw0P/ipJ3g2+5Fpg= X-Received: from zagreus.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5c37]) (user=seanjc job=sendgmr) by 2002:a05:6870:c796:b0:1b0:20bd:eef with SMTP id dy22-20020a056870c79600b001b020bd0eefmr758790oab.2.1689724131216; Tue, 18 Jul 2023 16:48:51 -0700 (PDT) Reply-To: Sean Christopherson <seanjc@google.com> Date: Tue, 18 Jul 2023 16:44:53 -0700 In-Reply-To: <20230718234512.1690985-1-seanjc@google.com> Mime-Version: 1.0 References: <20230718234512.1690985-1-seanjc@google.com> X-Mailer: git-send-email 2.41.0.255.g8b1d071c50-goog Message-ID: <20230718234512.1690985-11-seanjc@google.com> Subject: [RFC PATCH v11 10/29] mm: Add AS_UNMOVABLE to mark mapping as completely unmovable From: Sean Christopherson <seanjc@google.com> To: Paolo Bonzini <pbonzini@redhat.com>, Marc Zyngier <maz@kernel.org>, Oliver Upton <oliver.upton@linux.dev>, Huacai Chen <chenhuacai@kernel.org>, Michael Ellerman <mpe@ellerman.id.au>, Anup Patel <anup@brainfault.org>, Paul Walmsley <paul.walmsley@sifive.com>, Palmer Dabbelt <palmer@dabbelt.com>, Albert Ou <aou@eecs.berkeley.edu>, Sean Christopherson <seanjc@google.com>, "Matthew Wilcox (Oracle)" <willy@infradead.org>, Andrew Morton <akpm@linux-foundation.org>, Paul Moore <paul@paul-moore.com>, James Morris <jmorris@namei.org>, "Serge E. Hallyn" <serge@hallyn.com> Cc: kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev, linux-mips@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, kvm-riscv@lists.infradead.org, linux-riscv@lists.infradead.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-security-module@vger.kernel.org, linux-kernel@vger.kernel.org, Chao Peng <chao.p.peng@linux.intel.com>, Fuad Tabba <tabba@google.com>, Jarkko Sakkinen <jarkko@kernel.org>, Yu Zhang <yu.c.zhang@linux.intel.com>, Vishal Annapurve <vannapurve@google.com>, Ackerley Tng <ackerleytng@google.com>, Maciej Szmigiero <mail@maciej.szmigiero.name>, Vlastimil Babka <vbabka@suse.cz>, David Hildenbrand <david@redhat.com>, Quentin Perret <qperret@google.com>, Michael Roth <michael.roth@amd.com>, Wang <wei.w.wang@intel.com>, Liam Merwick <liam.merwick@oracle.com>, Isaku Yamahata <isaku.yamahata@gmail.com>, "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE,USER_IN_DEF_DKIM_WL autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1771805941858866059 X-GMAIL-MSGID: 1771805941858866059 |
Series |
KVM: guest_memfd() and per-page attributes
|
|
Commit Message
Sean Christopherson
July 18, 2023, 11:44 p.m. UTC
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
include/linux/pagemap.h | 11 +++++++++++
mm/compaction.c | 4 ++++
mm/migrate.c | 2 ++
3 files changed, 17 insertions(+)
Comments
On Tue, Jul 18, 2023 at 04:44:53PM -0700, Sean Christopherson wrote: > diff --git a/mm/compaction.c b/mm/compaction.c > index dbc9f86b1934..a3d2b132df52 100644 > --- a/mm/compaction.c > +++ b/mm/compaction.c > @@ -1047,6 +1047,10 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, > if (!mapping && (folio_ref_count(folio) - 1) > folio_mapcount(folio)) > goto isolate_fail_put; > > + /* The mapping truly isn't movable. */ > + if (mapping && mapping_unmovable(mapping)) > + goto isolate_fail_put; > + I doubt that it is safe to dereference mapping here. I believe the folio can be truncated from under us and the mapping freed with the inode. The folio has to be locked to dereference mapping safely (given that the mapping is still tied to the folio). Vlastimil, any comments?
On Tue, Jul 25, 2023 at 01:24:03PM +0300, Kirill A . Shutemov wrote: > On Tue, Jul 18, 2023 at 04:44:53PM -0700, Sean Christopherson wrote: > > diff --git a/mm/compaction.c b/mm/compaction.c > > index dbc9f86b1934..a3d2b132df52 100644 > > --- a/mm/compaction.c > > +++ b/mm/compaction.c > > @@ -1047,6 +1047,10 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, > > if (!mapping && (folio_ref_count(folio) - 1) > folio_mapcount(folio)) > > goto isolate_fail_put; > > > > + /* The mapping truly isn't movable. */ > > + if (mapping && mapping_unmovable(mapping)) > > + goto isolate_fail_put; > > + > > I doubt that it is safe to dereference mapping here. I believe the folio > can be truncated from under us and the mapping freed with the inode. > > The folio has to be locked to dereference mapping safely (given that the > mapping is still tied to the folio). There's even a comment to that effect later on in the function: /* * Only pages without mappings or that have a * ->migrate_folio callback are possible to migrate * without blocking. However, we can be racing with * truncation so it's necessary to lock the page * to stabilise the mapping as truncation holds * the page lock until after the page is removed * from the page cache. */ (that could be reworded to make it clear how dangerous dereferencing ->mapping is without the lock ... and it does need to be changed to say "folio lock" instead of "page lock", so ...) How does this look? /* * Only folios without mappings or that have * a ->migrate_folio callback are possible to * migrate without blocking. However, we can * be racing with truncation, which can free * the mapping. Truncation holds the folio lock * until after the folio is removed from the page * cache so holding it ourselves is sufficient. */
On Tue, Jul 25, 2023 at 01:51:55PM +0100, Matthew Wilcox wrote: > On Tue, Jul 25, 2023 at 01:24:03PM +0300, Kirill A . Shutemov wrote: > > On Tue, Jul 18, 2023 at 04:44:53PM -0700, Sean Christopherson wrote: > > > diff --git a/mm/compaction.c b/mm/compaction.c > > > index dbc9f86b1934..a3d2b132df52 100644 > > > --- a/mm/compaction.c > > > +++ b/mm/compaction.c > > > @@ -1047,6 +1047,10 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, > > > if (!mapping && (folio_ref_count(folio) - 1) > folio_mapcount(folio)) > > > goto isolate_fail_put; > > > > > > + /* The mapping truly isn't movable. */ > > > + if (mapping && mapping_unmovable(mapping)) > > > + goto isolate_fail_put; > > > + > > > > I doubt that it is safe to dereference mapping here. I believe the folio > > can be truncated from under us and the mapping freed with the inode. > > > > The folio has to be locked to dereference mapping safely (given that the > > mapping is still tied to the folio). > > There's even a comment to that effect later on in the function: > > /* > * Only pages without mappings or that have a > * ->migrate_folio callback are possible to migrate > * without blocking. However, we can be racing with > * truncation so it's necessary to lock the page > * to stabilise the mapping as truncation holds > * the page lock until after the page is removed > * from the page cache. > */ > > (that could be reworded to make it clear how dangerous dereferencing > ->mapping is without the lock ... and it does need to be changed to say > "folio lock" instead of "page lock", so ...) > > How does this look? > > /* > * Only folios without mappings or that have > * a ->migrate_folio callback are possible to > * migrate without blocking. However, we can > * be racing with truncation, which can free > * the mapping. Truncation holds the folio lock > * until after the folio is removed from the page > * cache so holding it ourselves is sufficient. > */ > Looks good to me.
On 7/25/23 14:51, Matthew Wilcox wrote: > On Tue, Jul 25, 2023 at 01:24:03PM +0300, Kirill A . Shutemov wrote: >> On Tue, Jul 18, 2023 at 04:44:53PM -0700, Sean Christopherson wrote: >> > diff --git a/mm/compaction.c b/mm/compaction.c >> > index dbc9f86b1934..a3d2b132df52 100644 >> > --- a/mm/compaction.c >> > +++ b/mm/compaction.c >> > @@ -1047,6 +1047,10 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, >> > if (!mapping && (folio_ref_count(folio) - 1) > folio_mapcount(folio)) >> > goto isolate_fail_put; >> > >> > + /* The mapping truly isn't movable. */ >> > + if (mapping && mapping_unmovable(mapping)) >> > + goto isolate_fail_put; >> > + >> >> I doubt that it is safe to dereference mapping here. I believe the folio >> can be truncated from under us and the mapping freed with the inode. >> >> The folio has to be locked to dereference mapping safely (given that the >> mapping is still tied to the folio). > > There's even a comment to that effect later on in the function: Hmm, well spotted. But it wouldn't be so great if we now had to lock every inspected page (and not just dirty pages), just to check the AS_ bit. But I wonder if this is leftover from previous versions. Are the guest pages even PageLRU currently? (and should they be, given how they can't be swapped out or anything?) If not, isolate_migratepages_block will skip them anyway. > > /* > * Only pages without mappings or that have a > * ->migrate_folio callback are possible to migrate > * without blocking. However, we can be racing with > * truncation so it's necessary to lock the page > * to stabilise the mapping as truncation holds > * the page lock until after the page is removed > * from the page cache. > */ > > (that could be reworded to make it clear how dangerous dereferencing > ->mapping is without the lock ... and it does need to be changed to say > "folio lock" instead of "page lock", so ...) > How does this look? > > /* > * Only folios without mappings or that have > * a ->migrate_folio callback are possible to > * migrate without blocking. However, we can > * be racing with truncation, which can free > * the mapping. Truncation holds the folio lock > * until after the folio is removed from the page > * cache so holding it ourselves is sufficient. > */ >
On 7/28/23 18:02, Vlastimil Babka wrote: >> There's even a comment to that effect later on in the function: > Hmm, well spotted. But it wouldn't be so great if we now had to lock every > inspected page (and not just dirty pages), just to check the AS_ bit. > > But I wonder if this is leftover from previous versions. Are the guest pages > even PageLRU currently? (and should they be, given how they can't be swapped > out or anything?) If not, isolate_migratepages_block will skip them anyway. No, they're not (migration or even swap-out is not excluded for the future, but for now it's left for future work. Paolo
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index 716953ee1ebd..931d2f1da7d5 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -203,6 +203,7 @@ enum mapping_flags { /* writeback related tags are not used */ AS_NO_WRITEBACK_TAGS = 5, AS_LARGE_FOLIO_SUPPORT = 6, + AS_UNMOVABLE = 7, /* The mapping cannot be moved, ever */ }; /** @@ -273,6 +274,16 @@ static inline int mapping_use_writeback_tags(struct address_space *mapping) return !test_bit(AS_NO_WRITEBACK_TAGS, &mapping->flags); } +static inline void mapping_set_unmovable(struct address_space *mapping) +{ + set_bit(AS_UNMOVABLE, &mapping->flags); +} + +static inline bool mapping_unmovable(struct address_space *mapping) +{ + return test_bit(AS_UNMOVABLE, &mapping->flags); +} + static inline gfp_t mapping_gfp_mask(struct address_space * mapping) { return mapping->gfp_mask; diff --git a/mm/compaction.c b/mm/compaction.c index dbc9f86b1934..a3d2b132df52 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -1047,6 +1047,10 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, if (!mapping && (folio_ref_count(folio) - 1) > folio_mapcount(folio)) goto isolate_fail_put; + /* The mapping truly isn't movable. */ + if (mapping && mapping_unmovable(mapping)) + goto isolate_fail_put; + /* * Only allow to migrate anonymous pages in GFP_NOFS context * because those do not depend on fs locks. diff --git a/mm/migrate.c b/mm/migrate.c index 24baad2571e3..c00a4ca86698 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -954,6 +954,8 @@ static int move_to_new_folio(struct folio *dst, struct folio *src, if (!mapping) rc = migrate_folio(mapping, dst, src, mode); + else if (mapping_unmovable(mapping)) + rc = -EOPNOTSUPP; else if (mapping->a_ops->migrate_folio) /* * Most folios have a mapping and most filesystems