Message ID | 20230526234435.662652-7-yuzhao@google.com |
---|---|
State | New |
Headers |
From: Yu Zhao <yuzhao@google.com>
Date: Fri, 26 May 2023 17:44:31 -0600
Message-Id: <20230526234435.662652-7-yuzhao@google.com>
In-Reply-To: <20230526234435.662652-1-yuzhao@google.com>
Subject: [PATCH mm-unstable v2 06/10] kvm/powerpc: make radix page tables RCU safe
To: Andrew Morton <akpm@linux-foundation.org>, Paolo Bonzini <pbonzini@redhat.com>
Cc: Alistair Popple <apopple@nvidia.com>, Anup Patel <anup@brainfault.org>,
    Ben Gardon <bgardon@google.com>, Borislav Petkov <bp@alien8.de>,
    Catalin Marinas <catalin.marinas@arm.com>, Chao Peng <chao.p.peng@linux.intel.com>,
    Christophe Leroy <christophe.leroy@csgroup.eu>, Dave Hansen <dave.hansen@linux.intel.com>,
    Fabiano Rosas <farosas@linux.ibm.com>, Gaosheng Cui <cuigaosheng1@huawei.com>,
    Gavin Shan <gshan@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
    Ingo Molnar <mingo@redhat.com>, James Morse <james.morse@arm.com>,
    "Jason A. Donenfeld" <Jason@zx2c4.com>, Jason Gunthorpe <jgg@ziepe.ca>,
    Jonathan Corbet <corbet@lwn.net>, Marc Zyngier <maz@kernel.org>,
    Masami Hiramatsu <mhiramat@kernel.org>, Michael Ellerman <mpe@ellerman.id.au>,
    Michael Larabel <michael@michaellarabel.com>, Mike Rapoport <rppt@kernel.org>,
    Nicholas Piggin <npiggin@gmail.com>, Oliver Upton <oliver.upton@linux.dev>,
    Paul Mackerras <paulus@ozlabs.org>, Peter Xu <peterx@redhat.com>,
    Sean Christopherson <seanjc@google.com>, Steven Rostedt <rostedt@goodmis.org>,
    Suzuki K Poulose <suzuki.poulose@arm.com>, Thomas Gleixner <tglx@linutronix.de>,
    Thomas Huth <thuth@redhat.com>, Will Deacon <will@kernel.org>,
    Zenghui Yu <yuzenghui@huawei.com>, kvmarm@lists.linux.dev, kvm@vger.kernel.org,
    linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org, linuxppc-dev@lists.ozlabs.org,
    linux-trace-kernel@vger.kernel.org, x86@kernel.org, linux-mm@google.com,
    Yu Zhao <yuzhao@google.com>
Series | mm/kvm: locklessly clear the accessed bit |
Commit Message
Yu Zhao
May 26, 2023, 11:44 p.m. UTC
KVM page tables are currently not RCU safe against remapping, i.e.,
kvmppc_unmap_free_pmd_entry_table() et al. The previous
mmu_notifier_ops members rely on kvm->mmu_lock to synchronize with
that operation.
However, the new mmu_notifier_ops member test_clear_young() provides
a fast path that does not take kvm->mmu_lock. To implement
kvm_arch_test_clear_young() for that path, orphan page tables need to
be freed by RCU.
Unmapping, specifically kvm_unmap_radix(), does not free page tables,
and is hence not a concern.
Signed-off-by: Yu Zhao <yuzhao@google.com>
---
arch/powerpc/kvm/book3s_64_mmu_radix.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
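A note on why this is sufficient: with SLAB_TYPESAFE_BY_RCU, the slab pages backing the PTE/PMD tables are only returned to the page allocator after an RCU grace period, so a walker inside rcu_read_lock() always dereferences memory that is still a page table of the expected level, at worst a recycled one. Below is a minimal sketch of the resulting lockless walk; it is ours, not part of the patch, and walk_to_pte() and clear_young() are hypothetical helpers standing in for the kvm_arch_test_clear_young() plumbing added elsewhere in the series:

static bool lockless_test_clear_young(struct kvm *kvm, unsigned long gpa)
{
	bool young = false;
	pte_t *ptep;

	rcu_read_lock();

	/*
	 * The walk may race with kvmppc_unmap_free_pmd_entry_table():
	 * any table reached here can be freed and recycled at any time.
	 * SLAB_TYPESAFE_BY_RCU guarantees the memory stays a valid page
	 * table until rcu_read_unlock(), so the worst case is clearing
	 * the accessed bit in a table that no longer belongs to this
	 * mapping, which is benign.
	 */
	ptep = walk_to_pte(kvm, gpa);		/* hypothetical helper */
	if (ptep)
		young = clear_young(ptep);	/* hypothetical helper */

	rcu_read_unlock();
	return young;
}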
Comments
On Sat May 27, 2023 at 9:44 AM AEST, Yu Zhao wrote:
> KVM page tables are currently not RCU safe against remapping, i.e.,
> kvmppc_unmap_free_pmd_entry_table() et al. The previous

Minor nit but the "page table" is not RCU-safe against something. It
is RCU-freed, and therefore some algorithm that accesses it can have
the existence guarantee provided by RCU (usually there still needs
to be more to it).

> mmu_notifier_ops members rely on kvm->mmu_lock to synchronize with
> that operation.
>
> However, the new mmu_notifier_ops member test_clear_young() provides
> a fast path that does not take kvm->mmu_lock. To implement
> kvm_arch_test_clear_young() for that path, orphan page tables need to
> be freed by RCU.

Short version: clear the referenced bit using RCU instead of MMU lock
to protect against page table freeing, and there is no problem with
clearing the bit in a table that has been freed.

Seems reasonable.

> Unmapping, specifically kvm_unmap_radix(), does not free page tables,
> hence not a concern.

Not sure if you really need to make the distinction about why the page
table is freed, we might free them via unmapping. The point is just
anything that frees them while there can be concurrent access, right?

> Signed-off-by: Yu Zhao <yuzhao@google.com>
> ---
>  arch/powerpc/kvm/book3s_64_mmu_radix.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
> index 461307b89c3a..3b65b3b11041 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
> @@ -1469,13 +1469,15 @@ int kvmppc_radix_init(void)
>  {
>  	unsigned long size = sizeof(void *) << RADIX_PTE_INDEX_SIZE;
>
> -	kvm_pte_cache = kmem_cache_create("kvm-pte", size, size, 0, pte_ctor);
> +	kvm_pte_cache = kmem_cache_create("kvm-pte", size, size,
> +					  SLAB_TYPESAFE_BY_RCU, pte_ctor);
>  	if (!kvm_pte_cache)
>  		return -ENOMEM;
>
>  	size = sizeof(void *) << RADIX_PMD_INDEX_SIZE;
>
> -	kvm_pmd_cache = kmem_cache_create("kvm-pmd", size, size, 0, pmd_ctor);
> +	kvm_pmd_cache = kmem_cache_create("kvm-pmd", size, size,
> +					  SLAB_TYPESAFE_BY_RCU, pmd_ctor);
>  	if (!kvm_pmd_cache) {
>  		kmem_cache_destroy(kvm_pte_cache);
>  		return -ENOMEM;

KVM PPC HV radix PUD level page tables use the arch/powerpc allocators
(for some reason), which are not RCU freed. I think you need them too?

I wouldn't mind if the kvm pud table slab was moved in here instead of
shared.

Thanks,
Nick
On Tue, Jun 20, 2023 at 12:33 AM Nicholas Piggin <npiggin@gmail.com> wrote:
>
> On Sat May 27, 2023 at 9:44 AM AEST, Yu Zhao wrote:
> > KVM page tables are currently not RCU safe against remapping, i.e.,
> > kvmppc_unmap_free_pmd_entry_table() et al. The previous
>
> Minor nit but the "page table" is not RCU-safe against something. It
> is RCU-freed, and therefore some algorithm that accesses it can have
> the existence guarantee provided by RCU (usually there still needs
> to be more to it).
>
> > mmu_notifier_ops members rely on kvm->mmu_lock to synchronize with
> > that operation.
> >
> > However, the new mmu_notifier_ops member test_clear_young() provides
> > a fast path that does not take kvm->mmu_lock. To implement
> > kvm_arch_test_clear_young() for that path, orphan page tables need to
> > be freed by RCU.
>
> Short version: clear the referenced bit using RCU instead of MMU lock
> to protect against page table freeing, and there is no problem with
> clearing the bit in a table that has been freed.
>
> Seems reasonable.

Thanks. All above points taken.

> > Unmapping, specifically kvm_unmap_radix(), does not free page tables,
> > hence not a concern.
>
> Not sure if you really need to make the distinction about why the page
> table is freed, we might free them via unmapping. The point is just
> anything that frees them while there can be concurrent access, right?

Correct.

> > Signed-off-by: Yu Zhao <yuzhao@google.com>
> > ---
> >  arch/powerpc/kvm/book3s_64_mmu_radix.c | 6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
> > index 461307b89c3a..3b65b3b11041 100644
> > --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
> > +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
> > @@ -1469,13 +1469,15 @@ int kvmppc_radix_init(void)
> >  {
> >  	unsigned long size = sizeof(void *) << RADIX_PTE_INDEX_SIZE;
> >
> > -	kvm_pte_cache = kmem_cache_create("kvm-pte", size, size, 0, pte_ctor);
> > +	kvm_pte_cache = kmem_cache_create("kvm-pte", size, size,
> > +					  SLAB_TYPESAFE_BY_RCU, pte_ctor);
> >  	if (!kvm_pte_cache)
> >  		return -ENOMEM;
> >
> >  	size = sizeof(void *) << RADIX_PMD_INDEX_SIZE;
> >
> > -	kvm_pmd_cache = kmem_cache_create("kvm-pmd", size, size, 0, pmd_ctor);
> > +	kvm_pmd_cache = kmem_cache_create("kvm-pmd", size, size,
> > +					  SLAB_TYPESAFE_BY_RCU, pmd_ctor);
> >  	if (!kvm_pmd_cache) {
> >  		kmem_cache_destroy(kvm_pte_cache);
> >  		return -ENOMEM;
>
> KVM PPC HV radix PUD level page tables use the arch/powerpc allocators
> (for some reason), which are not RCU freed. I think you need them too?

We don't. The use of the arch/powerpc allocator for PUD tables seems
appropriate to me because, unlike PMD/PTE tables, we never free PUD
tables during the lifetime of a VM:

* We don't free PUD/PMD/PTE tables when they become empty, i.e., not
  mapping any pages but still attached. (We could in theory, as
  x86/aarch64 do.)

* We have to free PMD/PTE tables when we replace them with 1GB/2MB
  pages. (Otherwise we'd lose track of detached tables.)

And we currently don't support huge pages at P4D level, so we never
detach and free PUD tables.
On Tue Jun 20, 2023 at 6:00 PM AEST, Yu Zhao wrote:
> On Tue, Jun 20, 2023 at 12:33 AM Nicholas Piggin <npiggin@gmail.com> wrote:
> >
> > On Sat May 27, 2023 at 9:44 AM AEST, Yu Zhao wrote:
> > > KVM page tables are currently not RCU safe against remapping, i.e.,
> > > kvmppc_unmap_free_pmd_entry_table() et al. The previous
> >
> > Minor nit but the "page table" is not RCU-safe against something. It
> > is RCU-freed, and therefore some algorithm that accesses it can have
> > the existence guarantee provided by RCU (usually there still needs
> > to be more to it).
> >
> > > mmu_notifier_ops members rely on kvm->mmu_lock to synchronize with
> > > that operation.
> > >
> > > However, the new mmu_notifier_ops member test_clear_young() provides
> > > a fast path that does not take kvm->mmu_lock. To implement
> > > kvm_arch_test_clear_young() for that path, orphan page tables need to
> > > be freed by RCU.
> >
> > Short version: clear the referenced bit using RCU instead of MMU lock
> > to protect against page table freeing, and there is no problem with
> > clearing the bit in a table that has been freed.
> >
> > Seems reasonable.
>
> Thanks. All above points taken.
>
> > > Unmapping, specifically kvm_unmap_radix(), does not free page tables,
> > > hence not a concern.
> >
> > Not sure if you really need to make the distinction about why the page
> > table is freed, we might free them via unmapping. The point is just
> > anything that frees them while there can be concurrent access, right?
>
> Correct.
>
> > > Signed-off-by: Yu Zhao <yuzhao@google.com>
> > > ---
> > >  arch/powerpc/kvm/book3s_64_mmu_radix.c | 6 ++++--
> > >  1 file changed, 4 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
> > > index 461307b89c3a..3b65b3b11041 100644
> > > --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
> > > +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
> > > @@ -1469,13 +1469,15 @@ int kvmppc_radix_init(void)
> > >  {
> > >  	unsigned long size = sizeof(void *) << RADIX_PTE_INDEX_SIZE;
> > >
> > > -	kvm_pte_cache = kmem_cache_create("kvm-pte", size, size, 0, pte_ctor);
> > > +	kvm_pte_cache = kmem_cache_create("kvm-pte", size, size,
> > > +					  SLAB_TYPESAFE_BY_RCU, pte_ctor);
> > >  	if (!kvm_pte_cache)
> > >  		return -ENOMEM;
> > >
> > >  	size = sizeof(void *) << RADIX_PMD_INDEX_SIZE;
> > >
> > > -	kvm_pmd_cache = kmem_cache_create("kvm-pmd", size, size, 0, pmd_ctor);
> > > +	kvm_pmd_cache = kmem_cache_create("kvm-pmd", size, size,
> > > +					  SLAB_TYPESAFE_BY_RCU, pmd_ctor);
> > >  	if (!kvm_pmd_cache) {
> > >  		kmem_cache_destroy(kvm_pte_cache);
> > >  		return -ENOMEM;
> >
> > KVM PPC HV radix PUD level page tables use the arch/powerpc allocators
> > (for some reason), which are not RCU freed. I think you need them too?
>
> We don't. The use of the arch/powerpc allocator for PUD tables seems
> appropriate to me because, unlike PMD/PTE tables, we never free PUD
> tables during the lifetime of a VM:

Ah you're right, the pud_free only comes from the double alloc case so
it's never visible to concurrent threads.

> * We don't free PUD/PMD/PTE tables when they become empty, i.e., not
>   mapping any pages but still attached. (We could in theory, as
>   x86/aarch64 do.)

We may try to do that at some point, but that's not related to your
patch for now so no worries.

Thanks,
Nick
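For readers unfamiliar with the "double alloc" case Nick refers to above, a sketch of the pattern follows. The names are illustrative, not the actual kvmppc fault-path code: two faulting threads race to install a PUD table, and the loser frees a table that was never published, so no reader can observe it and no RCU deferral is needed.

	pud_t *new_pud = pud_alloc_one(kvm->mm, gpa);
	bool installed = false;

	spin_lock(&kvm->mmu_lock);
	if (!p4d_present(*p4d)) {
		/* We won the race; publish our table. */
		p4d_populate(kvm->mm, p4d, new_pud);
		installed = true;
	}
	spin_unlock(&kvm->mmu_lock);

	if (!installed)
		/* Loser's table was never visible to any walker. */
		pud_free(kvm->mm, new_pud);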
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 461307b89c3a..3b65b3b11041 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -1469,13 +1469,15 @@ int kvmppc_radix_init(void)
 {
 	unsigned long size = sizeof(void *) << RADIX_PTE_INDEX_SIZE;
 
-	kvm_pte_cache = kmem_cache_create("kvm-pte", size, size, 0, pte_ctor);
+	kvm_pte_cache = kmem_cache_create("kvm-pte", size, size,
+					  SLAB_TYPESAFE_BY_RCU, pte_ctor);
 	if (!kvm_pte_cache)
 		return -ENOMEM;
 
 	size = sizeof(void *) << RADIX_PMD_INDEX_SIZE;
 
-	kvm_pmd_cache = kmem_cache_create("kvm-pmd", size, size, 0, pmd_ctor);
+	kvm_pmd_cache = kmem_cache_create("kvm-pmd", size, size,
+					  SLAB_TYPESAFE_BY_RCU, pmd_ctor);
 	if (!kvm_pmd_cache) {
 		kmem_cache_destroy(kvm_pte_cache);
 		return -ENOMEM;
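As context for the flag itself, here is a self-contained sketch of the SLAB_TYPESAFE_BY_RCU contract the diff relies on; the "demo" names are ours, not from the patch. Passing size as the align argument mirrors the patch, which keeps each table naturally aligned to its own power-of-two size.

#include <linux/init.h>
#include <linux/slab.h>
#include <linux/string.h>

struct demo_table {
	unsigned long entries[512];
};

static struct kmem_cache *demo_cache;

/*
 * With SLAB_TYPESAFE_BY_RCU the constructor runs when a slab page is
 * first populated, not on every allocation, so it must establish
 * invariants that survive free/reallocation cycles.
 */
static void demo_ctor(void *obj)
{
	memset(obj, 0, sizeof(struct demo_table));
}

static int __init demo_init(void)
{
	/*
	 * SLAB_TYPESAFE_BY_RCU defers returning backing pages to the
	 * page allocator until after an RCU grace period. Objects may
	 * be freed and immediately reused within the cache, so readers
	 * under rcu_read_lock() must tolerate seeing a recycled object,
	 * but they will never see memory of a different type.
	 */
	demo_cache = kmem_cache_create("demo-table", sizeof(struct demo_table),
				       sizeof(struct demo_table),
				       SLAB_TYPESAFE_BY_RCU, demo_ctor);
	return demo_cache ? 0 : -ENOMEM;
}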