Message ID | 20230807082305.198784-2-dylan@andestech.com |
---|---|
State | New |
Series | Enhanced TLB flushing for vmap/vmalloc() |
Commit Message
Dylan Jhong
Aug. 7, 2023, 8:23 a.m. UTC
Since RISC-V allows the microarchitecture to cache invalid entries in the TLB,
it is necessary to issue a "preventive" SFENCE.VMA to ensure that each core obtains
the correct kernel mapping.

This patch implements TLB flushing in arch_sync_kernel_mappings(), ensuring that kernel
page table mappings created via vmap()/vmalloc() are updated before switching MM.
Signed-off-by: Dylan Jhong <dylan@andestech.com>
---
 arch/riscv/include/asm/page.h |  2 ++
 arch/riscv/mm/tlbflush.c      | 12 ++++++++++++
 2 files changed, 14 insertions(+)
Comments
Hi Dylan,

kernel test robot noticed the following build warnings:

[auto build test WARNING on linus/master]
[also build test WARNING on v6.5-rc5 next-20230807]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Dylan-Jhong/riscv-Implement-arch_sync_kernel_mappings-for-preventive-TLB-flush/20230807-162922
base:   linus/master
patch link:    https://lore.kernel.org/r/20230807082305.198784-2-dylan%40andestech.com
patch subject: [PATCH 1/1] riscv: Implement arch_sync_kernel_mappings() for "preventive" TLB flush
config: riscv-allyesconfig (https://download.01.org/0day-ci/archive/20230807/202308071710.irjERWVF-lkp@intel.com/config)
compiler: riscv64-linux-gcc (GCC) 12.3.0
reproduce: (https://download.01.org/0day-ci/archive/20230807/202308071710.irjERWVF-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202308071710.irjERWVF-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> arch/riscv/mm/tlbflush.c:159:6: warning: no previous prototype for 'arch_sync_kernel_mappings' [-Wmissing-prototypes]
     159 | void arch_sync_kernel_mappings(unsigned long start, unsigned long end)
         |      ^~~~~~~~~~~~~~~~~~~~~~~~~

vim +/arch_sync_kernel_mappings +159 arch/riscv/mm/tlbflush.c

   152
   153  /*
   154   * Since RISC-V is a microarchitecture that allows caching invalid entries in the TLB,
   155   * it is necessary to issue a "preventive" SFENCE.VMA to ensure that each core obtains
   156   * the correct kernel mapping. arch_sync_kernel_mappings() will ensure that kernel
   157   * page table mappings created via vmap/vmalloc() are updated before switching MM.
   158   */
 > 159  void arch_sync_kernel_mappings(unsigned long start, unsigned long end)
Hi Dylan,

kernel test robot noticed the following build errors:

[auto build test ERROR on linus/master]
[also build test ERROR on v6.5-rc5 next-20230807]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Dylan-Jhong/riscv-Implement-arch_sync_kernel_mappings-for-preventive-TLB-flush/20230807-162922
base:   linus/master
patch link:    https://lore.kernel.org/r/20230807082305.198784-2-dylan%40andestech.com
patch subject: [PATCH 1/1] riscv: Implement arch_sync_kernel_mappings() for "preventive" TLB flush
config: riscv-allnoconfig (https://download.01.org/0day-ci/archive/20230807/202308072050.0T0FlSpT-lkp@intel.com/config)
compiler: riscv64-linux-gcc (GCC) 12.3.0
reproduce: (https://download.01.org/0day-ci/archive/20230807/202308072050.0T0FlSpT-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202308072050.0T0FlSpT-lkp@intel.com/

All errors (new ones prefixed by >>):

   riscv64-linux-ld: mm/memory.o: in function `.L1539':
>> memory.c:(.text+0x3b5c): undefined reference to `arch_sync_kernel_mappings'
   riscv64-linux-ld: mm/vmalloc.o: in function `.L301':
>> vmalloc.c:(.text+0xd24): undefined reference to `arch_sync_kernel_mappings'
   riscv64-linux-ld: mm/vmalloc.o: in function `vb_alloc.constprop.0':
   vmalloc.c:(.text+0x2c4e): undefined reference to `arch_sync_kernel_mappings'
   riscv64-linux-ld: mm/vmalloc.o: in function `.L0 ':
   vmalloc.c:(.text+0x2f2c): undefined reference to `arch_sync_kernel_mappings'
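Both robot reports have plausible mechanical fixes (my reading, not stated in the thread): the -Wmissing-prototypes warning should disappear once tlbflush.c sees a declaration, e.g. by including <linux/vmalloc.h>, which declares arch_sync_kernel_mappings(); and the allnoconfig link errors suggest ARCH_PAGE_TABLE_SYNC_MASK is defined even in configurations where the file providing the hook is compiled out, so the define could be guarded on the same condition that builds tlbflush.o. A sketch of such a guard, with the exact build condition being an assumption on my part:

```c
/* Sketch only -- the condition under which tlbflush.o is built is an
 * assumption here, not taken from the thread. */
#if defined(CONFIG_MMU) && defined(CONFIG_SMP)
#define ARCH_PAGE_TABLE_SYNC_MASK PGTBL_PTE_MODIFIED
#endif
```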
Hi Dylan,

On 07/08/2023 10:23, Dylan Jhong wrote:
> Since RISC-V is a microarchitecture that allows caching invalid entries in the TLB,
> it is necessary to issue a "preventive" SFENCE.VMA to ensure that each core obtains
> the correct kernel mapping.
>
> The patch implements TLB flushing in arch_sync_kernel_mappings(), ensuring that kernel
> page table mappings created via vmap/vmalloc() are updated before switching MM.
>
> Signed-off-by: Dylan Jhong <dylan@andestech.com>
> ---
>  arch/riscv/include/asm/page.h |  2 ++
>  arch/riscv/mm/tlbflush.c      | 12 ++++++++++++
>  2 files changed, 14 insertions(+)
>
> diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
> index b55ba20903ec..6c86ab69687e 100644
> --- a/arch/riscv/include/asm/page.h
> +++ b/arch/riscv/include/asm/page.h
> @@ -21,6 +21,8 @@
>  #define HPAGE_MASK (~(HPAGE_SIZE - 1))
>  #define HUGETLB_PAGE_ORDER (HPAGE_SHIFT - PAGE_SHIFT)
>
> +#define ARCH_PAGE_TABLE_SYNC_MASK PGTBL_PTE_MODIFIED
> +
>  /*
>   * PAGE_OFFSET -- the first address of the first page of memory.
>   * When not using MMU this corresponds to the first free page in
> diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
> index 77be59aadc73..d63364948c85 100644
> --- a/arch/riscv/mm/tlbflush.c
> +++ b/arch/riscv/mm/tlbflush.c
> @@ -149,3 +149,15 @@ void flush_pmd_tlb_range(struct vm_area_struct *vma, unsigned long start,
>  	__flush_tlb_range(vma->vm_mm, start, end - start, PMD_SIZE);
>  }
>  #endif
> +
> +/*
> + * Since RISC-V is a microarchitecture that allows caching invalid entries in the TLB,
> + * it is necessary to issue a "preventive" SFENCE.VMA to ensure that each core obtains
> + * the correct kernel mapping. arch_sync_kernel_mappings() will ensure that kernel
> + * page table mappings created via vmap/vmalloc() are updated before switching MM.
> + */
> +void arch_sync_kernel_mappings(unsigned long start, unsigned long end)
> +{
> +	if (start < VMALLOC_END && end > VMALLOC_START)

This test is too restrictive, it should catch the range [MODULES_VADDR;
MODULES_END[ too, sorry I did not notice that at first.

> +		flush_tlb_all();
> +}
> \ No newline at end of file

I have to admit that I *think* both your patch and mine are wrong: one of
the problems that led to the removal of vmalloc_fault() is the possibility
for tracing functions to actually allocate vmalloc regions in the vmalloc
page fault path, which could give rise to nested exceptions (see
https://lore.kernel.org/lkml/20200508144043.13893-1-joro@8bytes.org/).

Here, every time we allocate a vmalloc region, we send an IPI. If a vmalloc
allocation happens in this path (if it is traced for example), it will give
rise to an IPI... and so on.

So I came to the conclusion that the only way to actually fix this issue is
by resolving the vmalloc faults very early in the page fault path (by
emitting an sfence.vma on uarchs that cache invalid entries), before the
kernel stack is even accessed. That's the best solution, since it would
completely remove all the preventive sfence.vma in
flush_cache_vmap()/arch_sync_kernel_mappings(); we would rely on faulting,
which I assume should not happen a lot (?).

I'm implementing this solution, but I'm pretty sure it won't be ready for
6.5. In the meantime, we need either your patch or mine to fix your issue...
On Tue, Aug 08, 2023 at 12:16:50PM +0200, Alexandre Ghiti wrote:
> Hi Dylan,
>
> On 07/08/2023 10:23, Dylan Jhong wrote:
> > Since RISC-V is a microarchitecture that allows caching invalid entries in the TLB,
> > it is necessary to issue a "preventive" SFENCE.VMA to ensure that each core obtains
> > the correct kernel mapping.
> >
> > The patch implements TLB flushing in arch_sync_kernel_mappings(), ensuring that kernel
> > page table mappings created via vmap/vmalloc() are updated before switching MM.
> >
> > Signed-off-by: Dylan Jhong <dylan@andestech.com>
> > ---
> >  arch/riscv/include/asm/page.h |  2 ++
> >  arch/riscv/mm/tlbflush.c      | 12 ++++++++++++
> >  2 files changed, 14 insertions(+)
> >
> > diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
> > index b55ba20903ec..6c86ab69687e 100644
> > --- a/arch/riscv/include/asm/page.h
> > +++ b/arch/riscv/include/asm/page.h
> > @@ -21,6 +21,8 @@
> >  #define HPAGE_MASK (~(HPAGE_SIZE - 1))
> >  #define HUGETLB_PAGE_ORDER (HPAGE_SHIFT - PAGE_SHIFT)
> > +#define ARCH_PAGE_TABLE_SYNC_MASK PGTBL_PTE_MODIFIED
> > +
> >  /*
> >   * PAGE_OFFSET -- the first address of the first page of memory.
> >   * When not using MMU this corresponds to the first free page in
> > diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
> > index 77be59aadc73..d63364948c85 100644
> > --- a/arch/riscv/mm/tlbflush.c
> > +++ b/arch/riscv/mm/tlbflush.c
> > @@ -149,3 +149,15 @@ void flush_pmd_tlb_range(struct vm_area_struct *vma, unsigned long start,
> >  	__flush_tlb_range(vma->vm_mm, start, end - start, PMD_SIZE);
> >  }
> >  #endif
> > +
> > +/*
> > + * Since RISC-V is a microarchitecture that allows caching invalid entries in the TLB,
> > + * it is necessary to issue a "preventive" SFENCE.VMA to ensure that each core obtains
> > + * the correct kernel mapping. arch_sync_kernel_mappings() will ensure that kernel
> > + * page table mappings created via vmap/vmalloc() are updated before switching MM.
> > + */
> > +void arch_sync_kernel_mappings(unsigned long start, unsigned long end)
> > +{
> > +	if (start < VMALLOC_END && end > VMALLOC_START)
>
> This test is too restrictive, it should catch the range [MODULES_VADDR;
> MODULES_END[ too, sorry I did not notice that at first.
>
> > +		flush_tlb_all();
> > +}
> > \ No newline at end of file
>
> I have to admit that I *think* both your patch and mine are wrong: one of
> the problems that led to the removal of vmalloc_fault() is the possibility
> for tracing functions to actually allocate vmalloc regions in the vmalloc
> page fault path, which could give rise to nested exceptions (see
> https://lore.kernel.org/lkml/20200508144043.13893-1-joro@8bytes.org/).
>
> Here, every time we allocate a vmalloc region, we send an IPI. If a vmalloc
> allocation happens in this path (if it is traced for example), it will give
> rise to an IPI... and so on.
>
> So I came to the conclusion that the only way to actually fix this issue is
> by resolving the vmalloc faults very early in the page fault path (by
> emitting an sfence.vma on uarchs that cache invalid entries), before the
> kernel stack is even accessed. That's the best solution, since it would
> completely remove all the preventive sfence.vma in
> flush_cache_vmap()/arch_sync_kernel_mappings(); we would rely on faulting,
> which I assume should not happen a lot (?).
>

Hi Alex,

Agree. If we could introduce a "new vmalloc_fault()" function before accessing
the kernel stack, which would trigger an SFENCE.VMA instruction, then each time
we call vmalloc() or vmap() to create new kernel mappings, we wouldn't need to
execute flush_cache_vmap() or arch_sync_kernel_mappings() to update the TLB.
This should be able to balance both performance and correctness.

> I'm implementing this solution, but I'm pretty sure it won't be ready for
> 6.5. In the meantime, we need either your patch or mine to fix your issue...

If there are no other reports of this issue, I believe encountering this TLB
flush problem might not be so common. Perhaps we could wait until you've
finished implementing the "new vmalloc_fault()" feature. If anyone encounters
problems in the meantime, I think they can temporarily apply either my patch or
yours to work around the issue of updating the TLB for vmalloc.

Best regards,
Dylan Jhong
diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
index b55ba20903ec..6c86ab69687e 100644
--- a/arch/riscv/include/asm/page.h
+++ b/arch/riscv/include/asm/page.h
@@ -21,6 +21,8 @@
 #define HPAGE_MASK (~(HPAGE_SIZE - 1))
 #define HUGETLB_PAGE_ORDER (HPAGE_SHIFT - PAGE_SHIFT)
 
+#define ARCH_PAGE_TABLE_SYNC_MASK PGTBL_PTE_MODIFIED
+
 /*
  * PAGE_OFFSET -- the first address of the first page of memory.
  * When not using MMU this corresponds to the first free page in
diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
index 77be59aadc73..d63364948c85 100644
--- a/arch/riscv/mm/tlbflush.c
+++ b/arch/riscv/mm/tlbflush.c
@@ -149,3 +149,15 @@ void flush_pmd_tlb_range(struct vm_area_struct *vma, unsigned long start,
 	__flush_tlb_range(vma->vm_mm, start, end - start, PMD_SIZE);
 }
 #endif
+
+/*
+ * Since RISC-V is a microarchitecture that allows caching invalid entries in the TLB,
+ * it is necessary to issue a "preventive" SFENCE.VMA to ensure that each core obtains
+ * the correct kernel mapping. arch_sync_kernel_mappings() will ensure that kernel
+ * page table mappings created via vmap/vmalloc() are updated before switching MM.
+ */
+void arch_sync_kernel_mappings(unsigned long start, unsigned long end)
+{
+	if (start < VMALLOC_END && end > VMALLOC_START)
+		flush_tlb_all();
+}
\ No newline at end of file