[v5,08/13] riscv: Avoid TLB flush loops when affected by SiFive CIP-1200

Message ID 20240229232211.161961-9-samuel.holland@sifive.com
State New
Headers
Series riscv: ASID-related and UP-related TLB flush enhancements |

Commit Message

Samuel Holland Feb. 29, 2024, 11:21 p.m. UTC
  Since implementations affected by SiFive errata CIP-1200 always use the
global variant of the sfence.vma instruction, they only need to execute
the instruction once. The range-based loop only hurts performance.

Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
---

(no changes since v4)

Changes in v4:
 - Only set tlb_flush_all_threshold when CONFIG_MMU=y.

Changes in v3:
 - New patch for v3

 arch/riscv/errata/sifive/errata.c | 5 +++++
 arch/riscv/include/asm/tlbflush.h | 2 ++
 arch/riscv/mm/tlbflush.c          | 2 +-
 3 files changed, 8 insertions(+), 1 deletion(-)
  

Comments

yunhui cui March 1, 2024, 2:48 a.m. UTC | #1
Hi Samuel,

On Fri, Mar 1, 2024 at 7:22 AM Samuel Holland <samuel.holland@sifive.com> wrote:
>
> Since implementations affected by SiFive errata CIP-1200 always use the
> global variant of the sfence.vma instruction, they only need to execute
> the instruction once. The range-based loop only hurts performance.
>
> Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
> ---
>
> (no changes since v4)
>
> Changes in v4:
>  - Only set tlb_flush_all_threshold when CONFIG_MMU=y.
>
> Changes in v3:
>  - New patch for v3
>
>  arch/riscv/errata/sifive/errata.c | 5 +++++
>  arch/riscv/include/asm/tlbflush.h | 2 ++
>  arch/riscv/mm/tlbflush.c          | 2 +-
>  3 files changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/arch/riscv/errata/sifive/errata.c b/arch/riscv/errata/sifive/errata.c
> index 3d9a32d791f7..716cfedad3a2 100644
> --- a/arch/riscv/errata/sifive/errata.c
> +++ b/arch/riscv/errata/sifive/errata.c
> @@ -42,6 +42,11 @@ static bool errata_cip_1200_check_func(unsigned long  arch_id, unsigned long imp
>                 return false;
>         if ((impid & 0xffffff) > 0x200630 || impid == 0x1200626)
>                 return false;
> +
> +#ifdef CONFIG_MMU
> +       tlb_flush_all_threshold = 0;
> +#endif
> +
>         return true;
>  }
>
> diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h
> index 463b615d7728..8e329721375b 100644
> --- a/arch/riscv/include/asm/tlbflush.h
> +++ b/arch/riscv/include/asm/tlbflush.h
> @@ -66,6 +66,8 @@ void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
>                                unsigned long uaddr);
>  void arch_flush_tlb_batched_pending(struct mm_struct *mm);
>  void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch);
> +
> +extern unsigned long tlb_flush_all_threshold;
>  #else /* CONFIG_MMU */
>  #define local_flush_tlb_all()                  do { } while (0)
>  #endif /* CONFIG_MMU */
> diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
> index 365e0a0e4725..22870f213188 100644
> --- a/arch/riscv/mm/tlbflush.c
> +++ b/arch/riscv/mm/tlbflush.c
> @@ -11,7 +11,7 @@
>   * Flush entire TLB if number of entries to be flushed is greater
>   * than the threshold below.
>   */
> -static unsigned long tlb_flush_all_threshold __read_mostly = 64;
> +unsigned long tlb_flush_all_threshold __read_mostly = 64;
>
>  static void local_flush_tlb_range_threshold_asid(unsigned long start,
>                                                  unsigned long size,
> --
> 2.43.1
>

If local_flush_tlb_all_asid() is used every time, more PTWs will be
generated. Will such modifications definitely improve the overall
performance?

Hi Alex, Samuel,
The relationship between flush_xx_range_asid() and nr_ptes is
basically linear growth (y=kx +b), while flush_all_asid() has nothing
to do with nr_ptes (y=c).
Some TLBs may do some optimization. The operation of flush all itself
requires very few cycles, but there is a certain delay between
consecutive flush all.
The intersection of the two straight lines is the optimal solution of
tlb_flush_all_threshold. In actual situations, continuous
flush_all_asid will not occur. One problem caused by flush_all_asid()
is that multiple flush entries require PTW, which causes greater
latency.
Therefore, the value of tlb_flush_all_threshold needs to be considered
or quantified. Maybe doing local_flush_tlb_page_asid() based on the
actual nr_ptes_in_range would give better overall performance.
What do you think?

Thanks,
Yunhui
  

Patch

diff --git a/arch/riscv/errata/sifive/errata.c b/arch/riscv/errata/sifive/errata.c
index 3d9a32d791f7..716cfedad3a2 100644
--- a/arch/riscv/errata/sifive/errata.c
+++ b/arch/riscv/errata/sifive/errata.c
@@ -42,6 +42,11 @@  static bool errata_cip_1200_check_func(unsigned long  arch_id, unsigned long imp
 		return false;
 	if ((impid & 0xffffff) > 0x200630 || impid == 0x1200626)
 		return false;
+
+#ifdef CONFIG_MMU
+	tlb_flush_all_threshold = 0;
+#endif
+
 	return true;
 }
 
diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h
index 463b615d7728..8e329721375b 100644
--- a/arch/riscv/include/asm/tlbflush.h
+++ b/arch/riscv/include/asm/tlbflush.h
@@ -66,6 +66,8 @@  void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
 			       unsigned long uaddr);
 void arch_flush_tlb_batched_pending(struct mm_struct *mm);
 void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch);
+
+extern unsigned long tlb_flush_all_threshold;
 #else /* CONFIG_MMU */
 #define local_flush_tlb_all()			do { } while (0)
 #endif /* CONFIG_MMU */
diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
index 365e0a0e4725..22870f213188 100644
--- a/arch/riscv/mm/tlbflush.c
+++ b/arch/riscv/mm/tlbflush.c
@@ -11,7 +11,7 @@ 
  * Flush entire TLB if number of entries to be flushed is greater
  * than the threshold below.
  */
-static unsigned long tlb_flush_all_threshold __read_mostly = 64;
+unsigned long tlb_flush_all_threshold __read_mostly = 64;
 
 static void local_flush_tlb_range_threshold_asid(unsigned long start,
 						 unsigned long size,