LoongArch: Make WriteCombine configurable for ioremap()

Message ID 20230314085433.4078119-1-chenhuacai@loongson.cn
State New
Headers
Series LoongArch: Make WriteCombine configurable for ioremap() |

Commit Message

Huacai Chen March 14, 2023, 8:54 a.m. UTC
  LoongArch maintains cache coherency in hardware, but when works with
LS7A chipsets the WUC attribute (Weak-ordered UnCached, which is similar
to WriteCombine) is out of the scope of cache coherency machanism for
PCIe devices (this is a PCIe protocol violation, may be fixed in newer
chipsets).

This means WUC can only used for write-only memory regions now, so this
option is disabled by default (means WUC falls back to SUC for ioremap).
You can enable this option if the kernel is ensured to run on bug-free
hardwares.

Suggested-by: WANG Xuerui <kernel@xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
---
 arch/loongarch/Kconfig          | 14 ++++++++++++++
 arch/loongarch/include/asm/io.h |  5 +++++
 2 files changed, 19 insertions(+)
  

Comments

Xi Ruoyao March 14, 2023, 9:40 a.m. UTC | #1
On Tue, 2023-03-14 at 16:54 +0800, Huacai Chen wrote:
> LoongArch maintains cache coherency in hardware, but when works with
> LS7A chipsets the WUC attribute (Weak-ordered UnCached, which is similar
> to WriteCombine) is out of the scope of cache coherency machanism for
> PCIe devices (this is a PCIe protocol violation, may be fixed in newer
> chipsets).

IIUC all launched LS7A models (7A1000 and 7A2000) suffers this issue?

> This means WUC can only used for write-only memory regions now, so this
> option is disabled by default (means WUC falls back to SUC for ioremap).
> You can enable this option if the kernel is ensured to run on bug-free
> hardwares.

Hmm, is it possible to make a PCI quirk so SUC/WUC will be decided
automatically from the vendor:device ID of the PCI root controller? 
Then we don't need to rely on the user or distro maintainer to select an
option.  I see there is already many architecture-dependant #if
directives in drivers/pci/quirks.c so I guess such a quirk is acceptable
in PCI tree...

If a PCI quirk is not possible, then is it possible to make a kernel
command line option, leaving this CONFIG as the default value of the
option?  I guess in the future many LoongArch users will just install a
binary distro, then it would be much easier to edit grub.cfg than
rebuilding the kernel when they finally buy a compliant PCIe controller.

> Suggested-by: WANG Xuerui <kernel@xen0n.name>
> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
> ---
>  arch/loongarch/Kconfig          | 14 ++++++++++++++
>  arch/loongarch/include/asm/io.h |  5 +++++
>  2 files changed, 19 insertions(+)
> 
> diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
> index 0d11738a861a..e3f5c422636f 100644
> --- a/arch/loongarch/Kconfig
> +++ b/arch/loongarch/Kconfig
> @@ -446,6 +446,20 @@ config ARCH_IOREMAP
>           protection support. However, you can enable LoongArch DMW-based
>           ioremap() for better performance.
>  
> +config ARCH_WRITECOMBINE
> +       bool "Enable WriteCombine (WUC) for ioremap()"
> +       help
> +         LoongArch maintains cache coherency in hardware, but with LS7A
> +         chipsets the WUC attribute (Weak-ordered UnCached, which is similar
> +         to WriteCombine) is out of the scope of cache coherency machanism
> +         for PCIe devices (this is a PCIe protocol violation, may be fixed
> +         in newer chipsets).
> +
> +         This means WUC can only used for write-only memory regions now, so
> +         this option is disabled by default (means WUC falls back to SUC for
> +         ioremap). You can enable this option if the kernel is ensured to run
> +         on bug-free hardwares.
> +
>  config ARCH_STRICT_ALIGN
>         bool "Enable -mstrict-align to prevent unaligned accesses" if EXPERT
>         default y
> diff --git a/arch/loongarch/include/asm/io.h b/arch/loongarch/include/asm/io.h
> index 402a7d9e3a53..079ef897ed1a 100644
> --- a/arch/loongarch/include/asm/io.h
> +++ b/arch/loongarch/include/asm/io.h
> @@ -54,8 +54,13 @@ static inline void __iomem *ioremap_prot(phys_addr_t offset, unsigned long size,
>   * @offset:    bus address of the memory
>   * @size:      size of the resource to map
>   */
> +#ifdef CONFIG_ARCH_WRITECOMBINE
>  #define ioremap_wc(offset, size)       \
>         ioremap_prot((offset), (size), pgprot_val(PAGE_KERNEL_WUC))
> +#else
> +#define ioremap_wc(offset, size)       \
> +       ioremap_prot((offset), (size), pgprot_val(PAGE_KERNEL_SUC))
> +#endif
>  
>  #define ioremap_cache(offset, size)    \
>         ioremap_prot((offset), (size), pgprot_val(PAGE_KERNEL))
  
Huacai Chen March 14, 2023, 10:02 a.m. UTC | #2
Hi, Ruoyao,

On Tue, Mar 14, 2023 at 5:41 PM Xi Ruoyao <xry111@xry111.site> wrote:
>
> On Tue, 2023-03-14 at 16:54 +0800, Huacai Chen wrote:
> > LoongArch maintains cache coherency in hardware, but when works with
> > LS7A chipsets the WUC attribute (Weak-ordered UnCached, which is similar
> > to WriteCombine) is out of the scope of cache coherency machanism for
> > PCIe devices (this is a PCIe protocol violation, may be fixed in newer
> > chipsets).
>
> IIUC all launched LS7A models (7A1000 and 7A2000) suffers this issue?
Yes, very unfortunately, but this issue is only observed in the amdgpu
driver now.

>
> > This means WUC can only used for write-only memory regions now, so this
> > option is disabled by default (means WUC falls back to SUC for ioremap).
> > You can enable this option if the kernel is ensured to run on bug-free
> > hardwares.
>
> Hmm, is it possible to make a PCI quirk so SUC/WUC will be decided
> automatically from the vendor:device ID of the PCI root controller?
> Then we don't need to rely on the user or distro maintainer to select an
> option.  I see there is already many architecture-dependant #if
> directives in drivers/pci/quirks.c so I guess such a quirk is acceptable
> in PCI tree...
Not a good idea, pci quirks need too long a time to review, and we
don't know when this issue can be fixed in hardware.

>
> If a PCI quirk is not possible, then is it possible to make a kernel
> command line option, leaving this CONFIG as the default value of the
> option?  I guess in the future many LoongArch users will just install a
> binary distro, then it would be much easier to edit grub.cfg than
> rebuilding the kernel when they finally buy a compliant PCIe controller.
If we use command line parameter, we can remove this Kconfig option.

Huacai
>
> > Suggested-by: WANG Xuerui <kernel@xen0n.name>
> > Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
> > ---
> >  arch/loongarch/Kconfig          | 14 ++++++++++++++
> >  arch/loongarch/include/asm/io.h |  5 +++++
> >  2 files changed, 19 insertions(+)
> >
> > diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
> > index 0d11738a861a..e3f5c422636f 100644
> > --- a/arch/loongarch/Kconfig
> > +++ b/arch/loongarch/Kconfig
> > @@ -446,6 +446,20 @@ config ARCH_IOREMAP
> >           protection support. However, you can enable LoongArch DMW-based
> >           ioremap() for better performance.
> >
> > +config ARCH_WRITECOMBINE
> > +       bool "Enable WriteCombine (WUC) for ioremap()"
> > +       help
> > +         LoongArch maintains cache coherency in hardware, but with LS7A
> > +         chipsets the WUC attribute (Weak-ordered UnCached, which is similar
> > +         to WriteCombine) is out of the scope of cache coherency machanism
> > +         for PCIe devices (this is a PCIe protocol violation, may be fixed
> > +         in newer chipsets).
> > +
> > +         This means WUC can only used for write-only memory regions now, so
> > +         this option is disabled by default (means WUC falls back to SUC for
> > +         ioremap). You can enable this option if the kernel is ensured to run
> > +         on bug-free hardwares.
> > +
> >  config ARCH_STRICT_ALIGN
> >         bool "Enable -mstrict-align to prevent unaligned accesses" if EXPERT
> >         default y
> > diff --git a/arch/loongarch/include/asm/io.h b/arch/loongarch/include/asm/io.h
> > index 402a7d9e3a53..079ef897ed1a 100644
> > --- a/arch/loongarch/include/asm/io.h
> > +++ b/arch/loongarch/include/asm/io.h
> > @@ -54,8 +54,13 @@ static inline void __iomem *ioremap_prot(phys_addr_t offset, unsigned long size,
> >   * @offset:    bus address of the memory
> >   * @size:      size of the resource to map
> >   */
> > +#ifdef CONFIG_ARCH_WRITECOMBINE
> >  #define ioremap_wc(offset, size)       \
> >         ioremap_prot((offset), (size), pgprot_val(PAGE_KERNEL_WUC))
> > +#else
> > +#define ioremap_wc(offset, size)       \
> > +       ioremap_prot((offset), (size), pgprot_val(PAGE_KERNEL_SUC))
> > +#endif
> >
> >  #define ioremap_cache(offset, size)    \
> >         ioremap_prot((offset), (size), pgprot_val(PAGE_KERNEL))
>
> --
> Xi Ruoyao <xry111@xry111.site>
> School of Aerospace Science and Technology, Xidian University
  
WANG Xuerui March 14, 2023, 10:08 a.m. UTC | #3
On 2023/3/14 16:54, Huacai Chen wrote:
> LoongArch maintains cache coherency in hardware, but when works with

but when paired with current LS7A chipsets

> LS7A chipsets the WUC attribute (Weak-ordered UnCached, which is similar
> to WriteCombine) is out of the scope of cache coherency machanism for
> PCIe devices (this is a PCIe protocol violation, may be fixed in newer

which may be fixed

> chipsets).
> 
> This means WUC can only used for write-only memory regions now, so this
> option is disabled by default (means WUC falls back to SUC for ioremap).

by default, making ioremap_wc silently fallback to SUC.

> You can enable this option if the kernel is ensured to run on bug-free
> hardwares.

if you can ensure the kernel will always be running on hardware without 
this bug

> 
> Suggested-by: WANG Xuerui <kernel@xen0n.name>
> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
> ---
>   arch/loongarch/Kconfig          | 14 ++++++++++++++
>   arch/loongarch/include/asm/io.h |  5 +++++
>   2 files changed, 19 insertions(+)
> 
> diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
> index 0d11738a861a..e3f5c422636f 100644
> --- a/arch/loongarch/Kconfig
> +++ b/arch/loongarch/Kconfig
> @@ -446,6 +446,20 @@ config ARCH_IOREMAP
>   	  protection support. However, you can enable LoongArch DMW-based
>   	  ioremap() for better performance.
>   
> +config ARCH_WRITECOMBINE
> +	bool "Enable WriteCombine (WUC) for ioremap()"
> +	help
> +	  LoongArch maintains cache coherency in hardware, but with LS7A
> +	  chipsets the WUC attribute (Weak-ordered UnCached, which is similar
> +	  to WriteCombine) is out of the scope of cache coherency machanism
> +	  for PCIe devices (this is a PCIe protocol violation, may be fixed
> +	  in newer chipsets).
> +
> +	  This means WUC can only used for write-only memory regions now, so
> +	  this option is disabled by default (means WUC falls back to SUC for
> +	  ioremap). You can enable this option if the kernel is ensured to run
> +	  on bug-free hardwares.

Fix text here like with the commit message.

Then add a "If unsure, say N" line to serve as a warning?

> +
>   config ARCH_STRICT_ALIGN
>   	bool "Enable -mstrict-align to prevent unaligned accesses" if EXPERT
>   	default y
> diff --git a/arch/loongarch/include/asm/io.h b/arch/loongarch/include/asm/io.h
> index 402a7d9e3a53..079ef897ed1a 100644
> --- a/arch/loongarch/include/asm/io.h
> +++ b/arch/loongarch/include/asm/io.h
> @@ -54,8 +54,13 @@ static inline void __iomem *ioremap_prot(phys_addr_t offset, unsigned long size,
>    * @offset:    bus address of the memory
>    * @size:      size of the resource to map
>    */
> +#ifdef CONFIG_ARCH_WRITECOMBINE
>   #define ioremap_wc(offset, size)	\
>   	ioremap_prot((offset), (size), pgprot_val(PAGE_KERNEL_WUC))
> +#else
> +#define ioremap_wc(offset, size)	\
> +	ioremap_prot((offset), (size), pgprot_val(PAGE_KERNEL_SUC))
> +#endif
>   
>   #define ioremap_cache(offset, size)	\
>   	ioremap_prot((offset), (size), pgprot_val(PAGE_KERNEL))

I'll test this later tonight with my RX 6400. See you in a few hours...
  
Xi Ruoyao March 14, 2023, 10:10 a.m. UTC | #4
On Tue, 2023-03-14 at 18:02 +0800, Huacai Chen wrote:
> > Hmm, is it possible to make a PCI quirk so SUC/WUC will be decided
> > automatically from the vendor:device ID of the PCI root controller?
> > Then we don't need to rely on the user or distro maintainer to select an
> > option.  I see there is already many architecture-dependant #if
> > directives in drivers/pci/quirks.c so I guess such a quirk is acceptable
> > in PCI tree...
> Not a good idea, pci quirks need too long a time to review, and we
> don't know when this issue can be fixed in hardware.

How about make a PCI fixup in arch/loongarch/pci/pci.c then?  I think we
just need to set a flag in the fixup, then check the flag in
ioremap_wc...
  
WANG Xuerui March 14, 2023, 10:11 a.m. UTC | #5
On 2023/3/14 18:02, Huacai Chen wrote:
> Hi, Ruoyao,
> 
> On Tue, Mar 14, 2023 at 5:41 PM Xi Ruoyao <xry111@xry111.site> wrote:
>>
>> On Tue, 2023-03-14 at 16:54 +0800, Huacai Chen wrote:
>>> LoongArch maintains cache coherency in hardware, but when works with
>>> LS7A chipsets the WUC attribute (Weak-ordered UnCached, which is similar
>>> to WriteCombine) is out of the scope of cache coherency machanism for
>>> PCIe devices (this is a PCIe protocol violation, may be fixed in newer
>>> chipsets).
>>
>> IIUC all launched LS7A models (7A1000 and 7A2000) suffers this issue?
> Yes, very unfortunately, but this issue is only observed in the amdgpu
> driver now.

It's PCIe protocol violation after all, and we can never be sure about 
the vast amount of hardware untested on loongarch after all. Miserable 
as the performance hit may get, we don't really have another choice, 
unfortunately. Someone needs to lecture the LS7A team real hard!

>>
>>> This means WUC can only used for write-only memory regions now, so this
>>> option is disabled by default (means WUC falls back to SUC for ioremap).
>>> You can enable this option if the kernel is ensured to run on bug-free
>>> hardwares.
>>
>> Hmm, is it possible to make a PCI quirk so SUC/WUC will be decided
>> automatically from the vendor:device ID of the PCI root controller?
>> Then we don't need to rely on the user or distro maintainer to select an
>> option.  I see there is already many architecture-dependant #if
>> directives in drivers/pci/quirks.c so I guess such a quirk is acceptable
>> in PCI tree...
> Not a good idea, pci quirks need too long a time to review, and we
> don't know when this issue can be fixed in hardware.
> 
>>
>> If a PCI quirk is not possible, then is it possible to make a kernel
>> command line option, leaving this CONFIG as the default value of the
>> option?  I guess in the future many LoongArch users will just install a
>> binary distro, then it would be much easier to edit grub.cfg than
>> rebuilding the kernel when they finally buy a compliant PCIe controller.
> If we use command line parameter, we can remove this Kconfig option.

An option is still useful as specifying the compile-time default for 
such a kernel parameter, IMO.
  
Huacai Chen March 14, 2023, 11:08 a.m. UTC | #6
On Tue, Mar 14, 2023 at 6:12 PM WANG Xuerui <kernel@xen0n.name> wrote:
>
> On 2023/3/14 18:02, Huacai Chen wrote:
> > Hi, Ruoyao,
> >
> > On Tue, Mar 14, 2023 at 5:41 PM Xi Ruoyao <xry111@xry111.site> wrote:
> >>
> >> On Tue, 2023-03-14 at 16:54 +0800, Huacai Chen wrote:
> >>> LoongArch maintains cache coherency in hardware, but when works with
> >>> LS7A chipsets the WUC attribute (Weak-ordered UnCached, which is similar
> >>> to WriteCombine) is out of the scope of cache coherency machanism for
> >>> PCIe devices (this is a PCIe protocol violation, may be fixed in newer
> >>> chipsets).
> >>
> >> IIUC all launched LS7A models (7A1000 and 7A2000) suffers this issue?
> > Yes, very unfortunately, but this issue is only observed in the amdgpu
> > driver now.
>
> It's PCIe protocol violation after all, and we can never be sure about
> the vast amount of hardware untested on loongarch after all. Miserable
> as the performance hit may get, we don't really have another choice,
> unfortunately. Someone needs to lecture the LS7A team real hard!
>
> >>
> >>> This means WUC can only used for write-only memory regions now, so this
> >>> option is disabled by default (means WUC falls back to SUC for ioremap).
> >>> You can enable this option if the kernel is ensured to run on bug-free
> >>> hardwares.
> >>
> >> Hmm, is it possible to make a PCI quirk so SUC/WUC will be decided
> >> automatically from the vendor:device ID of the PCI root controller?
> >> Then we don't need to rely on the user or distro maintainer to select an
> >> option.  I see there is already many architecture-dependant #if
> >> directives in drivers/pci/quirks.c so I guess such a quirk is acceptable
> >> in PCI tree...
> > Not a good idea, pci quirks need too long a time to review, and we
> > don't know when this issue can be fixed in hardware.
> >
> >>
> >> If a PCI quirk is not possible, then is it possible to make a kernel
> >> command line option, leaving this CONFIG as the default value of the
> >> option?  I guess in the future many LoongArch users will just install a
> >> binary distro, then it would be much easier to edit grub.cfg than
> >> rebuilding the kernel when they finally buy a compliant PCIe controller.
> > If we use command line parameter, we can remove this Kconfig option.
>
> An option is still useful as specifying the compile-time default for
> such a kernel parameter, IMO.
I will update commit messages and add a kernel parameter, thanks.

Huacai
>
> --
> WANG "xen0n" Xuerui
>
> Linux/LoongArch mailing list: https://lore.kernel.org/loongarch/
>
>
  

Patch

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 0d11738a861a..e3f5c422636f 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -446,6 +446,20 @@  config ARCH_IOREMAP
 	  protection support. However, you can enable LoongArch DMW-based
 	  ioremap() for better performance.
 
+config ARCH_WRITECOMBINE
+	bool "Enable WriteCombine (WUC) for ioremap()"
+	help
+	  LoongArch maintains cache coherency in hardware, but with LS7A
+	  chipsets the WUC attribute (Weak-ordered UnCached, which is similar
+	  to WriteCombine) is out of the scope of cache coherency machanism
+	  for PCIe devices (this is a PCIe protocol violation, may be fixed
+	  in newer chipsets).
+
+	  This means WUC can only used for write-only memory regions now, so
+	  this option is disabled by default (means WUC falls back to SUC for
+	  ioremap). You can enable this option if the kernel is ensured to run
+	  on bug-free hardwares.
+
 config ARCH_STRICT_ALIGN
 	bool "Enable -mstrict-align to prevent unaligned accesses" if EXPERT
 	default y
diff --git a/arch/loongarch/include/asm/io.h b/arch/loongarch/include/asm/io.h
index 402a7d9e3a53..079ef897ed1a 100644
--- a/arch/loongarch/include/asm/io.h
+++ b/arch/loongarch/include/asm/io.h
@@ -54,8 +54,13 @@  static inline void __iomem *ioremap_prot(phys_addr_t offset, unsigned long size,
  * @offset:    bus address of the memory
  * @size:      size of the resource to map
  */
+#ifdef CONFIG_ARCH_WRITECOMBINE
 #define ioremap_wc(offset, size)	\
 	ioremap_prot((offset), (size), pgprot_val(PAGE_KERNEL_WUC))
+#else
+#define ioremap_wc(offset, size)	\
+	ioremap_prot((offset), (size), pgprot_val(PAGE_KERNEL_SUC))
+#endif
 
 #define ioremap_cache(offset, size)	\
 	ioremap_prot((offset), (size), pgprot_val(PAGE_KERNEL))