arm64: dts: qcom: sc8280xp: fix PCIe DMA coherency

Message ID 20221124142501.29314-1-johan+linaro@kernel.org
State New
Series arm64: dts: qcom: sc8280xp: fix PCIe DMA coherency

Commit Message

Johan Hovold Nov. 24, 2022, 2:25 p.m. UTC
The devices on the SC8280XP PCIe buses are cache coherent and must be
marked as such to avoid data corruption.

A coherent device can, for example, end up snooping stale data from the
caches instead of using data written by the CPU through the
non-cacheable mapping which is used for consistent DMA buffers for
non-coherent devices.

Note that this is much more likely to happen since commit c44094eee32f
("arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()")
that was added in 6.1 and which removed the cache invalidation when
setting up the non-cacheable mapping.

Marking the PCIe devices as coherent specifically fixes the intermittent
NVMe probe failures observed on the ThinkPad X13s, which were due to
corruption of the submission and completion queues. This was typically
observed as corruption of the admin submission queue (with well-formed
completion):

	could not locate request for tag 0x0
	nvme nvme0: invalid id 0 completed on queue 0

or corruption of the admin or I/O completion queues (malformed
completion):

	could not locate request for tag 0x45f
	nvme nvme0: invalid id 25695 completed on queue 25965

presumably as these queues are small enough to not be allocated using
CMA, which in turn makes them more likely to be cached (e.g. due to
accesses to nearby pages through the cacheable linear map). Increasing
the buffer sizes to two pages to force CMA allocation also appears to
make the problem go away.

Fixes: 813e83157001 ("arm64: dts: qcom: sc8280xp/sa8540p: add PCIe2-4 nodes")
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
---
 arch/arm64/boot/dts/qcom/sc8280xp.dtsi | 10 ++++++++++
 1 file changed, 10 insertions(+)
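
The stale-snoop case described in the commit message can be illustrated
with a toy user-space model. This is only a sketch under simplifying
assumptions: a single memory word, a single cache line, and a
non-cacheable write that never touches the cache. The struct and function
names are hypothetical and are not kernel APIs; real behaviour with
mismatched memory attributes is more subtle than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model (hypothetical): one DRAM word and one cache line shadowing it. */
struct toy_system {
	uint32_t dram;        /* backing memory */
	uint32_t cache_data;  /* cached copy of the word */
	bool     cache_valid; /* is a line for this address in the cache? */
};

/* CPU read through the cacheable linear map: allocates a line on a miss. */
static uint32_t cpu_read_cacheable(struct toy_system *s)
{
	assert(s != NULL);
	if (!s->cache_valid) {
		s->cache_data = s->dram;
		s->cache_valid = true;
	}
	return s->cache_data;
}

/*
 * CPU write through the non-cacheable DMA mapping: goes straight to DRAM
 * and, in this simplified model, leaves any existing cache line untouched.
 */
static void cpu_write_noncacheable(struct toy_system *s, uint32_t value)
{
	assert(s != NULL);
	s->dram = value;
}

/* Read issued by a cache-coherent device: the cache is snooped first. */
static uint32_t device_read_coherent(const struct toy_system *s)
{
	assert(s != NULL);
	return s->cache_valid ? s->cache_data : s->dram;
}

/*
 * Returns the value the device observes after the CPU has written 0xABCD
 * through the non-cacheable mapping. If the line was cached beforehand
 * (e.g. by an access to a nearby page through the linear map), the device
 * snoops the stale cached value instead of the new data.
 */
static uint32_t run_stale_snoop_scenario(bool line_cached_beforehand)
{
	struct toy_system s = { .dram = 0, .cache_data = 0, .cache_valid = false };

	if (line_cached_beforehand)
		(void)cpu_read_cacheable(&s);
	cpu_write_noncacheable(&s, 0xABCD);
	return device_read_coherent(&s);
}
```

In this model run_stale_snoop_scenario(false) returns 0xABCD, while
run_stale_snoop_scenario(true) returns the stale 0. With the device marked
dma-coherent the buffer is mapped cacheable instead, so the CPU and the
device go through the same cache and this mismatch does not arise.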
  

Comments

Konrad Dybcio Nov. 24, 2022, 2:32 p.m. UTC | #1
On 24.11.2022 15:25, Johan Hovold wrote:
> The devices on the SC8280XP PCIe buses are cache coherent and must be
> marked as such to avoid data corruption.
> 
> A coherent device can, for example, end up snooping stale data from the
> caches instead of using data written by the CPU through the
> non-cacheable mapping which is used for consistent DMA buffers for
> non-coherent devices.
> 
> Note that this is much more likely to happen since commit c44094eee32f
> ("arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()")
> that was added in 6.1 and which removed the cache invalidation when
> setting up the non-cacheable mapping.
> 
> Marking the PCIe devices as coherent specifically fixes the intermittent
> NVMe probe failures observed on the Thinkpad X13s, which was due to
> corruption of the submission and completion queues. This was typically
> observed as corruption of the admin submission queue (with well-formed
> completion):
> 
> 	could not locate request for tag 0x0
> 	nvme nvme0: invalid id 0 completed on queue 0
> 
> or corruption of the admin or I/O completion queues (malformed
> completion):
> 
> 	could not locate request for tag 0x45f
> 	nvme nvme0: invalid id 25695 completed on queue 25965
> 
> presumably as these queues are small enough to not be allocated using
> CMA which in turn make them more likely to be cached (e.g. due to
> accesses to nearby pages through the cacheable linear map). Increasing
> the buffer sizes to two pages to force CMA allocation also appears to
> make the problem go away.
> 
> Fixes: 813e83157001 ("arm64: dts: qcom: sc8280xp/sa8540p: add PCIe2-4 nodes")
> Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
> ---
Looks like 8450 should also be like this, good catch!

Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org>

Konrad
  
Manivannan Sadhasivam Nov. 25, 2022, 2:26 p.m. UTC | #2
On Thu, Nov 24, 2022 at 03:25:01PM +0100, Johan Hovold wrote:
> The devices on the SC8280XP PCIe buses are cache coherent and must be
> marked as such to avoid data corruption.
> 
> A coherent device can, for example, end up snooping stale data from the
> caches instead of using data written by the CPU through the
> non-cacheable mapping which is used for consistent DMA buffers for
> non-coherent devices.
> 

Also, the device may write into the L2 cache (or whatever cache is
accessible) if there is an entry, and the CPU may invalidate it before reading
from the DMA buffer. This will result in data loss.

> Note that this is much more likely to happen since commit c44094eee32f
> ("arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()")
> that was added in 6.1 and which removed the cache invalidation when
> setting up the non-cacheable mapping.
> 
> Marking the PCIe devices as coherent specifically fixes the intermittent
> NVMe probe failures observed on the Thinkpad X13s, which was due to
> corruption of the submission and completion queues. This was typically
> observed as corruption of the admin submission queue (with well-formed
> completion):
> 
> 	could not locate request for tag 0x0
> 	nvme nvme0: invalid id 0 completed on queue 0
> 
> or corruption of the admin or I/O completion queues (malformed
> completion):
> 
> 	could not locate request for tag 0x45f
> 	nvme nvme0: invalid id 25695 completed on queue 25965
> 
> presumably as these queues are small enough to not be allocated using
> CMA which in turn make them more likely to be cached (e.g. due to
> accesses to nearby pages through the cacheable linear map). Increasing
> the buffer sizes to two pages to force CMA allocation also appears to
> make the problem go away.
> 

I don't think the problem will go away if the allocation happens from the CMA
region. It may just decrease the chances of a cache hit, but one could always
happen due to the existence of the linear mapping with the cacheable attribute.

> Fixes: 813e83157001 ("arm64: dts: qcom: sc8280xp/sa8540p: add PCIe2-4 nodes")
> Signed-off-by: Johan Hovold <johan+linaro@kernel.org>

Anyway, this is a really good find!

Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>

Thanks,
Mani

  
Johan Hovold Nov. 25, 2022, 2:43 p.m. UTC | #3
On Fri, Nov 25, 2022 at 07:56:25PM +0530, Manivannan Sadhasivam wrote:
> On Thu, Nov 24, 2022 at 03:25:01PM +0100, Johan Hovold wrote:
> > The devices on the SC8280XP PCIe buses are cache coherent and must be
> > marked as such to avoid data corruption.
> > 
> > A coherent device can, for example, end up snooping stale data from the
> > caches instead of using data written by the CPU through the
> > non-cacheable mapping which is used for consistent DMA buffers for
> > non-coherent devices.
> > 
> 
> Also, the device may write into the L2 cache (or whatever cache that is
> accessible) if there is an entry and the CPU may invalidate it before reading
> from the DMA buffer. This will end up in a data loss.

I mentioned the above as an example, but clearly it can also affect the
other direction (e.g. as described below).

> > Note that this is much more likely to happen since commit c44094eee32f
> > ("arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()")
> > that was added in 6.1 and which removed the cache invalidation when
> > setting up the non-cacheable mapping.
> > 
> > Marking the PCIe devices as coherent specifically fixes the intermittent
> > NVMe probe failures observed on the Thinkpad X13s, which was due to
> > corruption of the submission and completion queues. This was typically
> > observed as corruption of the admin submission queue (with well-formed
> > completion):
> > 
> > 	could not locate request for tag 0x0
> > 	nvme nvme0: invalid id 0 completed on queue 0
> > 
> > or corruption of the admin or I/O completion queues (malformed
> > completion):
> > 
> > 	could not locate request for tag 0x45f
> > 	nvme nvme0: invalid id 25695 completed on queue 25965
> > 
> > presumably as these queues are small enough to not be allocated using
> > CMA which in turn make them more likely to be cached (e.g. due to
> > accesses to nearby pages through the cacheable linear map). Increasing
> > the buffer sizes to two pages to force CMA allocation also appears to
> > make the problem go away.
> > 
> 
> I don't think the problem will go away if the allocation happens from CMA
> region. It may just decrease the chances of cache hit but it could always
> happen due to the existence of linear mapping with cacheable attribute.

I never claimed it would fix the problem; I explicitly wrote that it
made it less likely to occur (to the point where my reproducer no longer
triggers).

Johan
  
Manivannan Sadhasivam Nov. 25, 2022, 2:53 p.m. UTC | #4
On Fri, Nov 25, 2022 at 03:43:59PM +0100, Johan Hovold wrote:
> On Fri, Nov 25, 2022 at 07:56:25PM +0530, Manivannan Sadhasivam wrote:
> > On Thu, Nov 24, 2022 at 03:25:01PM +0100, Johan Hovold wrote:
> > > The devices on the SC8280XP PCIe buses are cache coherent and must be
> > > marked as such to avoid data corruption.
> > > 
> > > A coherent device can, for example, end up snooping stale data from the
> > > caches instead of using data written by the CPU through the
> > > non-cacheable mapping which is used for consistent DMA buffers for
> > > non-coherent devices.
> > > 
> > 
> > Also, the device may write into the L2 cache (or whatever cache that is
> > accessible) if there is an entry and the CPU may invalidate it before reading
> > from the DMA buffer. This will end up in a data loss.
> 
> I mentioned the above as an example, but clearly it can affect also the
> other direction (e.g. as described below).
> 
> > > Note that this is much more likely to happen since commit c44094eee32f
> > > ("arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()")
> > > that was added in 6.1 and which removed the cache invalidation when
> > > setting up the non-cacheable mapping.
> > > 
> > > Marking the PCIe devices as coherent specifically fixes the intermittent
> > > NVMe probe failures observed on the Thinkpad X13s, which was due to
> > > corruption of the submission and completion queues. This was typically
> > > observed as corruption of the admin submission queue (with well-formed
> > > completion):
> > > 
> > > 	could not locate request for tag 0x0
> > > 	nvme nvme0: invalid id 0 completed on queue 0
> > > 
> > > or corruption of the admin or I/O completion queues (malformed
> > > completion):
> > > 
> > > 	could not locate request for tag 0x45f
> > > 	nvme nvme0: invalid id 25695 completed on queue 25965
> > > 
> > > presumably as these queues are small enough to not be allocated using
> > > CMA which in turn make them more likely to be cached (e.g. due to
> > > accesses to nearby pages through the cacheable linear map). Increasing
> > > the buffer sizes to two pages to force CMA allocation also appears to
> > > make the problem go away.
> > > 
> > 
> > I don't think the problem will go away if the allocation happens from CMA
> > region. It may just decrease the chances of cache hit but it could always
> > happen due to the existence of linear mapping with cacheable attribute.
> 
> I never claimed it would fix the problem, I explicitly wrote that it
> made it less likely to occur (to the point where my reproducer no longer
> triggers).
> 

> Increasing the buffer sizes to two pages to force CMA allocation also appears
> to make the problem go away.

The "go away" part sounded like a claim to me and hence I added the statement.
But no worries :)

Thanks,
Mani

> Johan
  
Johan Hovold Nov. 25, 2022, 3:49 p.m. UTC | #5
On Fri, Nov 25, 2022 at 08:23:36PM +0530, Manivannan Sadhasivam wrote:
> On Fri, Nov 25, 2022 at 03:43:59PM +0100, Johan Hovold wrote:
> > On Fri, Nov 25, 2022 at 07:56:25PM +0530, Manivannan Sadhasivam wrote:
> > > On Thu, Nov 24, 2022 at 03:25:01PM +0100, Johan Hovold wrote:

> > I never claimed it would fix the problem, I explicitly wrote that it
> > made it less likely to occur (to the point where my reproducer no longer
> > triggers).
> 
> > Increasing the buffer sizes to two pages to force CMA allocation also appears
> > to make the problem go away.
> 
> The "go away" part sounded like a claim to me and hence I added the statement.
> But no worries :)

Hopefully it's clear enough if you also read the preceding sentence
(with emphasis added):

  presumably as these queues are small enough to not be allocated using
  CMA which in turn make them *more likely to be cached* (e.g. due to
  accesses to nearby pages through the cacheable linear map).

Johan
  
Bjorn Andersson Dec. 2, 2022, 8:58 p.m. UTC | #6
On Thu, 24 Nov 2022 15:25:01 +0100, Johan Hovold wrote:
> The devices on the SC8280XP PCIe buses are cache coherent and must be
> marked as such to avoid data corruption.
> 
> A coherent device can, for example, end up snooping stale data from the
> caches instead of using data written by the CPU through the
> non-cacheable mapping which is used for consistent DMA buffers for
> non-coherent devices.
> 
> [...]

Applied, thanks!

[1/1] arm64: dts: qcom: sc8280xp: fix PCIe DMA coherency
      commit: 0922df8f52b88d5c718d0cfe10794ac44b95ac78

Best regards,
  

Patch

diff --git a/arch/arm64/boot/dts/qcom/sc8280xp.dtsi b/arch/arm64/boot/dts/qcom/sc8280xp.dtsi
index 27f5c2f82338..7748cd29276d 100644
--- a/arch/arm64/boot/dts/qcom/sc8280xp.dtsi
+++ b/arch/arm64/boot/dts/qcom/sc8280xp.dtsi
@@ -854,6 +854,8 @@  pcie4: pcie@1c00000 {
 				 <0x02000000 0x0 0x30300000 0x0 0x30300000 0x0 0x1d00000>;
 			bus-range = <0x00 0xff>;
 
+			dma-coherent;
+
 			linux,pci-domain = <6>;
 			num-lanes = <1>;
 
@@ -951,6 +953,8 @@  pcie3b: pcie@1c08000 {
 				 <0x02000000 0x0 0x32300000 0x0 0x32300000 0x0 0x1d00000>;
 			bus-range = <0x00 0xff>;
 
+			dma-coherent;
+
 			linux,pci-domain = <5>;
 			num-lanes = <2>;
 
@@ -1046,6 +1050,8 @@  pcie3a: pcie@1c10000 {
 				 <0x02000000 0x0 0x34300000 0x0 0x34300000 0x0 0x1d00000>;
 			bus-range = <0x00 0xff>;
 
+			dma-coherent;
+
 			linux,pci-domain = <4>;
 			num-lanes = <4>;
 
@@ -1144,6 +1150,8 @@  pcie2b: pcie@1c18000 {
 				 <0x02000000 0x0 0x38300000 0x0 0x38300000 0x0 0x1d00000>;
 			bus-range = <0x00 0xff>;
 
+			dma-coherent;
+
 			linux,pci-domain = <3>;
 			num-lanes = <2>;
 
@@ -1239,6 +1247,8 @@  pcie2a: pcie@1c20000 {
 				 <0x02000000 0x0 0x3c300000 0x0 0x3c300000 0x0 0x1d00000>;
 			bus-range = <0x00 0xff>;
 
+			dma-coherent;
+
 			linux,pci-domain = <2>;
 			num-lanes = <4>;
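
To check on a running system whether the property ended up in the live
device tree, a small helper along the following lines may be useful. It is
a sketch only: the helper name is hypothetical, and it relies on the kernel
exposing the tree under /sys/firmware/devicetree/base, where a boolean
property such as dma-coherent appears as an empty file inside the node's
directory. The exact parent path of the PCIe nodes (for instance
soc@0/pcie@1c00000) is an assumption and should be confirmed on the target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Return true if property `prop` exists in the device tree node whose
 * sysfs directory is `node_dir`, e.g. (assumed path)
 * "/sys/firmware/devicetree/base/soc@0/pcie@1c00000".
 */
static bool dt_node_has_property(const char *node_dir, const char *prop)
{
	char path[512];
	int n;

	assert(node_dir != NULL && prop != NULL);

	n = snprintf(path, sizeof(path), "%s/%s", node_dir, prop);
	if (n < 0 || (size_t)n >= sizeof(path))
		return false;	/* path did not fit: treat as absent */

	/* Only existence matters for a boolean property. */
	return access(path, F_OK) == 0;
}
```

Calling dt_node_has_property() with each of the five PCIe node directories
and "dma-coherent" should return true once a kernel built with this patch
is booted.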