[06/21] powerpc: dma-mapping: minimize for_cpu flushing

Message ID 20230327121317.4081816-7-arnd@kernel.org
State New
Series dma-mapping: unify support for cache flushes

Commit Message

Arnd Bergmann March 27, 2023, 12:13 p.m. UTC
  From: Arnd Bergmann <arnd@arndb.de>

The powerpc dma_sync_*_for_cpu() variants do more flushes than on other
architectures. Reduce it to what everyone else does:

 - No flush is needed after data has been sent to a device

 - When data has been received from a device, the cache only needs to
   be invalidated to clear out cache lines that were speculatively
   prefetched.

In particular, the second flushing of partial cache lines of bidirectional
buffers is actively harmful -- if a single cache line is written by both
the CPU and the device, flushing it again does not maintain coherency
but instead overwrites the data that was just received from the device.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/powerpc/mm/dma-noncoherent.c | 18 ++++--------------
 1 file changed, 4 insertions(+), 14 deletions(-)
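
The hazard described in the last paragraph of the commit message can be
reproduced with a toy model. The sketch below is not kernel code:
`struct toy_line`, the helper names and the 8-byte `LINE_BYTES` are all made
up for illustration, and it models one write-back cache line with no hardware
detail. It assumes the CPU dirties one byte of a line while the device writes
another byte of the same line to memory, then compares a flush (write back,
then invalidate) with a plain invalidate.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define LINE_BYTES 8	/* toy cache-line size, chosen for illustration only */

/* One line of memory plus the CPU's cached copy of it (hypothetical model). */
struct toy_line {
	unsigned char mem[LINE_BYTES];   /* what the device reads and writes */
	unsigned char cache[LINE_BYTES]; /* CPU's cached copy of the line */
	bool valid;                      /* line is present in the cache */
	bool dirty;                      /* cached copy holds CPU stores not yet in mem */
};

/* CPU load: on a miss, fill the whole line from memory first. */
static unsigned char cpu_read(struct toy_line *l, size_t off)
{
	assert(off < LINE_BYTES);
	if (!l->valid) {
		memcpy(l->cache, l->mem, LINE_BYTES);
		l->valid = true;
		l->dirty = false;
	}
	return l->cache[off];
}

/* CPU store: write-allocate, then modify the cached copy only. */
static void cpu_write(struct toy_line *l, size_t off, unsigned char v)
{
	(void)cpu_read(l, off);
	l->cache[off] = v;
	l->dirty = true;
}

/* Device DMA write: goes straight to memory and bypasses the cache. */
static void dev_write(struct toy_line *l, size_t off, unsigned char v)
{
	assert(off < LINE_BYTES);
	l->mem[off] = v;
}

/* Invalidate: drop the cached copy without writing it back. */
static void cache_inval(struct toy_line *l)
{
	l->valid = false;
	l->dirty = false;
}

/* Flush: write the whole line back if dirty, then invalidate it. */
static void cache_flush(struct toy_line *l)
{
	if (l->valid && l->dirty)
		memcpy(l->mem, l->cache, LINE_BYTES);
	cache_inval(l);
}

/*
 * The CPU dirties byte 0, the device writes byte 4 of the same line, then
 * the line is either flushed or invalidated. Returns what the CPU reads
 * back at offset `off` afterwards.
 */
unsigned char run_scenario(bool flush_after_dma, size_t off)
{
	struct toy_line l = { 0 };

	cpu_write(&l, 0, 0xAA);
	dev_write(&l, 4, 0x55);

	if (flush_after_dma)
		cache_flush(&l);
	else
		cache_inval(&l);

	return cpu_read(&l, off);
}
```

In this model `run_scenario(true, 4)` reads back 0x00, because the write-back
of the stale line clobbers the byte the device just wrote, while
`run_scenario(false, 4)` reads back 0x55. The invalidate case in turn drops
the CPU's own store to byte 0, so a line shared between CPU and device data
loses something with either operation.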
  

Comments

Christophe Leroy March 27, 2023, 12:56 p.m. UTC | #1
On 27/03/2023 at 14:13, Arnd Bergmann wrote:
> From: Arnd Bergmann <arnd@arndb.de>
> 
> The powerpc dma_sync_*_for_cpu() variants do more flushes than on other
> architectures. Reduce it to what everyone else does:
> 
>   - No flush is needed after data has been sent to a device
> 
>   - When data has been received from a device, the cache only needs to
>     be invalidated to clear out cache lines that were speculatively
>     prefetched.
> 
> In particular, the second flushing of partial cache lines of bidirectional
> buffers is actively harmful -- if a single cache line is written by both
> the CPU and the device, flushing it again does not maintain coherency
> but instead overwrites the data that was just received from the device.

Hmm... Who is right?

That behaviour was introduced by commit 03d70617b8a7 ("powerpc: Prevent
memory corruption due to cache invalidation of unaligned DMA buffer").

I think your commit log should explain why that commit was wrong, and
maybe say that your patch is a revert of that commit?

Christophe


> 
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
>   arch/powerpc/mm/dma-noncoherent.c | 18 ++++--------------
>   1 file changed, 4 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/powerpc/mm/dma-noncoherent.c b/arch/powerpc/mm/dma-noncoherent.c
> index f10869d27de5..e108cacf877f 100644
> --- a/arch/powerpc/mm/dma-noncoherent.c
> +++ b/arch/powerpc/mm/dma-noncoherent.c
> @@ -132,21 +132,11 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
>   	switch (direction) {
>   	case DMA_NONE:
>   		BUG();
> -	case DMA_FROM_DEVICE:
> -		/*
> -		 * invalidate only when cache-line aligned otherwise there is
> -		 * the potential for discarding uncommitted data from the cache
> -		 */
> -		if ((start | end) & (L1_CACHE_BYTES - 1))
> -			__dma_phys_op(start, end, DMA_CACHE_FLUSH);
> -		else
> -			__dma_phys_op(start, end, DMA_CACHE_INVAL);
> -		break;
> -	case DMA_TO_DEVICE:		/* writeback only */
> -		__dma_phys_op(start, end, DMA_CACHE_CLEAN);
> +	case DMA_TO_DEVICE:
>   		break;
> -	case DMA_BIDIRECTIONAL:	/* writeback and invalidate */
> -		__dma_phys_op(start, end, DMA_CACHE_FLUSH);
> +	case DMA_FROM_DEVICE:
> +	case DMA_BIDIRECTIONAL:
> +		__dma_phys_op(start, end, DMA_CACHE_INVAL);
>   		break;
>   	}
>   }
  
Arnd Bergmann March 27, 2023, 1:02 p.m. UTC | #2
On Mon, Mar 27, 2023, at 14:56, Christophe Leroy wrote:
> On 27/03/2023 at 14:13, Arnd Bergmann wrote:
>> From: Arnd Bergmann <arnd@arndb.de>
>> 
>> The powerpc dma_sync_*_for_cpu() variants do more flushes than on other
>> architectures. Reduce it to what everyone else does:
>> 
>>   - No flush is needed after data has been sent to a device
>> 
>>   - When data has been received from a device, the cache only needs to
>>     be invalidated to clear out cache lines that were speculatively
>>     prefetched.
>> 
>> In particular, the second flushing of partial cache lines of bidirectional
>> buffers is actively harmful -- if a single cache line is written by both
>> the CPU and the device, flushing it again does not maintain coherency
>> but instead overwrites the data that was just received from the device.
>
> Hmm... Who is right?
>
> That behaviour was introduced by commit 03d70617b8a7 ("powerpc: Prevent
> memory corruption due to cache invalidation of unaligned DMA buffer").
>
> I think your commit log should explain why that commit was wrong, and
> maybe say that your patch is a revert of that commit?

Ok, I'll try to explain this better. To clarify: the __dma_sync()
function in commit 03d70617b8a7 is used both before and after a DMA.
My patch 05/21 splits it in two, and patch 06/21 changes only the part
that gets called after a DMA-from-device. The part that gets called
before a DMA-from-device, which is what Andrew's patch addressed, is
left unchanged.

As I mentioned in the cover letter, it is still unclear whether
we want to consider this the expected behavior as the documentation
seems unclear, but my series does not attempt to answer that
question.

     Arnd
  

Patch

diff --git a/arch/powerpc/mm/dma-noncoherent.c b/arch/powerpc/mm/dma-noncoherent.c
index f10869d27de5..e108cacf877f 100644
--- a/arch/powerpc/mm/dma-noncoherent.c
+++ b/arch/powerpc/mm/dma-noncoherent.c
@@ -132,21 +132,11 @@  void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
 	switch (direction) {
 	case DMA_NONE:
 		BUG();
-	case DMA_FROM_DEVICE:
-		/*
-		 * invalidate only when cache-line aligned otherwise there is
-		 * the potential for discarding uncommitted data from the cache
-		 */
-		if ((start | end) & (L1_CACHE_BYTES - 1))
-			__dma_phys_op(start, end, DMA_CACHE_FLUSH);
-		else
-			__dma_phys_op(start, end, DMA_CACHE_INVAL);
-		break;
-	case DMA_TO_DEVICE:		/* writeback only */
-		__dma_phys_op(start, end, DMA_CACHE_CLEAN);
+	case DMA_TO_DEVICE:
 		break;
-	case DMA_BIDIRECTIONAL:	/* writeback and invalidate */
-		__dma_phys_op(start, end, DMA_CACHE_FLUSH);
+	case DMA_FROM_DEVICE:
+	case DMA_BIDIRECTIONAL:
+		__dma_phys_op(start, end, DMA_CACHE_INVAL);
 		break;
 	}
 }
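
To compare the two versions of the switch outside the kernel, the sketch
below restates them as pure functions. It is an illustration under
assumptions: the enums are local stand-ins rather than the kernel's
definitions, `OP_NONE` and the function names are hypothetical,
`L1_CACHE_BYTES` is fixed at 32 only for the example, and `end` is taken to
be the exclusive end address `start + size`. The `DMA_NONE` case, which hits
`BUG()` in the patch, is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed line size for the example; the real value depends on the CPU. */
#define L1_CACHE_BYTES 32u

/* Local stand-ins mirroring the names used in the diff. */
enum dma_dir { DMA_BIDIRECTIONAL, DMA_TO_DEVICE, DMA_FROM_DEVICE };

/* OP_NONE is hypothetical: it stands for "no cache maintenance at all". */
enum cache_op { OP_NONE, OP_CLEAN, OP_INVAL, OP_FLUSH };

/* Nonzero if either end of [start, end) is off a cache-line boundary. */
static int is_partial(uintptr_t start, uintptr_t end)
{
	return ((start | end) & (L1_CACHE_BYTES - 1)) != 0;
}

/* Operation chosen by the for_cpu switch as it looked before this patch. */
enum cache_op for_cpu_op_before(enum dma_dir dir, uintptr_t start,
				uintptr_t end)
{
	assert(start <= end);
	switch (dir) {
	case DMA_FROM_DEVICE:
		/* unaligned buffers were flushed instead of invalidated */
		return is_partial(start, end) ? OP_FLUSH : OP_INVAL;
	case DMA_TO_DEVICE:
		return OP_CLEAN;
	case DMA_BIDIRECTIONAL:
		return OP_FLUSH;
	}
	return OP_NONE;
}

/* Operation chosen by the for_cpu switch with this patch applied. */
enum cache_op for_cpu_op_after(enum dma_dir dir)
{
	switch (dir) {
	case DMA_TO_DEVICE:
		return OP_NONE;		/* nothing to do after sending data */
	case DMA_FROM_DEVICE:
	case DMA_BIDIRECTIONAL:
		return OP_INVAL;	/* only drop possibly prefetched lines */
	}
	return OP_NONE;
}
```

With these stand-ins, a `DMA_FROM_DEVICE` buffer at `[0x1004, 0x1040)` selects
`OP_FLUSH` in the old switch and `OP_INVAL` in the new one, which is the
behaviour change the thread is discussing.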