Message ID | 20221114110329.68413-1-manivannan.sadhasivam@linaro.org |
---|---|
State | New |
Headers | From: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> To: catalin.marinas@arm.com, will@kernel.org Cc: robin.murphy@arm.com, amit.pundir@linaro.org, andersson@kernel.org, quic_sibis@quicinc.com, sumit.semwal@linaro.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Subject: [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()" Date: Mon, 14 Nov 2022 16:33:29 +0530 Message-Id: <20221114110329.68413-1-manivannan.sadhasivam@linaro.org> X-Mailer: git-send-email 2.25.1 |
Series | Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()" |
Commit Message
Manivannan Sadhasivam
Nov. 14, 2022, 11:03 a.m. UTC
This reverts commit c44094eee32f32f175aadc0efcac449d99b1bbf7.

As reported by Amit [1], dropping cache invalidation from
arch_dma_prep_coherent() triggers a crash on the Qualcomm SM8250 platform
(most probably on other Qcom platforms too). The reason is, Qcom
qcom_q6v5_mss driver copies the firmware metadata and shares it with modem
for validation. The modem has a secure block (XPU) that will trigger a
whole system crash if the shared memory is accessed by the CPU while modem
is poking at it.

To avoid this issue, the qcom_q6v5_mss driver allocates a chunk of memory
with no kernel mapping, vmap's it, copies the firmware metadata and
unvmap's it. Finally the address is then shared with modem for metadata
validation [2].

Now because of the removal of cache invalidation from
arch_dma_prep_coherent(), there will be cache lines associated with this
memory even after sharing with modem. So when the CPU accesses it, the XPU
violation gets triggered.

So let's revert this commit to get remoteproc's working (thereby avoiding
full system crash) on Qcom platforms.

[1] https://lore.kernel.org/linux-arm-kernel/CAMi1Hd1VBCFhf7+EXWHQWcGy4k=tcyLa7RGiFdprtRnegSG0Mw@mail.gmail.com/
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/remoteproc/qcom_q6v5_mss.c#n933

Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
---

Will, Catalin: Please share if you have any other suggestions to handle the
resource sharing in the remoteproc driver that could avoid this revert.

 arch/arm64/mm/dma-mapping.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
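
The patch changes a single line in arch_dma_prep_coherent(): a clean (`dcache_clean_poc()`) becomes a clean plus invalidate (`dcache_clean_inval_poc()`) again. As a rough illustration of the difference the commit message relies on, the userspace sketch below models one cache line. It is a toy under stated assumptions, not arm64 code: `struct toy_line`, `toy_cpu_write()`, `toy_clean()` and `toy_clean_inval()` are hypothetical names invented for this example, and a real cache tracks far more state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy model of one cache line plus its backing memory (not arm64 code). */
struct toy_line {
	bool valid;  /* line is present in the cache */
	bool dirty;  /* cached copy is newer than memory */
	int cached;  /* value held in the cache */
	int memory;  /* value held in backing memory */
};

/* A CPU store allocates the line in the cache and marks it dirty. */
void toy_cpu_write(struct toy_line *l, int value)
{
	l->valid = true;
	l->dirty = true;
	l->cached = value;
}

/* Clean: write dirty data back but keep the line (models dcache_clean_poc()). */
void toy_clean(struct toy_line *l)
{
	if (l->valid && l->dirty) {
		l->memory = l->cached;
		l->dirty = false;
	}
}

/* Clean + invalidate: write back, then drop the line
 * (models dcache_clean_inval_poc()). */
void toy_clean_inval(struct toy_line *l)
{
	toy_clean(l);
	l->valid = false;
}

#ifdef TOY_DEMO_MAIN /* hypothetical switch: build with -DTOY_DEMO_MAIN */
int main(void)
{
	struct toy_line a = {0}, b = {0};

	toy_cpu_write(&a, 42);
	toy_cpu_write(&b, 42);
	toy_clean(&a);
	toy_clean_inval(&b);

	assert(a.memory == 42 && b.memory == 42);
	printf("clean:       line still cached = %d\n", a.valid);
	printf("clean+inval: line still cached = %d\n", b.valid);
	return 0;
}
#endif
```

In this model both operations get the data to memory, but only clean plus invalidate leaves no line behind in the cache. Whether leftover lines are really what trips the XPU is exactly what the replies below question.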
Comments
On Mon, Nov 14, 2022 at 04:33:29PM +0530, Manivannan Sadhasivam wrote:
> This reverts commit c44094eee32f32f175aadc0efcac449d99b1bbf7.
>
> As reported by Amit [1], dropping cache invalidation from
> arch_dma_prep_coherent() triggers a crash on the Qualcomm SM8250 platform

s/SM8250/SDM845/g

Sorry for the confusion.

Thanks,
Mani

> (most probably on other Qcom platforms too). The reason is, Qcom
> qcom_q6v5_mss driver copies the firmware metadata and shares it with modem
> for validation. The modem has a secure block (XPU) that will trigger a
> whole system crash if the shared memory is accessed by the CPU while modem
> is poking at it.
>
> To avoid this issue, the qcom_q6v5_mss driver allocates a chunk of memory
> with no kernel mapping, vmap's it, copies the firmware metadata and
> unvmap's it. Finally the address is then shared with modem for metadata
> validation [2].
>
> Now because of the removal of cache invalidation from
> arch_dma_prep_coherent(), there will be cache lines associated with this
> memory even after sharing with modem. So when the CPU accesses it, the XPU
> violation gets triggered.
>
> So let's revert this commit to get remoteproc's working (thereby avoiding
> full system crash) on Qcom platforms.
>
> [1] https://lore.kernel.org/linux-arm-kernel/CAMi1Hd1VBCFhf7+EXWHQWcGy4k=tcyLa7RGiFdprtRnegSG0Mw@mail.gmail.com/
> [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/remoteproc/qcom_q6v5_mss.c#n933
>
> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> ---
>
> Will, Catalin: Please share if you have any other suggestions to handle the
> resource sharing in the remoteproc driver that could avoid this revert.
>
>  arch/arm64/mm/dma-mapping.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
> index 3cb101e8cb29..7d7e9a046305 100644
> --- a/arch/arm64/mm/dma-mapping.c
> +++ b/arch/arm64/mm/dma-mapping.c
> @@ -36,7 +36,7 @@ void arch_dma_prep_coherent(struct page *page, size_t size)
>  {
>  	unsigned long start = (unsigned long)page_address(page);
>
> -	dcache_clean_poc(start, start + size);
> +	dcache_clean_inval_poc(start, start + size);
>  }
>
>  #ifdef CONFIG_IOMMU_DMA
> --
> 2.25.1
>
On Mon, Nov 14, 2022 at 04:33:29PM +0530, Manivannan Sadhasivam wrote:
> This reverts commit c44094eee32f32f175aadc0efcac449d99b1bbf7.
>
> As reported by Amit [1], dropping cache invalidation from
> arch_dma_prep_coherent() triggers a crash on the Qualcomm SM8250 platform
> (most probably on other Qcom platforms too). The reason is, Qcom
> qcom_q6v5_mss driver copies the firmware metadata and shares it with modem
> for validation. The modem has a secure block (XPU) that will trigger a
> whole system crash if the shared memory is accessed by the CPU while modem
> is poking at it.
>
> To avoid this issue, the qcom_q6v5_mss driver allocates a chunk of memory
> with no kernel mapping, vmap's it, copies the firmware metadata and
> unvmap's it. Finally the address is then shared with modem for metadata
> validation [2].
>
> Now because of the removal of cache invalidation from
> arch_dma_prep_coherent(), there will be cache lines associated with this
> memory even after sharing with modem. So when the CPU accesses it, the XPU
> violation gets triggered.

This last part is a non-sequitur: the buffer is no longer mapped on the CPU
side, so how would the CPU access it?

As I just replied to Amit, we need more information about what this
"access" is and how it is being detected.

Will
On 2022-11-14 14:11, Will Deacon wrote:
> On Mon, Nov 14, 2022 at 04:33:29PM +0530, Manivannan Sadhasivam wrote:
>> This reverts commit c44094eee32f32f175aadc0efcac449d99b1bbf7.
>>
>> As reported by Amit [1], dropping cache invalidation from
>> arch_dma_prep_coherent() triggers a crash on the Qualcomm SM8250 platform
>> (most probably on other Qcom platforms too). The reason is, Qcom
>> qcom_q6v5_mss driver copies the firmware metadata and shares it with modem
>> for validation. The modem has a secure block (XPU) that will trigger a
>> whole system crash if the shared memory is accessed by the CPU while modem
>> is poking at it.
>>
>> To avoid this issue, the qcom_q6v5_mss driver allocates a chunk of memory
>> with no kernel mapping, vmap's it, copies the firmware metadata and
>> unvmap's it. Finally the address is then shared with modem for metadata
>> validation [2].
>>
>> Now because of the removal of cache invalidation from
>> arch_dma_prep_coherent(), there will be cache lines associated with this
>> memory even after sharing with modem. So when the CPU accesses it, the XPU
>> violation gets triggered.
>
> This last part is a non-sequitur: the buffer is no longer mapped on the CPU
> side, so how would the CPU access it?

Right, for the previous change to have made a difference the offending part
of this buffer must be present in some cache somewhere *before* the DMA
buffer allocation completes.

Clearly that driver is completely broken though. If the DMA allocation came
from a no-map carveout via dma_alloc_from_dev_coherent() then the vmap()
shenanigans wouldn't work, so if it's backed by struct pages then the whole
dance is still pointless because *a cacheable linear mapping exists*, and
it's just relying on the reduced chance that anything's going to re-fetch
the linear map address after those pages have been allocated, exactly as I
called out previously[1].

Robin.

[1] https://lore.kernel.org/linux-arm-kernel/97fface8-e40e-072c-4335-c94094884e93@arm.com/

> As I just replied to Amit, we need more information about what this
> "access" is and how it is being detected.
>
> Will
On Mon, Nov 14, 2022 at 03:14:21PM +0000, Robin Murphy wrote:
> On 2022-11-14 14:11, Will Deacon wrote:
> > On Mon, Nov 14, 2022 at 04:33:29PM +0530, Manivannan Sadhasivam wrote:
> > > This reverts commit c44094eee32f32f175aadc0efcac449d99b1bbf7.
> > >
> > > As reported by Amit [1], dropping cache invalidation from
> > > arch_dma_prep_coherent() triggers a crash on the Qualcomm SM8250 platform
> > > (most probably on other Qcom platforms too). The reason is, Qcom
> > > qcom_q6v5_mss driver copies the firmware metadata and shares it with modem
> > > for validation. The modem has a secure block (XPU) that will trigger a
> > > whole system crash if the shared memory is accessed by the CPU while modem
> > > is poking at it.
> > >
> > > To avoid this issue, the qcom_q6v5_mss driver allocates a chunk of memory
> > > with no kernel mapping, vmap's it, copies the firmware metadata and
> > > unvmap's it. Finally the address is then shared with modem for metadata
> > > validation [2].
> > >
> > > Now because of the removal of cache invalidation from
> > > arch_dma_prep_coherent(), there will be cache lines associated with this
> > > memory even after sharing with modem. So when the CPU accesses it, the XPU
> > > violation gets triggered.
> >
> > This last part is a non-sequitur: the buffer is no longer mapped on the CPU
> > side, so how would the CPU access it?
>
> Right, for the previous change to have made a difference the offending part
> of this buffer must be present in some cache somewhere *before* the DMA
> buffer allocation completes.
>
> Clearly that driver is completely broken though. If the DMA allocation came
> from a no-map carveout via dma_alloc_from_dev_coherent() then the vmap()
> shenanigans wouldn't work, so if it's backed by struct pages then the whole
> dance is still pointless because *a cacheable linear mapping exists*, and
> it's just relying on the reduced chance that anything's going to re-fetch
> the linear map address after those pages have been allocated, exactly as I
> called out previously[1].

So I guess a DMA pool that's not mapped in the linear map, together with
memremap() instead of vmap(), would work around the issue. But the
driver needs fixing, not the arch code.
On Mon, Nov 14, 2022 at 05:38:00PM +0000, Catalin Marinas wrote:
> On Mon, Nov 14, 2022 at 03:14:21PM +0000, Robin Murphy wrote:
> > On 2022-11-14 14:11, Will Deacon wrote:
> > > On Mon, Nov 14, 2022 at 04:33:29PM +0530, Manivannan Sadhasivam wrote:
> > > > This reverts commit c44094eee32f32f175aadc0efcac449d99b1bbf7.
> > > >
> > > > As reported by Amit [1], dropping cache invalidation from
> > > > arch_dma_prep_coherent() triggers a crash on the Qualcomm SM8250 platform
> > > > (most probably on other Qcom platforms too). The reason is, Qcom
> > > > qcom_q6v5_mss driver copies the firmware metadata and shares it with modem
> > > > for validation. The modem has a secure block (XPU) that will trigger a
> > > > whole system crash if the shared memory is accessed by the CPU while modem
> > > > is poking at it.
> > > >
> > > > To avoid this issue, the qcom_q6v5_mss driver allocates a chunk of memory
> > > > with no kernel mapping, vmap's it, copies the firmware metadata and
> > > > unvmap's it. Finally the address is then shared with modem for metadata
> > > > validation [2].
> > > >
> > > > Now because of the removal of cache invalidation from
> > > > arch_dma_prep_coherent(), there will be cache lines associated with this
> > > > memory even after sharing with modem. So when the CPU accesses it, the XPU
> > > > violation gets triggered.
> > >
> > > This last part is a non-sequitur: the buffer is no longer mapped on the CPU
> > > side, so how would the CPU access it?
> >
> > Right, for the previous change to have made a difference the offending part
> > of this buffer must be present in some cache somewhere *before* the DMA
> > buffer allocation completes.
> >
> > Clearly that driver is completely broken though. If the DMA allocation came
> > from a no-map carveout via dma_alloc_from_dev_coherent() then the vmap()
> > shenanigans wouldn't work, so if it's backed by struct pages then the whole
> > dance is still pointless because *a cacheable linear mapping exists*, and
> > it's just relying on the reduced chance that anything's going to re-fetch
> > the linear map address after those pages have been allocated, exactly as I
> > called out previously[1].
>
> So I guess a DMA pool that's not mapped in the linear map, together with
> memremap() instead of vmap(), would work around the issue. But the
> driver needs fixing, not the arch code.
>

Okay, thanks for the hint. Can you share how to allocate the dma-pool that's
not part of the kernel's linear map? I looked into it but couldn't find a way.

Thanks,
Mani

> --
> Catalin
On Fri, Nov 18, 2022 at 04:24:02PM +0530, Manivannan Sadhasivam wrote:
> On Mon, Nov 14, 2022 at 05:38:00PM +0000, Catalin Marinas wrote:
> > On Mon, Nov 14, 2022 at 03:14:21PM +0000, Robin Murphy wrote:
> > > On 2022-11-14 14:11, Will Deacon wrote:
> > > > On Mon, Nov 14, 2022 at 04:33:29PM +0530, Manivannan Sadhasivam wrote:
> > > > > This reverts commit c44094eee32f32f175aadc0efcac449d99b1bbf7.
> > > > >
> > > > > As reported by Amit [1], dropping cache invalidation from
> > > > > arch_dma_prep_coherent() triggers a crash on the Qualcomm SM8250 platform
> > > > > (most probably on other Qcom platforms too). The reason is, Qcom
> > > > > qcom_q6v5_mss driver copies the firmware metadata and shares it with modem
> > > > > for validation. The modem has a secure block (XPU) that will trigger a
> > > > > whole system crash if the shared memory is accessed by the CPU while modem
> > > > > is poking at it.
> > > > >
> > > > > To avoid this issue, the qcom_q6v5_mss driver allocates a chunk of memory
> > > > > with no kernel mapping, vmap's it, copies the firmware metadata and
> > > > > unvmap's it. Finally the address is then shared with modem for metadata
> > > > > validation [2].
> > > > >
> > > > > Now because of the removal of cache invalidation from
> > > > > arch_dma_prep_coherent(), there will be cache lines associated with this
> > > > > memory even after sharing with modem. So when the CPU accesses it, the XPU
> > > > > violation gets triggered.
> > > >
> > > > This last part is a non-sequitur: the buffer is no longer mapped on the CPU
> > > > side, so how would the CPU access it?
> > >
> > > Right, for the previous change to have made a difference the offending part
> > > of this buffer must be present in some cache somewhere *before* the DMA
> > > buffer allocation completes.
> > >
> > > Clearly that driver is completely broken though. If the DMA allocation came
> > > from a no-map carveout via dma_alloc_from_dev_coherent() then the vmap()
> > > shenanigans wouldn't work, so if it's backed by struct pages then the whole
> > > dance is still pointless because *a cacheable linear mapping exists*, and
> > > it's just relying on the reduced chance that anything's going to re-fetch
> > > the linear map address after those pages have been allocated, exactly as I
> > > called out previously[1].
> >
> > So I guess a DMA pool that's not mapped in the linear map, together with
> > memremap() instead of vmap(), would work around the issue. But the
> > driver needs fixing, not the arch code.
> >
>
> Okay, thanks for the hint. Can you share how to allocate the dma-pool that's
> not part of the kernel's linear map? I looked into it but couldn't find a way.

The no-map property should take care of this iirc

Will
On Fri, Nov 18, 2022 at 12:33:49PM +0000, Will Deacon wrote:
> On Fri, Nov 18, 2022 at 04:24:02PM +0530, Manivannan Sadhasivam wrote:
> > On Mon, Nov 14, 2022 at 05:38:00PM +0000, Catalin Marinas wrote:
> > > On Mon, Nov 14, 2022 at 03:14:21PM +0000, Robin Murphy wrote:
> > > > On 2022-11-14 14:11, Will Deacon wrote:
> > > > > On Mon, Nov 14, 2022 at 04:33:29PM +0530, Manivannan Sadhasivam wrote:
> > > > > > This reverts commit c44094eee32f32f175aadc0efcac449d99b1bbf7.
> > > > > >
> > > > > > As reported by Amit [1], dropping cache invalidation from
> > > > > > arch_dma_prep_coherent() triggers a crash on the Qualcomm SM8250 platform
> > > > > > (most probably on other Qcom platforms too). The reason is, Qcom
> > > > > > qcom_q6v5_mss driver copies the firmware metadata and shares it with modem
> > > > > > for validation. The modem has a secure block (XPU) that will trigger a
> > > > > > whole system crash if the shared memory is accessed by the CPU while modem
> > > > > > is poking at it.
> > > > > >
> > > > > > To avoid this issue, the qcom_q6v5_mss driver allocates a chunk of memory
> > > > > > with no kernel mapping, vmap's it, copies the firmware metadata and
> > > > > > unvmap's it. Finally the address is then shared with modem for metadata
> > > > > > validation [2].
> > > > > >
> > > > > > Now because of the removal of cache invalidation from
> > > > > > arch_dma_prep_coherent(), there will be cache lines associated with this
> > > > > > memory even after sharing with modem. So when the CPU accesses it, the XPU
> > > > > > violation gets triggered.
> > > > >
> > > > > This last part is a non-sequitur: the buffer is no longer mapped on the CPU
> > > > > side, so how would the CPU access it?
> > > >
> > > > Right, for the previous change to have made a difference the offending part
> > > > of this buffer must be present in some cache somewhere *before* the DMA
> > > > buffer allocation completes.
> > > >
> > > > Clearly that driver is completely broken though. If the DMA allocation came
> > > > from a no-map carveout via dma_alloc_from_dev_coherent() then the vmap()
> > > > shenanigans wouldn't work, so if it's backed by struct pages then the whole
> > > > dance is still pointless because *a cacheable linear mapping exists*, and
> > > > it's just relying on the reduced chance that anything's going to re-fetch
> > > > the linear map address after those pages have been allocated, exactly as I
> > > > called out previously[1].
> > >
> > > So I guess a DMA pool that's not mapped in the linear map, together with
> > > memremap() instead of vmap(), would work around the issue. But the
> > > driver needs fixing, not the arch code.
> > >
> >
> > Okay, thanks for the hint. Can you share how to allocate the dma-pool that's
> > not part of the kernel's linear map? I looked into it but couldn't find a way.
>
> The no-map property should take care of this iirc
>

Yeah, we have been using it in other places of the same driver. But as per
Sibi, we used dynamic allocation for metadata validation since there was no
memory reserved statically for that.

But if we do not have a way to allocate a dynamic memory that is not part of
kernel's linear map, then we may have to resort to using an existing reserved
memory.

Thanks,
Mani

> Will
On 11/21/22 12:12, Manivannan Sadhasivam wrote:
> On Fri, Nov 18, 2022 at 12:33:49PM +0000, Will Deacon wrote:
>> On Fri, Nov 18, 2022 at 04:24:02PM +0530, Manivannan Sadhasivam wrote:
>>> On Mon, Nov 14, 2022 at 05:38:00PM +0000, Catalin Marinas wrote:
>>>> On Mon, Nov 14, 2022 at 03:14:21PM +0000, Robin Murphy wrote:
>>>>> On 2022-11-14 14:11, Will Deacon wrote:
>>>>>> On Mon, Nov 14, 2022 at 04:33:29PM +0530, Manivannan Sadhasivam wrote:
>>>>>>> This reverts commit c44094eee32f32f175aadc0efcac449d99b1bbf7.
>>>>>>>
>>>>>>> As reported by Amit [1], dropping cache invalidation from
>>>>>>> arch_dma_prep_coherent() triggers a crash on the Qualcomm SM8250 platform
>>>>>>> (most probably on other Qcom platforms too). The reason is, Qcom
>>>>>>> qcom_q6v5_mss driver copies the firmware metadata and shares it with modem
>>>>>>> for validation. The modem has a secure block (XPU) that will trigger a
>>>>>>> whole system crash if the shared memory is accessed by the CPU while modem
>>>>>>> is poking at it.
>>>>>>>
>>>>>>> To avoid this issue, the qcom_q6v5_mss driver allocates a chunk of memory
>>>>>>> with no kernel mapping, vmap's it, copies the firmware metadata and
>>>>>>> unvmap's it. Finally the address is then shared with modem for metadata
>>>>>>> validation [2].
>>>>>>>
>>>>>>> Now because of the removal of cache invalidation from
>>>>>>> arch_dma_prep_coherent(), there will be cache lines associated with this
>>>>>>> memory even after sharing with modem. So when the CPU accesses it, the XPU
>>>>>>> violation gets triggered.
>>>>>>
>>>>>> This last part is a non-sequitur: the buffer is no longer mapped on the CPU
>>>>>> side, so how would the CPU access it?
>>>>>
>>>>> Right, for the previous change to have made a difference the offending part
>>>>> of this buffer must be present in some cache somewhere *before* the DMA
>>>>> buffer allocation completes.
>>>>>
>>>>> Clearly that driver is completely broken though. If the DMA allocation came
>>>>> from a no-map carveout via dma_alloc_from_dev_coherent() then the vmap()
>>>>> shenanigans wouldn't work, so if it's backed by struct pages then the whole
>>>>> dance is still pointless because *a cacheable linear mapping exists*, and
>>>>> it's just relying on the reduced chance that anything's going to re-fetch
>>>>> the linear map address after those pages have been allocated, exactly as I
>>>>> called out previously[1].
>>>>
>>>> So I guess a DMA pool that's not mapped in the linear map, together with
>>>> memremap() instead of vmap(), would work around the issue. But the
>>>> driver needs fixing, not the arch code.
>>>>
>>>
>>> Okay, thanks for the hint. Can you share how to allocate the dma-pool that's
>>> not part of the kernel's linear map? I looked into it but couldn't find a way.
>>
>> The no-map property should take care of this iirc
>>
>
> Yeah, we have been using it in other places of the same driver. But as per
> Sibi, we used dynamic allocation for metadata validation since there was no
> memory reserved statically for that.

Will,

Unlike the other portions in the driver that required statically defined
no-map carveouts, metadata just needed a contiguous memory for
authentication. Re-using existing carveouts for this metadata region
may not work due to modem FW limitations and declaring a new carveout for
metadata will break the device tree bindings. That's the reason for
using DMA_ATTR_NO_KERNEL_MAPPING for dma_alloc_attrs() and vmap/vunmap with
VM_FLUSH_RESET_PERMS before passing the memory onto modem. Are there other
suggestions for achieving the same without breaking bindings?

- Sibi

>
> But if we do not have a way to allocate a dynamic memory that is not part of
> kernel's linear map, then we may have to resort to using an existing reserved
> memory.
>
> Thanks,
> Mani
>
>> Will
>
On Mon, Nov 21, 2022 at 03:42:27PM +0530, Sibi Sankar wrote:
> On 11/21/22 12:12, Manivannan Sadhasivam wrote:
> > On Fri, Nov 18, 2022 at 12:33:49PM +0000, Will Deacon wrote:
> > > On Fri, Nov 18, 2022 at 04:24:02PM +0530, Manivannan Sadhasivam wrote:
> > > > On Mon, Nov 14, 2022 at 05:38:00PM +0000, Catalin Marinas wrote:
> > > > > On Mon, Nov 14, 2022 at 03:14:21PM +0000, Robin Murphy wrote:
> > > > > > Clearly that driver is completely broken though. If the DMA allocation came
> > > > > > from a no-map carveout via dma_alloc_from_dev_coherent() then the vmap()
> > > > > > shenanigans wouldn't work, so if it's backed by struct pages then the whole
> > > > > > dance is still pointless because *a cacheable linear mapping exists*, and
> > > > > > it's just relying on the reduced chance that anything's going to re-fetch
> > > > > > the linear map address after those pages have been allocated, exactly as I
> > > > > > called out previously[1].
> > > > >
> > > > > So I guess a DMA pool that's not mapped in the linear map, together with
> > > > > memremap() instead of vmap(), would work around the issue. But the
> > > > > driver needs fixing, not the arch code.
> > > >
> > > > Okay, thanks for the hint. Can you share how to allocate the dma-pool that's
> > > > not part of the kernel's linear map? I looked into it but couldn't find a way.
> > >
> > > The no-map property should take care of this iirc
> >
> > Yeah, we have been using it in other places of the same driver. But as per
> > Sibi, we used dynamic allocation for metadata validation since there was no
> > memory reserved statically for that.
>
> Unlike the other portions in the driver that required statically defined
> no-map carveouts, metadata just needed a contiguous memory for
> authentication. Re-using existing carveouts for this metadata region
> may not work due to modem FW limitations and declaring a new carveout for
> metadata will break the device tree bindings. That's the reason for
> using DMA_ATTR_NO_KERNEL_MAPPING for dma_alloc_attrs() and vmap/vunmap with
> VM_FLUSH_RESET_PERMS before passing the memory onto modem. Are there other
> suggestions for achieving the same without breaking bindings?

Your DMA_ATTR_NO_KERNEL_MAPPING workaround doesn't work, it only makes
the failure rate smaller. All this attribute does is avoid creating a
non-cacheable mapping but you still have the kernel linear mapping in
place that may be speculatively accessed by the CPU. You were just lucky
so far not to have hit the issue.

So I'd rather see this fixed properly with a no-map carveout. Maybe you
can reuse an existing carveout if the driver already needs some and
avoid changing the DT.

More complicated options include allocating memory and unmapping it from
the linear map with set_memory_valid(), though that's not exported to
modules and it also requires the linear map to be pages only, not block
mappings.

Yet another option is to have the swiotlb buffer unmapped from the
kernel linear map and use the bounce buffer for this. That's more
involved (Robin has some patches, though for a different reason and they
may not make it upstream).
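
Catalin's argument can be read as a small table of which CPU mappings exist for each way of obtaining the buffer. The userspace sketch below encodes that table as described in this thread. It is only a toy under stated assumptions: the enum, struct and function names are hypothetical, and it reduces "a mapping exists" to a boolean, which is not how the kernel tracks any of this.

```c
#include <assert.h>
#include <stdbool.h>

/* How the (hypothetical) metadata buffer was obtained. */
enum toy_alloc {
	TOY_DMA_DEFAULT,           /* page-backed, plus a non-cacheable remap */
	TOY_DMA_NO_KERNEL_MAPPING, /* page-backed, remap skipped by the attribute */
	TOY_NO_MAP_CARVEOUT,       /* reserved memory marked no-map */
};

/* CPU mappings left standing after the allocation, per the thread. */
struct toy_mappings {
	bool linear_cacheable;   /* alias in the kernel linear map */
	bool remap_noncacheable; /* extra mapping created by the DMA allocator */
};

struct toy_mappings toy_mappings_for(enum toy_alloc how)
{
	struct toy_mappings m = { false, false };

	switch (how) {
	case TOY_DMA_DEFAULT:
		m.linear_cacheable = true;
		m.remap_noncacheable = true;
		break;
	case TOY_DMA_NO_KERNEL_MAPPING:
		/* The attribute only skips the extra remap. */
		m.linear_cacheable = true;
		break;
	case TOY_NO_MAP_CARVEOUT:
		/* No standing alias; a driver would map it explicitly
		 * (e.g. with memremap()) only while copying. */
		break;
	}
	return m;
}

/* A speculative CPU fetch is possible while a cacheable alias exists. */
bool toy_speculation_possible(enum toy_alloc how)
{
	return toy_mappings_for(how).linear_cacheable;
}
```

In this model `TOY_DMA_NO_KERNEL_MAPPING` still reports a cacheable linear alias, which is why the workaround only lowers the failure rate. Only the no-map carveout removes the alias.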
Linux regression tracking (Thorsten Leemhuis)
Nov. 28, 2022, 5:44 a.m. UTC | #10
On 14.11.22 12:03, Manivannan Sadhasivam wrote:
> This reverts commit c44094eee32f32f175aadc0efcac449d99b1bbf7.
>
> As reported by Amit [1], dropping cache invalidation from
> arch_dma_prep_coherent() triggers a crash on the Qualcomm SM8250 platform
> (most probably on other Qcom platforms too). The reason is, Qcom
> qcom_q6v5_mss driver copies the firmware metadata and shares it with modem
> for validation. The modem has a secure block (XPU) that will trigger a
> whole system crash if the shared memory is accessed by the CPU while modem
> is poking at it.
> [...]
> [1] https://lore.kernel.org/linux-arm-kernel/CAMi1Hd1VBCFhf7+EXWHQWcGy4k=tcyLa7RGiFdprtRnegSG0Mw@mail.gmail.com/
>

I have Amit's report on the list of tracked regressions. I also noticed
the proposed change "arm64: dts: qcom: sc8280xp: fix PCIe DMA coherency":
https://lore.kernel.org/all/20221124142501.29314-1-johan+linaro@kernel.org/

I have no expertise in this area, but it looked somewhat related to my
eyes, so please allow me to quickly ask: is that related or even
supposed to fix Amit's regression?

Ciao, Thorsten
On Mon, Nov 28, 2022 at 06:44:13AM +0100, Thorsten Leemhuis wrote: > On 14.11.22 12:03, Manivannan Sadhasivam wrote: > > This reverts commit c44094eee32f32f175aadc0efcac449d99b1bbf7. > > > > As reported by Amit [1], dropping cache invalidation from > > arch_dma_prep_coherent() triggers a crash on the Qualcomm SM8250 platform > > (most probably on other Qcom platforms too). The reason is, Qcom > > qcom_q6v5_mss driver copies the firmware metadata and shares it with modem > > for validation. The modem has a secure block (XPU) that will trigger a > > whole system crash if the shared memory is accessed by the CPU while modem > > is poking at it. > > [...] > > [1] https://lore.kernel.org/linux-arm-kernel/CAMi1Hd1VBCFhf7+EXWHQWcGy4k=tcyLa7RGiFdprtRnegSG0Mw@mail.gmail.com/ > > > > I have Amit's report on the list of tracked regressions. I also noticed > the proposed change "arm64: dts: qcom: sc8280xp: fix PCIe DMA coherency": > https://lore.kernel.org/all/20221124142501.29314-1-johan+linaro@kernel.org/ > > I have no expertise in this area, but it looked somewhat related to my > eyes, so please allow me to quickly ask: is that related or even > supposed to fix Amit's regression? > The proposed patch doesn't fix the regression reported by Amit. But the patch itself fixes an issue that might be triggered more frequently by c44094eee32f. Thanks, Mani > Ciao, Thorsten
Linux regression tracking (Thorsten Leemhuis)
Dec. 1, 2022, 9:29 a.m. UTC |
#12
Hi, this is your Linux kernel regression tracker. Top-posting for once, to make this easily accessible to everyone. Has any progress been made to fix this regression? It afaics is not a release critical issue, but well, it still would be nice to get this fixed before 6.1 is released. Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) P.S.: As the Linux kernel's regression tracker I deal with a lot of reports and sometimes miss something important when writing mails like this. If that's the case here, don't hesitate to tell me in a public reply, it's in everyone's interest to set the public record straight. On 24.11.22 12:55, Catalin Marinas wrote: > On Mon, Nov 21, 2022 at 03:42:27PM +0530, Sibi Sankar wrote: >> On 11/21/22 12:12, Manivannan Sadhasivam wrote: >>> On Fri, Nov 18, 2022 at 12:33:49PM +0000, Will Deacon wrote: >>>> On Fri, Nov 18, 2022 at 04:24:02PM +0530, Manivannan Sadhasivam wrote: >>>>> On Mon, Nov 14, 2022 at 05:38:00PM +0000, Catalin Marinas wrote: >>>>>> On Mon, Nov 14, 2022 at 03:14:21PM +0000, Robin Murphy wrote: >>>>>>> Clearly that driver is completely broken though. If the DMA allocation came >>>>>>> from a no-map carveout vma_dma_alloc_from_dev_coherent() then the vmap() >>>>>>> shenanigans wouldn't work, so if it backed by struct pages then the whole >>>>>>> dance is still pointless because *a cacheable linear mapping exists*, and >>>>>>> it's just relying on the reduced chance that anything's going to re-fetch >>>>>>> the linear map address after those pages have been allocated, exactly as I >>>>>>> called out previously[1]. >>>>>> >>>>>> So I guess a DMA pool that's not mapped in the linear map, together with >>>>>> memremap() instead of vmap(), would work around the issue. But the >>>>>> driver needs fixing, not the arch code. >>>>> >>>>> Okay, thanks for the hint. Can you share how to allocate the dma-pool that's >>>>> not part of the kernel's linear map? I looked into it but couldn't find a way. 
>>>> >>>> The no-map property should take care of this iirc >>> >>> Yeah, we have been using it in other places of the same driver. But as per >>> Sibi, we used dynamic allocation for metadata validation since there was no >>> memory reserved statically for that. >> >> Unlike the other portions in the driver that required statically defined >> no-map carveouts, metadata just needed a contiguous memory for >> authentication. Re-using existing carveouts for this metadata region >> may not work due to modem FW limitations and declaring a new carveout for >> metadata will break the device tree bindings. That's the reason for >> using DMA_ATTR_NO_KERNEL_MAPPING for dma_alloc_attr and vmpa/vunmap with >> VM_FLUSH_RESET_PERMS before passing the memory onto modem. Are there other >> suggestions for achieving the same without breaking bindings? > > Your DMA_ATTR_NO_KERNEL_MAPPING workaround doesn't work, it only makes > the failure rate smaller. All this attribute does is avoiding creating a > non-cacheable mapping but you still have the kernel linear mapping in > place that may be speculatively accessed by the CPU. You were just lucky > so far not to have hit the issue. So I'd rather see this fixed properly > with a no-map carveout. Maybe you can reuse an existing carveout if the > driver already needs some and avoid changing the DT. More complicated > options include allocating memory and unmapping it from the linear map > with set_memory_valid(), though that's not exported to modules and it > also requires the linear map to be pages only, not block mappings. > > Yet another option is to have the swiotlb buffer unmapped from the > kernel linear map and use the bounce buffer for this. That's more > involved (Robin has some patches, though for a different reason and they > may not make it upstream). >
On Thu, Dec 01, 2022 at 10:29:39AM +0100, Thorsten Leemhuis wrote: > Has any progress been made to fix this regression? It afaics is not a > release critical issue, but well, it still would be nice to get this > fixed before 6.1 is released. The only (nearly) risk-free "fix" for 6.1 would be to revert the commit that exposed the driver bug. It doesn't fix the actual bug, it only makes it less likely to happen. I like the original commit removing the cache invalidation as it shows drivers not behaving properly but, as a workaround, we could add a command line option to force back the old behaviour (defaulting to the new one) until the driver is fixed.
On Thu, 1 Dec 2022 at 23:15, Catalin Marinas <catalin.marinas@arm.com> wrote: > > On Thu, Dec 01, 2022 at 10:29:39AM +0100, Thorsten Leemhuis wrote: > > Has any progress been made to fix this regression? It afaics is not a > > release critical issue, but well, it still would be nice to get this > > fixed before 6.1 is released. > > The only (nearly) risk-free "fix" for 6.1 would be to revert the commit > that exposed the driver bug. It doesn't fix the actual bug, it only > makes it less likely to happen. > > I like the original commit removing the cache invalidation as it shows > drivers not behaving properly but, as a workaround, we could add a > command line option to force back the old behaviour (defaulting to the > new one) until the driver is fixed. We use DB845c extensively for mainline and android-mainline[1] testing with AOSP, and it has been broken for weeks now. So be it a temporary workaround or a proper driver fix, we'd really appreciate a quick fix here. I understand that the revert doesn't fix the actual driver bug, but we have been very, very lucky so far never to have hit this issue before. So at this point I'll take the revert of the upstream commit as well, while a proper fix is being worked on. Regards, Amit Pundir [1] https://android.googlesource.com/kernel/common/+/refs/heads/android-mainline > > -- > Catalin
Linux regression tracking (Thorsten Leemhuis)
Dec. 2, 2022, 8:54 a.m. UTC |
#15
On 02.12.22 09:26, Amit Pundir wrote: > On Thu, 1 Dec 2022 at 23:15, Catalin Marinas <catalin.marinas@arm.com> wrote: >> >> On Thu, Dec 01, 2022 at 10:29:39AM +0100, Thorsten Leemhuis wrote: >>> Has any progress been made to fix this regression? It afaics is not a >>> release critical issue, but well, it still would be nice to get this >>> fixed before 6.1 is released. >> >> The only (nearly) risk-free "fix" for 6.1 would be to revert the commit >> that exposed the driver bug. It doesn't fix the actual bug, it only >> makes it less likely to happen. >> >> I like the original commit removing the cache invalidation as it shows >> drivers not behaving properly Yeah, I understand that, but I guess it's my job to ask at this point: "is continuing to live with the old behavior for one or two more cycles that much of a problem?" >> but, as a workaround, we could add a >> command line option to force back the old behaviour (defaulting to the >> new one) until the driver is fixed. Well, sometimes that approach is fine to fix a regression, but I'm not sure this is one of those situations, as this... > We use DB845c extensively for mainline and android-mainline[1] testing > with AOSP, and it is broken for weeks now. So be it a temporary > workaround or a proper driver fix in place, we'd really appreciate a > quick fix here. ...doesn't sound like we are talking about some odd corner case here. But in the end that would be up to Linus to decide. I'll point him to this thread once more in my weekly report anyway. Maybe I'll even suggest reverting this change, not sure yet. > [...] Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) P.S.: As the Linux kernel's regression tracker I deal with a lot of reports and sometimes miss something important when writing mails like this. If that's the case here, don't hesitate to tell me in a public reply, it's in everyone's interest to set the public record straight.
On Fri, Dec 02, 2022 at 09:54:05AM +0100, Thorsten Leemhuis wrote: > On 02.12.22 09:26, Amit Pundir wrote: > > On Thu, 1 Dec 2022 at 23:15, Catalin Marinas <catalin.marinas@arm.com> wrote: > >> > >> On Thu, Dec 01, 2022 at 10:29:39AM +0100, Thorsten Leemhuis wrote: > >>> Has any progress been made to fix this regression? It afaics is not a > >>> release critical issue, but well, it still would be nice to get this > >>> fixed before 6.1 is released. > >> > >> The only (nearly) risk-free "fix" for 6.1 would be to revert the commit > >> that exposed the driver bug. It doesn't fix the actual bug, it only > >> makes it less likely to happen. > >> > >> I like the original commit removing the cache invalidation as it shows > >> drivers not behaving properly > > Yeah, I understand that, but I guess it's my job to ask at this point: > "is continuing to live with the old behavior for one or two more cycles" > that much of a problem"? That wouldn't be a problem. The problem is that I haven't seen any efforts from the Qualcomm side to actually fix the drivers so what makes you think the issue will be addressed in one or two more cycles? On the other hand, if there were patches out there trying to fix the drivers and we just needed to revert this change to buy them some time, then that would obviously be the right thing to do. > >> but, as a workaround, we could add a > >> command line option to force back the old behaviour (defaulting to the > >> new one) until the driver is fixed. > > Well, sometimes that approach is fine to fix a regression, but I'm not > sure this is one of those situations, as this... > > > We use DB845c extensively for mainline and android-mainline[1] testing > > with AOSP, and it is broken for weeks now. So be it a temporary > > workaround or a proper driver fix in place, we'd really appreciate a > > quick fix here. > > ...doesn't sound like we are not talking about some odd corner case > here. But in the end that would be up to Linus to decide. 
The issue is that these drivers are abusing the DMA API to manage buffers which are being transferred to trustzone. Even with the revert, this is broken (the CPU can speculate from the kernel's cacheable linear mapping of memory), it just appears to be less likely with the CPUs on this SoC. So we end up in a situation where the kernel is flakey on these devices but with even less incentive for the drivers to be fixed. As well as broken drivers, the patch has also identified broken device-tree files where DMA-coherent devices were incorrectly being treated as non-coherent: https://lore.kernel.org/linux-arm-kernel/20221124142501.29314-1-johan+linaro@kernel.org/ so I do think it's something that's worth having as the default behaviour. > I'll point him to this thread once more in my weekly report anyway. > Maybe I'll even suggest to revert this change, not sure yet. As I said above, I think the revert makes sense if the drivers are actually being fixed, but I'm not seeing any movement at all on that front. Will
Linux regression tracking (Thorsten Leemhuis)
Dec. 2, 2022, 10:34 a.m. UTC |
#17
On 02.12.22 11:03, Will Deacon wrote: > On Fri, Dec 02, 2022 at 09:54:05AM +0100, Thorsten Leemhuis wrote: >> On 02.12.22 09:26, Amit Pundir wrote: >>> On Thu, 1 Dec 2022 at 23:15, Catalin Marinas <catalin.marinas@arm.com> wrote: >>>> >>>> On Thu, Dec 01, 2022 at 10:29:39AM +0100, Thorsten Leemhuis wrote: >>>>> Has any progress been made to fix this regression? It afaics is not a >>>>> release critical issue, but well, it still would be nice to get this >>>>> fixed before 6.1 is released. >>>> >>>> The only (nearly) risk-free "fix" for 6.1 would be to revert the commit >>>> that exposed the driver bug. It doesn't fix the actual bug, it only >>>> makes it less likely to happen. >>>> >>>> I like the original commit removing the cache invalidation as it shows >>>> drivers not behaving properly >> >> Yeah, I understand that, but I guess it's my job to ask at this point: >> "is continuing to live with the old behavior for one or two more cycles" >> that much of a problem"? > > That wouldn't be a problem. The problem is that I haven't see any efforts > from the Qualcomm side to actually fix the drivers [...] Thx for sharing the details. I can fully understand your pain. But well, in the end it looks to me like this commit is intentionally breaking something that used to work -- which to my understanding of the "no regression rule" is not okay, even if things only worked by chance and not flawlessly. But well, as with every rule there are misunderstandings, grey areas, and situations where judgement calls have to be made. Then it's up to Linus to decide how to handle things. Hence I'll just point him to this thread and then he can decide. No biggie. And sorry if I'm being a PITA here, I just think doing that is my duty as regression tracker in situations like this. Hope you won't mind. 
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) P.S.: As the Linux kernel's regression tracker I deal with a lot of reports and sometimes miss something important when writing mails like this. If that's the case here, don't hesitate to tell me in a public reply, it's in everyone's interest to set the public record straight.
On Fri, Dec 02, 2022 at 10:03:58AM +0000, Will Deacon wrote: > On Fri, Dec 02, 2022 at 09:54:05AM +0100, Thorsten Leemhuis wrote: > > On 02.12.22 09:26, Amit Pundir wrote: > > > On Thu, 1 Dec 2022 at 23:15, Catalin Marinas <catalin.marinas@arm.com> wrote: > > >> > > >> On Thu, Dec 01, 2022 at 10:29:39AM +0100, Thorsten Leemhuis wrote: > > >>> Has any progress been made to fix this regression? It afaics is not a > > >>> release critical issue, but well, it still would be nice to get this > > >>> fixed before 6.1 is released. > > >> > > >> The only (nearly) risk-free "fix" for 6.1 would be to revert the commit > > >> that exposed the driver bug. It doesn't fix the actual bug, it only > > >> makes it less likely to happen. > > >> > > >> I like the original commit removing the cache invalidation as it shows > > >> drivers not behaving properly > > > > Yeah, I understand that, but I guess it's my job to ask at this point: > > "is continuing to live with the old behavior for one or two more cycles" > > that much of a problem"? > > That wouldn't be a problem. The problem is that I haven't see any efforts > from the Qualcomm side to actually fix the drivers so what makes you think > the issue will be addressed in one or two more cycles? On the other hand, if > there were patches out there trying to fix the drivers and we just needed to > revert this change to buy them some time, then that would obviously be the > right thing to do. > There are efforts going on at Qualcomm to fix the driver. It's just that the patches are not available yet. The delay is mainly due to the communication that has to happen between the internal teams. The fix would be to use a separate no-map carveout for the use case. But it'd be good to revert this patch until those patches get merged. Thanks, Mani > > >> but, as a workaround, we could add a > > >> command line option to force back the old behaviour (defaulting to the > > >> new one) until the driver is fixed. 
> > > > Well, sometimes that approach is fine to fix a regression, but I'm not > > sure this is one of those situations, as this... > > > > > We use DB845c extensively for mainline and android-mainline[1] testing > > > with AOSP, and it is broken for weeks now. So be it a temporary > > > workaround or a proper driver fix in place, we'd really appreciate a > > > quick fix here. > > > > ...doesn't sound like we are not talking about some odd corner case > > here. But in the end that would be up to Linus to decide. > > The issue is that these drivers are abusing the DMA API to manage buffers > which are being transferred to trustzone. Even with the revert, this is > broken (the CPU can speculate from the kernel's cacheable linear mapping > of memory), it just appears to be less likely with the CPUs on this SoC. > So we end up in a situation where the kernel is flakey on these devices > but with even less incentive for the drivers to be fixed. > > As well as broken drivers, the patch has also identified broken device-tree > files where DMA-coherent devices weher incorrectly being treated as > non-coherent: > > https://lore.kernel.org/linux-arm-kernel/20221124142501.29314-1-johan+linaro@kernel.org/ > > so I do think it's something that's worth having as the default behaviour. > > > I'll point him to this thread once more in my weekly report anyway. > > Maybe I'll even suggest to revert this change, not sure yet. > > As I said above, I think the revert makes sense if the drivers are actually > being fixed, but I'm not seeing any movement at all on that front. > > Will
On Fri, Dec 02, 2022 at 11:34:30AM +0100, Thorsten Leemhuis wrote: > On 02.12.22 11:03, Will Deacon wrote: > > On Fri, Dec 02, 2022 at 09:54:05AM +0100, Thorsten Leemhuis wrote: > >> On 02.12.22 09:26, Amit Pundir wrote: > >>> On Thu, 1 Dec 2022 at 23:15, Catalin Marinas <catalin.marinas@arm.com> wrote: > >>>> > >>>> On Thu, Dec 01, 2022 at 10:29:39AM +0100, Thorsten Leemhuis wrote: > >>>>> Has any progress been made to fix this regression? It afaics is not a > >>>>> release critical issue, but well, it still would be nice to get this > >>>>> fixed before 6.1 is released. > >>>> > >>>> The only (nearly) risk-free "fix" for 6.1 would be to revert the commit > >>>> that exposed the driver bug. It doesn't fix the actual bug, it only > >>>> makes it less likely to happen. > >>>> > >>>> I like the original commit removing the cache invalidation as it shows > >>>> drivers not behaving properly > >> > >> Yeah, I understand that, but I guess it's my job to ask at this point: > >> "is continuing to live with the old behavior for one or two more cycles" > >> that much of a problem"? > > > > That wouldn't be a problem. The problem is that I haven't see any efforts > > from the Qualcomm side to actually fix the drivers [...] > > Thx for sharing the details. I can fully understand your pain. But well, > in the end it looks to me like this commit it intentionally breaking > something that used to work -- which to my understanding of the "no > regression rule" is not okay, even if things only worked by chance and > not flawless. "no regressions" for userspace code, this is broken, out-of-tree driver code, right? I do not think any in-kernel drivers have this issue today from what I can tell, but if I am wrong here, please let me know. We don't keep stable apis, or even functionality, for out-of-tree kernel code as that would be impossible for us to do for obvious reasons. thanks, greg kh
Linux regression tracking (Thorsten Leemhuis)
Dec. 2, 2022, 4:27 p.m. UTC |
#20
On 02.12.22 17:10, Greg KH wrote: > On Fri, Dec 02, 2022 at 11:34:30AM +0100, Thorsten Leemhuis wrote: >> On 02.12.22 11:03, Will Deacon wrote: >>> On Fri, Dec 02, 2022 at 09:54:05AM +0100, Thorsten Leemhuis wrote: >>>> On 02.12.22 09:26, Amit Pundir wrote: >>>>> On Thu, 1 Dec 2022 at 23:15, Catalin Marinas <catalin.marinas@arm.com> wrote: >>>>>> >>>>>> On Thu, Dec 01, 2022 at 10:29:39AM +0100, Thorsten Leemhuis wrote: >>>>>>> Has any progress been made to fix this regression? It afaics is not a >>>>>>> release critical issue, but well, it still would be nice to get this >>>>>>> fixed before 6.1 is released. >>>>>> >>>>>> The only (nearly) risk-free "fix" for 6.1 would be to revert the commit >>>>>> that exposed the driver bug. It doesn't fix the actual bug, it only >>>>>> makes it less likely to happen. >>>>>> >>>>>> I like the original commit removing the cache invalidation as it shows >>>>>> drivers not behaving properly >>>> >>>> Yeah, I understand that, but I guess it's my job to ask at this point: >>>> "is continuing to live with the old behavior for one or two more cycles" >>>> that much of a problem"? >>> >>> That wouldn't be a problem. The problem is that I haven't see any efforts >>> from the Qualcomm side to actually fix the drivers [...] >> >> Thx for sharing the details. I can fully understand your pain. But well, >> in the end it looks to me like this commit it intentionally breaking >> something that used to work -- which to my understanding of the "no >> regression rule" is not okay, even if things only worked by chance and >> not flawless. > > "no regressions" for userspace code, this is broken, out-of-tree driver > code, right? If so: apologies. But that's not the impression I got, as Amit wrote "I can reproduce this crash on vanilla v6.1-rc1 as well with no out-of-tree drivers." 
here: https://lore.kernel.org/linux-arm-kernel/CAMi1Hd3H2k1J8hJ6e-Miy5+nVDNzv6qQ3nN-9929B0GbHJkXEg@mail.gmail.com/ > I do not think any in-kernel drivers have this issue today > from what I can tell, but if I am wrong here, please let me know. > > We don't keep stable apis, or even functionality, for out-of-tree kernel > code as that would be impossible for us to do for obvious reasons. Ciao, Thorsten
On Fri, Dec 02, 2022 at 05:27:24PM +0100, Thorsten Leemhuis wrote: > > > On 02.12.22 17:10, Greg KH wrote: > > On Fri, Dec 02, 2022 at 11:34:30AM +0100, Thorsten Leemhuis wrote: > >> On 02.12.22 11:03, Will Deacon wrote: > >>> On Fri, Dec 02, 2022 at 09:54:05AM +0100, Thorsten Leemhuis wrote: > >>>> On 02.12.22 09:26, Amit Pundir wrote: > >>>>> On Thu, 1 Dec 2022 at 23:15, Catalin Marinas <catalin.marinas@arm.com> wrote: > >>>>>> > >>>>>> On Thu, Dec 01, 2022 at 10:29:39AM +0100, Thorsten Leemhuis wrote: > >>>>>>> Has any progress been made to fix this regression? It afaics is not a > >>>>>>> release critical issue, but well, it still would be nice to get this > >>>>>>> fixed before 6.1 is released. > >>>>>> > >>>>>> The only (nearly) risk-free "fix" for 6.1 would be to revert the commit > >>>>>> that exposed the driver bug. It doesn't fix the actual bug, it only > >>>>>> makes it less likely to happen. > >>>>>> > >>>>>> I like the original commit removing the cache invalidation as it shows > >>>>>> drivers not behaving properly > >>>> > >>>> Yeah, I understand that, but I guess it's my job to ask at this point: > >>>> "is continuing to live with the old behavior for one or two more cycles" > >>>> that much of a problem"? > >>> > >>> That wouldn't be a problem. The problem is that I haven't see any efforts > >>> from the Qualcomm side to actually fix the drivers [...] > >> > >> Thx for sharing the details. I can fully understand your pain. But well, > >> in the end it looks to me like this commit it intentionally breaking > >> something that used to work -- which to my understanding of the "no > >> regression rule" is not okay, even if things only worked by chance and > >> not flawless. > > > > "no regressions" for userspace code, this is broken, out-of-tree driver > > code, right? > > If so: apologies. But that's not the impression I got, as Amit wrote "I > can reproduce this crash on vanilla v6.1-rc1 as well with no out-of-tree > drivers." 
here: > https://lore.kernel.org/linux-arm-kernel/CAMi1Hd3H2k1J8hJ6e-Miy5+nVDNzv6qQ3nN-9929B0GbHJkXEg@mail.gmail.com/ Ah, I missed that. Ok, what in-tree drivers are having problems being buggy? I can't seem to figure that out from that report at all. Does anyone know? thanks, greg k-h
On Fri, Dec 02, 2022 at 05:32:51PM +0100, Greg KH wrote: > On Fri, Dec 02, 2022 at 05:27:24PM +0100, Thorsten Leemhuis wrote: > > > > > > On 02.12.22 17:10, Greg KH wrote: > > > On Fri, Dec 02, 2022 at 11:34:30AM +0100, Thorsten Leemhuis wrote: > > >> On 02.12.22 11:03, Will Deacon wrote: > > >>> On Fri, Dec 02, 2022 at 09:54:05AM +0100, Thorsten Leemhuis wrote: > > >>>> On 02.12.22 09:26, Amit Pundir wrote: > > >>>>> On Thu, 1 Dec 2022 at 23:15, Catalin Marinas <catalin.marinas@arm.com> wrote: > > >>>>>> > > >>>>>> On Thu, Dec 01, 2022 at 10:29:39AM +0100, Thorsten Leemhuis wrote: > > >>>>>>> Has any progress been made to fix this regression? It afaics is not a > > >>>>>>> release critical issue, but well, it still would be nice to get this > > >>>>>>> fixed before 6.1 is released. > > >>>>>> > > >>>>>> The only (nearly) risk-free "fix" for 6.1 would be to revert the commit > > >>>>>> that exposed the driver bug. It doesn't fix the actual bug, it only > > >>>>>> makes it less likely to happen. > > >>>>>> > > >>>>>> I like the original commit removing the cache invalidation as it shows > > >>>>>> drivers not behaving properly > > >>>> > > >>>> Yeah, I understand that, but I guess it's my job to ask at this point: > > >>>> "is continuing to live with the old behavior for one or two more cycles" > > >>>> that much of a problem"? > > >>> > > >>> That wouldn't be a problem. The problem is that I haven't see any efforts > > >>> from the Qualcomm side to actually fix the drivers [...] > > >> > > >> Thx for sharing the details. I can fully understand your pain. But well, > > >> in the end it looks to me like this commit it intentionally breaking > > >> something that used to work -- which to my understanding of the "no > > >> regression rule" is not okay, even if things only worked by chance and > > >> not flawless. > > > > > > "no regressions" for userspace code, this is broken, out-of-tree driver > > > code, right? > > > > If so: apologies. 
But that's not the impression I got, as Amit wrote "I > > can reproduce this crash on vanilla v6.1-rc1 as well with no out-of-tree > > drivers." here: > > https://lore.kernel.org/linux-arm-kernel/CAMi1Hd3H2k1J8hJ6e-Miy5+nVDNzv6qQ3nN-9929B0GbHJkXEg@mail.gmail.com/ > > Ah, I missed that. > > Ok, what in-tree drivers are having problems being buggy? I can't seem > to figure that out from that report at all. Does anyone know? > It is the Qualcomm Q6V5_MSS remoteproc driver: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/remoteproc/qcom_q6v5_mss.c Qualcomm is working on the fix but the patches are not ready yet. So if we can get this patch reverted in the meantime, that would be helpful. Thanks, Mani > thanks, > > greg k-h
On Fri, Dec 02, 2022 at 10:44:37PM +0530, Manivannan Sadhasivam wrote: > On Fri, Dec 02, 2022 at 05:32:51PM +0100, Greg KH wrote: > > On Fri, Dec 02, 2022 at 05:27:24PM +0100, Thorsten Leemhuis wrote: > > > On 02.12.22 17:10, Greg KH wrote: > > > > On Fri, Dec 02, 2022 at 11:34:30AM +0100, Thorsten Leemhuis wrote: > > > >> On 02.12.22 11:03, Will Deacon wrote: > > > >>> On Fri, Dec 02, 2022 at 09:54:05AM +0100, Thorsten Leemhuis wrote: > > > >>>> On 02.12.22 09:26, Amit Pundir wrote: > > > >>>>> On Thu, 1 Dec 2022 at 23:15, Catalin Marinas <catalin.marinas@arm.com> wrote: > > > >>>>>> > > > >>>>>> On Thu, Dec 01, 2022 at 10:29:39AM +0100, Thorsten Leemhuis wrote: > > > >>>>>>> Has any progress been made to fix this regression? It afaics is not a > > > >>>>>>> release critical issue, but well, it still would be nice to get this > > > >>>>>>> fixed before 6.1 is released. > > > >>>>>> > > > >>>>>> The only (nearly) risk-free "fix" for 6.1 would be to revert the commit > > > >>>>>> that exposed the driver bug. It doesn't fix the actual bug, it only > > > >>>>>> makes it less likely to happen. > > > >>>>>> > > > >>>>>> I like the original commit removing the cache invalidation as it shows > > > >>>>>> drivers not behaving properly > > > >>>> > > > >>>> Yeah, I understand that, but I guess it's my job to ask at this point: > > > >>>> "is continuing to live with the old behavior for one or two more cycles" > > > >>>> that much of a problem"? > > > >>> > > > >>> That wouldn't be a problem. The problem is that I haven't see any efforts > > > >>> from the Qualcomm side to actually fix the drivers [...] > > > >> > > > >> Thx for sharing the details. I can fully understand your pain. But well, > > > >> in the end it looks to me like this commit it intentionally breaking > > > >> something that used to work -- which to my understanding of the "no > > > >> regression rule" is not okay, even if things only worked by chance and > > > >> not flawless. 
> > > > > > > > "no regressions" for userspace code, this is broken, out-of-tree driver > > > > code, right? > > > > > > If so: apologies. But that's not the impression I got, as Amit wrote "I > > > can reproduce this crash on vanilla v6.1-rc1 as well with no out-of-tree > > > drivers." here: > > > https://lore.kernel.org/linux-arm-kernel/CAMi1Hd3H2k1J8hJ6e-Miy5+nVDNzv6qQ3nN-9929B0GbHJkXEg@mail.gmail.com/ > > > > Ah, I missed that. > > > > Ok, what in-tree drivers are having problems being buggy? I can't seem > > to figure that out from that report at all. Does anyone know? > > > > It is the Qualcomm Q6V5_MSS remoteproc driver: > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/remoteproc/qcom_q6v5_mss.c > > Qualcomm is working on the fix but the patches are not ready yet. So if we can > get this patch reverted in the meantime, that would be helpful. It's good to hear that you're working to fix this, even if it's happening behind closed doors. Do you have a rough idea how soon you'll be able to post the remoteproc driver fixes? That would help us to figure out when to bring back the change if we were to revert it. Cheers, Will
On Mon, Dec 05, 2022 at 02:24:03PM +0000, Will Deacon wrote:
> On Fri, Dec 02, 2022 at 10:44:37PM +0530, Manivannan Sadhasivam wrote:
> > On Fri, Dec 02, 2022 at 05:32:51PM +0100, Greg KH wrote:
> > > On Fri, Dec 02, 2022 at 05:27:24PM +0100, Thorsten Leemhuis wrote:
> > > > On 02.12.22 17:10, Greg KH wrote:
> > > > > On Fri, Dec 02, 2022 at 11:34:30AM +0100, Thorsten Leemhuis wrote:
> > > > >> On 02.12.22 11:03, Will Deacon wrote:
> > > > >>> On Fri, Dec 02, 2022 at 09:54:05AM +0100, Thorsten Leemhuis wrote:
> > > > >>>> On 02.12.22 09:26, Amit Pundir wrote:
> > > > >>>>> On Thu, 1 Dec 2022 at 23:15, Catalin Marinas <catalin.marinas@arm.com> wrote:
> > > > >>>>>>
> > > > >>>>>> On Thu, Dec 01, 2022 at 10:29:39AM +0100, Thorsten Leemhuis wrote:
> > > > >>>>>>> Has any progress been made to fix this regression? It afaics is not a
> > > > >>>>>>> release critical issue, but well, it still would be nice to get this
> > > > >>>>>>> fixed before 6.1 is released.
> > > > >>>>>>
> > > > >>>>>> The only (nearly) risk-free "fix" for 6.1 would be to revert the commit
> > > > >>>>>> that exposed the driver bug. It doesn't fix the actual bug, it only
> > > > >>>>>> makes it less likely to happen.
> > > > >>>>>>
> > > > >>>>>> I like the original commit removing the cache invalidation as it shows
> > > > >>>>>> drivers not behaving properly
> > > > >>>>
> > > > >>>> Yeah, I understand that, but I guess it's my job to ask at this point:
> > > > >>>> "is continuing to live with the old behavior for one or two more cycles"
> > > > >>>> that much of a problem"?
> > > > >>>
> > > > >>> That wouldn't be a problem. The problem is that I haven't see any efforts
> > > > >>> from the Qualcomm side to actually fix the drivers [...]
> > > > >>
> > > > >> Thx for sharing the details. I can fully understand your pain. But well,
> > > > >> in the end it looks to me like this commit it intentionally breaking
> > > > >> something that used to work -- which to my understanding of the "no
> > > > >> regression rule" is not okay, even if things only worked by chance and
> > > > >> not flawless.
> > > > >
> > > > > "no regressions" for userspace code, this is broken, out-of-tree driver
> > > > > code, right?
> > > >
> > > > If so: apologies. But that's not the impression I got, as Amit wrote "I
> > > > can reproduce this crash on vanilla v6.1-rc1 as well with no out-of-tree
> > > > drivers." here:
> > > > https://lore.kernel.org/linux-arm-kernel/CAMi1Hd3H2k1J8hJ6e-Miy5+nVDNzv6qQ3nN-9929B0GbHJkXEg@mail.gmail.com/
> > >
> > > Ah, I missed that.
> > >
> > > Ok, what in-tree drivers are having problems being buggy? I can't seem
> > > to figure that out from that report at all. Does anyone know?
> > >
> >
> > It is the Qualcomm Q6V5_MSS remoteproc driver:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/remoteproc/qcom_q6v5_mss.c
> >
> > Qualcomm is working on the fix but the patches are not ready yet. So if we can
> > get this patch reverted in the meantime, that would be helpful.
>
> It's good to hear that you're working to fix this, even if it's happening
> behind closed doors. Do you have a rough idea how soon you'll be able to
> post the remoteproc driver fixes? That would help us to figure out when
> to bring back the change if we were to revert it.
>

Sibi is the one working on the fix. I believe he should be able to post the
patches within this week.

Thanks,
Mani

> Cheers,
>
> Will
On Tue, Dec 06, 2022 at 02:51:52PM +0530, Manivannan Sadhasivam wrote:
> On Mon, Dec 05, 2022 at 02:24:03PM +0000, Will Deacon wrote:
> > It's good to hear that you're working to fix this, even if it's happening
> > behind closed doors. Do you have a rough idea how soon you'll be able to
> > post the remoteproc driver fixes? That would help us to figure out when
> > to bring back the change if we were to revert it.
> >
>
> Sibi is the one working on the fix. I believe he should be able to post the
> patches within this week.

Oh nice, that's a lot sooner than I expected! I'll send a revert out now,
with a comment about where we're at.

Will
Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> writes:

> This reverts commit c44094eee32f32f175aadc0efcac449d99b1bbf7.
>
> As reported by Amit [1], dropping cache invalidation from
> arch_dma_prep_coherent() triggers a crash on the Qualcomm SM8250 platform
> (most probably on other Qcom platforms too).

On sc7180 with c44094ee applied, it does not trigger a crash, but it breaks
Wi-Fi by preventing ath10k_snoc from initializing:

qcom-q6v5-mss 4080000.remoteproc: PBL returned unexpected status -284098560

With c44094ee reverted, Wi-Fi works fine again.

Thank you
Leonard
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 3cb101e8cb29..7d7e9a046305 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -36,7 +36,7 @@ void arch_dma_prep_coherent(struct page *page, size_t size)
 {
 	unsigned long start = (unsigned long)page_address(page);
 
-	dcache_clean_poc(start, start + size);
+	dcache_clean_inval_poc(start, start + size);
 }
 
 #ifdef CONFIG_IOMMU_DMA
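
For readers less familiar with the two cache-maintenance flavours swapped in the hunk above, here is a toy user-space model (not kernel code) of a single cache line in front of one word of memory. It is only a sketch: every name in it (`struct toy_line`, `toy_clean()`, `toy_clean_inval()`, `toy_scenario()` and so on) is hypothetical and merely mirrors the general semantics of a "clean" versus a "clean and invalidate". It makes no claim about what actually goes wrong in the q6v5_mss driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (hypothetical): one cache line caching one word of memory. */
struct toy_line {
	int  mem;    /* backing memory, also written by a non-coherent agent */
	int  cached; /* the CPU's cached copy */
	bool valid;  /* the cache currently holds a copy */
	bool dirty;  /* the cached copy is newer than memory */
};

/* CPU store: allocates the line and marks it dirty. */
static void toy_cpu_write(struct toy_line *l, int v)
{
	l->cached = v;
	l->valid = true;
	l->dirty = true;
}

/* CPU load: hits in the cache if the line is valid, else refills from memory. */
static int toy_cpu_read(struct toy_line *l)
{
	if (!l->valid) {
		l->cached = l->mem;
		l->valid = true;
		l->dirty = false;
	}
	return l->cached;
}

/* Non-coherent agent: writes memory directly, the cache is not updated. */
static void toy_device_write(struct toy_line *l, int v)
{
	l->mem = v;
}

/* "Clean": write dirty data back, but keep the line valid in the cache. */
static void toy_clean(struct toy_line *l)
{
	if (l->valid && l->dirty) {
		l->mem = l->cached;
		l->dirty = false;
	}
}

/* "Clean + invalidate": write dirty data back, then drop the line. */
static void toy_clean_inval(struct toy_line *l)
{
	toy_clean(l);
	l->valid = false;
}

/*
 * CPU writes 1, the line is maintained, the device writes 2, the CPU reads.
 * Returns the value the CPU observes through the cache.
 */
static int toy_scenario(bool invalidate)
{
	struct toy_line l = { 0 };

	toy_cpu_write(&l, 1);
	if (invalidate)
		toy_clean_inval(&l);
	else
		toy_clean(&l);
	toy_device_write(&l, 2);
	return toy_cpu_read(&l);
}
```

In this model `toy_scenario(false)` returns 1 (the clean-only line stays valid, so the CPU still sees its stale copy) while `toy_scenario(true)` returns 2 (the line was dropped, so the read refills from memory). The model has no speculative refills, which is one reason real hardware is less forgiving; that fits Catalin's remark in the thread that the revert "doesn't fix the actual bug, it only makes it less likely to happen".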