Message ID | 9af8a19c3398e7dc09cfc1fbafed98d795d9f83e.1699464622.git.baruch@tkos.co.il |
---|---|
State | New |
Headers |
From: Baruch Siach <baruch@tkos.co.il>
To: Catalin Marinas <catalin.marinas@arm.com>, Will Deacon <will@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>, Marek Szyprowski <m.szyprowski@samsung.com>, Robin Murphy <robin.murphy@arm.com>, Ramon Fried <ramon@neureality.ai>, iommu@lists.linux.dev, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, Baruch Siach <baruch@tkos.co.il>
Subject: [PATCH RFC] arm64: DMA zone above 4GB
Date: Wed, 8 Nov 2023 19:30:22 +0200
Message-ID: <9af8a19c3398e7dc09cfc1fbafed98d795d9f83e.1699464622.git.baruch@tkos.co.il>
X-Mailer: git-send-email 2.42.0
Series | [RFC] arm64: DMA zone above 4GB |
Commit Message
Baruch Siach
Nov. 8, 2023, 5:30 p.m. UTC
My platform RAM starts at 32GB. It has no RAM under 4GB. zone_sizes_init()
puts the entire RAM in the DMA zone as follows:

[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x0000000800000000-0x00000008bfffffff]
[    0.000000]   DMA32    empty
[    0.000000]   Normal   empty

Consider a bus with this 'dma-ranges' property:

	#address-cells = <2>;
	#size-cells = <2>;
	dma-ranges = <0x00000000 0xc0000000 0x00000008 0x00000000 0x0 0x40000000>;

Devices under this bus can see 1GB of DMA range between 3GB-4GB. This
range is mapped to CPU memory at 32GB-33GB.

Current zone_sizes_init() code considers 'dma-ranges' only when it maps
to RAM under 4GB, because zone_dma_bits is limited to 32. In this case
'dma-ranges' is ignored in practice, since the DMA/DMA32 zones are both
assumed to be located under 4GB. The result is that the GFP_DMA32 flag
on stmmac driver DMA buffer allocations has no effect. As a result DMA
buffer allocations fail.

The patch below is a crude workaround hack. It makes the DMA zone
cover the 1GB memory area that is visible to stmmac DMA as follows:

[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x0000000800000000-0x000000083fffffff]
[    0.000000]   DMA32    empty
[    0.000000]   Normal   [mem 0x0000000840000000-0x0000000bffffffff]
...
[    0.000000] software IO TLB: mapped [mem 0x000000083bfff000-0x000000083ffff000] (64MB)

With this hack the stmmac driver works on my platform with no
modification.

Clearly this can't be the right solution. zone_dma_bits is now wrong, for
one. It probably breaks other code as well.

Is there any better suggestion to make DMA buffer allocations work on
this hardware?

Thanks
---
 arch/arm64/mm/init.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
Comments
Hello Baruch,

On Wed, 8 Nov 2023 19:30:22 +0200
Baruch Siach <baruch@tkos.co.il> wrote:

> My platform RAM starts at 32GB. It has no RAM under 4GB. zone_sizes_init()
> puts the entire RAM in the DMA zone as follows:
>
> [ 0.000000] Zone ranges:
> [ 0.000000]   DMA      [mem 0x0000000800000000-0x00000008bfffffff]
> [ 0.000000]   DMA32    empty
> [ 0.000000]   Normal   empty
>
> Consider a bus with this 'dma-ranges' property:
>
> #address-cells = <2>;
> #size-cells = <2>;
> dma-ranges = <0x00000000 0xc0000000 0x00000008 0x00000000 0x0 0x40000000>;
>
> Devices under this bus can see 1GB of DMA range between 3GB-4GB. This
> range is mapped to CPU memory at 32GB-33GB.

Thank you for this email. I have recently expressed my concerns about
the possibility of such setups in theory (physical addresses v. DMA
addresses). Now it seems we have a real-world example where this is
actually happening.

> Current zone_sizes_init() code considers 'dma-ranges' only when it maps
> to RAM under 4GB, because zone_dma_bits is limited to 32. In this case
> 'dma-ranges' is ignored in practice, since DMA/DMA32 zones are both
> assumed to be located under 4GB. The result is that the stmmac driver
> DMA buffers allocation GFP_DMA32 flag has no effect. As a result DMA
> buffer allocations fail.
>
> The patch below is a crude workaround hack. It makes the DMA zone
> cover the 1GB memory area that is visible to stmmac DMA as follows:
>
> [ 0.000000] Zone ranges:
> [ 0.000000]   DMA      [mem 0x0000000800000000-0x000000083fffffff]
> [ 0.000000]   DMA32    empty
> [ 0.000000]   Normal   [mem 0x0000000840000000-0x0000000bffffffff]
> ...
> [ 0.000000] software IO TLB: mapped [mem 0x000000083bfff000-0x000000083ffff000] (64MB)
>
> With this hack the stmmac driver works on my platform with no
> modification.
>
> Clearly this can't be the right solutions. zone_dma_bits is now wrong for
> one. It probably breaks other code as well.
>
> Is there any better suggestion to make DMA buffer allocations work on
> this hardware?

Yes, but not any time soon. My idea was to abandon the various DMA zones
in the MM subsystem and replace them with a more flexible system based
on "allocation constraints". I'm afraid there's not much the current
Linux kernel can do for you.

Petr T

> Thanks
> ---
>  arch/arm64/mm/init.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 74c1db8ce271..5fe826ac3a5f 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -136,13 +136,13 @@ static void __init zone_sizes_init(void)
>  	unsigned long max_zone_pfns[MAX_NR_ZONES] = {0};
>  	unsigned int __maybe_unused acpi_zone_dma_bits;
>  	unsigned int __maybe_unused dt_zone_dma_bits;
> -	phys_addr_t __maybe_unused dma32_phys_limit = max_zone_phys(32);
> +	phys_addr_t __maybe_unused dma32_phys_limit = DMA_BIT_MASK(32) + 1;
>
>  #ifdef CONFIG_ZONE_DMA
>  	acpi_zone_dma_bits = fls64(acpi_iort_dma_get_max_cpu_address());
>  	dt_zone_dma_bits = fls64(of_dma_get_max_cpu_address(NULL));
>  	zone_dma_bits = min3(32U, dt_zone_dma_bits, acpi_zone_dma_bits);
> -	arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
> +	arm64_dma_phys_limit = of_dma_get_max_cpu_address(NULL) + 1;
>  	max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
>  #endif
>  #ifdef CONFIG_ZONE_DMA32
On Wed, Nov 08, 2023 at 07:30:22PM +0200, Baruch Siach wrote:
> My platform RAM starts at 32GB. It has no RAM under 4GB. zone_sizes_init()
> puts the entire RAM in the DMA zone as follows:
>
> [ 0.000000] Zone ranges:
> [ 0.000000]   DMA      [mem 0x0000000800000000-0x00000008bfffffff]
> [ 0.000000]   DMA32    empty
> [ 0.000000]   Normal   empty

This was a conscious decision with commit 791ab8b2e3db ("arm64: Ignore
any DMA offsets in the max_zone_phys() calculation"). The prior code
seemed a bit of a hack and it also did not play well with zone_dma_bits
(as you noticed as well).

> Consider a bus with this 'dma-ranges' property:
>
> #address-cells = <2>;
> #size-cells = <2>;
> dma-ranges = <0x00000000 0xc0000000 0x00000008 0x00000000 0x0 0x40000000>;
>
> Devices under this bus can see 1GB of DMA range between 3GB-4GB. This
> range is mapped to CPU memory at 32GB-33GB.

Is this on real hardware or just theoretical for now (the rest of your
email implies it's real)? Normally I'd have expected the first GB (or
first two) of RAM from 32G to be aliased to the lower 32-bit range for
the CPU view as well, not just for devices. You'd then get a ZONE_DMA
without having to play with DMA offsets.

> Current zone_sizes_init() code considers 'dma-ranges' only when it maps
> to RAM under 4GB, because zone_dma_bits is limited to 32. In this case
> 'dma-ranges' is ignored in practice, since DMA/DMA32 zones are both
> assumed to be located under 4GB. The result is that the stmmac driver
> DMA buffers allocation GFP_DMA32 flag has no effect. As a result DMA
> buffer allocations fail.
>
> The patch below is a crude workaround hack. It makes the DMA zone
> cover the 1GB memory area that is visible to stmmac DMA as follows:
>
> [ 0.000000] Zone ranges:
> [ 0.000000]   DMA      [mem 0x0000000800000000-0x000000083fffffff]
> [ 0.000000]   DMA32    empty
> [ 0.000000]   Normal   [mem 0x0000000840000000-0x0000000bffffffff]
> ...
> [ 0.000000] software IO TLB: mapped [mem 0x000000083bfff000-0x000000083ffff000] (64MB)
>
> With this hack the stmmac driver works on my platform with no
> modification.
>
> Clearly this can't be the right solutions. zone_dma_bits is now wrong for
> one. It probably breaks other code as well.

zone_dma_bits ends up as 36 if I counted correctly. So DMA_BIT_MASK(36)
is 0xf_ffff_ffff and the phys_limit for your device is below this mask,
so dma_direct_optimal_gfp_mask() does end up setting GFP_DMA. However,
looking at how it sets GFP_DMA32, it is obvious that the code is not set
up for such configurations. I'm also not a big fan of zone_dma_bits
describing a mask that goes well above what the device can access.

A workaround would be for zone_dma_bits to become a *_limit and sort out
all places where we compare masks with masks derived from zone_dma_bits
(e.g. cma_in_zone(), dma_direct_supported()). Alternatively, take the
DMA offset into account when comparing the physical address
corresponding to zone_dma_bits and keep zone_dma_bits small (phys offset
subtracted, so in your case it would be 30 rather than 36).

> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 74c1db8ce271..5fe826ac3a5f 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -136,13 +136,13 @@ static void __init zone_sizes_init(void)
>  	unsigned long max_zone_pfns[MAX_NR_ZONES] = {0};
>  	unsigned int __maybe_unused acpi_zone_dma_bits;
>  	unsigned int __maybe_unused dt_zone_dma_bits;
> -	phys_addr_t __maybe_unused dma32_phys_limit = max_zone_phys(32);
> +	phys_addr_t __maybe_unused dma32_phys_limit = DMA_BIT_MASK(32) + 1;

This effectively gets rid of ZONE_DMA32 for your scenario. It's probably
the right approach. I wouldn't worry about a GFP_DMA32 in such platform
configurations. If a device has a DMA offset, don't bother checking
whether its mask is 32-bit.

>  #ifdef CONFIG_ZONE_DMA
>  	acpi_zone_dma_bits = fls64(acpi_iort_dma_get_max_cpu_address());
>  	dt_zone_dma_bits = fls64(of_dma_get_max_cpu_address(NULL));
>  	zone_dma_bits = min3(32U, dt_zone_dma_bits, acpi_zone_dma_bits);
> -	arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
> +	arm64_dma_phys_limit = of_dma_get_max_cpu_address(NULL) + 1;
>  	max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
>  #endif
>  #ifdef CONFIG_ZONE_DMA32

This would need to work for ACPI as well, so use the max of
acpi_iort_dma_get_max_cpu_address() and of_dma_get_max_cpu_address().
The zone_dma_bits would then be derived from the arm64_dma_phys_limit
rather than the other way around. However, other than correctly defining
the zones, we'd need to decide what the DMA mask actually is (bits from
an offset or it includes the offset), and whether we need to switch to a
limit instead.
Hi Catalin, Petr,

Thanks for your detailed responses. See below a few comments and
questions.

On Thu, Nov 09 2023, Catalin Marinas wrote:
> On Wed, Nov 08, 2023 at 07:30:22PM +0200, Baruch Siach wrote:
>> Consider a bus with this 'dma-ranges' property:
>>
>> #address-cells = <2>;
>> #size-cells = <2>;
>> dma-ranges = <0x00000000 0xc0000000 0x00000008 0x00000000 0x0 0x40000000>;
>>
>> Devices under this bus can see 1GB of DMA range between 3GB-4GB. This
>> range is mapped to CPU memory at 32GB-33GB.
>
> Is this on real hardware or just theoretical for now (the rest of your
> email implies it's real)? Normally I'd expected the first GB (or first
> two) of RAM from 32G to be aliased to the lower 32-bit range for the CPU
> view as well, not just for devices. You'd then get a ZONE_DMA without
> having to play with DMA offsets.

This hardware is currently in fabrication past tapeout. Software tests
are running on FPGA models and software simulators.

CPU view of the 3GB-4GB range is not linear with DMA addresses. That is,
for offset N where 0 <= N <= 1GB, the CPU address 3GB+N does not map to
the same physical location as DMA address 3GB+N. Hardware engineers are
not sure this is fixable. So as is often the case we look at software to
save us. After all, from a hardware perspective this design "works".

>> Current zone_sizes_init() code considers 'dma-ranges' only when it maps
>> to RAM under 4GB, because zone_dma_bits is limited to 32. In this case
>> 'dma-ranges' is ignored in practice, since DMA/DMA32 zones are both
>> assumed to be located under 4GB. The result is that the stmmac driver
>> DMA buffers allocation GFP_DMA32 flag has no effect. As a result DMA
>> buffer allocations fail.
>>
>> The patch below is a crude workaround hack. It makes the DMA zone
>> cover the 1GB memory area that is visible to stmmac DMA as follows:
>>
>> [ 0.000000] Zone ranges:
>> [ 0.000000]   DMA      [mem 0x0000000800000000-0x000000083fffffff]
>> [ 0.000000]   DMA32    empty
>> [ 0.000000]   Normal   [mem 0x0000000840000000-0x0000000bffffffff]
>> ...
>> [ 0.000000] software IO TLB: mapped [mem 0x000000083bfff000-0x000000083ffff000] (64MB)
>>
>> With this hack the stmmac driver works on my platform with no
>> modification.
>>
>> Clearly this can't be the right solutions. zone_dma_bits is now wrong for
>> one. It probably breaks other code as well.
>
> zone_dma_bits ends up as 36 if I counted correctly. So DMA_BIT_MASK(36)
> is 0xf_ffff_ffff and the phys_limit for your device is below this mask,
> so dma_direct_optimal_gfp_mask() does end up setting GFP_DMA. However,
> looking at how it sets GFP_DMA32, it is obvious that the code is not set
> up for such configurations. I'm also not a big fan of zone_dma_bits
> describing a mask that goes well above what the device can access.
>
> A workaround would be for zone_dma_bits to become a *_limit and sort out
> all places where we compare masks with masks derived from zone_dma_bits
> (e.g. cma_in_zone(), dma_direct_supported()).

I was also thinking along these lines. I wasn't sure I saw the entire
picture, so I hesitated to suggest a patch. Specifically, the assumption
that DMA range limits are a power of 2 looks deeply ingrained in the
code. Another assumption is that the DMA32 zone is in the low 4GB range.

I can work on an RFC implementation of this approach.

Petr suggested a more radical solution of per-bus DMA constraints to
replace the DMA/DMA32 zones. As Petr acknowledged, this does not look
like a near-future solution.

> Alternatively, take the DMA offset into account when comparing the
> physical address corresponding to zone_dma_bits and keep zone_dma_bits
> small (phys offset subtracted, so in your case it would be 30 rather
> than 36).

I am not following here. Care to elaborate?

Thanks,
baruch
On Sun, Nov 12, 2023 at 09:25:46AM +0200, Baruch Siach wrote:
> On Thu, Nov 09 2023, Catalin Marinas wrote:
> > On Wed, Nov 08, 2023 at 07:30:22PM +0200, Baruch Siach wrote:
> >> Consider a bus with this 'dma-ranges' property:
> >>
> >> #address-cells = <2>;
> >> #size-cells = <2>;
> >> dma-ranges = <0x00000000 0xc0000000 0x00000008 0x00000000 0x0 0x40000000>;
> >>
> >> Devices under this bus can see 1GB of DMA range between 3GB-4GB. This
> >> range is mapped to CPU memory at 32GB-33GB.
> >
> > Is this on real hardware or just theoretical for now (the rest of your
> > email implies it's real)? Normally I'd expected the first GB (or first
> > two) of RAM from 32G to be aliased to the lower 32-bit range for the CPU
> > view as well, not just for devices. You'd then get a ZONE_DMA without
> > having to play with DMA offsets.
>
> This hardware is currently in fabrication past tapeout. Software tests
> are running on FPGA models and software simulators.
>
> CPU view of the 3GB-4GB range is not linear with DMA addresses. That is,
> for offset N where 0 <= N <= 1GB, the CPU address 3GB+N does not map to
> the same physical location of DMA address 3GB+N. Hardware engineers are
> not sure this is fixable. So as is often the case we look at software to
> save us. After all, from hardware perspective this design "works".

We had a similar setup before with the AMD Seattle platform and the
random assumption that the first 4GB of the start of DRAM are used for
ZONE_DMA32. We reverted that since.

> >> Current zone_sizes_init() code considers 'dma-ranges' only when it maps
> >> to RAM under 4GB, because zone_dma_bits is limited to 32. In this case
> >> 'dma-ranges' is ignored in practice, since DMA/DMA32 zones are both
> >> assumed to be located under 4GB. The result is that the stmmac driver
> >> DMA buffers allocation GFP_DMA32 flag has no effect. As a result DMA
> >> buffer allocations fail.
> >>
> >> The patch below is a crude workaround hack. It makes the DMA zone
> >> cover the 1GB memory area that is visible to stmmac DMA as follows:
> >>
> >> [ 0.000000] Zone ranges:
> >> [ 0.000000]   DMA      [mem 0x0000000800000000-0x000000083fffffff]
> >> [ 0.000000]   DMA32    empty
> >> [ 0.000000]   Normal   [mem 0x0000000840000000-0x0000000bffffffff]
> >> ...
> >> [ 0.000000] software IO TLB: mapped [mem 0x000000083bfff000-0x000000083ffff000] (64MB)
> >>
> >> With this hack the stmmac driver works on my platform with no
> >> modification.
> >>
> >> Clearly this can't be the right solutions. zone_dma_bits is now wrong for
> >> one. It probably breaks other code as well.
> >
> > zone_dma_bits ends up as 36 if I counted correctly. So DMA_BIT_MASK(36)
> > is 0xf_ffff_ffff and the phys_limit for your device is below this mask,
> > so dma_direct_optimal_gfp_mask() does end up setting GFP_DMA. However,
> > looking at how it sets GFP_DMA32, it is obvious that the code is not set
> > up for such configurations. I'm also not a big fan of zone_dma_bits
> > describing a mask that goes well above what the device can access.
> >
> > A workaround would be for zone_dma_bits to become a *_limit and sort out
> > all places where we compare masks with masks derived from zone_dma_bits
> > (e.g. cma_in_zone(), dma_direct_supported()).
>
> I was also thinking along these lines. I wasn't sure I see the entire
> picture, so I hesitated to suggest a patch. Specifically, the assumption
> that DMA range limits are power of 2 looks deeply ingrained in the
> code. Another assumption is that DMA32 zone is in low 4GB range.

I think we can safely assume that the DMA range (difference between the
min and max addresses) is a power of two. Once the DMA/CPU offsets are
added, this assumption no longer holds. The kernel assumes that the bits
in the DMA range as seen by the device would match the CPU addresses.
More below.

> I can work on an RFC implementation of this approach.
>
> Petr suggested a more radical solution of per bus DMA constraints to
> replace DMA/DMA32 zones. As Petr acknowledged, this does not look like a
> near future solution.

Petr's suggestion is more generic indeed but I reckon it will take a
long time to implement and upstream.

> > Alternatively, take the DMA offset into account when comparing the
> > physical address corresponding to zone_dma_bits and keep zone_dma_bits
> > small (phys offset subtracted, so in your case it would be 30 rather
> > than 36).
>
> I am not following here. Care to elaborate?

In your case, zone_dma_bits would be 30, covering 1GB. When doing all
those comparisons in the generic code, make sure that the
(1 << zone_dma_bits) value is used relative to the CPU offset or the
start of DRAM rather than as an absolute limit. Similarly for
ZONE_DMA32, it wouldn't be the 32-bit of the CPU address but rather the
32-bit from the start of the RAM, on the assumption that the hardware
lifts the device access into the correct physical address.

So, I think it's doable keeping zone_dma_bits as 'bits' rather than a
limit, it just needs some updates throughout the generic code. Now, the
DMA API maintainers may have a different opinion. Robin was away on
holiday last week so I couldn't check with him.
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 74c1db8ce271..5fe826ac3a5f 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -136,13 +136,13 @@ static void __init zone_sizes_init(void)
 	unsigned long max_zone_pfns[MAX_NR_ZONES] = {0};
 	unsigned int __maybe_unused acpi_zone_dma_bits;
 	unsigned int __maybe_unused dt_zone_dma_bits;
-	phys_addr_t __maybe_unused dma32_phys_limit = max_zone_phys(32);
+	phys_addr_t __maybe_unused dma32_phys_limit = DMA_BIT_MASK(32) + 1;
 
 #ifdef CONFIG_ZONE_DMA
 	acpi_zone_dma_bits = fls64(acpi_iort_dma_get_max_cpu_address());
 	dt_zone_dma_bits = fls64(of_dma_get_max_cpu_address(NULL));
 	zone_dma_bits = min3(32U, dt_zone_dma_bits, acpi_zone_dma_bits);
-	arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
+	arm64_dma_phys_limit = of_dma_get_max_cpu_address(NULL) + 1;
 	max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
 #endif
 #ifdef CONFIG_ZONE_DMA32