[v2] dma-mapping: fix dma_addressing_limited if dma_range_map can't cover all system RAM

Message ID 20231010020835.3678-1-justin.he@arm.com
State New
Headers
Series [v2] dma-mapping: fix dma_addressing_limited if dma_range_map can't cover all system RAM |

Commit Message

Justin He Oct. 10, 2023, 2:08 a.m. UTC
  There is an unusual case that the range map covers right up to the top of
system RAM, but leaves a hole somewhere lower down. Then it prevents the
nvme device dma mapping in the checking path of phys_to_dma() and causes
the hangs at boot.

E.g. On an Armv8 Ampere server, the dsdt ACPI table is:
 Method (_DMA, 0, Serialized)  // _DMA: Direct Memory Access
            {
                Name (RBUF, ResourceTemplate ()
                {
                    QWordMemory (ResourceConsumer, PosDecode, MinFixed,
MaxFixed, Cacheable, ReadWrite,
                        0x0000000000000000, // Granularity
                        0x0000000000000000, // Range Minimum
                        0x00000000FFFFFFFF, // Range Maximum
                        0x0000000000000000, // Translation Offset
                        0x0000000100000000, // Length
                        ,, , AddressRangeMemory, TypeStatic)
                    QWordMemory (ResourceConsumer, PosDecode, MinFixed,
MaxFixed, Cacheable, ReadWrite,
                        0x0000000000000000, // Granularity
                        0x0000006010200000, // Range Minimum
                        0x000000602FFFFFFF, // Range Maximum
                        0x0000000000000000, // Translation Offset
                        0x000000001FE00000, // Length
                        ,, , AddressRangeMemory, TypeStatic)
                    QWordMemory (ResourceConsumer, PosDecode, MinFixed,
MaxFixed, Cacheable, ReadWrite,
                        0x0000000000000000, // Granularity
                        0x00000060F0000000, // Range Minimum
                        0x00000060FFFFFFFF, // Range Maximum
                        0x0000000000000000, // Translation Offset
                        0x0000000010000000, // Length
                        ,, , AddressRangeMemory, TypeStatic)
                    QWordMemory (ResourceConsumer, PosDecode, MinFixed,
MaxFixed, Cacheable, ReadWrite,
                        0x0000000000000000, // Granularity
                        0x0000007000000000, // Range Minimum
                        0x000003FFFFFFFFFF, // Range Maximum
                        0x0000000000000000, // Translation Offset
                        0x0000039000000000, // Length
                        ,, , AddressRangeMemory, TypeStatic)
                })

But the System RAM ranges are:
cat /proc/iomem |grep -i ram
90000000-91ffffff : System RAM
92900000-fffbffff : System RAM
880000000-fffffffff : System RAM
8800000000-bff5990fff : System RAM
bff59d0000-bff5a4ffff : System RAM
bff8000000-bfffffffff : System RAM
So some RAM ranges are out of dma_range_map.

Fixes it by checking whether each of the system RAM resources can be
properly encompassed within the dma_range_map.

Signed-off-by: Jia He <justin.he@arm.com>
---
v2:
  - refine the subject and commit msg (By Robin Murphy)
  - refactor the checking loop in check_ram_in_range_map() in the pages
    unit to avoid wrap to 0 on 32bits platforms (Robin)

 include/linux/dma-mapping.h | 12 +++++++++--
 kernel/dma/mapping.c        | 42 +++++++++++++++++++++++++++++++++++++
 2 files changed, 52 insertions(+), 2 deletions(-)
  

Comments

Christoph Hellwig Oct. 10, 2023, 7:48 a.m. UTC | #1
dma_addressing_limited is called for every dma (direct) map, and this
new code is way to heavy for that.  We'll need to cache the information
somehow.
  
Justin He Oct. 10, 2023, 8:42 a.m. UTC | #2
Hi Christoph,

> -----Original Message-----
> From: Christoph Hellwig <hch@lst.de>
> Sent: Tuesday, October 10, 2023 3:48 PM
> To: Justin He <Justin.He@arm.com>
> Cc: Christoph Hellwig <hch@lst.de>; Marek Szyprowski
> <m.szyprowski@samsung.com>; Robin Murphy <Robin.Murphy@arm.com>;
> iommu@lists.linux.dev; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v2] dma-mapping: fix dma_addressing_limited if
> dma_range_map can't cover all system RAM
> 
> dma_addressing_limited is called for every dma (direct) map, and this new
> code is way to heavy for that.  We'll need to cache the information somehow.
Yes, maybe a flag is needed to set to true once the heavy work is done.
Hence, the flag should be set to false when the SYSTEM RAM resources are changed.
Let me check all the cases.


--
Cheers,
Justin (Jia He)
  
Robin Murphy Oct. 10, 2023, 9:40 a.m. UTC | #3
On 2023-10-10 08:48, Christoph Hellwig wrote:
> dma_addressing_limited is called for every dma (direct) map, and this
> new code is way to heavy for that.  We'll need to cache the information
> somehow.

But is it? That was my instinctive reaction as well, but AFAICS it is 
only actually used by drivers in setup paths, either directly or via 
dma_max_mapping_size(). The dma_capable() check which we *do* use for 
every actual mapping still just checks whether the given physical 
address exists in the range map. I believe the underlying problem here 
is when dma_capable() say the mapping needs bouncing, but then it's too 
big for that to succeed, since dma_max_mapping_size() gave the wrong 
answer to start with.

Thanks,
Robin.
  
Christoph Hellwig Oct. 11, 2023, 5:11 a.m. UTC | #4
On Tue, Oct 10, 2023 at 10:40:44AM +0100, Robin Murphy wrote:
> On 2023-10-10 08:48, Christoph Hellwig wrote:
>> dma_addressing_limited is called for every dma (direct) map, and this
>> new code is way to heavy for that.  We'll need to cache the information
>> somehow.
>
> But is it? That was my instinctive reaction as well, but AFAICS it is only 
> actually used by drivers in setup paths, either directly or via 
> dma_max_mapping_size(). The dma_capable() check which we *do* use for every 
> actual mapping still just checks whether the given physical address exists 
> in the range map. I believe the underlying problem here is when 
> dma_capable() say the mapping needs bouncing, but then it's too big for 
> that to succeed, since dma_max_mapping_size() gave the wrong answer to 
> start with.

Ah, indeed, I somehw misremembered calling it in the mapping code.

Justing, can you still respin this a bit, add a prep patch to move
dma_addressing_limited so that it is exported instead of the new
low-level helper, and fix up coding style issues like the overly
long lines of possible?  If it's not perfect I'll fix up the rest
later.
  
Justin He Oct. 16, 2023, 2:59 a.m. UTC | #5
Hi,

> -----Original Message-----
> From: Christoph Hellwig <hch@lst.de>
> Sent: Wednesday, October 11, 2023 1:11 PM
> To: Robin Murphy <Robin.Murphy@arm.com>
> Cc: Christoph Hellwig <hch@lst.de>; Justin He <Justin.He@arm.com>; Marek
> Szyprowski <m.szyprowski@samsung.com>; iommu@lists.linux.dev;
> linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v2] dma-mapping: fix dma_addressing_limited if
> dma_range_map can't cover all system RAM
>
>
[snip]
> Ah, indeed, I somehw misremembered calling it in the mapping code.
>
> Justing, can you still respin this a bit, add a prep patch to move
> dma_addressing_limited so that it is exported instead of the new low-level
> helper, and fix up coding style issues like the overly long lines of possible?  If
> it's not perfect I'll fix up the rest later.
Sure, thanks for the suggestion.


--
Cheers,
Justin (Jia He)
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
  

Patch

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index f0ccca16a0ac..b8abbb2619a2 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -144,6 +144,7 @@  bool dma_pci_p2pdma_supported(struct device *dev);
 int dma_set_mask(struct device *dev, u64 mask);
 int dma_set_coherent_mask(struct device *dev, u64 mask);
 u64 dma_get_required_mask(struct device *dev);
+bool all_ram_in_dma_range_map(struct device *dev);
 size_t dma_max_mapping_size(struct device *dev);
 size_t dma_opt_mapping_size(struct device *dev);
 bool dma_need_sync(struct device *dev, dma_addr_t dma_addr);
@@ -264,6 +265,10 @@  static inline u64 dma_get_required_mask(struct device *dev)
 {
 	return 0;
 }
+static inline bool all_ram_in_dma_range_map(struct device *dev)
+{
+	return 1;
+}
 static inline size_t dma_max_mapping_size(struct device *dev)
 {
 	return 0;
@@ -475,8 +480,11 @@  static inline int dma_coerce_mask_and_coherent(struct device *dev, u64 mask)
  */
 static inline bool dma_addressing_limited(struct device *dev)
 {
-	return min_not_zero(dma_get_mask(dev), dev->bus_dma_limit) <
-			    dma_get_required_mask(dev);
+	if (min_not_zero(dma_get_mask(dev), dev->bus_dma_limit) <
+			    dma_get_required_mask(dev))
+		return true;
+
+	return !all_ram_in_dma_range_map(dev);
 }
 
 static inline unsigned int dma_get_max_seg_size(struct device *dev)
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index e323ca48f7f2..5f454949a428 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -7,6 +7,7 @@ 
  */
 #include <linux/memblock.h> /* for max_pfn */
 #include <linux/acpi.h>
+#include <linux/dma-direct.h> /* for bus_dma_region */
 #include <linux/dma-map-ops.h>
 #include <linux/export.h>
 #include <linux/gfp.h>
@@ -819,6 +820,47 @@  size_t dma_opt_mapping_size(struct device *dev)
 }
 EXPORT_SYMBOL_GPL(dma_opt_mapping_size);
 
+/*
+ * To check whether all ram resource ranges are mapped in dma range map
+ * Returns 0 when continuous check is needed
+ * Returns 1 if there is some ram range can't be mapped to dma_range_map
+ */
+static int check_ram_in_range_map(unsigned long start_pfn,
+				  unsigned long nr_pages, void *data)
+{
+	unsigned long end_pfn = start_pfn + nr_pages;
+	struct device *dev = (struct device *)data;
+	struct bus_dma_region *region = NULL;
+	const struct bus_dma_region *m;
+
+	while (start_pfn < end_pfn) {
+		for (m = dev->dma_range_map; PFN_DOWN(m->size); m++) {
+			unsigned long cpu_start_pfn = PFN_DOWN(m->cpu_start);
+
+			if (start_pfn >= cpu_start_pfn
+			    && start_pfn - cpu_start_pfn < PFN_DOWN(m->size)) {
+				region = (struct bus_dma_region *)m;
+				break;
+			}
+		}
+		if (!region)
+			return 1;
+
+		start_pfn = PFN_DOWN(region->cpu_start) + PFN_DOWN(region->size);
+	}
+
+	return 0;
+}
+
+bool all_ram_in_dma_range_map(struct device *dev)
+{
+	if (!dev->dma_range_map)
+		return 1;
+
+	return !walk_system_ram_range(0, ULONG_MAX, dev, check_ram_in_range_map);
+}
+EXPORT_SYMBOL_GPL(all_ram_in_dma_range_map);
+
 bool dma_need_sync(struct device *dev, dma_addr_t dma_addr)
 {
 	const struct dma_map_ops *ops = get_dma_ops(dev);