[v2] dma-mapping: fix dma_addressing_limited if dma_range_map can't cover all system RAM
Commit Message
There is an unusual case where the range map covers right up to the top of
system RAM but leaves a hole somewhere lower down. This prevents the NVMe
device from mapping DMA buffers in the phys_to_dma() checking path and
causes hangs at boot.
E.g. on an Armv8 Ampere server, the DSDT ACPI table contains:
Method (_DMA, 0, Serialized)  // _DMA: Direct Memory Access
{
    Name (RBUF, ResourceTemplate ()
    {
        QWordMemory (ResourceConsumer, PosDecode, MinFixed,
            MaxFixed, Cacheable, ReadWrite,
            0x0000000000000000, // Granularity
            0x0000000000000000, // Range Minimum
            0x00000000FFFFFFFF, // Range Maximum
            0x0000000000000000, // Translation Offset
            0x0000000100000000, // Length
            ,, , AddressRangeMemory, TypeStatic)
        QWordMemory (ResourceConsumer, PosDecode, MinFixed,
            MaxFixed, Cacheable, ReadWrite,
            0x0000000000000000, // Granularity
            0x0000006010200000, // Range Minimum
            0x000000602FFFFFFF, // Range Maximum
            0x0000000000000000, // Translation Offset
            0x000000001FE00000, // Length
            ,, , AddressRangeMemory, TypeStatic)
        QWordMemory (ResourceConsumer, PosDecode, MinFixed,
            MaxFixed, Cacheable, ReadWrite,
            0x0000000000000000, // Granularity
            0x00000060F0000000, // Range Minimum
            0x00000060FFFFFFFF, // Range Maximum
            0x0000000000000000, // Translation Offset
            0x0000000010000000, // Length
            ,, , AddressRangeMemory, TypeStatic)
        QWordMemory (ResourceConsumer, PosDecode, MinFixed,
            MaxFixed, Cacheable, ReadWrite,
            0x0000000000000000, // Granularity
            0x0000007000000000, // Range Minimum
            0x000003FFFFFFFFFF, // Range Maximum
            0x0000000000000000, // Translation Offset
            0x0000039000000000, // Length
            ,, , AddressRangeMemory, TypeStatic)
    })
But the System RAM ranges are:
$ cat /proc/iomem | grep -i ram
90000000-91ffffff : System RAM
92900000-fffbffff : System RAM
880000000-fffffffff : System RAM
8800000000-bff5990fff : System RAM
bff59d0000-bff5a4ffff : System RAM
bff8000000-bfffffffff : System RAM
So some RAM ranges fall outside the dma_range_map: e.g. 880000000-fffffffff
lies entirely in the hole between the first and second _DMA windows.
Fix this by checking whether each System RAM resource is fully encompassed
within the dma_range_map.
Signed-off-by: Jia He <justin.he@arm.com>
---
v2:
- refine the subject and commit msg (By Robin Murphy)
- refactor the checking loop in check_ram_in_range_map() to work in units
of pages, avoiding wrap-around to 0 on 32-bit platforms (Robin)
include/linux/dma-mapping.h | 12 +++++++++--
kernel/dma/mapping.c | 42 +++++++++++++++++++++++++++++++++++++
2 files changed, 52 insertions(+), 2 deletions(-)
Comments
dma_addressing_limited is called for every dma (direct) map, and this
new code is way too heavy for that. We'll need to cache the information
somehow.
Hi Christoph,
> -----Original Message-----
> From: Christoph Hellwig <hch@lst.de>
> Sent: Tuesday, October 10, 2023 3:48 PM
> To: Justin He <Justin.He@arm.com>
> Cc: Christoph Hellwig <hch@lst.de>; Marek Szyprowski
> <m.szyprowski@samsung.com>; Robin Murphy <Robin.Murphy@arm.com>;
> iommu@lists.linux.dev; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v2] dma-mapping: fix dma_addressing_limited if
> dma_range_map can't cover all system RAM
>
> dma_addressing_limited is called for every dma (direct) map, and this new
> code is way too heavy for that. We'll need to cache the information somehow.
Yes, maybe a flag is needed that is set to true once the heavy work is done.
The flag would then have to be reset to false whenever the System RAM
resources change. Let me check all the cases.
--
Cheers,
Justin (Jia He)
On 2023-10-10 08:48, Christoph Hellwig wrote:
> dma_addressing_limited is called for every dma (direct) map, and this
> new code is way too heavy for that. We'll need to cache the information
> somehow.
But is it? That was my instinctive reaction as well, but AFAICS it is
only actually used by drivers in setup paths, either directly or via
dma_max_mapping_size(). The dma_capable() check which we *do* use for
every actual mapping still just checks whether the given physical
address exists in the range map. I believe the underlying problem here
is when dma_capable() says the mapping needs bouncing, but then it's too
big for that to succeed, since dma_max_mapping_size() gave the wrong
answer to start with.
Thanks,
Robin.
On Tue, Oct 10, 2023 at 10:40:44AM +0100, Robin Murphy wrote:
> On 2023-10-10 08:48, Christoph Hellwig wrote:
>> dma_addressing_limited is called for every dma (direct) map, and this
>> new code is way too heavy for that. We'll need to cache the information
>> somehow.
>
> But is it? That was my instinctive reaction as well, but AFAICS it is only
> actually used by drivers in setup paths, either directly or via
> dma_max_mapping_size(). The dma_capable() check which we *do* use for every
> actual mapping still just checks whether the given physical address exists
> in the range map. I believe the underlying problem here is when
> dma_capable() says the mapping needs bouncing, but then it's too big for
> that to succeed, since dma_max_mapping_size() gave the wrong answer to
> start with.
Ah, indeed, I somehow misremembered calling it in the mapping code.
Justin, can you still respin this a bit, add a prep patch to move
dma_addressing_limited so that it is exported instead of the new
low-level helper, and fix up coding style issues like the overly
long lines if possible? If it's not perfect I'll fix up the rest
later.
Hi,
> -----Original Message-----
> From: Christoph Hellwig <hch@lst.de>
> Sent: Wednesday, October 11, 2023 1:11 PM
> To: Robin Murphy <Robin.Murphy@arm.com>
> Cc: Christoph Hellwig <hch@lst.de>; Justin He <Justin.He@arm.com>; Marek
> Szyprowski <m.szyprowski@samsung.com>; iommu@lists.linux.dev;
> linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v2] dma-mapping: fix dma_addressing_limited if
> dma_range_map can't cover all system RAM
>
>
[snip]
> Ah, indeed, I somehow misremembered calling it in the mapping code.
>
> Justin, can you still respin this a bit, add a prep patch to move
> dma_addressing_limited so that it is exported instead of the new low-level
> helper, and fix up coding style issues like the overly long lines if possible? If
> it's not perfect I'll fix up the rest later.
Sure, thanks for the suggestion.
--
Cheers,
Justin (Jia He)
@@ -144,6 +144,7 @@ bool dma_pci_p2pdma_supported(struct device *dev);
 int dma_set_mask(struct device *dev, u64 mask);
 int dma_set_coherent_mask(struct device *dev, u64 mask);
 u64 dma_get_required_mask(struct device *dev);
+bool all_ram_in_dma_range_map(struct device *dev);
 size_t dma_max_mapping_size(struct device *dev);
 size_t dma_opt_mapping_size(struct device *dev);
 bool dma_need_sync(struct device *dev, dma_addr_t dma_addr);
@@ -264,6 +265,10 @@ static inline u64 dma_get_required_mask(struct device *dev)
 {
 	return 0;
 }
+static inline bool all_ram_in_dma_range_map(struct device *dev)
+{
+	return true;
+}
 static inline size_t dma_max_mapping_size(struct device *dev)
 {
 	return 0;
@@ -475,8 +480,11 @@ static inline int dma_coerce_mask_and_coherent(struct device *dev, u64 mask)
  */
 static inline bool dma_addressing_limited(struct device *dev)
 {
-	return min_not_zero(dma_get_mask(dev), dev->bus_dma_limit) <
-		dma_get_required_mask(dev);
+	if (min_not_zero(dma_get_mask(dev), dev->bus_dma_limit) <
+	    dma_get_required_mask(dev))
+		return true;
+
+	return !all_ram_in_dma_range_map(dev);
 }
static inline unsigned int dma_get_max_seg_size(struct device *dev)
@@ -7,6 +7,7 @@
  */
 #include <linux/memblock.h> /* for max_pfn */
 #include <linux/acpi.h>
+#include <linux/dma-direct.h> /* for bus_dma_region */
 #include <linux/dma-map-ops.h>
 #include <linux/export.h>
 #include <linux/gfp.h>
@@ -819,6 +820,47 @@ size_t dma_opt_mapping_size(struct device *dev)
 }
 EXPORT_SYMBOL_GPL(dma_opt_mapping_size);
+/*
+ * Check whether all System RAM resource ranges are covered by the
+ * dma_range_map. Returns 0 to continue checking further ranges, or 1
+ * as soon as some RAM range can't be mapped within the dma_range_map.
+ */
+static int check_ram_in_range_map(unsigned long start_pfn,
+				  unsigned long nr_pages, void *data)
+{
+	unsigned long end_pfn = start_pfn + nr_pages;
+	struct device *dev = data;
+	const struct bus_dma_region *m;
+
+	while (start_pfn < end_pfn) {
+		/* Reset per iteration so a stale match can't hide a hole. */
+		const struct bus_dma_region *region = NULL;
+
+		for (m = dev->dma_range_map; PFN_DOWN(m->size); m++) {
+			unsigned long cpu_start_pfn = PFN_DOWN(m->cpu_start);
+
+			if (start_pfn >= cpu_start_pfn &&
+			    start_pfn - cpu_start_pfn < PFN_DOWN(m->size)) {
+				region = m;
+				break;
+			}
+		}
+		if (!region)
+			return 1;
+
+		start_pfn = PFN_DOWN(region->cpu_start) + PFN_DOWN(region->size);
+	}
+
+	return 0;
+}
+
+bool all_ram_in_dma_range_map(struct device *dev)
+{
+	if (!dev->dma_range_map)
+		return true;
+
+	return !walk_system_ram_range(0, ULONG_MAX, dev, check_ram_in_range_map);
+}
+EXPORT_SYMBOL_GPL(all_ram_in_dma_range_map);
+
 bool dma_need_sync(struct device *dev, dma_addr_t dma_addr)
 {
 	const struct dma_map_ops *ops = get_dma_ops(dev);