[v7,7/7] drm/etnaviv: add support for the dma coherent device
Commit Message
From: Sui Jingfeng <suijingfeng@loongson.cn>
Loongson CPUs maintain cache coherency by hardware, which means that the
data in the CPU cache is identical to the data in main system memory. As
for the peripheral device, most of Loongson chips chose to define the
peripherals as DMA coherent by default, device drivers do not need to
maintain the coherency between a processor and an I/O device manually.
There are exceptions, for LS2K1000 SoC, part of peripheral device can be
configured as dma non-coherent. But there is no released version of such
firmware exist in the market. Peripherals of older ls2k1000 is also DMA
non-conherent, but they are nearly outdated. So, those are trivial cases.
Nevertheless, kernel space still need to do probe work, because vivante GPU
IP has been integrated into various platform. Hence, this patch add runtime
detection code to probe if a specific gpu is DMA coherent, If the answer is
yes, we are going to utilize such features. On Loongson platfform, When a
buffer is accesed by both the GPU and the CPU, The driver should prefer
ETNA_BO_CACHED over ETNA_BO_WC.
This patch also add a new parameter: etnaviv_param_gpu_coherent, which
allow userspace to know if such a feature is available. Because
write-combined BO is still preferred in some case, especially where don't
need CPU read, for example, uploading shader bin.
Signed-off-by: Sui Jingfeng <suijingfeng@loongson.cn>
---
drivers/gpu/drm/etnaviv/etnaviv_drv.c | 36 +++++++++++++++++++++
drivers/gpu/drm/etnaviv/etnaviv_drv.h | 6 ++++
drivers/gpu/drm/etnaviv/etnaviv_gem.c | 22 ++++++++++---
drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c | 7 +++-
drivers/gpu/drm/etnaviv/etnaviv_gpu.c | 7 +++-
include/uapi/drm/etnaviv_drm.h | 1 +
6 files changed, 73 insertions(+), 6 deletions(-)
Comments
On Sat, Jun 03, 2023 at 06:59:43PM +0800, Sui Jingfeng wrote:
> From: Sui Jingfeng <suijingfeng@loongson.cn>
>
> Loongson CPUs maintain cache coherency by hardware, which means that the
> data in the CPU cache is identical to the data in main system memory. As
> for the peripheral device, most of Loongson chips chose to define the
> peripherals as DMA coherent by default, device drivers do not need to
> maintain the coherency between a processor and an I/O device manually.
>
> There are exceptions, for LS2K1000 SoC, part of peripheral device can be
> configured as dma non-coherent. But there is no released version of such
> firmware exist in the market. Peripherals of older ls2k1000 is also DMA
> non-conherent, but they are nearly outdated. So, those are trivial cases.
s/dma/DMA/
s/non-conherent/non-coherent/
s/ls2k1000/LS2K1000/
I guess when you say these are "trivial cases," you mean you don't
care about supporting those devices?
> Nevertheless, kernel space still need to do probe work, because vivante GPU
> IP has been integrated into various platform. Hence, this patch add runtime
> detection code to probe if a specific gpu is DMA coherent, If the answer is
> yes, we are going to utilize such features. On Loongson platfform, When a
> buffer is accesed by both the GPU and the CPU, The driver should prefer
> ETNA_BO_CACHED over ETNA_BO_WC.
s/gpu/GPU/
s/platfform/platform/
s/accesed/accessed/
I guess the only way to discover this coherency attribute is via the
DT "vivante,gc" property? Seems a little weird but I'm really not a
DT person.
> This patch also add a new parameter: etnaviv_param_gpu_coherent, which
> allow userspace to know if such a feature is available. Because
> write-combined BO is still preferred in some case, especially where don't
> need CPU read, for example, uploading shader bin.
> ...
> +static struct device_node *etnaviv_of_first_available_node(void)
> +{
> + struct device_node *core_node;
> +
> + for_each_compatible_node(core_node, NULL, "vivante,gc") {
> + if (!of_device_is_available(core_node))
> + continue;
> +
> + return core_node;
> + }
> +
> + return NULL;
Seems like this would be simpler as:
for_each_compatible_node(core_node, NULL, "vivante,gc") {
if (of_device_is_available(core_node))
return core_node;
}
return NULL;
> +++ b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
> @@ -8,6 +8,7 @@
> #include <linux/delay.h>
> #include <linux/dma-fence.h>
> #include <linux/dma-mapping.h>
> +#include <linux/dma-map-ops.h>
It looks like this #include might not be needed? You're only adding a
new reference to priv->dma_coherent, which looks like it was added to
etnaviv_drv.h.
> #include <linux/module.h>
> #include <linux/of_device.h>
> #include <linux/platform_device.h>
> @@ -164,6 +165,10 @@ int etnaviv_gpu_get_param(struct etnaviv_gpu *gpu, u32 param, u64 *value)
> *value = gpu->identity.eco_id;
> break;
>
> + case ETNAVIV_PARAM_GPU_COHERENT:
> + *value = priv->dma_coherent;
> + break;
> +
> default:
> DBG("%s: invalid param: %u", dev_name(gpu->dev), param);
> return -EINVAL;
> @@ -1861,7 +1866,7 @@ static int etnaviv_gpu_register_irq(struct etnaviv_gpu *gpu, int irq)
>
> gpu->irq = irq;
>
> - dev_info(dev, "IRQ handler registered, irq = %d\n", irq);
> + dev_info(dev, "irq(%d) handler registered\n", irq);
Looks possibly unnecessary, or at least unrelated to this patch.
> return 0;
> }
Hi, I love your reviews
On 2023/6/7 00:56, Bjorn Helgaas wrote:
> On Sat, Jun 03, 2023 at 06:59:43PM +0800, Sui Jingfeng wrote:
>> From: Sui Jingfeng <suijingfeng@loongson.cn>
>>
>> Loongson CPUs maintain cache coherency by hardware, which means that the
>> data in the CPU cache is identical to the data in main system memory. As
>> for the peripheral device, most of Loongson chips chose to define the
>> peripherals as DMA coherent by default, device drivers do not need to
>> maintain the coherency between a processor and an I/O device manually.
>>
>> There are exceptions, for LS2K1000 SoC, part of peripheral device can be
>> configured as dma non-coherent. But there is no released version of such
>> firmware exist in the market. Peripherals of older ls2k1000 is also DMA
>> non-conherent, but they are nearly outdated. So, those are trivial cases.
> s/dma/DMA/
> s/non-conherent/non-coherent/
> s/ls2k1000/LS2K1000/
So sharpen eyes, as before. :-)
> I guess when you say these are "trivial cases," you mean you don't
> care about supporting those devices?
Not exactly, Its just that its kernel side thing.
the kernel side should tell us whether a peripheral device is dma
coherent or not.
I do try to support the LSDC/GC1000 as DMA non-coherent peripheral
device in the past,
It's no fun, Only helps to study knowledge, experiment, testing and
comparison(with the dma coherent case).
It requires me compile the PMON firmware on myself. And flash it to the
ROM the board.
change firmware is complex, there a lot of address windows and cross
bar(control
a access either go the cached path or go the non cached path) setting which
only firmware engineer know about.
If there a user want ask me to do it, I will try to test this driver on
such old chip again.
Now, I believe that the support is *already* there.
As etnaviv already works on DMA non-coherent platform originally.
The DC in Loongson Mips/LoongArch SoC can scanout from cached system RAM.
low-end application relay on CPU software rendering only.
There no need to do something like flush cache(write the data in the cache
back to system ram), invalid the cache. This is pretty convenient.
The old Ingenic SoC(jz4770 for example, mips32) also has vivante gpu
integrated.
but they are dma non-coherent, see more addition material at [1].
Therefore, still need to consider the support from cross(various)
devices perspective.
[1] https://lkml.org/lkml/2021/5/15/177
>> Nevertheless, kernel space still need to do probe work, because vivante GPU
>> IP has been integrated into various platform. Hence, this patch add runtime
>> detection code to probe if a specific gpu is DMA coherent, If the answer is
>> yes, we are going to utilize such features. On Loongson platfform, When a
>> buffer is accesed by both the GPU and the CPU, The driver should prefer
>> ETNA_BO_CACHED over ETNA_BO_WC.
> s/gpu/GPU/
> s/platfform/platform/
> s/accesed/accessed/
OK, will be fixed at next version,
I don't find this on myself. :-(
> I guess the only way to discover this coherency attribute is via the
> DT "vivante,gc" property? Seems a little weird but I'm really not a
> DT person.
I'm not sure it is *only*, but it is very convenient to achieve such a
thing with DT.
Just need to put the "dma-coherent;" or "dma-noncoherent" inside the
"vivante,gc" node.
see of_dma_is_coherent() function.
DT is flexible. But this is just used to describe the hardware, it don't
not changing the hardware.
Put any property only has a influence to the software or driver side.
The hardware still as it is.
Changing hardware to a different working mode probably still need the
firmware(uefi, uboot, pmon etc) to do it
>> This patch also add a new parameter: etnaviv_param_gpu_coherent, which
>> allow userspace to know if such a feature is available. Because
>> write-combined BO is still preferred in some case, especially where don't
>> need CPU read, for example, uploading shader bin.
>> ...
>> +static struct device_node *etnaviv_of_first_available_node(void)
>> +{
>> + struct device_node *core_node;
>> +
>> + for_each_compatible_node(core_node, NULL, "vivante,gc") {
>> + if (!of_device_is_available(core_node))
>> + continue;
>> +
>> + return core_node;
>> + }
>> +
>> + return NULL;
> Seems like this would be simpler as:
>
> for_each_compatible_node(core_node, NULL, "vivante,gc") {
> if (of_device_is_available(core_node))
> return core_node;
> }
>
> return NULL;
Indeed, I don't realize this when I create this patch.
>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
>> @@ -8,6 +8,7 @@
>> #include <linux/delay.h>
>> #include <linux/dma-fence.h>
>> #include <linux/dma-mapping.h>
>> +#include <linux/dma-map-ops.h>
> It looks like this #include might not be needed?
No,
the dev_is_dma_coherent() function is declared and defined in dma-map-ops.h.
if remove it, gcc will complain:
drivers/gpu/drm/etnaviv/etnaviv_drv.c: In function
‘etnaviv_is_dma_coherent’:
drivers/gpu/drm/etnaviv/etnaviv_drv.c:56:14: error: implicit declaration
of function ‘dev_is_dma_coherent’; did you mean
‘etnaviv_is_dma_coherent’? [-Werror=implicit-function-declaration]
56 | coherent = dev_is_dma_coherent(dev);
| ^~~~~~~~~~~~~~~~~~~
> You're only adding a
> new reference to priv->dma_coherent, which looks like it was added to
> etnaviv_drv.h.
>
>> #include <linux/module.h>
>> #include <linux/of_device.h>
>> #include <linux/platform_device.h>
>> @@ -164,6 +165,10 @@ int etnaviv_gpu_get_param(struct etnaviv_gpu *gpu, u32 param, u64 *value)
>> *value = gpu->identity.eco_id;
>> break;
>>
>> + case ETNAVIV_PARAM_GPU_COHERENT:
>> + *value = priv->dma_coherent;
>> + break;
>> +
>> default:
>> DBG("%s: invalid param: %u", dev_name(gpu->dev), param);
>> return -EINVAL;
>> @@ -1861,7 +1866,7 @@ static int etnaviv_gpu_register_irq(struct etnaviv_gpu *gpu, int irq)
>>
>> gpu->irq = irq;
>>
>> - dev_info(dev, "IRQ handler registered, irq = %d\n", irq);
>> + dev_info(dev, "irq(%d) handler registered\n", irq);
> Looks possibly unnecessary, or at least unrelated to this patch.
Indeed, catched by you again.
>> return 0;
>> }
On Wed, Jun 07, 2023 at 02:43:27AM +0800, Sui Jingfeng wrote:
> On 2023/6/7 00:56, Bjorn Helgaas wrote:
> > On Sat, Jun 03, 2023 at 06:59:43PM +0800, Sui Jingfeng wrote:
> > > From: Sui Jingfeng <suijingfeng@loongson.cn>
> > >
> > > Loongson CPUs maintain cache coherency by hardware, which means that the
> > > data in the CPU cache is identical to the data in main system memory. As
> > > for the peripheral device, most of Loongson chips chose to define the
> > > peripherals as DMA coherent by default, device drivers do not need to
> > > maintain the coherency between a processor and an I/O device manually.
> > ...
> > I guess the only way to discover this coherency attribute is via the
> > DT "vivante,gc" property? Seems a little weird but I'm really not a
> > DT person.
>
> I'm not sure it is *only*, but it is very convenient to achieve such a thing
> with DT.
I don't know if this is a property of the PCI device, or a property of
the system as a whole. I asked because PCI devices should be
self-describing (the Device/Vendor ID should be enough to identify the
correct driver, and the driver should know how to learn anything else
it needs to know about the device from PCI config space) and should
not require extra DT properties.
But if this is a CPU or system property, you probably have to use a
firmware interface like DT or ACPI.
> > > +++ b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
> > > @@ -8,6 +8,7 @@
> > > #include <linux/delay.h>
> > > #include <linux/dma-fence.h>
> > > #include <linux/dma-mapping.h>
> > > +#include <linux/dma-map-ops.h>
> >
> > It looks like this #include might not be needed?
>
> No, the dev_is_dma_coherent() function is declared and defined in
> dma-map-ops.h. if remove it, gcc will complain:
>
> drivers/gpu/drm/etnaviv/etnaviv_drv.c: In function
> ‘etnaviv_is_dma_coherent’:
> drivers/gpu/drm/etnaviv/etnaviv_drv.c:56:14: error: implicit declaration of
> function ‘dev_is_dma_coherent’; did you mean ‘etnaviv_is_dma_coherent’?
> [-Werror=implicit-function-declaration]
> 56 | coherent = dev_is_dma_coherent(dev);
> | ^~~~~~~~~~~~~~~~~~~
Of course, but that warning is for etnaviv_drv.c, not for
etnaviv_gpu.c. So etnaviv_drv.c needs to include dma-map-ops.h, but I
don't think etnaviv_gpu.c does. I removed this #include from
etnaviv_gpu.c and it still built without errors.
> > You're only adding a
> > new reference to priv->dma_coherent, which looks like it was added to
> > etnaviv_drv.h.
@@ -5,7 +5,9 @@
#include <linux/component.h>
#include <linux/dma-mapping.h>
+#include <linux/dma-map-ops.h>
#include <linux/module.h>
+#include <linux/of_address.h>
#include <linux/of_platform.h>
#include <linux/uaccess.h>
@@ -27,6 +29,36 @@
#include "etnaviv_pci_drv.h"
#endif
+static struct device_node *etnaviv_of_first_available_node(void)
+{
+ struct device_node *core_node;
+
+ for_each_compatible_node(core_node, NULL, "vivante,gc") {
+ if (!of_device_is_available(core_node))
+ continue;
+
+ return core_node;
+ }
+
+ return NULL;
+}
+
+static bool etnaviv_is_dma_coherent(struct device *dev)
+{
+ struct device_node *np;
+ bool coherent;
+
+ np = etnaviv_of_first_available_node();
+ if (np) {
+ coherent = of_dma_is_coherent(np);
+ of_node_put(np);
+ } else {
+ coherent = dev_is_dma_coherent(dev);
+ }
+
+ return coherent;
+}
+
/*
* etnaviv private data construction and destructions:
*/
@@ -55,6 +87,10 @@ etnaviv_alloc_private(struct device *dev, struct drm_device *drm)
return ERR_PTR(-ENOMEM);
}
+ priv->dma_coherent = etnaviv_is_dma_coherent(dev);
+
+ drm_info(drm, "%s is dma coherent\n", dev_name(dev));
+
return priv;
}
@@ -46,6 +46,12 @@ struct etnaviv_drm_private {
struct xarray active_contexts;
u32 next_context_id;
+ /*
+ * If true, the GPU is capable of snoop cpu cache, here, it also
+ * means that cache coherency is enforced by the hardware.
+ */
+ bool dma_coherent;
+
/* list of GEM objects: */
struct mutex gem_lock;
struct list_head gem_list;
@@ -343,6 +343,7 @@ void *etnaviv_gem_vmap(struct drm_gem_object *obj)
static void *etnaviv_gem_vmap_impl(struct etnaviv_gem_object *obj)
{
struct page **pages;
+ pgprot_t prot;
lockdep_assert_held(&obj->lock);
@@ -350,8 +351,19 @@ static void *etnaviv_gem_vmap_impl(struct etnaviv_gem_object *obj)
if (IS_ERR(pages))
return NULL;
- return vmap(pages, obj->base.size >> PAGE_SHIFT,
- VM_MAP, pgprot_writecombine(PAGE_KERNEL));
+ switch (obj->flags) {
+ case ETNA_BO_CACHED:
+ prot = PAGE_KERNEL;
+ break;
+ case ETNA_BO_UNCACHED:
+ prot = pgprot_noncached(PAGE_KERNEL);
+ break;
+ case ETNA_BO_WC:
+ default:
+ prot = pgprot_writecombine(PAGE_KERNEL);
+ }
+
+ return vmap(pages, obj->base.size >> PAGE_SHIFT, VM_MAP, prot);
}
static inline enum dma_data_direction etnaviv_op_to_dma_dir(u32 op)
@@ -369,6 +381,7 @@ int etnaviv_gem_cpu_prep(struct drm_gem_object *obj, u32 op,
{
struct etnaviv_gem_object *etnaviv_obj = to_etnaviv_bo(obj);
struct drm_device *dev = obj->dev;
+ struct etnaviv_drm_private *priv = dev->dev_private;
bool write = !!(op & ETNA_PREP_WRITE);
int ret;
@@ -395,7 +408,7 @@ int etnaviv_gem_cpu_prep(struct drm_gem_object *obj, u32 op,
return ret == 0 ? -ETIMEDOUT : ret;
}
- if (etnaviv_obj->flags & ETNA_BO_CACHED) {
+ if (!priv->dma_coherent && etnaviv_obj->flags & ETNA_BO_CACHED) {
dma_sync_sgtable_for_cpu(dev->dev, etnaviv_obj->sgt,
etnaviv_op_to_dma_dir(op));
etnaviv_obj->last_cpu_prep_op = op;
@@ -408,8 +421,9 @@ int etnaviv_gem_cpu_fini(struct drm_gem_object *obj)
{
struct drm_device *dev = obj->dev;
struct etnaviv_gem_object *etnaviv_obj = to_etnaviv_bo(obj);
+ struct etnaviv_drm_private *priv = dev->dev_private;
- if (etnaviv_obj->flags & ETNA_BO_CACHED) {
+ if (!priv->dma_coherent && etnaviv_obj->flags & ETNA_BO_CACHED) {
/* fini without a prep is almost certainly a userspace error */
WARN_ON(etnaviv_obj->last_cpu_prep_op == 0);
dma_sync_sgtable_for_device(dev->dev, etnaviv_obj->sgt,
@@ -112,11 +112,16 @@ static const struct etnaviv_gem_ops etnaviv_gem_prime_ops = {
struct drm_gem_object *etnaviv_gem_prime_import_sg_table(struct drm_device *dev,
struct dma_buf_attachment *attach, struct sg_table *sgt)
{
+ struct etnaviv_drm_private *priv = dev->dev_private;
struct etnaviv_gem_object *etnaviv_obj;
size_t size = PAGE_ALIGN(attach->dmabuf->size);
+ u32 cache_flags = ETNA_BO_WC;
int ret, npages;
- ret = etnaviv_gem_new_private(dev, size, ETNA_BO_WC,
+ if (priv->dma_coherent)
+ cache_flags = ETNA_BO_CACHED;
+
+ ret = etnaviv_gem_new_private(dev, size, cache_flags,
&etnaviv_gem_prime_ops, &etnaviv_obj);
if (ret < 0)
return ERR_PTR(ret);
@@ -8,6 +8,7 @@
#include <linux/delay.h>
#include <linux/dma-fence.h>
#include <linux/dma-mapping.h>
+#include <linux/dma-map-ops.h>
#include <linux/module.h>
#include <linux/of_device.h>
#include <linux/platform_device.h>
@@ -164,6 +165,10 @@ int etnaviv_gpu_get_param(struct etnaviv_gpu *gpu, u32 param, u64 *value)
*value = gpu->identity.eco_id;
break;
+ case ETNAVIV_PARAM_GPU_COHERENT:
+ *value = priv->dma_coherent;
+ break;
+
default:
DBG("%s: invalid param: %u", dev_name(gpu->dev), param);
return -EINVAL;
@@ -1861,7 +1866,7 @@ static int etnaviv_gpu_register_irq(struct etnaviv_gpu *gpu, int irq)
gpu->irq = irq;
- dev_info(dev, "IRQ handler registered, irq = %d\n", irq);
+ dev_info(dev, "irq(%d) handler registered\n", irq);
return 0;
}
@@ -77,6 +77,7 @@ struct drm_etnaviv_timespec {
#define ETNAVIV_PARAM_GPU_PRODUCT_ID 0x1c
#define ETNAVIV_PARAM_GPU_CUSTOMER_ID 0x1d
#define ETNAVIV_PARAM_GPU_ECO_ID 0x1e
+#define ETNAVIV_PARAM_GPU_COHERENT 0x1f
#define ETNA_MAX_PIPES 4