drm/imagination: Defer probe if requested firmware is not available

Message ID 20240109120604.603700-1-javierm@redhat.com
State New
Headers
Series drm/imagination: Defer probe if requested firmware is not available |

Commit Message

Javier Martinez Canillas Jan. 9, 2024, 12:05 p.m. UTC
  The device is initialized in the driver's probe callback and as part of
that initialization, the required firmware is loaded. But this fails if
the driver is built-in and the firmware isn't present in the initramfs:

$ dmesg | grep powervr
[    2.969757] powervr fd00000.gpu: Direct firmware load for powervr/rogue_33.15.11.3_v1.fw failed with error -2
[    2.979727] powervr fd00000.gpu: [drm] *ERROR* failed to load firmware powervr/rogue_33.15.11.3_v1.fw (err=-2)
[    2.989885] powervr: probe of fd00000.gpu failed with error -2

$ ls -lh /lib/firmware/powervr/rogue_33.15.11.3_v1.fw.xz
-rw-r--r-- 1 root root 51K Dec 12 19:00 /lib/firmware/powervr/rogue_33.15.11.3_v1.fw.xz

To prevent the probe to fail for this case, let's defer the probe if the
firmware isn't available. That way, the driver core can retry it and get
the probe to eventually succeed once the root filesystem has been mounted.

If the firmware is also not present in the root filesystem, then the probe
will never succeed and the reason listed in the debugfs devices_deferred:

$ cat /sys/kernel/debug/devices_deferred
fd00000.gpu     powervr: failed to load firmware powervr/rogue_33.15.11.3_v1.fw (err=-517)

Fixes: f99f5f3ea7ef ("drm/imagination: Add GPU ID parsing and firmware loading")
Suggested-by: Maxime Ripard <mripard@kernel.org>
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
---

 drivers/gpu/drm/imagination/pvr_device.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)
  

Comments

Daniel Vetter Jan. 9, 2024, 1:22 p.m. UTC | #1
On Tue, Jan 09, 2024 at 01:05:59PM +0100, Javier Martinez Canillas wrote:
> The device is initialized in the driver's probe callback and as part of
> that initialization, the required firmware is loaded. But this fails if
> the driver is built-in and the firmware isn't present in the initramfs:
> 
> $ dmesg | grep powervr
> [    2.969757] powervr fd00000.gpu: Direct firmware load for powervr/rogue_33.15.11.3_v1.fw failed with error -2
> [    2.979727] powervr fd00000.gpu: [drm] *ERROR* failed to load firmware powervr/rogue_33.15.11.3_v1.fw (err=-2)
> [    2.989885] powervr: probe of fd00000.gpu failed with error -2
> 
> $ ls -lh /lib/firmware/powervr/rogue_33.15.11.3_v1.fw.xz
> -rw-r--r-- 1 root root 51K Dec 12 19:00 /lib/firmware/powervr/rogue_33.15.11.3_v1.fw.xz
> 
> To prevent the probe to fail for this case, let's defer the probe if the
> firmware isn't available. That way, the driver core can retry it and get
> the probe to eventually succeed once the root filesystem has been mounted.
> 
> If the firmware is also not present in the root filesystem, then the probe
> will never succeed and the reason listed in the debugfs devices_deferred:
> 
> $ cat /sys/kernel/debug/devices_deferred
> fd00000.gpu     powervr: failed to load firmware powervr/rogue_33.15.11.3_v1.fw (err=-517)
> 
> Fixes: f99f5f3ea7ef ("drm/imagination: Add GPU ID parsing and firmware loading")
> Suggested-by: Maxime Ripard <mripard@kernel.org>
> Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>

Uh that doesn't work.

Probe is for "I'm missing a struct device" and _only_ that. You can't
assume that probe deferral will defer enough until the initrd shows up.

You need to fix this by fixing the initrd to include the required
firmwares. This is what MODULE_FIRMWARE is for, and if your initrd fails
to observe that it's just broken.

Yes I know as long as you have enough stuff built as module so that there
will be _any_ kind of device probe after the root fs is mounted, this
works, because that triggers a re-probe of everything. But that's the most
kind of fragile fix there is.

If you want to change that then I think that needs an official blessing
from Greg KH/device core folks.

Cheers, Sima

> ---
> 
>  drivers/gpu/drm/imagination/pvr_device.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/imagination/pvr_device.c b/drivers/gpu/drm/imagination/pvr_device.c
> index 1704c0268589..6eda25366431 100644
> --- a/drivers/gpu/drm/imagination/pvr_device.c
> +++ b/drivers/gpu/drm/imagination/pvr_device.c
> @@ -295,8 +295,16 @@ pvr_request_firmware(struct pvr_device *pvr_dev)
>  	 */
>  	err = request_firmware(&fw, filename, pvr_dev->base.dev);
>  	if (err) {
> -		drm_err(drm_dev, "failed to load firmware %s (err=%d)\n",
> -			filename, err);
> +		/*
> +		 * Defer probe if the firmware is not available yet (e.g: the driver
> +		 * is built-in and the firmware not present in the initramfs image).
> +		 */
> +		if (err == -ENOENT)
> +			err = -EPROBE_DEFER;
> +
> +		dev_err_probe(drm_dev->dev, err, "failed to load firmware %s (err=%d)\n",
> +			      filename, err);
> +
>  		goto err_free_filename;
>  	}
>  
> -- 
> 2.43.0
>
  
Javier Martinez Canillas Jan. 9, 2024, 1:48 p.m. UTC | #2
Daniel Vetter <daniel@ffwll.ch> writes:

Hello Sima,

Thanks for your feedback.

> On Tue, Jan 09, 2024 at 01:05:59PM +0100, Javier Martinez Canillas wrote:
>> The device is initialized in the driver's probe callback and as part of
>> that initialization, the required firmware is loaded. But this fails if
>> the driver is built-in and the firmware isn't present in the initramfs:
>> 
>> $ dmesg | grep powervr
>> [    2.969757] powervr fd00000.gpu: Direct firmware load for powervr/rogue_33.15.11.3_v1.fw failed with error -2
>> [    2.979727] powervr fd00000.gpu: [drm] *ERROR* failed to load firmware powervr/rogue_33.15.11.3_v1.fw (err=-2)
>> [    2.989885] powervr: probe of fd00000.gpu failed with error -2
>> 
>> $ ls -lh /lib/firmware/powervr/rogue_33.15.11.3_v1.fw.xz
>> -rw-r--r-- 1 root root 51K Dec 12 19:00 /lib/firmware/powervr/rogue_33.15.11.3_v1.fw.xz
>> 
>> To prevent the probe to fail for this case, let's defer the probe if the
>> firmware isn't available. That way, the driver core can retry it and get
>> the probe to eventually succeed once the root filesystem has been mounted.
>> 
>> If the firmware is also not present in the root filesystem, then the probe
>> will never succeed and the reason listed in the debugfs devices_deferred:
>> 
>> $ cat /sys/kernel/debug/devices_deferred
>> fd00000.gpu     powervr: failed to load firmware powervr/rogue_33.15.11.3_v1.fw (err=-517)
>> 
>> Fixes: f99f5f3ea7ef ("drm/imagination: Add GPU ID parsing and firmware loading")
>> Suggested-by: Maxime Ripard <mripard@kernel.org>
>> Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
>
> Uh that doesn't work.
>
> Probe is for "I'm missing a struct device" and _only_ that. You can't
> assume that probe deferral will defer enough until the initrd shows up.
>

Fair.

> You need to fix this by fixing the initrd to include the required
> firmwares. This is what MODULE_FIRMWARE is for, and if your initrd fails
> to observe that it's just broken.
>

Tha's already the case, when is built as a module the initrd (dracut in
this particular case) does figure out that the firmware needs to be added
but that doesn't work when the DRM driver is built-in. Because dracut is
not able to figure out and doesn't even have a powervr.ko info to look at
whatever is set by the MODULE_FIRMWARE macro.

> Yes I know as long as you have enough stuff built as module so that there
> will be _any_ kind of device probe after the root fs is mounted, this
> works, because that triggers a re-probe of everything. But that's the most
> kind of fragile fix there is.
>

Is fragile that's true but on the other hand it does solve the issue in
pratice. The whole device probal mechanism is just a best effort anyways.

> If you want to change that then I think that needs an official blessing
> from Greg KH/device core folks.
>

I liked this approach due its simplicity but an alternative (and more
complex) solution could be to delay the firmware request and not do it at
probe time.

For example, the following (only barely tested) patch solves the issue for
me as well but it's a bigger change to this driver and wasn't sure if will
be acceptable:

From c3fb715047a44691412196d8408f2bd495bcd1ed Mon Sep 17 00:00:00 2001
From: Javier Martinez Canillas <javierm@redhat.com>
Date: Tue, 9 Jan 2024 14:47:05 +0100
Subject: [RFC PATCH] drm/imagination: Move PowerVR GPU init to the drivers's open
 callback
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Currently the device is initialized in the driver's probe callback. But as
part of this initialization, the required firmware is loaded and this will
fail when the driver is built-in, unless FW is included in the initramfs:

$ dmesg | grep powervr
[    2.969757] powervr fd00000.gpu: Direct firmware load for powervr/rogue_33.15.11.3_v1.fw failed with error -2
[    2.979727] powervr fd00000.gpu: [drm] *ERROR* failed to load firmware powervr/rogue_33.15.11.3_v1.fw (err=-2)
[    2.989885] powervr: probe of fd00000.gpu failed with error -2

$ ls -lh /lib/firmware/powervr/rogue_33.15.11.3_v1.fw.xz
-rw-r--r-- 1 root root 51K Dec 12 19:00 /lib/firmware/powervr/rogue_33.15.11.3_v1.fw.xz

To prevent this, let's delay the PowerVR GPU-specific initialization until
the render device is opened by user-space. By then, the root filesystem
will be mounted already and the driver able to find the required firmware.

Besides the mentioned problem, it seems more correct to only load firmware
and request the IRQ if the device is opened rather than do these on probe.

Fixes: f99f5f3ea7ef ("drm/imagination: Add GPU ID parsing and firmware loading")
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
---
 drivers/gpu/drm/imagination/pvr_device.c | 41 +++++++-----------------
 drivers/gpu/drm/imagination/pvr_device.h |  2 ++
 drivers/gpu/drm/imagination/pvr_drv.c    | 19 +++++++----
 3 files changed, 27 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/imagination/pvr_device.c b/drivers/gpu/drm/imagination/pvr_device.c
index 1704c0268589..1e0a3868394e 100644
--- a/drivers/gpu/drm/imagination/pvr_device.c
+++ b/drivers/gpu/drm/imagination/pvr_device.c
@@ -404,7 +404,7 @@ pvr_set_dma_info(struct pvr_device *pvr_dev)
  *  * Any error returned by pvr_memory_context_init(), or
  *  * Any error returned by pvr_request_firmware().
  */
-static int
+int
 pvr_device_gpu_init(struct pvr_device *pvr_dev)
 {
 	int err;
@@ -444,6 +444,10 @@ pvr_device_gpu_init(struct pvr_device *pvr_dev)
 	if (err)
 		goto err_vm_ctx_put;
 
+	err = pvr_device_irq_init(pvr_dev);
+	if (err)
+		goto err_vm_ctx_put;
+
 	return 0;
 
 err_vm_ctx_put:
@@ -459,9 +463,15 @@ pvr_device_gpu_init(struct pvr_device *pvr_dev)
  * pvr_device_gpu_fini() - GPU-specific deinitialization for a PowerVR device
  * @pvr_dev: Target PowerVR device.
  */
-static void
+void
 pvr_device_gpu_fini(struct pvr_device *pvr_dev)
 {
+	/*
+	 * Deinitialization stages are performed in reverse order compared to
+	 * the initialization stages in pvr_device_gpu_init().
+	 */
+	pvr_device_irq_fini(pvr_dev);
+
 	pvr_fw_fini(pvr_dev);
 
 	if (pvr_dev->fw_dev.processor_type != PVR_FW_PROCESSOR_TYPE_MIPS) {
@@ -519,43 +529,16 @@ pvr_device_init(struct pvr_device *pvr_dev)
 	if (err)
 		goto err_pm_runtime_put;
 
-	/* Perform GPU-specific initialization steps. */
-	err = pvr_device_gpu_init(pvr_dev);
-	if (err)
-		goto err_pm_runtime_put;
-
-	err = pvr_device_irq_init(pvr_dev);
-	if (err)
-		goto err_device_gpu_fini;
-
 	pm_runtime_put(dev);
 
 	return 0;
 
-err_device_gpu_fini:
-	pvr_device_gpu_fini(pvr_dev);
-
 err_pm_runtime_put:
 	pm_runtime_put_sync_suspend(dev);
 
 	return err;
 }
 
-/**
- * pvr_device_fini() - Deinitialize a PowerVR device
- * @pvr_dev: Target PowerVR device.
- */
-void
-pvr_device_fini(struct pvr_device *pvr_dev)
-{
-	/*
-	 * Deinitialization stages are performed in reverse order compared to
-	 * the initialization stages in pvr_device_init().
-	 */
-	pvr_device_irq_fini(pvr_dev);
-	pvr_device_gpu_fini(pvr_dev);
-}
-
 bool
 pvr_device_has_uapi_quirk(struct pvr_device *pvr_dev, u32 quirk)
 {
diff --git a/drivers/gpu/drm/imagination/pvr_device.h b/drivers/gpu/drm/imagination/pvr_device.h
index 2ca7e535799f..3083fcd3f91e 100644
--- a/drivers/gpu/drm/imagination/pvr_device.h
+++ b/drivers/gpu/drm/imagination/pvr_device.h
@@ -481,6 +481,8 @@ packed_bvnc_to_pvr_gpu_id(u64 bvnc, struct pvr_gpu_id *gpu_id)
 	gpu_id->c = bvnc & GENMASK_ULL(15, 0);
 }
 
+int pvr_device_gpu_init(struct pvr_device *pvr_dev);
+void pvr_device_gpu_fini(struct pvr_device *pvr_dev);
 int pvr_device_init(struct pvr_device *pvr_dev);
 void pvr_device_fini(struct pvr_device *pvr_dev);
 void pvr_device_reset(struct pvr_device *pvr_dev);
diff --git a/drivers/gpu/drm/imagination/pvr_drv.c b/drivers/gpu/drm/imagination/pvr_drv.c
index 5c3b2d58d766..f8fb45136326 100644
--- a/drivers/gpu/drm/imagination/pvr_drv.c
+++ b/drivers/gpu/drm/imagination/pvr_drv.c
@@ -1309,10 +1309,18 @@ pvr_drm_driver_open(struct drm_device *drm_dev, struct drm_file *file)
 {
 	struct pvr_device *pvr_dev = to_pvr_device(drm_dev);
 	struct pvr_file *pvr_file;
+	int err;
+
+	/* Perform GPU-specific initialization steps. */
+	err = pvr_device_gpu_init(pvr_dev);
+	if (err)
+		return err;
 
 	pvr_file = kzalloc(sizeof(*pvr_file), GFP_KERNEL);
-	if (!pvr_file)
+	if (!pvr_file) {
+		pvr_device_gpu_fini(pvr_dev);
 		return -ENOMEM;
+	}
 
 	/*
 	 * Store reference to base DRM file private data for use by
@@ -1354,6 +1362,7 @@ static void
 pvr_drm_driver_postclose(__always_unused struct drm_device *drm_dev,
 			 struct drm_file *file)
 {
+	struct pvr_device *pvr_dev = to_pvr_device(drm_dev);
 	struct pvr_file *pvr_file = to_pvr_file(file);
 
 	/* Kill remaining contexts. */
@@ -1364,6 +1373,8 @@ pvr_drm_driver_postclose(__always_unused struct drm_device *drm_dev,
 	pvr_destroy_hwrt_datasets_for_file(pvr_file);
 	pvr_destroy_vm_contexts_for_file(pvr_file);
 
+	pvr_device_gpu_fini(pvr_dev);
+
 	kfree(pvr_file);
 	file->driver_priv = NULL;
 }
@@ -1430,16 +1441,13 @@ pvr_probe(struct platform_device *plat_dev)
 
 	err = drm_dev_register(drm_dev, 0);
 	if (err)
-		goto err_device_fini;
+		goto err_watchdog_fini;
 
 	xa_init_flags(&pvr_dev->free_list_ids, XA_FLAGS_ALLOC1);
 	xa_init_flags(&pvr_dev->job_ids, XA_FLAGS_ALLOC1);
 
 	return 0;
 
-err_device_fini:
-	pvr_device_fini(pvr_dev);
-
 err_watchdog_fini:
 	pvr_watchdog_fini(pvr_dev);
 
@@ -1464,7 +1472,6 @@ pvr_remove(struct platform_device *plat_dev)
 	xa_destroy(&pvr_dev->free_list_ids);
 
 	pm_runtime_suspend(drm_dev->dev);
-	pvr_device_fini(pvr_dev);
 	drm_dev_unplug(drm_dev);
 	pvr_watchdog_fini(pvr_dev);
 	pvr_queue_device_fini(pvr_dev);
  
Daniel Vetter Jan. 9, 2024, 2:02 p.m. UTC | #3
On Tue, Jan 09, 2024 at 02:48:42PM +0100, Javier Martinez Canillas wrote:
> Daniel Vetter <daniel@ffwll.ch> writes:
> 
> Hello Sima,
> 
> Thanks for your feedback.
> 
> > On Tue, Jan 09, 2024 at 01:05:59PM +0100, Javier Martinez Canillas wrote:
> >> The device is initialized in the driver's probe callback and as part of
> >> that initialization, the required firmware is loaded. But this fails if
> >> the driver is built-in and the firmware isn't present in the initramfs:
> >> 
> >> $ dmesg | grep powervr
> >> [    2.969757] powervr fd00000.gpu: Direct firmware load for powervr/rogue_33.15.11.3_v1.fw failed with error -2
> >> [    2.979727] powervr fd00000.gpu: [drm] *ERROR* failed to load firmware powervr/rogue_33.15.11.3_v1.fw (err=-2)
> >> [    2.989885] powervr: probe of fd00000.gpu failed with error -2
> >> 
> >> $ ls -lh /lib/firmware/powervr/rogue_33.15.11.3_v1.fw.xz
> >> -rw-r--r-- 1 root root 51K Dec 12 19:00 /lib/firmware/powervr/rogue_33.15.11.3_v1.fw.xz
> >> 
> >> To prevent the probe to fail for this case, let's defer the probe if the
> >> firmware isn't available. That way, the driver core can retry it and get
> >> the probe to eventually succeed once the root filesystem has been mounted.
> >> 
> >> If the firmware is also not present in the root filesystem, then the probe
> >> will never succeed and the reason listed in the debugfs devices_deferred:
> >> 
> >> $ cat /sys/kernel/debug/devices_deferred
> >> fd00000.gpu     powervr: failed to load firmware powervr/rogue_33.15.11.3_v1.fw (err=-517)
> >> 
> >> Fixes: f99f5f3ea7ef ("drm/imagination: Add GPU ID parsing and firmware loading")
> >> Suggested-by: Maxime Ripard <mripard@kernel.org>
> >> Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
> >
> > Uh that doesn't work.
> >
> > Probe is for "I'm missing a struct device" and _only_ that. You can't
> > assume that probe deferral will defer enough until the initrd shows up.
> >
> 
> Fair.
> 
> > You need to fix this by fixing the initrd to include the required
> > firmwares. This is what MODULE_FIRMWARE is for, and if your initrd fails
> > to observe that it's just broken.
> >
> 
> Tha's already the case, when is built as a module the initrd (dracut in
> this particular case) does figure out that the firmware needs to be added
> but that doesn't work when the DRM driver is built-in. Because dracut is
> not able to figure out and doesn't even have a powervr.ko info to look at
> whatever is set by the MODULE_FIRMWARE macro.

Yeah built-in drivers that require firmware don't really work. I'm not
sure it changed, but a while ago you had to actually include these in the
kernel image itself (initrd was again too late), and that gives you
something you can't even ship because it links blobs with gplv2 kernel.

Maybe that changed and the initramfs is set up early enough now that it's
sufficient to have it there ...

Either way I think this needs module/kernel-image build changes so that
the list of firmware images needed for the kernel itself is dumped
somewhere, so that dracut can consume it and tdrt. My take at least.

> > Yes I know as long as you have enough stuff built as module so that there
> > will be _any_ kind of device probe after the root fs is mounted, this
> > works, because that triggers a re-probe of everything. But that's the most
> > kind of fragile fix there is.
> >
> 
> Is fragile that's true but on the other hand it does solve the issue in
> pratice. The whole device probal mechanism is just a best effort anyways.
> 
> > If you want to change that then I think that needs an official blessing
> > from Greg KH/device core folks.
> >
> 
> I liked this approach due its simplicity but an alternative (and more
> complex) solution could be to delay the firmware request and not do it at
> probe time.
> 
> For example, the following (only barely tested) patch solves the issue for
> me as well but it's a bigger change to this driver and wasn't sure if will
> be acceptable:

I think this is still barking up the wrong tree. I think there's two
proper fixes:

- make the "EPROBE_DEFER delays until rootfs no matter what" official and
  documented policy. That's much better than drivers hand-rolling
  EPROBE_DEFER each in their own driver code.

- fix kernel build and dracut so it can pick up the firmware images the
  kernel itself needs. Because having a driver built-in but it still fails
  to load until the rootfs is there is some very confusing failure mode.
  Due to that failure mode I think this is the right fix, otherwise
  built-in drivers become confusing.

  Alternatively I guess you could disallow drm/img as a built-in driver
  ... And also any other driver that requires fw to function.

I don't think a "mostly works due to undocumented driver-specific hack" is
a good fix, since this is entirely a generic issue.

I think it's different if the fw is only needed for optional features,
e.g. for i915 some of the display firmware is only needed for self refresh
and low power modes. And a runtime_pm_get until the firmware has shown up
to prevent mayhem is imo a clean design for that, since the hardware is
fully working aside from using a bit too much power.

> From c3fb715047a44691412196d8408f2bd495bcd1ed Mon Sep 17 00:00:00 2001
> From: Javier Martinez Canillas <javierm@redhat.com>
> Date: Tue, 9 Jan 2024 14:47:05 +0100
> Subject: [RFC PATCH] drm/imagination: Move PowerVR GPU init to the drivers's open
>  callback
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
> 
> Currently the device is initialized in the driver's probe callback. But as
> part of this initialization, the required firmware is loaded and this will
> fail when the driver is built-in, unless FW is included in the initramfs:
> 
> $ dmesg | grep powervr
> [    2.969757] powervr fd00000.gpu: Direct firmware load for powervr/rogue_33.15.11.3_v1.fw failed with error -2
> [    2.979727] powervr fd00000.gpu: [drm] *ERROR* failed to load firmware powervr/rogue_33.15.11.3_v1.fw (err=-2)
> [    2.989885] powervr: probe of fd00000.gpu failed with error -2
> 
> $ ls -lh /lib/firmware/powervr/rogue_33.15.11.3_v1.fw.xz
> -rw-r--r-- 1 root root 51K Dec 12 19:00 /lib/firmware/powervr/rogue_33.15.11.3_v1.fw.xz
> 
> To prevent this, let's delay the PowerVR GPU-specific initialization until
> the render device is opened by user-space. By then, the root filesystem
> will be mounted already and the driver able to find the required firmware.
> 
> Besides the mentioned problem, it seems more correct to only load firmware
> and request the IRQ if the device is opened rather than do these on probe.
> 
> Fixes: f99f5f3ea7ef ("drm/imagination: Add GPU ID parsing and firmware loading")
> Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
> ---

[snip]

> diff --git a/drivers/gpu/drm/imagination/pvr_drv.c b/drivers/gpu/drm/imagination/pvr_drv.c
> index 5c3b2d58d766..f8fb45136326 100644
> --- a/drivers/gpu/drm/imagination/pvr_drv.c
> +++ b/drivers/gpu/drm/imagination/pvr_drv.c
> @@ -1309,10 +1309,18 @@ pvr_drm_driver_open(struct drm_device *drm_dev, struct drm_file *file)
>  {
>  	struct pvr_device *pvr_dev = to_pvr_device(drm_dev);
>  	struct pvr_file *pvr_file;
> +	int err;
> +
> +	/* Perform GPU-specific initialization steps. */
> +	err = pvr_device_gpu_init(pvr_dev);

Ok this is full blas "init hw on first open" drm 1 design. I think what
would be ok somewhat is delaying the drm_dev_register, but this here gives
me nightmares ...

Please no :-)

Cheers, Sima
  
Javier Martinez Canillas Jan. 9, 2024, 3:21 p.m. UTC | #4
Daniel Vetter <daniel@ffwll.ch> writes:

> On Tue, Jan 09, 2024 at 02:48:42PM +0100, Javier Martinez Canillas wrote:
>> Daniel Vetter <daniel@ffwll.ch> writes:
>> 
>> Hello Sima,
>> 
>> Thanks for your feedback.
>> 
>> > On Tue, Jan 09, 2024 at 01:05:59PM +0100, Javier Martinez Canillas wrote:
>> >> The device is initialized in the driver's probe callback and as part of
>> >> that initialization, the required firmware is loaded. But this fails if
>> >> the driver is built-in and the firmware isn't present in the initramfs:
>> >> 
>> >> $ dmesg | grep powervr
>> >> [    2.969757] powervr fd00000.gpu: Direct firmware load for powervr/rogue_33.15.11.3_v1.fw failed with error -2
>> >> [    2.979727] powervr fd00000.gpu: [drm] *ERROR* failed to load firmware powervr/rogue_33.15.11.3_v1.fw (err=-2)
>> >> [    2.989885] powervr: probe of fd00000.gpu failed with error -2
>> >> 
>> >> $ ls -lh /lib/firmware/powervr/rogue_33.15.11.3_v1.fw.xz
>> >> -rw-r--r-- 1 root root 51K Dec 12 19:00 /lib/firmware/powervr/rogue_33.15.11.3_v1.fw.xz
>> >> 
>> >> To prevent the probe to fail for this case, let's defer the probe if the
>> >> firmware isn't available. That way, the driver core can retry it and get
>> >> the probe to eventually succeed once the root filesystem has been mounted.
>> >> 
>> >> If the firmware is also not present in the root filesystem, then the probe
>> >> will never succeed and the reason listed in the debugfs devices_deferred:
>> >> 
>> >> $ cat /sys/kernel/debug/devices_deferred
>> >> fd00000.gpu     powervr: failed to load firmware powervr/rogue_33.15.11.3_v1.fw (err=-517)
>> >> 
>> >> Fixes: f99f5f3ea7ef ("drm/imagination: Add GPU ID parsing and firmware loading")
>> >> Suggested-by: Maxime Ripard <mripard@kernel.org>
>> >> Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
>> >
>> > Uh that doesn't work.
>> >
>> > Probe is for "I'm missing a struct device" and _only_ that. You can't
>> > assume that probe deferral will defer enough until the initrd shows up.
>> >
>> 
>> Fair.
>> 
>> > You need to fix this by fixing the initrd to include the required
>> > firmwares. This is what MODULE_FIRMWARE is for, and if your initrd fails
>> > to observe that it's just broken.
>> >
>> 
>> Tha's already the case, when is built as a module the initrd (dracut in
>> this particular case) does figure out that the firmware needs to be added
>> but that doesn't work when the DRM driver is built-in. Because dracut is
>> not able to figure out and doesn't even have a powervr.ko info to look at
>> whatever is set by the MODULE_FIRMWARE macro.
>
> Yeah built-in drivers that require firmware don't really work. I'm not
> sure it changed, but a while ago you had to actually include these in the
> kernel image itself (initrd was again too late), and that gives you
> something you can't even ship because it links blobs with gplv2 kernel.
>

Indeed and even let the legal question aside, doing that makes the kernel
too platform specific, even more than building drivers in.

> Maybe that changed and the initramfs is set up early enough now that it's
> sufficient to have it there ...
>

It does work if I force to include the firmware file in the initrd, e.g:

$ cat /etc/dracut.conf.d/firmware.conf 
install_items+=" /lib/firmware/powervr/rogue_33.15.11.3_v1.fw "

> Either way I think this needs module/kernel-image build changes so that
> the list of firmware images needed for the kernel itself is dumped
> somewhere, so that dracut can consume it and tdrt. My take at least.
>

That's a very good idea indeed. Dracut just looking at loaded modules and
their respective module info doesn't really work for built-in drivers...

>> > Yes I know as long as you have enough stuff built as module so that there
>> > will be _any_ kind of device probe after the root fs is mounted, this
>> > works, because that triggers a re-probe of everything. But that's the most
>> > kind of fragile fix there is.
>> >
>> 
>> Is fragile that's true but on the other hand it does solve the issue in
>> pratice. The whole device probal mechanism is just a best effort anyways.
>> 
>> > If you want to change that then I think that needs an official blessing
>> > from Greg KH/device core folks.
>> >

Ok. Let's see what Greg says, I think $SUBJECT is the path of least
resistance and something that is simple enough that could be easy to
backport / cherry-pick if needed.

But also agree with you that it's fragile (just like the driver as is).

Something that I forgot to mention but came up in our IRC discussion is
that the powervr driver is render only, so deferring the probe won't
affect the display that's driven by a different driver (tidss in my case).

>> 
>> I liked this approach due its simplicity but an alternative (and more
>> complex) solution could be to delay the firmware request and not do it at
>> probe time.
>> 
>> For example, the following (only barely tested) patch solves the issue for
>> me as well but it's a bigger change to this driver and wasn't sure if will
>> be acceptable:
>
> I think this is still barking up the wrong tree. I think there's two
> proper fixes:
>
> - make the "EPROBE_DEFER delays until rootfs no matter what" official and
>   documented policy. That's much better than drivers hand-rolling
>   EPROBE_DEFER each in their own driver code.
>

I would love that to be the case but the whole probe and deferral is
already quite messy. Most subsystems just keep deferring but there are
some that timeout after an arbitrary value (currently that being 30 secs)
and there's  a "deferred_probe_timeout" kernel cmdline param to change it.

I even tried to disable that timeout by default to have an official and
consistent deferral policy but unfortunately the patches were nacked due
some people relying on the existing beahviour of the deferral timing out:

https://lore.kernel.org/lkml/20221116115348.517599-1-javierm@redhat.com/

> - fix kernel build and dracut so it can pick up the firmware images the
>   kernel itself needs. Because having a driver built-in but it still fails
>   to load until the rootfs is there is some very confusing failure mode.
>   Due to that failure mode I think this is the right fix, otherwise
>   built-in drivers become confusing.
>

That's a good idea but that will mean adding a new kernel interface (and
unsure where that should live since sysfs for example has the "one entry,
one value" rule. I don't know where is a good place to expose such list.

>   Alternatively I guess you could disallow drm/img as a built-in driver
>   ... And also any other driver that requires fw to function.
>

That's an option too. But I still think that retrying is better than forcing
the driver to be built as a module.

> I don't think a "mostly works due to undocumented driver-specific hack" is
> a good fix, since this is entirely a generic issue.
>

Maybe something that could be proposed is to have a request_firmware_defer()
helper that changes the request_firmare() behaviour to return -EPROBE_DEFER
instead of -ENOENT? At least is something that could be documented and will
avoid drivers to open code a if (ret == -ENOENT) return -EPROBE_DEFER logic?

> I think it's different if the fw is only needed for optional features,
> e.g. for i915 some of the display firmware is only needed for self refresh
> and low power modes. And a runtime_pm_get until the firmware has shown up
> to prevent mayhem is imo a clean design for that, since the hardware is
> fully working aside from using a bit too much power.
>

As mentioned is only needed for rendering, display works without the FW
since is handled by another driver.

[...]

>> diff --git a/drivers/gpu/drm/imagination/pvr_drv.c b/drivers/gpu/drm/imagination/pvr_drv.c
>> index 5c3b2d58d766..f8fb45136326 100644
>> --- a/drivers/gpu/drm/imagination/pvr_drv.c
>> +++ b/drivers/gpu/drm/imagination/pvr_drv.c
>> @@ -1309,10 +1309,18 @@ pvr_drm_driver_open(struct drm_device *drm_dev, struct drm_file *file)
>>  {
>>  	struct pvr_device *pvr_dev = to_pvr_device(drm_dev);
>>  	struct pvr_file *pvr_file;
>> +	int err;
>> +
>> +	/* Perform GPU-specific initialization steps. */
>> +	err = pvr_device_gpu_init(pvr_dev);
>
> Ok this is full blas "init hw on first open" drm 1 design. I think what
> would be ok somewhat is delaying the drm_dev_register, but this here gives
> me nightmares ...
>
> Please no :-)
>

Ok, that's why I added a RFC prefix to this patch's subject :)

> Cheers, Sima
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
>
  

Patch

diff --git a/drivers/gpu/drm/imagination/pvr_device.c b/drivers/gpu/drm/imagination/pvr_device.c
index 1704c0268589..6eda25366431 100644
--- a/drivers/gpu/drm/imagination/pvr_device.c
+++ b/drivers/gpu/drm/imagination/pvr_device.c
@@ -295,8 +295,16 @@  pvr_request_firmware(struct pvr_device *pvr_dev)
 	 */
 	err = request_firmware(&fw, filename, pvr_dev->base.dev);
 	if (err) {
-		drm_err(drm_dev, "failed to load firmware %s (err=%d)\n",
-			filename, err);
+		/*
+		 * Defer probe if the firmware is not available yet (e.g: the driver
+		 * is built-in and the firmware not present in the initramfs image).
+		 */
+		if (err == -ENOENT)
+			err = -EPROBE_DEFER;
+
+		dev_err_probe(drm_dev->dev, err, "failed to load firmware %s (err=%d)\n",
+			      filename, err);
+
 		goto err_free_filename;
 	}