[v2,4/4] driver core: Disable driver deferred probe timeout by default

Message ID 20221116120236.520017-1-javierm@redhat.com
State New
Series driver core: Decouple device links enforcing and probe deferral timeouts

Commit Message

Javier Martinez Canillas Nov. 16, 2022, 12:02 p.m. UTC
The driver_deferred_probe_timeout value has a long history. It was first
set to -1 when it was introduced by commit 25b4e70dcce9 ("driver core: allow
stopping deferred probe after init"), meaning that the driver core would
defer the probe forever unless a subsystem opted in by using the
driver_deferred_probe_check_state() helper, in which case it would stop
deferring once the initcalls were done or, if a "deferred_probe_timeout"
param was explicitly set, once that timeout expired.

Only the power domain, IOMMU and MDIO subsystems currently opt in to check
whether the initcalls have completed with driver_deferred_probe_check_state().

Commit c8c43cee29f6 ("driver core: Fix driver_deferred_probe_check_state()
logic") then changed the driver_deferred_probe_check_state() helper logic
to take into account whether modules have been enabled or not, and also to
return -EPROBE_DEFER if the deferred probe timeout work was still running.

Then in commit e2cec7d68537 ("driver core: Set deferred_probe_timeout to a
longer default if CONFIG_MODULES is set"), the timeout was increased to 30
seconds if modules are enabled. This was because it seems that some of the
subsystems that opted in to stop returning -EPROBE_DEFER after the initcalls
were done could still have dependencies whose drivers were built as modules.

This commit made a fundamental change to how probe deferral worked,
though: the default was no longer to keep attempting to probe drivers
indefinitely, but to time out after 30 seconds, unless a different timeout
was set using the "deferred_probe_timeout" command line parameter.
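For illustration, a value such as deferred_probe_timeout=30 on the kernel command line has to be turned into an integer before it can override the built-in default. The sketch below is a hypothetical userspace parser: parse_probe_timeout() is not a kernel function, and the kernel's own handler, deferred_probe_timeout_setup() in drivers/base/dd.c, is not reproduced here.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical parser for the value part of "deferred_probe_timeout=<n>".
 * On success, stores the value in *timeout and returns true. On a malformed
 * or out-of-range value, leaves *timeout untouched and returns false.
 */
static bool parse_probe_timeout(const char *str, int *timeout)
{
	char *end;
	long val;

	assert(str != NULL && timeout != NULL);

	errno = 0;
	val = strtol(str, &end, 10);
	if (end == str || *end != '\0' || errno == ERANGE ||
	    val < INT_MIN || val > INT_MAX)
		return false;

	*timeout = (int)val;
	return true;
}
```

In this sketch a malformed value leaves the previous setting in place, so the compiled-in default keeps applying.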

The behavior was changed even more with commit ce68929f07de ("driver core:
Revert default driver_deferred_probe_timeout value to 0"), which set the
value to 0 by default. This meant that probe deferral would be disabled
once the initcalls were done, unless a timeout was set on the command line.

Note that although the commit said it was reverting the default value to
0, the default had never been 0: it was -1 at the beginning and was then
changed to 30 in a later commit.

This default value of 0 was reverted again by commit f516d01b9df2 ("Revert
"driver core: Set default deferred_probe_timeout back to 0."""), which set
it to 10 seconds instead. That is still less than the 30 seconds that had
been set at some point to allow systems with drivers built as modules and
loaded later by user space to probe drivers still in the deferred list.

The 10 second timeout isn't enough in some cases. For example, the Fedora
kernel builds as many drivers as possible as modules, and this leaves the
Snapdragon SC7180-based HP X2 Chromebook without display, due to the DRM
driver failing to probe if CONFIG_ARM_SMMU=y and CONFIG_SC_GPUCC_7180=m.

So let's change the default back to -1, as it was at the beginning. That's
how probe deferral originally worked. The kernel should avoid guessing
when it is safe to give up on probing deferred drivers.
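A rough way to see why the default matters: model whether a consumer is still probed when its supplier's module (for example, the GPU clock controller built with CONFIG_SC_GPUCC_7180=m) is loaded by user space some seconds after the initcalls finish. consumer_gets_probed() and the timings used with it are made up for illustration, and measuring from the end of the initcalls is a simplification.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simplified model: is a consumer whose supplier only becomes available
 * 'supplier_ready_s' seconds after the initcalls finish still probed, for a
 * given deferred probe timeout value?
 *   -1 : never give up (the default proposed by this patch)
 *    0 : give up as soon as the initcalls are done
 *   >0 : give up that many seconds after the initcalls are done
 */
static bool consumer_gets_probed(int timeout_s, int supplier_ready_s)
{
	assert(supplier_ready_s >= 0);

	/* Negative timeout: keep deferring indefinitely. */
	if (timeout_s < 0)
		return true;

	/* In this model, arriving exactly at the deadline is too late. */
	return supplier_ready_s < timeout_s;
}
```

With a made-up supplier load time of 15 seconds, consumer_gets_probed(10, 15) is false while consumer_gets_probed(-1, 15) is true, which is the difference between the 10 second default and the -1 default proposed here.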

The reason why the default "deferred_probe_timeout" was changed from -1 to
the other values was to allow drivers that have only optional dependencies
to probe even if the suppliers are not available.

But now there is a "fw_devlink.timeout" parameter to time out the links and
allow drivers to probe even when their dependencies are not present. Let's
set the default for that timeout to 10 seconds, to give the behaviour
expected by these drivers with optional device links.

Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
---

Changes in v2:
- Mention in the commit message the specific machine and drivers that
  are affected by the issue (Greg).
- Double check the commit message for accuracy (John).
- Add a second workqueue to timeout the devlink enforcing and allow
  drivers to probe even without their optional dependencies available.

 drivers/base/dd.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)
  

Comments

Andrew Halaney Nov. 16, 2022, 7:15 p.m. UTC | #1
On Wed, Nov 16, 2022 at 01:02:36PM +0100, Javier Martinez Canillas wrote:
> The driver_deferred_probe_timeout value has a long history. It was first
> set to -1 when was introduced by commit 25b4e70dcce9 ("driver core: allow
> stopping deferred probe after init"), meaning that the driver core would
> defer the probe forever unless a subsystem would opt-in by checking if the
> initcalls where done using the driver_deferred_probe_check_state() helper,
> or if a timeout was explicitly set with a "deferred_probe_timeout" param.

This or statement here sounds like you either opt-in, or the timeout
affects you (at least that's how I read it).

A subsystem has to opt-in to get either result by using
driver_deferred_probe_check_state()!

> 
> Only the power domain, IOMMU and MDIO subsystems currently opt-in to check
> if the initcalls have completed with driver_deferred_probe_check_state().
> 
> Commit c8c43cee29f6 ("driver core: Fix driver_deferred_probe_check_state()
> logic") then changed the driver_deferred_probe_check_state() helper logic,
> to take into account whether modules have been enabled or not and also to
> return -EPROBE_DEFER if the probe deferred timeout work was still running.
> 
> Then in commit e2cec7d68537 ("driver core: Set deferred_probe_timeout to a
> longer default if CONFIG_MODULES is set"), the timeout was increased to 30
> seconds if modules are enabled. Because seems that some of the subsystems
> that were opt-in to not return -EPROBE_DEFER after the initcall where done

s/where/were/

> could still have dependencies whose drivers were built as a module.
> 
> This commit did a fundamental change to how probe deferral worked though,
> since now the default was not to attempt probing for drivers indefinitely
> but instead to timeout after 30 seconds, unless a different timeout is set
> using the "deferred_probe_timeout" command line parameter.
> 
> The behavior was changed even more with commit ce68929f07de ("driver core:
> Revert default driver_deferred_probe_timeout value to 0"), since the value
> was set to 0 by default. Meaning that the probe deferral would be disabled
> after the initcalls where done. Unless a timeout was set in the cmdline.
> 
> Notice that the commit said that it was reverting the default value to 0,
> but this was never 0. The default was -1 at the beginning and then changed
> to 30 in a later commit.
> 
> This default value of 0 was reverted again by commit f516d01b9df2 ("Revert
> "driver core: Set default deferred_probe_timeout back to 0."") and set to
> 10 seconds instead. Which was still less than the 30 seconds that was set
> at some point, to allow systems with drivers built as modules and loaded
> later by user-land to probe drivers that were still in the deferred list.
> 
> The 10 seconds timeout isn't enough in some cases, for example the Fedora
> kernel builds as much drivers as possible as modules. And this leads to an
> Snapdragon SC7180 based HP X2 Chromebook to not have display, due the DRM
> driver failing to probe if CONFIG_ARM_SMMU=y and CONFIG_SC_GPUCC_7180=m.
> 
> So let's change the default again to -1 as it was at the beginning. That's
> how probe deferral always worked. The kernel should try to avoid guessing
> when it should be safe to give up on deferred drivers to be probed.
> 
> The reason why the default "deferred_probe_timeout" was changed from -1 to
> the other values was to allow drivers that have only optional dependencies
> to probe even if the suppliers are not available.
> 
> But now there is a "fw_devlink.timeout" parameter to timeout the links and
> allow drivers to probe even when the dependencies are not present. Let's
> set the default for that timeout to 10 seconds, to give the same behaviour
> as expected by these driver with optional device links.
> 
> Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>

This sounds like a reasonable solution to me:

Acked-by: Andrew Halaney <ahalaney@redhat.com>

> ---
> 
> Changes in v2:
> - Mention in the commit messsage the specific machine and drivers that
>   are affected by the issue (Greg).
> - Double check the commit message for accuracy (John).
> - Add a second workqueue to timeout the devlink enforcing and allow
>   drivers to probe even without their optional dependencies available.
> 
>  drivers/base/dd.c | 8 ++------
>  1 file changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index ea448df94d24..5f18cd497850 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -256,12 +256,8 @@ static int deferred_devs_show(struct seq_file *s, void *data)
>  }
>  DEFINE_SHOW_ATTRIBUTE(deferred_devs);
>  
> -#ifdef CONFIG_MODULES
> -static int driver_deferred_probe_timeout = 10;
> -#else
> -static int driver_deferred_probe_timeout;
> -#endif
> -static int fw_devlink_timeout = -1;
> +static int driver_deferred_probe_timeout = -1;
> +static int fw_devlink_timeout = 10;
>  
>  static int __init deferred_probe_timeout_setup(char *str)
>  {
> -- 
> 2.38.1
>
  

Patch

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index ea448df94d24..5f18cd497850 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -256,12 +256,8 @@ static int deferred_devs_show(struct seq_file *s, void *data)
 }
 DEFINE_SHOW_ATTRIBUTE(deferred_devs);
 
-#ifdef CONFIG_MODULES
-static int driver_deferred_probe_timeout = 10;
-#else
-static int driver_deferred_probe_timeout;
-#endif
-static int fw_devlink_timeout = -1;
+static int driver_deferred_probe_timeout = -1;
+static int fw_devlink_timeout = 10;
 
 static int __init deferred_probe_timeout_setup(char *str)
 {