firmware_loader: Add reboot_in_progress for user helper path

Message ID 1694773288-15755-1-git-send-email-quic_mojha@quicinc.com
State New
Headers
Series firmware_loader: Add reboot_in_progress for user helper path |

Commit Message

Mukesh Ojha Sept. 15, 2023, 10:21 a.m. UTC
  There could be following scenario where there is a ongoing reboot
is going from processA which tries to call all the reboot notifier
callback and one of them is firmware reboot call which tries to
abort all the ongoing firmware userspace request under fw_lock
but there could be another processB which tries to do request
firmware, which came just after abort done from ProcessA and
ask for userspace to load the firmware and this can stop the
ongoing reboot ProcessA to stall for next 60s(default timeout)
which may be expected behaviour everyone like to see, instead
we should abort every request which came after once firmware
marks reboot notification.

      ProcessA                             ProcessB

kernel_restart_prepare
  blocking_notifier_call_chain
   fw_shutdown_notify
     kill_pending_fw_fallback_reqs
      __fw_load_abort
       fw_state_aborted	              request_firmware
         __fw_state_set                   firmware_fallback_sysfs
...                                        fw_load_from_user_helper
..                                         ...
.                                          ..
                                            usermodehelper_read_trylock
                                             fw_load_sysfs_fallback
                                              fw_sysfs_wait_timeout
usermodehelper_disable
 __usermodehelper_disable
  down_write()

Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com>
---
 drivers/base/firmware_loader/fallback.c | 2 +-
 drivers/base/firmware_loader/firmware.h | 1 +
 drivers/base/firmware_loader/main.c     | 2 ++
 3 files changed, 4 insertions(+), 1 deletion(-)
  

Comments

Greg KH Sept. 17, 2023, 10:12 a.m. UTC | #1
On Fri, Sep 15, 2023 at 03:51:28PM +0530, Mukesh Ojha wrote:
> There could be following scenario where there is a ongoing reboot
> is going from processA which tries to call all the reboot notifier
> callback and one of them is firmware reboot call which tries to
> abort all the ongoing firmware userspace request under fw_lock
> but there could be another processB which tries to do request
> firmware, which came just after abort done from ProcessA and
> ask for userspace to load the firmware and this can stop the
> ongoing reboot ProcessA to stall for next 60s(default timeout)
> which may be expected behaviour everyone like to see, instead
> we should abort every request which came after once firmware
> marks reboot notification.
> 
>       ProcessA                             ProcessB
> 
> kernel_restart_prepare
>   blocking_notifier_call_chain
>    fw_shutdown_notify
>      kill_pending_fw_fallback_reqs
>       __fw_load_abort
>        fw_state_aborted	              request_firmware
>          __fw_state_set                   firmware_fallback_sysfs
> ...                                        fw_load_from_user_helper
> ..                                         ...
> .                                          ..
>                                             usermodehelper_read_trylock
>                                              fw_load_sysfs_fallback
>                                               fw_sysfs_wait_timeout
> usermodehelper_disable
>  __usermodehelper_disable
>   down_write()
> 
> Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com>
> ---
>  drivers/base/firmware_loader/fallback.c | 2 +-
>  drivers/base/firmware_loader/firmware.h | 1 +
>  drivers/base/firmware_loader/main.c     | 2 ++
>  3 files changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/base/firmware_loader/fallback.c b/drivers/base/firmware_loader/fallback.c
> index bf68e3947814..a5546aeea91f 100644
> --- a/drivers/base/firmware_loader/fallback.c
> +++ b/drivers/base/firmware_loader/fallback.c
> @@ -86,7 +86,7 @@ static int fw_load_sysfs_fallback(struct fw_sysfs *fw_sysfs, long timeout)
>  	}
>  
>  	mutex_lock(&fw_lock);
> -	if (fw_state_is_aborted(fw_priv)) {
> +	if (reboot_in_progress || fw_state_is_aborted(fw_priv)) {
>  		mutex_unlock(&fw_lock);
>  		retval = -EINTR;
>  		goto out;

What prevents reboot_in_progress to change right after you check it
here?

And what kernel driver is trying to call the reboot notifier that gets
mixed up in this?  Why not fix that driver to not need the reboot
notifier at all (hint, I really doubt it needs it...)

> --- a/drivers/base/firmware_loader/main.c
> +++ b/drivers/base/firmware_loader/main.c
> @@ -93,6 +93,7 @@ static inline struct fw_priv *to_fw_priv(struct kref *ref)
>  DEFINE_MUTEX(fw_lock);
>  
>  struct firmware_cache fw_cache;
> +bool reboot_in_progress;

Bad global name for a variable in the firmware_loader core.

thanks,

greg k-h
  
Mukesh Ojha Sept. 19, 2023, 7:57 a.m. UTC | #2
On 9/17/2023 3:42 PM, Greg KH wrote:
> On Fri, Sep 15, 2023 at 03:51:28PM +0530, Mukesh Ojha wrote:
>> There could be following scenario where there is a ongoing reboot
>> is going from processA which tries to call all the reboot notifier
>> callback and one of them is firmware reboot call which tries to
>> abort all the ongoing firmware userspace request under fw_lock
>> but there could be another processB which tries to do request
>> firmware, which came just after abort done from ProcessA and
>> ask for userspace to load the firmware and this can stop the
>> ongoing reboot ProcessA to stall for next 60s(default timeout)
>> which may be expected behaviour everyone like to see, instead
>> we should abort every request which came after once firmware
>> marks reboot notification.
>>
>>        ProcessA                             ProcessB
>>
>> kernel_restart_prepare
>>    blocking_notifier_call_chain
>>     fw_shutdown_notify
>>       kill_pending_fw_fallback_reqs
>>        __fw_load_abort
>>         fw_state_aborted	              request_firmware
>>           __fw_state_set                   firmware_fallback_sysfs
>> ...                                        fw_load_from_user_helper
>> ..                                         ...
>> .                                          ..
>>                                              usermodehelper_read_trylock
>>                                               fw_load_sysfs_fallback
>>                                                fw_sysfs_wait_timeout
>> usermodehelper_disable
>>   __usermodehelper_disable
>>    down_write()
>>
>> Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com>
>> ---
>>   drivers/base/firmware_loader/fallback.c | 2 +-
>>   drivers/base/firmware_loader/firmware.h | 1 +
>>   drivers/base/firmware_loader/main.c     | 2 ++
>>   3 files changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/base/firmware_loader/fallback.c b/drivers/base/firmware_loader/fallback.c
>> index bf68e3947814..a5546aeea91f 100644
>> --- a/drivers/base/firmware_loader/fallback.c
>> +++ b/drivers/base/firmware_loader/fallback.c
>> @@ -86,7 +86,7 @@ static int fw_load_sysfs_fallback(struct fw_sysfs *fw_sysfs, long timeout)
>>   	}
>>   
>>   	mutex_lock(&fw_lock);
>> -	if (fw_state_is_aborted(fw_priv)) {
>> +	if (reboot_in_progress || fw_state_is_aborted(fw_priv)) {
>>   		mutex_unlock(&fw_lock);
>>   		retval = -EINTR;
>>   		goto out;
> 
> What prevents reboot_in_progress to change right after you check it
> here?

e.g, reboot_in_progress was false, it gets added to the pending list
under fw_lock

  list_add(&fw_priv->pending_list, &pending_fw_head);

reboot_in_progress = true, when all the outstanding fw request
are aborted during from reboot thread from below path. However,
I realize my mistake, reboot_in_progress should be modified
under fw_lock, will fix in v2.

 >>     fw_shutdown_notify
 >>       kill_pending_fw_fallback_reqs

So, idea is to revoke any fw request once firmware knows about ongoing
reboot and not delay the reboot process further for next default
60s .

> 
> And what kernel driver is trying to call the reboot notifier that gets
> mixed up in this?  Why not fix that driver to not need the reboot
> notifier at all (hint, I really doubt it needs it...)

'drivers/base/firmware_loader/main.c' has reboot notifier which aborts
the ongoing firmware requests and but can race with other parallel
ongoing request which are not yet added to pending list.

fw_load_sysfs_fallback-> stuck waiting for fw_lock which was held by

kill_pending_fw_fallback_reqs=>
mutex_lock(&fw_lock);
  __fw_load_abort


> 
>> --- a/drivers/base/firmware_loader/main.c
>> +++ b/drivers/base/firmware_loader/main.c
>> @@ -93,6 +93,7 @@ static inline struct fw_priv *to_fw_priv(struct kref *ref)
>>   DEFINE_MUTEX(fw_lock);
>>   
>>   struct firmware_cache fw_cache;
>> +bool reboot_in_progress;
> 
> Bad global name for a variable in the firmware_loader core.

bool abort_fw_load_req ??

-Mukesh
> 
> thanks,
> 
> greg k-h
  
Greg KH Sept. 19, 2023, 8:31 a.m. UTC | #3
On Tue, Sep 19, 2023 at 01:27:05PM +0530, Mukesh Ojha wrote:
> > > +bool reboot_in_progress;
> > 
> > Bad global name for a variable in the firmware_loader core.
> 
> bool abort_fw_load_req ??

Use "noun_verb" please for global symbols so it's more obvious where the
symbol is, and it's a bit easier to manage the namespace.

thanks,

greg k-h
  

Patch

diff --git a/drivers/base/firmware_loader/fallback.c b/drivers/base/firmware_loader/fallback.c
index bf68e3947814..a5546aeea91f 100644
--- a/drivers/base/firmware_loader/fallback.c
+++ b/drivers/base/firmware_loader/fallback.c
@@ -86,7 +86,7 @@  static int fw_load_sysfs_fallback(struct fw_sysfs *fw_sysfs, long timeout)
 	}
 
 	mutex_lock(&fw_lock);
-	if (fw_state_is_aborted(fw_priv)) {
+	if (reboot_in_progress || fw_state_is_aborted(fw_priv)) {
 		mutex_unlock(&fw_lock);
 		retval = -EINTR;
 		goto out;
diff --git a/drivers/base/firmware_loader/firmware.h b/drivers/base/firmware_loader/firmware.h
index bf549d6500d7..6f44248e2e44 100644
--- a/drivers/base/firmware_loader/firmware.h
+++ b/drivers/base/firmware_loader/firmware.h
@@ -86,6 +86,7 @@  struct fw_priv {
 
 extern struct mutex fw_lock;
 extern struct firmware_cache fw_cache;
+extern bool reboot_in_progress;
 
 static inline bool __fw_state_check(struct fw_priv *fw_priv,
 				    enum fw_status status)
diff --git a/drivers/base/firmware_loader/main.c b/drivers/base/firmware_loader/main.c
index b58c42f1b1ce..d72b7950edf0 100644
--- a/drivers/base/firmware_loader/main.c
+++ b/drivers/base/firmware_loader/main.c
@@ -93,6 +93,7 @@  static inline struct fw_priv *to_fw_priv(struct kref *ref)
 DEFINE_MUTEX(fw_lock);
 
 struct firmware_cache fw_cache;
+bool reboot_in_progress;
 
 void fw_state_init(struct fw_priv *fw_priv)
 {
@@ -1613,6 +1614,7 @@  static int fw_shutdown_notify(struct notifier_block *unused1,
 	 * and avoid a deadlock with the usermode_lock.
 	 */
 	kill_pending_fw_fallback_reqs(false);
+	reboot_in_progress = true;
 
 	return NOTIFY_DONE;
 }