[v1] ACPI: sleep: Avoid breaking S3 wakeup due to might_sleep()

Message ID 12237421.O9o76ZdvQC@kreacher
State New

Commit Message

Rafael J. Wysocki June 13, 2023, 3:25 p.m. UTC
  From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

The addition of might_sleep() to down_timeout() caused the latter to
enable interrupts unconditionally in some cases, which in turn broke
the ACPI S3 wakeup path in acpi_suspend_enter(), where down_timeout()
is called by acpi_disable_all_gpes() via acpi_ut_acquire_mutex().

Namely, if CONFIG_DEBUG_ATOMIC_SLEEP is set, might_sleep() causes
might_resched() to be used and if CONFIG_PREEMPT_VOLUNTARY is set,
this triggers __cond_resched() which may call preempt_schedule_common(),
so __schedule() gets invoked and it ends up with enabled interrupts (in
the prev == next case).

Now, enabling interrupts early in the S3 wakeup path causes the kernel
to crash.

Address this by modifying acpi_suspend_enter() to disable GPEs without
attempting to acquire the sleeping lock which is not needed in that code
path anyway.

Fixes: 99409b935c9a locking/semaphore: Add might_sleep() to down_*() family
Reported-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/acpi/acpica/achware.h |    2 --
 drivers/acpi/sleep.c          |   16 ++++++++++++----
 include/acpi/acpixf.h         |    1 +
 3 files changed, 13 insertions(+), 6 deletions(-)
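
The failure mode described in the changelog above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not kernel code: every model_* name is hypothetical. It assumes, as the changelog says, that the scheduler path can return with interrupts enabled and that the low-level routine takes no sleeping lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state; none of this is kernel code. */
static bool model_irqs_enabled;	/* false: interrupts are disabled */
static bool model_need_resched;	/* true: a reschedule is pending */

/* Models __schedule() returning with interrupts enabled (prev == next). */
static void model_schedule(void)
{
	model_irqs_enabled = true;
}

/* Models might_sleep() acting as a voluntary preemption point. */
static void model_might_sleep(void)
{
	if (model_need_resched)
		model_schedule();
}

/* Models the low-level routine: register accesses only, no locking. */
static void model_hw_disable_all_gpes(void)
{
}

/* Models the locked variant: taking the lock passes through might_sleep(). */
static void model_disable_all_gpes(void)
{
	model_might_sleep();
	model_hw_disable_all_gpes();
}

/*
 * Runs one variant as if on the S3 wakeup path (interrupts off, reschedule
 * pending) and reports whether interrupts stayed disabled afterwards.
 */
static bool model_wakeup_keeps_irqs_off(void (*disable_gpes)(void))
{
	model_irqs_enabled = false;
	model_need_resched = true;
	disable_gpes();
	return !model_irqs_enabled;
}

/* Exercises both variants; aborts if the model behaves unexpectedly. */
static void model_selftest(void)
{
	assert(!model_wakeup_keeps_irqs_off(model_disable_all_gpes));
	assert(model_wakeup_keeps_irqs_off(model_hw_disable_all_gpes));
}
```

Calling model_selftest() from a main() exercises both variants: in this model only the variant that goes through the preemption point leaves interrupts enabled, which is the condition the patch avoids.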
  

Comments

Peter Zijlstra June 14, 2023, 8:47 a.m. UTC | #1
On Tue, Jun 13, 2023 at 05:25:07PM +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> The addition of might_sleep() to down_timeout() caused the latter to
> enable interrupts unconditionally in some cases, which in turn broke
> the ACPI S3 wakeup path in acpi_suspend_enter(), where down_timeout()
> is called by acpi_disable_all_gpes() via acpi_ut_acquire_mutex().
> 
> Namely, if CONFIG_DEBUG_ATOMIC_SLEEP is set, might_sleep() causes
> might_resched() to be used and if CONFIG_PREEMPT_VOLUNTARY is set,
> this triggers __cond_resched() which may call preempt_schedule_common(),
> so __schedule() gets invoked and it ends up with enabled interrupts (in
> the prev == next case).

Urgh, so that code was relying on the lack of contention to not trigger
the schedule path -- with the added might_sleep() it triggers a
preemption point.

> Now, enabling interrupts early in the S3 wakeup path causes the kernel
> to crash.
> 
> Address this by modifying acpi_suspend_enter() to disable GPEs without
> attempting to acquire the sleeping lock which is not needed in that code
> path anyway.
> 
> Fixes: 99409b935c9a locking/semaphore: Add might_sleep() to down_*() family

$ git show -s --pretty='format:%h ("%s")' 99409b935c9a
99409b935c9a ("locking/semaphore: Add might_sleep() to down_*() family")

> Reported-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

> ---
>  drivers/acpi/acpica/achware.h |    2 --
>  drivers/acpi/sleep.c          |   16 ++++++++++++----
>  include/acpi/acpixf.h         |    1 +
>  3 files changed, 13 insertions(+), 6 deletions(-)
> 
> Index: linux-pm/drivers/acpi/acpica/achware.h
> ===================================================================
> --- linux-pm.orig/drivers/acpi/acpica/achware.h
> +++ linux-pm/drivers/acpi/acpica/achware.h
> @@ -101,8 +101,6 @@ acpi_status
>  acpi_hw_get_gpe_status(struct acpi_gpe_event_info *gpe_event_info,
>  		       acpi_event_status *event_status);
>  
> -acpi_status acpi_hw_disable_all_gpes(void);
> -
>  acpi_status acpi_hw_enable_all_runtime_gpes(void);
>  
>  acpi_status acpi_hw_enable_all_wakeup_gpes(void);
> Index: linux-pm/include/acpi/acpixf.h
> ===================================================================
> --- linux-pm.orig/include/acpi/acpixf.h
> +++ linux-pm/include/acpi/acpixf.h
> @@ -761,6 +761,7 @@ ACPI_HW_DEPENDENT_RETURN_STATUS(acpi_sta
>  						     acpi_event_status
>  						     *event_status))
>  ACPI_HW_DEPENDENT_RETURN_UINT32(u32 acpi_dispatch_gpe(acpi_handle gpe_device, u32 gpe_number))
> +ACPI_HW_DEPENDENT_RETURN_STATUS(acpi_status acpi_hw_disable_all_gpes(void))
>  ACPI_HW_DEPENDENT_RETURN_STATUS(acpi_status acpi_disable_all_gpes(void))
>  ACPI_HW_DEPENDENT_RETURN_STATUS(acpi_status acpi_enable_all_runtime_gpes(void))
>  ACPI_HW_DEPENDENT_RETURN_STATUS(acpi_status acpi_enable_all_wakeup_gpes(void))
> Index: linux-pm/drivers/acpi/sleep.c
> ===================================================================
> --- linux-pm.orig/drivers/acpi/sleep.c
> +++ linux-pm/drivers/acpi/sleep.c
> @@ -636,11 +636,19 @@ static int acpi_suspend_enter(suspend_st
>  	}
>  
>  	/*
> -	 * Disable and clear GPE status before interrupt is enabled. Some GPEs
> -	 * (like wakeup GPE) haven't handler, this can avoid such GPE misfire.
> -	 * acpi_leave_sleep_state will reenable specific GPEs later
> +	 * Disable all GPEs and clear their status bits before interrupts are
> +	 * enabled. Some GPEs (like wakeup GPEs) have no handlers and this can
> +	 * prevent them from producing spurious interrupts.
> +	 *
> +	 * acpi_leave_sleep_state() will reenable specific GPEs later.
> +	 *
> +	 * Because this code runs on one CPU with disabled interrupts (all of
> +	 * the other CPUs are offline at that time), it need not acquire any
> +	 * sleeping locks which may be harmful due to instrumentation even if
> +	 * those locks are not contended, so avoid doing that by using a low-
> +	 * level library routine here.

I'm not sure I'd call the implicit preemption point 'instrumentation'
but yeah, fair enough I suppose.

>  	 */
> -	acpi_disable_all_gpes();
> +	acpi_hw_disable_all_gpes();
>  	/* Allow EC transactions to happen. */
>  	acpi_ec_unblock_transactions();
>  
> 
> 
>
  
Rafael J. Wysocki June 14, 2023, 2:15 p.m. UTC | #2
On Wed, Jun 14, 2023 at 10:48 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Tue, Jun 13, 2023 at 05:25:07PM +0200, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > The addition of might_sleep() to down_timeout() caused the latter to
> > enable interrupts unconditionally in some cases, which in turn broke
> > the ACPI S3 wakeup path in acpi_suspend_enter(), where down_timeout()
> > is called by acpi_disable_all_gpes() via acpi_ut_acquire_mutex().
> >
> > Namely, if CONFIG_DEBUG_ATOMIC_SLEEP is set, might_sleep() causes
> > might_resched() to be used and if CONFIG_PREEMPT_VOLUNTARY is set,
> > this triggers __cond_resched() which may call preempt_schedule_common(),
> > so __schedule() gets invoked and it ends up with enabled interrupts (in
> > the prev == next case).
>
> Urgh, so that code was relying on the lack of contention to not trigger
> the schedule path -- with the added might_sleep() it triggers a
> preemption point.

Right.

> > Now, enabling interrupts early in the S3 wakeup path causes the kernel
> > to crash.
> >
> > Address this by modifying acpi_suspend_enter() to disable GPEs without
> > attempting to acquire the sleeping lock which is not needed in that code
> > path anyway.
> >
> > Fixes: 99409b935c9a locking/semaphore: Add might_sleep() to down_*() family
>
> $ git show -s --pretty='format:%h ("%s")' 99409b935c9a
> 99409b935c9a ("locking/semaphore: Add might_sleep() to down_*() family")

Right, thanks!

> > Reported-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Thanks!

> > ---
> >  drivers/acpi/acpica/achware.h |    2 --
> >  drivers/acpi/sleep.c          |   16 ++++++++++++----
> >  include/acpi/acpixf.h         |    1 +
> >  3 files changed, 13 insertions(+), 6 deletions(-)
> >
> > Index: linux-pm/drivers/acpi/acpica/achware.h
> > ===================================================================
> > --- linux-pm.orig/drivers/acpi/acpica/achware.h
> > +++ linux-pm/drivers/acpi/acpica/achware.h
> > @@ -101,8 +101,6 @@ acpi_status
> >  acpi_hw_get_gpe_status(struct acpi_gpe_event_info *gpe_event_info,
> >                      acpi_event_status *event_status);
> >
> > -acpi_status acpi_hw_disable_all_gpes(void);
> > -
> >  acpi_status acpi_hw_enable_all_runtime_gpes(void);
> >
> >  acpi_status acpi_hw_enable_all_wakeup_gpes(void);
> > Index: linux-pm/include/acpi/acpixf.h
> > ===================================================================
> > --- linux-pm.orig/include/acpi/acpixf.h
> > +++ linux-pm/include/acpi/acpixf.h
> > @@ -761,6 +761,7 @@ ACPI_HW_DEPENDENT_RETURN_STATUS(acpi_sta
> >                                                    acpi_event_status
> >                                                    *event_status))
> >  ACPI_HW_DEPENDENT_RETURN_UINT32(u32 acpi_dispatch_gpe(acpi_handle gpe_device, u32 gpe_number))
> > +ACPI_HW_DEPENDENT_RETURN_STATUS(acpi_status acpi_hw_disable_all_gpes(void))
> >  ACPI_HW_DEPENDENT_RETURN_STATUS(acpi_status acpi_disable_all_gpes(void))
> >  ACPI_HW_DEPENDENT_RETURN_STATUS(acpi_status acpi_enable_all_runtime_gpes(void))
> >  ACPI_HW_DEPENDENT_RETURN_STATUS(acpi_status acpi_enable_all_wakeup_gpes(void))
> > Index: linux-pm/drivers/acpi/sleep.c
> > ===================================================================
> > --- linux-pm.orig/drivers/acpi/sleep.c
> > +++ linux-pm/drivers/acpi/sleep.c
> > @@ -636,11 +636,19 @@ static int acpi_suspend_enter(suspend_st
> >       }
> >
> >       /*
> > -      * Disable and clear GPE status before interrupt is enabled. Some GPEs
> > -      * (like wakeup GPE) haven't handler, this can avoid such GPE misfire.
> > -      * acpi_leave_sleep_state will reenable specific GPEs later
> > +      * Disable all GPEs and clear their status bits before interrupts are
> > +      * enabled. Some GPEs (like wakeup GPEs) have no handlers and this can
> > +      * prevent them from producing spurious interrupts.
> > +      *
> > +      * acpi_leave_sleep_state() will reenable specific GPEs later.
> > +      *
> > +      * Because this code runs on one CPU with disabled interrupts (all of
> > +      * the other CPUs are offline at that time), it need not acquire any
> > +      * sleeping locks which may be harmful due to instrumentation even if
> > +      * those locks are not contended, so avoid doing that by using a low-
> > +      * level library routine here.
>
> I'm not sure I'd call the implicit preemption point 'instrumentation'
> but yeah, fair enough I suppose.

OK, I'll send a v2 with this clarified.

> >        */
> > -     acpi_disable_all_gpes();
> > +     acpi_hw_disable_all_gpes();
> >       /* Allow EC transactions to happen. */
> >       acpi_ec_unblock_transactions();
  

Patch

Index: linux-pm/drivers/acpi/acpica/achware.h
===================================================================
--- linux-pm.orig/drivers/acpi/acpica/achware.h
+++ linux-pm/drivers/acpi/acpica/achware.h
@@ -101,8 +101,6 @@  acpi_status
 acpi_hw_get_gpe_status(struct acpi_gpe_event_info *gpe_event_info,
 		       acpi_event_status *event_status);
 
-acpi_status acpi_hw_disable_all_gpes(void);
-
 acpi_status acpi_hw_enable_all_runtime_gpes(void);
 
 acpi_status acpi_hw_enable_all_wakeup_gpes(void);
Index: linux-pm/include/acpi/acpixf.h
===================================================================
--- linux-pm.orig/include/acpi/acpixf.h
+++ linux-pm/include/acpi/acpixf.h
@@ -761,6 +761,7 @@  ACPI_HW_DEPENDENT_RETURN_STATUS(acpi_sta
 						     acpi_event_status
 						     *event_status))
 ACPI_HW_DEPENDENT_RETURN_UINT32(u32 acpi_dispatch_gpe(acpi_handle gpe_device, u32 gpe_number))
+ACPI_HW_DEPENDENT_RETURN_STATUS(acpi_status acpi_hw_disable_all_gpes(void))
 ACPI_HW_DEPENDENT_RETURN_STATUS(acpi_status acpi_disable_all_gpes(void))
 ACPI_HW_DEPENDENT_RETURN_STATUS(acpi_status acpi_enable_all_runtime_gpes(void))
 ACPI_HW_DEPENDENT_RETURN_STATUS(acpi_status acpi_enable_all_wakeup_gpes(void))
Index: linux-pm/drivers/acpi/sleep.c
===================================================================
--- linux-pm.orig/drivers/acpi/sleep.c
+++ linux-pm/drivers/acpi/sleep.c
@@ -636,11 +636,19 @@  static int acpi_suspend_enter(suspend_st
 	}
 
 	/*
-	 * Disable and clear GPE status before interrupt is enabled. Some GPEs
-	 * (like wakeup GPE) haven't handler, this can avoid such GPE misfire.
-	 * acpi_leave_sleep_state will reenable specific GPEs later
+	 * Disable all GPEs and clear their status bits before interrupts are
+	 * enabled. Some GPEs (like wakeup GPEs) have no handlers and this can
+	 * prevent them from producing spurious interrupts.
+	 *
+	 * acpi_leave_sleep_state() will reenable specific GPEs later.
+	 *
+	 * Because this code runs on one CPU with disabled interrupts (all of
+	 * the other CPUs are offline at that time), it need not acquire any
+	 * sleeping locks which may be harmful due to instrumentation even if
+	 * those locks are not contended, so avoid doing that by using a low-
+	 * level library routine here.
 	 */
-	acpi_disable_all_gpes();
+	acpi_hw_disable_all_gpes();
 	/* Allow EC transactions to happen. */
 	acpi_ec_unblock_transactions();