[01/20] posix-timers: Prevent RT livelock in itimer_delete()

Message ID 20230425183312.862346341@linutronix.de
State New
Headers
Series posix-timers: Fixes and cleanups |

Commit Message

Thomas Gleixner April 25, 2023, 6:48 p.m. UTC
  itimer_delete() has a retry loop when the timer is concurrently expired. On
non-RT kernels this just spin-waits until the timer callback has
completed. On RT kernels this is a potential livelock when the exiting task
preempted the hrtimer soft interrupt.

This only affects hrtimer based timers as Posix CPU timers cannot be
concurrently expired. For CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y this is
obviously impossible as the task cannot run task work and exit at the same
time. The CONFIG_POSIX_CPU_TIMERS_TASK_WORK=n (only non-RT) is prevented
because interrupts are disabled.

Replace spin_unlock() with an invocation of timer_wait_running() to handle
it the same way as the other retry loops in the posix timer code.

Fixes: ec8f954a40da ("posix-timers: Use a callback for cancel synchronization on PREEMPT_RT")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
---
 kernel/time/posix-timers.c |   50 +++++++++++++++++++++++++++++++++++++--------
 1 file changed, 42 insertions(+), 8 deletions(-)
  

Comments

Frederic Weisbecker May 4, 2023, 5:06 p.m. UTC | #1
Le Tue, Apr 25, 2023 at 08:48:56PM +0200, Thomas Gleixner a écrit :
> itimer_delete() has a retry loop when the timer is concurrently expired. On
> non-RT kernels this just spin-waits until the timer callback has
> completed. On RT kernels this is a potential livelock when the exiting task
> preempted the hrtimer soft interrupt.
> 
> This only affects hrtimer based timers as Posix CPU timers cannot be
> concurrently expired. For CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y this is
> obviously impossible as the task cannot run task work and exit at the same
> time. The CONFIG_POSIX_CPU_TIMERS_TASK_WORK=n (only non-RT) is prevented
> because interrupts are disabled.

But the owner of the timer is not the same as the target of the timer, right?

Though I seem to remember that we forbid setting a timer to a target outside
the current process, in which case the owner and the target are the same at
this exit stage. But I can't remember what enforces that permission in pid_for_clock()...

Thanks.
  
Thomas Gleixner May 4, 2023, 6:20 p.m. UTC | #2
On Thu, May 04 2023 at 19:06, Frederic Weisbecker wrote:
> Le Tue, Apr 25, 2023 at 08:48:56PM +0200, Thomas Gleixner a écrit :
>> itimer_delete() has a retry loop when the timer is concurrently expired. On
>> non-RT kernels this just spin-waits until the timer callback has
>> completed. On RT kernels this is a potential livelock when the exiting task
>> preempted the hrtimer soft interrupt.
>> 
>> This only affects hrtimer based timers as Posix CPU timers cannot be
>> concurrently expired. For CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y this is
>> obviously impossible as the task cannot run task work and exit at the same
>> time. The CONFIG_POSIX_CPU_TIMERS_TASK_WORK=n (only non-RT) is prevented
>> because interrupts are disabled.
>
> But the owner of the timer is not the same as the target of the timer, right?
>
> Though I seem to remember that we forbid setting a timer to a target outside
> the current process, in which case the owner and the target are the same at
> this exit stage. But I can't remember what enforces that permission in
> pid_for_clock()..

The owner of the timer is always the one which needs to find the entity
to synchronize on, whether that's the right hrtimer base or the task
which runs the expiry code.

wait_for_running_timer() is taking that into account:

  - The hrtimer timer based posix timers lock the hrtimer base expiry
    lock on the base to which the timer is currently associated

  - Posix CPU timers can be armed on a differnet process (only per
    thread timers are restricted to currents threadgroup) but the
    wait_for_running() callback "knows" how to find that process:

    When the timer is moved to the expiry list it gets:

         cputimer->firing = 1;
         rcu_assign_pointer(ctmr->handling, current);

   and the wait for running side does:

       rcu_read_lock()
       tsk = rcu_dereference(timr->it.cpu.handling);
       ....
       mutex_lock(&tsk->posix_cputimers_work.mutex);

   See collect_timerqueue(), handle_posix_cpu_timers() and
   posix_cpu_timer_wait_running() for details.

   commit f7abf14f0001 ("posix-cpu-timers: Implement the missing
   timer_wait_running callback") has quite some prose in the changelog.

Thanks,

        tglx
  
Thomas Gleixner May 5, 2023, 7:57 a.m. UTC | #3
On Thu, May 04 2023 at 20:20, Thomas Gleixner wrote:
> On Thu, May 04 2023 at 19:06, Frederic Weisbecker wrote:
>> Le Tue, Apr 25, 2023 at 08:48:56PM +0200, Thomas Gleixner a écrit :
>>> itimer_delete() has a retry loop when the timer is concurrently expired. On
>>> non-RT kernels this just spin-waits until the timer callback has
>>> completed. On RT kernels this is a potential livelock when the exiting task
>>> preempted the hrtimer soft interrupt.
>>> 
>>> This only affects hrtimer based timers as Posix CPU timers cannot be
>>> concurrently expired. For CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y this is
>>> obviously impossible as the task cannot run task work and exit at the same
>>> time. The CONFIG_POSIX_CPU_TIMERS_TASK_WORK=n (only non-RT) is prevented
>>> because interrupts are disabled.
>>
>> But the owner of the timer is not the same as the target of the timer, right?
>>
>> Though I seem to remember that we forbid setting a timer to a target outside
>> the current process, in which case the owner and the target are the same at
>> this exit stage. But I can't remember what enforces that permission in
>> pid_for_clock()..
>
> The owner of the timer is always the one which needs to find the entity
> to synchronize on, whether that's the right hrtimer base or the task
> which runs the expiry code.
>
> wait_for_running_timer() is taking that into account:
>
>   - The hrtimer timer based posix timers lock the hrtimer base expiry
>     lock on the base to which the timer is currently associated
>
>   - Posix CPU timers can be armed on a differnet process (only per
>     thread timers are restricted to currents threadgroup) but the
>     wait_for_running() callback "knows" how to find that process:
>
>     When the timer is moved to the expiry list it gets:
>
>          cputimer->firing = 1;
>          rcu_assign_pointer(ctmr->handling, current);
>
>    and the wait for running side does:
>
>        rcu_read_lock()
>        tsk = rcu_dereference(timr->it.cpu.handling);
>        ....
>        mutex_lock(&tsk->posix_cputimers_work.mutex);
>
>    See collect_timerqueue(), handle_posix_cpu_timers() and
>    posix_cpu_timer_wait_running() for details.
>
>    commit f7abf14f0001 ("posix-cpu-timers: Implement the missing
>    timer_wait_running callback") has quite some prose in the changelog.

But you have a point. The comment I added in itimer_delete() vs. CPU
timers is wrong for timers which are armed on a different process.
Needs to be removed.

Thanks,

        tglx
  

Patch

--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -1037,27 +1037,59 @@  SYSCALL_DEFINE1(timer_delete, timer_t, t
 }
 
 /*
- * return timer owned by the process, used by exit_itimers
+ * Delete a timer if it is armed, remove it from the hash and schedule it
+ * for RCU freeing.
  */
 static void itimer_delete(struct k_itimer *timer)
 {
-retry_delete:
-	spin_lock_irq(&timer->it_lock);
+	unsigned long flags;
 
+retry_delete:
+	/*
+	 * irqsave is required to make timer_wait_running() work.
+	 */
+	spin_lock_irqsave(&timer->it_lock, flags);
+
+	/*
+	 * Even if the timer is not longer accessible from other tasks
+	 * it still might be armed and queued in the underlying timer
+	 * mechanism. Worse, that timer mechanism might run the expiry
+	 * function concurrently.
+	 */
 	if (timer_delete_hook(timer) == TIMER_RETRY) {
-		spin_unlock_irq(&timer->it_lock);
+		/*
+		 * Timer is expired concurrently, prevent livelocks
+		 * and pointless spinning on RT.
+		 *
+		 * The CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y case is
+		 * irrelevant here because obviously the exiting task
+		 * cannot be expiring timer in task work concurrently.
+		 * Ditto for CONFIG_POSIX_CPU_TIMERS_TASK_WORK=n as the
+		 * tick interrupt cannot run on this CPU because the above
+		 * spin_lock disabled interrupts.
+		 *
+		 * timer_wait_running() drops timer::it_lock, which opens
+		 * the possibility for another task to delete the timer.
+		 *
+		 * That's not possible here because this is invoked from
+		 * do_exit() only for the last thread of the thread group.
+		 * So no other task can access that timer.
+		 */
+		if (WARN_ON_ONCE(timer_wait_running(timer, &flags) != timer))
+			return;
+
 		goto retry_delete;
 	}
 	list_del(&timer->list);
 
-	spin_unlock_irq(&timer->it_lock);
+	spin_unlock_irqrestore(&timer->it_lock, flags);
 	release_posix_timer(timer, IT_ID_SET);
 }
 
 /*
- * This is called by do_exit or de_thread, only when nobody else can
- * modify the signal->posix_timers list. Yet we need sighand->siglock
- * to prevent the race with /proc/pid/timers.
+ * Invoked from do_exit() when the last thread of a thread group exits.
+ * At that point no other task can access the timers of the dying
+ * task anymore.
  */
 void exit_itimers(struct task_struct *tsk)
 {
@@ -1067,10 +1099,12 @@  void exit_itimers(struct task_struct *ts
 	if (list_empty(&tsk->signal->posix_timers))
 		return;
 
+	/* Protect against concurrent read via /proc/$PID/timers */
 	spin_lock_irq(&tsk->sighand->siglock);
 	list_replace_init(&tsk->signal->posix_timers, &timers);
 	spin_unlock_irq(&tsk->sighand->siglock);
 
+	/* The timers are not longer accessible via tsk::signal */
 	while (!list_empty(&timers)) {
 		tmr = list_first_entry(&timers, struct k_itimer, list);
 		itimer_delete(tmr);