Message ID | 20231130204817.2031407-1-jannh@google.com |
State | New |
Headers |
From: Jann Horn <jannh@google.com>
To: Peter Zijlstra <peterz@infradead.org>, Ingo Molnar <mingo@redhat.com>, Will Deacon <will@kernel.org>
Cc: Waiman Long <longman@redhat.com>, Jonathan Corbet <corbet@lwn.net>, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org
Subject: [PATCH] locking: Document that mutex_unlock() is non-atomic
Date: Thu, 30 Nov 2023 21:48:17 +0100
Message-ID: <20231130204817.2031407-1-jannh@google.com>
Series | locking: Document that mutex_unlock() is non-atomic |
Commit Message
Jann Horn
Nov. 30, 2023, 8:48 p.m. UTC
I have seen several cases of attempts to use mutex_unlock() to release an
object such that the object can then be freed by another task.
My understanding is that this is not safe because mutex_unlock(), in the
MUTEX_FLAG_WAITERS && !MUTEX_FLAG_HANDOFF case, accesses the mutex
structure after having marked it as unlocked; so mutex_unlock() requires
its caller to ensure that the mutex stays alive until mutex_unlock()
returns.
If MUTEX_FLAG_WAITERS is set and there are real waiters, those waiters
have to keep the mutex alive, I think; but we could have a spurious
MUTEX_FLAG_WAITERS left if an interruptible/killable waiter bailed
between the points where __mutex_unlock_slowpath() did the cmpxchg
reading the flags and where it acquired the wait_lock.
(With spinlocks, that kind of code pattern is allowed and, from what I
remember, used in several places in the kernel.)
If my understanding of this is correct, we should probably document this -
I think such a semantic difference between mutexes and spinlocks is fairly
unintuitive.
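The pattern described above can be sketched as follows. This is a non-runnable, kernel-style illustration, not code from the patch; `struct my_obj`, its `lock` member, and the split into two tasks are hypothetical:

```c
/* Hypothetical object whose lifetime its own mutex is (wrongly)
 * expected to protect. */
struct my_obj {
	struct mutex lock;
	/* ... object data ... */
};

/* Task A: drops its reference to obj by unlocking the object's mutex. */
void task_a(struct my_obj *obj)
{
	mutex_lock(&obj->lock);
	/* ... last use of obj by task A ... */
	mutex_unlock(&obj->lock);	/* in the MUTEX_FLAG_WAITERS &&
					 * !MUTEX_FLAG_HANDOFF case, this can
					 * still touch obj->lock after marking
					 * it unlocked */
}

/* Task B: tears the object down - this is the FORBIDDEN part. */
void task_b(struct my_obj *obj)
{
	mutex_lock(&obj->lock);		/* can succeed while task A is still
					 * inside __mutex_unlock_slowpath() */
	mutex_unlock(&obj->lock);
	kfree(obj);			/* use-after-free if task A's
					 * mutex_unlock() has not returned */
}
```

With a spinlock in place of the mutex, this shape of code would be permitted; the point of the patch is that for mutexes it is not.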
Signed-off-by: Jann Horn <jannh@google.com>
---
I hope for some thorough review on this patch to make sure the comments
I'm adding are actually true, and to confirm that mutexes intentionally
do not support this usage pattern.
Documentation/locking/mutex-design.rst | 6 ++++++
kernel/locking/mutex.c | 5 +++++
2 files changed, 11 insertions(+)
base-commit: 3b47bc037bd44f142ac09848e8d3ecccc726be99
Comments
On 11/30/23 15:48, Jann Horn wrote:
> I have seen several cases of attempts to use mutex_unlock() to release an
> object such that the object can then be freed by another task.
> My understanding is that this is not safe because mutex_unlock(), in the
> MUTEX_FLAG_WAITERS && !MUTEX_FLAG_HANDOFF case, accesses the mutex
> structure after having marked it as unlocked; so mutex_unlock() requires
> its caller to ensure that the mutex stays alive until mutex_unlock()
> returns.
>
> If MUTEX_FLAG_WAITERS is set and there are real waiters, those waiters
> have to keep the mutex alive, I think; but we could have a spurious
> MUTEX_FLAG_WAITERS left if an interruptible/killable waiter bailed
> between the points where __mutex_unlock_slowpath() did the cmpxchg
> reading the flags and where it acquired the wait_lock.

Could you clarify under what condition a concurrent task can decide to
free the object holding the mutex? Is it !mutex_is_locked() or after a
mutex_lock()/mutex_unlock() sequence?

mutex_is_locked() will return true if the mutex has a waiter even if it
is currently free.

Cheers,
Longman

> (With spinlocks, that kind of code pattern is allowed and, from what I
> remember, used in several places in the kernel.)
>
> If my understanding of this is correct, we should probably document this -
> I think such a semantic difference between mutexes and spinlocks is fairly
> unintuitive.
>
> Signed-off-by: Jann Horn <jannh@google.com>
> ---
> I hope for some thorough review on this patch to make sure the comments
> I'm adding are actually true, and to confirm that mutexes intentionally
> do not support this usage pattern.
>
>  Documentation/locking/mutex-design.rst | 6 ++++++
>  kernel/locking/mutex.c                 | 5 +++++
>  2 files changed, 11 insertions(+)
>
> diff --git a/Documentation/locking/mutex-design.rst b/Documentation/locking/mutex-design.rst
> index 78540cd7f54b..087716bfa7b2 100644
> --- a/Documentation/locking/mutex-design.rst
> +++ b/Documentation/locking/mutex-design.rst
> @@ -101,6 +101,12 @@ features that make lock debugging easier and faster:
>      - Detects multi-task circular deadlocks and prints out all affected
>        locks and tasks (and only those tasks).
>
> +Releasing a mutex is not an atomic operation: Once a mutex release operation
> +has begun, another context may be able to acquire the mutex before the release
> +operation has completed. The mutex user must ensure that the mutex is not
> +destroyed while a release operation is still in progress - in other words,
> +callers of 'mutex_unlock' must ensure that the mutex stays alive until
> +'mutex_unlock' has returned.
>
>  Interfaces
>  ----------
> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> index 2deeeca3e71b..4c6b83bab643 100644
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -532,6 +532,11 @@ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigne
>   * This function must not be used in interrupt context. Unlocking
>   * of a not locked mutex is not allowed.
>   *
> + * The caller must ensure that the mutex stays alive until this function has
> + * returned - mutex_unlock() can NOT directly be used to release an object such
> + * that another concurrent task can free it.
> + * Mutexes are different from spinlocks in this aspect.
> + *
>   * This function is similar to (but not equivalent to) up().
>   */
>  void __sched mutex_unlock(struct mutex *lock)
>
> base-commit: 3b47bc037bd44f142ac09848e8d3ecccc726be99
On Thu, Nov 30, 2023 at 10:53 PM Waiman Long <longman@redhat.com> wrote:
> On 11/30/23 15:48, Jann Horn wrote:
> > I have seen several cases of attempts to use mutex_unlock() to release an
> > object such that the object can then be freed by another task.
> > [...]
>
> Could you clarify under what condition a concurrent task can decide to
> free the object holding the mutex? Is it !mutex_is_locked() or after a
> mutex_lock()/mutex_unlock sequence?

I mean a mutex_lock()+mutex_unlock() sequence.

> mutex_is_locked() will return true if the mutex has waiter even if it
> is currently free.

I don't understand your point, and maybe I also don't understand what
you mean by "free". Isn't mutex_is_locked() defined such that it only
looks at whether a mutex has an owner, and doesn't look at the waiter
list?
On 11/30/23 17:24, Jann Horn wrote:
> On Thu, Nov 30, 2023 at 10:53 PM Waiman Long <longman@redhat.com> wrote:
>> Could you clarify under what condition a concurrent task can decide to
>> free the object holding the mutex? Is it !mutex_is_locked() or after a
>> mutex_lock()/mutex_unlock sequence?
>
> I mean a mutex_lock()+mutex_unlock() sequence.

Because of optimistic spinning, a mutex_lock()/mutex_unlock() sequence
can succeed even if there are still waiters waiting for the lock.

>> mutex_is_locked() will return true if the mutex has waiter even if it
>> is currently free.
>
> I don't understand your point, and maybe I also don't understand what
> you mean by "free". Isn't mutex_is_locked() defined such that it only
> looks at whether a mutex has an owner, and doesn't look at the waiter
> list?

What I mean is that the mutex is in an unlocked state, ready to be
acquired by another locker. mutex_is_locked() considers the state of the
mutex as locked if any of the owner flags is set. Besides the
mutex_lock()/mutex_unlock() sequence, I will suggest adding a
mutex_is_locked() check just to be sure.

Cheers,
Longman
On 11/30/23 15:48, Jann Horn wrote:
> I have seen several cases of attempts to use mutex_unlock() to release an
> object such that the object can then be freed by another task.
> [...]
>
> (With spinlocks, that kind of code pattern is allowed and, from what I
> remember, used in several places in the kernel.)
>
> If my understanding of this is correct, we should probably document this -
> I think such a semantic difference between mutexes and spinlocks is fairly
> unintuitive.

Spinlocks are fair. So doing a lock/unlock sequence will make sure that
all the previously waiting waiters are done with the lock. Para-virtual
spinlocks, however, can be a bit unfair, so doing a lock/unlock sequence
may not be enough to guarantee there is no waiter. The same is true for
mutexes. Adding a spin_is_locked() or mutex_is_locked() check can make
sure that all the waiters are gone.

Also, the term "non-atomic" is kind of ambiguous as to the exact meaning
of this word.

Cheers,
Longman
On Thu, Nov 30, 2023 at 09:48:17PM +0100, Jann Horn wrote:
> I have seen several cases of attempts to use mutex_unlock() to release an
> object such that the object can then be freed by another task.
> [...]
>
> If my understanding of this is correct, we should probably document this -
> I think such a semantic difference between mutexes and spinlocks is fairly
> unintuitive.

IIRC this is true of all sleeping locks, and I think completion was the
explicit exception here, but it's been a while.

> --- a/Documentation/locking/mutex-design.rst
> +++ b/Documentation/locking/mutex-design.rst
> @@ -101,6 +101,12 @@ features that make lock debugging easier and faster:
> [...]
> +Releasing a mutex is not an atomic operation: Once a mutex release operation

Well, it very much is an atomic store-release. That is, I object to your
confusing use of atomic here :-)
* Waiman Long <longman@redhat.com> wrote:
> On 11/30/23 15:48, Jann Horn wrote:
> > [...]
>
> Could you clarify under what condition a concurrent task can decide to free
> the object holding the mutex? Is it !mutex_is_locked() or after a
> mutex_lock()/mutex_unlock sequence?
>
> mutex_is_locked() will return true if the mutex has waiter even if it is
> currently free.

I believe the correct condition is what the changelog already says:
"until mutex_unlock() returns". What happens within mutex_unlock() is
kernel implementation specific, and once a caller has called
mutex_unlock(), the mutex must remain alive until it returns.

No other call can substitute for this: neither mutex_is_locked(), nor
some sort of mutex_lock()+mutex_unlock() sequence.

Thanks,

	Ingo
On Fri, Dec 1, 2023 at 1:33 AM Waiman Long <longman@redhat.com> wrote:
> On 11/30/23 15:48, Jann Horn wrote:
> > [...]
>
> Spinlocks are fair. So doing a lock/unlock sequence will make sure that
> all the previously waiting waiters are done with the lock. Para-virtual
> spinlocks, however, can be a bit unfair so doing a lock/unlock sequence
> may not be enough to guarantee there is no waiter. The same is true for
> mutex. Adding a spin_is_locked() or mutex_is_locked() check can make
> sure that all the waiters are gone.

I think this pattern anyway only works when you're only trying to wait
for the current holder of the lock, not tasks that are queued up on the
lock as waiters - so a task initially holds a stable reference to some
object, then acquires the object's lock, then drops the original
reference, and then later drops the lock.

You can see an example of such mutex usage (which is explicitly legal
with userspace POSIX mutexes, but is forbidden with kernel mutexes) at
the bottom of the POSIX manpage for pthread_mutex_destroy() at
<https://pubs.opengroup.org/onlinepubs/007904875/functions/pthread_mutex_destroy.html>,
in the section "Destroying Mutexes".

(I think trying to wait for pending waiters before destroying a mutex
wouldn't make sense, because if there can still be pending waiters,
there can almost always also be tasks that are about to _become_ pending
waiters but that haven't called mutex_lock() yet.)
On Fri, Dec 1, 2023 at 10:10 AM Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, Nov 30, 2023 at 09:48:17PM +0100, Jann Horn wrote:
> > [...]
>
> IIRC this is true of all sleeping locks, and I think completion was the
> explicit exception here, but it's been a while.

In addition to completions, I think this also applies to up()? But I
don't know if that's intentionally supported or just an implementation
detail.

Is there some central place where this should be documented instead of
Documentation/locking/mutex-design.rst, as a more general kernel locking
design thing? Maybe Documentation/locking/locktypes.rst? I think it
should also be documented on top of the relevant locking function(s)
though, since I don't think everyone who uses locking functions
necessarily reads the separate documentation files first.

Mutexes kind of stand out as the most common locking type, but I guess
to be consistent, we'd have to put the same comment on functions like
up_read() and up_write()? And maybe drop the "Mutexes are different from
spinlocks in this aspect" part?

(Sidenote: Someone pointed out to me that an additional source of
confusion could be that userspace POSIX mutexes support this usage
pattern.)

> > +Releasing a mutex is not an atomic operation: Once a mutex release operation
>
> Well, it very much is an atomic store-release. That is, I object to your
> confusing use of atomic here :-)

I'd say it involves an atomic store-release, but the whole operation is
not atomic. :P But yeah, I see how this is confusing wording, and I'm
not particularly attached to my specific choice of words.
On Fri, Dec 1, 2023 at 4:52 PM Waiman Long <longman@redhat.com> wrote:
> On 12/1/23 10:01, Jann Horn wrote:
>> I think this pattern anyway only works when you're only trying to wait
>> for the current holder of the lock, not tasks that are queued up on
>> the lock as waiters - so a task initially holds a stable reference to
>> some object, then acquires the object's lock, then drops the original
>> reference, and then later drops the lock.
>> You can see an example of such mutex usage (which is explicitly legal
>> with userspace POSIX mutexes, but is forbidden with kernel mutexes) at
>> the bottom of the POSIX manpage for pthread_mutex_destroy() at
>> <https://pubs.opengroup.org/onlinepubs/007904875/functions/pthread_mutex_destroy.html>,
>> in the section "Destroying Mutexes".
>
> The POSIX mutex is reference-counted.

I don't understand what you mean by that. Anyway, I guess this thread of
discussion is moot - I'm not suggesting that kernel mutexes should
support this behavior.
From: Jann Horn
> Sent: 01 December 2023 15:02
>
> On Fri, Dec 1, 2023 at 1:33 AM Waiman Long <longman@redhat.com> wrote:
> > [...]
> > Spinlocks are fair. So doing a lock/unlock sequence will make sure that
> > all the previously waiting waiters are done with the lock. Para-virtual
> > spinlocks, however, can be a bit unfair so doing a lock/unlock sequence
> > may not be enough to guarantee there is no waiter. The same is true for
> > mutex. Adding a spin_is_locked() or mutex_is_locked() check can make
> > sure that all the waiters are gone.
>
> I think this pattern anyway only works when you're only trying to wait
> for the current holder of the lock, not tasks that are queued up on
> the lock as waiters - so a task initially holds a stable reference to
> some object, then acquires the object's lock, then drops the original
> reference, and then later drops the lock.
> You can see an example of such mutex usage (which is explicitly legal
> with userspace POSIX mutexes, but is forbidden with kernel mutexes) at
> the bottom of the POSIX manpage for pthread_mutex_destroy() at
> <https://pubs.opengroup.org/onlinepubs/007904875/functions/pthread_mutex_destroy.html>,
> in the section "Destroying Mutexes".

I don't understand at all what any of this is about.
You cannot de-initialise, free (etc.) a mutex (or any other piece of
memory, for that matter) if another thread can have a reference to it.
If some other code might be holding the mutex, it also might be just
about to acquire it - you always need another lock of some kind to
ensure that doesn't happen.

IIRC pretty much the only time you need to acquire the mutex in the
free path is if locks are chained, e.g.:

	lock(table)
	entry = find_entry();
	lock(entry)
	unlock(table)
	...
	unlock(entry)

Then the free code has to:

	lock(table)
	remove_from_table(entry)
	lock(entry)
	unlock(entry)
	unlock(table)
	free(entry)

ISTR something about mutex_unlock() not being a full memory barrier,
but that is entirely different to this discussion.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
On Fri, Dec 1, 2023 at 7:12 PM David Laight <David.Laight@aculab.com> wrote:
> From: Jann Horn
> > Sent: 01 December 2023 15:02
> > [...]
> > I think this pattern anyway only works when you're only trying to wait
> > for the current holder of the lock, not tasks that are queued up on
> > the lock as waiters - so a task initially holds a stable reference to
> > some object, then acquires the object's lock, then drops the original
> > reference, and then later drops the lock.
> > You can see an example of such mutex usage (which is explicitly legal
> > with userspace POSIX mutexes, but is forbidden with kernel mutexes) at
> > the bottom of the POSIX manpage for pthread_mutex_destroy() at
> > <https://pubs.opengroup.org/onlinepubs/007904875/functions/pthread_mutex_destroy.html>,
> > in the section "Destroying Mutexes".
>
> I don't understand at all what any of this is about.
> You cannot de-initialise, free (etc) a mutex (or any other piece of
> memory for that matter) if another thread can have a reference to it.
> If some other code might be holding the mutex it also might be just
> about to acquire it - you always need another lock of some kind to
> ensure that doesn't happen.
>
> IIRC pretty much the only time you need to acquire the mutex in the
> free path is if locks are chained, eg:
> 	lock(table)
> 	entry = find_entry();
> 	lock(entry)
> 	unlock(table)
> 	...
> 	unlock(entry)
>
> Then the free code has to:
> 	lock(table)
> 	remove_from_table(entry)
> 	lock(entry)
> 	unlock(entry)
> 	unlock(table)
> 	free(entry)

Yep, this is exactly the kind of code pattern for which I'm trying to
document that it is forbidden with mutexes (while it is allowed with
spinlocks).
On 12/1/23 13:44, David Laight wrote:
>
> (Top post due to perverted outluck rules on html)
>
> Pending waiters aren't the problem.

Pending waiters can still be a problem if code decides to free the
lock-containing object after a lock/unlock sequence, as that may cause a
use-after-free.

> You have to ensure there aren't any, but the mutex() can be held.

Using a reference count to track the number of active users is one way to
prevent that, if you only release the reference count after mutex_unlock()
returns and not in the lock critical section.

Cheers,
Longman

> David
>
> *From:* Waiman Long <longman@redhat.com>
> *Sent:* 01 December 2023 18:40
> *To:* Jann Horn <jannh@google.com>; David Laight <David.Laight@ACULAB.COM>
> *Cc:* Peter Zijlstra <peterz@infradead.org>; Ingo Molnar <mingo@redhat.com>;
>   Will Deacon <will@kernel.org>; Jonathan Corbet <corbet@lwn.net>;
>   linux-kernel@vger.kernel.org; linux-doc@vger.kernel.org
> *Subject:* Re: [PATCH] locking: Document that mutex_unlock() is non-atomic
>
> On 12/1/23 13:18, Jann Horn wrote:
> > On Fri, Dec 1, 2023 at 7:12 PM David Laight <David.Laight@aculab.com> wrote:
> > [...]
> > Yep, this is exactly the kind of code pattern for which I'm trying to
> > document that it is forbidden with mutexes (while it is allowed with
> > spinlocks).
>
> Actually, even spinlocks may not guarantee the lock/unlock sequence
> will flush out all the pending waiters in the case of paravirt spinlocks.
>
> Cheers,
> Longman
From: Waiman Long
> Sent: 01 December 2023 19:16
>
> On 12/1/23 13:44, David Laight wrote:
> >
> > Pending waiters aren't the problem.
> >
> Pending waiters can still be a problem if code decides to free the lock
> containing object after a lock/unlock sequence as it may cause
> use-after-free.
> >
> > You have to ensure there aren't any, but the mutex() can be held.
> >
> Using reference count to track the number of active users is one way to
> prevent that if you only release the reference count after
> mutex_unlock() returns but not in the lock critical section.

I suspect the documentation needs to be more explicit than just saying
it is non-atomic. Saying something like:

	The mutex structure may be accessed by mutex_unlock() after another
	thread has locked and unlocked the mutex.

	So if a reference count is used to ensure a structure remains valid
	when a lock is released (with the item being freed when the count
	becomes zero), the reference count itself cannot be protected by a
	mutex in the structure. So code like:
		...
		count = --item->refcount;
		mutex_unlock(item->mtx);
		if (!count)
			free(item);
	can lead to a 'use after free' in mutex_unlock().
	However, if the refcount is atomic and decremented without the
	mutex held, there isn't a problem.

	David
On 12/2/23 10:51, David Laight wrote:
> From: Waiman Long
>> Sent: 01 December 2023 19:16
>>
>> On 12/1/23 13:44, David Laight wrote:
>>> Pending waiters aren't the problem.
>>>
>> Pending waiters can still be a problem if code decides to free the lock
>> containing object after a lock/unlock sequence as it may cause
>> use-after-free.
>>> You have to ensure there aren't any, but the mutex() can be held.
>>>
>> Using reference count to track the number of active users is one way to
>> prevent that if you only release the reference count after
>> mutex_unlock() returns but not in the lock critical section.
> I suspect the documentation need to be more explicit than just saying
> it is non-atomic.
> Saying something like:
>
> The mutex structure may be accessed by mutex_unlock() after another
> thread has locked and unlocked the mutex.
>
> So if a reference count is used to ensure a structure remains valid when
> a lock is released (with the item being freed when the count becomes zero)
> the reference count itself cannot be protected by a mutex in the structure.
> So code like:
> 	...
> 	count = --item->refcount;
> 	mutex_unlock(item->mtx);
> 	if (!count)
> 		free(item);
> can lead to a 'use after free' in mutex_unlock().
> However if the refcount is atomic and decremented without the
> mutex held there isn't a problem.
>
> David

That is definitely better than saying it is non-atomic, which is vague
in meaning.

Cheers,
Longman
diff --git a/Documentation/locking/mutex-design.rst b/Documentation/locking/mutex-design.rst
index 78540cd7f54b..087716bfa7b2 100644
--- a/Documentation/locking/mutex-design.rst
+++ b/Documentation/locking/mutex-design.rst
@@ -101,6 +101,12 @@ features that make lock debugging easier and faster:
   - Detects multi-task circular deadlocks and prints out all affected
     locks and tasks (and only those tasks).
 
+Releasing a mutex is not an atomic operation: Once a mutex release operation
+has begun, another context may be able to acquire the mutex before the release
+operation has completed. The mutex user must ensure that the mutex is not
+destroyed while a release operation is still in progress - in other words,
+callers of 'mutex_unlock' must ensure that the mutex stays alive until
+'mutex_unlock' has returned.
 
 Interfaces
 ----------
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 2deeeca3e71b..4c6b83bab643 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -532,6 +532,11 @@ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigne
  * This function must not be used in interrupt context. Unlocking
  * of a not locked mutex is not allowed.
  *
+ * The caller must ensure that the mutex stays alive until this function has
+ * returned - mutex_unlock() can NOT directly be used to release an object such
+ * that another concurrent task can free it.
+ * Mutexes are different from spinlocks in this aspect.
+ *
  * This function is similar to (but not equivalent to) up().
  */
 void __sched mutex_unlock(struct mutex *lock)