Commit Message
Greg KH
Oct. 19, 2022, 8:29 a.m. UTC
From: Yu Kuai <yukuai3@huawei.com>

[ Upstream commit 040b83fcecfb86f3225d3a5de7fd9b3fbccf83b4 ]

There are two problems that can lead to lost wakeups:

1) invalid wakeup on the wrong waitqueue:

For example, 2 * wake_batch tags are put, while only wake_batch threads
are woken:

__sbq_wake_up
 atomic_cmpxchg -> reset wait_cnt
                        __sbq_wake_up -> decrease wait_cnt
                        ...
                        __sbq_wake_up -> wait_cnt is decreased to 0 again
                         atomic_cmpxchg
                         sbq_index_atomic_inc -> increase wake_index
                         wake_up_nr -> wake up and waitqueue might be empty
 sbq_index_atomic_inc -> increase again, one waitqueue is skipped
 wake_up_nr -> invalid wake up because old waitqueue might be empty

To fix the problem, increase 'wake_index' before resetting 'wait_cnt'.

2) 'wait_cnt' can be decreased while waitqueue is empty

As pointed out by Jan Kara, the following race is possible:

CPU1                                CPU2
__sbq_wake_up                       __sbq_wake_up
 sbq_wake_ptr()                      sbq_wake_ptr() -> the same
 wait_cnt = atomic_dec_return()
 /* decreased to 0 */
 sbq_index_atomic_inc()
 /* move to next waitqueue */
 atomic_set()
 /* reset wait_cnt */
 wake_up_nr()
 /* wake up on the old waitqueue */
                                     wait_cnt = atomic_dec_return()
                                     /*
                                      * decrease wait_cnt in the old
                                      * waitqueue, while it can be
                                      * empty.
                                      */

Fix the problem by waking up before updating 'wake_index' and
'wait_cnt'.

Note that with this patch, 'wait_cnt' is still decreased in the old
empty waitqueue; however, the wakeup is redirected to an active
waitqueue, and the extra decrement on the old empty waitqueue is not
handled.

Fixes: 88459642cba4 ("blk-mq: abstract tag allocation out into sbitmap library")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220803121504.212071-1-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 lib/sbitmap.c | 55 ++++++++++++++++++++++++++++++---------------------
 1 file changed, 33 insertions(+), 22 deletions(-)
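Problem 1) is easiest to see when the diagram above is replayed step by step
in program order. The following standalone model does that with plain C11
atomics in place of the kernel helpers; it is an illustration only, not the
kernel code, and WAKE_BATCH, NR_QUEUES, the sleepers[] array, and the
*_model names are all invented for the demonstration:

#include <stdatomic.h>
#include <stdio.h>

#define WAKE_BATCH 4
#define NR_QUEUES  8

static atomic_int wake_index;                 /* sbq->wake_index */
static atomic_int wait_cnt = WAKE_BATCH;      /* wait_cnt of ws0 */
static int sleepers[NR_QUEUES] = { WAKE_BATCH, WAKE_BATCH };

static void wake_up_nr_model(int ws, int nr)  /* stand-in for wake_up_nr() */
{
	int woken = sleepers[ws] < nr ? sleepers[ws] : nr;

	sleepers[ws] -= woken;
	printf("wake_up_nr(ws%d): woke %d of %d requested\n", ws, woken, nr);
}

int main(void)
{
	int i, expected = 0;

	/* wake_batch tags are put: ws0's wait_cnt drops to 0 */
	for (i = 0; i < WAKE_BATCH; i++)
		atomic_fetch_sub(&wait_cnt, 1);

	/* CPU1: atomic_cmpxchg succeeds, resetting wait_cnt before
	 * wake_index has been advanced */
	atomic_compare_exchange_strong(&wait_cnt, &expected, WAKE_BATCH);

	/* wake_batch more tags are put against the same ws, and CPU2's
	 * cmpxchg succeeds as well */
	for (i = 0; i < WAKE_BATCH; i++)
		atomic_fetch_sub(&wait_cnt, 1);
	expected = 0;
	atomic_compare_exchange_strong(&wait_cnt, &expected, WAKE_BATCH);

	/* both callers now advance wake_index, then wake on the old ws0 */
	atomic_fetch_add(&wake_index, 1);
	wake_up_nr_model(0, WAKE_BATCH);      /* wakes ws0's sleepers */
	atomic_fetch_add(&wake_index, 1);     /* ws1 is skipped */
	wake_up_nr_model(0, WAKE_BATCH);      /* ws0 is already empty */

	printf("wake_index=%d, sleepers stranded on ws1=%d\n",
	       atomic_load(&wake_index), sleepers[1]);
	return 0;
}

Run single-threaded, the replay is deterministic: the second wake_up_nr()
finds ws0 already empty, wake_index ends at 2, and the wake_batch sleepers
on ws1 are stranded even though 2 * wake_batch tags were put.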
Comments
On Wed, 19 Oct 2022, Greg Kroah-Hartman wrote:

> From: Yu Kuai <yukuai3@huawei.com>
>
> [ Upstream commit 040b83fcecfb86f3225d3a5de7fd9b3fbccf83b4 ]
>
> There are two problems that can lead to lost wakeups:
[...]

I have no authority on linux-block, but I'll say NAK to this one
(and 517/862), and let Jens and Jan overrule me if they disagree.

This was the first of several 6.1-rc1 commits which had given me lost
wakeups never suffered before; was not tagged Cc stable; and (unless I've
missed it on lore) never had AUTOSEL posted to linux-block or linux-kernel.

Hugh
On Wed, Oct 19, 2022 at 08:06:26AM -0700, Hugh Dickins wrote:
> On Wed, 19 Oct 2022, Greg Kroah-Hartman wrote:
> > From: Yu Kuai <yukuai3@huawei.com>
> >
> > [ Upstream commit 040b83fcecfb86f3225d3a5de7fd9b3fbccf83b4 ]
[...]
>
> I have no authority on linux-block, but I'll say NAK to this one
> (and 517/862), and let Jens and Jan overrule me if they disagree.
>
> This was the first of several 6.1-rc1 commits which had given me lost
> wakeups never suffered before; was not tagged Cc stable; and (unless I've
> missed it on lore) never had AUTOSEL posted to linux-block or linux-kernel.

Ok, thanks for the review. I'll drop both of the sbitmap.c changes and
if people report issues and want them back, I'll be glad to revisit them
then.

greg k-h
On 10/19/22 10:25 AM, Greg Kroah-Hartman wrote:
> On Wed, Oct 19, 2022 at 08:06:26AM -0700, Hugh Dickins wrote:
>> On Wed, 19 Oct 2022, Greg Kroah-Hartman wrote:
>>> From: Yu Kuai <yukuai3@huawei.com>
>>>
>>> [ Upstream commit 040b83fcecfb86f3225d3a5de7fd9b3fbccf83b4 ]
[...]
>>
>> I have no authority on linux-block, but I'll say NAK to this one
>> (and 517/862), and let Jens and Jan overrule me if they disagree.
>>
>> This was the first of several 6.1-rc1 commits which had given me lost
>> wakeups never suffered before; was not tagged Cc stable; and (unless I've
>> missed it on lore) never had AUTOSEL posted to linux-block or linux-kernel.
>
> Ok, thanks for the review. I'll drop both of the sbitmap.c changes and
> if people report issues and want them back, I'll be glad to revisit them
> then.

Sorry for being late. I did see Hugh respond to the original auto-select as
well, and was surprised to see it moving forward after that. Let's please
drop them for now.
diff --git a/lib/sbitmap.c b/lib/sbitmap.c
index 29eb0484215a..1f31147872e6 100644
--- a/lib/sbitmap.c
+++ b/lib/sbitmap.c
@@ -611,32 +611,43 @@ static bool __sbq_wake_up(struct sbitmap_queue *sbq)
 		return false;
 
 	wait_cnt = atomic_dec_return(&ws->wait_cnt);
-	if (wait_cnt <= 0) {
-		int ret;
+	/*
+	 * For concurrent callers of this, callers should call this function
+	 * again to wakeup a new batch on a different 'ws'.
+	 */
+	if (wait_cnt < 0 || !waitqueue_active(&ws->wait))
+		return true;
 
-		wake_batch = READ_ONCE(sbq->wake_batch);
+	if (wait_cnt > 0)
+		return false;
 
-		/*
-		 * Pairs with the memory barrier in sbitmap_queue_resize() to
-		 * ensure that we see the batch size update before the wait
-		 * count is reset.
-		 */
-		smp_mb__before_atomic();
+	wake_batch = READ_ONCE(sbq->wake_batch);
 
-		/*
-		 * For concurrent callers of this, the one that failed the
-		 * atomic_cmpxhcg() race should call this function again
-		 * to wakeup a new batch on a different 'ws'.
-		 */
-		ret = atomic_cmpxchg(&ws->wait_cnt, wait_cnt, wake_batch);
-		if (ret == wait_cnt) {
-			sbq_index_atomic_inc(&sbq->wake_index);
-			wake_up_nr(&ws->wait, wake_batch);
-			return false;
-		}
+	/*
+	 * Wake up first in case that concurrent callers decrease wait_cnt
+	 * while waitqueue is empty.
+	 */
+	wake_up_nr(&ws->wait, wake_batch);
 
-		return true;
-	}
+	/*
+	 * Pairs with the memory barrier in sbitmap_queue_resize() to
+	 * ensure that we see the batch size update before the wait
+	 * count is reset.
+	 *
+	 * Also pairs with the implicit barrier between decrementing wait_cnt
+	 * and checking for waitqueue_active() to make sure waitqueue_active()
+	 * sees result of the wakeup if atomic_dec_return() has seen the result
+	 * of atomic_set().
+	 */
+	smp_mb__before_atomic();
+
+	/*
+	 * Increase wake_index before updating wait_cnt, otherwise concurrent
+	 * callers can see valid wait_cnt in old waitqueue, which can cause
+	 * invalid wakeup on the old waitqueue.
+	 */
+	sbq_index_atomic_inc(&sbq->wake_index);
+	atomic_set(&ws->wait_cnt, wake_batch);
 
 	return false;
 }
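For contrast, the same two-batch scenario can be replayed against the
reordered path. Again this is a standalone C11 sketch of the logic rather
than the kernel code: seq_cst atomics stand in for smp_mb__before_atomic(),
the sleepers[] array and wake_up_nr_model() stand in for
waitqueue_active() and wake_up_nr(), and all names and sizes are invented
for the demonstration:

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define WAKE_BATCH 4
#define NR_QUEUES  8

static atomic_int wake_index;
static atomic_int wait_cnt[NR_QUEUES];
static int sleepers[NR_QUEUES] = { WAKE_BATCH, WAKE_BATCH };

static void wake_up_nr_model(int ws, int nr)
{
	int woken = sleepers[ws] < nr ? sleepers[ws] : nr;

	sleepers[ws] -= woken;
	printf("wake_up_nr(ws%d): woke %d\n", ws, woken);
}

/* Mirrors the shape of the fixed __sbq_wake_up(): true means "retry on
 * the next waitqueue", false means the caller is done. */
static bool sbq_wake_up_model(void)
{
	int ws = atomic_load(&wake_index) % NR_QUEUES;
	int cnt = atomic_fetch_sub(&wait_cnt[ws], 1) - 1;

	/* raced with a concurrent reset, or the queue is empty: retry */
	if (cnt < 0 || sleepers[ws] == 0)
		return true;
	if (cnt > 0)                              /* batch not complete yet */
		return false;

	wake_up_nr_model(ws, WAKE_BATCH);         /* 1) wake up first */
	atomic_fetch_add(&wake_index, 1);         /* 2) then advance wake_index */
	atomic_store(&wait_cnt[ws], WAKE_BATCH);  /* 3) reset wait_cnt last */
	return false;
}

int main(void)
{
	int i;

	for (i = 0; i < NR_QUEUES; i++)
		atomic_store(&wait_cnt[i], WAKE_BATCH);
	for (i = 0; i < 2 * WAKE_BATCH; i++)      /* put 2 * wake_batch tags */
		while (sbq_wake_up_model())
			;                         /* retry, as the kernel's looping caller does */
	printf("sleepers left: ws0=%d ws1=%d\n", sleepers[0], sleepers[1]);
	return 0;
}

Because the wakeup happens before wake_index is advanced and wait_cnt is
reset last, the second batch of puts re-reads wake_index, lands on ws1,
and both groups of sleepers are woken: no waitqueue is skipped and no
fresh wait_cnt is visible on the old queue.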