Message ID | 20221017211356.333862-1-longman@redhat.com |
---|---|
From | Waiman Long <longman@redhat.com> |
To | Peter Zijlstra <peterz@infradead.org>, Ingo Molnar <mingo@redhat.com>, Will Deacon <will@kernel.org>, Boqun Feng <boqun.feng@gmail.com> |
Cc | linux-kernel@vger.kernel.org, john.p.donnelly@oracle.com, Hillf Danton <hdanton@sina.com>, Mukesh Ojha <quic_mojha@quicinc.com>, Ting11 Wang 王婷 <wangting11@xiaomi.com>, Waiman Long <longman@redhat.com> |
Date | Mon, 17 Oct 2022 17:13:51 -0400 |
Subject | [PATCH v3 0/5] locking/rwsem: Fix rwsem bugs & enable true lock handoff |
Series | locking/rwsem: Fix rwsem bugs & enable true lock handoff |
Message
Waiman Long
Oct. 17, 2022, 9:13 p.m. UTC
v3:
 - Make a minor cleanup to patch 1.
 - Add 3 more patches to implement true lock handoff.

v2:
 - Add an additional patch to limit the # of first-waiter optimistic
   spinning in the writer slowpath.

It turns out the current waiter optimistic spinning code does not work that
well if we have RT tasks in the mix. This patch series includes two different
fixes to resolve those issues. The last 3 patches modify the handoff code to
implement true lock handoff similar to that of a mutex.

Waiman Long (5):
  locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath
  locking/rwsem: Limit # of null owner retries for handoff writer
  locking/rwsem: Change waiter->handoff_set to a handoff_state enum
  locking/rwsem: Enable direct rwsem lock handoff
  locking/rwsem: Update handoff lock events tracking

 kernel/locking/lock_events_list.h |   6 +-
 kernel/locking/rwsem.c            | 172 +++++++++++++++++++++++-------
 2 files changed, 138 insertions(+), 40 deletions(-)
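For readers following along, the sketch below is a simplified user-space model of what "true lock handoff" means relative to the existing handoff bit: the releaser, while holding wait_lock, marks the first waiter as having been granted the lock instead of merely letting it retry and race with lock stealers. The enum and helper names are illustrative assumptions loosely modeled on the patch titles, not the exact kernel code.

/*
 * Simplified model of the "true handoff" idea in this series.
 * Names (rwsem_handoff_state, HANDOFF_*) follow the patch titles but the
 * surrounding structure is an illustrative assumption, not kernel code.
 */
#include <stdbool.h>

enum rwsem_handoff_state {
	HANDOFF_NONE = 0,	/* no handoff requested */
	HANDOFF_REQUESTED,	/* waiter set the handoff bit, still has to race for the lock */
	HANDOFF_GRANTED,	/* lock was handed to this waiter directly by the releaser */
};

struct rwsem_waiter_model {
	enum rwsem_handoff_state handoff_state;
	bool is_reader;
};

/* Old behaviour: a timed-out first waiter only *requests* handoff. */
static inline void request_handoff(struct rwsem_waiter_model *w)
{
	if (w->handoff_state == HANDOFF_NONE)
		w->handoff_state = HANDOFF_REQUESTED;
}

/*
 * New behaviour: the unlock/wake path, under wait_lock, transfers ownership
 * and marks the waiter GRANTED, so it no longer competes with lock stealers.
 */
static inline void grant_handoff(struct rwsem_waiter_model *w)
{
	w->handoff_state = HANDOFF_GRANTED;
}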
Comments
Hi,

On 10/18/2022 4:44 PM, Hillf Danton wrote:
> On 17 Oct 2022 17:13:55 -0400 Waiman Long <longman@redhat.com>
>> @@ -1067,13 +1119,33 @@ rwsem_down_read_slowpath(struct rw_semaphore *sem, long count, unsigned int stat
>>  		return sem;
>>  	}
>>  	adjustment += RWSEM_FLAG_WAITERS;
>> +	} else if ((count & RWSEM_FLAG_HANDOFF) &&
>> +		  ((count & RWSEM_LOCK_MASK) == RWSEM_READER_BIAS)) {
>
> Could a couple of CPUs go read slow path in parallel?
>
>> +		/*
>> +		 * If the waiter to be handed off is a reader, this reader
>> +		 * can piggyback on top of that.
>> +		 */
>> +		if (rwsem_first_waiter(sem)->type == RWSEM_WAITING_FOR_READ)
>> +			adjustment = 0;
>> +		rwsem_handoff(sem, adjustment, &wake_q);
>> +
>> +		if (!adjustment) {
>> +			raw_spin_unlock_irq(&sem->wait_lock);
>> +			wake_up_q(&wake_q);
>> +			return sem;
>> +		}
>> +		adjustment = 0;
>>  	}
>>  	rwsem_add_waiter(sem, &waiter);
>
> Why can this acquirer piggyback without becoming a waiter?
>
>> -	/* we're now waiting on the lock, but no longer actively locking */
>> -	count = atomic_long_add_return(adjustment, &sem->count);
>> -
>> -	rwsem_cond_wake_waiter(sem, count, &wake_q);
>> +	if (adjustment) {
>> +		/*
>> +		 * We are now waiting on the lock with no handoff, but no
>> +		 * longer actively locking.
>> +		 */
>> +		count = atomic_long_add_return(adjustment, &sem->count);
>> +		rwsem_cond_wake_waiter(sem, count, &wake_q);
>> +	}
>>  	raw_spin_unlock_irq(&sem->wait_lock);
>>
>>  	if (!wake_q_empty(&wake_q))
>> @@ -1120,7 +1192,6 @@ static struct rw_semaphore __sched *
>>  rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
>>  {
>>  	struct rwsem_waiter waiter;
>> -	int null_owner_retries;
>
> This reverts 2/5 and feel free to drop it directly.

I think he intends to tag the first two patches for the stable branches.

-Mukesh

>
> Hillf
On 10/18/22 10:13, Mukesh Ojha wrote:
> Hi,
>
> On 10/18/2022 4:44 PM, Hillf Danton wrote:
>> On 17 Oct 2022 17:13:55 -0400 Waiman Long <longman@redhat.com>
>>> @@ -1067,13 +1119,33 @@ rwsem_down_read_slowpath(struct rw_semaphore *sem, long count, unsigned int stat
>>>  		return sem;
>>>  	}
>>>  	adjustment += RWSEM_FLAG_WAITERS;
>>> +	} else if ((count & RWSEM_FLAG_HANDOFF) &&
>>> +		  ((count & RWSEM_LOCK_MASK) == RWSEM_READER_BIAS)) {
>>
>> Could a couple of CPUs go read slow path in parallel?

This is under wait_lock protection. So no parallel execution is possible.

>>> +		/*
>>> +		 * If the waiter to be handed off is a reader, this reader
>>> +		 * can piggyback on top of that.
>>> +		 */
>>> +		if (rwsem_first_waiter(sem)->type == RWSEM_WAITING_FOR_READ)
>>> +			adjustment = 0;
>>> +		rwsem_handoff(sem, adjustment, &wake_q);
>>> +
>>> +		if (!adjustment) {
>>> +			raw_spin_unlock_irq(&sem->wait_lock);
>>> +			wake_up_q(&wake_q);
>>> +			return sem;
>>> +		}
>>> +		adjustment = 0;
>>>  	}
>>>  	rwsem_add_waiter(sem, &waiter);
>>
>> Why can this acquirer piggyback without becoming a waiter?

The idea is to have as much reader parallelism as possible without writer
starvation. In other words, a continuous stream of readers is not allowed to
block out a writer. However, there are places where allowing one more reader
to take the lock won't cause writer starvation.

>>> -	/* we're now waiting on the lock, but no longer actively locking */
>>> -	count = atomic_long_add_return(adjustment, &sem->count);
>>> -
>>> -	rwsem_cond_wake_waiter(sem, count, &wake_q);
>>> +	if (adjustment) {
>>> +		/*
>>> +		 * We are now waiting on the lock with no handoff, but no
>>> +		 * longer actively locking.
>>> +		 */
>>> +		count = atomic_long_add_return(adjustment, &sem->count);
>>> +		rwsem_cond_wake_waiter(sem, count, &wake_q);
>>> +	}
>>>  	raw_spin_unlock_irq(&sem->wait_lock);
>>>
>>>  	if (!wake_q_empty(&wake_q))
>>> @@ -1120,7 +1192,6 @@ static struct rw_semaphore __sched *
>>>  rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
>>>  {
>>>  	struct rwsem_waiter waiter;
>>> -	int null_owner_retries;
>>
>> This reverts 2/5 and feel free to drop it directly.
>
> I think he intends to tag the first two patches for the stable branches.

This patch is too disruptive to go to the stable branches. Yes, I do intend
the first two patches to go into the stable branches.

Cheers,
Longman
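To make the check being discussed concrete, here is a small user-space sketch of the reader-slowpath condition quoted above. The constant values and the helper name are assumptions modeled loosely on kernel/locking/rwsem.c rather than the kernel's actual definitions; only the shape of the test matters.

/*
 * User-space model of the reader-slowpath check quoted above.  Constant
 * values are simplified assumptions in the spirit of kernel/locking/rwsem.c.
 */
#include <stdbool.h>

#define RWSEM_WRITER_LOCKED	(1UL << 0)
#define RWSEM_FLAG_HANDOFF	(1UL << 2)
#define RWSEM_READER_SHIFT	8
#define RWSEM_READER_BIAS	(1UL << RWSEM_READER_SHIFT)
#define RWSEM_READER_MASK	(~(RWSEM_READER_BIAS - 1))
#define RWSEM_LOCK_MASK		(RWSEM_WRITER_LOCKED | RWSEM_READER_MASK)

/*
 * Called with wait_lock held.  @count already includes this reader's own
 * RWSEM_READER_BIAS (added by the fast-path attempt), so "only my bias"
 * means no other reader or writer currently holds the lock.
 */
static bool reader_can_join_handoff(unsigned long count, bool first_waiter_is_reader)
{
	if (!(count & RWSEM_FLAG_HANDOFF))
		return false;		/* no handoff pending */
	if ((count & RWSEM_LOCK_MASK) != RWSEM_READER_BIAS)
		return false;		/* other lock holders: defer to them */
	/* Handoff can proceed; piggyback only if the beneficiary is a reader. */
	return first_waiter_is_reader;
}

Under these assumptions, a slow-path reader whose own bias is the only lock bit set can let the pending handoff proceed and, when the beneficiary is a reader, take the read lock itself without queueing, which is the piggybacking being questioned here.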
On 10/18/22 19:51, Hillf Danton wrote:
> On 18 Oct 2022 13:37:20 -0400 Waiman Long <longman@redhat.com>
>> On 10/18/22 10:13, Mukesh Ojha wrote:
>>> On 10/18/2022 4:44 PM, Hillf Danton wrote:
>>>> On 17 Oct 2022 17:13:55 -0400 Waiman Long <longman@redhat.com>
>>>>> @@ -1067,13 +1119,33 @@ rwsem_down_read_slowpath(struct rw_semaphore
>>>>>  		return sem;
>>>>>  	}
>>>>>  	adjustment += RWSEM_FLAG_WAITERS;
>>>>> +	} else if ((count & RWSEM_FLAG_HANDOFF) &&
>>>>> +		  ((count & RWSEM_LOCK_MASK) == RWSEM_READER_BIAS)) {
>>>> Could a couple of CPUs go read slow path in parallel?
>>>>
>> This is under wait_lock protection. So no parallel execution is possible.
> They individually add RWSEM_READER_BIAS to count before taking wait_lock,
> and the check for BIAS here does not cover the case of readers in parallel.
> Is this intended?
>
> Hillf

As I said in the patch description, the lock handoff can only be done if we
can be sure that there are no other active locks outstanding while the
handoff bit is set. If, at the time of the check, another reader comes in and
adds its RWSEM_READER_BIAS, the check fails and the CPU will proceed to put
its waiter in the queue and begin sleeping. Hopefully, the last one left will
find that count has only its own RWSEM_READER_BIAS and can start the handoff
process.

Cheers,
Longman
On 10/18/22 22:29, Hillf Danton wrote:
> On 18 Oct 2022 20:39:59 -0400 Waiman Long <longman@redhat.com>
>> On 10/18/22 19:51, Hillf Danton wrote:
>>> On 18 Oct 2022 13:37:20 -0400 Waiman Long <longman@redhat.com>
>>>> On 10/18/22 10:13, Mukesh Ojha wrote:
>>>>> On 10/18/2022 4:44 PM, Hillf Danton wrote:
>>>>>> On 17 Oct 2022 17:13:55 -0400 Waiman Long <longman@redhat.com>
>>>>>>> @@ -1067,13 +1119,33 @@ rwsem_down_read_slowpath(struct rw_semaphore
>>>>>>>  		return sem;
>>>>>>>  	}
>>>>>>>  	adjustment += RWSEM_FLAG_WAITERS;
>>>>>>> +	} else if ((count & RWSEM_FLAG_HANDOFF) &&
>>>>>>> +		  ((count & RWSEM_LOCK_MASK) == RWSEM_READER_BIAS)) {
>>>>>> Could a couple of CPUs go read slow path in parallel?
>>>>>>
>>>> This is under wait_lock protection. So no parallel execution is possible.
>>> They individually add RWSEM_READER_BIAS to count before taking wait_lock,
>>> and the check for BIAS here does not cover the case of readers in parallel.
>>> Is this intended?
>>>
>>> Hillf
>> As I said in the patch description, the lock handoff can only be done if
>> we can be sure that there are no other active locks outstanding while the
>> handoff bit is set. If, at the time of the check, another reader comes in
>> and adds its RWSEM_READER_BIAS, the check fails and the CPU will proceed
>> to put its waiter in the queue and begin sleeping. Hopefully, the last one
>> left will find that count has only its own RWSEM_READER_BIAS and can start
>> the handoff process.
>
> If handoff grants rwsem to a read waiter then the read fast path may revive.

I don't quite understand what you mean by "read fast path may revive".

> And at the time of the check, multiple readers do not break handoff IMO.

I am not saying that multiple readers will break handoff. They will just
delay it until all their temporary RWSEM_READER_BIAS contributions are taken
off.

Cheers,
Longman
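The point about temporary reader biases can be shown with a short worked example. The constants and the "queue and drop the bias" step below are simplifying assumptions modeled on the discussion, not kernel code; the example only shows that the handoff check passes once a single reader bias remains.

/* Worked example: handoff is delayed until only one reader bias is left. */
#include <stdio.h>

#define RWSEM_FLAG_HANDOFF	(1UL << 2)
#define RWSEM_READER_BIAS	(1UL << 8)
#define RWSEM_READER_MASK	(~(RWSEM_READER_BIAS - 1))
#define RWSEM_LOCK_MASK		((1UL << 0) | RWSEM_READER_MASK)

int main(void)
{
	/* A waiter has set the handoff bit; three readers arrive together. */
	unsigned long count = RWSEM_FLAG_HANDOFF + 3 * RWSEM_READER_BIAS;

	for (int readers = 3; readers >= 1; readers--) {
		int only_mine = (count & RWSEM_LOCK_MASK) == RWSEM_READER_BIAS;

		printf("readers with bias: %d -> handoff check %s\n",
		       readers, only_mine ? "passes" : "fails, reader queues");
		/* Simplification: a reader that fails the check queues and drops its bias. */
		if (!only_mine)
			count -= RWSEM_READER_BIAS;
	}
	return 0;
}

Under these assumptions the check fails for the first two readers, each of which queues and withdraws its bias, and passes for the last one, matching the "hopefully, the last one left" behaviour described above.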
On 10/19/22 03:05, Hillf Danton wrote:
> On 18 Oct 2022 22:49:02 -0400 Waiman Long <longman@redhat.com>
>> On 10/18/22 22:29, Hillf Danton wrote:
>>> If handoff grants rwsem to a read waiter then the read fast path may revive.
>> I don't quite understand what you mean by "read fast path may revive".
> Subsequent readers may take rwsem without going the slow path after a
> read waiter is granted.

That may not be true. As long as there are still waiters in the wait queue, a
reader has to go into the slow path and be queued up in the wait queue. This
is to prevent a continuous stream of readers from starving writers in the
wait queue.

> OTOH even after the check for single BIAS, another reader may come in
> and add its BIAS to count, which builds the same count as multiple
> readers came in before the check.

That is true, but hopefully we will eventually run out of readers trying to
get a read lock on a given rwsem.

Cheers,
Longman
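The claim that readers must take the slow path while waiters are queued follows from the read fast-path test itself; the sketch below models it in user space. The mask name and bit values are assumptions in the spirit of kernel/locking/rwsem.c, not the kernel's exact definitions.

/*
 * Sketch of why a reader cannot bypass the wait queue while waiters exist.
 * Bit values and the mask name are simplified assumptions.
 */
#include <stdbool.h>

#define RWSEM_WRITER_LOCKED	(1UL << 0)
#define RWSEM_FLAG_WAITERS	(1UL << 1)
#define RWSEM_FLAG_HANDOFF	(1UL << 2)
#define RWSEM_READER_BIAS	(1UL << 8)

/* Bits that force an arriving reader out of the fast path. */
#define RWSEM_READ_FAIL_MASK	(RWSEM_WRITER_LOCKED | RWSEM_FLAG_WAITERS | \
				 RWSEM_FLAG_HANDOFF)

/*
 * Model of the down_read() fast path: the reader adds its bias, then bails
 * out to the slow path if a writer holds the lock, waiters are queued, or a
 * handoff is pending.  @count is the value *after* adding RWSEM_READER_BIAS.
 */
static bool read_fast_path_succeeds(unsigned long count)
{
	return !(count & RWSEM_READ_FAIL_MASK);
}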