From patchwork Mon Oct 24 17:44:14 2022
X-Patchwork-Submitter: Waiman Long
X-Patchwork-Id: 9957
From: Waiman Long
To: Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng
Cc: linux-kernel@vger.kernel.org, john.p.donnelly@oracle.com, Hillf Danton,
    Mukesh Ojha, Ting11 Wang 王婷, Waiman Long, stable@vger.kernel.org
Subject: [PATCH v4 1/5] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath
Date: Mon, 24 Oct 2022 13:44:14 -0400
Message-Id: <20221024174418.796468-2-longman@redhat.com>
In-Reply-To: <20221024174418.796468-1-longman@redhat.com>
References: <20221024174418.796468-1-longman@redhat.com>
A non-first waiter can potentially spin in the for loop of
rwsem_down_write_slowpath() without sleeping but fail to acquire the
lock even when the rwsem is free, if the following sequence happens:

  Non-first RT waiter          First waiter           Lock holder
  -------------------          ------------           -----------
  Acquire wait_lock
  rwsem_try_write_lock():
    Set handoff bit if RT or
      wait too long
    Set waiter->handoff_set
  Release wait_lock
                               Acquire wait_lock
                               Inherit waiter->handoff_set
                               Release wait_lock
                                                      Clear owner
                                                      Release lock
  if (waiter.handoff_set) {
    rwsem_spin_on_owner();
    if (OWNER_NULL)
      goto trylock_again;
  }
  trylock_again:
  Acquire wait_lock
  rwsem_try_write_lock():
    if (first->handoff_set && (waiter != first))
      return false;
  Release wait_lock

A non-first waiter cannot really acquire the rwsem even if it
mistakenly believes that it can spin on the OWNER_NULL value. If that
waiter happens to be an RT task running on the same CPU as the first
waiter, it can block the first waiter from acquiring the rwsem,
leading to a livelock. Fix this problem by making sure that a
non-first waiter cannot spin in the slowpath loop without sleeping.

Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more consistent")
Reviewed-and-tested-by: Mukesh Ojha
Signed-off-by: Waiman Long
Cc: stable@vger.kernel.org
---
 kernel/locking/rwsem.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 44873594de03..be2df9ea7c30 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -624,18 +624,16 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
 		 */
 		if (first->handoff_set && (waiter != first))
 			return false;
-
-		/*
-		 * First waiter can inherit a previously set handoff
-		 * bit and spin on rwsem if lock acquisition fails.
-		 */
-		if (waiter == first)
-			waiter->handoff_set = true;
 	}

 	new = count;

 	if (count & RWSEM_LOCK_MASK) {
+		/*
+		 * A waiter (first or not) can set the handoff bit
+		 * if it is an RT task or wait in the wait queue
+		 * for too long.
+		 */
 		if (has_handoff || (!rt_task(waiter->task) &&
 				    !time_after(jiffies, waiter->timeout)))
 			return false;
@@ -651,11 +649,12 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
 	} while (!atomic_long_try_cmpxchg_acquire(&sem->count, &count, new));

 	/*
-	 * We have either acquired the lock with handoff bit cleared or
-	 * set the handoff bit.
+	 * We have either acquired the lock with handoff bit cleared or set
+	 * the handoff bit. Only the first waiter can have its handoff_set
+	 * set here to enable optimistic spinning in slowpath loop.
 	 */
 	if (new & RWSEM_FLAG_HANDOFF) {
-		waiter->handoff_set = true;
+		first->handoff_set = true;
 		lockevent_inc(rwsem_wlock_handoff);
 		return false;
 	}
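To see the fixed gating rule in isolation, here is a minimal userspace C
model of the rwsem_try_write_lock() logic above. It is a sketch, not
kernel code: the bit layout, the unconditional handoff request, and the
two-waiter driver in main() are simplified stand-ins for the kernel's
actual definitions.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define FLAG_HANDOFF	1L	/* simplified stand-in for RWSEM_FLAG_HANDOFF */
#define WRITER_LOCKED	2L	/* simplified stand-in for the writer bit */

struct waiter {
	bool handoff_set;
};

static atomic_long count;	/* models sem->count */

/* Assumed to run under the wait_lock, as in the kernel. */
static bool try_write_lock(struct waiter *waiter, struct waiter *first)
{
	long cur = atomic_load(&count), new;

	do {
		/* Non-first waiters must not spin once handoff is pending. */
		if ((cur & FLAG_HANDOFF) && first->handoff_set &&
		    waiter != first)
			return false;
		if (cur & WRITER_LOCKED)
			new = cur | FLAG_HANDOFF;	/* request handoff */
		else
			new = (cur | WRITER_LOCKED) & ~FLAG_HANDOFF;
	} while (!atomic_compare_exchange_weak(&count, &cur, new));

	if (new & FLAG_HANDOFF) {
		first->handoff_set = true;	/* only the FIRST may spin */
		return false;
	}
	return true;
}

int main(void)
{
	struct waiter first = { false }, second = { false };

	atomic_store(&count, WRITER_LOCKED);	/* lock held elsewhere */
	printf("first acquires:  %d\n", try_write_lock(&first, &first));
	printf("second acquires: %d\n", try_write_lock(&second, &first));
	printf("second may spin: %d\n", second.handoff_set);	/* 0: sleeps */
	return 0;
}

Once the handoff is pending for the first waiter, a non-first waiter
fails the trylock immediately and never sets its own handoff_set, so it
goes to sleep instead of spinning, which is the behavior the patch
enforces.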
From patchwork Mon Oct 24 17:44:15 2022
X-Patchwork-Submitter: Waiman Long
X-Patchwork-Id: 10290
From: Waiman Long
To: Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng
Cc: linux-kernel@vger.kernel.org, john.p.donnelly@oracle.com, Hillf Danton,
    Mukesh Ojha, Ting11 Wang 王婷, Waiman Long, stable@vger.kernel.org
Subject: [PATCH v4 2/5] locking/rwsem: Limit # of null owner retries for handoff writer
Date: Mon, 24 Oct 2022 13:44:15 -0400
Message-Id: <20221024174418.796468-3-longman@redhat.com>
In-Reply-To: <20221024174418.796468-1-longman@redhat.com>
References: <20221024174418.796468-1-longman@redhat.com>
Commit 91d2a812dfb9 ("locking/rwsem: Make handoff writer optimistically
spin on owner") assumes that when the owner field is changed to NULL,
the lock will become free soon. Commit 48dfb5d2560d ("locking/rwsem:
Disable preemption while trying for rwsem lock") disabled preemption
when acquiring a rwsem for write. However, preemption has not yet been
disabled when acquiring a read lock on a rwsem. So a reader can add a
RWSEM_READER_BIAS to the count without setting the owner to signal a
reader, then get preempted by an RT task which spins in the writer
slowpath as the owner remains NULL, leading to a livelock.

One way to fix that is to disable preemption before the read lock
attempt and then immediately remove RWSEM_READER_BIAS when the trylock
fails, before re-enabling preemption. This would remove some
optimizations that can be done by delaying the RWSEM_READER_BIAS
backoff. Alternatively, we could delay the preempt_enable() into
rwsem_down_read_slowpath(), even until after acquiring and releasing
the wait_lock. Another possible alternative is to limit the number of
trylock attempts without sleeping. The last alternative is the least
messy and is implemented in this patch. The limit is set to 8 to allow
enough time for the other task to hopefully complete its action.

By adding new lock events to track the number of NULL owner retries
with the handoff flag set before a successful trylock when running a
96-thread locking microbenchmark with an equal number of readers and
writers on a 2-core 96-thread system for 15 seconds, the following
stats were obtained. Note that none of the locking threads are RT
tasks.

  Retries of successful trylock    Count
  -----------------------------    -----
               1                    1738
               2                      19
               3                      11
               4                       2
               5                       1
               6                       1
               7                       1
               8                       0
               X                       1

The last row is the one failed attempt that needed more than 8 retries.
So a retry count maximum of 8 should capture most of them if no RT task
is in the mix.

Fixes: 91d2a812dfb9 ("locking/rwsem: Make handoff writer optimistically spin on owner")
Reported-by: Mukesh Ojha
Signed-off-by: Waiman Long
Reviewed-and-tested-by: Mukesh Ojha
Cc: stable@vger.kernel.org
---
 kernel/locking/rwsem.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index be2df9ea7c30..c68d76fc8c68 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -1115,6 +1115,7 @@ static struct rw_semaphore __sched *
 rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 {
 	struct rwsem_waiter waiter;
+	int null_owner_retries;
 	DEFINE_WAKE_Q(wake_q);

 	/* do optimistic spinning and steal lock if possible */
@@ -1156,7 +1157,7 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 	set_current_state(state);
 	trace_contention_begin(sem, LCB_F_WRITE);

-	for (;;) {
+	for (null_owner_retries = 0;;) {
 		if (rwsem_try_write_lock(sem, &waiter)) {
 			/* rwsem_try_write_lock() implies ACQUIRE on success */
 			break;
@@ -1182,8 +1183,21 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 			owner_state = rwsem_spin_on_owner(sem);
 			preempt_enable();

-			if (owner_state == OWNER_NULL)
+			/*
+			 * owner is NULL doesn't guarantee the lock is free.
+			 * An incoming reader will temporarily increment the
+			 * reader count without changing owner and the
+			 * rwsem_try_write_lock() will fails if the reader
+			 * is not able to decrement it in time. Allow 8
+			 * trylock attempts when hitting a NULL owner before
+			 * going to sleep.
+			 */
+			if ((owner_state == OWNER_NULL) &&
+			    (null_owner_retries < 8)) {
+				null_owner_retries++;
 				goto trylock_again;
+			}
+			null_owner_retries = 0;
 		}

 		schedule();
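The shape of the bounded retry loop can be sketched in plain C11 as
follows. This is a userspace model, not the kernel code: the actual
lock-acquisition attempt is elided and spin_on_owner() is a stand-in
for rwsem_spin_on_owner(); the limit of 8 matches the patch.

#include <stdatomic.h>
#include <stdio.h>

enum owner_state { OWNER_NULL, OWNER_WRITER };

static _Atomic(void *) owner;	/* stays NULL in this demo */

/* Stand-in for rwsem_spin_on_owner(): classify the current owner. */
static enum owner_state spin_on_owner(void)
{
	return atomic_load(&owner) ? OWNER_WRITER : OWNER_NULL;
}

int main(void)
{
	int null_owner_retries;

	for (null_owner_retries = 0;;) {
		/* the trylock would go here; assume it keeps failing */
		if (spin_on_owner() == OWNER_NULL && null_owner_retries < 8) {
			null_owner_retries++;
			continue;	/* retry without sleeping */
		}
		null_owner_retries = 0;
		printf("retries exhausted: would schedule() now\n");
		break;		/* the kernel loops back after waking */
	}
	return 0;
}

The counter is the whole fix: a NULL owner no longer lets a writer spin
indefinitely; after 8 fruitless trylocks it resets the counter and
sleeps, so an RT spinner can no longer starve the task it is waiting on.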
From patchwork Mon Oct 24 17:44:16 2022
X-Patchwork-Submitter: Waiman Long
X-Patchwork-Id: 10006
From: Waiman Long
To: Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng
Cc: linux-kernel@vger.kernel.org, john.p.donnelly@oracle.com, Hillf Danton,
    Mukesh Ojha, Ting11 Wang 王婷, Waiman Long
Subject: [PATCH v4 3/5] locking/rwsem: Change waiter->handoff_set to a handoff_state enum
Date: Mon, 24 Oct 2022 13:44:16 -0400
Message-Id: <20221024174418.796468-4-longman@redhat.com>
In-Reply-To: <20221024174418.796468-1-longman@redhat.com>
References: <20221024174418.796468-1-longman@redhat.com>
Change the boolean waiter->handoff_set to an enum type so that we can
have more states in some later patches. Also use READ_ONCE() outside
wait_lock critical sections for reads and WRITE_ONCE() inside wait_lock
critical sections for writes for proper synchronization. There is no
functional change.

Signed-off-by: Waiman Long
---
 kernel/locking/rwsem.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index c68d76fc8c68..a8bfc905637a 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -338,12 +338,17 @@ enum rwsem_waiter_type {
 	RWSEM_WAITING_FOR_READ
 };

+enum rwsem_handoff_state {
+	HANDOFF_NONE = 0,
+	HANDOFF_REQUESTED,
+};
+
 struct rwsem_waiter {
 	struct list_head list;
 	struct task_struct *task;
 	enum rwsem_waiter_type type;
+	enum rwsem_handoff_state handoff_state;
 	unsigned long timeout;
-	bool handoff_set;
 };
 #define rwsem_first_waiter(sem) \
 	list_first_entry(&sem->wait_list, struct rwsem_waiter, list)
@@ -470,7 +475,7 @@ static void rwsem_mark_wake(struct rw_semaphore *sem,
 			adjustment -= RWSEM_FLAG_HANDOFF;
 			lockevent_inc(rwsem_rlock_handoff);
 		}
-		waiter->handoff_set = true;
+		WRITE_ONCE(waiter->handoff_state, HANDOFF_REQUESTED);
 	}

 	atomic_long_add(-adjustment, &sem->count);
@@ -622,7 +627,7 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
 		 * waiter is the one that set it. Otherwisee, we
 		 * still try to acquire the rwsem.
 		 */
-		if (first->handoff_set && (waiter != first))
+		if (first->handoff_state && (waiter != first))
 			return false;
 	}

@@ -650,11 +655,11 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,

 	/*
 	 * We have either acquired the lock with handoff bit cleared or set
-	 * the handoff bit. Only the first waiter can have its handoff_set
+	 * the handoff bit. Only the first waiter can have its handoff_state
 	 * set here to enable optimistic spinning in slowpath loop.
 	 */
 	if (new & RWSEM_FLAG_HANDOFF) {
-		first->handoff_set = true;
+		WRITE_ONCE(first->handoff_state, HANDOFF_REQUESTED);
 		lockevent_inc(rwsem_wlock_handoff);
 		return false;
 	}
@@ -1043,7 +1048,7 @@ rwsem_down_read_slowpath(struct rw_semaphore *sem, long count, unsigned int stat
 	waiter.task = current;
 	waiter.type = RWSEM_WAITING_FOR_READ;
 	waiter.timeout = jiffies + RWSEM_WAIT_TIMEOUT;
-	waiter.handoff_set = false;
+	waiter.handoff_state = HANDOFF_NONE;

 	raw_spin_lock_irq(&sem->wait_lock);
 	if (list_empty(&sem->wait_list)) {
@@ -1131,7 +1136,7 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 	waiter.task = current;
 	waiter.type = RWSEM_WAITING_FOR_WRITE;
 	waiter.timeout = jiffies + RWSEM_WAIT_TIMEOUT;
-	waiter.handoff_set = false;
+	waiter.handoff_state = HANDOFF_NONE;

 	raw_spin_lock_irq(&sem->wait_lock);
 	rwsem_add_waiter(sem, &waiter);
@@ -1176,7 +1181,7 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 		 * In this case, we attempt to acquire the lock again
 		 * without sleeping.
 		 */
-		if (waiter.handoff_set) {
+		if (READ_ONCE(waiter.handoff_state)) {
 			enum owner_state owner_state;

 			preempt_disable();
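As a rough userspace analogue of this change, the sketch below models
the enum plus the READ_ONCE()/WRITE_ONCE() access pattern with relaxed
C11 atomics. That mapping is an assumption of the sketch: the kernel
macros are volatile accesses, not C11 atomics, but both guarantee
tear-free single-variable loads and stores. Build with -pthread.

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

enum handoff_state { HANDOFF_NONE = 0, HANDOFF_REQUESTED };

struct waiter {
	_Atomic enum handoff_state handoff_state;
};

static pthread_mutex_t wait_lock = PTHREAD_MUTEX_INITIALIZER;

/* Writes happen only with wait_lock held, as in the patch. */
static void request_handoff(struct waiter *w)
{
	pthread_mutex_lock(&wait_lock);
	/* relaxed store ~ WRITE_ONCE(): no tearing, no ordering implied */
	atomic_store_explicit(&w->handoff_state, HANDOFF_REQUESTED,
			      memory_order_relaxed);
	pthread_mutex_unlock(&wait_lock);
}

/* Reads may happen locklessly, hence the READ_ONCE() analogue. */
static enum handoff_state peek_handoff(struct waiter *w)
{
	return atomic_load_explicit(&w->handoff_state, memory_order_relaxed);
}

int main(void)
{
	struct waiter w = { HANDOFF_NONE };

	request_handoff(&w);
	printf("handoff_state = %d\n", peek_handoff(&w));	/* prints 1 */
	return 0;
}

Using an enum rather than a bool costs nothing here but leaves room for
the HANDOFF_GRANTED state that the next patch introduces.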
From patchwork Mon Oct 24 17:44:17 2022
X-Patchwork-Submitter: Waiman Long
X-Patchwork-Id: 9958
From: Waiman Long
To: Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng
Cc: linux-kernel@vger.kernel.org, john.p.donnelly@oracle.com, Hillf Danton,
    Mukesh Ojha, Ting11 Wang 王婷, Waiman Long
Subject: [PATCH v4 4/5] locking/rwsem: Enable direct rwsem lock handoff
Date: Mon, 24 Oct 2022 13:44:17 -0400
Message-Id: <20221024174418.796468-5-longman@redhat.com>
In-Reply-To: <20221024174418.796468-1-longman@redhat.com>
References: <20221024174418.796468-1-longman@redhat.com>
The lock handoff provided in rwsem isn't a true handoff like that in
the mutex. Instead, it is more like a quiescent state where optimistic
spinning and lock stealing are disabled to make it easier for the first
waiter to acquire the lock.

Reworking the code to enable a true lock handoff is more complex due to
the following facts:

 1) The RWSEM_FLAG_HANDOFF bit is protected by the wait_lock and it is
    too expensive to always take the wait_lock in the unlock path to
    prevent racing.
 2) The reader lock fast path may add a RWSEM_READER_BIAS at the wrong
    time to prevent a proper lock handoff from a reader-owned rwsem.

A lock handoff can only be initiated when the following conditions are
true:

 1) The RWSEM_FLAG_HANDOFF bit is set.
 2) The task to do the handoff doesn't see any other active lock,
    excluding the lock that it might have held.

The new handoff mechanism performs the handoff in rwsem_wakeup() to
minimize overhead. The rwsem count will be known at that point to
determine if a handoff should be done. However, there is a small time
gap between when the rwsem becomes free and when the wait_lock is
taken, where a reader can come in and add a RWSEM_READER_BIAS to the
count, or the current first waiter can take the rwsem and clear
RWSEM_FLAG_HANDOFF in the interim. Either will fail the handoff
operation. To handle the former case, a secondary handoff will also be
done in rwsem_down_read_slowpath() to catch it.

With a true lock handoff, there is no need for NULL owner spinning
anymore, as a wakeup will be performed if a handoff is possible. So it
is likely that the first waiter won't actually go to sleep even when
schedule() is called in this case.

Signed-off-by: Waiman Long
---
 kernel/locking/rwsem.c | 141 +++++++++++++++++++++++++++++++----------
 1 file changed, 109 insertions(+), 32 deletions(-)

diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index a8bfc905637a..287606aee0e6 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -341,6 +341,7 @@ enum rwsem_waiter_type {
 enum rwsem_handoff_state {
 	HANDOFF_NONE = 0,
 	HANDOFF_REQUESTED,
+	HANDOFF_GRANTED,
 };

 struct rwsem_waiter {
@@ -489,6 +490,12 @@ static void rwsem_mark_wake(struct rw_semaphore *sem,
 		 */
 		owner = waiter->task;
 		__rwsem_set_reader_owned(sem, owner);
+	} else if (waiter->handoff_state == HANDOFF_GRANTED) {
+		/*
+		 * rwsem_handoff() has added to count RWSEM_READER_BIAS of
+		 * the first waiter.
+		 */
+		adjustment = RWSEM_READER_BIAS;
 	}

 	/*
@@ -586,7 +593,7 @@ rwsem_del_wake_waiter(struct rw_semaphore *sem, struct rwsem_waiter *waiter,
 			  struct wake_q_head *wake_q)
 			  __releases(&sem->wait_lock)
 {
-	bool first = rwsem_first_waiter(sem) == waiter;
+	struct rwsem_waiter *first = rwsem_first_waiter(sem);

 	wake_q_init(wake_q);

@@ -595,8 +602,21 @@ rwsem_del_wake_waiter(struct rw_semaphore *sem, struct rwsem_waiter *waiter,
 	 * the first waiter, we wake up the remaining waiters as they may
 	 * be eligible to acquire or spin on the lock.
 	 */
-	if (rwsem_del_waiter(sem, waiter) && first)
+	if (rwsem_del_waiter(sem, waiter) && (waiter == first)) {
+		switch (waiter->handoff_state) {
+		case HANDOFF_GRANTED:
+			raw_spin_unlock_irq(&sem->wait_lock);
+			return;
+		case HANDOFF_REQUESTED:
+			/* Pass handoff state to the new first waiter */
+			first = rwsem_first_waiter(sem);
+			WRITE_ONCE(first->handoff_state, HANDOFF_REQUESTED);
+			fallthrough;
+		default:
+			break;
+		}
 		rwsem_mark_wake(sem, RWSEM_WAKE_ANY, wake_q);
+	}
 	raw_spin_unlock_irq(&sem->wait_lock);
 	if (!wake_q_empty(wake_q))
 		wake_up_q(wake_q);
@@ -764,6 +784,11 @@ rwsem_spin_on_owner(struct rw_semaphore *sem)

 	owner = rwsem_owner_flags(sem, &flags);
 	state = rwsem_owner_state(owner, flags);
+
+	/* A handoff may have been granted */
+	if (!flags && (owner == current))
+		return OWNER_NONSPINNABLE;
+
 	if (state != OWNER_WRITER)
 		return state;

@@ -977,6 +1002,32 @@ rwsem_spin_on_owner(struct rw_semaphore *sem)
 }
 #endif

+/*
+ * Hand off the lock to the first waiter
+ */
+static void rwsem_handoff(struct rw_semaphore *sem, long adj,
+			  struct wake_q_head *wake_q)
+{
+	struct rwsem_waiter *waiter;
+	enum rwsem_wake_type wake_type;
+
+	lockdep_assert_held(&sem->wait_lock);
+	adj -= RWSEM_FLAG_HANDOFF;
+	waiter = rwsem_first_waiter(sem);
+	WRITE_ONCE(waiter->handoff_state, HANDOFF_GRANTED);
+	if (waiter->type == RWSEM_WAITING_FOR_WRITE) {
+		wake_type = RWSEM_WAKE_ANY;
+		adj += RWSEM_WRITER_LOCKED;
+		atomic_long_set(&sem->owner, (long)waiter->task);
+	} else {
+		wake_type = RWSEM_WAKE_READ_OWNED;
+		adj += RWSEM_READER_BIAS;
+		__rwsem_set_reader_owned(sem, waiter->task);
+	}
+	atomic_long_add(adj, &sem->count);
+	rwsem_mark_wake(sem, wake_type, wake_q);
+}
+
 /*
  * Prepare to wake up waiter(s) in the wait queue by putting them into the
  * given wake_q if the rwsem lock owner isn't a writer. If rwsem is likely
@@ -1051,6 +1102,7 @@ rwsem_down_read_slowpath(struct rw_semaphore *sem, long count, unsigned int stat
 	waiter.handoff_state = HANDOFF_NONE;

 	raw_spin_lock_irq(&sem->wait_lock);
+	count = atomic_long_read(&sem->count);
 	if (list_empty(&sem->wait_list)) {
 		/*
 		 * In case the wait queue is empty and the lock isn't owned
@@ -1058,7 +1110,7 @@ rwsem_down_read_slowpath(struct rw_semaphore *sem, long count, unsigned int stat
 		 * immediately as its RWSEM_READER_BIAS has already been set
 		 * in the count.
 		 */
-		if (!(atomic_long_read(&sem->count) & RWSEM_WRITER_MASK)) {
+		if (!(count & RWSEM_WRITER_MASK)) {
 			/* Provide lock ACQUIRE */
 			smp_acquire__after_ctrl_dep();
 			raw_spin_unlock_irq(&sem->wait_lock);
@@ -1067,13 +1119,33 @@ rwsem_down_read_slowpath(struct rw_semaphore *sem, long count, unsigned int stat
 			return sem;
 		}
 		adjustment += RWSEM_FLAG_WAITERS;
+	} else if ((count & RWSEM_FLAG_HANDOFF) &&
+		  ((count & RWSEM_LOCK_MASK) == RWSEM_READER_BIAS)) {
+		/*
+		 * If the waiter to be handed off is a reader, this reader
+		 * can piggyback on top of top of that.
+		 */
+		if (rwsem_first_waiter(sem)->type == RWSEM_WAITING_FOR_READ)
+			adjustment = 0;
+		rwsem_handoff(sem, adjustment, &wake_q);
+
+		if (!adjustment) {
+			raw_spin_unlock_irq(&sem->wait_lock);
+			wake_up_q(&wake_q);
+			return sem;
+		}
+		adjustment = 0;
 	}
 	rwsem_add_waiter(sem, &waiter);

-	/* we're now waiting on the lock, but no longer actively locking */
-	count = atomic_long_add_return(adjustment, &sem->count);
-
-	rwsem_cond_wake_waiter(sem, count, &wake_q);
+	if (adjustment) {
+		/*
+		 * We are now waiting on the lock with no handoff, but no
+		 * longer actively locking.
+		 */
+		count = atomic_long_add_return(adjustment, &sem->count);
+		rwsem_cond_wake_waiter(sem, count, &wake_q);
+	}
 	raw_spin_unlock_irq(&sem->wait_lock);

 	if (!wake_q_empty(&wake_q))
@@ -1120,7 +1192,6 @@ static struct rw_semaphore __sched *
 rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 {
 	struct rwsem_waiter waiter;
-	int null_owner_retries;
 	DEFINE_WAKE_Q(wake_q);

 	/* do optimistic spinning and steal lock if possible */
@@ -1162,7 +1233,7 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 	set_current_state(state);
 	trace_contention_begin(sem, LCB_F_WRITE);

-	for (null_owner_retries = 0;;) {
+	for (;;) {
 		if (rwsem_try_write_lock(sem, &waiter)) {
 			/* rwsem_try_write_lock() implies ACQUIRE on success */
 			break;
@@ -1182,37 +1253,28 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 		 * In this case, we attempt to acquire the lock again
 		 * without sleeping.
 		 */
 		if (READ_ONCE(waiter.handoff_state)) {
-			enum owner_state owner_state;
-
-			preempt_disable();
-			owner_state = rwsem_spin_on_owner(sem);
-			preempt_enable();
-
-			/*
-			 * owner is NULL doesn't guarantee the lock is free.
-			 * An incoming reader will temporarily increment the
-			 * reader count without changing owner and the
-			 * rwsem_try_write_lock() will fails if the reader
-			 * is not able to decrement it in time. Allow 8
-			 * trylock attempts when hitting a NULL owner before
-			 * going to sleep.
-			 */
-			if ((owner_state == OWNER_NULL) &&
-			    (null_owner_retries < 8)) {
-				null_owner_retries++;
-				goto trylock_again;
+			if (READ_ONCE(waiter.handoff_state) == HANDOFF_REQUESTED) {
+				preempt_disable();
+				rwsem_spin_on_owner(sem);
+				preempt_enable();
 			}
-			null_owner_retries = 0;
+			if (READ_ONCE(waiter.handoff_state) == HANDOFF_GRANTED)
+				goto skip_sleep;
 		}

 		schedule();
 		lockevent_inc(rwsem_sleep_writer);
 		set_current_state(state);
-trylock_again:
+skip_sleep:
 		raw_spin_lock_irq(&sem->wait_lock);
+		if (waiter.handoff_state == HANDOFF_GRANTED) {
+			rwsem_del_waiter(sem, &waiter);
+			break;
+		}
 	}
 	__set_current_state(TASK_RUNNING);
 	raw_spin_unlock_irq(&sem->wait_lock);
+out_lock:
 	lockevent_inc(rwsem_wlock);
 	trace_contention_end(sem, 0);
 	return sem;
@@ -1221,6 +1283,9 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 	__set_current_state(TASK_RUNNING);
 	raw_spin_lock_irq(&sem->wait_lock);
 	rwsem_del_wake_waiter(sem, &waiter, &wake_q);
+	if (unlikely(READ_ONCE(waiter.handoff_state) == HANDOFF_GRANTED))
+		goto out_lock;
+
 	lockevent_inc(rwsem_wlock_fail);
 	trace_contention_end(sem, -EINTR);
 	return ERR_PTR(-EINTR);
@@ -1232,12 +1297,24 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
  */
 static struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem)
 {
-	unsigned long flags;
 	DEFINE_WAKE_Q(wake_q);
+	unsigned long flags;
+	long count;

 	raw_spin_lock_irqsave(&sem->wait_lock, flags);
-	if (!list_empty(&sem->wait_list))
+	if (list_empty(&sem->wait_list)) {
+		raw_spin_unlock_irqrestore(&sem->wait_lock, flags);
+		return sem;
+	}
+	/*
+	 * If the rwsem is free and handoff flag is set with wait_lock held,
+	 * no other CPUs can take an active lock.
+	 */
+	count = atomic_long_read(&sem->count);
+	if (!(count & RWSEM_LOCK_MASK) && (count & RWSEM_FLAG_HANDOFF))
+		rwsem_handoff(sem, 0, &wake_q);
+	else
 		rwsem_mark_wake(sem, RWSEM_WAKE_ANY, &wake_q);
 	raw_spin_unlock_irqrestore(&sem->wait_lock, flags);
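The core of the direct-handoff idea can be condensed into the following
userspace sketch. It is only an illustration of the control flow: the
flag variables, the elided wait_lock, and the single-waiter driver in
main() are simplifications, not the kernel's real data layout.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

enum handoff_state { HANDOFF_NONE, HANDOFF_REQUESTED, HANDOFF_GRANTED };

struct waiter {
	int id;
	_Atomic enum handoff_state handoff_state;
};

static _Atomic(struct waiter *) owner;	/* models sem->owner */
static atomic_bool locked;		/* models the RWSEM_LOCK_MASK bits */
static atomic_bool handoff;		/* models RWSEM_FLAG_HANDOFF */

/* In the real code this runs under sem->wait_lock (elided here). */
static void grant_handoff(struct waiter *first)
{
	atomic_store(&handoff, false);
	atomic_store(&locked, true);	/* ownership passes directly */
	atomic_store(&owner, first);
	atomic_store(&first->handoff_state, HANDOFF_GRANTED);
	/* ...then wake 'first', which skips the trylock race entirely */
}

static void unlock_wakeup_path(struct waiter *first)
{
	atomic_store(&locked, false);	/* lock is now free */
	/* Free + handoff requested: grant instead of letting tasks race. */
	if (!atomic_load(&locked) && atomic_load(&handoff))
		grant_handoff(first);
}

int main(void)
{
	struct waiter first = { .id = 1,
				.handoff_state = HANDOFF_REQUESTED };

	atomic_store(&locked, true);	/* lock currently held */
	atomic_store(&handoff, true);	/* first waiter requested handoff */
	unlock_wakeup_path(&first);
	printf("granted=%d new owner=%d\n",
	       atomic_load(&first.handoff_state) == HANDOFF_GRANTED,
	       atomic_load(&owner)->id);
	return 0;
}

The design point is that ownership is transferred by the unlocker while
it already holds the wait_lock, so the woken waiter finds itself the
owner (HANDOFF_GRANTED) instead of re-running the trylock against
incoming readers and lock stealers.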
From patchwork Mon Oct 24 17:44:18 2022
X-Patchwork-Submitter: Waiman Long
X-Patchwork-Id: 10004
From: Waiman Long
To: Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng
Cc: linux-kernel@vger.kernel.org, john.p.donnelly@oracle.com, Hillf Danton,
    Mukesh Ojha, Ting11 Wang 王婷, Waiman Long
Subject: [PATCH v4 5/5] locking/rwsem: Update handoff lock events tracking
Date: Mon, 24 Oct 2022 13:44:18 -0400
Message-Id: <20221024174418.796468-6-longman@redhat.com>
In-Reply-To: <20221024174418.796468-1-longman@redhat.com>
References: <20221024174418.796468-1-longman@redhat.com>
With the new direct rwsem lock handoff, the corresponding handoff lock
events are updated to also track the number of secondary lock handoffs
in rwsem_down_read_slowpath() to see how prevalent those handoff events
are. The number of primary lock handoffs in the unlock paths is
(rwsem_handoff_read + rwsem_handoff_write - rwsem_handoff_rslow).

After running a 96-thread rwsem microbenchmark with an equal number of
readers and writers on a 2-socket 96-thread system for 40s, the
following handoff stats were obtained:

  rwsem_handoff_read=189
  rwsem_handoff_rslow=1
  rwsem_handoff_write=6678
  rwsem_handoff_wspin=6681

The number of primary handoffs was 6866, whereas there was only one
secondary handoff for this test run.

Signed-off-by: Waiman Long
---
 kernel/locking/lock_events_list.h | 6 ++++--
 kernel/locking/rwsem.c            | 9 +++++----
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/kernel/locking/lock_events_list.h b/kernel/locking/lock_events_list.h
index 97fb6f3f840a..04d101767c2c 100644
--- a/kernel/locking/lock_events_list.h
+++ b/kernel/locking/lock_events_list.h
@@ -63,7 +63,9 @@ LOCK_EVENT(rwsem_rlock)	/* # of read locks acquired */
 LOCK_EVENT(rwsem_rlock_steal)	/* # of read locks by lock stealing */
 LOCK_EVENT(rwsem_rlock_fast)	/* # of fast read locks acquired */
 LOCK_EVENT(rwsem_rlock_fail)	/* # of failed read lock acquisitions */
-LOCK_EVENT(rwsem_rlock_handoff)	/* # of read lock handoffs */
 LOCK_EVENT(rwsem_wlock)	/* # of write locks acquired */
 LOCK_EVENT(rwsem_wlock_fail)	/* # of failed write lock acquisitions */
-LOCK_EVENT(rwsem_wlock_handoff)	/* # of write lock handoffs */
+LOCK_EVENT(rwsem_handoff_read)	/* # of read lock handoffs */
+LOCK_EVENT(rwsem_handoff_write)	/* # of write lock handoffs */
+LOCK_EVENT(rwsem_handoff_rslow)	/* # of handoffs in read slowpath */
+LOCK_EVENT(rwsem_handoff_wspin)	/* # of handoff spins in write slowpath */

diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 287606aee0e6..46aea1994bf8 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -472,10 +472,8 @@ static void rwsem_mark_wake(struct rw_semaphore *sem,
 		 * force the issue.
 		 */
 		if (time_after(jiffies, waiter->timeout)) {
-			if (!(oldcount & RWSEM_FLAG_HANDOFF)) {
+			if (!(oldcount & RWSEM_FLAG_HANDOFF))
 				adjustment -= RWSEM_FLAG_HANDOFF;
-				lockevent_inc(rwsem_rlock_handoff);
-			}
 			WRITE_ONCE(waiter->handoff_state, HANDOFF_REQUESTED);
 		}

@@ -680,7 +678,6 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
 	 */
 	if (new & RWSEM_FLAG_HANDOFF) {
 		WRITE_ONCE(first->handoff_state, HANDOFF_REQUESTED);
-		lockevent_inc(rwsem_wlock_handoff);
 		return false;
 	}

@@ -1019,10 +1016,12 @@ static void rwsem_handoff(struct rw_semaphore *sem, long adj,
 		wake_type = RWSEM_WAKE_ANY;
 		adj += RWSEM_WRITER_LOCKED;
 		atomic_long_set(&sem->owner, (long)waiter->task);
+		lockevent_inc(rwsem_handoff_write);
 	} else {
 		wake_type = RWSEM_WAKE_READ_OWNED;
 		adj += RWSEM_READER_BIAS;
 		__rwsem_set_reader_owned(sem, waiter->task);
+		lockevent_inc(rwsem_handoff_read);
 	}
 	atomic_long_add(adj, &sem->count);
 	rwsem_mark_wake(sem, wake_type, wake_q);
@@ -1128,6 +1127,7 @@ rwsem_down_read_slowpath(struct rw_semaphore *sem, long count, unsigned int stat
 		if (rwsem_first_waiter(sem)->type == RWSEM_WAITING_FOR_READ)
 			adjustment = 0;
 		rwsem_handoff(sem, adjustment, &wake_q);
+		lockevent_inc(rwsem_handoff_rslow);

 		if (!adjustment) {
 			raw_spin_unlock_irq(&sem->wait_lock);
@@ -1257,6 +1257,7 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 			preempt_disable();
 			rwsem_spin_on_owner(sem);
 			preempt_enable();
+			lockevent_inc(rwsem_handoff_wspin);
 		}
 		if (READ_ONCE(waiter.handoff_state) == HANDOFF_GRANTED)
 			goto skip_sleep;
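For readers unfamiliar with the lock event mechanism being renamed
here, this is roughly its shape: one named counter per event, bumped
with lockevent_inc(). The sketch below is a userspace stand-in; the
kernel's LOCK_EVENT() list generates per-cpu counters, which plain C11
atomics approximate here.

#include <stdatomic.h>
#include <stdio.h>

enum lock_events { handoff_read, handoff_write, handoff_rslow,
		   handoff_wspin, LOCKEVENT_NUM };

static atomic_long lockevents[LOCKEVENT_NUM];

#define lockevent_inc(ev)  atomic_fetch_add(&lockevents[ev], 1)

int main(void)
{
	/* e.g. one write handoff that needed one spin on the owner */
	lockevent_inc(handoff_wspin);
	lockevent_inc(handoff_write);
	printf("rwsem_handoff_write=%ld rwsem_handoff_wspin=%ld\n",
	       atomic_load(&lockevents[handoff_write]),
	       atomic_load(&lockevents[handoff_wspin]));
	return 0;
}

Counting at the rwsem_handoff() call sites rather than at the flag-set
sites is what lets the commit message derive the number of primary
handoffs as (rwsem_handoff_read + rwsem_handoff_write -
rwsem_handoff_rslow).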