From patchwork Thu Nov 3 18:29:31 2022
X-Patchwork-Submitter: Waiman Long
X-Patchwork-Id: 15099
From: Waiman Long
To: Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng
Cc: linux-kernel@vger.kernel.org, john.p.donnelly@oracle.com,
    Hillf Danton, Mukesh Ojha, Ting11 Wang 王婷, Waiman Long,
    stable@vger.kernel.org
Subject: [PATCH v5 1/6] locking/rwsem: Prevent non-first waiter from
 spinning in down_write() slowpath
Date: Thu, 3 Nov 2022 14:29:31 -0400
Message-Id: <20221103182936.217120-2-longman@redhat.com>
In-Reply-To: <20221103182936.217120-1-longman@redhat.com>
References: <20221103182936.217120-1-longman@redhat.com>
A non-first waiter can potentially spin in the for loop of
rwsem_down_write_slowpath() without sleeping but fail to acquire the
lock even when the rwsem is free, if the following sequence happens:

  Non-first RT waiter    First waiter      Lock holder
  -------------------    ------------      -----------
                         Acquire wait_lock
                         rwsem_try_write_lock():
                           Set handoff bit if RT or
                             wait too long
                           Set waiter->handoff_set
                         Release wait_lock
  Acquire wait_lock
  Inherit waiter->handoff_set
  Release wait_lock
                                           Clear owner
                                           Release lock
  if (waiter.handoff_set) {
    rwsem_spin_on_owner();
    if (OWNER_NULL)
      goto trylock_again;
  }
  trylock_again:
  Acquire wait_lock
  rwsem_try_write_lock():
    if (first->handoff_set && (waiter != first))
      return false;
  Release wait_lock

A non-first waiter cannot really acquire the rwsem even if it
mistakenly believes that it can spin on the OWNER_NULL value. If that
waiter happens to be an RT task running on the same CPU as the first
waiter, it can block the first waiter from acquiring the rwsem,
leading to a livelock. Fix this problem by making sure that a
non-first waiter cannot spin in the slowpath loop without sleeping.

Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more consistent")
Reviewed-and-tested-by: Mukesh Ojha
Signed-off-by: Waiman Long
Cc: stable@vger.kernel.org
---
 kernel/locking/rwsem.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 44873594de03..be2df9ea7c30 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -624,18 +624,16 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
 		 */
 		if (first->handoff_set && (waiter != first))
 			return false;
-
-		/*
-		 * First waiter can inherit a previously set handoff
-		 * bit and spin on rwsem if lock acquisition fails.
-		 */
-		if (waiter == first)
-			waiter->handoff_set = true;
 	}
 
 	new = count;
 
 	if (count & RWSEM_LOCK_MASK) {
+		/*
+		 * A waiter (first or not) can set the handoff bit
+		 * if it is an RT task or has waited in the wait queue
+		 * for too long.
+		 */
 		if (has_handoff || (!rt_task(waiter->task) &&
 				    !time_after(jiffies, waiter->timeout)))
 			return false;
@@ -651,11 +649,12 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
 	} while (!atomic_long_try_cmpxchg_acquire(&sem->count, &count, new));
 
 	/*
-	 * We have either acquired the lock with handoff bit cleared or
-	 * set the handoff bit.
+	 * We have either acquired the lock with handoff bit cleared or set
+	 * the handoff bit. Only the first waiter can have its handoff_set
+	 * set here to enable optimistic spinning in slowpath loop.
	 */
 	if (new & RWSEM_FLAG_HANDOFF) {
-		waiter->handoff_set = true;
+		first->handoff_set = true;
 		lockevent_inc(rwsem_wlock_handoff);
 		return false;
 	}
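
The rule this fix enforces can be modeled in a few lines of user-space
C. This is a hypothetical sketch with made-up types and names, not the
kernel code: after the fix, handoff_set is only ever set on the first
waiter, so a non-first waiter that observes the handoff bit must sleep
instead of retrying without a sleep.

  #include <stdbool.h>
  #include <stdio.h>

  struct waiter {
          bool handoff_set;
  };

  /* After the fix, only the first waiter may spin without sleeping. */
  static bool may_spin_without_sleep(const struct waiter *first,
                                     const struct waiter *w)
  {
          return (w == first) && w->handoff_set;
  }

  int main(void)
  {
          struct waiter first = { .handoff_set = true };
          struct waiter rt_waiter = { .handoff_set = true }; /* pre-fix state */

          printf("first waiter may spin: %d\n",
                 may_spin_without_sleep(&first, &first));     /* prints 1 */
          printf("non-first waiter may spin: %d\n",
                 may_spin_without_sleep(&first, &rt_waiter)); /* prints 0 */
          return 0;
  }
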
From patchwork Thu Nov 3 18:29:32 2022
X-Patchwork-Submitter: Waiman Long
X-Patchwork-Id: 15098
From: Waiman Long
To: Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng
Cc: linux-kernel@vger.kernel.org, john.p.donnelly@oracle.com,
    Hillf Danton, Mukesh Ojha, Ting11 Wang 王婷, Waiman Long
Subject: [PATCH v5 2/6] locking/rwsem: Disable preemption at all down_read*()
 and up_read() code paths
Date: Thu, 3 Nov 2022 14:29:32 -0400
Message-Id: <20221103182936.217120-3-longman@redhat.com>
In-Reply-To: <20221103182936.217120-1-longman@redhat.com>
References: <20221103182936.217120-1-longman@redhat.com>
Commit 91d2a812dfb9 ("locking/rwsem: Make handoff writer optimistically
spin on owner") assumes that when the owner field is changed to NULL,
the lock will become free soon. Commit 48dfb5d2560d ("locking/rwsem:
Disable preemption while trying for rwsem lock") disables preemption
when acquiring the rwsem for write. However, preemption has not yet
been disabled when acquiring a read lock on an rwsem. So a reader can
add RWSEM_READER_BIAS to the count without yet setting the owner field
to signal a reader, then get preempted by an RT task which spins in the
writer slowpath as the owner remains NULL, leading to a livelock.

One easy way to fix this problem is to disable preemption in all the
down_read*() and up_read() code paths, as implemented in this patch.

Fixes: 91d2a812dfb9 ("locking/rwsem: Make handoff writer optimistically spin on owner")
Reported-by: Mukesh Ojha
Suggested-by: Peter Zijlstra
Signed-off-by: Waiman Long
---
 kernel/locking/rwsem.c | 24 ++++++++++++++++++------
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index be2df9ea7c30..ebaff8a87e1d 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -1091,7 +1091,7 @@ rwsem_down_read_slowpath(struct rw_semaphore *sem, long count, unsigned int stat
 			/* Ordered by sem->wait_lock against rwsem_mark_wake(). */
 			break;
 		}
-		schedule();
+		schedule_preempt_disabled();
 		lockevent_inc(rwsem_sleep_reader);
 	}
 
@@ -1253,14 +1253,20 @@ static struct rw_semaphore *rwsem_downgrade_wake(struct rw_semaphore *sem)
  */
 static inline int __down_read_common(struct rw_semaphore *sem, int state)
 {
+	int ret = 0;
 	long count;
 
+	preempt_disable();
 	if (!rwsem_read_trylock(sem, &count)) {
-		if (IS_ERR(rwsem_down_read_slowpath(sem, count, state)))
-			return -EINTR;
+		if (IS_ERR(rwsem_down_read_slowpath(sem, count, state))) {
+			ret = -EINTR;
+			goto out;
+		}
 		DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem);
 	}
-	return 0;
+out:
+	preempt_enable();
+	return ret;
 }
 
 static inline void __down_read(struct rw_semaphore *sem)
@@ -1280,19 +1286,23 @@ static inline int __down_read_killable(struct rw_semaphore *sem)
 
 static inline int __down_read_trylock(struct rw_semaphore *sem)
 {
+	int ret = 0;
 	long tmp;
 
 	DEBUG_RWSEMS_WARN_ON(sem->magic != sem, sem);
 
+	preempt_disable();
 	tmp = atomic_long_read(&sem->count);
 	while (!(tmp & RWSEM_READ_FAILED_MASK)) {
 		if (atomic_long_try_cmpxchg_acquire(&sem->count, &tmp,
 						    tmp + RWSEM_READER_BIAS)) {
 			rwsem_set_reader_owned(sem);
-			return 1;
+			ret = 1;
+			break;
 		}
 	}
-	return 0;
+	preempt_enable();
+	return ret;
 }
 
 /*
@@ -1334,6 +1344,7 @@ static inline void __up_read(struct rw_semaphore *sem)
 	DEBUG_RWSEMS_WARN_ON(sem->magic != sem, sem);
 	DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem);
 
+	preempt_disable();
 	rwsem_clear_reader_owned(sem);
 	tmp = atomic_long_add_return_release(-RWSEM_READER_BIAS, &sem->count);
 	DEBUG_RWSEMS_WARN_ON(tmp < 0, sem);
@@ -1342,6 +1353,7 @@ static inline void __up_read(struct rw_semaphore *sem)
 		clear_nonspinnable(sem);
 		rwsem_wake(sem);
 	}
+	preempt_enable();
 }
 
 /*
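
The control-flow pattern applied above can be seen in isolation in the
following user-space sketch; the stub functions are placeholders for
the kernel primitives, not the real API. The whole trylock-plus-
slowpath sequence now sits inside a single preemption-disabled region
with one exit point:

  #include <stdbool.h>
  #include <stdio.h>

  /* Stubs standing in for the kernel primitives. */
  static void preempt_disable(void) { }
  static void preempt_enable(void)  { }
  static bool read_trylock(void)    { return false; } /* force the slowpath */
  static int  read_slowpath(void)   { return 0; }     /* 0 = lock acquired */

  /* Mirrors the control flow of the patched __down_read_common(). */
  static int down_read_common(void)
  {
          int ret = 0;

          preempt_disable();
          if (!read_trylock()) {
                  if (read_slowpath() != 0) {
                          ret = -1; /* -EINTR in the real code */
                          goto out;
                  }
          }
  out:
          preempt_enable();
          return ret;
  }

  int main(void)
  {
          printf("down_read_common() = %d\n", down_read_common()); /* 0 */
          return 0;
  }
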
From patchwork Thu Nov 3 18:29:33 2022
X-Patchwork-Submitter: Waiman Long
X-Patchwork-Id: 15094
From: Waiman Long
To: Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng
Cc: linux-kernel@vger.kernel.org, john.p.donnelly@oracle.com,
    Hillf Danton, Mukesh Ojha, Ting11 Wang 王婷, Waiman Long
Subject: [PATCH v5 3/6] locking/rwsem: Disable preemption at all down_write*()
 and up_write() code paths
Date: Thu, 3 Nov 2022 14:29:33 -0400
Message-Id: <20221103182936.217120-4-longman@redhat.com>
In-Reply-To: <20221103182936.217120-1-longman@redhat.com>
References: <20221103182936.217120-1-longman@redhat.com>

The previous patch has disabled preemption at all the down_read() and
up_read() code paths. For symmetry, this patch extends commit
48dfb5d2560d ("locking/rwsem: Disable preemption while trying for rwsem
lock") to have preemption disabled at all the down_write() and
up_write() code paths, including downgrade_write().

Suggested-by: Peter Zijlstra
Signed-off-by: Waiman Long
---
 kernel/locking/rwsem.c | 35 ++++++++++++++++++-----------------
 1 file changed, 18 insertions(+), 17 deletions(-)

diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index ebaff8a87e1d..2953fa4dd790 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -256,16 +256,13 @@ static inline bool rwsem_read_trylock(struct rw_semaphore *sem, long *cntp)
 static inline bool rwsem_write_trylock(struct rw_semaphore *sem)
 {
 	long tmp = RWSEM_UNLOCKED_VALUE;
-	bool ret = false;
 
-	preempt_disable();
 	if (atomic_long_try_cmpxchg_acquire(&sem->count, &tmp, RWSEM_WRITER_LOCKED)) {
 		rwsem_set_owner(sem);
-		ret = true;
+		return true;
 	}
 
-	preempt_enable();
-	return ret;
+	return false;
 }
 
 /*
@@ -716,7 +713,6 @@ static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
 		return false;
 	}
 
-	preempt_disable();
 	/*
 	 * Disable preemption is equal to the RCU read-side crital section,
 	 * thus the task_strcut structure won't go away.
@@ -728,7 +724,6 @@ static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
 	if ((flags & RWSEM_NONSPINNABLE) ||
 	    (owner && !(flags & RWSEM_READER_OWNED) && !owner_on_cpu(owner)))
 		ret = false;
-	preempt_enable();
 
 	lockevent_cond_inc(rwsem_opt_fail, !ret);
 	return ret;
@@ -828,8 +823,6 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
 	int loop = 0;
 	u64 rspin_threshold = 0;
 
-	preempt_disable();
-
 	/* sem->wait_lock should not be held when doing optimistic spinning */
 	if (!osq_lock(&sem->osq))
 		goto done;
@@ -937,7 +930,6 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
 	}
 	osq_unlock(&sem->osq);
 done:
-	preempt_enable();
 	lockevent_cond_inc(rwsem_opt_fail, !taken);
 	return taken;
 }
@@ -1178,15 +1170,12 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 		if (waiter.handoff_set) {
 			enum owner_state owner_state;
 
-			preempt_disable();
 			owner_state = rwsem_spin_on_owner(sem);
-			preempt_enable();
-
 			if (owner_state == OWNER_NULL)
 				goto trylock_again;
 		}
 
-		schedule();
+		schedule_preempt_disabled();
 		lockevent_inc(rwsem_sleep_writer);
 		set_current_state(state);
 trylock_again:
@@ -1310,10 +1299,14 @@ static inline int __down_read_trylock(struct rw_semaphore *sem)
  */
 static inline int __down_write_common(struct rw_semaphore *sem, int state)
 {
+	int ret = 0;
+
+	preempt_disable();
 	if (unlikely(!rwsem_write_trylock(sem))) {
 		if (IS_ERR(rwsem_down_write_slowpath(sem, state)))
-			return -EINTR;
+			ret = -EINTR;
 	}
+	preempt_enable();
-	return 0;
+	return ret;
 }
 
 static inline int __down_write(struct rw_semaphore *sem)
@@ -1330,8 +1323,14 @@ static inline int __down_write_killable(struct rw_semaphore *sem)
 
 static inline int __down_write_trylock(struct rw_semaphore *sem)
 {
+	int ret;
+
+	preempt_disable();
 	DEBUG_RWSEMS_WARN_ON(sem->magic != sem, sem);
-	return rwsem_write_trylock(sem);
+	ret = rwsem_write_trylock(sem);
+	preempt_enable();
+
+	return ret;
 }
 
 /*
@@ -1374,9 +1373,9 @@ static inline void __up_write(struct rw_semaphore *sem)
 	preempt_disable();
 	rwsem_clear_owner(sem);
 	tmp = atomic_long_fetch_add_release(-RWSEM_WRITER_LOCKED, &sem->count);
-	preempt_enable();
 	if (unlikely(tmp & RWSEM_FLAG_WAITERS))
 		rwsem_wake(sem);
+	preempt_enable();
 }
 
 /*
@@ -1394,11 +1393,13 @@ static inline void __downgrade_write(struct rw_semaphore *sem)
 	 * write side. As such, rely on RELEASE semantics.
	 */
 	DEBUG_RWSEMS_WARN_ON(rwsem_owner(sem) != current, sem);
+	preempt_disable();
 	tmp = atomic_long_fetch_add_release(
 		-RWSEM_WRITER_LOCKED+RWSEM_READER_BIAS, &sem->count);
 	rwsem_set_reader_owned(sem);
 	if (tmp & RWSEM_FLAG_WAITERS)
 		rwsem_downgrade_wake(sem);
+	preempt_enable();
 }
 
 #else /* !CONFIG_PREEMPT_RT */
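
The same refactoring shape, reduced to a sketch with stubbed-out
primitives (placeholder names, not the kernel API): the trylock helper
no longer toggles preemption itself, and each caller brackets the
helper together with any slowpath in one preempt_disable() and
preempt_enable() pair.

  #include <stdbool.h>
  #include <stdio.h>

  static void preempt_disable(void) { }              /* stub */
  static void preempt_enable(void)  { }              /* stub */
  static bool write_trylock(void)   { return true; } /* stub: lock is free */

  /* The caller now owns the preemption-disabled region. */
  static int down_write_trylock(void)
  {
          int ret;

          preempt_disable();
          ret = write_trylock();
          preempt_enable();

          return ret;
  }

  int main(void)
  {
          printf("down_write_trylock() = %d\n", down_write_trylock()); /* 1 */
          return 0;
  }
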
From patchwork Thu Nov 3 18:29:34 2022
X-Patchwork-Submitter: Waiman Long
X-Patchwork-Id: 15096
From: Waiman Long
To: Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng
Cc: linux-kernel@vger.kernel.org, john.p.donnelly@oracle.com,
    Hillf Danton, Mukesh Ojha, Ting11 Wang 王婷, Waiman Long
Subject: [PATCH v5 4/6] locking/rwsem: Change waiter->handoff_set to a
 handoff_state enum
Date: Thu, 3 Nov 2022 14:29:34 -0400
Message-Id: <20221103182936.217120-5-longman@redhat.com>
In-Reply-To: <20221103182936.217120-1-longman@redhat.com>
References: <20221103182936.217120-1-longman@redhat.com>
Change the boolean waiter->handoff_set to an enum type so that more
states can be added in later patches. Also use READ_ONCE() for reads
outside wait_lock critical sections and WRITE_ONCE() for writes inside
wait_lock critical sections for proper synchronization. There is no
functional change.

Signed-off-by: Waiman Long
---
 kernel/locking/rwsem.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 2953fa4dd790..d80f22f7ecb6 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -335,12 +335,17 @@ enum rwsem_waiter_type {
 	RWSEM_WAITING_FOR_READ
 };
 
+enum rwsem_handoff_state {
+	HANDOFF_NONE = 0,
+	HANDOFF_REQUESTED,
+};
+
 struct rwsem_waiter {
 	struct list_head list;
 	struct task_struct *task;
 	enum rwsem_waiter_type type;
+	enum rwsem_handoff_state handoff_state;
 	unsigned long timeout;
-	bool handoff_set;
 };
 #define rwsem_first_waiter(sem) \
 	list_first_entry(&sem->wait_list, struct rwsem_waiter, list)
@@ -467,7 +472,7 @@ static void rwsem_mark_wake(struct rw_semaphore *sem,
 			adjustment -= RWSEM_FLAG_HANDOFF;
 			lockevent_inc(rwsem_rlock_handoff);
 		}
-		waiter->handoff_set = true;
+		WRITE_ONCE(waiter->handoff_state, HANDOFF_REQUESTED);
 	}
 
 	atomic_long_add(-adjustment, &sem->count);
@@ -619,7 +624,7 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
 		 * waiter is the one that set it. Otherwisee, we
 		 * still try to acquire the rwsem.
 		 */
-		if (first->handoff_set && (waiter != first))
+		if (first->handoff_state && (waiter != first))
 			return false;
 	}
 
@@ -647,11 +652,11 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
 
 	/*
 	 * We have either acquired the lock with handoff bit cleared or set
-	 * the handoff bit. Only the first waiter can have its handoff_set
+	 * the handoff bit. Only the first waiter can have its handoff_state
 	 * set here to enable optimistic spinning in slowpath loop.
 	 */
 	if (new & RWSEM_FLAG_HANDOFF) {
-		first->handoff_set = true;
+		WRITE_ONCE(first->handoff_state, HANDOFF_REQUESTED);
 		lockevent_inc(rwsem_wlock_handoff);
 		return false;
 	}
@@ -1035,7 +1040,7 @@ rwsem_down_read_slowpath(struct rw_semaphore *sem, long count, unsigned int stat
 	waiter.task = current;
 	waiter.type = RWSEM_WAITING_FOR_READ;
 	waiter.timeout = jiffies + RWSEM_WAIT_TIMEOUT;
-	waiter.handoff_set = false;
+	waiter.handoff_state = HANDOFF_NONE;
 
 	raw_spin_lock_irq(&sem->wait_lock);
 	if (list_empty(&sem->wait_list)) {
@@ -1122,7 +1127,7 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 	waiter.task = current;
 	waiter.type = RWSEM_WAITING_FOR_WRITE;
 	waiter.timeout = jiffies + RWSEM_WAIT_TIMEOUT;
-	waiter.handoff_set = false;
+	waiter.handoff_state = HANDOFF_NONE;
 
 	raw_spin_lock_irq(&sem->wait_lock);
 	rwsem_add_waiter(sem, &waiter);
@@ -1167,7 +1172,7 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 		 * In this case, we attempt to acquire the lock again
 		 * without sleeping.
	 */
-		if (waiter.handoff_set) {
+		if (READ_ONCE(waiter.handoff_state)) {
 			enum owner_state owner_state;
 
 			owner_state = rwsem_spin_on_owner(sem);
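
The access discipline adopted above can be tried out in user space with
the usual volatile-cast approximations of the two macros; the
definitions below are simplified stand-ins for the kernel's versions.
Writes to handoff_state happen under wait_lock through WRITE_ONCE(),
while lockless readers go through READ_ONCE().

  #include <stdio.h>

  /* Simplified user-space stand-ins for the kernel macros. */
  #define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
  #define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

  enum rwsem_handoff_state {
          HANDOFF_NONE = 0,
          HANDOFF_REQUESTED,
  };

  struct waiter_model {
          enum rwsem_handoff_state handoff_state;
  };

  int main(void)
  {
          struct waiter_model w = { HANDOFF_NONE };

          /* Writer side: done while holding wait_lock in the real code. */
          WRITE_ONCE(w.handoff_state, HANDOFF_REQUESTED);

          /* Reader side: lockless check, as in the write slowpath loop. */
          if (READ_ONCE(w.handoff_state))
                  puts("handoff requested");
          return 0;
  }
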
From patchwork Thu Nov 3 18:29:35 2022
X-Patchwork-Submitter: Waiman Long
X-Patchwork-Id: 15093
From: Waiman Long
To: Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng
Cc: linux-kernel@vger.kernel.org, john.p.donnelly@oracle.com,
    Hillf Danton, Mukesh Ojha, Ting11 Wang 王婷, Waiman Long
Subject: [PATCH v5 5/6] locking/rwsem: Enable direct rwsem lock handoff
Date: Thu, 3 Nov 2022 14:29:35 -0400
Message-Id: <20221103182936.217120-6-longman@redhat.com>
In-Reply-To: <20221103182936.217120-1-longman@redhat.com>
References: <20221103182936.217120-1-longman@redhat.com>
The lock handoff provided in rwsem isn't a true handoff like that in
the mutex. Instead, it is more like a quiescent state where optimistic
spinning and lock stealing are disabled to make it easier for the first
waiter to acquire the lock.

Reworking the code to enable a true lock handoff is more complex due to
the following facts:

 1) The RWSEM_FLAG_HANDOFF bit is protected by the wait_lock and it
    is too expensive to always take the wait_lock in the unlock path
    to prevent racing.
 2) The reader lock fast path may add a RWSEM_READER_BIAS at the wrong
    time to prevent a proper lock handoff from a reader-owned rwsem.

A lock handoff can only be initiated when the following conditions are
true:

 1) The RWSEM_FLAG_HANDOFF bit is set.
 2) The task to do the handoff doesn't see any other active lock,
    excluding the lock that it might have held.

The new handoff mechanism performs the handoff in rwsem_wakeup() to
minimize overhead. The rwsem count will be known at that point to
determine if a handoff should be done. However, there is a small time
gap between when the rwsem becomes free and when the wait_lock is
taken, where a reader can come in and add a RWSEM_READER_BIAS to the
count, or the current first waiter can take the rwsem and clear
RWSEM_FLAG_HANDOFF in the interim. That will fail the handoff
operation. To handle the former case, a secondary handoff will also be
done in rwsem_down_read_slowpath() to catch it.

With a true lock handoff, there is no need to do NULL owner spinning
anymore, as a wakeup will be performed if a handoff is possible. So it
is likely that the first waiter won't actually go to sleep even when
schedule() is called in this case.

Signed-off-by: Waiman Long
---
 kernel/locking/rwsem.c | 135 +++++++++++++++++++++++++++++++++++------
 1 file changed, 117 insertions(+), 18 deletions(-)

diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index d80f22f7ecb6..c9f24ed8757d 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -338,6 +338,7 @@ enum rwsem_waiter_type {
 enum rwsem_handoff_state {
 	HANDOFF_NONE = 0,
 	HANDOFF_REQUESTED,
+	HANDOFF_GRANTED,
 };
 
 struct rwsem_waiter {
@@ -486,6 +487,12 @@ static void rwsem_mark_wake(struct rw_semaphore *sem,
 		 */
 		owner = waiter->task;
 		__rwsem_set_reader_owned(sem, owner);
+	} else if (waiter->handoff_state == HANDOFF_GRANTED) {
+		/*
+		 * rwsem_handoff() has already added the RWSEM_READER_BIAS
+		 * of the first waiter to the count.
+		 */
+		adjustment = RWSEM_READER_BIAS;
 	}
 
 	/*
@@ -583,7 +590,7 @@ rwsem_del_wake_waiter(struct rw_semaphore *sem, struct rwsem_waiter *waiter,
 			  struct wake_q_head *wake_q)
 			  __releases(&sem->wait_lock)
 {
-	bool first = rwsem_first_waiter(sem) == waiter;
+	struct rwsem_waiter *first = rwsem_first_waiter(sem);
 
 	wake_q_init(wake_q);
 
@@ -592,8 +599,21 @@ rwsem_del_wake_waiter(struct rw_semaphore *sem, struct rwsem_waiter *waiter,
 	 * the first waiter, we wake up the remaining waiters as they may
 	 * be eligible to acquire or spin on the lock.
	 */
-	if (rwsem_del_waiter(sem, waiter) && first)
+	if (rwsem_del_waiter(sem, waiter) && (waiter == first)) {
+		switch (waiter->handoff_state) {
+		case HANDOFF_GRANTED:
+			raw_spin_unlock_irq(&sem->wait_lock);
+			return;
+		case HANDOFF_REQUESTED:
+			/* Pass handoff state to the new first waiter */
+			first = rwsem_first_waiter(sem);
+			WRITE_ONCE(first->handoff_state, HANDOFF_REQUESTED);
+			fallthrough;
+		default:
+			break;
+		}
 		rwsem_mark_wake(sem, RWSEM_WAKE_ANY, wake_q);
+	}
 	raw_spin_unlock_irq(&sem->wait_lock);
 	if (!wake_q_empty(wake_q))
 		wake_up_q(wake_q);
@@ -759,6 +779,11 @@ rwsem_spin_on_owner(struct rw_semaphore *sem)
 
 	owner = rwsem_owner_flags(sem, &flags);
 	state = rwsem_owner_state(owner, flags);
+
+	/* A handoff may have been granted */
+	if (!flags && (owner == current))
+		return OWNER_NONSPINNABLE;
+
 	if (state != OWNER_WRITER)
 		return state;
 
@@ -969,6 +994,32 @@ rwsem_spin_on_owner(struct rw_semaphore *sem)
 }
 #endif
 
+/*
+ * Hand off the lock to the first waiter
+ */
+static void rwsem_handoff(struct rw_semaphore *sem, long adj,
+			  struct wake_q_head *wake_q)
+{
+	struct rwsem_waiter *waiter;
+	enum rwsem_wake_type wake_type;
+
+	lockdep_assert_held(&sem->wait_lock);
+	adj -= RWSEM_FLAG_HANDOFF;
+	waiter = rwsem_first_waiter(sem);
+	WRITE_ONCE(waiter->handoff_state, HANDOFF_GRANTED);
+	if (waiter->type == RWSEM_WAITING_FOR_WRITE) {
+		wake_type = RWSEM_WAKE_ANY;
+		adj += RWSEM_WRITER_LOCKED;
+		atomic_long_set(&sem->owner, (long)waiter->task);
+	} else {
+		wake_type = RWSEM_WAKE_READ_OWNED;
+		adj += RWSEM_READER_BIAS;
+		__rwsem_set_reader_owned(sem, waiter->task);
+	}
+	atomic_long_add(adj, &sem->count);
+	rwsem_mark_wake(sem, wake_type, wake_q);
+}
+
 /*
  * Prepare to wake up waiter(s) in the wait queue by putting them into the
  * given wake_q if the rwsem lock owner isn't a writer. If rwsem is likely
@@ -1043,6 +1094,7 @@ rwsem_down_read_slowpath(struct rw_semaphore *sem, long count, unsigned int stat
 	waiter.handoff_state = HANDOFF_NONE;
 
 	raw_spin_lock_irq(&sem->wait_lock);
+	count = atomic_long_read(&sem->count);
 	if (list_empty(&sem->wait_list)) {
 		/*
 		 * In case the wait queue is empty and the lock isn't owned
@@ -1050,7 +1102,7 @@ rwsem_down_read_slowpath(struct rw_semaphore *sem, long count, unsigned int stat
 		 * immediately as its RWSEM_READER_BIAS has already been set
 		 * in the count.
 		 */
-		if (!(atomic_long_read(&sem->count) & RWSEM_WRITER_MASK)) {
+		if (!(count & RWSEM_WRITER_MASK)) {
 			/* Provide lock ACQUIRE */
 			smp_acquire__after_ctrl_dep();
 			raw_spin_unlock_irq(&sem->wait_lock);
@@ -1059,13 +1111,36 @@ rwsem_down_read_slowpath(struct rw_semaphore *sem, long count, unsigned int stat
 			return sem;
 		}
 		adjustment += RWSEM_FLAG_WAITERS;
+	} else if ((count & RWSEM_FLAG_HANDOFF) &&
+		  ((count & RWSEM_LOCK_MASK) == RWSEM_READER_BIAS)) {
+		/*
+		 * If the waiter to be handed off is a reader, all the
+		 * readers in the wait queue will be woken up. As this reader
+		 * hasn't been queued in the wait queue yet, it may as well
+		 * keep its RWSEM_READER_BIAS and return after waking up
+		 * other readers in the queue.
+		 */
+		if (rwsem_first_waiter(sem)->type == RWSEM_WAITING_FOR_READ)
+			adjustment = 0;
+		rwsem_handoff(sem, adjustment, &wake_q);
+
+		if (!adjustment) {
+			raw_spin_unlock_irq(&sem->wait_lock);
+			wake_up_q(&wake_q);
+			return sem;
+		}
+		adjustment = 0;
 	}
 	rwsem_add_waiter(sem, &waiter);
 
-	/* we're now waiting on the lock, but no longer actively locking */
-	count = atomic_long_add_return(adjustment, &sem->count);
-
-	rwsem_cond_wake_waiter(sem, count, &wake_q);
+	if (adjustment) {
+		/*
+		 * We are now waiting on the lock with no handoff, but no
+		 * longer actively locking.
+		 */
+		count = atomic_long_add_return(adjustment, &sem->count);
+		rwsem_cond_wake_waiter(sem, count, &wake_q);
+	}
 	raw_spin_unlock_irq(&sem->wait_lock);
 
 	if (!wake_q_empty(&wake_q))
@@ -1154,6 +1229,8 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 	trace_contention_begin(sem, LCB_F_WRITE);
 
 	for (;;) {
+		enum rwsem_handoff_state handoff;
+
 		if (rwsem_try_write_lock(sem, &waiter)) {
 			/* rwsem_try_write_lock() implies ACQUIRE on success */
 			break;
@@ -1168,26 +1245,33 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 		 * After setting the handoff bit and failing to acquire
 		 * the lock, attempt to spin on owner to accelerate lock
 		 * transfer. If the previous owner is a on-cpu writer and it
-		 * has just released the lock, OWNER_NULL will be returned.
-		 * In this case, we attempt to acquire the lock again
-		 * without sleeping.
+		 * has just released the lock, handoff_state is likely to be
+		 * set to HANDOFF_GRANTED or is to be set soon.
 		 */
-		if (READ_ONCE(waiter.handoff_state)) {
-			enum owner_state owner_state;
+		handoff = READ_ONCE(waiter.handoff_state);
+		if (handoff) {
+			if (handoff == HANDOFF_REQUESTED) {
+				rwsem_spin_on_owner(sem);
+				handoff = READ_ONCE(waiter.handoff_state);
+			}
 
-			owner_state = rwsem_spin_on_owner(sem);
-			if (owner_state == OWNER_NULL)
-				goto trylock_again;
+			if (handoff == HANDOFF_GRANTED)
+				goto skip_sleep;
 		}
 
 		schedule_preempt_disabled();
 		lockevent_inc(rwsem_sleep_writer);
 		set_current_state(state);
-trylock_again:
+skip_sleep:
 		raw_spin_lock_irq(&sem->wait_lock);
+		if (waiter.handoff_state == HANDOFF_GRANTED) {
+			rwsem_del_waiter(sem, &waiter);
+			break;
+		}
 	}
 	__set_current_state(TASK_RUNNING);
 	raw_spin_unlock_irq(&sem->wait_lock);
+out_lock:
 	lockevent_inc(rwsem_wlock);
 	trace_contention_end(sem, 0);
 	return sem;
@@ -1196,6 +1280,9 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 	__set_current_state(TASK_RUNNING);
 	raw_spin_lock_irq(&sem->wait_lock);
 	rwsem_del_wake_waiter(sem, &waiter, &wake_q);
+	if (unlikely(READ_ONCE(waiter.handoff_state) == HANDOFF_GRANTED))
+		goto out_lock;
+
 	lockevent_inc(rwsem_wlock_fail);
 	trace_contention_end(sem, -EINTR);
 	return ERR_PTR(-EINTR);
@@ -1207,12 +1294,24 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
  */
 static struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem)
 {
-	unsigned long flags;
 	DEFINE_WAKE_Q(wake_q);
+	unsigned long flags;
+	long count;
 
 	raw_spin_lock_irqsave(&sem->wait_lock, flags);
 
-	if (!list_empty(&sem->wait_list))
+	if (list_empty(&sem->wait_list)) {
+		raw_spin_unlock_irqrestore(&sem->wait_lock, flags);
+		return sem;
+	}
+
+	/*
+	 * If the rwsem is free and handoff flag is set with wait_lock held,
+	 * no other CPUs can take an active lock.
+	 */
+	count = atomic_long_read(&sem->count);
+	if (!(count & RWSEM_LOCK_MASK) && (count & RWSEM_FLAG_HANDOFF))
+		rwsem_handoff(sem, 0, &wake_q);
+	else
 		rwsem_mark_wake(sem, RWSEM_WAKE_ANY, &wake_q);
 
 	raw_spin_unlock_irqrestore(&sem->wait_lock, flags);
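
The wakeup-side decision added in rwsem_wake() reduces to a predicate
on the count. The sketch below models it with made-up bit values (the
real RWSEM_* constants differ): with wait_lock held, the lock is handed
off only when the count shows no active lock and the handoff flag is
set; otherwise a normal wakeup is done.

  #include <stdio.h>

  #define LOCK_MASK    0x0ffUL /* stand-in for RWSEM_LOCK_MASK */
  #define FLAG_HANDOFF 0x100UL /* stand-in for RWSEM_FLAG_HANDOFF */

  /* Models the decision rwsem_wake() makes with wait_lock held. */
  static const char *wake_action(unsigned long count)
  {
          if (!(count & LOCK_MASK) && (count & FLAG_HANDOFF))
                  return "direct handoff";
          return "normal wakeup";
  }

  int main(void)
  {
          printf("%s\n", wake_action(FLAG_HANDOFF));     /* direct handoff */
          printf("%s\n", wake_action(FLAG_HANDOFF | 1)); /* reader slipped in */
          printf("%s\n", wake_action(0));                /* normal wakeup */
          return 0;
  }
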
From patchwork Thu Nov 3 18:29:36 2022
X-Patchwork-Submitter: Waiman Long
X-Patchwork-Id: 15095
From: Waiman Long
To: Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng
Cc: linux-kernel@vger.kernel.org, john.p.donnelly@oracle.com,
    Hillf Danton, Mukesh Ojha, Ting11 Wang 王婷, Waiman Long
Subject: [PATCH v5 6/6] locking/rwsem: Update handoff lock events tracking
Date: Thu, 3 Nov 2022 14:29:36 -0400
Message-Id: <20221103182936.217120-7-longman@redhat.com>
In-Reply-To: <20221103182936.217120-1-longman@redhat.com>
References: <20221103182936.217120-1-longman@redhat.com>
With the new direct rwsem lock handoff, the corresponding handoff lock
events are updated to also track the number of secondary lock handoffs
in rwsem_down_read_slowpath() to see how prevalent those handoff events
are. The number of primary lock handoffs in the unlock paths is
(rwsem_handoff_read + rwsem_handoff_write - rwsem_handoff_rslow).

After running a 96-thread rwsem microbenchmark with an equal number of
readers and writers on a 2-socket 96-thread system for 40s, the
following handoff stats were obtained:

  rwsem_handoff_read=189
  rwsem_handoff_rslow=1
  rwsem_handoff_write=6678
  rwsem_handoff_wspin=6681

The number of primary handoffs was thus 6866 (189 + 6678 - 1), whereas
there was only one secondary handoff for this test run.

Signed-off-by: Waiman Long
---
 kernel/locking/lock_events_list.h |  6 ++++--
 kernel/locking/rwsem.c            |  9 +++++----
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/kernel/locking/lock_events_list.h b/kernel/locking/lock_events_list.h
index 97fb6f3f840a..04d101767c2c 100644
--- a/kernel/locking/lock_events_list.h
+++ b/kernel/locking/lock_events_list.h
@@ -63,7 +63,9 @@ LOCK_EVENT(rwsem_rlock)	/* # of read locks acquired */
 LOCK_EVENT(rwsem_rlock_steal)	/* # of read locks by lock stealing */
 LOCK_EVENT(rwsem_rlock_fast)	/* # of fast read locks acquired */
 LOCK_EVENT(rwsem_rlock_fail)	/* # of failed read lock acquisitions */
-LOCK_EVENT(rwsem_rlock_handoff)	/* # of read lock handoffs */
 LOCK_EVENT(rwsem_wlock)	/* # of write locks acquired */
 LOCK_EVENT(rwsem_wlock_fail)	/* # of failed write lock acquisitions */
-LOCK_EVENT(rwsem_wlock_handoff)	/* # of write lock handoffs */
+LOCK_EVENT(rwsem_handoff_read)	/* # of read lock handoffs */
+LOCK_EVENT(rwsem_handoff_write)	/* # of write lock handoffs */
+LOCK_EVENT(rwsem_handoff_rslow)	/* # of handoffs in read slowpath */
+LOCK_EVENT(rwsem_handoff_wspin)	/* # of handoff spins in write slowpath */

diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index c9f24ed8757d..84bdb4fd18c3 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -469,10 +469,8 @@ static void rwsem_mark_wake(struct rw_semaphore *sem,
 		 * force the issue.
	 */
 		if (time_after(jiffies, waiter->timeout)) {
-			if (!(oldcount & RWSEM_FLAG_HANDOFF)) {
+			if (!(oldcount & RWSEM_FLAG_HANDOFF))
 				adjustment -= RWSEM_FLAG_HANDOFF;
-				lockevent_inc(rwsem_rlock_handoff);
-			}
 			WRITE_ONCE(waiter->handoff_state, HANDOFF_REQUESTED);
 		}
 
@@ -677,7 +675,6 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
 	 */
 	if (new & RWSEM_FLAG_HANDOFF) {
 		WRITE_ONCE(first->handoff_state, HANDOFF_REQUESTED);
-		lockevent_inc(rwsem_wlock_handoff);
 		return false;
 	}
 
@@ -1011,10 +1008,12 @@ static void rwsem_handoff(struct rw_semaphore *sem, long adj,
 		wake_type = RWSEM_WAKE_ANY;
 		adj += RWSEM_WRITER_LOCKED;
 		atomic_long_set(&sem->owner, (long)waiter->task);
+		lockevent_inc(rwsem_handoff_write);
 	} else {
 		wake_type = RWSEM_WAKE_READ_OWNED;
 		adj += RWSEM_READER_BIAS;
 		__rwsem_set_reader_owned(sem, waiter->task);
+		lockevent_inc(rwsem_handoff_read);
 	}
 	atomic_long_add(adj, &sem->count);
 	rwsem_mark_wake(sem, wake_type, wake_q);
@@ -1123,6 +1122,7 @@ rwsem_down_read_slowpath(struct rw_semaphore *sem, long count, unsigned int stat
 		if (rwsem_first_waiter(sem)->type == RWSEM_WAITING_FOR_READ)
 			adjustment = 0;
 		rwsem_handoff(sem, adjustment, &wake_q);
+		lockevent_inc(rwsem_handoff_rslow);
 
 		if (!adjustment) {
 			raw_spin_unlock_irq(&sem->wait_lock);
@@ -1253,6 +1253,7 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 			if (handoff == HANDOFF_REQUESTED) {
 				rwsem_spin_on_owner(sem);
 				handoff = READ_ONCE(waiter.handoff_state);
+				lockevent_inc(rwsem_handoff_wspin);
 			}
 
 			if (handoff == HANDOFF_GRANTED)