[v3,0/5] locking/rwsem: Fix rwsem bugs & enable true lock handoff

Message ID 20221017211356.333862-1-longman@redhat.com
Headers
Series locking/rwsem: Fix rwsem bugs & enable true lock handoff

Message

Waiman Long Oct. 17, 2022, 9:13 p.m. UTC
v3:
 - Make a minor cleanup to patch 1.
 - Add 3 more patches to implement true lock handoff.

v2:
 - Add an additional patch to limit the # of first waiter optimistic
   spinning in the writer slowpath.

It turns out the current waiter optimistic spinning code does not work
that well if we have RT tasks in the mix. This patch series includes two
different fixes to resolve those issues. The last 3 patches modify the
handoff code to implement true lock handoff similar to that of mutexes.

Waiman Long (5):
  locking/rwsem: Prevent non-first waiter from spinning in down_write()
    slowpath
  locking/rwsem: Limit # of null owner retries for handoff writer
  locking/rwsem: Change waiter->handoff_set to a handoff_state enum
  locking/rwsem: Enable direct rwsem lock handoff
  locking/rwsem: Update handoff lock events tracking

 kernel/locking/lock_events_list.h |   6 +-
 kernel/locking/rwsem.c            | 172 +++++++++++++++++++++++-------
 2 files changed, 138 insertions(+), 40 deletions(-)
  

Comments

Mukesh Ojha Oct. 18, 2022, 2:13 p.m. UTC | #1
Hi,

On 10/18/2022 4:44 PM, Hillf Danton wrote:
> On 17 Oct 2022 17:13:55 -0400 Waiman Long <longman@redhat.com>
>> @@ -1067,13 +1119,33 @@ rwsem_down_read_slowpath(struct rw_semaphore *sem, long count, unsigned int stat
>>   			return sem;
>>   		}
>>   		adjustment += RWSEM_FLAG_WAITERS;
>> +	} else if ((count & RWSEM_FLAG_HANDOFF) &&
>> +		  ((count & RWSEM_LOCK_MASK) == RWSEM_READER_BIAS)) {
> 
> Could a couple of CPUs go read slow path in parallel?
> 
>> +		/*
>> +		 * If the waiter to be handed off is a reader, this reader
>> +		 * can piggyback on top of that.
>> +		 */
>> +		if (rwsem_first_waiter(sem)->type == RWSEM_WAITING_FOR_READ)
>> +			adjustment = 0;
>> +		rwsem_handoff(sem, adjustment, &wake_q);
>> +
>> +		if (!adjustment) {
>> +			raw_spin_unlock_irq(&sem->wait_lock);
>> +			wake_up_q(&wake_q);
>> +			return sem;
>> +		}
>> +		adjustment = 0;
>>   	}
>>   	rwsem_add_waiter(sem, &waiter);
> 
> Why can this acquirer piggyback without becoming a waiter?
> 
>>   
>> -	/* we're now waiting on the lock, but no longer actively locking */
>> -	count = atomic_long_add_return(adjustment, &sem->count);
>> -
>> -	rwsem_cond_wake_waiter(sem, count, &wake_q);
>> +	if (adjustment) {
>> +		/*
>> +		 * We are now waiting on the lock with no handoff, but no
>> +		 * longer actively locking.
>> +		 */
>> +		count = atomic_long_add_return(adjustment, &sem->count);
>> +		rwsem_cond_wake_waiter(sem, count, &wake_q);
>> +	}
>>   	raw_spin_unlock_irq(&sem->wait_lock);
>>   
>>   	if (!wake_q_empty(&wake_q))
>> @@ -1120,7 +1192,6 @@ static struct rw_semaphore __sched *
>>   rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
>>   {
>>   	struct rwsem_waiter waiter;
>> -	int null_owner_retries;
> 
> This reverts 2/5 and feel free to drop it directly.

I think he intends to tag the first two patches for the stable branches.

-Mukesh
> 
> Hillf
  
Waiman Long Oct. 18, 2022, 5:37 p.m. UTC | #2
On 10/18/22 10:13, Mukesh Ojha wrote:
> Hi,
>
> On 10/18/2022 4:44 PM, Hillf Danton wrote:
>> On 17 Oct 2022 17:13:55 -0400 Waiman Long <longman@redhat.com>
>>> @@ -1067,13 +1119,33 @@ rwsem_down_read_slowpath(struct rw_semaphore 
>>> *sem, long count, unsigned int stat
>>>               return sem;
>>>           }
>>>           adjustment += RWSEM_FLAG_WAITERS;
>>> +    } else if ((count & RWSEM_FLAG_HANDOFF) &&
>>> +          ((count & RWSEM_LOCK_MASK) == RWSEM_READER_BIAS)) {
>>
>> Could a couple of CPUs go read slow path in parallel?
This is under wait_lock protection. So no parallel execution is possible.
>>
>>> +        /*
>>> +         * If the waiter to be handed off is a reader, this reader
>>> +         * can piggyback on top of that.
>>> +         */
>>> +        if (rwsem_first_waiter(sem)->type == RWSEM_WAITING_FOR_READ)
>>> +            adjustment = 0;
>>> +        rwsem_handoff(sem, adjustment, &wake_q);
>>> +
>>> +        if (!adjustment) {
>>> +            raw_spin_unlock_irq(&sem->wait_lock);
>>> +            wake_up_q(&wake_q);
>>> +            return sem;
>>> +        }
>>> +        adjustment = 0;
>>>       }
>>>       rwsem_add_waiter(sem, &waiter);
>>
>> Why can this acquirer piggyback without becoming a waiter?
The idea is to have as much reader parallelism as possible without 
writer starvation. In other words, a continuous stream of readers is not 
allowed to block out writers. However, there are places where allowing 
one more reader to get the lock won't cause writer starvation.
>>
>>>   -    /* we're now waiting on the lock, but no longer actively 
>>> locking */
>>> -    count = atomic_long_add_return(adjustment, &sem->count);
>>> -
>>> -    rwsem_cond_wake_waiter(sem, count, &wake_q);
>>> +    if (adjustment) {
>>> +        /*
>>> +         * We are now waiting on the lock with no handoff, but no
>>> +         * longer actively locking.
>>> +         */
>>> +        count = atomic_long_add_return(adjustment, &sem->count);
>>> +        rwsem_cond_wake_waiter(sem, count, &wake_q);
>>> +    }
>>>       raw_spin_unlock_irq(&sem->wait_lock);
>>>         if (!wake_q_empty(&wake_q))
>>> @@ -1120,7 +1192,6 @@ static struct rw_semaphore __sched *
>>>   rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
>>>   {
>>>       struct rwsem_waiter waiter;
>>> -    int null_owner_retries;
>>
>> This reverts 2/5 and feel free to drop it directly.
>
> I think, he intents to tag the first two patches to go to stable 
> branches.

This patch is too disruptive to go to the stable branches. Yes, I do 
intend the first 2 patches to go into stable branches.

Cheers,
Longman
  
Waiman Long Oct. 19, 2022, 12:39 a.m. UTC | #3
On 10/18/22 19:51, Hillf Danton wrote:
> On 18 Oct 2022 13:37:20 -0400 Waiman Long <longman@redhat.com>
>> On 10/18/22 10:13, Mukesh Ojha wrote:
>>> On 10/18/2022 4:44 PM, Hillf Danton wrote:
>>>> On 17 Oct 2022 17:13:55 -0400 Waiman Long <longman@redhat.com>
>>>>> @@ -1067,13 +1119,33 @@ rwsem_down_read_slowpath(struct rw_semaphore
>>>>>                return sem;
>>>>>            }
>>>>>            adjustment += RWSEM_FLAG_WAITERS;
>>>>> +    } else if ((count & RWSEM_FLAG_HANDOFF) &&
>>>>> +          ((count & RWSEM_LOCK_MASK) == RWSEM_READER_BIAS)) {
>>>> Could a couple of CPUs go read slow path in parallel?
>>>>
>> This is under wait_lock protection. So no parallel execution is possible.
> They individually add RWSEM_READER_BIAS to count before taking wait_lock,
> and the check for BIAS here does not cover the case of readers in parallel.
> Is this intended?
>
> Hillf

As I said in the patch description, the lock handoff can only be done if 
we can be sure that there are no other active locks outstanding when the 
handoff bit is set. If, at the time of the check, another reader comes in 
and adds its RWSEM_READER_BIAS, the check fails and the CPU will proceed 
to put its waiter in the queue and begin sleeping. Hopefully, the last one 
left will find that count holds only its own RWSEM_READER_BIAS and can 
start the handoff process.

Cheers,
Longman
  
Waiman Long Oct. 19, 2022, 2:49 a.m. UTC | #4
On 10/18/22 22:29, Hillf Danton wrote:
> On 18 Oct 2022 20:39:59 -0400 Waiman Long <longman@redhat.com>
>> On 10/18/22 19:51, Hillf Danton wrote:
>>> On 18 Oct 2022 13:37:20 -0400 Waiman Long <longman@redhat.com>
>>>> On 10/18/22 10:13, Mukesh Ojha wrote:
>>>>> On 10/18/2022 4:44 PM, Hillf Danton wrote:
>>>>>> On 17 Oct 2022 17:13:55 -0400 Waiman Long <longman@redhat.com>
>>>>>>> @@ -1067,13 +1119,33 @@ rwsem_down_read_slowpath(struct rw_semaphore
>>>>>>>                 return sem;
>>>>>>>             }
>>>>>>>             adjustment += RWSEM_FLAG_WAITERS;
>>>>>>> +    } else if ((count & RWSEM_FLAG_HANDOFF) &&
>>>>>>> +          ((count & RWSEM_LOCK_MASK) == RWSEM_READER_BIAS)) {
>>>>>> Could a couple of CPUs go read slow path in parallel?
>>>>>>
>>>> This is under wait_lock protection. So no parallel execution is possible.
>>> They individually add RWSEM_READER_BIAS to count before taking wait_lock,
>>> and the check for BIAS here does not cover the case of readers in parallel.
>>> Is this intended?
>>>
>>> Hillf
>> As I said in the patch description, the lock handoff can only be done if
>> we can be sure that there is no other active locks outstanding with the
>> handoff bit set. If at the time of the check, another reader come in and
>> adds its RWSEM_READER_BIAS, the check fail and the cpu will proceed to
>> put its waiter in the queue and begin sleeping. Hopefully, the last one
>> left will find that count has only its RWSEM_READER_BIAS and it can
>> start the handoff process.
> If handoff grants rwsem to a read waiter then the read fast path may revive.
I don't quite understand what you mean by "read fast path may revive".
> And at the time of the check, multiple readers do not break handoff IMO.

I am not saying that multiple readers will break handoff. They will just 
delay it until all their temporary RWSEM_READER_BIAS values are taken off.

Cheers,
Longman
  
Waiman Long Oct. 19, 2022, 3:02 p.m. UTC | #5
On 10/19/22 03:05, Hillf Danton wrote:
> On 18 Oct 2022 22:49:02 -0400 Waiman Long <longman@redhat.com>
>> On 10/18/22 22:29, Hillf Danton wrote:
>>> If handoff grants rwsem to a read waiter then the read fast path may revive.
>> I don't quite understand what you mean by "read fast path may revive".
> Subsequent readers may take rwsem without going the slow path after a
> read waiter is granted.
That may not be true. As long as there are still waiters in the wait 
queue, a reader has to go into the slow path and queue up in the wait 
queue. This is to prevent a continuous stream of readers from starving 
writers in the wait queue.
>
> OTOH even after the check for single BIAS, another reader may come in
> and add its BIAS to count, which builds the same count as multiple
> readers came in before the check.

That is true, but hopefully we will eventually run out of readers trying 
to get a read lock on a given rwsem.

Cheers,
Longman
  