[v2] pstore: Revert pmsg_lock back to a normal mutex

Message ID 20230304031029.3037914-1-jstultz@google.com

Commit Message

John Stultz March 4, 2023, 3:10 a.m. UTC
  This reverts commit 76d62f24db07f22ccf9bc18ca793c27d4ebef721.

So while priority inversion on the pmsg_lock is an occasional
problem that an rt_mutex would help with, in use cases where
logging writes to pmsg heavily from multiple threads, the
pmsg_lock can be heavily contended.

Normal mutexes can do adaptive spinning, which keeps the
contention overhead fairly low, maybe adding on the order of 10s
of us of delay waiting, but the slowpath with rt_mutexes makes
the blocked tasks sleep and wake. This makes matters worse when
there is heavy contention, as it just allows additional threads
to run and line up to try to take the lock.

It devolves to a worst-case scenario where the lock acquisition
and scheduling overhead dominate, and each thread waits on the
order of ~ms to do ~us of work.

Obviously, having tons of threads all contending on a single
lock for logging is non-optimal, so the proper fix is probably
reworking pstore pmsg to have per-cpu buffers so we don't have
contention.
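
For illustration, a per-CPU rework might look very roughly like
the sketch below. All names and sizes here are made up, nothing
handles flushing or wraparound, and a real version would still
need to feed each buffer through psinfo->write_user(); it is only
meant to make the idea concrete.

#include <linux/mutex.h>
#include <linux/percpu.h>
#include <linux/uaccess.h>

#define PMSG_PERCPU_BUF_SIZE 1024	/* arbitrary size for the sketch */

struct pmsg_percpu_buf {
	struct mutex lock;	/* serializes writers on this CPU's buffer */
	char buf[PMSG_PERCPU_BUF_SIZE];
	size_t pos;
};

static DEFINE_PER_CPU(struct pmsg_percpu_buf, pmsg_bufs);

static void pmsg_percpu_init(void)
{
	int cpu;

	/* Mutexes in per-CPU storage need runtime initialization. */
	for_each_possible_cpu(cpu)
		mutex_init(&per_cpu_ptr(&pmsg_bufs, cpu)->lock);
}

static ssize_t pmsg_percpu_write(const char __user *ubuf, size_t count)
{
	/* Pick the current CPU's buffer. Migrating afterwards is still
	 * correct (we hold that buffer's lock), just less ideal. */
	struct pmsg_percpu_buf *b = per_cpu_ptr(&pmsg_bufs,
						raw_smp_processor_id());
	ssize_t ret = 0;

	mutex_lock(&b->lock);
	if (count > PMSG_PERCPU_BUF_SIZE - b->pos)
		count = PMSG_PERCPU_BUF_SIZE - b->pos;
	if (copy_from_user(b->buf + b->pos, ubuf, count))
		ret = -EFAULT;
	else
		b->pos += count;
	mutex_unlock(&b->lock);

	return ret ? ret : count;
}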

But in the short term, let's revert the change to the rt_mutex
and go back to normal mutexes to avoid a potentially major
performance regression.

Cc: Wei Wang <wvw@google.com>
Cc: Midas Chien <midaschieh@google.com>
Cc: "Chunhui Li (李春辉)" <chunhui.li@mediatek.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Anton Vorontsov <anton@enomsg.org>
Cc: "Guilherme G. Piccoli" <gpiccoli@igalia.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: kernel-team@android.com
Fixes: 76d62f24db07 ("pstore: Switch pmsg_lock to an rt_mutex to avoid priority inversion")
Reported-by: "Chunhui Li (李春辉)" <chunhui.li@mediatek.com>
Tested-by: Chunhui Li <chunhui.li@mediatek.com>
Signed-off-by: John Stultz <jstultz@google.com>
---
I know Steven is working on a fix to address the rtmutex not
spinning, but as the earlier version of it didn't resolve the
issue for Chunhui Li, I wanted to resend this out again w/
Tested-by tags, so it is ready to go if needed. I am looking
to get a local reproducer so I can help validate Steven's
efforts.

v2:
* Fix quoting around Chunhui Li's email name (so they are actually
  cc'ed)
* Added Tested-by tag
---
 fs/pstore/pmsg.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)
  

Comments

Steven Rostedt March 5, 2023, 4:36 p.m. UTC | #1
On Sat,  4 Mar 2023 03:10:29 +0000
John Stultz <jstultz@google.com> wrote:

> This reverts commit 76d62f24db07f22ccf9bc18ca793c27d4ebef721.
> 
> So while priority inversion on the pmsg_lock is an occasional
> problem that an rt_mutex would help with, in use cases where
> logging writes to pmsg heavily from multiple threads, the
> pmsg_lock can be heavily contended.
> 
> Normal mutexes can do adaptive spinning, which keeps the
> contention overhead fairly low, maybe adding on the order of 10s
> of us of delay waiting, but the slowpath with rt_mutexes makes
> the blocked tasks sleep and wake. This makes matters worse when
> there is heavy contention, as it just allows additional threads
> to run and line up to try to take the lock.

That is incorrect. rt_mutexes have pretty much the same adaptive spinning
as normal mutexes. So that is *not* the reason for the difference in
performance, and should not be stated so in this change log.

The difference between rt_mutex and normal mutex, is that the rt_mutex will
trigger priority inheritance. Perhaps on high contention, that makes a
difference. But do not state it's because rt_mutex does not have adaptive
spinning, because it most definitely does.

-- Steve
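
For readers unfamiliar with priority inheritance, a minimal
userspace sketch of the same idea uses a PTHREAD_PRIO_INHERIT
mutex, which is backed by the kernel's PI futex / rt_mutex
machinery; this is an illustration only, not code from this
thread:

#include <pthread.h>

static pthread_mutex_t pi_lock;

int main(void)
{
	pthread_mutexattr_t attr;

	pthread_mutexattr_init(&attr);
	/* Request priority inheritance: while a higher-priority thread
	 * is blocked on pi_lock, the current owner is boosted to that
	 * thread's priority, so a medium-priority task can't starve it. */
	pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
	pthread_mutex_init(&pi_lock, &attr);

	pthread_mutex_lock(&pi_lock);
	/* ... critical section: owner may be boosted while here ... */
	pthread_mutex_unlock(&pi_lock);

	pthread_mutex_destroy(&pi_lock);
	pthread_mutexattr_destroy(&attr);
	return 0;
}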


> 
> It devolves to a worst-case scenario where the lock acquisition
> and scheduling overhead dominate, and each thread waits on the
> order of ~ms to do ~us of work.
> 
> Obviously, having tons of threads all contending on a single
> lock for logging is non-optimal, so the proper fix is probably
> reworking pstore pmsg to have per-cpu buffers so we don't have
> contention.
> 
> But in the short term, let's revert the change to the rt_mutex
> and go back to normal mutexes to avoid a potentially major
> performance regression.
> 
> Cc: Wei Wang <wvw@google.com>
> Cc: Midas Chien <midaschieh@google.com>
> Cc: "Chunhui Li (李春辉)" <chunhui.li@mediatek.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Anton Vorontsov <anton@enomsg.org>
> Cc: "Guilherme G. Piccoli" <gpiccoli@igalia.com>
> Cc: Tony Luck <tony.luck@intel.com>
> Cc: kernel-team@android.com
> Fixes: 76d62f24db07 ("pstore: Switch pmsg_lock to an rt_mutex to avoid priority inversion")
> Reported-by: "Chunhui Li (李春辉)" <chunhui.li@mediatek.com>
> Tested-by: Chunhui Li <chunhui.li@mediatek.com>
> Signed-off-by: John Stultz <jstultz@google.com>
> ---
> I know Steven is working on a fix to address the rtmutex not
> spinning, but as the earlier version of it didn't resolve the
> issue for Chunhui Li, I wanted to resend this out again w/
> Tested-by tags, so it is ready to go if needed. I am looking
> to get a local reproducer so I can help validate Steven's
> efforts.
> 
> v2:
> * Fix quoting around Chunhui Li's email name (so they are actually
>   cc'ed)
> * Added Tested-by tag
> ---
>  fs/pstore/pmsg.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/pstore/pmsg.c b/fs/pstore/pmsg.c
> index ab82e5f05346..b31c9c72d90b 100644
> --- a/fs/pstore/pmsg.c
> +++ b/fs/pstore/pmsg.c
> @@ -7,10 +7,9 @@
>  #include <linux/device.h>
>  #include <linux/fs.h>
>  #include <linux/uaccess.h>
> -#include <linux/rtmutex.h>
>  #include "internal.h"
>  
> -static DEFINE_RT_MUTEX(pmsg_lock);
> +static DEFINE_MUTEX(pmsg_lock);
>  
>  static ssize_t write_pmsg(struct file *file, const char __user *buf,
>  			  size_t count, loff_t *ppos)
> @@ -29,9 +28,9 @@ static ssize_t write_pmsg(struct file *file, const char __user *buf,
>  	if (!access_ok(buf, count))
>  		return -EFAULT;
>  
> -	rt_mutex_lock(&pmsg_lock);
> +	mutex_lock(&pmsg_lock);
>  	ret = psinfo->write_user(&record, buf);
> -	rt_mutex_unlock(&pmsg_lock);
> +	mutex_unlock(&pmsg_lock);
>  	return ret ? ret : count;
>  }
>
  
Steven Rostedt March 6, 2023, 3:28 p.m. UTC | #2
On Mon,  6 Mar 2023 09:03:23 +0800
Hillf Danton <hdanton@sina.com> wrote:

> PS: what sense does it make to spin on the owner until need_resched(),
> with preemption disabled, in a non-rt context?

Not sure what your question here is. If need_resched() is set, we
want to schedule out.

-- Steve
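
For context, the owner-spinning being discussed looks roughly
like the following, a heavily simplified sketch in the spirit of
mutex_spin_on_owner() in kernel/locking/mutex.c; it is not the
actual kernel code:

/* Spin while the lock owner is still running on another CPU; give up
 * and fall back to sleeping as soon as we ought to reschedule or the
 * owner itself gets scheduled out. */
static bool spin_on_owner(struct mutex *lock, struct task_struct *owner)
{
	bool keep_spinning = true;

	rcu_read_lock();
	while (__mutex_owner(lock) == owner) {
		if (need_resched()) {		/* the bail-out Steve refers to */
			keep_spinning = false;
			break;
		}
		if (!owner_on_cpu(owner)) {	/* owner slept; spinning is futile */
			keep_spinning = false;
			break;
		}
		cpu_relax();
	}
	rcu_read_unlock();

	return keep_spinning;	/* false => take the sleeping slowpath */
}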
  
John Stultz March 6, 2023, 6:27 p.m. UTC | #3
On Sun, Mar 5, 2023 at 8:36 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Sat,  4 Mar 2023 03:10:29 +0000
> John Stultz <jstultz@google.com> wrote:
>
> > This reverts commit 76d62f24db07f22ccf9bc18ca793c27d4ebef721.
> >
> > So while priority inversion on the pmsg_lock is an occasional
> > problem that an rt_mutex would help with, in use cases where
> > logging writes to pmsg heavily from multiple threads, the
> > pmsg_lock can be heavily contended.
> >
> > Normal mutexes can do adaptive spinning, which keeps the
> > contention overhead fairly low, maybe adding on the order of 10s
> > of us of delay waiting, but the slowpath with rt_mutexes makes
> > the blocked tasks sleep and wake. This makes matters worse when
> > there is heavy contention, as it just allows additional threads
> > to run and line up to try to take the lock.
>
> That is incorrect. rt_mutexes have pretty much the same adaptive spinning
> as normal mutexes. So that is *not* the reason for the difference in
> performance, and should not be stated so in this change log.

Good point. Apologies, I was trying to describe the *behavior* seen
in the traces when switching between mutex and rt_mutex, but I do
see how it reads as a description of the capabilities of the
primitive.

thanks
-john
  
Steven Rostedt March 7, 2023, 1:58 a.m. UTC | #4
On Tue,  7 Mar 2023 08:31:06 +0800
Hillf Danton <hdanton@sina.com> wrote:

> On Mon, 6 Mar 2023 10:28:44 -0500 Steven Rostedt <rostedt@goodmis.org>
> > On Mon, 6 Mar 2023 09:03:23 +0800 Hillf Danton <hdanton@sina.com> wrote:  
> > >
> > > PS: what sense does it make to spin on the owner until need_resched(),
> > > with preemption disabled, in a non-rt context?
> > 
> > Not sure what your question here is. If need_resched() is set, we
> > want to schedule out.
> 
> Given that the critical section under the mutex can be preempted, what is
> hard to understand is why the wakeup of, say, a ten-minute sleeper could
> not preempt a nice-10 mutex spinner.

But it can. Most wakeups are done by interrupts or softirqs, both of which
will happen even if a task is running with preemption disabled.

-- Steve
  

Patch

diff --git a/fs/pstore/pmsg.c b/fs/pstore/pmsg.c
index ab82e5f05346..b31c9c72d90b 100644
--- a/fs/pstore/pmsg.c
+++ b/fs/pstore/pmsg.c
@@ -7,10 +7,9 @@ 
 #include <linux/device.h>
 #include <linux/fs.h>
 #include <linux/uaccess.h>
-#include <linux/rtmutex.h>
 #include "internal.h"
 
-static DEFINE_RT_MUTEX(pmsg_lock);
+static DEFINE_MUTEX(pmsg_lock);
 
 static ssize_t write_pmsg(struct file *file, const char __user *buf,
 			  size_t count, loff_t *ppos)
@@ -29,9 +28,9 @@  static ssize_t write_pmsg(struct file *file, const char __user *buf,
 	if (!access_ok(buf, count))
 		return -EFAULT;
 
-	rt_mutex_lock(&pmsg_lock);
+	mutex_lock(&pmsg_lock);
 	ret = psinfo->write_user(&record, buf);
-	rt_mutex_unlock(&pmsg_lock);
+	mutex_unlock(&pmsg_lock);
 	return ret ? ret : count;
 }