[v3,2/2] rcu/rcuscale: Stop kfree_scale_thread thread(s) after unloading rcuscale

Message ID 20230321052337.26553-2-qiuxu.zhuo@intel.com
State New
Series [v3,1/2] rcu/rcuscale: Move rcu_scale_*() after kfree_scale_cleanup()

Commit Message

Qiuxu Zhuo March 21, 2023, 5:23 a.m. UTC
  Running the 'kfree_rcu_test' test case with commands [1] triggers the
call trace [2]. The kfree_scale_thread thread(s) keep running after the
rcuscale and torture modules have been unloaded, so they fetch
instructions from module text that is no longer mapped. Fix this by
invoking kfree_scale_cleanup() from rcu_scale_cleanup() when removing
the rcuscale module.

[1] modprobe rcuscale kfree_rcu_test=1
    // After some time
    rmmod rcuscale
    rmmod torture

[2] BUG: unable to handle page fault for address: ffffffffc0601a87
    #PF: supervisor instruction fetch in kernel mode
    #PF: error_code(0x0010) - not-present page
    PGD 11de4f067 P4D 11de4f067 PUD 11de51067 PMD 112f4d067 PTE 0
    Oops: 0010 [#1] PREEMPT SMP NOPTI
    CPU: 1 PID: 1798 Comm: kfree_scale_thr Not tainted 6.3.0-rc1-rcu+ #1
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
    RIP: 0010:0xffffffffc0601a87
    Code: Unable to access opcode bytes at 0xffffffffc0601a5d.
    RSP: 0018:ffffb25bc2e57e18 EFLAGS: 00010297
    RAX: 0000000000000000 RBX: ffffffffc061f0b6 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: ffffffff962fd0de RDI: ffffffff962fd0de
    RBP: ffffb25bc2e57ea8 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000
    R13: 0000000000000000 R14: 000000000000000a R15: 00000000001c1dbe
    FS:  0000000000000000(0000) GS:ffff921fa2200000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: ffffffffc0601a5d CR3: 000000011de4c006 CR4: 0000000000370ee0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
     <TASK>
     ? kvfree_call_rcu+0xf0/0x3a0
     ? kthread+0xf3/0x120
     ? kthread_complete_and_exit+0x20/0x20
     ? ret_from_fork+0x1f/0x30
     </TASK>
    Modules linked in: rfkill sunrpc ... [last unloaded: torture]
    CR2: ffffffffc0601a87
    ---[ end trace 0000000000000000 ]---

Fixes: e6e78b004fa7 ("rcuperf: Add kfree_rcu() performance Tests")
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
---
v1 -> v2:

 - Move rcu_scale_cleanup() after kfree_scale_cleanup() to eliminate the
   declaration of kfree_scale_cleanup().

 - Remove the unnecessary step "modprobe torture" from the commit message.

 - Add a description of why rcu_scale_cleanup() is moved after
   kfree_scale_cleanup() to the commit message.

v2 -> v3:

 - Split the single v2 patch into two patches.

 - Move the commit message description of why rcu_scale_cleanup() is
   moved after kfree_scale_cleanup() to Patch 1.

 kernel/rcu/rcuscale.c | 5 +++++
 1 file changed, 5 insertions(+)
  

Comments

Davidlohr Bueso March 21, 2023, 3:47 p.m. UTC | #1
On Tue, 21 Mar 2023, Qiuxu Zhuo wrote:

>[...]
>Fixes: e6e78b004fa7 ("rcuperf: Add kfree_rcu() performance Tests")
>Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>

Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
  
Paul E. McKenney March 21, 2023, 7:23 p.m. UTC | #2
On Tue, Mar 21, 2023 at 08:47:51AM -0700, Davidlohr Bueso wrote:
> On Tue, 21 Mar 2023, Qiuxu Zhuo wrote:
> 
> > [...]
> > Fixes: e6e78b004fa7 ("rcuperf: Add kfree_rcu() performance Tests")
> > Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> 
> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>

Much better, thank you both!

But unfortunately, these patches do not apply cleanly.  Qiuxu Zhuo,
could you please forward port these to the -rcu "dev" branch [1]?

						Thanx, Paul

[1] https://mirrors.edge.kernel.org/pub/linux/kernel/people/paulmck/rcutodo.html
  
Joel Fernandes March 21, 2023, 9:28 p.m. UTC | #3
On Tue, Mar 21, 2023 at 3:24 PM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Tue, Mar 21, 2023 at 08:47:51AM -0700, Davidlohr Bueso wrote:
> > On Tue, 21 Mar 2023, Qiuxu Zhuo wrote:
> >
> > > [...]
> > > Fixes: e6e78b004fa7 ("rcuperf: Add kfree_rcu() performance Tests")
> > > Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> >
> > Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
>
> Much better, thank you both!
>
> But unfortunately, these patches do not apply cleanly.  Qiuxu Zhuo,
> could you please forward port these to the -rcu "dev" branch [1]?

After making it cleanly apply:
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>

thanks,

 - Joel

>
>                                                 Thanx, Paul
>
> [1] https://mirrors.edge.kernel.org/pub/linux/kernel/people/paulmck/rcutodo.html
  
Qiuxu Zhuo March 22, 2023, 1:26 a.m. UTC | #4
> From: Paul E. McKenney <paulmck@kernel.org>
> [...]
> > > Fixes: e6e78b004fa7 ("rcuperf: Add kfree_rcu() performance Tests")
> > > Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> >
> > Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
> 
> Much better, thank you both!
> 
> But unfortunately, these patches do not apply cleanly.  Qiuxu Zhuo, could
> you please forward port these to the -rcu "dev" branch [1]?
> 

Hi Paul,

OK. 
I'll be making v4 patches rebased on the top of the -rcu "dev" branch.
Thanks for letting me know more about the RCU patch workflow.

Also thank you Davidlohr Bueso and Joel for reviewing the patches.

- Qiuxu

> 						Thanx, Paul
> 
> [1] https://mirrors.edge.kernel.org/pub/linux/kernel/people/paulmck/rcutodo.html
  
Joel Fernandes March 22, 2023, 2:18 a.m. UTC | #5
On Tue, Mar 21, 2023 at 9:26 PM Zhuo, Qiuxu <qiuxu.zhuo@intel.com> wrote:
>
> > From: Paul E. McKenney <paulmck@kernel.org>
> > [...]
> > > > Fixes: e6e78b004fa7 ("rcuperf: Add kfree_rcu() performance Tests")
> > > > Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> > >
> > > Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
> >
> > Much better, thank you both!
> >
> > But unfortunately, these patches do not apply cleanly.  Qiuxu Zhuo, could
> > you please forward port these to the -rcu "dev" branch [1]?
> >
>
> Hi Paul,
>
> OK.
> I'll be making v4 patches rebased on the top of the -rcu "dev" branch.
> Thanks for letting me know more about the RCU patch workflow.
>
> Also thank you Davidlohr Bueso and Joel for reviewing the patches.

You're welcome and thanks for your interactions on the mailing list
and RCU interest. :-)


 - Joel
  

Patch

diff --git a/kernel/rcu/rcuscale.c b/kernel/rcu/rcuscale.c
index e99096a4f094..5a000d26f03e 100644
--- a/kernel/rcu/rcuscale.c
+++ b/kernel/rcu/rcuscale.c
@@ -797,6 +797,11 @@  rcu_scale_cleanup(void)
 	if (gp_exp && gp_async)
 		SCALEOUT_ERRSTRING("No expedited async GPs, so went with async!");
 
+	if (kfree_rcu_test) {
+		kfree_scale_cleanup();
+		return;
+	}
+
 	if (torture_cleanup_begin())
 		return;
 	if (!cur_ops) {
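
The shape of the hunk above can be illustrated with a toy, single-threaded userspace model. This is only a sketch under stated assumptions, not the rcuscale code: `toy_module`, `toy_cleanup_unfixed()`, `toy_cleanup_fixed()`, and `toy_unload()` are hypothetical names invented for the illustration, and a worker "running" is just a boolean flag rather than a real kthread. It assumes, as the commit message describes, that in kfree test mode the module's exit path previously never reached the kfree threads.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two kinds of test threads in a module. */
enum worker_kind { WORKER_GENERIC, WORKER_KFREE };

struct worker {
	enum worker_kind kind;
	bool running;
};

struct toy_module {
	bool kfree_test;		/* models the kfree_rcu_test parameter */
	struct worker workers[4];
	size_t nworkers;
};

/* Start only the kind of worker that the selected test mode uses. */
static void toy_module_init(struct toy_module *m, bool kfree_test)
{
	m->kfree_test = kfree_test;
	m->nworkers = 4;
	for (size_t i = 0; i < m->nworkers; i++) {
		m->workers[i].kind = kfree_test ? WORKER_KFREE : WORKER_GENERIC;
		m->workers[i].running = true;
	}
}

/* Mark every worker of the given kind as stopped. */
static void stop_workers(struct toy_module *m, enum worker_kind kind)
{
	for (size_t i = 0; i < m->nworkers; i++)
		if (m->workers[i].kind == kind)
			m->workers[i].running = false;
}

/* Model of the pre-fix exit path: it only knows about generic workers. */
static void toy_cleanup_unfixed(struct toy_module *m)
{
	stop_workers(m, WORKER_GENERIC);
}

/* Model of the fixed exit path: dispatch to the kfree cleanup first. */
static void toy_cleanup_fixed(struct toy_module *m)
{
	if (m->kfree_test) {
		stop_workers(m, WORKER_KFREE);
		return;
	}
	stop_workers(m, WORKER_GENERIC);
}

/* Count workers that would outlive the module. */
static size_t workers_still_running(const struct toy_module *m)
{
	size_t n = 0;

	for (size_t i = 0; i < m->nworkers; i++)
		if (m->workers[i].running)
			n++;
	return n;
}

/* Load the toy module, run one cleanup variant, report surviving workers. */
size_t toy_unload(bool kfree_test, bool fixed)
{
	struct toy_module m;

	toy_module_init(&m, kfree_test);
	if (fixed)
		toy_cleanup_fixed(&m);
	else
		toy_cleanup_unfixed(&m);
	return workers_still_running(&m);
}
```

In this model, `toy_unload(true, false)` leaves all four kfree workers marked as running after "unload", which corresponds to threads executing text from a module that has gone away. `toy_unload(true, true)` leaves none. The early `return` mirrors the patch: the kfree path performs its own teardown, so the generic path is skipped in that mode.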