iommu/vt-d: Fix an IOMMU perfmon warning when CPU hotplug

Message ID 20230324151106.526132-1-kan.liang@linux.intel.com
State New
Series iommu/vt-d: Fix an IOMMU perfmon warning when CPU hotplug |

Commit Message

Liang, Kan March 24, 2023, 3:11 p.m. UTC
  From: Kan Liang <kan.liang@linux.intel.com>

A warning can be triggered when CPU 0 is taken offline:
 $ echo 0 > /sys/devices/system/cpu/cpu0/online

[11958.737635] ------------[ cut here ]------------
[11958.742882] Voluntary context switch within RCU read-side critical
section!
[11958.742891] WARNING: CPU: 0 PID: 19 at kernel/rcu/tree_plugin.h:318
rcu_note_context_switch+0x4f4/0x580
[11958.860095] RIP: 0010:rcu_note_context_switch+0x4f4/0x580
[11958.960360] Call Trace:
[11958.963161]  <TASK>
[11958.965565]  ? perf_event_update_userpage+0x104/0x150
[11958.971293]  __schedule+0x8d/0x960
[11958.975165]  ? perf_event_set_state.part.82+0x11/0x50
[11958.980891]  schedule+0x44/0xb0
[11958.984464]  schedule_timeout+0x226/0x310
[11958.989017]  ? __perf_event_disable+0x64/0x1a0
[11958.994054]  ? _raw_spin_unlock+0x14/0x30
[11958.998605]  wait_for_completion+0x94/0x130
[11959.003352]  __wait_rcu_gp+0x108/0x130
[11959.007616]  synchronize_rcu+0x67/0x70
[11959.011876]  ? invoke_rcu_core+0xb0/0xb0
[11959.016333]  ? __bpf_trace_rcu_stall_warning+0x10/0x10
[11959.022147]  perf_pmu_migrate_context+0x121/0x370
[11959.027478]  iommu_pmu_cpu_offline+0x6a/0xa0
[11959.032325]  ? iommu_pmu_del+0x1e0/0x1e0
[11959.036782]  cpuhp_invoke_callback+0x129/0x510
[11959.041825]  cpuhp_thread_fun+0x94/0x150
[11959.046283]  smpboot_thread_fn+0x183/0x220
[11959.050933]  ? sort_range+0x20/0x20
[11959.054902]  kthread+0xe6/0x110
[11959.058479]  ? kthread_complete_and_exit+0x20/0x20
[11959.063911]  ret_from_fork+0x1f/0x30
[11959.067982]  </TASK>
[11959.070489] ---[ end trace 0000000000000000 ]---

perf_pmu_migrate_context() invokes synchronize_rcu() when migrating a PMU
to a new CPU. However, the current for_each_iommu() walk runs within an
RCU read-side critical section, where sleeping is not allowed.

Replace the RCU read lock with the dmar_global_lock when walking the
drhd list.
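The problematic pattern can be sketched as follows (a simplified sketch of the offline path, not the verbatim driver source): rcu_read_lock() marks a non-sleepable section, while perf_pmu_migrate_context() ends up waiting for an RCU grace period.

```c
/* Sketch of the broken pattern (simplified, not verbatim driver code) */
rcu_read_lock();                /* read-side section: sleeping is forbidden */
for_each_iommu(iommu, drhd) {
	if (!iommu->pmu)
		continue;
	/*
	 * Internally reaches synchronize_rcu(), which blocks until a
	 * grace period elapses -- a voluntary context switch inside the
	 * read-side section, triggering the warning above.
	 */
	perf_pmu_migrate_context(&iommu->pmu->pmu, cpu, target);
}
rcu_read_unlock();
```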

Fixes: 46284c6ceb5e ("iommu/vt-d: Support cpumask for IOMMU perfmon")
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 drivers/iommu/intel/perfmon.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
  

Comments

Baolu Lu March 25, 2023, 12:59 p.m. UTC | #1
On 2023/3/24 23:11, kan.liang@linux.intel.com wrote:
> From: Kan Liang <kan.liang@linux.intel.com>
> 
> A warning can be triggered when hotplug CPU 0.
>   $ echo 0 > /sys/devices/system/cpu/cpu0/online
> 
> [... warning trace trimmed; identical to the one quoted above ...]
> 
> The synchronize_rcu() will be invoked in the perf_pmu_migrate_context(),
> when migrating a PMU to a new CPU. However, the current for_each_iommu()
> is within RCU read-side critical section.
> 
> Use the dmar_global_lock to replace the RCU read lock when going through
> the drhd list.

This introduces the following lockdep warning.

The iommu_pmu_register() path holds the dmar_global_lock and then
acquires the cpuhp_state lock; conversely, the iommu_pmu_cpu_offline()
path holds the cpuhp_state lock and then acquires the dmar_global_lock.
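The inverted lock ordering can be summarized as follows (call paths reconstructed from the lockdep chain below; simplified sketch, not verbatim source):

```c
/*
 * Boot path (intel_iommu_init() already runs with dmar_global_lock held):
 *   down_write(&dmar_global_lock);
 *     iommu_pmu_register()
 *       __cpuhp_setup_state();           // acquires cpu_hotplug_lock
 *
 * Hotplug path (cpuhp callback runs with cpu_hotplug_lock and
 * cpuhp_state-down already held):
 *   iommu_pmu_cpu_offline()
 *     down_write(&dmar_global_lock);     // opposite order => possible deadlock
 */
```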

Can you please have a look into it?

[12861.883945] ======================================================
[12861.884750] WARNING: possible circular locking dependency detected
[12861.885556] 6.3.0-rc3-00033-g5ff8930d087a #491 Not tainted
[12861.886272] ------------------------------------------------------
[12861.887077] cpuhp/0/19 is trying to acquire lock:
[12861.887703] ffffffff91c1d0d0 (dmar_global_lock){++++}-{3:3}, at: 
iommu_pmu_cpu_offline+0x62/0xd0
[12861.888839]
                but task is already holding lock:
[12861.889605] ffffffff8e26ed40 (cpuhp_state-down){+.+.}-{0:0}, at: 
cpuhp_thread_fun+0x4b/0x210
[12861.890701]
                which lock already depends on the new lock.

[12861.891747]
                the existing dependency chain (in reverse order) is:
[12861.892713]
                -> #2 (cpuhp_state-down){+.+.}-{0:0}:
[12861.893525]        lock_acquire+0xc2/0x2a0
[12861.894074]        cpuhp_kick_ap_work+0x5b/0x1d0
[12861.894692]        _cpu_down+0xc3/0x300
[12861.895210]        cpu_down+0x2e/0x50
[12861.895708]        device_offline+0x8c/0xc0
[12861.896266]        online_store+0x4e/0xa0
[12861.896804]        kernfs_fop_write_iter+0x12a/0x1d0
[12861.897462]        vfs_write+0x313/0x4c0
[12861.897989]        ksys_write+0x60/0xe0
[12861.898507]        do_syscall_64+0x43/0x90
[12861.899055]        entry_SYSCALL_64_after_hwframe+0x72/0xdc
[12861.899783]
                -> #1 (cpu_hotplug_lock){++++}-{0:0}:
[12861.900595]        lock_acquire+0xc2/0x2a0
[12861.901143]        __cpuhp_setup_state+0x5b/0x1f0
[12861.901772]        iommu_pmu_register+0x1a2/0x200
[12861.902400]        intel_iommu_init+0x5de/0x880
[12861.903009]        pci_iommu_init+0x12/0x40
[12861.903567]        do_one_initcall+0x65/0x290
[12861.904155]        kernel_init_freeable+0x3cf/0x560
[12861.904804]        kernel_init+0x1a/0x140
[12861.905342]        ret_from_fork+0x29/0x50
[12861.905890]
                -> #0 (dmar_global_lock){++++}-{3:3}:
[12861.906702]        check_prevs_add+0x160/0xee0
[12861.907300]        __lock_acquire+0x143b/0x18c0
[12861.907909]        lock_acquire+0xc2/0x2a0
[12861.908457]        down_write+0x3f/0xc0
[12861.908976]        iommu_pmu_cpu_offline+0x62/0xd0
[12861.909614]        cpuhp_invoke_callback+0x175/0x670
[12861.910273]        cpuhp_thread_fun+0x19d/0x210
[12861.910882]        smpboot_thread_fn+0x11d/0x240
[12861.911501]        kthread+0xeb/0x120
[12861.911998]        ret_from_fork+0x29/0x50
[12861.912546]
                other info that might help us debug this:

[12861.913572] Chain exists of:
                  dmar_global_lock --> cpu_hotplug_lock --> cpuhp_state-down

[12861.914977]  Possible unsafe locking scenario:

[12861.915753]        CPU0                    CPU1
[12861.916358]        ----                    ----
[12861.916963]   lock(cpuhp_state-down);
[12861.917460]                                lock(cpu_hotplug_lock);
[12861.918269]                                lock(cpuhp_state-down);
[12861.919077]   lock(dmar_global_lock);
[12861.919575]
                 *** DEADLOCK ***

[12861.920350] 2 locks held by cpuhp/0/19:
[12861.920865]  #0: ffffffff8e26ebd0 (cpu_hotplug_lock){++++}-{0:0}, at: 
cpuhp_thread_fun+0x47/0x210
[12861.922014]  #1: ffffffff8e26ed40 (cpuhp_state-down){+.+.}-{0:0}, at: 
cpuhp_thread_fun+0x4b/0x210
[12861.923162]
                stack backtrace:
[12861.923747] CPU: 0 PID: 19 Comm: cpuhp/0 Not tainted 
6.3.0-rc3-00033-g5ff8930d087a #491
[12861.924775] Hardware name: Intel Corporation BIRCHSTREAM/BIRCHSTREAM, 
BIOS BHSDREL1.86B.0018.D47.2211031932 11/03/2022
[12861.926133] Call Trace:
[12861.926478]  <TASK>
[12861.926783]  dump_stack_lvl+0x4a/0x80
[12861.927282]  check_noncircular+0x102/0x120
[12861.927833]  ? alloc_list_entry+0x13/0x130
[12861.928384]  check_prevs_add+0x160/0xee0
[12861.928918]  __lock_acquire+0x143b/0x18c0
[12861.929461]  lock_acquire+0xc2/0x2a0
[12861.929950]  ? iommu_pmu_cpu_offline+0x62/0xd0
[12861.930550]  ? lock_is_held_type+0x9d/0x110
[12861.931113]  down_write+0x3f/0xc0
[12861.931572]  ? iommu_pmu_cpu_offline+0x62/0xd0
[12861.932171]  iommu_pmu_cpu_offline+0x62/0xd0
[12861.932749]  ? __pfx_iommu_pmu_cpu_offline+0x10/0x10
[12861.933408]  cpuhp_invoke_callback+0x175/0x670
[12861.934007]  ? lock_release+0xc3/0x250
[12861.934519]  cpuhp_thread_fun+0x19d/0x210
[12861.935059]  ? smpboot_thread_fn+0x2f/0x240
[12861.935618]  ? smpboot_thread_fn+0x69/0x240
[12861.936177]  ? smpboot_thread_fn+0x125/0x240
[12861.936757]  smpboot_thread_fn+0x11d/0x240
[12861.937308]  ? __pfx_smpboot_thread_fn+0x10/0x10
[12861.937927]  kthread+0xeb/0x120
[12861.938365]  ? __pfx_kthread+0x10/0x10
[12861.938874]  ret_from_fork+0x29/0x50
[12861.939370]  </TASK>

Best regards,
baolu
  

Patch

diff --git a/drivers/iommu/intel/perfmon.c b/drivers/iommu/intel/perfmon.c
index e17d9743a0d8..81c9554fb1dc 100644
--- a/drivers/iommu/intel/perfmon.c
+++ b/drivers/iommu/intel/perfmon.c
@@ -797,14 +797,14 @@  static int iommu_pmu_cpu_offline(unsigned int cpu)
 	else
 		target = -1;
 
-	rcu_read_lock();
+	down_write(&dmar_global_lock);
 
 	for_each_iommu(iommu, drhd) {
 		if (!iommu->pmu)
 			continue;
 		perf_pmu_migrate_context(&iommu->pmu->pmu, cpu, target);
 	}
-	rcu_read_unlock();
+	up_write(&dmar_global_lock);
 
 	return 0;
 }