iommu/vt-d: Remove unnecessary locking in intel_irq_remapping_alloc()

Message ID 20230314051836.23817-1-baolu.lu@linux.intel.com
State New
Headers
Series iommu/vt-d: Remove unnecessary locking in intel_irq_remapping_alloc() |

Commit Message

Baolu Lu March 14, 2023, 5:18 a.m. UTC
  The global rwsem dmar_global_lock was introduced by commit 3a5670e8ac932
("iommu/vt-d: Introduce a rwsem to protect global data structures"). It
is used to protect DMAR related global data from DMAR hotplug operations.

Using dmar_global_lock in intel_irq_remapping_alloc() is unnecessary as
the DMAR global data structures are not touched there. Remove it to avoid
below lockdep warning.

 ======================================================
 WARNING: possible circular locking dependency detected
 6.3.0-rc2 #468 Not tainted
 ------------------------------------------------------
 swapper/0/1 is trying to acquire lock:
 ff1db4cb40178698 (&domain->mutex){+.+.}-{3:3},
   at: __irq_domain_alloc_irqs+0x3b/0xa0

 but task is already holding lock:
 ffffffffa0c1cdf0 (dmar_global_lock){++++}-{3:3},
   at: intel_iommu_init+0x58e/0x880

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #1 (dmar_global_lock){++++}-{3:3}:
        lock_acquire+0xd6/0x320
        down_read+0x42/0x180
        intel_irq_remapping_alloc+0xad/0x750
        mp_irqdomain_alloc+0xb8/0x2b0
        irq_domain_alloc_irqs_locked+0x12f/0x2d0
        __irq_domain_alloc_irqs+0x56/0xa0
        alloc_isa_irq_from_domain.isra.7+0xa0/0xe0
        mp_map_pin_to_irq+0x1dc/0x330
        setup_IO_APIC+0x128/0x210
        apic_intr_mode_init+0x67/0x110
        x86_late_time_init+0x24/0x40
        start_kernel+0x41e/0x7e0
        secondary_startup_64_no_verify+0xe0/0xeb

 -> #0 (&domain->mutex){+.+.}-{3:3}:
        check_prevs_add+0x160/0xef0
        __lock_acquire+0x147d/0x1950
        lock_acquire+0xd6/0x320
        __mutex_lock+0x9c/0xfc0
        __irq_domain_alloc_irqs+0x3b/0xa0
        dmar_alloc_hwirq+0x9e/0x120
        iommu_pmu_register+0x11d/0x200
        intel_iommu_init+0x5de/0x880
        pci_iommu_init+0x12/0x40
        do_one_initcall+0x65/0x350
        kernel_init_freeable+0x3ca/0x610
        kernel_init+0x1a/0x140
        ret_from_fork+0x29/0x50

 other info that might help us debug this:

 Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(dmar_global_lock);
                                lock(&domain->mutex);
                                lock(dmar_global_lock);
   lock(&domain->mutex);

                *** DEADLOCK ***

Fixes: 9dbb8e3452ab ("irqdomain: Switch to per-domain locking")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/intel/irq_remapping.c | 6 ------
 1 file changed, 6 deletions(-)
  

Comments

Jacob Pan March 14, 2023, 3:54 p.m. UTC | #1
Hi BaoLu,

On Tue, 14 Mar 2023 13:18:36 +0800, Lu Baolu <baolu.lu@linux.intel.com>
wrote:

> The global rwsem dmar_global_lock was introduced by commit 3a5670e8ac932
> ("iommu/vt-d: Introduce a rwsem to protect global data structures"). It
> is used to protect DMAR related global data from DMAR hotplug operations.
> 
> Using dmar_global_lock in intel_irq_remapping_alloc() is unnecessary as
> the DMAR global data structures are not touched there. Remove it to avoid
> below lockdep warning.
> 
>  ======================================================
>  WARNING: possible circular locking dependency detected
>  6.3.0-rc2 #468 Not tainted
>  ------------------------------------------------------
>  swapper/0/1 is trying to acquire lock:
>  ff1db4cb40178698 (&domain->mutex){+.+.}-{3:3},
>    at: __irq_domain_alloc_irqs+0x3b/0xa0
> 
>  but task is already holding lock:
>  ffffffffa0c1cdf0 (dmar_global_lock){++++}-{3:3},
>    at: intel_iommu_init+0x58e/0x880
> 
>  which lock already depends on the new lock.
> 
>  the existing dependency chain (in reverse order) is:
> 
>  -> #1 (dmar_global_lock){++++}-{3:3}:
>         lock_acquire+0xd6/0x320
>         down_read+0x42/0x180
>         intel_irq_remapping_alloc+0xad/0x750
>         mp_irqdomain_alloc+0xb8/0x2b0
>         irq_domain_alloc_irqs_locked+0x12f/0x2d0
>         __irq_domain_alloc_irqs+0x56/0xa0
>         alloc_isa_irq_from_domain.isra.7+0xa0/0xe0
>         mp_map_pin_to_irq+0x1dc/0x330
>         setup_IO_APIC+0x128/0x210
>         apic_intr_mode_init+0x67/0x110
>         x86_late_time_init+0x24/0x40
>         start_kernel+0x41e/0x7e0
>         secondary_startup_64_no_verify+0xe0/0xeb
> 
>  -> #0 (&domain->mutex){+.+.}-{3:3}:
>         check_prevs_add+0x160/0xef0
>         __lock_acquire+0x147d/0x1950
>         lock_acquire+0xd6/0x320
>         __mutex_lock+0x9c/0xfc0
>         __irq_domain_alloc_irqs+0x3b/0xa0
>         dmar_alloc_hwirq+0x9e/0x120
>         iommu_pmu_register+0x11d/0x200
>         intel_iommu_init+0x5de/0x880
>         pci_iommu_init+0x12/0x40
>         do_one_initcall+0x65/0x350
>         kernel_init_freeable+0x3ca/0x610
>         kernel_init+0x1a/0x140
>         ret_from_fork+0x29/0x50
> 
>  other info that might help us debug this:
> 
>  Possible unsafe locking scenario:
> 
>         CPU0                    CPU1
>         ----                    ----
>    lock(dmar_global_lock);
>                                 lock(&domain->mutex);
>                                 lock(dmar_global_lock);
>    lock(&domain->mutex);
> 
>                 *** DEADLOCK ***
> 
> Fixes: 9dbb8e3452ab ("irqdomain: Switch to per-domain locking")
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
>  drivers/iommu/intel/irq_remapping.c | 6 ------
>  1 file changed, 6 deletions(-)
> 
> diff --git a/drivers/iommu/intel/irq_remapping.c
> b/drivers/iommu/intel/irq_remapping.c index 6d01fa078c36..df9e261af0b5
> 100644 --- a/drivers/iommu/intel/irq_remapping.c
> +++ b/drivers/iommu/intel/irq_remapping.c
> @@ -311,14 +311,12 @@ static int set_ioapic_sid(struct irte *irte, int
> apic) if (!irte)
>  		return -1;
>  
> -	down_read(&dmar_global_lock);
>  	for (i = 0; i < MAX_IO_APICS; i++) {
>  		if (ir_ioapic[i].iommu && ir_ioapic[i].id == apic) {
>  			sid = (ir_ioapic[i].bus << 8) |
> ir_ioapic[i].devfn; break;
>  		}
>  	}
> -	up_read(&dmar_global_lock);
>  
>  	if (sid == 0) {
>  		pr_warn("Failed to set source-id of IOAPIC (%d)\n",
> apic); @@ -338,14 +336,12 @@ static int set_hpet_sid(struct irte *irte,
> u8 id) if (!irte)
>  		return -1;
>  
> -	down_read(&dmar_global_lock);
>  	for (i = 0; i < MAX_HPET_TBS; i++) {
>  		if (ir_hpet[i].iommu && ir_hpet[i].id == id) {
>  			sid = (ir_hpet[i].bus << 8) | ir_hpet[i].devfn;
>  			break;
>  		}
>  	}
> -	up_read(&dmar_global_lock);
>  
>  	if (sid == 0) {
>  		pr_warn("Failed to set source-id of HPET block (%d)\n",
> id); @@ -1339,9 +1335,7 @@ static int intel_irq_remapping_alloc(struct
> irq_domain *domain, if (!data)
>  		goto out_free_parent;
>  
> -	down_read(&dmar_global_lock);
>  	index = alloc_irte(iommu, &data->irq_2_iommu, nr_irqs);
> -	up_read(&dmar_global_lock);
>  	if (index < 0) {
>  		pr_warn("Failed to allocate IRTE\n");
>  		kfree(data);
Reviewed-by: Jacob Pan <jacob.jun.pan@linux.intel.com>

slightly beyond the scope of this, do we need to take dmar_global_lock
below? shouldn't it be in single threaded context?

	down_write(&dmar_global_lock);
	ret = dmar_dev_scope_init();
	up_write(&dmar_global_lock);

	return ret;
}
rootfs_initcall(ir_dev_scope_init);

Thanks,

Jacob
  
Baolu Lu March 15, 2023, 1:14 a.m. UTC | #2
On 3/14/23 11:54 PM, Jacob Pan wrote:
> Hi BaoLu,
> 
> On Tue, 14 Mar 2023 13:18:36 +0800, Lu Baolu <baolu.lu@linux.intel.com>
> wrote:
> 
>> The global rwsem dmar_global_lock was introduced by commit 3a5670e8ac932
>> ("iommu/vt-d: Introduce a rwsem to protect global data structures"). It
>> is used to protect DMAR related global data from DMAR hotplug operations.
>>
>> Using dmar_global_lock in intel_irq_remapping_alloc() is unnecessary as
>> the DMAR global data structures are not touched there. Remove it to avoid
>> below lockdep warning.
>>
>>   ======================================================
>>   WARNING: possible circular locking dependency detected
>>   6.3.0-rc2 #468 Not tainted
>>   ------------------------------------------------------
>>   swapper/0/1 is trying to acquire lock:
>>   ff1db4cb40178698 (&domain->mutex){+.+.}-{3:3},
>>     at: __irq_domain_alloc_irqs+0x3b/0xa0
>>
>>   but task is already holding lock:
>>   ffffffffa0c1cdf0 (dmar_global_lock){++++}-{3:3},
>>     at: intel_iommu_init+0x58e/0x880
>>
>>   which lock already depends on the new lock.
>>
>>   the existing dependency chain (in reverse order) is:
>>
>>   -> #1 (dmar_global_lock){++++}-{3:3}:
>>          lock_acquire+0xd6/0x320
>>          down_read+0x42/0x180
>>          intel_irq_remapping_alloc+0xad/0x750
>>          mp_irqdomain_alloc+0xb8/0x2b0
>>          irq_domain_alloc_irqs_locked+0x12f/0x2d0
>>          __irq_domain_alloc_irqs+0x56/0xa0
>>          alloc_isa_irq_from_domain.isra.7+0xa0/0xe0
>>          mp_map_pin_to_irq+0x1dc/0x330
>>          setup_IO_APIC+0x128/0x210
>>          apic_intr_mode_init+0x67/0x110
>>          x86_late_time_init+0x24/0x40
>>          start_kernel+0x41e/0x7e0
>>          secondary_startup_64_no_verify+0xe0/0xeb
>>
>>   -> #0 (&domain->mutex){+.+.}-{3:3}:
>>          check_prevs_add+0x160/0xef0
>>          __lock_acquire+0x147d/0x1950
>>          lock_acquire+0xd6/0x320
>>          __mutex_lock+0x9c/0xfc0
>>          __irq_domain_alloc_irqs+0x3b/0xa0
>>          dmar_alloc_hwirq+0x9e/0x120
>>          iommu_pmu_register+0x11d/0x200
>>          intel_iommu_init+0x5de/0x880
>>          pci_iommu_init+0x12/0x40
>>          do_one_initcall+0x65/0x350
>>          kernel_init_freeable+0x3ca/0x610
>>          kernel_init+0x1a/0x140
>>          ret_from_fork+0x29/0x50
>>
>>   other info that might help us debug this:
>>
>>   Possible unsafe locking scenario:
>>
>>          CPU0                    CPU1
>>          ----                    ----
>>     lock(dmar_global_lock);
>>                                  lock(&domain->mutex);
>>                                  lock(dmar_global_lock);
>>     lock(&domain->mutex);
>>
>>                  *** DEADLOCK ***
>>
>> Fixes: 9dbb8e3452ab ("irqdomain: Switch to per-domain locking")
>> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
>> ---
>>   drivers/iommu/intel/irq_remapping.c | 6 ------
>>   1 file changed, 6 deletions(-)
>>
>> diff --git a/drivers/iommu/intel/irq_remapping.c
>> b/drivers/iommu/intel/irq_remapping.c index 6d01fa078c36..df9e261af0b5
>> 100644 --- a/drivers/iommu/intel/irq_remapping.c
>> +++ b/drivers/iommu/intel/irq_remapping.c
>> @@ -311,14 +311,12 @@ static int set_ioapic_sid(struct irte *irte, int
>> apic) if (!irte)
>>   		return -1;
>>   
>> -	down_read(&dmar_global_lock);
>>   	for (i = 0; i < MAX_IO_APICS; i++) {
>>   		if (ir_ioapic[i].iommu && ir_ioapic[i].id == apic) {
>>   			sid = (ir_ioapic[i].bus << 8) |
>> ir_ioapic[i].devfn; break;
>>   		}
>>   	}
>> -	up_read(&dmar_global_lock);
>>   
>>   	if (sid == 0) {
>>   		pr_warn("Failed to set source-id of IOAPIC (%d)\n",
>> apic); @@ -338,14 +336,12 @@ static int set_hpet_sid(struct irte *irte,
>> u8 id) if (!irte)
>>   		return -1;
>>   
>> -	down_read(&dmar_global_lock);
>>   	for (i = 0; i < MAX_HPET_TBS; i++) {
>>   		if (ir_hpet[i].iommu && ir_hpet[i].id == id) {
>>   			sid = (ir_hpet[i].bus << 8) | ir_hpet[i].devfn;
>>   			break;
>>   		}
>>   	}
>> -	up_read(&dmar_global_lock);
>>   
>>   	if (sid == 0) {
>>   		pr_warn("Failed to set source-id of HPET block (%d)\n",
>> id); @@ -1339,9 +1335,7 @@ static int intel_irq_remapping_alloc(struct
>> irq_domain *domain, if (!data)
>>   		goto out_free_parent;
>>   
>> -	down_read(&dmar_global_lock);
>>   	index = alloc_irte(iommu, &data->irq_2_iommu, nr_irqs);
>> -	up_read(&dmar_global_lock);
>>   	if (index < 0) {
>>   		pr_warn("Failed to allocate IRTE\n");
>>   		kfree(data);
> Reviewed-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> 
> slightly beyond the scope of this, do we need to take dmar_global_lock
> below? shouldn't it be in single threaded context?
> 
> 	down_write(&dmar_global_lock);
> 	ret = dmar_dev_scope_init();
> 	up_write(&dmar_global_lock);
> 
> 	return ret;
> }
> rootfs_initcall(ir_dev_scope_init);

Yes, agreed. This runs in a single threaded context. I will remove the
locking in a cleanup patch.

Best regards,
baolu
  
Jason Gunthorpe March 27, 2023, 12:18 p.m. UTC | #3
On Tue, Mar 14, 2023 at 01:18:36PM +0800, Lu Baolu wrote:
> The global rwsem dmar_global_lock was introduced by commit 3a5670e8ac932
> ("iommu/vt-d: Introduce a rwsem to protect global data structures"). It
> is used to protect DMAR related global data from DMAR hotplug operations.
> 
> Using dmar_global_lock in intel_irq_remapping_alloc() is unnecessary as
> the DMAR global data structures are not touched there. Remove it to avoid
> below lockdep warning.

Tested-by: Jason Gunthorpe <jgg@nvidia.com>

Solves my splat too

Let's send this to -rc please

Jason
  
Baolu Lu March 27, 2023, 1:13 p.m. UTC | #4
On 2023/3/27 20:18, Jason Gunthorpe wrote:
> On Tue, Mar 14, 2023 at 01:18:36PM +0800, Lu Baolu wrote:
>> The global rwsem dmar_global_lock was introduced by commit 3a5670e8ac932
>> ("iommu/vt-d: Introduce a rwsem to protect global data structures"). It
>> is used to protect DMAR related global data from DMAR hotplug operations.
>>
>> Using dmar_global_lock in intel_irq_remapping_alloc() is unnecessary as
>> the DMAR global data structures are not touched there. Remove it to avoid
>> below lockdep warning.
> 
> Tested-by: Jason Gunthorpe <jgg@nvidia.com>
> 
> Solves my splat too
> 
> Let's send this to -rc please

Thank you for the testing. I will queue it for Joerg this week.

Best regards,
baolu
  
Nicolin Chen April 27, 2023, 12:43 a.m. UTC | #5
Hi Baolu/Jason,

On Mon, Mar 27, 2023 at 06:13:10AM -0700, Baolu Lu wrote:
> On 2023/3/27 20:18, Jason Gunthorpe wrote:
> > On Tue, Mar 14, 2023 at 01:18:36PM +0800, Lu Baolu wrote:
> >> The global rwsem dmar_global_lock was introduced by commit 3a5670e8ac932
> >> ("iommu/vt-d: Introduce a rwsem to protect global data structures"). It
> >> is used to protect DMAR related global data from DMAR hotplug operations.
> >>
> >> Using dmar_global_lock in intel_irq_remapping_alloc() is unnecessary as
> >> the DMAR global data structures are not touched there. Remove it to avoid
> >> below lockdep warning.
> >
> > Tested-by: Jason Gunthorpe <jgg@nvidia.com>
> >
> > Solves my splat too
> >
> > Let's send this to -rc please
> 
> Thank you for the testing. I will queue it for Joerg this week.

I found a couple of kernel warnings switching from v6.3-rc4
to v6.3-rc5. Git-bisect is pointing at this commit.

My test environment is MKT enabling irq_remap:
https://github.com/Mellanox/mkt
CONFIG_IOMMUFD=m
CONFIG_IOMMUFD_VFIO_CONTAINER=y
CONFIG_IOMMUFD_TEST=y
CONFIG_IRQ_REMAP=y

Any idea?

Thanks
Nicolin

Attaching WARNINGs:
[   19.680725] ------------[ cut here ]------------
[   19.681083] WARNING: CPU: 0 PID: 561 at include/linux/mmap_lock.h:161 track_pfn_remap+0xf5/0x100
[   19.681356] Modules linked in: vfio_pci vfio_pci_core irqbypass vfio iommufd mlx5_ib ib_uverbs ib_core mlx5_core
[   19.681654] CPU: 0 PID: 561 Comm: python3 Not tainted 6.3.0-rc6+ #1080
[   19.681808] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[   19.682108] RIP: 0010:track_pfn_remap+0xf5/0x100
[   19.682270] Code: 5e 5d c3 48 89 f2 31 c9 4c 89 c6 4c 89 e7 e8 42 fc ff ff e9 54 ff ff ff 48 8d b8 88 01 00 00 31 f6 e8 5f 97 76 00 85 c0 75 be <0f> 0b eb ba 0f 1f 80 00 00 00 00 80 3d 71 72 ef 00 00 74 01 c3 55
[   19.682678] RSP: 0000:ffffc900014b7ce8 EFLAGS: 00010246
[   19.682805] RAX: 0000000000000000 RBX: 0000000002000000 RCX: 0000000000000000
[   19.682984] RDX: 0000000000000000 RSI: ffff888104a709c8 RDI: ffff888108756150
[   19.683214] RBP: ffffc900014b7d08 R08: 0000000000000001 R09: 0000000000000003
[   19.683464] R10: 000000000e6a4a47 R11: 00000000620892b1 R12: 00000000fc000000
[   19.683663] R13: ffffc900014b7d20 R14: ffff888106a35100 R15: 0000000002000000
[   19.683843] FS:  00007fa8aa4edb80(0000) GS:ffff8881ba400000(0000) knlGS:0000000000000000
[   19.684054] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   19.684223] CR2: 00007fa8a748f004 CR3: 000000010673b005 CR4: 00000000003706b0
[   19.684414] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   19.684632] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   19.684817] Call Trace:
[   19.684893]  <TASK>
[   19.684967]  remap_pfn_range+0x3e/0xa0
[   19.685084]  vfio_pci_mmap_fault+0x8a/0x160 [vfio_pci_core]
[   19.685253]  __do_fault+0x30/0xa0
[   19.685368]  __handle_mm_fault+0xe08/0x1ff0
[   19.685520]  ? find_held_lock+0x31/0x80
[   19.685655]  ? mt_find+0x15d/0x400
[   19.685759]  ? lock_release+0xbc/0x240
[   19.685862]  handle_mm_fault+0xa8/0x170
[   19.685963]  ? find_vma+0x3c/0x70
[   19.686066]  exc_page_fault+0x1e6/0x7b0
[   19.686167]  asm_exc_page_fault+0x27/0x30
[   19.686271] RIP: 0033:0x7fa8a986bfd5
[   19.686373] Code: ef 41 89 c4 e8 dc 70 fc ff 45 85 e4 0f 85 a0 0e 00 00 48 89 df e8 3b ce ff ff 48 85 c0 0f 84 eb 0d 00 00 48 8b ab e8 02 00 00 <8b> 45 04 0f c8 c1 e8 10 83 f8 05 0f 85 b0 0b 00 00 8b 45 14 ba 01
[   19.686773] RSP: 002b:00007ffc2392ab30 EFLAGS: 00010206
[   19.686931] RAX: 0000559646201590 RBX: 0000559646471a50 RCX: 0000559646471d00
[   19.687132] RDX: 0000559646471d00 RSI: 0000000000003b71 RDI: 0000000000000003
[   19.687377] RBP: 00007fa8a748f000 R08: 00000000fedfffff R09: 0000000000000000
[   19.687563] R10: 0000000000000022 R11: 0000000000000246 R12: 0000559646470904
[   19.687744] R13: 00007ffc2392ab70 R14: 0000000000000002 R15: 0000559646470934
[   19.687943]  </TASK>
[   19.688016] irq event stamp: 655597
[   19.688114] hardirqs last  enabled at (655605): [<ffffffff810c9683>] __up_console_sem+0x53/0x60
[   19.688340] hardirqs last disabled at (655612): [<ffffffff810c9668>] __up_console_sem+0x38/0x60
[   19.688554] softirqs last  enabled at (655148): [<ffffffff81064e79>] irq_exit_rcu+0x69/0x90
[   19.688733] softirqs last disabled at (655143): [<ffffffff81064e79>] irq_exit_rcu+0x69/0x90
[   19.688915] ---[ end trace 0000000000000000 ]---
[   19.689076] ------------[ cut here ]------------
[   19.689197] WARNING: CPU: 0 PID: 561 at include/linux/mmap_lock.h:161 remap_pfn_range_notrack+0x40f/0x4f0
[   19.689440] Modules linked in: vfio_pci vfio_pci_core irqbypass vfio iommufd mlx5_ib ib_uverbs ib_core mlx5_core
[   19.689691] CPU: 0 PID: 561 Comm: python3 Tainted: G        W          6.3.0-rc6+ #1080
[   19.689867] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[   19.690109] RIP: 0010:remap_pfn_range_notrack+0x40f/0x4f0
[   19.690234] Code: 39 eb 0f 85 d9 fc ff ff 31 c0 eb 5e 48 8b 45 a8 31 f6 4c 89 45 d0 48 8d b8 88 01 00 00 e8 89 bc 5a 00 4c 8b 45 d0 85 c0 75 02 <0f> 0b 48 8b 43 20 e9 5e fc ff ff 48 8b 7d a8 48 89 c6 48 89 c3 e8
[   19.690628] RSP: 0000:ffffc900014b7c68 EFLAGS: 00010246
[   19.690750] RAX: 0000000000000000 RBX: ffff888106a35100 RCX: 0000000000000000
[   19.690914] RDX: 0000000000000000 RSI: ffff888104a709c8 RDI: ffff888108756150
[   19.691074] RBP: ffffc900014b7d08 R08: 00007fa8a948f000 R09: 0000000000000003
[   19.691274] R10: 000000000e6a4a47 R11: 00000000620892b1 R12: 00000000000fc000
[   19.691469] R13: 00007fa8a748f000 R14: 00007fa8a748f000 R15: 8000000000000037
[   19.691647] FS:  00007fa8aa4edb80(0000) GS:ffff8881ba400000(0000) knlGS:0000000000000000
[   19.691830] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   19.691980] CR2: 00007fa8a748f004 CR3: 000000010673b005 CR4: 00000000003706b0
[   19.692159] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   19.692336] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   19.692538] Call Trace:
[   19.692592]  <TASK>
[   19.692647]  ? track_pfn_remap+0xf7/0x100
[   19.692745]  remap_pfn_range+0x57/0xa0
[   19.692845]  vfio_pci_mmap_fault+0x8a/0x160 [vfio_pci_core]
[   19.692991]  __do_fault+0x30/0xa0
[   19.693089]  __handle_mm_fault+0xe08/0x1ff0
[   19.693186]  ? find_held_lock+0x31/0x80
[   19.693291]  ? mt_find+0x15d/0x400
[   19.693391]  ? lock_release+0xbc/0x240
[   19.693491]  handle_mm_fault+0xa8/0x170
[   19.693587]  ? find_vma+0x3c/0x70
[   19.693685]  exc_page_fault+0x1e6/0x7b0
[   19.693782]  asm_exc_page_fault+0x27/0x30
[   19.693880] RIP: 0033:0x7fa8a986bfd5
[   19.693961] Code: ef 41 89 c4 e8 dc 70 fc ff 45 85 e4 0f 85 a0 0e 00 00 48 89 df e8 3b ce ff ff 48 85 c0 0f 84 eb 0d 00 00 48 8b ab e8 02 00 00 <8b> 45 04 0f c8 c1 e8 10 83 f8 05 0f 85 b0 0b 00 00 8b 45 14 ba 01
[   19.694342] RSP: 002b:00007ffc2392ab30 EFLAGS: 00010206
[   19.694466] RAX: 0000559646201590 RBX: 0000559646471a50 RCX: 0000559646471d00
[   19.694619] RDX: 0000559646471d00 RSI: 0000000000003b71 RDI: 0000000000000003
[   19.694778] RBP: 00007fa8a748f000 R08: 00000000fedfffff R09: 0000000000000000
[   19.694934] R10: 0000000000000022 R11: 0000000000000246 R12: 0000559646470904
[   19.695092] R13: 00007ffc2392ab70 R14: 0000000000000002 R15: 0000559646470934
[   19.695302]  </TASK>
[   19.695373] irq event stamp: 656049
[   19.695452] hardirqs last  enabled at (656057): [<ffffffff810c9683>] __up_console_sem+0x53/0x60
[   19.695657] hardirqs last disabled at (656064): [<ffffffff810c9668>] __up_console_sem+0x38/0x60
[   19.695883] softirqs last  enabled at (655148): [<ffffffff81064e79>] irq_exit_rcu+0x69/0x90
[   19.696061] softirqs last disabled at (655143): [<ffffffff81064e79>] irq_exit_rcu+0x69/0x90
[   19.696240] ---[ end trace 0000000000000000 ]---
  
Baolu Lu April 27, 2023, 3:20 a.m. UTC | #6
On 4/27/23 8:43 AM, Nicolin Chen wrote:
> Hi Baolu/Jason,

Hi Nicolin,

> 
> On Mon, Mar 27, 2023 at 06:13:10AM -0700, Baolu Lu wrote:
>> On 2023/3/27 20:18, Jason Gunthorpe wrote:
>>> On Tue, Mar 14, 2023 at 01:18:36PM +0800, Lu Baolu wrote:
>>>> The global rwsem dmar_global_lock was introduced by commit 3a5670e8ac932
>>>> ("iommu/vt-d: Introduce a rwsem to protect global data structures"). It
>>>> is used to protect DMAR related global data from DMAR hotplug operations.
>>>>
>>>> Using dmar_global_lock in intel_irq_remapping_alloc() is unnecessary as
>>>> the DMAR global data structures are not touched there. Remove it to avoid
>>>> below lockdep warning.
>>>
>>> Tested-by: Jason Gunthorpe <jgg@nvidia.com>
>>>
>>> Solves my splat too
>>>
>>> Let's send this to -rc please
>>
>> Thank you for the testing. I will queue it for Joerg this week.
> 
> I found a couple of kernel warnings switching from v6.3-rc4
> to v6.3-rc5. Git-bisect is pointing at this commit.
> 
> My test environment is MKT enabling irq_remap:
> https://github.com/Mellanox/mkt
> CONFIG_IOMMUFD=m
> CONFIG_IOMMUFD_VFIO_CONTAINER=y
> CONFIG_IOMMUFD_TEST=y
> CONFIG_IRQ_REMAP=y
> 
> Any idea?
> 
> Thanks
> Nicolin
> 
> Attaching WARNINGs:
> [   19.680725] ------------[ cut here ]------------
> [   19.681083] WARNING: CPU: 0 PID: 561 at include/linux/mmap_lock.h:161 track_pfn_remap+0xf5/0x100
> [   19.681356] Modules linked in: vfio_pci vfio_pci_core irqbypass vfio iommufd mlx5_ib ib_uverbs ib_core mlx5_core
> [   19.681654] CPU: 0 PID: 561 Comm: python3 Not tainted 6.3.0-rc6+ #1080
> [   19.681808] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> [   19.682108] RIP: 0010:track_pfn_remap+0xf5/0x100
> [   19.682270] Code: 5e 5d c3 48 89 f2 31 c9 4c 89 c6 4c 89 e7 e8 42 fc ff ff e9 54 ff ff ff 48 8d b8 88 01 00 00 31 f6 e8 5f 97 76 00 85 c0 75 be <0f> 0b eb ba 0f 1f 80 00 00 00 00 80 3d 71 72 ef 00 00 74 01 c3 55
> [   19.682678] RSP: 0000:ffffc900014b7ce8 EFLAGS: 00010246
> [   19.682805] RAX: 0000000000000000 RBX: 0000000002000000 RCX: 0000000000000000
> [   19.682984] RDX: 0000000000000000 RSI: ffff888104a709c8 RDI: ffff888108756150
> [   19.683214] RBP: ffffc900014b7d08 R08: 0000000000000001 R09: 0000000000000003
> [   19.683464] R10: 000000000e6a4a47 R11: 00000000620892b1 R12: 00000000fc000000
> [   19.683663] R13: ffffc900014b7d20 R14: ffff888106a35100 R15: 0000000002000000
> [   19.683843] FS:  00007fa8aa4edb80(0000) GS:ffff8881ba400000(0000) knlGS:0000000000000000
> [   19.684054] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   19.684223] CR2: 00007fa8a748f004 CR3: 000000010673b005 CR4: 00000000003706b0
> [   19.684414] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [   19.684632] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [   19.684817] Call Trace:
> [   19.684893]  <TASK>
> [   19.684967]  remap_pfn_range+0x3e/0xa0
> [   19.685084]  vfio_pci_mmap_fault+0x8a/0x160 [vfio_pci_core]
> [   19.685253]  __do_fault+0x30/0xa0
> [   19.685368]  __handle_mm_fault+0xe08/0x1ff0
> [   19.685520]  ? find_held_lock+0x31/0x80
> [   19.685655]  ? mt_find+0x15d/0x400
> [   19.685759]  ? lock_release+0xbc/0x240
> [   19.685862]  handle_mm_fault+0xa8/0x170
> [   19.685963]  ? find_vma+0x3c/0x70
> [   19.686066]  exc_page_fault+0x1e6/0x7b0
> [   19.686167]  asm_exc_page_fault+0x27/0x30
> [   19.686271] RIP: 0033:0x7fa8a986bfd5
> [   19.686373] Code: ef 41 89 c4 e8 dc 70 fc ff 45 85 e4 0f 85 a0 0e 00 00 48 89 df e8 3b ce ff ff 48 85 c0 0f 84 eb 0d 00 00 48 8b ab e8 02 00 00 <8b> 45 04 0f c8 c1 e8 10 83 f8 05 0f 85 b0 0b 00 00 8b 45 14 ba 01
> [   19.686773] RSP: 002b:00007ffc2392ab30 EFLAGS: 00010206
> [   19.686931] RAX: 0000559646201590 RBX: 0000559646471a50 RCX: 0000559646471d00
> [   19.687132] RDX: 0000559646471d00 RSI: 0000000000003b71 RDI: 0000000000000003
> [   19.687377] RBP: 00007fa8a748f000 R08: 00000000fedfffff R09: 0000000000000000
> [   19.687563] R10: 0000000000000022 R11: 0000000000000246 R12: 0000559646470904
> [   19.687744] R13: 00007ffc2392ab70 R14: 0000000000000002 R15: 0000559646470934
> [   19.687943]  </TASK>
> [   19.688016] irq event stamp: 655597
> [   19.688114] hardirqs last  enabled at (655605): [<ffffffff810c9683>] __up_console_sem+0x53/0x60
> [   19.688340] hardirqs last disabled at (655612): [<ffffffff810c9668>] __up_console_sem+0x38/0x60
> [   19.688554] softirqs last  enabled at (655148): [<ffffffff81064e79>] irq_exit_rcu+0x69/0x90
> [   19.688733] softirqs last disabled at (655143): [<ffffffff81064e79>] irq_exit_rcu+0x69/0x90
> [   19.688915] ---[ end trace 0000000000000000 ]---
> [   19.689076] ------------[ cut here ]------------
> [   19.689197] WARNING: CPU: 0 PID: 561 at include/linux/mmap_lock.h:161 remap_pfn_range_notrack+0x40f/0x4f0
> [   19.689440] Modules linked in: vfio_pci vfio_pci_core irqbypass vfio iommufd mlx5_ib ib_uverbs ib_core mlx5_core
> [   19.689691] CPU: 0 PID: 561 Comm: python3 Tainted: G        W          6.3.0-rc6+ #1080
> [   19.689867] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> [   19.690109] RIP: 0010:remap_pfn_range_notrack+0x40f/0x4f0
> [   19.690234] Code: 39 eb 0f 85 d9 fc ff ff 31 c0 eb 5e 48 8b 45 a8 31 f6 4c 89 45 d0 48 8d b8 88 01 00 00 e8 89 bc 5a 00 4c 8b 45 d0 85 c0 75 02 <0f> 0b 48 8b 43 20 e9 5e fc ff ff 48 8b 7d a8 48 89 c6 48 89 c3 e8
> [   19.690628] RSP: 0000:ffffc900014b7c68 EFLAGS: 00010246
> [   19.690750] RAX: 0000000000000000 RBX: ffff888106a35100 RCX: 0000000000000000
> [   19.690914] RDX: 0000000000000000 RSI: ffff888104a709c8 RDI: ffff888108756150
> [   19.691074] RBP: ffffc900014b7d08 R08: 00007fa8a948f000 R09: 0000000000000003
> [   19.691274] R10: 000000000e6a4a47 R11: 00000000620892b1 R12: 00000000000fc000
> [   19.691469] R13: 00007fa8a748f000 R14: 00007fa8a748f000 R15: 8000000000000037
> [   19.691647] FS:  00007fa8aa4edb80(0000) GS:ffff8881ba400000(0000) knlGS:0000000000000000
> [   19.691830] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   19.691980] CR2: 00007fa8a748f004 CR3: 000000010673b005 CR4: 00000000003706b0
> [   19.692159] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [   19.692336] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [   19.692538] Call Trace:
> [   19.692592]  <TASK>
> [   19.692647]  ? track_pfn_remap+0xf7/0x100
> [   19.692745]  remap_pfn_range+0x57/0xa0
> [   19.692845]  vfio_pci_mmap_fault+0x8a/0x160 [vfio_pci_core]
> [   19.692991]  __do_fault+0x30/0xa0
> [   19.693089]  __handle_mm_fault+0xe08/0x1ff0
> [   19.693186]  ? find_held_lock+0x31/0x80
> [   19.693291]  ? mt_find+0x15d/0x400
> [   19.693391]  ? lock_release+0xbc/0x240
> [   19.693491]  handle_mm_fault+0xa8/0x170
> [   19.693587]  ? find_vma+0x3c/0x70
> [   19.693685]  exc_page_fault+0x1e6/0x7b0
> [   19.693782]  asm_exc_page_fault+0x27/0x30
> [   19.693880] RIP: 0033:0x7fa8a986bfd5
> [   19.693961] Code: ef 41 89 c4 e8 dc 70 fc ff 45 85 e4 0f 85 a0 0e 00 00 48 89 df e8 3b ce ff ff 48 85 c0 0f 84 eb 0d 00 00 48 8b ab e8 02 00 00 <8b> 45 04 0f c8 c1 e8 10 83 f8 05 0f 85 b0 0b 00 00 8b 45 14 ba 01
> [   19.694342] RSP: 002b:00007ffc2392ab30 EFLAGS: 00010206
> [   19.694466] RAX: 0000559646201590 RBX: 0000559646471a50 RCX: 0000559646471d00
> [   19.694619] RDX: 0000559646471d00 RSI: 0000000000003b71 RDI: 0000000000000003
> [   19.694778] RBP: 00007fa8a748f000 R08: 00000000fedfffff R09: 0000000000000000
> [   19.694934] R10: 0000000000000022 R11: 0000000000000246 R12: 0000559646470904
> [   19.695092] R13: 00007ffc2392ab70 R14: 0000000000000002 R15: 0000559646470934
> [   19.695302]  </TASK>
> [   19.695373] irq event stamp: 656049
> [   19.695452] hardirqs last  enabled at (656057): [<ffffffff810c9683>] __up_console_sem+0x53/0x60
> [   19.695657] hardirqs last disabled at (656064): [<ffffffff810c9668>] __up_console_sem+0x38/0x60
> [   19.695883] softirqs last  enabled at (655148): [<ffffffff81064e79>] irq_exit_rcu+0x69/0x90
> [   19.696061] softirqs last disabled at (655143): [<ffffffff81064e79>] irq_exit_rcu+0x69/0x90
> [   19.696240] ---[ end trace 0000000000000000 ]---

I took a quick look. It seems that above warnings are irrelevant to this
commit. Can you please simply revert this commit and check whether there
are any changes?

Best regards,
baolu
  
Nicolin Chen April 27, 2023, 3:37 a.m. UTC | #7
On Thu, Apr 27, 2023 at 11:20:40AM +0800, Baolu Lu wrote:

> > Attaching WARNINGs:
> > [   19.680725] ------------[ cut here ]------------
> > [   19.681083] WARNING: CPU: 0 PID: 561 at include/linux/mmap_lock.h:161 track_pfn_remap+0xf5/0x100

> > [   19.684817] Call Trace:
> > [   19.684893]  <TASK>
> > [   19.684967]  remap_pfn_range+0x3e/0xa0
> > [   19.685084]  vfio_pci_mmap_fault+0x8a/0x160 [vfio_pci_core]
 
> I took a quick look. It seems that above warnings are irrelevant to this
> commit. Can you please simply revert this commit and check whether there
> are any changes?

I tried on top of the v6.3-rc5 tag. The warnings were triggered
constantly. And reverting the commit fixes it:
nicolinc@Asurada-Nvidia:~/work/mkt/images/nicolinc/src/kernel$ git log --oneline -2
cb3dc9b2417e (HEAD -> v6.3-rc5) Revert "iommu/vt-d: Remove unnecessary locking in intel_irq_remapping_alloc()"
7e364e56293b (tag: v6.3-rc5, jgg/linus) Linux 6.3-rc5

I don't think the commit is the causation yet there seems to be
a strong correlation here...

Thanks
Nic
  
Robin Murphy April 27, 2023, 9:35 a.m. UTC | #8
On 2023-04-27 04:37, Nicolin Chen wrote:
> On Thu, Apr 27, 2023 at 11:20:40AM +0800, Baolu Lu wrote:
> 
>>> Attaching WARNINGs:
>>> [   19.680725] ------------[ cut here ]------------
>>> [   19.681083] WARNING: CPU: 0 PID: 561 at include/linux/mmap_lock.h:161 track_pfn_remap+0xf5/0x100
> 
>>> [   19.684817] Call Trace:
>>> [   19.684893]  <TASK>
>>> [   19.684967]  remap_pfn_range+0x3e/0xa0
>>> [   19.685084]  vfio_pci_mmap_fault+0x8a/0x160 [vfio_pci_core]
>   
>> I took a quick look. It seems that above warnings are irrelevant to this
>> commit. Can you please simply revert this commit and check whether there
>> are any changes?
> 
> I tried on top of the v6.3-rc5 tag. The warnings were triggered
> constantly. And reverting the commit fixes it:
> nicolinc@Asurada-Nvidia:~/work/mkt/images/nicolinc/src/kernel$ git log --oneline -2
> cb3dc9b2417e (HEAD -> v6.3-rc5) Revert "iommu/vt-d: Remove unnecessary locking in intel_irq_remapping_alloc()"
> 7e364e56293b (tag: v6.3-rc5, jgg/linus) Linux 6.3-rc5
> 
> I don't think the commit is the causation yet there seems to be
> a strong correlation here...

The correlation is probably that you're now getting to see a genuine 
warning from lockdep_assert_held(), since intel-iommu is no longer 
causing debug_locks to be turned off earlier.

Robin.
  
Nicolin Chen April 27, 2023, 5:24 p.m. UTC | #9
On Thu, Apr 27, 2023 at 10:35:11AM +0100, Robin Murphy wrote:
> On 2023-04-27 04:37, Nicolin Chen wrote:
> > On Thu, Apr 27, 2023 at 11:20:40AM +0800, Baolu Lu wrote:
> > 
> > > > Attaching WARNINGs:
> > > > [   19.680725] ------------[ cut here ]------------
> > > > [   19.681083] WARNING: CPU: 0 PID: 561 at include/linux/mmap_lock.h:161 track_pfn_remap+0xf5/0x100
> > 
> > > > [   19.684817] Call Trace:
> > > > [   19.684893]  <TASK>
> > > > [   19.684967]  remap_pfn_range+0x3e/0xa0
> > > > [   19.685084]  vfio_pci_mmap_fault+0x8a/0x160 [vfio_pci_core]
> > 
> > > I took a quick look. It seems that above warnings are irrelevant to this
> > > commit. Can you please simply revert this commit and check whether there
> > > are any changes?
> > 
> > I tried on top of the v6.3-rc5 tag. The warnings were triggered
> > constantly. And reverting the commit fixes it:
> > nicolinc@Asurada-Nvidia:~/work/mkt/images/nicolinc/src/kernel$ git log --oneline -2
> > cb3dc9b2417e (HEAD -> v6.3-rc5) Revert "iommu/vt-d: Remove unnecessary locking in intel_irq_remapping_alloc()"
> > 7e364e56293b (tag: v6.3-rc5, jgg/linus) Linux 6.3-rc5
> > 
> > I don't think the commit is the causation yet there seems to be
> > a strong correlation here...
> 
> The correlation is probably that you're now getting to see a genuine
> warning from lockdep_assert_held(), since intel-iommu is no longer
> causing debug_locks to be turned off earlier.

Hmm, looks like so. The mmap_lock is held with a read permission,
in arch/x86/mm/fault.c file, v.s. the expectation of a write one
by remap_pfn_range().

Thanks
Nicolin
  

Patch

diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index 6d01fa078c36..df9e261af0b5 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -311,14 +311,12 @@  static int set_ioapic_sid(struct irte *irte, int apic)
 	if (!irte)
 		return -1;
 
-	down_read(&dmar_global_lock);
 	for (i = 0; i < MAX_IO_APICS; i++) {
 		if (ir_ioapic[i].iommu && ir_ioapic[i].id == apic) {
 			sid = (ir_ioapic[i].bus << 8) | ir_ioapic[i].devfn;
 			break;
 		}
 	}
-	up_read(&dmar_global_lock);
 
 	if (sid == 0) {
 		pr_warn("Failed to set source-id of IOAPIC (%d)\n", apic);
@@ -338,14 +336,12 @@  static int set_hpet_sid(struct irte *irte, u8 id)
 	if (!irte)
 		return -1;
 
-	down_read(&dmar_global_lock);
 	for (i = 0; i < MAX_HPET_TBS; i++) {
 		if (ir_hpet[i].iommu && ir_hpet[i].id == id) {
 			sid = (ir_hpet[i].bus << 8) | ir_hpet[i].devfn;
 			break;
 		}
 	}
-	up_read(&dmar_global_lock);
 
 	if (sid == 0) {
 		pr_warn("Failed to set source-id of HPET block (%d)\n", id);
@@ -1339,9 +1335,7 @@  static int intel_irq_remapping_alloc(struct irq_domain *domain,
 	if (!data)
 		goto out_free_parent;
 
-	down_read(&dmar_global_lock);
 	index = alloc_irte(iommu, &data->irq_2_iommu, nr_irqs);
-	up_read(&dmar_global_lock);
 	if (index < 0) {
 		pr_warn("Failed to allocate IRTE\n");
 		kfree(data);