Message ID | 20230314051836.23817-1-baolu.lu@linux.intel.com |
---|---|
State | New |
Headers |
From: Lu Baolu <baolu.lu@linux.intel.com>
To: iommu@lists.linux.dev
Cc: Joerg Roedel <joro@8bytes.org>, Will Deacon <will@kernel.org>, Robin Murphy <robin.murphy@arm.com>, Kevin Tian <kevin.tian@intel.com>, linux-kernel@vger.kernel.org, Lu Baolu <baolu.lu@linux.intel.com>
Subject: [PATCH] iommu/vt-d: Remove unnecessary locking in intel_irq_remapping_alloc()
Date: Tue, 14 Mar 2023 13:18:36 +0800
Message-Id: <20230314051836.23817-1-baolu.lu@linux.intel.com>
Series | iommu/vt-d: Remove unnecessary locking in intel_irq_remapping_alloc() |
Commit Message
Baolu Lu
March 14, 2023, 5:18 a.m. UTC
The global rwsem dmar_global_lock was introduced by commit 3a5670e8ac932
("iommu/vt-d: Introduce a rwsem to protect global data structures"). It
is used to protect DMAR-related global data from DMAR hotplug operations.

Using dmar_global_lock in intel_irq_remapping_alloc() is unnecessary, as
the DMAR global data structures are not touched there. Remove it to avoid
the lockdep warning below.
======================================================
WARNING: possible circular locking dependency detected
6.3.0-rc2 #468 Not tainted
------------------------------------------------------
swapper/0/1 is trying to acquire lock:
ff1db4cb40178698 (&domain->mutex){+.+.}-{3:3},
at: __irq_domain_alloc_irqs+0x3b/0xa0
but task is already holding lock:
ffffffffa0c1cdf0 (dmar_global_lock){++++}-{3:3},
at: intel_iommu_init+0x58e/0x880
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (dmar_global_lock){++++}-{3:3}:
lock_acquire+0xd6/0x320
down_read+0x42/0x180
intel_irq_remapping_alloc+0xad/0x750
mp_irqdomain_alloc+0xb8/0x2b0
irq_domain_alloc_irqs_locked+0x12f/0x2d0
__irq_domain_alloc_irqs+0x56/0xa0
alloc_isa_irq_from_domain.isra.7+0xa0/0xe0
mp_map_pin_to_irq+0x1dc/0x330
setup_IO_APIC+0x128/0x210
apic_intr_mode_init+0x67/0x110
x86_late_time_init+0x24/0x40
start_kernel+0x41e/0x7e0
secondary_startup_64_no_verify+0xe0/0xeb
-> #0 (&domain->mutex){+.+.}-{3:3}:
check_prevs_add+0x160/0xef0
__lock_acquire+0x147d/0x1950
lock_acquire+0xd6/0x320
__mutex_lock+0x9c/0xfc0
__irq_domain_alloc_irqs+0x3b/0xa0
dmar_alloc_hwirq+0x9e/0x120
iommu_pmu_register+0x11d/0x200
intel_iommu_init+0x5de/0x880
pci_iommu_init+0x12/0x40
do_one_initcall+0x65/0x350
kernel_init_freeable+0x3ca/0x610
kernel_init+0x1a/0x140
ret_from_fork+0x29/0x50
other info that might help us debug this:
Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(dmar_global_lock);
                               lock(&domain->mutex);
                               lock(dmar_global_lock);
  lock(&domain->mutex);
*** DEADLOCK ***
Fixes: 9dbb8e3452ab ("irqdomain: Switch to per-domain locking")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
drivers/iommu/intel/irq_remapping.c | 6 ------
1 file changed, 6 deletions(-)
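For readers decoding the lockdep report: it is the classic AB-BA inversion. Chain #1 (boot-time IO-APIC setup) acquires the irq domain's per-domain mutex in __irq_domain_alloc_irqs() and then read-locks dmar_global_lock inside intel_irq_remapping_alloc(); chain #0 (intel_iommu_init() registering the IOMMU PMU via dmar_alloc_hwirq()) already holds dmar_global_lock when it takes the same domain mutex. The standalone sketch below is not kernel code: it uses plain pthreads and invented function names purely to illustrate the ordering pattern lockdep is complaining about.

/*
 * Standalone illustration of the AB-BA ordering reported above.
 * NOT kernel code: pthread primitives stand in for the kernel's
 * rwsem (dmar_global_lock) and mutex (irq_domain->mutex).
 */
#include <pthread.h>
#include <stdio.h>

static pthread_rwlock_t dmar_global_lock = PTHREAD_RWLOCK_INITIALIZER;
static pthread_mutex_t domain_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Mirrors chain #1: __irq_domain_alloc_irqs() -> intel_irq_remapping_alloc() */
static void *ioapic_setup_path(void *unused)
{
	(void)unused;
	pthread_mutex_lock(&domain_mutex);        /* domain->mutex first   */
	pthread_rwlock_rdlock(&dmar_global_lock); /* then dmar_global_lock */

	pthread_rwlock_unlock(&dmar_global_lock);
	pthread_mutex_unlock(&domain_mutex);
	return NULL;
}

/* Mirrors chain #0: intel_iommu_init() -> dmar_alloc_hwirq() */
static void *iommu_init_path(void *unused)
{
	(void)unused;
	pthread_rwlock_wrlock(&dmar_global_lock); /* dmar_global_lock first */
	pthread_mutex_lock(&domain_mutex);        /* then domain->mutex     */

	pthread_mutex_unlock(&domain_mutex);
	pthread_rwlock_unlock(&dmar_global_lock);
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	/* Run both paths concurrently: with unlucky timing each thread
	 * ends up holding one lock and waiting on the other forever. */
	pthread_create(&t1, NULL, ioapic_setup_path, NULL);
	pthread_create(&t2, NULL, iommu_init_path, NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	puts("finished (no deadlock this run)");
	return 0;
}

The patch breaks the cycle on the first path, which is safe because intel_irq_remapping_alloc() does not touch the global data that dmar_global_lock protects.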
Comments
Hi BaoLu,

On Tue, 14 Mar 2023 13:18:36 +0800, Lu Baolu <baolu.lu@linux.intel.com> wrote:

> Using dmar_global_lock in intel_irq_remapping_alloc() is unnecessary, as
> the DMAR global data structures are not touched there. Remove it to avoid
> the lockdep warning below.

[...]

> @@ -1339,9 +1335,7 @@ static int intel_irq_remapping_alloc(struct irq_domain *domain,
>  	if (!data)
>  		goto out_free_parent;
>  
> -	down_read(&dmar_global_lock);
>  	index = alloc_irte(iommu, &data->irq_2_iommu, nr_irqs);
> -	up_read(&dmar_global_lock);
>  	if (index < 0) {
>  		pr_warn("Failed to allocate IRTE\n");
>  		kfree(data);

Reviewed-by: Jacob Pan <jacob.jun.pan@linux.intel.com>

slightly beyond the scope of this, do we need to take dmar_global_lock
below? shouldn't it be in single threaded context?

	down_write(&dmar_global_lock);
	ret = dmar_dev_scope_init();
	up_write(&dmar_global_lock);

	return ret;
}
rootfs_initcall(ir_dev_scope_init);

Thanks,

Jacob
On 3/14/23 11:54 PM, Jacob Pan wrote:
> Reviewed-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
>
> slightly beyond the scope of this, do we need to take dmar_global_lock
> below? shouldn't it be in single threaded context?
>
> 	down_write(&dmar_global_lock);
> 	ret = dmar_dev_scope_init();
> 	up_write(&dmar_global_lock);
>
> 	return ret;
> }
> rootfs_initcall(ir_dev_scope_init);

Yes, agreed. This runs in a single threaded context. I will remove the
locking in a cleanup patch.

Best regards,
baolu
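For reference, the follow-up cleanup discussed in this exchange would amount to dropping the write-side locking around dmar_dev_scope_init(), since rootfs_initcall()s run sequentially from the init task before other users of the DMAR data can exist. A rough sketch against the code quoted above (hypothetical; not the actual cleanup patch that was posted later):

--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ ... @@ ir_dev_scope_init()
-	down_write(&dmar_global_lock);
 	ret = dmar_dev_scope_init();
-	up_write(&dmar_global_lock);
 
 	return ret;
 }
 rootfs_initcall(ir_dev_scope_init);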
On Tue, Mar 14, 2023 at 01:18:36PM +0800, Lu Baolu wrote:
> Using dmar_global_lock in intel_irq_remapping_alloc() is unnecessary, as
> the DMAR global data structures are not touched there. Remove it to avoid
> the lockdep warning below.

Tested-by: Jason Gunthorpe <jgg@nvidia.com>

Solves my splat too

Let's send this to -rc please

Jason
On 2023/3/27 20:18, Jason Gunthorpe wrote:
> Tested-by: Jason Gunthorpe <jgg@nvidia.com>
>
> Solves my splat too
>
> Let's send this to -rc please

Thank you for the testing. I will queue it for Joerg this week.

Best regards,
baolu
Hi Baolu/Jason,

On Mon, Mar 27, 2023 at 06:13:10AM -0700, Baolu Lu wrote:
> On 2023/3/27 20:18, Jason Gunthorpe wrote:
> > Tested-by: Jason Gunthorpe <jgg@nvidia.com>
> >
> > Solves my splat too
> >
> > Let's send this to -rc please
>
> Thank you for the testing. I will queue it for Joerg this week.

I found a couple of kernel warnings switching from v6.3-rc4 to v6.3-rc5.
Git-bisect is pointing at this commit.

My test environment is MKT enabling irq_remap:
https://github.com/Mellanox/mkt
CONFIG_IOMMUFD=m
CONFIG_IOMMUFD_VFIO_CONTAINER=y
CONFIG_IOMMUFD_TEST=y
CONFIG_IRQ_REMAP=y

Any idea?

Thanks
Nicolin

Attaching WARNINGs:
[ 19.680725] ------------[ cut here ]------------
[ 19.681083] WARNING: CPU: 0 PID: 561 at include/linux/mmap_lock.h:161 track_pfn_remap+0xf5/0x100
[ 19.681356] Modules linked in: vfio_pci vfio_pci_core irqbypass vfio iommufd mlx5_ib ib_uverbs ib_core mlx5_core
[ 19.681654] CPU: 0 PID: 561 Comm: python3 Not tainted 6.3.0-rc6+ #1080
[ 19.681808] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 19.682108] RIP: 0010:track_pfn_remap+0xf5/0x100
[ 19.682270] Code: 5e 5d c3 48 89 f2 31 c9 4c 89 c6 4c 89 e7 e8 42 fc ff ff e9 54 ff ff ff 48 8d b8 88 01 00 00 31 f6 e8 5f 97 76 00 85 c0 75 be <0f> 0b eb ba 0f 1f 80 00 00 00 00 80 3d 71 72 ef 00 00 74 01 c3 55
[ 19.682678] RSP: 0000:ffffc900014b7ce8 EFLAGS: 00010246
[ 19.682805] RAX: 0000000000000000 RBX: 0000000002000000 RCX: 0000000000000000
[ 19.682984] RDX: 0000000000000000 RSI: ffff888104a709c8 RDI: ffff888108756150
[ 19.683214] RBP: ffffc900014b7d08 R08: 0000000000000001 R09: 0000000000000003
[ 19.683464] R10: 000000000e6a4a47 R11: 00000000620892b1 R12: 00000000fc000000
[ 19.683663] R13: ffffc900014b7d20 R14: ffff888106a35100 R15: 0000000002000000
[ 19.683843] FS: 00007fa8aa4edb80(0000) GS:ffff8881ba400000(0000) knlGS:0000000000000000
[ 19.684054] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 19.684223] CR2: 00007fa8a748f004 CR3: 000000010673b005 CR4: 00000000003706b0
[ 19.684414] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 19.684632] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 19.684817] Call Trace:
[ 19.684893]  <TASK>
[ 19.684967]  remap_pfn_range+0x3e/0xa0
[ 19.685084]  vfio_pci_mmap_fault+0x8a/0x160 [vfio_pci_core]
[ 19.685253]  __do_fault+0x30/0xa0
[ 19.685368]  __handle_mm_fault+0xe08/0x1ff0
[ 19.685520]  ? find_held_lock+0x31/0x80
[ 19.685655]  ? mt_find+0x15d/0x400
[ 19.685759]  ? lock_release+0xbc/0x240
[ 19.685862]  handle_mm_fault+0xa8/0x170
[ 19.685963]  ? find_vma+0x3c/0x70
[ 19.686066]  exc_page_fault+0x1e6/0x7b0
[ 19.686167]  asm_exc_page_fault+0x27/0x30
[ 19.686271] RIP: 0033:0x7fa8a986bfd5
[ 19.686373] Code: ef 41 89 c4 e8 dc 70 fc ff 45 85 e4 0f 85 a0 0e 00 00 48 89 df e8 3b ce ff ff 48 85 c0 0f 84 eb 0d 00 00 48 8b ab e8 02 00 00 <8b> 45 04 0f c8 c1 e8 10 83 f8 05 0f 85 b0 0b 00 00 8b 45 14 ba 01
[ 19.686773] RSP: 002b:00007ffc2392ab30 EFLAGS: 00010206
[ 19.686931] RAX: 0000559646201590 RBX: 0000559646471a50 RCX: 0000559646471d00
[ 19.687132] RDX: 0000559646471d00 RSI: 0000000000003b71 RDI: 0000000000000003
[ 19.687377] RBP: 00007fa8a748f000 R08: 00000000fedfffff R09: 0000000000000000
[ 19.687563] R10: 0000000000000022 R11: 0000000000000246 R12: 0000559646470904
[ 19.687744] R13: 00007ffc2392ab70 R14: 0000000000000002 R15: 0000559646470934
[ 19.687943]  </TASK>
[ 19.688016] irq event stamp: 655597
[ 19.688114] hardirqs last enabled at (655605): [<ffffffff810c9683>] __up_console_sem+0x53/0x60
[ 19.688340] hardirqs last disabled at (655612): [<ffffffff810c9668>] __up_console_sem+0x38/0x60
[ 19.688554] softirqs last enabled at (655148): [<ffffffff81064e79>] irq_exit_rcu+0x69/0x90
[ 19.688733] softirqs last disabled at (655143): [<ffffffff81064e79>] irq_exit_rcu+0x69/0x90
[ 19.688915] ---[ end trace 0000000000000000 ]---
[ 19.689076] ------------[ cut here ]------------
[ 19.689197] WARNING: CPU: 0 PID: 561 at include/linux/mmap_lock.h:161 remap_pfn_range_notrack+0x40f/0x4f0
[ 19.689440] Modules linked in: vfio_pci vfio_pci_core irqbypass vfio iommufd mlx5_ib ib_uverbs ib_core mlx5_core
[ 19.689691] CPU: 0 PID: 561 Comm: python3 Tainted: G        W          6.3.0-rc6+ #1080
[ 19.689867] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 19.690109] RIP: 0010:remap_pfn_range_notrack+0x40f/0x4f0
[ 19.690234] Code: 39 eb 0f 85 d9 fc ff ff 31 c0 eb 5e 48 8b 45 a8 31 f6 4c 89 45 d0 48 8d b8 88 01 00 00 e8 89 bc 5a 00 4c 8b 45 d0 85 c0 75 02 <0f> 0b 48 8b 43 20 e9 5e fc ff ff 48 8b 7d a8 48 89 c6 48 89 c3 e8
[ 19.690628] RSP: 0000:ffffc900014b7c68 EFLAGS: 00010246
[ 19.690750] RAX: 0000000000000000 RBX: ffff888106a35100 RCX: 0000000000000000
[ 19.690914] RDX: 0000000000000000 RSI: ffff888104a709c8 RDI: ffff888108756150
[ 19.691074] RBP: ffffc900014b7d08 R08: 00007fa8a948f000 R09: 0000000000000003
[ 19.691274] R10: 000000000e6a4a47 R11: 00000000620892b1 R12: 00000000000fc000
[ 19.691469] R13: 00007fa8a748f000 R14: 00007fa8a748f000 R15: 8000000000000037
[ 19.691647] FS: 00007fa8aa4edb80(0000) GS:ffff8881ba400000(0000) knlGS:0000000000000000
[ 19.691830] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 19.691980] CR2: 00007fa8a748f004 CR3: 000000010673b005 CR4: 00000000003706b0
[ 19.692159] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 19.692336] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 19.692538] Call Trace:
[ 19.692592]  <TASK>
[ 19.692647]  ? track_pfn_remap+0xf7/0x100
[ 19.692745]  remap_pfn_range+0x57/0xa0
[ 19.692845]  vfio_pci_mmap_fault+0x8a/0x160 [vfio_pci_core]
[ 19.692991]  __do_fault+0x30/0xa0
[ 19.693089]  __handle_mm_fault+0xe08/0x1ff0
[ 19.693186]  ? find_held_lock+0x31/0x80
[ 19.693291]  ? mt_find+0x15d/0x400
[ 19.693391]  ? lock_release+0xbc/0x240
[ 19.693491]  handle_mm_fault+0xa8/0x170
[ 19.693587]  ? find_vma+0x3c/0x70
[ 19.693685]  exc_page_fault+0x1e6/0x7b0
[ 19.693782]  asm_exc_page_fault+0x27/0x30
[ 19.693880] RIP: 0033:0x7fa8a986bfd5
[ 19.693961] Code: ef 41 89 c4 e8 dc 70 fc ff 45 85 e4 0f 85 a0 0e 00 00 48 89 df e8 3b ce ff ff 48 85 c0 0f 84 eb 0d 00 00 48 8b ab e8 02 00 00 <8b> 45 04 0f c8 c1 e8 10 83 f8 05 0f 85 b0 0b 00 00 8b 45 14 ba 01
[ 19.694342] RSP: 002b:00007ffc2392ab30 EFLAGS: 00010206
[ 19.694466] RAX: 0000559646201590 RBX: 0000559646471a50 RCX: 0000559646471d00
[ 19.694619] RDX: 0000559646471d00 RSI: 0000000000003b71 RDI: 0000000000000003
[ 19.694778] RBP: 00007fa8a748f000 R08: 00000000fedfffff R09: 0000000000000000
[ 19.694934] R10: 0000000000000022 R11: 0000000000000246 R12: 0000559646470904
[ 19.695092] R13: 00007ffc2392ab70 R14: 0000000000000002 R15: 0000559646470934
[ 19.695302]  </TASK>
[ 19.695373] irq event stamp: 656049
[ 19.695452] hardirqs last enabled at (656057): [<ffffffff810c9683>] __up_console_sem+0x53/0x60
[ 19.695657] hardirqs last disabled at (656064): [<ffffffff810c9668>] __up_console_sem+0x38/0x60
[ 19.695883] softirqs last enabled at (655148): [<ffffffff81064e79>] irq_exit_rcu+0x69/0x90
[ 19.696061] softirqs last disabled at (655143): [<ffffffff81064e79>] irq_exit_rcu+0x69/0x90
[ 19.696240] ---[ end trace 0000000000000000 ]---
On 4/27/23 8:43 AM, Nicolin Chen wrote:
> Hi Baolu/Jason,

Hi Nicolin,

> I found a couple of kernel warnings switching from v6.3-rc4 to v6.3-rc5.
> Git-bisect is pointing at this commit.
>
> My test environment is MKT enabling irq_remap:
> https://github.com/Mellanox/mkt
> CONFIG_IOMMUFD=m
> CONFIG_IOMMUFD_VFIO_CONTAINER=y
> CONFIG_IOMMUFD_TEST=y
> CONFIG_IRQ_REMAP=y
>
> Any idea?
>
> Attaching WARNINGs:
> [ 19.680725] ------------[ cut here ]------------
> [ 19.681083] WARNING: CPU: 0 PID: 561 at include/linux/mmap_lock.h:161 track_pfn_remap+0xf5/0x100
[...]
> [ 19.684817] Call Trace:
> [ 19.684893]  <TASK>
> [ 19.684967]  remap_pfn_range+0x3e/0xa0
> [ 19.685084]  vfio_pci_mmap_fault+0x8a/0x160 [vfio_pci_core]
[...]

I took a quick look. It seems that above warnings are irrelevant to this
commit. Can you please simply revert this commit and check whether there
are any changes?

Best regards,
baolu
On Thu, Apr 27, 2023 at 11:20:40AM +0800, Baolu Lu wrote:

> > Attaching WARNINGs:
> > [ 19.680725] ------------[ cut here ]------------
> > [ 19.681083] WARNING: CPU: 0 PID: 561 at include/linux/mmap_lock.h:161 track_pfn_remap+0xf5/0x100

> > [ 19.684817] Call Trace:
> > [ 19.684893]  <TASK>
> > [ 19.684967]  remap_pfn_range+0x3e/0xa0
> > [ 19.685084]  vfio_pci_mmap_fault+0x8a/0x160 [vfio_pci_core]

> I took a quick look. It seems that above warnings are irrelevant to this
> commit. Can you please simply revert this commit and check whether there
> are any changes?

I tried on top of the v6.3-rc5 tag. The warnings were triggered
constantly. And reverting the commit fixes it:

nicolinc@Asurada-Nvidia:~/work/mkt/images/nicolinc/src/kernel$ git log --oneline -2
cb3dc9b2417e (HEAD -> v6.3-rc5) Revert "iommu/vt-d: Remove unnecessary locking in intel_irq_remapping_alloc()"
7e364e56293b (tag: v6.3-rc5, jgg/linus) Linux 6.3-rc5

I don't think the commit is the causation yet there seems to be
a strong correlation here...

Thanks
Nic
On 2023-04-27 04:37, Nicolin Chen wrote:
> On Thu, Apr 27, 2023 at 11:20:40AM +0800, Baolu Lu wrote:
>
>> I took a quick look. It seems that above warnings are irrelevant to this
>> commit. Can you please simply revert this commit and check whether there
>> are any changes?
>
> I tried on top of the v6.3-rc5 tag. The warnings were triggered
> constantly. And reverting the commit fixes it:
>
> nicolinc@Asurada-Nvidia:~/work/mkt/images/nicolinc/src/kernel$ git log --oneline -2
> cb3dc9b2417e (HEAD -> v6.3-rc5) Revert "iommu/vt-d: Remove unnecessary locking in intel_irq_remapping_alloc()"
> 7e364e56293b (tag: v6.3-rc5, jgg/linus) Linux 6.3-rc5
>
> I don't think the commit is the causation yet there seems to be
> a strong correlation here...

The correlation is probably that you're now getting to see a genuine
warning from lockdep_assert_held(), since intel-iommu is no longer
causing debug_locks to be turned off earlier.

Robin.
On Thu, Apr 27, 2023 at 10:35:11AM +0100, Robin Murphy wrote:
> The correlation is probably that you're now getting to see a genuine
> warning from lockdep_assert_held(), since intel-iommu is no longer
> causing debug_locks to be turned off earlier.

Hmm, looks like so. The mmap_lock is held with a read permission, in
arch/x86/mm/fault.c file, v.s. the expectation of a write one by
remap_pfn_range().

Thanks
Nicolin
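Robin's debug_locks point is why the vfio warning only surfaced after this fix: lockdep reports one problem per boot and then turns itself off, so every later lock assertion is silently skipped until the first offender is removed. The standalone sketch below is not kernel code; it only illustrates that report-once-then-disable behaviour with a stand-in flag and invented message strings.

/*
 * Standalone illustration (NOT kernel code) of why fixing the first
 * lockdep splat can "uncover" later warnings: once a violation is
 * reported, the checker flips a global flag and every subsequent
 * assertion becomes a no-op.
 */
#include <stdbool.h>
#include <stdio.h>

static bool debug_locks = true;   /* stands in for the kernel's debug_locks */

static void report_violation(const char *what)
{
	if (!debug_locks)
		return;           /* checking already disabled: stay silent */
	debug_locks = false;      /* report once, then turn checks off */
	printf("WARNING: %s\n", what);
}

int main(void)
{
	/* With the old splat present, it fires first and masks the rest. */
	report_violation("circular locking dependency (old splat)");
	report_violation("mmap_lock not held for write (vfio fault path)");

	/* With the first problem gone, the second report now appears. */
	debug_locks = true;
	report_violation("mmap_lock not held for write (vfio fault path)");
	return 0;
}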
diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index 6d01fa078c36..df9e261af0b5 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -311,14 +311,12 @@ static int set_ioapic_sid(struct irte *irte, int apic)
 	if (!irte)
 		return -1;
 
-	down_read(&dmar_global_lock);
 	for (i = 0; i < MAX_IO_APICS; i++) {
 		if (ir_ioapic[i].iommu && ir_ioapic[i].id == apic) {
 			sid = (ir_ioapic[i].bus << 8) | ir_ioapic[i].devfn;
 			break;
 		}
 	}
-	up_read(&dmar_global_lock);
 
 	if (sid == 0) {
 		pr_warn("Failed to set source-id of IOAPIC (%d)\n", apic);
@@ -338,14 +336,12 @@ static int set_hpet_sid(struct irte *irte, u8 id)
 	if (!irte)
 		return -1;
 
-	down_read(&dmar_global_lock);
 	for (i = 0; i < MAX_HPET_TBS; i++) {
 		if (ir_hpet[i].iommu && ir_hpet[i].id == id) {
 			sid = (ir_hpet[i].bus << 8) | ir_hpet[i].devfn;
 			break;
 		}
 	}
-	up_read(&dmar_global_lock);
 
 	if (sid == 0) {
 		pr_warn("Failed to set source-id of HPET block (%d)\n", id);
@@ -1339,9 +1335,7 @@ static int intel_irq_remapping_alloc(struct irq_domain *domain,
 	if (!data)
 		goto out_free_parent;
 
-	down_read(&dmar_global_lock);
 	index = alloc_irte(iommu, &data->irq_2_iommu, nr_irqs);
-	up_read(&dmar_global_lock);
 	if (index < 0) {
 		pr_warn("Failed to allocate IRTE\n");
 		kfree(data);