Message ID: <ZeDj/LK56borSxO4@hpe.com>
State: New
Headers (transport/authentication chain omitted):
Date: Thu, 29 Feb 2024 14:07:24 -0600
From: Dimitri Sivanich <sivanich@hpe.com>
To: Dimitri Sivanich <sivanich@hpe.com>, David Woodhouse <dwmw2@infradead.org>, Lu Baolu <baolu.lu@linux.intel.com>, Joerg Roedel <joro@8bytes.org>, Will Deacon <will@kernel.org>, Robin Murphy <robin.murphy@arm.com>, Thomas Gleixner <tglx@linutronix.de>, iommu@lists.linux.dev
Cc: linux-kernel@vger.kernel.org, Steve Wahl <steve.wahl@hpe.com>, Russ Anderson <russ.anderson@hpe.com>
Subject: [PATCH] Allocate DMAR fault interrupts locally
Message-ID: <ZeDj/LK56borSxO4@hpe.com>
Series: Allocate DMAR fault interrupts locally
Commit Message
Dimitri Sivanich
Feb. 29, 2024, 8:07 p.m. UTC
The Intel IOMMU code currently tries to allocate all DMAR fault interrupt
vectors on the boot cpu. On large systems with high DMAR counts this
results in vector exhaustion, and most of the vectors are not initially
allocated socket-local.
Instead, have a cpu on each node do the vector allocation for the DMARs on
that node. The boot cpu still does the allocation for its node during its
boot sequence.
Signed-off-by: Dimitri Sivanich <sivanich@hpe.com>
---
drivers/iommu/intel/dmar.c | 34 +++++++++++++++++++++++++++++++++-
1 file changed, 33 insertions(+), 1 deletion(-)
Comments
Dimitri!

On Thu, Feb 29 2024 at 14:07, Dimitri Sivanich wrote:

The subject lacks a subsystem prefix. You're doing this for how many
decades now?

> The Intel IOMMU code currently tries to allocate all DMAR fault interrupt

> +#ifdef CONFIG_X86_LOCAL_APIC

I seriously doubt that this code can ever be compiled w/o X86_LOCAL_APIC:

  obj-$(CONFIG_DMAR_TABLE) += dmar.o

  config DMAR_TABLE
	bool

  config INTEL_IOMMU
	depends on PCI_MSI && ACPI && X86
	select DMAR_TABLE

  config IRQ_REMAP
	depends on X86_64 && X86_IO_APIC && PCI_MSI && ACPI
	select DMAR_TABLE

  config X86_LOCAL_APIC
	def_bool y
	depends on X86_64 || SMP || X86_32_NON_STANDARD || X86_UP_APIC || PCI_MSI

What are you trying to achieve here other than #ifdef voodoo?

> +static void __init irq_remap_enable_fault_handling_thr(struct work_struct *work)
> +{
> +	irq_remap_enable_fault_handling();

because if INTEL_IOMMU=y and IRQ_REMAP=n then X86_LOCAL_APIC=y and this
muck gets invoked for nothing. 'git grep irq_remap_enable_fault_handling
include/' might give you a hint.

> +}
> +
> +static int __init assign_dmar_vectors(void)
> +{
> +	struct work_struct irq_remap_work;
> +	int nid;
> +
> +	INIT_WORK(&irq_remap_work, irq_remap_enable_fault_handling_thr);
> +	cpus_read_lock();
> +	for_each_online_node(nid) {
> +		/* Boot cpu dmar vectors are assigned before the rest */
> +		if (nid == cpu_to_node(get_boot_cpu_id()))
> +			continue;
> +		schedule_work_on(cpumask_first(cpumask_of_node(nid)),
> +				 &irq_remap_work);
> +		flush_work(&irq_remap_work);
> +	}
> +	cpus_read_unlock();
> +	return 0;
> +}
> +
> +arch_initcall(assign_dmar_vectors);

Stray newline before arch_initcall(), but that's not the problem.

The real problems are:

 1) This approach only works when _ALL_ APs have been brought up during
    boot. With 'maxcpus=N' on the command line this will fail to enable
    fault handling when the APs which have not been brought up initially
    are onlined later on.

    This might be working in practice because intel_iommu_init() will
    enable the interrupts later on via init_dmars() unconditionally, but
    that's far from correct because IRQ_REMAP does not depend on
    INTEL_IOMMU.

 2) It leaves a gap where the reporting is not working between bringing
    up the APs during boot and this initcall. Mostly theoretical, but
    that does not make it more correct either.

What you really want is a cpu hotplug state in the CPUHP_BP_PREPARE_DYN
space which enables the interrupt for the node _before_ the first AP of
the node is brought up. That will solve the problem nicely w/o any of
the above issues.

Thanks,

        tglx
Hi Thomas,

On Thu, 29 Feb 2024 23:18:37 +0100, Thomas Gleixner <tglx@linutronix.de>
wrote:

> Dimitri!
>
> On Thu, Feb 29 2024 at 14:07, Dimitri Sivanich wrote:
>
> The subject lacks a subsystem prefix. You're doing this for how many
> decades now?
>
> > The Intel IOMMU code currently tries to allocate all DMAR fault
> > interrupt
>
> > +#ifdef CONFIG_X86_LOCAL_APIC
>
> I seriously doubt that this code can ever be compiled w/o X86_LOCAL_APIC:
>
>   obj-$(CONFIG_DMAR_TABLE) += dmar.o
>
>   config DMAR_TABLE
> 	bool
>
>   config INTEL_IOMMU
> 	depends on PCI_MSI && ACPI && X86
> 	select DMAR_TABLE
>
>   config IRQ_REMAP
> 	depends on X86_64 && X86_IO_APIC && PCI_MSI && ACPI
> 	select DMAR_TABLE
>
>   config X86_LOCAL_APIC
> 	def_bool y
> 	depends on X86_64 || SMP || X86_32_NON_STANDARD || X86_UP_APIC ||
> 	PCI_MSI
>
> What are you trying to achieve here other than #ifdef voodoo?
>
> > +static void __init irq_remap_enable_fault_handling_thr(struct
> > work_struct *work)
> > +{
> > +	irq_remap_enable_fault_handling();
>
> because if INTEL_IOMMU=y and IRQ_REMAP=n then X86_LOCAL_APIC=y and this
> muck gets invoked for nothing. 'git grep irq_remap_enable_fault_handling
> include/' might give you a hint.
>
> > +}
> > +
> > +static int __init assign_dmar_vectors(void)
> > +{
> > +	struct work_struct irq_remap_work;
> > +	int nid;
> > +
> > +	INIT_WORK(&irq_remap_work, irq_remap_enable_fault_handling_thr);
> > +	cpus_read_lock();
> > +	for_each_online_node(nid) {
> > +		/* Boot cpu dmar vectors are assigned before the rest */
> > +		if (nid == cpu_to_node(get_boot_cpu_id()))
> > +			continue;
> > +		schedule_work_on(cpumask_first(cpumask_of_node(nid)),
> > +				 &irq_remap_work);
> > +		flush_work(&irq_remap_work);
> > +	}
> > +	cpus_read_unlock();
> > +	return 0;
> > +}
> > +
> > +arch_initcall(assign_dmar_vectors);
>
> Stray newline before arch_initcall(), but that's not the problem.
>
> The real problems are:
>
>  1) This approach only works when _ALL_ APs have been brought up during
>     boot. With 'maxcpus=N' on the command line this will fail to enable
>     fault handling when the APs which have not been brought up initially
>     are onlined later on.
>
>     This might be working in practice because intel_iommu_init() will
>     enable the interrupts later on via init_dmars() unconditionally, but
>     that's far from correct because IRQ_REMAP does not depend on
>     INTEL_IOMMU.

The dmar fault interrupt is VT-d's own interrupt, not subject to IRQ_REMAP.
So this set up has nothing to do with IR, right? Maybe we should not call
it irq_remap_work, call it dmar_fault_irq_work instead?

>  2) It leaves a gap where the reporting is not working between bringing
>     up the APs during boot and this initcall. Mostly theoretical, but
>     that does not make it more correct either.
>
> What you really want is a cpu hotplug state in the CPUHP_BP_PREPARE_DYN
> space which enables the interrupt for the node _before_ the first AP of
> the node is brought up. That will solve the problem nicely w/o any of
> the above issues.
>
> Thanks,
>
>         tglx

Thanks,

Jacob
Jacob!

On Fri, Mar 01 2024 at 11:50, Jacob Pan wrote:
> On Thu, 29 Feb 2024 23:18:37 +0100, Thomas Gleixner <tglx@linutronix.de>
> wrote:
>> On Thu, Feb 29 2024 at 14:07, Dimitri Sivanich wrote:
>> The real problems are:
>>
>>  1) This approach only works when _ALL_ APs have been brought up during
>>     boot. With 'maxcpus=N' on the command line this will fail to enable
>>     fault handling when the APs which have not been brought up initially
>>     are onlined later on.
>>
>>     This might be working in practice because intel_iommu_init() will
>>     enable the interrupts later on via init_dmars() unconditionally, but
>>     that's far from correct because IRQ_REMAP does not depend on
>>     INTEL_IOMMU.
>
> The dmar fault interrupt is VT-d's own interrupt, not subject to IRQ_REMAP.
> So this set up has nothing to do with IR, right?

Both interrupt remap and the IOMMU use DMAR and both set the DMAR's
fault interrupt up, whatever comes first. If IR is enabled then IR does
it, if not then the IOMMU init handles it. But yes, the interrupt is
part of DMAR.

> Maybe we should not call it irq_remap_work, call it dmar_fault_irq_work
> instead?

This work thing does not work at all as I explained in detail. No matter
how you name it. The only sane way to do that is with a hotplug state.

Thanks,

        tglx
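The hotplug-state shape Thomas suggests — a dynamic prepare-stage state, so the control CPU enables a node's fault interrupt before that node's first AP is brought online — might look roughly like the sketch below. This is an illustration only, not the merged fix: `cpuhp_setup_state()` and `CPUHP_BP_PREPARE_DYN` are the real kernel APIs, but the callback, the state name string, and the `dmar_enable_fault_irqs_for_node()` helper are invented here for the sake of the sketch.

```c
/* Hypothetical sketch of the CPUHP_BP_PREPARE_DYN approach; the helper
 * dmar_enable_fault_irqs_for_node() does not exist in the kernel. */
static int dmar_fault_irq_prepare(unsigned int cpu)
{
	/*
	 * Prepare-stage callbacks run on a control CPU *before* @cpu is
	 * brought online, so the DMAR fault interrupts for @cpu's node
	 * can be enabled here with no reporting gap and no dependency
	 * on all APs having been booted (maxcpus=N is handled for free).
	 */
	return dmar_enable_fault_irqs_for_node(cpu_to_node(cpu));
}

static int __init dmar_fault_irq_cpuhp_init(void)
{
	int ret;

	/* A dynamic state returns the allocated state number on success. */
	ret = cpuhp_setup_state(CPUHP_BP_PREPARE_DYN,
				"iommu/dmar_fault_irq:prepare",
				dmar_fault_irq_prepare, NULL);
	return ret < 0 ? ret : 0;
}
```

Because the callback fires per-CPU, the helper would need to track which nodes are already set up (or tolerate re-enabling), since every CPU of a node passes through the prepare stage.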
diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index 23cb80d62a9a..41ef72ba7509 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -2108,8 +2108,12 @@ int __init enable_drhd_fault_handling(void)
 	 */
 	for_each_iommu(iommu, drhd) {
 		u32 fault_status;
-		int ret = dmar_set_interrupt(iommu);
+		int ret;
 
+		if (iommu->node != cpu_to_node(smp_processor_id()))
+			continue;
+
+		ret = dmar_set_interrupt(iommu);
 		if (ret) {
 			pr_err("DRHD %Lx: failed to enable fault, interrupt, ret %d\n",
 			       (unsigned long long)drhd->reg_base_addr, ret);
@@ -2192,6 +2196,34 @@ static int __init dmar_free_unused_resources(void)
 
 late_initcall(dmar_free_unused_resources);
 
+#ifdef CONFIG_X86_LOCAL_APIC
+static void __init irq_remap_enable_fault_handling_thr(struct work_struct *work)
+{
+	irq_remap_enable_fault_handling();
+}
+
+static int __init assign_dmar_vectors(void)
+{
+	struct work_struct irq_remap_work;
+	int nid;
+
+	INIT_WORK(&irq_remap_work, irq_remap_enable_fault_handling_thr);
+	cpus_read_lock();
+	for_each_online_node(nid) {
+		/* Boot cpu dmar vectors are assigned before the rest */
+		if (nid == cpu_to_node(get_boot_cpu_id()))
+			continue;
+		schedule_work_on(cpumask_first(cpumask_of_node(nid)),
+				 &irq_remap_work);
+		flush_work(&irq_remap_work);
+	}
+	cpus_read_unlock();
+	return 0;
+}
+
+arch_initcall(assign_dmar_vectors);
+#endif /* CONFIG_X86_LOCAL_APIC */
+
 /*
  * DMAR Hotplug Support
  * For more details, please refer to Intel(R) Virtualization Technology