Message ID: <ZeDj/LK56borSxO4@hpe.com>
State: New
Headers (transport/authentication chain omitted):
Date: Thu, 29 Feb 2024 14:07:24 -0600
From: Dimitri Sivanich <sivanich@hpe.com>
To: Dimitri Sivanich <sivanich@hpe.com>, David Woodhouse <dwmw2@infradead.org>, Lu Baolu <baolu.lu@linux.intel.com>, Joerg Roedel <joro@8bytes.org>, Will Deacon <will@kernel.org>, Robin Murphy <robin.murphy@arm.com>, Thomas Gleixner <tglx@linutronix.de>, iommu@lists.linux.dev
Cc: linux-kernel@vger.kernel.org, Steve Wahl <steve.wahl@hpe.com>, Russ Anderson <russ.anderson@hpe.com>
Subject: [PATCH] Allocate DMAR fault interrupts locally
Message-ID: <ZeDj/LK56borSxO4@hpe.com>
Series: Allocate DMAR fault interrupts locally
Commit Message
Dimitri Sivanich
Feb. 29, 2024, 8:07 p.m. UTC
The Intel IOMMU code currently tries to allocate all DMAR fault interrupt
vectors on the boot cpu. On large systems with high DMAR counts this
results in vector exhaustion, and most of the vectors are not initially
allocated socket-local.
Instead, have a cpu on each node do the vector allocation for the DMARs on
that node. The boot cpu still does the allocation for its node during its
boot sequence.
Signed-off-by: Dimitri Sivanich <sivanich@hpe.com>
---
drivers/iommu/intel/dmar.c | 34 +++++++++++++++++++++++++++++++++-
1 file changed, 33 insertions(+), 1 deletion(-)
Comments
Dimitri!

On Thu, Feb 29 2024 at 14:07, Dimitri Sivanich wrote:

The subject lacks a subsystem prefix. You're doing this for how many
decades now?

> The Intel IOMMU code currently tries to allocate all DMAR fault interrupt

> +#ifdef CONFIG_X86_LOCAL_APIC

I seriously doubt that this code can ever be compiled w/o X86_LOCAL_APIC:

  obj-$(CONFIG_DMAR_TABLE) += dmar.o

  config DMAR_TABLE
	bool

  config INTEL_IOMMU
	depends on PCI_MSI && ACPI && X86
	select DMAR_TABLE

  config IRQ_REMAP
	depends on X86_64 && X86_IO_APIC && PCI_MSI && ACPI
	select DMAR_TABLE

  config X86_LOCAL_APIC
	def_bool y
	depends on X86_64 || SMP || X86_32_NON_STANDARD || X86_UP_APIC || PCI_MSI

What are you trying to achieve here other than #ifdef voodoo?

> +static void __init irq_remap_enable_fault_handling_thr(struct work_struct *work)
> +{
> +	irq_remap_enable_fault_handling();

because if INTEL_IOMMU=y and IRQ_REMAP=n then X86_LOCAL_APIC=y and this
muck gets invoked for nothing. 'git grep irq_remap_enable_fault_handling
include/' might give you a hint.

> +}
> +
> +static int __init assign_dmar_vectors(void)
> +{
> +	struct work_struct irq_remap_work;
> +	int nid;
> +
> +	INIT_WORK(&irq_remap_work, irq_remap_enable_fault_handling_thr);
> +	cpus_read_lock();
> +	for_each_online_node(nid) {
> +		/* Boot cpu dmar vectors are assigned before the rest */
> +		if (nid == cpu_to_node(get_boot_cpu_id()))
> +			continue;
> +		schedule_work_on(cpumask_first(cpumask_of_node(nid)),
> +				 &irq_remap_work);
> +		flush_work(&irq_remap_work);
> +	}
> +	cpus_read_unlock();
> +	return 0;
> +}
> +
> +arch_initcall(assign_dmar_vectors);

Stray newline before arch_initcall(), but that's not the problem.

The real problems are:

 1) This approach only works when _ALL_ APs have been brought up during
    boot. With 'maxcpus=N' on the command line this will fail to enable
    fault handling when the APs which have not been brought up initially
    are onlined later on.

    This might be working in practice because intel_iommu_init() will
    enable the interrupts later on via init_dmars() unconditionally, but
    that's far from correct because IRQ_REMAP does not depend on
    INTEL_IOMMU.

 2) It leaves a gap where the reporting is not working between bringing
    up the APs during boot and this initcall. Mostly theoretical, but
    that does not make it more correct either.

What you really want is a cpu hotplug state in the CPUHP_BP_PREPARE_DYN
space which enables the interrupt for the node _before_ the first AP of
the node is brought up. That will solve the problem nicely w/o any of
the above issues.

Thanks,

        tglx
Hi Thomas,

On Thu, 29 Feb 2024 23:18:37 +0100, Thomas Gleixner <tglx@linutronix.de>
wrote:

> Dimitri!
>
> On Thu, Feb 29 2024 at 14:07, Dimitri Sivanich wrote:
>
> The subject lacks a subsystem prefix. You're doing this for how many
> decades now?
>
> > The Intel IOMMU code currently tries to allocate all DMAR fault
> > interrupt
>
> > +#ifdef CONFIG_X86_LOCAL_APIC
>
> I seriously doubt that this code can ever be compiled w/o X86_LOCAL_APIC:
>
>   obj-$(CONFIG_DMAR_TABLE) += dmar.o
>
>   config DMAR_TABLE
> 	bool
>
>   config INTEL_IOMMU
> 	depends on PCI_MSI && ACPI && X86
> 	select DMAR_TABLE
>
>   config IRQ_REMAP
> 	depends on X86_64 && X86_IO_APIC && PCI_MSI && ACPI
> 	select DMAR_TABLE
>
>   config X86_LOCAL_APIC
> 	def_bool y
> 	depends on X86_64 || SMP || X86_32_NON_STANDARD || X86_UP_APIC ||
> 	PCI_MSI
>
> What are you trying to achieve here other than #ifdef voodoo?
>
> > +static void __init irq_remap_enable_fault_handling_thr(struct
> > work_struct *work)
> > +{
> > +	irq_remap_enable_fault_handling();
>
> because if INTEL_IOMMU=y and IRQ_REMAP=n then X86_LOCAL_APIC=y and this
> muck gets invoked for nothing. 'git grep irq_remap_enable_fault_handling
> include/' might give you a hint.
>
> > +}
> > +
> > +static int __init assign_dmar_vectors(void)
> > +{
> > +	struct work_struct irq_remap_work;
> > +	int nid;
> > +
> > +	INIT_WORK(&irq_remap_work, irq_remap_enable_fault_handling_thr);
> > +	cpus_read_lock();
> > +	for_each_online_node(nid) {
> > +		/* Boot cpu dmar vectors are assigned before the rest */
> > +		if (nid == cpu_to_node(get_boot_cpu_id()))
> > +			continue;
> > +		schedule_work_on(cpumask_first(cpumask_of_node(nid)),
> > +				 &irq_remap_work);
> > +		flush_work(&irq_remap_work);
> > +	}
> > +	cpus_read_unlock();
> > +	return 0;
> > +}
> > +
> > +arch_initcall(assign_dmar_vectors);
>
> Stray newline before arch_initcall(), but that's not the problem.
>
> The real problems are:
>
>  1) This approach only works when _ALL_ APs have been brought up during
>     boot. With 'maxcpus=N' on the command line this will fail to enable
>     fault handling when the APs which have not been brought up initially
>     are onlined later on.
>
>     This might be working in practice because intel_iommu_init() will
>     enable the interrupts later on via init_dmars() unconditionally, but
>     that's far from correct because IRQ_REMAP does not depend on
>     INTEL_IOMMU.

The dmar fault interrupt is VT-d's own interrupt, not subject to IRQ_REMAP.
So this set up has nothing to do with IR, right? Maybe we should not call
it irq_remap_work, call it dmar_fault_irq_work instead?

>  2) It leaves a gap where the reporting is not working between bringing
>     up the APs during boot and this initcall. Mostly theoretical, but
>     that does not make it more correct either.
>
> What you really want is a cpu hotplug state in the CPUHP_BP_PREPARE_DYN
> space which enables the interrupt for the node _before_ the first AP of
> the node is brought up. That will solve the problem nicely w/o any of
> the above issues.
>
> Thanks,
>
>         tglx

Thanks,

Jacob
Jacob!

On Fri, Mar 01 2024 at 11:50, Jacob Pan wrote:
> On Thu, 29 Feb 2024 23:18:37 +0100, Thomas Gleixner <tglx@linutronix.de>
> wrote:
>> On Thu, Feb 29 2024 at 14:07, Dimitri Sivanich wrote:
>> The real problems are:
>>
>>  1) This approach only works when _ALL_ APs have been brought up during
>>     boot. With 'maxcpus=N' on the command line this will fail to enable
>>     fault handling when the APs which have not been brought up initially
>>     are onlined later on.
>>
>>     This might be working in practice because intel_iommu_init() will
>>     enable the interrupts later on via init_dmars() unconditionally, but
>>     that's far from correct because IRQ_REMAP does not depend on
>>     INTEL_IOMMU.
>
> The dmar fault interrupt is VT-d's own interrupt, not subject to IRQ_REMAP.
> So this set up has nothing to do with IR, right?

Both interrupt remap and the IOMMU use DMAR and both set the DMAR's
fault interrupt up, whatever comes first. If IR is enabled then IR does
it, if not then the IOMMU init handles it. But yes, the interrupt is
part of DMAR.

> Maybe we should not call it irq_remap_work, call it dmar_fault_irq_work
> instead?

This work thing does not work at all as I explained in detail. No matter
how you name it. The only sane way to do that is with a hotplug state.

Thanks,

        tglx
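The hotplug-state shape Thomas suggests — a dynamic prepare-stage state, so the control CPU enables a node's fault interrupt before that node's first AP is brought online — might look roughly like the sketch below. This is an illustration only, not the merged fix: `cpuhp_setup_state()` and `CPUHP_BP_PREPARE_DYN` are the real kernel APIs, but the callback, the state name string, and the `dmar_enable_fault_irqs_for_node()` helper are invented here for the sake of the sketch.

```c
/* Hypothetical sketch of the CPUHP_BP_PREPARE_DYN approach; the helper
 * dmar_enable_fault_irqs_for_node() does not exist in the kernel. */
static int dmar_fault_irq_prepare(unsigned int cpu)
{
	/*
	 * Prepare-stage callbacks run on a control CPU *before* @cpu is
	 * brought online, so the DMAR fault interrupts for @cpu's node
	 * can be enabled here with no reporting gap and no dependency
	 * on all APs having been booted (maxcpus=N is handled for free).
	 */
	return dmar_enable_fault_irqs_for_node(cpu_to_node(cpu));
}

static int __init dmar_fault_irq_cpuhp_init(void)
{
	int ret;

	/* A dynamic state returns the allocated state number on success. */
	ret = cpuhp_setup_state(CPUHP_BP_PREPARE_DYN,
				"iommu/dmar_fault_irq:prepare",
				dmar_fault_irq_prepare, NULL);
	return ret < 0 ? ret : 0;
}
```

Because the callback fires per-CPU, the helper would need to track which nodes are already set up (or tolerate re-enabling), since every CPU of a node passes through the prepare stage.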
diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index 23cb80d62a9a..41ef72ba7509 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -2108,8 +2108,12 @@ int __init enable_drhd_fault_handling(void)
 	 */
 	for_each_iommu(iommu, drhd) {
 		u32 fault_status;
-		int ret = dmar_set_interrupt(iommu);
+		int ret;
 
+		if (iommu->node != cpu_to_node(smp_processor_id()))
+			continue;
+
+		ret = dmar_set_interrupt(iommu);
 		if (ret) {
 			pr_err("DRHD %Lx: failed to enable fault, interrupt, ret %d\n",
 			       (unsigned long long)drhd->reg_base_addr, ret);
@@ -2192,6 +2196,34 @@ static int __init dmar_free_unused_resources(void)
 
 late_initcall(dmar_free_unused_resources);
 
+#ifdef CONFIG_X86_LOCAL_APIC
+static void __init irq_remap_enable_fault_handling_thr(struct work_struct *work)
+{
+	irq_remap_enable_fault_handling();
+}
+
+static int __init assign_dmar_vectors(void)
+{
+	struct work_struct irq_remap_work;
+	int nid;
+
+	INIT_WORK(&irq_remap_work, irq_remap_enable_fault_handling_thr);
+	cpus_read_lock();
+	for_each_online_node(nid) {
+		/* Boot cpu dmar vectors are assigned before the rest */
+		if (nid == cpu_to_node(get_boot_cpu_id()))
+			continue;
+		schedule_work_on(cpumask_first(cpumask_of_node(nid)),
+				 &irq_remap_work);
+		flush_work(&irq_remap_work);
+	}
+	cpus_read_unlock();
+	return 0;
+}
+
+arch_initcall(assign_dmar_vectors);
+#endif /* CONFIG_X86_LOCAL_APIC */
+
 /*
  * DMAR Hotplug Support
  * For more details, please refer to Intel(R) Virtualization Technology