From patchwork Wed Jan 17 19:34:51 2024
X-Patchwork-Submitter: Dimitri Sivanich
X-Patchwork-Id: 188932
Date: Wed, 17 Jan 2024 13:34:51 -0600
From: Dimitri Sivanich
To: David Woodhouse, Lu Baolu, Joerg Roedel, Will Deacon, Robin Murphy,
    Thomas Gleixner, Steve Wahl, Russ Anderson, iommu@lists.linux.dev,
    linux-kernel@vger.kernel.org
Subject: DMAR interrupt reservations exceed available resources on 32s Intel SPR

We are currently running Sapphire Rapids systems with 32 sockets. On
systems of this size we see the following warnings:

  DMAR-IR: Enabled IRQ remapping in x2apic mode
  Interrupt reservation exceeds available resources
  DMAR: Can't request irq
  DMAR: DRHD 97bfc000: failed to enable fault, interrupt, ret -28

There are 10 DMAR units per socket on these systems, so 320 DMAR units in
total. It appears that each is being allocated its own vector on the first
cpu, as this happens before the other cpus are started. At the time the
DMAR vector allocations begin, cpu 0 has 186 vectors available (as does
the global set), and that is the number of DMAR fault interrupts that show
up as allocated in /proc/interrupts after boot.

As a simple experiment, I patched the kernel to allocate only the node 0
DMAR fault interrupt vectors while only the boot cpu is running, and then
to allocate fault interrupt vectors for the remaining DMAR units as the
cpus on their respective nodes come up. With that change, all 320
interrupts were allocated, and the interrupt affinity was set to the first
cpu on each DMAR's respective node.

Does this seem like a sensible approach to fixing this, or is there
another avenue that we should be looking at?

For illustrative purposes, here's the patch for the experiment described
above:

Index: linux/drivers/iommu/intel/dmar.c
===================================================================
--- linux.orig/drivers/iommu/intel/dmar.c
+++ linux/drivers/iommu/intel/dmar.c
@@ -2055,6 +2055,38 @@ int dmar_set_interrupt(struct intel_iomm
 	return ret;
 }
 
+int __init enable_remaining_drhd_fault_handling(int node)
+{
+	struct dmar_drhd_unit *drhd;
+	struct intel_iommu *iommu;
+
+	/*
+	 * Enable fault control interrupt.
+	 */
+	for_each_iommu(iommu, drhd) {
+		u32 fault_status;
+		int ret;
+		if (iommu->node != node)
+			continue;
+		ret = dmar_set_interrupt(iommu);
+
+		if (ret) {
+			pr_err("DRHD %Lx: failed to enable fault, interrupt, ret %d\n",
+			       (unsigned long long)drhd->reg_base_addr, ret);
+			return -1;
+		}
+
+		/*
+		 * Clear any previous faults.
+		 */
+		dmar_fault(iommu->irq, iommu);
+		fault_status = readl(iommu->reg + DMAR_FSTS_REG);
+		writel(fault_status, iommu->reg + DMAR_FSTS_REG);
+	}
+
+	return 0;
+}
+
 int __init enable_drhd_fault_handling(void)
 {
 	struct dmar_drhd_unit *drhd;
@@ -2065,7 +2097,10 @@ int __init enable_drhd_fault_handling(vo
 	 */
 	for_each_iommu(iommu, drhd) {
 		u32 fault_status;
-		int ret = dmar_set_interrupt(iommu);
+		int ret;
+		if (iommu->node != 0)
+			continue;
+		ret = dmar_set_interrupt(iommu);
 
 		if (ret) {
 			pr_err("DRHD %Lx: failed to enable fault, interrupt, ret %d\n",
Index: linux/arch/x86/kernel/smpboot.c
===================================================================
--- linux.orig/arch/x86/kernel/smpboot.c
+++ linux/arch/x86/kernel/smpboot.c
@@ -210,6 +210,7 @@ static void smp_callin(void)
 	cpumask_set_cpu(cpuid, cpu_callin_mask);
 }
 
+extern int __init enable_remaining_drhd_fault_handling(int cpu);
 static int cpu0_logical_apicid;
 static int enable_start_cpu0;
 /*
@@ -263,6 +264,7 @@ static void notrace start_secondary(void
 
 	x86_cpuinit.setup_percpu_clockev();
 
+	enable_remaining_drhd_fault_handling(cpu_to_node(smp_processor_id()));
 	wmb();
 	cpu_startup_entry(CPUHP_AP_ONLINE_IDLE);
 }