From patchwork Tue Mar 28 18:20:28 2023
X-Patchwork-Submitter: "Liang, Kan"
X-Patchwork-Id: 76228
From: kan.liang@linux.intel.com
To: joro@8bytes.org, will@kernel.org, baolu.lu@linux.intel.com,
    dwmw2@infradead.org, robin.murphy@arm.com, iommu@lists.linux.dev,
    linux-kernel@vger.kernel.org
Cc: Kan Liang
Subject: [PATCH V2] iommu/vt-d: Fix an IOMMU perfmon warning on CPU hotplug
Date: Tue, 28 Mar 2023 11:20:28 -0700
Message-Id: <20230328182028.1366416-1-kan.liang@linux.intel.com>
X-Mailer: git-send-email 2.35.1

From: Kan Liang

A warning can be triggered when hot-plugging CPU 0:

$ echo 0 > /sys/devices/system/cpu/cpu0/online

[11958.737635] ------------[ cut here ]------------
[11958.742882] Voluntary context switch within RCU read-side critical section!
[11958.742891] WARNING: CPU: 0 PID: 19 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4f4/0x580
[11958.860095] RIP: 0010:rcu_note_context_switch+0x4f4/0x580
[11958.960360] Call Trace:
[11958.963161]  <TASK>
[11958.965565]  ? perf_event_update_userpage+0x104/0x150
[11958.971293]  __schedule+0x8d/0x960
[11958.975165]  ? perf_event_set_state.part.82+0x11/0x50
[11958.980891]  schedule+0x44/0xb0
[11958.984464]  schedule_timeout+0x226/0x310
[11958.989017]  ? __perf_event_disable+0x64/0x1a0
[11958.994054]  ? _raw_spin_unlock+0x14/0x30
[11958.998605]  wait_for_completion+0x94/0x130
[11959.003352]  __wait_rcu_gp+0x108/0x130
[11959.007616]  synchronize_rcu+0x67/0x70
[11959.011876]  ? invoke_rcu_core+0xb0/0xb0
[11959.016333]  ? __bpf_trace_rcu_stall_warning+0x10/0x10
[11959.022147]  perf_pmu_migrate_context+0x121/0x370
[11959.027478]  iommu_pmu_cpu_offline+0x6a/0xa0
[11959.032325]  ? iommu_pmu_del+0x1e0/0x1e0
[11959.036782]  cpuhp_invoke_callback+0x129/0x510
[11959.041825]  cpuhp_thread_fun+0x94/0x150
[11959.046283]  smpboot_thread_fn+0x183/0x220
[11959.050933]  ? sort_range+0x20/0x20
[11959.054902]  kthread+0xe6/0x110
[11959.058479]  ? kthread_complete_and_exit+0x20/0x20
[11959.063911]  ret_from_fork+0x1f/0x30
[11959.067982]  </TASK>
[11959.070489] ---[ end trace 0000000000000000 ]---

perf_pmu_migrate_context() invokes synchronize_rcu() when migrating a
PMU to a new CPU. However, the current iommu_pmu_cpu_offline() calls it
from within a for_each_iommu() walk, which is an RCU read-side critical
section, so the voluntary context switch in synchronize_rcu() triggers
the warning above.

Two methods were considered to fix the issue:

- Take the dmar_global_lock instead of the RCU read lock when going
  through the drhd list. But this triggers a lockdep warning.
- Use cpuhp_setup_state_multi() to set up a dedicated hotplug state for
  each IOMMU PMU, so that no lock is needed in the hotplug callbacks.

The latter method is implemented in this patch. Since each IOMMU PMU
now has a dedicated state, add cpuhp_node and cpu to struct iommu_pmu
to track it. The state can be allocated dynamically, so the static
CPUHP_AP_PERF_X86_IOMMU_PERF_ONLINE entry is removed.
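For reference, the problematic pattern removed by this patch reduces to
the following shape. This is an illustrative sketch distilled from the
deleted hunk below, not new code; iommu, drhd, cpu and target are the
locals visible in that hunk:

	rcu_read_lock();

	for_each_iommu(iommu, drhd) {
		if (!iommu->pmu)
			continue;
		/*
		 * perf_pmu_migrate_context() calls synchronize_rcu(),
		 * which may sleep; sleeping is illegal inside this RCU
		 * read-side critical section.
		 */
		perf_pmu_migrate_context(&iommu->pmu->pmu, cpu, target);
	}
	rcu_read_unlock();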
Fixes: 46284c6ceb5e ("iommu/vt-d: Support cpumask for IOMMU perfmon")
Signed-off-by: Kan Liang
Reported-by: Ammy Yi
---
 drivers/iommu/intel/iommu.h   |  2 ++
 drivers/iommu/intel/perfmon.c | 68 ++++++++++++++++++++++-------------
 include/linux/cpuhotplug.h    |  1 -
 3 files changed, 46 insertions(+), 25 deletions(-)

diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index d6df3b865812..694ab9b7d3e9 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -641,6 +641,8 @@ struct iommu_pmu {
 	DECLARE_BITMAP(used_mask, IOMMU_PMU_IDX_MAX);
 	struct perf_event	*event_list[IOMMU_PMU_IDX_MAX];
 	unsigned char		irq_name[16];
+	struct hlist_node	cpuhp_node;
+	int			cpu;
 };
 
 #define IOMMU_IRQ_ID_OFFSET_PRQ		(DMAR_UNITS_SUPPORTED)
diff --git a/drivers/iommu/intel/perfmon.c b/drivers/iommu/intel/perfmon.c
index e17d9743a0d8..e27bc954e866 100644
--- a/drivers/iommu/intel/perfmon.c
+++ b/drivers/iommu/intel/perfmon.c
@@ -773,19 +773,34 @@ static void iommu_pmu_unset_interrupt(struct intel_iommu *iommu)
 	iommu->perf_irq = 0;
 }
 
-static int iommu_pmu_cpu_online(unsigned int cpu)
+static int iommu_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)
 {
+	struct iommu_pmu *iommu_pmu = hlist_entry_safe(node, typeof(*iommu_pmu), cpuhp_node);
+
 	if (cpumask_empty(&iommu_pmu_cpu_mask))
 		cpumask_set_cpu(cpu, &iommu_pmu_cpu_mask);
 
+	if (cpumask_test_cpu(cpu, &iommu_pmu_cpu_mask))
+		iommu_pmu->cpu = cpu;
+
 	return 0;
 }
 
-static int iommu_pmu_cpu_offline(unsigned int cpu)
+static int iommu_pmu_cpu_offline(unsigned int cpu, struct hlist_node *node)
 {
-	struct dmar_drhd_unit *drhd;
-	struct intel_iommu *iommu;
-	int target;
+	struct iommu_pmu *iommu_pmu = hlist_entry_safe(node, typeof(*iommu_pmu), cpuhp_node);
+	int target = cpumask_first(&iommu_pmu_cpu_mask);
+
+	/*
+	 * The iommu_pmu_cpu_mask has been updated when offlining the CPU
+	 * for the first iommu_pmu. Migrate the other iommu_pmus to the
+	 * new target.
+	 */
+	if ((target < nr_cpu_ids) && (target != iommu_pmu->cpu)) {
+		perf_pmu_migrate_context(&iommu_pmu->pmu, cpu, target);
+		iommu_pmu->cpu = target;
+		return 0;
+	}
 
 	if (!cpumask_test_and_clear_cpu(cpu, &iommu_pmu_cpu_mask))
 		return 0;
 
@@ -795,45 +810,50 @@ static int iommu_pmu_cpu_offline(unsigned int cpu)
 
 	if (target < nr_cpu_ids)
 		cpumask_set_cpu(target, &iommu_pmu_cpu_mask);
 	else
-		target = -1;
+		return 0;
 
-	rcu_read_lock();
-
-	for_each_iommu(iommu, drhd) {
-		if (!iommu->pmu)
-			continue;
-		perf_pmu_migrate_context(&iommu->pmu->pmu, cpu, target);
-	}
-	rcu_read_unlock();
+	perf_pmu_migrate_context(&iommu_pmu->pmu, cpu, target);
+	iommu_pmu->cpu = target;
 
 	return 0;
 }
 
 static int nr_iommu_pmu;
+static enum cpuhp_state iommu_cpuhp_slot;
 
 static int iommu_pmu_cpuhp_setup(struct iommu_pmu *iommu_pmu)
 {
 	int ret;
 
-	if (nr_iommu_pmu++)
-		return 0;
+	if (!nr_iommu_pmu) {
+		ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN,
+					      "driver/iommu/intel/perfmon:online",
+					      iommu_pmu_cpu_online,
+					      iommu_pmu_cpu_offline);
+		if (ret < 0)
+			return ret;
+		iommu_cpuhp_slot = ret;
+	}
 
-	ret = cpuhp_setup_state(CPUHP_AP_PERF_X86_IOMMU_PERF_ONLINE,
-				"driver/iommu/intel/perfmon:online",
-				iommu_pmu_cpu_online,
-				iommu_pmu_cpu_offline);
-	if (ret)
-		nr_iommu_pmu = 0;
+	ret = cpuhp_state_add_instance(iommu_cpuhp_slot, &iommu_pmu->cpuhp_node);
+	if (ret) {
+		if (!nr_iommu_pmu)
+			cpuhp_remove_multi_state(iommu_cpuhp_slot);
+		return ret;
+	}
+	nr_iommu_pmu++;
 
-	return ret;
+	return 0;
 }
 
 static void iommu_pmu_cpuhp_free(struct iommu_pmu *iommu_pmu)
 {
+	cpuhp_state_remove_instance(iommu_cpuhp_slot, &iommu_pmu->cpuhp_node);
+
 	if (--nr_iommu_pmu)
 		return;
 
-	cpuhp_remove_state(CPUHP_AP_PERF_X86_IOMMU_PERF_ONLINE);
+	cpuhp_remove_multi_state(iommu_cpuhp_slot);
 }
 
 void iommu_pmu_register(struct intel_iommu *iommu)
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index c6fab004104a..5b2f8147d1ae 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -218,7 +218,6 @@ enum cpuhp_state {
 	CPUHP_AP_PERF_X86_CQM_ONLINE,
 	CPUHP_AP_PERF_X86_CSTATE_ONLINE,
 	CPUHP_AP_PERF_X86_IDXD_ONLINE,
-	CPUHP_AP_PERF_X86_IOMMU_PERF_ONLINE,
 	CPUHP_AP_PERF_S390_CF_ONLINE,
 	CPUHP_AP_PERF_S390_SF_ONLINE,
 	CPUHP_AP_PERF_ARM_CCI_ONLINE,
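For readers less familiar with the multi-instance hotplug API the patch
switches to, a minimal stand-alone sketch of the pattern follows. It is
illustrative only: the my_* names are placeholders, not kernel symbols;
only the cpuhp_* calls, hlist_entry_safe(), and the callback signatures
mirror what the patch uses. The once-only setup guard (nr_iommu_pmu in
the patch) is omitted for brevity, so this sketch assumes a single
device instance.

#include <linux/cpuhotplug.h>
#include <linux/list.h>

/* Placeholder per-device structure; not a real kernel type. */
struct my_dev {
	struct hlist_node cpuhp_node;	/* links this instance to the state */
	int cpu;			/* CPU currently owning the instance */
};

static enum cpuhp_state my_slot;	/* dynamically allocated state */

/*
 * Multi-instance callbacks receive the instance's hlist_node, so each
 * device is handled on its own, with no global list walk (and hence no
 * RCU read lock) in the hotplug path.
 */
static int my_online(unsigned int cpu, struct hlist_node *node)
{
	struct my_dev *dev = hlist_entry_safe(node, struct my_dev, cpuhp_node);

	if (dev->cpu < 0)
		dev->cpu = cpu;
	return 0;
}

static int my_offline(unsigned int cpu, struct hlist_node *node)
{
	struct my_dev *dev = hlist_entry_safe(node, struct my_dev, cpuhp_node);

	/* migrate per-CPU context away from @cpu here if dev->cpu == cpu */
	return 0;
}

static int my_setup(struct my_dev *dev)
{
	int ret;

	/*
	 * CPUHP_AP_ONLINE_DYN asks the core to allocate a state number
	 * at runtime, so no static enum cpuhp_state entry is required.
	 */
	ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, "driver/my:online",
				      my_online, my_offline);
	if (ret < 0)
		return ret;
	my_slot = ret;

	/* From now on, the callbacks run once per registered instance. */
	return cpuhp_state_add_instance(my_slot, &dev->cpuhp_node);
}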