From patchwork Mon Dec 4 08:50:41 2023
X-Patchwork-Submitter: Souradeep Chakrabarti
X-Patchwork-Id: 173139
From: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
To: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
    decui@microsoft.com, davem@davemloft.net, edumazet@google.com,
    kuba@kernel.org, pabeni@redhat.com, longli@microsoft.com,
    yury.norov@gmail.com, leon@kernel.org, cai.huoqing@linux.dev,
    ssengar@linux.microsoft.com, vkuznets@redhat.com, tglx@linutronix.de,
    linux-hyperv@vger.kernel.org, netdev@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org
Cc: schakrabarti@microsoft.com, paulros@microsoft.com,
    Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
Subject: [PATCH V4 net-next] net: mana: Assigning IRQ affinity on HT cores
Date: Mon, 4 Dec 2023 00:50:41 -0800
Message-Id: <1701679841-9359-1-git-send-email-schakrabarti@linux.microsoft.com>

The existing MANA design assigns an IRQ to every CPU, including sibling
hyper-threads.
This may cause multiple IRQs to be active simultaneously on the same
core, which can reduce network performance with RSS.

Improve performance by assigning IRQs to non-sibling CPUs in the local
NUMA node.

Signed-off-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
---
V3 -> V4:
* Used the for_each_numa_hop_mask() macro and simplified the code.
  Thanks to Yury Norov for the suggestion.
* Added code to assign the HWC IRQ separately in mana_gd_setup_irqs().

V2 -> V3:
* Created a helper function to get the next NUMA node with CPUs.
* Added error checks for unsuccessful memory allocations.
* Fixed some comments in the code.

V1 -> V2:
* Simplified the code by removing filter_mask_list and using avail_cpus.
* Fixed an infinite-loop issue when there are NUMA nodes with no CPUs.
* Started from the local NUMA node instead of node 0.
* Removed uses of BUG_ON.
* Placed cpus_read_lock() in the parent function so that num_online_cpus()
  cannot change before the function finishes the affinity assignment.
---
 .../net/ethernet/microsoft/mana/gdma_main.c   | 70 +++++++++++++++++--
 1 file changed, 63 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
index 6367de0c2c2e..2194a53cce10 100644
--- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
+++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
@@ -1243,15 +1243,57 @@ void mana_gd_free_res_map(struct gdma_resource *r)
 	r->size = 0;
 }
 
+static int irq_setup(int *irqs, int nvec, int start_numa_node)
+{
+	int i = 0, cpu, err = 0;
+	const struct cpumask *node_cpumask;
+	unsigned int next_node = start_numa_node;
+	cpumask_var_t visited_cpus, node_cpumask_temp;
+
+	if (!zalloc_cpumask_var(&visited_cpus, GFP_KERNEL)) {
+		err = -ENOMEM;
+		return err;
+	}
+	if (!zalloc_cpumask_var(&node_cpumask_temp, GFP_KERNEL)) {
+		err = -ENOMEM;
+		return err;
+	}
+	rcu_read_lock();
+	for_each_numa_hop_mask(node_cpumask, next_node) {
+		cpumask_copy(node_cpumask_temp, node_cpumask);
+		for_each_cpu(cpu, node_cpumask_temp) {
+			cpumask_andnot(node_cpumask_temp, node_cpumask_temp,
+				       topology_sibling_cpumask(cpu));
+			irq_set_affinity_and_hint(irqs[i], cpumask_of(cpu));
+			if (++i == nvec)
+				goto free_mask;
+			cpumask_set_cpu(cpu, visited_cpus);
+			if (cpumask_empty(node_cpumask_temp)) {
+				cpumask_copy(node_cpumask_temp, node_cpumask);
+				cpumask_andnot(node_cpumask_temp, node_cpumask_temp,
+					       visited_cpus);
+				cpu = 0;
+			}
+		}
+	}
+free_mask:
+	rcu_read_unlock();
+	free_cpumask_var(visited_cpus);
+	free_cpumask_var(node_cpumask_temp);
+	return err;
+}
+
 static int mana_gd_setup_irqs(struct pci_dev *pdev)
 {
-	unsigned int max_queues_per_port = num_online_cpus();
 	struct gdma_context *gc = pci_get_drvdata(pdev);
+	unsigned int max_queues_per_port;
 	struct gdma_irq_context *gic;
 	unsigned int max_irqs, cpu;
-	int nvec, irq;
+	int nvec, *irqs, irq;
 	int err, i = 0, j;
 
+	cpus_read_lock();
+	max_queues_per_port = num_online_cpus();
 	if (max_queues_per_port > MANA_MAX_NUM_QUEUES)
 		max_queues_per_port = MANA_MAX_NUM_QUEUES;
 
@@ -1261,6 +1303,11 @@ static int mana_gd_setup_irqs(struct pci_dev *pdev)
 	nvec = pci_alloc_irq_vectors(pdev, 2, max_irqs, PCI_IRQ_MSIX);
 	if (nvec < 0)
 		return nvec;
+	irqs = kmalloc_array(max_queues_per_port, sizeof(int), GFP_KERNEL);
+	if (!irqs) {
+		err = -ENOMEM;
+		goto free_irq_vector;
+	}
 
 	gc->irq_contexts = kcalloc(nvec, sizeof(struct gdma_irq_context),
 				   GFP_KERNEL);
@@ -1287,21 +1334,28 @@ static int mana_gd_setup_irqs(struct pci_dev *pdev)
 			goto free_irq;
 		}
 
-		err = request_irq(irq, mana_gd_intr, 0, gic->name, gic);
+		if (!i) {
+			err = request_irq(irq, mana_gd_intr, 0, gic->name, gic);
+			cpu = cpumask_local_spread(i, gc->numa_node);
+			irq_set_affinity_and_hint(irq, cpumask_of(cpu));
+		} else {
+			irqs[i - 1] = irq;
+			err = request_irq(irqs[i - 1], mana_gd_intr, 0, gic->name, gic);
+		}
 		if (err)
 			goto free_irq;
-
-		cpu = cpumask_local_spread(i, gc->numa_node);
-		irq_set_affinity_and_hint(irq, cpumask_of(cpu));
 	}
 
+	err = irq_setup(irqs, max_queues_per_port, gc->numa_node);
+	if (err)
+		goto free_irq;
 	err = mana_gd_alloc_res_map(nvec, &gc->msix_resource);
 	if (err)
 		goto free_irq;
 
 	gc->max_num_msix = nvec;
 	gc->num_msix_usable = nvec;
-
+	cpus_read_unlock();
 	return 0;
 
 free_irq:
@@ -1314,8 +1368,10 @@ static int mana_gd_setup_irqs(struct pci_dev *pdev)
 	}
 
 	kfree(gc->irq_contexts);
+	kfree(irqs);
 	gc->irq_contexts = NULL;
 free_irq_vector:
+	cpus_read_unlock();
 	pci_free_irq_vectors(pdev);
 	return err;
 }
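
For readers who want to see the spreading policy the new irq_setup() helper
aims for, below is a minimal, hypothetical user-space sketch (not part of the
patch). It simulates the intended rule on a made-up topology of 2 NUMA nodes,
2 cores per node, and 2 hyper-threads per core: the cpu_node[] and
cpu_sibling[] tables stand in for the kernel's NUMA hop masks and
topology_sibling_cpumask(), and every name and number in it is an assumption
for illustration only. The in-kernel helper iterates cumulative hop masks via
for_each_numa_hop_mask(), so its exact visiting order can differ; the sketch
only models the intent that each physical core gets one IRQ before any
hyper-thread sibling is used, starting from the local node.

	/* Hypothetical simulation of the intended IRQ spreading policy. */
	#include <stdio.h>

	#define NR_CPUS   8   /* assumed: 2 nodes x 2 cores x 2 hyper-threads */
	#define NR_NODES  2
	#define NR_IRQS   6

	/* assumed topology tables: cpu -> node, cpu -> HT sibling */
	static const int cpu_node[NR_CPUS]    = { 0, 0, 0, 0, 1, 1, 1, 1 };
	static const int cpu_sibling[NR_CPUS] = { 1, 0, 3, 2, 5, 4, 7, 6 };

	int main(void)
	{
		int assigned[NR_CPUS] = { 0 };  /* 1 if this CPU already has an IRQ */
		int local_node = 0;             /* stand-in for gc->numa_node */
		int irq = 0;

		for (int hop = 0; hop < NR_NODES && irq < NR_IRQS; hop++) {
			int node = (local_node + hop) % NR_NODES;
			int cpu;

			/* First pass: at most one IRQ per physical core. */
			for (cpu = 0; cpu < NR_CPUS && irq < NR_IRQS; cpu++) {
				if (cpu_node[cpu] != node || assigned[cpu] ||
				    assigned[cpu_sibling[cpu]])
					continue;
				printf("IRQ %d -> CPU %d (node %d, core)\n",
				       irq, cpu, node);
				assigned[cpu] = 1;
				irq++;
			}
			/* Second pass: remaining HT siblings of this node. */
			for (cpu = 0; cpu < NR_CPUS && irq < NR_IRQS; cpu++) {
				if (cpu_node[cpu] != node || assigned[cpu])
					continue;
				printf("IRQ %d -> CPU %d (node %d, sibling)\n",
				       irq, cpu, node);
				assigned[cpu] = 1;
				irq++;
			}
		}
		return 0;
	}

With six IRQs and node 0 as the local node, this prints assignments to CPUs 0
and 2 (node-0 cores), then 1 and 3 (their siblings), and finally 4 and 6
(node-1 cores), which is the ordering the helper above is intended to produce.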