From patchwork Tue Dec 5 21:25:25 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tom Zanussi X-Patchwork-Id: 174195 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:bcd1:0:b0:403:3b70:6f57 with SMTP id r17csp3716848vqy; Tue, 5 Dec 2023 13:26:33 -0800 (PST) X-Google-Smtp-Source: AGHT+IGzvEPSdHtA7InhXwQD7T3KhymiG4kUGLUwFcsHBvvB+JgFVwmn/KFSYfbgs8WHpqEFMsn3 X-Received: by 2002:a05:6a00:124b:b0:6ce:2731:d5c5 with SMTP id u11-20020a056a00124b00b006ce2731d5c5mr1787913pfi.54.1701811592802; Tue, 05 Dec 2023 13:26:32 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1701811592; cv=none; d=google.com; s=arc-20160816; b=0869J912IUoj3k8aSBbOGXYRRhWBBDrTZkzrihuzLKA6zRCHDI7+NUvIvn5U3zi5Qa lFB/DmTWvhcF4J6kzUJc/yESd+m725IAqjqkYEr8C7bfKy0Q+IeOFu+10ZJTFh7fzl81 Xh8DC5jeX+VIgaiis9BO/6SW2m5HZcMC77bKrlDtd5BPdeSH3yTjjGy45w2ILjQZOfBG JENKQYVZdbKSeL2IuGVdMtQp9rRnyV/6STWrYTjw9UDXzwRzeRJLtKN5r2U7CLZJZciJ ti7jM4oPHSjjAUIkjOFk5DVd/5DKkMhhIdEka6Y+cPWpLFvdMlsxC9So1r15zw/2WkKm GXkg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=ACa4Dh1DeIziPolfSK5OZeIQN0M0HZFXzO5fHr8latw=; fh=CKy338m5MbqOKLCytkQoYDn9hkoAEvElo6dffmo/qoM=; b=K97Z3bHphTRXLB4R1bd0EQulkJ2EOm5eP6Gh2ELJ5be0h8lzXfBuRe8y7bLEpNF21d 1jkD90XWqntZLhGrlOYZgml/UVg6nbJoCHAOY9AJWpx/9mWSqdsRWGxYIWHER5ByiTx/ 1Fe75ytwQa5jzNqJaN+KwA5GWnaz7yE66YWEvnskHeHGGmzqVaWBT1JblAKqzmvs1OpD DZo8j2WDTbbEc8CNF6Y8fWjWD7KvQYnp9AbPOB45Diu2nfjZLAufpOlTcUuDXaS1q9oy iks4FIA4BZB2CzbbDrdN3WeRWdyWp2fdx0egkhEjWm3RyZUrhDegx+YWLakZa/AhpH3J 09Yg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=OaoJDir6; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:3 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from lipwig.vger.email (lipwig.vger.email. [2620:137:e000::3:3]) by mx.google.com with ESMTPS id dn12-20020a056a00498c00b006cdcff16880si6192804pfb.356.2023.12.05.13.26.32 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 05 Dec 2023 13:26:32 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:3 as permitted sender) client-ip=2620:137:e000::3:3; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=OaoJDir6; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:3 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by lipwig.vger.email (Postfix) with ESMTP id B53B78032A28; Tue, 5 Dec 2023 13:26:22 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.11 at lipwig.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1346644AbjLEV0K (ORCPT + 99 others); Tue, 5 Dec 2023 16:26:10 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52344 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1346588AbjLEVZ7 (ORCPT ); Tue, 5 Dec 2023 16:25:59 -0500 Received: from mgamail.intel.com (mgamail.intel.com [134.134.136.24]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C528F1BC; Tue, 5 Dec 2023 13:26:04 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1701811564; x=1733347564; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=XEI4/q1yM7enNsCoofXrUzwmgzvYAZFOUD2+PYXFUVA=; b=OaoJDir6cx5UOjg64SwHcOTOY3FnUil8N7FuGJYZ/gvai6wM4cdTfusW gbHOs8ucFNdJhd6rWkgpRIs7fOpo5cYcFbVrgaBFhab/iMY+HwNSD7nge y+vq5foCgoYpy3Hck/9hh3cBRLbSzZu9NPQR5LfZuBB+s7LXEQ4sR3dEe mUsKFCaefb8fL5+cGmTGvAvjhgoBJQKp0JhG6C+FFbASg66fWbldICMIl 7ZPVtAsa+AnnSC9HlXZvSPGG2lZixCsgbrZ/bD6se0qnRzIXfhPBzeTJs V4ok5YsKKNTlL0AhHgV+5jqJqljaNZR4bwkuQsiGnsBiuyzPBinZwwgJc A==; X-IronPort-AV: E=McAfee;i="6600,9927,10915"; a="396751663" X-IronPort-AV: E=Sophos;i="6.04,253,1695711600"; d="scan'208";a="396751663" Received: from orviesa001.jf.intel.com ([10.64.159.141]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Dec 2023 13:26:04 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.04,253,1695711600"; d="scan'208";a="19102668" Received: from jsamonte-mobl.amr.corp.intel.com (HELO tzanussi-mobl1.amr.corp.intel.com) ([10.212.71.180]) by smtpauth.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Dec 2023 13:26:03 -0800 From: Tom Zanussi To: herbert@gondor.apana.org.au, davem@davemloft.net, fenghua.yu@intel.com, vkoul@kernel.org Cc: dave.jiang@intel.com, tony.luck@intel.com, wajdi.k.feghali@intel.com, james.guilford@intel.com, kanchana.p.sridhar@intel.com, vinodh.gopal@intel.com, giovanni.cabiddu@intel.com, pavel@ucw.cz, linux-kernel@vger.kernel.org, linux-crypto@vger.kernel.org, dmaengine@vger.kernel.org Subject: [PATCH v12 09/14] crypto: iaa - Add per-cpu workqueue table with rebalancing Date: Tue, 5 Dec 2023 15:25:25 -0600 Message-Id: <20231205212530.285671-10-tom.zanussi@linux.intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231205212530.285671-1-tom.zanussi@linux.intel.com> References: <20231205212530.285671-1-tom.zanussi@linux.intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-0.8 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lipwig.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (lipwig.vger.email [0.0.0.0]); Tue, 05 Dec 2023 13:26:23 -0800 (PST) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1784478792989456769 X-GMAIL-MSGID: 1784478792989456769 The iaa compression/decompression algorithms in later patches need a way to retrieve an appropriate IAA workqueue depending on how close the associated IAA device is to the current cpu. For this purpose, add a per-cpu array of workqueues such that an appropriate workqueue can be retrieved by simply accessing the per-cpu array. Whenever a new workqueue is bound to or unbound from the iaa_crypto driver, the available workqueues are 'rebalanced' such that work submitted from a particular CPU is given to the most appropriate workqueue available. There currently isn't any way for the user to tweak the way this is done internally - if necessary, knobs can be added later for that purpose. Current best practice is to configure and bind at least one workqueue for each IAA device, but as long as there is at least one workqueue configured and bound to any IAA device in the system, the iaa_crypto driver will work, albeit most likely not as efficiently. [ Based on work originally by George Powley, Jing Lin and Kyung Min Park ] Signed-off-by: Tom Zanussi --- drivers/crypto/intel/iaa/iaa_crypto.h | 7 + drivers/crypto/intel/iaa/iaa_crypto_main.c | 222 +++++++++++++++++++++ 2 files changed, 229 insertions(+) diff --git a/drivers/crypto/intel/iaa/iaa_crypto.h b/drivers/crypto/intel/iaa/iaa_crypto.h index 5d1fff7f4b8e..c25546fa87f7 100644 --- a/drivers/crypto/intel/iaa/iaa_crypto.h +++ b/drivers/crypto/intel/iaa/iaa_crypto.h @@ -27,4 +27,11 @@ struct iaa_device { struct list_head wqs; }; +struct wq_table_entry { + struct idxd_wq **wqs; + int max_wqs; + int n_wqs; + int cur_wq; +}; + #endif diff --git a/drivers/crypto/intel/iaa/iaa_crypto_main.c b/drivers/crypto/intel/iaa/iaa_crypto_main.c index 9a662ae49106..925b8fea0d06 100644 --- a/drivers/crypto/intel/iaa/iaa_crypto_main.c +++ b/drivers/crypto/intel/iaa/iaa_crypto_main.c @@ -22,6 +22,46 @@ /* number of iaa instances probed */ static unsigned int nr_iaa; +static unsigned int nr_cpus; +static unsigned int nr_nodes; +static unsigned int nr_cpus_per_node; + +/* Number of physical cpus sharing each iaa instance */ +static unsigned int cpus_per_iaa; + +/* Per-cpu lookup table for balanced wqs */ +static struct wq_table_entry __percpu *wq_table; + +static void wq_table_add(int cpu, struct idxd_wq *wq) +{ + struct wq_table_entry *entry = per_cpu_ptr(wq_table, cpu); + + if (WARN_ON(entry->n_wqs == entry->max_wqs)) + return; + + entry->wqs[entry->n_wqs++] = wq; + + pr_debug("%s: added iaa wq %d.%d to idx %d of cpu %d\n", __func__, + entry->wqs[entry->n_wqs - 1]->idxd->id, + entry->wqs[entry->n_wqs - 1]->id, entry->n_wqs - 1, cpu); +} + +static void wq_table_free_entry(int cpu) +{ + struct wq_table_entry *entry = per_cpu_ptr(wq_table, cpu); + + kfree(entry->wqs); + memset(entry, 0, sizeof(*entry)); +} + +static void wq_table_clear_entry(int cpu) +{ + struct wq_table_entry *entry = per_cpu_ptr(wq_table, cpu); + + entry->n_wqs = 0; + entry->cur_wq = 0; + memset(entry->wqs, 0, entry->max_wqs * sizeof(struct idxd_wq *)); +} static LIST_HEAD(iaa_devices); static DEFINE_MUTEX(iaa_devices_lock); @@ -141,6 +181,53 @@ static void del_iaa_wq(struct iaa_device *iaa_device, struct idxd_wq *wq) } } +static void clear_wq_table(void) +{ + int cpu; + + for (cpu = 0; cpu < nr_cpus; cpu++) + wq_table_clear_entry(cpu); + + pr_debug("cleared wq table\n"); +} + +static void free_wq_table(void) +{ + int cpu; + + for (cpu = 0; cpu < nr_cpus; cpu++) + wq_table_free_entry(cpu); + + free_percpu(wq_table); + + pr_debug("freed wq table\n"); +} + +static int alloc_wq_table(int max_wqs) +{ + struct wq_table_entry *entry; + int cpu; + + wq_table = alloc_percpu(struct wq_table_entry); + if (!wq_table) + return -ENOMEM; + + for (cpu = 0; cpu < nr_cpus; cpu++) { + entry = per_cpu_ptr(wq_table, cpu); + entry->wqs = kcalloc(max_wqs, sizeof(struct wq *), GFP_KERNEL); + if (!entry->wqs) { + free_wq_table(); + return -ENOMEM; + } + + entry->max_wqs = max_wqs; + } + + pr_debug("initialized wq table\n"); + + return 0; +} + static int save_iaa_wq(struct idxd_wq *wq) { struct iaa_device *iaa_device, *found = NULL; @@ -193,6 +280,8 @@ static int save_iaa_wq(struct idxd_wq *wq) if (WARN_ON(nr_iaa == 0)) return -EINVAL; + + cpus_per_iaa = (nr_nodes * nr_cpus_per_node) / nr_iaa; out: return 0; } @@ -207,6 +296,116 @@ static void remove_iaa_wq(struct idxd_wq *wq) break; } } + + if (nr_iaa) + cpus_per_iaa = (nr_nodes * nr_cpus_per_node) / nr_iaa; + else + cpus_per_iaa = 0; +} + +static int wq_table_add_wqs(int iaa, int cpu) +{ + struct iaa_device *iaa_device, *found_device = NULL; + int ret = 0, cur_iaa = 0, n_wqs_added = 0; + struct idxd_device *idxd; + struct iaa_wq *iaa_wq; + struct pci_dev *pdev; + struct device *dev; + + list_for_each_entry(iaa_device, &iaa_devices, list) { + idxd = iaa_device->idxd; + pdev = idxd->pdev; + dev = &pdev->dev; + + if (cur_iaa != iaa) { + cur_iaa++; + continue; + } + + found_device = iaa_device; + dev_dbg(dev, "getting wq from iaa_device %d, cur_iaa %d\n", + found_device->idxd->id, cur_iaa); + break; + } + + if (!found_device) { + found_device = list_first_entry_or_null(&iaa_devices, + struct iaa_device, list); + if (!found_device) { + pr_debug("couldn't find any iaa devices with wqs!\n"); + ret = -EINVAL; + goto out; + } + cur_iaa = 0; + + idxd = found_device->idxd; + pdev = idxd->pdev; + dev = &pdev->dev; + dev_dbg(dev, "getting wq from only iaa_device %d, cur_iaa %d\n", + found_device->idxd->id, cur_iaa); + } + + list_for_each_entry(iaa_wq, &found_device->wqs, list) { + wq_table_add(cpu, iaa_wq->wq); + pr_debug("rebalance: added wq for cpu=%d: iaa wq %d.%d\n", + cpu, iaa_wq->wq->idxd->id, iaa_wq->wq->id); + n_wqs_added++; + }; + + if (!n_wqs_added) { + pr_debug("couldn't find any iaa wqs!\n"); + ret = -EINVAL; + goto out; + } +out: + return ret; +} + +/* + * Rebalance the wq table so that given a cpu, it's easy to find the + * closest IAA instance. The idea is to try to choose the most + * appropriate IAA instance for a caller and spread available + * workqueues around to clients. + */ +static void rebalance_wq_table(void) +{ + const struct cpumask *node_cpus; + int node, cpu, iaa = -1; + + if (nr_iaa == 0) + return; + + pr_debug("rebalance: nr_nodes=%d, nr_cpus %d, nr_iaa %d, cpus_per_iaa %d\n", + nr_nodes, nr_cpus, nr_iaa, cpus_per_iaa); + + clear_wq_table(); + + if (nr_iaa == 1) { + for (cpu = 0; cpu < nr_cpus; cpu++) { + if (WARN_ON(wq_table_add_wqs(0, cpu))) { + pr_debug("could not add any wqs for iaa 0 to cpu %d!\n", cpu); + return; + } + } + + return; + } + + for_each_online_node(node) { + node_cpus = cpumask_of_node(node); + + for (cpu = 0; cpu < nr_cpus_per_node; cpu++) { + int node_cpu = cpumask_nth(cpu, node_cpus); + + if ((cpu % cpus_per_iaa) == 0) + iaa++; + + if (WARN_ON(wq_table_add_wqs(iaa, node_cpu))) { + pr_debug("could not add any wqs for iaa %d to cpu %d!\n", iaa, cpu); + return; + } + } + } } static int iaa_crypto_probe(struct idxd_dev *idxd_dev) @@ -215,6 +414,7 @@ static int iaa_crypto_probe(struct idxd_dev *idxd_dev) struct idxd_device *idxd = wq->idxd; struct idxd_driver_data *data = idxd->data; struct device *dev = &idxd_dev->conf_dev; + bool first_wq = false; int ret = 0; if (idxd->state != IDXD_DEV_ENABLED) @@ -245,10 +445,19 @@ static int iaa_crypto_probe(struct idxd_dev *idxd_dev) mutex_lock(&iaa_devices_lock); + if (list_empty(&iaa_devices)) { + ret = alloc_wq_table(wq->idxd->max_wqs); + if (ret) + goto err_alloc; + first_wq = true; + } + ret = save_iaa_wq(wq); if (ret) goto err_save; + rebalance_wq_table(); + mutex_unlock(&iaa_devices_lock); out: mutex_unlock(&wq->wq_lock); @@ -256,6 +465,10 @@ static int iaa_crypto_probe(struct idxd_dev *idxd_dev) return ret; err_save: + if (first_wq) + free_wq_table(); +err_alloc: + mutex_unlock(&iaa_devices_lock); idxd_drv_disable_wq(wq); err: wq->type = IDXD_WQT_NONE; @@ -273,7 +486,12 @@ static void iaa_crypto_remove(struct idxd_dev *idxd_dev) mutex_lock(&iaa_devices_lock); remove_iaa_wq(wq); + idxd_drv_disable_wq(wq); + rebalance_wq_table(); + + if (nr_iaa == 0) + free_wq_table(); mutex_unlock(&iaa_devices_lock); mutex_unlock(&wq->wq_lock); @@ -295,6 +513,10 @@ static int __init iaa_crypto_init_module(void) { int ret = 0; + nr_cpus = num_online_cpus(); + nr_nodes = num_online_nodes(); + nr_cpus_per_node = nr_cpus / nr_nodes; + ret = idxd_driver_register(&iaa_crypto_driver); if (ret) { pr_debug("IAA wq sub-driver registration failed\n");