Message ID | 20230323053043.35-4-xieyongji@bytedance.com |
---|---|
State | New |
Subject | [PATCH v4 03/11] virtio-vdpa: Support interrupt affinity spreading mechanism |
Date | Thu, 23 Mar 2023 13:30:35 +0800 |
Series | VDUSE: Improve performance |
Commit Message
Yongji Xie
March 23, 2023, 5:30 a.m. UTC
To support the interrupt affinity spreading mechanism, this makes use
of group_cpus_evenly() to create an IRQ callback affinity mask for
each virtqueue of the vDPA device. The set_vq_affinity callback is
then used uniformly to pass the affinity to the vDPA device driver.
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
---
drivers/virtio/virtio_vdpa.c | 68 ++++++++++++++++++++++++++++++++++++
1 file changed, 68 insertions(+)
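
The masks computed by this patch only take effect if the parent vDPA driver implements the set_vq_affinity config op. As a rough orientation, the sketch below shows what such a callback might look like on the parent side. It is not part of this series: the my_vdpa_dev structure, its vq_affinity array and all function names are hypothetical, and the set_vq_affinity signature is assumed to match the one used by the patch.

/*
 * Hedged sketch (not part of the patch): a parent vDPA driver's
 * set_vq_affinity callback.  "my_vdpa_dev", its "vq_affinity" array and
 * the function names are made up for illustration.
 */
#include <linux/cpumask.h>
#include <linux/vdpa.h>

struct my_vdpa_dev {
	struct vdpa_device vdpa;
	struct cpumask *vq_affinity;	/* one mask per virtqueue */
	u16 nvqs;
};

static int my_vdpa_set_vq_affinity(struct vdpa_device *vdev, u16 idx,
				   const struct cpumask *cpu_mask)
{
	struct my_vdpa_dev *dev = container_of(vdev, struct my_vdpa_dev, vdpa);

	if (idx >= dev->nvqs)
		return -EINVAL;

	/*
	 * Remember the mask; a VDUSE-like parent could forward it to
	 * userspace so the server thread handling this queue runs there.
	 */
	cpumask_copy(&dev->vq_affinity[idx], cpu_mask);
	return 0;
}

static const struct vdpa_config_ops my_vdpa_config_ops = {
	/* ...the mandatory ops are omitted in this sketch... */
	.set_vq_affinity = my_vdpa_set_vq_affinity,
};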
Comments
On Thu, Mar 23, 2023 at 1:31 PM Xie Yongji <xieyongji@bytedance.com> wrote:
>
> To support interrupt affinity spreading mechanism,
> this makes use of group_cpus_evenly() to create
> an irq callback affinity mask for each virtqueue
> of vdpa device. Then we will unify set_vq_affinity
> callback to pass the affinity to the vdpa device driver.
>
> Signed-off-by: Xie Yongji <xieyongji@bytedance.com>

Thinking hard of all the logics, I think I've found something interesting.

Commit ad71473d9c437 ("virtio_blk: use virtio IRQ affinity") tries to
pass irq_affinity to transport specific find_vqs(). This seems a layer
violation since driver has no knowledge of

1) whether or not the callback is based on an IRQ
2) whether or not the device is a PCI or not (the details are hided by
   the transport driver)
3) how many vectors could be used by a device

This means the driver can't actually pass a real affinity masks so the
commit passes a zero irq affinity structure as a hint in fact, so the
PCI layer can build a default affinity based that groups cpus evenly
based on the number of MSI-X vectors (the core logic is the
group_cpus_evenly). I think we should fix this by replacing the
irq_affinity structure with

1) a boolean like auto_cb_spreading

or

2) queue to cpu mapping

So each transport can do its own logic based on that. Then virtio-vDPA
can pass that policy to VDUSE where we only need a group_cpus_evenly()
and avoid duplicating irq_create_affinity_masks()?

Thanks
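
For readers following the alternative sketched above (a per-queue boolean such as auto_cb_spreading instead of a struct irq_affinity), the fragment below is a purely illustrative sketch of how virtio-vdpa could honour such a hint. Nothing in it exists in the kernel: the function name and parameters are invented, and it simply reuses group_cpus_evenly() the same way the patch does.

/*
 * Purely illustrative sketch of the boolean-per-queue idea; none of this
 * exists in the kernel.  auto_cb_spreading[] comes from the proposal above,
 * everything else is invented.  For simplicity the masks are spread over
 * all nvqs queues; a real implementation would only count the queues that
 * opted in.
 */
#include <linux/group_cpus.h>
#include <linux/slab.h>
#include <linux/vdpa.h>

static int example_spread_cb_affinity(struct vdpa_device *vdpa,
				      const struct vdpa_config_ops *ops,
				      unsigned int nvqs,
				      const bool auto_cb_spreading[])
{
	/* One evenly spread CPU group per queue, as in the patch. */
	struct cpumask *masks = group_cpus_evenly(nvqs);
	unsigned int i;

	if (!masks)
		return -ENOMEM;

	for (i = 0; i < nvqs; i++) {
		/*
		 * Queues that opt out (e.g. control or event queues) keep
		 * their default affinity; the rest get a spread mask.
		 */
		if (auto_cb_spreading[i])
			ops->set_vq_affinity(vdpa, i, &masks[i]);
	}

	kfree(masks);
	return 0;
}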
On Fri, Mar 24, 2023 at 02:27:52PM +0800, Jason Wang wrote:
> On Thu, Mar 23, 2023 at 1:31 PM Xie Yongji <xieyongji@bytedance.com> wrote:
> > [...]
>
> Thinking hard of all the logics, I think I've found something interesting.
>
> Commit ad71473d9c437 ("virtio_blk: use virtio IRQ affinity") tries to
> pass irq_affinity to transport specific find_vqs(). This seems a layer
> violation since driver has no knowledge of
> [...]
> I think we should fix this by replacing the irq_affinity structure with
>
> 1) a boolean like auto_cb_spreading
>
> or
>
> 2) queue to cpu mapping
>
> So each transport can do its own logic based on that. Then virtio-vDPA
> can pass that policy to VDUSE where we only need a group_cpus_evenly()
> and avoid duplicating irq_create_affinity_masks()?
>
> Thanks

I don't really understand what you propose. Care to post a patch?
Also does it have to block this patchset or can it be done on top?
On Fri, Mar 24, 2023 at 2:28 PM Jason Wang <jasowang@redhat.com> wrote:
> [...]
> This means the driver can't actually pass a real affinity masks so the
> commit passes a zero irq affinity structure as a hint in fact, so the
> PCI layer can build a default affinity based that groups cpus evenly
> based on the number of MSI-X vectors (the core logic is the
> group_cpus_evenly). I think we should fix this by replacing the
> irq_affinity structure with
>
> 1) a boolean like auto_cb_spreading
>
> or
>
> 2) queue to cpu mapping

But only the driver knows which queues are used in the control path
which don't need the automatic irq affinity assignment. So I think the
irq_affinity structure can only be created by device drivers and
passed to the virtio-pci/virtio-vdpa driver.

> So each transport can do its own logic based on that. Then virtio-vDPA
> can pass that policy to VDUSE where we only need a group_cpus_evenly()
> and avoid duplicating irq_create_affinity_masks()?

I don't get why we would have duplicated irq_create_affinity_masks().

Thanks,
Yongji
On Tue, Mar 28, 2023 at 11:03 AM Yongji Xie <xieyongji@bytedance.com> wrote:
> On Fri, Mar 24, 2023 at 2:28 PM Jason Wang <jasowang@redhat.com> wrote:
> > [...]
> > I think we should fix this by replacing the irq_affinity structure with
> >
> > 1) a boolean like auto_cb_spreading
> >
> > or
> >
> > 2) queue to cpu mapping
>
> But only the driver knows which queues are used in the control path
> which don't need the automatic irq affinity assignment.

Is this knowledge awarded by the transport driver now?

E.g virtio-blk uses:

struct irq_affinity desc = { 0, };

Atleast we can tell the transport driver which vq requires automatic
irq affinity.

> So I think the
> irq_affinity structure can only be created by device drivers and
> passed to the virtio-pci/virtio-vdpa driver.

This could be not easy since the driver doesn't even know how many
interrupts will be used by the transport driver, so it can't built the
actual affinity structure.

> > So each transport can do its own logic based on that. Then virtio-vDPA
> > can pass that policy to VDUSE where we only need a group_cpus_evenly()
> > and avoid duplicating irq_create_affinity_masks()?
>
> I don't get why we would have duplicated irq_create_affinity_masks().

I meant the create_affinity_masks() in patch 3 seems a duplication of
irq_create_affinity_masks().

Thanks
On Tue, Mar 28, 2023 at 11:14 AM Jason Wang <jasowang@redhat.com> wrote:
> On Tue, Mar 28, 2023 at 11:03 AM Yongji Xie <xieyongji@bytedance.com> wrote:
> > [...]
> > But only the driver knows which queues are used in the control path
> > which don't need the automatic irq affinity assignment.
>
> Is this knowledge awarded by the transport driver now?

This knowledge is awarded by the device driver rather than the transport driver.

E.g. virtio-scsi uses:

struct irq_affinity desc = { .pre_vectors = 2 }; // vq0 is control queue, vq1 is event queue

> E.g virtio-blk uses:
>
> struct irq_affinity desc = { 0, };
>
> Atleast we can tell the transport driver which vq requires automatic
> irq affinity.

I think that is what the current implementation does.

> > So I think the
> > irq_affinity structure can only be created by device drivers and
> > passed to the virtio-pci/virtio-vdpa driver.
>
> This could be not easy since the driver doesn't even know how many
> interrupts will be used by the transport driver, so it can't built the
> actual affinity structure.

The actual affinity mask is built by the transport driver, device
driver only passes a hint on which queues don't need the automatic irq
affinity assignment.

Thanks,
Yongji
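
To make the .pre_vectors hint concrete, here is a small worked example fed through create_affinity_masks() as added by this patch; the queue count of six is assumed purely for illustration.

/*
 * Worked example of the virtio-scsi style hint above, fed through
 * create_affinity_masks() as added by this patch.  With six virtqueues
 * and .pre_vectors = 2 (vq0 = control queue, vq1 = event queue), affvecs
 * becomes 4 and the result is laid out as:
 *
 *   masks[0], masks[1]   all CPUs (cpumask_setall), no spreading
 *   masks[2]..masks[5]   copied from group_cpus_evenly(4)
 *
 * virtio_vdpa_find_vqs() then hands each mask to ops->set_vq_affinity(),
 * so only the four request queues get an evenly spread affinity.
 */
struct irq_affinity desc = { .pre_vectors = 2 };
struct cpumask *masks = create_affinity_masks(6, &desc);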
On Tue, Mar 28, 2023 at 11:33 AM Yongji Xie <xieyongji@bytedance.com> wrote:
> On Tue, Mar 28, 2023 at 11:14 AM Jason Wang <jasowang@redhat.com> wrote:
> > [...]
> > Is this knowledge awarded by the transport driver now?
>
> This knowledge is awarded by the device driver rather than the transport driver.
>
> E.g. virtio-scsi uses:
>
> struct irq_affinity desc = { .pre_vectors = 2 }; // vq0 is control
> queue, vq1 is event queue

Ok, but it only works as a hint, it's not a real affinity. As replied,
we can pass an array of boolean in this case then transport driver
knows it doesn't need to use automatic affinity for the first two
queues.

> > This could be not easy since the driver doesn't even know how many
> > interrupts will be used by the transport driver, so it can't built the
> > actual affinity structure.
>
> The actual affinity mask is built by the transport driver,

For PCI yes, it talks directly to the IRQ subsystems.

> device
> driver only passes a hint on which queues don't need the automatic irq
> affinity assignment.

But not for virtio-vDPA since the IRQ needs to be dealt with by the
parent driver. For our case, it's the VDUSE where it doesn't need IRQ
at all, a queue to cpu mapping is sufficient.

Thanks
On Tue, Mar 28, 2023 at 11:44 AM Jason Wang <jasowang@redhat.com> wrote:
> On Tue, Mar 28, 2023 at 11:33 AM Yongji Xie <xieyongji@bytedance.com> wrote:
> > [...]
> > E.g. virtio-scsi uses:
> >
> > struct irq_affinity desc = { .pre_vectors = 2 }; // vq0 is control
> > queue, vq1 is event queue
>
> Ok, but it only works as a hint, it's not a real affinity. As replied,
> we can pass an array of boolean in this case then transport driver
> knows it doesn't need to use automatic affinity for the first two
> queues.

But we don't know whether we would use other fields in structure
irq_affinity in the future. So a full set should be better?

> > The actual affinity mask is built by the transport driver,
>
> For PCI yes, it talks directly to the IRQ subsystems.
>
> > device
> > driver only passes a hint on which queues don't need the automatic irq
> > affinity assignment.
>
> But not for virtio-vDPA since the IRQ needs to be dealt with by the
> parent driver. For our case, it's the VDUSE where it doesn't need IRQ
> at all, a queue to cpu mapping is sufficient.

The device driver doesn't know whether it is binded to virtio-pci or
virtio-vdpa. So it should pass a full set needed by the automatic irq
affinity assignment instead of a subset. Then virtio-vdpa can choose
to pass a queue to cpu mapping to VDUSE, which is what we do now (use
set_vq_affinity()).

Thanks,
Yongji
On Tue, Mar 28, 2023 at 12:05 PM Yongji Xie <xieyongji@bytedance.com> wrote:
> On Tue, Mar 28, 2023 at 11:44 AM Jason Wang <jasowang@redhat.com> wrote:
> > [...]
> > Ok, but it only works as a hint, it's not a real affinity. As replied,
> > we can pass an array of boolean in this case then transport driver
> > knows it doesn't need to use automatic affinity for the first two
> > queues.
>
> But we don't know whether we would use other fields in structure
> irq_affinity in the future. So a full set should be better?

Good point. So the issue is the calc_sets() and we probably need that
if there's a virtio driver that needs more than one set of vectors
that needs to be spreaded. Technically, we could have a virtio level
abstraction for this but I agree it's probably not worth bothering now.

> > But not for virtio-vDPA since the IRQ needs to be dealt with by the
> > parent driver. For our case, it's the VDUSE where it doesn't need IRQ
> > at all, a queue to cpu mapping is sufficient.
>
> The device driver doesn't know whether it is binded to virtio-pci or
> virtio-vdpa. So it should pass a full set needed by the automatic irq
> affinity assignment instead of a subset. Then virtio-vdpa can choose
> to pass a queue to cpu mapping to VDUSE, which is what we do now (use
> set_vq_affinity()).

Yes, so basically two ways:

1) automatic IRQ management, passing affd to find_vqs(), affinity was
determined by the transport (e.g vDPA).

2) affinity that is under the control of the driver, it needs to use
set_vq_affinity() but need to deal with cpu hotplug stuffs.

Thanks
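
As an illustration of option 2), the sketch below shows a device driver pinning its queues itself through virtqueue_set_affinity(). The round-robin policy and the function name are invented for this example, and the CPU hotplug handling a real driver would need (the cost pointed out above) is deliberately left out.

/*
 * Hedged sketch of option 2): the device driver owns the placement and
 * calls virtqueue_set_affinity() itself.  The round-robin policy below is
 * invented for illustration; CPU hotplug handling is omitted.
 */
#include <linux/cpumask.h>
#include <linux/virtio.h>
#include <linux/virtio_config.h>

static void example_pin_queues(struct virtqueue *vqs[], unsigned int nvqs)
{
	unsigned int i, cpu = cpumask_first(cpu_online_mask);

	for (i = 0; i < nvqs; i++) {
		/* Routed to the transport's set_vq_affinity when available. */
		virtqueue_set_affinity(vqs[i], cpumask_of(cpu));

		cpu = cpumask_next(cpu, cpu_online_mask);
		if (cpu >= nr_cpu_ids)
			cpu = cpumask_first(cpu_online_mask);
	}
}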
On 2023/3/24 17:12, Michael S. Tsirkin wrote:
> On Fri, Mar 24, 2023 at 02:27:52PM +0800, Jason Wang wrote:
>> [...]
>> I think we should fix this by replacing the irq_affinity structure with
>>
>> 1) a boolean like auto_cb_spreading
>>
>> or
>>
>> 2) queue to cpu mapping
>>
>> So each transport can do its own logic based on that. Then virtio-vDPA
>> can pass that policy to VDUSE where we only need a group_cpus_evenly()
>> and avoid duplicating irq_create_affinity_masks()?
>>
>> Thanks
> I don't really understand what you propose. Care to post a patch?

I meant to avoid passing irq_affinity structure in find_vqs but an
array of boolean telling us whether or not the vq requires a automatic
spreading of callbacks. But it seems less flexible.

> Also does it have to block this patchset or can it be done on top?

We can leave it in the future.

So

Acked-by: Jason Wang <jasowang@redhat.com>

Thanks
diff --git a/drivers/virtio/virtio_vdpa.c b/drivers/virtio/virtio_vdpa.c
index f72696b4c1c2..f3826f42b704 100644
--- a/drivers/virtio/virtio_vdpa.c
+++ b/drivers/virtio/virtio_vdpa.c
@@ -13,6 +13,7 @@
 #include <linux/kernel.h>
 #include <linux/slab.h>
 #include <linux/uuid.h>
+#include <linux/group_cpus.h>
 #include <linux/virtio.h>
 #include <linux/vdpa.h>
 #include <linux/virtio_config.h>
@@ -272,6 +273,66 @@ static void virtio_vdpa_del_vqs(struct virtio_device *vdev)
 		virtio_vdpa_del_vq(vq);
 }
 
+static void default_calc_sets(struct irq_affinity *affd, unsigned int affvecs)
+{
+	affd->nr_sets = 1;
+	affd->set_size[0] = affvecs;
+}
+
+static struct cpumask *
+create_affinity_masks(unsigned int nvecs, struct irq_affinity *affd)
+{
+	unsigned int affvecs = 0, curvec, usedvecs, i;
+	struct cpumask *masks = NULL;
+
+	if (nvecs > affd->pre_vectors + affd->post_vectors)
+		affvecs = nvecs - affd->pre_vectors - affd->post_vectors;
+
+	if (!affd->calc_sets)
+		affd->calc_sets = default_calc_sets;
+
+	affd->calc_sets(affd, affvecs);
+
+	if (!affvecs)
+		return NULL;
+
+	masks = kcalloc(nvecs, sizeof(*masks), GFP_KERNEL);
+	if (!masks)
+		return NULL;
+
+	/* Fill out vectors at the beginning that don't need affinity */
+	for (curvec = 0; curvec < affd->pre_vectors; curvec++)
+		cpumask_setall(&masks[curvec]);
+
+	for (i = 0, usedvecs = 0; i < affd->nr_sets; i++) {
+		unsigned int this_vecs = affd->set_size[i];
+		int j;
+		struct cpumask *result = group_cpus_evenly(this_vecs);
+
+		if (!result) {
+			kfree(masks);
+			return NULL;
+		}
+
+		for (j = 0; j < this_vecs; j++)
+			cpumask_copy(&masks[curvec + j], &result[j]);
+		kfree(result);
+
+		curvec += this_vecs;
+		usedvecs += this_vecs;
+	}
+
+	/* Fill out vectors at the end that don't need affinity */
+	if (usedvecs >= affvecs)
+		curvec = affd->pre_vectors + affvecs;
+	else
+		curvec = affd->pre_vectors + usedvecs;
+	for (; curvec < nvecs; curvec++)
+		cpumask_setall(&masks[curvec]);
+
+	return masks;
+}
+
 static int virtio_vdpa_find_vqs(struct virtio_device *vdev, unsigned int nvqs,
 				struct virtqueue *vqs[],
 				vq_callback_t *callbacks[],
@@ -282,9 +343,15 @@ static int virtio_vdpa_find_vqs(struct virtio_device *vdev, unsigned int nvqs,
 	struct virtio_vdpa_device *vd_dev = to_virtio_vdpa_device(vdev);
 	struct vdpa_device *vdpa = vd_get_vdpa(vdev);
 	const struct vdpa_config_ops *ops = vdpa->config;
+	struct irq_affinity default_affd = { 0 };
+	struct cpumask *masks;
 	struct vdpa_callback cb;
 	int i, err, queue_idx = 0;
 
+	masks = create_affinity_masks(nvqs, desc ? desc : &default_affd);
+	if (!masks)
+		return -ENOMEM;
+
 	for (i = 0; i < nvqs; ++i) {
 		if (!names[i]) {
 			vqs[i] = NULL;
@@ -298,6 +365,7 @@ static int virtio_vdpa_find_vqs(struct virtio_device *vdev, unsigned int nvqs,
 			err = PTR_ERR(vqs[i]);
 			goto err_setup_vq;
 		}
+		ops->set_vq_affinity(vdpa, i, &masks[i]);
 	}
 
 	cb.callback = virtio_vdpa_config_cb;
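
For completeness, a hedged sketch of the driver side that ends up in this new code path: an ordinary virtio driver passes its irq_affinity hint to virtio_find_vqs(), and with this patch that hint is also turned into per-virtqueue masks when the device sits behind virtio-vdpa. The two-queue layout, the callback and queue names, and the function name are invented for illustration.

/*
 * Hedged driver-side sketch: an ordinary virtio driver passing its
 * irq_affinity hint to virtio_find_vqs().  The device layout is invented;
 * with this patch the same hint now also produces per-virtqueue masks
 * behind virtio-vdpa.
 */
#include <linux/interrupt.h>
#include <linux/virtio.h>
#include <linux/virtio_config.h>

static void example_rx_done(struct virtqueue *vq) { }
static void example_tx_done(struct virtqueue *vq) { }

static int example_probe_vqs(struct virtio_device *vdev,
			     struct virtqueue *vqs[2])
{
	vq_callback_t *callbacks[] = { example_rx_done, example_tx_done };
	static const char * const names[] = { "rx", "tx" };
	struct irq_affinity desc = { 0 };

	/*
	 * On virtio-pci this hint shapes the MSI-X affinity; on virtio-vdpa
	 * it now reaches create_affinity_masks() and, per queue,
	 * ops->set_vq_affinity().
	 */
	return virtio_find_vqs(vdev, 2, vqs, callbacks, names, &desc);
}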