Message ID | 20221205085846.741-1-xieyongji@bytedance.com |
---|---|
State | New |
Headers |
From: Xie Yongji <xieyongji@bytedance.com>
To: mst@redhat.com, jasowang@redhat.com, tglx@linutronix.de, hch@lst.de
Cc: virtualization@lists.linux-foundation.org, linux-kernel@vger.kernel.org
Subject: [PATCH v2 06/11] vduse: Support automatic irq callback affinity
Date: Mon, 5 Dec 2022 16:58:41 +0800
Message-Id: <20221205085846.741-1-xieyongji@bytedance.com>
In-Reply-To: <20221205084127.535-1-xieyongji@bytedance.com>
References: <20221205084127.535-1-xieyongji@bytedance.com> |
Series | VDUSE: Improve performance |
Commit Message
Yongji Xie
Dec. 5, 2022, 8:58 a.m. UTC
This brings the current interrupt affinity spreading mechanism
to the vduse device. We make use of irq_create_affinity_masks()
to create an irq callback affinity mask for each virtqueue of the
vduse device. Then we choose the CPU with the lowest number of
allocated interrupts in the affinity mask to run the irq callback.
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
---
drivers/vdpa/vdpa_user/vduse_dev.c | 50 ++++++++++++++++++++++++++++++
1 file changed, 50 insertions(+)
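
To make the algorithm above concrete, here is a minimal userspace sketch of the least-loaded-CPU walk the commit message describes. It is an illustration only, assuming a fixed-size counter array in place of the kernel's per-CPU variable and treating every CPU as online; the real logic is vduse_vq_update_effective_cpu() in the diff below.

#include <stdint.h>
#include <limits.h>

#define NR_CPUS 8

/* Stand-in for DEFINE_PER_CPU(unsigned long, vduse_allocated_irq). */
static unsigned long allocated_irqs[NR_CPUS];

/*
 * Pick the CPU in 'mask' (bit i == CPU i) with the fewest allocated
 * irq callbacks, releasing the previous assignment first. Returns -1
 * if the mask is empty.
 */
static int pick_effective_cpu(uint64_t mask, int prev_cpu)
{
	unsigned long allocated_min = ULONG_MAX;
	int cpu, best_cpu = -1;

	if (prev_cpu >= 0)
		allocated_irqs[prev_cpu]--;	/* drop the old slot */

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!(mask & (1ULL << cpu)))
			continue;		/* outside the affinity mask */
		if (allocated_irqs[cpu] >= allocated_min)
			continue;		/* not an improvement */
		best_cpu = cpu;
		allocated_min = allocated_irqs[cpu];
	}
	if (best_cpu >= 0)
		allocated_irqs[best_cpu]++;	/* account the new choice */
	return best_cpu;
}

Calling this once per virtqueue after the affinity masks are spread keeps the per-CPU counts balanced the same way the patch does.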
Comments
On Mon, Dec 5, 2022 at 4:59 PM Xie Yongji <xieyongji@bytedance.com> wrote:
>
> This brings the current interrupt affinity spreading mechanism
> to the vduse device. We make use of irq_create_affinity_masks()
> to create an irq callback affinity mask for each virtqueue of the
> vduse device. Then we choose the CPU with the lowest number of
> allocated interrupts in the affinity mask to run the irq callback.

This seems to be a balancing mechanism, but that might not be the
semantics of affinity -- or is there any reason we need to do this? I
guess we should use at least round-robin in this case.

> Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
> ---
>  drivers/vdpa/vdpa_user/vduse_dev.c | 50 ++++++++++++++++++++++++++++++
>  1 file changed, 50 insertions(+)
>
> [...]
> @@ -58,6 +59,8 @@ struct vduse_virtqueue {
>  	struct work_struct inject;
>  	struct work_struct kick;
>  	int irq_effective_cpu;
> +	struct cpumask irq_affinity;
> +	spinlock_t irq_affinity_lock;

Ok, I'd suggest squashing this into patch 5 to make it easier to
review.

> [...]
> +	affd = irq_create_affinity_masks(dev->vq_num, desc);
> +	if (!affd)

Let's add a comment on the vdpa config ops to say set_irq_affinity()
is best effort.

Thanks

> [...]
On Fri, Dec 16, 2022 at 1:30 PM Jason Wang <jasowang@redhat.com> wrote:
>
> On Mon, Dec 5, 2022 at 4:59 PM Xie Yongji <xieyongji@bytedance.com> wrote:
> >
> > This brings the current interrupt affinity spreading mechanism
> > to the vduse device. We make use of irq_create_affinity_masks()
> > to create an irq callback affinity mask for each virtqueue of the
> > vduse device. Then we choose the CPU with the lowest number of
> > allocated interrupts in the affinity mask to run the irq callback.
>
> This seems to be a balancing mechanism, but that might not be the
> semantics of affinity -- or is there any reason we need to do this? I
> guess we should use at least round-robin in this case.
>

Here we try to follow the PCI interrupt management mechanism. In VM
cases, the interrupt should always be triggered on one specific CPU
rather than on each CPU in turn.

> > [...]
> > +	struct cpumask irq_affinity;
> > +	spinlock_t irq_affinity_lock;
>
> Ok, I'd suggest squashing this into patch 5 to make it easier to
> review.
>

OK.

> > [...]
> > +	affd = irq_create_affinity_masks(dev->vq_num, desc);
> > +	if (!affd)
>
> Let's add a comment on the vdpa config ops to say set_irq_affinity()
> is best effort.
>

OK.

Thanks,
Yongji
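
To illustrate what that could look like -- the exact wording and placement are an assumption, not something posted in this thread -- the kernel-doc for struct vdpa_config_ops in include/linux/vdpa.h might gain an entry along these lines (the callback signature matches the one used by the patch below):

/**
 * (hypothetical excerpt from the struct vdpa_config_ops kernel-doc)
 *
 * @set_irq_affinity:		Set the irq (callback) affinity for the
 *				device's virtqueues (optional).
 *				Best effort: the parent driver may
 *				silently ignore the hint, e.g. when
 *				irq_create_affinity_masks() fails.
 *				@vdev: vdpa device
 *				@desc: irq affinity hint
 */
struct vdpa_config_ops {
	/* ... other ops elided ... */
	void (*set_irq_affinity)(struct vdpa_device *vdev,
				 struct irq_affinity *desc);
};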
On Mon, Dec 19, 2022 at 12:56 PM Yongji Xie <xieyongji@bytedance.com> wrote:
>
> On Fri, Dec 16, 2022 at 1:30 PM Jason Wang <jasowang@redhat.com> wrote:
> >
> > [...]
> > This seems to be a balancing mechanism, but that might not be the
> > semantics of affinity -- or is there any reason we need to do this? I
> > guess we should use at least round-robin in this case.
> >
>
> Here we try to follow the PCI interrupt management mechanism. In VM
> cases, the interrupt should always be triggered on one specific CPU
> rather than on each CPU in turn.

If I am not wrong, when using MSI, most archs allow not only a cpuid
as the destination but also policies like round-robin and lowest
priority first.

Thanks

> [...]
On Tue, Dec 20, 2022 at 2:32 PM Jason Wang <jasowang@redhat.com> wrote:
>
> [...]
> > Here we try to follow the PCI interrupt management mechanism. In VM
> > cases, the interrupt should always be triggered on one specific CPU
> > rather than on each CPU in turn.
>
> If I am not wrong, when using MSI, most archs allow not only a cpuid
> as the destination but also policies like round-robin and lowest
> priority first.
>

I see. I think we can remove the irq effective affinity and just use
the irq affinity. If the irq affinity mask contains more than one CPU,
we can use round-robin to spread IRQs between CPUs. But the sysfs
interface for the irq affinity should be writable, so that someone who
wants to stop round-robin can pick one CPU to run the irq callback.

Thanks,
Yongji
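
A minimal sketch of the round-robin variant floated here, reusing this patch's vq->irq_effective_cpu as a cursor into vq->irq_affinity; this is one possible shape under those assumptions, not code from the posted series:

/*
 * Hedged sketch only: round-robin selection over the affinity mask.
 * cpumask_next()/cpumask_first()/nr_cpu_ids are the standard kernel
 * cpumask helpers; irq_effective_cpu starts at -1, so the first call
 * picks the first CPU in the mask.
 */
static void vduse_vq_update_effective_cpu(struct vduse_virtqueue *vq)
{
	unsigned int cpu;

	spin_lock(&vq->irq_affinity_lock);

	/* Advance to the next CPU in the affinity mask, wrapping around. */
	cpu = cpumask_next(vq->irq_effective_cpu, &vq->irq_affinity);
	if (cpu >= nr_cpu_ids)
		cpu = cpumask_first(&vq->irq_affinity);

	vq->irq_effective_cpu = cpu;

	spin_unlock(&vq->irq_affinity_lock);
}

This drops the per-CPU allocation counters entirely: spreading then comes from the rotation itself rather than from tracking load.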
diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
index d126f3e32a20..90c2896039d9 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -23,6 +23,7 @@
 #include <linux/nospec.h>
 #include <linux/vmalloc.h>
 #include <linux/sched/mm.h>
+#include <linux/interrupt.h>
 #include <uapi/linux/vduse.h>
 #include <uapi/linux/vdpa.h>
 #include <uapi/linux/virtio_config.h>
@@ -58,6 +59,8 @@ struct vduse_virtqueue {
 	struct work_struct inject;
 	struct work_struct kick;
 	int irq_effective_cpu;
+	struct cpumask irq_affinity;
+	spinlock_t irq_affinity_lock;
 };
 
 struct vduse_dev;
@@ -123,6 +126,7 @@ struct vduse_control {
 
 static DEFINE_MUTEX(vduse_lock);
 static DEFINE_IDR(vduse_idr);
+static DEFINE_PER_CPU(unsigned long, vduse_allocated_irq);
 
 static dev_t vduse_major;
 static struct class *vduse_class;
@@ -710,6 +714,49 @@ static u32 vduse_vdpa_get_generation(struct vdpa_device *vdpa)
 	return dev->generation;
 }
 
+static void vduse_vq_update_effective_cpu(struct vduse_virtqueue *vq)
+{
+	unsigned int cpu, best_cpu;
+	unsigned long allocated, allocated_min = UINT_MAX;
+
+	spin_lock(&vq->irq_affinity_lock);
+
+	best_cpu = vq->irq_effective_cpu;
+	if (best_cpu != -1)
+		per_cpu(vduse_allocated_irq, best_cpu) -= 1;
+
+	for_each_cpu(cpu, &vq->irq_affinity) {
+		allocated = per_cpu(vduse_allocated_irq, cpu);
+		if (!cpu_online(cpu) || allocated >= allocated_min)
+			continue;
+
+		best_cpu = cpu;
+		allocated_min = allocated;
+	}
+	vq->irq_effective_cpu = best_cpu;
+	per_cpu(vduse_allocated_irq, best_cpu) += 1;
+
+	spin_unlock(&vq->irq_affinity_lock);
+}
+
+static void vduse_vdpa_set_irq_affinity(struct vdpa_device *vdpa,
+					struct irq_affinity *desc)
+{
+	struct vduse_dev *dev = vdpa_to_vduse(vdpa);
+	struct irq_affinity_desc *affd = NULL;
+	int i;
+
+	affd = irq_create_affinity_masks(dev->vq_num, desc);
+	if (!affd)
+		return;
+
+	for (i = 0; i < dev->vq_num; i++) {
+		cpumask_copy(&dev->vqs[i]->irq_affinity, &affd[i].mask);
+		vduse_vq_update_effective_cpu(dev->vqs[i]);
+	}
+	kfree(affd);
+}
+
 static int vduse_vdpa_set_map(struct vdpa_device *vdpa,
 			      unsigned int asid,
 			      struct vhost_iotlb *iotlb)
@@ -760,6 +807,7 @@ static const struct vdpa_config_ops vduse_vdpa_config_ops = {
 	.get_config = vduse_vdpa_get_config,
 	.set_config = vduse_vdpa_set_config,
 	.get_generation = vduse_vdpa_get_generation,
+	.set_irq_affinity = vduse_vdpa_set_irq_affinity,
 	.reset = vduse_vdpa_reset,
 	.set_map = vduse_vdpa_set_map,
 	.free = vduse_vdpa_free,
@@ -1380,6 +1428,8 @@ static int vduse_dev_init_vqs(struct vduse_dev *dev, u32 vq_align, u32 vq_num)
 		INIT_WORK(&dev->vqs[i]->kick, vduse_vq_kick_work);
 		spin_lock_init(&dev->vqs[i]->kick_lock);
 		spin_lock_init(&dev->vqs[i]->irq_lock);
+		spin_lock_init(&dev->vqs[i]->irq_affinity_lock);
+		cpumask_setall(&dev->vqs[i]->irq_affinity);
 	}
 
 	return 0;
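
For context on how the chosen CPU is consumed: irq_effective_cpu is introduced by patch 5 of this series, which is not shown here, so the dispatch path below is a hedged guess. It assumes the inject work is queued with queue_work_on() onto the selected CPU; only vq->irq_effective_cpu and vq->inject are taken from this diff, the rest is an assumption:

/*
 * Hedged sketch: how the irq injection path might consume
 * vq->irq_effective_cpu (the actual code lives in patch 5).
 */
static void vduse_vq_irq_inject(struct vduse_virtqueue *vq)
{
	if (vq->irq_effective_cpu == -1)
		/* No affinity decision yet: let the scheduler pick. */
		queue_work(system_wq, &vq->inject);
	else
		/* Run the callback work on the selected CPU. */
		queue_work_on(vq->irq_effective_cpu, system_wq,
			      &vq->inject);
}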