Message ID | 20230320132316.3126838-1-korantwork@gmail.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:604a:0:0:0:0:0 with SMTP id j10csp1216389wrt; Mon, 20 Mar 2023 06:29:15 -0700 (PDT) X-Google-Smtp-Source: AK7set80i5brnK2leNz59a0rxwtRoRpPTS+NGQxdpwoVrTuXfmvpuoTxihDwsm73y1/CbDHgXvuT X-Received: by 2002:a62:7946:0:b0:625:2636:9cd2 with SMTP id u67-20020a627946000000b0062526369cd2mr19150954pfc.18.1679318955330; Mon, 20 Mar 2023 06:29:15 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1679318955; cv=none; d=google.com; s=arc-20160816; b=lpyq7sWAbqfcQjNZW15vt8FKMUhQlaptI6JA49khtoPEUbjBnqMXwevp2ZwZbvEiBb l/WgoLZR1XQd1b73vj4kgoiGM3Un6jwMAM5WiamQMZaXniB2AT+7O6VsgNrSYtFCYwSK f2K0Mb3CiEN8JDsyIU+ByKDZ0I7cxCKwcsSxmQLDL7u//kYYocDy96Ksybz/UCN6DWnf dKkMtqoMusH2sc+1H2jf4LR46RXHnOSz2IIiwyIq0LS5RxCLKYQi6pPGM9lquUJDNfTf Yn4PvbS3rHKPyL6KGibp/tzsuyPVnPPqds7CwF46nlp8mt9z1jgNuXm8mqX50097V2Ch Sk2A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from:dkim-signature; bh=LwCDKhpVotUuQMaBr5jTrfbX68cNb9ff470+ysZxzzQ=; b=GynnQl+tlwMlhptx59yaoW+8dezGRI6cgPu7EyJduIvFwz5JJ/NrwwPGyYvPiu/GH7 L0UVMJ666l4Lp87nSl1ZKGgj/yjegcSiimiA6+DUivYp+Wrq1Ls8oDLTZf6/jPRSLAtE 7BX6HQZOCV5W2k78g0SkU4ti67mSnP9bxjSYJPu60NGtrkOiEm79m43ffKz9CVlLspOt xpfqDE4Vds4MxxFW4G13v35aARhcsuQi8CIpkGiVH2yrdEH2+qUabSmp5uczgz+7rk6K VFYaHfh0Jpe9hj/lYr8iGbGIiWt0aPCBC0aHLIMjOG70ROPFWsKNyylfTMKpB2caUjbs KqpA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20210112 header.b=DbPUt+Ef; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id z4-20020aa78884000000b00622a0ec4624si11635589pfe.9.2023.03.20.06.29.03; Mon, 20 Mar 2023 06:29:15 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20210112 header.b=DbPUt+Ef; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231666AbjCTNXc (ORCPT <rfc822;pusanteemu@gmail.com> + 99 others); Mon, 20 Mar 2023 09:23:32 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51418 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231732AbjCTNXY (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Mon, 20 Mar 2023 09:23:24 -0400 Received: from mail-pj1-x1044.google.com (mail-pj1-x1044.google.com [IPv6:2607:f8b0:4864:20::1044]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AA3A12700; Mon, 20 Mar 2023 06:23:21 -0700 (PDT) Received: by mail-pj1-x1044.google.com with SMTP id o6-20020a17090a9f8600b0023f32869993so12550762pjp.1; Mon, 20 Mar 2023 06:23:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; t=1679318600; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:from:to:cc:subject:date:message-id:reply-to; bh=LwCDKhpVotUuQMaBr5jTrfbX68cNb9ff470+ysZxzzQ=; b=DbPUt+EfMJ+gN+n6cCxeLDCHK554tVu7xveqEz1ELZYrSZU538T3sIDkOC2iEjLIj3 vxyOtqP1ORPBGGeWWY46uhEHukWiZdqTrcJYK4WmRBV+BqNwsat4pGHhP2lpFDF6X8qY wBmIMBPiigRbDggFkUwax/ZeAKqVtNKlUN7cLHY58OLU9kqmdvMT70JBWt3eNt0b6chR 9o8ugYkJBCfDbaPcUQ4/rc0MjCMCgNjWj+2RYSMV4u2rVtzUGNU0GTJk1Nt6ypxmTN9A HRbkkDGmm7Xir+AWDuEMejFwBSvZp0Vqijsia2SVlSi3FPDyy4xAD4239RbBglvz8Ds/ Ynow== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1679318600; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=LwCDKhpVotUuQMaBr5jTrfbX68cNb9ff470+ysZxzzQ=; b=LwVGFqdirmwgB2zMNLQM/y/VhtGJBVP/4pGZWQoVAaviilX7VJfoW26awcJz0jAYbI 10onTQgrZeImbOtG7b8oMMS3DV+ItLeeZVZdDNxTTxy1ksMjwKhYql9D/Fy4FbDRlnJ0 6mHOKb7ws0RmGnatenb/RqgvFOSBUxW4mpm9SBDb3uMlSuHy5gxnb5t3rZNRFh9dTn9H glDmQSWQh2zWMXM+Xqw8HnBoxG00ZpFIpQmdvg3lixf4T5kMynrEOVlab9yELXiUeKLH 3F0uB5S8Remqrq/gBd7IEMG6NQwTQkoaIFsrJEI42Jp2Dmv1t++rYJt6RG8SaTsen+vx MqWw== X-Gm-Message-State: AO0yUKUlh2MbdCFNRKwYeOidRzcYSHI2skSUeZvSinovnB7zW8GH9qXQ xzqJd+tk5fbdBLc+fmEVhPk= X-Received: by 2002:a17:90b:380d:b0:232:fa13:4453 with SMTP id mq13-20020a17090b380d00b00232fa134453mr19012364pjb.13.1679318600483; Mon, 20 Mar 2023 06:23:20 -0700 (PDT) Received: from localhost.localdomain ([43.132.141.4]) by smtp.gmail.com with ESMTPSA id a19-20020a17090aa51300b00233db0db3dfsm9824478pjq.7.2023.03.20.06.23.18 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 20 Mar 2023 06:23:20 -0700 (PDT) From: korantwork@gmail.com To: helgaas@kernel.org, nirmal.patel@linux.intel.com, kbusch@kernel.org, jonathan.derrick@linux.dev, lpieralisi@kernel.org Cc: linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org, Xinghui Li <korantli@tencent.com> Subject: [PATCH v4] PCI: vmd: Add the module param to adjust MSI mode Date: Mon, 20 Mar 2023 21:23:16 +0800 Message-Id: <20230320132316.3126838-1-korantwork@gmail.com> X-Mailer: git-send-email 2.31.1 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM, RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1760527389265453890?= X-GMAIL-MSGID: =?utf-8?q?1760893552735914374?= |
Series |
[v4] PCI: vmd: Add the module param to adjust MSI mode
|
|
Commit Message
Xinghui Li
March 20, 2023, 1:23 p.m. UTC
From: Xinghui Li <korantli@tencent.com> In the legacy, the vmd MSI mode can only be adjusted by configuring vmd_ids table. This patch adds another way to adjust MSI mode by adjusting module param, which allows users easier to adjust the vmd according to the I/O scenario without rebuilding driver. There are two params that could be recognized: on, off. The default param is NULL, the goal is not to effect the existing settings of the device. Signed-off-by: Xinghui Li <korantli@tencent.com> Reviewed-by: Nirmal Patel <nirmal.patel@linux.intel.com> --- drivers/pci/controller/vmd.c | 27 +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+)
Comments
Friendly ping~ On Mon, Mar 20, 2023 at 9:23 PM <korantwork@gmail.com> wrote: > > From: Xinghui Li <korantli@tencent.com> > > In the legacy, the vmd MSI mode can only be adjusted by configuring > vmd_ids table. This patch adds another way to adjust MSI mode by > adjusting module param, which allows users easier to adjust the vmd > according to the I/O scenario without rebuilding driver. There are two > params that could be recognized: on, off. The default param is NULL, > the goal is not to effect the existing settings of the device. > > Signed-off-by: Xinghui Li <korantli@tencent.com> > Reviewed-by: Nirmal Patel <nirmal.patel@linux.intel.com> > --- > drivers/pci/controller/vmd.c | 27 +++++++++++++++++++++++++++ > 1 file changed, 27 insertions(+) > > diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c > index 990630ec57c6..fb61181baa9e 100644 > --- a/drivers/pci/controller/vmd.c > +++ b/drivers/pci/controller/vmd.c > @@ -34,6 +34,19 @@ > #define MB2_SHADOW_OFFSET 0x2000 > #define MB2_SHADOW_SIZE 16 > > +/* > + * The VMD msi_remap module parameter provides the alternative way > + * to adjust MSI mode when loading vmd.ko other than vmd_ids table. > + * There are two params could be recognized: > + * > + * off: disable MSI remapping > + * on: enable MSI remapping > + * > + */ > +static char *msi_remap; > +module_param(msi_remap, charp, 0444); > +MODULE_PARM_DESC(msi_remap, "Whether to enable MSI remapping function"); > + > enum vmd_features { > /* > * Device may contain registers which hint the physical location of the > @@ -875,6 +888,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features) > return ret; > > vmd_set_msi_remapping(vmd, true); > + dev_info(&vmd->dev->dev, "init vmd with remapping MSI\n"); > > ret = vmd_create_irq_domain(vmd); > if (ret) > @@ -887,6 +901,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features) > irq_domain_update_bus_token(vmd->irq_domain, DOMAIN_BUS_VMD_MSI); > } else { > vmd_set_msi_remapping(vmd, false); > + dev_info(&vmd->dev->dev, "init vmd with bypass MSI\n"); > } > > pci_add_resource(&resources, &vmd->resources[0]); > @@ -955,6 +970,16 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features) > return 0; > } > > +static void vmd_config_msi_remap_param(unsigned long *features) > +{ > + if (msi_remap) { > + if (strcmp(msi_remap, "on") == 0) > + *features &= ~(VMD_FEAT_CAN_BYPASS_MSI_REMAP); > + else if (strcmp(msi_remap, "off") == 0) > + *features |= VMD_FEAT_CAN_BYPASS_MSI_REMAP; > + } > +} > + > static int vmd_probe(struct pci_dev *dev, const struct pci_device_id *id) > { > unsigned long features = (unsigned long) id->driver_data; > @@ -984,6 +1009,8 @@ static int vmd_probe(struct pci_dev *dev, const struct pci_device_id *id) > if (err < 0) > goto out_release_instance; > > + vmd_config_msi_remap_param(&features); > + > vmd->cfgbar = pcim_iomap(dev, VMD_CFGBAR, 0); > if (!vmd->cfgbar) { > err = -ENOMEM; > -- > 2.31.1 >
On Mon, Mar 20, 2023 at 09:23:16PM +0800, korantwork@gmail.com wrote: > From: Xinghui Li <korantli@tencent.com> > > In the legacy, the vmd MSI mode can only be adjusted by configuring > vmd_ids table. This patch adds another way to adjust MSI mode by > adjusting module param, which allows users easier to adjust the vmd > according to the I/O scenario without rebuilding driver. There are two > params that could be recognized: on, off. The default param is NULL, > the goal is not to effect the existing settings of the device. "two params: on, off ... default param is NULL" doesn't read quite right because "NULL" is not a valid parameter value. I think you could just omit that last sentence completely because it's obvious that if you don't specify the parameter, it doesn't affect anything and the existing behavior is unchanged (it's determined by whether VMD_FEAT_CAN_BYPASS_MSI_REMAP is specified in vmd_ids[]). > Signed-off-by: Xinghui Li <korantli@tencent.com> > Reviewed-by: Nirmal Patel <nirmal.patel@linux.intel.com> I think Lorenzo is out of the office this week, but I'm sure he'll take care of this when he returns. In the meantime, can you include a sample usage in the commit log so folks don't have to dig through the patch and figure out how to use the parameter? It would also be nice to include a hint about why a user would choose "on" or "off". What is the performance effect? What sort of I/O scenario would lead you to choose "on" vs "off"? ee81ee84f873 ("PCI: vmd: Disable MSI-X remapping when possible") suggests that MSI-X remapping (I assume the "msi_remap=on" case): - Limits the number MSI-X vectors available to child devices to the number of VMD MSI-X vectors. - Reduces interrupt handling performance because child device interrupts have to go through the VMD MSI-X domain interrupt handler. So I assume "msi_remap=off" would remove that MSI-X vector limit and improve interrupt handling performance? But obviously there's more to consider because those are both good things and if we could do that all the time, we would. So there must be cases where we *have* to remap. ee81ee84f873 suggests that not all VMD devices support disabling remap. There's also a hint that some virt configs require it. This patch doesn't enforce either of those things. What happens if the user gets it wrong? > drivers/pci/controller/vmd.c | 27 +++++++++++++++++++++++++++ > 1 file changed, 27 insertions(+) > > diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c > index 990630ec57c6..fb61181baa9e 100644 > --- a/drivers/pci/controller/vmd.c > +++ b/drivers/pci/controller/vmd.c > @@ -34,6 +34,19 @@ > #define MB2_SHADOW_OFFSET 0x2000 > #define MB2_SHADOW_SIZE 16 > > +/* > + * The VMD msi_remap module parameter provides the alternative way > + * to adjust MSI mode when loading vmd.ko other than vmd_ids table. > + * There are two params could be recognized: > + * > + * off: disable MSI remapping > + * on: enable MSI remapping > + * Spurious blank line. > + */ > +static char *msi_remap; > +module_param(msi_remap, charp, 0444); > +MODULE_PARM_DESC(msi_remap, "Whether to enable MSI remapping function"); ee81ee84f873 mentions MSI-X explicitly (which is different from MSI), so maybe use "MSI-X" here and in the messages below to avoid any confusion. > enum vmd_features { > /* > * Device may contain registers which hint the physical location of the > @@ -875,6 +888,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features) > return ret; > > vmd_set_msi_remapping(vmd, true); > + dev_info(&vmd->dev->dev, "init vmd with remapping MSI\n"); > > ret = vmd_create_irq_domain(vmd); > if (ret) > @@ -887,6 +901,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features) > irq_domain_update_bus_token(vmd->irq_domain, DOMAIN_BUS_VMD_MSI); > } else { > vmd_set_msi_remapping(vmd, false); > + dev_info(&vmd->dev->dev, "init vmd with bypass MSI\n"); > } > > pci_add_resource(&resources, &vmd->resources[0]); > @@ -955,6 +970,16 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features) > return 0; > } > > +static void vmd_config_msi_remap_param(unsigned long *features) > +{ > + if (msi_remap) { > + if (strcmp(msi_remap, "on") == 0) > + *features &= ~(VMD_FEAT_CAN_BYPASS_MSI_REMAP); > + else if (strcmp(msi_remap, "off") == 0) > + *features |= VMD_FEAT_CAN_BYPASS_MSI_REMAP; The usual strcmp() idiom is to test "!strcmp(...)" instead of "strcmp(...) == 0)". No need for "()" around VMD_FEAT_CAN_BYPASS_MSI_REMAP. > + } > +} > + > static int vmd_probe(struct pci_dev *dev, const struct pci_device_id *id) > { > unsigned long features = (unsigned long) id->driver_data; > @@ -984,6 +1009,8 @@ static int vmd_probe(struct pci_dev *dev, const struct pci_device_id *id) > if (err < 0) > goto out_release_instance; > > + vmd_config_msi_remap_param(&features); > + > vmd->cfgbar = pcim_iomap(dev, VMD_CFGBAR, 0); > if (!vmd->cfgbar) { > err = -ENOMEM; > -- > 2.31.1 >
On Wed, Mar 29, 2023 at 5:34 AM Bjorn Helgaas <helgaas@kernel.org> wrote: > > "two params: on, off ... default param is NULL" doesn't read quite > right because "NULL" is not a valid parameter value. > > I think you could just omit that last sentence completely because it's > obvious that if you don't specify the parameter, it doesn't affect > anything and the existing behavior is unchanged (it's determined by > whether VMD_FEAT_CAN_BYPASS_MSI_REMAP is specified in vmd_ids[]). > Your opinion is more reasonable. I will omit this sentence. > > > Signed-off-by: Xinghui Li <korantli@tencent.com> > > Reviewed-by: Nirmal Patel <nirmal.patel@linux.intel.com> > > I think Lorenzo is out of the office this week, but I'm sure he'll > take care of this when he returns. > > In the meantime, can you include a sample usage in the commit log so > folks don't have to dig through the patch and figure out how to use > the parameter? > I will add instructions on how to use this parameter. > > It would also be nice to include a hint about why a user would choose > "on" or "off". What is the performance effect? What sort of I/O > scenario would lead you to choose "on" vs "off"? > Before this patch, I sent the patch named : PCI: vmd: Do not disable MSI-X remapping in VMD 28C0 controller (patchwork link: https://patchwork.kernel.org/project/linux-pci/patch/20221222072603.1175248-1-korantwork@gmail.com/) We found the 4k rand read's iops could drop 50% if 4 NVMEs were mounted in one PCIE port with VMD MSI bypass. I suppose this is because the VMD Controller can aggregate interrupts. But those test result is so long that I didn't add them to this patch commit log. If you believe it is necessary, I will try to add some simple instructions > > ee81ee84f873 ("PCI: vmd: Disable MSI-X remapping when possible") > suggests that MSI-X remapping (I assume the "msi_remap=on" case): > > - Limits the number MSI-X vectors available to child devices to the > number of VMD MSI-X vectors. > > - Reduces interrupt handling performance because child device > interrupts have to go through the VMD MSI-X domain interrupt > handler. > > So I assume "msi_remap=off" would remove that MSI-X vector limit and > improve interrupt handling performance? > > But obviously there's more to consider because those are both good > things and if we could do that all the time, we would. So there must > be cases where we *have* to remap. ee81ee84f873 suggests that not all > VMD devices support disabling remap. There's also a hint that some > virt configs require it. > I used to just want to disable 28C0's VMD MSI bypass by default. But Nirmal suggested the current method by adjusting the param. Because he and other reviewers worry there are some other scenarios we didn't consider. Adding a method to adjust VMD'S MSI-X mode is better. > This patch doesn't enforce either of those things. What happens if > the user gets it wrong? > If I am wrong, please feel free to correct me at any time. I place the "vmd_config_msi_remap_param" that is VMD MSI-X's mode param configuring helper front "vmd_enable_domain". So, It will not change the logic disabling remapping from ee81ee84f873, such as "Currently MSI remapping must be enabled in guest passthrough mode". So, if the user config the wrong type, it will not work, and they can find it by dmesg. And that's why I add the MSI-X mode init print. > > Spurious blank line. sorry for this > > > ee81ee84f873 mentions MSI-X explicitly (which is different from MSI), > so maybe use "MSI-X" here and in the messages below to avoid any > confusion. > That's nicer. > > The usual strcmp() idiom is to test "!strcmp(...)" instead of > "strcmp(...) == 0)". No need for "()" around > VMD_FEAT_CAN_BYPASS_MSI_REMAP. > All right, I will change it to "!strcmp()" way. Thanks for you review~
On Wed, Mar 29, 2023 at 04:57:08PM +0800, Xinghui Li wrote: > On Wed, Mar 29, 2023 at 5:34 AM Bjorn Helgaas <helgaas@kernel.org> wrote: > > It would also be nice to include a hint about why a user would choose > > "on" or "off". What is the performance effect? What sort of I/O > > scenario would lead you to choose "on" vs "off"? > > > Before this patch, I sent the patch named : > PCI: vmd: Do not disable MSI-X remapping in VMD 28C0 controller > (patchwork link: > https://patchwork.kernel.org/project/linux-pci/patch/20221222072603.1175248-1-korantwork@gmail.com/) > We found the 4k rand read's iops could drop 50% if 4 NVMEs were > mounted in one PCIE port with VMD MSI bypass. > I suppose this is because the VMD Controller can aggregate interrupts. > But those test result is so long that I didn't add them to this patch > commit log. > If you believe it is necessary, I will try to add some simple instructions I don't think we need detailed performance numbers, but we need something like: - "msi_remap=off" improves interrupt handling performance by avoiding the VMD MSI-X domain interrupt handler - But "msi_remap=on" is needed when ...? > > ee81ee84f873 ("PCI: vmd: Disable MSI-X remapping when possible") > > suggests that MSI-X remapping (I assume the "msi_remap=on" case): > > > > - Limits the number MSI-X vectors available to child devices to the > > number of VMD MSI-X vectors. > > > > - Reduces interrupt handling performance because child device > > interrupts have to go through the VMD MSI-X domain interrupt > > handler. > > > > So I assume "msi_remap=off" would remove that MSI-X vector limit and > > improve interrupt handling performance? > > > > But obviously there's more to consider because those are both good > > things and if we could do that all the time, we would. So there must > > be cases where we *have* to remap. ee81ee84f873 suggests that not all > > VMD devices support disabling remap. There's also a hint that some > > virt configs require it. > > > I used to just want to disable 28C0's VMD MSI bypass by default. > But Nirmal suggested the current method by adjusting the param. > Because he and other reviewers worry there are some other scenarios we > didn't consider. > Adding a method to adjust VMD'S MSI-X mode is better. This commit log doesn't outline any of those other scenarios, and it doesn't say anything about when "msi_remap=on" or "msi_remap=off" would be necessary or desired, so I have no idea how users are supposed to figure out whether or not to use this parameter. > > This patch doesn't enforce either of those things. What happens if > > the user gets it wrong? > > If I am wrong, please feel free to correct me at any time. > I place the "vmd_config_msi_remap_param" that is VMD MSI-X's mode > param configuring helper front > "vmd_enable_domain". So, It will not change the logic disabling > remapping from ee81ee84f873, such as > "Currently MSI remapping must be enabled in guest passthrough mode". > So, if the user config the wrong type, it will not work, and they can > find it by dmesg. That's kind of a problem. I'm not in favor of something failing and the user having to debug it via dmesg. That causes user frustration and problem reports. I don't know what "guest passthrough mode" is. Can you detect that automatically? Bjorn
On 3/29/2023 9:31 AM, Bjorn Helgaas wrote: > On Wed, Mar 29, 2023 at 04:57:08PM +0800, Xinghui Li wrote: >> On Wed, Mar 29, 2023 at 5:34 AM Bjorn Helgaas <helgaas@kernel.org> wrote: >>> It would also be nice to include a hint about why a user would choose >>> "on" or "off". What is the performance effect? What sort of I/O >>> scenario would lead you to choose "on" vs "off"? >>> >> Before this patch, I sent the patch named : >> PCI: vmd: Do not disable MSI-X remapping in VMD 28C0 controller >> (patchwork link: >> https://patchwork.kernel.org/project/linux-pci/patch/20221222072603.1175248-1-korantwork@gmail.com/) >> We found the 4k rand read's iops could drop 50% if 4 NVMEs were >> mounted in one PCIE port with VMD MSI bypass. >> I suppose this is because the VMD Controller can aggregate interrupts. >> But those test result is so long that I didn't add them to this patch >> commit log. >> If you believe it is necessary, I will try to add some simple instructions > I don't think we need detailed performance numbers, but we need > something like: > > - "msi_remap=off" improves interrupt handling performance by > avoiding the VMD MSI-X domain interrupt handler > > - But "msi_remap=on" is needed when ...? > >>> ee81ee84f873 ("PCI: vmd: Disable MSI-X remapping when possible") >>> suggests that MSI-X remapping (I assume the "msi_remap=on" case): >>> >>> - Limits the number MSI-X vectors available to child devices to the >>> number of VMD MSI-X vectors. >>> >>> - Reduces interrupt handling performance because child device >>> interrupts have to go through the VMD MSI-X domain interrupt >>> handler. >>> >>> So I assume "msi_remap=off" would remove that MSI-X vector limit and >>> improve interrupt handling performance? >>> >>> But obviously there's more to consider because those are both good >>> things and if we could do that all the time, we would. So there must >>> be cases where we *have* to remap. ee81ee84f873 suggests that not all >>> VMD devices support disabling remap. There's also a hint that some >>> virt configs require it. >>> >> I used to just want to disable 28C0's VMD MSI bypass by default. >> But Nirmal suggested the current method by adjusting the param. >> Because he and other reviewers worry there are some other scenarios we >> didn't consider. >> Adding a method to adjust VMD'S MSI-X mode is better. > This commit log doesn't outline any of those other scenarios, and it > doesn't say anything about when "msi_remap=on" or "msi_remap=off" > would be necessary or desired, so I have no idea how users are > supposed to figure out whether or not to use this parameter. > >>> This patch doesn't enforce either of those things. What happens if >>> the user gets it wrong? >> If I am wrong, please feel free to correct me at any time. >> I place the "vmd_config_msi_remap_param" that is VMD MSI-X's mode >> param configuring helper front >> "vmd_enable_domain". So, It will not change the logic disabling >> remapping from ee81ee84f873, such as >> "Currently MSI remapping must be enabled in guest passthrough mode". >> So, if the user config the wrong type, it will not work, and they can >> find it by dmesg. > That's kind of a problem. I'm not in favor of something failing and > the user having to debug it via dmesg. That causes user frustration > and problem reports. > > I don't know what "guest passthrough mode" is. Can you detect that > automatically? > > Bjorn How about adding a boolean flag by comparing user input for module parameter msi_remap? and add the flag at - if (!(features & VMD_FEAT_CAN_BYPASS_MSI_REMAP) || msi_flag || offset[0] || offset[1]) Correct if I am wrong, but in this way we can cover all the cases. If user adds msi_remap=on, msi_flag=true and enables remapping. If user adds msi_remap=off, msi_flag=false and disables remapping. If user doesn't add anything, msi_flag=false and decision will be made same as current implementation. This will cover guest OS case as well.
On Thu, Mar 30, 2023 at 2:49 PM Patel, Nirmal <nirmal.patel@linux.intel.com> wrote: > > How about adding a boolean flag by comparing user input for module > parameter msi_remap? and add the flag at > > - if (!(features & VMD_FEAT_CAN_BYPASS_MSI_REMAP) || msi_flag > || offset[0] || offset[1]) > > Correct if I am wrong, but in this way we can cover all the cases. > If user adds msi_remap=on, msi_flag=true and enables remapping. > If user adds msi_remap=off, msi_flag=false and disables remapping. > If user doesn't add anything, msi_flag=false and decision will be > made same as current implementation. This will cover guest OS case > as well. > Sorry, I don't quite get your point. How is msi_flag assigned? Do you mean when msi_remap=no, the msi_flag is assigned as true? And msi_remap=off, the msi_flag is assigned as false? Thanks~
On Thu, Mar 30, 2023 at 12:31 AM Bjorn Helgaas <helgaas@kernel.org> wrote: > > On Wed, Mar 29, 2023 at 04:57:08PM +0800, Xinghui Li wrote: > > On Wed, Mar 29, 2023 at 5:34 AM Bjorn Helgaas <helgaas@kernel.org> wrote: > > > It would also be nice to include a hint about why a user would choose > > > "on" or "off". What is the performance effect? What sort of I/O > > > scenario would lead you to choose "on" vs "off"? > > > > > Before this patch, I sent the patch named : > > PCI: vmd: Do not disable MSI-X remapping in VMD 28C0 controller > > (patchwork link: > > https://patchwork.kernel.org/project/linux-pci/patch/20221222072603.1175248-1-korantwork@gmail.com/) > > We found the 4k rand read's iops could drop 50% if 4 NVMEs were > > mounted in one PCIE port with VMD MSI bypass. > > I suppose this is because the VMD Controller can aggregate interrupts. > > But those test result is so long that I didn't add them to this patch > > commit log. > > If you believe it is necessary, I will try to add some simple instructions > > I don't think we need detailed performance numbers, but we need > something like: > > - "msi_remap=off" improves interrupt handling performance by > avoiding the VMD MSI-X domain interrupt handler > > - But "msi_remap=on" is needed when ...? > In the patch I send in last email, We speculate that the VMD Controller aggregate interrupts, making the PCIe port less stressed and improving iops. In this case(lots of 4k random IO), if we enable the VMD MSI remapping, we found the interrupts from VMD Controller's MSI are less and the IOPS is much higher. > > > ee81ee84f873 ("PCI: vmd: Disable MSI-X remapping when possible") > > > suggests that MSI-X remapping (I assume the "msi_remap=on" case): > > > > > > - Limits the number MSI-X vectors available to child devices to the > > > number of VMD MSI-X vectors. > > > > > > - Reduces interrupt handling performance because child device > > > interrupts have to go through the VMD MSI-X domain interrupt > > > handler. > > > > > > So I assume "msi_remap=off" would remove that MSI-X vector limit and > > > improve interrupt handling performance? > > > > > > But obviously there's more to consider because those are both good > > > things and if we could do that all the time, we would. So there must > > > be cases where we *have* to remap. ee81ee84f873 suggests that not all > > > VMD devices support disabling remap. There's also a hint that some > > > virt configs require it. > > > > > I used to just want to disable 28C0's VMD MSI bypass by default. > > But Nirmal suggested the current method by adjusting the param. > > Because he and other reviewers worry there are some other scenarios we > > didn't consider. > > Adding a method to adjust VMD'S MSI-X mode is better. > > This commit log doesn't outline any of those other scenarios, and it > doesn't say anything about when "msi_remap=on" or "msi_remap=off" > would be necessary or desired, so I have no idea how users are > supposed to figure out whether or not to use this parameter. > I do miss this part. I'm sorry I ignored the users outside the community or this discussion. I will add the necessary guidelines. > > > This patch doesn't enforce either of those things. What happens if > > > the user gets it wrong? > > > > If I am wrong, please feel free to correct me at any time. > > I place the "vmd_config_msi_remap_param" that is VMD MSI-X's mode > > param configuring helper front > > "vmd_enable_domain". So, It will not change the logic disabling > > remapping from ee81ee84f873, such as > > "Currently MSI remapping must be enabled in guest passthrough mode". > > So, if the user config the wrong type, it will not work, and they can > > find it by dmesg. > > That's kind of a problem. I'm not in favor of something failing and > the user having to debug it via dmesg. That causes user frustration > and problem reports. What about adding a sysfs node for it in VMD PCI bus dir, which allows users to catch VMD's MSI current working mode? > > I don't know what "guest passthrough mode" is. Can you detect that > automatically? I quote this from the commit ee81ee84f873's comment, it can be detected by the logic like this: if (!(features & VMD_FEAT_CAN_BYPASS_MSI_REMAP) || offset[0] || offset[1]) I just want to answer your comment: "There's also a hint that some virt configs require it." This patch will not modify the logic of determining whether MSI remapping is enabled when running VMD in Guest. Thanks~
On 4/2/2023 7:34 AM, Xinghui Li wrote: > On Thu, Mar 30, 2023 at 2:49 PM Patel, Nirmal > <nirmal.patel@linux.intel.com> wrote: >> How about adding a boolean flag by comparing user input for module >> parameter msi_remap? and add the flag at >> >> - if (!(features & VMD_FEAT_CAN_BYPASS_MSI_REMAP) || msi_flag >> || offset[0] || offset[1]) >> >> Correct if I am wrong, but in this way we can cover all the cases. >> If user adds msi_remap=on, msi_flag=true and enables remapping. >> If user adds msi_remap=off, msi_flag=false and disables remapping. >> If user doesn't add anything, msi_flag=false and decision will be >> made same as current implementation. This will cover guest OS case >> as well. >> > Sorry, I don't quite get your point. How is msi_flag assigned? > Do you mean when msi_remap=no, the msi_flag is assigned as true? > And msi_remap=off, the msi_flag is assigned as false? > > Thanks~ if msi_remap=on, then msi_flag=true; if msi_remap=off, then msi_flag=false.
On Sun, Apr 02, 2023 at 11:02:07PM +0800, Xinghui Li wrote: > On Thu, Mar 30, 2023 at 12:31 AM Bjorn Helgaas <helgaas@kernel.org> wrote: > > On Wed, Mar 29, 2023 at 04:57:08PM +0800, Xinghui Li wrote: > > > On Wed, Mar 29, 2023 at 5:34 AM Bjorn Helgaas <helgaas@kernel.org> wrote: > > > > It would also be nice to include a hint about why a user would choose > > > > "on" or "off". What is the performance effect? What sort of I/O > > > > scenario would lead you to choose "on" vs "off"? > > I don't think we need detailed performance numbers, but we need > > something like: > > > > - "msi_remap=off" improves interrupt handling performance by > > avoiding the VMD MSI-X domain interrupt handler > > > > - But "msi_remap=on" is needed when ...? > > > In the patch I send in last email, We speculate that the VMD > Controller aggregate interrupts, > making the PCIe port less stressed and improving iops. In this > case(lots of 4k random IO), if we enable the VMD MSI > remapping, we found the interrupts from VMD Controller's MSI are less > and the IOPS is much higher. Great, that's useful information about why users would want to use this parameter. Obviously the other half is to mention the reasons why they may not be able to use it. If "msi_remap=off" were *always* safe and desirable, we would just do that unconditionally and there would be no point in a parameter. > > > I place the "vmd_config_msi_remap_param" that is VMD MSI-X's mode > > > param configuring helper front > > > "vmd_enable_domain". So, It will not change the logic disabling > > > remapping from ee81ee84f873, such as > > > "Currently MSI remapping must be enabled in guest passthrough mode". > > > So, if the user config the wrong type, it will not work, and they can > > > find it by dmesg. > > > > That's kind of a problem. I'm not in favor of something failing and > > the user having to debug it via dmesg. That causes user frustration > > and problem reports. > > What about adding a sysfs node for it in VMD PCI bus dir, which allows > users to catch VMD's MSI current working mode? No, a sysfs node is not the answer to this. If we *can* avoid a non-working situation, we should avoid it. If users see a non-working situation, they will rightly complain, and somebody will have to debug it. We don't want that. > > I don't know what "guest passthrough mode" is. Can you detect that > > automatically? > > I quote this from the commit ee81ee84f873's comment, it can be detected by the > logic like this: > if (!(features & VMD_FEAT_CAN_BYPASS_MSI_REMAP) || > offset[0] || offset[1]) > I just want to answer your comment: "There's also a hint that some > virt configs require it." > This patch will not modify the logic of determining whether MSI > remapping is enabled > when running VMD in Guest. My point is that apparently guest passthrough mode is relevant and maybe it should be part of the commit log about how this parameter should be used. Bjorn
On Tue, Apr 4, 2023 at 6:45 AM Bjorn Helgaas <helgaas@kernel.org> wrote: > > > In the patch I send in last email, We speculate that the VMD > > Controller aggregate interrupts, > > making the PCIe port less stressed and improving iops. In this > > case(lots of 4k random IO), if we enable the VMD MSI > > remapping, we found the interrupts from VMD Controller's MSI are less > > and the IOPS is much higher. > > Great, that's useful information about why users would want to use > this parameter. > > Obviously the other half is to mention the reasons why they may not be > able to use it. If "msi_remap=off" were *always* safe and desirable, > we would just do that unconditionally and there would be no point in a > parameter. > > > > > I place the "vmd_config_msi_remap_param" that is VMD MSI-X's mode > > > > param configuring helper front > > > > "vmd_enable_domain". So, It will not change the logic disabling > > > > remapping from ee81ee84f873, such as > > > > "Currently MSI remapping must be enabled in guest passthrough mode". > > > > So, if the user config the wrong type, it will not work, and they can > > > > find it by dmesg. > > > > > > That's kind of a problem. I'm not in favor of something failing and > > > the user having to debug it via dmesg. That causes user frustration > > > and problem reports. > > > > What about adding a sysfs node for it in VMD PCI bus dir, which allows > > users to catch VMD's MSI current working mode? > > No, a sysfs node is not the answer to this. If we *can* avoid a > non-working situation, we should avoid it. If users see a non-working > situation, they will rightly complain, and somebody will have to debug > it. We don't want that. emm~ I privately suppose: In the traditional way without module parameters, this problem also exists. If the user disables remapping in guest os, the VMD driver will force it to remapping mode. What about I add the additional description in MODULE_PARM_DESC and comment? As for some devices not support disable remapping, What about changing the param to disabae_bypass=0/1? And make it clear in the description: this parameter will only affect the disabling of bypass mode on devices that support it. > > My point is that apparently guest passthrough mode is relevant and > maybe it should be part of the commit log about how this parameter > should be used. You are right~ I will add some clarification on how to configure VMD MSI in the guest. > > Bjorn
Friendly ping~ On Tue, Apr 4, 2023 at 7:02 PM Xinghui Li <korantwork@gmail.com> wrote: > > On Tue, Apr 4, 2023 at 6:45 AM Bjorn Helgaas <helgaas@kernel.org> wrote: > > > > > In the patch I send in last email, We speculate that the VMD > > > Controller aggregate interrupts, > > > making the PCIe port less stressed and improving iops. In this > > > case(lots of 4k random IO), if we enable the VMD MSI > > > remapping, we found the interrupts from VMD Controller's MSI are less > > > and the IOPS is much higher. > > > > Great, that's useful information about why users would want to use > > this parameter. > > > > Obviously the other half is to mention the reasons why they may not be > > able to use it. If "msi_remap=off" were *always* safe and desirable, > > we would just do that unconditionally and there would be no point in a > > parameter. > > > > > > > I place the "vmd_config_msi_remap_param" that is VMD MSI-X's mode > > > > > param configuring helper front > > > > > "vmd_enable_domain". So, It will not change the logic disabling > > > > > remapping from ee81ee84f873, such as > > > > > "Currently MSI remapping must be enabled in guest passthrough mode". > > > > > So, if the user config the wrong type, it will not work, and they can > > > > > find it by dmesg. > > > > > > > > That's kind of a problem. I'm not in favor of something failing and > > > > the user having to debug it via dmesg. That causes user frustration > > > > and problem reports. > > > > > > What about adding a sysfs node for it in VMD PCI bus dir, which allows > > > users to catch VMD's MSI current working mode? > > > > No, a sysfs node is not the answer to this. If we *can* avoid a > > non-working situation, we should avoid it. If users see a non-working > > situation, they will rightly complain, and somebody will have to debug > > it. We don't want that. > emm~ > I privately suppose: In the traditional way without module parameters, > this problem also exists. If the user disables remapping in guest os, the VMD > driver will force it to remapping mode. > What about I add the additional description in MODULE_PARM_DESC and comment? > As for some devices not support disable remapping, What about changing > the param to disabae_bypass=0/1? > And make it clear in the description: > this parameter will only affect the disabling of bypass mode on > devices that support it. > > > > My point is that apparently guest passthrough mode is relevant and > > maybe it should be part of the commit log about how this parameter > > should be used. > You are right~ > I will add some clarification on how to configure VMD MSI in the guest. > > > > Bjorn
On Mon, Apr 17, 2023 at 05:15:00PM +0800, Xinghui Li wrote: > Friendly ping~ We had quite a bit of discussion that I don't see reflected in the latest patch, so we're waiting on a v5 patch that addresses the comments. > On Tue, Apr 4, 2023 at 7:02 PM Xinghui Li <korantwork@gmail.com> wrote: > > > > On Tue, Apr 4, 2023 at 6:45 AM Bjorn Helgaas <helgaas@kernel.org> wrote: > > > > > > > In the patch I send in last email, We speculate that the VMD > > > > Controller aggregate interrupts, > > > > making the PCIe port less stressed and improving iops. In this > > > > case(lots of 4k random IO), if we enable the VMD MSI > > > > remapping, we found the interrupts from VMD Controller's MSI are less > > > > and the IOPS is much higher. > > > > > > Great, that's useful information about why users would want to use > > > this parameter. > > > > > > Obviously the other half is to mention the reasons why they may not be > > > able to use it. If "msi_remap=off" were *always* safe and desirable, > > > we would just do that unconditionally and there would be no point in a > > > parameter. > > > > > > > > > I place the "vmd_config_msi_remap_param" that is VMD MSI-X's mode > > > > > > param configuring helper front > > > > > > "vmd_enable_domain". So, It will not change the logic disabling > > > > > > remapping from ee81ee84f873, such as > > > > > > "Currently MSI remapping must be enabled in guest passthrough mode". > > > > > > So, if the user config the wrong type, it will not work, and they can > > > > > > find it by dmesg. > > > > > > > > > > That's kind of a problem. I'm not in favor of something failing and > > > > > the user having to debug it via dmesg. That causes user frustration > > > > > and problem reports. > > > > > > > > What about adding a sysfs node for it in VMD PCI bus dir, which allows > > > > users to catch VMD's MSI current working mode? > > > > > > No, a sysfs node is not the answer to this. If we *can* avoid a > > > non-working situation, we should avoid it. If users see a non-working > > > situation, they will rightly complain, and somebody will have to debug > > > it. We don't want that. > > emm~ > > I privately suppose: In the traditional way without module parameters, > > this problem also exists. If the user disables remapping in guest os, the VMD > > driver will force it to remapping mode. > > What about I add the additional description in MODULE_PARM_DESC and comment? > > As for some devices not support disable remapping, What about changing > > the param to disabae_bypass=0/1? > > And make it clear in the description: > > this parameter will only affect the disabling of bypass mode on > > devices that support it. > > > > > > My point is that apparently guest passthrough mode is relevant and > > > maybe it should be part of the commit log about how this parameter > > > should be used. > > You are right~ > > I will add some clarification on how to configure VMD MSI in the guest. > > > > > > Bjorn
On Tue, Apr 18, 2023 at 5:45 AM Bjorn Helgaas <helgaas@kernel.org> wrote: > > On Mon, Apr 17, 2023 at 05:15:00PM +0800, Xinghui Li wrote: > > Friendly ping~ > > We had quite a bit of discussion that I don't see reflected in the > latest patch, so we're waiting on a v5 patch that addresses the > comments. > Got it. I will send the next version asap. Thanks~
diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c index 990630ec57c6..fb61181baa9e 100644 --- a/drivers/pci/controller/vmd.c +++ b/drivers/pci/controller/vmd.c @@ -34,6 +34,19 @@ #define MB2_SHADOW_OFFSET 0x2000 #define MB2_SHADOW_SIZE 16 +/* + * The VMD msi_remap module parameter provides the alternative way + * to adjust MSI mode when loading vmd.ko other than vmd_ids table. + * There are two params could be recognized: + * + * off: disable MSI remapping + * on: enable MSI remapping + * + */ +static char *msi_remap; +module_param(msi_remap, charp, 0444); +MODULE_PARM_DESC(msi_remap, "Whether to enable MSI remapping function"); + enum vmd_features { /* * Device may contain registers which hint the physical location of the @@ -875,6 +888,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features) return ret; vmd_set_msi_remapping(vmd, true); + dev_info(&vmd->dev->dev, "init vmd with remapping MSI\n"); ret = vmd_create_irq_domain(vmd); if (ret) @@ -887,6 +901,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features) irq_domain_update_bus_token(vmd->irq_domain, DOMAIN_BUS_VMD_MSI); } else { vmd_set_msi_remapping(vmd, false); + dev_info(&vmd->dev->dev, "init vmd with bypass MSI\n"); } pci_add_resource(&resources, &vmd->resources[0]); @@ -955,6 +970,16 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features) return 0; } +static void vmd_config_msi_remap_param(unsigned long *features) +{ + if (msi_remap) { + if (strcmp(msi_remap, "on") == 0) + *features &= ~(VMD_FEAT_CAN_BYPASS_MSI_REMAP); + else if (strcmp(msi_remap, "off") == 0) + *features |= VMD_FEAT_CAN_BYPASS_MSI_REMAP; + } +} + static int vmd_probe(struct pci_dev *dev, const struct pci_device_id *id) { unsigned long features = (unsigned long) id->driver_data; @@ -984,6 +1009,8 @@ static int vmd_probe(struct pci_dev *dev, const struct pci_device_id *id) if (err < 0) goto out_release_instance; + vmd_config_msi_remap_param(&features); + vmd->cfgbar = pcim_iomap(dev, VMD_CFGBAR, 0); if (!vmd->cfgbar) { err = -ENOMEM;