Message ID | cover.1683740667.git.reinette.chatre@intel.com |
---|---|
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp4482042vqo; Thu, 11 May 2023 08:54:14 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ66WXolYSnm8Eg0fxI3EvabUFR5QWa2UtKErCGyBPdWHzedzYFGqSU2PU1IvEneidfsyT8+ X-Received: by 2002:a17:90a:fe18:b0:252:8698:d03b with SMTP id ck24-20020a17090afe1800b002528698d03bmr3013730pjb.14.1683820453995; Thu, 11 May 2023 08:54:13 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1683820453; cv=none; d=google.com; s=arc-20160816; b=UBPnQ//x9iK46rf4iPvx+Dsq8mRmEmBkMRJypvB4IKiAwA1Y6qxDf3xNYCMBAIpzjr R4KI36k1wLO9f/5eettUY0iY375bXicKO5h3lwB3CrtsPTdighuU30QxURchnceNrUfB EkwHw5IQmXYPm7kE20TIm8O953u9q95LRay2U2T0XBAKv4kx2TN3ysH5wsxscAEy6pPq kShrwpSz8vx6tjG+HTesTDTJ1W7YnmTo6XK1GDcQcMIIns+PE8UHECvc+xpFMk+3W1Gr 7VlW7FUWHG3bYfGh6L199wOwHI0yJWNpk5PEu9FRpeidXThraKBMqX2VFhdPMME2UBHO jPoQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from:dkim-signature; bh=yyFE1NN9Md6TMu7T1/kcSJTWqwiFPlYPLes3PzyYUKY=; b=fjQ8hxHhwBl0TFb5BatIKSgDj1ONVhBO40S5qXP/v9JsEgSeWIjD8HeynqLrnT0GIs 1UiG0rHA8M+i99aG1Vef8ftEopte7OmIjTyYHtwbX43odpVmjKKFdtBnlKn/vXC5H1p4 QJGevb6P62GISxCBZx2rRTTkhHnveIDXxrdH0Ng38uDZT9nh4ZwzJOk3VfdkQ7HTXJZr wPG2qPQFE3Iazwwif5CNod7k2j0Pk7E8AOYM5n8n6dp0h4Cw7FTduSKH6iOxbE9O2G7v 22+SdzRhb04dtWrpa18HEOXKxEew4nIy0AgOcfVLm0sIOZOHmxuECG/7NYkWr2WYVnJe oHnQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b="WwchiOZ/"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id j64-20020a638043000000b0050300b179f3si6861665pgd.444.2023.05.11.08.54.01; Thu, 11 May 2023 08:54:13 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b="WwchiOZ/"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238496AbjEKPo4 (ORCPT <rfc822;peekingduck44@gmail.com> + 99 others); Thu, 11 May 2023 11:44:56 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36036 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237526AbjEKPoz (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Thu, 11 May 2023 11:44:55 -0400 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2FDCED7; Thu, 11 May 2023 08:44:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1683819894; x=1715355894; h=from:to:cc:subject:date:message-id:mime-version: content-transfer-encoding; bh=UPzRdwh80F7gww+xL3tYd6eQrJ5u7CiNTO8sjTFxYz8=; b=WwchiOZ/dPsMt2jhO+sAHKVyBiqoiB1S6y2FLy5zaYjm7GHMyo1x+KWX lkWVrPHVzps9wA7CqE0TjqvX4kQjk6LueoLU9OG7IXDsePqZ8DoW90YIa W1JKW6RLxtg5Vb3QQYp6eJVCYjyYJQbNYxjjtx241yO+lUPVfNpRvZR7G Kvl2E2ig94Cqo22eguPO4MY4sN88G8NMIdwtN56nUV8UbsiOKqNd5s9Rb 4uLWY8mRzKVfSQiinGhHaLmZHlTbTaO3F5ZkO5of73okCKhVy+5fgKRse 5qeVKHE6MgAITsZnqCzIca8pnvBS7AUUDdKXo9sN2cZ9fXNSnqClW6A87 Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10707"; a="335046640" X-IronPort-AV: E=Sophos;i="5.99,266,1677571200"; d="scan'208";a="335046640" Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 May 2023 08:44:49 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10707"; a="699776218" X-IronPort-AV: E=Sophos;i="5.99,266,1677571200"; d="scan'208";a="699776218" Received: from rchatre-ws.ostc.intel.com ([10.54.69.144]) by orsmga002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 May 2023 08:44:48 -0700 From: Reinette Chatre <reinette.chatre@intel.com> To: jgg@nvidia.com, yishaih@nvidia.com, shameerali.kolothum.thodi@huawei.com, kevin.tian@intel.com, alex.williamson@redhat.com Cc: tglx@linutronix.de, darwi@linutronix.de, kvm@vger.kernel.org, dave.jiang@intel.com, jing2.liu@intel.com, ashok.raj@intel.com, fenghua.yu@intel.com, tom.zanussi@linux.intel.com, reinette.chatre@intel.com, linux-kernel@vger.kernel.org Subject: [PATCH V5 00/11] vfio/pci: Support dynamic allocation of MSI-X interrupts Date: Thu, 11 May 2023 08:44:27 -0700 Message-Id: <cover.1683740667.git.reinette.chatre@intel.com> X-Mailer: git-send-email 2.34.1 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,SPF_HELO_NONE, SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1765613716632161803?= X-GMAIL-MSGID: =?utf-8?q?1765613716632161803?= |
Series |
vfio/pci: Support dynamic allocation of MSI-X interrupts
|
|
Message
Reinette Chatre
May 11, 2023, 3:44 p.m. UTC
Changes since V4: - V4: https://lore.kernel.org/lkml/cover.1682615447.git.reinette.chatre@intel.com/ - Add Kevin's Reviewed-by tag as applicable. - Treat non-existing INTx interrupt context as kernel bug with WARN. This exposed an issue in the scenario where INTx mask/unmask may occur without INTx enabled. This is fixed by obtaining the interrupt context later (right before use) within impacted functions: vfio_pci_intx_mask() and vfio_pci_intx_unmask_handler(). (Kevin) - Treat pci_irq_vector() returning '0' for a MSI/MSI-X interrupt as a kernel bug via a WARN instead of ignoring this value. (Kevin) - Improve accuracy of comments. (Kevin) - Please refer to individual patches for local changes. Changes since V3: - V3: https://lore.kernel.org/lkml/cover.1681837892.git.reinette.chatre@intel.com/ - Be considerate about layout and size with changes to struct vfio_pci_core_device. Keep flags together and transition all to use bitfields. (Alex and Jason) - Do not free dynamically allocated interrupts on error path. (Alex) - Please refer to individual patches for localized changes. Changes since V2: - V2: https://lore.kernel.org/lkml/cover.1680038771.git.reinette.chatre@intel.com/ - During testing of V2 "kernel test robot" reported issues resulting from include/linux/pci.h missing a stub for pci_msix_can_alloc_dyn() when CONFIG_PCI_MSI=n. A separate fix was sent to address this. The fix can be found in the kernel (since v6.3-rc7) as commit 195d8e5da3ac ("PCI/MSI: Provide missing stub for pci_msix_can_alloc_dyn()") - Biggest change is the transition to "active contexts" for both MSI and MSI-X. Interrupt contexts have always been allocated when the interrupts are allocated while they are only used while interrupts are enabled. In this series interrupt contexts are made dynamic, while doing so their allocation is moved to match how they are used: allocated when interrupts are enabled. Whether a Linux interrupt number exists determines whether an interrupt can be enabled. Previous policy (up to V2) that an allocated interrupt has an interrupt context no longer applies. Instead, an interrupt context has a handler/trigger, aka "active contexts". (Alex) - Re-ordered patches in support of "active contexts". - Only free interrupts on MSI-X teardown and otherwise use the allocated interrupts as a cache. (Alex) - Using unsigned int for the vector broke the unwind loop within vfio_msi_set_block(). (Alex) - Introduce new "has_dyn_msix" property of virtual device instead of querying support every time. (Alex) - Some smaller changes, please refer to individual patches. Changes since RFC V1: - RFC V1: https://lore.kernel.org/lkml/cover.1678911529.git.reinette.chatre@intel.com/ - Improved changelogs. - Simplify interface so that vfio_irq_ctx_alloc_single() returns pointer to allocated context. (Alex) - Remove vfio_irq_ctx_range_allocated() and associated attempts to maintain invalid error path behavior. (Alex and Kevin) - Add pointer to interrupt context as function parameter to vfio_irq_ctx_free(). (Alex) - Ensure variables are initialized. (Dan Carpenter) - Only support dynamic allocation if device supports it. (Alex) Qemu allocates interrupts incrementally at the time the guest unmasks an interrupt, for example each time a Linux guest runs request_irq(). Dynamic allocation of MSI-X interrupts was not possible until v6.2 [1]. This prompted Qemu to, when allocating a new interrupt, first release all previously allocated interrupts (including disable of MSI-X) followed by re-allocation of all interrupts that includes the new interrupt. Please see [2] for a detailed discussion about this issue. Releasing and re-allocating interrupts may be acceptable if all interrupts are unmasked during device initialization. If unmasking of interrupts occur during runtime this may result in lost interrupts. For example, consider an accelerator device with multiple work queues, each work queue having a dedicated interrupt. A work queue can be enabled at any time with its associated interrupt unmasked while other work queues are already active. Having all interrupts released and MSI-X disabled to enable the new work queue will impact active work queues. This series builds on the recent interrupt sub-system core changes that added support for dynamic MSI-X allocation after initial MSI-X enabling. Add support for dynamic MSI-X allocation to vfio-pci. A flag indicating lack of support for dynamic allocation already exist: VFIO_IRQ_INFO_NORESIZE and has always been set for MSI and MSI-X. With support for dynamic MSI-X the flag is cleared for MSI-X when supported, enabling Qemu to modify its behavior. Any feedback is appreciated Reinette [1] commit 34026364df8e ("PCI/MSI: Provide post-enable dynamic allocation interfaces for MSI-X") [2] https://lore.kernel.org/kvm/MWHPR11MB188603D0D809C1079F5817DC8C099@MWHPR11MB1886.namprd11.prod.outlook.com/#t Reinette Chatre (11): vfio/pci: Consolidate irq cleanup on MSI/MSI-X disable vfio/pci: Remove negative check on unsigned vector vfio/pci: Prepare for dynamic interrupt context storage vfio/pci: Move to single error path vfio/pci: Use xarray for interrupt context storage vfio/pci: Remove interrupt context counter vfio/pci: Update stale comment vfio/pci: Use bitfield for struct vfio_pci_core_device flags vfio/pci: Probe and store ability to support dynamic MSI-X vfio/pci: Support dynamic MSI-X vfio/pci: Clear VFIO_IRQ_INFO_NORESIZE for MSI-X drivers/vfio/pci/vfio_pci_core.c | 8 +- drivers/vfio/pci/vfio_pci_intrs.c | 305 ++++++++++++++++++++---------- include/linux/vfio_pci_core.h | 26 +-- include/uapi/linux/vfio.h | 3 + 4 files changed, 229 insertions(+), 113 deletions(-) base-commit: 457391b0380335d5e9a5babdec90ac53928b23b4
Comments
On Thu, 11 May 2023 08:44:27 -0700 Reinette Chatre <reinette.chatre@intel.com> wrote: > Changes since V4: > - V4: https://lore.kernel.org/lkml/cover.1682615447.git.reinette.chatre@intel.com/ > - Add Kevin's Reviewed-by tag as applicable. > - Treat non-existing INTx interrupt context as kernel bug with WARN. This > exposed an issue in the scenario where INTx mask/unmask may occur without > INTx enabled. This is fixed by obtaining the interrupt context later > (right before use) within impacted functions: vfio_pci_intx_mask() and > vfio_pci_intx_unmask_handler(). (Kevin) > - Treat pci_irq_vector() returning '0' for a MSI/MSI-X interrupt as a kernel > bug via a WARN instead of ignoring this value. (Kevin) > - Improve accuracy of comments. (Kevin) > - Please refer to individual patches for local changes. Looks good to me. Kevin, do you want to add any additional reviews or check the changes made based on your previous comments? Thanks, Alex > Changes since V3: > - V3: https://lore.kernel.org/lkml/cover.1681837892.git.reinette.chatre@intel.com/ > - Be considerate about layout and size with changes to > struct vfio_pci_core_device. Keep flags together and transition all to > use bitfields. (Alex and Jason) > - Do not free dynamically allocated interrupts on error path. (Alex) > - Please refer to individual patches for localized changes. > > Changes since V2: > - V2: https://lore.kernel.org/lkml/cover.1680038771.git.reinette.chatre@intel.com/ > - During testing of V2 "kernel test robot" reported issues resulting from > include/linux/pci.h missing a stub for pci_msix_can_alloc_dyn() when > CONFIG_PCI_MSI=n. A separate fix was sent to address this. The fix can > be found in the kernel (since v6.3-rc7) as > commit 195d8e5da3ac ("PCI/MSI: Provide missing stub for pci_msix_can_alloc_dyn()") > - Biggest change is the transition to "active contexts" for both MSI and MSI-X. > Interrupt contexts have always been allocated when the interrupts are > allocated while they are only used while interrupts are > enabled. In this series interrupt contexts are made dynamic, while doing > so their allocation is moved to match how they are used: allocated when > interrupts are enabled. Whether a Linux interrupt number exists determines > whether an interrupt can be enabled. > Previous policy (up to V2) that an allocated interrupt has an interrupt > context no longer applies. Instead, an interrupt context has a > handler/trigger, aka "active contexts". (Alex) > - Re-ordered patches in support of "active contexts". > - Only free interrupts on MSI-X teardown and otherwise use the > allocated interrupts as a cache. (Alex) > - Using unsigned int for the vector broke the unwind loop within > vfio_msi_set_block(). (Alex) > - Introduce new "has_dyn_msix" property of virtual device instead of > querying support every time. (Alex) > - Some smaller changes, please refer to individual patches. > > Changes since RFC V1: > - RFC V1: https://lore.kernel.org/lkml/cover.1678911529.git.reinette.chatre@intel.com/ > - Improved changelogs. > - Simplify interface so that vfio_irq_ctx_alloc_single() returns pointer to > allocated context. (Alex) > - Remove vfio_irq_ctx_range_allocated() and associated attempts to maintain > invalid error path behavior. (Alex and Kevin) > - Add pointer to interrupt context as function parameter to > vfio_irq_ctx_free(). (Alex) > - Ensure variables are initialized. (Dan Carpenter) > - Only support dynamic allocation if device supports it. (Alex) > > Qemu allocates interrupts incrementally at the time the guest unmasks an > interrupt, for example each time a Linux guest runs request_irq(). > > Dynamic allocation of MSI-X interrupts was not possible until v6.2 [1]. > This prompted Qemu to, when allocating a new interrupt, first release all > previously allocated interrupts (including disable of MSI-X) followed > by re-allocation of all interrupts that includes the new interrupt. > Please see [2] for a detailed discussion about this issue. > > Releasing and re-allocating interrupts may be acceptable if all > interrupts are unmasked during device initialization. If unmasking of > interrupts occur during runtime this may result in lost interrupts. > For example, consider an accelerator device with multiple work queues, > each work queue having a dedicated interrupt. A work queue can be > enabled at any time with its associated interrupt unmasked while other > work queues are already active. Having all interrupts released and MSI-X > disabled to enable the new work queue will impact active work queues. > > This series builds on the recent interrupt sub-system core changes > that added support for dynamic MSI-X allocation after initial MSI-X > enabling. > > Add support for dynamic MSI-X allocation to vfio-pci. A flag > indicating lack of support for dynamic allocation already exist: > VFIO_IRQ_INFO_NORESIZE and has always been set for MSI and MSI-X. With > support for dynamic MSI-X the flag is cleared for MSI-X when supported, > enabling Qemu to modify its behavior. > > Any feedback is appreciated > > Reinette > > [1] commit 34026364df8e ("PCI/MSI: Provide post-enable dynamic allocation interfaces for MSI-X") > [2] https://lore.kernel.org/kvm/MWHPR11MB188603D0D809C1079F5817DC8C099@MWHPR11MB1886.namprd11.prod.outlook.com/#t > > Reinette Chatre (11): > vfio/pci: Consolidate irq cleanup on MSI/MSI-X disable > vfio/pci: Remove negative check on unsigned vector > vfio/pci: Prepare for dynamic interrupt context storage > vfio/pci: Move to single error path > vfio/pci: Use xarray for interrupt context storage > vfio/pci: Remove interrupt context counter > vfio/pci: Update stale comment > vfio/pci: Use bitfield for struct vfio_pci_core_device flags > vfio/pci: Probe and store ability to support dynamic MSI-X > vfio/pci: Support dynamic MSI-X > vfio/pci: Clear VFIO_IRQ_INFO_NORESIZE for MSI-X > > drivers/vfio/pci/vfio_pci_core.c | 8 +- > drivers/vfio/pci/vfio_pci_intrs.c | 305 ++++++++++++++++++++---------- > include/linux/vfio_pci_core.h | 26 +-- > include/uapi/linux/vfio.h | 3 + > 4 files changed, 229 insertions(+), 113 deletions(-) > > > base-commit: 457391b0380335d5e9a5babdec90ac53928b23b4
> From: Alex Williamson <alex.williamson@redhat.com> > Sent: Wednesday, May 17, 2023 6:53 AM > > On Thu, 11 May 2023 08:44:27 -0700 > Reinette Chatre <reinette.chatre@intel.com> wrote: > > > Changes since V4: > > - V4: > https://lore.kernel.org/lkml/cover.1682615447.git.reinette.chatre@intel.com > / > > - Add Kevin's Reviewed-by tag as applicable. > > - Treat non-existing INTx interrupt context as kernel bug with WARN. This > > exposed an issue in the scenario where INTx mask/unmask may occur > without > > INTx enabled. This is fixed by obtaining the interrupt context later > > (right before use) within impacted functions: vfio_pci_intx_mask() and > > vfio_pci_intx_unmask_handler(). (Kevin) > > - Treat pci_irq_vector() returning '0' for a MSI/MSI-X interrupt as a kernel > > bug via a WARN instead of ignoring this value. (Kevin) > > - Improve accuracy of comments. (Kevin) > > - Please refer to individual patches for local changes. > > Looks good to me. > > Kevin, do you want to add any additional reviews or check the changes > made based on your previous comments? > Good to me too. I've given the remaining reviewed-by's.
On Thu, May 11, 2023 at 08:44:27AM -0700, Reinette Chatre wrote: > > Qemu allocates interrupts incrementally at the time the guest unmasks an > interrupt, for example each time a Linux guest runs request_irq(). > > Dynamic allocation of MSI-X interrupts was not possible until v6.2 [1]. > This prompted Qemu to, when allocating a new interrupt, first release all > previously allocated interrupts (including disable of MSI-X) followed > by re-allocation of all interrupts that includes the new interrupt. > Please see [2] for a detailed discussion about this issue. > > Releasing and re-allocating interrupts may be acceptable if all > interrupts are unmasked during device initialization. If unmasking of > interrupts occur during runtime this may result in lost interrupts. > For example, consider an accelerator device with multiple work queues, > each work queue having a dedicated interrupt. A work queue can be > enabled at any time with its associated interrupt unmasked while other > work queues are already active. Having all interrupts released and MSI-X > disabled to enable the new work queue will impact active work queues. > > This series builds on the recent interrupt sub-system core changes > that added support for dynamic MSI-X allocation after initial MSI-X > enabling. > > Add support for dynamic MSI-X allocation to vfio-pci. A flag > indicating lack of support for dynamic allocation already exist: > VFIO_IRQ_INFO_NORESIZE and has always been set for MSI and MSI-X. With > support for dynamic MSI-X the flag is cleared for MSI-X when supported, > enabling Qemu to modify its behavior. > > Any feedback is appreciated Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Jason
On 5/16/2023 7:14 PM, Tian, Kevin wrote: >> From: Alex Williamson <alex.williamson@redhat.com> >> Sent: Wednesday, May 17, 2023 6:53 AM >> >> On Thu, 11 May 2023 08:44:27 -0700 >> Reinette Chatre <reinette.chatre@intel.com> wrote: >> >>> Changes since V4: >>> - V4: >> https://lore.kernel.org/lkml/cover.1682615447.git.reinette.chatre@intel.com >> / >>> - Add Kevin's Reviewed-by tag as applicable. >>> - Treat non-existing INTx interrupt context as kernel bug with WARN. This >>> exposed an issue in the scenario where INTx mask/unmask may occur >> without >>> INTx enabled. This is fixed by obtaining the interrupt context later >>> (right before use) within impacted functions: vfio_pci_intx_mask() and >>> vfio_pci_intx_unmask_handler(). (Kevin) >>> - Treat pci_irq_vector() returning '0' for a MSI/MSI-X interrupt as a kernel >>> bug via a WARN instead of ignoring this value. (Kevin) >>> - Improve accuracy of comments. (Kevin) >>> - Please refer to individual patches for local changes. >> >> Looks good to me. >> >> Kevin, do you want to add any additional reviews or check the changes >> made based on your previous comments? >> > > Good to me too. I've given the remaining reviewed-by's. Thank you very much Alex and Kevin. Reinette
On 5/17/2023 7:25 AM, Jason Gunthorpe wrote: > > Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> > Thank you very much Jason. Reinette
Reinette! On Thu, May 11 2023 at 08:44, Reinette Chatre wrote: > Changes since V4: > - V4: https://lore.kernel.org/lkml/cover.1682615447.git.reinette.chatre@intel.com/ > - Add Kevin's Reviewed-by tag as applicable. > - Treat non-existing INTx interrupt context as kernel bug with WARN. This > exposed an issue in the scenario where INTx mask/unmask may occur without > INTx enabled. This is fixed by obtaining the interrupt context later > (right before use) within impacted functions: vfio_pci_intx_mask() and > vfio_pci_intx_unmask_handler(). (Kevin) > - Treat pci_irq_vector() returning '0' for a MSI/MSI-X interrupt as a kernel > bug via a WARN instead of ignoring this value. (Kevin) > - Improve accuracy of comments. (Kevin) > - Please refer to individual patches for local changes. I only skimmed the series for obvious hickups vs. the PCI/MSI core (my virt[io] foo is limited) and I did not find anything to complain about. Aside of that I really like how this series is built up to restructure and cleanup things first so that the new functionality falls in place instead of the popular 'glue it in no matter what' approach. That's a pleasure to read even for the virt[io] illiterate! Acked-by: Thomas Gleixner <tglx@linutronix.de> Thanks, tglx
On 5/22/2023 3:25 PM, Thomas Gleixner wrote: > > Acked-by: Thomas Gleixner <tglx@linutronix.de> > Thank you very much Thomas. Reinette
On Thu, 11 May 2023 08:44:27 -0700 Reinette Chatre <reinette.chatre@intel.com> wrote: > Changes since V4: > - V4: https://lore.kernel.org/lkml/cover.1682615447.git.reinette.chatre@intel.com/ > - Add Kevin's Reviewed-by tag as applicable. > - Treat non-existing INTx interrupt context as kernel bug with WARN. This > exposed an issue in the scenario where INTx mask/unmask may occur without > INTx enabled. This is fixed by obtaining the interrupt context later > (right before use) within impacted functions: vfio_pci_intx_mask() and > vfio_pci_intx_unmask_handler(). (Kevin) > - Treat pci_irq_vector() returning '0' for a MSI/MSI-X interrupt as a kernel > bug via a WARN instead of ignoring this value. (Kevin) > - Improve accuracy of comments. (Kevin) > - Please refer to individual patches for local changes. > > Changes since V3: > - V3: https://lore.kernel.org/lkml/cover.1681837892.git.reinette.chatre@intel.com/ > - Be considerate about layout and size with changes to > struct vfio_pci_core_device. Keep flags together and transition all to > use bitfields. (Alex and Jason) > - Do not free dynamically allocated interrupts on error path. (Alex) > - Please refer to individual patches for localized changes. > > Changes since V2: > - V2: https://lore.kernel.org/lkml/cover.1680038771.git.reinette.chatre@intel.com/ > - During testing of V2 "kernel test robot" reported issues resulting from > include/linux/pci.h missing a stub for pci_msix_can_alloc_dyn() when > CONFIG_PCI_MSI=n. A separate fix was sent to address this. The fix can > be found in the kernel (since v6.3-rc7) as > commit 195d8e5da3ac ("PCI/MSI: Provide missing stub for pci_msix_can_alloc_dyn()") > - Biggest change is the transition to "active contexts" for both MSI and MSI-X. > Interrupt contexts have always been allocated when the interrupts are > allocated while they are only used while interrupts are > enabled. In this series interrupt contexts are made dynamic, while doing > so their allocation is moved to match how they are used: allocated when > interrupts are enabled. Whether a Linux interrupt number exists determines > whether an interrupt can be enabled. > Previous policy (up to V2) that an allocated interrupt has an interrupt > context no longer applies. Instead, an interrupt context has a > handler/trigger, aka "active contexts". (Alex) > - Re-ordered patches in support of "active contexts". > - Only free interrupts on MSI-X teardown and otherwise use the > allocated interrupts as a cache. (Alex) > - Using unsigned int for the vector broke the unwind loop within > vfio_msi_set_block(). (Alex) > - Introduce new "has_dyn_msix" property of virtual device instead of > querying support every time. (Alex) > - Some smaller changes, please refer to individual patches. > > Changes since RFC V1: > - RFC V1: https://lore.kernel.org/lkml/cover.1678911529.git.reinette.chatre@intel.com/ > - Improved changelogs. > - Simplify interface so that vfio_irq_ctx_alloc_single() returns pointer to > allocated context. (Alex) > - Remove vfio_irq_ctx_range_allocated() and associated attempts to maintain > invalid error path behavior. (Alex and Kevin) > - Add pointer to interrupt context as function parameter to > vfio_irq_ctx_free(). (Alex) > - Ensure variables are initialized. (Dan Carpenter) > - Only support dynamic allocation if device supports it. (Alex) > > Qemu allocates interrupts incrementally at the time the guest unmasks an > interrupt, for example each time a Linux guest runs request_irq(). > > Dynamic allocation of MSI-X interrupts was not possible until v6.2 [1]. > This prompted Qemu to, when allocating a new interrupt, first release all > previously allocated interrupts (including disable of MSI-X) followed > by re-allocation of all interrupts that includes the new interrupt. > Please see [2] for a detailed discussion about this issue. > > Releasing and re-allocating interrupts may be acceptable if all > interrupts are unmasked during device initialization. If unmasking of > interrupts occur during runtime this may result in lost interrupts. > For example, consider an accelerator device with multiple work queues, > each work queue having a dedicated interrupt. A work queue can be > enabled at any time with its associated interrupt unmasked while other > work queues are already active. Having all interrupts released and MSI-X > disabled to enable the new work queue will impact active work queues. > > This series builds on the recent interrupt sub-system core changes > that added support for dynamic MSI-X allocation after initial MSI-X > enabling. > > Add support for dynamic MSI-X allocation to vfio-pci. A flag > indicating lack of support for dynamic allocation already exist: > VFIO_IRQ_INFO_NORESIZE and has always been set for MSI and MSI-X. With > support for dynamic MSI-X the flag is cleared for MSI-X when supported, > enabling Qemu to modify its behavior. > > Any feedback is appreciated > > Reinette > > [1] commit 34026364df8e ("PCI/MSI: Provide post-enable dynamic allocation interfaces for MSI-X") > [2] https://lore.kernel.org/kvm/MWHPR11MB188603D0D809C1079F5817DC8C099@MWHPR11MB1886.namprd11.prod.outlook.com/#t > > Reinette Chatre (11): > vfio/pci: Consolidate irq cleanup on MSI/MSI-X disable > vfio/pci: Remove negative check on unsigned vector > vfio/pci: Prepare for dynamic interrupt context storage > vfio/pci: Move to single error path > vfio/pci: Use xarray for interrupt context storage > vfio/pci: Remove interrupt context counter > vfio/pci: Update stale comment > vfio/pci: Use bitfield for struct vfio_pci_core_device flags > vfio/pci: Probe and store ability to support dynamic MSI-X > vfio/pci: Support dynamic MSI-X > vfio/pci: Clear VFIO_IRQ_INFO_NORESIZE for MSI-X > > drivers/vfio/pci/vfio_pci_core.c | 8 +- > drivers/vfio/pci/vfio_pci_intrs.c | 305 ++++++++++++++++++++---------- > include/linux/vfio_pci_core.h | 26 +-- > include/uapi/linux/vfio.h | 3 + > 4 files changed, 229 insertions(+), 113 deletions(-) > > > base-commit: 457391b0380335d5e9a5babdec90ac53928b23b4 Applied to vfio next branch for v6.5. Thanks! Alex
Running regression tests in the following test matrix: i40e PF + INTx interrupt + RHEL9 guest -- PASS bnx2x PF + MSI-X + RHEL9 guest -- PASS iavf VF + MSI-X + Win2019 guest -- PASS mlx5_core VF + MSI-X + Win2022 guest -- PASS ixgbe PF + INTx + RHEL9 guest -- PASS i40e PF + MSIX + Win2019 guest -- PASS qede VF + MSIX + RHEL9 guest -- PASS Tested-by: YangHang Liu <yanghliu@redhat.com> On Wed, May 24, 2023 at 6:46 AM Alex Williamson <alex.williamson@redhat.com> wrote: > > On Thu, 11 May 2023 08:44:27 -0700 > Reinette Chatre <reinette.chatre@intel.com> wrote: > > > Changes since V4: > > - V4: https://lore.kernel.org/lkml/cover.1682615447.git.reinette.chatre@intel.com/ > > - Add Kevin's Reviewed-by tag as applicable. > > - Treat non-existing INTx interrupt context as kernel bug with WARN. This > > exposed an issue in the scenario where INTx mask/unmask may occur without > > INTx enabled. This is fixed by obtaining the interrupt context later > > (right before use) within impacted functions: vfio_pci_intx_mask() and > > vfio_pci_intx_unmask_handler(). (Kevin) > > - Treat pci_irq_vector() returning '0' for a MSI/MSI-X interrupt as a kernel > > bug via a WARN instead of ignoring this value. (Kevin) > > - Improve accuracy of comments. (Kevin) > > - Please refer to individual patches for local changes. > > > > Changes since V3: > > - V3: https://lore.kernel.org/lkml/cover.1681837892.git.reinette.chatre@intel.com/ > > - Be considerate about layout and size with changes to > > struct vfio_pci_core_device. Keep flags together and transition all to > > use bitfields. (Alex and Jason) > > - Do not free dynamically allocated interrupts on error path. (Alex) > > - Please refer to individual patches for localized changes. > > > > Changes since V2: > > - V2: https://lore.kernel.org/lkml/cover.1680038771.git.reinette.chatre@intel.com/ > > - During testing of V2 "kernel test robot" reported issues resulting from > > include/linux/pci.h missing a stub for pci_msix_can_alloc_dyn() when > > CONFIG_PCI_MSI=n. A separate fix was sent to address this. The fix can > > be found in the kernel (since v6.3-rc7) as > > commit 195d8e5da3ac ("PCI/MSI: Provide missing stub for pci_msix_can_alloc_dyn()") > > - Biggest change is the transition to "active contexts" for both MSI and MSI-X. > > Interrupt contexts have always been allocated when the interrupts are > > allocated while they are only used while interrupts are > > enabled. In this series interrupt contexts are made dynamic, while doing > > so their allocation is moved to match how they are used: allocated when > > interrupts are enabled. Whether a Linux interrupt number exists determines > > whether an interrupt can be enabled. > > Previous policy (up to V2) that an allocated interrupt has an interrupt > > context no longer applies. Instead, an interrupt context has a > > handler/trigger, aka "active contexts". (Alex) > > - Re-ordered patches in support of "active contexts". > > - Only free interrupts on MSI-X teardown and otherwise use the > > allocated interrupts as a cache. (Alex) > > - Using unsigned int for the vector broke the unwind loop within > > vfio_msi_set_block(). (Alex) > > - Introduce new "has_dyn_msix" property of virtual device instead of > > querying support every time. (Alex) > > - Some smaller changes, please refer to individual patches. > > > > Changes since RFC V1: > > - RFC V1: https://lore.kernel.org/lkml/cover.1678911529.git.reinette.chatre@intel.com/ > > - Improved changelogs. > > - Simplify interface so that vfio_irq_ctx_alloc_single() returns pointer to > > allocated context. (Alex) > > - Remove vfio_irq_ctx_range_allocated() and associated attempts to maintain > > invalid error path behavior. (Alex and Kevin) > > - Add pointer to interrupt context as function parameter to > > vfio_irq_ctx_free(). (Alex) > > - Ensure variables are initialized. (Dan Carpenter) > > - Only support dynamic allocation if device supports it. (Alex) > > > > Qemu allocates interrupts incrementally at the time the guest unmasks an > > interrupt, for example each time a Linux guest runs request_irq(). > > > > Dynamic allocation of MSI-X interrupts was not possible until v6.2 [1]. > > This prompted Qemu to, when allocating a new interrupt, first release all > > previously allocated interrupts (including disable of MSI-X) followed > > by re-allocation of all interrupts that includes the new interrupt. > > Please see [2] for a detailed discussion about this issue. > > > > Releasing and re-allocating interrupts may be acceptable if all > > interrupts are unmasked during device initialization. If unmasking of > > interrupts occur during runtime this may result in lost interrupts. > > For example, consider an accelerator device with multiple work queues, > > each work queue having a dedicated interrupt. A work queue can be > > enabled at any time with its associated interrupt unmasked while other > > work queues are already active. Having all interrupts released and MSI-X > > disabled to enable the new work queue will impact active work queues. > > > > This series builds on the recent interrupt sub-system core changes > > that added support for dynamic MSI-X allocation after initial MSI-X > > enabling. > > > > Add support for dynamic MSI-X allocation to vfio-pci. A flag > > indicating lack of support for dynamic allocation already exist: > > VFIO_IRQ_INFO_NORESIZE and has always been set for MSI and MSI-X. With > > support for dynamic MSI-X the flag is cleared for MSI-X when supported, > > enabling Qemu to modify its behavior. > > > > Any feedback is appreciated > > > > Reinette > > > > [1] commit 34026364df8e ("PCI/MSI: Provide post-enable dynamic allocation interfaces for MSI-X") > > [2] https://lore.kernel.org/kvm/MWHPR11MB188603D0D809C1079F5817DC8C099@MWHPR11MB1886.namprd11.prod.outlook.com/#t > > > > Reinette Chatre (11): > > vfio/pci: Consolidate irq cleanup on MSI/MSI-X disable > > vfio/pci: Remove negative check on unsigned vector > > vfio/pci: Prepare for dynamic interrupt context storage > > vfio/pci: Move to single error path > > vfio/pci: Use xarray for interrupt context storage > > vfio/pci: Remove interrupt context counter > > vfio/pci: Update stale comment > > vfio/pci: Use bitfield for struct vfio_pci_core_device flags > > vfio/pci: Probe and store ability to support dynamic MSI-X > > vfio/pci: Support dynamic MSI-X > > vfio/pci: Clear VFIO_IRQ_INFO_NORESIZE for MSI-X > > > > drivers/vfio/pci/vfio_pci_core.c | 8 +- > > drivers/vfio/pci/vfio_pci_intrs.c | 305 ++++++++++++++++++++---------- > > include/linux/vfio_pci_core.h | 26 +-- > > include/uapi/linux/vfio.h | 3 + > > 4 files changed, 229 insertions(+), 113 deletions(-) > > > > > > base-commit: 457391b0380335d5e9a5babdec90ac53928b23b4 > > Applied to vfio next branch for v6.5. Thanks! > > Alex >
Hi YangHang, On 5/23/2023 7:43 PM, YangHang Liu wrote: > Running regression tests in the following test matrix: > > i40e PF + INTx interrupt + RHEL9 guest -- PASS > bnx2x PF + MSI-X + RHEL9 guest -- PASS > iavf VF + MSI-X + Win2019 guest -- PASS > mlx5_core VF + MSI-X + Win2022 guest -- PASS > ixgbe PF + INTx + RHEL9 guest -- PASS > i40e PF + MSIX + Win2019 guest -- PASS > qede VF + MSIX + RHEL9 guest -- PASS > > Tested-by: YangHang Liu <yanghliu@redhat.com> > Thank you very much for this thorough testing. Reinette