[V2,07/33] genirq/msi: Provide msi_create/free_device_irq_domain()

Message ID 20221121091326.879869866@linutronix.de
State New
Headers
Series genirq, PCI/MSI: Support for per device MSI and PCI/IMS - Part 3 implementation |

Commit Message

Thomas Gleixner Nov. 21, 2022, 2:37 p.m. UTC
  Now that all prerequsites are in place, provide the actual interfaces for
creating and removing per device interrupt domains.

MSI device interrupt domains are created from the provided
msi_domain_template which is duplicated so that it can be modified for the
particular device.

The name of the domain and the name of the interrupt chip are composed by
"$(PREFIX)$(CHIPNAME)-$(DEVNAME)"

  $PREFIX:   The optional prefix provided by the underlying MSI parent domain
             via msi_parent_ops::prefix.
  $CHIPNAME: The name of the irq_chip in the template
  $DEVNAME:  The name of the device

The domain is further initialized through a MSI parent domain callback which
fills in the required functionality for the parent domain or domains further
down the hierarchy. This initialization can fail, e.g. when the requested
feature or MSI domain type cannot be supported.

The domain pointer is stored in the pointer array inside of msi_device_data
which is attached to the domain.

The domain can be removed via the API or left for disposal via devres when
the device is torn down. The API removal is useful e.g. for PCI to have
seperate domains for MSI and MSI-X, which are mutually exclusive and always
occupy the default domain id slot.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/msi.h |    6 ++
 kernel/irq/msi.c    |  146 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 152 insertions(+)
  

Comments

Tian, Kevin Nov. 23, 2022, 8:02 a.m. UTC | #1
> From: Thomas Gleixner <tglx@linutronix.de>
> Sent: Monday, November 21, 2022 10:38 PM
> 
> 
> +static inline void msi_remove_device_irqdomains(struct device *dev, struct
> msi_device_data *md)
> +{

'md' is unused

> 
> +/**
> + * msi_create_device_irq_domain - Create a device MSI interrupt domain
> + * @dev:		Pointer to the device
> + * @domid:		Domain id
> + * @template:		MSI domain info bundle used as template
> + * @hwsize:		Maximum number of MSI table entries (0 if unknown
> or unlimited)
> + * @domain_data:	Optional pointer to domain specific data which is set
> in
> + *			msi_domain_info::data
> + * @chip_data:		Optional pointer to chip specific data which is
> set in
> + *			msi_domain_info::chip_data
> + *
> + * Return: True on success, false otherwise

Can you elaborate why boolean type is selected instead of returning the
actual error codes? the outmost callers are all new functions:

	pci_setup_msi[x]_device_domain()
	pci_create_ims_domain()

I didn't find out any legacy convention forcing this way...

> +	bundle = kmemdup(template, sizeof(*bundle), GFP_KERNEL);
> +	if (!bundle)
> +		return false;
> +
> +	bundle->info.hwsize = hwsize ? hwsize : MSI_MAX_INDEX;

patch04 marks that hwsize being 0 means unknown or unlimited in the
header file.

but here info.hwsize always gets a value i.e. the meaning of 0 only exists
in this function. What about removing the trailing words about 0 in
patch04?

- + * @hwsize:		The hardware table size (0 if unknown/unlimited)
+ + * @hwsize:		The hardware table size
  
Thomas Gleixner Nov. 23, 2022, 11:38 a.m. UTC | #2
On Wed, Nov 23 2022 at 08:02, Kevin Tian wrote:
>> From: Thomas Gleixner <tglx@linutronix.de>
>> Sent: Monday, November 21, 2022 10:38 PM
>> 
>> 
>> +static inline void msi_remove_device_irqdomains(struct device *dev, struct
>> msi_device_data *md)
>> +{
>
> 'md' is unused

Duh, yes.

>> + *
>> + * Return: True on success, false otherwise
>
> Can you elaborate why boolean type is selected instead of returning the
> actual error codes? the outmost callers are all new functions:
>
> 	pci_setup_msi[x]_device_domain()
> 	pci_create_ims_domain()
>
> I didn't find out any legacy convention forcing this way...

What's the value of error codes? 99% of all callsites do:

       ret = foo();
       if (ret)
          goto fail;

Nothing evaluates the error codes, unless there is real useful
information like EAGAIN or ENOSPC which can tell the caller to retry
eventually or with different parameters. But for the above, the error
code is just useless.

>> +	bundle = kmemdup(template, sizeof(*bundle), GFP_KERNEL);
>> +	if (!bundle)
>> +		return false;
>> +
>> +	bundle->info.hwsize = hwsize ? hwsize : MSI_MAX_INDEX;
>
> patch04 marks that hwsize being 0 means unknown or unlimited in the
> header file.
>
> but here info.hwsize always gets a value i.e. the meaning of 0 only exists
> in this function. What about removing the trailing words about 0 in
> patch04?
>
> - + * @hwsize:		The hardware table size (0 if unknown/unlimited)
> + + * @hwsize:		The hardware table size

Fair enough, though I rather make that:

 * @hwsize:		The hardware table size or the software defined
                        index limit
  
Thomas Gleixner Nov. 23, 2022, 9:01 p.m. UTC | #3
On Wed, Nov 23 2022 at 12:38, Thomas Gleixner wrote:
> On Wed, Nov 23 2022 at 08:02, Kevin Tian wrote:
>>> +	bundle->info.hwsize = hwsize ? hwsize : MSI_MAX_INDEX;
>>
>> patch04 marks that hwsize being 0 means unknown or unlimited in the
>> header file.
>>
>> but here info.hwsize always gets a value i.e. the meaning of 0 only exists
>> in this function. What about removing the trailing words about 0 in
>> patch04?
>>
>> - + * @hwsize:		The hardware table size (0 if unknown/unlimited)
>> + + * @hwsize:		The hardware table size
>
> Fair enough, though I rather make that:
>
>  * @hwsize:		The hardware table size or the software defined
>                         index limit
>

Actually 0 still needs to be valid to guarantee backward compatibility
for all existing msi_domain_info implementations.

The above is the per device domain creation function, but yes, I can lift
that initialization into the common MSI domain creation code.

Let me stare at this some more.
  
Tian, Kevin Nov. 24, 2022, 1:07 a.m. UTC | #4
> From: Thomas Gleixner <tglx@linutronix.de>
> Sent: Wednesday, November 23, 2022 7:38 PM
> 
> >> + *
> >> + * Return: True on success, false otherwise
> >
> > Can you elaborate why boolean type is selected instead of returning the
> > actual error codes? the outmost callers are all new functions:
> >
> > 	pci_setup_msi[x]_device_domain()
> > 	pci_create_ims_domain()
> >
> > I didn't find out any legacy convention forcing this way...
> 
> What's the value of error codes? 99% of all callsites do:
> 
>        ret = foo();
>        if (ret)
>           goto fail;
> 
> Nothing evaluates the error codes, unless there is real useful
> information like EAGAIN or ENOSPC which can tell the caller to retry
> eventually or with different parameters. But for the above, the error
> code is just useless.
> 

I looked at it from the outmost invocation:

@@ -436,6 +436,9 @@ int __pci_enable_msi_range(struct pci_de
 	if (rc)
 		return rc;
 
+	if (!pci_setup_msi_device_domain(dev))
+		return -ENODEV;
+

the current style kind of converts meaningful -EINVAL/-ENOMEM/etc.
into -ENODEV.
  
Thomas Gleixner Nov. 24, 2022, 8:36 a.m. UTC | #5
On Thu, Nov 24 2022 at 01:07, Kevin Tian wrote:
>> From: Thomas Gleixner <tglx@linutronix.de>
> I looked at it from the outmost invocation:
>
> @@ -436,6 +436,9 @@ int __pci_enable_msi_range(struct pci_de
>  	if (rc)
>  		return rc;
>  
> +	if (!pci_setup_msi_device_domain(dev))
> +		return -ENODEV;
> +
>
> the current style kind of converts meaningful -EINVAL/-ENOMEM/etc.
> into -ENODEV.

But go to the call sites of the various places in drivers which set up
MSI or MSI-X and check whether anything evaluates those error codes in a
meaningful way.

Some of them print the error code, but that does not help much because
the error code does not allow you to pin point the place which returns
that. If you just analyze the pci_alloc_irq_vectors_affinity() call then
you find at least 10 places, which can return -ENOMEM. So how is that
meaningful and helpful?

All it tells you is that some memory allocation failed. In that case the
failure of the PCI/MSI[-X] setup is the least of the problems.

Where error codes are mandatory are user space interfaces, but in the
kernel a simple fail/success like we have with many interfaces which
just return a NULL pointer on fail is sufficient.

Just because the kernel historically propagated error codes all over the
place does not make them useful or meaningful.

Thanks,

        tglx
  
Tian, Kevin Nov. 28, 2022, 1:47 a.m. UTC | #6
> From: Thomas Gleixner <tglx@linutronix.de>
> Sent: Thursday, November 24, 2022 4:36 PM
> 
> On Thu, Nov 24 2022 at 01:07, Kevin Tian wrote:
> >> From: Thomas Gleixner <tglx@linutronix.de>
> > I looked at it from the outmost invocation:
> >
> > @@ -436,6 +436,9 @@ int __pci_enable_msi_range(struct pci_de
> >  	if (rc)
> >  		return rc;
> >
> > +	if (!pci_setup_msi_device_domain(dev))
> > +		return -ENODEV;
> > +
> >
> > the current style kind of converts meaningful -EINVAL/-ENOMEM/etc.
> > into -ENODEV.
> 
> But go to the call sites of the various places in drivers which set up
> MSI or MSI-X and check whether anything evaluates those error codes in a
> meaningful way.
> 
> Some of them print the error code, but that does not help much because
> the error code does not allow you to pin point the place which returns
> that. If you just analyze the pci_alloc_irq_vectors_affinity() call then
> you find at least 10 places, which can return -ENOMEM. So how is that
> meaningful and helpful?
> 
> All it tells you is that some memory allocation failed. In that case the
> failure of the PCI/MSI[-X] setup is the least of the problems.
> 
> Where error codes are mandatory are user space interfaces, but in the
> kernel a simple fail/success like we have with many interfaces which
> just return a NULL pointer on fail is sufficient.
> 
> Just because the kernel historically propagated error codes all over the
> place does not make them useful or meaningful.
> 

Good learning. Thanks.
  

Patch

--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -536,6 +536,12 @@  struct irq_domain *msi_create_irq_domain
 					 struct msi_domain_info *info,
 					 struct irq_domain *parent);
 
+bool msi_create_device_irq_domain(struct device *dev, unsigned int domid,
+				  const struct msi_domain_template *template,
+				  unsigned int hwsize, void *domain_data,
+				  void *chip_data);
+void msi_remove_device_irq_domain(struct device *dev, unsigned int domid);
+
 int msi_domain_alloc_irqs_range_locked(struct device *dev, unsigned int domid,
 				       unsigned int first, unsigned int last);
 int msi_domain_alloc_irqs_range(struct device *dev, unsigned int domid,
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -53,6 +53,14 @@  static inline void msi_setup_default_irq
 		md->__irqdomains[MSI_DEFAULT_DOMAIN] = dev->msi.domain;
 }
 
+static inline void msi_remove_device_irqdomains(struct device *dev, struct msi_device_data *md)
+{
+	int domid;
+
+	for (domid = 0; domid < MSI_MAX_DEVICE_IRQDOMAINS; domid++)
+		msi_remove_device_irq_domain(dev, domid);
+}
+
 static int msi_get_domain_base_index(struct device *dev, unsigned int domid)
 {
 	lockdep_assert_held(&dev->msi.data->mutex);
@@ -279,6 +287,7 @@  static void msi_device_data_release(stru
 {
 	struct msi_device_data *md = res;
 
+	msi_remove_device_irqdomains(dev, md);
 	WARN_ON_ONCE(!xa_empty(&md->__store));
 	xa_destroy(&md->__store);
 	dev->msi.data = NULL;
@@ -875,6 +884,143 @@  bool msi_parent_init_dev_msi_info(struct
 							 msi_child_info);
 }
 
+/**
+ * msi_create_device_irq_domain - Create a device MSI interrupt domain
+ * @dev:		Pointer to the device
+ * @domid:		Domain id
+ * @template:		MSI domain info bundle used as template
+ * @hwsize:		Maximum number of MSI table entries (0 if unknown or unlimited)
+ * @domain_data:	Optional pointer to domain specific data which is set in
+ *			msi_domain_info::data
+ * @chip_data:		Optional pointer to chip specific data which is set in
+ *			msi_domain_info::chip_data
+ *
+ * Return: True on success, false otherwise
+ *
+ * There is no firmware node required for this interface because the per
+ * device domains are software constructs which are actually closer to the
+ * hardware reality than any firmware can describe them.
+ *
+ * The domain name and the irq chip name for a MSI device domain are
+ * composed by: "$(PREFIX)$(CHIPNAME)-$(DEVNAME)"
+ *
+ * $PREFIX:   Optional prefix provided by the underlying MSI parent domain
+ *	      via msi_parent_ops::prefix. If that pointer is NULL the prefix
+ *	      is empty.
+ * $CHIPNAME: The name of the irq_chip in @template
+ * $DEVNAME:  The name of the device
+ *
+ * This results in understandable chip names and hardware interrupt numbers
+ * in e.g. /proc/interrupts
+ *
+ * PCI-MSI-0000:00:1c.0     0-edge  Parent domain has no prefix
+ * IR-PCI-MSI-0000:00:1c.4  0-edge  Same with interrupt remapping prefix 'IR-'
+ *
+ * IR-PCI-MSIX-0000:3d:00.0 0-edge  Hardware interrupt numbers reflect
+ * IR-PCI-MSIX-0000:3d:00.0 1-edge  the real MSI-X index on that device
+ * IR-PCI-MSIX-0000:3d:00.0 2-edge
+ *
+ * On IMS domains the hardware interrupt number is either a table entry
+ * index or a purely software managed index but it is guaranteed to be
+ * unique.
+ *
+ * The domain pointer is stored in @dev::msi::data::__irqdomains[]. All
+ * subsequent operations on the domain depend on the domain id.
+ *
+ * The domain is automatically freed when the device is removed via devres
+ * in the context of @dev::msi::data freeing, but it can also be
+ * independently removed via @msi_remove_device_irq_domain().
+ */
+bool msi_create_device_irq_domain(struct device *dev, unsigned int domid,
+				  const struct msi_domain_template *template,
+				  unsigned int hwsize, void *domain_data,
+				  void *chip_data)
+{
+	struct irq_domain *domain, *parent = dev->msi.domain;
+	const struct msi_parent_ops *pops;
+	struct msi_domain_template *bundle;
+	struct fwnode_handle *fwnode;
+
+	if (!irq_domain_is_msi_parent(parent))
+		return false;
+
+	if (domid >= MSI_MAX_DEVICE_IRQDOMAINS)
+		return false;
+
+	bundle = kmemdup(template, sizeof(*bundle), GFP_KERNEL);
+	if (!bundle)
+		return false;
+
+	bundle->info.hwsize = hwsize ? hwsize : MSI_MAX_INDEX;
+	bundle->info.chip = &bundle->chip;
+	bundle->info.ops = &bundle->ops;
+	bundle->info.data = domain_data;
+	bundle->info.chip_data = chip_data;
+
+	pops = parent->msi_parent_ops;
+	snprintf(bundle->name, sizeof(bundle->name), "%s%s-%s",
+		 pops->prefix ? : "", bundle->chip.name, dev_name(dev));
+	bundle->chip.name = bundle->name;
+
+	fwnode = irq_domain_alloc_named_fwnode(bundle->name);
+	if (!fwnode)
+		goto free_bundle;
+
+	if (msi_setup_device_data(dev))
+		goto free_fwnode;
+
+	msi_lock_descs(dev);
+
+	if (WARN_ON_ONCE(msi_get_device_domain(dev, domid)))
+		goto fail;
+
+	if (!pops->init_dev_msi_info(dev, parent, parent, &bundle->info))
+		goto fail;
+
+	domain = __msi_create_irq_domain(fwnode, &bundle->info, IRQ_DOMAIN_FLAG_MSI_DEVICE, parent);
+	if (!domain)
+		goto fail;
+
+	domain->dev = dev;
+	dev->msi.data->__irqdomains[domid] = domain;
+	msi_unlock_descs(dev);
+	return true;
+
+fail:
+	msi_unlock_descs(dev);
+free_fwnode:
+	kfree(fwnode);
+free_bundle:
+	kfree(bundle);
+	return false;
+}
+
+/**
+ * msi_remove_device_irq_domain - Free a device MSI interrupt domain
+ * @dev:	Pointer to the device
+ * @domid:	Domain id
+ */
+void msi_remove_device_irq_domain(struct device *dev, unsigned int domid)
+{
+	struct msi_domain_info *info;
+	struct irq_domain *domain;
+
+	msi_lock_descs(dev);
+
+	domain = msi_get_device_domain(dev, domid);
+
+	if (!domain || !irq_domain_is_msi_device(domain))
+		goto unlock;
+
+	dev->msi.data->__irqdomains[domid] = NULL;
+	info = domain->host_data;
+	irq_domain_remove(domain);
+	kfree(container_of(info, struct msi_domain_template, info));
+
+unlock:
+	msi_unlock_descs(dev);
+}
+
 int msi_domain_prepare_irqs(struct irq_domain *domain, struct device *dev,
 			    int nvec, msi_alloc_info_t *arg)
 {