cdx: add MSI support for CDX bus

Message ID 20230508140950.12717-1-nipun.gupta@amd.com
State New
Headers
Series cdx: add MSI support for CDX bus |

Commit Message

Gupta, Nipun May 8, 2023, 2:09 p.m. UTC
  Add CDX-MSI domain with gic-its domain as parent, to support MSI
for CDX devices. CDX devices allocate MSIs from the CDX domain.
Also, introduce APIs to alloc and free IRQs for CDX domain.

Signed-off-by: Nipun Gupta <nipun.gupta@amd.com>
Signed-off-by: Nikhil Agarwal <nikhil.agarwal@amd.com>
Signed-off-by: Abhijit Gangurde <abhijit.gangurde@amd.com>
Reviewed-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com>
---

This is a rework of CDX-MSI RFC sent at:
https://lore.kernel.org/lkml/20221014044049.2557085-8-nipun.gupta@amd.com/
The change was separated out from CDX bus series and has now been
validated on Versal-net platform.

 drivers/cdx/Kconfig                     |   1 +
 drivers/cdx/Makefile                    |   2 +-
 drivers/cdx/cdx.c                       |   6 +
 drivers/cdx/cdx.h                       |  10 ++
 drivers/cdx/cdx_msi.c                   | 153 ++++++++++++++++++++++++
 drivers/cdx/controller/cdx_controller.c |  23 ++++
 drivers/cdx/controller/mc_cdx_pcol.h    |  58 +++++++++
 drivers/cdx/controller/mcdi_functions.c |  19 +++
 drivers/cdx/controller/mcdi_functions.h |  20 ++++
 include/linux/cdx/cdx_bus.h             |  26 ++++
 kernel/irq/msi.c                        |   1 +
 11 files changed, 318 insertions(+), 1 deletion(-)
 create mode 100644 drivers/cdx/cdx_msi.c
  

Comments

Thomas Gleixner May 9, 2023, 8:01 a.m. UTC | #1
On Mon, May 08 2023 at 19:39, Nipun Gupta wrote:
> Add CDX-MSI domain with gic-its domain as parent, to support MSI
> for CDX devices. CDX devices allocate MSIs from the CDX domain.
> Also, introduce APIs to alloc and free IRQs for CDX domain.

This lacks any information why this needs to have a separate irq domain
and what this is about. Changelogs need to be self explanatory and
providing a link to some RFC series which might have more information
does not cut it.

Just for the record. I complained about the useless changelog in that
RFC series already.

> Signed-off-by: Nipun Gupta <nipun.gupta@amd.com>
> Signed-off-by: Nikhil Agarwal <nikhil.agarwal@amd.com>
> Signed-off-by: Abhijit Gangurde <abhijit.gangurde@amd.com>

This Signed-off-by chain is broken. If this is intended to denote
co-authorship, then please follow:

  https://www.kernel.org/doc/html/latest/process/submitting-patches.html#when-to-use-acked-by-cc-and-co-developed-by

> +static struct irq_chip cdx_msi_irq_chip = {
> +	.name			= "CDX-MSI",
> +	.irq_mask		= irq_chip_mask_parent,
> +	.irq_unmask		= irq_chip_unmask_parent,
> +	.irq_eoi		= irq_chip_eoi_parent,
> +	.irq_set_affinity	= msi_domain_set_affinity,
> +	.irq_write_msi_msg	= cdx_msi_write_msg
> +};

The only real CDX specific functionality here is a CDX specific
irq_write_msi_msg() callback, right?

And I gave you a pointer how this should be handled, but instead of
helping this effort along you go off and implement it differently just
because. Sigh!

Thanks,

        tglx
  
Gupta, Nipun May 9, 2023, 11:06 a.m. UTC | #2
> -----Original Message-----
> From: Thomas Gleixner <tglx@linutronix.de>
> Sent: Tuesday, May 9, 2023 1:32 PM
> To: Gupta, Nipun <Nipun.Gupta@amd.com>; gregkh@linuxfoundation.org;
> maz@kernel.org; jgg@ziepe.ca; linux-kernel@vger.kernel.org
> Cc: git (AMD-Xilinx) <git@amd.com>; Anand, Harpreet
> <harpreet.anand@amd.com>; Jansen Van Vuuren, Pieter <pieter.jansen-
> van-vuuren@amd.com>; Agarwal, Nikhil <nikhil.agarwal@amd.com>; Simek,
> Michal <michal.simek@amd.com>; Gangurde, Abhijit
> <abhijit.gangurde@amd.com>; Cascon, Pablo <pablo.cascon@amd.com>;
> Gupta, Nipun <Nipun.Gupta@amd.com>
> Subject: Re: [PATCH] cdx: add MSI support for CDX bus
> 
> Caution: This message originated from an External Source. Use proper
> caution when opening attachments, clicking links, or responding.
> 
> 
> On Mon, May 08 2023 at 19:39, Nipun Gupta wrote:
> > Add CDX-MSI domain with gic-its domain as parent, to support MSI
> > for CDX devices. CDX devices allocate MSIs from the CDX domain.
> > Also, introduce APIs to alloc and free IRQs for CDX domain.
> 
> This lacks any information why this needs to have a separate irq domain
> and what this is about. Changelogs need to be self explanatory and
> providing a link to some RFC series which might have more information
> does not cut it.
> 
> Just for the record. I complained about the useless changelog in that
> RFC series already.
> 
> > Signed-off-by: Nipun Gupta <nipun.gupta@amd.com>
> > Signed-off-by: Nikhil Agarwal <nikhil.agarwal@amd.com>
> > Signed-off-by: Abhijit Gangurde <abhijit.gangurde@amd.com>
> 
> This Signed-off-by chain is broken. If this is intended to denote
> co-authorship, then please follow:
> 
>   https://www.kernel.org/doc/html/latest/process/submitting-
> patches.html#when-to-use-acked-by-cc-and-co-developed-by
> 
> > +static struct irq_chip cdx_msi_irq_chip = {
> > +     .name                   = "CDX-MSI",
> > +     .irq_mask               = irq_chip_mask_parent,
> > +     .irq_unmask             = irq_chip_unmask_parent,
> > +     .irq_eoi                = irq_chip_eoi_parent,
> > +     .irq_set_affinity       = msi_domain_set_affinity,
> > +     .irq_write_msi_msg      = cdx_msi_write_msg
> > +};
> 
> The only real CDX specific functionality here is a CDX specific
> irq_write_msi_msg() callback, right?
> 
> And I gave you a pointer how this should be handled, but instead of
> helping this effort along you go off and implement it differently just
> because. Sigh!

Hi Thomas,

As you rightly mentioned the irq_chip has only irq_write_msi_msg() as
callback, but there is also cdx_msi_prepare() in msi_domain_ops which
needs to fetch device ID from CDX device, due to which we are currently
using separate CDX domain.

IIUC, as per your suggestion we should have CDX bus token added into 
its_init_dev_msi_info() of
https://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git/tree/drivers/irqchip/irq-gic-v3-its-msi-parent.c?h=devmsi-arm,
and register CDX specific 'msi_prepare' here; so that we can use
msi_create_device_irq_domain() to create a per device domain?

Thanks,
Nipun

> 
> Thanks,
> 
>         tglx
  
Thomas Gleixner May 9, 2023, 10:01 p.m. UTC | #3
Nipun!

On Tue, May 09 2023 at 11:06, Nipun Gupta wrote:
>> -----Original Message-----
>> From: Thomas Gleixner <tglx@linutronix.de>
>> Sent: Tuesday, May 9, 2023 1:32 PM
>> To: Gupta, Nipun <Nipun.Gupta@amd.com>; gregkh@linuxfoundation.org;
>> maz@kernel.org; jgg@ziepe.ca; linux-kernel@vger.kernel.org

Can you please fix your mail client to not copy half of the mail header
into your reply?

>> Caution: This message originated from an External Source. Use proper
>> caution when opening attachments, clicking links, or responding.

That's also relevant information for me, right?
 
>> The only real CDX specific functionality here is a CDX specific
>> irq_write_msi_msg() callback, right?
>> 
>> And I gave you a pointer how this should be handled, but instead of
>> helping this effort along you go off and implement it differently just
>> because. Sigh!
>
> As you rightly mentioned the irq_chip has only irq_write_msi_msg() as
> callback, but there is also cdx_msi_prepare() in msi_domain_ops which
> needs to fetch device ID from CDX device, due to which we are currently
> using separate CDX domain.

Sure. But where is that information in the changelog?

> IIUC, as per your suggestion we should have CDX bus token added into 
> its_init_dev_msi_info() of
> https://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git/tree/drivers/irqchip/irq-gic-v3-its-msi-parent.c?h=devmsi-arm,
> and register CDX specific 'msi_prepare' here; so that we can use
> msi_create_device_irq_domain() to create a per device domain?

Correct.

I'm not insisting on that, but you could at least have had the courtesy
of responding to my review reply and explain to me why you want to solve
it differently and why my suggestion is not the right solution.

Alternatively you could have added that information in the changelog or
cover letter.

So in summary you ignored _all_ review comments I made, went off and did
something different and provided a slightly different useless changelog
with the extra add on of a broken Signed-off-by chain.

Feel free to ignore my reviews and the documentation which we put out
there to make collaboration feasible for both sides, but please don't be
upset when I ignore you and your patches in return.

Thanks,

        tglx
  
Gupta, Nipun May 10, 2023, 2:04 p.m. UTC | #4
On 5/10/2023 3:31 AM, Thomas Gleixner wrote:
> 
> Nipun!
> 
> On Tue, May 09 2023 at 11:06, Nipun Gupta wrote:
>>> -----Original Message-----
>>> From: Thomas Gleixner <tglx@linutronix.de>
>>> Sent: Tuesday, May 9, 2023 1:32 PM
>>> To: Gupta, Nipun <Nipun.Gupta@amd.com>; gregkh@linuxfoundation.org;
>>> maz@kernel.org; jgg@ziepe.ca; linux-kernel@vger.kernel.org
> 
> Can you please fix your mail client to not copy half of the mail header
> into your reply?

Sure. Got it fixed.

> 
>>> Caution: This message originated from an External Source. Use proper
>>> caution when opening attachments, clicking links, or responding.
> 
> That's also relevant information for me, right?

Sorry to submit with this text, have already contacted concerned 
internal team regarding removal of this text. Have removed it manually 
for now.

> 
>>> The only real CDX specific functionality here is a CDX specific
>>> irq_write_msi_msg() callback, right?
>>>
>>> And I gave you a pointer how this should be handled, but instead of
>>> helping this effort along you go off and implement it differently just
>>> because. Sigh!
>>
>> As you rightly mentioned the irq_chip has only irq_write_msi_msg() as
>> callback, but there is also cdx_msi_prepare() in msi_domain_ops which
>> needs to fetch device ID from CDX device, due to which we are currently
>> using separate CDX domain.
> 
> Sure. But where is that information in the changelog?
> 
>> IIUC, as per your suggestion we should have CDX bus token added into
>> its_init_dev_msi_info() of
>> https://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git/tree/drivers/irqchip/irq-gic-v3-its-msi-parent.c?h=devmsi-arm,
>> and register CDX specific 'msi_prepare' here; so that we can use
>> msi_create_device_irq_domain() to create a per device domain?
> 
> Correct.
> 
> I'm not insisting on that, but you could at least have had the courtesy
> of responding to my review reply and explain to me why you want to solve
> it differently and why my suggestion is not the right solution.
> 
> Alternatively you could have added that information in the changelog or
> cover letter.
> 
> So in summary you ignored _all_ review comments I made, went off and did
> something different and provided a slightly different useless changelog
> with the extra add on of a broken Signed-off-by chain.
> 
> Feel free to ignore my reviews and the documentation which we put out
> there to make collaboration feasible for both sides, but please don't be
> upset when I ignore you and your patches in return.

Sincere apology for not responding to the earlier comments. Intention 
was never to ignore the review comments. Appreciate your vast changes 
regarding the MSI, and the patch series you shared took time to 
understand (provided other things as well), and it was quite late to 
reply. I understand that even in this case atleast I should have added 
this as part of the cover-letter.

IMHO, use-case for MSI in CDX subsystem is a bit different from per 
device MSI domain. Here we are trying to create a domain per CDX 
controller which is attached to a MSI controller, and all devices on a 
particular CDX controller will have same mechanism of write MSI message. 
Also, the current CDX controller that we have added has a different 
mechanism for MSI prepare (it gets requester ID from firmware).

In your opinion is there any advantage in moving to a per device domain 
for CDX devices? We can definitely rethink the implementation of MSI in 
CDX subsystem.

Thanks,
Nipun

> 
> Thanks,
> 
>          tglx
> 
>
  
Thomas Gleixner May 10, 2023, 10:29 p.m. UTC | #5
Nipun!

On Wed, May 10 2023 at 19:34, Nipun Gupta wrote:
> On 5/10/2023 3:31 AM, Thomas Gleixner wrote:
>> I'm not insisting on that, but you could at least have had the courtesy
>> of responding to my review reply and explain to me why you want to solve
>> it differently and why my suggestion is not the right solution.
>> 
>> Alternatively you could have added that information in the changelog or
>> cover letter.
>> 
>> So in summary you ignored _all_ review comments I made, went off and did
>> something different and provided a slightly different useless changelog
>> with the extra add on of a broken Signed-off-by chain.
>> 
>> Feel free to ignore my reviews and the documentation which we put out
>> there to make collaboration feasible for both sides, but please don't be
>> upset when I ignore you and your patches in return.
>
> Sincere apology for not responding to the earlier comments. Intention 
> was never to ignore the review comments. Appreciate your vast changes 
> regarding the MSI, and the patch series you shared took time to 
> understand (provided other things as well), and it was quite late to 
> reply. I understand that even in this case atleast I should have added 
> this as part of the cover-letter.

Fair enough. All settled.

> IMHO, use-case for MSI in CDX subsystem is a bit different from per 
> device MSI domain. Here we are trying to create a domain per CDX 
> controller which is attached to a MSI controller, and all devices on a 
> particular CDX controller will have same mechanism of write MSI
> message.

That was exactly the same assumption which PCI/MSI and other MSI
implementations made. It turned out to be the wrong abstraction.

CDX is not any different than PCI. The actual "interrupt chip" is not
part of the bus, it's part of the device and pretending that it is a bus
specific thing is just running in to the same cul-de-sac sooner than
later.

> Also, the current CDX controller that we have added has a different 
> mechanism for MSI prepare (it gets requester ID from firmware).

That's not an argument, that's just an implementation detail.

> In your opinion is there any advantage in moving to a per device domain 
> for CDX devices? We can definitely rethink the implementation of MSI in 
> CDX subsystem.

See above.

While talking about implementation and design. I actually got curious
and looked at CDX because I was amazed about the gazillion indirections
in that msi_write_msg() callback.

So this ends up doing:

   cdx->ops->dev_configure(cdx, ...)
     cdx_configure_device()
       cdx_mcdi_write_msi()
         cdx_mcdi_rpc_async()
           kmalloc()                            <- FAIL #1
           cdx_mcdi_rpc_async_internal()
              queue_work()                      <- FAIL #2

#1) That kmalloc() uses GFP_ATOMIC, but this is invoked deep in the guts
    of interrupt handling with locks held and interrupts disabled.

    Aside of the fact that this breaks on PREEMPT_RT, such allocations
    are generally frowned upon. As a consequence the kref_put()s in the
    error paths of cdx_mcdi_rpc_async_internal() will blow up on RT
    too.

    I know that Xilinx stated publicly that they don't support RT, but
    RT is not that far out to be supported in mainline and aside of that
    I know for sure that quite a lot of Xilinx customers use PREEMPT_RT
    nevertheless.

#2) That's actually the worse part of it and completely broken versus
    device setup

    probe()
      cdx_msi_domain_alloc_irqs()
      ...
      request_irq() {
        ...
        irq_activate()
          irq_chip_write_msi_msg()
            ...
            queue_work()
          ...
      }

      enable_irq_in_device()
      
        <- device raises interrupt and eventually uses an uninitialized
           MSI message because the scheduled work has not yet completed.

    That's going to be a nightmare to debug and it's going to happen
    once in a blue moon out in the field.

The interrupt subsystem already can handle update mechanisms which
require sleepable context:

   irq_bus_lock() and irq_bus_sync_unlock() irqchip callbacks

They were initially implemented to deal with interrupt chips which are
configured via I2C, SPI etc. 

How does that work?

On entry to interrupt management functions the sequence is:

    if (desc->irq_data.chip->irq_bus_lock)
       desc->irq_data.chip->irq_bus_lock(...)
    raw_spin_lock_irq(&desc->lock);

and on exit:

    raw_spin_unlock_irq(&desc->lock);
    if (desc->irq_data.chip->irq_bus_sync_unlock)
       desc->irq_data.chip->irq_bus_sync_unlock(...)

irq_bus_lock() usually just acquires a mutex.

The other irqchip callbacks just cache the relevant information, but do
not execute the bus transaction because that is not possible with
desc->lock held.

In the irq_bus_sync_unlock() they execute the bus transaction with the
cached information before dropping the mutex.

So you can solve #1 and #2 with that. Your msi_write_msg() callback will
just save the message and set some internal flag that it needs to be
written out in the irq_bus_sync_unlock() callback.

See?

IIRC, there is a gap vs. interrupt affinity setting from user space,
which is irrelevant for I2C, SPI etc. configured interrupt chips as they
raise interrupt via an SoC interrupt pin and that's the entity which
does the affinity management w/o requiring I2C/SPI. IIRC I posted a
patch snippet to that effect in one of those lengthy PCI/MSI/IMS threads
because that is also required for MSI storage which happens to be in
queue memory and needs to be synchronized via some command channel. But
I can't be bothered to search for it as it's a no-brainer to fix that
up.

Thanks,

        tglx
  
Thomas Gleixner May 10, 2023, 11:52 p.m. UTC | #6
Nipun!

On Thu, May 11 2023 at 00:29, Thomas Gleixner wrote:
> On Wed, May 10 2023 at 19:34, Nipun Gupta wrote:
> #2) That's actually the worse part of it and completely broken versus
>     device setup
>
>     probe()
>       cdx_msi_domain_alloc_irqs()
>       ...
>       request_irq() {
>         ...
>         irq_activate()
>           irq_chip_write_msi_msg()
>             ...
>             queue_work()
>           ...
>       }
>
>       enable_irq_in_device()
>       
>         <- device raises interrupt and eventually uses an uninitialized
>            MSI message because the scheduled work has not yet completed.
>
>     That's going to be a nightmare to debug and it's going to happen
>     once in a blue moon out in the field.

Here is another failure scenario:

cpu_down()
  // No scheduling possible as this happens in stomp_machine() context
  __cpu_disable()
    // Point of no return
    set_cpu_online(cpu, false);
    irq_migrate_all_off_this_cpu()
      ...
      set_affinity()

That works with your current implementation by some definition of works
because you schedule the affinity change async.

But what makes sure that this takes effect _before_ the CPU goes into
lala land and cannot respond to an interrupt anymore?

Nothing, other than pure luck, which is not really a solid engineering
principle.

While the regular operations can be fixed via the bus_lock mechanics,
this one falls flat on its nose because in that context this can't
schedule anymore so acquiring a mutex or waiting for some async muck to
complete is a no-no.

Not the end of the world though. There is a way to handle that
gracefully and we need that for the PCI/IMS implementations which will
do that via command queues too.

irq_migrate_all_off_this_cpu() is invoked from stomp_machine() context
with interrupts disabled because the current x86 interrupt management
hardware trainwreck requires it at least in the case that interrupt
remapping is not available.

Of course that got copied to all other architectures... Whether they
really require it or not from a hardware POV is a different
story. Though they all rely on the stomp_machine() context which
prevents concurrent modifications.

So from a historical POV all of this makes sense to some extent, but
that does not prevent us to fix this and make it an two stage process
which can actually handle both worlds.

I'm way too tired to think that through, but you get the idea and you
are welcome to beat me to it.

Thanks,

        tglx
  
Gupta, Nipun May 12, 2023, 2:20 p.m. UTC | #7
On 5/11/2023 3:59 AM, Thomas Gleixner wrote:
> 
> Nipun!
> 
> On Wed, May 10 2023 at 19:34, Nipun Gupta wrote:
>> On 5/10/2023 3:31 AM, Thomas Gleixner wrote:
>>> I'm not insisting on that, but you could at least have had the courtesy
>>> of responding to my review reply and explain to me why you want to solve
>>> it differently and why my suggestion is not the right solution.
>>>
>>> Alternatively you could have added that information in the changelog or
>>> cover letter.
>>>
>>> So in summary you ignored _all_ review comments I made, went off and did
>>> something different and provided a slightly different useless changelog
>>> with the extra add on of a broken Signed-off-by chain.
>>>
>>> Feel free to ignore my reviews and the documentation which we put out
>>> there to make collaboration feasible for both sides, but please don't be
>>> upset when I ignore you and your patches in return.
>>
>> Sincere apology for not responding to the earlier comments. Intention
>> was never to ignore the review comments. Appreciate your vast changes
>> regarding the MSI, and the patch series you shared took time to
>> understand (provided other things as well), and it was quite late to
>> reply. I understand that even in this case atleast I should have added
>> this as part of the cover-letter.
> 
> Fair enough. All settled.
> 
>> IMHO, use-case for MSI in CDX subsystem is a bit different from per
>> device MSI domain. Here we are trying to create a domain per CDX
>> controller which is attached to a MSI controller, and all devices on a
>> particular CDX controller will have same mechanism of write MSI
>> message.
> 
> That was exactly the same assumption which PCI/MSI and other MSI
> implementations made. It turned out to be the wrong abstraction.
> 
> CDX is not any different than PCI. The actual "interrupt chip" is not
> part of the bus, it's part of the device and pretending that it is a bus
> specific thing is just running in to the same cul-de-sac sooner than
> later.

I understand your viewpoint, but would state that CDX bus is somewhat 
different than PCI in the sense that firmware is a controller for
all the devices and their configuration. CDX bus controller sends all 
the write_msi_msg commands to firmware running on RPU over the RPmsg and 
it is the firmware which interfaces with actual devices to pass this 
information to devices in a way agreed between firmware and device. The 
only way to pass MSI information to device is via firmware and CDX bus 
controller is only entity which can communicate with the firmware for this.

> 
>> Also, the current CDX controller that we have added has a different
>> mechanism for MSI prepare (it gets requester ID from firmware).
> 
> That's not an argument, that's just an implementation detail.
> 
>> In your opinion is there any advantage in moving to a per device domain
>> for CDX devices? We can definitely rethink the implementation of MSI in
>> CDX subsystem.
> 
> See above.
> 
> While talking about implementation and design. I actually got curious
> and looked at CDX because I was amazed about the gazillion indirections
> in that msi_write_msg() callback.
> 
> So this ends up doing:
> 
>     cdx->ops->dev_configure(cdx, ...)
>       cdx_configure_device()
>         cdx_mcdi_write_msi()
>           cdx_mcdi_rpc_async()
>             kmalloc()                            <- FAIL #1
>             cdx_mcdi_rpc_async_internal()
>                queue_work()                      <- FAIL #2
> 
> #1) That kmalloc() uses GFP_ATOMIC, but this is invoked deep in the guts
>      of interrupt handling with locks held and interrupts disabled.
> 
>      Aside of the fact that this breaks on PREEMPT_RT, such allocations
>      are generally frowned upon. As a consequence the kref_put()s in the
>      error paths of cdx_mcdi_rpc_async_internal() will blow up on RT
>      too.
> 
>      I know that Xilinx stated publicly that they don't support RT, but
>      RT is not that far out to be supported in mainline and aside of that
>      I know for sure that quite a lot of Xilinx customers use PREEMPT_RT
>      nevertheless.
> 
> #2) That's actually the worse part of it and completely broken versus
>      device setup
> 
>      probe()
>        cdx_msi_domain_alloc_irqs()
>        ...
>        request_irq() {
>          ...
>          irq_activate()
>            irq_chip_write_msi_msg()
>              ...
>              queue_work()
>            ...
>        }
> 
>        enable_irq_in_device()
> 
>          <- device raises interrupt and eventually uses an uninitialized
>             MSI message because the scheduled work has not yet completed.
> 
>      That's going to be a nightmare to debug and it's going to happen
>      once in a blue moon out in the field.
> 
> The interrupt subsystem already can handle update mechanisms which
> require sleepable context:
> 
>     irq_bus_lock() and irq_bus_sync_unlock() irqchip callbacks
> 
> They were initially implemented to deal with interrupt chips which are
> configured via I2C, SPI etc.
> 
> How does that work?
> 
> On entry to interrupt management functions the sequence is:
> 
>      if (desc->irq_data.chip->irq_bus_lock)
>         desc->irq_data.chip->irq_bus_lock(...)
>      raw_spin_lock_irq(&desc->lock);
> 
> and on exit:
> 
>      raw_spin_unlock_irq(&desc->lock);
>      if (desc->irq_data.chip->irq_bus_sync_unlock)
>         desc->irq_data.chip->irq_bus_sync_unlock(...)
> 
> irq_bus_lock() usually just acquires a mutex.
> 
> The other irqchip callbacks just cache the relevant information, but do
> not execute the bus transaction because that is not possible with
> desc->lock held.
> 
> In the irq_bus_sync_unlock() they execute the bus transaction with the
> cached information before dropping the mutex.
> 
> So you can solve #1 and #2 with that. Your msi_write_msg() callback will
> just save the message and set some internal flag that it needs to be
> written out in the irq_bus_sync_unlock() callback.
> 
> See?
> 
> IIRC, there is a gap vs. interrupt affinity setting from user space,
> which is irrelevant for I2C, SPI etc. configured interrupt chips as they
> raise interrupt via an SoC interrupt pin and that's the entity which
> does the affinity management w/o requiring I2C/SPI. IIRC I posted a
> patch snippet to that effect in one of those lengthy PCI/MSI/IMS threads
> because that is also required for MSI storage which happens to be in
> queue memory and needs to be synchronized via some command channel. But
> I can't be bothered to search for it as it's a no-brainer to fix that
> up.

Thanks for this analysis and pointing the hidden crucial issues with the 
implementation. These needs to be fixed.

As per your suggestion, we can add Firmware interaction code in the 
irq_bus_sync_xx APIs. Another option is to change the 
cdx_mcdi_rpc_async() API to atomic synchronous API. We are evaluating 
both the solutions and will update the implementation accordingly.

Thanks,
Nipun

> 
> Thanks,
> 
>          tglx
  
Thomas Gleixner May 12, 2023, 6:15 p.m. UTC | #8
Nipun!

On Fri, May 12 2023 at 19:50, Nipun Gupta wrote:
> On 5/11/2023 3:59 AM, Thomas Gleixner wrote:
>> CDX is not any different than PCI. The actual "interrupt chip" is not
>> part of the bus, it's part of the device and pretending that it is a bus
>> specific thing is just running in to the same cul-de-sac sooner than
>> later.
>
> I understand your viewpoint, but would state that CDX bus is somewhat 
> different than PCI in the sense that firmware is a controller for
> all the devices and their configuration. CDX bus controller sends all 
> the write_msi_msg commands to firmware running on RPU over the RPmsg and 
> it is the firmware which interfaces with actual devices to pass this 
> information to devices in a way agreed between firmware and device. The 
> only way to pass MSI information to device is via firmware and CDX bus 
> controller is only entity which can communicate with the firmware for
> this.

Fair enough, but we wouldn't had this dicussion if the above information
would have been part of the changelog. See?

>> IIRC, there is a gap vs. interrupt affinity setting from user space,
>> which is irrelevant for I2C, SPI etc. configured interrupt chips as they
>> raise interrupt via an SoC interrupt pin and that's the entity which
>> does the affinity management w/o requiring I2C/SPI. IIRC I posted a
>> patch snippet to that effect in one of those lengthy PCI/MSI/IMS threads
>> because that is also required for MSI storage which happens to be in
>> queue memory and needs to be synchronized via some command channel. But
>> I can't be bothered to search for it as it's a no-brainer to fix that
>> up.
>
> Thanks for this analysis and pointing the hidden crucial issues with the 
> implementation. These needs to be fixed.
>
> As per your suggestion, we can add Firmware interaction code in the
> irq_bus_sync_xx APIs. Another option is to change the
> cdx_mcdi_rpc_async() API to atomic synchronous API.

I'm not a great fan of that. Depending on how long this update takes the
CPU will busy wait for it to complete with interrupts disabled and locks
held.

> We are evaluating both the solutions and will update the
> implementation accordingly.

Thanks,
 
         tglx
  
Gupta, Nipun May 15, 2023, 1:09 p.m. UTC | #9
On 5/12/2023 11:45 PM, Thomas Gleixner wrote:
> 
> Nipun!
> 
> On Fri, May 12 2023 at 19:50, Nipun Gupta wrote:
>> On 5/11/2023 3:59 AM, Thomas Gleixner wrote:
>>> CDX is not any different than PCI. The actual "interrupt chip" is not
>>> part of the bus, it's part of the device and pretending that it is a bus
>>> specific thing is just running in to the same cul-de-sac sooner than
>>> later.
>>
>> I understand your viewpoint, but would state that CDX bus is somewhat
>> different than PCI in the sense that firmware is a controller for
>> all the devices and their configuration. CDX bus controller sends all
>> the write_msi_msg commands to firmware running on RPU over the RPmsg and
>> it is the firmware which interfaces with actual devices to pass this
>> information to devices in a way agreed between firmware and device. The
>> only way to pass MSI information to device is via firmware and CDX bus
>> controller is only entity which can communicate with the firmware for
>> this.
> 
> Fair enough, but we wouldn't had this dicussion if the above information
> would have been part of the changelog. See?

Yes, you are correct, we will make sure to update and add more info in 
the changelog in next spin.

> 
>>> IIRC, there is a gap vs. interrupt affinity setting from user space,
>>> which is irrelevant for I2C, SPI etc. configured interrupt chips as they
>>> raise interrupt via an SoC interrupt pin and that's the entity which
>>> does the affinity management w/o requiring I2C/SPI. IIRC I posted a
>>> patch snippet to that effect in one of those lengthy PCI/MSI/IMS threads
>>> because that is also required for MSI storage which happens to be in
>>> queue memory and needs to be synchronized via some command channel. But
>>> I can't be bothered to search for it as it's a no-brainer to fix that
>>> up.
>>
>> Thanks for this analysis and pointing the hidden crucial issues with the
>> implementation. These needs to be fixed.
>>
>> As per your suggestion, we can add Firmware interaction code in the
>> irq_bus_sync_xx APIs. Another option is to change the
>> cdx_mcdi_rpc_async() API to atomic synchronous API.
> 
> I'm not a great fan of that. Depending on how long this update takes the
> CPU will busy wait for it to complete with interrupts disabled and locks
> held.

Agree. we are also inclined towards using irq_bus_sync_xx APIs. This 
would definitely solve the issue (#1 and #2) for the setup_irq which you 
mentioned.

For MSI affinity, since GIC-ITS always return IRQ_SET_MASK_OK_DONE, the 
irq_chip_write_msi_msg does not get called.

msi_domain_set_affinity(...)
     ret = parent->chip->irq_set_affinity(...);
     // For GIC ITS it always return IRQ_SET_MASK_OK_DONE
     if (ret >= 0 && ret != IRQ_SET_MASK_OK_DONE) {
         irq_chip_write_msi_msg(...);
     }
Since CDX bus is specific to ARM and uses GIC ITS, it seems we do not 
need to do anything here. Can you please share your opinion on this?

Thanks,
Nipun

> 
>> We are evaluating both the solutions and will update the
>> implementation accordingly.
> 
> Thanks,
> 
>           tglx
  
Thomas Gleixner May 15, 2023, 4:46 p.m. UTC | #10
Nipun!

On Mon, May 15 2023 at 18:39, Nipun Gupta wrote:
> On 5/12/2023 11:45 PM, Thomas Gleixner wrote:
>> On Fri, May 12 2023 at 19:50, Nipun Gupta wrote:
>>>
>>> As per your suggestion, we can add Firmware interaction code in the
>>> irq_bus_sync_xx APIs. Another option is to change the
>>> cdx_mcdi_rpc_async() API to atomic synchronous API.
>> 
>> I'm not a great fan of that. Depending on how long this update takes the
>> CPU will busy wait for it to complete with interrupts disabled and locks
>> held.
>
> Agree. we are also inclined towards using irq_bus_sync_xx APIs. This 
> would definitely solve the issue (#1 and #2) for the setup_irq which you 
> mentioned.
>
> For MSI affinity, since GIC-ITS always return IRQ_SET_MASK_OK_DONE, the 
> irq_chip_write_msi_msg does not get called.
>
> msi_domain_set_affinity(...)
>      ret = parent->chip->irq_set_affinity(...);
>      // For GIC ITS it always return IRQ_SET_MASK_OK_DONE
>      if (ret >= 0 && ret != IRQ_SET_MASK_OK_DONE) {
>          irq_chip_write_msi_msg(...);
>      }
> Since CDX bus is specific to ARM and uses GIC ITS, it seems we do not 
> need to do anything here. Can you please share your opinion on this?

That makes sense.

Thanks,

        tglx
  

Patch

diff --git a/drivers/cdx/Kconfig b/drivers/cdx/Kconfig
index a08958485e31..86df7ccb76bb 100644
--- a/drivers/cdx/Kconfig
+++ b/drivers/cdx/Kconfig
@@ -8,6 +8,7 @@ 
 config CDX_BUS
 	bool "CDX Bus driver"
 	depends on OF && ARM64
+	select GENERIC_MSI_IRQ_DOMAIN
 	help
 	  Driver to enable Composable DMA Transfer(CDX) Bus. CDX bus
 	  exposes Fabric devices which uses composable DMA IP to the
diff --git a/drivers/cdx/Makefile b/drivers/cdx/Makefile
index 0324e4914f6e..4bad79d1d188 100644
--- a/drivers/cdx/Makefile
+++ b/drivers/cdx/Makefile
@@ -5,4 +5,4 @@ 
 # Copyright (C) 2022-2023, Advanced Micro Devices, Inc.
 #
 
-obj-$(CONFIG_CDX_BUS) += cdx.o controller/
+obj-$(CONFIG_CDX_BUS) += cdx.o cdx_msi.o controller/
diff --git a/drivers/cdx/cdx.c b/drivers/cdx/cdx.c
index 38511fd36325..4231f691a230 100644
--- a/drivers/cdx/cdx.c
+++ b/drivers/cdx/cdx.c
@@ -56,6 +56,7 @@ 
  */
 
 #include <linux/init.h>
+#include <linux/irqdomain.h>
 #include <linux/kernel.h>
 #include <linux/of_device.h>
 #include <linux/slab.h>
@@ -479,6 +480,11 @@  int cdx_device_add(struct cdx_dev_params *dev_params)
 		     ((cdx->id << CDX_CONTROLLER_ID_SHIFT) | (cdx_dev->bus_num & CDX_BUS_NUM_MASK)),
 		     cdx_dev->dev_num);
 
+	if (cdx->msi_domain) {
+		cdx_dev->num_msi = dev_params->num_msi;
+		dev_set_msi_domain(&cdx_dev->dev, cdx->msi_domain);
+	}
+
 	ret = device_add(&cdx_dev->dev);
 	if (ret) {
 		dev_err(&cdx_dev->dev,
diff --git a/drivers/cdx/cdx.h b/drivers/cdx/cdx.h
index c436ac7ac86f..90e2da1e320d 100644
--- a/drivers/cdx/cdx.h
+++ b/drivers/cdx/cdx.h
@@ -21,6 +21,7 @@ 
  * @res: array of MMIO region entries
  * @res_count: number of valid MMIO regions
  * @req_id: Requestor ID associated with CDX device
+ * @num_msi: Number of MSI's supported by the device
  */
 struct cdx_dev_params {
 	struct cdx_controller *cdx;
@@ -31,6 +32,7 @@  struct cdx_dev_params {
 	struct resource res[MAX_CDX_DEV_RESOURCES];
 	u8 res_count;
 	u32 req_id;
+	u32 num_msi;
 };
 
 /**
@@ -59,4 +61,12 @@  void cdx_unregister_controller(struct cdx_controller *cdx);
  */
 int cdx_device_add(struct cdx_dev_params *dev_params);
 
+/**
+ * cdx_msi_domain_init - Init the CDX bus MSI domain.
+ * @dev: Device of the CDX bus controller
+ *
+ * Return: CDX MSI domain, NULL on failure
+ */
+struct irq_domain *cdx_msi_domain_init(struct device *dev);
+
 #endif /* _CDX_H_ */
diff --git a/drivers/cdx/cdx_msi.c b/drivers/cdx/cdx_msi.c
new file mode 100644
index 000000000000..60b165c998bb
--- /dev/null
+++ b/drivers/cdx/cdx_msi.c
@@ -0,0 +1,153 @@ 
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * AMD CDX bus driver MSI support
+ *
+ * Copyright (C) 2022-2023, Advanced Micro Devices, Inc.
+ */
+
+#include <linux/of.h>
+#include <linux/of_device.h>
+#include <linux/of_address.h>
+#include <linux/of_irq.h>
+#include <linux/irq.h>
+#include <linux/irqdomain.h>
+#include <linux/msi.h>
+#include <linux/cdx/cdx_bus.h>
+
+#include "cdx.h"
+
+static void cdx_msi_write_msg(struct irq_data *irq_data,
+			      struct msi_msg *msg)
+{
+	struct msi_desc *msi_desc = irq_data_get_msi_desc(irq_data);
+	struct cdx_device *cdx_dev = to_cdx_device(msi_desc->dev);
+	struct cdx_controller *cdx = cdx_dev->cdx;
+	struct cdx_device_config dev_config;
+	int ret;
+
+	msi_desc->msg = *msg;
+	dev_config.msi.msi_index = msi_desc->msi_index;
+	dev_config.msi.data = msi_desc->msg.data;
+	dev_config.msi.addr = ((u64)(msi_desc->msg.address_hi) << 32) |
+			      msi_desc->msg.address_lo;
+
+	dev_config.type = CDX_DEV_MSI_CONF;
+	ret = cdx->ops->dev_configure(cdx, cdx_dev->bus_num, cdx_dev->dev_num,
+				      &dev_config);
+	if (ret)
+		dev_err(&cdx_dev->dev, "Write MSI failed to CDX controller\n");
+}
+
+static struct irq_chip cdx_msi_irq_chip = {
+	.name			= "CDX-MSI",
+	.irq_mask		= irq_chip_mask_parent,
+	.irq_unmask		= irq_chip_unmask_parent,
+	.irq_eoi		= irq_chip_eoi_parent,
+	.irq_set_affinity	= msi_domain_set_affinity,
+	.irq_write_msi_msg	= cdx_msi_write_msg
+};
+
+int cdx_msi_domain_alloc_irqs(struct device *dev, unsigned int irq_count)
+{
+	int ret;
+
+	ret = msi_setup_device_data(dev);
+	if (ret)
+		return ret;
+
+	ret = msi_domain_alloc_irqs_range(dev, MSI_DEFAULT_DOMAIN,
+					  0, irq_count - 1);
+	if (ret)
+		dev_err(dev, "Failed to allocate IRQs\n");
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(cdx_msi_domain_alloc_irqs);
+
+/* Convert an msi_desc to a globally unique identifier. */
+static irq_hw_number_t cdx_domain_calc_hwirq(struct cdx_device *dev,
+					     struct msi_desc *desc)
+{
+	return ((irq_hw_number_t)dev->req_id << 10) | desc->msi_index;
+}
+
+static void cdx_msi_set_desc(msi_alloc_info_t *arg,
+			     struct msi_desc *desc)
+{
+	arg->desc = desc;
+	arg->hwirq = cdx_domain_calc_hwirq(to_cdx_device(desc->dev), desc);
+}
+
+static int cdx_msi_prepare(struct irq_domain *msi_domain,
+			   struct device *dev,
+			   int nvec, msi_alloc_info_t *info)
+{
+	struct cdx_device *cdx_dev = to_cdx_device(dev);
+	struct device *parent = dev->parent;
+	struct msi_domain_info *msi_info;
+	u32 dev_id = 0;
+	int ret;
+
+	/* Retrieve device ID from requestor ID using parent device */
+	ret = of_map_id(parent->of_node, cdx_dev->req_id, "msi-map",
+			"msi-map-mask", NULL, &dev_id);
+	if (ret) {
+		dev_err(dev, "of_map_id failed for MSI: %d\n", ret);
+		return ret;
+	}
+
+	/* Set the device Id to be passed to the GIC-ITS */
+	info->scratchpad[0].ul = dev_id;
+
+	msi_info = msi_get_domain_info(msi_domain->parent);
+
+	return msi_info->ops->msi_prepare(msi_domain->parent, dev, nvec, info);
+}
+
+static struct msi_domain_ops cdx_msi_ops = {
+	.msi_prepare	= cdx_msi_prepare,
+	.set_desc	= cdx_msi_set_desc
+};
+
+static struct msi_domain_info cdx_msi_domain_info = {
+	.ops	= &cdx_msi_ops,
+	.chip	= &cdx_msi_irq_chip,
+	.flags	= MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS |
+		  MSI_FLAG_ALLOC_SIMPLE_MSI_DESCS | MSI_FLAG_FREE_MSI_DESCS
+};
+
+struct irq_domain *cdx_msi_domain_init(struct device *dev)
+{
+	struct device_node *np = dev->of_node;
+	struct fwnode_handle *fwnode_handle;
+	struct irq_domain *cdx_msi_domain;
+	struct device_node *parent_node;
+	struct irq_domain *parent;
+
+	fwnode_handle = of_node_to_fwnode(np);
+
+	parent_node = of_parse_phandle(np, "msi-map", 1);
+	if (!parent_node) {
+		dev_err(dev, "msi-map not present on cdx controller\n");
+		return NULL;
+	}
+
+	parent = irq_find_matching_fwnode(of_node_to_fwnode(parent_node),
+					  DOMAIN_BUS_NEXUS);
+	if (!parent || !msi_get_domain_info(parent)) {
+		dev_err(dev, "unable to locate ITS domain\n");
+		return NULL;
+	}
+
+	cdx_msi_domain = msi_create_irq_domain(fwnode_handle, &cdx_msi_domain_info,
+					       parent);
+	if (!cdx_msi_domain) {
+		dev_err(dev, "unable to create CDX-MSI domain\n");
+		return NULL;
+	}
+
+	dev_dbg(dev, "CDX-MSI domain created\n");
+
+	return cdx_msi_domain;
+}
+EXPORT_SYMBOL_GPL(cdx_msi_domain_init);
diff --git a/drivers/cdx/controller/cdx_controller.c b/drivers/cdx/controller/cdx_controller.c
index dc52f95f8978..29e8e4a3b8c1 100644
--- a/drivers/cdx/controller/cdx_controller.c
+++ b/drivers/cdx/controller/cdx_controller.c
@@ -8,6 +8,7 @@ 
 #include <linux/of_platform.h>
 #include <linux/slab.h>
 #include <linux/cdx/cdx_bus.h>
+#include <linux/irqdomain.h>
 
 #include "cdx_controller.h"
 #include "../cdx.h"
@@ -49,9 +50,20 @@  static int cdx_configure_device(struct cdx_controller *cdx,
 				u8 bus_num, u8 dev_num,
 				struct cdx_device_config *dev_config)
 {
+	u16 msi_index;
 	int ret = 0;
+	u32 data;
+	u64 addr;
 
 	switch (dev_config->type) {
+	case CDX_DEV_MSI_CONF:
+		msi_index = dev_config->msi.msi_index;
+		data = dev_config->msi.data;
+		addr = dev_config->msi.addr;
+
+		ret = cdx_mcdi_write_msi(cdx->priv, bus_num, dev_num,
+					 msi_index, addr, data);
+		break;
 	case CDX_DEV_RESET_CONF:
 		ret = cdx_mcdi_reset_device(cdx->priv, bus_num, dev_num);
 		break;
@@ -154,6 +166,14 @@  static int xlnx_cdx_probe(struct platform_device *pdev)
 	cdx->priv = cdx_mcdi;
 	cdx->ops = &cdx_ops;
 
+	/* Create MSI domain */
+	cdx->msi_domain = cdx_msi_domain_init(&pdev->dev);
+	if (!cdx->msi_domain) {
+		dev_err(&pdev->dev, "cdx_msi_domain_init() failed");
+		ret = -ENODEV;
+		goto cdx_msi_fail;
+	}
+
 	ret = cdx_setup_rpmsg(pdev);
 	if (ret) {
 		if (ret != -EPROBE_DEFER)
@@ -165,6 +185,8 @@  static int xlnx_cdx_probe(struct platform_device *pdev)
 	return 0;
 
 cdx_rpmsg_fail:
+	irq_domain_remove(cdx->msi_domain);
+cdx_msi_fail:
 	kfree(cdx);
 cdx_alloc_fail:
 	cdx_mcdi_finish(cdx_mcdi);
@@ -181,6 +203,7 @@  static int xlnx_cdx_remove(struct platform_device *pdev)
 
 	cdx_destroy_rpmsg(pdev);
 
+	irq_domain_remove(cdx->msi_domain);
 	kfree(cdx);
 
 	cdx_mcdi_finish(cdx_mcdi);
diff --git a/drivers/cdx/controller/mc_cdx_pcol.h b/drivers/cdx/controller/mc_cdx_pcol.h
index 4ccb7b52951b..07358d25b90b 100644
--- a/drivers/cdx/controller/mc_cdx_pcol.h
+++ b/drivers/cdx/controller/mc_cdx_pcol.h
@@ -562,6 +562,64 @@ 
 #define MC_CMD_CDX_DEVICE_CONTROL_GET_OUT_MMIO_REGIONS_ENABLE_LBN	2
 #define MC_CMD_CDX_DEVICE_CONTROL_GET_OUT_MMIO_REGIONS_ENABLE_WIDTH	1
 
+/***********************************/
+/*
+ * MC_CMD_CDX_DEVICE_WRITE_MSI_MSG
+ * Populates the MSI message to be used by the hardware to raise the specified
+ * interrupt vector. Versal-net implementation specific limitations are that
+ * only 4 CDX devices with MSI interrupt capability are supported and all
+ * vectors within a device must use the same write address. The command will
+ * return EINVAL if any of these limitations is violated.
+ */
+#define MC_CMD_CDX_DEVICE_WRITE_MSI_MSG					0x9
+#define MC_CMD_CDX_DEVICE_WRITE_MSI_MSG_MSGSET				0x9
+#undef MC_CMD_0x9_PRIVILEGE_CTG
+
+#define MC_CMD_0x9_PRIVILEGE_CTG SRIOV_CTG_ADMIN
+
+/* MC_CMD_CDX_DEVICE_WRITE_MSI_MSG_IN msgrequest */
+#define MC_CMD_CDX_DEVICE_WRITE_MSI_MSG_IN_LEN				28
+/* Device bus number, in range 0 to BUS_COUNT-1 */
+#define MC_CMD_CDX_DEVICE_WRITE_MSI_MSG_IN_BUS_OFST			0
+#define MC_CMD_CDX_DEVICE_WRITE_MSI_MSG_IN_BUS_LEN			4
+/* Device number relative to the bus, in range 0 to DEVICE_COUNT-1 for that bus */
+#define MC_CMD_CDX_DEVICE_WRITE_MSI_MSG_IN_DEVICE_OFST			4
+#define MC_CMD_CDX_DEVICE_WRITE_MSI_MSG_IN_DEVICE_LEN			4
+/*
+ * Device-relative MSI vector number. Must be < MSI_COUNT reported for the
+ * device.
+ */
+#define MC_CMD_CDX_DEVICE_WRITE_MSI_MSG_IN_MSI_VECTOR_OFST		8
+#define MC_CMD_CDX_DEVICE_WRITE_MSI_MSG_IN_MSI_VECTOR_LEN		4
+/* Reserved (alignment) */
+#define MC_CMD_CDX_DEVICE_WRITE_MSI_MSG_IN_RESERVED_OFST		12
+#define MC_CMD_CDX_DEVICE_WRITE_MSI_MSG_IN_RESERVED_LEN			4
+/*
+ * MSI address to be used by the hardware. Typically, on ARM systems this
+ * address is translated by the IOMMU (if enabled) and it is the responsibility
+ * of the entity managing the IOMMU (APU kernel) to supply the correct IOVA
+ * here.
+ */
+#define MC_CMD_CDX_DEVICE_WRITE_MSI_MSG_IN_MSI_ADDRESS_OFST		16
+#define MC_CMD_CDX_DEVICE_WRITE_MSI_MSG_IN_MSI_ADDRESS_LEN		8
+#define MC_CMD_CDX_DEVICE_WRITE_MSI_MSG_IN_MSI_ADDRESS_LO_OFST		16
+#define MC_CMD_CDX_DEVICE_WRITE_MSI_MSG_IN_MSI_ADDRESS_LO_LEN		4
+#define MC_CMD_CDX_DEVICE_WRITE_MSI_MSG_IN_MSI_ADDRESS_LO_LBN		128
+#define MC_CMD_CDX_DEVICE_WRITE_MSI_MSG_IN_MSI_ADDRESS_LO_WIDTH		32
+#define MC_CMD_CDX_DEVICE_WRITE_MSI_MSG_IN_MSI_ADDRESS_HI_OFST		20
+#define MC_CMD_CDX_DEVICE_WRITE_MSI_MSG_IN_MSI_ADDRESS_HI_LEN		4
+#define MC_CMD_CDX_DEVICE_WRITE_MSI_MSG_IN_MSI_ADDRESS_HI_LBN		160
+#define MC_CMD_CDX_DEVICE_WRITE_MSI_MSG_IN_MSI_ADDRESS_HI_WIDTH		32
+/*
+ * MSI data to be used by the hardware. On versal-net, only the lower 16-bits
+ * are used, the remaining bits are ignored and should be set to zero.
+ */
+#define MC_CMD_CDX_DEVICE_WRITE_MSI_MSG_IN_MSI_DATA_OFST		24
+#define MC_CMD_CDX_DEVICE_WRITE_MSI_MSG_IN_MSI_DATA_LEN			4
+
+/* MC_CMD_CDX_DEVICE_WRITE_MSI_MSG_OUT msgresponse */
+#define MC_CMD_CDX_DEVICE_WRITE_MSI_MSG_OUT_LEN				0
+
 /***********************************/
 /* MC_CMD_V2_EXTN - Encapsulation for a v2 extended command */
 #define MC_CMD_V2_EXTN					0x7f
diff --git a/drivers/cdx/controller/mcdi_functions.c b/drivers/cdx/controller/mcdi_functions.c
index 0158f26533dd..41ebd26cd585 100644
--- a/drivers/cdx/controller/mcdi_functions.c
+++ b/drivers/cdx/controller/mcdi_functions.c
@@ -120,10 +120,29 @@  int cdx_mcdi_get_dev_config(struct cdx_mcdi *cdx,
 
 	dev_params->vendor = MCDI_WORD(outbuf, CDX_BUS_GET_DEVICE_CONFIG_OUT_VENDOR_ID);
 	dev_params->device = MCDI_WORD(outbuf, CDX_BUS_GET_DEVICE_CONFIG_OUT_DEVICE_ID);
+	dev_params->num_msi = MCDI_DWORD(outbuf, CDX_BUS_GET_DEVICE_CONFIG_OUT_MSI_COUNT);
 
 	return 0;
 }
 
+int cdx_mcdi_write_msi(struct cdx_mcdi *cdx, u8 bus_num, u8 dev_num,
+		       u32 msi_vector, u64 msi_address, u32 msi_data)
+{
+	MCDI_DECLARE_BUF(inbuf, MC_CMD_CDX_DEVICE_WRITE_MSI_MSG_IN_LEN);
+	int ret;
+
+	MCDI_SET_DWORD(inbuf, CDX_DEVICE_WRITE_MSI_MSG_IN_BUS, bus_num);
+	MCDI_SET_DWORD(inbuf, CDX_DEVICE_WRITE_MSI_MSG_IN_DEVICE, dev_num);
+	MCDI_SET_DWORD(inbuf, CDX_DEVICE_WRITE_MSI_MSG_IN_MSI_VECTOR, msi_vector);
+	MCDI_SET_QWORD(inbuf, CDX_DEVICE_WRITE_MSI_MSG_IN_MSI_ADDRESS, msi_address);
+	MCDI_SET_DWORD(inbuf, CDX_DEVICE_WRITE_MSI_MSG_IN_MSI_DATA, msi_data);
+
+	ret = cdx_mcdi_rpc_async(cdx, MC_CMD_CDX_DEVICE_WRITE_MSI_MSG, inbuf,
+				 sizeof(inbuf), NULL, 0);
+
+	return ret;
+}
+
 int cdx_mcdi_reset_device(struct cdx_mcdi *cdx, u8 bus_num, u8 dev_num)
 {
 	MCDI_DECLARE_BUF(inbuf, MC_CMD_CDX_DEVICE_RESET_IN_LEN);
diff --git a/drivers/cdx/controller/mcdi_functions.h b/drivers/cdx/controller/mcdi_functions.h
index 7440ace5539a..35a300727f34 100644
--- a/drivers/cdx/controller/mcdi_functions.h
+++ b/drivers/cdx/controller/mcdi_functions.h
@@ -47,6 +47,26 @@  int cdx_mcdi_get_dev_config(struct cdx_mcdi *cdx,
 			    u8 bus_num, u8 dev_num,
 			    struct cdx_dev_params *dev_params);
 
+/**
+ * cdx_mcdi_write_msi - Write MSI configuration for CDX device
+ * @cdx: pointer to MCDI interface.
+ * @bus_num: Bus number.
+ * @dev_num: Device number.
+ * @msi_vector: Device-relative MSI vector number.
+ *	Must be < MSI_COUNT reported for the device.
+ * @msi_address: MSI address to be used by the hardware. Typically, on ARM
+ *	systems this address is translated by the IOMMU (if enabled) and
+ *	it is the responsibility of the entity managing the IOMMU (APU kernel)
+ *	to supply the correct IOVA here.
+ * @msi_data: MSI data to be used by the hardware. On versal-net, only the
+ *	lower 16-bits are used, the remaining bits are ignored and should be
+ *	set to zero.
+ *
+ * Return: 0 on success, <0 on failure
+ */
+int cdx_mcdi_write_msi(struct cdx_mcdi *cdx, u8 bus_num, u8 dev_num,
+		       u32 msi_vector, u64 msi_address, u32 msi_data);
+
 /**
  * cdx_mcdi_reset_device - Reset cdx device represented by bus_num:dev_num
  * @cdx: pointer to MCDI interface.
diff --git a/include/linux/cdx/cdx_bus.h b/include/linux/cdx/cdx_bus.h
index bead71b7bc73..7b84e0642a20 100644
--- a/include/linux/cdx/cdx_bus.h
+++ b/include/linux/cdx/cdx_bus.h
@@ -21,11 +21,19 @@ 
 struct cdx_controller;
 
 enum {
+	CDX_DEV_MSI_CONF,
 	CDX_DEV_RESET_CONF,
 };
 
+struct cdx_msi_config {
+	u16 msi_index;
+	u32 data;
+	u64 addr;
+};
+
 struct cdx_device_config {
 	u8 type;
+	struct cdx_msi_config msi;
 };
 
 typedef int (*cdx_scan_cb)(struct cdx_controller *cdx);
@@ -62,12 +70,14 @@  struct cdx_ops {
  * struct cdx_controller: CDX controller object
  * @dev: Linux device associated with the CDX controller.
  * @priv: private data
+ * @msi_domain: MSI domain
  * @id: Controller ID
  * @ops: CDX controller ops
  */
 struct cdx_controller {
 	struct device *dev;
 	void *priv;
+	struct irq_domain *msi_domain;
 	u32 id;
 	struct cdx_ops *ops;
 };
@@ -86,6 +96,7 @@  struct cdx_controller {
  * @dma_mask: Default DMA mask
  * @flags: CDX device flags
  * @req_id: Requestor ID associated with CDX device
+ * @num_msi: Number of MSI's supported by the device
  * @driver_override: driver name to force a match; do not set directly,
  *                   because core frees it; use driver_set_override() to
  *                   set or clear it.
@@ -102,6 +113,7 @@  struct cdx_device {
 	u64 dma_mask;
 	u16 flags;
 	u32 req_id;
+	u32 num_msi;
 	const char *driver_override;
 };
 
@@ -162,6 +174,20 @@  void cdx_driver_unregister(struct cdx_driver *cdx_driver);
 
 extern struct bus_type cdx_bus_type;
 
+/**
+ * cdx_msi_domain_alloc_irqs - Allocate MSI's for the CDX device
+ * @dev: device pointer
+ * @irq_count: Number of MSI's to be allocated
+ *
+ * Return: 0 for success, -errno on failure
+ */
+int cdx_msi_domain_alloc_irqs(struct device *dev, unsigned int irq_count);
+
+/**
+ * cdx_msi_domain_free_irqs - Free MSI's for CDX device
+ */
+#define cdx_msi_domain_free_irqs msi_domain_free_irqs_all
+
 /**
  * cdx_dev_reset - Reset CDX device
  * @dev: device pointer
diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 7a97bcb086bf..d9b793900931 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -1637,6 +1637,7 @@  void msi_domain_free_irqs_all(struct device *dev, unsigned int domid)
 	msi_domain_free_irqs_all_locked(dev, domid);
 	msi_unlock_descs(dev);
 }
+EXPORT_SYMBOL_GPL(msi_domain_free_irqs_all);
 
 /**
  * msi_get_domain_info - Get the MSI interrupt domain info for @domain