[19/33] genirq/msi: Provide msi_desc::msi_data

Message ID 20221111135206.346985384@linutronix.de
State New
Series genirq, PCI/MSI: Support for per device MSI and PCI/IMS - Part 3 implementation

Commit Message

Thomas Gleixner Nov. 11, 2022, 1:58 p.m. UTC
  The upcoming support for PCI/IMS requires storing some information related
to message handling in the MSI descriptor, e.g. a PASID or a pointer to a
queue.

Provide a generic storage struct which maps over the existing PCI specific
storage, which means that struct msi_desc does not get any bigger.

It contains an iomem pointer for device memory based IMS and a union of a
u64 and a void pointer, which allows device specific IMS implementations
to store the necessary information.

The iomem pointer is set up by the domain allocation functions.

The data union msi_dev_cookie is handed in when allocating an interrupt on
an IMS domain so that the irq chip callbacks of the IMS domain have the
necessary per vector information available. It also comes in handy when
cleaning up the platform MSI code for wire to MSI bridges, which need to
hand down the type information to the underlying interrupt domain.

For the core code the cookie is opaque and meaningless. The core just
stores it during an allocation through the upcoming interfaces for IMS and
wire to MSI bridges.
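
For illustration only (not part of this patch): an interrupt chip callback of
a device memory based IMS domain could consume this data roughly as sketched
below. The callback name, register layout and the PASID handling are
assumptions made for the example; only the msi_desc fields are from this patch.

static void my_ims_write_msi_msg(struct irq_data *data, struct msi_msg *msg)
{
	struct msi_desc *desc = irq_data_get_msi_desc(data);
	void __iomem *slot = desc->data.iobase;		/* set up by the domain */
	u64 pasid = desc->data.cookie.value;		/* handed in at allocation time */

	/* Device specific: program address, data and PASID into the slot */
	writel(msg->address_lo, slot + 0x00);
	writel(msg->address_hi, slot + 0x04);
	writel(msg->data, slot + 0x08);
	writel(lower_32_bits(pasid), slot + 0x0c);
}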

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/msi.h     |   19 ++++++++++++++++++-
 include/linux/msi_api.h |   17 +++++++++++++++++
 2 files changed, 35 insertions(+), 1 deletion(-)
  

Comments

Jason Gunthorpe Nov. 16, 2022, 7:28 p.m. UTC | #1
On Fri, Nov 11, 2022 at 02:58:41PM +0100, Thomas Gleixner wrote:

> +/**
> + * struct msi_desc_data - Generic MSI descriptor data
> + * @iobase:     Pointer to the IOMEM base address for interrupt callbacks
> + * @cookie:	Device cookie provided at allocation time
> + *
> + * The content of this data is implementation defined, e.g. PCI/IMS
> + * implementations will define the meaning of the data.
> + */
> +struct msi_desc_data {
> +	void			__iomem *iobase;
> +	union msi_dev_cookie	cookie;
> +};

It would be nice to see the pci_msi_desc converted to a domain
specific storage as well.

Maybe could be written

struct msi_desc {
   u64 domain_data[2];
}

struct pci_msi_desc {
		u32 msi_mask;
		u8	multiple	: 3;
		u8	multi_cap	: 3;
		u8	can_mask	: 1;
		u8	is_64		: 1;
		u8	mask_pos;
		u16 default_irq;
}
static_assert(sizeof(struct pci_msi_desc) <= sizeof(((struct msi_desc *)0)->domain_data));

struct pci_msix_desc {
		u32 msix_ctrl;
		u8	multiple	: 3;
		u8	multi_cap	: 3;
		u8	can_mask	: 1;
		u8	is_64		: 1;
		u16 default_irq;
		void __iomem *mask_base;
}
static_assert(sizeof(struct pci_msix_desc) <= sizeof(((struct msi_desc *)0)->domain_data));

ideally hidden in the pci code with some irq_chip facing export API to
snoop in the bits a few places need
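
For example, such an accessor might look like this (purely illustrative,
using the pci_msi_desc/domain_data layout sketched above; nothing like this
exists in the posted series):

/* irq_chip facing accessor, hiding the PCI layout from the core code */
static inline u16 pci_msi_desc_default_irq(const struct msi_desc *desc)
{
	const struct pci_msi_desc *pdesc =
		(const struct pci_msi_desc *)desc->domain_data;

	return pdesc->default_irq;
}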

We've used 128 bits for the PCI descriptor, we might as well let
everyone have all 128 bits for whatever they want to do

Jason
  
Thomas Gleixner Nov. 17, 2022, 8:48 a.m. UTC | #2
On Wed, Nov 16 2022 at 15:28, Jason Gunthorpe wrote:
> On Fri, Nov 11, 2022 at 02:58:41PM +0100, Thomas Gleixner wrote:
>> +/**
>> + * struct msi_desc_data - Generic MSI descriptor data
>> + * @iobase:     Pointer to the IOMEM base address for interrupt callbacks
>> + * @cookie:	Device cookie provided at allocation time
>> + *
>> + * The content of this data is implementation defined, e.g. PCI/IMS
>> + * implementations will define the meaning of the data.
>> + */
>> +struct msi_desc_data {
>> +	void			__iomem *iobase;
>> +	union msi_dev_cookie	cookie;
>> +};
>
> It would be nice to see the pci_msi_desc converted to a domain
> specific storage as well.
>
> Maybe could be written
>
> struct msi_desc {
>    u64 domain_data[2];
> }
>
> struct pci_msi_desc {
> 		u32 msi_mask;
> 		u8	multiple	: 3;
> 		u8	multi_cap	: 3;
> 		u8	can_mask	: 1;
> 		u8	is_64		: 1;
> 		u8	mask_pos;
> 		u16 default_irq;
> }
> static_assert(sizeof(struct pci_msi_desc) <= sizeof(((struct msi_desc *)0)->domain_data));
>
> struct pci_msix_desc {
> 		u32 msix_ctrl;
> 		u8	multiple	: 3;
> 		u8	multi_cap	: 3;
> 		u8	can_mask	: 1;
> 		u8	is_64		: 1;
> 		u16 default_irq;
> 		void __iomem *mask_base;
> }
> static_assert(sizeof(struct pci_msix_desc) <= sizeof(((struct msi_desc *)0)->domain_data));
>
> ideally hidden in the pci code with some irq_chip facing export API to
> snoop in the bits a few places need
>
> We've used 128 bits for the PCI descriptor, we might as well let
> everyone have all 128 bits for whatever they want to do

Not sure because we end up with nasty type casts for

> struct msi_desc {
>    u64 domain_data[2];
> }

Let me think about it.

Thanks,

        tglx
  
Jason Gunthorpe Nov. 17, 2022, 1:33 p.m. UTC | #3
On Thu, Nov 17, 2022 at 09:48:45AM +0100, Thomas Gleixner wrote:

> > We've used 128 bits for the PCI descriptor, we might as well let
> > everyone have all 128 bits for whatever they want to do
> 
> Not sure because we end up with nasty type casts for

Something like:

void *msi_desc_device_priv(struct msi_desc *desc) { return desc->device_data; }

As netdev does it?

Jason
  
Thomas Gleixner Nov. 18, 2022, 10:08 p.m. UTC | #4
On Wed, Nov 16 2022 at 15:28, Jason Gunthorpe wrote:
> On Fri, Nov 11, 2022 at 02:58:41PM +0100, Thomas Gleixner wrote:
>
>> +/**
>> + * struct msi_desc_data - Generic MSI descriptor data
>> + * @iobase:     Pointer to the IOMEM base address for interrupt callbacks
>> + * @cookie:	Device cookie provided at allocation time
>> + *
>> + * The content of this data is implementation defined, e.g. PCI/IMS
>> + * implementations will define the meaning of the data.
>> + */
>> +struct msi_desc_data {
>> +	void			__iomem *iobase;
>> +	union msi_dev_cookie	cookie;
>> +};
>
> It would be nice to see the pci_msi_desc converted to a domain
> specific storage as well.

I looked into this and it gets ugly very fast.

The above has two parts:

    iobase    is domain specific and setup by the domain code

    cookie    is per interrupt allocation. That's where the instance
              queue or whatever connects to the domain.

I can abuse the fields for PCI/MSI of course, but see below.

> Maybe could be written
>
> struct pci_msi_desc {
> }
> static_assert(sizeof(struct pci_msi_desc) <= sizeof(((struct msi_desc *)0)->domain_data));
>
> struct pci_msix_desc {
> }
> static_assert(sizeof(struct pci_msix_desc) <= sizeof(((struct msi_desc *)0)->domain_data));
>
> ideally hidden in the pci code with some irq_chip facing export API to
> snoop in the bits a few places need

I can't use that for the current combined legacy PCI/MSI code as I can't
split the irq chip implementations like I can with the new per device
domains.

And no, I'm not going to create separate code paths which do the same
thing on different data structures just to pretend that it's all shiny.

> We've used 128 bits for the PCI descriptor, we might as well let
> everyone have all 128 bits for whatever they want to do

That's fine, but there are two parts to it:

   1) Domain specific data

   2) Per allocation specific data

#1 is data which the domain code puts there, e.g. in the prepare_desc()
   callback

#2 is data which the usage site hands in and which gives the domain and the
   interrupt chip the information they need

So the data structure should look like this:

struct msi_desc_data {
	union msi_domain_cookie		dom_cookie;
	union msi_instance_cookie	ins_cookie;
};

union msi_domain_cookie {
	void __iomem	*iobase;
        void		*ptr;
        u64		value;
};

union msi_instance_cookie {
        void		*ptr;
        u64		value;
};

Sure I could make both cookies plain u64, but I hate these forced type
casts and the above is simple to handle and understand.

So you get your 128 bits, but not per instance because that's a nightmare
to validate versus the allocation code which has to copy the data into
the msi descriptor, whatever it is (PASID, queue pointer ....).

Having two cookies makes a lot of sense to have a proper separation
between domain and usage site.

For IDXD the domain wants to store the iobase and needs a per allocation
PASID.

For your queue model, the domain wants a pointer to some device or
whatever specific things and the queue provides a pointer so that the
domain/chip can do the right thing for that particular instance.

For both sides, the domain and the allocation side something like the
above is sufficient.
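
To make that concrete, a sketch of how the two parts could be used for IDXD.
The prepare_desc() callback is the one mentioned above; everything IDXD
specific (names, offsets) and the allocation call are made up for
illustration:

static void idxd_ims_prepare_desc(struct irq_domain *domain, msi_alloc_info_t *arg,
				  struct msi_desc *desc)
{
	struct idxd_device *idxd = ims_domain_to_idxd(domain);	/* made up helper */

	/* #1: domain specific data, set up by the domain code */
	desc->data.dom_cookie.iobase = idxd->ims_base + desc->msi_index * 0x10;
}

/*
 * #2: per allocation data. The usage site hands in e.g. the PASID and the
 * core copies it into desc->data.ins_cookie during the allocation:
 *
 *	union msi_instance_cookie icookie = { .value = pasid };
 *	msi_domain_alloc_irq_at(dev, ..., &icookie);	// exact signature TBD
 */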

Thanks,

        tglx
  
Jason Gunthorpe Nov. 21, 2022, 5:20 p.m. UTC | #5
On Fri, Nov 18, 2022 at 11:08:55PM +0100, Thomas Gleixner wrote:

> I looked into this and it gets ugly very fast.
> 
> The above has two parts:
> 
>     iobase    is domain specific and setup by the domain code
> 
>     cookie    is per interrupt allocation. That's where the instance
>               queue or whatever connects to the domain.
> 
> I can abuse the fields for PCI/MSI of course, but see below.

I don't know that we need to store the second one forever in the desc.
I was thinking this information is ephemeral, just used during alloc,
and if the msi domain driver wishes some of it to be stored then it
should do so.

> Sure I could make both cookies plain u64, but I hate these forced type
> casts and the above is simple to handle and understand.

I guess, they aren't what I think of as cookies, so I wouldn't make
them u64 in the first place.

The argument to msi_domain_alloc_irq_at() ideally wants to be a
per-domain-type struct so we can follow it around more cleanly. This is
C so we have to type erase it as a void * through the core code, but
OK.

The second one is typically called "driver private data" in device
driver subsystems that can't use container_of for some reason - just a
chunk of data the driver can associate with a core owned struct.

The usual pattern for driver private data is for the core to provide
some kind of accessor void *get_priv() (think dev_get_drvdata()) or
whatever.

But I do understand your point about keeping the drivers away from
things. Maybe some other pattern is better in this case.

Jason
  
Thomas Gleixner Nov. 21, 2022, 7:40 p.m. UTC | #6
On Mon, Nov 21 2022 at 13:20, Jason Gunthorpe wrote:
> On Fri, Nov 18, 2022 at 11:08:55PM +0100, Thomas Gleixner wrote:
>> Sure I could make both cookies plain u64, but I hate these forced type
>> casts and the above is simple to handle and understand.
>
> I guess, they aren't what I think of as cookies, so I wouldn't make
> them u64 in the first place.
>
> The argument to msi_domain_alloc_irq_at() ideally wants to be a
> per-domain-type struct so we can folow it around more cleanly. This is
> C so we have to type erase it as a void * through the core code, but
> OK.

When looking at the wire to MSI abomination and also PASID there is no
real per domain struct. It's plain integer information and I hate to
store it in a pointer. Especially as the pointer width on 32bit is not
necessarily sufficient.

Allocating 8 bytes and tracking them to be freed would be a horrible
idea.

> The second one is typically called "driver private data" in device
> driver subsystems that can't use container_of for some reason - just a
> chunk of data the driver can associate with a core owned struct.
>
> The usual pattern for driver private data is for the core to provide
> some kind of accessor void *get_priv() (think dev_get_drvdata()) or
> whatever.
>
> But I do understand your point about keeping the drivers away from
> things. Maybe some other pattern is better in this case.

At least from the two examples I have (IDXD and wire2MSI) the per
instance union works perfectly fine and I can't see a reason why
e.g. for your usecase

     cookie = { .ptr = myqueue };

would not work. The meaning of the cookie is domain implementation
defined and only the actual MSI domain and the related users know
whether it's a value or a pointer and what to do with this information. I
named it cookie because from the core MSI code's view it's completely
opaque and aside from the fact that the allocation function copies the
cookie into msi_desc, the core does not care at all about it. Same for
the domain one which is solely handled by the domain setup code and is
e.g. used to enable the irq chip callbacks to do what they need to do.

Thanks,

        tglx
  
Jason Gunthorpe Nov. 22, 2022, 1:52 a.m. UTC | #7
On Mon, Nov 21, 2022 at 08:40:05PM +0100, Thomas Gleixner wrote:
> On Mon, Nov 21 2022 at 13:20, Jason Gunthorpe wrote:
> > On Fri, Nov 18, 2022 at 11:08:55PM +0100, Thomas Gleixner wrote:
> >> Sure I could make both cookies plain u64, but I hate these forced type
> >> casts and the above is simple to handle and understand.
> >
> > I guess, they aren't what I think of as cookies, so I wouldn't make
> > them u64 in the first place.
> >
> > The argument to msi_domain_alloc_irq_at() ideally wants to be a
> > per-domain-type struct so we can folow it around more cleanly. This is
> > C so we have to type erase it as a void * through the core code, but
> > OK.
> 
> When looking at the wire to MSI abomination and also PASID there is no
> real per domain struct. It's plain integer information and I hate to
> store it in a pointer. Especially as the pointer width on 32bit is not
> necessarily sufficient.
> 
> Allocating 8 bytes and tracking them to be freed would be a horrible
> idea.

No, not allocation, just wrap in a stack variable:

  struct foo_bar_domain_data arg = {.pasid = XX};

  msi_domain_alloc_irq_at(..., &arg);

Then there is a great big clue right in the code who is supposed to be
consuming that opaque argument. grep the code for foo_bar_domain_data
and you can find the receiving side
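
The receiving side would then be something like this (sketch; how exactly the
type erased pointer is plumbed through the core is not spelled out here, and
the helper is made up):

struct foo_bar_domain_data {
	u32 pasid;
};

/* Wherever the core hands the pointer back to the domain specific code: */
static void foo_bar_consume_arg(struct msi_desc *desc, void *alloc_arg)
{
	struct foo_bar_domain_data *data = alloc_arg;

	foo_bar_program_pasid(desc, data->pasid);	/* made up helper */
}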

> At least from the two examples I have (IDXD and wire2MSI) the per
> instance union works perfectly fine and I can't see a reason why
> e.g. for your usecase
> 
>      cookie = { .ptr = myqueue };
> 
> would not work. 

I'm not saying it won't work, I'm asking about the style choice

Regards,
Jason
  
Thomas Gleixner Nov. 22, 2022, 8:49 p.m. UTC | #8
Jason,

On Mon, Nov 21 2022 at 21:52, Jason Gunthorpe wrote:
> On Mon, Nov 21, 2022 at 08:40:05PM +0100, Thomas Gleixner wrote:
>> When looking at the wire to MSI abomination and also PASID there is no
>> real per domain struct. It's plain integer information and I hate to
>> store it in a pointer. Especially as the pointer width on 32bit is not
>> necessarily sufficient.
>> 
>> Allocating 8 bytes and tracking them to be freed would be a horrible
>> idea.
>
> No, not allocation, just wrap in a stack variable:
>
>   struct foo_bar_domain_data arg = {.pasid = XX};
>
>   msi_domain_alloc_irq_at(..., &arg);
>
> Then there is a great big clue right in the code who is supposed to be
> consuming that opaque argument. grep the code for foo_bar_domain_data
> and you can find the receiving side

Agreed for the one or two sane people who actually will create their
data struct. The rest will just hand in a random pointer or a casted
integer, which is pretty useless for finding the counterpart.

>> At least from the two examples I have (IDXD and wire2MSI) the per
>> instance union works perfectly fine and I can't see a reason why
>> e.g. for your usecase
>> 
>>      cookie = { .ptr = myqueue };
>> 
>> would not work. 
>
> I'm not saying it won't work, I'm asking about the style choice

Sure. The other reason why I made this choice is that for many cases it
spares a callback to actually convert the pointer into real storage,
which is necessary because the data it points to is on the stack.

Just copying the data into the MSI descriptor solves this nicely without
having some extra magic.
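
I.e. the allocation path can simply do something like this (sketch; the
function name is made up and the field names follow the proposal earlier in
this thread, not the posted patch):

static void msi_desc_store_icookie(struct msi_desc *desc,
				   const union msi_instance_cookie *icookie)
{
	/* Copy by value: the caller's cookie may live on the stack */
	if (icookie)
		desc->data.ins_cookie = *icookie;
}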

I guess we're nearing bike shed realm by now :) Let's pick one evil and
see how it works out. Coccinelle is there to help us fix it up when
it turns out to be the wrong evil. :)

Thanks,

        tglx
  
Jason Gunthorpe Nov. 23, 2022, 4:58 p.m. UTC | #9
On Tue, Nov 22, 2022 at 09:49:11PM +0100, Thomas Gleixner wrote:

> I guess we're nearing bike shed realm by now :) Let's pick one evil and
> see how it works out. Coccinelle is there to help us fix it up when
> it turns out to be the wrong evil. :)

Sure, it is all changeable

I find your perspective on driver authors as the enemy quite
interesting :)

Jason
  
Thomas Gleixner Nov. 23, 2022, 6:38 p.m. UTC | #10
On Wed, Nov 23 2022 at 12:58, Jason Gunthorpe wrote:
> On Tue, Nov 22, 2022 at 09:49:11PM +0100, Thomas Gleixner wrote:
>> I guess we're nearing bike shed realm by now :) Let's pick one evil and
>> see how it works out. Coccinelle is there to help us fix it up when
>> it turns out to be the wrong evil. :)
>
> Sure, it is all changeable
>
> I find your perspective on driver authors as the enemy quite
> interesting :)

I'm not seeing them as enemies. Just my expectations are rather low by
now :)

Thanks,

        tglx
  
Thomas Gleixner Dec. 1, 2022, 12:24 p.m. UTC | #11
Jason!

On Wed, Nov 23 2022 at 19:38, Thomas Gleixner wrote:
> On Wed, Nov 23 2022 at 12:58, Jason Gunthorpe wrote:
>> I find your perspective on driver authors as the enemy quite
>> interesting :)
>
> I'm not seeing them as enemies. Just my expectations are rather low by
> now :)

This made me think about it for a while. Let me follow up on that.

When I set out to add real-time capabilities to the kernel about 20 years
ago, I did a thorough analysis of the kernel design and code base.

It turned out that aside from well encapsulated infrastructure, e.g. mm,
vfs, scheduler, network core, quite a bit of the rest consisted of
blatant layering violations held together with duct tape, super glue and
haywire circuits.

It was immediately clear to me that this needed a lot of consolidation
and cleanup work to get me even close to the point where RT becomes
feasible as an integral part of the kernel. Not only did this become
clear, I also realized that a continuation of this model would end up in
a maintenance nightmare sooner rather than later.

The other people interested in RT and I estimated back then that it would
take 5-10 years to get this done.

Boy, we were young and naive back then and completely underestimating
the efforts required. Obviously we were also underestimating the
concurrent influx of new stuff.

Just to give you an example: our early experiments with substituting
spinlocks were just the start of the horrors. Instead of working on the
actual substitution mechanisms and the other required modifications, we
spent a vast amount of our time chasing deadlocks all over the place.
My main test machine had not a single device driver which was correct
and working out of the box. What's worse is that we had to debate with
some of the driver people about the correctness of our locking analysis
and fight to get stuff fixed.

This ended in writing and integrating lockdep, which has thankfully
taken this burden off our plate.

When I started to look into interrupt handling to add support for
threaded interrupts, which are a fundamental prerequisite for RT, the
next nightmare started to unfold.

The "generic" core code was a skeleton and everything real was
implemented in architecture specific code in completely incompatible
ways. It was not even possible to change common data structures without
breaking the world.  What was even worse, drivers fiddled in the
interrupt descriptors just to scratch an itch.

What I learned pretty fast is that most driver writers try to work
around short-comings in common infrastructure instead of tackling the
problem at the root or talking to the developers/maintainers of that
infrastructure.

The consequence of that is: if you want to change core infrastructure
you end up mopping up the driver tree in order not to break things all
over the place. There are clearly better ways to spend your time.

So I started to encapsulate things more strictly - admittedly to make my
own life easier. But at the same time I always tried hard to make these
encapsulations easy to use, to provide common infrastructure in order to
replace boilerplate code and to help with resource management, which is
one of the common problems in driver code. I'm also quite confident that
I carefully listened to the needs of driver developers and I think the
whole discussion about IMS last year is a good example for that. I
surely have opinions, but who doesn't?

So no, I'm not seeing driver writers as enemies. I'm just accepting the
reality that quite a few of the drivers are written in "get it out the
door" mode. I'm well aware that there are other folks who stay around for
a long time and do proper engineering and maintenance, but that's sadly
the minority.

Being responsible for core infrastructure is an interesting challenge
especially with the zoo of legacy to keep alive and the knowledge that
you can break the world with a trivial and obviously "correct"
change. Been there, done that. :)

Thanks,

        Thomas
  
Jason Gunthorpe Dec. 2, 2022, 12:35 a.m. UTC | #12
On Thu, Dec 01, 2022 at 01:24:03PM +0100, Thomas Gleixner wrote:
> Jason!
> 
> On Wed, Nov 23 2022 at 19:38, Thomas Gleixner wrote:
> > On Wed, Nov 23 2022 at 12:58, Jason Gunthorpe wrote:
> >> I find your perspective on driver authors as the enemy quite
> >> interesting :)
> >
> > I'm not seeing them as enemies. Just my expectations are rather low by
> > now :)
> 
> This made me think about it for a while. Let me follow up on that.

I didn't intend to pick such a harsh word, I get your point.

In lands like netdev/rdma breaking a driver makes people unhappy, but we can
get away with it because most people don't have systems that won't boot if
you break those drivers. The people responsible will eventually test a
new kernel, see the warn/lockdep/etc and can debug and fix the problem
without a huge hassle.

In the platform code, if someone's machine boots to a black, dead screen
you are much less likely to get a helpful person able to fix it; more
likely an end user is unhappy that their kernel is busted. Especially since
the chances of a bug getting past the testing basically force it to be on
some obscure configuration.

Due to this difference I've come to appreciate much more putting
stronger guard rails and clearer design on the lower level
code. Supporting the long tail of rare platforms is difficult.

> What I learned pretty fast is that most driver writers try to work
> around short-comings in common infrastructure instead of tackling the
> problem at the root or talking to the developers/maintainers of that
> infrastructure.

Yes, few people can tackle this stuff, and there are often interesting
headwinds.

Regards,
Jason
  
Thomas Gleixner Dec. 2, 2022, 2:14 a.m. UTC | #13
Jason!

On Thu, Dec 01 2022 at 20:35, Jason Gunthorpe wrote:
> On Thu, Dec 01, 2022 at 01:24:03PM +0100, Thomas Gleixner wrote:
>> On Wed, Nov 23 2022 at 19:38, Thomas Gleixner wrote:
>> > On Wed, Nov 23 2022 at 12:58, Jason Gunthorpe wrote:
>> >> I find your perspective on driver authors as the enemy quite
>> >> interesting :)
>> >
>> > I'm not seeing them as enemies. Just my expectations are rather low by
>> > now :)
>> 
>> This made me think about it for a while. Let me follow up on that.
>
> I didn't intend to pick such a harsh word, I get your point.

I didn't take that as offence at all.

It's a good thing to be forced from time to time to reflect on what
I'm doing and why I'm doing it to make sure that I'm not completely off
track.

Thanks,

        Thomas
  

Patch

--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -121,6 +121,19 @@  struct pci_msi_desc {
 	};
 };
 
+/**
+ * struct msi_desc_data - Generic MSI descriptor data
+ * @iobase:     Pointer to the IOMEM base address for interrupt callbacks
+ * @cookie:	Device cookie provided at allocation time
+ *
+ * The content of this data is implementation defined, e.g. PCI/IMS
+ * implementations will define the meaning of the data.
+ */
+struct msi_desc_data {
+	void			__iomem *iobase;
+	union msi_dev_cookie	cookie;
+};
+
 #define MSI_MAX_INDEX		((unsigned int)USHRT_MAX)
 
 /**
@@ -138,6 +151,7 @@  struct pci_msi_desc {
  *
  * @msi_index:	Index of the msi descriptor
  * @pci:	PCI specific msi descriptor data
+ * @data:	Generic MSI descriptor data
  */
 struct msi_desc {
 	/* Shared device/bus type independent data */
@@ -157,7 +171,10 @@  struct msi_desc {
 	void *write_msi_msg_data;
 
 	u16				msi_index;
-	struct pci_msi_desc		pci;
+	union {
+		struct pci_msi_desc	pci;
+		struct msi_desc_data	data;
+	};
 };
 
 /*
--- a/include/linux/msi_api.h
+++ b/include/linux/msi_api.h
@@ -19,6 +19,23 @@  enum msi_domain_ids {
 };
 
 /**
+ * union msi_dev_cookie - MSI device cookie
+ * @value:	u64 value store
+ * @ptr:	Pointer
+ *
+ * This data is handed to the IMS allocation function and stored
+ * in the MSI descriptor for the interrupt chip callbacks.
+ *
+ * The content of this data is implementation defined, e.g. PCI/IMS
+ * implementations will define the meaning of the data, e.g. PASID or a
+ * pointer to queue memory.
+ */
+union msi_dev_cookie {
+	u64	value;
+	void	*ptr;
+};
+
+/**
  * msi_map - Mapping between MSI index and Linux interrupt number
  * @index:	The MSI index, e.g. slot in the MSI-X table or
  *		a software managed index if >= 0. If negative