Message ID: 20221111135206.346985384@linutronix.de
State: New
Subject: [patch 19/33] genirq/msi: Provide msi_desc::msi_data
From: Thomas Gleixner <tglx@linutronix.de>
To: LKML <linux-kernel@vger.kernel.org>
Date: Fri, 11 Nov 2022 14:58:41 +0100 (CET)
Series: genirq, PCI/MSI: Support for per device MSI and PCI/IMS - Part 3 implementation
Commit Message
Thomas Gleixner
Nov. 11, 2022, 1:58 p.m. UTC
The upcoming support for PCI/IMS requires storing some information related
to the message handling in the MSI descriptor, e.g. a PASID or a pointer
to a queue.
Provide a generic storage struct which maps over the existing PCI specific
storage, which means the size of struct msi_desc does not increase.
It contains an iomem pointer for device memory based IMS and a union of a
u64 and a void pointer which allows the device specific IMS implementations
to store the necessary information.
The iomem pointer is set up by the domain allocation functions.
The data union msi_dev_cookie is going to be handed in when allocating an
interrupt on an IMS domain so that the irq chip callbacks of the IMS domain
have the necessary per vector information available. It also comes in handy
when cleaning up the platform MSI code for wire to MSI bridges which need
to hand down the type information to the underlying interrupt domain.
For the core code the cookie is opaque and meaningless. The core just
stores it during an allocation through the upcoming interfaces for IMS and
wire to MSI bridges.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
include/linux/msi.h | 19 ++++++++++++++++++-
include/linux/msi_api.h | 17 +++++++++++++++++
2 files changed, 35 insertions(+), 1 deletion(-)
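For a sense of the intended usage, a minimal sketch of the allocation side.
msi_domain_alloc_irq_at() is only introduced later in this series and
discussed in the thread below, so its exact signature is an assumption
here, as are the names my_ims_setup(), MY_IMS_DOMID and MSI_ANY_INDEX:

    #include <linux/msi_api.h>

    static int my_ims_setup(struct device *dev, void *queue)
    {
            union msi_dev_cookie cookie = { .ptr = queue };
            struct msi_map map;

            /*
             * Allocate one interrupt on the device's IMS domain. The core
             * copies the cookie into the MSI descriptor, where the IMS
             * domain's irq chip callbacks can retrieve it later.
             */
            map = msi_domain_alloc_irq_at(dev, MY_IMS_DOMID, MSI_ANY_INDEX,
                                          NULL, &cookie);
            return map.index >= 0 ? map.virq : map.index;
    }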
Comments
On Fri, Nov 11, 2022 at 02:58:41PM +0100, Thomas Gleixner wrote:

> +/**
> + * struct msi_desc_data - Generic MSI descriptor data
> + * @iobase:	Pointer to the IOMEM base address for interrupt callbacks
> + * @cookie:	Device cookie provided at allocation time
> + *
> + * The content of this data is implementation defined, e.g. PCI/IMS
> + * implementations will define the meaning of the data.
> + */
> +struct msi_desc_data {
> +	void __iomem		*iobase;
> +	union msi_dev_cookie	cookie;
> +};

It would be nice to see the pci_msi_desc converted to a domain specific
storage as well.

Maybe could be written

  struct msi_desc {
        u64 domain_data[2];
  }

  struct pci_msi_desc {
        u32 msi_mask;
        u8 multiple : 3;
        u8 multi_cap : 3;
        u8 can_mask : 1;
        u8 is_64 : 1;
        u8 mask_pos;
        u16 default_irq;
  }
  static_assert(sizeof(struct pci_msi_desc) <= sizeof(((struct msi_desc *)0)->domain_data));

  struct pci_msix_desc {
        u32 msix_ctrl;
        u8 multiple : 3;
        u8 multi_cap : 3;
        u8 can_mask : 1;
        u8 is_64 : 1;
        u16 default_irq;
        void __iomem *mask_base;
  }
  static_assert(sizeof(struct pci_msix_desc) <= sizeof(((struct msi_desc *)0)->domain_data));

ideally hidden in the PCI code with some irq_chip facing export API to
snoop in the bits a few places need.

We've used 128 bits for the PCI descriptor, we might as well let
everyone have all 128 bits for whatever they want to do.

Jason
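The overlay-plus-static_assert pattern Jason sketches can be demonstrated
standalone. A minimal, self-contained userspace C illustration (field list
abbreviated, not kernel code): a core struct reserves fixed-size opaque
storage, a subsystem struct is overlaid onto it, and static_assert catches
overflow at compile time.

    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    struct msi_desc {
            uint64_t domain_data[2];        /* 128 bits of opaque storage */
    };

    /* Abbreviated stand-in for the PCI specific descriptor bits. */
    struct pci_msi_desc {
            uint32_t msi_mask;
            uint8_t  multiple : 3;
            uint8_t  can_mask : 1;
            uint8_t  is_64    : 1;
            uint8_t  mask_pos;
            uint16_t default_irq;
    };

    /* Compile-time guarantee that the overlay fits into the opaque area. */
    static_assert(sizeof(struct pci_msi_desc) <=
                  sizeof(((struct msi_desc *)0)->domain_data),
                  "pci_msi_desc must fit into domain_data");

    int main(void)
    {
            struct msi_desc desc = { { 0, 0 } };
            struct pci_msi_desc pci = { .msi_mask = 0xff, .default_irq = 42 };

            /* Store the subsystem view in the opaque storage, read it back. */
            memcpy(desc.domain_data, &pci, sizeof(pci));
            memcpy(&pci, desc.domain_data, sizeof(pci));
            printf("default_irq = %u\n", pci.default_irq);
            return 0;
    }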
On Wed, Nov 16 2022 at 15:28, Jason Gunthorpe wrote:
> On Fri, Nov 11, 2022 at 02:58:41PM +0100, Thomas Gleixner wrote:
>> +/**
>> + * struct msi_desc_data - Generic MSI descriptor data
>> + * @iobase:	Pointer to the IOMEM base address for interrupt callbacks
>> + * @cookie:	Device cookie provided at allocation time
>> + *
>> + * The content of this data is implementation defined, e.g. PCI/IMS
>> + * implementations will define the meaning of the data.
>> + */
>> +struct msi_desc_data {
>> +	void __iomem		*iobase;
>> +	union msi_dev_cookie	cookie;
>> +};
>
> It would be nice to see the pci_msi_desc converted to a domain
> specific storage as well.
>
> Maybe could be written
>
>   struct msi_desc {
>         u64 domain_data[2];
>   }
>
>   struct pci_msi_desc {
>         u32 msi_mask;
>         u8 multiple : 3;
>         u8 multi_cap : 3;
>         u8 can_mask : 1;
>         u8 is_64 : 1;
>         u8 mask_pos;
>         u16 default_irq;
>   }
>   static_assert(sizeof(struct pci_msi_desc) <= sizeof(((struct msi_desc *)0)->domain_data));
>
>   struct pci_msix_desc {
>         u32 msix_ctrl;
>         u8 multiple : 3;
>         u8 multi_cap : 3;
>         u8 can_mask : 1;
>         u8 is_64 : 1;
>         u16 default_irq;
>         void __iomem *mask_base;
>   }
>   static_assert(sizeof(struct pci_msix_desc) <= sizeof(((struct msi_desc *)0)->domain_data));
>
> ideally hidden in the PCI code with some irq_chip facing export API to
> snoop in the bits a few places need.
>
> We've used 128 bits for the PCI descriptor, we might as well let
> everyone have all 128 bits for whatever they want to do.

Not sure because we end up with nasty type casts for

>   struct msi_desc {
>         u64 domain_data[2];
>   }

Let me think about it.

Thanks,

        tglx
On Thu, Nov 17, 2022 at 09:48:45AM +0100, Thomas Gleixner wrote:
>> We've used 128 bits for the PCI descriptor, we might as well let
>> everyone have all 128 bits for whatever they want to do.
>
> Not sure because we end up with nasty type casts for

Something like:

  void *msi_desc_device_priv() { return desc->device_data; }

As netdev does it?

Jason
On Wed, Nov 16 2022 at 15:28, Jason Gunthorpe wrote:
> On Fri, Nov 11, 2022 at 02:58:41PM +0100, Thomas Gleixner wrote:
>
>> +/**
>> + * struct msi_desc_data - Generic MSI descriptor data
>> + * @iobase:	Pointer to the IOMEM base address for interrupt callbacks
>> + * @cookie:	Device cookie provided at allocation time
>> + *
>> + * The content of this data is implementation defined, e.g. PCI/IMS
>> + * implementations will define the meaning of the data.
>> + */
>> +struct msi_desc_data {
>> +	void __iomem		*iobase;
>> +	union msi_dev_cookie	cookie;
>> +};
>
> It would be nice to see the pci_msi_desc converted to a domain
> specific storage as well.

I looked into this and it gets ugly very fast.

The above has two parts:

  iobase is domain specific and setup by the domain code

  cookie is per interrupt allocation. That's where the instance
  queue or whatever connects to the domain.

I can abuse the fields for PCI/MSI of course, but see below.

> Maybe could be written
>
>   struct pci_msi_desc {
>   }
>   static_assert(sizeof(struct pci_msi_desc) <= sizeof(((struct msi_desc *)0)->domain_data));
>
>   struct pci_msix_desc {
>   }
>   static_assert(sizeof(struct pci_msix_desc) <= sizeof(((struct msi_desc *)0)->domain_data));
>
> ideally hidden in the PCI code with some irq_chip facing export API to
> snoop in the bits a few places need.

I can't use that for the current combo legacy PCI/MSI code as I can't
split the irq chip implementations like I can with the new per device
domains. And no, I'm not going to create separate code paths which do
the same thing on different data structures just to pretend that it's
all shiny.

> We've used 128 bits for the PCI descriptor, we might as well let
> everyone have all 128 bits for whatever they want to do.

That's fine, but there are two parts to it:

  1) Domain specific data

  2) Per allocation specific data

#1 is data which the domain code puts there, e.g. in the prepare_desc()
callback.

#2 is data which the usage site hands in, which gives the domain and the
interrupt chip the information they need.

So the data structure should look like this:

  struct msi_desc_data {
        union msi_domain_cookie   dom_cookie;
        union msi_instance_cookie ins_cookie;
  };

  union msi_domain_cookie {
        void __iomem *iobase;
        void         *ptr;
        u64          value;
  };

  union msi_instance_cookie {
        void *ptr;
        u64  value;
  };

Sure I could make both cookies plain u64, but I hate these forced type
casts and the above is simple to handle and understand.

So you get your 128 bits, but not per instance because that's a
nightmare to validate versus the allocation code which has to copy the
data into the msi descriptor, whatever it is (PASID, queue pointer ...).

Having two cookies makes a lot of sense to have a proper separation
between domain and usage site.

For IDXD the domain wants to store the iobase and needs a per allocation
PASID.

For your queue model, the domain wants a pointer to some device or
whatever specific things and the queue provides a pointer so that the
domain/chip can do the right thing for that particular instance.

For both sides, the domain and the allocation side, something like the
above is sufficient.

Thanks,

        tglx
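For illustration, the domain-side part (#1) might be filled in from a
prepare_desc() style callback along these lines. Only prepare_desc() itself
is mentioned above; the callback signature, the my_ims_* names and the
struct layout are assumptions, written against the simpler iobase + cookie
layout that the patch below actually introduces:

    #include <linux/irqdomain.h>
    #include <linux/msi.h>

    struct my_ims_data {
            void __iomem    *slot_base;     /* hypothetical device data */
    };

    static void my_ims_prepare_desc(struct irq_domain *domain,
                                    msi_alloc_info_t *arg,
                                    struct msi_desc *desc)
    {
            struct my_ims_data *ims = domain->host_data;

            /* Domain specific: iomem base for the irq chip callbacks. */
            desc->data.iobase = ims->slot_base;
    }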
On Fri, Nov 18, 2022 at 11:08:55PM +0100, Thomas Gleixner wrote:
> I looked into this and it gets ugly very fast.
>
> The above has two parts:
>
>   iobase is domain specific and setup by the domain code
>
>   cookie is per interrupt allocation. That's where the instance
>   queue or whatever connects to the domain.
>
> I can abuse the fields for PCI/MSI of course, but see below.

I don't know that we need to store the second one forever in the desc. I
was thinking this information is ephemeral, just used during alloc, and
if the msi domain driver wishes some of it to be stored then it should
do so.

> Sure I could make both cookies plain u64, but I hate these forced type
> casts and the above is simple to handle and understand.

I guess, they aren't what I think of as cookies, so I wouldn't make them
u64 in the first place.

The argument to msi_domain_alloc_irq_at() ideally wants to be a
per-domain-type struct so we can follow it around more cleanly. This is
C so we have to type erase it as a void * through the core code, but OK.

The second one is typically called "driver private data" in device
driver subsystems that can't use container_of for some reason - just a
chunk of data the driver can associate with a core owned struct.

The usual pattern for driver private data is for the core to provide
some kind of accessor void *get_priv() (think dev_get_drvdata()) or
whatever.

But I do understand your point about keeping the drivers away from
things. Maybe some other pattern is better in this case.

Jason
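Concretely, the accessor pattern Jason describes might look like this
against the msi_desc layout from this patch. Neither helper exists
anywhere; the names are modeled on dev_get_drvdata()/dev_set_drvdata()
purely for illustration:

    #include <linux/msi.h>

    static inline void *msi_desc_get_devdata(struct msi_desc *desc)
    {
            return desc->data.cookie.ptr;
    }

    static inline void msi_desc_set_devdata(struct msi_desc *desc, void *data)
    {
            desc->data.cookie.ptr = data;
    }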
On Mon, Nov 21 2022 at 13:20, Jason Gunthorpe wrote:
> On Fri, Nov 18, 2022 at 11:08:55PM +0100, Thomas Gleixner wrote:
>> Sure I could make both cookies plain u64, but I hate these forced type
>> casts and the above is simple to handle and understand.
>
> I guess, they aren't what I think of as cookies, so I wouldn't make
> them u64 in the first place.
>
> The argument to msi_domain_alloc_irq_at() ideally wants to be a
> per-domain-type struct so we can follow it around more cleanly. This is
> C so we have to type erase it as a void * through the core code, but
> OK.

When looking at the wire to MSI abomination and also PASID there is no
real per domain struct. It's plain integer information and I hate to
store it in a pointer. Especially as the pointer width on 32bit is not
necessarily sufficient.

Allocating 8 bytes and tracking them to be freed would be a horrible
idea.

> The second one is typically called "driver private data" in device
> driver subsystems that can't use container_of for some reason - just a
> chunk of data the driver can associate with a core owned struct.
>
> The usual pattern for driver private data is for the core to provide
> some kind of accessor void *get_priv() (think dev_get_drvdata()) or
> whatever.
>
> But I do understand your point about keeping the drivers away from
> things. Maybe some other pattern is better in this case.

At least from the two examples I have (IDXD and wire2MSI) the per
instance union works perfectly fine and I can't see a reason why
e.g. for your usecase

  cookie = { .ptr = myqueue };

would not work.

The meaning of the cookie is domain implementation defined and only the
actual MSI domain and the related users know whether it's a value or a
pointer and what to do with this information.

I named it cookie because from the core MSI code's view it's completely
opaque and aside of the fact that the allocation function copies the
cookie into msi_desc, the core does not care at all about it.

Same for the domain one, which is solely handled by the domain setup
code and is e.g. used to enable the irq chip callbacks to do what they
need to do.

Thanks,

        tglx
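Putting the two pieces together, an IMS irq chip callback could consume
both the domain-provided iobase and the per-allocation cookie roughly as
follows. my_ims_write_msg(), struct my_queue and the register layout are
invented for illustration; only the msi_desc_data fields come from the
patch:

    #include <linux/io.h>
    #include <linux/irq.h>
    #include <linux/msi.h>

    struct my_queue {
            unsigned long   slot_offset;    /* hypothetical device layout */
    };

    static void my_ims_write_msg(struct irq_data *data, struct msi_msg *msg)
    {
            struct msi_desc *desc = irq_data_get_msi_desc(data);
            void __iomem *base = desc->data.iobase;         /* set up by the domain */
            struct my_queue *queue = desc->data.cookie.ptr; /* handed in at allocation */

            /* Program the device's message store slot for this queue. */
            writel(msg->address_lo, base + queue->slot_offset);
            writel(msg->address_hi, base + queue->slot_offset + 4);
            writel(msg->data,       base + queue->slot_offset + 8);
    }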
On Mon, Nov 21, 2022 at 08:40:05PM +0100, Thomas Gleixner wrote:
> On Mon, Nov 21 2022 at 13:20, Jason Gunthorpe wrote:
>> On Fri, Nov 18, 2022 at 11:08:55PM +0100, Thomas Gleixner wrote:
>>> Sure I could make both cookies plain u64, but I hate these forced type
>>> casts and the above is simple to handle and understand.
>>
>> I guess, they aren't what I think of as cookies, so I wouldn't make
>> them u64 in the first place.
>>
>> The argument to msi_domain_alloc_irq_at() ideally wants to be a
>> per-domain-type struct so we can follow it around more cleanly. This is
>> C so we have to type erase it as a void * through the core code, but
>> OK.
>
> When looking at the wire to MSI abomination and also PASID there is no
> real per domain struct. It's plain integer information and I hate to
> store it in a pointer. Especially as the pointer width on 32bit is not
> necessarily sufficient.
>
> Allocating 8 bytes and tracking them to be freed would be a horrible
> idea.

No, not allocation, just wrap in a stack variable:

  struct foo_bar_domain_data arg = { .pasid = XX };

  msi_domain_alloc_irq_at(..., &arg);

Then there is a great big clue right in the code who is supposed to be
consuming that opaque argument. grep the code for foo_bar_domain_data
and you can find the receiving side.

> At least from the two examples I have (IDXD and wire2MSI) the per
> instance union works perfectly fine and I can't see a reason why
> e.g. for your usecase
>
>   cookie = { .ptr = myqueue };
>
> would not work.

I'm not saying it would not work, I'm asking about the style choice.

Regards,
Jason
Jason,

On Mon, Nov 21 2022 at 21:52, Jason Gunthorpe wrote:
> On Mon, Nov 21, 2022 at 08:40:05PM +0100, Thomas Gleixner wrote:
>> When looking at the wire to MSI abomination and also PASID there is no
>> real per domain struct. It's plain integer information and I hate to
>> store it in a pointer. Especially as the pointer width on 32bit is not
>> necessarily sufficient.
>>
>> Allocating 8 bytes and tracking them to be freed would be a horrible
>> idea.
>
> No, not allocation, just wrap in a stack variable:
>
>   struct foo_bar_domain_data arg = { .pasid = XX };
>
>   msi_domain_alloc_irq_at(..., &arg);
>
> Then there is a great big clue right in the code who is supposed to be
> consuming that opaque argument. grep the code for foo_bar_domain_data
> and you can find the receiving side.

Agreed for the one or two sane people who actually will create their
data struct. The rest will just hand in a random pointer or a casted
integer, which is pretty useless for finding the counterpart.

>> At least from the two examples I have (IDXD and wire2MSI) the per
>> instance union works perfectly fine and I can't see a reason why
>> e.g. for your usecase
>>
>>   cookie = { .ptr = myqueue };
>>
>> would not work.
>
> I'm not saying it would not work, I'm asking about the style choice.

Sure. The other reason why I made this choice is that for many cases it
spares a callback to actually convert the pointer into real storage,
which is necessary because the data it points to is on stack. Just
copying the data into the MSI descriptor solves this nicely without
having some extra magic.

I guess we're nearing bike shed realm by now :) Let's pick one evil and
see how it works out. Coccinelle is there to help us fixing it up when
it turns out to be the wrong evil. :)

Thanks,

        tglx
On Tue, Nov 22, 2022 at 09:49:11PM +0100, Thomas Gleixner wrote:
> I guess we're nearing bike shed realm by now :) Let's pick one evil and
> see how it works out. Coccinelle is there to help us fixing it up when
> it turns out to be the wrong evil. :)

Sure, it is all changeable.

I find your perspective on driver authors as the enemy quite
interesting :)

Jason
On Wed, Nov 23 2022 at 12:58, Jason Gunthorpe wrote:
> On Tue, Nov 22, 2022 at 09:49:11PM +0100, Thomas Gleixner wrote:
>> I guess we're nearing bike shed realm by now :) Let's pick one evil and
>> see how it works out. Coccinelle is there to help us fixing it up when
>> it turns out to be the wrong evil. :)
>
> Sure, it is all changeable.
>
> I find your perspective on driver authors as the enemy quite
> interesting :)

I'm not seeing them as enemies. Just my expectations are rather low by
now :)

Thanks,

        tglx
Jason!

On Wed, Nov 23 2022 at 19:38, Thomas Gleixner wrote:
> On Wed, Nov 23 2022 at 12:58, Jason Gunthorpe wrote:
>> I find your perspective on driver authors as the enemy quite
>> interesting :)
>
> I'm not seeing them as enemies. Just my expectations are rather low by
> now :)

This made me think about it for a while. Let me follow up on that.

When I set out to add real-time capabilities to the kernel about 20
years ago, I did a thorough analysis of the kernel design and code
base. It turned out that aside of well encapsulated infrastructure,
e.g. mm, vfs, scheduler, network core, quite some of the rest consisted
of blatant layering violations held together with duct tape, super glue
and haywire circuits.

It was immediately clear to me that this needed a lot of consolidation
and cleanup work to get me even close to the point where RT becomes
feasible as an integral part of the kernel. But not only did this become
clear, I also realized that a continuation of this model would end up in
a maintenance nightmare sooner rather than later.

The other people interested in RT and I estimated back then that it
would take 5-10 years to get this done. Boy, we were young and naive
back then, completely underestimating the effort required. Obviously we
were also underestimating the concurrent influx of new stuff.

Just to give you an example: our early experiments with substituting
spinlocks were just the start of the horrors. Instead of working on the
actual substitution mechanisms and the required other modifications, we
spent a vast amount of our time chasing deadlocks all over the place. My
main test machine had not a single device driver which was correct and
working out of the box. What's worse is that we had to debate with some
of the driver people about the correctness of our locking analysis and
fight for stuff getting fixed. This ended in writing and integrating
lockdep, which has thankfully taken this burden off our plate.

When I started to look into interrupt handling to add support for
threaded interrupts, which are a fundamental prerequisite for RT, the
next nightmare started to unfold. The "generic" core code was a skeleton
and everything real was implemented in architecture specific code in
completely incompatible ways. It was not even possible to change common
data structures without breaking the world. What was even worse, drivers
fiddled in the interrupt descriptors just to scratch an itch.

What I learned pretty fast is that most driver writers try to work
around shortcomings in common infrastructure instead of tackling the
problem at the root or talking to the developers/maintainers of that
infrastructure. The consequence of that is: if you want to change core
infrastructure you end up mopping up the driver tree in order not to
break things all over the place. There are clearly better ways to spend
your time.

So I started to encapsulate things more strictly - admittedly to make my
own life easier. But at the same time I always tried hard to make these
encapsulations easy to use, to provide common infrastructure in order to
replace boilerplate code and to help with resource management, which is
one of the common problems in driver code.

I'm also quite confident that I carefully listened to the needs of
driver developers, and I think the whole discussion about IMS last year
is a good example for that. I surely have opinions, but who doesn't?

So no, I'm not seeing driver writers as enemies. I'm just accepting the
reality that quite some of the drivers are written in "get it out the
door" mode. I'm well aware that there are other folks who stay around
for a long time and do proper engineering and maintenance, but that's
sadly the minority.

Being responsible for core infrastructure is an interesting challenge,
especially with the zoo of legacy to keep alive and the knowledge that
you can break the world with a trivial and obviously "correct"
change. Been there, done that. :)

Thanks,

        Thomas
On Thu, Dec 01, 2022 at 01:24:03PM +0100, Thomas Gleixner wrote:
> Jason!
>
> On Wed, Nov 23 2022 at 19:38, Thomas Gleixner wrote:
>> On Wed, Nov 23 2022 at 12:58, Jason Gunthorpe wrote:
>>> I find your perspective on driver authors as the enemy quite
>>> interesting :)
>>
>> I'm not seeing them as enemies. Just my expectations are rather low by
>> now :)
>
> This made me think about it for a while. Let me follow up on that.

I didn't intend to pick such a harsh word, I get your point.

In lands like netdev/rdma breaking a driver is unhappy, but we can get
away with it because most people don't have systems that won't boot if
you break those drivers. The people responsible will eventually test a
new kernel, see the warn/lockdep/etc and can debug and fix the problem
without a huge hassle.

In the platform code, if someone's machine boots to a black, dead
screen you are much less likely to get a helpful person able to fix it;
more likely an end user is unhappy their kernel is busted. Especially
since the chances of a bug getting past the testing basically force it
to be on some obscure configuration.

Due to this difference I've come to appreciate much more putting
stronger guard rails and clearer design on the lower level code.
Supporting the long tail of rare platforms is difficult.

> What I learned pretty fast is that most driver writers try to work
> around shortcomings in common infrastructure instead of tackling the
> problem at the root or talking to the developers/maintainers of that
> infrastructure.

Yes, few people can tackle this stuff, and there are often interesting
headwinds.

Regards,
Jason
Jason!

On Thu, Dec 01 2022 at 20:35, Jason Gunthorpe wrote:
> On Thu, Dec 01, 2022 at 01:24:03PM +0100, Thomas Gleixner wrote:
>> On Wed, Nov 23 2022 at 19:38, Thomas Gleixner wrote:
>>> On Wed, Nov 23 2022 at 12:58, Jason Gunthorpe wrote:
>>>> I find your perspective on driver authors as the enemy quite
>>>> interesting :)
>>>
>>> I'm not seeing them as enemies. Just my expectations are rather low by
>>> now :)
>>
>> This made me think about it for a while. Let me follow up on that.
>
> I didn't intend to pick such a harsh word, I get your point.

I didn't take that as offence at all. It's a good thing to be forced
from time to time to reflect on what I'm doing and why I'm doing it, to
make sure that I'm not completely off track.

Thanks,

        Thomas
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -121,6 +121,19 @@ struct pci_msi_desc {
 	};
 };
 
+/**
+ * struct msi_desc_data - Generic MSI descriptor data
+ * @iobase:	Pointer to the IOMEM base address for interrupt callbacks
+ * @cookie:	Device cookie provided at allocation time
+ *
+ * The content of this data is implementation defined, e.g. PCI/IMS
+ * implementations will define the meaning of the data.
+ */
+struct msi_desc_data {
+	void __iomem		*iobase;
+	union msi_dev_cookie	cookie;
+};
+
 #define MSI_MAX_INDEX		((unsigned int)USHRT_MAX)
 
 /**
@@ -138,6 +151,7 @@ struct pci_msi_desc {
  *
  * @msi_index:	Index of the msi descriptor
  * @pci:	PCI specific msi descriptor data
+ * @data:	Generic MSI descriptor data
  */
 struct msi_desc {
 	/* Shared device/bus type independent data */
@@ -157,7 +171,10 @@ struct msi_desc {
 	void *write_msi_msg_data;
 
 	u16				msi_index;
-	struct pci_msi_desc		pci;
+	union {
+		struct pci_msi_desc	pci;
+		struct msi_desc_data	data;
+	};
 };
 
 /*
--- a/include/linux/msi_api.h
+++ b/include/linux/msi_api.h
@@ -19,6 +19,23 @@ enum msi_domain_ids {
 };
 
 /**
+ * union msi_dev_cookie - MSI device cookie
+ * @value:	u64 value store
+ * @ptr:	Pointer
+ *
+ * This data is handed to the IMS allocation function and stored
+ * in the MSI descriptor for the interrupt chip callbacks.
+ *
+ * The content of this data is implementation defined, e.g. PCI/IMS
+ * implementations will define the meaning of the data, e.g. PASID or a
+ * pointer to queue memory.
+ */
+union msi_dev_cookie {
+	u64	value;
+	void	*ptr;
+};
+
+/**
  * msi_map - Mapping between MSI index and Linux interrupt number
  * @index:	The MSI index, e.g. slot in the MSI-X table or
  *		a software managed index if >= 0. If negative