Message ID | 20230527054633.704916-2-chenhuacai@loongson.cn |
---|---|
State | New |
Headers |
From: Huacai Chen <chenhuacai@loongson.cn>
To: Thomas Gleixner <tglx@linutronix.de>, Marc Zyngier <maz@kernel.org>, Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-kernel@vger.kernel.org, loongson-kernel@lists.loongnix.cn, Xuefeng Li <lixuefeng@loongson.cn>, Huacai Chen <chenhuacai@gmail.com>, Jiaxun Yang <jiaxun.yang@flygoat.com>
Subject: [PATCH 1/2] genirq/msi, platform-msi: Adjust return value of msi_domain_prepare_irqs()
Date: Sat, 27 May 2023 13:46:32 +0800
Message-Id: <20230527054633.704916-2-chenhuacai@loongson.cn>
In-Reply-To: <20230527054633.704916-1-chenhuacai@loongson.cn>
References: <20230527054633.704916-1-chenhuacai@loongson.cn> |
Series | Add mechanism to limit MSI allocation for Loongson |
Commit Message
Huacai Chen
May 27, 2023, 5:46 a.m. UTC
Adjust the return value semantics of msi_domain_prepare_irqs(), which allows the input nvec to be modified by overriding msi_domain_ops::msi_prepare(). This is necessary for a later patch.

Before:
0 on success, others on error.

After:
= 0: Success;
> 0: The modified nvec;
< 0: Error code.

Callers are also updated.

Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
---
 drivers/base/platform-msi.c |  2 +-
 kernel/irq/msi.c            | 10 +++++++++-
 2 files changed, 10 insertions(+), 2 deletions(-)
Comments
On Sat, May 27 2023 at 13:46, Huacai Chen wrote:
> Adjust the return value semanteme of msi_domain_prepare_irqs(), which
> allows us to modify the input nvec by overriding the msi_domain_ops::
> msi_prepare(). This is necessary for the later patch.
>
> Before:
> 0 on success, others on error.
>
> After:
> = 0: Success;
> > 0: The modified nvec;
> < 0: Error code.

This explains what the patch does, but provides zero justification for it, nor any analysis of why it is correct for the existing use cases.

That Loongson MSI domain is a PCI MSI domain. PCI/MSI already has a mechanism to return the actual possible number of vectors if the underlying space is exhausted.

Why is that not sufficient for your problem at hand?

Thanks,

        tglx
Hi, Thomas,

On Sat, May 27, 2023 at 10:03 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> This explains what the patch does, but provides zero justification for
> this nor any analysis why this is correct for the existing use cases.

I checked all msi_prepare() callbacks and none of them return positive values now, so I think it is correct.

> That Loongson MSI domain is a PCI MSI domain. PCI/MSI has already a
> mechanism to return the actual possible number of vectors if the
> underlying space is exhausted.
>
> Why is that not sufficient for your problem at hand?

Hmm, maybe I should make things clearer. We want to do some proactive throttling here. For example, if we have two NICs, we want both of them to get 32 MSI vectors, rather than one getting 64 vectors while the other falls back to a legacy IRQ.

Huacai
On Sat, 27 May 2023 15:03:29 +0100, Thomas Gleixner <tglx@linutronix.de> wrote:
> That Loongson MSI domain is a PCI MSI domain. PCI/MSI has already a
> mechanism to return the actual possible number of vectors if the
> underlying space is exhausted.
>
> Why is that not sufficient for your problem at hand?

I've already made that point, but it seems that the argument is falling on deaf ears.

Being able to allocate MSIs is not a guarantee, and is always opportunistic. If some drivers fail badly because they don't get the number of MSIs they need, then they need fixing.

I really don't see the point in papering over this at the lowest level of the stack.

M.
Hi, Marc,

On Sun, May 28, 2023 at 3:47 PM Marc Zyngier <maz@kernel.org> wrote:
> I've already made that point, but it seems that the argument is
> falling on deaf ears.

I'm very sorry that I didn't answer your question directly.

> Being able to allocate MSIs is not a guarantee, and is always
> opportunistic. If some drivers badly fail because they don't get
> the number of MSIs they need, then they need fixing.

Yes, I know allocating MSIs is not a guarantee, and most existing drivers will fall back to legacy IRQs when allocation fails. However, as I replied in an earlier mail, we want to do some proactive throttling in the loongson-pch-msi irqchip driver, rather than consume MSI vectors aggressively. For example, if we have two NICs, we want both of them to get 32 MSI vectors; not one exhausting all available vectors while the other falls back to a legacy IRQ. I hope I have explained this clearly, thanks.

Huacai
On Sun, May 28 2023 at 11:42, Huacai Chen wrote:
> I checked all msi_prepare() callbacks and none of them return positive
> values now, so I think it is correct.

Still you failed to tell so in the changelog. It's not helpful if you think it is correct. The point is that you have to make clear why it _IS_ correct.

Thanks,

        tglx
On Sun, May 28 2023 at 20:07, Huacai Chen wrote:
> Yes, I know allocating MSIs is not a guarantee, and most existing
> drivers will fallback to use legacy irqs when failed. However, as I
> replied in an early mail, we want to do some proactive throttling in
> the loongson-pch-msi irqchip driver, rather than consume msi vectors
> aggressively.

By default you allow up to 256 interrupts to be allocated, right? So to prevent vector exhaustion, the admin needs to reboot the machine and set a command line parameter to limit this, right? As that parameter is not documented the admin is going to dice a number. That's impractical and just a horrible bandaid.

Thanks,

        tglx
Hi, Thomas,

On Mon, May 29, 2023 at 3:44 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> Still you failed to tell so in the changelog. It's not helpful if you
> think it is correct. The point is that you have to make clear why it
> _IS_ correct.

OK, I think I should add more information in the next version, thanks.

Huacai
Hi, Thomas,

On Mon, May 29, 2023 at 5:27 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> By default you allow up to 256 interrupts to be allocated, right? So to
> prevent vector exhaustion, the admin needs to reboot the machine and set
> a command line parameter to limit this, right? As that parameter is not
> documented the admin is going to dice a number. That's impractical and
> just a horrible bandaid.

OK, I think I should update the documents in the new version.

Huacai
Huacai!

On Mon, May 29 2023 at 17:36, Huacai Chen wrote:
> OK, I think I should update the documents in the new version.

Updating documentation neither makes it more practical (it still requires a reboot) nor does it justify the abuse of the msi_prepare() callback.

The only reason why this hack "works" is that there is a historical mechanism which tells the PCI/MSI core that the number of requested vectors cannot be allocated, but that there would be $N vectors possible. But even that return value has no guarantee.

This mechanism is ill defined and really should go away.

Adding yet another way to limit this via msi_prepare() is just proliferating this ill defined mechanism and I have zero interest in that.

Let's take a step back and look at the larger picture:

1) A PCI/MSI irqdomain is attached to a PCI bus

2) The number of PCI devices on that PCI bus is usually known at boot
   time _before_ the first device driver is probed.

   That's not entirely true for PCI hotplug devices, but that's hardly
   relevant for an architecture which got designed less than 10 years
   ago, where the architects decided that 256 MSI vectors are good
   enough for up to 256 CPUs. The concept of per-CPU queues was already
   known at that time, no?

So the irqdomain can tell the PCI/MSI core the maximum number of vectors available for a particular bus, right?

The default, i.e. if the irqdomain does not expose that information, would be "unlimited", i.e. ULONG_MAX.

Now take that number and divide it by the number of devices on the bus and you get at least a sensible limit which does not immediately cause vector exhaustion.

That limit might be suboptimal if there are lots of other devices on that bus which just require one or two vectors, but that's something which can be optimized via a generic command line option or even a sysfs mechanism.

Thanks,

        tglx
Hi, Thomas,

On Tue, May 30, 2023 at 4:19 AM Thomas Gleixner <tglx@linutronix.de> wrote:
> 2) The number of PCI devices on that PCI bus is usually known at boot
>    time _before_ the first device driver is probed.

Does this solution depend on the per-device MSI domain? Can we do that if we use the global MSI domain?

> So the irqdomain can tell the PCI/MSI core the maximum number of vectors
> available for a particular bus, right?
>
> The default, i.e. if the irqdomain does not expose that information,
> would be "unlimited", i.e. ULONG_MAX.

OK, thanks, but how to expose it? By msi_domain_info::hwsize?

> Now take that number and divide it by the number of devices on the bus
> and you get at least a sensible limit which does not immediately cause
> vector exhaustion.
>
> That limit might be suboptimal if there are lots of other devices on
> that bus which just require one or two vectors, but that's something
> which can be optimized via a generic command line option or even a sysfs
> mechanism.

Hmm, if we still use the command line, then we still have some similar drawbacks.

Huacai
On Tue, May 30 2023 at 16:34, Huacai Chen wrote:
> Hmm, if we still use the command line, then we still have some similar
> drawbacks.

Only for optimization. Without the optimization the limit might end up being overbroad, but it definitely prevents vector exhaustion. For quite some cases this might even be the proper limit which does not need any further tweaking.

Thanks,

        tglx
On Tue, May 30 2023 at 16:34, Huacai Chen wrote:
> Does this solution depend on the per-device msi domain? Can we do that
> if we use the global msi domain?

In principle it should not depend on per-device MSI domains, but I really don't want to add new functionality to the old operating models as that does not create an incentive for people to convert their stuff over.

> OK, thanks, but how to expose? By msi_domain_info::hwsize?

Probably. Needs a proper helper around it.

Thanks,

        tglx
Hi, Thomas,

On Tue, May 30, 2023 at 11:03 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> In principle it should not depend on per-device MSI domains, but I
> really don't want to add new functionality to the old operating models
> as that does not create an incentive for people to convert their stuff
> over.

Thank you for your advice, but for our scenario its effect is no better than this patch (because not all devices are aggressive), so we give up. :)

And as Jason said in another thread, this problem can be solved by simply reducing the number of queues via ethtool. So let's drop this patch since it is not acceptable.

Huacai
On 2023/5/30 23:03, Thomas Gleixner wrote:
> In principle it should not depend on per-device MSI domains, but I
> really don't want to add new functionality to the old operating models
> as that does not create an incentive for people to convert their stuff
> over.
>
>> OK, thanks, but how to expose? By msi_domain_info::hwsize?
>
> Probably. Needs a proper helper around it.

This is not a common issue, so a command line parameter and a documentation explanation are not suitable here. Can we add a weak function like this?

int __weak arch_set_max_msix_vectors(void);

Regards
Bibo, mao
diff --git a/drivers/base/platform-msi.c b/drivers/base/platform-msi.c
index f37ad34c80ec..e4a517c144e7 100644
--- a/drivers/base/platform-msi.c
+++ b/drivers/base/platform-msi.c
@@ -298,7 +298,7 @@ __platform_msi_create_device_domain(struct device *dev,
 	platform_msi_set_proxy_dev(&data->arg);
 
 	err = msi_domain_prepare_irqs(domain->parent, dev, nvec, &data->arg);
-	if (err)
+	if (err < 0)
 		goto free_domain;
 
 	return domain;
diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 7a97bcb086bf..d151936aec05 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -1058,6 +1058,12 @@ bool msi_match_device_irq_domain(struct device *dev, unsigned int domid,
 	return ret;
 }
 
+/*
+ * Return Val:
+ * = 0: Success;
+ * > 0: The modified nvec;
+ * < 0: Error code.
+ */
 int msi_domain_prepare_irqs(struct irq_domain *domain, struct device *dev,
 			    int nvec, msi_alloc_info_t *arg)
 {
@@ -1260,8 +1266,10 @@ static int __msi_domain_alloc_irqs(struct device *dev, struct irq_domain *domain
 	int i, ret, virq;
 
 	ret = msi_domain_prepare_irqs(domain, dev, ctrl->nirqs, &arg);
-	if (ret)
+	if (ret < 0)
 		return ret;
+	if (ret > 0)
+		ctrl->nirqs = ret;
 
 	/*
 	 * This flag is set by the PCI layer as we need to activate