Message ID | 20231128204938.1453583-9-pasha.tatashin@soleen.com |
---|---|
State | New |
Headers |
From: Pasha Tatashin <pasha.tatashin@soleen.com>
To: akpm@linux-foundation.org, alex.williamson@redhat.com, alim.akhtar@samsung.com, alyssa@rosenzweig.io, asahi@lists.linux.dev, baolu.lu@linux.intel.com, bhelgaas@google.com, cgroups@vger.kernel.org, corbet@lwn.net, david@redhat.com, dwmw2@infradead.org, hannes@cmpxchg.org, heiko@sntech.de, iommu@lists.linux.dev, jasowang@redhat.com, jernej.skrabec@gmail.com, jgg@ziepe.ca, jonathanh@nvidia.com, joro@8bytes.org, kevin.tian@intel.com, krzysztof.kozlowski@linaro.org, kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-rockchip@lists.infradead.org, linux-samsung-soc@vger.kernel.org, linux-sunxi@lists.linux.dev, linux-tegra@vger.kernel.org, lizefan.x@bytedance.com, marcan@marcan.st, mhiramat@kernel.org, mst@redhat.com, m.szyprowski@samsung.com, netdev@vger.kernel.org, pasha.tatashin@soleen.com, paulmck@kernel.org, rdunlap@infradead.org, robin.murphy@arm.com, samuel@sholland.org, suravee.suthikulpanit@amd.com, sven@svenpeter.dev, thierry.reding@gmail.com, tj@kernel.org, tomas.mudrunka@gmail.com, vdumpa@nvidia.com, virtualization@lists.linux.dev, wens@csie.org, will@kernel.org, yu-cheng.yu@intel.com
Subject: [PATCH 08/16] iommu/fsl: use page allocation function provided by iommu-pages.h
Date: Tue, 28 Nov 2023 20:49:30 +0000
Message-ID: <20231128204938.1453583-9-pasha.tatashin@soleen.com>
X-Mailer: git-send-email 2.43.0.rc2.451.g8631bc7472-goog
In-Reply-To: <20231128204938.1453583-1-pasha.tatashin@soleen.com>
References: <20231128204938.1453583-1-pasha.tatashin@soleen.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org |
Series | IOMMU memory observability |
Commit Message
Pasha Tatashin
Nov. 28, 2023, 8:49 p.m. UTC
Convert iommu/fsl_pamu.c to use the new page allocation functions
provided in iommu-pages.h.
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
---
drivers/iommu/fsl_pamu.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
Comments
On 2023-11-28 8:49 pm, Pasha Tatashin wrote:
> Convert iommu/fsl_pamu.c to use the new page allocation functions
> provided in iommu-pages.h.

Again, this is not a pagetable. This thing doesn't even *have* pagetables.

Similar to patches #1 and #2 where you're lumping in configuration
tables which belong to the IOMMU driver itself, as opposed to pagetables
which effectively belong to an IOMMU domain's user. But then there are
still drivers where you're *not* accounting similar configuration
structures, so I really struggle to see how this metric is useful when
it's so completely inconsistent in what it's counting :/

Thanks,
Robin.

> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> ---
>  drivers/iommu/fsl_pamu.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/iommu/fsl_pamu.c b/drivers/iommu/fsl_pamu.c
> index f37d3b044131..7bfb49940f0c 100644
> --- a/drivers/iommu/fsl_pamu.c
> +++ b/drivers/iommu/fsl_pamu.c
> @@ -16,6 +16,7 @@
>  #include <linux/platform_device.h>
>
>  #include <asm/mpc85xx.h>
> +#include "iommu-pages.h"
>
>  /* define indexes for each operation mapping scenario */
>  #define OMI_QMAN 0x00
> @@ -828,7 +829,7 @@ static int fsl_pamu_probe(struct platform_device *pdev)
>  		(PAGE_SIZE << get_order(OMT_SIZE));
>  	order = get_order(mem_size);
>
> -	p = alloc_pages(GFP_KERNEL | __GFP_ZERO, order);
> +	p = __iommu_alloc_pages(GFP_KERNEL, order);
>  	if (!p) {
>  		dev_err(dev, "unable to allocate PAACT/SPAACT/OMT block\n");
>  		ret = -ENOMEM;
> @@ -916,7 +917,7 @@ static int fsl_pamu_probe(struct platform_device *pdev)
>  		iounmap(guts_regs);
>
>  	if (ppaact)
> -		free_pages((unsigned long)ppaact, order);
> +		iommu_free_pages(ppaact, order);
>
>  	ppaact = NULL;
>
On Tue, Nov 28, 2023 at 5:53 PM Robin Murphy <robin.murphy@arm.com> wrote:
>
> On 2023-11-28 8:49 pm, Pasha Tatashin wrote:
> > Convert iommu/fsl_pamu.c to use the new page allocation functions
> > provided in iommu-pages.h.
>
> Again, this is not a pagetable. This thing doesn't even *have* pagetables.
>
> Similar to patches #1 and #2 where you're lumping in configuration
> tables which belong to the IOMMU driver itself, as opposed to pagetables
> which effectively belong to an IOMMU domain's user. But then there are
> still drivers where you're *not* accounting similar configuration
> structures, so I really struggle to see how this metric is useful when
> it's so completely inconsistent in what it's counting :/

The whole IOMMU subsystem allocates a significant amount of kernel
locked memory that we want to at least observe. The new field in
vmstat does just that: it reports ALL buddy allocator memory that
IOMMU allocates. However, for accounting purposes, I agree, we need to
do better, and separate at least iommu pagetables from the rest.

We can separate the metric into two:
iommu pagetable only
iommu everything

or into three:
iommu pagetable only
iommu dma
iommu everything

What do you think?

Pasha

>
> Thanks,
> Robin.
>
> > Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> > ---
> >  drivers/iommu/fsl_pamu.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/iommu/fsl_pamu.c b/drivers/iommu/fsl_pamu.c
> > index f37d3b044131..7bfb49940f0c 100644
> > --- a/drivers/iommu/fsl_pamu.c
> > +++ b/drivers/iommu/fsl_pamu.c
> > @@ -16,6 +16,7 @@
> >  #include <linux/platform_device.h>
> >
> >  #include <asm/mpc85xx.h>
> > +#include "iommu-pages.h"
> >
> >  /* define indexes for each operation mapping scenario */
> >  #define OMI_QMAN 0x00
> > @@ -828,7 +829,7 @@ static int fsl_pamu_probe(struct platform_device *pdev)
> >  		(PAGE_SIZE << get_order(OMT_SIZE));
> >  	order = get_order(mem_size);
> >
> > -	p = alloc_pages(GFP_KERNEL | __GFP_ZERO, order);
> > +	p = __iommu_alloc_pages(GFP_KERNEL, order);
> >  	if (!p) {
> >  		dev_err(dev, "unable to allocate PAACT/SPAACT/OMT block\n");
> >  		ret = -ENOMEM;
> > @@ -916,7 +917,7 @@ static int fsl_pamu_probe(struct platform_device *pdev)
> >  		iounmap(guts_regs);
> >
> >  	if (ppaact)
> > -		free_pages((unsigned long)ppaact, order);
> > +		iommu_free_pages(ppaact, order);
> >
> >  	ppaact = NULL;
> >
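The two-way or three-way split proposed in this message can be pictured with a small userspace model. This is only a sketch under stated assumptions: the enum, the counter array, and the helper names are hypothetical and do not come from the series. Deriving "iommu everything" as the sum of the buckets, with an extra "other" bucket for driver-owned tables, is one possible reading of the proposal, not what the patches do.

```c
#include <assert.h>

/* Hypothetical categories mirroring the three-way split proposed above. */
enum sketch_iommu_cat {
	SKETCH_IOMMU_PAGETABLE,	/* pagetables owned by a domain's user */
	SKETCH_IOMMU_DMA,	/* DMA-related allocations */
	SKETCH_IOMMU_OTHER,	/* driver-owned configuration tables */
	SKETCH_IOMMU_NR_CATS
};

/* One page counter per category; a stand-in for vmstat-style fields. */
static long sketch_pages[SKETCH_IOMMU_NR_CATS];

/* Add nr_pages to one category; a negative nr_pages records a free. */
static void sketch_account(enum sketch_iommu_cat cat, long nr_pages)
{
	assert((int)cat >= 0 && (int)cat < SKETCH_IOMMU_NR_CATS);
	sketch_pages[cat] += nr_pages;
	assert(sketch_pages[cat] >= 0);
}

/* "iommu everything" is derived here as the sum of all categories. */
static long sketch_total(void)
{
	long total = 0;

	for (int i = 0; i < SKETCH_IOMMU_NR_CATS; i++)
		total += sketch_pages[i];
	return total;
}
```

In this model the FSL PAMU tables touched by the patch would land in the "other" bucket, which is the distinction Robin is asking for.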
On Tue, Nov 28, 2023 at 06:00:13PM -0500, Pasha Tatashin wrote:
> On Tue, Nov 28, 2023 at 5:53 PM Robin Murphy <robin.murphy@arm.com> wrote:
> >
> > On 2023-11-28 8:49 pm, Pasha Tatashin wrote:
> > > Convert iommu/fsl_pamu.c to use the new page allocation functions
> > > provided in iommu-pages.h.
> >
> > Again, this is not a pagetable. This thing doesn't even *have* pagetables.
> >
> > Similar to patches #1 and #2 where you're lumping in configuration
> > tables which belong to the IOMMU driver itself, as opposed to pagetables
> > which effectively belong to an IOMMU domain's user. But then there are
> > still drivers where you're *not* accounting similar configuration
> > structures, so I really struggle to see how this metric is useful when
> > it's so completely inconsistent in what it's counting :/
>
> The whole IOMMU subsystem allocates a significant amount of kernel
> locked memory that we want to at least observe. The new field in
> vmstat does just that: it reports ALL buddy allocator memory that
> IOMMU allocates. However, for accounting purposes, I agree, we need to
> do better, and separate at least iommu pagetables from the rest.
>
> We can separate the metric into two:
> iommu pagetable only
> iommu everything
>
> or into three:
> iommu pagetable only
> iommu dma
> iommu everything
>
> What do you think?

I think I said this at LPC - if you want to have fine grained
accounting of memory by owner you need to go talk to the cgroup people
and come up with something generic. Adding ever open coded finer
category breakdowns just for iommu doesn't make a lot of sense.

You can make some argument that the pagetable memory should be counted
because kvm counts its shadow memory, but I wouldn't go into further
detail than that with hand coded counters..

Jason
On 28/11/2023 11:50 pm, Jason Gunthorpe wrote:
> On Tue, Nov 28, 2023 at 06:00:13PM -0500, Pasha Tatashin wrote:
>> On Tue, Nov 28, 2023 at 5:53 PM Robin Murphy <robin.murphy@arm.com> wrote:
>>>
>>> On 2023-11-28 8:49 pm, Pasha Tatashin wrote:
>>>> Convert iommu/fsl_pamu.c to use the new page allocation functions
>>>> provided in iommu-pages.h.
>>>
>>> Again, this is not a pagetable. This thing doesn't even *have* pagetables.
>>>
>>> Similar to patches #1 and #2 where you're lumping in configuration
>>> tables which belong to the IOMMU driver itself, as opposed to pagetables
>>> which effectively belong to an IOMMU domain's user. But then there are
>>> still drivers where you're *not* accounting similar configuration
>>> structures, so I really struggle to see how this metric is useful when
>>> it's so completely inconsistent in what it's counting :/
>>
>> The whole IOMMU subsystem allocates a significant amount of kernel
>> locked memory that we want to at least observe. The new field in
>> vmstat does just that: it reports ALL buddy allocator memory that
>> IOMMU allocates. However, for accounting purposes, I agree, we need to
>> do better, and separate at least iommu pagetables from the rest.
>>
>> We can separate the metric into two:
>> iommu pagetable only
>> iommu everything
>>
>> or into three:
>> iommu pagetable only
>> iommu dma
>> iommu everything
>>
>> What do you think?
>
> I think I said this at LPC - if you want to have fine grained
> accounting of memory by owner you need to go talk to the cgroup people
> and come up with something generic. Adding ever open coded finer
> category breakdowns just for iommu doesn't make a lot of sense.
>
> You can make some argument that the pagetable memory should be counted
> because kvm counts its shadow memory, but I wouldn't go into further
> detail than that with hand coded counters..

Right, pagetable memory is interesting since it's something that any
random kernel user can indirectly allocate via iommu_domain_alloc() and
iommu_map(), and some of those users may even be doing so on behalf of
userspace. I have no objection to accounting and potentially applying
limits to *that*.

Beyond that, though, there is nothing special about "the IOMMU
subsystem". The amount of memory an IOMMU driver needs to allocate for
itself in order to function is not of interest beyond curiosity, it just
is what it is; limiting it would only break the IOMMU, and if a user
thinks it's "too much", the only actionable thing that might help is to
physically remove devices from the system. Similar for DMA buffers; it
might be intriguing to account those, but it's not really an actionable
metric - in the overwhelming majority of cases you can't simply tell a
driver to allocate less than what it needs. And that is of course
assuming if we were to account *all* DMA buffers, since whether they
happen to have an IOMMU translation or not is irrelevant (we'd have
already accounted the pagetables as pagetables if so).

I bet "the networking subsystem" also consumes significant memory on the
same kind of big systems where IOMMU pagetables would be of any concern.
I believe some of the "serious" NICs can easily run up
hundreds of megabytes if not gigabytes worth of queues, SKB pools, etc.
- would you propose accounting those too?

Thanks,
Robin.
> >> We can separate the metric into two:
> >> iommu pagetable only
> >> iommu everything
> >>
> >> or into three:
> >> iommu pagetable only
> >> iommu dma
> >> iommu everything
> >>
> >> What do you think?
> >
> > I think I said this at LPC - if you want to have fine grained
> > accounting of memory by owner you need to go talk to the cgroup people
> > and come up with something generic. Adding ever open coded finer
> > category breakdowns just for iommu doesn't make a lot of sense.
> >
> > You can make some argument that the pagetable memory should be counted
> > because kvm counts its shadow memory, but I wouldn't go into further
> > detail than that with hand coded counters..
>
> Right, pagetable memory is interesting since it's something that any
> random kernel user can indirectly allocate via iommu_domain_alloc() and
> iommu_map(), and some of those users may even be doing so on behalf of
> userspace. I have no objection to accounting and potentially applying
> limits to *that*.

Yes, in the next version, I will separate pagetable-only memory from the
rest, for the limits.

> Beyond that, though, there is nothing special about "the IOMMU
> subsystem". The amount of memory an IOMMU driver needs to allocate for
> itself in order to function is not of interest beyond curiosity, it just
> is what it is; limiting it would only break the IOMMU, and if a user

Agreed about the amount of memory the IOMMU allocates for itself, but that
should be small; if it is not, we have to at least show where the memory
is used.

> thinks it's "too much", the only actionable thing that might help is to
> physically remove devices from the system. Similar for DMA buffers; it
> might be intriguing to account those, but it's not really an actionable
> metric - in the overwhelming majority of cases you can't simply tell a
> driver to allocate less than what it needs. And that is of course
> assuming if we were to account *all* DMA buffers, since whether they
> happen to have an IOMMU translation or not is irrelevant (we'd have
> already accounted the pagetables as pagetables if so).

DMA mappings should be observable (they do not have to be limited). At the
very least, that can help explain kernel memory overhead anomalies on
production systems.

> I bet "the networking subsystem" also consumes significant memory on the

It does, and GPU drivers may also consume a significant amount of memory.

> same kind of big systems where IOMMU pagetables would be of any concern.
> I believe some of the "serious" NICs can easily run up
> hundreds of megabytes if not gigabytes worth of queues, SKB pools, etc.
> - would you propose accounting those too?

Yes. Any kind of kernel memory that is proportional to the workload
should be accountable. Someone is using those resources compared to
the idling system, and that someone should be charged.

Pasha
On Wed, Nov 29, 2023 at 02:45:03PM -0500, Pasha Tatashin wrote:
> > same kind of big systems where IOMMU pagetables would be of any concern.
> > I believe some of the "serious" NICs can easily run up
> > hundreds of megabytes if not gigabytes worth of queues, SKB pools, etc.
> > - would you propose accounting those too?
>
> Yes. Any kind of kernel memory that is proportional to the workload
> should be accountable. Someone is using those resources compared to
> the idling system, and that someone should be charged.

There is a difference between charged and accounted.

You should be running around adding GFP_KERNEL_ACCOUNT, yes. I already
did a bunch of that work. Split that out from this series and send it
to the right maintainers.

Adding a counter for allocations and showing in procfs is a very
different question. IMHO that should not be done in micro, the
threshold to add a new counter should be high.

There is definitely room for a generic debugging feature to break down
GFP_KERNEL_ACCOUNT by ownership somehow. Maybe it can already be done
with BPF. IDK

Jason
On Wed, Nov 29, 2023 at 3:03 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Wed, Nov 29, 2023 at 02:45:03PM -0500, Pasha Tatashin wrote:
>
> > > same kind of big systems where IOMMU pagetables would be of any concern.
> > > I believe some of the "serious" NICs can easily run up
> > > hundreds of megabytes if not gigabytes worth of queues, SKB pools, etc.
> > > - would you propose accounting those too?
> >
> > Yes. Any kind of kernel memory that is proportional to the workload
> > should be accountable. Someone is using those resources compared to
> > the idling system, and that someone should be charged.
>
> There is a difference between charged and accounted.
>
> You should be running around adding GFP_KERNEL_ACCOUNT, yes. I already
> did a bunch of that work. Split that out from this series and send it
> to the right maintainers.

I will do that.

>
> Adding a counter for allocations and showing in procfs is a very
> different question. IMHO that should not be done in micro, the
> threshold to add a new counter should be high.

I agree, /proc/meminfo should not include everything. However, overall
network consumption that includes memory allocated by network drivers
would be useful to have; maybe it should be exported by device
drivers and added to the protocol memory. We already have network
protocol memory consumption in procfs:

# awk '{printf "%-10s %s\n", $1, $4}' /proc/net/protocols | grep -v '\-1'
protocol   memory
UDPv6      22673
TCPv6      16961

> There is definitely room for a generic debugging feature to break down
> GFP_KERNEL_ACCOUNT by ownership somehow. Maybe it can already be done
> with BPF. IDK
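For readers who would rather consume the same data from a program than from awk, the following is a minimal userspace sketch of the per-line extraction the command above performs: take column 1 (protocol) and column 4 (memory) and skip rows whose memory value is negative. The function name and the 32-byte name buffer are hypothetical. It assumes whitespace-separated columns as in the awk example, and unlike the awk pipeline it also drops the header row, because the fourth column there is not a number.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Parse one /proc/net/protocols-style line.
 * On success, stores the protocol name (column 1) and the memory value
 * (column 4) and returns 1. Returns 0 for the header row, for malformed
 * lines, and for rows that report a negative memory value such as -1.
 */
static int sketch_parse_protocol_line(const char *line, char name[32],
				      long *memory)
{
	long mem;

	assert(line != NULL && name != NULL && memory != NULL);

	/* %*s skips columns 2 and 3 (size, sockets) without storing them. */
	if (sscanf(line, "%31s %*s %*s %ld", name, &mem) != 2)
		return 0;
	if (mem < 0)
		return 0;

	*memory = mem;
	return 1;
}
```

A caller would typically read /proc/net/protocols with fgets() and pass each line to this helper, summing or printing the values it returns.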
diff --git a/drivers/iommu/fsl_pamu.c b/drivers/iommu/fsl_pamu.c
index f37d3b044131..7bfb49940f0c 100644
--- a/drivers/iommu/fsl_pamu.c
+++ b/drivers/iommu/fsl_pamu.c
@@ -16,6 +16,7 @@
 #include <linux/platform_device.h>

 #include <asm/mpc85xx.h>
+#include "iommu-pages.h"

 /* define indexes for each operation mapping scenario */
 #define OMI_QMAN 0x00
@@ -828,7 +829,7 @@ static int fsl_pamu_probe(struct platform_device *pdev)
 		(PAGE_SIZE << get_order(OMT_SIZE));
 	order = get_order(mem_size);

-	p = alloc_pages(GFP_KERNEL | __GFP_ZERO, order);
+	p = __iommu_alloc_pages(GFP_KERNEL, order);
 	if (!p) {
 		dev_err(dev, "unable to allocate PAACT/SPAACT/OMT block\n");
 		ret = -ENOMEM;
@@ -916,7 +917,7 @@ static int fsl_pamu_probe(struct platform_device *pdev)
 		iounmap(guts_regs);

 	if (ppaact)
-		free_pages((unsigned long)ppaact, order);
+		iommu_free_pages(ppaact, order);

 	ppaact = NULL;
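The diff replaces alloc_pages(GFP_KERNEL | __GFP_ZERO, order) with __iommu_alloc_pages(GFP_KERNEL, order) and free_pages() with iommu_free_pages(). The contents of iommu-pages.h are not shown on this page, so the following is only a userspace model of what an accounting wrapper of this kind might look like. Every name prefixed with sketch_ is hypothetical, the 4096-byte page size is an assumption, and the zeroing behaviour is inferred solely from the fact that the patch drops __GFP_ZERO at the call site.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed page size for this model */

/* Hypothetical counter standing in for a vmstat-style field, in pages. */
static long sketch_iommu_pages;

/* Smallest order such that (SKETCH_PAGE_SIZE << order) >= size. */
static unsigned int sketch_get_order(size_t size)
{
	unsigned int order = 0;
	size_t pages;

	assert(size > 0);
	pages = (size + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
	while (((size_t)1 << order) < pages)
		order++;
	return order;
}

/* Allocate 2^order zeroed pages and add them to the counter. */
static void *sketch_iommu_alloc_pages(unsigned int order)
{
	void *p = calloc((size_t)1 << order, SKETCH_PAGE_SIZE);

	if (!p)
		return NULL;
	sketch_iommu_pages += 1L << order;
	return p;
}

/* Free a block from sketch_iommu_alloc_pages() and undo its accounting. */
static void sketch_iommu_free_pages(void *p, unsigned int order)
{
	if (!p)
		return;
	sketch_iommu_pages -= 1L << order;
	free(p);
}
```

The point of the model is that every allocation routed through the wrapper is counted regardless of what it holds, which is why the PAACT/SPAACT/OMT block in this driver would show up in the same counter as real pagetables.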