Message ID | 1680324300-124563-1-git-send-email-mikelley@microsoft.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp1020689vqo; Fri, 31 Mar 2023 21:57:43 -0700 (PDT) X-Google-Smtp-Source: AK7set/+RBa9kJdrnQZ2Ou7qh09V80eGV4gM10ryrGZOnwikZ+tuUSjAcK1fvK/kMDcS3DmeAr5x X-Received: by 2002:a05:6a20:4f88:b0:d3:78ab:77c3 with SMTP id gh8-20020a056a204f8800b000d378ab77c3mr26191387pzb.48.1680325063597; Fri, 31 Mar 2023 21:57:43 -0700 (PDT) ARC-Seal: i=2; a=rsa-sha256; t=1680325063; cv=pass; d=google.com; s=arc-20160816; b=TscpS3WJwiM8AahVTKdm9bV77Bd61PdznKF0OQVUb3xLq9qJIE65EjwZHmx6Qrunos cRkbrgsFJTS50+UZ38KbhqBH7SCXl2wgAPKq0syi36OktoJYWYbhImx+gt3jGUu80ucp uOWgFQ0AJTCFyx3+S+qIvhSEsCCyHLiNyqYY/SqRbAX/Nn+TiintpxNXHgYGDkU8fJv5 i7RvUg7pH04UaJdFEmTTy6aS43/7Ksl3C1vX9PYHn3LOKn6B5ZY9gqfvJw3kMr+THUHs E1YU96FHqldWNl12mlr/vCVnLqG92PSnGD7aYWR/emvs2Rbl+YdCn6rcKSzPbI6ReBS1 Rbbw== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:mime-version:message-id:date:subject:cc:to:from :dkim-signature; bh=BzrQNY7r01O2nau9qg2MZq4BUNEK5Re1t9CsW7YHzyA=; b=y9Ofej4aZoegJ0wQjQkrSZnu9UbptyZ2m2MGsM68gDxAG0MgSU9V1DCSyCpGeliFZ7 3G+sbUw5MNq7983Z4nVwlup9TuhYgkt7epyVruDfHT8LpS8MozI3k56zlVzVTxZnJeiY 19c7kk9NViYydxvG9VIBBxTvDiCExsvIaTU+wa3LAYOLyuEPlHH8LvyoBfLqO+IrR3WJ lbgnU44YuzpCvqUKi34Khb2EKVBonNl6rOQzDyM2uahZPVlgknx9w2WQcWwvUDXOSpLI TGESy1aNQJXCIIcbc+W+sdQzzwJh2BN8F6lEdiZN4bif9z7tBfykQYH7mMWynJpcc4S6 DAJA== ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@microsoft.com header.s=selector2 header.b=BbyJjiS7; arc=pass (i=1 spf=pass spfdomain=microsoft.com dkim=pass dkdomain=microsoft.com dmarc=pass fromdomain=microsoft.com); spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=microsoft.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id w18-20020a63af12000000b00502f49f43easi4123927pge.338.2023.03.31.21.57.31; Fri, 31 Mar 2023 21:57:43 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@microsoft.com header.s=selector2 header.b=BbyJjiS7; arc=pass (i=1 spf=pass spfdomain=microsoft.com dkim=pass dkdomain=microsoft.com dmarc=pass fromdomain=microsoft.com); spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=microsoft.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233518AbjDAEqN (ORCPT <rfc822;dexuan.linux@gmail.com> + 99 others); Sat, 1 Apr 2023 00:46:13 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53918 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229850AbjDAEqL (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Sat, 1 Apr 2023 00:46:11 -0400 Received: from BN6PR00CU002.outbound.protection.outlook.com (mail-eastus2azon11021014.outbound.protection.outlook.com [52.101.57.14]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1CDC7FF03 for <linux-kernel@vger.kernel.org>; Fri, 31 Mar 2023 21:46:10 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=Lf6uvZpZuQQuAqBDPyuw7Dq86QtDQt7dRiSd+PT0inRl5UdXTejVWRaBLEGprN3Iho9NB5GHHUpP7uK2tHZ23YEPF59Z2CyxTsk5otFgIH0rJLxYL/nYcvYTnu8Q2cmCiKJJE035W83jULg4gC59TiomF5lZBUivGaUalMJAwtaaMP620G9i699J8bZkPdiZKcf+fqWqiUBlv5g0gQTlCbPlxF08UEDEFeCciiF2IAkT52xlBURXWj3k05iw0ZHX0NKYlBCve4dgvx+tPME1eBOnPTCsi7/9FlvhDgZQuPeo4yZxE6YHQAMbNnOmx0btSKnvshRjrllNbUyzYb9XMg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=BzrQNY7r01O2nau9qg2MZq4BUNEK5Re1t9CsW7YHzyA=; b=LKArXlylTV8qJVhuMPva9vfMPcyJe3fy4wWmG+BwM5D457FPkoIbKkByQgSX936PeTgL69fSJ2kx4aTI/KlNIJDSI1ZEYtGpUE6wo0oRXZLHYeRc2/uSixoqPSEA68eXT6ky5XtziYC0QXQ7Agz/BXC1QgC/KIRiEaS/7UyovT1RUau99+PbLOQE9b/1GSSAkVp/X/NXrT6bqdZ57Hg9IX4CtOVsXybrxZWB6SYCZEZdvbKo0YfptUS03vj9ere6LnTxGUASOoatb+WIEsp3yxMTZv3lsPlkXFFXntEdJNeaKS76txg6QP2A7beNID13a1gCoxkT/kLfClQGwZXnWA== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=microsoft.com; dmarc=pass action=none header.from=microsoft.com; dkim=pass header.d=microsoft.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=BzrQNY7r01O2nau9qg2MZq4BUNEK5Re1t9CsW7YHzyA=; b=BbyJjiS7NEQI4vprCXfv0R7sO9Ji47zYKikCyo6EiT9iEcjVFuS9OaDk2XD2NkVvv+ii+zBD7O2TFmKh58RJADACpd/+jAXohykAKbu2BjXl9PPobFtWZ3f8q/NjN+FSBd/KL72G1TqOz+KuP1G0+4yc4ZQynU0x6SvFbxEIz/s= Authentication-Results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=microsoft.com; Received: from DM6PR21MB1370.namprd21.prod.outlook.com (2603:10b6:5:16b::28) by IA1PR21MB3594.namprd21.prod.outlook.com (2603:10b6:208:3e1::15) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6277.14; Sat, 1 Apr 2023 04:46:07 +0000 Received: from DM6PR21MB1370.namprd21.prod.outlook.com ([fe80::b7e9:4da1:3c23:35f]) by DM6PR21MB1370.namprd21.prod.outlook.com ([fe80::b7e9:4da1:3c23:35f%4]) with mapi id 15.20.6277.013; Sat, 1 Apr 2023 04:46:07 +0000 From: Michael Kelley <mikelley@microsoft.com> To: hch@lst.de, m.szyprowski@samsung.com, robin.murphy@arm.com, decui@microsoft.com, tiala@microsoft.com, iommu@lists.linux.dev, linux-kernel@vger.kernel.org Cc: mikelley@microsoft.com Subject: [PATCH v3 1/1] swiotlb: Track and report io_tlb_used high water mark in debugfs Date: Fri, 31 Mar 2023 21:45:00 -0700 Message-Id: <1680324300-124563-1-git-send-email-mikelley@microsoft.com> X-Mailer: git-send-email 1.8.3.1 Content-Type: text/plain X-ClientProxiedBy: MW4PR04CA0120.namprd04.prod.outlook.com (2603:10b6:303:83::35) To DM6PR21MB1370.namprd21.prod.outlook.com (2603:10b6:5:16b::28) MIME-Version: 1.0 X-MS-Exchange-MessageSentRepresentingType: 1 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: DM6PR21MB1370:EE_|IA1PR21MB3594:EE_ X-MS-Office365-Filtering-Correlation-Id: f0057b43-25d6-4505-8fe5-08db326c01ee X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: rF165aQN+L2szMWmsXIW9CqlJJzHZb6/b+NwzyiY4VCUdf7TgrQI6r/EjbDaj0ECFxHSzBBKoOOD/BbpuNBdW/4X6WX6CVVvE70tKLhZlmh6Dt+aY1yw1pMgMKINMLaANF4RBNP9KbZjaaNYPArB+NuKKXCaZqv8FmSmqBwGzrHI+iduMjHOWtyt2repCbhmXoFzB8hlDXcml1EHJCzuNU/ibUZw8qgXpRqFyIjM9N+BNvjPDczLG2baUi9Nv9VeQD1ti3JtROchWr6Adf2lLrhBADve/t4sW6hrAyLkb1eIgpPzQv00A+YZi2hjCA4WGGiUABCNipeaAPYqJdIZvVV0nb0WM1bKu8ixhkBMD2a9cruBQ6JvRDP5kzccx71JQxYiexD4qtQPJCl3SLdoPalthZ15GI5rQKbA4n/K2lz+A8g2UOhFfbPNeTuxVyUJ0LTv1cH13AqgHhugoUtb+Pnd+YaBrtXB5X3s80S5UD77V8N1StNz+oWvof6ZqlFOytTkLhAwJ7SJzWcW1OOO/CRNadveMMxjFJs5s7883qufIyar7tD02I0PmaXQii7MzjiWUZjYQM1XI8GEkrJEUPHp1nX03SqST8qGnLRODra03L/Ju7tnAXjaFhSrhZPqmrygz34iXIZ3h+x4INVlEmq3McjgvRrM5LA9ICymTu8= X-Forefront-Antispam-Report: CIP:255.255.255.255;CTRY:;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:DM6PR21MB1370.namprd21.prod.outlook.com;PTR:;CAT:NONE;SFS:(13230028)(4636009)(346002)(366004)(136003)(376002)(39860400002)(396003)(451199021)(4326008)(8676002)(66556008)(316002)(786003)(66946007)(66476007)(6486002)(36756003)(6512007)(6506007)(107886003)(6666004)(26005)(38350700002)(2616005)(186003)(38100700002)(83380400001)(10290500003)(41300700001)(8936002)(478600001)(52116002)(5660300002)(86362001)(2906002)(82960400001)(82950400001);DIR:OUT;SFP:1102; X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1 X-MS-Exchange-AntiSpam-MessageData-0: erhg+B4BmizUzpUaAZ+F1TkqCgEsv/gppqF7EA68jldO5afqDTvAxVYRACSDBcunlWlxv+oCtxUMrbM8OPnZf/hP5x+G79271wUWfnKCWoBOp0LB0TLJ23BZ95NWuaVXfbkpCHVuQIIWjPSUNhRqiF/MNQOTQ8ImaJ+OYud4+dyjk+ZkShzfDeHlV4G3NauYa6RJdN4WNl6EZGNoNTlxn58+KcWb95oqIgYGIE5qveogQR0/okXtdYjupWBVy+U7UelDklj1FuiqxWJPwsyNeL/rdydcBIJlyiLvhFWyo6/5mnZoFy6u/Tg8ydojxy6GLYkPw9md0gRaBG3GqFj/DtKWJqHeh+8ejCiP0+xeSWawAubuDLpIsf6q88oRbZLxLfdEbIrN6HvDPbnwUWA4lrkUMJcBH3BTIVnMTMyO/EnuTDb0cKMRTLiDNA+DXmciDIPr6+7AuXHR3qCPh3kQyuI8cT9p/6ikDq6YZdUC2bv8AQKGIG+qN1YMH8gIJXNsAOkgjf3JQ14eZ9dyBTXSP6cH3Mh/x3yCSTbRWN6YLKRU60r/0wzOmYJ/Uwwxk97ie8YDx+okSGZ6oRpvPhemNEhXS15cQVNOXMzsQJJts44pasF+7PCurdKsLRs7btJKm+5JXqeJZyBxBddbjFH6nGGDCkyLMF6UW4ge1ZC34GAHkmNjBesPzFzRhVJinbzUoxNQQiuu2kBN07k9rxvd+PMDN5bSh6jYijuSPMoLACBDhsCtciiP8Wn9TQDfXrAvnn70mo5TywamGPcP5JS2/Auh5/sSvmP2Rnjbt2lKyM3mPHqzYWRYM4X3VQuPqSJJW9bOQtv3on8mUorQ/P//UqclMdEx5rIxaYt0F7Mpo+u0krCe9d9Z6B0rrMRu/6abQ5mSRfzILpbKEI7pYBciklu4ZQK2nACA0y4EXWfRRmxR8KqO6s9JyBKRIR1FHKfcL5jTvX3XQYxkL8heTAkO1XvFQbpaTZUpRcJCy/0LvCHdrWqH0xIuObsN23Ni9ZPwlPASNQgFAx9MewO6NCWXnVe+W0Mrh6WtJoiC79qbBBkVwpo+DPLWTCZRqaeMn+rM2VDKdqyiZutD2MO6vOIk3sk/LiTcDvDr0WKpKzPOxmp5EzJCOQbKrffiOYDiH/B/5lTsMm+0OJGAIlXMFzb0NeB8+ny3xsTI4BoejJTqHz1Ra7afK/TBcKZKmgKIj2gKcnw698g9gjcAv2iAexW8CNWfhRuBf5axbWKedFxknlljnRfkEAbe5RJvM+uOJIoQoWuk3CAu3CzzI1D2Osx/D/d3vhegCBOBHNNjwp+NLL/oPrvLYDhF+dxtzyodAVugrbjk1IP2pgWT1yPXmuubbILsFJpmZeTPNk1KOBPVMoxykUpxduv5BJpmowlIzeJG4OYQz+fZ08wJG7wOT6Iybn8tHWUSZBToUi872tyY70l3aBIlGCS8SLdqdPTJdLFR/PNESGJSUMJphMeJc1C6yxtkYucjqhLmQU8gpgcwSKqZlqX+trU6uyxIjFIq6vQF0mz0FNMunC9igy5QqANDKhbs1yjEEGw3t5P2Pb+xx1fhZneM08JPKViu7yd0Qsqt X-OriginatorOrg: microsoft.com X-MS-Exchange-CrossTenant-Network-Message-Id: f0057b43-25d6-4505-8fe5-08db326c01ee X-MS-Exchange-CrossTenant-AuthSource: DM6PR21MB1370.namprd21.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-OriginalArrivalTime: 01 Apr 2023 04:46:07.4842 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-CrossTenant-Id: 72f988bf-86f1-41af-91ab-2d7cd011db47 X-MS-Exchange-CrossTenant-MailboxType: HOSTED X-MS-Exchange-CrossTenant-UserPrincipalName: RrTKPPCdWn6cdWmBSwC065UKyPnqw4FdoEA+hno8InkXbOZLN3Ngs98gNQ5Fw4LJVouMYj4O3aafPPU7cKOdzQ== X-MS-Exchange-Transport-CrossTenantHeadersStamped: IA1PR21MB3594 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761948533468930171?= X-GMAIL-MSGID: =?utf-8?q?1761948533468930171?= |
Series |
[v3,1/1] swiotlb: Track and report io_tlb_used high water mark in debugfs
|
|
Commit Message
Michael Kelley (LINUX)
April 1, 2023, 4:45 a.m. UTC
swiotlb currently reports the total number of slabs and the instantaneous
in-use slabs in debugfs. But with increased usage of swiotlb for all I/O
in Confidential Computing (coco) VMs, it has become difficult to know
how much memory to allocate for swiotlb bounce buffers, either via the
automatic algorithm in the kernel or by specifying a value on the
kernel boot line. The current automatic algorithm generously allocates
swiotlb bounce buffer memory, and may be wasting significant memory in
many use cases.
To support better understanding of swiotlb usage, add tracking of the
the high water mark usage of the default swiotlb bounce buffer memory
pool. Report the high water mark in debugfs along with the other swiotlb
metrics. Allow the high water mark to be reset to zero at runtime by
writing to it.
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
---
Changes in v3:
* Do high water mark accounting only when CONFIG_DEBUG_FS=y. As
as a result, add back the mem_used() function for the "swiotlb
buffer is full" error message. [Christoph -- I didn't hear back
whether this approach addresses your concern about one additional
atomic operation when slots are allocated and again when freed. I've
gone ahead with this new version, and we can obviously have further
discussion.]
* Remove unnecessary u64 casts. [Christoph Hellwig]
* Track slot usage and the high water mark only for io_tlb_default_mem.
Previous versions incorrectly included per-device pools. [Petr Tesarik]
Changes in v2:
* Only reset the high water mark to zero when the specified new value
is zero, to prevent confusion about the ability to reset to some
other value [Dexuan Cui]
kernel/dma/swiotlb.c | 41 ++++++++++++++++++++++++++++++++++++++++-
1 file changed, 40 insertions(+), 1 deletion(-)
Comments
On Fri, Mar 31, 2023 at 09:45:00PM -0700, Michael Kelley wrote: > Changes in v3: > * Do high water mark accounting only when CONFIG_DEBUG_FS=y. As > as a result, add back the mem_used() function for the "swiotlb > buffer is full" error message. [Christoph -- I didn't hear back > whether this approach addresses your concern about one additional > atomic operation when slots are allocated and again when freed. I've > gone ahead with this new version, and we can obviously have further > discussion.] Still not too happy, but at least debugfs is an interfact we could remove at any time. But can you please factor the used_hiwater accounting into two separate helpers that are udner CONFIG_DEBUG_FS and otherwise stubbed out, instead of adding the logic directly into swiotlb_do_find_slots and swiotlb_release_slots?
On Fri, 31 Mar 2023 21:45:00 -0700 Michael Kelley <mikelley@microsoft.com> wrote: > swiotlb currently reports the total number of slabs and the instantaneous > in-use slabs in debugfs. But with increased usage of swiotlb for all I/O > in Confidential Computing (coco) VMs, it has become difficult to know > how much memory to allocate for swiotlb bounce buffers, either via the > automatic algorithm in the kernel or by specifying a value on the > kernel boot line. The current automatic algorithm generously allocates > swiotlb bounce buffer memory, and may be wasting significant memory in > many use cases. > > To support better understanding of swiotlb usage, add tracking of the > the high water mark usage of the default swiotlb bounce buffer memory > pool. Report the high water mark in debugfs along with the other swiotlb > metrics. Allow the high water mark to be reset to zero at runtime by > writing to it. > > Signed-off-by: Michael Kelley <mikelley@microsoft.com> > --- > Changes in v3: > * Do high water mark accounting only when CONFIG_DEBUG_FS=y. As > as a result, add back the mem_used() function for the "swiotlb > buffer is full" error message. [Christoph -- I didn't hear back > whether this approach addresses your concern about one additional > atomic operation when slots are allocated and again when freed. I've > gone ahead with this new version, and we can obviously have further > discussion.] > > * Remove unnecessary u64 casts. [Christoph Hellwig] > > * Track slot usage and the high water mark only for io_tlb_default_mem. > Previous versions incorrectly included per-device pools. [Petr Tesarik] > > Changes in v2: > * Only reset the high water mark to zero when the specified new value > is zero, to prevent confusion about the ability to reset to some > other value [Dexuan Cui] > > kernel/dma/swiotlb.c | 41 ++++++++++++++++++++++++++++++++++++++++- > 1 file changed, 40 insertions(+), 1 deletion(-) > > diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c > index d3d6be0..6587a3d 100644 > --- a/kernel/dma/swiotlb.c > +++ b/kernel/dma/swiotlb.c > @@ -76,6 +76,9 @@ struct io_tlb_slot { > static unsigned long default_nslabs = IO_TLB_DEFAULT_SIZE >> IO_TLB_SHIFT; > static unsigned long default_nareas; > > +static atomic_long_t total_used = ATOMIC_LONG_INIT(0); > +static atomic_long_t used_hiwater = ATOMIC_LONG_INIT(0); > + > /** > * struct io_tlb_area - IO TLB memory area descriptor > * > @@ -594,6 +597,7 @@ static int swiotlb_do_find_slots(struct device *dev, int area_index, > unsigned long flags; > unsigned int slot_base; > unsigned int slot_index; > + unsigned long old_hiwater, new_used; > > BUG_ON(!nslots); > BUG_ON(area_index >= mem->nareas); > @@ -663,6 +667,17 @@ static int swiotlb_do_find_slots(struct device *dev, int area_index, > area->index = 0; > area->used += nslots; > spin_unlock_irqrestore(&area->lock, flags); > + > + if (IS_ENABLED(CONFIG_DEBUG_FS) && mem == &io_tlb_default_mem) { Yes, this works fine now, but why are total_used and used_hiwater global variables? If you make them fields in struct io_tlb_mem (possibly guarded with #ifdef CONFIG_DEBUG_FS), you can get rid of the check. Of course, in instances other than io_tlb_default_mem these fields would not be exported to userspace through debugfs, but if really needed, I can at least find them in a crash dump (or read them through /proc/kcore). Petr T
From: Christoph Hellwig <hch@lst.de> Sent: Thursday, April 6, 2023 10:50 PM > > On Fri, Mar 31, 2023 at 09:45:00PM -0700, Michael Kelley wrote: > > Changes in v3: > > * Do high water mark accounting only when CONFIG_DEBUG_FS=y. As > > as a result, add back the mem_used() function for the "swiotlb > > buffer is full" error message. [Christoph -- I didn't hear back > > whether this approach addresses your concern about one additional > > atomic operation when slots are allocated and again when freed. I've > > gone ahead with this new version, and we can obviously have further > > discussion.] > > Still not too happy, but at least debugfs is an interfact we could > remove at any time. > > But can you please factor the used_hiwater accounting into two > separate helpers that are udner CONFIG_DEBUG_FS and otherwise > stubbed out, instead of adding the logic directly into > swiotlb_do_find_slots and swiotlb_release_slots? I coded the way I did to follow the kernel coding style guidance that prefers converting a Kconfig symbol into a C boolean expression, and using it in a normal C conditional instead of using #ifdef. If CONFIG_DEBUG_FS=n, the compiler will constant fold the conditional away so there's no runtime overhead. I like the way that approached worked out in this case, but if you prefer separate functions with #ifdef and stubs, I don't feel strongly either way. Michael
From: Petr Tesařík <petr@tesarici.cz> Sent: Friday, April 7, 2023 3:56 AM > > On Fri, 31 Mar 2023 21:45:00 -0700 > Michael Kelley <mikelley@microsoft.com> wrote: > > > swiotlb currently reports the total number of slabs and the instantaneous > > in-use slabs in debugfs. But with increased usage of swiotlb for all I/O > > in Confidential Computing (coco) VMs, it has become difficult to know > > how much memory to allocate for swiotlb bounce buffers, either via the > > automatic algorithm in the kernel or by specifying a value on the > > kernel boot line. The current automatic algorithm generously allocates > > swiotlb bounce buffer memory, and may be wasting significant memory in > > many use cases. > > > > To support better understanding of swiotlb usage, add tracking of the > > the high water mark usage of the default swiotlb bounce buffer memory > > pool. Report the high water mark in debugfs along with the other swiotlb > > metrics. Allow the high water mark to be reset to zero at runtime by > > writing to it. > > > > Signed-off-by: Michael Kelley <mikelley@microsoft.com> > > --- > > Changes in v3: > > * Do high water mark accounting only when CONFIG_DEBUG_FS=y. As > > as a result, add back the mem_used() function for the "swiotlb > > buffer is full" error message. [Christoph -- I didn't hear back > > whether this approach addresses your concern about one additional > > atomic operation when slots are allocated and again when freed. I've > > gone ahead with this new version, and we can obviously have further > > discussion.] > > > > * Remove unnecessary u64 casts. [Christoph Hellwig] > > > > * Track slot usage and the high water mark only for io_tlb_default_mem. > > Previous versions incorrectly included per-device pools. [Petr Tesarik] > > > > Changes in v2: > > * Only reset the high water mark to zero when the specified new value > > is zero, to prevent confusion about the ability to reset to some > > other value [Dexuan Cui] > > > > kernel/dma/swiotlb.c | 41 ++++++++++++++++++++++++++++++++++++++++- > > 1 file changed, 40 insertions(+), 1 deletion(-) > > > > diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c > > index d3d6be0..6587a3d 100644 > > --- a/kernel/dma/swiotlb.c > > +++ b/kernel/dma/swiotlb.c > > @@ -76,6 +76,9 @@ struct io_tlb_slot { > > static unsigned long default_nslabs = IO_TLB_DEFAULT_SIZE >> IO_TLB_SHIFT; > > static unsigned long default_nareas; > > > > +static atomic_long_t total_used = ATOMIC_LONG_INIT(0); > > +static atomic_long_t used_hiwater = ATOMIC_LONG_INIT(0); > > + > > /** > > * struct io_tlb_area - IO TLB memory area descriptor > > * > > @@ -594,6 +597,7 @@ static int swiotlb_do_find_slots(struct device *dev, int > area_index, > > unsigned long flags; > > unsigned int slot_base; > > unsigned int slot_index; > > + unsigned long old_hiwater, new_used; > > > > BUG_ON(!nslots); > > BUG_ON(area_index >= mem->nareas); > > @@ -663,6 +667,17 @@ static int swiotlb_do_find_slots(struct device *dev, int > area_index, > > area->index = 0; > > area->used += nslots; > > spin_unlock_irqrestore(&area->lock, flags); > > + > > + if (IS_ENABLED(CONFIG_DEBUG_FS) && mem == &io_tlb_default_mem) { > > Yes, this works fine now, but why are total_used and used_hiwater > global variables? If you make them fields in struct io_tlb_mem > (possibly guarded with #ifdef CONFIG_DEBUG_FS), you can get rid of the > check. Of course, in instances other than io_tlb_default_mem these > fields would not be exported to userspace through debugfs, but if really > needed, I can at least find them in a crash dump (or read them through > /proc/kcore). > Got it. Your previously comments mentioned making them fields in struct io_tlb_mem, and I missed your point. :-( I got focused on fixing the accounting for DEBUG_FS so it didn't include the non-default pools, and didn't pick up on the idea of doing the accounting for the non-default pools even though the values aren't exposed in /sys. I'll fix this in the next version. Michael
On Fri, Apr 07, 2023 at 10:05:26PM +0000, Michael Kelley (LINUX) wrote: > > Yes, this works fine now, but why are total_used and used_hiwater > > global variables? If you make them fields in struct io_tlb_mem > > (possibly guarded with #ifdef CONFIG_DEBUG_FS), you can get rid of the > > check. Of course, in instances other than io_tlb_default_mem these > > fields would not be exported to userspace through debugfs, but if really > > needed, I can at least find them in a crash dump (or read them through > > /proc/kcore). > > > > Got it. > > Your previously comments mentioned making them fields in struct io_tlb_mem, > and I missed your point. :-( I got focused on fixing the accounting for > DEBUG_FS so it didn't include the non-default pools, and didn't pick up on the > idea of doing the accounting for the non-default pools even though the values > aren't exposed in /sys. I'll fix this in the next version. FYI, I agree that per-instance accounting is probably the better way, too.
On Fri, Apr 07, 2023 at 10:01:13PM +0000, Michael Kelley (LINUX) wrote: > I coded the way I did to follow the kernel coding style guidance > that prefers converting a Kconfig symbol into a C boolean > expression, and using it in a normal C conditional instead of > using #ifdef. If CONFIG_DEBUG_FS=n, the compiler will constant > fold the conditional away so there's no runtime overhead. I > like the way that approached worked out in this case, but if you prefer > separate functions with #ifdef and stubs, I don't feel strongly either way. I don't think there is a a hard and clear rule. Actual ifdefs have the benefit of allowing to actually remove struct fields as well. But the important bit is that I do want the accounting in helpers instead of in the main swiotlb logic. And once you do that, having #ifdefed stubs for the functions make sense to me.
From: hch@lst.de <hch@lst.de> Sent: Monday, April 10, 2023 8:42 PM > > On Fri, Apr 07, 2023 at 10:05:26PM +0000, Michael Kelley (LINUX) wrote: > > > Yes, this works fine now, but why are total_used and used_hiwater > > > global variables? If you make them fields in struct io_tlb_mem > > > (possibly guarded with #ifdef CONFIG_DEBUG_FS), you can get rid of the > > > check. Of course, in instances other than io_tlb_default_mem these > > > fields would not be exported to userspace through debugfs, but if really > > > needed, I can at least find them in a crash dump (or read them through > > > /proc/kcore). > > > > > > > Got it. > > > > Your previously comments mentioned making them fields in struct io_tlb_mem, > > and I missed your point. :-( I got focused on fixing the accounting for > > DEBUG_FS so it didn't include the non-default pools, and didn't pick up on the > > idea of doing the accounting for the non-default pools even though the values > > aren't exposed in /sys. I'll fix this in the next version. > > FYI, I agree that per-instance accounting is probably the better way, > too. It turns out that restricted pool swiotlb information *is* available in /sys when CONFIG_DEBUG_FS=y. There's a call to swiotlb_create_debugfs_files() in rmem_swiotlb_device_init(). But the /sys info is broken because io_tlb_used_get() is hardwired to the default pool. But this is easily fixable. I'll do a two-patch series that fixes this problem, and then makes the other changes for the high water mark that we've been discussing. Michael
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c index d3d6be0..6587a3d 100644 --- a/kernel/dma/swiotlb.c +++ b/kernel/dma/swiotlb.c @@ -76,6 +76,9 @@ struct io_tlb_slot { static unsigned long default_nslabs = IO_TLB_DEFAULT_SIZE >> IO_TLB_SHIFT; static unsigned long default_nareas; +static atomic_long_t total_used = ATOMIC_LONG_INIT(0); +static atomic_long_t used_hiwater = ATOMIC_LONG_INIT(0); + /** * struct io_tlb_area - IO TLB memory area descriptor * @@ -594,6 +597,7 @@ static int swiotlb_do_find_slots(struct device *dev, int area_index, unsigned long flags; unsigned int slot_base; unsigned int slot_index; + unsigned long old_hiwater, new_used; BUG_ON(!nslots); BUG_ON(area_index >= mem->nareas); @@ -663,6 +667,17 @@ static int swiotlb_do_find_slots(struct device *dev, int area_index, area->index = 0; area->used += nslots; spin_unlock_irqrestore(&area->lock, flags); + + if (IS_ENABLED(CONFIG_DEBUG_FS) && mem == &io_tlb_default_mem) { + new_used = atomic_long_add_return(nslots, &total_used); + old_hiwater = atomic_long_read(&used_hiwater); + do { + if (new_used <= old_hiwater) + break; + } while (!atomic_long_try_cmpxchg(&used_hiwater, + &old_hiwater, new_used)); + } + return slot_index; } @@ -795,6 +810,9 @@ static void swiotlb_release_slots(struct device *dev, phys_addr_t tlb_addr) mem->slots[i].list = ++count; area->used -= nslots; spin_unlock_irqrestore(&area->lock, flags); + + if (IS_ENABLED(CONFIG_DEBUG_FS) && mem == &io_tlb_default_mem) + atomic_long_sub(nslots, &total_used); } /* @@ -891,10 +909,29 @@ bool is_swiotlb_active(struct device *dev) static int io_tlb_used_get(void *data, u64 *val) { - *val = mem_used(&io_tlb_default_mem); + *val = atomic_long_read(&total_used); + return 0; +} + +static int io_tlb_hiwater_get(void *data, u64 *val) +{ + *val = atomic_long_read(&used_hiwater); + return 0; +} + +static int io_tlb_hiwater_set(void *data, u64 val) +{ + /* Only allow setting to zero */ + if (val != 0) + return -EINVAL; + + atomic_long_set(&used_hiwater, val); return 0; } + DEFINE_DEBUGFS_ATTRIBUTE(fops_io_tlb_used, io_tlb_used_get, NULL, "%llu\n"); +DEFINE_DEBUGFS_ATTRIBUTE(fops_io_tlb_hiwater, io_tlb_hiwater_get, + io_tlb_hiwater_set, "%llu\n"); static void swiotlb_create_debugfs_files(struct io_tlb_mem *mem, const char *dirname) @@ -906,6 +943,8 @@ static void swiotlb_create_debugfs_files(struct io_tlb_mem *mem, debugfs_create_ulong("io_tlb_nslabs", 0400, mem->debugfs, &mem->nslabs); debugfs_create_file("io_tlb_used", 0400, mem->debugfs, NULL, &fops_io_tlb_used); + debugfs_create_file("io_tlb_used_hiwater", 0600, mem->debugfs, NULL, + &fops_io_tlb_hiwater); } static int __init __maybe_unused swiotlb_create_default_debugfs(void)