From patchwork Sat Jun 24 20:56:22 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Demi Marie Obenour
X-Patchwork-Id: 112494
From: Demi Marie Obenour
To: Juergen Gross, Stefano Stabellini, Oleksandr Tyshchenko, Jan Beulich, Konrad Rzeszutek Wilk
Cc: Demi Marie Obenour, Xen developer discussion, Linux Kernel Mailing List, Marek Marczykowski-Górecki, stable@vger.kernel.org
Subject: [PATCH v3] xen: speed up grant-table reclaim
Date: Sat, 24 Jun 2023 16:56:22 -0400
Message-ID: <20230624205624.1817-1-demi@invisiblethingslab.com>
X-Mailer: git-send-email 2.41.0
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

When a grant entry is still in use by the remote domain, Linux must put
it on a deferred list.  Normally, this list is very short, because
the PV network and block protocols expect the backend to unmap the grant
first.  However, Qubes OS's GUI protocol is subject to the constraints
of the X Window System, and as such winds up with the frontend unmapping
the window first.  As a result, the list can grow very large, resulting
in a massive memory leak and eventual VM freeze.

To partially solve this problem, make the number of entries that the VM
will attempt to free at each iteration tunable.  The default is still
10, but it can be overridden at compile-time (via Kconfig), boot-time
(via a kernel command-line option), or runtime (via sysfs).

This is Cc: stable because (when combined with appropriate userspace
changes) it fixes a severe performance and stability problem for Qubes
OS users.

Cc: stable@vger.kernel.org
Signed-off-by: Demi Marie Obenour
---
 drivers/xen/grant-table.c | 40 ++++++++++++++++++++++++++++-----------
 2 files changed, 41 insertions(+), 11 deletions(-)

diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index e1ec725c2819d4d5dede063eb00d86a6d52944c0..fa666aa6abc3e786dddc94f895641505ec0b23d8 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -498,14 +498,20 @@ static LIST_HEAD(deferred_list);
 static void gnttab_handle_deferred(struct timer_list *);
 static DEFINE_TIMER(deferred_timer, gnttab_handle_deferred);
 
+static atomic64_t deferred_count;
+static atomic64_t leaked_count;
+static unsigned int free_per_iteration = 10;
+
 static void gnttab_handle_deferred(struct timer_list *unused)
 {
-	unsigned int nr = 10;
+	unsigned int nr = READ_ONCE(free_per_iteration);
+	const bool ignore_limit = nr == 0;
 	struct deferred_entry *first = NULL;
 	unsigned long flags;
+	size_t freed = 0;
 
 	spin_lock_irqsave(&gnttab_list_lock, flags);
-	while (nr--) {
+	while ((ignore_limit || nr--) && !list_empty(&deferred_list)) {
 		struct deferred_entry *entry
 			= list_first_entry(&deferred_list,
 					   struct deferred_entry, list);
@@ -515,10 +521,13 @@ static void gnttab_handle_deferred(struct timer_list *unused)
 		list_del(&entry->list);
 		spin_unlock_irqrestore(&gnttab_list_lock, flags);
 		if (_gnttab_end_foreign_access_ref(entry->ref)) {
+			uint64_t ret = atomic64_sub_return(1, &deferred_count);
 			put_free_entry(entry->ref);
-			pr_debug("freeing g.e. %#x (pfn %#lx)\n",
-				 entry->ref, page_to_pfn(entry->page));
+			pr_debug("freeing g.e. %#x (pfn %#lx), %llu remaining\n",
+				 entry->ref, page_to_pfn(entry->page),
+				 (unsigned long long)ret);
 			put_page(entry->page);
+			freed++;
 			kfree(entry);
 			entry = NULL;
 		} else {
@@ -530,21 +539,22 @@ static void gnttab_handle_deferred(struct timer_list *unused)
 		spin_lock_irqsave(&gnttab_list_lock, flags);
 		if (entry)
 			list_add_tail(&entry->list, &deferred_list);
-		else if (list_empty(&deferred_list))
-			break;
 	}
-	if (!list_empty(&deferred_list) && !timer_pending(&deferred_timer)) {
+	if (list_empty(&deferred_list))
+		WARN_ON(atomic64_read(&deferred_count));
+	else if (!timer_pending(&deferred_timer)) {
 		deferred_timer.expires = jiffies + HZ;
 		add_timer(&deferred_timer);
 	}
 	spin_unlock_irqrestore(&gnttab_list_lock, flags);
+	pr_debug("Freed %zu references", freed);
 }
 
 static void gnttab_add_deferred(grant_ref_t ref, struct page *page)
 {
 	struct deferred_entry *entry;
 	gfp_t gfp = (in_atomic() || irqs_disabled()) ? GFP_ATOMIC : GFP_KERNEL;
-	const char *what = KERN_WARNING "leaking";
+	uint64_t leaked, deferred;
 
 	entry = kmalloc(sizeof(*entry), gfp);
 	if (!page) {
@@ -567,12 +577,20 @@ static void gnttab_add_deferred(grant_ref_t ref, struct page *page)
 			add_timer(&deferred_timer);
 		}
 		spin_unlock_irqrestore(&gnttab_list_lock, flags);
-		what = KERN_DEBUG "deferring";
+		deferred = atomic64_add_return(1, &deferred_count);
+		leaked = atomic64_read(&leaked_count);
+		pr_debug("deferring g.e. %#x (pfn %#lx) (total deferred %llu, total leaked %llu)\n",
+			 ref, page ? page_to_pfn(page) : -1, deferred, leaked);
+	} else {
+		deferred = atomic64_read(&deferred_count);
+		leaked = atomic64_add_return(1, &leaked_count);
+		pr_warn("leaking g.e. %#x (pfn %#lx) (total deferred %llu, total leaked %llu)\n",
+			ref, page ? page_to_pfn(page) : -1, deferred, leaked);
 	}
-	printk("%s g.e. %#x (pfn %#lx)\n",
-	       what, ref, page ? page_to_pfn(page) : -1);
 }
 
+module_param(free_per_iteration, uint, 0600);
+
 int gnttab_try_end_foreign_access(grant_ref_t ref)
 {
 	int ret = _gnttab_end_foreign_access_ref(ref);
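
For readers who want to see the effect of the new loop condition in isolation, here is a rough user-space sketch of one reclaim pass. It is not kernel code and not part of the patch: `struct entry`, `still_mapped`, and `reclaim_pass` are hypothetical names invented for illustration, the deferred list is modelled as a plain array used as a FIFO, and locking, timers, and the atomic counters are left out. It assumes each entry carries a unique, non-negative `ref`. A limit of 0 is treated as "no limit", in the spirit of `ignore_limit` in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a deferred grant entry. */
struct entry {
	int ref;            /* unique, non-negative id (assumption of this sketch) */
	bool still_mapped;  /* true: the remote side has not unmapped it yet */
};

/*
 * One reclaim pass over the FIFO stored in q[0..*len).
 * limit == 0 means "no limit". Busy entries are requeued at the tail, and
 * the pass stops early once it comes back around to the first entry it
 * requeued. Returns the number of entries freed.
 */
static size_t reclaim_pass(struct entry *q, size_t *len, unsigned int limit)
{
	const bool ignore_limit = (limit == 0);
	unsigned int nr = limit;
	int first_requeued = -1;	/* ref of the first entry put back */
	size_t freed = 0;

	assert(q != NULL && len != NULL);

	while ((ignore_limit || nr--) && *len > 0) {
		struct entry head = q[0];

		if (head.ref == first_requeued)
			break;		/* wrapped around: the rest is still busy */

		/* Pop the head by shifting the remaining entries down. */
		memmove(q, q + 1, (*len - 1) * sizeof(*q));
		(*len)--;

		if (!head.still_mapped) {
			freed++;	/* the kernel would release the grant and page here */
		} else {
			if (first_requeued < 0)
				first_requeued = head.ref;
			q[(*len)++] = head;	/* requeue at the tail */
		}
	}
	return freed;
}
```

With a queue of refs 1 to 5 in which refs 2 and 4 are still mapped, a pass with a limit of 2 frees ref 1 and requeues ref 2. A following pass with a limit of 0 frees refs 3 and 5 and stops with refs 4 and 2 still queued.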