Message ID | 20240212193052.27765-1-will@kernel.org |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel+bounces-62273-ouuuleilei=gmail.com@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7300:bc8a:b0:106:860b:bbdd with SMTP id dn10csp125039dyb; Mon, 12 Feb 2024 11:31:15 -0800 (PST) X-Forwarded-Encrypted: i=3; AJvYcCUdzsrmiYH4k47TovS50DPOkdCGtCO0yySGRG3pnatweLTv7q2PKh3OibyoKTTigSV8pjyNDqDJYBxn+lHg3fEVWXf2QA== X-Google-Smtp-Source: AGHT+IFBQKNsPqqCVC6AeDpnRhX2sEsHcs269NLO+hUm9zUwmAQ2Wh94WKzlbffsEBZVSW1zYjV5 X-Received: by 2002:a17:902:d346:b0:1d8:e7a4:3474 with SMTP id l6-20020a170902d34600b001d8e7a43474mr7926329plk.68.1707766275196; Mon, 12 Feb 2024 11:31:15 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1707766275; cv=pass; d=google.com; s=arc-20160816; b=ducFpsnhtHrX9Liowdlb+xerkyLVM3+43kvcYlKTwEoee/TFBFa/06oXt6QxO0dnrG MPVVy4U72kIzSryLAOJO9i8tf8Adih/R7Y4VjKVFv1Y0Q1DqpVshg4yr+K2qz9Jxv7jB t6/DhLw8uNf4+26XokpTZ2dpZ4B45yJYdrzHnm19xe0iDA/5K/TbQ/1jlhZRVdCJKO+E X4AHxKaImxiBkRGLGFb24DzjxN6ZvqbIL+5Z1ROehalM+qt0VSjXBCysNVUteaYVD2gX wLJBKy5C1+CGcld7fY7UEEdiun9glhcvufYuWaWPRvtLZWrFLbtz1Xn8VtOLsuE4ecEj 2U3Q== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:message-id:date:subject:cc:to :from:dkim-signature; bh=xbDIjGqMtKkdj5NCFegLKFmNbw2fCjzwgVIKSfYLNiQ=; fh=YoTsfucATJtE7W0KRrpqzw/5tLYVbhv+ZD0eXEfPx0Q=; b=Ss4hZzbv3GPyMpIncfiFwzEiDVH+98zXQgvMKCXEK4rBVrsYI7VmsJY2ZYG+Aoflrg KEkAQ6SwdZc4DErldVul+8Vm2TXVzyFAduH9g5diUSTO34qe9VekQJ953xmkybRU3Qzn /rSFZPvEo68KTvXMttRcPCghXwY1dNWuusnro+zxX3Neu8vNR+IV2XhIzlpmpMK6/KEJ ZKkn2CthFCuQHfFmj/LDVULgy5E9pD8YTV4AyLmOR6sh5c7Lag7+LayQTte/XkMpIeYO RiJtww/YDGf2ZvzqvxxYwDdfA1UdZ5+a7UIYs1XV/H/u1eaj09+OSEBVd+RtNsT9Dsqf vBhQ==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b="YyM/2gMB"; arc=pass (i=1 dkim=pass dkdomain=kernel.org); spf=pass (google.com: domain of linux-kernel+bounces-62273-ouuuleilei=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) smtp.mailfrom="linux-kernel+bounces-62273-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org X-Forwarded-Encrypted: i=2; AJvYcCUgtdGW9sbmb17DV2lWpi0dzTV0m0jbWIwdpmoo7TsW94WFcAGCSqBTdPm7fPyyYIwDeEOL02awhhx5Ks4VE0XLt6Lixw== Received: from sv.mirrors.kernel.org (sv.mirrors.kernel.org. [139.178.88.99]) by mx.google.com with ESMTPS id q6-20020a170902a3c600b001d92eb54a6asi655632plb.387.2024.02.12.11.31.15 for <ouuuleilei@gmail.com> (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 12 Feb 2024 11:31:15 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-62273-ouuuleilei=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) client-ip=139.178.88.99; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b="YyM/2gMB"; arc=pass (i=1 dkim=pass dkdomain=kernel.org); spf=pass (google.com: domain of linux-kernel+bounces-62273-ouuuleilei=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) smtp.mailfrom="linux-kernel+bounces-62273-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sv.mirrors.kernel.org (Postfix) with ESMTPS id F1BAF2821AD for <ouuuleilei@gmail.com>; Mon, 12 Feb 2024 19:31:14 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 1E2E047A6C; Mon, 12 Feb 2024 19:31:02 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="YyM/2gMB" Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 789004644F; Mon, 12 Feb 2024 19:30:59 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707766259; cv=none; b=ZS3zNBQfOAA5r29AI3MUR+V7ZbBGp2ygX2rUI0fMn5w1RLDrNnFeHeaoabKa4QftdMJqUQSCrFs5y/F3JIpt9G0etAgZW3TFkNT4heYOfQGVSK7bJkbMmmvmTEVhdvbEfHkY/yGwRkVrdtCIzXAvDrE3XVsMqUSr0Evr0AGKUKo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707766259; c=relaxed/simple; bh=/rhBDl7xe1E+WL+owZXps5bvP9Vg6PrEijQUZDpqhSY=; h=From:To:Cc:Subject:Date:Message-Id:MIME-Version; b=oaclM9k8gphJ6EEh5lX+R3wYwMs9+UaadIlYVaffhl8c+y/AM9MDp0F/31r6HJDW8EvCC/5zRocAStdawtmIMpeXX14AGe82kJ5Ea0HQNiw0VJYxbNFsQqKTu2BO9e6LqQzClDn4AHOCdZNQb5I44ckqlvijQi/oww6S6OvFwx4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=YyM/2gMB; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id 7B7E3C433C7; Mon, 12 Feb 2024 19:30:57 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1707766259; bh=/rhBDl7xe1E+WL+owZXps5bvP9Vg6PrEijQUZDpqhSY=; h=From:To:Cc:Subject:Date:From; b=YyM/2gMBKPR1BY2Dhqzeg8KnJA56etjOSCiimzNdDy7AkF8yoy1JYKjc7HNc52NmO 2PdesYd+VB9mfzxKCmlFbLQopIzt3P0GZtfpn70y2GkatZ8x7n8RDM4Yu0yRtpY1uU RFnxnSsWx1gNlnAKxL/deJo4N4qDi/TRtRHn/qBAGsVpymiy9RBMzZcht6iXbklamk Kx+MckCcNetqMOwJ8fp23wFO7IS34ynvBf6Sv5RgEQASzum5JvuhGm8yxldUqyttZo +ZC2CWYABXCvpy+1WAA7+iyQjwdRUEEskM8onLcUwKR6Iq9nOwGjz5sbcAo9GglJj4 q3DUdMiWr8QKw== From: Will Deacon <will@kernel.org> To: kvmarm@lists.linux.dev Cc: linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, Will Deacon <will@kernel.org>, Marc Zyngier <maz@kernel.org>, Oliver Upton <oliver.upton@linux.dev>, Ricardo Koller <ricarkol@google.com> Subject: [RFC PATCH] KVM: arm64: Fix double-free following kvm_pgtable_stage2_free_unlinked() Date: Mon, 12 Feb 2024 19:30:52 +0000 Message-Id: <20240212193052.27765-1-will@kernel.org> X-Mailer: git-send-email 2.20.1 Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: <linux-kernel.vger.kernel.org> List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org> List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1790722729866877662 X-GMAIL-MSGID: 1790722729866877662 |
Series |
[RFC] KVM: arm64: Fix double-free following kvm_pgtable_stage2_free_unlinked()
|
|
Commit Message
Will Deacon
Feb. 12, 2024, 7:30 p.m. UTC
kvm_pgtable_stage2_free_unlinked() does the final put_page() on the
root page of the sub-tree before returning, so remove the additional
put_page() invocations in the callers.
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
Hi folks,
Sending this as an RFC as I only spotted it from code inspection and I'm
surprised others aren't seeing fireworks if it's a genuine bug. I also
couldn't come up with a sensible Fixes tag, as all of:
e7c05540c694b ("KVM: arm64: Add helper for creating unlinked stage2 subtrees")
8f5a3eb7513fc ("KVM: arm64: Add kvm_pgtable_stage2_split()")
f6a27d6dc51b2 ("KVM: arm64: Drop last page ref in kvm_pgtable_stage2_free_removed()")
are actually ok in isolation. Hrm. Please tell me I'm wrong?
arch/arm64/kvm/hyp/pgtable.c | 2 --
1 file changed, 2 deletions(-)
Comments
On Mon, Feb 12, 2024 at 07:30:52PM +0000, Will Deacon wrote: > kvm_pgtable_stage2_free_unlinked() does the final put_page() on the > root page of the sub-tree before returning, so remove the additional > put_page() invocations in the callers. > > Cc: Marc Zyngier <maz@kernel.org> > Cc: Oliver Upton <oliver.upton@linux.dev> > Cc: Ricardo Koller <ricarkol@google.com> > Signed-off-by: Will Deacon <will@kernel.org> > --- > > Hi folks, > > Sending this as an RFC as I only spotted it from code inspection and I'm > surprised others aren't seeing fireworks if it's a genuine bug. I also > couldn't come up with a sensible Fixes tag, as all of: > > e7c05540c694b ("KVM: arm64: Add helper for creating unlinked stage2 subtrees") > 8f5a3eb7513fc ("KVM: arm64: Add kvm_pgtable_stage2_split()") > f6a27d6dc51b2 ("KVM: arm64: Drop last page ref in kvm_pgtable_stage2_free_removed()") > > are actually ok in isolation. Hrm. Please tell me I'm wrong? > > arch/arm64/kvm/hyp/pgtable.c | 2 -- > 1 file changed, 2 deletions(-) > > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c > index c651df904fe3..ab9d05fcf98b 100644 > --- a/arch/arm64/kvm/hyp/pgtable.c > +++ b/arch/arm64/kvm/hyp/pgtable.c > @@ -1419,7 +1419,6 @@ kvm_pte_t *kvm_pgtable_stage2_create_unlinked(struct kvm_pgtable *pgt, > level + 1); > if (ret) { > kvm_pgtable_stage2_free_unlinked(mm_ops, pgtable, level); > - mm_ops->put_page(pgtable); > return ERR_PTR(ret); > } AFAICT, this entire branch is effectively dead code, unless there's a KVM bug lurking behind the page table walk. The sub-tree isn't visible to other software or hardware walkers yet, so none of the PTE races could cause this to pop. So while this is very obviously a bug, it might be pure luck that folks haven't seen smoke here. Perhaps while fixing the bug we should take the opportunity to promote the condition to WARN_ON_ONCE(). > @@ -1502,7 +1501,6 @@ static int stage2_split_walker(const struct kvm_pgtable_visit_ctx *ctx, > > if (!stage2_try_break_pte(ctx, mmu)) { > kvm_pgtable_stage2_free_unlinked(mm_ops, childp, level); > - mm_ops->put_page(childp); > return -EAGAIN; > } This, on the other hand, seems possible. There exists a race where an old block PTE could have the AF set on it and the underlying cmpxchg() could fail. There shouldn't be a race with any software walkers, as we hold the MMU lock for write here.
On Mon, Feb 12, 2024 at 08:14:41PM +0000, Oliver Upton wrote: > On Mon, Feb 12, 2024 at 07:30:52PM +0000, Will Deacon wrote: > > kvm_pgtable_stage2_free_unlinked() does the final put_page() on the > > root page of the sub-tree before returning, so remove the additional > > put_page() invocations in the callers. > > > > Cc: Marc Zyngier <maz@kernel.org> > > Cc: Oliver Upton <oliver.upton@linux.dev> > > Cc: Ricardo Koller <ricarkol@google.com> > > Signed-off-by: Will Deacon <will@kernel.org> > > --- > > > > Hi folks, > > > > Sending this as an RFC as I only spotted it from code inspection and I'm > > surprised others aren't seeing fireworks if it's a genuine bug. I also > > couldn't come up with a sensible Fixes tag, as all of: > > > > e7c05540c694b ("KVM: arm64: Add helper for creating unlinked stage2 subtrees") > > 8f5a3eb7513fc ("KVM: arm64: Add kvm_pgtable_stage2_split()") > > f6a27d6dc51b2 ("KVM: arm64: Drop last page ref in kvm_pgtable_stage2_free_removed()") > > > > are actually ok in isolation. Hrm. Please tell me I'm wrong? > > > > arch/arm64/kvm/hyp/pgtable.c | 2 -- > > 1 file changed, 2 deletions(-) > > > > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c > > index c651df904fe3..ab9d05fcf98b 100644 > > --- a/arch/arm64/kvm/hyp/pgtable.c > > +++ b/arch/arm64/kvm/hyp/pgtable.c > > @@ -1419,7 +1419,6 @@ kvm_pte_t *kvm_pgtable_stage2_create_unlinked(struct kvm_pgtable *pgt, > > level + 1); > > if (ret) { > > kvm_pgtable_stage2_free_unlinked(mm_ops, pgtable, level); > > - mm_ops->put_page(pgtable); > > return ERR_PTR(ret); > > } > > AFAICT, this entire branch is effectively dead code, unless there's a > KVM bug lurking behind the page table walk. The sub-tree isn't visible > to other software or hardware walkers yet, so none of the PTE races > could cause this to pop. > > So while this is very obviously a bug, it might be pure luck that folks > haven't seen smoke here. Perhaps while fixing the bug we should take the > opportunity to promote the condition to WARN_ON_ONCE(). > > > @@ -1502,7 +1501,6 @@ static int stage2_split_walker(const struct kvm_pgtable_visit_ctx *ctx, > > > > if (!stage2_try_break_pte(ctx, mmu)) { > > kvm_pgtable_stage2_free_unlinked(mm_ops, childp, level); > > - mm_ops->put_page(childp); > > return -EAGAIN; > > } > > This, on the other hand, seems possible. There exists a race where an > old block PTE could have the AF set on it and the underlying cmpxchg() > could fail. That is to say, a race with a *HW* update to the PTE that sets AF. > There shouldn't be a race with any software walkers, as we > hold the MMU lock for write here. > > -- > Thanks, > Oliver
On Mon, 12 Feb 2024 20:14:37 +0000, Oliver Upton <oliver.upton@linux.dev> wrote: > > On Mon, Feb 12, 2024 at 07:30:52PM +0000, Will Deacon wrote: > > kvm_pgtable_stage2_free_unlinked() does the final put_page() on the > > root page of the sub-tree before returning, so remove the additional > > put_page() invocations in the callers. > > > > Cc: Marc Zyngier <maz@kernel.org> > > Cc: Oliver Upton <oliver.upton@linux.dev> > > Cc: Ricardo Koller <ricarkol@google.com> > > Signed-off-by: Will Deacon <will@kernel.org> > > --- > > > > Hi folks, > > > > Sending this as an RFC as I only spotted it from code inspection and I'm > > surprised others aren't seeing fireworks if it's a genuine bug. I also > > couldn't come up with a sensible Fixes tag, as all of: > > > > e7c05540c694b ("KVM: arm64: Add helper for creating unlinked stage2 subtrees") > > 8f5a3eb7513fc ("KVM: arm64: Add kvm_pgtable_stage2_split()") > > f6a27d6dc51b2 ("KVM: arm64: Drop last page ref in kvm_pgtable_stage2_free_removed()") I'd blame it on the last commit, as we really ought to have it if we have the others. > > > > are actually ok in isolation. Hrm. Please tell me I'm wrong? > > > > arch/arm64/kvm/hyp/pgtable.c | 2 -- > > 1 file changed, 2 deletions(-) > > > > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c > > index c651df904fe3..ab9d05fcf98b 100644 > > --- a/arch/arm64/kvm/hyp/pgtable.c > > +++ b/arch/arm64/kvm/hyp/pgtable.c > > @@ -1419,7 +1419,6 @@ kvm_pte_t *kvm_pgtable_stage2_create_unlinked(struct kvm_pgtable *pgt, > > level + 1); > > if (ret) { > > kvm_pgtable_stage2_free_unlinked(mm_ops, pgtable, level); > > - mm_ops->put_page(pgtable); > > return ERR_PTR(ret); > > } > > AFAICT, this entire branch is effectively dead code, unless there's a > KVM bug lurking behind the page table walk. The sub-tree isn't visible > to other software or hardware walkers yet, so none of the PTE races > could cause this to pop. > > So while this is very obviously a bug, it might be pure luck that folks > haven't seen smoke here. Perhaps while fixing the bug we should take the > opportunity to promote the condition to WARN_ON_ONCE(). Can't you construct a case where an allocation fails during the walk (memcache empty), and we end up on this exact path? > > > @@ -1502,7 +1501,6 @@ static int stage2_split_walker(const struct kvm_pgtable_visit_ctx *ctx, > > > > if (!stage2_try_break_pte(ctx, mmu)) { > > kvm_pgtable_stage2_free_unlinked(mm_ops, childp, level); > > - mm_ops->put_page(childp); > > return -EAGAIN; > > } > > This, on the other hand, seems possible. There exists a race where an > old block PTE could have the AF set on it and the underlying cmpxchg() > could fail. There shouldn't be a race with any software walkers, as we > hold the MMU lock for write here. AF update is indeed a likely candidate. In any case, this patch looks good to me as it is, and we can always have a separate tweak to adjust the severity of the first case as required. Unless anyone objects, I'd like to queue it shortly. Thanks, M.
On Tue, Feb 13, 2024 at 11:12:34AM +0000, Marc Zyngier wrote: > On Mon, 12 Feb 2024 20:14:37 +0000, > Oliver Upton <oliver.upton@linux.dev> wrote: > > On Mon, Feb 12, 2024 at 07:30:52PM +0000, Will Deacon wrote: > > > Sending this as an RFC as I only spotted it from code inspection and I'm > > > surprised others aren't seeing fireworks if it's a genuine bug. I also > > > couldn't come up with a sensible Fixes tag, as all of: > > > > > > e7c05540c694b ("KVM: arm64: Add helper for creating unlinked stage2 subtrees") > > > 8f5a3eb7513fc ("KVM: arm64: Add kvm_pgtable_stage2_split()") > > > f6a27d6dc51b2 ("KVM: arm64: Drop last page ref in kvm_pgtable_stage2_free_removed()") > > I'd blame it on the last commit, as we really ought to have it if we > have the others. Yes, that's probably the best approach if you're adding a Fixes tag. > > > @@ -1502,7 +1501,6 @@ static int stage2_split_walker(const struct kvm_pgtable_visit_ctx *ctx, > > > > > > if (!stage2_try_break_pte(ctx, mmu)) { > > > kvm_pgtable_stage2_free_unlinked(mm_ops, childp, level); > > > - mm_ops->put_page(childp); > > > return -EAGAIN; > > > } > > > > This, on the other hand, seems possible. There exists a race where an > > old block PTE could have the AF set on it and the underlying cmpxchg() > > could fail. There shouldn't be a race with any software walkers, as we > > hold the MMU lock for write here. > > AF update is indeed a likely candidate. > > In any case, this patch looks good to me as it is, and we can always > have a separate tweak to adjust the severity of the first case as > required. Unless anyone objects, I'd like to queue it shortly. Fine by me! Even though I found it by inspection, I have taken the patch for a spin to check that I (somehow) didn't break something else. Will
On Tue, Feb 13, 2024 at 11:12:34AM +0000, Marc Zyngier wrote: > On Mon, 12 Feb 2024 20:14:37 +0000, > Oliver Upton <oliver.upton@linux.dev> wrote: > > > > On Mon, Feb 12, 2024 at 07:30:52PM +0000, Will Deacon wrote: > > > kvm_pgtable_stage2_free_unlinked() does the final put_page() on the > > > root page of the sub-tree before returning, so remove the additional > > > put_page() invocations in the callers. > > > > > > Cc: Marc Zyngier <maz@kernel.org> > > > Cc: Oliver Upton <oliver.upton@linux.dev> > > > Cc: Ricardo Koller <ricarkol@google.com> > > > Signed-off-by: Will Deacon <will@kernel.org> > > > --- > > > > > > Hi folks, > > > > > > Sending this as an RFC as I only spotted it from code inspection and I'm > > > surprised others aren't seeing fireworks if it's a genuine bug. I also > > > couldn't come up with a sensible Fixes tag, as all of: > > > > > > e7c05540c694b ("KVM: arm64: Add helper for creating unlinked stage2 subtrees") > > > 8f5a3eb7513fc ("KVM: arm64: Add kvm_pgtable_stage2_split()") > > > f6a27d6dc51b2 ("KVM: arm64: Drop last page ref in kvm_pgtable_stage2_free_removed()") > > I'd blame it on the last commit, as we really ought to have it if we > have the others. > > > > > > > are actually ok in isolation. Hrm. Please tell me I'm wrong? > > > > > > arch/arm64/kvm/hyp/pgtable.c | 2 -- > > > 1 file changed, 2 deletions(-) > > > > > > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c > > > index c651df904fe3..ab9d05fcf98b 100644 > > > --- a/arch/arm64/kvm/hyp/pgtable.c > > > +++ b/arch/arm64/kvm/hyp/pgtable.c > > > @@ -1419,7 +1419,6 @@ kvm_pte_t *kvm_pgtable_stage2_create_unlinked(struct kvm_pgtable *pgt, > > > level + 1); > > > if (ret) { > > > kvm_pgtable_stage2_free_unlinked(mm_ops, pgtable, level); > > > - mm_ops->put_page(pgtable); > > > return ERR_PTR(ret); > > > } > > > > AFAICT, this entire branch is effectively dead code, unless there's a > > KVM bug lurking behind the page table walk. The sub-tree isn't visible > > to other software or hardware walkers yet, so none of the PTE races > > could cause this to pop. > > > > So while this is very obviously a bug, it might be pure luck that folks > > haven't seen smoke here. Perhaps while fixing the bug we should take the > > opportunity to promote the condition to WARN_ON_ONCE(). > > Can't you construct a case where an allocation fails during the walk > (memcache empty), and we end up on this exact path? Possibly, but AFAICT that can only happen if there was a bug in KVM. We don't start the walk at all if userspace set the split chunk size to 0, and otherwise we expect it to be at least PMD_SIZE, which will top up the cache to 1 every pass. stage2_split_walker() will 'do the right thing' if there aren't enough preallocated pages to get down to level 3. It really doesn't matter either way, I'm just trying to convince myself of the reasons why we haven't seen this explode yet :) > > > > > @@ -1502,7 +1501,6 @@ static int stage2_split_walker(const struct kvm_pgtable_visit_ctx *ctx, > > > > > > if (!stage2_try_break_pte(ctx, mmu)) { > > > kvm_pgtable_stage2_free_unlinked(mm_ops, childp, level); > > > - mm_ops->put_page(childp); > > > return -EAGAIN; > > > } > > > > This, on the other hand, seems possible. There exists a race where an > > old block PTE could have the AF set on it and the underlying cmpxchg() > > could fail. There shouldn't be a race with any software walkers, as we > > hold the MMU lock for write here. > > AF update is indeed a likely candidate. > > In any case, this patch looks good to me as it is, and we can always > have a separate tweak to adjust the severity of the first case as > required. Unless anyone objects, I'd like to queue it shortly. Agreed, happy with the way this looks and should've added: Reviewed-by: Oliver Upton <oliver.upton@linux.dev> the first time around.
On Tue, 13 Feb 2024 16:29:42 +0000, Oliver Upton <oliver.upton@linux.dev> wrote: > > On Tue, Feb 13, 2024 at 11:12:34AM +0000, Marc Zyngier wrote: > > On Mon, 12 Feb 2024 20:14:37 +0000, > > Oliver Upton <oliver.upton@linux.dev> wrote: > > > > > > On Mon, Feb 12, 2024 at 07:30:52PM +0000, Will Deacon wrote: > > > > kvm_pgtable_stage2_free_unlinked() does the final put_page() on the > > > > root page of the sub-tree before returning, so remove the additional > > > > put_page() invocations in the callers. > > > > > > > > Cc: Marc Zyngier <maz@kernel.org> > > > > Cc: Oliver Upton <oliver.upton@linux.dev> > > > > Cc: Ricardo Koller <ricarkol@google.com> > > > > Signed-off-by: Will Deacon <will@kernel.org> > > > > --- > > > > > > > > Hi folks, > > > > > > > > Sending this as an RFC as I only spotted it from code inspection and I'm > > > > surprised others aren't seeing fireworks if it's a genuine bug. I also > > > > couldn't come up with a sensible Fixes tag, as all of: > > > > > > > > e7c05540c694b ("KVM: arm64: Add helper for creating unlinked stage2 subtrees") > > > > 8f5a3eb7513fc ("KVM: arm64: Add kvm_pgtable_stage2_split()") > > > > f6a27d6dc51b2 ("KVM: arm64: Drop last page ref in kvm_pgtable_stage2_free_removed()") > > > > I'd blame it on the last commit, as we really ought to have it if we > > have the others. > > > > > > > > > > are actually ok in isolation. Hrm. Please tell me I'm wrong? > > > > > > > > arch/arm64/kvm/hyp/pgtable.c | 2 -- > > > > 1 file changed, 2 deletions(-) > > > > > > > > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c > > > > index c651df904fe3..ab9d05fcf98b 100644 > > > > --- a/arch/arm64/kvm/hyp/pgtable.c > > > > +++ b/arch/arm64/kvm/hyp/pgtable.c > > > > @@ -1419,7 +1419,6 @@ kvm_pte_t *kvm_pgtable_stage2_create_unlinked(struct kvm_pgtable *pgt, > > > > level + 1); > > > > if (ret) { > > > > kvm_pgtable_stage2_free_unlinked(mm_ops, pgtable, level); > > > > - mm_ops->put_page(pgtable); > > > > return ERR_PTR(ret); > > > > } > > > > > > AFAICT, this entire branch is effectively dead code, unless there's a > > > KVM bug lurking behind the page table walk. The sub-tree isn't visible > > > to other software or hardware walkers yet, so none of the PTE races > > > could cause this to pop. > > > > > > So while this is very obviously a bug, it might be pure luck that folks > > > haven't seen smoke here. Perhaps while fixing the bug we should take the > > > opportunity to promote the condition to WARN_ON_ONCE(). > > > > Can't you construct a case where an allocation fails during the walk > > (memcache empty), and we end up on this exact path? > > Possibly, but AFAICT that can only happen if there was a bug in KVM. We > don't start the walk at all if userspace set the split chunk size to 0, > and otherwise we expect it to be at least PMD_SIZE, which will top up > the cache to 1 every pass. stage2_split_walker() will 'do the right > thing' if there aren't enough preallocated pages to get down to level 3. > > It really doesn't matter either way, I'm just trying to convince myself > of the reasons why we haven't seen this explode yet :) Yeah, that's probably very unlikely to hit given the current conditions. > > > > > > > > @@ -1502,7 +1501,6 @@ static int stage2_split_walker(const struct kvm_pgtable_visit_ctx *ctx, > > > > > > > > if (!stage2_try_break_pte(ctx, mmu)) { > > > > kvm_pgtable_stage2_free_unlinked(mm_ops, childp, level); > > > > - mm_ops->put_page(childp); > > > > return -EAGAIN; > > > > } > > > > > > This, on the other hand, seems possible. There exists a race where an > > > old block PTE could have the AF set on it and the underlying cmpxchg() > > > could fail. There shouldn't be a race with any software walkers, as we > > > hold the MMU lock for write here. > > > > AF update is indeed a likely candidate. > > > > In any case, this patch looks good to me as it is, and we can always > > have a separate tweak to adjust the severity of the first case as > > required. Unless anyone objects, I'd like to queue it shortly. > > Agreed, happy with the way this looks and should've added: > > Reviewed-by: Oliver Upton <oliver.upton@linux.dev> > > the first time around. Thanks for that. I'll queue that shortly and send (another) PR. M.
On Mon, 12 Feb 2024 19:30:52 +0000, Will Deacon wrote: > kvm_pgtable_stage2_free_unlinked() does the final put_page() on the > root page of the sub-tree before returning, so remove the additional > put_page() invocations in the callers. Applied to fixes, thanks! [1/1] KVM: arm64: Fix double-free following kvm_pgtable_stage2_free_unlinked() commit: c60d847be7b8e69e419e02a2b3d19c2842a3c35d Cheers, M.
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c index c651df904fe3..ab9d05fcf98b 100644 --- a/arch/arm64/kvm/hyp/pgtable.c +++ b/arch/arm64/kvm/hyp/pgtable.c @@ -1419,7 +1419,6 @@ kvm_pte_t *kvm_pgtable_stage2_create_unlinked(struct kvm_pgtable *pgt, level + 1); if (ret) { kvm_pgtable_stage2_free_unlinked(mm_ops, pgtable, level); - mm_ops->put_page(pgtable); return ERR_PTR(ret); } @@ -1502,7 +1501,6 @@ static int stage2_split_walker(const struct kvm_pgtable_visit_ctx *ctx, if (!stage2_try_break_pte(ctx, mmu)) { kvm_pgtable_stage2_free_unlinked(mm_ops, childp, level); - mm_ops->put_page(childp); return -EAGAIN; }