Message ID | 20240215010004.1456078-2-seanjc@google.com |
---|---|
State | New |
Headers |
Date: Wed, 14 Feb 2024 17:00:03 -0800
From: Sean Christopherson <seanjc@google.com>
To: Sean Christopherson <seanjc@google.com>, Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, David Matlack <dmatlack@google.com>, Pasha Tatashin <tatashin@google.com>, Michael Krebs <mkrebs@google.com>
Subject: [PATCH 1/2] KVM: x86: Mark target gfn of emulated atomic instruction as dirty
Message-ID: <20240215010004.1456078-2-seanjc@google.com>
In-Reply-To: <20240215010004.1456078-1-seanjc@google.com>
References: <20240215010004.1456078-1-seanjc@google.com>
Reply-To: Sean Christopherson <seanjc@google.com>
List-Id: <linux-kernel.vger.kernel.org>
Series | KVM: x86: Fix dirty logging of emulated atomics
Commit Message
Sean Christopherson
Feb. 15, 2024, 1 a.m. UTC
When emulating an atomic access on behalf of the guest, mark the target
gfn dirty if the CMPXCHG by KVM is attempted and doesn't fault. This
fixes a bug where KVM effectively corrupts guest memory during live
migration by writing to guest memory without informing userspace that the
page is dirty.
Marking the page dirty got unintentionally dropped when KVM's emulated
CMPXCHG was converted to do a user access. Before that, KVM explicitly
mapped the guest page into kernel memory, and marked the page dirty during
the unmap phase.
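For context, a simplified sketch of the two flows (illustrative only, not the exact code on either side of the conversion): in the old flow the dirty-log update came from the unmap with dirty == true, while the bare user access has no such side effect, so the gfn must be marked explicitly.

	/* Old flow (sketch): map the gfn and cmpxchg through the kernel mapping. */
	struct kvm_host_map map;

	if (kvm_vcpu_map(vcpu, gpa_to_gfn(gpa), &map))
		goto emul_write;
	kaddr = map.hva + offset_in_page(gpa);
	/* ... cmpxchg against kaddr ... */
	kvm_vcpu_unmap(vcpu, &map, true);	/* dirty == true marks the gfn dirty */

	/* New flow (sketch): cmpxchg directly on the userspace hva. */
	/* ... __try_cmpxchg_user() against hva ... */
	kvm_vcpu_mark_page_dirty(vcpu, gpa_to_gfn(gpa));	/* must now be explicit */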
Mark the page dirty even if the CMPXCHG fails, as the old data is written
back on failure, i.e. the page is still written. The value written is
guaranteed to be the same because the operation is atomic, but KVM's ABI
is that all writes are dirty logged regardless of the value written. And
more importantly, that's what KVM did before the buggy commit.
Huge kudos to the folks on the Cc list (and many others), who did all the
actual work of triaging and debugging.
Fixes: 1c2361f667f3 ("KVM: x86: Use __try_cmpxchg_user() to emulate atomic accesses")
Cc: stable@vger.kernel.org
Cc: David Matlack <dmatlack@google.com>
Cc: Pasha Tatashin <tatashin@google.com>
Cc: Michael Krebs <mkrebs@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/x86.c | 10 ++++++++++
1 file changed, 10 insertions(+)
Comments
On Wed, Feb 14, 2024 at 5:00 PM Sean Christopherson <seanjc@google.com> wrote:
>
> When emulating an atomic access on behalf of the guest, mark the target
> gfn dirty if the CMPXCHG by KVM is attempted and doesn't fault. This
> fixes a bug where KVM effectively corrupts guest memory during live
> migration by writing to guest memory without informing userspace that the
> page is dirty.
[...]
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Let's try this again, gmail...

Reviewed-by: Jim Mattson <jmattson@google.com>
On Wed, Feb 14, 2024 at 5:00 PM Sean Christopherson <seanjc@google.com> wrote:
>
> When emulating an atomic access on behalf of the guest, mark the target
> gfn dirty if the CMPXCHG by KVM is attempted and doesn't fault. This
> fixes a bug where KVM effectively corrupts guest memory during live
> migration by writing to guest memory without informing userspace that the
> page is dirty.
[...]
> Fixes: 1c2361f667f3 ("KVM: x86: Use __try_cmpxchg_user() to emulate atomic accesses")

I'm only half serious but... Should we just revert this commit?

This commit claims that kvm_vcpu_map() is unsafe because it can race
with mremap(). But there are many other places where KVM uses
kvm_vcpu_map() (e.g. nested VMX). It seems like KVM is just not
compatible with mremap() until we address all the users of
kvm_vcpu_map(). Patching _just_ emulator_cmpxchg_emulated() seems
silly but maybe I'm missing some context on what led to commit
1c2361f667f3 being written.

kvm_vcpu_map/unmap() might not be the best interface, but it serves as
a common choke-point for mapping guest memory to access in KVM. This
is helpful for avoiding missed dirty logging updates (obviously) and
will be even more helpful if we add support for freezing guest memory
and "KVM Userfault" (as discussed in the 1/3 PUCK). I think we all
agree we should do more of this (common choke points), not less. If
there's a usecase for mremap()ing guest memory, we should make
kvm_vcpu_map() play nice with mmu_notifiers.
On Thu, Feb 15, 2024, David Matlack wrote:
> On Wed, Feb 14, 2024 at 5:00 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > When emulating an atomic access on behalf of the guest, mark the target
> > gfn dirty if the CMPXCHG by KVM is attempted and doesn't fault.
[...]
> > Fixes: 1c2361f667f3 ("KVM: x86: Use __try_cmpxchg_user() to emulate atomic accesses")
>
> I'm only half serious but... Should we just revert this commit?

No.

> This commit claims that kvm_vcpu_map() is unsafe because it can race
> with mremap(). But there are many other places where KVM uses
> kvm_vcpu_map() (e.g. nested VMX). It seems like KVM is just not
> compatible with mremap() until we address all the users of
> kvm_vcpu_map(). Patching _just_ emulator_cmpxchg_emulated() seems
> silly but maybe I'm missing some context on what led to commit
> 1c2361f667f3 being written.

The short version is that it's a rather trivial vector for userspace to
trigger UAF, e.g. map non-refcounted memory into a guest and then unmap the
memory. We tried to fix the nVMX usage, but that proved to be
impractical[1].

We haven't forced the issue because it's not obvious that there's meaningful
exposure in practice, e.g. unless userspace is hiding regular memory from
the kernel *and* oversubscribing VMs, a benign userspace won't be affected.
But at the same time, we don't have high confidence that the unsafe behavior
can't be exploited in practice.

What I am pushing for now is an off-by-default module param to let userspace
opt in to unsafe mappings such as these[2]. Because if KVM starts allowing
non-refcounted struct page memory, the ability to exploit these flaws
skyrockets. (Though this reminds me that I need to take another look at
whether or not allowing non-refcounted struct page memory is actually
necessary.)

[1] https://lore.kernel.org/all/ZBEEQtmtNPaEqU1i@google.com
[2] https://lore.kernel.org/all/20230911021637.1941096-4-stevensd@google.com

> kvm_vcpu_map/unmap() might not be the best interface, but it serves as
> a common choke-point for mapping guest memory to access in KVM. This
> is helpful for avoiding missed dirty logging updates (obviously) and
> will be even more helpful if we add support for freezing guest memory
> and "KVM Userfault" (as discussed in the 1/3 PUCK). I think we all
> agree we should do more of this (common choke points), not less. If
> there's a usecase for mremap()ing guest memory, we should make
> kvm_vcpu_map() play nice with mmu_notifiers.

I agree, but KVM needs to use __try_cmpxchg_user() anyways, when updating
guest A/D bits in FNAME(update_accessed_dirty_bits)().
And that one we *definitely* don't want to revert; see commit 2a8859f373b0
("KVM: x86/mmu: do compare-and-exchange of gPTE via the user address") for
details on how broken the previous code was.

In other words, reverting to kvm_vcpu_{un,}map() *probably* isn't wildly
unsafe, but it also doesn't really buy us anything, and long term we have
line of sight to closing the holes for good. And unlike the nVMX code, where
it's reasonable for KVM to disallow using non-refcounted memory for VMCS
pages, disallowing such memory for emulated atomic accesses is less
reasonable.

Rather than revert, to make this more robust in the longer term, we can add
a wrapper in KVM to mark the gfn dirty. I didn't do it here because I was
hustling to get this minimal fix posted. E.g.

--
Subject: [PATCH] KVM: x86: Provide a wrapper for __try_cmpxchg_user() to mark the gfn dirty

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/paging_tmpl.h |  4 ++--
 arch/x86/kvm/x86.c             | 25 +++++++++----------------
 arch/x86/kvm/x86.h             | 19 +++++++++++++++++++
 3 files changed, 30 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 4d4e98fe4f35..a8123406fe99 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -246,11 +246,11 @@ static int FNAME(update_accessed_dirty_bits)(struct kvm_vcpu *vcpu,
 		if (unlikely(!walker->pte_writable[level - 1]))
 			continue;
 
-		ret = __try_cmpxchg_user(ptep_user, &orig_pte, pte, fault);
+		ret = kvm_try_cmpxchg_user(ptep_user, &orig_pte, pte, fault,
+					   vcpu, table_gfn);
 		if (ret)
 			return ret;
 
-		kvm_vcpu_mark_page_dirty(vcpu, table_gfn);
 		walker->ptes[level - 1] = pte;
 	}
 	return 0;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3ec9781d6122..bedb51fbbad3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7946,8 +7946,9 @@ static int emulator_write_emulated(struct x86_emulate_ctxt *ctxt,
 				   exception, &write_emultor);
 }
 
-#define emulator_try_cmpxchg_user(t, ptr, old, new) \
-	(__try_cmpxchg_user((t __user *)(ptr), (t *)(old), *(t *)(new), efault ## t))
+#define emulator_try_cmpxchg_user(t, ptr, old, new, vcpu, gfn) \
+	(kvm_try_cmpxchg_user((t __user *)(ptr), (t *)(old), *(t *)(new), \
+			      efault ## t, vcpu, gfn))
 
 static int emulator_cmpxchg_emulated(struct x86_emulate_ctxt *ctxt,
 				     unsigned long addr,
@@ -7960,6 +7961,7 @@ static int emulator_cmpxchg_emulated(struct x86_emulate_ctxt *ctxt,
 	u64 page_line_mask;
 	unsigned long hva;
 	gpa_t gpa;
+	gfn_t gfn;
 	int r;
 
 	/* guests cmpxchg8b have to be emulated atomically */
@@ -7990,18 +7992,19 @@ static int emulator_cmpxchg_emulated(struct x86_emulate_ctxt *ctxt,
 
 	hva += offset_in_page(gpa);
 
+	gfn = gpa_to_gfn(gpa);
 	switch (bytes) {
 	case 1:
-		r = emulator_try_cmpxchg_user(u8, hva, old, new);
+		r = emulator_try_cmpxchg_user(u8, hva, old, new, vcpu, gfn);
 		break;
 	case 2:
-		r = emulator_try_cmpxchg_user(u16, hva, old, new);
+		r = emulator_try_cmpxchg_user(u16, hva, old, new, vcpu, gfn);
 		break;
 	case 4:
-		r = emulator_try_cmpxchg_user(u32, hva, old, new);
+		r = emulator_try_cmpxchg_user(u32, hva, old, new, vcpu, gfn);
 		break;
 	case 8:
-		r = emulator_try_cmpxchg_user(u64, hva, old, new);
+		r = emulator_try_cmpxchg_user(u64, hva, old, new, vcpu, gfn);
 		break;
 	default:
 		BUG();
@@ -8009,16 +8012,6 @@ static int emulator_cmpxchg_emulated(struct x86_emulate_ctxt *ctxt,
 
 	if (r < 0)
 		return X86EMUL_UNHANDLEABLE;
-
-	/*
-	 * Mark the page dirty _before_ checking whether or not the CMPXCHG was
-	 * successful, as the old value is written back on failure.  Note, for
-	 * live migration, this is unnecessarily conservative as CMPXCHG writes
-	 * back the original value and the access is atomic, but KVM's ABI is
-	 * that all writes are dirty logged, regardless of the value written.
-	 */
-	kvm_vcpu_mark_page_dirty(vcpu, gpa_to_gfn(gpa));
-
 	if (r)
 		return X86EMUL_CMPXCHG_FAILED;
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 2f7e19166658..2fabc7cd7e39 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -290,6 +290,25 @@ static inline bool kvm_check_has_quirk(struct kvm *kvm, u64 quirk)
 	return !(kvm->arch.disabled_quirks & quirk);
 }
 
+
+/*
+ * Mark the page dirty even if the CMPXCHG fails (but didn't fault), as the
+ * old value is written back on failure.  Note, for live migration, this is
+ * unnecessarily conservative as CMPXCHG writes back the original value and
+ * the access is atomic, but KVM's ABI is that all writes are dirty logged,
+ * regardless of the value written.
+ */
+#define kvm_try_cmpxchg_user(ptr, oldp, nval, label, vcpu, gfn)	\
+({									\
+	int ret;							\
+									\
+	ret = __try_cmpxchg_user(ptr, oldp, nval, label);		\
+	if (ret >= 0)							\
+		kvm_vcpu_mark_page_dirty(vcpu, gfn);			\
+	ret;								\
+})
+
 void kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq, int inc_eip);
 
 u64 get_kvmclock_ns(struct kvm *kvm);

base-commit: 6769ea8da8a93ed4630f1ce64df6aafcaabfce64
--
On Thu, Feb 15, 2024, Sean Christopherson wrote:
> On Thu, Feb 15, 2024, David Matlack wrote:
> > On Wed, Feb 14, 2024 at 5:00 PM Sean Christopherson <seanjc@google.com> wrote:
[...]
> > I'm only half serious but... Should we just revert this commit?
>
> No.

David, any objection to this patch? I'd like to get this on its way to Paolo
asap, but also want to make sure we all agree this is the right solution
before doing so.
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b66c45e7f6f8..3ec9781d6122 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8009,6 +8009,16 @@ static int emulator_cmpxchg_emulated(struct x86_emulate_ctxt *ctxt,
 
 	if (r < 0)
 		return X86EMUL_UNHANDLEABLE;
+
+	/*
+	 * Mark the page dirty _before_ checking whether or not the CMPXCHG was
+	 * successful, as the old value is written back on failure.  Note, for
+	 * live migration, this is unnecessarily conservative as CMPXCHG writes
+	 * back the original value and the access is atomic, but KVM's ABI is
+	 * that all writes are dirty logged, regardless of the value written.
+	 */
+	kvm_vcpu_mark_page_dirty(vcpu, gpa_to_gfn(gpa));
+
 	if (r)
 		return X86EMUL_CMPXCHG_FAILED;
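For completeness, the kvm_vcpu_mark_page_dirty() call restored above feeds the per-memslot dirty bitmap that userspace harvests during live migration. A hypothetical userspace fragment (not part of this series; assumes an x86 host, a KVM VM fd in vm_fd, and a memslot created with KVM_MEM_LOG_DIRTY_PAGES) sketches how a VMM or selftest would observe the bit:

#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Returns 1 if the page at 'gfn_offset' (relative to the slot base) is dirty,
 * 0 if clean, -1 on error. */
static int gfn_is_dirty(int vm_fd, uint32_t slot, uint64_t npages, uint64_t gfn_offset)
{
	size_t bitmap_bytes = ((npages + 63) / 64) * 8;	/* one bit per page, 64-bit aligned */
	uint64_t *bitmap = calloc(1, bitmap_bytes);
	struct kvm_dirty_log log = {
		.slot = slot,
		.dirty_bitmap = bitmap,
	};
	int dirty = -1;

	/* Fetch-and-clear the slot's dirty bitmap. */
	if (bitmap && !ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log))
		/* With the fix, an emulated atomic to this gfn sets its bit. */
		dirty = !!(bitmap[gfn_offset / 64] & (1ULL << (gfn_offset % 64)));

	free(bitmap);
	return dirty;
}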