Message ID | 20221020093055.224317-5-mlevitsk@redhat.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:4f06:0:0:0:0:0 with SMTP id c6csp175734wru; Thu, 20 Oct 2022 02:39:00 -0700 (PDT) X-Google-Smtp-Source: AMsMyM7SAJOAp4FLlQDCZb1PWZn0Gvmi69bFYDIwj0+zMn/nLyc5it7n7OBFZmT+f0TnwkfEtZCs X-Received: by 2002:a17:902:7c8a:b0:17b:6eaa:5da3 with SMTP id y10-20020a1709027c8a00b0017b6eaa5da3mr12803027pll.33.1666258740633; Thu, 20 Oct 2022 02:39:00 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1666258740; cv=none; d=google.com; s=arc-20160816; b=bWANayvH/Vc83MsVUgLd8/YnPoRKANJxJieKm5J6pjsCl+XM2Hcz+894H5uSEzn7J6 D1QGAtLETKB8nUISmlZ9i5Eh83M7UVe8GivyGgKD97Bys2g1leqczckdsM59N1H4/yCo xKRV6FnXb1s9nSuK4KHpzMIWQNUdeuAuy+rPLpVOIbg3PAgF1g8DQ9PYVsJQak6FhiDb tIOF4FkQpVCiTxqwefWKM060GwYbtw8iPLGGbrq4oOdsUE0pXJxMI/c+eZi9zQ2Ddars n8TbqHuEELKDHcDwIfHliIQFb0MMOutnnSsOFop9yNK4QtsfWM8Bvln2fLgk3Dt/zcO+ kpJg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=nm86sopHALiAvOTB3Mb1uaCRxtNMK3Osw7jYg3LC+M0=; b=HK3dxwDPusaobO1KUYgvxJWQyZQ5bR63D3CwEc42Cfl6xAASvfZyoo1dd+xVwODPHe 3T6V4wD7Z+q7c3k0VHl4W70tjwz2IwSN80IGenpBebUjw/FW8oOKUIsqAstlk1QGr8dy DvoP7Phj7CECPa+XV/EQGz1nKAHr0waXrStDgeavD4xa/kCApxa1FGfq69elE2mcRRp2 C9TsPnmXqZBwQS2rs1qXJz57JcCyhdCUh4e8fe+n9rbcEkRirlxtocoDn3DjlknohOXD kM8rzVANjHwyD1+Q2d6rAV9a2ypYkqIkd5uaKMlRudIOJUUPdUss9VXrdjq2e6HwUSgM BPDw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=PrBzeCYA; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id s19-20020a63f053000000b0044b412c1a77si20001757pgj.362.2022.10.20.02.38.46; Thu, 20 Oct 2022 02:39:00 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=PrBzeCYA; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230453AbiJTJbp (ORCPT <rfc822;realc9580@gmail.com> + 99 others); Thu, 20 Oct 2022 05:31:45 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57578 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230439AbiJTJb3 (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Thu, 20 Oct 2022 05:31:29 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0A6CC157F6F for <linux-kernel@vger.kernel.org>; Thu, 20 Oct 2022 02:31:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1666258283; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=nm86sopHALiAvOTB3Mb1uaCRxtNMK3Osw7jYg3LC+M0=; b=PrBzeCYAoQuMxz23urWDMNkIm5te+d7ccZd/Ny3fT2+GBPnEdZQzALrJUB0+ZWxImEOq0R FxlTuhZwN9aiV2wCuoOqdZKlnGjeOvZ3/+qd0nm8GSV8P1te/gh8W9+vapaOfVD/6YmAEk iJ9eu6619IYI28GShB35+XZwssdMIUc= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-423-MgE-q4rgP1SE4fwskpXN-A-1; Thu, 20 Oct 2022 05:31:18 -0400 X-MC-Unique: MgE-q4rgP1SE4fwskpXN-A-1 Received: from smtp.corp.redhat.com (int-mx09.intmail.prod.int.rdu2.redhat.com [10.11.54.9]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 9305729324B0; Thu, 20 Oct 2022 09:31:17 +0000 (UTC) Received: from localhost.localdomain (ovpn-192-51.brq.redhat.com [10.40.192.51]) by smtp.corp.redhat.com (Postfix) with ESMTP id C4AAC49BB60; Thu, 20 Oct 2022 09:31:14 +0000 (UTC) From: Maxim Levitsky <mlevitsk@redhat.com> To: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org, Paolo Bonzini <pbonzini@redhat.com>, Borislav Petkov <bp@alien8.de>, Ingo Molnar <mingo@redhat.com>, Sean Christopherson <seanjc@google.com>, x86@kernel.org, Thomas Gleixner <tglx@linutronix.de>, Dave Hansen <dave.hansen@linux.intel.com>, "H. Peter Anvin" <hpa@zytor.com>, Maxim Levitsky <mlevitsk@redhat.com>, stable@vger.kernel.org Subject: [PATCH 4/4] KVM: x86: forcibly leave nested mode on vCPU reset Date: Thu, 20 Oct 2022 12:30:55 +0300 Message-Id: <20221020093055.224317-5-mlevitsk@redhat.com> In-Reply-To: <20221020093055.224317-1-mlevitsk@redhat.com> References: <20221020093055.224317-1-mlevitsk@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Scanned-By: MIMEDefang 3.1 on 10.11.54.9 X-Spam-Status: No, score=-2.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1747198924972366287?= X-GMAIL-MSGID: =?utf-8?q?1747198924972366287?= |
Series |
nSVM: fix L0 crash if L2 has shutdown condtion which L1 doesn't intercept
|
|
Commit Message
Maxim Levitsky
Oct. 20, 2022, 9:30 a.m. UTC
While not obivous, kvm_vcpu_reset leaves the nested mode by
clearing 'vcpu->arch.hflags' but it does so without all the
required housekeeping.
This makes SVM and VMX continue to use vmcs02/vmcb02 while
the cpu is not in nested mode.
In particular, in SVM code, it makes the 'svm_free_nested'
free the vmcb02, while still in use, which later triggers
use after free and a kernel crash.
This issue is assigned CVE-2022-3344
Cc: stable@vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
arch/x86/kvm/x86.c | 1 +
1 file changed, 1 insertion(+)
Comments
On Thu, Oct 20, 2022, Maxim Levitsky wrote: > While not obivous, kvm_vcpu_reset leaves the nested mode by Please add () when referencing function, and wrap closer to ~75 chars. > clearing 'vcpu->arch.hflags' but it does so without all the > required housekeeping. > > This makes SVM and VMX continue to use vmcs02/vmcb02 while This bug should be impossible to hit on VMX as INIT and TRIPLE_FAULT unconditionally cause VM-Exit, i.e. will always be forwarded to L1. > the cpu is not in nested mode. Can you add a blurb to call out exactly how this bug can be triggered? Doesn't take much effort to suss out the "how", but it'd be nice to capture that info in the changelog. > In particular, in SVM code, it makes the 'svm_free_nested' > free the vmcb02, while still in use, which later triggers > use after free and a kernel crash. > > This issue is assigned CVE-2022-3344 > > Cc: stable@vger.kernel.org > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> > --- > arch/x86/kvm/x86.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index d86a8aae1471d3..313c4a6dc65e45 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -11931,6 +11931,7 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) > WARN_ON_ONCE(!init_event && > (old_cr0 || kvm_read_cr3(vcpu) || kvm_read_cr4(vcpu))); > > + kvm_leave_nested(vcpu); Not a big deal, especially if/when nested_ops are turned into static_calls, but at the same time it's quite easy to do: if (is_guest_mode(vcpu)) kvm_leave_nested(vcpu); I think it's worth adding a comment explaining how this can happen, and to also call out that EFER is cleared on INIT, i.e. that virtualization is disabled due to EFER.SVME=0. Unsurprisingly, I don't see anything in the APM that explicitly states what happens if INIT occurs in guest mode, i.e. it's not immediately obvious that forcing the vCPU back to L1 is architecturally correct. > kvm_lapic_reset(vcpu, init_event); > > vcpu->arch.hflags = 0; Maybe add a WARN above this to try and detect other potential issues? Kinda silly, but it'd at least help draw attention to the importance of hflags. E.g. this? diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 4bd5f8a751de..c50fa0751a0b 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -11915,6 +11915,15 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) unsigned long old_cr0 = kvm_read_cr0(vcpu); unsigned long new_cr0; + /* + * SVM doesn't unconditionally VM-Exit on INIT and SHUTDOWN, thus it's + * possible to INIT the vCPU while L2 is active. Force the vCPU back + * into L1 as EFER.SVME is cleared on INIT (along with all other EFER + * bits), i.e. virtualization is disabled. + */ + if (is_guest_mode(vcpu)) + kvm_leave_nested(vcpu); + /* * Several of the "set" flows, e.g. ->set_cr0(), read other registers * to handle side effects. RESET emulation hits those flows and relies @@ -11927,6 +11936,7 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) kvm_lapic_reset(vcpu, init_event); + WARN_ON_ONCE(is_guest_mode(vcpu) || is_smm(vcpu)); vcpu->arch.hflags = 0; vcpu->arch.smi_pending = 0;
On Thu, 2022-10-20 at 15:33 +0000, Sean Christopherson wrote: > On Thu, Oct 20, 2022, Maxim Levitsky wrote: > > While not obivous, kvm_vcpu_reset leaves the nested mode by > > Please add () when referencing function, and wrap closer to ~75 chars. > > > clearing 'vcpu->arch.hflags' but it does so without all the > > required housekeeping. > > > > This makes SVM and VMX continue to use vmcs02/vmcb02 while > > This bug should be impossible to hit on VMX as INIT and TRIPLE_FAULT unconditionally > cause VM-Exit, i.e. will always be forwarded to L1. True I guess as I found out as well, in VMX the physical CPU can't be reset while in guest mode. I'll update the changelog. > > > the cpu is not in nested mode. > > Can you add a blurb to call out exactly how this bug can be triggered? Doesn't > take much effort to suss out the "how", but it'd be nice to capture that info in > the changelog. I will add (in another patch) a selftest for this. > > > In particular, in SVM code, it makes the 'svm_free_nested' > > free the vmcb02, while still in use, which later triggers > > use after free and a kernel crash. > > > > This issue is assigned CVE-2022-3344 > > > > Cc: stable@vger.kernel.org > > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> > > --- > > arch/x86/kvm/x86.c | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > > index d86a8aae1471d3..313c4a6dc65e45 100644 > > --- a/arch/x86/kvm/x86.c > > +++ b/arch/x86/kvm/x86.c > > @@ -11931,6 +11931,7 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) > > WARN_ON_ONCE(!init_event && > > (old_cr0 || kvm_read_cr3(vcpu) || kvm_read_cr4(vcpu))); > > > > + kvm_leave_nested(vcpu); > > Not a big deal, especially if/when nested_ops are turned into static_calls, but > at the same time it's quite easy to do: > > if (is_guest_mode(vcpu)) > kvm_leave_nested(vcpu); > > I think it's worth adding a comment explaining how this can happen, and to also > call out that EFER is cleared on INIT, i.e. that virtualization is disabled due > to EFER.SVME=0. Unsurprisingly, I don't see anything in the APM that explicitly > states what happens if INIT occurs in guest mode, i.e. it's not immediately obvious > that forcing the vCPU back to L1 is architecturally correct. > > > > kvm_lapic_reset(vcpu, init_event); > > > > vcpu->arch.hflags = 0; > > Maybe add a WARN above this to try and detect other potential issues? Kinda silly, > but it'd at least help draw attention to the importance of hflags. > > E.g. this? > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index 4bd5f8a751de..c50fa0751a0b 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -11915,6 +11915,15 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) > unsigned long old_cr0 = kvm_read_cr0(vcpu); > unsigned long new_cr0; > > + /* > + * SVM doesn't unconditionally VM-Exit on INIT and SHUTDOWN, thus it's > + * possible to INIT the vCPU while L2 is active. Force the vCPU back > + * into L1 as EFER.SVME is cleared on INIT (along with all other EFER > + * bits), i.e. virtualization is disabled. > + */ > + if (is_guest_mode(vcpu)) > + kvm_leave_nested(vcpu); > + > /* > * Several of the "set" flows, e.g. ->set_cr0(), read other registers > * to handle side effects. RESET emulation hits those flows and relies > @@ -11927,6 +11936,7 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) > > kvm_lapic_reset(vcpu, init_event); > > + WARN_ON_ONCE(is_guest_mode(vcpu) || is_smm(vcpu)); > vcpu->arch.hflags = 0; > > vcpu->arch.smi_pending = 0; > Best regards, Maxim Levitsky
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index d86a8aae1471d3..313c4a6dc65e45 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -11931,6 +11931,7 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) WARN_ON_ONCE(!init_event && (old_cr0 || kvm_read_cr3(vcpu) || kvm_read_cr4(vcpu))); + kvm_leave_nested(vcpu); kvm_lapic_reset(vcpu, init_event); vcpu->arch.hflags = 0;