Message ID | 20230601142309.6307-4-guang.zeng@intel.com |
---|---|
State | New |
Headers |
From: Zeng Guang <guang.zeng@intel.com>
To: Paolo Bonzini <pbonzini@redhat.com>, Sean Christopherson <seanjc@google.com>, Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>, Dave Hansen <dave.hansen@linux.intel.com>, H Peter Anvin <hpa@zytor.com>, kvm@vger.kernel.org
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Zeng Guang <guang.zeng@intel.com>
Subject: [PATCH v1 3/6] KVM: VMX: Add new ops in kvm_x86_ops for LASS violation check
Date: Thu, 1 Jun 2023 22:23:06 +0800
Message-Id: <20230601142309.6307-4-guang.zeng@intel.com>
In-Reply-To: <20230601142309.6307-1-guang.zeng@intel.com>
References: <20230601142309.6307-1-guang.zeng@intel.com> |
Series | LASS KVM virtualization support |
Commit Message
Zeng Guang
June 1, 2023, 2:23 p.m. UTC
Intel introduces the LASS (Linear Address Space Separation) feature, providing
an independent mechanism to achieve mode-based protection.

LASS partitions the 64-bit linear address space into two halves, user-mode
addresses (LA[bit 63]=0) and supervisor-mode addresses (LA[bit 63]=1). It
stops any code execution or conditional data access[1]
1. from user mode to supervisor-mode address space
2. from supervisor mode to user-mode address space
and generates a LASS violation fault accordingly.

[1] A supervisor-mode data access causes a LASS violation only if supervisor-
mode access protection is enabled (CR4.SMAP = 1) and either RFLAGS.AC = 0
or the access implicitly accesses a system data structure.

Following are the rules of the LASS violation check on the linear address (LA).
User access to supervisor-mode address space:
	LA[bit 63] && (CPL == 3)
Supervisor access to user-mode address space:
	Instruction fetch: !LA[bit 63] && (CPL < 3)
	Data access: !LA[bit 63] && (CR4.SMAP == 1) && ((RFLAGS.AC == 0 &&
		     CPL < 3) || Implicit supervisor access)

Add new ops in kvm_x86_ops to do the LASS violation check.

Signed-off-by: Zeng Guang <guang.zeng@intel.com>
Tested-by: Xuelian Guo <xuelian.guo@intel.com>
---
 arch/x86/include/asm/kvm-x86-ops.h |  3 +-
 arch/x86/include/asm/kvm_host.h    |  2 ++
 arch/x86/kvm/kvm_emulate.h         |  1 +
 arch/x86/kvm/vmx/vmx.c             | 47 ++++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/vmx.h             |  2 ++
 5 files changed, 54 insertions(+), 1 deletion(-)
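For readers who want to sanity-check the rule table, a minimal, self-contained C model of the checks is sketched below; the struct, field, and function names are purely illustrative and not part of the patch:

	#include <assert.h>
	#include <stdbool.h>
	#include <stdint.h>

	/* Inputs to the check, modeled as plain values (illustrative only). */
	struct lass_ctx {
		uint64_t la;      /* linear address under test */
		unsigned int cpl; /* current privilege level, 0-3 */
		bool fetch;       /* instruction fetch (vs. data access) */
		bool implicit;    /* implicit access to a system data structure */
		bool cr4_smap;    /* CR4.SMAP */
		bool rflags_ac;   /* RFLAGS.AC */
	};

	/* True if the access violates the LASS rules quoted above. */
	static bool lass_violation(const struct lass_ctx *c)
	{
		bool user_addr = !(c->la >> 63);

		/* User-mode access: faults iff it touches the supervisor half. */
		if (c->cpl == 3 && !c->implicit)
			return !user_addr;

		/* Supervisor-mode access to the supervisor half is always fine. */
		if (!user_addr)
			return false;

		/* Supervisor instruction fetch from the user half always faults. */
		if (c->fetch)
			return true;

		/*
		 * Supervisor data access to the user half: faults only with
		 * CR4.SMAP=1 and (RFLAGS.AC=0 or an implicit access, for which
		 * the CPU acts as if RFLAGS.AC were clear).
		 */
		return c->cr4_smap && (!c->rflags_ac || c->implicit);
	}

	int main(void)
	{
		struct lass_ctx c = { .la = 1ull << 63, .cpl = 3 };

		assert(lass_violation(&c));   /* user -> supervisor half */
		c.cpl = 0;
		assert(!lass_violation(&c));  /* supervisor -> own half */
		c.la = 0;
		c.fetch = true;
		assert(lass_violation(&c));   /* supervisor fetch from user half */
		return 0;
	}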
Comments
On 6/1/2023 10:23 PM, Zeng Guang wrote:
> Intel introduces LASS (Linear Address Separation) feature providing
                                        ^
missing "Space" here

> an independent mechanism to achieve the mode-based protection.
> [...]
> Add new ops in kvm_x86_ops to do LASS violation check.
>
> Signed-off-by: Zeng Guang <guang.zeng@intel.com>
> Tested-by: Xuelian Guo <xuelian.guo@intel.com>
> ---
>  arch/x86/include/asm/kvm-x86-ops.h |  3 +-
>  arch/x86/include/asm/kvm_host.h    |  2 ++
>  arch/x86/kvm/kvm_emulate.h         |  1 +
>  arch/x86/kvm/vmx/vmx.c             | 47 ++++++++++++++++++++++++++++++
>  arch/x86/kvm/vmx/vmx.h             |  2 ++
>  5 files changed, 54 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> index 13bc212cd4bc..8980a3bfa687 100644
> --- a/arch/x86/include/asm/kvm-x86-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> @@ -132,7 +132,8 @@ KVM_X86_OP_OPTIONAL(migrate_timers)
>  KVM_X86_OP(msr_filter_changed)
>  KVM_X86_OP(complete_emulated_msr)
>  KVM_X86_OP(vcpu_deliver_sipi_vector)
> -KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
> +KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons)
> +KVM_X86_OP_OPTIONAL_RET0(check_lass)
>
>  #undef KVM_X86_OP
>  #undef KVM_X86_OP_OPTIONAL
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 92d8e65fe88c..98666d1e7727 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1731,6 +1731,8 @@ struct kvm_x86_ops {
>  	 * Returns vCPU specific APICv inhibit reasons
>  	 */
>  	unsigned long (*vcpu_get_apicv_inhibit_reasons)(struct kvm_vcpu *vcpu);
> +
> +	bool (*check_lass)(struct kvm_vcpu *vcpu, u64 access, u64 la, u32 flags);
>  };
>
>  struct kvm_x86_nested_ops {
> diff --git a/arch/x86/kvm/kvm_emulate.h b/arch/x86/kvm/kvm_emulate.h
> index 5b9ec610b2cb..f1439ab7c14b 100644
> --- a/arch/x86/kvm/kvm_emulate.h
> +++ b/arch/x86/kvm/kvm_emulate.h
> @@ -91,6 +91,7 @@ struct x86_instruction_info {
>  /* x86-specific emulation flags */
>  #define X86EMUL_F_FETCH		BIT(0)
>  #define X86EMUL_F_WRITE		BIT(1)
> +#define X86EMUL_F_SKIPLASS	BIT(2)
>
>  struct x86_emulate_ops {
>  	void (*vm_bugged)(struct x86_emulate_ctxt *ctxt);
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index a33205ded85c..876997e8448e 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -8130,6 +8130,51 @@ static void vmx_vm_destroy(struct kvm *kvm)
>  	free_pages((unsigned long)kvm_vmx->pid_table, vmx_get_pid_table_order(kvm));
>  }
>
> +/*
> + * Determine whether an access to the linear address causes a LASS violation.
> + * LASS protection is only effective in long mode. As a prerequisite, caller
> + * should make sure vCPU running in long mode and invoke this api to do LASS
> + * violation check.
> + */
> +bool vmx_check_lass(struct kvm_vcpu *vcpu, u64 access, u64 la, u32 flags)
> +{
> +	bool user_mode, user_as, rflags_ac;
> +
> +	if (!!(flags & X86EMUL_F_SKIPLASS) ||
> +	    !kvm_is_cr4_bit_set(vcpu, X86_CR4_LASS))
> +		return false;
> +
> +	WARN_ON_ONCE(!is_long_mode(vcpu));

IMHO, it's better to skip the following checks and return false if it is
out of long mode.

> +
> +	user_as = !(la >> 63);

It's better to describe how LASS treats the linear address in compatibility
mode in the changelog or/and in a comment, i.e. for a linear address with
only 32 bits (or 16 bits), the processor treats bit 63 as if it were 0.

> +
> +	/*
> +	 * An access is a supervisor-mode access if CPL < 3 or if it implicitly
> +	 * accesses a system data structure. For implicit accesses to system
> +	 * data structure, the processor acts as if RFLAGS.AC is clear.
> +	 */
> +	if (access & PFERR_IMPLICIT_ACCESS) {
> +		user_mode = false;
> +		rflags_ac = false;
> +	} else {
> +		user_mode = vmx_get_cpl(vcpu) == 3;
> +		if (!user_mode)
> +			rflags_ac = !!(kvm_get_rflags(vcpu) & X86_EFLAGS_AC);
> +	}
> +
> +	if (user_mode == user_as)
> +		return false;
> +
> +	/*
> +	 * Supervisor-mode _data_ accesses to user address space
> +	 * cause LASS violations only if SMAP is enabled.
> +	 */
> +	if (!user_mode && !(access & PFERR_FETCH_MASK))
> +		return kvm_is_cr4_bit_set(vcpu, X86_CR4_SMAP) && !rflags_ac;
> +
> +	return true;
> +}
> +
>  static struct kvm_x86_ops vmx_x86_ops __initdata = {
>  	.name = KBUILD_MODNAME,
>
> @@ -8269,6 +8314,8 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
>  	.complete_emulated_msr = kvm_complete_insn_gp,
>
>  	.vcpu_deliver_sipi_vector = kvm_vcpu_deliver_sipi_vector,
> +
> +	.check_lass = vmx_check_lass,
>  };
>
>  static unsigned int vmx_handle_intel_pt_intr(void)
> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> index 9e66531861cf..f2e775b9849b 100644
> --- a/arch/x86/kvm/vmx/vmx.h
> +++ b/arch/x86/kvm/vmx/vmx.h
> @@ -433,6 +433,8 @@ void vmx_enable_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type);
>  u64 vmx_get_l2_tsc_offset(struct kvm_vcpu *vcpu);
>  u64 vmx_get_l2_tsc_multiplier(struct kvm_vcpu *vcpu);
>
> +bool vmx_check_lass(struct kvm_vcpu *vcpu, u64 access, u64 la, u32 flags);
> +
>  static inline void vmx_set_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr,
>  					     int type, bool value)
>  {
On Thu, Jun 01, 2023 at 10:23:06PM +0800, Zeng Guang wrote:
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 92d8e65fe88c..98666d1e7727 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1731,6 +1731,8 @@ struct kvm_x86_ops {
>  	 * Returns vCPU specific APICv inhibit reasons
>  	 */
>  	unsigned long (*vcpu_get_apicv_inhibit_reasons)(struct kvm_vcpu *vcpu);
> +
> +	bool (*check_lass)(struct kvm_vcpu *vcpu, u64 access, u64 la, u32 flags);

It is better to declare the @la as gva_t since the address is a virtual
address.

Both @access and @flags provide additional information about a memory access.
I think we can drop one of them, e.g. by adding a new bit
X86EMUL_F_IMPLICIT_ACCESS.

Or maybe in the first place, we can just extend PFERR_? for SKIP_LASS/LAM
behavior instead of adding another set of flags (X86EMUL_F_?). The benefit of
adding new flags is they won't collide with future hardware extensions. I am
not sure.
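For illustration, the shape Chao suggests might look like the sketch below; the gva_t-based prototype, the bit position, and the X86EMUL_F_IMPLICIT_ACCESS define are hypothetical, not code from this series:

	/* Hypothetical extension of the x86-emulation flags in kvm_emulate.h. */
	#define X86EMUL_F_FETCH			BIT(0)
	#define X86EMUL_F_WRITE			BIT(1)
	#define X86EMUL_F_SKIPLASS		BIT(2)
	#define X86EMUL_F_IMPLICIT_ACCESS	BIT(3)	/* would replace the PFERR_ bit */

	/* The op could then take a gva_t and a single flags word: */
	bool (*check_lass)(struct kvm_vcpu *vcpu, gva_t la, u32 flags);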
On Mon, 5 Jun 2023 11:31:48 +0800 Binbin Wu <binbin.wu@linux.intel.com> wrote:

> On 6/1/2023 10:23 PM, Zeng Guang wrote:
> [...]
> > +	if (!!(flags & X86EMUL_F_SKIPLASS) ||
> > +	    !kvm_is_cr4_bit_set(vcpu, X86_CR4_LASS))
> > +		return false;
> > +
> > +	WARN_ON_ONCE(!is_long_mode(vcpu));
> IMHO, it's better to skip the following checks and return false if it is
> out of long mode.
>
The check of long mode is in the caller implemented in the next patch. :)

+	if (!is_long_mode(vcpu))
+		return false;
On Thu, Jun 01, 2023 at 10:23:06PM +0800, Zeng Guang wrote:
> [...]
> +	/*
> +	 * An access is a supervisor-mode access if CPL < 3 or if it implicitly
> +	 * accesses a system data structure. For implicit accesses to system
> +	 * data structure, the processor acts as if RFLAGS.AC is clear.
> +	 */
> +	if (access & PFERR_IMPLICIT_ACCESS) {
> +		user_mode = false;
> +		rflags_ac = false;
> +	} else {
> +		user_mode = vmx_get_cpl(vcpu) == 3;
> +		if (!user_mode)
> +			rflags_ac = !!(kvm_get_rflags(vcpu) & X86_EFLAGS_AC);
> +	}
> +
> +	if (user_mode == user_as)

Confused by user_as, it's the role of the address (U/S), so how about
"user_addr"? "if (user_mode == user_addr)" looks more clear to me.

> +		return false;
> [...]
On 6/5/2023 8:53 PM, Zhi Wang wrote:
> On Mon, 5 Jun 2023 11:31:48 +0800
> Binbin Wu <binbin.wu@linux.intel.com> wrote:
>> On 6/1/2023 10:23 PM, Zeng Guang wrote:
>> [...]
>>> +	WARN_ON_ONCE(!is_long_mode(vcpu));
>> IMHO, it's better to skip the following checks and return false if it is
>> out of long mode.
>>
> The check of long mode is in the caller implemented in the next patch. :)
>
> +	if (!is_long_mode(vcpu))
> +		return false;

I know the callers have checked the mode, however, IMHO, it's better as
following:

+	if (!!(flags & X86EMUL_F_SKIPLASS) ||
+	    !kvm_is_cr4_bit_set(vcpu, X86_CR4_LASS) ||
+	    WARN_ON_ONCE(!is_long_mode(vcpu)))
+		return false;
On 6/5/2023 10:07 PM, Yuan Yao wrote:
> On Thu, Jun 01, 2023 at 10:23:06PM +0800, Zeng Guang wrote:
>> [...]
>> +	if (user_mode == user_as)
> Confused by user_as, it's the role of the address (U/S), so how about
> "user_addr"? "if (user_mode == user_addr)" looks more clear to me.
>
Actually "as" stands for "address space". I suppose it is more precise. :)
On Tue, 6 Jun 2023 10:57:23 +0800 Binbin Wu <binbin.wu@linux.intel.com> wrote:

> On 6/5/2023 8:53 PM, Zhi Wang wrote:
> > [...]
> > The check of long mode is in the caller implemented in the next patch. :)
> >
> > +	if (!is_long_mode(vcpu))
> > +		return false;
> I know the callers have checked the mode, however, IMHO, it's better as
> following:
>
> +	if (!!(flags & X86EMUL_F_SKIPLASS) ||
> +	    !kvm_is_cr4_bit_set(vcpu, X86_CR4_LASS) ||
> +	    WARN_ON_ONCE(!is_long_mode(vcpu)))
> +		return false;
>
Uh. I see. LGTM.
On 6/5/2023 11:47 AM, Gao, Chao wrote:
> On Thu, Jun 01, 2023 at 10:23:06PM +0800, Zeng Guang wrote:
>> [...]
>> +	bool (*check_lass)(struct kvm_vcpu *vcpu, u64 access, u64 la, u32 flags);
> It is better to declare the @la as gva_t since the address is a virtual
> address.
>
> Both @access and @flags provide additional information about a memory access.
> I think we can drop one of them, e.g. by adding a new bit
> X86EMUL_F_IMPLICIT_ACCESS.
>
> Or maybe in the first place, we can just extend PFERR_? for SKIP_LASS/LAM
> behavior instead of adding another set of flags (X86EMUL_F_?). The benefit of
> adding new flags is they won't collide with future hardware extensions. I am
> not sure.

Makes sense. I prefer adding a new bit to the X86EMUL flags. PFERR_ is used
for the page fault case and is not really proper to be taken for LASS/LAM
usage.
On 6/5/2023 11:31 AM, Binbin Wu wrote:
> On 6/1/2023 10:23 PM, Zeng Guang wrote:
>> Intel introduces LASS (Linear Address Separation) feature providing
>                                         ^
> missing "Space" here

Thanks.

> [...]
>> +	WARN_ON_ONCE(!is_long_mode(vcpu));
> IMHO, it's better to skip the following checks and return false if it is
> out of long mode.

In some cases, a CPU mode check already exists before invoking the LASS
violation detection, e.g. VMX instruction emulation. So it's designed to make
vmx_check_lass() focus on the LASS violation alone, and leave it to the caller
to take care of the CPU mode in advance. The purpose is to avoid duplicating
the CPU mode check, though the impact seems not significant. :)

>> +
>> +	user_as = !(la >> 63);
> It's better to describe how LASS treats the linear address in compatibility
> mode in the changelog or/and in a comment, i.e. for a linear address with
> only 32 bits (or 16 bits), the processor treats bit 63 as if it were 0.
>
OK, will add comments on the specific treatment in compatibility mode.
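As a side note on that compatibility-mode behavior, here is a tiny standalone C illustration; it assumes the 32-bit effective address is zero-extended before the bit-63 test, matching the "bit 63 treated as 0" description above:

	#include <assert.h>
	#include <stdint.h>

	int main(void)
	{
		/*
		 * In compatibility mode the linear address is at most 32 bits,
		 * so after zero-extension bit 63 is always 0 and the address is
		 * always treated as a user-mode address by the LASS check.
		 */
		uint32_t compat_la = 0xffffffffu;
		uint64_t la = (uint64_t)compat_la;

		assert((la >> 63) == 0);
		return 0;
	}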
Similar to comments on an earlier patch, don't try to describe the literal code
change, e.g. this does far more than just "Add new ops in kvm_x86_ops". Something
like

  KVM: VMX: Add emulation of LASS violation checks on linear address

On Thu, Jun 01, 2023, Zeng Guang wrote:
> Intel introduces the LASS (Linear Address Space Separation) feature, providing
> an independent mechanism to achieve mode-based protection.
>
> LASS partitions the 64-bit linear address space into two halves, user-mode
> addresses (LA[bit 63]=0) and supervisor-mode addresses (LA[bit 63]=1). It
> stops any code execution or conditional data access[1]
>   1. from user mode to the supervisor-mode address space
>   2. from supervisor mode to the user-mode address space
> and generates a LASS violation fault accordingly.
>
> [1] A supervisor-mode data access causes a LASS violation only if supervisor-
> mode access protection is enabled (CR4.SMAP = 1) and either RFLAGS.AC = 0 or
> the access implicitly accesses a system data structure.
>
> The following are the rules of the LASS violation check on a linear address (LA).
> User access to the supervisor-mode address space:
>   LA[bit 63] && (CPL == 3)
> Supervisor access to the user-mode address space:
>   Instruction fetch: !LA[bit 63] && (CPL < 3)
>   Data access: !LA[bit 63] && (CR4.SMAP == 1) && ((RFLAGS.AC == 0 &&
>                CPL < 3) || implicit supervisor access)
>
> Add new ops in kvm_x86_ops to do the LASS violation check.
>
> Signed-off-by: Zeng Guang <guang.zeng@intel.com>
> Tested-by: Xuelian Guo <xuelian.guo@intel.com>
> ---
>  arch/x86/include/asm/kvm-x86-ops.h |  3 +-
>  arch/x86/include/asm/kvm_host.h    |  2 ++
>  arch/x86/kvm/kvm_emulate.h         |  1 +
>  arch/x86/kvm/vmx/vmx.c             | 47 ++++++++++++++++++++++++++++++
>  arch/x86/kvm/vmx/vmx.h             |  2 ++
>  5 files changed, 54 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> index 13bc212cd4bc..8980a3bfa687 100644
> --- a/arch/x86/include/asm/kvm-x86-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> @@ -132,7 +132,8 @@ KVM_X86_OP_OPTIONAL(migrate_timers)
>  KVM_X86_OP(msr_filter_changed)
>  KVM_X86_OP(complete_emulated_msr)
>  KVM_X86_OP(vcpu_deliver_sipi_vector)
> -KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
> +KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons)
> +KVM_X86_OP_OPTIONAL_RET0(check_lass)

Use "is_lass_violation" instead of "check_lass" for all function names. "check"
doesn't convey that the function is a predicate, i.e. that it returns true/false.

>  #undef KVM_X86_OP
>  #undef KVM_X86_OP_OPTIONAL
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 92d8e65fe88c..98666d1e7727 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1731,6 +1731,8 @@ struct kvm_x86_ops {
>  	 * Returns vCPU specific APICv inhibit reasons
>  	 */
>  	unsigned long (*vcpu_get_apicv_inhibit_reasons)(struct kvm_vcpu *vcpu);
> +
> +	bool (*check_lass)(struct kvm_vcpu *vcpu, u64 access, u64 la, u32 flags);
>  };
>
>  struct kvm_x86_nested_ops {
> diff --git a/arch/x86/kvm/kvm_emulate.h b/arch/x86/kvm/kvm_emulate.h
> index 5b9ec610b2cb..f1439ab7c14b 100644
> --- a/arch/x86/kvm/kvm_emulate.h
> +++ b/arch/x86/kvm/kvm_emulate.h
> @@ -91,6 +91,7 @@ struct x86_instruction_info {
>  /* x86-specific emulation flags */
>  #define X86EMUL_F_FETCH		BIT(0)
>  #define X86EMUL_F_WRITE		BIT(1)
> +#define X86EMUL_F_SKIPLASS	BIT(2)

This belongs in the patch that integrates LASS into the emulator. And rather than
make the flag a command (SKIPLASS), I think it makes sense to make the flag describe
the access. It'll mean more flags, but those are free. That way the originators of
the accesses, e.g. em_invlpg(), don't need to document how they interact with LASS,
e.g. this code is self-documenting, and if someone wants to understand why KVM
has a dedicated X86EMUL_F_INVLPG flag, it's easy enough to find the consumer.
And in the unlikely scenario that other things care about INVLPG, branch targets,
etc., we won't end up with X86EMUL_F_SKIPLASS, X86EMUL_F_SKIPOTHER, etc.

	rc = __linearize(ctxt, ctxt->src.addr.mem, &max_size, 1,
			 X86EMUL_F_INVLPG, ctxt->mode, &linear);

So this?

  #define X86EMUL_F_IMPLICIT
  #define X86EMUL_F_INVLPG
  #define X86EMUL_F_BRANCH_TARGET

>  struct x86_emulate_ops {
>  	void (*vm_bugged)(struct x86_emulate_ctxt *ctxt);
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index a33205ded85c..876997e8448e 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -8130,6 +8130,51 @@ static void vmx_vm_destroy(struct kvm *kvm)
>  	free_pages((unsigned long)kvm_vmx->pid_table, vmx_get_pid_table_order(kvm));
>  }
>
> +/*
> + * Determine whether an access to the linear address causes a LASS violation.
> + * LASS protection is only effective in long mode. As a prerequisite, the
> + * caller should ensure the vCPU is running in long mode before invoking this
> + * API to perform the LASS violation check.
> + */
> +bool vmx_check_lass(struct kvm_vcpu *vcpu, u64 access, u64 la, u32 flags)
> +{
> +	bool user_mode, user_as, rflags_ac;
> +
> +	if (!!(flags & X86EMUL_F_SKIPLASS) ||
> +	    !kvm_is_cr4_bit_set(vcpu, X86_CR4_LASS))
> +		return false;
> +
> +	WARN_ON_ONCE(!is_long_mode(vcpu));

This is silly. By making this a prereq, 2 of the 3 callers are forced to explicitly
check for is_long_mode(), *and* it unnecessarily bleeds LASS details outside of VMX.

> +
> +	user_as = !(la >> 63);
> +
> +	/*
> +	 * An access is a supervisor-mode access if CPL < 3 or if it implicitly
> +	 * accesses a system data structure. For implicit accesses to a system
> +	 * data structure, the processor acts as if RFLAGS.AC is clear.
> +	 */
> +	if (access & PFERR_IMPLICIT_ACCESS) {

Please don't use PFERR_IMPLICIT_ACCESS, just extend the new flags. This is
obviously not coming from a page fault. PFERR_IMPLICIT_ACCESS really shouldn't
exist, but at least there was reasonable justification for adding it (changing
all of the paths that lead to permission_fault() would have required a massive
overhaul).

***HOWEVER***

I think the entire approach of hooking __linearize() may be a mistake, and LASS
should instead be implemented in a wrapper of ->gva_to_gpa(). The two calls to
__linearize() that are escaped with SKIPLASS are escaped *because* they don't
actually access memory (branch targets and INVLPG), and so if LASS is enforced
only when ->gva_to_gpa() is invoked, all of these new flags go away because the
cases that ignore LASS are naturally handled.

That should also make it unnecessary to add one-off checks since e.g. kvm_handle_invpcid()
will hit kvm_read_guest_virt() and thus ->gva_to_gpa(), i.e. we won't end up playing
an ongoing game of whack-a-mole.

And one question that needs to be answered is what happens when an access rolls
over from supervisor to user, e.g. if the kernel accesses 8 bytes at -1ull and thus
reads all Fs => 0x6, does the access get a LASS violation on the access to user
memory? User=>supervisor can't happen because non-canonical checks have higher
priority, but supervisor=>user can. And that matters because it will impact
whether or not KVM needs to check each *page* or just the initial address.

> +		user_mode = false;
> +		rflags_ac = false;
> +	} else {
> +		user_mode = vmx_get_cpl(vcpu) == 3;
> +		if (!user_mode)
> +			rflags_ac = !!(kvm_get_rflags(vcpu) & X86_EFLAGS_AC);
> +	}
> +
> +	if (user_mode == user_as)
> +		return false;
> +
> +	/*
> +	 * Supervisor-mode _data_ accesses to user address space
> +	 * cause LASS violations only if SMAP is enabled.
> +	 */
> +	if (!user_mode && !(access & PFERR_FETCH_MASK))
> +		return kvm_is_cr4_bit_set(vcpu, X86_CR4_SMAP) && !rflags_ac;

This is all more complex than it needs to be. Using local bools just so that the
"user_mode == user_as" check is common is not a good tradeoff. User addresses have
*significantly* different behavior, lean into that instead of dancing around it.

	bool is_lass_violation(struct kvm_vcpu *vcpu, unsigned long addr,
			       unsigned int flags)
	{
		const bool is_supervisor_address = addr & BIT_ULL(63);
		const bool implicit = flags & X86EMUL_F_IMPLICIT;

		if (!kvm_is_cr4_bit_set(vcpu, X86_CR4_LASS) || !is_long_mode(vcpu))
			return false;

		/*
		 * INVLPG isn't subject to LASS, e.g. to allow invalidating userspace
		 * addresses without toggling RFLAGS.AC. Branch targets aren't subject
		 * to LASS in order to simplify far control transfers (the subsequent
		 * fetch will enforce LASS as appropriate).
		 */
		if (flags & (X86EMUL_F_BRANCH_TARGET | X86EMUL_F_INVLPG))
			return false;

		if (!implicit && vmx_get_cpl(vcpu) == 3)
			return is_supervisor_address;

		/* LASS is enforced for supervisor accesses iff SMAP is enabled. */
		if (!kvm_is_cr4_bit_set(vcpu, X86_CR4_SMAP))
			return false;

		/* Like SMAP, RFLAGS.AC disables LASS checks in supervisor mode. */
		if (!implicit && (kvm_get_rflags(vcpu) & X86_EFLAGS_AC))
			return false;

		return !is_supervisor_address;
	}

> +	return true;
> +}
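(For reference, the access-describing flags sketched above could slot into
kvm_emulate.h roughly as follows; the bit assignments are illustrative, not part
of the posted series.)

  /* x86-specific emulation flags */
  #define X86EMUL_F_FETCH		BIT(0)
  #define X86EMUL_F_WRITE		BIT(1)
  #define X86EMUL_F_IMPLICIT		BIT(2)	/* implicit system access */
  #define X86EMUL_F_INVLPG		BIT(3)	/* address is only invalidated */
  #define X86EMUL_F_BRANCH_TARGET	BIT(4)	/* target of a far transfer */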
On Tue, Jun 27, 2023, Sean Christopherson wrote:
> > +	/*
> > +	 * An access is a supervisor-mode access if CPL < 3 or if it implicitly
> > +	 * accesses a system data structure. For implicit accesses to a system
> > +	 * data structure, the processor acts as if RFLAGS.AC is clear.
> > +	 */
> > +	if (access & PFERR_IMPLICIT_ACCESS) {
>
> Please don't use PFERR_IMPLICIT_ACCESS, just extend the new flags. This is
> obviously not coming from a page fault. PFERR_IMPLICIT_ACCESS really shouldn't
> exist, but at least there was reasonable justification for adding it (changing
> all of the paths that lead to permission_fault() would have required a massive
> overhaul).
>
> ***HOWEVER***
>
> I think the entire approach of hooking __linearize() may be a mistake, and LASS
> should instead be implemented in a wrapper of ->gva_to_gpa(). The two calls to
> __linearize() that are escaped with SKIPLASS are escaped *because* they don't
> actually access memory (branch targets and INVLPG), and so if LASS is enforced
> only when ->gva_to_gpa() is invoked, all of these new flags go away because the
> cases that ignore LASS are naturally handled.
>
> That should also make it unnecessary to add one-off checks since e.g. kvm_handle_invpcid()
> will hit kvm_read_guest_virt() and thus ->gva_to_gpa(), i.e. we won't end up playing
> an ongoing game of whack-a-mole.

Drat, that won't work, at least not without quite a few more changes:

  1. kvm_{read,write,fetch}_guest_virt() are effectively defined to work with a
     fully resolved linear address, i.e. callers assume failure means #PF.

  2. Similar to (1), segment information isn't available, i.e. KVM wouldn't know
     when to inject #SS instead of #GP.

And IIUC, LASS violations are higher priority than instruction-specific alignment
checks, e.g. on CMPXCHG16B. And looking at LAM, that untagging needs to be done
before the canonical checks, which means that we'll need at least X86EMUL_F_INVLPG.

So my idea of shoving this into a ->gva_to_gpa() wrapper won't work well. Bummer.
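(To make points 1 and 2 concrete, the rejected idea would have looked roughly
like the wrapper below. The function and its fault handling are hypothetical,
with names mirroring kvm_mmu's gva_to_gpa hook.)

  static gpa_t kvm_translate_gva(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
				 gva_t gva, u64 access,
				 struct x86_exception *exception)
  {
	if (is_lass_violation(vcpu, gva, 0)) {
		/*
		 * Only the linear address is available here: KVM can't tell
		 * whether #SS or #GP is warranted (point 2), and callers
		 * treat any failure as a #PF (point 1).
		 */
		exception->vector = GP_VECTOR;
		return INVALID_GPA;
	}

	return mmu->gva_to_gpa(vcpu, mmu, gva, access, exception);
  }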
On 6/28/2023 2:26 AM, Sean Christopherson wrote:
> Similar to comments on an earlier patch, don't try to describe the literal code
> change, e.g. this does far more than just "Add new ops in kvm_x86_ops". Something
> like
>
>   KVM: VMX: Add emulation of LASS violation checks on linear address

OK. I will modify the changelog to focus on an overview of the patch
implementation and be more concise.

> On Thu, Jun 01, 2023, Zeng Guang wrote:
>> Intel introduces the LASS (Linear Address Space Separation) feature, providing
>> an independent mechanism to achieve mode-based protection.
>>
>> LASS partitions the 64-bit linear address space into two halves, user-mode
>> addresses (LA[bit 63]=0) and supervisor-mode addresses (LA[bit 63]=1). It
>> stops any code execution or conditional data access[1]
>>   1. from user mode to the supervisor-mode address space
>>   2. from supervisor mode to the user-mode address space
>> and generates a LASS violation fault accordingly.
>>
>> [1] A supervisor-mode data access causes a LASS violation only if supervisor-
>> mode access protection is enabled (CR4.SMAP = 1) and either RFLAGS.AC = 0 or
>> the access implicitly accesses a system data structure.
>>
>> The following are the rules of the LASS violation check on a linear address (LA).
>> User access to the supervisor-mode address space:
>>   LA[bit 63] && (CPL == 3)
>> Supervisor access to the user-mode address space:
>>   Instruction fetch: !LA[bit 63] && (CPL < 3)
>>   Data access: !LA[bit 63] && (CR4.SMAP == 1) && ((RFLAGS.AC == 0 &&
>>                CPL < 3) || implicit supervisor access)
>>
>> Add new ops in kvm_x86_ops to do the LASS violation check.
>>
>> Signed-off-by: Zeng Guang <guang.zeng@intel.com>
>> Tested-by: Xuelian Guo <xuelian.guo@intel.com>
>> ---
>>  arch/x86/include/asm/kvm-x86-ops.h |  3 +-
>>  arch/x86/include/asm/kvm_host.h    |  2 ++
>>  arch/x86/kvm/kvm_emulate.h         |  1 +
>>  arch/x86/kvm/vmx/vmx.c             | 47 ++++++++++++++++++++++++++++++
>>  arch/x86/kvm/vmx/vmx.h             |  2 ++
>>  5 files changed, 54 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
>> index 13bc212cd4bc..8980a3bfa687 100644
>> --- a/arch/x86/include/asm/kvm-x86-ops.h
>> +++ b/arch/x86/include/asm/kvm-x86-ops.h
>> @@ -132,7 +132,8 @@ KVM_X86_OP_OPTIONAL(migrate_timers)
>>  KVM_X86_OP(msr_filter_changed)
>>  KVM_X86_OP(complete_emulated_msr)
>>  KVM_X86_OP(vcpu_deliver_sipi_vector)
>> -KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
>> +KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons)
>> +KVM_X86_OP_OPTIONAL_RET0(check_lass)
>
> Use "is_lass_violation" instead of "check_lass" for all function names. "check"
> doesn't convey that the function is a predicate, i.e. that it returns true/false.

That does make sense. Will change it.

>>  #undef KVM_X86_OP
>>  #undef KVM_X86_OP_OPTIONAL
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 92d8e65fe88c..98666d1e7727 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -1731,6 +1731,8 @@ struct kvm_x86_ops {
>>  	 * Returns vCPU specific APICv inhibit reasons
>>  	 */
>>  	unsigned long (*vcpu_get_apicv_inhibit_reasons)(struct kvm_vcpu *vcpu);
>> +
>> +	bool (*check_lass)(struct kvm_vcpu *vcpu, u64 access, u64 la, u32 flags);
>>  };
>>
>>  struct kvm_x86_nested_ops {
>> diff --git a/arch/x86/kvm/kvm_emulate.h b/arch/x86/kvm/kvm_emulate.h
>> index 5b9ec610b2cb..f1439ab7c14b 100644
>> --- a/arch/x86/kvm/kvm_emulate.h
>> +++ b/arch/x86/kvm/kvm_emulate.h
>> @@ -91,6 +91,7 @@ struct x86_instruction_info {
>>  /* x86-specific emulation flags */
>>  #define X86EMUL_F_FETCH		BIT(0)
>>  #define X86EMUL_F_WRITE		BIT(1)
>> +#define X86EMUL_F_SKIPLASS	BIT(2)
>
> This belongs in the patch that integrates LASS into the emulator. And rather than
> make the flag a command (SKIPLASS), I think it makes sense to make the flag describe
> the access. It'll mean more flags, but those are free. That way the originators of
> the accesses, e.g. em_invlpg(), don't need to document how they interact with LASS,
> e.g. this code is self-documenting, and if someone wants to understand why KVM
> has a dedicated X86EMUL_F_INVLPG flag, it's easy enough to find the consumer.
> And in the unlikely scenario that other things care about INVLPG, branch targets,
> etc., we won't end up with X86EMUL_F_SKIPLASS, X86EMUL_F_SKIPOTHER, etc.
>
> 	rc = __linearize(ctxt, ctxt->src.addr.mem, &max_size, 1,
> 			 X86EMUL_F_INVLPG, ctxt->mode, &linear);
>
> So this?
>
>   #define X86EMUL_F_IMPLICIT
>   #define X86EMUL_F_INVLPG
>   #define X86EMUL_F_BRANCH_TARGET

This way, the emulator just provides the type of the operation, which becomes a
common interface open to various platforms and also conceals vendor-specific
implementation details. We will follow this idea. Thanks.

>>  struct x86_emulate_ops {
>>  	void (*vm_bugged)(struct x86_emulate_ctxt *ctxt);
>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>> index a33205ded85c..876997e8448e 100644
>> --- a/arch/x86/kvm/vmx/vmx.c
>> +++ b/arch/x86/kvm/vmx/vmx.c
>> @@ -8130,6 +8130,51 @@ static void vmx_vm_destroy(struct kvm *kvm)
>>  	free_pages((unsigned long)kvm_vmx->pid_table, vmx_get_pid_table_order(kvm));
>>  }
>>
>> +/*
>> + * Determine whether an access to the linear address causes a LASS violation.
>> + * LASS protection is only effective in long mode. As a prerequisite, the
>> + * caller should ensure the vCPU is running in long mode before invoking this
>> + * API to perform the LASS violation check.
>> + */
>> +bool vmx_check_lass(struct kvm_vcpu *vcpu, u64 access, u64 la, u32 flags)
>> +{
>> +	bool user_mode, user_as, rflags_ac;
>> +
>> +	if (!!(flags & X86EMUL_F_SKIPLASS) ||
>> +	    !kvm_is_cr4_bit_set(vcpu, X86_CR4_LASS))
>> +		return false;
>> +
>> +	WARN_ON_ONCE(!is_long_mode(vcpu));
>
> This is silly. By making this a prereq, 2 of the 3 callers are forced to explicitly
> check for is_long_mode(), *and* it unnecessarily bleeds LASS details outside of VMX.

Right. Will give up this implementation.

>> +
>> +	user_as = !(la >> 63);
>> +
>> +	/*
>> +	 * An access is a supervisor-mode access if CPL < 3 or if it implicitly
>> +	 * accesses a system data structure. For implicit accesses to a system
>> +	 * data structure, the processor acts as if RFLAGS.AC is clear.
>> +	 */
>> +	if (access & PFERR_IMPLICIT_ACCESS) {
>
> Please don't use PFERR_IMPLICIT_ACCESS, just extend the new flags. This is
> obviously not coming from a page fault. PFERR_IMPLICIT_ACCESS really shouldn't
> exist, but at least there was reasonable justification for adding it (changing
> all of the paths that lead to permission_fault() would have required a massive
> overhaul).

I've realized it isn't proper to use the "PFERR_xxx" flags, which were actually
defined for MMU processing. That easily causes confusion.

> ***HOWEVER***
>
> I think the entire approach of hooking __linearize() may be a mistake, and LASS
> should instead be implemented in a wrapper of ->gva_to_gpa(). The two calls to
> __linearize() that are escaped with SKIPLASS are escaped *because* they don't
> actually access memory (branch targets and INVLPG), and so if LASS is enforced
> only when ->gva_to_gpa() is invoked, all of these new flags go away because the
> cases that ignore LASS are naturally handled.
>
> That should also make it unnecessary to add one-off checks since e.g. kvm_handle_invpcid()
> will hit kvm_read_guest_virt() and thus ->gva_to_gpa(), i.e. we won't end up playing
> an ongoing game of whack-a-mole.

We considered this scheme, i.e. doing the LASS violation check before the page
table walk in ->gva_to_gpa(), but found it difficult to identify and process the
fault for a LASS violation there. That may require changing the architectural
design of the MMU, thus we recommend the current implementation.

> And one question that needs to be answered is what happens when an access rolls
> over from supervisor to user, e.g. if the kernel accesses 8 bytes at -1ull and thus
> reads all Fs => 0x6, does the access get a LASS violation on the access to user
> memory? User=>supervisor can't happen because non-canonical checks have higher
> priority, but supervisor=>user can. And that matters because it will impact
> whether or not KVM needs to check each *page* or just the initial address.

By experiment on a Simics platform, I verified that such an access does cause a
LASS violation. KVM has to perform the LASS violation check at both ends of the
address range of an instruction fetch or data access during emulation; currently
it doesn't handle this corner case. Thanks for pointing out this potential
problem.

>> +		user_mode = false;
>> +		rflags_ac = false;
>> +	} else {
>> +		user_mode = vmx_get_cpl(vcpu) == 3;
>> +		if (!user_mode)
>> +			rflags_ac = !!(kvm_get_rflags(vcpu) & X86_EFLAGS_AC);
>> +	}
>> +
>> +	if (user_mode == user_as)
>> +		return false;
>> +
>> +	/*
>> +	 * Supervisor-mode _data_ accesses to user address space
>> +	 * cause LASS violations only if SMAP is enabled.
>> +	 */
>> +	if (!user_mode && !(access & PFERR_FETCH_MASK))
>> +		return kvm_is_cr4_bit_set(vcpu, X86_CR4_SMAP) && !rflags_ac;
>
> This is all more complex than it needs to be. Using local bools just so that the
> "user_mode == user_as" check is common is not a good tradeoff. User addresses have
> *significantly* different behavior, lean into that instead of dancing around it.
>
> 	bool is_lass_violation(struct kvm_vcpu *vcpu, unsigned long addr,
> 			       unsigned int flags)
> 	{
> 		const bool is_supervisor_address = addr & BIT_ULL(63);
> 		const bool implicit = flags & X86EMUL_F_IMPLICIT;
>
> 		if (!kvm_is_cr4_bit_set(vcpu, X86_CR4_LASS) || !is_long_mode(vcpu))
> 			return false;
>
> 		/*
> 		 * INVLPG isn't subject to LASS, e.g. to allow invalidating userspace
> 		 * addresses without toggling RFLAGS.AC. Branch targets aren't subject
> 		 * to LASS in order to simplify far control transfers (the subsequent
> 		 * fetch will enforce LASS as appropriate).
> 		 */
> 		if (flags & (X86EMUL_F_BRANCH_TARGET | X86EMUL_F_INVLPG))
> 			return false;
>
> 		if (!implicit && vmx_get_cpl(vcpu) == 3)
> 			return is_supervisor_address;
>
> 		/* LASS is enforced for supervisor accesses iff SMAP is enabled. */
> 		if (!kvm_is_cr4_bit_set(vcpu, X86_CR4_SMAP))
> 			return false;
>
> 		/* Like SMAP, RFLAGS.AC disables LASS checks in supervisor mode. */
> 		if (!implicit && (kvm_get_rflags(vcpu) & X86_EFLAGS_AC))
> 			return false;
>
> 		return !is_supervisor_address;
> 	}

OK. I will refine the implementation further. Thanks.

>> +	return true;
>> +}
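(A minimal sketch of that both-ends check, assuming the is_lass_violation()
helper proposed above; the range helper itself is hypothetical.)

  /*
   * An access that wraps the top of the address space (e.g. an 8-byte read at
   * -1ull) rolls over into the user half, so checking only the first byte is
   * insufficient; the last byte must be checked as well.
   */
  static bool is_lass_violation_range(struct kvm_vcpu *vcpu, u64 la, u32 size,
				      unsigned int flags)
  {
	return is_lass_violation(vcpu, la, flags) ||
	       is_lass_violation(vcpu, la + size - 1, flags);
  }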
diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 13bc212cd4bc..8980a3bfa687 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -132,7 +132,8 @@ KVM_X86_OP_OPTIONAL(migrate_timers)
 KVM_X86_OP(msr_filter_changed)
 KVM_X86_OP(complete_emulated_msr)
 KVM_X86_OP(vcpu_deliver_sipi_vector)
-KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
+KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons)
+KVM_X86_OP_OPTIONAL_RET0(check_lass)
 
 #undef KVM_X86_OP
 #undef KVM_X86_OP_OPTIONAL
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 92d8e65fe88c..98666d1e7727 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1731,6 +1731,8 @@ struct kvm_x86_ops {
 	 * Returns vCPU specific APICv inhibit reasons
 	 */
 	unsigned long (*vcpu_get_apicv_inhibit_reasons)(struct kvm_vcpu *vcpu);
+
+	bool (*check_lass)(struct kvm_vcpu *vcpu, u64 access, u64 la, u32 flags);
 };
 
 struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/kvm_emulate.h b/arch/x86/kvm/kvm_emulate.h
index 5b9ec610b2cb..f1439ab7c14b 100644
--- a/arch/x86/kvm/kvm_emulate.h
+++ b/arch/x86/kvm/kvm_emulate.h
@@ -91,6 +91,7 @@ struct x86_instruction_info {
 /* x86-specific emulation flags */
 #define X86EMUL_F_FETCH		BIT(0)
 #define X86EMUL_F_WRITE		BIT(1)
+#define X86EMUL_F_SKIPLASS	BIT(2)
 
 struct x86_emulate_ops {
 	void (*vm_bugged)(struct x86_emulate_ctxt *ctxt);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index a33205ded85c..876997e8448e 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -8130,6 +8130,51 @@ static void vmx_vm_destroy(struct kvm *kvm)
 	free_pages((unsigned long)kvm_vmx->pid_table, vmx_get_pid_table_order(kvm));
 }
 
+/*
+ * Determine whether an access to the linear address causes a LASS violation.
+ * LASS protection is only effective in long mode. As a prerequisite, the
+ * caller should ensure the vCPU is running in long mode before invoking this
+ * API to perform the LASS violation check.
+ */
+bool vmx_check_lass(struct kvm_vcpu *vcpu, u64 access, u64 la, u32 flags)
+{
+	bool user_mode, user_as, rflags_ac;
+
+	if (!!(flags & X86EMUL_F_SKIPLASS) ||
+	    !kvm_is_cr4_bit_set(vcpu, X86_CR4_LASS))
+		return false;
+
+	WARN_ON_ONCE(!is_long_mode(vcpu));
+
+	user_as = !(la >> 63);
+
+	/*
+	 * An access is a supervisor-mode access if CPL < 3 or if it implicitly
+	 * accesses a system data structure. For implicit accesses to a system
+	 * data structure, the processor acts as if RFLAGS.AC is clear.
+	 */
+	if (access & PFERR_IMPLICIT_ACCESS) {
+		user_mode = false;
+		rflags_ac = false;
+	} else {
+		user_mode = vmx_get_cpl(vcpu) == 3;
+		if (!user_mode)
+			rflags_ac = !!(kvm_get_rflags(vcpu) & X86_EFLAGS_AC);
+	}
+
+	if (user_mode == user_as)
+		return false;
+
+	/*
+	 * Supervisor-mode _data_ accesses to user address space
+	 * cause LASS violations only if SMAP is enabled.
+	 */
+	if (!user_mode && !(access & PFERR_FETCH_MASK))
+		return kvm_is_cr4_bit_set(vcpu, X86_CR4_SMAP) && !rflags_ac;
+
+	return true;
+}
+
 static struct kvm_x86_ops vmx_x86_ops __initdata = {
 	.name = KBUILD_MODNAME,
 
@@ -8269,6 +8314,8 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
 	.complete_emulated_msr = kvm_complete_insn_gp,
 
 	.vcpu_deliver_sipi_vector = kvm_vcpu_deliver_sipi_vector,
+
+	.check_lass = vmx_check_lass,
 };
 
 static unsigned int vmx_handle_intel_pt_intr(void)
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 9e66531861cf..f2e775b9849b 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -433,6 +433,8 @@ void vmx_enable_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type);
 u64 vmx_get_l2_tsc_offset(struct kvm_vcpu *vcpu);
 u64 vmx_get_l2_tsc_multiplier(struct kvm_vcpu *vcpu);
 
+bool vmx_check_lass(struct kvm_vcpu *vcpu, u64 access, u64 la, u32 flags);
+
 static inline void vmx_set_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr,
 					     int type, bool value)
 {
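(For context, callers outside of VMX would reach the new hook through the static
calls generated from kvm-x86-ops.h. A minimal sketch of a call site, which is
hypothetical and not part of this patch:)

  /* Hypothetical call site: fail the emulated access on a LASS violation. */
  if (static_call(kvm_x86_check_lass)(vcpu, access, la, flags)) {
	kvm_inject_gp(vcpu, 0);
	return 1;
  }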