[PATCHv5,0/8] Virtual NMI feature

Message ID	20221027083831.2985-1-santosh.shukla@amd.com
Headers	Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Received-SPF: Pass (protection.outlook.com: domain of amd.com designates 165.204.84.17 as permitted sender) receiver=protection.outlook.com; client-ip=165.204.84.17; helo=SATLEXMB04.amd.com; pr=C From: Santosh Shukla <santosh.shukla@amd.com> To: <pbonzini@redhat.com>, <seanjc@google.com> CC: <kvm@vger.kernel.org>, <jmattson@google.com>, <joro@8bytes.org>, <linux-kernel@vger.kernel.org>, <mail@maciej.szmigiero.name>, <mlevitsk@redhat.com>, <thomas.lendacky@amd.com>, <vkuznets@redhat.com> Subject: [PATCHv5 0/8] Virtual NMI feature Date: Thu, 27 Oct 2022 14:08:23 +0530 Message-ID: <20221027083831.2985-1-santosh.shukla@amd.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Type: text/plain Precedence: bulk
Series	Virtual NMI feature \| [PATCHv5,0/8] Virtual NMI feature [PATCHv5,1/8] x86/cpu: Add CPUID feature bit for VNMI [PATCHv5,2/8] KVM: SVM: Add VNMI bit definition [PATCHv5,3/8] KVM: SVM: Add VNMI support in get/set_nmi_mask [PATCHv5,4/8] KVM: SVM: Report NMI not allowed when Guest busy handling VNMI [PATCHv5,5/8] KVM: SVM: Add VNMI support in inject_nmi [PATCHv5,6/8] KVM: nSVM: implement nested VNMI [PATCHv5,7/8] KVM: nSVM: emulate VMEXIT_INVALID case for nested VNMI [PATCHv5,8/8] KVM: SVM: Enable VNMI feature

Message ID

20221027083831.2985-1-santosh.shukla@amd.com

Headers

Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 client-ip=2620:137:e000::1:20;
Received-SPF: Pass (protection.outlook.com: domain of amd.com designates
 165.204.84.17 as permitted sender) receiver=protection.outlook.com;
 client-ip=165.204.84.17; helo=SATLEXMB04.amd.com; pr=C
From: Santosh Shukla <santosh.shukla@amd.com>
To: <pbonzini@redhat.com>, <seanjc@google.com>
CC: <kvm@vger.kernel.org>, <jmattson@google.com>, <joro@8bytes.org>,
        <linux-kernel@vger.kernel.org>, <mail@maciej.szmigiero.name>,
        <mlevitsk@redhat.com>, <thomas.lendacky@amd.com>,
        <vkuznets@redhat.com>
Subject: [PATCHv5 0/8] Virtual NMI feature
Date: Thu, 27 Oct 2022 14:08:23 +0530
Message-ID: <20221027083831.2985-1-santosh.shukla@amd.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain
X-MS-Exchange-CrossTenant-OriginalArrivalTime: 27 Oct 2022 08:38:51.1785
 (UTC)
X-MS-Exchange-CrossTenant-Network-Message-Id: 
 f74b258a-6f7e-44c2-861c-08dab7f6ac9e
X-MS-Exchange-CrossTenant-Id: 3dd8961f-e488-4e60-8e11-a82d994e183d
X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: 
 TenantId=3dd8961f-e488-4e60-8e11-a82d994e183d;Ip=[165.204.84.17];Helo=[SATLEXMB04.amd.com]
X-MS-Exchange-CrossTenant-AuthSource: 
 BN8NAM11FT098.eop-nam11.prod.protection.outlook.com
X-MS-Exchange-CrossTenant-AuthAs: Anonymous
X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem
X-MS-Exchange-Transport-CrossTenantHeadersStamped: MW4PR12MB5667
X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED,
        DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE,
        RCVD_IN_MSPIKE_H2,SPF_HELO_PASS,SPF_PASS autolearn=ham
        autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
        lindbergh.monkeyblade.net
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1747829907268744107?=
X-GMAIL-MSGID: =?utf-8?q?1747829907268744107?=

Series

Virtual NMI feature |

Message

Santosh Shukla Oct. 27, 2022, 8:38 a.m. UTC

  VNMI Spec is at [1].

Change History:

v5 (6.1-rc2)
01,02,06 - Renamed s/X86_FEATURE_V_NMI/X86_FEATURE_AMD_VNMI (Jim Mattson)

v4 (v6.0-rc3):
https://lore.kernel.org/all/20220829100850.1474-1-santosh.shukla@amd.com/

v3 (rebased on eb555cb5b794f):
https://lore.kernel.org/all/20220810061226.1286-1-santosh.shukla@amd.com/

v2:
https://lore.kernel.org/lkml/20220709134230.2397-7-santosh.shukla@amd.com/T/#m4bf8a131748688fed00ab0fefdcac209a169e202

v1:
https://lore.kernel.org/all/20220602142620.3196-1-santosh.shukla@amd.com/

Description:
Currently, NMI is delivered to the guest using the Event Injection
mechanism [2]. The Event Injection mechanism does not block the delivery
of subsequent NMIs. So the Hypervisor needs to track the NMI delivery
and its completion(by intercepting IRET) before sending a new NMI.

Virtual NMI (VNMI) allows the hypervisor to inject the NMI into the guest
w/o using Event Injection mechanism meaning not required to track the
guest NMI and intercepting the IRET. To achieve that,
VNMI feature provides virtualized NMI and NMI_MASK capability bits in
VMCB intr_control -
V_NMI(11) - Indicates whether a virtual NMI is pending in the guest.
V_NMI_MASK(12) - Indicates whether virtual NMI is masked in the guest.
V_NMI_ENABLE(26) - Enables the NMI virtualization feature for the guest.

When Hypervisor wants to inject NMI, it will set V_NMI bit, Processor will
clear the V_NMI bit and Set the V_NMI_MASK which means the Guest is
handling NMI, After the guest handled the NMI, The processor will clear
the V_NMI_MASK on the successful completion of IRET instruction
Or if VMEXIT occurs while delivering the virtual NMI.

If NMI virtualization enabled and NMI_INTERCEPT bit is unset
then HW will exit with #INVALID exit reason.

To enable the VNMI capability, Hypervisor need to program
V_NMI_ENABLE bit 1.

The presence of this feature is indicated via the CPUID function
0x8000000A_EDX[25].

Testing -
* Used qemu's `inject_nmi` for testing.
* tested with and w/o AVIC case.
* tested with kvm-unit-test
* tested with vGIF enable and disable.
* tested nested env:
  - L1+L2 using vnmi
  - L1 using vnmi and L2 not


Thanks,
Santosh
[1] https://www.amd.com/en/support/tech-docs/amd64-architecture-programmers-manual-volumes-1-5
(Ch-15.21.10 - NMI Virtualization)

[2] https://www.amd.com/en/support/tech-docs/amd64-architecture-programmers-manual-volumes-1-5
(Ch-15.20 - Event Injection)


Santosh Shukla (8):
  x86/cpu: Add CPUID feature bit for VNMI
  KVM: SVM: Add VNMI bit definition
  KVM: SVM: Add VNMI support in get/set_nmi_mask
  KVM: SVM: Report NMI not allowed when Guest busy handling VNMI
  KVM: SVM: Add VNMI support in inject_nmi
  KVM: nSVM: implement nested VNMI
  KVM: nSVM: emulate VMEXIT_INVALID case for nested VNMI
  KVM: SVM: Enable VNMI feature

 arch/x86/include/asm/cpufeatures.h |  1 +
 arch/x86/include/asm/svm.h         |  7 +++
 arch/x86/kvm/svm/nested.c          | 32 ++++++++++++++
 arch/x86/kvm/svm/svm.c             | 44 ++++++++++++++++++-
 arch/x86/kvm/svm/svm.h             | 68 ++++++++++++++++++++++++++++++
 5 files changed, 151 insertions(+), 1 deletion(-)

Comments

Santosh Shukla Nov. 14, 2022, 8:02 a.m. UTC | #1

On 10/27/2022 2:08 PM, Santosh Shukla wrote:
> VNMI Spec is at [1].
> 
> Change History:
> 
> v5 (6.1-rc2)
> 01,02,06 - Renamed s/X86_FEATURE_V_NMI/X86_FEATURE_AMD_VNMI (Jim Mattson)
>

Gentle reminder.

Thanks,
Santosh

Maxim Levitsky Nov. 14, 2022, 2:31 p.m. UTC | #2

On Mon, 2022-11-14 at 13:32 +0530, Santosh Shukla wrote:
> 
> 
> On 10/27/2022 2:08 PM, Santosh Shukla wrote:
> > VNMI Spec is at [1].
> > 
> > Change History:
> > 
> > v5 (6.1-rc2)
> > 01,02,06 - Renamed s/X86_FEATURE_V_NMI/X86_FEATURE_AMD_VNMI (Jim Mattson)
> > 
> 
> Gentle reminder.
> 
> Thanks,
> Santosh
> 

I started reviewing it today and I think there are still few issues,
and the biggest one is that if a NMI arrives while vNMI injection
is pending, current code just drops such NMI.

We had a discussion about this, like forcing immeditate vm exit
in this case and such but I have a simplier idea:

In this case we can just open the NMI window in the good old way
by intercepting IRET, STGI, and or RSM (which is intercepted anyway),

and only if we already *just* intercepted IRET, only then just drop 
the new NMI instead of single stepping over it based on reasoning that
its 3rd NMI (one is almost done the servicing (its IRET is executing),
one is pending injection, and we want to inject another one.

Does this sound good to you? It won't work for SEV-ES as it looks
like it doesn't intercept IRET, but it might be a reasonable tradeof
for SEV-ES guests to accept that we can't inject a NMI if one is
already pending injection.

Best regards,
	Maxim Levitsky

Santosh Shukla Nov. 16, 2022, 5:40 a.m. UTC | #3

Hi Maxim,

On 11/14/2022 8:01 PM, Maxim Levitsky wrote:
> On Mon, 2022-11-14 at 13:32 +0530, Santosh Shukla wrote:
>>
>>
>> On 10/27/2022 2:08 PM, Santosh Shukla wrote:
>>> VNMI Spec is at [1].
>>>
>>> Change History:
>>>
>>> v5 (6.1-rc2)
>>> 01,02,06 - Renamed s/X86_FEATURE_V_NMI/X86_FEATURE_AMD_VNMI (Jim Mattson)
>>>
>>
>> Gentle reminder.
>>
>> Thanks,
>> Santosh
>>
> 
> I started reviewing it today and I think there are still few issues,
> and the biggest one is that if a NMI arrives while vNMI injection
> is pending, current code just drops such NMI.
> 
> We had a discussion about this, like forcing immeditate vm exit

I believe, We discussed above case in [1] i.e.. HW can handle
the second (/pending)virtual NMI while the guest processing first virtual NMI w/o vmexit.
is it same scenario or different one that you are mentioning?

[1] https://lore.kernel.org/lkml/1782cdbb-8274-8c3d-fa98-29147f1e5d1e@amd.com/

Thanks,
Santosh

> in this case and such but I have a simplier idea:
> 
> In this case we can just open the NMI window in the good old way
> by intercepting IRET, STGI, and or RSM (which is intercepted anyway),
> 
> and only if we already *just* intercepted IRET, only then just drop 
> the new NMI instead of single stepping over it based on reasoning that
> its 3rd NMI (one is almost done the servicing (its IRET is executing),
> one is pending injection, and we want to inject another one.
> 
> Does this sound good to you? It won't work for SEV-ES as it looks
> like it doesn't intercept IRET, but it might be a reasonable tradeof
> for SEV-ES guests to accept that we can't inject a NMI if one is
> already pending injection.
> 
> Best regards,
> 	Maxim Levitsky
>

Maxim Levitsky Nov. 16, 2022, 9:21 a.m. UTC | #4

On Wed, 2022-11-16 at 11:10 +0530, Santosh Shukla wrote:
> Hi Maxim,
> 
> On 11/14/2022 8:01 PM, Maxim Levitsky wrote:
> > On Mon, 2022-11-14 at 13:32 +0530, Santosh Shukla wrote:
> > > 
> > > 
> > > On 10/27/2022 2:08 PM, Santosh Shukla wrote:
> > > > VNMI Spec is at [1].
> > > > 
> > > > Change History:
> > > > 
> > > > v5 (6.1-rc2)
> > > > 01,02,06 - Renamed s/X86_FEATURE_V_NMI/X86_FEATURE_AMD_VNMI (Jim Mattson)
> > > > 
> > > 
> > > Gentle reminder.
> > > 
> > > Thanks,
> > > Santosh
> > > 
> > 
> > I started reviewing it today and I think there are still few issues,
> > and the biggest one is that if a NMI arrives while vNMI injection
> > is pending, current code just drops such NMI.
> > 
> > We had a discussion about this, like forcing immeditate vm exit
> 
> I believe, We discussed above case in [1] i.e.. HW can handle
> the second (/pending)virtual NMI while the guest processing first virtual NMI w/o vmexit.
> is it same scenario or different one that you are mentioning?
> 
> [1] https://lore.kernel.org/lkml/1782cdbb-8274-8c3d-fa98-29147f1e5d1e@amd.com/

You misunderstood the issue.

Hardware can handle the case when a NMI is in service (that is V_NMI_MASK is set) and another one is injected 
(V_NMI_PENDING can be set),

but it is not possible to handle the case when a NMI is already injected (V_NMI_PENDING set) but
and KVM wants to inject another one before the first one went into the service (that is V_NMI_MASK is not set
yet).

Also same can happen when NMIs are blocked in SMM, since V_NMI_MASK is set despite no NMI in service,
we will be able to inject only one NMI by setting the V_NMI_PENDING.

I think I was able to solve all these issues and I will today post a modified patch series of yours,
which should cover all these cases and have some nice refactoring as well.


Best regards,
	Maxim Levitsky


> 
> Thanks,
> Santosh
> 
> > in this case and such but I have a simplier idea:
> > 
> > In this case we can just open the NMI window in the good old way
> > by intercepting IRET, STGI, and or RSM (which is intercepted anyway),
> > 
> > and only if we already *just* intercepted IRET, only then just drop 
> > the new NMI instead of single stepping over it based on reasoning that
> > its 3rd NMI (one is almost done the servicing (its IRET is executing),
> > one is pending injection, and we want to inject another one.
> > 
> > Does this sound good to you? It won't work for SEV-ES as it looks
> > like it doesn't intercept IRET, but it might be a reasonable tradeof
> > for SEV-ES guests to accept that we can't inject a NMI if one is
> > already pending injection.
> > 
> > Best regards,
> >         Maxim Levitsky
> > 
>

Santosh Shukla Nov. 16, 2022, 3:44 p.m. UTC | #5

Hello Maxim,.

On 11/16/2022 2:51 PM, Maxim Levitsky wrote:
> On Wed, 2022-11-16 at 11:10 +0530, Santosh Shukla wrote:
>> Hi Maxim,
>>
>> On 11/14/2022 8:01 PM, Maxim Levitsky wrote:
>>> On Mon, 2022-11-14 at 13:32 +0530, Santosh Shukla wrote:
>>>>
>>>>
>>>> On 10/27/2022 2:08 PM, Santosh Shukla wrote:
>>>>> VNMI Spec is at [1].
>>>>>
>>>>> Change History:
>>>>>
>>>>> v5 (6.1-rc2)
>>>>> 01,02,06 - Renamed s/X86_FEATURE_V_NMI/X86_FEATURE_AMD_VNMI (Jim Mattson)
>>>>>
>>>>
>>>> Gentle reminder.
>>>>
>>>> Thanks,
>>>> Santosh
>>>>
>>>
>>> I started reviewing it today and I think there are still few issues,
>>> and the biggest one is that if a NMI arrives while vNMI injection
>>> is pending, current code just drops such NMI.
>>>
>>> We had a discussion about this, like forcing immeditate vm exit
>>
>> I believe, We discussed above case in [1] i.e.. HW can handle
>> the second (/pending)virtual NMI while the guest processing first virtual NMI w/o vmexit.
>> is it same scenario or different one that you are mentioning?
>>
>> [1] https://lore.kernel.org/lkml/1782cdbb-8274-8c3d-fa98-29147f1e5d1e@amd.com/>> 
> You misunderstood the issue.
> 
> Hardware can handle the case when a NMI is in service (that is V_NMI_MASK is set) and another one is injected 
> (V_NMI_PENDING can be set),
> 
> but it is not possible to handle the case when a NMI is already injected (V_NMI_PENDING set) but
> and KVM wants to inject another one before the first one went into the service (that is V_NMI_MASK is not set
> yet).
> 

In this case, HW will collapse the NMI.

Note that the HW will take the pending NMI at the boundary of IRET instruction such that
it will check for the V_NMI_PENDING and if its set then HW will *take* the NMI,
HW will clear the V_NMI_PENDING bit and set the V_NMI_MASK w/o the VMEXIT!,.


> Also same can happen when NMIs are blocked in SMM, since V_NMI_MASK is set despite no NMI in service,
> we will be able to inject only one NMI by setting the V_NMI_PENDING.
>

Ditto,. HW will collapse the NMI.

Thanks,
Santosh
 
> I think I was able to solve all these issues and I will today post a modified patch series of yours,
> which should cover all these cases and have some nice refactoring as well.
> 
> 
> Best regards,
> 	Maxim Levitsky
> 
> 
>>
>> Thanks,
>> Santosh
>>
>>> in this case and such but I have a simplier idea:
>>>
>>> In this case we can just open the NMI window in the good old way
>>> by intercepting IRET, STGI, and or RSM (which is intercepted anyway),
>>>
>>> and only if we already *just* intercepted IRET, only then just drop 
>>> the new NMI instead of single stepping over it based on reasoning that
>>> its 3rd NMI (one is almost done the servicing (its IRET is executing),
>>> one is pending injection, and we want to inject another one.
>>>
>>> Does this sound good to you? It won't work for SEV-ES as it looks
>>> like it doesn't intercept IRET, but it might be a reasonable tradeof
>>> for SEV-ES guests to accept that we can't inject a NMI if one is
>>> already pending injection.
>>>
>>> Best regards,
>>>         Maxim Levitsky
>>>
>>
> 
>

Tom Lendacky Nov. 16, 2022, 3:55 p.m. UTC | #6

On 11/16/22 09:44, Santosh Shukla wrote:
> Hello Maxim,.
> 
> On 11/16/2022 2:51 PM, Maxim Levitsky wrote:
>> On Wed, 2022-11-16 at 11:10 +0530, Santosh Shukla wrote:
>>> Hi Maxim,
>>>
>>> On 11/14/2022 8:01 PM, Maxim Levitsky wrote:
>>>> On Mon, 2022-11-14 at 13:32 +0530, Santosh Shukla wrote:
>>>>>
>>>>>
>>>>> On 10/27/2022 2:08 PM, Santosh Shukla wrote:
>>>>>> VNMI Spec is at [1].
>>>>>>
>>>>>> Change History:
>>>>>>
>>>>>> v5 (6.1-rc2)
>>>>>> 01,02,06 - Renamed s/X86_FEATURE_V_NMI/X86_FEATURE_AMD_VNMI (Jim Mattson)
>>>>>>
>>>>>
>>>>> Gentle reminder.
>>>>>
>>>>> Thanks,
>>>>> Santosh
>>>>>
>>>>
>>>> I started reviewing it today and I think there are still few issues,
>>>> and the biggest one is that if a NMI arrives while vNMI injection
>>>> is pending, current code just drops such NMI.
>>>>
>>>> We had a discussion about this, like forcing immeditate vm exit
>>>
>>> I believe, We discussed above case in [1] i.e.. HW can handle
>>> the second (/pending)virtual NMI while the guest processing first virtual NMI w/o vmexit.
>>> is it same scenario or different one that you are mentioning?
>>>
>>> [1] https://lore.kernel.org/lkml/1782cdbb-8274-8c3d-fa98-29147f1e5d1e@amd.com/>>
>> You misunderstood the issue.
>>
>> Hardware can handle the case when a NMI is in service (that is V_NMI_MASK is set) and another one is injected
>> (V_NMI_PENDING can be set),
>>
>> but it is not possible to handle the case when a NMI is already injected (V_NMI_PENDING set) but
>> and KVM wants to inject another one before the first one went into the service (that is V_NMI_MASK is not set
>> yet).
>>
> 
> In this case, HW will collapse the NMI.
> 
> Note that the HW will take the pending NMI at the boundary of IRET instruction such that
> it will check for the V_NMI_PENDING and if its set then HW will *take* the NMI,
> HW will clear the V_NMI_PENDING bit and set the V_NMI_MASK w/o the VMEXIT!,.
> 
> 
>> Also same can happen when NMIs are blocked in SMM, since V_NMI_MASK is set despite no NMI in service,
>> we will be able to inject only one NMI by setting the V_NMI_PENDING.
>>
> 
> Ditto,. HW will collapse the NMI.

Note, this is how bare-metal NMIs are also handled. Multiple NMIs are 
collapsed into a single NMI if they are received while an NMI is currently 
being processed.

Thanks,
Tom

> 
> Thanks,
> Santosh
>   
>> I think I was able to solve all these issues and I will today post a modified patch series of yours,
>> which should cover all these cases and have some nice refactoring as well.
>>
>>
>> Best regards,
>> 	Maxim Levitsky
>>
>>
>>>
>>> Thanks,
>>> Santosh
>>>
>>>> in this case and such but I have a simplier idea:
>>>>
>>>> In this case we can just open the NMI window in the good old way
>>>> by intercepting IRET, STGI, and or RSM (which is intercepted anyway),
>>>>
>>>> and only if we already *just* intercepted IRET, only then just drop
>>>> the new NMI instead of single stepping over it based on reasoning that
>>>> its 3rd NMI (one is almost done the servicing (its IRET is executing),
>>>> one is pending injection, and we want to inject another one.
>>>>
>>>> Does this sound good to you? It won't work for SEV-ES as it looks
>>>> like it doesn't intercept IRET, but it might be a reasonable tradeof
>>>> for SEV-ES guests to accept that we can't inject a NMI if one is
>>>> already pending injection.
>>>>
>>>> Best regards,
>>>>          Maxim Levitsky
>>>>
>>>
>>
>>
>

Sean Christopherson Nov. 16, 2022, 4:59 p.m. UTC | #7

On Wed, Nov 16, 2022, Tom Lendacky wrote:
> On 11/16/22 09:44, Santosh Shukla wrote:
> > Hello Maxim,.
> > 
> > On 11/16/2022 2:51 PM, Maxim Levitsky wrote:
> > > On Wed, 2022-11-16 at 11:10 +0530, Santosh Shukla wrote:
> > > > Hi Maxim,
> > > > 
> > > > On 11/14/2022 8:01 PM, Maxim Levitsky wrote:
> > > > > On Mon, 2022-11-14 at 13:32 +0530, Santosh Shukla wrote:
> > > > > I started reviewing it today and I think there are still few issues,
> > > > > and the biggest one is that if a NMI arrives while vNMI injection
> > > > > is pending, current code just drops such NMI.

I don't think it gets dropped, just improperly delayed.

> > > > > We had a discussion about this, like forcing immeditate vm exit
> > > > 
> > > > I believe, We discussed above case in [1] i.e.. HW can handle
> > > > the second (/pending)virtual NMI while the guest processing first virtual NMI w/o vmexit.
> > > > is it same scenario or different one that you are mentioning?
> > > > 
> > > > [1] https://lore.kernel.org/lkml/1782cdbb-8274-8c3d-fa98-29147f1e5d1e@amd.com/>>
> > > You misunderstood the issue.
> > > 
> > > Hardware can handle the case when a NMI is in service (that is V_NMI_MASK
> > > is set) and another one is injected (V_NMI_PENDING can be set),
> > > 
> > > but it is not possible to handle the case when a NMI is already injected
> > > (V_NMI_PENDING set) but and KVM wants to inject another one before the
> > > first one went into the service (that is V_NMI_MASK is not set yet).
> > 
> > In this case, HW will collapse the NMI.

Yes, but practically speaking two NMIs can't arrive at the exact same instance on
bare metal.  One NMI will always arrive first and get vectored, and then the second
NMI will arrive and be pended.  In a virtual environment, two NMIs that are sent
far apart can arrive together, e.g. if the vCPU is scheduled out for an extended
period of time.  KVM mitigates this side effect by allowing two NMIs to be pending.

The problem here isn't that second the NMI is lost, it's that KVM can't get control
when the first NMI completes (IRETs).  KVM can pend both NMIs and queue the first
for injection/vectoring (set V_NMI_PENDING), but AIUI there is no mechanism (and no
code) to force a VM-Exit on the IRET so that KVM can inject the second NMI.

Santosh's response in v2[*] suggested that hardware would allow KVM to "post" an
NMI while the vCPU is running, but I don't see any code in this series to actually
do that.  svm_inject_nmi() is only called from the vCPU's run loop, i.e. requires
a VM-Exit.

[*] https://lore.kernel.org/all/1782cdbb-8274-8c3d-fa98-29147f1e5d1e@amd.com

> > Note that the HW will take the pending NMI at the boundary of IRET instruction such that
> > it will check for the V_NMI_PENDING and if its set then HW will *take* the NMI,
> > HW will clear the V_NMI_PENDING bit and set the V_NMI_MASK w/o the VMEXIT!,.
> > 
> > 
> > > Also same can happen when NMIs are blocked in SMM, since V_NMI_MASK is
> > > set despite no NMI in service, we will be able to inject only one NMI by
> > > setting the V_NMI_PENDING.

I believe this one is a non-issue.  Like bare metal, KVM only allows one NMI to
be pending if NMIs are blocked.

  static void process_nmi(struct kvm_vcpu *vcpu)
  {
	unsigned limit = 2;

	/*
	 * x86 is limited to one NMI running, and one NMI pending after it.
	 * If an NMI is already in progress, limit further NMIs to just one.
	 * Otherwise, allow two (and we'll inject the first one immediately).
	 */
	if (static_call(kvm_x86_get_nmi_mask)(vcpu) || vcpu->arch.nmi_injected)
		limit = 1;

	vcpu->arch.nmi_pending += atomic_xchg(&vcpu->arch.nmi_queued, 0);
	vcpu->arch.nmi_pending = min(vcpu->arch.nmi_pending, limit);
	kvm_make_request(KVM_REQ_EVENT, vcpu);
  }


> > Ditto,. HW will collapse the NMI.
> 
> Note, this is how bare-metal NMIs are also handled. Multiple NMIs are
> collapsed into a single NMI if they are received while an NMI is currently
> being processed.