[RFC] Randomness on confidential computing platforms

Message ID 20240126134230.1166943-1-kirill.shutemov@linux.intel.com
State New

Commit Message

Kirill A. Shutemov Jan. 26, 2024, 1:42 p.m. UTC
  Problem Statement

Currently the Linux RNG uses the random inputs obtained from the x86
RDRAND/RDSEED instructions (if present) during the early initialization
stage (by mixing the obtained input into the random pool via
_mix_pool_bytes()), as well as for seeding/reseeding the ChaCha-based CRNG.
When the calls to both RDRAND and RDSEED fail (including RDRAND internal
retries), timing-based fallbacks are used in the latter case, and
in the early boot case this source of entropy input is simply
skipped. Overall, the Linux RNG has many other sources of entropy
(depending on the hardware in use), but the dominant one is
interrupts.

In a Confidential Computing Guest threat model, given the absence of any
special trusted HW for secure entropy input, the RDRAND/RDSEED
instructions are the only entropy source that is unobservable outside of
the Confidential Computing Guest TCB. However, with enough pressure on these
instructions from multiple cores (see Intel SDM, Volume 1, Section
7.3.17, “Random Number Generator Instructions”), they can be made to
fail on purpose, forcing the Confidential Computing Guest Linux RNG to
use only Host/VMM controlled entropy sources.

Solution options

There are several possible solutions to this problem and the intention
of this RFC is to initiate a joint discussion. Here are some options
that have been considered:

1. Do nothing and accept the risk.
2. Force endless looping on RDRAND/RDSEED instructions when run in a
   Confidential Computing Guest (this patch). This option turns the
   attack against the quality of cryptographic randomness provided by
   Confidential Computing Guest’s Linux RNG into a DoS attack against
   the Confidential Computing Guest itself (DoS attack is out of scope
   for the Confidential Computing threat model).
3. Panic after enough re-tries of RDRAND/RDSEED instructions fail.
   Another DoS variant against the Guest.
4. Exit to the host/VMM with an error indication after a Confidential
   Computing Guest has failed to obtain random input from RDRAND/RDSEED
   instructions after a reasonable number of retries. This option allows
   the host/VMM to take some corrective action in cases where the load on
   RDRAND/RDSEED instructions has been generated by another actor, e.g.
   another guest VM. The exit to host/VMM in such cases can be made
   transparent for the Confidential Computing Guest in the TDX case with
   the assistance of the TDX module component.
5. Any other, better option?

The patch below implements the second option. I believe the problem is
common to Intel TDX and AMD SEV. The patch covers both.
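
To make the difference between the options concrete, here is a minimal
userspace sketch of the two retry strategies. It is not the code from
this patch: hw_rng_fn, rng_bounded(), rng_loop_forever() and the
simulated source fake_hw_rng() are hypothetical names used only for
illustration, and the hardware instruction is replaced by a function
pointer so the logic can be exercised without RDRAND.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one RDRAND/RDSEED attempt: returns true and
 * stores a value on success, false when the hardware reports failure. */
typedef bool (*hw_rng_fn)(unsigned long *out);

/* Simulated source for demonstration only: fails fake_failures_left
 * times, then succeeds. */
static unsigned int fake_failures_left;

static bool fake_hw_rng(unsigned long *out)
{
	if (fake_failures_left > 0) {
		fake_failures_left--;
		return false;
	}
	*out = 0x1234UL; /* placeholder value, not random */
	return true;
}

/* Bounded retry (the starting point for options 1, 3 and 4): give up
 * after max_retries attempts and let the caller decide what to do. */
static bool rng_bounded(hw_rng_fn hw, unsigned int max_retries,
			unsigned long *out)
{
	for (unsigned int i = 0; i < max_retries; i++) {
		if (hw(out))
			return true;
	}
	return false;
}

/* Option 2: never return without a hardware-provided value. If the
 * source is starved forever, this spins forever, which turns weak
 * randomness into a denial of service. */
static void rng_loop_forever(hw_rng_fn hw, unsigned long *out)
{
	while (!hw(out))
		; /* keep retrying */
}
```

With the bounded variant the caller still has to pick a policy on
failure (carry on, panic, or exit to the VMM); the endless loop removes
that decision at the cost of possibly never making progress.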
---
 arch/x86/boot/compressed/kaslr.c  |  6 ++++++
 arch/x86/boot/compressed/mem.c    | 26 -------------------------
 arch/x86/boot/compressed/misc.h   |  3 +++
 arch/x86/boot/compressed/sev.c    |  5 +++++
 arch/x86/boot/compressed/sev.h    |  2 ++
 arch/x86/boot/compressed/tdx.c    | 32 ++++++++++++++++++++++++++-----
 arch/x86/boot/compressed/tdx.h    |  2 ++
 arch/x86/coco/core.c              |  2 ++
 arch/x86/include/asm/archrandom.h | 22 ++++++++++++++++-----
 include/linux/cc_platform.h       | 11 +++++++++++
 10 files changed, 75 insertions(+), 36 deletions(-)
  

Comments

Reshetova, Elena Jan. 29, 2024, 7:15 a.m. UTC | #1
> On 26.01.24 г. 17:57 ч., Daniel P. Berrangé wrote:
> > If the CPU performance counters could report RDRAND exhaustion directly,
> > then the host admin could trust that information and monitor it, but the
> > host shouldn't rely on the (hostile) guest software to tell it about
> 
> I guess it really depends on the POV - from the POV of an encrypted
> guest the VMM is hostile so we ideally don't like to divulge more
> information than is absolutely necessary.
> 
> OTOH, from the POV of the VMM we could say that the guest could be
> running anything and so a facility like that could cause some confusion
> on the VMM site.
> 
> I think it would be very hard to reconcile the 2 views.

I agree that both views need to be taken into account, and in the confidential
computing threat model nobody has removed the possibility that a CoCo guest
can be malicious. So any action the VMM is about to take has to be considered
carefully. We were not prescribing any action here, just asking if the VMM would
want to have such a control/option. But since Sean clearly doesn’t find this
approach viable, we will drop the VMM-based option.

Best Regards,
Elena.
  
Kirill A. Shutemov Jan. 29, 2024, 10:27 a.m. UTC | #2
On Fri, Jan 26, 2024 at 07:23:55AM -0800, Sean Christopherson wrote:
> On Fri, Jan 26, 2024, Kirill A. Shutemov wrote:
> > Problem Statement
> > 
> > Currently Linux RNG uses the random inputs obtained from x86
> > RDRAND/RDSEED instructions (if present) during early initialization
> > stage (by mixing the obtained input into the random pool via
> > _mix_pool_bytes()), as well as for seeding/reseeding ChaCha-based CRNG.
> > When the calls to both RDRAND/RDSEED fail (including RDRAND internal
> > retries), the timing-based fallbacks are used in the latter case, and
> > during the early boot case this source of entropy input is simply
> > skipped. Overall Linux RNG has many other sources of entropy that it
> > uses (also depending on what HW is used), but the dominating one is
> > interrupts.
> > 
> > In a Confidential Computing Guest threat model, given the absence of any
> > special trusted HW for the secure entropy input, RDRAND/RDSEED
> > instructions is the only entropy source that is unobservable outside of
> > Confidential Computing Guest TCB. However, with enough pressure on these
> > instructions from multiple cores (see Intel SDM, Volume 1, Section
> > 7.3.17, “Random Number Generator Instructions”), they can be made to
> > fail on purpose and force the Confidential Computing Guest Linux RNG to
> > use only Host/VMM controlled entropy sources.
> > 
> > Solution options
> > 
> > There are several possible solutions to this problem and the intention
> > of this RFC is to initiate a joined discussion. Here are some options
> > that has been considered:
> > 
> > 1. Do nothing and accept the risk.
> > 2. Force endless looping on RDRAND/RDSEED instructions when run in a
> >    Confidential Computing Guest (this patch). This option turns the
> >    attack against the quality of cryptographic randomness provided by
> >    Confidential Computing Guest’s Linux RNG into a DoS attack against
> >    the Confidential Computing Guest itself (DoS attack is out of scope
> >    for the Confidential Computing threat model).
> > 3. Panic after enough re-tries of RDRAND/RDSEED instructions fail.
> >    Another DoS variant against the Guest.
> > 4. Exit to the host/VMM with an error indication after a Confidential
> >    Computing Guest failed to obtain random input from RDRAND/RDSEED
> >    instructions after reasonable number of retries. This option allows
> >    host/VMM to take some correction action for cases when the load on
> >    RDRAND/RDSEED instructions has been put by another actor, i.e. the
> >    other guest VM. The exit to host/VMM in such cases can be made
> >    transparent for the Confidential Computing Guest in the TDX case with
> >    the assistance of the TDX module component.
> 
> Hell no.  Develop better hardware if you want to guarantee forward progress.
> Don't push more complexity into the host stack for something that in all likelihood
> will never happen outside of buggy software or hardware.

My idea for this option was to make TDH.VP.ENTER return TDX_RND_NO_ENTROPY
in such a case. The VMM can simply retry, or maybe schedule another workload
and let the entropy pool recover.

I don't think making RDRAND/RDSEED never-fail on HW level is feasible. And
it is definitely not guaranteed by current architecture.

> > 5. Anything other better option?
> 
> Give the admin the option to choose between "I don't care, carry-on with less
> randomness" and "I'm paranoid, panic, panic, panic!".  In other words, let the
> admin choose between #1 and #3 at boot time.  You could probably even let the
> admin control the number of retries, though that's probably a bit excessive.
> 
> And don't tie it to CoCo VMs, e.g. if someone is relying on randomness for a bare
> metal workload, they might prefer to panic if hardware is acting funky.

If we go down this path, I still think the option has to have a strict
default for CoCo VMs as they don't have other options to fall back to.
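
A rough sketch of what I mean by a strict default, purely for
illustration. The enum, the pick_policy() helper and the "continue" /
"panic" parameter values are all made up here; no such boot parameter
exists today. The assumption is that an explicit admin choice always
wins, and that only the default differs for CoCo guests.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical policy for a hardware RNG failure after all retries. */
enum rng_fail_policy {
	RNG_FAIL_CONTINUE,	/* "I don't care, carry on with less randomness" */
	RNG_FAIL_PANIC,		/* "I'm paranoid, panic" */
};

/*
 * arg is the value of an imaginary boot parameter, or NULL if the admin
 * did not specify one. An explicit choice is honoured; otherwise the
 * default is strict for CoCo guests and lenient elsewhere.
 */
static enum rng_fail_policy pick_policy(const char *arg, bool is_coco_guest)
{
	if (arg && strcmp(arg, "continue") == 0)
		return RNG_FAIL_CONTINUE;
	if (arg && strcmp(arg, "panic") == 0)
		return RNG_FAIL_PANIC;

	/* No (or unrecognised) setting: fall back to the default. */
	return is_coco_guest ? RNG_FAIL_PANIC : RNG_FAIL_CONTINUE;
}
```

This keeps the knob generic, as Sean asked, while CoCo guests that say
nothing still get the paranoid behaviour.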
  
Dave Hansen Jan. 29, 2024, 4:30 p.m. UTC | #3
On 1/26/24 05:42, Kirill A. Shutemov wrote:
> 3. Panic after enough re-tries of RDRAND/RDSEED instructions fail.
>    Another DoS variant against the Guest.

I think Sean was going down the same path, but I really dislike the idea
of having TDX-specific (or CoCo-specific) policy here.

How about we WARN_ON() RDRAND/RDSEED going bonkers?  The paranoid folks
can turn on panic_on_warn, if they haven't already.
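
Roughly the behaviour I have in mind, as a userspace model rather than
kernel code. The names below (warn_once_rng_failed(),
panic_on_warn_model, warnings_emitted) are invented for the sketch, the
warn-once behaviour is an assumption on my part, and abort() merely
stands in for a panic.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Models the panic_on_warn setting; off by default. */
static bool panic_on_warn_model;

/* Counts emitted warnings so the behaviour can be observed. */
static unsigned int warnings_emitted;

/* Report a hardware RNG failure the first time it is seen; later
 * failures stay silent so the log is not flooded. */
static void warn_once_rng_failed(bool failed)
{
	static bool warned;

	if (!failed || warned)
		return;

	warned = true;
	warnings_emitted++;
	fprintf(stderr, "WARNING: RDRAND/RDSEED failed after retries\n");

	if (panic_on_warn_model)
		abort(); /* stands in for a kernel panic */
}
```

By default the system logs once and carries on; the paranoid
configuration turns the same event into a hard stop, with no
CoCo-specific policy involved.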
  
H. Peter Anvin Jan. 29, 2024, 4:37 p.m. UTC | #4
On January 29, 2024 8:30:11 AM PST, Dave Hansen <dave.hansen@intel.com> wrote:
>On 1/26/24 05:42, Kirill A. Shutemov wrote:
>> 3. Panic after enough re-tries of RDRAND/RDSEED instructions fail.
>>    Another DoS variant against the Guest.
>
>I think Sean was going down the same path, but I really dislike the idea
>of having TDX-specific (or CoCo-specific) policy here.
>
>How about we WARN_ON() RDRAND/RDSEED going bonkers?  The paranoid folks
>can turn on panic_on_warn, if they haven't already.

That would be good anyway.
  
Kirill A. Shutemov Jan. 29, 2024, 4:41 p.m. UTC | #5
On Mon, Jan 29, 2024 at 08:30:11AM -0800, Dave Hansen wrote:
> On 1/26/24 05:42, Kirill A. Shutemov wrote:
> > 3. Panic after enough re-tries of RDRAND/RDSEED instructions fail.
> >    Another DoS variant against the Guest.
> 
> I think Sean was going down the same path, but I really dislike the idea
> of having TDX-specific (or CoCo-specific) policy here.
> 
> How about we WARN_ON() RDRAND/RDSEED going bonkers?  The paranoid folks
> can turn on panic_on_warn, if they haven't already.

Sure, we can do it for the kernel, but we have no control over what
userspace does.

Sensible userspace should, on RDRAND/RDSEED failure, fall back to asking
the kernel for random bytes, but who knows if it happens in practice
everywhere.
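
For reference, the "sensible userspace" pattern would look something
like the sketch below. The hardware attempt is abstracted behind a
function pointer (hw_rng_fn, a made-up type) so the example does not
depend on RDRAND being available, and HW_RETRIES is an assumed value.
getrandom(2) is the real Linux interface (glibc 2.25 or later).

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/random.h>	/* getrandom(2) */

/* Hypothetical stand-in for one RDRAND attempt: true on success. */
typedef bool (*hw_rng_fn)(unsigned long *out);

#define HW_RETRIES 10	/* assumed retry budget for the sketch */

/* Simulated starved hardware source, for demonstration only. */
static bool hw_always_fails(unsigned long *out)
{
	(void)out;
	return false;
}

/*
 * Try the hardware source a bounded number of times and check every
 * result. If it keeps failing, ask the kernel for random bytes instead
 * of using whatever happens to be in *out.
 */
static bool get_random_ulong(hw_rng_fn hw, unsigned long *out)
{
	unsigned char *p = (unsigned char *)out;
	size_t done = 0;

	if (hw) {
		for (int i = 0; i < HW_RETRIES; i++) {
			if (hw(out))
				return true;
		}
	}

	while (done < sizeof(*out)) {
		ssize_t n = getrandom(p + done, sizeof(*out) - done, 0);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, try again */
			return false;		/* report failure to the caller */
		}
		done += (size_t)n;
	}
	return true;
}
```

The important part is that failure is never silent: the caller gets
either checked hardware output, kernel-provided bytes, or an error.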

Do we care?
  
H. Peter Anvin Jan. 29, 2024, 5:07 p.m. UTC | #6
On January 29, 2024 8:41:49 AM PST, "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:
>On Mon, Jan 29, 2024 at 08:30:11AM -0800, Dave Hansen wrote:
>> On 1/26/24 05:42, Kirill A. Shutemov wrote:
>> > 3. Panic after enough re-tries of RDRAND/RDSEED instructions fail.
>> >    Another DoS variant against the Guest.
>> 
>> I think Sean was going down the same path, but I really dislike the idea
>> of having TDX-specific (or CoCo-specific) policy here.
>> 
>> How about we WARN_ON() RDRAND/RDSEED going bonkers?  The paranoid folks
>> can turn on panic_on_warn, if they haven't already.
>
>Sure, we can do it for kernel, but we have no control on what userspace
>does.
>
>Sensible userspace on RDRAND/RDSEED failure should fallback to kernel
>asking for random bytes, but who knows if it happens in practice
>everywhere.
>
>Do we care?
>

You can't fix what you can't touch.
  
Dave Hansen Jan. 29, 2024, 6:55 p.m. UTC | #7
On 1/29/24 08:41, Kirill A. Shutemov wrote:
> On Mon, Jan 29, 2024 at 08:30:11AM -0800, Dave Hansen wrote:
>> On 1/26/24 05:42, Kirill A. Shutemov wrote:
>>> 3. Panic after enough re-tries of RDRAND/RDSEED instructions fail.
>>>    Another DoS variant against the Guest.
>>
>> I think Sean was going down the same path, but I really dislike the idea
>> of having TDX-specific (or CoCo-specific) policy here.
>>
>> How about we WARN_ON() RDRAND/RDSEED going bonkers?  The paranoid folks
>> can turn on panic_on_warn, if they haven't already.
> 
> Sure, we can do it for kernel, but we have no control on what userspace
> does.
> 
> Sensible userspace on RDRAND/RDSEED failure should fallback to kernel
> asking for random bytes, but who knows if it happens in practice
> everywhere.
> 
> Do we care?

I want to make sure I understand the scenario:

 1. We're running in a guest under TDX (or SEV-SNP)
 2. The VMM (or somebody) is attacking the guest by eating all the
    hardware entropy and RDRAND is effectively busted
 3. Assuming kernel-based panic_on_warn and WARN_ON() rdrand_long()
    failure, that rdrand_long() never gets called.
 4. Userspace is using RDRAND output in some critical place like key
    generation and is not checking it for failure, nor mixing it with
    entropy from any other source
 5. Userspace uses the failed RDRAND output to generate a key
 6. Someone exploits the horrible key

Is that it?
  
Kirill A. Shutemov Jan. 29, 2024, 8:26 p.m. UTC | #8
On Mon, Jan 29, 2024 at 10:55:38AM -0800, Dave Hansen wrote:
> On 1/29/24 08:41, Kirill A. Shutemov wrote:
> > On Mon, Jan 29, 2024 at 08:30:11AM -0800, Dave Hansen wrote:
> >> On 1/26/24 05:42, Kirill A. Shutemov wrote:
> >>> 3. Panic after enough re-tries of RDRAND/RDSEED instructions fail.
> >>>    Another DoS variant against the Guest.
> >>
> >> I think Sean was going down the same path, but I really dislike the idea
> >> of having TDX-specific (or CoCo-specific) policy here.
> >>
> >> How about we WARN_ON() RDRAND/RDSEED going bonkers?  The paranoid folks
> >> can turn on panic_on_warn, if they haven't already.
> > 
> > Sure, we can do it for kernel, but we have no control on what userspace
> > does.
> > 
> > Sensible userspace on RDRAND/RDSEED failure should fallback to kernel
> > asking for random bytes, but who knows if it happens in practice
> > everywhere.
> > 
> > Do we care?
> 
> I want to make sure I understand the scenario:
> 
>  1. We're running in a guest under TDX (or SEV-SNP)
>  2. The VMM (or somebody) is attacking the guest by eating all the
>     hardware entropy and RDRAND is effectively busted
>  3. Assuming kernel-based panic_on_warn and WARN_ON() rdrand_long()
>     failure, that rdrand_long() never gets called.

Never gets called during attack. It can be used before and after.

>  4. Userspace is using RDRAND output in some critical place like key
>     generation and is not checking it for failure, nor mixing it with
>     entropy from any other source
>  5. Userspace uses the failed RDRAND output to generate a key
>  6. Someone exploits the horrible key
> 
> Is that it?

Yes.
  
Dave Hansen Jan. 29, 2024, 9:04 p.m. UTC | #9
On 1/29/24 12:26, Kirill A. Shutemov wrote:
>>> Do we care?
>> I want to make sure I understand the scenario:
>>
>>  1. We're running in a guest under TDX (or SEV-SNP)
>>  2. The VMM (or somebody) is attacking the guest by eating all the
>>     hardware entropy and RDRAND is effectively busted
>>  3. Assuming kernel-based panic_on_warn and WARN_ON() rdrand_long()
>>     failure, that rdrand_long() never gets called.
> Never gets called during attack. It can be used before and after.
> 
>>  4. Userspace is using RDRAND output in some critical place like key
>>     generation and is not checking it for failure, nor mixing it with
>>     entropy from any other source
>>  5. Userspace uses the failed RDRAND output to generate a key
>>  6. Someone exploits the horrible key
>>
>> Is that it?
> Yes.

Is there something that fundamentally makes this a VMM vs. TDX guest
problem?  If a malicious VMM can exhaust RDRAND, why can't malicious
userspace do the same?

Let's assume buggy userspace exists.  Is that userspace *uniquely*
exposed to a naughty VMM or is that VMM just added to the list of things
that can attack buggy userspace?
  
H. Peter Anvin Jan. 29, 2024, 9:17 p.m. UTC | #10
On January 29, 2024 1:04:23 PM PST, Dave Hansen <dave.hansen@intel.com> wrote:
>On 1/29/24 12:26, Kirill A. Shutemov wrote:
>>>> Do we care?
>>> I want to make sure I understand the scenario:
>>>
>>>  1. We're running in a guest under TDX (or SEV-SNP)
>>>  2. The VMM (or somebody) is attacking the guest by eating all the
>>>     hardware entropy and RDRAND is effectively busted
>>>  3. Assuming kernel-based panic_on_warn and WARN_ON() rdrand_long()
>>>     failure, that rdrand_long() never gets called.
>> Never gets called during attack. It can be used before and after.
>> 
>>>  4. Userspace is using RDRAND output in some critical place like key
>>>     generation and is not checking it for failure, nor mixing it with
>>>     entropy from any other source
>>>  5. Userspace uses the failed RDRAND output to generate a key
>>>  6. Someone exploits the horrible key
>>>
>>> Is that it?
>> Yes.
>
>Is there something that fundamentally makes this a VMM vs. TDX guest
>problem?  If a malicious VMM can exhaust RDRAND, why can't malicious
>userspace do the same?
>
>Let's assume buggy userspace exists.  Is that userspace *uniquely*
>exposed to a naughty VMM or is that VMM just added to the list of things
>that can attack buggy userspace?

The concern, I believe, is that a TDX guest is vulnerable as a *victim*, especially if the OS is being malicious.

However, as you say a malicious user space including a conventional VM could try to use it to attack another. The only thing we can do in the kernel about that is to be resilient.

Note that there is an option to the kernel to suspend boot until enough entropy has been gathered that predicting the output of the entropy pool in the kernel ought to be equivalent to breaking AES (in which case we have far worse problems.) To harden the VM case in general perhaps we should consider RDRAND to have zero entropy credit when used as a fallback for RDSEED.
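
As a sketch of the zero-credit idea (not existing kernel code): try the
seed-grade source first, and if the fallback has to be used, still mix
the value in but credit no entropy for it. All names here (hw_rng_fn,
struct seed_result, get_seed(), src_ok(), src_fail()) are hypothetical,
and crediting the full word width for a successful RDSEED is an
assumption made only for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one RDSEED or RDRAND attempt. */
typedef bool (*hw_rng_fn)(unsigned long *out);

struct seed_result {
	bool have_value;		/* a value was obtained and can be mixed in */
	unsigned int credit_bits;	/* entropy credited for that value */
};

/* Simulated sources, for demonstration only. */
static bool src_ok(unsigned long *out)
{
	*out = 0xabcdUL; /* placeholder value, not random */
	return true;
}

static bool src_fail(unsigned long *out)
{
	(void)out;
	return false;
}

/* Prefer rdseed with full credit; fall back to rdrand with zero credit. */
static struct seed_result get_seed(hw_rng_fn rdseed, hw_rng_fn rdrand,
				   unsigned long *out)
{
	struct seed_result r = { false, 0 };

	if (rdseed(out)) {
		r.have_value = true;
		r.credit_bits = (unsigned int)(8 * sizeof(*out));
		return r;
	}

	if (rdrand(out)) {
		r.have_value = true;
		r.credit_bits = 0;	/* mixed in, but not trusted as entropy */
	}
	return r;
}
```

With zero credit on the fallback path, a boot that waits for enough
credited entropy cannot be satisfied by the fallback output alone.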
  
Kirill A. Shutemov Jan. 29, 2024, 9:33 p.m. UTC | #11
On Mon, Jan 29, 2024 at 01:04:23PM -0800, Dave Hansen wrote:
> On 1/29/24 12:26, Kirill A. Shutemov wrote:
> >>> Do we care?
> >> I want to make sure I understand the scenario:
> >>
> >>  1. We're running in a guest under TDX (or SEV-SNP)
> >>  2. The VMM (or somebody) is attacking the guest by eating all the
> >>     hardware entropy and RDRAND is effectively busted
> >>  3. Assuming kernel-based panic_on_warn and WARN_ON() rdrand_long()
> >>     failure, that rdrand_long() never gets called.
> > Never gets called during attack. It can be used before and after.
> > 
> >>  4. Userspace is using RDRAND output in some critical place like key
> >>     generation and is not checking it for failure, nor mixing it with
> >>     entropy from any other source
> >>  5. Userspace uses the failed RDRAND output to generate a key
> >>  6. Someone exploits the horrible key
> >>
> >> Is that it?
> > Yes.
> 
> Is there something that fundamentally makes this a VMM vs. TDX guest
> problem?  If a malicious VMM can exhaust RDRAND, why can't malicious
> userspace do the same?
> 
> Let's assume buggy userspace exists.  Is that userspace *uniquely*
> exposed to a naughty VMM or is that VMM just added to the list of things
> that can attack buggy userspace?

This is a good question.

The VMM has control over when a VCPU gets scheduled and on what CPU, which
gives it tighter control over the target workload. It can make a
difference if there's a small window for an attack before RDRAND is
functional again.

Admittedly, I don't find my own argument very convincing :)
  
H. Peter Anvin Jan. 29, 2024, 9:38 p.m. UTC | #12
On January 29, 2024 1:17:07 PM PST, "H. Peter Anvin" <hpa@zytor.com> wrote:
>On January 29, 2024 1:04:23 PM PST, Dave Hansen <dave.hansen@intel.com> wrote:
>>On 1/29/24 12:26, Kirill A. Shutemov wrote:
>>>>> Do we care?
>>>> I want to make sure I understand the scenario:
>>>>
>>>>  1. We're running in a guest under TDX (or SEV-SNP)
>>>>  2. The VMM (or somebody) is attacking the guest by eating all the
>>>>     hardware entropy and RDRAND is effectively busted
>>>>  3. Assuming kernel-based panic_on_warn and WARN_ON() rdrand_long()
>>>>     failure, that rdrand_long() never gets called.
>>> Never gets called during attack. It can be used before and after.
>>> 
>>>>  4. Userspace is using RDRAND output in some critical place like key
>>>>     generation and is not checking it for failure, nor mixing it with
>>>>     entropy from any other source
>>>>  5. Userspace uses the failed RDRAND output to generate a key
>>>>  6. Someone exploits the horrible key
>>>>
>>>> Is that it?
>>> Yes.
>>
>>Is there something that fundamentally makes this a VMM vs. TDX guest
>>problem?  If a malicious VMM can exhaust RDRAND, why can't malicious
>>userspace do the same?
>>
>>Let's assume buggy userspace exists.  Is that userspace *uniquely*
>>exposed to a naughty VMM or is that VMM just added to the list of things
>>that can attack buggy userspace?
>
>The concern, I believe, is that a TDX guest is vulnerable as a *victim*, especially if the OS is being malicious.
>
>However, as you say a malicious user space including a conventional VM could try to use it to attack another. The only thing we can do in the kernel about that is to be resilient.
>
>Note that there is an option to the kernel to suspend boot until enough entropy has been gathered that predicting the output of the entropy pool in the kernel ought to be equivalent to breaking AES (in which case we have far worse problems.) To harden the VM case in general perhaps we should consider RDRAND to have zero entropy credit when used as a fallback for RDSEED.
>

It is probably worth pointing out, too, that in reality the specs for RDRAND/RDSEED are *extremely* sandbagged. The architect told me that it is extremely unlikely that we will *ever* see a failure due to exhaustion, even if it is executed continuously on all cores – the randomness production rate exceeds the bandwidth of the bus in uncore.
  
H. Peter Anvin Jan. 29, 2024, 10:12 p.m. UTC | #13
On January 29, 2024 1:17:07 PM PST, "H. Peter Anvin" <hpa@zytor.com> wrote:
>On January 29, 2024 1:04:23 PM PST, Dave Hansen <dave.hansen@intel.com> wrote:
>>On 1/29/24 12:26, Kirill A. Shutemov wrote:
>>>>> Do we care?
>>>> I want to make sure I understand the scenario:
>>>>
>>>>  1. We're running in a guest under TDX (or SEV-SNP)
>>>>  2. The VMM (or somebody) is attacking the guest by eating all the
>>>>     hardware entropy and RDRAND is effectively busted
>>>>  3. Assuming kernel-based panic_on_warn and WARN_ON() rdrand_long()
>>>>     failure, that rdrand_long() never gets called.
>>> Never gets called during attack. It can be used before and after.
>>> 
>>>>  4. Userspace is using RDRAND output in some critical place like key
>>>>     generation and is not checking it for failure, nor mixing it with
>>>>     entropy from any other source
>>>>  5. Userspace uses the failed RDRAND output to generate a key
>>>>  6. Someone exploits the horrible key
>>>>
>>>> Is that it?
>>> Yes.
>>
>>Is there something that fundamentally makes this a VMM vs. TDX guest
>>problem?  If a malicious VMM can exhaust RDRAND, why can't malicious
>>userspace do the same?
>>
>>Let's assume buggy userspace exists.  Is that userspace *uniquely*
>>exposed to a naughty VMM or is that VMM just added to the list of things
>>that can attack buggy userspace?
>
>The concern, I believe, is that a TDX guest is vulnerable as a *victim*, especially if the OS is being malicious.
>
>However, as you say a malicious user space including a conventional VM could try to use it to attack another. The only thing we can do in the kernel about that is to be resilient.
>
>Note that there is an option to the kernel to suspend boot until enough entropy has been gathered that predicting the output of the entropy pool in the kernel ought to be equivalent to breaking AES (in which case we have far worse problems.) To harden the VM case in general perhaps we should consider RDRAND to have zero entropy credit when used as a fallback for RDSEED.
>

So as far as I understand, the uncore bus (at least at the time RDRAND/RDSEED was designed) is a single-transaction bus; once a read transaction has been accepted by the bus the bus is locked until the reply is sent (like PCI.) As such, the RNG unit simply doesn't have the option of not returning a response without holding the whole uncore bus locked. However, I believe that if another core is waiting for the bus, that request will be served before the other core can return for more.

If the RNG bit source is crippled for some reason to the point of being near failure, it is certainly possible for a livelock to happen, but at least as far as I understand the likelihood of that happening enough to cause 16 failures in a row is so close to a total failure that it might be as well treated as one.

*Any* security sensitive application that doesn't take total RNG failure into account is fundamentally broken. *Any* hardware random number generator is inherently an analog device, and as such has a nonzero probability of failure. It has an integrity monitor, but all it can do is say "no" and not credit entropy, thereby slowing down and eventually stopping the unit (even RDRAND has a minimum seeding frequency guarantee, unlike /dev/urandom.)
  
Dave Hansen Jan. 29, 2024, 10:18 p.m. UTC | #14
On 1/29/24 13:33, Kirill A. Shutemov wrote:
>> Let's assume buggy userspace exists.  Is that userspace *uniquely*
>> exposed to a naughty VMM or is that VMM just added to the list of things
>> that can attack buggy userspace?
> This is good question.
> 
> VMM has control over when a VCPU gets scheduled and on what CPU which
> gives it tighter control over the target workload. It can make a
> difference if there's small window for an attack before RDRAND is
> functional again.

This is all a bit too theoretical for my taste.  I'm fine with doing
some generic mitigation (WARN_ON_ONCE(hardware_is_exhausted)), but we're
talking about a theoretical attack with theoretical buggy software when
in a theoretically unreachable hardware state.

Until it's clearly much more practical, we have much bigger problems to
worry about.
  
H. Peter Anvin Jan. 29, 2024, 11:32 p.m. UTC | #15
On January 29, 2024 2:18:50 PM PST, Dave Hansen <dave.hansen@intel.com> wrote:
>On 1/29/24 13:33, Kirill A. Shutemov wrote:
>>> Let's assume buggy userspace exists.  Is that userspace *uniquely*
>>> exposed to a naughty VMM or is that VMM just added to the list of things
>>> that can attack buggy userspace?
>> This is good question.
>> 
>> VMM has control over when a VCPU gets scheduled and on what CPU which
>> gives it tighter control over the target workload. It can make a
>> difference if there's small window for an attack before RDRAND is
>> functional again.
>
>This is all a bit too theoretical for my taste.  I'm fine with doing
>some generic mitigation (WARN_ON_ONCE(hardware_is_exhausted)), but we're
>talking about a theoretical attack with theoretical buggy software when
>in a theoretically unreachable hardware state.
>
>Until it's clearly much more practical, we have much bigger problems to
>worry about.

Again, do we even have a problem with the "hold the boot until we have entropy" option?
  
Reshetova, Elena Jan. 30, 2024, 8:01 a.m. UTC | #16
> -----Original Message-----
> From: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Sent: Monday, January 29, 2024 11:33 PM
> To: Hansen, Dave <dave.hansen@intel.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>; Ingo Molnar <mingo@redhat.com>;
> Borislav Petkov <bp@alien8.de>; Dave Hansen <dave.hansen@linux.intel.com>; H.
> Peter Anvin <hpa@zytor.com>; x86@kernel.org; Theodore Ts'o <tytso@mit.edu>;
> Jason A. Donenfeld <Jason@zx2c4.com>; Kuppuswamy Sathyanarayanan
> <sathyanarayanan.kuppuswamy@linux.intel.com>; Reshetova, Elena
> <elena.reshetova@intel.com>; Nakajima, Jun <jun.nakajima@intel.com>; Tom
> Lendacky <thomas.lendacky@amd.com>; Kalra, Ashish <ashish.kalra@amd.com>;
> Sean Christopherson <seanjc@google.com>; linux-coco@lists.linux.dev; linux-
> kernel@vger.kernel.org
> Subject: Re: [RFC] Randomness on confidential computing platforms
> 
> On Mon, Jan 29, 2024 at 01:04:23PM -0800, Dave Hansen wrote:
> > On 1/29/24 12:26, Kirill A. Shutemov wrote:
> > >>> Do we care?
> > >> I want to make sure I understand the scenario:
> > >>
> > >>  1. We're running in a guest under TDX (or SEV-SNP)
> > >>  2. The VMM (or somebody) is attacking the guest by eating all the
> > >>     hardware entropy and RDRAND is effectively busted
> > >>  3. Assuming kernel-based panic_on_warn and WARN_ON() rdrand_long()
> > >>     failure, that rdrand_long() never gets called.
> > > Never gets called during attack. It can be used before and after.
> > >
> > >>  4. Userspace is using RDRAND output in some critical place like key
> > >>     generation and is not checking it for failure, nor mixing it with
> > >>     entropy from any other source
> > >>  5. Userspace uses the failed RDRAND output to generate a key
> > >>  6. Someone exploits the horrible key
> > >>
> > >> Is that it?
> > > Yes.
> >
> > Is there something that fundamentally makes this a VMM vs. TDX guest
> > problem?  If a malicious VMM can exhaust RDRAND, why can't malicious
> > userspace do the same?

Let's be more concrete here: the main problem we are trying to fix is
to make sure the Linux RNG has entropy source(s) that are not under attacker control.
In the case of userspace attacking the kernel, yes, it can exhaust RDRAND/RDSEED,
but the kernel has other entropy sources (interrupts) that are not under full userspace
control or fully observable.
What makes the confidential VM story different is that after the VMM has exhausted
RDRAND/RDSEED, the guest Linux RNG will fall back to entropy sources that
are under the observation/control of the VMM, and this is what we are trying to avoid.


> >
> > Let's assume buggy userspace exists.  Is that userspace *uniquely*
> > exposed to a naughty VMM or is that VMM just added to the list of things
> > that can attack buggy userspace?

Well-behaved userspace will ask for its cryptographic randomness from the
Linux RNG (some might do direct RDRAND/RDSEED calls, but most will
rely on the Linux RNG). When it does ask for it, it is going to get a number
from it. The fact that that number doesn’t have adequate security is not
visible to userspace in any way. I don’t think anyone will go to dmesg and
check the warning logs to determine this.
So, I don’t see how a warning helps here in practice.

Best Regards,
Elena
  
Reshetova, Elena Jan. 30, 2024, 8:19 a.m. UTC | #17
> On January 29, 2024 2:18:50 PM PST, Dave Hansen <dave.hansen@intel.com>
> wrote:
> >On 1/29/24 13:33, Kirill A. Shutemov wrote:
> >>> Let's assume buggy userspace exists.  Is that userspace *uniquely*
> >>> exposed to a naughty VMM or is that VMM just added to the list of things
> >>> that can attack buggy userspace?
> >> This is good question.
> >>
> >> VMM has control over when a VCPU gets scheduled and on what CPU which
> >> gives it tighter control over the target workload. It can make a
> >> difference if there's small window for an attack before RDRAND is
> >> functional again.
> >
> >This is all a bit too theoretical for my taste.  I'm fine with doing
> >some generic mitigation (WARN_ON_ONCE(hardware_is_exhausted)), but we're
> >talking about a theoretical attack with theoretical buggy software when
> >in a theoretically unreachable hardware state.
> >
> >Until it's clearly much more practical, we have much bigger problems to
> >worry about.
> 
> Again, do we even have a problem with the "hold the boot until we have
> entropy"option?

Yes, we do have a problem. You cannot build a secure random number generator
when the attacker controls or observes all of your entropy sources.
Linux RNG has many entropy sources (RDRAND/RDSEED is just one of them), and
as soon as it gets at least some proper entropy input, you are OK (I am greatly
oversimplifying the RNG theory here).
What changes with confidential computing is that entropy sources like
interrupts or timing-based information can be viewed as being under attacker
control or observation. But this is *not* how Linux RNG views them in its
threat model.
So, Linux RNG will boot and run just fine in a confidential guest even when
RDRAND/RDSEED always fail (it will use other entropy sources such as
interrupts/timing info), but the quality of its output becomes questionable
if the host/VMM is outside the TCB.
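
For illustration only, the difference between the bounded retry used today
and the unbounded loop proposed in the patch can be sketched in plain
userspace C. This is a sketch under stated assumptions, not kernel code:
struct fake_rdrand, fake_rdrand_step() and rdrand_long_sketch() are
hypothetical names, the instruction is simulated by a counter that fails a
configurable number of times, and the returned value is a fixed placeholder
rather than random data. The retry bound of 10 mirrors the kernel's
RDRAND_RETRY_LOOPS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RDRAND_RETRY_LOOPS 10	/* same bound the kernel uses for RDRAND */

/* Hypothetical stand-in for the hardware: fails a set number of times. */
struct fake_rdrand {
	unsigned int failures_left;	/* how many attempts will still fail */
	unsigned int calls;		/* total attempts made so far */
};

/* Simulates one RDRAND attempt; returns true when "CF=1" (success). */
static bool fake_rdrand_step(struct fake_rdrand *hw, unsigned long *v)
{
	hw->calls++;
	if (hw->failures_left) {
		hw->failures_left--;
		return false;
	}
	*v = 0x1234abcdUL;	/* placeholder value, NOT random */
	return true;
}

/*
 * Same loop shape as the patched rdrand_long(): retries are bounded
 * unless loop_forever is set (the confidential-guest case in the patch).
 */
static bool rdrand_long_sketch(struct fake_rdrand *hw, bool loop_forever,
			       unsigned long *v)
{
	unsigned int retry = RDRAND_RETRY_LOOPS;

	assert(hw != NULL && v != NULL);
	do {
		if (fake_rdrand_step(hw, v))
			return true;
	} while (loop_forever || --retry);
	return false;
}
```

With 25 simulated failures, the bounded variant gives up after 10 attempts
and the caller has to rely on other entropy sources, while the looping
variant succeeds on the 26th attempt. If the simulated failures never ended,
the looping variant would never return, which is the DoS trade-off described
in the RFC.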

I hope we can get an opinion on this from maintainers of Linux RNG.

Best Regards,
Elena.
  

Patch

diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index dec961c6d16a..a7bba37c7539 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -23,6 +23,7 @@ 
 #include "error.h"
 #include "../string.h"
 #include "efi.h"
+#include "sev.h"
 
 #include <generated/compile.h>
 #include <linux/module.h>
@@ -304,6 +305,11 @@  static void handle_mem_options(void)
 	return;
 }
 
+bool rd_loop(void)
+{
+	return early_is_tdx_guest() || sev_enabled();
+}
+
 /*
  * In theory, KASLR can put the kernel anywhere in the range of [16M, MAXMEM)
  * on 64-bit, and [16M, KERNEL_IMAGE_SIZE) on 32-bit.
diff --git a/arch/x86/boot/compressed/mem.c b/arch/x86/boot/compressed/mem.c
index dbba332e4a12..84a9d9ad98b2 100644
--- a/arch/x86/boot/compressed/mem.c
+++ b/arch/x86/boot/compressed/mem.c
@@ -6,32 +6,6 @@ 
 #include "sev.h"
 #include <asm/shared/tdx.h>
 
-/*
- * accept_memory() and process_unaccepted_memory() called from EFI stub which
- * runs before decompressor and its early_tdx_detect().
- *
- * Enumerate TDX directly from the early users.
- */
-static bool early_is_tdx_guest(void)
-{
-	static bool once;
-	static bool is_tdx;
-
-	if (!IS_ENABLED(CONFIG_INTEL_TDX_GUEST))
-		return false;
-
-	if (!once) {
-		u32 eax, sig[3];
-
-		cpuid_count(TDX_CPUID_LEAF_ID, 0, &eax,
-			    &sig[0], &sig[2],  &sig[1]);
-		is_tdx = !memcmp(TDX_IDENT, sig, sizeof(sig));
-		once = true;
-	}
-
-	return is_tdx;
-}
-
 void arch_accept_memory(phys_addr_t start, phys_addr_t end)
 {
 	/* Platform-specific memory-acceptance call goes here */
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index bc2f0f17fb90..3fd0aba836e7 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -255,4 +255,7 @@  static inline bool init_unaccepted_memory(void) { return false; }
 extern struct efi_unaccepted_memory *unaccepted_table;
 void accept_memory(phys_addr_t start, phys_addr_t end);
 
+#define rd_loop rd_loop
+extern bool rd_loop(void);
+
 #endif /* BOOT_COMPRESSED_MISC_H */
diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 454acd7a2daf..5e7fb31e630b 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -125,6 +125,11 @@  static bool fault_in_kernel_space(unsigned long address)
 /* Include code for early handlers */
 #include "../../kernel/sev-shared.c"
 
+bool sev_enabled(void)
+{
+	return sev_status & MSR_AMD64_SEV_ENABLED;
+}
+
 bool sev_snp_enabled(void)
 {
 	return sev_status & MSR_AMD64_SEV_SNP_ENABLED;
diff --git a/arch/x86/boot/compressed/sev.h b/arch/x86/boot/compressed/sev.h
index fc725a981b09..ec99e0390324 100644
--- a/arch/x86/boot/compressed/sev.h
+++ b/arch/x86/boot/compressed/sev.h
@@ -10,11 +10,13 @@ 
 
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 
+bool sev_enabled(void);
 bool sev_snp_enabled(void);
 void snp_accept_memory(phys_addr_t start, phys_addr_t end);
 
 #else
 
+static inline bool sev_enabled(void) { return false; }
 static inline bool sev_snp_enabled(void) { return false; }
 static inline void snp_accept_memory(phys_addr_t start, phys_addr_t end) { }
 
diff --git a/arch/x86/boot/compressed/tdx.c b/arch/x86/boot/compressed/tdx.c
index 8451d6a1030c..90dcfb9e82bf 100644
--- a/arch/x86/boot/compressed/tdx.c
+++ b/arch/x86/boot/compressed/tdx.c
@@ -61,13 +61,35 @@  static inline void tdx_outw(u16 value, u16 port)
 	tdx_io_out(2, port, value);
 }
 
+/*
+ * accept_memory() and process_unaccepted_memory() called from EFI stub which
+ * runs before decompressor and its early_tdx_detect().
+ *
+ * Enumerate TDX directly from the early users.
+ */
+bool early_is_tdx_guest(void)
+{
+	static bool once;
+	static bool is_tdx;
+
+	if (!IS_ENABLED(CONFIG_INTEL_TDX_GUEST))
+		return false;
+
+	if (!once) {
+		u32 eax, sig[3];
+
+		cpuid_count(TDX_CPUID_LEAF_ID, 0, &eax,
+			    &sig[0], &sig[2],  &sig[1]);
+		is_tdx = !memcmp(TDX_IDENT, sig, sizeof(sig));
+		once = true;
+	}
+
+	return is_tdx;
+}
+
 void early_tdx_detect(void)
 {
-	u32 eax, sig[3];
-
-	cpuid_count(TDX_CPUID_LEAF_ID, 0, &eax, &sig[0], &sig[2],  &sig[1]);
-
-	if (memcmp(TDX_IDENT, sig, sizeof(sig)))
+	if (!early_is_tdx_guest())
 		return;
 
 	/* Use hypercalls instead of I/O instructions */
diff --git a/arch/x86/boot/compressed/tdx.h b/arch/x86/boot/compressed/tdx.h
index 9055482cd35c..6c097de8392e 100644
--- a/arch/x86/boot/compressed/tdx.h
+++ b/arch/x86/boot/compressed/tdx.h
@@ -5,8 +5,10 @@ 
 #include <linux/types.h>
 
 #ifdef CONFIG_INTEL_TDX_GUEST
+bool early_is_tdx_guest(void);
 void early_tdx_detect(void);
 #else
+static inline bool early_is_tdx_guest(void) { return false; }
 static inline void early_tdx_detect(void) { };
 #endif
 
diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c
index f07c3bb7deab..655d881a9cfa 100644
--- a/arch/x86/coco/core.c
+++ b/arch/x86/coco/core.c
@@ -22,6 +22,7 @@  static bool noinstr intel_cc_platform_has(enum cc_attr attr)
 	case CC_ATTR_GUEST_UNROLL_STRING_IO:
 	case CC_ATTR_GUEST_MEM_ENCRYPT:
 	case CC_ATTR_MEM_ENCRYPT:
+	case CC_ATTR_GUEST_RAND_LOOP:
 		return true;
 	default:
 		return false;
@@ -72,6 +73,7 @@  static bool noinstr amd_cc_platform_has(enum cc_attr attr)
 		return sme_me_mask && !(sev_status & MSR_AMD64_SEV_ENABLED);
 
 	case CC_ATTR_GUEST_MEM_ENCRYPT:
+	case CC_ATTR_GUEST_RAND_LOOP:
 		return sev_status & MSR_AMD64_SEV_ENABLED;
 
 	case CC_ATTR_GUEST_STATE_ENCRYPT:
diff --git a/arch/x86/include/asm/archrandom.h b/arch/x86/include/asm/archrandom.h
index 02bae8e0758b..63368227c9d6 100644
--- a/arch/x86/include/asm/archrandom.h
+++ b/arch/x86/include/asm/archrandom.h
@@ -10,6 +10,7 @@ 
 #ifndef ASM_X86_ARCHRANDOM_H
 #define ASM_X86_ARCHRANDOM_H
 
+#include <linux/cc_platform.h>
 #include <asm/processor.h>
 #include <asm/cpufeature.h>
 
@@ -17,6 +18,13 @@ 
 
 /* Unconditional execution of RDRAND and RDSEED */
 
+#ifndef rd_loop
+static inline bool rd_loop(void)
+{
+	return cc_platform_has(CC_ATTR_GUEST_RAND_LOOP);
+}
+#endif
+
 static inline bool __must_check rdrand_long(unsigned long *v)
 {
 	bool ok;
@@ -27,17 +35,21 @@  static inline bool __must_check rdrand_long(unsigned long *v)
 			     : CC_OUT(c) (ok), [out] "=r" (*v));
 		if (ok)
 			return true;
-	} while (--retry);
+	} while (rd_loop() || --retry);
 	return false;
 }
 
 static inline bool __must_check rdseed_long(unsigned long *v)
 {
 	bool ok;
-	asm volatile("rdseed %[out]"
-		     CC_SET(c)
-		     : CC_OUT(c) (ok), [out] "=r" (*v));
-	return ok;
+	do {
+		asm volatile("rdseed %[out]"
+			     CC_SET(c)
+			     : CC_OUT(c) (ok), [out] "=r" (*v));
+		if (ok)
+			return ok;
+	} while (rd_loop());
+	return false;
 }
 
 /*
diff --git a/include/linux/cc_platform.h b/include/linux/cc_platform.h
index d08dd65b5c43..e554e8919eb0 100644
--- a/include/linux/cc_platform.h
+++ b/include/linux/cc_platform.h
@@ -80,6 +80,17 @@  enum cc_attr {
 	 * using AMD SEV-SNP features.
 	 */
 	CC_ATTR_GUEST_SEV_SNP,
+
+	/**
+	 * @CC_ATTR_GUEST_RAND_LOOP: Make RDRAND/RDSEED loop forever to
+	 * harden the random number generation.
+	 *
+	 * The platform/OS is running as a guest/virtual machine and
+	 * requires RDRAND/RDSEED to be retried until they succeed.
+	 *
+	 * Examples include TDX guest & SEV.
+	 */
+	CC_ATTR_GUEST_RAND_LOOP,
 };
 
 #ifdef CONFIG_ARCH_HAS_CC_PLATFORM