Message ID | 20240130083007.1876787-1-kirill.shutemov@linux.intel.com |
---|---|
State | New |
Series | [1/2] x86/random: Retry on RDSEED failure |
Commit Message
Kirill A. Shutemov
Jan. 30, 2024, 8:30 a.m. UTC
The function rdrand_long() retries 10 times before returning failure to
the caller. On the other hand, rdseed_long() gives up on the first
failure.
According to the Intel SDM, both instructions should follow the same
retry approach. This information can be found in the section titled
"Random Number Generator Instructions".
To align the behavior of rdseed_long() with rdrand_long(), it should be
modified to retry 10 times before giving up.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
arch/x86/include/asm/archrandom.h | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)
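[Editor's note: for context, the existing rdrand_long() helper that this patch mirrors already retries RDRAND_RETRY_LOOPS (10) times. The snippet below is an approximate reconstruction of that helper as it appears in arch/x86/include/asm/archrandom.h around this time, shown for comparison only; it is not part of the patch.]

/* Approximate reconstruction of the existing RDRAND helper for comparison. */
static inline bool __must_check rdrand_long(unsigned long *v)
{
	bool ok;
	unsigned int retry = RDRAND_RETRY_LOOPS;	/* 10 retries */

	do {
		asm volatile("rdrand %[out]"
			     CC_SET(c)
			     : CC_OUT(c) (ok), [out] "=r" (*v));
		if (ok)
			return true;
	} while (--retry);

	return false;
}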
Comments
On Tue, Jan 30, 2024 at 01:29:10PM +0100, Jason A. Donenfeld wrote:
> Hi Kirill,
>
> I've been following the other discussion closely thinking about the
> matter, but I suppose I'll jump in here directly on this patch, if
> this is the approach the discussion is congealing around.
>
> A comment below:
>
> On Tue, Jan 30, 2024 at 9:30 AM Kirill A. Shutemov
> <kirill.shutemov@linux.intel.com> wrote:
> >  static inline bool __must_check rdseed_long(unsigned long *v)
> >  {
> > +	unsigned int retry = RDRAND_RETRY_LOOPS;
> >  	bool ok;
> > -	asm volatile("rdseed %[out]"
> > -		     CC_SET(c)
> > -		     : CC_OUT(c) (ok), [out] "=r" (*v));
> > -	return ok;
> > +
> > +	do {
> > +		asm volatile("rdseed %[out]"
> > +			     CC_SET(c)
> > +			     : CC_OUT(c) (ok), [out] "=r" (*v));
> > +
> > +		if (ok)
> > +			return true;
> > +	} while (--retry);
> > +
> > +	return false;
> >  }
>
> So, my understanding of RDRAND vs RDSEED -- deliberately leaving out
> any cryptographic discussion here -- is roughly that RDRAND will
> expand the seed material for longer, while RDSEED will mostly always
> try to sample more bits from the environment. AES is fast, while
> sampling is slow, so RDRAND gives better performance and is less
> likely to fail, whereas RDSEED always has to wait on the hardware to
> collect some bits, so is more likely to fail.
>
> For that reason, most of the usage of RDRAND and RDSEED inside of
> random.c is something to the tune of `if (!rdseed(out)) rdrand(out);`,
> first trying RDSEED but falling back to RDRAND if it's busy. That
> still seems to me like a reasonable approach, which this patch would
> partly undermine (in concert with the next patch, which I'll comment
> on in a follow up email there).
>
> So maybe this patch #1 (of 2) can be dropped?

Unless there's a difference between ring 0 and ring 3, this simple test
is telling:

#include <stdio.h>
#include <immintrin.h>

int main(int argc, char *argv[])
{
	unsigned long long rand;
	unsigned int i, success_rand = 0, success_seed = 0;
	enum { TOTAL = 1000000 };

	for (i = 0; i < TOTAL; ++i)
		success_rand += !!_rdrand64_step(&rand);
	for (i = 0; i < TOTAL; ++i)
		success_seed += !!_rdseed64_step(&rand);

	printf("RDRAND: %.2f%%, RDSEED: %.2f%%\n",
	       success_rand * 100.0 / TOTAL,
	       success_seed * 100.0 / TOTAL);
	return 0;
}

Result on my i7-11850H:

RDRAND: 100.00%, RDSEED: 29.26%

And this doesn't even test multicore stuff.

Jason
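[Editor's note: to reproduce the test above, the _rdrand64_step()/_rdseed64_step() intrinsics generally need the corresponding target features enabled at build time, e.g. something like `gcc -O2 -mrdrnd -mrdseed rdtest.c`; the exact flags depend on the compiler.]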
On Tue, Jan 30, 2024 at 2:10 PM Reshetova, Elena <elena.reshetova@intel.com> wrote: > The internals of Intel DRBG behind RDRAND/RDSEED has been publicly > documented, so the structure is no secret. Please see [1] for overall > structure and other aspects. So, yes, your overall understanding is correct > (there are many more details though). Indeed, have read it. > > So maybe this patch #1 (of 2) can be dropped? > > Before we start debating this patchset, what is your opinion on the original > problem we raised for CoCo VMs when both RDRAND/RDSEED are made to > fail deliberately? My general feeling is that this seems like a hardware problem. If you have a VM, the hypervisor should provide a seed. But with CoCo, you can't trust the host to do that. But can't the host do anything to the VM that it wants, like fiddle with its memory? No, there are special new hardware features to encrypt and protect ram to prevent this. So if you've found yourself in a situation where you absolutely cannot trust the host, AND the hardware already has working guest protections from the host, then it would seem you also need a hardware solution to handle seeding. And you're claiming that RDRAND/RDSEED is the *only* hardware solution available for it. Is that an accurate summary? If it is, then the actual problem is that the hardware provided to solve this problem doesn't actually solve it that well, so we're caught deciding between guest-guest DoS (some other guest on the system uses all RDRAND resources) and cryptographic failure because of a malicious host creating a deterministic environment. But I have two questions: 1) Is this CoCo VM stuff even real? Is protecting guests from hosts actually possible in the end? Is anybody doing this? I assume they are, so maybe ignore this question, but I would like to register my gut feeling that on the Intel platform this seems like an endless whack-a-mole problem like SGX. 2) Can a malicious host *actually* create a fully deterministic environment? One that'll produce the same timing for the jitter entropy creation, and all the other timers and interrupts and things? I imagine the attestation part of CoCo means these VMs need to run on real Intel silicon and so it can't be single stepped in TCG or something, right? So is this problem actually a real one? And to what degree? Any good experimental research on this? Either way, if you're convinced RDRAND is the *only* way here, adding a `WARN_ON(is_in_early_boot)` to the RDRAND (but not RDSEED) failure path seems a fairly lightweight bandaid. I just wonder if the hardware people could come up with something more reliable that we wouldn't have to agonize over in the kernel. Jason
On Tue, Jan 30, 2024 at 03:06:14PM +0100, Jason A. Donenfeld wrote:
> Is that an accurate summary? If it is, then the actual problem is that
> the hardware provided to solve this problem doesn't actually solve it
> that well, so we're caught deciding between guest-guest DoS (some
> other guest on the system uses all RDRAND resources) and cryptographic
> failure because of a malicious host creating a deterministic
> environment.

In a CoCo VM environment, a guest DoS is not a unique threat scenario,
as it is unrelated to confidentiality. Ensuring fair subdivision of
resources between competing guests is just a general VM threat. There
are many easy ways a host admin can stop a guest making computational
progress. Simply not scheduling the guest vCPU threads is one. CoCo
doesn't try to solve this problem.

Preserving confidentiality is the primary aim of CoCo.

IOW, if the guest boot is stalled because the kernel is spinning
waiting on RDRAND to return data, that's fine. If the kernel panics
after "n" RDRAND failures in a row, that's fine too. They are both just
yet another DoS scenario.

If the kernel ignores the RDRAND failure and lets the guest boot with a
degraded RNG state that is susceptible to attacks, that would not be OK
for CoCo.

> But I have two questions:
>
> 1) Is this CoCo VM stuff even real? Is protecting guests from hosts
> actually possible in the end? Is anybody doing this? I assume they
> are, so maybe ignore this question, but I would like to register my
> gut feeling that on the Intel platform this seems like an endless
> whack-a-mole problem like SGX.

It is real, but it is also not perfect. I expect it /will/ be an
endless whack-a-mole problem though. None the less, it is a significant
layer of defence, as compared to traditional VMs where the guest RAM is
nothing more than a 'cat' command away from host admin exposure.

> 2) Can a malicious host *actually* create a fully deterministic
> environment? One that'll produce the same timing for the jitter
> entropy creation, and all the other timers and interrupts and things?
> I imagine the attestation part of CoCo means these VMs need to run on
> real Intel silicon and so it can't be single stepped in TCG or
> something, right? So is this problem actually a real one? And to what
> degree? Any good experimental research on this?
>
> Either way, if you're convinced RDRAND is the *only* way here, adding
> a `WARN_ON(is_in_early_boot)` to the RDRAND (but not RDSEED) failure
> path seems a fairly lightweight bandaid. I just wonder if the hardware
> people could come up with something more reliable that we wouldn't
> have to agonize over in the kernel.

If RDRAND failure is more of a theoretical problem than a practical
real-world problem, I'd be inclined to just let the kernel loop on
RDRAND failure until it succeeds, with a WARN after 'n' iterations to
aid diagnosis of the stall in the unlikely event it did hit.

With regards,
Daniel
On Tue, Jan 30, 2024 at 02:43:19PM +0000, Daniel P. Berrangé wrote: > On Tue, Jan 30, 2024 at 03:06:14PM +0100, Jason A. Donenfeld wrote: > > Is that an accurate summary? If it is, then the actual problem is that > > the hardware provided to solve this problem doesn't actually solve it > > that well, so we're caught deciding between guest-guest DoS (some > > other guest on the system uses all RDRAND resources) and cryptographic > > failure because of a malicious host creating a deterministic > > environment. > > In a CoCo VM environment, a guest DoS is not a unique threat > scenario, as it is unrelated to confidentiality. Ensuring > fair subdivision of resources between competeing guests is > just a general VM threat. There are many easy ways a host > admin can stop a guest making computational progress. Simply > not scheduling the guest vCPU threads is one. CoCo doesn't > try to solve this problem. > > Preserving confidentiality is the primary aim of CoCo. > > IOW, if the guest boot is stalled because the kernel is spinning > waiting on RDRAND to return data, that's fine. If the kernel > panics after "n" RDRAND failures in a row that's fine too. They > are both just yet another DoS scenario. > > If the kernel ignores the RDRAND failure and lets it boot with > degraded RNG state there were susceptible to attacks, that would > not be OK for CoCo. Yea, that's why I said "we're caught deciding..." One case is a DoS that would affect all VMs, so while one guest preventing new guests from booting seems like not a CoCo problem, yes, it is still a problem. At least in theory. And in practice this is easy with RDSEED too. In practice, could you actually indefinably starve RDRAND between guests? Is this pretty easy to do with a little tinkering, or is this a practically impossible DoS vector? I don't actually know.
On January 30, 2024 5:10:20 AM PST, "Reshetova, Elena" <elena.reshetova@intel.com> wrote: > >> Hi Kirill, >> >> I've been following the other discussion closely thinking about the >> matter, but I suppose I'll jump in here directly on this patch, if >> this is the approach the discussion is congealing around. >> >> A comment below: >> >> On Tue, Jan 30, 2024 at 9:30 AM Kirill A. Shutemov >> <kirill.shutemov@linux.intel.com> wrote: >> > static inline bool __must_check rdseed_long(unsigned long *v) >> > { >> > + unsigned int retry = RDRAND_RETRY_LOOPS; >> > bool ok; >> > - asm volatile("rdseed %[out]" >> > - CC_SET(c) >> > - : CC_OUT(c) (ok), [out] "=r" (*v)); >> > - return ok; >> > + >> > + do { >> > + asm volatile("rdseed %[out]" >> > + CC_SET(c) >> > + : CC_OUT(c) (ok), [out] "=r" (*v)); >> > + >> > + if (ok) >> > + return true; >> > + } while (--retry); >> > + >> > + return false; >> > } >> >> So, my understanding of RDRAND vs RDSEED -- deliberately leaving out >> any cryptographic discussion here -- is roughly that RDRAND will >> expand the seed material for longer, while RDSEED will mostly always >> try to sample more bits from the environment. AES is fast, while >> sampling is slow, so RDRAND gives better performance and is less >> likely to fail, whereas RDSEED always has to wait on the hardware to >> collect some bits, so is more likely to fail. > >The internals of Intel DRBG behind RDRAND/RDSEED has been publicly >documented, so the structure is no secret. Please see [1] for overall >structure and other aspects. So, yes, your overall understanding is correct >(there are many more details though). > >[1] https://www.intel.com/content/www/us/en/developer/articles/guide/intel-digital-random-number-generator-drng-software-implementation-guide.html > > >> >> For that reason, most of the usage of RDRAND and RDSEED inside of >> random.c is something to the tune of `if (!rdseed(out)) rdrand(out);`, >> first trying RDSEED but falling back to RDRAND if it's busy. That >> still seems to me like a reasonable approach, which this patch would >> partly undermine (in concert with the next patch, which I'll comment >> on in a follow up email there). > >I agree that for the purpose of extracting entropy for Linux RNG falling >back to RDRAND (current behavior) is perfectly ok, so I think you are doing >the right thing. However, in principle it is not always the case, there are >situations when a fallback to RDRAND should not be used, but it is also >true that the user of this interface should know/understand this situation. > >> >> So maybe this patch #1 (of 2) can be dropped? > >Before we start debating this patchset, what is your opinion on the original >problem we raised for CoCo VMs when both RDRAND/RDSEED are made to >fail deliberately? > >Best Regards, >Elena. > > I have a real concern with this. We already have the option to let the entropy pool fill before the boot can proceed. This would have the risk of massively increasing the interrupt latency for what will be retried anyway.
On 1/30/24 12:30 AM, Kirill A. Shutemov wrote:
> The function rdrand_long() retries 10 times before returning failure to
> the caller. On the other hand, rdseed_long() gives up on the first
> failure.
>
> According to the Intel SDM, both instructions should follow the same
> retry approach. This information can be found in the section titled
> "Random Number Generator Instructions".
>
> To align the behavior of rdseed_long() with rdrand_long(), it should be
> modified to retry 10 times before giving up.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---

Change looks good to me.

Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

Wondering whether this needs to go to stable trees?

>  arch/x86/include/asm/archrandom.h | 16 ++++++++++++----
>  1 file changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/include/asm/archrandom.h b/arch/x86/include/asm/archrandom.h
> index 02bae8e0758b..918c5880de9e 100644
> --- a/arch/x86/include/asm/archrandom.h
> +++ b/arch/x86/include/asm/archrandom.h
> @@ -33,11 +33,19 @@ static inline bool __must_check rdrand_long(unsigned long *v)
>
>  static inline bool __must_check rdseed_long(unsigned long *v)
>  {
> +	unsigned int retry = RDRAND_RETRY_LOOPS;
>  	bool ok;
> -	asm volatile("rdseed %[out]"
> -		     CC_SET(c)
> -		     : CC_OUT(c) (ok), [out] "=r" (*v));
> -	return ok;
> +
> +	do {
> +		asm volatile("rdseed %[out]"
> +			     CC_SET(c)
> +			     : CC_OUT(c) (ok), [out] "=r" (*v));
> +
> +		if (ok)
> +			return true;
> +	} while (--retry);
> +
> +	return false;
>  }
>
>  /*
Elena, On Tue, Jan 30, 2024 at 3:06 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote: > 2) Can a malicious host *actually* create a fully deterministic > environment? One that'll produce the same timing for the jitter > entropy creation, and all the other timers and interrupts and things? > I imagine the attestation part of CoCo means these VMs need to run on > real Intel silicon and so it can't be single stepped in TCG or > something, right? So is this problem actually a real one? And to what > degree? Any good experimental research on this? I'd like to re-up this question. It seems like assessing the reality of the concern would be worthwhile. Jason
> Elena, > > On Tue, Jan 30, 2024 at 3:06 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote: > > 2) Can a malicious host *actually* create a fully deterministic > > environment? One that'll produce the same timing for the jitter > > entropy creation, and all the other timers and interrupts and things? > > I'd like to re-up this question. It seems like assessing the reality > of the concern would be worthwhile. Yes, sorry, I am just behind answering this thread and it is getting late here. This is exactly what I would like to have an open discussion about with inputs from everyone. We have to remember that it is not only about host 'producing' a fully deterministic environment, but also about host being able to *observe* the entropy input. So the more precise question to ask is how much can a host observe? My personal understanding is that host can observe all guest interrupts and their timings, including APIC timer interrupts (and IPIs), so what is actually left for the guest as unobservable entropy input? And let's also please remember that this is by no means Intel-specific, we have other confidential computing vendors, so we need a common agreement on what is the superset of attacker powers that we can assume. > > I imagine the attestation part of CoCo means these VMs need to run on > > real Intel silicon and so it can't be single stepped in TCG or > > something, right? Yes, there is an attestation of a confidential VM and some protections in place that helps against single-stepping attacks. But I am not sure how this is relevant for this, could you please clarify? Best Regards, Elena.
Hi Elena, On Tue, Jan 30, 2024 at 8:06 PM Reshetova, Elena <elena.reshetova@intel.com> wrote: > Yes, sorry, I am just behind answering this thread and it is getting late here. > This is exactly what I would like to have an open discussion about > with inputs from everyone. > We have to remember that it is not only about host 'producing' > a fully deterministic environment, but also about host being able to > *observe* the entropy input. So the more precise question to ask is > how much can a host observe? Right, observation is just as relevant. > My personal understanding is that host can > observe all guest interrupts and their timings, including APIC timer interrupts > (and IPIs), so what is actually left for the guest as unobservable entropy > input? Check out try_to_generate_entropy() and random_get_entropy(), for example. How observable is RDTSC? Other HPTs? > > > I imagine the attestation part of CoCo means these VMs need to run on > > > real Intel silicon and so it can't be single stepped in TCG or > > > something, right? > > Yes, there is an attestation of a confidential VM and some protections in place > that helps against single-stepping attacks. But I am not sure how this is relevant > for this, could you please clarify? I was just thinking that if this didn't require genuine Intel hardware with prebaked keys in it that you could emulate a CPU and all its peripherals and ram with defined latencies and such, and run the VM in a very straightforwardly deterministic environment, because nothing would be real. But if this does have to hit metal somewhere, then there's some possibility you at least interact with some hard-to-model physical hardware. Jason
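[Editor's note: for readers unfamiliar with the helpers Jason names, the sketch below is a simplified userspace illustration of the kind of cycle-counter jitter loop that try_to_generate_entropy() builds on; it is not the kernel's implementation, and the point under discussion is how much of this timing a malicious host can observe or control.]

#include <stdint.h>
#include <stdio.h>
#include <x86intrin.h>	/* __rdtsc() */

int main(void)
{
	uint64_t pool = 0;
	int i;

	for (i = 0; i < 1024; i++) {
		/* Analogous to random_get_entropy(): sample the cycle counter. */
		uint64_t t = __rdtsc();

		/* Rotate and XOR the sample into a running value. */
		pool = (pool << 7) | (pool >> 57);
		pool ^= t;
	}

	printf("mixed value: %016llx\n", (unsigned long long)pool);
	return 0;
}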
> Hi Elena, > > On Tue, Jan 30, 2024 at 8:06 PM Reshetova, Elena > <elena.reshetova@intel.com> wrote: > > Yes, sorry, I am just behind answering this thread and it is getting late here. > > This is exactly what I would like to have an open discussion about > > with inputs from everyone. > > We have to remember that it is not only about host 'producing' > > a fully deterministic environment, but also about host being able to > > *observe* the entropy input. So the more precise question to ask is > > how much can a host observe? > > Right, observation is just as relevant. > > > My personal understanding is that host can > > observe all guest interrupts and their timings, including APIC timer interrupts > > (and IPIs), so what is actually left for the guest as unobservable entropy > > input? > > Check out try_to_generate_entropy() and random_get_entropy(), for > example. How observable is RDTSC? Other HPTs? Ok, here imo it gets arch-specific and so please treat my answers only with Intel TDX arch in mind. I do know that for example AMD behavior for TSC is different, albeit I am not sure of details. Other archs might also have different behavior. For Intel TDX, when a guest executes RDTSC, it gets a virtual TSC value that is calculated deterministically based on a bunch of inputs that are either platform HW specific or VMM/host configured. The physical TSC value is taken into account also in calculations. The guest itself is not able to use usual controls (such as IA32_TSC_ADJUST and such). For details (albeit not exact calculations) please see [1]. If you are interested in exact calculations, the public source code of TDX module is a better reference [2], check calculate_virt_tsc() or just grep with "tsc" it would show you both comments explaining what is happening and calculations. So given this, I would personally consider the virtual guest TSC value observable by host/VMM. [1] TDX module spec, section 11.13 Time Stamp Counter (TSC) https://cdrdv2.intel.com/v1/dl/getContent/733575 [2] TDX module source code: https://www.intel.com/content/www/us/en/download/738875/782152/intel-trust-domain-extension-intel-tdx-module.html For the high resolution timers, host controls guest apic timers and interrupts fully. So, it has the power to see and even affect when a certain interrupt happens or doesnt happen in the guest. It can delay guest timers at its will on pretty extensive time periods. This seems powerful enough for me. Things like HPET are also fully under host control. > > > > I imagine the attestation part of CoCo means these VMs need to run on > > > > real Intel silicon and so it can't be single stepped in TCG or > > > > something, right? > > > > Yes, there is an attestation of a confidential VM and some protections in place > > that helps against single-stepping attacks. But I am not sure how this is relevant > > for this, could you please clarify? > > I was just thinking that if this didn't require genuine Intel hardware > with prebaked keys in it that you could emulate a CPU and all its > peripherals and ram with defined latencies and such, and run the VM in > a very straightforwardly deterministic environment, because nothing > would be real. But if this does have to hit metal somewhere, then > there's some possibility you at least interact with some hard-to-model > physical hardware. Yes, in practice there will be physical hw underneath, but the problem imo is that the host is in between and still very powerful when it comes to interrupts and timers at the moment. 
So, I want to make sure people understand the potential implications overall, and in this case the potential implications on such a critical security component as Linux RNG. Best Regards, Elena.
On Wed, Jan 31, 2024 at 8:56 AM Reshetova, Elena <elena.reshetova@intel.com> wrote: > So given this, I would personally consider the virtual guest TSC value > observable by host/VMM. > [2] TDX module source code: > https://www.intel.com/content/www/us/en/download/738875/782152/intel-trust-domain-extension-intel-tdx-module.html Thanks for the explanation and link. Indeed if this is all mediated by the host, we're in bad shape. > For the high resolution timers, host controls guest apic timers and interrupts fully. > So, it has the power to see and even affect when a certain interrupt happens > or doesnt happen in the guest. It can delay guest timers at its will on pretty > extensive time periods. This seems powerful enough for me. > Things like HPET are also fully under host control. And I suppose RDPMC is similar? And it's not like the guest can just take an excessive amount of TSC samples and randomly select which ones it uses because chickens and eggs... The situation you paint is that all of our entropy inputs -- timers, rdrand, etc -- are either host controllable, host observable, or host (and guest sibling) DoS'able, so if you don't trust the host, there are no good inputs. That's not a great position to be in, and I wonder if something can be done on the hardware side to remedy it, as this seems like a major shortcoming in TDX. So far, all of the proposed mitigations introduce some other DoS. > Yes, in practice there will be physical hw underneath, but the problem imo is > that the host is in between and still very powerful when it comes to interrupts and > timers at the moment. Sure sounds like it. Jason
What about simply treating boot-time initialization of the /dev/random
state as special. That is, on x86, if the hardware promises that RDSEED
or RDRAND is available, we use them to initialize our RNG state at
boot. On bare metal, there can't be anyone else trying to exhaust the
on-chip RNG's entropy supply, so if RDSEED or RDRAND aren't working ---
panic, since the hardware is clearly busted.

On a guest OS, if confidential compute is enabled, and if RDSEED and
RDRAND don't work after N retries, and we know CC is enabled, panic,
since the kernel can't provide the promised security guarantees, and
the CC developers and users are cordially invited to sharpen their
pitchforks and to send their tender regards to the Intel RNG engineers.

For non-confidential compute guests, the question is what the
appropriate reaction is if another VM, possibly belonging to a
different user/customer, is carrying out an RDRAND DoS attack. I'd
argue that in these cases, if the guest VM is using virtio-random, then
the host's /dev/random should be able to cover for cases of Intel RNG
exhaustion, and allowing one customer to prevent other users' VMs from
booting is the greater evil, so we shouldn't treat boot-time
RDRAND/RDSEED failures as panic-worthy.

- Ted
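[Editor's note: a minimal sketch of the boot-time policy Ted describes, assuming the kernel's existing cc_platform_has() helper and the arch_get_random_*_longs() wrappers; the function name and its placement are invented for illustration and this is not a proposed patch.]

#include <linux/cc_platform.h>
#include <linux/init.h>
#include <linux/kernel.h>
#include <asm/archrandom.h>

/* Hypothetical illustration only: seed at boot from the CPU, and treat
 * failure as fatal only when running as a confidential guest. */
static void __init seed_rng_from_cpu_or_die(void)
{
	unsigned long seed[2];

	/* The arch wrappers already retry RDRAND/RDSEED internally. */
	if (arch_get_random_seed_longs(seed, 2) == 2 ||
	    arch_get_random_longs(seed, 2) == 2)
		return;	/* mix 'seed' into the input pool as usual */

	if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))
		panic("CoCo guest: RDSEED/RDRAND failed, cannot seed RNG securely");

	/* Non-CoCo guests fall back to other entropy sources as today. */
}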
On Wed, Jan 31, 2024 at 09:07:56AM -0500, Theodore Ts'o wrote: > What about simply treating boot-time initialization of the /dev/random > state as special. That is, on x86, if the hardware promises that > RDSEED or RDRAND is available, we use them to initialization our RNG > state at boot. On bare metal, there can't be anyone else trying to > exhaust the on-chip RNG's entropy supply, so if RDSEED or RDRAND > aren't working available --- panic, since the hardware is clearly > busted. This is the first thing I suggested here: https://lore.kernel.org/all/CAHmME9qsfOdOEHHw_MOBmt6YAtncbbqP9LPK2dRjuOp1CrHzRA@mail.gmail.com/ But Elena found this dissatisfying because we still can't guarantee new material later. > On a guest OS, if confidential compute is enabled, and if RDSEED and > RDRAND don't work after N retries, and we know CC is enabled, panic, > since the kernel can't provide the promised security gaurantees, and > the CC developers and users are cordially invited to sharpen their > pitchforks and to send their tender regards to the Intel RNG > engineers. Yea, maybe bubbling the RDRAND DoS up to another DoS in the CoCo case is a good tradeoff that will produce the right pitchforkers without breaking anything real. > For non-confidential compute guests, the question is what is the > appropriate reaction if another VM, possibly belonging to a different > user/customer, is carrying out a RDRAND DOS attack. I'd argue that in > these cases, if the guest VM is using virtio-random, then the host's > /dev/random should be able to cover for cases of Intel RNG exhaustion, > and allowing other customer to be able to prevent other user's VM's > from being able to boot is the the greater evil, so we shouldn't treat > boot-time RDRAND/RDSEED failures as panic-worthy. The non-CoCo case is fine, because guests can trust hosts, so things are as they have been forever. Jason
On Wed, Jan 31, 2024 at 03:45:06PM +0100, Jason A. Donenfeld wrote: > On Wed, Jan 31, 2024 at 09:07:56AM -0500, Theodore Ts'o wrote: > > What about simply treating boot-time initialization of the /dev/random > > state as special. That is, on x86, if the hardware promises that > > RDSEED or RDRAND is available, we use them to initialization our RNG > > state at boot. On bare metal, there can't be anyone else trying to > > exhaust the on-chip RNG's entropy supply, so if RDSEED or RDRAND > > aren't working available --- panic, since the hardware is clearly > > busted. > > This is the first thing I suggested here: https://lore.kernel.org/all/CAHmME9qsfOdOEHHw_MOBmt6YAtncbbqP9LPK2dRjuOp1CrHzRA@mail.gmail.com/ > > But Elena found this dissatisfying because we still can't guarantee new > material later. > > > On a guest OS, if confidential compute is enabled, and if RDSEED and > > RDRAND don't work after N retries, and we know CC is enabled, panic, > > since the kernel can't provide the promised security gaurantees, and > > the CC developers and users are cordially invited to sharpen their > > pitchforks and to send their tender regards to the Intel RNG > > engineers. > > Yea, maybe bubbling the RDRAND DoS up to another DoS in the CoCo case is > a good tradeoff that will produce the right pitchforkers without > breaking anything real. One problem, though, is userspace can DoS the kernel's use of RDRAND. So probably infinitely retrying in CoCo environments is better than panicing/warning, since ostensibly a kthread will eventually succeed. Maybe, though, the Intel platform just simply isn't ready for CoCo, and marketing got a little bit ahead of the tech? Jason
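[Editor's note: the userspace-to-kernel concern mentioned above amounts to nothing more elaborate than unprivileged code spinning on the instruction; a trivial sketch, assuming a toolchain with RDRND support enabled (e.g. gcc -mrdrnd), is shown below.]

#include <immintrin.h>

/* Spin on RDRAND forever, deliberately ignoring the carry-flag result.
 * The open question in the thread is whether enough of these loops can
 * starve concurrent RDRAND/RDSEED users, including the kernel. */
int main(void)
{
	unsigned long long v;

	for (;;)
		(void)_rdrand64_step(&v);

	return 0;
}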
On Wed, Jan 31, 2024 at 03:45:06PM +0100, Jason A. Donenfeld wrote: > On Wed, Jan 31, 2024 at 09:07:56AM -0500, Theodore Ts'o wrote: > > What about simply treating boot-time initialization of the /dev/random > > state as special. That is, on x86, if the hardware promises that > > RDSEED or RDRAND is available, we use them to initialization our RNG > > state at boot. On bare metal, there can't be anyone else trying to > > exhaust the on-chip RNG's entropy supply, so if RDSEED or RDRAND > > aren't working available --- panic, since the hardware is clearly > > busted. > > This is the first thing I suggested here: https://lore.kernel.org/all/CAHmME9qsfOdOEHHw_MOBmt6YAtncbbqP9LPK2dRjuOp1CrHzRA@mail.gmail.com/ > > But Elena found this dissatisfying because we still can't guarantee new > material later. Right, but this is good enough that modulo in-kernel RNG state compromise, or the ability to attack the underlying cryptographic primitives (in which case we have much bigger vulnerabilities than this largely theoretical one), even if we don't have new material later, the in-kernel RNG for the CC VM should be sufficiently trustworthy for government work. > Yea, maybe bubbling the RDRAND DoS up to another DoS in the CoCo case is > a good tradeoff that will produce the right pitchforkers without > breaking anything real. <Evil Grin> - Ted
> On Wed, Jan 31, 2024 at 03:45:06PM +0100, Jason A. Donenfeld wrote: > > On Wed, Jan 31, 2024 at 09:07:56AM -0500, Theodore Ts'o wrote: > > > What about simply treating boot-time initialization of the /dev/random > > > state as special. That is, on x86, if the hardware promises that > > > RDSEED or RDRAND is available, we use them to initialization our RNG > > > state at boot. On bare metal, there can't be anyone else trying to > > > exhaust the on-chip RNG's entropy supply, so if RDSEED or RDRAND > > > aren't working available --- panic, since the hardware is clearly > > > busted. > > > > This is the first thing I suggested here: > https://lore.kernel.org/all/CAHmME9qsfOdOEHHw_MOBmt6YAtncbbqP9LPK2dRjuO > p1CrHzRA@mail.gmail.com/ > > > > But Elena found this dissatisfying because we still can't guarantee new > > material later. > > Right, but this is good enough that modulo in-kernel RNG state > compromise, or the ability to attack the underlying cryptographic > primitives (in which case we have much bigger vulnerabilities than > this largely theoretical one), even if we don't have new material > later, the in-kernel RNG for the CC VM should be sufficiently > trustworthy for government work. I agree, this is probably the best we can do at the moment. I did want to point out the runtime need of fresh entropy also, but as we discussed in this thread we might not be able to get it without introducing a DoS path for the userspace. In this case, it is the best to only loose the forward prediction property vs. the whole Linux RNG. Best Regards, Elena.
On Wed, Jan 31, 2024 at 6:37 PM Reshetova, Elena
<elena.reshetova@intel.com> wrote:
>
> >
> > On Wed, Jan 31, 2024 at 03:45:06PM +0100, Jason A. Donenfeld wrote:
> > > On Wed, Jan 31, 2024 at 09:07:56AM -0500, Theodore Ts'o wrote:
> > > > What about simply treating boot-time initialization of the /dev/random
> > > > state as special. That is, on x86, if the hardware promises that
> > > > RDSEED or RDRAND is available, we use them to initialization our RNG
> > > > state at boot. On bare metal, there can't be anyone else trying to
> > > > exhaust the on-chip RNG's entropy supply, so if RDSEED or RDRAND
> > > > aren't working available --- panic, since the hardware is clearly
> > > > busted.
> > >
> > > This is the first thing I suggested here:
> > https://lore.kernel.org/all/CAHmME9qsfOdOEHHw_MOBmt6YAtncbbqP9LPK2dRjuO
> > p1CrHzRA@mail.gmail.com/
> > >
> > > But Elena found this dissatisfying because we still can't guarantee new
> > > material later.
> >
> > Right, but this is good enough that modulo in-kernel RNG state
> > compromise, or the ability to attack the underlying cryptographic
> > primitives (in which case we have much bigger vulnerabilities than
> > this largely theoretical one), even if we don't have new material
> > later, the in-kernel RNG for the CC VM should be sufficiently
> > trustworthy for government work.
>
> I agree, this is probably the best we can do at the moment.
> I did want to point out the runtime need of fresh entropy also, but
> as we discussed in this thread we might not be able to get it
> without introducing a DoS path for the userspace.
> In this case, it is the best to only loose the forward prediction property
> vs. the whole Linux RNG.

So if this is what we're congealing around, I guess we can:

0) Leave RDSEED alone and focus on RDRAND.
1) Add `WARN_ON_ONCE(in_early_boot);` to the failure path of RDRAND
   (and simply hope this doesn't get exploited for guest-guest boot DoS).
2) Loop forever in RDRAND on CoCo VMs, post-boot, with the comments
   and variable naming making it clear that this is a hardware bug
   workaround, not a "feature" added for "extra security".
3) Complain loudly to Intel and get them to fix the hardware.

Though, a large part of me would really like to skip that step (2),
first because it's a pretty gross bandaid that adds lots of complexity,
and second because it'll make (3) less poignant.

Jason
On Wed, Jan 31, 2024 at 07:01:01PM +0100, Jason A. Donenfeld wrote: > So if this is what we're congealing around, I guess we can: > > 0) Leave RDSEED alone and focus on RDRAND. > 1) Add `WARN_ON_ONCE(in_early_boot);` to the failure path of RDRAND > (and simply hope this doesn't get exploited for guest-guest boot DoS). > 2) Loop forever in RDRAND on CoCo VMs, post-boot, with the comments > and variable naming making it clear that this is a hardware bug > workaround, not a "feature" added for "extra security". > 3) Complain loudly to Intel and get them to fix the hardware. > > Though, a large part of me would really like to skip that step (2), > first because it's a pretty gross bandaid that adds lots of > complexity, and second because it'll make (3) less poignant If we need to loop more than, say, 10 seconds in a CoCo VM, I'd just panic with a repeated RDRAND failure message. This makes the point of (3) that much pointed, and it's better than having a CoCo VM mysteriously hang in the face of a DOS attack. I'll note that it should be relatively easy for Intel to make sure that if there is an undue draw on RDRAND, to at that point enforce "fair share" mode where each of the N cores get at most 1/N of the available entropy. So if you have single core CoCo VM on a 256 core machine trying to boot, and the evil attacker has purchased 255 cores worth of VM's, all of which are busy-looping on RDRAND, while the CoCo VM is booting, if it is looping on RDRAND, it should be getting 1/256th of the availabe RDRAND output, and since it is only trying to grab enough randomness to seed the /dev/random CRNG, if it can't get enough randomness in 10 seconds --- well, Intel's customers should be finding another vendor's CPU that can do a better job. - Ted
Hi Ted, Elena, Dave,

On Thu, Feb 1, 2024 at 5:57 AM Theodore Ts'o <tytso@mit.edu> wrote:
>
> On Wed, Jan 31, 2024 at 07:01:01PM +0100, Jason A. Donenfeld wrote:
> > So if this is what we're congealing around, I guess we can:
> >
> > 0) Leave RDSEED alone and focus on RDRAND.
> > 1) Add `WARN_ON_ONCE(in_early_boot);` to the failure path of RDRAND
> > (and simply hope this doesn't get exploited for guest-guest boot DoS).
> > 2) Loop forever in RDRAND on CoCo VMs, post-boot, with the comments
> > and variable naming making it clear that this is a hardware bug
> > workaround, not a "feature" added for "extra security".
> > 3) Complain loudly to Intel and get them to fix the hardware.
> >
> > Though, a large part of me would really like to skip that step (2),
> > first because it's a pretty gross bandaid that adds lots of
> > complexity, and second because it'll make (3) less poignant
>
> If we need to loop more than, say, 10 seconds in a CoCo VM, I'd just
> panic with a repeated RDRAND failure message. This makes the point of
> (3) that much pointed, and it's better than having a CoCo VM
> mysteriously hang in the face of a DOS attack.

Yea, true. Problem is that in theory, userspace can DoS the kernel's
use of RDRAND. Of course in practice, a userspace process preempting a
kthread for >10 seconds is probably a larger problem.

Anyway, I want to lay out the various potential solutions discussed.
As they all have some drawback, it's worth enumerating them.

==

Solution A) WARN_ON_ONCE(is_early_boot)/BUG_ON(is_early_boot) in the
RDRAND failure path (> 10 retries).

The biggest advantage here is that this is super simple and isn't
CoCo-specific. The premise is that if RDRAND fails 10 times in a row
before userspace has started, it's most definitely a hardware problem.
Systems-wise, the drawback is that, in a VM, it alternatively might be
a guest-guest DoS attack on RDRAND, or in the CoCo case, a host-guest
DoS attack (which is presumably easier because the host controls
scheduling). In the CoCo case, not booting is better than losing
confidentiality. In the non-CoCo case, that seems like theoretically a
DoS we might not want. RNG-wise, the drawback is that this doesn't help
deal with secure reseeding later in time, which is a RNG property that
we otherwise enjoy.

Solution B) BUG_ON(is_early_boot && is_coco_system) in the RDRAND
failure path (> 10 retries).

This is slightly less simple than A, because we have to plumb
CoCo-detection through to the RDRAND helper. [Side note: I feel
ridiculous typing 'CoCo'.] Systems-wise, I don't see drawbacks.
RNG-wise, the drawback is that this doesn't help deal with secure
reseeding later in time, which is a RNG property that we otherwise
enjoy.

Solution C) WARN_ONCE()/BUG() in the RDRAND failure path (> 10 retries).

The advantage here is also simplicity, and the fact that it "ensures"
we'll be able to securely reseed later on. Systems-wise, the drawback
is that userspace can in theory DoS the kernel's RDRAND and cause a
crash.

Solution D) BUG_ON(is_coco_system) in the RDRAND failure path (> 10
retries).

This is slightly less simple than A, because we have to plumb
CoCo-detection through to the RDRAND helper, but it "ensures" we'll be
able to securely reseed later on. Systems-wise, the drawback is that
userspace can in theory DoS the kernel's RDRAND and cause a crash.

Solution E) BUG() in a new time-based RDRAND failure path on CoCo
systems (> 10 seconds).

This adds a lot of complexity, and we'd need some alternative code path
for CoCo with an infinite loop that breaks on a jiffies comparison. But
it at least makes it harder for userspace to DoS the kernel's use of
RDRAND, because it seems hard for a user thread to preempt a kthread
for that long, though maybe somebody has some nasty scheduler tricks
here that would break that hope.

Solution F) Loop forever in RDRAND on CoCo systems.

This makes debugging harder because of lockups (though I suppose we
could WARN after some amount of time), but at least it's somewhat
"sound".

==

I am currently leaning toward (B) as being the lightest touch that has
the least potential to break anything. (F) is also tempting because it
doesn't have the RNG-drawback. The others seem complex or incomplete or
otherwise annoying somehow.

There is also "Solution G" -- do nothing and raise a fuss and let
security researchers go to town and hope Intel gets their act together.
Given that the CoCo thing seems kind of imaginary/aspirational anyway
at this point, I'm very attracted by this. I don't mean to say that I
intend to mount a large argument that we *should* do nothing, but it's
just sort of sitting there in the back of my mind as an appealing
possibility.

Also, I wanted to enumerate currently open questions:

==

Question i) Just how deterministic can these CoCo VMs be? Elena pointed
to some TDX code regarding RDTSC that seemed fairly damning, but I also
wonder what gotchas a motivated researcher might run into and how those
could help us (or not).

Question ii) Just how DoS-able is RDRAND? From host to guest, where the
host controls scheduling, that seems easier, but how much so, and
what's the granularity of these operations, and could retries still
help, or not at all? What about from guest to guest, where the
scheduling is out of control; in that case is there a value of N for
which N retries makes it actually impossible to DoS? What about from
userspace to kernelspace; good value of N?

Question iii) How likely is Intel to actually fix this in a
satisfactory way (see "specifying this is an interesting question" in
[1])? And if they would, what would the timeline even be?

==

Anyway, that's about where I'm at. I figure I'll wait to see if the
internal inquiry within Intel yields anything interesting, and then
maybe we can move forward with solutions (B) or (F) or (G) or a
different Roald Dahl novel instead.

Jason

[1] https://lore.kernel.org/all/CAHmME9ps6W5snQrYeNVMFgfhMKFKciky=-UxxGFbAx_RrxSHoA@mail.gmail.com/
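[Editor's note: to make "Solution B" above concrete, a hedged sketch follows. The rdrand_long() shape, CC_SET()/CC_OUT() and RDRAND_RETRY_LOOPS come from arch/x86/include/asm/archrandom.h, and cc_platform_has() is the kernel's existing CoCo-detection helper, but the BUG_ON() placement and the early-boot test are illustrative only, not a proposed patch.]

/* Illustrative sketch of "Solution B" in the context of archrandom.h. */
static inline bool __must_check rdrand_long(unsigned long *v)
{
	unsigned int retry = RDRAND_RETRY_LOOPS;
	bool ok;

	do {
		asm volatile("rdrand %[out]"
			     CC_SET(c)
			     : CC_OUT(c) (ok), [out] "=r" (*v));
		if (ok)
			return true;
	} while (--retry);

	/*
	 * "Solution B": repeated RDRAND failure this early is either a
	 * hardware fault or a host-controlled DoS, and for a confidential
	 * guest not booting is preferable to booting with a weak RNG.
	 */
	BUG_ON(system_state == SYSTEM_BOOTING &&
	       cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT));

	return false;
}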
On 2/1/24 10:09, Jason A. Donenfeld wrote: > Question ii) Just how DoS-able is RDRAND? From host to guest, where > the host controls scheduling, that seems easier, but how much so, and > what's the granularity of these operations, and could retries still > help, or not at all? What about from guest to guest, where the > scheduling is out of control; in that case is there a value of N for > which N retries makes it actually impossible to DoS? What about from > userspace to kernelspace; good value of N? So far, in practice, I haven't seen a single failure of RDRAND. It's been limited to RDSEED. In a perfect world, I'd change the architecture docs to say, "RDRAND only fails when the hardware breaks" and leave RDSEED defined to be the one that fails easily. Dealing with a fragile RDSEED seems like a much easier problem than dealing with a fragile RDRAND since RDSEED is used _much_ more sparingly in the kernel today. But I'm not sure if the hardware implementations fit into this perfect world I've conjured up. We're going to wrangle up the folks at Intel who can hopefully tell me if I'm totally deluded. Has anyone seen RDRAND failures in practice? Or just RDSEED? > Question iii) How likely is Intel to actually fix this in a > satisfactory way (see "specifying this is an interesting question" in > [1])? And if they would, what would the timeline even be? If the fix is pure documentation, it's on the order of months. I'm holding out hope that some kind of anti-DoS claims like you mentioned: > Specifying this is an interesting question. What exactly might our > requirements be for a "non-broken" RDRAND? It seems like we have two > basic ones: > > - One VMX (or host) context can't DoS another one. > - Ring 3 can't DoS ring 0. are still possible on existing hardware, at least for RDRAND.
On February 1, 2024 10:46:06 AM PST, Dave Hansen <dave.hansen@intel.com> wrote: >On 2/1/24 10:09, Jason A. Donenfeld wrote: >> Question ii) Just how DoS-able is RDRAND? From host to guest, where >> the host controls scheduling, that seems easier, but how much so, and >> what's the granularity of these operations, and could retries still >> help, or not at all? What about from guest to guest, where the >> scheduling is out of control; in that case is there a value of N for >> which N retries makes it actually impossible to DoS? What about from >> userspace to kernelspace; good value of N? > >So far, in practice, I haven't seen a single failure of RDRAND. It's >been limited to RDSEED. In a perfect world, I'd change the architecture >docs to say, "RDRAND only fails when the hardware breaks" and leave >RDSEED defined to be the one that fails easily. > >Dealing with a fragile RDSEED seems like a much easier problem than >dealing with a fragile RDRAND since RDSEED is used _much_ more sparingly >in the kernel today. > >But I'm not sure if the hardware implementations fit into this perfect >world I've conjured up. We're going to wrangle up the folks at Intel >who can hopefully tell me if I'm totally deluded. > >Has anyone seen RDRAND failures in practice? Or just RDSEED? > >> Question iii) How likely is Intel to actually fix this in a >> satisfactory way (see "specifying this is an interesting question" in >> [1])? And if they would, what would the timeline even be? > >If the fix is pure documentation, it's on the order of months. I'm >holding out hope that some kind of anti-DoS claims like you mentioned: > >> Specifying this is an interesting question. What exactly might our >> requirements be for a "non-broken" RDRAND? It seems like we have two >> basic ones: >> >> - One VMX (or host) context can't DoS another one. >> - Ring 3 can't DoS ring 0. > >are still possible on existing hardware, at least for RDRAND. The real question is: what do we actually need? During startup, we could afford a *lot* of looping to collect enough entropy before giving up. After that, even if RDSEED fails 99% of the time, it will still produce far more entropy than a typical external randomness source. We don't want to loop that long, obviously (*), but instead try periodically and let the entropy accumulate. (*) We *could* of course choose to aggressively loop in task context if there task would otherwise block on /dev/random.
> Hi Ted, Elena, Dave, > > On Thu, Feb 1, 2024 at 5:57 AM Theodore Ts'o <tytso@mit.edu> wrote: > > > > On Wed, Jan 31, 2024 at 07:01:01PM +0100, Jason A. Donenfeld wrote: > > > So if this is what we're congealing around, I guess we can: > > > > > > 0) Leave RDSEED alone and focus on RDRAND. > > > 1) Add `WARN_ON_ONCE(in_early_boot);` to the failure path of RDRAND > > > (and simply hope this doesn't get exploited for guest-guest boot DoS). > > > 2) Loop forever in RDRAND on CoCo VMs, post-boot, with the comments > > > and variable naming making it clear that this is a hardware bug > > > workaround, not a "feature" added for "extra security". > > > 3) Complain loudly to Intel and get them to fix the hardware. > > > > > > Though, a large part of me would really like to skip that step (2), > > > first because it's a pretty gross bandaid that adds lots of > > > complexity, and second because it'll make (3) less poignant > > > > If we need to loop more than, say, 10 seconds in a CoCo VM, I'd just > > panic with a repeated RDRAND failure message. This makes the point of > > (3) that much pointed, and it's better than having a CoCo VM > > mysteriously hang in the face of a DOS attack. > > Yea, true. Problem is that in theory, userspace can DoS the kernel's > use of RDRAND. Of course in practice, a userspace process preempting a > kthread for >10 seconds is probably a larger problem. > > Anyway, I want to lay out the various potential solutions discussed. > As they all have some drawback, it's worth enumerating them. > > == > > Solution A) WARN_ON_ONCE(is_early_boot)/BUG_ON(is_early_boot) in the > RDRAND failure path (> 10 retries). > > The biggest advantage here is that this is super simple and isn't > CoCo-specific. The premise is that if RDRAND fails 10 times in a row > before userspace has started, it's most definitely a hardware problem. > Systems-wise, the drawback is that, in a VM, it alternatively might be > a guest-guest DoS attack on RDRAND, or in the CoCo case, a host-guest > DoS attack (which is presumably easier because the host controls > scheduling). In the CoCo case, not booting is better than losing > confidentiality. In the non-CoCo case, that seems like theoretically a > DoS we might not want. RNG-wise, the drawback is that this doesn't > help deal with secure reseeding later in time, which is a RNG property > that we otherwise enjoy. > > Solution B) BUG_ON(is_early_boot && is_coco_system) in the RDRAND > failure path (> 10 retries). > > This is slightly less simple than A, because we have to plumb > CoCo-detection through to the RDRAND helper. [Side note: I feel > ridiculous typing 'CoCo'.] Systems-wise, I don't see drawbacks. > RNG-wise, the drawback is that this doesn't help deal with secure > reseeding later in time, which is a RNG property that we otherwise > enjoy. > > Solution C) WARN_ONCE()/BUG() in the RDRAND failure path (> 10 retries). > > The advantage here is also simplicity, and the fact that it "ensures" > we'll be able to securely reseed later on. Systems-wise, the drawback > is that userspace can in theory DoS the kernel's RDRAND and cause a > crash. > > Solution D) BUG_ON(is_coco_system) in the RDRAND failure path (> 10 retries). > > This is slightly less simple than A, because we have to plumb > CoCo-detection through to the RDRAND helper, but it "ensures" we'll be > able to securely reseed later on. Systems-wise, the drawback is that > userspace can in theory DoS the kernel's RDRAND and cause a crash. 
> > Solution E) BUG() in a new time-based RDRAND failure path on CoCo > systems (> 10 seconds). > > This adds a lot of complexity, and we'd need some alternative code > path for CoCo with an infinite loop that breaks on a jiffies > comparison. But it at least makes it harder for userspace to DoS the > kernel's use of RDRAND, because it seems hard for a user thread to > preempt a kthread for that long, though maybe somebody has some nasty > scheduler tricks here that would break that hope. > > Solution F) Loop forever in RDRAND on CoCo systems. > > This makes debugging harder because of lockups (though I suppose we > could WARN after some amount of time), but at least it's somewhat > "sound". > > == This is a great summary of options, thank you Jason! My proposal would be to wait on result of our internal investigation before proceeding to choose the approach. > > I am currently leaning toward (B) as being the lightest touch that has > the least potential to break anything. (F) is also tempting because it > doesn't have the RNG-drawback. The others seem complex or incomplete > or otherwise annoying somehow. > > There is also "Solution G" -- do nothing and raise a fuss and let > security researchers go to town and hope Intel gets their act > together. Given that the CoCo thing seems kind of > imaginary/aspirational anyway at this point, I'm very attracted by > this. I don't mean to say that I intend to mount a large argument that > we *should* do nothing, but it's just sort of sitting there in the > back of my mind as an appealing possibility. > > Also, I wanted to enumerate currently open questions: > > == > > Question i) Just how deterministic can these CoCo VMs be? Elena > pointed to some TDX code regarding RDTSC that seemed fairly damning, > but I also wonder what gotchas a motivated researcher might run into > and how those could help us (or not). This would be great imo to have a discussion on. I don’t think the internal design or implementation of TDX module is complicated to scare anyone off. So I think it would be a question on how practical would be for VMM to make such an attack on guest kernel? A lot of times such things are about precision, reliability and an ability to filter out the noise. So questions like how precisely *in practice* can VMM measure guest's virtual TSC and other parameters that are used as entropy inputs? But overall both in crypto and security, we don’t like to be too near the security bounds, because we always assume our understanding might be incomplete, so putting a reasonable and clear countermeasure is usually the better approach. > > Question ii) Just how DoS-able is RDRAND? From host to guest, where > the host controls scheduling, that seems easier, but how much so, and > what's the granularity of these operations, and could retries still > help, or not at all? What about from guest to guest, where the > scheduling is out of control; in that case is there a value of N for > which N retries makes it actually impossible to DoS? What about from > userspace to kernelspace; good value of N? All valid questions that I am also trying to understand the answers. Best Regards, Elena. > > Question iii) How likely is Intel to actually fix this in a > satisfactory way (see "specifying this is an interesting question" in > [1])? And if they would, what would the timeline even be? > > == > > Anyway, that's about where I'm at. 
I figure I'll wait to see if the > internal inquiry within Intel yields anything interesting, and then > maybe we can move forward with solutions (B) or (F) or (G) or a > different Roald Dahl novel instead. > > Jason > > [1] https://lore.kernel.org/all/CAHmME9ps6W5snQrYeNVMFgfhMKFKciky=- > UxxGFbAx_RrxSHoA@mail.gmail.com/
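For concreteness, a rough sketch of what "Solution B" above could look like, assuming the existing cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT) check is an acceptable stand-in for "is_coco_system"; the early-boot flag and the function are hypothetical, and rdrand_long() already performs the 10 retries internally:

#include <linux/bug.h>
#include <linux/cc_platform.h>
#include <asm/archrandom.h>

/* Hypothetical: set during early boot, cleared once userspace is up. */
static bool rng_in_early_boot = true;

static unsigned long get_boot_seed(void)
{
	unsigned long v;

	/* rdrand_long() loops RDRAND_RETRY_LOOPS (10) times before giving up. */
	if (rdrand_long(&v))
		return v;

	/* Solution B: a persistent early-boot failure in a CoCo guest is fatal. */
	BUG_ON(rng_in_early_boot && cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT));
	return 0;
}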
On Fri, Feb 02, 2024 at 07:25:42AM +0000, Reshetova, Elena wrote: > This is a great summary of options, thank you Jason! > My proposal would be to wait on result of our internal investigation > before proceeding to choose the approach. I'm happy for the option "Do nothing for now", but if we do want to do something in the absence of more detailed information, I'd suggest doing something simple first, on the theory that it doesn't make things worse, and we can always do something more complicated if it turns out to be needed. In that vein, my suggestion is: > > Solution B) BUG_ON(is_early_boot && is_coco_system) in the RDRAND > > failure path (> 10 retries). > > > > This is slightly less simple than A, because we have to plumb > > CoCo-detection through to the RDRAND helper. [Side note: I feel > > ridiculous typing 'CoCo'.] Systems-wise, I don't see drawbacks. > > RNG-wise, the drawback is that this doesn't help deal with secure > > reseeding later in time, which is a RNG property that we otherwise > > enjoy. If there isn't a global variable we can test to see if Confidential Compute is enabled, I suspect we should just add one. I would assume that /dev/random isn't the only place where we might need to know whether Confidential Compute is enabled. So I don't think we need to plumb CC into the /dev/random code itself; and since we are only doing this in early boot, I wouldn't put it in the RDRAND helper, but rather in the caller of the RDRAND helper that gets used in the early boot path. (Side note, internally, at least in my part of my company, we use CC as the acronym of convenience. And any comments that I make are my own opinion, and do not reflect the positions or desires of my employer...) > > Question iii) How likely is Intel to actually fix this in a > > satisfactory way (see "specifying this is an interesting question" in > > [1])? And if they would, what would the timeline even be? Here are at least three obvious ways that Intel could fix or mitigate this issue: (1) Add more hardware RNG IP's for chips with a huge number of cores. This is the *obvious* way to address the problem with hundreds of CPU cores, although it's only something that can be done on newer chips. (2) Have a per-core throttle where a core is not allowed to issue RDRAND or RDSEED instructions more than N times per millisecond (or some other unit of time). So long as N is larger than the maximum number of SSL connections that a front-end server can actually terminate, it's not going to impact legitimate workloads. This can be approximated by the number of public key operations per unit time that a single CPU core can achieve. And if RDRAND isn't sufficient to support that today, then see solution (1), or CPU customers should switch to some other CPU vendor that can... (3) Provide some kind of perf counter so the host can see which cores are issuing a huge number of RDRAND/RDSEED instructions, and which cores have been suffering from entropy exhaustion RDRAND/RDSEED failures. This would allow the operator of the host to detect which VM's might be carrying out DOS attacks, so that the operator can kill those VM's, and disable the customer account that was launching these abusive VM's. Hopefully mitigation #2 in particular (and maybe mitigation #3) is something that Intel could implement as a firmware update; I'd love comments from Intel if that is the case.
I'll also note that the threat model where customer A is trying to start a CC VM, and customer B has purchased VM's that use all of the other cores on a server, is primarily the sort of thing that a public cloud vendor would need to worry about. And *if* this become a real issue, where some researcher demonstrates that there is a problem, the cloud provider will be hugely incentivized to apply host-side mitigations, and to lean on the guest OS providers to apply guest-side mitigations. So if this is only a DOS which applies for CC VM's, and it turns out that solution (B) is not sufficient, we can do something more complicated, such as having the guest retry the RDRAND instruction for ten seconds. And if some hypothetical "RandExhaust" attack is being written about by the New York Times, I suspect it won't be that hard to get Red Hat to apply mitigations to the RHEL kernel. :-) So I don't think it really is *that* big of a deal; if it turns out to be an issue, we will be able to deal with it. - Ted
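To make mitigation (2) above concrete, here is a purely conceptual model of such a per-core throttle, written in C only for illustration; the real mechanism would live in silicon or microcode, and the budget of 1000 responses per millisecond is an arbitrary number:

#include <stdbool.h>
#include <stdint.h>

#define RNG_BUDGET_PER_MS 1000

struct core_throttle {
	uint64_t window_start_ns;	/* start of the current 1 ms window */
	unsigned int used;		/* responses handed out in this window */
};

/* Returns false when a RDRAND/RDSEED request on this core would be throttled. */
static bool rng_request_allowed(struct core_throttle *t, uint64_t now_ns)
{
	if (now_ns - t->window_start_ns >= 1000000ULL) {
		t->window_start_ns = now_ns;	/* new window, reset the budget */
		t->used = 0;
	}
	if (t->used >= RNG_BUDGET_PER_MS)
		return false;
	t->used++;
	return true;
}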
On Thu, 2024-02-01 at 19:09 +0100, Jason A. Donenfeld wrote: [...] > Anyway, that's about where I'm at. I figure I'll wait to see if the > internal inquiry within Intel yields anything interesting, and then > maybe we can move forward with solutions (B) or (F) or (G) or a > different Roald Dahl novel instead. It's a lot to quote, so I cut it, but all of your solutions assume a rdseed/rdrand failure equates to a system one but it really doesn't: in most systems there are other entropy sources. In confidential computing it is an issue because we have no other trusted sources. The problem with picking on rdseed/rdrand is that there are bound to be older CPUs somewhere that have rng generation bugs that this will expose. How about making the failure contingent on the entropy pool not having any entropy when the first random number is requested? That way systems with more than one usable entropy source won't flag a bug, but it will still flag up confidential computing systems where there's a malicious entropy depleter. James
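A small sketch of the check being suggested here, assuming rng_is_initialized() is an acceptable way to ask "has the pool already been seeded by other sources?"; the function name and the choice of panic() are illustrative only:

#include <linux/kernel.h>
#include <linux/random.h>
#include <asm/archrandom.h>

static void check_arch_rng(void)
{
	unsigned long v;

	if (rdseed_long(&v) || rdrand_long(&v))
		return;			/* the CPU RNG works, nothing to flag */

	if (rng_is_initialized())
		return;			/* other entropy sources already seeded the pool */

	/* No CPU RNG output and nothing else credited: the problematic case. */
	panic("no usable entropy source available");
}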
On Fri, Feb 02, 2024 at 04:47:11PM +0100, James Bottomley wrote: > > It's a lot to quote, so I cut it, but all of your solutions assume a > rdseed/rdrand failure equates to a system one but it really doesn't: in > most systems there are other entropy sources. In confidential > computing it is an issue because we have no other trusted sources. The > problem with picking on rdseed/rdrand is that there are bound to be > older CPUs somewhere that have rng generation bugs that this will > expose. I'm not sure what you're concerned about. As far as I know, all of the CPUs that have some variant of Confidential Compute have some kind of RDRAND-like command. And while we're using the term RDRAND, I'd extend this to any CPU architecture-level RNG instruction which can return failure if it is subject to exhaustion attacks. > How about making the failure contingent on the entropy pool > not having any entropy when the first random number is requested? We have tried to avoid characterizing entropy sources as "valid" or "invalid". First of all, it's rarely quite so black-and-white. An attack which relies on spying on inter-packet arrival times, by having a hardware tap between the CPU and the network switch or a wireless radio right next to the device being attacked, might not be easily carried out by someone who doesn't have local physical access. So we may be measuring various things that might or might not have "entropy". In the case of Confidential Compute, we have declared that none of those other sources constitute "entropy". But that's not a decision that can be made by the computer, or at least until we've cracked the AGI problem. (At which point, we might have other problems --- "I'm sorry, I'm afraid I can't do that.") - Ted
On Fri, 2024-02-02 at 11:05 -0500, Theodore Ts'o wrote: > On Fri, Feb 02, 2024 at 04:47:11PM +0100, James Bottomley wrote: > > > > It's a lot to quote, so I cut it, but all of your solutions assume > > a rdseed/rdrand failure equates to a system one but it really > > doesn't: in most systems there are other entropy sources. In > > confidential computing it is an issue because we have no other > > trusted sources. The problem with picking on rdseed/rdrand is that > > there are bound to be older CPUs somewhere that have rng generation > > bugs that this will > > expose. > > I'm not sure what you're concerned about. As far as I know, all of > the CPU's have some variant of Confidential Compute have some kind of > RDRAND-like command. And while we're using the term RDRAND, I'd > extend this to any CPU architecture-level RNG instruction which can > return failure if it is subject to exhaustion attacks. My big concern is older cpus where rdrand/rdseed don't produce useful entropy. Exhaustion attacks are going to be largely against VMs not physical systems, so I worry about physical systems with older CPUs that might have rdrand issues which then trip our Confidential Computing checks. > > How about making the failure contingent on the entropy pool > > not having any entropy when the first random number is requested? > > We have tried to avoid characterizing entropy sources as "valid" or > "invalid". First of all, it's rarely quite so black-and-white. > Something which is vulnerable to someone who can spy on inter-packet > arrival times by having a hardware tap between the CPU and the > network switch, or a wireless radio right next to the device being > attacked, might not be easily carried out by someone who doesn't have > local physical access. > > So we may be measuring various things that might or might not have > "entropy". In the case of Confidential Compute, we have declared > that none of those other sources constitute "entropy". But that's > not a decision that can be made by the computer, or at least until > we've tracked the AGI problem. (At which point, we might have other > problems --- "I'm sorry, I'm afraid I can't do that.") The signal for rdseed failing is fairly clear, so if the node has other entropy sources, it should continue; otherwise it should signal failure. Figuring out how a confidential computing environment signals that failure is TBD. James
Hi Ted, Kirill, On Fri, Feb 02, 2024 at 10:39:27AM -0500, Theodore Ts'o wrote: > On Fri, Feb 02, 2024 at 07:25:42AM +0000, Reshetova, Elena wrote: > > This is a great summary of options, thank you Jason! > > My proposal would be to wait on result of our internal investigation > > before proceeding to choose the approach. > > I'm happy for the option "Do nothing for now", but if we do want to do > something in the absence of more detailed information, I'd suggest > doing something simple for first, on the theory that it doesn't make > things worse, and we can always do something more complicated if it > turns out to be needed. > > In that vein, my suggestion is: > > > > Solution B) BUG_ON(is_early_boot && is_coco_system) in the RDRAND > > > failure path (> 10 retries). > > > > > > This is slightly less simple than A, because we have to plumb > > > CoCo-detection through to the RDRAND helper. [Side note: I feel > > > ridiculous typing 'CoCo'.] Systems-wise, I don't see drawbacks. > > > RNG-wise, the drawback is that this doesn't help deal with secure > > > reseeding later in time, which is a RNG property that we otherwise > > > enjoy. > > If there isn't a global variable we can test to see if Confidential > Compute is enabled, I suspect we should just add one. I would assume > that /dev/random isn't the only place where we might need to do > whether Confidential Compute is enabled. > > So I don't think plumbing CC into the /dev/random code, and since we > are only doing this in early boot, I wouldn't put it in the RDRAND > helper, but rather in the caller of the RDRAND helper that gets used > in the early boot path. Yea, actually, I had a pretty similar idea for something like that that's very non-invasive, where none of this even touches the RDRAND core code, much less random.c. Specifically, we consider "adding some extra RDRAND to the pool" like any other driver that wants to add some of its own seeds to the pool, with add_device_randomness(), a call that lives in various driver code, doesn't influence any entropy readiness aspects of random.c, and can safely be sprinkled in any device or platform driver. Specifically what I'm thinking about is something like: void coco_main_boottime_init_function_somewhere_deep_in_arch_code(void) { // [...] // bring up primary CoCo nuts // [...] /* CoCo requires an explicit RDRAND seed, because the host can make the * rest of the system deterministic. */ unsigned long seed[32 / sizeof(long)]; size_t i, longs; for (i = 0; i < ARRAY_SIZE(seed); i += longs) { longs = arch_get_random_longs(&seed[i], ARRAY_SIZE(seed) - i); /* If RDRAND is being DoS'd, panic, because we can't ensure * confidentiality. */ BUG_ON(!longs); } add_device_randomness(seed, sizeof(seed)); memzero_explicit(seed, sizeof(seed)); // [...] // do other CoCo things // [...] } I would have no objection to the CoCo people adding something like this and would give it my Ack, but more importantly, my Ack for that doesn't even matter, because add_device_randomness() is pretty innocuous. So Kirill, if nobody else here objects to that approach, and you want to implement it in some super minimal way like that, that would be fine with me. Or maybe we want to wait for that internal inquiry at Intel to return some answers first. But either way, this might be an easy approach that doesn't add too much complexity. Jason
On Fri, Feb 02, 2024 at 10:28:01PM +0100, James Bottomley wrote: > > My big concern is older cpus where rdrand/rdseed don't produce useful > entropy. Exhaustion attacks are going to be largely against VMs not > physical systems, so I worry about physical systems with older CPUs > that might have rdrand issues which then trip our Confidential > Computing checks. For (non-CC) VM's the answer is virtio-rng. This solves the exhaustion problem, since if you can't trust the host, the VM's security is toast anyway (again, ignoring Confidential Compute). > The signal for rdseed failing is fairly clear, so if the node has other > entropy sources, it should continue otherwise it should signal failure. > Figuring out how a confidential computing environment signals that > failure is TBD. That's a design decision, and I believe we've been converging on a panic during early boot. Post boot, if we've successfully initialized the guest kernel's RNG, we're secure so long as the cryptographic primitives haven't been defeated --- and if they have, such as if Quantum Computing becomes practical, we've got bigger problems anyway. - Ted
On February 3, 2024 6:35:47 AM PST, Theodore Ts'o <tytso@mit.edu> wrote: >On Fri, Feb 02, 2024 at 10:28:01PM +0100, James Bottomley wrote: >> >> My big concern is older cpus where rdrand/rdseed don't produce useful >> entropy. Exhaustion attacks are going to be largely against VMs not >> physical systems, so I worry about physical systems with older CPUs >> that might have rdrand issues which then trip our Confidential >> Computing checks. > >For (non-CC) VM's the answer is virtio-rng. This solves the >exhaustion problem, since if you can't trust the host, the VM's >security is taost anyway (again, ignoring Confidential Compute). > >> The signal for rdseed failing is fairly clear, so if the node has other >> entropy sources, it should continue otherwise it should signal failure. >> Figuring out how a confidential computing environment signals that >> failure is TBD. > >That's a design decision, and I believe we've been converging on a >panic during early boot. Post boot, if we've successfully succeeded >in initializing the guest kernel's RNG, we're secure so long as the >cryptographic primitives haven't been defeated --- and if we have, >such as if Quantuum Computing because practical, we've got bigger >problems anyway. > > - Ted I also want to emphasize that there is a huge difference between boot (initialization) time and runtime. Runtime harvesting has always been opportunistic in Linux, and so if RDSEED fails, try again later – unless perhaps a task is blocked on /dev/random in which case it might make sense to aggressively loop on the blocked core instead of just putting the process to sleep. Initialization time is a different game entirely. Until we have accumulated about 256-512 bits of seed data, even the best PRNG can't really be considered "completely random." Thus a far more aggressive approach may be called for; furthermore, this is the time to look for total failure of the NRBG: if after some number N of attempts (where I believe N should be quite large; spending a full second in the very worst case is probably better than declaring failure prematurely) we have not acquired enough entropy, then warn and optionally panic the system. By setting the limit in terms of time rather than iterations, this avoids the awkward issue of "the interface to the RDSEED unit is too fast and so it returns failure too often." I don't think anyone would argue that the right thing would be to slow down the response time of RDSEED for that reason, even though it would most likely radically reduce the failure rate (because the NRBG would have more time to produce entropy between queries at the maximum rate.) Let's say, entirely hypothetically (as of right now I have absolutely *no* insider information of the RNG unit roadmap), that we were to implement a prefetch buffer in the core, such that a single or a handful of RD* instructions could execute in a handful of cycles, with the core itself issuing the request to the RNG unit when there is space in the queue. Such a prefetch buffer could rather obviously get *very* quickly exhausted because the poll rate could be dramatically increased, and having the core stall until there is data may or may not be a good solution (advantage: the CPU can go to a lower power state while waiting; disadvantage: opportunistic harvesting would prefer a "poll and fail fast" variation, *especially* if the CPU is going to fulfill the request autonomously anyway.)
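As a rough illustration of the time-bounded (rather than iteration-bounded) initialization loop described above; the one-second budget, the panic_on_fail knob and the function itself are assumptions made for the sake of the example:

#include <linux/jiffies.h>
#include <linux/bug.h>
#include <linux/kernel.h>
#include <asm/archrandom.h>
#include <asm/processor.h>

static bool collect_boot_seed(unsigned long *seed, size_t nlongs, bool panic_on_fail)
{
	unsigned long deadline = jiffies + HZ;	/* roughly one second of wall-clock time */
	size_t filled = 0;

	while (filled < nlongs) {
		if (rdseed_long(&seed[filled]) || rdrand_long(&seed[filled])) {
			filled++;
			continue;
		}
		if (time_after(jiffies, deadline)) {
			WARN(1, "hardware RNG produced nothing within the time budget");
			if (panic_on_fail)
				panic("cannot seed the CRNG from the hardware RNG");
			return false;
		}
		cpu_relax();
	}
	return true;
}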
Hi Kirill, On Sat, Feb 3, 2024 at 11:12 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote: > Yea, actually, I had a pretty similar idea for something like that > that's very non-invasive, where none of this even touches the RDRAND > core code, much less random.c. Specifically, we consider "adding some > extra RDRAND to the pool" like any other driver that wants to add some > of its own seeds to the pool, with add_device_randomness(), a call that > lives in various driver code, doesn't influence any entropy readiness > aspects of random.c, and can safely be sprinkled in any device or > platform driver. > > Specifically what I'm thinking about is something like: > > void coco_main_boottime_init_function_somewhere_deep_in_arch_code(void) > { > // [...] > // bring up primary CoCo nuts > // [...] > > /* CoCo requires an explicit RDRAND seed, because the host can make the > * rest of the system deterministic. > */ > unsigned long seed[32 / sizeof(long)]; > size_t i, longs; > for (i = 0; i < ARRAY_SIZE(seed); i += longs) { > longs = arch_get_random_longs(&seed[i], ARRAY_SIZE(seed) - i); > /* If RDRAND is being DoS'd, panic, because we can't ensure > * confidentiality. > */ > BUG_ON(!longs); > } > add_device_randomness(seed, sizeof(seed)); > memzero_explicit(seed, sizeof(seed)); > > // [...] > // do other CoCo things > // [...] > } > > I would have no objection to the CoCo people adding something like this > and would give it my Ack, but more importantly, my Ack for that doesn't > even matter, because add_device_randomness() is pretty innocuous. > > So Kirill, if nobody else here objects to that approach, and you want to > implement it in some super minimal way like that, that would be fine > with me. Or maybe we want to wait for that internal inquiry at Intel to > return some answers first. But either way, this might be an easy > approach that doesn't add too much complexity. I went ahead and implemented this just to have something concrete out there: https://lore.kernel.org/all/20240209164946.4164052-1-Jason@zx2c4.com/ I probably screwed up some x86 platform conventions/details, but that's the general idea I had in mind. Jason
> Hi Kirill, > > On Sat, Feb 3, 2024 at 11:12 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote: > > Yea, actually, I had a pretty similar idea for something like that > > that's very non-invasive, where none of this even touches the RDRAND > > core code, much less random.c. Specifically, we consider "adding some > > extra RDRAND to the pool" like any other driver that wants to add some > > of its own seeds to the pool, with add_device_randomness(), a call that > > lives in various driver code, doesn't influence any entropy readiness > > aspects of random.c, and can safely be sprinkled in any device or > > platform driver. > > > > Specifically what I'm thinking about is something like: > > > > void coco_main_boottime_init_function_somewhere_deep_in_arch_code(void) > > { > > // [...] > > // bring up primary CoCo nuts > > // [...] > > > > /* CoCo requires an explicit RDRAND seed, because the host can make the > > * rest of the system deterministic. > > */ > > unsigned long seed[32 / sizeof(long)]; > > size_t i, longs; > > for (i = 0; i < ARRAY_SIZE(seed); i += longs) { > > longs = arch_get_random_longs(&seed[i], ARRAY_SIZE(seed) - i); > > /* If RDRAND is being DoS'd, panic, because we can't ensure > > * confidentiality. > > */ > > BUG_ON(!longs); > > } > > add_device_randomness(seed, sizeof(seed)); > > memzero_explicit(seed, sizeof(seed)); > > > > // [...] > > // do other CoCo things > > // [...] > > } > > > > I would have no objection to the CoCo people adding something like this > > and would give it my Ack, but more importantly, my Ack for that doesn't > > even matter, because add_device_randomness() is pretty innocuous. > > > > So Kirill, if nobody else here objects to that approach, and you want to > > implement it in some super minimal way like that, that would be fine > > with me. Or maybe we want to wait for that internal inquiry at Intel to > > return some answers first. But either way, this might be an easy > > approach that doesn't add too much complexity. > > I went ahead and implemented this just to have something concrete out there: > https://lore.kernel.org/all/20240209164946.4164052-1-Jason@zx2c4.com/ > > I probably screwed up some x86 platform conventions/details, but > that's the general idea I had in mind. > Thank you Jason! I want to bring another potential idea here for a discussion, which Peter Anvin proposed in our internal discussions, and I like it conceptually better than any options we discussed so far since it is much more generic. What if we instead of doing some special treatment on rdrand/seed, we try to fix the underneath problem of Linux RNG not supporting CoCo threat model. Linux RNG has almost set in stone definition of what sources contribute entropy and what don’t (with some additional flexibility with flags like trust_cpu). This works well for the current fixed threat model, but doesn’t work for CoCo because some sources are suddenly not trusted anymore to contribute entropy. However, some are still trusted and that is not just rdrand/rdseed, but we would also trust add_hwgenerator_randomness (given that we use TEE IO device here or have a way to get this input securely). So, even in theoretical scenario that both rdrand/rdseed is broken (let's say HW failure), a Linux RNG can actually boot securely in the guest if we have enough entropy from add_hwgenerator_randomness. 
So the change would be around adding the notion of conditional entropy counting (we will always take input as we do now because it won't hurt), which would automatically give us a correct behavior in _credit_init_bits() for initial seeding of crng. Also we need to have a generic way to stop the boot if the entropy is not increasing (for any reason) and prevent booting with an insecurely seeded crng. I do understand that this is going to be a much bigger change than anything we are discussing so far, but conceptually it sounds right to be able to have a say in what sources of entropy one trusts at runtime (probably applicable beyond CoCo in the future also) and what is the action when we cannot collect the entropy from these sources. What does everyone think? Best Regards, Elena.
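Purely to illustrate the shape of this idea, a sketch of per-source crediting inside random.c; none of the names below exist today except the internal pool helpers, and the policy plumbing (who clears the trust bits, and when) is left entirely open:

/* Hypothetically, inside drivers/char/random.c: */

enum rng_source {
	RNG_SRC_ARCH,		/* RDRAND/RDSEED */
	RNG_SRC_HWRNG,		/* add_hwgenerator_randomness() callers */
	RNG_SRC_TIMING,		/* interrupt/scheduler timing sources */
	RNG_SRC_DEVICE,		/* add_device_randomness() callers */
	RNG_SRC_MAX,
};

/* A CoCo platform (or the user) could clear the bits it does not trust. */
static unsigned long rng_trusted_sources = BIT(RNG_SRC_ARCH) | BIT(RNG_SRC_HWRNG) |
					   BIT(RNG_SRC_TIMING) | BIT(RNG_SRC_DEVICE);

static void add_source_entropy(enum rng_source src, const void *buf,
			       size_t len, size_t entropy_bits)
{
	mix_pool_bytes(buf, len);			/* always mix the input in */
	if (test_bit(src, &rng_trusted_sources))
		_credit_init_bits(entropy_bits);	/* only credit trusted sources */
}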
On Mon, Feb 12, 2024 at 08:25:33AM +0000, Reshetova, Elena wrote: > What if we instead of doing some special treatment on rdrand/seed, we > try to fix the underneath problem of Linux RNG not supporting CoCo threat > model. Linux RNG has almost set in stone definition of what sources contribute > entropy and what don’t (with some additional flexibility with flags like trust_cpu). > This works well for the current fixed threat model, but doesn’t work for > CoCo because some sources are suddenly not trusted anymore to contribute > entropy. However, some are still trusted and that is not just rdrand/rdseed, > but we would also trust add_hwgenerator_randomness (given that we use > TEE IO device here or have a way to get this input securely). So, even in > theoretical scenario that both rdrand/rdseed is broken (let's say HW failure), > a Linux RNG can actually boot securely in the guest if we have enough > entropy from add_hwgenerator_randomness. So the problem with this is that there is no way we can authenticate the hardware RNG. For example, the hypervisor could claim that there is a ChaosKey USB key attached, and at the moment, unlike all other hardware random number generators, the Linux kernel is configured to blindly trust the ChaosKey because it was designed by Keith Packard and Bdale Garbee, and "It Must Be Good". But the only way that we know that it is a ChaosKey is by its USB major and minor id numbers --- and a malicious hypervisor could fake up such a device. And of course, that's not unique to the hypervisor --- someone could create a hardware USB key that claimed to be a ChaosKey, but which generated a fixed sequence, say 3,1,4,1,5,9,2,6,... and it would pass most RNG quality checkers, since it's obviously not a repeated sequence of digits, so the mandated FIPS-required check would give it a thumbs up. And it doesn't have to be a faked ChaosKey device; a hypervisor could claim that there is a virtual TPM with its hardware random number generator, but it's also gimmicked to always give the same fixed sequence, and there's no way the guest OS could know otherwise. Hence, for the unique requirements of Confidential Compute, I'm afraid it's RDRAND/RDSEED or bust.... - Ted
Theodore Ts'o wrote: > On Mon, Feb 12, 2024 at 08:25:33AM +0000, Reshetova, Elena wrote: > > What if we instead of doing some special treatment on rdrand/seed, we > > try to fix the underneath problem of Linux RNG not supporting CoCo threat > > model. Linux RNG has almost set in stone definition of what sources contribute > > entropy and what don’t (with some additional flexibility with flags like trust_cpu). > > This works well for the current fixed threat model, but doesn’t work for > > CoCo because some sources are suddenly not trusted anymore to contribute > > entropy. However, some are still trusted and that is not just rdrand/rdseed, > > but we would also trust add_hwgenerator_randomness (given that we use > > TEE IO device here or have a way to get this input securely). So, even in > > theoretical scenario that both rdrand/rdseed is broken (let's say HW failure), > > a Linux RNG can actually boot securely in the guest if we have enough > > entropy from add_hwgenerator_randomness. > > So the problem with this is that there is now way we can authenticate > the hardware RNG. Sure there is, that is what, for example, PCI TDISP (TEE Device Interface Security Protocol) is about. Set aside the difficulty of doing the PCI TDISP flow early in boot, and validating the device certificate and measurements based on golden values without talking to a remote verifier etc..., but if such a device has been accepted and its driver calls hwrng_register() it should be added as an entropy source. Now maybe there is something fatal in that "etc", and RDRAND needs to work for early entropy, but if a PCI device passes guest acceptance there should be no additional concerns for it to be considered a CC-approved RNG.
On Mon, Feb 12, 2024 at 11:28:31PM -0800, Dan Williams wrote: > Sure there is, that is what, for example, PCI TDISP (TEE Device > Interface Security Protocol) is about. Set aside the difficulty of doing > the PCI TDISP flow early in boot, and validating the device certficate > and measurements based on golden values without talking to a remote > verifier etc..., but if such a device has been accepted and its driver > calls hwrng_register() it should be added as an entropy source. How real is TDISP? What hardware exists today and how much of this support is ready to land in the kernel? Looking at the news articles, it appears to me like bleeding edge technology, and what an unkind person might call "vaporware"? Is that an unfair characterization? There have been plenty of things that have squirted out of standards bodies, like for example, "object-based storage", which has turned out to be a complete commercial failure and was never actually deployed in any real numbers, other than sample hardware being provided to academic researchers. How can we be sure that PCI TDISP won't end up going down that route? In any case, if we are going to go down this path, we will need to have some kind of policy engine to have hwrng_register() reject non-authenticated hardware if Confidential Compute is enabled (and possibly in other cases). - Ted
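A sketch of the kind of check that policy could boil down to, with a hypothetical "authenticated" attribute supplied by the caller; cc_platform_has() and hwrng_register() are existing interfaces, the wrapper itself is not:

#include <linux/hw_random.h>
#include <linux/cc_platform.h>
#include <linux/errno.h>

/* Hypothetical wrapper: refuse unattested RNG devices in a CoCo guest. */
int hwrng_register_policy(struct hwrng *rng, bool device_authenticated)
{
	if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT) && !device_authenticated)
		return -EPERM;
	return hwrng_register(rng);
}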
Theodore Ts'o wrote: > On Mon, Feb 12, 2024 at 11:28:31PM -0800, Dan Williams wrote: > > Sure there is, that is what, for example, PCI TDISP (TEE Device > > Interface Security Protocol) is about. Set aside the difficulty of doing > > the PCI TDISP flow early in boot, and validating the device certficate > > and measurements based on golden values without talking to a remote > > verifier etc..., but if such a device has been accepted and its driver > > calls hwrng_register() it should be added as an entropy source. > > How real is TDISP? What hardware exists today and how much of this > support is ready to land in the kernel? Looking at the news articles, > it appears to me like bleeding edge technology, and what an unkind > person might call "vaporware"? Is that an unfair characterization? Indeed it is. Typically when you have x86, riscv, arm, and s390 folks all show up at a Linux Plumbers session [1] to talk about their approach to handling a new platform paradigm, that is a decent indication that the technology is more real than not. Point taken that it is not here today, but it is also not multiple hardware generations away as the Plumbers participation indicated. > There have plenty of things that have squirted out of standards > bodies, like for example, "objected base storage", which has turned > out to be a complete commercial failure and was never actually > deployed in any real numbers, other than sample hardare being provided > to academic researchers. How can we be sure that PCI TDISP won't end > up going down that route? Of course, that is always a risk. History is littered with obsolescence, some of it before seeing any commercial uptake, some after. > In any case, if we are going to go down this path, we will need to > have some kind of policy engine hwrng_register() reject > non-authenticated hardware if Confidential Compute is enabled (and > possibly in other cases). Sounds reasonable; that recognition is all I wanted from mentioning PCI TDISP. [1]: https://lpc.events/event/17/contributions/1633/
On Tue, Feb 13, 2024 at 04:53:06PM -0800, Dan Williams wrote: > > Indeed it is. Typically when you have x86, riscv, arm, and s390 folks > all show up at a Linux Plumbers session [1] to talk about their approach > to handling a new platform paradigm, that is a decent indication that > the technology is more real than not. Point taken that it is not here > today, but it is also not multiple hardware generations away as the > Plumbers participation indicated. My big concerns with TDISP which make me believe it may not be a silver bullet are that (a) it's hyper-complex (although to be fair Confidential Compute isn't exactly simple), and (b) it's one thing to digitally sign software so you know that it comes from a trusted source; but it's a **lot** harder to prove that hardware hasn't been tampered with --- a digital signature can't tell you much about whether or not the hardware is in an as-built state coming from the factory --- this requires things like wrapping the device with resistive wire in multiple directions with a Wheatstone bridge to detect if the wire has gotten cut or shorted, then dunking the whole thing in epoxy, so that any attempt to tamper with the hardware will result in it self-destructing (via a thermite charge or equivalent :-) Remember, the whole conceit of Confidential Compute is that you don't trust the cloud provider --- but if that entity controls the PCI cards installed in their servers, and that entity has the ability to *modify* the PCI cards in the server, all of the digital signatures and fancy-schmancy TDISP complexity isn't necessarily going to save you. The final concern is that it may take quite a while before these devices become real, and then for cloud providers like Amazon and Azure to actually deploy them. And in the meantime, Confidential Compute VM's are already something which are available for customers to purchase *today*. So we need some kind of solution right now, and preferably, something which is simple enough that it is likely to be back-portable to RHEL. (And I fear that even if TDISP hardware existed today, it is so complicated that it may be a heavy lift to get it backported into enterprise distro kernels.) Ultimately, if CPU's can actually have an architectural RNG a la RDRAND/RDSEED that actually can do the right thing in the face of entropy draining attacks, that seems to be a **much** simpler solution. And even if it requires waiting for the next generation of CPU's, this might be faster than waiting for the TDISP ecosystem to mature. - Ted
Theodore Ts'o wrote: > On Tue, Feb 13, 2024 at 04:53:06PM -0800, Dan Williams wrote: > > > > Indeed it is. Typically when you have x86, riscv, arm, and s390 folks > > all show up at a Linux Plumbers session [1] to talk about their approach > > to handling a new platform paradigm, that is a decent indication that > > the technology is more real than not. Point taken that it is not here > > today, but it is also not multiple hardware generations away as the > > Plumbers participation indicated. > > My big concerns with TDISP which make me believe it may not be a > silver bullet is that (a) it's hyper-complex (although to be fair > Confidential Compute isn't exactly simple, and (b) it's one thing to > digitally sign software so you know that it comes from a trusted > source; but it's a **lot** harder to prove that hardware hasn't been > tampered with --- a digital siganture can't tell you much about > whether or not the hardware is in an as-built state coming from the > factory --- this requires things like wrapping the device with > resistive wire in multiple directions with a whetstone bridge to > detect if the wire has gotten cut or shorted, then dunking the whole > thing in epoxy, so that any attempt to tamper with the hardware will > result it self-destructing (via a thermite charge or equivalent :-) > > Remember, the whole conceit of Confidential Compute is that you don't > trust the cloud provider --- but if that entity controls the PCI cards > installed in their servers, and and that entity has the ability to > *modify* the PCI cards in the server, all of the digital signatures > and fancy-schmancy TDISP complexity isn't necessarily going to save > you. > > The final concern is that it may take quite a while before these > devices become real, and then for cloud providers like Amazon, Azure, > to actually deploy them. And in the meantime, Confidential Compute > VM's are already something which are available for customers to > purchase *today*. So we need some kind of solution right now, and > preferably, something which is simple enough that it is likely to be > back-portable to RHEL. > > (And I fear that even if TDISP hardware existed today, it is so > complicated that it may be a heavy lift to get it backported into > enterprise distro kernels.) No lies detected. Something is broken if you need to rely on TDISP to get a reliable random number in a guest. All it can enforce is that the VMM is not emulating a HWRNG. Also, VMM denial of service is outside of the TDISP threat model, so if VMM can steal all the entropy, or DoS RDSEED, you are back at square one. The only reason for jumping in on this tangent was to counterpoint the implication that the RNG core must always hard code a dependency on CPU HWRNG for confidential computing. However, yes, given the timelines for TDISP Linux could hard code that choice in the near term for expediency and leave it to the TDISP folks to unwind it later. > Ultimately, if CPU's can actually have an architectgural RNG ala > RDRAND/RDSEED that actually can do the right thing in the face of > entropy draining attacks, that seems to be a **much** simpler > solution. And even if it requires waiting for the next generation of > CPU's, this might be faster than waiting for the TDISP ecosystem > mature. Yes, please. I am happy if TDISP flies below the hype cycle so that its implications can be considered carefully. At the same time I will keep an eye out for discussions like this where guest attestation of hardware provenance is raised.
> Ultimately, if CPU's can actually have an architectgural RNG ala > RDRAND/RDSEED that actually can do the right thing in the face of > entropy draining attacks, that seems to be a **much** simpler > solution. I don’t think anyone would dispute that the rdrand approach we are discussing here is simpler. My point (and also Peter's original idea) was that if we want to do it correctly and generically (and *not* just about confidential computing), we ought to provide a way for users to define what entropy sources for Linux RNG they are willing to trust or not. This should not be a policy decision that the kernel hardcodes (we try hard to avoid policies in the kernel), but left for users to decide/configure based on their preferences, trust notions, fears of backdooring, or whatever else. This of course has the flip side that some users will get it wrong, but reasonable secure defaults can be provided also. Best Regards, Elena.
On 14.02.24 г. 6:32 ч., Theodore Ts'o wrote: > On Tue, Feb 13, 2024 at 04:53:06PM -0800, Dan Williams wrote: >> >> Indeed it is. Typically when you have x86, riscv, arm, and s390 folks >> all show up at a Linux Plumbers session [1] to talk about their approach >> to handling a new platform paradigm, that is a decent indication that >> the technology is more real than not. Point taken that it is not here >> today, but it is also not multiple hardware generations away as the >> Plumbers participation indicated. > > My big concerns with TDISP which make me believe it may not be a > silver bullet is that (a) it's hyper-complex (although to be fair > Confidential Compute isn't exactly simple, and (b) it's one thing to > digitally sign software so you know that it comes from a trusted > source; but it's a **lot** harder to prove that hardware hasn't been > tampered with --- a digital siganture can't tell you much about > whether or not the hardware is in an as-built state coming from the > factory --- this requires things like wrapping the device with > resistive wire in multiple directions with a whetstone bridge to > detect if the wire has gotten cut or shorted, then dunking the whole > thing in epoxy, so that any attempt to tamper with the hardware will > result it self-destructing (via a thermite charge or equivalent :-) This really reminds me of the engineering that goes into the omnipresent POS terminals at every store, since they store certificates from the card (Visa/Master) operators. So I wonder if at some point we'll have a POS-like device (by merit of its engineering) in every server.... > > Remember, the whole conceit of Confidential Compute is that you don't > trust the cloud provider --- but if that entity controls the PCI cards > installed in their servers, and and that entity has the ability to > *modify* the PCI cards in the server, all of the digital signatures > and fancy-schmancy TDISP complexity isn't necessarily going to save > you. Can't the same argument go for the CPU, though it's a lot more "integrated" into the silicon substrate, yet we somehow believe CoCo ascertains that a VM is running on trusted hardware? But ultimately the CPU is still a part that comes from the untrusted CSP. <snip>
On Wed, Feb 14, 2024 at 10:34:48AM +0200, Nikolay Borisov wrote: Hi, I hope the week is going well for everyone. > On 14.02.24 г. 6:32 ч., Theodore Ts'o wrote: > >On Tue, Feb 13, 2024 at 04:53:06PM -0800, Dan Williams wrote: > >> > >>Indeed it is. Typically when you have x86, riscv, arm, and s390 folks > >>all show up at a Linux Plumbers session [1] to talk about their approach > >>to handling a new platform paradigm, that is a decent indication that > >>the technology is more real than not. Point taken that it is not here > >>today, but it is also not multiple hardware generations away as the > >>Plumbers participation indicated. > > > >My big concerns with TDISP which make me believe it may not be a > >silver bullet is that (a) it's hyper-complex (although to be fair > >Confidential Compute isn't exactly simple, and (b) it's one thing to > >digitally sign software so you know that it comes from a trusted > >source; but it's a **lot** harder to prove that hardware hasn't been > >tampered with --- a digital siganture can't tell you much about > >whether or not the hardware is in an as-built state coming from the > >factory --- this requires things like wrapping the device with > >resistive wire in multiple directions with a whetstone bridge to > >detect if the wire has gotten cut or shorted, then dunking the whole > >thing in epoxy, so that any attempt to tamper with the hardware will > >result it self-destructing (via a thermite charge or equivalent :-) > This really reminds me of the engineering that goes into the > omnipresent POS terminals ate every store, since they store > certificates from the card (Visa/Master) operators. So I wonder if > at somepoint we'll have a pos-like device (by merit of its > engineering) in every server.... It already exists. CoCo, at least the Intel implementation, is dependent on what amounts to this concept. > >Remember, the whole conceit of Confidential Compute is that you don't > >trust the cloud provider --- but if that entity controls the PCI cards > >installed in their servers, and and that entity has the ability to > >*modify* the PCI cards in the server, all of the digital signatures > >and fancy-schmancy TDISP complexity isn't necessarily going to save > >you. > Can't the same argument go for the CPU, though it's a lot more > "integrated" into the silicong substrate, yet we somehow believe > CoCo ascertains that a vm is running on trusted hardware? But > ultimately the CPU is still a part that comes from the untrusted > CSP. The attestation model for TDX is largely built on top of SGX. The Intel predicate with respect to SGX/TDX is that you have to trust the CPU silicon implementation; if you can't entertain that level of trust, it is game over for security. To support that security model, Intel provides infrastructure that proves that the software is running on a 'Genuine Intel' CPU. Roughly, a root key is burned into the silicon that is used as the basis for additional derived keys. The key access and derivation processes can only occur when the process is running software with a known signature in a protected region of memory (enclave). The model is to fill a structure with data that defines the hardware/software state. A keyed checksum is run over the structure that allows a relying party to verify that the data structure contents could have only been generated on a valid Intel CPU.
This process verifies that the CPU is from a known vendor, which is of course only the initial starting point for verifying that something like a VM is running in a known and trusted state. But, if you can't start with that predicate you have nothing to build on. The actual implementation nowadays is a bit more complex, given that all of this has to happen on multi-socket systems which involve more than one CPU, but the concept is the same. Have a good day. As always, Dr. Greg The Quixote Project - Flailing at the Travails of Cybersecurity https://github.com/Quixote-Project
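As a purely conceptual illustration of the flow described above; the field names and sizes are invented for this sketch and do not match the real SGX or TDX report layouts:

/* A report covering the hardware/software state of the guest/enclave. */
struct attestation_report {
	unsigned char cpu_svn[16];	/* security version of the silicon/firmware */
	unsigned char measurement[48];	/* hash of the software loaded into the TEE */
	unsigned char report_data[64];	/* caller-supplied data, e.g. a fresh nonce */
	unsigned char mac[32];		/* keyed checksum over the fields above, using a
					 * key only derivable inside genuine hardware */
};

/*
 * A relying party checks the MAC (via the vendor's quoting/verification
 * infrastructure) and then compares the measurement against known-good values.
 */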
> This is a great summary of options, thank you Jason! > My proposal would be to wait on result of our internal investigation > before proceeding to choose the approach. Hi everyone, I am finally able to share the result of my AR and here is the statement about rdrand/rdseed on Intel platforms: "The RdRand in a non-defective device is designed to be faster than the bus, so when a core accesses the output from the DRNG, it will always get a random number. As a result, it is hard to envision a scenario where the RdRand, on a fully functional device, will underflow. The carry flag after RdRand signals an underflow so in the case of a defective chip, this will prevent the code thinking it has a random number when it does not. RdSeed however is limited by the speed of the noise source. So it is not faster than the bus and there may be an underflow signaled by the carry flag. When reading for multiple values, the total throughput of RdSeed random numbers varies over different products due to variation in the silicon processes, operating voltage and speed vs power tradeoffs. The throughput is shared between the cores" In addition there is a plan to publish a whitepaper and add clarifications to Intel official documentation on this topic, but this would obviously take longer. Best Regards, Elena.
Hi Elena, On Wed, Feb 14, 2024 at 4:18 PM Reshetova, Elena <elena.reshetova@intel.com> wrote: > "The RdRand in a non-defective device is designed to be faster than the bus, > so when a core accesses the output from the DRNG, it will always get a > random number. > As a result, it is hard to envision a scenario where the RdRand, on a fully > functional device, will underflow. > The carry flag after RdRand signals an underflow so in the case of a defective chip, > this will prevent the code thinking it has a random number when it does not. That's really great news, especially combined with a very similar statement from Borislav about AMD chips: On Fri, Feb 9, 2024 at 10:45 PM Borislav Petkov <bp@alien8.de> wrote: > Yeah, I know exactly what you mean and I won't go into details for > obvious reasons. Two things: > > * Starting with Zen3, provided properly configured hw RDRAND will never > fail. It is also fair when feeding the different contexts. I assume that this faster-than-the-bus-ness also takes into account the various accesses required to even switch contexts when scheduling VMs, so your proposed host-guest scheduling attack can't really happen either. Correct? One clarifying question in all of this: what is the point of the "try 10 times" advice? Is the "faster than the bus" statement actually "faster than the bus if you try 10 times"? Or is the "10 times" advice just old and not relevant. In other words, is the following a reasonable patch? diff --git a/arch/x86/include/asm/archrandom.h b/arch/x86/include/asm/archrandom.h index 02bae8e0758b..2d5bf5aa9774 100644 --- a/arch/x86/include/asm/archrandom.h +++ b/arch/x86/include/asm/archrandom.h @@ -13,22 +13,16 @@ #include <asm/processor.h> #include <asm/cpufeature.h> -#define RDRAND_RETRY_LOOPS 10 - /* Unconditional execution of RDRAND and RDSEED */ static inline bool __must_check rdrand_long(unsigned long *v) { bool ok; - unsigned int retry = RDRAND_RETRY_LOOPS; - do { - asm volatile("rdrand %[out]" - CC_SET(c) - : CC_OUT(c) (ok), [out] "=r" (*v)); - if (ok) - return true; - } while (--retry); - return false; + asm volatile("rdrand %[out]" + CC_SET(c) + : CC_OUT(c) (ok), [out] "=r" (*v)); + WARN_ON(!ok); + return ok; } static inline bool __must_check rdseed_long(unsigned long *v) (As for the RDSEED clarification, that also matches Borislav's reply, is what we expected and knew experimentally, and doesn't really have any bearing on Linux's RNG or this discussion, since RDRAND is all we need anyway.) Regards, Jason
On Mon, Feb 12, 2024 at 08:25:33AM +0000, Reshetova, Elena wrote: > So the change would be around adding the notion of conditional entropy > counting (we will always take input as we do now because it wont hurt), > which would automatically give us a correct behavior in _credit_init_bits() > for initial seeding of crng. I basically have zero interest in this kind of highly complex addition, and I think that'll lead us back toward how the RNG was in the past. "Entropy counting" is mostly an illusion, at least in terms of doing so from measurement. We've got some heuristics to mitigate "premature first" but these things will mostly only ever be heuristic. If a platform like CoCo knows nothing else will work, then a platform-specific choice like the one in this patch is sufficient to do the trick. And in general, this seems like a weird thing to design around: if the CPU is actually just totally broken and defective, maybe CoCo shouldn't continue executing anyway? So I'm pretty loathe to go in this direction of highly complex policy frameworks and such. Anyway, based on your last email (and my reply to it), it seems like we're mostly in the clear anyway, and we can rely on RDRAND failure ==> hardware failure. Jason
Hi Elena, > > On Wed, Feb 14, 2024 at 4:18 PM Reshetova, Elena <elena.reshetova@intel.com> > wrote: > > "The RdRand in a non-defective device is designed to be faster than the bus, > > so when a core accesses the output from the DRNG, it will always get a > > random number. > > As a result, it is hard to envision a scenario where the RdRand, on a fully > > functional device, will underflow. > > The carry flag after RdRand signals an underflow so in the case of a defective chip, > > this will prevent the code thinking it has a random number when it does not. > > That's really great news, especially combined with a very similar > statement from Borislav about AMD chips: > > On Fri, Feb 9, 2024 at 10:45 PM Borislav Petkov <bp@alien8.de> wrote: > > Yeah, I know exactly what you mean and I won't go into details for > > obvious reasons. Two things: > > > > * Starting with Zen3, provided properly configured hw RDRAND will never > > fail. It is also fair when feeding the different contexts. > > I assume that this faster-than-the-bus-ness also takes into account the > various accesses required to even switch contexts when scheduling VMs, > so your proposed host-guest scheduling attack can't really happen > either. Correct? Yes, this attack wont be possible for rdrand, so we are good. > > One clarifying question in all of this: what is the point of the "try 10 > times" advice? Is the "faster than the bus" statement actually "faster > than the bus if you try 10 times"? Or is the "10 times" advice just old > and not relevant. The whitepaper should clarify this more in the future, but in short 10 times retry is not relevant based on the above statement. "when core accesses the output from the DRNG, it will always get a random number" - there are no statements of re-try here. > > In other words, is the following a reasonable patch? > > diff --git a/arch/x86/include/asm/archrandom.h > b/arch/x86/include/asm/archrandom.h > index 02bae8e0758b..2d5bf5aa9774 100644 > --- a/arch/x86/include/asm/archrandom.h > +++ b/arch/x86/include/asm/archrandom.h > @@ -13,22 +13,16 @@ > #include <asm/processor.h> > #include <asm/cpufeature.h> > > -#define RDRAND_RETRY_LOOPS 10 > - > /* Unconditional execution of RDRAND and RDSEED */ > > static inline bool __must_check rdrand_long(unsigned long *v) > { > bool ok; > - unsigned int retry = RDRAND_RETRY_LOOPS; > - do { > - asm volatile("rdrand %[out]" > - CC_SET(c) > - : CC_OUT(c) (ok), [out] "=r" (*v)); > - if (ok) > - return true; > - } while (--retry); > - return false; > + asm volatile("rdrand %[out]" > + CC_SET(c) > + : CC_OUT(c) (ok), [out] "=r" (*v)); > + WARN_ON(!ok); > + return ok; > } Do you intend this as a generic rdrand change or also a fix for CoCo case problem? I personally don’t like WARN_ON from security pov, but I know I am in minority with this. > > static inline bool __must_check rdseed_long(unsigned long *v) > > (As for the RDSEED clarification, that also matches Borislav's reply, is > what we expected and knew experimentally, and doesn't really have any > bearing on Linux's RNG or this discussion, since RDRAND is all we need > anyway.) Agree. Just wanted to have it also included for the overall picture. > > Regards, > Jason
Hi Elena, On Wed, Feb 14, 2024 at 05:59:48PM +0000, Reshetova, Elena wrote: > > > > In other words, is the following a reasonable patch? > > > > diff --git a/arch/x86/include/asm/archrandom.h > > b/arch/x86/include/asm/archrandom.h > > index 02bae8e0758b..2d5bf5aa9774 100644 > > --- a/arch/x86/include/asm/archrandom.h > > +++ b/arch/x86/include/asm/archrandom.h > > @@ -13,22 +13,16 @@ > > #include <asm/processor.h> > > #include <asm/cpufeature.h> > > > > -#define RDRAND_RETRY_LOOPS 10 > > - > > /* Unconditional execution of RDRAND and RDSEED */ > > > > static inline bool __must_check rdrand_long(unsigned long *v) > > { > > bool ok; > > - unsigned int retry = RDRAND_RETRY_LOOPS; > > - do { > > - asm volatile("rdrand %[out]" > > - CC_SET(c) > > - : CC_OUT(c) (ok), [out] "=r" (*v)); > > - if (ok) > > - return true; > > - } while (--retry); > > - return false; > > + asm volatile("rdrand %[out]" > > + CC_SET(c) > > + : CC_OUT(c) (ok), [out] "=r" (*v)); > > + WARN_ON(!ok); > > + return ok; > > } > > Do you intend this as a generic rdrand change or also a fix for CoCo > case problem? I was thinking generic, since in all cases, RDRAND failing points to a hardware bug in the CPU ITSELF (!), which is solid grounds for a WARN(). > I personally don't like WARN_ON from security > pov, but I know I am in minority with this. I share the same opinion as you, that WARN_ON() is a little weak and we should BUG_ON() or panic() or whatever, but I also know that this ship has really sailed long ago, that in lots of ways Linus is also right that BUG() is bad and shouldn't be used for much, and this just isn't a hill to die on. And the "panic_on_warn" flag exists and "security guides" sometimes say to turn this on, etc, so I think WARN_ON() remains the practical compromise that won't get everyone's feathers ruffled up. By the way, there is still one question lingering in the back of my mind, but I don't know if answering it would divulge confidential implementation details. You said that RDRAND is faster than the bus, so failures won't be observable, while RDSEED is not because it requires collecting entropy from the ether which is slow. That makes intuitive sense on a certain dumb simplistic level: AES is just an algorithm so is fast, while entropy collection is a more physical thing so is slow. But if you read the implementation details, RDRAND is supposed to reseed after 511 calls. So what's to stop you from exhausting RDSEED in one place, while also getting RDRAND to the end of its 511 calls, and *then* having your victim make the subsequent RDRAND call, which tries to reseed (or is in progress of doing so), finds that RDSEED is out of batteries, and underflows? What's the magic detail that makes this scenario not possible? Jason
On 2/14/24 11:21, Jason A. Donenfeld wrote:
> Hi Elena,
>
> On Wed, Feb 14, 2024 at 4:18 PM Reshetova, Elena <elena.reshetova@intel.com> wrote:
>> "The RdRand in a non-defective device is designed to be faster than the bus,
>> so when a core accesses the output from the DRNG, it will always get a
>> random number.
>> As a result, it is hard to envision a scenario where the RdRand, on a fully
>> functional device, will underflow.
>> The carry flag after RdRand signals an underflow so in the case of a defective chip,
>> this will prevent the code thinking it has a random number when it does not.
>
> That's really great news, especially combined with a very similar
> statement from Borislav about AMD chips:
>
> On Fri, Feb 9, 2024 at 10:45 PM Borislav Petkov <bp@alien8.de> wrote:
>> Yeah, I know exactly what you mean and I won't go into details for
>> obvious reasons. Two things:
>>
>> * Starting with Zen3, provided properly configured hw RDRAND will never
>>   fail. It is also fair when feeding the different contexts.
>
> I assume that this faster-than-the-bus-ness also takes into account the
> various accesses required to even switch contexts when scheduling VMs,
> so your proposed host-guest scheduling attack can't really happen
> either. Correct?
>
> One clarifying question in all of this: what is the point of the "try 10
> times" advice? Is the "faster than the bus" statement actually "faster
> than the bus if you try 10 times"? Or is the "10 times" advice just old
> and not relevant.
>
> In other words, is the following a reasonable patch?
>
> diff --git a/arch/x86/include/asm/archrandom.h b/arch/x86/include/asm/archrandom.h
> index 02bae8e0758b..2d5bf5aa9774 100644
> --- a/arch/x86/include/asm/archrandom.h
> +++ b/arch/x86/include/asm/archrandom.h
> @@ -13,22 +13,16 @@
>  #include <asm/processor.h>
>  #include <asm/cpufeature.h>
>
> -#define RDRAND_RETRY_LOOPS        10
> -
>  /* Unconditional execution of RDRAND and RDSEED */
>
>  static inline bool __must_check rdrand_long(unsigned long *v)
>  {
>          bool ok;
> -        unsigned int retry = RDRAND_RETRY_LOOPS;
> -        do {
> -                asm volatile("rdrand %[out]"
> -                             CC_SET(c)
> -                             : CC_OUT(c) (ok), [out] "=r" (*v));
> -                if (ok)
> -                        return true;
> -        } while (--retry);
> -        return false;
> +        asm volatile("rdrand %[out]"
> +                     CC_SET(c)
> +                     : CC_OUT(c) (ok), [out] "=r" (*v));
> +        WARN_ON(!ok);
> +        return ok;

Don't forget that Linux will run on older hardware as well, so the 10
retries might be valid for that. Or do you intend this change purely for
CVMs?

Thanks,
Tom

>  }
>
>  static inline bool __must_check rdseed_long(unsigned long *v)
>
> (As for the RDSEED clarification, that also matches Borislav's reply, is
> what we expected and knew experimentally, and doesn't really have any
> bearing on Linux's RNG or this discussion, since RDRAND is all we need
> anyway.)
>
> Regards,
> Jason
Hi Tom,

On Wed, Feb 14, 2024 at 8:46 PM Tom Lendacky <thomas.lendacky@amd.com> wrote:
> Don't forget that Linux will run on older hardware as well, so the 10
> retries might be valid for that. Or do you intend this change purely
> for CVMs?

Oh, grr, darnit. That is indeed a very important detail. I meant this
for generic code, so yea, if it's actually just Zen3+, then this won't
fly.

AMD people, Intel people: what are the fullest statements we can rely
on here? Do the following two statements work?

1) On newer chips, RDRAND never fails.
2) On older chips, RDRAND never fails if you try 10 times in a loop,
   unless you consider host->guest attacks, which we're not, because CoCo
   is only a thing on the newer chips.

If those hold true, then the course of action would be to just add a
WARN_ON(!ok) but keep the loop as-is.

(Anyway, I posted
https://lore.kernel.org/lkml/20240214195744.8332-1-Jason@zx2c4.com/
just before seeing this message.)

Jason
On Wed, Feb 14, 2024 at 09:04:34PM +0100, Jason A. Donenfeld wrote:
> AMD people, Intel people: what are the fullest statements we can rely
> on here? Do the following two statements work?
>
> 1) On newer chips, RDRAND never fails.
> 2) On older chips, RDRAND never fails if you try 10 times in a loop,
>    unless you consider host->guest attacks, which we're not, because CoCo
>    is only a thing on the newer chips.
>
> If those hold true, then the course of action would be to just add a
> WARN_ON(!ok) but keep the loop as-is.

I think we may only want to do the WARN_ON in early boot. Otherwise,
on older chips, if a userspace process executes RDRAND in a tight
loop, it might cause the WARN_ON to trigger, which is considered
undesirable (and is certainly going to be something that could result
in a syzbot complaint).

					- Ted
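For illustration only, a boot-gated warning along the lines Ted describes could be combined with the existing retry loop roughly as follows. This is a sketch, not a patch anyone posted in the thread, and it assumes system_state/SYSTEM_BOOTING is usable from wherever rdrand_long() ends up being defined:

static inline bool __must_check rdrand_long(unsigned long *v)
{
        unsigned int retry = RDRAND_RETRY_LOOPS;
        bool ok;

        do {
                asm volatile("rdrand %[out]"
                             CC_SET(c)
                             : CC_OUT(c) (ok), [out] "=r" (*v));
                if (ok)
                        return true;
        } while (--retry);

        /*
         * Ten consecutive underflows point at broken hardware, but only
         * complain during boot, so that userspace hammering RDRAND on
         * older parts cannot trigger the warning (or a syzbot report).
         */
        WARN_ON(system_state == SYSTEM_BOOTING);
        return false;
}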
On 2/14/24 09:21, Jason A. Donenfeld wrote:
> One clarifying question in all of this: what is the point of the "try 10
> times" advice? Is the "faster than the bus" statement actually "faster
> than the bus if you try 10 times"? Or is the "10 times" advice just old
> and not relevant.
>
> In other words, is the following a reasonable patch?
>
> diff --git a/arch/x86/include/asm/archrandom.h b/arch/x86/include/asm/archrandom.h
> index 02bae8e0758b..2d5bf5aa9774 100644
> --- a/arch/x86/include/asm/archrandom.h
> +++ b/arch/x86/include/asm/archrandom.h
> @@ -13,22 +13,16 @@
>  #include <asm/processor.h>
>  #include <asm/cpufeature.h>
>
> -#define RDRAND_RETRY_LOOPS        10
> -
>  /* Unconditional execution of RDRAND and RDSEED */
>
>  static inline bool __must_check rdrand_long(unsigned long *v)
>  {
>          bool ok;
> -        unsigned int retry = RDRAND_RETRY_LOOPS;
> -        do {
> -                asm volatile("rdrand %[out]"
> -                             CC_SET(c)
> -                             : CC_OUT(c) (ok), [out] "=r" (*v));
> -                if (ok)
> -                        return true;
> -        } while (--retry);
> -        return false;
> +        asm volatile("rdrand %[out]"
> +                     CC_SET(c)
> +                     : CC_OUT(c) (ok), [out] "=r" (*v));
> +        WARN_ON(!ok);
> +        return ok;
>  }

The key question here is if RDRAND can ever fail on perfectly good
hardware. I think it's theoretically possible for the entropy source
health checks to fail on perfectly good hardware for an arbitrarily long
time. But the odds of this happening to the point of it affecting RDRAND
are rather small. There's a reason that the guidance says: "the odds of
ten failures in a row are astronomically small" _instead_ of claiming
the same about a single RDRAND.

Given the scale that the kernel operates at, I think we should leave the
loop.
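To put a number behind "astronomically small": if a single RDRAND attempt on healthy hardware failed with some small probability p because of a transient health-check condition, ten consecutive failures would occur with probability p^10 -- for instance, an entirely illustrative (not vendor-supplied) p = 10^-3 would give 10^-30 -- which is the arithmetic that makes keeping the retry loop attractive at the scale the kernel runs at.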
> You said that RDRAND is faster than the bus, so failures won't be
> observable, while RDSEED is not because it requires collecting entropy
> from the ether which is slow. That makes intuitive sense on a certain
> dumb simplistic level: AES is just an algorithm so is fast, while
> entropy collection is a more physical thing so is slow. But if you read
> the implementation details, RDRAND is supposed to reseed after 511
> calls. So what's to stop you from exhausting RDSEED in one place, while
> also getting RDRAND to the end of its 511 calls, and *then* having your
> victim make the subsequent RDRAND call, which tries to reseed (or is in
> progress of doing so), finds that RDSEED is out of batteries, and
> underflows? What's the magic detail that makes this scenario not
> possible?

This was on my list of scenarios to double-check whether it is possible
or not, and the answer is that it is not possible (at least for Intel).
This scenario is also briefly described in the public doc [1]:

"Note that the conditioner does not send the same seed values to both the
DRBG and the ENRNG. This pathway can be thought of as an alternating
switch, with one seed going to the DRGB and the next seed going to the ENRNG.
*This construction ensures* that a software application can never obtain the
value used to seed the DRBG, *nor can it launch a Denial of Service (DoS)
attack against the DRBG through repeated executions of the RDSEED instruction.*"

The upcoming whitepaper hopefully should provide more details on this
also.

[1] https://www.intel.com/content/www/us/en/developer/articles/guide/intel-digital-random-number-generator-drng-software-implementation-guide.html
On Thu, Feb 15, 2024 at 07:07:45AM +0000, Reshetova, Elena wrote:
> > You said that RDRAND is faster than the bus, so failures won't be
> > observable, while RDSEED is not because it requires collecting entropy
> > from the ether which is slow. That makes intuitive sense on a certain
> > dumb simplistic level: AES is just an algorithm so is fast, while
> > entropy collection is a more physical thing so is slow. But if you read
> > the implementation details, RDRAND is supposed to reseed after 511
> > calls. So what's to stop you from exhausting RDSEED in one place, while
> > also getting RDRAND to the end of its 511 calls, and *then* having your
> > victim make the subsequent RDRAND call, which tries to reseed (or is in
> > progress of doing so), finds that RDSEED is out of batteries, and
> > underflows? What's the magic detail that makes this scenario not
> > possible?
>
> This was on my list of scenarios to double-check whether it is possible
> or not, and the answer is that it is not possible (at least for Intel).
> This scenario is also briefly described in the public doc [1]:
>
> "Note that the conditioner does not send the same seed values to both the
> DRBG and the ENRNG. This pathway can be thought of as an alternating
> switch, with one seed going to the DRGB and the next seed going to the ENRNG.
> *This construction ensures* that a software application can never obtain the
> value used to seed the DRBG, *nor can it launch a Denial of Service (DoS)
> attack against the DRBG through repeated executions of the RDSEED instruction.*"

Interesting, and good to hear. So it must also be implicit that the time
required for 511 RDRAND calls exceeds the reseeding time, so that you
couldn't exhaust the seeds indirectly by flushing RDRAND.

Jason
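The scenario discussed here can also be probed empirically from user space: one thread hammers RDSEED to keep the entropy source as drained as possible while another counts RDRAND underflows. The sketch below is illustrative only (not a program posted in this thread); it assumes the _rdrand64_step()/_rdseed64_step() intrinsics and pthreads, and leaves out niceties such as pinning the threads to particular cores:

/* Build with: gcc -O2 -mrdrnd -mrdseed -pthread drain.c -o drain */
#include <immintrin.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_int stop;

/* Keep RDSEED as exhausted as we can manage from one thread. */
static void *drain_rdseed(void *arg)
{
        unsigned long long v;

        (void)arg;
        while (!atomic_load(&stop))
                _rdseed64_step(&v);
        return NULL;
}

int main(void)
{
        unsigned long long v, fails = 0, i;
        const unsigned long long iters = 1ULL << 26;
        pthread_t t;

        pthread_create(&t, NULL, drain_rdseed, NULL);

        /* Meanwhile, count RDRAND underflows (CF=0) in this thread. */
        for (i = 0; i < iters; i++) {
                if (!_rdrand64_step(&v))
                        fails++;
        }

        atomic_store(&stop, 1);
        pthread_join(t, NULL);
        printf("rdrand underflows: %llu / %llu\n", fails, iters);
        return 0;
}

Per Elena's explanation above, the expectation on Intel parts is that the RDRAND underflow count stays at zero even while RDSEED is being drained.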
On Wed, Feb 14, 2024 at 03:11:03PM -0500, Theodore Ts'o wrote:
> On Wed, Feb 14, 2024 at 09:04:34PM +0100, Jason A. Donenfeld wrote:
> > AMD people, Intel people: what are the fullest statements we can rely
> > on here? Do the following two statements work?
> >
> > 1) On newer chips, RDRAND never fails.
> > 2) On older chips, RDRAND never fails if you try 10 times in a loop,
> >    unless you consider host->guest attacks, which we're not, because CoCo
> >    is only a thing on the newer chips.
> >
> > If those hold true, then the course of action would be to just add a
> > WARN_ON(!ok) but keep the loop as-is.
>
> I think we may only want to do the WARN_ON in early boot. Otherwise,
> on older chips, if a userspace process executes RDRAND in a tight
> loop, it might cause the WARN_ON to trigger, which is considered
> undesirable (and is certainly going to be something that could result
> in a syzbot complaint).

Yea, seems reasonable. Or maybe we just don't bother adding any WARN
there and just address the CoCo thing with patch 2/2. As it turns out,
on normal systems, the RNG is designed to deal with a broken or missing
RDRAND anyway. So maybe adding these heuristics to warn when the CPU is
broken isn't worth it? Or maybe that's an interesting thing to do?
Dunno, I'm indifferent about it, I suppose. But I agree that if it's
added, doing it at early boot only makes the most sense.

Jason
diff --git a/arch/x86/include/asm/archrandom.h b/arch/x86/include/asm/archrandom.h
index 02bae8e0758b..918c5880de9e 100644
--- a/arch/x86/include/asm/archrandom.h
+++ b/arch/x86/include/asm/archrandom.h
@@ -33,11 +33,19 @@ static inline bool __must_check rdrand_long(unsigned long *v)
 
 static inline bool __must_check rdseed_long(unsigned long *v)
 {
+        unsigned int retry = RDRAND_RETRY_LOOPS;
         bool ok;
-        asm volatile("rdseed %[out]"
-                     CC_SET(c)
-                     : CC_OUT(c) (ok), [out] "=r" (*v));
-        return ok;
+
+        do {
+                asm volatile("rdseed %[out]"
+                             CC_SET(c)
+                             : CC_OUT(c) (ok), [out] "=r" (*v));
+
+                if (ok)
+                        return true;
+        } while (--retry);
+
+        return false;
 }
 
 /*
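For context, a caller-side sketch of how the two helpers might be combined -- first try the (now retried) RDSEED path for fresh seed material, then fall back to RDRAND output -- where the wrapper name and the fallback policy are purely illustrative and not part of any posted patch:

/*
 * Hypothetical caller, not from the patch: prefer fresh seed material
 * from RDSEED; fall back to RDRAND output if all retries underflow.
 */
static inline bool __must_check get_seed_or_random_long(unsigned long *v)
{
        if (rdseed_long(v))
                return true;
        return rdrand_long(v);
}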