From patchwork Mon Nov 28 11:18:27 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Jason A. Donenfeld" X-Patchwork-Id: 26649 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp5596671wrr; Mon, 28 Nov 2022 03:37:40 -0800 (PST) X-Google-Smtp-Source: AA0mqf5u6MXdwT0dpSqWmqdOfgSf0NwNIfCaMK1deYbvwAnzIXjVPM8ht7W6iik5DyqmwOO2pZfJ X-Received: by 2002:aa7:c788:0:b0:458:b9f9:9fba with SMTP id n8-20020aa7c788000000b00458b9f99fbamr30708059eds.305.1669635460686; Mon, 28 Nov 2022 03:37:40 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1669635460; cv=none; d=google.com; s=arc-20160816; b=WOSPGLCHGR1bPulhdIRHcPSwsPzi1iXLs7XF3CtmQxYPXMhqtLxNsL/96UDLhIP2En 40JKB6NZvbt/auMucXNka1Qq8sfmztC/Zmiy707eeUC1SusIZcJFWLBlTAvXEtbD1dO4 v7+zOJVVHqoqOhR0oZV9/cSFZV2WAwPIJn030vI+6TucSw4etL+krSZjgfJ2nsEwpmOH A3Nqkcs1Qe2DWGlI5I0BjOvEFBo/KU3Tj5+ZwTWTR0zxN/9yHDpperpV/zH0VhR5Qtwd yfzMiBirLVMEBbV6dzsnE8+riEBUe9EO+ZetcidFPy3JItaSa9Od/ewv50LaFhAzC5Jz mZXg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=C6bKCdbJI8caGV86jHwYqsJ4Ef3NCBwwd7g33zRLmuo=; b=TQDmPNH9D9vKBghi/e4fQrkT7TKsk70uADKLb1r05YPVTW9blfNheQ+wnGjo+m/8jo chXAZkn5LPAFZnAYinQMLq8PZuBFGNmyMCK/XsPp3Df2sFiZmdqv1r/oiyCD0AOEpJG1 BITedNDEgSfhJGeKbr2SH03q7prr/8lw3hKmQu6pUDgfugUFDUXnTybPCX6ok67w24kA qHz3uO8km7fdlTIhCMEe2/4mW+YNIFJrXcHVOzXxpcjg7ZeZe6QvsA0V06n6b2Ul2KIo N6xjwFYFeQG8PcJ8xRaVQgm+ydZAB0mEha/dtWYepBRVn7G6XRxaVMBPgb5eMgYba2XJ ExqQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@zx2c4.com header.s=20210105 header.b=ckVqQGGQ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=zx2c4.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id nd5-20020a170907628500b0078d9d67841fsi9525366ejc.400.2022.11.28.03.37.17; Mon, 28 Nov 2022 03:37:40 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@zx2c4.com header.s=20210105 header.b=ckVqQGGQ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=zx2c4.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231331AbiK1LUk (ORCPT + 99 others); Mon, 28 Nov 2022 06:20:40 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38114 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231307AbiK1LUJ (ORCPT ); Mon, 28 Nov 2022 06:20:09 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 61A0D1CB0B; Mon, 28 Nov 2022 03:18:56 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id E518D61118; Mon, 28 Nov 2022 11:18:55 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 0ED6DC433C1; Mon, 28 Nov 2022 11:18:53 +0000 (UTC) Authentication-Results: smtp.kernel.org; dkim=pass (1024-bit key) header.d=zx2c4.com header.i=@zx2c4.com header.b="ckVqQGGQ" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=zx2c4.com; s=20210105; t=1669634333; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=C6bKCdbJI8caGV86jHwYqsJ4Ef3NCBwwd7g33zRLmuo=; b=ckVqQGGQeuga74ChPjFfsMQUB4b1SV3DbIz3y7zKQls1AxbW45vAtZKY4GsrlwkQo7orHE M1BCPnt7iiGc58J/ZnpgyOE0MVMout30lEX3znJOM3ALwiLlS4kHDuYXJn4+x/FdzmwOQO 5Y2WlvM1Sx868PXTMWOFuk5a1BfGaEc= Received: by mail.zx2c4.com (ZX2C4 Mail Server) with ESMTPSA id dc600f2e (TLSv1.3:TLS_AES_256_GCM_SHA384:256:NO); Mon, 28 Nov 2022 11:18:52 +0000 (UTC) From: "Jason A. Donenfeld" To: linux-kernel@vger.kernel.org, patches@lists.linux.dev, tglx@linutronix.de Cc: "Jason A. Donenfeld" , linux-crypto@vger.kernel.org, linux-api@vger.kernel.org, x86@kernel.org, Greg Kroah-Hartman , Adhemerval Zanella Netto , Carlos O'Donell , Florian Weimer , Arnd Bergmann , Christian Brauner Subject: [PATCH v8 1/3] random: add vgetrandom_alloc() syscall Date: Mon, 28 Nov 2022 12:18:27 +0100 Message-Id: <20221128111829.2477505-2-Jason@zx2c4.com> In-Reply-To: <20221128111829.2477505-1-Jason@zx2c4.com> References: <20221128111829.2477505-1-Jason@zx2c4.com> MIME-Version: 1.0 X-Spam-Status: No, score=-6.8 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, RCVD_IN_DNSWL_HI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1750739673050977412?= X-GMAIL-MSGID: =?utf-8?q?1750739673050977412?= The vDSO getrandom() works over an opaque per-thread state of an unexported size, which must be marked as MADV_WIPEONFORK and be mlock()'d for proper operation. Over time, the nuances of these allocations may change or grow or even differ based on architectural features. The syscall has the signature: void *vgetrandom_alloc([inout] unsigned int *num, [out] unsigned int *size_per_each, unsigned int flags); This takes the desired number of opaque states in `num`, and returns a pointer to an array of opaque states, the number actually allocated back in `num`, and the size in bytes of each one in `size_per_each`, enabling a libc to slice up the returned array into a state per each thread. (The `flags` argument is always zero for now.) Libc is expected to allocate a chunk of these on first use, and then dole them out to threads as they're created, allocating more when needed. The following commit shows an example of this, being used in conjunction with the getrandom() vDSO function. We very intentionally do *not* leave state allocation for vDSO getrandom() up to userspace itself, but rather provide this new syscall for such allocations. vDSO getrandom() must not store its state in just any old memory address, but rather just ones that the kernel specially allocates for it, leaving the particularities of those allocations up to the kernel. Signed-off-by: Jason A. Donenfeld --- MAINTAINERS | 1 + drivers/char/random.c | 75 +++++++++++++++++++++++++ include/uapi/asm-generic/unistd.h | 7 ++- include/vdso/getrandom.h | 24 ++++++++ kernel/sys_ni.c | 3 + lib/vdso/Kconfig | 7 +++ scripts/checksyscalls.sh | 4 ++ tools/include/uapi/asm-generic/unistd.h | 7 ++- 8 files changed, 126 insertions(+), 2 deletions(-) create mode 100644 include/vdso/getrandom.h diff --git a/MAINTAINERS b/MAINTAINERS index 256f03904987..3894f947a507 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -17287,6 +17287,7 @@ T: git https://git.kernel.org/pub/scm/linux/kernel/git/crng/random.git S: Maintained F: drivers/char/random.c F: drivers/virt/vmgenid.c +F: include/vdso/getrandom.h RAPIDIO SUBSYSTEM M: Matt Porter diff --git a/drivers/char/random.c b/drivers/char/random.c index a2a18bd3d7d7..16e9edce771f 100644 --- a/drivers/char/random.c +++ b/drivers/char/random.c @@ -8,6 +8,7 @@ * into roughly six sections, each with a section header: * * - Initialization and readiness waiting. + * - vDSO support helpers. * - Fast key erasure RNG, the "crng". * - Entropy accumulation and extraction routines. * - Entropy collection routines. @@ -39,6 +40,7 @@ #include #include #include +#include #include #include #include @@ -55,6 +57,9 @@ #include #include #include +#ifdef CONFIG_VGETRANDOM_ALLOC_SYSCALL +#include +#endif #include #include #include @@ -167,6 +172,76 @@ int __cold execute_with_initialized_rng(struct notifier_block *nb) __func__, (void *)_RET_IP_, crng_init) + +/******************************************************************** + * + * vDSO support helpers. + * + * The actual vDSO function is defined over in lib/vdso/getrandom.c, + * but this section contains the kernel-mode helpers to support that. + * + ********************************************************************/ + +#ifdef CONFIG_VGETRANDOM_ALLOC_SYSCALL +/** + * vgetrandom_alloc - allocate opaque states for use with vDSO getrandom(). + * + * @num: on input, a pointer to a suggested hint of how many states to + * allocate, and on output the number of states actually allocated. + * + * @size_per_each: the size of each state allocated, so that the caller can + * split up the returned allocation into individual states. + * + * @flags: currently always zero. + * + * The getrandom() vDSO function in userspace requires an opaque state, which + * this function allocates by mapping a certain number of special pages into + * the calling process. It takes a hint as to the number of opaque states + * desired, and provides the caller with the number of opaque states actually + * allocated, the size of each one in bytes, and the address of the first + * state. + + * Returns a pointer to the first state in the allocation. + * + */ +SYSCALL_DEFINE3(vgetrandom_alloc, unsigned int __user *, num, + unsigned int __user *, size_per_each, unsigned int, flags) +{ + const size_t state_size = sizeof(struct vgetrandom_state); + size_t alloc_size, num_states; + unsigned long pages_addr; + unsigned int num_hint; + int ret; + + if (flags) + return -EINVAL; + + if (get_user(num_hint, num)) + return -EFAULT; + + num_states = clamp_t(size_t, num_hint, 1, (SIZE_MAX & PAGE_MASK) / state_size); + alloc_size = PAGE_ALIGN(num_states * state_size); + + if (put_user(alloc_size / state_size, num) || put_user(state_size, size_per_each)) + return -EFAULT; + + pages_addr = vm_mmap(NULL, 0, alloc_size, PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS | MAP_LOCKED, 0); + if (IS_ERR_VALUE(pages_addr)) + return pages_addr; + + ret = do_madvise(current->mm, pages_addr, alloc_size, MADV_WIPEONFORK); + if (ret < 0) + goto err_unmap; + + return pages_addr; + +err_unmap: + vm_munmap(pages_addr, alloc_size); + return ret; +} +#endif + /********************************************************************* * * Fast key erasure RNG, the "crng". diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h index 45fa180cc56a..77b6debe7e18 100644 --- a/include/uapi/asm-generic/unistd.h +++ b/include/uapi/asm-generic/unistd.h @@ -886,8 +886,13 @@ __SYSCALL(__NR_futex_waitv, sys_futex_waitv) #define __NR_set_mempolicy_home_node 450 __SYSCALL(__NR_set_mempolicy_home_node, sys_set_mempolicy_home_node) +#ifdef __ARCH_WANT_VGETRANDOM_ALLOC +#define __NR_vgetrandom_alloc 451 +__SYSCALL(__NR_vgetrandom_alloc, sys_vgetrandom_alloc) +#endif + #undef __NR_syscalls -#define __NR_syscalls 451 +#define __NR_syscalls 452 /* * 32 bit systems traditionally used different diff --git a/include/vdso/getrandom.h b/include/vdso/getrandom.h new file mode 100644 index 000000000000..5f04c8bf4bd4 --- /dev/null +++ b/include/vdso/getrandom.h @@ -0,0 +1,24 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2022 Jason A. Donenfeld . All Rights Reserved. + */ + +#ifndef _VDSO_GETRANDOM_H +#define _VDSO_GETRANDOM_H + +#include + +struct vgetrandom_state { + union { + struct { + u8 batch[CHACHA_BLOCK_SIZE * 3 / 2]; + u32 key[CHACHA_KEY_SIZE / sizeof(u32)]; + }; + u8 batch_key[CHACHA_BLOCK_SIZE * 2]; + }; + unsigned long generation; + u8 pos; + bool in_use; +}; + +#endif /* _VDSO_GETRANDOM_H */ diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c index 860b2dcf3ac4..f28196cb919b 100644 --- a/kernel/sys_ni.c +++ b/kernel/sys_ni.c @@ -360,6 +360,9 @@ COND_SYSCALL(pkey_free); /* memfd_secret */ COND_SYSCALL(memfd_secret); +/* random */ +COND_SYSCALL(vgetrandom_alloc); + /* * Architecture specific weak syscall entries. */ diff --git a/lib/vdso/Kconfig b/lib/vdso/Kconfig index d883ac299508..b22584f8da03 100644 --- a/lib/vdso/Kconfig +++ b/lib/vdso/Kconfig @@ -31,3 +31,10 @@ config GENERIC_VDSO_TIME_NS VDSO endif + +config VGETRANDOM_ALLOC_SYSCALL + bool + select ADVISE_SYSCALLS + help + Selected by the getrandom() vDSO function, which requires this + for state allocation. diff --git a/scripts/checksyscalls.sh b/scripts/checksyscalls.sh index f33e61aca93d..7f7928c6487f 100755 --- a/scripts/checksyscalls.sh +++ b/scripts/checksyscalls.sh @@ -44,6 +44,10 @@ cat << EOF #define __IGNORE_memfd_secret #endif +#ifndef __ARCH_WANT_VGETRANDOM_ALLOC +#define __IGNORE_vgetrandom_alloc +#endif + /* Missing flags argument */ #define __IGNORE_renameat /* renameat2 */ diff --git a/tools/include/uapi/asm-generic/unistd.h b/tools/include/uapi/asm-generic/unistd.h index 45fa180cc56a..77b6debe7e18 100644 --- a/tools/include/uapi/asm-generic/unistd.h +++ b/tools/include/uapi/asm-generic/unistd.h @@ -886,8 +886,13 @@ __SYSCALL(__NR_futex_waitv, sys_futex_waitv) #define __NR_set_mempolicy_home_node 450 __SYSCALL(__NR_set_mempolicy_home_node, sys_set_mempolicy_home_node) +#ifdef __ARCH_WANT_VGETRANDOM_ALLOC +#define __NR_vgetrandom_alloc 451 +__SYSCALL(__NR_vgetrandom_alloc, sys_vgetrandom_alloc) +#endif + #undef __NR_syscalls -#define __NR_syscalls 451 +#define __NR_syscalls 452 /* * 32 bit systems traditionally used different From patchwork Mon Nov 28 11:18:28 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Jason A. Donenfeld" X-Patchwork-Id: 26650 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp5597171wrr; Mon, 28 Nov 2022 03:38:46 -0800 (PST) X-Google-Smtp-Source: AA0mqf7TuHGMZR3DVtvNBFSVVkBqXVoBqudLJ1Ai3IoAZFG1711/GZx4xPl5IrzOq6UYH4D6u07g X-Received: by 2002:a17:906:32d9:b0:7ae:31a0:5727 with SMTP id k25-20020a17090632d900b007ae31a05727mr27417445ejk.540.1669635526358; Mon, 28 Nov 2022 03:38:46 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1669635526; cv=none; d=google.com; s=arc-20160816; b=TG52kw/5mTEJ3VQyo4rZcxOeqGcA81u6wbhgAe2nqYUwfEoWlNcG1WFVD8i/faFIqt LgtRXlLEW8NiGa9YaCb09Ry6elmqOqWCMIGo1BZXrV5I4MtJPMpdi6i79IzzmFjP9CqJ hgJaW5zs1C6UUdTkhqIASI0cEYVZXWrxV8IVM6yZOSPpaCadnvtL4TQ9vPbhE8aOHFWn lMvpofzZun4J+tNJkYh1FniSSL7raLEEXox1TvDlgc8ZRkKrq2iXYO950XHFjHiCW2Lu GG5MWx4bw7VAsBML/APHcCLoh1mU0UqXYRwGYsgKoSxOXOoViAMcOdYEQ1sOVUzu98pB QHHA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=RpsQmTYfqDB6iJajbsrUtqb091fdsxaSA7iIIr/rcdY=; b=XwCCto8vuoNQnfEyfE3WTnXXe1acIDHXnpNN2HdX9vziqTSVhAB4Vp1QOyKx3pcYV6 B8ryIKdWDLjoc6p28bofFTsHJ1MbTq90n1d4AYwCnMvyCf3HsmCWbkPxQdHIA+n5dkrc FOPonNwS5EZmdKsl7AGaQdI4EO9bLRRrAfC5v46GxTlxj3jlRFR71voor/DJRP8Wxzsu Jb3+8/bH9c3qy9eigkB9ySH0iCIfuiEgSivIXzfcAra5McGcZZtI3xzc8apLvUox0/02 tFwjLVhU5/+8jWEzTo6qpqx7vvdPbzlRpkhqICeDdww8P3pZdnwRgmDt5Cf9ecspz5Dp qVhw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@zx2c4.com header.s=20210105 header.b=fcZzR29h; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=zx2c4.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id cx23-20020a05640222b700b00462e23be64fsi8491723edb.578.2022.11.28.03.38.20; Mon, 28 Nov 2022 03:38:46 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@zx2c4.com header.s=20210105 header.b=fcZzR29h; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=zx2c4.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231279AbiK1LU6 (ORCPT + 99 others); Mon, 28 Nov 2022 06:20:58 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38372 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231271AbiK1LUP (ORCPT ); Mon, 28 Nov 2022 06:20:15 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1D7391CFCD; Mon, 28 Nov 2022 03:19:01 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 7DE1E6108D; Mon, 28 Nov 2022 11:19:01 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 99DFFC43141; Mon, 28 Nov 2022 11:18:59 +0000 (UTC) Authentication-Results: smtp.kernel.org; dkim=pass (1024-bit key) header.d=zx2c4.com header.i=@zx2c4.com header.b="fcZzR29h" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=zx2c4.com; s=20210105; t=1669634338; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=RpsQmTYfqDB6iJajbsrUtqb091fdsxaSA7iIIr/rcdY=; b=fcZzR29hmxR7fwO2ZySI3+B45o2I1PtxU8ZjecKJM0m59WZmfUENeV9Pc0bb3LdCM5Jey+ 7i7hrE/w16/3g4vxkhcmeFYWA7XlswKWSI1RsO/4pqSCN1ZObu3BrOe1xltQHo2ygOXoid 3pDEFrMUU8erRN4wFONC7s15rZYCReo= Received: by mail.zx2c4.com (ZX2C4 Mail Server) with ESMTPSA id fe9105f7 (TLSv1.3:TLS_AES_256_GCM_SHA384:256:NO); Mon, 28 Nov 2022 11:18:57 +0000 (UTC) From: "Jason A. Donenfeld" To: linux-kernel@vger.kernel.org, patches@lists.linux.dev, tglx@linutronix.de Cc: "Jason A. Donenfeld" , linux-crypto@vger.kernel.org, linux-api@vger.kernel.org, x86@kernel.org, Greg Kroah-Hartman , Adhemerval Zanella Netto , Carlos O'Donell , Florian Weimer , Arnd Bergmann , Christian Brauner Subject: [PATCH v8 2/3] random: introduce generic vDSO getrandom() implementation Date: Mon, 28 Nov 2022 12:18:28 +0100 Message-Id: <20221128111829.2477505-3-Jason@zx2c4.com> In-Reply-To: <20221128111829.2477505-1-Jason@zx2c4.com> References: <20221128111829.2477505-1-Jason@zx2c4.com> MIME-Version: 1.0 X-Spam-Status: No, score=-6.7 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, RCVD_IN_DNSWL_HI,SPF_HELO_NONE,SPF_PASS,T_FILL_THIS_FORM_SHORT autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1750739741910100006?= X-GMAIL-MSGID: =?utf-8?q?1750739741910100006?= Provide a generic C vDSO getrandom() implementation, which operates on an opaque state returned by vgetrandom_alloc() and produces random bytes the same way as getrandom(). This has a the API signature: ssize_t vgetrandom(void *buffer, size_t len, unsigned int flags, void *opaque_state); The return value and the first 3 arguments are the same as ordinary getrandom(), while the last argument is a pointer to the opaque allocated state. Were all four arguments passed to the getrandom() syscall, nothing different would happen, and the functions would have the exact same behavior. The actual vDSO RNG algorithm implemented is the same one implemented by drivers/char/random.c, using the same fast-erasure techniques as that. Should the in-kernel implementation change, so too will the vDSO one. It requires an implementation of ChaCha20 that does not use any stack, in order to maintain forward secrecy, so this is left as an architecture-specific fill-in. Stack-less ChaCha20 is an easy algorithm to implement on a variety of architectures, so this shouldn't be too onerous. Initially, the state is keyless, and so the first call makes a getrandom() syscall to generate that key, and then uses it for subsequent calls. By keeping track of a generation counter, it knows when its key is invalidated and it should fetch a new one using the syscall. Later, more than just a generation counter might be used. Since MADV_WIPEONFORK is set on the opaque state, the key and related state is wiped during a fork(), so secrets don't roll over into new processes, and the same state doesn't accidentally generate the same random stream. The generation counter, as well, is always >0, so that the 0 counter is a useful indication of a fork() or otherwise uninitialized state. If the kernel RNG is not yet initialized, then the vDSO always calls the syscall, because that behavior cannot be emulated in userspace, but fortunately that state is short lived and only during early boot. If it has been initialized, then there is no need to inspect the `flags` argument, because the behavior does not change post-initialization regardless of the `flags` value. Since the opaque state passed to it is mutated, vDSO getrandom() is not reentrant, when used with the same opaque state, which libc should be mindful of. vgetrandom_alloc() and vDSO getrandom() together provide the ability for userspace to generate random bytes quickly and safely, and is intended to be integrated into libc's thread management. As an illustrative example, the following code might be used to do the same outside of libc. All of the static functions are to be considered implementation private, including the vgetrandom_alloc() syscall wrapper, which generally shouldn't be exposed outside of libc, with the non-static vgetrandom() function at the end being the exported interface. The various pthread-isms are expected to be elided into libc internals. This per-thread allocation scheme is very naive and does not shrink; other implementations may choose to be more complex. static void *vgetrandom_alloc(unsigned int *num, unsigned int *size_per_each, unsigned int flags) { long ret = syscall(__NR_vgetrandom_alloc, &num, &size_per_each, flags); return ret == -1 ? NULL : (void *)ret; } static struct { pthread_mutex_t lock; void **states; size_t len, cap; } grnd_allocator = { .lock = PTHREAD_MUTEX_INITIALIZER }; static void *vgetrandom_get_state(void) { void *state = NULL; pthread_mutex_lock(&grnd_allocator.lock); if (!grnd_allocator.len) { size_t new_cap; unsigned int size_per_each, num = 16; /* Just a hint. Could also be nr_cpus. */ void *new_block = vgetrandom_alloc(&num, &size_per_each, 0), *new_states; if (!new_block) goto out; new_cap = grnd_allocator.cap + num; new_states = reallocarray(grnd_allocator.states, new_cap, sizeof(*grnd_allocator.states)); if (!new_states) { munmap(new_block, num * size_per_each); goto out; } grnd_allocator.cap = new_cap; grnd_allocator.states = new_states; for (size_t i = 0; i < num; ++i) { grnd_allocator.states[i] = new_block; new_block += size_per_each; } grnd_allocator.len = num; } state = grnd_allocator.states[--grnd_allocator.len]; out: pthread_mutex_unlock(&grnd_allocator.lock); return state; } static void vgetrandom_put_state(void *state) { if (!state) return; pthread_mutex_lock(&grnd_allocator.lock); grnd_allocator.states[grnd_allocator.len++] = state; pthread_mutex_unlock(&grnd_allocator.lock); } static struct { ssize_t(*fn)(void *buf, size_t len, unsigned long flags, void *state); pthread_key_t key; pthread_once_t initialized; } grnd_ctx = { .initialized = PTHREAD_ONCE_INIT }; static void vgetrandom_init(void) { if (pthread_key_create(&grnd_ctx.key, vgetrandom_put_state) != 0) return; grnd_ctx.fn = __vdsosym("LINUX_2.6", "__vdso_getrandom"); } ssize_t vgetrandom(void *buf, size_t len, unsigned long flags) { void *state; pthread_once(&grnd_ctx.initialized, vgetrandom_init); if (!grnd_ctx.fn) return getrandom(buf, len, flags); state = pthread_getspecific(grnd_ctx.key); if (!state) { state = vgetrandom_get_state(); if (pthread_setspecific(grnd_ctx.key, state) != 0) { vgetrandom_put_state(state); state = NULL; } if (!state) return getrandom(buf, len, flags); } return grnd_ctx.fn(buf, len, flags, state); } Signed-off-by: Jason A. Donenfeld --- MAINTAINERS | 1 + drivers/char/random.c | 9 ++ include/vdso/datapage.h | 11 +++ lib/vdso/Kconfig | 7 +- lib/vdso/getrandom.c | 204 ++++++++++++++++++++++++++++++++++++++++ 5 files changed, 231 insertions(+), 1 deletion(-) create mode 100644 lib/vdso/getrandom.c diff --git a/MAINTAINERS b/MAINTAINERS index 3894f947a507..70dff39fcff9 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -17288,6 +17288,7 @@ S: Maintained F: drivers/char/random.c F: drivers/virt/vmgenid.c F: include/vdso/getrandom.h +F: lib/vdso/getrandom.c RAPIDIO SUBSYSTEM M: Matt Porter diff --git a/drivers/char/random.c b/drivers/char/random.c index 16e9edce771f..a2c530e10d6a 100644 --- a/drivers/char/random.c +++ b/drivers/char/random.c @@ -60,6 +60,9 @@ #ifdef CONFIG_VGETRANDOM_ALLOC_SYSCALL #include #endif +#ifdef CONFIG_VDSO_GETRANDOM +#include +#endif #include #include #include @@ -344,6 +347,9 @@ static void crng_reseed(struct work_struct *work) if (next_gen == ULONG_MAX) ++next_gen; WRITE_ONCE(base_crng.generation, next_gen); +#ifdef CONFIG_VDSO_GETRANDOM + smp_store_release(&_vdso_rng_data.generation, next_gen + 1); +#endif if (!static_branch_likely(&crng_is_ready)) crng_init = CRNG_READY; spin_unlock_irqrestore(&base_crng.lock, flags); @@ -794,6 +800,9 @@ static void __cold _credit_init_bits(size_t bits) if (static_key_initialized) execute_in_process_context(crng_set_ready, &set_ready); atomic_notifier_call_chain(&random_ready_notifier, 0, NULL); +#ifdef CONFIG_VDSO_GETRANDOM + smp_store_release(&_vdso_rng_data.is_ready, true); +#endif wake_up_interruptible(&crng_init_wait); kill_fasync(&fasync, SIGIO, POLL_IN); pr_notice("crng init done\n"); diff --git a/include/vdso/datapage.h b/include/vdso/datapage.h index 73eb622e7663..9ae4d76b36c7 100644 --- a/include/vdso/datapage.h +++ b/include/vdso/datapage.h @@ -109,6 +109,16 @@ struct vdso_data { struct arch_vdso_data arch_data; }; +/** + * struct vdso_rng_data - vdso RNG state information + * @generation: a counter representing the number of RNG reseeds + * @is_ready: whether the RNG is initialized + */ +struct vdso_rng_data { + unsigned long generation; + bool is_ready; +}; + /* * We use the hidden visibility to prevent the compiler from generating a GOT * relocation. Not only is going through a GOT useless (the entry couldn't and @@ -120,6 +130,7 @@ struct vdso_data { */ extern struct vdso_data _vdso_data[CS_BASES] __attribute__((visibility("hidden"))); extern struct vdso_data _timens_data[CS_BASES] __attribute__((visibility("hidden"))); +extern struct vdso_rng_data _vdso_rng_data __attribute__((visibility("hidden"))); /* * The generic vDSO implementation requires that gettimeofday.h diff --git a/lib/vdso/Kconfig b/lib/vdso/Kconfig index b22584f8da03..f12b76642921 100644 --- a/lib/vdso/Kconfig +++ b/lib/vdso/Kconfig @@ -29,7 +29,6 @@ config GENERIC_VDSO_TIME_NS help Selected by architectures which support time namespaces in the VDSO - endif config VGETRANDOM_ALLOC_SYSCALL @@ -38,3 +37,9 @@ config VGETRANDOM_ALLOC_SYSCALL help Selected by the getrandom() vDSO function, which requires this for state allocation. + +config VDSO_GETRANDOM + bool + select VGETRANDOM_ALLOC_SYSCALL + help + Selected by architectures that support vDSO getrandom(). diff --git a/lib/vdso/getrandom.c b/lib/vdso/getrandom.c new file mode 100644 index 000000000000..1c51e24a7f24 --- /dev/null +++ b/lib/vdso/getrandom.c @@ -0,0 +1,204 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2022 Jason A. Donenfeld . All Rights Reserved. + */ + +#include +#include +#include +#include +#include +#include +#include + +#define MEMCPY_AND_ZERO_SRC(type, dst, src, len) do { \ + while (len >= sizeof(type)) { \ + __put_unaligned_t(type, __get_unaligned_t(type, src), dst); \ + __put_unaligned_t(type, 0, src); \ + dst += sizeof(type); \ + src += sizeof(type); \ + len -= sizeof(type); \ + } \ +} while (0) + +static void memcpy_and_zero_src(void *dst, void *src, size_t len) +{ + if (IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)) { + if (IS_ENABLED(CONFIG_64BIT)) + MEMCPY_AND_ZERO_SRC(u64, dst, src, len); + MEMCPY_AND_ZERO_SRC(u32, dst, src, len); + MEMCPY_AND_ZERO_SRC(u16, dst, src, len); + } + MEMCPY_AND_ZERO_SRC(u8, dst, src, len); +} + +/** + * __cvdso_getrandom_data - generic vDSO implementation of getrandom() syscall + * @rng_info: describes state of kernel RNG, memory shared with kernel + * @buffer: destination buffer to fill with random bytes + * @len: size of @buffer in bytes + * @flags: zero or more GRND_* flags + * @opaque_state: a pointer to an opaque state area + * + * This implements a "fast key erasure" RNG using ChaCha20, in the same way that the kernel's + * getrandom() syscall does. It periodically reseeds its key from the kernel's RNG, at the same + * schedule that the kernel's RNG is reseeded. If the kernel's RNG is not ready, then this always + * calls into the syscall. + * + * @opaque_state *must* be allocated using the vgetrandom_alloc() syscall. Unless external locking + * is used, one state must be allocated per thread, as it is not safe to call this function + * concurrently with the same @opaque_state. However, it is safe to call this using the same + * @opaque_state that is shared between main code and signal handling code, within the same thread. + * + * Returns the number of random bytes written to @buffer, or a negative value indicating an error. + */ +static __always_inline ssize_t +__cvdso_getrandom_data(const struct vdso_rng_data *rng_info, void *buffer, size_t len, + unsigned int flags, void *opaque_state) +{ + ssize_t ret = min_t(size_t, INT_MAX & PAGE_MASK /* = MAX_RW_COUNT */, len); + struct vgetrandom_state *state = opaque_state; + size_t batch_len, nblocks, orig_len = len; + unsigned long current_generation; + void *orig_buffer = buffer; + u32 counter[2] = { 0 }; + bool in_use; + + /* + * If the kernel's RNG is not yet ready, then it's not possible to provide random bytes from + * userspace, because A) the various @flags require this to block, or not, depending on + * various factors unavailable to userspace, and B) the kernel's behavior before the RNG is + * ready is to reseed from the entropy pool at every invocation. + */ + if (unlikely(!READ_ONCE(rng_info->is_ready))) + goto fallback_syscall; + + /* + * This condition is checked after @rng_info->is_ready, because before the kernel's RNG is + * initialized, the @flags parameter may require this to block or return an error, even when + * len is zero. + */ + if (unlikely(!len)) + return 0; + + /* + * @state->in_use is basic reentrancy protection against this running in a signal handler + * with the same @opaque_state, but obviously not atomic wrt multiple CPUs or more than one + * level of reentrancy. If a signal interrupts this after reading @state->in_use, but before + * writing @state->in_use, there is still no race, because the signal handler will run to + * its completion before returning execution. + */ + in_use = READ_ONCE(state->in_use); + if (unlikely(in_use)) + goto fallback_syscall; + WRITE_ONCE(state->in_use, true); + +retry_generation: + /* + * @rng_info->generation must always be read here, as it serializes @state->key with the + * kernel's RNG reseeding schedule. + */ + current_generation = READ_ONCE(rng_info->generation); + + /* + * If @state->generation doesn't match the kernel RNG's generation, then it means the + * kernel's RNG has reseeded, and so @state->key is reseeded as well. + */ + if (unlikely(state->generation != current_generation)) { + /* + * Write the generation before filling the key, in case of fork. If there is a fork + * just after this line, the two forks will get different random bytes from the + * syscall, which is good. However, were this line to occur after the getrandom + * syscall, then both child and parent could have the same bytes and the same + * generation counter, so the fork would not be detected. Therefore, write + * @state->generation before the call to the getrandom syscall. + */ + WRITE_ONCE(state->generation, current_generation); + + /* Reseed @state->key using fresh bytes from the kernel. */ + if (getrandom_syscall(state->key, sizeof(state->key), 0) != sizeof(state->key)) { + /* + * If the syscall failed to refresh the key, then @state->key is now + * invalid, so invalidate the generation so that it is not used again, and + * fallback to using the syscall entirely. + */ + WRITE_ONCE(state->generation, 0); + + /* + * Set @state->in_use to false only after the last write to @state in the + * line above. + */ + WRITE_ONCE(state->in_use, false); + + goto fallback_syscall; + } + + /* + * Set @state->pos to beyond the end of the batch, so that the batch is refilled + * using the new key. + */ + state->pos = sizeof(state->batch); + } + + len = ret; +more_batch: + /* + * First use bytes out of @state->batch, which may have been filled by the last call to this + * function. + */ + batch_len = min_t(size_t, sizeof(state->batch) - state->pos, len); + if (batch_len) { + /* Zeroing at the same time as memcpying helps preserve forward secrecy. */ + memcpy_and_zero_src(buffer, state->batch + state->pos, batch_len); + state->pos += batch_len; + buffer += batch_len; + len -= batch_len; + } + + if (!len) { + /* + * Since @rng_info->generation will never be 0, re-read @state->generation, rather + * than using the local current_generation variable, to learn whether a fork + * occurred. Primarily, though, this indicates whether the kernel's RNG has + * reseeded, in which case generate a new key and start over. + */ + if (unlikely(READ_ONCE(state->generation) != READ_ONCE(rng_info->generation))) { + buffer = orig_buffer; + goto retry_generation; + } + + /* + * Set @state->in_use to false only when there will be no more reads or writes of + * @state. + */ + WRITE_ONCE(state->in_use, false); + return ret; + } + + /* Generate blocks of RNG output directly into @buffer while there's enough room left. */ + nblocks = len / CHACHA_BLOCK_SIZE; + if (nblocks) { + __arch_chacha20_blocks_nostack(buffer, state->key, counter, nblocks); + buffer += nblocks * CHACHA_BLOCK_SIZE; + len -= nblocks * CHACHA_BLOCK_SIZE; + } + + BUILD_BUG_ON(sizeof(state->batch_key) % CHACHA_BLOCK_SIZE != 0); + + /* Refill the batch and then overwrite the key, in order to preserve forward secrecy. */ + __arch_chacha20_blocks_nostack(state->batch_key, state->key, counter, + sizeof(state->batch_key) / CHACHA_BLOCK_SIZE); + + /* Since the batch was just refilled, set the position back to 0 to indicate a full batch. */ + state->pos = 0; + goto more_batch; + +fallback_syscall: + return getrandom_syscall(orig_buffer, orig_len, flags); +} + +static __always_inline ssize_t +__cvdso_getrandom(void *buffer, size_t len, unsigned int flags, void *opaque_state) +{ + return __cvdso_getrandom_data(__arch_get_vdso_rng_data(), buffer, len, flags, opaque_state); +} From patchwork Mon Nov 28 11:18:29 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Jason A. Donenfeld" X-Patchwork-Id: 26651 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp5597991wrr; Mon, 28 Nov 2022 03:40:32 -0800 (PST) X-Google-Smtp-Source: AA0mqf5lOv29iCknDNuwfyWjzw48OTpCuYeF59QjTnt3GiDbPcvr31MEbDXBNWJxDA7zQ7QSxzHU X-Received: by 2002:a05:6402:4d0:b0:462:317a:e02f with SMTP id n16-20020a05640204d000b00462317ae02fmr34604752edw.125.1669635632198; Mon, 28 Nov 2022 03:40:32 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1669635632; cv=none; d=google.com; s=arc-20160816; b=XCp05ekTsm6+57I/aC+9PYrFz6Rf6VJ8F3NMdMC+NgUHdjvFQHW5Tr5hiyRMHVpF/U 8CBxLKQ1WYh/PUx3odZES0SYfWQ1VR7kX8FPf2p5zj3v288Upg4mgjGurHhmTUmCBHhR oRZ3+qAlmLme5YCT0FPBui8dBClERnKD5C2HngfygiULEjgJm0XQVd49QFigI/uaqx94 ANqHvkw6hKKlPDS5IC2kTy3JBEpdghWYv8b13+TVvWzBW1Ljya+rP9NELLBOHb2wPd84 MGPQXPVfWCBdWO6mAa7xzm8DJXLGhPOMoFe9qB+yeeaOCVAtDLolzU38Y1ufTcDO0dH4 z5fQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=f5i67mH1DM2ISWIZ2GZkS7FA5vKEZX+lFb/6tPCa01s=; b=xbKJxz+9+5BxN+WUwB3iiR/IdoEtLNFUSmmm+zqztd9I4ct1S7P1Ir7XH6KhxFRXSy w9Fs5Brc+tmi2LV7CtqW1CmC+2Qm0g4Z6i8HaZDNn0RDCdvzD9xGf6cWrrN/KHDoS8/k T1+TUzLK7ISEAlU2AbOaKQ8oLt5x6pjT+MLmiusWbEO+GZOJTFJvLO370unGWq31o0Lg tRzdDE4753b0hfw2i6VK8y/Y3YTadOsBNSEkZoPjp3w+QO/j6CA/RFH67sqYAgX1Ik/S 4M/GNVRv/WB2lyih6Tm+RQeYn1CbKa7K2kk2XNuOnJCgU6QE6X7SDH65M5OM6TixBUKm LmXQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@zx2c4.com header.s=20210105 header.b=XBdqkkt0; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=zx2c4.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id q24-20020aa7d458000000b0046ada79b960si5792606edr.615.2022.11.28.03.40.06; Mon, 28 Nov 2022 03:40:32 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@zx2c4.com header.s=20210105 header.b=XBdqkkt0; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=zx2c4.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230513AbiK1LVQ (ORCPT + 99 others); Mon, 28 Nov 2022 06:21:16 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38026 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231213AbiK1LUR (ORCPT ); Mon, 28 Nov 2022 06:20:17 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AD66513F07; Mon, 28 Nov 2022 03:19:07 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 3C93C6108F; Mon, 28 Nov 2022 11:19:07 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 33B54C433C1; Mon, 28 Nov 2022 11:19:05 +0000 (UTC) Authentication-Results: smtp.kernel.org; dkim=pass (1024-bit key) header.d=zx2c4.com header.i=@zx2c4.com header.b="XBdqkkt0" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=zx2c4.com; s=20210105; t=1669634344; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=f5i67mH1DM2ISWIZ2GZkS7FA5vKEZX+lFb/6tPCa01s=; b=XBdqkkt0fHbqOTsG32gSBrLvAh9mDbM9XzD0EtlH1bLZty5ieEYmdm84tJs7uf/l6CTR0p f4Jm7UcP8rsvpGaFvHXbu9Z/gtEY+TaP9Dny+aGpeQju7XZXbO21YTaFKgC/4qMJqL8rFL JIKHyePkXIeSy4bmyJ1lKoFN01fYDdw= Received: by mail.zx2c4.com (ZX2C4 Mail Server) with ESMTPSA id a17222f8 (TLSv1.3:TLS_AES_256_GCM_SHA384:256:NO); Mon, 28 Nov 2022 11:19:03 +0000 (UTC) From: "Jason A. Donenfeld" To: linux-kernel@vger.kernel.org, patches@lists.linux.dev, tglx@linutronix.de Cc: "Jason A. Donenfeld" , linux-crypto@vger.kernel.org, linux-api@vger.kernel.org, x86@kernel.org, Greg Kroah-Hartman , Adhemerval Zanella Netto , Carlos O'Donell , Florian Weimer , Arnd Bergmann , Christian Brauner , Samuel Neves Subject: [PATCH v8 3/3] x86: vdso: Wire up getrandom() vDSO implementation Date: Mon, 28 Nov 2022 12:18:29 +0100 Message-Id: <20221128111829.2477505-4-Jason@zx2c4.com> In-Reply-To: <20221128111829.2477505-1-Jason@zx2c4.com> References: <20221128111829.2477505-1-Jason@zx2c4.com> MIME-Version: 1.0 X-Spam-Status: No, score=-6.8 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, RCVD_IN_DNSWL_HI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1750739852672483260?= X-GMAIL-MSGID: =?utf-8?q?1750739852672483260?= Hook up the generic vDSO implementation to the x86 vDSO data page. Since the existing vDSO infrastructure is heavily based on the timekeeping functionality, which works over arrays of bases, a new macro is introduced for vvars that are not arrays. Also enable the vgetrandom_alloc() syscall, which the vDSO implementation relies on. The vDSO function requires a ChaCha20 implementation that does not write to the stack, yet can still do an entire ChaCha20 permutation, so provide this using SSE2, since this is userland code that must work on all x86-64 processors. Reviewed-by: Samuel Neves # for vgetrandom-chacha.S Signed-off-by: Jason A. Donenfeld --- arch/x86/Kconfig | 1 + arch/x86/entry/syscalls/syscall_64.tbl | 1 + arch/x86/entry/vdso/Makefile | 3 +- arch/x86/entry/vdso/vdso.lds.S | 2 + arch/x86/entry/vdso/vgetrandom-chacha.S | 177 ++++++++++++++++++++++++ arch/x86/entry/vdso/vgetrandom.c | 17 +++ arch/x86/include/asm/unistd.h | 1 + arch/x86/include/asm/vdso/getrandom.h | 55 ++++++++ arch/x86/include/asm/vdso/vsyscall.h | 2 + arch/x86/include/asm/vvar.h | 16 +++ 10 files changed, 274 insertions(+), 1 deletion(-) create mode 100644 arch/x86/entry/vdso/vgetrandom-chacha.S create mode 100644 arch/x86/entry/vdso/vgetrandom.c create mode 100644 arch/x86/include/asm/vdso/getrandom.h diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 67745ceab0db..357148c4a3a4 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -269,6 +269,7 @@ config X86 select HAVE_UNSTABLE_SCHED_CLOCK select HAVE_USER_RETURN_NOTIFIER select HAVE_GENERIC_VDSO + select VDSO_GETRANDOM if X86_64 select HOTPLUG_SMT if SMP select IRQ_FORCED_THREADING select NEED_PER_CPU_EMBED_FIRST_CHUNK diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl index c84d12608cd2..0186f173f0e8 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -372,6 +372,7 @@ 448 common process_mrelease sys_process_mrelease 449 common futex_waitv sys_futex_waitv 450 common set_mempolicy_home_node sys_set_mempolicy_home_node +451 common vgetrandom_alloc sys_vgetrandom_alloc # # Due to a historical design error, certain syscalls are numbered differently diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile index 3e88b9df8c8f..2de64e52236a 100644 --- a/arch/x86/entry/vdso/Makefile +++ b/arch/x86/entry/vdso/Makefile @@ -27,7 +27,7 @@ VDSO32-$(CONFIG_X86_32) := y VDSO32-$(CONFIG_IA32_EMULATION) := y # files to link into the vdso -vobjs-y := vdso-note.o vclock_gettime.o vgetcpu.o +vobjs-y := vdso-note.o vclock_gettime.o vgetcpu.o vgetrandom.o vgetrandom-chacha.o vobjs32-y := vdso32/note.o vdso32/system_call.o vdso32/sigreturn.o vobjs32-y += vdso32/vclock_gettime.o vobjs-$(CONFIG_X86_SGX) += vsgx.o @@ -104,6 +104,7 @@ CFLAGS_REMOVE_vclock_gettime.o = -pg CFLAGS_REMOVE_vdso32/vclock_gettime.o = -pg CFLAGS_REMOVE_vgetcpu.o = -pg CFLAGS_REMOVE_vsgx.o = -pg +CFLAGS_REMOVE_vgetrandom.o = -pg # # X32 processes use x32 vDSO to access 64bit kernel data. diff --git a/arch/x86/entry/vdso/vdso.lds.S b/arch/x86/entry/vdso/vdso.lds.S index 4bf48462fca7..1919cc39277e 100644 --- a/arch/x86/entry/vdso/vdso.lds.S +++ b/arch/x86/entry/vdso/vdso.lds.S @@ -28,6 +28,8 @@ VERSION { clock_getres; __vdso_clock_getres; __vdso_sgx_enter_enclave; + getrandom; + __vdso_getrandom; local: *; }; } diff --git a/arch/x86/entry/vdso/vgetrandom-chacha.S b/arch/x86/entry/vdso/vgetrandom-chacha.S new file mode 100644 index 000000000000..91fbb7ac7af4 --- /dev/null +++ b/arch/x86/entry/vdso/vgetrandom-chacha.S @@ -0,0 +1,177 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2022 Jason A. Donenfeld . All Rights Reserved. + */ + +#include +#include + +.section .rodata.cst16.CONSTANTS, "aM", @progbits, 16 +.align 16 +CONSTANTS: .octa 0x6b20657479622d323320646e61707865 +.text + +/* + * Very basic SSE2 implementation of ChaCha20. Produces a given positive number + * of blocks of output with a nonce of 0, taking an input key and 8-byte + * counter. Importantly does not spill to the stack. Its arguments are: + * + * rdi: output bytes + * rsi: 32-byte key input + * rdx: 8-byte counter input/output + * rcx: number of 64-byte blocks to write to output + */ +SYM_FUNC_START(__arch_chacha20_blocks_nostack) + +#define output %rdi +#define key %rsi +#define counter %rdx +#define nblocks %rcx +#define i %al +#define state0 %xmm0 +#define state1 %xmm1 +#define state2 %xmm2 +#define state3 %xmm3 +#define copy0 %xmm4 +#define copy1 %xmm5 +#define copy2 %xmm6 +#define copy3 %xmm7 +#define temp %xmm8 +#define one %xmm9 + + /* copy0 = "expand 32-byte k" */ + movaps CONSTANTS(%rip),copy0 + /* copy1,copy2 = key */ + movups 0x00(key),copy1 + movups 0x10(key),copy2 + /* copy3 = counter || zero nonce */ + movq 0x00(counter),copy3 + /* one = 1 || 0 */ + movq $1,%rax + movq %rax,one + +.Lblock: + /* state0,state1,state2,state3 = copy0,copy1,copy2,copy3 */ + movdqa copy0,state0 + movdqa copy1,state1 + movdqa copy2,state2 + movdqa copy3,state3 + + movb $10,i +.Lpermute: + /* state0 += state1, state3 = rotl32(state3 ^ state0, 16) */ + paddd state1,state0 + pxor state0,state3 + movdqa state3,temp + pslld $16,temp + psrld $16,state3 + por temp,state3 + + /* state2 += state3, state1 = rotl32(state1 ^ state2, 12) */ + paddd state3,state2 + pxor state2,state1 + movdqa state1,temp + pslld $12,temp + psrld $20,state1 + por temp,state1 + + /* state0 += state1, state3 = rotl32(state3 ^ state0, 8) */ + paddd state1,state0 + pxor state0,state3 + movdqa state3,temp + pslld $8,temp + psrld $24,state3 + por temp,state3 + + /* state2 += state3, state1 = rotl32(state1 ^ state2, 7) */ + paddd state3,state2 + pxor state2,state1 + movdqa state1,temp + pslld $7,temp + psrld $25,state1 + por temp,state1 + + /* state1[0,1,2,3] = state1[0,3,2,1] */ + pshufd $0x39,state1,state1 + /* state2[0,1,2,3] = state2[1,0,3,2] */ + pshufd $0x4e,state2,state2 + /* state3[0,1,2,3] = state3[2,1,0,3] */ + pshufd $0x93,state3,state3 + + /* state0 += state1, state3 = rotl32(state3 ^ state0, 16) */ + paddd state1,state0 + pxor state0,state3 + movdqa state3,temp + pslld $16,temp + psrld $16,state3 + por temp,state3 + + /* state2 += state3, state1 = rotl32(state1 ^ state2, 12) */ + paddd state3,state2 + pxor state2,state1 + movdqa state1,temp + pslld $12,temp + psrld $20,state1 + por temp,state1 + + /* state0 += state1, state3 = rotl32(state3 ^ state0, 8) */ + paddd state1,state0 + pxor state0,state3 + movdqa state3,temp + pslld $8,temp + psrld $24,state3 + por temp,state3 + + /* state2 += state3, state1 = rotl32(state1 ^ state2, 7) */ + paddd state3,state2 + pxor state2,state1 + movdqa state1,temp + pslld $7,temp + psrld $25,state1 + por temp,state1 + + /* state1[0,1,2,3] = state1[2,1,0,3] */ + pshufd $0x93,state1,state1 + /* state2[0,1,2,3] = state2[1,0,3,2] */ + pshufd $0x4e,state2,state2 + /* state3[0,1,2,3] = state3[0,3,2,1] */ + pshufd $0x39,state3,state3 + + decb i + jnz .Lpermute + + /* output0 = state0 + copy0 */ + paddd copy0,state0 + movups state0,0x00(output) + /* output1 = state1 + copy1 */ + paddd copy1,state1 + movups state1,0x10(output) + /* output2 = state2 + copy2 */ + paddd copy2,state2 + movups state2,0x20(output) + /* output3 = state3 + copy3 */ + paddd copy3,state3 + movups state3,0x30(output) + + /* ++copy3.counter */ + paddq one,copy3 + + /* output += 64, --nblocks */ + addq $64,output + decq nblocks + jnz .Lblock + + /* counter = copy3.counter */ + movq copy3,0x00(counter) + + /* Zero out the potentially sensitive regs, in case nothing uses these again. */ + pxor state0,state0 + pxor state1,state1 + pxor state2,state2 + pxor state3,state3 + pxor copy1,copy1 + pxor copy2,copy2 + pxor temp,temp + + ret +SYM_FUNC_END(__arch_chacha20_blocks_nostack) diff --git a/arch/x86/entry/vdso/vgetrandom.c b/arch/x86/entry/vdso/vgetrandom.c new file mode 100644 index 000000000000..6045ded5da90 --- /dev/null +++ b/arch/x86/entry/vdso/vgetrandom.c @@ -0,0 +1,17 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (C) 2022 Jason A. Donenfeld . All Rights Reserved. + */ +#include + +#include "../../../../lib/vdso/getrandom.c" + +ssize_t __vdso_getrandom(void *buffer, size_t len, unsigned int flags, void *state); + +ssize_t __vdso_getrandom(void *buffer, size_t len, unsigned int flags, void *state) +{ + return __cvdso_getrandom(buffer, len, flags, state); +} + +ssize_t getrandom(void *, size_t, unsigned int, void *) + __attribute__((weak, alias("__vdso_getrandom"))); diff --git a/arch/x86/include/asm/unistd.h b/arch/x86/include/asm/unistd.h index 761173ccc33c..1bf509eaeff1 100644 --- a/arch/x86/include/asm/unistd.h +++ b/arch/x86/include/asm/unistd.h @@ -27,6 +27,7 @@ # define __ARCH_WANT_COMPAT_SYS_PWRITEV64 # define __ARCH_WANT_COMPAT_SYS_PREADV64V2 # define __ARCH_WANT_COMPAT_SYS_PWRITEV64V2 +# define __ARCH_WANT_VGETRANDOM_ALLOC # define X32_NR_syscalls (__NR_x32_syscalls) # define IA32_NR_syscalls (__NR_ia32_syscalls) diff --git a/arch/x86/include/asm/vdso/getrandom.h b/arch/x86/include/asm/vdso/getrandom.h new file mode 100644 index 000000000000..a2bb2dc4443e --- /dev/null +++ b/arch/x86/include/asm/vdso/getrandom.h @@ -0,0 +1,55 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2022 Jason A. Donenfeld . All Rights Reserved. + */ +#ifndef __ASM_VDSO_GETRANDOM_H +#define __ASM_VDSO_GETRANDOM_H + +#ifndef __ASSEMBLY__ + +#include +#include + +/** + * getrandom_syscall - invoke the getrandom() syscall + * @buffer: destination buffer to fill with random bytes + * @len: size of @buffer in bytes + * @flags: zero or more GRND_* flags + * Returns the number of random bytes written to @buffer, or a negative value indicating an error. + */ +static __always_inline ssize_t getrandom_syscall(void *buffer, size_t len, unsigned int flags) +{ + long ret; + + asm ("syscall" : "=a" (ret) : + "0" (__NR_getrandom), "D" (buffer), "S" (len), "d" (flags) : + "rcx", "r11", "memory"); + + return ret; +} + +#define __vdso_rng_data (VVAR(_vdso_rng_data)) + +static __always_inline const struct vdso_rng_data *__arch_get_vdso_rng_data(void) +{ + if (__vdso_data->clock_mode == VDSO_CLOCKMODE_TIMENS) + return (void *)&__vdso_rng_data + ((void *)&__timens_vdso_data - (void *)&__vdso_data); + return &__vdso_rng_data; +} + +/** + * __arch_chacha20_blocks_nostack - generate ChaCha20 stream without using the stack + * @dst_bytes: a destination buffer to hold @nblocks * 64 bytes of output + * @key: 32-byte input key + * @counter: 8-byte counter, read on input and updated on return + * @nblocks: the number of blocks to generate + * + * Generates a given positive number of block of ChaCha20 output with nonce=0, and does not write to + * any stack or memory outside of the parameters passed to it. This way, there's no concern about + * stack data leaking into forked child processes. + */ +extern void __arch_chacha20_blocks_nostack(u8 *dst_bytes, const u32 *key, u32 *counter, size_t nblocks); + +#endif /* !__ASSEMBLY__ */ + +#endif /* __ASM_VDSO_GETRANDOM_H */ diff --git a/arch/x86/include/asm/vdso/vsyscall.h b/arch/x86/include/asm/vdso/vsyscall.h index be199a9b2676..71c56586a22f 100644 --- a/arch/x86/include/asm/vdso/vsyscall.h +++ b/arch/x86/include/asm/vdso/vsyscall.h @@ -11,6 +11,8 @@ #include DEFINE_VVAR(struct vdso_data, _vdso_data); +DEFINE_VVAR_SINGLE(struct vdso_rng_data, _vdso_rng_data); + /* * Update the vDSO data page to keep in sync with kernel timekeeping. */ diff --git a/arch/x86/include/asm/vvar.h b/arch/x86/include/asm/vvar.h index 183e98e49ab9..9d9af37f7cab 100644 --- a/arch/x86/include/asm/vvar.h +++ b/arch/x86/include/asm/vvar.h @@ -26,6 +26,8 @@ */ #define DECLARE_VVAR(offset, type, name) \ EMIT_VVAR(name, offset) +#define DECLARE_VVAR_SINGLE(offset, type, name) \ + EMIT_VVAR(name, offset) #else @@ -37,6 +39,10 @@ extern char __vvar_page; extern type timens_ ## name[CS_BASES] \ __attribute__((visibility("hidden"))); \ +#define DECLARE_VVAR_SINGLE(offset, type, name) \ + extern type vvar_ ## name \ + __attribute__((visibility("hidden"))); \ + #define VVAR(name) (vvar_ ## name) #define TIMENS(name) (timens_ ## name) @@ -44,12 +50,22 @@ extern char __vvar_page; type name[CS_BASES] \ __attribute__((section(".vvar_" #name), aligned(16))) __visible +#define DEFINE_VVAR_SINGLE(type, name) \ + type name \ + __attribute__((section(".vvar_" #name), aligned(16))) __visible + #endif /* DECLARE_VVAR(offset, type, name) */ DECLARE_VVAR(128, struct vdso_data, _vdso_data) +#if !defined(_SINGLE_DATA) +#define _SINGLE_DATA +DECLARE_VVAR_SINGLE(640, struct vdso_rng_data, _vdso_rng_data) +#endif + #undef DECLARE_VVAR +#undef DECLARE_VVAR_SINGLE #endif