From patchwork Mon Nov 21 15:29:09 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Jason A. Donenfeld" X-Patchwork-Id: 23920 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp1670416wrr; Mon, 21 Nov 2022 07:53:59 -0800 (PST) X-Google-Smtp-Source: AA0mqf73ohZV3L+TpI1yX8Q+m2BsB4OLhIW2DzJfugKw2ozdYTjHJ+ORHRa9G70y3A0wK8/g9B3G X-Received: by 2002:a17:906:4351:b0:78d:513d:f447 with SMTP id z17-20020a170906435100b0078d513df447mr2137177ejm.708.1669046038931; Mon, 21 Nov 2022 07:53:58 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1669046038; cv=none; d=google.com; s=arc-20160816; b=pLMTXqunEN+p5Bvnq3eEZNrIwWwLeXyt+Obm1SAVfITS5FfUSav8f7wKGlfKpKA/Cs bd/uOSfILRum4qtzDOdk3BNYNd5+FF0OfmxkZxbOtfeGw1dx42PBYp8QpoQEnYc7F06M nZqyG4i14++Z4VuLw2x/eZPmrT2WzLNTkbFFy12L8fOgL/aXp4P9zmF4bBA3RPNUdq/K 3D+ZMmVK2apxR6N5XBS73AOmjG5GOcI6gGjinSpa3Dl4cJqNpyM3pkFm1aVjXh72oi0O PrDhIq7nL7l0idth0KBTNAz9VQofwJoreRJjIXrKH5yab73EGch3KUZmX6WEWdDX/wZt YOyA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=bUe/UPdVanyT01WYSBYtxq9AiXBRH1RhNA3ekgRQKdg=; b=LtEkQTraVui5Ctb6emlFOZeLV/seVRuB7uSv27iJ4LQ+XmFf++fVTA8elFxCXzPcWa wTWmQH9wqSVcgncmD5Xn+J+g9Cg5WopJPPPtaxvZizAWeL24l8oA1ZS/s9gqESd0nmuW 3eBjz5MMfZtaOAt9P4/weJs4xK+cXameJdJnh7AtZ0F7jvxQJ4FLs6h2KfLfuAq7gp6O OVXrfeZ4HjyemqL18wCJ46oJUhprQTNEbuYpbEmgpTSnVrAWEXqHBnZHdKy+kfCnJmb5 1v3gh0ckR/oupDsItcptps4+AOcXks3GoHeB+7wpQ9vbBxb2UdkQD/sDcnWDp+TeX34q 09eg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@zx2c4.com header.s=20210105 header.b=PrOQPOSL; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=zx2c4.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id u20-20020a170906069400b007add6c835a9si8159622ejb.867.2022.11.21.07.53.31; Mon, 21 Nov 2022 07:53:58 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@zx2c4.com header.s=20210105 header.b=PrOQPOSL; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=zx2c4.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232518AbiKUPbE (ORCPT + 99 others); Mon, 21 Nov 2022 10:31:04 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57530 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232514AbiKUPaO (ORCPT ); Mon, 21 Nov 2022 10:30:14 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 03717CFA7E; Mon, 21 Nov 2022 07:29:32 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 625EC612DA; Mon, 21 Nov 2022 15:29:32 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id DDD6AC433D6; Mon, 21 Nov 2022 15:29:30 +0000 (UTC) Authentication-Results: smtp.kernel.org; dkim=pass (1024-bit key) header.d=zx2c4.com header.i=@zx2c4.com header.b="PrOQPOSL" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=zx2c4.com; s=20210105; t=1669044569; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=bUe/UPdVanyT01WYSBYtxq9AiXBRH1RhNA3ekgRQKdg=; b=PrOQPOSL+jFkJcjruOfktxO0tqBqSV4JIcPXODWTs8apRamvl+Y5WlbOIyFMyMG7hlHpKJ ZNeDq9qKwrLe2cOI1DvtEwdxx7Mjjq2ww1Fi6Is+hLK6nPGmVAEqx1PS+qL3j4yJ0IJLLd UKcUdfP7j1BFLEOhpYAmkWzueTWmQjk= Received: by mail.zx2c4.com (ZX2C4 Mail Server) with ESMTPSA id 2d21d89e (TLSv1.3:TLS_AES_256_GCM_SHA384:256:NO); Mon, 21 Nov 2022 15:29:29 +0000 (UTC) From: "Jason A. Donenfeld" To: linux-kernel@vger.kernel.org, patches@lists.linux.dev, tglx@linutronix.de Cc: "Jason A. Donenfeld" , linux-crypto@vger.kernel.org, x86@kernel.org, Greg Kroah-Hartman , Adhemerval Zanella Netto , Carlos O'Donell Subject: [PATCH v6 3/3] x86: vdso: Wire up getrandom() vDSO implementation Date: Mon, 21 Nov 2022 16:29:09 +0100 Message-Id: <20221121152909.3414096-4-Jason@zx2c4.com> In-Reply-To: <20221121152909.3414096-1-Jason@zx2c4.com> References: <20221121152909.3414096-1-Jason@zx2c4.com> MIME-Version: 1.0 X-Spam-Status: No, score=-6.8 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, RCVD_IN_DNSWL_HI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1750121619037733427?= X-GMAIL-MSGID: =?utf-8?q?1750121619037733427?= Hook up the generic vDSO implementation to the x86 vDSO data page. Since the existing vDSO infrastructure is heavily based on the timekeeping functionality, which works over arrays of bases, a new macro is introduced for vvars that are not arrays. The vDSO function requires a ChaCha20 implementation that does not write to the stack, yet can still do an entire ChaCha20 permutation, so provide this using SSE2, since this is userland code that must work on all x86-64 processors. Signed-off-by: Jason A. Donenfeld --- arch/x86/Kconfig | 1 + arch/x86/entry/vdso/Makefile | 3 +- arch/x86/entry/vdso/vdso.lds.S | 2 + arch/x86/entry/vdso/vgetrandom-chacha.S | 181 ++++++++++++++++++++++++ arch/x86/entry/vdso/vgetrandom.c | 18 +++ arch/x86/include/asm/vdso/getrandom.h | 49 +++++++ arch/x86/include/asm/vdso/vsyscall.h | 2 + arch/x86/include/asm/vvar.h | 16 +++ 8 files changed, 271 insertions(+), 1 deletion(-) create mode 100644 arch/x86/entry/vdso/vgetrandom-chacha.S create mode 100644 arch/x86/entry/vdso/vgetrandom.c create mode 100644 arch/x86/include/asm/vdso/getrandom.h diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 331e21ba961a..b64b1b1274ae 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -270,6 +270,7 @@ config X86 select HAVE_UNSTABLE_SCHED_CLOCK select HAVE_USER_RETURN_NOTIFIER select HAVE_GENERIC_VDSO + select HAVE_VDSO_GETRANDOM if X86_64 select HOTPLUG_SMT if SMP select IRQ_FORCED_THREADING select NEED_PER_CPU_EMBED_FIRST_CHUNK diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile index 3e88b9df8c8f..2de64e52236a 100644 --- a/arch/x86/entry/vdso/Makefile +++ b/arch/x86/entry/vdso/Makefile @@ -27,7 +27,7 @@ VDSO32-$(CONFIG_X86_32) := y VDSO32-$(CONFIG_IA32_EMULATION) := y # files to link into the vdso -vobjs-y := vdso-note.o vclock_gettime.o vgetcpu.o +vobjs-y := vdso-note.o vclock_gettime.o vgetcpu.o vgetrandom.o vgetrandom-chacha.o vobjs32-y := vdso32/note.o vdso32/system_call.o vdso32/sigreturn.o vobjs32-y += vdso32/vclock_gettime.o vobjs-$(CONFIG_X86_SGX) += vsgx.o @@ -104,6 +104,7 @@ CFLAGS_REMOVE_vclock_gettime.o = -pg CFLAGS_REMOVE_vdso32/vclock_gettime.o = -pg CFLAGS_REMOVE_vgetcpu.o = -pg CFLAGS_REMOVE_vsgx.o = -pg +CFLAGS_REMOVE_vgetrandom.o = -pg # # X32 processes use x32 vDSO to access 64bit kernel data. diff --git a/arch/x86/entry/vdso/vdso.lds.S b/arch/x86/entry/vdso/vdso.lds.S index 4bf48462fca7..1919cc39277e 100644 --- a/arch/x86/entry/vdso/vdso.lds.S +++ b/arch/x86/entry/vdso/vdso.lds.S @@ -28,6 +28,8 @@ VERSION { clock_getres; __vdso_clock_getres; __vdso_sgx_enter_enclave; + getrandom; + __vdso_getrandom; local: *; }; } diff --git a/arch/x86/entry/vdso/vgetrandom-chacha.S b/arch/x86/entry/vdso/vgetrandom-chacha.S new file mode 100644 index 000000000000..bc563d95b976 --- /dev/null +++ b/arch/x86/entry/vdso/vgetrandom-chacha.S @@ -0,0 +1,181 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2022 Jason A. Donenfeld . All Rights Reserved. + */ + +#include +#include + +.section .rodata.cst16.CONSTANTS, "aM", @progbits, 16 +.align 16 +CONSTANTS: .octa 0x6b20657479622d323320646e61707865 +.text + +/* + * Very basic SSE2 implementation of ChaCha20. Produces a given positive number + * of blocks of output with a nonce of 0, taking an input key and 8-byte + * counter. Importantly does not spill to the stack. Its arguments are: + * + * rdi: output bytes + * rsi: 32-byte key input + * rdx: 8-byte counter input/output + * rcx: number of 64-byte blocks to write to output + */ +SYM_FUNC_START(chacha20_blocks_nostack) + FRAME_BEGIN + +#define output %rdi +#define key %rsi +#define counter %rdx +#define nblocks %rcx +#define i %al +#define state0 %xmm0 +#define state1 %xmm1 +#define state2 %xmm2 +#define state3 %xmm3 +#define copy0 %xmm4 +#define copy1 %xmm5 +#define copy2 %xmm6 +#define copy3 %xmm7 +#define temp %xmm8 +#define one %xmm9 + + /* copy0 = "expand 32-byte k" */ + movaps CONSTANTS(%rip),copy0 + /* copy1,copy2 = key */ + movdqu 0x00(key),copy1 + movdqu 0x10(key),copy2 + /* copy3 = counter || zero nonce */ + movq 0x00(counter),copy3 + /* one = 1 || 0 */ + movq $1,%rax + movq %rax,one + +.Lblock: + /* state0,state1,state2,state3 = copy0,copy1,copy2,copy3 */ + movdqa copy0,state0 + movdqa copy1,state1 + movdqa copy2,state2 + movdqa copy3,state3 + + movb $10,i +.Lpermute: + /* state0 += state1, state3 = rotl32(state3 ^ state0, 16) */ + paddd state1,state0 + pxor state0,state3 + movdqa state3,temp + pslld $16,temp + psrld $16,state3 + por temp,state3 + + /* state2 += state3, state1 = rotl32(state1 ^ state2, 12) */ + paddd state3,state2 + pxor state2,state1 + movdqa state1,temp + pslld $12,temp + psrld $20,state1 + por temp,state1 + + /* state0 += state1, state3 = rotl32(state3 ^ state0, 8) */ + paddd state1,state0 + pxor state0,state3 + movdqa state3,temp + pslld $8,temp + psrld $24,state3 + por temp,state3 + + /* state2 += state3, state1 = rotl32(state1 ^ state2, 7) */ + paddd state3,state2 + pxor state2,state1 + movdqa state1,temp + pslld $7,temp + psrld $25,state1 + por temp,state1 + + /* state1 = shuffle32(state1, MASK(0, 3, 2, 1)) */ + pshufd $0x39,state1,state1 + /* state2 = shuffle32(state2, MASK(1, 0, 3, 2)) */ + pshufd $0x4e,state2,state2 + /* state3 = shuffle32(state3, MASK(2, 1, 0, 3)) */ + pshufd $0x93,state3,state3 + + /* state0 += state1, state3 = rotl32(state3 ^ state0, 16) */ + paddd state1,state0 + pxor state0,state3 + movdqa state3,temp + pslld $16,temp + psrld $16,state3 + por temp,state3 + + /* state2 += state3, state1 = rotl32(state1 ^ state2, 12) */ + paddd state3,state2 + pxor state2,state1 + movdqa state1,temp + pslld $12,temp + psrld $20,state1 + por temp,state1 + + /* state0 += state1, state3 = rotl32(state3 ^ state0, 8) */ + paddd state1,state0 + pxor state0,state3 + movdqa state3,temp + pslld $8,temp + psrld $24,state3 + por temp,state3 + + /* state2 += state3, state1 = rotl32(state1 ^ state2, 7) */ + paddd state3,state2 + pxor state2,state1 + movdqa state1,temp + pslld $7,temp + psrld $25,state1 + por temp,state1 + + /* state1 = shuffle32(state1, MASK(2, 1, 0, 3)) */ + pshufd $0x93,state1,state1 + /* state2 = shuffle32(state2, MASK(1, 0, 3, 2)) */ + pshufd $0x4e,state2,state2 + /* state3 = shuffle32(state3, MASK(0, 3, 2, 1)) */ + pshufd $0x39,state3,state3 + + decb i + jnz .Lpermute + + /* output0 = state0 + copy0 */ + paddd copy0,state0 + movdqu state0,0x00(output) + /* output1 = state1 + copy1 */ + paddd copy1,state1 + movdqu state1,0x10(output) + /* output2 = state2 + copy2 */ + paddd copy2,state2 + movdqu state2,0x20(output) + /* output3 = state3 + copy3 */ + paddd copy3,state3 + movdqu state3,0x30(output) + + /* ++copy3.counter */ + paddq one,copy3 + + /* output += 64, --nblocks */ + addq $64,output + decq nblocks + jnz .Lblock + + /* counter = copy3.counter */ + movq copy3,0x00(counter) + + /* Zero out all the regs, in case nothing uses these again. */ + pxor state0,state0 + pxor state1,state1 + pxor state2,state2 + pxor state3,state3 + pxor copy0,copy0 + pxor copy1,copy1 + pxor copy2,copy2 + pxor copy3,copy3 + pxor temp,temp + + FRAME_END + RET +SYM_FUNC_END(chacha20_blocks_nostack) diff --git a/arch/x86/entry/vdso/vgetrandom.c b/arch/x86/entry/vdso/vgetrandom.c new file mode 100644 index 000000000000..c7a2476d5d8a --- /dev/null +++ b/arch/x86/entry/vdso/vgetrandom.c @@ -0,0 +1,18 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (C) 2022 Jason A. Donenfeld . All Rights Reserved. + */ +#include +#include + +#include "../../../../lib/vdso/getrandom.c" + +ssize_t __vdso_getrandom(void *buffer, size_t len, unsigned int flags, void *state); + +ssize_t __vdso_getrandom(void *buffer, size_t len, unsigned int flags, void *state) +{ + return __cvdso_getrandom(buffer, len, flags, state); +} + +ssize_t getrandom(void *, size_t, unsigned int, void *) + __attribute__((weak, alias("__vdso_getrandom"))); diff --git a/arch/x86/include/asm/vdso/getrandom.h b/arch/x86/include/asm/vdso/getrandom.h new file mode 100644 index 000000000000..099aca58ef20 --- /dev/null +++ b/arch/x86/include/asm/vdso/getrandom.h @@ -0,0 +1,49 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2022 Jason A. Donenfeld . All Rights Reserved. + */ +#ifndef __ASM_VDSO_GETRANDOM_H +#define __ASM_VDSO_GETRANDOM_H + +#ifndef __ASSEMBLY__ + +#include +#include + +static __always_inline ssize_t +getrandom_syscall(void *buffer, size_t len, unsigned int flags) +{ + long ret; + + asm ("syscall" : "=a" (ret) : + "0" (__NR_getrandom), "D" (buffer), "S" (len), "d" (flags) : + "rcx", "r11", "memory"); + + return ret; +} + +#define __vdso_rng_data (VVAR(_vdso_rng_data)) + +static __always_inline const struct vdso_rng_data *__arch_get_vdso_rng_data(void) +{ + if (__vdso_data->clock_mode == VDSO_CLOCKMODE_TIMENS) + return (void *)&__vdso_rng_data + + ((void *)&__timens_vdso_data - (void *)&__vdso_data); + return &__vdso_rng_data; +} + +/* + * Generates a given positive number of block of ChaCha20 output with nonce=0, + * and does not write to any stack or memory outside of the parameters passed + * to it. This way, we don't need to worry about stack data leaking into forked + * child processes. + */ +static __always_inline void __arch_chacha20_blocks_nostack(u8 *dst_bytes, const u32 *key, u32 *counter, size_t nblocks) +{ + extern void chacha20_blocks_nostack(u8 *dst_bytes, const u32 *key, u32 *counter, size_t nblocks); + return chacha20_blocks_nostack(dst_bytes, key, counter, nblocks); +} + +#endif /* !__ASSEMBLY__ */ + +#endif /* __ASM_VDSO_GETRANDOM_H */ diff --git a/arch/x86/include/asm/vdso/vsyscall.h b/arch/x86/include/asm/vdso/vsyscall.h index be199a9b2676..71c56586a22f 100644 --- a/arch/x86/include/asm/vdso/vsyscall.h +++ b/arch/x86/include/asm/vdso/vsyscall.h @@ -11,6 +11,8 @@ #include DEFINE_VVAR(struct vdso_data, _vdso_data); +DEFINE_VVAR_SINGLE(struct vdso_rng_data, _vdso_rng_data); + /* * Update the vDSO data page to keep in sync with kernel timekeeping. */ diff --git a/arch/x86/include/asm/vvar.h b/arch/x86/include/asm/vvar.h index 183e98e49ab9..9d9af37f7cab 100644 --- a/arch/x86/include/asm/vvar.h +++ b/arch/x86/include/asm/vvar.h @@ -26,6 +26,8 @@ */ #define DECLARE_VVAR(offset, type, name) \ EMIT_VVAR(name, offset) +#define DECLARE_VVAR_SINGLE(offset, type, name) \ + EMIT_VVAR(name, offset) #else @@ -37,6 +39,10 @@ extern char __vvar_page; extern type timens_ ## name[CS_BASES] \ __attribute__((visibility("hidden"))); \ +#define DECLARE_VVAR_SINGLE(offset, type, name) \ + extern type vvar_ ## name \ + __attribute__((visibility("hidden"))); \ + #define VVAR(name) (vvar_ ## name) #define TIMENS(name) (timens_ ## name) @@ -44,12 +50,22 @@ extern char __vvar_page; type name[CS_BASES] \ __attribute__((section(".vvar_" #name), aligned(16))) __visible +#define DEFINE_VVAR_SINGLE(type, name) \ + type name \ + __attribute__((section(".vvar_" #name), aligned(16))) __visible + #endif /* DECLARE_VVAR(offset, type, name) */ DECLARE_VVAR(128, struct vdso_data, _vdso_data) +#if !defined(_SINGLE_DATA) +#define _SINGLE_DATA +DECLARE_VVAR_SINGLE(640, struct vdso_rng_data, _vdso_rng_data) +#endif + #undef DECLARE_VVAR +#undef DECLARE_VVAR_SINGLE #endif