From patchwork Tue Dec 27 12:13:52 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: tip-bot2 for Thomas Gleixner
X-Patchwork-Id: 36877
Date: Tue, 27 Dec 2022 12:13:52 -0000
From: "tip-bot2 for Mathieu Desnoyers"
Sender: tip-bot2@linutronix.de
Reply-to: linux-kernel@vger.kernel.org
To: linux-tip-commits@vger.kernel.org
Subject: [tip: sched/core] rseq: Introduce extensible rseq ABI
Cc: Mathieu Desnoyers, "Peter Zijlstra (Intel)", x86@kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20221122203932.231377-4-mathieu.desnoyers@efficios.com>
References: <20221122203932.231377-4-mathieu.desnoyers@efficios.com>
Message-ID: <167214323246.4906.2743738647026767672.tip-bot2@tip-bot2>
X-Mailing-List: linux-kernel@vger.kernel.org

The following commit has been merged into the sched/core branch of tip:

Commit-ID:
ee3e3ac05c2631ce1f12d88c9cc9a092f8fe947a
Gitweb:        https://git.kernel.org/tip/ee3e3ac05c2631ce1f12d88c9cc9a092f8fe947a
Author:        Mathieu Desnoyers
AuthorDate:    Tue, 22 Nov 2022 15:39:05 -05:00
Committer:     Peter Zijlstra
CommitterDate: Tue, 27 Dec 2022 12:52:10 +01:00

rseq: Introduce extensible rseq ABI

Introduce the extensible rseq ABI, where the feature size supported by
the kernel and the required alignment are communicated to user-space
through ELF auxiliary vectors.

This allows user-space to call rseq registration with a rseq_len of
either 32 bytes for the original struct rseq size (which includes
padding), or larger.

If rseq_len is larger than 32 bytes, then it must be large enough to
contain the feature size communicated to user-space through ELF
auxiliary vectors.

Signed-off-by: Mathieu Desnoyers
Signed-off-by: Peter Zijlstra (Intel)
Link: https://lore.kernel.org/r/20221122203932.231377-4-mathieu.desnoyers@efficios.com
---
 include/linux/sched.h |  4 ++++
 kernel/ptrace.c       |  2 +-
 kernel/rseq.c         | 37 ++++++++++++++++++++++++++++++-------
 3 files changed, 35 insertions(+), 8 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 853d08f..e0bc020 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1302,6 +1302,7 @@ struct task_struct {
 
 #ifdef CONFIG_RSEQ
 	struct rseq __user *rseq;
+	u32 rseq_len;
 	u32 rseq_sig;
 	/*
 	 * RmW on rseq_event_mask must be performed atomically
@@ -2352,10 +2353,12 @@ static inline void rseq_fork(struct task_struct *t, unsigned long clone_flags)
 {
 	if (clone_flags & CLONE_VM) {
 		t->rseq = NULL;
+		t->rseq_len = 0;
 		t->rseq_sig = 0;
 		t->rseq_event_mask = 0;
 	} else {
 		t->rseq = current->rseq;
+		t->rseq_len = current->rseq_len;
 		t->rseq_sig = current->rseq_sig;
 		t->rseq_event_mask = current->rseq_event_mask;
 	}
@@ -2364,6 +2367,7 @@ static inline void rseq_fork(struct task_struct *t, unsigned long clone_flags)
 static inline void rseq_execve(struct task_struct *t)
 {
 	t->rseq = NULL;
+	t->rseq_len = 0;
 	t->rseq_sig = 0;
 	t->rseq_event_mask = 0;
 }
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 5448219..0786450 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -813,7 +813,7 @@ static long ptrace_get_rseq_configuration(struct task_struct *task,
 {
 	struct ptrace_rseq_configuration conf = {
 		.rseq_abi_pointer = (u64)(uintptr_t)task->rseq,
-		.rseq_abi_size = sizeof(*task->rseq),
+		.rseq_abi_size = task->rseq_len,
 		.signature = task->rseq_sig,
 		.flags = 0,
 	};
diff --git a/kernel/rseq.c b/kernel/rseq.c
index d38ab94..7962738 100644
--- a/kernel/rseq.c
+++ b/kernel/rseq.c
@@ -18,6 +18,9 @@
 #define CREATE_TRACE_POINTS
 #include
 
+/* The original rseq structure size (including padding) is 32 bytes. */
+#define ORIG_RSEQ_SIZE	32
+
 #define RSEQ_CS_NO_RESTART_FLAGS (RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT | \
 				  RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL | \
 				  RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE)
@@ -87,10 +90,15 @@ static int rseq_update_cpu_id(struct task_struct *t)
 	u32 cpu_id = raw_smp_processor_id();
 	struct rseq __user *rseq = t->rseq;
 
-	if (!user_write_access_begin(rseq, sizeof(*rseq)))
+	if (!user_write_access_begin(rseq, t->rseq_len))
 		goto efault;
 	unsafe_put_user(cpu_id, &rseq->cpu_id_start, efault_end);
 	unsafe_put_user(cpu_id, &rseq->cpu_id, efault_end);
+	/*
+	 * Additional feature fields added after ORIG_RSEQ_SIZE
+	 * need to be conditionally updated only if
+	 * t->rseq_len != ORIG_RSEQ_SIZE.
+	 */
 	user_write_access_end();
 	trace_rseq_update(t);
 	return 0;
@@ -117,6 +125,11 @@ static int rseq_reset_rseq_cpu_id(struct task_struct *t)
 	 */
 	if (put_user(cpu_id, &t->rseq->cpu_id))
 		return -EFAULT;
+	/*
+	 * Additional feature fields added after ORIG_RSEQ_SIZE
+	 * need to be conditionally reset only if
+	 * t->rseq_len != ORIG_RSEQ_SIZE.
+	 */
 	return 0;
 }
 
@@ -344,7 +357,7 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, rseq_len,
 		/* Unregister rseq for current thread. */
 		if (current->rseq != rseq || !current->rseq)
 			return -EINVAL;
-		if (rseq_len != sizeof(*rseq))
+		if (rseq_len != current->rseq_len)
 			return -EINVAL;
 		if (current->rseq_sig != sig)
 			return -EPERM;
@@ -353,6 +366,7 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, rseq_len,
 			return ret;
 		current->rseq = NULL;
 		current->rseq_sig = 0;
+		current->rseq_len = 0;
 		return 0;
 	}
 
@@ -365,7 +379,7 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, rseq_len,
 		 * the provided address differs from the prior
 		 * one.
 		 */
-		if (current->rseq != rseq || rseq_len != sizeof(*rseq))
+		if (current->rseq != rseq || rseq_len != current->rseq_len)
 			return -EINVAL;
 		if (current->rseq_sig != sig)
 			return -EPERM;
@@ -374,15 +388,24 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, rseq_len,
 	}
 
 	/*
-	 * If there was no rseq previously registered,
-	 * ensure the provided rseq is properly aligned and valid.
+	 * If there was no rseq previously registered, ensure the provided rseq
+	 * is properly aligned, as communicated to user-space through the ELF
+	 * auxiliary vector AT_RSEQ_ALIGN. If rseq_len is the original rseq
+	 * size, the required alignment is the original struct rseq alignment.
+	 *
+	 * In order to be valid, rseq_len is either the original rseq size, or
+	 * large enough to contain all supported fields, as communicated to
+	 * user-space through the ELF auxiliary vector AT_RSEQ_FEATURE_SIZE.
 	 */
-	if (!IS_ALIGNED((unsigned long)rseq, __alignof__(*rseq)) ||
-	    rseq_len != sizeof(*rseq))
+	if (rseq_len < ORIG_RSEQ_SIZE ||
+	    (rseq_len == ORIG_RSEQ_SIZE && !IS_ALIGNED((unsigned long)rseq, ORIG_RSEQ_SIZE)) ||
+	    (rseq_len != ORIG_RSEQ_SIZE && (!IS_ALIGNED((unsigned long)rseq, __alignof__(*rseq)) ||
+					    rseq_len < offsetof(struct rseq, end))))
 		return -EINVAL;
 	if (!access_ok(rseq, rseq_len))
 		return -EFAULT;
 	current->rseq = rseq;
+	current->rseq_len = rseq_len;
 	current->rseq_sig = sig;
 	/*
 	 * If rseq was previously inactive, and has just been