From patchwork Tue Jan 24 23:41:51 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrei Vagin X-Patchwork-Id: 47960 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp2429845wrn; Tue, 24 Jan 2023 15:45:54 -0800 (PST) X-Google-Smtp-Source: AMrXdXv1Y0qhUMzFfHFb95jvKL74nRFWHMzbE3zMrO44zlqmF9t/Y4vE5II4nP0zLNHlcJNQNtEz X-Received: by 2002:a05:6a00:3390:b0:58c:6ba1:58dd with SMTP id cm16-20020a056a00339000b0058c6ba158ddmr30652348pfb.11.1674603954284; Tue, 24 Jan 2023 15:45:54 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1674603954; cv=none; d=google.com; s=arc-20160816; b=QJ2koMTG4wQliTDjstFVDedQm/ebhZB7TciP778E6QAB5IVDBDdljsaECrWQoEive1 J2dd4oX2ElgNZq/aiF9lIPoGdApCC3x742pB7E1PcTte/nz3n7tHkOyMN8VOyMeDKcGl +a/FX/0XbdqZdQIeNMtTOBErSs2AIx2VA8LiE68GsJM3h8f3gcF4mAOaVavwFcxvqE3a 9O5hpEjdx2BN78ztvS5ywz2uS0TZzhVfMNVWdct82jIk/hkMJUkYufDkKcmMQkclVfGC G/CrdYgKzJK4vmDLS5cqivfVv+lhDgyuvGyWw9hQUzP4ssrstccNLhaJxrxEyigZ/Xvr V/dA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:dkim-signature; bh=xszjUWrkTFV6nH5PVQmd1FHGHVSzRQgiwaL4b/IJnWk=; b=faWctdAuV2F6ps1hvl0YPZWQ//Dwz/t2JrR8fX/TcraEEmPItMkitKDSkvz0muXbuM CPUM4/dq2VPC3YJ/Ox37Lm8gaacpTqVkULp5McogKLJ008mureWgFnqhMkhGFLqgIYEz X27WBKgopSinHIjDbaly6zgsgkWXT9KcLkX1heEh8zHEP/mW2zmyMnEsp4uORSzndESX Q+DofLd3hqDTQWtq4wPAHUWkQPFnY/1YeiVjxCImL83smGNQcfRvQE8NII1akMFahuA8 kAt14fQGmW9xoTcg+WhSMH8ejbEKdN895Fru8FXC9EAAi+P8X4qjheUeDzqBEWtLhtWH ShbA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b="hTF/rOOw"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com 
Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id z9-20020a056a001d8900b0057a7b003c82si3551976pfw.219.2023.01.24.15.45.42; Tue, 24 Jan 2023 15:45:54 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b="hTF/rOOw"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233911AbjAXXmI (ORCPT + 99 others); Tue, 24 Jan 2023 18:42:08 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39730 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233425AbjAXXmF (ORCPT ); Tue, 24 Jan 2023 18:42:05 -0500 Received: from mail-pf1-x449.google.com (mail-pf1-x449.google.com [IPv6:2607:f8b0:4864:20::449]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6C8B0366AD for ; Tue, 24 Jan 2023 15:42:04 -0800 (PST) Received: by mail-pf1-x449.google.com with SMTP id 74-20020a62184d000000b0058b9f769609so7372490pfy.3 for ; Tue, 24 Jan 2023 15:42:04 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=xszjUWrkTFV6nH5PVQmd1FHGHVSzRQgiwaL4b/IJnWk=; b=hTF/rOOwLOSId0uqhOxiwPXbw3/AXfGx+3kN23wQ14X/2W7FixnuFbx/Tx6JHDwQXz 3tP7WJa3k/2ZilKHOpQmLQ1aPQtsM6W9ZA44Ye/92hWta06p1j2LWKloC6IG/F0H5ZTj TSgVAaQ1+Dgfvq2UI3a2IcCw8yhkZnCGzpV6jAgfi5esnr30KxmO6iI7tMPSb6Fzs6wC 0DHDbnkKfA9g+2tJSzW3M8WBpev21WzZWhWbNoleSeuJDiZ4lODqxa/uxyIJDpNP6qTk 
Zw3YnweayXtLDiSYst8UKW40KJrsZUQaKb+QbMjDzz5SyjXSJHfHBR/6AHW8wSPOZLsF z8Uw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=xszjUWrkTFV6nH5PVQmd1FHGHVSzRQgiwaL4b/IJnWk=; b=iNex4cZgj1gCwN1Mf+P9byUJnGinw/vpWLrAB8c/qRCWV7gC5LnUSbLzb8gBGV0fo5 24Zla62fcp1zrK+lradpRYftOAnS1kxrKQVRdXK+txYHcK83dc17YktIGmRhs6+VXbw0 DOHp322oqsrL3hD51rNkXti3/EMRJSzN3PHiqUOHj+pajSRP1/5nt+Q6NCQoNlGUpnwt bkSW/UYabeHfzlLEQnJ/Mfe8DxMJNRz5t3HhHls7a8PRwxLER4xsWe6wsYzVkIpt4B3Z H/m4616nHsIIkHVqn71ODKjWLEmK71rz3CQsPdYLUKltCOHOTK1jNcP1q2F/5Lq8YuoL XMzA== X-Gm-Message-State: AFqh2kqzwZWup5tLn2cH/smiMXJMxgyL8At+nLU8Hzp6LNFF37ea6dKh 8vNPxvmpBJTiF3UU9XCOyx2xPR+GsM4= X-Received: from avagin.kir.corp.google.com ([2620:0:1008:11:cf1b:2f7f:3ca1:6488]) (user=avagin job=sendgmr) by 2002:a62:1c86:0:b0:58d:a84a:190b with SMTP id c128-20020a621c86000000b0058da84a190bmr3121261pfc.48.1674603723879; Tue, 24 Jan 2023 15:42:03 -0800 (PST) Date: Tue, 24 Jan 2023 15:41:51 -0800 In-Reply-To: <20230124234156.211569-1-avagin@google.com> Mime-Version: 1.0 References: <20230124234156.211569-1-avagin@google.com> X-Mailer: git-send-email 2.39.1.405.gd4c25cc71f-goog Message-ID: <20230124234156.211569-2-avagin@google.com> Subject: [PATCH 1/6] seccomp: don't use semaphore and wait_queue together From: Andrei Vagin To: Peter Zijlstra Cc: linux-kernel@vger.kernel.org, Kees Cook , Christian Brauner , Chen Yu , Andrei Vagin , Andy Lutomirski , Dietmar Eggemann , Ingo Molnar , Juri Lelli , Peter Oskolkov , Tycho Andersen , Will Drewry , Vincent Guittot X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on 
lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1755949516008943197?= X-GMAIL-MSGID: =?utf-8?q?1755949516008943197?= From: Andrei Vagin The main reason is to use the new wake_up helpers that will be added in the following patches. There are a few other reasons as well: * If we use two different wakeup primitives, we always need to signal both of them. This patch fixes seccomp_notify_recv, where we forgot to call wake_up_poll in the error path. * If we use one primitive, we can control how many waiters are woken up for each request. Our goal is to wake up just one waiter that will handle the request. Right now, wake_up_poll can wake up one waiter and up(&match->notif->request) can wake up one more. Signed-off-by: Andrei Vagin --- kernel/seccomp.c | 41 ++++++++++++++++++++++++++++++++++++----- 1 file changed, 36 insertions(+), 5 deletions(-) diff --git a/kernel/seccomp.c b/kernel/seccomp.c index e9852d1b4a5e..876022e9c88c 100644 --- a/kernel/seccomp.c +++ b/kernel/seccomp.c @@ -145,7 +145,7 @@ struct seccomp_kaddfd { * @notifications: A list of struct seccomp_knotif elements. */ struct notification { - struct semaphore request; + atomic_t requests; u64 next_id; struct list_head notifications; }; @@ -1116,7 +1116,7 @@ static int seccomp_do_user_notification(int this_syscall, list_add_tail(&n.list, &match->notif->notifications); INIT_LIST_HEAD(&n.addfd); - up(&match->notif->request); + atomic_add(1, &match->notif->requests); wake_up_poll(&match->wqh, EPOLLIN | EPOLLRDNORM); /* @@ -1450,6 +1450,37 @@ find_notification(struct seccomp_filter *filter, u64 id) return NULL; } +static int recv_wake_function(wait_queue_entry_t *wait, unsigned int mode, int sync, + void *key) +{ + /* Avoid a wakeup if event not interesting for us. 
*/ + if (key && !(key_to_poll(key) & (EPOLLIN | EPOLLERR))) + return 0; + return autoremove_wake_function(wait, mode, sync, key); +} + +static int recv_wait_event(struct seccomp_filter *filter) +{ + DEFINE_WAIT_FUNC(wait, recv_wake_function); + int ret; + + if (atomic_add_unless(&filter->notif->requests, -1, 0) != 0) + return 0; + + for (;;) { + ret = prepare_to_wait_event(&filter->wqh, &wait, TASK_INTERRUPTIBLE); + + if (atomic_add_unless(&filter->notif->requests, -1, 0) != 0) + break; + + if (ret) + return ret; + + schedule(); + } + finish_wait(&filter->wqh, &wait); + return 0; +} static long seccomp_notify_recv(struct seccomp_filter *filter, void __user *buf) @@ -1467,7 +1498,7 @@ static long seccomp_notify_recv(struct seccomp_filter *filter, memset(&unotif, 0, sizeof(unotif)); - ret = down_interruptible(&filter->notif->request); + ret = recv_wait_event(filter); if (ret < 0) return ret; @@ -1515,7 +1546,8 @@ static long seccomp_notify_recv(struct seccomp_filter *filter, if (should_sleep_killable(filter, knotif)) complete(&knotif->ready); knotif->state = SECCOMP_NOTIFY_INIT; - up(&filter->notif->request); + atomic_add(1, &filter->notif->requests); + wake_up_poll(&filter->wqh, EPOLLIN | EPOLLRDNORM); } mutex_unlock(&filter->notify_lock); } @@ -1777,7 +1809,6 @@ static struct file *init_listener(struct seccomp_filter *filter) if (!filter->notif) goto out; - sema_init(&filter->notif->request, 0); filter->notif->next_id = get_random_u64(); INIT_LIST_HEAD(&filter->notif->notifications); From patchwork Tue Jan 24 23:41:52 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrei Vagin X-Patchwork-Id: 47954 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp2429542wrn; Tue, 24 Jan 2023 15:45:05 -0800 (PST) X-Google-Smtp-Source: AMrXdXv9LPLs7MYUv5F9BjZiPBVH8VyxU2pb82nHDku88wPEjG9y/I28lrKcGSqIVa6fMB1W8Ypb X-Received: by 
2002:a17:902:e84b:b0:194:ddc2:60e8 with SMTP id t11-20020a170902e84b00b00194ddc260e8mr23682789plg.48.1674603905346; Tue, 24 Jan 2023 15:45:05 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1674603905; cv=none; d=google.com; s=arc-20160816; b=VvGypR9PXjdnVV0zO6lPcuD2EJ9eSvjqqSWCiZTih0QXQEx1PF7bZobEVe+aaHzGou qMyfxaebVruzuqUoiDU/SWvCD2WjWEXHMatMbQwcDepL+NIqjh2DnQJ/f1C44UfDPa4T Lm2VphhMSeLWKDJZZZeUWraAavZqNiyCuwFZTScvQQ6IhMqEqHCxRESJxFsBYd0vi7tl nNEiiYCHJcTkjM+/pHGpDnxn2Wrc5yTOJvt5AIb+qRmLDINDbdz18Q5DzUu98AHGbwUI wly1wMdf1hZICeAGUMATLjdBPqxK0ePvOMXJgFi3iix0nkZPUhQRATt7Z1GMcGrG4zPZ kgsw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:dkim-signature; bh=lah5T4LA+ROCG4YZ0vVyyEKK2I5gjhcF16gSjf24Noo=; b=TTD3XhIxcCx44rw8VEIkvfAFTdPduot2h8kz1GCenTSbrVKpoXlStN8FhiLm1ghnH2 Sro9SYJwJakk8GBWv1lX91obI74S0iAthmoMw4SbOKSKJVu0A+7CaqmCvbO9oHnJPofO 2CsybjaQmSj1m8DxWPf6QrKUkluwFBOCpMt8PoFaQfVqm0+ru7ciY0tBB1EYIyaP0mME 7G6kRDvCs6T5F/6V+kku2GimXwcqAMsGD1YPArKl36HWESAwV3knckJaKmCkq0Aa/3Fo nvAEMlPUiIgXfCp0pb5x9GgbMdZpES0AmnpCzUPKRTcS458cB3L8vWCAfoFGKvoUnpvS SD2Q== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=IGyqT2rl; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id bs192-20020a6328c9000000b0047780dc6a67si3578610pgb.370.2023.01.24.15.44.53; Tue, 24 Jan 2023 15:45:05 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=IGyqT2rl; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233983AbjAXXmL (ORCPT + 99 others); Tue, 24 Jan 2023 18:42:11 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39754 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233174AbjAXXmH (ORCPT ); Tue, 24 Jan 2023 18:42:07 -0500 Received: from mail-pf1-x449.google.com (mail-pf1-x449.google.com [IPv6:2607:f8b0:4864:20::449]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5443CEC47 for ; Tue, 24 Jan 2023 15:42:06 -0800 (PST) Received: by mail-pf1-x449.google.com with SMTP id cw8-20020a056a00450800b0058a3508303eso7435398pfb.13 for ; Tue, 24 Jan 2023 15:42:06 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=lah5T4LA+ROCG4YZ0vVyyEKK2I5gjhcF16gSjf24Noo=; b=IGyqT2rl17rRqTPfLdyuUnvDyatbJTaprQUp50FrNxlkElHUAQ3qVdFWKUPsOzYUq5 4oPRURkXY/vQbChgZRoNYC8JgWMMFE/cPIjWFH1Iu7QKftN2wi1N2eXwDMlnAoy96MRK IMIRFkXPABMKN822NiWifYufm479ZeVAp5IOyaT8iWyx/orysTDv8u2x/3nKhXclU5iU PHwNu3sw7RdpX6FGLP1jV6vQIG3GGrHNAr8AP8+lLUEy6GuSLMWSuQqxDrA/r6J+tjzz XB17TJG2qpEGN0Mj8OJY9uQfolU97Xcl8htO7xLvY0Z/ITZNT+HsXwaAC8hgFl2Ctm3H ssrw== 
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=lah5T4LA+ROCG4YZ0vVyyEKK2I5gjhcF16gSjf24Noo=; b=sz7jGsXpKIkUP0e4qjTJWCGwcEAYUBmVftDWCb0/JBV4iz0+Be6s2jSPTiKSVqS4iz B1u55zkXCTzAJgZqL7AgdmyUfu3OLojFUOxduDCs9U6hvuk5g+hHOryB9n1DbbgvpHT8 4bFg9mZ0EXqC+lrJyacNVXdWNLb48bzOZ/Z9+OqlhVPL0K12KAbujhFz5+P9GOQr2Csr z5eO1snDFa0GFYvPWZ9QTTFmeSoGt1llghWmKByK6QfLVFmZ6plUj5NorbpINKnza8Bh AH+1IUo7zvh6zAniflh7GGUxWhNMCKjYsKIXShGY2UMdUE29ilmJGvS1PsBP5iimqHVf NIGw== X-Gm-Message-State: AFqh2koj7Qj0AJ4cmXGw3y8jgKsqYxXcDq+ga5PIvEU9OJ3aFKqcKv+F BhBPDHDQmHSYxUBlt8LF7AtzWceFvqE= X-Received: from avagin.kir.corp.google.com ([2620:0:1008:11:cf1b:2f7f:3ca1:6488]) (user=avagin job=sendgmr) by 2002:a05:6a00:2490:b0:575:c993:d318 with SMTP id c16-20020a056a00249000b00575c993d318mr3017697pfv.78.1674603725657; Tue, 24 Jan 2023 15:42:05 -0800 (PST) Date: Tue, 24 Jan 2023 15:41:52 -0800 In-Reply-To: <20230124234156.211569-1-avagin@google.com> Mime-Version: 1.0 References: <20230124234156.211569-1-avagin@google.com> X-Mailer: git-send-email 2.39.1.405.gd4c25cc71f-goog Message-ID: <20230124234156.211569-3-avagin@google.com> Subject: [PATCH 2/6] sched: add WF_CURRENT_CPU and externise ttwu From: Andrei Vagin To: Peter Zijlstra Cc: linux-kernel@vger.kernel.org, Kees Cook , Christian Brauner , Chen Yu , Andrei Vagin , Andy Lutomirski , Dietmar Eggemann , Ingo Molnar , Juri Lelli , Peter Oskolkov , Tycho Andersen , Will Drewry , Vincent Guittot X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org 
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1755949464787679170?= X-GMAIL-MSGID: =?utf-8?q?1755949464787679170?= From: Peter Oskolkov Add a WF_CURRENT_CPU wake flag that advises the scheduler to move the wakee to the current CPU. This is useful for fast on-CPU context switching use cases. In addition, make ttwu (try_to_wake_up) external rather than static so that the flag can be passed to it from outside of sched/core.c. Signed-off-by: Peter Oskolkov Signed-off-by: Andrei Vagin --- kernel/sched/core.c | 3 +-- kernel/sched/fair.c | 4 ++++ kernel/sched/sched.h | 13 ++++++++----- 3 files changed, 13 insertions(+), 7 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index bb1ee6d7bdde..028c2840baa6 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -4112,8 +4112,7 @@ bool ttwu_state_match(struct task_struct *p, unsigned int state, int *success) * Return: %true if @p->state changes (an actual wakeup was done), * %false otherwise. */ -static int -try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags) +int try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags) { unsigned long flags; int cpu, success = 0; diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index c36aa54ae071..d6f76bead3c5 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -7380,6 +7380,10 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags) if (wake_flags & WF_TTWU) { record_wakee(p); + if ((wake_flags & WF_CURRENT_CPU) && + cpumask_test_cpu(cpu, p->cpus_ptr)) + return cpu; + if (sched_energy_enabled()) { new_cpu = find_energy_efficient_cpu(p, prev_cpu); if (new_cpu >= 0) diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 771f8ddb7053..34b4c54b2a2a 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -2088,12 +2088,13 @@ static inline int task_on_rq_migrating(struct task_struct *p) } /* Wake flags. 
The first three directly map to some SD flag value */ -#define WF_EXEC 0x02 /* Wakeup after exec; maps to SD_BALANCE_EXEC */ -#define WF_FORK 0x04 /* Wakeup after fork; maps to SD_BALANCE_FORK */ -#define WF_TTWU 0x08 /* Wakeup; maps to SD_BALANCE_WAKE */ +#define WF_EXEC 0x02 /* Wakeup after exec; maps to SD_BALANCE_EXEC */ +#define WF_FORK 0x04 /* Wakeup after fork; maps to SD_BALANCE_FORK */ +#define WF_TTWU 0x08 /* Wakeup; maps to SD_BALANCE_WAKE */ -#define WF_SYNC 0x10 /* Waker goes to sleep after wakeup */ -#define WF_MIGRATED 0x20 /* Internal use, task got migrated */ +#define WF_SYNC 0x10 /* Waker goes to sleep after wakeup */ +#define WF_MIGRATED 0x20 /* Internal use, task got migrated */ +#define WF_CURRENT_CPU 0x40 /* Prefer to move the wakee to the current CPU. */ #ifdef CONFIG_SMP static_assert(WF_EXEC == SD_BALANCE_EXEC); @@ -3245,6 +3246,8 @@ static inline bool is_per_cpu_kthread(struct task_struct *p) extern void swake_up_all_locked(struct swait_queue_head *q); extern void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait); +extern int try_to_wake_up(struct task_struct *tsk, unsigned int state, int wake_flags); + #ifdef CONFIG_PREEMPT_DYNAMIC extern int preempt_dynamic_mode; extern int sched_dynamic_mode(const char *str); From patchwork Tue Jan 24 23:41:53 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrei Vagin X-Patchwork-Id: 47955 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp2429570wrn; Tue, 24 Jan 2023 15:45:09 -0800 (PST) X-Google-Smtp-Source: AMrXdXspk1jxu5ytpafVYvEQIYgC/OcKTZ9MKRw5Qk2A2eSrcQ0mJcPUNf1yQtxTPyGKg5M59nhW X-Received: by 2002:a17:90b:3905:b0:229:ff0a:bdfc with SMTP id ob5-20020a17090b390500b00229ff0abdfcmr20216496pjb.24.1674603909212; Tue, 24 Jan 2023 15:45:09 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1674603909; cv=none; d=google.com; s=arc-20160816; 
b=jiAxy3LCiL3F0WGhDgOggn2xG/Nmv3ki0f9ryEaUfxW2raVAheYyPfbak5Q0i8Cgl6 zHQbiKh+hH0McrXeCvympdQfnw1+1ecF5ZwmOHbJON0VNIszwUzldxlB3divvChWjAZ/ dNBSophYZLBTyMy2dYIjb5OXTcCsZ4lQFfv5lM7COsHxqjgseC9KTBShf251fwToOxmc DTZm7MttTjVIk+1Rj02ypOj1ICFAdQNaswkaEG+7IqusuhhTtTRJXe/HGLOqFUORfRgh uZYJhUUOWxtvv8GcpB8s79VZsVqyPmS6J7smixQPd59SUQ6x7ZKLcHxQXSn3yRzHXNWw l4hg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:dkim-signature; bh=qi7EKreQxkEoaHFcnKjib1WJc2fpRwlVhvJbtYcJB8A=; b=D4GJGKmYjvzJKJDJj0FHLTtVXh9L4rjQanYvofmQJxaAvZpChuX4zai3TrBjCSteqh jwYS+M8h3IWKwdUkoY6O79Ab2cP5TX79uvkyhJZulvqgzJiwDpxscZQ4zb7GNqGWP6nD SvMxv//T/V7b4bu3C+l7+yrUZ18HeBKv8T4yH29xNGbjzFoH2crqNp80o2aMWqyTQZb8 Fep2fGCvOzjuFbJTjD8B6OC1vmZiX1ho7cKNg2j//cixzjgN1GXiZma0e+RcacyYH3AS vLQWE0s4xoeQHQMQNjcSZDW1QvsJygHfJKygoMiBe7IoR/7hOQsWl2beGTWuq0HGCwh7 wa4g== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=HiNOz2UD; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id t16-20020a17090a449000b0021878aebd90si231087pjg.168.2023.01.24.15.44.56; Tue, 24 Jan 2023 15:45:09 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=HiNOz2UD; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234083AbjAXXmS (ORCPT + 99 others); Tue, 24 Jan 2023 18:42:18 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39868 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233294AbjAXXmJ (ORCPT ); Tue, 24 Jan 2023 18:42:09 -0500 Received: from mail-pj1-x1049.google.com (mail-pj1-x1049.google.com [IPv6:2607:f8b0:4864:20::1049]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 26E8F4A1D7 for ; Tue, 24 Jan 2023 15:42:08 -0800 (PST) Received: by mail-pj1-x1049.google.com with SMTP id om10-20020a17090b3a8a00b002299e350deaso136322pjb.1 for ; Tue, 24 Jan 2023 15:42:08 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=qi7EKreQxkEoaHFcnKjib1WJc2fpRwlVhvJbtYcJB8A=; b=HiNOz2UDVT2ZAczIQ2k4nyb+4XwJ/ePZO7Aw6FmWsnmQlWlzz1eJqh+wfP307mGIwm YVQ3zX5pLPTwVYThBKcGJ5D+/a3sk/oeQrDSB7NROq4JfVzMztqRn9DBoyVnjAx+ZWOC Kvs/OlgRdFjQUzeq8w/mzCclINwTxgpL79SA2eoFw2nK2QwcywyFJmFdrIK0DEEOQIA6 d/I92h2HYkrjwrXEWr+/eWU7K2TiWzyaK9ALU9I0Q9k2IjG/XnTyP2jlkE96a4WL2UNg r9MFj1mvsSoG4Wi2BXwUBOHqsZiyohuahjnK77kZy9J70GxvTpwbvBc6LP4lDpMHFl1x t/EQ== 
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=qi7EKreQxkEoaHFcnKjib1WJc2fpRwlVhvJbtYcJB8A=; b=piokAKYc4klhMbbtyveLK1ghKoPxwUSbpY2tEv4Oo4LBuYUrzAoDSyRraNLxYwYXjt don7JLMLwTjVdGmpezzgO9VihMnR625xi+I9LsNxUdkd+SJ+qmqRo2BP2A+g/G5HrP/P kw0pg5C7hmBTJEyA9RAptO+clNTmsn2BFR3fJ7HDAWt0I/Y6GjajSgXkbwj2dIDS1Fju 6DN259rgD/ckoL8F939v6Ep5S0WAVNKBILBhHmsNY0Ec4rtG7/cM5K7OXgX2nHp5fIan HFJBhLou6g4mOC7oJOaM6dljPF3W2rEponKPquvc1BWpPjCmL65Ute7FybC+pXaBIBC/ LrmA== X-Gm-Message-State: AFqh2krvwTxYF4mWxkOaYKwF+tUMM724INDx0uFgplvbftUqBpL+BHvE WJuOW0hmkLc5GxT5iGUPqEQC8kiCX54= X-Received: from avagin.kir.corp.google.com ([2620:0:1008:11:cf1b:2f7f:3ca1:6488]) (user=avagin job=sendgmr) by 2002:a17:902:ce01:b0:18c:5dae:6f2 with SMTP id k1-20020a170902ce0100b0018c5dae06f2mr3065462plg.24.1674603727622; Tue, 24 Jan 2023 15:42:07 -0800 (PST) Date: Tue, 24 Jan 2023 15:41:53 -0800 In-Reply-To: <20230124234156.211569-1-avagin@google.com> Mime-Version: 1.0 References: <20230124234156.211569-1-avagin@google.com> X-Mailer: git-send-email 2.39.1.405.gd4c25cc71f-goog Message-ID: <20230124234156.211569-4-avagin@google.com> Subject: [PATCH 3/6] sched: add a few helpers to wake up tasks on the current cpu From: Andrei Vagin To: Peter Zijlstra Cc: linux-kernel@vger.kernel.org, Kees Cook , Christian Brauner , Chen Yu , Andrei Vagin , Andy Lutomirski , Dietmar Eggemann , Ingo Molnar , Juri Lelli , Peter Oskolkov , Tycho Andersen , Will Drewry , Vincent Guittot X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: 
linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1755949469116909218?= X-GMAIL-MSGID: =?utf-8?q?1755949469116909218?= From: Andrei Vagin Add the complete_on_current_cpu and wake_up_poll_on_current_cpu helpers to wake up tasks on the current CPU. These two helpers are useful when a task needs to make a synchronous context switch to another task. In this context, synchronous means that it wakes up the target task and falls asleep right after that. One example of such a workload is seccomp user notifications. This mechanism allows a supervisor process to handle system calls on behalf of a target process. While the supervisor is handling an intercepted system call, the target process is blocked in the kernel, waiting for a response to come back. On-CPU context switches are much faster than regular ones. Signed-off-by: Andrei Vagin --- include/linux/completion.h | 1 + include/linux/swait.h | 2 +- include/linux/wait.h | 3 +++ kernel/sched/completion.c | 26 ++++++++++++++++++-------- kernel/sched/core.c | 2 +- kernel/sched/swait.c | 8 ++++---- kernel/sched/wait.c | 5 +++++ 7 files changed, 33 insertions(+), 14 deletions(-) diff --git a/include/linux/completion.h b/include/linux/completion.h index 62b32b19e0a8..fb2915676574 100644 --- a/include/linux/completion.h +++ b/include/linux/completion.h @@ -116,6 +116,7 @@ extern bool try_wait_for_completion(struct completion *x); extern bool completion_done(struct completion *x); extern void complete(struct completion *); +extern void complete_on_current_cpu(struct completion *x); extern void complete_all(struct completion *); #endif diff --git a/include/linux/swait.h b/include/linux/swait.h index 6a8c22b8c2a5..d324419482a0 100644 --- a/include/linux/swait.h +++ b/include/linux/swait.h @@ -146,7 +146,7 @@ static inline bool swq_has_sleeper(struct swait_queue_head *wq) extern void swake_up_one(struct swait_queue_head *q); extern void swake_up_all(struct swait_queue_head *q); -extern 
void swake_up_locked(struct swait_queue_head *q); +extern void swake_up_locked(struct swait_queue_head *q, int wake_flags); extern void prepare_to_swait_exclusive(struct swait_queue_head *q, struct swait_queue *wait, int state); extern long prepare_to_swait_event(struct swait_queue_head *q, struct swait_queue *wait, int state); diff --git a/include/linux/wait.h b/include/linux/wait.h index a0307b516b09..5ec7739400f4 100644 --- a/include/linux/wait.h +++ b/include/linux/wait.h @@ -210,6 +210,7 @@ __remove_wait_queue(struct wait_queue_head *wq_head, struct wait_queue_entry *wq } int __wake_up(struct wait_queue_head *wq_head, unsigned int mode, int nr, void *key); +void __wake_up_on_current_cpu(struct wait_queue_head *wq_head, unsigned int mode, void *key); void __wake_up_locked_key(struct wait_queue_head *wq_head, unsigned int mode, void *key); void __wake_up_locked_key_bookmark(struct wait_queue_head *wq_head, unsigned int mode, void *key, wait_queue_entry_t *bookmark); @@ -237,6 +238,8 @@ void __wake_up_pollfree(struct wait_queue_head *wq_head); #define key_to_poll(m) ((__force __poll_t)(uintptr_t)(void *)(m)) #define wake_up_poll(x, m) \ __wake_up(x, TASK_NORMAL, 1, poll_to_key(m)) +#define wake_up_poll_on_current_cpu(x, m) \ + __wake_up_on_current_cpu(x, TASK_NORMAL, poll_to_key(m)) #define wake_up_locked_poll(x, m) \ __wake_up_locked_key((x), TASK_NORMAL, poll_to_key(m)) #define wake_up_interruptible_poll(x, m) \ diff --git a/kernel/sched/completion.c b/kernel/sched/completion.c index d57a5c1c1cd9..3561ab533dd4 100644 --- a/kernel/sched/completion.c +++ b/kernel/sched/completion.c @@ -13,6 +13,23 @@ * Waiting for completion is a typically sync point, but not an exclusion point. 
*/ +static void complete_with_flags(struct completion *x, int wake_flags) +{ + unsigned long flags; + + raw_spin_lock_irqsave(&x->wait.lock, flags); + + if (x->done != UINT_MAX) + x->done++; + swake_up_locked(&x->wait, wake_flags); + raw_spin_unlock_irqrestore(&x->wait.lock, flags); +} + +void complete_on_current_cpu(struct completion *x) +{ + return complete_with_flags(x, WF_CURRENT_CPU); +} + /** * complete: - signals a single thread waiting on this completion * @x: holds the state of this particular completion @@ -27,14 +44,7 @@ */ void complete(struct completion *x) { - unsigned long flags; - - raw_spin_lock_irqsave(&x->wait.lock, flags); - - if (x->done != UINT_MAX) - x->done++; - swake_up_locked(&x->wait); - raw_spin_unlock_irqrestore(&x->wait.lock, flags); + complete_with_flags(x, 0); } EXPORT_SYMBOL(complete); diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 028c2840baa6..a4cb1b5fd52d 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -6925,7 +6925,7 @@ asmlinkage __visible void __sched preempt_schedule_irq(void) int default_wake_function(wait_queue_entry_t *curr, unsigned mode, int wake_flags, void *key) { - WARN_ON_ONCE(IS_ENABLED(CONFIG_SCHED_DEBUG) && wake_flags & ~WF_SYNC); + WARN_ON_ONCE(IS_ENABLED(CONFIG_SCHED_DEBUG) && wake_flags & ~(WF_SYNC|WF_CURRENT_CPU)); return try_to_wake_up(curr->private, mode, wake_flags); } EXPORT_SYMBOL(default_wake_function); diff --git a/kernel/sched/swait.c b/kernel/sched/swait.c index 76b9b796e695..72505cd3b60a 100644 --- a/kernel/sched/swait.c +++ b/kernel/sched/swait.c @@ -18,7 +18,7 @@ EXPORT_SYMBOL(__init_swait_queue_head); * If for some reason it would return 0, that means the previously waiting * task is already running, so it will observe condition true (or has already). 
*/ -void swake_up_locked(struct swait_queue_head *q) +void swake_up_locked(struct swait_queue_head *q, int wake_flags) { struct swait_queue *curr; @@ -26,7 +26,7 @@ void swake_up_locked(struct swait_queue_head *q) return; curr = list_first_entry(&q->task_list, typeof(*curr), task_list); - wake_up_process(curr->task); + try_to_wake_up(curr->task, TASK_NORMAL, wake_flags); list_del_init(&curr->task_list); } EXPORT_SYMBOL(swake_up_locked); @@ -41,7 +41,7 @@ EXPORT_SYMBOL(swake_up_locked); void swake_up_all_locked(struct swait_queue_head *q) { while (!list_empty(&q->task_list)) - swake_up_locked(q); + swake_up_locked(q, 0); } void swake_up_one(struct swait_queue_head *q) @@ -49,7 +49,7 @@ void swake_up_one(struct swait_queue_head *q) unsigned long flags; raw_spin_lock_irqsave(&q->lock, flags); - swake_up_locked(q); + swake_up_locked(q, 0); raw_spin_unlock_irqrestore(&q->lock, flags); } EXPORT_SYMBOL(swake_up_one); diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c index 133b74730738..47803a0b8d5d 100644 --- a/kernel/sched/wait.c +++ b/kernel/sched/wait.c @@ -161,6 +161,11 @@ int __wake_up(struct wait_queue_head *wq_head, unsigned int mode, } EXPORT_SYMBOL(__wake_up); +void __wake_up_on_current_cpu(struct wait_queue_head *wq_head, unsigned int mode, void *key) +{ + __wake_up_common_lock(wq_head, mode, 1, WF_CURRENT_CPU, key); +} + /* * Same as __wake_up but called with the spinlock in wait_queue_head_t held. 
*/ From patchwork Tue Jan 24 23:41:54 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Andrei Vagin X-Patchwork-Id: 47956 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp2429580wrn; Tue, 24 Jan 2023 15:45:12 -0800 (PST) X-Google-Smtp-Source: AMrXdXu+Kjmo7AVerxHj1u/tlh7sepa144oDR+CRQleboysn2TvLoys96nyCYSqUggaZUs4J1GWQ X-Received: by 2002:a05:6a21:339a:b0:a7:345a:100f with SMTP id yy26-20020a056a21339a00b000a7345a100fmr40664417pzb.10.1674603911791; Tue, 24 Jan 2023 15:45:11 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1674603911; cv=none; d=google.com; s=arc-20160816; b=CYASubl+KVepxdfBey0RrOXRTNVHnkv5sng8u4zOABKN4kgcXp/G9+wwmqYTZZYTJR zoON3F4IA8NUTpiCGYrYSnB6rJ3md+8ou7iK3j1/KPfnCXfzNtyF4pJISwnriTcdaJM6 j68ZGjgWu+kr73rBVHDqe2ecBeGR/cFVm3r9g7T8/WPfNY7Lz+r7wHeUIHm8Y5bfWcAr 8I8TzeBpCgeubV629vGO7n6o9lq/NMKwMM/KainVmwmfKj4bIZyzyH563C83gAObvIJy I3BmaIXHho1IuAB2ws+6epmnXlYT8Z1cP0+fmG9uHtaLFf7O0hL/FoFteaio1IevCQtN 3fVQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:cc:to:from:subject :message-id:references:mime-version:in-reply-to:date:dkim-signature; bh=g6Z/L7bX9MFc/+bSE9Rh2MDjwE5MQawcR3+IL7tl120=; b=R/bhlBu4wdYAa3MnWGjkKjquP+/8JAl+rIIZuMctV2n50FxTZGav0J9whaXlINyD9D sVkYPRzUOnsm9i+VvR7szOZrUqUlF639Pf8rBL3ASZ2fxUv8OqDV8urJrjM+gOnqmWss jle56IRt0SVQwmaZMAoJMw2gF6qXMCd83L2yVuwaAiCitanmwL634YxqqHI7a/KGJxP4 cTiaqPvYy0PfaJ50YjYjzosggOXYX9Yfpdf0kNGUwK2xGMDnFF6gsWPPCSM7xjxG7IqJ 9FtsfOleyWY90PDQBt03h002duEKeiLtQSzqMYaTBGScS5wsTrZlW2phbcDc2jJ4SAvj KfhQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=jqgEvr3L; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT 
dis=NONE) header.from=google.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id t7-20020a637807000000b004d378558f31si3582123pgc.136.2023.01.24.15.44.57; Tue, 24 Jan 2023 15:45:11 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=jqgEvr3L; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232753AbjAXXmX (ORCPT + 99 others); Tue, 24 Jan 2023 18:42:23 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39972 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233999AbjAXXmM (ORCPT ); Tue, 24 Jan 2023 18:42:12 -0500 Received: from mail-pg1-x549.google.com (mail-pg1-x549.google.com [IPv6:2607:f8b0:4864:20::549]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 29B314A21F for ; Tue, 24 Jan 2023 15:42:10 -0800 (PST) Received: by mail-pg1-x549.google.com with SMTP id 69-20020a630148000000b00478118684c4so7602142pgb.20 for ; Tue, 24 Jan 2023 15:42:10 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=content-transfer-encoding:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:from:to:cc:subject:date:message-id :reply-to; bh=g6Z/L7bX9MFc/+bSE9Rh2MDjwE5MQawcR3+IL7tl120=; b=jqgEvr3LLSFk3IpBhw0lkz7ygaaSYsrv9PmPLfWIjqQS5/6kHgpMdG2b3XIz6KOvrB fcHWrb+YGbCLFZO9QtB+/KXICKIjUl/DyBYphIVlYvFbNQcThhXjzaZDBP5ysEj1+fvg HYAZD+//ffgLjsg9uPIzBMRJuDW2FbC631nadBYc+jB2AcCjEJtbG6EI5OyS/R6Bu+NC 
z7NlSrJSg9mOw1FfgiF/OSwIZ5Ed41/lpzau0U5fK5jrceP0zUfUorq7YswjPFasqjVZ GmpUp/cFG0QCFMsFpp0JmEBDUhcnKdJZDO9a+aqbdpcXxBut6G3R+WpwAYlk6jwrnJpb 79YA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:x-gm-message-state:from:to:cc:subject :date:message-id:reply-to; bh=g6Z/L7bX9MFc/+bSE9Rh2MDjwE5MQawcR3+IL7tl120=; b=QY18gfw1XgAH79y17WAmKTDloDy9WnZHyHohL58IPBkCeSKm1wCwhlSnHqtBnNTcMU yMVdUUqrzncsrYTaiS+LHSttNOgBTUQG+zijr4t8nul9t1pvHPEfRhmMqpK7T5p4WZGt vn42XsH+WD1NQkWycJzqAsf7/v6K1tw3HdxhJuCkNog+EL6pYc4fHhCwIhylvqKCaw5v YobbUNbhS2iO6Q1ElShXlHVBjQMnkqC1GqIAdh9aTKAYXg7qE+ldRVZNKr4W3iJeD1+A 5L8RaptX9aVLdKSbjCR/tMzEP0R3Gp1LVlteqb4dGJaDm0Ak4+Io6dDlcfCqa9rrwLT9 95/A== X-Gm-Message-State: AFqh2kozd8iCeyv2NYqPOT40ZQ63Ypg2VAq5xpV1BzgKU8gBfr2CsxH/ BSGDT8yQ3P1rPT4RKnmhgZNHVul0NEc= X-Received: from avagin.kir.corp.google.com ([2620:0:1008:11:cf1b:2f7f:3ca1:6488]) (user=avagin job=sendgmr) by 2002:a05:6a00:400c:b0:58b:c1a4:3c29 with SMTP id by12-20020a056a00400c00b0058bc1a43c29mr3151066pfb.32.1674603729454; Tue, 24 Jan 2023 15:42:09 -0800 (PST) Date: Tue, 24 Jan 2023 15:41:54 -0800 In-Reply-To: <20230124234156.211569-1-avagin@google.com> Mime-Version: 1.0 References: <20230124234156.211569-1-avagin@google.com> X-Mailer: git-send-email 2.39.1.405.gd4c25cc71f-goog Message-ID: <20230124234156.211569-5-avagin@google.com> Subject: [PATCH 4/6] seccomp: add the synchronous mode for seccomp_unotify From: Andrei Vagin To: Peter Zijlstra Cc: linux-kernel@vger.kernel.org, Kees Cook , Christian Brauner , Chen Yu , Andrei Vagin , Andy Lutomirski , Dietmar Eggemann , Ingo Molnar , Juri Lelli , Peter Oskolkov , Tycho Andersen , Will Drewry , Vincent Guittot X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham 
autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1755949458165254428?= X-GMAIL-MSGID: =?utf-8?q?1755949471627886420?=

From: Andrei Vagin

seccomp_unotify allows more privileged processes to perform actions on behalf of less privileged processes.

In many cases, the workflow is fully synchronous: a target process triggers a system call and passes control to a supervisor process, which handles the system call and returns control to the target process. In this context, "synchronous" means that only one process is running while the other one is waiting.

The WF_CURRENT_CPU flag advises the scheduler to move the wakee to the current CPU. For such synchronous workflows, it makes context switches a few times faster.

Right now, each interaction takes 12µs. With this patch, it takes about 3µs.

This change introduces the SECCOMP_USER_NOTIF_FD_SYNC_WAKE_UP flag, which is used to enable the sync mode.
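For illustration, a supervisor could enable the mode on its listener descriptor roughly as follows. This is a minimal userspace sketch and not part of the patch: enable_sync_wakeup() is a hypothetical helper name, and the #ifndef fallbacks simply repeat the definitions this change adds to include/uapi/linux/seccomp.h, for builds against older headers.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/seccomp.h>
#include <linux/types.h>

/* Fallbacks mirroring the UAPI additions in this patch (older headers). */
#ifndef SECCOMP_USER_NOTIF_FD_SYNC_WAKE_UP
#define SECCOMP_USER_NOTIF_FD_SYNC_WAKE_UP (1UL << 0)
#endif
#ifndef SECCOMP_IOCTL_NOTIF_SET_FLAGS
#define SECCOMP_IOCTL_NOTIF_SET_FLAGS SECCOMP_IOW(4, __u64)
#endif

/*
 * Hypothetical helper: ask the kernel to use synchronous wake-ups for
 * the given seccomp notification listener fd. The flag set is passed
 * by value as the ioctl argument, as the selftest in this series does.
 *
 * Returns 0 on success or a negative errno value on failure.
 */
int enable_sync_wakeup(int listener)
{
	if (ioctl(listener, SECCOMP_IOCTL_NOTIF_SET_FLAGS,
		  SECCOMP_USER_NOTIF_FD_SYNC_WAKE_UP) < 0)
		return -errno;
	return 0;
}
```

On a kernel without this change the ioctl is expected to fail, so a supervisor can treat a negative return value as "not supported" and continue with the default wake-up path.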
Signed-off-by: Andrei Vagin --- include/uapi/linux/seccomp.h | 4 ++++ kernel/seccomp.c | 31 +++++++++++++++++++++++++++++-- 2 files changed, 33 insertions(+), 2 deletions(-) diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h index 0fdc6ef02b94..dbfc9b37fcae 100644 --- a/include/uapi/linux/seccomp.h +++ b/include/uapi/linux/seccomp.h @@ -115,6 +115,8 @@ struct seccomp_notif_resp { __u32 flags; }; +#define SECCOMP_USER_NOTIF_FD_SYNC_WAKE_UP (1UL << 0) + /* valid flags for seccomp_notif_addfd */ #define SECCOMP_ADDFD_FLAG_SETFD (1UL << 0) /* Specify remote fd */ #define SECCOMP_ADDFD_FLAG_SEND (1UL << 1) /* Addfd and return it, atomically */ @@ -150,4 +152,6 @@ struct seccomp_notif_addfd { #define SECCOMP_IOCTL_NOTIF_ADDFD SECCOMP_IOW(3, \ struct seccomp_notif_addfd) +#define SECCOMP_IOCTL_NOTIF_SET_FLAGS SECCOMP_IOW(4, __u64) + #endif /* _UAPI_LINUX_SECCOMP_H */ diff --git a/kernel/seccomp.c b/kernel/seccomp.c index 876022e9c88c..0a62d44f4898 100644 --- a/kernel/seccomp.c +++ b/kernel/seccomp.c @@ -143,9 +143,12 @@ struct seccomp_kaddfd { * filter->notify_lock. * @next_id: The id of the next request. * @notifications: A list of struct seccomp_knotif elements. + * @flags: A set of SECCOMP_USER_NOTIF_FD_* flags. */ + struct notification { atomic_t requests; + u32 flags; u64 next_id; struct list_head notifications; }; @@ -1117,7 +1120,10 @@ static int seccomp_do_user_notification(int this_syscall, INIT_LIST_HEAD(&n.addfd); atomic_add(1, &match->notif->requests); - wake_up_poll(&match->wqh, EPOLLIN | EPOLLRDNORM); + if (match->notif->flags & SECCOMP_USER_NOTIF_FD_SYNC_WAKE_UP) + wake_up_poll_on_current_cpu(&match->wqh, EPOLLIN | EPOLLRDNORM); + else + wake_up_poll(&match->wqh, EPOLLIN | EPOLLRDNORM); /* * This is where we wait for a reply from userspace. 
@@ -1593,7 +1599,10 @@ static long seccomp_notify_send(struct seccomp_filter *filter, knotif->error = resp.error; knotif->val = resp.val; knotif->flags = resp.flags; - complete(&knotif->ready); + if (filter->notif->flags & SECCOMP_USER_NOTIF_FD_SYNC_WAKE_UP) + complete_on_current_cpu(&knotif->ready); + else + complete(&knotif->ready); out: mutex_unlock(&filter->notify_lock); return ret; @@ -1623,6 +1632,22 @@ static long seccomp_notify_id_valid(struct seccomp_filter *filter, return ret; } +static long seccomp_notify_set_flags(struct seccomp_filter *filter, + unsigned long flags) +{ + long ret; + + if (flags & ~SECCOMP_USER_NOTIF_FD_SYNC_WAKE_UP) + return -EINVAL; + + ret = mutex_lock_interruptible(&filter->notify_lock); + if (ret < 0) + return ret; + filter->notif->flags = flags; + mutex_unlock(&filter->notify_lock); + return 0; +} + static long seccomp_notify_addfd(struct seccomp_filter *filter, struct seccomp_notif_addfd __user *uaddfd, unsigned int size) @@ -1752,6 +1777,8 @@ static long seccomp_notify_ioctl(struct file *file, unsigned int cmd, case SECCOMP_IOCTL_NOTIF_ID_VALID_WRONG_DIR: case SECCOMP_IOCTL_NOTIF_ID_VALID: return seccomp_notify_id_valid(filter, buf); + case SECCOMP_IOCTL_NOTIF_SET_FLAGS: + return seccomp_notify_set_flags(filter, arg); } /* Extensible Argument ioctls */ From patchwork Tue Jan 24 23:41:55 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrei Vagin X-Patchwork-Id: 47957 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp2429586wrn; Tue, 24 Jan 2023 15:45:12 -0800 (PST) X-Google-Smtp-Source: AMrXdXstg8+rYmz+f438VxxEgZM7ycYpeOSDR5C2SRmAtxNTzKEkpMm9o0wRTkOJQxOJ/cItD7P+ X-Received: by 2002:a05:6a20:d39a:b0:b0:32f0:6237 with SMTP id iq26-20020a056a20d39a00b000b032f06237mr33556813pzb.18.1674603912719; Tue, 24 Jan 2023 15:45:12 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1674603912; cv=none; d=google.com; 
s=arc-20160816; b=GGFZiXf0FO1dw9y04Cq5uhIUsTQ1T5S+Ti9E2/12rTtM3nJhAPi0dExBPVahACUx+p 2jG4H7v/BTocMzMjS9o80/Ag3dyN5Qhrv4iHve4fKt/d51r0CZxKvzp486RRxZ37kH3n RxiHjgOH5/45czGqshrmmmh3Tr1VZmnLJgZBdgXv03AJTkrDiKizJspAnpgY+ycj2x3m gH1XUOnYCZM9nHhRBOZssSWksEPz3/FrJVmcbzJmYYr3vGU1a866PMjeSOcaXKV0/zKz Kq3dxXMm2JESpwYwIesXchit6mYWnzqIHGOl6w9C8nyOMYYdNY7RV8N8jPxd1df8ZkU8 ehaw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:dkim-signature; bh=8/JrJgyN+mBphtoymyqpedRSMPQyxXzI/fFyz6RiTZs=; b=E71IGZprPlzi43bFbSy4p4rBZIW1WnIrO90ObmfCi+5uHLL/hgUdyP/NRfb3nyEd7v cxqQ15b2EO18mqSYqtw8NC7t1gBIAw4UYbW38HQF4qLDbt26gdWIQSf6H5t2vjA1Aml5 aFTJ/DiwzW0ihvzd6bbmlkIfk+GvKza4DaXquNwoK3/7+j7UKPfjrkDnywEHMGclS/QB yXHkCMk78UdcKoDkZ0guS3TF/YJPJ9hCTWG6QGop2ZANM8BrVv7k8Zo1bT4PABu/3Ebq 2tl3bfpX0dd6rjj3MU5ae4FY0iKSB+v/N3P8s+tLgv97bfCGguufc60Nc9Bew++jwhw8 GLeA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=Nm2AcpMz; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id m8-20020a656a08000000b0047caad28cdbsi3952492pgu.621.2023.01.24.15.44.59; Tue, 24 Jan 2023 15:45:12 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=Nm2AcpMz; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234264AbjAXXm1 (ORCPT + 99 others); Tue, 24 Jan 2023 18:42:27 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40172 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234039AbjAXXmP (ORCPT ); Tue, 24 Jan 2023 18:42:15 -0500 Received: from mail-pl1-x64a.google.com (mail-pl1-x64a.google.com [IPv6:2607:f8b0:4864:20::64a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C8FBC4997F for ; Tue, 24 Jan 2023 15:42:11 -0800 (PST) Received: by mail-pl1-x64a.google.com with SMTP id j16-20020a170902da9000b00194c056109eso9750165plx.18 for ; Tue, 24 Jan 2023 15:42:11 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=8/JrJgyN+mBphtoymyqpedRSMPQyxXzI/fFyz6RiTZs=; b=Nm2AcpMznwiKEnd1/Ey177FD+c9SBaCTTy/WvSRxfjzHfyp0s7lkLF374MJHtE8g2+ ek0C0I6omsMdcME2Lcqxsqo+u6HUdoXO/njA7NR9XOLk7A4BNx4MFuNTQsMSndgDb38e 3oHoX95tyxRlgkRFhejIrMxK+4jBm7BYjQO9fEJ4f5GguGeyq4RU2KakiHpqX/X3RLPZ gy6wZFlqqMuho8v/EkxfGr0k+Z/hSIWozfDG9w1qag3eB1UfPwAfi0hZiFBAmZ8zQrza L+BmvAAViLJ7gLA8kmIR+Hn/0FAgjmOL7BfIOVEp2GbKwARAEM/maRzRzmydo+Ysolq0 UaNg== X-Google-DKIM-Signature: 
v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=8/JrJgyN+mBphtoymyqpedRSMPQyxXzI/fFyz6RiTZs=; b=PNbJxellyaH0Kmz219Dt29B9h1pD8PUuOX4Pj5HXEx5fee4dlmRqbeBkToj2HzLhvi 3s61CUEGl6d7RcyMmhMrDLEPVTG0xCQdVFoADcVxy7vTmDvH/KyYVuMhno1Sj4w+3aCX 8HhjXETNc8T+O7vuWU55biRVVvPb+kEfbABAB1YrfP/MdqdDw2bCc7SCb7BWNMzP6eBn LYc3E0MYl5blukLEeVjsYGDnUEWYY6qzkOmEO6iwKa2bKsuOSWUo8PIrviJSeVQ7h5Y3 t1ogOJlIrJkiz1Sq8EdnFrc70GRBnext7HjoP7ky/tZtw3dMLIEjc1V/svXL1jF89p4J bHTQ== X-Gm-Message-State: AO0yUKVVRuDAqx+sgr2mv4H5CQkOm74lWqB4P1i8oUUwv28hat0nAi+Q J8KNX6e4hiVExAkoNu534F+j0oNkqq8= X-Received: from avagin.kir.corp.google.com ([2620:0:1008:11:cf1b:2f7f:3ca1:6488]) (user=avagin job=sendgmr) by 2002:a05:6a00:22c9:b0:590:e49:9ce7 with SMTP id f9-20020a056a0022c900b005900e499ce7mr428597pfj.13.1674603731171; Tue, 24 Jan 2023 15:42:11 -0800 (PST) Date: Tue, 24 Jan 2023 15:41:55 -0800 In-Reply-To: <20230124234156.211569-1-avagin@google.com> Mime-Version: 1.0 References: <20230124234156.211569-1-avagin@google.com> X-Mailer: git-send-email 2.39.1.405.gd4c25cc71f-goog Message-ID: <20230124234156.211569-6-avagin@google.com> Subject: [PATCH 5/6] selftest/seccomp: add a new test for the sync mode of seccomp_user_notify From: Andrei Vagin To: Peter Zijlstra Cc: linux-kernel@vger.kernel.org, Kees Cook , Christian Brauner , Chen Yu , Andrei Vagin , Andy Lutomirski , Dietmar Eggemann , Ingo Molnar , Juri Lelli , Peter Oskolkov , Tycho Andersen , Will Drewry , Vincent Guittot X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org 
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1755949472212230093?= X-GMAIL-MSGID: =?utf-8?q?1755949472212230093?= From: Andrei Vagin Test output: # RUN global.user_notification_sync ... # OK global.user_notification_sync ok 51 global.user_notification_sync Signed-off-by: Andrei Vagin --- tools/testing/selftests/seccomp/seccomp_bpf.c | 55 +++++++++++++++++++ 1 file changed, 55 insertions(+) diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c index 9c2f448bb3a9..05b8de6d1fcb 100644 --- a/tools/testing/selftests/seccomp/seccomp_bpf.c +++ b/tools/testing/selftests/seccomp/seccomp_bpf.c @@ -4243,6 +4243,61 @@ TEST(user_notification_addfd_rlimit) close(memfd); } +#ifndef SECCOMP_USER_NOTIF_FD_SYNC_WAKE_UP +#define SECCOMP_USER_NOTIF_FD_SYNC_WAKE_UP (1UL << 0) +#define SECCOMP_IOCTL_NOTIF_SET_FLAGS SECCOMP_IOW(4, __u64) +#endif + +TEST(user_notification_sync) +{ + struct seccomp_notif req = {}; + struct seccomp_notif_resp resp = {}; + int status, listener; + pid_t pid; + long ret; + + ret = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0); + ASSERT_EQ(0, ret) { + TH_LOG("Kernel does not support PR_SET_NO_NEW_PRIVS!"); + } + + listener = user_notif_syscall(__NR_getppid, + SECCOMP_FILTER_FLAG_NEW_LISTENER); + ASSERT_GE(listener, 0); + + /* Try to set invalid flags. 
*/ + EXPECT_SYSCALL_RETURN(-EINVAL, + ioctl(listener, SECCOMP_IOCTL_NOTIF_SET_FLAGS, 0xffffffff, 0)); + + ASSERT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_SET_FLAGS, + SECCOMP_USER_NOTIF_FD_SYNC_WAKE_UP, 0), 0); + + pid = fork(); + ASSERT_GE(pid, 0); + if (pid == 0) { + ret = syscall(__NR_getppid); + ASSERT_EQ(ret, USER_NOTIF_MAGIC) { + _exit(1); + } + _exit(0); + } + + req.pid = 0; + ASSERT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_RECV, &req), 0); + + ASSERT_EQ(req.data.nr, __NR_getppid); + + resp.id = req.id; + resp.error = 0; + resp.val = USER_NOTIF_MAGIC; + resp.flags = 0; + ASSERT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_SEND, &resp), 0); + + ASSERT_EQ(waitpid(pid, &status, 0), pid); + ASSERT_EQ(status, 0); +} + + /* Make sure PTRACE_O_SUSPEND_SECCOMP requires CAP_SYS_ADMIN. */ FIXTURE(O_SUSPEND_SECCOMP) { pid_t pid; From patchwork Tue Jan 24 23:41:56 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrei Vagin X-Patchwork-Id: 47958 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp2429661wrn; Tue, 24 Jan 2023 15:45:25 -0800 (PST) X-Google-Smtp-Source: AMrXdXtxx/Oqj5pNhhsYZNAJNibValE7epvSvRKms2/Yf1PNufAF++0e7Vef+hBU5kGdGqeutzuC X-Received: by 2002:a17:90b:4389:b0:229:263c:5d6a with SMTP id in9-20020a17090b438900b00229263c5d6amr31742056pjb.6.1674603925706; Tue, 24 Jan 2023 15:45:25 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1674603925; cv=none; d=google.com; s=arc-20160816; b=VlbsHYHRBgrmqdvs3uM0dDGhfP7zB+GOojnGpAGbAsYn/3g62xVDA5cJBsLGCLzo9X tN+CDh43vmbZrI+bfLRi5Cft9Mh8i4ypgdJq46WJuyi/XRxTh1YA+tbBVZkenM7/mTV3 wSUYdBpSMfjuxc7bGPGXs3pSgFz9i9ykEeZl4ZQcubJtIQp7jGJjEh4C8nXlxjBEkfLo kL6qc+Pu8SGCdP++12YH6bpb8ulqCXgCAIHmB8rATkI8FExlKCBWmtf22lX49AwHGNcY bOyu1EcmN4Hx2Wg+UlQY886C56xu3XsY3Hi7Ib/eGJAdsph2AQ6w+suWMjtUBb4XHvsy WztQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; 
h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:dkim-signature; bh=hVyP2low+4E/EH1CecBEmwNgjb6LivwRzJWZvnQqZ38=; b=Y6wbHJKGerXuPbSBqA6NOiHM7fvBNc764vfjXeBHYR433RXnbPwAY6lD/9jL/RWLRZ nuuyh3iKqXP3GDUTQ36dESsNrN+2QK+lLLVbnzYZCdOK9W1cFFo/hJB/a/OnlyE4KtmB CVnl+5rRin6mVgobjFApm+zCQbL4n7P5bCvfeIQ4NozGEZh6ENElbe31LDf8vFUxJST7 e3BuCbcm5Qs0CW/1Y79LC0vehNyIczZLBVwFmVZav5yBoEbVvFYHt4URH65lcQnfw/o/ qXg3wo/AeVpuXXfw2/1RnL/L0xKb9HxMGEpNA0N/SHDpiC265Agn9hUfQli4KlTgMcND bNmQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=CgWzqsRL; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id ay9-20020a17090b030900b002201e3b4a66si290115pjb.71.2023.01.24.15.45.13; Tue, 24 Jan 2023 15:45:25 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=CgWzqsRL; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234126AbjAXXme (ORCPT + 99 others); Tue, 24 Jan 2023 18:42:34 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40320 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229584AbjAXXmT (ORCPT ); Tue, 24 Jan 2023 18:42:19 -0500 Received: from mail-pf1-x449.google.com (mail-pf1-x449.google.com 
[IPv6:2607:f8b0:4864:20::449]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D88C64F840 for ; Tue, 24 Jan 2023 15:42:13 -0800 (PST) Received: by mail-pf1-x449.google.com with SMTP id b196-20020a621bcd000000b0058a63dc105eso7337694pfb.6 for ; Tue, 24 Jan 2023 15:42:13 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=hVyP2low+4E/EH1CecBEmwNgjb6LivwRzJWZvnQqZ38=; b=CgWzqsRLgW+Z6iMokOKOAfzSxzWgX7y8odBVqiVYJkQ/euRfGJnxt/bGi0T1Ruek7b F1bWr5K+NRm6iEg+TLQz3/gAMkfZ9zSqbJma/fA6KIrEx6X4+Y4DCMZTbmIdmaJ3fpLC 0pNYkhIibJFhNUGwUDm9v/IXXqFXrcYxMj9TMvNdvhW+Ov5BYm1oBQOP9BBTLC3ZEbwI xcjqa6Q8AFyIjrXFncakYYGvf9k4byPTjqvZUnp0v6Y+jT3h+qFFJyL7SEYaPXswrStg wk+r0pslWv3IqhxdTmh+Wyle5dLYdtECseITebzfXT4N2BWbBDw69PzVFQ3PkUw7CYO+ C7wg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=hVyP2low+4E/EH1CecBEmwNgjb6LivwRzJWZvnQqZ38=; b=GBO2tIY/0f53Z12BfL+qU7bv+/t7yyl0rHcOvW8XFYlWcVmuRERSya/LonDl//4sLi Fwf/ZF29VX5kXZRqCnt12bS/SZflD6hCSIFeepFgieQg43ayyBhetXRCXMXj5Ac2NnoR Iea19dDK+bvdMo0ZHXSfRd6N538khWCF3ImchtbPsIUxLirmf1f1eiweh0hNl/XO+Val pVBrM971Aj9co4eg6XmRmvwO6hCrjhRIAKUEw8QUZNc4/8qtaNvBy/BgMoVIRjyJ3W1d lGEkdYKKC+6k34GHJSE9o4S1/MpJFOp3xFEFbNW69cJ15aSiICOyytuTJ0augtblg6/7 4IgA== X-Gm-Message-State: AFqh2koY3F+WCFMCEelm9aKnQxmZyiuQjdOk3HvSDV3PErV2KbwyIcPA cVWixdCxDrLPVzjFxxfuRdh0OcJAIvs= X-Received: from avagin.kir.corp.google.com ([2620:0:1008:11:cf1b:2f7f:3ca1:6488]) (user=avagin job=sendgmr) by 2002:a63:e554:0:b0:4c2:95ad:fd77 with SMTP id z20-20020a63e554000000b004c295adfd77mr2951926pgj.67.1674603733167; Tue, 24 Jan 2023 15:42:13 -0800 (PST) Date: Tue, 24 Jan 2023 15:41:56 -0800 In-Reply-To: <20230124234156.211569-1-avagin@google.com> Mime-Version: 1.0 
References: <20230124234156.211569-1-avagin@google.com> X-Mailer: git-send-email 2.39.1.405.gd4c25cc71f-goog Message-ID: <20230124234156.211569-7-avagin@google.com> Subject: [PATCH 6/6] perf/benchmark: add a new benchmark for seccomp_unotify From: Andrei Vagin To: Peter Zijlstra Cc: linux-kernel@vger.kernel.org, Kees Cook , Christian Brauner , Chen Yu , Andrei Vagin , Andy Lutomirski , Dietmar Eggemann , Ingo Molnar , Juri Lelli , Peter Oskolkov , Tycho Andersen , Will Drewry , Vincent Guittot X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1755949485992657997?= X-GMAIL-MSGID: =?utf-8?q?1755949485992657997?=

From: Andrei Vagin

The benchmark is similar to the pipe benchmark. It creates two processes: one issues system calls and the other handles them via seccomp user notifications. It measures the time required to run a specified number of iterations.
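The per-operation figures the benchmark prints follow from simple arithmetic on the elapsed time. A minimal standalone sketch, assuming the same formulas as the benchmark's default output format; the helper names elapsed_usec(), usecs_per_op() and ops_per_sec() are hypothetical and do not appear in the patch:

```c
#include <stdint.h>
#include <sys/time.h>

#ifndef USEC_PER_SEC
#define USEC_PER_SEC 1000000ULL
#endif

/* Total elapsed microseconds represented by a struct timeval. */
uint64_t elapsed_usec(const struct timeval *diff)
{
	return (uint64_t)diff->tv_sec * USEC_PER_SEC + (uint64_t)diff->tv_usec;
}

/* Average cost of one handled system call; assumes loops > 0. */
double usecs_per_op(uint64_t usec, uint64_t loops)
{
	return (double)usec / (double)loops;
}

/* Throughput, truncated to an integer; assumes usec > 0. */
int ops_per_sec(uint64_t usec, uint64_t loops)
{
	return (int)((double)loops /
		     ((double)usec / (double)USEC_PER_SEC));
}
```

For the sync-mode run reported in this commit message (1000000 calls in 2.769629 seconds), these formulas give 2.769629 usecs/op and 361059 ops/sec.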
  $ ./perf bench sched seccomp-notify --sync-mode --loop 1000000
  # Running 'sched/seccomp-notify' benchmark:
  # Executed 1000000 system calls

       Total time: 2.769 [sec]

         2.769629 usecs/op
           361059 ops/sec

  $ ./perf bench sched seccomp-notify
  # Running 'sched/seccomp-notify' benchmark:
  # Executed 1000000 system calls

       Total time: 8.571 [sec]

         8.571119 usecs/op
           116670 ops/sec

Signed-off-by: Andrei Vagin
---
 tools/arch/x86/include/uapi/asm/unistd_32.h |   3 +
 tools/arch/x86/include/uapi/asm/unistd_64.h |   3 +
 tools/perf/bench/Build                      |   1 +
 tools/perf/bench/bench.h                    |   1 +
 tools/perf/bench/sched-seccomp-notify.c     | 167 ++++++++++++++++++++
 tools/perf/builtin-bench.c                  |   1 +
 6 files changed, 176 insertions(+)
 create mode 100644 tools/perf/bench/sched-seccomp-notify.c

diff --git a/tools/arch/x86/include/uapi/asm/unistd_32.h b/tools/arch/x86/include/uapi/asm/unistd_32.h
index 60a89dba01b6..c0c74befc8df 100644
--- a/tools/arch/x86/include/uapi/asm/unistd_32.h
+++ b/tools/arch/x86/include/uapi/asm/unistd_32.h
@@ -14,3 +14,6 @@
 #ifndef __NR_setns
 # define __NR_setns 346
 #endif
+#ifndef __NR_seccomp
+#define __NR_seccomp 354
+#endif
diff --git a/tools/arch/x86/include/uapi/asm/unistd_64.h b/tools/arch/x86/include/uapi/asm/unistd_64.h
index cb52a3a8b8fc..b695246da684 100644
--- a/tools/arch/x86/include/uapi/asm/unistd_64.h
+++ b/tools/arch/x86/include/uapi/asm/unistd_64.h
@@ -14,3 +14,6 @@
 #ifndef __NR_setns
 #define __NR_setns 308
 #endif
+#ifndef __NR_seccomp
+#define __NR_seccomp 317
+#endif
diff --git a/tools/perf/bench/Build b/tools/perf/bench/Build
index 6b6155a8ad09..e3ec2c1b0682 100644
--- a/tools/perf/bench/Build
+++ b/tools/perf/bench/Build
@@ -1,5 +1,6 @@
 perf-y += sched-messaging.o
 perf-y += sched-pipe.o
+perf-y += sched-seccomp-notify.o
 perf-y += syscall.o
 perf-y += mem-functions.o
 perf-y += futex-hash.o
diff --git a/tools/perf/bench/bench.h b/tools/perf/bench/bench.h
index a5d49b3b6a09..40657b0959a9 100644
--- a/tools/perf/bench/bench.h
+++ b/tools/perf/bench/bench.h
@@ -21,6 +21,7 @@
extern struct timeval bench__start, bench__end, bench__runtime; int bench_numa(int argc, const char **argv); int bench_sched_messaging(int argc, const char **argv); int bench_sched_pipe(int argc, const char **argv); +int bench_sched_seccomp_notify(int argc, const char **argv); int bench_syscall_basic(int argc, const char **argv); int bench_mem_memcpy(int argc, const char **argv); int bench_mem_memset(int argc, const char **argv); diff --git a/tools/perf/bench/sched-seccomp-notify.c b/tools/perf/bench/sched-seccomp-notify.c new file mode 100644 index 000000000000..f6f32b0a865a --- /dev/null +++ b/tools/perf/bench/sched-seccomp-notify.c @@ -0,0 +1,167 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include "bench.h" + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define LOOPS_DEFAULT 1000000UL +static uint64_t loops = LOOPS_DEFAULT; +static bool sync_mode; +static const struct option options[] = { + OPT_U64('l', "loop", &loops, "Specify number of loops"), + OPT_BOOLEAN('s', "sync-mode", &sync_mode, + "Enable the synchronous mode for seccomp notifications"), + OPT_END() +}; + +static const char * const bench_seccomp_usage[] = { + "perf bench sched seccomp-notify ", + NULL +}; + +static int seccomp(unsigned int op, unsigned int flags, void *args) +{ + return syscall(__NR_seccomp, op, flags, args); +} + +static int user_notif_syscall(int nr, unsigned int flags) +{ + struct sock_filter filter[] = { + BPF_STMT(BPF_LD|BPF_W|BPF_ABS, + offsetof(struct seccomp_data, nr)), + BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, nr, 0, 1), + BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_USER_NOTIF), + BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW), + }; + + struct sock_fprog prog = { + .len = (unsigned short)ARRAY_SIZE(filter), + .filter = filter, + }; + + return seccomp(SECCOMP_SET_MODE_FILTER, flags, &prog); +} + +#define USER_NOTIF_MAGIC INT_MAX 
+static void user_notification_sync_loop(int listener) +{ + struct seccomp_notif_resp resp; + struct seccomp_notif req; + uint64_t nr; + + for (nr = 0; nr < loops; nr++) { + memset(&req, 0, sizeof(req)); + assert(ioctl(listener, SECCOMP_IOCTL_NOTIF_RECV, &req) == 0); + + assert(req.data.nr == __NR_gettid); + + resp.id = req.id; + resp.error = 0; + resp.val = USER_NOTIF_MAGIC; + resp.flags = 0; + assert(ioctl(listener, SECCOMP_IOCTL_NOTIF_SEND, &resp) == 0); + } +} + +#ifndef SECCOMP_USER_NOTIF_FD_SYNC_WAKE_UP +#define SECCOMP_USER_NOTIF_FD_SYNC_WAKE_UP (1UL << 0) +#define SECCOMP_IOCTL_NOTIF_SET_FLAGS SECCOMP_IOW(4, __u64) +#endif +int bench_sched_seccomp_notify(int argc, const char **argv) +{ + struct timeval start, stop, diff; + unsigned long long result_usec = 0; + int status, listener; + pid_t pid; + long ret; + + argc = parse_options(argc, argv, options, bench_seccomp_usage, 0); + + gettimeofday(&start, NULL); + + prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0); + listener = user_notif_syscall(__NR_gettid, + SECCOMP_FILTER_FLAG_NEW_LISTENER); + assert(listener >= 0); + + pid = fork(); + assert(pid >= 0); + if (pid == 0) { + assert(prctl(PR_SET_PDEATHSIG, SIGKILL, 0, 0, 0) == 0); + while (1) { + ret = syscall(__NR_gettid); + if (ret == USER_NOTIF_MAGIC) + continue; + break; + } + _exit(1); + } + + if (sync_mode) { + assert(ioctl(listener, SECCOMP_IOCTL_NOTIF_SET_FLAGS, + SECCOMP_USER_NOTIF_FD_SYNC_WAKE_UP, 0) == 0); + } + user_notification_sync_loop(listener); + + kill(pid, SIGKILL); + assert(waitpid(pid, &status, 0) == pid); + assert(WIFSIGNALED(status)); + assert(WTERMSIG(status) == SIGKILL); + + gettimeofday(&stop, NULL); + timersub(&stop, &start, &diff); + + switch (bench_format) { + case BENCH_FORMAT_DEFAULT: + printf("# Executed %lu system calls\n\n", + loops); + + result_usec = diff.tv_sec * USEC_PER_SEC; + result_usec += diff.tv_usec; + + printf(" %14s: %lu.%03lu [sec]\n\n", "Total time", + (unsigned long) diff.tv_sec, + (unsigned long) (diff.tv_usec / 
USEC_PER_MSEC)); + + printf(" %14lf usecs/op\n", + (double)result_usec / (double)loops); + printf(" %14d ops/sec\n", + (int)((double)loops / + ((double)result_usec / (double)USEC_PER_SEC))); + break; + + case BENCH_FORMAT_SIMPLE: + printf("%lu.%03lu\n", + (unsigned long) diff.tv_sec, + (unsigned long) (diff.tv_usec / USEC_PER_MSEC)); + break; + + default: + /* reaching here is something disaster */ + fprintf(stderr, "Unknown format:%d\n", bench_format); + exit(1); + break; + } + + return 0; +} diff --git a/tools/perf/builtin-bench.c b/tools/perf/builtin-bench.c index 334ab897aae3..71044575c571 100644 --- a/tools/perf/builtin-bench.c +++ b/tools/perf/builtin-bench.c @@ -46,6 +46,7 @@ static struct bench numa_benchmarks[] = { static struct bench sched_benchmarks[] = { { "messaging", "Benchmark for scheduling and IPC", bench_sched_messaging }, { "pipe", "Benchmark for pipe() between two processes", bench_sched_pipe }, + { "seccomp-notify", "Benchmark for seccomp user notify", bench_sched_seccomp_notify}, { "all", "Run all scheduler benchmarks", NULL }, { NULL, NULL, NULL } };