From patchwork Thu Feb 2 03:04:24 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrei Vagin X-Patchwork-Id: 51637 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp5239wrn; Wed, 1 Feb 2023 19:05:33 -0800 (PST) X-Google-Smtp-Source: AK7set/qfWkjhVXtaT47lqLELQzN1MueQG4Zf2S97Jx2Kkw36cj97XFIkvAMa/EH5cSX4ItuyMMG X-Received: by 2002:a05:6a00:2491:b0:593:ec73:424d with SMTP id c17-20020a056a00249100b00593ec73424dmr5602274pfv.8.1675307132996; Wed, 01 Feb 2023 19:05:32 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1675307132; cv=none; d=google.com; s=arc-20160816; b=CblXB1x68LSWDMtBf5LRYHkD6tFL+dw5YwcF5QQd/LyjTo7+k7YZuLx18d4A+7inw/ Upn+vorGBrhgkhhHOWfLK7b042xNfWjSfMwbK1on/jwh7LgThpmfIJt71M+s1ULn363S 6r1vf80wpS9kKAxHcw80BxTnJe089wmDqXgETUkknRodq0/LVxgrtb2sxlBfEu3qMxPI rE50013DI8mEMbRiVjoMRFTuzal3qVMFSFdm4QU1GnbRiBV6cTVX091vjRslIPl56Q/s F0BbLk7kVl34RtvTpegHu8zdY/+s85WmGcmiSccyT5SotjfzBP/9gWTeG25KCpcP8OdX 9Usg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:dkim-signature; bh=xxNz1I6f1ntPCey48VXsGK/J5ar2tTOh4Waqg/8Yo5s=; b=FdlcX/sW0Bp+NNafd/rdYnc5J/wb6JT6eBeHwRfUz7JH4yL0sjkVBPEL/z+yv/EwfR ithYg+taH+rQQD32+AJ3mW1yut6vHhDHqNq/NxZJ1/+WfjrFFcFpL4fw5BKWieCTNHOP vsmXgq8S/a++59z8mDWqMSfC7kJ5aXWOSHpQvBLRZb/Ogjf1EE//HQTMZrQA02er4RYN 1ixXUMA2C+8rekcwZKekFVAztfHJcwvNtwxs3Wyr/cCIVgxb9QWyloNDi275WGgO392a xo0lOAZ/cBggpQcruwvW4XZGgX6+z/V6xf+mosCZdvtv+xEabALrd75RykF7hRs/XZMO f/ww== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=UuLCw0Xv; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: 
from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id u205-20020a6279d6000000b00590bfd23272si21394645pfc.304.2023.02.01.19.05.21; Wed, 01 Feb 2023 19:05:32 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=UuLCw0Xv; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231376AbjBBDEm (ORCPT + 99 others); Wed, 1 Feb 2023 22:04:42 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44394 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229630AbjBBDEj (ORCPT ); Wed, 1 Feb 2023 22:04:39 -0500 Received: from mail-yb1-xb4a.google.com (mail-yb1-xb4a.google.com [IPv6:2607:f8b0:4864:20::b4a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 70783F9 for ; Wed, 1 Feb 2023 19:04:38 -0800 (PST) Received: by mail-yb1-xb4a.google.com with SMTP id k204-20020a256fd5000000b007b8b040bc50so527691ybc.1 for ; Wed, 01 Feb 2023 19:04:38 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=xxNz1I6f1ntPCey48VXsGK/J5ar2tTOh4Waqg/8Yo5s=; b=UuLCw0Xv/mTrMv3zluwN0thXeO+NQ0dikgAAZB90G1dHXsYWmHsCgbCHlTUXEMEfZ8 gEEg0d/y02gM0sesKJRteW43UijVtmvp211AGjlJKl7r/n82zxtm7XJjBmRIIY/kVukR be8SUS4/HEu7/21XjrLcgrdepzwnQN/oku5GfZ26Z+8AESIFvO0wtRn2ExRYu8tCtca1 QRziFUN1gmTVezcAmNUR3UIe5qUD9JLeL/kCIxR6ffe5S5R3utE+kQHGYrQOYH4PPO1T 
iQGAqahpD2aRZFDmKvtKK656DjDrYBAoDqFG2RmaC9sgeujPiUCDlnCRpOzE/2SeI5Xz 6Ylg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=xxNz1I6f1ntPCey48VXsGK/J5ar2tTOh4Waqg/8Yo5s=; b=Nx7Yyy4G8RRF5h35wX9u2sHZHNad993QWU4/OGU4SRd5sRlFJjUDDBIm9HWr+pgdGy Z2nXyFV4bodsXI/QpoF3/pZNCsS+MYriOpeOTfxweOoM/FyYXCrB44ucpyiyAB8/mWRa Uj0SwPX/5FePYxEc5mtVE2EMrPIrA7L2kwTYRqMzmytO4mpM9BElsHIt4D57txHfDdn0 wLnfQOI38lSYXOa/IDhsdPUMXcsp/R6cyy9qSZyDMrcTBsY9fLZcQtp7a+w+8ROHhoM9 JC8IBxw4x7CAxWZ3pU2Z4CKhdGxFv+I0WJJPYch23CYn5cXU4AeVAGGiNxaGWMjv051P hZSQ== X-Gm-Message-State: AO0yUKULhQ7tEppRRwME4ogMyoH7bcWhJ3l73DCcGpBxW8mQ4f/HZ1WF MahZak0ik/UgaOEgkRxAb/OCrOI4X+0= X-Received: from avagin.kir.corp.google.com ([2620:0:1008:11:eee0:dc42:a911:8b59]) (user=avagin job=sendgmr) by 2002:a81:1c4:0:b0:4fe:3a3c:d911 with SMTP id 187-20020a8101c4000000b004fe3a3cd911mr488130ywb.311.1675307077693; Wed, 01 Feb 2023 19:04:37 -0800 (PST) Date: Wed, 1 Feb 2023 19:04:24 -0800 In-Reply-To: <20230202030429.3304875-1-avagin@google.com> Mime-Version: 1.0 References: <20230202030429.3304875-1-avagin@google.com> X-Mailer: git-send-email 2.39.1.456.gfc5497dd1b-goog Message-ID: <20230202030429.3304875-2-avagin@google.com> Subject: [PATCH 1/6] seccomp: don't use semaphore and wait_queue together From: Andrei Vagin To: Kees Cook , Peter Zijlstra Cc: linux-kernel@vger.kernel.org, Christian Brauner , Chen Yu , avagin@gmail.com, Andrei Vagin , Andy Lutomirski , Dietmar Eggemann , Ingo Molnar , Juri Lelli , Peter Oskolkov , Tycho Andersen , Will Drewry , Vincent Guittot X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on 
	lindbergh.monkeyblade.net
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1756686852553071456?=
X-GMAIL-MSGID: =?utf-8?q?1756686852553071456?=

The main reason is to use the new wake_up helpers that will be added in the
following patches. But there are a few other reasons:

* If we use two different primitives, we always need to call both of them.
  This patch fixes seccomp_notify_recv, where we forgot to call
  wake_up_poll in the error path.

* If we use one primitive, we can control how many waiters are woken up
  for each request. Our goal is to wake up just one waiter that will handle
  the request. Right now, wake_up_poll can wake up one waiter and
  up(&match->notif->request) can wake up one more.

Signed-off-by: Andrei Vagin
---
 kernel/seccomp.c | 41 ++++++++++++++++++++++++++++++++++++-----
 1 file changed, 36 insertions(+), 5 deletions(-)

diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index e9852d1b4a5e..876022e9c88c 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -145,7 +145,7 @@ struct seccomp_kaddfd {
  * @notifications: A list of struct seccomp_knotif elements.
  */
 struct notification {
-	struct semaphore request;
+	atomic_t requests;
 	u64 next_id;
 	struct list_head notifications;
 };
@@ -1116,7 +1116,7 @@ static int seccomp_do_user_notification(int this_syscall,
 	list_add_tail(&n.list, &match->notif->notifications);
 	INIT_LIST_HEAD(&n.addfd);
 
-	up(&match->notif->request);
+	atomic_add(1, &match->notif->requests);
 	wake_up_poll(&match->wqh, EPOLLIN | EPOLLRDNORM);
 
 	/*
@@ -1450,6 +1450,37 @@ find_notification(struct seccomp_filter *filter, u64 id)
 	return NULL;
 }
 
+static int recv_wake_function(wait_queue_entry_t *wait, unsigned int mode, int sync,
+			      void *key)
+{
+	/* Avoid a wakeup if event not interesting for us. */
+	if (key && !(key_to_poll(key) & (EPOLLIN | EPOLLERR)))
+		return 0;
+	return autoremove_wake_function(wait, mode, sync, key);
+}
+
+static int recv_wait_event(struct seccomp_filter *filter)
+{
+	DEFINE_WAIT_FUNC(wait, recv_wake_function);
+	int ret;
+
+	if (atomic_add_unless(&filter->notif->requests, -1, 0) != 0)
+		return 0;
+
+	for (;;) {
+		ret = prepare_to_wait_event(&filter->wqh, &wait, TASK_INTERRUPTIBLE);
+
+		if (atomic_add_unless(&filter->notif->requests, -1, 0) != 0)
+			break;
+
+		if (ret)
+			return ret;
+
+		schedule();
+	}
+	finish_wait(&filter->wqh, &wait);
+	return 0;
+}
 
 static long seccomp_notify_recv(struct seccomp_filter *filter,
 				void __user *buf)
@@ -1467,7 +1498,7 @@ static long seccomp_notify_recv(struct seccomp_filter *filter,
 
 	memset(&unotif, 0, sizeof(unotif));
 
-	ret = down_interruptible(&filter->notif->request);
+	ret = recv_wait_event(filter);
 	if (ret < 0)
 		return ret;
 
@@ -1515,7 +1546,8 @@ static long seccomp_notify_recv(struct seccomp_filter *filter,
 			if (should_sleep_killable(filter, knotif))
 				complete(&knotif->ready);
 			knotif->state = SECCOMP_NOTIFY_INIT;
-			up(&filter->notif->request);
+			atomic_add(1, &filter->notif->requests);
+			wake_up_poll(&filter->wqh, EPOLLIN | EPOLLRDNORM);
 		}
 		mutex_unlock(&filter->notify_lock);
 	}
@@ -1777,7 +1809,6 @@ static struct file *init_listener(struct seccomp_filter *filter)
 	if (!filter->notif)
 		goto out;
 
-	sema_init(&filter->notif->request, 0);
 	filter->notif->next_id = get_random_u64();
 	INIT_LIST_HEAD(&filter->notif->notifications);
 

From patchwork Thu Feb 2 03:04:25 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrei Vagin
X-Patchwork-Id: 51638
Return-Path:
Delivered-To: ouuuleilei@gmail.com
Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp5247wrn; Wed, 1 Feb 2023 19:05:34 -0800 (PST)
X-Google-Smtp-Source: AK7set9zCcXjirIbV4n6OdB6Lo1F+fxXrylbqXldJcTlRnsjLpRqnmGAHeBit4O5LewqLP6ryfzM
X-Received: by 2002:a17:902:f301:b0:196:8a5d:40fe
with SMTP id c1-20020a170902f30100b001968a5d40femr3959694ple.66.1675307134680; Wed, 01 Feb 2023 19:05:34 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1675307134; cv=none; d=google.com; s=arc-20160816; b=Uy2lzFO9xFmntvK77gJtnBXBmxRqZ0BQwbeGTHymTn2ousswdPAFf/rqm6vOoyHGpQ H6qUdDHE3iC1Uie9CEzJYpCAIb6yyfM92Vyb+VNOLvL7AwXb6hL0UaRIqNuKFh1JUY9x hoFgcZUBpsetegBYHcOBaK8e9sJztecsXTvjOAwvY0JM/BjnTV5OJ7+DhCAUo3pi4dk5 VwU1h4lefvQf5H8KMI8/8VKbsL79CwWwwwVyqE9QV5qvxyBoGpGOR6tIW6tq93MHJ5oM /SyXQmfjc7a7hyuQ+HRW9Oe99+NNpvBzBZxZfoVxszHbVzitG3EfIQWjKgFlGWYjFGdb obMw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:dkim-signature; bh=QH2jJ0lYMHyWgETUubfbRs+kKTgqzQ/t/S1vJNh165w=; b=gqzqjsqxmJR0Hol4Egbjxd5rL9cVoA5tG/2LF5qDQ6Q/HdBOoAtf7UXHCCo4YZ8QqX eQA/Yy5k7TRaFQmJ5zQFFvnI2eA2VTvC1UK2LC9huyH06dMbr2n5s2Z5AONTiZj5DGqr Z5exom2Ul7wfZpYiJPRAA2CraaZbMzDSM5EE5t3zddDi3nCd4syHQYnpd+o5IHcB+OUc 3rPqFlPeSOokpxAYn/6lHpCXcwrj3obmG2vO0RyyHPl6QknJPyCr6TkBpibQhkUumsWM Detlqr++oKPrOXdaTlV448Crlp+srYgBP2U9rDZiBj63OtxFRWi1sD0sUK/5rM59m0hq J/oA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=rtMV7RhR; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id h12-20020a170902f7cc00b001978ecec8b2si8646069plw.494.2023.02.01.19.05.22; Wed, 01 Feb 2023 19:05:34 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=rtMV7RhR; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230526AbjBBDEo (ORCPT + 99 others); Wed, 1 Feb 2023 22:04:44 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44474 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230361AbjBBDEl (ORCPT ); Wed, 1 Feb 2023 22:04:41 -0500 Received: from mail-pj1-x104a.google.com (mail-pj1-x104a.google.com [IPv6:2607:f8b0:4864:20::104a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 32D165FFA for ; Wed, 1 Feb 2023 19:04:40 -0800 (PST) Received: by mail-pj1-x104a.google.com with SMTP id h4-20020a17090aa88400b0022c8dfc9db4so364054pjq.2 for ; Wed, 01 Feb 2023 19:04:40 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=QH2jJ0lYMHyWgETUubfbRs+kKTgqzQ/t/S1vJNh165w=; b=rtMV7RhRmvPg8Ucd6Z/onimnlE9lf06PbCwPfNV/GB8xMtJNDystCRo0DaxhIgmIUb nLjjcHpdKF63ac6JkmxtKZjVRVN9v4Gx+sNYnl7HmBq+h/5XsYcP5ihqGsSnKW/xUAwb 73lPnZ7/LqTvLJXCDPF2MfnbWcirqk6QT8cp2Ho/5bzpw+cOx4ZstUEuk3XKB2MU78zV 0xFJc0vMyEcgAky7/8WRFvqyTl8Mtwyzp7jvpHEiKlbUvjn8hpMoq92XNqITMU6f43cu qyRuKxqroLeFReijOSuysEkFbUFi3MaklNT6r9E0fNBx+vMsWMmrzW6fHFftdqBVz6hT Kanw== X-Google-DKIM-Signature: 
v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=QH2jJ0lYMHyWgETUubfbRs+kKTgqzQ/t/S1vJNh165w=; b=1WQa3GKLrMr0Q8U1DiWSlZaVy8Kju0vvl4I2xku1wZPILbhpyxjj8f1xrfNuzQMTl5 Ok5KpaClcEI3v6v9GEEDaU/PJe3bmMYaFb32zHtlAyAYb5K1sOLb7Th+m/bJk1PvQgja XRzIwsfbGAcCWPE+n0y407aPaKzzTETC2dBfYRgLcMlDqGfm1EGYT4myl3bSXtdxXgRo 1wjCYTubqbj2mdd/Z4IiA/XTc47YkI3n2tTKF3vGL5fwwp4/H8fBvkWkJTyuB7TOuXl2 47aNBiMcytkBunztSXTzdZF4hgk7JQ0qzlmV2O69IbXvvQqI8LaC8YLFm2r+u3sW+weg eBfQ== X-Gm-Message-State: AO0yUKWxZBLjI5gedWy3NKEetJ886NfoAOHaOXtacOvorPmdQYGjQnhZ Ydd9/QB66eyrO8vsdhB6IGO/Eet8Pa4= X-Received: from avagin.kir.corp.google.com ([2620:0:1008:11:eee0:dc42:a911:8b59]) (user=avagin job=sendgmr) by 2002:a05:6a00:17a7:b0:590:762f:58bc with SMTP id s39-20020a056a0017a700b00590762f58bcmr1077020pfg.50.1675307079604; Wed, 01 Feb 2023 19:04:39 -0800 (PST) Date: Wed, 1 Feb 2023 19:04:25 -0800 In-Reply-To: <20230202030429.3304875-1-avagin@google.com> Mime-Version: 1.0 References: <20230202030429.3304875-1-avagin@google.com> X-Mailer: git-send-email 2.39.1.456.gfc5497dd1b-goog Message-ID: <20230202030429.3304875-3-avagin@google.com> Subject: [PATCH 2/6] sched: add WF_CURRENT_CPU and externise ttwu From: Andrei Vagin To: Kees Cook , Peter Zijlstra Cc: linux-kernel@vger.kernel.org, Christian Brauner , Chen Yu , avagin@gmail.com, Andrei Vagin , Andy Lutomirski , Dietmar Eggemann , Ingo Molnar , Juri Lelli , Peter Oskolkov , Tycho Andersen , Will Drewry , Vincent Guittot X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org 
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1756686854163674893?=
X-GMAIL-MSGID: =?utf-8?q?1756686854163674893?=

From: Peter Oskolkov

Add WF_CURRENT_CPU wake flag that advises the scheduler to
move the wakee to the current CPU. This is useful for fast on-CPU
context switching use cases.

In addition, make ttwu external rather than static so that
the flag can be passed to it from outside of sched/core.c.

Signed-off-by: Peter Oskolkov
Signed-off-by: Andrei Vagin
---
 kernel/sched/core.c  |  3 +--
 kernel/sched/fair.c  |  4 ++++
 kernel/sched/sched.h | 13 ++++++++-----
 3 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e838feb6adc5..25e902b40a18 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4112,8 +4112,7 @@ bool ttwu_state_match(struct task_struct *p, unsigned int state, int *success)
  * Return: %true if @p->state changes (an actual wakeup was done),
  *	   %false otherwise.
  */
-static int
-try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
+int try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 {
 	unsigned long flags;
 	int cpu, success = 0;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0f8736991427..698828bd8d72 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7377,6 +7377,10 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
 	if (wake_flags & WF_TTWU) {
 		record_wakee(p);
 
+		if ((wake_flags & WF_CURRENT_CPU) &&
+		    cpumask_test_cpu(cpu, p->cpus_ptr))
+			return cpu;
+
 		if (sched_energy_enabled()) {
 			new_cpu = find_energy_efficient_cpu(p, prev_cpu);
 			if (new_cpu >= 0)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 771f8ddb7053..34b4c54b2a2a 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2088,12 +2088,13 @@ static inline int task_on_rq_migrating(struct task_struct *p)
 }
 
 /* Wake flags. The first three directly map to some SD flag value */
-#define WF_EXEC     0x02 /* Wakeup after exec; maps to SD_BALANCE_EXEC */
-#define WF_FORK     0x04 /* Wakeup after fork; maps to SD_BALANCE_FORK */
-#define WF_TTWU     0x08 /* Wakeup;            maps to SD_BALANCE_WAKE */
+#define WF_EXEC         0x02 /* Wakeup after exec; maps to SD_BALANCE_EXEC */
+#define WF_FORK         0x04 /* Wakeup after fork; maps to SD_BALANCE_FORK */
+#define WF_TTWU         0x08 /* Wakeup;            maps to SD_BALANCE_WAKE */
 
-#define WF_SYNC     0x10 /* Waker goes to sleep after wakeup */
-#define WF_MIGRATED 0x20 /* Internal use, task got migrated */
+#define WF_SYNC         0x10 /* Waker goes to sleep after wakeup */
+#define WF_MIGRATED     0x20 /* Internal use, task got migrated */
+#define WF_CURRENT_CPU  0x40 /* Prefer to move the wakee to the current CPU. */
 
 #ifdef CONFIG_SMP
 static_assert(WF_EXEC == SD_BALANCE_EXEC);
@@ -3245,6 +3246,8 @@ static inline bool is_per_cpu_kthread(struct task_struct *p)
 extern void swake_up_all_locked(struct swait_queue_head *q);
 extern void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait);
 
+extern int try_to_wake_up(struct task_struct *tsk, unsigned int state, int wake_flags);
+
 #ifdef CONFIG_PREEMPT_DYNAMIC
 extern int preempt_dynamic_mode;
 extern int sched_dynamic_mode(const char *str);

From patchwork Thu Feb 2 03:04:26 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrei Vagin
X-Patchwork-Id: 51641
Return-Path:
Delivered-To: ouuuleilei@gmail.com
Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp10301wrn; Wed, 1 Feb 2023 19:20:50 -0800 (PST)
X-Google-Smtp-Source: AK7set+N6Fy+bRAzdA8xnSD6tQSCL4B/IZN0j/yWpzP3VRnOpbzSLDeYSaouhVV3cqU0PWGiqkF4
X-Received: by 2002:aa7:800b:0:b0:590:13f4:e08c with SMTP id j11-20020aa7800b000000b0059013f4e08cmr3626729pfi.0.1675308049757; Wed, 01 Feb 2023 19:20:49 -0800 (PST)
ARC-Seal: i=1; a=rsa-sha256; t=1675308049; cv=none; d=google.com; s=arc-20160816;
b=mUWPZLjCHWV5LJwtMgCDFW8CGuEfrF9F0WhUaE6WPmVpqoBxmvISnz4BE7HLLpX0FM dmYQxf23WIptr1nd6LDvVVUOiBdRzlR3VmBsGNLWmBv1xG5vZ+WtdmogODChdwLWWZdw YXdOWhhOEIU/b2jt9Kf52cOp+V7dGlMf7BHrBO3LE18o2zMJQdd5msDLNXnehmJAQT0d POtdaOvc0hRe/pITV00m5oxe+VeuF7AhVrT+LB4BALc66UaAd5zbD5JhD/onWEDaAiKa eV0pD6koxYkSjxtTKZijRGGdcz7FHL6g49pdanpxq2zrJrSxOwVZt1eYD46L9EkmSBLE hQig== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:dkim-signature; bh=HTX8wzNPsZu74zeJph9OkWigioGFnAWTTV+3hnubuUc=; b=fI4YoA0CfTbZX+JgIfWH1spkT6CCj2iGTntijw+Xftlp51NAiEfyUsl660H4n3EO75 ldJRDS7jVCwVKmWm2cX5Zc/eGOC46euZJ65niTu7Ej3tCGCRpT779wdtdYxsV2F7ruNF nTq23Xly6aKCLm37W7RsoImm22QzMynMPdft+ctYQQV/ZZREPx8L07z0eDeOwt791HL9 9jKIm+YRdTO+pIBW1Rnn5OogGM9ToD/IG6E7KycSlgrdJb4Qv7LlsjtY7gZDhQYFbIG/ QbOX8RqpKRRp/41sH7D+k9Cb9OmkP6wLUQnrVwA0fhB6NiHesmZeXymbbTJU7NRX2EAV PHjw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=GOtFddA3; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id q26-20020a056a0002ba00b0058aa91de9e1si19523477pfs.25.2023.02.01.19.20.07; Wed, 01 Feb 2023 19:20:49 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=GOtFddA3; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231589AbjBBDFA (ORCPT + 99 others); Wed, 1 Feb 2023 22:05:00 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44490 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231479AbjBBDEo (ORCPT ); Wed, 1 Feb 2023 22:04:44 -0500 Received: from mail-yb1-xb49.google.com (mail-yb1-xb49.google.com [IPv6:2607:f8b0:4864:20::b49]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 041866A6C for ; Wed, 1 Feb 2023 19:04:43 -0800 (PST) Received: by mail-yb1-xb49.google.com with SMTP id a4-20020a5b0004000000b006fdc6aaec4fso472026ybp.20 for ; Wed, 01 Feb 2023 19:04:42 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=HTX8wzNPsZu74zeJph9OkWigioGFnAWTTV+3hnubuUc=; b=GOtFddA3jxoavTZhKPhRLErj6Xkb3Nv2UZkRtMDrpFRvyVQDBdEOPwG/v0XIPWnRvC j7Gaqx3nyr86hFgCUJ0n7zchDddpUdd9OA7KB3kMDzolPDe3dM2FAdiVw6n4GEV0/vzj bA9ItArQ8334Kf38p3DR7LxMAl5cEswp9ELVGyAi4hgAbkyDzMAd5DXoJKTXRF53QlwP LjGWWGyVrUsnkSZbnuoAtDRYN+NA/e0/CLDl/WiNh3+a44prHQ1Xa52lqDKUwbkstZT3 yobiXmx5I340M6Ou/G+0HS5K9Da65i/zixBHhCMAWlKJoxMNBiRZ/3UMNejSTwwtzaqL jkOw== X-Google-DKIM-Signature: v=1; 
a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=HTX8wzNPsZu74zeJph9OkWigioGFnAWTTV+3hnubuUc=; b=qDZau0Wzm30p+X1iZxsLWJMzLHCKeRcJRNp1oRCzxMVd1mzsJHe3CbKQxsI+7oCG6u xn2HyMqWhgqQCpfAyFo8cJwssCmrinixky0H7I+iFupp2xsAZzImRsbeKg+x6d0ixQSD ayMsLEB0JF+ToNeT8DnuOqqzVwzQXaC+DtNodjeeFH2Rk0U1I7c4UdEKnkjUdtBv9GIA PU9uAifuOE1PWoWM3PYT2ix+gHiFWDfAGjSK7k1cfJILSmXD6/zXEpsDZ5Ay7chB5Tc3 XgSCmi6DzBAKQUERfecyYEUWlDDe8+Z2TfB99Bi2YcFbp1VCOOcE+j1uNDWBFSOeL1AZ MJwQ== X-Gm-Message-State: AO0yUKV2IZrJ8S6XN6/1AJRgMUkqNLtuophyicjIsW5jIHBd8jSJ/luZ i6axH4dWTzSSqYPcgkAZz/3z0/yxtsg= X-Received: from avagin.kir.corp.google.com ([2620:0:1008:11:eee0:dc42:a911:8b59]) (user=avagin job=sendgmr) by 2002:a81:7d0:0:b0:521:da86:f53 with SMTP id 199-20020a8107d0000000b00521da860f53mr0ywh.6.1675307081548; Wed, 01 Feb 2023 19:04:41 -0800 (PST) Date: Wed, 1 Feb 2023 19:04:26 -0800 In-Reply-To: <20230202030429.3304875-1-avagin@google.com> Mime-Version: 1.0 References: <20230202030429.3304875-1-avagin@google.com> X-Mailer: git-send-email 2.39.1.456.gfc5497dd1b-goog Message-ID: <20230202030429.3304875-4-avagin@google.com> Subject: [PATCH 3/6] sched: add a few helpers to wake up tasks on the current cpu From: Andrei Vagin To: Kees Cook , Peter Zijlstra Cc: linux-kernel@vger.kernel.org, Christian Brauner , Chen Yu , avagin@gmail.com, Andrei Vagin , Andy Lutomirski , Dietmar Eggemann , Ingo Molnar , Juri Lelli , Peter Oskolkov , Tycho Andersen , Will Drewry , Vincent Guittot X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org 
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1756687813519268875?=
X-GMAIL-MSGID: =?utf-8?q?1756687813519268875?=

Add complete_on_current_cpu and wake_up_poll_on_current_cpu helpers to
wake up tasks on the current CPU.

These two helpers are useful when the task needs to make a synchronous
context switch to another task. In this context, synchronous means it
wakes up the target task and falls asleep right after that.

One example of such workloads is seccomp user notifies. This mechanism
allows the supervisor process to handle system calls on behalf of a
target process. While the supervisor is handling an intercepted system
call, the target process will be blocked in the kernel, waiting for a
response to come back.

On-CPU context switches are much faster than regular ones.

Signed-off-by: Andrei Vagin
---
 include/linux/completion.h |  1 +
 include/linux/swait.h      |  2 +-
 include/linux/wait.h       |  3 +++
 kernel/sched/completion.c  | 26 ++++++++++++++++++--------
 kernel/sched/core.c        |  2 +-
 kernel/sched/swait.c       |  8 ++++----
 kernel/sched/wait.c        |  5 +++++
 7 files changed, 33 insertions(+), 14 deletions(-)

diff --git a/include/linux/completion.h b/include/linux/completion.h
index 62b32b19e0a8..fb2915676574 100644
--- a/include/linux/completion.h
+++ b/include/linux/completion.h
@@ -116,6 +116,7 @@ extern bool try_wait_for_completion(struct completion *x);
 extern bool completion_done(struct completion *x);
 
 extern void complete(struct completion *);
+extern void complete_on_current_cpu(struct completion *x);
 extern void complete_all(struct completion *);
 
 #endif
diff --git a/include/linux/swait.h b/include/linux/swait.h
index 6a8c22b8c2a5..d324419482a0 100644
--- a/include/linux/swait.h
+++ b/include/linux/swait.h
@@ -146,7 +146,7 @@ static inline bool swq_has_sleeper(struct swait_queue_head *wq)
 
 extern void swake_up_one(struct swait_queue_head *q);
 extern void swake_up_all(struct swait_queue_head *q);
-extern void swake_up_locked(struct swait_queue_head *q);
+extern void swake_up_locked(struct swait_queue_head *q, int wake_flags);
 
 extern void prepare_to_swait_exclusive(struct swait_queue_head *q, struct swait_queue *wait, int state);
 extern long prepare_to_swait_event(struct swait_queue_head *q, struct swait_queue *wait, int state);
diff --git a/include/linux/wait.h b/include/linux/wait.h
index a0307b516b09..5ec7739400f4 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -210,6 +210,7 @@ __remove_wait_queue(struct wait_queue_head *wq_head, struct wait_queue_entry *wq
 }
 
 int __wake_up(struct wait_queue_head *wq_head, unsigned int mode, int nr, void *key);
+void __wake_up_on_current_cpu(struct wait_queue_head *wq_head, unsigned int mode, void *key);
 void __wake_up_locked_key(struct wait_queue_head *wq_head, unsigned int mode, void *key);
 void __wake_up_locked_key_bookmark(struct wait_queue_head *wq_head,
 		unsigned int mode, void *key, wait_queue_entry_t *bookmark);
@@ -237,6 +238,8 @@ void __wake_up_pollfree(struct wait_queue_head *wq_head);
 #define key_to_poll(m) ((__force __poll_t)(uintptr_t)(void *)(m))
 #define wake_up_poll(x, m)							\
 	__wake_up(x, TASK_NORMAL, 1, poll_to_key(m))
+#define wake_up_poll_on_current_cpu(x, m)					\
+	__wake_up_on_current_cpu(x, TASK_NORMAL, poll_to_key(m))
 #define wake_up_locked_poll(x, m)						\
 	__wake_up_locked_key((x), TASK_NORMAL, poll_to_key(m))
 #define wake_up_interruptible_poll(x, m)					\
diff --git a/kernel/sched/completion.c b/kernel/sched/completion.c
index d57a5c1c1cd9..3561ab533dd4 100644
--- a/kernel/sched/completion.c
+++ b/kernel/sched/completion.c
@@ -13,6 +13,23 @@
  * Waiting for completion is a typically sync point, but not an exclusion point.
  */
 
+static void complete_with_flags(struct completion *x, int wake_flags)
+{
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&x->wait.lock, flags);
+
+	if (x->done != UINT_MAX)
+		x->done++;
+	swake_up_locked(&x->wait, wake_flags);
+	raw_spin_unlock_irqrestore(&x->wait.lock, flags);
+}
+
+void complete_on_current_cpu(struct completion *x)
+{
+	return complete_with_flags(x, WF_CURRENT_CPU);
+}
+
 /**
  * complete: - signals a single thread waiting on this completion
  * @x:  holds the state of this particular completion
@@ -27,14 +44,7 @@
  */
 void complete(struct completion *x)
 {
-	unsigned long flags;
-
-	raw_spin_lock_irqsave(&x->wait.lock, flags);
-
-	if (x->done != UINT_MAX)
-		x->done++;
-	swake_up_locked(&x->wait);
-	raw_spin_unlock_irqrestore(&x->wait.lock, flags);
+	complete_with_flags(x, 0);
 }
 EXPORT_SYMBOL(complete);
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 25e902b40a18..5233e5182755 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6925,7 +6925,7 @@ asmlinkage __visible void __sched preempt_schedule_irq(void)
 int default_wake_function(wait_queue_entry_t *curr, unsigned mode, int wake_flags,
 			  void *key)
 {
-	WARN_ON_ONCE(IS_ENABLED(CONFIG_SCHED_DEBUG) && wake_flags & ~WF_SYNC);
+	WARN_ON_ONCE(IS_ENABLED(CONFIG_SCHED_DEBUG) && wake_flags & ~(WF_SYNC|WF_CURRENT_CPU));
 	return try_to_wake_up(curr->private, mode, wake_flags);
 }
 EXPORT_SYMBOL(default_wake_function);
diff --git a/kernel/sched/swait.c b/kernel/sched/swait.c
index 76b9b796e695..72505cd3b60a 100644
--- a/kernel/sched/swait.c
+++ b/kernel/sched/swait.c
@@ -18,7 +18,7 @@ EXPORT_SYMBOL(__init_swait_queue_head);
  * If for some reason it would return 0, that means the previously waiting
  * task is already running, so it will observe condition true (or has already).
  */
-void swake_up_locked(struct swait_queue_head *q)
+void swake_up_locked(struct swait_queue_head *q, int wake_flags)
 {
 	struct swait_queue *curr;
 
@@ -26,7 +26,7 @@ void swake_up_locked(struct swait_queue_head *q)
 		return;
 
 	curr = list_first_entry(&q->task_list, typeof(*curr), task_list);
-	wake_up_process(curr->task);
+	try_to_wake_up(curr->task, TASK_NORMAL, wake_flags);
 	list_del_init(&curr->task_list);
 }
 EXPORT_SYMBOL(swake_up_locked);
@@ -41,7 +41,7 @@ EXPORT_SYMBOL(swake_up_locked);
 void swake_up_all_locked(struct swait_queue_head *q)
 {
 	while (!list_empty(&q->task_list))
-		swake_up_locked(q);
+		swake_up_locked(q, 0);
 }
 
 void swake_up_one(struct swait_queue_head *q)
@@ -49,7 +49,7 @@ void swake_up_one(struct swait_queue_head *q)
 	unsigned long flags;
 
 	raw_spin_lock_irqsave(&q->lock, flags);
-	swake_up_locked(q);
+	swake_up_locked(q, 0);
 	raw_spin_unlock_irqrestore(&q->lock, flags);
 }
 EXPORT_SYMBOL(swake_up_one);
diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
index 133b74730738..47803a0b8d5d 100644
--- a/kernel/sched/wait.c
+++ b/kernel/sched/wait.c
@@ -161,6 +161,11 @@ int __wake_up(struct wait_queue_head *wq_head, unsigned int mode,
 }
 EXPORT_SYMBOL(__wake_up);
 
+void __wake_up_on_current_cpu(struct wait_queue_head *wq_head, unsigned int mode, void *key)
+{
+	__wake_up_common_lock(wq_head, mode, 1, WF_CURRENT_CPU, key);
+}
+
 /*
  * Same as __wake_up but called with the spinlock in wait_queue_head_t held.
 */

From patchwork Thu Feb 2 03:04:27 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Andrei Vagin
X-Patchwork-Id: 51643
Date: Wed, 1 Feb 2023 19:04:27 -0800
In-Reply-To: <20230202030429.3304875-1-avagin@google.com>
References: <20230202030429.3304875-1-avagin@google.com>
X-Mailer: git-send-email 2.39.1.456.gfc5497dd1b-goog
Message-ID: <20230202030429.3304875-5-avagin@google.com>
Subject: [PATCH 4/6] seccomp: add the synchronous mode for seccomp_unotify
From: Andrei Vagin
To: Kees Cook, Peter Zijlstra
Cc: linux-kernel@vger.kernel.org, Christian Brauner, Chen Yu, avagin@gmail.com, Andrei Vagin, Andy Lutomirski, Dietmar Eggemann, Ingo Molnar, Juri Lelli, Peter Oskolkov, Tycho Andersen, Will Drewry, Vincent Guittot

seccomp_unotify allows more privileged processes to perform actions on
behalf of less privileged processes.

In many cases, the workflow is fully synchronous: a target process
triggers a system call and passes control to a supervisor process, which
handles the system call and returns control to the target process. In
this context, "synchronous" means that only one process is running while
the other one is waiting.

The WF_CURRENT_CPU flag advises the scheduler to move the wakee to the
current CPU. For such synchronous workflows, it makes context switches a
few times faster.

Right now, each interaction takes 12µs. With this patch, it takes about
3µs.

This change introduces the SECCOMP_USER_NOTIF_FD_SYNC_WAKE_UP flag, which
is used to enable the sync mode.
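
As an illustration of how a supervisor might opt in to the sync mode, here is a minimal userspace sketch. It assumes the listener fd was obtained with SECCOMP_FILTER_FLAG_NEW_LISTENER. The helper name notif_enable_sync_wakeup() is hypothetical, and the fallback definitions only mirror the values this patch adds to the uapi header (the SECCOMP_IOW fallback assumes the '!' ioctl magic used by linux/seccomp.h).

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/seccomp.h>

/* Fallbacks for headers that predate this series; values mirror the patch. */
#ifndef SECCOMP_IOW
#define SECCOMP_IOW(nr, type) _IOW('!', nr, type)
#endif
#ifndef SECCOMP_USER_NOTIF_FD_SYNC_WAKE_UP
#define SECCOMP_USER_NOTIF_FD_SYNC_WAKE_UP (1UL << 0)
#endif
#ifndef SECCOMP_IOCTL_NOTIF_SET_FLAGS
#define SECCOMP_IOCTL_NOTIF_SET_FLAGS SECCOMP_IOW(4, __u64)
#endif

/*
 * Hypothetical helper: ask the kernel to wake the peer on the current CPU
 * for notifications on this listener. The flag value is passed directly as
 * the ioctl argument, as the selftest in this series does.
 *
 * Returns 0 on success or a negative errno value, for example -EINVAL for
 * an unknown flag or -EBADF for an invalid descriptor.
 */
static int notif_enable_sync_wakeup(int listener_fd)
{
	if (ioctl(listener_fd, SECCOMP_IOCTL_NOTIF_SET_FLAGS,
		  SECCOMP_USER_NOTIF_FD_SYNC_WAKE_UP) < 0)
		return -errno;
	return 0;
}
```

A supervisor would call this once, right after creating the listener and before entering its SECCOMP_IOCTL_NOTIF_RECV / SECCOMP_IOCTL_NOTIF_SEND loop. On kernels without this series the call is expected to fail, so treating a failure as "fall back to the default wake-up path" seems a reasonable choice.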

Signed-off-by: Andrei Vagin
---
 include/uapi/linux/seccomp.h |  4 ++++
 kernel/seccomp.c             | 31 +++++++++++++++++++++++++++++--
 2 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
index 0fdc6ef02b94..dbfc9b37fcae 100644
--- a/include/uapi/linux/seccomp.h
+++ b/include/uapi/linux/seccomp.h
@@ -115,6 +115,8 @@ struct seccomp_notif_resp {
 	__u32 flags;
 };
 
+#define SECCOMP_USER_NOTIF_FD_SYNC_WAKE_UP (1UL << 0)
+
 /* valid flags for seccomp_notif_addfd */
 #define SECCOMP_ADDFD_FLAG_SETFD (1UL << 0) /* Specify remote fd */
 #define SECCOMP_ADDFD_FLAG_SEND (1UL << 1) /* Addfd and return it, atomically */
@@ -150,4 +152,6 @@ struct seccomp_notif_addfd {
 #define SECCOMP_IOCTL_NOTIF_ADDFD SECCOMP_IOW(3, \
 						struct seccomp_notif_addfd)
 
+#define SECCOMP_IOCTL_NOTIF_SET_FLAGS SECCOMP_IOW(4, __u64)
+
 #endif /* _UAPI_LINUX_SECCOMP_H */
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 876022e9c88c..0a62d44f4898 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -143,9 +143,12 @@ struct seccomp_kaddfd {
  * filter->notify_lock.
  * @next_id: The id of the next request.
  * @notifications: A list of struct seccomp_knotif elements.
+ * @flags: A set of SECCOMP_USER_NOTIF_FD_* flags.
  */
+
 struct notification {
 	atomic_t requests;
+	u32 flags;
 	u64 next_id;
 	struct list_head notifications;
 };
@@ -1117,7 +1120,10 @@ static int seccomp_do_user_notification(int this_syscall,
 	INIT_LIST_HEAD(&n.addfd);
 
 	atomic_add(1, &match->notif->requests);
-	wake_up_poll(&match->wqh, EPOLLIN | EPOLLRDNORM);
+	if (match->notif->flags & SECCOMP_USER_NOTIF_FD_SYNC_WAKE_UP)
+		wake_up_poll_on_current_cpu(&match->wqh, EPOLLIN | EPOLLRDNORM);
+	else
+		wake_up_poll(&match->wqh, EPOLLIN | EPOLLRDNORM);
 
 	/*
 	 * This is where we wait for a reply from userspace.
@@ -1593,7 +1599,10 @@ static long seccomp_notify_send(struct seccomp_filter *filter,
 	knotif->error = resp.error;
 	knotif->val = resp.val;
 	knotif->flags = resp.flags;
-	complete(&knotif->ready);
+	if (filter->notif->flags & SECCOMP_USER_NOTIF_FD_SYNC_WAKE_UP)
+		complete_on_current_cpu(&knotif->ready);
+	else
+		complete(&knotif->ready);
 out:
 	mutex_unlock(&filter->notify_lock);
 	return ret;
@@ -1623,6 +1632,22 @@ static long seccomp_notify_id_valid(struct seccomp_filter *filter,
 	return ret;
 }
 
+static long seccomp_notify_set_flags(struct seccomp_filter *filter,
+				    unsigned long flags)
+{
+	long ret;
+
+	if (flags & ~SECCOMP_USER_NOTIF_FD_SYNC_WAKE_UP)
+		return -EINVAL;
+
+	ret = mutex_lock_interruptible(&filter->notify_lock);
+	if (ret < 0)
+		return ret;
+	filter->notif->flags = flags;
+	mutex_unlock(&filter->notify_lock);
+	return 0;
+}
+
 static long seccomp_notify_addfd(struct seccomp_filter *filter,
 				 struct seccomp_notif_addfd __user *uaddfd,
 				 unsigned int size)
@@ -1752,6 +1777,8 @@ static long seccomp_notify_ioctl(struct file *file, unsigned int cmd,
 	case SECCOMP_IOCTL_NOTIF_ID_VALID_WRONG_DIR:
 	case SECCOMP_IOCTL_NOTIF_ID_VALID:
 		return seccomp_notify_id_valid(filter, buf);
+	case SECCOMP_IOCTL_NOTIF_SET_FLAGS:
+		return seccomp_notify_set_flags(filter, arg);
 	}
 
 	/* Extensible Argument ioctls */

From patchwork Thu Feb 2 03:04:28 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrei Vagin
X-Patchwork-Id: 51646
Date: Wed, 1 Feb 2023 19:04:28 -0800
In-Reply-To: <20230202030429.3304875-1-avagin@google.com>
References: <20230202030429.3304875-1-avagin@google.com>
X-Mailer: git-send-email 2.39.1.456.gfc5497dd1b-goog
Message-ID: <20230202030429.3304875-6-avagin@google.com>
Subject: [PATCH 5/6] selftest/seccomp: add a new test for the sync mode of seccomp_user_notify
From: Andrei Vagin
To: Kees Cook, Peter Zijlstra
Cc: linux-kernel@vger.kernel.org, Christian Brauner, Chen Yu, avagin@gmail.com, Andrei Vagin, Andy Lutomirski, Dietmar Eggemann, Ingo Molnar, Juri Lelli, Peter Oskolkov, Tycho Andersen, Will Drewry, Vincent Guittot

Test output:
 # RUN global.user_notification_sync ...
 # OK global.user_notification_sync
 ok 51 global.user_notification_sync

Signed-off-by: Andrei Vagin
---
 tools/testing/selftests/seccomp/seccomp_bpf.c | 55 +++++++++++++++++++
 1 file changed, 55 insertions(+)

diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index 9c2f448bb3a9..05b8de6d1fcb 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -4243,6 +4243,61 @@ TEST(user_notification_addfd_rlimit)
 	close(memfd);
 }
 
+#ifndef SECCOMP_USER_NOTIF_FD_SYNC_WAKE_UP
+#define SECCOMP_USER_NOTIF_FD_SYNC_WAKE_UP (1UL << 0)
+#define SECCOMP_IOCTL_NOTIF_SET_FLAGS SECCOMP_IOW(4, __u64)
+#endif
+
+TEST(user_notification_sync)
+{
+	struct seccomp_notif req = {};
+	struct seccomp_notif_resp resp = {};
+	int status, listener;
+	pid_t pid;
+	long ret;
+
+	ret = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
+	ASSERT_EQ(0, ret) {
+		TH_LOG("Kernel does not support PR_SET_NO_NEW_PRIVS!");
+	}
+
+	listener = user_notif_syscall(__NR_getppid,
+				      SECCOMP_FILTER_FLAG_NEW_LISTENER);
+	ASSERT_GE(listener, 0);
+
+	/* Try to set invalid flags. */
+	EXPECT_SYSCALL_RETURN(-EINVAL,
+		ioctl(listener, SECCOMP_IOCTL_NOTIF_SET_FLAGS, 0xffffffff, 0));
+
+	ASSERT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_SET_FLAGS,
+			SECCOMP_USER_NOTIF_FD_SYNC_WAKE_UP, 0), 0);
+
+	pid = fork();
+	ASSERT_GE(pid, 0);
+	if (pid == 0) {
+		ret = syscall(__NR_getppid);
+		ASSERT_EQ(ret, USER_NOTIF_MAGIC) {
+			_exit(1);
+		}
+		_exit(0);
+	}
+
+	req.pid = 0;
+	ASSERT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_RECV, &req), 0);
+
+	ASSERT_EQ(req.data.nr, __NR_getppid);
+
+	resp.id = req.id;
+	resp.error = 0;
+	resp.val = USER_NOTIF_MAGIC;
+	resp.flags = 0;
+	ASSERT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_SEND, &resp), 0);
+
+	ASSERT_EQ(waitpid(pid, &status, 0), pid);
+	ASSERT_EQ(status, 0);
+}
+
+
 /* Make sure PTRACE_O_SUSPEND_SECCOMP requires CAP_SYS_ADMIN. */
 FIXTURE(O_SUSPEND_SECCOMP) {
 	pid_t pid;

From patchwork Thu Feb 2 03:04:29 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrei Vagin
X-Patchwork-Id: 51645
Date: Wed, 1 Feb 2023 19:04:29 -0800
In-Reply-To: <20230202030429.3304875-1-avagin@google.com>
References: <20230202030429.3304875-1-avagin@google.com>
X-Mailer: git-send-email 2.39.1.456.gfc5497dd1b-goog
Message-ID: <20230202030429.3304875-7-avagin@google.com>
Subject: [PATCH 6/6] perf/benchmark: add a new benchmark for seccomp_unotify
From: Andrei Vagin
To: Kees Cook, Peter Zijlstra
Cc: linux-kernel@vger.kernel.org, Christian Brauner, Chen Yu, avagin@gmail.com, Andrei Vagin, Andy Lutomirski, Dietmar Eggemann, Ingo Molnar, Juri Lelli, Peter Oskolkov, Tycho Andersen, Will Drewry, Vincent Guittot

The benchmark is similar to the pipe benchmark. It creates two
processes: one calls syscalls, and the other handles them via seccomp
user notifications. It measures the time required to run a specified
number of iterations.

 $ ./perf bench sched seccomp-notify --sync-mode --loop 1000000
 # Running 'sched/seccomp-notify' benchmark:
 # Executed 1000000 system calls

     Total time: 2.769 [sec]

       2.769629 usecs/op
         361059 ops/sec

 $ ./perf bench sched seccomp-notify
 # Running 'sched/seccomp-notify' benchmark:
 # Executed 1000000 system calls

     Total time: 8.571 [sec]

       8.571119 usecs/op
         116670 ops/sec

Signed-off-by: Andrei Vagin
---
 tools/arch/x86/include/uapi/asm/unistd_32.h |   3 +
 tools/arch/x86/include/uapi/asm/unistd_64.h |   3 +
 tools/perf/bench/Build                      |   1 +
 tools/perf/bench/bench.h                    |   1 +
 tools/perf/bench/sched-seccomp-notify.c     | 168 ++++++++++++++++++++
 tools/perf/builtin-bench.c                  |   1 +
 6 files changed, 177 insertions(+)
 create mode 100644 tools/perf/bench/sched-seccomp-notify.c

diff --git a/tools/arch/x86/include/uapi/asm/unistd_32.h b/tools/arch/x86/include/uapi/asm/unistd_32.h
index 60a89dba01b6..c0c74befc8df 100644
--- a/tools/arch/x86/include/uapi/asm/unistd_32.h
+++ b/tools/arch/x86/include/uapi/asm/unistd_32.h
@@ -14,3 +14,6 @@
 #ifndef __NR_setns
 # define __NR_setns 346
 #endif
+#ifndef __NR_seccomp
+#define __NR_seccomp 354
+#endif
diff --git a/tools/arch/x86/include/uapi/asm/unistd_64.h b/tools/arch/x86/include/uapi/asm/unistd_64.h
index cb52a3a8b8fc..b695246da684 100644
--- a/tools/arch/x86/include/uapi/asm/unistd_64.h
+++ b/tools/arch/x86/include/uapi/asm/unistd_64.h
@@ -14,3 +14,6 @@
 #ifndef __NR_setns
 #define __NR_setns 308
 #endif
+#ifndef __NR_seccomp
+#define __NR_seccomp 317
+#endif
diff --git a/tools/perf/bench/Build b/tools/perf/bench/Build
index 6b6155a8ad09..e3ec2c1b0682 100644
--- a/tools/perf/bench/Build
+++ b/tools/perf/bench/Build
@@ -1,5 +1,6 @@
 perf-y += sched-messaging.o
 perf-y += sched-pipe.o
+perf-y += sched-seccomp-notify.o
 perf-y += syscall.o
 perf-y += mem-functions.o
 perf-y += futex-hash.o
diff --git a/tools/perf/bench/bench.h b/tools/perf/bench/bench.h
index a5d49b3b6a09..40657b0959a9 100644
--- a/tools/perf/bench/bench.h
+++ b/tools/perf/bench/bench.h
@@ -21,6 +21,7 @@ extern struct timeval bench__start, bench__end, bench__runtime;
 int bench_numa(int argc, const char **argv);
 int bench_sched_messaging(int argc, const char **argv);
 int bench_sched_pipe(int argc, const char **argv);
+int bench_sched_seccomp_notify(int argc, const char **argv);
 int bench_syscall_basic(int argc, const char **argv);
 int bench_mem_memcpy(int argc, const char **argv);
 int bench_mem_memset(int argc, const char **argv);
diff --git a/tools/perf/bench/sched-seccomp-notify.c b/tools/perf/bench/sched-seccomp-notify.c
new file mode 100644
index 000000000000..443f4b43702d
--- /dev/null
+++ b/tools/perf/bench/sched-seccomp-notify.c
@@ -0,0 +1,168 @@
+// SPDX-License-Identifier: GPL-2.0
+#include
+#include "bench.h"
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#define LOOPS_DEFAULT 1000000UL
+static uint64_t loops = LOOPS_DEFAULT;
+static bool sync_mode;
+
+static const struct option options[] = {
+	OPT_U64('l', "loop", &loops, "Specify number of loops"),
+	OPT_BOOLEAN('s', "sync-mode", &sync_mode,
+		    "Enable the synchronous mode for seccomp notifications"),
+	OPT_END()
+};
+
+static const char * const bench_seccomp_usage[] = {
+	"perf bench sched seccomp-notify ",
+	NULL
+};
+
+static int seccomp(unsigned int op, unsigned int flags, void *args)
+{
+	return syscall(__NR_seccomp, op, flags, args);
+}
+
+static int user_notif_syscall(int nr, unsigned int flags)
+{
+	struct sock_filter filter[] = {
+		BPF_STMT(BPF_LD|BPF_W|BPF_ABS,
+			offsetof(struct seccomp_data, nr)),
+		BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, nr, 0, 1),
+		BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_USER_NOTIF),
+		BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW),
+	};
+
+	struct sock_fprog prog = {
+		.len = (unsigned short)ARRAY_SIZE(filter),
+		.filter = filter,
+	};
+
+	return seccomp(SECCOMP_SET_MODE_FILTER, flags, &prog);
+}
+
+#define USER_NOTIF_MAGIC INT_MAX
+static void user_notification_sync_loop(int listener)
+{
+	struct seccomp_notif_resp resp;
+	struct seccomp_notif req;
+	uint64_t nr;
+
+	for (nr = 0; nr < loops; nr++) {
+		memset(&req, 0, sizeof(req));
+		assert(ioctl(listener, SECCOMP_IOCTL_NOTIF_RECV, &req) == 0);
+
+		assert(req.data.nr == __NR_gettid);
+
+		resp.id = req.id;
+		resp.error = 0;
+		resp.val = USER_NOTIF_MAGIC;
+		resp.flags = 0;
+		assert(ioctl(listener, SECCOMP_IOCTL_NOTIF_SEND, &resp) == 0);
+	}
+}
+
+#ifndef SECCOMP_USER_NOTIF_FD_SYNC_WAKE_UP
+#define SECCOMP_USER_NOTIF_FD_SYNC_WAKE_UP (1UL << 0)
+#define SECCOMP_IOCTL_NOTIF_SET_FLAGS SECCOMP_IOW(4, __u64)
+#endif
+int bench_sched_seccomp_notify(int argc, const char **argv)
+{
+	struct timeval start, stop, diff;
+	unsigned long long result_usec = 0;
+	int status, listener;
+	pid_t pid;
+	long ret;
+
+	argc = parse_options(argc, argv, options, bench_seccomp_usage, 0);
+
+	gettimeofday(&start, NULL);
+
+	prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
+	listener = user_notif_syscall(__NR_gettid,
+				      SECCOMP_FILTER_FLAG_NEW_LISTENER);
+	assert(listener >= 0);
+
+	pid = fork();
+	assert(pid >= 0);
+	if (pid == 0) {
+		assert(prctl(PR_SET_PDEATHSIG, SIGKILL, 0, 0, 0) == 0);
+		while (1) {
+			ret = syscall(__NR_gettid);
+			if (ret == USER_NOTIF_MAGIC)
+				continue;
+			break;
+		}
+		_exit(1);
+	}
+
+	if (sync_mode) {
+		assert(ioctl(listener, SECCOMP_IOCTL_NOTIF_SET_FLAGS,
+			     SECCOMP_USER_NOTIF_FD_SYNC_WAKE_UP, 0) == 0);
+	}
+	user_notification_sync_loop(listener);
+
+	kill(pid, SIGKILL);
+	assert(waitpid(pid, &status, 0) == pid);
+	assert(WIFSIGNALED(status));
+	assert(WTERMSIG(status) == SIGKILL);
+
+	gettimeofday(&stop, NULL);
+	timersub(&stop, &start, &diff);
+
+	switch (bench_format) {
+	case BENCH_FORMAT_DEFAULT:
+		printf("# Executed %lu system calls\n\n",
+			loops);
+
+		result_usec = diff.tv_sec * USEC_PER_SEC;
+		result_usec += diff.tv_usec;
+
+		printf(" %14s: %lu.%03lu [sec]\n\n", "Total time",
+		       (unsigned long) diff.tv_sec,
+		       (unsigned long) (diff.tv_usec / USEC_PER_MSEC));
+
+		printf(" %14lf usecs/op\n",
+		       (double)result_usec / (double)loops);
+		printf(" %14d ops/sec\n",
+		       (int)((double)loops /
+			     ((double)result_usec / (double)USEC_PER_SEC)));
+		break;
+
+	case BENCH_FORMAT_SIMPLE:
+		printf("%lu.%03lu\n",
+		       (unsigned long) diff.tv_sec,
+		       (unsigned long) (diff.tv_usec / USEC_PER_MSEC));
+		break;
+
+	default:
+		/* reaching here would be a disaster */
+		fprintf(stderr, "Unknown format:%d\n", bench_format);
+		exit(1);
+		break;
+	}
+
+	return 0;
+}
diff --git a/tools/perf/builtin-bench.c b/tools/perf/builtin-bench.c
index 334ab897aae3..71044575c571 100644
--- a/tools/perf/builtin-bench.c
+++ b/tools/perf/builtin-bench.c
@@ -46,6 +46,7 @@ static struct bench numa_benchmarks[] = {
 static struct bench sched_benchmarks[] = {
 	{ "messaging", "Benchmark for scheduling and IPC", bench_sched_messaging },
 	{ "pipe", "Benchmark for pipe() between two processes", bench_sched_pipe },
+	{ "seccomp-notify", "Benchmark for seccomp user notify", bench_sched_seccomp_notify},
 	{ "all", "Run all scheduler benchmarks", NULL },
 	{ NULL, NULL, NULL }
 };
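
The usecs/op and ops/sec figures reported by the benchmark are derived from the elapsed wall-clock time and the loop count. The sketch below restates that arithmetic as standalone helpers so the sample runs in the commit message can be checked by hand. The function names are hypothetical and are not part of the patch; the elapsed values in the usage note are the reported usecs/op multiplied by the 1000000 loops of each run.

```c
#include <stdint.h>

#define USEC_PER_SEC 1000000ULL

/* Average cost of one notification round trip, in microseconds. */
static double usecs_per_op(uint64_t elapsed_usec, uint64_t loops)
{
	return (double)elapsed_usec / (double)loops;
}

/* Round trips per second, truncated like the benchmark's (int) cast. */
static int ops_per_sec(uint64_t elapsed_usec, uint64_t loops)
{
	return (int)((double)loops /
		     ((double)elapsed_usec / (double)USEC_PER_SEC));
}

/* Ratio of two elapsed times; a value above 1 means the second run was faster. */
static double speedup(uint64_t base_usec, uint64_t new_usec)
{
	return (double)base_usec / (double)new_usec;
}
```

With the sample runs above, ops_per_sec(2769629, 1000000) gives 361059 for the sync mode and ops_per_sec(8571119, 1000000) gives 116670 for the default mode, which matches the printed output. speedup(8571119, 2769629) comes out at roughly 3.1 for this particular benchmark on the author's machine.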