From patchwork Wed Mar 8 07:31:56 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrei Vagin
X-Patchwork-Id: 66060
Date: Tue, 7 Mar 2023 23:31:56 -0800
In-Reply-To: <20230308073201.3102738-1-avagin@google.com>
References: <20230308073201.3102738-1-avagin@google.com>
X-Mailer: git-send-email 2.40.0.rc0.216.gc4246ad0f0-goog
Message-ID: <20230308073201.3102738-2-avagin@google.com>
Subject: [PATCH 1/6] seccomp: don't use semaphore and wait_queue together
From: Andrei Vagin
To: Kees Cook, Peter Zijlstra
Cc: linux-kernel@vger.kernel.org, Christian Brauner, Chen Yu,
	avagin@gmail.com, Andrei Vagin, Andy Lutomirski, Dietmar Eggemann,
	Ingo Molnar, Juri Lelli, Peter Oskolkov, Tycho Andersen, Will Drewry,
	Vincent Guittot
X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED,
	DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE,
	SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham
	autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6
	(2021-04-09) on lindbergh.monkeyblade.net
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

The main reason is to use the new wake_up helpers that will be added in
the following patches. There are a few other reasons as well:

* If we use two different wake-up mechanisms, we always need to call both
  of them. This patch fixes seccomp_notify_recv, where we forgot to call
  wake_up_poll in the error path.

* If we use one primitive, we can control how many waiters are woken up
  for each request. Our goal is to wake up just one waiter that will
  handle a request. Right now, wake_up_poll can wake up one waiter and
  up(&match->notif->request) can wake up one more.

Signed-off-by: Andrei Vagin
---
 kernel/seccomp.c | 41 ++++++++++++++++++++++++++++++++++++-----
 1 file changed, 36 insertions(+), 5 deletions(-)

diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index cebf26445f9e..9fca9345111c 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -145,7 +145,7 @@ struct seccomp_kaddfd {
  * @notifications: A list of struct seccomp_knotif elements.
  */
 struct notification {
-	struct semaphore request;
+	atomic_t requests;
 	u64 next_id;
 	struct list_head notifications;
 };
@@ -1116,7 +1116,7 @@ static int seccomp_do_user_notification(int this_syscall,
 	list_add_tail(&n.list, &match->notif->notifications);
 	INIT_LIST_HEAD(&n.addfd);
 
-	up(&match->notif->request);
+	atomic_inc(&match->notif->requests);
 	wake_up_poll(&match->wqh, EPOLLIN | EPOLLRDNORM);
 
 	/*
@@ -1450,6 +1450,37 @@ find_notification(struct seccomp_filter *filter, u64 id)
 	return NULL;
 }
 
+static int recv_wake_function(wait_queue_entry_t *wait, unsigned int mode, int sync,
+			      void *key)
+{
+	/* Avoid a wakeup if event not interesting for us. */
+	if (key && !(key_to_poll(key) & (EPOLLIN | EPOLLERR)))
+		return 0;
+	return autoremove_wake_function(wait, mode, sync, key);
+}
+
+static int recv_wait_event(struct seccomp_filter *filter)
+{
+	DEFINE_WAIT_FUNC(wait, recv_wake_function);
+	int ret;
+
+	if (atomic_dec_if_positive(&filter->notif->requests) >= 0)
+		return 0;
+
+	for (;;) {
+		ret = prepare_to_wait_event(&filter->wqh, &wait, TASK_INTERRUPTIBLE);
+
+		if (atomic_dec_if_positive(&filter->notif->requests) >= 0)
+			break;
+
+		if (ret)
+			return ret;
+
+		schedule();
+	}
+	finish_wait(&filter->wqh, &wait);
+	return 0;
+}
 
 static long seccomp_notify_recv(struct seccomp_filter *filter,
 				void __user *buf)
@@ -1467,7 +1498,7 @@ static long seccomp_notify_recv(struct seccomp_filter *filter,
 
 	memset(&unotif, 0, sizeof(unotif));
 
-	ret = down_interruptible(&filter->notif->request);
+	ret = recv_wait_event(filter);
 	if (ret < 0)
 		return ret;
 
@@ -1515,7 +1546,8 @@ static long seccomp_notify_recv(struct seccomp_filter *filter,
 			if (should_sleep_killable(filter, knotif))
 				complete(&knotif->ready);
 			knotif->state = SECCOMP_NOTIFY_INIT;
-			up(&filter->notif->request);
+			atomic_inc(&filter->notif->requests);
+			wake_up_poll(&filter->wqh, EPOLLIN | EPOLLRDNORM);
 		}
 		mutex_unlock(&filter->notify_lock);
 	}
@@ -1777,7 +1809,6 @@ static struct file *init_listener(struct seccomp_filter *filter)
 	if (!filter->notif)
 		goto out;
 
-	sema_init(&filter->notif->request, 0);
 	filter->notif->next_id = get_random_u64();
 	INIT_LIST_HEAD(&filter->notif->notifications);
 

From patchwork Wed Mar 8 07:31:57 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrei Vagin
X-Patchwork-Id: 66059
Date: Tue, 7 Mar 2023 23:31:57 -0800
In-Reply-To: <20230308073201.3102738-1-avagin@google.com>
References: <20230308073201.3102738-1-avagin@google.com>
X-Mailer: git-send-email 2.40.0.rc0.216.gc4246ad0f0-goog
Message-ID: <20230308073201.3102738-3-avagin@google.com>
Subject: [PATCH 2/6] sched: add WF_CURRENT_CPU and externise ttwu
From: Andrei Vagin
To: Kees Cook, Peter Zijlstra
Cc: linux-kernel@vger.kernel.org, Christian Brauner, Chen Yu,
	avagin@gmail.com, Andrei Vagin, Andy Lutomirski, Dietmar Eggemann,
	Ingo Molnar, Juri Lelli, Peter Oskolkov, Tycho Andersen, Will Drewry,
	Vincent Guittot
X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED,
	DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE,
	SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham
	autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6
	(2021-04-09) on lindbergh.monkeyblade.net
Precedence: bulk
List-ID:
X-Mailing-List:
	linux-kernel@vger.kernel.org

From: Peter Oskolkov

Add the WF_CURRENT_CPU wake flag, which advises the scheduler to move the
wakee to the current CPU. This is useful for fast on-CPU context
switching use cases.

In addition, make ttwu external rather than static so that the flag can
be passed to it from outside of sched/core.c.

Signed-off-by: Peter Oskolkov
Signed-off-by: Andrei Vagin
---
 kernel/sched/core.c  |  3 +--
 kernel/sched/fair.c  |  4 ++++
 kernel/sched/sched.h | 13 ++++++++-----
 3 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index af017e038b48..386a0c40d341 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4123,8 +4123,7 @@ bool ttwu_state_match(struct task_struct *p, unsigned int state, int *success)
  * Return: %true if @p->state changes (an actual wakeup was done),
  *	   %false otherwise.
  */
-static int
-try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
+int try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 {
 	unsigned long flags;
 	int cpu, success = 0;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7a1b1f855b96..4c67652aa302 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7569,6 +7569,10 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
 	if (wake_flags & WF_TTWU) {
 		record_wakee(p);
 
+		if ((wake_flags & WF_CURRENT_CPU) &&
+		    cpumask_test_cpu(cpu, p->cpus_ptr))
+			return cpu;
+
 		if (sched_energy_enabled()) {
 			new_cpu = find_energy_efficient_cpu(p, prev_cpu);
 			if (new_cpu >= 0)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 3e8df6d31c1e..f8420e9ed290 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2093,12 +2093,13 @@ static inline int task_on_rq_migrating(struct task_struct *p)
 }
 
 /* Wake flags. The first three directly map to some SD flag value */
-#define WF_EXEC     0x02 /* Wakeup after exec; maps to SD_BALANCE_EXEC */
-#define WF_FORK     0x04 /* Wakeup after fork; maps to SD_BALANCE_FORK */
-#define WF_TTWU     0x08 /* Wakeup;            maps to SD_BALANCE_WAKE */
+#define WF_EXEC         0x02 /* Wakeup after exec; maps to SD_BALANCE_EXEC */
+#define WF_FORK         0x04 /* Wakeup after fork; maps to SD_BALANCE_FORK */
+#define WF_TTWU         0x08 /* Wakeup;            maps to SD_BALANCE_WAKE */
 
-#define WF_SYNC     0x10 /* Waker goes to sleep after wakeup */
-#define WF_MIGRATED 0x20 /* Internal use, task got migrated */
+#define WF_SYNC         0x10 /* Waker goes to sleep after wakeup */
+#define WF_MIGRATED     0x20 /* Internal use, task got migrated */
+#define WF_CURRENT_CPU  0x40 /* Prefer to move the wakee to the current CPU. */
 
 #ifdef CONFIG_SMP
 static_assert(WF_EXEC == SD_BALANCE_EXEC);
@@ -3232,6 +3233,8 @@ static inline bool is_per_cpu_kthread(struct task_struct *p)
 extern void swake_up_all_locked(struct swait_queue_head *q);
 extern void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait);
 
+extern int try_to_wake_up(struct task_struct *tsk, unsigned int state, int wake_flags);
+
 #ifdef CONFIG_PREEMPT_DYNAMIC
 extern int preempt_dynamic_mode;
 extern int sched_dynamic_mode(const char *str);

From patchwork Wed Mar 8 07:31:58 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrei Vagin
X-Patchwork-Id: 66056
Date: Tue, 7 Mar 2023 23:31:58 -0800
In-Reply-To: <20230308073201.3102738-1-avagin@google.com>
References: <20230308073201.3102738-1-avagin@google.com>
X-Mailer: git-send-email 2.40.0.rc0.216.gc4246ad0f0-goog
Message-ID: <20230308073201.3102738-4-avagin@google.com>
Subject: [PATCH 3/6] sched: add a few helpers to wake up tasks on the current cpu
From: Andrei Vagin
To: Kees Cook, Peter Zijlstra
Cc: linux-kernel@vger.kernel.org, Christian Brauner, Chen Yu,
	avagin@gmail.com, Andrei Vagin, Andy Lutomirski, Dietmar Eggemann,
	Ingo Molnar, Juri Lelli, Peter Oskolkov, Tycho Andersen, Will Drewry,
	Vincent Guittot
X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED,
	DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE,
	SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham
	autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6
	(2021-04-09) on lindbergh.monkeyblade.net
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Add the complete_on_current_cpu and wake_up_poll_on_current_cpu helpers
to wake up tasks on the current CPU.

These two helpers are useful when a task needs to make a synchronous
context switch to another task. In this context, synchronous means that
it wakes up the target task and falls asleep right after that.

One example of such a workload is seccomp user notifications. This
mechanism allows a supervisor process to handle system calls on behalf of
a target process. While the supervisor is handling an intercepted system
call, the target process is blocked in the kernel, waiting for a response
to come back.

On-CPU context switches are much faster than regular ones.

Signed-off-by: Andrei Vagin
---
 include/linux/completion.h |  1 +
 include/linux/swait.h      |  2 +-
 include/linux/wait.h       |  3 +++
 kernel/sched/completion.c  | 26 ++++++++++++++++++--------
 kernel/sched/core.c        |  2 +-
 kernel/sched/swait.c       |  8 ++++----
 kernel/sched/wait.c        |  5 +++++
 7 files changed, 33 insertions(+), 14 deletions(-)

diff --git a/include/linux/completion.h b/include/linux/completion.h
index 62b32b19e0a8..fb2915676574 100644
--- a/include/linux/completion.h
+++ b/include/linux/completion.h
@@ -116,6 +116,7 @@ extern bool try_wait_for_completion(struct completion *x);
 extern bool completion_done(struct completion *x);
 
 extern void complete(struct completion *);
+extern void complete_on_current_cpu(struct completion *x);
 extern void complete_all(struct completion *);
 
 #endif
diff --git a/include/linux/swait.h b/include/linux/swait.h
index 6a8c22b8c2a5..d324419482a0 100644
--- a/include/linux/swait.h
+++ b/include/linux/swait.h
@@ -146,7 +146,7 @@ static inline bool swq_has_sleeper(struct swait_queue_head *wq)
 
 extern void swake_up_one(struct swait_queue_head *q);
 extern void swake_up_all(struct swait_queue_head *q);
-extern void swake_up_locked(struct swait_queue_head *q);
+extern void swake_up_locked(struct swait_queue_head *q, int wake_flags);
 
 extern void prepare_to_swait_exclusive(struct swait_queue_head *q, struct swait_queue *wait, int state);
 extern long prepare_to_swait_event(struct swait_queue_head *q, struct swait_queue *wait, int state);
diff --git a/include/linux/wait.h b/include/linux/wait.h
index a0307b516b09..5ec7739400f4 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -210,6 +210,7 @@ __remove_wait_queue(struct wait_queue_head *wq_head, struct wait_queue_entry *wq
 }
 
 int __wake_up(struct wait_queue_head *wq_head, unsigned int mode, int nr, void *key);
+void __wake_up_on_current_cpu(struct wait_queue_head *wq_head, unsigned int mode, void *key);
 void __wake_up_locked_key(struct wait_queue_head *wq_head, unsigned int mode, void *key);
 void __wake_up_locked_key_bookmark(struct wait_queue_head *wq_head,
 		unsigned int mode, void *key, wait_queue_entry_t *bookmark);
@@ -237,6 +238,8 @@ void __wake_up_pollfree(struct wait_queue_head *wq_head);
 #define key_to_poll(m) ((__force __poll_t)(uintptr_t)(void *)(m))
 #define wake_up_poll(x, m)							\
 	__wake_up(x, TASK_NORMAL, 1, poll_to_key(m))
+#define wake_up_poll_on_current_cpu(x, m)					\
+	__wake_up_on_current_cpu(x, TASK_NORMAL, poll_to_key(m))
 #define wake_up_locked_poll(x, m)						\
 	__wake_up_locked_key((x), TASK_NORMAL, poll_to_key(m))
 #define wake_up_interruptible_poll(x, m)					\
diff --git a/kernel/sched/completion.c b/kernel/sched/completion.c
index d57a5c1c1cd9..3561ab533dd4 100644
--- a/kernel/sched/completion.c
+++ b/kernel/sched/completion.c
@@ -13,6 +13,23 @@
  * Waiting for completion is a typically sync point, but not an exclusion point.
  */
 
+static void complete_with_flags(struct completion *x, int wake_flags)
+{
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&x->wait.lock, flags);
+
+	if (x->done != UINT_MAX)
+		x->done++;
+	swake_up_locked(&x->wait, wake_flags);
+	raw_spin_unlock_irqrestore(&x->wait.lock, flags);
+}
+
+void complete_on_current_cpu(struct completion *x)
+{
+	return complete_with_flags(x, WF_CURRENT_CPU);
+}
+
 /**
  * complete: - signals a single thread waiting on this completion
  * @x:  holds the state of this particular completion
@@ -27,14 +44,7 @@
  */
 void complete(struct completion *x)
 {
-	unsigned long flags;
-
-	raw_spin_lock_irqsave(&x->wait.lock, flags);
-
-	if (x->done != UINT_MAX)
-		x->done++;
-	swake_up_locked(&x->wait);
-	raw_spin_unlock_irqrestore(&x->wait.lock, flags);
+	complete_with_flags(x, 0);
 }
 EXPORT_SYMBOL(complete);
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 386a0c40d341..c5f7bfbc4967 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6941,7 +6941,7 @@ asmlinkage __visible void __sched preempt_schedule_irq(void)
 int default_wake_function(wait_queue_entry_t *curr, unsigned mode, int wake_flags,
 			  void *key)
 {
-	WARN_ON_ONCE(IS_ENABLED(CONFIG_SCHED_DEBUG) && wake_flags & ~WF_SYNC);
+	WARN_ON_ONCE(IS_ENABLED(CONFIG_SCHED_DEBUG) && wake_flags & ~(WF_SYNC|WF_CURRENT_CPU));
 	return try_to_wake_up(curr->private, mode, wake_flags);
 }
 EXPORT_SYMBOL(default_wake_function);
diff --git a/kernel/sched/swait.c b/kernel/sched/swait.c
index 76b9b796e695..72505cd3b60a 100644
--- a/kernel/sched/swait.c
+++ b/kernel/sched/swait.c
@@ -18,7 +18,7 @@ EXPORT_SYMBOL(__init_swait_queue_head);
  * If for some reason it would return 0, that means the previously waiting
  * task is already running, so it will observe condition true (or has already).
  */
-void swake_up_locked(struct swait_queue_head *q)
+void swake_up_locked(struct swait_queue_head *q, int wake_flags)
 {
 	struct swait_queue *curr;
 
@@ -26,7 +26,7 @@ void swake_up_locked(struct swait_queue_head *q)
 		return;
 
 	curr = list_first_entry(&q->task_list, typeof(*curr), task_list);
-	wake_up_process(curr->task);
+	try_to_wake_up(curr->task, TASK_NORMAL, wake_flags);
 	list_del_init(&curr->task_list);
 }
 EXPORT_SYMBOL(swake_up_locked);
@@ -41,7 +41,7 @@ EXPORT_SYMBOL(swake_up_locked);
 void swake_up_all_locked(struct swait_queue_head *q)
 {
 	while (!list_empty(&q->task_list))
-		swake_up_locked(q);
+		swake_up_locked(q, 0);
 }
 
 void swake_up_one(struct swait_queue_head *q)
@@ -49,7 +49,7 @@ void swake_up_one(struct swait_queue_head *q)
 	unsigned long flags;
 
 	raw_spin_lock_irqsave(&q->lock, flags);
-	swake_up_locked(q);
+	swake_up_locked(q, 0);
 	raw_spin_unlock_irqrestore(&q->lock, flags);
 }
 EXPORT_SYMBOL(swake_up_one);
diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
index 133b74730738..47803a0b8d5d 100644
--- a/kernel/sched/wait.c
+++ b/kernel/sched/wait.c
@@ -161,6 +161,11 @@ int __wake_up(struct wait_queue_head *wq_head, unsigned int mode,
 }
 EXPORT_SYMBOL(__wake_up);
 
+void __wake_up_on_current_cpu(struct wait_queue_head *wq_head, unsigned int mode, void *key)
+{
+	__wake_up_common_lock(wq_head, mode, 1, WF_CURRENT_CPU, key);
+}
+
 /*
  * Same as __wake_up but called with the spinlock in wait_queue_head_t held.
  */

From patchwork Wed Mar 8 07:31:59 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Andrei Vagin
X-Patchwork-Id: 66058
Date: Tue, 7 Mar 2023 23:31:59 -0800
In-Reply-To: <20230308073201.3102738-1-avagin@google.com>
References: <20230308073201.3102738-1-avagin@google.com>
X-Mailer: git-send-email 2.40.0.rc0.216.gc4246ad0f0-goog
Message-ID: <20230308073201.3102738-5-avagin@google.com>
Subject: [PATCH 4/6] seccomp: add the synchronous mode for seccomp_unotify
From: Andrei Vagin
To: Kees Cook, Peter Zijlstra
Cc: linux-kernel@vger.kernel.org, Christian Brauner, Chen Yu,
	avagin@gmail.com, Andrei Vagin, Andy Lutomirski, Dietmar Eggemann,
	Ingo Molnar, Juri Lelli, Peter Oskolkov, Tycho Andersen, Will Drewry,
	Vincent Guittot
X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED,
	DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE,
SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1759784377488728958?= X-GMAIL-MSGID: =?utf-8?q?1759784629371996667?=

seccomp_unotify allows more privileged processes to perform actions on behalf of less privileged processes.

In many cases, the workflow is fully synchronous: a target process triggers a system call and passes control to a supervisor process, which handles the system call and returns control to the target process. In this context, "synchronous" means that only one process is running while the other one is waiting.

The WF_CURRENT_CPU flag advises the scheduler to move the wakee to the current CPU. For such synchronous workflows, it makes context switches a few times faster.

Right now, each interaction takes 12µs. With this patch, it takes about 3µs.

This change introduces the SECCOMP_USER_NOTIF_FD_SYNC_WAKE_UP flag, which is used to enable the sync mode.
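For illustration only (this snippet is not part of the patch), a supervisor could enable the mode on an existing listener fd roughly as sketched here. The helper name unotify_enable_sync_wakeup() is hypothetical; the flag and ioctl values are the ones this patch adds to include/uapi/linux/seccomp.h, redefined as a fallback in case the installed headers predate it.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/seccomp.h>
#include <linux/types.h>

/* Fallback definitions, copied from this patch, for older uapi headers. */
#ifndef SECCOMP_USER_NOTIF_FD_SYNC_WAKE_UP
#define SECCOMP_USER_NOTIF_FD_SYNC_WAKE_UP (1UL << 0)
#endif
#ifndef SECCOMP_IOCTL_NOTIF_SET_FLAGS
#define SECCOMP_IOCTL_NOTIF_SET_FLAGS SECCOMP_IOW(4, __u64)
#endif

/*
 * Hypothetical helper (an assumption of this example, not kernel API):
 * ask the kernel to use synchronous wake-ups for a seccomp listener fd.
 * The flags are passed by value as the ioctl argument, as in the selftest.
 * Returns 0 on success or a negative errno value on failure.
 */
int unotify_enable_sync_wakeup(int listener)
{
	if (ioctl(listener, SECCOMP_IOCTL_NOTIF_SET_FLAGS,
		  SECCOMP_USER_NOTIF_FD_SYNC_WAKE_UP) < 0)
		return -errno;
	return 0;
}
```

On a kernel without this patch the ioctl is expected to fail, so a caller would probably treat an error as "stay in the default wake-up mode" rather than as fatal.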
Signed-off-by: Andrei Vagin --- include/uapi/linux/seccomp.h | 4 ++++ kernel/seccomp.c | 31 +++++++++++++++++++++++++++++-- 2 files changed, 33 insertions(+), 2 deletions(-) diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h index 0fdc6ef02b94..dbfc9b37fcae 100644 --- a/include/uapi/linux/seccomp.h +++ b/include/uapi/linux/seccomp.h @@ -115,6 +115,8 @@ struct seccomp_notif_resp { __u32 flags; }; +#define SECCOMP_USER_NOTIF_FD_SYNC_WAKE_UP (1UL << 0) + /* valid flags for seccomp_notif_addfd */ #define SECCOMP_ADDFD_FLAG_SETFD (1UL << 0) /* Specify remote fd */ #define SECCOMP_ADDFD_FLAG_SEND (1UL << 1) /* Addfd and return it, atomically */ @@ -150,4 +152,6 @@ struct seccomp_notif_addfd { #define SECCOMP_IOCTL_NOTIF_ADDFD SECCOMP_IOW(3, \ struct seccomp_notif_addfd) +#define SECCOMP_IOCTL_NOTIF_SET_FLAGS SECCOMP_IOW(4, __u64) + #endif /* _UAPI_LINUX_SECCOMP_H */ diff --git a/kernel/seccomp.c b/kernel/seccomp.c index 9fca9345111c..d323edeae7da 100644 --- a/kernel/seccomp.c +++ b/kernel/seccomp.c @@ -143,9 +143,12 @@ struct seccomp_kaddfd { * filter->notify_lock. * @next_id: The id of the next request. * @notifications: A list of struct seccomp_knotif elements. + * @flags: A set of SECCOMP_USER_NOTIF_FD_* flags. */ + struct notification { atomic_t requests; + u32 flags; u64 next_id; struct list_head notifications; }; @@ -1117,7 +1120,10 @@ static int seccomp_do_user_notification(int this_syscall, INIT_LIST_HEAD(&n.addfd); atomic_inc(&match->notif->requests); - wake_up_poll(&match->wqh, EPOLLIN | EPOLLRDNORM); + if (match->notif->flags & SECCOMP_USER_NOTIF_FD_SYNC_WAKE_UP) + wake_up_poll_on_current_cpu(&match->wqh, EPOLLIN | EPOLLRDNORM); + else + wake_up_poll(&match->wqh, EPOLLIN | EPOLLRDNORM); /* * This is where we wait for a reply from userspace. 
@@ -1593,7 +1599,10 @@ static long seccomp_notify_send(struct seccomp_filter *filter, knotif->error = resp.error; knotif->val = resp.val; knotif->flags = resp.flags; - complete(&knotif->ready); + if (filter->notif->flags & SECCOMP_USER_NOTIF_FD_SYNC_WAKE_UP) + complete_on_current_cpu(&knotif->ready); + else + complete(&knotif->ready); out: mutex_unlock(&filter->notify_lock); return ret; @@ -1623,6 +1632,22 @@ static long seccomp_notify_id_valid(struct seccomp_filter *filter, return ret; } +static long seccomp_notify_set_flags(struct seccomp_filter *filter, + unsigned long flags) +{ + long ret; + + if (flags & ~SECCOMP_USER_NOTIF_FD_SYNC_WAKE_UP) + return -EINVAL; + + ret = mutex_lock_interruptible(&filter->notify_lock); + if (ret < 0) + return ret; + filter->notif->flags = flags; + mutex_unlock(&filter->notify_lock); + return 0; +} + static long seccomp_notify_addfd(struct seccomp_filter *filter, struct seccomp_notif_addfd __user *uaddfd, unsigned int size) @@ -1752,6 +1777,8 @@ static long seccomp_notify_ioctl(struct file *file, unsigned int cmd, case SECCOMP_IOCTL_NOTIF_ID_VALID_WRONG_DIR: case SECCOMP_IOCTL_NOTIF_ID_VALID: return seccomp_notify_id_valid(filter, buf); + case SECCOMP_IOCTL_NOTIF_SET_FLAGS: + return seccomp_notify_set_flags(filter, arg); } /* Extensible Argument ioctls */ From patchwork Wed Mar 8 07:32:00 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrei Vagin X-Patchwork-Id: 66053 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:5915:0:0:0:0:0 with SMTP id v21csp192163wrd; Tue, 7 Mar 2023 23:40:57 -0800 (PST) X-Google-Smtp-Source: AK7set8vVQytujE5hSBRuMSrgXpr4epiNhq9SQ8w4o/kLOEOMit0MfoBb+lKYFUvqRBKcJzPVgYe X-Received: by 2002:a17:903:441:b0:19b:fa9:678b with SMTP id iw1-20020a170903044100b0019b0fa9678bmr17072412plb.40.1678261256725; Tue, 07 Mar 2023 23:40:56 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1678261256; cv=none; d=google.com; s=arc-20160816; 
b=iUjTGsWA1qpfkp3gEA09rH4QkzIEb3dAf+UB5XqxUbuaQP/pPgXYtpe72ls2NdHXM8 HsIH8JXtvRRBq0yeAaEb7b9XiTc5h8t/RTFJHdd9sBvlU+KrFJ4dPCEZDu7FuvuYQNKv u/ahEmz0LKDk468y+CGfBHNDhwqnQztRffOp1OjuoX0R5eDFEshiHhIVAAUkCHhqpLsq wzy1heEwvgrZ8T5/STSjNplmDGijFZ9g6HFihzPP6OtDzTaMBDzFfwJVgjWPxFCMH4+m kI5nmPHj5IotSIzGK/KNCnxnigcJCUGHykEA0iDdRNwGsU2Yehjs/8d15Vyes+5n/dO4 hiKQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:dkim-signature; bh=eiPiVq4qnV5b3yyOHukFyCYe6YsVtSEQ0lj3gSUnBVU=; b=KTbR7TMzR/9le+qI1vJIusPr7Kfd6o1kwfHessNjpXd38FMLeJNJh1CARzsI5M4quX 3sdqdvYyNxI86jPY0hJ5AY6dZVi6EpKAwXLQl1yF69p5A7cnSU3cjF2woHjL+pUO5mMe y7LZdc/gK97BHJGI11HXIjee2mKJEIJqDVjviC7mnMymDTIJ66wcblLA0WjaV5SPNQHX 8+yLB/0L3loudDEKCE+pHJtkinNmeB0qX3bEx1AIiQoZwHl50DRKKsUeLPSqroEcE3Px B+nHJbIrqk856r32rZ28iFBWbIOpxxv0+LIcI9YRIBIypUoXIXEKUy2kX7le7wiATuBH FrNw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=Nc5RYpMP; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id lg14-20020a170902fb8e00b00199051ac8f7si13622155plb.188.2023.03.07.23.40.44; Tue, 07 Mar 2023 23:40:56 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=Nc5RYpMP; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229816AbjCHHcl (ORCPT + 99 others); Wed, 8 Mar 2023 02:32:41 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56150 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229841AbjCHHce (ORCPT ); Wed, 8 Mar 2023 02:32:34 -0500 Received: from mail-yw1-x114a.google.com (mail-yw1-x114a.google.com [IPv6:2607:f8b0:4864:20::114a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B36B0A7295 for ; Tue, 7 Mar 2023 23:32:21 -0800 (PST) Received: by mail-yw1-x114a.google.com with SMTP id 00721157ae682-536be78056eso160482717b3.1 for ; Tue, 07 Mar 2023 23:32:21 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; t=1678260741; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=eiPiVq4qnV5b3yyOHukFyCYe6YsVtSEQ0lj3gSUnBVU=; b=Nc5RYpMPMcJmlMk8UTyky0FVierCho4KPLKPoggzP9J288W8hSAzN7Nt/ILUPiMFol 7a3b7CX70eq/+Byb31yem1HSegPxIZqw1QD7EraC/qnTvl8KGW2P8tbqjWSqbFdUnxiO QhoqxfYm8GTR8vJZwIhhtIjIwJ8OxnUcFhocWX4k6ZYfLlNxjBIKAYRTvYAenm6kCJ55 e2oqd0X5MvmYOv1Dhn8Z9ClBPU9VPd0uXVryuyYDO0BAFCKsAoeEM/gec8NvjRK8p69i 5Tg5FFypJJhpDSOMOuo9D7eFHl7f1lnIqU6W+nx+0SSnMapnyv6K/abDJwIG1p6ym3+4 EJHA== 
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1678260741; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=eiPiVq4qnV5b3yyOHukFyCYe6YsVtSEQ0lj3gSUnBVU=; b=Kv9BdiBf7hs2gcew48vG5S0Wq583yT8W7NgQAFjkX/xK4QoLNb1Qj7o76s8TiLvYel 10prIVLJWjBFojnV3GbaLPgZd5Q+ds3dZgQEzXP4LqA4EBE0Eg6puIWIGVOV1+jl8KYA CIhIYrebS/pFlqlcrExqZJuVucMpqbD+2rBFrlUemkHBaHujcLx+BTY1xNhQ+Nq1oi/i pZtj7y1LHJLatUDz8BOLzLD8saY/MNS8rH7nvfx4tA7sQ3LP97s6yJKUZiXyqmHs5ZGI AkrBe5k/uwtB1VhLNs8yGibcoXp1iEwVFdVulUFQHDv1zOjmcbtPKe0fPjfQrWYhwyHA MaVg== X-Gm-Message-State: AO0yUKU4AdPePHR/b0aC+QrgQkbhIg/Vw/Vs4PuVvam0BaOXkAj88krU qGYu9fS5Ua0WEG66sI4Iei6qcXtkI0c= X-Received: from avagin.kir.corp.google.com ([2620:0:1008:11:b53:99a6:b4fe:b30b]) (user=avagin job=sendgmr) by 2002:a0d:ea13:0:b0:533:54d1:9e40 with SMTP id t19-20020a0dea13000000b0053354d19e40mr4ywe.21.1678260740739; Tue, 07 Mar 2023 23:32:20 -0800 (PST) Date: Tue, 7 Mar 2023 23:32:00 -0800 In-Reply-To: <20230308073201.3102738-1-avagin@google.com> Mime-Version: 1.0 References: <20230308073201.3102738-1-avagin@google.com> X-Mailer: git-send-email 2.40.0.rc0.216.gc4246ad0f0-goog Message-ID: <20230308073201.3102738-6-avagin@google.com> Subject: [PATCH 5/6] selftest/seccomp: add a new test for the sync mode of seccomp_user_notify From: Andrei Vagin To: Kees Cook , Peter Zijlstra Cc: linux-kernel@vger.kernel.org, Christian Brauner , Chen Yu , avagin@gmail.com, Andrei Vagin , Andy Lutomirski , Dietmar Eggemann , Ingo Molnar , Juri Lelli , Peter Oskolkov , Tycho Andersen , Will Drewry , Vincent Guittot X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net 
Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1759784475565654356?= X-GMAIL-MSGID: =?utf-8?q?1759784475565654356?= Test output: # RUN global.user_notification_sync ... # OK global.user_notification_sync ok 51 global.user_notification_sync Signed-off-by: Andrei Vagin --- tools/testing/selftests/seccomp/seccomp_bpf.c | 55 +++++++++++++++++++ 1 file changed, 55 insertions(+) diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c index 43ec36b179dc..f6a04d88e02f 100644 --- a/tools/testing/selftests/seccomp/seccomp_bpf.c +++ b/tools/testing/selftests/seccomp/seccomp_bpf.c @@ -4255,6 +4255,61 @@ TEST(user_notification_addfd_rlimit) close(memfd); } +#ifndef SECCOMP_USER_NOTIF_FD_SYNC_WAKE_UP +#define SECCOMP_USER_NOTIF_FD_SYNC_WAKE_UP (1UL << 0) +#define SECCOMP_IOCTL_NOTIF_SET_FLAGS SECCOMP_IOW(4, __u64) +#endif + +TEST(user_notification_sync) +{ + struct seccomp_notif req = {}; + struct seccomp_notif_resp resp = {}; + int status, listener; + pid_t pid; + long ret; + + ret = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0); + ASSERT_EQ(0, ret) { + TH_LOG("Kernel does not support PR_SET_NO_NEW_PRIVS!"); + } + + listener = user_notif_syscall(__NR_getppid, + SECCOMP_FILTER_FLAG_NEW_LISTENER); + ASSERT_GE(listener, 0); + + /* Try to set invalid flags. 
*/ + EXPECT_SYSCALL_RETURN(-EINVAL, + ioctl(listener, SECCOMP_IOCTL_NOTIF_SET_FLAGS, 0xffffffff, 0)); + + ASSERT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_SET_FLAGS, + SECCOMP_USER_NOTIF_FD_SYNC_WAKE_UP, 0), 0); + + pid = fork(); + ASSERT_GE(pid, 0); + if (pid == 0) { + ret = syscall(__NR_getppid); + ASSERT_EQ(ret, USER_NOTIF_MAGIC) { + _exit(1); + } + _exit(0); + } + + req.pid = 0; + ASSERT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_RECV, &req), 0); + + ASSERT_EQ(req.data.nr, __NR_getppid); + + resp.id = req.id; + resp.error = 0; + resp.val = USER_NOTIF_MAGIC; + resp.flags = 0; + ASSERT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_SEND, &resp), 0); + + ASSERT_EQ(waitpid(pid, &status, 0), pid); + ASSERT_EQ(status, 0); +} + + /* Make sure PTRACE_O_SUSPEND_SECCOMP requires CAP_SYS_ADMIN. */ FIXTURE(O_SUSPEND_SECCOMP) { pid_t pid; From patchwork Wed Mar 8 07:32:01 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrei Vagin X-Patchwork-Id: 66050 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:5915:0:0:0:0:0 with SMTP id v21csp191646wrd; Tue, 7 Mar 2023 23:39:23 -0800 (PST) X-Google-Smtp-Source: AK7set+OITVHjBHZtLQ04p8splgCQ10CnXEF0PXTWo/QftqszkftC0WF30bul0s6AIj0MMtbz4xG X-Received: by 2002:a17:90b:17c4:b0:234:e0c:caaa with SMTP id me4-20020a17090b17c400b002340e0ccaaamr18061796pjb.6.1678261163411; Tue, 07 Mar 2023 23:39:23 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1678261163; cv=none; d=google.com; s=arc-20160816; b=zfgBPcJvTKUxwx1OpsJ31Bw42M/8mPgM4esX48L2divxNFAazG2CAnTpF0CP3Jc7Zw cIPDTjhPYckXtJHQp2a0ssKoI/qyknBmVyCp/QL3619EnLKsCitrP8KI/+bI3e6dGFFX uHG+IGy8w3vPzbQj7J0xeDD34Mdg57xW19pKoaK/SzBVsr3zq62ghES39yBlOOP4qfv9 DyqwiV6hk4VrNUoFpVCzOKeNvmczhFhe7uvF1BELi4tRUGWZy2rN/S9B+E1Hi1vnwrFs 7EZhQGOUEiwzmukqt9DsGmdIuOorcD74pmnP9dupHzvEZKsgCldsN2O1WI7WhwZJiVgI Yfdw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; 
h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:dkim-signature; bh=OGMLdMm5qBcc4olci9fMTmM3WsrESinWs9N+BXJl1jg=; b=SbF0wg4yWY1YfZOknv/YNKGIZ6k2JZ4sRmy5KzDPBtcAyHyw28qtYTF2PURlpaGPmh yYYi836rCMb30XkQxBL49koLvnlh8+ihQxCb0GngLyjigL9svj2ES2/6c4XkzPO/wncc WgGDF05B9lp2ktkvCT5o3kz/Ikd2GYPmd0iKZ/4GQV/BQCRcqNrSqViDrfAlYwDUQjEA 9DxR2ZthT1nXfhgKWkqo56f2OYCAM0NPknImVmRsdZnXVA/llshrewKs0tZ0k4DUp0C+ fQyxsxnAhpoOLRrAfjFP48YCrG9TUh+shS3JNLOpURBanzPHJDL9L4UdRzqREepkcC4r uCQg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=eFnDR5mg; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id h18-20020a17090ac39200b0023747b24923si13420493pjt.53.2023.03.07.23.39.09; Tue, 07 Mar 2023 23:39:23 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=eFnDR5mg; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229702AbjCHHcv (ORCPT + 99 others); Wed, 8 Mar 2023 02:32:51 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56150 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229869AbjCHHcf (ORCPT ); Wed, 8 Mar 2023 02:32:35 -0500 Received: from mail-pl1-x649.google.com (mail-pl1-x649.google.com 
[IPv6:2607:f8b0:4864:20::649]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6AF9DA72AB for ; Tue, 7 Mar 2023 23:32:23 -0800 (PST) Received: by mail-pl1-x649.google.com with SMTP id c3-20020a170902724300b0019d1ffec36dso8964686pll.9 for ; Tue, 07 Mar 2023 23:32:23 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; t=1678260743; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=OGMLdMm5qBcc4olci9fMTmM3WsrESinWs9N+BXJl1jg=; b=eFnDR5mgN514m3UH4vGPG8H/G2XQBLtJr7TNKaPz0nsq7dq76gfvLoTW7/l4eG+xYr YUQ1k3uwseN1igCqXa9gcvETdz0XyZpT3I+OQuNMaDbvJlCbYYV5lgkcQBwGonABEF8j pgYLWtwBIsNJfPsTyka0p8zl8wPxIcExK8RIKT977hgU9yOuPhPC40JRiMlYPTWzCGqW 6XQoHCjwSaRJHbEjWZ+BIxSK+UWIVQlKogIDm4yddmh3nEuE1CxXkUrKtNce2oHKYCA5 fzkkKx2mo2/d3Cj0zdYivbxeCQPpYqsDawTxkcsG8Bh0HETbo5KlQ27bWcRTpcLQi9lp xu6A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1678260743; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=OGMLdMm5qBcc4olci9fMTmM3WsrESinWs9N+BXJl1jg=; b=6bnIoNcJI7iwwEArnjIP6dy64OI4dYTuNjBGQfvyjlSt1/V4VA8O3pI1r07k9bDoS+ YiWO20NCuJsUO6Q3RbOuSBnE07aQ/GWLx1UaoTnz08/MpWA0ctLbC+PuxBgh6o6EJ6lv 4oLBX1HJ6OYRaZwSd94SNQ7muorPfKfWpy8E+yUCp4xbKEljFv3RjzUQY0H44jfs2n68 qj9CSB2L3qK3yTmBwktt2t4HfCcDV1uYJOGgfGotGeqBrZvNJyAoHfRnkfF2end+K6VD f0Mr3w3KhtschQDJ/2jW3xyHaiJG2qEgK4zpPRyCFYyFRZZ8VfNMWNSLRNTrSr2OYIVI vJ/Q== X-Gm-Message-State: AO0yUKU+PCXO8VDB997H60w2Dy9Y2nolAoEThRMWiae/JREcHmfAbxfy +zv2c/eEKrJpBegMputa8ICQcliAZVA= X-Received: from avagin.kir.corp.google.com ([2620:0:1008:11:b53:99a6:b4fe:b30b]) (user=avagin job=sendgmr) by 2002:a63:fd41:0:b0:503:a7:c934 with SMTP id m1-20020a63fd41000000b0050300a7c934mr7076545pgj.2.1678260742857; Tue, 07 Mar 2023 23:32:22 -0800 (PST) Date: Tue, 7 Mar 2023 23:32:01 -0800 In-Reply-To: 
<20230308073201.3102738-1-avagin@google.com> Mime-Version: 1.0 References: <20230308073201.3102738-1-avagin@google.com> X-Mailer: git-send-email 2.40.0.rc0.216.gc4246ad0f0-goog Message-ID: <20230308073201.3102738-7-avagin@google.com> Subject: [PATCH 6/6] perf/benchmark: add a new benchmark for seccomp_unotify From: Andrei Vagin To: Kees Cook , Peter Zijlstra Cc: linux-kernel@vger.kernel.org, Christian Brauner , Chen Yu , avagin@gmail.com, Andrei Vagin , Andy Lutomirski , Dietmar Eggemann , Ingo Molnar , Juri Lelli , Peter Oskolkov , Tycho Andersen , Will Drewry , Vincent Guittot X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1759784377538223367?= X-GMAIL-MSGID: =?utf-8?q?1759784377538223367?=

The benchmark is similar to the pipe benchmark. It creates two processes: one issues system calls, and the other handles them via seccomp user notifications. It measures the time required to run a specified number of iterations.
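To make the reported figures concrete, here is a small sketch (illustrative only, with helper names made up for this example) of how usecs/op and ops/sec follow from the elapsed wall-clock time and the loop count. It mirrors the arithmetic in bench_sched_seccomp_notify() but is not the benchmark code itself.

```c
#include <stdint.h>
#include <sys/time.h>

#define USEC_PER_SEC 1000000ULL

/* Hypothetical helper: time between two gettimeofday() samples, in microseconds. */
uint64_t elapsed_usec(const struct timeval *start, const struct timeval *stop)
{
	int64_t sec = (int64_t)stop->tv_sec - (int64_t)start->tv_sec;
	int64_t usec = (int64_t)stop->tv_usec - (int64_t)start->tv_usec;

	return (uint64_t)(sec * (int64_t)USEC_PER_SEC + usec);
}

/* Average time per intercepted system call, in microseconds. */
double usecs_per_op(uint64_t total_usec, uint64_t loops)
{
	return (double)total_usec / (double)loops;
}

/* Intercepted system calls handled per second. */
double ops_per_sec(uint64_t total_usec, uint64_t loops)
{
	return (double)loops / ((double)total_usec / (double)USEC_PER_SEC);
}
```

For the sync-mode run quoted in this message, 1000000 calls at 2.769629 usecs/op correspond to 2769629 microseconds in total, which truncates to the reported 361059 ops/sec.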
$ ./perf bench sched seccomp-notify --sync-mode --loop 1000000
# Running 'sched/seccomp-notify' benchmark:
# Executed 1000000 system calls

Total time: 2.769 [sec]

2.769629 usecs/op
361059 ops/sec

$ ./perf bench sched seccomp-notify
# Running 'sched/seccomp-notify' benchmark:
# Executed 1000000 system calls

Total time: 8.571 [sec]

8.571119 usecs/op
116670 ops/sec

Signed-off-by: Andrei Vagin
---
 tools/arch/x86/include/uapi/asm/unistd_32.h |   3 +
 tools/arch/x86/include/uapi/asm/unistd_64.h |   3 +
 tools/perf/bench/Build                      |   1 +
 tools/perf/bench/bench.h                    |   1 +
 tools/perf/bench/sched-seccomp-notify.c     | 168 ++++++++++++++++++++
 tools/perf/builtin-bench.c                  |   1 +
 6 files changed, 177 insertions(+)
 create mode 100644 tools/perf/bench/sched-seccomp-notify.c

diff --git a/tools/arch/x86/include/uapi/asm/unistd_32.h b/tools/arch/x86/include/uapi/asm/unistd_32.h
index 2712d5e03e2e..5fb3589c14bf 100644
--- a/tools/arch/x86/include/uapi/asm/unistd_32.h
+++ b/tools/arch/x86/include/uapi/asm/unistd_32.h
@@ -23,3 +23,6 @@
 #ifndef __NR_setns
 #define __NR_setns 346
 #endif
+#ifndef __NR_seccomp
+#define __NR_seccomp 354
+#endif
diff --git a/tools/arch/x86/include/uapi/asm/unistd_64.h b/tools/arch/x86/include/uapi/asm/unistd_64.h
index a6f7fe84d4df..e0549617f9d7 100644
--- a/tools/arch/x86/include/uapi/asm/unistd_64.h
+++ b/tools/arch/x86/include/uapi/asm/unistd_64.h
@@ -23,3 +23,6 @@
 #ifndef __NR_getcpu
 #define __NR_getcpu 309
 #endif
+#ifndef __NR_seccomp
+#define __NR_seccomp 317
+#endif
diff --git a/tools/perf/bench/Build b/tools/perf/bench/Build
index 6b6155a8ad09..e3ec2c1b0682 100644
--- a/tools/perf/bench/Build
+++ b/tools/perf/bench/Build
@@ -1,5 +1,6 @@
 perf-y += sched-messaging.o
 perf-y += sched-pipe.o
+perf-y += sched-seccomp-notify.o
 perf-y += syscall.o
 perf-y += mem-functions.o
 perf-y += futex-hash.o
diff --git a/tools/perf/bench/bench.h b/tools/perf/bench/bench.h
index e43893151a3e..9d28510fcf9d 100644
--- a/tools/perf/bench/bench.h
+++ b/tools/perf/bench/bench.h
@@ -21,6 +21,7 @@ extern struct timeval bench__start, bench__end, bench__runtime;
 int bench_numa(int argc, const char **argv);
 int bench_sched_messaging(int argc, const char **argv);
 int bench_sched_pipe(int argc, const char **argv);
+int bench_sched_seccomp_notify(int argc, const char **argv);
 int bench_syscall_basic(int argc, const char **argv);
 int bench_syscall_getpgid(int argc, const char **argv);
 int bench_syscall_execve(int argc, const char **argv);
diff --git a/tools/perf/bench/sched-seccomp-notify.c b/tools/perf/bench/sched-seccomp-notify.c
new file mode 100644
index 000000000000..443f4b43702d
--- /dev/null
+++ b/tools/perf/bench/sched-seccomp-notify.c
@@ -0,0 +1,168 @@
+// SPDX-License-Identifier: GPL-2.0
+#include
+#include "bench.h"
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#define LOOPS_DEFAULT 1000000UL
+static uint64_t loops = LOOPS_DEFAULT;
+static bool sync_mode;
+
+static const struct option options[] = {
+	OPT_U64('l', "loop", &loops, "Specify number of loops"),
+	OPT_BOOLEAN('s', "sync-mode", &sync_mode,
+		    "Enable the synchronous mode for seccomp notifications"),
+	OPT_END()
+};
+
+static const char * const bench_seccomp_usage[] = {
+	"perf bench sched seccomp-notify ",
+	NULL
+};
+
+static int seccomp(unsigned int op, unsigned int flags, void *args)
+{
+	return syscall(__NR_seccomp, op, flags, args);
+}
+
+static int user_notif_syscall(int nr, unsigned int flags)
+{
+	struct sock_filter filter[] = {
+		BPF_STMT(BPF_LD|BPF_W|BPF_ABS,
+			offsetof(struct seccomp_data, nr)),
+		BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, nr, 0, 1),
+		BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_USER_NOTIF),
+		BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW),
+	};
+
+	struct sock_fprog prog = {
+		.len = (unsigned short)ARRAY_SIZE(filter),
+		.filter = filter,
+	};
+
+	return seccomp(SECCOMP_SET_MODE_FILTER, flags, &prog);
+}
+
+#define USER_NOTIF_MAGIC INT_MAX
+static void user_notification_sync_loop(int listener)
+{
+	struct seccomp_notif_resp resp;
+	struct seccomp_notif req;
+	uint64_t nr;
+
+	for (nr = 0; nr < loops; nr++) {
+		memset(&req, 0, sizeof(req));
+		assert(ioctl(listener, SECCOMP_IOCTL_NOTIF_RECV, &req) == 0);
+
+		assert(req.data.nr == __NR_gettid);
+
+		resp.id = req.id;
+		resp.error = 0;
+		resp.val = USER_NOTIF_MAGIC;
+		resp.flags = 0;
+		assert(ioctl(listener, SECCOMP_IOCTL_NOTIF_SEND, &resp) == 0);
+	}
+}
+
+#ifndef SECCOMP_USER_NOTIF_FD_SYNC_WAKE_UP
+#define SECCOMP_USER_NOTIF_FD_SYNC_WAKE_UP (1UL << 0)
+#define SECCOMP_IOCTL_NOTIF_SET_FLAGS SECCOMP_IOW(4, __u64)
+#endif
+int bench_sched_seccomp_notify(int argc, const char **argv)
+{
+	struct timeval start, stop, diff;
+	unsigned long long result_usec = 0;
+	int status, listener;
+	pid_t pid;
+	long ret;
+
+	argc = parse_options(argc, argv, options, bench_seccomp_usage, 0);
+
+	gettimeofday(&start, NULL);
+
+	prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
+	listener = user_notif_syscall(__NR_gettid,
+				      SECCOMP_FILTER_FLAG_NEW_LISTENER);
+	assert(listener >= 0);
+
+	pid = fork();
+	assert(pid >= 0);
+	if (pid == 0) {
+		assert(prctl(PR_SET_PDEATHSIG, SIGKILL, 0, 0, 0) == 0);
+		while (1) {
+			ret = syscall(__NR_gettid);
+			if (ret == USER_NOTIF_MAGIC)
+				continue;
+			break;
+		}
+		_exit(1);
+	}
+
+	if (sync_mode) {
+		assert(ioctl(listener, SECCOMP_IOCTL_NOTIF_SET_FLAGS,
+			     SECCOMP_USER_NOTIF_FD_SYNC_WAKE_UP, 0) == 0);
+	}
+	user_notification_sync_loop(listener);
+
+	kill(pid, SIGKILL);
+	assert(waitpid(pid, &status, 0) == pid);
+	assert(WIFSIGNALED(status));
+	assert(WTERMSIG(status) == SIGKILL);
+
+	gettimeofday(&stop, NULL);
+	timersub(&stop, &start, &diff);
+
+	switch (bench_format) {
+	case BENCH_FORMAT_DEFAULT:
+		printf("# Executed %lu system calls\n\n",
+			loops);
+
+		result_usec = diff.tv_sec * USEC_PER_SEC;
+		result_usec += diff.tv_usec;
+
+		printf(" %14s: %lu.%03lu [sec]\n\n", "Total time",
+		       (unsigned long) diff.tv_sec,
+		       (unsigned long) (diff.tv_usec / USEC_PER_MSEC));
+
+		printf(" %14lf usecs/op\n",
+		       (double)result_usec / (double)loops);
+		printf(" %14d ops/sec\n",
+		       (int)((double)loops /
+			     ((double)result_usec / (double)USEC_PER_SEC)));
+		break;
+
+	case BENCH_FORMAT_SIMPLE:
+		printf("%lu.%03lu\n",
+		       (unsigned long) diff.tv_sec,
+		       (unsigned long) (diff.tv_usec / USEC_PER_MSEC));
+		break;
+
+	default:
+		/* reaching here is something disaster */
+		fprintf(stderr, "Unknown format:%d\n", bench_format);
+		exit(1);
+		break;
+	}
+
+	return 0;
+}
diff --git a/tools/perf/builtin-bench.c b/tools/perf/builtin-bench.c
index 814e9afc86f6..db57813fe4e5 100644
--- a/tools/perf/builtin-bench.c
+++ b/tools/perf/builtin-bench.c
@@ -46,6 +46,7 @@ static struct bench numa_benchmarks[] = {
 static struct bench sched_benchmarks[] = {
 	{ "messaging", "Benchmark for scheduling and IPC", bench_sched_messaging },
 	{ "pipe", "Benchmark for pipe() between two processes", bench_sched_pipe },
+	{ "seccomp-notify", "Benchmark for seccomp user notify", bench_sched_seccomp_notify},
 	{ "all", "Run all scheduler benchmarks", NULL },
 	{ NULL, NULL, NULL }
 };