Message ID | 20230308073201.3102738-1-avagin@google.com |
---|---|
Headers |
Date: Tue, 7 Mar 2023 23:31:55 -0800
Message-ID: <20230308073201.3102738-1-avagin@google.com>
Subject: [PATCH 0/6 v5 RESEND] seccomp: add the synchronous mode for seccomp_unotify
From: Andrei Vagin <avagin@google.com>
To: Kees Cook <keescook@chromium.org>, Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org, Christian Brauner <brauner@kernel.org>, Chen Yu <yu.c.chen@intel.com>, avagin@gmail.com, Andrei Vagin <avagin@google.com>, Andy Lutomirski <luto@amacapital.net>, Dietmar Eggemann <dietmar.eggemann@arm.com>, Ingo Molnar <mingo@redhat.com>, Juri Lelli <juri.lelli@redhat.com>, Peter Oskolkov <posk@google.com>, Tycho Andersen <tycho@tycho.pizza>, Will Drewry <wad@chromium.org>, Vincent Guittot <vincent.guittot@linaro.org>
List-ID: <linux-kernel.vger.kernel.org> |
Series |
seccomp: add the synchronous mode for seccomp_unotify
|
|
Message
Andrei Vagin
March 8, 2023, 7:31 a.m. UTC
seccomp_unotify allows more privileged processes to perform actions on behalf
of less privileged processes.

In many cases, the workflow is fully synchronous: a target process triggers
a system call and passes control to a supervisor process, which handles the
system call and returns control to the target process. In this context,
"synchronous" means that only one process is running while the other one
waits.

The new WF_CURRENT_CPU flag advises the scheduler to move the wakee to
the current CPU. For such synchronous workflows, it makes context
switches a few times faster.

Right now, each interaction takes 12µs. With this patch, it takes about
3µs.

v2: clean up the first patch and add the test.
v3: update commit messages and a few fixes suggested by Kees Cook.
v4: update the third patch to avoid code duplication (suggested by
    Peter Zijlstra).
    Add the benchmark to the perf bench set.
v5: update the author email. No code changes.

Kees is ready to take this patch set, but wants to get Acks from the
sched folks.
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Peter Oskolkov <posk@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tycho Andersen <tycho@tycho.pizza>
Cc: Will Drewry <wad@chromium.org>
Cc: Vincent Guittot <vincent.guittot@linaro.org>

Andrei Vagin (4):
  seccomp: don't use semaphore and wait_queue together
  sched: add a few helpers to wake up tasks on the current cpu
  seccomp: add the synchronous mode for seccomp_unotify
  selftest/seccomp: add a new test for the sync mode of
    seccomp_user_notify

Peter Oskolkov (1):
  sched: add WF_CURRENT_CPU and externise ttwu

 include/linux/completion.h | 1 +
 include/linux/swait.h | 2 +-
 include/linux/wait.h | 3 +
 include/uapi/linux/seccomp.h | 4 +
 kernel/sched/completion.c | 26 ++-
 kernel/sched/core.c | 5 +-
 kernel/sched/fair.c | 4 +
 kernel/sched/sched.h | 13 +-
 kernel/sched/swait.c | 8 +-
 kernel/sched/wait.c | 5 +
 kernel/seccomp.c | 72 +++++++-
 tools/arch/x86/include/uapi/asm/unistd_32.h | 3 +
 tools/arch/x86/include/uapi/asm/unistd_64.h | 3 +
 tools/perf/bench/Build | 1 +
 tools/perf/bench/bench.h | 1 +
 tools/perf/bench/sched-seccomp-notify.c | 167 ++++++++++++++++++
 tools/perf/builtin-bench.c | 1 +
 tools/testing/selftests/seccomp/seccomp_bpf.c | 55 ++++++
 18 files changed, 346 insertions(+), 28 deletions(-)
 create mode 100644 tools/perf/bench/sched-seccomp-notify.c
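For readers who want to see what opting in to the synchronous mode might look like from the supervisor side, here is a minimal sketch. The cover letter does not spell out the user-space interface; the names SECCOMP_IOCTL_NOTIF_SET_FLAGS and SECCOMP_USER_NOTIF_FD_SYNC_WAKE_UP and their fallback values are an assumption based on the mainline uapi header, and notif_enable_sync_wakeup() is a hypothetical helper name.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>

/*
 * Fallback definitions for headers that predate the sync mode.
 * ASSUMPTION: these mirror <linux/seccomp.h> in mainline; they are not
 * quoted anywhere in this thread.
 */
#ifndef SECCOMP_USER_NOTIF_FD_SYNC_WAKE_UP
#define SECCOMP_USER_NOTIF_FD_SYNC_WAKE_UP (1UL << 0)
#endif
#ifndef SECCOMP_IOCTL_NOTIF_SET_FLAGS
#define SECCOMP_IOCTL_NOTIF_SET_FLAGS _IOW('!', 4, uint64_t)
#endif

/*
 * Hypothetical helper: ask the kernel to wake the supervisor on the
 * CPU of the task that triggered the notification.
 * notify_fd is the seccomp user-notification listener fd.
 * Returns 0 on success or a negative errno value.
 */
int notif_enable_sync_wakeup(int notify_fd)
{
	/* The flags are passed by value, not through a pointer. */
	if (ioctl(notify_fd, SECCOMP_IOCTL_NOTIF_SET_FLAGS,
		  (unsigned long)SECCOMP_USER_NOTIF_FD_SYNC_WAKE_UP) < 0)
		return -errno;
	return 0;
}
```

A supervisor would presumably call this once, right after it obtains the listener fd and before it enters its receive/respond loop. On kernels without this series the ioctl is expected to fail, so callers should treat an error as "feature not available" rather than as fatal.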
Comments
On Tue, Mar 7, 2023 at 11:32 PM Andrei Vagin <avagin@google.com> wrote:
>
> seccomp_unotify allows more privileged processes do actions on behalf
> of less privileged processes.
>
> In many cases, the workflow is fully synchronous. It means a target
> process triggers a system call and passes controls to a supervisor
> process that handles the system call and returns controls back to the
> target process. In this context, "synchronous" means that only one
> process is running and another one is waiting.
>
> The new WF_CURRENT_CPU flag advises the scheduler to move the wakee to
> the current CPU. For such synchronous workflows, it makes context
> switches a few times faster.
>
> Right now, each interaction takes 12µs. With this patch, it takes about
> 3µs.
>
> v2: clean up the first patch and add the test.
> v3: update commit messages and a few fixes suggested by Kees Cook.
> v4: update the third patch to avoid code duplications (suggested by
>     Peter Zijlstra)
>     Add the benchmark to the perf bench set.
> v5: Update the author email. No code changes.
>
> Kees is ready to take this patch set, but wants to get Acks from the
> sched folks.

Peter, could you review the second and third patches of this series?

Thanks,
Andrei
On Tue, Mar 07, 2023 at 11:31:55PM -0800, Andrei Vagin wrote:

> Kees is ready to take this patch set, but wants to get Acks from the
> sched folks.
>

> Andrei Vagin (4):
>   seccomp: don't use semaphore and wait_queue together
>   sched: add a few helpers to wake up tasks on the current cpu
>   seccomp: add the synchronous mode for seccomp_unotify
>   selftest/seccomp: add a new test for the sync mode of
>     seccomp_user_notify
>
> Peter Oskolkov (1):
>   sched: add WF_CURRENT_CPU and externise ttwu

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
On Mon, Mar 27, 2023 at 3:27 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Tue, Mar 07, 2023 at 11:31:55PM -0800, Andrei Vagin wrote:
>
> > Kees is ready to take this patch set, but wants to get Acks from the
> > sched folks.
> >
>
> > Andrei Vagin (4):
> >   seccomp: don't use semaphore and wait_queue together
> >   sched: add a few helpers to wake up tasks on the current cpu
> >   seccomp: add the synchronous mode for seccomp_unotify
> >   selftest/seccomp: add a new test for the sync mode of
> >     seccomp_user_notify
> >
> > Peter Oskolkov (1):
> >   sched: add WF_CURRENT_CPU and externise ttwu
>
> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Kees,

Could you look at this patch set? You wrote in reply to one of the previous
versions that you are ready to take it if the sched maintainers approve it.
There are no major changes since then. The sched-related part has
been cleaned up in accordance with Peter's comments, and I moved the perf
test to the perf tool.

Thanks,
Andrei
On April 3, 2023 11:32:00 AM PDT, Andrei Vagin <avagin@gmail.com> wrote:
>On Mon, Mar 27, 2023 at 3:27 AM Peter Zijlstra <peterz@infradead.org> wrote:
>>
>> On Tue, Mar 07, 2023 at 11:31:55PM -0800, Andrei Vagin wrote:
>>
>> > Kees is ready to take this patch set, but wants to get Acks from the
>> > sched folks.
>> >
>>
>> > Andrei Vagin (4):
>> >   seccomp: don't use semaphore and wait_queue together
>> >   sched: add a few helpers to wake up tasks on the current cpu
>> >   seccomp: add the synchronous mode for seccomp_unotify
>> >   selftest/seccomp: add a new test for the sync mode of
>> >     seccomp_user_notify
>> >
>> > Peter Oskolkov (1):
>> >   sched: add WF_CURRENT_CPU and externise ttwu
>>
>> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>
>Kees,
>
>Could you look at this patch set? You wrote to one of the previous
>versions that you are ready to take it if sched maintainers approve it.
>Here is no major changes from that moment. The sched-related part has
>been cleaned up according with Peter's comments, and I moved the perf
>test to the perf tool.

Hi!

Yes, thanks for keeping this going! I'm catching up after some vacation, but this is on my TODO list. :)

-Kees
> Add complete_on_current_cpu, wake_up_poll_on_current_cpu helpers to wake
> up tasks on the current CPU.
> These two helpers are useful when the task needs to make a synchronous context
> switch to another task. In this context, synchronous means it wakes up the
> target task and falls asleep right after that.
> One example of such workloads is seccomp user notifies. This mechanism allows
> the supervisor process handles system calls on behalf of a target process.
> While the supervisor is handling an intercepted system call, the target process
> will be blocked in the kernel, waiting for a response to come back.
> On-CPU context switches are much faster than regular ones.
> Signed-off-by: Andrei Vagin <avagin@google.com>

Avoiding CPU switches is very desirable for fuse. I'm working on fuse over
uring with per-core queues. In my current branch, running a single-threaded
bonnie++, I get:

- about 9000 creates/s when I bind the process to a core,
- about 7000 creates/s when I set SCHED_IDLE for the ring threads,
- back to 9000 creates/s with SCHED_IDLE plus disabling CPU migration in
  fs/fuse/dev.c request_wait_answer() before going into the waitq and
  enabling it again after waking up.

I reported this a few weeks back

https://lore.kernel.org/lkml/d0ed1dbd-1b7e-bf98-65c0-7f61dd1a3228@ddn.com/

and was pointed to your and Prateek's patch series. I'm now going
through these series. The interesting part is that a few weeks ago I didn't
need SCHED_IDLE; just disabling/enabling migration before/after waking up
was enough.

[...]

> EXPORT_SYMBOL(swake_up_one);
> diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
> index 133b74730738..47803a0b8d5d 100644
> --- a/kernel/sched/wait.c
> +++ b/kernel/sched/wait.c
> @@ -161,6 +161,11 @@ int __wake_up(struct wait_queue_head *wq_head, unsigned int mode,
> }
> EXPORT_SYMBOL(__wake_up);
> +void __wake_up_on_current_cpu(struct wait_queue_head *wq_head, unsigned int mode, void *key)
> +{
> +	__wake_up_common_lock(wq_head, mode, 1, WF_CURRENT_CPU, key);
> +}

I'm about to test this instead of migrate_disable/migrate_enable, but the
symbol needs to be exported. Any objection to doing that right from the
beginning in your patch?

Thanks,
Bernd
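As a rough user-space illustration of the "bind the process to a core" case mentioned above, the sketch below pins the calling thread to whichever CPU it is currently running on. This is only an analogy: it is not the in-kernel migrate_disable()/migrate_enable() change, nor the WF_CURRENT_CPU wakeup path, and pin_to_current_cpu() is a hypothetical helper name. It assumes Linux with glibc.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* for sched_getcpu() and the CPU_* macros */
#endif
#include <errno.h>
#include <sched.h>

/*
 * Hypothetical helper: restrict the calling thread to the CPU it is
 * running on right now, so later wakeups cannot migrate it.
 * Returns the CPU number on success or a negative errno value.
 */
int pin_to_current_cpu(void)
{
	cpu_set_t set;
	int cpu = sched_getcpu();

	if (cpu < 0)
		return -errno;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);

	/* pid 0 means "the calling thread". */
	if (sched_setaffinity(0, sizeof(set), &set) < 0)
		return -errno;

	return cpu;
}
```

Pinning from user space is a blunt tool compared with a scheduler hint: it removes the scheduler's freedom for the lifetime of the affinity mask, whereas the helpers discussed in this thread only influence where a single wakeup lands.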
On Wed, Apr 5, 2023 at 8:19 PM Kees Cook <kees@kernel.org> wrote:
>
> On April 3, 2023 11:32:00 AM PDT, Andrei Vagin <avagin@gmail.com> wrote:
> >On Mon, Mar 27, 2023 at 3:27 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >>
> >> On Tue, Mar 07, 2023 at 11:31:55PM -0800, Andrei Vagin wrote:
> >>
> >> > Kees is ready to take this patch set, but wants to get Acks from the
> >> > sched folks.
> >> >
> >>
> >> > Andrei Vagin (4):
> >> >   seccomp: don't use semaphore and wait_queue together
> >> >   sched: add a few helpers to wake up tasks on the current cpu
> >> >   seccomp: add the synchronous mode for seccomp_unotify
> >> >   selftest/seccomp: add a new test for the sync mode of
> >> >     seccomp_user_notify
> >> >
> >> > Peter Oskolkov (1):
> >> >   sched: add WF_CURRENT_CPU and externise ttwu
> >>
> >> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> >
> >Kees,
> >
> >Could you look at this patch set? You wrote to one of the previous
> >versions that you are ready to take it if sched maintainers approve it.
> >Here is no major changes from that moment. The sched-related part has
> >been cleaned up according with Peter's comments, and I moved the perf
> >test to the perf tool.
>
> Hi!
>
> Yes, thanks for keeping this going! I'm catching up after some vacation, but this is on my TODO list. :)

Hi Kees. Do you have any updates on this series?

>
> -Kees
>
>
> --
> Kees Cook
On Wed, Jun 28, 2023 at 11:44:02AM -0700, Andrei Vagin wrote:
> On Wed, Apr 5, 2023 at 8:19 PM Kees Cook <kees@kernel.org> wrote:
> >
> > On April 3, 2023 11:32:00 AM PDT, Andrei Vagin <avagin@gmail.com> wrote:
> > >On Mon, Mar 27, 2023 at 3:27 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > >>
> > >> On Tue, Mar 07, 2023 at 11:31:55PM -0800, Andrei Vagin wrote:
> > >>
> > >> > Kees is ready to take this patch set, but wants to get Acks from the
> > >> > sched folks.
> > >> >
> > >>
> > >> > Andrei Vagin (4):
> > >> >   seccomp: don't use semaphore and wait_queue together
> > >> >   sched: add a few helpers to wake up tasks on the current cpu
> > >> >   seccomp: add the synchronous mode for seccomp_unotify
> > >> >   selftest/seccomp: add a new test for the sync mode of
> > >> >     seccomp_user_notify
> > >> >
> > >> > Peter Oskolkov (1):
> > >> >   sched: add WF_CURRENT_CPU and externise ttwu
> > >>
> > >> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > >
> > >Kees,
> > >
> > >Could you look at this patch set? You wrote to one of the previous
> > >versions that you are ready to take it if sched maintainers approve it.
> > >Here is no major changes from that moment. The sched-related part has
> > >been cleaned up according with Peter's comments, and I moved the perf
> > >test to the perf tool.
> >
> > Hi!
> >
> > Yes, thanks for keeping this going! I'm catching up after some vacation, but this is on my TODO list. :)
>
> Hi Kees. Do you have any updates on this series?

Apologies for the delay! I've added this to the seccomp tree -- it should
show up in -next soon.

-Kees