Message ID | 20230328221644.803272-5-yosryahmed@google.com |
---|---|
State | New |
Headers |
Date: Tue, 28 Mar 2023 22:16:39 +0000
From: Yosry Ahmed <yosryahmed@google.com>
To: Tejun Heo <tj@kernel.org>, Josef Bacik <josef@toxicpanda.com>, Jens Axboe <axboe@kernel.dk>, Zefan Li <lizefan.x@bytedance.com>, Johannes Weiner <hannes@cmpxchg.org>, Michal Hocko <mhocko@kernel.org>, Roman Gushchin <roman.gushchin@linux.dev>, Shakeel Butt <shakeelb@google.com>, Muchun Song <muchun.song@linux.dev>, Andrew Morton <akpm@linux-foundation.org>, Michal Koutný <mkoutny@suse.com>
Cc: Vasily Averin <vasily.averin@linux.dev>, cgroups@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org, Yosry Ahmed <yosryahmed@google.com>
Subject: [PATCH v2 4/9] cgroup: rstat: add WARN_ON_ONCE() if flushing outside task context
Message-ID: <20230328221644.803272-5-yosryahmed@google.com>
In-Reply-To: <20230328221644.803272-1-yosryahmed@google.com>
References: <20230328221644.803272-1-yosryahmed@google.com>
X-Mailer: git-send-email 2.40.0.348.gf938b09366-goog |
Series | memcg: make rstat flushing irq and sleep |
Commit Message
Yosry Ahmed
March 28, 2023, 10:16 p.m. UTC
rstat flushing is too expensive to perform in irq context. The previous patch removed the only context that may invoke an rstat flush from irq context; add a WARN_ON_ONCE() to detect future violations, or those that we are not aware of.

Ideally, we wouldn't flush with irqs disabled either, but we have one context today that does so in mem_cgroup_usage(). Forbid callers from irq context for now, and hopefully we can also forbid callers with irqs disabled in the future when we can get rid of this callsite.

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
---
 kernel/cgroup/rstat.c | 2 ++
 1 file changed, 2 insertions(+)
Comments
On Tue 28-03-23 22:16:39, Yosry Ahmed wrote: > rstat flushing is too expensive to perform in irq context. > The previous patch removed the only context that may invoke an rstat > flush from irq context, add a WARN_ON_ONCE() to detect future > violations, or those that we are not aware of. > > Ideally, we wouldn't flush with irqs disabled either, but we have one > context today that does so in mem_cgroup_usage(). Forbid callers from > irq context for now, and hopefully we can also forbid callers with irqs > disabled in the future when we can get rid of this callsite. I am sorry to be late to the discussion. I wanted to follow up on Johannes reply in the previous version but you are too fast ;) I do agree that this looks rather arbitrary. You do not explain how the warning actually helps. Is the intention to be really verbose to the kernel log when somebody uses this interface from the IRQ context and get bug reports? What about configurations with panic on warn? Do we really want to crash their systems for something like that? > Signed-off-by: Yosry Ahmed <yosryahmed@google.com> > Reviewed-by: Shakeel Butt <shakeelb@google.com> > --- > kernel/cgroup/rstat.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c > index d3252b0416b6..c2571939139f 100644 > --- a/kernel/cgroup/rstat.c > +++ b/kernel/cgroup/rstat.c > @@ -176,6 +176,8 @@ static void cgroup_rstat_flush_locked(struct cgroup *cgrp, bool may_sleep) > { > int cpu; > > + /* rstat flushing is too expensive for irq context */ > + WARN_ON_ONCE(!in_task()); > lockdep_assert_held(&cgroup_rstat_lock); > > for_each_possible_cpu(cpu) { > -- > 2.40.0.348.gf938b09366-goog
On Wed, Mar 29, 2023 at 4:22 AM Michal Hocko <mhocko@suse.com> wrote: > > On Tue 28-03-23 22:16:39, Yosry Ahmed wrote: > > rstat flushing is too expensive to perform in irq context. > > The previous patch removed the only context that may invoke an rstat > > flush from irq context, add a WARN_ON_ONCE() to detect future > > violations, or those that we are not aware of. > > > > Ideally, we wouldn't flush with irqs disabled either, but we have one > > context today that does so in mem_cgroup_usage(). Forbid callers from > > irq context for now, and hopefully we can also forbid callers with irqs > > disabled in the future when we can get rid of this callsite. > > I am sorry to be late to the discussion. I wanted to follow up on > Johannes reply in the previous version but you are too fast ;) > > I do agree that this looks rather arbitrary. You do not explain how the > warning actually helps. Is the intention to be really verbose to the > kernel log when somebody uses this interface from the IRQ context and > get bug reports? What about configurations with panic on warn? Do we > really want to crash their systems for something like that? Thanks for taking a look, Michal! The ultimate goal is not to flush in irq context or with irqs disabled, as in some cases it causes irqs to be disabled for a long time, as flushing is an expensive operation. The previous patch in the series should have removed the only context that flushes in irq context, and the purpose of the WARN_ON_ONCE() is to catch future uses or uses that we might have missed. There is still one code path that flushes with irqs disabled (also mem_cgroup_usage()), and we cannot remove this just yet; we need to deprecate usage threshold events for root to do that. So we cannot enforce not flushing with irqs disabled yet. So basically the patch is trying to enforce what we have now, not flushing in irq context, and hopefully at some point we will also be able to enforce not flushing with irqs disabled. If WARN_ON_ONCE() is the wrong tool for this, please let me know. > > > Signed-off-by: Yosry Ahmed <yosryahmed@google.com> > > Reviewed-by: Shakeel Butt <shakeelb@google.com> > > --- > > kernel/cgroup/rstat.c | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c > > index d3252b0416b6..c2571939139f 100644 > > --- a/kernel/cgroup/rstat.c > > +++ b/kernel/cgroup/rstat.c > > @@ -176,6 +176,8 @@ static void cgroup_rstat_flush_locked(struct cgroup *cgrp, bool may_sleep) > > { > > int cpu; > > > > + /* rstat flushing is too expensive for irq context */ > > + WARN_ON_ONCE(!in_task()); > > lockdep_assert_held(&cgroup_rstat_lock); > > > > for_each_possible_cpu(cpu) { > > -- > > 2.40.0.348.gf938b09366-goog > > -- > Michal Hocko > SUSE Labs
On Wed, Mar 29, 2023 at 11:41:39AM -0700, Yosry Ahmed wrote: > On Wed, Mar 29, 2023 at 4:22 AM Michal Hocko <mhocko@suse.com> wrote: > > > > On Tue 28-03-23 22:16:39, Yosry Ahmed wrote: > > > rstat flushing is too expensive to perform in irq context. > > > The previous patch removed the only context that may invoke an rstat > > > flush from irq context, add a WARN_ON_ONCE() to detect future > > > violations, or those that we are not aware of. > > > > > > Ideally, we wouldn't flush with irqs disabled either, but we have one > > > context today that does so in mem_cgroup_usage(). Forbid callers from > > > irq context for now, and hopefully we can also forbid callers with irqs > > > disabled in the future when we can get rid of this callsite. > > > > I am sorry to be late to the discussion. I wanted to follow up on > > Johannes reply in the previous version but you are too fast ;) > > > > I do agree that this looks rather arbitrary. You do not explain how the > > warning actually helps. Is the intention to be really verbose to the > > kernel log when somebody uses this interface from the IRQ context and > > get bug reports? What about configurations with panic on warn? Do we > > really want to crash their systems for something like that? > > Thanks for taking a look, Michal! > > The ultimate goal is not to flush in irq context or with irqs > disabled, as in some cases it causes irqs to be disabled for a long > time, as flushing is an expensive operation. The previous patch in the > series should have removed the only context that flushes in irq > context, and the purpose of the WARN_ON_ONCE() is to catch future uses > or uses that we might have missed. > > There is still one code path that flushes with irqs disabled (also > mem_cgroup_usage()), and we cannot remove this just yet; we need to > deprecate usage threshold events for root to do that. So we cannot > enforce not flushing with irqs disabled yet. > > So basically the patch is trying to enforce what we have now, not > flushing in irq context, and hopefully at some point we will also be > able to enforce not flushing with irqs disabled. > > If WARN_ON_ONCE() is the wrong tool for this, please let me know. > If I understand Michal's concern, the question is should be start with pr_warn_once() instead of WARN_ON_ONCE() and I think yes we should start with pr_warn_once().
On Wed 29-03-23 19:20:59, Shakeel Butt wrote: > On Wed, Mar 29, 2023 at 11:41:39AM -0700, Yosry Ahmed wrote: > > On Wed, Mar 29, 2023 at 4:22 AM Michal Hocko <mhocko@suse.com> wrote: > > > > > > On Tue 28-03-23 22:16:39, Yosry Ahmed wrote: > > > > rstat flushing is too expensive to perform in irq context. > > > > The previous patch removed the only context that may invoke an rstat > > > > flush from irq context, add a WARN_ON_ONCE() to detect future > > > > violations, or those that we are not aware of. > > > > > > > > Ideally, we wouldn't flush with irqs disabled either, but we have one > > > > context today that does so in mem_cgroup_usage(). Forbid callers from > > > > irq context for now, and hopefully we can also forbid callers with irqs > > > > disabled in the future when we can get rid of this callsite. > > > > > > I am sorry to be late to the discussion. I wanted to follow up on > > > Johannes reply in the previous version but you are too fast ;) > > > > > > I do agree that this looks rather arbitrary. You do not explain how the > > > warning actually helps. Is the intention to be really verbose to the > > > kernel log when somebody uses this interface from the IRQ context and > > > get bug reports? What about configurations with panic on warn? Do we > > > really want to crash their systems for something like that? > > > > Thanks for taking a look, Michal! > > > > The ultimate goal is not to flush in irq context or with irqs > > disabled, as in some cases it causes irqs to be disabled for a long > > time, as flushing is an expensive operation. The previous patch in the > > series should have removed the only context that flushes in irq > > context, and the purpose of the WARN_ON_ONCE() is to catch future uses > > or uses that we might have missed. > > > > There is still one code path that flushes with irqs disabled (also > > mem_cgroup_usage()), and we cannot remove this just yet; we need to > > deprecate usage threshold events for root to do that. So we cannot > > enforce not flushing with irqs disabled yet. > > > > So basically the patch is trying to enforce what we have now, not > > flushing in irq context, and hopefully at some point we will also be > > able to enforce not flushing with irqs disabled. > > > > If WARN_ON_ONCE() is the wrong tool for this, please let me know. > > > > If I understand Michal's concern, the question is should be start with > pr_warn_once() instead of WARN_ON_ONCE() and I think yes we should start > with pr_warn_once(). Yes, I do not really like the WARN_ON here. It is an overkill. pr_warn would much less intrusive but potentially incomplete because you won't know who that offender is. So if you really care about those then you would need to call dump_stack as well. So the real question is. Do we really care so deeply? After all somebody might be calling this from within a spin lock or irq disabled section resulting in a similar situation without noticing.
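For reference, the less intrusive variant being discussed here, pr_warn_once() plus an explicit dump_stack() so the offender can still be identified, would look roughly like the minimal sketch below. The message text and the once-handling are illustrative and not taken from any posted patch.

	/* Sketch only: complain about non-task-context flushing without the
	 * WARN_ON() machinery (no taint, no panic under panic_on_warn), but
	 * print a backtrace the first time so the offender is identifiable. */
	if (unlikely(!in_task())) {
		static bool warned;

		if (!warned) {
			warned = true;
			pr_warn("cgroup: rstat flush called outside task context\n");
			dump_stack();
		}
	}
	lockdep_assert_held(&cgroup_rstat_lock);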
On Thu, Mar 30, 2023 at 12:06 AM Michal Hocko <mhocko@suse.com> wrote: > > On Wed 29-03-23 19:20:59, Shakeel Butt wrote: > > On Wed, Mar 29, 2023 at 11:41:39AM -0700, Yosry Ahmed wrote: > > > On Wed, Mar 29, 2023 at 4:22 AM Michal Hocko <mhocko@suse.com> wrote: > > > > > > > > On Tue 28-03-23 22:16:39, Yosry Ahmed wrote: > > > > > rstat flushing is too expensive to perform in irq context. > > > > > The previous patch removed the only context that may invoke an rstat > > > > > flush from irq context, add a WARN_ON_ONCE() to detect future > > > > > violations, or those that we are not aware of. > > > > > > > > > > Ideally, we wouldn't flush with irqs disabled either, but we have one > > > > > context today that does so in mem_cgroup_usage(). Forbid callers from > > > > > irq context for now, and hopefully we can also forbid callers with irqs > > > > > disabled in the future when we can get rid of this callsite. > > > > > > > > I am sorry to be late to the discussion. I wanted to follow up on > > > > Johannes reply in the previous version but you are too fast ;) > > > > > > > > I do agree that this looks rather arbitrary. You do not explain how the > > > > warning actually helps. Is the intention to be really verbose to the > > > > kernel log when somebody uses this interface from the IRQ context and > > > > get bug reports? What about configurations with panic on warn? Do we > > > > really want to crash their systems for something like that? > > > > > > Thanks for taking a look, Michal! > > > > > > The ultimate goal is not to flush in irq context or with irqs > > > disabled, as in some cases it causes irqs to be disabled for a long > > > time, as flushing is an expensive operation. The previous patch in the > > > series should have removed the only context that flushes in irq > > > context, and the purpose of the WARN_ON_ONCE() is to catch future uses > > > or uses that we might have missed. > > > > > > There is still one code path that flushes with irqs disabled (also > > > mem_cgroup_usage()), and we cannot remove this just yet; we need to > > > deprecate usage threshold events for root to do that. So we cannot > > > enforce not flushing with irqs disabled yet. > > > > > > So basically the patch is trying to enforce what we have now, not > > > flushing in irq context, and hopefully at some point we will also be > > > able to enforce not flushing with irqs disabled. > > > > > > If WARN_ON_ONCE() is the wrong tool for this, please let me know. > > > > > > > If I understand Michal's concern, the question is should be start with > > pr_warn_once() instead of WARN_ON_ONCE() and I think yes we should start > > with pr_warn_once(). > > Yes, I do not really like the WARN_ON here. It is an overkill. pr_warn > would much less intrusive but potentially incomplete because you won't > know who that offender is. So if you really care about those then you > would need to call dump_stack as well. > > So the real question is. Do we really care so deeply? After all somebody > might be calling this from within a spin lock or irq disabled section > resulting in a similar situation without noticing. There are discussions in [1] about making atomic rstat flush not disable irqs throughout the process, so in that case it would only result in a similar situation if the caller has irq disabled. The only caller that might have irq disabled is the same caller that might be in irq context before this series: mem_cgroup_usage(). 
On that note, and while I have your attention, I was wondering if we can eliminate the flush call completely from mem_cgroup_usage(), and read the global stats counters for root memcg instead of the root counters. There might be subtle differences, but the root memcg usage isn't super accurate now anyway (e.g. it doesn't include kernel memory). With that removed, no callers to rstat flushing would be from irq context or have irqs disabled. There will only be one atomic flusher (mem_cgroup_wb_stats()), and we can proceed with [1] if it causes a problem. What do you think? [1] https://lore.kernel.org/lkml/CAJD7tkZrp=4zWvjE9_010TAG1T_crCbf9P64UzJABspgcrGPKg@mail.gmail.com/ > -- > Michal Hocko > SUSE Labs
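For context, the idea floated above would amount to something like the sketch below for the root memcg: derive usage from the global VM counters instead of flushing rstat. The helper name is made up, the counter choice only mirrors what the root path sums today, and whether the result is close enough is exactly the open question in this thread; this is not what mem_cgroup_usage() actually does.

/* Illustrative only: approximate root memcg usage from global counters,
 * avoiding the rstat flush entirely (file + anon pages, plus used swap
 * when requested). */
static unsigned long root_usage_from_global_counters(bool swap)
{
	unsigned long val;

	val = global_node_page_state(NR_FILE_PAGES) +
	      global_node_page_state(NR_ANON_MAPPED);
	if (swap)
		val += total_swap_pages - get_nr_swap_pages();

	return val;
}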
On Thu 30-03-23 00:13:16, Yosry Ahmed wrote: > On Thu, Mar 30, 2023 at 12:06 AM Michal Hocko <mhocko@suse.com> wrote: [...] > > So the real question is. Do we really care so deeply? After all somebody > > might be calling this from within a spin lock or irq disabled section > > resulting in a similar situation without noticing. > > There are discussions in [1] about making atomic rstat flush not > disable irqs throughout the process, so in that case it would only > result in a similar situation if the caller has irq disabled. The only > caller that might have irq disabled is the same caller that might be > in irq context before this series: mem_cgroup_usage(). > > On that note, and while I have your attention, I was wondering if we > can eliminate the flush call completely from mem_cgroup_usage(), and > read the global stats counters for root memcg instead of the root > counters. There might be subtle differences, but the root memcg usage > isn't super accurate now anyway (e.g. it doesn't include kernel > memory). root memcg stats are imprecise indeed and I have to admit I do not really know why we are adding more work for that case. I have tried to dig into git history for that yesterday but couldn't find a satisfying answer. It goes all the way down to 2d146aa3aa842 which has done the switch to rstat. Maybe Johannes remembers. Anyway, back to this particular patch. I still do not see strong reasons to be verbose about !in_task case so I would just drop this patch.
On Thu, Mar 30, 2023 at 12:49 AM Michal Hocko <mhocko@suse.com> wrote: > > On Thu 30-03-23 00:13:16, Yosry Ahmed wrote: > > On Thu, Mar 30, 2023 at 12:06 AM Michal Hocko <mhocko@suse.com> wrote: > [...] > > > So the real question is. Do we really care so deeply? After all somebody > > > might be calling this from within a spin lock or irq disabled section > > > resulting in a similar situation without noticing. > > > > There are discussions in [1] about making atomic rstat flush not > > disable irqs throughout the process, so in that case it would only > > result in a similar situation if the caller has irq disabled. The only > > caller that might have irq disabled is the same caller that might be > > in irq context before this series: mem_cgroup_usage(). > > > > On that note, and while I have your attention, I was wondering if we > > can eliminate the flush call completely from mem_cgroup_usage(), and > > read the global stats counters for root memcg instead of the root > > counters. There might be subtle differences, but the root memcg usage > > isn't super accurate now anyway (e.g. it doesn't include kernel > > memory). > > root memcg stats are imprecise indeed and I have to admit I do not > really know why we are adding more work for that case. I have tried to > dig into git history for that yesterday but couldn't find a satisfying > answer. It goes all the way down to 2d146aa3aa842 which has done the > switch to rstat. Maybe Johannes remembers. I think it goes back even before that. Even before rstat, we used to get the root usage by iterating over the hierarchy. Maybe we didn't have the global counters at some point or they weren't in line with the root memcg (e.g. use_hierarchy = 0). It would be great if we can just use the global counters here. Hopefully Johannes can confirm or deny this. > > Anyway, back to this particular patch. I still do not see strong reasons > to be verbose about !in_task case so I would just drop this patch. I will drop this patch in the next iteration. If I implement a patch that makes rstat flushing not disable irqs all throughout (like [1]), whether part of this series or not, and we remove flushing from mem_cgroup_usage(), then at this point: - No existing flushers will be disabling irqs. - rstat flushing itself will not be disabling irqs throughout the entire flush. If we achieve that, do you think it makes sense to add WARN_ON_ONCE(irqs_disabled()) instead to prevent future users from flushing while disabling irqs or in irq context? All I am trying to achieve here is make sure we don't regress. I don't want to minimize the current atomic flushers now only to have them increase later. [1] https://lore.kernel.org/lkml/CAJD7tkZrp=4zWvjE9_010TAG1T_crCbf9P64UzJABspgcrGPKg@mail.gmail.com/ > -- > Michal Hocko > SUSE Labs
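Concretely, the stricter check Yosry floats here, which would only make sense once no legitimate caller flushes with irqs disabled, would be a one-line variant in the same spot. This is illustrative only and not part of any posted patch.

	/* Illustrative only: forbid flushing with irqs disabled altogether;
	 * unlike !in_task(), this also catches callers running under
	 * local_irq_save() or spin_lock_irq(). */
	WARN_ON_ONCE(irqs_disabled());
	lockdep_assert_held(&cgroup_rstat_lock);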
On Thu 30-03-23 01:06:26, Yosry Ahmed wrote: [...] > If we achieve that, do you think it makes sense to add > WARN_ON_ONCE(irqs_disabled()) instead to prevent future users from > flushing while disabling irqs or in irq context? WARN_ON (similar to BUG_ON) will not prevent anybody from doing bad things. We already have means to shout about sleepable code being invoked from an atomic context and there is no reason to duplicate that. As I've said earlier WARN_ON might panic the system in some configurations (and yes they are used also in production systems - do not ask me why...). So please be careful about that and use that only when something really bad (yet recoverable) is going on.
On Thu, Mar 30, 2023 at 1:15 AM Michal Hocko <mhocko@suse.com> wrote: > > On Thu 30-03-23 01:06:26, Yosry Ahmed wrote: > [...] > > If we achieve that, do you think it makes sense to add > > WARN_ON_ONCE(irqs_disabled()) instead to prevent future users from > > flushing while disabling irqs or in irq context? > > WARN_ON (similar to BUG_ON) will not prevent anybody from doing bad > things. We already have means to shout about sleepable code being > invoked from an atomic context and there is no reason to duplicate that. > As I've said earlier WARN_ON might panic the system in some > configurations (and yes they are used also in production systems - do > not ask me why...). So please be careful about that and use that only > when something really bad (yet recoverable) is going on. Thanks for the information (I was about to ask why about production systems, but okay..). I will avoid WARN_ON completely. For the purposes of this series I will drop this patch anyway. Any idea how to shout about "hey this may take too long, why are you doing it with irqs disabled?!"? > > -- > Michal Hocko > SUSE Labs
On Thu 30-03-23 01:19:29, Yosry Ahmed wrote: > On Thu, Mar 30, 2023 at 1:15 AM Michal Hocko <mhocko@suse.com> wrote: > > > > On Thu 30-03-23 01:06:26, Yosry Ahmed wrote: > > [...] > > > If we achieve that, do you think it makes sense to add > > > WARN_ON_ONCE(irqs_disabled()) instead to prevent future users from > > > flushing while disabling irqs or in irq context? > > > > WARN_ON (similar to BUG_ON) will not prevent anybody from doing bad > > things. We already have means to shout about sleepable code being > > invoked from an atomic context and there is no reason to duplicate that. > > As I've said earlier WARN_ON might panic the system in some > > configurations (and yes they are used also in production systems - do > > not ask me why...). So please be careful about that and use that only > > when something really bad (yet recoverable) is going on. > > Thanks for the information (I was about to ask why about production > systems, but okay..). I will avoid WARN_ON completely. For the > purposes of this series I will drop this patch anyway. Thanks! People do strange things sometimes... > Any idea how to shout about "hey this may take too long, why are you > doing it with irqs disabled?!"? Well we have a hard lockup detector. It hits at a much higher stall by default but if you care about IRQ latencies in general then you likely want to lower. Another thing would be IRQ tracing. In any case this code path shouldn't be any special. Sure it can take long on large systems but I bet there are more of those.
On Thu, Mar 30, 2023 at 1:39 AM Michal Hocko <mhocko@suse.com> wrote: > > On Thu 30-03-23 01:19:29, Yosry Ahmed wrote: > > On Thu, Mar 30, 2023 at 1:15 AM Michal Hocko <mhocko@suse.com> wrote: > > > > > > On Thu 30-03-23 01:06:26, Yosry Ahmed wrote: > > > [...] > > > > If we achieve that, do you think it makes sense to add > > > > WARN_ON_ONCE(irqs_disabled()) instead to prevent future users from > > > > flushing while disabling irqs or in irq context? > > > > > > WARN_ON (similar to BUG_ON) will not prevent anybody from doing bad > > > things. We already have means to shout about sleepable code being > > > invoked from an atomic context and there is no reason to duplicate that. > > > As I've said earlier WARN_ON might panic the system in some > > > configurations (and yes they are used also in production systems - do > > > not ask me why...). So please be careful about that and use that only > > > when something really bad (yet recoverable) is going on. > > > > Thanks for the information (I was about to ask why about production > > systems, but okay..). I will avoid WARN_ON completely. For the > > purposes of this series I will drop this patch anyway. > > Thanks! People do strange things sometimes... > > > Any idea how to shout about "hey this may take too long, why are you > > doing it with irqs disabled?!"? > > Well we have a hard lockup detector. It hits at a much higher stall by > default but if you care about IRQ latencies in general then you likely > want to lower. Another thing would be IRQ tracing. In any case this code > path shouldn't be any special. Sure it can take long on large systems > but I bet there are more of those. We did see hard lockups in extreme cases, and we do have setups with "nmi_watchdog=panic" that will panic when this happens, so we would rather catch the code paths that can lead to that before it actually happens. Maybe we can add a primitive like might_sleep() for this, just food for thought. > -- > Michal Hocko > SUSE Labs
On Thu 30-03-23 01:53:38, Yosry Ahmed wrote:
[...]
> Maybe we can add a primitive like might_sleep() for this, just food for thought.
I do not think it is the correct to abuse might_sleep if the function
itself doesn't sleep. If it does might_sleep is already involved.
On Fri, Mar 31, 2023 at 4:02 AM Michal Hocko <mhocko@suse.com> wrote: > > On Thu 30-03-23 01:53:38, Yosry Ahmed wrote: > [...] > > Maybe we can add a primitive like might_sleep() for this, just food for thought. > > I do not think it is the correct to abuse might_sleep if the function > itself doesn't sleep. If it does might_sleep is already involved. Oh, sorry if I wasn't clear, I did not mean to reuse might_sleep() -- I meant introducing a new similar debug primitive that shouts if irqs are disabled. > -- > Michal Hocko > SUSE Labs
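What is being described would be a new debug annotation along the lines of the sketch below. The name, the Kconfig gate, and the exact behaviour are all hypothetical; nothing like this exists in the tree or in this series.

/* Hypothetical might_sleep()-style annotation: placed at the top of an
 * expensive function, it shouts (once per call site) if the function is
 * entered with irqs disabled. Name and config dependency are invented
 * for illustration. */
#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
#define might_take_long()						\
	do {								\
		if (unlikely(irqs_disabled()))				\
			pr_warn_once("%s called with irqs disabled\n",	\
				     __func__);				\
	} while (0)
#else
#define might_take_long()	do { } while (0)
#endif

A path like cgroup_rstat_flush_locked() could then open with might_take_long() to document and police the expectation without crashing anyone's system.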
On Fri 31-03-23 12:03:47, Yosry Ahmed wrote: > On Fri, Mar 31, 2023 at 4:02 AM Michal Hocko <mhocko@suse.com> wrote: > > > > On Thu 30-03-23 01:53:38, Yosry Ahmed wrote: > > [...] > > > Maybe we can add a primitive like might_sleep() for this, just food for thought. > > > > I do not think it is the correct to abuse might_sleep if the function > > itself doesn't sleep. If it does might_sleep is already involved. > > Oh, sorry if I wasn't clear, I did not mean to reuse might_sleep() -- > I meant introducing a new similar debug primitive that shouts if irqs > are disabled. This is circling back to original concerns about arbitrary decision to care about IRQs. Is this really any different from spin locks or preempt disabled critical sections preventing any scheduling and potentially triggereing soft lockups?
On Mon, Apr 3, 2023 at 1:38 AM Michal Hocko <mhocko@suse.com> wrote: > > On Fri 31-03-23 12:03:47, Yosry Ahmed wrote: > > On Fri, Mar 31, 2023 at 4:02 AM Michal Hocko <mhocko@suse.com> wrote: > > > > > > On Thu 30-03-23 01:53:38, Yosry Ahmed wrote: > > > [...] > > > > Maybe we can add a primitive like might_sleep() for this, just food for thought. > > > > > > I do not think it is the correct to abuse might_sleep if the function > > > itself doesn't sleep. If it does might_sleep is already involved. > > > > Oh, sorry if I wasn't clear, I did not mean to reuse might_sleep() -- > > I meant introducing a new similar debug primitive that shouts if irqs > > are disabled. > > This is circling back to original concerns about arbitrary decision to > care about IRQs. Is this really any different from spin locks or preempt > disabled critical sections preventing any scheduling and potentially > triggereing soft lockups? Not really, I am sure there are other code paths that may cause similar problems. At least we can start annotating them so that we don't regress them (e.g. by introducing a caller that disables irqs -- converting them into hard lockups). > -- > Michal Hocko > SUSE Labs
diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
index d3252b0416b6..c2571939139f 100644
--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -176,6 +176,8 @@ static void cgroup_rstat_flush_locked(struct cgroup *cgrp, bool may_sleep)
 {
 	int cpu;
 
+	/* rstat flushing is too expensive for irq context */
+	WARN_ON_ONCE(!in_task());
 	lockdep_assert_held(&cgroup_rstat_lock);
 
 	for_each_possible_cpu(cpu) {