Message ID | ZJtBrybavtb1x45V@tpad |
---|---|
State | New |
Series | fs/buffer.c: disable per-CPU buffer_head cache for isolated CPUs |
Commit Message
Marcelo Tosatti
June 27, 2023, 8:08 p.m. UTC
For certain types of applications (for example, PLC software or
RAN processing), it is necessary, upon the occurrence of an event,
to complete a certain task within a maximum amount of time (the
deadline).

One way to express this requirement is with a pair of numbers,
the deadline time and the execution time, where:

* deadline time: the length of time between the event and the deadline.
* execution time: the length of time it takes for processing of the
  event to occur on a particular hardware platform (uninterrupted).
The particular values depend on the use case. When the realtime
application executes in a virtualized guest, an IPI that must be
serviced in the host causes the following sequence of events:

1) VM-exit
2) execution of the IPI (and function call)
3) VM-entry

This sequence adds in excess of 50us of latency, as observed by
cyclictest (violating the latency requirement of a vRAN application
with a 1ms TTI, for example).
invalidate_bh_lrus() sends an IPI to each CPU that has a non-empty
per-CPU buffer_head cache:

on_each_cpu_cond(has_bh_in_lru, invalidate_bh_lru, NULL, 1);
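For reference, on_each_cpu_cond() evaluates a predicate per CPU and
only IPIs the CPUs for which it returns true. A sketch of that
predicate, approximately as it appears in fs/buffer.c around v6.4
(check the upstream tree for the authoritative version):

static bool has_bh_in_lru(int cpu, void *dummy)
{
	struct bh_lru *b = per_cpu_ptr(&bh_lrus, cpu);
	int i;

	/* Any occupied slot in this CPU's bh LRU means it must be IPI'd. */
	for (i = 0; i < BH_LRU_SIZE; i++) {
		if (b->bhs[i])
			return true;
	}
	return false;
}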
The performance when using the per-CPU LRU cache is as follows:
42 ns per __find_get_block
68 ns per __find_get_block_slow
Given that the main use cases for latency-sensitive applications
do not involve block I/O (the data necessary for program operation
is locked in RAM), disable the per-CPU buffer_head cache for
isolated CPUs.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
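The patch keys both the LRU install and lookup paths off
cpu_is_isolated(). For context, a sketch of that helper as it looks
in <linux/sched/isolation.h> around v6.4 (later kernels extend the
check, so treat this as approximate): a CPU counts as isolated when
it is excluded from either scheduler-domain or tick housekeeping.

static inline bool cpu_is_isolated(int cpu)
{
	/* True when the CPU is excluded from domain or tick housekeeping,
	 * e.g. via the isolcpus= or nohz_full= boot parameters. */
	return !housekeeping_test_cpu(cpu, HK_TYPE_DOMAIN) ||
	       !housekeeping_test_cpu(cpu, HK_TYPE_TICK);
}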
Comments
Ping, apparently there is no objection to this patch...

Christian, what is the preferred tree for integration?

On Tue, Jun 27, 2023 at 05:08:15PM -0300, Marcelo Tosatti wrote:
> For certain types of applications (for example, PLC software or
> RAN processing), it is necessary, upon the occurrence of an event,
> to complete a certain task within a maximum amount of time (the
> deadline).
>
> [...]
On Wed, Jul 26, 2023 at 11:31:26AM -0300, Marcelo Tosatti wrote:
>
> Ping, apparently there is no objection to this patch...
>
> Christian, what is the preferred tree for integration?

It'd be good if we could get an Ack from someone familiar with isolated
cpus for this; or just in general from someone who can ack this.
On Thu, Jul 27, 2023 at 11:18:11AM +0200, Christian Brauner wrote:
> On Wed, Jul 26, 2023 at 11:31:26AM -0300, Marcelo Tosatti wrote:
> >
> > Ping, apparently there is no objection to this patch...
> >
> > Christian, what is the preferred tree for integration?
>
> It'd be good if we could get an Ack from someone familiar with isolated
> cpus for this; or just in general from someone who can ack this.

Frederic?
On Tue, Jun 27, 2023 at 05:08:15PM -0300, Marcelo Tosatti wrote:
> [...]
>
> Given that the main use cases for latency-sensitive applications
> do not involve block I/O (the data necessary for program operation
> is locked in RAM), disable the per-CPU buffer_head cache for
> isolated CPUs.

So what happens if they ever do I/O then? Like if they need to do
some prep work before entering an isolated critical section?

Thanks.
On Sat, Aug 05, 2023 at 12:03:59AM +0200, Frederic Weisbecker wrote:
> On Tue, Jun 27, 2023 at 05:08:15PM -0300, Marcelo Tosatti wrote:
> > [...]
> >
> > Given that the main use cases for latency-sensitive applications
> > do not involve block I/O (the data necessary for program operation
> > is locked in RAM), disable the per-CPU buffer_head cache for
> > isolated CPUs.

Hi Frederic,

> So what happens if they ever do I/O then? Like if they need to do
> some prep work before entering an isolated critical section?

Then instead of going through the per-CPU LRU buffer_head cache
(__find_get_block), isolated CPUs will work as if their per-CPU
cache is always empty, going through the slowpath
(__find_get_block_slow). The algorithm is:

/*
 * Perform a pagecache lookup for the matching buffer. If it's there, refresh
 * it in the LRU and mark it as accessed. If it is not present then return
 * NULL
 */
struct buffer_head *
__find_get_block(struct block_device *bdev, sector_t block, unsigned size)
{
	struct buffer_head *bh = lookup_bh_lru(bdev, block, size);

	if (bh == NULL) {
		/* __find_get_block_slow will mark the page accessed */
		bh = __find_get_block_slow(bdev, block);
		if (bh)
			bh_lru_install(bh);
	} else
		touch_buffer(bh);

	return bh;
}
EXPORT_SYMBOL(__find_get_block);

I think the performance difference between the per-CPU LRU cache
vs __find_get_block_slow was much more significant when the cache
was introduced. Nowadays it's only 26 ns (moreover, modern filesystems
do not use buffer_heads).

> Thanks.

Thank you for the review.
On Fri, Aug 04, 2023 at 08:54:37PM -0300, Marcelo Tosatti wrote:
> > So what happens if they ever do I/O then? Like if they need to do
> > some prep work before entering an isolated critical section?
>
> Then instead of going through the per-CPU LRU buffer_head cache
> (__find_get_block), isolated CPUs will work as if their per-CPU
> cache is always empty, going through the slowpath
> (__find_get_block_slow).
>
> [...]
>
> I think the performance difference between the per-CPU LRU cache
> vs __find_get_block_slow was much more significant when the cache
> was introduced. Nowadays it's only 26 ns (moreover, modern filesystems
> do not use buffer_heads).

Sounds good then!

Acked-by: Frederic Weisbecker <frederic@kernel.org>

Thanks!
On Tue, 27 Jun 2023 17:08:15 -0300, Marcelo Tosatti wrote:
> For certain types of applications (for example, PLC software or
> RAN processing), it is necessary, upon the occurrence of an event,
> to complete a certain task within a maximum amount of time (the
> deadline).
>
> [...]

Applied to the vfs.misc branch of the vfs/vfs.git tree.
Patches in the vfs.misc branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review
in a new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though
the patch has now been applied. If possible patch trailers will be
updated.

Note that commit hashes shown below are subject to change due to
rebase, trailer updates or similar. If in doubt, please check the
listed branch.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.misc

[1/1] fs/buffer.c: disable per-CPU buffer_head cache for isolated CPUs
      https://git.kernel.org/vfs/vfs/c/9ed7cfdf38b8
diff --git a/fs/buffer.c b/fs/buffer.c
index a7fc561758b1..49e9160ce100 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -49,6 +49,7 @@
 #include <trace/events/block.h>
 #include <linux/fscrypt.h>
 #include <linux/fsverity.h>
+#include <linux/sched/isolation.h>
 
 #include "internal.h"
 
@@ -1289,7 +1290,7 @@ static void bh_lru_install(struct buffer_head *bh)
 	 * failing page migration.
	 * Skip putting upcoming bh into bh_lru until migration is done.
	 */
-	if (lru_cache_disabled()) {
+	if (lru_cache_disabled() || cpu_is_isolated(smp_processor_id())) {
 		bh_lru_unlock();
 		return;
 	}
@@ -1319,6 +1320,10 @@ lookup_bh_lru(struct block_device *bdev, sector_t block, unsigned size)
 
 	check_irqs_on();
 	bh_lru_lock();
+	if (cpu_is_isolated(smp_processor_id())) {
+		bh_lru_unlock();
+		return NULL;
+	}
 	for (i = 0; i < BH_LRU_SIZE; i++) {
 		struct buffer_head *bh = __this_cpu_read(bh_lrus.bhs[i]);
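Whether a CPU is treated as isolated by cpu_is_isolated() is decided
at boot through the housekeeping setup. A hedged illustration (the
CPU list 2-5 and the exact flag combination are examples, not a
recommendation; see Documentation/admin-guide/kernel-parameters.txt
for the parameter syntax):

# Example kernel command line additions: exclude CPUs 2-5 from
# scheduler-domain and tick housekeeping, so cpu_is_isolated()
# returns true for them and they bypass the per-CPU bh LRU:
#   isolcpus=domain,managed_irq,2-5 nohz_full=2-5

# Verify at runtime which CPUs the kernel considers isolated/tick-free:
cat /sys/devices/system/cpu/isolated
cat /sys/devices/system/cpu/nohz_full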