Message ID | 20230223164353.2839177-2-leitao@debian.org
State | New
Headers |
From: Breno Leitao <leitao@debian.org>
To: axboe@kernel.dk, asml.silence@gmail.com, io-uring@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, gustavold@meta.com, leit@meta.com, kasan-dev@googlegroups.com
Subject: [PATCH v3 1/2] io_uring: Move from hlist to io_wq_work_node
Date: Thu, 23 Feb 2023 08:43:52 -0800
Message-Id: <20230223164353.2839177-2-leitao@debian.org>
In-Reply-To: <20230223164353.2839177-1-leitao@debian.org>
References: <20230223164353.2839177-1-leitao@debian.org>
Series | io_uring: Add KASAN support for alloc caches
Commit Message
Breno Leitao
Feb. 23, 2023, 4:43 p.m. UTC
Having cache entries linked using the hlist format brings no benefit, and
also requires an unnecessary extra pointer per cache entry.

Use the internal io_wq_work_node singly linked list for the internal
alloc caches (async_msghdr and async_poll).

This is required to be able to use KASAN on cache entries, since with a
singly linked list we do not need to touch unused (and poisoned) cache
entries when adding more entries to the list.
Suggested-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
---
include/linux/io_uring_types.h | 2 +-
io_uring/alloc_cache.h | 24 +++++++++++++-----------
2 files changed, 14 insertions(+), 12 deletions(-)
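To make the poisoning argument concrete: an hlist insert has to write into the node already at the head of the list, while a singly linked push does not. A minimal standalone sketch contrasting the two insert paths follows; the struct layouts mirror the kernel's, and the bodies are simplified from include/linux/list.h and io_uring/slist.h (WRITE_ONCE and debug checks dropped), so treat it as illustrative rather than the verbatim kernel code.

    /* Layout-equivalent stand-ins for the kernel types. */
    struct hlist_node { struct hlist_node *next, **pprev; };
    struct hlist_head { struct hlist_node *first; };
    struct io_wq_work_node { struct io_wq_work_node *next; };

    /* hlist insert: writes into the *existing* first node (its pprev),
     * so pushing onto the cache dereferences an entry that patch 2/2
     * wants to keep poisoned while it sits unused in the cache. */
    static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
    {
            struct hlist_node *first = h->first;

            n->next = first;
            if (first)
                    first->pprev = &n->next;  /* touches the old head */
            h->first = n;
            n->pprev = &h->first;
    }

    /* Singly linked stack push: writes only the new node and the list
     * head, so already-cached (poisoned) entries are never touched. */
    static void wq_stack_add_head(struct io_wq_work_node *node,
                                  struct io_wq_work_node *stack)
    {
            node->next = stack->next;
            stack->next = node;
    }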
Comments
Breno Leitao <leitao@debian.org> writes:

> Having cache entries linked using the hlist format brings no benefit, and
> also requires an unnecessary extra pointer per cache entry.
>
> Use the internal io_wq_work_node singly linked list for the internal
> alloc caches (async_msghdr and async_poll).
>
> This is required to be able to use KASAN on cache entries, since with a
> singly linked list we do not need to touch unused (and poisoned) cache
> entries when adding more entries to the list.

Looking at this patch, I wonder if it could go in the opposite direction
instead, and drop io_wq_work_node entirely in favor of list_head. :)

Do we gain anything other than avoiding the backpointer with a custom
linked implementation, instead of using the interface available in
list.h, which developers know how to use and which has other features
like poisoning and extra debug checks?

> static inline struct io_cache_entry *io_alloc_cache_get(struct io_alloc_cache *cache)
> {
> -	if (!hlist_empty(&cache->list)) {
> -		struct hlist_node *node = cache->list.first;
> +	if (cache->list.next) {
> +		struct io_cache_entry *entry;
>
> -		hlist_del(node);
> -		return container_of(node, struct io_cache_entry, node);
> +		entry = container_of(cache->list.next, struct io_cache_entry, node);
> +		cache->list.next = cache->list.next->next;
> +		return entry;
> }

From a quick look, I think you could use wq_stack_extract() here.
On 2/23/23 12:02 PM, Gabriel Krisman Bertazi wrote:
> Breno Leitao <leitao@debian.org> writes:
>
>> Having cache entries linked using the hlist format brings no benefit, and
>> also requires an unnecessary extra pointer per cache entry.
>>
>> Use the internal io_wq_work_node singly linked list for the internal
>> alloc caches (async_msghdr and async_poll).
>>
>> This is required to be able to use KASAN on cache entries, since with a
>> singly linked list we do not need to touch unused (and poisoned) cache
>> entries when adding more entries to the list.
>
> Looking at this patch, I wonder if it could go in the opposite direction
> instead, and drop io_wq_work_node entirely in favor of list_head. :)
>
> Do we gain anything other than avoiding the backpointer with a custom
> linked implementation, instead of using the interface available in
> list.h, which developers know how to use and which has other features
> like poisoning and extra debug checks?

list_head is twice as big, that's the main motivation. This impacts
memory usage (obviously), but also caches when adding/removing entries.
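For reference, the size difference here is exactly the second pointer. A small userspace sketch with layout-equivalent stand-ins for the kernel types (assuming an LP64 target):

    #include <stdio.h>

    /* Layout-equivalent stand-ins for the kernel types on LP64. */
    struct list_head { struct list_head *next, *prev; };      /* doubly linked */
    struct io_wq_work_node { struct io_wq_work_node *next; }; /* singly linked */

    int main(void)
    {
            printf("list_head:       %zu bytes\n", sizeof(struct list_head));       /* 16 */
            printf("io_wq_work_node: %zu bytes\n", sizeof(struct io_wq_work_node)); /* 8 */
            return 0;
    }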
Hello Krisman, thanks for the review.

On Thu, Feb 23, 2023 at 04:02:25PM -0300, Gabriel Krisman Bertazi wrote:
> Breno Leitao <leitao@debian.org> writes:
> > static inline struct io_cache_entry *io_alloc_cache_get(struct io_alloc_cache *cache)
> > {
> > -	if (!hlist_empty(&cache->list)) {
> > -		struct hlist_node *node = cache->list.first;
> > +	if (cache->list.next) {
> > +		struct io_cache_entry *entry;
> >
> > -		hlist_del(node);
> > -		return container_of(node, struct io_cache_entry, node);
> > +		entry = container_of(cache->list.next, struct io_cache_entry, node);
> > +		cache->list.next = cache->list.next->next;
> > +		return entry;
> > }
>
> From a quick look, I think you could use wq_stack_extract() here.

True, we can use wq_stack_extract() in this patch, but we would need to
revert back to this code in the next patch. Remember that
wq_stack_extract() touches stack->next->next, which will be poisoned,
causing a KASAN warning.

Here is the relevant part of the code:

	struct io_wq_work_node *wq_stack_extract(struct io_wq_work_node *stack)
	{
		struct io_wq_work_node *node = stack->next;

		stack->next = node->next;
		return node;
	}
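A hedged sketch of where the follow-up patch's KASAN hook would slot into the open-coded get: the entry must be unpoisoned before its ->next is read, which wq_stack_extract() cannot guarantee. kasan_unpoison_range() is the real API from <linux/kasan.h>, but its placement and the sizeof(*entry) size argument here are assumptions based on this discussion (the actual patch presumably unpoisons the whole cached object):

    static inline struct io_cache_entry *io_alloc_cache_get(struct io_alloc_cache *cache)
    {
            if (cache->list.next) {
                    struct io_cache_entry *entry;

                    entry = container_of(cache->list.next, struct io_cache_entry, node);
                    /* Unpoison the entry *before* reading its ->next;
                     * wq_stack_extract() would read it while still poisoned. */
                    kasan_unpoison_range(entry, sizeof(*entry));
                    cache->list.next = entry->node.next;
                    return entry;
            }
            return NULL;
    }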
Jens Axboe <axboe@kernel.dk> writes:

> On 2/23/23 12:02 PM, Gabriel Krisman Bertazi wrote:
>> Breno Leitao <leitao@debian.org> writes:
>>
>>> Having cache entries linked using the hlist format brings no benefit, and
>>> also requires an unnecessary extra pointer per cache entry.
>>>
>>> Use the internal io_wq_work_node singly linked list for the internal
>>> alloc caches (async_msghdr and async_poll).
>>>
>>> This is required to be able to use KASAN on cache entries, since with a
>>> singly linked list we do not need to touch unused (and poisoned) cache
>>> entries when adding more entries to the list.
>>
>> Looking at this patch, I wonder if it could go in the opposite direction
>> instead, and drop io_wq_work_node entirely in favor of list_head. :)
>>
>> Do we gain anything other than avoiding the backpointer with a custom
>> linked implementation, instead of using the interface available in
>> list.h, which developers know how to use and which has other features
>> like poisoning and extra debug checks?
>
> list_head is twice as big, that's the main motivation. This impacts
> memory usage (obviously), but also caches when adding/removing
> entries.

Right. But this is true all around the kernel. Many (most?) places
that use list_head don't even need to touch list_head->prev, and
list_head is usually embedded in larger structures where the cost of
the extra pointer is insignificant. I suspect the memory footprint
shouldn't really be the problem.

This specific patch is extending io_wq_work_node to io_cache_entry,
where the increased size will not matter. In fact, for the cached
structures, the cache layout and memory footprint don't even seem to
change, as io_cache_entry is already in a union larger than itself that
does not cross cachelines (io_async_msghdr, async_poll).

The other structures currently embedding struct io_wq_work_node are
io_kiocb (216 bytes long, per request) and io_ring_ctx (1472 bytes long,
per ring), so it is not like we are saving a lot of memory with a singly
linked list. A more compact cacheline still makes sense, though, but I
think the only case (if any) where there might be any gain is io_kiocb?

I don't severely oppose this patch, of course. But I think it'd be worth
killing io_uring/slist.h entirely in the future instead of adding more
users. I intend to give that approach a try, if there's a way to keep
the size of io_kiocb unchanged.
On 2/24/23 11:32 AM, Gabriel Krisman Bertazi wrote:
> Jens Axboe <axboe@kernel.dk> writes:
>
>> On 2/23/23 12:02 PM, Gabriel Krisman Bertazi wrote:
>>> Breno Leitao <leitao@debian.org> writes:
>>>
>>>> Having cache entries linked using the hlist format brings no benefit, and
>>>> also requires an unnecessary extra pointer per cache entry.
>>>>
>>>> Use the internal io_wq_work_node singly linked list for the internal
>>>> alloc caches (async_msghdr and async_poll).
>>>>
>>>> This is required to be able to use KASAN on cache entries, since with a
>>>> singly linked list we do not need to touch unused (and poisoned) cache
>>>> entries when adding more entries to the list.
>>>
>>> Looking at this patch, I wonder if it could go in the opposite direction
>>> instead, and drop io_wq_work_node entirely in favor of list_head. :)
>>>
>>> Do we gain anything other than avoiding the backpointer with a custom
>>> linked implementation, instead of using the interface available in
>>> list.h, which developers know how to use and which has other features
>>> like poisoning and extra debug checks?
>>
>> list_head is twice as big, that's the main motivation. This impacts
>> memory usage (obviously), but also caches when adding/removing
>> entries.
>
> Right. But this is true all around the kernel. Many (most?) places
> that use list_head don't even need to touch list_head->prev, and
> list_head is usually embedded in larger structures where the cost of
> the extra pointer is insignificant. I suspect the memory footprint
> shouldn't really be the problem.

I may be in the minority here in caring deeply about even little
details in terms of memory footprint and how many cachelines we
touch... E.g. if we can embed 8 bytes rather than 16, then why not?
Particularly for cases where we may have a lot of these structures. But
it's of course always a tradeoff.

> This specific patch is extending io_wq_work_node to io_cache_entry,
> where the increased size will not matter. In fact, for the cached
> structures, the cache layout and memory footprint don't even seem to
> change, as io_cache_entry is already in a union larger than itself that
> does not cross cachelines (io_async_msghdr, async_poll).

True, for the caching case, the member size doesn't matter. At least
immediately. Sometimes things are shuffled around and optimized
further, and then you may need to find 8 bytes to avoid bloating the
struct.

> The other structures currently embedding struct io_wq_work_node are
> io_kiocb (216 bytes long, per request) and io_ring_ctx (1472 bytes long,
> per ring), so it is not like we are saving a lot of memory with a singly
> linked list. A more compact cacheline still makes sense, though, but I
> think the only case (if any) where there might be any gain is io_kiocb?

Yeah, the ring is already pretty big. It is still laid out in cachelines
for the bits that matter, though, so it's nice to keep those sections
small. Maybe bumping it will waste an extra cacheline. Or, more
commonly, later additions now end up bumping into the next cacheline
rather than still fitting.

> I don't severely oppose this patch, of course. But I think it'd be worth
> killing io_uring/slist.h entirely in the future instead of adding more
> users. I intend to give that approach a try, if there's a way to keep
> the size of io_kiocb unchanged.

At least it's consistent within io_uring, which also means something.
I'd be fine with taking a look at such a patch, but let's please keep
it outside the scope of this change.
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 0efe4d784358..efa66b6c32c9 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -188,7 +188,7 @@ struct io_ev_fd {
 };
 
 struct io_alloc_cache {
-	struct hlist_head	list;
+	struct io_wq_work_node	list;
 	unsigned int		nr_cached;
 };
 
diff --git a/io_uring/alloc_cache.h b/io_uring/alloc_cache.h
index 729793ae9712..301855e94309 100644
--- a/io_uring/alloc_cache.h
+++ b/io_uring/alloc_cache.h
@@ -7,7 +7,7 @@
 #define IO_ALLOC_CACHE_MAX	512
 
 struct io_cache_entry {
-	struct hlist_node	node;
+	struct io_wq_work_node	node;
 };
 
 static inline bool io_alloc_cache_put(struct io_alloc_cache *cache,
@@ -15,7 +15,7 @@ static inline bool io_alloc_cache_put(struct io_alloc_cache *cache,
 {
 	if (cache->nr_cached < IO_ALLOC_CACHE_MAX) {
 		cache->nr_cached++;
-		hlist_add_head(&entry->node, &cache->list);
+		wq_stack_add_head(&entry->node, &cache->list);
 		return true;
 	}
 	return false;
@@ -23,11 +23,12 @@ static inline bool io_alloc_cache_put(struct io_alloc_cache *cache,
 
 static inline struct io_cache_entry *io_alloc_cache_get(struct io_alloc_cache *cache)
 {
-	if (!hlist_empty(&cache->list)) {
-		struct hlist_node *node = cache->list.first;
+	if (cache->list.next) {
+		struct io_cache_entry *entry;
 
-		hlist_del(node);
-		return container_of(node, struct io_cache_entry, node);
+		entry = container_of(cache->list.next, struct io_cache_entry, node);
+		cache->list.next = cache->list.next->next;
+		return entry;
 	}
 
 	return NULL;
@@ -35,18 +36,19 @@ static inline struct io_cache_entry *io_alloc_cache_get(struct io_alloc_cache *cache)
 
 static inline void io_alloc_cache_init(struct io_alloc_cache *cache)
 {
-	INIT_HLIST_HEAD(&cache->list);
+	cache->list.next = NULL;
 	cache->nr_cached = 0;
 }
 
 static inline void io_alloc_cache_free(struct io_alloc_cache *cache,
 					void (*free)(struct io_cache_entry *))
 {
-	while (!hlist_empty(&cache->list)) {
-		struct hlist_node *node = cache->list.first;
+	while (1) {
+		struct io_cache_entry *entry = io_alloc_cache_get(cache);
 
-		hlist_del(node);
-		free(container_of(node, struct io_cache_entry, node));
+		if (!entry)
+			break;
+		free(entry);
 	}
 	cache->nr_cached = 0;
 }
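For completeness, a hedged sketch of how a call site would use the converted cache. The struct apoll layout and the apoll_alloc()/apoll_recycle() helpers are hypothetical stand-ins (the real users are the async_poll and async_msghdr paths); only io_alloc_cache_get() and io_alloc_cache_put() are the helpers from the diff above.

    /* Hypothetical cached object: embeds io_cache_entry the way the
     * real async_poll/io_async_msghdr do (there, inside a union). */
    struct apoll {
            struct io_cache_entry cache;
            /* ... request state ... */
    };

    static struct apoll *apoll_alloc(struct io_alloc_cache *cache)
    {
            struct io_cache_entry *entry = io_alloc_cache_get(cache);

            /* Reuse a cached object when one is available, else allocate. */
            if (entry)
                    return container_of(entry, struct apoll, cache);
            return kmalloc(sizeof(struct apoll), GFP_KERNEL);
    }

    static void apoll_recycle(struct io_alloc_cache *cache, struct apoll *ap)
    {
            /* io_alloc_cache_put() declines once IO_ALLOC_CACHE_MAX
             * entries are cached; then free for real. */
            if (!io_alloc_cache_put(cache, &ap->cache))
                    kfree(ap);
    }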