Message ID | 20230119162055.20944-1-fmdefrancesco@gmail.com |
---|---|
State | New |
Headers |
From: "Fabio M. De Francesco" <fmdefrancesco@gmail.com>
To: Alexander Viro <viro@zeniv.linux.org.uk>, Benjamin LaHaise <bcrl@kvack.org>, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-kernel@vger.kernel.org
Cc: "Fabio M. De Francesco" <fmdefrancesco@gmail.com>, "Venkataramanan, Anirudh" <anirudh.venkataramanan@intel.com>, Ira Weiny <ira.weiny@intel.com>, Jeff Moyer <jmoyer@redhat.com>, Kent Overstreet <kent.overstreet@linux.dev>
Subject: [PATCH v3] fs/aio: Replace kmap{,_atomic}() with kmap_local_page()
Date: Thu, 19 Jan 2023 17:20:55 +0100
Message-Id: <20230119162055.20944-1-fmdefrancesco@gmail.com>
X-Mailer: git-send-email 2.39.0
Series | [v3] fs/aio: Replace kmap{,_atomic}() with kmap_local_page() |
Commit Message
Fabio M. De Francesco
Jan. 19, 2023, 4:20 p.m. UTC
The use of kmap() and kmap_atomic() is being deprecated in favor of kmap_local_page().

There are two main problems with kmap(): (1) it comes with an overhead, as the mapping space is restricted and protected by a global lock for synchronization, and (2) it requires global TLB invalidation when the kmap pool wraps, and it might block until a slot becomes available when the mapping space is fully utilized.

With kmap_local_page() the mappings are per thread, CPU local, can take page faults, and can be called from any context (including interrupts). It is faster than kmap() in kernels with HIGHMEM enabled. Furthermore, tasks can be preempted and, when they are scheduled to run again, the kernel virtual addresses are restored and still valid.

The use of kmap_local_page() in fs/aio.c is "safe" in the sense that the code does not hand the returned kernel virtual addresses to other threads, and there are no nestings that would have to follow the stack-based (LIFO) mapping/un-mapping order. Furthermore, the code between the old kmap_atomic()/kunmap_atomic() calls did not depend on disabling page faults and/or preemption, so there is no need to call pagefault_disable() and/or preempt_disable() before the mappings.

Therefore, replace kmap() and kmap_atomic() with kmap_local_page() in fs/aio.c.

Tested with xfstests on a QEMU/KVM x86_32 VM, 6GB RAM, booting a kernel with HIGHMEM64GB enabled.

Cc: "Venkataramanan, Anirudh" <anirudh.venkataramanan@intel.com>
Suggested-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Reviewed-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
---

I've tested with "./check -g aio". The tests in this group fail 3/26 times, both with and without my patch; therefore, these changes don't introduce further errors.
I'm not aware of any other tests I could run, so any suggestions would be precious and much appreciated :-)

I'm resending this patch because some recipients were missing from the previous submissions. In the meantime I'm also adding some more information to the commit message. There are no changes in the code.

Changes from v1:
        Add further information to the commit message, and the "Reviewed-by" tags from Ira and Jeff (thanks!).

Changes from v2:
        Rewrite a block of code between mapping/un-mapping to improve readability in aio_setup_ring() and add a missing call to flush_dcache_page() in ioctx_add_table() (thanks to Al Viro);
        Add a "Reviewed-by" tag from Kent Overstreet (thanks).

 fs/aio.c | 46 +++++++++++++++++++++-------------------------
 1 file changed, 21 insertions(+), 25 deletions(-)
Comments
On Thursday, 19 January 2023 17:20:55 CET Fabio M. De Francesco wrote:
> The use of kmap() and kmap_atomic() are being deprecated in favor of
> kmap_local_page().
>
> [...]
>
> --
> 2.39.0

Hi Al,

I see that this patch has been here since Jan 19, 2023. Is there anything that prevents its merging? Am I expected to make further changes? Please notice that it already has three "Reviewed-by:" tags (again thanks to Ira, Jeff, and Kent).

Can you please take it into your tree?

Thanks,

Fabio
On Thursday, 19 January 2023 17:20:55 CEST Fabio M. De Francesco wrote:
> The use of kmap() and kmap_atomic() are being deprecated in favor of
> kmap_local_page().
>
> [...]

Hi Al,

I see that this patch has been here since Jan 19, 2023. Is there anything that prevents its merging? Am I expected to make further changes? Please notice that it already has three "Reviewed-by:" tags (again thanks to Ira, Jeff, and Kent).

Can you please take it into your tree?

Thanks,

Fabio
On Mon, Mar 27, 2023 at 12:08:20PM +0200, Fabio M. De Francesco wrote:
> On Thursday, 19 January 2023 17:20:55 CEST Fabio M. De Francesco wrote:
> > The use of kmap() and kmap_atomic() are being deprecated in favor of
> > kmap_local_page().
> >
> > [...]
> >
> > Therefore, replace kmap() and kmap_atomic() with kmap_local_page() in
> > fs/aio.c.

Or should we just stop allocating aio rings from HIGHMEM and remove the calls to kmap()? How much memory are we talking about here?
On Mon, Mar 27, 2023 at 02:22:46PM +0100, Matthew Wilcox wrote: > Or should we just stop allocating aio rings from HIGHMEM and remove > the calls to kmap()? How much memory are we talking about here? I don't think that should stop us from taking these patches, but yes.
On Monday, 27 March 2023 15:22:46 CEST Matthew Wilcox wrote:
> On Mon, Mar 27, 2023 at 12:08:20PM +0200, Fabio M. De Francesco wrote:
> > [...]
>
> Or should we just stop allocating aio rings from HIGHMEM and remove
> the calls to kmap()? How much memory are we talking about here?

Matthew,

Well, I'll do as you suggested. Actually, I should have made this change when you first suggested it but... well, I think you can easily guess why I did not.

Here it seems that calling find_or_create_page() with the GFP_USER flag instead of GFP_HIGHUSER is all that is required. Then I'll get rid of the mappings in favor of some straight page_address() calls. I just took another look after months, so I could very well have missed something else. If this is all that must be changed, I'll send the new patch by tomorrow.

Thanks,

Fabio

P.S.: I had sent other patches that must also be changed according to a similar comment you made. Obviously, I'll work on them too (even if you probably can't recall the short series to fs/ufs I'm referring to).
On Thursday, 19 January 2023 17:20:55 CEST Fabio M. De Francesco wrote:
> The use of kmap() and kmap_atomic() are being deprecated in favor of
> kmap_local_page().
>
> [...]

According to a suggestion by Matthew, I just sent another patch which stops allocating aio rings from ZONE_HIGHMEM.[1] Therefore, please drop this patch.

Since the purpose of the new patch is entirely different from this one, I changed the subject and reset the version number to v1.

Thanks,

Fabio

[1] https://lore.kernel.org/lkml/20230609145937.17610-1-fmdefrancesco@gmail.com/
diff --git a/fs/aio.c b/fs/aio.c
index 562916d85cba..9b39063dc7ac 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -486,7 +486,6 @@ static const struct address_space_operations aio_ctx_aops = {
 
 static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events)
 {
-	struct aio_ring *ring;
 	struct mm_struct *mm = current->mm;
 	unsigned long size, unused;
 	int nr_pages;
@@ -567,16 +566,12 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events)
 	ctx->user_id = ctx->mmap_base;
 	ctx->nr_events = nr_events; /* trusted copy */
 
-	ring = kmap_atomic(ctx->ring_pages[0]);
-	ring->nr = nr_events;	/* user copy */
-	ring->id = ~0U;
-	ring->head = ring->tail = 0;
-	ring->magic = AIO_RING_MAGIC;
-	ring->compat_features = AIO_RING_COMPAT_FEATURES;
-	ring->incompat_features = AIO_RING_INCOMPAT_FEATURES;
-	ring->header_length = sizeof(struct aio_ring);
-	kunmap_atomic(ring);
-	flush_dcache_page(ctx->ring_pages[0]);
+	memcpy_to_page(ctx->ring_pages[0], 0, (const char *)&(struct aio_ring) {
+			.nr = nr_events, .id = ~0U, .magic = AIO_RING_MAGIC,
+			.compat_features = AIO_RING_COMPAT_FEATURES,
+			.incompat_features = AIO_RING_INCOMPAT_FEATURES,
+			.header_length = sizeof(struct aio_ring) },
+			sizeof(struct aio_ring));
 
 	return 0;
 }
@@ -678,9 +673,10 @@ static int ioctx_add_table(struct kioctx *ctx, struct mm_struct *mm)
 			 * we are protected from page migration
 			 * changes ring_pages by ->ring_lock.
 			 */
-			ring = kmap_atomic(ctx->ring_pages[0]);
+			ring = kmap_local_page(ctx->ring_pages[0]);
 			ring->id = ctx->id;
-			kunmap_atomic(ring);
+			kunmap_local(ring);
+			flush_dcache_page(ctx->ring_pages[0]);
 			return 0;
 		}
@@ -1021,9 +1017,9 @@ static void user_refill_reqs_available(struct kioctx *ctx)
 	 * against ctx->completed_events below will make sure we do the
 	 * safe/right thing.
 	 */
-	ring = kmap_atomic(ctx->ring_pages[0]);
+	ring = kmap_local_page(ctx->ring_pages[0]);
 	head = ring->head;
-	kunmap_atomic(ring);
+	kunmap_local(ring);
 
 	refill_reqs_available(ctx, head, ctx->tail);
 }
@@ -1129,12 +1125,12 @@ static void aio_complete(struct aio_kiocb *iocb)
 	if (++tail >= ctx->nr_events)
 		tail = 0;
 
-	ev_page = kmap_atomic(ctx->ring_pages[pos / AIO_EVENTS_PER_PAGE]);
+	ev_page = kmap_local_page(ctx->ring_pages[pos / AIO_EVENTS_PER_PAGE]);
 	event = ev_page + pos % AIO_EVENTS_PER_PAGE;
 
 	*event = iocb->ki_res;
 
-	kunmap_atomic(ev_page);
+	kunmap_local(ev_page);
 	flush_dcache_page(ctx->ring_pages[pos / AIO_EVENTS_PER_PAGE]);
 
 	pr_debug("%p[%u]: %p: %p %Lx %Lx %Lx\n", ctx, tail, iocb,
@@ -1148,10 +1144,10 @@ static void aio_complete(struct aio_kiocb *iocb)
 
 	ctx->tail = tail;
 
-	ring = kmap_atomic(ctx->ring_pages[0]);
+	ring = kmap_local_page(ctx->ring_pages[0]);
 	head = ring->head;
 	ring->tail = tail;
-	kunmap_atomic(ring);
+	kunmap_local(ring);
 	flush_dcache_page(ctx->ring_pages[0]);
 
 	ctx->completed_events++;
@@ -1211,10 +1207,10 @@ static long aio_read_events_ring(struct kioctx *ctx,
 	mutex_lock(&ctx->ring_lock);
 
 	/* Access to ->ring_pages here is protected by ctx->ring_lock. */
-	ring = kmap_atomic(ctx->ring_pages[0]);
+	ring = kmap_local_page(ctx->ring_pages[0]);
 	head = ring->head;
 	tail = ring->tail;
-	kunmap_atomic(ring);
+	kunmap_local(ring);
 
 	/*
 	 * Ensure that once we've read the current tail pointer, that
@@ -1246,10 +1242,10 @@ static long aio_read_events_ring(struct kioctx *ctx,
 		avail = min(avail, nr - ret);
 		avail = min_t(long, avail, AIO_EVENTS_PER_PAGE - pos);
 
-		ev = kmap(page);
+		ev = kmap_local_page(page);
 		copy_ret = copy_to_user(event + ret, ev + pos,
 					sizeof(*ev) * avail);
-		kunmap(page);
+		kunmap_local(ev);
 
 		if (unlikely(copy_ret)) {
 			ret = -EFAULT;
@@ -1261,9 +1257,9 @@ static long aio_read_events_ring(struct kioctx *ctx,
 		head %= ctx->nr_events;
 	}
 
-	ring = kmap_atomic(ctx->ring_pages[0]);
+	ring = kmap_local_page(ctx->ring_pages[0]);
 	ring->head = head;
-	kunmap_atomic(ring);
+	kunmap_local(ring);
 	flush_dcache_page(ctx->ring_pages[0]);
 
 	pr_debug("%li h%u t%u\n", ret, head, tail);