Message ID | 20230302231638.521280-1-dhowells@redhat.com |
---|---|
Headers | From: David Howells <dhowells@redhat.com> To: Linus Torvalds <torvalds@linux-foundation.org>, Steve French <smfrench@gmail.com> Cc: David Howells <dhowells@redhat.com>, Vishal Moola <vishal.moola@gmail.com>, Shyam Prasad N <nspmangalore@gmail.com>, Rohith Surabattula <rohiths.msft@gmail.com>, Tom Talpey <tom@talpey.com>, Stefan Metzmacher <metze@samba.org>, Paulo Alcantara <pc@cjr.nz>, Jeff Layton <jlayton@kernel.org>, Matthew Wilcox <willy@infradead.org>, Marc Dionne <marc.dionne@auristor.com>, linux-afs@lists.infradead.org, linux-cifs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH 0/3] smb3, afs: Revert changes to {cifs,afs}_writepages_region() Date: Thu, 2 Mar 2023 23:16:35 +0000 Message-Id: <20230302231638.521280-1-dhowells@redhat.com> |
Series |
smb3, afs: Revert changes to {cifs,afs}_writepages_region()
|
|
Message
David Howells
March 2, 2023, 11:16 p.m. UTC
Hi Linus, Steve,

Could you consider applying these please?  I've split the patch that I
proposed[1].  The series reverts Vishal's patch to afs and Linus's changes to
cifs back to the point where find_get_pages_range_tag() was being used to get
a single folio, and then replaces that with a function,
filemap_get_folio_tag(), that just gets a single folio.  I've also done some
benchmarking of this against some conversions to use write_cache_pages() in
various ways.

I used the following to test the write paths:

	fio --ioengine=libaio --direct=0 --gtod_reduce=1 --name=readtest \
	    --filename=/xfstest.test/foo --iodepth=128 --time_based \
	    --runtime=120 --readwrite=randread --iodepth_low=96 \
	    --iodepth_batch=16 --numjobs=4 --size=16M --bs=4k

The base for comparison is the upstream kernel at commit:

	d2980d8d826554fa6981d621e569a453787472f8
	"Merge tag 'mm-nonmm-stable-2023-02-20-15-29' of git://git./linux/kernel/git/akpm/mm"

plus the accumulated fixes on Steve's cifs for-next branch.

AFS firstly.  The code that's upstream keeps track of the dirtied region of a
folio in page->private, so I tried removing that to see what difference it
makes, in addition to trying conversions to use write_cache_pages().  I also
tried giving afs its own copy of write_cache_pages() in order to eliminate the
function pointer, in case that had a significant effect due to spectre
mitigations.

  Base:
	WRITE: bw=302MiB/s (316MB/s), 71.9MiB/s-78.9MiB/s (75.3MB/s-82.8MB/s)
	WRITE: bw=303MiB/s (318MB/s), 65.9MiB/s-84.0MiB/s (69.1MB/s-88.1MB/s)
	WRITE: bw=310MiB/s (325MB/s), 73.6MiB/s-87.3MiB/s (77.1MB/s-91.5MB/s)

  Base + Partial revert (these patches):
	WRITE: bw=348MiB/s (365MB/s), 86.4MiB/s-87.5MiB/s (90.6MB/s-91.8MB/s)
	WRITE: bw=350MiB/s (367MB/s), 86.6MiB/s-88.4MiB/s (90.8MB/s-92.7MB/s)
	WRITE: bw=387MiB/s (406MB/s), 96.8MiB/s-97.0MiB/s (101MB/s-102MB/s)

  Base + write_cache_pages():
	WRITE: bw=280MiB/s (294MB/s), 69.7MiB/s-70.5MiB/s (73.0MB/s-73.9MB/s)
	WRITE: bw=285MiB/s (299MB/s), 70.9MiB/s-71.5MiB/s (74.4MB/s-74.9MB/s)
	WRITE: bw=290MiB/s (304MB/s), 71.6MiB/s-73.2MiB/s (75.1MB/s-76.8MB/s)

  Base + Page-dirty-region removed:
	WRITE: bw=301MiB/s (315MB/s), 70.4MiB/s-80.2MiB/s (73.8MB/s-84.1MB/s)
	WRITE: bw=325MiB/s (341MB/s), 78.5MiB/s-87.1MiB/s (82.3MB/s-91.3MB/s)
	WRITE: bw=320MiB/s (335MB/s), 71.6MiB/s-88.6MiB/s (75.0MB/s-92.9MB/s)

  Base + Page-dirty-region tracking removed + write_cache_pages():
	WRITE: bw=288MiB/s (302MB/s), 71.9MiB/s-72.3MiB/s (75.4MB/s-75.8MB/s)
	WRITE: bw=284MiB/s (297MB/s), 70.7MiB/s-71.3MiB/s (74.1MB/s-74.8MB/s)
	WRITE: bw=287MiB/s (301MB/s), 71.2MiB/s-72.6MiB/s (74.7MB/s-76.1MB/s)

  Base + Page-dirty-region tracking removed + Own write_cache_pages():
	WRITE: bw=302MiB/s (316MB/s), 75.1MiB/s-76.1MiB/s (78.7MB/s-79.8MB/s)
	WRITE: bw=302MiB/s (316MB/s), 74.5MiB/s-76.1MiB/s (78.1MB/s-79.8MB/s)
	WRITE: bw=301MiB/s (316MB/s), 75.2MiB/s-75.5MiB/s (78.9MB/s-79.1MB/s)

So the partially reverted code appears significantly faster than code based on
write_cache_pages().  Removing the page-dirty-region tracking also slows
things down - I have a suspicion that this may be due to multipage folios
enlarging the apparently dirty regions of a file.

And then CIFS.  There's no dirtied region tracking here, so just the partial
reversion, a conversion to write_cache_pages() and its own version of
write_cache_pages() to eliminate the function pointer.

  Base:
	WRITE: bw=464MiB/s (487MB/s), 116MiB/s-116MiB/s (122MB/s-122MB/s)
	WRITE: bw=463MiB/s (486MB/s), 116MiB/s-116MiB/s (121MB/s-122MB/s)
	WRITE: bw=465MiB/s (488MB/s), 116MiB/s-116MiB/s (122MB/s-122MB/s)

  Base + Partial revert (these patches):
	WRITE: bw=470MiB/s (493MB/s), 117MiB/s-118MiB/s (123MB/s-123MB/s)
	WRITE: bw=467MiB/s (489MB/s), 117MiB/s-117MiB/s (122MB/s-122MB/s)
	WRITE: bw=464MiB/s (486MB/s), 116MiB/s-116MiB/s (121MB/s-122MB/s)

  Base + write_cache_pages():
	WRITE: bw=457MiB/s (479MB/s), 114MiB/s-114MiB/s (120MB/s-120MB/s)
	WRITE: bw=449MiB/s (471MB/s), 112MiB/s-113MiB/s (118MB/s-118MB/s)
	WRITE: bw=459MiB/s (482MB/s), 115MiB/s-115MiB/s (120MB/s-121MB/s)

  Base + Own write_cache_pages():
	WRITE: bw=451MiB/s (473MB/s), 113MiB/s-113MiB/s (118MB/s-118MB/s)
	WRITE: bw=455MiB/s (478MB/s), 114MiB/s-114MiB/s (119MB/s-120MB/s)
	WRITE: bw=453MiB/s (475MB/s), 113MiB/s-113MiB/s (119MB/s-119MB/s)
	WRITE: bw=459MiB/s (481MB/s), 115MiB/s-115MiB/s (120MB/s-120MB/s)

Here the partially reverted code appears slightly better - but the results are
very close so I'm not sure if it's statistically significant.

I've pushed the patches here also:

	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=iov-cifs

David

Link: https://lore.kernel.org/r/2214157.1677250083@warthog.procyon.org.uk/ [1]

David Howells (3):
  mm: Add a function to get a single tagged folio from a file
  afs: Partially revert and use filemap_get_folio_tag()
  cifs: Partially revert and use filemap_get_folio_tag()

 fs/afs/write.c          | 118 +++++++++++++++++++---------------------
 fs/cifs/file.c          | 115 +++++++++++++++++----------------------
 include/linux/pagemap.h |   2 +
 mm/filemap.c            |  58 ++++++++++++++++++++
 4 files changed, 166 insertions(+), 127 deletions(-)
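
The aggregate bandwidths quoted in the message can be condensed into per-configuration means and relative changes. The following is only a rough sketch for reading the numbers: the helper names (mean, sample_variance, percent_change) and the array names are made up for illustration, the inputs are the aggregate "bw=" figures copied from the message, and three runs per configuration is far too few for a formal significance test.

```c
#include <assert.h>
#include <stddef.h>

/* Aggregate WRITE bandwidths in MiB/s, copied from the runs in the message.
 * The array names are hypothetical labels, not anything from the patches.
 */
const double afs_base[]    = { 302.0, 303.0, 310.0 };
const double afs_revert[]  = { 348.0, 350.0, 387.0 };
const double afs_wcp[]     = { 280.0, 285.0, 290.0 };
const double cifs_base[]   = { 464.0, 463.0, 465.0 };
const double cifs_revert[] = { 470.0, 467.0, 464.0 };

/* Arithmetic mean of n samples. */
double mean(const double *v, size_t n)
{
	double sum = 0.0;
	size_t i;

	assert(n > 0);
	for (i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}

/* Unbiased sample variance (divides by n - 1); needs at least two samples. */
double sample_variance(const double *v, size_t n)
{
	double m, acc = 0.0;
	size_t i;

	assert(n > 1);
	m = mean(v, n);
	for (i = 0; i < n; i++)
		acc += (v[i] - m) * (v[i] - m);
	return acc / (double)(n - 1);
}

/* Change of `after` relative to `before`, in percent. */
double percent_change(double before, double after)
{
	return (after - before) / before * 100.0;
}
```

With these inputs the AFS means come out at 305 MiB/s for the base and about 361.7 MiB/s for the partial revert (roughly +18.6%), against 285 MiB/s for the write_cache_pages() conversion (roughly -6.6%). For CIFS the means are 464 and 467 MiB/s, a change of about 0.6%, while the sample variance of the three partial-revert runs is 9 (a standard deviation of 3 MiB/s). The CIFS difference is therefore about the same size as the run-to-run scatter, which fits the doubt expressed in the message about its significance.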
Comments
David Howells <dhowells@redhat.com> wrote:

> AFS firstly.
...
>
>   Base + write_cache_pages():
> 	WRITE: bw=280MiB/s (294MB/s), 69.7MiB/s-70.5MiB/s (73.0MB/s-73.9MB/s)
> 	WRITE: bw=285MiB/s (299MB/s), 70.9MiB/s-71.5MiB/s (74.4MB/s-74.9MB/s)
> 	WRITE: bw=290MiB/s (304MB/s), 71.6MiB/s-73.2MiB/s (75.1MB/s-76.8MB/s)

Here's the patch to convert AFS to use write_cache_pages(), retaining the use
of page->private to track the dirtied part of the page.

David
---
 write.c |  382 +++++++++++++---------------------------------------------------
 1 file changed, 78 insertions(+), 304 deletions(-)

diff --git a/fs/afs/write.c b/fs/afs/write.c
index 571f3b9a417e..01323fa58e1c 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -14,11 +14,6 @@
 #include <linux/netfs.h>
 #include "internal.h"

-static int afs_writepages_region(struct address_space *mapping,
-				 struct writeback_control *wbc,
-				 loff_t start, loff_t end, loff_t *_next,
-				 bool max_one_loop);
-
 static void afs_write_to_cache(struct afs_vnode *vnode, loff_t start, size_t len,
 			       loff_t i_size, bool caching);

@@ -56,10 +51,8 @@ static int afs_flush_conflicting_write(struct address_space *mapping,
 		.range_start	= folio_pos(folio),
 		.range_end	= LLONG_MAX,
 	};
-	loff_t next;

-	return afs_writepages_region(mapping, &wbc, folio_pos(folio), LLONG_MAX,
-				     &next, true);
+	return afs_writepages(mapping, &wbc);
 }

 /*
@@ -449,212 +442,57 @@ static int afs_store_data(struct afs_vnode *vnode, struct iov_iter *iter, loff_t
 	return afs_put_operation(op);
 }

-/*
- * Extend the region to be written back to include subsequent contiguously
- * dirty pages if possible, but don't sleep while doing so.
- *
- * If this page holds new content, then we can include filler zeros in the
- * writeback.
- */
-static void afs_extend_writeback(struct address_space *mapping,
-				 struct afs_vnode *vnode,
-				 long *_count,
-				 loff_t start,
-				 loff_t max_len,
-				 bool new_content,
-				 bool caching,
-				 unsigned int *_len)
-{
-	struct pagevec pvec;
-	struct folio *folio;
-	unsigned long priv;
-	unsigned int psize, filler = 0;
-	unsigned int f, t;
-	loff_t len = *_len;
-	pgoff_t index = (start + len) / PAGE_SIZE;
-	bool stop = true;
-	unsigned int i;
-
-	XA_STATE(xas, &mapping->i_pages, index);
-	pagevec_init(&pvec);
-
-	do {
-		/* Firstly, we gather up a batch of contiguous dirty pages
-		 * under the RCU read lock - but we can't clear the dirty flags
-		 * there if any of those pages are mapped.
-		 */
-		rcu_read_lock();
-
-		xas_for_each(&xas, folio, ULONG_MAX) {
-			stop = true;
-			if (xas_retry(&xas, folio))
-				continue;
-			if (xa_is_value(folio))
-				break;
-			if (folio_index(folio) != index)
-				break;
-
-			if (!folio_try_get_rcu(folio)) {
-				xas_reset(&xas);
-				continue;
-			}
-
-			/* Has the page moved or been split? */
-			if (unlikely(folio != xas_reload(&xas))) {
-				folio_put(folio);
-				break;
-			}
-
-			if (!folio_trylock(folio)) {
-				folio_put(folio);
-				break;
-			}
-			if (!folio_test_dirty(folio) ||
-			    folio_test_writeback(folio) ||
-			    folio_test_fscache(folio)) {
-				folio_unlock(folio);
-				folio_put(folio);
-				break;
-			}
-
-			psize = folio_size(folio);
-			priv = (unsigned long)folio_get_private(folio);
-			f = afs_folio_dirty_from(folio, priv);
-			t = afs_folio_dirty_to(folio, priv);
-			if (f != 0 && !new_content) {
-				folio_unlock(folio);
-				folio_put(folio);
-				break;
-			}
-
-			len += filler + t;
-			filler = psize - t;
-			if (len >= max_len || *_count <= 0)
-				stop = true;
-			else if (t == psize || new_content)
-				stop = false;
-
-			index += folio_nr_pages(folio);
-			if (!pagevec_add(&pvec, &folio->page))
-				break;
-			if (stop)
-				break;
-		}
-
-		if (!stop)
-			xas_pause(&xas);
-		rcu_read_unlock();
-
-		/* Now, if we obtained any pages, we can shift them to being
-		 * writable and mark them for caching.
-		 */
-		if (!pagevec_count(&pvec))
-			break;
-
-		for (i = 0; i < pagevec_count(&pvec); i++) {
-			folio = page_folio(pvec.pages[i]);
-			trace_afs_folio_dirty(vnode, tracepoint_string("store+"), folio);
-
-			if (!folio_clear_dirty_for_io(folio))
-				BUG();
-			if (folio_start_writeback(folio))
-				BUG();
-			afs_folio_start_fscache(caching, folio);
-
-			*_count -= folio_nr_pages(folio);
-			folio_unlock(folio);
-		}
-
-		pagevec_release(&pvec);
-		cond_resched();
-	} while (!stop);
-
-	*_len = len;
-}
+struct afs_writepages_context {
+	unsigned long long	start;
+	unsigned long long	end;
+	unsigned long long	annex_at;
+	bool			begun;
+	bool			caching;
+	bool			new_content;
+};

 /*
- * Synchronously write back the locked page and any subsequent non-locked dirty
- * pages.
+ * Flush a block of pages to the server and the cache.
  */
-static ssize_t afs_write_back_from_locked_folio(struct address_space *mapping,
-						struct writeback_control *wbc,
-						struct folio *folio,
-						loff_t start, loff_t end)
+static int afs_writepages_submit(struct address_space *mapping,
+				 struct writeback_control *wbc,
+				 struct afs_writepages_context *ctx)
 {
 	struct afs_vnode *vnode = AFS_FS_I(mapping->host);
 	struct iov_iter iter;
-	unsigned long priv;
-	unsigned int offset, to, len, max_len;
-	loff_t i_size = i_size_read(&vnode->netfs.inode);
-	bool new_content = test_bit(AFS_VNODE_NEW_CONTENT, &vnode->flags);
-	bool caching = fscache_cookie_enabled(afs_vnode_cache(vnode));
-	long count = wbc->nr_to_write;
+	unsigned long long i_size = i_size_read(&vnode->netfs.inode);
+	size_t len = ctx->end - ctx->start;
 	int ret;

-	_enter(",%lx,%llx-%llx", folio_index(folio), start, end);
-
-	if (folio_start_writeback(folio))
-		BUG();
-	afs_folio_start_fscache(caching, folio);
-
-	count -= folio_nr_pages(folio);
-
-	/* Find all consecutive lockable dirty pages that have contiguous
-	 * written regions, stopping when we find a page that is not
-	 * immediately lockable, is not dirty or is missing, or we reach the
-	 * end of the range.
-	 */
-	priv = (unsigned long)folio_get_private(folio);
-	offset = afs_folio_dirty_from(folio, priv);
-	to = afs_folio_dirty_to(folio, priv);
-	trace_afs_folio_dirty(vnode, tracepoint_string("store"), folio);
-
-	len = to - offset;
-	start += offset;
-	if (start < i_size) {
-		/* Trim the write to the EOF; the extra data is ignored.  Also
-		 * put an upper limit on the size of a single storedata op.
-		 */
-		max_len = 65536 * 4096;
-		max_len = min_t(unsigned long long, max_len, end - start + 1);
-		max_len = min_t(unsigned long long, max_len, i_size - start);
-
-		if (len < max_len &&
-		    (to == folio_size(folio) || new_content))
-			afs_extend_writeback(mapping, vnode, &count,
-					     start, max_len, new_content,
-					     caching, &len);
-		len = min_t(loff_t, len, max_len);
-	}
+	_enter("%llx-%llx", ctx->start, ctx->start + len - 1);

 	/* We now have a contiguous set of dirty pages, each with writeback
-	 * set; the first page is still locked at this point, but all the rest
-	 * have been unlocked.
+	 * set.
 	 */
-	folio_unlock(folio);
-
-	if (start < i_size) {
-		_debug("write back %x @%llx [%llx]", len, start, i_size);
+	if (ctx->start < i_size) {
+		if (len > i_size - ctx->start)
+			len = i_size - ctx->start;
+		_debug("write back %zx @%llx [%llx]", len, ctx->start, i_size);

 		/* Speculatively write to the cache.  We have to fix this up
 		 * later if the store fails.
 		 */
-		afs_write_to_cache(vnode, start, len, i_size, caching);
+		afs_write_to_cache(vnode, ctx->start, len, i_size, ctx->caching);

-		iov_iter_xarray(&iter, ITER_SOURCE, &mapping->i_pages, start, len);
-		ret = afs_store_data(vnode, &iter, start, false);
+		iov_iter_xarray(&iter, ITER_SOURCE,
+				&mapping->i_pages, ctx->start, len);
+		ret = afs_store_data(vnode, &iter, ctx->start, false);
 	} else {
-		_debug("write discard %x @%llx [%llx]", len, start, i_size);
+		_debug("write discard %zx @%llx [%llx]", len, ctx->start, i_size);

 		/* The dirty region was entirely beyond the EOF. */
-		fscache_clear_page_bits(mapping, start, len, caching);
-		afs_pages_written_back(vnode, start, len);
+		fscache_clear_page_bits(mapping, ctx->start, len, ctx->caching);
+		afs_pages_written_back(vnode, ctx->start, len);
 		ret = 0;
 	}

 	switch (ret) {
 	case 0:
-		wbc->nr_to_write = count;
 		ret = len;
 		break;

@@ -668,13 +506,13 @@ static ssize_t afs_write_back_from_locked_folio(struct address_space *mapping,
 	case -EKEYREJECTED:
 	case -EKEYREVOKED:
 	case -ENETRESET:
-		afs_redirty_pages(wbc, mapping, start, len);
+		afs_redirty_pages(wbc, mapping, ctx->start, len);
 		mapping_set_error(mapping, ret);
 		break;

 	case -EDQUOT:
 	case -ENOSPC:
-		afs_redirty_pages(wbc, mapping, start, len);
+		afs_redirty_pages(wbc, mapping, ctx->start, len);
 		mapping_set_error(mapping, -ENOSPC);
 		break;

@@ -686,7 +524,7 @@ static ssize_t afs_write_back_from_locked_folio(struct address_space *mapping,
 	case -ENOMEDIUM:
 	case -ENXIO:
 		trace_afs_file_error(vnode, ret, afs_file_error_writeback_fail);
-		afs_kill_pages(mapping, start, len);
+		afs_kill_pages(mapping, ctx->start, len);
 		mapping_set_error(mapping, ret);
 		break;
 	}
@@ -696,100 +534,51 @@ static ssize_t afs_write_back_from_locked_folio(struct address_space *mapping,
 }

 /*
- * write a region of pages back to the server
+ * Add a page to the set and flush when large enough.
  */
-static int afs_writepages_region(struct address_space *mapping,
-				 struct writeback_control *wbc,
-				 loff_t start, loff_t end, loff_t *_next,
-				 bool max_one_loop)
+static int afs_writepages_add_folio(struct folio *folio,
+				    struct writeback_control *wbc, void *data)
 {
-	struct folio *folio;
-	struct folio_batch fbatch;
-	ssize_t ret;
-	unsigned int i;
-	int n, skips = 0;
-
-	_enter("%llx,%llx,", start, end);
-	folio_batch_init(&fbatch);
-
-	do {
-		pgoff_t index = start / PAGE_SIZE;
+	struct afs_writepages_context *ctx = data;
+	struct afs_vnode *vnode = AFS_FS_I(folio->mapping->host);
+	unsigned long long pos = folio_pos(folio);
+	unsigned long priv;
+	size_t f, t;
+	int ret;

-		n = filemap_get_folios_tag(mapping, &index, end / PAGE_SIZE,
-					   PAGECACHE_TAG_DIRTY, &fbatch);
+	priv = (unsigned long)folio_get_private(folio);
+	f = afs_folio_dirty_from(folio, priv);
+	t = afs_folio_dirty_to(folio, priv);

-		if (!n)
-			break;
-		for (i = 0; i < n; i++) {
-			folio = fbatch.folios[i];
-			start = folio_pos(folio); /* May regress with THPs */
-
-			_debug("wback %lx", folio_index(folio));
-
-			/* At this point we hold neither the i_pages lock nor the
-			 * page lock: the page may be truncated or invalidated
-			 * (changing page->mapping to NULL), or even swizzled
-			 * back from swapper_space to tmpfs file mapping
-			 */
-			if (wbc->sync_mode != WB_SYNC_NONE) {
-				ret = folio_lock_killable(folio);
-				if (ret < 0) {
-					folio_batch_release(&fbatch);
-					return ret;
-				}
-			} else {
-				if (!folio_trylock(folio))
-					continue;
-			}
-
-			if (folio->mapping != mapping ||
-			    !folio_test_dirty(folio)) {
-				start += folio_size(folio);
-				folio_unlock(folio);
-				continue;
-			}
-
-			if (folio_test_writeback(folio) ||
-			    folio_test_fscache(folio)) {
-				folio_unlock(folio);
-				if (wbc->sync_mode != WB_SYNC_NONE) {
-					folio_wait_writeback(folio);
-#ifdef CONFIG_AFS_FSCACHE
-					folio_wait_fscache(folio);
-#endif
-				} else {
-					start += folio_size(folio);
-				}
-				if (wbc->sync_mode == WB_SYNC_NONE) {
-					if (skips >= 5 || need_resched()) {
-						*_next = start;
-						_leave(" = 0 [%llx]", *_next);
-						return 0;
-					}
-					skips++;
-				}
-				continue;
-			}
-
-			if (!folio_clear_dirty_for_io(folio))
-				BUG();
-			ret = afs_write_back_from_locked_folio(mapping, wbc,
-					folio, start, end);
-			if (ret < 0) {
-				_leave(" = %zd", ret);
-				folio_batch_release(&fbatch);
-				return ret;
-			}
-
-			start += ret;
+	if (ctx->begun) {
+		if ((f == 0 || ctx->new_content) &&
+		    pos == ctx->annex_at) {
+			trace_afs_folio_dirty(vnode, tracepoint_string("store+"), folio);
+			goto add;
 		}
+		ret = afs_writepages_submit(folio->mapping, wbc, ctx);
+		if (ret < 0)
+			return ret;
+	}
+
+	ctx->begun = true;
+	ctx->start = pos + f;
+	trace_afs_folio_dirty(vnode, tracepoint_string("store"), folio);
+add:
+	ctx->end = pos + t;
+	ctx->annex_at = pos + folio_size(folio);

-		folio_batch_release(&fbatch);
-		cond_resched();
-	} while (wbc->nr_to_write > 0);
+	folio_wait_fscache(folio);
+	folio_start_writeback(folio);
+	afs_folio_start_fscache(ctx->caching, folio);
+	folio_unlock(folio);

-	*_next = start;
-	_leave(" = 0 [%llx]", *_next);
+	if (ctx->end - ctx->start >= 65536 * 4096) {
+		ret = afs_writepages_submit(folio->mapping, wbc, ctx);
+		if (ret < 0)
+			return ret;
+		ctx->begun = false;
+	}
 	return 0;
 }

@@ -800,7 +589,10 @@ int afs_writepages(struct address_space *mapping,
 		   struct writeback_control *wbc)
 {
 	struct afs_vnode *vnode = AFS_FS_I(mapping->host);
-	loff_t start, next;
+	struct afs_writepages_context ctx = {
+		.caching	= fscache_cookie_enabled(afs_vnode_cache(vnode)),
+		.new_content	= test_bit(AFS_VNODE_NEW_CONTENT, &vnode->flags),
+	};
 	int ret;

 	_enter("");
@@ -814,29 +606,11 @@ int afs_writepages(struct address_space *mapping,
 	else if (!down_read_trylock(&vnode->validate_lock))
 		return 0;

-	if (wbc->range_cyclic) {
-		start = mapping->writeback_index * PAGE_SIZE;
-		ret = afs_writepages_region(mapping, wbc, start, LLONG_MAX,
-					    &next, false);
-		if (ret == 0) {
-			mapping->writeback_index = next / PAGE_SIZE;
-			if (start > 0 && wbc->nr_to_write > 0) {
-				ret = afs_writepages_region(mapping, wbc, 0,
-							    start, &next, false);
-				if (ret == 0)
-					mapping->writeback_index =
-						next / PAGE_SIZE;
-			}
-		}
-	} else if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX) {
-		ret = afs_writepages_region(mapping, wbc, 0, LLONG_MAX,
-					    &next, false);
-		if (wbc->nr_to_write > 0 && ret == 0)
-			mapping->writeback_index = next / PAGE_SIZE;
-	} else {
-		ret = afs_writepages_region(mapping, wbc,
-					    wbc->range_start, wbc->range_end,
-					    &next, false);
+	ret = write_cache_pages(mapping, wbc, afs_writepages_add_folio, &ctx);
+	if (ret >= 0 && ctx.begun) {
+		ret = afs_writepages_submit(mapping, wbc, &ctx);
+		if (ret < 0)
+			return ret;
 	}

 	up_read(&vnode->validate_lock);
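
The accumulate-and-flush pattern used by afs_writepages_add_folio() in the diff above can be modelled in userspace. This is only a sketch under stated assumptions, not the kernel code: struct dirty_folio, struct wb_ctx, wb_add_folio(), wb_submit() and wb_finish() are hypothetical names; locking, writeback flags, fscache and error handling are left out; and "submitting" just records the byte range that would have been written. The annex condition and the 65536 * 4096 byte limit mirror the diff as posted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_EXTENTS	16
#define FLUSH_THRESHOLD	(65536ULL * 4096ULL)	/* same limit as in the diff */

/* Hypothetical stand-in for a folio: file position, size and dirty span. */
struct dirty_folio {
	unsigned long long pos;		/* byte offset of the folio in the file */
	unsigned long long size;	/* folio size in bytes */
	unsigned long long from, to;	/* dirty bytes [from, to) inside the folio */
};

struct extent {
	unsigned long long start, end;	/* byte range [start, end) */
};

/* Models struct afs_writepages_context, plus a log of submitted ranges. */
struct wb_ctx {
	unsigned long long start, end, annex_at;
	bool begun;
	bool new_content;
	struct extent out[MAX_EXTENTS];
	size_t nr_out;
};

/* Stands in for afs_writepages_submit(): record the range, start afresh. */
void wb_submit(struct wb_ctx *ctx)
{
	if (ctx->nr_out < MAX_EXTENTS) {
		ctx->out[ctx->nr_out].start = ctx->start;
		ctx->out[ctx->nr_out].end = ctx->end;
		ctx->nr_out++;
	}
	ctx->begun = false;
}

/* Stands in for the per-folio callback handed to write_cache_pages(). */
void wb_add_folio(struct wb_ctx *ctx, const struct dirty_folio *f)
{
	/* A folio may be annexed only if it starts where the previous one
	 * ended and is dirty from its first byte (or the file holds new
	 * content, so gaps may be filled); otherwise flush what we have.
	 */
	if (ctx->begun &&
	    !((f->from == 0 || ctx->new_content) && f->pos == ctx->annex_at))
		wb_submit(ctx);

	if (!ctx->begun) {
		ctx->begun = true;
		ctx->start = f->pos + f->from;
	}
	ctx->end = f->pos + f->to;
	ctx->annex_at = f->pos + f->size;

	/* Cap the size of a single store operation. */
	if (ctx->end - ctx->start >= FLUSH_THRESHOLD)
		wb_submit(ctx);
}

/* Flush whatever is still pending once the walk over dirty folios ends. */
void wb_finish(struct wb_ctx *ctx)
{
	if (ctx->begun)
		wb_submit(ctx);
}
```

Fed three adjacent 4096-byte folios starting at offset 0 (the third dirty only up to byte 100) and then a folio at offset 16384 that is dirty from byte 512, the model emits two ranges: [0, 8292) and [16896, 20480). Keeping the pending range in a context structure is what lets a per-folio callback build multi-folio writes without the lookahead scan that afs_extend_writeback() used to do.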