[RFC] afs: Stop implementing ->writepage()
Commit Message
We're trying to get rid of the ->writepage() hook[1]. Stop afs from using
it by unlocking the page and calling filemap_fdatawrite_wbc() rather than
folio_write_one(). We drop the folio lock so that writeback can include
pages on both sides of the target page in the write without risking
deadlock.
A hint flag is added to the writeback_control struct so that a filesystem
can say that the write is triggered by write_begin seeing a conflicting
write. This causes do_writepages() to do a single pass of the loop only.
This requires ->migrate_folio() to be implemented, so point that at
filemap_migrate_folio() for files and also for symlinks and directories.
A couple of questions:
(1) afs_write_back_from_locked_folio() could be called directly rather
than calling filemap_fdatawrite_wbc(), but that would avoid the
control group stuff that wbc_attach_and_unlock_inode() and co. seem to
do. Do I actually need to do this?
(2) afs_writepages_region() has a loop in it to generate multiple writes.
do_writepages() also acquired a loop[2] which will also generate
multiple writes. Should I remove the loop from
afs_writepages_region() and leave it to the caller of ->writepages()?
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Christoph Hellwig <hch@lst.de>
cc: Matthew Wilcox <willy@infradead.org>
cc: Theodore Ts'o <tytso@mit.edu>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/20221113162902.883850-1-hch@lst.de/ [1]
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=80a2ea9f85850f1cdae814be03b4a16c3d3abc00 [2]
---
fs/afs/dir.c | 1 +
fs/afs/file.c | 3 +-
fs/afs/write.c | 63 +++++++++++++++++++++++----------------------
include/linux/writeback.h | 1 +
mm/page-writeback.c | 3 +-
5 files changed, 38 insertions(+), 33 deletions(-)
Comments
On Fri, Nov 18, 2022 at 10:37:35AM +0000, David Howells wrote:
> A hint flag is added to the writeback_control struct so that a filesystem
> can say that the write is triggered by write_begin seeing a conflicting
> write. This causes do_writepages() to do a single pass of the loop only.
Not a huge fan of that, especially as write_begin is not really a
method, but just an awkward hook in legacy write implementations.
I'd much rather have a private pointer in the writeback_control and
make the behavior implementation specific. It will need to be split
into a separate patch with proper documentation and a CC to linux-mm.
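For illustration, the private-pointer alternative being suggested might look something like the sketch below. The field name (`priv`) and the afs-side hint structure are hypothetical, not part of this patch or of mainline:

```c
/* Hypothetical sketch only: a "priv" cookie in writeback_control instead
 * of a generic for_write_begin flag.  Names are illustrative.
 */
struct writeback_control {
	/* ... existing fields ... */
	void *priv;		/* Opaque cookie owned by the filesystem */
};

/* afs could then stash its own hint for afs_writepages_region() to see: */
struct afs_writeback_hint {
	bool single_region;	/* Stop after one contiguous write */
};

static int afs_flush_conflicting_write(struct address_space *mapping,
				       struct folio *folio)
{
	struct afs_writeback_hint hint = { .single_region = true };
	struct writeback_control wbc = {
		.sync_mode	= WB_SYNC_ALL,
		.nr_to_write	= LONG_MAX,
		.range_start	= folio_pos(folio),
		.range_end	= LLONG_MAX,
		.priv		= &hint,
	};

	return filemap_fdatawrite_wbc(mapping, &wbc);
}
```

afs_writepages_region() would then break out of its loop when `wbc->priv` carries the hint, keeping the one-pass behaviour implementation-private rather than encoding a write_begin-specific flag in core mm.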
> (1) afs_write_back_from_locked_folio() could be called directly rather
> than calling filemap_fdatawrite_wbc(), but that would avoid the
> control group stuff that wbc_attach_and_unlock_inode() and co. seem to
> do. Do I actually need to do this?
That would be much preferred over the for_write_begin hack, given that
write_begin really isn't a well defined method but a hacky hook for
legacy write implementations.
> (2) afs_writepages_region() has a loop in it to generate multiple writes.
> do_writepages() also acquired a loop[2] which will also generate
> multiple writes. Should I remove the loop from
> afs_writepages_region() and leave it to the caller of ->writepages()?
Dropping out of ->writepages inside a page does seem a bit problematic,
so you probably want to keep the loop.
Christoph Hellwig <hch@lst.de> wrote:
> > (1) afs_write_back_from_locked_folio() could be called directly rather
> > than calling filemap_fdatawrite_wbc(), but that would avoid the
> > control group stuff that wbc_attach_and_unlock_inode() and co. seem to
> > do. Do I actually need to do this?
>
> That would be much preferred over the for_write_begin hack, given that
> write_begin really isn't a well defined method but a hacky hook for
> legacy write implementations.
So I don't need to worry about the control group stuff? I'll still need some
way to flush a conflicting write whatever mechanism is being used to write to
the page cache.
David
On Mon, Nov 21, 2022 at 08:28:57AM +0000, David Howells wrote:
> > That would be much preferred over the for_write_begin hack, given that
> > write_begin really isn't a well defined method but a hacky hook for
> > legacy write implementations.
>
> So I don't need to worry about the control group stuff? I'll still need some
> way to flush a conflicting write whatever mechanism is being used to write to
> the page cache.
cgroup is used for throttling writeback. If you need to flush conflicting
writes I can't see how interacting with the cgroups benefits anyone.
And as far as I can tell, afs doesn't support cgroup writeback to start
with.
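For reference, filemap_fdatawrite_wbc() is roughly the following (condensed from mm/filemap.c circa v6.1; check the tree you are working against). The cgroup interaction amounts to the attach/detach pair, which compiles away to setting wbc->inode when CONFIG_CGROUP_WRITEBACK is not enabled:

```c
int filemap_fdatawrite_wbc(struct address_space *mapping,
			   struct writeback_control *wbc)
{
	int ret;

	if (!mapping_can_writeback(mapping) ||
	    !mapping_tagged(mapping, PAGECACHE_TAG_DIRTY))
		return 0;

	/* Associate the wbc with the inode's writeback cgroup (no-op
	 * without CONFIG_CGROUP_WRITEBACK), issue the writes, detach. */
	wbc_attach_fdatawrite_inode(wbc, mapping->host);
	ret = do_writepages(mapping, wbc);
	wbc_detach_inode(wbc);
	return ret;
}
```

So calling afs_write_back_from_locked_folio() directly would skip only this attach/detach bookkeeping, which is moot if afs never opts in to cgroup writeback.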
@@ -77,6 +77,7 @@ const struct address_space_operations afs_dir_aops = {
.dirty_folio = afs_dir_dirty_folio,
.release_folio = afs_dir_release_folio,
.invalidate_folio = afs_dir_invalidate_folio,
+ .migrate_folio = filemap_migrate_folio,
};
const struct dentry_operations afs_fs_dentry_operations = {
@@ -58,14 +58,15 @@ const struct address_space_operations afs_file_aops = {
.invalidate_folio = afs_invalidate_folio,
.write_begin = afs_write_begin,
.write_end = afs_write_end,
- .writepage = afs_writepage,
.writepages = afs_writepages,
+ .migrate_folio = filemap_migrate_folio,
};
const struct address_space_operations afs_symlink_aops = {
.read_folio = afs_symlink_read_folio,
.release_folio = afs_release_folio,
.invalidate_folio = afs_invalidate_folio,
+ .migrate_folio = filemap_migrate_folio,
};
static const struct vm_operations_struct afs_vm_ops = {
@@ -38,6 +38,24 @@ static void afs_folio_start_fscache(bool caching, struct folio *folio)
}
#endif
+/*
+ * Flush out a conflicting write.  This may extend the write to the surrounding
+ * pages if they are also dirty and contiguous to the conflicting region.
+ */
+static int afs_flush_conflicting_write(struct address_space *mapping,
+ struct folio *folio)
+{
+ struct writeback_control wbc = {
+ .sync_mode = WB_SYNC_ALL,
+ .nr_to_write = LONG_MAX,
+ .range_start = folio_pos(folio),
+ .range_end = LLONG_MAX,
+ .for_write_begin = true,
+ };
+
+ return filemap_fdatawrite_wbc(mapping, &wbc);
+}
+
/*
* prepare to perform part of a write to a page
*/
@@ -80,7 +98,8 @@ int afs_write_begin(struct file *file, struct address_space *mapping,
if (folio_test_writeback(folio)) {
trace_afs_folio_dirty(vnode, tracepoint_string("alrdy"), folio);
- goto flush_conflicting_write;
+ folio_unlock(folio);
+ goto wait_for_writeback;
}
/* If the file is being filled locally, allow inter-write
* spaces to be merged into writes. If it's not, only write
@@ -99,8 +118,15 @@ int afs_write_begin(struct file *file, struct address_space *mapping,
* flush the page out.
*/
flush_conflicting_write:
- _debug("flush conflict");
- ret = folio_write_one(folio);
+ trace_afs_folio_dirty(vnode, tracepoint_string("confl"), folio);
+ folio_unlock(folio);
+
+ ret = afs_flush_conflicting_write(mapping, folio);
+ if (ret < 0)
+ goto error;
+
+wait_for_writeback:
+ ret = folio_wait_writeback_killable(folio);
if (ret < 0)
goto error;
@@ -663,34 +689,6 @@ static ssize_t afs_write_back_from_locked_folio(struct address_space *mapping,
return ret;
}
-/*
- * write a page back to the server
- * - the caller locked the page for us
- */
-int afs_writepage(struct page *subpage, struct writeback_control *wbc)
-{
- struct folio *folio = page_folio(subpage);
- ssize_t ret;
- loff_t start;
-
- _enter("{%lx},", folio_index(folio));
-
-#ifdef CONFIG_AFS_FSCACHE
- folio_wait_fscache(folio);
-#endif
-
- start = folio_index(folio) * PAGE_SIZE;
- ret = afs_write_back_from_locked_folio(folio_mapping(folio), wbc,
- folio, start, LLONG_MAX - start);
- if (ret < 0) {
- _leave(" = %zd", ret);
- return ret;
- }
-
- _leave(" = 0");
- return 0;
-}
-
/*
* write a region of pages back to the server
*/
@@ -775,6 +773,9 @@ static int afs_writepages_region(struct address_space *mapping,
start += ret;
+ if (wbc->for_write_begin)
+ break;
+
cond_resched();
} while (wbc->nr_to_write > 0);
@@ -61,6 +61,7 @@ struct writeback_control {
unsigned range_cyclic:1; /* range_start is cyclic */
unsigned for_sync:1; /* sync(2) WB_SYNC_ALL writeback */
unsigned unpinned_fscache_wb:1; /* Cleared I_PINNING_FSCACHE_WB */
+ unsigned for_write_begin:1; /* Flush conflicting write */
/*
* When writeback IOs are bounced through async layers, only the
@@ -2469,7 +2469,8 @@ int do_writepages(struct address_space *mapping, struct writeback_control *wbc)
ret = mapping->a_ops->writepages(mapping, wbc);
else
ret = generic_writepages(mapping, wbc);
- if ((ret != -ENOMEM) || (wbc->sync_mode != WB_SYNC_ALL))
+ if ((ret != -ENOMEM) || (wbc->sync_mode != WB_SYNC_ALL) ||
+ wbc->for_write_begin)
break;
/*