[09/12] writeback: Factor writeback_iter_next() out of write_cache_pages()
Commit Message
Pull the post-processing of the writepage_t callback into a
separate function. That means changing writeback_finish() to
return NULL, and writeback_get_next() to call writeback_finish()
when we naturally run out of folios.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
mm/page-writeback.c | 84 ++++++++++++++++++++++++---------------------
1 file changed, 44 insertions(+), 40 deletions(-)
Comments
On Mon, Jun 26, 2023 at 06:35:18PM +0100, Matthew Wilcox (Oracle) wrote:
> Pull the post-processing of the writepage_t callback into a
> separate function. That means changing writeback_finish() to
> return NULL, and writeback_get_next() to call writeback_finish()
> when we naturally run out of folios.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
> mm/page-writeback.c | 84 ++++++++++++++++++++++++---------------------
> 1 file changed, 44 insertions(+), 40 deletions(-)
>
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 659df2b5c7c0..ef61d7006c5e 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -2360,7 +2360,7 @@ void tag_pages_for_writeback(struct address_space *mapping,
> }
> EXPORT_SYMBOL(tag_pages_for_writeback);
>
> -static int writeback_finish(struct address_space *mapping,
> +static struct folio *writeback_finish(struct address_space *mapping,
> struct writeback_control *wbc, bool done)
> {
> folio_batch_release(&wbc->fbatch);
> @@ -2375,7 +2375,7 @@ static int writeback_finish(struct address_space *mapping,
> if (wbc->range_cyclic || (wbc->range_whole && wbc->nr_to_write > 0))
> mapping->writeback_index = wbc->done_index;
>
> - return wbc->err;
> + return NULL;
Having a return value that is always NULL feels a bit weird vs just
doing that return in the caller.
> +static struct folio *writeback_iter_next(struct address_space *mapping,
> + struct writeback_control *wbc, struct folio *folio, int error)
> +{
> + if (unlikely(error)) {
> + /*
> + * Handle errors according to the type of writeback.
> + * There's no need to continue for background writeback.
> + * Just push done_index past this folio so media
> + * errors won't choke writeout for the entire file.
> + * For integrity writeback, we must process the entire
> + * dirty set regardless of errors because the fs may
> + * still have state to clear for each folio. In that
> + * case we continue processing and return the first error.
> + */
> + if (error == AOP_WRITEPAGE_ACTIVATE) {
> + folio_unlock(folio);
> + error = 0;
Note there really shouldn't be any need for this outside of the
legacy ->writepage case. But it might make sense to delay the removal
until after ->writepage is gone to avoid bugs in conversions.
> + } else if (wbc->sync_mode != WB_SYNC_ALL) {
> + wbc->err = error;
> + wbc->done_index = folio->index +
> + folio_nr_pages(folio);
Btw, I wonder if a folio_next_index helper for this might be useful
as it's a pattern we have in a few places.
On Mon, Jun 26, 2023 at 09:39:39PM -0700, Christoph Hellwig wrote:
> On Mon, Jun 26, 2023 at 06:35:18PM +0100, Matthew Wilcox (Oracle) wrote:
> > Pull the post-processing of the writepage_t callback into a
> > separate function. That means changing writeback_finish() to
> > return NULL, and writeback_get_next() to call writeback_finish()
> > when we naturally run out of folios.
> >
> > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > ---
> > mm/page-writeback.c | 84 ++++++++++++++++++++++++---------------------
> > 1 file changed, 44 insertions(+), 40 deletions(-)
> >
> > diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> > index 659df2b5c7c0..ef61d7006c5e 100644
> > --- a/mm/page-writeback.c
> > +++ b/mm/page-writeback.c
> > @@ -2360,7 +2360,7 @@ void tag_pages_for_writeback(struct address_space *mapping,
> > }
> > EXPORT_SYMBOL(tag_pages_for_writeback);
> >
> > -static int writeback_finish(struct address_space *mapping,
> > +static struct folio *writeback_finish(struct address_space *mapping,
> > struct writeback_control *wbc, bool done)
> > {
> > folio_batch_release(&wbc->fbatch);
> > @@ -2375,7 +2375,7 @@ static int writeback_finish(struct address_space *mapping,
> > if (wbc->range_cyclic || (wbc->range_whole && wbc->nr_to_write > 0))
> > mapping->writeback_index = wbc->done_index;
> >
> > - return wbc->err;
> > + return NULL;
>
> Having a return value that is always NULL feels a bit weird vs just
> doing that return in the caller.
It makes the callers neater. Compare:
if (!folio)
return writeback_finish(mapping, wbc, false);
vs
if (!folio) {
writeback_finish(mapping, wbc, false);
return NULL;
}
Similarly for the other two callers.
> > + if (error == AOP_WRITEPAGE_ACTIVATE) {
> > + folio_unlock(folio);
> > + error = 0;
>
> Note there really shouldn't be any need for this outside of the
> legacy ->writepage case. But it might make sense to delay the removal
> until after ->writepage is gone to avoid bugs in conversions.
ext4_journalled_writepage_callback() still returns
AOP_WRITEPAGE_ACTIVATE, and that's used by a direct call to
write_cache_pages().
> > + } else if (wbc->sync_mode != WB_SYNC_ALL) {
> > + wbc->err = error;
> > + wbc->done_index = folio->index +
> > + folio_nr_pages(folio);
>
> Btw, I wonder if a folio_next_index helper for this might be useful
> as it's a pattern we have in a few places.
I think that's a reasonable addition to the API.
On Tue, Jun 27, 2023 at 04:31:17PM +0100, Matthew Wilcox wrote:
> It makes the callers neater. Compare:
>
> if (!folio)
> return writeback_finish(mapping, wbc, false);
> vs
> if (!folio) {
> writeback_finish(mapping, wbc, false);
> return NULL;
> }
>
> Similarly for the other two callers.
Not sure I agree. See my quickly cooked up patch below. But in the
end this is completely superficial and I won't complain; do it the
way you prefer.
>
> > > + if (error == AOP_WRITEPAGE_ACTIVATE) {
> > > + folio_unlock(folio);
> > > + error = 0;
> >
> > Note there really shouldn't be any need for this outside of the
> > legacy ->writepage case. But it might make sense to delay the removal
> > until after ->writepage is gone to avoid bugs in conversions.
>
> ext4_journalled_writepage_callback() still returns
> AOP_WRITEPAGE_ACTIVATE, and that's used by a direct call to
> write_cache_pages().
Yeah. But that could trivially do the open-coded unlock_page.
But probably not worth mixing into this series.
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 55832679af2194..07bbbc0dec4d00 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2360,7 +2360,7 @@ void tag_pages_for_writeback(struct address_space *mapping,
}
EXPORT_SYMBOL(tag_pages_for_writeback);
-static struct folio *writeback_finish(struct address_space *mapping,
+static void writeback_finish(struct address_space *mapping,
struct writeback_control *wbc, bool done)
{
folio_batch_release(&wbc->fbatch);
@@ -2374,8 +2374,6 @@ static struct folio *writeback_finish(struct address_space *mapping,
wbc->done_index = 0;
if (wbc->range_cyclic || (wbc->range_whole && wbc->nr_to_write > 0))
mapping->writeback_index = wbc->done_index;
-
- return NULL;
}
static struct folio *writeback_get_next(struct address_space *mapping,
@@ -2435,20 +2433,19 @@ static struct folio *writeback_get_folio(struct address_space *mapping,
{
struct folio *folio;
- for (;;) {
- folio = writeback_get_next(mapping, wbc);
- if (!folio)
- return writeback_finish(mapping, wbc, false);
+ while ((folio = writeback_get_next(mapping, wbc))) {
wbc->done_index = folio->index;
folio_lock(folio);
- if (likely(should_writeback_folio(mapping, wbc, folio)))
- break;
+ if (likely(should_writeback_folio(mapping, wbc, folio))) {
+ trace_wbc_writepage(wbc, inode_to_bdi(mapping->host));
+ return folio;
+ }
folio_unlock(folio);
}
- trace_wbc_writepage(wbc, inode_to_bdi(mapping->host));
- return folio;
+ writeback_finish(mapping, wbc, false);
+ return NULL;
}
struct folio *writeback_iter_init(struct address_space *mapping,
@@ -2494,7 +2491,7 @@ struct folio *writeback_iter_next(struct address_space *mapping,
wbc->err = error;
wbc->done_index = folio->index +
folio_nr_pages(folio);
- return writeback_finish(mapping, wbc, true);
+ goto done;
}
if (!wbc->err)
wbc->err = error;
@@ -2507,9 +2504,12 @@ struct folio *writeback_iter_next(struct address_space *mapping,
* to entering this loop.
*/
if (--wbc->nr_to_write <= 0 && wbc->sync_mode == WB_SYNC_NONE)
- return writeback_finish(mapping, wbc, true);
+ goto done;
return writeback_get_folio(mapping, wbc);
+done:
+ writeback_finish(mapping, wbc, true);
+ return NULL;
}
/**
On Tue 27-06-23 16:31:17, Matthew Wilcox wrote:
> > > + if (error == AOP_WRITEPAGE_ACTIVATE) {
> > > + folio_unlock(folio);
> > > + error = 0;
> >
> > Note there really shouldn't be any need for this outside of the
> > legacy ->writepage case. But it might make sense to delay the removal
> > until after ->writepage is gone to avoid bugs in conversions.
>
> ext4_journalled_writepage_callback() still returns
> AOP_WRITEPAGE_ACTIVATE, and that's used by a direct call to
> write_cache_pages().
Yeah. For the record, ext4_journalled_writepage_callback() is a bit of an abuse
of writeback code by ext4 data=journal path but it seemed like a lesser
evil than duplicating write_cache_pages() code. Essentially we need to
iterate all dirty folios and call page_mkclean() on them (to stop folio
modifications through mmap while transaction commit code is writing these
pages to the journal). If we exported from page writeback code enough
helpers for dirty page iteration, we could get rid of
ext4_journalled_writepage_callback() and its AOP_WRITEPAGE_ACTIVATE usage
as well.
Honza
@@ -2360,7 +2360,7 @@ void tag_pages_for_writeback(struct address_space *mapping,
}
EXPORT_SYMBOL(tag_pages_for_writeback);
-static int writeback_finish(struct address_space *mapping,
+static struct folio *writeback_finish(struct address_space *mapping,
struct writeback_control *wbc, bool done)
{
folio_batch_release(&wbc->fbatch);
@@ -2375,7 +2375,7 @@ static int writeback_finish(struct address_space *mapping,
if (wbc->range_cyclic || (wbc->range_whole && wbc->nr_to_write > 0))
mapping->writeback_index = wbc->done_index;
- return wbc->err;
+ return NULL;
}
static struct folio *writeback_get_next(struct address_space *mapping,
@@ -2438,7 +2438,7 @@ static struct folio *writeback_get_folio(struct address_space *mapping,
for (;;) {
folio = writeback_get_next(mapping, wbc);
if (!folio)
- return NULL;
+ return writeback_finish(mapping, wbc, false);
wbc->done_index = folio->index;
folio_lock(folio);
@@ -2473,6 +2473,45 @@ static struct folio *writeback_iter_init(struct address_space *mapping,
return writeback_get_folio(mapping, wbc);
}
+static struct folio *writeback_iter_next(struct address_space *mapping,
+ struct writeback_control *wbc, struct folio *folio, int error)
+{
+ if (unlikely(error)) {
+ /*
+ * Handle errors according to the type of writeback.
+ * There's no need to continue for background writeback.
+ * Just push done_index past this folio so media
+ * errors won't choke writeout for the entire file.
+ * For integrity writeback, we must process the entire
+ * dirty set regardless of errors because the fs may
+ * still have state to clear for each folio. In that
+ * case we continue processing and return the first error.
+ */
+ if (error == AOP_WRITEPAGE_ACTIVATE) {
+ folio_unlock(folio);
+ error = 0;
+ } else if (wbc->sync_mode != WB_SYNC_ALL) {
+ wbc->err = error;
+ wbc->done_index = folio->index +
+ folio_nr_pages(folio);
+ return writeback_finish(mapping, wbc, true);
+ }
+ if (!wbc->err)
+ wbc->err = error;
+ }
+
+ /*
+ * We stop writing back only if we are not doing integrity
+ * sync. In case of integrity sync we have to keep going until
+ * we have written all the folios we tagged for writeback prior
+ * to entering this loop.
+ */
+ if (--wbc->nr_to_write <= 0 && wbc->sync_mode == WB_SYNC_NONE)
+ return writeback_finish(mapping, wbc, true);
+
+ return writeback_get_folio(mapping, wbc);
+}
+
/**
* write_cache_pages - walk the list of dirty pages of the given address space and write all of them.
* @mapping: address space structure to write
@@ -2513,46 +2552,11 @@ int write_cache_pages(struct address_space *mapping,
for (folio = writeback_iter_init(mapping, wbc);
folio;
- folio = writeback_get_folio(mapping, wbc)) {
+ folio = writeback_iter_next(mapping, wbc, folio, error)) {
error = writepage(folio, wbc, data);
- if (unlikely(error)) {
- /*
- * Handle errors according to the type of
- * writeback. There's no need to continue for
- * background writeback. Just push done_index
- * past this page so media errors won't choke
- * writeout for the entire file. For integrity
- * writeback, we must process the entire dirty
- * set regardless of errors because the fs may
- * still have state to clear for each page. In
- * that case we continue processing and return
- * the first error.
- */
- if (error == AOP_WRITEPAGE_ACTIVATE) {
- folio_unlock(folio);
- error = 0;
- } else if (wbc->sync_mode != WB_SYNC_ALL) {
- wbc->err = error;
- wbc->done_index = folio->index +
- folio_nr_pages(folio);
- return writeback_finish(mapping, wbc, true);
- }
- if (!wbc->err)
- wbc->err = error;
- }
-
- /*
- * We stop writing back only if we are not doing
- * integrity sync. In case of integrity sync we have to
- * keep going until we have written all the pages
- * we tagged for writeback prior to entering this loop.
- */
- if (--wbc->nr_to_write <= 0 &&
- wbc->sync_mode == WB_SYNC_NONE)
- return writeback_finish(mapping, wbc, true);
}
- return writeback_finish(mapping, wbc, false);
+ return wbc->err;
}
EXPORT_SYMBOL(write_cache_pages);