erofs: fix missing xas_retry() in fscache mode

Message ID 20221111090813.72068-1-jefflexu@linux.alibaba.com
State New

Commit Message

Jingbo Xu Nov. 11, 2022, 9:08 a.m. UTC
  The xarray iteration only holds RCU and thus may encounter
XA_RETRY_ENTRY if there's process modifying the xarray concurrently.
This will cause oops when referring to the invalid entry.

Fix this by adding the missing xas_retry(), which will make the
iteration wind back to the root node if XA_RETRY_ENTRY is encountered.

Fixes: d435d53228dd ("erofs: change to use asynchronous io for fscache readpage/readahead")
Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
---
 fs/erofs/fscache.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)
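
As an illustration of the behaviour the commit message describes, here is a minimal userspace sketch. It is a toy model, not the kernel's xarray: `toy_item`, `toy_root`, `TOY_RETRY` and `toy_sum()` are hypothetical names invented for this example. It assumes a reader holding a stale node pointer whose slots a concurrent writer has overwritten with a retry sentinel, and shows the reader skipping the sentinel and re-walking from the root instead of dereferencing it.

```c
#include <stddef.h>

struct toy_item {
	int value;
};

/* Stand-in for XA_RETRY_ENTRY: a marker that must never be dereferenced. */
static struct toy_item toy_retry_marker;
#define TOY_RETRY (&toy_retry_marker)

/* The root always points at the node that is currently valid. */
struct toy_root {
	struct toy_item **node;
};

/*
 * Sum the values of n entries, starting from a possibly stale node.
 * When the sentinel is seen, go back to the root and re-read the same
 * index rather than treating the sentinel as a real entry.
 * Assumes the node reachable from the root holds no sentinels.
 */
static int toy_sum(const struct toy_root *root, struct toy_item **stale,
		   size_t n)
{
	struct toy_item **node = stale;
	size_t i = 0;
	int sum = 0;

	while (i < n) {
		struct toy_item *entry = node[i];

		if (entry == TOY_RETRY) {
			node = root->node;	/* wind back to the root */
			continue;		/* retry the same index */
		}
		sum += entry->value;
		i++;
	}
	return sum;
}
```

In the patch itself the same role is played by `xas_retry(&xas, folio)` followed by `continue`, placed before any use of `folio`.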
  

Comments

Gao Xiang Nov. 14, 2022, 4:42 a.m. UTC | #1
On Fri, Nov 11, 2022 at 05:08:13PM +0800, Jingbo Xu wrote:
> The xarray iteration only holds RCU and thus may encounter
> XA_RETRY_ENTRY if there's process modifying the xarray concurrently.
> This will cause oops when referring to the invalid entry.
> 
> Fix this by adding the missing xas_retry(), which will make the
> iteration wind back to the root node if XA_RETRY_ENTRY is encountered.
> 
> Fixes: d435d53228dd ("erofs: change to use asynchronous io for fscache readpage/readahead")
> Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>

Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>

Thanks,
Gao Xiang

> ---
>  fs/erofs/fscache.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
> index fe05bc51f9f2..458c1c70ef30 100644
> --- a/fs/erofs/fscache.c
> +++ b/fs/erofs/fscache.c
> @@ -75,11 +75,15 @@ static void erofs_fscache_rreq_unlock_folios(struct netfs_io_request *rreq)
>  
>  	rcu_read_lock();
>  	xas_for_each(&xas, folio, last_page) {
> -		unsigned int pgpos =
> -			(folio_index(folio) - start_page) * PAGE_SIZE;
> -		unsigned int pgend = pgpos + folio_size(folio);
> +		unsigned int pgpos, pgend;
>  		bool pg_failed = false;
>  
> +		if (xas_retry(&xas, folio))
> +			continue;
> +
> +		pgpos = (folio_index(folio) - start_page) * PAGE_SIZE;
> +		pgend = pgpos + folio_size(folio);
> +
>  		for (;;) {
>  			if (!subreq) {
>  				pg_failed = true;
> -- 
> 2.19.1.6.gb485710b
  
Jia Zhu Nov. 14, 2022, 6:27 a.m. UTC | #2
On 2022/11/11 17:08, Jingbo Xu wrote:
> The xarray iteration only holds RCU and thus may encounter
> XA_RETRY_ENTRY if there's process modifying the xarray concurrently.
> This will cause oops when referring to the invalid entry.
> 
> Fix this by adding the missing xas_retry(), which will make the
> iteration wind back to the root node if XA_RETRY_ENTRY is encountered.
> 
> Fixes: d435d53228dd ("erofs: change to use asynchronous io for fscache readpage/readahead")
> Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
> ---
>   fs/erofs/fscache.c | 10 +++++++---
>   1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
> index fe05bc51f9f2..458c1c70ef30 100644
> --- a/fs/erofs/fscache.c
> +++ b/fs/erofs/fscache.c
> @@ -75,11 +75,15 @@ static void erofs_fscache_rreq_unlock_folios(struct netfs_io_request *rreq)
>   
>   	rcu_read_lock();
>   	xas_for_each(&xas, folio, last_page) {
> -		unsigned int pgpos =
> -			(folio_index(folio) - start_page) * PAGE_SIZE;
> -		unsigned int pgend = pgpos + folio_size(folio);
> +		unsigned int pgpos, pgend;
>   		bool pg_failed = false;
>   
> +		if (xas_retry(&xas, folio))
> +			continue;
> +
> +		pgpos = (folio_index(folio) - start_page) * PAGE_SIZE;
> +		pgend = pgpos + folio_size(folio);
> +
>   		for (;;) {
>   			if (!subreq) {
>   				pg_failed = true;
  
David Howells Nov. 14, 2022, 11:44 a.m. UTC | #3
Jingbo Xu <jefflexu@linux.alibaba.com> wrote:

> The xarray iteration only holds RCU

I would say "the RCU read lock".

Also, I think you've copied the code to which my dodgy-maths fix applies:

	https://lore.kernel.org/linux-fsdevel/166757988611.950645.7626959069846893164.stgit@warthog.procyon.org.uk/

David
  
Jingbo Xu Nov. 14, 2022, 12:11 p.m. UTC | #4
Hi David,

Thanks for the comment.

On 11/14/22 7:44 PM, David Howells wrote:
> Jingbo Xu <jefflexu@linux.alibaba.com> wrote:
> 
>> The xarray iteration only holds RCU
> 
> I would say "the RCU read lock".

Yeah, this looks clearer. I will update the commit message in v2 later.

> 
> Also, I think you've copied the code to which my dodgy-maths fix applies:
> 
> 	https://lore.kernel.org/linux-fsdevel/166757988611.950645.7626959069846893164.stgit@warthog.procyon.org.uk/
> 

Thanks for the kind reminder. Yes, this code was originally copied from
libnetfs. In erofs, req->start is currently always aligned to the folio
size, and erofs doesn't support large folios yet. Thus req->start can't
fall inside a folio so far, and I think the current code works well for
erofs, though the issue does indeed exist mathematically.

Actually I'm working on large folio support now, and the completion
routine of erofs in fscache mode will be refactored quite a lot. I think
this issue will be fixed along with the refactoring.

Thanks again for the suggestion :)
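
The arithmetic concern in this exchange can be illustrated with a small userspace sketch. It reflects one reading of the issue, namely that a request could start inside a (large) folio so that the folio's index is below `start_page`; that reading is an assumption, not something stated in the thread. `toy_pgpos()` and `TOY_PAGE_SIZE` are hypothetical names, and the 4096-byte page size is assumed.

```c
#define TOY_PAGE_SIZE 4096UL	/* assumed page size, illustration only */

/*
 * Same shape as the expression in the patch:
 *   pgpos = (folio_index(folio) - start_page) * PAGE_SIZE;
 * Both indices are unsigned, so the subtraction wraps around if the
 * folio begins before start_page.
 */
static unsigned int toy_pgpos(unsigned long folio_index,
			      unsigned long start_page)
{
	return (unsigned int)((folio_index - start_page) * TOY_PAGE_SIZE);
}
```

With the request aligned to the folio, `toy_pgpos(4, 2)` gives 8192 as expected. With a folio that starts two pages before the request, `toy_pgpos(0, 2)` wraps to 0xFFFFE000 instead of a negative offset, which is why the expression is only safe while req->start stays folio-aligned.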
  

Patch

diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
index fe05bc51f9f2..458c1c70ef30 100644
--- a/fs/erofs/fscache.c
+++ b/fs/erofs/fscache.c
@@ -75,11 +75,15 @@ static void erofs_fscache_rreq_unlock_folios(struct netfs_io_request *rreq)
 
 	rcu_read_lock();
 	xas_for_each(&xas, folio, last_page) {
-		unsigned int pgpos =
-			(folio_index(folio) - start_page) * PAGE_SIZE;
-		unsigned int pgend = pgpos + folio_size(folio);
+		unsigned int pgpos, pgend;
 		bool pg_failed = false;
 
+		if (xas_retry(&xas, folio))
+			continue;
+
+		pgpos = (folio_index(folio) - start_page) * PAGE_SIZE;
+		pgend = pgpos + folio_size(folio);
+
 		for (;;) {
 			if (!subreq) {
 				pg_failed = true;