[v2,2/6] md: fix soft lockup in status_resync

Message ID 20230310073855.1337560-3-yukuai1@huaweicloud.com
State New
Series md/raid10: several simple obvious bugfix

Commit Message

Yu Kuai March 10, 2023, 7:38 a.m. UTC
From: Yu Kuai <yukuai3@huawei.com>

status_resync() calculates 'curr_resync - recovery_active' to show the
user a progress bar like the following:

[============>........]  resync = 61.4%

'curr_resync' and 'recovery_active' are updated in md_do_sync(), and
status_resync() can read them concurrently, hence it's possible for
'curr_resync - recovery_active' to underflow and wrap around to a huge
number. In that case status_resync() gets stuck in the loop that prints
'=', emitting a huge number of them and ending in a soft lockup.

Fix the problem by setting 'resync' to MD_RESYNC_ACTIVE in this case;
this way, a resync in progress is still reported to the user.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

Comments

Song Liu March 13, 2023, 10:24 p.m. UTC
On Thu, Mar 9, 2023 at 11:39 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> From: Yu Kuai <yukuai3@huawei.com>
>
> status_resync() calculates 'curr_resync - recovery_active' to show the
> user a progress bar like the following:
>
> [============>........]  resync = 61.4%
>
> 'curr_resync' and 'recovery_active' are updated in md_do_sync(), and
> status_resync() can read them concurrently, hence it's possible for
> 'curr_resync - recovery_active' to underflow and wrap around to a huge
> number. In that case status_resync() gets stuck in the loop that prints
> '=', emitting a huge number of them and ending in a soft lockup.
>
> Fix the problem by setting 'resync' to MD_RESYNC_ACTIVE in this case;
> this way, a resync in progress is still reported to the user.
>
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>

Looks good. Applied to md-next.

Thanks,
Song


Patch

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 546b1b81eb28..98970bbe32bf 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -8009,16 +8009,16 @@ static int status_resync(struct seq_file *seq, struct mddev *mddev)
 	} else if (resync > max_sectors) {
 		resync = max_sectors;
 	} else {
-		resync -= atomic_read(&mddev->recovery_active);
-		if (resync < MD_RESYNC_ACTIVE) {
-			/*
-			 * Resync has started, but the subtraction has
-			 * yielded one of the special values. Force it
-			 * to active to ensure the status reports an
-			 * active resync.
-			 */
+		res = atomic_read(&mddev->recovery_active);
+		/*
+		 * Resync has started, but the subtraction has overflowed or
+		 * yielded one of the special values. Force it to active to
+		 * ensure the status reports an active resync.
+		 */
+		if (resync < res || resync - res < MD_RESYNC_ACTIVE)
 			resync = MD_RESYNC_ACTIVE;
-		}
+		else
+			resync -= res;
 	}
 
 	if (resync == MD_RESYNC_NONE) {