[v2,1/4] md/raid10: fix null-ptr-deref of mreplace in raid10_sync_request

Message ID 20230526074551.669792-2-linan666@huaweicloud.com
State New
Series raid10 bugfix

Commit Message

Li Nan May 26, 2023, 7:45 a.m. UTC
  From: Li Nan <linan122@huawei.com>

need_replace is set to 1 if a non-Faulty mreplace exists, and mreplace
is dereferenced later. However, the later check of mreplace might set
mreplace to NULL, so a null-ptr-deref occurs if need_replace is 1 at that
time.

Fix it by merging the two checks into one, and replace 'need_replace' with
'mreplace' because their values are always the same.

Fixes: ee37d7314a32 ("md/raid10: Fix raid10 replace hang when new added disk faulty")
Signed-off-by: Li Nan <linan122@huawei.com>
---
 drivers/md/raid10.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)
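
The claim that 'need_replace' and 'mreplace' always carry the same value can be checked with a small user-space model. This is only a sketch under stated assumptions: `struct model_rdev`, `old_need_replace`, `merged_check` and `check_equivalent` are hypothetical names invented for illustration, and the `faulty` field stands in for `test_bit(Faulty, &mreplace->flags)`. It is not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct md_rdev; only the Faulty state is modelled. */
struct model_rdev {
	bool faulty;
};

/* Old flag: 1 when a replacement exists and is not Faulty. */
static int old_need_replace(const struct model_rdev *mreplace)
{
	return mreplace != NULL && !mreplace->faulty;
}

/* Merged check from the patch: a Faulty replacement is dropped (NULL). */
static struct model_rdev *merged_check(struct model_rdev *mreplace)
{
	if (mreplace != NULL && mreplace->faulty)
		return NULL;
	return mreplace;
}

/* Asserts that "mreplace is non-NULL after the merged check" matches the old flag. */
static void check_equivalent(struct model_rdev *mreplace)
{
	assert((merged_check(mreplace) != NULL) == old_need_replace(mreplace));
}
```

Calling `check_equivalent` with NULL, with a non-Faulty device and with a Faulty device covers the three states the check can observe. In each case the pointer alone tells the caller whether a replacement should be used.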
  

Comments

Song Liu May 26, 2023, 9:38 p.m. UTC | #1
On Fri, May 26, 2023 at 12:47 AM <linan666@huaweicloud.com> wrote:
>
> From: Li Nan <linan122@huawei.com>
>
> need_replace will be set to 1 if no-Faulty mreplace exists, and mreplace
> will be deref later. However, the latter check of mreplace might set
> mreplace to NULL, null-ptr-deref occurs if need_replace is 1 at this time.
>
> Fix it by merging two checks into one. And replace 'need_replace' with
> 'mreplace' because their values are always the same.
>
> Fixes: ee37d7314a32 ("md/raid10: Fix raid10 replace hang when new added disk faulty")
> Signed-off-by: Li Nan <linan122@huawei.com>
> ---
>  drivers/md/raid10.c | 13 +++++--------
>  1 file changed, 5 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 4fcfcb350d2b..e21502c03b45 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -3438,7 +3438,6 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
>                         int must_sync;
>                         int any_working;
>                         int need_recover = 0;
> -                       int need_replace = 0;
>                         struct raid10_info *mirror = &conf->mirrors[i];
>                         struct md_rdev *mrdev, *mreplace;
>
> @@ -3451,10 +3450,10 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
>                             !test_bit(In_sync, &mrdev->flags))
>                                 need_recover = 1;
>                         if (mreplace != NULL &&
> -                           !test_bit(Faulty, &mreplace->flags))
> -                               need_replace = 1;
> +                           test_bit(Faulty, &mreplace->flags))
> +                               mreplace = NULL;
>
> -                       if (!need_recover && !need_replace) {
> +                       if (!need_recover && !mreplace) {
>                                 rcu_read_unlock();
>                                 continue;
>                         }
> @@ -3470,8 +3469,6 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
>                                 rcu_read_unlock();
>                                 continue;
>                         }

To make sure I understand the issue correctly:

The null-ptr-deref only happens when the Faulty bit was set after the
last check and before this check below, right?

> -                       if (mreplace && test_bit(Faulty, &mreplace->flags))
> -                               mreplace = NULL;
>                         /* Unless we are doing a full sync, or a replacement
>                          * we only need to recover the block if it is set in
>                          * the bitmap

Thanks,
Song

> @@ -3594,11 +3591,11 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
>                                 bio = r10_bio->devs[1].repl_bio;
>                                 if (bio)
>                                         bio->bi_end_io = NULL;
> -                               /* Note: if need_replace, then bio
> +                               /* Note: if replace is not NULL, then bio
>                                  * cannot be NULL as r10buf_pool_alloc will
>                                  * have allocated it.
>                                  */
> -                               if (!need_replace)
> +                               if (!mreplace)
>                                         break;
>                                 bio->bi_next = biolist;
>                                 biolist = bio;
> --
> 2.31.1
>
  
Yu Kuai May 27, 2023, 1:17 a.m. UTC | #2
Hi,

On 2023/05/27 5:38, Song Liu wrote:
> On Fri, May 26, 2023 at 12:47 AM <linan666@huaweicloud.com> wrote:
> 
> To make sure I understand the issue correctly:
> 
> The null-ptr-deref only happens when the Faulty bit was set after the
> last check and before this check below, right?

Yes, you're right.

Thanks,
Kuai
> 
>> -                       if (mreplace && test_bit(Faulty, &mreplace->flags))
>> -                               mreplace = NULL;
>>                          /* Unless we are doing a full sync, or a replacement
>>                           * we only need to recover the block if it is set in
>>                           * the bitmap
> 
> Thanks,
> Song
>
  
Yu Kuai May 27, 2023, 1:21 a.m. UTC | #3
Hi,

On 2023/05/26 15:45, linan666@huaweicloud.com wrote:
> From: Li Nan <linan122@huawei.com>
> 
> need_replace will be set to 1 if no-Faulty mreplace exists, and mreplace
> will be deref later. However, the latter check of mreplace might set
> mreplace to NULL, null-ptr-deref occurs if need_replace is 1 at this time.
> 
> Fix it by merging two checks into one. And replace 'need_replace' with
> 'mreplace' because their values are always the same.
> 
> Fixes: ee37d7314a32 ("md/raid10: Fix raid10 replace hang when new added disk faulty")
> Signed-off-by: Li Nan <linan122@huawei.com>

Other than some nits below, this patch looks good to me, feel free to
add:

Reviewed-by: Yu Kuai <yukuai3@huawei.com>
> ---
>   drivers/md/raid10.c | 13 +++++--------
>   1 file changed, 5 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 4fcfcb350d2b..e21502c03b45 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -3438,7 +3438,6 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
>   			int must_sync;
>   			int any_working;
>   			int need_recover = 0;
> -			int need_replace = 0;
>   			struct raid10_info *mirror = &conf->mirrors[i];
>   			struct md_rdev *mrdev, *mreplace;
>   
> @@ -3451,10 +3450,10 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
>   			    !test_bit(In_sync, &mrdev->flags))
>   				need_recover = 1;
>   			if (mreplace != NULL &&
> -			    !test_bit(Faulty, &mreplace->flags))
> -				need_replace = 1;
> +			    test_bit(Faulty, &mreplace->flags))
This can be kept on one line.

> +				mreplace = NULL;
>   
> -			if (!need_recover && !need_replace) {
> +			if (!need_recover && !mreplace) {
>   				rcu_read_unlock();
>   				continue;
>   			}
> @@ -3470,8 +3469,6 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
>   				rcu_read_unlock();
>   				continue;
>   			}
> -			if (mreplace && test_bit(Faulty, &mreplace->flags))
> -				mreplace = NULL;
>   			/* Unless we are doing a full sync, or a replacement
>   			 * we only need to recover the block if it is set in
>   			 * the bitmap
> @@ -3594,11 +3591,11 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
>   				bio = r10_bio->devs[1].repl_bio;
>   				if (bio)
>   					bio->bi_end_io = NULL;
> -				/* Note: if need_replace, then bio
> +				/* Note: if replace is not NULL, then bio
>   				 * cannot be NULL as r10buf_pool_alloc will
>   				 * have allocated it.
>   				 */
> -				if (!need_replace)
> +				if (!mreplace)
>   					break;
>   				bio->bi_next = biolist;
>   				biolist = bio;
>
  
Li Nan May 27, 2023, 1:28 a.m. UTC | #4
On 2023/5/27 5:38, Song Liu wrote:
> On Fri, May 26, 2023 at 12:47 AM <linan666@huaweicloud.com> wrote:
>>
>> From: Li Nan <linan122@huawei.com>
>>
>> need_replace will be set to 1 if no-Faulty mreplace exists, and mreplace
>> will be deref later. However, the latter check of mreplace might set
>> mreplace to NULL, null-ptr-deref occurs if need_replace is 1 at this time.
>>
>> Fix it by merging two checks into one. And replace 'need_replace' with
>> 'mreplace' because their values are always the same.
>>
>> Fixes: ee37d7314a32 ("md/raid10: Fix raid10 replace hang when new added disk faulty")
>> Signed-off-by: Li Nan <linan122@huawei.com>
>> ---
>>   drivers/md/raid10.c | 13 +++++--------
>>   1 file changed, 5 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
>> index 4fcfcb350d2b..e21502c03b45 100644
>> --- a/drivers/md/raid10.c
>> +++ b/drivers/md/raid10.c
>> @@ -3438,7 +3438,6 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
>>                          int must_sync;
>>                          int any_working;
>>                          int need_recover = 0;
>> -                       int need_replace = 0;
>>                          struct raid10_info *mirror = &conf->mirrors[i];
>>                          struct md_rdev *mrdev, *mreplace;
>>
>> @@ -3451,10 +3450,10 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
>>                              !test_bit(In_sync, &mrdev->flags))
>>                                  need_recover = 1;
>>                          if (mreplace != NULL &&
>> -                           !test_bit(Faulty, &mreplace->flags))
>> -                               need_replace = 1;
>> +                           test_bit(Faulty, &mreplace->flags))
>> +                               mreplace = NULL;
>>
>> -                       if (!need_recover && !need_replace) {
>> +                       if (!need_recover && !mreplace) {
>>                                  rcu_read_unlock();
>>                                  continue;
>>                          }
>> @@ -3470,8 +3469,6 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
>>                                  rcu_read_unlock();
>>                                  continue;
>>                          }
> 
> To make sure I understand the issue correctly:
> 
> The null-ptr-deref only happens when the Faulty bit was set after the
> last check and before this check below, right?
> 

Yes. I will improve the commit log in the next version.
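
The window confirmed above can be reproduced in a single-threaded user-space model. This is a rough sketch, not the kernel code path: `struct model_rdev`, `struct outcome`, `old_flow`, `new_flow` and `would_deref_null` are hypothetical names, and the `fail_between` parameter is an assumption that stands in for another context setting the Faulty bit after the first check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct md_rdev; only the Faulty state is modelled. */
struct model_rdev {
	bool faulty;
};

/* State left behind for the later code that uses the replacement device. */
struct outcome {
	int need_replace;
	struct model_rdev *mreplace;
};

/* True when later code guarded by need_replace would dereference NULL. */
static bool would_deref_null(struct outcome o)
{
	return o.need_replace && o.mreplace == NULL;
}

/* Old flow: the flag is set by one check, the pointer is cleared by a later one. */
static struct outcome old_flow(struct model_rdev *mreplace, bool fail_between)
{
	struct outcome o = { 0, mreplace };

	if (o.mreplace != NULL && !o.mreplace->faulty)
		o.need_replace = 1;
	if (fail_between && mreplace != NULL)
		mreplace->faulty = true;	/* models a concurrent failure */
	if (o.mreplace && o.mreplace->faulty)
		o.mreplace = NULL;		/* flag and pointer now disagree */
	return o;
}

/* New flow: one check only; the pointer itself serves as the flag. */
static struct outcome new_flow(struct model_rdev *mreplace, bool fail_between)
{
	struct outcome o = { 0, mreplace };

	if (o.mreplace != NULL && o.mreplace->faulty)
		o.mreplace = NULL;
	o.need_replace = (o.mreplace != NULL);
	if (fail_between && mreplace != NULL)
		mreplace->faulty = true;	/* no second check can clear the pointer */
	assert(!would_deref_null(o));
	return o;
}
```

With `fail_between` set, `old_flow` returns need_replace == 1 together with a NULL mreplace, which is the state that leads to the null-ptr-deref. `new_flow` cannot reach that state because the flag and the pointer are derived from the same single check. How the real code handles a device that becomes Faulty after that check is outside this model.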
  
Li Nan May 27, 2023, 1:29 a.m. UTC | #5
On 2023/5/27 9:21, Yu Kuai wrote:
> Hi,
> 
> On 2023/05/26 15:45, linan666@huaweicloud.com wrote:
>> From: Li Nan <linan122@huawei.com>
>>
>> need_replace will be set to 1 if no-Faulty mreplace exists, and mreplace
>> will be deref later. However, the latter check of mreplace might set
>> mreplace to NULL, null-ptr-deref occurs if need_replace is 1 at this time.
>>
>> Fix it by merging two checks into one. And replace 'need_replace' with
>> 'mreplace' because their values are always the same.
>>
>> Fixes: ee37d7314a32 ("md/raid10: Fix raid10 replace hang when new added disk faulty")
>> Signed-off-by: Li Nan <linan122@huawei.com>
> 
> Other than some nits below, this patch looks good to me, feel free to
> add:
> 
> Reviewed-by: Yu Kuai <yukuai3@huawei.com>
>> ---
>>   drivers/md/raid10.c | 13 +++++--------
>>   1 file changed, 5 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
>> index 4fcfcb350d2b..e21502c03b45 100644
>> --- a/drivers/md/raid10.c
>> +++ b/drivers/md/raid10.c
>> @@ -3438,7 +3438,6 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
>>               int must_sync;
>>               int any_working;
>>               int need_recover = 0;
>> -            int need_replace = 0;
>>               struct raid10_info *mirror = &conf->mirrors[i];
>>               struct md_rdev *mrdev, *mreplace;
>> @@ -3451,10 +3450,10 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
>>                   !test_bit(In_sync, &mrdev->flags))
>>                   need_recover = 1;
>>               if (mreplace != NULL &&
>> -                !test_bit(Faulty, &mreplace->flags))
>> -                need_replace = 1;
>> +                test_bit(Faulty, &mreplace->flags))
> This can be kept on one line.
> 

OK, I will change it.
Thanks for your review.
  

Patch

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 4fcfcb350d2b..e21502c03b45 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -3438,7 +3438,6 @@  static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 			int must_sync;
 			int any_working;
 			int need_recover = 0;
-			int need_replace = 0;
 			struct raid10_info *mirror = &conf->mirrors[i];
 			struct md_rdev *mrdev, *mreplace;
 
@@ -3451,10 +3450,10 @@  static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 			    !test_bit(In_sync, &mrdev->flags))
 				need_recover = 1;
 			if (mreplace != NULL &&
-			    !test_bit(Faulty, &mreplace->flags))
-				need_replace = 1;
+			    test_bit(Faulty, &mreplace->flags))
+				mreplace = NULL;
 
-			if (!need_recover && !need_replace) {
+			if (!need_recover && !mreplace) {
 				rcu_read_unlock();
 				continue;
 			}
@@ -3470,8 +3469,6 @@  static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 				rcu_read_unlock();
 				continue;
 			}
-			if (mreplace && test_bit(Faulty, &mreplace->flags))
-				mreplace = NULL;
 			/* Unless we are doing a full sync, or a replacement
 			 * we only need to recover the block if it is set in
 			 * the bitmap
@@ -3594,11 +3591,11 @@  static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 				bio = r10_bio->devs[1].repl_bio;
 				if (bio)
 					bio->bi_end_io = NULL;
-				/* Note: if need_replace, then bio
+				/* Note: if replace is not NULL, then bio
 				 * cannot be NULL as r10buf_pool_alloc will
 				 * have allocated it.
 				 */
-				if (!need_replace)
+				if (!mreplace)
 					break;
 				bio->bi_next = biolist;
 				biolist = bio;