md: Don't clear MD_CLOSING when the raid is about to stop

Message ID 20231211081714.1923567-1-linan666@huaweicloud.com
State New
Headers
Series md: Don't clear MD_CLOSING when the raid is about to stop |

Commit Message

Li Nan Dec. 11, 2023, 8:17 a.m. UTC
  From: Li Nan <linan122@huawei.com>

The raid should not be opened anymore when it is about to be stopped.
However, other processes can open it again if the flag MD_CLOSING is
cleared before exiting. From now on, this flag will not be cleared when
the raid will be stopped.

Fixes: 065e519e71b2 ("md: MD_CLOSING needs to be cleared after called md_set_readonly or do_md_stop")
Signed-off-by: Li Nan <linan122@huawei.com>
---
 drivers/md/md.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)
  

Comments

Mariusz Tkaczyk Dec. 11, 2023, 9:56 a.m. UTC | #1
On Mon, 11 Dec 2023 16:17:14 +0800
linan666@huaweicloud.com wrote:

> From: Li Nan <linan122@huawei.com>
> 
> The raid should not be opened anymore when it is about to be stopped.
> However, other processes can open it again if the flag MD_CLOSING is
> cleared before exiting. From now on, this flag will not be cleared when
> the raid will be stopped.
> 
> Fixes: 065e519e71b2 ("md: MD_CLOSING needs to be cleared after called
> md_set_readonly or do_md_stop") Signed-off-by: Li Nan <linan122@huawei.com>

Hello Li Nan,
I was there when I needed to fix this:
https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/commit/?h=md-next&id=c8870379a21fbd9ad14ca36204ccfbe9d25def43

For sure, you have to consider applying same solution for array_store "clear".
Minor nit below.

Thanks,
Mariusz

> ---
>  drivers/md/md.c | 8 +++-----
>  1 file changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 4e9fe5cbeedc..ebdfc9068a60 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -6238,7 +6238,6 @@ static void md_clean(struct mddev *mddev)
>  	mddev->persistent = 0;
>  	mddev->level = LEVEL_NONE;
>  	mddev->clevel[0] = 0;
> -	mddev->flags = 0;

I recommend (safety recommendation):
	mddev->flags = MD_CLOSING;

Unless you can prove that other flags cannot race.

>  	mddev->sb_flags = 0;
>  	mddev->ro = MD_RDWR;
>  	mddev->metadata_type[0] = 0;
  
Yu Kuai Dec. 12, 2023, 3:21 a.m. UTC | #2
Hi,

在 2023/12/11 17:56, Mariusz Tkaczyk 写道:
> On Mon, 11 Dec 2023 16:17:14 +0800
> linan666@huaweicloud.com wrote:
> 
>> From: Li Nan <linan122@huawei.com>
>>
>> The raid should not be opened anymore when it is about to be stopped.
>> However, other processes can open it again if the flag MD_CLOSING is
>> cleared before exiting. From now on, this flag will not be cleared when
>> the raid will be stopped.
>>
>> Fixes: 065e519e71b2 ("md: MD_CLOSING needs to be cleared after called
>> md_set_readonly or do_md_stop") Signed-off-by: Li Nan <linan122@huawei.com>
> 
> Hello Li Nan,
> I was there when I needed to fix this:
> https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/commit/?h=md-next&id=c8870379a21fbd9ad14ca36204ccfbe9d25def43
> 
> For sure, you have to consider applying same solution for array_store "clear".
> Minor nit below.
> 
> Thanks,
> Mariusz
> 
>> ---
>>   drivers/md/md.c | 8 +++-----
>>   1 file changed, 3 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>> index 4e9fe5cbeedc..ebdfc9068a60 100644
>> --- a/drivers/md/md.c
>> +++ b/drivers/md/md.c
>> @@ -6238,7 +6238,6 @@ static void md_clean(struct mddev *mddev)
>>   	mddev->persistent = 0;
>>   	mddev->level = LEVEL_NONE;
>>   	mddev->clevel[0] = 0;
>> -	mddev->flags = 0;
> 
> I recommend (safety recommendation):
> 	mddev->flags = MD_CLOSING;

Taking a look I think both MD_CLOSING and MD_DELETED should not be
cleared, however, there is no guarantee that MD_CLOSING will be set
before md_clean, because mdadm can be removed without running. Hence I
think just set MD_CLOSING is werid.

I think the proper way is to keep MD_CLOSING and MD_DELETED if they are
set. However, there is no such api to clear other bits at once. Since
we're not expecting anyone else to write flags, following maybe
acceptable:

mddev->flags &= BIT_ULL_MASK(MD_CLOSING) | BIT_ULL_MASK(MD_DELETED);

Or after making sure other flags cannot race, this patch is ok.

Thanks,
Kuai

> 
> Unless you can prove that other flags cannot race.
> 
>>   	mddev->sb_flags = 0;
>>   	mddev->ro = MD_RDWR;
>>   	mddev->metadata_type[0] = 0;
> 
> .
>
  
Mariusz Tkaczyk Dec. 12, 2023, 9:34 a.m. UTC | #3
On Tue, 12 Dec 2023 11:21:28 +0800
Yu Kuai <yukuai1@huaweicloud.com> wrote:

> Hi,
> 
> 在 2023/12/11 17:56, Mariusz Tkaczyk 写道:
> > On Mon, 11 Dec 2023 16:17:14 +0800
> > linan666@huaweicloud.com wrote:
> >   
> >> From: Li Nan <linan122@huawei.com>
> >>
> >> The raid should not be opened anymore when it is about to be stopped.
> >> However, other processes can open it again if the flag MD_CLOSING is
> >> cleared before exiting. From now on, this flag will not be cleared when
> >> the raid will be stopped.
> >>
> >> Fixes: 065e519e71b2 ("md: MD_CLOSING needs to be cleared after called
> >> md_set_readonly or do_md_stop") Signed-off-by: Li Nan
> >> <linan122@huawei.com>  
> > 
> > Hello Li Nan,
> > I was there when I needed to fix this:
> > https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/commit/?h=md-next&id=c8870379a21fbd9ad14ca36204ccfbe9d25def43
> > 
> > For sure, you have to consider applying same solution for array_store
> > "clear". Minor nit below.
> > 
> > Thanks,
> > Mariusz
> >   
> >> ---
> >>   drivers/md/md.c | 8 +++-----
> >>   1 file changed, 3 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/drivers/md/md.c b/drivers/md/md.c
> >> index 4e9fe5cbeedc..ebdfc9068a60 100644
> >> --- a/drivers/md/md.c
> >> +++ b/drivers/md/md.c
> >> @@ -6238,7 +6238,6 @@ static void md_clean(struct mddev *mddev)
> >>   	mddev->persistent = 0;
> >>   	mddev->level = LEVEL_NONE;
> >>   	mddev->clevel[0] = 0;
> >> -	mddev->flags = 0;  
> > 
> > I recommend (safety recommendation):
> > 	mddev->flags = MD_CLOSING;  
> 
> Taking a look I think both MD_CLOSING and MD_DELETED should not be
> cleared, however, there is no guarantee that MD_CLOSING will be set
> before md_clean, because mdadm can be removed without running. Hence I
> think just set MD_CLOSING is werid.
> 
> I think the proper way is to keep MD_CLOSING and MD_DELETED if they are
> set. However, there is no such api to clear other bits at once. Since
> we're not expecting anyone else to write flags, following maybe
> acceptable:
> 
> mddev->flags &= BIT_ULL_MASK(MD_CLOSING) | BIT_ULL_MASK(MD_DELETED);

Yes, MD_CLOSING is a bit number to not a bit value I can assign directly.
Thanks for clarifying!

Mariusz
> 
> Or after making sure other flags cannot race, this patch is ok.
> 
> Thanks,
> Kuai
> 
> > 
> > Unless you can prove that other flags cannot race.
> >   
> >>   	mddev->sb_flags = 0;
> >>   	mddev->ro = MD_RDWR;
> >>   	mddev->metadata_type[0] = 0;  
> > 
> > .
> >   
>
  
kernel test robot Dec. 26, 2023, 6:24 a.m. UTC | #4
Hello,

kernel test robot noticed "mdadm-selftests.06name.fail" on:

commit: 4d060913335fad6f1d1f0816859c20aae823851c ("[PATCH] md: Don't clear MD_CLOSING when the raid is about to stop")
url: https://github.com/intel-lab-lkp/linux/commits/linan666-huaweicloud-com/md-Don-t-clear-MD_CLOSING-when-the-raid-is-about-to-stop/20231211-162010
base: git://git.kernel.org/cgit/linux/kernel/git/song/md.git md-next
patch link: https://lore.kernel.org/all/20231211081714.1923567-1-linan666@huaweicloud.com/
patch subject: [PATCH] md: Don't clear MD_CLOSING when the raid is about to stop

in testcase: mdadm-selftests
version: mdadm-selftests-x86_64-5f41845-1_20220826
with following parameters:

	disk: 1HDD
	test_prefix: 06name



compiler: gcc-12
test machine: 8 threads 1 sockets Intel(R) Core(TM) i7-4790T CPU @ 2.70GHz (Haswell) with 16G memory

(please refer to attached dmesg/kmsg for entire log/backtrace)




If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202312261217.c5597bdf-oliver.sang@intel.com

2023-12-24 10:19:28 mkdir -p /var/tmp
2023-12-24 10:19:28 mke2fs -t ext3 -b 4096 -J size=4 -q /dev/sdb1
2023-12-24 10:19:59 mount -t ext3 /dev/sdb1 /var/tmp
sed -e 's/{DEFAULT_METADATA}/1.2/g' \
-e 's,{MAP_PATH},/run/mdadm/map,g'  mdadm.8.in > mdadm.8
/usr/bin/install -D -m 644 mdadm.8 /usr/share/man/man8/mdadm.8
/usr/bin/install -D -m 644 mdmon.8 /usr/share/man/man8/mdmon.8
/usr/bin/install -D -m 644 md.4 /usr/share/man/man4/md.4
/usr/bin/install -D -m 644 mdadm.conf.5 /usr/share/man/man5/mdadm.conf.5
/usr/bin/install -D -m 644 udev-md-raid-creating.rules /lib/udev/rules.d/01-md-raid-creating.rules
/usr/bin/install -D -m 644 udev-md-raid-arrays.rules /lib/udev/rules.d/63-md-raid-arrays.rules
/usr/bin/install -D -m 644 udev-md-raid-assembly.rules /lib/udev/rules.d/64-md-raid-assembly.rules
/usr/bin/install -D -m 644 udev-md-clustered-confirm-device.rules /lib/udev/rules.d/69-md-clustered-confirm-device.rules
/usr/bin/install -D  -m 755 mdadm /sbin/mdadm
/usr/bin/install -D  -m 755 mdmon /sbin/mdmon
Testing on linux-6.7.0-rc3-00014-g4d060913335f kernel
/lkp/benchmarks/mdadm-selftests/tests/06name... FAILED - see /var/tmp/06name.log and /var/tmp/fail06name.log for details



The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20231226/202312261217.c5597bdf-oliver.sang@intel.com
  
Li Nan Dec. 26, 2023, 12:22 p.m. UTC | #5
在 2023/12/26 14:24, kernel test robot 写道:
> 
> 
> Hello,
> 
> kernel test robot noticed "mdadm-selftests.06name.fail" on:
> 
> commit: 4d060913335fad6f1d1f0816859c20aae823851c ("[PATCH] md: Don't clear MD_CLOSING when the raid is about to stop")
> url: https://github.com/intel-lab-lkp/linux/commits/linan666-huaweicloud-com/md-Don-t-clear-MD_CLOSING-when-the-raid-is-about-to-stop/20231211-162010
> base: git://git.kernel.org/cgit/linux/kernel/git/song/md.git md-next
> patch link: https://lore.kernel.org/all/20231211081714.1923567-1-linan666@huaweicloud.com/
> patch subject: [PATCH] md: Don't clear MD_CLOSING when the raid is about to stop
> 
> in testcase: mdadm-selftests
> version: mdadm-selftests-x86_64-5f41845-1_20220826
> with following parameters:
> 
> 	disk: 1HDD
> 	test_prefix: 06name
> 
> 
> 
> compiler: gcc-12
> test machine: 8 threads 1 sockets Intel(R) Core(TM) i7-4790T CPU @ 2.70GHz (Haswell) with 16G memory
> 
> (please refer to attached dmesg/kmsg for entire log/backtrace)
> 
> 
> 
> 
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@intel.com>
> | Closes: https://lore.kernel.org/oe-lkp/202312261217.c5597bdf-oliver.sang@intel.com
> 
> 2023-12-24 10:19:28 mkdir -p /var/tmp
> 2023-12-24 10:19:28 mke2fs -t ext3 -b 4096 -J size=4 -q /dev/sdb1
> 2023-12-24 10:19:59 mount -t ext3 /dev/sdb1 /var/tmp
> sed -e 's/{DEFAULT_METADATA}/1.2/g' \
> -e 's,{MAP_PATH},/run/mdadm/map,g'  mdadm.8.in > mdadm.8
> /usr/bin/install -D -m 644 mdadm.8 /usr/share/man/man8/mdadm.8
> /usr/bin/install -D -m 644 mdmon.8 /usr/share/man/man8/mdmon.8
> /usr/bin/install -D -m 644 md.4 /usr/share/man/man4/md.4
> /usr/bin/install -D -m 644 mdadm.conf.5 /usr/share/man/man5/mdadm.conf.5
> /usr/bin/install -D -m 644 udev-md-raid-creating.rules /lib/udev/rules.d/01-md-raid-creating.rules
> /usr/bin/install -D -m 644 udev-md-raid-arrays.rules /lib/udev/rules.d/63-md-raid-arrays.rules
> /usr/bin/install -D -m 644 udev-md-raid-assembly.rules /lib/udev/rules.d/64-md-raid-assembly.rules
> /usr/bin/install -D -m 644 udev-md-clustered-confirm-device.rules /lib/udev/rules.d/69-md-clustered-confirm-device.rules
> /usr/bin/install -D  -m 755 mdadm /sbin/mdadm
> /usr/bin/install -D  -m 755 mdmon /sbin/mdmon
> Testing on linux-6.7.0-rc3-00014-g4d060913335f kernel
> /lkp/benchmarks/mdadm-selftests/tests/06name... FAILED - see /var/tmp/06name.log and /var/tmp/fail06name.log for details
> 
> 

If MD_CLOSING is not cleared when stopping, it will still exist when
assembling and mddev can't be opened anymore. We need to clear it when 
starting. I will fix it in next version.

> 
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20231226/202312261217.c5597bdf-oliver.sang@intel.com
> 
> 
>
  
Li Nan Jan. 18, 2024, 1:49 a.m. UTC | #6
在 2023/12/12 11:21, Yu Kuai 写道:
> Hi,
> 
> 在 2023/12/11 17:56, Mariusz Tkaczyk 写道:
>> On Mon, 11 Dec 2023 16:17:14 +0800
>> linan666@huaweicloud.com wrote:
>>
>>> From: Li Nan <linan122@huawei.com>
>>>
>>> The raid should not be opened anymore when it is about to be stopped.
>>> However, other processes can open it again if the flag MD_CLOSING is
>>> cleared before exiting. From now on, this flag will not be cleared when
>>> the raid will be stopped.
>>>
>>> Fixes: 065e519e71b2 ("md: MD_CLOSING needs to be cleared after called
>>> md_set_readonly or do_md_stop") Signed-off-by: Li Nan 
>>> <linan122@huawei.com>
>>
>> Hello Li Nan,
>> I was there when I needed to fix this:
>> https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/commit/?h=md-next&id=c8870379a21fbd9ad14ca36204ccfbe9d25def43 
>>
>>
>> For sure, you have to consider applying same solution for array_store 
>> "clear".
>> Minor nit below.
>>
>> Thanks,
>> Mariusz
>>
>>> ---
>>>   drivers/md/md.c | 8 +++-----
>>>   1 file changed, 3 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>>> index 4e9fe5cbeedc..ebdfc9068a60 100644
>>> --- a/drivers/md/md.c
>>> +++ b/drivers/md/md.c
>>> @@ -6238,7 +6238,6 @@ static void md_clean(struct mddev *mddev)
>>>       mddev->persistent = 0;
>>>       mddev->level = LEVEL_NONE;
>>>       mddev->clevel[0] = 0;
>>> -    mddev->flags = 0;
>>
>> I recommend (safety recommendation):
>>     mddev->flags = MD_CLOSING;
> 
> Taking a look I think both MD_CLOSING and MD_DELETED should not be
> cleared, however, there is no guarantee that MD_CLOSING will be set
> before md_clean, because mdadm can be removed without running. Hence I
> think just set MD_CLOSING is werid.
> 
> I think the proper way is to keep MD_CLOSING and MD_DELETED if they are
> set. However, there is no such api to clear other bits at once. Since
> we're not expecting anyone else to write flags, following maybe
> acceptable:
> 
> mddev->flags &= BIT_ULL_MASK(MD_CLOSING) | BIT_ULL_MASK(MD_DELETED);
> 

MD_DELETED is only set after mddev->active is put to 0. We need to open 
mddev and get it before stropping raid, so the active must not be 0 and
MD_DELETED  will not be set in md_clean.

> Or after making sure other flags cannot race, this patch is ok.
> 
> Thanks,
> Kuai
> 
>>
>> Unless you can prove that other flags cannot race.
>>
>>>       mddev->sb_flags = 0;
>>>       mddev->ro = MD_RDWR;
>>>       mddev->metadata_type[0] = 0;
>>
>> .
>>
> 
> 
> .
  

Patch

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 4e9fe5cbeedc..ebdfc9068a60 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -6238,7 +6238,6 @@  static void md_clean(struct mddev *mddev)
 	mddev->persistent = 0;
 	mddev->level = LEVEL_NONE;
 	mddev->clevel[0] = 0;
-	mddev->flags = 0;
 	mddev->sb_flags = 0;
 	mddev->ro = MD_RDWR;
 	mddev->metadata_type[0] = 0;
@@ -7614,7 +7613,6 @@  static int md_ioctl(struct block_device *bdev, blk_mode_t mode,
 	int err = 0;
 	void __user *argp = (void __user *)arg;
 	struct mddev *mddev = NULL;
-	bool did_set_md_closing = false;
 
 	if (!md_ioctl_valid(cmd))
 		return -ENOTTY;
@@ -7698,7 +7696,6 @@  static int md_ioctl(struct block_device *bdev, blk_mode_t mode,
 			err = -EBUSY;
 			goto out;
 		}
-		did_set_md_closing = true;
 		mutex_unlock(&mddev->open_mutex);
 		sync_blockdev(bdev);
 	}
@@ -7742,10 +7739,13 @@  static int md_ioctl(struct block_device *bdev, blk_mode_t mode,
 
 	case STOP_ARRAY:
 		err = do_md_stop(mddev, 0, bdev);
+		if (err)
+			clear_bit(MD_CLOSING, &mddev->flags);
 		goto unlock;
 
 	case STOP_ARRAY_RO:
 		err = md_set_readonly(mddev, bdev);
+		clear_bit(MD_CLOSING, &mddev->flags);
 		goto unlock;
 
 	case HOT_REMOVE_DISK:
@@ -7840,8 +7840,6 @@  static int md_ioctl(struct block_device *bdev, blk_mode_t mode,
 				     mddev_unlock(mddev);
 
 out:
-	if(did_set_md_closing)
-		clear_bit(MD_CLOSING, &mddev->flags);
 	return err;
 }
 #ifdef CONFIG_COMPAT