[RFC,v1,0/1] nvme testsuite runtime optimization

Message ID 20230419085643.25714-1-dwagner@suse.de
Series nvme testsuite runtime optimization

Message

Daniel Wagner April 19, 2023, 8:56 a.m. UTC
While testing the fc transport I got a bit tired of waiting for the I/O jobs to
finish, so here are some runtime optimizations.

With a small/slow VM I got the following values:

with 'optimizations'
  loop:
    real    4m43.981s
    user    0m17.754s
    sys     2m6.249s

  rdma:
    real    2m35.160s
    user    0m6.264s
    sys     0m56.230s

  tcp:
    real    2m30.391s
    user    0m5.770s
    sys     0m46.007s

  fc:
    real    2m19.738s
    user    0m6.012s
    sys     0m42.201s

base:
  loop:
    real    7m35.061s
    user    0m23.493s
    sys     2m54.866s

  rdma:
    real    8m29.347s
    user    0m13.078s
    sys     1m53.158s

  tcp:
    real    8m11.357s
    user    0m13.033s
    sys     2m43.156s

  fc:
    real    5m46.615s
    user    0m12.819s
    sys     1m46.338s

Daniel Wagner (1):
  nvme: Limit runtime for verification and limit test image size

 common/xfs     |  3 ++-
 tests/nvme/004 |  2 +-
 tests/nvme/005 |  2 +-
 tests/nvme/006 |  2 +-
 tests/nvme/007 |  2 +-
 tests/nvme/008 |  2 +-
 tests/nvme/009 |  2 +-
 tests/nvme/010 |  5 +++--
 tests/nvme/011 |  5 +++--
 tests/nvme/012 |  4 ++--
 tests/nvme/013 |  4 ++--
 tests/nvme/014 | 10 ++++++++--
 tests/nvme/015 | 10 ++++++++--
 tests/nvme/017 |  2 +-
 tests/nvme/018 |  2 +-
 tests/nvme/019 |  2 +-
 tests/nvme/020 |  2 +-
 tests/nvme/021 |  2 +-
 tests/nvme/022 |  2 +-
 tests/nvme/023 |  2 +-
 tests/nvme/024 |  2 +-
 tests/nvme/025 |  2 +-
 tests/nvme/026 |  2 +-
 tests/nvme/027 |  2 +-
 tests/nvme/028 |  2 +-
 tests/nvme/029 |  2 +-
 tests/nvme/031 |  2 +-
 tests/nvme/032 |  4 ++--
 tests/nvme/034 |  3 ++-
 tests/nvme/035 |  4 ++--
 tests/nvme/040 |  4 ++--
 tests/nvme/041 |  2 +-
 tests/nvme/042 |  2 +-
 tests/nvme/043 |  2 +-
 tests/nvme/044 |  2 +-
 tests/nvme/045 |  2 +-
 tests/nvme/047 |  2 +-
 tests/nvme/048 |  2 +-
 38 files changed, 63 insertions(+), 47 deletions(-)
  

Comments

Chaitanya Kulkarni April 19, 2023, 9:34 a.m. UTC | #1
On 4/19/23 01:56, Daniel Wagner wrote:
> While testing the fc transport I got a bit tired of wait for the I/O jobs to
> finish. Thus here some runtime optimization.
>
> With a small/slow VM I got following values:
>
> with 'optimizations'
>    loop:
>      real    4m43.981s
>      user    0m17.754s
>      sys     2m6.249s
>
>    rdma:
>      real    2m35.160s
>      user    0m6.264s
>      sys     0m56.230s
>
>    tcp:
>      real    2m30.391s
>      user    0m5.770s
>      sys     0m46.007s
>
>    fc:
>      real    2m19.738s
>      user    0m6.012s
>      sys     0m42.201s
>
> base:
>    loop:
>      real    7m35.061s
>      user    0m23.493s
>      sys     2m54.866s
>
>    rdma:
>      real    8m29.347s
>      user    0m13.078s
>      sys     1m53.158s
>
>    tcp:
>      real    8m11.357s
>      user    0m13.033s
>      sys     2m43.156s
>
>    fc:
>      real    5m46.615s
>      user    0m12.819s
>      sys     1m46.338s
>
>

Those jobs are meant to run with at least 1G to establish
confidence in the data set and the system under test. SSDs
are in TBs nowadays and we don't even get anywhere close to
that; with your suggestion we are going even lower ...

We cannot change the dataset size for slow VMs. Instead, add
a command line argument and pass it to the tests, e.g.
nvme_verification_size=XXX, similar to nvme_trtype, but don't change
the default values which we have been testing for years now.
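
As a rough sketch of how such a knob could be wired up (the helper name is
only a placeholder and nvme_verification_size is the name suggested here, not
existing blktests code):

    # sketch: keep 1G as the default, allow an override from the environment
    : "${nvme_verification_size:=1G}"

    _nvme_verification_size() {
            echo "${nvme_verification_size}"
    }

Tests could then read the helper instead of hard-coding the 1G/950m values.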

Testing is supposed to be time-consuming, especially verification jobs.

-ck
  
Sagi Grimberg April 19, 2023, 9:50 a.m. UTC | #2
>> While testing the fc transport I got a bit tired of wait for the I/O jobs to
>> finish. Thus here some runtime optimization.
>>
>> With a small/slow VM I got following values:
>>
>> with 'optimizations'
>>     loop:
>>       real    4m43.981s
>>       user    0m17.754s
>>       sys     2m6.249s

How come loop takes double the time of the others with this patch?
The ratio is not the same before and after.

>>
>>     rdma:
>>       real    2m35.160s
>>       user    0m6.264s
>>       sys     0m56.230s
>>
>>     tcp:
>>       real    2m30.391s
>>       user    0m5.770s
>>       sys     0m46.007s
>>
>>     fc:
>>       real    2m19.738s
>>       user    0m6.012s
>>       sys     0m42.201s
>>
>> base:
>>     loop:
>>       real    7m35.061s
>>       user    0m23.493s
>>       sys     2m54.866s
>>
>>     rdma:
>>       real    8m29.347s
>>       user    0m13.078s
>>       sys     1m53.158s
>>
>>     tcp:
>>       real    8m11.357s
>>       user    0m13.033s
>>       sys     2m43.156s
>>
>>     fc:
>>       real    5m46.615s
>>       user    0m12.819s
>>       sys     1m46.338s
>>
>>
> 
> Those jobs are meant to be run for at least 1G to establish
> confidence on the data set and the system under test since SSDs
> are in TBs nowadays and we don't even get anywhere close to that,
> with your suggestion we are going even lower ...

Where does the 1G boundary come from?

> we cannot change the dataset size for slow VMs, instead add
> a command line argument and pass it to tests e.g.
> nvme_verification_size=XXX similar to nvme_trtype but don't change
> the default values which we have been testing for years now
> 
> Testing is supposed to be time consuming especially verification jobs..

I like the idea, but I think it may need to be the other way around.
Have shortest possible runs by default.
  
Daniel Wagner April 19, 2023, 11:10 a.m. UTC | #3
On Wed, Apr 19, 2023 at 12:50:10PM +0300, Sagi Grimberg wrote:
> 
> > > While testing the fc transport I got a bit tired of wait for the I/O jobs to
> > > finish. Thus here some runtime optimization.
> > > 
> > > With a small/slow VM I got following values:
> > > 
> > > with 'optimizations'
> > >     loop:
> > >       real    4m43.981s
> > >       user    0m17.754s
> > >       sys     2m6.249s
> 
> How come loop is doubling the time with this patch?
> ratio is not the same before and after.

The first run was with loop, the second one with rdma:

nvme/002 (create many subsystems and test discovery)         [not run]
    runtime  82.089s  ...
    nvme_trtype=rdma is not supported in this test

nvme/016 (create/delete many NVMeOF block device-backed ns and test discovery) [not run]
    runtime  39.948s  ...
    nvme_trtype=rdma is not supported in this test
nvme/017 (create/delete many file-ns and test discovery)     [not run]
    runtime  40.237s  ...

nvme/047 (test different queue types for fabric transports)  [passed]
    runtime    ...  13.580s
nvme/048 (Test queue count changes on reconnect)             [passed]
    runtime    ...  6.287s

82 + 40 + 40 - 14 - 6 = 142s, which roughly accounts for the difference: loop
runs additional tests which rdma skips. Hmm, though my optimization didn't work there...

> > Those jobs are meant to be run for at least 1G to establish
> > confidence on the data set and the system under test since SSDs
> > are in TBs nowadays and we don't even get anywhere close to that,
> > with your suggestion we are going even lower ...
> 
> Where does the 1G boundary coming from?

No idea, it's just the existing hard-coded values. I guess it might be from
efa06fcf3c83 ("loop: test partition scanning") which was the first real test
case (according to the logs).

> > we cannot change the dataset size for slow VMs, instead add
> > a command line argument and pass it to tests e.g.
> > nvme_verification_size=XXX similar to nvme_trtype but don't change
> > the default values which we have been testing for years now
> > 
> > Testing is supposed to be time consuming especially verification jobs..
> 
> I like the idea, but I think it may need to be the other way around.
> Have shortest possible runs by default.

Good point, I'll make it configurable. What is a good small default then? There
are some test cases in loop which allocate a 1M file. That's probably too
small.
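
For illustration, with such a knob a run on a small VM could pick its size
explicitly, along these lines (using the variable name suggested above, which
does not exist yet):

    # regular run, keeping the current defaults
    ./check nvme

    # hypothetical override for a small/slow VM
    nvme_verification_size=64M ./check nvme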
  
Sagi Grimberg April 19, 2023, 1:15 p.m. UTC | #4
>>>> While testing the fc transport I got a bit tired of wait for the I/O jobs to
>>>> finish. Thus here some runtime optimization.
>>>>
>>>> With a small/slow VM I got following values:
>>>>
>>>> with 'optimizations'
>>>>      loop:
>>>>        real    4m43.981s
>>>>        user    0m17.754s
>>>>        sys     2m6.249s
>>
>> How come loop is doubling the time with this patch?
>> ratio is not the same before and after.
> 
> first run was with loop, second one with rdma:
> 
> nvme/002 (create many subsystems and test discovery)         [not run]
>      runtime  82.089s  ...
>      nvme_trtype=rdma is not supported in this test
> 
> nvme/016 (create/delete many NVMeOF block device-backed ns and test discovery) [not run]
>      runtime  39.948s  ...
>      nvme_trtype=rdma is not supported in this test
> nvme/017 (create/delete many file-ns and test discovery)     [not run]
>      runtime  40.237s  ...
> 
> nvme/047 (test different queue types for fabric transports)  [passed]
>      runtime    ...  13.580s
> nvme/048 (Test queue count changes on reconnect)             [passed]
>      runtime    ...  6.287s
> 
> 82 + 40 + 40 - 14 - 6 = 142. So loop runs additional tests. Hmm, though my
> optimization didn't work there...

How come loop is 4m+ while the others are 2m+, when before they all
were in more or less the same timeframe?

> 
>>> Those jobs are meant to be run for at least 1G to establish
>>> confidence on the data set and the system under test since SSDs
>>> are in TBs nowadays and we don't even get anywhere close to that,
>>> with your suggestion we are going even lower ...
>>
>> Where does the 1G boundary coming from?
> 
> No idea, it just the existing hard coded values. I guess it might be from
> efa06fcf3c83 ("loop: test partition scanning") which was the first real test
> case (according the logs).

I was asking Chaitanya why 1G is considered sufficient vs. other sizes.
Why not 10G? Why not 100M?
  
Chaitanya Kulkarni April 19, 2023, 9:11 p.m. UTC | #5
On 4/19/23 02:50, Sagi Grimberg wrote:
>
>>> While testing the fc transport I got a bit tired of wait for the I/O 
>>> jobs to
>>> finish. Thus here some runtime optimization.
>>>
>>> With a small/slow VM I got following values:
>>>
>>> with 'optimizations'
>>>     loop:
>>>       real    4m43.981s
>>>       user    0m17.754s
>>>       sys     2m6.249s
>
> How come loop is doubling the time with this patch?
> ratio is not the same before and after.
>
>>>
>>>     rdma:
>>>       real    2m35.160s
>>>       user    0m6.264s
>>>       sys     0m56.230s
>>>
>>>     tcp:
>>>       real    2m30.391s
>>>       user    0m5.770s
>>>       sys     0m46.007s
>>>
>>>     fc:
>>>       real    2m19.738s
>>>       user    0m6.012s
>>>       sys     0m42.201s
>>>
>>> base:
>>>     loop:
>>>       real    7m35.061s
>>>       user    0m23.493s
>>>       sys     2m54.866s
>>>
>>>     rdma:
>>>       real    8m29.347s
>>>       user    0m13.078s
>>>       sys     1m53.158s
>>>
>>>     tcp:
>>>       real    8m11.357s
>>>       user    0m13.033s
>>>       sys     2m43.156s
>>>
>>>     fc:
>>>       real    5m46.615s
>>>       user    0m12.819s
>>>       sys     1m46.338s
>>>
>>>
>>
>> Those jobs are meant to be run for at least 1G to establish
>> confidence on the data set and the system under test since SSDs
>> are in TBs nowadays and we don't even get anywhere close to that,
>> with your suggestion we are going even lower ...
>
> Where does the 1G boundary coming from?
>


I wrote these testcases three times: initially they were part of the
nvme-cli tests 7-8 years ago, then nvmftests 6-7 years ago, and then they
moved to blktests.

At that time some of the testcases would not fail with a small size,
such as less than 512MB, especially with verification, but they did hit
the errors with 1G. Hence I kept it at 1G.

Now I don't remember why I didn't use a bigger size than 1G;
I should have documented that somewhere ...

>> we cannot change the dataset size for slow VMs, instead add
>> a command line argument and pass it to tests e.g.
>> nvme_verification_size=XXX similar to nvme_trtype but don't change
>> the default values which we have been testing for years now
>>
>> Testing is supposed to be time consuming especially verification jobs..
>
> I like the idea, but I think it may need to be the other way around.
> Have shortest possible runs by default.

see above..

-ck
  
Chaitanya Kulkarni April 19, 2023, 9:13 p.m. UTC | #6
On 4/19/23 06:15, Sagi Grimberg wrote:
>
>>>>> While testing the fc transport I got a bit tired of wait for the 
>>>>> I/O jobs to
>>>>> finish. Thus here some runtime optimization.
>>>>>
>>>>> With a small/slow VM I got following values:
>>>>>
>>>>> with 'optimizations'
>>>>>      loop:
>>>>>        real    4m43.981s
>>>>>        user    0m17.754s
>>>>>        sys     2m6.249s
>>>
>>> How come loop is doubling the time with this patch?
>>> ratio is not the same before and after.
>>
>> first run was with loop, second one with rdma:
>>
>> nvme/002 (create many subsystems and test discovery) [not run]
>>      runtime  82.089s  ...
>>      nvme_trtype=rdma is not supported in this test
>>
>> nvme/016 (create/delete many NVMeOF block device-backed ns and test 
>> discovery) [not run]
>>      runtime  39.948s  ...
>>      nvme_trtype=rdma is not supported in this test
>> nvme/017 (create/delete many file-ns and test discovery) [not run]
>>      runtime  40.237s  ...
>>
>> nvme/047 (test different queue types for fabric transports) [passed]
>>      runtime    ...  13.580s
>> nvme/048 (Test queue count changes on reconnect) [passed]
>>      runtime    ...  6.287s
>>
>> 82 + 40 + 40 - 14 - 6 = 142. So loop runs additional tests. Hmm, 
>> though my
>> optimization didn't work there...
>
> How come loop is 4m+ while the others are 2m+ when before all
> were the same timeframe more or less?
>
>>
>>>> Those jobs are meant to be run for at least 1G to establish
>>>> confidence on the data set and the system under test since SSDs
>>>> are in TBs nowadays and we don't even get anywhere close to that,
>>>> with your suggestion we are going even lower ...
>>>
>>> Where does the 1G boundary coming from?
>>
>> No idea, it just the existing hard coded values. I guess it might be 
>> from
>> efa06fcf3c83 ("loop: test partition scanning") which was the first 
>> real test
>> case (according the logs).
>
> Was asking Chaitanya why is 1G considered sufficient vs. other sizes?
> Why not 10G? Why not 100M?

See the earlier response ...

-ck
  
Chaitanya Kulkarni April 19, 2023, 9:31 p.m. UTC | #7
>> we cannot change the dataset size for slow VMs, instead add
>> a command line argument and pass it to tests e.g.
>> nvme_verification_size=XXX similar to nvme_trtype but don't change
>> the default values which we have been testing for years now
>>
>> Testing is supposed to be time consuming especially verification jobs..
>
> I like the idea, but I think it may need to be the other way around.
> Have shortest possible runs by default.

Not everyone is running blktests on slow VMs, so I think it should
be the other way around: the default integration of these testcases
in various distros uses the 1G size, and it is not a good idea to change
that and force everyone who is not running slow VMs to update
their test scripts ...

-ck
  
Daniel Wagner April 20, 2023, 8:24 a.m. UTC | #8
On Wed, Apr 19, 2023 at 09:11:33PM +0000, Chaitanya Kulkarni wrote:
> >> Those jobs are meant to be run for at least 1G to establish
> >> confidence on the data set and the system under test since SSDs
> >> are in TBs nowadays and we don't even get anywhere close to that,
> >> with your suggestion we are going even lower ...
> >
> > Where does the 1G boundary coming from?
> >
>
> I wrote these testcases 3 times, initially they were the part of
> nvme-cli tests7-8 years ago, then nvmftests 7-6 years ago, then they
> moved to blktests.
> 
> In that time some of the testcases would not fail on with small size
> such as less than 512MB especially with verification but they were
> in the errors with 1G Hence I kept to be 1G.
> 
> Now I don't remember why I didn't use bigger size than 1G
> should have documented that somewhere ...

Can you remember why you chose to set the image size to 1G and the I/O size for
fio to 950m in nvme/012 and nvme/013?

I am testing various image sizes and found that small images, e.g. in the range
of [4..64]m, pass fine, but larger ones like [512-...]M do not (no space
left). Note I've added a calc function which uses the image size minus 1M to
leave some room.
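
(A minimal sketch of what that calc function could look like -- the helper name
is made up for illustration:)

    # sketch: reserve 1M of the backing image for filesystem overhead
    _io_size_from_img_size() {
            local img_size_mb=$1

            echo "$((img_size_mb - 1))m"
    }

    # e.g. a 64M image results in an fio I/O size of 63m
    io_size="$(_io_size_from_img_size 64)"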
  
Daniel Wagner April 20, 2023, 8:31 a.m. UTC | #9
On Thu, Apr 20, 2023 at 10:24:15AM +0200, Daniel Wagner wrote:
> On Wed, Apr 19, 2023 at 09:11:33PM +0000, Chaitanya Kulkarni wrote:
> > >> Those jobs are meant to be run for at least 1G to establish
> > >> confidence on the data set and the system under test since SSDs
> > >> are in TBs nowadays and we don't even get anywhere close to that,
> > >> with your suggestion we are going even lower ...
> > >
> > > Where does the 1G boundary coming from?
> > >
> >
> > I wrote these testcases 3 times, initially they were the part of
> > nvme-cli tests7-8 years ago, then nvmftests 7-6 years ago, then they
> > moved to blktests.
> > 
> > In that time some of the testcases would not fail on with small size
> > such as less than 512MB especially with verification but they were
> > in the errors with 1G Hence I kept to be 1G.
> > 
> > Now I don't remember why I didn't use bigger size than 1G
> > should have documented that somewhere ...
> 
> Can you remember why you chosed to set the image size to 1G and the io size for
> fio to 950m in nvme/012 and nvme/013?

Forget it, I found a commit message which explains it:

e5bd71872b3b ("nvme/012,013,035: change fio I/O size and move size definition place")
  [...]
  Change fio I/O size of nvme/012,013,035 from 950m to 900m, since recent change
  increased the xfs log size and it caused fio failure with I/O size 950m.
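
(For context, the verification part of those tests is essentially an fio
write-and-verify job whose size has to stay below the usable space of the
filesystem created on the 1G image, roughly along these lines -- the path and
values here are illustrative, not the actual test code:)

    # illustrative only: write and verify data on an XFS-backed namespace,
    # leaving headroom for the xfs log and metadata on a 1G image
    fio --name=verify --filename=/mnt/blktests/verify.img \
        --rw=randwrite --bs=4k --size=900m \
        --verify=crc32c --do_verify=1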