Why is MEGASAS_SAS_QD set to 256?

Message ID 1c4d66ca-fe1a-3d1a-d7f9-4981d2fc9eb1@huaweicloud.com
State New
Headers
Series Why is MEGASAS_SAS_QD set to 256? |

Commit Message

Yu Kuai Nov. 24, 2022, 3:45 a.m. UTC
  Hi,

While upgrading kernel from 4.19 to 5.10, I found that fio 1 thread 4k
sequential io performance is dropped(160Mib -> 100 Mib), root cause is
that queue_depth is changed from 64 to 256.

commit 6e73550670ed1c07779706bb6cf61b99c871fc42
scsi: megaraid_sas: Update optimal queue depth for SAS and NVMe devices

elevator has no effect, specifically io can't be merged in this test
case. Hence it doesn't make sense to me to set default queue_depth to
256.

Is there any reason why MEGASAS_SAS_QD is changed to 64?

Thanks,
Kuai
  

Comments

John Garry Nov. 25, 2022, 12:33 p.m. UTC | #1
On 24/11/2022 03:45, Yu Kuai wrote:
> Hi,
> 
> While upgrading kernel from 4.19 to 5.10, I found that fio 1 thread 4k
> sequential io performance is dropped(160Mib -> 100 Mib), root cause is
> that queue_depth is changed from 64 to 256.
> 
> commit 6e73550670ed1c07779706bb6cf61b99c871fc42
> scsi: megaraid_sas: Update optimal queue depth for SAS and NVMe devices
> 
> diff --git a/drivers/scsi/megaraid/megaraid_sas.h 
> b/drivers/scsi/megaraid/megaraid_sas.h
> index bd8184072bed..ddfbe6f6667a 100644
> --- a/drivers/scsi/megaraid/megaraid_sas.h
> +++ b/drivers/scsi/megaraid/megaraid_sas.h
> @@ -2233,9 +2233,9 @@ enum MR_PD_TYPE {
> 
>   /* JBOD Queue depth definitions */
>   #define MEGASAS_SATA_QD        32
> -#define MEGASAS_SAS_QD 64
> +#define MEGASAS_SAS_QD 256
>   #define MEGASAS_DEFAULT_PD_QD  64
> -#define MEGASAS_NVME_QD                32
> +#define MEGASAS_NVME_QD        64
> 
> 
> And with the default nr_requests 256, 256 queue_depth will make the
> elevator has no effect, specifically io can't be merged in this test
> case. Hence it doesn't make sense to me to set default queue_depth to
> 256.
> 
> Is there any reason why MEGASAS_SAS_QD is changed to 64?
> 
> Thanks,
> Kuai
> 

Which type of drive do you use?

JFYI, in case missed, there was this discussion on SCSI queue depth a 
while ago:
https://lore.kernel.org/linux-scsi/4b50f067-a368-2197-c331-a8c981f5cd02@huawei.com/

Thanks,
John
  
Yu Kuai Nov. 26, 2022, 1:15 a.m. UTC | #2
Hi,

在 2022/11/25 20:33, John Garry 写道:
> On 24/11/2022 03:45, Yu Kuai wrote:
>> Hi,
>>
>> While upgrading kernel from 4.19 to 5.10, I found that fio 1 thread 4k
>> sequential io performance is dropped(160Mib -> 100 Mib), root cause is
>> that queue_depth is changed from 64 to 256.
>>
>> commit 6e73550670ed1c07779706bb6cf61b99c871fc42
>> scsi: megaraid_sas: Update optimal queue depth for SAS and NVMe devices
>>
>> diff --git a/drivers/scsi/megaraid/megaraid_sas.h 
>> b/drivers/scsi/megaraid/megaraid_sas.h
>> index bd8184072bed..ddfbe6f6667a 100644
>> --- a/drivers/scsi/megaraid/megaraid_sas.h
>> +++ b/drivers/scsi/megaraid/megaraid_sas.h
>> @@ -2233,9 +2233,9 @@ enum MR_PD_TYPE {
>>
>>   /* JBOD Queue depth definitions */
>>   #define MEGASAS_SATA_QD        32
>> -#define MEGASAS_SAS_QD 64
>> +#define MEGASAS_SAS_QD 256
>>   #define MEGASAS_DEFAULT_PD_QD  64
>> -#define MEGASAS_NVME_QD                32
>> +#define MEGASAS_NVME_QD        64
>>
>>
>> And with the default nr_requests 256, 256 queue_depth will make the
>> elevator has no effect, specifically io can't be merged in this test
>> case. Hence it doesn't make sense to me to set default queue_depth to
>> 256.
>>
>> Is there any reason why MEGASAS_SAS_QD is changed to 64?
>>
>> Thanks,
>> Kuai
>>
> 
> Which type of drive do you use?

SAS SSDs

BTW, I also test with nvme as well, the default elevator is deadline and
queue_depth seems too small, and performance is far from optimal.

Current default values don't seem good to me... 😒

Thanks,
Kuai
> 
> JFYI, in case missed, there was this discussion on SCSI queue depth a 
> while ago:
> https://lore.kernel.org/linux-scsi/4b50f067-a368-2197-c331-a8c981f5cd02@huawei.com/ 
> 
> 
> Thanks,
> John
> 
> 
> .
>
  
Ming Lei Nov. 26, 2022, 2:18 a.m. UTC | #3
On Sat, Nov 26, 2022 at 09:15:53AM +0800, Yu Kuai wrote:
> Hi,
> 
> 在 2022/11/25 20:33, John Garry 写道:
> > On 24/11/2022 03:45, Yu Kuai wrote:
> > > Hi,
> > > 
> > > While upgrading kernel from 4.19 to 5.10, I found that fio 1 thread 4k
> > > sequential io performance is dropped(160Mib -> 100 Mib), root cause is
> > > that queue_depth is changed from 64 to 256.
> > > 
> > > commit 6e73550670ed1c07779706bb6cf61b99c871fc42
> > > scsi: megaraid_sas: Update optimal queue depth for SAS and NVMe devices
> > > 
> > > diff --git a/drivers/scsi/megaraid/megaraid_sas.h
> > > b/drivers/scsi/megaraid/megaraid_sas.h
> > > index bd8184072bed..ddfbe6f6667a 100644
> > > --- a/drivers/scsi/megaraid/megaraid_sas.h
> > > +++ b/drivers/scsi/megaraid/megaraid_sas.h
> > > @@ -2233,9 +2233,9 @@ enum MR_PD_TYPE {
> > > 
> > >   /* JBOD Queue depth definitions */
> > >   #define MEGASAS_SATA_QD        32
> > > -#define MEGASAS_SAS_QD 64
> > > +#define MEGASAS_SAS_QD 256
> > >   #define MEGASAS_DEFAULT_PD_QD  64
> > > -#define MEGASAS_NVME_QD                32
> > > +#define MEGASAS_NVME_QD        64
> > > 
> > > 
> > > And with the default nr_requests 256, 256 queue_depth will make the
> > > elevator has no effect, specifically io can't be merged in this test
> > > case. Hence it doesn't make sense to me to set default queue_depth to
> > > 256.
> > > 
> > > Is there any reason why MEGASAS_SAS_QD is changed to 64?
> > > 
> > > Thanks,
> > > Kuai
> > > 
> > 
> > Which type of drive do you use?
> 
> SAS SSDs
> 
> BTW, I also test with nvme as well, the default elevator is deadline and
> queue_depth seems too small, and performance is far from optimal.
> 
> Current default values don't seem good to me... 😒

If you want aggressive merge on sequential IO workload, the queue depth need
to be a bit less, then more requests can be staggered into scheduler queue,
and merge chance is increased.

If you want good perf on random IO perf, the queue depth needs to
be deep enough to have enough parallelism for saturating SSD internal.

But we don't recognize sequential/random IO pattern, and usually fixed
queue depth is used.

Thanks,
Ming
  
Yu Kuai Nov. 26, 2022, 6:08 a.m. UTC | #4
Hi, Ming

在 2022/11/26 10:18, Ming Lei 写道:
> 
> If you want aggressive merge on sequential IO workload, the queue depth need
> to be a bit less, then more requests can be staggered into scheduler queue,
> and merge chance is increased.

But if nr_requests >= queue_depth, it seems to me elevator will have no
effect, no request can be merged or sorted by scheduler, right?
> 
> If you want good perf on random IO perf, the queue depth needs to
> be deep enough to have enough parallelism for saturating SSD internal.
> 
> But we don't recognize sequential/random IO pattern, and usually fixed
> queue depth is used.

Is it possible to use none elevator and set large queue_depth if nvme is
used in this case?

Thansk,
Kuai
> 
> Thanks,
> Ming
> 
> .
>
  
Yu Kuai Nov. 26, 2022, 6:08 a.m. UTC | #5
在 2022/11/26 14:08, Yu Kuai 写道:
> Hi, Ming
> 
> 在 2022/11/26 10:18, Ming Lei 写道:
>>
>> If you want aggressive merge on sequential IO workload, the queue 
>> depth need
>> to be a bit less, then more requests can be staggered into scheduler 
>> queue,
>> and merge chance is increased.
> 
> But if nr_requests >= queue_depth, it seems to me elevator will have no
> effect, no request can be merged or sorted by scheduler, right?
Sorry that should be nr_requests <= queue_depth.

>>
>> If you want good perf on random IO perf, the queue depth needs to
>> be deep enough to have enough parallelism for saturating SSD internal.
>>
>> But we don't recognize sequential/random IO pattern, and usually fixed
>> queue depth is used.
> 
> Is it possible to use none elevator and set large queue_depth if nvme is
> used in this case?
> 
> Thansk,
> Kuai
>>
>> Thanks,
>> Ming
>>
>> .
>>
> 
> .
>
  
Ming Lei Nov. 27, 2022, 9:42 a.m. UTC | #6
On Sat, Nov 26, 2022 at 02:08:02PM +0800, Yu Kuai wrote:
> Hi, Ming
> 
> 在 2022/11/26 10:18, Ming Lei 写道:
> > 
> > If you want aggressive merge on sequential IO workload, the queue depth need
> > to be a bit less, then more requests can be staggered into scheduler queue,
> > and merge chance is increased.
> 
> But if nr_requests >= queue_depth, it seems to me elevator will have no
> effect, no request can be merged or sorted by scheduler, right?

Yeah.

If nr_requests <= queue_depth, every request can be queued to
driver/device, so requests won't be merged by scheduler.

But plug merge still works if IOs are submitted as batch.

> > 
> > If you want good perf on random IO perf, the queue depth needs to
> > be deep enough to have enough parallelism for saturating SSD internal.
> > 
> > But we don't recognize sequential/random IO pattern, and usually fixed
> > queue depth is used.
> 
> Is it possible to use none elevator and set large queue_depth if nvme is
> used in this case?

Yeah, if the storage is SSD, usually none with bigger queue_depth should
help, and usually 256 should be enough to saturate one single SSD for
one well implemented driver.


Thanks
Ming
  
Yu Kuai Nov. 28, 2022, 3:09 a.m. UTC | #7
Hi,

在 2022/11/27 17:42, Ming Lei 写道:
> On Sat, Nov 26, 2022 at 02:08:02PM +0800, Yu Kuai wrote:
>> Hi, Ming
>>
>> 在 2022/11/26 10:18, Ming Lei 写道:
>>>
>>> If you want aggressive merge on sequential IO workload, the queue depth need
>>> to be a bit less, then more requests can be staggered into scheduler queue,
>>> and merge chance is increased.
>>
>> But if nr_requests >= queue_depth, it seems to me elevator will have no
>> effect, no request can be merged or sorted by scheduler, right?
> 
> Yeah.
> 
> If nr_requests <= queue_depth, every request can be queued to
> driver/device, so requests won't be merged by scheduler.
> 
> But plug merge still works if IOs are submitted as batch.

Yes, io can still be merged by plug. I just find it a little werid to
set default elevator as deadline, and default queue_depth to 256. Which
means deadline here is useless.

> 
>>>
>>> If you want good perf on random IO perf, the queue depth needs to
>>> be deep enough to have enough parallelism for saturating SSD internal.
>>>
>>> But we don't recognize sequential/random IO pattern, and usually fixed
>>> queue depth is used.
>>
>> Is it possible to use none elevator and set large queue_depth if nvme is
>> used in this case?
> 
> Yeah, if the storage is SSD, usually none with bigger queue_depth should
> help, and usually 256 should be enough to saturate one single SSD for
> one well implemented driver.

Yes, I'm testing with multiple SSDs / NVMEs, and I can't get optimal
performance by default.

Thanks,
Kuai
> 
> 
> Thanks
> Ming
> 
> .
>
  

Patch

diff --git a/drivers/scsi/megaraid/megaraid_sas.h 
b/drivers/scsi/megaraid/megaraid_sas.h
index bd8184072bed..ddfbe6f6667a 100644
--- a/drivers/scsi/megaraid/megaraid_sas.h
+++ b/drivers/scsi/megaraid/megaraid_sas.h
@@ -2233,9 +2233,9 @@  enum MR_PD_TYPE {

  /* JBOD Queue depth definitions */
  #define MEGASAS_SATA_QD        32
-#define MEGASAS_SAS_QD 64
+#define MEGASAS_SAS_QD 256
  #define MEGASAS_DEFAULT_PD_QD  64
-#define MEGASAS_NVME_QD                32
+#define MEGASAS_NVME_QD        64


And with the default nr_requests 256, 256 queue_depth will make the