Message ID | 1666693976-181094-3-git-send-email-john.garry@huawei.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:6687:0:0:0:0:0 with SMTP id l7csp917296wru; Tue, 25 Oct 2022 03:14:28 -0700 (PDT) X-Google-Smtp-Source: AMsMyM4gKdstP0PHx8e1BJJC1KhFTYesAN1PF0fh0Izicrs3aFyXtuetAJILlzh6bT5xu72Y5ViO X-Received: by 2002:aa7:9dcb:0:b0:565:89a8:c708 with SMTP id g11-20020aa79dcb000000b0056589a8c708mr38606001pfq.4.1666692868035; Tue, 25 Oct 2022 03:14:28 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1666692868; cv=none; d=google.com; s=arc-20160816; b=iSHvlzcAJnjATMbIC9oY2ybPdnrTUuOma3k5VRtgPYwt0M0zjs8lpS7rfkSimPhx7U mRGxKIprnUf3F5zeLMdOIDdoLRA5r0LPuhICOAW/kr5Gjazf06e3+qIX14E18BbtVxDm lkHNqsfqI0JYy4GybwUQUm+I5t2dToLa6LbbtEILbb0IO05ibtNe+HnkCLcxnm/Tjv9Q dd/9l1D1rQHgf1VltvR2Hl5ak+q2P3MsnbztQITOydSicpD1p6m0GruaQRiX/rwKWJSW mvWNn7CEKyvHqgtsCP4CTqtt/CGnvqxk2quXDBb4jC5NrkEURHLhGeOd8O0Q4viutm88 4KNw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:mime-version:references:in-reply-to:message-id :date:subject:cc:to:from; bh=yKftcTuMP1lpUQsxRoeRUgpzkno5mZnny31oRfO5Y0A=; b=czo1CccrghGRFYZVFXH39W2Nej+f5Q3hnFETsGCSU8L/jQqiVDQUJtFEpOm4+z8KNe YJAlMQE9IRpjEIqPpO/UZe5y6LixnOYUxmWbSxgBnJ2xrcIaDK9dr+RstxqnLKRYu1gE KzguLElskSjHkehZDuAerLbeZenOjeyHesGNtx1Lk0Hs2WYkJDddHgHEzGzTnhW4yFhx RHTgoYr1DiLv/wjImLvUfG1l0nAkXkse8qwF/vgh19ERKvU9YYYm3lQWBDWjKgEL0b1G vskNIbl5cHQ9gu92lXbvelSk4JYO9d5H+Ms80kxN54NH5LXiLMOGWKyGfjP+Lg0v9maH dKvg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=huawei.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id x21-20020a631715000000b0046f18bdf887si2369003pgl.476.2022.10.25.03.14.15; Tue, 25 Oct 2022 03:14:28 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=huawei.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231855AbiJYKLY (ORCPT <rfc822;lucius.rs.storz@gmail.com> + 99 others); Tue, 25 Oct 2022 06:11:24 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46878 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230046AbiJYKKi (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Tue, 25 Oct 2022 06:10:38 -0400 Received: from frasgout.his.huawei.com (frasgout.his.huawei.com [185.176.79.56]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6976B11F487; Tue, 25 Oct 2022 03:03:24 -0700 (PDT) Received: from fraeml708-chm.china.huawei.com (unknown [172.18.147.207]) by frasgout.his.huawei.com (SkyGuard) with ESMTP id 4MxS942Vhvz689yL; Tue, 25 Oct 2022 17:59:52 +0800 (CST) Received: from lhrpeml500003.china.huawei.com (7.191.162.67) by fraeml708-chm.china.huawei.com (10.206.15.36) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Tue, 25 Oct 2022 12:03:22 +0200 Received: from localhost.localdomain (10.69.192.58) by lhrpeml500003.china.huawei.com (7.191.162.67) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Tue, 25 Oct 2022 11:03:19 +0100 From: John Garry <john.garry@huawei.com> To: <damien.lemoal@opensource.wdc.com>, <jejb@linux.ibm.com>, <martin.petersen@oracle.com>, <hare@suse.de>, <bvanassche@acm.org>, <hch@lst.de>, <ming.lei@redhat.com>, <niklas.cassel@wdc.com> CC: <axboe@kernel.dk>, <jinpu.wang@cloud.ionos.com>, <linux-block@vger.kernel.org>, <linux-kernel@vger.kernel.org>, <linux-ide@vger.kernel.org>, <linux-scsi@vger.kernel.org>, <linuxarm@huawei.com>, John Garry <john.garry@huawei.com> Subject: [PATCH RFC v3 2/7] ata: libata-scsi: Add ata_internal_queuecommand() Date: Tue, 25 Oct 2022 18:32:51 +0800 Message-ID: <1666693976-181094-3-git-send-email-john.garry@huawei.com> X-Mailer: git-send-email 2.8.1 In-Reply-To: <1666693976-181094-1-git-send-email-john.garry@huawei.com> References: <1666693976-181094-1-git-send-email-john.garry@huawei.com> MIME-Version: 1.0 Content-Type: text/plain X-Originating-IP: [10.69.192.58] X-ClientProxiedBy: dggems704-chm.china.huawei.com (10.3.19.181) To lhrpeml500003.china.huawei.com (7.191.162.67) X-CFilter-Loop: Reflected X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_MED, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1747654140507647861?= X-GMAIL-MSGID: =?utf-8?q?1747654140507647861?= |
Series |
blk-mq/libata/scsi: SCSI driver tagging improvements Part II
|
|
Commit Message
John Garry
Oct. 25, 2022, 10:32 a.m. UTC
Add callback to queue reserved commands - call it "internal" as this is
what libata uses.
Also add it to the base ATA SHT.
Signed-off-by: John Garry <john.garry@huawei.com>
---
drivers/ata/libata-scsi.c | 14 ++++++++++++++
include/linux/libata.h | 5 ++++-
2 files changed, 18 insertions(+), 1 deletion(-)
Comments
On 10/25/22 19:32, John Garry wrote: > Add callback to queue reserved commands - call it "internal" as this is > what libata uses. > > Also add it to the base ATA SHT. > > Signed-off-by: John Garry <john.garry@huawei.com> > --- > drivers/ata/libata-scsi.c | 14 ++++++++++++++ > include/linux/libata.h | 5 ++++- > 2 files changed, 18 insertions(+), 1 deletion(-) > > diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c > index 30d7c90b0c35..0d6f37d80137 100644 > --- a/drivers/ata/libata-scsi.c > +++ b/drivers/ata/libata-scsi.c > @@ -1118,6 +1118,20 @@ int ata_scsi_dev_config(struct scsi_device *sdev, struct ata_device *dev) > return 0; > } > > +int ata_internal_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmd) > +{ > + struct ata_port *ap; > + int res; > + > + ap = ata_shost_to_port(shost); You can have this initialization together with the ap declaration. > + spin_lock_irq(ap->lock); > + res = ata_sas_queuecmd(scmd, ap); > + spin_unlock_irq(ap->lock); > + > + return res; > +} > +EXPORT_SYMBOL_GPL(ata_internal_queuecommand); I am officially lost here. Do not see why this function is needed... > + > /** > * ata_scsi_slave_config - Set SCSI device attributes > * @sdev: SCSI device to examine > diff --git a/include/linux/libata.h b/include/linux/libata.h > index 8938b584520f..f09c5dca16ce 100644 > --- a/include/linux/libata.h > +++ b/include/linux/libata.h > @@ -1141,6 +1141,8 @@ extern int ata_std_bios_param(struct scsi_device *sdev, > sector_t capacity, int geom[]); > extern void ata_scsi_unlock_native_capacity(struct scsi_device *sdev); > extern int ata_scsi_slave_config(struct scsi_device *sdev); > +extern int ata_internal_queuecommand(struct Scsi_Host *shost, > + struct scsi_cmnd *scmd); > extern void ata_scsi_slave_destroy(struct scsi_device *sdev); > extern int ata_scsi_change_queue_depth(struct scsi_device *sdev, > int queue_depth); > @@ -1391,7 +1393,8 @@ extern const struct attribute_group *ata_common_sdev_groups[]; > .slave_destroy = ata_scsi_slave_destroy, \ > .bios_param = ata_std_bios_param, \ > .unlock_native_capacity = ata_scsi_unlock_native_capacity,\ > - .max_sectors = ATA_MAX_SECTORS_LBA48 > + .max_sectors = ATA_MAX_SECTORS_LBA48,\ > + .reserved_queuecommand = ata_internal_queuecommand > > #define ATA_SUBBASE_SHT(drv_name) \ > __ATA_BASE_SHT(drv_name), \
On 27/10/2022 02:45, Damien Le Moal wrote: > On 10/25/22 19:32, John Garry wrote: >> Add callback to queue reserved commands - call it "internal" as this is >> what libata uses. >> >> Also add it to the base ATA SHT. >> >> Signed-off-by: John Garry <john.garry@huawei.com> >> --- >> drivers/ata/libata-scsi.c | 14 ++++++++++++++ >> include/linux/libata.h | 5 ++++- >> 2 files changed, 18 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c >> index 30d7c90b0c35..0d6f37d80137 100644 >> --- a/drivers/ata/libata-scsi.c >> +++ b/drivers/ata/libata-scsi.c >> @@ -1118,6 +1118,20 @@ int ata_scsi_dev_config(struct scsi_device *sdev, struct ata_device *dev) >> return 0; >> } >> >> +int ata_internal_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmd) >> +{ >> + struct ata_port *ap; >> + int res; >> + >> + ap = ata_shost_to_port(shost); > > You can have this initialization together with the ap declaration. > >> + spin_lock_irq(ap->lock); >> + res = ata_sas_queuecmd(scmd, ap); >> + spin_unlock_irq(ap->lock); >> + >> + return res; >> +} >> +EXPORT_SYMBOL_GPL(ata_internal_queuecommand); > > I am officially lost here. Do not see why this function is needed... The general idea in this series is to send ATA internal commands as requests. And this function is used as the SCSI midlayer to queue reserved commands. See how it is plugged into __ATA_BASE_SHT, below. So we have this overall flow: ata_exec_internal_sg(): -> alloc request -> blk_execute_rq_nowait() ... -> scsi_queue_rq() -> sht->reserved_queuecommd() -> ata_internal_queuecommand() And then we have ata_internal_queuecommand() -> ata_sas_queuecmd() -> ata_scsi_queue_internal() -> ata_qc_issue(). Hope it makes sense. Thanks, John > >> + >> /** >> * ata_scsi_slave_config - Set SCSI device attributes >> * @sdev: SCSI device to examine >> diff --git a/include/linux/libata.h b/include/linux/libata.h >> index 8938b584520f..f09c5dca16ce 100644 >> --- a/include/linux/libata.h >> +++ b/include/linux/libata.h >> @@ -1141,6 +1141,8 @@ extern int ata_std_bios_param(struct scsi_device *sdev, >> sector_t capacity, int geom[]); >> extern void ata_scsi_unlock_native_capacity(struct scsi_device *sdev); >> extern int ata_scsi_slave_config(struct scsi_device *sdev); >> +extern int ata_internal_queuecommand(struct Scsi_Host *shost, >> + struct scsi_cmnd *scmd); >> extern void ata_scsi_slave_destroy(struct scsi_device *sdev); >> extern int ata_scsi_change_queue_depth(struct scsi_device *sdev, >> int queue_depth); >> @@ -1391,7 +1393,8 @@ extern const struct attribute_group *ata_common_sdev_groups[]; >> .slave_destroy = ata_scsi_slave_destroy, \ >> .bios_param = ata_std_bios_param, \ >> .unlock_native_capacity = ata_scsi_unlock_native_capacity,\ >> - .max_sectors = ATA_MAX_SECTORS_LBA48 >> + .max_sectors = ATA_MAX_SECTORS_LBA48,\ >> + .reserved_queuecommand = ata_internal_queuecommand >> >> #define ATA_SUBBASE_SHT(drv_name) \ >> __ATA_BASE_SHT(drv_name), \ >
On 10/27/22 11:56, John Garry wrote: > On 27/10/2022 02:45, Damien Le Moal wrote: >> On 10/25/22 19:32, John Garry wrote: >>> Add callback to queue reserved commands - call it "internal" as this is >>> what libata uses. >>> >>> Also add it to the base ATA SHT. >>> >>> Signed-off-by: John Garry <john.garry@huawei.com> >>> --- >>> drivers/ata/libata-scsi.c | 14 ++++++++++++++ >>> include/linux/libata.h | 5 ++++- >>> 2 files changed, 18 insertions(+), 1 deletion(-) >>> >>> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c >>> index 30d7c90b0c35..0d6f37d80137 100644 >>> --- a/drivers/ata/libata-scsi.c >>> +++ b/drivers/ata/libata-scsi.c >>> @@ -1118,6 +1118,20 @@ int ata_scsi_dev_config(struct scsi_device >>> *sdev, struct ata_device *dev) >>> return 0; >>> } >>> +int ata_internal_queuecommand(struct Scsi_Host *shost, struct >>> scsi_cmnd *scmd) >>> +{ >>> + struct ata_port *ap; >>> + int res; >>> + >>> + ap = ata_shost_to_port(shost); >> >> You can have this initialization together with the ap declaration. >> >>> + spin_lock_irq(ap->lock); >>> + res = ata_sas_queuecmd(scmd, ap); >>> + spin_unlock_irq(ap->lock); >>> + >>> + return res; >>> +} >>> +EXPORT_SYMBOL_GPL(ata_internal_queuecommand); >> >> I am officially lost here. Do not see why this function is needed... > > The general idea in this series is to send ATA internal commands as > requests. And this function is used as the SCSI midlayer to queue > reserved commands. See how it is plugged into __ATA_BASE_SHT, below. > > So we have this overall flow: > > ata_exec_internal_sg(): > -> alloc request > -> blk_execute_rq_nowait() > ... -> scsi_queue_rq() > -> sht->reserved_queuecommd() > -> ata_internal_queuecommand() > > And then we have ata_internal_queuecommand() -> ata_sas_queuecmd() -> > ata_scsi_queue_internal() -> ata_qc_issue(). > > Hope it makes sense. > > Thanks, > John > >> >>> + >>> /** >>> * ata_scsi_slave_config - Set SCSI device attributes >>> * @sdev: SCSI device to examine >>> diff --git a/include/linux/libata.h b/include/linux/libata.h >>> index 8938b584520f..f09c5dca16ce 100644 >>> --- a/include/linux/libata.h >>> +++ b/include/linux/libata.h >>> @@ -1141,6 +1141,8 @@ extern int ata_std_bios_param(struct >>> scsi_device *sdev, >>> sector_t capacity, int geom[]); >>> extern void ata_scsi_unlock_native_capacity(struct scsi_device *sdev); >>> extern int ata_scsi_slave_config(struct scsi_device *sdev); >>> +extern int ata_internal_queuecommand(struct Scsi_Host *shost, >>> + struct scsi_cmnd *scmd); >>> extern void ata_scsi_slave_destroy(struct scsi_device *sdev); >>> extern int ata_scsi_change_queue_depth(struct scsi_device *sdev, >>> int queue_depth); >>> @@ -1391,7 +1393,8 @@ extern const struct attribute_group >>> *ata_common_sdev_groups[]; >>> .slave_destroy = ata_scsi_slave_destroy, \ >>> .bios_param = ata_std_bios_param, \ >>> .unlock_native_capacity = ata_scsi_unlock_native_capacity,\ >>> - .max_sectors = ATA_MAX_SECTORS_LBA48 >>> + .max_sectors = ATA_MAX_SECTORS_LBA48,\ >>> + .reserved_queuecommand = ata_internal_queuecommand >>> #define ATA_SUBBASE_SHT(drv_name) \ >>> __ATA_BASE_SHT(drv_name), \ >> > But that means we can't use it before the SCSI host is initialized; some HBAs require to send commands before the host can be initialized properly. Cheers, Hannes
On 27/10/2022 14:02, Hannes Reinecke wrote: >>>> /** >>>> * ata_scsi_slave_config - Set SCSI device attributes >>>> * @sdev: SCSI device to examine >>>> diff --git a/include/linux/libata.h b/include/linux/libata.h >>>> index 8938b584520f..f09c5dca16ce 100644 >>>> --- a/include/linux/libata.h >>>> +++ b/include/linux/libata.h >>>> @@ -1141,6 +1141,8 @@ extern int ata_std_bios_param(struct >>>> scsi_device *sdev, >>>> sector_t capacity, int geom[]); >>>> extern void ata_scsi_unlock_native_capacity(struct scsi_device >>>> *sdev); >>>> extern int ata_scsi_slave_config(struct scsi_device *sdev); >>>> +extern int ata_internal_queuecommand(struct Scsi_Host *shost, >>>> + struct scsi_cmnd *scmd); >>>> extern void ata_scsi_slave_destroy(struct scsi_device *sdev); >>>> extern int ata_scsi_change_queue_depth(struct scsi_device *sdev, >>>> int queue_depth); >>>> @@ -1391,7 +1393,8 @@ extern const struct attribute_group >>>> *ata_common_sdev_groups[]; >>>> .slave_destroy = ata_scsi_slave_destroy, \ >>>> .bios_param = ata_std_bios_param, \ >>>> .unlock_native_capacity = ata_scsi_unlock_native_capacity,\ >>>> - .max_sectors = ATA_MAX_SECTORS_LBA48 >>>> + .max_sectors = ATA_MAX_SECTORS_LBA48,\ >>>> + .reserved_queuecommand = ata_internal_queuecommand >>>> #define ATA_SUBBASE_SHT(drv_name) \ >>>> __ATA_BASE_SHT(drv_name), \ >>> >> > > But that means we can't use it before the SCSI host is initialized; some > HBAs require to send commands before the host can be initialized properly. At what stage do you want to send these commands? The tags for the shost are not setup until scsi_add_host() -> scsi_mq_setup_tags() is called, so can't expect blk-mq to manage reserved tags before then. If you are required to send commands prior to scsi_add_host(), then I suppose the low-level driver still needs to manage tags until the shost is ready. I guess that some very simple scheme can be used, like always use tag 0, since most probe is done serially per-host. But that's not a case which I have had to deal with yet. Thanks, John
On 10/27/22 18:56, John Garry wrote: > On 27/10/2022 02:45, Damien Le Moal wrote: >> On 10/25/22 19:32, John Garry wrote: >>> Add callback to queue reserved commands - call it "internal" as this is >>> what libata uses. >>> >>> Also add it to the base ATA SHT. >>> >>> Signed-off-by: John Garry <john.garry@huawei.com> >>> --- >>> drivers/ata/libata-scsi.c | 14 ++++++++++++++ >>> include/linux/libata.h | 5 ++++- >>> 2 files changed, 18 insertions(+), 1 deletion(-) >>> >>> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c >>> index 30d7c90b0c35..0d6f37d80137 100644 >>> --- a/drivers/ata/libata-scsi.c >>> +++ b/drivers/ata/libata-scsi.c >>> @@ -1118,6 +1118,20 @@ int ata_scsi_dev_config(struct scsi_device *sdev, struct ata_device *dev) >>> return 0; >>> } >>> >>> +int ata_internal_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmd) >>> +{ >>> + struct ata_port *ap; >>> + int res; >>> + >>> + ap = ata_shost_to_port(shost); >> >> You can have this initialization together with the ap declaration. >> >>> + spin_lock_irq(ap->lock); >>> + res = ata_sas_queuecmd(scmd, ap); >>> + spin_unlock_irq(ap->lock); >>> + >>> + return res; >>> +} >>> +EXPORT_SYMBOL_GPL(ata_internal_queuecommand); >> >> I am officially lost here. Do not see why this function is needed... > > The general idea in this series is to send ATA internal commands as > requests. And this function is used as the SCSI midlayer to queue > reserved commands. See how it is plugged into __ATA_BASE_SHT, below. > > So we have this overall flow: > > ata_exec_internal_sg(): > -> alloc request > -> blk_execute_rq_nowait() > ... -> scsi_queue_rq() > -> sht->reserved_queuecommd() > -> ata_internal_queuecommand() > > And then we have ata_internal_queuecommand() -> ata_sas_queuecmd() -> > ata_scsi_queue_internal() -> ata_qc_issue(). > > Hope it makes sense. OK. Got it. However, ata_exec_internal_sg() being used only from EH context with the queue quiesced, will blk_execute_rq_nowait() work ? Is there an exception for internal reserved tags ? > > Thanks, > John > >> >>> + >>> /** >>> * ata_scsi_slave_config - Set SCSI device attributes >>> * @sdev: SCSI device to examine >>> diff --git a/include/linux/libata.h b/include/linux/libata.h >>> index 8938b584520f..f09c5dca16ce 100644 >>> --- a/include/linux/libata.h >>> +++ b/include/linux/libata.h >>> @@ -1141,6 +1141,8 @@ extern int ata_std_bios_param(struct scsi_device *sdev, >>> sector_t capacity, int geom[]); >>> extern void ata_scsi_unlock_native_capacity(struct scsi_device *sdev); >>> extern int ata_scsi_slave_config(struct scsi_device *sdev); >>> +extern int ata_internal_queuecommand(struct Scsi_Host *shost, >>> + struct scsi_cmnd *scmd); >>> extern void ata_scsi_slave_destroy(struct scsi_device *sdev); >>> extern int ata_scsi_change_queue_depth(struct scsi_device *sdev, >>> int queue_depth); >>> @@ -1391,7 +1393,8 @@ extern const struct attribute_group *ata_common_sdev_groups[]; >>> .slave_destroy = ata_scsi_slave_destroy, \ >>> .bios_param = ata_std_bios_param, \ >>> .unlock_native_capacity = ata_scsi_unlock_native_capacity,\ >>> - .max_sectors = ATA_MAX_SECTORS_LBA48 >>> + .max_sectors = ATA_MAX_SECTORS_LBA48,\ >>> + .reserved_queuecommand = ata_internal_queuecommand >>> >>> #define ATA_SUBBASE_SHT(drv_name) \ >>> __ATA_BASE_SHT(drv_name), \ >> >
On 10/28/22 02:23, John Garry wrote: > On 27/10/2022 14:02, Hannes Reinecke wrote: >>>>> /** >>>>> * ata_scsi_slave_config - Set SCSI device attributes >>>>> * @sdev: SCSI device to examine >>>>> diff --git a/include/linux/libata.h b/include/linux/libata.h >>>>> index 8938b584520f..f09c5dca16ce 100644 >>>>> --- a/include/linux/libata.h >>>>> +++ b/include/linux/libata.h >>>>> @@ -1141,6 +1141,8 @@ extern int ata_std_bios_param(struct >>>>> scsi_device *sdev, >>>>> sector_t capacity, int geom[]); >>>>> extern void ata_scsi_unlock_native_capacity(struct scsi_device >>>>> *sdev); >>>>> extern int ata_scsi_slave_config(struct scsi_device *sdev); >>>>> +extern int ata_internal_queuecommand(struct Scsi_Host *shost, >>>>> + struct scsi_cmnd *scmd); >>>>> extern void ata_scsi_slave_destroy(struct scsi_device *sdev); >>>>> extern int ata_scsi_change_queue_depth(struct scsi_device *sdev, >>>>> int queue_depth); >>>>> @@ -1391,7 +1393,8 @@ extern const struct attribute_group >>>>> *ata_common_sdev_groups[]; >>>>> .slave_destroy = ata_scsi_slave_destroy, \ >>>>> .bios_param = ata_std_bios_param, \ >>>>> .unlock_native_capacity = ata_scsi_unlock_native_capacity,\ >>>>> - .max_sectors = ATA_MAX_SECTORS_LBA48 >>>>> + .max_sectors = ATA_MAX_SECTORS_LBA48,\ >>>>> + .reserved_queuecommand = ata_internal_queuecommand >>>>> #define ATA_SUBBASE_SHT(drv_name) \ >>>>> __ATA_BASE_SHT(drv_name), \ >>>> >>> >> >> But that means we can't use it before the SCSI host is initialized; some >> HBAs require to send commands before the host can be initialized properly. > > At what stage do you want to send these commands? The tags for the shost > are not setup until scsi_add_host() -> scsi_mq_setup_tags() is called, > so can't expect blk-mq to manage reserved tags before then. > > If you are required to send commands prior to scsi_add_host(), then I > suppose the low-level driver still needs to manage tags until the shost > is ready. I guess that some very simple scheme can be used, like always > use tag 0, since most probe is done serially per-host. But that's not a > case which I have had to deal with yet. In libata case, ata_dev_configure() will cause a lot of ata_exec_internal_sg() calls for IDENTIFY and various READ LOG commands. That is all done with non-ncq commands, which means that we do not require a hw tag. But given that you are changing ata_exec_internal_sg() to call alloc_request + blk_execute_rq_nowait(), how would these work without a tag, at least a soft one ? Or we would need to keep the current code to use ata_qc_issue() directly for probe time ? That will look very ugly... > > Thanks, > John
On 27/10/2022 23:25, Damien Le Moal wrote: >> So we have this overall flow: >> >> ata_exec_internal_sg(): >> -> alloc request >> -> blk_execute_rq_nowait() >> ... -> scsi_queue_rq() >> -> sht->reserved_queuecommd() >> -> ata_internal_queuecommand() >> >> And then we have ata_internal_queuecommand() -> ata_sas_queuecmd() -> >> ata_scsi_queue_internal() -> ata_qc_issue(). >> >> Hope it makes sense. > OK. Got it. > However, ata_exec_internal_sg() being used only from EH context with the > queue quiesced, will blk_execute_rq_nowait() work ? Is there an exception > for internal reserved tags ? > Well, yeah. So if some error happens and EH kicks in, then full queue depth of requests may be allocated. I have seen this for NCQ error. So this is why I make in very first patch change allow us to allocate reserved request from sdev request queue even when budget is fully allocated. Please also note that for AHCI, I make reserved depth =1, while for SAS controllers it is greater. This means that in theory we could alloc > 1x reserved command for SATA disk, but I don't think it matters. Thanks, John
On 10/28/22 17:01, John Garry wrote: > On 27/10/2022 23:25, Damien Le Moal wrote: >>> So we have this overall flow: >>> >>> ata_exec_internal_sg(): >>> -> alloc request >>> -> blk_execute_rq_nowait() >>> ... -> scsi_queue_rq() >>> -> sht->reserved_queuecommd() >>> -> ata_internal_queuecommand() >>> >>> And then we have ata_internal_queuecommand() -> ata_sas_queuecmd() -> >>> ata_scsi_queue_internal() -> ata_qc_issue(). >>> >>> Hope it makes sense. >> OK. Got it. >> However, ata_exec_internal_sg() being used only from EH context with the >> queue quiesced, will blk_execute_rq_nowait() work ? Is there an exception >> for internal reserved tags ? >> > > Well, yeah. So if some error happens and EH kicks in, then full queue > depth of requests may be allocated. I have seen this for NCQ error. So > this is why I make in very first patch change allow us to allocate > reserved request from sdev request queue even when budget is fully > allocated. > > Please also note that for AHCI, I make reserved depth =1, while for SAS > controllers it is greater. This means that in theory we could alloc > 1x > reserved command for SATA disk, but I don't think it matters. Yes, 1 is enough. However, is 1 reserved out of 32 total, meaning that the user can only use 31 tags ? or is it 32+1 reserved ? which we can do since when using the reserved request, we will not use a hw tag (all reserved requests will be non-ncq). The 32 + 1 scheme will work. But for CDL command completion handling, we will need a NCQ command to do a read log, to avoid forcing a queue drain. For that to reliably work, we'll need a 31+1+1 setup... > > Thanks, > John
On 27/10/2022 23:35, Damien Le Moal wrote: >> At what stage do you want to send these commands? The tags for the shost >> are not setup until scsi_add_host() -> scsi_mq_setup_tags() is called, >> so can't expect blk-mq to manage reserved tags before then. >> >> If you are required to send commands prior to scsi_add_host(), then I >> suppose the low-level driver still needs to manage tags until the shost >> is ready. I guess that some very simple scheme can be used, like always >> use tag 0, since most probe is done serially per-host. But that's not a >> case which I have had to deal with yet. > In libata case, ata_dev_configure() will cause a lot of > ata_exec_internal_sg() calls for IDENTIFY and various READ LOG commands. > That is all done with non-ncq commands, which means that we do not require > a hw tag. But given that you are changing ata_exec_internal_sg() to call > alloc_request + blk_execute_rq_nowait(), how would these work without a > tag, at least a soft one ? Or we would need to keep the current code to > use ata_qc_issue() directly for probe time ? That will look very ugly... > I am not sure if there is really a problem. So libata/libsas allocs the shost quite early, and that is before we try using ata_exec_internal_sg(). Also note that I added patch "ata: libata-scsi: Allocate sdev early in port probe" so that we have ata_device.sdev ready before issuing ata_exec_internal_sg() (sorry if I'm stating the obvious). I think Hannes' issue is that some SCSI HBA driver needs to send "internal" commands to probe the HW for info, and this would be before shost is ready. He can tell us more. Thanks, John
On 10/28/22 17:14, John Garry wrote: > On 27/10/2022 23:35, Damien Le Moal wrote: >>> At what stage do you want to send these commands? The tags for the shost >>> are not setup until scsi_add_host() -> scsi_mq_setup_tags() is called, >>> so can't expect blk-mq to manage reserved tags before then. >>> >>> If you are required to send commands prior to scsi_add_host(), then I >>> suppose the low-level driver still needs to manage tags until the shost >>> is ready. I guess that some very simple scheme can be used, like always >>> use tag 0, since most probe is done serially per-host. But that's not a >>> case which I have had to deal with yet. >> In libata case, ata_dev_configure() will cause a lot of >> ata_exec_internal_sg() calls for IDENTIFY and various READ LOG commands. >> That is all done with non-ncq commands, which means that we do not require >> a hw tag. But given that you are changing ata_exec_internal_sg() to call >> alloc_request + blk_execute_rq_nowait(), how would these work without a >> tag, at least a soft one ? Or we would need to keep the current code to >> use ata_qc_issue() directly for probe time ? That will look very ugly... >> > > I am not sure if there is really a problem. So libata/libsas allocs the > shost quite early, and that is before we try using > ata_exec_internal_sg(). Also note that I added patch "ata: libata-scsi: > Allocate sdev early in port probe" so that we have ata_device.sdev ready > before issuing ata_exec_internal_sg() (sorry if I'm stating the obvious). > > I think Hannes' issue is that some SCSI HBA driver needs to send > "internal" commands to probe the HW for info, and this would be before > shost is ready. He can tell us more. OK. Understood. > > Thanks, > John
On 28/10/2022 09:07, Damien Le Moal wrote: >> Well, yeah. So if some error happens and EH kicks in, then full queue >> depth of requests may be allocated. I have seen this for NCQ error. So >> this is why I make in very first patch change allow us to allocate >> reserved request from sdev request queue even when budget is fully >> allocated. >> >> Please also note that for AHCI, I make reserved depth =1, while for SAS >> controllers it is greater. This means that in theory we could alloc > 1x >> reserved command for SATA disk, but I don't think it matters. > Yes, 1 is enough. However, is 1 reserved out of 32 total, meaning that the > user can only use 31 tags ? or is it 32+1 reserved ? which we can do since > when using the reserved request, we will not use a hw tag (all reserved > requests will be non-ncq). > > The 32 + 1 scheme will work. Yes, 32 + 1 is what we want. I now think that there is a mistake in my code in this series for ahci. So I think we need for ahci: can_queue = 33 nr_reserved_cmds = 1 while I only have can_queue = 32 I need to check that again for ahci driver and AHCI SHT... > But for CDL command completion handling, we > will need a NCQ command to do a read log, to avoid forcing a queue drain. > For that to reliably work, we'll need a 31+1+1 setup... > So is your idea to permanently reserve 1 more command from 32 commands ? Or re-use 1 from 32 (and still also have 1 separate internal command)? Thanks, John
On 10/28/22 17:33, John Garry wrote: > On 28/10/2022 09:07, Damien Le Moal wrote: >>> Well, yeah. So if some error happens and EH kicks in, then full queue >>> depth of requests may be allocated. I have seen this for NCQ error. So >>> this is why I make in very first patch change allow us to allocate >>> reserved request from sdev request queue even when budget is fully >>> allocated. >>> >>> Please also note that for AHCI, I make reserved depth =1, while for SAS >>> controllers it is greater. This means that in theory we could alloc > 1x >>> reserved command for SATA disk, but I don't think it matters. >> Yes, 1 is enough. However, is 1 reserved out of 32 total, meaning that >> the >> user can only use 31 tags ? or is it 32+1 reserved ? which we can do >> since >> when using the reserved request, we will not use a hw tag (all reserved >> requests will be non-ncq). >> >> The 32 + 1 scheme will work. > > Yes, 32 + 1 is what we want. I now think that there is a mistake in my > code in this series for ahci. > > So I think we need for ahci: > > can_queue = 33 Hmm.. For ATA, can_queue should be only 32. nr_tags is going to be 33 though as we include one tag for a reserved command. No ? (may be I misunderstand can_queue though). > nr_reserved_cmds = 1 > > while I only have can_queue = 32 Which seems right to me. > > I need to check that again for ahci driver and AHCI SHT... > >> But for CDL command completion handling, we >> will need a NCQ command to do a read log, to avoid forcing a queue drain. >> For that to reliably work, we'll need a 31+1+1 setup... >> > > So is your idea to permanently reserve 1 more command from 32 commands ? Yes. Command Duration Limits has this weird case were commands may be failed when exceeding their duration limit with a "good status" and "sense data available bit" set. This case was defined to avoid the queue stall that happens with any NCQ error. The way to handle this without relying on EH (as that would also cause a queue drain) is to issue an NCQ read log command to fetch the "sense data for successful NCQ commands" log, to retrieve the sense data for the completed command and check if it really failed or not. So we absolutely need a reserved command for this, Without a reserved command, it is a nightmare to support (tag reuse would be another solution, but it is horrible). > Or re-use 1 from 32 (and still also have 1 separate internal command)? I am not yet 100% sure if we can treat that internal NCQ read log like any other read/write request... If we can, then the 1-out-of-32 reservation would not be needed. Need to revisit all the cases we need to take care of (because in the middle of this CDL completion handling, regular NCQ errors can happen, resulting in a drive reset that could wreck everything as we lose the sense data for the completed requests). In any case, I think that we can deal with that extra reserved command on top of you current series. No need to worry about it for now I think. > > Thanks, > John
Hi Damien, >>>> >>>> Please also note that for AHCI, I make reserved depth =1, while for SAS >>>> controllers it is greater. This means that in theory we could alloc > 1x >>>> reserved command for SATA disk, but I don't think it matters. >>> Yes, 1 is enough. However, is 1 reserved out of 32 total, meaning that >>> the >>> user can only use 31 tags ? or is it 32+1 reserved ? which we can do >>> since >>> when using the reserved request, we will not use a hw tag (all reserved >>> requests will be non-ncq). >>> >>> The 32 + 1 scheme will work. >> >> Yes, 32 + 1 is what we want. I now think that there is a mistake in my >> code in this series for ahci. >> >> So I think we need for ahci: >> >> can_queue = 33 > > Hmm.. For ATA, can_queue should be only 32. nr_tags is going to be 33 > though as we include one tag for a reserved command. No ? (may be I > misunderstand can_queue though). Yes, we want nr_tags=33. But according to semantics of can_queue, it includes total of regular and reserved tags. This is because tag_set depth is total of regular and reserved tags, and we subtract reserved tags from total depth in blk_mq_init_bitmaps(): int blk_mq_init_bitmaps(.., unsigned int queue_depth, unsigned int reserved, ..) { unsigned int depth = queue_depth - reserved; ... if (bt_alloc(bitmap_tags, depth, round_robin, node So we want a change like this as well: diff --git a/drivers/ata/ahci.h b/drivers/ata/ahci.h index da7ee8bec165..cbcc337055a7 100644 --- a/drivers/ata/ahci.h +++ b/drivers/ata/ahci.h @@ -390,7 +390,7 @@ extern const struct attribute_group *ahci_sdev_groups[]; */ #define AHCI_SHT(drv_name) \ __ATA_BASE_SHT(drv_name), \ - .can_queue = AHCI_MAX_CMDS, \ + .can_queue = AHCI_MAX_CMDS + 1, \ .sg_tablesize = AHCI_MAX_SG, \ .dma_boundary = AHCI_DMA_BOUNDARY, \ .shost_groups = ahci_shost_groups, I know it seems strange, but still 32 tags will only ever be available for non-internal commands (as it is today) and 1 for ata internal command. > >> nr_reserved_cmds = 1 >> >> while I only have can_queue = 32 > > Which seems right to me. > >> >> I need to check that again for ahci driver and AHCI SHT... >> >>> But for CDL command completion handling, we >>> will need a NCQ command to do a read log, to avoid forcing a queue drain. >>> For that to reliably work, we'll need a 31+1+1 setup... >>> >> >> So is your idea to permanently reserve 1 more command from 32 commands ? > > Yes. Command Duration Limits has this weird case were commands may be > failed when exceeding their duration limit with a "good status" and > "sense data available bit" set. This case was defined to avoid the queue > stall that happens with any NCQ error. The way to handle this without > relying on EH (as that would also cause a queue drain) is to issue an > NCQ read log command to fetch the "sense data for successful NCQ > commands" log, to retrieve the sense data for the completed command and > check if it really failed or not. So we absolutely need a reserved > command for this, Without a reserved command, it is a nightmare to > support (tag reuse would be another solution, but it is horrible). > >> Or re-use 1 from 32 (and still also have 1 separate internal command)? > > I am not yet 100% sure if we can treat that internal NCQ read log like > any other read/write request... If we can, then the 1-out-of-32 > reservation would not be needed. Need to revisit all the cases we need > to take care of (because in the middle of this CDL completion handling, > regular NCQ errors can happen, resulting in a drive reset that could > wreck everything as we lose the sense data for the completed requests). > > In any case, I think that we can deal with that extra reserved command > on top of you current series. No need to worry about it for now I think. > So are you saying that you are basing current CDL support on libata internally managing this extra reserved tag (and so do not need this SCSI midlayer reserved tag support yet)? Thanks, John
On 11/2/22 18:52, John Garry wrote: > Hi Damien, > >>>>> >>>>> Please also note that for AHCI, I make reserved depth =1, while for SAS >>>>> controllers it is greater. This means that in theory we could alloc > 1x >>>>> reserved command for SATA disk, but I don't think it matters. >>>> Yes, 1 is enough. However, is 1 reserved out of 32 total, meaning that >>>> the >>>> user can only use 31 tags ? or is it 32+1 reserved ? which we can do >>>> since >>>> when using the reserved request, we will not use a hw tag (all reserved >>>> requests will be non-ncq). >>>> >>>> The 32 + 1 scheme will work. >>> >>> Yes, 32 + 1 is what we want. I now think that there is a mistake in my >>> code in this series for ahci. >>> >>> So I think we need for ahci: >>> >>> can_queue = 33 > >> Hmm.. For ATA, can_queue should be only 32. nr_tags is going to be 33 >> though as we include one tag for a reserved command. No ? (may be I >> misunderstand can_queue though). > > Yes, we want nr_tags=33. But according to semantics of can_queue, it > includes total of regular and reserved tags. This is because tag_set > depth is total of regular and reserved tags, and we subtract reserved > tags from total depth in blk_mq_init_bitmaps(): > > int blk_mq_init_bitmaps(.., unsigned int queue_depth, unsigned int > reserved, ..) > { > unsigned int depth = queue_depth - reserved; > ... > > if (bt_alloc(bitmap_tags, depth, round_robin, node > > > So we want a change like this as well: > > diff --git a/drivers/ata/ahci.h b/drivers/ata/ahci.h > index da7ee8bec165..cbcc337055a7 100644 > --- a/drivers/ata/ahci.h > +++ b/drivers/ata/ahci.h > @@ -390,7 +390,7 @@ extern const struct attribute_group > *ahci_sdev_groups[]; > */ > #define AHCI_SHT(drv_name) \ > __ATA_BASE_SHT(drv_name), \ > - .can_queue = AHCI_MAX_CMDS, \ > + .can_queue = AHCI_MAX_CMDS + 1, \ > .sg_tablesize = AHCI_MAX_SG, \ > .dma_boundary = AHCI_DMA_BOUNDARY, \ > .shost_groups = ahci_shost_groups, > > I know it seems strange, but still 32 tags will only ever be available > for non-internal commands (as it is today) and 1 for ata internal command. > >> >>> nr_reserved_cmds = 1 >>> >>> while I only have can_queue = 32 >> >> Which seems right to me. >> >>> >>> I need to check that again for ahci driver and AHCI SHT... >>> >>>> But for CDL command completion handling, we >>>> will need a NCQ command to do a read log, to avoid forcing a queue drain. >>>> For that to reliably work, we'll need a 31+1+1 setup... >>>> >>> >>> So is your idea to permanently reserve 1 more command from 32 commands ? >> >> Yes. Command Duration Limits has this weird case were commands may be >> failed when exceeding their duration limit with a "good status" and >> "sense data available bit" set. This case was defined to avoid the queue >> stall that happens with any NCQ error. The way to handle this without >> relying on EH (as that would also cause a queue drain) is to issue an >> NCQ read log command to fetch the "sense data for successful NCQ >> commands" log, to retrieve the sense data for the completed command and >> check if it really failed or not. So we absolutely need a reserved >> command for this, Without a reserved command, it is a nightmare to >> support (tag reuse would be another solution, but it is horrible). >> >>> Or re-use 1 from 32 (and still also have 1 separate internal command)? >> >> I am not yet 100% sure if we can treat that internal NCQ read log like >> any other read/write request... If we can, then the 1-out-of-32 >> reservation would not be needed. Need to revisit all the cases we need >> to take care of (because in the middle of this CDL completion handling, >> regular NCQ errors can happen, resulting in a drive reset that could >> wreck everything as we lose the sense data for the completed requests). >> >> In any case, I think that we can deal with that extra reserved command >> on top of you current series. No need to worry about it for now I think. >> > > So are you saying that you are basing current CDL support on libata > internally managing this extra reserved tag (and so do not need this > SCSI midlayer reserved tag support yet)? Not really. For now, it is using libata EH, that is, when we need the internal command for the read log, we know the device is idle and no command is on-going. So we send a non-NCQ command which does not have a tag. Ideally, all of this should use a real reserved tag to allow for an NCQ read log outside of EH, avoiding the drive queue drain. > > Thanks, > John
On 11/2/22 11:07, Damien Le Moal wrote: > On 11/2/22 18:52, John Garry wrote: >> Hi Damien, >> [ .. ] >>>> Or re-use 1 from 32 (and still also have 1 separate internal command)? >>> >>> I am not yet 100% sure if we can treat that internal NCQ read log like >>> any other read/write request... If we can, then the 1-out-of-32 >>> reservation would not be needed. Need to revisit all the cases we need >>> to take care of (because in the middle of this CDL completion handling, >>> regular NCQ errors can happen, resulting in a drive reset that could >>> wreck everything as we lose the sense data for the completed requests). >>> >>> In any case, I think that we can deal with that extra reserved command >>> on top of you current series. No need to worry about it for now I think. >>> >> >> So are you saying that you are basing current CDL support on libata >> internally managing this extra reserved tag (and so do not need this >> SCSI midlayer reserved tag support yet)? > > Not really. For now, it is using libata EH, that is, when we need the > internal command for the read log, we know the device is idle and no > command is on-going. So we send a non-NCQ command which does not have a tag. > > Ideally, all of this should use a real reserved tag to allow for an NCQ > read log outside of EH, avoiding the drive queue drain. > But with the current design you'll only get that if you reserve one precious tag. OTOH, we might not need that tag at all, as _if_ we get an error for a specific command the tag associated with it is necessarily free after completion, right? So we only need to find a way of 're-using' that tag, then we won't have to set aside a reserved tag and everything would be dandy... Maybe we can stop processing when we receive an error (should be doing that anyway as otherwise the log might be overwritten), then we should be having a pretty good chance of getting that tag. Or, precisely, getting _any_ tag as at least one tag is free at that point. Hmm? Cheers, Hannes
On 11/2/22 20:12, Hannes Reinecke wrote: > On 11/2/22 11:07, Damien Le Moal wrote: >> On 11/2/22 18:52, John Garry wrote: >>> Hi Damien, >>> > [ .. ] >>>>> Or re-use 1 from 32 (and still also have 1 separate internal command)? >>>> >>>> I am not yet 100% sure if we can treat that internal NCQ read log like >>>> any other read/write request... If we can, then the 1-out-of-32 >>>> reservation would not be needed. Need to revisit all the cases we need >>>> to take care of (because in the middle of this CDL completion handling, >>>> regular NCQ errors can happen, resulting in a drive reset that could >>>> wreck everything as we lose the sense data for the completed requests). >>>> >>>> In any case, I think that we can deal with that extra reserved command >>>> on top of you current series. No need to worry about it for now I think. >>>> >>> >>> So are you saying that you are basing current CDL support on libata >>> internally managing this extra reserved tag (and so do not need this >>> SCSI midlayer reserved tag support yet)? >> >> Not really. For now, it is using libata EH, that is, when we need the >> internal command for the read log, we know the device is idle and no >> command is on-going. So we send a non-NCQ command which does not have a tag. >> >> Ideally, all of this should use a real reserved tag to allow for an NCQ >> read log outside of EH, avoiding the drive queue drain. >> > But with the current design you'll only get that if you reserve one > precious tag. yes, which is annoying. Back to the days where ATA max qd was 31... > OTOH, we might not need that tag at all, as _if_ we get an error for a > specific command the tag associated with it is necessarily free after > completion, right? Well, it is not really free. It is unused as far as the device is concerned since the command that needs to be checked completed. But not free yet since we need to do the read log first before being able to scsi-complete the command (which will free the tag). So if we use the regular submission path to issue the read log, we must be guaranteed that we can get a tag, otherwise we will deadlock. Hence the need to reserve one tag. > So we only need to find a way of 're-using' that tag, then we won't have > to set aside a reserved tag and everything would be dandy... I tried that. It is very ugly... Problem is that integration with EH in case a real NCQ error happens when all that read-log-complete dance is happening is hard. And don't get me started with the need to save/restore the scsi command context of the command we are reusing the tag from. And given that the code is changing to use regular submission path for internal commands, right now, we need a reserved tag. Or a way to "borrow" the tag from a request that we need to check. Which means we need some additional api to not always try to allocate a tag. > > Maybe we can stop processing when we receive an error (should be doing > that anyway as otherwise the log might be overwritten), then we should > be having a pretty good chance of getting that tag. Hmmm.... that would be no better than using EH which does stop processing until the internal house keeping is done. > Or, precisely, getting _any_ tag as at least one tag is free at that point. > Hmm? See above. Not free, but usable as far as the device is concerned since we have at least on command we need to check completed at the device level (but not yet completed from scsi/block layer point of view). > > Cheers, > > Hannes
On 11/2/22 12:25, Damien Le Moal wrote: > On 11/2/22 20:12, Hannes Reinecke wrote: >> On 11/2/22 11:07, Damien Le Moal wrote: >>> On 11/2/22 18:52, John Garry wrote: >>>> Hi Damien, >>>> >> [ .. ] >> So we only need to find a way of 're-using' that tag, then we won't have >> to set aside a reserved tag and everything would be dandy... > > I tried that. It is very ugly... Problem is that integration with EH in > case a real NCQ error happens when all that read-log-complete dance is > happening is hard. And don't get me started with the need to save/restore > the scsi command context of the command we are reusing the tag from. > > And given that the code is changing to use regular submission path for > internal commands, right now, we need a reserved tag. Or a way to "borrow" > the tag from a request that we need to check. Which means we need some > additional api to not always try to allocate a tag. > >> >> Maybe we can stop processing when we receive an error (should be doing >> that anyway as otherwise the log might be overwritten), then we should >> be having a pretty good chance of getting that tag. > > Hmmm.... that would be no better than using EH which does stop processing > until the internal house keeping is done. > >> Or, precisely, getting _any_ tag as at least one tag is free at that point. >> Hmm? > > See above. Not free, but usable as far as the device is concerned since we > have at least on command we need to check completed at the device level > (but not yet completed from scsi/block layer point of view). > So, having had an entire weekend pondering this issue why don't we allocate an _additional_ set of requests? After all, we had been very generous with allocating queues and requests (what with us doing a full provisioning of the requests for all queues already for the non-shared tag case). Idea would be to keep the single tag bitmap, but add eg a new rq state MQ_RQ_ERROR. Once that flag is set we'll fetch the error request instead of the normal one: @@ -761,6 +763,8 @@ static inline struct request *blk_mq_tag_to_rq(struct blk_mq_tags *tags, { if (tag < tags->nr_tags) { prefetch(tags->rqs[tag]); + if (unlikely(blk_mq_request_error(tags->rqs[tag]))) + return tags->error_rqs[tag]; return tags->rqs[tag]; } and, of course, we would need to provision the error request first. Rationale here is that this will be primarily for devices with a low number of tags, so doubling the number of request isn't much of an overhead (as we'll be doing it essentially anyway in the error case as we'll have to save the original request _somewhere_), and that it would remove quite some cruft from the subsystem; look at SCSI EH trying to store the original request contents and then after EH restoring them again. Hmm? Cheers, Hannes
On 11/7/22 19:12, Hannes Reinecke wrote: > On 11/2/22 12:25, Damien Le Moal wrote: >> On 11/2/22 20:12, Hannes Reinecke wrote: >>> On 11/2/22 11:07, Damien Le Moal wrote: >>>> On 11/2/22 18:52, John Garry wrote: >>>>> Hi Damien, >>>>> >>> [ .. ] >> So we only need to find a way of 're-using' that tag, then we won't have >>> to set aside a reserved tag and everything would be dandy... >> >> I tried that. It is very ugly... Problem is that integration with EH in >> case a real NCQ error happens when all that read-log-complete dance is >> happening is hard. And don't get me started with the need to save/restore >> the scsi command context of the command we are reusing the tag from. >> >> And given that the code is changing to use regular submission path for >> internal commands, right now, we need a reserved tag. Or a way to "borrow" >> the tag from a request that we need to check. Which means we need some >> additional api to not always try to allocate a tag. >> >>> >>> Maybe we can stop processing when we receive an error (should be doing >>> that anyway as otherwise the log might be overwritten), then we should >>> be having a pretty good chance of getting that tag. >> >> Hmmm.... that would be no better than using EH which does stop processing >> until the internal house keeping is done. >> >>> Or, precisely, getting _any_ tag as at least one tag is free at that point. >>> Hmm? >> >> See above. Not free, but usable as far as the device is concerned since we >> have at least on command we need to check completed at the device level >> (but not yet completed from scsi/block layer point of view). >> > So, having had an entire weekend pondering this issue why don't we > allocate an _additional_ set of requests? > After all, we had been very generous with allocating queues and requests > (what with us doing a full provisioning of the requests for all queues > already for the non-shared tag case). > > Idea would be to keep the single tag bitmap, but add eg a new rq state > MQ_RQ_ERROR. Once that flag is set we'll fetch the error request instead > of the normal one: > > @@ -761,6 +763,8 @@ static inline struct request > *blk_mq_tag_to_rq(struct blk_mq_tags *tags, > { > if (tag < tags->nr_tags) { > prefetch(tags->rqs[tag]); > + if (unlikely(blk_mq_request_error(tags->rqs[tag]))) > + return tags->error_rqs[tag]; > return tags->rqs[tag]; > } > > and, of course, we would need to provision the error request first. > > Rationale here is that this will be primarily for devices with a low > number of tags, so doubling the number of request isn't much of an > overhead (as we'll be doing it essentially anyway in the error case as > we'll have to save the original request _somewhere_), and that it would > remove quite some cruft from the subsystem; look at SCSI EH trying to > store the original request contents and then after EH restoring them again. Interesting idea. I like it. It is essentially a set of reserved requests without reserved tags: the tag to use for these requests would be provided "manually" by the user. Right ? That should allow simplifying any processing that needs to reuse a tag, and currently its request. That is, CDL, but also usb-scsi, scsi EH and the few scsi LLDs using scsi_eh_prep_cmnd()+scsi_eh_restore_cmnd(). Ideally, these 2 functions could go away too. > > Hmm? > > Cheers, > > Hannes
On 11/7/22 14:29, Damien Le Moal wrote: > On 11/7/22 19:12, Hannes Reinecke wrote: >> On 11/2/22 12:25, Damien Le Moal wrote: >>> On 11/2/22 20:12, Hannes Reinecke wrote: >>>> On 11/2/22 11:07, Damien Le Moal wrote: >>>>> On 11/2/22 18:52, John Garry wrote: >>>>>> Hi Damien, >>>>>> >>>> [ .. ] >> So we only need to find a way of 're-using' that tag, then we won't have >>>> to set aside a reserved tag and everything would be dandy... >>> >>> I tried that. It is very ugly... Problem is that integration with EH in >>> case a real NCQ error happens when all that read-log-complete dance is >>> happening is hard. And don't get me started with the need to save/restore >>> the scsi command context of the command we are reusing the tag from. >>> >>> And given that the code is changing to use regular submission path for >>> internal commands, right now, we need a reserved tag. Or a way to "borrow" >>> the tag from a request that we need to check. Which means we need some >>> additional api to not always try to allocate a tag. >>> >>>> >>>> Maybe we can stop processing when we receive an error (should be doing >>>> that anyway as otherwise the log might be overwritten), then we should >>>> be having a pretty good chance of getting that tag. >>> >>> Hmmm.... that would be no better than using EH which does stop processing >>> until the internal house keeping is done. >>> >>>> Or, precisely, getting _any_ tag as at least one tag is free at that point. >>>> Hmm? >>> >>> See above. Not free, but usable as far as the device is concerned since we >>> have at least on command we need to check completed at the device level >>> (but not yet completed from scsi/block layer point of view). >>> >> So, having had an entire weekend pondering this issue why don't we >> allocate an _additional_ set of requests? >> After all, we had been very generous with allocating queues and requests >> (what with us doing a full provisioning of the requests for all queues >> already for the non-shared tag case). >> >> Idea would be to keep the single tag bitmap, but add eg a new rq state >> MQ_RQ_ERROR. Once that flag is set we'll fetch the error request instead >> of the normal one: >> >> @@ -761,6 +763,8 @@ static inline struct request >> *blk_mq_tag_to_rq(struct blk_mq_tags *tags, >> { >> if (tag < tags->nr_tags) { >> prefetch(tags->rqs[tag]); >> + if (unlikely(blk_mq_request_error(tags->rqs[tag]))) >> + return tags->error_rqs[tag]; >> return tags->rqs[tag]; >> } >> >> and, of course, we would need to provision the error request first. >> >> Rationale here is that this will be primarily for devices with a low >> number of tags, so doubling the number of request isn't much of an >> overhead (as we'll be doing it essentially anyway in the error case as >> we'll have to save the original request _somewhere_), and that it would >> remove quite some cruft from the subsystem; look at SCSI EH trying to >> store the original request contents and then after EH restoring them again. > > Interesting idea. I like it. It is essentially a set of reserved requests > without reserved tags: the tag to use for these requests would be provided > "manually" by the user. Right ? > Yes. Upon failure one would be calling something like 'blk_mq_get_error_rq(rq)', which would set the error flag in the original request, fetch the matching request from ->static_rqs, and return that one. Just figured, we could simply enlarge 'static_rqs' to have double the size; then we can easily get the appropriate request from 'static_rqs' by just adding the queue size. Making things even easier ... > That should allow simplifying any processing that needs to reuse a tag, > and currently its request. That is, CDL, but also usb-scsi, scsi EH and > the few scsi LLDs using scsi_eh_prep_cmnd()+scsi_eh_restore_cmnd(). > Ideally, these 2 functions could go away too. > Which was precisely the idea. We have quite some drivers/infrastructure which already require a similar functionality, and basically all of them cover devices with a really low tag space (32/31 in the libata NCQ case, 16 in the SCSI TCQ case, or even _1_ in the SCSI parallel case :-) So a request duplication wouldn't matter _that_ much here. Drivers with a higher queue depth typically can do 'real' TMFs, where you need to allocate a new tag anyway, and so the whole operation doesn't apply here. Cheers, Hannes
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c index 30d7c90b0c35..0d6f37d80137 100644 --- a/drivers/ata/libata-scsi.c +++ b/drivers/ata/libata-scsi.c @@ -1118,6 +1118,20 @@ int ata_scsi_dev_config(struct scsi_device *sdev, struct ata_device *dev) return 0; } +int ata_internal_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmd) +{ + struct ata_port *ap; + int res; + + ap = ata_shost_to_port(shost); + spin_lock_irq(ap->lock); + res = ata_sas_queuecmd(scmd, ap); + spin_unlock_irq(ap->lock); + + return res; +} +EXPORT_SYMBOL_GPL(ata_internal_queuecommand); + /** * ata_scsi_slave_config - Set SCSI device attributes * @sdev: SCSI device to examine diff --git a/include/linux/libata.h b/include/linux/libata.h index 8938b584520f..f09c5dca16ce 100644 --- a/include/linux/libata.h +++ b/include/linux/libata.h @@ -1141,6 +1141,8 @@ extern int ata_std_bios_param(struct scsi_device *sdev, sector_t capacity, int geom[]); extern void ata_scsi_unlock_native_capacity(struct scsi_device *sdev); extern int ata_scsi_slave_config(struct scsi_device *sdev); +extern int ata_internal_queuecommand(struct Scsi_Host *shost, + struct scsi_cmnd *scmd); extern void ata_scsi_slave_destroy(struct scsi_device *sdev); extern int ata_scsi_change_queue_depth(struct scsi_device *sdev, int queue_depth); @@ -1391,7 +1393,8 @@ extern const struct attribute_group *ata_common_sdev_groups[]; .slave_destroy = ata_scsi_slave_destroy, \ .bios_param = ata_std_bios_param, \ .unlock_native_capacity = ata_scsi_unlock_native_capacity,\ - .max_sectors = ATA_MAX_SECTORS_LBA48 + .max_sectors = ATA_MAX_SECTORS_LBA48,\ + .reserved_queuecommand = ata_internal_queuecommand #define ATA_SUBBASE_SHT(drv_name) \ __ATA_BASE_SHT(drv_name), \