[v3] nvme: update firmware version after commit

Message ID 20231030160044.20355-1-dwagner@suse.de
State New
Series [v3] nvme: update firmware version after commit

Commit Message

Daniel Wagner Oct. 30, 2023, 4 p.m. UTC
The firmware version sysfs entry needs to be updated after a successful
firmware activation.

nvme-cli stopped issuing an Identify Controller command to list the
current firmware information and relies on sysfs showing the current
firmware version.

Reported-by: Kenji Tomonaga <tkenbo@gmail.com>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
---

Only compile tested. Asked for testing.

changes:

v3:
  - use afi variable directly, no _to_cpu helper
  - fix bit mask size

v2:
  - use fw slot info instead of issuing another identify controller command
  - https://lore.kernel.org/linux-nvme/20231013163420.3097-1-dwagner@suse.de

v1:
  - initial version
  - https://lore.kernel.org/linux-nvme/20231013062623.6745-1-dwagner@suse.de/


 drivers/nvme/host/core.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)
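The commit message notes that nvme-cli now takes the current firmware revision from sysfs instead of issuing Identify Controller. A minimal user-space sketch of such a read follows; the attribute path `/sys/class/nvme/nvme0/firmware_rev` and the helper name `read_sysfs_attr()` are assumptions for illustration, not nvme-cli code.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/*
 * Read a one-line sysfs attribute (assumed example:
 * /sys/class/nvme/nvme0/firmware_rev) into buf and strip the trailing
 * newline. Returns 0 on success and -1 on error.
 */
static int read_sysfs_attr(const char *path, char *buf, size_t len)
{
	FILE *f;

	assert(len > 0 && len <= INT_MAX); /* fgets() takes an int size */
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0'; /* drop the newline sysfs appends */
	return 0;
}
```

Usage, assuming a controller named nvme0: `read_sysfs_attr("/sys/class/nvme/nvme0/firmware_rev", rev, sizeof(rev))`. If the kernel never refreshes its cached revision after a firmware commit, this keeps returning the pre-activation string, which is what the patch sets out to fix.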
  

Comments

Christoph Hellwig Oct. 31, 2023, 8:53 a.m. UTC | #1
Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>
  
Niklas Cassel Oct. 31, 2023, 2:20 p.m. UTC | #2
On Mon, Oct 30, 2023 at 05:00:44PM +0100, Daniel Wagner wrote:
> The firmware version sysfs entry needs to be updated after a successful
> firmware activation.
> 
> nvme-cli stopped issuing an Identify Controller command to list the
> current firmware information and relies on sysfs showing the current
> firmware version.
> 
> Reported-by: Kenji Tomonaga <tkenbo@gmail.com>
> Signed-off-by: Daniel Wagner <dwagner@suse.de>
> ---
> 
> Only compile tested. Asked for testing.
> 
> changes:
> 
> v3:
>   - use afi variable directly, no _to_cpu helper
>   - fix bit mask size
> 
> v2:
>   - use fw slot info instead of issuing another identify controller command
>   - https://lore.kernel.org/linux-nvme/20231013163420.3097-1-dwagner@suse.de
> 
> v1:
>   - initial version
>   - https://lore.kernel.org/linux-nvme/20231013062623.6745-1-dwagner@suse.de/
> 
> 
>  drivers/nvme/host/core.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 37b6fa746662..e8511bff78d2 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -4053,8 +4053,21 @@ static void nvme_get_fw_slot_info(struct nvme_ctrl *ctrl)
>  		return;
>  
>  	if (nvme_get_log(ctrl, NVME_NSID_ALL, NVME_LOG_FW_SLOT, 0, NVME_CSI_NVM,
> -			log, sizeof(*log), 0))
> +			 log, sizeof(*log), 0)) {
>  		dev_warn(ctrl->device, "Get FW SLOT INFO log error\n");
> +		goto out_free_log;
> +	}
> +
> +	if (log->afi & 0x70) {
> +		dev_info(ctrl->device,
> +			 "Firmware is activated after next Controller Level Reset\n");
> +		goto out_free_log;
> +	}
> +
> +	memcpy(ctrl->subsys->firmware_rev, &log->frs[log->afi & 0x7],
> +		sizeof(ctrl->subsys->firmware_rev));
> +
> +out_free_log:
>  	kfree(log);
>  }
>  
> -- 
> 2.42.0
> 

Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
  
Daniel Wagner Oct. 31, 2023, 4:08 p.m. UTC | #3
On Mon, Oct 30, 2023 at 05:00:44PM +0100, Daniel Wagner wrote:
> The firmware version sysfs entry needs to be updated after a successful
> firmware activation.
> 
> nvme-cli stopped issuing an Identify Controller command to list the
> current firmware information and relies on sysfs showing the current
> firmware version.
> 
> Reported-by: Kenji Tomonaga <tkenbo@gmail.com>
> Signed-off-by: Daniel Wagner <dwagner@suse.de>

Unfortunately, Kenji is not able to post on the mailing list, so here is his tag:

Tested-by: Kenji Tomonaga <tkenbo@gmail.com>
  
Keith Busch Oct. 31, 2023, 4:08 p.m. UTC | #4
On Mon, Oct 30, 2023 at 05:00:44PM +0100, Daniel Wagner wrote:
> The firmware version sysfs entry needs to be updated after a successful
> firmware activation.
> 
> nvme-cli stopped issuing an Identify Controller command to list the
> current firmware information and relies on sysfs showing the current
> firmware version.
> 
> Reported-by: Kenji Tomonaga <tkenbo@gmail.com>
> Signed-off-by: Daniel Wagner <dwagner@suse.de>

Thanks, applied for nvme-6.7.
  
Daniel Wagner Nov. 3, 2023, 12:11 p.m. UTC | #5
On Tue, Oct 31, 2023 at 10:08:53AM -0600, Keith Busch wrote:
> On Mon, Oct 30, 2023 at 05:00:44PM +0100, Daniel Wagner wrote:
> > The firmware version sysfs entry needs to be updated after a successful
> > firmware activation.
> > 
> > nvme-cli stopped issuing an Identify Controller command to list the
> > current firmware information and relies on sysfs showing the current
> > firmware version.
> > 
> > Reported-by: Kenji Tomonaga <tkenbo@gmail.com>
> > Signed-off-by: Daniel Wagner <dwagner@suse.de>
> 
> Thanks, applied for nvme-6.7.

I've received negative feedback from one of our customers. I annotated the
code with

	dev_info(ctrl->device, "afi: %#x\n", log->afi);
	for (i = 0; i < 7; i++) {
		dev_info(ctrl->device, "frs%d: %.*s\n", i + 1,
			 nvme_strlen((char *)&log->frs[i], sizeof(ctrl->subsys->firmware_rev)),
			 (char *)&log->frs[i]);
	}


[  124.824812] nvme nvme8: afi: 0x3
[  124.824824] nvme nvme8: frs1: 0.4.0
[  124.824828] nvme nvme8: frs2: 0.3.0
[  124.824832] nvme nvme8: frs3: 0.4.0
[  124.824835] nvme nvme8: frs4:
[  124.824837] nvme nvme8: frs5:
[  124.824840] nvme nvme8: frs6:
[  124.824842] nvme nvme8: frs7:


This particular firmware seems to interpret AFI as one-based, while
this patch assumes it is zero-based:


	memcpy(ctrl->subsys->firmware_rev, &log->frs[log->afi & 0x7],
		sizeof(ctrl->subsys->firmware_rev));
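Applied to the customer's log above (afi: 0x3, slots 1 to 3 populated), the two readings pick different entries. A small sketch replaying this with the logged strings; the array and helper names are hypothetical:

```c
#include <assert.h>

/* Revision strings from the customer's log above:
 * frs[0] is slot 1, ..., frs[6] is slot 7. */
static const char *const customer_frs[7] = {
	"0.4.0", "0.3.0", "0.4.0", "", "", "", "",
};

/* Mirrors v3: bits 2:0 of AFI used directly as the frs[] index. */
static const char *v3_entry(unsigned char afi)
{
	unsigned int idx = afi & 0x7;

	/* idx == 7 would index past the seven-entry array here. */
	assert(idx < 7);
	return customer_frs[idx];
}

/* Slot-number reading: slot N is stored in frs[N - 1]. */
static const char *slot_entry(unsigned char afi)
{
	unsigned int slot = afi & 0x7;

	assert(slot >= 1); /* slot 0 does not name a firmware slot */
	return customer_frs[slot - 1];
}
```

With afi = 0x3 the v3 expression copies frs[3], the empty slot 4, while treating bits 2:0 as a slot number selects frs[2], slot 3 ("0.4.0"). The zero-based reading also has no valid index for slot 7, since frs[] holds only seven entries.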


The spec says


  Active Firmware Info (AFI): Specifies information about the active
                              firmware revision.

  Bit 7    is reserved.
  Bits 6:4 indicates the firmware slot that is going to be activated
           at the next Controller Level Reset. If this field is 0h,
           then the controller does not indicate the firmware slot that
           is going to be activated at the next Controller Level Reset.
  Bit 3    is reserved.
  Bits 2:0 indicates the firmware slot from which the actively running
           firmware revision was loaded.
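The two subfields in this description can be separated with plain masks. A minimal sketch; the macro, struct and function names are illustrative, not the kernel's:

```c
#include <assert.h>
#include <stdint.h>

/* Field masks for the Active Firmware Info (AFI) byte, per the text above. */
#define AFI_ACTIVE_MASK 0x07u /* bits 2:0 */
#define AFI_NEXT_MASK   0x70u /* bits 6:4 */
#define AFI_NEXT_SHIFT  4

static_assert((AFI_ACTIVE_MASK & AFI_NEXT_MASK) == 0,
	      "AFI subfields must not overlap");

struct afi_fields {
	uint8_t active_slot; /* slot the running firmware was loaded from */
	uint8_t next_slot;   /* slot activated at next CLR; 0 = not indicated */
};

static struct afi_fields decode_afi(uint8_t afi)
{
	/* Bits 3 and 7 are reserved and simply masked off. */
	struct afi_fields f = {
		.active_slot = afi & AFI_ACTIVE_MASK,
		.next_slot = (afi & AFI_NEXT_MASK) >> AFI_NEXT_SHIFT,
	};

	return f;
}
```

For the customer's afi: 0x3 this yields active slot 3 and next slot 0, so no pending activation is indicated. Whether slot 3 maps to frs[2] or frs[3] is exactly the open question; the decode itself does not depend on it.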


It's not clear to me whether AFI bits 2:0 are zero- or one-based. Bits 6:4
appear to be one-based.

Any ideas how to handle this?

Thanks,
Daniel
  
Christoph Hellwig Nov. 3, 2023, 1:58 p.m. UTC | #6
On Fri, Nov 03, 2023 at 01:11:02PM +0100, Daniel Wagner wrote:
> This particular firmware seems to interpret AFI as one-based, while
> this patch assumes it is zero-based

>   Active Firmware Info (AFI): Specifies information about the active
>                               firmware revision.
> 
>   Bit 7    is reserved.
>   Bits 6:4 indicates the firmware slot that is going to be activated
>            at the next Controller Level Reset. If this field is 0h,
>            then the controller does not indicate the firmware slot that
>            is going to be activated at the next Controller Level Reset.
>   Bit 3    is reserved.
>   Bits 2:0 indicates the firmware slot from which the actively running
>            firmware revision was loaded.
> 
> 
> It's not clear to me whether AFI bits 2:0 are zero- or one-based. Bits 6:4
> appear to be one-based.

All 0's based (what a stupid term..) fields in NVMe are explicitly
marked as such.  And even if that wasn't the case I'd very much
expect the same encoding for the two sub-fields.
  
Keith Busch Nov. 3, 2023, 2:22 p.m. UTC | #7
On Fri, Nov 03, 2023 at 02:58:57PM +0100, Christoph Hellwig wrote:
> On Fri, Nov 03, 2023 at 01:11:02PM +0100, Daniel Wagner wrote:
> > This particular firmware seems to interpret AFI as one-based, while
> > this patch assumes it is zero-based
> 
> >   Active Firmware Info (AFI): Specifies information about the active
> >                               firmware revision.
> > 
> >   Bit 7    is reserved.
> >   Bits 6:4 indicates the firmware slot that is going to be activated
> >            at the next Controller Level Reset. If this field is 0h,
> >            then the controller does not indicate the firmware slot that
> >            is going to be activated at the next Controller Level Reset.
> >   Bit 3    is reserved.
> >   Bits 2:0 indicates the firmware slot from which the actively running
> >            firmware revision was loaded.
> > 
> > 
> > It's not clear to me whether AFI bits 2:0 are zero- or one-based. Bits 6:4
> > appear to be one-based.
> 
> All 0's based (what a stupid term..) fields in NVMe are explicitly
> marked as such.  And even if that wasn't the case I'd very much
> expect the same encoding for the two sub-fields.

Yeah, it's just the firmware slot number, taken literally. AFI = 1 means
slot 1, AFI = 2 means slot 2, etc... Slot 0 either has special meaning
(firmware commit SF field, or fw log AFI bits 6:4), or is a reserved
value, like in Identify Controller FRMW.NOFS, and has no place in the FW
Slot Info log page.

Our first slot in the log page is defined as slot one, so we have to
subtract one from the AFI field to index into the slot array. I messed
up by not catching that earlier, but thanks for pointing it out now.
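Combining this subtract-one rule with the v3 check for a pending activation gives roughly the following lookup. It is a user-space sketch under those assumptions, not necessarily identical to the fixed-up commit; the name `active_frs_index()` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define FW_SLOT_COUNT 7 /* frs[] holds slots 1..7 */

/*
 * Map the Active Firmware Info byte to an index into frs[], or return -1
 * if the cached firmware revision should not be refreshed from this log.
 */
static int active_frs_index(uint8_t afi)
{
	unsigned int cur = afi & 0x07;         /* bits 2:0: running slot */
	unsigned int next = (afi & 0x70) >> 4; /* bits 6:4: slot after next CLR */

	if (next != 0)
		return -1; /* as in v3: activation pending, keep old value */
	if (cur == 0)
		return -1; /* slot 0 is not a valid firmware slot number */

	/* Slot numbers are one-based: slot N is stored in frs[N - 1]. */
	assert(cur <= FW_SLOT_COUNT);
	return (int)cur - 1;
}
```

A caller would copy frs[active_frs_index(afi)] only when the result is non-negative; slot 0 and any pending next-slot value leave the cached revision untouched, matching v3's choice to skip the update in that case.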
  
Daniel Wagner Nov. 6, 2023, 7 a.m. UTC | #8
On Fri, Nov 03, 2023 at 08:22:11AM -0600, Keith Busch wrote:
> > All 0's based (what a stupid term..) fields in NVMe are explicitly
> > marked as such.  And even if that wasn't the case I'd very much
> > expect the same encoding for the two sub-fields.
> 
> Yeah, it's just the firmware slot number, taken literally. AFI = 1 means
> slot 1, AFI = 2 means slot 2, etc... Slot 0 either has special meaning
> (firmware commit SF field, or fw log AFI bits 6:4), or is a reserved
> value, like in Identify Controller FRMW.NOFS, and has no place in the FW
> Slot Info log page.
> 
> Our first slot in the log page is defined as slot one, so we have to
> subtract one from the AFI field to index into the slot array. I messed
> up by not catching that earlier, but thanks for pointing it out now.

Thanks for the clarification.

Do you want me to send a follow-up patch or a new version of this one,
or will you fix it up yourself?
  
Keith Busch Nov. 6, 2023, 4:44 p.m. UTC | #9
On Mon, Nov 06, 2023 at 08:00:44AM +0100, Daniel Wagner wrote:
> On Fri, Nov 03, 2023 at 08:22:11AM -0600, Keith Busch wrote:
> > > All 0's based (what a stupid term..) fields in NVMe are explicitly
> > > marked as such.  And even if that wasn't the case I'd very much
> > > expect the same encoding for the two sub-fields.
> > 
> > Yeah, it's just the firmware slot number, taken literally. AFI = 1 means
> > slot 1, AFI = 2 means slot 2, etc... Slot 0 either has special meaning
> > (firmware commit SF field, or fw log AFI bits 6:4), or is a reserved
> > value, like in Identify Controller FRMW.NOFS, and has no place in the FW
> > Slot Info log page.
> > 
> > Our first slot in the log page is defined as slot one, so we have to
> > subtract one from the AFI field to index into the slot array. I messed
> > up by not catching that earlier, but thanks for pointing it out now.
> 
> Thanks for the clarification.
> 
> Do you want me to send a follow-up patch or a new version of this one,
> or will you fix it up yourself?

Fixed up inline when applying the original patch. Let me know if you
have any concerns with the result, currently here:

  https://git.infradead.org/nvme.git/commitdiff/983a338b96c8a25b81e773b643f80634358e81bc
  
Daniel Wagner Nov. 7, 2023, 8:30 a.m. UTC | #10
On Mon, Nov 06, 2023 at 09:44:39AM -0700, Keith Busch wrote:
> Fixed up inline when applying the original patch. Let me know if you
> have any concerns with the result, currently here:
> 
>   https://git.infradead.org/nvme.git/commitdiff/983a338b96c8a25b81e773b643f80634358e81bc

Looks good. Thanks!
  

Patch

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 37b6fa746662..e8511bff78d2 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4053,8 +4053,21 @@ static void nvme_get_fw_slot_info(struct nvme_ctrl *ctrl)
 		return;
 
 	if (nvme_get_log(ctrl, NVME_NSID_ALL, NVME_LOG_FW_SLOT, 0, NVME_CSI_NVM,
-			log, sizeof(*log), 0))
+			 log, sizeof(*log), 0)) {
 		dev_warn(ctrl->device, "Get FW SLOT INFO log error\n");
+		goto out_free_log;
+	}
+
+	if (log->afi & 0x70) {
+		dev_info(ctrl->device,
+			 "Firmware is activated after next Controller Level Reset\n");
+		goto out_free_log;
+	}
+
+	memcpy(ctrl->subsys->firmware_rev, &log->frs[log->afi & 0x7],
+		sizeof(ctrl->subsys->firmware_rev));
+
+out_free_log:
 	kfree(log);
 }