[1/1] nvme-pci: add NVME_QUIRK_BOGUS_NID for Samsung PM173X

Message ID 20230315223436.2857712-1-saeed.mirzamohammadi@oracle.com
State New
Headers
Series [1/1] nvme-pci: add NVME_QUIRK_BOGUS_NID for Samsung PM173X |

Commit Message

Saeed Mirzamohammadi March 15, 2023, 10:34 p.m. UTC
  This adds a quirk to fix the Samsung PM1733a and PM173X reporting
bogus eui64 so they are not marked as "non globally unique" duplicates.

Cc: <stable@vger.kernel.org>
Signed-off-by: Saeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com>
---
 drivers/nvme/host/pci.c | 4 ++++
 1 file changed, 4 insertions(+)
  

Comments

Chaitanya Kulkarni March 15, 2023, 10:38 p.m. UTC | #1
On 3/15/23 15:34, Saeed Mirzamohammadi wrote:
> This adds a quirk to fix the Samsung PM1733a and PM173X reporting
> bogus eui64 so they are not marked as "non globally unique" duplicates.
>
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Saeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com>
> ---
Looks good.

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>

-ck
  
Christoph Hellwig March 16, 2023, 5:15 a.m. UTC | #2
On Wed, Mar 15, 2023 at 03:34:36PM -0700, Saeed Mirzamohammadi wrote:
> This adds a quirk to fix the Samsung PM1733a and PM173X reporting
> bogus eui64 so they are not marked as "non globally unique" duplicates.

What kinds of euid do they report?  Did you report the bug to Samsung?
  
Saeed Mirzamohammadi March 16, 2023, 9:31 p.m. UTC | #3
Hi,

> On Mar 15, 2023, at 10:15 PM, Christoph Hellwig <hch@lst.de> wrote:
> 
> On Wed, Mar 15, 2023 at 03:34:36PM -0700, Saeed Mirzamohammadi wrote:
>> This adds a quirk to fix the Samsung PM1733a and PM173X reporting
>> bogus eui64 so they are not marked as "non globally unique" duplicates.
> 
> What kinds of euid do they report?  Did you report the bug to Samsung?

eui64 values are not unique. Here is an example:
namespace1
nguid   : 36554630529000710025384500000001
eui64   : 002538191100104a
namespace2
nguid   : 36554630529000710025384500000002
eui64   : 002538191100104a
namespace3
nguid   : 36554630529000710025384500000003
eui64   : 002538191100104a
namespace4
nguid   : 36554630529000710025384500000004
eui64   : 002538191100104a

I haven’t yet contacted Samsung. Do you recommend any one to reach out to?

Thanks,
Saeed
  
Christoph Hellwig March 17, 2023, 7:33 a.m. UTC | #4
On Thu, Mar 16, 2023 at 09:31:38PM +0000, Saeed Mirzamohammadi wrote:
> 
> eui64 values are not unique. Here is an example:
> namespace1
> nguid   : 36554630529000710025384500000001
> eui64   : 002538191100104a
> namespace2
> nguid   : 36554630529000710025384500000002
> eui64   : 002538191100104a
> namespace3
> nguid   : 36554630529000710025384500000003
> eui64   : 002538191100104a
> namespace4
> nguid   : 36554630529000710025384500000004
> eui64   : 002538191100104a

Eww, that's really a grave bug.  Interestingly enough the nguid
works, so I wonder if we should just quirk the EUI so that we have
a least some uniqueue identifier.

> I haven’t yet contacted Samsung. Do you recommend any one to reach out to?

We have plenty of Samsung folks here on the list, maybe someone can
reply?
  
Pankaj Raghav March 17, 2023, 11:10 a.m. UTC | #5
Hi Saeed,
On Thu, Mar 16, 2023 at 09:31:38PM +0000, Saeed Mirzamohammadi wrote:
> eui64 values are not unique. Here is an example:
> namespace1
> nguid   : 36554630529000710025384500000001
> eui64   : 002538191100104a
> namespace2
> nguid   : 36554630529000710025384500000002
> eui64   : 002538191100104a
> namespace3
> nguid   : 36554630529000710025384500000003
> eui64   : 002538191100104a
> namespace4
> nguid   : 36554630529000710025384500000004
> eui64   : 002538191100104a
> 
> I haven’t yet contacted Samsung. Do you recommend any one to reach out to?

I am able to reproduce this error with a PM173X.

nvme id-ns:
root@missingno:~# nvme id-ns -n 1 /dev/nvme1 | grep eui64
eui64   : 00253891010005b3
root@missingno:~# nvme id-ns -n 2 /dev/nvme1 | grep eui64
eui64   : 00253891010005b3
root@missingno:~# nvme id-ns -n 3 /dev/nvme1 | grep eui64
eui64   : 00253891010005b3

dmesg:
[174690.305507] nvme nvme1: identifiers changed for nsid 1
[174878.481002] nvme nvme1: rescanning namespaces.
[174878.535981] nvme nvme1: duplicate IDs in subsystem for nsid 2
[174878.599982] nvme nvme1: duplicate IDs in subsystem for nsid 2
[174883.673001] nvme nvme1: rescanning namespaces.
[174883.740045] nvme nvme1: duplicate IDs in subsystem for nsid 2
[174883.784025] nvme nvme1: duplicate IDs in subsystem for nsid 3
[174883.852034] nvme nvme1: duplicate IDs in subsystem for nsid 2
[174883.892040] nvme nvme1: duplicate IDs in subsystem for nsid 3

I will report this to our firmware team. Meanwhile, Could you paste your
output id-ctrl output here? I am interested in the fw version you are using.

--
Pankaj
  
Saeed Mirzamohammadi March 17, 2023, 4:26 p.m. UTC | #6
Hi Pankaj,

See the output below.

> On Mar 17, 2023, at 4:10 AM, Pankaj Raghav <p.raghav@samsung.com> wrote:
> 
> Hi Saeed,
> On Thu, Mar 16, 2023 at 09:31:38PM +0000, Saeed Mirzamohammadi wrote:
>> eui64 values are not unique. Here is an example:
>> namespace1
>> nguid   : 36554630529000710025384500000001
>> eui64   : 002538191100104a
>> namespace2
>> nguid   : 36554630529000710025384500000002
>> eui64   : 002538191100104a
>> namespace3
>> nguid   : 36554630529000710025384500000003
>> eui64   : 002538191100104a
>> namespace4
>> nguid   : 36554630529000710025384500000004
>> eui64   : 002538191100104a
>> 
>> I haven’t yet contacted Samsung. Do you recommend any one to reach out to?
> 
> I am able to reproduce this error with a PM173X.
> 
> nvme id-ns:
> root@missingno:~# nvme id-ns -n 1 /dev/nvme1 | grep eui64
> eui64   : 00253891010005b3
> root@missingno:~# nvme id-ns -n 2 /dev/nvme1 | grep eui64
> eui64   : 00253891010005b3
> root@missingno:~# nvme id-ns -n 3 /dev/nvme1 | grep eui64
> eui64   : 00253891010005b3
> 
> dmesg:
> [174690.305507] nvme nvme1: identifiers changed for nsid 1
> [174878.481002] nvme nvme1: rescanning namespaces.
> [174878.535981] nvme nvme1: duplicate IDs in subsystem for nsid 2
> [174878.599982] nvme nvme1: duplicate IDs in subsystem for nsid 2
> [174883.673001] nvme nvme1: rescanning namespaces.
> [174883.740045] nvme nvme1: duplicate IDs in subsystem for nsid 2
> [174883.784025] nvme nvme1: duplicate IDs in subsystem for nsid 3
> [174883.852034] nvme nvme1: duplicate IDs in subsystem for nsid 2
> [174883.892040] nvme nvme1: duplicate IDs in subsystem for nsid 3
> 
> I will report this to our firmware team. Meanwhile, Could you paste your
> output id-ctrl output here? I am interested in the fw version you are using.

NVME Identify Controller:
vid       : 0x144d
ssvid     : 0x108e
sn        : S64VNE0R602271
mn        : SAMSUNG MZWLR3T8HBLS-00AU3
fr        : MPK94R5Q
rab       : 8
ieee      : 002538
cmic      : 0x3
mdts      : 8
cntlid    : 0x41
ver       : 0x10300
rtd3r     : 0xe4e1c0
rtd3e     : 0x989680
oaes      : 0x300
ctratt    : 0
rrls      : 0
cntrltype : 0
fguid     :
crdt1     : 0
crdt2     : 0
crdt3     : 0
nvmsr     : 1
vwci      : 255
mec       : 3
oacs      : 0x5f
acl       : 127
aerl      : 15
frmw      : 0x17
lpa       : 0xe
elpe      : 255
npss      : 0
avscc     : 0x1
apsta     : 0
wctemp    : 345
cctemp    : 358
mtfa      : 130
hmpre     : 0
hmmin     : 0
tnvmcap   : 3840755982336
unvmcap   : 3989938176
rpmbs     : 0
edstt     : 2
dsto      : 1
fwug      : 255
kas       : 0
hctma     : 0
mntmt     : 0
mxtmt     : 0
sanicap   : 0x3
hmminds   : 0
hmmaxd    : 0
nsetidmax : 0
endgidmax : 0
anatt     : 0
anacap    : 0
anagrpmax : 0
nanagrpid : 0
pels      : 0
domainid  : 0
megcap    : 0
sqes      : 0x66
cqes      : 0x44
maxcmd    : 0
nn        : 32
oncs      : 0x7f
fuses     : 0
fna       : 0x4
vwc       : 0
awun      : 65535
awupf     : 0
icsvscc     : 1
nwpc      : 0
acwu      : 0
ocfs      : 0
sgls      : 0xf0002
mnan      : 0
maxdna    : 0
maxcna    : 0
subnqn    : nqn.1994-11.com.samsung:nvme:PM1733:2.5-inch:S64VNE0R602271
ioccsz    : 0
iorcsz    : 0
icdoff    : 0
fcatt     : 0
msdbd     : 0
ofcs      : 0
ps    0 : mp:25.00W operational enlat:180 exlat:180 rrt:0 rrl:0
          rwt:0 rwl:0 idle_power:7.00W active_power:20.00W

Thanks,
Saeed

> 
> --
> Pankaj
  
Pankaj Raghav March 20, 2023, 1:21 p.m. UTC | #7
On 2023-03-17 17:26, Saeed Mirzamohammadi wrote:
>> Hi Saeed,
>> On Thu, Mar 16, 2023 at 09:31:38PM +0000, Saeed Mirzamohammadi wrote:
>>> eui64 values are not unique. Here is an example:
>>> namespace1
>>> nguid   : 36554630529000710025384500000001
>>> eui64   : 002538191100104a
>>> namespace2
>>> nguid   : 36554630529000710025384500000002
>>> eui64   : 002538191100104a
>>> namespace3
>>> nguid   : 36554630529000710025384500000003
>>> eui64   : 002538191100104a
>>> namespace4
>>> nguid   : 36554630529000710025384500000004
>>> eui64   : 002538191100104a
>>>
>>> I haven’t yet contacted Samsung. Do you recommend any one to reach out to?
>>
>> I am able to reproduce this error with a PM173X.
>>
>> nvme id-ns:
>> root@missingno:~# nvme id-ns -n 1 /dev/nvme1 | grep eui64
>> eui64   : 00253891010005b3
>> root@missingno:~# nvme id-ns -n 2 /dev/nvme1 | grep eui64
>> eui64   : 00253891010005b3
>> root@missingno:~# nvme id-ns -n 3 /dev/nvme1 | grep eui64
>> eui64   : 00253891010005b3
>>
>> dmesg:
>> [174690.305507] nvme nvme1: identifiers changed for nsid 1
>> [174878.481002] nvme nvme1: rescanning namespaces.
>> [174878.535981] nvme nvme1: duplicate IDs in subsystem for nsid 2
>> [174878.599982] nvme nvme1: duplicate IDs in subsystem for nsid 2
>> [174883.673001] nvme nvme1: rescanning namespaces.
>> [174883.740045] nvme nvme1: duplicate IDs in subsystem for nsid 2
>> [174883.784025] nvme nvme1: duplicate IDs in subsystem for nsid 3
>> [174883.852034] nvme nvme1: duplicate IDs in subsystem for nsid 2
>> [174883.892040] nvme nvme1: duplicate IDs in subsystem for nsid 3
>>
>> I will report this to our firmware team. Meanwhile, Could you paste your
>> output id-ctrl output here? I am interested in the fw version you are using.
> 
> NVME Identify Controller:
> vid       : 0x144d
> ssvid     : 0x108e
> sn        : S64VNE0R602271
> mn        : SAMSUNG MZWLR3T8HBLS-00AU3
> fr        : MPK94R5Q

We have reported this error to our firmware team. I will keep you posted.

--
Pankaj
  
Christoph Hellwig March 21, 2023, 1:26 p.m. UTC | #8
Can you send a patch with a new quirk that just disables the EUI64,
but keeps the NGUID?
  
Christoph Hellwig April 14, 2023, 5:12 a.m. UTC | #9
On Tue, Mar 21, 2023 at 02:26:04PM +0100, Christoph Hellwig wrote:
> Can you send a patch with a new quirk that just disables the EUI64,
> but keeps the NGUID?

Did this go anywhere?
  
Pankaj Raghav April 27, 2023, 7:37 a.m. UTC | #10
Hi Christoph,
On Fri, Apr 14, 2023 at 07:12:59AM +0200, Christoph Hellwig wrote:
> On Tue, Mar 21, 2023 at 02:26:04PM +0100, Christoph Hellwig wrote:
> > Can you send a patch with a new quirk that just disables the EUI64,
> > but keeps the NGUID?
> 
> Did this go anywhere?
We had a discussion about this internally with our firmware team, and it
looks like these firmware were given to specific customers based on
mutual agreement. They are already in discussion with our firmware team
regarding this.

I don't think this should go into as a generic quirk in Linux for these
models.
  

Patch

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 5b95c94ee40f2..c0b1caba1c893 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -3359,6 +3359,10 @@  static const struct pci_device_id nvme_id_table[] = {
 		.driver_data = NVME_QUIRK_DELAY_BEFORE_CHK_RDY |
 				NVME_QUIRK_DISABLE_WRITE_ZEROES|
 				NVME_QUIRK_IGNORE_DEV_SUBNQN, },
+	{ PCI_DEVICE(0x144d, 0xa824),   /* Samsung PM173X */
+		.driver_data = NVME_QUIRK_BOGUS_NID, },
+	{ PCI_DEVICE(0x144d, 0xa825),   /* Samsung PM1733a */
+		.driver_data = NVME_QUIRK_BOGUS_NID, },
 	{ PCI_DEVICE(0x1987, 0x5012),	/* Phison E12 */
 		.driver_data = NVME_QUIRK_BOGUS_NID, },
 	{ PCI_DEVICE(0x1987, 0x5016),	/* Phison E16 */