x86/amd_nb: Check for invalid SMN reads

Message ID 20230403164244.471141-1-yazen.ghannam@amd.com
State New
Headers
Series x86/amd_nb: Check for invalid SMN reads |

Commit Message

Yazen Ghannam April 3, 2023, 4:42 p.m. UTC
  AMD Zen-based systems use a System Management Network (SMN) that
provides access to implementation-specific registers.

SMN accesses are done indirectly through an index/data pair in PCI
config space. The PCI config access may fail and return an error code.
This would prevent the "read" value from being updated.

However, the PCI config access may succeed, but the return value may be
invalid. This is in similar fashion to PCI bad reads, i.e. return all
bits set.

Most systems will return 0 for SMN addresses that are not accessible.
This is in line with AMD convention that unavailable registers are
Read-as-Zero/Writes-Ignored.

However, some systems will return a "PCI Error Response" instead. This
value, along with an error code of 0 from the PCI config access, will
confuse callers of the amd_smn_read() function.

Check for this condition, clear the return value, and set a proper error
code.

Fixes: ddfe43cdc0da ("x86/amd_nb: Add SMN and Indirect Data Fabric access for AMD Fam17h")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: stable@vger.kernel.org
---
 arch/x86/kernel/amd_nb.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)
  

Comments

Borislav Petkov April 3, 2023, 7:32 p.m. UTC | #1
On Mon, Apr 03, 2023 at 04:42:44PM +0000, Yazen Ghannam wrote:
>  int amd_smn_read(u16 node, u32 address, u32 *value)
>  {
> -	return __amd_smn_rw(node, address, value, false);
> +	int err = __amd_smn_rw(node, address, value, false);
> +
> +	if (PCI_POSSIBLE_ERROR(*value)) {
> +		err = -ENODEV;
> +		*value = 0;
> +	}

Why not put this check in __amd_smn_rw()?
  
Yazen Ghannam April 3, 2023, 7:40 p.m. UTC | #2
On 4/3/23 15:32, Borislav Petkov wrote:
> On Mon, Apr 03, 2023 at 04:42:44PM +0000, Yazen Ghannam wrote:
>>  int amd_smn_read(u16 node, u32 address, u32 *value)
>>  {
>> -	return __amd_smn_rw(node, address, value, false);
>> +	int err = __amd_smn_rw(node, address, value, false);
>> +
>> +	if (PCI_POSSIBLE_ERROR(*value)) {
>> +		err = -ENODEV;
>> +		*value = 0;
>> +	}
> 
> Why not put this check in __amd_smn_rw()?
> 

I don't think pci_write_config*() sets the PCI Error response like
pci_read_config(), AFAICT.

I think to solve the writes-ignored problem, we'd need to do another read and
compare it to what we intended to write. That could go into amd_smn_write(),
if needed. Unless the PCI kernel API has something like this.

Thanks,
Yazen
  
Borislav Petkov April 3, 2023, 8:36 p.m. UTC | #3
On Mon, Apr 03, 2023 at 03:40:38PM -0400, Yazen Ghannam wrote:
> I don't think pci_write_config*() sets the PCI Error response like
> pci_read_config(), AFAICT.

diff --git a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c
index 4266b64631a4..c4caade434a7 100644
--- a/arch/x86/kernel/amd_nb.c
+++ b/arch/x86/kernel/amd_nb.c
@@ -181,6 +181,13 @@ static int __amd_smn_rw(u16 node, u32 address, u32 *value, bool write)
 		pr_warn("Error %s SMN address 0x%x.\n",
 			(write ? "writing to" : "reading from"), address);
 
+	if (!write) {
+		if (PCI_POSSIBLE_ERROR(*value)) {
+			err = -ENODEV;
+			*value = 0;
+		}
+	}
+
 out_unlock:
 	mutex_unlock(&smn_mutex);
  
Yazen Ghannam April 3, 2023, 9:36 p.m. UTC | #4
On 4/3/23 16:36, Borislav Petkov wrote:
> On Mon, Apr 03, 2023 at 03:40:38PM -0400, Yazen Ghannam wrote:
>> I don't think pci_write_config*() sets the PCI Error response like
>> pci_read_config(), AFAICT.
> 
> diff --git a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c
> index 4266b64631a4..c4caade434a7 100644
> --- a/arch/x86/kernel/amd_nb.c
> +++ b/arch/x86/kernel/amd_nb.c
> @@ -181,6 +181,13 @@ static int __amd_smn_rw(u16 node, u32 address, u32 *value, bool write)
>  		pr_warn("Error %s SMN address 0x%x.\n",
>  			(write ? "writing to" : "reading from"), address);
>  
> +	if (!write) {
> +		if (PCI_POSSIBLE_ERROR(*value)) {
> +			err = -ENODEV;
> +			*value = 0;
> +		}
> +	}
> +
>  out_unlock:
>  	mutex_unlock(&smn_mutex);
>  
> 

Yes, that's fine. Should I send another revision?

Thanks,
Yazen
  
Borislav Petkov April 5, 2023, 6:06 p.m. UTC | #5
On Mon, Apr 03, 2023 at 05:36:40PM -0400, Yazen Ghannam wrote:
> Yes, that's fine. Should I send another revision?

On a second thought, I think we should do what you said in the write
function too. Because the write can fail too. So if it can, we need to
handle that potential error too.

Care to send a new version which does this check in the read and in the
write function? Basically what you had initially but with the write side
check added too to amd_smn_write.

Thx.
  
Yazen Ghannam April 5, 2023, 6:32 p.m. UTC | #6
On 4/5/23 14:06, Borislav Petkov wrote:
> On Mon, Apr 03, 2023 at 05:36:40PM -0400, Yazen Ghannam wrote:
>> Yes, that's fine. Should I send another revision?
> 
> On a second thought, I think we should do what you said in the write
> function too. Because the write can fail too. So if it can, we need to
> handle that potential error too.
> 
> Care to send a new version which does this check in the read and in the
> write function? Basically what you had initially but with the write side
> check added too to amd_smn_write.
> 

Sure thing. I don't have a real test for the write path. But I'll test by
faking it.

Thanks,
Yazen
  
Yazen Ghannam April 5, 2023, 7:10 p.m. UTC | #7
On 4/5/23 14:32, Yazen Ghannam wrote:
> On 4/5/23 14:06, Borislav Petkov wrote:
>> On Mon, Apr 03, 2023 at 05:36:40PM -0400, Yazen Ghannam wrote:
>>> Yes, that's fine. Should I send another revision?
>>
>> On a second thought, I think we should do what you said in the write
>> function too. Because the write can fail too. So if it can, we need to
>> handle that potential error too.
>>
>> Care to send a new version which does this check in the read and in the
>> write function? Basically what you had initially but with the write side
>> check added too to amd_smn_write.
>>
> 
> Sure thing. I don't have a real test for the write path. But I'll test by
> faking it.
> 

So I thought about it for a bit and quickly realized the "write and read back"
method isn't robust when done here.

Possible issues:
1) Bits that are "Write-1-to-clear". In this case, we *don't* want the read to
match the write.
2) Bits that are "Read-as-Zero"/"Writes-Ignored". We can't know this
information here.
3) Bits that are "Reserved / Set to 1". Ditto above.

I think all these issues should be handled by the callers of amd_smn_write().
They should do the "write and read back" check themselves, if needed.

For #1, they can see if their target bits got cleared.

For #2 and #3, they can check if their target bits got set as intended.

This matches what we do for rdmsr/wrmsr. As long as there's no #GP, then we're
good, and the caller does their own checking.

The "PCI Error Response" for the SMN read is the only check that would apply
to *any* SMN read. So I think that makes sense to do here instead of at each
call site.

What do you think?

Thanks,
Yazen
  
Yazen Ghannam May 8, 2023, 10:01 p.m. UTC | #8
On 4/5/23 3:10 PM, Yazen Ghannam wrote:
> On 4/5/23 14:32, Yazen Ghannam wrote:
>> On 4/5/23 14:06, Borislav Petkov wrote:
>>> On Mon, Apr 03, 2023 at 05:36:40PM -0400, Yazen Ghannam wrote:
>>>> Yes, that's fine. Should I send another revision?
>>>
>>> On a second thought, I think we should do what you said in the write
>>> function too. Because the write can fail too. So if it can, we need to
>>> handle that potential error too.
>>>
>>> Care to send a new version which does this check in the read and in the
>>> write function? Basically what you had initially but with the write side
>>> check added too to amd_smn_write.
>>>
>>
>> Sure thing. I don't have a real test for the write path. But I'll test by
>> faking it.
>>
> 
> So I thought about it for a bit and quickly realized the "write and read back"
> method isn't robust when done here.
> 
> Possible issues:
> 1) Bits that are "Write-1-to-clear". In this case, we *don't* want the read to
> match the write.
> 2) Bits that are "Read-as-Zero"/"Writes-Ignored". We can't know this
> information here.
> 3) Bits that are "Reserved / Set to 1". Ditto above.
> 
> I think all these issues should be handled by the callers of amd_smn_write().
> They should do the "write and read back" check themselves, if needed.
> 
> For #1, they can see if their target bits got cleared.
> 
> For #2 and #3, they can check if their target bits got set as intended.
> 
> This matches what we do for rdmsr/wrmsr. As long as there's no #GP, then we're
> good, and the caller does their own checking.
> 
> The "PCI Error Response" for the SMN read is the only check that would apply
> to *any* SMN read. So I think that makes sense to do here instead of at each
> call site.
> 
> What do you think?
>

Hi Boris,

Just following up. What do you think about the points above? I can send
another revision for whichever way we need to go.

Thanks,
Yazen
  
Borislav Petkov May 10, 2023, 11:35 a.m. UTC | #9
On Wed, Apr 05, 2023 at 03:10:09PM -0400, Yazen Ghannam wrote:
> What do you think?

Yes, please put that as a comment over __amd_smn_rw() as to why callers
should check the retval *and* make both the read and the write
__must_check and get rid of

        if (err)
                pr_warn("Error %s SMN address 0x%x.\n",
                        (write ? "writing to" : "reading from"), address);

which won't be seen in all cases. The __must_check will force the
callers to do the proper checking which is the only sane thing to do
with such a variety of bit behaviors.

Thx.
  
Yazen Ghannam May 10, 2023, 1:56 p.m. UTC | #10
On 5/10/23 7:35 AM, Borislav Petkov wrote:
> On Wed, Apr 05, 2023 at 03:10:09PM -0400, Yazen Ghannam wrote:
>> What do you think?
> 
> Yes, please put that as a comment over __amd_smn_rw() as to why callers
> should check the retval *and* make both the read and the write
> __must_check and get rid of
> 
>         if (err)
>                 pr_warn("Error %s SMN address 0x%x.\n",
>                         (write ? "writing to" : "reading from"), address);
> 
> which won't be seen in all cases. The __must_check will force the
> callers to do the proper checking which is the only sane thing to do
> with such a variety of bit behaviors.
>

Understood, will do.

Thanks,
Yazen
  

Patch

diff --git a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c
index 4266b64631a4..33ebf104a020 100644
--- a/arch/x86/kernel/amd_nb.c
+++ b/arch/x86/kernel/amd_nb.c
@@ -190,7 +190,14 @@  static int __amd_smn_rw(u16 node, u32 address, u32 *value, bool write)
 
 int amd_smn_read(u16 node, u32 address, u32 *value)
 {
-	return __amd_smn_rw(node, address, value, false);
+	int err = __amd_smn_rw(node, address, value, false);
+
+	if (PCI_POSSIBLE_ERROR(*value)) {
+		err = -ENODEV;
+		*value = 0;
+	}
+
+	return err;
 }
 EXPORT_SYMBOL_GPL(amd_smn_read);