[RFC,v2,1/3] ACPI: APEI: EINJ: Refactor available_error_type_show()

Message ID 20230525204422.4754-2-Avadhut.Naik@amd.com
State New
Series Add support for Vendor Defined Error Types in Einj Module

Commit Message

Avadhut Naik May 25, 2023, 8:44 p.m. UTC
  OSPM can discover the error injection capabilities of the platform by
executing GET_ERROR_TYPE error injection action.[1] The action returns
a DWORD representing a bitmap of platform supported error injections.[2]

The available_error_type_show() function determines the bits set within
this DWORD and provides a verbose output, from einj_error_type_string
array, through /sys/kernel/debug/apei/einj/available_error_type file.

The function, however, assumes a one-to-one correspondence between an error's
position in the bitmap and its array entry offset. Consequently, some
errors like Vendor Defined Error Type fail this assumption and will
incorrectly be shown as not supported, even if their corresponding bit is
set in the bitmap and they have an entry in the array.

Work around the issue by converting einj_error_type_string into an
array of structures, each carrying a predetermined mask that corresponds
to the error type's bit position in the DWORD returned by the
GET_ERROR_TYPE action. This removes the aforementioned assumption, so all
error types supported by a platform are output through the above
available_error_type file.

[1] ACPI specification 6.5, Table 18.25
[2] ACPI specification 6.5, Table 18.30

Suggested-by: Alexey Kardashevskiy <alexey.kardashevskiy@amd.com>
Signed-off-by: Avadhut Naik <Avadhut.Naik@amd.com>
---
 drivers/acpi/apei/einj.c | 43 ++++++++++++++++++++--------------------
 1 file changed, 22 insertions(+), 21 deletions(-)
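
To illustrate the failure mode described in the commit message, here is a minimal user-space sketch (not the kernel code). It assumes the Vendor Defined Error Type occupies bit 31 of the GET_ERROR_TYPE bitmap, as listed in the ACPI error type definition; the shortened tables and the helper names count_by_index() and count_by_mask() are hypothetical and exist only for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Old layout: the array index doubles as the bit position. */
static const char *const by_index[] = {
	"Processor Correctable",             /* index 0, bit 0 */
	"Processor Uncorrectable non-fatal", /* index 1, bit 1 */
	"Vendor Defined Error Types",        /* index 2, but assumed bit 31 */
};

/* New layout: every entry carries its own mask. */
static const struct { uint32_t mask; const char *str; } by_mask[] = {
	{ UINT32_C(1) << 0,  "Processor Correctable" },
	{ UINT32_C(1) << 1,  "Processor Uncorrectable non-fatal" },
	{ UINT32_C(1) << 31, "Vendor Defined Error Types" },
};

/* Count entries reported as supported when testing bit number == index. */
static size_t count_by_index(uint32_t bitmap)
{
	size_t n = 0;

	for (size_t pos = 0; pos < ARRAY_SIZE(by_index); pos++)
		if (bitmap & (UINT32_C(1) << pos))
			n++;
	return n;
}

/* Count entries reported as supported when testing each entry's own mask. */
static size_t count_by_mask(uint32_t bitmap)
{
	size_t n = 0;

	for (size_t pos = 0; pos < ARRAY_SIZE(by_mask); pos++)
		if (bitmap & by_mask[pos].mask)
			n++;
	return n;
}
```

With a bitmap of 0x80000001 (bit 0 and bit 31 set), count_by_index() finds only one supported type because it tests bit 2 for the third entry, while count_by_mask() finds both.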
  

Comments

Yazen Ghannam June 7, 2023, 2:20 p.m. UTC | #1
On 5/25/23 4:44 PM, Avadhut Naik wrote:
> OSPM can discover the error injection capabilities of the platform by
> executing GET_ERROR_TYPE error injection action.[1] The action returns
> a DWORD representing a bitmap of platform supported error injections.[2]
> 
> The available_error_type_show() function determines the bits set within
> this DWORD and provides a verbose output, from einj_error_type_string
> array, through /sys/kernel/debug/apei/einj/available_error_type file.
> 
> The function, however, assumes a one-to-one correspondence between an error's
> position in the bitmap and its array entry offset. Consequently, some
> errors like Vendor Defined Error Type fail this assumption and will
> incorrectly be shown as not supported, even if their corresponding bit is
> set in the bitmap and they have an entry in the array.
> 
> Work around the issue by converting einj_error_type_string into an
> array of structures, each carrying a predetermined mask that corresponds
> to the error type's bit position in the DWORD returned by the
> GET_ERROR_TYPE action. This removes the aforementioned assumption, so all
> error types supported by a platform are output through the above
> available_error_type file.
> 
> [1] ACPI specification 6.5, Table 18.25
> [2] ACPI specification 6.5, Table 18.30
> 
> Suggested-by: Alexey Kardashevskiy <alexey.kardashevskiy@amd.com>
> Signed-off-by: Avadhut Naik <Avadhut.Naik@amd.com>
> ---
>  drivers/acpi/apei/einj.c | 43 ++++++++++++++++++++--------------------
>  1 file changed, 22 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
> index 013eb621dc92..d5f8dc4df7a5 100644
> --- a/drivers/acpi/apei/einj.c
> +++ b/drivers/acpi/apei/einj.c
> @@ -577,25 +577,25 @@ static u64 error_param2;
>  static u64 error_param3;
>  static u64 error_param4;
>  static struct dentry *einj_debug_dir;
> -static const char * const einj_error_type_string[] = {
> -	"0x00000001\tProcessor Correctable\n",
> -	"0x00000002\tProcessor Uncorrectable non-fatal\n",
> -	"0x00000004\tProcessor Uncorrectable fatal\n",
> -	"0x00000008\tMemory Correctable\n",
> -	"0x00000010\tMemory Uncorrectable non-fatal\n",
> -	"0x00000020\tMemory Uncorrectable fatal\n",
> -	"0x00000040\tPCI Express Correctable\n",
> -	"0x00000080\tPCI Express Uncorrectable non-fatal\n",
> -	"0x00000100\tPCI Express Uncorrectable fatal\n",
> -	"0x00000200\tPlatform Correctable\n",
> -	"0x00000400\tPlatform Uncorrectable non-fatal\n",
> -	"0x00000800\tPlatform Uncorrectable fatal\n",
> -	"0x00001000\tCXL.cache Protocol Correctable\n",
> -	"0x00002000\tCXL.cache Protocol Uncorrectable non-fatal\n",
> -	"0x00004000\tCXL.cache Protocol Uncorrectable fatal\n",
> -	"0x00008000\tCXL.mem Protocol Correctable\n",
> -	"0x00010000\tCXL.mem Protocol Uncorrectable non-fatal\n",
> -	"0x00020000\tCXL.mem Protocol Uncorrectable fatal\n",
> +static struct { u32 mask; const char *str; } const einj_error_type_string[] = {
> +	{0x00000001, "Processor Correctable"},
> +	{0x00000002, "Processor Uncorrectable non-fatal"},
> +	{0x00000004, "Processor Uncorrectable fatal"},
> +	{0x00000008, "Memory Correctable"},
> +	{0x00000010, "Memory Uncorrectable non-fatal"},
> +	{0x00000020, "Memory Uncorrectable fatal"},
> +	{0x00000040, "PCI Express Correctable"},
> +	{0x00000080, "PCI Express Uncorrectable non-fatal"},
> +	{0x00000100, "PCI Express Uncorrectable fatal"},
> +	{0x00000200, "Platform Correctable"},
> +	{0x00000400, "Platform Uncorrectable non-fatal"},
> +	{0x00000800, "Platform Uncorrectable fatal"},
> +	{0x00001000, "CXL.cache Protocol Correctable"},
> +	{0x00002000, "CXL.cache Protocol Uncorrectable non-fatal"},
> +	{0x00004000, "CXL.cache Protocol Uncorrectable fatal"},
> +	{0x00008000, "CXL.mem Protocol Correctable"},
> +	{0x00010000, "CXL.mem Protocol Uncorrectable non-fatal"},
> +	{0x00020000, "CXL.mem Protocol Uncorrectable fatal"},
>  };
>

I think it'd be easier to read if the masks used the BIT() macro rather
than a hex value.

Thanks,
Yazen
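
As a rough sketch of what this suggestion could look like, the table below expresses each mask by bit number. It is written for user space, so BIT() is defined locally as a stand-in for the kernel macro and uint32_t replaces u32; the version actually posted in a later revision may differ.

```c
#include <stdint.h>

/* Local stand-in for the kernel's BIT() macro (assumption for user space). */
#define BIT(nr) (UINT32_C(1) << (nr))

/* Same entries as in the patch, with masks written as bit numbers. */
static const struct { uint32_t mask; const char *str; } einj_error_type_string[] = {
	{ BIT(0),  "Processor Correctable" },
	{ BIT(1),  "Processor Uncorrectable non-fatal" },
	{ BIT(2),  "Processor Uncorrectable fatal" },
	{ BIT(3),  "Memory Correctable" },
	{ BIT(4),  "Memory Uncorrectable non-fatal" },
	{ BIT(5),  "Memory Uncorrectable fatal" },
	{ BIT(6),  "PCI Express Correctable" },
	{ BIT(7),  "PCI Express Uncorrectable non-fatal" },
	{ BIT(8),  "PCI Express Uncorrectable fatal" },
	{ BIT(9),  "Platform Correctable" },
	{ BIT(10), "Platform Uncorrectable non-fatal" },
	{ BIT(11), "Platform Uncorrectable fatal" },
	{ BIT(12), "CXL.cache Protocol Correctable" },
	{ BIT(13), "CXL.cache Protocol Uncorrectable non-fatal" },
	{ BIT(14), "CXL.cache Protocol Uncorrectable fatal" },
	{ BIT(15), "CXL.mem Protocol Correctable" },
	{ BIT(16), "CXL.mem Protocol Uncorrectable non-fatal" },
	{ BIT(17), "CXL.mem Protocol Uncorrectable fatal" },
};
```

The values are identical to the hex masks in the patch (for example, BIT(12) is 0x00001000); only the notation changes.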
  
Alexey Kardashevskiy June 8, 2023, 3:48 a.m. UTC | #2
On 8/6/23 00:20, Yazen Ghannam wrote:
> On 5/25/23 4:44 PM, Avadhut Naik wrote:
>> OSPM can discover the error injection capabilities of the platform by
>> executing GET_ERROR_TYPE error injection action.[1] The action returns
>> a DWORD representing a bitmap of platform supported error injections.[2]
>>
>> The available_error_type_show() function determines the bits set within
>> this DWORD and provides a verbose output, from einj_error_type_string
>> array, through /sys/kernel/debug/apei/einj/available_error_type file.
>>
>> The function, however, assumes a one-to-one correspondence between an error's
>> position in the bitmap and its array entry offset. Consequently, some
>> errors like Vendor Defined Error Type fail this assumption and will
>> incorrectly be shown as not supported, even if their corresponding bit is
>> set in the bitmap and they have an entry in the array.
>>
>> Work around the issue by converting einj_error_type_string into an
>> array of structures, each carrying a predetermined mask that corresponds
>> to the error type's bit position in the DWORD returned by the
>> GET_ERROR_TYPE action. This removes the aforementioned assumption, so all
>> error types supported by a platform are output through the above
>> available_error_type file.
>>
>> [1] ACPI specification 6.5, Table 18.25
>> [2] ACPI specification 6.5, Table 18.30
>>
>> Suggested-by: Alexey Kardashevskiy <alexey.kardashevskiy@amd.com>
>> Signed-off-by: Avadhut Naik <Avadhut.Naik@amd.com>
>> ---
>>   drivers/acpi/apei/einj.c | 43 ++++++++++++++++++++--------------------
>>   1 file changed, 22 insertions(+), 21 deletions(-)
>>
>> diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
>> index 013eb621dc92..d5f8dc4df7a5 100644
>> --- a/drivers/acpi/apei/einj.c
>> +++ b/drivers/acpi/apei/einj.c
>> @@ -577,25 +577,25 @@ static u64 error_param2;
>>   static u64 error_param3;
>>   static u64 error_param4;
>>   static struct dentry *einj_debug_dir;
>> -static const char * const einj_error_type_string[] = {
>> -	"0x00000001\tProcessor Correctable\n",
>> -	"0x00000002\tProcessor Uncorrectable non-fatal\n",
>> -	"0x00000004\tProcessor Uncorrectable fatal\n",
>> -	"0x00000008\tMemory Correctable\n",
>> -	"0x00000010\tMemory Uncorrectable non-fatal\n",
>> -	"0x00000020\tMemory Uncorrectable fatal\n",
>> -	"0x00000040\tPCI Express Correctable\n",
>> -	"0x00000080\tPCI Express Uncorrectable non-fatal\n",
>> -	"0x00000100\tPCI Express Uncorrectable fatal\n",
>> -	"0x00000200\tPlatform Correctable\n",
>> -	"0x00000400\tPlatform Uncorrectable non-fatal\n",
>> -	"0x00000800\tPlatform Uncorrectable fatal\n",
>> -	"0x00001000\tCXL.cache Protocol Correctable\n",
>> -	"0x00002000\tCXL.cache Protocol Uncorrectable non-fatal\n",
>> -	"0x00004000\tCXL.cache Protocol Uncorrectable fatal\n",
>> -	"0x00008000\tCXL.mem Protocol Correctable\n",
>> -	"0x00010000\tCXL.mem Protocol Uncorrectable non-fatal\n",
>> -	"0x00020000\tCXL.mem Protocol Uncorrectable fatal\n",
>> +static struct { u32 mask; const char *str; } const einj_error_type_string[] = {
>> +	{0x00000001, "Processor Correctable"},
>> +	{0x00000002, "Processor Uncorrectable non-fatal"},
>> +	{0x00000004, "Processor Uncorrectable fatal"},
>> +	{0x00000008, "Memory Correctable"},
>> +	{0x00000010, "Memory Uncorrectable non-fatal"},
>> +	{0x00000020, "Memory Uncorrectable fatal"},
>> +	{0x00000040, "PCI Express Correctable"},
>> +	{0x00000080, "PCI Express Uncorrectable non-fatal"},
>> +	{0x00000100, "PCI Express Uncorrectable fatal"},
>> +	{0x00000200, "Platform Correctable"},
>> +	{0x00000400, "Platform Uncorrectable non-fatal"},
>> +	{0x00000800, "Platform Uncorrectable fatal"},
>> +	{0x00001000, "CXL.cache Protocol Correctable"},
>> +	{0x00002000, "CXL.cache Protocol Uncorrectable non-fatal"},
>> +	{0x00004000, "CXL.cache Protocol Uncorrectable fatal"},
>> +	{0x00008000, "CXL.mem Protocol Correctable"},
>> +	{0x00010000, "CXL.mem Protocol Uncorrectable non-fatal"},
>> +	{0x00020000, "CXL.mem Protocol Uncorrectable fatal"},
>>   };
>>
> 
> I think it'd be easier to read if the masks used the BIT() macro rather
> than a hex value.

Makes sense, but I'd say that is because it is easier to match the ACPI
spec, which uses the bit numbers, not because it is easier to read (which
is arguable).


> 
> Thanks,
> Yazen
  
Naik, Avadhut June 8, 2023, 9:08 p.m. UTC | #3
Hi,
	Thanks for reviewing.

On 6/7/2023 22:48, Alexey Kardashevskiy wrote:
> 
> 
> On 8/6/23 00:20, Yazen Ghannam wrote:
>> On 5/25/23 4:44 PM, Avadhut Naik wrote:
>>> OSPM can discover the error injection capabilities of the platform by
>>> executing GET_ERROR_TYPE error injection action.[1] The action returns
>>> a DWORD representing a bitmap of platform supported error injections.[2]
>>>
>>> The available_error_type_show() function determines the bits set within
>>> this DWORD and provides a verbose output, from einj_error_type_string
>>> array, through /sys/kernel/debug/apei/einj/available_error_type file.
>>>
>>> The function, however, assumes a one-to-one correspondence between an error's
>>> position in the bitmap and its array entry offset. Consequently, some
>>> errors like Vendor Defined Error Type fail this assumption and will
>>> incorrectly be shown as not supported, even if their corresponding bit is
>>> set in the bitmap and they have an entry in the array.
>>>
>>> Work around the issue by converting einj_error_type_string into an
>>> array of structures, each carrying a predetermined mask that corresponds
>>> to the error type's bit position in the DWORD returned by the
>>> GET_ERROR_TYPE action. This removes the aforementioned assumption, so all
>>> error types supported by a platform are output through the above
>>> available_error_type file.
>>>
>>> [1] ACPI specification 6.5, Table 18.25
>>> [2] ACPI specification 6.5, Table 18.30
>>>
>>> Suggested-by: Alexey Kardashevskiy <alexey.kardashevskiy@amd.com>
>>> Signed-off-by: Avadhut Naik <Avadhut.Naik@amd.com>
>>> ---
>>>   drivers/acpi/apei/einj.c | 43 ++++++++++++++++++++--------------------
>>>   1 file changed, 22 insertions(+), 21 deletions(-)
>>>
>>> diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
>>> index 013eb621dc92..d5f8dc4df7a5 100644
>>> --- a/drivers/acpi/apei/einj.c
>>> +++ b/drivers/acpi/apei/einj.c
>>> @@ -577,25 +577,25 @@ static u64 error_param2;
>>>   static u64 error_param3;
>>>   static u64 error_param4;
>>>   static struct dentry *einj_debug_dir;
>>> -static const char * const einj_error_type_string[] = {
>>> -    "0x00000001\tProcessor Correctable\n",
>>> -    "0x00000002\tProcessor Uncorrectable non-fatal\n",
>>> -    "0x00000004\tProcessor Uncorrectable fatal\n",
>>> -    "0x00000008\tMemory Correctable\n",
>>> -    "0x00000010\tMemory Uncorrectable non-fatal\n",
>>> -    "0x00000020\tMemory Uncorrectable fatal\n",
>>> -    "0x00000040\tPCI Express Correctable\n",
>>> -    "0x00000080\tPCI Express Uncorrectable non-fatal\n",
>>> -    "0x00000100\tPCI Express Uncorrectable fatal\n",
>>> -    "0x00000200\tPlatform Correctable\n",
>>> -    "0x00000400\tPlatform Uncorrectable non-fatal\n",
>>> -    "0x00000800\tPlatform Uncorrectable fatal\n",
>>> -    "0x00001000\tCXL.cache Protocol Correctable\n",
>>> -    "0x00002000\tCXL.cache Protocol Uncorrectable non-fatal\n",
>>> -    "0x00004000\tCXL.cache Protocol Uncorrectable fatal\n",
>>> -    "0x00008000\tCXL.mem Protocol Correctable\n",
>>> -    "0x00010000\tCXL.mem Protocol Uncorrectable non-fatal\n",
>>> -    "0x00020000\tCXL.mem Protocol Uncorrectable fatal\n",
>>> +static struct { u32 mask; const char *str; } const einj_error_type_string[] = {
>>> +    {0x00000001, "Processor Correctable"},
>>> +    {0x00000002, "Processor Uncorrectable non-fatal"},
>>> +    {0x00000004, "Processor Uncorrectable fatal"},
>>> +    {0x00000008, "Memory Correctable"},
>>> +    {0x00000010, "Memory Uncorrectable non-fatal"},
>>> +    {0x00000020, "Memory Uncorrectable fatal"},
>>> +    {0x00000040, "PCI Express Correctable"},
>>> +    {0x00000080, "PCI Express Uncorrectable non-fatal"},
>>> +    {0x00000100, "PCI Express Uncorrectable fatal"},
>>> +    {0x00000200, "Platform Correctable"},
>>> +    {0x00000400, "Platform Uncorrectable non-fatal"},
>>> +    {0x00000800, "Platform Uncorrectable fatal"},
>>> +    {0x00001000, "CXL.cache Protocol Correctable"},
>>> +    {0x00002000, "CXL.cache Protocol Uncorrectable non-fatal"},
>>> +    {0x00004000, "CXL.cache Protocol Uncorrectable fatal"},
>>> +    {0x00008000, "CXL.mem Protocol Correctable"},
>>> +    {0x00010000, "CXL.mem Protocol Uncorrectable non-fatal"},
>>> +    {0x00020000, "CXL.mem Protocol Uncorrectable fatal"},
>>>   };
>>>
>>
>> I think it'd be easier to read if the masks used the BIT() macro rather
>> than a hex value.
> 
> Makes sense, but I'd say that is because it is easier to match the ACPI spec, which uses the bit numbers, not because it is easier to read (which is arguable).
> 
	Agreed, will replace the hex values with the BIT() macro.

Thanks,
Avadhut Naik
> 
>>
>> Thanks,
>> Yazen
> 

  

Patch

diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
index 013eb621dc92..d5f8dc4df7a5 100644
--- a/drivers/acpi/apei/einj.c
+++ b/drivers/acpi/apei/einj.c
@@ -577,25 +577,25 @@  static u64 error_param2;
 static u64 error_param3;
 static u64 error_param4;
 static struct dentry *einj_debug_dir;
-static const char * const einj_error_type_string[] = {
-	"0x00000001\tProcessor Correctable\n",
-	"0x00000002\tProcessor Uncorrectable non-fatal\n",
-	"0x00000004\tProcessor Uncorrectable fatal\n",
-	"0x00000008\tMemory Correctable\n",
-	"0x00000010\tMemory Uncorrectable non-fatal\n",
-	"0x00000020\tMemory Uncorrectable fatal\n",
-	"0x00000040\tPCI Express Correctable\n",
-	"0x00000080\tPCI Express Uncorrectable non-fatal\n",
-	"0x00000100\tPCI Express Uncorrectable fatal\n",
-	"0x00000200\tPlatform Correctable\n",
-	"0x00000400\tPlatform Uncorrectable non-fatal\n",
-	"0x00000800\tPlatform Uncorrectable fatal\n",
-	"0x00001000\tCXL.cache Protocol Correctable\n",
-	"0x00002000\tCXL.cache Protocol Uncorrectable non-fatal\n",
-	"0x00004000\tCXL.cache Protocol Uncorrectable fatal\n",
-	"0x00008000\tCXL.mem Protocol Correctable\n",
-	"0x00010000\tCXL.mem Protocol Uncorrectable non-fatal\n",
-	"0x00020000\tCXL.mem Protocol Uncorrectable fatal\n",
+static struct { u32 mask; const char *str; } const einj_error_type_string[] = {
+	{0x00000001, "Processor Correctable"},
+	{0x00000002, "Processor Uncorrectable non-fatal"},
+	{0x00000004, "Processor Uncorrectable fatal"},
+	{0x00000008, "Memory Correctable"},
+	{0x00000010, "Memory Uncorrectable non-fatal"},
+	{0x00000020, "Memory Uncorrectable fatal"},
+	{0x00000040, "PCI Express Correctable"},
+	{0x00000080, "PCI Express Uncorrectable non-fatal"},
+	{0x00000100, "PCI Express Uncorrectable fatal"},
+	{0x00000200, "Platform Correctable"},
+	{0x00000400, "Platform Uncorrectable non-fatal"},
+	{0x00000800, "Platform Uncorrectable fatal"},
+	{0x00001000, "CXL.cache Protocol Correctable"},
+	{0x00002000, "CXL.cache Protocol Uncorrectable non-fatal"},
+	{0x00004000, "CXL.cache Protocol Uncorrectable fatal"},
+	{0x00008000, "CXL.mem Protocol Correctable"},
+	{0x00010000, "CXL.mem Protocol Uncorrectable non-fatal"},
+	{0x00020000, "CXL.mem Protocol Uncorrectable fatal"},
 };
 
 static int available_error_type_show(struct seq_file *m, void *v)
@@ -607,8 +607,9 @@  static int available_error_type_show(struct seq_file *m, void *v)
 	if (rc)
 		return rc;
 	for (int pos = 0; pos < ARRAY_SIZE(einj_error_type_string); pos++)
-		if (available_error_type & BIT(pos))
-			seq_puts(m, einj_error_type_string[pos]);
+		if (available_error_type & einj_error_type_string[pos].mask)
+			seq_printf(m, "0x%08x\t%s\n", einj_error_type_string[pos].mask,
+				   einj_error_type_string[pos].str);
 
 	return 0;
 }
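
One consequence of the hunk above is that the text shown in available_error_type should stay the same: the prefix that used to be baked into each string literal is now produced at print time. The following user-space sketch mirrors the format string from the patch with snprintf(); the helper name format_error_type() is hypothetical and is not part of the kernel code.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Build one output line using the same format string as the patched
 * seq_printf() call: zero-padded 8-digit hex mask, a tab, the
 * description, and a newline.
 */
static int format_error_type(char *buf, size_t len, uint32_t mask,
			     const char *str)
{
	return snprintf(buf, len, "0x%08x\t%s\n", (unsigned int)mask, str);
}
```

For mask 0x00000008 and "Memory Correctable", the buffer holds "0x00000008\tMemory Correctable\n", which matches the string literal removed by the patch.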