[RESEND,v5,4/4] ACPI: APEI: EINJ: Add support for vendor defined error types

Message ID 20231107213647.1405493-5-avadhut.naik@amd.com
State New
Headers
Series Add support for Vendor Defined Error Types in Einj Module |

Commit Message

Avadhut Naik Nov. 7, 2023, 9:36 p.m. UTC
  From: Avadhut Naik <Avadhut.Naik@amd.com>

Vendor-Defined Error types are supported by the platform apart from
standard error types if bit 31 is set in the output of GET_ERROR_TYPE
Error Injection Action.[1] While the errors themselves and the length
of their associated "OEM Defined data structure" might vary between
vendors, the physical address of this structure can be computed through
vendor_extension and length fields of "SET_ERROR_TYPE_WITH_ADDRESS" and
"Vendor Error Type Extension" Structures respectively.[2][3]

Currently, however, the einj module only computes the physical address of
Vendor Error Type Extension Structure. Neither does it compute the physical
address of OEM Defined structure nor does it establish the memory mapping
required for injecting Vendor-defined errors. Consequently, userspace
tools have to establish the very mapping through /dev/mem, nopat kernel
parameter and system calls like mmap/munmap initially before injecting
Vendor-defined errors.

Circumvent the issue by computing the physical address of OEM Defined data
structure and establishing the required mapping with the structure. Create
a new file "oem_error", if the system supports Vendor-defined errors, to
export this mapping, through debugfs_create_blob(). Userspace tools can
then populate their respective OEM Defined structure instances and just
write to the file as part of injecting Vendor-defined Errors. Similarly,
the tools can also read from the file if the system firmware provides some
information through the OEM defined structure after error injection.

[1] ACPI specification 6.5, section 18.6.4
[2] ACPI specification 6.5, Table 18.31
[3] ACPI specification 6.5, Table 18.32

Suggested-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Avadhut Naik <Avadhut.Naik@amd.com>
---
 drivers/acpi/apei/einj.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)
  

Comments

Luck, Tony Nov. 7, 2023, 10:41 p.m. UTC | #1
> Circumvent the issue by computing the physical address of OEM Defined data
> structure and establishing the required mapping with the structure. Create
> a new file "oem_error", if the system supports Vendor-defined errors, to
> export this mapping, through debugfs_create_blob(). Userspace tools can
> then populate their respective OEM Defined structure instances and just
> write to the file as part of injecting Vendor-defined Errors. Similarly,
> the tools can also read from the file if the system firmware provides some
> information through the OEM defined structure after error injection.

Tried it out on an Intel Icelake machine.  Seems like it is doing something
reasonable: I end up with this "oem_error" file:

# ls -l oem_error
-rw-------. 1 root root 0 Nov  7 14:26 oem_error

Which is 24 bytes full of zeroes:

# od -t x1 oem_error
0000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0000020 00 00 00 00 00 00 00 00
0000030

I'm not sure what OEM specific errors are implemented on this machine,
so that's as far as I'm able to test.

So apart from s/0444/0666/ in part 2, and a better commit description for
part 3 you can have a:

Reviewed-by: Tony Luck <tony.luck@intel.com>

-Tony
  
Borislav Petkov Nov. 15, 2023, 12:33 p.m. UTC | #2
On Tue, Nov 07, 2023 at 03:36:47PM -0600, Avadhut Naik wrote:
> From: Avadhut Naik <Avadhut.Naik@amd.com>
> 
> Vendor-Defined Error types are supported by the platform apart from
> standard error types if bit 31 is set in the output of GET_ERROR_TYPE
> Error Injection Action.[1] While the errors themselves and the length
> of their associated "OEM Defined data structure" might vary between
> vendors, the physical address of this structure can be computed through
> vendor_extension and length fields of "SET_ERROR_TYPE_WITH_ADDRESS" and
> "Vendor Error Type Extension" Structures respectively.[2][3]
> 
> Currently, however, the einj module only computes the physical address of
> Vendor Error Type Extension Structure. Neither does it compute the physical
> address of OEM Defined structure nor does it establish the memory mapping
> required for injecting Vendor-defined errors. Consequently, userspace
> tools have to establish the very mapping through /dev/mem, nopat kernel
> parameter and system calls like mmap/munmap initially before injecting
> Vendor-defined errors.
> 
> Circumvent the issue by computing the physical address of OEM Defined data
> structure and establishing the required mapping with the structure. Create
> a new file "oem_error", if the system supports Vendor-defined errors, to
> export this mapping, through debugfs_create_blob(). Userspace tools can
> then populate their respective OEM Defined structure instances and just
> write to the file as part of injecting Vendor-defined Errors. Similarly,
> the tools can also read from the file if the system firmware provides some
> information through the OEM defined structure after error injection.
> 
> [1] ACPI specification 6.5, section 18.6.4
> [2] ACPI specification 6.5, Table 18.31
> [3] ACPI specification 6.5, Table 18.32
> 
> Suggested-by: Yazen Ghannam <yazen.ghannam@amd.com>
> Signed-off-by: Avadhut Naik <Avadhut.Naik@amd.com>
> ---
>  drivers/acpi/apei/einj.c | 24 ++++++++++++++++++++++++
>  1 file changed, 24 insertions(+)

Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
  

Patch

diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
index ee360fcb1618..e2c5afdd42f2 100644
--- a/drivers/acpi/apei/einj.c
+++ b/drivers/acpi/apei/einj.c
@@ -73,6 +73,7 @@  static u32 notrigger;
 
 static u32 vendor_flags;
 static struct debugfs_blob_wrapper vendor_blob;
+static struct debugfs_blob_wrapper vendor_errors;
 static char vendor_dev[64];
 
 /*
@@ -182,6 +183,21 @@  static int einj_timedout(u64 *t)
 	return 0;
 }
 
+static void get_oem_vendor_struct(u64 paddr, int offset,
+				  struct vendor_error_type_extension *v)
+{
+	unsigned long vendor_size;
+	u64 target_pa = paddr + offset + sizeof(struct vendor_error_type_extension);
+
+	vendor_size = v->length - sizeof(struct vendor_error_type_extension);
+
+	if (vendor_size)
+		vendor_errors.data = acpi_os_map_memory(target_pa, vendor_size);
+
+	if (vendor_errors.data)
+		vendor_errors.size = vendor_size;
+}
+
 static void check_vendor_extension(u64 paddr,
 				   struct set_error_type_with_address *v5param)
 {
@@ -194,6 +210,7 @@  static void check_vendor_extension(u64 paddr,
 	v = acpi_os_map_iomem(paddr + offset, sizeof(*v));
 	if (!v)
 		return;
+	get_oem_vendor_struct(paddr, offset, v);
 	sbdf = v->pcie_sbdf;
 	sprintf(vendor_dev, "%x:%x:%x.%x vendor_id=%x device_id=%x rev_id=%x\n",
 		sbdf >> 24, (sbdf >> 16) & 0xff,
@@ -596,6 +613,7 @@  static struct { u32 mask; const char *str; } const einj_error_type_string[] = {
 	{BIT(15), "CXL.mem Protocol Correctable"},
 	{BIT(16), "CXL.mem Protocol Uncorrectable non-fatal"},
 	{BIT(17), "CXL.mem Protocol Uncorrectable fatal"},
+	{BIT(31), "Vendor Defined Error Types"},
 };
 
 static int available_error_type_show(struct seq_file *m, void *v)
@@ -768,6 +786,10 @@  static int __init einj_init(void)
 				   einj_debug_dir, &vendor_flags);
 	}
 
+	if (vendor_errors.size)
+		debugfs_create_blob("oem_error", 0600, einj_debug_dir,
+				    &vendor_errors);
+
 	pr_info("Error INJection is initialized.\n");
 
 	return 0;
@@ -793,6 +815,8 @@  static void __exit einj_exit(void)
 			sizeof(struct einj_parameter);
 
 		acpi_os_unmap_iomem(einj_param, size);
+		if (vendor_errors.size)
+			acpi_os_unmap_memory(vendor_errors.data, vendor_errors.size);
 	}
 	einj_exec_ctx_init(&ctx);
 	apei_exec_post_unmap_gars(&ctx);