[v9,2/3] Documentation: add an isolation strategy sysfs node for uacce

Message ID 20221025123931.42161-3-yekai13@huawei.com
State New
Series crypto: hisilicon - supports device isolation feature

Commit Message

yekai (A) Oct. 25, 2022, 12:39 p.m. UTC
  Update the documentation to describe the sysfs node that lets
user space configure the isolation strategy, and the sysfs node
that reports the device isolated state.

Signed-off-by: Kai Ye <yekai13@huawei.com>
---
 Documentation/ABI/testing/sysfs-driver-uacce | 27 ++++++++++++++++++++
 1 file changed, 27 insertions(+)
  

Comments

Greg KH Oct. 25, 2022, 1:03 p.m. UTC | #1
On Tue, Oct 25, 2022 at 12:39:30PM +0000, Kai Ye wrote:
> Update documentation describing sysfs node that could help to
> configure isolation strategy for users in the user space. And
> describing sysfs node that could read the device isolated state.
> 
> Signed-off-by: Kai Ye <yekai13@huawei.com>
> ---
>  Documentation/ABI/testing/sysfs-driver-uacce | 27 ++++++++++++++++++++
>  1 file changed, 27 insertions(+)
> 
> diff --git a/Documentation/ABI/testing/sysfs-driver-uacce b/Documentation/ABI/testing/sysfs-driver-uacce
> index 08f2591138af..50737c897ba3 100644
> --- a/Documentation/ABI/testing/sysfs-driver-uacce
> +++ b/Documentation/ABI/testing/sysfs-driver-uacce
> @@ -19,6 +19,33 @@ Contact:        linux-accelerators@lists.ozlabs.org
>  Description:    Available instances left of the device
>                  Return -ENODEV if uacce_ops get_available_instances is not provided
>  
> +What:           /sys/class/uacce/<dev_name>/isolate_strategy
> +Date:           Oct 2022
> +KernelVersion:  6.1
> +Contact:        linux-accelerators@lists.ozlabs.org
> +Description:    (RW) Configure the frequency size for the hardware error
> +                isolation strategy. This unit is the number of times. Number

Number of times what?

> +                of occurrences in a period, also means threshold. If the number
> +                of device pci AER error exceeds the threshold in a time window,

What is the time window?

> +                the device is isolated. This size is a configured integer value.
> +                The default is 0. The maximum value is 65535.
> +
> +                In the hisilicon accelerator engine, first we will
> +                time-stamp every slot AER error. Then check the AER error log
> +                when the device AER error occurred. if the device slot AER error
> +                count exceeds the preset the number of times in one hour, the
> +                isolated state will be set to true. So the device will be
> +                isolated. And the AER error log that exceed one hour will be
> +                cleared.

This seems like a very hardware-specific implementation here.  And this
is supposed to be a generic class?

I feel this is getting really messy :(

thanks,

greg k-h
  

Patch

diff --git a/Documentation/ABI/testing/sysfs-driver-uacce b/Documentation/ABI/testing/sysfs-driver-uacce
index 08f2591138af..50737c897ba3 100644
--- a/Documentation/ABI/testing/sysfs-driver-uacce
+++ b/Documentation/ABI/testing/sysfs-driver-uacce
@@ -19,6 +19,33 @@  Contact:        linux-accelerators@lists.ozlabs.org
 Description:    Available instances left of the device
                 Return -ENODEV if uacce_ops get_available_instances is not provided
 
+What:           /sys/class/uacce/<dev_name>/isolate_strategy
+Date:           Oct 2022
+KernelVersion:  6.1
+Contact:        linux-accelerators@lists.ozlabs.org
+Description:    (RW) Configure the threshold for the hardware error
+                isolation strategy. The unit is a number of occurrences:
+                the count of errors allowed in a period. If the number
+                of device PCI AER errors exceeds the threshold in a time window,
+                the device is isolated. The value is a configured integer.
+                The default is 0. The maximum value is 65535.
+
+                In the HiSilicon accelerator engine, every slot AER error is
+                first time-stamped. The AER error log is then checked each
+                time a device AER error occurs. If the device slot AER error
+                count exceeds the preset number of times in one hour, the
+                isolated state is set to true and the device is
+                isolated. AER error log entries older than one hour are
+                cleared.
+
+What:           /sys/class/uacce/<dev_name>/isolate
+Date:           Oct 2022
+KernelVersion:  6.1
+Contact:        linux-accelerators@lists.ozlabs.org
+Description:    (R) A sysfs node that reports the device isolated state. The value 1
+                means the device is unavailable. The value 0 means the device is
+                available.
+
 What:           /sys/class/uacce/<dev_name>/algorithms
 Date:           Feb 2020
 KernelVersion:  5.7
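
The threshold-in-a-window behaviour described for isolate_strategy can be sketched as a small user-space model. This is an illustrative sketch only, not the driver's code: the class name `IsolationTracker`, its method, and the choice to treat a threshold of 0 as "isolation disabled" are assumptions that the patch does not state. The one-hour window and the 65535 maximum are taken from the description above.

```python
from collections import deque

WINDOW_SECONDS = 3600   # one-hour window described for the HiSilicon engine
MAX_THRESHOLD = 65535   # documented maximum for isolate_strategy


class IsolationTracker:
    """Hypothetical model of the threshold-in-a-time-window policy."""

    def __init__(self, threshold=0, window=WINDOW_SECONDS):
        if not 0 <= threshold <= MAX_THRESHOLD:
            raise ValueError("threshold must be in 0..65535")
        self.threshold = threshold
        self.window = window
        self.errors = deque()    # timestamps of logged errors, oldest first
        self.isolated = False    # mirrors what the 'isolate' node would report

    def record_error(self, now):
        """Log one AER error at time `now` (seconds); return the isolated state."""
        self.errors.append(now)
        # Clear log entries that are older than the window.
        while self.errors and now - self.errors[0] > self.window:
            self.errors.popleft()
        # Assumption: a threshold of 0 (the default) disables isolation.
        if self.threshold and len(self.errors) > self.threshold:
            self.isolated = True
        return self.isolated
```

With a threshold of 2, the third error inside one hour isolates the device, while errors spread more than an hour apart never accumulate past the threshold. In this sketch the isolated state is sticky once set; whether and how the real device leaves the isolated state is not covered by the description.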