[RESEND,v7,0/2] ACPI: APEI: handle synchronous exceptions with proper si_code

Message ID 20230606074238.97166-1-xueshuai@linux.alibaba.com

Message

Shuai Xue June 6, 2023, 7:42 a.m. UTC
changes since v6:
- add more explicit error message suggested by Xiaofei
- pick up reviewed-by tag from Xiaofei
- pick up internal reviewed-by tag from Baolin

changes since v5 by addressing comments from Kefeng:
- document return value of memory_failure()
- drop redundant comments in call site of memory_failure() 
- make ghes_do_proc void and handle abnormal case within it
- pick up reviewed-by tag from Kefeng Wang 

changes since v4 by addressing comments from Xiaofei:
- do a force kill only for abnormal sync errors

changes since v3 by addressing comments from Xiaofei:
- do a force kill for abnormal memory failure error such as invalid PA,
unexpected severity, OOM, etc
- pick up tested-by tag from Ma Wupeng

changes since v2 by addressing comments from Naoya:
- rename mce_task_work to sync_task_work
- drop ACPI_HEST_NOTIFY_MCE case in is_hest_sync_notify()
- add steps to reproduce this problem in cover letter

changes since v1:
- identify synchronous events by notify type
- Link: https://lore.kernel.org/lkml/20221206153354.92394-3-xueshuai@linux.alibaba.com/


Shuai Xue (2):
  ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on
    synchronous events
  ACPI: APEI: handle synchronous exceptions in task work

 arch/x86/kernel/cpu/mce/core.c |   9 +--
 drivers/acpi/apei/ghes.c       | 113 ++++++++++++++++++++++-----------
 include/acpi/ghes.h            |   3 -
 mm/memory-failure.c            |  17 +----
 4 files changed, 79 insertions(+), 63 deletions(-)
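
For readers skimming the changelog, the decision logic it describes can be sketched in user space roughly as below. This is only an illustration, not the code in the patches: every `sketch_`/`SKETCH_` name and constant value is a hypothetical stand-in, and it assumes SEA is the notify type treated as synchronous once the ACPI_HEST_NOTIFY_MCE case is dropped.

```c
#include <stdbool.h>

/* Stand-in notify types for this sketch only; not the ACPI HEST values. */
enum sketch_notify_type {
	SKETCH_NOTIFY_POLLED,
	SKETCH_NOTIFY_MCE,
	SKETCH_NOTIFY_SEA,
};

/* Stand-in flag bit; not the kernel's MF_ACTION_REQUIRED definition. */
#define SKETCH_MF_ACTION_REQUIRED (1 << 0)

/* Assumption: only SEA is synchronous; the MCE case is not listed. */
static bool sketch_is_sync_notify(enum sketch_notify_type type)
{
	return type == SKETCH_NOTIFY_SEA;
}

/* Synchronous events request action-required handling; others pass 0. */
static int sketch_memory_failure_flags(enum sketch_notify_type type)
{
	return sketch_is_sync_notify(type) ? SKETCH_MF_ACTION_REQUIRED : 0;
}

/*
 * handled == false models the abnormal cases from the changelog
 * (invalid PA, unexpected severity, OOM). Only a synchronous error
 * that could not be handled leads to a force kill.
 */
static bool sketch_needs_force_kill(enum sketch_notify_type type, bool handled)
{
	return sketch_is_sync_notify(type) && !handled;
}
```

With these helpers, an unhandled SKETCH_NOTIFY_SEA event asks for a force kill, while an unhandled SKETCH_NOTIFY_POLLED event does not.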
  

Comments

Shuai Xue June 16, 2023, 7:15 a.m. UTC | #1
On 2023/6/6 15:42, Shuai Xue wrote:
> changes since v6:
> - add more explicit error message suggested by Xiaofei
> - pick up reviewed-by tag from Xiaofei
> - pick up internal reviewed-by tag from Baolin
> 
> changes since v5 by addressing comments from Kefeng:
> - document return value of memory_failure()
> - drop redundant comments in call site of memory_failure() 
> - make ghes_do_proc void and handle abnormal case within it
> - pick up reviewed-by tag from Kefeng Wang 
> 
> changes since v4 by addressing comments from Xiaofei:
> - do a force kill only for abnormal sync errors
> 
> changes since v3 by addressing comments from Xiaofei:
> - do a force kill for abnormal memory failure error such as invalid PA,
> unexpected severity, OOM, etc
> - pick up tested-by tag from Ma Wupeng
> 
> changes since v2 by addressing comments from Naoya:
> - rename mce_task_work to sync_task_work
> - drop ACPI_HEST_NOTIFY_MCE case in is_hest_sync_notify()
> - add steps to reproduce this problem in cover letter
> 
> changes since v1:
> - synchronous events by notify type
> - Link: https://lore.kernel.org/lkml/20221206153354.92394-3-xueshuai@linux.alibaba.com/
> 
> 
> Shuai Xue (2):
>   ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on
>     synchronous events
>   ACPI: APEI: handle synchronous exceptions in task work
> 
>  arch/x86/kernel/cpu/mce/core.c |   9 +--
>  drivers/acpi/apei/ghes.c       | 113 ++++++++++++++++++++++-----------
>  include/acpi/ghes.h            |   3 -
>  mm/memory-failure.c            |  17 +----
>  4 files changed, 79 insertions(+), 63 deletions(-)
> 


Hi, Rafael,

Gentle ping.

Are you happy to queue this patch set, or is there anything I can do to
improve it? As @Kefeng said, this issue has been hit in Alibaba and Huawei
products, so we hope it can be fixed ASAP.

Thank you.

Best Regards,
Shuai
  
Shuai Xue July 10, 2023, 3:15 a.m. UTC | #2
On 2023/6/16 15:15, Shuai Xue wrote:
> 
> 
> On 2023/6/6 15:42, Shuai Xue wrote:
>> changes since v6:
>> - add more explicit error message suggested by Xiaofei
>> - pick up reviewed-by tag from Xiaofei
>> - pick up internal reviewed-by tag from Baolin
>>
>> changes since v5 by addressing comments from Kefeng:
>> - document return value of memory_failure()
>> - drop redundant comments in call site of memory_failure() 
>> - make ghes_do_proc void and handle abnormal case within it
>> - pick up reviewed-by tag from Kefeng Wang 
>>
>> changes since v4 by addressing comments from Xiaofei:
>> - do a force kill only for abnormal sync errors
>>
>> changes since v3 by addressing comments from Xiaofei:
>> - do a force kill for abnormal memory failure error such as invalid PA,
>> unexpected severity, OOM, etc
>> - pick up tested-by tag from Ma Wupeng
>>
>> changes since v2 by addressing comments from Naoya:
>> - rename mce_task_work to sync_task_work
>> - drop ACPI_HEST_NOTIFY_MCE case in is_hest_sync_notify()
>> - add steps to reproduce this problem in cover letter
>>
>> changes since v1:
>> - synchronous events by notify type
>> - Link: https://lore.kernel.org/lkml/20221206153354.92394-3-xueshuai@linux.alibaba.com/
>>
>>
>> Shuai Xue (2):
>>   ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on
>>     synchronous events
>>   ACPI: APEI: handle synchronous exceptions in task work
>>
>>  arch/x86/kernel/cpu/mce/core.c |   9 +--
>>  drivers/acpi/apei/ghes.c       | 113 ++++++++++++++++++++++-----------
>>  include/acpi/ghes.h            |   3 -
>>  mm/memory-failure.c            |  17 +----
>>  4 files changed, 79 insertions(+), 63 deletions(-)
>>
> 
> 
> Hi, Rafael,
> 
> Gentle ping.
> 
> Are you happy to queue this patch set, or is there anything I can do to
> improve it? As @Kefeng said, this issue has been hit in Alibaba and Huawei
> products, so we hope it can be fixed ASAP.

Hi Rafael, Tony, and Naoya,

Gentle ping. I am sorry to see that we have missed the v6.3 and v6.4 merge
windows even though the series already carries three Reviewed-by tags and one
Tested-by tag.

Do we still need a Reviewed-by from one of the designated APEI reviewers?
@Tony and @Naoya, could you give me your Reviewed-by if you are happy with
the change?

@Rafael, could you please Ack this series if you are happy with the proposal
and the change?

> 
> Thank you.
> 
> Best Regards,
> Shuai
  
Shuai Xue Aug. 8, 2023, 3:17 a.m. UTC | #3
On 2023/7/10 11:15, Shuai Xue wrote:
> 
> 
> On 2023/6/16 15:15, Shuai Xue wrote:
>>
>>
>> On 2023/6/6 15:42, Shuai Xue wrote:
>>> changes since v6:
>>> - add more explicit error message suggested by Xiaofei
>>> - pick up reviewed-by tag from Xiaofei
>>> - pick up internal reviewed-by tag from Baolin
>>>
>>> changes since v5 by addressing comments from Kefeng:
>>> - document return value of memory_failure()
>>> - drop redundant comments in call site of memory_failure() 
>>> - make ghes_do_proc void and handle abnormal case within it
>>> - pick up reviewed-by tag from Kefeng Wang 
>>>
>>> changes since v4 by addressing comments from Xiaofei:
>>> - do a force kill only for abnormal sync errors
>>>
>>> changes since v3 by addressing comments from Xiaofei:
>>> - do a force kill for abnormal memory failure error such as invalid PA,
>>> unexpected severity, OOM, etc
>>> - pick up tested-by tag from Ma Wupeng
>>>
>>> changes since v2 by addressing comments from Naoya:
>>> - rename mce_task_work to sync_task_work
>>> - drop ACPI_HEST_NOTIFY_MCE case in is_hest_sync_notify()
>>> - add steps to reproduce this problem in cover letter
>>>
>>> changes since v1:
>>> - synchronous events by notify type
>>> - Link: https://lore.kernel.org/lkml/20221206153354.92394-3-xueshuai@linux.alibaba.com/
>>>
>>>
>>> Shuai Xue (2):
>>>   ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on
>>>     synchronous events
>>>   ACPI: APEI: handle synchronous exceptions in task work
>>>
>>>  arch/x86/kernel/cpu/mce/core.c |   9 +--
>>>  drivers/acpi/apei/ghes.c       | 113 ++++++++++++++++++++++-----------
>>>  include/acpi/ghes.h            |   3 -
>>>  mm/memory-failure.c            |  17 +----
>>>  4 files changed, 79 insertions(+), 63 deletions(-)
>>>
>>
>>
>> Hi, Rafael,
>>
>> Gentle ping.
>>
>> Are you happy to queue this patch set, or is there anything I can do to
>> improve it? As @Kefeng said, this issue has been hit in Alibaba and Huawei
>> products, so we hope it can be fixed ASAP.
> 
> Hi Rafael, Tony, and Naoya,
> 
> Gentle ping. I am sorry to see that we have missed the v6.3 and v6.4 merge
> windows even though the series already carries three Reviewed-by tags and one
> Tested-by tag.
>
> Do we still need a Reviewed-by from one of the designated APEI reviewers?
> @Tony and @Naoya, could you give me your Reviewed-by if you are happy with
> the change?
>
> @Rafael, could you please Ack this series if you are happy with the proposal
> and the change?
> 

Hi all,

Gentle ping.

>>
>> Thank you.
>>
>> Best Regards,
>> Shuai