[v2,4/6] drm/amdgpu: Limit info in coredump for kernel threads

Message ID 20230713213242.680944-5-andrealmeid@igalia.com
State New
Series drm/amdgpu: Add new reset option and rework coredump

Commit Message

André Almeida July 13, 2023, 9:32 p.m. UTC
If a kernel thread caused the reset, the information available to be
logged will be limited, so return early in the dump function.

Signed-off-by: André Almeida <andrealmeid@igalia.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
  

Comments

Christian König July 14, 2023, 7:52 a.m. UTC | #1
On 13.07.23 at 23:32, André Almeida wrote:
> If a kernel thread caused the reset, the information available to be
> logged will be limited, so return early in the dump function.

Why? The register values and vram lost state should still be valid.

Christian.

>
> Signed-off-by: André Almeida <andrealmeid@igalia.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 +++++-
>   1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index e80670420586..07546781b8b8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -4988,10 +4988,14 @@ static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset,
>   	drm_printf(&p, "kernel: " UTS_RELEASE "\n");
>   	drm_printf(&p, "module: " KBUILD_MODNAME "\n");
>   	drm_printf(&p, "time: %lld.%09ld\n", coredump->reset_time.tv_sec, coredump->reset_time.tv_nsec);
> -	if (coredump->reset_task_info.pid)
> +	if (coredump->reset_task_info.pid) {
>   		drm_printf(&p, "process_name: %s PID: %d\n",
>   			   coredump->reset_task_info.process_name,
>   			   coredump->reset_task_info.pid);
> +	} else {
> +		drm_printf(&p, "GPU reset caused by a kernel thread\n");
> +		return count - iter.remain;
> +	}
>   
>   	if (coredump->reset_vram_lost)
>   		drm_printf(&p, "VRAM is lost due to GPU reset!\n");
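Christian's point can be seen in a small userspace model of the read callback. This is a sketch under stated assumptions, not amdgpu code: `struct coredump_model`, `struct dump_out`, `out_printf()` and `dump_model()` are hypothetical names, a `vsnprintf`-backed buffer stands in for the `drm_printer`, and a zero PID is taken to mean a kernel thread, as the patch does. With `early_return` set, the VRAM-lost line is never emitted, although that state does not depend on which task triggered the reset.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of the coredump fields the patch reads. */
struct coredump_model {
	int pid;                  /* 0 is treated as "kernel thread", as in the patch */
	const char *process_name;
	bool vram_lost;
};

/* Fixed-size text sink standing in for the drm_printer/iterator pair. */
struct dump_out {
	char buf[256];
	size_t len;
};

/* Append formatted text; output is silently truncated once the buffer is full. */
static void out_printf(struct dump_out *o, const char *fmt, ...)
{
	va_list ap;
	int n;

	assert(o->len < sizeof(o->buf));
	va_start(ap, fmt);
	n = vsnprintf(o->buf + o->len, sizeof(o->buf) - o->len, fmt, ap);
	va_end(ap);
	if (n > 0)
		o->len += (size_t)n;
	if (o->len >= sizeof(o->buf))
		o->len = sizeof(o->buf) - 1;  /* clamp to the NUL-terminated length */
}

/*
 * Hypothetical model of the read callback. With early_return set it behaves
 * like the patch as posted; otherwise it notes the kernel thread and keeps
 * going. Returns the bytes produced (the analogue of count - iter.remain).
 */
static size_t dump_model(const struct coredump_model *c, bool early_return,
			 struct dump_out *o)
{
	o->len = 0;
	o->buf[0] = '\0';

	if (c->pid) {
		out_printf(o, "process_name: %s PID: %d\n",
			   c->process_name, c->pid);
	} else {
		out_printf(o, "GPU reset caused by a kernel thread\n");
		if (early_return)
			return o->len;  /* device-wide state below is skipped */
	}

	/* Device-wide state: valid regardless of who triggered the reset. */
	if (c->vram_lost)
		out_printf(o, "VRAM is lost due to GPU reset!\n");

	return o->len;
}
```

Calling `dump_model()` on a model with `pid = 0` and `vram_lost = true` yields only the kernel-thread line when `early_return` is true, and both lines when it is false.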
  
André Almeida July 14, 2023, 12:18 p.m. UTC | #2
On 14/07/2023 04:52, Christian König wrote:
> 
> 
> On 13.07.23 at 23:32, André Almeida wrote:
>> If a kernel thread caused the reset, the information available to be
>> logged will be limited, so return early in the dump function.
> 
> Why? The register values and vram lost state should still be valid.
> 

Fair enough. I was thinking about the newly added information, such as
the ring and job, which won't be available for this kind of thread. I'll
drop this patch in the next version.

> Christian.
> 
>
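André's narrower concern, that per-job data such as the ring and job will be missing for a kernel thread, suggests guarding only that section instead of returning early. A minimal sketch using a hypothetical userspace model: `struct coredump_model`, `dump_model()`, the `ring_name` field and the `ring:` output line are assumptions made for illustration, not fields or output of the actual amdgpu coredump. Each section is gated on the presence of its own data, so device-wide state such as VRAM loss is always printed.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of the coredump state, including per-job data. */
struct coredump_model {
	int pid;                  /* 0 is treated as "kernel thread", as in the patch */
	const char *process_name;
	bool vram_lost;
	const char *ring_name;    /* hypothetical: NULL when no ring/job was captured */
};

/* Fixed-size text sink standing in for the drm_printer/iterator pair. */
struct dump_out {
	char buf[256];
	size_t len;
};

/* Append formatted text; output is silently truncated once the buffer is full. */
static void out_printf(struct dump_out *o, const char *fmt, ...)
{
	va_list ap;
	int n;

	assert(o->len < sizeof(o->buf));
	va_start(ap, fmt);
	n = vsnprintf(o->buf + o->len, sizeof(o->buf) - o->len, fmt, ap);
	va_end(ap);
	if (n > 0)
		o->len += (size_t)n;
	if (o->len >= sizeof(o->buf))
		o->len = sizeof(o->buf) - 1;  /* clamp to the NUL-terminated length */
}

/* Print every section whose data exists; never return early. */
static size_t dump_model(const struct coredump_model *c, struct dump_out *o)
{
	o->len = 0;
	o->buf[0] = '\0';

	if (c->pid)
		out_printf(o, "process_name: %s PID: %d\n",
			   c->process_name, c->pid);
	else
		out_printf(o, "GPU reset caused by a kernel thread\n");

	/* Device-wide state: printed for both user tasks and kernel threads. */
	if (c->vram_lost)
		out_printf(o, "VRAM is lost due to GPU reset!\n");

	/* Per-job section: skipped only when that data was not captured. */
	if (c->ring_name)
		out_printf(o, "ring: %s\n", c->ring_name);

	return o->len;
}
```

Gating on the data itself rather than on the PID means each section is printed exactly when its information was captured, which keeps the register and VRAM state Christian mentions in every dump.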
  

Patch

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index e80670420586..07546781b8b8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4988,10 +4988,14 @@ static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset,
 	drm_printf(&p, "kernel: " UTS_RELEASE "\n");
 	drm_printf(&p, "module: " KBUILD_MODNAME "\n");
 	drm_printf(&p, "time: %lld.%09ld\n", coredump->reset_time.tv_sec, coredump->reset_time.tv_nsec);
-	if (coredump->reset_task_info.pid)
+	if (coredump->reset_task_info.pid) {
 		drm_printf(&p, "process_name: %s PID: %d\n",
 			   coredump->reset_task_info.process_name,
 			   coredump->reset_task_info.pid);
+	} else {
+		drm_printf(&p, "GPU reset caused by a kernel thread\n");
+		return count - iter.remain;
+	}
 
 	if (coredump->reset_vram_lost)
 		drm_printf(&p, "VRAM is lost due to GPU reset!\n");