[v3,4/5] selftests/resctrl: Cleanup properly when an error occurs in CAT test

Message ID 20221101094341.3383073-5-tan.shaopeng@jp.fujitsu.com
State New
Headers
Series Some improvements of resctrl selftest |

Commit Message

Shaopeng Tan Nov. 1, 2022, 9:43 a.m. UTC
  After creating a child process with fork() in CAT test, if there is
an error occurs or such as a SIGINT signal is received, the parent
process will be terminated immediately, but the child process will not
be killed and also umount_resctrlfs() will not be called.

Add a signal handler like other tests to kill child process, umount
resctrlfs, cleanup result files, etc. when an error occurs.

Signed-off-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
---
 tools/testing/selftests/resctrl/cat_test.c | 28 +++++++++++++++-------
 1 file changed, 19 insertions(+), 9 deletions(-)
  

Comments

Shuah Khan Nov. 2, 2022, 9:41 a.m. UTC | #1
On 11/1/22 03:43, Shaopeng Tan wrote:
> After creating a child process with fork() in CAT test, if there is
> an error occurs or such as a SIGINT signal is received, the parent
> process will be terminated immediately, but the child process will not
> be killed and also umount_resctrlfs() will not be called.
> 
> Add a signal handler like other tests to kill child process, umount
> resctrlfs, cleanup result files, etc. when an error occurs.
> 
> Signed-off-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
> ---
>   tools/testing/selftests/resctrl/cat_test.c | 28 +++++++++++++++-------
>   1 file changed, 19 insertions(+), 9 deletions(-)
> 
> diff --git a/tools/testing/selftests/resctrl/cat_test.c b/tools/testing/selftests/resctrl/cat_test.c
> index 6a8306b0a109..5f81817f4366 100644
> --- a/tools/testing/selftests/resctrl/cat_test.c
> +++ b/tools/testing/selftests/resctrl/cat_test.c
> @@ -98,12 +98,21 @@ void cat_test_cleanup(void)
>   	remove(RESULT_FILE_NAME2);
>   }
>   
> +static void ctrl_handler(int signo)
> +{
> +	kill(bm_pid, SIGKILL);
> +	umount_resctrlfs();
> +	tests_cleanup();
> +	ksft_print_msg("Ending\n\n");

Is there a reason to print this message? Remove it unless it serves
a purpose.

> +
> +	exit(EXIT_SUCCESS);
> +}
> +
>   int cat_perf_miss_val(int cpu_no, int n, char *cache_type)
>   {
>   	unsigned long l_mask, l_mask_1;
>   	int ret, pipefd[2], sibling_cpu_no;
>   	char pipe_message;
> -	pid_t bm_pid;

Odd. bm_pid is used below - why remove it here?

>   
>   	cache_size = 0;
>   
> @@ -181,17 +190,19 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type)
>   		strcpy(param.filename, RESULT_FILE_NAME1);
>   		param.num_of_runs = 0;
>   		param.cpu_no = sibling_cpu_no;
> +	} else {
> +		/* set up ctrl-c handler */
> +		if (signal(SIGINT, ctrl_handler) == SIG_ERR ||
> +		    signal(SIGHUP, ctrl_handler) == SIG_ERR ||
> +		    signal(SIGTERM, ctrl_handler) == SIG_ERR)
> +			printf("Failed to catch SIGNAL!\n");

Is perror() more appropriate here?

>   	}
>   
>   	remove(param.filename);
>   
>   	ret = cat_val(&param);
> -	if (ret)
> -		return ret;
> -
> -	ret = check_results(&param);
> -	if (ret)
> -		return ret;
> +	if (ret == 0)
> +		ret = check_results(&param);

Why not use a goto in error case to do umount_resctrlfs() instead of changing
the conditionals?

>   
>   	if (bm_pid == 0) {
>   		/* Tell parent that child is ready */
> @@ -201,7 +212,6 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type)
>   		    sizeof(pipe_message)) {
>   			close(pipefd[1]);
>   			perror("# failed signaling parent process");
> -			return errno;
>   		}
>   
>   		close(pipefd[1]);
> @@ -226,5 +236,5 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type)
>   	if (bm_pid)
>   		umount_resctrlfs();
>   
> -	return 0;
> +	return ret;
>   }


With these changes made:

Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>

thanks,
-- Shuah
  
Reinette Chatre Nov. 7, 2022, 10:31 p.m. UTC | #2
Hi Shaopeng and Shuah,

On 11/2/2022 2:41 AM, Shuah Khan wrote:
> On 11/1/22 03:43, Shaopeng Tan wrote:
>> After creating a child process with fork() in CAT test, if there is
>> an error occurs or such as a SIGINT signal is received, the parent
>> process will be terminated immediately, but the child process will not
>> be killed and also umount_resctrlfs() will not be called.
>>
>> Add a signal handler like other tests to kill child process, umount
>> resctrlfs, cleanup result files, etc. when an error occurs.
>>
>> Signed-off-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
>> ---
>>   tools/testing/selftests/resctrl/cat_test.c | 28 +++++++++++++++-------
>>   1 file changed, 19 insertions(+), 9 deletions(-)
>>
>> diff --git a/tools/testing/selftests/resctrl/cat_test.c b/tools/testing/selftests/resctrl/cat_test.c
>> index 6a8306b0a109..5f81817f4366 100644
>> --- a/tools/testing/selftests/resctrl/cat_test.c
>> +++ b/tools/testing/selftests/resctrl/cat_test.c
>> @@ -98,12 +98,21 @@ void cat_test_cleanup(void)
>>       remove(RESULT_FILE_NAME2);
>>   }
>>   +static void ctrl_handler(int signo)
>> +{
>> +    kill(bm_pid, SIGKILL);
>> +    umount_resctrlfs();
>> +    tests_cleanup();
>> +    ksft_print_msg("Ending\n\n");
> 
> Is there a reason to print this message? Remove it unless it serves
> a purpose.

This function appears to be a duplicate of existing
resctrl_val.c:ctrlc_handler(). Could the duplication be avoided
instead of refining the copy?

> 
>> +
>> +    exit(EXIT_SUCCESS);
>> +}
>> +
>>   int cat_perf_miss_val(int cpu_no, int n, char *cache_type)
>>   {
>>       unsigned long l_mask, l_mask_1;
>>       int ret, pipefd[2], sibling_cpu_no;
>>       char pipe_message;
>> -    pid_t bm_pid;
> 
> Odd. bm_pid is used below - why remove it here?

There is a global bm_pid in resctrl_val.c that is made available
via extern in resctrl.h. This is what causes this code to still
compile but I would also like to learn why moving to that is
desired as a change here. I expect such a big change to get a
mention in the commit message.

> 
>>         cache_size = 0;
>>   @@ -181,17 +190,19 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type)
>>           strcpy(param.filename, RESULT_FILE_NAME1);
>>           param.num_of_runs = 0;
>>           param.cpu_no = sibling_cpu_no;
>> +    } else {
>> +        /* set up ctrl-c handler */
>> +        if (signal(SIGINT, ctrl_handler) == SIG_ERR ||
>> +            signal(SIGHUP, ctrl_handler) == SIG_ERR ||
>> +            signal(SIGTERM, ctrl_handler) == SIG_ERR)
>> +            printf("Failed to catch SIGNAL!\n");
> 
> Is perror() more appropriate here?

Should we be using signal() at all? "man signal" reads:
"WARNING: the behavior of signal() varies across UNIX versions,
and has also varied historically across different versions of Linux.
Avoid its use: use sigaction(2) instead."

"Failed to catch SIGNAL" also seems unclear to me. This is
where a signal handler is set up, the signal for which the handler
is installed has not arrived.


> 
>>       }
>>         remove(param.filename);
>>         ret = cat_val(&param);
>> -    if (ret)
>> -        return ret;
>> -
>> -    ret = check_results(&param);
>> -    if (ret)
>> -        return ret;
>> +    if (ret == 0)
>> +        ret = check_results(&param);
> 
> Why not use a goto in error case to do umount_resctrlfs() instead of changing
> the conditionals?

My understanding is the code that follows is needed to
synchronize the exits between the parent and child. It is the parent
that will run umount_resctrlfs() and it should only do so
after the child is done. A goto by the parent may thus cause
umount_resctrlfs() to be run while the child still relies on it while
a goto by the child may cause the parent not to receive the message
that the child is complete.

Reinette
  
Reinette Chatre Nov. 7, 2022, 10:40 p.m. UTC | #3
Hi Shaopeng,

On 11/1/2022 2:43 AM, Shaopeng Tan wrote:
> After creating a child process with fork() in CAT test, if there is
> an error occurs or such as a SIGINT signal is received, the parent

I find the above hard to read. How about "..., if an error occurs or
a signal such as SIGINT is received, ..."

> process will be terminated immediately, but the child process will not
> be killed and also umount_resctrlfs() will not be called.
> 
> Add a signal handler like other tests to kill child process, umount
> resctrlfs, cleanup result files, etc. when an error occurs.
> 
> Signed-off-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
> ---
>  tools/testing/selftests/resctrl/cat_test.c | 28 +++++++++++++++-------
>  1 file changed, 19 insertions(+), 9 deletions(-)
> 

...

> @@ -201,7 +212,6 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type)
>  		    sizeof(pipe_message)) {
>  			close(pipefd[1]);
>  			perror("# failed signaling parent process");
> -			return errno;

It looks like pipefd[1] will be closed twice if the write() failed.

It does look strange to let the child continue to its infinite loop
after the write() failed. I assume that it is because the parent will
also be stuck and the new ctrl_handler() is expected to unblock both?
Could you please add a comment to the code to clarify this flow?

Thank you very much

Reinette
  
Shaopeng Tan (Fujitsu) Nov. 8, 2022, 8:32 a.m. UTC | #4
Hi Reinette and Shuah,

> On 11/2/2022 2:41 AM, Shuah Khan wrote:
> > On 11/1/22 03:43, Shaopeng Tan wrote:
> >> After creating a child process with fork() in CAT test, if there is
> >> an error occurs or such as a SIGINT signal is received, the parent
> >> process will be terminated immediately, but the child process will
> >> not be killed and also umount_resctrlfs() will not be called.
> >>
> >> Add a signal handler like other tests to kill child process, umount
> >> resctrlfs, cleanup result files, etc. when an error occurs.
> >>
> >> Signed-off-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
> >> ---
> >>   tools/testing/selftests/resctrl/cat_test.c | 28
> >> +++++++++++++++-------
> >>   1 file changed, 19 insertions(+), 9 deletions(-)
> >>
> >> diff --git a/tools/testing/selftests/resctrl/cat_test.c
> >> b/tools/testing/selftests/resctrl/cat_test.c
> >> index 6a8306b0a109..5f81817f4366 100644
> >> --- a/tools/testing/selftests/resctrl/cat_test.c
> >> +++ b/tools/testing/selftests/resctrl/cat_test.c
> >> @@ -98,12 +98,21 @@ void cat_test_cleanup(void)
> >>       remove(RESULT_FILE_NAME2);
> >>   }
> >>   +static void ctrl_handler(int signo)
> >> +{
> >> +    kill(bm_pid, SIGKILL);
> >> +    umount_resctrlfs();
> >> +    tests_cleanup();
> >> +    ksft_print_msg("Ending\n\n");
> >
> > Is there a reason to print this message? Remove it unless it serves a
> > purpose.
> 
> This function appears to be a duplicate of existing resctrl_val.c:ctrlc_handler().
> Could the duplication be avoided instead of refining the copy?

Yes, it's a duplicate of existing resctrl_val.c:ctrlc_handler().
I will try to use resctrl_val.c:ctrlc_handler() in next version patch series.

> >> +
> >> +    exit(EXIT_SUCCESS);
> >> +}
> >> +
> >>   int cat_perf_miss_val(int cpu_no, int n, char *cache_type)
> >>   {
> >>       unsigned long l_mask, l_mask_1;
> >>       int ret, pipefd[2], sibling_cpu_no;
> >>       char pipe_message;
> >> -    pid_t bm_pid;
> >
> > Odd. bm_pid is used below - why remove it here?
> 
> There is a global bm_pid in resctrl_val.c that is made available via extern in
> resctrl.h. This is what causes this code to still compile but I would also like to
> learn why moving to that is desired as a change here. I expect such a big
> change to get a mention in the commit message.

Yes. I used global bm_pid, since bm_pid is used in ctrl_handler() before this function cat_perf_miss_val().
I will add a description to commit message.

> >>         cache_size = 0;
> >>   @@ -181,17 +190,19 @@ int cat_perf_miss_val(int cpu_no, int n, char
> >> *cache_type)
> >>           strcpy(param.filename, RESULT_FILE_NAME1);
> >>           param.num_of_runs = 0;
> >>           param.cpu_no = sibling_cpu_no;
> >> +    } else {
> >> +        /* set up ctrl-c handler */
> >> +        if (signal(SIGINT, ctrl_handler) == SIG_ERR ||
> >> +            signal(SIGHUP, ctrl_handler) == SIG_ERR ||
> >> +            signal(SIGTERM, ctrl_handler) == SIG_ERR)
> >> +            printf("Failed to catch SIGNAL!\n");
> >
> > Is perror() more appropriate here?
> 
> Should we be using signal() at all? "man signal" reads:
> "WARNING: the behavior of signal() varies across UNIX versions, and has also
> varied historically across different versions of Linux.
> Avoid its use: use sigaction(2) instead."
> 
> "Failed to catch SIGNAL" also seems unclear to me. This is where a signal
> handler is set up, the signal for which the handler is installed has not arrived.

I will use sigaction(2) and perror().

> >>       }
> >>         remove(param.filename);
> >>         ret = cat_val(&param);
> >> -    if (ret)
> >> -        return ret;
> >> -
> >> -    ret = check_results(&param);
> >> -    if (ret)
> >> -        return ret;
> >> +    if (ret == 0)
> >> +        ret = check_results(&param);
> >
> > Why not use a goto in error case to do umount_resctrlfs() instead of
> > changing the conditionals?
> 
> My understanding is the code that follows is needed to synchronize the exits
> between the parent and child. It is the parent that will run umount_resctrlfs()
> and it should only do so after the child is done. A goto by the parent may thus
> cause
> umount_resctrlfs() to be run while the child still relies on it while a goto by the
> child may cause the parent not to receive the message that the child is
> complete.

Yes, the code that follows is needed to synchronize the exits between the parent and child.

> > @@ -201,7 +212,6 @@ int cat_perf_miss_val(int cpu_no, int n, char
> *cache_type)
> >  		    sizeof(pipe_message)) {
> >  			close(pipefd[1]);
> >  			perror("# failed signaling parent process");
> > -			return errno;
> 
> It looks like pipefd[1] will be closed twice if the write() failed.

This close(pipefd[1]); should also be removed.

> It does look strange to let the child continue to its infinite loop after the write()
> failed. I assume that it is because the parent will also be stuck and the new
> ctrl_handler() is expected to unblock both?

If a SIGNAL is received, ctrl_handler() will be called to kill the child process and exit parent process.
If no SIGNAL is received, the parent process will kill the child process. (by else{kill(bm_pid, SIGKILL);})

> Could you please add a comment to the code to clarify this flow?
I will add comments here.

Best regards,
Shaopeng Tan
  

Patch

diff --git a/tools/testing/selftests/resctrl/cat_test.c b/tools/testing/selftests/resctrl/cat_test.c
index 6a8306b0a109..5f81817f4366 100644
--- a/tools/testing/selftests/resctrl/cat_test.c
+++ b/tools/testing/selftests/resctrl/cat_test.c
@@ -98,12 +98,21 @@  void cat_test_cleanup(void)
 	remove(RESULT_FILE_NAME2);
 }
 
+static void ctrl_handler(int signo)
+{
+	kill(bm_pid, SIGKILL);
+	umount_resctrlfs();
+	tests_cleanup();
+	ksft_print_msg("Ending\n\n");
+
+	exit(EXIT_SUCCESS);
+}
+
 int cat_perf_miss_val(int cpu_no, int n, char *cache_type)
 {
 	unsigned long l_mask, l_mask_1;
 	int ret, pipefd[2], sibling_cpu_no;
 	char pipe_message;
-	pid_t bm_pid;
 
 	cache_size = 0;
 
@@ -181,17 +190,19 @@  int cat_perf_miss_val(int cpu_no, int n, char *cache_type)
 		strcpy(param.filename, RESULT_FILE_NAME1);
 		param.num_of_runs = 0;
 		param.cpu_no = sibling_cpu_no;
+	} else {
+		/* set up ctrl-c handler */
+		if (signal(SIGINT, ctrl_handler) == SIG_ERR ||
+		    signal(SIGHUP, ctrl_handler) == SIG_ERR ||
+		    signal(SIGTERM, ctrl_handler) == SIG_ERR)
+			printf("Failed to catch SIGNAL!\n");
 	}
 
 	remove(param.filename);
 
 	ret = cat_val(&param);
-	if (ret)
-		return ret;
-
-	ret = check_results(&param);
-	if (ret)
-		return ret;
+	if (ret == 0)
+		ret = check_results(&param);
 
 	if (bm_pid == 0) {
 		/* Tell parent that child is ready */
@@ -201,7 +212,6 @@  int cat_perf_miss_val(int cpu_no, int n, char *cache_type)
 		    sizeof(pipe_message)) {
 			close(pipefd[1]);
 			perror("# failed signaling parent process");
-			return errno;
 		}
 
 		close(pipefd[1]);
@@ -226,5 +236,5 @@  int cat_perf_miss_val(int cpu_no, int n, char *cache_type)
 	if (bm_pid)
 		umount_resctrlfs();
 
-	return 0;
+	return ret;
 }