[24/24] selftests/resctrl: Ignore failures from L2 CAT test with <= 2 bits

Message ID 20231024092634.7122-25-ilpo.jarvinen@linux.intel.com
State New
Series selftests/resctrl: CAT test improvements & generalized test framework

Commit Message

Ilpo Järvinen Oct. 24, 2023, 9:26 a.m. UTC
L2 CAT test with a low number of bits tends to fail occasionally because
of what seems to be random variation. The margin is quite small to begin with
for <= 2 bits in CBM. At times, the result can even become negative.
While it would be possible to allow negative values for those cases, it
would be more confusing to the user.

Ignore failures from the tests where <= 2 bits were used to avoid false
negative results.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
---
 tools/testing/selftests/resctrl/cat_test.c | 21 ++++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)
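To make the "small margin" concrete: the threshold the patch passes to show_results_info() is MIN_DIFF_PERCENT_PER_BIT * (bits - 1), and MIN_DIFF_PERCENT_PER_BIT is 1, so a 1-bit mask only has to beat 0% and a 2-bit mask only 1%. Below is a simplified, standalone sketch of the pass/fail decision the patch implements. The names judge_step() and ignore_l2_low_bits() are hypothetical, and the get_vendor() == ARCH_INTEL platform check from the real code is left out.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Same value as MIN_DIFF_PERCENT_PER_BIT in cat_test.c. */
#define MIN_DIFF_PERCENT_PER_BIT 1

/*
 * Hypothetical helper: the patch open-codes this condition in
 * check_results().
 */
static bool ignore_l2_low_bits(const char *cache_type, int bits)
{
	return bits <= 2 && !strcmp(cache_type, "L2");
}

/*
 * Simplified sketch of the decision in show_results_info().
 * avg_diff_percent is the measured change in cache misses (in percent)
 * relative to the previous, larger mask. The platform check is omitted.
 * Returns 1 for a reported failure, 0 otherwise.
 */
static int judge_step(float avg_diff_percent, int bits, const char *cache_type)
{
	long min_diff_percent = MIN_DIFF_PERCENT_PER_BIT * (bits - 1);
	int ret = avg_diff_percent < (float)min_diff_percent;
	const char *res_str = ret ? "Fail:" : "Pass:";

	/* Downgrade the failure only for L2 with 1 or 2 bits. */
	if (ret && ignore_l2_low_bits(cache_type, bits)) {
		res_str = "Pass (failure ignored):";
		ret = 0;
	}

	printf("%s %s, %d bits: diff=%.1f%%, threshold=%ld%%\n",
	       res_str, cache_type, bits, avg_diff_percent, min_diff_percent);
	return ret;
}
```

Under these assumptions, judge_step(0.5f, 2, "L2") prints a "Pass (failure ignored):" line and returns 0, while the same measurement with "L3" is still reported as a failure. A 3-bit L2 step is judged normally against its 2% threshold.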
  

Comments

Maciej Wieczor-Retman Oct. 27, 2023, 12:48 p.m. UTC | #1
On 2023-10-24 at 12:26:34 +0300, Ilpo Järvinen wrote:
>L2 CAT test with low number of bits tends to occasionally fail because
>of what seems random variation. The margin is quite small to begin with
>for <= 2 bits in CBM. At times, the result can even become negative.
>While it would be possible to allow negative values for those cases, it
>would be more confusing to user.

"to user" -> "to the user"?

>
>Ignore failures from the tests where <= 2 were used to avoid false
>negative results.
>
>Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
  
Reinette Chatre Nov. 2, 2023, 5:57 p.m. UTC | #2
Hi Ilpo,

On 10/24/2023 2:26 AM, Ilpo Järvinen wrote:
> L2 CAT test with low number of bits tends to occasionally fail because
> of what seems random variation. The margin is quite small to begin with
> for <= 2 bits in CBM. At times, the result can even become negative.
> While it would be possible to allow negative values for those cases, it
> would be more confusing to user.
> 
> Ignore failures from the tests where <= 2 were used to avoid false
> negative results.
> 

I think the core message is that 2 or fewer bits should not be used. Instead
of running the test and ignoring the results, the test should perhaps just not
be run.

Reinette
  
Ilpo Järvinen Nov. 3, 2023, 10:24 a.m. UTC | #3
On Thu, 2 Nov 2023, Reinette Chatre wrote:
> On 10/24/2023 2:26 AM, Ilpo Järvinen wrote:
> > L2 CAT test with low number of bits tends to occasionally fail because
> > of what seems random variation. The margin is quite small to begin with
> > for <= 2 bits in CBM. At times, the result can even become negative.
> > While it would be possible to allow negative values for those cases, it
> > would be more confusing to user.
> > 
> > Ignore failures from the tests where <= 2 were used to avoid false
> > negative results.
> > 
> 
> I think the core message is that 2 or fewer bits should not be used. Instead
> of running the test and ignoring the results the test should perhaps just not
> be run.

I considered that, but it often does work, so it felt a shame not to present
the results when they're successful. Then I just had to decide how to deal with
the cases where they failed.

Also, if I make the test stop before it gets down to 1 bit, those numbers
will never be seen by anyone. That doesn't mean the 2- and 1-bit results
contain no information for a human reader, who can make a more informed
decision about whether something is truly working or not. We could,
hypothetically, have a HW issue one day that makes a 1-bit L2 mask
misbehave, and if the number is never seen by anyone, it's extremely
unlikely to be caught easily.

They are just not reliable enough for a simple automated threshold currently.
Maybe something other than the average value would be; that would need to be
explored. I also suspect the memory address of the buffer might affect the
value. With L3 it definitely should because of how things work, but I don't
know if that holds for L2 too. I tried playing with the buffer addresses
with L3 earlier, but as it didn't immediately yield a positive outcome in
guarding against outliers, I postponed that investigation (e.g., my alloc
pattern might have been too straightforward and didn't provide enough
entropy in the buffer start address because I just alloc'ed n x buf_size
buffers back-to-back).

But I don't have a very strong opinion on this, so if you prefer that I just
stop at 3 bits, I can change it.
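One "something other than the average value" that is commonly used to damp outliers is the median of the per-run measurements. The sketch below is only an illustration of that swapped-in technique, not what the selftest does: in the patch, show_results_info() receives a running sum (sum_llc_val) and divides it by num_of_runs, so using a median would assume the per-run miss counts are kept in an array. median_u64() is a hypothetical helper, and unsigned long long stands in for the selftest's __u64.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* qsort() comparator for unsigned 64-bit counters. */
static int cmp_u64(const void *a, const void *b)
{
	unsigned long long x = *(const unsigned long long *)a;
	unsigned long long y = *(const unsigned long long *)b;

	return (x > y) - (x < y);
}

/*
 * Hypothetical helper: median of n per-run cache-miss counts.
 * Sorts a copy so the caller's array is left untouched.
 * Simplification: returns 0 when n == 0 or the copy cannot be allocated,
 * which a real caller would need to distinguish from a genuine 0.
 */
static unsigned long long median_u64(const unsigned long long *vals, size_t n)
{
	unsigned long long *tmp, med;
	size_t mid = n / 2;

	if (n == 0)
		return 0;

	tmp = malloc(n * sizeof(*tmp));
	if (!tmp)
		return 0;
	memcpy(tmp, vals, n * sizeof(*tmp));
	qsort(tmp, n, sizeof(*tmp), cmp_u64);

	/* Even count: midpoint of the two middle values, without overflow. */
	if (n % 2)
		med = tmp[mid];
	else
		med = tmp[mid - 1] + (tmp[mid] - tmp[mid - 1]) / 2;

	free(tmp);
	return med;
}
```

For per-run values {100, 102, 5000, 101, 99}, the integer mean is 1080 while median_u64() returns 101, so a single outlier run barely moves the result. Whether that actually stabilizes the 1- and 2-bit L2 results would still have to be measured.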
  
Reinette Chatre Nov. 3, 2023, 10:53 p.m. UTC | #4
Hi Ilpo,

On 11/3/2023 3:24 AM, Ilpo Järvinen wrote:
> On Thu, 2 Nov 2023, Reinette Chatre wrote:
>> On 10/24/2023 2:26 AM, Ilpo Järvinen wrote:
>>> L2 CAT test with low number of bits tends to occasionally fail because
>>> of what seems random variation. The margin is quite small to begin with
>>> for <= 2 bits in CBM. At times, the result can even become negative.
>>> While it would be possible to allow negative values for those cases, it
>>> would be more confusing to user.
>>>
>>> Ignore failures from the tests where <= 2 were used to avoid false
>>> negative results.
>>>
>>
>> I think the core message is that 2 or fewer bits should not be used. Instead
>> of running the test and ignoring the results the test should perhaps just not
>> be run.
> 
> I considered that but it often does work so it felt shame to now present
> them when they're successful. Then I just had to decide how to deal with
> the cases where they failed.
> 
> Also, if I make it to not run down to 1 bit, those numbers will never ever 
> be seen by anyone. It doesn't say 2 and 1 bit results don't contain any 
> information to a human reader who is able to do more informed decisions 
> whether something is truly working or not. We could, hypothetically, have 
> a HW issue one day which makes 1-bit L2 mask to misbehave and if the 
> number is never seen by anyone, it's extremely unlikely to be caught 
> easily.
> 
> They are just reliable enough for simple automated threshold currently. 
> Maybe something else than average value would be, it would need to be 
> explored but I suspect also the memory address of the buffer might affect 
> the value, with L3 it definitely should because of how the things work but 
> I don't know if that holds for L2 too. I have earlier tried playing with 
> the buffer addresses with L3 but as I didn't immediately yield positive 
> outcome to guard against outliers, I postponed that investigation (e.g., 
> my alloc pattern might have been too straightforward and didn't provide 
> enough entropy into the buffer start address because I just alloc'ed n x 
> buf_size buffers back-to-back).
> 
> But I don't have very strong opinion on this so if you prefer I just stop 
> at 3 bits, I can change it?
> 

We seem to have different users in mind when thinking about this. I was
considering the users that just run the selftest to get a pass/fail. You
seem to also consider folks using this for validation. I'm ok with keeping
this change to accommodate both.

Reinette
  

Patch

diff --git a/tools/testing/selftests/resctrl/cat_test.c b/tools/testing/selftests/resctrl/cat_test.c
index a9c72022bb5a..bc88eb891f35 100644
--- a/tools/testing/selftests/resctrl/cat_test.c
+++ b/tools/testing/selftests/resctrl/cat_test.c
@@ -28,7 +28,7 @@ 
  */
 #define MIN_DIFF_PERCENT_PER_BIT	1
 
-static int show_results_info(__u64 sum_llc_val, int no_of_bits,
+static int show_results_info(__u64 sum_llc_val, int no_of_bits, bool ignore_fail,
 			     unsigned long cache_span, long min_diff_percent,
 			     unsigned long num_of_runs, bool platform,
 			     __s64 *prev_avg_llc_val)
@@ -40,12 +40,18 @@  static int show_results_info(__u64 sum_llc_val, int no_of_bits,
 	avg_llc_val = sum_llc_val / num_of_runs;
 	if (*prev_avg_llc_val) {
 		float delta = (__s64)(avg_llc_val - *prev_avg_llc_val);
+		char *res_str;
 
 		avg_diff = delta / *prev_avg_llc_val;
 		ret = platform && (avg_diff * 100) < (float)min_diff_percent;
 
+		res_str = ret ? "Fail:" : "Pass:";
+		if (ret && ignore_fail) {
+			res_str = "Pass (failure ignored):";
+			ret = 0;
+		}
 		ksft_print_msg("%s Check cache miss rate changed more than %.1f%%\n",
-			       ret ? "Fail:" : "Pass:", (float)min_diff_percent);
+			       res_str, (float)min_diff_percent);
 
 		ksft_print_msg("Percent diff=%.1f\n", avg_diff * 100);
 	}
@@ -85,6 +91,7 @@  static int check_results(struct resctrl_val_param *param, const char *cache_type
 
 	while (fgets(temp, sizeof(temp), fp)) {
 		char *token = strtok(temp, ":\t");
+		bool ignore_fail = false;
 		int fields = 0;
 		int bits;
 
@@ -108,7 +115,15 @@  static int check_results(struct resctrl_val_param *param, const char *cache_type
 
 		bits = count_bits(current_mask);
 
-		ret = show_results_info(sum_llc_perf_miss, bits,
+		/*
+		 * L2 CAT test with a low number of bits has too small a margin to
+		 * always remain positive. As negative values would be confusing
+		 * to the user, ignore failures instead.
+		 */
+		if (bits <= 2 && !strcmp(cache_type, "L2"))
+			ignore_fail = true;
+
+		ret = show_results_info(sum_llc_perf_miss, bits, ignore_fail,
 					alloc_size / 64,
 					MIN_DIFF_PERCENT_PER_BIT * (bits - 1), runs,
 					get_vendor() == ARCH_INTEL,