Message ID | 20231024092634.7122-25-ilpo.jarvinen@linux.intel.com |
---|---|
State | New |
Headers |
From: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
To: linux-kselftest@vger.kernel.org, Reinette Chatre <reinette.chatre@intel.com>, Shuah Khan <shuah@kernel.org>, Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>, Maciej Wieczór-Retman <maciej.wieczor-retman@intel.com>, Fenghua Yu <fenghua.yu@intel.com>
Cc: linux-kernel@vger.kernel.org, Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Subject: [PATCH 24/24] selftests/resctrl: Ignore failures from L2 CAT test with <= 2 bits
Date: Tue, 24 Oct 2023 12:26:34 +0300
Message-Id: <20231024092634.7122-25-ilpo.jarvinen@linux.intel.com>
In-Reply-To: <20231024092634.7122-1-ilpo.jarvinen@linux.intel.com>
References: <20231024092634.7122-1-ilpo.jarvinen@linux.intel.com> |
Series |
selftests/resctrl: CAT test improvements & generalized test framework
Commit Message
Ilpo Järvinen
Oct. 24, 2023, 9:26 a.m. UTC
L2 CAT test with a low number of bits tends to fail occasionally because
of what seems to be random variation. The margin is quite small to begin
with for <= 2 bits in the CBM. At times, the result can even become
negative. While it would be possible to allow negative values for those
cases, it would be more confusing to the user.
Ignore failures from the tests where <= 2 bits were used to avoid false
negative results.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
---
tools/testing/selftests/resctrl/cat_test.c | 21 ++++++++++++++++++---
1 file changed, 18 insertions(+), 3 deletions(-)
Comments
On 2023-10-24 at 12:26:34 +0300, Ilpo Järvinen wrote:
>L2 CAT test with low number of bits tends to occasionally fail because
>of what seems random variation. The margin is quite small to begin with
>for <= 2 bits in CBM. At times, the result can even become negative.
>While it would be possible to allow negative values for those cases, it
>would be more confusing to user.

"to user" -> "to the user"?

>
>Ignore failures from the tests where <= 2 were used to avoid false
>negative results.
>
>Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Hi Ilpo,

On 10/24/2023 2:26 AM, Ilpo Järvinen wrote:
> L2 CAT test with low number of bits tends to occasionally fail because
> of what seems random variation. The margin is quite small to begin with
> for <= 2 bits in CBM. At times, the result can even become negative.
> While it would be possible to allow negative values for those cases, it
> would be more confusing to user.
>
> Ignore failures from the tests where <= 2 were used to avoid false
> negative results.
>

I think the core message is that 2 or fewer bits should not be used. Instead
of running the test and ignoring the results the test should perhaps just not
be run.

Reinette
On Thu, 2 Nov 2023, Reinette Chatre wrote:
> On 10/24/2023 2:26 AM, Ilpo Järvinen wrote:
> > L2 CAT test with low number of bits tends to occasionally fail because
> > of what seems random variation. The margin is quite small to begin with
> > for <= 2 bits in CBM. At times, the result can even become negative.
> > While it would be possible to allow negative values for those cases, it
> > would be more confusing to user.
> >
> > Ignore failures from the tests where <= 2 were used to avoid false
> > negative results.
> >
>
> I think the core message is that 2 or fewer bits should not be used. Instead
> of running the test and ignoring the results the test should perhaps just not
> be run.

I considered that, but it often does work, so it felt like a shame not to
present the results when they're successful. Then I just had to decide
how to deal with the cases where they failed.

Also, if I make it not run down to 1 bit, those numbers will never be
seen by anyone. That doesn't mean the 2- and 1-bit results contain no
information for a human reader, who is able to make a more informed
decision about whether something is truly working or not. We could,
hypothetically, one day have a HW issue that makes a 1-bit L2 mask
misbehave, and if the number is never seen by anyone, it's extremely
unlikely to be caught easily.

They just aren't reliable enough for a simple automated threshold
currently. Maybe something other than the average value would be, but
that would need to be explored. I also suspect that the memory address
of the buffer might affect the value; with L3 it definitely should
because of how things work, but I don't know if that holds for L2 too.
I earlier tried playing with the buffer addresses with L3, but as it
didn't immediately yield a positive outcome in guarding against
outliers, I postponed that investigation (e.g., my allocation pattern
might have been too straightforward and didn't provide enough entropy
in the buffer start address because I just allocated n x buf_size
buffers back-to-back).

But I don't have a very strong opinion on this, so if you'd prefer that
I just stop at 3 bits, I can change it.
Hi Ilpo,

On 11/3/2023 3:24 AM, Ilpo Järvinen wrote:
[...]
> But I don't have very strong opinion on this so if you prefer I just stop
> at 3 bits, I can change it?
>

We seem to have different users in mind when thinking about this. I was
considering the users that just run the selftest to get a pass/fail. You
seem to also consider folks using this for validation. I'm ok with keeping
this change to accommodate both.

Reinette
diff --git a/tools/testing/selftests/resctrl/cat_test.c b/tools/testing/selftests/resctrl/cat_test.c
index a9c72022bb5a..bc88eb891f35 100644
--- a/tools/testing/selftests/resctrl/cat_test.c
+++ b/tools/testing/selftests/resctrl/cat_test.c
@@ -28,7 +28,7 @@
  */
 #define MIN_DIFF_PERCENT_PER_BIT 1
 
-static int show_results_info(__u64 sum_llc_val, int no_of_bits,
+static int show_results_info(__u64 sum_llc_val, int no_of_bits, bool ignore_fail,
 			     unsigned long cache_span, long min_diff_percent,
 			     unsigned long num_of_runs, bool platform,
 			     __s64 *prev_avg_llc_val)
@@ -40,12 +40,18 @@ static int show_results_info(__u64 sum_llc_val, int no_of_bits,
 	avg_llc_val = sum_llc_val / num_of_runs;
 	if (*prev_avg_llc_val) {
 		float delta = (__s64)(avg_llc_val - *prev_avg_llc_val);
+		char *res_str;
 
 		avg_diff = delta / *prev_avg_llc_val;
 		ret = platform && (avg_diff * 100) < (float)min_diff_percent;
 
+		res_str = ret ? "Fail:" : "Pass:";
+		if (ret && ignore_fail) {
+			res_str = "Pass (failure ignored):";
+			ret = 0;
+		}
 		ksft_print_msg("%s Check cache miss rate changed more than %.1f%%\n",
-			       ret ? "Fail:" : "Pass:", (float)min_diff_percent);
+			       res_str, (float)min_diff_percent);
 
 		ksft_print_msg("Percent diff=%.1f\n", avg_diff * 100);
 	}
@@ -85,6 +91,7 @@ static int check_results(struct resctrl_val_param *param, const char *cache_type
 
 	while (fgets(temp, sizeof(temp), fp)) {
 		char *token = strtok(temp, ":\t");
+		bool ignore_fail = false;
 		int fields = 0;
 		int bits;
 
@@ -108,7 +115,15 @@ static int check_results(struct resctrl_val_param *param, const char *cache_type
 
 		bits = count_bits(current_mask);
 
-		ret = show_results_info(sum_llc_perf_miss, bits,
+		/*
+		 * L2 CAT test with low number of bits has too small margin to
+		 * always remain positive. As negative values would be confusing
+		 * for the user, ignore failure instead.
+		 */
+		if (bits <= 2 && !strcmp(cache_type, "L2"))
+			ignore_fail = true;
+
+		ret = show_results_info(sum_llc_perf_miss, bits, ignore_fail,
 					alloc_size / 64,
 					MIN_DIFF_PERCENT_PER_BIT * (bits - 1),
 					runs, get_vendor() == ARCH_INTEL,