From patchwork Tue Oct 24 09:26:25 2023
X-Patchwork-Submitter: Ilpo Järvinen
X-Patchwork-Id: 157338
From: Ilpo Järvinen
To: linux-kselftest@vger.kernel.org, Reinette Chatre, Shuah Khan,
 Shaopeng Tan, Maciej Wieczór-Retman, Fenghua Yu
Cc: linux-kernel@vger.kernel.org, Ilpo Järvinen
Subject: [PATCH 15/24] selftests/resctrl: Read in less obvious order to defeat prefetch optimizations
Date: Tue, 24 Oct 2023 12:26:25 +0300
Message-Id: <20231024092634.7122-16-ilpo.jarvinen@linux.intel.com>
In-Reply-To: <20231024092634.7122-1-ilpo.jarvinen@linux.intel.com>
References: <20231024092634.7122-1-ilpo.jarvinen@linux.intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
When reading memory in order, HW prefetching optimizations will
interfere with measuring how caches and memory are being accessed. This
adds noise into the results.

Change the fill_buf reading loop to avoid an obvious in-order access
pattern by multiplying the index by a prime and taking the result modulo
the buffer size. Using a prime multiplier with modulo ensures the entire
buffer is eventually read. 23 is small enough that the reads are spread
out, but wrapping does not occur too frequently (wrapping too often
triggers extra L2 hits, which adds noise to the test because fetching
the data from the LLC is no longer required).

It was discovered that not all primes work equally well and some can
cause wildly unstable results (e.g., in an earlier version of this
patch, the reads were done in reversed order and 59 was used as the
prime, resulting in unacceptably high and unstable results in the MBA
and MBM tests on some architectures).

Link: https://lore.kernel.org/linux-kselftest/TYAPR01MB6330025B5E6537F94DA49ACB8B499@TYAPR01MB6330.jpnprd01.prod.outlook.com/
Signed-off-by: Ilpo Järvinen
---
 tools/testing/selftests/resctrl/fill_buf.c | 38 +++++++++++++++++-----
 1 file changed, 30 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/resctrl/fill_buf.c b/tools/testing/selftests/resctrl/fill_buf.c
index 9d0b0bf4b85a..326d530425d0 100644
--- a/tools/testing/selftests/resctrl/fill_buf.c
+++ b/tools/testing/selftests/resctrl/fill_buf.c
@@ -51,16 +51,38 @@ static void mem_flush(unsigned char *buf, size_t buf_size)
 	sb();
 }
 
+/*
+ * Buffer index step advance to work around HW prefetching interfering with
+ * the measurements.
+ *
+ * Must be a prime to step through all indexes of the buffer.
+ *
+ * Some primes work better than others on some architectures (from MBA/MBM
+ * result stability point of view).
+ */
+#define FILL_IDX_MULT	23
+
 static int fill_one_span_read(unsigned char *buf, size_t buf_size)
 {
-	unsigned char *end_ptr = buf + buf_size;
-	unsigned char sum, *p;
-
-	sum = 0;
-	p = buf;
-	while (p < end_ptr) {
-		sum += *p;
-		p += (CL_SIZE / 2);
+	unsigned int size = buf_size / (CL_SIZE / 2);
+	unsigned int i, idx = 0;
+	unsigned char sum = 0;
+
+	/*
+	 * Read the buffer in an order that is unexpected by HW prefetching
+	 * optimizations to prevent them interfering with the caching pattern.
+	 *
+	 * The read order is (in terms of halves of cachelines):
+	 *	i * FILL_IDX_MULT % size
+	 * The formula is open-coded below to avoid modulo inside the loop
+	 * as it improves MBA/MBM result stability on some architectures.
+	 */
+	for (i = 0; i < size; i++) {
+		sum += buf[idx * (CL_SIZE / 2)];
+
+		idx += FILL_IDX_MULT;
+		while (idx >= size)
+			idx -= size;
 	}
 
 	return sum;