Message ID | tencent_4DC4468312A1CB2CA34B0215FAD797D11F07@qq.com |
---|---|
State | New |
Headers |
From: Rong Tao <rtoax@foxmail.com>
To: tglx@linutronix.de
Cc: rtoax@foxmail.com, Rong Tao <rongtao@cestc.cn>, Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>, Dave Hansen <dave.hansen@linux.intel.com>, x86@kernel.org (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)), "H. Peter Anvin" <hpa@zytor.com>, linux-kernel@vger.kernel.org (open list:X86 ARCHITECTURE (32-BIT AND 64-BIT))
Subject: [PATCH] x86/vdso: Use non-serializing instruction rdtsc
Date: Tue, 16 May 2023 14:52:03 +0800
Message-ID: <tencent_4DC4468312A1CB2CA34B0215FAD797D11F07@qq.com>
X-Mailer: git-send-email 2.39.1
List-ID: <linux-kernel.vger.kernel.org> |
Series |
x86/vdso: Use non-serializing instruction rdtsc
|
|
Commit Message
Rong Tao
May 16, 2023, 6:52 a.m. UTC
From: Rong Tao <rongtao@cestc.cn>

Replacing rdtscp or 'lfence;rdtsc' with the non-serializable instruction
rdtsc can achieve a 40% performance improvement with only a small loss of
precision.

The RDTSCP instruction is not a serializing instruction, but it does wait
until all previous instructions have executed and all previous loads are
globally visible. The RDTSC instruction is not a serializing instruction.
It does not necessarily wait until all previous instructions have been
executed before reading the counter.

Record the time-consuming of vdso clock_gettime(), pseudo code:

    count = 1000 * 1000 * 100;
    while (count--)
            clock_gettime(CLOCK_REALTIME, &ts);

Time-consuming comparison:

 Time Consume(ns)  | rdtsc_ordered() | rdtsc()   | Promote
 ------------------+-----------------+-----------+---------
 Physical Machine  | 1269147289      | 759067324 | 40%
 Guest OS (KVM)    | 1756615963      | 995823886 | 43%

Signed-off-by: Rong Tao <rongtao@cestc.cn>
---
 arch/x86/include/asm/vdso/gettimeofday.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
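The benchmark pseudo code in the commit message could be fleshed out roughly as follows. This is only a sketch under assumptions: the helper names timespec_diff_ns() and bench_clock_gettime() are hypothetical and do not come from the patch, and the elapsed time is taken with CLOCK_MONOTONIC, which the commit message does not specify.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Nanoseconds between two timespec values (end - start). */
static int64_t timespec_diff_ns(const struct timespec *start,
                                const struct timespec *end)
{
    return (int64_t)(end->tv_sec - start->tv_sec) * 1000000000LL
         + (end->tv_nsec - start->tv_nsec);
}

/*
 * Call clock_gettime(CLOCK_REALTIME) `iterations` times and return the
 * elapsed time in nanoseconds, measured with CLOCK_MONOTONIC.
 * Hypothetical helper for illustration; not taken from the patch.
 */
static int64_t bench_clock_gettime(long iterations)
{
    struct timespec start, end, ts;

    assert(iterations >= 0);

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iterations; i++)
        clock_gettime(CLOCK_REALTIME, &ts);
    clock_gettime(CLOCK_MONOTONIC, &end);

    return timespec_diff_ns(&start, &end);
}
```

Calling bench_clock_gettime(1000L * 1000 * 100) would correspond to the iteration count in the commit message. A throughput number of this kind says nothing about whether the returned timestamps are still correctly ordered.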
Comments
On 5/15/23 23:52, Rong Tao wrote:
> Replacing rdtscp or 'lfence;rdtsc' with the non-serializable instruction
> rdtsc can achieve a 40% performance improvement with only a small loss of
> precision.

I think the minimum that can be done in a changelog like this is to
figure out _why_ a RDTSCP was in use. There are a ton of things that
can make the kernel go faster, but not all of them are a good idea.

I assume that the folks that wrote this had good reason for not using
plain RDTSC. What were those reasons?
Rong!

On Tue, May 16 2023 at 14:52, Rong Tao wrote:
> Replacing rdtscp or 'lfence;rdtsc' with the non-serializable instruction
> rdtsc can achieve a 40% performance improvement with only a small loss of
> precision.

That rdtsc_ordered() is not there to achieve precision. It's there to
guarantee correctness.

The correctness requirement is that reading clock MONOTONIC is strictly
monotonic, i.e. there is no way that you can observe time going
backwards. Neither locally nor across CPUs.

As you explained:

> The RDTSC instruction is not a serializing instruction. It does not
> necessarily wait until all previous instructions have been executed
> before reading the counter.

Q: What guarantees that this does not speculate deep enough to actually
   make time go backwards?

A: Nothing

Conclusion: The fence stays, unless you can prove the contrary under all
            circumstances and microarchitecture generations.

Thanks,

        tglx
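The local half of that correctness requirement can be probed from user space with a sketch like the one below. It is only an illustration under assumptions: count_backward_steps() is a hypothetical helper, it samples on a single thread, and it therefore cannot detect the cross-CPU ordering violations that matter most here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Convert a timespec to a single nanosecond count. */
static int64_t timespec_to_ns(const struct timespec *ts)
{
    return (int64_t)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/*
 * Read CLOCK_MONOTONIC `samples` times on the calling thread and count
 * how often a reading is smaller than the one before it.
 * Hypothetical helper for illustration only.
 */
static long count_backward_steps(long samples)
{
    struct timespec ts;
    int64_t prev, now;
    long backward = 0;

    assert(samples >= 0);

    clock_gettime(CLOCK_MONOTONIC, &ts);
    prev = timespec_to_ns(&ts);

    for (long i = 0; i < samples; i++) {
        clock_gettime(CLOCK_MONOTONIC, &ts);
        now = timespec_to_ns(&ts);
        if (now < prev)
            backward++;   /* time was observed going backwards */
        prev = now;
    }
    return backward;
}
```

A result of zero from such a loop does not prove the clock is safe; it only fails to show a violation on one thread.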
On May 16, 2023 7:12:34 AM PDT, Dave Hansen <dave.hansen@intel.com> wrote:
>On 5/15/23 23:52, Rong Tao wrote:
>> Replacing rdtscp or 'lfence;rdtsc' with the non-serializable instruction
>> rdtsc can achieve a 40% performance improvement with only a small loss of
>> precision.
>
>I think the minimum that can be done in a changelog like this is to
>figure out _why_ a RDTSCP was in use. There are a ton of things that
>can make the kernel go faster, but not all of them are a good idea.
>
>I assume that the folks that wrote this had good reason for not using
>plain RDTSC. What were those reasons?

I believe the motivation is that it is atomic with reading the CPU number.
On Tue, May 16 2023 at 10:57, H. Peter Anvin wrote:
> On May 16, 2023 7:12:34 AM PDT, Dave Hansen <dave.hansen@intel.com> wrote:
>>On 5/15/23 23:52, Rong Tao wrote:
>>> Replacing rdtscp or 'lfence;rdtsc' with the non-serializable instruction
>>> rdtsc can achieve a 40% performance improvement with only a small loss of
>>> precision.
>>
>>I think the minimum that can be done in a changelog like this is to
>>figure out _why_ a RDTSCP was in use. There are a ton of things that
>>can make the kernel go faster, but not all of them are a good idea.
>>
>>I assume that the folks that wrote this had good reason for not using
>>plain RDTSC. What were those reasons?
>
> I believe the motivation is that it is atomic with reading the CPU number.

Believe belongs in the realm of religion and does not help much to
explain technical issues. :)

rdtsc_ordered() has actually useful comments and also see:

  https://lore.kernel.org/lkml/87ttwc73za.ffs@tglx

The Intel SDM and the AMD APM are both blurry about RDTSC speculation and
we've observed (quite some time ago) situations where the RDTSC value
was clearly from the past solely due to speculation. So we had to bite
the bullet to add the fencing. Preferably RDTSCP or if not available
LFENCE; RDTSC. IIRC the original variant was even CPUID; RDTSC, which is
daft.

The time readout does (simplified):

    do {
            // Wait for the sequence count to become even
            while ((seq = READ_ONCE(vd->seq)) & 1);

            tsc = rdtsc_ordered();
            now = convert(vd, tsc);
    } while (seq != READ_ONCE(vd->seq));

It's obviously more complex than that, but you get the idea.

Now replace RDTSCP with RDTSC and explain what guarantees that the TSC
read isn't speculated ahead of the sequence check.

If it's architecturally guaranteed that this can't happen, I'm more than
happy to use plain RDTSC. But as I've observed that myself in the past,
I'm pretty sure that it is not guaranteed, at least not on older
microarchitectures.

If newer ones make that guarantee then they should have exposed that as
a feature bit in CPUID and clearly documented it in the SDM.

As long as that does not happen, I'm sticking to the correctness first
principle.

Thanks,

        tglx
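The simplified readout loop can be modelled in portable user-space C as a sketch. Everything named here is an assumption made for illustration: struct time_data, time_data_update(), time_data_read_ns() and fixed_counter() are hypothetical, the conversion is reduced to a single multiplication, and the counter is passed in as a callback where the kernel would use rdtsc_ordered().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the vDSO data page; not the kernel's layout. */
struct time_data {
    atomic_uint seq;               /* odd while an update is in progress */
    _Atomic uint64_t base_ns;      /* time in ns at cycle_last */
    _Atomic uint64_t cycle_last;   /* counter value at the last update */
    _Atomic uint64_t ns_per_cycle; /* simplified conversion factor */
};

/* Writer side: make seq odd, publish the new values, make seq even again. */
static void time_data_update(struct time_data *td, uint64_t base_ns,
                             uint64_t cycle_last, uint64_t ns_per_cycle)
{
    unsigned int seq = atomic_load_explicit(&td->seq, memory_order_relaxed);

    assert((seq & 1) == 0);  /* single writer, no update in flight */
    atomic_store_explicit(&td->seq, seq + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&td->base_ns, base_ns, memory_order_relaxed);
    atomic_store_explicit(&td->cycle_last, cycle_last, memory_order_relaxed);
    atomic_store_explicit(&td->ns_per_cycle, ns_per_cycle,
                          memory_order_relaxed);
    atomic_store_explicit(&td->seq, seq + 2, memory_order_release);
}

/*
 * Reader side, mirroring the loop in the mail. The C11 fences order
 * memory accesses only; they do not order a hardware counter read,
 * so read_counter() itself has to be an ordered read.
 */
static uint64_t time_data_read_ns(struct time_data *td,
                                  uint64_t (*read_counter)(void))
{
    unsigned int seq;
    uint64_t cycles, now;

    do {
        /* Wait for the sequence count to become even. */
        while ((seq = atomic_load_explicit(&td->seq,
                                           memory_order_acquire)) & 1)
            ;
        cycles = read_counter();
        now = atomic_load_explicit(&td->base_ns, memory_order_relaxed)
            + (cycles - atomic_load_explicit(&td->cycle_last,
                                             memory_order_relaxed))
            * atomic_load_explicit(&td->ns_per_cycle, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
    } while (seq != atomic_load_explicit(&td->seq, memory_order_relaxed));

    return now;
}

/* Demo counter that always returns 150; stands in for a real TSC read. */
static uint64_t fixed_counter(void)
{
    return 150;
}
```

With base_ns = 1000, cycle_last = 100 and ns_per_cycle = 2, time_data_read_ns(&td, fixed_counter) computes 1000 + (150 - 100) * 2. If the counter read were allowed to execute before the first load of seq, the retry check could pass while the counter value predates the data it is combined with, which is the scenario the mail describes.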
On Mon, May 15, 2023, at 11:52 PM, Rong Tao wrote:
> From: Rong Tao <rongtao@cestc.cn>
>
> Replacing rdtscp or 'lfence;rdtsc' with the non-serializable instruction
> rdtsc can achieve a 40% performance improvement with only a small loss of
> precision.
>
> The RDTSCP instruction is not a serializing instruction, but it does wait
> until all previous instructions have executed and all previous loads are
> globally visible. The RDTSC instruction is not a serializing instruction.
> It does not necessarily wait until all previous instructions have been
> executed before reading the counter.
>
> Record the time-consuming of vdso clock_gettime(), pseudo code:
>
>     count = 1000 * 1000 * 100;
>     while (count--)
>             clock_gettime(CLOCK_REALTIME, &ts);
>
> Time-consuming comparison:
>
>  Time Consume(ns)  | rdtsc_ordered() | rdtsc()   | Promote
>  ------------------+-----------------+-----------+---------
>  Physical Machine  | 1269147289      | 759067324 | 40%
>  Guest OS (KVM)    | 1756615963      | 995823886 | 43%
>
> Signed-off-by: Rong Tao <rongtao@cestc.cn>

Out of curiosity, what happens if you apply that patch and run this thing:

https://git.kernel.org/pub/scm/linux/kernel/git/luto/misc-tests.git/tree/evil-clock-test.cc

Build it with g++ -O2 and run:

./evil-clock-test -c monotonic

--Andy
Thank you all very much for your responses. I tested the test code
evil-clock-test[0] provided by Andy; this patch does cause time read errors
and load errors.

$ ./evil-clock-test.out -c monotonic
CPU vendor : GenuineIntel
CPU model : Intel(R) Core(TM) i7-10710U CPU @ 1.10GHz
CPU stepping : 0
TSC flags : tsc rdtscp constant_tsc tsc_known_freq tsc_deadline_timer tsc_adjust
Will test the "CLOCK_MONOTONIC" clock.
Now test failed : worst error 255 with 81902816 samples
Load3 test failed: worst error 384 with 3284297 samples
Load test passed : margin 32 with 18848374 samples
Store test failed as expected: worst error 704 with 18213325 samples

Thanks again :)

[0] https://git.kernel.org/pub/scm/linux/kernel/git/luto/misc-tests.git/tree/evil-clock-test.cc
diff --git a/arch/x86/include/asm/vdso/gettimeofday.h b/arch/x86/include/asm/vdso/gettimeofday.h
index 4cf6794f9d68..342d29106208 100644
--- a/arch/x86/include/asm/vdso/gettimeofday.h
+++ b/arch/x86/include/asm/vdso/gettimeofday.h
@@ -228,7 +228,7 @@ static u64 vread_pvclock(void)
 		if (unlikely(!(pvti->flags & PVCLOCK_TSC_STABLE_BIT)))
 			return U64_MAX;
 
-		ret = __pvclock_read_cycles(pvti, rdtsc_ordered());
+		ret = __pvclock_read_cycles(pvti, rdtsc());
 	} while (pvclock_read_retry(pvti, version));
 
 	return ret;
@@ -246,7 +246,7 @@ static inline u64 __arch_get_hw_counter(s32 clock_mode,
 					const struct vdso_data *vd)
 {
 	if (likely(clock_mode == VDSO_CLOCKMODE_TSC))
-		return (u64)rdtsc_ordered();
+		return (u64)rdtsc();
 	/*
 	 * For any memory-mapped vclock type, we need to make sure that gcc
 	 * doesn't cleverly hoist a load before the mode check. Otherwise we