From patchwork Mon Feb 20 12:41:25 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Frederic Weisbecker
X-Patchwork-Id: 59426
From: Frederic Weisbecker
To: Thomas Gleixner
Cc: LKML, Frederic Weisbecker, Alexey Dobriyan, Peter Zijlstra, Wei Li,
 Mirsad Goran Todorovac, Yu Liao, Hillf Danton, Ingo Molnar
Subject: [PATCH 3/7] timers/nohz: Protect idle/iowait sleep time under seqcount
Date: Mon, 20 Feb 2023 13:41:25 +0100
Message-Id: <20230220124129.519477-4-frederic@kernel.org>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20230220124129.519477-1-frederic@kernel.org>
References: <20230220124129.519477-1-frederic@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Reading the idle/iowait sleep time (e.g. from /proc/stat) can race with
idle exit updates, because the state machine handling the stats is not
atomic and a reader needs a coherent snapshot of several fields
(idle_active, idle_entrytime and the accumulated sleep time). As a
result, reading the sleep time may report irrelevant or backward values.

Fix this by protecting the simple state machine with a seqcount. This is
expected to be cheap enough not to add a measurable performance impact
on the idle path.

Note that this only partially fixes the reader vs. writer race. A race
remains that involves remote updates of the CPU iowait task counter. It
can hardly be fixed.

Reported-by: Yu Liao
Acked-by: Peter Zijlstra (Intel)
Cc: Hillf Danton
Cc: Yu Liao
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: Wei Li
Cc: Alexey Dobriyan
Cc: Mirsad Goran Todorovac
Signed-off-by: Frederic Weisbecker
---
 kernel/time/tick-sched.c | 22 ++++++++++++++++------
 kernel/time/tick-sched.h |  1 +
 2 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 9058b9eb8bc1..90d9b7b29875 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -646,6 +646,7 @@ static void tick_nohz_stop_idle(struct tick_sched *ts, ktime_t now)
 
 	delta = ktime_sub(now, ts->idle_entrytime);
 
+	write_seqcount_begin(&ts->idle_sleeptime_seq);
 	if (nr_iowait_cpu(smp_processor_id()) > 0)
 		ts->iowait_sleeptime = ktime_add(ts->iowait_sleeptime, delta);
 	else
@@ -653,14 +654,18 @@ static void tick_nohz_stop_idle(struct tick_sched *ts, ktime_t now)
 
 	ts->idle_entrytime = now;
 	ts->idle_active = 0;
+	write_seqcount_end(&ts->idle_sleeptime_seq);
 
 	sched_clock_idle_wakeup_event();
 }
 
 static void tick_nohz_start_idle(struct tick_sched *ts)
 {
+	write_seqcount_begin(&ts->idle_sleeptime_seq);
 	ts->idle_entrytime = ktime_get();
 	ts->idle_active = 1;
+	write_seqcount_end(&ts->idle_sleeptime_seq);
+
 	sched_clock_idle_sleep_event();
 }
 
@@ -668,6 +673,7 @@ static u64 get_cpu_sleep_time_us(struct tick_sched *ts, ktime_t *sleeptime,
 				 bool compute_delta, u64 *last_update_time)
 {
 	ktime_t now, idle;
+	unsigned int seq;
 
 	if (!tick_nohz_active)
 		return -1;
@@ -676,13 +682,17 @@ static u64 get_cpu_sleep_time_us(struct tick_sched *ts, ktime_t *sleeptime,
 	if (last_update_time)
 		*last_update_time = ktime_to_us(now);
 
-	if (ts->idle_active && compute_delta) {
-		ktime_t delta = ktime_sub(now, ts->idle_entrytime);
+	do {
+		seq = read_seqcount_begin(&ts->idle_sleeptime_seq);
 
-		idle = ktime_add(*sleeptime, delta);
-	} else {
-		idle = *sleeptime;
-	}
+		if (ts->idle_active && compute_delta) {
+			ktime_t delta = ktime_sub(now, ts->idle_entrytime);
+
+			idle = ktime_add(*sleeptime, delta);
+		} else {
+			idle = *sleeptime;
+		}
+	} while (read_seqcount_retry(&ts->idle_sleeptime_seq, seq));
 
 	return ktime_to_us(idle);
 
diff --git a/kernel/time/tick-sched.h b/kernel/time/tick-sched.h
index c6663254d17d..5ed5a9d41d5a 100644
--- a/kernel/time/tick-sched.h
+++ b/kernel/time/tick-sched.h
@@ -75,6 +75,7 @@ struct tick_sched {
 	ktime_t				idle_waketime;
 
 	/* Idle entry */
+	seqcount_t			idle_sleeptime_seq;
 	ktime_t				idle_entrytime;
 
 	/* Tick stop */
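
For readers unfamiliar with the pattern, here is a minimal user-space
sketch of the writer/reader protocol this patch relies on. It is not
kernel code: struct seqcount, struct sleep_stats and the seq_*/stats_*
helpers are hypothetical stand-ins for seqcount_t, struct tick_sched
and the functions touched above, timestamps are plain integers, and
sequentially consistent C11 atomics replace the lighter barriers the
kernel uses. Only the idle (not iowait) accounting is modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for the kernel's seqcount_t. */
struct seqcount {
	atomic_uint sequence;	/* odd while a write is in progress */
};

/* Illustrative subset of the fields the patch protects. */
struct sleep_stats {
	struct seqcount seq;
	_Atomic int64_t idle_entrytime;
	_Atomic int64_t idle_sleeptime;
	atomic_bool idle_active;
};

static void seq_write_begin(struct seqcount *s)
{
	unsigned int old = atomic_fetch_add(&s->sequence, 1);

	assert(!(old & 1));	/* single writer, no nesting */
}

static void seq_write_end(struct seqcount *s)
{
	atomic_fetch_add(&s->sequence, 1);
}

static unsigned int seq_read_begin(struct seqcount *s)
{
	unsigned int seq;

	/* Wait out any write in progress (odd value). */
	while ((seq = atomic_load(&s->sequence)) & 1)
		;
	return seq;
}

static bool seq_read_retry(struct seqcount *s, unsigned int seq)
{
	return atomic_load(&s->sequence) != seq;
}

/* Writer side, loosely mirroring tick_nohz_start_idle(). */
static void stats_start_idle(struct sleep_stats *st, int64_t now)
{
	seq_write_begin(&st->seq);
	atomic_store(&st->idle_entrytime, now);
	atomic_store(&st->idle_active, true);
	seq_write_end(&st->seq);
}

/* Writer side, loosely mirroring tick_nohz_stop_idle() (idle case only). */
static void stats_stop_idle(struct sleep_stats *st, int64_t now)
{
	int64_t delta = now - atomic_load(&st->idle_entrytime);

	seq_write_begin(&st->seq);
	atomic_store(&st->idle_sleeptime,
		     atomic_load(&st->idle_sleeptime) + delta);
	atomic_store(&st->idle_entrytime, now);
	atomic_store(&st->idle_active, false);
	seq_write_end(&st->seq);
}

/* Reader side, mirroring the retry loop in get_cpu_sleep_time_us(). */
static int64_t stats_read_idle(struct sleep_stats *st, int64_t now)
{
	unsigned int seq;
	int64_t idle;

	do {
		seq = seq_read_begin(&st->seq);
		idle = atomic_load(&st->idle_sleeptime);
		if (atomic_load(&st->idle_active))
			idle += now - atomic_load(&st->idle_entrytime);
	} while (seq_read_retry(&st->seq, seq));

	return idle;
}
```

The reader never blocks the writer: it only retries when the sequence
changed underneath it, which is why the cost on the idle path is
limited to two counter increments per update.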