From patchwork Wed Feb 22 14:46:45 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Frederic Weisbecker
X-Patchwork-Id: 60564
From: Frederic Weisbecker
To: Thomas Gleixner
Cc: LKML, Frederic Weisbecker, Alexey Dobriyan, Wei Li, Peter Zijlstra,
 Mirsad Goran Todorovac, Yu Liao, Hillf Danton, Ingo Molnar
Subject: [PATCH 4/8] timers/nohz: Add a comment about broken iowait counter update race
Date: Wed, 22 Feb 2023 15:46:45 +0100
Message-Id: <20230222144649.624380-5-frederic@kernel.org>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20230222144649.624380-1-frederic@kernel.org>
References: <20230222144649.624380-1-frederic@kernel.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

The per-cpu iowait task counter is incremented locally upon sleeping.
But since the task can be woken to (and by) another CPU, the counter may
then be decremented remotely. This is the source of a race between the
readers and the writer of idle/iowait sleeptime.

The following scenario shows an example where a /proc/stat reader
observes a pending sleep time as IO whereas that pending sleep time
is eventually accounted as non-IO.

    CPU 0                       CPU  1                  CPU 2
    -----                       -----                   ------
    //io_schedule() TASK A
    current->in_iowait = 1
    rq(0)->nr_iowait++
    //switch to idle
                        // READ /proc/stat
                        // See nr_iowait_cpu(0) == 1
                        return ts->iowait_sleeptime +
                               ktime_sub(ktime_get(), ts->idle_entrytime)

                                                        //try_to_wake_up(TASK A)
                                                        rq(0)->nr_iowait--
    //idle exit
    // See nr_iowait_cpu(0) == 0
    ts->idle_sleeptime += ktime_sub(ktime_get(), ts->idle_entrytime)

As a result subsequent reads on /proc/stat may expose backward progress.

This is unfortunately hardly fixable. Just add a comment about that
condition.

Acked-by: Peter Zijlstra (Intel)
Cc: Hillf Danton
Cc: Yu Liao
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: Wei Li
Cc: Alexey Dobriyan
Cc: Mirsad Goran Todorovac
Signed-off-by: Frederic Weisbecker
---
 kernel/time/tick-sched.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 90d9b7b29875..edd6e9f26d16 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -705,7 +705,10 @@ static u64 get_cpu_sleep_time_us(struct tick_sched *ts, ktime_t *sleeptime,
  * counters if NULL.
  *
  * Return the cumulative idle time (since boot) for a given
- * CPU, in microseconds.
+ * CPU, in microseconds. Note this is partially broken due to
+ * the counter of iowait tasks that can be remotely updated without
+ * any synchronization. Therefore it is possible to observe backward
+ * values within two consecutive reads.
  *
  * This time is measured via accounting rather than sampling,
  * and is as accurate as ktime_get() is.
@@ -728,7 +731,10 @@ EXPORT_SYMBOL_GPL(get_cpu_idle_time_us);
  * counters if NULL.
  *
  * Return the cumulative iowait time (since boot) for a given
- * CPU, in microseconds.
+ * CPU, in microseconds. Note this is partially broken due to
+ * the counter of iowait tasks that can be remotely updated without
+ * any synchronization. Therefore it is possible to observe backward
+ * values within two consecutive reads.
  *
  * This time is measured via accounting rather than sampling,
  * and is as accurate as ktime_get() is.
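
For illustration, the scenario from the changelog can be replayed
deterministically in user space. The sketch below is a simplified model
and not kernel code: struct cpu_model, read_iowait_us(), idle_exit(),
replay_race() and the hard-coded timestamps are all hypothetical names
and values chosen for the example. It only assumes what the changelog
states, namely that the reader attributes the pending sleep to iowait
when nr_iowait is non-zero, and that the idle-exit path makes the same
decision later with a counter that may have been decremented remotely.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space model of one CPU's idle accounting state. */
struct cpu_model {
	int nr_iowait;             /* models rq(0)->nr_iowait */
	int idle_active;           /* non-zero while the CPU is idle */
	uint64_t idle_entrytime;   /* fake clock value at idle entry */
	uint64_t idle_sleeptime;   /* accumulated non-IO idle time */
	uint64_t iowait_sleeptime; /* accumulated iowait idle time */
};

/*
 * Reader side: report the accumulated iowait time, plus the pending
 * sleep if the CPU is idle and an iowait task is currently counted.
 */
static uint64_t read_iowait_us(const struct cpu_model *c, uint64_t now)
{
	uint64_t t = c->iowait_sleeptime;

	if (c->idle_active && c->nr_iowait > 0)
		t += now - c->idle_entrytime;
	return t;
}

/*
 * Writer side: on idle exit, fold the pending sleep into exactly one
 * counter, chosen from the value of nr_iowait at that moment.
 */
static void idle_exit(struct cpu_model *c, uint64_t now)
{
	uint64_t delta = now - c->idle_entrytime;

	assert(c->idle_active);
	if (c->nr_iowait > 0)
		c->iowait_sleeptime += delta;
	else
		c->idle_sleeptime += delta;
	c->idle_active = 0;
}

/*
 * Replay the interleaving from the changelog with a fake clock.
 * Returns 1 if the second iowait read is smaller than the first.
 */
static int replay_race(uint64_t *first, uint64_t *second)
{
	struct cpu_model cpu0 = { 0 };

	/* CPU 0: io_schedule() for TASK A, then switch to idle at t=100 */
	cpu0.nr_iowait++;
	cpu0.idle_active = 1;
	cpu0.idle_entrytime = 100;

	/* CPU 1: read at t=150, sees nr_iowait == 1 */
	*first = read_iowait_us(&cpu0, 150);

	/* CPU 2: try_to_wake_up(TASK A) decrements the counter remotely */
	cpu0.nr_iowait--;

	/* CPU 0: idle exit at t=200, sees nr_iowait == 0 */
	idle_exit(&cpu0, 200);

	/* CPU 1: second read at t=250 */
	*second = read_iowait_us(&cpu0, 250);

	return *second < *first;
}
```

With these timestamps, replay_race() returns 1 with first == 50 and
second == 0: the 50 units of pending sleep reported as iowait by the
first read end up in idle_sleeptime instead, so the reported iowait
value goes backward between two consecutive reads.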