Message ID | 2692681.mvXUDI8C0e@kreacher |
---|---|
Headers |
From: "Rafael J. Wysocki" <rjw@rjwysocki.net>
To: Linux PM <linux-pm@vger.kernel.org>
Cc: Zhang Rui <rui.zhang@intel.com>, Linux ACPI <linux-acpi@vger.kernel.org>, LKML <linux-kernel@vger.kernel.org>, Daniel Lezcano <daniel.lezcano@linaro.org>, Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>, Viresh Kumar <viresh.kumar@linaro.org>, Quanxian Wang <quanxian.wang@intel.com>
Subject: [PATCH v2 0/4] thermal: core/ACPI: Fix processor cooling device regression
Date: Mon, 13 Mar 2023 15:24:27 +0100
Message-ID: <2692681.mvXUDI8C0e@kreacher>
MIME-Version: 1.0
Content-Transfer-Encoding: 7Bit
Content-Type: text/plain; charset="UTF-8"
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org |
Series |
thermal: core/ACPI: Fix processor cooling device regression
|
|
Message
Rafael J. Wysocki
March 13, 2023, 2:24 p.m. UTC
Hi All,

The first revision of this patch series was posted as

https://lore.kernel.org/linux-pm/2148907.irdbgypaU6@kreacher/

As reported by Rui in this thread:

Link: https://lore.kernel.org/linux-pm/53ec1f06f61c984100868926f282647e57ecfb2d.camel@intel.com/

some recent changes in the thermal core cause the CPU cooling devices registered by the ACPI processor driver to become unusable in some cases and somewhat crippled in general.

The problem is that the ACPI processor driver changes its ->get_max_state() callback return value depending on whether or not cpufreq is available and there is a cpufreq policy for a given CPU. However, the thermal core has always assumed that the return value of that callback will not change, which in fact is relied on by the cooling device statistics code. In particular, when the ->get_max_state() return value grows, the memory buffer allocated for storing the statistics will be too small and corruption may ensue as a result.

For this reason, the issue needs to be addressed in the ACPI processor driver and not in the thermal core, but the core needs to help somewhat too. Namely, it needs to provide a helper allowing an interested driver to update the max_state value for an already registered cooling device in certain situations, which will also cause the statistics to be rebuilt.

This series implements the above and for details please refer to the individual patch changelogs.

Thanks!
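The buffer-size problem described in the cover letter can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `struct model_cdev`, `model_stats_setup()`, `model_stats_account()` and `model_cdev_update()` are hypothetical names invented for illustration, not the kernel's structures or the helper added by the series. The model refuses an out-of-range state where unguarded code would write past the end of the buffer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical user-space stand-in for a cooling device; not the kernel's struct. */
struct model_cdev {
	unsigned long max_state;      /* range the statistics were sized for */
	unsigned long *time_in_state; /* one counter per state: max_state + 1 entries */
};

/* Allocate zeroed statistics sized for the current max_state. */
int model_stats_setup(struct model_cdev *cdev)
{
	assert(cdev != NULL);
	cdev->time_in_state = calloc(cdev->max_state + 1,
				     sizeof(*cdev->time_in_state));
	return cdev->time_in_state ? 0 : -1;
}

/*
 * Account one visit to a state. A state above max_state does not fit in
 * the buffer, so the model rejects it instead of writing out of bounds.
 */
int model_stats_account(struct model_cdev *cdev, unsigned long state)
{
	if (state > cdev->max_state)
		return -1;
	cdev->time_in_state[state]++;
	return 0;
}

/*
 * Model of an "update" helper: record the new max_state and rebuild the
 * statistics so the buffer matches the new range. Old counters are dropped.
 */
int model_cdev_update(struct model_cdev *cdev, unsigned long new_max_state)
{
	unsigned long *fresh = calloc(new_max_state + 1, sizeof(*fresh));

	if (!fresh)
		return -1;
	free(cdev->time_in_state);
	cdev->time_in_state = fresh;
	cdev->max_state = new_max_state;
	return 0;
}
```

In this model, a device set up with max_state 3 cannot account state 5 until `model_cdev_update()` has been called with a larger range, which mirrors why the statistics have to be rebuilt whenever the driver's maximum state changes.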
Comments
Hi, Rafael,

The only concern to me is that, in thermal_cooling_device_update(), we should handle the cases where the cooling device is currently used by one or more thermal zones. Say, something like

	list_for_each_entry(pos, &cdev->thermal_instances, cdev_node) {
		/* e.g. what to do if tz1 set it to state 1 previously */
	}

I have not got a clear idea what we should do here.

But given that I have confirmed that this patch series fixes the original problem, and the ACPI passive cooling is unlikely to be triggered before CPUFREQ_CREATE_POLICY notification, probably we can address that problem later.

Tested-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>

thanks,
rui

On Mon, 2023-03-13 at 15:24 +0100, Rafael J. Wysocki wrote:
> Hi All,
>
> The first revision of this patch series was posted as
>
> https://lore.kernel.org/linux-pm/2148907.irdbgypaU6@kreacher/
>
> As reported by Rui in this thread:
>
> Link: https://lore.kernel.org/linux-pm/53ec1f06f61c984100868926f282647e57ecfb2d.camel@intel.com/
>
> some recent changes in the thermal core cause the CPU cooling devices
> registered by the ACPI processor driver to become unusable in some cases
> and somewhat crippled in general.
>
> The problem is that the ACPI processor driver changes its ->get_max_state()
> callback return value depending on whether or not cpufreq is available and
> there is a cpufreq policy for a given CPU. However, the thermal core has
> always assumed that the return value of that callback will not change, which
> in fact is relied on by the cooling device statistics code. In particular,
> when the ->get_max_state() grows, the memory buffer allocated for storing the
> statistics will be too small and corruption may ensue as a result.
>
> For this reason, the issue needs to be addressed in the ACPI processor driver
> and not in the thermal core, but the core needs to help somewhat too. Namely,
> it needs to provide a helper allowing an interested driver to update the
> max_state value for an already registered cooling device in certain situations
> which will also cause the statistics to be rebuilt.
>
> This series implements the above and for details please refer to the individual
> patch changelogs.
>
> Thanks!
On Mon, Mar 13, 2023 at 5:47 PM Zhang, Rui <rui.zhang@intel.com> wrote:
>
> Hi, Rafael,
>
> The only concern to me is that, in thermal_cooling_device_update(), we
> should handle the cases that the cooling device is current used by
> one/more thermal zone. say, something like
>
> list_for_each_entry(pos, &cdev->thermal_instances, cdev_node) {
>         /* e.g. what to do if tz1 set it to state 1 previously */
> }
> I have not got a clear idea what we should do here.

For each instance, set upper to max_state if above it and set target to upper if above it I'd say.

I guess otherwise there may be some confusion in principle and I have missed that piece, so thanks for pointing it out!

> But given that I have confirmed that this patch series fixes the
> original problem, and the ACPI passive cooling is unlikely to be
> triggered before CPUFREQ_CREATE_POLICY notification, probably we can
> address that problem later.
>
> Tested-by: Zhang Rui <rui.zhang@intel.com>
> Reviewed-by: Zhang Rui <rui.zhang@intel.com>

Thank you!
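The rule suggested in this reply ("set upper to max_state if above it and set target to upper if above it") can be sketched as a loop over a plain array. This is a hedged user-space illustration, not the code from the series: `struct model_instance`, `MODEL_NO_TARGET` and `model_clamp_instances()` are hypothetical names, and skipping instances that have no target set is an assumption added here so that a "no target" sentinel is not mistaken for a very large state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sentinel meaning "no target requested"; an assumption of this model. */
#define MODEL_NO_TARGET (~0UL)

/* Hypothetical stand-in for one thermal zone's use of a cooling device. */
struct model_instance {
	unsigned long upper;  /* highest state this zone may request */
	unsigned long target; /* state this zone currently requests */
};

/*
 * After max_state has changed, pull every instance back into range:
 * clamp upper to max_state, then clamp target to upper.
 */
void model_clamp_instances(struct model_instance *inst, size_t n,
			   unsigned long max_state)
{
	assert(inst != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (inst[i].upper > max_state)
			inst[i].upper = max_state;

		/* Assumption: an instance without a target is left alone. */
		if (inst[i].target == MODEL_NO_TARGET)
			continue;

		if (inst[i].target > inst[i].upper)
			inst[i].target = inst[i].upper;
	}
}
```

Note that this rule only ever lowers values, so it covers the case where max_state shrinks; it does nothing for an instance whose upper limit was derived from a smaller max_state that has since grown.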
On Mon, 2023-03-13 at 19:02 +0100, Rafael J. Wysocki wrote:
> On Mon, Mar 13, 2023 at 5:47 PM Zhang, Rui <rui.zhang@intel.com> wrote:
> > Hi, Rafael,
> >
> > The only concern to me is that, in thermal_cooling_device_update(), we
> > should handle the cases that the cooling device is current used by
> > one/more thermal zone. say, something like
> >
> > list_for_each_entry(pos, &cdev->thermal_instances, cdev_node) {
> >         /* e.g. what to do if tz1 set it to state 1 previously */
> > }
> > I have not got a clear idea what we should do here.
>
> For each instance, set upper to max_state if above it and set target
> to upper if above it I'd say.
>

Say, before update,
	max_state: 3
	target: 1
	upper is set to 3 because upper == THERMAL_NO_LIMIT during binding
then, after update,
	max_state: 7
	target: ?
	upper: ?

Maybe we should do unbind and rebind, and then set target to THERMAL_NO_TARGET? It is really the governor that should set the target.

> I guess otherwise there may be some confusion in principle and I have
> missed that piece, so thanks for pointing it out!
>
> > But given that I have confirmed that this patch series fixes the
> > original problem, and the ACPI passive cooling is unlikely to be
> > triggered before CPUFREQ_CREATE_POLICY notification, probably we can
> > address that problem later.
> >
> > Tested-by: Zhang Rui <rui.zhang@intel.com>
> > Reviewed-by: Zhang Rui <rui.zhang@intel.com>
>

I recalled that patchwork used to catch these tags here and apply them to every patch in the series, so the tags are appended automatically when applying the patches. But it apparently does not work now. Let me reply to the patches one by one.

thanks,
rui