Message ID | 20230324070807.6342-2-rui.zhang@intel.com |
---|---|
State | New |
Headers |
From: Zhang Rui <rui.zhang@intel.com>
To: linux-pm@vger.kernel.org, rafael.j.wysocki@intel.com, daniel.lezcano@linaro.org
Cc: linux-kernel@vger.kernel.org
Subject: [PATCH 2/5] thermal/core: Reset cooling state during cooling device unregistration
Date: Fri, 24 Mar 2023 15:08:04 +0800
Message-Id: <20230324070807.6342-2-rui.zhang@intel.com>
In-Reply-To: <20230324070807.6342-1-rui.zhang@intel.com>
References: <20230324070807.6342-1-rui.zhang@intel.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
List-ID: <linux-kernel.vger.kernel.org> |
Series |
[v2,1/5] thermal/core: Update cooling device during thermal zone unregistration
|
|
Commit Message
Zhang, Rui
March 24, 2023, 7:08 a.m. UTC
When unregistering a cooling device, it is possible that the cooling
device has been activated. And once the cooling device is unregistered,
no one will deactivate it anymore.
Reset cooling state during cooling device unregistration.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
---
In theory, the problem that this patch fixes can be triggered on a
platform with ACPI Active cooling, by
1. overheat the system to trigger ACPI active cooling
2. unload ACPI fan driver
3. check if the fan is still spinning
But I don't have such a system so I didn't trigger the problem and I
only did build & boot test.
---
drivers/thermal/thermal_core.c | 4 ++++
1 file changed, 4 insertions(+)
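Before unloading the fan driver in step 2, it helps to confirm from userspace that active cooling has really been triggered. A minimal sketch, assuming the fan is exposed through the thermal sysfs class as `/sys/class/thermal/cooling_deviceN/cur_state`; the helper names `read_state_file` and `read_cooling_state` are made up for illustration, and which index `N` belongs to the ACPI fan is platform specific (the `type` attribute next to `cur_state` can be used to find it):

```c
#include <stdio.h>

/*
 * Read one integer from a sysfs-style attribute file.
 * Returns 0 and stores the value in *out on success, -1 on any error
 * (file missing, unreadable, or not starting with a number).
 */
int read_state_file(const char *path, long *out)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%ld", out) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/*
 * Hypothetical convenience wrapper: build the cur_state path for cooling
 * device index idx and read it. The caller is assumed to know which index
 * is the fan.
 */
int read_cooling_state(int idx, long *out)
{
	char path[128];

	snprintf(path, sizeof(path),
		 "/sys/class/thermal/cooling_device%d/cur_state", idx);
	return read_state_file(path, out);
}
```

A non-zero value before the unload indicates the device is active. Once the driver is unloaded the cooling device is unregistered, so its sysfs node is expected to disappear and the read to fail; whether the fan is still spinning then has to be checked by other means.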
Comments
On Fri, Mar 24, 2023 at 8:08 AM Zhang Rui <rui.zhang@intel.com> wrote:
>
> When unregistering a cooling device, it is possible that the cooling
> device has been activated. And once the cooling device is unregistered,
> no one will deactivate it anymore.
>
> Reset cooling state during cooling device unregistration.
>
> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
> ---
> In theory, the problem that this patch fixes can be triggered on a
> platform with ACPI Active cooling, by
> 1. overheat the system to trigger ACPI active cooling
> 2. unload ACPI fan driver
> 3. check if the fan is still spinning
> But I don't have such a system so I didn't trigger the problem and I
> only did build & boot test.

So I'm not sure if this change is actually safe.

In the example above, the system will still need the fan to spin after
the ACPI fan driver is unloaded in order to cool down, won't it?

> ---
>  drivers/thermal/thermal_core.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> index 30ff39154598..fd54e6c10b60 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -1192,6 +1192,10 @@ void thermal_cooling_device_unregister(struct thermal_cooling_device *cdev)
>  		}
>  	}
>
> +	mutex_lock(&cdev->lock);
> +	cdev->ops->set_cur_state(cdev, 0);
> +	mutex_unlock(&cdev->lock);
> +
>  	mutex_unlock(&thermal_list_lock);
>
>  	device_unregister(&cdev->device);
> --
> 2.25.1
>

On Fri, 2023-03-24 at 14:19 +0100, Rafael J. Wysocki wrote:
> On Fri, Mar 24, 2023 at 8:08 AM Zhang Rui <rui.zhang@intel.com> wrote:
> > When unregistering a cooling device, it is possible that the cooling
> > device has been activated. And once the cooling device is unregistered,
> > no one will deactivate it anymore.
> >
> > Reset cooling state during cooling device unregistration.
> >
> > Signed-off-by: Zhang Rui <rui.zhang@intel.com>
> > ---
> > In theory, the problem that this patch fixes can be triggered on a
> > platform with ACPI Active cooling, by
> > 1. overheat the system to trigger ACPI active cooling
> > 2. unload ACPI fan driver
> > 3. check if the fan is still spinning
> > But I don't have such a system so I didn't trigger the problem and I
> > only did build & boot test.
>
> So I'm not sure if this change is actually safe.
>
> In the example above, the system will still need the fan to spin after
> the ACPI fan driver is unloaded in order to cool down, won't it?

Then we can argue that the ACPI fan driver should not be unloaded in
this case.

Actually, this is the same situation as patch 1/5.
Patch 1/5 fixes the problem that the cooling state is not restored to 0
when unloading the thermal driver, and this fixes the same problem when
unloading the cooling device driver.

thanks,
rui

> > ---
> >  drivers/thermal/thermal_core.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> > index 30ff39154598..fd54e6c10b60 100644
> > --- a/drivers/thermal/thermal_core.c
> > +++ b/drivers/thermal/thermal_core.c
> > @@ -1192,6 +1192,10 @@ void thermal_cooling_device_unregister(struct thermal_cooling_device *cdev)
> >  		}
> >  	}
> >
> > +	mutex_lock(&cdev->lock);
> > +	cdev->ops->set_cur_state(cdev, 0);
> > +	mutex_unlock(&cdev->lock);
> > +
> >  	mutex_unlock(&thermal_list_lock);
> >
> >  	device_unregister(&cdev->device);
> > --
> > 2.25.1
> >

On Mon, Mar 27, 2023 at 4:50 PM Zhang, Rui <rui.zhang@intel.com> wrote:
>
> On Fri, 2023-03-24 at 14:19 +0100, Rafael J. Wysocki wrote:
> > On Fri, Mar 24, 2023 at 8:08 AM Zhang Rui <rui.zhang@intel.com> wrote:
> > > When unregistering a cooling device, it is possible that the cooling
> > > device has been activated. And once the cooling device is unregistered,
> > > no one will deactivate it anymore.
> > >
> > > Reset cooling state during cooling device unregistration.
> > >
> > > Signed-off-by: Zhang Rui <rui.zhang@intel.com>
> > > ---
> > > In theory, the problem that this patch fixes can be triggered on a
> > > platform with ACPI Active cooling, by
> > > 1. overheat the system to trigger ACPI active cooling
> > > 2. unload ACPI fan driver
> > > 3. check if the fan is still spinning
> > > But I don't have such a system so I didn't trigger the problem and I
> > > only did build & boot test.
> >
> > So I'm not sure if this change is actually safe.
> >
> > In the example above, the system will still need the fan to spin after
> > the ACPI fan driver is unloaded in order to cool down, won't it?
>
> Then we can argue that the ACPI fan driver should not be unloaded in
> this case.

I don't think that whether or not the driver is expected to be
unloaded at a given time has any bearing on how it should behave when
actually unloaded.

Leaving the cooling device in its current state is "safe" from the
thermal control perspective, but it may affect the general user
experience (which may include performance too) going forward, so there
is a tradeoff.

You can argue that even if the cooling device is reset on the driver
removal, there should be another thermal control mechanism in place
that will take care of the overheat condition instead of it, but that
mechanism may be an emergency system shutdown.

What do the other cooling device drivers do in general when they get
removed?

> Actually, this is the same situation as patch 1/5.
> Patch 1/5 fixes the problem that the cooling state is not restored to 0
> when unloading the thermal driver, and this fixes the same problem when
> unloading the cooling device driver.

Right, it is analogous.

On Mon, 2023-03-27 at 17:13 +0200, Rafael J. Wysocki wrote:
> On Mon, Mar 27, 2023 at 4:50 PM Zhang, Rui <rui.zhang@intel.com> wrote:
> > On Fri, 2023-03-24 at 14:19 +0100, Rafael J. Wysocki wrote:
> > > On Fri, Mar 24, 2023 at 8:08 AM Zhang Rui <rui.zhang@intel.com> wrote:
> > > > When unregistering a cooling device, it is possible that the cooling
> > > > device has been activated. And once the cooling device is unregistered,
> > > > no one will deactivate it anymore.
> > > >
> > > > Reset cooling state during cooling device unregistration.
> > > >
> > > > Signed-off-by: Zhang Rui <rui.zhang@intel.com>
> > > > ---
> > > > In theory, the problem that this patch fixes can be triggered on a
> > > > platform with ACPI Active cooling, by
> > > > 1. overheat the system to trigger ACPI active cooling
> > > > 2. unload ACPI fan driver
> > > > 3. check if the fan is still spinning
> > > > But I don't have such a system so I didn't trigger the problem and I
> > > > only did build & boot test.
> > >
> > > So I'm not sure if this change is actually safe.
> > >
> > > In the example above, the system will still need the fan to spin after
> > > the ACPI fan driver is unloaded in order to cool down, won't it?
> >
> > Then we can argue that the ACPI fan driver should not be unloaded in
> > this case.
>
> I don't think that whether or not the driver is expected to be
> unloaded at a given time has any bearing on how it should behave when
> actually unloaded.
>
> Leaving the cooling device in its current state is "safe" from the
> thermal control perspective, but it may affect the general user
> experience (which may include performance too) going forward, so there
> is a tradeoff.

Right.
If we don't have a third choice, then the question is simple.
"thermal safety" vs. "user experience"?

I'd vote for "thermal safety" and drop this patch series.

>
> What do the other cooling device drivers do in general when they get
> removed?

No cooling device driver has extra handling after cdev unregistration.

thanks,
rui

On Tue, Mar 28, 2023 at 4:46 AM Zhang, Rui <rui.zhang@intel.com> wrote:
>
> On Mon, 2023-03-27 at 17:13 +0200, Rafael J. Wysocki wrote:
> > On Mon, Mar 27, 2023 at 4:50 PM Zhang, Rui <rui.zhang@intel.com> wrote:
> > > On Fri, 2023-03-24 at 14:19 +0100, Rafael J. Wysocki wrote:
> > > > On Fri, Mar 24, 2023 at 8:08 AM Zhang Rui <rui.zhang@intel.com> wrote:
> > > > > When unregistering a cooling device, it is possible that the cooling
> > > > > device has been activated. And once the cooling device is unregistered,
> > > > > no one will deactivate it anymore.
> > > > >
> > > > > Reset cooling state during cooling device unregistration.
> > > > >
> > > > > Signed-off-by: Zhang Rui <rui.zhang@intel.com>
> > > > > ---
> > > > > In theory, the problem that this patch fixes can be triggered on a
> > > > > platform with ACPI Active cooling, by
> > > > > 1. overheat the system to trigger ACPI active cooling
> > > > > 2. unload ACPI fan driver
> > > > > 3. check if the fan is still spinning
> > > > > But I don't have such a system so I didn't trigger the problem and I
> > > > > only did build & boot test.
> > > >
> > > > So I'm not sure if this change is actually safe.
> > > >
> > > > In the example above, the system will still need the fan to spin after
> > > > the ACPI fan driver is unloaded in order to cool down, won't it?
> > >
> > > Then we can argue that the ACPI fan driver should not be unloaded in
> > > this case.
> >
> > I don't think that whether or not the driver is expected to be
> > unloaded at a given time has any bearing on how it should behave when
> > actually unloaded.
> >
> > Leaving the cooling device in its current state is "safe" from the
> > thermal control perspective, but it may affect the general user
> > experience (which may include performance too) going forward, so there
> > is a tradeoff.
>
> Right.
> If we don't have a third choice, then the question is simple.
> "thermal safety" vs. "user experience"?
>
> I'd vote for "thermal safety" and drop this patch series.

Works for me.

> > What do the other cooling device drivers do in general when they get
> > removed?
>
> No cooling device driver has extra handling after cdev unregistration.

However, the question regarding what to do when the driver of a
cooling device in use is being removed is a valid one.

One possible approach that comes to mind could be to defer the driver
removal until the overheat condition goes away, but anyway it would be
better to do that in the core IMV.

On Tue, 2023-03-28 at 19:54 +0200, Rafael J. Wysocki wrote:
> > > What do the other cooling device drivers do in general when they get
> > > removed?
> >
> > No cooling device driver has extra handling after cdev unregistration.
>
> However, the question regarding what to do when the driver of a
> cooling device in use is being removed is a valid one.
>
> One possible approach that comes to mind could be to defer the driver
> removal until the overheat condition goes away, but anyway it would be
> better to do that in the core IMV.

In this case, we should guarantee that the thermal zone driver is still
functional, i.e. it can still get temperature change notifications and
update the thermal zone. I doubt if current thermal zone drivers can
guarantee this.

Given that this is a rare case, and the current behavior is not perfect
but still acceptable, maybe we can leave this low priority for now.

thanks,
rui

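The deferral idea floated above can be illustrated with a small userspace model. This is only a sketch of one possible shape, not something the thread agreed on or that exists in the thermal core: `struct demo_cdev` and `demo_cdev_try_unregister` are hypothetical names, and the model assumes that something else (a still-functional thermal zone, which is exactly the guarantee questioned above) eventually lowers `cur_state` back to 0.

```c
#include <errno.h>

/* Hypothetical stand-in; not the kernel's struct thermal_cooling_device. */
struct demo_cdev {
	unsigned long cur_state;	/* 0 means the device is idle */
	int registered;			/* 1 while the device is registered */
};

/*
 * Refuse removal while the device is still actively cooling.
 * Returns 0 once the device has been unregistered, or -EBUSY when the
 * caller should retry after the overheat condition has gone away.
 */
int demo_cdev_try_unregister(struct demo_cdev *cdev)
{
	if (cdev->cur_state != 0)
		return -EBUSY;

	cdev->registered = 0;
	return 0;
}
```

In this model the fan keeps running and stays under control until the zone cools down. The cost is that removal can be postponed indefinitely if nothing ever brings the state back to 0.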
diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 30ff39154598..fd54e6c10b60 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -1192,6 +1192,10 @@ void thermal_cooling_device_unregister(struct thermal_cooling_device *cdev)
 		}
 	}
 
+	mutex_lock(&cdev->lock);
+	cdev->ops->set_cur_state(cdev, 0);
+	mutex_unlock(&cdev->lock);
+
 	mutex_unlock(&thermal_list_lock);
 
 	device_unregister(&cdev->device);
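
The effect of the four added lines can be modelled in userspace to make the tradeoff discussed in the thread concrete. The sketch below is not kernel code: `struct mock_cdev`, `mock_cdev_init` and `mock_cdev_unregister` are hypothetical stand-ins, a pthread mutex plays the role of `cdev->lock`, and the `reset_on_unregister` flag exists only so that the behavior with and without the patch can be compared.

```c
#include <pthread.h>
#include <stddef.h>

struct mock_cdev;

/* Mirrors the one callback the patch relies on. */
struct mock_cdev_ops {
	int (*set_cur_state)(struct mock_cdev *cdev, unsigned long state);
};

/* Hypothetical userspace stand-in for a cooling device. */
struct mock_cdev {
	pthread_mutex_t lock;		/* plays the role of cdev->lock */
	unsigned long cur_state;	/* 0 means "not cooling" */
	int registered;
	const struct mock_cdev_ops *ops;
};

/* A fake fan: it just remembers the requested state. */
static int mock_fan_set_cur_state(struct mock_cdev *cdev, unsigned long state)
{
	cdev->cur_state = state;
	return 0;
}

static const struct mock_cdev_ops mock_fan_ops = {
	.set_cur_state = mock_fan_set_cur_state,
};

void mock_cdev_init(struct mock_cdev *cdev)
{
	pthread_mutex_init(&cdev->lock, NULL);
	cdev->cur_state = 0;
	cdev->registered = 1;
	cdev->ops = &mock_fan_ops;
}

/*
 * reset_on_unregister != 0 models the patched behavior: force state 0
 * under the device lock before the device goes away.
 * reset_on_unregister == 0 models the current behavior: the device keeps
 * whatever state it had when it was unregistered.
 */
void mock_cdev_unregister(struct mock_cdev *cdev, int reset_on_unregister)
{
	if (reset_on_unregister) {
		pthread_mutex_lock(&cdev->lock);
		cdev->ops->set_cur_state(cdev, 0);
		pthread_mutex_unlock(&cdev->lock);
	}
	cdev->registered = 0;
}
```

With the reset, an active fan is switched off at unregistration, which is the thermal-safety concern raised in the review. Without it, the fan keeps running at its last state and nothing deactivates it afterwards, which is the situation the patch description starts from.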