thermal: intel: Don't set HFI status bit to 1

Message ID 20221214020651.1362731-1-srinivas.pandruvada@linux.intel.com
State New
Series thermal: intel: Don't set HFI status bit to 1

Commit Message

Srinivas Pandruvada Dec. 14, 2022, 2:06 a.m. UTC
When the CPU doesn't support HFI (Hardware Feedback Interface), don't include
BIT(26) in the mask of package status bits that are written as 1 to prevent
clearing them. Otherwise, this results in:
    unchecked MSR access error: WRMSR to 0x1b1
      (tried to write 0x0000000004000aa8)
      at rIP: 0xffffffff8b8559fe (throttle_active_work+0xbe/0x1b0)

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Fixes: 6fe1e64b6026 ("thermal: intel: Prevent accidental clearing of HFI status")
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
---
 drivers/thermal/intel/therm_throt.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
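
For reference, the value in the error above can be decoded into bit
positions with a small user-space C sketch. This is not kernel code, and the
helper name set_bit_positions is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Collect the positions of the set bits in v, lowest first.
 * out must have room for 64 entries. Returns the number of bits found.
 */
static size_t set_bit_positions(uint64_t v, unsigned int *out)
{
	size_t n = 0;

	assert(out != NULL);
	for (unsigned int pos = 0; pos < 64; pos++) {
		if (v & (UINT64_C(1) << pos))
			out[n++] = pos;
	}
	return n;
}
```

Called with 0x0000000004000aa8, it reports bits 3, 5, 7, 9, 11 and 26, so
the rejected write did carry bit 26 set.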
  

Comments

Linus Torvalds Dec. 14, 2022, 3:16 a.m. UTC | #1
On Tue, Dec 13, 2022 at 6:07 PM Srinivas Pandruvada
<srinivas.pandruvada@linux.intel.com> wrote:
>
> When CPU doesn't support HFI (Hardware Feedback Interface), don't include
> BIT 26 in the mask to prevent clearing. otherwise this results in:
>     unchecked MSR access error: WRMSR to 0x1b1
>       (tried to write 0x0000000004000aa8)
>       at rIP: 0xffffffff8b8559fe (throttle_active_work+0xbe/0x1b0)
>
> Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
> Fixes: 6fe1e64b6026 ("thermal: intel: Prevent accidental clearing of HFI status")
> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>

You can add my tested-by as well, it seems to fix the issue.

Of course, it could be that I just didn't happen to trigger the
throttling in my test just now, so that testing is pretty limited, but
at least from a very quick check it seems good.

                 Linus
  
Rafael J. Wysocki Dec. 14, 2022, 1:53 p.m. UTC | #2
On Wed, Dec 14, 2022 at 4:16 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Tue, Dec 13, 2022 at 6:07 PM Srinivas Pandruvada
> <srinivas.pandruvada@linux.intel.com> wrote:
> >
> > When CPU doesn't support HFI (Hardware Feedback Interface), don't include
> > BIT 26 in the mask to prevent clearing. otherwise this results in:
> >     unchecked MSR access error: WRMSR to 0x1b1
> >       (tried to write 0x0000000004000aa8)
> >       at rIP: 0xffffffff8b8559fe (throttle_active_work+0xbe/0x1b0)
> >
> > Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
> > Fixes: 6fe1e64b6026 ("thermal: intel: Prevent accidental clearing of HFI status")
> > Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
>
> You can add my tested-by as well, it seems to fix the issue.
>
> Of course, it could be that I just didn't happen to trigger the
> throttling in my test just now, so that testing is pretty limited, but
> at least from a very quick check it seems good.

I've applied the patch for 6.2-rc1, thanks!
  

Patch

diff --git a/drivers/thermal/intel/therm_throt.c b/drivers/thermal/intel/therm_throt.c
index 4bb7fddaa143..2e22bb82b738 100644
--- a/drivers/thermal/intel/therm_throt.c
+++ b/drivers/thermal/intel/therm_throt.c
@@ -194,7 +194,7 @@  static const struct attribute_group thermal_attr_group = {
 #define THERM_STATUS_PROCHOT_LOG	BIT(1)
 
 #define THERM_STATUS_CLEAR_CORE_MASK (BIT(1) | BIT(3) | BIT(5) | BIT(7) | BIT(9) | BIT(11) | BIT(13) | BIT(15))
-#define THERM_STATUS_CLEAR_PKG_MASK  (BIT(1) | BIT(3) | BIT(5) | BIT(7) | BIT(9) | BIT(11) | BIT(26))
+#define THERM_STATUS_CLEAR_PKG_MASK  (BIT(1) | BIT(3) | BIT(5) | BIT(7) | BIT(9) | BIT(11))
 
 /*
  * Clear the bits in package thermal status register for bit = 1
@@ -211,6 +211,9 @@  void thermal_clear_package_intr_status(int level, u64 bit_mask)
 	} else {
 		msr  = MSR_IA32_PACKAGE_THERM_STATUS;
 		msr_val = THERM_STATUS_CLEAR_PKG_MASK;
+		if (boot_cpu_has(X86_FEATURE_HFI))
+			msr_val |= BIT(26);
+
 	}
 
 	msr_val &= ~bit_mask;
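
The effect of the change can be modelled outside the kernel. The sketch
below is a user-space approximation, assuming a boolean has_hfi in place of
boot_cpu_has(X86_FEATURE_HFI). The names pkg_clear_value and HFI_STATUS_BIT
are hypothetical and do not appear in the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BIT(n) (UINT64_C(1) << (n))

/* Package log bits written as 1 so the write leaves them alone (from the patch). */
#define PKG_MASK (BIT(1) | BIT(3) | BIT(5) | BIT(7) | BIT(9) | BIT(11))
/* Hypothetical name; the patch uses BIT(26) directly. */
#define HFI_STATUS_BIT BIT(26)

/* After the fix, the base mask must no longer contain the HFI bit. */
static_assert((PKG_MASK & HFI_STATUS_BIT) == 0, "base mask must exclude bit 26");

/*
 * Value that would be written to the package thermal status MSR.
 * has_hfi stands in for boot_cpu_has(X86_FEATURE_HFI);
 * bit_mask selects the status bits to clear (written as 0).
 */
static uint64_t pkg_clear_value(bool has_hfi, uint64_t bit_mask)
{
	uint64_t msr_val = PKG_MASK;

	if (has_hfi)
		msr_val |= HFI_STATUS_BIT;

	return msr_val & ~bit_mask;
}
```

With bit_mask = BIT(1), this returns 0xaa8 without HFI and 0x4000aa8 with
HFI. The second value matches the one in the reported error, which the
pre-fix mask produced on every CPU.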