thermal: intel: Don't set HFI status bit to 1
Commit Message
When the CPU doesn't support HFI (Hardware Feedback Interface), don't include
BIT 26 in the mask of bits written as 1 to prevent clearing. Otherwise this results in:
unchecked MSR access error: WRMSR to 0x1b1
(tried to write 0x0000000004000aa8)
at rIP: 0xffffffff8b8559fe (throttle_active_work+0xbe/0x1b0)
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Fixes: 6fe1e64b6026 ("thermal: intel: Prevent accidental clearing of HFI status")
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
---
drivers/thermal/intel/therm_throt.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
Comments
On Tue, Dec 13, 2022 at 6:07 PM Srinivas Pandruvada
<srinivas.pandruvada@linux.intel.com> wrote:
>
> When CPU doesn't support HFI (Hardware Feedback Interface), don't include
> BIT 26 in the mask to prevent clearing. otherwise this results in:
> unchecked MSR access error: WRMSR to 0x1b1
> (tried to write 0x0000000004000aa8)
> at rIP: 0xffffffff8b8559fe (throttle_active_work+0xbe/0x1b0)
>
> Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
> Fixes: 6fe1e64b6026 ("thermal: intel: Prevent accidental clearing of HFI status")
> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
You can add my tested-by as well, it seems to fix the issue.
Of course, it could be that I just didn't happen to trigger the
throttling in my test just now, so that testing is pretty limited, but
at least from a very quick check it seems good.
Linus
On Wed, Dec 14, 2022 at 4:16 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Tue, Dec 13, 2022 at 6:07 PM Srinivas Pandruvada
> <srinivas.pandruvada@linux.intel.com> wrote:
> >
> > When CPU doesn't support HFI (Hardware Feedback Interface), don't include
> > BIT 26 in the mask to prevent clearing. otherwise this results in:
> > unchecked MSR access error: WRMSR to 0x1b1
> > (tried to write 0x0000000004000aa8)
> > at rIP: 0xffffffff8b8559fe (throttle_active_work+0xbe/0x1b0)
> >
> > Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
> > Fixes: 6fe1e64b6026 ("thermal: intel: Prevent accidental clearing of HFI status")
> > Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
>
> You can add my tested-by as well, it seems to fix the issue.
>
> Of course, it could be that I just didn't happen to trigger the
> throttling in my test just now, so that testing is pretty limited, but
> at least from a very quick check it seems good.
I've applied the patch for 6.2-rc1, thanks!
@@ -194,7 +194,7 @@ static const struct attribute_group thermal_attr_group = {
 #define THERM_STATUS_PROCHOT_LOG BIT(1)

 #define THERM_STATUS_CLEAR_CORE_MASK (BIT(1) | BIT(3) | BIT(5) | BIT(7) | BIT(9) | BIT(11) | BIT(13) | BIT(15))
-#define THERM_STATUS_CLEAR_PKG_MASK (BIT(1) | BIT(3) | BIT(5) | BIT(7) | BIT(9) | BIT(11) | BIT(26))
+#define THERM_STATUS_CLEAR_PKG_MASK (BIT(1) | BIT(3) | BIT(5) | BIT(7) | BIT(9) | BIT(11))

 /*
  * Clear the bits in package thermal status register for bit = 1
@@ -211,6 +211,9 @@ void thermal_clear_package_intr_status(int level, u64 bit_mask)
 	} else {
 		msr = MSR_IA32_PACKAGE_THERM_STATUS;
 		msr_val = THERM_STATUS_CLEAR_PKG_MASK;
+		if (boot_cpu_has(X86_FEATURE_HFI))
+			msr_val |= BIT(26);
+
 	}

 	msr_val &= ~bit_mask;
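
For readers who want to check the arithmetic, here is a minimal userspace sketch of the package-level mask computation in the hunk above. It is not kernel code: `pkg_clear_value()` and `HFI_STATUS_BIT` are hypothetical names introduced only for illustration, `has_hfi` stands in for `boot_cpu_has(X86_FEATURE_HFI)`, and the local `BIT()` macro mimics the kernel one. It also assumes the reported fault happened while clearing BIT(1), which is consistent with the value in the error message but is not stated in the thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's BIT() macro. */
#define BIT(n) (UINT64_C(1) << (n))

/* Package mask as defined after the patch: BIT(26) is no longer included. */
#define THERM_STATUS_CLEAR_PKG_MASK \
	(BIT(1) | BIT(3) | BIT(5) | BIT(7) | BIT(9) | BIT(11))

/* Hypothetical name; the patch itself uses BIT(26) directly. */
#define HFI_STATUS_BIT BIT(26)

/*
 * Hypothetical helper: compute the value that would be written to the
 * package thermal status MSR. Bits left at 1 are preserved, bits in
 * bit_mask are written as 0 and therefore cleared.
 */
static uint64_t pkg_clear_value(bool has_hfi, uint64_t bit_mask)
{
	uint64_t msr_val = THERM_STATUS_CLEAR_PKG_MASK;

	/* Only keep the HFI status bit set when the CPU implements it. */
	if (has_hfi)
		msr_val |= HFI_STATUS_BIT;

	return msr_val & ~bit_mask;
}
```

Under these assumptions, `pkg_clear_value(true, BIT(1))` evaluates to 0x4000aa8, the value from the reported WRMSR error (the old mask set BIT(26) unconditionally), while `pkg_clear_value(false, BIT(1))` evaluates to 0xaa8, which leaves bit 26 untouched on CPUs without HFI.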