[1/1] clocksource: hyper-v: Rework clocksource and sched clock setup

Message ID 1686325621-16382-1-git-send-email-mikelley@microsoft.com
State New
Headers
Series [1/1] clocksource: hyper-v: Rework clocksource and sched clock setup |

Commit Message

Michael Kelley (LINUX) June 9, 2023, 3:47 p.m. UTC
  Current code assigns either the Hyper-V TSC page or MSR-based ref counter
as the sched clock. This may be sub-optimal in two cases. First, if there
is hardware support to ensure consistent TSC frequency across live
migrations and Hyper-V is using that support, the raw TSC is a faster
source of time than the Hyper-V TSC page.  Second, the MSR-based ref
counter is relatively slow because reads require a trap to the hypervisor.
As such, it should never be used as the sched clock. The native sched
clock based on the raw TSC or jiffies is much better.

Rework the sched clock setup so it is set to the TSC page only if
Hyper-V indicates that the TSC may have inconsistent frequency across
live migrations. Also, remove the code that sets the sched clock to
the MSR-based ref counter. In the cases where it is not set, the sched
clock will then be the native sched clock.

As part of the rework, always enable both the TSC page clocksource and
the MSR-based ref counter clocksource. Set the ratings so the TSC page
clocksource is preferred. While the MSR-based ref counter clocksource
is unlikely to ever be the default, having it available for manual
selection is convenient for development purposes.

Signed-off-by: Michael Kelley <mikelley@microsoft.com>
---
 drivers/clocksource/hyperv_timer.c | 54 ++++++++++++++++----------------------
 1 file changed, 23 insertions(+), 31 deletions(-)
  

Comments

Dexuan Cui June 15, 2023, 5:15 a.m. UTC | #1
> From: Michael Kelley (LINUX) <mikelley@microsoft.com>
> Sent: Friday, June 9, 2023 8:47 AM
> ...

Looks good to me. 

Reviewed-by: Dexuan Cui <decui@microsoft.com>
  
Daniel Lezcano June 19, 2023, 4:16 p.m. UTC | #2
On 09/06/2023 17:47, Michael Kelley wrote:
> Current code assigns either the Hyper-V TSC page or MSR-based ref counter
> as the sched clock. This may be sub-optimal in two cases. First, if there
> is hardware support to ensure consistent TSC frequency across live
> migrations and Hyper-V is using that support, the raw TSC is a faster
> source of time than the Hyper-V TSC page.  Second, the MSR-based ref
> counter is relatively slow because reads require a trap to the hypervisor.
> As such, it should never be used as the sched clock. The native sched
> clock based on the raw TSC or jiffies is much better.
> 
> Rework the sched clock setup so it is set to the TSC page only if
> Hyper-V indicates that the TSC may have inconsistent frequency across
> live migrations. Also, remove the code that sets the sched clock to
> the MSR-based ref counter. In the cases where it is not set, the sched
> clock will then be the native sched clock.
> 
> As part of the rework, always enable both the TSC page clocksource and
> the MSR-based ref counter clocksource. Set the ratings so the TSC page
> clocksource is preferred. While the MSR-based ref counter clocksource
> is unlikely to ever be the default, having it available for manual
> selection is convenient for development purposes.
> 
> Signed-off-by: Michael Kelley <mikelley@microsoft.com>

The patch does not apply, does it depend on another patch?

Rejected chunk:

--- drivers/clocksource/hyperv_timer.c
+++ drivers/clocksource/hyperv_timer.c
@@ -485,15 +485,9 @@ static u64 notrace read_hv_clock_msr_cs(struct 
clocksource *arg)
  	return read_hv_clock_msr();
  }

-static u64 noinstr read_hv_sched_clock_msr(void)
-{
-	return (read_hv_clock_msr() - hv_sched_clock_offset) *
-		(NSEC_PER_SEC / HV_CLOCK_HZ);
-}
-
  static struct clocksource hyperv_cs_msr = {
  	.name	= "hyperv_clocksource_msr",
-	.rating	= 500,
+	.rating	= 495,
  	.read	= read_hv_clock_msr_cs,
  	.mask	= CLOCKSOURCE_MASK(64),
  	.flags	= CLOCK_SOURCE_IS_CONTINUOUS,
  
Michael Kelley (LINUX) June 19, 2023, 4:44 p.m. UTC | #3
From: Daniel Lezcano <daniel.lezcano@linaro.org> Sent: Monday, June 19, 2023 9:16 AM
> 
> On 09/06/2023 17:47, Michael Kelley wrote:
> > Current code assigns either the Hyper-V TSC page or MSR-based ref counter
> > as the sched clock. This may be sub-optimal in two cases. First, if there
> > is hardware support to ensure consistent TSC frequency across live
> > migrations and Hyper-V is using that support, the raw TSC is a faster
> > source of time than the Hyper-V TSC page.  Second, the MSR-based ref
> > counter is relatively slow because reads require a trap to the hypervisor.
> > As such, it should never be used as the sched clock. The native sched
> > clock based on the raw TSC or jiffies is much better.
> >
> > Rework the sched clock setup so it is set to the TSC page only if
> > Hyper-V indicates that the TSC may have inconsistent frequency across
> > live migrations. Also, remove the code that sets the sched clock to
> > the MSR-based ref counter. In the cases where it is not set, the sched
> > clock will then be the native sched clock.
> >
> > As part of the rework, always enable both the TSC page clocksource and
> > the MSR-based ref counter clocksource. Set the ratings so the TSC page
> > clocksource is preferred. While the MSR-based ref counter clocksource
> > is unlikely to ever be the default, having it available for manual
> > selection is convenient for development purposes.
> >
> > Signed-off-by: Michael Kelley <mikelley@microsoft.com>
> 
> The patch does not apply, does it depend on another patch?

It should apply to linux-next.  It depends on two previous patches from
Peter Zijlstra in the sched/core branch of tip.  See:

https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?h=sched/core&id=9397fa2ea3e7634f61da1ab76b9eb88ba04dfdfc

and 

https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?h=sched/core&id=e39acc37db34f6688e2c16e958fb1d662c422c81

Michael

> 
> Rejected chunk:
> 
> --- drivers/clocksource/hyperv_timer.c
> +++ drivers/clocksource/hyperv_timer.c
> @@ -485,15 +485,9 @@ static u64 notrace read_hv_clock_msr_cs(struct
> clocksource *arg)
>   	return read_hv_clock_msr();
>   }
> 
> -static u64 noinstr read_hv_sched_clock_msr(void)
> -{
> -	return (read_hv_clock_msr() - hv_sched_clock_offset) *
> -		(NSEC_PER_SEC / HV_CLOCK_HZ);
> -}
> -
>   static struct clocksource hyperv_cs_msr = {
>   	.name	= "hyperv_clocksource_msr",
> -	.rating	= 500,
> +	.rating	= 495,
>   	.read	= read_hv_clock_msr_cs,
>   	.mask	= CLOCKSOURCE_MASK(64),
>   	.flags	= CLOCK_SOURCE_IS_CONTINUOUS,
> 
> 
> --
  
Daniel Lezcano June 19, 2023, 4:58 p.m. UTC | #4
On 19/06/2023 18:44, Michael Kelley (LINUX) wrote:
> From: Daniel Lezcano <daniel.lezcano@linaro.org> Sent: Monday, June 19, 2023 9:16 AM
>>
>> On 09/06/2023 17:47, Michael Kelley wrote:
>>> Current code assigns either the Hyper-V TSC page or MSR-based ref counter
>>> as the sched clock. This may be sub-optimal in two cases. First, if there
>>> is hardware support to ensure consistent TSC frequency across live
>>> migrations and Hyper-V is using that support, the raw TSC is a faster
>>> source of time than the Hyper-V TSC page.  Second, the MSR-based ref
>>> counter is relatively slow because reads require a trap to the hypervisor.
>>> As such, it should never be used as the sched clock. The native sched
>>> clock based on the raw TSC or jiffies is much better.
>>>
>>> Rework the sched clock setup so it is set to the TSC page only if
>>> Hyper-V indicates that the TSC may have inconsistent frequency across
>>> live migrations. Also, remove the code that sets the sched clock to
>>> the MSR-based ref counter. In the cases where it is not set, the sched
>>> clock will then be the native sched clock.
>>>
>>> As part of the rework, always enable both the TSC page clocksource and
>>> the MSR-based ref counter clocksource. Set the ratings so the TSC page
>>> clocksource is preferred. While the MSR-based ref counter clocksource
>>> is unlikely to ever be the default, having it available for manual
>>> selection is convenient for development purposes.
>>>
>>> Signed-off-by: Michael Kelley <mikelley@microsoft.com>
>>
>> The patch does not apply, does it depend on another patch?
> 
> It should apply to linux-next.  It depends on two previous patches from
> Peter Zijlstra in the sched/core branch of tip.  See:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?h=sched/core&id=9397fa2ea3e7634f61da1ab76b9eb88ba04dfdfc
> 
> and
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?h=sched/core&id=e39acc37db34f6688e2c16e958fb1d662c422c81

Yeah, but the branch is tip/timers/core

Could you respin against it ?

Thanks
  
Michael Kelley (LINUX) June 19, 2023, 5:41 p.m. UTC | #5
From: Daniel Lezcano <daniel.lezcano@linaro.org> Sent: Monday, June 19, 2023 9:58 AM
> 
> On 19/06/2023 18:44, Michael Kelley (LINUX) wrote:
> > From: Daniel Lezcano <daniel.lezcano@linaro.org> Sent: Monday, June 19, 2023 9:16 AM
> >>
> >> On 09/06/2023 17:47, Michael Kelley wrote:
> >>> Current code assigns either the Hyper-V TSC page or MSR-based ref counter
> >>> as the sched clock. This may be sub-optimal in two cases. First, if there
> >>> is hardware support to ensure consistent TSC frequency across live
> >>> migrations and Hyper-V is using that support, the raw TSC is a faster
> >>> source of time than the Hyper-V TSC page.  Second, the MSR-based ref
> >>> counter is relatively slow because reads require a trap to the hypervisor.
> >>> As such, it should never be used as the sched clock. The native sched
> >>> clock based on the raw TSC or jiffies is much better.
> >>>
> >>> Rework the sched clock setup so it is set to the TSC page only if
> >>> Hyper-V indicates that the TSC may have inconsistent frequency across
> >>> live migrations. Also, remove the code that sets the sched clock to
> >>> the MSR-based ref counter. In the cases where it is not set, the sched
> >>> clock will then be the native sched clock.
> >>>
> >>> As part of the rework, always enable both the TSC page clocksource and
> >>> the MSR-based ref counter clocksource. Set the ratings so the TSC page
> >>> clocksource is preferred. While the MSR-based ref counter clocksource
> >>> is unlikely to ever be the default, having it available for manual
> >>> selection is convenient for development purposes.
> >>>
> >>> Signed-off-by: Michael Kelley <mikelley@microsoft.com>
> >>
> >> The patch does not apply, does it depend on another patch?
> >
> > It should apply to linux-next.  It depends on two previous patches from
> > Peter Zijlstra in the sched/core branch of tip.  See:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?h=sched/core&id=9397fa2ea3e7634f61da1ab76b9eb88ba04dfdfc
> >
> > and
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?h=sched/core&id=e39acc37db34f6688e2c16e958fb1d662c422c81
>
> Yeah, but the branch is tip/timers/core
> 
> Could you respin against it ?
> 

Ah, OK.  Just to confirm, you are saying that this patch would go through the
tip/timers/core branch, which does *not* have Peter's patches.  Resolving the
merge conflict will be done within the tip branches.

Yes, I'll respin for tip/timers/core.

Michael
  

Patch

diff --git a/drivers/clocksource/hyperv_timer.c b/drivers/clocksource/hyperv_timer.c
index d851970..e56307a 100644
--- a/drivers/clocksource/hyperv_timer.c
+++ b/drivers/clocksource/hyperv_timer.c
@@ -485,15 +485,9 @@  static u64 notrace read_hv_clock_msr_cs(struct clocksource *arg)
 	return read_hv_clock_msr();
 }
 
-static u64 noinstr read_hv_sched_clock_msr(void)
-{
-	return (read_hv_clock_msr() - hv_sched_clock_offset) *
-		(NSEC_PER_SEC / HV_CLOCK_HZ);
-}
-
 static struct clocksource hyperv_cs_msr = {
 	.name	= "hyperv_clocksource_msr",
-	.rating	= 500,
+	.rating	= 495,
 	.read	= read_hv_clock_msr_cs,
 	.mask	= CLOCKSOURCE_MASK(64),
 	.flags	= CLOCK_SOURCE_IS_CONTINUOUS,
@@ -523,7 +517,7 @@  static __always_inline void hv_setup_sched_clock(void *sched_clock)
 static __always_inline void hv_setup_sched_clock(void *sched_clock) {}
 #endif /* CONFIG_GENERIC_SCHED_CLOCK */
 
-static bool __init hv_init_tsc_clocksource(void)
+static void __init hv_init_tsc_clocksource(void)
 {
 	union hv_reference_tsc_msr tsc_msr;
 
@@ -534,17 +528,14 @@  static bool __init hv_init_tsc_clocksource(void)
 	 * Hyper-V Reference TSC rating, causing the generic TSC to be used.
 	 * TSC_INVARIANT is not offered on ARM64, so the Hyper-V Reference
 	 * TSC will be preferred over the virtualized ARM64 arch counter.
-	 * While the Hyper-V MSR clocksource won't be used since the
-	 * Reference TSC clocksource is present, change its rating as
-	 * well for consistency.
 	 */
 	if (ms_hyperv.features & HV_ACCESS_TSC_INVARIANT) {
 		hyperv_cs_tsc.rating = 250;
-		hyperv_cs_msr.rating = 250;
+		hyperv_cs_msr.rating = 245;
 	}
 
 	if (!(ms_hyperv.features & HV_MSR_REFERENCE_TSC_AVAILABLE))
-		return false;
+		return;
 
 	hv_read_reference_counter = read_hv_clock_tsc;
 
@@ -575,33 +566,34 @@  static bool __init hv_init_tsc_clocksource(void)
 
 	clocksource_register_hz(&hyperv_cs_tsc, NSEC_PER_SEC/100);
 
-	hv_sched_clock_offset = hv_read_reference_counter();
-	hv_setup_sched_clock(read_hv_sched_clock_tsc);
-
-	return true;
+	/*
+	 * If TSC is invariant, then let it stay as the sched clock since it
+	 * will be faster than reading the TSC page. But if not invariant, use
+	 * the TSC page so that live migrations across hosts with different
+	 * frequencies is handled correctly.
+	 */
+	if (!(ms_hyperv.features & HV_ACCESS_TSC_INVARIANT)) {
+		hv_sched_clock_offset = hv_read_reference_counter();
+		hv_setup_sched_clock(read_hv_sched_clock_tsc);
+	}
 }
 
 void __init hv_init_clocksource(void)
 {
 	/*
-	 * Try to set up the TSC page clocksource. If it succeeds, we're
-	 * done. Otherwise, set up the MSR clocksource.  At least one of
-	 * these will always be available except on very old versions of
-	 * Hyper-V on x86.  In that case we won't have a Hyper-V
+	 * Try to set up the TSC page clocksource, then the MSR clocksource.
+	 * At least one of these will always be available except on very old
+	 * versions of Hyper-V on x86.  In that case we won't have a Hyper-V
 	 * clocksource, but Linux will still run with a clocksource based
 	 * on the emulated PIT or LAPIC timer.
+	 *
+	 * Never use the MSR clocksource as sched clock.  It's too slow.
+	 * Better to use the native sched clock as the fallback.
 	 */
-	if (hv_init_tsc_clocksource())
-		return;
-
-	if (!(ms_hyperv.features & HV_MSR_TIME_REF_COUNT_AVAILABLE))
-		return;
-
-	hv_read_reference_counter = read_hv_clock_msr;
-	clocksource_register_hz(&hyperv_cs_msr, NSEC_PER_SEC/100);
+	hv_init_tsc_clocksource();
 
-	hv_sched_clock_offset = hv_read_reference_counter();
-	hv_setup_sched_clock(read_hv_sched_clock_msr);
+	if (ms_hyperv.features & HV_MSR_TIME_REF_COUNT_AVAILABLE)
+		clocksource_register_hz(&hyperv_cs_msr, NSEC_PER_SEC/100);
 }
 
 void __init hv_remap_tsc_clocksource(void)