[1/1] Drivers: hv: vmbus: Fix vmbus_wait_for_unload() to scan present CPUs

Message ID 1684172191-17100-1-git-send-email-mikelley@microsoft.com
State New
Series [1/1] Drivers: hv: vmbus: Fix vmbus_wait_for_unload() to scan present CPUs

Commit Message

Michael Kelley (LINUX) May 15, 2023, 5:36 p.m. UTC
vmbus_wait_for_unload() may be called in the panic path after other
CPUs are stopped. vmbus_wait_for_unload() currently loops through
online CPUs looking for the UNLOAD response message. But the values of
CONFIG_KEXEC_CORE and crash_kexec_post_notifiers affect the path used
to stop the other CPUs, and in one of the paths the stopped CPUs
are removed from cpu_online_mask. This removal happens in both
x86/x64 and arm64 architectures. In such a case, vmbus_wait_for_unload()
only checks the panic'ing CPU, and misses the UNLOAD response message
except when the panic'ing CPU is CPU 0. vmbus_wait_for_unload()
eventually times out, but only after waiting 100 seconds.
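
For reference, the path that clears the online mask is the non-kdump
stop path. A rough sketch of the x86 version, abridged from
stop_this_cpu() in arch/x86/kernel/process.c (details vary by kernel
version):

	void stop_this_cpu(void *dummy)
	{
		local_irq_disable();

		/* Remove this CPU from cpu_online_mask */
		set_cpu_online(smp_processor_id(), false);

		disable_local_APIC();

		for (;;)
			native_halt();
	}

The kdump stop path (crash_nmi_callback() on x86) parks the other CPUs
without touching cpu_online_mask, which is why the symptom depends on
CONFIG_KEXEC_CORE and crash_kexec_post_notifiers.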

Fix this by looping through *present* CPUs in vmbus_wait_for_unload().
The cpu_present_mask is not modified by stopping the other CPUs in the
panic path, nor should it be.  Furthermore, the synic_message_page
being checked in vmbus_wait_for_unload() is allocated in
hv_synic_alloc() for all present CPUs. So looping through the
present CPUs is more consistent.

For additional safety, also add a check for the message_page being
NULL before looking for the UNLOAD response message.

Reported-by: John Starks <jostarks@microsoft.com>
Fixes: cd95aad55793 ("Drivers: hv: vmbus: handle various crash scenarios")
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
---
 drivers/hv/channel_mgmt.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)
  

Comments

Vitaly Kuznetsov May 16, 2023, 9:11 a.m. UTC | #1
Michael Kelley <mikelley@microsoft.com> writes:

> vmbus_wait_for_unload() may be called in the panic path after other
> CPUs are stopped. vmbus_wait_for_unload() currently loops through
> online CPUs looking for the UNLOAD response message. But the values of
> CONFIG_KEXEC_CORE and crash_kexec_post_notifiers affect the path used
> to stop the other CPUs, and in one of the paths the stopped CPUs
> are removed from cpu_online_mask. This removal happens in both
> x86/x64 and arm64 architectures. In such a case, vmbus_wait_for_unload()
> only checks the panic'ing CPU, and misses the UNLOAD response message
> except when the panic'ing CPU is CPU 0. vmbus_wait_for_unload()
> eventually times out, but only after waiting 100 seconds.
>
> Fix this by looping through *present* CPUs in vmbus_wait_for_unload().
> The cpu_present_mask is not modified by stopping the other CPUs in the
> panic path, nor should it be.  Furthermore, the synic_message_page
> being checked in vmbus_wait_for_unload() is allocated in
> hv_synic_alloc() for all present CPUs. So looping through the
> present CPUs is more consistent.
>
> For additional safety, also add a check for the message_page being
> NULL before looking for the UNLOAD response message.
>
> Reported-by: John Starks <jostarks@microsoft.com>
> Fixes: cd95aad55793 ("Drivers: hv: vmbus: handle various crash scenarios")

I see you Cc:ed stable@ on the patch, should we also add 

Cc: stable@vger.kernel.org

here explicitly so it gets picked up by various stable backporting
scripts? I guess Wei can do it when picking the patch to the queue...

> Signed-off-by: Michael Kelley <mikelley@microsoft.com>
> ---
>  drivers/hv/channel_mgmt.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
> index 007f26d..df2ba20 100644
> --- a/drivers/hv/channel_mgmt.c
> +++ b/drivers/hv/channel_mgmt.c
> @@ -829,11 +829,14 @@ static void vmbus_wait_for_unload(void)
>  		if (completion_done(&vmbus_connection.unload_event))
>  			goto completed;
>  
> -		for_each_online_cpu(cpu) {
> +		for_each_present_cpu(cpu) {
>  			struct hv_per_cpu_context *hv_cpu
>  				= per_cpu_ptr(hv_context.cpu_context, cpu);
>  
>  			page_addr = hv_cpu->synic_message_page;
> +			if (!page_addr)
> +				continue;
> +

In theory, synic_message_page for all present CPUs is permanently
assigned in hv_synic_alloc(), and we fail the whole thing if any of
these allocations fails, so page_addr == NULL is likely impossible
today. But there's certainly no harm in having this extra check here;
this is not a hotpath.
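
For context, a rough sketch of that allocation path, abridged from
hv_synic_alloc() in drivers/hv/hv.c (details vary by kernel version):

	int hv_synic_alloc(void)
	{
		int cpu;
		...
		for_each_present_cpu(cpu) {
			struct hv_per_cpu_context *hv_cpu
				= per_cpu_ptr(hv_context.cpu_context, cpu);

			/* One message page per present CPU */
			hv_cpu->synic_message_page =
				(void *)get_zeroed_page(GFP_ATOMIC);
			if (!hv_cpu->synic_message_page)
				goto err;	/* fails all of synic init */
			...
		}
		return 0;
	err:
		...
		return -ENOMEM;
	}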

>  			msg = (struct hv_message *)page_addr
>  				+ VMBUS_MESSAGE_SINT;
>  
> @@ -867,11 +870,14 @@ static void vmbus_wait_for_unload(void)
>  	 * maybe-pending messages on all CPUs to be able to receive new
>  	 * messages after we reconnect.
>  	 */
> -	for_each_online_cpu(cpu) {
> +	for_each_present_cpu(cpu) {
>  		struct hv_per_cpu_context *hv_cpu
>  			= per_cpu_ptr(hv_context.cpu_context, cpu);
>  
>  		page_addr = hv_cpu->synic_message_page;
> +		if (!page_addr)
> +			continue;
> +
>  		msg = (struct hv_message *)page_addr + VMBUS_MESSAGE_SINT;
>  		msg->header.message_type = HVMSG_NONE;
>  	}

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
  
Michael Kelley (LINUX) May 16, 2023, 2:04 p.m. UTC | #2
From: Vitaly Kuznetsov <vkuznets@redhat.com> Sent: Tuesday, May 16, 2023 2:12 AM
> 
> Michael Kelley <mikelley@microsoft.com> writes:
> 
> > vmbus_wait_for_unload() may be called in the panic path after other
> > CPUs are stopped. vmbus_wait_for_unload() currently loops through
> > online CPUs looking for the UNLOAD response message. But the values of
> > CONFIG_KEXEC_CORE and crash_kexec_post_notifiers affect the path used
> > to stop the other CPUs, and in one of the paths the stopped CPUs
> > are removed from cpu_online_mask. This removal happens in both
> > x86/x64 and arm64 architectures. In such a case, vmbus_wait_for_unload()
> > only checks the panic'ing CPU, and misses the UNLOAD response message
> > except when the panic'ing CPU is CPU 0. vmbus_wait_for_unload()
> > eventually times out, but only after waiting 100 seconds.
> >
> > Fix this by looping through *present* CPUs in vmbus_wait_for_unload().
> > The cpu_present_mask is not modified by stopping the other CPUs in the
> > panic path, nor should it be.  Furthermore, the synic_message_page
> > being checked in vmbus_wait_for_unload() is allocated in
> > hv_synic_alloc() for all present CPUs. So looping through the
> > present CPUs is more consistent.
> >
> > For additional safety, also add a check for the message_page being
> > NULL before looking for the UNLOAD response message.
> >
> > Reported-by: John Starks <jostarks@microsoft.com>
> > Fixes: cd95aad55793 ("Drivers: hv: vmbus: handle various crash scenarios")
> 
> I see you Cc:ed stable@ on the patch, should we also add
> 
> Cc: stable@vger.kernel.org
> 
> here explicitly so it gets picked up by various stable backporting
> scripts? I guess Wei can do it when picking the patch to the queue...

Yes, the kernel test robot has already warned me about not
doing that right. :-(

> 
> > Signed-off-by: Michael Kelley <mikelley@microsoft.com>
> > ---
> >  drivers/hv/channel_mgmt.c | 10 ++++++++--
> >  1 file changed, 8 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
> > index 007f26d..df2ba20 100644
> > --- a/drivers/hv/channel_mgmt.c
> > +++ b/drivers/hv/channel_mgmt.c
> > @@ -829,11 +829,14 @@ static void vmbus_wait_for_unload(void)
> >  		if (completion_done(&vmbus_connection.unload_event))
> >  			goto completed;
> >
> > -		for_each_online_cpu(cpu) {
> > +		for_each_present_cpu(cpu) {
> >  			struct hv_per_cpu_context *hv_cpu
> >  				= per_cpu_ptr(hv_context.cpu_context, cpu);
> >
> >  			page_addr = hv_cpu->synic_message_page;
> > +			if (!page_addr)
> > +				continue;
> > +
> 
> In theory, synic_message_page for all present CPUs is permanently
> assigned in hv_synic_alloc(), and we fail the whole thing if any of
> these allocations fails, so page_addr == NULL is likely impossible
> today. But there's certainly no harm in having this extra check here;
> this is not a hotpath.

But consider a CoCo VM where the allocation is not done in
hv_synic_alloc().  In this case, synic_message_page is set in
hv_synic_enable_regs(), which is called only when a CPU is brought
online.  If fewer than all present CPUs are brought online because of
kernel command line options, the synic_message_page values for the
remaining present CPUs are never initialized and remain NULL.
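
A rough sketch of that path, abridged from hv_synic_enable_regs() in
drivers/hv/hv.c (the exact mapping primitive and isolation check vary
by kernel version):

	/* Called per CPU, and only when that CPU is brought online */
	simp.as_uint64 = hv_get_register(HV_REGISTER_SIMP);
	simp.simp_enabled = 1;

	if (hv_isolation_type_snp()) {
		/*
		 * CoCo VM: synic_message_page is first assigned here, so
		 * a present CPU that is never brought online keeps the
		 * initial NULL.
		 */
		hv_cpu->synic_message_page
			= memremap(simp.base_simp_gpa << HV_HYP_PAGE_SHIFT,
				   HV_HYP_PAGE_SIZE, MEMREMAP_WB);
	} else {
		simp.base_simp_gpa = virt_to_phys(hv_cpu->synic_message_page)
			>> HV_HYP_PAGE_SHIFT;
	}

	hv_set_register(HV_REGISTER_SIMP, simp.as_uint64);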

I should probably tweak the commit message to call out this case
explicitly.

> 
> >  			msg = (struct hv_message *)page_addr
> >  				+ VMBUS_MESSAGE_SINT;
> >
> > @@ -867,11 +870,14 @@ static void vmbus_wait_for_unload(void)
> >  	 * maybe-pending messages on all CPUs to be able to receive new
> >  	 * messages after we reconnect.
> >  	 */
> > -	for_each_online_cpu(cpu) {
> > +	for_each_present_cpu(cpu) {
> >  		struct hv_per_cpu_context *hv_cpu
> >  			= per_cpu_ptr(hv_context.cpu_context, cpu);
> >
> >  		page_addr = hv_cpu->synic_message_page;
> > +		if (!page_addr)
> > +			continue;
> > +
> >  		msg = (struct hv_message *)page_addr + VMBUS_MESSAGE_SINT;
> >  		msg->header.message_type = HVMSG_NONE;
> >  	}
> 
> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> 

Thanks for reviewing!
  

Patch

diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 007f26d..df2ba20 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -829,11 +829,14 @@ static void vmbus_wait_for_unload(void)
 		if (completion_done(&vmbus_connection.unload_event))
 			goto completed;
 
-		for_each_online_cpu(cpu) {
+		for_each_present_cpu(cpu) {
 			struct hv_per_cpu_context *hv_cpu
 				= per_cpu_ptr(hv_context.cpu_context, cpu);
 
 			page_addr = hv_cpu->synic_message_page;
+			if (!page_addr)
+				continue;
+
 			msg = (struct hv_message *)page_addr
 				+ VMBUS_MESSAGE_SINT;
 
@@ -867,11 +870,14 @@ static void vmbus_wait_for_unload(void)
 	 * maybe-pending messages on all CPUs to be able to receive new
 	 * messages after we reconnect.
 	 */
-	for_each_online_cpu(cpu) {
+	for_each_present_cpu(cpu) {
 		struct hv_per_cpu_context *hv_cpu
 			= per_cpu_ptr(hv_context.cpu_context, cpu);
 
 		page_addr = hv_cpu->synic_message_page;
+		if (!page_addr)
+			continue;
+
 		msg = (struct hv_message *)page_addr + VMBUS_MESSAGE_SINT;
 		msg->header.message_type = HVMSG_NONE;
 	}