[1/4] KVM: arm64: Allow saving vgic3 LPI pending status in no running vcpu context

Message ID 20230116040405.260935-2-gshan@redhat.com
State New
Series: Improve dirty ring warning report

Commit Message

Gavin Shan Jan. 16, 2023, 4:04 a.m. UTC
  When dirty ring is enabled, the dirty page information is pushed to
the dirty ring if there is a running VCPU context. Otherwise, the
dirty page information is still tracked by the backup dirty bitmap.
In order to detect whether a running VCPU context exists when a guest
page becomes dirty, kvm_arch_allow_write_without_running_vcpu() was
introduced to warn about unexpected cases where no running VCPU
context exists.

Besides the site where the ITS tables are saved, the vgic3 LPI pending
status can also be saved without a running VCPU context: this happens
when an ITS ITE is restored through the KVM_DEV_ARM_ITS_RESTORE_TABLES
command on the 'kvm-arm-vgic-its' device.

Fix it by also allowing the vgic3 LPI pending status to be saved when
there is no running VCPU context.

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 Documentation/virt/kvm/api.rst | 5 +++--
 arch/arm64/kvm/vgic/vgic-its.c | 3 ++-
 arch/arm64/kvm/vgic/vgic-v3.c  | 3 +++
 include/kvm/arm_vgic.h         | 1 +
 4 files changed, 9 insertions(+), 3 deletions(-)
  

Comments

Oliver Upton Jan. 17, 2023, 8:51 p.m. UTC | #1
Hi Gavin,

On Mon, Jan 16, 2023 at 12:04:02PM +0800, Gavin Shan wrote:
> When dirty ring is enabled, the dirty page information is pushed to
> the dirty ring if there is a running VCPU context. Otherwise, the
> dirty page information is still tracked by the backup dirty bitmap.
> In order to detect if there is a running VCPU context when a guest
> page becomes dirty, kvm_arch_allow_write_without_running_vcpu() was
> introduced to warn when no running VCPU context exists on unknown
> cases.
> 
> Other than the site of saving ITS tables, it's possible to save vgic3
> LPI pending status in no running vcpu context because it can happen when
> ITS ITE is restored through the command KVM_DEV_ARM_ITS_RESTORE_TABLES
> on 'kvm-arm-vgic-its' device.
> 
> Fix it by allowing to save vgic3 LPI pending status in no running
> vcpu context.
> 
> Signed-off-by: Gavin Shan <gshan@redhat.com>
> ---
>  Documentation/virt/kvm/api.rst | 5 +++--
>  arch/arm64/kvm/vgic/vgic-its.c | 3 ++-
>  arch/arm64/kvm/vgic/vgic-v3.c  | 3 +++
>  include/kvm/arm_vgic.h         | 1 +
>  4 files changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 9807b05a1b57..18b245a0ba02 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -8071,8 +8071,9 @@ state is final and avoid missing dirty pages from another ioctl ordered
>  after the bitmap collection.
>  
>  NOTE: One example of using the backup bitmap is saving arm64 vgic/its
> -tables through KVM_DEV_ARM_{VGIC_GRP_CTRL, ITS_SAVE_TABLES} command on
> -KVM device "kvm-arm-vgic-its" when dirty ring is enabled.
> +tables and vgic3 LPI pending status through KVM_DEV_ARM_{VGIC_GRP_CTRL,
> +ITS_SAVE_TABLES} and KVM_DEV_ARM_{VGIC_GRP_CTRL, ITS_RESTORE_TABLES}
> +command on KVM device "kvm-arm-vgic-its" when dirty ring is enabled.
>  
>  8.30 KVM_CAP_XEN_HVM
>  --------------------
> diff --git a/arch/arm64/kvm/vgic/vgic-its.c b/arch/arm64/kvm/vgic/vgic-its.c
> index 94a666dd1443..119a9c7a0a52 100644
> --- a/arch/arm64/kvm/vgic/vgic-its.c
> +++ b/arch/arm64/kvm/vgic/vgic-its.c
> @@ -2792,7 +2792,8 @@ bool kvm_arch_allow_write_without_running_vcpu(struct kvm *kvm)
>  {
>  	struct vgic_dist *dist = &kvm->arch.vgic;
>  
> -	return dist->save_its_tables_in_progress;
> +	return dist->save_vgic_v3_tables_in_progress ||
> +	       dist->save_its_tables_in_progress;

I'd much prefer using a single bool to keep track of this, i.e:

	return dist->save_tables_in_progress;

>  }
>  
>  static int vgic_its_set_attr(struct kvm_device *dev,
> diff --git a/arch/arm64/kvm/vgic/vgic-v3.c b/arch/arm64/kvm/vgic/vgic-v3.c
> index 2074521d4a8c..32998c8587a8 100644
> --- a/arch/arm64/kvm/vgic/vgic-v3.c
> +++ b/arch/arm64/kvm/vgic/vgic-v3.c
> @@ -304,6 +304,7 @@ void vgic_v3_enable(struct kvm_vcpu *vcpu)
>  int vgic_v3_lpi_sync_pending_status(struct kvm *kvm, struct vgic_irq *irq)
>  {
>  	struct kvm_vcpu *vcpu;
> +	struct vgic_dist *dist = &kvm->arch.vgic;
>  	int byte_offset, bit_nr;
>  	gpa_t pendbase, ptr;
>  	bool status;
> @@ -339,7 +340,9 @@ int vgic_v3_lpi_sync_pending_status(struct kvm *kvm, struct vgic_irq *irq)
>  	if (status) {
>  		/* clear consumed data */
>  		val &= ~(1 << bit_nr);
> +		dist->save_vgic_v3_tables_in_progress = true;
>  		ret = kvm_write_guest_lock(kvm, ptr, &val, 1);
> +		dist->save_vgic_v3_tables_in_progress = false;

With the above suggestion of using a bool, this should become a helper
used at all the affected callsites:

  static int vgic_write_guest_lock(struct kvm *kvm, gpa_t gpa,
  				   const void *data, unsigned long len)
  {
  	struct vgic_dist *dist = &kvm->arch.vgic;
  	int ret;
  
  	dist->save_tables_in_progress = true;
  	ret = kvm_write_guest_lock(kvm, gpa, data, len);
  	dist->save_tables_in_progress = false;
  
  	return ret;
  }

--
Thanks,
Oliver
  
Gavin Shan Jan. 19, 2023, 1:11 a.m. UTC | #2
Hi Oliver,

On 1/18/23 7:51 AM, Oliver Upton wrote:
> On Mon, Jan 16, 2023 at 12:04:02PM +0800, Gavin Shan wrote:
>> When dirty ring is enabled, the dirty page information is pushed to
>> the dirty ring if there is a running VCPU context. Otherwise, the
>> dirty page information is still tracked by the backup dirty bitmap.
>> In order to detect if there is a running VCPU context when a guest
>> page becomes dirty, kvm_arch_allow_write_without_running_vcpu() was
>> introduced to warn when no running VCPU context exists on unknown
>> cases.
>>
>> Other than the site of saving ITS tables, it's possible to save vgic3
>> LPI pending status in no running vcpu context because it can happen when
>> ITS ITE is restored through the command KVM_DEV_ARM_ITS_RESTORE_TABLES
>> on 'kvm-arm-vgic-its' device.
>>
>> Fix it by allowing to save vgic3 LPI pending status in no running
>> vcpu context.
>>
>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>> ---
>>   Documentation/virt/kvm/api.rst | 5 +++--
>>   arch/arm64/kvm/vgic/vgic-its.c | 3 ++-
>>   arch/arm64/kvm/vgic/vgic-v3.c  | 3 +++
>>   include/kvm/arm_vgic.h         | 1 +
>>   4 files changed, 9 insertions(+), 3 deletions(-)
>>
>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
>> index 9807b05a1b57..18b245a0ba02 100644
>> --- a/Documentation/virt/kvm/api.rst
>> +++ b/Documentation/virt/kvm/api.rst
>> @@ -8071,8 +8071,9 @@ state is final and avoid missing dirty pages from another ioctl ordered
>>   after the bitmap collection.
>>   
>>   NOTE: One example of using the backup bitmap is saving arm64 vgic/its
>> -tables through KVM_DEV_ARM_{VGIC_GRP_CTRL, ITS_SAVE_TABLES} command on
>> -KVM device "kvm-arm-vgic-its" when dirty ring is enabled.
>> +tables and vgic3 LPI pending status through KVM_DEV_ARM_{VGIC_GRP_CTRL,
>> +ITS_SAVE_TABLES} and KVM_DEV_ARM_{VGIC_GRP_CTRL, ITS_RESTORE_TABLES}
>> +command on KVM device "kvm-arm-vgic-its" when dirty ring is enabled.
>>   
>>   8.30 KVM_CAP_XEN_HVM
>>   --------------------
>> diff --git a/arch/arm64/kvm/vgic/vgic-its.c b/arch/arm64/kvm/vgic/vgic-its.c
>> index 94a666dd1443..119a9c7a0a52 100644
>> --- a/arch/arm64/kvm/vgic/vgic-its.c
>> +++ b/arch/arm64/kvm/vgic/vgic-its.c
>> @@ -2792,7 +2792,8 @@ bool kvm_arch_allow_write_without_running_vcpu(struct kvm *kvm)
>>   {
>>   	struct vgic_dist *dist = &kvm->arch.vgic;
>>   
>> -	return dist->save_its_tables_in_progress;
>> +	return dist->save_vgic_v3_tables_in_progress ||
>> +	       dist->save_its_tables_in_progress;
> 
> I'd much prefer using a single bool to keep track of this, i.e:
> 

Yes, it's cleaner to have 'dist->save_tables_in_progress' cover all
3 cases. One more concern below.

> 	return dist->save_tables_in_progress;
> 
>>   }
>>   
>>   static int vgic_its_set_attr(struct kvm_device *dev,
>> diff --git a/arch/arm64/kvm/vgic/vgic-v3.c b/arch/arm64/kvm/vgic/vgic-v3.c
>> index 2074521d4a8c..32998c8587a8 100644
>> --- a/arch/arm64/kvm/vgic/vgic-v3.c
>> +++ b/arch/arm64/kvm/vgic/vgic-v3.c
>> @@ -304,6 +304,7 @@ void vgic_v3_enable(struct kvm_vcpu *vcpu)
>>   int vgic_v3_lpi_sync_pending_status(struct kvm *kvm, struct vgic_irq *irq)
>>   {
>>   	struct kvm_vcpu *vcpu;
>> +	struct vgic_dist *dist = &kvm->arch.vgic;
>>   	int byte_offset, bit_nr;
>>   	gpa_t pendbase, ptr;
>>   	bool status;
>> @@ -339,7 +340,9 @@ int vgic_v3_lpi_sync_pending_status(struct kvm *kvm, struct vgic_irq *irq)
>>   	if (status) {
>>   		/* clear consumed data */
>>   		val &= ~(1 << bit_nr);
>> +		dist->save_vgic_v3_tables_in_progress = true;
>>   		ret = kvm_write_guest_lock(kvm, ptr, &val, 1);
>> +		dist->save_vgic_v3_tables_in_progress = false;
> 
> With the above suggestion of using a bool, this should become a helper
> used at all the affected callsites:
> 
>    static int vgic_write_guest_lock(struct kvm *kvm, gpa_t gpa,
>    				   const void *data, unsigned long len)
>    {
>    	struct vgic_dist *dist = &kvm->arch.vgic;
> 	int ret;
> 
> 	dist->save_tables_in_progress = true;
> 	ret = kvm_write_guest_lock(kvm, gpa, data, len);
> 	dist->save_tables_in_progress = false;
> 
> 	return ret;
>    }
> 

I will have vgic_write_guest_lock() in v2. Note that those 3 paths can't
run in parallel since one switch is shared by them. Alternatively, we
could extend struct vgic_dist::save_tables_in_progress from 'bool' to
'unsigned long' and define a separate bit for each site, as below. In
this way, the 3 paths could run in parallel:

   unsigned long struct vgic_dist::save_tables_in_progress

   #define VGIC_DIST_SAVE_ITS_ITE		0	/* ITS Translation Entry */
   #define VGIC_DIST_SAVE_ITS_DTE		1	/* ITS Device Table Entry */
   #define VGIC_DIST_SAVE_ITS_CTE		2	/* ITS Collection Table Entry */
   #define VGIC_DIST_SAVE_ITS_CT			3	/* ITS Collection Table */
   #define VGIC_DIST_SAVE_VGIC3_LPI		4	/* VGIC3 LPI Pending Status */
   #define VGIC_DIST_SAVE_VGIC3_PENDING_TABLE	5	/* VGIC3 Pending Table */

The drawback is that the number of sites is limited to 64. If those 3
paths can't run in parallel, we don't need the extension at all.

Thanks,
Gavin
  
Marc Zyngier Jan. 19, 2023, 3:47 p.m. UTC | #3
On Thu, 19 Jan 2023 01:11:44 +0000,
Gavin Shan <gshan@redhat.com> wrote:
> 
> I will have vgic_write_guest_lock() in v2. Note that those 3 paths can't be
> running in parallel since one switch is shared by them. Alternatively, we
> extend struct vgic_dist::save_tables_in_progress from 'bool' to 'unsigned long'.
> Several bit is defined for each site as below. In this way, the 3 paths can be
> running in parallel:
> 
>   unsigned long struct vgic_dist::save_tables_in_progress
> 
>   #define VGIC_DIST_SAVE_ITS_ITE		0	/* ITS Translation Entry */
>   #define VGIC_DIST_SAVE_ITS_DTE		1	/* ITS Device Table Entry */
>   #define VGIC_DIST_SAVE_ITS_CTE		2	/* ITS Collection Table Entry */
>   #define VGIC_DIST_SAVE_ITS_CT			3	/* ITS Collection Table */
>   #define VGIC_DIST_SAVE_VGIC3_LPI		4	/* VGIC3 LPI Pending Status */
>   #define VGIC_DIST_SAVE_VGIC3_PENDING_TABLE	5	/* VGIC3 Pending Table */
> 
> The drawback is the calls are limited to 64. If those 3 paths can't be running
> in parallel, we needn't the extension at all.

It should all be completely sequential. KVM_DEV_ARM_ITS_SAVE_TABLES
runs in a context where everything is locked, and so is
VGIC_DIST_SAVE_VGIC3_PENDING_TABLE.

	M.
  
Gavin Shan Jan. 19, 2023, 11:04 p.m. UTC | #4
Hi Marc,

On 1/20/23 2:47 AM, Marc Zyngier wrote:
> On Thu, 19 Jan 2023 01:11:44 +0000,
> Gavin Shan <gshan@redhat.com> wrote:
>>
>> I will have vgic_write_guest_lock() in v2. Note that those 3 paths can't be
>> running in parallel since one switch is shared by them. Alternatively, we
>> extend struct vgic_dist::save_tables_in_progress from 'bool' to 'unsigned long'.
>> Several bit is defined for each site as below. In this way, the 3 paths can be
>> running in parallel:
>>
>>    unsigned long struct vgic_dist::save_tables_in_progress
>>
>>    #define VGIC_DIST_SAVE_ITS_ITE		0	/* ITS Translation Entry */
>>    #define VGIC_DIST_SAVE_ITS_DTE		1	/* ITS Device Table Entry */
>>    #define VGIC_DIST_SAVE_ITS_CTE		2	/* ITS Collection Table Entry */
>>    #define VGIC_DIST_SAVE_ITS_CT			3	/* ITS Collection Table */
>>    #define VGIC_DIST_SAVE_VGIC3_LPI		4	/* VGIC3 LPI Pending Status */
>>    #define VGIC_DIST_SAVE_VGIC3_PENDING_TABLE	5	/* VGIC3 Pending Table */
>>
>> The drawback is the calls are limited to 64. If those 3 paths can't be running
>> in parallel, we needn't the extension at all.
> 
> It should all be completely sequential. KVM_DEV_ARM_ITS_SAVE_TABLES
> runs in a context where everything is locked, and so is
> VGIC_DIST_SAVE_VGIC3_PENDING_TABLE.
> 

Thanks for confirming. Yeah, it's sequential because 'kvm->lock' is
held across KVM_DEV_ARM_ITS_SAVE_TABLES and VGIC_DIST_SAVE_VGIC3_PENDING_TABLE.
So it's all good to have one shared switch. v2 will be posted pretty soon.

Thanks,
Gavin
  

Patch

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 9807b05a1b57..18b245a0ba02 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -8071,8 +8071,9 @@  state is final and avoid missing dirty pages from another ioctl ordered
 after the bitmap collection.
 
 NOTE: One example of using the backup bitmap is saving arm64 vgic/its
-tables through KVM_DEV_ARM_{VGIC_GRP_CTRL, ITS_SAVE_TABLES} command on
-KVM device "kvm-arm-vgic-its" when dirty ring is enabled.
+tables and vgic3 LPI pending status through KVM_DEV_ARM_{VGIC_GRP_CTRL,
+ITS_SAVE_TABLES} and KVM_DEV_ARM_{VGIC_GRP_CTRL, ITS_RESTORE_TABLES}
+command on KVM device "kvm-arm-vgic-its" when dirty ring is enabled.
 
 8.30 KVM_CAP_XEN_HVM
 --------------------
diff --git a/arch/arm64/kvm/vgic/vgic-its.c b/arch/arm64/kvm/vgic/vgic-its.c
index 94a666dd1443..119a9c7a0a52 100644
--- a/arch/arm64/kvm/vgic/vgic-its.c
+++ b/arch/arm64/kvm/vgic/vgic-its.c
@@ -2792,7 +2792,8 @@  bool kvm_arch_allow_write_without_running_vcpu(struct kvm *kvm)
 {
 	struct vgic_dist *dist = &kvm->arch.vgic;
 
-	return dist->save_its_tables_in_progress;
+	return dist->save_vgic_v3_tables_in_progress ||
+	       dist->save_its_tables_in_progress;
 }
 
 static int vgic_its_set_attr(struct kvm_device *dev,
diff --git a/arch/arm64/kvm/vgic/vgic-v3.c b/arch/arm64/kvm/vgic/vgic-v3.c
index 2074521d4a8c..32998c8587a8 100644
--- a/arch/arm64/kvm/vgic/vgic-v3.c
+++ b/arch/arm64/kvm/vgic/vgic-v3.c
@@ -304,6 +304,7 @@  void vgic_v3_enable(struct kvm_vcpu *vcpu)
 int vgic_v3_lpi_sync_pending_status(struct kvm *kvm, struct vgic_irq *irq)
 {
 	struct kvm_vcpu *vcpu;
+	struct vgic_dist *dist = &kvm->arch.vgic;
 	int byte_offset, bit_nr;
 	gpa_t pendbase, ptr;
 	bool status;
@@ -339,7 +340,9 @@  int vgic_v3_lpi_sync_pending_status(struct kvm *kvm, struct vgic_irq *irq)
 	if (status) {
 		/* clear consumed data */
 		val &= ~(1 << bit_nr);
+		dist->save_vgic_v3_tables_in_progress = true;
 		ret = kvm_write_guest_lock(kvm, ptr, &val, 1);
+		dist->save_vgic_v3_tables_in_progress = false;
 		if (ret)
 			return ret;
 	}
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 9270cd87da3f..0485b4e82b00 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -264,6 +264,7 @@  struct vgic_dist {
 
 	bool			has_its;
 	bool			save_its_tables_in_progress;
+	bool			save_vgic_v3_tables_in_progress;
 
 	/*
 	 * Contains the attributes and gpa of the LPI configuration table.