hisi_acc_vfio_pci: Update migration data pointer correctly on saving/resume

Message ID 20231120091406.780-1-shameerali.kolothum.thodi@huawei.com
State New
Series hisi_acc_vfio_pci: Update migration data pointer correctly on saving/resume

Commit Message

Shameerali Kolothum Thodi Nov. 20, 2023, 9:14 a.m. UTC
When the optional PRE_COPY support was added to speed up the device
compatibility check, it failed to update the saving/resuming data
pointers based on the fd offset: the read and write handlers always
copied from/to the start of vf_data, regardless of *pos. This results
in migration data corruption, and when the device is started on the
destination, the following errors are reported in some cases:

[  478.907684] arm-smmu-v3 arm-smmu-v3.2.auto: event 0x10 received:
[  478.913691] arm-smmu-v3 arm-smmu-v3.2.auto:  0x0000310200000010
[  478.919603] arm-smmu-v3 arm-smmu-v3.2.auto:  0x000002088000007f
[  478.925515] arm-smmu-v3 arm-smmu-v3.2.auto:  0x0000000000000000
[  478.931425] arm-smmu-v3 arm-smmu-v3.2.auto:  0x0000000000000000
[  478.947552] hisi_zip 0000:31:00.0: qm_axi_rresp [error status=0x1] found
[  478.955930] hisi_zip 0000:31:00.0: qm_db_timeout [error status=0x400] found
[  478.955944] hisi_zip 0000:31:00.0: qm sq doorbell timeout in function 2

Fixes: d9a871e4a143 ("hisi_acc_vfio_pci: Introduce support for PRE_COPY state transitions")
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)
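To make the failure mode concrete, here is a small userspace C sketch (not driver code) that models the save-side read handler before and after the fix. The names `demo_vf_data`, `demo_save_read` and `demo_drain_matches` are hypothetical; the `use_pos` flag toggles between copying from the start of the blob (the pre-fix behaviour) and copying from `blob + *pos` (the patched behaviour).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-in for migf->vf_data: an opaque state blob. */
struct demo_vf_data {
	unsigned char bytes[40];
};

/*
 * Simplified model of a save-side read() handler.
 * use_pos == 0: pre-fix behaviour, always copies from the start of the blob.
 * use_pos == 1: patched behaviour, copies from blob + *pos.
 */
static ssize_t demo_save_read(const struct demo_vf_data *d, size_t total,
			      unsigned char *buf, size_t len, off_t *pos,
			      int use_pos)
{
	const unsigned char *src = (const unsigned char *)d;

	if (*pos < 0 || (size_t)*pos >= total)
		return 0;		/* EOF */
	if (len > total - (size_t)*pos)
		len = total - (size_t)*pos;
	memcpy(buf, use_pos ? src + *pos : src, len);
	*pos += (off_t)len;		/* the fd offset still advances */
	return (ssize_t)len;
}

/*
 * Drain the blob in chunk-sized reads, as a VMM might, and report
 * whether the reassembled stream matches the original blob.
 */
int demo_drain_matches(size_t chunk, int use_pos)
{
	struct demo_vf_data d;
	unsigned char out[sizeof(d)];
	size_t got = 0;
	off_t pos = 0;
	ssize_t n;

	assert(chunk > 0);
	for (size_t i = 0; i < sizeof(d.bytes); i++)
		d.bytes[i] = (unsigned char)i;

	while ((n = demo_save_read(&d, sizeof(d), out + got, chunk,
				   &pos, use_pos)) > 0)
		got += (size_t)n;

	return got == sizeof(d) && memcmp(out, &d, sizeof(d)) == 0;
}
```

With 16-byte reads of the 40-byte blob, `demo_drain_matches(16, 0)` returns 0 because the second and third reads repeat the first bytes, while `demo_drain_matches(16, 1)` returns 1. A single read large enough to cover the whole blob returns 1 in both modes, which is one plausible reason the corruption only shows up in some cases; that reading is an assumption, not something the commit message states.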
  

Comments

Jason Gunthorpe Nov. 20, 2023, 2:29 p.m. UTC | #1
On Mon, Nov 20, 2023 at 09:14:06AM +0000, Shameer Kolothum wrote:
> When the optional PRE_COPY support was added to speed up the device
> compatibility check, it failed to update the saving/resuming data
> pointers based on the fd offset. This results in migration data
> corruption and when the device gets started on the destination the
> following error is reported in some cases,
> [...]

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason
  
Shameerali Kolothum Thodi Jan. 5, 2024, 3:56 p.m. UTC | #2
Hi Alex,

Just a gentle ping on this. 

Thanks,
Shameer

> -----Original Message-----
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Monday, November 20, 2023 2:29 PM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
> alex.williamson@redhat.com; yishaih@nvidia.com; kevin.tian@intel.com;
> Linuxarm <linuxarm@huawei.com>; liulongfang <liulongfang@huawei.com>
> Subject: Re: [PATCH] hisi_acc_vfio_pci: Update migration data pointer correctly
> on saving/resume
> 
> On Mon, Nov 20, 2023 at 09:14:06AM +0000, Shameer Kolothum wrote:
> > When the optional PRE_COPY support was added to speed up the device
> > compatibility check, it failed to update the saving/resuming data
> > pointers based on the fd offset. This results in migration data
> > corruption and when the device gets started on the destination the
> > following error is reported in some cases,
> > [...]
> 
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> 
> Jason
  
Alex Williamson Jan. 5, 2024, 4:30 p.m. UTC | #3
On Fri, 5 Jan 2024 15:56:09 +0000
Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> wrote:

> Hi Alex,
> 
> Just a gentle ping on this. 

Thanks for the ping, it seems to have slipped under my radar.  Applied
to vfio next branch for v6.8.  Thanks,

Alex


Patch

diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
index b2f9778c8366..4d27465c8f1a 100644
--- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
+++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
@@ -694,6 +694,7 @@ static ssize_t hisi_acc_vf_resume_write(struct file *filp, const char __user *bu
 					size_t len, loff_t *pos)
 {
 	struct hisi_acc_vf_migration_file *migf = filp->private_data;
+	u8 *vf_data = (u8 *)&migf->vf_data;
 	loff_t requested_length;
 	ssize_t done = 0;
 	int ret;
@@ -715,7 +716,7 @@ static ssize_t hisi_acc_vf_resume_write(struct file *filp, const char __user *bu
 		goto out_unlock;
 	}
 
-	ret = copy_from_user(&migf->vf_data, buf, len);
+	ret = copy_from_user(vf_data + *pos, buf, len);
 	if (ret) {
 		done = -EFAULT;
 		goto out_unlock;
@@ -835,7 +836,9 @@ static ssize_t hisi_acc_vf_save_read(struct file *filp, char __user *buf, size_t
 
 	len = min_t(size_t, migf->total_length - *pos, len);
 	if (len) {
-		ret = copy_to_user(buf, &migf->vf_data, len);
+		u8 *vf_data = (u8 *)&migf->vf_data;
+
+		ret = copy_to_user(buf, vf_data + *pos, len);
 		if (ret) {
 			done = -EFAULT;
 			goto out_unlock;
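
The resume side can be modelled the same way. The userspace sketch below (hypothetical names `demo_vf_state`, `demo_resume_write` and `demo_resume_roundtrip`; not the driver's code) places each incoming chunk at `blob + *pos`, mirroring the patched `copy_from_user(vf_data + *pos, buf, len)`, and rejects writes that would run past the end of the blob. The `-1` return and the exact bounds check are assumptions standing in for the driver's own length check, which this hunk does not show.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-in for the driver's migration state blob. */
struct demo_vf_state {
	unsigned char bytes[40];
};

/*
 * Simplified model of a resume-side write() handler: each chunk lands at
 * blob + *pos. Returns bytes consumed, or -1 (assumed error value) if the
 * write would overrun the blob; *pos is left untouched on error.
 */
static ssize_t demo_resume_write(struct demo_vf_state *d,
				 const unsigned char *buf, size_t len,
				 off_t *pos)
{
	unsigned char *dst = (unsigned char *)d;

	if (*pos < 0 || (size_t)*pos > sizeof(*d) ||
	    len > sizeof(*d) - (size_t)*pos)
		return -1;
	memcpy(dst + *pos, buf, len);
	*pos += (off_t)len;
	return (ssize_t)len;
}

/* Feed a known stream in chunk-sized writes and check it is rebuilt intact. */
int demo_resume_roundtrip(size_t chunk)
{
	struct demo_vf_state src, dst;
	off_t pos = 0;
	size_t sent = 0;

	assert(chunk > 0);
	for (size_t i = 0; i < sizeof(src.bytes); i++)
		src.bytes[i] = (unsigned char)(i * 3);
	memset(&dst, 0, sizeof(dst));

	while (sent < sizeof(src)) {
		size_t left = sizeof(src) - sent;
		size_t n = left < chunk ? left : chunk;

		if (demo_resume_write(&dst, src.bytes + sent, n, &pos) < 0)
			return 0;
		sent += n;
	}
	return memcmp(&src, &dst, sizeof(src)) == 0;
}
```

Both `demo_resume_roundtrip(16)` and `demo_resume_roundtrip(1)` rebuild the 40-byte blob intact. Under the pre-fix behaviour (every chunk copied to the start of the blob), each chunk would overwrite the beginning of the blob and the remainder would never be written.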