[RFC,V1,10/13] vdpa_sim: flush workers on suspend

Message ID 1704919215-91319-11-git-send-email-steven.sistare@oracle.com
State New
Series vdpa live update

Commit Message

Steven Sistare Jan. 10, 2024, 8:40 p.m. UTC
To pass ownership of a live vdpa device to a new process, the user
suspends the device, calls VHOST_NEW_OWNER to change the mm, and calls
VHOST_IOTLB_REMAP to change the user virtual addresses to match the new
mm.  Flush workers in suspend to guarantee that no worker sees the new
mm and old VA in between.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 drivers/vdpa/vdpa_sim/vdpa_sim.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)
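
Note (illustrative only, not part of the patch): the flush in this patch
relies on the worker running queued items in order, so an empty item queued
last can only complete after everything ahead of it, including an item that
was already executing. Below is a minimal userspace sketch of that idiom,
assuming a single-threaded FIFO worker. pthreads stand in for the kernel's
kthread_worker, and every name in it (struct worker, worker_queue,
worker_flush, demo_suspend) is hypothetical, not kernel API.

```c
/* Userspace analogy only: pthreads stand in for the kernel's kthread_worker. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct work {
	void (*fn)(struct work *w);
	struct work *next;
	bool done;			/* set by the worker after fn returns */
};

struct worker {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	struct work *head, *tail;	/* FIFO of pending items */
	bool stop;
	pthread_t thread;
};

static void *worker_main(void *arg)
{
	struct worker *wk = arg;

	pthread_mutex_lock(&wk->lock);
	while (wk->head || !wk->stop) {
		struct work *w = wk->head;
		if (!w) {
			pthread_cond_wait(&wk->cond, &wk->lock);
			continue;
		}
		wk->head = w->next;
		if (!wk->head)
			wk->tail = NULL;
		pthread_mutex_unlock(&wk->lock);
		w->fn(w);		/* run the item without holding the lock */
		pthread_mutex_lock(&wk->lock);
		w->done = true;
		pthread_cond_broadcast(&wk->cond);
	}
	pthread_mutex_unlock(&wk->lock);
	return NULL;
}

static void worker_queue(struct worker *wk, struct work *w)
{
	pthread_mutex_lock(&wk->lock);
	w->next = NULL;
	if (wk->tail)
		wk->tail->next = w;
	else
		wk->head = w;
	wk->tail = w;
	pthread_cond_broadcast(&wk->cond);
	pthread_mutex_unlock(&wk->lock);
}

static void noop_fn(struct work *w) { (void)w; }

/* Flush: queue an empty item behind everything else and wait for it. */
static void worker_flush(struct worker *wk)
{
	struct work barrier = { .fn = noop_fn };

	worker_queue(wk, &barrier);
	pthread_mutex_lock(&wk->lock);
	while (!barrier.done)
		pthread_cond_wait(&wk->cond, &wk->lock);
	pthread_mutex_unlock(&wk->lock);
}

static int processed;	/* written by the worker, read after the flush */
static void count_fn(struct work *w) { (void)w; processed++; }

/* Queue three items, flush as suspend would, report how many had run. */
static int demo_suspend(void)
{
	struct worker wk = { .stop = false };
	struct work items[3];
	int i, rc, seen;

	processed = 0;
	pthread_mutex_init(&wk.lock, NULL);
	pthread_cond_init(&wk.cond, NULL);
	rc = pthread_create(&wk.thread, NULL, worker_main, &wk);
	assert(rc == 0);
	for (i = 0; i < 3; i++) {
		items[i].fn = count_fn;
		worker_queue(&wk, &items[i]);
	}
	worker_flush(&wk);
	seen = processed;	/* sampled once the barrier item has completed */
	pthread_mutex_lock(&wk.lock);
	wk.stop = true;
	pthread_cond_broadcast(&wk.cond);
	pthread_mutex_unlock(&wk.lock);
	pthread_join(wk.thread, NULL);
	return seen;
}
```

In this sketch, demo_suspend() samples the counter only after worker_flush()
returns, so it observes all three earlier items. The barrier lives on the
caller's stack, which is safe here only because the worker never touches an
item again after marking it done under the lock.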
  

Comments

Eugenio Perez Martin Jan. 16, 2024, 6:57 p.m. UTC | #1
On Wed, Jan 10, 2024 at 9:40 PM Steve Sistare <steven.sistare@oracle.com> wrote:
>
> To pass ownership of a live vdpa device to a new process, the user
> suspends the device, calls VHOST_NEW_OWNER to change the mm, and calls
> VHOST_IOTLB_REMAP to change the user virtual addresses to match the new
> mm.  Flush workers in suspend to guarantee that no worker sees the new
> mm and old VA in between.
>

The worker should already be stopped by the end of the suspend ioctl,
so maybe we can consider this a fix?

> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
>  drivers/vdpa/vdpa_sim/vdpa_sim.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
>
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> index 6304cb0b4770..8734834983cb 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> @@ -74,6 +74,17 @@ static void vdpasim_worker_change_mm_sync(struct vdpasim *vdpasim,
>         kthread_flush_work(work);
>  }
>
> +static void flush_work_fn(struct kthread_work *work) {}
> +
> +static void vdpasim_flush_work(struct vdpasim *vdpasim)
> +{
> +       struct kthread_work work;
> +
> +       kthread_init_work(&work, flush_work_fn);
> +       kthread_queue_work(vdpasim->worker, &work);
> +       kthread_flush_work(&work);

Wouldn't it be better to cancel the work with kthread_cancel_work_sync here?

> +}
> +
>  static struct vdpasim *vdpa_to_sim(struct vdpa_device *vdpa)
>  {
>         return container_of(vdpa, struct vdpasim, vdpa);
> @@ -512,6 +523,8 @@ static int vdpasim_suspend(struct vdpa_device *vdpa)
>         vdpasim->running = false;
>         mutex_unlock(&vdpasim->mutex);
>
> +       vdpasim_flush_work(vdpasim);
> +
>         return 0;
>  }
>
> --
> 2.39.3
>
  
Steven Sistare Jan. 17, 2024, 8:31 p.m. UTC | #2
On 1/16/2024 1:57 PM, Eugenio Perez Martin wrote:
> On Wed, Jan 10, 2024 at 9:40 PM Steve Sistare <steven.sistare@oracle.com> wrote:
>>
>> To pass ownership of a live vdpa device to a new process, the user
>> suspends the device, calls VHOST_NEW_OWNER to change the mm, and calls
>> VHOST_IOTLB_REMAP to change the user virtual addresses to match the new
>> mm.  Flush workers in suspend to guarantee that no worker sees the new
>> mm and old VA in between.
> 
> The worker should already be stopped by the end of the suspend ioctl,
> so maybe we can consider this a fix?

Do you mean that the current behavior is a bug, independent of my new use case,
and that I should submit this patch as a separate bug fix?  If so, will do.

>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>> ---
>>  drivers/vdpa/vdpa_sim/vdpa_sim.c | 13 +++++++++++++
>>  1 file changed, 13 insertions(+)
>>
>> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
>> index 6304cb0b4770..8734834983cb 100644
>> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
>> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
>> @@ -74,6 +74,17 @@ static void vdpasim_worker_change_mm_sync(struct vdpasim *vdpasim,
>>         kthread_flush_work(work);
>>  }
>>
>> +static void flush_work_fn(struct kthread_work *work) {}
>> +
>> +static void vdpasim_flush_work(struct vdpasim *vdpasim)
>> +{
>> +       struct kthread_work work;
>> +
>> +       kthread_init_work(&work, flush_work_fn);
>> +       kthread_queue_work(vdpasim->worker, &work);
>> +       kthread_flush_work(&work);
> 
> Wouldn't it be better to cancel the work with kthread_cancel_work_sync here?

I believe that does not guarantee that currently executing work completes:

  static bool __kthread_cancel_work_sync()
    if (worker->current_work != work)
        goto out_fast;

- Steve

>> +}
>> +
>>  static struct vdpasim *vdpa_to_sim(struct vdpa_device *vdpa)
>>  {
>>         return container_of(vdpa, struct vdpasim, vdpa);
>> @@ -512,6 +523,8 @@ static int vdpasim_suspend(struct vdpa_device *vdpa)
>>         vdpasim->running = false;
>>         mutex_unlock(&vdpasim->mutex);
>>
>> +       vdpasim_flush_work(vdpasim);
>> +
>>         return 0;
>>  }
>>
>> --
>> 2.39.3
>>
>
  

Patch

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index 6304cb0b4770..8734834983cb 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -74,6 +74,17 @@ static void vdpasim_worker_change_mm_sync(struct vdpasim *vdpasim,
 	kthread_flush_work(work);
 }
 
+static void flush_work_fn(struct kthread_work *work) {}
+
+static void vdpasim_flush_work(struct vdpasim *vdpasim)
+{
+	struct kthread_work work;
+
+	kthread_init_work(&work, flush_work_fn);
+	kthread_queue_work(vdpasim->worker, &work);
+	kthread_flush_work(&work);
+}
+
 static struct vdpasim *vdpa_to_sim(struct vdpa_device *vdpa)
 {
 	return container_of(vdpa, struct vdpasim, vdpa);
@@ -512,6 +523,8 @@ static int vdpasim_suspend(struct vdpa_device *vdpa)
 	vdpasim->running = false;
 	mutex_unlock(&vdpasim->mutex);
 
+	vdpasim_flush_work(vdpasim);
+
 	return 0;
 }