[vhost,v4,02/15] vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag

Message ID 20231219180858.120898-3-dtatulea@nvidia.com
State New
Series vdpa/mlx5: Add support for resumable vqs

Commit Message

Dragos Tatulea Dec. 19, 2023, 6:08 p.m. UTC
The virtio spec doesn't allow changing virtqueue addresses after
DRIVER_OK. Some devices do support this operation when the device is
suspended. The VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
advertises this support as a backend feature.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Suggested-by: Eugenio Pérez <eperezma@redhat.com>
---
 include/uapi/linux/vhost_types.h | 4 ++++
 1 file changed, 4 insertions(+)
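
For illustration only, here is a minimal sketch of how userspace could
test for the proposed bit. It assumes the feature mask has already been
read from the vhost-vdpa fd with the VHOST_GET_BACKEND_FEATURES ioctl.
The helper names are hypothetical, and the 0x9 value is the one proposed
in this patch, so it may not exist in released uapi headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value proposed by this patch; defined here in case the headers lack it. */
#ifndef VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND
#define VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND 0x9
#endif

/* Backend features are bit numbers, so test them with a 64-bit shift. */
static bool backend_has_feature(uint64_t features, unsigned int bit)
{
        return bit < 64 && (features & (UINT64_C(1) << bit)) != 0;
}

/* Hypothetical helper: may vq addresses be rewritten while suspended? */
static bool can_change_vq_addr_in_suspend(uint64_t backend_features)
{
        return backend_has_feature(backend_features,
                                   VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND);
}
```

A caller would pass the 64-bit mask returned by the ioctl and take the
suspend + set address + resume path only when the helper returns true.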
  

Comments

Jason Wang Dec. 20, 2023, 3:46 a.m. UTC | #1
On Wed, Dec 20, 2023 at 2:09 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> The virtio spec doesn't allow changing virtqueue addresses after
> DRIVER_OK. Some devices do support this operation when the device is
> suspended. The VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
> advertises this support as a backend features.

There's an ongoing effort in virtio spec to introduce the suspend state.

So I wonder if it's better to just allow such behaviour?

Thanks


>
> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> Suggested-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>  include/uapi/linux/vhost_types.h | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h
> index d7656908f730..aacd067afc89 100644
> --- a/include/uapi/linux/vhost_types.h
> +++ b/include/uapi/linux/vhost_types.h
> @@ -192,5 +192,9 @@ struct vhost_vdpa_iova_range {
>  #define VHOST_BACKEND_F_DESC_ASID    0x7
>  /* IOTLB don't flush memory mapping across device reset */
>  #define VHOST_BACKEND_F_IOTLB_PERSIST  0x8
> +/* Device supports changing virtqueue addresses when device is suspended
> + * and is in state DRIVER_OK.
> + */
> +#define VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND  0x9
>
>  #endif
> --
> 2.43.0
>
  
Jason Wang Dec. 20, 2023, 4:05 a.m. UTC | #2
On Wed, Dec 20, 2023 at 11:46 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Wed, Dec 20, 2023 at 2:09 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> >
> > The virtio spec doesn't allow changing virtqueue addresses after
> > DRIVER_OK. Some devices do support this operation when the device is
> > suspended. The VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
> > advertises this support as a backend features.
>
> There's an ongoing effort in virtio spec to introduce the suspend state.
>
> So I wonder if it's better to just allow such behaviour?

Actually I mean, allow drivers to modify the parameters during suspend
without a new feature.

Thanks

>
> Thanks
>
>
  
Dragos Tatulea Dec. 20, 2023, 12:57 p.m. UTC | #3
On Wed, 2023-12-20 at 12:05 +0800, Jason Wang wrote:
> On Wed, Dec 20, 2023 at 11:46 AM Jason Wang <jasowang@redhat.com> wrote:
> > 
> > On Wed, Dec 20, 2023 at 2:09 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> > > 
> > > The virtio spec doesn't allow changing virtqueue addresses after
> > > DRIVER_OK. Some devices do support this operation when the device is
> > > suspended. The VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
> > > advertises this support as a backend features.
> > 
> > There's an ongoing effort in virtio spec to introduce the suspend state.
> > 
> > So I wonder if it's better to just allow such behaviour?
> 
> Actually I mean, allow drivers to modify the parameters during suspend
> without a new feature.
> 
Fine by me. Less code is better than more code. The v2 of this series would be
sufficient then.

Thanks
  
Eugenio Perez Martin Dec. 20, 2023, 1:32 p.m. UTC | #4
On Wed, Dec 20, 2023 at 5:06 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Wed, Dec 20, 2023 at 11:46 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Wed, Dec 20, 2023 at 2:09 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> > >
> > > The virtio spec doesn't allow changing virtqueue addresses after
> > > DRIVER_OK. Some devices do support this operation when the device is
> > > suspended. The VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
> > > advertises this support as a backend features.
> >
> > There's an ongoing effort in virtio spec to introduce the suspend state.
> >
> > So I wonder if it's better to just allow such behaviour?
>
> Actually I mean, allow drivers to modify the parameters during suspend
> without a new feature.
>

That would be ideal, but how does userland check whether it can
suspend + change properties + resume?

The only way that comes to my mind is to make sure all parents return
an error if userland tries to do it, and then fall back in userland.
I'm ok with that, but I'm not sure whether current master & previous
kernels behave coherently. Do they return an error? Or return success
without changing the address / vq state?
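
A rough sketch of the "try it, then fall back in userland" flow
described above. Everything here is a hypothetical stand-in: a real VMM
would issue the VHOST_VDPA_SUSPEND, VHOST_SET_VRING_ADDR and
VHOST_VDPA_RESUME ioctls where the toy dev_*() functions are called, and
the -EINVAL error code is an assumption.

```c
#include <errno.h>

/* Toy device: models only whether the parent accepts the change. */
struct fake_dev {
        int addr_change_allowed; /* parent accepts set_vq_addr when suspended */
        int resets;              /* number of times the slow path ran */
};

static int dev_suspend(struct fake_dev *d) { (void)d; return 0; }
static int dev_resume(struct fake_dev *d) { (void)d; return 0; }

static int dev_set_vq_addr(struct fake_dev *d)
{
        return d->addr_change_allowed ? 0 : -EINVAL;
}

static int dev_reset_and_reconfigure(struct fake_dev *d)
{
        d->resets++;
        return 0;
}

/* Fast path first; any refusal sends us to a full reset and re-setup. */
static int relocate_vq(struct fake_dev *d)
{
        int ret = dev_suspend(d);

        if (ret == 0)
                ret = dev_set_vq_addr(d);
        if (ret == 0)
                ret = dev_resume(d);
        if (ret == 0)
                return 0;

        return dev_reset_and_reconfigure(d);
}
```

This only works if a parent that cannot apply the change actually
returns an error. A parent that returns success without applying the new
address defeats the fallback, which is the concern raised above.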
  
Eugenio Perez Martin Dec. 20, 2023, 4:09 p.m. UTC | #5
On Tue, Dec 19, 2023 at 7:09 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> The virtio spec doesn't allow changing virtqueue addresses after
> DRIVER_OK. Some devices do support this operation when the device is
> suspended. The VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
> advertises this support as a backend features.
>
> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> Suggested-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>  include/uapi/linux/vhost_types.h | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h
> index d7656908f730..aacd067afc89 100644
> --- a/include/uapi/linux/vhost_types.h
> +++ b/include/uapi/linux/vhost_types.h
> @@ -192,5 +192,9 @@ struct vhost_vdpa_iova_range {
>  #define VHOST_BACKEND_F_DESC_ASID    0x7
>  /* IOTLB don't flush memory mapping across device reset */
>  #define VHOST_BACKEND_F_IOTLB_PERSIST  0x8
> +/* Device supports changing virtqueue addresses when device is suspended
> + * and is in state DRIVER_OK.
> + */
> +#define VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND  0x9
>

If we go by feature flag,

Acked-by: Eugenio Pérez <eperezma@redhat.com>

>  #endif
> --
> 2.43.0
>
  
Jason Wang Dec. 21, 2023, 2:03 a.m. UTC | #6
On Wed, Dec 20, 2023 at 9:32 PM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Wed, Dec 20, 2023 at 5:06 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Wed, Dec 20, 2023 at 11:46 AM Jason Wang <jasowang@redhat.com> wrote:
> > >
> > > On Wed, Dec 20, 2023 at 2:09 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> > > >
> > > > The virtio spec doesn't allow changing virtqueue addresses after
> > > > DRIVER_OK. Some devices do support this operation when the device is
> > > > suspended. The VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
> > > > advertises this support as a backend features.
> > >
> > > There's an ongoing effort in virtio spec to introduce the suspend state.
> > >
> > > So I wonder if it's better to just allow such behaviour?
> >
> > Actually I mean, allow drivers to modify the parameters during suspend
> > without a new feature.
> >
>
> That would be ideal, but how do userland checks if it can suspend +
> change properties + resume?

As discussed, it looks to me like the only device that supports suspend
is the simulator, and it supports changing properties.

E.g.:

static int vdpasim_set_vq_address(struct vdpa_device *vdpa, u16 idx,
                                  u64 desc_area, u64 driver_area,
                                  u64 device_area)
{
        struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
        struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];

        vq->desc_addr = desc_area;
        vq->driver_addr = driver_area;
        vq->device_addr = device_area;

        return 0;
}

>
> The only way that comes to my mind is to make sure all parents return
> error if userland tries to do it, and then fallback in userland.

Yes.

> I'm
> ok with that, but I'm not sure if the current master & previous kernel
> has a coherent behavior. Do they return error? Or return success
> without changing address / vq state?

We probably don't need to worry too much here, as e.g. set_vq_address
could fail even without suspend (just at the uAPI level).

Thanks

>
  
Eugenio Perez Martin Dec. 21, 2023, 7:46 a.m. UTC | #7
On Thu, Dec 21, 2023 at 3:03 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Wed, Dec 20, 2023 at 9:32 PM Eugenio Perez Martin
> <eperezma@redhat.com> wrote:
> >
> > On Wed, Dec 20, 2023 at 5:06 AM Jason Wang <jasowang@redhat.com> wrote:
> > >
> > > On Wed, Dec 20, 2023 at 11:46 AM Jason Wang <jasowang@redhat.com> wrote:
> > > >
> > > > On Wed, Dec 20, 2023 at 2:09 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> > > > >
> > > > > The virtio spec doesn't allow changing virtqueue addresses after
> > > > > DRIVER_OK. Some devices do support this operation when the device is
> > > > > suspended. The VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
> > > > > advertises this support as a backend features.
> > > >
> > > > There's an ongoing effort in virtio spec to introduce the suspend state.
> > > >
> > > > So I wonder if it's better to just allow such behaviour?
> > >
> > > Actually I mean, allow drivers to modify the parameters during suspend
> > > without a new feature.
> > >
> >
> > That would be ideal, but how do userland checks if it can suspend +
> > change properties + resume?
>
> As discussed, it looks to me the only device that supports suspend is
> simulator and it supports change properties.
>
> E.g:
>
> static int vdpasim_set_vq_address(struct vdpa_device *vdpa, u16 idx,
>                                   u64 desc_area, u64 driver_area,
>                                   u64 device_area)
> {
>         struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
>         struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
>
>         vq->desc_addr = desc_area;
>         vq->driver_addr = driver_area;
>         vq->device_addr = device_area;
>
>         return 0;
> }
>

So in the current kernel master it is valid to set a different vq
address while the device is suspended in vdpa_sim. But it is not valid
in mlx5, as the FW will not be updated in resume (Dragos, please
correct me if I'm wrong). Both of them return success.

How can we know in the destination QEMU if it is valid to suspend &
set address? Should we handle this as a bugfix and backport the
change?

> >
> > The only way that comes to my mind is to make sure all parents return
> > error if userland tries to do it, and then fallback in userland.
>
> Yes.
>
> > I'm
> > ok with that, but I'm not sure if the current master & previous kernel
> > has a coherent behavior. Do they return error? Or return success
> > without changing address / vq state?
>
> We probably don't need to worry too much here, as e.g set_vq_address
> could fail even without suspend (just at uAPI level).
>

I don't get this, sorry. I rephrased my point with an example earlier
in the mail.
  
Dragos Tatulea Dec. 21, 2023, 11:52 a.m. UTC | #8
On Thu, 2023-12-21 at 08:46 +0100, Eugenio Perez Martin wrote:
> On Thu, Dec 21, 2023 at 3:03 AM Jason Wang <jasowang@redhat.com> wrote:
> > 
> > On Wed, Dec 20, 2023 at 9:32 PM Eugenio Perez Martin
> > <eperezma@redhat.com> wrote:
> > > 
> > > On Wed, Dec 20, 2023 at 5:06 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > 
> > > > On Wed, Dec 20, 2023 at 11:46 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > > 
> > > > > On Wed, Dec 20, 2023 at 2:09 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> > > > > > 
> > > > > > The virtio spec doesn't allow changing virtqueue addresses after
> > > > > > DRIVER_OK. Some devices do support this operation when the device is
> > > > > > suspended. The VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
> > > > > > advertises this support as a backend features.
> > > > > 
> > > > > There's an ongoing effort in virtio spec to introduce the suspend state.
> > > > > 
> > > > > So I wonder if it's better to just allow such behaviour?
> > > > 
> > > > Actually I mean, allow drivers to modify the parameters during suspend
> > > > without a new feature.
> > > > 
> > > 
> > > That would be ideal, but how do userland checks if it can suspend +
> > > change properties + resume?
> > 
> > As discussed, it looks to me the only device that supports suspend is
> > simulator and it supports change properties.
> > 
> > E.g:
> > 
> > static int vdpasim_set_vq_address(struct vdpa_device *vdpa, u16 idx,
> >                                   u64 desc_area, u64 driver_area,
> >                                   u64 device_area)
> > {
> >         struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> >         struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
> > 
> >         vq->desc_addr = desc_area;
> >         vq->driver_addr = driver_area;
> >         vq->device_addr = device_area;
> > 
> >         return 0;
> > }
> > 
> 
> So in the current kernel master it is valid to set a different vq
> address while the device is suspended in vdpa_sim. But it is not valid
> in mlx5, as the FW will not be updated in resume (Dragos, please
> correct me if I'm wrong). Both of them return success.
> 
In the current state, there is no resume. HW Virtqueues will just get re-created
with the new address. 

> How can we know in the destination QEMU if it is valid to suspend &
> set address? Should we handle this as a bugfix and backport the
> change?
> 
> > > 
> > > The only way that comes to my mind is to make sure all parents return
> > > error if userland tries to do it, and then fallback in userland.
> > 
> > Yes.
> > 
> > > I'm
> > > ok with that, but I'm not sure if the current master & previous kernel
> > > has a coherent behavior. Do they return error? Or return success
> > > without changing address / vq state?
> > 
> > We probably don't need to worry too much here, as e.g set_vq_address
> > could fail even without suspend (just at uAPI level).
> > 
> 
> I don't get this, sorry. I rephrased my point with an example earlier
> in the mail.
>
  
Eugenio Perez Martin Dec. 21, 2023, 12:08 p.m. UTC | #9
On Thu, Dec 21, 2023 at 12:52 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> On Thu, 2023-12-21 at 08:46 +0100, Eugenio Perez Martin wrote:
> > On Thu, Dec 21, 2023 at 3:03 AM Jason Wang <jasowang@redhat.com> wrote:
> > >
> > > On Wed, Dec 20, 2023 at 9:32 PM Eugenio Perez Martin
> > > <eperezma@redhat.com> wrote:
> > > >
> > > > On Wed, Dec 20, 2023 at 5:06 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > >
> > > > > On Wed, Dec 20, 2023 at 11:46 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > > >
> > > > > > On Wed, Dec 20, 2023 at 2:09 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> > > > > > >
> > > > > > > The virtio spec doesn't allow changing virtqueue addresses after
> > > > > > > DRIVER_OK. Some devices do support this operation when the device is
> > > > > > > suspended. The VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
> > > > > > > advertises this support as a backend features.
> > > > > >
> > > > > > There's an ongoing effort in virtio spec to introduce the suspend state.
> > > > > >
> > > > > > So I wonder if it's better to just allow such behaviour?
> > > > >
> > > > > Actually I mean, allow drivers to modify the parameters during suspend
> > > > > without a new feature.
> > > > >
> > > >
> > > > That would be ideal, but how do userland checks if it can suspend +
> > > > change properties + resume?
> > >
> > > As discussed, it looks to me the only device that supports suspend is
> > > simulator and it supports change properties.
> > >
> > > E.g:
> > >
> > > static int vdpasim_set_vq_address(struct vdpa_device *vdpa, u16 idx,
> > >                                   u64 desc_area, u64 driver_area,
> > >                                   u64 device_area)
> > > {
> > >         struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> > >         struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
> > >
> > >         vq->desc_addr = desc_area;
> > >         vq->driver_addr = driver_area;
> > >         vq->device_addr = device_area;
> > >
> > >         return 0;
> > > }
> > >
> >
> > So in the current kernel master it is valid to set a different vq
> > address while the device is suspended in vdpa_sim. But it is not valid
> > in mlx5, as the FW will not be updated in resume (Dragos, please
> > correct me if I'm wrong). Both of them return success.
> >
> In the current state, there is no resume. HW Virtqueues will just get re-created
> with the new address.
>

Oh, then all of this is effectively transparent to the userspace
except for the time it takes?

In that case you're right, we don't need feature flags. But I think it
would be great to also move the error return for the case where
userspace tries to modify vq parameters outside of the suspended state.

Thanks!


> > How can we know in the destination QEMU if it is valid to suspend &
> > set address? Should we handle this as a bugfix and backport the
> > change?
> >
> > > >
> > > > The only way that comes to my mind is to make sure all parents return
> > > > error if userland tries to do it, and then fallback in userland.
> > >
> > > Yes.
> > >
> > > > I'm
> > > > ok with that, but I'm not sure if the current master & previous kernel
> > > > has a coherent behavior. Do they return error? Or return success
> > > > without changing address / vq state?
> > >
> > > We probably don't need to worry too much here, as e.g set_vq_address
> > > could fail even without suspend (just at uAPI level).
> > >
> >
> > I don't get this, sorry. I rephrased my point with an example earlier
> > in the mail.
> >
>
  
Dragos Tatulea Dec. 21, 2023, 2:38 p.m. UTC | #10
On Thu, 2023-12-21 at 13:08 +0100, Eugenio Perez Martin wrote:
> On Thu, Dec 21, 2023 at 12:52 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> > 
> > On Thu, 2023-12-21 at 08:46 +0100, Eugenio Perez Martin wrote:
> > > On Thu, Dec 21, 2023 at 3:03 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > 
> > > > On Wed, Dec 20, 2023 at 9:32 PM Eugenio Perez Martin
> > > > <eperezma@redhat.com> wrote:
> > > > > 
> > > > > On Wed, Dec 20, 2023 at 5:06 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > 
> > > > > > On Wed, Dec 20, 2023 at 11:46 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > 
> > > > > > > On Wed, Dec 20, 2023 at 2:09 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> > > > > > > > 
> > > > > > > > The virtio spec doesn't allow changing virtqueue addresses after
> > > > > > > > DRIVER_OK. Some devices do support this operation when the device is
> > > > > > > > suspended. The VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
> > > > > > > > advertises this support as a backend features.
> > > > > > > 
> > > > > > > There's an ongoing effort in virtio spec to introduce the suspend state.
> > > > > > > 
> > > > > > > So I wonder if it's better to just allow such behaviour?
> > > > > > 
> > > > > > Actually I mean, allow drivers to modify the parameters during suspend
> > > > > > without a new feature.
> > > > > > 
> > > > > 
> > > > > That would be ideal, but how do userland checks if it can suspend +
> > > > > change properties + resume?
> > > > 
> > > > As discussed, it looks to me the only device that supports suspend is
> > > > simulator and it supports change properties.
> > > > 
> > > > E.g:
> > > > 
> > > > static int vdpasim_set_vq_address(struct vdpa_device *vdpa, u16 idx,
> > > >                                   u64 desc_area, u64 driver_area,
> > > >                                   u64 device_area)
> > > > {
> > > >         struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> > > >         struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
> > > > 
> > > >         vq->desc_addr = desc_area;
> > > >         vq->driver_addr = driver_area;
> > > >         vq->device_addr = device_area;
> > > > 
> > > >         return 0;
> > > > }
> > > > 
> > > 
> > > So in the current kernel master it is valid to set a different vq
> > > address while the device is suspended in vdpa_sim. But it is not valid
> > > in mlx5, as the FW will not be updated in resume (Dragos, please
> > > correct me if I'm wrong). Both of them return success.
> > > 
> > In the current state, there is no resume. HW Virtqueues will just get re-created
> > with the new address.
> > 
> 
> Oh, then all of this is effectively transparent to the userspace
> except for the time it takes?
> 
Not quite: mlx5_vdpa_set_vq_address will save the vq address only in the SW vq
representation. Only later will it call into the FW to update it. Later
means:
- On DRIVER_OK state, when the VQs get created.
- On .set_map when the VQs get re-created (before this series) / updated (after
this series)
- On .resume (after this series).

So if .set_vq_address is called when the VQ is in DRIVER_OK but not
suspended, those addresses will only be applied at one of the later points above.
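
To make the deferred update concrete, here is a toy model. It does not
use the mlx5_vdpa structures: the toy_* names and the single desc
address are assumptions made for illustration. The setter touches only
the software copy and reports success, and the "firmware" copy catches
up at a sync point (DRIVER_OK, .set_map or .resume in the list above).

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one virtqueue, not the real driver state. */
struct toy_vq {
        uint64_t sw_desc_addr; /* what .set_vq_address last stored */
        uint64_t fw_desc_addr; /* what the "firmware" currently uses */
};

/* Only the software copy changes, and the call still returns success. */
static int toy_set_vq_address(struct toy_vq *vq, uint64_t desc_addr)
{
        vq->sw_desc_addr = desc_addr;
        return 0;
}

/* Stands in for any of the sync points: DRIVER_OK, .set_map, .resume. */
static void toy_sync_to_fw(struct toy_vq *vq)
{
        vq->fw_desc_addr = vq->sw_desc_addr;
}

/* True while the device still uses an address userspace thinks it replaced. */
static bool toy_addr_is_stale(const struct toy_vq *vq)
{
        return vq->fw_desc_addr != vq->sw_desc_addr;
}
```

Between toy_set_vq_address() and the next toy_sync_to_fw(), the caller
has been told the change succeeded while the device keeps running on the
old address.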

> In that case you're right, we don't need feature flags. But I think it
> would be great to also move the error return in case userspace tries
> to modify vq parameters out of suspend state.
> 
On the driver side or on the core side?

Thanks
> Thanks!
> 
> 
> > > How can we know in the destination QEMU if it is valid to suspend &
> > > set address? Should we handle this as a bugfix and backport the
> > > change?
> > > 
> > > > > 
> > > > > The only way that comes to my mind is to make sure all parents return
> > > > > error if userland tries to do it, and then fallback in userland.
> > > > 
> > > > Yes.
> > > > 
> > > > > I'm
> > > > > ok with that, but I'm not sure if the current master & previous kernel
> > > > > has a coherent behavior. Do they return error? Or return success
> > > > > without changing address / vq state?
> > > > 
> > > > We probably don't need to worry too much here, as e.g set_vq_address
> > > > could fail even without suspend (just at uAPI level).
> > > > 
> > > 
> > > I don't get this, sorry. I rephrased my point with an example earlier
> > > in the mail.
> > > 
> > 
>
  
Eugenio Perez Martin Dec. 21, 2023, 2:55 p.m. UTC | #11
On Thu, Dec 21, 2023 at 3:38 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> On Thu, 2023-12-21 at 13:08 +0100, Eugenio Perez Martin wrote:
> > On Thu, Dec 21, 2023 at 12:52 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> > >
> > > On Thu, 2023-12-21 at 08:46 +0100, Eugenio Perez Martin wrote:
> > > > On Thu, Dec 21, 2023 at 3:03 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > >
> > > > > On Wed, Dec 20, 2023 at 9:32 PM Eugenio Perez Martin
> > > > > <eperezma@redhat.com> wrote:
> > > > > >
> > > > > > On Wed, Dec 20, 2023 at 5:06 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > >
> > > > > > > On Wed, Dec 20, 2023 at 11:46 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > >
> > > > > > > > On Wed, Dec 20, 2023 at 2:09 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> > > > > > > > >
> > > > > > > > > The virtio spec doesn't allow changing virtqueue addresses after
> > > > > > > > > DRIVER_OK. Some devices do support this operation when the device is
> > > > > > > > > suspended. The VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
> > > > > > > > > advertises this support as a backend features.
> > > > > > > >
> > > > > > > > There's an ongoing effort in virtio spec to introduce the suspend state.
> > > > > > > >
> > > > > > > > So I wonder if it's better to just allow such behaviour?
> > > > > > >
> > > > > > > Actually I mean, allow drivers to modify the parameters during suspend
> > > > > > > without a new feature.
> > > > > > >
> > > > > >
> > > > > > That would be ideal, but how do userland checks if it can suspend +
> > > > > > change properties + resume?
> > > > >
> > > > > As discussed, it looks to me the only device that supports suspend is
> > > > > simulator and it supports change properties.
> > > > >
> > > > > E.g:
> > > > >
> > > > > static int vdpasim_set_vq_address(struct vdpa_device *vdpa, u16 idx,
> > > > >                                   u64 desc_area, u64 driver_area,
> > > > >                                   u64 device_area)
> > > > > {
> > > > >         struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> > > > >         struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
> > > > >
> > > > >         vq->desc_addr = desc_area;
> > > > >         vq->driver_addr = driver_area;
> > > > >         vq->device_addr = device_area;
> > > > >
> > > > >         return 0;
> > > > > }
> > > > >
> > > >
> > > > So in the current kernel master it is valid to set a different vq
> > > > address while the device is suspended in vdpa_sim. But it is not valid
> > > > in mlx5, as the FW will not be updated in resume (Dragos, please
> > > > correct me if I'm wrong). Both of them return success.
> > > >
> > > In the current state, there is no resume. HW Virtqueues will just get re-created
> > > with the new address.
> > >
> >
> > Oh, then all of this is effectively transparent to the userspace
> > except for the time it takes?
> >
> Not quite: mlx5_vdpa_set_vq_address will save the vq address only on the SW vq
> representation. Only later will it will call into the FW to update the FW. Later
> means:
> - On DRIVER_OK state, when the VQs get created.
> - On .set_map when the VQs get re-created (before this series) / updated (after
> this series)
> - On .resume (after this series).
>
> So if the .set_vq_address is called when the VQ is in DRIVER_OK but not
> suspended those addresses will be set later for later.
>

Ouch, that is more along the lines of what I was thinking :(.

> > In that case you're right, we don't need feature flags. But I think it
> > would be great to also move the error return in case userspace tries
> > to modify vq parameters out of suspend state.
> >
> On the driver side or on the core side?
>

Core side.

It does not have to be part of this series, I meant it can be proposed
in a separate series and applied before the parent driver one.

> Thanks
> > Thanks!
> >
> >
> > > > How can we know in the destination QEMU if it is valid to suspend &
> > > > set address? Should we handle this as a bugfix and backport the
> > > > change?
> > > >
> > > > > >
> > > > > > The only way that comes to my mind is to make sure all parents return
> > > > > > error if userland tries to do it, and then fallback in userland.
> > > > >
> > > > > Yes.
> > > > >
> > > > > > I'm
> > > > > > ok with that, but I'm not sure if the current master & previous kernel
> > > > > > has a coherent behavior. Do they return error? Or return success
> > > > > > without changing address / vq state?
> > > > >
> > > > > We probably don't need to worry too much here, as e.g set_vq_address
> > > > > could fail even without suspend (just at uAPI level).
> > > > >
> > > >
> > > > I don't get this, sorry. I rephrased my point with an example earlier
> > > > in the mail.
> > > >
> > >
> >
>
  
Dragos Tatulea Dec. 21, 2023, 3:07 p.m. UTC | #12
On Thu, 2023-12-21 at 15:55 +0100, Eugenio Perez Martin wrote:
> On Thu, Dec 21, 2023 at 3:38 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> > 
> > On Thu, 2023-12-21 at 13:08 +0100, Eugenio Perez Martin wrote:
> > > On Thu, Dec 21, 2023 at 12:52 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> > > > 
> > > > On Thu, 2023-12-21 at 08:46 +0100, Eugenio Perez Martin wrote:
> > > > > On Thu, Dec 21, 2023 at 3:03 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > 
> > > > > > On Wed, Dec 20, 2023 at 9:32 PM Eugenio Perez Martin
> > > > > > <eperezma@redhat.com> wrote:
> > > > > > > 
> > > > > > > On Wed, Dec 20, 2023 at 5:06 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > 
> > > > > > > > On Wed, Dec 20, 2023 at 11:46 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > > 
> > > > > > > > > On Wed, Dec 20, 2023 at 2:09 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> > > > > > > > > > 
> > > > > > > > > > The virtio spec doesn't allow changing virtqueue addresses after
> > > > > > > > > > DRIVER_OK. Some devices do support this operation when the device is
> > > > > > > > > > suspended. The VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
> > > > > > > > > > advertises this support as a backend features.
> > > > > > > > > 
> > > > > > > > > There's an ongoing effort in virtio spec to introduce the suspend state.
> > > > > > > > > 
> > > > > > > > > So I wonder if it's better to just allow such behaviour?
> > > > > > > > 
> > > > > > > > Actually I mean, allow drivers to modify the parameters during suspend
> > > > > > > > without a new feature.
> > > > > > > > 
> > > > > > > 
> > > > > > > That would be ideal, but how do userland checks if it can suspend +
> > > > > > > change properties + resume?
> > > > > > 
> > > > > > As discussed, it looks to me the only device that supports suspend is
> > > > > > simulator and it supports change properties.
> > > > > > 
> > > > > > E.g:
> > > > > > 
> > > > > > static int vdpasim_set_vq_address(struct vdpa_device *vdpa, u16 idx,
> > > > > >                                   u64 desc_area, u64 driver_area,
> > > > > >                                   u64 device_area)
> > > > > > {
> > > > > >         struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> > > > > >         struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
> > > > > > 
> > > > > >         vq->desc_addr = desc_area;
> > > > > >         vq->driver_addr = driver_area;
> > > > > >         vq->device_addr = device_area;
> > > > > > 
> > > > > >         return 0;
> > > > > > }
> > > > > > 
> > > > > 
> > > > > So in the current kernel master it is valid to set a different vq
> > > > > address while the device is suspended in vdpa_sim. But it is not valid
> > > > > in mlx5, as the FW will not be updated in resume (Dragos, please
> > > > > correct me if I'm wrong). Both of them return success.
> > > > > 
> > > > In the current state, there is no resume. HW Virtqueues will just get re-created
> > > > with the new address.
> > > > 
> > > 
> > > Oh, then all of this is effectively transparent to the userspace
> > > except for the time it takes?
> > > 
> > Not quite: mlx5_vdpa_set_vq_address will save the vq address only on the SW vq
> > representation. Only later will it will call into the FW to update the FW. Later
> > means:
> > - On DRIVER_OK state, when the VQs get created.
> > - On .set_map when the VQs get re-created (before this series) / updated (after
> > this series)
> > - On .resume (after this series).
> > 
> > So if the .set_vq_address is called when the VQ is in DRIVER_OK but not
> > suspended those addresses will be set later for later.
> > 
> 
> Ouch, that is more in the line of my thoughts :(.
> 
> > > In that case you're right, we don't need feature flags. But I think it
> > > would be great to also move the error return in case userspace tries
> > > to modify vq parameters out of suspend state.
> > > 
> > On the driver side or on the core side?
> > 
> 
> Core side.
> 
Checking my understanding: instead of the feature flags there would be a check
(for .set_vq_addr and .set_vq_state) that returns an error if they are called
while the device is in DRIVER_OK and not in the suspended state?
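
A minimal sketch of such a check, to pin down the condition being
discussed. The toy_vdpa struct, the function name and the -EINVAL return
value are assumptions for illustration. The real core would read the
status and suspend state from wherever it tracks them.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* DRIVER_OK status bit, as defined by the virtio spec. */
#ifndef VIRTIO_CONFIG_S_DRIVER_OK
#define VIRTIO_CONFIG_S_DRIVER_OK 4
#endif

/* Hypothetical core-side view of the device. */
struct toy_vdpa {
        uint8_t status;
        bool suspended;
};

/* Gate for .set_vq_addr / .set_vq_state: reject on a running device. */
static int toy_check_vq_change_allowed(const struct toy_vdpa *v)
{
        bool driver_ok = (v->status & VIRTIO_CONFIG_S_DRIVER_OK) != 0;

        if (driver_ok && !v->suspended)
                return -EINVAL;

        return 0;
}
```

With this gate, changes before DRIVER_OK and changes while suspended
both pass, and only the running case is refused.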

> It does not have to be part of this series, I meant it can be proposed
> in a separate series and applied before the parent driver one.
> 
> > Thanks
> > > Thanks!
> > > 
> > > 
> > > > > How can we know in the destination QEMU if it is valid to suspend &
> > > > > set address? Should we handle this as a bugfix and backport the
> > > > > change?
> > > > > 
> > > > > > > 
> > > > > > > The only way that comes to my mind is to make sure all parents return
> > > > > > > error if userland tries to do it, and then fallback in userland.
> > > > > > 
> > > > > > Yes.
> > > > > > 
> > > > > > > I'm
> > > > > > > ok with that, but I'm not sure if the current master & previous kernel
> > > > > > > has a coherent behavior. Do they return error? Or return success
> > > > > > > without changing address / vq state?
> > > > > > 
> > > > > > We probably don't need to worry too much here, as e.g set_vq_address
> > > > > > could fail even without suspend (just at uAPI level).
> > > > > > 
> > > > > 
> > > > > I don't get this, sorry. I rephrased my point with an example earlier
> > > > > in the mail.
> > > > > 
> > > > 
> > > 
> > 
>
  
Jason Wang Dec. 22, 2023, 2:50 a.m. UTC | #13
On Thu, Dec 21, 2023 at 3:47 PM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Thu, Dec 21, 2023 at 3:03 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Wed, Dec 20, 2023 at 9:32 PM Eugenio Perez Martin
> > <eperezma@redhat.com> wrote:
> > >
> > > On Wed, Dec 20, 2023 at 5:06 AM Jason Wang <jasowang@redhat.com> wrote:
> > > >
> > > > On Wed, Dec 20, 2023 at 11:46 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > >
> > > > > On Wed, Dec 20, 2023 at 2:09 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> > > > > >
> > > > > > The virtio spec doesn't allow changing virtqueue addresses after
> > > > > > DRIVER_OK. Some devices do support this operation when the device is
> > > > > > suspended. The VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
> > > > > > > advertises this support as a backend feature.
> > > > >
> > > > > There's an ongoing effort in virtio spec to introduce the suspend state.
> > > > >
> > > > > So I wonder if it's better to just allow such behaviour?
> > > >
> > > > Actually I mean, allow drivers to modify the parameters during suspend
> > > > without a new feature.
> > > >
> > >
> > > That would be ideal, but how does userland check if it can suspend +
> > > change properties + resume?
> >
> > As discussed, it looks to me like the only device that supports suspend is
> > the simulator, and it supports changing properties.
> >
> > E.g:
> >
> > static int vdpasim_set_vq_address(struct vdpa_device *vdpa, u16 idx,
> >                                   u64 desc_area, u64 driver_area,
> >                                   u64 device_area)
> > {
> >         struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> >         struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
> >
> >         vq->desc_addr = desc_area;
> >         vq->driver_addr = driver_area;
> >         vq->device_addr = device_area;
> >
> >         return 0;
> > }
> >
>
> So in the current kernel master it is valid to set a different vq
> address while the device is suspended in vdpa_sim. But it is not valid
> in mlx5, as the FW will not be updated in resume (Dragos, please
> correct me if I'm wrong). Both of them return success.
>
> How can we know in the destination QEMU if it is valid to suspend &
> set address? Should we handle this as a bugfix and backport the
> change?

Good point.

We probably need to do a backport; this seems to be the easiest way.
Theoretically, userspace may assume this behavior (though I don't
think there would be a user that depends on the simulator).

>
> > >
> > > The only way that comes to my mind is to make sure all parents return
> > > error if userland tries to do it, and then fallback in userland.
> >
> > Yes.
> >
> > > I'm
> > > ok with that, but I'm not sure if the current master & previous kernel
> > > has a coherent behavior. Do they return error? Or return success
> > > without changing address / vq state?
> >
> > We probably don't need to worry too much here, as e.g set_vq_address
> > could fail even without suspend (just at uAPI level).
> >
>
> I don't get this, sorry. I rephrased my point with an example earlier
> in the mail.

I mean currently, VHOST_SET_VRING_ADDR can fail. So userspace should
not assume it will always succeed.
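
To illustrate the point, here is a minimal userspace sketch, assuming a
vhost file descriptor has already been opened and set up elsewhere. The
helper name set_vring_addr_checked is hypothetical; only the
VHOST_SET_VRING_ADDR ioctl and struct vhost_vring_addr come from the
uAPI headers.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>

/*
 * Hypothetical helper: program the ring addresses of one virtqueue and
 * report failure to the caller instead of assuming success.
 * Returns 0 on success or a negative errno value on failure.
 */
static int set_vring_addr_checked(int vhost_fd, unsigned int index,
                                  uint64_t desc, uint64_t avail,
                                  uint64_t used)
{
        struct vhost_vring_addr addr = {
                .index = index,
                .desc_user_addr = desc,
                .avail_user_addr = avail,
                .used_user_addr = used,
        };

        /* A caller is expected to pass a real descriptor. */
        assert(vhost_fd >= -1);

        if (ioctl(vhost_fd, VHOST_SET_VRING_ADDR, &addr) < 0)
                return -errno; /* caller decides how to fall back */

        return 0;
}
```

A caller that gets a negative value back could then fall back to a full
device reset and reconfiguration rather than continuing with stale
addresses.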

Thanks

>
  
Eugenio Perez Martin Dec. 22, 2023, 7:30 a.m. UTC | #14
On Thu, Dec 21, 2023 at 4:07 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> On Thu, 2023-12-21 at 15:55 +0100, Eugenio Perez Martin wrote:
> > On Thu, Dec 21, 2023 at 3:38 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> > >
> > > On Thu, 2023-12-21 at 13:08 +0100, Eugenio Perez Martin wrote:
> > > > On Thu, Dec 21, 2023 at 12:52 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> > > > >
> > > > > On Thu, 2023-12-21 at 08:46 +0100, Eugenio Perez Martin wrote:
> > > > > > On Thu, Dec 21, 2023 at 3:03 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > >
> > > > > > > On Wed, Dec 20, 2023 at 9:32 PM Eugenio Perez Martin
> > > > > > > <eperezma@redhat.com> wrote:
> > > > > > > >
> > > > > > > > On Wed, Dec 20, 2023 at 5:06 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > >
> > > > > > > > > On Wed, Dec 20, 2023 at 11:46 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Wed, Dec 20, 2023 at 2:09 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > The virtio spec doesn't allow changing virtqueue addresses after
> > > > > > > > > > > DRIVER_OK. Some devices do support this operation when the device is
> > > > > > > > > > > suspended. The VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
> > > > > > > > > > > > advertises this support as a backend feature.
> > > > > > > > > >
> > > > > > > > > > There's an ongoing effort in virtio spec to introduce the suspend state.
> > > > > > > > > >
> > > > > > > > > > So I wonder if it's better to just allow such behaviour?
> > > > > > > > >
> > > > > > > > > Actually I mean, allow drivers to modify the parameters during suspend
> > > > > > > > > without a new feature.
> > > > > > > > >
> > > > > > > >
> > > > > > > > > That would be ideal, but how does userland check if it can suspend +
> > > > > > > > > change properties + resume?
> > > > > > >
> > > > > > > > As discussed, it looks to me like the only device that supports suspend is
> > > > > > > > the simulator, and it supports changing properties.
> > > > > > >
> > > > > > > E.g:
> > > > > > >
> > > > > > > static int vdpasim_set_vq_address(struct vdpa_device *vdpa, u16 idx,
> > > > > > >                                   u64 desc_area, u64 driver_area,
> > > > > > >                                   u64 device_area)
> > > > > > > {
> > > > > > >         struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> > > > > > >         struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
> > > > > > >
> > > > > > >         vq->desc_addr = desc_area;
> > > > > > >         vq->driver_addr = driver_area;
> > > > > > >         vq->device_addr = device_area;
> > > > > > >
> > > > > > >         return 0;
> > > > > > > }
> > > > > > >
> > > > > >
> > > > > > So in the current kernel master it is valid to set a different vq
> > > > > > address while the device is suspended in vdpa_sim. But it is not valid
> > > > > > in mlx5, as the FW will not be updated in resume (Dragos, please
> > > > > > correct me if I'm wrong). Both of them return success.
> > > > > >
> > > > > In the current state, there is no resume. HW Virtqueues will just get re-created
> > > > > with the new address.
> > > > >
> > > >
> > > > Oh, then all of this is effectively transparent to the userspace
> > > > except for the time it takes?
> > > >
> > > Not quite: mlx5_vdpa_set_vq_address will save the vq address only on the SW vq
> > > > representation. Only later will it call into the FW to update the FW. Later
> > > means:
> > > - On DRIVER_OK state, when the VQs get created.
> > > - On .set_map when the VQs get re-created (before this series) / updated (after
> > > this series)
> > > - On .resume (after this series).
> > >
> > > > So if .set_vq_address is called when the VQ is in DRIVER_OK but not
> > > > suspended, those addresses will only be applied later.
> > >
> >
> > Ouch, that is more in the line of my thoughts :(.
> >
> > > > In that case you're right, we don't need feature flags. But I think it
> > > > would be great to also move the error return in case userspace tries
> > > > to modify vq parameters out of suspend state.
> > > >
> > > On the driver side or on the core side?
> > >
> >
> > Core side.
> >
> Checking my understanding: instead of the feature flags there would be a check
> (for .set_vq_addr and .set_vq_state) to return an error if they are called under
> DRIVER_OK and not suspended state?
>

Yes, correct. Per Jason's message, it should be enough with two
independent series:
* Patches 6, 7 and 8 of this series, just checking for suspend state
and not feature flags.
* Your v2.
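
As a rough sketch of what "checking for suspend state and not feature
flags" could look like, the following is a standalone model, not the
actual vhost-vdpa code. struct vdpa_model, vq_change_allowed and the
choice of -EINVAL are assumptions made for illustration; only the
DRIVER_OK status bit value comes from the virtio spec.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* DRIVER_OK device status bit, as defined by the virtio spec. */
#define VIRTIO_CONFIG_S_DRIVER_OK 4

/* Hypothetical model of the state the core would consult. */
struct vdpa_model {
        uint8_t status;  /* virtio device status byte */
        bool suspended;  /* set on suspend, cleared on resume/reset */
};

/*
 * Gate for vq address/state changes: allowed before DRIVER_OK, and
 * allowed after DRIVER_OK only while the device is suspended.
 * Returns 0 if allowed, -EINVAL otherwise (error code is an assumption).
 */
static int vq_change_allowed(const struct vdpa_model *dev)
{
        assert(dev != NULL);

        if ((dev->status & VIRTIO_CONFIG_S_DRIVER_OK) && !dev->suspended)
                return -EINVAL;

        return 0;
}
```

With a single check like this in the core, every parent driver would get
the same error behaviour without having to advertise a flag.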

Thanks!

> > It does not have to be part of this series, I meant it can be proposed
> > in a separate series and applied before the parent driver one.
> >
> > > Thanks
> > > > Thanks!
> > > >
> > > >
> > > > > > How can we know in the destination QEMU if it is valid to suspend &
> > > > > > set address? Should we handle this as a bugfix and backport the
> > > > > > change?
> > > > > >
> > > > > > > >
> > > > > > > > The only way that comes to my mind is to make sure all parents return
> > > > > > > > error if userland tries to do it, and then fallback in userland.
> > > > > > >
> > > > > > > Yes.
> > > > > > >
> > > > > > > > I'm
> > > > > > > > ok with that, but I'm not sure if the current master & previous kernel
> > > > > > > > has a coherent behavior. Do they return error? Or return success
> > > > > > > > without changing address / vq state?
> > > > > > >
> > > > > > > We probably don't need to worry too much here, as e.g set_vq_address
> > > > > > > could fail even without suspend (just at uAPI level).
> > > > > > >
> > > > > >
> > > > > > I don't get this, sorry. I rephrased my point with an example earlier
> > > > > > in the mail.
> > > > > >
> > > > >
> > > >
> > >
> >
>
  
Michael S. Tsirkin Dec. 22, 2023, 8:29 a.m. UTC | #15
On Thu, Dec 21, 2023 at 03:07:22PM +0000, Dragos Tatulea wrote:
> > > > In that case you're right, we don't need feature flags. But I think it
> > > > would be great to also move the error return in case userspace tries
> > > > to modify vq parameters out of suspend state.
> > > > 
> > > On the driver side or on the core side?
> > > 
> > 
> > Core side.
> > 
> Checking my understanding: instead of the feature flags there would be a check
> (for .set_vq_addr and .set_vq_state) to return an error if they are called under
> DRIVER_OK and not suspended state?

Yea, this looks much saner. If we start adding feature flags for
each OPERATION_X_LEGAL_IN_STATE_Y, then we will end up with N^2
feature bits, which is not reasonable.
  
Dragos Tatulea Dec. 22, 2023, 10:51 a.m. UTC | #16
On Fri, 2023-12-22 at 03:29 -0500, Michael S. Tsirkin wrote:
> On Thu, Dec 21, 2023 at 03:07:22PM +0000, Dragos Tatulea wrote:
> > > > > In that case you're right, we don't need feature flags. But I think it
> > > > > would be great to also move the error return in case userspace tries
> > > > > to modify vq parameters out of suspend state.
> > > > > 
> > > > On the driver side or on the core side?
> > > > 
> > > 
> > > Core side.
> > > 
> > Checking my understanding: instead of the feature flags there would be a check
> > (for .set_vq_addr and .set_vq_state) to return an error if they are called under
> > DRIVER_OK and not suspended state?
> 
> Yea this looks much saner, if we start adding feature flags for
> each OPERATION_X_LEGAL_IN_STATE_Y then we will end up with N^2
> feature bits which is not reasonable.
> 
Ack. Is the v2 enough or should I respin a v5 with the updated Acked-by tags?

I will prepare the core part as a different series without the flags.

Thanks,
Dragos
  
Dragos Tatulea Dec. 25, 2023, 1:45 p.m. UTC | #17
On Fri, 2023-12-22 at 11:51 +0100, Dragos Tatulea wrote:
> On Fri, 2023-12-22 at 03:29 -0500, Michael S. Tsirkin wrote:
> > On Thu, Dec 21, 2023 at 03:07:22PM +0000, Dragos Tatulea wrote:
> > > > > > In that case you're right, we don't need feature flags. But I think it
> > > > > > would be great to also move the error return in case userspace tries
> > > > > > to modify vq parameters out of suspend state.
> > > > > > 
> > > > > On the driver side or on the core side?
> > > > > 
> > > > 
> > > > Core side.
> > > > 
> > > Checking my understanding: instead of the feature flags there would be a check
> > > (for .set_vq_addr and .set_vq_state) to return an error if they are called under
> > > DRIVER_OK and not suspended state?
> > 
> > Yea this looks much saner, if we start adding feature flags for
> > each OPERATION_X_LEGAL_IN_STATE_Y then we will end up with N^2
> > feature bits which is not reasonable.
> > 
> Ack. Is the v2 enough or should I respin a v5 with the updated Acked-by tags?
> 
> I will prepare the core part as a different series without the flags.
> 
Core part sent:
https://lore.kernel.org/virtualization/20231225134210.151540-1-dtatulea@nvidia.com/T/#t

I also have a v2 respin with extra Acked-by tags if necessary as a v5. Just let
me know if it is needed.

Thanks,
Dragos
  

Patch

diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h
index d7656908f730..aacd067afc89 100644
--- a/include/uapi/linux/vhost_types.h
+++ b/include/uapi/linux/vhost_types.h
@@ -192,5 +192,9 @@  struct vhost_vdpa_iova_range {
 #define VHOST_BACKEND_F_DESC_ASID    0x7
 /* IOTLB don't flush memory mapping across device reset */
 #define VHOST_BACKEND_F_IOTLB_PERSIST  0x8
+/* Device supports changing virtqueue addresses when device is suspended
+ * and is in state DRIVER_OK.
+ */
+#define VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND  0x9
 
 #endif
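
For context, the VHOST_BACKEND_F_* values are bit numbers, not masks. A
minimal sketch of how userspace might have tested the proposed bit,
assuming the 64-bit feature mask was already read with the
VHOST_GET_BACKEND_FEATURES ioctl; the helper backend_has_feature is
hypothetical, and the flag is defined locally with the value from this
patch because the thread moved away from adding it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit number proposed by this patch; defined locally for the sketch. */
#ifndef VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND
#define VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND 0x9
#endif

/* Hypothetical helper: test one backend feature bit in a 64-bit mask. */
static bool backend_has_feature(uint64_t features, unsigned int bit)
{
        assert(bit < 64);
        return (features & (1ULL << bit)) != 0;
}
```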