[4/4] vdpa/mlx5: implement .reset_map driver op
Commit Message
Since commit 6f5312f80183 ("vdpa/mlx5: Add support for running with
virtio_vdpa"), mlx5_vdpa starts with preallocate 1:1 DMA MR at device
creation time. This 1:1 DMA MR will be implicitly destroyed while
the first .set_map call is invoked, in which case callers like
vhost-vdpa will start to set up custom mappings. When the .reset
callback is invoked, the custom mappings will be cleared and the 1:1
DMA MR will be re-created.
In order to reduce excessive memory mapping cost in live migration,
it is desirable to decouple the vhost-vdpa IOTLB abstraction from
the virtio device life cycle, i.e. mappings can be kept around intact
across virtio device reset. Leverage the .reset_map callback, which
is meant to destroy the regular MR on the given ASID and recreate the
initial DMA mapping. That way, the device .reset op can run free from
having to maintain and clean up memory mappings by itself.
The cvq mapping also needs to be cleared if is in the given ASID.
Co-developed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
---
drivers/vdpa/mlx5/core/mlx5_vdpa.h | 1 +
drivers/vdpa/mlx5/core/mr.c | 17 +++++++++++++++++
drivers/vdpa/mlx5/net/mlx5_vnet.c | 18 +++++++++++++-----
3 files changed, 31 insertions(+), 5 deletions(-)
Comments
On Tue, Oct 10, 2023 at 5:05 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> Since commit 6f5312f80183 ("vdpa/mlx5: Add support for running with
> virtio_vdpa"), mlx5_vdpa starts with preallocate 1:1 DMA MR at device
> creation time. This 1:1 DMA MR will be implicitly destroyed while
> the first .set_map call is invoked, in which case callers like
> vhost-vdpa will start to set up custom mappings. When the .reset
> callback is invoked, the custom mappings will be cleared and the 1:1
> DMA MR will be re-created.
>
> In order to reduce excessive memory mapping cost in live migration,
> it is desirable to decouple the vhost-vdpa IOTLB abstraction from
> the virtio device life cycle, i.e. mappings can be kept around intact
> across virtio device reset. Leverage the .reset_map callback, which
> is meant to destroy the regular MR on the given ASID and recreate the
> initial DMA mapping. That way, the device .reset op can run free from
> having to maintain and clean up memory mappings by itself.
>
> The cvq mapping also needs to be cleared if is in the given ASID.
>
> Co-developed-by: Dragos Tatulea <dtatulea@nvidia.com>
> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
I wonder if the simulator suffers from the exact same issue. If yes,
let's fix the simulator as well?
Thanks
On 10/12/2023 8:04 PM, Jason Wang wrote:
> On Tue, Oct 10, 2023 at 5:05 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>> Since commit 6f5312f80183 ("vdpa/mlx5: Add support for running with
>> virtio_vdpa"), mlx5_vdpa starts with preallocate 1:1 DMA MR at device
>> creation time. This 1:1 DMA MR will be implicitly destroyed while
>> the first .set_map call is invoked, in which case callers like
>> vhost-vdpa will start to set up custom mappings. When the .reset
>> callback is invoked, the custom mappings will be cleared and the 1:1
>> DMA MR will be re-created.
>>
>> In order to reduce excessive memory mapping cost in live migration,
>> it is desirable to decouple the vhost-vdpa IOTLB abstraction from
>> the virtio device life cycle, i.e. mappings can be kept around intact
>> across virtio device reset. Leverage the .reset_map callback, which
>> is meant to destroy the regular MR on the given ASID and recreate the
>> initial DMA mapping. That way, the device .reset op can run free from
>> having to maintain and clean up memory mappings by itself.
>>
>> The cvq mapping also needs to be cleared if is in the given ASID.
>>
>> Co-developed-by: Dragos Tatulea <dtatulea@nvidia.com>
>> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
>> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> I wonder if the simulator suffers from the exact same issue.
For vdpa-sim !use_va (map using PA and with pinning) case, yes. But I'm
not sure the situation of the vdpa-sim(-blk) use_va case, e.g. I haven't
checked if there's dependency on today's reset behavior (coupled), and
if QEMU vhost-vdpa backend driver is the only userspace consumer. Maybe
Stefano knows?
I can give it a try on simulator fix but don't count me on the
vdpa-sim(-blk) use_va part.
Regards,
-Siwei
> If yes,
> let's fix the simulator as well?
>
> Thanks
>
@@ -127,6 +127,7 @@ int mlx5_vdpa_update_cvq_iotlb(struct mlx5_vdpa_dev *mvdev,
struct vhost_iotlb *iotlb,
unsigned int asid);
int mlx5_vdpa_create_dma_mr(struct mlx5_vdpa_dev *mvdev);
+int mlx5_vdpa_reset_mr(struct mlx5_vdpa_dev *mvdev, unsigned int asid);
#define mlx5_vdpa_warn(__dev, format, ...) \
dev_warn((__dev)->mdev->device, "%s:%d:(pid %d) warning: " format, __func__, __LINE__, \
@@ -645,3 +645,20 @@ int mlx5_vdpa_create_dma_mr(struct mlx5_vdpa_dev *mvdev)
return mlx5_vdpa_update_cvq_iotlb(mvdev, NULL, 0);
}
+
+int mlx5_vdpa_reset_mr(struct mlx5_vdpa_dev *mvdev, unsigned int asid)
+{
+ if (asid >= MLX5_VDPA_NUM_AS)
+ return -EINVAL;
+
+ mlx5_vdpa_destroy_mr(mvdev, mvdev->mr[asid]);
+
+ if (asid == 0 && MLX5_CAP_GEN(mvdev->mdev, umem_uid_0)) {
+ if (mlx5_vdpa_create_dma_mr(mvdev))
+ mlx5_vdpa_warn(mvdev, "create DMA MR failed\n");
+ } else {
+ mlx5_vdpa_update_cvq_iotlb(mvdev, NULL, asid);
+ }
+
+ return 0;
+}
@@ -2838,7 +2838,6 @@ static int mlx5_vdpa_reset(struct vdpa_device *vdev)
unregister_link_notifier(ndev);
teardown_driver(ndev);
clear_vqs_ready(ndev);
- mlx5_vdpa_destroy_mr_resources(&ndev->mvdev);
ndev->mvdev.status = 0;
ndev->mvdev.suspended = false;
ndev->cur_num_vqs = 0;
@@ -2849,10 +2848,6 @@ static int mlx5_vdpa_reset(struct vdpa_device *vdev)
init_group_to_asid_map(mvdev);
++mvdev->generation;
- if (MLX5_CAP_GEN(mvdev->mdev, umem_uid_0)) {
- if (mlx5_vdpa_create_dma_mr(mvdev))
- mlx5_vdpa_warn(mvdev, "create MR failed\n");
- }
up_write(&ndev->reslock);
return 0;
@@ -2932,6 +2927,18 @@ static int mlx5_vdpa_set_map(struct vdpa_device *vdev, unsigned int asid,
return err;
}
+static int mlx5_vdpa_reset_map(struct vdpa_device *vdev, unsigned int asid)
+{
+ struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
+ struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+ int err;
+
+ down_write(&ndev->reslock);
+ err = mlx5_vdpa_reset_mr(mvdev, asid);
+ up_write(&ndev->reslock);
+ return err;
+}
+
static struct device *mlx5_get_vq_dma_dev(struct vdpa_device *vdev, u16 idx)
{
struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
@@ -3199,6 +3206,7 @@ static int mlx5_set_group_asid(struct vdpa_device *vdev, u32 group,
.set_config = mlx5_vdpa_set_config,
.get_generation = mlx5_vdpa_get_generation,
.set_map = mlx5_vdpa_set_map,
+ .reset_map = mlx5_vdpa_reset_map,
.set_group_asid = mlx5_set_group_asid,
.get_vq_dma_dev = mlx5_get_vq_dma_dev,
.free = mlx5_vdpa_free,