[v4,0/7] vdpa: decouple reset of iotlb mapping from device reset

Message ID 1697880319-4937-1-git-send-email-si-wei.liu@oracle.com
Headers
Series vdpa: decouple reset of iotlb mapping from device reset |

Message

Si-Wei Liu Oct. 21, 2023, 9:25 a.m. UTC
  In order to reduce needlessly high setup and teardown cost
of iotlb mapping during live migration, it's crucial to
decouple the vhost-vdpa iotlb abstraction from the virtio
device life cycle, i.e. iotlb mappings should be left
intact across virtio device reset [1]. For it to work, the
on-chip IOMMU parent device could implement a separate
.reset_map() operation callback to restore 1:1 DMA mapping
without having to resort to the .reset() callback, the
latter of which is mainly used to reset virtio device state.
This new .reset_map() callback will be invoked only before
the vhost-vdpa driver is to be removed and detached from
the vdpa bus, such that other vdpa bus drivers, e.g. 
virtio-vdpa, can start with 1:1 DMA mapping when they
are attached. For the context, those on-chip IOMMU parent
devices, create the 1:1 DMA mapping at vdpa device creation,
and they would implicitly destroy the 1:1 mapping when
the first .set_map or .dma_map callback is invoked.

This patchset is rebased on top of the latest vhost tree.

[1] Reducing vdpa migration downtime because of memory pin / maps
https://www.mail-archive.com/qemu-devel@nongnu.org/msg953755.html

---
v4:
- Rework compatibility using new .compat_reset driver op

v3:
- add .reset_map support to vdpa_sim
- introduce module parameter to provide bug-for-bug compatibility with older
  userspace 

v2:
- improved commit message to clarify the intended csope of .reset_map API
- improved commit messages to clarify no breakage on older userspace

v1:
- rewrote commit messages to include more detailed description and background
- reword to vendor specific IOMMU implementation from on-chip IOMMU
- include parent device backend feautres to persistent iotlb precondition
- reimplement mlx5_vdpa patch on top of descriptor group series

RFC v3:
- fix missing return due to merge error in patch #4

RFC v2:
- rebased on top of the "[PATCH RFC v2 0/3] vdpa: dedicated descriptor table group" series:
  https://lore.kernel.org/virtualization/1694248959-13369-1-git-send-email-si-wei.liu@oracle.com/

---

Si-Wei Liu (7):
  vdpa: introduce .reset_map operation callback
  vhost-vdpa: reset vendor specific mapping to initial state in .release
  vhost-vdpa: introduce IOTLB_PERSIST backend feature bit
  vdpa: introduce .compat_reset operation callback
  vhost-vdpa: clean iotlb map during reset for older userspace
  vdpa/mlx5: implement .reset_map driver op
  vdpa_sim: implement .reset_map support

 drivers/vdpa/mlx5/core/mlx5_vdpa.h |  1 +
 drivers/vdpa/mlx5/core/mr.c        | 17 ++++++++++
 drivers/vdpa/mlx5/net/mlx5_vnet.c  | 27 ++++++++++++++--
 drivers/vdpa/vdpa_sim/vdpa_sim.c   | 52 ++++++++++++++++++++++++------
 drivers/vhost/vdpa.c               | 49 +++++++++++++++++++++++++---
 drivers/virtio/virtio_vdpa.c       |  2 +-
 include/linux/vdpa.h               | 30 +++++++++++++++--
 include/uapi/linux/vhost_types.h   |  2 ++
 8 files changed, 161 insertions(+), 19 deletions(-)
  

Comments

Jason Wang Oct. 23, 2023, 3:51 a.m. UTC | #1
Hi Si-Wei:

On Sat, Oct 21, 2023 at 5:28 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> In order to reduce needlessly high setup and teardown cost
> of iotlb mapping during live migration, it's crucial to
> decouple the vhost-vdpa iotlb abstraction from the virtio
> device life cycle, i.e. iotlb mappings should be left
> intact across virtio device reset [1]. For it to work, the
> on-chip IOMMU parent device could implement a separate
> .reset_map() operation callback to restore 1:1 DMA mapping
> without having to resort to the .reset() callback, the
> latter of which is mainly used to reset virtio device state.
> This new .reset_map() callback will be invoked only before
> the vhost-vdpa driver is to be removed and detached from
> the vdpa bus, such that other vdpa bus drivers, e.g.
> virtio-vdpa, can start with 1:1 DMA mapping when they
> are attached. For the context, those on-chip IOMMU parent
> devices, create the 1:1 DMA mapping at vdpa device creation,
> and they would implicitly destroy the 1:1 mapping when
> the first .set_map or .dma_map callback is invoked.
>
> This patchset is rebased on top of the latest vhost tree.
>
> [1] Reducing vdpa migration downtime because of memory pin / maps
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg953755.html
>
> ---
> v4:
> - Rework compatibility using new .compat_reset driver op

I still think having a set_backend_feature() or reset_map(clean=true)
might be better. As it tries hard to not introduce new stuff on the
bus.

But we can listen to others for sure.

Thanks
  
Si-Wei Liu Oct. 23, 2023, 10 p.m. UTC | #2
On 10/22/2023 8:51 PM, Jason Wang wrote:
> Hi Si-Wei:
>
> On Sat, Oct 21, 2023 at 5:28 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>> In order to reduce needlessly high setup and teardown cost
>> of iotlb mapping during live migration, it's crucial to
>> decouple the vhost-vdpa iotlb abstraction from the virtio
>> device life cycle, i.e. iotlb mappings should be left
>> intact across virtio device reset [1]. For it to work, the
>> on-chip IOMMU parent device could implement a separate
>> .reset_map() operation callback to restore 1:1 DMA mapping
>> without having to resort to the .reset() callback, the
>> latter of which is mainly used to reset virtio device state.
>> This new .reset_map() callback will be invoked only before
>> the vhost-vdpa driver is to be removed and detached from
>> the vdpa bus, such that other vdpa bus drivers, e.g.
>> virtio-vdpa, can start with 1:1 DMA mapping when they
>> are attached. For the context, those on-chip IOMMU parent
>> devices, create the 1:1 DMA mapping at vdpa device creation,
>> and they would implicitly destroy the 1:1 mapping when
>> the first .set_map or .dma_map callback is invoked.
>>
>> This patchset is rebased on top of the latest vhost tree.
>>
>> [1] Reducing vdpa migration downtime because of memory pin / maps
>> https://www.mail-archive.com/qemu-devel@nongnu.org/msg953755.html
>>
>> ---
>> v4:
>> - Rework compatibility using new .compat_reset driver op
> I still think having a set_backend_feature()
This will overload backend features with the role of carrying over 
compatibility quirks, which I tried to avoid from. While I think the 
.compat_reset from the v4 code just works with the backend features 
acknowledgement (and maybe others as well) to determine, but not 
directly tie it to backend features itself. These two have different 
implications in terms of requirement, scope and maintaining/deprecation, 
better to cope with compat quirks in explicit and driver visible way.

>   or reset_map(clean=true) might be better.
An explicit op might be marginally better in driver writer's point of 
view. Compliant driver doesn't have to bother asserting clean_map never 
be true so their code would never bother dealing with this case, as 
explained in the commit log for patch 5 "vhost-vdpa: clean iotlb map 
during reset for older userspace":

"
     The separation of .compat_reset from the regular .reset allows
     vhost-vdpa able to know which driver had broken behavior before, so it
     can apply the corresponding compatibility quirk to the individual 
driver
     whenever needed.  Compared to overloading the existing .reset with
     flags, .compat_reset won't cause any extra burden to the implementation
     of every compliant driver.
"

>   As it tries hard to not introduce new stuff on the bus.
Honestly I don't see substantial difference between these other than the 
color. There's no single best solution that stands out among the 3. And 
I assume you already noticed it from all the above 3 approaches will 
have to go with backend features negotiation, that the 1st vdpa reset 
before backend feature negotiation will use the compliant version of 
.reset that doesn't clean up the map. While I don't think this nuance 
matters much to existing older userspace apps, as the maps should 
already get cleaned by previous process in vhost_vdpa_cleanup(), but if 
bug-for-bug behavioral compatibility is what you want, module parameter 
will be the single best answer.

Regards,
-Siwei

> But we can listen to others for sure.
>
> Thanks
>
  
Lei Yang Oct. 24, 2023, 6:51 a.m. UTC | #3
QE tested this series v4 with regression testing on real nic, there is
no new regression bug.

Tested-by: Lei Yang <leiyang@redhat.com>

On Tue, Oct 24, 2023 at 6:02 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
>
>
> On 10/22/2023 8:51 PM, Jason Wang wrote:
> > Hi Si-Wei:
> >
> > On Sat, Oct 21, 2023 at 5:28 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> >> In order to reduce needlessly high setup and teardown cost
> >> of iotlb mapping during live migration, it's crucial to
> >> decouple the vhost-vdpa iotlb abstraction from the virtio
> >> device life cycle, i.e. iotlb mappings should be left
> >> intact across virtio device reset [1]. For it to work, the
> >> on-chip IOMMU parent device could implement a separate
> >> .reset_map() operation callback to restore 1:1 DMA mapping
> >> without having to resort to the .reset() callback, the
> >> latter of which is mainly used to reset virtio device state.
> >> This new .reset_map() callback will be invoked only before
> >> the vhost-vdpa driver is to be removed and detached from
> >> the vdpa bus, such that other vdpa bus drivers, e.g.
> >> virtio-vdpa, can start with 1:1 DMA mapping when they
> >> are attached. For the context, those on-chip IOMMU parent
> >> devices, create the 1:1 DMA mapping at vdpa device creation,
> >> and they would implicitly destroy the 1:1 mapping when
> >> the first .set_map or .dma_map callback is invoked.
> >>
> >> This patchset is rebased on top of the latest vhost tree.
> >>
> >> [1] Reducing vdpa migration downtime because of memory pin / maps
> >> https://www.mail-archive.com/qemu-devel@nongnu.org/msg953755.html
> >>
> >> ---
> >> v4:
> >> - Rework compatibility using new .compat_reset driver op
> > I still think having a set_backend_feature()
> This will overload backend features with the role of carrying over
> compatibility quirks, which I tried to avoid from. While I think the
> .compat_reset from the v4 code just works with the backend features
> acknowledgement (and maybe others as well) to determine, but not
> directly tie it to backend features itself. These two have different
> implications in terms of requirement, scope and maintaining/deprecation,
> better to cope with compat quirks in explicit and driver visible way.
>
> >   or reset_map(clean=true) might be better.
> An explicit op might be marginally better in driver writer's point of
> view. Compliant driver doesn't have to bother asserting clean_map never
> be true so their code would never bother dealing with this case, as
> explained in the commit log for patch 5 "vhost-vdpa: clean iotlb map
> during reset for older userspace":
>
> "
>      The separation of .compat_reset from the regular .reset allows
>      vhost-vdpa able to know which driver had broken behavior before, so it
>      can apply the corresponding compatibility quirk to the individual
> driver
>      whenever needed.  Compared to overloading the existing .reset with
>      flags, .compat_reset won't cause any extra burden to the implementation
>      of every compliant driver.
> "
>
> >   As it tries hard to not introduce new stuff on the bus.
> Honestly I don't see substantial difference between these other than the
> color. There's no single best solution that stands out among the 3. And
> I assume you already noticed it from all the above 3 approaches will
> have to go with backend features negotiation, that the 1st vdpa reset
> before backend feature negotiation will use the compliant version of
> .reset that doesn't clean up the map. While I don't think this nuance
> matters much to existing older userspace apps, as the maps should
> already get cleaned by previous process in vhost_vdpa_cleanup(), but if
> bug-for-bug behavioral compatibility is what you want, module parameter
> will be the single best answer.
>
> Regards,
> -Siwei
>
> > But we can listen to others for sure.
> >
> > Thanks
> >
>
  
Si-Wei Liu Oct. 24, 2023, 5:27 p.m. UTC | #4
Thanks a lot for testing! Please be aware that there's a follow-up fix 
for a potential oops in this v4 series:

https://lore.kernel.org/virtualization/1698102863-21122-1-git-send-email-si-wei.liu@oracle.com/

Would be nice to have it applied for any tests.

Thanks,
-Siwei

On 10/23/2023 11:51 PM, Lei Yang wrote:
> QE tested this series v4 with regression testing on real nic, there is
> no new regression bug.
>
> Tested-by: Lei Yang <leiyang@redhat.com>
>
> On Tue, Oct 24, 2023 at 6:02 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>
>>
>> On 10/22/2023 8:51 PM, Jason Wang wrote:
>>> Hi Si-Wei:
>>>
>>> On Sat, Oct 21, 2023 at 5:28 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>>> In order to reduce needlessly high setup and teardown cost
>>>> of iotlb mapping during live migration, it's crucial to
>>>> decouple the vhost-vdpa iotlb abstraction from the virtio
>>>> device life cycle, i.e. iotlb mappings should be left
>>>> intact across virtio device reset [1]. For it to work, the
>>>> on-chip IOMMU parent device could implement a separate
>>>> .reset_map() operation callback to restore 1:1 DMA mapping
>>>> without having to resort to the .reset() callback, the
>>>> latter of which is mainly used to reset virtio device state.
>>>> This new .reset_map() callback will be invoked only before
>>>> the vhost-vdpa driver is to be removed and detached from
>>>> the vdpa bus, such that other vdpa bus drivers, e.g.
>>>> virtio-vdpa, can start with 1:1 DMA mapping when they
>>>> are attached. For the context, those on-chip IOMMU parent
>>>> devices, create the 1:1 DMA mapping at vdpa device creation,
>>>> and they would implicitly destroy the 1:1 mapping when
>>>> the first .set_map or .dma_map callback is invoked.
>>>>
>>>> This patchset is rebased on top of the latest vhost tree.
>>>>
>>>> [1] Reducing vdpa migration downtime because of memory pin / maps
>>>> https://www.mail-archive.com/qemu-devel@nongnu.org/msg953755.html
>>>>
>>>> ---
>>>> v4:
>>>> - Rework compatibility using new .compat_reset driver op
>>> I still think having a set_backend_feature()
>> This will overload backend features with the role of carrying over
>> compatibility quirks, which I tried to avoid from. While I think the
>> .compat_reset from the v4 code just works with the backend features
>> acknowledgement (and maybe others as well) to determine, but not
>> directly tie it to backend features itself. These two have different
>> implications in terms of requirement, scope and maintaining/deprecation,
>> better to cope with compat quirks in explicit and driver visible way.
>>
>>>    or reset_map(clean=true) might be better.
>> An explicit op might be marginally better in driver writer's point of
>> view. Compliant driver doesn't have to bother asserting clean_map never
>> be true so their code would never bother dealing with this case, as
>> explained in the commit log for patch 5 "vhost-vdpa: clean iotlb map
>> during reset for older userspace":
>>
>> "
>>       The separation of .compat_reset from the regular .reset allows
>>       vhost-vdpa able to know which driver had broken behavior before, so it
>>       can apply the corresponding compatibility quirk to the individual
>> driver
>>       whenever needed.  Compared to overloading the existing .reset with
>>       flags, .compat_reset won't cause any extra burden to the implementation
>>       of every compliant driver.
>> "
>>
>>>    As it tries hard to not introduce new stuff on the bus.
>> Honestly I don't see substantial difference between these other than the
>> color. There's no single best solution that stands out among the 3. And
>> I assume you already noticed it from all the above 3 approaches will
>> have to go with backend features negotiation, that the 1st vdpa reset
>> before backend feature negotiation will use the compliant version of
>> .reset that doesn't clean up the map. While I don't think this nuance
>> matters much to existing older userspace apps, as the maps should
>> already get cleaned by previous process in vhost_vdpa_cleanup(), but if
>> bug-for-bug behavioral compatibility is what you want, module parameter
>> will be the single best answer.
>>
>> Regards,
>> -Siwei
>>
>>> But we can listen to others for sure.
>>>
>>> Thanks
>>>
  
Lei Yang Oct. 25, 2023, 9:41 a.m. UTC | #5
On Wed, Oct 25, 2023 at 1:27 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
Hello Si-Wei
> Thanks a lot for testing! Please be aware that there's a follow-up fix
> for a potential oops in this v4 series:
>
The first, when I did not apply this patch [1], I will also hit this
patch mentioned problem. After I applied this patch, this problem will
no longer to hit again. But I hit another issues, about the error
messages please review the attached file.
[1] https://lore.kernel.org/virtualization/1698102863-21122-1-git-send-email-si-wei.liu@oracle.com/

My test steps:
git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
cd  linux/
b4 am 1697880319-4937-1-git-send-email-si-wei.liu@oracle.com
b4 am 20231018171456.1624030-2-dtatulea@nvidia.com
b4 am 1698102863-21122-1-git-send-email-si-wei.liu@oracle.com
git am ./v4_20231018_dtatulea_vdpa_add_support_for_vq_descriptor_mappings.mbx
git am ./v4_20231021_si_wei_liu_vdpa_decouple_reset_of_iotlb_mapping_from_device_reset.mbx
git am ./20231023_si_wei_liu_vhost_vdpa_fix_null_pointer_deref_in__compat_vdpa_reset.mbx
cp /boot/config-5.14.0-377.el9.x86_64 .config
make -j 32
make modules_install
make install

Thanks

Lei
> https://lore.kernel.org/virtualization/1698102863-21122-1-git-send-email-si-wei.liu@oracle.com/
>
> Would be nice to have it applied for any tests.
>
> Thanks,
> -Siwei
>
> On 10/23/2023 11:51 PM, Lei Yang wrote:
> > QE tested this series v4 with regression testing on real nic, there is
> > no new regression bug.
> >
> > Tested-by: Lei Yang <leiyang@redhat.com>
> >
> > On Tue, Oct 24, 2023 at 6:02 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> >>
> >>
> >> On 10/22/2023 8:51 PM, Jason Wang wrote:
> >>> Hi Si-Wei:
> >>>
> >>> On Sat, Oct 21, 2023 at 5:28 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> >>>> In order to reduce needlessly high setup and teardown cost
> >>>> of iotlb mapping during live migration, it's crucial to
> >>>> decouple the vhost-vdpa iotlb abstraction from the virtio
> >>>> device life cycle, i.e. iotlb mappings should be left
> >>>> intact across virtio device reset [1]. For it to work, the
> >>>> on-chip IOMMU parent device could implement a separate
> >>>> .reset_map() operation callback to restore 1:1 DMA mapping
> >>>> without having to resort to the .reset() callback, the
> >>>> latter of which is mainly used to reset virtio device state.
> >>>> This new .reset_map() callback will be invoked only before
> >>>> the vhost-vdpa driver is to be removed and detached from
> >>>> the vdpa bus, such that other vdpa bus drivers, e.g.
> >>>> virtio-vdpa, can start with 1:1 DMA mapping when they
> >>>> are attached. For the context, those on-chip IOMMU parent
> >>>> devices, create the 1:1 DMA mapping at vdpa device creation,
> >>>> and they would implicitly destroy the 1:1 mapping when
> >>>> the first .set_map or .dma_map callback is invoked.
> >>>>
> >>>> This patchset is rebased on top of the latest vhost tree.
> >>>>
> >>>> [1] Reducing vdpa migration downtime because of memory pin / maps
> >>>> https://www.mail-archive.com/qemu-devel@nongnu.org/msg953755.html
> >>>>
> >>>> ---
> >>>> v4:
> >>>> - Rework compatibility using new .compat_reset driver op
> >>> I still think having a set_backend_feature()
> >> This will overload backend features with the role of carrying over
> >> compatibility quirks, which I tried to avoid from. While I think the
> >> .compat_reset from the v4 code just works with the backend features
> >> acknowledgement (and maybe others as well) to determine, but not
> >> directly tie it to backend features itself. These two have different
> >> implications in terms of requirement, scope and maintaining/deprecation,
> >> better to cope with compat quirks in explicit and driver visible way.
> >>
> >>>    or reset_map(clean=true) might be better.
> >> An explicit op might be marginally better in driver writer's point of
> >> view. Compliant driver doesn't have to bother asserting clean_map never
> >> be true so their code would never bother dealing with this case, as
> >> explained in the commit log for patch 5 "vhost-vdpa: clean iotlb map
> >> during reset for older userspace":
> >>
> >> "
> >>       The separation of .compat_reset from the regular .reset allows
> >>       vhost-vdpa able to know which driver had broken behavior before, so it
> >>       can apply the corresponding compatibility quirk to the individual
> >> driver
> >>       whenever needed.  Compared to overloading the existing .reset with
> >>       flags, .compat_reset won't cause any extra burden to the implementation
> >>       of every compliant driver.
> >> "
> >>
> >>>    As it tries hard to not introduce new stuff on the bus.
> >> Honestly I don't see substantial difference between these other than the
> >> color. There's no single best solution that stands out among the 3. And
> >> I assume you already noticed it from all the above 3 approaches will
> >> have to go with backend features negotiation, that the 1st vdpa reset
> >> before backend feature negotiation will use the compliant version of
> >> .reset that doesn't clean up the map. While I don't think this nuance
> >> matters much to existing older userspace apps, as the maps should
> >> already get cleaned by previous process in vhost_vdpa_cleanup(), but if
> >> bug-for-bug behavioral compatibility is what you want, module parameter
> >> will be the single best answer.
> >>
> >> Regards,
> >> -Siwei
> >>
> >>> But we can listen to others for sure.
> >>>
> >>> Thanks
> >>>
>
  
Si-Wei Liu Oct. 25, 2023, 11:31 p.m. UTC | #6
Hi Yang Lei,

Thanks for testing my patches and reporting! As for the issue, could you 
please try what I posted in:

https://lore.kernel.org/virtualization/1698275594-19204-1-git-send-email-si-wei.liu@oracle.com/

and let me know how it goes? Thank you very much!

Thanks,
-Siwei

On 10/25/2023 2:41 AM, Lei Yang wrote:
> On Wed, Oct 25, 2023 at 1:27 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> Hello Si-Wei
>> Thanks a lot for testing! Please be aware that there's a follow-up fix
>> for a potential oops in this v4 series:
>>
> The first, when I did not apply this patch [1], I will also hit this
> patch mentioned problem. After I applied this patch, this problem will
> no longer to hit again. But I hit another issues, about the error
> messages please review the attached file.
> [1] https://lore.kernel.org/virtualization/1698102863-21122-1-git-send-email-si-wei.liu@oracle.com/
>
> My test steps:
> git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> cd  linux/
> b4 am 1697880319-4937-1-git-send-email-si-wei.liu@oracle.com
> b4 am 20231018171456.1624030-2-dtatulea@nvidia.com
> b4 am 1698102863-21122-1-git-send-email-si-wei.liu@oracle.com
> git am ./v4_20231018_dtatulea_vdpa_add_support_for_vq_descriptor_mappings.mbx
> git am ./v4_20231021_si_wei_liu_vdpa_decouple_reset_of_iotlb_mapping_from_device_reset.mbx
> git am ./20231023_si_wei_liu_vhost_vdpa_fix_null_pointer_deref_in__compat_vdpa_reset.mbx
> cp /boot/config-5.14.0-377.el9.x86_64 .config
> make -j 32
> make modules_install
> make install
>
> Thanks
>
> Lei
>> https://lore.kernel.org/virtualization/1698102863-21122-1-git-send-email-si-wei.liu@oracle.com/
>>
>> Would be nice to have it applied for any tests.
>>
>> Thanks,
>> -Siwei
>>
>> On 10/23/2023 11:51 PM, Lei Yang wrote:
>>> QE tested this series v4 with regression testing on real nic, there is
>>> no new regression bug.
>>>
>>> Tested-by: Lei Yang <leiyang@redhat.com>
>>>
>>> On Tue, Oct 24, 2023 at 6:02 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>>>
>>>> On 10/22/2023 8:51 PM, Jason Wang wrote:
>>>>> Hi Si-Wei:
>>>>>
>>>>> On Sat, Oct 21, 2023 at 5:28 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>>>>> In order to reduce needlessly high setup and teardown cost
>>>>>> of iotlb mapping during live migration, it's crucial to
>>>>>> decouple the vhost-vdpa iotlb abstraction from the virtio
>>>>>> device life cycle, i.e. iotlb mappings should be left
>>>>>> intact across virtio device reset [1]. For it to work, the
>>>>>> on-chip IOMMU parent device could implement a separate
>>>>>> .reset_map() operation callback to restore 1:1 DMA mapping
>>>>>> without having to resort to the .reset() callback, the
>>>>>> latter of which is mainly used to reset virtio device state.
>>>>>> This new .reset_map() callback will be invoked only before
>>>>>> the vhost-vdpa driver is to be removed and detached from
>>>>>> the vdpa bus, such that other vdpa bus drivers, e.g.
>>>>>> virtio-vdpa, can start with 1:1 DMA mapping when they
>>>>>> are attached. For the context, those on-chip IOMMU parent
>>>>>> devices, create the 1:1 DMA mapping at vdpa device creation,
>>>>>> and they would implicitly destroy the 1:1 mapping when
>>>>>> the first .set_map or .dma_map callback is invoked.
>>>>>>
>>>>>> This patchset is rebased on top of the latest vhost tree.
>>>>>>
>>>>>> [1] Reducing vdpa migration downtime because of memory pin / maps
>>>>>> https://www.mail-archive.com/qemu-devel@nongnu.org/msg953755.html
>>>>>>
>>>>>> ---
>>>>>> v4:
>>>>>> - Rework compatibility using new .compat_reset driver op
>>>>> I still think having a set_backend_feature()
>>>> This will overload backend features with the role of carrying over
>>>> compatibility quirks, which I tried to avoid from. While I think the
>>>> .compat_reset from the v4 code just works with the backend features
>>>> acknowledgement (and maybe others as well) to determine, but not
>>>> directly tie it to backend features itself. These two have different
>>>> implications in terms of requirement, scope and maintaining/deprecation,
>>>> better to cope with compat quirks in explicit and driver visible way.
>>>>
>>>>>     or reset_map(clean=true) might be better.
>>>> An explicit op might be marginally better in driver writer's point of
>>>> view. Compliant driver doesn't have to bother asserting clean_map never
>>>> be true so their code would never bother dealing with this case, as
>>>> explained in the commit log for patch 5 "vhost-vdpa: clean iotlb map
>>>> during reset for older userspace":
>>>>
>>>> "
>>>>        The separation of .compat_reset from the regular .reset allows
>>>>        vhost-vdpa able to know which driver had broken behavior before, so it
>>>>        can apply the corresponding compatibility quirk to the individual
>>>> driver
>>>>        whenever needed.  Compared to overloading the existing .reset with
>>>>        flags, .compat_reset won't cause any extra burden to the implementation
>>>>        of every compliant driver.
>>>> "
>>>>
>>>>>     As it tries hard to not introduce new stuff on the bus.
>>>> Honestly I don't see substantial difference between these other than the
>>>> color. There's no single best solution that stands out among the 3. And
>>>> I assume you already noticed it from all the above 3 approaches will
>>>> have to go with backend features negotiation, that the 1st vdpa reset
>>>> before backend feature negotiation will use the compliant version of
>>>> .reset that doesn't clean up the map. While I don't think this nuance
>>>> matters much to existing older userspace apps, as the maps should
>>>> already get cleaned by previous process in vhost_vdpa_cleanup(), but if
>>>> bug-for-bug behavioral compatibility is what you want, module parameter
>>>> will be the single best answer.
>>>>
>>>> Regards,
>>>> -Siwei
>>>>
>>>>> But we can listen to others for sure.
>>>>>
>>>>> Thanks
>>>>>
  
Lei Yang Oct. 26, 2023, 6:16 a.m. UTC | #7
On Thu, Oct 26, 2023 at 7:32 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> Hi Yang Lei,
>
> Thanks for testing my patches and reporting! As for the issue, could you
> please try what I posted in:
>
> https://lore.kernel.org/virtualization/1698275594-19204-1-git-send-email-si-wei.liu@oracle.com/
>
HI Si-Wei
> and let me know how it goes? Thank you very much!

This problem has gone after applying this patch [1].
[1] https://lore.kernel.org/virtualization/1698275594-19204-1-git-send-email-si-wei.liu@oracle.com/

Thanks
Lei
>
> Thanks,
> -Siwei
>
> On 10/25/2023 2:41 AM, Lei Yang wrote:
> > On Wed, Oct 25, 2023 at 1:27 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> > Hello Si-Wei
> >> Thanks a lot for testing! Please be aware that there's a follow-up fix
> >> for a potential oops in this v4 series:
> >>
> > The first, when I did not apply this patch [1], I will also hit this
> > patch mentioned problem. After I applied this patch, this problem will
> > no longer to hit again. But I hit another issues, about the error
> > messages please review the attached file.
> > [1] https://lore.kernel.org/virtualization/1698102863-21122-1-git-send-email-si-wei.liu@oracle.com/
> >
> > My test steps:
> > git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> > cd  linux/
> > b4 am 1697880319-4937-1-git-send-email-si-wei.liu@oracle.com
> > b4 am 20231018171456.1624030-2-dtatulea@nvidia.com
> > b4 am 1698102863-21122-1-git-send-email-si-wei.liu@oracle.com
> > git am ./v4_20231018_dtatulea_vdpa_add_support_for_vq_descriptor_mappings.mbx
> > git am ./v4_20231021_si_wei_liu_vdpa_decouple_reset_of_iotlb_mapping_from_device_reset.mbx
> > git am ./20231023_si_wei_liu_vhost_vdpa_fix_null_pointer_deref_in__compat_vdpa_reset.mbx
> > cp /boot/config-5.14.0-377.el9.x86_64 .config
> > make -j 32
> > make modules_install
> > make install
> >
> > Thanks
> >
> > Lei
> >> https://lore.kernel.org/virtualization/1698102863-21122-1-git-send-email-si-wei.liu@oracle.com/
> >>
> >> Would be nice to have it applied for any tests.
> >>
> >> Thanks,
> >> -Siwei
> >>
> >> On 10/23/2023 11:51 PM, Lei Yang wrote:
> >>> QE tested this series v4 with regression testing on real nic, there is
> >>> no new regression bug.
> >>>
> >>> Tested-by: Lei Yang <leiyang@redhat.com>
> >>>
> >>> On Tue, Oct 24, 2023 at 6:02 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> >>>>
> >>>> On 10/22/2023 8:51 PM, Jason Wang wrote:
> >>>>> Hi Si-Wei:
> >>>>>
> >>>>> On Sat, Oct 21, 2023 at 5:28 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> >>>>>> In order to reduce needlessly high setup and teardown cost
> >>>>>> of iotlb mapping during live migration, it's crucial to
> >>>>>> decouple the vhost-vdpa iotlb abstraction from the virtio
> >>>>>> device life cycle, i.e. iotlb mappings should be left
> >>>>>> intact across virtio device reset [1]. For it to work, the
> >>>>>> on-chip IOMMU parent device could implement a separate
> >>>>>> .reset_map() operation callback to restore 1:1 DMA mapping
> >>>>>> without having to resort to the .reset() callback, the
> >>>>>> latter of which is mainly used to reset virtio device state.
> >>>>>> This new .reset_map() callback will be invoked only before
> >>>>>> the vhost-vdpa driver is to be removed and detached from
> >>>>>> the vdpa bus, such that other vdpa bus drivers, e.g.
> >>>>>> virtio-vdpa, can start with 1:1 DMA mapping when they
> >>>>>> are attached. For the context, those on-chip IOMMU parent
> >>>>>> devices, create the 1:1 DMA mapping at vdpa device creation,
> >>>>>> and they would implicitly destroy the 1:1 mapping when
> >>>>>> the first .set_map or .dma_map callback is invoked.
> >>>>>>
> >>>>>> This patchset is rebased on top of the latest vhost tree.
> >>>>>>
> >>>>>> [1] Reducing vdpa migration downtime because of memory pin / maps
> >>>>>> https://www.mail-archive.com/qemu-devel@nongnu.org/msg953755.html
> >>>>>>
> >>>>>> ---
> >>>>>> v4:
> >>>>>> - Rework compatibility using new .compat_reset driver op
> >>>>> I still think having a set_backend_feature()
> >>>> This will overload backend features with the role of carrying over
> >>>> compatibility quirks, which I tried to avoid from. While I think the
> >>>> .compat_reset from the v4 code just works with the backend features
> >>>> acknowledgement (and maybe others as well) to determine, but not
> >>>> directly tie it to backend features itself. These two have different
> >>>> implications in terms of requirement, scope and maintaining/deprecation,
> >>>> better to cope with compat quirks in explicit and driver visible way.
> >>>>
> >>>>>     or reset_map(clean=true) might be better.
> >>>> An explicit op might be marginally better in driver writer's point of
> >>>> view. Compliant driver doesn't have to bother asserting clean_map never
> >>>> be true so their code would never bother dealing with this case, as
> >>>> explained in the commit log for patch 5 "vhost-vdpa: clean iotlb map
> >>>> during reset for older userspace":
> >>>>
> >>>> "
> >>>>        The separation of .compat_reset from the regular .reset allows
> >>>>        vhost-vdpa able to know which driver had broken behavior before, so it
> >>>>        can apply the corresponding compatibility quirk to the individual
> >>>> driver
> >>>>        whenever needed.  Compared to overloading the existing .reset with
> >>>>        flags, .compat_reset won't cause any extra burden to the implementation
> >>>>        of every compliant driver.
> >>>> "
> >>>>
> >>>>>     As it tries hard to not introduce new stuff on the bus.
> >>>> Honestly I don't see substantial difference between these other than the
> >>>> color. There's no single best solution that stands out among the 3. And
> >>>> I assume you already noticed it from all the above 3 approaches will
> >>>> have to go with backend features negotiation, that the 1st vdpa reset
> >>>> before backend feature negotiation will use the compliant version of
> >>>> .reset that doesn't clean up the map. While I don't think this nuance
> >>>> matters much to existing older userspace apps, as the maps should
> >>>> already get cleaned by previous process in vhost_vdpa_cleanup(), but if
> >>>> bug-for-bug behavioral compatibility is what you want, module parameter
> >>>> will be the single best answer.
> >>>>
> >>>> Regards,
> >>>> -Siwei
> >>>>
> >>>>> But we can listen to others for sure.
> >>>>>
> >>>>> Thanks
> >>>>>
>