[v9,0/4] KVM: arm64: Allow the VM to select DEVICE_* and NORMAL_NC for IO memory

Message ID 20240224150546.368-1-ankita@nvidia.com

Message

Ankit Agrawal Feb. 24, 2024, 3:05 p.m. UTC
  From: Ankit Agrawal <ankita@nvidia.com>

Currently, KVM for ARM64 maps at stage 2 memory that is considered device
with DEVICE_nGnRE memory attributes; this setting overrides (per
ARM architecture [1]) any device MMIO mapping present at stage 1,
resulting in a set-up whereby a guest operating system cannot
determine device MMIO mapping memory attributes on its own but
it is always overridden by the KVM stage 2 default.

This set-up does not allow guest operating systems to select device
memory attributes independently from KVM stage-2 mappings
(refer to [1], "Combining stage 1 and stage 2 memory type attributes").
This turns out to be an issue because guest operating systems
(e.g. Linux) may request to map device MMIO regions with memory
attributes, such as the NormalNC memory type, that guarantee better
performance (e.g. the gathering attribute, which for some devices can
generate larger PCIe memory write TLPs) and permit specific
operations (e.g. unaligned transactions).
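
To make the combining rule concrete, here is a minimal sketch in plain C.
It is a simplified model, not kernel code: it assumes FEAT_S2FWB is not in
use, ignores shareability, and the enum names and their ordering are
hypothetical, chosen so that a smaller value means a more restrictive type.

```c
/*
 * Simplified model of stage 1 / stage 2 memory type combining.
 * Assumption: types are listed from most to least restrictive.
 * The names are illustrative and do not match kernel definitions.
 */
enum mem_type {
	MT_DEVICE_nGnRnE,
	MT_DEVICE_nGnRE,
	MT_DEVICE_nGRE,
	MT_DEVICE_GRE,
	MT_NORMAL_NC,
	MT_NORMAL_WB,
};

/* The effective type is the more restrictive of the two stages. */
static enum mem_type combine(enum mem_type s1, enum mem_type s2)
{
	return s1 < s2 ? s1 : s2;
}
```

With stage 2 fixed at MT_DEVICE_nGnRE, a guest request for MT_NORMAL_NC at
stage 1 still resolves to MT_DEVICE_nGnRE. With stage 2 at MT_NORMAL_NC,
the guest's own choice of any DEVICE_* type or NORMAL_NC takes effect.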

The default device stage 2 mapping was chosen in KVM for ARM64 since
it was considered safer (i.e. it would not allow guests to trigger
uncontained failures ultimately crashing the machine), but the
resulting failures turned out to be asynchronous (SError) anyway,
defeating the purpose.

For these reasons, relax the KVM stage 2 device memory attributes
from DEVICE_nGnRE to Normal-NC.

Generalizing to other devices may be problematic, however. E.g. the
GICv2 VCPU interface, which is effectively a shared peripheral, can
allow a guest to affect another guest's interrupt distribution. Hence
limit the change to VFIO PCI as a precaution. This is achieved by
making the VFIO PCI core module set a flag that is tested by KVM
to activate the code. This could be extended to other devices in
the future once that is deemed safe.
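
The flag handshake can be sketched as a small userspace model. This is an
illustration only: the flag and protection names are the ones used in this
series, but the bit values, struct vma_model and s2_io_prot() are
hypothetical stand-ins, not the kernel's definitions.

```c
#include <stdbool.h>

/* Hypothetical bit values, chosen only for this sketch. */
#define VM_ALLOW_ANY_UNCACHED      (1UL << 0)
#define KVM_PGTABLE_PROT_DEVICE    (1UL << 1)
#define KVM_PGTABLE_PROT_NORMAL_NC (1UL << 2)

/* Stand-in for the kernel's VMA; only the flags matter here. */
struct vma_model {
	unsigned long vm_flags;
};

/*
 * Pick the stage 2 protection bits for a mapping. Only device
 * mappings whose owner opted in via the flag get Normal-NC;
 * every other device mapping keeps the DEVICE default.
 */
static unsigned long s2_io_prot(const struct vma_model *vma, bool is_device)
{
	if (!is_device)
		return 0;
	if (vma->vm_flags & VM_ALLOW_ANY_UNCACHED)
		return KVM_PGTABLE_PROT_NORMAL_NC;
	return KVM_PGTABLE_PROT_DEVICE;
}
```

Keeping the opt-in on the mapping owner's side means a region such as the
GIC, whose mapping never sets the flag, stays mapped as DEVICE.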

[1] section D8.5 - DDI0487J_a_a-profile_architecture_reference_manual.pdf

Applied over v6.8-rc5.

History
=======
v8 -> v9
- Collected Reviewed-by and Acked-by.
- Updated the commit messages in 2/4 and 4/4 to use passive voice and
  fixed a spelling error.
- Updated subjects to align with the convention of capitalizing the
  first letter.
- Added links in 1/4 to the previous conversation for tracking purposes.

v7 -> v8
- Changed the commit messages of patches 2/4 and 4/4 to include the
  detailed description of the VM_ALLOW_ANY_UNCACHED flag posted by
  Jason.
- Added a more detailed comment in vfio_pci_core about the
  VM_ALLOW_ANY_UNCACHED flag.
- Rebased to v6.8-rc5.

v6 -> v7
- Changed VM_VFIO_ALLOW_WC to VM_ALLOW_ANY_UNCACHED based on suggestion
  from Alex Williamson.
- Refactored stage2_set_prot_attr() based on Will's suggestion to
  reorganize the switch cases. Also updated the case to return -EINVAL
  when both KVM_PGTABLE_PROT_DEVICE and KVM_PGTABLE_PROT_NORMAL_NC set.
- Fixed nits pointed out by Oliver and Catalin.
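
As a rough illustration of the reorganized switch described for v7, the
sketch below models the attribute selection in userspace C. It is not the
code from the patch: the bit values, enum s2_memattr and
pick_s2_memattr() are hypothetical, and only the -EINVAL behaviour for the
conflicting combination is taken from the changelog entry.

```c
#include <errno.h>

/* Hypothetical bit values, chosen only for this sketch. */
#define KVM_PGTABLE_PROT_DEVICE    (1UL << 1)
#define KVM_PGTABLE_PROT_NORMAL_NC (1UL << 2)

/* Illustrative stand-in for the stage 2 memory attribute encodings. */
enum s2_memattr { S2_NORMAL, S2_NORMAL_NC, S2_DEVICE_nGnRE };

/* Returns 0 and fills *attr, or -EINVAL if both types are requested. */
static int pick_s2_memattr(unsigned long prot, enum s2_memattr *attr)
{
	switch (prot & (KVM_PGTABLE_PROT_DEVICE | KVM_PGTABLE_PROT_NORMAL_NC)) {
	case KVM_PGTABLE_PROT_DEVICE | KVM_PGTABLE_PROT_NORMAL_NC:
		return -EINVAL;		/* contradictory request */
	case KVM_PGTABLE_PROT_DEVICE:
		*attr = S2_DEVICE_nGnRE;
		break;
	case KVM_PGTABLE_PROT_NORMAL_NC:
		*attr = S2_NORMAL_NC;
		break;
	default:
		*attr = S2_NORMAL;	/* ordinary cacheable memory */
	}
	return 0;
}
```

Switching on the masked pair of bits keeps the conflicting case explicit
instead of leaving it to the order of separate if statements.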

v5 -> v6
- Rebased to v6.8-rc2

v4 -> v5
- Moved the cover letter description text to patch 1/4.
- Cleaned up stage2_set_prot_attr() based on Marc Zyngier's suggestions.
- Moved the mm header file changes to a separate patch.
- Rebased to v6.7-rc3.

v3 -> v4
- Moved the vfio-pci change to use VM_VFIO_ALLOW_WC into a
  separate patch.
- Added a check to warn when NORMAL_NC and DEVICE are
  set simultaneously.
- Fixed miscellaneous nitpicks suggested in v3.

v2 -> v3
- Added a new patch (and converted to patch series) suggested by
  Catalin Marinas to ensure the code changes are restricted to
  VFIO PCI devices.
- Introduced VM_VFIO_ALLOW_WC flag for VFIO PCI to communicate
  with VMM.
- Reverted GIC mapping to DEVICE.

v1 -> v2
- Updated commit log to the one posted by
  Lorenzo Pieralisi <lpieralisi@kernel.org> (Thanks!)
- Added new flag to represent the NORMAL_NC setting. Updated
  stage2_set_prot_attr() to handle new flag.

v8 Link:
https://lore.kernel.org/all/20240220072926.6466-1-ankita@nvidia.com/

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>

Ankit Agrawal (4):
  KVM: arm64: Introduce new flag for non-cacheable IO memory
  mm: Introduce new flag to indicate wc safe
  KVM: arm64: Set io memory s2 pte as normalnc for vfio pci device
  vfio: Convey kvm that the vfio-pci device is wc safe

 arch/arm64/include/asm/kvm_pgtable.h |  2 ++
 arch/arm64/include/asm/memory.h      |  2 ++
 arch/arm64/kvm/hyp/pgtable.c         | 24 +++++++++++++++++++-----
 arch/arm64/kvm/mmu.c                 | 14 ++++++++++----
 drivers/vfio/pci/vfio_pci_core.c     | 19 ++++++++++++++++++-
 include/linux/mm.h                   | 14 ++++++++++++++
 6 files changed, 65 insertions(+), 10 deletions(-)
  

Comments

Oliver Upton Feb. 26, 2024, 11:45 p.m. UTC | #1
On Sat, 24 Feb 2024 20:35:42 +0530, ankita@nvidia.com wrote:
> From: Ankit Agrawal <ankita@nvidia.com>
> 
> Currently, KVM for ARM64 maps at stage 2 memory that is considered device
> with DEVICE_nGnRE memory attributes; this setting overrides (per
> ARM architecture [1]) any device MMIO mapping present at stage 1,
> resulting in a set-up whereby a guest operating system cannot
> determine device MMIO mapping memory attributes on its own but
> it is always overridden by the KVM stage 2 default.
> 
> [...]

High time to get this cooking in -next. Looks like there aren't any
conflicts w/ VFIO, but if that changes I've pushed a topic branch to:

  https://git.kernel.org/pub/scm/linux/kernel/git/oupton/linux.git/log/?h=kvm-arm64/vfio-normal-nc

Applied to kvmarm/next, thanks!

[1/4] KVM: arm64: Introduce new flag for non-cacheable IO memory
      https://git.kernel.org/kvmarm/kvmarm/c/c034ec84e879
[2/4] mm: Introduce new flag to indicate wc safe
      https://git.kernel.org/kvmarm/kvmarm/c/5c656fcdd6c6
[3/4] KVM: arm64: Set io memory s2 pte as normalnc for vfio pci device
      https://git.kernel.org/kvmarm/kvmarm/c/8c47ce3e1d2c
[4/4] vfio: Convey kvm that the vfio-pci device is wc safe
      https://git.kernel.org/kvmarm/kvmarm/c/a39d3a966a09

--
Best,
Oliver
  
Ankit Agrawal Feb. 27, 2024, 8:45 a.m. UTC | #2
>>
>> Currently, KVM for ARM64 maps at stage 2 memory that is considered device
>> with DEVICE_nGnRE memory attributes; this setting overrides (per
>> ARM architecture [1]) any device MMIO mapping present at stage 1,
>> resulting in a set-up whereby a guest operating system cannot
>> determine device MMIO mapping memory attributes on its own but
>> it is always overridden by the KVM stage 2 default.
>>
>> [...]
>
> High time to get this cooking in -next. Looks like there aren't any
> conflicts w/ VFIO, but if that changes I've pushed a topic branch to:
>
>  https://git.kernel.org/pub/scm/linux/kernel/git/oupton/linux.git/log/?h=kvm-arm64/vfio-normal-nc
>
> Applied to kvmarm/next, thanks!

Thanks Oliver for your efforts. Pardon my naivety, but what is the
sequence of steps that this series goes through next before landing in an
rc branch? Also, what is the earliest branch this is supposed to land in,
assuming all goes well?

>
> [1/4] KVM: arm64: Introduce new flag for non-cacheable IO memory
>      https://git.kernel.org/kvmarm/kvmarm/c/c034ec84e879
> [2/4] mm: Introduce new flag to indicate wc safe
>      https://git.kernel.org/kvmarm/kvmarm/c/5c656fcdd6c6
> [3/4] KVM: arm64: Set io memory s2 pte as normalnc for vfio pci device
>      https://git.kernel.org/kvmarm/kvmarm/c/8c47ce3e1d2c
> [4/4] vfio: Convey kvm that the vfio-pci device is wc safe
>      https://git.kernel.org/kvmarm/kvmarm/c/a39d3a966a09
  
Oliver Upton Feb. 27, 2024, 8:49 a.m. UTC | #3
On Tue, Feb 27, 2024 at 08:45:38AM +0000, Ankit Agrawal wrote:
> >>
> >> Currently, KVM for ARM64 maps at stage 2 memory that is considered device
> >> with DEVICE_nGnRE memory attributes; this setting overrides (per
> >> ARM architecture [1]) any device MMIO mapping present at stage 1,
> >> resulting in a set-up whereby a guest operating system cannot
> >> determine device MMIO mapping memory attributes on its own but
> >> it is always overridden by the KVM stage 2 default.
> >>
> >> [...]
> >
> > High time to get this cooking in -next. Looks like there aren't any
> > conflicts w/ VFIO, but if that changes I've pushed a topic branch to:
> >
> >  https://git.kernel.org/pub/scm/linux/kernel/git/oupton/linux.git/log/?h=kvm-arm64/vfio-normal-nc
> >
> > Applied to kvmarm/next, thanks!
> 
> Thanks Oliver for your efforts. Pardon my naivety, but what is the
> sequence of steps that this series goes through next before landing in an
> rc branch? Also, what is the earliest branch this is supposed to land in,
> assuming all goes well?

We should see this showing up in linux-next imminently. Assuming there
are no issues there, your changes will be sent out as part of the kvmarm
pull request for 6.9.

At least in kvmarm, /next is used for patches that'll land in the next
merge window and /fixes is for bugfixes that need to go in the current
release cycle.
  
Ankit Agrawal Feb. 27, 2024, 9:42 a.m. UTC | #4
>> >
>> > High time to get this cooking in -next. Looks like there aren't any
>> > conflicts w/ VFIO, but if that changes I've pushed a topic branch to:
>> >
>> >  https://git.kernel.org/pub/scm/linux/kernel/git/oupton/linux.git/log/?h=kvm-arm64/vfio-normal-nc
>> >
>> > Applied to kvmarm/next, thanks!
>>
>> Thanks Oliver for your efforts. Pardon my naivety, but what is the
>> sequence of steps that this series goes through next before landing in an
>> rc branch? Also, what is the earliest branch this is supposed to land in,
>> assuming all goes well?
>
> We should see this showing up in linux-next imminently. Assuming there
> are no issues there, your changes will be sent out as part of the kvmarm
> pull request for 6.9.
>
> At least in kvmarm, /next is used for patches that'll land in the next
> merge window and /fixes is for bugfixes that need to go in the current
> release cycle.

Got it, thanks for the information!