[v3,0/6] KVM: MMU: performance tweaks for heavy CR0.WP users

Message ID 20230201194604.11135-1-minipli@grsecurity.net
Series KVM: MMU: performance tweaks for heavy CR0.WP users

Message

Mathias Krause Feb. 1, 2023, 7:45 p.m. UTC
  v2: https://lore.kernel.org/kvm/20230118145030.40845-1-minipli@grsecurity.net/

This series is a resurrection of the missing pieces of Paolo's previous
attempt[1] to avoid needless MMU roots unloading. The performance gap
between the TDP and legacy MMU still exists and is especially noticeable
under grsecurity, which implements kernel W^X by toggling CR0.WP very
frequently.

Patches 1-13 and 17 of the old series had been merged, but, unfortunately,
the remaining parts never saw a v3. I therefore took care of these, took
Sean's feedback into account[2] and simplified the whole approach to just
handle the case we care most about explicitly.

Patch 1 is a v3 of [3], addressing Sean's feedback.

Patch 2 is specifically useful for grsecurity, as handle_cr() is by far
*the* top vmexit reason.
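
As an illustration of the idea behind patch 2, here is a minimal sketch of
dispatching a hot exit reason with a direct call instead of going through a
function pointer table (an indirect call becomes a retpoline when that
mitigation is enabled). The exit reasons, handler names, and return values
are hypothetical stand-ins, not the actual vmx.c code.

```c
#include <assert.h>

/* Hypothetical exit reasons; the numbering is illustrative only. */
enum { EXIT_CR_ACCESS, EXIT_HLT, EXIT_IO, EXIT_MAX };

typedef int (*exit_handler_t)(void);

/* Stand-in handlers that just return a distinct value. */
static int handle_cr(void)  { return 1; }
static int handle_hlt(void) { return 2; }
static int handle_io(void)  { return 3; }

/* Generic path: every exit costs one indirect call through this table. */
static const exit_handler_t exit_handlers[EXIT_MAX] = {
	[EXIT_CR_ACCESS] = handle_cr,
	[EXIT_HLT]       = handle_hlt,
	[EXIT_IO]        = handle_io,
};

static int dispatch_exit(unsigned int reason)
{
	assert(reason < EXIT_MAX);

	/* Hot exit reason: compare and call directly, bypassing the table. */
	if (reason == EXIT_CR_ACCESS)
		return handle_cr();

	/* Everything else still takes the indirect call. */
	return exit_handlers[reason]();
}
```

The direct call only pays off for exit reasons that dominate the profile,
which is why it is limited to a short list of hot handlers.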

Patch 3 is the most important one, as it skips unloading the MMU roots for
CR0.WP toggling.
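
The check behind patch 3 can be sketched as a test on the XOR of the old and
new CR0 values. This is a simplified, self-contained model: the helper names
are hypothetical, and treating every other CR0 change as requiring a root
unload is an assumption made to keep the example short, not a description of
the real KVM code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural CR0 bit positions. */
#define X86_CR0_PE (UINT64_C(1) << 0)
#define X86_CR0_WP (UINT64_C(1) << 16)
#define X86_CR0_PG (UINT64_C(1) << 31)

/* True when the write flips CR0.WP and nothing else. */
static bool cr0_only_wp_toggled(uint64_t old_cr0, uint64_t new_cr0)
{
	return (old_cr0 ^ new_cr0) == X86_CR0_WP;
}

/*
 * Hypothetical policy helper (assumption): a pure CR0.WP toggle keeps the
 * MMU roots, any other change is conservatively treated as an unload.
 */
static bool cr0_write_needs_root_unload(uint64_t old_cr0, uint64_t new_cr0)
{
	if (old_cr0 == new_cr0)
		return false;

	return !cr0_only_wp_toggled(old_cr0, new_cr0);
}

static void cr0_wp_selftest(void)
{
	uint64_t cr0 = X86_CR0_PE | X86_CR0_PG | X86_CR0_WP;

	/* Clearing only WP keeps the roots. */
	assert(!cr0_write_needs_root_unload(cr0, cr0 & ~X86_CR0_WP));
	/* Clearing PG, alone or together with WP, does not. */
	assert(cr0_write_needs_root_unload(cr0, cr0 & ~X86_CR0_PG));
	assert(cr0_write_needs_root_unload(cr0, cr0 & ~(X86_CR0_PG | X86_CR0_WP)));
}
```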

Sean suggested another change on top of v2 of this series, to skip
intercepting CR0.WP writes completely for VMX[4]. That turned out to be
yet another performance boost and is implemented in patch 6.
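
To illustrate why making CR0.WP guest owned removes the exits: on VMX, a MOV
to CR0 causes a VM exit only if the new value differs from the CR0 read
shadow in a bit that is set in the CR0 guest/host mask. The sketch below
models that rule; the function names and the example mask values are
hypothetical and do not reflect the mask KVM actually programs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural CR0 bit positions. */
#define X86_CR0_PE (UINT64_C(1) << 0)
#define X86_CR0_WP (UINT64_C(1) << 16)
#define X86_CR0_PG (UINT64_C(1) << 31)

/*
 * Model of the VMX rule: bits set in the guest/host mask are host owned,
 * and changing any of them relative to the read shadow traps.
 */
static bool mov_to_cr0_causes_exit(uint64_t guest_host_mask,
				   uint64_t read_shadow, uint64_t new_cr0)
{
	return ((new_cr0 ^ read_shadow) & guest_host_mask) != 0;
}

static void cr0_guest_owned_wp_selftest(void)
{
	uint64_t shadow = X86_CR0_PE | X86_CR0_PG | X86_CR0_WP;
	/* Example masks only (assumption), not KVM's real configuration. */
	uint64_t host_owns_wp  = X86_CR0_PG | X86_CR0_WP;
	uint64_t guest_owns_wp = host_owns_wp & ~X86_CR0_WP;

	/* WP host owned: toggling it exits. */
	assert(mov_to_cr0_causes_exit(host_owns_wp, shadow, shadow & ~X86_CR0_WP));
	/* WP guest owned: the same write no longer exits. */
	assert(!mov_to_cr0_causes_exit(guest_owns_wp, shadow, shadow & ~X86_CR0_WP));
	/* PG stays host owned and still exits. */
	assert(mov_to_cr0_causes_exit(guest_owns_wp, shadow, shadow & ~X86_CR0_PG));
}
```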

While patches 1 and 2 bring small performance improvements already, the
big gains come from patches 3 and 6.

I used 'ssdd 10 50000' from rt-tests[5] as a micro-benchmark, running on a
grsecurity L1 VM. The table below shows the results (runtime in seconds,
lower is better):

                         legacy     TDP    shadow
    kvm.git/queue        11.55s   13.91s    75.2s
    + patches 1-3         7.32s    7.31s    74.6s
    + patches 4-6         4.89s    4.89s    73.4s
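
For reading the table, the speedup of a row relative to the baseline is the
baseline runtime divided by the new runtime. A small helper (the name is made
up for this example) makes the arithmetic explicit:

```c
#include <assert.h>

/* Speedup factor of `after_s` relative to `before_s`, both in seconds. */
static double speedup(double before_s, double after_s)
{
	assert(before_s > 0.0 && after_s > 0.0);
	return before_s / after_s;
}
```

With the numbers above, speedup(11.55, 4.89) is roughly 2.36 for the first
column, speedup(13.91, 4.89) is roughly 2.84 for the second, and
speedup(75.2, 73.4) is roughly 1.02 for the third.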

This series builds on top of kvm.git/queue, namely commit de60733246ff
("Merge branch 'kvm-hw-enable-refactor' into HEAD").

Patches 1-3 didn't change from v2, besides minor changelog mangling.

Patches 4-6 are new to v3.

Thanks,
Mathias

[1] https://lore.kernel.org/kvm/20220217210340.312449-1-pbonzini@redhat.com/
[2] https://lore.kernel.org/kvm/YhATewkkO%2Fl4P9UN@google.com/
[3] https://lore.kernel.org/kvm/YhAB1d1%2FnQbx6yvk@google.com/
[4] https://lore.kernel.org/kvm/Y8cTMnyBzNdO5dY3@google.com/
[5] https://git.kernel.org/pub/scm/utils/rt-tests/rt-tests.git

Mathias Krause (5):
  KVM: VMX: Avoid retpoline call for control register caused exits
  KVM: x86: Do not unload MMU roots when only toggling CR0.WP
  KVM: x86: Make use of kvm_read_cr*_bits() when testing bits
  KVM: x86/mmu: Fix comment typo
  KVM: VMX: Make CR0.WP a guest owned bit

Paolo Bonzini (1):
  KVM: x86/mmu: Avoid indirect call for get_cr3

 arch/x86/kvm/kvm_cache_regs.h   |  3 ++-
 arch/x86/kvm/mmu/mmu.c          | 31 ++++++++++++++++++++-----------
 arch/x86/kvm/mmu/paging_tmpl.h  |  2 +-
 arch/x86/kvm/mmu/spte.c         |  2 +-
 arch/x86/kvm/pmu.c              |  4 ++--
 arch/x86/kvm/vmx/capabilities.h |  1 +
 arch/x86/kvm/vmx/nested.c       |  4 ++--
 arch/x86/kvm/vmx/vmx.c          | 15 ++++++++++++---
 arch/x86/kvm/vmx/vmx.h          |  8 ++++++++
 arch/x86/kvm/x86.c              |  9 +++++++++
 10 files changed, 58 insertions(+), 21 deletions(-)
  

Comments

Mathias Krause March 6, 2023, 6:34 a.m. UTC | #1
Ping!

Anything I can do to help getting this series reviewed and hopefully merged?

Thanks,
Mathias
  
Sean Christopherson March 6, 2023, 6:07 p.m. UTC | #2
On Mon, Mar 06, 2023, Mathias Krause wrote:
> On 01.02.23 20:45, Mathias Krause wrote:
> > Mathias Krause (5):
> >   KVM: VMX: Avoid retpoline call for control register caused exits
> >   KVM: x86: Do not unload MMU roots when only toggling CR0.WP
> >   KVM: x86: Make use of kvm_read_cr*_bits() when testing bits
> >   KVM: x86/mmu: Fix comment typo
> >   KVM: VMX: Make CR0.WP a guest owned bit
> > 
> > Paolo Bonzini (1):
> >   KVM: x86/mmu: Avoid indirect call for get_cr3
> > 
> >  arch/x86/kvm/kvm_cache_regs.h   |  3 ++-
> >  arch/x86/kvm/mmu/mmu.c          | 31 ++++++++++++++++++++-----------
> >  arch/x86/kvm/mmu/paging_tmpl.h  |  2 +-
> >  arch/x86/kvm/mmu/spte.c         |  2 +-
> >  arch/x86/kvm/pmu.c              |  4 ++--
> >  arch/x86/kvm/vmx/capabilities.h |  1 +
> >  arch/x86/kvm/vmx/nested.c       |  4 ++--
> >  arch/x86/kvm/vmx/vmx.c          | 15 ++++++++++++---
> >  arch/x86/kvm/vmx/vmx.h          |  8 ++++++++
> >  arch/x86/kvm/x86.c              |  9 +++++++++
> >  10 files changed, 58 insertions(+), 21 deletions(-)
> 
> Ping!
> 
> Anything I can do to help getting this series reviewed and hopefully merged?

I'm slowly getting there...

https://lore.kernel.org/kvm/Y%2Fk+n6HqfLNmmmtM@google.com