Message ID | 20231107183605.409588-1-seanjc@google.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:aa0b:0:b0:403:3b70:6f57 with SMTP id k11csp432238vqo; Tue, 7 Nov 2023 10:37:24 -0800 (PST) X-Google-Smtp-Source: AGHT+IExOREqykTQK8QjBLz6H6ybpJ8Y7BP3RYLFxcj2QzxC965yU+3YYUM+VK7VVQGCfUx7NCll X-Received: by 2002:a05:6358:70e:b0:168:e5e7:e652 with SMTP id e14-20020a056358070e00b00168e5e7e652mr30035157rwj.7.1699382244541; Tue, 07 Nov 2023 10:37:24 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1699382244; cv=none; d=google.com; s=arc-20160816; b=vcvT/kgbmz/6bG2SEeNxdYRTqI72UwBU0LdOc+8oI/1yCtmA3SHOHumKNjgmApO5zy 9w0VIjMF7axWHPBNp00WFJ2A8ODYUz/XgAoxA3q1Uu8qQZATnS+iljNfPFDREMHehvC5 YSHMWYFk3Hk3q9kV2oC9iSOZjxneZp/iXWSiYYQeOOb+9O/PdI4UmitZEKHIGu8fP/r2 k014K4LIk7BHLUgwdSPlVzm+NGGzQdZvlyqa9IH+JtLOwV0ZWyCWLWy4TIVzdQR8TXs8 9+ekgh2S2Yzz2DhoAawx3B42FjfJPhKr36UB2mTJmjSc2/O6h2ZMtTU4w3/0zwG4X26t ysgA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:mime-version:date :reply-to:dkim-signature; bh=Nvoe8zuzQD68UsZwn7syk08Ugz3HDvlC6EsBdsDkAvU=; fh=RJmUFR00zsh7EQG6+UhC2I9wYMDWaVPVIIfZSndYMYo=; b=r3S1x6phvXBVQxZESHft3Hm7oCB+BRA3gPK0DeVeq6+3lL+mLoTybztgm5mGhgxw0Y Yni3orfmGicFBmLAWwnOFu6+TwWTXnmpXtCDSkbIRX+mB7lA3cg1LRZad9StjlBqttyH y0tzUMBWT5k1On8pUXnxSAU7ZNjDrHmd6ApyWJta+0BUL5oZz3nJG46D4Ri3oGZRua6B WQX/GKvt5GjrpblZ8dcDI2lz2X3MaArAxIuwuE6WmngUyIAreUjdobIEMGJ+zXQfkkul +chp+Ir2kF/gEKSSADCSvYeakvg9XHwkzbLY5bgtllp4ayxnK1dUAGqT00MkORbHmVmd 46rw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20230601 header.b=j5ptDbDH; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from snail.vger.email (snail.vger.email. [23.128.96.37]) by mx.google.com with ESMTPS id a190-20020a6390c7000000b005bd27920755si2476461pge.534.2023.11.07.10.37.24 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 07 Nov 2023 10:37:24 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) client-ip=23.128.96.37; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20230601 header.b=j5ptDbDH; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by snail.vger.email (Postfix) with ESMTP id E0BA380713CC; Tue, 7 Nov 2023 10:36:24 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.11 at snail.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230523AbjKGSgN (ORCPT <rfc822;lhua1029@gmail.com> + 32 others); Tue, 7 Nov 2023 13:36:13 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39870 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229886AbjKGSgK (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Tue, 7 Nov 2023 13:36:10 -0500 Received: from mail-yb1-xb4a.google.com (mail-yb1-xb4a.google.com [IPv6:2607:f8b0:4864:20::b4a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A778DB9 for <linux-kernel@vger.kernel.org>; Tue, 7 Nov 2023 10:36:08 -0800 (PST) Received: by mail-yb1-xb4a.google.com with SMTP id 3f1490d57ef6-da3b6438170so6011952276.1 for <linux-kernel@vger.kernel.org>; Tue, 07 Nov 2023 10:36:08 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1699382168; x=1699986968; darn=vger.kernel.org; h=cc:to:from:subject:message-id:mime-version:date:reply-to:from:to:cc :subject:date:message-id:reply-to; bh=Nvoe8zuzQD68UsZwn7syk08Ugz3HDvlC6EsBdsDkAvU=; b=j5ptDbDHZ4q2C2BUC5FnZV1JwcS6SZCaIuiapgFRKgZv5rvFpqPvQ5lZk2dFeyiKED oJwVY8G7/2T4xUncgZxg/8+JETVUYf74wBgBFGUELCzMcOAS032mLRrEu5f6Mv1W8WY2 DCfR3G/k9Y7Z+89jRw6hfzOlV+QP6Xj5fyTtmeIa4jhND0aDhocswzGhUb9JgF8qjbRJ 3b6F+WJV44vhozGQK8VKV/Et4lCn7R3oMU+MqpQOFDXYj7Sa7f9Sk4OOXjKht7KTNhT9 gWgpLYbW296fGviaJqDmGI2CWABuwNe5W/mrNzZTCbVI927BqtnaJB9bb2VP8YYZDX/M /htw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1699382168; x=1699986968; h=cc:to:from:subject:message-id:mime-version:date:reply-to :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=Nvoe8zuzQD68UsZwn7syk08Ugz3HDvlC6EsBdsDkAvU=; b=ldBwnIVz11Y63dxHnS2ddpUXcYMMLeNHzi+SFEKLiC546r/wZdeIPmkvqEhyIxw1dj 0Zn9Bs5g/+PpjeeO0x1ZKuOoVI1aSdThPvV6mcMVMur0BvCFDMdzNmf8dFMgP1j640xt Pk4iioe+EM63XzCK0SityjJ3w5UNW+S6tw9DawlZoC8nCbqePc8iLv6TNJkcbLs4YYUR 6sGZfgRa2tkjzsRpJa5xXpiyYMJsQP+9TDapwjMJXOn4AFXP27xPU9Gg1B2wxYKxVuyN pW1bQyVGj3tOUcjmajaO2vDt9+vdHo/H5bpirkOyaG3PZHwtUjwCpni9/YmUxqTyjgu4 CXtw== X-Gm-Message-State: AOJu0YzbAVwTwE9Jw9OQRiteVDiLjkeXYMbpc1Kn52KZuhk5riLePAEf cG33+FFvVwRG0OAllDEMUwVuHfM0SEI= X-Received: from zagreus.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5c37]) (user=seanjc job=sendgmr) by 2002:a05:6902:1788:b0:da0:c9a5:b529 with SMTP id ca8-20020a056902178800b00da0c9a5b529mr601875ybb.12.1699382167904; Tue, 07 Nov 2023 10:36:07 -0800 (PST) Reply-To: Sean Christopherson <seanjc@google.com> Date: Tue, 7 Nov 2023 10:36:05 -0800 Mime-Version: 1.0 X-Mailer: git-send-email 2.42.0.869.gea05f2083d-goog Message-ID: <20231107183605.409588-1-seanjc@google.com> Subject: [PATCH] perf/x86: Don't enforce minimum period for KVM guest-only events From: Sean Christopherson <seanjc@google.com> To: Peter Zijlstra <peterz@infradead.org>, Ingo Molnar <mingo@redhat.com>, Arnaldo Carvalho de Melo <acme@kernel.org>, Thomas Gleixner <tglx@linutronix.de>, Borislav Petkov <bp@alien8.de>, Dave Hansen <dave.hansen@linux.intel.com>, x86@kernel.org Cc: linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org, Like Xu <likexu@tencent.com>, Jim Mattson <jmattson@google.com>, Mingwei Zhang <mizhang@google.com>, Sean Christopherson <seanjc@google.com> Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (snail.vger.email [0.0.0.0]); Tue, 07 Nov 2023 10:36:24 -0800 (PST) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1781931436553813741 X-GMAIL-MSGID: 1781931436553813741 |
Series |
perf/x86: Don't enforce minimum period for KVM guest-only events
|
|
Commit Message
Sean Christopherson
Nov. 7, 2023, 6:36 p.m. UTC
Don't apply minimum period workarounds/requirements to events that are
being created by KVM to virtualize PMCs for guests, i.e. skip limit
enforcement for events that exclude the host. Perf's somewhat arbitrary
limits prevents KVM from correctly virtualizing counter overflow, e.g. if
the guest sets a counter to have an effective period of '1', forcing a
minimum period of '2' results in overflow occurring at the incorrect time.
Whether or not a "real" profiling use case is affected is debatable, but
the incorrect behavior is trivially easy to observe and reproduce, and is
deterministic enough to make the PMU appear to be broken from the guest's
perspective.
Furthermore, the "period" set by KVM isn't actually a period, as KVM won't
automatically reprogram the event with the same period on overflow. KVM
will synthesize a PMI into the guest when appropriate, but what the guest
does in response to the PMI is purely a guest decision. In other words,
KVM effectively operates in a one-shot mode, not a periodic mode.
Letting KVM and/or the guest program "too small" periods is safe for the
host, as events that exclude the host are atomically disabled with respect
to VM-Exit, i.e. are guaranteed to stop counting upon transitioning to the
host. And whether or not *explicitly* programming a short period is safe
is somewhat of a moot point, as transitions to/from the guest effectively
yield the same effect, e.g. an unrelated VM-Exit => VM-Enter transition
will re-enable guest PMCs with whatever count happened to be in the PMC at
the time of VM-Exit.
Cc: Like Xu <likexu@tencent.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
Disclaimer: I've only tested this from KVM's side of things.
arch/x86/events/core.c | 21 +++++++++++++++------
1 file changed, 15 insertions(+), 6 deletions(-)
base-commit: 744940f1921c8feb90e3c4bcc1e153fdd6e10fe2
Comments
On Tue, Nov 07, 2023, Sean Christopherson wrote: > Don't apply minimum period workarounds/requirements to events that are > being created by KVM to virtualize PMCs for guests, i.e. skip limit > enforcement for events that exclude the host. Perf's somewhat arbitrary > limits prevents KVM from correctly virtualizing counter overflow, e.g. if > the guest sets a counter to have an effective period of '1', forcing a > minimum period of '2' results in overflow occurring at the incorrect time. > > Whether or not a "real" profiling use case is affected is debatable, but > the incorrect behavior is trivially easy to observe and reproduce, and is > deterministic enough to make the PMU appear to be broken from the guest's > perspective. > > Furthermore, the "period" set by KVM isn't actually a period, as KVM won't > automatically reprogram the event with the same period on overflow. KVM > will synthesize a PMI into the guest when appropriate, but what the guest > does in response to the PMI is purely a guest decision. In other words, > KVM effectively operates in a one-shot mode, not a periodic mode. > > Letting KVM and/or the guest program "too small" periods is safe for the > host, as events that exclude the host are atomically disabled with respect > to VM-Exit, i.e. are guaranteed to stop counting upon transitioning to the > host. And whether or not *explicitly* programming a short period is safe > is somewhat of a moot point, as transitions to/from the guest effectively > yield the same effect, e.g. an unrelated VM-Exit => VM-Enter transition > will re-enable guest PMCs with whatever count happened to be in the PMC at > the time of VM-Exit. > > Cc: Like Xu <likexu@tencent.com> > Cc: Jim Mattson <jmattson@google.com> > Cc: Mingwei Zhang <mizhang@google.com> > Signed-off-by: Sean Christopherson <seanjc@google.com> > --- > > Disclaimer: I've only tested this from KVM's side of things. > > arch/x86/events/core.c | 21 +++++++++++++++------ > 1 file changed, 15 insertions(+), 6 deletions(-) > > diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c > index 40ad1425ffa2..f8a8a4ea4d47 100644 > --- a/arch/x86/events/core.c > +++ b/arch/x86/events/core.c > @@ -1388,16 +1388,25 @@ int x86_perf_event_set_period(struct perf_event *event) > hwc->last_period = period; > ret = 1; > } > - /* > - * Quirk: certain CPUs dont like it if just 1 hw_event is left: > - */ > - if (unlikely(left < 2)) > - left = 2; > > if (left > x86_pmu.max_period) > left = x86_pmu.max_period; > > - static_call_cond(x86_pmu_limit_period)(event, &left); > + /* > + * Exempt KVM guest events from the minimum period requirements. It's > + * the guest's responsibility to ensure it can make forward progress, > + * and it's KVM's responsibility to configure an appropriate "period" > + * to correctly virtualize overflow for the guest's PMCs. > + */ > + if (!event->attr.exclude_host) { > + /* > + * Quirk: certain CPUs dont like it if just 1 event is left: > + */ > + if (unlikely(left < 2)) > + left = 2; > + > + static_call_cond(x86_pmu_limit_period)(event, &left); > + } > > this_cpu_write(pmc_prev_left[idx], left); > Nice one. I am curious how you tested this one? I would like to reproduce that one on my side. > > base-commit: 744940f1921c8feb90e3c4bcc1e153fdd6e10fe2 > -- > 2.42.0.869.gea05f2083d-goog >
On Tue, Nov 07, 2023, Mingwei Zhang wrote: > On Tue, Nov 07, 2023, Sean Christopherson wrote: > > arch/x86/events/core.c | 21 +++++++++++++++------ > > 1 file changed, 15 insertions(+), 6 deletions(-) > > > > diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c > > index 40ad1425ffa2..f8a8a4ea4d47 100644 > > --- a/arch/x86/events/core.c > > +++ b/arch/x86/events/core.c > > @@ -1388,16 +1388,25 @@ int x86_perf_event_set_period(struct perf_event *event) > > hwc->last_period = period; > > ret = 1; > > } > > - /* > > - * Quirk: certain CPUs dont like it if just 1 hw_event is left: > > - */ > > - if (unlikely(left < 2)) > > - left = 2; > > > > if (left > x86_pmu.max_period) > > left = x86_pmu.max_period; > > > > - static_call_cond(x86_pmu_limit_period)(event, &left); > > + /* > > + * Exempt KVM guest events from the minimum period requirements. It's > > + * the guest's responsibility to ensure it can make forward progress, > > + * and it's KVM's responsibility to configure an appropriate "period" > > + * to correctly virtualize overflow for the guest's PMCs. > > + */ > > + if (!event->attr.exclude_host) { > > + /* > > + * Quirk: certain CPUs dont like it if just 1 event is left: > > + */ > > + if (unlikely(left < 2)) > > + left = 2; > > + > > + static_call_cond(x86_pmu_limit_period)(event, &left); > > + } > > > > this_cpu_write(pmc_prev_left[idx], left); > > > > Nice one. I am curious how you tested this one? I would like to > reproduce that one on my side. The check_emulated_instr() sub-test in KVM-Unit-Tests's x86/pmu.c fails when run with "my" (which is really yours) fix for the KVM's handling of emulated PMC events[*]. If KVM synthesizes an "instructions retired" event that bumps the PMC to all ones, i.e. -1 for all intents and purposes, the test fails because KVM creates a sample_period of '1', but perf programs a period of '2'. I suspect a very simple test of writing -1 to a PMC from the guest would exhibit the same behavior. [*] https://lkml.kernel.org/r/ZUWAg3WP2XESCAR4%40google.com
On Tue, Nov 7, 2023 at 3:02 PM Sean Christopherson <seanjc@google.com> wrote: > > On Tue, Nov 07, 2023, Mingwei Zhang wrote: > > On Tue, Nov 07, 2023, Sean Christopherson wrote: > > > arch/x86/events/core.c | 21 +++++++++++++++------ > > > 1 file changed, 15 insertions(+), 6 deletions(-) > > > > > > diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c > > > index 40ad1425ffa2..f8a8a4ea4d47 100644 > > > --- a/arch/x86/events/core.c > > > +++ b/arch/x86/events/core.c > > > @@ -1388,16 +1388,25 @@ int x86_perf_event_set_period(struct perf_event *event) > > > hwc->last_period = period; > > > ret = 1; > > > } > > > - /* > > > - * Quirk: certain CPUs dont like it if just 1 hw_event is left: > > > - */ > > > - if (unlikely(left < 2)) > > > - left = 2; > > > > > > if (left > x86_pmu.max_period) > > > left = x86_pmu.max_period; > > > > > > - static_call_cond(x86_pmu_limit_period)(event, &left); > > > + /* > > > + * Exempt KVM guest events from the minimum period requirements. It's > > > + * the guest's responsibility to ensure it can make forward progress, > > > + * and it's KVM's responsibility to configure an appropriate "period" > > > + * to correctly virtualize overflow for the guest's PMCs. > > > + */ > > > + if (!event->attr.exclude_host) { > > > + /* > > > + * Quirk: certain CPUs dont like it if just 1 event is left: > > > + */ > > > + if (unlikely(left < 2)) > > > + left = 2; > > > + > > > + static_call_cond(x86_pmu_limit_period)(event, &left); > > > + } > > > > > > this_cpu_write(pmc_prev_left[idx], left); > > > > > > > Nice one. I am curious how you tested this one? I would like to > > reproduce that one on my side. > > The check_emulated_instr() sub-test in KVM-Unit-Tests's x86/pmu.c fails when run > with "my" (which is really yours) fix for the KVM's handling of emulated PMC > events[*]. If KVM synthesizes an "instructions retired" event that bumps the > PMC to all ones, i.e. -1 for all intents and purposes, the test fails because > KVM creates a sample_period of '1', but perf programs a period of '2'. > > I suspect a very simple test of writing -1 to a PMC from the guest would exhibit > the same behavior. > > [*] https://lkml.kernel.org/r/ZUWAg3WP2XESCAR4%40google.com Nice, I will try that and see if I can reproduce. Will give Reviewed-by after testing it on my side. Thanks. -Mingwei
On Tue, Nov 07, 2023 at 10:36:05AM -0800, Sean Christopherson wrote: > Don't apply minimum period workarounds/requirements to events that are > being created by KVM to virtualize PMCs for guests, i.e. skip limit > enforcement for events that exclude the host. Perf's somewhat arbitrary > limits prevents KVM from correctly virtualizing counter overflow, e.g. if > the guest sets a counter to have an effective period of '1', forcing a > minimum period of '2' results in overflow occurring at the incorrect time. > > Whether or not a "real" profiling use case is affected is debatable, but > the incorrect behavior is trivially easy to observe and reproduce, and is > deterministic enough to make the PMU appear to be broken from the guest's > perspective. > > Furthermore, the "period" set by KVM isn't actually a period, as KVM won't > automatically reprogram the event with the same period on overflow. KVM > will synthesize a PMI into the guest when appropriate, but what the guest > does in response to the PMI is purely a guest decision. In other words, > KVM effectively operates in a one-shot mode, not a periodic mode. > > Letting KVM and/or the guest program "too small" periods is safe for the > host, as events that exclude the host are atomically disabled with respect > to VM-Exit, i.e. are guaranteed to stop counting upon transitioning to the > host. And whether or not *explicitly* programming a short period is safe > is somewhat of a moot point, as transitions to/from the guest effectively > yield the same effect, e.g. an unrelated VM-Exit => VM-Enter transition > will re-enable guest PMCs with whatever count happened to be in the PMC at > the time of VM-Exit. > > Cc: Like Xu <likexu@tencent.com> > Cc: Jim Mattson <jmattson@google.com> > Cc: Mingwei Zhang <mizhang@google.com> > Signed-off-by: Sean Christopherson <seanjc@google.com> > --- > > Disclaimer: I've only tested this from KVM's side of things. > > arch/x86/events/core.c | 21 +++++++++++++++------ > 1 file changed, 15 insertions(+), 6 deletions(-) > > diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c > index 40ad1425ffa2..f8a8a4ea4d47 100644 > --- a/arch/x86/events/core.c > +++ b/arch/x86/events/core.c > @@ -1388,16 +1388,25 @@ int x86_perf_event_set_period(struct perf_event *event) > hwc->last_period = period; > ret = 1; > } > - /* > - * Quirk: certain CPUs dont like it if just 1 hw_event is left: > - */ > - if (unlikely(left < 2)) > - left = 2; > > if (left > x86_pmu.max_period) > left = x86_pmu.max_period; > > - static_call_cond(x86_pmu_limit_period)(event, &left); > + /* > + * Exempt KVM guest events from the minimum period requirements. It's > + * the guest's responsibility to ensure it can make forward progress, > + * and it's KVM's responsibility to configure an appropriate "period" > + * to correctly virtualize overflow for the guest's PMCs. > + */ > + if (!event->attr.exclude_host) { > + /* > + * Quirk: certain CPUs dont like it if just 1 event is left: > + */ > + if (unlikely(left < 2)) > + left = 2; > + > + static_call_cond(x86_pmu_limit_period)(event, &left); > + } Hmm, IIRC we can disable that left < 2 thing for anything that doesn't have x86_pmu.pebs_no_isolation IIRC. I'm not sure about taking out the limit_period call, why does it make sense to allow the guest to program obviously invalid settings? That is, would something like the below work for you? --- diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c index 40ad1425ffa2..5543a0bab1f8 100644 --- a/arch/x86/events/core.c +++ b/arch/x86/events/core.c @@ -152,6 +152,14 @@ u64 x86_perf_event_update(struct perf_event *event) return new_raw_count; } +static void x86_perf_limit_period(struct perf_event *event, s64 *left) +{ + /* + * Quirk: certain CPUs dont like it if just 1 hw_event is left: + */ + *left = max(*left, 2); +} + /* * Find and validate any extra registers to set up. */ @@ -1388,11 +1396,6 @@ int x86_perf_event_set_period(struct perf_event *event) hwc->last_period = period; ret = 1; } - /* - * Quirk: certain CPUs dont like it if just 1 hw_event is left: - */ - if (unlikely(left < 2)) - left = 2; if (left > x86_pmu.max_period) left = x86_pmu.max_period; @@ -2130,6 +2133,10 @@ static int __init init_hw_perf_events(void) if (!x86_pmu.update) x86_pmu.update = x86_perf_event_update; + // XXX check non-Intel + if (!x86_pmu.limit_period && x86_pmu.pebs_no_isolation) + x86_pmu.limit_update = x86_perf_limit_period; + x86_pmu_static_call_update(); /* diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c index a08f794a0e79..9fe0f241779e 100644 --- a/arch/x86/events/intel/core.c +++ b/arch/x86/events/intel/core.c @@ -4471,7 +4471,10 @@ static void bdw_limit_period(struct perf_event *event, s64 *left) if (*left < 128) *left = 128; *left &= ~0x3fULL; + return; } + if (unlikely(x86_pmu.pebs_no_isolation)) + *left = max(*left, 2); } static void nhm_limit_period(struct perf_event *event, s64 *left)
On Fri, Nov 17, 2023, Peter Zijlstra wrote: > On Tue, Nov 07, 2023 at 10:36:05AM -0800, Sean Christopherson wrote: > > diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c > > index 40ad1425ffa2..f8a8a4ea4d47 100644 > > --- a/arch/x86/events/core.c > > +++ b/arch/x86/events/core.c > > @@ -1388,16 +1388,25 @@ int x86_perf_event_set_period(struct perf_event *event) > > hwc->last_period = period; > > ret = 1; > > } > > - /* > > - * Quirk: certain CPUs dont like it if just 1 hw_event is left: > > - */ > > - if (unlikely(left < 2)) > > - left = 2; > > > > if (left > x86_pmu.max_period) > > left = x86_pmu.max_period; > > > > - static_call_cond(x86_pmu_limit_period)(event, &left); > > + /* > > + * Exempt KVM guest events from the minimum period requirements. It's > > + * the guest's responsibility to ensure it can make forward progress, > > + * and it's KVM's responsibility to configure an appropriate "period" > > + * to correctly virtualize overflow for the guest's PMCs. > > + */ > > + if (!event->attr.exclude_host) { > > + /* > > + * Quirk: certain CPUs dont like it if just 1 event is left: > > + */ > > + if (unlikely(left < 2)) > > + left = 2; > > + > > + static_call_cond(x86_pmu_limit_period)(event, &left); > > + } > > Hmm, IIRC we can disable that left < 2 thing for anything that doesn't > have x86_pmu.pebs_no_isolation IIRC. > > I'm not sure about taking out the limit_period call, why does it make > sense to allow the guest to program obviously invalid settings? I don't see how the guest behavior is obviously invalid. Architecturally, writing -1 to a counter should result in overflow after a single event. Underlying uarch goofiness shouldn't enter into that equation. Honoring the guest's programming *might* cause oddness for the guest, whereas not honoring the architecture is guaranteed to cause visible issues. If programming a "period" of 1 puts the host at risk in some way, then I agree that this is unsafe and we need a different solution. But if the worst case scenario is non-determinstic or odd behavior from the guest's perspective, then that's the guest's problem (with the caveat that the guest might not have accurate Family/Model/Stepping data to make informed decisions). > That is, would something like the below work for you? No, because the fix ideally wouldn't require fancy hardware, i.e. would work for all CPUs for which KVM supports a virtual PMU.
On Tue, Nov 28, 2023 at 05:33:16PM -0800, Sean Christopherson wrote: > If programming a "period" of 1 puts the host at risk in some way, then I agree > that this is unsafe and we need a different solution. IIRC if you put in -1 on a Nehalem, you end up with an NMI-storm which wasn't trivial to recover from if at all (it's too long ago and I don't have ancient hardware like that anymore :/) > But if the worst case > scenario is non-determinstic or odd behavior from the guest's perspective, then > that's the guest's problem (with the caveat that the guest might not have accurate > Family/Model/Stepping data to make informed decisions). Things like bdm_limit_period() will cause odd behaviour IIRC, it does daft things like generate extra PEBS records on overflow and gives otherwise daft results for PDIR. glc_limit_period() lacks a useful comment :/
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c index 40ad1425ffa2..f8a8a4ea4d47 100644 --- a/arch/x86/events/core.c +++ b/arch/x86/events/core.c @@ -1388,16 +1388,25 @@ int x86_perf_event_set_period(struct perf_event *event) hwc->last_period = period; ret = 1; } - /* - * Quirk: certain CPUs dont like it if just 1 hw_event is left: - */ - if (unlikely(left < 2)) - left = 2; if (left > x86_pmu.max_period) left = x86_pmu.max_period; - static_call_cond(x86_pmu_limit_period)(event, &left); + /* + * Exempt KVM guest events from the minimum period requirements. It's + * the guest's responsibility to ensure it can make forward progress, + * and it's KVM's responsibility to configure an appropriate "period" + * to correctly virtualize overflow for the guest's PMCs. + */ + if (!event->attr.exclude_host) { + /* + * Quirk: certain CPUs dont like it if just 1 event is left: + */ + if (unlikely(left < 2)) + left = 2; + + static_call_cond(x86_pmu_limit_period)(event, &left); + } this_cpu_write(pmc_prev_left[idx], left);