Message ID: 20230727141428.962286-10-alexghiti@rivosinc.com
State: New
Headers:
From: Alexandre Ghiti <alexghiti@rivosinc.com>
To: Jonathan Corbet <corbet@lwn.net>, Peter Zijlstra <peterz@infradead.org>, Ingo Molnar <mingo@redhat.com>, Arnaldo Carvalho de Melo <acme@kernel.org>, Mark Rutland <mark.rutland@arm.com>, Alexander Shishkin <alexander.shishkin@linux.intel.com>, Jiri Olsa <jolsa@kernel.org>, Namhyung Kim <namhyung@kernel.org>, Ian Rogers <irogers@google.com>, Paul Walmsley <paul.walmsley@sifive.com>, Palmer Dabbelt <palmer@dabbelt.com>, Albert Ou <aou@eecs.berkeley.edu>, Atish Patra <atishp@atishpatra.org>, Anup Patel <anup@brainfault.org>, Will Deacon <will@kernel.org>, Rob Herring <robh@kernel.org>, Andrew Jones <ajones@ventanamicro.com>, Rémi Denis-Courmont <remi@remlab.net>, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, linux-riscv@lists.infradead.org, linux-arm-kernel@lists.infradead.org
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>, Atish Patra <atishp@rivosinc.com>
Subject: [PATCH v4 09/10] tools: lib: perf: Implement riscv mmap support
Date: Thu, 27 Jul 2023 16:14:27 +0200
Message-Id: <20230727141428.962286-10-alexghiti@rivosinc.com>
In-Reply-To: <20230727141428.962286-1-alexghiti@rivosinc.com>
Series: riscv: Allow userspace to directly access perf counters
Commit Message
Alexandre Ghiti
July 27, 2023, 2:14 p.m. UTC
riscv now supports mmapping hardware counters, so add what's needed to take advantage of that in libperf.

Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
---
 tools/lib/perf/mmap.c | 65 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 65 insertions(+)
Comments
On Thu, Jul 27, 2023 at 7:28 AM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
>
> riscv now supports mmaping hardware counters so add what's needed to
> take advantage of that in libperf.
>
> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
> Reviewed-by: Atish Patra <atishp@rivosinc.com>
> ---
>  tools/lib/perf/mmap.c | 65 +++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 65 insertions(+)
>
> diff --git a/tools/lib/perf/mmap.c b/tools/lib/perf/mmap.c
> index 0d1634cedf44..378a163f0554 100644
> --- a/tools/lib/perf/mmap.c
> +++ b/tools/lib/perf/mmap.c
> @@ -392,6 +392,71 @@ static u64 read_perf_counter(unsigned int counter)
>
>  static u64 read_timestamp(void) { return read_sysreg(cntvct_el0); }
>
> +#elif __riscv_xlen == 64

This is something of an odd guard, perhaps:

  #elif defined(__riscv) && __riscv_xlen == 64

That way it is more intention revealing that this is riscv code. Could
you add a comment relating to the __riscv_xlen?

> +
> +/* TODO: implement rv32 support */
> +
> +#define CSR_CYCLE	0xc00
> +#define CSR_TIME	0xc01
> +
> +#define csr_read(csr)					\
> +({							\
> +	register unsigned long __v;			\
> +	__asm__ __volatile__ ("csrr %0, " #csr		\
> +			      : "=r" (__v) :		\
> +			      : "memory");		\

To avoid the macro pasting that could potentially go weird, could this be:

  __asm__ __volatile__ ("csrr %0, %1",
			: "=r"(__v) /* outputs */
			: "i"(csr)  /* inputs */
			: "memory"  /* clobbers */)

Also, why is this clobbering memory? Worth adding a comment.

Thanks,
Ian

> +	__v;						\
> +})
> +
> +static unsigned long csr_read_num(int csr_num)
> +{
> +#define switchcase_csr_read(__csr_num, __val)	{\
> +	case __csr_num:					\
> +		__val = csr_read(__csr_num);		\
> +		break; }
> +#define switchcase_csr_read_2(__csr_num, __val)	{\
> +	switchcase_csr_read(__csr_num + 0, __val)	\
> +	switchcase_csr_read(__csr_num + 1, __val)}
> +#define switchcase_csr_read_4(__csr_num, __val)	{\
> +	switchcase_csr_read_2(__csr_num + 0, __val)	\
> +	switchcase_csr_read_2(__csr_num + 2, __val)}
> +#define switchcase_csr_read_8(__csr_num, __val)	{\
> +	switchcase_csr_read_4(__csr_num + 0, __val)	\
> +	switchcase_csr_read_4(__csr_num + 4, __val)}
> +#define switchcase_csr_read_16(__csr_num, __val)	{\
> +	switchcase_csr_read_8(__csr_num + 0, __val)	\
> +	switchcase_csr_read_8(__csr_num + 8, __val)}
> +#define switchcase_csr_read_32(__csr_num, __val)	{\
> +	switchcase_csr_read_16(__csr_num + 0, __val)	\
> +	switchcase_csr_read_16(__csr_num + 16, __val)}
> +
> +	unsigned long ret = 0;
> +
> +	switch (csr_num) {
> +	switchcase_csr_read_32(CSR_CYCLE, ret)
> +	default:
> +		break;
> +	}
> +
> +	return ret;
> +#undef switchcase_csr_read_32
> +#undef switchcase_csr_read_16
> +#undef switchcase_csr_read_8
> +#undef switchcase_csr_read_4
> +#undef switchcase_csr_read_2
> +#undef switchcase_csr_read
> +}
> +
> +static u64 read_perf_counter(unsigned int counter)
> +{
> +	return csr_read_num(CSR_CYCLE + counter);
> +}
> +
> +static u64 read_timestamp(void)
> +{
> +	return csr_read_num(CSR_TIME);
> +}
> +
>  #else
>  static u64 read_perf_counter(unsigned int counter __maybe_unused) { return 0; }
>  static u64 read_timestamp(void) { return 0; }
> --
> 2.39.2
>
Hi Ian,

On Fri, Jul 28, 2023 at 7:53 PM Ian Rogers <irogers@google.com> wrote:
>
> On Thu, Jul 27, 2023 at 7:28 AM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
> >
> > riscv now supports mmaping hardware counters so add what's needed to
> > take advantage of that in libperf.
> >
> [...]
> > +#elif __riscv_xlen == 64
>
> This is something of an odd guard, perhaps:
> #elif defined(__riscv) && __riscv_xlen == 64
>
> That way it is more intention revealing that this is riscv code. Could
> you add a comment relating to the __riscv_xlen ?

I guess Andrew answered that already.

> > +#define csr_read(csr)					\
> > +({							\
> > +	register unsigned long __v;			\
> > +	__asm__ __volatile__ ("csrr %0, " #csr		\
> > +			      : "=r" (__v) :		\
> > +			      : "memory");		\
>
> To avoid the macro pasting that could potentially go weird, could this be:
>
> __asm__ __volatile__ ("csrr %0, %1",
>			: "=r"(__v) /* outputs */
>			: "i"(csr)  /* inputs */
>			: "memory"  /* clobbers */)
>
> Also, why is this clobbering memory? Worth adding a comment.

No idea, I see that it is also done this way in
arch/riscv/include/asm/csr.h. @Atish Kumar Patra, @Palmer Dabbelt?

Thanks for your comments!

Alex

[remaining quote trimmed]
On Mon, Jul 31, 2023 at 12:15 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
>
> Hi Ian,
>
> On Fri, Jul 28, 2023 at 7:53 PM Ian Rogers <irogers@google.com> wrote:
> >
> [...]
> > > +#define csr_read(csr)					\
> > > +({							\
> > > +	register unsigned long __v;			\
> > > +	__asm__ __volatile__ ("csrr %0, " #csr		\
> > > +			      : "=r" (__v) :		\
> > > +			      : "memory");		\
> >
> > To avoid the macro pasting that could potentially go weird, could this be:
> >
> > __asm__ __volatile__ ("csrr %0, %1",
> >			: "=r"(__v) /* outputs */
> >			: "i"(csr)  /* inputs */
> >			: "memory"  /* clobbers */)

Forgot to answer this one: it compiles, but I have to admit that I
don't understand the difference and if that's correct (all macros in
arch/riscv/include/asm/csr.h use # to do this) and what benefits it
brings. Can you elaborate more on things that could "go weird"?

Thanks again,

Alex

> >
> > Also, why is this clobbering memory? Worth adding a comment.
>
> No idea, I see that it is also done this way in
> arch/riscv/include/asm/csr.h. @Atish Kumar Patra, @Palmer Dabbelt?
>
> Thanks for your comments!

[remaining quote trimmed]
On Mon, Jul 31, 2023 at 3:27 AM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
>
> On Mon, Jul 31, 2023 at 12:15 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
> >
> [...]
> > > > +#elif __riscv_xlen == 64
> > >
> > > This is something of an odd guard, perhaps:
> > > #elif defined(__riscv) && __riscv_xlen == 64
> > >
> > > That way it is more intention revealing that this is riscv code. Could
> > > you add a comment relating to the __riscv_xlen ?
> >
> > I guess Andrew answered that already.

Not sure. I still think it looks weird:

  ...
  #if defined(__i386__) || defined(__x86_64__)
  ...
  #elif defined(__aarch64__)
  ...
  #elif __riscv_xlen == 64
  ...
  #else
  static u64 read_perf_counter(unsigned int counter __maybe_unused) { return 0; }
  static u64 read_timestamp(void) { return 0; }
  #endif

The first two are clearly #ifdef-ing architecture specific assembly
code, under what conditions I get RISC-V code ¯\_(ツ)_/¯ At least worth
a comment like "csrr is only available when you have xlens of 64
because ..."

> [...]
> Forgot to answer this one: it compiles, but I have to admit that I
> don't understand the difference and if that's correct (all macros in
> arch/riscv/include/asm/csr.h use # to do this) and what benefits it
> brings. Can you elaborate more on things that could "go weird"?

So rather than use an input constraint for the asm block you are using
the C preprocessor to paste in the csr argument. If csr is something
like "1" then it looks good and you'll get "csrr %0, 1". If you pass
something like "1 << 31" then that will be pasted as "csrr %0, 1 << 31"
and that starts to get weird in the context of being in the assembler
where it is unlikely the C operators work. Using the input constraint
avoids this, causes the C compiler to check the type of the argument
and you'll probably get more intelligible error messages as a
consequence.

> > > Also, why is this clobbering memory? Worth adding a comment.
> >
> > No idea, I see that it is also done this way in
> > arch/riscv/include/asm/csr.h. @Atish Kumar Patra, @Palmer Dabbelt?

It would seem to make sense then not to have a memory constraint until
we know why we're doing it?

Thanks,
Ian

[remaining quote trimmed]
On Mon, Jul 31, 2023 at 5:10 PM Ian Rogers <irogers@google.com> wrote:
>
> [...]
> Not sure. I still think it looks weird:
> [...]
> The first two are clearly #ifdef-ing architecture specific assembly
> code, under what conditions I get RISC-V code ¯\_(ツ)_/¯ At least worth
> a comment like "csrr is only available when you have xlens of 64
> because ..."

__riscv_xlen indicates riscv64, which is the only target of this
patchset. But if you prefer, I don't mind adding back the
defined(__riscv) if I re-spin a new version.

> So rather than use an input constraint for the asm block you are using
> the C preprocessor to paste in the csr argument. If csr is something
> like "1" then it looks good and you'll get "csrr %0, 1". If you pass
> something like "1 << 31" then that will be pasted as "csrr %0, 1 << 31"
> and that starts to get weird in the context of being in the assembler
> where it is unlikely the C operators work. Using the input constraint
> avoids this, causes the C compiler to check the type of the argument
> and you'll probably get more intelligible error messages as a
> consequence.

Thanks. So if I'm not mistaken, in this exact context, given we only
use csr_read() through the csr_read_num() function, it seems ok right?

> > > > Also, why is this clobbering memory? Worth adding a comment.
> > >
> > > No idea, I see that it is also done this way in
> > > arch/riscv/include/asm/csr.h. @Atish Kumar Patra, @Palmer Dabbelt?
>
> It would seem to make sense then not to have a memory constraint until
> we know why we're doing it?

I have just had the answer internally (thanks to @Brendan Sweeney):
csr modifications can alter how the memory is accessed (satp which
changes the address space, sum which allows/disallows userspace
access...), so we need the memory barrier to make sure the compiler
does not reorder the memory accesses.

Thanks,

Alex

[remaining quote trimmed]
On Mon, Jul 31, 2023 at 9:06 AM Alexandre Ghiti <alexghiti@rivosinc.com> wrote: > > On Mon, Jul 31, 2023 at 5:10 PM Ian Rogers <irogers@google.com> wrote: > > > > On Mon, Jul 31, 2023 at 3:27 AM Alexandre Ghiti <alexghiti@rivosinc.com> wrote: > > > > > > On Mon, Jul 31, 2023 at 12:15 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote: > > > > > > > > Hi Ian, > > > > > > > > On Fri, Jul 28, 2023 at 7:53 PM Ian Rogers <irogers@google.com> wrote: > > > > > > > > > > On Thu, Jul 27, 2023 at 7:28 AM Alexandre Ghiti <alexghiti@rivosinc.com> wrote: > > > > > > > > > > > > riscv now supports mmaping hardware counters so add what's needed to > > > > > > take advantage of that in libperf. > > > > > > > > > > > > Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> > > > > > > Reviewed-by: Andrew Jones <ajones@ventanamicro.com> > > > > > > Reviewed-by: Atish Patra <atishp@rivosinc.com> > > > > > > --- > > > > > > tools/lib/perf/mmap.c | 65 +++++++++++++++++++++++++++++++++++++++++++ > > > > > > 1 file changed, 65 insertions(+) > > > > > > > > > > > > diff --git a/tools/lib/perf/mmap.c b/tools/lib/perf/mmap.c > > > > > > index 0d1634cedf44..378a163f0554 100644 > > > > > > --- a/tools/lib/perf/mmap.c > > > > > > +++ b/tools/lib/perf/mmap.c > > > > > > @@ -392,6 +392,71 @@ static u64 read_perf_counter(unsigned int counter) > > > > > > > > > > > > static u64 read_timestamp(void) { return read_sysreg(cntvct_el0); } > > > > > > > > > > > > +#elif __riscv_xlen == 64 > > > > > > > > > > This is something of an odd guard, perhaps: > > > > > #elif defined(__riscv) && __riscv_xlen == 64 > > > > > > > > > > That way it is more intention revealing that this is riscv code. Could > > > > > you add a comment relating to the __riscv_xlen ? > > > > > > > > I guess Andrew answered that already. > > > > > > > > Not sure. I still think it looks weird: > > ... > > #if defined(__i386__) || defined(__x86_64__) > > ... > > #elif defined(__aarch64__) > > ... > > #elif __riscv_xlen == 64 > > ... 
> > #else > > static u64 read_perf_counter(unsigned int counter __maybe_unused) { return 0; } > > static u64 read_timestamp(void) { return 0; } > > #endif > > > > The first two are clearly #ifdef-ing architecture specific assembly > > code, under what conditions I get RISC-V code ¯\(ツ)/¯ At least worth > > a comment like "csrr is only available when you have xlens of 64 > > because ..." > > __riscv_xlen indicates riscv64, which is the only target of this > patchset. But if you prefer, I don't mind adding back the > defined(__riscv) if I re-spin a new version. This kind of begs the question as to why there is no __riscv64 ifdef. The issue with xlen is it isn't intention revealing so for regular people trying to understand the code it would be nice to document it. > > > > > > > > > > > > > + > > > > > > +/* TODO: implement rv32 support */ > > > > > > + > > > > > > +#define CSR_CYCLE 0xc00 > > > > > > +#define CSR_TIME 0xc01 > > > > > > + > > > > > > +#define csr_read(csr) \ > > > > > > +({ \ > > > > > > + register unsigned long __v; \ > > > > > > + __asm__ __volatile__ ("csrr %0, " #csr \ > > > > > > + : "=r" (__v) : \ > > > > > > + : "memory"); \ > > > > > > > > > > To avoid the macro pasting that could potentially go weird, could this be: > > > > > > > > > > __asm__ __volatile__ ("csrr %0, %1", > > > > > : "=r"(__v) /* outputs */ > > > > > : "i"(csr) /* inputs */ > > > > > : "memory" /* clobbers */) > > > > > > Forgot to answer this one: it compiles, but I have to admit that I > > > don't understand the difference and if that's correct (all macros in > > > arch/riscv/include/asm/csr.h use # to do this) and what benefits it > > > brings. Can you elaborate more on things that could "go weird"? > > > > So rather than use an input constraint for the asm block you are using > > the C preprocessor to paste in the csr argument. If csr is something > > like "1" then it looks good and you'll get "csrr %0,1". 
If you pass > > something like "1 << 31" then that will be pasted as "csrr %0, 1 << > > 31" and that starts to get weird in the context of being in the > > assembler where it is unlikely the C operators work. Using the input > > constraint avoids this, causes the C compiler to check the type of the > > argument and you'll probably get more intelligible error messages as a > > consequence. > > > > Thanks. So if I'm not mistaken, in this exact context, given we only > use csr_read() through the csr_read_num() function, it seems ok right? So you've formed a cargo cult and the justification is not wanting to stop a copy-paste chain from somewhere else. This code itself will be copy-pasted and we go to some ends to encourage that by placing parts of it in include/uapi/linux/perf_event.h. It seems better to catch this issue early rather than propagate it. > > > > > > Thanks again, > > > > > > Alex > > > > > > > > > > > > > Also, why is this clobbering memory? Worth adding a comment. > > > > > > > > No idea, I see that it is also done this way in > > > > arch/riscv/include/asm/csr.h. @Atish Kumar Patra , @Palmer Dabbelt ? > > > > It would seem to make sense then not to have a memory constraint until > > we know why we're doing it? > > > > I have just had the answer internally (thanks to @Brendan Sweeney): > csr modifications can alter how the memory is accessed (satp which > changes the address space, sum which allows/disallows userspace > access...), so we need the memory barrier to make sure the compiler > does not reorder the memory accesses. The conditions you mention shouldn't apply here though? Even if you add a memory barrier for the compiler what is stopping the hardware reordering loads and stores? If it absolutely has to be there then a comment something like "There is a bug in riscv where the csrr instruction can clobber memory breaking future reads and somehow this constraint fixes it by ... ". 
Thanks, Ian > Thanks, > > Alex > > > Thanks, > > Ian > > > > > > > > > > Thanks for your comments! > > > > > > > > Alex > > > > > > > > > > > > > > Thanks, > > > > > Ian > > > > > > > > > > > + __v; \ > > > > > > +}) > > > > > > + > > > > > > +static unsigned long csr_read_num(int csr_num) > > > > > > +{ > > > > > > +#define switchcase_csr_read(__csr_num, __val) {\ > > > > > > + case __csr_num: \ > > > > > > + __val = csr_read(__csr_num); \ > > > > > > + break; } > > > > > > +#define switchcase_csr_read_2(__csr_num, __val) {\ > > > > > > + switchcase_csr_read(__csr_num + 0, __val) \ > > > > > > + switchcase_csr_read(__csr_num + 1, __val)} > > > > > > +#define switchcase_csr_read_4(__csr_num, __val) {\ > > > > > > + switchcase_csr_read_2(__csr_num + 0, __val) \ > > > > > > + switchcase_csr_read_2(__csr_num + 2, __val)} > > > > > > +#define switchcase_csr_read_8(__csr_num, __val) {\ > > > > > > + switchcase_csr_read_4(__csr_num + 0, __val) \ > > > > > > + switchcase_csr_read_4(__csr_num + 4, __val)} > > > > > > +#define switchcase_csr_read_16(__csr_num, __val) {\ > > > > > > + switchcase_csr_read_8(__csr_num + 0, __val) \ > > > > > > + switchcase_csr_read_8(__csr_num + 8, __val)} > > > > > > +#define switchcase_csr_read_32(__csr_num, __val) {\ > > > > > > + switchcase_csr_read_16(__csr_num + 0, __val) \ > > > > > > + switchcase_csr_read_16(__csr_num + 16, __val)} > > > > > > + > > > > > > + unsigned long ret = 0; > > > > > > + > > > > > > + switch (csr_num) { > > > > > > + switchcase_csr_read_32(CSR_CYCLE, ret) > > > > > > + default: > > > > > > + break; > > > > > > + } > > > > > > + > > > > > > + return ret; > > > > > > +#undef switchcase_csr_read_32 > > > > > > +#undef switchcase_csr_read_16 > > > > > > +#undef switchcase_csr_read_8 > > > > > > +#undef switchcase_csr_read_4 > > > > > > +#undef switchcase_csr_read_2 > > > > > > +#undef switchcase_csr_read > > > > > > +} > > > > > > + > > > > > > +static u64 read_perf_counter(unsigned int counter) > > > > > > +{ > > 
> > > > + return csr_read_num(CSR_CYCLE + counter); > > > > > > +} > > > > > > + > > > > > > +static u64 read_timestamp(void) > > > > > > +{ > > > > > > + return csr_read_num(CSR_TIME); > > > > > > +} > > > > > > + > > > > > > #else > > > > > > static u64 read_perf_counter(unsigned int counter __maybe_unused) { return 0; } > > > > > > static u64 read_timestamp(void) { return 0; } > > > > > > -- > > > > > > 2.39.2 > > > > > >
On 31 Jul 2023, at 17:06, Alexandre Ghiti <alexghiti@rivosinc.com> wrote: > > On Mon, Jul 31, 2023 at 5:10 PM Ian Rogers <irogers@google.com> wrote: >> >> On Mon, Jul 31, 2023 at 3:27 AM Alexandre Ghiti <alexghiti@rivosinc.com> wrote: >>> >>> On Mon, Jul 31, 2023 at 12:15 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote: >>>> >>>> Hi Ian, >>>> >>>> On Fri, Jul 28, 2023 at 7:53 PM Ian Rogers <irogers@google.com> wrote: >>>>> >>>>> On Thu, Jul 27, 2023 at 7:28 AM Alexandre Ghiti <alexghiti@rivosinc.com> wrote: >>>>>> >>>>>> riscv now supports mmaping hardware counters so add what's needed to >>>>>> take advantage of that in libperf. >>>>>> >>>>>> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> >>>>>> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> >>>>>> Reviewed-by: Atish Patra <atishp@rivosinc.com> >>>>>> --- >>>>>> tools/lib/perf/mmap.c | 65 +++++++++++++++++++++++++++++++++++++++++++ >>>>>> 1 file changed, 65 insertions(+) >>>>>> >>>>>> diff --git a/tools/lib/perf/mmap.c b/tools/lib/perf/mmap.c >>>>>> index 0d1634cedf44..378a163f0554 100644 >>>>>> --- a/tools/lib/perf/mmap.c >>>>>> +++ b/tools/lib/perf/mmap.c >>>>>> @@ -392,6 +392,71 @@ static u64 read_perf_counter(unsigned int counter) >>>>>> >>>>>> static u64 read_timestamp(void) { return read_sysreg(cntvct_el0); } >>>>>> >>>>>> +#elif __riscv_xlen == 64 >>>>> >>>>> This is something of an odd guard, perhaps: >>>>> #elif defined(__riscv) && __riscv_xlen == 64 >>>>> >>>>> That way it is more intention revealing that this is riscv code. Could >>>>> you add a comment relating to the __riscv_xlen ? >>>> >>>> I guess Andrew answered that already. >>>> >> >> Not sure. I still think it looks weird: >> ... >> #if defined(__i386__) || defined(__x86_64__) >> ... >> #elif defined(__aarch64__) >> ... >> #elif __riscv_xlen == 64 >> ... 
>> #else >> static u64 read_perf_counter(unsigned int counter __maybe_unused) { return 0; } >> static u64 read_timestamp(void) { return 0; } >> #endif >> >> The first two are clearly #ifdef-ing architecture specific assembly >> code, under what conditions I get RISC-V code ¯\(ツ)/¯ At least worth >> a comment like "csrr is only available when you have xlens of 64 >> because ..." > > __riscv_xlen indicates riscv64, which is the only target of this > patchset. But if you prefer, I don't mind adding back the > defined(__riscv) if I re-spin a new version. I mean, -Wundef is a thing that will scream at you if you evaluate a macro that’s undefined and get an implicit 0, and parts of the linux build seem to enable it. So I would strongly recommend against (ab)using that feature of the preprocessor, especially when it aligns with greater clarity. Jess
On Mon, Jul 31, 2023 at 09:46:07AM -0700, Ian Rogers wrote: > On Mon, Jul 31, 2023 at 9:06 AM Alexandre Ghiti <alexghiti@rivosinc.com> wrote: > > I have just had the answer internally (thanks to @Brendan Sweeney): > > csr modifications can alter how the memory is accessed (satp which > > changes the address space, sum which allows/disallows userspace > > access...), so we need the memory barrier to make sure the compiler > > does not reorder the memory accesses. > > The conditions you mention shouldn't apply here though? Even if you > add a memory barrier for the compiler what is stopping the hardware > reordering loads and stores? If it absolutely has to be there then a > comment something like "There is a bug is riscv where the csrr > instruction can clobber memory breaking future reads and some how this > constraint fixes it by ... ". If the hardware doesn't know that writing to a csr can change how memory accesses are done and reorders memory accesses around that csr write, you've got a really broken piece of hardware on your hands ... I know nothing about risc-v, and maybe the definition says that you need to put a memory barrier before and/or after it in the instruction stream, but on all hardware I'm familiar with, writing to a CSR is an implicitly serialising operation.
On 31 Jul 2023, at 20:47, Matthew Wilcox <willy@infradead.org> wrote: > > On Mon, Jul 31, 2023 at 09:46:07AM -0700, Ian Rogers wrote: >> On Mon, Jul 31, 2023 at 9:06 AM Alexandre Ghiti <alexghiti@rivosinc.com> wrote: >>> I have just had the answer internally (thanks to @Brendan Sweeney): >>> csr modifications can alter how the memory is accessed (satp which >>> changes the address space, sum which allows/disallows userspace >>> access...), so we need the memory barrier to make sure the compiler >>> does not reorder the memory accesses. >> >> The conditions you mention shouldn't apply here though? Even if you >> add a memory barrier for the compiler what is stopping the hardware >> reordering loads and stores? If it absolutely has to be there then a >> comment something like "There is a bug is riscv where the csrr >> instruction can clobber memory breaking future reads and some how this >> constraint fixes it by ... ". > > If the hardware doesn't know that writing to a csr can change how memory > accesses are done and reorders memory accesses around that csr write, > you've got a really broken piece of hardware on your hands ... > > I know nothing about risc-v, and maybe the definition says that you need > to put a memory barrier before and/or after it in the instruction stream, > but on all hardware I'm familiar with, writing to a CSR is an implicitly > serialising operation. satp has some special rules due to the interaction with TLBs. Enabling / disabling translation by toggling between Bare and non-Bare modes doesn’t require any kind of fence, nor does changing the ASID (if not recycling), but any other changes to satp (changing between Sv39/Sv48/Sv57 or changing the page table base address) do require fences (not really sure why to be honest, maybe something about having separate kernel and user page tables?..). 
§5.2.1 Supervisor Memory-Management Fence Instruction of the privileged spec (the one I’m looking at is version 20211203 but dated October 4, 2022) details this. Jess
On Mon, Jul 31, 2023 at 6:46 PM Ian Rogers <irogers@google.com> wrote: > > On Mon, Jul 31, 2023 at 9:06 AM Alexandre Ghiti <alexghiti@rivosinc.com> wrote: > > > > On Mon, Jul 31, 2023 at 5:10 PM Ian Rogers <irogers@google.com> wrote: > > > > > > On Mon, Jul 31, 2023 at 3:27 AM Alexandre Ghiti <alexghiti@rivosinc.com> wrote: > > > > > > > > On Mon, Jul 31, 2023 at 12:15 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote: > > > > > > > > > > Hi Ian, > > > > > > > > > > On Fri, Jul 28, 2023 at 7:53 PM Ian Rogers <irogers@google.com> wrote: > > > > > > > > > > > > On Thu, Jul 27, 2023 at 7:28 AM Alexandre Ghiti <alexghiti@rivosinc.com> wrote: > > > > > > > > > > > > > > riscv now supports mmaping hardware counters so add what's needed to > > > > > > > take advantage of that in libperf. > > > > > > > > > > > > > > Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> > > > > > > > Reviewed-by: Andrew Jones <ajones@ventanamicro.com> > > > > > > > Reviewed-by: Atish Patra <atishp@rivosinc.com> > > > > > > > --- > > > > > > > tools/lib/perf/mmap.c | 65 +++++++++++++++++++++++++++++++++++++++++++ > > > > > > > 1 file changed, 65 insertions(+) > > > > > > > > > > > > > > diff --git a/tools/lib/perf/mmap.c b/tools/lib/perf/mmap.c > > > > > > > index 0d1634cedf44..378a163f0554 100644 > > > > > > > --- a/tools/lib/perf/mmap.c > > > > > > > +++ b/tools/lib/perf/mmap.c > > > > > > > @@ -392,6 +392,71 @@ static u64 read_perf_counter(unsigned int counter) > > > > > > > > > > > > > > static u64 read_timestamp(void) { return read_sysreg(cntvct_el0); } > > > > > > > > > > > > > > +#elif __riscv_xlen == 64 > > > > > > > > > > > > This is something of an odd guard, perhaps: > > > > > > #elif defined(__riscv) && __riscv_xlen == 64 > > > > > > > > > > > > That way it is more intention revealing that this is riscv code. Could > > > > > > you add a comment relating to the __riscv_xlen ? > > > > > > > > > > I guess Andrew answered that already. > > > > > > > > > > > Not sure. 
I still think it looks weird: > > > ... > > > #if defined(__i386__) || defined(__x86_64__) > > > ... > > > #elif defined(__aarch64__) > > > ... > > > #elif __riscv_xlen == 64 > > > ... > > > #else > > > static u64 read_perf_counter(unsigned int counter __maybe_unused) { return 0; } > > > static u64 read_timestamp(void) { return 0; } > > > #endif > > > > > > The first two are clearly #ifdef-ing architecture specific assembly > > > code, under what conditions I get RISC-V code ¯\(ツ)/¯ At least worth > > > a comment like "csrr is only available when you have xlens of 64 > > > because ..." > > > > __riscv_xlen indicates riscv64, which is the only target of this > > patchset. But if you prefer, I don't mind adding back the > > defined(__riscv) if I re-spin a new version. > > This kind of begs the question as to why there is no __riscv64 ifdef. > The issue with xlen is it isn't intention revealing so for regular > people trying to understand the code it would be nice to document it. I understand, I'll add the defined(__riscv) and a comment to explain the __riscv_xlen. 
> > > > > > > > > > > > > > > > > + > > > > > > > +/* TODO: implement rv32 support */ > > > > > > > + > > > > > > > +#define CSR_CYCLE 0xc00 > > > > > > > +#define CSR_TIME 0xc01 > > > > > > > + > > > > > > > +#define csr_read(csr) \ > > > > > > > +({ \ > > > > > > > + register unsigned long __v; \ > > > > > > > + __asm__ __volatile__ ("csrr %0, " #csr \ > > > > > > > + : "=r" (__v) : \ > > > > > > > + : "memory"); \ > > > > > > > > > > > > To avoid the macro pasting that could potentially go weird, could this be: > > > > > > > > > > > > __asm__ __volatile__ ("csrr %0, %1", > > > > > > : "=r"(__v) /* outputs */ > > > > > > : "i"(csr) /* inputs */ > > > > > > : "memory" /* clobbers */) > > > > > > > > Forgot to answer this one: it compiles, but I have to admit that I > > > > don't understand the difference and if that's correct (all macros in > > > > arch/riscv/include/asm/csr.h use # to do this) and what benefits it > > > > brings. Can you elaborate more on things that could "go weird"? > > > > > > So rather than use an input constraint for the asm block you are using > > > the C preprocessor to paste in the csr argument. If csr is something > > > like "1" then it looks good and you'll get "csrr %0,1". If you pass > > > something like "1 << 31" then that will be pasted as "csrr %0, 1 << > > > 31" and that starts to get weird in the context of being in the > > > assembler where it is unlikely the C operators work. Using the input > > > constraint avoids this, causes the C compiler to check the type of the > > > argument and you'll probably get more intelligible error messages as a > > > consequence. > > > > > > > Thanks. So if I'm not mistaken, in this exact context, given we only > > use csr_read() through the csr_read_num() function, it seems ok right? > > So you've formed a cargo cult and the justification is not wanting to > stop a copy-paste chain from somewhere else. 
This code itself will be > copy-pasted and we go to some ends to encourage that by placing parts > of it in include/uapi/linux/perf_event.h. It seems better to catch > this issue early rather than propagate it. OK, nothing to argue here, I'll use your first proposition in the next version. > > > > > > > > > Thanks again, > > > > > > > > Alex > > > > > > > > > > > > > > > > Also, why is this clobbering memory? Worth adding a comment. > > > > > > > > > > No idea, I see that it is also done this way in > > > > > arch/riscv/include/asm/csr.h. @Atish Kumar Patra , @Palmer Dabbelt ? > > > > > > It would seem to make sense then not to have a memory constraint until > > > we know why we're doing it? > > > > > > > I have just had the answer internally (thanks to @Brendan Sweeney): > > csr modifications can alter how the memory is accessed (satp which > > changes the address space, sum which allows/disallows userspace > > access...), so we need the memory barrier to make sure the compiler > > does not reorder the memory accesses. > > The conditions you mention shouldn't apply here though? Even if you > add a memory barrier for the compiler what is stopping the hardware > reordering loads and stores? If it absolutely has to be there then a > comment something like "There is a bug is riscv where the csrr > instruction can clobber memory breaking future reads and some how this > constraint fixes it by ... ". You're right, it does not apply here, so I can remove this memory barrier. 
Regarding the hardware that could speculate across a csr instruction, that's dealt with in the riscv spec: "In particular, a CSR access is performed after the execution of any prior instructions in program order whose behavior modifies or is modified by the CSR state and before the execution of any subsequent instructions in program order whose behavior modifies or is modified by the CSR state" Thanks for your comments, Alex > > Thanks, > Ian > > > Thanks, > > > > Alex > > > > > Thanks, > > > Ian > > > > > > > > > > > > > Thanks for your comments! > > > > > > > > > > Alex > > > > > > > > > > > > > > > > > Thanks, > > > > > > Ian > > > > > > > > > > > > > + __v; \ > > > > > > > +}) > > > > > > > + > > > > > > > +static unsigned long csr_read_num(int csr_num) > > > > > > > +{ > > > > > > > +#define switchcase_csr_read(__csr_num, __val) {\ > > > > > > > + case __csr_num: \ > > > > > > > + __val = csr_read(__csr_num); \ > > > > > > > + break; } > > > > > > > +#define switchcase_csr_read_2(__csr_num, __val) {\ > > > > > > > + switchcase_csr_read(__csr_num + 0, __val) \ > > > > > > > + switchcase_csr_read(__csr_num + 1, __val)} > > > > > > > +#define switchcase_csr_read_4(__csr_num, __val) {\ > > > > > > > + switchcase_csr_read_2(__csr_num + 0, __val) \ > > > > > > > + switchcase_csr_read_2(__csr_num + 2, __val)} > > > > > > > +#define switchcase_csr_read_8(__csr_num, __val) {\ > > > > > > > + switchcase_csr_read_4(__csr_num + 0, __val) \ > > > > > > > + switchcase_csr_read_4(__csr_num + 4, __val)} > > > > > > > +#define switchcase_csr_read_16(__csr_num, __val) {\ > > > > > > > + switchcase_csr_read_8(__csr_num + 0, __val) \ > > > > > > > + switchcase_csr_read_8(__csr_num + 8, __val)} > > > > > > > +#define switchcase_csr_read_32(__csr_num, __val) {\ > > > > > > > + switchcase_csr_read_16(__csr_num + 0, __val) \ > > > > > > > + switchcase_csr_read_16(__csr_num + 16, __val)} > > > > > > > + > > > > > > > + unsigned long ret = 0; > > > > > > > + > > > > > > > + switch 
(csr_num) { > > > > > > > + switchcase_csr_read_32(CSR_CYCLE, ret) > > > > > > > + default: > > > > > > > + break; > > > > > > > + } > > > > > > > + > > > > > > > + return ret; > > > > > > > +#undef switchcase_csr_read_32 > > > > > > > +#undef switchcase_csr_read_16 > > > > > > > +#undef switchcase_csr_read_8 > > > > > > > +#undef switchcase_csr_read_4 > > > > > > > +#undef switchcase_csr_read_2 > > > > > > > +#undef switchcase_csr_read > > > > > > > +} > > > > > > > + > > > > > > > +static u64 read_perf_counter(unsigned int counter) > > > > > > > +{ > > > > > > > + return csr_read_num(CSR_CYCLE + counter); > > > > > > > +} > > > > > > > + > > > > > > > +static u64 read_timestamp(void) > > > > > > > +{ > > > > > > > + return csr_read_num(CSR_TIME); > > > > > > > +} > > > > > > > + > > > > > > > #else > > > > > > > static u64 read_perf_counter(unsigned int counter __maybe_unused) { return 0; } > > > > > > > static u64 read_timestamp(void) { return 0; } > > > > > > > -- > > > > > > > 2.39.2 > > > > > > >
diff --git a/tools/lib/perf/mmap.c b/tools/lib/perf/mmap.c index 0d1634cedf44..378a163f0554 100644 --- a/tools/lib/perf/mmap.c +++ b/tools/lib/perf/mmap.c @@ -392,6 +392,71 @@ static u64 read_perf_counter(unsigned int counter) static u64 read_timestamp(void) { return read_sysreg(cntvct_el0); } +#elif __riscv_xlen == 64 + +/* TODO: implement rv32 support */ + +#define CSR_CYCLE 0xc00 +#define CSR_TIME 0xc01 + +#define csr_read(csr) \ +({ \ + register unsigned long __v; \ + __asm__ __volatile__ ("csrr %0, " #csr \ + : "=r" (__v) : \ + : "memory"); \ + __v; \ +}) + +static unsigned long csr_read_num(int csr_num) +{ +#define switchcase_csr_read(__csr_num, __val) {\ + case __csr_num: \ + __val = csr_read(__csr_num); \ + break; } +#define switchcase_csr_read_2(__csr_num, __val) {\ + switchcase_csr_read(__csr_num + 0, __val) \ + switchcase_csr_read(__csr_num + 1, __val)} +#define switchcase_csr_read_4(__csr_num, __val) {\ + switchcase_csr_read_2(__csr_num + 0, __val) \ + switchcase_csr_read_2(__csr_num + 2, __val)} +#define switchcase_csr_read_8(__csr_num, __val) {\ + switchcase_csr_read_4(__csr_num + 0, __val) \ + switchcase_csr_read_4(__csr_num + 4, __val)} +#define switchcase_csr_read_16(__csr_num, __val) {\ + switchcase_csr_read_8(__csr_num + 0, __val) \ + switchcase_csr_read_8(__csr_num + 8, __val)} +#define switchcase_csr_read_32(__csr_num, __val) {\ + switchcase_csr_read_16(__csr_num + 0, __val) \ + switchcase_csr_read_16(__csr_num + 16, __val)} + + unsigned long ret = 0; + + switch (csr_num) { + switchcase_csr_read_32(CSR_CYCLE, ret) + default: + break; + } + + return ret; +#undef switchcase_csr_read_32 +#undef switchcase_csr_read_16 +#undef switchcase_csr_read_8 +#undef switchcase_csr_read_4 +#undef switchcase_csr_read_2 +#undef switchcase_csr_read +} + +static u64 read_perf_counter(unsigned int counter) +{ + return csr_read_num(CSR_CYCLE + counter); +} + +static u64 read_timestamp(void) +{ + return csr_read_num(CSR_TIME); +} + #else static u64 
read_perf_counter(unsigned int counter __maybe_unused) { return 0; } static u64 read_timestamp(void) { return 0; }