Message ID: 20231023082911.23242-1-luxu.kernel@bytedance.com
Headers
From: Xu Lu <luxu.kernel@bytedance.com>
To: paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu,
    tglx@linutronix.de, maz@kernel.org, anup@brainfault.org,
    atishp@atishpatra.org
Cc: dengliang.1214@bytedance.com, liyu.yukiteru@bytedance.com,
    sunjiadong.lff@bytedance.com, xieyongji@bytedance.com,
    lihangjing@bytedance.com, chaiwen.cc@bytedance.com,
    linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org,
    Xu Lu <luxu.kernel@bytedance.com>
Subject: [RFC 00/12] riscv: Introduce Pseudo NMI
Date: Mon, 23 Oct 2023 16:28:59 +0800
Message-Id: <20231023082911.23242-1-luxu.kernel@bytedance.com>
Series
riscv: Introduce Pseudo NMI
Message
Xu Lu
Oct. 23, 2023, 8:28 a.m. UTC
Sorry to resend this patch series; I forgot to Cc the open list before.
Below is the formal content.

The existing RISC-V kernel lacks an NMI mechanism, as there is still no
ratified resumable NMI extension in the RISC-V community, which cannot
satisfy scenarios like high-precision perf sampling. There is an incoming
hardware extension called Smrnmi which supports resumable NMIs by
providing new control registers to save status when an NMI happens.
However, it is still a draft, and it requires privilege level switches
for the kernel to utilize it, as NMIs are automatically trapped into
machine mode.

This patch series introduces a software pseudo NMI mechanism for RISC-V.
The existing RISC-V kernel disables interrupts via the per-cpu control
register CSR_STATUS, whose SIE bit controls the enablement of all
interrupts on the whole cpu. When the SIE bit is clear, no interrupt is
enabled. This patch series implements NMIs by switching interrupt
disabling to another per-cpu control register, CSR_IE. This register
controls the enablement of each separate interrupt. Each bit of CSR_IE
corresponds to a single major interrupt, and a clear bit means the
corresponding interrupt is disabled.

To implement pseudo NMIs, we switch to CSR_IE masking when disabling
irqs. When interrupts are disabled, all bits of CSR_IE corresponding to
normal interrupts are cleared, while bits corresponding to NMIs are kept
set. The SIE bit of CSR_STATUS is now untouched and always kept set.

We measured the performance of the Pseudo NMI patches based on v6.6-rc4
on a SiFive FU740 SoC with hackbench as our benchmark. The result shows
a 1.90% performance degradation.

"hackbench 200 process 1000" (average over 10 runs)
+-----------+----------+------------+
|           | v6.6-rc4 | Pseudo NMI |
+-----------+----------+------------+
| time      | 251.646s | 256.416s   |
+-----------+----------+------------+

The overhead mainly comes from two parts:

1. Saving and restoring the CSR_IE register during kernel entry/return.
   This part introduces about 0.57% performance overhead.

2. The extra instructions introduced by 'irqs_enabled_ie'. It is a
   special value representing the normal CSR_IE when irqs are enabled.
   It is implemented via ALTERNATIVE to adapt to platforms without a
   PMU. This part introduces about 1.32% performance overhead.

Limits:

CSR_IE is now used for disabling irqs, and no other code should touch
this register, to avoid corrupting the irq status. This means we do not
support masking a single interrupt for now.

We have tried to fix this by introducing a per-cpu variable to save the
CSR_IE value when disabling irqs. All operations on CSR_IE would then be
redirected to this variable, and CSR_IE's value would be restored from
it when enabling irqs. Obviously this method introduces extra memory
accesses in the hot code path.

TODO:

1. The adaptation to the hypervisor extension is ongoing.

2. The adaptation to the advanced interrupt architecture is ongoing.

This version of Pseudo NMI is rebased on v6.6-rc7.

Thanks in advance for comments.
Xu Lu (12):
  riscv: Introduce CONFIG_RISCV_PSEUDO_NMI
  riscv: Make CSR_IE register part of context
  riscv: Switch to CSR_IE masking when disabling irqs
  riscv: Switch back to CSR_STATUS masking when going idle
  riscv: kvm: Switch back to CSR_STATUS masking when entering guest
  riscv: Allow requesting irq as pseudo NMI
  riscv: Handle pseudo NMI in arch irq handler
  riscv: Enable NMIs during irqs disabled context
  riscv: Enable NMIs during exceptions
  riscv: Enable NMIs during interrupt handling
  riscv: Request pmu overflow interrupt as NMI
  riscv: Enable CONFIG_RISCV_PSEUDO_NMI in default

 arch/riscv/Kconfig                 | 10 ++++
 arch/riscv/include/asm/csr.h       | 17 ++++++
 arch/riscv/include/asm/irqflags.h  | 91 ++++++++++++++++++++++++++++++
 arch/riscv/include/asm/processor.h |  4 ++
 arch/riscv/include/asm/ptrace.h    |  7 +++
 arch/riscv/include/asm/switch_to.h |  7 +++
 arch/riscv/kernel/asm-offsets.c    |  3 +
 arch/riscv/kernel/entry.S          | 18 ++++++
 arch/riscv/kernel/head.S           | 10 ++++
 arch/riscv/kernel/irq.c            | 17 ++++++
 arch/riscv/kernel/process.c        |  6 ++
 arch/riscv/kernel/suspend_entry.S  |  1 +
 arch/riscv/kernel/traps.c          | 54 ++++++++++++++----
 arch/riscv/kvm/vcpu.c              | 18 ++++--
 drivers/clocksource/timer-clint.c  |  4 ++
 drivers/clocksource/timer-riscv.c  |  4 ++
 drivers/irqchip/irq-riscv-intc.c   | 66 ++++++++++++++++++++++
 drivers/perf/riscv_pmu_sbi.c       | 21 ++++++-
 18 files changed, 340 insertions(+), 18 deletions(-)
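The CSR_IE masking scheme described in the cover letter can be sketched
as a user-space simulation. Bit positions and the 'irqs_enabled_ie' mask
below are illustrative assumptions, not the kernel's actual values, and
the real implementation operates on the hardware CSR rather than a
variable:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative RISC-V major interrupt bit positions:
 * bit 5 = supervisor timer, bit 9 = supervisor external,
 * bit 13 = PMU overflow (treated as the pseudo NMI here). */
#define IE_STIE (1UL << 5)
#define IE_SEIE (1UL << 9)
#define IE_PMU  (1UL << 13)

#define ALLOWED_IE (IE_STIE | IE_SEIE | IE_PMU) /* 'irqs_enabled_ie' */
#define NMI_IE     (IE_PMU)                     /* bits never masked */

static uint64_t csr_ie = ALLOWED_IE; /* stand-in for the real CSR_IE */

/* Disable "normal" irqs: clear every CSR_IE bit except the NMI bits.
 * CSR_STATUS.SIE stays set throughout, so NMI sources still fire. */
static uint64_t local_irq_save(void)
{
    uint64_t flags = csr_ie;

    csr_ie = NMI_IE;
    return flags;
}

/* Re-enable irqs by restoring the saved CSR_IE image. */
static void local_irq_restore(uint64_t flags)
{
    csr_ie = flags;
}
```

A driver requesting its interrupt as a pseudo NMI would, in this model,
simply have its bit included in NMI_IE so that local_irq_save() never
clears it.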
Comments
On Mon, Oct 23, 2023 at 1:29 AM Xu Lu <luxu.kernel@bytedance.com> wrote:
>
> [... cover letter quoted: pseudo NMI design, hackbench result, and
> overhead breakdown ...]

We had an evaluation of this approach earlier this year and concluded
with similar findings. The pseudo NMI is only useful for the profiling
use case, which doesn't run all the time in the system. Adding cost to
the hot path and sacrificing performance everywhere for the sake of
performance profiling is not desirable at all.

That's why an SBI extension, Supervisor Software Events (SSE), is under
development:

https://lists.riscv.org/g/tech-prs/message/515

Instead of selectively disabling interrupts, SSE takes an orthogonal
approach where M-mode invokes a special trap handler. That special
handler invokes the driver-specific handler registered by the driver
(i.e. the perf driver). This covers both firmware-first RAS and perf
use cases.

The above version of the specification is a bit out of date and the
revised version will be sent soon. Clement (cc'd) has also done a PoC
of SSE and a perf driver using the SSE framework. This resulted in an
actual saving in performance for RAS/perf without sacrificing normal
performance.

Clement is planning to send the series soon with more details.

> [... remainder of the cover letter quoted: limits, TODO, patch list,
> and diffstat ...]

--
Regards,
Atish
On Thu, Oct 26, 2023 at 7:02 AM Atish Patra <atishp@atishpatra.org> wrote:
>
> On Mon, Oct 23, 2023 at 1:29 AM Xu Lu <luxu.kernel@bytedance.com> wrote:
> >
> > [... cover letter quoted ...]
>
> We had an evaluation of this approach earlier this year and concluded
> with similar findings. The pseudo NMI is only useful for the profiling
> use case, which doesn't run all the time in the system. Adding cost to
> the hot path and sacrificing performance everywhere for the sake of
> performance profiling is not desirable at all.

Thanks a lot for your reply!

First, please allow me to explain that CSR_IE pseudo NMI can actually
support more than PMU profiling. For example, if we choose to make the
external major interrupt an NMI and use ithreshold or eithreshold in AIA
to control which minor external interrupts can be sent to the CPU, then
we can support multiple minor interrupts as NMIs while keeping the other
minor interrupts as normal irqs. This is what we are working on now.

Also, if we take virtualization scenarios into account, CSR_IE pseudo
NMI can support NMI passthrough to a VM without too much effort from the
hypervisor, as long as the corresponding interrupt can be delegated to
VS-mode. I wonder if SSE supports interrupt passthrough to a VM?

> That's why an SBI extension, Supervisor Software Events (SSE), is under
> development:
>
> https://lists.riscv.org/g/tech-prs/message/515
>
> [... SSE description quoted ...]
>
> Clement is planning to send the series soon with more details.

The SSE extension you mentioned is a brilliant design and does solve a
lot of problems!

We have considered implementing NMI via SBI calls before. The main
problem is that if a driver using NMI needs to cooperate with SBI code,
extra coupling is introduced, as the driver vendor and the firmware
vendor may not be the same one. We think perhaps it is better to keep
the SBI code as simple and stable as possible.

Please correct me if there is any misunderstanding.

Thanks again, and looking forward to your reply.

> > [... remainder of the cover letter quoted: limits, TODO, patch list,
> > and diffstat ...]
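The ithreshold/eithreshold gating Xu Lu mentions can be illustrated with
a toy model. The semantics assumed here follow the AIA IMSIC draft as we
understand it (minor interrupt identities double as priorities, a
nonzero threshold P blocks identities >= P, and 0 means no threshold);
this is a sketch of the gating idea, not real hardware behavior:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of AIA-style threshold gating: with the external major
 * interrupt kept enabled as a pseudo NMI, the threshold decides which
 * minor external interrupts may still reach the CPU. */
static unsigned int eithreshold; /* 0 = no threshold, all enabled ids pass */

static bool can_deliver(unsigned int id, bool enabled)
{
    if (!enabled)
        return false;
    /* A nonzero threshold P blocks identities P and higher. */
    if (eithreshold != 0 && id >= eithreshold)
        return false;
    return true;
}
```

Under this model, raising the threshold while irqs are "disabled" would
let a few low-numbered (high-priority) minor interrupts keep firing as
NMIs while the rest stay blocked.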
On Thu, Oct 26, 2023 at 6:56 AM Xu Lu <luxu.kernel@bytedance.com> wrote: > > On Thu, Oct 26, 2023 at 7:02 AM Atish Patra <atishp@atishpatra.org> wrote: > > > > On Mon, Oct 23, 2023 at 1:29 AM Xu Lu <luxu.kernel@bytedance.com> wrote: > > > > > > Sorry to resend this patch series as I forgot to Cc the open list before. > > > Below is formal content. > > > > > > The existing RISC-V kernel lacks an NMI mechanism as there is still no > > > ratified resumable NMI extension in RISC-V community, which can not > > > satisfy some scenarios like high precision perf sampling. There is an > > > incoming hardware extension called Smrnmi which supports resumable NMI > > > by providing new control registers to save status when NMI happens. > > > However, it is still a draft and requires privilege level switches for > > > kernel to utilize it as NMIs are automatically trapped into machine mode. > > > > > > This patch series introduces a software pseudo NMI mechanism in RISC-V. > > > The existing RISC-V kernel disables interrupts via per cpu control > > > register CSR_STATUS, the SIE bit of which controls the enablement of all > > > interrupts of whole cpu. When SIE bit is clear, no interrupt is enabled. > > > This patch series implements NMI by switching interrupt disable way to > > > another per cpu control register CSR_IE. This register controls the > > > enablement of each separate interrupt. Each bit of CSR_IE corresponds > > > to a single major interrupt and a clear bit means disablement of > > > corresponding interrupt. > > > > > > To implement pseudo NMI, we switch to CSR_IE masking when disabling > > > irqs. When interrupts are disabled, all bits of CSR_IE corresponding to > > > normal interrupts are cleared while bits corresponding to NMIs are still > > > kept as ones. The SIE bit of CSR_STATUS is now untouched and always kept > > > as one. > > > > > > We measured performacne of Pseudo NMI patches based on v6.6-rc4 on SiFive > > > FU740 Soc with hackbench as our benchmark. 
The result shows 1.90% > > > performance degradation. > > > > > > "hackbench 200 process 1000" (average over 10 runs) > > > +-----------+----------+------------+ > > > | | v6.6-rc4 | Pseudo NMI | > > > +-----------+----------+------------+ > > > | time | 251.646s | 256.416s | > > > +-----------+----------+------------+ > > > > > > The overhead mainly comes from two parts: > > > > > > 1. Saving and restoring CSR_IE register during kernel entry/return. > > > This part introduces about 0.57% performance overhead. > > > > > > 2. The extra instructions introduced by 'irqs_enabled_ie'. It is a > > > special value representing normal CSR_IE when irqs are enabled. It is > > > implemented via ALTERNATIVE to adapt to platforms without PMU. This > > > part introduces about 1.32% performance overhead. > > > > > > > We had an evaluation of this approach earlier this year and concluded > > with the similar findings. > > The pseudo NMI is only useful for profiling use case which doesn't > > happen all the time in the system > > Adding the cost to the hotpath and sacrificing performance for > > everything for something for performance profiling > > is not desirable at all. > > Thanks a lot for your reply! > > First, please allow me to explain that CSR_IE Pseudo NMI actually can support > more than PMU profiling. For example, if we choose to make external major > interrupt as NMI and use ithreshold or eithreshold in AIA to control which minor > external interrupts can be sent to CPU, then we actually can support multiple > minor interrupts as NMI while keeping the other minor interrupts still > normal irqs. > This is what we are working on now. > What's the use case for external interrupts to behave as NMI ? Note: You can do the same thing with SSE as well if required. But I want to understand the use case first. 
> Also, if we take virtualization scenarios into account, CSR_IE Pseudo NMI can > support NMI passthrough to VM without too much effort from hypervisor, if only > corresponding interrupt can be delegated to VS-mode. I wonder if SSE supports > interrupt passthrough to VM? > Not technically interrupt pass through but hypervisor can invoke the guest SSE handler with the same mechanism. In fact, the original proposal specifies the async page fault as another use case for SSE. > > > > That's why, an SBI extension Supervisor Software Events (SSE) is under > > development. > > https://lists.riscv.org/g/tech-prs/message/515 > > > > Instead of selective disabling of interrupts, SSE takes an orthogonal > > approach where M-mode would invoke a special trap > > handler. That special handler will invoke the driver specific handler > > which would be registered by the driver (i.e. perf driver) > > This covers both firmware first RAS and perf use cases. > > > > The above version of the specification is a bit out-of-date and the > > revised version will be sent soon. > > Clement(cc'd) has also done a PoC of SSE and perf driver using the SSE > > framework. This resulted in actual saving > > in performance for RAS/perf without sacrificing the normal performance. > > > > Clement is planning to send the series soon with more details. > > The SSE extension you mentioned is a brilliant design and does solve a lot of > problems! > > We have considered implementing NMI via SBI calls before. The main problem > is that if a driver using NMI needs to cooperate with SBI code, extra > coupling will > be introduced as the driver vendor and firmware vendor may not be the same one. > We think perhaps it is better to keep SBI code as simple and stable as possible. > Yes. However, we also gain significant performance while we have a 2% regression with current pseudo-NMI approach. 
Quoting the numbers from SSE series[1]: "Additionally, SSE event handling is faster that the standard IRQ handling path with almost half executed instruction (700 vs 1590). Some complementary tests/perf measurements will be done." Major infrastructure development is one time effort. Adding additional sources of SSE effort will be minimal once the framework is in place. The SSE extension is still in draft stage and can accomodate any other use case that you may have in mind. IMHO, it would better to define one performant mechanism to solve the high priority interrupt use case. [1] https://www.spinics.net/lists/kernel/msg4982224.html > Please correct me if there is any misunderstanding. > > Thanks again and looking forward to your reply. > > > > > > Limits: > > > > > > CSR_IE is now used for disabling irqs and any other code should > > > not touch this register to avoid corrupting irq status, which means > > > we do not support masking a single interrupt now. > > > > > > We have tried to fix this by introducing a per cpu variable to save > > > CSR_IE value when disabling irqs. Then all operatations on CSR_IE > > > will be redirected to this variable and CSR_IE's value will be > > > restored from this variable when enabling irqs. Obviously this method > > > introduces extra memory accesses in hot code path. > > > > > > > > > > > > TODO: > > > > > > 1. The adaption to hypervisor extension is ongoing. > > > > > > 2. The adaption to advanced interrupt architecture is ongoing. > > > > > > This version of Pseudo NMI is rebased on v6.6-rc7. > > > > > > Thanks in advance for comments. 
> > >
> > > Xu Lu (12):
> > >   riscv: Introduce CONFIG_RISCV_PSEUDO_NMI
> > >   riscv: Make CSR_IE register part of context
> > >   riscv: Switch to CSR_IE masking when disabling irqs
> > >   riscv: Switch back to CSR_STATUS masking when going idle
> > >   riscv: kvm: Switch back to CSR_STATUS masking when entering guest
> > >   riscv: Allow requesting irq as pseudo NMI
> > >   riscv: Handle pseudo NMI in arch irq handler
> > >   riscv: Enable NMIs during irqs disabled context
> > >   riscv: Enable NMIs during exceptions
> > >   riscv: Enable NMIs during interrupt handling
> > >   riscv: Request pmu overflow interrupt as NMI
> > >   riscv: Enable CONFIG_RISCV_PSEUDO_NMI in default
> > >
> > >  arch/riscv/Kconfig                 | 10 ++++
> > >  arch/riscv/include/asm/csr.h       | 17 ++++++
> > >  arch/riscv/include/asm/irqflags.h  | 91 ++++++++++++++++++++++++++++++
> > >  arch/riscv/include/asm/processor.h |  4 ++
> > >  arch/riscv/include/asm/ptrace.h    |  7 +++
> > >  arch/riscv/include/asm/switch_to.h |  7 +++
> > >  arch/riscv/kernel/asm-offsets.c    |  3 +
> > >  arch/riscv/kernel/entry.S          | 18 ++++++
> > >  arch/riscv/kernel/head.S           | 10 ++++
> > >  arch/riscv/kernel/irq.c            | 17 ++++++
> > >  arch/riscv/kernel/process.c        |  6 ++
> > >  arch/riscv/kernel/suspend_entry.S  |  1 +
> > >  arch/riscv/kernel/traps.c          | 54 ++++++++++++++----
> > >  arch/riscv/kvm/vcpu.c              | 18 ++++--
> > >  drivers/clocksource/timer-clint.c  |  4 ++
> > >  drivers/clocksource/timer-riscv.c  |  4 ++
> > >  drivers/irqchip/irq-riscv-intc.c   | 66 ++++++++++++++++++++++
> > >  drivers/perf/riscv_pmu_sbi.c       | 21 ++++++-
> > >  18 files changed, 340 insertions(+), 18 deletions(-)
> > >
> > > --
> > > 2.20.1
> >
> >
> > --
> > Regards,
> > Atish
On Fri, Oct 27, 2023 at 3:42 AM Atish Patra <atishp@atishpatra.org> wrote:
>
> On Thu, Oct 26, 2023 at 6:56 AM Xu Lu <luxu.kernel@bytedance.com> wrote:
> >
> > On Thu, Oct 26, 2023 at 7:02 AM Atish Patra <atishp@atishpatra.org> wrote:
> > >
> > > On Mon, Oct 23, 2023 at 1:29 AM Xu Lu <luxu.kernel@bytedance.com> wrote:
> > > >
> > > > Sorry to resend this patch series as I forgot to Cc the open list
> > > > before. Below is the formal content.
> > > >
> > > > The existing RISC-V kernel lacks an NMI mechanism, as there is still
> > > > no ratified resumable NMI extension in the RISC-V community, which
> > > > cannot satisfy some scenarios like high-precision perf sampling.
> > > > There is an upcoming hardware extension called Smrnmi which supports
> > > > resumable NMI by providing new control registers to save status when
> > > > an NMI happens. However, it is still a draft, and it requires
> > > > privilege level switches for the kernel to utilize it, as NMIs are
> > > > automatically trapped into machine mode.
> > > >
> > > > This patch series introduces a software pseudo NMI mechanism in
> > > > RISC-V. The existing RISC-V kernel disables interrupts via the
> > > > per-cpu control register CSR_STATUS, whose SIE bit controls the
> > > > enablement of all interrupts of the whole cpu. When the SIE bit is
> > > > clear, no interrupt is enabled. This patch series implements NMI by
> > > > switching the interrupt disable mechanism to another per-cpu control
> > > > register, CSR_IE. This register controls the enablement of each
> > > > separate interrupt. Each bit of CSR_IE corresponds to a single major
> > > > interrupt, and a clear bit disables the corresponding interrupt.
> > > >
> > > > To implement pseudo NMI, we switch to CSR_IE masking when disabling
> > > > irqs. When interrupts are disabled, all bits of CSR_IE corresponding
> > > > to normal interrupts are cleared while bits corresponding to NMIs
> > > > are still kept as ones.
> > > > The SIE bit of CSR_STATUS is now untouched and always kept as one.
> > > >
> > > > We measured performance of the Pseudo NMI patches based on v6.6-rc4
> > > > on a SiFive FU740 SoC with hackbench as our benchmark. The result
> > > > shows 1.90% performance degradation.
> > > >
> > > > "hackbench 200 process 1000" (average over 10 runs)
> > > > +-----------+----------+------------+
> > > > |           | v6.6-rc4 | Pseudo NMI |
> > > > +-----------+----------+------------+
> > > > | time      | 251.646s | 256.416s   |
> > > > +-----------+----------+------------+
> > > >
> > > > The overhead mainly comes from two parts:
> > > >
> > > > 1. Saving and restoring the CSR_IE register during kernel
> > > > entry/return. This part introduces about 0.57% performance overhead.
> > > >
> > > > 2. The extra instructions introduced by 'irqs_enabled_ie'. It is a
> > > > special value representing the normal CSR_IE when irqs are enabled.
> > > > It is implemented via ALTERNATIVE to adapt to platforms without a
> > > > PMU. This part introduces about 1.32% performance overhead.
> > >
> > > We had an evaluation of this approach earlier this year and concluded
> > > with similar findings. The pseudo NMI is only useful for the profiling
> > > use case, which doesn't happen all the time in the system. Adding the
> > > cost to the hotpath and sacrificing performance for everything, just
> > > for performance profiling, is not desirable at all.
> >
> > Thanks a lot for your reply!
> >
> > First, please allow me to explain that CSR_IE Pseudo NMI actually can
> > support more than PMU profiling. For example, if we choose to make the
> > external major interrupt an NMI and use ithreshold or eithreshold in
> > AIA to control which minor external interrupts can be sent to the CPU,
> > then we actually can support multiple minor interrupts as NMIs while
> > keeping the other minor interrupts as normal irqs. This is what we are
> > working on now.
> >
>
> What's the use case for external interrupts to behave as NMI ?
>
> Note: You can do the same thing with SSE as well if required. But I
> want to understand the use case first.

For example, some high-precision event devices are designed as timer or
watchdog devices (please refer to [1][2]) and may not be per-cpu.

[1] https://lwn.net/Articles/924927/
[2] https://lore.kernel.org/lkml/1445961999-9506-1-git-send-email-fu.wei@linaro.org/T/

> > Also, if we take virtualization scenarios into account, CSR_IE Pseudo
> > NMI can support NMI passthrough to VM without too much effort from the
> > hypervisor, as long as the corresponding interrupt can be delegated to
> > VS-mode. I wonder if SSE supports interrupt passthrough to VM?
>
> Not technically interrupt pass through, but the hypervisor can invoke the
> guest SSE handler with the same mechanism. In fact, the original proposal
> specifies the async page fault as another use case for SSE.
>
> > >
> > > That's why, an SBI extension Supervisor Software Events (SSE) is under
> > > development.
> > > https://lists.riscv.org/g/tech-prs/message/515
> > >
> > > Instead of selective disabling of interrupts, SSE takes an orthogonal
> > > approach where M-mode would invoke a special trap handler. That
> > > special handler will invoke the driver-specific handler which would be
> > > registered by the driver (e.g. the perf driver). This covers both
> > > firmware-first RAS and perf use cases.
> > >
> > > The above version of the specification is a bit out-of-date and the
> > > revised version will be sent soon. Clement (cc'd) has also done a PoC
> > > of SSE and a perf driver using the SSE framework. This resulted in
> > > actual performance savings for RAS/perf without sacrificing normal
> > > performance.
> > >
> > > Clement is planning to send the series soon with more details.
> >
> > The SSE extension you mentioned is a brilliant design and does solve a
> > lot of problems!
> >
> > We have considered implementing NMI via SBI calls before. The main
> > problem is that if a driver using NMI needs to cooperate with SBI code,
> > extra coupling will be introduced, as the driver vendor and firmware
> > vendor may not be the same one. We think perhaps it is better to keep
> > SBI code as simple and stable as possible.
>
> Yes. However, we also gain significant performance with SSE, whereas the
> current pseudo-NMI approach carries a 2% regression. Quoting the numbers
> from the SSE series[1]:
>
> "Additionally, SSE event handling is faster than the standard IRQ
> handling path, with almost half the executed instructions (700 vs 1590).
> Some complementary tests/perf measurements will be done."

I think there are two more issues to be considered.

1) The instruction count may increase as the number of supported event ids
increases. More instructions will be introduced to maintain the mapping
between event id and handler_context. Besides, some security checks are
needed to ensure that the physical address passed by S-mode software
actually belongs to it (for example, the address may belong to an enclave).

2) I am wondering whether the control flow of user thread -> M-mode ->
S-mode -> M-mode -> user thread will sacrifice locality and cause more
cache misses. Looking forward to your further measurements!

> Major infrastructure development is a one-time effort. Adding additional
> SSE event sources will require minimal effort once the framework is in
> place. The SSE extension is still in draft stage and can accommodate any
> other use case that you may have in mind. IMHO, it would be better to
> define one performant mechanism to solve the high-priority interrupt use
> case.

I am concerned that every time a new event id is added, both the SBI and
driver code need to be modified simultaneously, which may increase coupling
and complexity.

Regards,
Xu Lu.

> [1] https://www.spinics.net/lists/kernel/msg4982224.html
>
> > Please correct me if there is any misunderstanding.
> >
> > Thanks again and looking forward to your reply.
> >
> > > > Limits:
> > > >
> > > > CSR_IE is now used for disabling irqs and any other code should
> > > > not touch this register to avoid corrupting irq status, which means
> > > > we do not support masking a single interrupt now.
> > > >
> > > > We have tried to fix this by introducing a per cpu variable to save
> > > > CSR_IE value when disabling irqs. Then all operations on CSR_IE
> > > > will be redirected to this variable and CSR_IE's value will be
> > > > restored from this variable when enabling irqs. Obviously this
> > > > method introduces extra memory accesses in the hot code path.
> > > >
> > > > TODO:
> > > >
> > > > 1. The adaptation to the hypervisor extension is ongoing.
> > > >
> > > > 2. The adaptation to the advanced interrupt architecture is ongoing.
> > > >
> > > > This version of Pseudo NMI is rebased on v6.6-rc7.
> > > >
> > > > Thanks in advance for comments.
> > > >
> > > > Xu Lu (12):
> > > >   riscv: Introduce CONFIG_RISCV_PSEUDO_NMI
> > > >   riscv: Make CSR_IE register part of context
> > > >   riscv: Switch to CSR_IE masking when disabling irqs
> > > >   riscv: Switch back to CSR_STATUS masking when going idle
> > > >   riscv: kvm: Switch back to CSR_STATUS masking when entering guest
> > > >   riscv: Allow requesting irq as pseudo NMI
> > > >   riscv: Handle pseudo NMI in arch irq handler
> > > >   riscv: Enable NMIs during irqs disabled context
> > > >   riscv: Enable NMIs during exceptions
> > > >   riscv: Enable NMIs during interrupt handling
> > > >   riscv: Request pmu overflow interrupt as NMI
> > > >   riscv: Enable CONFIG_RISCV_PSEUDO_NMI in default
> > > >
> > > >  arch/riscv/Kconfig                 | 10 ++++
> > > >  arch/riscv/include/asm/csr.h       | 17 ++++++
> > > >  arch/riscv/include/asm/irqflags.h  | 91 ++++++++++++++++++++++++++++++
> > > >  arch/riscv/include/asm/processor.h |  4 ++
> > > >  arch/riscv/include/asm/ptrace.h    |  7 +++
> > > >  arch/riscv/include/asm/switch_to.h |  7 +++
> > > >  arch/riscv/kernel/asm-offsets.c    |  3 +
> > > >  arch/riscv/kernel/entry.S          | 18 ++++++
> > > >  arch/riscv/kernel/head.S           | 10 ++++
> > > >  arch/riscv/kernel/irq.c            | 17 ++++++
> > > >  arch/riscv/kernel/process.c        |  6 ++
> > > >  arch/riscv/kernel/suspend_entry.S  |  1 +
> > > >  arch/riscv/kernel/traps.c          | 54 ++++++++++++++----
> > > >  arch/riscv/kvm/vcpu.c              | 18 ++++--
> > > >  drivers/clocksource/timer-clint.c  |  4 ++
> > > >  drivers/clocksource/timer-riscv.c  |  4 ++
> > > >  drivers/irqchip/irq-riscv-intc.c   | 66 ++++++++++++++++++++++
> > > >  drivers/perf/riscv_pmu_sbi.c       | 21 ++++++-
> > > >  18 files changed, 340 insertions(+), 18 deletions(-)
> > > >
> > > > --
> > > > 2.20.1
> > >
> > >
> > > --
> > > Regards,
> > > Atish
>
> --
> Regards,
> Atish
On Thu, Oct 26 2023 at 21:56, Xu Lu wrote:
> On Thu, Oct 26, 2023 at 7:02 AM Atish Patra <atishp@atishpatra.org> wrote:
>
> First, please allow me to explain that CSR_IE Pseudo NMI actually can
> support more than PMU profiling. For example, if we choose to make the
> external major interrupt an NMI and use ithreshold or eithreshold in AIA
> to control which minor external interrupts can be sent to the CPU, then we
> actually can support multiple minor interrupts as NMIs while keeping the
> other minor interrupts as normal irqs.

What is the use case for these NMIs? Anything other than profiling is not
really possible in NMI context at all.

Thanks,

        tglx