Message ID | 20230126161559.1467374-1-guoren@kernel.org |
---|---|
State | New |
Headers |
From: guoren@kernel.org
To: guoren@kernel.org, palmer@dabbelt.com, paul.walmsley@sifive.com, mhiramat@kernel.org, conor.dooley@microchip.com, penberg@kernel.org, mark.rutland@arm.com
Cc: linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, Guo Ren <guoren@linux.alibaba.com>
Subject: [PATCH] riscv: kprobe: Optimize kprobe with accurate atomicity
Date: Thu, 26 Jan 2023 11:15:59 -0500
Message-Id: <20230126161559.1467374-1-guoren@kernel.org>
X-Mailer: git-send-email 2.36.1 |
Series |
riscv: kprobe: Optimize kprobe with accurate atomicity
|
|
Commit Message
Guo Ren
Jan. 26, 2023, 4:15 p.m. UTC
From: Guo Ren <guoren@linux.alibaba.com>

The previous implementation was based on the stop_machine mechanism, which reduced the speed of arm/disarm_kprobe. Using the minimal ebreak instruction gives accurate atomicity.

This patch removes riscv's patch_text, which is based on stop_machine. riscv then keeps only patch_text_nosync, and developers need to be more careful about patch_text atomicity. When CONFIG_RISCV_ISA_C=n, the ebreak replaces the whole instruction. When CONFIG_RISCV_ISA_C=y, the patch uses the 16-bit c.ebreak instruction, which may occupy the first parcel of a 32-bit instruction and leave the remaining half of it broken. Because the ebreak detours the flow around it, leaving the broken half in kernel text memory is okay.

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
---
 arch/riscv/include/asm/patch.h     |  1 -
 arch/riscv/kernel/patch.c          | 33 ------------------------------
 arch/riscv/kernel/probes/kprobes.c | 29 ++++++++++++++++++--------
 3 files changed, 21 insertions(+), 42 deletions(-)
Comments
Hi, Guo Ren On 2023/1/27 0:15, guoren@kernel.org wrote: > From: Guo Ren <guoren@linux.alibaba.com> > > The previous implementation was based on the stop_matchine mechanism, > which reduced the speed of arm/disarm_kprobe. Using minimum ebreak > instruction would get accurate atomicity. > > This patch removes the patch_text of riscv, which is based on > stop_machine. Then riscv only reserved patch_text_nosync, and developers > need to be more careful in dealing with patch_text atomicity. In the series of RISC-V OPTPROBES [1], it patches a long-jump instruction pair AUIPC/JALR into kernel text, so in order to ensure that other CPUs do not execute the instructions being modified, it is still necessary to stop the other CPUs via the patch_text API. Or do you have a better solution to achieve that purpose? Thanks. > > When CONFIG_RISCV_ISA_C=n, the ebreak could replace the whole > instruction. When CONFIG_RISCV_ISA_C=y, the patch uses 16-bit length > c.ebreak instruction, which may occupy the first part of the 32-bit > instruction and leave half the rest of the broken instruction. Because > ebreak could detour the flow to skip it, leaving it in the kernel text > memory is okay.
> > Signed-off-by: Guo Ren <guoren@linux.alibaba.com> > Signed-off-by: Guo Ren <guoren@kernel.org> > --- > arch/riscv/include/asm/patch.h | 1 - > arch/riscv/kernel/patch.c | 33 ------------------------------ > arch/riscv/kernel/probes/kprobes.c | 29 ++++++++++++++++++-------- > 3 files changed, 21 insertions(+), 42 deletions(-) > > diff --git a/arch/riscv/include/asm/patch.h b/arch/riscv/include/asm/patch.h > index 9a7d7346001e..2500782e6f5b 100644 > --- a/arch/riscv/include/asm/patch.h > +++ b/arch/riscv/include/asm/patch.h > @@ -7,6 +7,5 @@ > #define _ASM_RISCV_PATCH_H > > int patch_text_nosync(void *addr, const void *insns, size_t len); > -int patch_text(void *addr, u32 insn); > > #endif /* _ASM_RISCV_PATCH_H */ > diff --git a/arch/riscv/kernel/patch.c b/arch/riscv/kernel/patch.c > index 765004b60513..8bd51ed8b806 100644 > --- a/arch/riscv/kernel/patch.c > +++ b/arch/riscv/kernel/patch.c > @@ -98,36 +98,3 @@ int patch_text_nosync(void *addr, const void *insns, size_t len) > return ret; > } > NOKPROBE_SYMBOL(patch_text_nosync); > - > -static int patch_text_cb(void *data) > -{ > - struct patch_insn *patch = data; > - int ret = 0; > - > - if (atomic_inc_return(&patch->cpu_count) == num_online_cpus()) { > - ret = > - patch_text_nosync(patch->addr, &patch->insn, > - GET_INSN_LENGTH(patch->insn)); > - atomic_inc(&patch->cpu_count); > - } else { > - while (atomic_read(&patch->cpu_count) <= num_online_cpus()) > - cpu_relax(); > - smp_mb(); > - } > - > - return ret; > -} > -NOKPROBE_SYMBOL(patch_text_cb); > - > -int patch_text(void *addr, u32 insn) > -{ > - struct patch_insn patch = { > - .addr = addr, > - .insn = insn, > - .cpu_count = ATOMIC_INIT(0), > - }; > - > - return stop_machine_cpuslocked(patch_text_cb, > - &patch, cpu_online_mask); > -} > -NOKPROBE_SYMBOL(patch_text); > diff --git a/arch/riscv/kernel/probes/kprobes.c b/arch/riscv/kernel/probes/kprobes.c > index 475989f06d6d..27f8960c321c 100644 > --- a/arch/riscv/kernel/probes/kprobes.c > +++ 
b/arch/riscv/kernel/probes/kprobes.c > @@ -24,12 +24,18 @@ post_kprobe_handler(struct kprobe *, struct kprobe_ctlblk *, struct pt_regs *); > static void __kprobes arch_prepare_ss_slot(struct kprobe *p) > { > unsigned long offset = GET_INSN_LENGTH(p->opcode); > +#ifdef CONFIG_RISCV_ISA_C > + u32 opcode = __BUG_INSN_16; > +#else > + u32 opcode = __BUG_INSN_32; > +#endif > > p->ainsn.api.restore = (unsigned long)p->addr + offset; > > - patch_text(p->ainsn.api.insn, p->opcode); > - patch_text((void *)((unsigned long)(p->ainsn.api.insn) + offset), > - __BUG_INSN_32); > + patch_text_nosync(p->ainsn.api.insn, &p->opcode, offset); > + patch_text_nosync((void *)((unsigned long)(p->ainsn.api.insn) + offset), > + &opcode, GET_INSN_LENGTH(opcode)); > + I have submitted a similar optimization for patching the single-step slot [2]. And it is indeed safe to use a compact breakpoint in the single-step slot no matter what type the patched instruction is. Thanks. > } > > static void __kprobes arch_prepare_simulate(struct kprobe *p) > @@ -114,16 +120,23 @@ void *alloc_insn_page(void) > /* install breakpoint in text */ > void __kprobes arch_arm_kprobe(struct kprobe *p) > { > - if ((p->opcode & __INSN_LENGTH_MASK) == __INSN_LENGTH_32) > - patch_text(p->addr, __BUG_INSN_32); > - else > - patch_text(p->addr, __BUG_INSN_16); > +#ifdef CONFIG_RISCV_ISA_C > + u32 opcode = __BUG_INSN_16; > +#else > + u32 opcode = __BUG_INSN_32; > +#endif > + patch_text_nosync(p->addr, &opcode, GET_INSN_LENGTH(opcode)); Sounds good, but it will leave some RVI instructions truncated in kernel text. I doubt whether kernel behavior depends on the rest of a truncated instruction; well, it needs more strict testing to address my concern :) > } > > /* remove breakpoint from text */ > void __kprobes arch_disarm_kprobe(struct kprobe *p) > { > - patch_text(p->addr, p->opcode); > +#ifdef CONFIG_RISCV_ISA_C > + u32 opcode = __BUG_INSN_16; > +#else > + u32 opcode = __BUG_INSN_32; > +#endif > + patch_text_nosync(p->addr, &p->opcode, 
GET_INSN_LENGTH(opcode)); > } > > void __kprobes arch_remove_kprobe(struct kprobe *p) [1] - https://lore.kernel.org/lkml/20230127130541.1250865-9-chenguokai17@mails.ucas.ac.cn/ [2] - https://lore.kernel.org/lkml/20220927022435.129965-1-liaochang1@huawei.com/T/
On Sat, Jan 28, 2023 at 11:53 AM liaochang (A) <liaochang1@huawei.com> wrote: > > Hi, Guo Ren > > 在 2023/1/27 0:15, guoren@kernel.org 写道: > > From: Guo Ren <guoren@linux.alibaba.com> > > > > The previous implementation was based on the stop_matchine mechanism, > > which reduced the speed of arm/disarm_kprobe. Using minimum ebreak > > instruction would get accurate atomicity. > > > > This patch removes the patch_text of riscv, which is based on > > stop_machine. Then riscv only reserved patch_text_nosync, and developers > > need to be more careful in dealing with patch_text atomicity. > > In the serie of RISCV OPTPROBES [1], it patches a long-jump instructions pair > AUIPC/JALR in kernel text, so in order to ensure other CPUs does not execute > in the instructions that will be modified, it is still need to stop other CPUs > via patch_text API, or you have any better solution to achieve the purpose? - The stop_machine is an expensive way all architectures should avoid, and you could keep that in your OPTPROBES implementation files with static functions. - The stop_machine couldn't work with PREEMPTION, so your implementation needs to work with !PREEMPTION. > > Thanks. > > > > > When CONFIG_RISCV_ISA_C=n, the ebreak could replace the whole > > instruction. When CONFIG_RISCV_ISA_C=y, the patch uses 16-bit length > > c.ebreak instruction, which may occupy the first part of the 32-bit > > instruction and leave half the rest of the broken instruction. Because > > ebreak could detour the flow to skip it, leaving it in the kernel text > > memory is okay. 
> > > > Signed-off-by: Guo Ren <guoren@linux.alibaba.com> > > Signed-off-by: Guo Ren <guoren@kernel.org> > > --- > > arch/riscv/include/asm/patch.h | 1 - > > arch/riscv/kernel/patch.c | 33 ------------------------------ > > arch/riscv/kernel/probes/kprobes.c | 29 ++++++++++++++++++-------- > > 3 files changed, 21 insertions(+), 42 deletions(-) > > > > diff --git a/arch/riscv/include/asm/patch.h b/arch/riscv/include/asm/patch.h > > index 9a7d7346001e..2500782e6f5b 100644 > > --- a/arch/riscv/include/asm/patch.h > > +++ b/arch/riscv/include/asm/patch.h > > @@ -7,6 +7,5 @@ > > #define _ASM_RISCV_PATCH_H > > > > int patch_text_nosync(void *addr, const void *insns, size_t len); > > -int patch_text(void *addr, u32 insn); > > > > #endif /* _ASM_RISCV_PATCH_H */ > > diff --git a/arch/riscv/kernel/patch.c b/arch/riscv/kernel/patch.c > > index 765004b60513..8bd51ed8b806 100644 > > --- a/arch/riscv/kernel/patch.c > > +++ b/arch/riscv/kernel/patch.c > > @@ -98,36 +98,3 @@ int patch_text_nosync(void *addr, const void *insns, size_t len) > > return ret; > > } > > NOKPROBE_SYMBOL(patch_text_nosync); > > - > > -static int patch_text_cb(void *data) > > -{ > > - struct patch_insn *patch = data; > > - int ret = 0; > > - > > - if (atomic_inc_return(&patch->cpu_count) == num_online_cpus()) { > > - ret = > > - patch_text_nosync(patch->addr, &patch->insn, > > - GET_INSN_LENGTH(patch->insn)); > > - atomic_inc(&patch->cpu_count); > > - } else { > > - while (atomic_read(&patch->cpu_count) <= num_online_cpus()) > > - cpu_relax(); > > - smp_mb(); > > - } > > - > > - return ret; > > -} > > -NOKPROBE_SYMBOL(patch_text_cb); > > - > > -int patch_text(void *addr, u32 insn) > > -{ > > - struct patch_insn patch = { > > - .addr = addr, > > - .insn = insn, > > - .cpu_count = ATOMIC_INIT(0), > > - }; > > - > > - return stop_machine_cpuslocked(patch_text_cb, > > - &patch, cpu_online_mask); > > -} > > -NOKPROBE_SYMBOL(patch_text); > > diff --git a/arch/riscv/kernel/probes/kprobes.c 
b/arch/riscv/kernel/probes/kprobes.c > > index 475989f06d6d..27f8960c321c 100644 > > --- a/arch/riscv/kernel/probes/kprobes.c > > +++ b/arch/riscv/kernel/probes/kprobes.c > > @@ -24,12 +24,18 @@ post_kprobe_handler(struct kprobe *, struct kprobe_ctlblk *, struct pt_regs *); > > static void __kprobes arch_prepare_ss_slot(struct kprobe *p) > > { > > unsigned long offset = GET_INSN_LENGTH(p->opcode); > > +#ifdef CONFIG_RISCV_ISA_C > > + u32 opcode = __BUG_INSN_16; > > +#else > > + u32 opcode = __BUG_INSN_32; > > +#endif > > > > p->ainsn.api.restore = (unsigned long)p->addr + offset; > > > > - patch_text(p->ainsn.api.insn, p->opcode); > > - patch_text((void *)((unsigned long)(p->ainsn.api.insn) + offset), > > - __BUG_INSN_32); > > + patch_text_nosync(p->ainsn.api.insn, &p->opcode, offset); > > + patch_text_nosync((void *)((unsigned long)(p->ainsn.api.insn) + offset), > > + &opcode, GET_INSN_LENGTH(opcode)); > > + > > I have submit a similar optimization for patching single-step slot [2]. > And it is indeed safe to use compact breakpoint in single-step slot no matter > what type of patched instruction is. Keep the same c.ebreak style in CONFIG_RISCV_ISA_C. It's my design principle. > > Thanks. 
> > > } > > > > static void __kprobes arch_prepare_simulate(struct kprobe *p) > > @@ -114,16 +120,23 @@ void *alloc_insn_page(void) > > /* install breakpoint in text */ > > void __kprobes arch_arm_kprobe(struct kprobe *p) > > { > > - if ((p->opcode & __INSN_LENGTH_MASK) == __INSN_LENGTH_32) > > - patch_text(p->addr, __BUG_INSN_32); > > - else > > - patch_text(p->addr, __BUG_INSN_16); > > +#ifdef CONFIG_RISCV_ISA_C > > + u32 opcode = __BUG_INSN_16; > > +#else > > + u32 opcode = __BUG_INSN_32; > > +#endif > > + patch_text_nosync(p->addr, &opcode, GET_INSN_LENGTH(opcode)); > > Sounds good, but it will leave some RVI instruction truncated in kernel text, > i doubt kernel behavior depends on the rest of the truncated instruction, well, > it needs more strict testing to prove my concern :) I do this on purpose, and it doesn't cause any problems. Don't worry; IFU hw must enforce the fetch sequence, and there is no way to execute broken instructions even in the speculative execution path. > > > } > > > > /* remove breakpoint from text */ > > void __kprobes arch_disarm_kprobe(struct kprobe *p) > > { > > - patch_text(p->addr, p->opcode); > > +#ifdef CONFIG_RISCV_ISA_C > > + u32 opcode = __BUG_INSN_16; > > +#else > > + u32 opcode = __BUG_INSN_32; > > +#endif > > + patch_text_nosync(p->addr, &p->opcode, GET_INSN_LENGTH(opcode)); > > } > > > > void __kprobes arch_remove_kprobe(struct kprobe *p) > > [1] - https://lore.kernel.org/lkml/20230127130541.1250865-9-chenguokai17@mails.ucas.ac.cn/ > [2] - https://lore.kernel.org/lkml/20220927022435.129965-1-liaochang1@huawei.com/T/ > > -- > BR, > Liao, Chang
Guo Ren <guoren@kernel.org> writes: >> In the serie of RISCV OPTPROBES [1], it patches a long-jump instructions pair >> AUIPC/JALR in kernel text, so in order to ensure other CPUs does not execute >> in the instructions that will be modified, it is still need to stop other CPUs >> via patch_text API, or you have any better solution to achieve the purpose? > - The stop_machine is an expensive way all architectures should > avoid, and you could keep that in your OPTPROBES implementation files > with static functions. > - The stop_machine couldn't work with PREEMPTION, so your > implementation needs to work with !PREEMPTION. ...and stop_machine() with !PREEMPTION is broken as well, when you're replacing multiple instructions (see Mark's post at [1]). The stop_machine() dance might work when you're replacing *one* instruction, not multiple as in the RISC-V case. I'll expand on this in a comment in the OPTPROBES v6 series. >> > static void __kprobes arch_prepare_simulate(struct kprobe *p) >> > @@ -114,16 +120,23 @@ void *alloc_insn_page(void) >> > /* install breakpoint in text */ >> > void __kprobes arch_arm_kprobe(struct kprobe *p) >> > { >> > - if ((p->opcode & __INSN_LENGTH_MASK) == __INSN_LENGTH_32) >> > - patch_text(p->addr, __BUG_INSN_32); >> > - else >> > - patch_text(p->addr, __BUG_INSN_16); >> > +#ifdef CONFIG_RISCV_ISA_C >> > + u32 opcode = __BUG_INSN_16; >> > +#else >> > + u32 opcode = __BUG_INSN_32; >> > +#endif >> > + patch_text_nosync(p->addr, &opcode, GET_INSN_LENGTH(opcode)); >> >> Sounds good, but it will leave some RVI instruction truncated in kernel text, >> i doubt kernel behavior depends on the rest of the truncated instruction, well, >> it needs more strict testing to prove my concern :) > I do this on purpose, and it doesn't cause any problems. Don't worry; > IFU hw must enforce the fetch sequence, and there is no way to execute > broken instructions even in the speculative execution path. This is stretching reality a bit much. 
ARMv8, e.g., has a chapter in the Arm ARM [2] Appendix B "Concurrent modification and execution of instructions" (CMODX). *Some* instructions can be replaced concurrently, and others cannot without caution. Assuming that all RISC-V implementations can is a stretch. RISC-V hasn't even specified the behavior of CMODX (which is problematic). If anything it would be more likely that the existing "stop_machine()-to-replace-with-ebreak" works (again, replacing one instruction does not have the !PREEMPTION issues). Then again, no spec, so mostly guessing from my side. :-( Oh, but the existing "ebreak replace" might be broken like [3]. Björn [1] https://lore.kernel.org/linux-riscv/Y7%2F6AtX5X0+5qF6Y@FVFF77S0Q05N/ [2] https://developer.arm.com/documentation/ddi0487/latest [3] https://lore.kernel.org/linux-riscv/20230126170607.1489141-2-guoren@kernel.org/
Hi Bjorn, On Mon, Jan 30, 2023 at 04:28:15PM +0100, Björn Töpel wrote: > Guo Ren <guoren@kernel.org> writes: > > >> In the serie of RISCV OPTPROBES [1], it patches a long-jump instructions pair > >> AUIPC/JALR in kernel text, so in order to ensure other CPUs does not execute > >> in the instructions that will be modified, it is still need to stop other CPUs > >> via patch_text API, or you have any better solution to achieve the purpose? > > - The stop_machine is an expensive way all architectures should > > avoid, and you could keep that in your OPTPROBES implementation files > > with static functions. > > - The stop_machine couldn't work with PREEMPTION, so your > > implementation needs to work with !PREEMPTION. > > ...and stop_machine() with !PREEMPTION is broken as well, when you're > replacing multiple instructions (see Mark's post at [1]). The > stop_machine() dance might work when you're replacing *one* instruction, > not multiple as in the RISC-V case. I'll expand on this in a comment in > the OPTPROBES v6 series. Just to clarify, my comments in [1] were assuming that stop_machine() was not used, in which case there is a problem with or without PREEMPTION. I believe that when using stop_machine(), the !PREEMPTION case is fine, since stop_machine() schedules work rather than running work in IRQ context on the back of an IPI, so no CPUs should be mid-sequence during the patching, and it's not possible for there to be threads which are preempted mid-sequence. That all said, IIUC optprobes is going to disappear once fprobe is ready everywhere, so that might be moot. Thanks, Mark. 
> >> > static void __kprobes arch_prepare_simulate(struct kprobe *p) > >> > @@ -114,16 +120,23 @@ void *alloc_insn_page(void) > >> > /* install breakpoint in text */ > >> > void __kprobes arch_arm_kprobe(struct kprobe *p) > >> > { > >> > - if ((p->opcode & __INSN_LENGTH_MASK) == __INSN_LENGTH_32) > >> > - patch_text(p->addr, __BUG_INSN_32); > >> > - else > >> > - patch_text(p->addr, __BUG_INSN_16); > >> > +#ifdef CONFIG_RISCV_ISA_C > >> > + u32 opcode = __BUG_INSN_16; > >> > +#else > >> > + u32 opcode = __BUG_INSN_32; > >> > +#endif > >> > + patch_text_nosync(p->addr, &opcode, GET_INSN_LENGTH(opcode)); > >> > >> Sounds good, but it will leave some RVI instruction truncated in kernel text, > >> i doubt kernel behavior depends on the rest of the truncated instruction, well, > >> it needs more strict testing to prove my concern :) > > I do this on purpose, and it doesn't cause any problems. Don't worry; > > IFU hw must enforce the fetch sequence, and there is no way to execute > > broken instructions even in the speculative execution path. > > This is stretching reality a bit much. ARMv8, e.g., has a chapter in the > Arm ARM [2] Appendix B "Concurrent modification and execution of > instructions" (CMODX). *Some* instructions can be replaced concurrently, > and others cannot without caution. Assuming that that all RISC-V > implementations can, is a stretch. RISC-V hasn't even specified the > behavior of CMODX (which is problematic). > > If anything it would be more likely that the existing > "stop_machine()-to-replace-with-ebreak" works (again, replacing one > instruction does not have the !PREEMPTION issues). Then again, no spec, > so mostly guessing from my side. :-( > > Oh, but the existing "ebreak replace" might be broken like [3]. > > > Björn > > > [1] https://lore.kernel.org/linux-riscv/Y7%2F6AtX5X0+5qF6Y@FVFF77S0Q05N/ > [2] https://developer.arm.com/documentation/ddi0487/latest > [3] https://lore.kernel.org/linux-riscv/20230126170607.1489141-2-guoren@kernel.org/
Mark Rutland <mark.rutland@arm.com> writes: >> ...and stop_machine() with !PREEMPTION is broken as well, when you're >> replacing multiple instructions (see Mark's post at [1]). The >> stop_machine() dance might work when you're replacing *one* instruction, >> not multiple as in the RISC-V case. I'll expand on this in a comment in >> the OPTPROBES v6 series. > > Just to clarify, my comments in [1] were assuming that stop_machine() was not > used, in which case there is a problem with or without PREEMPTION. > > I believe that when using stop_machine(), the !PREEMPTION case is fine, since > stop_machine() schedules work rather than running work in IRQ context on the > back of an IPI, so no CPUs should be mid-sequnce during the patching, and it's > not possible for there to be threads which are preempted mid-sequence. TIL! stop_cpus() highlights that very nicely. Thanks for clearing that up! That's good news; that means this fix [4] should go in. > That all said, IIUC optprobes is going to disappear once fprobe is ready > everywhere, so that might be moot. Yes (However, the stop_machine()/!PREEMPTION issue was with ftrace). Björn [4] https://lore.kernel.org/lkml/20230107133549.4192639-2-guoren@kernel.org/
On Mon, Jan 30, 2023 at 11:28 PM Björn Töpel <bjorn@kernel.org> wrote: > > Guo Ren <guoren@kernel.org> writes: > > >> In the serie of RISCV OPTPROBES [1], it patches a long-jump instructions pair > >> AUIPC/JALR in kernel text, so in order to ensure other CPUs does not execute > >> in the instructions that will be modified, it is still need to stop other CPUs > >> via patch_text API, or you have any better solution to achieve the purpose? > > - The stop_machine is an expensive way all architectures should > > avoid, and you could keep that in your OPTPROBES implementation files > > with static functions. > > - The stop_machine couldn't work with PREEMPTION, so your > > implementation needs to work with !PREEMPTION. > > ...and stop_machine() with !PREEMPTION is broken as well, when you're > replacing multiple instructions (see Mark's post at [1]). The > stop_machine() dance might work when you're replacing *one* instruction, > not multiple as in the RISC-V case. I'll expand on this in a comment in > the OPTPROBES v6 series. > > >> > static void __kprobes arch_prepare_simulate(struct kprobe *p) > >> > @@ -114,16 +120,23 @@ void *alloc_insn_page(void) > >> > /* install breakpoint in text */ > >> > void __kprobes arch_arm_kprobe(struct kprobe *p) > >> > { > >> > - if ((p->opcode & __INSN_LENGTH_MASK) == __INSN_LENGTH_32) > >> > - patch_text(p->addr, __BUG_INSN_32); > >> > - else > >> > - patch_text(p->addr, __BUG_INSN_16); > >> > +#ifdef CONFIG_RISCV_ISA_C > >> > + u32 opcode = __BUG_INSN_16; > >> > +#else > >> > + u32 opcode = __BUG_INSN_32; > >> > +#endif > >> > + patch_text_nosync(p->addr, &opcode, GET_INSN_LENGTH(opcode)); > >> > >> Sounds good, but it will leave some RVI instruction truncated in kernel text, > >> i doubt kernel behavior depends on the rest of the truncated instruction, well, > >> it needs more strict testing to prove my concern :) > > I do this on purpose, and it doesn't cause any problems. 
Don't worry; > > IFU hw must enforce the fetch sequence, and there is no way to execute > > broken instructions even in the speculative execution path. > > This is stretching reality a bit much. ARMv8, e.g., has a chapter in the > Arm ARM [2] Appendix B "Concurrent modification and execution of > instructions" (CMODX). *Some* instructions can be replaced concurrently, > and others cannot without caution. Assuming that that all RISC-V > implementations can, is a stretch. RISC-V hasn't even specified the > behavior of CMODX (which is problematic). Here we only use one sw/sh instruction to store a 32bit/16bit aligned element: INSN_0 <- ebreak (16bit/32bit aligned) INSN_1 INSN_2 The ebreak would cause an exception which implies a huge fence here. No machine could give a speculative execution for the ebreak path. > > If anything it would be more likely that the existing > "stop_machine()-to-replace-with-ebreak" works (again, replacing one > instruction does not have the !PREEMPTION issues). Then again, no spec, > so mostly guessing from my side. :-( > > Oh, but the existing "ebreak replace" might be broken like [3]. > > > Björn > > > [1] https://lore.kernel.org/linux-riscv/Y7%2F6AtX5X0+5qF6Y@FVFF77S0Q05N/ > [2] https://developer.arm.com/documentation/ddi0487/latest > [3] https://lore.kernel.org/linux-riscv/20230126170607.1489141-2-guoren@kernel.org/
On Tue, Jan 31, 2023 at 9:01 AM Guo Ren <guoren@kernel.org> wrote: > > On Mon, Jan 30, 2023 at 11:28 PM Björn Töpel <bjorn@kernel.org> wrote: > > > > Guo Ren <guoren@kernel.org> writes: > > > > >> In the serie of RISCV OPTPROBES [1], it patches a long-jump instructions pair > > >> AUIPC/JALR in kernel text, so in order to ensure other CPUs does not execute > > >> in the instructions that will be modified, it is still need to stop other CPUs > > >> via patch_text API, or you have any better solution to achieve the purpose? > > > - The stop_machine is an expensive way all architectures should > > > avoid, and you could keep that in your OPTPROBES implementation files > > > with static functions. > > > - The stop_machine couldn't work with PREEMPTION, so your > > > implementation needs to work with !PREEMPTION. > > > > ...and stop_machine() with !PREEMPTION is broken as well, when you're > > replacing multiple instructions (see Mark's post at [1]). The > > stop_machine() dance might work when you're replacing *one* instruction, > > not multiple as in the RISC-V case. I'll expand on this in a comment in > > the OPTPROBES v6 series. 
> > > > >> > static void __kprobes arch_prepare_simulate(struct kprobe *p) > > >> > @@ -114,16 +120,23 @@ void *alloc_insn_page(void) > > >> > /* install breakpoint in text */ > > >> > void __kprobes arch_arm_kprobe(struct kprobe *p) > > >> > { > > >> > - if ((p->opcode & __INSN_LENGTH_MASK) == __INSN_LENGTH_32) > > >> > - patch_text(p->addr, __BUG_INSN_32); > > >> > - else > > >> > - patch_text(p->addr, __BUG_INSN_16); > > >> > +#ifdef CONFIG_RISCV_ISA_C > > >> > + u32 opcode = __BUG_INSN_16; > > >> > +#else > > >> > + u32 opcode = __BUG_INSN_32; > > >> > +#endif > > >> > + patch_text_nosync(p->addr, &opcode, GET_INSN_LENGTH(opcode)); > > >> > > >> Sounds good, but it will leave some RVI instruction truncated in kernel text, > > >> i doubt kernel behavior depends on the rest of the truncated instruction, well, > > >> it needs more strict testing to prove my concern :) > > > I do this on purpose, and it doesn't cause any problems. Don't worry; > > > IFU hw must enforce the fetch sequence, and there is no way to execute > > > broken instructions even in the speculative execution path. > > > > This is stretching reality a bit much. ARMv8, e.g., has a chapter in the > > Arm ARM [2] Appendix B "Concurrent modification and execution of > > instructions" (CMODX). *Some* instructions can be replaced concurrently, > > and others cannot without caution. Assuming that that all RISC-V > > implementations can, is a stretch. RISC-V hasn't even specified the > > behavior of CMODX (which is problematic). > Here we only use one sw/sh instruction to store a 32bit/16bit aligned element: > > INSN_0 <- ebreak (16bit/32bit aligned) > INSN_1 > INSN_2 > > The ebreak would cause an exception which implies a huge fence here. > No machine could give a speculative execution for the ebreak path. 
For ARMv7, ebreak is also safe: --- Concurrent modification and execution of instructions The ARMv7 architecture limits the set of instructions that can be executed by one thread of execution as they are being modified by another thread of execution without requiring explicit synchronization. ... The instructions to which this guarantee applies are: In the Thumb instruction set The 16-bit encodings of the B, NOP, BKPT, and SVC instructions. ... In the ARM instruction set The B, BL, NOP, BKPT, SVC, HVC, and SMC instructions. --- > > > > > If anything it would be more likely that the existing > > "stop_machine()-to-replace-with-ebreak" works (again, replacing one > > instruction does not have the !PREEMPTION issues). Then again, no spec, > > so mostly guessing from my side. :-( > > > > Oh, but the existing "ebreak replace" might be broken like [3]. > > > > > > Björn > > > > > > [1] https://lore.kernel.org/linux-riscv/Y7%2F6AtX5X0+5qF6Y@FVFF77S0Q05N/ > > [2] https://developer.arm.com/documentation/ddi0487/latest > > [3] https://lore.kernel.org/linux-riscv/20230126170607.1489141-2-guoren@kernel.org/ > > > > -- > Best Regards > Guo Ren
On Mon, Jan 30, 2023 at 11:49 PM Mark Rutland <mark.rutland@arm.com> wrote: > > Hi Bjorn, > > On Mon, Jan 30, 2023 at 04:28:15PM +0100, Björn Töpel wrote: > > Guo Ren <guoren@kernel.org> writes: > > > > >> In the serie of RISCV OPTPROBES [1], it patches a long-jump instructions pair > > >> AUIPC/JALR in kernel text, so in order to ensure other CPUs does not execute > > >> in the instructions that will be modified, it is still need to stop other CPUs > > >> via patch_text API, or you have any better solution to achieve the purpose? > > > - The stop_machine is an expensive way all architectures should > > > avoid, and you could keep that in your OPTPROBES implementation files > > > with static functions. > > > - The stop_machine couldn't work with PREEMPTION, so your > > > implementation needs to work with !PREEMPTION. > > > > ...and stop_machine() with !PREEMPTION is broken as well, when you're > > replacing multiple instructions (see Mark's post at [1]). The > > stop_machine() dance might work when you're replacing *one* instruction, > > not multiple as in the RISC-V case. I'll expand on this in a comment in > > the OPTPROBES v6 series. > > Just to clarify, my comments in [1] were assuming that stop_machine() was not > used, in which case there is a problem with or without PREEMPTION. > > I believe that when using stop_machine(), the !PREEMPTION case is fine, since > stop_machine() schedules work rather than running work in IRQ context on the > back of an IPI, so no CPUs should be mid-sequnce during the patching, and it's > not possible for there to be threads which are preempted mid-sequence. > > That all said, IIUC optprobes is going to disappear once fprobe is ready > everywhere, so that might be moot. The optprobes could be in the middle of a function, but fprobe must be the entry of a function, right? Does your fprobe here mean: ? 
The Linux kernel configuration item CONFIG_FPROBE: prompt: Kernel Function Probe (fprobe) type: bool depends on: ( CONFIG_FUNCTION_TRACER ) && ( CONFIG_DYNAMIC_FTRACE_WITH_REGS ) && ( CONFIG_HAVE_RETHOOK ) defined in kernel/trace/Kconfig > > Thanks, > Mark. > > > >> > static void __kprobes arch_prepare_simulate(struct kprobe *p) > > >> > @@ -114,16 +120,23 @@ void *alloc_insn_page(void) > > >> > /* install breakpoint in text */ > > >> > void __kprobes arch_arm_kprobe(struct kprobe *p) > > >> > { > > >> > - if ((p->opcode & __INSN_LENGTH_MASK) == __INSN_LENGTH_32) > > >> > - patch_text(p->addr, __BUG_INSN_32); > > >> > - else > > >> > - patch_text(p->addr, __BUG_INSN_16); > > >> > +#ifdef CONFIG_RISCV_ISA_C > > >> > + u32 opcode = __BUG_INSN_16; > > >> > +#else > > >> > + u32 opcode = __BUG_INSN_32; > > >> > +#endif > > >> > + patch_text_nosync(p->addr, &opcode, GET_INSN_LENGTH(opcode)); > > >> > > >> Sounds good, but it will leave some RVI instruction truncated in kernel text, > > >> i doubt kernel behavior depends on the rest of the truncated instruction, well, > > >> it needs more strict testing to prove my concern :) > > > I do this on purpose, and it doesn't cause any problems. Don't worry; > > > IFU hw must enforce the fetch sequence, and there is no way to execute > > > broken instructions even in the speculative execution path. > > > > This is stretching reality a bit much. ARMv8, e.g., has a chapter in the > > Arm ARM [2] Appendix B "Concurrent modification and execution of > > instructions" (CMODX). *Some* instructions can be replaced concurrently, > > and others cannot without caution. Assuming that that all RISC-V > > implementations can, is a stretch. RISC-V hasn't even specified the > > behavior of CMODX (which is problematic). > > > > If anything it would be more likely that the existing > > "stop_machine()-to-replace-with-ebreak" works (again, replacing one > > instruction does not have the !PREEMPTION issues). 
Then again, no spec, > > so mostly guessing from my side. :-( > > > > Oh, but the existing "ebreak replace" might be broken like [3]. > > > > > > Björn > > > > > > [1] https://lore.kernel.org/linux-riscv/Y7%2F6AtX5X0+5qF6Y@FVFF77S0Q05N/ > > [2] https://developer.arm.com/documentation/ddi0487/latest > > [3] https://lore.kernel.org/linux-riscv/20230126170607.1489141-2-guoren@kernel.org/
Guo Ren <guoren@kernel.org> writes: > On Mon, Jan 30, 2023 at 11:28 PM Björn Töpel <bjorn@kernel.org> wrote: >> >> Guo Ren <guoren@kernel.org> writes: >> >> >> In the serie of RISCV OPTPROBES [1], it patches a long-jump instructions pair >> >> AUIPC/JALR in kernel text, so in order to ensure other CPUs does not execute >> >> in the instructions that will be modified, it is still need to stop other CPUs >> >> via patch_text API, or you have any better solution to achieve the purpose? >> > - The stop_machine is an expensive way all architectures should >> > avoid, and you could keep that in your OPTPROBES implementation files >> > with static functions. >> > - The stop_machine couldn't work with PREEMPTION, so your >> > implementation needs to work with !PREEMPTION. >> >> ...and stop_machine() with !PREEMPTION is broken as well, when you're >> replacing multiple instructions (see Mark's post at [1]). The >> stop_machine() dance might work when you're replacing *one* instruction, >> not multiple as in the RISC-V case. I'll expand on this in a comment in >> the OPTPROBES v6 series. 
>> >> >> > static void __kprobes arch_prepare_simulate(struct kprobe *p) >> >> > @@ -114,16 +120,23 @@ void *alloc_insn_page(void) >> >> > /* install breakpoint in text */ >> >> > void __kprobes arch_arm_kprobe(struct kprobe *p) >> >> > { >> >> > - if ((p->opcode & __INSN_LENGTH_MASK) == __INSN_LENGTH_32) >> >> > - patch_text(p->addr, __BUG_INSN_32); >> >> > - else >> >> > - patch_text(p->addr, __BUG_INSN_16); >> >> > +#ifdef CONFIG_RISCV_ISA_C >> >> > + u32 opcode = __BUG_INSN_16; >> >> > +#else >> >> > + u32 opcode = __BUG_INSN_32; >> >> > +#endif >> >> > + patch_text_nosync(p->addr, &opcode, GET_INSN_LENGTH(opcode)); >> >> >> >> Sounds good, but it will leave some RVI instruction truncated in kernel text, >> >> i doubt kernel behavior depends on the rest of the truncated instruction, well, >> >> it needs more strict testing to prove my concern :) >> > I do this on purpose, and it doesn't cause any problems. Don't worry; >> > IFU hw must enforce the fetch sequence, and there is no way to execute >> > broken instructions even in the speculative execution path. >> >> This is stretching reality a bit much. ARMv8, e.g., has a chapter in the >> Arm ARM [2] Appendix B "Concurrent modification and execution of >> instructions" (CMODX). *Some* instructions can be replaced concurrently, >> and others cannot without caution. Assuming that that all RISC-V >> implementations can, is a stretch. RISC-V hasn't even specified the >> behavior of CMODX (which is problematic). > Here we only use one sw/sh instruction to store a 32bit/16bit aligned element: > > INSN_0 <- ebreak (16bit/32bit aligned) > INSN_1 > INSN_2 > > The ebreak would cause an exception which implies a huge fence here. > No machine could give a speculative execution for the ebreak path. It's the concurrent modification that I was referring to (removing stop_machine()). You're saying "it'll always work", I'm saying "I'm not so sure". :-) E.g., writing c.ebreak on an 32b insn. 
Can you say that will work on all RISC-V implementations? Do you have examples of hardware where it will work? Björn
Guo Ren <guoren@kernel.org> writes: >> > >> > static void __kprobes arch_prepare_simulate(struct kprobe *p) >> > >> > @@ -114,16 +120,23 @@ void *alloc_insn_page(void) >> > >> > /* install breakpoint in text */ >> > >> > void __kprobes arch_arm_kprobe(struct kprobe *p) >> > >> > { >> > >> > - if ((p->opcode & __INSN_LENGTH_MASK) == __INSN_LENGTH_32) >> > >> > - patch_text(p->addr, __BUG_INSN_32); >> > >> > - else >> > >> > - patch_text(p->addr, __BUG_INSN_16); >> > >> > +#ifdef CONFIG_RISCV_ISA_C >> > >> > + u32 opcode = __BUG_INSN_16; >> > >> > +#else >> > >> > + u32 opcode = __BUG_INSN_32; >> > >> > +#endif >> > >> > + patch_text_nosync(p->addr, &opcode, GET_INSN_LENGTH(opcode)); >> > >> >> > >> Sounds good, but it will leave some RVI instruction truncated in kernel text, >> > >> i doubt kernel behavior depends on the rest of the truncated instruction, well, >> > >> it needs more strict testing to prove my concern :) >> > > I do this on purpose, and it doesn't cause any problems. Don't worry; >> > > IFU hw must enforce the fetch sequence, and there is no way to execute >> > > broken instructions even in the speculative execution path. >> > >> > This is stretching reality a bit much. ARMv8, e.g., has a chapter in the >> > Arm ARM [2] Appendix B "Concurrent modification and execution of >> > instructions" (CMODX). *Some* instructions can be replaced concurrently, >> > and others cannot without caution. Assuming that that all RISC-V >> > implementations can, is a stretch. RISC-V hasn't even specified the >> > behavior of CMODX (which is problematic). >> Here we only use one sw/sh instruction to store a 32bit/16bit aligned element: >> >> INSN_0 <- ebreak (16bit/32bit aligned) >> INSN_1 >> INSN_2 >> >> The ebreak would cause an exception which implies a huge fence here. >> No machine could give a speculative execution for the ebreak path. 
> > For ARMv7, ebreak is also safe: > > --- > Concurrent modification and execution of instructions > > The ARMv7 architecture limits the set of instructions that can be > executed by one thread of execution as they are being modified by > another thread of execution without requiring explicit > synchronization. > ... > The instructions to which this guarantee applies are: > In the Thumb instruction set > The 16-bit encodings of the B, NOP, BKPT, and SVC instructions. > ... > In the ARM instruction set > The B, BL, NOP, BKPT, SVC, HVC, and SMC instructions. > --- Right, and "B7.7 Concurrent modification and execution of instructions" in the Armv8-M ARM (https://developer.arm.com/documentation/ddi0553/latest) also defines that certain instructions can be concurrently modified. This is beside the point. We don't have a spec for RISC-V, yet. We're not even sure we can (in general) replace the lower 16b of a 32b instruction concurrently. "It's in the Armv8-M spec" is not enough. I'd love to have a spec defining that, and Derek et al have started [1]. Slide #99 has CMODX details. Your patch might be great for some HW (which?), but not enough for general RISC-V Linux (yet). Until then, the existing stop_machine() way is unfortunately the way to go. Björn [1] https://github.com/riscv/riscv-j-extension/blob/master/id-consistency-proposal.pdf
Guo Ren <guoren@kernel.org> writes: > On Mon, Jan 30, 2023 at 11:49 PM Mark Rutland <mark.rutland@arm.com> wrote: >> >> Hi Bjorn, >> >> On Mon, Jan 30, 2023 at 04:28:15PM +0100, Björn Töpel wrote: >> > Guo Ren <guoren@kernel.org> writes: >> > >> > >> In the serie of RISCV OPTPROBES [1], it patches a long-jump instructions pair >> > >> AUIPC/JALR in kernel text, so in order to ensure other CPUs does not execute >> > >> in the instructions that will be modified, it is still need to stop other CPUs >> > >> via patch_text API, or you have any better solution to achieve the purpose? >> > > - The stop_machine is an expensive way all architectures should >> > > avoid, and you could keep that in your OPTPROBES implementation files >> > > with static functions. >> > > - The stop_machine couldn't work with PREEMPTION, so your >> > > implementation needs to work with !PREEMPTION. >> > >> > ...and stop_machine() with !PREEMPTION is broken as well, when you're >> > replacing multiple instructions (see Mark's post at [1]). The >> > stop_machine() dance might work when you're replacing *one* instruction, >> > not multiple as in the RISC-V case. I'll expand on this in a comment in >> > the OPTPROBES v6 series. >> >> Just to clarify, my comments in [1] were assuming that stop_machine() was not >> used, in which case there is a problem with or without PREEMPTION. >> >> I believe that when using stop_machine(), the !PREEMPTION case is fine, since >> stop_machine() schedules work rather than running work in IRQ context on the >> back of an IPI, so no CPUs should be mid-sequnce during the patching, and it's >> not possible for there to be threads which are preempted mid-sequence. >> >> That all said, IIUC optprobes is going to disappear once fprobe is ready >> everywhere, so that might be moot. > The optprobes could be in the middle of a function, but fprobe must be > the entry of a function, right? > > Does your fprobe here mean: ? 
> > The Linux kernel configuration item CONFIG_FPROBE: > > prompt: Kernel Function Probe (fprobe) > > type: bool > > depends on: ( CONFIG_FUNCTION_TRACER ) && ( > > CONFIG_DYNAMIC_FTRACE_WITH_REGS ) && ( CONFIG_HAVE_RETHOOK ) > > defined in kernel/trace/Kconfig See the cover of [1]. It's about direct calls for BPF tracing (and more) on Arm, and you're completely right that it's *not* related to optprobes at all. [1] https://lore.kernel.org/all/20221108220651.24492-1-revest@chromium.org/
On Tue, Jan 31, 2023 at 2:40 PM Björn Töpel <bjorn@kernel.org> wrote: > > Guo Ren <guoren@kernel.org> writes: > > > On Mon, Jan 30, 2023 at 11:28 PM Björn Töpel <bjorn@kernel.org> wrote: > >> > >> Guo Ren <guoren@kernel.org> writes: > >> > >> >> In the serie of RISCV OPTPROBES [1], it patches a long-jump instructions pair > >> >> AUIPC/JALR in kernel text, so in order to ensure other CPUs does not execute > >> >> in the instructions that will be modified, it is still need to stop other CPUs > >> >> via patch_text API, or you have any better solution to achieve the purpose? > >> > - The stop_machine is an expensive way all architectures should > >> > avoid, and you could keep that in your OPTPROBES implementation files > >> > with static functions. > >> > - The stop_machine couldn't work with PREEMPTION, so your > >> > implementation needs to work with !PREEMPTION. > >> > >> ...and stop_machine() with !PREEMPTION is broken as well, when you're > >> replacing multiple instructions (see Mark's post at [1]). The > >> stop_machine() dance might work when you're replacing *one* instruction, > >> not multiple as in the RISC-V case. I'll expand on this in a comment in > >> the OPTPROBES v6 series. 
> >> > >> >> > static void __kprobes arch_prepare_simulate(struct kprobe *p) > >> >> > @@ -114,16 +120,23 @@ void *alloc_insn_page(void) > >> >> > /* install breakpoint in text */ > >> >> > void __kprobes arch_arm_kprobe(struct kprobe *p) > >> >> > { > >> >> > - if ((p->opcode & __INSN_LENGTH_MASK) == __INSN_LENGTH_32) > >> >> > - patch_text(p->addr, __BUG_INSN_32); > >> >> > - else > >> >> > - patch_text(p->addr, __BUG_INSN_16); > >> >> > +#ifdef CONFIG_RISCV_ISA_C > >> >> > + u32 opcode = __BUG_INSN_16; > >> >> > +#else > >> >> > + u32 opcode = __BUG_INSN_32; > >> >> > +#endif > >> >> > + patch_text_nosync(p->addr, &opcode, GET_INSN_LENGTH(opcode)); > >> >> > >> >> Sounds good, but it will leave some RVI instruction truncated in kernel text, > >> >> i doubt kernel behavior depends on the rest of the truncated instruction, well, > >> >> it needs more strict testing to prove my concern :) > >> > I do this on purpose, and it doesn't cause any problems. Don't worry; > >> > IFU hw must enforce the fetch sequence, and there is no way to execute > >> > broken instructions even in the speculative execution path. > >> > >> This is stretching reality a bit much. ARMv8, e.g., has a chapter in the > >> Arm ARM [2] Appendix B "Concurrent modification and execution of > >> instructions" (CMODX). *Some* instructions can be replaced concurrently, > >> and others cannot without caution. Assuming that that all RISC-V > >> implementations can, is a stretch. RISC-V hasn't even specified the > >> behavior of CMODX (which is problematic). > > Here we only use one sw/sh instruction to store a 32bit/16bit aligned element: > > > > INSN_0 <- ebreak (16bit/32bit aligned) > > INSN_1 > > INSN_2 > > > > The ebreak would cause an exception which implies a huge fence here. > > No machine could give a speculative execution for the ebreak path. > > It's the concurrent modification that I was referring to (removing > stop_machine()). 
You're saying "it'll always work", I'm saying "I'm not > so sure". :-) E.g., writing c.ebreak on an 32b insn. Can you say that Software must ensure c.ebreak is only written on the head of a 32b insn. That means the IFU only sees: - c.ebreak + broken/illegal insn, or - the origin insn. Even in the worst case, where the IFU fetches instructions one by one: If the IFU gets the origin insn, it will skip the broken/illegal insn. If the IFU gets the c.ebreak + broken/illegal insn, then an ebreak exception is raised. Because c.ebreak would raise an exception, I don't see any problem. > will work on all RISC-V implementations? Do you have examples of > hardware where it will work? For the c.ebreak, it's natural. It's hard for a hardware implementation to get this wrong. > > > Björn
On Tue, Jan 31, 2023 at 3:03 PM Björn Töpel <bjorn@kernel.org> wrote: > > Guo Ren <guoren@kernel.org> writes: > > >> > >> > static void __kprobes arch_prepare_simulate(struct kprobe *p) > >> > >> > @@ -114,16 +120,23 @@ void *alloc_insn_page(void) > >> > >> > /* install breakpoint in text */ > >> > >> > void __kprobes arch_arm_kprobe(struct kprobe *p) > >> > >> > { > >> > >> > - if ((p->opcode & __INSN_LENGTH_MASK) == __INSN_LENGTH_32) > >> > >> > - patch_text(p->addr, __BUG_INSN_32); > >> > >> > - else > >> > >> > - patch_text(p->addr, __BUG_INSN_16); > >> > >> > +#ifdef CONFIG_RISCV_ISA_C > >> > >> > + u32 opcode = __BUG_INSN_16; > >> > >> > +#else > >> > >> > + u32 opcode = __BUG_INSN_32; > >> > >> > +#endif > >> > >> > + patch_text_nosync(p->addr, &opcode, GET_INSN_LENGTH(opcode)); > >> > >> > >> > >> Sounds good, but it will leave some RVI instruction truncated in kernel text, > >> > >> i doubt kernel behavior depends on the rest of the truncated instruction, well, > >> > >> it needs more strict testing to prove my concern :) > >> > > I do this on purpose, and it doesn't cause any problems. Don't worry; > >> > > IFU hw must enforce the fetch sequence, and there is no way to execute > >> > > broken instructions even in the speculative execution path. > >> > > >> > This is stretching reality a bit much. ARMv8, e.g., has a chapter in the > >> > Arm ARM [2] Appendix B "Concurrent modification and execution of > >> > instructions" (CMODX). *Some* instructions can be replaced concurrently, > >> > and others cannot without caution. Assuming that that all RISC-V > >> > implementations can, is a stretch. RISC-V hasn't even specified the > >> > behavior of CMODX (which is problematic). > >> Here we only use one sw/sh instruction to store a 32bit/16bit aligned element: > >> > >> INSN_0 <- ebreak (16bit/32bit aligned) > >> INSN_1 > >> INSN_2 > >> > >> The ebreak would cause an exception which implies a huge fence here. 
> >> No machine could give a speculative execution for the ebreak path. > > > > For ARMv7, ebreak is also safe: > > > > --- > > Concurrent modification and execution of instructions > > > > The ARMv7 architecture limits the set of instructions that can be > > executed by one thread of execution as they are being modified by > > another thread of execution without requiring explicit > > synchronization. > > ... > > The instructions to which this guarantee applies are: > > In the Thumb instruction set > > The 16-bit encodings of the B, NOP, BKPT, and SVC instructions. > > ... > > In the ARM instruction set > > The B, BL, NOP, BKPT, SVC, HVC, and SMC instructions. > > --- > > Right, and "B7.7 Concurrent modification and execution of instructions" > Armv8-M ARM (https://developer.arm.com/documentation/ddi0553/latest), > also defines that certain instructions can be concurrently modified. > > This is beside the point. We don't have a spec for RISC-V, yet. We're > not even sure we can (in general) replace the lower 16b of an 32b > instruction concurrently. "It's in the Armv8-M spec" is not enough. > > I'd love to have a spec defining that, and Derek et al has started > [1]. Slide #99 has CMODX details. For the c.ebreak, CMODX is unnecessary. CMODX would give a wider definition. > > Your patch might be great for some HW (which?), but not enough for > general RISC-V Linux (yet). Until then, the existing stop_machine() way > is unfortunately the way to go. It's generic for c.ebreak, not some HW. It's natural to support. No machine would break it. Please let me clarify our argued premise of "ebreak" injection atomicity: - c.ebreak on the head of 32b insn - ebreak on an aligned 32b insn > > > Björn > > [1] https://github.com/riscv/riscv-j-extension/blob/master/id-consistency-proposal.pdf
On Tue, Jan 31, 2023 at 3:12 PM Björn Töpel <bjorn@kernel.org> wrote: > > Guo Ren <guoren@kernel.org> writes: > > > On Mon, Jan 30, 2023 at 11:49 PM Mark Rutland <mark.rutland@arm.com> wrote: > >> > >> Hi Bjorn, > >> > >> On Mon, Jan 30, 2023 at 04:28:15PM +0100, Björn Töpel wrote: > >> > Guo Ren <guoren@kernel.org> writes: > >> > > >> > >> In the serie of RISCV OPTPROBES [1], it patches a long-jump instructions pair > >> > >> AUIPC/JALR in kernel text, so in order to ensure other CPUs does not execute > >> > >> in the instructions that will be modified, it is still need to stop other CPUs > >> > >> via patch_text API, or you have any better solution to achieve the purpose? > >> > > - The stop_machine is an expensive way all architectures should > >> > > avoid, and you could keep that in your OPTPROBES implementation files > >> > > with static functions. > >> > > - The stop_machine couldn't work with PREEMPTION, so your > >> > > implementation needs to work with !PREEMPTION. > >> > > >> > ...and stop_machine() with !PREEMPTION is broken as well, when you're > >> > replacing multiple instructions (see Mark's post at [1]). The > >> > stop_machine() dance might work when you're replacing *one* instruction, > >> > not multiple as in the RISC-V case. I'll expand on this in a comment in > >> > the OPTPROBES v6 series. > >> > >> Just to clarify, my comments in [1] were assuming that stop_machine() was not > >> used, in which case there is a problem with or without PREEMPTION. > >> > >> I believe that when using stop_machine(), the !PREEMPTION case is fine, since > >> stop_machine() schedules work rather than running work in IRQ context on the > >> back of an IPI, so no CPUs should be mid-sequnce during the patching, and it's > >> not possible for there to be threads which are preempted mid-sequence. > >> > >> That all said, IIUC optprobes is going to disappear once fprobe is ready > >> everywhere, so that might be moot. 
> > The optprobes could be in the middle of a function, but fprobe must be > > the entry of a function, right? > > > > Does your fprobe here mean: ? > > > > The Linux kernel configuration item CONFIG_FPROBE: > > > > prompt: Kernel Function Probe (fprobe) > > type: bool > > depends on: ( CONFIG_FUNCTION_TRACER ) && ( > > CONFIG_DYNAMIC_FTRACE_WITH_REGS ) && ( CONFIG_HAVE_RETHOOK ) > > defined in kernel/trace/Kconfig > > See the cover of [1]. It's about direct calls for BPF tracing (and more) > on Arm, and you're completly right, that it's *not* related to optprobes > at all. > > [1] https://lore.kernel.org/all/20221108220651.24492-1-revest@chromium.org/ Thx for sharing :)
On Tue, Jan 31, 2023 at 09:48:29AM +0800, Guo Ren wrote: > On Mon, Jan 30, 2023 at 11:49 PM Mark Rutland <mark.rutland@arm.com> wrote: > > > > Hi Bjorn, > > > > On Mon, Jan 30, 2023 at 04:28:15PM +0100, Björn Töpel wrote: > > > Guo Ren <guoren@kernel.org> writes: > > > > > > >> In the serie of RISCV OPTPROBES [1], it patches a long-jump instructions pair > > > >> AUIPC/JALR in kernel text, so in order to ensure other CPUs does not execute > > > >> in the instructions that will be modified, it is still need to stop other CPUs > > > >> via patch_text API, or you have any better solution to achieve the purpose? > > > > - The stop_machine is an expensive way all architectures should > > > > avoid, and you could keep that in your OPTPROBES implementation files > > > > with static functions. > > > > - The stop_machine couldn't work with PREEMPTION, so your > > > > implementation needs to work with !PREEMPTION. > > > > > > ...and stop_machine() with !PREEMPTION is broken as well, when you're > > > replacing multiple instructions (see Mark's post at [1]). The > > > stop_machine() dance might work when you're replacing *one* instruction, > > > not multiple as in the RISC-V case. I'll expand on this in a comment in > > > the OPTPROBES v6 series. > > > > Just to clarify, my comments in [1] were assuming that stop_machine() was not > > used, in which case there is a problem with or without PREEMPTION. > > > > I believe that when using stop_machine(), the !PREEMPTION case is fine, since > > stop_machine() schedules work rather than running work in IRQ context on the > > back of an IPI, so no CPUs should be mid-sequnce during the patching, and it's > > not possible for there to be threads which are preempted mid-sequence. > > > > That all said, IIUC optprobes is going to disappear once fprobe is ready > > everywhere, so that might be moot. > The optprobes could be in the middle of a function, but fprobe must be > the entry of a function, right? > > Does your fprobe here mean: ? 
> > The Linux kernel configuration item CONFIG_FPROBE: > > prompt: Kernel Function Probe (fprobe) > type: bool > depends on: ( CONFIG_FUNCTION_TRACER ) && ( > CONFIG_DYNAMIC_FTRACE_WITH_REGS ) && ( CONFIG_HAVE_RETHOOK ) > defined in kernel/trace/Kconfig Yes. Masami, Steve, and I had a chat at the tracing summit late last year (which, unfortunately, was not recorded), and what we'd like to do is get each architecture to have FPROBE (and FTRACE_WITH_ARGS), at which point OPTPROBE and KRETPROBE become redundant and could be removed. I.e., we'd keep KPROBES as a "you can trace any instruction" feature, but in the few cases where OPTPROBES can make things faster by using FTRACE, you should just use that directly via FPROBE. Thanks, Mark.
> > It's the concurrent modification that I was referring to (removing > > stop_machine()). You're saying "it'll always work", I'm saying "I'm not > > so sure". :-) E.g., writing c.ebreak on an 32b insn. Can you say that > Software must ensure write c.ebreak on the head of an 32b insn. > > That means IFU only see: > - c.ebreak + broken/illegal insn. > or > - origin insn > > Even in the worst case, such as IFU fetches instructions one by one: > If the IFU gets the origin insn, it will skip the broken/illegal insn. > If the IFU gets the c.ebreak + broken/illegal insn, then an ebreak > exception is raised. > > Because c.ebreak would raise an exception, I don't see any problem. That's the problem, this discussion is: Reviewer: "I'm not sure, that's not written in our spec" Submitter: "I said it, it's called -accurate atomicity-" Andrea
On Tue, Jan 31, 2023 at 6:57 PM Andrea Parri <parri.andrea@gmail.com> wrote: > > > > It's the concurrent modification that I was referring to (removing > > > stop_machine()). You're saying "it'll always work", I'm saying "I'm not > > > so sure". :-) E.g., writing c.ebreak on an 32b insn. Can you say that > > Software must ensure write c.ebreak on the head of an 32b insn. > > > > That means IFU only see: > > - c.ebreak + broken/illegal insn. > > or > > - origin insn > > > > Even in the worst case, such as IFU fetches instructions one by one: > > If the IFU gets the origin insn, it will skip the broken/illegal insn. > > If the IFU gets the c.ebreak + broken/illegal insn, then an ebreak > > exception is raised. > > > > Because c.ebreak would raise an exception, I don't see any problem. > > That's the problem, this discussion is: > > Reviewer: "I'm not sure, that's not written in our spec" > Submitter: "I said it, it's called -accurate atomicity-" I really don't see any hardware that could break the atomicity of this c.ebreak scenario: - c.ebreak on the head of a 32b insn - ebreak on an aligned 32b insn If the IFU fetches a whole cacheline, everything is atomic. If the IFU fetches 16 bits at a time, the first c.ebreak would raise an exception and skip the next broken/illegal instruction. Even if the IFU fetches in no particular order, the IDU must decode one by one, right? The first-half c.ebreak would protect and prevent the next broken/illegal instruction. Speculative execution of a broken/illegal instruction won't cause any exceptions. It's a common issue, not a specific ISA issue. 32b instruction A -> 16b ebreak + 16b broken/illegal -> 32b instruction A. It's safe to transform. > > Andrea
Guo Ren <guoren@kernel.org> writes: > On Tue, Jan 31, 2023 at 6:57 PM Andrea Parri <parri.andrea@gmail.com> wrote: >> >> > > It's the concurrent modification that I was referring to (removing >> > > stop_machine()). You're saying "it'll always work", I'm saying "I'm not >> > > so sure". :-) E.g., writing c.ebreak on an 32b insn. Can you say that >> > Software must ensure write c.ebreak on the head of an 32b insn. >> > >> > That means IFU only see: >> > - c.ebreak + broken/illegal insn. >> > or >> > - origin insn >> > >> > Even in the worst case, such as IFU fetches instructions one by one: >> > If the IFU gets the origin insn, it will skip the broken/illegal insn. >> > If the IFU gets the c.ebreak + broken/illegal insn, then an ebreak >> > exception is raised. >> > >> > Because c.ebreak would raise an exception, I don't see any problem. >> >> That's the problem, this discussion is: >> >> Reviewer: "I'm not sure, that's not written in our spec" >> Submitter: "I said it, it's called -accurate atomicity-" > I really don't see any hardware that could break the atomicity of this > c.ebreak scenario: > - c.ebreak on the head of 32b insn > - ebreak on an aligned 32b insn > > If IFU fetches with cacheline, all is atomicity. > If IFU fetches with 16bit one by one, the first c.ebreak would raise > an exception and skip the next broke/illegal instruction. > Even if IFU fetches without any sequence, the IDU must decode one by > one, right? The first half c.ebreak would protect and prevent the next > broke/illegal instruction. Speculative execution on broke/illegal > instruction won't cause any exceptions. > > It's a common issue, not a specific ISA issue. > 32b instruction A -> 16b ebreak + 16b broken/illegal -> 32b > instruction A. It's safe to transform. Waking up this thread again, now that Changbin has showed some interest from another thread [1]. Guo, we can't really add your patches, and claim that they're generic, "works on all" RISC-V systems. 
While it might work for your I/D coherent system, that does not imply that it'll work on all platforms. RISC-V allows for implementations that are I/D incoherent, and here your IFU-implementation arguments do not hold. I'd really recommend reading up on [2]. Now how could we move on with your patches? Get it in a spec, or fold the patches in as a Kconfig.socs-thing for the platforms where this is OK. What are your thoughts on the latter? Björn [1] https://lore.kernel.org/linux-riscv/20230215034532.xs726l7mp6xlnkdf@M910t/ [2] https://github.com/riscv/riscv-j-extension/blob/master/id-consistency-proposal.pdf
Hi, Sorry I missed this thread. On Tue, 31 Jan 2023 10:33:05 +0000 Mark Rutland <mark.rutland@arm.com> wrote: > On Tue, Jan 31, 2023 at 09:48:29AM +0800, Guo Ren wrote: > > On Mon, Jan 30, 2023 at 11:49 PM Mark Rutland <mark.rutland@arm.com> wrote: > > > > > > Hi Bjorn, > > > > > > On Mon, Jan 30, 2023 at 04:28:15PM +0100, Björn Töpel wrote: > > > > Guo Ren <guoren@kernel.org> writes: > > > > > > > > >> In the serie of RISCV OPTPROBES [1], it patches a long-jump instructions pair > > > > >> AUIPC/JALR in kernel text, so in order to ensure other CPUs does not execute > > > > >> in the instructions that will be modified, it is still need to stop other CPUs > > > > >> via patch_text API, or you have any better solution to achieve the purpose? > > > > > - The stop_machine is an expensive way all architectures should > > > > > avoid, and you could keep that in your OPTPROBES implementation files > > > > > with static functions. > > > > > - The stop_machine couldn't work with PREEMPTION, so your > > > > > implementation needs to work with !PREEMPTION. > > > > > > > > ...and stop_machine() with !PREEMPTION is broken as well, when you're > > > > replacing multiple instructions (see Mark's post at [1]). The > > > > stop_machine() dance might work when you're replacing *one* instruction, > > > > not multiple as in the RISC-V case. I'll expand on this in a comment in > > > > the OPTPROBES v6 series. > > > > > > Just to clarify, my comments in [1] were assuming that stop_machine() was not > > > used, in which case there is a problem with or without PREEMPTION. > > > > > > I believe that when using stop_machine(), the !PREEMPTION case is fine, since > > > stop_machine() schedules work rather than running work in IRQ context on the > > > back of an IPI, so no CPUs should be mid-sequnce during the patching, and it's > > > not possible for there to be threads which are preempted mid-sequence. 
> > >
> > > That all said, IIUC optprobes is going to disappear once fprobe is ready
> > > everywhere, so that might be moot.
> > The optprobes could be in the middle of a function, but fprobe must be
> > the entry of a function, right?
> >
> > Does your fprobe here mean: ?
> >
> > The Linux kernel configuration item CONFIG_FPROBE:
> >
> > prompt: Kernel Function Probe (fprobe)
> > type: bool
> > depends on: ( CONFIG_FUNCTION_TRACER ) && (
> > CONFIG_DYNAMIC_FTRACE_WITH_REGS ) && ( CONFIG_HAVE_RETHOOK )
> > defined in kernel/trace/Kconfig
>
> Yes.
>
> Masami, Steve, and I had a chat at the tracing summit late last year (which
> unfortunately, was not recorded), and what we'd like to do is get each
> architecture to have FPROBE (and FTRACE_WITH_ARGS), at which point OPTPROBE
> and KRETPROBE become redundant and could be removed.

No, fprobe will replace KRETPROBE but not OPTPROBE. OPTPROBE is a completely different one: fprobe is used only for function entry, but optprobe is applied to the function body.

> i.e. we'd keep KPROBES as a "you can trace any instruction" feature, but in the
> few cases where OPTPROBES can make things fater by using FTRACE, you should
> just use that directly via FPROBE.

I think what you are saying is KPROBE_ON_FTRACE, and that will be replaced by FPROBES.

Thank you,

> Thanks,
> Mark.
On Thu, 26 Jan 2023 11:15:59 -0500
guoren@kernel.org wrote:

> From: Guo Ren <guoren@linux.alibaba.com>
>
> The previous implementation was based on the stop_matchine mechanism,
> which reduced the speed of arm/disarm_kprobe. Using minimum ebreak
> instruction would get accurate atomicity.
>
> This patch removes the patch_text of riscv, which is based on
> stop_machine. Then riscv only reserved patch_text_nosync, and developers
> need to be more careful in dealing with patch_text atomicity.
>
> When CONFIG_RISCV_ISA_C=n, the ebreak could replace the whole
> instruction. When CONFIG_RISCV_ISA_C=y, the patch uses 16-bit length
> c.ebreak instruction, which may occupy the first part of the 32-bit
> instruction and leave half the rest of the broken instruction. Because
> ebreak could detour the flow to skip it, leaving it in the kernel text
> memory is okay.
>
> Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
> Signed-off-by: Guo Ren <guoren@kernel.org>

I'm not sure how the RISC-V specification ensures this type of self-modifying code. But if you think calling stop_machine() for *each* probe arm/disarm is slow, there may be another way to avoid it: introducing a batch arming interface too (a reserve-commit way).

BTW, for the BPF use case, which is usually only for function entry/exit, we will use fprobes. Since that will use the ftrace batch text patching, I think we already avoid such a slowdown.

FYI, for the ftrace dynamic event use case, there is another reason the enable/disable of a dynamic event itself is slow: to sync up the event-enabled status and ensure all event handlers have finished, it waits for an RCU sync on each operation.
Thank you,

> ---
>  arch/riscv/include/asm/patch.h     |  1 -
>  arch/riscv/kernel/patch.c          | 33 ------------------------------
>  arch/riscv/kernel/probes/kprobes.c | 29 ++++++++++++++++++--------
>  3 files changed, 21 insertions(+), 42 deletions(-)
>
> diff --git a/arch/riscv/include/asm/patch.h b/arch/riscv/include/asm/patch.h
> index 9a7d7346001e..2500782e6f5b 100644
> --- a/arch/riscv/include/asm/patch.h
> +++ b/arch/riscv/include/asm/patch.h
> @@ -7,6 +7,5 @@
>  #define _ASM_RISCV_PATCH_H
>
>  int patch_text_nosync(void *addr, const void *insns, size_t len);
> -int patch_text(void *addr, u32 insn);
>
>  #endif /* _ASM_RISCV_PATCH_H */
> diff --git a/arch/riscv/kernel/patch.c b/arch/riscv/kernel/patch.c
> index 765004b60513..8bd51ed8b806 100644
> --- a/arch/riscv/kernel/patch.c
> +++ b/arch/riscv/kernel/patch.c
> @@ -98,36 +98,3 @@ int patch_text_nosync(void *addr, const void *insns, size_t len)
>  	return ret;
>  }
>  NOKPROBE_SYMBOL(patch_text_nosync);
> -
> -static int patch_text_cb(void *data)
> -{
> -	struct patch_insn *patch = data;
> -	int ret = 0;
> -
> -	if (atomic_inc_return(&patch->cpu_count) == num_online_cpus()) {
> -		ret =
> -		    patch_text_nosync(patch->addr, &patch->insn,
> -				      GET_INSN_LENGTH(patch->insn));
> -		atomic_inc(&patch->cpu_count);
> -	} else {
> -		while (atomic_read(&patch->cpu_count) <= num_online_cpus())
> -			cpu_relax();
> -		smp_mb();
> -	}
> -
> -	return ret;
> -}
> -NOKPROBE_SYMBOL(patch_text_cb);
> -
> -int patch_text(void *addr, u32 insn)
> -{
> -	struct patch_insn patch = {
> -		.addr = addr,
> -		.insn = insn,
> -		.cpu_count = ATOMIC_INIT(0),
> -	};
> -
> -	return stop_machine_cpuslocked(patch_text_cb,
> -				       &patch, cpu_online_mask);
> -}
> -NOKPROBE_SYMBOL(patch_text);
> diff --git a/arch/riscv/kernel/probes/kprobes.c b/arch/riscv/kernel/probes/kprobes.c
> index 475989f06d6d..27f8960c321c 100644
> --- a/arch/riscv/kernel/probes/kprobes.c
> +++ b/arch/riscv/kernel/probes/kprobes.c
> @@ -24,12 +24,18 @@ post_kprobe_handler(struct kprobe *, struct kprobe_ctlblk *, struct pt_regs *);
>  static void __kprobes arch_prepare_ss_slot(struct kprobe *p)
>  {
>  	unsigned long offset = GET_INSN_LENGTH(p->opcode);
> +#ifdef CONFIG_RISCV_ISA_C
> +	u32 opcode = __BUG_INSN_16;
> +#else
> +	u32 opcode = __BUG_INSN_32;
> +#endif
>
>  	p->ainsn.api.restore = (unsigned long)p->addr + offset;
>
> -	patch_text(p->ainsn.api.insn, p->opcode);
> -	patch_text((void *)((unsigned long)(p->ainsn.api.insn) + offset),
> -		   __BUG_INSN_32);
> +	patch_text_nosync(p->ainsn.api.insn, &p->opcode, offset);
> +	patch_text_nosync((void *)((unsigned long)(p->ainsn.api.insn) + offset),
> +			  &opcode, GET_INSN_LENGTH(opcode));
> +
>  }
>
>  static void __kprobes arch_prepare_simulate(struct kprobe *p)
> @@ -114,16 +120,23 @@ void *alloc_insn_page(void)
>  /* install breakpoint in text */
>  void __kprobes arch_arm_kprobe(struct kprobe *p)
>  {
> -	if ((p->opcode & __INSN_LENGTH_MASK) == __INSN_LENGTH_32)
> -		patch_text(p->addr, __BUG_INSN_32);
> -	else
> -		patch_text(p->addr, __BUG_INSN_16);
> +#ifdef CONFIG_RISCV_ISA_C
> +	u32 opcode = __BUG_INSN_16;
> +#else
> +	u32 opcode = __BUG_INSN_32;
> +#endif
> +	patch_text_nosync(p->addr, &opcode, GET_INSN_LENGTH(opcode));
>  }
>
>  /* remove breakpoint from text */
>  void __kprobes arch_disarm_kprobe(struct kprobe *p)
>  {
> -	patch_text(p->addr, p->opcode);
> +#ifdef CONFIG_RISCV_ISA_C
> +	u32 opcode = __BUG_INSN_16;
> +#else
> +	u32 opcode = __BUG_INSN_32;
> +#endif
> +	patch_text_nosync(p->addr, &p->opcode, GET_INSN_LENGTH(opcode));
> }
>
>  void __kprobes arch_remove_kprobe(struct kprobe *p)
> --
> 2.36.1
>
On Thu, Feb 16, 2023 at 3:54 PM Björn Töpel <bjorn@kernel.org> wrote: > > Guo Ren <guoren@kernel.org> writes: > > > On Tue, Jan 31, 2023 at 6:57 PM Andrea Parri <parri.andrea@gmail.com> wrote: > >> > >> > > It's the concurrent modification that I was referring to (removing > >> > > stop_machine()). You're saying "it'll always work", I'm saying "I'm not > >> > > so sure". :-) E.g., writing c.ebreak on an 32b insn. Can you say that > >> > Software must ensure write c.ebreak on the head of an 32b insn. > >> > > >> > That means IFU only see: > >> > - c.ebreak + broken/illegal insn. > >> > or > >> > - origin insn > >> > > >> > Even in the worst case, such as IFU fetches instructions one by one: > >> > If the IFU gets the origin insn, it will skip the broken/illegal insn. > >> > If the IFU gets the c.ebreak + broken/illegal insn, then an ebreak > >> > exception is raised. > >> > > >> > Because c.ebreak would raise an exception, I don't see any problem. > >> > >> That's the problem, this discussion is: > >> > >> Reviewer: "I'm not sure, that's not written in our spec" > >> Submitter: "I said it, it's called -accurate atomicity-" > > I really don't see any hardware that could break the atomicity of this > > c.ebreak scenario: > > - c.ebreak on the head of 32b insn > > - ebreak on an aligned 32b insn > > > > If IFU fetches with cacheline, all is atomicity. > > If IFU fetches with 16bit one by one, the first c.ebreak would raise > > an exception and skip the next broke/illegal instruction. > > Even if IFU fetches without any sequence, the IDU must decode one by > > one, right? The first half c.ebreak would protect and prevent the next > > broke/illegal instruction. Speculative execution on broke/illegal > > instruction won't cause any exceptions. > > > > It's a common issue, not a specific ISA issue. > > 32b instruction A -> 16b ebreak + 16b broken/illegal -> 32b > > instruction A. It's safe to transform. 
> >
> > Waking up this thread again, now that Changbin has showed some interest
> > from another thread [1].
> >
> > Guo, we can't really add your patches, and claim that they're generic,
> > "works on all" RISC-V systems. While it might work for your I/D coherent
> > system, that does not imply that it'll work on all platforms. RISC-V
> > allows for implementations that are I/D incoherent, and here your
> > IFU-implementations arguments do not hold. I'd really recommend to
> > readup on [2].

Sorry, [2] isn't related to this patch. This patch doesn't have an I/D incoherence problem, because we broadcast the IPI fence.i in patch_text_nosync.

Compared to that, the stop_machine version has a crazy nested IPI broadcast cost:
stop_machine -> patch_text_nosync -> flush_icache_all

void flush_icache_all(void)
{
	local_flush_icache_all();

	if (IS_ENABLED(CONFIG_RISCV_SBI))
		sbi_remote_fence_i(NULL);
	else
		on_each_cpu(ipi_remote_fence_i, NULL, 1);
}
EXPORT_SYMBOL(flush_icache_all);

> Now how could we move on with your patches? Get it in a spec, or fold
> the patches in as a Kconfig.socs-thing for the platforms where this is
> OK. What are you thoughts on the latter?

I didn't talk about I/D incoherent/coherent; what I'm talking about is the minimum size of an instruction fetch element. In an I/D cache system, why couldn't an LSU store-half guarantee atomicity with respect to I-cache fetch? How could the I-cache fetch only one byte of that store-half value? We've already assumed this guarantee in the riscv jump_label implementation, so why couldn't this patch?

>
> Björn
>
> [1] https://lore.kernel.org/linux-riscv/20230215034532.xs726l7mp6xlnkdf@M910t/
> [2] https://github.com/riscv/riscv-j-extension/blob/master/id-consistency-proposal.pdf
Guo Ren <guoren@kernel.org> writes: > On Thu, Feb 16, 2023 at 3:54 PM Björn Töpel <bjorn@kernel.org> wrote: >> >> Guo Ren <guoren@kernel.org> writes: >> >> > On Tue, Jan 31, 2023 at 6:57 PM Andrea Parri <parri.andrea@gmail.com> wrote: >> >> >> >> > > It's the concurrent modification that I was referring to (removing >> >> > > stop_machine()). You're saying "it'll always work", I'm saying "I'm not >> >> > > so sure". :-) E.g., writing c.ebreak on an 32b insn. Can you say that >> >> > Software must ensure write c.ebreak on the head of an 32b insn. >> >> > >> >> > That means IFU only see: >> >> > - c.ebreak + broken/illegal insn. >> >> > or >> >> > - origin insn >> >> > >> >> > Even in the worst case, such as IFU fetches instructions one by one: >> >> > If the IFU gets the origin insn, it will skip the broken/illegal insn. >> >> > If the IFU gets the c.ebreak + broken/illegal insn, then an ebreak >> >> > exception is raised. >> >> > >> >> > Because c.ebreak would raise an exception, I don't see any problem. >> >> >> >> That's the problem, this discussion is: >> >> >> >> Reviewer: "I'm not sure, that's not written in our spec" >> >> Submitter: "I said it, it's called -accurate atomicity-" >> > I really don't see any hardware that could break the atomicity of this >> > c.ebreak scenario: >> > - c.ebreak on the head of 32b insn >> > - ebreak on an aligned 32b insn >> > >> > If IFU fetches with cacheline, all is atomicity. >> > If IFU fetches with 16bit one by one, the first c.ebreak would raise >> > an exception and skip the next broke/illegal instruction. >> > Even if IFU fetches without any sequence, the IDU must decode one by >> > one, right? The first half c.ebreak would protect and prevent the next >> > broke/illegal instruction. Speculative execution on broke/illegal >> > instruction won't cause any exceptions. >> > >> > It's a common issue, not a specific ISA issue. >> > 32b instruction A -> 16b ebreak + 16b broken/illegal -> 32b >> > instruction A. 
It's safe to transform.
>>
>> Waking up this thread again, now that Changbin has showed some interest
>> from another thread [1].
>>
>> Guo, we can't really add your patches, and claim that they're generic,
>> "works on all" RISC-V systems. While it might work for your I/D coherent
>> system, that does not imply that it'll work on all platforms. RISC-V
>> allows for implementations that are I/D incoherent, and here your
>> IFU-implementations arguments do not hold. I'd really recommend to
>> readup on [2].
> Sorry, [2] isn't related to this patch.

Well, it is. Pages 44 and 98. You are changing an instruction that the processor potentially fetches and executes, in an instruction storage which has not been made consistent with data storage.

> This patch didn't have I/D incoherent problem because we broadcast the
> IPI fence.i in patch_text_nosync.

After the modification, yes.

> Compared to the stop_machine version, there is a crazy nested IPI
> broadcast cost.
> stop_machine -> patch_text_nosync -> flush_icache_all
> void flush_icache_all(void)
> {
>         local_flush_icache_all();
>
>         if (IS_ENABLED(CONFIG_RISCV_SBI))
>                 sbi_remote_fence_i(NULL);
>         else
>                 on_each_cpu(ipi_remote_fence_i, NULL, 1);
> }
> EXPORT_SYMBOL(flush_icache_all);

Look, I'd love to have your patch in *if we could say that it works on all RISC-V platforms*. If everyone agrees that "Guo's approach works" -- I'd be a happy person. I hate the stop_machine flow as much as the next guy. I want a better mechanism in as well. What I'm saying is:

There's no specification for this. What assumptions can be made? The fact that Intel, Arm, and Power have this explicitly stated in their specs hints that this is something to pay attention to. Undefined behavior is no fun to debug.

You seem very confident that it's impossible to construct hardware where your approach does not work.

If there's consensus that your approach is "good enough" -- hey, that'd be great! Get it in!
>> Now how could we move on with your patches? Get it in a spec, or fold
>> the patches in as a Kconfig.socs-thing for the platforms where this is
>> OK. What are you thoughts on the latter?
>
> I didn't talk about I/D incoherent/coherent; what I say is the basic
> size of the instruction element.
> In an I/D cache system, why couldn't LSU store-half guarantee
> atomicity for I-cache fetch? How I-cache could fetch only one byte of
> that Store-half value?
> We've assumed this guarantee in the riscv jump_label implementation,
> so why not this patch couldn't?

Which is a good point. Is that working on all existing implementations? Is that assumption correct? We've seen issues with that approach in *simulation*, where you/Andy fixed it:
https://lore.kernel.org/linux-riscv/20230206090440.1255001-1-guoren@kernel.org/

Maybe the RISC-V kernel's assumptions about text patching should just be documented, stating "this is what the kernel does, and what it assumes about the execution environment -- if your hardware doesn't work with that ¯\_(ツ)_/¯".

What are your thoughts on this, Guo? You don't seem to share my concerns. Or is it better to go for your path (patches!), and simply change it if there are issues down the line?
On Fri, Feb 17, 2023 at 12:23:51AM +0900, Masami Hiramatsu wrote: > On Tue, 31 Jan 2023 10:33:05 +0000 > Mark Rutland <mark.rutland@arm.com> wrote: > > > On Tue, Jan 31, 2023 at 09:48:29AM +0800, Guo Ren wrote: > > > On Mon, Jan 30, 2023 at 11:49 PM Mark Rutland <mark.rutland@arm.com> wrote: > > Masami, Steve, and I had a chat at the tracing summit late last year (which > > unfortunately, was not recorded), and what we'd like to do is get each > > architecture to have FPROBE (and FTRACE_WITH_ARGS), at which point OPTPROBE > > and KRETPROBE become redundant and could be removed. > > No, the fprobe will replace the KRETPROBE but not OPTPROBE. The OPTPROBE > is completely different one. Fprobe is used only for function entry, but > optprobe is applied to the function body. Sorry, I had OPTPROBE and KPROBE_ON_FTRACE confused in my head, and was thinking that FPROBE would supersede KPROBE_ON_FTRACE and KRETPROBE. > > i.e. we'd keep KPROBES as a "you can trace any instruction" feature, but in the > > few cases where OPTPROBES can make things fater by using FTRACE, you should > > just use that directly via FPROBE. > > I think what you are saying is KPROBE_ON_FTRACE, and that will be replaced by > FPROBES. Yes, sorry for the confusion. Mark.
On Thu, Feb 16, 2023 at 11:42 PM Masami Hiramatsu <mhiramat@kernel.org> wrote:
>
> On Thu, 26 Jan 2023 11:15:59 -0500
> guoren@kernel.org wrote:
>
> > From: Guo Ren <guoren@linux.alibaba.com>
> >
> > The previous implementation was based on the stop_matchine mechanism,
> > which reduced the speed of arm/disarm_kprobe. Using minimum ebreak
> > instruction would get accurate atomicity.
> >
> > This patch removes the patch_text of riscv, which is based on
> > stop_machine. Then riscv only reserved patch_text_nosync, and developers
> > need to be more careful in dealing with patch_text atomicity.
> >
> > When CONFIG_RISCV_ISA_C=n, the ebreak could replace the whole
> > instruction. When CONFIG_RISCV_ISA_C=y, the patch uses 16-bit length
> > c.ebreak instruction, which may occupy the first part of the 32-bit
> > instruction and leave half the rest of the broken instruction. Because
> > ebreak could detour the flow to skip it, leaving it in the kernel text
> > memory is okay.
> >
> > Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
> > Signed-off-by: Guo Ren <guoren@kernel.org>
>
> I'm not sure how the RISCV specification ensures this type of self
> code modification. But if you think calling the stop_machine() for
> *each* probe arm/disarm is slow, there may be another way to avoid
> ot by introducing a batch arming interface too. (reserve-commit way)

Could you give a link describing the "reserve-commit way"? I can't quite make sense of it; thank you.

> BTW, for the BPF usecase which is usually only for function
> entry/exit, we will use Fprobes. Since that will use ftrace batch
> text patching, I think we already avoid such slowdown.
>
> FYI, for ftrace dynamic event usecase, there is another reason to slow
> down the enable/disable dynamic event itself (to sync up the event
> enabled status to ensure all event handler has been done, it waits
> for rcu-sync for each operation.)

Once that is done, the ftrace common code will guarantee all events are done.
Do you know if anyone has started this work? - "sync up the event enabled status to ensure all event handler has been done". > > Thank you, > > > --- > > arch/riscv/include/asm/patch.h | 1 - > > arch/riscv/kernel/patch.c | 33 ------------------------------ > > arch/riscv/kernel/probes/kprobes.c | 29 ++++++++++++++++++-------- > > 3 files changed, 21 insertions(+), 42 deletions(-) > > > > diff --git a/arch/riscv/include/asm/patch.h b/arch/riscv/include/asm/patch.h > > index 9a7d7346001e..2500782e6f5b 100644 > > --- a/arch/riscv/include/asm/patch.h > > +++ b/arch/riscv/include/asm/patch.h > > @@ -7,6 +7,5 @@ > > #define _ASM_RISCV_PATCH_H > > > > int patch_text_nosync(void *addr, const void *insns, size_t len); > > -int patch_text(void *addr, u32 insn); > > > > #endif /* _ASM_RISCV_PATCH_H */ > > diff --git a/arch/riscv/kernel/patch.c b/arch/riscv/kernel/patch.c > > index 765004b60513..8bd51ed8b806 100644 > > --- a/arch/riscv/kernel/patch.c > > +++ b/arch/riscv/kernel/patch.c > > @@ -98,36 +98,3 @@ int patch_text_nosync(void *addr, const void *insns, size_t len) > > return ret; > > } > > NOKPROBE_SYMBOL(patch_text_nosync); > > - > > -static int patch_text_cb(void *data) > > -{ > > - struct patch_insn *patch = data; > > - int ret = 0; > > - > > - if (atomic_inc_return(&patch->cpu_count) == num_online_cpus()) { > > - ret = > > - patch_text_nosync(patch->addr, &patch->insn, > > - GET_INSN_LENGTH(patch->insn)); > > - atomic_inc(&patch->cpu_count); > > - } else { > > - while (atomic_read(&patch->cpu_count) <= num_online_cpus()) > > - cpu_relax(); > > - smp_mb(); > > - } > > - > > - return ret; > > -} > > -NOKPROBE_SYMBOL(patch_text_cb); > > - > > -int patch_text(void *addr, u32 insn) > > -{ > > - struct patch_insn patch = { > > - .addr = addr, > > - .insn = insn, > > - .cpu_count = ATOMIC_INIT(0), > > - }; > > - > > - return stop_machine_cpuslocked(patch_text_cb, > > - &patch, cpu_online_mask); > > -} > > -NOKPROBE_SYMBOL(patch_text); > > diff --git 
a/arch/riscv/kernel/probes/kprobes.c b/arch/riscv/kernel/probes/kprobes.c > > index 475989f06d6d..27f8960c321c 100644 > > --- a/arch/riscv/kernel/probes/kprobes.c > > +++ b/arch/riscv/kernel/probes/kprobes.c > > @@ -24,12 +24,18 @@ post_kprobe_handler(struct kprobe *, struct kprobe_ctlblk *, struct pt_regs *); > > static void __kprobes arch_prepare_ss_slot(struct kprobe *p) > > { > > unsigned long offset = GET_INSN_LENGTH(p->opcode); > > +#ifdef CONFIG_RISCV_ISA_C > > + u32 opcode = __BUG_INSN_16; > > +#else > > + u32 opcode = __BUG_INSN_32; > > +#endif > > > > p->ainsn.api.restore = (unsigned long)p->addr + offset; > > > > - patch_text(p->ainsn.api.insn, p->opcode); > > - patch_text((void *)((unsigned long)(p->ainsn.api.insn) + offset), > > - __BUG_INSN_32); > > + patch_text_nosync(p->ainsn.api.insn, &p->opcode, offset); > > + patch_text_nosync((void *)((unsigned long)(p->ainsn.api.insn) + offset), > > + &opcode, GET_INSN_LENGTH(opcode)); > > + > > } > > > > static void __kprobes arch_prepare_simulate(struct kprobe *p) > > @@ -114,16 +120,23 @@ void *alloc_insn_page(void) > > /* install breakpoint in text */ > > void __kprobes arch_arm_kprobe(struct kprobe *p) > > { > > - if ((p->opcode & __INSN_LENGTH_MASK) == __INSN_LENGTH_32) > > - patch_text(p->addr, __BUG_INSN_32); > > - else > > - patch_text(p->addr, __BUG_INSN_16); > > +#ifdef CONFIG_RISCV_ISA_C > > + u32 opcode = __BUG_INSN_16; > > +#else > > + u32 opcode = __BUG_INSN_32; > > +#endif > > + patch_text_nosync(p->addr, &opcode, GET_INSN_LENGTH(opcode)); > > } > > > > /* remove breakpoint from text */ > > void __kprobes arch_disarm_kprobe(struct kprobe *p) > > { > > - patch_text(p->addr, p->opcode); > > +#ifdef CONFIG_RISCV_ISA_C > > + u32 opcode = __BUG_INSN_16; > > +#else > > + u32 opcode = __BUG_INSN_32; > > +#endif > > + patch_text_nosync(p->addr, &p->opcode, GET_INSN_LENGTH(opcode)); > > } > > > > void __kprobes arch_remove_kprobe(struct kprobe *p) > > -- > > 2.36.1 > > > > > -- > Masami Hiramatsu 
(Google) <mhiramat@kernel.org>
On Thu, Feb 16, 2023 at 11:23 PM Masami Hiramatsu <mhiramat@kernel.org> wrote: > > Hi, > > Sorry I missed this thread. > > On Tue, 31 Jan 2023 10:33:05 +0000 > Mark Rutland <mark.rutland@arm.com> wrote: > > > On Tue, Jan 31, 2023 at 09:48:29AM +0800, Guo Ren wrote: > > > On Mon, Jan 30, 2023 at 11:49 PM Mark Rutland <mark.rutland@arm.com> wrote: > > > > > > > > Hi Bjorn, > > > > > > > > On Mon, Jan 30, 2023 at 04:28:15PM +0100, Björn Töpel wrote: > > > > > Guo Ren <guoren@kernel.org> writes: > > > > > > > > > > >> In the serie of RISCV OPTPROBES [1], it patches a long-jump instructions pair > > > > > >> AUIPC/JALR in kernel text, so in order to ensure other CPUs does not execute > > > > > >> in the instructions that will be modified, it is still need to stop other CPUs > > > > > >> via patch_text API, or you have any better solution to achieve the purpose? > > > > > > - The stop_machine is an expensive way all architectures should > > > > > > avoid, and you could keep that in your OPTPROBES implementation files > > > > > > with static functions. > > > > > > - The stop_machine couldn't work with PREEMPTION, so your > > > > > > implementation needs to work with !PREEMPTION. > > > > > > > > > > ...and stop_machine() with !PREEMPTION is broken as well, when you're > > > > > replacing multiple instructions (see Mark's post at [1]). The > > > > > stop_machine() dance might work when you're replacing *one* instruction, > > > > > not multiple as in the RISC-V case. I'll expand on this in a comment in > > > > > the OPTPROBES v6 series. > > > > > > > > Just to clarify, my comments in [1] were assuming that stop_machine() was not > > > > used, in which case there is a problem with or without PREEMPTION. 
> > > > > > > > I believe that when using stop_machine(), the !PREEMPTION case is fine, since > > > > stop_machine() schedules work rather than running work in IRQ context on the > > > > back of an IPI, so no CPUs should be mid-sequnce during the patching, and it's > > > > not possible for there to be threads which are preempted mid-sequence. > > > > > > > > That all said, IIUC optprobes is going to disappear once fprobe is ready > > > > everywhere, so that might be moot. > > > The optprobes could be in the middle of a function, but fprobe must be > > > the entry of a function, right? > > > > > > Does your fprobe here mean: ? > > > > > > The Linux kernel configuration item CONFIG_FPROBE: > > > > > > prompt: Kernel Function Probe (fprobe) > > > type: bool > > > depends on: ( CONFIG_FUNCTION_TRACER ) && ( > > > CONFIG_DYNAMIC_FTRACE_WITH_REGS ) && ( CONFIG_HAVE_RETHOOK ) > > > defined in kernel/trace/Kconfig > > > > Yes. > > > > Masami, Steve, and I had a chat at the tracing summit late last year (which > > unfortunately, was not recorded), and what we'd like to do is get each > > architecture to have FPROBE (and FTRACE_WITH_ARGS), at which point OPTPROBE > > and KRETPROBE become redundant and could be removed. > > No, the fprobe will replace the KRETPROBE but not OPTPROBE. The OPTPROBE > is completely different one. Fprobe is used only for function entry, but > optprobe is applied to the function body. > > > > > i.e. we'd keep KPROBES as a "you can trace any instruction" feature, but in the > > few cases where OPTPROBES can make things fater by using FTRACE, you should > > just use that directly via FPROBE. > > I think what you are saying is KPROBE_ON_FTRACE, and that will be replaced by > FPROBES. I'm sorry, I'm not sure how FPROBES could replace KPROBE_ON_FTRACE? Do you have some discussion on it? > > Thank you, > > > > > Thanks, > > Mark. > > > -- > Masami Hiramatsu (Google) <mhiramat@kernel.org>
On Fri, Feb 17, 2023 at 3:32 PM Björn Töpel <bjorn@kernel.org> wrote: > > Guo Ren <guoren@kernel.org> writes: > > > On Thu, Feb 16, 2023 at 3:54 PM Björn Töpel <bjorn@kernel.org> wrote: > >> > >> Guo Ren <guoren@kernel.org> writes: > >> > >> > On Tue, Jan 31, 2023 at 6:57 PM Andrea Parri <parri.andrea@gmail.com> wrote: > >> >> > >> >> > > It's the concurrent modification that I was referring to (removing > >> >> > > stop_machine()). You're saying "it'll always work", I'm saying "I'm not > >> >> > > so sure". :-) E.g., writing c.ebreak on an 32b insn. Can you say that > >> >> > Software must ensure write c.ebreak on the head of an 32b insn. > >> >> > > >> >> > That means IFU only see: > >> >> > - c.ebreak + broken/illegal insn. > >> >> > or > >> >> > - origin insn > >> >> > > >> >> > Even in the worst case, such as IFU fetches instructions one by one: > >> >> > If the IFU gets the origin insn, it will skip the broken/illegal insn. > >> >> > If the IFU gets the c.ebreak + broken/illegal insn, then an ebreak > >> >> > exception is raised. > >> >> > > >> >> > Because c.ebreak would raise an exception, I don't see any problem. > >> >> > >> >> That's the problem, this discussion is: > >> >> > >> >> Reviewer: "I'm not sure, that's not written in our spec" > >> >> Submitter: "I said it, it's called -accurate atomicity-" > >> > I really don't see any hardware that could break the atomicity of this > >> > c.ebreak scenario: > >> > - c.ebreak on the head of 32b insn > >> > - ebreak on an aligned 32b insn > >> > > >> > If IFU fetches with cacheline, all is atomicity. > >> > If IFU fetches with 16bit one by one, the first c.ebreak would raise > >> > an exception and skip the next broke/illegal instruction. > >> > Even if IFU fetches without any sequence, the IDU must decode one by > >> > one, right? The first half c.ebreak would protect and prevent the next > >> > broke/illegal instruction. Speculative execution on broke/illegal > >> > instruction won't cause any exceptions. 
> >> > > >> > It's a common issue, not a specific ISA issue. > >> > 32b instruction A -> 16b ebreak + 16b broken/illegal -> 32b > >> > instruction A. It's safe to transform. > >> > >> Waking up this thread again, now that Changbin has showed some interest > >> from another thread [1]. > >> > >> Guo, we can't really add your patches, and claim that they're generic, > >> "works on all" RISC-V systems. While it might work for your I/D coherent > >> system, that does not imply that it'll work on all platforms. RISC-V > >> allows for implementations that are I/D incoherent, and here your > >> IFU-implementations arguments do not hold. I'd really recommend to > >> readup on [2]. > > Sorry, [2] isn't related to this patch. > > Well, it is. Page 44 and 98. You are changing an instruction, that > potentially the processor fetches and executes, from an instruction > storage which has not been made consistent with data storage. Page 44 describes how two ST sequences affect IFU fetch, it is not related to this patch (We only use one IALIGN ebreak). Page 98 is a big topic and ignores the minimal fetch element size. (This material also agrees that the IFU should follow ISA minimal instruction size to fetch instructions from memory.) If you want to add something in the ISA spec for this patch. I think it should be at the beginning of the ISA spec: --- diff --git a/src/intro.tex b/src/intro.tex index 7a74ab7..4d353ee 100644 --- a/src/intro.tex +++ b/src/intro.tex @@ -467,7 +467,8 @@ We use the term IALIGN (measured in bits) to refer to the instruction-address alignment constraint the implementation enforces. IALIGN is 32 bits in the base ISA, but some ISA extensions, including the compressed ISA extension, relax IALIGN to 16 bits. IALIGN may not take on any value other than 16 or -32. +32. The Instruction Fetch Unit must fetch memory in IALGN, which means IFU +doesn't support misaligned fetch. 
 
 We use the term ILEN (measured in bits) to refer to the maximum
 instruction length supported by an implementation, and which is always
---
Though this added sentence is arguably redundant: no IFU would fetch
instructions at misaligned addresses, right?

>
> > This patch didn't have I/D incoherent problem because we broadcast the
> > IPI fence.i in patch_text_nosync.
>
> After the modification, yes.
>
> > Compared to the stop_machine version, there is a crazy nested IPI
> > broadcast cost.
> > stop_machine -> patch_text_nosync -> flush_icache_all
> >
> > void flush_icache_all(void)
> > {
> >         local_flush_icache_all();
> >
> >         if (IS_ENABLED(CONFIG_RISCV_SBI))
> >                 sbi_remote_fence_i(NULL);
> >         else
> >                 on_each_cpu(ipi_remote_fence_i, NULL, 1);
> > }
> > EXPORT_SYMBOL(flush_icache_all);
>
> Look, I'd love to have your patch in *if we could say that it works on
> all RISC-V platforms*. If everyone agrees that "Guo's approach works" --
> I'd be a happy person. I hate the stopmachine flow as much as the next
> guy. I want a better mechanism in as well. What I'm saying is that:
>
> There's no specification for this. What assumptions can be made? The
> fact that Intel, Arm, and power has this explicitly stated in the specs,
> hints that this is something to pay attention to. Undefined behavior is
> no fun debugging.
>
> You seem very confident that it's impossible to construct hardware where
> your approach does not work.
>
> If there's concensus that your approach is "good enough" -- hey, that'd
> be great! Get it in!
>
> >> Now how could we move on with your patches? Get it in a spec, or fold
> >> the patches in as a Kconfig.socs-thing for the platforms where this is
> >> OK. What are you thoughts on the latter?
> >
> > I didn't talk about I/D incoherent/coherent; what I say is the basic
> > size of the instruction element.
> > In an I/D cache system, why couldn't LSU store-half guarantee
> > atomicity for I-cache fetch? How I-cache could fetch only one byte of
> > that Store-half value?
> > We've assumed this guarantee in the riscv jump_label implementation,
> > so why not this patch couldn't?
>
> Which is a good point. Is that working on all existing implementations?
> Is that assumption correct? We've seen issues with that approach on
> *simulation*, where you/Andy fixed it:
> https://lore.kernel.org/linux-riscv/20230206090440.1255001-1-guoren@kernel.org/
>
> Maybe the RISC-V kernel's assumptions about text patching should just be
> documented, and stating "this is what the kernel does, and what it
> assumes about the execution environment -- if your hardware doesn't work
> with that ¯\_(ツ)_/¯".
>
> What are your thoughts on this, Guo? You don't seem to share my
> concerns. Or is it better to go for your path (patches!), and simply
> change it if there's issues down the line?
>
diff --git a/arch/riscv/include/asm/patch.h b/arch/riscv/include/asm/patch.h
index 9a7d7346001e..2500782e6f5b 100644
--- a/arch/riscv/include/asm/patch.h
+++ b/arch/riscv/include/asm/patch.h
@@ -7,6 +7,5 @@
 #define _ASM_RISCV_PATCH_H
 
 int patch_text_nosync(void *addr, const void *insns, size_t len);
-int patch_text(void *addr, u32 insn);
 
 #endif /* _ASM_RISCV_PATCH_H */
diff --git a/arch/riscv/kernel/patch.c b/arch/riscv/kernel/patch.c
index 765004b60513..8bd51ed8b806 100644
--- a/arch/riscv/kernel/patch.c
+++ b/arch/riscv/kernel/patch.c
@@ -98,36 +98,3 @@ int patch_text_nosync(void *addr, const void *insns, size_t len)
 	return ret;
 }
 NOKPROBE_SYMBOL(patch_text_nosync);
-
-static int patch_text_cb(void *data)
-{
-	struct patch_insn *patch = data;
-	int ret = 0;
-
-	if (atomic_inc_return(&patch->cpu_count) == num_online_cpus()) {
-		ret =
-		    patch_text_nosync(patch->addr, &patch->insn,
-				      GET_INSN_LENGTH(patch->insn));
-		atomic_inc(&patch->cpu_count);
-	} else {
-		while (atomic_read(&patch->cpu_count) <= num_online_cpus())
-			cpu_relax();
-		smp_mb();
-	}
-
-	return ret;
-}
-NOKPROBE_SYMBOL(patch_text_cb);
-
-int patch_text(void *addr, u32 insn)
-{
-	struct patch_insn patch = {
-		.addr = addr,
-		.insn = insn,
-		.cpu_count = ATOMIC_INIT(0),
-	};
-
-	return stop_machine_cpuslocked(patch_text_cb,
-				       &patch, cpu_online_mask);
-}
-NOKPROBE_SYMBOL(patch_text);
diff --git a/arch/riscv/kernel/probes/kprobes.c b/arch/riscv/kernel/probes/kprobes.c
index 475989f06d6d..27f8960c321c 100644
--- a/arch/riscv/kernel/probes/kprobes.c
+++ b/arch/riscv/kernel/probes/kprobes.c
@@ -24,12 +24,18 @@ post_kprobe_handler(struct kprobe *, struct kprobe_ctlblk *, struct pt_regs *);
 
 static void __kprobes arch_prepare_ss_slot(struct kprobe *p)
 {
 	unsigned long offset = GET_INSN_LENGTH(p->opcode);
+#ifdef CONFIG_RISCV_ISA_C
+	u32 opcode = __BUG_INSN_16;
+#else
+	u32 opcode = __BUG_INSN_32;
+#endif
 
 	p->ainsn.api.restore = (unsigned long)p->addr + offset;
 
-	patch_text(p->ainsn.api.insn, p->opcode);
-	patch_text((void *)((unsigned long)(p->ainsn.api.insn) + offset),
-		   __BUG_INSN_32);
+	patch_text_nosync(p->ainsn.api.insn, &p->opcode, offset);
+	patch_text_nosync((void *)((unsigned long)(p->ainsn.api.insn) + offset),
+			  &opcode, GET_INSN_LENGTH(opcode));
+
 }
 
 static void __kprobes arch_prepare_simulate(struct kprobe *p)
@@ -114,16 +120,23 @@ void *alloc_insn_page(void)
 /* install breakpoint in text */
 void __kprobes arch_arm_kprobe(struct kprobe *p)
 {
-	if ((p->opcode & __INSN_LENGTH_MASK) == __INSN_LENGTH_32)
-		patch_text(p->addr, __BUG_INSN_32);
-	else
-		patch_text(p->addr, __BUG_INSN_16);
+#ifdef CONFIG_RISCV_ISA_C
+	u32 opcode = __BUG_INSN_16;
+#else
+	u32 opcode = __BUG_INSN_32;
+#endif
+	patch_text_nosync(p->addr, &opcode, GET_INSN_LENGTH(opcode));
 }
 
 /* remove breakpoint from text */
 void __kprobes arch_disarm_kprobe(struct kprobe *p)
 {
-	patch_text(p->addr, p->opcode);
+#ifdef CONFIG_RISCV_ISA_C
+	u32 opcode = __BUG_INSN_16;
+#else
+	u32 opcode = __BUG_INSN_32;
+#endif
+	patch_text_nosync(p->addr, &p->opcode, GET_INSN_LENGTH(opcode));
 }
 
 void __kprobes arch_remove_kprobe(struct kprobe *p)