Message ID | 166981518895.1131462.4693062055762912734.stgit@devnote3 |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp928877wrr; Wed, 30 Nov 2022 05:38:27 -0800 (PST) X-Google-Smtp-Source: AA0mqf7N6TnJji7JRH4iYxxbhJK31/oyKv8LiZcO2Q8jQGap5kcDiKAmD8ABfhOfK7lU3DD6b+76 X-Received: by 2002:a17:90b:1282:b0:214:1804:d96b with SMTP id fw2-20020a17090b128200b002141804d96bmr63691830pjb.90.1669815507255; Wed, 30 Nov 2022 05:38:27 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1669815507; cv=none; d=google.com; s=arc-20160816; b=EvFwuOTjVK6Pf7eaBHwYW7r1RbPh48Fq0Kh04bb29joL9ecWb1Tz3jBs2qS2sQS9Mu +JdsvpYAkFc48ki1r1mfMpWTAkHt3mmtffRm2xrPiMPfpnZt3DEIjb2sH+ZU+0NiYZKg LswPKjtM5s4QZtldsRNCOlCG4JKjLr95p93RaTd1QzS9+ayiwXpsdi+LqXovOSfGUIov mI3g/1n8eJxnFe0PwgyD+TrENry3ud4WmMgCoFVBQU5KvteRNuJtAAQ8YXz/0tebEZAx fBwOt0oniHGs+Hy0S0zgavxIVWid7ziuUpWce/x8/shxMq2Q1thQLiesNngrZTGarBAg kCvA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :user-agent:message-id:date:subject:cc:to:from:dkim-signature; bh=Zh9NJ8HiEbDgtonMCsIOlX6r1KfgzP9rJiz8owikWvM=; b=P1uB/8dq1RDtycP2tJgDied/hB6pTX4/XsG5z8caPeQYpurjH9am79QazXwLLNYIGC enqk1oFbOzshrcSvlI/hUO8zBR8bWzSJGr/l6NU7RtFsrF9IDlHuiqew7HMilmFtF/fK cnGR94+pi8Em4F9K5QGb5jv0ONYOMoltyRJZcJGWDtRhp9GpxEaPtNdVbeyqNQ42vQ3R +bsQrgDcp3lzLqGxBdLsVk7ypaQQoOjtFZWte9H4pWtjRYaQfS5d8aaHYzfMqFftlq8f nPgKSka2iW2iZzIaH9mJixVKrl3tfjCJh/IbJA/neDNtTuCAcRBmBzo5M2peYpQpq1Ik s/Sg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b="Tl81z/no"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id x4-20020a63db44000000b00476e764bb0esi1252003pgi.392.2022.11.30.05.38.14; Wed, 30 Nov 2022 05:38:27 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b="Tl81z/no"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229713AbiK3NdT (ORCPT <rfc822;heyuhang3455@gmail.com> + 99 others); Wed, 30 Nov 2022 08:33:19 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40330 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229509AbiK3NdP (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Wed, 30 Nov 2022 08:33:15 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4546C2A976 for <linux-kernel@vger.kernel.org>; Wed, 30 Nov 2022 05:33:15 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 99DA161BEA for <linux-kernel@vger.kernel.org>; Wed, 30 Nov 2022 13:33:14 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 9BFB1C433C1; Wed, 30 Nov 2022 13:33:11 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1669815193; bh=XO5UJqdtuocJYtQeIVXnMEv2iX5AN+0skfN/KK6ex14=; h=From:To:Cc:Subject:Date:From; b=Tl81z/noFegDucrDO/148YmlTLCpLmQXXrq0HQamnuT/lYSlxG0zL75XB9jt4Zyav ulbHsnITYoTmzQy59PMt2RyC2l57y3zn42h6MN6A3Hj9X/zH/NDqOHD6BG9m9kczkr OVafDTpKeVJpQT6i8fnh5X8DBwDXw1o/ZsVQ3QCWAQuYfdSmMQw1Pc6tfE5Uuab+/b McQbiZhS9Fjnm5iQEIlMh9SRN2Ucq6eNXiv43e1Ib/9ai1BUxdos/HrIDXK4os7YMh PCoH5sIL1xU1hkedljc50HqFiaQKnGsrFOXt1VuFeCASZb7kpoBgz6ewDMtwljVoYw GP0Td1RZUXiyQ== From: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> To: x86@kernel.org Cc: Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>, Peter Zijlstra <peterz@infradead.org>, Masami Hiramatsu <mhiramat@kernel.org>, linux-kernel@vger.kernel.org, Steven Rostedt <rostedt@goodmis.org>, Huacai Chen <chenhuacai@loongson.cn>, Jinyang He <hejinyang@loongson.cn>, Tiezhu Yang <yangtiezhu@loongson.cn>, "Naveen N . Rao" <naveen.n.rao@linux.ibm.com> Subject: [PATCH -tip v2] x86/kprobes: Drop removed INT3 handling code Date: Wed, 30 Nov 2022 22:33:09 +0900 Message-Id: <166981518895.1131462.4693062055762912734.stgit@devnote3> X-Mailer: git-send-email 2.38.1.584.g0f3c55d4c2-goog User-Agent: StGit/0.19 MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1750928465223237337?= X-GMAIL-MSGID: =?utf-8?q?1750928465223237337?= |
Series |
[-tip,v2] x86/kprobes: Drop removed INT3 handling code
|
|
Commit Message
Masami Hiramatsu (Google)
Nov. 30, 2022, 1:33 p.m. UTC
From: Masami Hiramatsu (Google) <mhiramat@kernel.org> Drop removed INT3 handling code from kprobe_int3_handler() because this case (get_kprobe() doesn't return corresponding kprobe AND the INT3 is removed) must not happen with the kprobe managed INT3, but can happen with the non-kprobe INT3, which should be handled by other callbacks. For the kprobe managed INT3, it is already safe. The commit 5c02ece81848d ("x86/kprobes: Fix ordering while text-patching") introduced text_poke_sync() to the arch_disarm_kprobe() right after removing INT3. Since this text_poke_sync() uses IPI to call sync_core() on all online cpus, that ensures that all running INT3 exception handlers have done. And, the unregister_kprobe() will remove the kprobe from the hash table after arch_disarm_kprobe(). Thus, when the kprobe managed INT3 hits, kprobe_int3_handler() should be able to find corresponding kprobe always by get_kprobe(). If it can not find any kprobe, this means that is NOT a kprobe managed INT3. Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> --- Changes in v2: - update comment to mention that the first safe commit. --- arch/x86/kernel/kprobes/core.c | 14 -------------- 1 file changed, 14 deletions(-)
Comments
Hi Masami and the x86 list, On 30/11/2022 14:33, Masami Hiramatsu (Google) wrote: > From: Masami Hiramatsu (Google) <mhiramat@kernel.org> > > Drop removed INT3 handling code from kprobe_int3_handler() because this > case (get_kprobe() doesn't return corresponding kprobe AND the INT3 is > removed) must not happen with the kprobe managed INT3, but can happen > with the non-kprobe INT3, which should be handled by other callbacks. > > For the kprobe managed INT3, it is already safe. The commit 5c02ece81848d > ("x86/kprobes: Fix ordering while text-patching") introduced > text_poke_sync() to the arch_disarm_kprobe() right after removing INT3. > Since this text_poke_sync() uses IPI to call sync_core() on all online > cpus, that ensures that all running INT3 exception handlers have done. > And, the unregister_kprobe() will remove the kprobe from the hash table > after arch_disarm_kprobe(). > > Thus, when the kprobe managed INT3 hits, kprobe_int3_handler() should > be able to find corresponding kprobe always by get_kprobe(). If it can > not find any kprobe, this means that is NOT a kprobe managed INT3. > > Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> > Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> > --- > Changes in v2: > - update comment to mention that the first safe commit. > --- > arch/x86/kernel/kprobes/core.c | 14 -------------- > 1 file changed, 14 deletions(-) I'm sorry to reply on a patch that is more than one year old, but in short, it looks like it is causing a kernel panic on our side. To be honest, I don't understand the link, but I probably missed something. A bit of context: for the MPTCP net subsystem, we are testing a new CI service to launch a VM and run our test suite (kselftests, kunit, etc.). This CI (Github Action) doesn't support KVM acceleration, and QEmu (v6.2.0) falls back to TCG ("-machine accel=kvm:tcg"). Before, we were always running the tests with QEmu and KVM support, and I don't think we had this issue before. Now, in two weeks, this new CI reported 5 kernel panic in ~30 runs. I initially reported the issue on netdev [1], because the CI always got the panic when doing some pings between different netns, not using MPTCP yet. Eric Dumazet mentioned that it looks like it is an x86 issue, maybe with the jump labels. Since then, I tried to 'git bisect' the issue on my side, but it was not easy: hard to reproduce and unclear to me what is causing it. After a few (long) sessions, 'git bisect' gave me this commit: 8e791f7eba4c ("x86/kprobes: Drop removed INT3 handling code") I thought it was another false positive, but I was not able to reproduce the issue when testing the parent commit, while I could still do that when validating this 8e791f7eba4c commit. I also went back to our MPTCP tree [2], reproduced the issue again, just to be sure. Then I tried to reproduce it after having reverted 8e791f7eba4c. I didn't have any panic, and I tried for a long time: ~2000 iterations, while I usually have the kernel panic after ~50 iterations. So it looks like the modifications done by this patch is linked to the kernel panic we have: > diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c > index 66299682b6b7..33390ed4dcf3 100644 > --- a/arch/x86/kernel/kprobes/core.c > +++ b/arch/x86/kernel/kprobes/core.c > @@ -986,20 +986,6 @@ int kprobe_int3_handler(struct pt_regs *regs) > kprobe_post_process(p, regs, kcb); > return 1; > } > - } > - > - if (*addr != INT3_INSN_OPCODE) { > - /* > - * The breakpoint instruction was removed right > - * after we hit it. Another cpu has removed > - * either a probepoint or a debugger breakpoint > - * at this address. In either case, no further > - * handling of this interrupt is appropriate. > - * Back up over the (now missing) int3 and run > - * the original instruction. > - */ > - regs->ip = (unsigned long)addr; > - return 1; > } /* else: not a kprobe fault; let the kernel handle it */ > > return 0; > > As far as I know, our 'mptcp_connect.c' selftest that is being used to reproduce the issue, is not using KProbes, tracing, dynamic debug, bpf, etc. It simply creates netns with IP and routes, then do some pings between the different IPs. Any idea why this commit could be linked to a kernel panic? Here is the last call trace I had: > # INFO: validating network environment with pings > [ 1985.073189] int3: 0000 [#1] PREEMPT SMP NOPTI > [ 1985.073246] CPU: 0 PID: 3203 Comm: ping Not tainted 6.7.0-113761-g5e006770879c-dirty #250 > [ 1985.073246] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 > [ 1985.073246] RIP: 0010:netif_rx_internal (arch/x86/include/asm/jump_label.h:27) > [ 1985.073246] Code: 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 55 48 89 fd 48 83 ec 20 65 48 8b 04 25 28 00 00 00 48 89 44 24 18 31 c0 e9 <c9> 00 00 00 66 90 66 90 48 8d 54 24 10 48 89 ef 65 8b 35 67 48 d0 > All code > ======== > 0: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1) > 7: 00 > 8: 0f 1f 40 00 nopl 0x0(%rax) > c: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) > 11: 55 push %rbp > 12: 48 89 fd mov %rdi,%rbp > 15: 48 83 ec 20 sub $0x20,%rsp > 19: 65 48 8b 04 25 28 00 mov %gs:0x28,%rax > 20: 00 00 > 22: 48 89 44 24 18 mov %rax,0x18(%rsp) > 27: 31 c0 xor %eax,%eax > 29:* e9 c9 00 00 00 jmp 0xf7 <-- trapping instruction > 2e: 66 90 xchg %ax,%ax > 30: 66 90 xchg %ax,%ax > 32: 48 8d 54 24 10 lea 0x10(%rsp),%rdx > 37: 48 89 ef mov %rbp,%rdi > 3a: 65 gs > 3b: 8b .byte 0x8b > 3c: 35 .byte 0x35 > 3d: 67 addr32 > 3e: 48 rex.W > 3f: d0 .byte 0xd0 > > Code starting with the faulting instruction > =========================================== > 0: c9 leave > 1: 00 00 add %al,(%rax) > 3: 00 66 90 add %ah,-0x70(%rsi) > 6: 66 90 xchg %ax,%ax > 8: 48 8d 54 24 10 lea 0x10(%rsp),%rdx > d: 48 89 ef mov %rbp,%rdi > 10: 65 gs > 11: 8b .byte 0x8b > 12: 35 .byte 0x35 > 13: 67 addr32 > 14: 48 rex.W > 15: d0 .byte 0xd0 > [ 1985.073246] RSP: 0018:ffffb36d40003c08 EFLAGS: 00000246 > [ 1985.073246] RAX: 0000000000000000 RBX: ffff9580825ca000 RCX: 0000000000000001 > [ 1985.073246] RDX: 0000000000000002 RSI: ffff9580825c8000 RDI: ffff9580821cca00 > [ 1985.073246] RBP: ffff9580821cca00 R08: 0000000000000000 R09: 000000000000001c > [ 1985.073246] R10: ffff9580812dcf10 R11: ffff9580812dcf00 R12: ffff9580821cca00 > [ 1985.073246] R13: 000000000000002a R14: 0000000000000000 R15: ffff9580825e1800 > [ 1985.073246] FS: 00007fa7c46be1c0(0000) GS:ffff9580fdc00000(0000) knlGS:0000000000000000 > [ 1985.073246] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 1985.073246] CR2: 00005584236b2200 CR3: 0000000002704000 CR4: 00000000000006f0 > [ 1985.073246] Call Trace: > [ 1985.073246] <IRQ> > [ 1985.073246] ? die (arch/x86/kernel/dumpstack.c:421) > [ 1985.073246] ? exc_int3 (arch/x86/kernel/traps.c:762) > [ 1985.073246] ? asm_exc_int3 (arch/x86/include/asm/idtentry.h:569) > [ 1985.073246] ? netif_rx_internal (arch/x86/include/asm/jump_label.h:27) > [ 1985.073246] ? netif_rx_internal (arch/x86/include/asm/jump_label.h:27) > [ 1985.073246] ? kmem_cache_alloc_node (mm/slub.c:3843) > [ 1985.073246] __netif_rx (net/core/dev.c:5084) > [ 1985.073246] veth_xmit (drivers/net/veth.c:321) > [ 1985.073246] dev_hard_start_xmit (include/linux/netdevice.h:4989) > [ 1985.073246] __dev_queue_xmit (include/linux/netdevice.h:3367) > [ 1985.073246] ? arp_create (net/ipv4/arp.c:577) > [ 1985.073246] arp_solicit (net/ipv4/arp.c:392) > [ 1985.073246] ? kmem_cache_alloc (mm/slub.c:3843) > [ 1985.073246] ? arp_constructor (net/ipv4/arp.c:249) > [ 1985.073246] neigh_probe (arch/x86/include/asm/atomic.h:53) > [ 1985.073246] __neigh_event_send (net/core/neighbour.c:1242) > [ 1985.073246] neigh_resolve_output (net/core/neighbour.c:1547) > [ 1985.073246] ip_finish_output2 (include/net/neighbour.h:542) > [ 1985.073246] ? __ip_finish_output.part.0 (include/linux/skbuff.h:4884) > [ 1985.073246] __netif_receive_skb_one_core (net/core/dev.c:5537) > [ 1985.073246] process_backlog (include/linux/rcupdate.h:782) > [ 1985.073246] __napi_poll (net/core/dev.c:6576) > [ 1985.073246] net_rx_action (net/core/dev.c:6647) > [ 1985.073246] __do_softirq (arch/x86/include/asm/jump_label.h:27) > [ 1985.073246] do_softirq (kernel/softirq.c:454) > [ 1985.073246] </IRQ> > [ 1985.073246] <TASK> > [ 1985.073246] __local_bh_enable_ip (kernel/softirq.c:381) > [ 1985.073246] __dev_queue_xmit (net/core/dev.c:4379) > [ 1985.073246] ip_finish_output2 (include/linux/netdevice.h:3171) > [ 1985.073246] ? __ip_finish_output.part.0 (include/linux/skbuff.h:4884) > [ 1985.073246] ip_push_pending_frames (net/ipv4/ip_output.c:1490) > [ 1985.073246] raw_sendmsg (net/ipv4/raw.c:647) > [ 1985.073246] ? netfs_rreq_assess (fs/netfs/io.c:101) > [ 1985.073246] ? folio_add_file_rmap_ptes (arch/x86/include/asm/bitops.h:206) > [ 1985.073246] ? set_pte_range (mm/memory.c:4529) > [ 1985.073246] ? __sock_sendmsg (net/socket.c:733) > [ 1985.073246] __sock_sendmsg (net/socket.c:733) > [ 1985.073246] ? move_addr_to_kernel.part.0 (net/socket.c:253) > [ 1985.073246] __sys_sendto (net/socket.c:2191) > [ 1985.073246] ? __rseq_handle_notify_resume (kernel/rseq.c:257) > [ 1985.073246] ? ktime_get_real_ts64 (kernel/time/timekeeping.c:292 (discriminator 3)) > [ 1985.073246] __x64_sys_sendto (net/socket.c:2203) > [ 1985.073246] do_syscall_64 (arch/x86/entry/common.c:52) > [ 1985.073246] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129) > [ 1985.073246] RIP: 0033:0x7fa7c499081a > [ 1985.073246] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89 > All code > ======== > 0: d8 64 89 02 fsubs 0x2(%rcx,%rcx,4) > 4: 48 c7 c0 ff ff ff ff mov $0xffffffffffffffff,%rax > b: eb b8 jmp 0xffffffffffffffc5 > d: 0f 1f 00 nopl (%rax) > 10: f3 0f 1e fa endbr64 > 14: 41 89 ca mov %ecx,%r10d > 17: 64 8b 04 25 18 00 00 mov %fs:0x18,%eax > 1e: 00 > 1f: 85 c0 test %eax,%eax > 21: 75 15 jne 0x38 > 23: b8 2c 00 00 00 mov $0x2c,%eax > 28: 0f 05 syscall > 2a:* 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax <-- trapping instruction > 30: 77 7e ja 0xb0 > 32: c3 ret > 33: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) > 38: 41 54 push %r12 > 3a: 48 83 ec 30 sub $0x30,%rsp > 3e: 44 rex.R > 3f: 89 .byte 0x89 > > Code starting with the faulting instruction > =========================================== > 0: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax > 6: 77 7e ja 0x86 > 8: c3 ret > 9: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) > e: 41 54 push %r12 > 10: 48 83 ec 30 sub $0x30,%rsp > 14: 44 rex.R > 15: 89 .byte 0x89 > [ 1985.073246] RSP: 002b:00007ffce269b368 EFLAGS: 00000246 ORIG_RAX: 000000000000002c > [ 1985.073246] RAX: ffffffffffffffda RBX: 00007ffce269ca10 RCX: 00007fa7c499081a > [ 1985.073246] RDX: 0000000000000040 RSI: 0000558423f7c300 RDI: 0000000000000003 > [ 1985.073246] RBP: 0000558423f7c300 R08: 00007ffce269ec90 R09: 0000000000000010 > [ 1985.073246] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040 > [ 1985.073246] R13: 00007ffce269c5a8 R14: 00007ffce269b370 R15: 00007ffce269ca10 > [ 1985.073246] </TASK> > [ 1985.073246] Modules linked in: > [ 1985.073246] ---[ end trace 0000000000000000 ]--- > [ 1985.073246] RIP: 0010:netif_rx_internal (arch/x86/include/asm/jump_label.h:27) > [ 1985.073246] Code: 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 55 48 89 fd 48 83 ec 20 65 48 8b 04 25 28 00 00 00 48 89 44 24 18 31 c0 e9 <c9> 00 00 00 66 90 66 90 48 8d 54 24 10 48 89 ef 65 8b 35 67 48 d0 > All code > ======== > 0: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1) > 7: 00 > 8: 0f 1f 40 00 nopl 0x0(%rax) > c: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) > 11: 55 push %rbp > 12: 48 89 fd mov %rdi,%rbp > 15: 48 83 ec 20 sub $0x20,%rsp > 19: 65 48 8b 04 25 28 00 mov %gs:0x28,%rax > 20: 00 00 > 22: 48 89 44 24 18 mov %rax,0x18(%rsp) > 27: 31 c0 xor %eax,%eax > 29:* e9 c9 00 00 00 jmp 0xf7 <-- trapping instruction > 2e: 66 90 xchg %ax,%ax > 30: 66 90 xchg %ax,%ax > 32: 48 8d 54 24 10 lea 0x10(%rsp),%rdx > 37: 48 89 ef mov %rbp,%rdi > 3a: 65 gs > 3b: 8b .byte 0x8b > 3c: 35 .byte 0x35 > 3d: 67 addr32 > 3e: 48 rex.W > 3f: d0 .byte 0xd0 > > Code starting with the faulting instruction > =========================================== > 0: c9 leave > 1: 00 00 add %al,(%rax) > 3: 00 66 90 add %ah,-0x70(%rsi) > 6: 66 90 xchg %ax,%ax > 8: 48 8d 54 24 10 lea 0x10(%rsp),%rdx > d: 48 89 ef mov %rbp,%rdi > 10: 65 gs > 11: 8b .byte 0x8b > 12: 35 .byte 0x35 > 13: 67 addr32 > 14: 48 rex.W > 15: d0 .byte 0xd0 > [ 1985.073246] RSP: 0018:ffffb36d40003c08 EFLAGS: 00000246 > [ 1985.073246] RAX: 0000000000000000 RBX: ffff9580825ca000 RCX: 0000000000000001 > [ 1985.073246] RDX: 0000000000000002 RSI: ffff9580825c8000 RDI: ffff9580821cca00 > [ 1985.073246] RBP: ffff9580821cca00 R08: 0000000000000000 R09: 000000000000001c > [ 1985.073246] R10: ffff9580812dcf10 R11: ffff9580812dcf00 R12: ffff9580821cca00 > [ 1985.073246] R13: 000000000000002a R14: 0000000000000000 R15: ffff9580825e1800 > [ 1985.073246] FS: 00007fa7c46be1c0(0000) GS:ffff9580fdc00000(0000) knlGS:0000000000000000 > [ 1985.073246] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 1985.073246] CR2: 00005584236b2200 CR3: 0000000002704000 CR4: 00000000000006f0 > [ 1985.073246] Kernel panic - not syncing: Fatal exception in interrupt > [ 1985.073246] Kernel Offset: 0x19a00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) > Unexpected stop of the VM My kconfig is available in [3], here is an extract: $ grep JUMP .config # CONFIG_KEXEC_JUMP is not set CONFIG_ARCH_SUPPORTS_KEXEC_JUMP=y CONFIG_JUMP_LABEL=y CONFIG_HAVE_ARCH_JUMP_LABEL=y CONFIG_HAVE_ARCH_JUMP_LABEL_RELATIVE=y CONFIG_HAVE_JUMP_LABEL_HACK=y $ grep PROBES .config CONFIG_ARCH_SUPPORTS_UPROBES=y CONFIG_KPROBES=y CONFIG_OPTPROBES=y CONFIG_KPROBES_ON_FTRACE=y CONFIG_UPROBES=y CONFIG_KRETPROBES=y CONFIG_HAVE_KPROBES=y CONFIG_HAVE_KRETPROBES=y CONFIG_HAVE_OPTPROBES=y CONFIG_HAVE_KPROBES_ON_FTRACE=y # CONFIG_KPROBES_SANITY_TEST is not set $ grep DYNAMIC .config CONFIG_PREEMPT_DYNAMIC=y CONFIG_DYNAMIC_MEMORY_LAYOUT=y CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT=y CONFIG_HAVE_PREEMPT_DYNAMIC=y CONFIG_HAVE_PREEMPT_DYNAMIC_CALL=y CONFIG_DYNAMIC_SIGFRAME=y # CONFIG_SWIOTLB_DYNAMIC is not set CONFIG_DYNAMIC_DEBUG=y CONFIG_DYNAMIC_DEBUG_CORE=y CONFIG_HAVE_DYNAMIC_FTRACE=y CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS=y CONFIG_HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS=y CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=y CONFIG_HAVE_DYNAMIC_FTRACE_NO_PATCHABLE=y CONFIG_DYNAMIC_FTRACE=y CONFIG_DYNAMIC_FTRACE_WITH_REGS=y CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS=y CONFIG_DYNAMIC_FTRACE_WITH_ARGS=y CONFIG_DYNAMIC_EVENTS=y # CONFIG_TEST_DYNAMIC_DEBUG is not set I reported the tests I did, userspace commands before triggering the bug, call traces, kernel config, vmlinux files, etc. on a ticket that is publicly accessible [4]. Do you mind having a look at this issue, and tell me what you think about it, please? I can run more tests and debug patches if it can help. Please note that it looks like it is even harder to reproduce the kernel panic when added more debug mechanisms, but I can always try. [1] https://lore.kernel.org/netdev/98724dcd-ddf3-4f78-a386-f966ffbc9528@kernel.org/T/ [2] on top of Linus' tree from 2 days ago: 736b5545d39c ("Merge tag 'net-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net") [3] https://github.com/multipath-tcp/mptcp_net-next/files/13998625/config.gz [4] https://github.com/multipath-tcp/mptcp_net-next/issues/471 Cheers, Matt
On Sat, 20 Jan 2024 18:44:38 +0100 Matthieu Baerts <matttbe@kernel.org> wrote: > > I'm sorry to reply on a patch that is more than one year old, but in No problem, I've done the same. > short, it looks like it is causing a kernel panic on our side. To be > honest, I don't understand the link, but I probably missed something. > > A bit of context: for the MPTCP net subsystem, we are testing a new CI > service to launch a VM and run our test suite (kselftests, kunit, etc.). > This CI (Github Action) doesn't support KVM acceleration, and QEmu > (v6.2.0) falls back to TCG ("-machine accel=kvm:tcg"). Before, we were > always running the tests with QEmu and KVM support, and I don't think we > had this issue before. Now, in two weeks, this new CI reported 5 kernel > panic in ~30 runs. I'm guessing that qemu doesn't do something that real hardware will do, which is causing the bug. > > I initially reported the issue on netdev [1], because the CI always got > the panic when doing some pings between different netns, not using MPTCP > yet. Eric Dumazet mentioned that it looks like it is an x86 issue, maybe > with the jump labels. Since then, I tried to 'git bisect' the issue on > my side, but it was not easy: hard to reproduce and unclear to me what > is causing it. It could possibly be due to jump_labels/static_branch as they use int3 as well. > > After a few (long) sessions, 'git bisect' gave me this commit: > > 8e791f7eba4c ("x86/kprobes: Drop removed INT3 handling code") > > I thought it was another false positive, but I was not able to reproduce > the issue when testing the parent commit, while I could still do that > when validating this 8e791f7eba4c commit. I also went back to our MPTCP > tree [2], reproduced the issue again, just to be sure. Then I tried to > reproduce it after having reverted 8e791f7eba4c. I didn't have any > panic, and I tried for a long time: ~2000 iterations, while I usually > have the kernel panic after ~50 iterations. > > So it looks like the modifications done by this patch is linked to the > kernel panic we have: Actually, I think it's possibly the other way around. The changes to this patch actually allow the bug to be hit! > > > diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c > > index 66299682b6b7..33390ed4dcf3 100644 > > --- a/arch/x86/kernel/kprobes/core.c > > +++ b/arch/x86/kernel/kprobes/core.c > > @@ -986,20 +986,6 @@ int kprobe_int3_handler(struct pt_regs *regs) > > kprobe_post_process(p, regs, kcb); > > return 1; > > } > > - } > > - > > - if (*addr != INT3_INSN_OPCODE) { So if *addr != INT3_INST_OPCODE, then this returns 1 addr = (kprobe_opcode_t *)(regs->ip - sizeof(kprobe_opcode_t)); Hmm, so the int3 is removed when this is called, and in that case, this returns 1. > > - /* > > - * The breakpoint instruction was removed right > > - * after we hit it. Another cpu has removed > > - * either a probepoint or a debugger breakpoint > > - * at this address. In either case, no further > > - * handling of this interrupt is appropriate. > > - * Back up over the (now missing) int3 and run > > - * the original instruction. > > - */ > > - regs->ip = (unsigned long)addr; > > - return 1; > > } /* else: not a kprobe fault; let the kernel handle it */ > > > > return 0; > > > > > > > As far as I know, our 'mptcp_connect.c' selftest that is being used to > reproduce the issue, is not using KProbes, tracing, dynamic debug, bpf, > etc. It simply creates netns with IP and routes, then do some pings > between the different IPs. Any idea why this commit could be linked to a > kernel panic? The above function is called in traps.c: static bool do_int3(struct pt_regs *regs) { int res; #ifdef CONFIG_KGDB_LOW_LEVEL_TRAP if (kgdb_ll_trap(DIE_INT3, "int3", regs, 0, X86_TRAP_BP, SIGTRAP) == NOTIFY_STOP) return true; #endif /* CONFIG_KGDB_LOW_LEVEL_TRAP */ #ifdef CONFIG_KPROBES if (kprobe_int3_handler(regs)) return true; #endif res = notify_die(DIE_INT3, "int3", regs, 0, X86_TRAP_BP, SIGTRAP); return res == NOTIFY_STOP; } If the kprobe_int3_handler() returns 1 it shortcuts the do_int3() function and it never gets to the notify_die() call. If something called into this and that notify_die() causes the panic, the kprobe patch prevented that by sheer luck. That is, the patch prevented whatever is not working from calling the notify_die() as it should have happened, and the panic was avoided. By adding this commit, it allowed the bug to trigger the panic. The crash has nothing to do with kporbes, it was just that the kprobe handler was incorrectly short-cutting the do_in3() function and preventing the panic. > > > Here is the last call trace I had: > > > # INFO: validating network environment with pings > > [ 1985.073189] int3: 0000 [#1] PREEMPT SMP NOPTI > > [ 1985.073246] CPU: 0 PID: 3203 Comm: ping Not tainted 6.7.0-113761-g5e006770879c-dirty #250 > > [ 1985.073246] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 > > [ 1985.073246] RIP: 0010:netif_rx_internal (arch/x86/include/asm/jump_label.h:27) Yep, definitely a jump_label issue. > > [ 1985.073246] Code: 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 55 48 89 fd 48 83 ec 20 65 48 8b 04 25 28 00 00 00 48 89 44 24 18 31 c0 e9 <c9> 00 00 00 66 90 66 90 48 8d 54 24 10 48 89 ef 65 8b 35 67 48 d0 > > All code > > ======== > > 0: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1) > > 7: 00 > > 8: 0f 1f 40 00 nopl 0x0(%rax) > > c: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) > > 11: 55 push %rbp > > 12: 48 89 fd mov %rdi,%rbp > > 15: 48 83 ec 20 sub $0x20,%rsp > > 19: 65 48 8b 04 25 28 00 mov %gs:0x28,%rax > > 20: 00 00 > > 22: 48 89 44 24 18 mov %rax,0x18(%rsp) > > 27: 31 c0 xor %eax,%eax > > 29:* e9 c9 00 00 00 jmp 0xf7 <-- trapping instruction > > 2e: 66 90 xchg %ax,%ax > > 30: 66 90 xchg %ax,%ax > > 32: 48 8d 54 24 10 lea 0x10(%rsp),%rdx > > 37: 48 89 ef mov %rbp,%rdi > > 3a: 65 gs > > 3b: 8b .byte 0x8b > > 3c: 35 .byte 0x35 > > 3d: 67 addr32 > > 3e: 48 rex.W > > 3f: d0 .byte 0xd0 > > [..] > > I reported the tests I did, userspace commands before triggering the > bug, call traces, kernel config, vmlinux files, etc. on a ticket that is > publicly accessible [4]. > > > Do you mind having a look at this issue, and tell me what you think > about it, please? Just did ;-) So for some reason, jump labels is causing a crash. I can try to look more into this on Monday. -- Steve > > I can run more tests and debug patches if it can help. Please note that > it looks like it is even harder to reproduce the kernel panic when added > more debug mechanisms, but I can always try. > > > [1] > https://lore.kernel.org/netdev/98724dcd-ddf3-4f78-a386-f966ffbc9528@kernel.org/T/ > [2] on top of Linus' tree from 2 days ago: 736b5545d39c ("Merge tag > 'net-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net") > [3] https://github.com/multipath-tcp/mptcp_net-next/files/13998625/config.gz > [4] https://github.com/multipath-tcp/mptcp_net-next/issues/471 > > > Cheers, > Matt
On Sat, 20 Jan 2024 17:05:17 -0500 Steven Rostedt <rostedt@goodmis.org> wrote: > On Sat, 20 Jan 2024 18:44:38 +0100 > Matthieu Baerts <matttbe@kernel.org> wrote: > > > > > I'm sorry to reply on a patch that is more than one year old, but in > > No problem, I've done the same. Yeah, thanks for reporting! I realized the problem. > > > short, it looks like it is causing a kernel panic on our side. To be > > honest, I don't understand the link, but I probably missed something. > > > > A bit of context: for the MPTCP net subsystem, we are testing a new CI > > service to launch a VM and run our test suite (kselftests, kunit, etc.). > > This CI (Github Action) doesn't support KVM acceleration, and QEmu > > (v6.2.0) falls back to TCG ("-machine accel=kvm:tcg"). Before, we were > > always running the tests with QEmu and KVM support, and I don't think we > > had this issue before. Now, in two weeks, this new CI reported 5 kernel > > panic in ~30 runs. > > I'm guessing that qemu doesn't do something that real hardware will do, > which is causing the bug. If this is the timing bug, it is not qemu's issue, but ours. > > I initially reported the issue on netdev [1], because the CI always got > > the panic when doing some pings between different netns, not using MPTCP > > yet. Eric Dumazet mentioned that it looks like it is an x86 issue, maybe > > with the jump labels. Since then, I tried to 'git bisect' the issue on > > my side, but it was not easy: hard to reproduce and unclear to me what > > is causing it. > > It could possibly be due to jump_labels/static_branch as they use int3 > as well. Yeah, it seems like. Does jump_labels/static_branch wait until all interrupts exit before removing their object from the "active list"? kprobes does this but I found it might be *not enough*. When removing a kprobe, we do 1. disarm int3 2. remove kprobe from hlist ("active list") by hlist_del_rcu() 3. wait for rcu 4. free kprobe's trampoline The possible scenario is CPU0 CPU1 0. hit int3 1. disarm int3 2. remove kprobe from hlist 2.1 run do_int3() 2.2 kprobe_int3_handler() failed (*) 2.3 notify_die() failed 2.4 kernel panic 3. wait for rcu 4. free kprobe's trampoline (*) because the corresponding kprobe is already removed from the active list. Thus exc_int3() needs a check whether the int3 is already removed or not before it decides that int3 is a stray int3 (== returning false). Or, another possible solution is to add another synchronize_rcu() or sync_core() right after disarming int3 so that we ensure all int3 handler at that point are finished. > > > > > After a few (long) sessions, 'git bisect' gave me this commit: > > > > 8e791f7eba4c ("x86/kprobes: Drop removed INT3 handling code") > > > > I thought it was another false positive, but I was not able to reproduce > > the issue when testing the parent commit, while I could still do that > > when validating this 8e791f7eba4c commit. I also went back to our MPTCP > > tree [2], reproduced the issue again, just to be sure. Then I tried to > > reproduce it after having reverted 8e791f7eba4c. I didn't have any > > panic, and I tried for a long time: ~2000 iterations, while I usually > > have the kernel panic after ~50 iterations. > > > > So it looks like the modifications done by this patch is linked to the > > kernel panic we have: > > Actually, I think it's possibly the other way around. The changes to > this patch actually allow the bug to be hit! > > > > > > diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c > > > index 66299682b6b7..33390ed4dcf3 100644 > > > --- a/arch/x86/kernel/kprobes/core.c > > > +++ b/arch/x86/kernel/kprobes/core.c > > > @@ -986,20 +986,6 @@ int kprobe_int3_handler(struct pt_regs *regs) > > > kprobe_post_process(p, regs, kcb); > > > return 1; > > > } > > > - } > > > - > > > - if (*addr != INT3_INSN_OPCODE) { > > So if *addr != INT3_INST_OPCODE, then this returns 1 > > addr = (kprobe_opcode_t *)(regs->ip - sizeof(kprobe_opcode_t)); > > Hmm, so the int3 is removed when this is called, and in that case, this > returns 1. Yes, this is a kind of "fixup" path. > > > > > - /* > > > - * The breakpoint instruction was removed right > > > - * after we hit it. Another cpu has removed > > > - * either a probepoint or a debugger breakpoint > > > - * at this address. In either case, no further > > > - * handling of this interrupt is appropriate. > > > - * Back up over the (now missing) int3 and run > > > - * the original instruction. > > > - */ > > > - regs->ip = (unsigned long)addr; > > > - return 1; > > > } /* else: not a kprobe fault; let the kernel handle it */ > > > > > > return 0; > > > > > > > > > > > > As far as I know, our 'mptcp_connect.c' selftest that is being used to > > reproduce the issue, is not using KProbes, tracing, dynamic debug, bpf, > > etc. It simply creates netns with IP and routes, then do some pings > > between the different IPs. Any idea why this commit could be linked to a > > kernel panic? > > The above function is called in traps.c: > > static bool do_int3(struct pt_regs *regs) > { > int res; > > #ifdef CONFIG_KGDB_LOW_LEVEL_TRAP > if (kgdb_ll_trap(DIE_INT3, "int3", regs, 0, X86_TRAP_BP, > SIGTRAP) == NOTIFY_STOP) > return true; > #endif /* CONFIG_KGDB_LOW_LEVEL_TRAP */ > > #ifdef CONFIG_KPROBES > if (kprobe_int3_handler(regs)) > return true; > #endif > res = notify_die(DIE_INT3, "int3", regs, 0, X86_TRAP_BP, SIGTRAP); > > return res == NOTIFY_STOP; > } > > If the kprobe_int3_handler() returns 1 it shortcuts the do_int3() > function and it never gets to the notify_die() call. > > If something called into this and that notify_die() causes the panic, > the kprobe patch prevented that by sheer luck. Yes, it prevents the panic unexpectedly :) > > That is, the patch prevented whatever is not working from calling the > notify_die() as it should have happened, and the panic was avoided. By > adding this commit, it allowed the bug to trigger the panic. The crash > has nothing to do with kporbes, it was just that the kprobe handler was > incorrectly short-cutting the do_in3() function and preventing the > panic. Yeah, but this should not be done in the kprobe_int3_handler() but exc_int3() should do. Thank you, > > > > > > > Here is the last call trace I had: > > > > > # INFO: validating network environment with pings > > > [ 1985.073189] int3: 0000 [#1] PREEMPT SMP NOPTI > > > [ 1985.073246] CPU: 0 PID: 3203 Comm: ping Not tainted 6.7.0-113761-g5e006770879c-dirty #250 > > > [ 1985.073246] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 > > > [ 1985.073246] RIP: 0010:netif_rx_internal (arch/x86/include/asm/jump_label.h:27) > > Yep, definitely a jump_label issue. > > > > [ 1985.073246] Code: 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 55 48 89 fd 48 83 ec 20 65 48 8b 04 25 28 00 00 00 48 89 44 24 18 31 c0 e9 <c9> 00 00 00 66 90 66 90 48 8d 54 24 10 48 89 ef 65 8b 35 67 48 d0 > > > All code > > > ======== > > > 0: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1) > > > 7: 00 > > > 8: 0f 1f 40 00 nopl 0x0(%rax) > > > c: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) > > > 11: 55 push %rbp > > > 12: 48 89 fd mov %rdi,%rbp > > > 15: 48 83 ec 20 sub $0x20,%rsp > > > 19: 65 48 8b 04 25 28 00 mov %gs:0x28,%rax > > > 20: 00 00 > > > 22: 48 89 44 24 18 mov %rax,0x18(%rsp) > > > 27: 31 c0 xor %eax,%eax > > > 29:* e9 c9 00 00 00 jmp 0xf7 <-- trapping instruction > > > 2e: 66 90 xchg %ax,%ax > > > 30: 66 90 xchg %ax,%ax > > > 32: 48 8d 54 24 10 lea 0x10(%rsp),%rdx > > > 37: 48 89 ef mov %rbp,%rdi > > > 3a: 65 gs > > > 3b: 8b .byte 0x8b > > > 3c: 35 .byte 0x35 > > > 3d: 67 addr32 > > > 3e: 48 rex.W > > > 3f: d0 .byte 0xd0 > > > > > [..] > > > > > I reported the tests I did, userspace commands before triggering the > > bug, call traces, kernel config, vmlinux files, etc. on a ticket that is > > publicly accessible [4]. > > > > > > Do you mind having a look at this issue, and tell me what you think > > about it, please? > > Just did ;-) > > So for some reason, jump labels is causing a crash. I can try to look > more into this on Monday. > > -- Steve > > > > > > I can run more tests and debug patches if it can help. Please note that > > it looks like it is even harder to reproduce the kernel panic when added > > more debug mechanisms, but I can always try. > > > > > > [1] > > https://lore.kernel.org/netdev/98724dcd-ddf3-4f78-a386-f966ffbc9528@kernel.org/T/ > > [2] on top of Linus' tree from 2 days ago: 736b5545d39c ("Merge tag > > 'net-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net") > > [3] https://github.com/multipath-tcp/mptcp_net-next/files/13998625/config.gz > > [4] https://github.com/multipath-tcp/mptcp_net-next/issues/471 > > > > > > Cheers, > > Matt > >
On Sun, 21 Jan 2024 11:28:52 +0900 Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote: > On Sat, 20 Jan 2024 17:05:17 -0500 > Steven Rostedt <rostedt@goodmis.org> wrote: > > > On Sat, 20 Jan 2024 18:44:38 +0100 > > Matthieu Baerts <matttbe@kernel.org> wrote: > > > > > > > > I'm sorry to reply on a patch that is more than one year old, but in > > > > No problem, I've done the same. > > Yeah, thanks for reporting! I realized the problem. > > > > > > short, it looks like it is causing a kernel panic on our side. To be > > > honest, I don't understand the link, but I probably missed something. > > > > > > A bit of context: for the MPTCP net subsystem, we are testing a new CI > > > service to launch a VM and run our test suite (kselftests, kunit, etc.). > > > This CI (Github Action) doesn't support KVM acceleration, and QEmu > > > (v6.2.0) falls back to TCG ("-machine accel=kvm:tcg"). Before, we were > > > always running the tests with QEmu and KVM support, and I don't think we > > > had this issue before. Now, in two weeks, this new CI reported 5 kernel > > > panic in ~30 runs. > > > > I'm guessing that qemu doesn't do something that real hardware will do, > > which is causing the bug. > > If this is the timing bug, it is not qemu's issue, but ours. Hmm, as far as I can see, the jump_label uses text_poke_bp(_batch) which send IPI for sync_core() on each core, after replacing INT3 with other opcode. void text_poke_sync(void) { on_each_cpu(do_sync_core, NULL, 1); } static void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries) { [...] /* * Third step: replace the first byte (int3) by the first byte of * replacing opcode. */ for (do_sync = 0, i = 0; i < nr_entries; i++) { u8 byte = tp[i].text[0]; if (tp[i].len == 6) byte = 0x0f; if (byte == INT3_INSN_OPCODE) continue; text_poke(text_poke_addr(&tp[i]), &byte, INT3_INSN_SIZE); do_sync++; } if (do_sync) text_poke_sync(); [...] This must ensure those processor should finish running INT3 exception handlers because the IPI is handled outside of the INT3 exception. However, if the I-cache entry servives text_poke() and sync_core(), this problem may happen. The text_poke() flushes TLB but for the local (!global) PTE, and sync_core() just serialize (!= cache flushing?). Thus the other CPUs can still see the INT3 after text_poke_sync()? If so, on such CPU, removed INT3 is still alive on the I-cache and hit it after text_poke_sync(). This will be a ghost INT3... > > > > I initially reported the issue on netdev [1], because the CI always got > > > the panic when doing some pings between different netns, not using MPTCP > > > yet. Eric Dumazet mentioned that it looks like it is an x86 issue, maybe > > > with the jump labels. Since then, I tried to 'git bisect' the issue on > > > my side, but it was not easy: hard to reproduce and unclear to me what > > > is causing it. > > > > It could possibly be due to jump_labels/static_branch as they use int3 > > as well. > > Yeah, it seems like. Does jump_labels/static_branch wait until all interrupts > exit before removing their object from the "active list"? > > kprobes does this but I found it might be *not enough*. > When removing a kprobe, we do > > 1. disarm int3 > 2. remove kprobe from hlist ("active list") by hlist_del_rcu() > 3. wait for rcu > 4. free kprobe's trampoline > > The possible scenario is > > CPU0 CPU1 > 0. hit int3 > 1. disarm int3 > 2. remove kprobe from hlist > 2.1 run do_int3() > 2.2 kprobe_int3_handler() failed (*) > 2.3 notify_die() failed > 2.4 kernel panic > 3. wait for rcu > 4. free kprobe's trampoline > > (*) because the corresponding kprobe is already removed from the > active list. > > Thus exc_int3() needs a check whether the int3 is already removed or not > before it decides that int3 is a stray int3 (== returning false). > > Or, another possible solution is to add another synchronize_rcu() > or sync_core() right after disarming int3 so that we ensure all > int3 handler at that point are finished. So this another solution is already done. I think we need to add the ghost INT3 check in the exc_int3() as follows; Thank you, From add8cf7da99cdb096a0d6765b3dc5de9a3ea3019 Mon Sep 17 00:00:00 2001 From: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Date: Sun, 21 Jan 2024 17:16:50 +0900 Subject: [PATCH] x86: Fixup from the removed INT3 if it is unhandled INT3 is used not only for software breakpoint, but also self modifying code on x86 in the kernel. For example, jump_label, function tracer etc. Those may not handle INT3 after removing it but not waiting for synchronizing CPUs enough. Since such 'ghost' INT3 is not handled by anyone because they think it has been removed already. Recheck there is INT3 on the exception address and if not, ignore it. Note that previously kprobes does the same thing by itself, but that is not a good location to do that because INT3 is commonly used. Do it at the common place so that it can handle all 'ghost' INT3. Reported-by: Matthieu Baerts <matttbe@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Fixes: 8e791f7eba4c ("x86/kprobes: Drop removed INT3 handling code") --- arch/x86/kernel/traps.c | 21 ++++++++++++++++++++- 1 file changed, 20 insertions(+), 1 deletion(-) diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index c876f1d36a81..f3e7a99c21fe 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -720,6 +720,25 @@ static bool do_int3(struct pt_regs *regs) } NOKPROBE_SYMBOL(do_int3); +static bool fixup_int3(struct pt_regs *regs) +{ + unsigned long addr = instruction_pointer(regs) - INT3_INSN_SIZE; + + if (*(u8 *)addr != INT3_INSN_OPCODE) { + /* + * The breakpoint instruction was removed right + * after we hit it. Another cpu has removed it + * from this address. In this case, no further + * handling of this interrupt is appropriate. + * Back up over the (now missing) int3 and run + * the original instruction. + */ + instruction_pointer_set(regs, (unsigned long)addr); + return true; + } + return false; +} + static void do_int3_user(struct pt_regs *regs) { if (do_int3(regs)) @@ -757,7 +776,7 @@ DEFINE_IDTENTRY_RAW(exc_int3) irqentry_state_t irq_state = irqentry_nmi_enter(regs); instrumentation_begin(); - if (!do_int3(regs)) + if (!do_int3(regs) && !fixup_int3(regs)) die("int3", regs, 0); instrumentation_end(); irqentry_nmi_exit(regs, irq_state);
Hi Masami, Steven, On 21/01/2024 10:05, Masami Hiramatsu (Google) wrote: > On Sun, 21 Jan 2024 11:28:52 +0900 > Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote: > >> On Sat, 20 Jan 2024 17:05:17 -0500 >> Steven Rostedt <rostedt@goodmis.org> wrote: >> >>> On Sat, 20 Jan 2024 18:44:38 +0100 >>> Matthieu Baerts <matttbe@kernel.org> wrote: >>> >>>> >>>> I'm sorry to reply on a patch that is more than one year old, but in >>> >>> No problem, I've done the same. >> >> Yeah, thanks for reporting! I realized the problem. Thank you both for your quick reply, very useful explanations, analysis and patch! (...) > So this another solution is already done. I think we need to add the > ghost INT3 check in the exc_int3() as follows; > > Thank you, > > From add8cf7da99cdb096a0d6765b3dc5de9a3ea3019 Mon Sep 17 00:00:00 2001 > From: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> > Date: Sun, 21 Jan 2024 17:16:50 +0900 > Subject: [PATCH] x86: Fixup from the removed INT3 if it is unhandled > > INT3 is used not only for software breakpoint, but also self modifying > code on x86 in the kernel. For example, jump_label, function tracer etc. > Those may not handle INT3 after removing it but not waiting for > synchronizing CPUs enough. Since such 'ghost' INT3 is not handled by > anyone because they think it has been removed already. > Recheck there is INT3 on the exception address and if not, ignore it. > > Note that previously kprobes does the same thing by itself, but that is > not a good location to do that because INT3 is commonly used. Do it at > the common place so that it can handle all 'ghost' INT3. I just tested it, and I was able to run pings for 3h without any issues! While at it, and just to be on the safe side, I also re-run the tests after having added a "pr_warn()" -- I know, using printk(), especially when talking to you... but I was not sure what was safe to use at this place in the code :) -- before returning "true" in the new function you added, and we can see that the crash is avoided thanks to the new code: [ 27.422518] traps: crash avoided, addr=18446744071882050317 [ 27.426182] traps: crash avoided, addr=18446744071882050317 [ 370.483208] traps: crash avoided, addr=18446744071882075656 [ 370.485066] traps: crash avoided, addr=18446744071882075656 [ 370.485084] traps: crash avoided, addr=18446744071882075656 [ 592.866416] traps: crash avoided, addr=18446744071882075656 [ 592.867937] traps: crash avoided, addr=18446744071882075656 [ 980.988342] traps: crash avoided, addr=18446744071882050317 [ 980.989866] traps: crash avoided, addr=18446744071882050317 (from my VM running with 2 CPU cores) Again, thank you for the fix! (Just in case you need it:) Tested-by: Matthieu Baerts <matttbe@kernel.org> Cheers, Matt
On Sun, 21 Jan 2024 16:23:35 +0100 Matthieu Baerts <matttbe@kernel.org> wrote: > Hi Masami, Steven, > > On 21/01/2024 10:05, Masami Hiramatsu (Google) wrote: > > On Sun, 21 Jan 2024 11:28:52 +0900 > > Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote: > > > >> On Sat, 20 Jan 2024 17:05:17 -0500 > >> Steven Rostedt <rostedt@goodmis.org> wrote: > >> > >>> On Sat, 20 Jan 2024 18:44:38 +0100 > >>> Matthieu Baerts <matttbe@kernel.org> wrote: > >>> > >>>> > >>>> I'm sorry to reply on a patch that is more than one year old, but in > >>> > >>> No problem, I've done the same. > >> > >> Yeah, thanks for reporting! I realized the problem. > > Thank you both for your quick reply, very useful explanations, analysis > and patch! > > (...) > > > So this another solution is already done. I think we need to add the > > ghost INT3 check in the exc_int3() as follows; > > > > Thank you, > > > > From add8cf7da99cdb096a0d6765b3dc5de9a3ea3019 Mon Sep 17 00:00:00 2001 > > From: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> > > Date: Sun, 21 Jan 2024 17:16:50 +0900 > > Subject: [PATCH] x86: Fixup from the removed INT3 if it is unhandled > > > > INT3 is used not only for software breakpoint, but also self modifying > > code on x86 in the kernel. For example, jump_label, function tracer etc. > > Those may not handle INT3 after removing it but not waiting for > > synchronizing CPUs enough. Since such 'ghost' INT3 is not handled by > > anyone because they think it has been removed already. > > Recheck there is INT3 on the exception address and if not, ignore it. > > > > Note that previously kprobes does the same thing by itself, but that is > > not a good location to do that because INT3 is commonly used. Do it at > > the common place so that it can handle all 'ghost' INT3. > > I just tested it, and I was able to run pings for 3h without any issues! > > While at it, and just to be on the safe side, I also re-run the tests > after having added a "pr_warn()" -- I know, using printk(), especially > when talking to you... but I was not sure what was safe to use at this > place in the code :) -- before returning "true" in the new function you > added, and we can see that the crash is avoided thanks to the new code: > > [ 27.422518] traps: crash avoided, addr=18446744071882050317 > [ 27.426182] traps: crash avoided, addr=18446744071882050317 > > [ 370.483208] traps: crash avoided, addr=18446744071882075656 > [ 370.485066] traps: crash avoided, addr=18446744071882075656 > [ 370.485084] traps: crash avoided, addr=18446744071882075656 > > [ 592.866416] traps: crash avoided, addr=18446744071882075656 > [ 592.867937] traps: crash avoided, addr=18446744071882075656 > > [ 980.988342] traps: crash avoided, addr=18446744071882050317 > [ 980.989866] traps: crash avoided, addr=18446744071882050317 > > (from my VM running with 2 CPU cores) > > > Again, thank you for the fix! > > (Just in case you need it:) > > Tested-by: Matthieu Baerts <matttbe@kernel.org> The thing is, the bug is with qemu and *not* the kernel. Masami's patch just paper's over the real bug, and worse, if the kernel has a bug that's not doing proper synchronization, the patch will keep it from being detected. So no, I do not think this is the proper solution. The real problem is that qemu does not seem to be honoring the memory barriers of an interrupt. The reason the code does the ipi's is to force a full memory barrier across all CPUs so that they all see the same memory before going forward to the next step. My guess is that qemu does not treat the IPI being sent as a memory barrier, and then the CPUs do not see a consistent memory view after the IPIs are sent. That's a bug in qemu! This should be reported to the qemu community and should be fixed there. In the mean time, feel free to use Masami's patch in your local repo until qemu is fixed, but it should not be added to Linux mainline. -- Steve
On Sun, 21 Jan 2024 10:31:44 -0500 Steven Rostedt <rostedt@goodmis.org> wrote: > My guess is that qemu does not treat the IPI being sent as a memory > barrier, and then the CPUs do not see a consistent memory view after > the IPIs are sent. That's a bug in qemu! More specifically, I bet qemu may be doing a dcache barrier, but not an icache barrier in the interrupt. If the code is already in qemu's pipeline, it may not be flushing it like real hardware would do. -- Steve
On Sun, 21 Jan 2024 18:05:44 +0900 Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote: > However, if the I-cache entry servives text_poke() and sync_core(), this > problem may happen. > The text_poke() flushes TLB but for the local (!global) PTE, and sync_core() > just serialize (!= cache flushing?). Thus the other CPUs can still see the Yes, the purpose of the IPIs are for cache flushing, including icache. > INT3 after text_poke_sync()? If so, on such CPU, removed INT3 is still > alive on the I-cache and hit it after text_poke_sync(). > This will be a ghost INT3... An interrupt is a full memory barrier and it looks like qemu is not honoring that. Thus the bug is in qemu and not the kernel. -- Steve
Hi Steven, On 21/01/2024 16:31, Steven Rostedt wrote: > On Sun, 21 Jan 2024 16:23:35 +0100 > Matthieu Baerts <matttbe@kernel.org> wrote: > >> Hi Masami, Steven, >> >> On 21/01/2024 10:05, Masami Hiramatsu (Google) wrote: >>> On Sun, 21 Jan 2024 11:28:52 +0900 >>> Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote: >>> >>>> On Sat, 20 Jan 2024 17:05:17 -0500 >>>> Steven Rostedt <rostedt@goodmis.org> wrote: >>>> >>>>> On Sat, 20 Jan 2024 18:44:38 +0100 >>>>> Matthieu Baerts <matttbe@kernel.org> wrote: >>>>> >>>>>> >>>>>> I'm sorry to reply on a patch that is more than one year old, but in >>>>> >>>>> No problem, I've done the same. >>>> >>>> Yeah, thanks for reporting! I realized the problem. >> >> Thank you both for your quick reply, very useful explanations, analysis >> and patch! >> >> (...) >> >>> So this another solution is already done. I think we need to add the >>> ghost INT3 check in the exc_int3() as follows; >>> >>> Thank you, >>> >>> From add8cf7da99cdb096a0d6765b3dc5de9a3ea3019 Mon Sep 17 00:00:00 2001 >>> From: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> >>> Date: Sun, 21 Jan 2024 17:16:50 +0900 >>> Subject: [PATCH] x86: Fixup from the removed INT3 if it is unhandled >>> >>> INT3 is used not only for software breakpoint, but also self modifying >>> code on x86 in the kernel. For example, jump_label, function tracer etc. >>> Those may not handle INT3 after removing it but not waiting for >>> synchronizing CPUs enough. Since such 'ghost' INT3 is not handled by >>> anyone because they think it has been removed already. >>> Recheck there is INT3 on the exception address and if not, ignore it. >>> >>> Note that previously kprobes does the same thing by itself, but that is >>> not a good location to do that because INT3 is commonly used. Do it at >>> the common place so that it can handle all 'ghost' INT3. >> >> I just tested it, and I was able to run pings for 3h without any issues! >> >> While at it, and just to be on the safe side, I also re-run the tests >> after having added a "pr_warn()" -- I know, using printk(), especially >> when talking to you... but I was not sure what was safe to use at this >> place in the code :) -- before returning "true" in the new function you >> added, and we can see that the crash is avoided thanks to the new code: >> >> [ 27.422518] traps: crash avoided, addr=18446744071882050317 >> [ 27.426182] traps: crash avoided, addr=18446744071882050317 >> >> [ 370.483208] traps: crash avoided, addr=18446744071882075656 >> [ 370.485066] traps: crash avoided, addr=18446744071882075656 >> [ 370.485084] traps: crash avoided, addr=18446744071882075656 >> >> [ 592.866416] traps: crash avoided, addr=18446744071882075656 >> [ 592.867937] traps: crash avoided, addr=18446744071882075656 >> >> [ 980.988342] traps: crash avoided, addr=18446744071882050317 >> [ 980.989866] traps: crash avoided, addr=18446744071882050317 >> >> (from my VM running with 2 CPU cores) >> >> >> Again, thank you for the fix! >> >> (Just in case you need it:) >> >> Tested-by: Matthieu Baerts <matttbe@kernel.org> > > The thing is, the bug is with qemu and *not* the kernel. Masami's patch > just paper's over the real bug, and worse, if the kernel has a bug > that's not doing proper synchronization, the patch will keep it from > being detected. So no, I do not think this is the proper solution. > > The real problem is that qemu does not seem to be honoring the memory > barriers of an interrupt. The reason the code does the ipi's is to > force a full memory barrier across all CPUs so that they all see the > same memory before going forward to the next step. > > My guess is that qemu does not treat the IPI being sent as a memory > barrier, and then the CPUs do not see a consistent memory view after > the IPIs are sent. That's a bug in qemu! > > This should be reported to the qemu community and should be fixed > there. In the mean time, feel free to use Masami's patch in your local > repo until qemu is fixed, but it should not be added to Linux mainline. Thank you for the explanation! For QEmu, I'm currently not using a recent version: v6.2.0, while the latest one is v8.2.0. I was already suspecting that QEmu could be responsible for this issue -- we don't have the issue with KVM, only TCG -- but it looks like it is not that easy to upgrade it: for the CI, we use virtme, which doesn't support newer versions. We will switch to virtme-ng, upgrade QEmu to a version that is still supported, try to reproduce the issue without Masami's patch, and report that to QEmu community. For the time being, the CI will run the tests with Masami's patch. Cheers, Matt
Hi Steven, On 21/01/2024 17:20, Matthieu Baerts wrote: > On 21/01/2024 16:31, Steven Rostedt wrote: (...) >> The thing is, the bug is with qemu and *not* the kernel. Masami's patch >> just paper's over the real bug, and worse, if the kernel has a bug >> that's not doing proper synchronization, the patch will keep it from >> being detected. So no, I do not think this is the proper solution. >> >> The real problem is that qemu does not seem to be honoring the memory >> barriers of an interrupt. The reason the code does the ipi's is to >> force a full memory barrier across all CPUs so that they all see the >> same memory before going forward to the next step. >> >> My guess is that qemu does not treat the IPI being sent as a memory >> barrier, and then the CPUs do not see a consistent memory view after >> the IPIs are sent. That's a bug in qemu! >> >> This should be reported to the qemu community and should be fixed >> there. In the mean time, feel free to use Masami's patch in your local >> repo until qemu is fixed, but it should not be added to Linux mainline. > > Thank you for the explanation! > > For QEmu, I'm currently not using a recent version: v6.2.0, while the > latest one is v8.2.0. I was already suspecting that QEmu could be > responsible for this issue -- we don't have the issue with KVM, only TCG > -- but it looks like it is not that easy to upgrade it: for the CI, we > use virtme, which doesn't support newer versions. We will switch to > virtme-ng, upgrade QEmu to a version that is still supported, try to > reproduce the issue without Masami's patch, and report that to QEmu > community. FYI, I managed to upgrade QEmu to 8.2.0, and launch the tests: I was not able to reproduce this issue. I guess the bug has already been fixed, I'm sorry for the noise! Cheers, Matt
On Sun, 21 Jan 2024 17:20:26 +0100 Matthieu Baerts <matttbe@kernel.org> wrote: > Hi Steven, > > On 21/01/2024 16:31, Steven Rostedt wrote: > > On Sun, 21 Jan 2024 16:23:35 +0100 > > Matthieu Baerts <matttbe@kernel.org> wrote: > > > >> Hi Masami, Steven, > >> > >> On 21/01/2024 10:05, Masami Hiramatsu (Google) wrote: > >>> On Sun, 21 Jan 2024 11:28:52 +0900 > >>> Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote: > >>> > >>>> On Sat, 20 Jan 2024 17:05:17 -0500 > >>>> Steven Rostedt <rostedt@goodmis.org> wrote: > >>>> > >>>>> On Sat, 20 Jan 2024 18:44:38 +0100 > >>>>> Matthieu Baerts <matttbe@kernel.org> wrote: > >>>>> > >>>>>> > >>>>>> I'm sorry to reply on a patch that is more than one year old, but in > >>>>> > >>>>> No problem, I've done the same. > >>>> > >>>> Yeah, thanks for reporting! I realized the problem. > >> > >> Thank you both for your quick reply, very useful explanations, analysis > >> and patch! > >> > >> (...) > >> > >>> So this another solution is already done. I think we need to add the > >>> ghost INT3 check in the exc_int3() as follows; > >>> > >>> Thank you, > >>> > >>> From add8cf7da99cdb096a0d6765b3dc5de9a3ea3019 Mon Sep 17 00:00:00 2001 > >>> From: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> > >>> Date: Sun, 21 Jan 2024 17:16:50 +0900 > >>> Subject: [PATCH] x86: Fixup from the removed INT3 if it is unhandled > >>> > >>> INT3 is used not only for software breakpoint, but also self modifying > >>> code on x86 in the kernel. For example, jump_label, function tracer etc. > >>> Those may not handle INT3 after removing it but not waiting for > >>> synchronizing CPUs enough. Since such 'ghost' INT3 is not handled by > >>> anyone because they think it has been removed already. > >>> Recheck there is INT3 on the exception address and if not, ignore it. > >>> > >>> Note that previously kprobes does the same thing by itself, but that is > >>> not a good location to do that because INT3 is commonly used. Do it at > >>> the common place so that it can handle all 'ghost' INT3. > >> > >> I just tested it, and I was able to run pings for 3h without any issues! > >> > >> While at it, and just to be on the safe side, I also re-run the tests > >> after having added a "pr_warn()" -- I know, using printk(), especially > >> when talking to you... but I was not sure what was safe to use at this > >> place in the code :) -- before returning "true" in the new function you > >> added, and we can see that the crash is avoided thanks to the new code: > >> > >> [ 27.422518] traps: crash avoided, addr=18446744071882050317 > >> [ 27.426182] traps: crash avoided, addr=18446744071882050317 > >> > >> [ 370.483208] traps: crash avoided, addr=18446744071882075656 > >> [ 370.485066] traps: crash avoided, addr=18446744071882075656 > >> [ 370.485084] traps: crash avoided, addr=18446744071882075656 > >> > >> [ 592.866416] traps: crash avoided, addr=18446744071882075656 > >> [ 592.867937] traps: crash avoided, addr=18446744071882075656 > >> > >> [ 980.988342] traps: crash avoided, addr=18446744071882050317 > >> [ 980.989866] traps: crash avoided, addr=18446744071882050317 > >> > >> (from my VM running with 2 CPU cores) > >> > >> > >> Again, thank you for the fix! > >> > >> (Just in case you need it:) > >> > >> Tested-by: Matthieu Baerts <matttbe@kernel.org> > > > > The thing is, the bug is with qemu and *not* the kernel. Masami's patch > > just paper's over the real bug, and worse, if the kernel has a bug > > that's not doing proper synchronization, the patch will keep it from > > being detected. So no, I do not think this is the proper solution. > > > > The real problem is that qemu does not seem to be honoring the memory > > barriers of an interrupt. The reason the code does the ipi's is to > > force a full memory barrier across all CPUs so that they all see the > > same memory before going forward to the next step. > > > > My guess is that qemu does not treat the IPI being sent as a memory > > barrier, and then the CPUs do not see a consistent memory view after > > the IPIs are sent. That's a bug in qemu! > > > > This should be reported to the qemu community and should be fixed > > there. In the mean time, feel free to use Masami's patch in your local > > repo until qemu is fixed, but it should not be added to Linux mainline. > > Thank you for the explanation! > > For QEmu, I'm currently not using a recent version: v6.2.0, while the > latest one is v8.2.0. I was already suspecting that QEmu could be > responsible for this issue -- we don't have the issue with KVM, only TCG Ah, I missed this point. If KVM works well, I agree that this is a qemu TCG's bug. I guess TCG implementation forgets to serialize CPU when the IPI comes. > -- but it looks like it is not that easy to upgrade it: for the CI, we > use virtme, which doesn't support newer versions. We will switch to > virtme-ng, upgrade QEmu to a version that is still supported, try to > reproduce the issue without Masami's patch, and report that to QEmu > community. Thanks, and if you need a reference, "Intel SDM Vol3A, 9.1.3 Handling Self- and Cross-Modifying Code" said that what the other CPU needs to do is "Execute serializing instruction; (* For example, CPUID instruction *)" for cross-modifying code. that has been done in do_sync_core(). Thus this bug should not happen. Thank you! > > For the time being, the CI will run the tests with Masami's patch. > > Cheers, > Matt > -- > Sponsored by the NGI0 Core fund.
diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c index 66299682b6b7..33390ed4dcf3 100644 --- a/arch/x86/kernel/kprobes/core.c +++ b/arch/x86/kernel/kprobes/core.c @@ -986,20 +986,6 @@ int kprobe_int3_handler(struct pt_regs *regs) kprobe_post_process(p, regs, kcb); return 1; } - } - - if (*addr != INT3_INSN_OPCODE) { - /* - * The breakpoint instruction was removed right - * after we hit it. Another cpu has removed - * either a probepoint or a debugger breakpoint - * at this address. In either case, no further - * handling of this interrupt is appropriate. - * Back up over the (now missing) int3 and run - * the original instruction. - */ - regs->ip = (unsigned long)addr; - return 1; } /* else: not a kprobe fault; let the kernel handle it */ return 0;