Message ID | 20221202155817.2102944-1-vschneid@redhat.com |
---|---|
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp937410wrr; Fri, 2 Dec 2022 08:01:07 -0800 (PST) X-Google-Smtp-Source: AA0mqf4w+AHSODho99UxcB3wtV6ZWMH1SnO7VJqBiVLcueqNbGu4CAphj3dX4Thy8DfV0+q6/y2K X-Received: by 2002:a17:902:b707:b0:189:5f3c:fb25 with SMTP id d7-20020a170902b70700b001895f3cfb25mr41657768pls.123.1669996866828; Fri, 02 Dec 2022 08:01:06 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1669996866; cv=none; d=google.com; s=arc-20160816; b=voplpUAaiQn0UfIUYjQGg4/dKLbYddIgUxHD2oYaKXYAi7x4e10sD4x9gHe3LV1JVX ynFrnY22OV7+HyGMZc23UYTME2X+y1hvr5BOULGInha+xeUynKobPDfWzck7sK/eybol 63Wsk7vMH8nGiTzCFD2Ah7Xw68+5mRpgy91U+LwPB0dfV3bFvt9Yu+FVB0CC4MGaVBbk CGyOHjQRa4/fX28HJ7pm8RW64rtA3VF46pUEuLploGazNGywNk9lm7SqnkGcMVKUY4j0 ps+jzBT+qv+p5AR18DPK7oiE7vC8c3DTmdXAmCFOvwzClYAnnJ6zbzuqRHpNRtpL0He/ 1gUA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from:dkim-signature; bh=B+hXh5P61A2DdBxK6OLw+4jPvT0r9cthvTrIhcW2QOA=; b=GaF1JYPa58/UUYwlMjgMc15DCRSTV59TiW+TF2k7B4REMNqcQU+6OC2PcHBaeyJUpo siqpHKqC/GeKTYucmedvud+EUzM0na4S8ay7U1HALcscR/941xeorX9nW5m1AJIhusTP I4iCE1mmXQh06a97+sNyeYLOaVR+eBPypkUGWhEoyhrm/m/RRQCcw/8yVXywiNGPWPy8 3YYLuUanj/QIs1RpfYJXrAD2VHGAjcWDhavs0xB9aB6sssvpJQ7gKKwhAMTxisWh07hP gIC7FG6PznM0WFbqC2DSetgzlpLoKlqwX8OWiqOESnbModBiCZN7NfSrsR8mJrJtdjM/ LLCA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=V5EvK5la; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id 130-20020a621988000000b0057346470e4fsi7214756pfz.344.2022.12.02.08.00.52; Fri, 02 Dec 2022 08:01:06 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=V5EvK5la; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233712AbiLBP77 (ORCPT <rfc822;lhua1029@gmail.com> + 99 others); Fri, 2 Dec 2022 10:59:59 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59822 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233568AbiLBP75 (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Fri, 2 Dec 2022 10:59:57 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1357C9A4CD for <linux-kernel@vger.kernel.org>; Fri, 2 Dec 2022 07:58:54 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1669996733; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding; bh=B+hXh5P61A2DdBxK6OLw+4jPvT0r9cthvTrIhcW2QOA=; b=V5EvK5la88P0Vw5L9npruJd3zjFlCQYiESb38SrZ/Fl4Eq+ORERbxxQ0Z7JFFZapejU49M Wtbr2THtrH8iXGA9BmLddTIDGM0M3Ak28xYAY2t+3SV8X85Mudy9hxt8H9XB0wOb96TkBm sDO61h1w2euzOQV70lpGLMyL11mfDmU= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-411-aad4T597M-uqDalSOOGakQ-1; Fri, 02 Dec 2022 10:58:50 -0500 X-MC-Unique: aad4T597M-uqDalSOOGakQ-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.rdu2.redhat.com [10.11.54.4]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 471C8800B23; Fri, 2 Dec 2022 15:58:48 +0000 (UTC) Received: from vschneid.remote.csb (unknown [10.33.36.77]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 437932028CE4; Fri, 2 Dec 2022 15:58:42 +0000 (UTC) From: Valentin Schneider <vschneid@redhat.com> To: linux-alpha@vger.kernel.org, linux-kernel@vger.kernel.org, linux-snps-arc@lists.infradead.org, linux-arm-kernel@lists.infradead.org, linux-csky@vger.kernel.org, linux-hexagon@vger.kernel.org, linux-ia64@vger.kernel.org, loongarch@lists.linux.dev, linux-mips@vger.kernel.org, openrisc@lists.librecores.org, linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-riscv@lists.infradead.org, linux-s390@vger.kernel.org, linux-sh@vger.kernel.org, sparclinux@vger.kernel.org, linux-xtensa@linux-xtensa.org, x86@kernel.org Cc: "Paul E. McKenney" <paulmck@kernel.org>, Steven Rostedt <rostedt@goodmis.org>, Peter Zijlstra <peterz@infradead.org>, Thomas Gleixner <tglx@linutronix.de>, Sebastian Andrzej Siewior <bigeasy@linutronix.de>, Juri Lelli <juri.lelli@redhat.com>, Daniel Bristot de Oliveira <bristot@redhat.com>, Marcelo Tosatti <mtosatti@redhat.com>, Frederic Weisbecker <frederic@kernel.org>, Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>, Dave Hansen <dave.hansen@linux.intel.com>, "H. Peter Anvin" <hpa@zytor.com>, Marc Zyngier <maz@kernel.org>, Mark Rutland <mark.rutland@arm.com>, Russell King <linux@armlinux.org.uk>, Nicholas Piggin <npiggin@gmail.com>, Guo Ren <guoren@kernel.org>, "David S. Miller" <davem@davemloft.net> Subject: [PATCH v3 0/8] Generic IPI sending tracepoint Date: Fri, 2 Dec 2022 15:58:09 +0000 Message-Id: <20221202155817.2102944-1-vschneid@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Scanned-By: MIMEDefang 3.1 on 10.11.54.4 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751118634407406581?= X-GMAIL-MSGID: =?utf-8?q?1751118634407406581?= |
Series |
Generic IPI sending tracepoint
|
|
Message
Valentin Schneider
Dec. 2, 2022, 3:58 p.m. UTC
Background ========== Detecting IPI *reception* is relatively easy, e.g. using trace_irq_handler_{entry,exit} or even just function-trace flush_smp_call_function_queue() for SMP calls. Figuring out their *origin*, is trickier as there is no generic tracepoint tied to e.g. smp_call_function(): o AFAIA x86 has no tracepoint tied to sending IPIs, only receiving them (cf. trace_call_function{_single}_entry()). o arm/arm64 do have trace_ipi_raise(), which gives us the target cpus but also a mostly useless string (smp_calls will all be "Function call interrupts"). o Other architectures don't seem to have any IPI-sending related tracepoint. I believe one reason those tracepoints used by arm/arm64 ended up as they were is because these archs used to handle IPIs differently from regular interrupts (the IRQ driver would directly invoke an IPI-handling routine), which meant they never showed up in trace_irq_handler_{entry, exit}. The trace_ipi_{entry,exit} tracepoints gave a way to trace IPI reception but those have become redundant as of: 56afcd3dbd19 ("ARM: Allow IPIs to be handled as normal interrupts") d3afc7f12987 ("arm64: Allow IPIs to be handled as normal interrupts") which gave IPIs a "proper" handler function used through generic_handle_domain_irq(), which makes them show up via trace_irq_handler_{entry, exit}. Changing stuff up ================= Per the above, it would make sense to reshuffle trace_ipi_raise() and move it into generic code. This also came up during Daniel's talk on Osnoise at the CPU isolation MC of LPC 2022 [1]. Now, to be useful, such a tracepoint needs to export: o targeted CPU(s) o calling context The only way to get the calling context with trace_ipi_raise() is to trigger a stack dump, e.g. $(trace-cmd -e ipi* -T echo 42). This is instead introducing a new tracepoint which exports the relevant context (callsite, and requested callback for when the callsite isn't helpful), and is usable by all architectures as it sits in generic code. Another thing worth mentioning is that depending on the callsite, the _RET_IP_ fed to the tracepoint is not always useful - generic_exec_single() doesn't tell you much about the actual callback being sent via IPI, which is why the new tracepoint also has a @callback argument. Patches ======= o Patch 1 is included for convenience and will be merged independently. FYI I have libtraceevent patches [2] to improve the pretty-printing of cpumasks using the new type, which look like: <...>-3322 [021] 560.402583: ipi_send_cpumask: cpumask=14,17,21 callsite=on_each_cpu_cond_mask+0x40 callback=flush_tlb_func+0x0 <...>-187 [010] 562.590584: ipi_send_cpumask: cpumask=0-23 callsite=on_each_cpu_cond_mask+0x40 callback=do_sync_core+0x0 o Patches 2-6 spread out the tracepoint across relevant sites. Patch 6 ends up sprinkling lots of #include <trace/events/ipi.h> which I'm not the biggest fan of, but is the least horrible solution I've been able to come up with so far. o Patch 8 is trying to be smart about tracing the callback associated with the IPI. This results in having IPI trace events for: o smp_call_function*() o smp_send_reschedule() o irq_work_queue*() o standalone uses of __smp_call_single_queue() This is incomplete, just looking at arm64 there's more IPI types that aren't covered: IPI_CPU_STOP, IPI_CPU_CRASH_STOP, IPI_TIMER, IPI_WAKEUP, ... But it feels like a good starting point. Links ===== [1]: https://youtu.be/5gT57y4OzBM?t=14234 [2]: https://lore.kernel.org/all/20221116144154.3662923-1-vschneid@redhat.com/ Revisions ========= v2 -> v3 ++++++++ o Dropped the generic export of smp_send_reschedule(), turned it into a macro and a bunch of imports o Dropped the send_call_function_single_ipi() macro madness, split it into sched and smp bits using some of Peter's suggestions v1 -> v2 ++++++++ o Ditched single-CPU tracepoint o Changed tracepoint signature to include callback o Changed tracepoint callsite field to void *; the parameter is still UL to save up on casts due to using _RET_IP_. o Fixed linking failures due to not exporting smp_send_reschedule() Steven Rostedt (Google) (1): tracing: Add __cpumask to denote a trace event field that is a cpumask_t Valentin Schneider (7): trace: Add trace_ipi_send_cpumask() sched, smp: Trace IPIs sent via send_call_function_single_ipi() smp: Trace IPIs sent via arch_send_call_function_ipi_mask() irq_work: Trace self-IPIs sent via arch_irq_work_raise() treewide: Trace IPIs sent via smp_send_reschedule() smp: reword smp call IPI comment sched, smp: Trace smp callback causing an IPI arch/alpha/kernel/smp.c | 2 +- arch/arc/kernel/smp.c | 2 +- arch/arm/kernel/smp.c | 5 +- arch/arm/mach-actions/platsmp.c | 2 + arch/arm64/kernel/smp.c | 3 +- arch/csky/kernel/smp.c | 2 +- arch/hexagon/kernel/smp.c | 2 +- arch/ia64/kernel/smp.c | 4 +- arch/loongarch/include/asm/smp.h | 2 +- arch/mips/include/asm/smp.h | 2 +- arch/mips/kernel/rtlx-cmp.c | 2 + arch/openrisc/kernel/smp.c | 2 +- arch/parisc/kernel/smp.c | 4 +- arch/powerpc/kernel/smp.c | 6 +- arch/powerpc/kvm/book3s_hv.c | 3 + arch/powerpc/platforms/powernv/subcore.c | 2 + arch/riscv/kernel/smp.c | 4 +- arch/s390/kernel/smp.c | 2 +- arch/sh/kernel/smp.c | 2 +- arch/sparc/kernel/smp_32.c | 2 +- arch/sparc/kernel/smp_64.c | 2 +- arch/x86/include/asm/smp.h | 2 +- arch/x86/kvm/svm/svm.c | 4 + arch/x86/kvm/x86.c | 2 + arch/xtensa/kernel/smp.c | 2 +- include/linux/smp.h | 8 +- include/trace/bpf_probe.h | 6 ++ include/trace/events/ipi.h | 22 ++++++ include/trace/perf.h | 6 ++ include/trace/stages/stage1_struct_define.h | 6 ++ include/trace/stages/stage2_data_offsets.h | 6 ++ include/trace/stages/stage3_trace_output.h | 6 ++ include/trace/stages/stage4_event_fields.h | 6 ++ include/trace/stages/stage5_get_offsets.h | 6 ++ include/trace/stages/stage6_event_callback.h | 20 +++++ include/trace/stages/stage7_class_define.h | 2 + kernel/irq_work.c | 14 +++- kernel/sched/core.c | 19 +++-- kernel/sched/smp.h | 2 +- kernel/smp.c | 78 ++++++++++++++++---- samples/trace_events/trace-events-sample.c | 2 +- samples/trace_events/trace-events-sample.h | 34 +++++++-- virt/kvm/kvm_main.c | 1 + 43 files changed, 250 insertions(+), 61 deletions(-) -- 2.31.1
Comments
On Fri, 02 Dec 2022 07:58:09 PST (-0800), vschneid@redhat.com wrote: > Background > ========== > > Detecting IPI *reception* is relatively easy, e.g. using > trace_irq_handler_{entry,exit} or even just function-trace > flush_smp_call_function_queue() for SMP calls. > > Figuring out their *origin*, is trickier as there is no generic tracepoint tied > to e.g. smp_call_function(): > > o AFAIA x86 has no tracepoint tied to sending IPIs, only receiving them > (cf. trace_call_function{_single}_entry()). > o arm/arm64 do have trace_ipi_raise(), which gives us the target cpus but also a > mostly useless string (smp_calls will all be "Function call interrupts"). > o Other architectures don't seem to have any IPI-sending related tracepoint. > > I believe one reason those tracepoints used by arm/arm64 ended up as they were > is because these archs used to handle IPIs differently from regular interrupts > (the IRQ driver would directly invoke an IPI-handling routine), which meant they > never showed up in trace_irq_handler_{entry, exit}. The trace_ipi_{entry,exit} > tracepoints gave a way to trace IPI reception but those have become redundant as > of: > > 56afcd3dbd19 ("ARM: Allow IPIs to be handled as normal interrupts") > d3afc7f12987 ("arm64: Allow IPIs to be handled as normal interrupts") > > which gave IPIs a "proper" handler function used through > generic_handle_domain_irq(), which makes them show up via > trace_irq_handler_{entry, exit}. > > Changing stuff up > ================= > > Per the above, it would make sense to reshuffle trace_ipi_raise() and move it > into generic code. This also came up during Daniel's talk on Osnoise at the CPU > isolation MC of LPC 2022 [1]. > > Now, to be useful, such a tracepoint needs to export: > o targeted CPU(s) > o calling context > > The only way to get the calling context with trace_ipi_raise() is to trigger a > stack dump, e.g. $(trace-cmd -e ipi* -T echo 42). > > This is instead introducing a new tracepoint which exports the relevant context > (callsite, and requested callback for when the callsite isn't helpful), and is > usable by all architectures as it sits in generic code. > > Another thing worth mentioning is that depending on the callsite, the _RET_IP_ > fed to the tracepoint is not always useful - generic_exec_single() doesn't tell > you much about the actual callback being sent via IPI, which is why the new > tracepoint also has a @callback argument. > > Patches > ======= > > o Patch 1 is included for convenience and will be merged independently. FYI I > have libtraceevent patches [2] to improve the > pretty-printing of cpumasks using the new type, which look like: > <...>-3322 [021] 560.402583: ipi_send_cpumask: cpumask=14,17,21 callsite=on_each_cpu_cond_mask+0x40 callback=flush_tlb_func+0x0 > <...>-187 [010] 562.590584: ipi_send_cpumask: cpumask=0-23 callsite=on_each_cpu_cond_mask+0x40 callback=do_sync_core+0x0 > > o Patches 2-6 spread out the tracepoint across relevant sites. > Patch 6 ends up sprinkling lots of #include <trace/events/ipi.h> which I'm not > the biggest fan of, but is the least horrible solution I've been able to come > up with so far. > > o Patch 8 is trying to be smart about tracing the callback associated with the > IPI. > > This results in having IPI trace events for: > > o smp_call_function*() > o smp_send_reschedule() > o irq_work_queue*() > o standalone uses of __smp_call_single_queue() > > This is incomplete, just looking at arm64 there's more IPI types that aren't > covered: > > IPI_CPU_STOP, > IPI_CPU_CRASH_STOP, > IPI_TIMER, > IPI_WAKEUP, > > ... But it feels like a good starting point. > > Links > ===== > > [1]: https://youtu.be/5gT57y4OzBM?t=14234 > [2]: https://lore.kernel.org/all/20221116144154.3662923-1-vschneid@redhat.com/ > > Revisions > ========= > > v2 -> v3 > ++++++++ > > o Dropped the generic export of smp_send_reschedule(), turned it into a macro > and a bunch of imports > o Dropped the send_call_function_single_ipi() macro madness, split it into sched > and smp bits using some of Peter's suggestions > > v1 -> v2 > ++++++++ > > o Ditched single-CPU tracepoint > o Changed tracepoint signature to include callback > o Changed tracepoint callsite field to void *; the parameter is still UL to save > up on casts due to using _RET_IP_. > o Fixed linking failures due to not exporting smp_send_reschedule() > > Steven Rostedt (Google) (1): > tracing: Add __cpumask to denote a trace event field that is a > cpumask_t > > Valentin Schneider (7): > trace: Add trace_ipi_send_cpumask() > sched, smp: Trace IPIs sent via send_call_function_single_ipi() > smp: Trace IPIs sent via arch_send_call_function_ipi_mask() > irq_work: Trace self-IPIs sent via arch_irq_work_raise() > treewide: Trace IPIs sent via smp_send_reschedule() > smp: reword smp call IPI comment > sched, smp: Trace smp callback causing an IPI > > arch/alpha/kernel/smp.c | 2 +- > arch/arc/kernel/smp.c | 2 +- > arch/arm/kernel/smp.c | 5 +- > arch/arm/mach-actions/platsmp.c | 2 + > arch/arm64/kernel/smp.c | 3 +- > arch/csky/kernel/smp.c | 2 +- > arch/hexagon/kernel/smp.c | 2 +- > arch/ia64/kernel/smp.c | 4 +- > arch/loongarch/include/asm/smp.h | 2 +- > arch/mips/include/asm/smp.h | 2 +- > arch/mips/kernel/rtlx-cmp.c | 2 + > arch/openrisc/kernel/smp.c | 2 +- > arch/parisc/kernel/smp.c | 4 +- > arch/powerpc/kernel/smp.c | 6 +- > arch/powerpc/kvm/book3s_hv.c | 3 + > arch/powerpc/platforms/powernv/subcore.c | 2 + > arch/riscv/kernel/smp.c | 4 +- > arch/s390/kernel/smp.c | 2 +- > arch/sh/kernel/smp.c | 2 +- > arch/sparc/kernel/smp_32.c | 2 +- > arch/sparc/kernel/smp_64.c | 2 +- > arch/x86/include/asm/smp.h | 2 +- > arch/x86/kvm/svm/svm.c | 4 + > arch/x86/kvm/x86.c | 2 + > arch/xtensa/kernel/smp.c | 2 +- > include/linux/smp.h | 8 +- > include/trace/bpf_probe.h | 6 ++ > include/trace/events/ipi.h | 22 ++++++ > include/trace/perf.h | 6 ++ > include/trace/stages/stage1_struct_define.h | 6 ++ > include/trace/stages/stage2_data_offsets.h | 6 ++ > include/trace/stages/stage3_trace_output.h | 6 ++ > include/trace/stages/stage4_event_fields.h | 6 ++ > include/trace/stages/stage5_get_offsets.h | 6 ++ > include/trace/stages/stage6_event_callback.h | 20 +++++ > include/trace/stages/stage7_class_define.h | 2 + > kernel/irq_work.c | 14 +++- > kernel/sched/core.c | 19 +++-- > kernel/sched/smp.h | 2 +- > kernel/smp.c | 78 ++++++++++++++++---- > samples/trace_events/trace-events-sample.c | 2 +- > samples/trace_events/trace-events-sample.h | 34 +++++++-- > virt/kvm/kvm_main.c | 1 + > 43 files changed, 250 insertions(+), 61 deletions(-) Acked-by: Palmer Dabbelt <palmer@rivosinc.com> # riscv