From: Josh Poimboeuf
To: Peter Zijlstra, Steven Rostedt, Ingo Molnar, Arnaldo Carvalho de Melo
Cc: linux-kernel@vger.kernel.org, x86@kernel.org, Indu Bhagat, Mark Rutland,
    Alexander Shishkin, Jiri Olsa, Namhyung Kim, Ian Rogers, Adrian Hunter,
    linux-perf-users@vger.kernel.org, Mark Brown,
    linux-toolchains@vger.kernel.org
Subject: [PATCH RFC 00/10] perf: user space sframe unwinding
Date: Wed, 8 Nov 2023 16:41:05 -0800
X-Patchwork-Id: 16393

Some distros have started compiling frame pointers into all their
packages to enable the kernel to do system-wide profiling of user space.
Unfortunately that creates a runtime performance penalty across the
entire system.  Using DWARF (or .eh_frame) instead isn't feasible
because of complexity and slowness.

For in-kernel unwinding we solved this problem with the creation of the
ORC unwinder for x86_64.  Similarly, for user space the GNU assembler
has created the SFrame ("Simple Frame") format starting with binutils
2.40.

These patches add support for unwinding user space from the kernel using
SFrame with perf.
It should be easy to add user unwinding support for other components
like ftrace.

I tested it on Gentoo by recompiling everything with -Wa,-gsframe and
using a custom glibc patch (which I'll send in a reply to this email).

The unwinding itself seems to work well, though I still have a major
problem: how to tell the perf tool to stitch together the separate
kernel+user callchains into a single event?  Right now I have a hack
which somehow causes the perf tool to overwrite the kernel callchain
with the user one.  I'm perf-clueless, so any ideas or patches for a
clean way to implement that would be most helpful.

Otherwise there were two main challenges:

1) Finding .sframe sections in shared/dlopened libraries

   The kernel has no visibility into the contents of shared libraries.
   This was solved by adding a PR_ADD_SFRAME option to prctl() which
   allows the runtime linker to manually provide the in-memory address
   of an .sframe section to the kernel.

2) Dealing with page faults

   Keeping all binaries' sframe data pinned would likely waste a lot of
   memory.  Instead, read it from user space on demand.  That can't be
   done from perf NMI context due to page faults, so defer the unwind to
   the next user exit.  Since the NMI handler doesn't do exit work,
   self-IPI and then schedule task work to be run on exit from the IPI.

Special thanks to Indu for the original concept, and to Steven and Peter
for helping a lot with the design.  And to Steven for letting me do it ;-)

TODO:

- Stitch kernel+user events together in perf tool (help needed)
- Add arm64 support
- Add VDSO .sframe support
- Allow specifying FP vs sframe from perf tool?
  Right now it's auto-detected, maybe that's enough
- Port ftrace and others to use sframe
- Support sframe v2
- Determine the impact of missing DRAP support (aligned stacks which
  SFrame doesn't currently support)
- Add debugging hooks

Josh Poimboeuf (10):
  perf: Remove get_perf_callchain() 'init_nr' argument
  perf: Remove get_perf_callchain() 'crosstask' argument
  perf: Simplify get_perf_callchain() user logic
  perf: Introduce deferred user callchains
  perf/x86: Add HAVE_PERF_CALLCHAIN_DEFERRED
  unwind: Introduce generic user space unwinding interfaces
  unwind/x86: Add HAVE_USER_UNWIND
  perf/x86: Use user_unwind interface
  unwind: Introduce SFrame user space unwinding
  unwind/x86/64: Add HAVE_USER_UNWIND_SFRAME

 arch/Kconfig                       |   9 +
 arch/x86/Kconfig                   |   3 +
 arch/x86/events/core.c             |  65 ++---
 arch/x86/include/asm/mmu.h         |   2 +-
 arch/x86/include/asm/user_unwind.h |  11 +
 fs/binfmt_elf.c                    |  46 +++-
 include/linux/mm_types.h           |   3 +
 include/linux/perf_event.h         |  24 +-
 include/linux/sframe.h             |  46 ++++
 include/linux/user_unwind.h        |  33 +++
 include/uapi/linux/elf.h           |   1 +
 include/uapi/linux/perf_event.h    |   1 +
 include/uapi/linux/prctl.h         |   3 +
 kernel/Makefile                    |   1 +
 kernel/bpf/stackmap.c              |   6 +-
 kernel/events/callchain.c          |  39 ++-
 kernel/events/core.c               |  96 ++++++-
 kernel/fork.c                      |  10 +
 kernel/sys.c                       |  11 +
 kernel/unwind/Makefile             |   2 +
 kernel/unwind/sframe.c             | 414 +++++++++++++++++++++++++++++
 kernel/unwind/sframe.h             | 217 +++++++++++++++
 kernel/unwind/user.c               |  86 ++++++
 mm/init-mm.c                       |   2 +
 24 files changed, 1060 insertions(+), 71 deletions(-)
 create mode 100644 arch/x86/include/asm/user_unwind.h
 create mode 100644 include/linux/sframe.h
 create mode 100644 include/linux/user_unwind.h
 create mode 100644 kernel/unwind/Makefile
 create mode 100644 kernel/unwind/sframe.c
 create mode 100644 kernel/unwind/sframe.h
 create mode 100644 kernel/unwind/user.c
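
For readers new to table-driven unwinding, here is a rough user space
sketch of the general idea behind an SFrame-style unwind step.  It is
not code from this series and does not use the real SFrame encoding.
Every name in it (toy_fre, toy_regs, toy_stack, toy_step, ...) is
hypothetical, as is the convention that fp_off == 0 means "FP not
saved".  It only assumes that each table row tells the unwinder, for a
range of code addresses, how to compute the CFA from SP or FP and where
the return address and the caller's FP are stored relative to it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified table row: NOT the real SFrame encoding. */
enum toy_base { TOY_BASE_SP, TOY_BASE_FP };

struct toy_fre {
	uint64_t start, end;	/* code range [start, end) this row covers */
	enum toy_base base;	/* register the CFA is computed from */
	int32_t cfa_off;	/* CFA = base register + cfa_off */
	int32_t ra_off;		/* return address is stored at CFA + ra_off */
	int32_t fp_off;		/* caller's FP at CFA + fp_off; 0 = not saved */
};

struct toy_regs { uint64_t ip, sp, fp; };

/* Simulated user stack: words[i] lives at address base + 8 * i. */
struct toy_stack { uint64_t base; const uint64_t *words; size_t nr; };

/* Stand-in for a user memory read; returns -1 instead of faulting. */
static int toy_read(const struct toy_stack *s, uint64_t addr, uint64_t *val)
{
	if (addr < s->base || (addr - s->base) % 8 ||
	    (addr - s->base) / 8 >= s->nr)
		return -1;
	*val = s->words[(addr - s->base) / 8];
	return 0;
}

/* Unwind one frame: find the row covering regs->ip and apply it. */
static int toy_step(const struct toy_fre *tbl, size_t n,
		    const struct toy_stack *s, struct toy_regs *regs)
{
	uint64_t cfa, ra, fp = regs->fp;
	const struct toy_fre *fre = NULL;

	for (size_t i = 0; i < n && !fre; i++)
		if (regs->ip >= tbl[i].start && regs->ip < tbl[i].end)
			fre = &tbl[i];
	if (!fre)
		return -1;	/* no unwind info for this address */

	cfa = (fre->base == TOY_BASE_SP ? regs->sp : regs->fp) +
	      (uint64_t)(int64_t)fre->cfa_off;
	if (toy_read(s, cfa + (uint64_t)(int64_t)fre->ra_off, &ra))
		return -1;
	if (fre->fp_off &&
	    toy_read(s, cfa + (uint64_t)(int64_t)fre->fp_off, &fp))
		return -1;

	regs->ip = ra;		/* continue in the caller */
	regs->sp = cfa;		/* caller's SP is the CFA */
	regs->fp = fp;
	return 0;
}

/* Record ip, then keep stepping until a lookup or a read fails. */
static size_t toy_unwind(const struct toy_fre *tbl, size_t n,
			 const struct toy_stack *s, struct toy_regs regs,
			 uint64_t *out, size_t max)
{
	size_t nr = 0;

	assert(out || !max);
	while (nr < max) {
		out[nr++] = regs.ip;
		if (toy_step(tbl, n, s, &regs))
			break;
	}
	return nr;
}

/* Made-up scenario: main() -> a() -> leaf b(), sampled inside b(). */
size_t toy_example(uint64_t *out, size_t max)
{
	static const struct toy_fre tbl[] = {
		{ 0x1000, 0x1100, TOY_BASE_FP, 16, -8, -16 },	/* a() */
		{ 0x2000, 0x2040, TOY_BASE_SP,  8, -8,   0 },	/* b(), no FP */
		{ 0x3000, 0x3100, TOY_BASE_FP, 16, -8, -16 },	/* main() */
	};
	static const uint64_t words[] = {
		0x1050,	/* 0x7000: return address into a() */
		0xdead,	/* 0x7008: a() local */
		0x7028,	/* 0x7010: saved FP of main() */
		0x3020,	/* 0x7018: return address into main() */
		0xbeef,	/* 0x7020: main() local */
		0x0,	/* 0x7028: saved FP of main()'s caller */
		0x9000,	/* 0x7030: return address with no table row */
	};
	struct toy_stack s = { 0x7000, words, sizeof(words) / sizeof(words[0]) };
	struct toy_regs regs = { .ip = 0x2010, .sp = 0x7000, .fp = 0x7010 };

	return toy_unwind(tbl, sizeof(tbl) / sizeof(tbl[0]), &s, regs, out, max);
}
```

With the made-up table and stack in toy_example(), the walk records the
sampled address in b(), the return addresses into a() and main(), and
finally an address with no table row, where the lookup fails and the
walk stops.  The toy_read() calls mark the places where an in-kernel
unwinder would have to read user memory, which is where the page fault
problem described in 2) comes from.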