Message ID | 20240212213922.783301-1-surenb@google.com |
---|---|
Headers | Subject: [PATCH v3 00/35] Memory allocation profiling · From: Suren Baghdasaryan <surenb@google.com> · To: akpm@linux-foundation.org · Date: Mon, 12 Feb 2024 13:38:46 -0800 · Message-ID: <20240212213922.783301-1-surenb@google.com> |
Series | Memory allocation profiling |
Message
Suren Baghdasaryan
Feb. 12, 2024, 9:38 p.m. UTC
Memory allocation profiling, v3 and final:

Overview:

Low overhead [1] per-callsite memory allocation profiling. Not just for
debug kernels, overhead low enough to be deployed in production.

We're aiming to get this in the next merge window, for 6.9. The feedback
we've gotten has been that even out of tree this patchset has already
been useful, and there's a significant amount of other work gated on the
code tagging functionality included in this patchset [2].

Example output:

  root@moria-kvm:~# sort -h /proc/allocinfo|tail
   3.11MiB     2850 fs/ext4/super.c:1408 module:ext4 func:ext4_alloc_inode
   3.52MiB      225 kernel/fork.c:356 module:fork func:alloc_thread_stack_node
   3.75MiB      960 mm/page_ext.c:270 module:page_ext func:alloc_page_ext
   4.00MiB        2 mm/khugepaged.c:893 module:khugepaged func:hpage_collapse_alloc_folio
   10.5MiB      168 block/blk-mq.c:3421 module:blk_mq func:blk_mq_alloc_rqs
   14.0MiB     3594 include/linux/gfp.h:295 module:filemap func:folio_alloc_noprof
   26.8MiB     6856 include/linux/gfp.h:295 module:memory func:folio_alloc_noprof
   64.5MiB    98315 fs/xfs/xfs_rmap_item.c:147 module:xfs func:xfs_rui_init
   98.7MiB    25264 include/linux/gfp.h:295 module:readahead func:folio_alloc_noprof
    125MiB     7357 mm/slub.c:2201 module:slub func:alloc_slab_page

Since v2:
 - tglx noticed a circular header dependency between sched.h and
   percpu.h; a bunch of header cleanups were merged into 6.8 to
   ameliorate this [3].
 - a number of improvements, moving alloc_hooks() annotations to the
   correct place for better tracking (mempool), and bugfixes.
 - looked at alternate hooking methods. There were suggestions on
   alternate methods (compiler attribute, trampolines), but they
   wouldn't have made the patchset any cleaner (we still need to have
   different function versions for accounting vs. no accounting to
   control at which point in a call chain the accounting happens), and
   they would have added a dependency on toolchain support.

Usage:

kconfig options:
 - CONFIG_MEM_ALLOC_PROFILING
 - CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT
 - CONFIG_MEM_ALLOC_PROFILING_DEBUG
   adds warnings for allocations that weren't accounted because of a
   missing annotation

sysctl:
  /proc/sys/vm/mem_profiling

Runtime info:
  /proc/allocinfo

Notes:

[1]: Overhead
To measure the overhead we are comparing the following configurations:
(1) Baseline with CONFIG_MEMCG_KMEM=n
(2) Disabled by default (CONFIG_MEM_ALLOC_PROFILING=y &&
    CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=n)
(3) Enabled by default (CONFIG_MEM_ALLOC_PROFILING=y &&
    CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y)
(4) Enabled at runtime (CONFIG_MEM_ALLOC_PROFILING=y &&
    CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=n &&
    /proc/sys/vm/mem_profiling=1)
(5) Baseline with CONFIG_MEMCG_KMEM=y && allocating with __GFP_ACCOUNT

Performance overhead:
To evaluate performance we implemented an in-kernel test executing
multiple get_free_page/free_page and kmalloc/kfree calls with allocation
sizes growing from 8 to 240 bytes, with CPU frequency set to max and CPU
affinity set to a specific CPU to minimize the noise.
Below are results from running the test on Ubuntu 22.04.2 LTS with a
6.8.0-rc1 kernel on a 56 core Intel Xeon:

                        kmalloc                  pgalloc
(1 baseline)            6.764s                   16.902s
(2 default disabled)    6.793s  (+0.43%)         17.007s (+0.62%)
(3 default enabled)     7.197s  (+6.40%)         23.666s (+40.02%)
(4 runtime enabled)     7.405s  (+9.48%)         23.901s (+41.41%)
(5 memcg)               13.388s (+97.94%)        48.460s (+186.71%)

Memory overhead:

Kernel size:

        text       data       bss        dec        diff
(1)     26515311   18890222   17018880   62424413
(2)     26524728   19423818   16740352   62688898   264485
(3)     26524724   19423818   16740352   62688894   264481
(4)     26524728   19423818   16740352   62688898   264485
(5)     26541782   18964374   16957440   62463596   39183

Memory consumption on a 56 core Intel CPU with 125GB of memory:

  Code tags:      192 kB
  PageExts:    262144 kB (256MB)
  SlabExts:      9876 kB (9.6MB)
  PcpuExts:       512 kB (0.5MB)

Total overhead is 0.2% of total memory.

[2]: Improved fault injection is the big one; the alloc_hooks() macro
this patchset introduces is also used for per-callsite fault injection
points in the dynamic fault injection patchset, which means we can
easily do fault injection on a per module or per file basis; this makes
it much easier to integrate memory fault injection into existing tests.

Vlastimil recently raised concerns about exposing GFP_NOWAIT as a
PF_MEMALLOC_* flag, as this might introduce GFP_NOWAIT to allocation
paths that have never had their failure paths tested - this is something
we need to address.

[3]: The circular dependency looks to be unavoidable; the issue is that
alloc_tag_save() -> current -> get_current() requires percpu.h, and
percpu.h requires sched.h because of course it does. But this doesn't
actually cause build errors because we're only using macros, so the main
concern is just not leaving a difficult-to-disentangle minefield for
later.

So, sched.h is now pretty close to being a types-only header that
imports types and declares types - this is the header cleanup that was
merged for 6.8.
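[Editor's note] To make the per-callsite accounting concrete, below is a
minimal userspace model of the idea: one static tag per call site,
updated by a wrapper macro. This is an illustrative sketch only, not
code from the series - in the kernel the tags live in a dedicated linker
section, the counters are per-cpu, and /proc/allocinfo walks that
section. The names my_alloc_tag and my_malloc are hypothetical; the
macro relies on the GCC/Clang statement-expression extension that the
kernel macros themselves use.

	#include <stdio.h>
	#include <stdlib.h>

	/* Userspace stand-in for a per-callsite allocation tag. */
	struct my_alloc_tag {
		const char *file;
		int line;
		unsigned long bytes;
		unsigned long calls;
	};

	/*
	 * One static tag per call site, updated on every allocation.
	 * (Simplified: no guard against double evaluation of @size.)
	 */
	#define my_malloc(size) ({					\
		static struct my_alloc_tag _tag =			\
			{ __FILE__, __LINE__, 0, 0 };			\
		_tag.bytes += (size);					\
		_tag.calls++;						\
		printf("%s:%d bytes=%lu calls=%lu\n",			\
		       _tag.file, _tag.line, _tag.bytes, _tag.calls);	\
		malloc(size);						\
	})

	int main(void)
	{
		for (int i = 0; i < 3; i++)
			free(my_malloc(64));	/* charged to this file:line */
		return 0;
	}

Because the tag is a static object keyed to the expansion site, the cost
per allocation is a couple of counter updates - which is what makes the
"low enough for production" overhead numbers above plausible.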
Kent Overstreet (11):
  lib/string_helpers: Add flags param to string_get_size()
  scripts/kallysms: Always include __start and __stop symbols
  fs: Convert alloc_inode_sb() to a macro
  mm/slub: Mark slab_free_freelist_hook() __always_inline
  mempool: Hook up to memory allocation profiling
  xfs: Memory allocation profiling fixups
  mm: percpu: Introduce pcpuobj_ext
  mm: percpu: Add codetag reference into pcpuobj_ext
  mm: vmalloc: Enable memory allocation profiling
  rhashtable: Plumb through alloc tag
  MAINTAINERS: Add entries for code tagging and memory allocation
    profiling

Suren Baghdasaryan (24):
  mm: enumerate all gfp flags
  mm: introduce slabobj_ext to support slab object extensions
  mm: introduce __GFP_NO_OBJ_EXT flag to selectively prevent
    slabobj_ext creation
  mm/slab: introduce SLAB_NO_OBJ_EXT to avoid obj_ext creation
  mm: prevent slabobj_ext allocations for slabobj_ext and kmem_cache
    objects
  slab: objext: introduce objext_flags as extension to
    page_memcg_data_flags
  lib: code tagging framework
  lib: code tagging module support
  lib: prevent module unloading if memory is not freed
  lib: add allocation tagging support for memory allocation profiling
  lib: introduce support for page allocation tagging
  mm: percpu: increase PERCPU_MODULE_RESERVE to accommodate allocation
    tags
  change alloc_pages name in dma_map_ops to avoid name conflicts
  mm: enable page allocation tagging
  mm: create new codetag references during page splitting
  mm/page_ext: enable early_page_ext when
    CONFIG_MEM_ALLOC_PROFILING_DEBUG=y
  lib: add codetag reference into slabobj_ext
  mm/slab: add allocation accounting into slab allocation and free paths
  mm/slab: enable slab allocation tagging for kmalloc and friends
  mm: percpu: enable per-cpu allocation tagging
  lib: add memory allocations report in show_mem()
  codetag: debug: skip objext checking when it's for objext itself
  codetag: debug: mark codetags for reserved pages as empty
  codetag: debug: introduce OBJEXTS_ALLOC_FAIL to mark failed slab_ext
    allocations

 Documentation/admin-guide/sysctl/vm.rst            |  16 ++
 Documentation/filesystems/proc.rst                 |  28 ++
 MAINTAINERS                                        |  16 ++
 arch/alpha/kernel/pci_iommu.c                      |   2 +-
 arch/mips/jazz/jazzdma.c                           |   2 +-
 arch/powerpc/kernel/dma-iommu.c                    |   2 +-
 arch/powerpc/mm/book3s64/radix_pgtable.c           |   2 +-
 arch/powerpc/platforms/ps3/system-bus.c            |   4 +-
 arch/powerpc/platforms/pseries/vio.c               |   2 +-
 arch/x86/kernel/amd_gart_64.c                      |   2 +-
 drivers/block/virtio_blk.c                         |   4 +-
 drivers/gpu/drm/gud/gud_drv.c                      |   2 +-
 drivers/iommu/dma-iommu.c                          |   2 +-
 drivers/mmc/core/block.c                           |   4 +-
 drivers/mtd/spi-nor/debugfs.c                      |   6 +-
 .../ethernet/chelsio/cxgb4/cxgb4_debugfs.c         |   4 +-
 drivers/parisc/ccio-dma.c                          |   2 +-
 drivers/parisc/sba_iommu.c                         |   2 +-
 drivers/scsi/sd.c                                  |   8 +-
 drivers/staging/media/atomisp/pci/hmm/hmm.c        |   2 +-
 drivers/xen/grant-dma-ops.c                        |   2 +-
 drivers/xen/swiotlb-xen.c                          |   2 +-
 fs/xfs/kmem.c                                      |   4 +-
 fs/xfs/kmem.h                                      |  10 +-
 include/asm-generic/codetag.lds.h                  |  14 +
 include/asm-generic/vmlinux.lds.h                  |   3 +
 include/linux/alloc_tag.h                          | 188 +++++++++++++
 include/linux/codetag.h                            |  83 ++++++
 include/linux/dma-map-ops.h                        |   2 +-
 include/linux/fortify-string.h                     |   5 +-
 include/linux/fs.h                                 |   6 +-
 include/linux/gfp.h                                | 126 +++++----
 include/linux/gfp_types.h                          | 101 +++++--
 include/linux/memcontrol.h                         |  56 +++-
 include/linux/mempool.h                            |  73 +++--
 include/linux/mm.h                                 |   8 +
 include/linux/mm_types.h                           |   4 +-
 include/linux/page_ext.h                           |   1 -
 include/linux/pagemap.h                            |   9 +-
 include/linux/percpu.h                             |  27 +-
 include/linux/pgalloc_tag.h                        | 105 +++++++
 include/linux/rhashtable-types.h                   |  11 +-
 include/linux/sched.h                              |  24 ++
 include/linux/slab.h                               | 184 +++++++------
 include/linux/string.h                             |   4 +-
 include/linux/string_helpers.h                     |  11 +-
 include/linux/vmalloc.h                            |  60 +++-
 init/Kconfig                                       |   4 +
 kernel/dma/mapping.c                               |   4 +-
 kernel/kallsyms_selftest.c                         |   2 +-
 kernel/module/main.c                               |  25 +-
 lib/Kconfig.debug                                  |  31 +++
 lib/Makefile                                       |   3 +
 lib/alloc_tag.c                                    | 213 +++++++++++++++
 lib/codetag.c                                      | 258 ++++++++++++++++++
 lib/rhashtable.c                                   |  52 +++-
 lib/string_helpers.c                               |  22 +-
 lib/test-string_helpers.c                          |   4 +-
 mm/compaction.c                                    |   7 +-
 mm/filemap.c                                       |   6 +-
 mm/huge_memory.c                                   |   2 +
 mm/hugetlb.c                                       |   8 +-
 mm/kfence/core.c                                   |  14 +-
 mm/kfence/kfence.h                                 |   4 +-
 mm/memcontrol.c                                    |  56 +---
 mm/mempolicy.c                                     |  52 ++--
 mm/mempool.c                                       |  36 +--
 mm/mm_init.c                                       |  10 +
 mm/page_alloc.c                                    |  66 +++--
 mm/page_ext.c                                      |  13 +
 mm/page_owner.c                                    |   2 +-
 mm/percpu-internal.h                               |  26 +-
 mm/percpu.c                                        | 120 ++++----
 mm/show_mem.c                                      |  15 +
 mm/slab.h                                          | 176 ++++++++++--
 mm/slab_common.c                                   |  65 ++++-
 mm/slub.c                                          | 138 ++++++----
 mm/util.c                                          |  44 +--
 mm/vmalloc.c                                       |  88 +++---
 scripts/kallsyms.c                                 |  13 +
 scripts/module.lds.S                               |   7 +
 81 files changed, 2126 insertions(+), 695 deletions(-)
 create mode 100644 include/asm-generic/codetag.lds.h
 create mode 100644 include/linux/alloc_tag.h
 create mode 100644 include/linux/codetag.h
 create mode 100644 include/linux/pgalloc_tag.h
 create mode 100644 lib/alloc_tag.c
 create mode 100644 lib/codetag.c
Comments
On Mon, Feb 12, 2024 at 01:38:46PM -0800, Suren Baghdasaryan wrote:
> Low overhead [1] per-callsite memory allocation profiling. Not just for debug
> kernels, overhead low enough to be deployed in production.

What's the plan for things like devm_kmalloc() and similar relatively
simple wrappers?

I was thinking it would be possible to reimplement at least
devm_kmalloc() with a size and flags changing helper a while back:
https://lore.kernel.org/all/202309111428.6F36672F57@keescook/

I suspect it could be possible to adapt the alloc_hooks wrapper in this
series similarly:

#define alloc_hooks_prep(_do_alloc, _do_prepare, _do_finish,	\
			 ctx, size, flags)			\
({								\
	typeof(_do_alloc) _res;					\
	DEFINE_ALLOC_TAG(_alloc_tag, _old);			\
	ssize_t _size = (size);					\
	size_t _usable = _size;					\
	gfp_t _flags = (flags);					\
								\
	_res = _do_prepare(ctx, &_size, &_flags);		\
	if (!IS_ERR_OR_NULL(_res))				\
		_res = _do_alloc(_size, _flags);		\
	if (!IS_ERR_OR_NULL(_res))				\
		_res = _do_finish(ctx, _usable, _size, _flags, _res); \
	_res;							\
})

#define devm_kmalloc(dev, size, flags)				\
	alloc_hooks_prep(kmalloc, devm_alloc_prep, devm_alloc_finish, \
			 dev, size, flags)

And devm_alloc_prep() and devm_alloc_finish() adapted from the URL
above. And _do_finish instances could be marked with __realloc_size(2).

-Kees
On Mon 12-02-24 13:38:46, Suren Baghdasaryan wrote:
[...]
> We're aiming to get this in the next merge window, for 6.9. The feedback
> we've gotten has been that even out of tree this patchset has already
> been useful, and there's a significant amount of other work gated on the
> code tagging functionality included in this patchset [2].

I suspect it will not come as a surprise that I really dislike the
implementation proposed here. I will not repeat my arguments, I have
done so on several occasions already.

Anyway, I didn't go as far as to nak it, even though I _strongly_
believe this debugging feature will add a maintenance overhead for a
very long time. I can live with all the downsides of the proposed
implementation _as long as_ there is a wider agreement from the MM
community, as this is where the maintenance cost will be paid. So far I
have not seen (m)any acks by MM developers, so aiming at the next merge
window is more than a little rushed.

> 81 files changed, 2126 insertions(+), 695 deletions(-)
On 13.02.24 22:58, Suren Baghdasaryan wrote:
> On Tue, Feb 13, 2024 at 4:24 AM Michal Hocko <mhocko@suse.com> wrote:
>> [...]
>> I can live with all the downsides of the proposed implementation
>> _as long as_ there is a wider agreement from the MM community, as this
>> is where the maintenance cost will be paid. So far I have not seen
>> (m)any acks by MM developers, so aiming at the next merge window is
>> more than a little rushed.
>
> We tried other previously proposed approaches and all have their
> downsides without making maintenance much easier. Your position is
> understandable and I think it's fair. Let's see if others see more
> benefit than cost here.

Would it make sense to discuss that at LSF/MM once again, especially
covering why proposed alternatives did not work out? LSF/MM is not "too
far" away (May).

I recall that the last LSF/MM session on this topic was a bit
unfortunate (IMHO not as productive as it could have been). Maybe we can
finally reach a consensus on this.
On Tue, Feb 13, 2024 at 11:04:58PM +0100, David Hildenbrand wrote:
> [...]
> Would it make sense to discuss that at LSF/MM once again, especially
> covering why proposed alternatives did not work out? LSF/MM is not "too
> far" away (May).
>
> I recall that the last LSF/MM session on this topic was a bit
> unfortunate (IMHO not as productive as it could have been). Maybe we
> can finally reach a consensus on this.

I'd rather not delay for more bikeshedding. Before agreeing to LSF I'd
need to see a serious proposal - what we had at the last LSF was people
jumping in with half baked alternative proposals that very much hadn't
been thought through, and I see no need to repeat that.

Like I mentioned, there's other work gated on this patchset; if people
want to hold this up for more discussion they better be putting forth
something to discuss.
On 13.02.24 23:09, Kent Overstreet wrote:
> [...]
> I'd rather not delay for more bikeshedding. Before agreeing to LSF I'd
> need to see a serious proposal - what we had at the last LSF was people
> jumping in with half baked alternative proposals that very much hadn't
> been thought through, and I see no need to repeat that.
>
> Like I mentioned, there's other work gated on this patchset; if people
> want to hold this up for more discussion they better be putting forth
> something to discuss.

I'm thinking of ways on how to achieve Michal's request: "as long as
there is a wider agreement from the MM community". If we can achieve
that without LSF, great! (a bi-weekly MM meeting might also be an
option)
On Tue, Feb 13, 2024 at 11:17:32PM +0100, David Hildenbrand wrote:
> [...]
> I'm thinking of ways on how to achieve Michal's request: "as long as
> there is a wider agreement from the MM community". If we can achieve
> that without LSF, great! (a bi-weekly MM meeting might also be an
> option)

A meeting wouldn't be out of the question, _if_ there is an agenda, but:

What's that coffee mug say? I just survived another meeting that
could've been an email?

What exactly is the outcome we're looking for?

Is there info that people are looking for? I think we summed things up
pretty well in the cover letter; if there are specifics that people want
to discuss, that's why we emailed the series out.

There's people in this thread who've used this patchset in production
and diagnosed real issues (gigabytes of memory gone missing, I heard the
other day); I'm personally looking for them to chime in on this thread
(Johannes, Pasha).

If it's just grumbling about "maintenance overhead" we need to get past
- well, people are going to have to accept that we can't deliver
features without writing code, and I'm confident that the hooking in
particular is about as clean as it's going to get, _regardless_ of
toolchain support; and moreover it addresses what's been historically a
pretty gaping hole in our ability to profile and understand the code we
write.
On 13.02.24 23:30, Suren Baghdasaryan wrote:
> [...]
> There will be a maintenance burden even with the cleanest proposed
> approach.

Yes.

> We worked hard to make the patchset as clean as possible and
> if benefits still don't outweigh the maintenance cost then we should
> probably stop trying.

Indeed.

> At LSF/MM I would rather discuss functional
> issues/requirements/improvements than alternative approaches to
> instrument allocators.
> I'm happy to arrange a separate meeting with MM folks if that would
> help to progress on the cost/benefit decision.

Note that I am only proposing ways forward.

If you think you can easily achieve what Michal requested without all
that, good.

My past experience was that LSF/MM / bi-weekly MM meetings were
extremely helpful to reach consensus.
On Tue, Feb 13, 2024 at 11:48:41PM +0100, David Hildenbrand wrote:
> [...]
> Note that I am only proposing ways forward.
>
> If you think you can easily achieve what Michal requested without all
> that, good.

He requested something?
On 13.02.24 23:50, Kent Overstreet wrote:
> [...]
>> If you think you can easily achieve what Michal requested without all
>> that, good.
>
> He requested something?

This won't get merged without acks from MM people.
On Tue, Feb 13, 2024 at 2:50 PM Kent Overstreet
<kent.overstreet@linux.dev> wrote:
> [...]
>> If you think you can easily achieve what Michal requested without all
>> that, good.
>
> He requested something?

Yes, a cleaner instrumentation. Unfortunately the cleanest one is not
possible until the compiler feature is developed and deployed. And it
still would require changes to the headers, so I don't think it's worth
delaying the feature for years.
On 13.02.24 23:59, Suren Baghdasaryan wrote:
> [...]
> Yes, a cleaner instrumentation. Unfortunately the cleanest one is not
> possible until the compiler feature is developed and deployed. And it
> still would require changes to the headers, so I don't think it's worth
> delaying the feature for years.

I was talking about this: "I can live with all the downsides of the
proposed implementation as long as there is a wider agreement from the
MM community as this is where the maintenance cost will be paid. So far
I have not seen (m)any acks by MM developers".

I certainly cannot be motivated at this point to review and ack this,
unfortunately too much negative energy around here.
On Tue, Feb 13, 2024 at 02:59:11PM -0800, Suren Baghdasaryan wrote:
> [...]
> Yes, a cleaner instrumentation. Unfortunately the cleanest one is not
> possible until the compiler feature is developed and deployed. And it
> still would require changes to the headers, so I don't think it's worth
> delaying the feature for years.

Hang on, let's look at the actual code. This is what instrumenting an
allocation function looks like:

#define krealloc_array(...) alloc_hooks(krealloc_array_noprof(__VA_ARGS__))

IOW, we have to:
 - rename krealloc_array to krealloc_array_noprof
 - replace krealloc_array with one wrapper macro call

Is this really all we're getting worked up over?

The renaming we need regardless, because the thing that makes this
approach efficient enough to run in production is that we account at
_one_ point in the callstack, we don't save entire backtraces. And thus
we need to explicitly annotate which one that is; which means we need
_noprof() versions of functions for when the accounting is done by an
outer wrapper (e.g. mempool).

And, as I keep saying: that alloc_hooks() macro will also get us _per
callsite fault injection points_, and we really need that because - if
you guys have been paying attention to other threads - whenever moving
more stuff to PF_MEMALLOC_* flags comes up (including adding
PF_MEMALLOC_NORECLAIM), the issue of small allocations not failing and
not being testable keeps coming up.
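[Editor's note] As a hedged illustration of the "account at one point in
the callstack" rule Kent describes, here is a condensed sketch of the
mempool case. The series does hook mempool, but this is a simplification
of the shape of that change, not the literal patch:

	/*
	 * Condensed sketch (not the actual mempool patch): the outer
	 * wrapper is the annotated callsite, so everything reached from
	 * inside it uses _noprof variants - otherwise the allocation
	 * would be charged twice.
	 */
	void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask)
	{
		/* internal allocation: deliberately not accounted here */
		return pool->alloc(gfp_mask, pool->pool_data);
	}

	/* callers of mempool_alloc() are charged at their own file:line */
	#define mempool_alloc(...) alloc_hooks(mempool_alloc_noprof(__VA_ARGS__))

The design point is that wrappers own the callsite: /proc/allocinfo then
attributes the memory to the caller of mempool_alloc() rather than to
mempool's internals.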
On Tue, Feb 13, 2024 at 05:29:03PM -0500, Kent Overstreet wrote: > On Tue, Feb 13, 2024 at 11:17:32PM +0100, David Hildenbrand wrote: > > On 13.02.24 23:09, Kent Overstreet wrote: > > > On Tue, Feb 13, 2024 at 11:04:58PM +0100, David Hildenbrand wrote: > > > > On 13.02.24 22:58, Suren Baghdasaryan wrote: > > > > > On Tue, Feb 13, 2024 at 4:24 AM Michal Hocko <mhocko@suse.com> wrote: > > > > > > > > > > > > On Mon 12-02-24 13:38:46, Suren Baghdasaryan wrote: > > > > > > [...] > > > > > > > We're aiming to get this in the next merge window, for 6.9. The feedback > > > > > > > we've gotten has been that even out of tree this patchset has already > > > > > > > been useful, and there's a significant amount of other work gated on the > > > > > > > code tagging functionality included in this patchset [2]. > > > > > > > > > > > > I suspect it will not come as a surprise that I really dislike the > > > > > > implementation proposed here. I will not repeat my arguments, I have > > > > > > done so on several occasions already. > > > > > > > > > > > > Anyway, I didn't go as far as to nak it even though I _strongly_ believe > > > > > > this debugging feature will add a maintenance overhead for a very long > > > > > > time. I can live with all the downsides of the proposed implementation > > > > > > _as long as_ there is a wider agreement from the MM community as this is > > > > > > where the maintenance cost will be payed. So far I have not seen (m)any > > > > > > acks by MM developers so aiming into the next merge window is more than > > > > > > little rushed. > > > > > > > > > > We tried other previously proposed approaches and all have their > > > > > downsides without making maintenance much easier. Your position is > > > > > understandable and I think it's fair. Let's see if others see more > > > > > benefit than cost here. > > > > > > > > Would it make sense to discuss that at LSF/MM once again, especially > > > > covering why proposed alternatives did not work out? LSF/MM is not "too far" > > > > away (May). You want to stall this effort for *three months* to schedule a meeting? I would love to have better profiling of memory allocations inside XFS so that we can answer questions like "What's going on with these allocation stalls?" or "What memory is getting used, and where?" more quickly than we can now. Right now I get to stare at tracepoints and printk crap until my eyes bleed, and maybe dchinner comes to my rescue and figures out what's going on sooner than I do. More often we just never figure it out because only the customer can reproduce the problem, the reams of data produced by ftrace is unmanageable, and BPF isn't always available. I'm not thrilled by the large increase in macro crap in the allocation paths, but I don't know of a better way to instrument things. Our attempts to use _RET_IP in XFS to instrument its code paths have never worked quite right w.r.t. inlined functions and whatnot. > > > > I recall that the last LSF/MM session on this topic was a bit unfortunate > > > > (IMHO not as productive as it could have been). Maybe we can finally reach a > > > > consensus on this. From my outsider's perspective, nine months have gone by since the last LSF. Who has come up with a cleaner/better/faster way to do what Suren and Kent have done? Were those code changes integrated into this patchset? Or why not? Most of what I saw in 2023 involved compiler changes (great; now I have to wait until RHEL 11/Debian 14 to use it) and/or still utilize fugly macros. 
Recalling all the way back to suggestions made in 2022, who wrote the
prototype for doing this via ftrace? Or BPF? How well did that go for
counting allocation events and the like? I saw Tejun saying something
about how they use BPF aggressively inside Meta, but that was about it.

Were any of those solutions significantly better than what's in front of
us here?

I get it, a giant patch forcing everyone to know the difference between
alloc_foo and alloc_foo_noprof offends my (yours?) stylistic
sensibilities. On the other side, making analysis easier during
customer escalations means we kernel people get data, answers, and
solutions sooner instead of watching all our time get eaten up on L4
support and backporting hell.

> > > I'd rather not delay for more bikeshedding. Before agreeing to LSF I'd
> > > need to see a serious proposal - what we had at the last LSF was people
> > > jumping in with half baked alternative proposals that very much hadn't
> > > been thought through, and I see no need to repeat that.
> > >
> > > Like I mentioned, there's other work gated on this patchset; if people
> > > want to hold this up for more discussion they better be putting forth
> > > something to discuss.
> >
> > I'm thinking of ways on how to achieve Michal's request: "as long as there
> > is a wider agreement from the MM community". If we can achieve that without
> > LSF, great! (a bi-weekly MM meeting might also be an option)
>
> A meeting wouldn't be out of the question, _if_ there is an agenda, but:
>
> What's that coffee mug say? I just survived another meeting that
> could've been an email?

I congratulate you on your memory of my kitchen mug. Yes, that's what
it says.

> What exactly is the outcome we're looking for?
>
> Is there info that people are looking for? I think we summed things up
> pretty well in the cover letter; if there are specifics that people
> want to discuss, that's why we emailed the series out.
>
> There's people in this thread who've used this patchset in production
> and diagnosed real issues (gigabytes of memory gone missing, I heard the
> other day); I'm personally looking for them to chime in on this thread
> (Johannes, Pasha).
>
> If it's just grumbling about "maintenance overhead" we need to get past
> - well, people are going to have to accept that we can't deliver
> features without writing code, and I'm confident that the hooking in
> particular is about as clean as it's going to get, _regardless_ of
> toolchain support; and moreover it addresses what's been historically a
> pretty gaping hole in our ability to profile and understand the code we
> write.

Are you and Suren willing to pay whatever maintenance overhead there is?

--D
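As background on the _RET_IP_ problem Darrick mentions: the kernel does
define _RET_IP_ as __builtin_return_address(0), and the short userspace
sketch below (illustrative only; function names are hypothetical) shows
why attribution based on it degrades once helpers get inlined:

  #include <stdio.h>

  /* The kernel's definition, reproduced here for illustration. */
  #define _RET_IP_ ((unsigned long)__builtin_return_address(0))

  /* Without inlining, _RET_IP_ inside this helper points into the
   * caller (site_a or site_b) - the "callsite" we want. Once the
   * compiler inlines the helper, the captured address becomes the
   * return address of whatever function the body landed in, so the
   * allocation gets attributed to the caller's caller instead. */
  static inline void *helper_alloc(void)
  {
          printf("alloc from ip %#lx\n", _RET_IP_);
          return NULL;
  }

  void *site_a(void) { return helper_alloc(); }
  void *site_b(void) { return helper_alloc(); }

  int main(void)
  {
          site_a(); /* with inlining, both sites may report an address */
          site_b(); /* in main() rather than their own location */
          return 0;
  }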
On Wed, Feb 14, 2024 at 12:02:30AM +0100, David Hildenbrand wrote: > On 13.02.24 23:59, Suren Baghdasaryan wrote: > > On Tue, Feb 13, 2024 at 2:50 PM Kent Overstreet > > <kent.overstreet@linux.dev> wrote: > > > > > > On Tue, Feb 13, 2024 at 11:48:41PM +0100, David Hildenbrand wrote: > > > > On 13.02.24 23:30, Suren Baghdasaryan wrote: > > > > > On Tue, Feb 13, 2024 at 2:17 PM David Hildenbrand <david@redhat.com> wrote: > > > > > > > > > > > > On 13.02.24 23:09, Kent Overstreet wrote: > > > > > > > On Tue, Feb 13, 2024 at 11:04:58PM +0100, David Hildenbrand wrote: > > > > > > > > On 13.02.24 22:58, Suren Baghdasaryan wrote: > > > > > > > > > On Tue, Feb 13, 2024 at 4:24 AM Michal Hocko <mhocko@suse.com> wrote: > > > > > > > > > > > > > > > > > > > > On Mon 12-02-24 13:38:46, Suren Baghdasaryan wrote: > > > > > > > > > > [...] > > > > > > > > > > > We're aiming to get this in the next merge window, for 6.9. The feedback > > > > > > > > > > > we've gotten has been that even out of tree this patchset has already > > > > > > > > > > > been useful, and there's a significant amount of other work gated on the > > > > > > > > > > > code tagging functionality included in this patchset [2]. > > > > > > > > > > > > > > > > > > > > I suspect it will not come as a surprise that I really dislike the > > > > > > > > > > implementation proposed here. I will not repeat my arguments, I have > > > > > > > > > > done so on several occasions already. > > > > > > > > > > > > > > > > > > > > Anyway, I didn't go as far as to nak it even though I _strongly_ believe > > > > > > > > > > this debugging feature will add a maintenance overhead for a very long > > > > > > > > > > time. I can live with all the downsides of the proposed implementation > > > > > > > > > > _as long as_ there is a wider agreement from the MM community as this is > > > > > > > > > > where the maintenance cost will be payed. So far I have not seen (m)any > > > > > > > > > > acks by MM developers so aiming into the next merge window is more than > > > > > > > > > > little rushed. > > > > > > > > > > > > > > > > > > We tried other previously proposed approaches and all have their > > > > > > > > > downsides without making maintenance much easier. Your position is > > > > > > > > > understandable and I think it's fair. Let's see if others see more > > > > > > > > > benefit than cost here. > > > > > > > > > > > > > > > > Would it make sense to discuss that at LSF/MM once again, especially > > > > > > > > covering why proposed alternatives did not work out? LSF/MM is not "too far" > > > > > > > > away (May). > > > > > > > > > > > > > > > > I recall that the last LSF/MM session on this topic was a bit unfortunate > > > > > > > > (IMHO not as productive as it could have been). Maybe we can finally reach a > > > > > > > > consensus on this. > > > > > > > > > > > > > > I'd rather not delay for more bikeshedding. Before agreeing to LSF I'd > > > > > > > need to see a serious proposl - what we had at the last LSF was people > > > > > > > jumping in with half baked alternative proposals that very much hadn't > > > > > > > been thought through, and I see no need to repeat that. > > > > > > > > > > > > > > Like I mentioned, there's other work gated on this patchset; if people > > > > > > > want to hold this up for more discussion they better be putting forth > > > > > > > something to discuss. > > > > > > > > > > > > I'm thinking of ways on how to achieve Michal's request: "as long as > > > > > > there is a wider agreement from the MM community". 
If we can achieve
> > > > > > that without LSF, great! (a bi-weekly MM meeting might also be an option)
> > > > >
> > > > > There will be a maintenance burden even with the cleanest proposed
> > > > > approach.
> > > >
> > > > Yes.
> > > >
> > > > > We worked hard to make the patchset as clean as possible and
> > > > > if benefits still don't outweigh the maintenance cost then we should
> > > > > probably stop trying.
> > > >
> > > > Indeed.
> > > >
> > > > > At LSF/MM I would rather discuss functional
> > > > > issues/requirements/improvements than alternative approaches to
> > > > > instrument allocators.
> > > > > I'm happy to arrange a separate meeting with MM folks if that would
> > > > > help to progress on the cost/benefit decision.
> > > > Note that I am only proposing ways forward.
> > > >
> > > > If you think you can easily achieve what Michal requested without all that,
> > > > good.
> > >
> > > He requested something?
> >
> > Yes, a cleaner instrumentation. Unfortunately the cleanest one is not
> > possible until the compiler feature is developed and deployed. And it
> > still would require changes to the headers, so I don't think it's worth
> > delaying the feature for years.
>
> I was talking about this: "I can live with all the downsides of the proposed
> implementation as long as there is a wider agreement from the MM community as
> this is where the maintenance cost will be paid. So far I have not seen
> (m)any acks by MM developers".
>
> I certainly cannot be motivated at this point to review and ack this,
> unfortunately too much negative energy around here.

David, this kind of reaction is exactly why I was telling Andrew I was
going to submit this as a direct pull request to Linus.

This is an important feature; if we can't stay focused on the technical
side and get it done, that's what I'll do.
On 14.02.24 00:12, Kent Overstreet wrote: > On Wed, Feb 14, 2024 at 12:02:30AM +0100, David Hildenbrand wrote: >> On 13.02.24 23:59, Suren Baghdasaryan wrote: >>> On Tue, Feb 13, 2024 at 2:50 PM Kent Overstreet >>> <kent.overstreet@linux.dev> wrote: >>>> >>>> On Tue, Feb 13, 2024 at 11:48:41PM +0100, David Hildenbrand wrote: >>>>> On 13.02.24 23:30, Suren Baghdasaryan wrote: >>>>>> On Tue, Feb 13, 2024 at 2:17 PM David Hildenbrand <david@redhat.com> wrote: >>>>>>> >>>>>>> On 13.02.24 23:09, Kent Overstreet wrote: >>>>>>>> On Tue, Feb 13, 2024 at 11:04:58PM +0100, David Hildenbrand wrote: >>>>>>>>> On 13.02.24 22:58, Suren Baghdasaryan wrote: >>>>>>>>>> On Tue, Feb 13, 2024 at 4:24 AM Michal Hocko <mhocko@suse.com> wrote: >>>>>>>>>>> >>>>>>>>>>> On Mon 12-02-24 13:38:46, Suren Baghdasaryan wrote: >>>>>>>>>>> [...] >>>>>>>>>>>> We're aiming to get this in the next merge window, for 6.9. The feedback >>>>>>>>>>>> we've gotten has been that even out of tree this patchset has already >>>>>>>>>>>> been useful, and there's a significant amount of other work gated on the >>>>>>>>>>>> code tagging functionality included in this patchset [2]. >>>>>>>>>>> >>>>>>>>>>> I suspect it will not come as a surprise that I really dislike the >>>>>>>>>>> implementation proposed here. I will not repeat my arguments, I have >>>>>>>>>>> done so on several occasions already. >>>>>>>>>>> >>>>>>>>>>> Anyway, I didn't go as far as to nak it even though I _strongly_ believe >>>>>>>>>>> this debugging feature will add a maintenance overhead for a very long >>>>>>>>>>> time. I can live with all the downsides of the proposed implementation >>>>>>>>>>> _as long as_ there is a wider agreement from the MM community as this is >>>>>>>>>>> where the maintenance cost will be payed. So far I have not seen (m)any >>>>>>>>>>> acks by MM developers so aiming into the next merge window is more than >>>>>>>>>>> little rushed. >>>>>>>>>> >>>>>>>>>> We tried other previously proposed approaches and all have their >>>>>>>>>> downsides without making maintenance much easier. Your position is >>>>>>>>>> understandable and I think it's fair. Let's see if others see more >>>>>>>>>> benefit than cost here. >>>>>>>>> >>>>>>>>> Would it make sense to discuss that at LSF/MM once again, especially >>>>>>>>> covering why proposed alternatives did not work out? LSF/MM is not "too far" >>>>>>>>> away (May). >>>>>>>>> >>>>>>>>> I recall that the last LSF/MM session on this topic was a bit unfortunate >>>>>>>>> (IMHO not as productive as it could have been). Maybe we can finally reach a >>>>>>>>> consensus on this. >>>>>>>> >>>>>>>> I'd rather not delay for more bikeshedding. Before agreeing to LSF I'd >>>>>>>> need to see a serious proposl - what we had at the last LSF was people >>>>>>>> jumping in with half baked alternative proposals that very much hadn't >>>>>>>> been thought through, and I see no need to repeat that. >>>>>>>> >>>>>>>> Like I mentioned, there's other work gated on this patchset; if people >>>>>>>> want to hold this up for more discussion they better be putting forth >>>>>>>> something to discuss. >>>>>>> >>>>>>> I'm thinking of ways on how to achieve Michal's request: "as long as >>>>>>> there is a wider agreement from the MM community". If we can achieve >>>>>>> that without LSF, great! (a bi-weekly MM meeting might also be an option) >>>>>> >>>>>> There will be a maintenance burden even with the cleanest proposed >>>>>> approach. >>>>> >>>>> Yes. 
>>>>> >>>>>> We worked hard to make the patchset as clean as possible and >>>>>> if benefits still don't outweigh the maintenance cost then we should >>>>>> probably stop trying. >>>>> >>>>> Indeed. >>>>> >>>>>> At LSF/MM I would rather discuss functonal >>>>>> issues/requirements/improvements than alternative approaches to >>>>>> instrument allocators. >>>>>> I'm happy to arrange a separate meeting with MM folks if that would >>>>>> help to progress on the cost/benefit decision. >>>>> Note that I am only proposing ways forward. >>>>> >>>>> If you think you can easily achieve what Michal requested without all that, >>>>> good. >>>> >>>> He requested something? >>> >>> Yes, a cleaner instrumentation. Unfortunately the cleanest one is not >>> possible until the compiler feature is developed and deployed. And it >>> still would require changes to the headers, so don't think it's worth >>> delaying the feature for years. >>> >> >> I was talking about this: "I can live with all the downsides of the proposed >> implementationas long as there is a wider agreement from the MM community as >> this is where the maintenance cost will be payed. So far I have not seen >> (m)any acks by MM developers". >> >> I certainly cannot be motivated at this point to review and ack this, >> unfortunately too much negative energy around here. > > David, this kind of reaction is exactly why I was telling Andrew I was > going to submit this as a direct pull request to Linus. > > This is an important feature; if we can't stay focused ot the technical > and get it done that's what I'll do. Kent, I started this with "Would it make sense" in an attempt to help Suren and you to finally make progress with this, one way or the other. I know that there were ways in the past to get the MM community to agree on such things. I tried to be helpful, finding ways *not having to* bypass the MM community to get MM stuff merged. The reply I got is mostly negative energy. So you don't need my help here, understood. But I will fight against any attempts to bypass the MM community.
On Tue, Feb 13, 2024 at 03:11:15PM -0800, Darrick J. Wong wrote: > On Tue, Feb 13, 2024 at 05:29:03PM -0500, Kent Overstreet wrote: > > On Tue, Feb 13, 2024 at 11:17:32PM +0100, David Hildenbrand wrote: > > > On 13.02.24 23:09, Kent Overstreet wrote: > > > > On Tue, Feb 13, 2024 at 11:04:58PM +0100, David Hildenbrand wrote: > > > > > On 13.02.24 22:58, Suren Baghdasaryan wrote: > > > > > > On Tue, Feb 13, 2024 at 4:24 AM Michal Hocko <mhocko@suse.com> wrote: > > > > > > > > > > > > > > On Mon 12-02-24 13:38:46, Suren Baghdasaryan wrote: > > > > > > > [...] > > > > > > > > We're aiming to get this in the next merge window, for 6.9. The feedback > > > > > > > > we've gotten has been that even out of tree this patchset has already > > > > > > > > been useful, and there's a significant amount of other work gated on the > > > > > > > > code tagging functionality included in this patchset [2]. > > > > > > > > > > > > > > I suspect it will not come as a surprise that I really dislike the > > > > > > > implementation proposed here. I will not repeat my arguments, I have > > > > > > > done so on several occasions already. > > > > > > > > > > > > > > Anyway, I didn't go as far as to nak it even though I _strongly_ believe > > > > > > > this debugging feature will add a maintenance overhead for a very long > > > > > > > time. I can live with all the downsides of the proposed implementation > > > > > > > _as long as_ there is a wider agreement from the MM community as this is > > > > > > > where the maintenance cost will be payed. So far I have not seen (m)any > > > > > > > acks by MM developers so aiming into the next merge window is more than > > > > > > > little rushed. > > > > > > > > > > > > We tried other previously proposed approaches and all have their > > > > > > downsides without making maintenance much easier. Your position is > > > > > > understandable and I think it's fair. Let's see if others see more > > > > > > benefit than cost here. > > > > > > > > > > Would it make sense to discuss that at LSF/MM once again, especially > > > > > covering why proposed alternatives did not work out? LSF/MM is not "too far" > > > > > away (May). > > You want to stall this effort for *three months* to schedule a meeting? > > I would love to have better profiling of memory allocations inside XFS > so that we can answer questions like "What's going on with these > allocation stalls?" or "What memory is getting used, and where?" more > quickly than we can now. > > Right now I get to stare at tracepoints and printk crap until my eyes > bleed, and maybe dchinner comes to my rescue and figures out what's > going on sooner than I do. More often we just never figure it out > because only the customer can reproduce the problem, the reams of data > produced by ftrace is unmanageable, and BPF isn't always available. > > I'm not thrilled by the large increase in macro crap in the allocation > paths, but I don't know of a better way to instrument things. Our > attempts to use _RET_IP in XFS to instrument its code paths have never > worked quite right w.r.t. inlined functions and whatnot. > > > > > > I recall that the last LSF/MM session on this topic was a bit unfortunate > > > > > (IMHO not as productive as it could have been). Maybe we can finally reach a > > > > > consensus on this. > > From my outsider's perspective, nine months have gone by since the last > LSF. Who has come up with a cleaner/better/faster way to do what Suren > and Kent have done? Were those code changes integrated into this > patchset? 
Or why not?
>
> Most of what I saw in 2023 involved compiler changes (great; now I have
> to wait until RHEL 11/Debian 14 to use it) and/or still utilize fugly
> macros.
>
> Recalling all the way back to suggestions made in 2022, who wrote the
> prototype for doing this via ftrace? Or BPF? How well did that go for
> counting allocation events and the like? I saw Tejun saying something
> about how they use BPF aggressively inside Meta, but that was about it.
>
> Were any of those solutions significantly better than what's in front of
> us here?
>
> I get it, a giant patch forcing everyone to know the difference between
> alloc_foo and alloc_foo_noprof offends my (yours?) stylistic
> sensibilities. On the other side, making analysis easier during
> customer escalations means we kernel people get data, answers, and
> solutions sooner instead of watching all our time get eaten up on L4
> support and backporting hell.
>
> > > > I'd rather not delay for more bikeshedding. Before agreeing to LSF I'd
> > > > need to see a serious proposal - what we had at the last LSF was people
> > > > jumping in with half baked alternative proposals that very much hadn't
> > > > been thought through, and I see no need to repeat that.
> > > >
> > > > Like I mentioned, there's other work gated on this patchset; if people
> > > > want to hold this up for more discussion they better be putting forth
> > > > something to discuss.
> > >
> > > I'm thinking of ways on how to achieve Michal's request: "as long as there
> > > is a wider agreement from the MM community". If we can achieve that without
> > > LSF, great! (a bi-weekly MM meeting might also be an option)
> >
> > A meeting wouldn't be out of the question, _if_ there is an agenda, but:
> >
> > What's that coffee mug say? I just survived another meeting that
> > could've been an email?
>
> I congratulate you on your memory of my kitchen mug. Yes, that's what
> it says.
>
> > What exactly is the outcome we're looking for?
> >
> > Is there info that people are looking for? I think we summed things up
> > pretty well in the cover letter; if there are specifics that people
> > want to discuss, that's why we emailed the series out.
> >
> > There's people in this thread who've used this patchset in production
> > and diagnosed real issues (gigabytes of memory gone missing, I heard the
> > other day); I'm personally looking for them to chime in on this thread
> > (Johannes, Pasha).
> >
> > If it's just grumbling about "maintenance overhead" we need to get past
> > - well, people are going to have to accept that we can't deliver
> > features without writing code, and I'm confident that the hooking in
> > particular is about as clean as it's going to get, _regardless_ of
> > toolchain support; and moreover it addresses what's been historically a
> > pretty gaping hole in our ability to profile and understand the code we
> > write.
>
> Are you and Suren willing to pay whatever maintenance overhead there is?

I'm still wondering what this supposed "maintenance overhead" is going
to be...

As I use this patch series I occasionally notice places where a bunch of
memory is being accounted to one line of code, and it would be better
accounted to a caller - but then it's just a couple lines of code to fix
that. You switch that callsite to the _noprof() version of whatever
allocation it's doing, then add an alloc_hooks() wrapper at the place
you do want it accounted.

That doesn't really feel like overhead to me, just the normal tweaking
of your tools to get the most out of them.
I will continue to do that for the code I'm looking at, yes. If other
people are doing that too, it'll be because they're also using memory
allocation profiling and finding it valuable.

I did notice earlier that we're still lacking documentation in the
Documentation/ directory; the workflow for "how you shift accounting to
the right spot" is something that should go in there.
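As an illustration of the re-homing workflow Kent describes - the
kmalloc_noprof() and alloc_hooks() names come from the series, while
ctx_create is a hypothetical helper - shifting the accounting from a
helper out to its callers looks roughly like this sketch:

  /*
   * Before: every allocation is attributed to the kmalloc() line inside
   * the helper, so all callers collapse into one /proc/allocinfo entry:
   *
   *      void *ctx_create(size_t size)
   *      {
   *              return kmalloc(size, GFP_KERNEL);
   *      }
   *
   * After: the helper calls the _noprof variant (no accounting there),
   * and a wrapper macro re-homes the accounting to each caller:
   */
  void *ctx_create_noprof(size_t size)
  {
          return kmalloc_noprof(size, GFP_KERNEL);
  }
  #define ctx_create(...) alloc_hooks(ctx_create_noprof(__VA_ARGS__))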
On Tue, Feb 13, 2024 at 06:54:09PM -0500, Pasha Tatashin wrote:
> > > I tried to be helpful, finding ways *not having to* bypass the MM
> > > community to get MM stuff merged.
> > >
> > > The reply I got is mostly negative energy.
> > >
> > > So you don't need my help here, understood.
> > >
> > > But I will fight against any attempts to bypass the MM community.
> >
> > Well, I'm definitely not trying to bypass the MM community, that's why
> > this patchset is posted. Not sure why people can't voice their opinion
> > on the benefit/cost balance of the patchset over email... But if a
> > meeting would be more productive I'm happy to set it up.
>
> Discussing these concerns during the next available MM Alignment
> session makes sense. At a minimum, Suren and Kent can present their
> reasons for believing the current approach is superior to the
> previously proposed alternatives.

Hang on though: I believe we did so adequately within this thread. Both
in the cover letter, and I further outlined exactly what the hooks need
to do, and cited the exact code.

Nobody seems to be complaining about the specifics, so I'm not sure what
would be on the agenda?
I'll do a more thorough code review, but before the discussion gets too
sidetracked, I wanted to add my POV on the overall merit of the
direction that is being proposed here.

I have backported and used this code for debugging production issues
before. Logging into a random host with an unfamiliar workload and being
able to get a reliable, comprehensive list of kernel memory consumers is
one of the coolest things I have seen in a long time. This is a huge
improvement to sysadmin quality of life.

It's also a huge improvement for MM developers. We're the first points
of contact for memory regressions that can be caused by pretty much any
driver or subsystem in the kernel.

I encourage anybody who is undecided on whether this is worth doing to
build a kernel with these patches applied and run it on their own
machine. I think you'll be surprised what you'll find - and how myopic
and uninformative /proc/meminfo feels in comparison to this. Did you
know there is a lot more to modern filesystems than the VFS objects we
are currently tracking? :) Then imagine what this looks like on a
production host running a complex mix of filesystems, enterprise
networking, bpf programs, gpus and accelerators etc.

Backporting the code to a slightly older production kernel wasn't too
difficult. The instrumentation layering is explicit, clean, and fairly
centralized, so resolving minor conflicts around the _noprof renames and
the wrappers was pretty straight-forward.

When we talk about maintenance cost, a fair shake would be to weigh it
against the cost and reliability of our current method: evaluating
consumers in the kernel on a case-by-case basis and annotating the
alloc/free sites by hand; then quibbling with the MM community about
whether that consumer is indeed significant enough to warrant an entry
in /proc/meminfo, and what the catchiest name for the stat would be.

I think we can agree that this is vastly less scalable and more
burdensome than central annotations around a handful of mostly static
allocator entry points. Especially considering the rate of change in
the kernel as a whole, and that not everybody will think of the
comprehensive MM picture when writing a random driver. And I think
that's generous - we don't even have the network stack in meminfo.

So I think what we do now isn't working. In the Meta fleet, at any given
time the p50 for unaccounted kernel memory is several gigabytes per
host. The p99 is between 15% and 30% of total memory. That's a looot of
opaque resource usage we have to accept on faith.

For hunting down regressions, all it takes is one untracked consumer in
the kernel to really throw a wrench into things. It's difficult to find
in the noise with tracing, and if it's not growing after an initial
allocation spike, you're pretty much out of luck finding it at all.
Raise your hand if you've written a drgn script to walk pfns and try to
guess consumers from the state of struct page :)

I agree we should discuss how the annotations are implemented on a
technical basis, but my take is that we need something like this. In a
codebase of our size, I don't think the allocator should be handing out
memory without some basic implied tracking of where it's going. It's a
liability for production environments, and it can hide bad memory
management decisions in drivers and other subsystems for a very long
time.
On 14.02.24 00:28, Suren Baghdasaryan wrote: > On Tue, Feb 13, 2024 at 3:22 PM David Hildenbrand <david@redhat.com> wrote: >> >> On 14.02.24 00:12, Kent Overstreet wrote: >>> On Wed, Feb 14, 2024 at 12:02:30AM +0100, David Hildenbrand wrote: >>>> On 13.02.24 23:59, Suren Baghdasaryan wrote: >>>>> On Tue, Feb 13, 2024 at 2:50 PM Kent Overstreet >>>>> <kent.overstreet@linux.dev> wrote: >>>>>> >>>>>> On Tue, Feb 13, 2024 at 11:48:41PM +0100, David Hildenbrand wrote: >>>>>>> On 13.02.24 23:30, Suren Baghdasaryan wrote: >>>>>>>> On Tue, Feb 13, 2024 at 2:17 PM David Hildenbrand <david@redhat.com> wrote: >>>>>>>>> >>>>>>>>> On 13.02.24 23:09, Kent Overstreet wrote: >>>>>>>>>> On Tue, Feb 13, 2024 at 11:04:58PM +0100, David Hildenbrand wrote: >>>>>>>>>>> On 13.02.24 22:58, Suren Baghdasaryan wrote: >>>>>>>>>>>> On Tue, Feb 13, 2024 at 4:24 AM Michal Hocko <mhocko@suse.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> On Mon 12-02-24 13:38:46, Suren Baghdasaryan wrote: >>>>>>>>>>>>> [...] >>>>>>>>>>>>>> We're aiming to get this in the next merge window, for 6.9. The feedback >>>>>>>>>>>>>> we've gotten has been that even out of tree this patchset has already >>>>>>>>>>>>>> been useful, and there's a significant amount of other work gated on the >>>>>>>>>>>>>> code tagging functionality included in this patchset [2]. >>>>>>>>>>>>> >>>>>>>>>>>>> I suspect it will not come as a surprise that I really dislike the >>>>>>>>>>>>> implementation proposed here. I will not repeat my arguments, I have >>>>>>>>>>>>> done so on several occasions already. >>>>>>>>>>>>> >>>>>>>>>>>>> Anyway, I didn't go as far as to nak it even though I _strongly_ believe >>>>>>>>>>>>> this debugging feature will add a maintenance overhead for a very long >>>>>>>>>>>>> time. I can live with all the downsides of the proposed implementation >>>>>>>>>>>>> _as long as_ there is a wider agreement from the MM community as this is >>>>>>>>>>>>> where the maintenance cost will be payed. So far I have not seen (m)any >>>>>>>>>>>>> acks by MM developers so aiming into the next merge window is more than >>>>>>>>>>>>> little rushed. >>>>>>>>>>>> >>>>>>>>>>>> We tried other previously proposed approaches and all have their >>>>>>>>>>>> downsides without making maintenance much easier. Your position is >>>>>>>>>>>> understandable and I think it's fair. Let's see if others see more >>>>>>>>>>>> benefit than cost here. >>>>>>>>>>> >>>>>>>>>>> Would it make sense to discuss that at LSF/MM once again, especially >>>>>>>>>>> covering why proposed alternatives did not work out? LSF/MM is not "too far" >>>>>>>>>>> away (May). >>>>>>>>>>> >>>>>>>>>>> I recall that the last LSF/MM session on this topic was a bit unfortunate >>>>>>>>>>> (IMHO not as productive as it could have been). Maybe we can finally reach a >>>>>>>>>>> consensus on this. >>>>>>>>>> >>>>>>>>>> I'd rather not delay for more bikeshedding. Before agreeing to LSF I'd >>>>>>>>>> need to see a serious proposl - what we had at the last LSF was people >>>>>>>>>> jumping in with half baked alternative proposals that very much hadn't >>>>>>>>>> been thought through, and I see no need to repeat that. >>>>>>>>>> >>>>>>>>>> Like I mentioned, there's other work gated on this patchset; if people >>>>>>>>>> want to hold this up for more discussion they better be putting forth >>>>>>>>>> something to discuss. >>>>>>>>> >>>>>>>>> I'm thinking of ways on how to achieve Michal's request: "as long as >>>>>>>>> there is a wider agreement from the MM community". If we can achieve >>>>>>>>> that without LSF, great! 
(a bi-weekly MM meeting might also be an option) >>>>>>>> >>>>>>>> There will be a maintenance burden even with the cleanest proposed >>>>>>>> approach. >>>>>>> >>>>>>> Yes. >>>>>>> >>>>>>>> We worked hard to make the patchset as clean as possible and >>>>>>>> if benefits still don't outweigh the maintenance cost then we should >>>>>>>> probably stop trying. >>>>>>> >>>>>>> Indeed. >>>>>>> >>>>>>>> At LSF/MM I would rather discuss functonal >>>>>>>> issues/requirements/improvements than alternative approaches to >>>>>>>> instrument allocators. >>>>>>>> I'm happy to arrange a separate meeting with MM folks if that would >>>>>>>> help to progress on the cost/benefit decision. >>>>>>> Note that I am only proposing ways forward. >>>>>>> >>>>>>> If you think you can easily achieve what Michal requested without all that, >>>>>>> good. >>>>>> >>>>>> He requested something? >>>>> >>>>> Yes, a cleaner instrumentation. Unfortunately the cleanest one is not >>>>> possible until the compiler feature is developed and deployed. And it >>>>> still would require changes to the headers, so don't think it's worth >>>>> delaying the feature for years. >>>>> >>>> >>>> I was talking about this: "I can live with all the downsides of the proposed >>>> implementationas long as there is a wider agreement from the MM community as >>>> this is where the maintenance cost will be payed. So far I have not seen >>>> (m)any acks by MM developers". >>>> >>>> I certainly cannot be motivated at this point to review and ack this, >>>> unfortunately too much negative energy around here. >>> >>> David, this kind of reaction is exactly why I was telling Andrew I was >>> going to submit this as a direct pull request to Linus. >>> >>> This is an important feature; if we can't stay focused ot the technical >>> and get it done that's what I'll do. >> >> Kent, I started this with "Would it make sense" in an attempt to help >> Suren and you to finally make progress with this, one way or the other. >> I know that there were ways in the past to get the MM community to agree >> on such things. >> >> I tried to be helpful, finding ways *not having to* bypass the MM >> community to get MM stuff merged. >> >> The reply I got is mostly negative energy. >> >> So you don't need my help here, understood. >> >> But I will fight against any attempts to bypass the MM community. > > Well, I'm definitely not trying to bypass the MM community, that's why > this patchset is posted. Not sure why people can't voice their opinion > on the benefit/cost balance of the patchset over the email... But if a > meeting would be more productive I'm happy to set it up. If you can get the acks without any additional meetings, great. The replies from Pasha and Johannes are encouraging, let's hope core memory-allocator people will voice their opinion here. If you come to the conclusion that another meeting would help getting maintainers's attention and sorting out some of the remaining concerns, feel free to schedule a meeting with Dave R. I suspect only the slot next week is already taken. In the past, we also had "special" meetings just for things to make progress faster. If you're looking for ideas on what the agenda of such a meeting could look like, I'll happily discuss that with you off-list. v2 was more than 3 months ago. If it's really about minor details here, waiting another 3 months for LSF/MM is indeed not reasonable. Myself, I'll be happy not having to sit through another LSF/MM session of that kind. 
The level of drama is exceptional and I'm hoping it won't be the new norm in the MM space. Good luck!
On 2/14/24 00:08, Kent Overstreet wrote:
> On Tue, Feb 13, 2024 at 02:59:11PM -0800, Suren Baghdasaryan wrote:
>> On Tue, Feb 13, 2024 at 2:50 PM Kent Overstreet
>> <kent.overstreet@linux.dev> wrote:
>> >
>> > On Tue, Feb 13, 2024 at 11:48:41PM +0100, David Hildenbrand wrote:
>> > > On 13.02.24 23:30, Suren Baghdasaryan wrote:
>> > > > On Tue, Feb 13, 2024 at 2:17 PM David Hildenbrand <david@redhat.com> wrote:
>> > > If you think you can easily achieve what Michal requested without all that,
>> > > good.
>> >
>> > He requested something?
>>
>> Yes, a cleaner instrumentation. Unfortunately the cleanest one is not
>> possible until the compiler feature is developed and deployed. And it
>> still would require changes to the headers, so I don't think it's worth
>> delaying the feature for years.
>
> Hang on, let's look at the actual code.
>
> This is what instrumenting an allocation function looks like:
>
>   #define krealloc_array(...) alloc_hooks(krealloc_array_noprof(__VA_ARGS__))
>
> IOW, we have to:
> - rename krealloc_array to krealloc_array_noprof
> - replace krealloc_array with a one-line wrapper macro
>
> Is this really all we're getting worked up over?
>
> The renaming we need regardless, because the thing that makes this
> approach efficient enough to run in production is that we account at
> _one_ point in the callstack; we don't save entire backtraces. And thus
> we need to explicitly annotate which point that is, which means we need
> _noprof() versions of functions for when the accounting is done by an
> outer wrapper (e.g. mempool).
>
> And, as I keep saying: that alloc_hooks() macro will also get us _per
> callsite fault injection points_, and we really need that because - if
> you guys have been paying attention to other threads - whenever moving
> more stuff to PF_MEMALLOC_* flags comes up (including adding
> PF_MEMALLOC_NORECLAIM), the issue of small allocations not failing and
> not being testable keeps coming up.

How exactly do you envision the fault injection to help here? The
proposals are about scoping via a process flag, and the process may then
call just about anything under that scope. So if our tool is
per-callsite fault injection points, how do we know which callsites to
enable to focus the fault injection on the particular scope?
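For reference, a toy model of the "per-callsite fault injection point"
idea being debated - entirely hypothetical, since the series as posted
does not implement this; in a real design the flag would presumably be
flipped via debugfs rather than hard-coded:

  #include <stdbool.h>
  #include <stdio.h>
  #include <stdlib.h>

  /* Toy per-callsite tag extended with a fault-injection switch. */
  struct alloc_site {
          const char *file;
          int line;
          bool inject_failure;
  };

  #define alloc_hooks(do_alloc) ({                                       \
          static struct alloc_site _site = { __FILE__, __LINE__, false };\
          _site.inject_failure ? NULL : (do_alloc);                      \
  })

  static void *my_alloc_noprof(size_t size) { return malloc(size); }
  #define my_alloc(size) alloc_hooks(my_alloc_noprof(size))

  int main(void)
  {
          void *p = my_alloc(32); /* this one callsite can be failed */
          printf("%s\n", p ? "allocated" : "injected failure");
          free(p);
          return 0;
  }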
On Tue 13-02-24 14:59:11, Suren Baghdasaryan wrote:
> On Tue, Feb 13, 2024 at 2:50 PM Kent Overstreet
> <kent.overstreet@linux.dev> wrote:
> >
> > On Tue, Feb 13, 2024 at 11:48:41PM +0100, David Hildenbrand wrote:
[...]
> > > If you think you can easily achieve what Michal requested without all that,
> > > good.
> >
> > He requested something?
>
> Yes, a cleaner instrumentation.

Nope, not really. You have indicated you want to target this version for
the _next_ merge window without any acks, really. If you want to go
forward with this then you should gain support from the MM community at
least. Why? Because the whole macro layering adds maintenance cost for
MM people.

I have expressed why I absolutely hate the additional macro layer. We
have been through similar layers of macros in other areas (not to
mention the page allocator interface itself) and it has _always_ turned
out to be a bad idea long term. I do not see why this case should be any
different.

The whole kernel is moving to a dynamic tracing realm and now we are
going to build a statically macro based tracing infrastructure which
will need tweaking anytime real memory consumers are one layer up the
existing macro infrastructure (do not forget quite a lot of allocations
are in library functions) and/or we need to modify the allocator API in
some way. Call me unimpressed!

Now, I fully recognize that the solution doesn't really have to be
perfect in order to be useful. Hence I never NAKed it even though I
really _dislike_ the approach. I expected you would grow community
support over time if this is indeed the only feasible approach, but that
is not reflected in the series posted here. If you find that support I
will not stand in the way.

> Unfortunately the cleanest one is not
> possible until the compiler feature is developed and deployed. And it
> still would require changes to the headers, so I don't think it's worth
> delaying the feature for years.

I am pretty sure you have invested a non-trivial amount of time into
evaluating other ways, yet your cover letter is rather modest on the
details:

: - looked at alternate hooking methods.
:   There were suggestions on alternate methods (compiler attribute,
:   trampolines), but they wouldn't have made the patchset any cleaner
:   (we still need to have different function versions for accounting vs. no
:   accounting to control at which point in a call chain the accounting
:   happens), and they would have added a dependency on toolchain
:   support.

First immediate question would be: What about page_owner? I do remember
the runtime overhead being discussed, but I do not really remember any
actual numbers outside of artificial workloads. Has this been
investigated? Is our stack unwinder the problem? Etc.

Also, what are the biggest obstacles to efficiently tracking allocations
via our tracing infrastructure? Has this been investigated? What were
the conclusions?
On Wed 14-02-24 01:20:20, Johannes Weiner wrote:
[...]
> I agree we should discuss how the annotations are implemented on a
> technical basis, but my take is that we need something like this.

I do not think there is any disagreement on the usefulness of better
memory allocation tracking. At least for me the primary problem is the
implementation. At LSF/MM last year we heard that the existing tracing
infrastructure hasn't really been explored much. The cover letter
doesn't really talk much about those alternatives, so it is really hard
to evaluate whether the proposed solution is indeed our best way to
approach this.

> In a codebase of our size, I don't think the allocator should be
> handing out memory without some basic implied tracking of where it's
> going. It's a liability for production environments, and it can hide
> bad memory management decisions in drivers and other subsystems for a
> very long time.

Fully agreed! It is quite common to see OOM reports with a large portion
of memory unaccounted, and this really presents an additional cost on
the debugging side.
On Wed 14-02-24 10:01:14, Kent Overstreet wrote:
> On Wed, Feb 14, 2024 at 03:46:33PM +0100, Michal Hocko wrote:
> > On Wed 14-02-24 01:20:20, Johannes Weiner wrote:
> > [...]
> > > I agree we should discuss how the annotations are implemented on a
> > > technical basis, but my take is that we need something like this.
> >
> > I do not think there is any disagreement on the usefulness of better
> > memory allocation tracking. At least for me the primary problem is the
> > implementation. At LSF/MM last year we heard that the existing tracing
> > infrastructure hasn't really been explored much. The cover letter
> > doesn't really talk much about those alternatives, so it is really hard
> > to evaluate whether the proposed solution is indeed our best way to
> > approach this.
>
> Michal, we covered this before.

It is good practice to summarize previous discussions in the cover
letter, especially when there are different approaches discussed over a
longer time period or when the topic is controversial. I do not see
anything like that here - neither for the existing tracing
infrastructure, nor page_owner, nor the performance concerns discussed
before, etc.

Look, I do not want to nitpick or insist on formalisms, but having those
data points laid out would make any further discussion much smoother.
On Wed 14-02-24 11:17:20, Kent Overstreet wrote:
[...]
> You gotta stop with this this derailing garbage.
It is always a pleasure talking to you, Kent, but let me give you some
advice (free of charge, of course). Let Suren talk; the chances of a
civilized and productive discussion are much higher!
I do not have much more to add to the discussion. My point stands: find
the support of the MM community if you want to proceed with this work.
On Tue, 13 Feb 2024 14:59:11 -0800 Suren Baghdasaryan <surenb@google.com> wrote:

> > > If you think you can easily achieve what Michal requested without all that,
> > > good.
> >
> > He requested something?
>
> Yes, a cleaner instrumentation. Unfortunately the cleanest one is not
> possible until the compiler feature is developed and deployed. And it
> still would require changes to the headers, so I don't think it's worth
> delaying the feature for years.

Can we please be told much more about this compiler feature?
Description of what it is, what it does, how it will affect this kernel
feature, etc.

Who is developing it and when can we expect it to become available?

Will we be able to migrate to it without back-compatibility concerns?
(I think "you need quite recent gcc for memory profiling" is
reasonable).

Because: if the maintainability issues which Michal describes will be
significantly addressed with the gcc support then we're kinda reviewing
the wrong patchset. Yes, it may be a maintenance burden initially, but
at some (yet to be revealed) time in the future, this will be addressed
with the gcc support?
On Wed, Feb 14, 2024 at 8:55 AM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Tue, 13 Feb 2024 14:59:11 -0800 Suren Baghdasaryan <surenb@google.com> wrote:
>
> > > > If you think you can easily achieve what Michal requested without all that,
> > > > good.
> > >
> > > He requested something?
> >
> > Yes, a cleaner instrumentation. Unfortunately the cleanest one is not
> > possible until the compiler feature is developed and deployed. And it
> > still would require changes to the headers, so I don't think it's worth
> > delaying the feature for years.
>
> Can we please be told much more about this compiler feature?
> Description of what it is, what it does, how it will affect this kernel
> feature, etc.

Sure. The compiler support will be in the form of a new __attribute__;
a simplified example:

  // generate data for the wrapper
  static void _alloc_tag() {
      static struct alloc_tag _alloc_tag __section("alloc_tags") = {
          .ct = CODE_TAG_INIT,
          .counter = 0
      };
  }

  static inline int wrapper(const char *name, int x,
                            int (*callee)(const char *, int),
                            struct alloc_tag *callsite_data)
  {
      callsite_data->counter++;
      printf("Call #%d from %s:%d (%s)\n", callsite_data->counter,
             callsite_data->ct.filename, callsite_data->ct.lineno,
             callsite_data->ct.function);
      int ret = callee(name, x);
      printf("Returned: %d\n", ret);
      return ret;
  }

  __attribute__((annotate("callsite_wrapped_by", wrapper, _alloc_tag)))
  int foo(const char *name, int x);

  int foo(const char *name, int x)
  {
      printf("Hello %s, %d!\n", name, x);
      return x;
  }

This lets us attach the wrapper to a function without changing its name,
and it preserves the namespace (it applies only to functions with that
name, not to everything else). Note that we will still need _noprof
versions of the allocators.

> Who is developing it and when can we expect it to become available?

Aleksei Vetrov (google) with the help of Nick Desaulniers (google). Both
are CC'ed on this email. After several iterations Aleksei has a POC
which we are evaluating
(https://github.com/llvm/llvm-project/compare/main...noxwell:llvm-project:callsite-wrapper-tree-transform).
Once it's in good shape we are going to engage with the Clang and GCC
communities to get it upstreamed. When it will become available and when
the distributions will pick it up is anybody's guess. Upstreaming is
usually a lengthy process.

> Will we be able to migrate to it without back-compatibility concerns?
> (I think "you need quite recent gcc for memory profiling" is
> reasonable).

The migration should be quite straightforward: replacing the macros with
functions carrying that attribute.

> Because: if the maintainability issues which Michal describes will be
> significantly addressed with the gcc support then we're kinda reviewing
> the wrong patchset. Yes, it may be a maintenance burden initially, but
> at some (yet to be revealed) time in the future, this will be addressed
> with the gcc support?

That's what I'm aiming for. I just don't want this placed on hold until
the compiler support is widely available, which might take years.
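If that attribute lands, the header-side wrapper could presumably shrink
from a macro to a plain annotated declaration. A hypothetical sketch
based on Suren's description above - not on any merged compiler feature,
and the wrapper/tag names here are placeholders:

  /* Today (the series as posted): */
  #define krealloc_array(...) alloc_hooks(krealloc_array_noprof(__VA_ARGS__))

  /* Hypothetical future form, once the attribute is available: the
   * public name and namespace stay intact, and the compiler emits the
   * wrapper around each callsite. */
  __attribute__((annotate("callsite_wrapped_by", alloc_hooks_wrapper, _alloc_tag)))
  void *krealloc_array(void *p, size_t new_n, size_t new_size, gfp_t flags);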
On Mon, 2024-02-12 at 13:38 -0800, Suren Baghdasaryan wrote:
> Memory allocation, v3 and final:
>
> Overview:
> Low overhead [1] per-callsite memory allocation profiling. Not just for debug
> kernels, overhead low enough to be deployed in production.
>
> We're aiming to get this in the next merge window, for 6.9. The feedback
> we've gotten has been that even out of tree this patchset has already
> been useful, and there's a significant amount of other work gated on the
> code tagging functionality included in this patchset [2].
>
> Example output:
> root@moria-kvm:~# sort -h /proc/allocinfo|tail
>   3.11MiB     2850 fs/ext4/super.c:1408 module:ext4 func:ext4_alloc_inode
>   3.52MiB      225 kernel/fork.c:356 module:fork func:alloc_thread_stack_node
>   3.75MiB      960 mm/page_ext.c:270 module:page_ext func:alloc_page_ext
>   4.00MiB        2 mm/khugepaged.c:893 module:khugepaged func:hpage_collapse_alloc_folio
>   10.5MiB      168 block/blk-mq.c:3421 module:blk_mq func:blk_mq_alloc_rqs
>   14.0MiB     3594 include/linux/gfp.h:295 module:filemap func:folio_alloc_noprof
>   26.8MiB     6856 include/linux/gfp.h:295 module:memory func:folio_alloc_noprof
>   64.5MiB    98315 fs/xfs/xfs_rmap_item.c:147 module:xfs func:xfs_rui_init
>   98.7MiB    25264 include/linux/gfp.h:295 module:readahead func:folio_alloc_noprof
>    125MiB     7357 mm/slub.c:2201 module:slub func:alloc_slab_page
>
> Since v2:
> - tglx noticed a circular header dependency between sched.h and percpu.h;
>   a bunch of header cleanups were merged into 6.8 to ameliorate this [3].
>
> - a number of improvements, moving alloc_hooks() annotations to the
>   correct place for better tracking (mempool), and bugfixes.
>
> - looked at alternate hooking methods.
>   There were suggestions on alternate methods (compiler attribute,
>   trampolines), but they wouldn't have made the patchset any cleaner
>   (we still need to have different function versions for accounting vs. no
>   accounting to control at which point in a call chain the accounting
>   happens), and they would have added a dependency on toolchain
>   support.
>
> Usage:
> kconfig options:
> - CONFIG_MEM_ALLOC_PROFILING
> - CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT
> - CONFIG_MEM_ALLOC_PROFILING_DEBUG
>   adds warnings for allocations that weren't accounted because of a
>   missing annotation
>
> sysctl:
>   /proc/sys/vm/mem_profiling
>
> Runtime info:
>   /proc/allocinfo
>
> Notes:
>
> [1]: Overhead
> To measure the overhead we are comparing the following configurations:
> (1) Baseline with CONFIG_MEMCG_KMEM=n
> (2) Disabled by default (CONFIG_MEM_ALLOC_PROFILING=y &&
>     CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=n)
> (3) Enabled by default (CONFIG_MEM_ALLOC_PROFILING=y &&
>     CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=y)
> (4) Enabled at runtime (CONFIG_MEM_ALLOC_PROFILING=y &&
>     CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=n && /proc/sys/vm/mem_profiling=1)
> (5) Baseline with CONFIG_MEMCG_KMEM=y && allocating with __GFP_ACCOUNT
>

Thanks for the work on this patchset; it is quite useful. A
clarification question on the data: I assume configs (2), (3) and (4)
have CONFIG_MEMCG_KMEM=n, right? If so, do you have similar data for
configs (2), (3) and (4) but with CONFIG_MEMCG_KMEM=y, for comparison
with (5)?

Tim

> Performance overhead:
> To evaluate performance we implemented an in-kernel test executing
> multiple get_free_page/free_page and kmalloc/kfree calls with allocation
> sizes growing from 8 to 240 bytes with CPU frequency set to max and CPU
> affinity set to a specific CPU to minimize the noise.
> Below are results
> from running the test on Ubuntu 22.04.2 LTS with 6.8.0-rc1 kernel on
> 56 core Intel Xeon:
>
>                          kmalloc            pgalloc
> (1 baseline)             6.764s             16.902s
> (2 default disabled)     6.793s (+0.43%)    17.007s (+0.62%)
> (3 default enabled)      7.197s (+6.40%)    23.666s (+40.02%)
> (4 runtime enabled)      7.405s (+9.48%)    23.901s (+41.41%)
> (5 memcg)               13.388s (+97.94%)   48.460s (+186.71%)
>
On Mon, 12 Feb 2024, Suren Baghdasaryan <surenb@google.com> wrote:
> Memory allocation, v3 and final:
>
> Overview:
> Low overhead [1] per-callsite memory allocation profiling. Not just for debug
> kernels, overhead low enough to be deployed in production.
>
> We're aiming to get this in the next merge window, for 6.9. The feedback
> we've gotten has been that even out of tree this patchset has already
> been useful, and there's a significant amount of other work gated on the
> code tagging functionality included in this patchset [2].

I wonder if it wouldn't be too much trouble to write at least a brief
overview document under Documentation/ describing what this is all
about? Even as a follow-up. People seeing the patch series have the
benefit of the cover letter and the commit messages, but that's hardly
documentation.

We have all these great frameworks and tools, but their discoverability
to kernel developers isn't always all that great.

BR,
Jani.
On Fri, 16 Feb 2024, Kent Overstreet <kent.overstreet@linux.dev> wrote:
> On Fri, Feb 16, 2024 at 10:38:00AM +0200, Jani Nikula wrote:
>> I wonder if it wouldn't be too much trouble to write at least a brief
>> overview document under Documentation/ describing what this is all
>> about? Even as a follow-up. People seeing the patch series have the
>> benefit of the cover letter and the commit messages, but that's hardly
>> documentation.
>>
>> We have all these great frameworks and tools, but their discoverability
>> to kernel developers isn't always all that great.
>
> commit f589b48789de4b8f77bfc70b9f3ab2013c01eaf2
> Author: Kent Overstreet <kent.overstreet@linux.dev>
> Date:   Wed Feb 14 01:13:04 2024 -0500
>
>     memprofiling: Documentation
>
>     Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Thanks! Wasn't part of this series and I wasn't aware it existed.

BR,
Jani.