[v3,00/35] Memory allocation profiling

Message ID 20240212213922.783301-1-surenb@google.com

Message

Suren Baghdasaryan Feb. 12, 2024, 9:38 p.m. UTC
  Memory allocation profiling, v3 and final:

Overview:
Low overhead [1] per-callsite memory allocation profiling. Not just for debug
kernels: the overhead is low enough to be deployed in production.

We're aiming to get this in the next merge window, for 6.9. The feedback
we've gotten has been that even out of tree this patchset has already
been useful, and there's a significant amount of other work gated on the
code tagging functionality included in this patchset [2].

Example output:
  root@moria-kvm:~# sort -h /proc/allocinfo|tail
   3.11MiB     2850 fs/ext4/super.c:1408 module:ext4 func:ext4_alloc_inode
   3.52MiB      225 kernel/fork.c:356 module:fork func:alloc_thread_stack_node
   3.75MiB      960 mm/page_ext.c:270 module:page_ext func:alloc_page_ext
   4.00MiB        2 mm/khugepaged.c:893 module:khugepaged func:hpage_collapse_alloc_folio
   10.5MiB      168 block/blk-mq.c:3421 module:blk_mq func:blk_mq_alloc_rqs
   14.0MiB     3594 include/linux/gfp.h:295 module:filemap func:folio_alloc_noprof
   26.8MiB     6856 include/linux/gfp.h:295 module:memory func:folio_alloc_noprof
   64.5MiB    98315 fs/xfs/xfs_rmap_item.c:147 module:xfs func:xfs_rui_init
   98.7MiB    25264 include/linux/gfp.h:295 module:readahead func:folio_alloc_noprof
    125MiB     7357 mm/slub.c:2201 module:slub func:alloc_slab_page

Since v2:
 - tglx noticed a circular header dependency between sched.h and percpu.h;
   a bunch of header cleanups were merged into 6.8 to ameliorate this [3].

 - a number of improvements: moving alloc_hooks() annotations to the
   correct place for better tracking (mempool), plus bugfixes.

 - looked at alternate hooking methods.
   There were suggestions on alternate methods (compiler attribute,
   trampolines), but they wouldn't have made the patchset any cleaner
   (we still need to have different function versions for accounting vs. no
   accounting to control at which point in a call chain the accounting
   happens), and they would have added a dependency on toolchain
   support.

Usage:
kconfig options:
 - CONFIG_MEM_ALLOC_PROFILING
 - CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT
 - CONFIG_MEM_ALLOC_PROFILING_DEBUG
   adds warnings for allocations that weren't accounted because of a
   missing annotation

sysctl:
  /proc/sys/vm/mem_profiling (e.g. writing 1 enables profiling at runtime;
  see configuration (4) under [1] below)

Runtime info:
  /proc/allocinfo

Notes:

[1]: Overhead
To measure the overhead we are comparing the following configurations:
(1) Baseline with CONFIG_MEMCG_KMEM=n
(2) Disabled by default (CONFIG_MEM_ALLOC_PROFILING=y &&
    CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=n)
(3) Enabled by default (CONFIG_MEM_ALLOC_PROFILING=y &&
    CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y)
(4) Enabled at runtime (CONFIG_MEM_ALLOC_PROFILING=y &&
    CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=n &&
    /proc/sys/vm/mem_profiling=1)
(5) Baseline with CONFIG_MEMCG_KMEM=y && allocating with __GFP_ACCOUNT

Performance overhead:
To evaluate performance we implemented an in-kernel test that executes
multiple get_free_page/free_page and kmalloc/kfree calls, with
allocation sizes growing from 8 to 240 bytes. CPU frequency was set to
max and the test was affined to a single CPU to minimize noise. Below
are results from running the test on Ubuntu 22.04.2 LTS with a
6.8.0-rc1 kernel on a 56-core Intel Xeon:

                        kmalloc                 pgalloc
(1 baseline)            6.764s                  16.902s
(2 default disabled)    6.793s (+0.43%)         17.007s (+0.62%)
(3 default enabled)     7.197s (+6.40%)         23.666s (+40.02%)
(4 runtime enabled)     7.405s (+9.48%)         23.901s (+41.41%)
(5 memcg)               13.388s (+97.94%)       48.460s (+186.71%)
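
For reference, a minimal sketch of this kind of in-kernel
micro-benchmark (illustrative only; the actual test module is not part
of this series), timing a kmalloc/kfree loop per size class:

#include <linux/init.h>
#include <linux/ktime.h>
#include <linux/module.h>
#include <linux/slab.h>

static int __init alloc_bench_init(void)
{
	size_t size;

	for (size = 8; size <= 240; size += 8) {
		u64 start, elapsed;
		int i;

		/* time a tight allocate/free loop for this size class */
		start = ktime_get_ns();
		for (i = 0; i < 1000000; i++)
			kfree(kmalloc(size, GFP_KERNEL));
		elapsed = ktime_get_ns() - start;

		pr_info("kmalloc/kfree size %3zu: %llu ns\n", size, elapsed);
	}
	return 0;
}
module_init(alloc_bench_init);
MODULE_LICENSE("GPL");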

Memory overhead:
Kernel size:

         text        data         bss         dec      diff
(1)  26515311    18890222    17018880    62424413
(2)  26524728    19423818    16740352    62688898    264485
(3)  26524724    19423818    16740352    62688894    264481
(4)  26524728    19423818    16740352    62688898    264485
(5)  26541782    18964374    16957440    62463596     39183

Memory consumption on a 56-core Intel CPU with 125GB of memory:
Code tags:           192 kB
PageExts:         262144 kB (256MB)
SlabExts:           9876 kB (9.6MB)
PcpuExts:            512 kB (0.5MB)

Total overhead (192 kB + 256 MB + 9.6 MB + 0.5 MB, roughly 266 MB) is
~0.2% of total memory.

[2]: Improved fault injection is the big one: the alloc_hooks() macro
this patchset introduces is also used for per-callsite fault injection
points in the dynamic fault injection patchset. That means we can
easily do fault injection on a per-module or per-file basis, which makes
it much easier to integrate memory fault injection into existing tests.
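
As an illustration of how the same hook can double as an injection
point, a hedged sketch (the dynamic fault injection patchset is
separate; should_fail_alloc_tag() is a hypothetical helper, not an API
from this series; the profiling bookkeeping is omitted for brevity and
a pointer-returning allocation function is assumed):

#define alloc_hooks(_do_alloc)						\
({									\
	typeof(_do_alloc) _res = NULL;					\
	/* the same static per-callsite tag used for profiling */	\
	DEFINE_ALLOC_TAG(_alloc_tag, _old);				\
									\
	/* hypothetical: fail only this callsite when armed */		\
	if (!should_fail_alloc_tag(&_alloc_tag))			\
		_res = _do_alloc;					\
	_res;								\
})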

Vlastimil recently raised concerns about exposing GFP_NOWAIT as a
PF_MEMALLOC_* flag, as this might introduce GFP_NOWAIT to allocation
paths that have never had their failure paths tested - this is something
we need to address.

[3]: The circular dependency looks to be unavoidable; the issue is that
alloc_tag_save() -> current -> get_current() requires percpu.h, and
percpu.h requires sched.h because of course it does. But this doesn't
actually cause build errors because we're only using macros, so the main
concern is just not leaving a difficult-to-disentangle minefield for
later.
So, sched.h is now pretty close to being a types-only header that
imports types and declares types; these are the header cleanups that
were merged for 6.8.


Kent Overstreet (11):
  lib/string_helpers: Add flags param to string_get_size()
  scripts/kallysms: Always include __start and __stop symbols
  fs: Convert alloc_inode_sb() to a macro
  mm/slub: Mark slab_free_freelist_hook() __always_inline
  mempool: Hook up to memory allocation profiling
  xfs: Memory allocation profiling fixups
  mm: percpu: Introduce pcpuobj_ext
  mm: percpu: Add codetag reference into pcpuobj_ext
  mm: vmalloc: Enable memory allocation profiling
  rhashtable: Plumb through alloc tag
  MAINTAINERS: Add entries for code tagging and memory allocation
    profiling

Suren Baghdasaryan (24):
  mm: enumerate all gfp flags
  mm: introduce slabobj_ext to support slab object extensions
  mm: introduce __GFP_NO_OBJ_EXT flag to selectively prevent slabobj_ext
    creation
  mm/slab: introduce SLAB_NO_OBJ_EXT to avoid obj_ext creation
  mm: prevent slabobj_ext allocations for slabobj_ext and kmem_cache
    objects
  slab: objext: introduce objext_flags as extension to
    page_memcg_data_flags
  lib: code tagging framework
  lib: code tagging module support
  lib: prevent module unloading if memory is not freed
  lib: add allocation tagging support for memory allocation profiling
  lib: introduce support for page allocation tagging
  mm: percpu: increase PERCPU_MODULE_RESERVE to accommodate allocation
    tags
  change alloc_pages name in dma_map_ops to avoid name conflicts
  mm: enable page allocation tagging
  mm: create new codetag references during page splitting
  mm/page_ext: enable early_page_ext when
    CONFIG_MEM_ALLOC_PROFILING_DEBUG=y
  lib: add codetag reference into slabobj_ext
  mm/slab: add allocation accounting into slab allocation and free paths
  mm/slab: enable slab allocation tagging for kmalloc and friends
  mm: percpu: enable per-cpu allocation tagging
  lib: add memory allocations report in show_mem()
  codetag: debug: skip objext checking when it's for objext itself
  codetag: debug: mark codetags for reserved pages as empty
  codetag: debug: introduce OBJEXTS_ALLOC_FAIL to mark failed slab_ext
    allocations

 Documentation/admin-guide/sysctl/vm.rst       |  16 ++
 Documentation/filesystems/proc.rst            |  28 ++
 MAINTAINERS                                   |  16 ++
 arch/alpha/kernel/pci_iommu.c                 |   2 +-
 arch/mips/jazz/jazzdma.c                      |   2 +-
 arch/powerpc/kernel/dma-iommu.c               |   2 +-
 arch/powerpc/mm/book3s64/radix_pgtable.c      |   2 +-
 arch/powerpc/platforms/ps3/system-bus.c       |   4 +-
 arch/powerpc/platforms/pseries/vio.c          |   2 +-
 arch/x86/kernel/amd_gart_64.c                 |   2 +-
 drivers/block/virtio_blk.c                    |   4 +-
 drivers/gpu/drm/gud/gud_drv.c                 |   2 +-
 drivers/iommu/dma-iommu.c                     |   2 +-
 drivers/mmc/core/block.c                      |   4 +-
 drivers/mtd/spi-nor/debugfs.c                 |   6 +-
 .../ethernet/chelsio/cxgb4/cxgb4_debugfs.c    |   4 +-
 drivers/parisc/ccio-dma.c                     |   2 +-
 drivers/parisc/sba_iommu.c                    |   2 +-
 drivers/scsi/sd.c                             |   8 +-
 drivers/staging/media/atomisp/pci/hmm/hmm.c   |   2 +-
 drivers/xen/grant-dma-ops.c                   |   2 +-
 drivers/xen/swiotlb-xen.c                     |   2 +-
 fs/xfs/kmem.c                                 |   4 +-
 fs/xfs/kmem.h                                 |  10 +-
 include/asm-generic/codetag.lds.h             |  14 +
 include/asm-generic/vmlinux.lds.h             |   3 +
 include/linux/alloc_tag.h                     | 188 +++++++++++++
 include/linux/codetag.h                       |  83 ++++++
 include/linux/dma-map-ops.h                   |   2 +-
 include/linux/fortify-string.h                |   5 +-
 include/linux/fs.h                            |   6 +-
 include/linux/gfp.h                           | 126 +++++----
 include/linux/gfp_types.h                     | 101 +++++--
 include/linux/memcontrol.h                    |  56 +++-
 include/linux/mempool.h                       |  73 +++--
 include/linux/mm.h                            |   8 +
 include/linux/mm_types.h                      |   4 +-
 include/linux/page_ext.h                      |   1 -
 include/linux/pagemap.h                       |   9 +-
 include/linux/percpu.h                        |  27 +-
 include/linux/pgalloc_tag.h                   | 105 +++++++
 include/linux/rhashtable-types.h              |  11 +-
 include/linux/sched.h                         |  24 ++
 include/linux/slab.h                          | 184 +++++++------
 include/linux/string.h                        |   4 +-
 include/linux/string_helpers.h                |  11 +-
 include/linux/vmalloc.h                       |  60 +++-
 init/Kconfig                                  |   4 +
 kernel/dma/mapping.c                          |   4 +-
 kernel/kallsyms_selftest.c                    |   2 +-
 kernel/module/main.c                          |  25 +-
 lib/Kconfig.debug                             |  31 +++
 lib/Makefile                                  |   3 +
 lib/alloc_tag.c                               | 213 +++++++++++++++
 lib/codetag.c                                 | 258 ++++++++++++++++++
 lib/rhashtable.c                              |  52 +++-
 lib/string_helpers.c                          |  22 +-
 lib/test-string_helpers.c                     |   4 +-
 mm/compaction.c                               |   7 +-
 mm/filemap.c                                  |   6 +-
 mm/huge_memory.c                              |   2 +
 mm/hugetlb.c                                  |   8 +-
 mm/kfence/core.c                              |  14 +-
 mm/kfence/kfence.h                            |   4 +-
 mm/memcontrol.c                               |  56 +---
 mm/mempolicy.c                                |  52 ++--
 mm/mempool.c                                  |  36 +--
 mm/mm_init.c                                  |  10 +
 mm/page_alloc.c                               |  66 +++--
 mm/page_ext.c                                 |  13 +
 mm/page_owner.c                               |   2 +-
 mm/percpu-internal.h                          |  26 +-
 mm/percpu.c                                   | 120 ++++----
 mm/show_mem.c                                 |  15 +
 mm/slab.h                                     | 176 ++++++++++--
 mm/slab_common.c                              |  65 ++++-
 mm/slub.c                                     | 138 ++++++----
 mm/util.c                                     |  44 +--
 mm/vmalloc.c                                  |  88 +++---
 scripts/kallsyms.c                            |  13 +
 scripts/module.lds.S                          |   7 +
 81 files changed, 2126 insertions(+), 695 deletions(-)
 create mode 100644 include/asm-generic/codetag.lds.h
 create mode 100644 include/linux/alloc_tag.h
 create mode 100644 include/linux/codetag.h
 create mode 100644 include/linux/pgalloc_tag.h
 create mode 100644 lib/alloc_tag.c
 create mode 100644 lib/codetag.c
  

Comments

Kees Cook Feb. 13, 2024, 12:29 a.m. UTC | #1
On Mon, Feb 12, 2024 at 01:38:46PM -0800, Suren Baghdasaryan wrote:
> Low overhead [1] per-callsite memory allocation profiling. Not just for debug
> kernels, overhead low enough to be deployed in production.

What's the plan for things like devm_kmalloc() and similar relatively
simple wrappers? A while back I was thinking it would be possible to
reimplement at least devm_kmalloc() with size- and flags-changing helpers:

https://lore.kernel.org/all/202309111428.6F36672F57@keescook/

I suspect it could be possible to adapt the alloc_hooks wrapper in this
series similarly:

#define alloc_hooks_prep(_do_alloc, _do_prepare, _do_finish,		\
			  ctx, size, flags)				\
({									\
	typeof(_do_alloc((size), (flags))) _res;			\
	DEFINE_ALLOC_TAG(_alloc_tag, _old);				\
	ssize_t _size = (size);						\
	size_t _usable = _size;						\
	gfp_t _flags = (flags);						\
									\
	_res = _do_prepare(ctx, &_size, &_flags);			\
	if (!IS_ERR_OR_NULL(_res))					\
		_res = _do_alloc(_size, _flags);			\
	if (!IS_ERR_OR_NULL(_res))					\
		_res = _do_finish(ctx, _usable, _size, _flags, _res);	\
	_res;								\
})

#define devm_kmalloc(dev, size, flags)					\
	alloc_hooks_prep(kmalloc, devm_alloc_prep, devm_alloc_finish,	\
			 dev, size, flags)

And devm_alloc_prep() and devm_alloc_finish() adapted from the URL
above.
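
Reading the macro's call sites above, the implied helper signatures
would be roughly the following (inferred here for illustration; the
actual declarations in the proposal linked above may differ):

/* adjust requested size/flags before the allocation */
void *devm_alloc_prep(struct device *dev, ssize_t *size, gfp_t *flags);
/* register the allocation with the device after it succeeds */
void *devm_alloc_finish(struct device *dev, size_t usable, ssize_t size,
			gfp_t flags, void *p);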

And _do_finish instances could be marked with __realloc_size(2).

-Kees
  
Michal Hocko Feb. 13, 2024, 12:24 p.m. UTC | #2
On Mon 12-02-24 13:38:46, Suren Baghdasaryan wrote:
[...]
> We're aiming to get this in the next merge window, for 6.9. The feedback
> we've gotten has been that even out of tree this patchset has already
> been useful, and there's a significant amount of other work gated on the
> code tagging functionality included in this patchset [2].

I suspect it will not come as a surprise that I really dislike the
implementation proposed here. I will not repeat my arguments; I have
done so on several occasions already.

Anyway, I didn't go as far as to nak it even though I _strongly_ believe
this debugging feature will add a maintenance overhead for a very long
time. I can live with all the downsides of the proposed implementation
_as long as_ there is a wider agreement from the MM community, as this is
where the maintenance cost will be paid. So far I have not seen (m)any
acks by MM developers, so aiming into the next merge window is more than
a little rushed.

>  81 files changed, 2126 insertions(+), 695 deletions(-)
  
David Hildenbrand Feb. 13, 2024, 10:04 p.m. UTC | #3
On 13.02.24 22:58, Suren Baghdasaryan wrote:
> On Tue, Feb 13, 2024 at 4:24 AM Michal Hocko <mhocko@suse.com> wrote:
>>
>> On Mon 12-02-24 13:38:46, Suren Baghdasaryan wrote:
>> [...]
>>> We're aiming to get this in the next merge window, for 6.9. The feedback
>>> we've gotten has been that even out of tree this patchset has already
>>> been useful, and there's a significant amount of other work gated on the
>>> code tagging functionality included in this patchset [2].
>>
>> I suspect it will not come as a surprise that I really dislike the
>> implementation proposed here. I will not repeat my arguments; I have
>> done so on several occasions already.
>>
>> Anyway, I didn't go as far as to nak it even though I _strongly_ believe
>> this debugging feature will add a maintenance overhead for a very long
>> time. I can live with all the downsides of the proposed implementation
>> _as long as_ there is a wider agreement from the MM community, as this is
>> where the maintenance cost will be paid. So far I have not seen (m)any
>> acks by MM developers, so aiming into the next merge window is more than
>> a little rushed.
> 
> We tried other previously proposed approaches and all have their
> downsides without making maintenance much easier. Your position is
> understandable and I think it's fair. Let's see if others see more
> benefit than cost here.

Would it make sense to discuss that at LSF/MM once again, especially 
covering why proposed alternatives did not work out? LSF/MM is not "too 
far" away (May).

I recall that the last LSF/MM session on this topic was a bit 
unfortunate (IMHO not as productive as it could have been). Maybe we can 
finally reach a consensus on this.
  
Kent Overstreet Feb. 13, 2024, 10:09 p.m. UTC | #4
On Tue, Feb 13, 2024 at 11:04:58PM +0100, David Hildenbrand wrote:
> On 13.02.24 22:58, Suren Baghdasaryan wrote:
> > [...]
> > 
> > We tried other previously proposed approaches and all have their
> > downsides without making maintenance much easier. Your position is
> > understandable and I think it's fair. Let's see if others see more
> > benefit than cost here.
> 
> Would it make sense to discuss that at LSF/MM once again, especially
> covering why proposed alternatives did not work out? LSF/MM is not "too far"
> away (May).
> 
> I recall that the last LSF/MM session on this topic was a bit unfortunate
> (IMHO not as productive as it could have been). Maybe we can finally reach a
> consensus on this.

I'd rather not delay for more bikeshedding. Before agreeing to LSF I'd
need to see a serious proposal - what we had at the last LSF was people
jumping in with half-baked alternative proposals that very much hadn't
been thought through, and I see no need to repeat that.

Like I mentioned, there's other work gated on this patchset; if people
want to hold this up for more discussion they better be putting forth
something to discuss.
  
David Hildenbrand Feb. 13, 2024, 10:17 p.m. UTC | #5
On 13.02.24 23:09, Kent Overstreet wrote:
> On Tue, Feb 13, 2024 at 11:04:58PM +0100, David Hildenbrand wrote:
>> [...]
>>
>> Would it make sense to discuss that at LSF/MM once again, especially
>> covering why proposed alternatives did not work out? LSF/MM is not "too far"
>> away (May).
>>
>> I recall that the last LSF/MM session on this topic was a bit unfortunate
>> (IMHO not as productive as it could have been). Maybe we can finally reach a
>> consensus on this.
> 
> I'd rather not delay for more bikeshedding. Before agreeing to LSF I'd
> need to see a serious proposal - what we had at the last LSF was people
> jumping in with half-baked alternative proposals that very much hadn't
> been thought through, and I see no need to repeat that.
> 
> Like I mentioned, there's other work gated on this patchset; if people
> want to hold this up for more discussion they better be putting forth
> something to discuss.

I'm thinking of ways to achieve Michal's request: "as long as
there is a wider agreement from the MM community". If we can achieve
that without LSF, great! (a bi-weekly MM meeting might also be an option)
  
Kent Overstreet Feb. 13, 2024, 10:29 p.m. UTC | #6
On Tue, Feb 13, 2024 at 11:17:32PM +0100, David Hildenbrand wrote:
> [...]
> 
> I'm thinking of ways to achieve Michal's request: "as long as there
> is a wider agreement from the MM community". If we can achieve that without
> LSF, great! (a bi-weekly MM meeting might also be an option)

A meeting wouldn't be out of the question, _if_ there is an agenda, but:

What's that coffee mug say? I just survived another meeting that
could've been an email? What exactly is the outcome we're looking for?

Is there info that people are looking for? I think we summed things up
pretty well in the cover letter; if there are specifics that people
want to discuss, that's why we emailed the series out.

There's people in this thread who've used this patchset in production
and diagnosed real issues (gigabytes of memory gone missing, I heard the
other day); I'm personally looking for them to chime in on this thread
(Johannes, Pasha).

If it's just grumbling about "maintenance overhead" we need to get past
- well, people are going to have to accept that we can't deliver
features without writing code, and I'm confident that the hooking in
particular is about as clean as it's going to get, _regardless_ of
toolchain support; and moreover it addresses what's been historically a
pretty gaping hole in our ability to profile and understand the code we
write.
  
David Hildenbrand Feb. 13, 2024, 10:48 p.m. UTC | #7
On 13.02.24 23:30, Suren Baghdasaryan wrote:
> On Tue, Feb 13, 2024 at 2:17 PM David Hildenbrand <david@redhat.com> wrote:
>> [...]
>> I'm thinking of ways to achieve Michal's request: "as long as
>> there is a wider agreement from the MM community". If we can achieve
>> that without LSF, great! (a bi-weekly MM meeting might also be an option)
> 
> There will be a maintenance burden even with the cleanest proposed
> approach. 

Yes.

> We worked hard to make the patchset as clean as possible and
> if benefits still don't outweigh the maintenance cost then we should
> probably stop trying.

Indeed.

> At LSF/MM I would rather discuss functional
> issues/requirements/improvements than alternative approaches to
> instrument allocators.
> I'm happy to arrange a separate meeting with MM folks if that would
> help to progress on the cost/benefit decision.

Note that I am only proposing ways forward.

If you think you can easily achieve what Michal requested without all 
that, good.

My past experience was that LSF/MM / bi-weekly MM meetings were 
extremely helpful to reach consensus.
  
Kent Overstreet Feb. 13, 2024, 10:50 p.m. UTC | #8
On Tue, Feb 13, 2024 at 11:48:41PM +0100, David Hildenbrand wrote:
> [...]
> Note that I am only proposing ways forward.
> 
> If you think you can easily achieve what Michal requested without all that,
> good.

He requested something?
  
David Hildenbrand Feb. 13, 2024, 10:57 p.m. UTC | #9
On 13.02.24 23:50, Kent Overstreet wrote:
> On Tue, Feb 13, 2024 at 11:48:41PM +0100, David Hildenbrand wrote:
>> [...]
>> If you think you can easily achieve what Michal requested without all that,
>> good.
> 
> He requested something?
> 

This won't get merged without acks from MM people.
  
Suren Baghdasaryan Feb. 13, 2024, 10:59 p.m. UTC | #10
On Tue, Feb 13, 2024 at 2:50 PM Kent Overstreet
<kent.overstreet@linux.dev> wrote:
>
> On Tue, Feb 13, 2024 at 11:48:41PM +0100, David Hildenbrand wrote:
> > [...]
> > If you think you can easily achieve what Michal requested without all that,
> > good.
>
> He requested something?

Yes, a cleaner instrumentation. Unfortunately the cleanest one is not
possible until the compiler feature is developed and deployed. And it
still would require changes to the headers, so I don't think it's worth
delaying the feature for years.
  
David Hildenbrand Feb. 13, 2024, 11:02 p.m. UTC | #11
On 13.02.24 23:59, Suren Baghdasaryan wrote:
> On Tue, Feb 13, 2024 at 2:50 PM Kent Overstreet
> <kent.overstreet@linux.dev> wrote:
>>
>> On Tue, Feb 13, 2024 at 11:48:41PM +0100, David Hildenbrand wrote:
>>> [...]
>>> If you think you can easily achieve what Michal requested without all that,
>>> good.
>>
>> He requested something?
> 
> Yes, a cleaner instrumentation. Unfortunately the cleanest one is not
> possible until the compiler feature is developed and deployed. And it
> still would require changes to the headers, so I don't think it's worth
> delaying the feature for years.
> 

I was talking about this: "I can live with all the downsides of the
proposed implementation _as long as_ there is a wider agreement from the
MM community, as this is where the maintenance cost will be paid. So far
I have not seen (m)any acks by MM developers".

I certainly cannot be motivated at this point to review and ack this;
unfortunately there is too much negative energy around here.
  
Kent Overstreet Feb. 13, 2024, 11:08 p.m. UTC | #12
On Tue, Feb 13, 2024 at 02:59:11PM -0800, Suren Baghdasaryan wrote:
> On Tue, Feb 13, 2024 at 2:50 PM Kent Overstreet
> <kent.overstreet@linux.dev> wrote:
> >
> > On Tue, Feb 13, 2024 at 11:48:41PM +0100, David Hildenbrand wrote:
> > > On 13.02.24 23:30, Suren Baghdasaryan wrote:
> > > > On Tue, Feb 13, 2024 at 2:17 PM David Hildenbrand <david@redhat.com> wrote:
> > > If you think you can easily achieve what Michal requested without all that,
> > > good.
> >
> > He requested something?
> 
> Yes, a cleaner instrumentation. Unfortunately the cleanest one is not
> possible until the compiler feature is developed and deployed. And it
> still would require changes to the headers, so I don't think it's worth
> delaying the feature for years.

Hang on, let's look at the actual code.

This is what instrumenting an allocation function looks like:

#define krealloc_array(...)                     alloc_hooks(krealloc_array_noprof(__VA_ARGS__))

IOW, we have to:
 - rename krealloc_array to krealloc_array_noprof
 - replace krealloc_array with a one-line wrapper macro call

Is this really all we're getting worked up over?

The renaming we need regardless, because the thing that makes this
approach efficient enough to run in production is that we account at
_one_ point in the callstack; we don't save entire backtraces.

And thus we need to explicitly annotate which one that is; which means
we need _noprof() versions of functions for when the accounting is done
by an outer wrapper (e.g. mempool).
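
For reference, a rough sketch of what the wrapper does, pieced together
from the helper names visible in this thread (DEFINE_ALLOC_TAG,
alloc_tag_save()); alloc_tag_restore() here is the assumed counterpart,
so treat this as an approximation, not the exact in-tree definition:

#define alloc_hooks(_do_alloc)						\
({									\
	typeof(_do_alloc) _res;						\
	/* define a static per-callsite tag and make it current */	\
	DEFINE_ALLOC_TAG(_alloc_tag, _old);				\
									\
	_res = _do_alloc;						\
	/* switch back to whatever tag was current before */		\
	alloc_tag_restore(&_alloc_tag, _old);				\
	_res;								\
})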

And, as I keep saying: that alloc_hooks() macro will also get us _per
callsite fault injection points_, and we really need that because - if
you guys have been paying attention to other threads - whenever moving
more stuff to PF_MEMALLOC_* flags comes up (including adding
PF_MEMALLOC_NORECLAIM), the issue of small allocations not failing and
not being testable keeps coming up.
  
Darrick J. Wong Feb. 13, 2024, 11:11 p.m. UTC | #13
On Tue, Feb 13, 2024 at 05:29:03PM -0500, Kent Overstreet wrote:
> > > On Tue, Feb 13, 2024 at 11:04:58PM +0100, David Hildenbrand wrote:
> > > > [...]
> > > > Would it make sense to discuss that at LSF/MM once again, especially
> > > > covering why proposed alternatives did not work out? LSF/MM is not "too far"
> > > > away (May).

You want to stall this effort for *three months* to schedule a meeting?

I would love to have better profiling of memory allocations inside XFS
so that we can answer questions like "What's going on with these
allocation stalls?" or "What memory is getting used, and where?" more
quickly than we can now.

Right now I get to stare at tracepoints and printk crap until my eyes
bleed, and maybe dchinner comes to my rescue and figures out what's
going on sooner than I do.  More often we just never figure it out
because only the customer can reproduce the problem, the reams of data
produced by ftrace is unmanageable, and BPF isn't always available.

I'm not thrilled by the large increase in macro crap in the allocation
paths, but I don't know of a better way to instrument things.  Our
attempts to use _RET_IP_ in XFS to instrument its code paths have never
worked quite right w.r.t. inlined functions and whatnot.

> > > > I recall that the last LSF/MM session on this topic was a bit unfortunate
> > > > (IMHO not as productive as it could have been). Maybe we can finally reach a
> > > > consensus on this.

From my outsider's perspective, nine months have gone by since the last
LSF.  Who has come up with a cleaner/better/faster way to do what Suren
and Kent have done?  Were those code changes integrated into this
patchset?  Or why not?

Most of what I saw in 2023 involved compiler changes (great; now I have
to wait until RHEL 11/Debian 14 to use it) and/or still utilized fugly
macros.

Recalling all the way back to suggestions made in 2022, who wrote the
prototype for doing this via ftrace?  Or BPF?  How well did that go for
counting allocation events and the like?  I saw Tejun saying something
about how they use BPF aggressively inside Meta, but that was about it.

Were any of those solutions significantly better than what's in front of
us here?

I get it, a giant patch forcing everyone to know the difference between
alloc_foo and alloc_foo_noprof offends my (your?) stylistic
sensibilities.  On the other side, making analysis easier during
customer escalations means we kernel people get data, answers, and
solutions sooner instead of watching all our time get eaten up on L4
support and backporting hell.

> > > I'd rather not delay for more bikeshedding. Before agreeing to LSF I'd
> > > need to see a serious proposal - what we had at the last LSF was people
> > > jumping in with half-baked alternative proposals that very much hadn't
> > > been thought through, and I see no need to repeat that.
> > > 
> > > Like I mentioned, there's other work gated on this patchset; if people
> > > want to hold this up for more discussion they better be putting forth
> > > something to discuss.
> > 
> > I'm thinking of ways on how to achieve Michal's request: "as long as there
> > is a wider agreement from the MM community". If we can achieve that without
> > LSF, great! (a bi-weekly MM meeting might also be an option)
> 
> A meeting wouldn't be out of the question, _if_ there is an agenda, but:
> 
> What's that coffee mug say? I just survived another meeting that
> could've been an email?

I congratulate you on your memory of my kitchen mug.  Yes, that's what
it says.

> What exactly is the outcome we're looking for?
> 
> Is there info that people are looking for? I think we summed things up
> pretty well in the cover letter; if there are specifics that people
> want to discuss, that's why we emailed the series out.
> 
> There's people in this thread who've used this patchset in production
> and diagnosed real issues (gigabytes of memory gone missing, I heard the
> other day); I'm personally looking for them to chime in on this thread
> (Johannes, Pasha).
> 
> If it's just grumbling about "maintenance overhead" we need to get past
> - well, people are going to have to accept that we can't deliver
> features without writing code, and I'm confident that the hooking in
> particular is about as clean as it's going to get, _regardless_ of
> toolchain support; and moreover it addresses what's been historically a
> pretty gaping hole in our ability to profile and understand the code we
> write.

Are you and Suren willing to pay whatever maintenance overhead there is?

--D
  
Kent Overstreet Feb. 13, 2024, 11:12 p.m. UTC | #14
On Wed, Feb 14, 2024 at 12:02:30AM +0100, David Hildenbrand wrote:
> On 13.02.24 23:59, Suren Baghdasaryan wrote:
> > On Tue, Feb 13, 2024 at 2:50 PM Kent Overstreet
> > <kent.overstreet@linux.dev> wrote:
> > > 
> > > On Tue, Feb 13, 2024 at 11:48:41PM +0100, David Hildenbrand wrote:
> > > > On 13.02.24 23:30, Suren Baghdasaryan wrote:
> > > > > On Tue, Feb 13, 2024 at 2:17 PM David Hildenbrand <david@redhat.com> wrote:
> > > > > > 
> > > > > > On 13.02.24 23:09, Kent Overstreet wrote:
> > > > > > > On Tue, Feb 13, 2024 at 11:04:58PM +0100, David Hildenbrand wrote:
> > > > > > > > On 13.02.24 22:58, Suren Baghdasaryan wrote:
> > > > > > > > > On Tue, Feb 13, 2024 at 4:24 AM Michal Hocko <mhocko@suse.com> wrote:
> > > > > > > > > > 
> > > > > > > > > > On Mon 12-02-24 13:38:46, Suren Baghdasaryan wrote:
> > > > > > > > > > [...]
> > > > > > > > > > > We're aiming to get this in the next merge window, for 6.9. The feedback
> > > > > > > > > > > we've gotten has been that even out of tree this patchset has already
> > > > > > > > > > > been useful, and there's a significant amount of other work gated on the
> > > > > > > > > > > code tagging functionality included in this patchset [2].
> > > > > > > > > > 
> > > > > > > > > > I suspect it will not come as a surprise that I really dislike the
> > > > > > > > > > implementation proposed here. I will not repeat my arguments, I have
> > > > > > > > > > done so on several occasions already.
> > > > > > > > > > 
> > > > > > > > > > Anyway, I didn't go as far as to nak it even though I _strongly_ believe
> > > > > > > > > > this debugging feature will add a maintenance overhead for a very long
> > > > > > > > > > time. I can live with all the downsides of the proposed implementation
> > > > > > > > > > _as long as_ there is a wider agreement from the MM community as this is
> > > > > > > > > > where the maintenance cost will be payed. So far I have not seen (m)any
> > > > > > > > > > acks by MM developers so aiming into the next merge window is more than
> > > > > > > > > > little rushed.
> > > > > > > > > 
> > > > > > > > > We tried other previously proposed approaches and all have their
> > > > > > > > > downsides without making maintenance much easier. Your position is
> > > > > > > > > understandable and I think it's fair. Let's see if others see more
> > > > > > > > > benefit than cost here.
> > > > > > > > 
> > > > > > > > Would it make sense to discuss that at LSF/MM once again, especially
> > > > > > > > covering why proposed alternatives did not work out? LSF/MM is not "too far"
> > > > > > > > away (May).
> > > > > > > > 
> > > > > > > > I recall that the last LSF/MM session on this topic was a bit unfortunate
> > > > > > > > (IMHO not as productive as it could have been). Maybe we can finally reach a
> > > > > > > > consensus on this.
> > > > > > > 
> > > > > > > I'd rather not delay for more bikeshedding. Before agreeing to LSF I'd
> > > > > > > need to see a serious proposal - what we had at the last LSF was people
> > > > > > > jumping in with half-baked alternative proposals that very much hadn't
> > > > > > > been thought through, and I see no need to repeat that.
> > > > > > > 
> > > > > > > Like I mentioned, there's other work gated on this patchset; if people
> > > > > > > want to hold this up for more discussion they better be putting forth
> > > > > > > something to discuss.
> > > > > > 
> > > > > > I'm thinking of ways to achieve Michal's request: "as long as
> > > > > > there is a wider agreement from the MM community". If we can achieve
> > > > > > that without LSF, great! (a bi-weekly MM meeting might also be an option)
> > > > > 
> > > > > There will be a maintenance burden even with the cleanest proposed
> > > > > approach.
> > > > 
> > > > Yes.
> > > > 
> > > > > We worked hard to make the patchset as clean as possible and
> > > > > if the benefits still don't outweigh the maintenance cost then we should
> > > > > probably stop trying.
> > > > 
> > > > Indeed.
> > > > 
> > > > > At LSF/MM I would rather discuss functional
> > > > > issues/requirements/improvements than alternative approaches to
> > > > > instrument allocators.
> > > > > I'm happy to arrange a separate meeting with MM folks if that would
> > > > > help us make progress on the cost/benefit decision.
> > > > Note that I am only proposing ways forward.
> > > > 
> > > > If you think you can easily achieve what Michal requested without all that,
> > > > good.
> > > 
> > > He requested something?
> > 
> > Yes, a cleaner instrumentation. Unfortunately the cleanest one is not
> > possible until the compiler feature is developed and deployed. And it
> > still would require changes to the headers, so I don't think it's worth
> > delaying the feature for years.
> > 
> 
> I was talking about this: "I can live with all the downsides of the proposed
> implementation as long as there is a wider agreement from the MM community as
> this is where the maintenance cost will be paid. So far I have not seen
> (m)any acks by MM developers".
> 
> I certainly cannot be motivated at this point to review and ack this,
> unfortunately too much negative energy around here.

David, this kind of reaction is exactly why I was telling Andrew I was
going to submit this as a direct pull request to Linus.

This is an important feature; if we can't stay focused on the technical
side and get it done, that's what I'll do.
  
David Hildenbrand Feb. 13, 2024, 11:22 p.m. UTC | #15
On 14.02.24 00:12, Kent Overstreet wrote:
> On Wed, Feb 14, 2024 at 12:02:30AM +0100, David Hildenbrand wrote:
>> [...]
>>
>> I was talking about this: "I can live with all the downsides of the proposed
>> implementation as long as there is a wider agreement from the MM community as
>> this is where the maintenance cost will be paid. So far I have not seen
>> (m)any acks by MM developers".
>>
>> I certainly cannot be motivated at this point to review and ack this,
>> unfortunately too much negative energy around here.
> 
> David, this kind of reaction is exactly why I was telling Andrew I was
> going to submit this as a direct pull request to Linus.
> 
> This is an important feature; if we can't stay focused on the technical
> side and get it done, that's what I'll do.

Kent, I started this with "Would it make sense" in an attempt to help 
Suren and you to finally make progress with this, one way or the other. 
I know that there were ways in the past to get the MM community to agree 
on such things.

I tried to be helpful, finding ways *not having to* bypass the MM 
community to get MM stuff merged.

The reply I got is mostly negative energy.

So you don't need my help here, understood.

But I will fight against any attempts to bypass the MM community.
  
Kent Overstreet Feb. 13, 2024, 11:24 p.m. UTC | #16
On Tue, Feb 13, 2024 at 03:11:15PM -0800, Darrick J. Wong wrote:
> On Tue, Feb 13, 2024 at 05:29:03PM -0500, Kent Overstreet wrote:
> > [...]
> 
> You want to stall this effort for *three months* to schedule a meeting?
> 
> I would love to have better profiling of memory allocations inside XFS
> so that we can answer questions like "What's going on with these
> allocation stalls?" or "What memory is getting used, and where?" more
> quickly than we can now.
> 
> Right now I get to stare at tracepoints and printk crap until my eyes
> bleed, and maybe dchinner comes to my rescue and figures out what's
> going on sooner than I do.  More often we just never figure it out
> because only the customer can reproduce the problem, the reams of data
> produced by ftrace are unmanageable, and BPF isn't always available.
> 
> I'm not thrilled by the large increase in macro crap in the allocation
> paths, but I don't know of a better way to instrument things.  Our
> attempts to use _RET_IP_ in XFS to instrument its code paths have never
> worked quite right w.r.t. inlined functions and whatnot.
> 
> > > > > I recall that the last LSF/MM session on this topic was a bit unfortunate
> > > > > (IMHO not as productive as it could have been). Maybe we can finally reach a
> > > > > consensus on this.
> 
> From my outsider's perspective, nine months have gone by since the last
> LSF.  Who has come up with a cleaner/better/faster way to do what Suren
> and Kent have done?  Were those code changes integrated into this
> patchset?  Or why not?
> 
> Most of what I saw in 2023 involved compiler changes (great; now I have
> to wait until RHEL 11/Debian 14 to use it) and/or still utilized fugly
> macros.
> 
> Recalling all the way back to suggestions made in 2022, who wrote the
> prototype for doing this via ftrace?  Or BPF?  How well did that go for
> counting allocation events and the like?  I saw Tejun saying something
> about how they use BPF aggressively inside Meta, but that was about it.
> 
> Were any of those solutions significantly better than what's in front of
> us here?
> 
> I get it, a giant patch forcing everyone to know the difference between
> alloc_foo and alloc_foo_noprof offends my (yours?) stylistic
> sensibilities.  On the other side, making analysis easier during
> customer escalations means we kernel people get data, answers, and
> solutions sooner instead of watching all our time get eaten up on L4
> support and backporting hell.
> 
> > > > [...]
> > > 
> > > I'm thinking of ways to achieve Michal's request: "as long as there
> > > is a wider agreement from the MM community". If we can achieve that without
> > > LSF, great! (a bi-weekly MM meeting might also be an option)
> > 
> > A meeting wouldn't be out of the question, _if_ there is an agenda, but:
> > 
> > What's that coffee mug say? I just survived another meeting that
> > could've been an email?
> 
> I congratulate you on your memory of my kitchen mug.  Yes, that's what
> it says.
> 
> > What exactly is the outcome we're looking for?
> > 
> > Is there info that people are looking for? I think we summed things up
> > pretty well in the cover letter; if there are specifics that people
> > want to discuss, that's why we emailed the series out.
> > 
> > There's people in this thread who've used this patchset in production
> > and diagnosed real issues (gigabytes of memory gone missing, I heard the
> > other day); I'm personally looking for them to chime in on this thread
> > (Johannes, Pasha).
> > 
> > If it's just grumbling about "maintenance overhead" we need to get past
> > - well, people are going to have to accept that we can't deliver
> > features without writing code, and I'm confident that the hooking in
> > particular is about as clean as it's going to get, _regardless_ of
> > toolchain support; and moreover it addresses what's been historically a
> > pretty gaping hole in our ability to profile and understand the code we
> > write.
> 
> Are you and Suren willing to pay whatever maintenance overhead there is?

I'm still wondering what this supposed "maintenance overhead" is going
to be...

As I use this patch series I occasionally notice places where a bunch of
memory is being accounted to one line of code when it would be better
accounted to a caller - but then it's just a couple lines of code to fix
that. You switch that callsite to the _noprof() version of whatever
allocation it's doing, then add an alloc_hooks() wrapper at the place
you do want it accounted.
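
For illustration, here's a minimal sketch of that workflow. foo_alloc_buf()
is a hypothetical helper; the _noprof()/alloc_hooks() pair is the pattern
this series uses, with kzalloc_noprof() assumed to follow the series'
renaming scheme:

/* Before: every allocation is accounted to this line, inside the helper */
static void *foo_alloc_buf(size_t size, gfp_t gfp)
{
	return kzalloc(size, gfp);
}

/* After: the helper itself opts out of accounting... */
static void *foo_alloc_buf_noprof(size_t size, gfp_t gfp)
{
	return kzalloc_noprof(size, gfp);
}

/* ...and the wrapper macro makes each caller of foo_alloc_buf() show up
 * as its own entry in /proc/allocinfo */
#define foo_alloc_buf(...)	alloc_hooks(foo_alloc_buf_noprof(__VA_ARGS__))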

That doesn't really feel like overhead to me, just the normal tweaking
of your tools to get the most out of them.

I will continue to do that for the code I'm looking at, yes.

If other people are doing that too, it'll be because they're also using
memory allocation profiling and finding it valuable.

I did notice earlier that we're still lacking documentation in the
Documentation/ directory; the workflow for "how you shift accounting to
the right spot" is something that should go in there.
  
Kent Overstreet Feb. 14, 2024, 12:04 a.m. UTC | #17
On Tue, Feb 13, 2024 at 06:54:09PM -0500, Pasha Tatashin wrote:
> > > I tried to be helpful, finding ways *not having to* bypass the MM
> > > community to get MM stuff merged.
> > >
> > > The reply I got is mostly negative energy.
> > >
> > > So you don't need my help here, understood.
> > >
> > > But I will fight against any attempts to bypass the MM community.
> >
> > Well, I'm definitely not trying to bypass the MM community, that's why
> > this patchset is posted. Not sure why people can't voice their opinion
> > on the benefit/cost balance of the patchset over the email... But if a
> > meeting would be more productive I'm happy to set it up.
> 
> Discussing these concerns during the next available MM Alignment
> session makes sense. At a minimum, Suren and Kent can present their
> reasons for believing the current approach is superior to the
> previously proposed alternatives.

Hang on though: I believe we did so adequately within this thread - both
in the cover letter, and further down where I outlined exactly what the
hooks need to do and cited the exact code.

Nobody seems to be complaining about the specifics, so I'm not sure what
would be on the agenda?
  
Johannes Weiner Feb. 14, 2024, 6:20 a.m. UTC | #18
I'll do a more thorough code review, but before the discussion gets
too sidetracked, I wanted to add my POV on the overall merit of the
direction that is being proposed here.

I have backported and used this code for debugging production issues
before. Logging into a random host with an unfamiliar workload and
being able to get a reliable, comprehensive list of kernel memory
consumers is one of the coolest things I have seen in a long
time. This is a huge improvement to sysadmin quality of life.

It's also a huge improvement for MM developers. We're the first points
of contact for memory regressions that can be caused by pretty much
any driver or subsystem in the kernel.

I encourage anybody who is undecided on whether this is worth doing to
build a kernel with these patches applied and run it on their own
machine. I think you'll be surprised what you'll find - and how myopic
and uninformative /proc/meminfo feels in comparison to this. Did you
know there is a lot more to modern filesystems than the VFS objects we
are currently tracking? :)

Then imagine what this looks like on a production host running a
complex mix of filesystems, enterprise networking, bpf programs, gpus
and accelerators etc.

Backporting the code to a slightly older production kernel wasn't too
difficult. The instrumentation layering is explicit, clean, and fairly
centralized, so resolving minor conflicts around the _noprof renames
and the wrappers was pretty straightforward.

When we talk about maintenance cost, a fair shake would be to weigh it
against the cost and reliability of our current method: evaluating
consumers in the kernel on a case-by-case basis and annotating the
alloc/free sites by hand; then quibbling with the MM community about
whether that consumer is indeed significant enough to warrant an entry
in /proc/meminfo, and what the catchiest name for the stat would be.
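
A rough sketch of that status quo, with hypothetical names - each tracked
consumer hand-rolls something like this, plus a matching hunk in
meminfo_proc_show() to expose the counter:

/* per-consumer bookkeeping, duplicated across every tracked subsystem */
static atomic_long_t foo_nr_pages;

static struct page *foo_get_page(gfp_t gfp)
{
	struct page *page = alloc_page(gfp);

	if (page)
		atomic_long_inc(&foo_nr_pages);
	return page;
}

static void foo_put_page(struct page *page)
{
	atomic_long_dec(&foo_nr_pages);
	__free_page(page);
}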

I think we can agree that this is vastly less scalable and more
burdensome than central annotations around a handful of mostly static
allocator entry points. Especially considering the rate of change in
the kernel as a whole, and that not everybody will think of the
comprehensive MM picture when writing a random driver. And I think
that's generous - we don't even have the network stack in meminfo.

So I think what we do now isn't working. In the Meta fleet, at any
given time the p50 for unaccounted kernel memory is several gigabytes
per host. The p99 is between 15% and 30% of total memory. That's a
looot of opaque resource usage we have to accept on faith.

For hunting down regressions, all it takes is one untracked consumer
in the kernel to really throw a wrench into things. It's difficult to
find in the noise with tracing, and if it's not growing after an
initial allocation spike, you're pretty much out of luck finding it at
all. Raise your hand if you've written a drgn script to walk pfns and
try to guess consumers from the state of struct page :)

I agree we should discuss how the annotations are implemented on a
technical basis, but my take is that we need something like this.

In a codebase of our size, I don't think the allocator should be
handing out memory without some basic implied tracking of where it's
going. It's a liability for production environments, and it can hide
bad memory management decisions in drivers and other subsystems for a
very long time.
  
David Hildenbrand Feb. 14, 2024, 10:01 a.m. UTC | #19
On 14.02.24 00:28, Suren Baghdasaryan wrote:
> On Tue, Feb 13, 2024 at 3:22 PM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 14.02.24 00:12, Kent Overstreet wrote:
>>> On Wed, Feb 14, 2024 at 12:02:30AM +0100, David Hildenbrand wrote:
>>>> On 13.02.24 23:59, Suren Baghdasaryan wrote:
>>>>> On Tue, Feb 13, 2024 at 2:50 PM Kent Overstreet
>>>>> <kent.overstreet@linux.dev> wrote:
>>>>>>
>>>>>> On Tue, Feb 13, 2024 at 11:48:41PM +0100, David Hildenbrand wrote:
>>>>>>> On 13.02.24 23:30, Suren Baghdasaryan wrote:
>>>>>>>> On Tue, Feb 13, 2024 at 2:17 PM David Hildenbrand <david@redhat.com> wrote:
>>>>>>>>>
>>>>>>>>> On 13.02.24 23:09, Kent Overstreet wrote:
>>>>>>>>>> On Tue, Feb 13, 2024 at 11:04:58PM +0100, David Hildenbrand wrote:
>>>>>>>>>>> On 13.02.24 22:58, Suren Baghdasaryan wrote:
>>>>>>>>>>>> On Tue, Feb 13, 2024 at 4:24 AM Michal Hocko <mhocko@suse.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon 12-02-24 13:38:46, Suren Baghdasaryan wrote:
>>>>>>>>>>>>> [...]
>>>>>>>>>>>>>> We're aiming to get this in the next merge window, for 6.9. The feedback
>>>>>>>>>>>>>> we've gotten has been that even out of tree this patchset has already
>>>>>>>>>>>>>> been useful, and there's a significant amount of other work gated on the
>>>>>>>>>>>>>> code tagging functionality included in this patchset [2].
>>>>>>>>>>>>>
>>>>>>>>>>>>> I suspect it will not come as a surprise that I really dislike the
>>>>>>>>>>>>> implementation proposed here. I will not repeat my arguments, I have
>>>>>>>>>>>>> done so on several occasions already.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Anyway, I didn't go as far as to nak it even though I _strongly_ believe
>>>>>>>>>>>>> this debugging feature will add a maintenance overhead for a very long
>>>>>>>>>>>>> time. I can live with all the downsides of the proposed implementation
>>>>>>>>>>>>> _as long as_ there is a wider agreement from the MM community as this is
>>>>>>>>>>>>> where the maintenance cost will be payed. So far I have not seen (m)any
>>>>>>>>>>>>> acks by MM developers so aiming into the next merge window is more than
>>>>>>>>>>>>> little rushed.
>>>>>>>>>>>>
>>>>>>>>>>>> We tried other previously proposed approaches and all have their
>>>>>>>>>>>> downsides without making maintenance much easier. Your position is
>>>>>>>>>>>> understandable and I think it's fair. Let's see if others see more
>>>>>>>>>>>> benefit than cost here.
>>>>>>>>>>>
>>>>>>>>>>> Would it make sense to discuss that at LSF/MM once again, especially
>>>>>>>>>>> covering why proposed alternatives did not work out? LSF/MM is not "too far"
>>>>>>>>>>> away (May).
>>>>>>>>>>>
>>>>>>>>>>> I recall that the last LSF/MM session on this topic was a bit unfortunate
>>>>>>>>>>> (IMHO not as productive as it could have been). Maybe we can finally reach a
>>>>>>>>>>> consensus on this.
>>>>>>>>>>
>>>>>>>>>> I'd rather not delay for more bikeshedding. Before agreeing to LSF I'd
>>>>>>>>>> need to see a serious proposl - what we had at the last LSF was people
>>>>>>>>>> jumping in with half baked alternative proposals that very much hadn't
>>>>>>>>>> been thought through, and I see no need to repeat that.
>>>>>>>>>>
>>>>>>>>>> Like I mentioned, there's other work gated on this patchset; if people
>>>>>>>>>> want to hold this up for more discussion they better be putting forth
>>>>>>>>>> something to discuss.
>>>>>>>>>
>>>>>>>>> I'm thinking of ways on how to achieve Michal's request: "as long as
>>>>>>>>> there is a wider agreement from the MM community". If we can achieve
>>>>>>>>> that without LSF, great! (a bi-weekly MM meeting might also be an option)
>>>>>>>>
>>>>>>>> There will be a maintenance burden even with the cleanest proposed
>>>>>>>> approach.
>>>>>>>
>>>>>>> Yes.
>>>>>>>
>>>>>>>> We worked hard to make the patchset as clean as possible and
>>>>>>>> if benefits still don't outweigh the maintenance cost then we should
>>>>>>>> probably stop trying.
>>>>>>>
>>>>>>> Indeed.
>>>>>>>
>>>>>>>> At LSF/MM I would rather discuss functonal
>>>>>>>> issues/requirements/improvements than alternative approaches to
>>>>>>>> instrument allocators.
>>>>>>>> I'm happy to arrange a separate meeting with MM folks if that would
>>>>>>>> help to progress on the cost/benefit decision.
>>>>>>> Note that I am only proposing ways forward.
>>>>>>>
>>>>>>> If you think you can easily achieve what Michal requested without all that,
>>>>>>> good.
>>>>>>
>>>>>> He requested something?
>>>>>
>>>>> Yes, a cleaner instrumentation. Unfortunately the cleanest one is not
>>>>> possible until the compiler feature is developed and deployed. And it
>>>>> still would require changes to the headers, so don't think it's worth
>>>>> delaying the feature for years.
>>>>>
>>>>
>>>> I was talking about this: "I can live with all the downsides of the proposed
>>>> implementationas long as there is a wider agreement from the MM community as
>>>> this is where the maintenance cost will be payed. So far I have not seen
>>>> (m)any acks by MM developers".
>>>>
>>>> I certainly cannot be motivated at this point to review and ack this,
>>>> unfortunately too much negative energy around here.
>>>
>>> David, this kind of reaction is exactly why I was telling Andrew I was
>>> going to submit this as a direct pull request to Linus.
>>>
>>> This is an important feature; if we can't stay focused on the technical
>>> side and get it done, that's what I'll do.
>>
>> Kent, I started this with "Would it make sense" in an attempt to help
>> Suren and you to finally make progress with this, one way or the other.
>> I know that there were ways in the past to get the MM community to agree
>> on such things.
>>
>> I tried to be helpful, finding ways *not having to* bypass the MM
>> community to get MM stuff merged.
>>
>> The reply I got is mostly negative energy.
>>
>> So you don't need my help here, understood.
>>
>> But I will fight against any attempts to bypass the MM community.
> 
> Well, I'm definitely not trying to bypass the MM community, that's why
> this patchset is posted. Not sure why people can't voice their opinion
> on the benefit/cost balance of the patchset over email... But if a
> meeting would be more productive I'm happy to set it up.

If you can get the acks without any additional meetings, great. The 
replies from Pasha and Johannes are encouraging; let's hope core
memory-allocator people will voice their opinion here.

If you come to the conclusion that another meeting would help getting 
maintainers' attention and sorting out some of the remaining concerns,
feel free to schedule a meeting with Dave R. I suspect only the slot 
next week is already taken. In the past, we also had "special" meetings 
just for things to make progress faster.

If you're looking for ideas on what the agenda of such a meeting could 
look like, I'll happily discuss that with you off-list.

v2 was more than 3 months ago. If it's really about minor details here, 
waiting another 3 months for LSF/MM is indeed not reasonable.

Myself, I'll be happy not having to sit through another LSF/MM session 
of that kind. The level of drama is exceptional and I'm hoping it won't 
be the new norm in the MM space.

Good luck!
  
Vlastimil Babka Feb. 14, 2024, 10:20 a.m. UTC | #20
On 2/14/24 00:08, Kent Overstreet wrote:
> On Tue, Feb 13, 2024 at 02:59:11PM -0800, Suren Baghdasaryan wrote:
>> On Tue, Feb 13, 2024 at 2:50 PM Kent Overstreet
>> <kent.overstreet@linux.dev> wrote:
>> >
>> > On Tue, Feb 13, 2024 at 11:48:41PM +0100, David Hildenbrand wrote:
>> > > On 13.02.24 23:30, Suren Baghdasaryan wrote:
>> > > > On Tue, Feb 13, 2024 at 2:17 PM David Hildenbrand <david@redhat.com> wrote:
>> > > If you think you can easily achieve what Michal requested without all that,
>> > > good.
>> >
>> > He requested something?
>> 
>> Yes, a cleaner instrumentation. Unfortunately the cleanest one is not
>> possible until the compiler feature is developed and deployed. And it
>> still would require changes to the headers, so I don't think it's worth
>> delaying the feature for years.
> 
> Hang on, let's look at the actual code.
> 
> This is what instrumenting an allocation function looks like:
> 
> #define krealloc_array(...)                     alloc_hooks(krealloc_array_noprof(__VA_ARGS__))
> 
> IOW, we have to:
>  - rename krealloc_array to krealloc_array_noprof
>  - replace krealloc_array with a one wrapper macro call
> 
> Is this really all we're getting worked up over?
> 
> The renaming we need regardless, because the thing that makes this
> approach efficient enough to run in production is that we account at
> _one_ point in the callstack; we don't save entire backtraces.
> 
> And thus we need to explicitly annotate which one that is; which means
> we need _noprof() versions of functions for when the accounting is done
> by an outer wrapper (e.g. mempool).
> 
> And, as I keep saying: that alloc_hooks() macro will also get us _per
> callsite fault injection points_, and we really need that because - if
> you guys have been paying attention to other threads - whenever moving
> more stuff to PF_MEMALLOC_* flags comes up (including adding
> PF_MEMALLOC_NORECLAIM), the issue of small allocations not failing and
> not being testable keeps coming up.

How exactly do you envision the fault injection helping here? The proposals
are about scoping via a process flag, and the process may then call just
about anything under that scope. So if our tool is per callsite fault
injection points, how do we know which callsites to enable to focus the
fault injection on the particular scope?
  
Michal Hocko Feb. 14, 2024, 1:23 p.m. UTC | #21
On Tue 13-02-24 14:59:11, Suren Baghdasaryan wrote:
> On Tue, Feb 13, 2024 at 2:50 PM Kent Overstreet
> <kent.overstreet@linux.dev> wrote:
> >
> > On Tue, Feb 13, 2024 at 11:48:41PM +0100, David Hildenbrand wrote:
[...]
> > > If you think you can easily achieve what Michal requested without all that,
> > > good.
> >
> > He requested something?
> 
> Yes, a cleaner instrumentation.

Nope, not really. You have indicated you want to target this version for the
_next_ merge window without any acks, really. If you want to go
forward with this then you should gain support from the MM community
at least. Why? Because the whole macro layering is adding maintenance
cost for MM people.

I have expressed why I absolutely hate the additional macro layer. We
have been through similar layers of macros in other areas (not to
mention the page allocator interface itself) and it has _always_ turned
out to be a bad idea long term. I do not see why this case should be any
different.

The whole kernel is moving to a dynamic tracing realm and now we
are going to build a static, macro-based tracing infrastructure which
will need tweaking anytime real memory consumers are one layer up the
existing macro infrastructure (do not forget quite a lot of allocations
are in library functions) and/or we need to modify the allocator API
in some way. Call me unimpressed!

Now, I fully recognize that the solution doesn't really have to be
perfect in order to be useful. Hence I never NAKed it even though I really
_dislike_ the approach. I expected you would grow community support
over time if this is indeed the only feasible approach, but that is not
reflected in the series posted here. If you find support, I will not
stand in the way.

> Unfortunately the cleanest one is not
> possible until the compiler feature is developed and deployed. And it
> still would require changes to the headers, so I don't think it's worth
> delaying the feature for years.

I am pretty sure you have invested a non-trivial amount of time into evaluating
other ways, yet your cover letter is rather modest about any details:
:  - looked at alternate hooking methods.
:    There were suggestions on alternate methods (compiler attribute,
:    trampolines), but they wouldn't have made the patchset any cleaner
:    (we still need to have different function versions for accounting vs. no
:    accounting to control at which point in a call chain the accounting
:    happens), and they would have added a dependency on toolchain
:    support.

The first immediate question would be: what about page_owner? I do remember
the runtime overhead being discussed but I do not really remember any
actual numbers outside of artificial workloads. Has this been
investigated? Is our stack unwinder the problem? Etc.

Also, what are the biggest obstacles to tracking allocations efficiently via
our tracing infrastructure? Has this been investigated? What were the
conclusions?
  
Michal Hocko Feb. 14, 2024, 2:46 p.m. UTC | #22
On Wed 14-02-24 01:20:20, Johannes Weiner wrote:
[...]
> I agree we should discuss how the annotations are implemented on a
> technical basis, but my take is that we need something like this.

I do not think there is any disagreement on the usefulness of better
memory allocation tracking. At least for me the primary problem is the
implementation. At LSF/MM last year we heard that the existing tracing
infrastructure hasn't really been explored much. The cover letter doesn't
really talk much about those alternatives, so it is really hard to
evaluate whether the proposed solution is indeed our best way to
approach this.

> In a codebase of our size, I don't think the allocator should be
> handing out memory without some basic implied tracking of where it's
> going. It's a liability for production environments, and it can hide
> bad memory management decisions in drivers and other subsystems for a
> very long time.

Fully agreed! It is quite common to see oom reports with a large portion
of memory unaccounted for, and this really adds cost on the debugging
side.
  
Michal Hocko Feb. 14, 2024, 4:02 p.m. UTC | #23
On Wed 14-02-24 10:01:14, Kent Overstreet wrote:
> On Wed, Feb 14, 2024 at 03:46:33PM +0100, Michal Hocko wrote:
> > On Wed 14-02-24 01:20:20, Johannes Weiner wrote:
> > [...]
> > > I agree we should discuss how the annotations are implemented on a
> > > technical basis, but my take is that we need something like this.
> > 
> > I do not think there is any disagreement on usefulness of a better
> > memory allocation tracking. At least for me the primary problem is the
> > implementation. At LFSMM last year we have heard that existing tracing
> > infrastructure hasn't really been explored much. Cover letter doesn't
> > really talk much about those alternatives so it is really hard to
> > evaluate whether the proposed solution is indeed our best way to
> > approach this.
> 
> Michal, we covered this before.

It is good practice to summarize previous discussions in the cover
letter, especially when different approaches have been discussed over a
longer time period or when the topic is controversial.

I do not see anything like that here - neither for the existing tracing
infrastructure, nor page_owner, nor the performance concerns discussed
before. Look, I do not want to nitpick or insist on formalisms, but
having those data points laid out would make any further discussion much
smoother.
  
Michal Hocko Feb. 14, 2024, 4:31 p.m. UTC | #24
On Wed 14-02-24 11:17:20, Kent Overstreet wrote:
[...]
> You gotta stop with this this derailing garbage.

It is always a pleasure talking to you Kent, but let me give you some
advice (free of charge, of course): let Suren talk - the chances for a
civilized and productive discussion are much higher!

I do not have much more to add to the discussion. My point stands: find
the support of the MM community if you want to proceed with this work.
  
Andrew Morton Feb. 14, 2024, 4:55 p.m. UTC | #25
On Tue, 13 Feb 2024 14:59:11 -0800 Suren Baghdasaryan <surenb@google.com> wrote:

> > > If you think you can easily achieve what Michal requested without all that,
> > > good.
> >
> > He requested something?
> 
> Yes, a cleaner instrumentation. Unfortunately the cleanest one is not
> possible until the compiler feature is developed and deployed. And it
> still would require changes to the headers, so I don't think it's worth
> delaying the feature for years.

Can we please be told much more about this compiler feature? 
A description of what it is, what it does, how it will affect this kernel
feature, etc.

Who is developing it and when can we expect it to become available?

Will we be able to migrate to it without back-compatibility concerns? 
(I think "you need quite recent gcc for memory profiling" is
reasonable).



Because: if the maintainability issues which Michal describes will be
significantly addressed with the gcc support then we're kinda reviewing
the wrong patchset.  Yes, it may be a maintenance burden initially, but
at some (yet to be revealed) time in the future, this will be addressed
with the gcc support?
  
Suren Baghdasaryan Feb. 14, 2024, 5:14 p.m. UTC | #26
On Wed, Feb 14, 2024 at 8:55 AM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Tue, 13 Feb 2024 14:59:11 -0800 Suren Baghdasaryan <surenb@google.com> wrote:
>
> > > > If you think you can easily achieve what Michal requested without all that,
> > > > good.
> > >
> > > He requested something?
> >
> > Yes, a cleaner instrumentation. Unfortunately the cleanest one is not
> > possible until the compiler feature is developed and deployed. And it
> > still would require changes to the headers, so I don't think it's worth
> > delaying the feature for years.
>
> Can we please be told much more about this compiler feature?
> Description of what it is, what it does, how it will affect this kernel
> feature, etc.

Sure. The compiler support will be in the form of a new __attribute__;
a simplified example:

// generate data for the wrapper
static void _alloc_tag()
{
  static struct alloc_tag _alloc_tag __section ("alloc_tags")
      = { .ct = CODE_TAG_INIT, .counter = 0 };
}

static inline int
wrapper (const char *name, int x, int (*callee) (const char *, int),
         struct alloc_tag *callsite_data)
{
  callsite_data->counter++;
  printf ("Call #%d from %s:%d (%s)\n", callsite_data->counter,
          callsite_data->ct.filename, callsite_data->ct.lineno,
          callsite_data->ct.function);
  int ret = callee (name, x);
  printf ("Returned: %d\n", ret);
  return ret;
}

__attribute__((annotate("callsite_wrapped_by", wrapper, _alloc_tag)))
int foo(const char* name, int x);

int foo(const char* name, int x) {
  printf ("Hello %s, %d!\n", name, x);
  return x;
}

We will be able to attach it to a function without changing the
function's name, while preserving the namespace (it applies only to
functions with that name, not to everything else).
Note that we will still need _noprof versions of the allocators.
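
As a rough sketch of how that migration might look (illustrative only;
the annotate syntax and the wrapper/_alloc_tag names are assumptions
taken from the POC above, with kmalloc as the example allocator and
kmalloc_noprof() assumed per the series' naming scheme):

/* today: the public name is a wrapper macro around the _noprof version */
void *kmalloc_noprof(size_t size, gfp_t flags);
#define kmalloc(...)	alloc_hooks(kmalloc_noprof(__VA_ARGS__))

/* with compiler support: kmalloc stays a plain function and the compiler
 * wraps each call to it; kmalloc_noprof remains for callers that do the
 * accounting at an outer level */
__attribute__((annotate("callsite_wrapped_by", wrapper, _alloc_tag)))
void *kmalloc(size_t size, gfp_t flags);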

>
> Who is developing it and when can we expect it to become available?

Aleksei Vetrov (google) with the help of Nick Desaulniers (google).
Both are CC'ed on this email.
After several iterations Aleksei has a POC which we are evaluating
(https://github.com/llvm/llvm-project/compare/main...noxwell:llvm-project:callsite-wrapper-tree-transform).
Once it's in good shape we are going to engage with the Clang and GCC
communities to get it upstreamed. When it will become available and when
the distributions will pick it up is anybody's guess. Upstreaming is
usually a lengthy process.

>
> Will we be able to migrate to it without back-compatibility concerns?
> (I think "you need quite recent gcc for memory profiling" is
> reasonable).

The migration should be quite straightforward, replacing the macros
with functions with that attribute.

>
>
> Because: if the maintainability issues which Michel describes will be
> significantly addressed with the gcc support then we're kinda reviewing
> the wrong patchset.  Yes, it may be a maintenance burden initially, but
> at some (yet to be revealed) time in the future, this will be addressed
> with the gcc support?

That's what I'm aiming for. I just don't want this placed on hold
until the compiler support is widely available, which might take
years.

>
  
Tim Chen Feb. 14, 2024, 6:53 p.m. UTC | #27
On Mon, 2024-02-12 at 13:38 -0800, Suren Baghdasaryan wrote:
> Memory allocation, v3 and final:
> 
> [...]
> 
> [1]: Overhead
> To measure the overhead we are comparing the following configurations:
> (1) Baseline with CONFIG_MEMCG_KMEM=n
> (2) Disabled by default (CONFIG_MEM_ALLOC_PROFILING=y &&
>     CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=n)
> (3) Enabled by default (CONFIG_MEM_ALLOC_PROFILING=y &&
>     CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=y)
> (4) Enabled at runtime (CONFIG_MEM_ALLOC_PROFILING=y &&
>     CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=n && /proc/sys/vm/mem_profiling=1)
> (5) Baseline with CONFIG_MEMCG_KMEM=y && allocating with __GFP_ACCOUNT
> 

Thanks for the work on this patchset; it is quite useful.
A clarification question on the data:

I assume configs (2), (3) and (4) have CONFIG_MEMCG_KMEM=n, right?
If so do you have similar data for config (2), (3) and (4) but with
CONFIG_MEMCG_KMEM=y for comparison with (5)?

Tim

> Performance overhead:
> [...]
> 
>                         kmalloc                 pgalloc
> (1 baseline)            6.764s                  16.902s
> (2 default disabled)    6.793s (+0.43%)         17.007s (+0.62%)
> (3 default enabled)     7.197s (+6.40%)         23.666s (+40.02%)
> (4 runtime enabled)     7.405s (+9.48%)         23.901s (+41.41%)
> (5 memcg)               13.388s (+97.94%)       48.460s (+186.71%)
>
  
Jani Nikula Feb. 16, 2024, 8:38 a.m. UTC | #28
On Mon, 12 Feb 2024, Suren Baghdasaryan <surenb@google.com> wrote:
> [...]

I wonder if it wouldn't be too much trouble to write at least a brief
overview document under Documentation/ describing what this is all
about? Even as a follow-up. People seeing the patch series have the
benefit of the cover letter and the commit messages, but that's hardly
documentation.

We have all these great frameworks and tools but their discoverability
to kernel developers isn't always all that great.

BR,
Jani.
  
Jani Nikula Feb. 16, 2024, 9:07 a.m. UTC | #29
On Fri, 16 Feb 2024, Kent Overstreet <kent.overstreet@linux.dev> wrote:
> On Fri, Feb 16, 2024 at 10:38:00AM +0200, Jani Nikula wrote:
>> I wonder if it wouldn't be too much trouble to write at least a brief
>> overview document under Documentation/ describing what this is all
>> about? Even as follow-up. People seeing the patch series have the
>> benefit of the cover letter and the commit messages, but that's hardly
>> documentation.
>> 
>> We have all these great frameworks and tools but their discoverability
>> to kernel developers isn't always all that great.
>
> commit f589b48789de4b8f77bfc70b9f3ab2013c01eaf2
> Author: Kent Overstreet <kent.overstreet@linux.dev>
> Date:   Wed Feb 14 01:13:04 2024 -0500
>
>     memprofiling: Documentation
>     
>     Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Thanks! It wasn't part of this series and I wasn't aware it existed.

BR,
Jani.