| Message ID | 20240221194052.927623-20-surenb@google.com |
|---|---|
| State | New |
| Series | Memory allocation profiling |
Commit Message
Suren Baghdasaryan
Feb. 21, 2024, 7:40 p.m. UTC
When a high-order page is split into smaller ones, each newly split
page should get its own codetag. The original codetag is reused for
these pages, but each is recorded as a 0-byte allocation because the
original codetag already accounts for the whole high-order page.
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
include/linux/pgalloc_tag.h | 30 ++++++++++++++++++++++++++++++
mm/huge_memory.c | 2 ++
mm/page_alloc.c | 2 ++
3 files changed, 34 insertions(+)
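
The patch relies on alloc_tag_ref_set(), which is defined elsewhere in
the series and not shown here. As a rough sketch of what it is expected
to do, based on the counter semantics discussed in the comments below
(the exact implementation in the series may differ):

/*
 * Sketch only: point a fresh codetag reference at an existing tag.
 * "bytes" is left untouched (the memory was already accounted when the
 * high-order page was allocated), while "calls" is incremented so that
 * each split page can later subtract one call when it is freed.
 */
static inline void alloc_tag_ref_set(union codetag_ref *ref, struct alloc_tag *tag)
{
	ref->ct = &tag->ct;
	this_cpu_inc(tag->counters->calls);
}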
Comments
On 2/21/24 20:40, Suren Baghdasaryan wrote:
> When a high-order page is split into smaller ones, each newly split
> page should get its own codetag. The original codetag is reused for
> these pages, but each is recorded as a 0-byte allocation because the
> original codetag already accounts for the whole high-order page.

This was v3 but then you refactored (for the better) so the commit log
could reflect it?

> Signed-off-by: Suren Baghdasaryan <surenb@google.com>

I was going to R-b, but now I recalled the trickiness of
__free_pages() for non-compound pages if it loses the race to a
speculative reference. Will the codetag handling work fine there?

> [...]
On Tue, Feb 27, 2024 at 2:10 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 2/21/24 20:40, Suren Baghdasaryan wrote:
> > When a high-order page is split into smaller ones, each newly split
> > page should get its own codetag. [...]
>
> This was v3 but then you refactored (for the better) so the commit log
> could reflect it?

Yes, technically the mechanism didn't change but I should word it
better. Something like this:

When a high-order page is split into smaller ones, each newly split
page should get its codetag. After the split each split page will be
referencing the original codetag. The codetag's "bytes" counter
remains the same because the amount of allocated memory has not
changed, however the "calls" counter gets increased to keep the
counter correct when these individual pages get freed.

> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
>
> I was going to R-b, but now I recalled the trickiness of
> __free_pages() for non-compound pages if it loses the race to a
> speculative reference. Will the codetag handling work fine there?

I think so. Each non-compound page has its individual reference to its
codetag and will decrement it whenever the page is freed. IIUC, when
the logic in __free_pages() loses the race to a speculative reference,
it will free all pages except the first one, and the
first one will be freed when the last put_page() happens. If prior to
this all these pages were split from one page then all of them will
have their own reference which points to the same codetag. Every time
one of these pages is freed that codetag's "bytes" and "calls"
counters will be decremented. I think accounting will work correctly
irrespective of where these pages are freed, in __free_pages() or by
put_page().

> [...]
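
To make the accounting concrete, here is a worked example of the
counters through a split-and-free cycle (illustrative values, assuming
an order-2 allocation):

/*
 * alloc_pages(GFP_KERNEL, 2)    tag: bytes = 4 * PAGE_SIZE, calls = 1
 * split_page(page, 2)           tag: bytes = 4 * PAGE_SIZE, calls = 4
 *                               (pgalloc_tag_split() adds 3 new refs,
 *                                each a 0-byte, 1-call record)
 * __free_page(page + i), i=0..3
 *                               each free subtracts PAGE_SIZE and one
 *                               call; the tag ends at bytes = 0, calls = 0
 */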
On 2/27/24 17:38, Suren Baghdasaryan wrote:
> On Tue, Feb 27, 2024 at 2:10 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>> This was v3 but then you refactored (for the better) so the commit log
>> could reflect it?
>
> Yes, technically the mechanism didn't change but I should word it
> better. Something like this:
>
> When a high-order page is split into smaller ones, each newly split
> page should get its codetag. After the split each split page will be
> referencing the original codetag. The codetag's "bytes" counter
> remains the same because the amount of allocated memory has not
> changed, however the "calls" counter gets increased to keep the
> counter correct when these individual pages get freed.

Great, thanks.
The concern with __free_pages() is not really related to splitting, so
for this patch:

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

>> I was going to R-b, but now I recalled the trickiness of
>> __free_pages() for non-compound pages if it loses the race to a
>> speculative reference. Will the codetag handling work fine there?
>
> I think so. Each non-compound page has its individual reference to its
> codetag and will decrement it whenever the page is freed. IIUC, when
> the logic in __free_pages() loses the race to a speculative reference,
> it will free all pages except the first one, and the

The "tail" pages of this non-compound high-order page will AFAICS not
have code tags assigned, so alloc_tag_sub() will be a no-op (or a
warning with _DEBUG).

> first one will be freed when the last put_page() happens. If prior to
> this all these pages were split from one page then all of them will
> have their own reference which points to the same codetag.

Yeah I'm assuming there's no split before the freeing. This patch about
splitting just reminded me of that tricky freeing scenario.

So IIUC the "else if (!head)" path of __free_pages() will do nothing
about the "tail" pages wrt code tags as there are no code tags. Then
whoever took the speculative "head" page reference will put_page() and
free it, which will end up in alloc_tag_sub(). This will decrement
calls properly, but bytes will become imbalanced, because that
put_page() will pass order-0 worth of bytes - the original order is
lost.

Now this might be rare enough that it's not worth fixing if that would
be too complicated, just FYI.
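
For reference, the __free_pages() path being discussed looks roughly
like this in mm/page_alloc.c of that era (a sketch reconstructed from
the calls named in this exchange, not copied verbatim from the tree):

void __free_pages(struct page *page, unsigned int order)
{
	/* get PageHead before we drop the reference */
	int head = PageHead(page);

	if (put_page_testzero(page))
		free_the_page(page, order);
	else if (!head)
		/* lost to a speculative reference: free only the "tail"
		 * pages; the "head" page is freed later, as order-0, by
		 * whoever holds the last reference */
		while (order-- > 0)
			free_the_page(page + (1 << order), order);
}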
On Wed, Feb 28, 2024 at 12:47 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> The "tail" pages of this non-compound high-order page will AFAICS not
> have code tags assigned, so alloc_tag_sub() will be a no-op (or a
> warning with _DEBUG).

Yes, that is correct.

> Yeah I'm assuming there's no split before the freeing. This patch about
> splitting just reminded me of that tricky freeing scenario.

Ah, I see. I thought you were talking about a page that was previously
split.

> So IIUC the "else if (!head)" path of __free_pages() will do nothing
> about the "tail" pages wrt code tags as there are no code tags. Then
> whoever took the speculative "head" page reference will put_page() and
> free it, which will end up in alloc_tag_sub(). This will decrement
> calls properly, but bytes will become imbalanced, because that
> put_page() will pass order-0 worth of bytes - the original order is
> lost.

Yeah, that's true. put_page() will end up calling
free_unref_page(&folio->page, 0) even if the original order was more
than 0.

> Now this might be rare enough that it's not worth fixing if that would
> be too complicated, just FYI.

Yeah. We can fix this by subtracting the "bytes" counter of the "head"
page for all free_the_page(page + (1 << order), order) calls we do
inside __free_pages(). But we can't simply use pgalloc_tag_sub()
because the "calls" counter will get over-decremented (we allocated
all of these pages with one call). I'll need to introduce a new
pgalloc_tag_sub_bytes() API and use it here. I feel it's too targeted
of a solution but OTOH this is a special situation, so maybe it's
acceptable. WDYT?
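
A minimal sketch of the pgalloc_tag_sub_bytes() helper floated above
(the name comes from the message; the body is an assumption based on
the counter layout used elsewhere in this series):

/*
 * Hypothetical: decrement only the "bytes" counter of a codetag,
 * leaving "calls" alone, for pages that were allocated by a single
 * call but are being freed piecemeal.
 */
static inline void pgalloc_tag_sub_bytes(struct alloc_tag *tag, unsigned int nr_pages)
{
	if (mem_alloc_profiling_enabled() && tag)
		this_cpu_sub(tag->counters->bytes, PAGE_SIZE * nr_pages);
}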
On 2/28/24 18:50, Suren Baghdasaryan wrote:
> On Wed, Feb 28, 2024 at 12:47 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>>
>> Now this might be rare enough that it's not worth fixing if that would
>> be too complicated, just FYI.
>
> Yeah. We can fix this by subtracting the "bytes" counter of the "head"
> page for all free_the_page(page + (1 << order), order) calls we do
> inside __free_pages(). But we can't simply use pgalloc_tag_sub()
> because the "calls" counter will get over-decremented (we allocated
> all of these pages with one call). I'll need to introduce a new
> pgalloc_tag_sub_bytes() API and use it here. I feel it's too targeted
> of a solution but OTOH this is a special situation, so maybe it's
> acceptable. WDYT?

Hmm I think there's a problem that once you fail put_page_testzero() and
detect you need to do this, the page might be already gone or
reallocated so you can't get to the tag for decrementing bytes. You'd
have to get it upfront (I guess for "head && order > 0" cases) just in
case it happens. Maybe it's not worth the trouble for such a rare case.
On Wed, Feb 28, 2024 at 10:28 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> Hmm I think there's a problem that once you fail put_page_testzero()
> and detect you need to do this, the page might be already gone or
> reallocated so you can't get to the tag for decrementing bytes. You'd
> have to get it upfront (I guess for "head && order > 0" cases) just in
> case it happens. Maybe it's not worth the trouble for such a rare case.

Yes, that hit me when I tried to implement it but there is a simple
solution around that. I can obtain the alloc_tag before doing
put_page_testzero() and then decrement the bytes counter directly as
needed. Not sure if it is a rare enough case that we can ignore it but
if the fix is simple enough then might as well do it?
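
Combining the two ideas, the fix discussed in this exchange might look
roughly like the following (hypothetical sketch: pgalloc_tag_get() is
an assumed lookup helper and pgalloc_tag_sub_bytes() is the sketch
above; the "(1 << order) - 1" follows from the head page's bytes being
subtracted later by its order-0 free):

void __free_pages(struct page *page, unsigned int order)
{
	/* get PageHead before we drop the reference */
	int head = PageHead(page);
	/* fetch the codetag up front, while our reference still pins
	 * the page, so the tag stays reachable if the race is lost */
	struct alloc_tag *tag = pgalloc_tag_get(page);

	if (put_page_testzero(page))
		free_the_page(page, order);
	else if (!head) {
		/* the "head" page is freed later as order-0, which
		 * subtracts one page of bytes; account for the untagged
		 * "tail" pages here, without touching "calls" */
		pgalloc_tag_sub_bytes(tag, (1 << order) - 1);
		while (order-- > 0)
			free_the_page(page + (1 << order), order);
	}
}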
Patch

diff --git a/include/linux/pgalloc_tag.h b/include/linux/pgalloc_tag.h
index b49ab955300f..9e6ad8e0e4aa 100644
--- a/include/linux/pgalloc_tag.h
+++ b/include/linux/pgalloc_tag.h
@@ -67,11 +67,41 @@ static inline void pgalloc_tag_sub(struct page *page, unsigned int order)
 	}
 }
 
+static inline void pgalloc_tag_split(struct page *page, unsigned int nr)
+{
+	int i;
+	struct page_ext *page_ext;
+	union codetag_ref *ref;
+	struct alloc_tag *tag;
+
+	if (!mem_alloc_profiling_enabled())
+		return;
+
+	page_ext = page_ext_get(page);
+	if (unlikely(!page_ext))
+		return;
+
+	ref = codetag_ref_from_page_ext(page_ext);
+	if (!ref->ct)
+		goto out;
+
+	tag = ct_to_alloc_tag(ref->ct);
+	page_ext = page_ext_next(page_ext);
+	for (i = 1; i < nr; i++) {
+		/* Set new reference to point to the original tag */
+		alloc_tag_ref_set(codetag_ref_from_page_ext(page_ext), tag);
+		page_ext = page_ext_next(page_ext);
+	}
+out:
+	page_ext_put(page_ext);
+}
+
 #else /* CONFIG_MEM_ALLOC_PROFILING */
 
 static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
 				   unsigned int order) {}
 static inline void pgalloc_tag_sub(struct page *page, unsigned int order) {}
+static inline void pgalloc_tag_split(struct page *page, unsigned int nr) {}
 
 #endif /* CONFIG_MEM_ALLOC_PROFILING */
 
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 94c958f7ebb5..86daae671319 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -38,6 +38,7 @@
 #include <linux/sched/sysctl.h>
 #include <linux/memory-tiers.h>
 #include <linux/compat.h>
+#include <linux/pgalloc_tag.h>
 
 #include <asm/tlb.h>
 #include <asm/pgalloc.h>
@@ -2899,6 +2900,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 	/* Caller disabled irqs, so they are still disabled here */
 
 	split_page_owner(head, nr);
+	pgalloc_tag_split(head, nr);
 
 	/* See comment in __split_huge_page_tail() */
 	if (PageAnon(head)) {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 58c0e8b948a4..4bc5b4720fee 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2621,6 +2621,7 @@ void split_page(struct page *page, unsigned int order)
 	for (i = 1; i < (1 << order); i++)
 		set_page_refcounted(page + i);
 	split_page_owner(page, 1 << order);
+	pgalloc_tag_split(page, 1 << order);
 	split_page_memcg(page, 1 << order);
 }
 EXPORT_SYMBOL_GPL(split_page);
@@ -4806,6 +4807,7 @@ static void *make_alloc_exact(unsigned long addr, unsigned int order,
 		struct page *last = page + nr;
 
 		split_page_owner(page, 1 << order);
+		pgalloc_tag_split(page, 1 << order);
 		split_page_memcg(page, 1 << order);
 		while (page < --last)
 			set_page_refcounted(last);