Message ID | 20231221065943.2803551-1-shy828301@gmail.com |
---|---|
State | New |
Headers |
From: Yang Shi <shy828301@gmail.com>
To: oliver.sang@intel.com, riel@surriel.com, fengwei.yin@intel.com, willy@infradead.org, cl@linux.com, ying.huang@intel.com, akpm@linux-foundation.org
Cc: shy828301@gmail.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 1/2] mm: mmap: no need to call khugepaged_enter_vma() for stack
Date: Wed, 20 Dec 2023 22:59:42 -0800
Message-Id: <20231221065943.2803551-1-shy828301@gmail.com>
X-Mailer: git-send-email 2.39.0
List-Id: <linux-kernel.vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit |
Series |
[1/2] mm: mmap: no need to call khugepaged_enter_vma() for stack
|
|
Commit Message
Yang Shi
Dec. 21, 2023, 6:59 a.m. UTC
From: Yang Shi <yang@os.amperecomputing.com>

We avoid allocating THP for temporary stack, even tnough
khugepaged_enter_vma() is called for stack VMAs, it actualy returns
false. So no need to call it in the first place at all.

Signed-off-by: Yang Shi <yang@os.amperecomputing.com>
---
 mm/mmap.c | 2 --
 1 file changed, 2 deletions(-)
Comments
On 2023/12/21 14:59, Yang Shi wrote:
> From: Yang Shi <yang@os.amperecomputing.com>
>
> We avoid allocating THP for temporary stack, even tnough
> khugepaged_enter_vma() is called for stack VMAs, it actualy returns
> false. So no need to call it in the first place at all.
>
> Signed-off-by: Yang Shi <yang@os.amperecomputing.com>

Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>

> ---
>  mm/mmap.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index b78e83d351d2..2ff79b1d1564 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2046,7 +2046,6 @@ static int expand_upwards(struct vm_area_struct *vma, unsigned long address)
>  		}
>  	}
>  	anon_vma_unlock_write(vma->anon_vma);
> -	khugepaged_enter_vma(vma, vma->vm_flags);
>  	mas_destroy(&mas);
>  	validate_mm(mm);
>  	return error;
> @@ -2140,7 +2139,6 @@ int expand_downwards(struct vm_area_struct *vma, unsigned long address)
>  		}
>  	}
>  	anon_vma_unlock_write(vma->anon_vma);
> -	khugepaged_enter_vma(vma, vma->vm_flags);
>  	mas_destroy(&mas);
>  	validate_mm(mm);
>  	return error;
On 2023/12/21 14:59, Yang Shi wrote:
> From: Yang Shi <yang@os.amperecomputing.com>
>
> The commit efa7df3e3bb5 ("mm: align larger anonymous mappings on THP
> boundaries") incured regression for stress-ng pthread benchmark [1].
> It is because THP get allocated to pthread's stack area much more possible
> than before. Pthread's stack area is allocated by mmap without VM_GROWSDOWN
> or VM_GROWSUP flag, so kernel can't tell whether it is a stack area or not.
>
> The MAP_STACK flag is used to mark the stack area, but it is a no-op on
> Linux. Mapping MAP_STACK to VM_NOHUGEPAGE to prevent from allocating
> THP for such stack area.
>
> With this change the stack area looks like:
>
> fffd18e10000-fffd19610000 rw-p 00000000 00:00 0
> Size: 8192 kB
> KernelPageSize: 4 kB
> MMUPageSize: 4 kB
> Rss: 12 kB
> Pss: 12 kB
> Pss_Dirty: 12 kB
> Shared_Clean: 0 kB
> Shared_Dirty: 0 kB
> Private_Clean: 0 kB
> Private_Dirty: 12 kB
> Referenced: 12 kB
> Anonymous: 12 kB
> KSM: 0 kB
> LazyFree: 0 kB
> AnonHugePages: 0 kB
> ShmemPmdMapped: 0 kB
> FilePmdMapped: 0 kB
> Shared_Hugetlb: 0 kB
> Private_Hugetlb: 0 kB
> Swap: 0 kB
> SwapPss: 0 kB
> Locked: 0 kB
> THPeligible: 0
> VmFlags: rd wr mr mw me ac nh
>
> The "nh" flag is set.
>
> [1] https://lore.kernel.org/linux-mm/202312192310.56367035-oliver.sang@intel.com/
>
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Tested-by: Oliver Sang <oliver.sang@intel.com>
> Cc: Yin Fengwei <fengwei.yin@intel.com>
> Cc: Rik van Riel <riel@surriel.com>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Christopher Lameter <cl@linux.com>
> Cc: Huang, Ying <ying.huang@intel.com>
> Signed-off-by: Yang Shi <yang@os.amperecomputing.com>

Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>

> ---
>  include/linux/mman.h | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/include/linux/mman.h b/include/linux/mman.h
> index 40d94411d492..dc7048824be8 100644
> --- a/include/linux/mman.h
> +++ b/include/linux/mman.h
> @@ -156,6 +156,7 @@ calc_vm_flag_bits(unsigned long flags)
>  	return _calc_vm_trans(flags, MAP_GROWSDOWN, VM_GROWSDOWN ) |
>  	       _calc_vm_trans(flags, MAP_LOCKED, VM_LOCKED ) |
>  	       _calc_vm_trans(flags, MAP_SYNC, VM_SYNC ) |
> +	       _calc_vm_trans(flags, MAP_STACK, VM_NOHUGEPAGE) |
>  	       arch_calc_vm_flag_bits(flags);
>  }
>
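The one-line change quoted above relies on `_calc_vm_trans()` moving a single flag bit from its `MAP_*` position to its `VM_*` position. What follows is a minimal userspace sketch of that arithmetic, not the kernel source: the macro is modeled on the kernel's, the constant values are assumed to match the asm-generic definitions, and every `DEMO_`/`demo_` name is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed values (asm-generic layout); other architectures may differ. */
#define DEMO_MAP_GROWSDOWN 0x00000100UL
#define DEMO_MAP_STACK     0x00020000UL
#define DEMO_VM_GROWSDOWN  0x00000100UL
#define DEMO_VM_NOHUGEPAGE 0x40000000UL

/*
 * Translate one flag bit: if bit1 is set in x, return bit2, else 0.
 * Both bits are single powers of two, so the shift is done with a
 * multiply or divide by the ratio of the two bit values.
 */
#define demo_calc_vm_trans(x, bit1, bit2)                        \
	((!(bit1) || !(bit2)) ? 0 :                              \
	 ((bit1) <= (bit2) ? ((x) & (bit1)) * ((bit2) / (bit1))  \
			   : ((x) & (bit1)) / ((bit1) / (bit2))))

/* Reduced stand-in for calc_vm_flag_bits() with only two translations. */
unsigned long demo_calc_vm_flag_bits(unsigned long flags)
{
	return demo_calc_vm_trans(flags, DEMO_MAP_GROWSDOWN, DEMO_VM_GROWSDOWN) |
	       demo_calc_vm_trans(flags, DEMO_MAP_STACK, DEMO_VM_NOHUGEPAGE);
}

/* Print the translation for one mmap flag word. */
void demo_print(unsigned long flags)
{
	printf("map flags %#lx -> vm flags %#lx\n",
	       flags, demo_calc_vm_flag_bits(flags));
}
```

Under these assumptions, `demo_calc_vm_flag_bits(DEMO_MAP_STACK)` returns `DEMO_VM_NOHUGEPAGE`, which corresponds to the "nh" entry in the `VmFlags` line shown in the commit message.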
Yang Shi <shy828301@gmail.com> writes:

> From: Yang Shi <yang@os.amperecomputing.com>
>
> We avoid allocating THP for temporary stack, even tnough
                                                    ~~~~~~
                                                    though?

--
Best Regards,
Huang, Ying

> khugepaged_enter_vma() is called for stack VMAs, it actualy returns
> false. So no need to call it in the first place at all.
>
> Signed-off-by: Yang Shi <yang@os.amperecomputing.com>
> ---
>  mm/mmap.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index b78e83d351d2..2ff79b1d1564 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2046,7 +2046,6 @@ static int expand_upwards(struct vm_area_struct *vma, unsigned long address)
>  		}
>  	}
>  	anon_vma_unlock_write(vma->anon_vma);
> -	khugepaged_enter_vma(vma, vma->vm_flags);
>  	mas_destroy(&mas);
>  	validate_mm(mm);
>  	return error;
> @@ -2140,7 +2139,6 @@ int expand_downwards(struct vm_area_struct *vma, unsigned long address)
>  		}
>  	}
>  	anon_vma_unlock_write(vma->anon_vma);
> -	khugepaged_enter_vma(vma, vma->vm_flags);
>  	mas_destroy(&mas);
>  	validate_mm(mm);
>  	return error;
On Sun, Jan 14, 2024 at 9:52 PM Huang, Ying <ying.huang@intel.com> wrote:
>
> Yang Shi <shy828301@gmail.com> writes:
>
> > From: Yang Shi <yang@os.amperecomputing.com>
> >
> > We avoid allocating THP for temporary stack, even tnough
>                                                     ~~~~~~
>                                                     though?

Yeah, it is a typo. Thanks for noticing this.

>
> --
> Best Regards,
> Huang, Ying
>
> > khugepaged_enter_vma() is called for stack VMAs, it actualy returns
> > false. So no need to call it in the first place at all.
> >
> > Signed-off-by: Yang Shi <yang@os.amperecomputing.com>
> > ---
> >  mm/mmap.c | 2 --
> >  1 file changed, 2 deletions(-)
> >
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index b78e83d351d2..2ff79b1d1564 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -2046,7 +2046,6 @@ static int expand_upwards(struct vm_area_struct *vma, unsigned long address)
> >  		}
> >  	}
> >  	anon_vma_unlock_write(vma->anon_vma);
> > -	khugepaged_enter_vma(vma, vma->vm_flags);
> >  	mas_destroy(&mas);
> >  	validate_mm(mm);
> >  	return error;
> > @@ -2140,7 +2139,6 @@ int expand_downwards(struct vm_area_struct *vma, unsigned long address)
> >  		}
> >  	}
> >  	anon_vma_unlock_write(vma->anon_vma);
> > -	khugepaged_enter_vma(vma, vma->vm_flags);
> >  	mas_destroy(&mas);
> >  	validate_mm(mm);
> >  	return error;
* Yang Shi:

> From: Yang Shi <yang@os.amperecomputing.com>
>
> The commit efa7df3e3bb5 ("mm: align larger anonymous mappings on THP
> boundaries") incured regression for stress-ng pthread benchmark [1].
> It is because THP get allocated to pthread's stack area much more possible
> than before. Pthread's stack area is allocated by mmap without VM_GROWSDOWN
> or VM_GROWSUP flag, so kernel can't tell whether it is a stack area or not.
>
> The MAP_STACK flag is used to mark the stack area, but it is a no-op on
> Linux. Mapping MAP_STACK to VM_NOHUGEPAGE to prevent from allocating
> THP for such stack area.

Doesn't this introduce a regression in the other direction, where
workloads expect to use a hugepage TLB entry for the stack?

It seems an odd approach to fixing the stress-ng regression. Isn't
it very much coding to the benchmark?

Thanks,
Florian
* Yang Shi:

> On Tue, Jan 30, 2024 at 11:53 PM Florian Weimer <fweimer@redhat.com> wrote:
>>
>> * Yang Shi:
>>
>> > From: Yang Shi <yang@os.amperecomputing.com>
>> >
>> > The commit efa7df3e3bb5 ("mm: align larger anonymous mappings on THP
>> > boundaries") incured regression for stress-ng pthread benchmark [1].
>> > It is because THP get allocated to pthread's stack area much more possible
>> > than before. Pthread's stack area is allocated by mmap without VM_GROWSDOWN
>> > or VM_GROWSUP flag, so kernel can't tell whether it is a stack area or not.
>> >
>> > The MAP_STACK flag is used to mark the stack area, but it is a no-op on
>> > Linux. Mapping MAP_STACK to VM_NOHUGEPAGE to prevent from allocating
>> > THP for such stack area.
>>
>> Doesn't this introduce a regression in the other direction, where
>> workloads expect to use a hugepage TLB entry for the stack?
>
> Maybe, it is theoretically possible. But AFAICT, the real life
> workloads performance usually gets hurt if THP is used for stack.
> Willy has an example:
>
> https://lore.kernel.org/linux-mm/ZYPDwCcAjX+r+g6s@casper.infradead.org/#t
>
> And avoiding THP on stack is not new, VM_GROWSDOWN | VM_GROWSUP areas
> have been applied before, this patch just extends this to MAP_STACK.

If it's *always* beneficial then we should help it along in glibc as
well. We've started to offer a tunable in response to this observation
(also paper over in OpenJDK):

  Make thread stacks not use huge pages
  <https://bugs.openjdk.org/browse/JDK-8303215>

But this is specifically about RSS usage, and not directly about
reducing TLB misses etc.

Thanks,
Florian
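For programs that allocate their own thread stacks, the userspace side of this discussion can be approximated today without the kernel change. The sketch below is an illustration under stated assumptions, not what glibc or OpenJDK actually do: it maps a stack region with `MAP_STACK`, requests `MADV_NOHUGEPAGE` on a best-effort basis, and provides a helper for checking a `VmFlags:` line from `/proc/<pid>/smaps` for the "nh" flag. The function names `alloc_stack_no_thp` and `vmflags_has` are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Return 1 if a smaps "VmFlags:" line contains the given flag token. */
int vmflags_has(const char *line, const char *flag)
{
	const char *p = strchr(line, ':');
	size_t len = strlen(flag);

	if (!p || len == 0)
		return 0;
	p++;
	while (*p) {
		/* Skip separators, then measure the next token. */
		while (*p == ' ' || *p == '\t' || *p == '\n')
			p++;
		const char *end = p;
		while (*end && *end != ' ' && *end != '\t' && *end != '\n')
			end++;
		if ((size_t)(end - p) == len && memcmp(p, flag, len) == 0)
			return 1;
		p = end;
	}
	return 0;
}

/*
 * Map an anonymous region meant to be used as a thread stack and ask
 * the kernel not to back it with transparent huge pages.
 * Returns NULL if the mapping itself fails.
 */
void *alloc_stack_no_thp(size_t size)
{
	void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);

	if (p == MAP_FAILED)
		return NULL;
#ifdef MADV_NOHUGEPAGE
	/* Best effort: the call can fail, e.g. on kernels built without THP. */
	(void)madvise(p, size, MADV_NOHUGEPAGE);
#endif
	return p;
}
```

The returned region could then be handed to `pthread_attr_setstack()`. Whether "nh" appears for a `MAP_STACK` mapping without the explicit `madvise()` depends on whether the running kernel carries the patch discussed in this thread.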
diff --git a/mm/mmap.c b/mm/mmap.c
index b78e83d351d2..2ff79b1d1564 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2046,7 +2046,6 @@ static int expand_upwards(struct vm_area_struct *vma, unsigned long address)
 		}
 	}
 	anon_vma_unlock_write(vma->anon_vma);
-	khugepaged_enter_vma(vma, vma->vm_flags);
 	mas_destroy(&mas);
 	validate_mm(mm);
 	return error;
@@ -2140,7 +2139,6 @@ int expand_downwards(struct vm_area_struct *vma, unsigned long address)
 		}
 	}
 	anon_vma_unlock_write(vma->anon_vma);
-	khugepaged_enter_vma(vma, vma->vm_flags);
 	mas_destroy(&mas);
 	validate_mm(mm);
 	return error;