Message ID | 1d679805-8a82-44a4-ba14-49d4f28ff597@p183 |
---|---|
State | New |
Headers |
Date: Tue, 5 Dec 2023 19:01:34 +0300
From: Alexey Dobriyan <adobriyan@gmail.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Florian Weimer <fweimer@redhat.com>, linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, x86@kernel.org
Subject: [PATCH v2] ELF: supply userspace with available page shifts (AT_PAGE_SHIFT_MASK)
Message-ID: <1d679805-8a82-44a4-ba14-49d4f28ff597@p183>
References: <6b399b86-a478-48b0-92a1-25240a8ede54@p183> <87v89dvuxg.fsf@oldenburg.str.redhat.com>
In-Reply-To: <87v89dvuxg.fsf@oldenburg.str.redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
List-ID: <linux-kernel.vger.kernel.org> |
Series | [v2] ELF: supply userspace with available page shifts (AT_PAGE_SHIFT_MASK) |
Commit Message
Alexey Dobriyan
Dec. 5, 2023, 4:01 p.m. UTC
Report available page shifts in an arch-independent manner, so that
userspace developers won't have to parse /proc/cpuinfo hunting
for arch-specific strings.

Note!

This is strictly for userspace: if some page size is shut down due
to a kernel command line option or a CPU bug workaround, then it must not
be reported in the aux vector!
x86_64 machine with 1 GiB pages:
00000030 06 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00
00000040 1d 00 00 00 00 00 00 00 00 10 20 40 00 00 00 00
x86_64 machine with 2 MiB pages only:
00000030 06 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00
00000040 1d 00 00 00 00 00 00 00 00 10 20 00 00 00 00 00
AT_PAGESZ is always 4096 which is not that interesting.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
---
v2: switch to 1 bit per page shift (bitmask)
arch/x86/include/asm/elf.h | 12 ++++++++++++
fs/binfmt_elf.c | 3 +++
include/uapi/linux/auxvec.h | 14 ++++++++++++++
3 files changed, 29 insertions(+)
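
The hex dumps in the commit message are raw /proc/self/auxv bytes. On x86_64 each entry is a 64-bit type followed by a 64-bit value, both little-endian: type 0x06 is AT_PAGESZ and type 0x1d (29) is the proposed AT_PAGE_SHIFT_MASK. The following is a minimal sketch of how the value bytes could be decoded into page shifts; the helper names are made up for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assemble a little-endian 64-bit value from 8 raw auxv bytes. */
uint64_t le64_from_bytes(const unsigned char b[8])
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | b[i];
	return v;
}

/*
 * Store every page shift whose bit is set in mask into shifts[]
 * (ascending order) and return how many were found.
 * Bit N set means pages of 2^N bytes are reported.
 */
size_t page_shifts_from_mask(uint64_t mask, unsigned int shifts[64])
{
	size_t n = 0;

	assert(shifts != NULL);
	for (unsigned int shift = 0; shift < 64; shift++)
		if (mask & (UINT64_C(1) << shift))
			shifts[n++] = shift;
	return n;
}
```

For the 1 GiB machine the value bytes `00 10 20 40 00 00 00 00` assemble to 0x40201000, which is bits 12, 21 and 30 (4 KiB, 2 MiB, 1 GiB). The 2 MiB-only machine yields 0x201000, bits 12 and 21.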
Comments
On Tue, Dec 05, 2023 at 07:01:34PM +0300, Alexey Dobriyan wrote:
> Report available page shifts in arch independent manner, so that
> userspace developers won't have to parse /proc/cpuinfo hunting
> for arch specific strings:
>
> Note!
>
> This is strictly for userspace, if some page size is shutdown due
> to kernel command line option or CPU bug workaround, than is must not
> be reported in aux vector!

Given Florian in CC, I assume this is something glibc would like to be
using? Please mention this in the commit log.

>
> x86_64 machine with 1 GiB pages:
>
> 00000030 06 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00
> 00000040 1d 00 00 00 00 00 00 00 00 10 20 40 00 00 00 00
>
> x86_64 machine with 2 MiB pages only:
>
> 00000030 06 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00
> 00000040 1d 00 00 00 00 00 00 00 00 10 20 00 00 00 00 00
>
> AT_PAGESZ is always 4096 which is not that interesting.

That's not always true. For example, see arm64:
arch/arm64/include/asm/elf.h:#define ELF_EXEC_PAGESIZE PAGE_SIZE

I'm not actually sure why x86 forces it to 4096. I'd need to go look
through the history there.

>
> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
> ---
>
> v2: switch to 1 bit per page shift (bitmask)
>
> arch/x86/include/asm/elf.h | 12 ++++++++++++
> fs/binfmt_elf.c | 3 +++
> include/uapi/linux/auxvec.h | 14 ++++++++++++++
> 3 files changed, 29 insertions(+)
>
> --- a/arch/x86/include/asm/elf.h
> +++ b/arch/x86/include/asm/elf.h
> @@ -358,6 +358,18 @@ else if (IS_ENABLED(CONFIG_IA32_EMULATION)) \
>
>  #define COMPAT_ELF_ET_DYN_BASE (TASK_UNMAPPED_BASE + 0x1000000)
>
> +#define ARCH_AT_PAGE_SHIFT_MASK \
> +	do { \
> +		u32 val = 1 << 12; \
> +		if (boot_cpu_has(X86_FEATURE_PSE)) { \
> +			val |= 1 << 21; \
> +		} \
> +		if (boot_cpu_has(X86_FEATURE_GBPAGES)) { \
> +			val |= 1 << 30; \
> +		} \
> +		NEW_AUX_ENT(AT_PAGE_SHIFT_MASK, val); \
> +	} while (0)
> +
>  #endif /* !CONFIG_X86_32 */

Can't we have a generic ARCH_AT_PAGE_SHIFT_MASK too? Something like:

#ifndef ARCH_AT_PAGE_SHIFT_MASK
#define ARCH_AT_PAGE_SHIFT_MASK NEW_AUX_ENT(AT_PAGE_SHIFT_MASK, 1 << PAGE_SHIFT)
#endif

Or am I misunderstanding something here?

>
>  #define VDSO_CURRENT_BASE ((unsigned long)current->mm->context.vdso)
> --- a/fs/binfmt_elf.c
> +++ b/fs/binfmt_elf.c
> @@ -240,6 +240,9 @@ create_elf_tables(struct linux_binprm *bprm, const struct elfhdr *exec,
>  #endif
>  	NEW_AUX_ENT(AT_HWCAP, ELF_HWCAP);
>  	NEW_AUX_ENT(AT_PAGESZ, ELF_EXEC_PAGESIZE);
> +#ifdef ARCH_AT_PAGE_SHIFT_MASK
> +	ARCH_AT_PAGE_SHIFT_MASK;
> +#endif

That way we can avoid an #ifdef in the .c file.

>  	NEW_AUX_ENT(AT_CLKTCK, CLOCKS_PER_SEC);
>  	NEW_AUX_ENT(AT_PHDR, phdr_addr);
>  	NEW_AUX_ENT(AT_PHENT, sizeof(struct elf_phdr));
> --- a/include/uapi/linux/auxvec.h
> +++ b/include/uapi/linux/auxvec.h
> @@ -33,6 +33,20 @@
>  #define AT_RSEQ_FEATURE_SIZE 27 /* rseq supported feature size */
>  #define AT_RSEQ_ALIGN 28 /* rseq allocation alignment */
>
> +/*
> + * Page sizes available for mmap(2) encoded as bitmask.
> + *
> + * Example: x86_64 system with pse, pdpe1gb /proc/cpuinfo flags reports
> + * 4 KiB, 2 MiB and 1 GiB page support.
> + *
> + * $ hexdump -C /proc/self/auxv

FWIW, a more readable form is: $ LD_SHOW_AUXV=1 /bin/true

> + * 00000030 06 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00
> + * 00000040 1d 00 00 00 00 00 00 00 00 10 20 40 00 00 00 00
> + *
> + * For 2^64 hugepage support please contact your Universe sales representative.
> + */
> +#define AT_PAGE_SHIFT_MASK 29

... hmm, why is 29 unused?

> +
>  #define AT_EXECFN 31 /* filename of program */
>
>  #ifndef AT_MINSIGSTKSZ

This will need a man page update for "getauxval" as well...
* Kees Cook:

> On Tue, Dec 05, 2023 at 07:01:34PM +0300, Alexey Dobriyan wrote:
>> Report available page shifts in arch independent manner, so that
>> userspace developers won't have to parse /proc/cpuinfo hunting
>> for arch specific strings:
>>
>> Note!
>>
>> This is strictly for userspace, if some page size is shutdown due
>> to kernel command line option or CPU bug workaround, than is must not
>> be reported in aux vector!
>
> Given Florian in CC, I assume this is something glibc would like to be
> using? Please mention this in the commit log.

Nope, I just wrote a random drive-by comment on the first version.

>> x86_64 machine with 1 GiB pages:
>>
>> 00000030 06 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00
>> 00000040 1d 00 00 00 00 00 00 00 00 10 20 40 00 00 00 00
>>
>> x86_64 machine with 2 MiB pages only:
>>
>> 00000030 06 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00
>> 00000040 1d 00 00 00 00 00 00 00 00 10 20 00 00 00 00 00
>>
>> AT_PAGESZ is always 4096 which is not that interesting.
>
> That's not always true. For example, see arm64:
> arch/arm64/include/asm/elf.h:#define ELF_EXEC_PAGESIZE PAGE_SIZE

I'm pretty sure the comment refers to the x86-64 situation. 8-)

> I'm not actually sure why x86 forces it to 4096. I'd need to go look
> through the history there.

On x86-64, page size 4096 is architectural. Likewise on s390x and a few
other architectures.

Thanks,
Florian
On Wed, Dec 06, 2023 at 10:05:36PM +0100, Florian Weimer wrote:
> * Kees Cook:
>
> > On Tue, Dec 05, 2023 at 07:01:34PM +0300, Alexey Dobriyan wrote:
> >> Report available page shifts in arch independent manner, so that
> >> userspace developers won't have to parse /proc/cpuinfo hunting
> >> for arch specific strings:
> >>
> >> Note!
> >>
> >> This is strictly for userspace, if some page size is shutdown due
> >> to kernel command line option or CPU bug workaround, than is must not
> >> be reported in aux vector!
> >
> > Given Florian in CC, I assume this is something glibc would like to be
> > using? Please mention this in the commit log.
>
> Nope, I just wrote a random drive-by comment on the first version.

Ah, okay. Then Alexey, who do you expect to be the consumer of this new
AT value?
On Wed, Dec 06, 2023 at 12:47:27PM -0800, Kees Cook wrote:
> On Tue, Dec 05, 2023 at 07:01:34PM +0300, Alexey Dobriyan wrote:
> > Report available page shifts in arch independent manner, so that
> > userspace developers won't have to parse /proc/cpuinfo hunting
> > for arch specific strings:
> >
> > Note!
> >
> > This is strictly for userspace, if some page size is shutdown due
> > to kernel command line option or CPU bug workaround, than is must not
> > be reported in aux vector!
>
> Given Florian in CC, I assume this is something glibc would like to be
> using? Please mention this in the commit log.

glibc can use it. Main user is libhugetlbfs, I guess:

https://github.com/libhugetlbfs/libhugetlbfs/blob/master/hugeutils.c#L915

Loop inside getauxval() can run faster than opendir().

> > x86_64 machine with 1 GiB pages:
> >
> > 00000030 06 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00
> > 00000040 1d 00 00 00 00 00 00 00 00 10 20 40 00 00 00 00
> >
> > x86_64 machine with 2 MiB pages only:
> >
> > 00000030 06 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00
> > 00000040 1d 00 00 00 00 00 00 00 00 10 20 00 00 00 00 00
> >
> > AT_PAGESZ is always 4096 which is not that interesting.
>
> That's not always true. For example, see arm64:
> arch/arm64/include/asm/elf.h:#define ELF_EXEC_PAGESIZE PAGE_SIZE

Yes, I'm x86_64 guy, AT_PAGESZ remark is about x86_64.

> I'm not actually sure why x86 forces it to 4096. I'd need to go look
> through the history there.

> > --- a/arch/x86/include/asm/elf.h
> > +++ b/arch/x86/include/asm/elf.h
> > @@ -358,6 +358,18 @@ else if (IS_ENABLED(CONFIG_IA32_EMULATION)) \
> >
> >  #define COMPAT_ELF_ET_DYN_BASE (TASK_UNMAPPED_BASE + 0x1000000)
> >
> > +#define ARCH_AT_PAGE_SHIFT_MASK \
> > +	do { \
> > +		u32 val = 1 << 12; \
> > +		if (boot_cpu_has(X86_FEATURE_PSE)) { \
> > +			val |= 1 << 21; \
> > +		} \
> > +		if (boot_cpu_has(X86_FEATURE_GBPAGES)) { \
> > +			val |= 1 << 30; \
> > +		} \
> > +		NEW_AUX_ENT(AT_PAGE_SHIFT_MASK, val); \
> > +	} while (0)
> > +
> >  #endif /* !CONFIG_X86_32 */
>
> Can't we have a generic ARCH_AT_PAGE_SHIFT_MASK too? Something like:
>
> #ifndef ARCH_AT_PAGE_SHIFT_MASK
> #define ARCH_AT_PAGE_SHIFT_MASK
> NEW_AUX_ENT(AT_PAGE_SHIFT_MASK, 1 << PAGE_SHIFT)
> #endif
>
> Or am I misunderstanding something here?

1) Arch maintainers can opt into this new way to report information at
their own pace.

2) AT_PAGE_SHIFT_MASK is about _all_ pagesizes supported by CPU.
Reporting just one is missing the point.

I'll clarify comment: mmap() support require many things including
tests for hugetlbfs being mounted, this is about CPU support.

> > --- a/fs/binfmt_elf.c
> > +++ b/fs/binfmt_elf.c
> > @@ -240,6 +240,9 @@ create_elf_tables(struct linux_binprm *bprm, const struct elfhdr *exec,
> >  #endif
> >  	NEW_AUX_ENT(AT_HWCAP, ELF_HWCAP);
> >  	NEW_AUX_ENT(AT_PAGESZ, ELF_EXEC_PAGESIZE);
> > +#ifdef ARCH_AT_PAGE_SHIFT_MASK
> > +	ARCH_AT_PAGE_SHIFT_MASK;
> > +#endif
>
> That way we can avoid an #ifdef in the .c file.

That's a false economy. ifdefs aren't bad inherently. When all archs
implement AT_PAGE_SHIFT_MASK, ifdef will be removed.

> > --- a/include/uapi/linux/auxvec.h
> > +++ b/include/uapi/linux/auxvec.h
> > @@ -33,6 +33,20 @@
> >  #define AT_RSEQ_FEATURE_SIZE 27 /* rseq supported feature size */
> >  #define AT_RSEQ_ALIGN 28 /* rseq allocation alignment */
> >
> > +/*
> > + * Page sizes available for mmap(2) encoded as bitmask.
> > + *
> > + * Example: x86_64 system with pse, pdpe1gb /proc/cpuinfo flags reports
> > + * 4 KiB, 2 MiB and 1 GiB page support.
> > + *
> > + * $ hexdump -C /proc/self/auxv
>
> FWIW, a more readable form is: $ LD_SHOW_AUXV=1 /bin/true

OK. It doesn't show new values as text, but OK.

> > + * 00000030 06 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00
> > + * 00000040 1d 00 00 00 00 00 00 00 00 10 20 40 00 00 00 00
> > + *
> > + * For 2^64 hugepage support please contact your Universe sales representative.
> > + */
> > +#define AT_PAGE_SHIFT_MASK 29
>
> ... hmm, why is 29 unused?
>
> > +
> >  #define AT_EXECFN 31 /* filename of program */
> >
> >  #ifndef AT_MINSIGSTKSZ
>
> This will need a man page update for "getauxval" as well...

Hear, hear!
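
For comparison with the getauxval() lookup, a directory-scan approach of the kind the reply alludes to might look roughly like the sketch below. It is an illustration only, not libhugetlbfs code: the function names are hypothetical, and it assumes the usual sysfs layout in which /sys/kernel/mm/hugepages contains one `hugepages-<N>kB` directory per huge page size.

```c
#include <assert.h>
#include <dirent.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Parse a sysfs entry name of the form "hugepages-<N>kB" and return the
 * page shift (log2 of the size in bytes). Returns -1 if the name does not
 * match or the size is not a power of two.
 */
int hugepage_dirname_to_shift(const char *name)
{
	unsigned long long kb;
	int consumed = 0;
	int shift = 10;		/* sizes are in KiB, so start at 2^10 */

	assert(name != NULL);
	if (sscanf(name, "hugepages-%llukB%n", &kb, &consumed) != 1)
		return -1;
	/* %n is only stored if the trailing "kB" matched as well. */
	if (consumed == 0 || name[consumed] != '\0')
		return -1;
	if (kb == 0 || kb > (1ULL << 53) || (kb & (kb - 1)) != 0)
		return -1;
	while (kb > 1) {
		kb >>= 1;
		shift++;
	}
	return shift;
}

/*
 * Scan dir_path (normally "/sys/kernel/mm/hugepages") and return a bitmask
 * with bit N set for every 2^N-byte huge page size found there.
 * Returns 0 if the directory cannot be opened.
 */
uint64_t hugepage_mask_from_sysfs(const char *dir_path)
{
	uint64_t mask = 0;
	struct dirent *de;
	DIR *dir = opendir(dir_path);

	if (!dir)
		return 0;
	while ((de = readdir(dir)) != NULL) {
		int shift = hugepage_dirname_to_shift(de->d_name);

		if (shift >= 0)
			mask |= UINT64_C(1) << shift;
	}
	closedir(dir);
	return mask;
}
```

Note that such a scan reports what the running kernel's hugetlb subsystem exposes, without the base page size, so its result is related to but not the same thing as the CPU-level mask discussed in this thread.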
On Wed, Dec 06, 2023 at 01:09:01PM -0800, Kees Cook wrote:
> On Wed, Dec 06, 2023 at 10:05:36PM +0100, Florian Weimer wrote:
> > * Kees Cook:
> >
> > > On Tue, Dec 05, 2023 at 07:01:34PM +0300, Alexey Dobriyan wrote:
> > >> Report available page shifts in arch independent manner, so that
> > >> userspace developers won't have to parse /proc/cpuinfo hunting
> > >> for arch specific strings:
> > >>
> > >> Note!
> > >>
> > >> This is strictly for userspace, if some page size is shutdown due
> > >> to kernel command line option or CPU bug workaround, than is must not
> > >> be reported in aux vector!
> > >
> > > Given Florian in CC, I assume this is something glibc would like to be
> > > using? Please mention this in the commit log.
> >
> > Nope, I just wrote a random drive-by comment on the first version.
>
> Ah, okay. Then Alexey, who do you expect to be the consumer of this new
> AT value?

libhugetlbfs and everyone who is using 2 MiB pages.

New code should look like this:

	#ifndef AT_PAGE_SHIFT_MASK
	#define AT_PAGE_SHIFT_MASK 29
	#endif

	unsigned long val = getauxval(AT_PAGE_SHIFT_MASK);
	if (val) {
		g_page_size_2mib = val & (1UL << 21);
		return;
	}
	// old 2 MiB page detection code

It is few lines of fast code before code they're already using.
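
The fragment in the reply above can be fleshed out into something compilable. What follows is a sketch under assumptions: the value 29 is only what this patch proposes, so on a kernel without the patch the lookup returns 0 or, if that number has been assigned to something else, an unrelated value. `legacy_probe_fn` is a hypothetical stand-in for whatever detection code an application already has.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/auxv.h>

#ifndef AT_PAGE_SHIFT_MASK
#define AT_PAGE_SHIFT_MASK 29	/* proposed in this thread; an assumption */
#endif

/* Hypothetical hook for the application's existing detection code. */
typedef bool (*legacy_probe_fn)(void);

/* Does the mask advertise 2 MiB pages (bit 21)? */
bool mask_has_2mib(unsigned long mask)
{
	return (mask & (1UL << 21)) != 0;
}

/*
 * Prefer the aux vector entry when the kernel supplies it; getauxval()
 * returns 0 when the entry is missing, in which case fall back to the
 * legacy probe.
 */
bool have_2mib_pages(legacy_probe_fn legacy_probe)
{
	unsigned long val;

	assert(legacy_probe != NULL);
	val = getauxval(AT_PAGE_SHIFT_MASK);
	if (val)
		return mask_has_2mib(val);
	return legacy_probe();
}
```

Keeping the bit test in its own helper makes it checkable without a patched kernel: `mask_has_2mib(0x40201000UL)` is true, while a mask with only bit 12 set gives false.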
* Alexey Dobriyan:

> On Wed, Dec 06, 2023 at 12:47:27PM -0800, Kees Cook wrote:
>> On Tue, Dec 05, 2023 at 07:01:34PM +0300, Alexey Dobriyan wrote:
>> > Report available page shifts in arch independent manner, so that
>> > userspace developers won't have to parse /proc/cpuinfo hunting
>> > for arch specific strings:
>> >
>> > Note!
>> >
>> > This is strictly for userspace, if some page size is shutdown due
>> > to kernel command line option or CPU bug workaround, than is must not
>> > be reported in aux vector!
>>
>> Given Florian in CC, I assume this is something glibc would like to be
>> using? Please mention this in the commit log.
>
> glibc can use it. Main user is libhugetlbfs, I guess:
>
> https://github.com/libhugetlbfs/libhugetlbfs/blob/master/hugeutils.c#L915
>
> Loop inside getauxval() can run faster than opendir().

Is libhugetlbfs still maintained? Last commit was three years ago?

Thanks,
Florian
On Thu, Dec 07, 2023 at 05:57:05PM +0300, Alexey Dobriyan wrote:
> On Wed, Dec 06, 2023 at 12:47:27PM -0800, Kees Cook wrote:
> > Can't we have a generic ARCH_AT_PAGE_SHIFT_MASK too? Something like:
> >
> > #ifndef ARCH_AT_PAGE_SHIFT_MASK
> > #define ARCH_AT_PAGE_SHIFT_MASK
> > NEW_AUX_ENT(AT_PAGE_SHIFT_MASK, 1 << PAGE_SHIFT)
> > #endif
> >
> > Or am I misunderstanding something here?
>
> 1) Arch maintainers can opt into this new way to report information at
> their own pace.
>
> 2) AT_PAGE_SHIFT_MASK is about _all_ pagesizes supported by CPU.
> Reporting just one is missing the point.
>
> I'll clarify comment: mmap() support require many things including
> tests for hugetlbfs being mounted, this is about CPU support.

I significantly prefer APIs not being arch-specific, so I'd prefer we
always include AT_PAGE_SHIFT_MASK. For an architecture that doesn't
define its own ARCH_AT_PAGE_SHIFT_MASK, it's not _inaccurate_ to report
1 << PAGE_SHIFT, but it might be incomplete.
* Kees Cook:

> I significantly prefer APIs not being arch-specific, so I'd prefer we
> always include AT_PAGE_SHIFT_MASK. For an architecture that doesn't
> define its own ARCH_AT_PAGE_SHIFT_MASK, it's not _inaccurate_ to report
> 1 << PAGE_SHIFT, but it might be incomplete.

The downside is that as an application programmer, I have to go and
chase for the information the legacy way if I encounter
getauxval(AT_PAGE_SHIFT_MASK) == getpagesize() for a longer time
because the interface does not signal the absence of any extended
page sizes.

Thanks,
Florian
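
The ambiguity described above can be made concrete with a small classifier. This is only a sketch with hypothetical names; it takes the mask and the base page size as plain arguments so that it does not depend on a patched kernel.

```c
#include <assert.h>
#include <stdint.h>

enum mask_verdict {
	MASK_ABSENT,	/* entry missing: nothing was reported at all */
	MASK_BASE_ONLY,	/* only the base page size bit is set */
	MASK_EXTENDED,	/* at least one additional page size is reported */
};

/*
 * Classify an AT_PAGE_SHIFT_MASK value against the base page size.
 * A mask with only bit PAGE_SHIFT set is numerically equal to the page
 * size, which is the comparison used in the message above.
 */
enum mask_verdict classify_page_shift_mask(uint64_t mask, uint64_t page_size)
{
	/* page_size must be a non-zero power of two */
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	if (mask == 0)
		return MASK_ABSENT;
	if (mask == page_size)
		return MASK_BASE_ONLY;
	return MASK_EXTENDED;
}
```

With an always-present generic fallback, `MASK_BASE_ONLY` cannot distinguish "this CPU has no larger pages" from "this architecture has not filled in the mask yet", so a careful caller would still run its legacy detection. With the opt-in scheme, `MASK_ABSENT` is the signal to fall back and `MASK_BASE_ONLY` could be trusted.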
On Fri, Dec 08, 2023 at 07:35:25PM +0100, Florian Weimer wrote:
> * Kees Cook:
>
> > I significantly prefer APIs not being arch-specific, so I'd prefer we
> > always include AT_PAGE_SHIFT_MASK. For an architecture that doesn't
> > define its own ARCH_AT_PAGE_SHIFT_MASK, it's not _inaccurate_ to report
> > 1 << PAGE_SHIFT, but it might be incomplete.
>
> The downside is that as an application programmer, I have to go and
> chase for the information the legacy way if I encounter
> getauxval(AT_PAGE_SHIFT_MASK) == getpagesize() for a longer time
> because the interface does not signal the absence of any extended
> page sizes.

Are there architectures besides x86 where AT_PAGE_SHIFT_MASK isn't a
single bit? If so, let's get them added now along with x86.
On Fri, Dec 08, 2023 at 10:29:25AM -0800, Kees Cook wrote:
> On Thu, Dec 07, 2023 at 05:57:05PM +0300, Alexey Dobriyan wrote:
> > On Wed, Dec 06, 2023 at 12:47:27PM -0800, Kees Cook wrote:
> > > Can't we have a generic ARCH_AT_PAGE_SHIFT_MASK too? Something like:
> > >
> > > #ifndef ARCH_AT_PAGE_SHIFT_MASK
> > > #define ARCH_AT_PAGE_SHIFT_MASK
> > > NEW_AUX_ENT(AT_PAGE_SHIFT_MASK, 1 << PAGE_SHIFT)
> > > #endif
> > >
> > > Or am I misunderstanding something here?
> >
> > 1) Arch maintainers can opt into this new way to report information at
> > their own pace.
> >
> > 2) AT_PAGE_SHIFT_MASK is about _all_ pagesizes supported by CPU.
> > Reporting just one is missing the point.
> >
> > I'll clarify comment: mmap() support require many things including
> > tests for hugetlbfs being mounted, this is about CPU support.
>
> I significantly prefer APIs not being arch-specific,

It will become arch-independent once all relevant archs opt-in.
I doubt anyone is writing new software for sparc or alpha.

> so I'd prefer we
> always include AT_PAGE_SHIFT_MASK. For an architecture that doesn't
> define its own ARCH_AT_PAGE_SHIFT_MASK, it's not _inaccurate_ to report
> 1 << PAGE_SHIFT, but it might be incomplete.

It is inaccurate if ARCH_AT_PAGE_SHIFT_MASK is defined as "_all_ page
shift CPU supports".

Inaccurate version is called AT_PAGESZ which lists just 1 page size,
there is no need for 2 inaccurate APIs.
--- a/arch/x86/include/asm/elf.h
+++ b/arch/x86/include/asm/elf.h
@@ -358,6 +358,18 @@ else if (IS_ENABLED(CONFIG_IA32_EMULATION)) \
 
 #define COMPAT_ELF_ET_DYN_BASE (TASK_UNMAPPED_BASE + 0x1000000)
 
+#define ARCH_AT_PAGE_SHIFT_MASK \
+	do { \
+		u32 val = 1 << 12; \
+		if (boot_cpu_has(X86_FEATURE_PSE)) { \
+			val |= 1 << 21; \
+		} \
+		if (boot_cpu_has(X86_FEATURE_GBPAGES)) { \
+			val |= 1 << 30; \
+		} \
+		NEW_AUX_ENT(AT_PAGE_SHIFT_MASK, val); \
+	} while (0)
+
 #endif /* !CONFIG_X86_32 */
 
 #define VDSO_CURRENT_BASE ((unsigned long)current->mm->context.vdso)
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -240,6 +240,9 @@ create_elf_tables(struct linux_binprm *bprm, const struct elfhdr *exec,
 #endif
 	NEW_AUX_ENT(AT_HWCAP, ELF_HWCAP);
 	NEW_AUX_ENT(AT_PAGESZ, ELF_EXEC_PAGESIZE);
+#ifdef ARCH_AT_PAGE_SHIFT_MASK
+	ARCH_AT_PAGE_SHIFT_MASK;
+#endif
 	NEW_AUX_ENT(AT_CLKTCK, CLOCKS_PER_SEC);
 	NEW_AUX_ENT(AT_PHDR, phdr_addr);
 	NEW_AUX_ENT(AT_PHENT, sizeof(struct elf_phdr));
--- a/include/uapi/linux/auxvec.h
+++ b/include/uapi/linux/auxvec.h
@@ -33,6 +33,20 @@
 #define AT_RSEQ_FEATURE_SIZE 27 /* rseq supported feature size */
 #define AT_RSEQ_ALIGN 28 /* rseq allocation alignment */
 
+/*
+ * Page sizes available for mmap(2) encoded as bitmask.
+ *
+ * Example: x86_64 system with pse, pdpe1gb /proc/cpuinfo flags reports
+ * 4 KiB, 2 MiB and 1 GiB page support.
+ *
+ * $ hexdump -C /proc/self/auxv
+ * 00000030 06 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00
+ * 00000040 1d 00 00 00 00 00 00 00 00 10 20 40 00 00 00 00
+ *
+ * For 2^64 hugepage support please contact your Universe sales representative.
+ */
+#define AT_PAGE_SHIFT_MASK 29
+
 #define AT_EXECFN 31 /* filename of program */
 
 #ifndef AT_MINSIGSTKSZ
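
As a programmatic counterpart to the `hexdump -C /proc/self/auxv` example in the header comment, the sketch below reads the same file and searches it for one entry type. The struct and function names are hypothetical; the layout assumed is the native one, a sequence of (type, value) pairs of `unsigned long` terminated by a type of 0 (AT_NULL).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One aux vector entry as it appears in /proc/self/auxv (native layout). */
struct auxv_pair {
	unsigned long type;
	unsigned long value;
};

/*
 * Linear search over n entries, stopping early at AT_NULL (type 0).
 * Returns 1 and stores the value on a hit, 0 otherwise.
 */
int auxv_lookup(const struct auxv_pair *vec, size_t n,
		unsigned long type, unsigned long *value)
{
	assert(vec != NULL && value != NULL);
	for (size_t i = 0; i < n && vec[i].type != 0; i++) {
		if (vec[i].type == type) {
			*value = vec[i].value;
			return 1;
		}
	}
	return 0;
}

/* Read /proc/self/auxv and look up one entry; returns 1 if found. */
int auxv_lookup_proc(unsigned long type, unsigned long *value)
{
	struct auxv_pair vec[128];
	size_t n;
	FILE *f = fopen("/proc/self/auxv", "rb");

	if (!f)
		return 0;
	n = fread(vec, sizeof(vec[0]), 128, f);
	fclose(f);
	return auxv_lookup(vec, n, type, value);
}
```

On a kernel carrying this patch, `auxv_lookup_proc(29, &v)` would be expected to return the same mask that `getauxval()` does; in ordinary code `getauxval()` is the simpler choice, and the file-based route is mainly useful for inspection.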