Message ID | 20230705190002.384799-2-charlie@rivosinc.com |
---|---|
State | New |
Headers | From: Charlie Jenkins <charlie@rivosinc.com> To: linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org Cc: charlie@rivosinc.com, conor@kernel.org, paul.walmsley@sifive.com, palmer@rivosinc.com, aou@eecs.berkeley.edu, anup@brainfault.org, konstantin@linuxfoundation.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mm@kvack.org, mick@ics.forth.gr, jrtc27@jrtc27.com Subject: [RESEND PATCH v3 1/2] RISC-V: mm: Restrict address space for sv39,sv48,sv57 Date: Wed, 5 Jul 2023 11:59:41 -0700 Message-ID: <20230705190002.384799-2-charlie@rivosinc.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230705190002.384799-1-charlie@rivosinc.com> References: <20230705190002.384799-1-charlie@rivosinc.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org |
Series | RISC-V: mm: Make SV48 the default address space |
Commit Message
Charlie Jenkins
July 5, 2023, 6:59 p.m. UTC
Make sv48 the default address space for mmap as some applications
currently depend on this assumption. The RISC-V specification enforces
that bits outside of the virtual address range are not used, so
restricting the size of the default address space as such should be
temporary. A hint address passed to mmap will cause the largest address
space that fits entirely into the hint to be used. If the hint is less
than or equal to 1<<38, an sv39 address will be used. An exception is
that if the hint address is 0, then a sv48 address will be used. After
an address space is completely full, the next smallest address space
will be used.
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
---
arch/riscv/include/asm/elf.h | 2 +-
arch/riscv/include/asm/pgtable.h | 13 +++++++++++-
arch/riscv/include/asm/processor.h | 34 ++++++++++++++++++++++++------
3 files changed, 40 insertions(+), 9 deletions(-)
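The hint rule described in the commit message can be modelled in user space. The sketch below is an illustration under stated assumptions, not kernel code: `mmap_va_bits_for_hint` is a hypothetical helper, `hw_va_bits` stands in for the kernel's `VA_BITS`, and the thresholds and branch order follow the `VA_USER_SV*` constants and `arch_get_mmap_base()` as posted in this patch.

```c
#include <stdint.h>

/* User-space halves of the address spaces, mirroring VA_USER_SV* in the patch. */
#define VA_USER_SV48 (UINT64_C(1) << 47)
#define VA_USER_SV57 (UINT64_C(1) << 56)

/*
 * Hypothetical helper (not part of the patch): returns the number of
 * virtual-address bits (39, 48 or 57) that an mmap hint would select.
 * hw_va_bits models VA_BITS, i.e. what the running hardware supports.
 */
static int mmap_va_bits_for_hint(uint64_t hint, int hw_va_bits)
{
	if (hint >= VA_USER_SV57 && hw_va_bits >= 57)
		return 57;
	if (hint >= VA_USER_SV48 && hw_va_bits >= 48)
		return 48;
	if (hint == 0)	/* no hint: default window, capped at sv48 */
		return hw_va_bits >= 48 ? 48 : hw_va_bits;
	return 39;
}
```

Under this model, on sv57 hardware a hint of 0 or `1 << 50` selects sv48, and only a hint of `1 << 56` or above selects sv57.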
Comments
Hi Charlie,

On 05/07/2023 20:59, Charlie Jenkins wrote:
> Make sv48 the default address space for mmap as some applications
> currently depend on this assumption. The RISC-V specification enforces
> that bits outside of the virtual address range are not used, so
> restricting the size of the default address space as such should be
> temporary.

What do you mean in the last sentence above?

> A hint address passed to mmap will cause the largest address
> space that fits entirely into the hint to be used. If the hint is less
> than or equal to 1<<38, an sv39 address will be used. An exception is
> that if the hint address is 0, then a sv48 address will be used. After
> an address space is completely full, the next smallest address space
> will be used.
>
> Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> ---
>  arch/riscv/include/asm/elf.h | 2 +-
>  arch/riscv/include/asm/pgtable.h | 13 +++++++++++-
>  arch/riscv/include/asm/processor.h | 34 ++++++++++++++++++++++++------
>  3 files changed, 40 insertions(+), 9 deletions(-)
>
> diff --git a/arch/riscv/include/asm/elf.h b/arch/riscv/include/asm/elf.h
> index 30e7d2455960..1b57f13a1afd 100644
> --- a/arch/riscv/include/asm/elf.h
> +++ b/arch/riscv/include/asm/elf.h
> @@ -49,7 +49,7 @@ extern bool compat_elf_check_arch(Elf32_Ehdr *hdr);
>   * the loader. We need to make sure that it is out of the way of the program
>   * that it will "exec", and that there is sufficient room for the brk.
>   */
> -#define ELF_ET_DYN_BASE ((TASK_SIZE / 3) * 2)
> +#define ELF_ET_DYN_BASE ((DEFAULT_MAP_WINDOW / 3) * 2)
>
>  #ifdef CONFIG_64BIT
>  #ifdef CONFIG_COMPAT
> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> index 75970ee2bda2..752e210c7547 100644
> --- a/arch/riscv/include/asm/pgtable.h
> +++ b/arch/riscv/include/asm/pgtable.h
> @@ -57,18 +57,29 @@
>  #define MODULES_END (PFN_ALIGN((unsigned long)&_start))
>  #endif
>
> +
>  /*
>   * Roughly size the vmemmap space to be large enough to fit enough
>   * struct pages to map half the virtual address space. Then
>   * position vmemmap directly below the VMALLOC region.
>   */
>  #ifdef CONFIG_64BIT
> +#define VA_BITS_SV39 39
> +#define VA_BITS_SV48 48
> +#define VA_BITS_SV57 57
> +
> +#define VA_USER_SV39 (UL(1) << (VA_BITS_SV39 - 1))
> +#define VA_USER_SV48 (UL(1) << (VA_BITS_SV48 - 1))
> +#define VA_USER_SV57 (UL(1) << (VA_BITS_SV57 - 1))
> +
>  #define VA_BITS (pgtable_l5_enabled ? \
> -	57 : (pgtable_l4_enabled ? 48 : 39))
> +	VA_BITS_SV57 : (pgtable_l4_enabled ? VA_BITS_SV48 : VA_BITS_SV39))
>  #else
>  #define VA_BITS 32
>  #endif
>
> +#define DEFAULT_VA_BITS ((VA_BITS >= VA_BITS_SV48) ? VA_BITS_SV48 : VA_BITS)

Maybe rename DEFAULT_VA_BITS into MMAP_VA_BITS? Or something similar?

> +
>  #define VMEMMAP_SHIFT \
>  	(VA_BITS - PAGE_SHIFT - 1 + STRUCT_PAGE_MAX_SHIFT)
>  #define VMEMMAP_SIZE BIT(VMEMMAP_SHIFT)
> diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
> index 94a0590c6971..468a1f4b9da4 100644
> --- a/arch/riscv/include/asm/processor.h
> +++ b/arch/riscv/include/asm/processor.h
> @@ -12,20 +12,40 @@
>
>  #include <asm/ptrace.h>
>
> -/*
> - * This decides where the kernel will search for a free chunk of vm
> - * space during mmap's.
> - */
> -#define TASK_UNMAPPED_BASE PAGE_ALIGN(TASK_SIZE / 3)
> -
> -#define STACK_TOP TASK_SIZE
>  #ifdef CONFIG_64BIT
> +#define DEFAULT_MAP_WINDOW (UL(1) << (DEFAULT_VA_BITS - 1))
>  #define STACK_TOP_MAX TASK_SIZE_64
> +
> +#define arch_get_mmap_end(addr, len, flags) \
> +	((addr) >= VA_USER_SV57 ? STACK_TOP_MAX : \
> +	 ((((addr) >= VA_USER_SV48)) && (VA_BITS >= VA_BITS_SV48)) ? \
> +	 VA_USER_SV48 : \
> +	 VA_USER_SV39)
> +
> +#define arch_get_mmap_base(addr, base) \
> +	(((addr >= VA_USER_SV57) && (VA_BITS >= VA_BITS_SV57)) ? \

So IIUC, a user must pass a hint larger than the max address of the mode
the user wants right? Shouldn't the user rather pass an address that is
larger than the previous mode? I mean if the user wants a 56-bit address,
he should just pass an address above 1<<47 no?

> +	 VA_USER_SV57 - (DEFAULT_MAP_WINDOW - base) : \
> +	 ((((addr) >= VA_USER_SV48)) && (VA_BITS >= VA_BITS_SV48)) ? \
> +	 VA_USER_SV48 - (DEFAULT_MAP_WINDOW - base) : \
> +	 (addr == 0) ? \
> +	 base : \
> +	 VA_USER_SV39 - (DEFAULT_MAP_WINDOW - base))
> +

Can you turn that into a function or use if/else statement? It's very
hard to understand what happens there.

And riscv selects ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT which means the
base is at the top of the address space (minus the stack IIRC). But if
rlimit_stack is set to infinity (see mmap_base()
https://elixir.bootlin.com/linux/latest/source/mm/util.c#L412), mmap_base
is equal to TASK_UNMAPPED_BASE. Does that work in that case? It seems like
this: VA_USER_SV39 - (DEFAULT_MAP_WINDOW - base)) would be negative right?

You should also add a rlimit test.

>  #else
> +#define DEFAULT_MAP_WINDOW TASK_SIZE
>  #define STACK_TOP_MAX TASK_SIZE
>  #endif
>  #define STACK_ALIGN 16
>
> +
> +#define STACK_TOP DEFAULT_MAP_WINDOW
> +
> +/*
> + * This decides where the kernel will search for a free chunk of vm
> + * space during mmap's.
> + */
> +#define TASK_UNMAPPED_BASE PAGE_ALIGN(DEFAULT_MAP_WINDOW / 3)
> +
>  #ifndef __ASSEMBLY__
>
>  struct task_struct;
On Thu, Jul 06, 2023 at 11:11:37AM +0200, Alexandre Ghiti wrote:
> Hi Charlie,
>
>
> On 05/07/2023 20:59, Charlie Jenkins wrote:
> > Make sv48 the default address space for mmap as some applications
> > currently depend on this assumption. The RISC-V specification enforces
> > that bits outside of the virtual address range are not used, so
> > restricting the size of the default address space as such should be
> > temporary.
>
>
> What do you mean in the last sentence above?
>
Applications like Java and Go broke when sv57 was implemented because
they shove bits into the upper space of pointers. However riscv enforces
that all of the upper bits in the virtual address are equal to the most
significant bit. "Temporary" may not have been the best word, but this change
would be irrelevant if application developers were following this rule, if I
am understanding this requirement correctly. What this means to me is
that riscv hardware is not guaranteed to not discard the bits in the virtual
address that are not used in paging.
>
> > A hint address passed to mmap will cause the largest address
> > space that fits entirely into the hint to be used. If the hint is less
> > than or equal to 1<<38, an sv39 address will be used. An exception is
> > that if the hint address is 0, then a sv48 address will be used. After
> > an address space is completely full, the next smallest address space
> > will be used.
> >
> > Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> > ---
> >  arch/riscv/include/asm/elf.h | 2 +-
> >  arch/riscv/include/asm/pgtable.h | 13 +++++++++++-
> >  arch/riscv/include/asm/processor.h | 34 ++++++++++++++++++++++++------
> >  3 files changed, 40 insertions(+), 9 deletions(-)
> >
> > diff --git a/arch/riscv/include/asm/elf.h b/arch/riscv/include/asm/elf.h
> > index 30e7d2455960..1b57f13a1afd 100644
> > --- a/arch/riscv/include/asm/elf.h
> > +++ b/arch/riscv/include/asm/elf.h
> > @@ -49,7 +49,7 @@ extern bool compat_elf_check_arch(Elf32_Ehdr *hdr);
> >   * the loader. We need to make sure that it is out of the way of the program
> >   * that it will "exec", and that there is sufficient room for the brk.
> >   */
> > -#define ELF_ET_DYN_BASE ((TASK_SIZE / 3) * 2)
> > +#define ELF_ET_DYN_BASE ((DEFAULT_MAP_WINDOW / 3) * 2)
> >  #ifdef CONFIG_64BIT
> >  #ifdef CONFIG_COMPAT
> > diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> > index 75970ee2bda2..752e210c7547 100644
> > --- a/arch/riscv/include/asm/pgtable.h
> > +++ b/arch/riscv/include/asm/pgtable.h
> > @@ -57,18 +57,29 @@
> >  #define MODULES_END (PFN_ALIGN((unsigned long)&_start))
> >  #endif
> > +
> >  /*
> >   * Roughly size the vmemmap space to be large enough to fit enough
> >   * struct pages to map half the virtual address space. Then
> >   * position vmemmap directly below the VMALLOC region.
> >   */
> >  #ifdef CONFIG_64BIT
> > +#define VA_BITS_SV39 39
> > +#define VA_BITS_SV48 48
> > +#define VA_BITS_SV57 57
> > +
> > +#define VA_USER_SV39 (UL(1) << (VA_BITS_SV39 - 1))
> > +#define VA_USER_SV48 (UL(1) << (VA_BITS_SV48 - 1))
> > +#define VA_USER_SV57 (UL(1) << (VA_BITS_SV57 - 1))
> > +
> >  #define VA_BITS (pgtable_l5_enabled ? \
> > -	57 : (pgtable_l4_enabled ? 48 : 39))
> > +	VA_BITS_SV57 : (pgtable_l4_enabled ? VA_BITS_SV48 : VA_BITS_SV39))
> >  #else
> >  #define VA_BITS 32
> >  #endif
> > +#define DEFAULT_VA_BITS ((VA_BITS >= VA_BITS_SV48) ? VA_BITS_SV48 : VA_BITS)
>
>
> Maybe rename DEFAULT_VA_BITS into MMAP_VA_BITS? Or something similar?
>
>
> > +
> >  #define VMEMMAP_SHIFT \
> >  	(VA_BITS - PAGE_SHIFT - 1 + STRUCT_PAGE_MAX_SHIFT)
> >  #define VMEMMAP_SIZE BIT(VMEMMAP_SHIFT)
> > diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
> > index 94a0590c6971..468a1f4b9da4 100644
> > --- a/arch/riscv/include/asm/processor.h
> > +++ b/arch/riscv/include/asm/processor.h
> > @@ -12,20 +12,40 @@
> >  #include <asm/ptrace.h>
> > -/*
> > - * This decides where the kernel will search for a free chunk of vm
> > - * space during mmap's.
> > - */
> > -#define TASK_UNMAPPED_BASE PAGE_ALIGN(TASK_SIZE / 3)
> > -
> > -#define STACK_TOP TASK_SIZE
> >  #ifdef CONFIG_64BIT
> > +#define DEFAULT_MAP_WINDOW (UL(1) << (DEFAULT_VA_BITS - 1))
> >  #define STACK_TOP_MAX TASK_SIZE_64
> > +
> > +#define arch_get_mmap_end(addr, len, flags) \
> > +	((addr) >= VA_USER_SV57 ? STACK_TOP_MAX : \
> > +	 ((((addr) >= VA_USER_SV48)) && (VA_BITS >= VA_BITS_SV48)) ? \
> > +	 VA_USER_SV48 : \
> > +	 VA_USER_SV39)
> > +
> > +#define arch_get_mmap_base(addr, base) \
> > +	(((addr >= VA_USER_SV57) && (VA_BITS >= VA_BITS_SV57)) ? \
>
>
> So IIUC, a user must pass a hint larger than the max address of the mode the
> user wants right? Shouldn't the user rather pass an address that is larger
> than the previous mode? I mean if the user wants a 56-bit address, he should
> just pass an address above 1<<47 no?
>
The rationale is that the hint address provided to mmap should signify
all of the bits that the user is okay with being used for paging.
Meaning that if they pass in 1<<50, they are okay with the first 51 bits
being used in paging. The largest address space that fits within 51 bits
is sv48, so that will be used. To use sv57, 1<<56 or larger will need to
be used.
>
> > +	 VA_USER_SV57 - (DEFAULT_MAP_WINDOW - base) : \
> > +	 ((((addr) >= VA_USER_SV48)) && (VA_BITS >= VA_BITS_SV48)) ? \
> > +	 VA_USER_SV48 - (DEFAULT_MAP_WINDOW - base) : \
> > +	 (addr == 0) ? \
> > +	 base : \
> > +	 VA_USER_SV39 - (DEFAULT_MAP_WINDOW - base))
> > +
>
>
> Can you turn that into a function or use if/else statement? It's very hard
> to understand what happens there.
>
Yes, I can use statement expressions.
> And riscv selects ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT which means the base
> is at the top of the address space (minus the stack IIRC). But if
> rlimit_stack is set to infinity (see mmap_base()
> https://elixir.bootlin.com/linux/latest/source/mm/util.c#L412), mmap_base is
> equal to TASK_UNMAPPED_BASE. Does that work in that case? It seems like
> this: VA_USER_SV39 - (DEFAULT_MAP_WINDOW - base)) would be negative right?
>
> You should also add a rlimit test.
>
That is a good point. I think a better alternative will be to do
base + (VA_USER_SV39 - DEFAULT_MAP_WINDOW). This will also work with the
other address spaces by swapping out the 39 with 48 and 57.
>
> >  #else
> > +#define DEFAULT_MAP_WINDOW TASK_SIZE
> >  #define STACK_TOP_MAX TASK_SIZE
> >  #endif
> >  #define STACK_ALIGN 16
> > +
> > +#define STACK_TOP DEFAULT_MAP_WINDOW
> > +
> > +/*
> > + * This decides where the kernel will search for a free chunk of vm
> > + * space during mmap's.
> > + */
> > +#define TASK_UNMAPPED_BASE PAGE_ALIGN(DEFAULT_MAP_WINDOW / 3)
> > +
> >  #ifndef __ASSEMBLY__
> >  struct task_struct;
On 7 Jul 2023, at 00:56, Charlie Jenkins <charlie@rivosinc.com> wrote:
>
> On Thu, Jul 06, 2023 at 11:11:37AM +0200, Alexandre Ghiti wrote:
>> Hi Charlie,
>>
>>
>> On 05/07/2023 20:59, Charlie Jenkins wrote:
>>> Make sv48 the default address space for mmap as some applications
>>> currently depend on this assumption. The RISC-V specification enforces
>>> that bits outside of the virtual address range are not used, so
>>> restricting the size of the default address space as such should be
>>> temporary.
>>
>>
>> What do you mean in the last sentence above?
>>
> Applications like Java and Go broke when sv57 was implemented because
> they shove bits into the upper space of pointers. However riscv enforces
> that all of the upper bits in the virtual address are equal to the most
> significant bit. "Temporary" may not have been the best word, but this change
> would be irrelevant if application developers were following this rule, if I
> am understanding this requirement correctly. What this means to me is
> that riscv hardware is not guaranteed to not discard the bits in the virtual
> address that are not used in paging.

RISC-V guarantees that it will not discard the bits*. Java and Go aren’t
actually dereferencing the pointers with their own metadata in the top
bits (doing so would require a pointer masking extension, like how Arm
has TBI), they’re just temporarily storing it there, assuming they’re
not significant bits, then masking out and re-canonicalising the address
prior to dereferencing. Which breaks, not because the hardware is looking
at the higher bits (otherwise you could never use Sv57 for such
applications even if you kept your addresses < 2^47), but because the
chosen addresses have those high bits as significant.

* A page fault is guaranteed if the address isn’t sign-extended

Jess
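The failure mode described in this reply can be sketched in user space. The example below is a minimal illustration under assumptions: the 16-bit tag in bits 63:48 and the helper names `tag_pointer` and `untag_pointer` are hypothetical and do not reflect how Java or Go actually lay out their metadata. It only shows that a "store a tag, then mask and re-canonicalise" scheme round-trips addresses below 2^47 but loses the significant high bits of an address handed out from a larger address space.

```c
#include <stdint.h>

#define ADDR_BITS 48	/* assumed: runtime treats only the low 48 bits as address */
#define ADDR_MASK ((UINT64_C(1) << ADDR_BITS) - 1)

/* Hypothetical: stash a 16-bit tag in the bits assumed to be insignificant. */
static uint64_t tag_pointer(uint64_t addr, uint16_t tag)
{
	return (addr & ADDR_MASK) | ((uint64_t)tag << ADDR_BITS);
}

/* Hypothetical: drop the tag and sign-extend from bit 47 before dereferencing. */
static uint64_t untag_pointer(uint64_t tagged)
{
	uint64_t addr = tagged & ADDR_MASK;

	if (addr & (UINT64_C(1) << (ADDR_BITS - 1)))
		addr |= ~ADDR_MASK;	/* re-canonicalise: copy bit 47 upwards */
	return addr;
}
```

With these helpers, an address such as `0x00007fffdeadb000` survives a tag and untag round trip, while `1 << 50` (a valid sv57 user address) comes back as 0, because bit 50 was overwritten by the tag.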
diff --git a/arch/riscv/include/asm/elf.h b/arch/riscv/include/asm/elf.h
index 30e7d2455960..1b57f13a1afd 100644
--- a/arch/riscv/include/asm/elf.h
+++ b/arch/riscv/include/asm/elf.h
@@ -49,7 +49,7 @@ extern bool compat_elf_check_arch(Elf32_Ehdr *hdr);
  * the loader. We need to make sure that it is out of the way of the program
  * that it will "exec", and that there is sufficient room for the brk.
  */
-#define ELF_ET_DYN_BASE ((TASK_SIZE / 3) * 2)
+#define ELF_ET_DYN_BASE ((DEFAULT_MAP_WINDOW / 3) * 2)
 
 #ifdef CONFIG_64BIT
 #ifdef CONFIG_COMPAT
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 75970ee2bda2..752e210c7547 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -57,18 +57,29 @@
 #define MODULES_END (PFN_ALIGN((unsigned long)&_start))
 #endif
 
+
 /*
  * Roughly size the vmemmap space to be large enough to fit enough
  * struct pages to map half the virtual address space. Then
  * position vmemmap directly below the VMALLOC region.
  */
 #ifdef CONFIG_64BIT
+#define VA_BITS_SV39 39
+#define VA_BITS_SV48 48
+#define VA_BITS_SV57 57
+
+#define VA_USER_SV39 (UL(1) << (VA_BITS_SV39 - 1))
+#define VA_USER_SV48 (UL(1) << (VA_BITS_SV48 - 1))
+#define VA_USER_SV57 (UL(1) << (VA_BITS_SV57 - 1))
+
 #define VA_BITS (pgtable_l5_enabled ? \
-	57 : (pgtable_l4_enabled ? 48 : 39))
+	VA_BITS_SV57 : (pgtable_l4_enabled ? VA_BITS_SV48 : VA_BITS_SV39))
 #else
 #define VA_BITS 32
 #endif
 
+#define DEFAULT_VA_BITS ((VA_BITS >= VA_BITS_SV48) ? VA_BITS_SV48 : VA_BITS)
+
 #define VMEMMAP_SHIFT \
 	(VA_BITS - PAGE_SHIFT - 1 + STRUCT_PAGE_MAX_SHIFT)
 #define VMEMMAP_SIZE BIT(VMEMMAP_SHIFT)
diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
index 94a0590c6971..468a1f4b9da4 100644
--- a/arch/riscv/include/asm/processor.h
+++ b/arch/riscv/include/asm/processor.h
@@ -12,20 +12,40 @@
 
 #include <asm/ptrace.h>
 
-/*
- * This decides where the kernel will search for a free chunk of vm
- * space during mmap's.
- */
-#define TASK_UNMAPPED_BASE PAGE_ALIGN(TASK_SIZE / 3)
-
-#define STACK_TOP TASK_SIZE
 #ifdef CONFIG_64BIT
+#define DEFAULT_MAP_WINDOW (UL(1) << (DEFAULT_VA_BITS - 1))
 #define STACK_TOP_MAX TASK_SIZE_64
+
+#define arch_get_mmap_end(addr, len, flags) \
+	((addr) >= VA_USER_SV57 ? STACK_TOP_MAX : \
+	 ((((addr) >= VA_USER_SV48)) && (VA_BITS >= VA_BITS_SV48)) ? \
+	 VA_USER_SV48 : \
+	 VA_USER_SV39)
+
+#define arch_get_mmap_base(addr, base) \
+	(((addr >= VA_USER_SV57) && (VA_BITS >= VA_BITS_SV57)) ? \
+	 VA_USER_SV57 - (DEFAULT_MAP_WINDOW - base) : \
+	 ((((addr) >= VA_USER_SV48)) && (VA_BITS >= VA_BITS_SV48)) ? \
+	 VA_USER_SV48 - (DEFAULT_MAP_WINDOW - base) : \
+	 (addr == 0) ? \
+	 base : \
+	 VA_USER_SV39 - (DEFAULT_MAP_WINDOW - base))
+
 #else
+#define DEFAULT_MAP_WINDOW TASK_SIZE
 #define STACK_TOP_MAX TASK_SIZE
 #endif
 #define STACK_ALIGN 16
 
+
+#define STACK_TOP DEFAULT_MAP_WINDOW
+
+/*
+ * This decides where the kernel will search for a free chunk of vm
+ * space during mmap's.
+ */
+#define TASK_UNMAPPED_BASE PAGE_ALIGN(DEFAULT_MAP_WINDOW / 3)
+
 #ifndef __ASSEMBLY__
 
 struct task_struct;
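The rlimit question raised in review (what happens to the sv39 branch of `arch_get_mmap_base()` when the base is `TASK_UNMAPPED_BASE`) can be checked with a small user-space model. This is a sketch under assumptions, not kernel code: it assumes sv48 or sv57 hardware so that `DEFAULT_MAP_WINDOW` is `1 << 47`, assumes 4 KiB pages, and uses hypothetical helper names. Because the operands are unsigned 64-bit values, the posted expression and the reordered `base + (VA_USER_SV39 - DEFAULT_MAP_WINDOW)` evaluate to the same value modulo 2^64, so the reordering on its own does not appear to avoid the wraparound for a legacy-layout base.

```c
#include <stdint.h>

#define VA_USER_SV39 (UINT64_C(1) << 38)
#define VA_USER_SV48 (UINT64_C(1) << 47)
/* Assumption: sv48 or sv57 hardware, so the default window is the sv48 user half. */
#define DEFAULT_MAP_WINDOW VA_USER_SV48
#define MODEL_PAGE_SIZE UINT64_C(4096)	/* assumed 4 KiB pages */

/* Legacy-layout base, modelled on TASK_UNMAPPED_BASE from the patch. */
static uint64_t legacy_mmap_base(void)
{
	uint64_t third = DEFAULT_MAP_WINDOW / 3;

	return (third + MODEL_PAGE_SIZE - 1) & ~(MODEL_PAGE_SIZE - 1);
}

/* sv39 branch of arch_get_mmap_base() as posted in this version. */
static uint64_t sv39_base_posted(uint64_t base)
{
	return VA_USER_SV39 - (DEFAULT_MAP_WINDOW - base);
}

/* The reordering proposed in the discussion. */
static uint64_t sv39_base_reordered(uint64_t base)
{
	return base + (VA_USER_SV39 - DEFAULT_MAP_WINDOW);
}
```

For a top-down base a fixed gap below `DEFAULT_MAP_WINDOW`, both helpers return an address the same gap below `VA_USER_SV39`. For the legacy base, both wrap around to a value far above `VA_USER_SV39`, which suggests that case needs explicit handling rather than a rearranged expression.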