Message ID | 1684292580-2455-1-git-send-email-yangtiezhu@loongson.cn |
---|---|
State | New |
Headers | From: Tiezhu Yang <yangtiezhu@loongson.cn> To: Huacai Chen <chenhuacai@kernel.org>, WANG Xuerui <kernel@xen0n.name>, Christian Brauner <brauner@kernel.org>, Andy Lutomirski <luto@kernel.org>, Thomas Gleixner <tglx@linutronix.de>, Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: loongarch@lists.linux.dev, linux-kernel@vger.kernel.org, loongson-kernel@lists.loongnix.cn Subject: [PATCH] LoongArch: Add support to clone a time namespace Date: Wed, 17 May 2023 11:03:00 +0800 Message-Id: <1684292580-2455-1-git-send-email-yangtiezhu@loongson.cn> |
Series | LoongArch: Add support to clone a time namespace |
Commit Message
Tiezhu Yang
May 17, 2023, 3:03 a.m. UTC
When executing the following command to test clone3 on LoongArch:

    # cd tools/testing/selftests/clone3 && make && ./clone3

the following error is reported:

    # [5719] Trying clone3() with flags 0x80 (size 0)
    # Invalid argument - Failed to create new process
    # [5719] clone3() with flags says: -22 expected 0
    not ok 18 [5719] Result (-22) is different than expected (0)

This happens because CONFIG_TIME_NS is not set: when the flag
CLONE_NEWTIME (0x80) is used to clone a time namespace,
copy_time_ns() returns -EINVAL.
Here is the related code in include/linux/time_namespace.h:

    #ifdef CONFIG_TIME_NS
    ...
    struct time_namespace *copy_time_ns(unsigned long flags,
                                        struct user_namespace *user_ns,
                                        struct time_namespace *old_ns);
    ...
    #else
    ...
    static inline
    struct time_namespace *copy_time_ns(unsigned long flags,
                                        struct user_namespace *user_ns,
                                        struct time_namespace *old_ns)
    {
            if (flags & CLONE_NEWTIME)
                    return ERR_PTR(-EINVAL);

            return old_ns;
    }
    ...
    #endif
Here is the complete call stack:

    clone3()
      kernel_clone()
        copy_process()
          copy_namespaces()
            create_new_namespaces()
              copy_time_ns()
                clone_time_ns()
Because CONFIG_TIME_NS depends on GENERIC_VDSO_TIME_NS, select
GENERIC_VDSO_TIME_NS so that CONFIG_TIME_NS can be enabled and the
real implementation of copy_time_ns() in kernel/time/namespace.c
is built.

Additionally, define the required arch-dependent functions, such as
__arch_get_timens_vdso_data(), arch_get_vdso_data() and
vdso_join_timens(). With these changes, the failing test passes.
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
---
This is based on 6.4-rc2
 arch/loongarch/Kconfig                         |  1 +
 arch/loongarch/include/asm/vdso/gettimeofday.h |  7 ++++++
 arch/loongarch/kernel/vdso.c                   | 32 ++++++++++++++++++++++++++
 3 files changed, 40 insertions(+)
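The -22 in the selftest output is -EINVAL, which the commit message traces to the !CONFIG_TIME_NS stub of copy_time_ns(). The following is a minimal user-space sketch of that decision only, not kernel code: model_copy_time_ns() and its time_ns_enabled parameter are hypothetical names introduced for illustration, and the enabled path simply reports success instead of creating a namespace.

```c
#include <errno.h>

/* CLONE_NEWTIME is 0x80, as shown in the selftest output. */
#define MODEL_CLONE_NEWTIME 0x00000080UL

/*
 * Hypothetical user-space model of the copy_time_ns() outcome.
 * time_ns_enabled stands in for CONFIG_TIME_NS; it is not a kernel API.
 * Returns 0 on success or a negative errno value.
 */
static long model_copy_time_ns(unsigned long flags, int time_ns_enabled)
{
	/* Stub path: CONFIG_TIME_NS is off and a new time namespace is requested. */
	if (!time_ns_enabled && (flags & MODEL_CLONE_NEWTIME))
		return -EINVAL;

	/* Otherwise the old namespace is reused or a new one is created. */
	return 0;
}
```

Calling model_copy_time_ns(0x80, 0) yields -EINVAL, matching the failing test case, while the same flags with time_ns_enabled set to 1 yield 0.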
Comments
Hi, Tiezhu,

The layout of vdso data (loongarch_vdso_data):

    struct vdso_pcpu_data pdata[NR_CPUS];
    struct vdso_data data[CS_BASES];

VDSO_DATA_SIZE is the page aligned size of loongarch_vdso_data, and in
memory, vdso code is above vdso data.

Then, get_vdso_base() returns the start of vdso code, and
get_vdso_data() returns the start of vdso data.

In your patch, __arch_get_timens_vdso_data() returns get_vdso_data() +
PAGE_SIZE, but you don't increase the size of loongarch_vdso_data. The
result is it returns an address in vdso code.

Now, do you know what the problem is? Or still insist that "I have tested"?

Huacai
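The address problem raised in this review can be sketched with plain integer arithmetic. This is a hedged user-space model, not the LoongArch implementation: it follows the review in treating the vdso data area as ending where vdso code starts and the offset as a byte offset; the function name, enum, and all addresses and sizes are hypothetical.

```c
#include <stdint.h>

/* Where a computed time-namespace data address ends up in this model. */
enum model_region {
	MODEL_IN_VDSO_DATA,	/* overlaps the existing vdso data area */
	MODEL_IN_VDSO_CODE,	/* at or above the start of vdso code */
};

/*
 * Hypothetical model: vdso data occupies [code_start - data_size, code_start),
 * and the timens address is data_start + page_size with no page reserved.
 */
static enum model_region model_timens_region(uintptr_t code_start,
					     uintptr_t data_size,
					     uintptr_t page_size)
{
	uintptr_t data_start = code_start - data_size;
	uintptr_t timens_addr = data_start + page_size;

	return timens_addr >= code_start ? MODEL_IN_VDSO_CODE : MODEL_IN_VDSO_DATA;
}
```

With a one-page data area the result is MODEL_IN_VDSO_CODE, which is the case described above; with a larger data area the address falls inside existing data instead. In neither case is a dedicated page available.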
On 05/18/2023 10:25 AM, Huacai Chen wrote:
> Hi, Tiezhu,
>
> The layout of vdso data (loongarch_vdso_data):
>
> struct vdso_pcpu_data pdata[NR_CPUS];
> struct vdso_data data[CS_BASES];
>
> VDSO_DATA_SIZE is the page aligned size of loongarch_vdso_data, and in
> memory, vdso code is above vdso data.
>
> Then, get_vdso_base() returns the start of vdso code, and
> get_vdso_data() returns the start of vdso data.
>
> In your patch, __arch_get_timens_vdso_data() returns get_vdso_data() +
> PAGE_SIZE, but you don't increase the size of loongarch_vdso_data. The
> result is it returns an address in vdso code.
>
> Now, do you know what the problem is? Or still insist that "I have tested"?

Please review the following changes on top of the current patch.
They modify the layout of vvar to add one page for timens_data,
map that page to the zero pfn before a time namespace is created,
and add the callback function vvar_fault().

$ git diff
diff --git a/arch/loongarch/include/asm/page.h b/arch/loongarch/include/asm/page.h
index fb5338b..26e8dcc 100644
--- a/arch/loongarch/include/asm/page.h
+++ b/arch/loongarch/include/asm/page.h
@@ -81,6 +81,7 @@ typedef struct { unsigned long pgprot; } pgprot_t;
 #define __va(x)		((void *)((unsigned long)(x) + PAGE_OFFSET - PHYS_OFFSET))

 #define pfn_to_kaddr(pfn)	__va((pfn) << PAGE_SHIFT)
+#define sym_to_pfn(x)		__phys_to_pfn(__pa_symbol(x))

 #define virt_to_pfn(kaddr)	PFN_DOWN(PHYSADDR(kaddr))
 #define virt_to_page(kaddr)	pfn_to_page(virt_to_pfn(kaddr))
diff --git a/arch/loongarch/kernel/vdso.c b/arch/loongarch/kernel/vdso.c
index cf62103..3e89aca 100644
--- a/arch/loongarch/kernel/vdso.c
+++ b/arch/loongarch/kernel/vdso.c
@@ -23,7 +23,27 @@
 #include <vdso/vsyscall.h>
 #include <generated/vdso-offsets.h>

+/*
+ * The layout of vvar:
+ *
+ *                 high
+ * +----------------+----------------+
+ * | timens_data    | PAGE_SIZE      |
+ * +----------------+----------------+
+ * | vdso_data      |                |
+ * | vdso_pcpu_data | VDSO_DATA_SIZE |
+ * +----------------+----------------+
+ *                 low
+ */
+#define VVAR_SIZE (VDSO_DATA_SIZE + PAGE_SIZE)
+
+enum vvar_pages {
+	VVAR_DATA_PAGE_OFFSET,
+	VVAR_TIMENS_PAGE_OFFSET,
+};
+
 extern char vdso_start[], vdso_end[];
+extern unsigned long zero_pfn;

 /* Kernel-provided data used by the VDSO. */
 static union {
@@ -42,6 +62,40 @@ static int vdso_mremap(const struct vm_special_mapping *sm, struct vm_area_struc
 	return 0;
 }

+static vm_fault_t vvar_fault(const struct vm_special_mapping *sm,
+			     struct vm_area_struct *vma, struct vm_fault *vmf)
+{
+	struct page *timens_page = find_timens_vvar_page(vma);
+	unsigned long pfn;
+
+	switch (vmf->pgoff) {
+	case VVAR_DATA_PAGE_OFFSET:
+		if (timens_page)
+			pfn = page_to_pfn(timens_page);
+		else
+			pfn = sym_to_pfn(vdso_data);
+		break;
+#ifdef CONFIG_TIME_NS
+	case VVAR_TIMENS_PAGE_OFFSET:
+		/*
+		 * If a task belongs to a time namespace then a namespace
+		 * specific VVAR is mapped with the VVAR_DATA_PAGE_OFFSET and
+		 * the real VVAR page is mapped with the VVAR_TIMENS_PAGE_OFFSET
+		 * offset.
+		 * See also the comment near timens_setup_vdso_data().
+		 */
+		if (!timens_page)
+			return VM_FAULT_SIGBUS;
+		pfn = sym_to_pfn(vdso_data);
+		break;
+#endif /* CONFIG_TIME_NS */
+	default:
+		return VM_FAULT_SIGBUS;
+	}
+
+	return vmf_insert_pfn(vma, vmf->address, pfn);
+}
+
 struct loongarch_vdso_info vdso_info = {
 	.vdso = vdso_start,
 	.size = PAGE_SIZE,
@@ -52,6 +106,7 @@ struct loongarch_vdso_info vdso_info = {
 	},
 	.data_mapping = {
 		.name = "[vvar]",
+		.fault = vvar_fault,
 	},
 	.offset_sigreturn = vdso_offset_sigreturn,
 };
@@ -120,7 +175,7 @@ static unsigned long vdso_base(void)
 int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 {
 	int ret;
-	unsigned long vvar_size, size, data_addr, vdso_addr;
+	unsigned long size, data_addr, vdso_addr;
 	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma;
 	struct loongarch_vdso_info *info = current->thread.vdso;
@@ -132,17 +187,16 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 	 * Determine total area size. This includes the VDSO data itself
 	 * and the data pages.
 	 */
-	vvar_size = VDSO_DATA_SIZE;
-	size = vvar_size + info->size;
+	size = VVAR_SIZE + info->size;

 	data_addr = get_unmapped_area(NULL, vdso_base(), size, 0, 0);
 	if (IS_ERR_VALUE(data_addr)) {
 		ret = data_addr;
 		goto out;
 	}
-	vdso_addr = data_addr + VDSO_DATA_SIZE;
+	vdso_addr = data_addr + VVAR_SIZE;

-	vma = _install_special_mapping(mm, data_addr, vvar_size,
+	vma = _install_special_mapping(mm, data_addr, VVAR_SIZE,
 				       VM_READ | VM_MAYREAD,
 				       &info->data_mapping);
 	if (IS_ERR(vma)) {
@@ -153,7 +207,12 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 	/* Map VDSO data page. */
 	ret = remap_pfn_range(vma, data_addr,
 			      virt_to_phys(&loongarch_vdso_data) >> PAGE_SHIFT,
-			      vvar_size, PAGE_READONLY);
+			      VDSO_DATA_SIZE, PAGE_READONLY);
+	if (ret)
+		goto out;
+
+	ret = remap_pfn_range(vma, data_addr + VDSO_DATA_SIZE, zero_pfn,
+			      PAGE_SIZE, PAGE_READONLY);
 	if (ret)
 		goto out;

If you have any more comments, please let me know, thank you.
I will send v2 after waiting for some more feedback.

Thanks,
Tiezhu
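The page selection in the proposed vvar_fault() can be read as a small decision table. The sketch below models only that decision in user space, assuming CONFIG_TIME_NS is enabled; model_vvar_fault(), the enum, and the literal offsets 0 and 1 (standing in for VVAR_DATA_PAGE_OFFSET and VVAR_TIMENS_PAGE_OFFSET) are hypothetical and return a label instead of inserting a pfn.

```c
#include <stdbool.h>

/* What the model would map for a given fault. */
enum model_backing {
	MODEL_MAP_TIMENS_PAGE,	/* namespace-specific page */
	MODEL_MAP_VDSO_DATA,	/* the real, global vdso data */
	MODEL_SIGBUS,		/* fault is rejected */
};

/*
 * Hypothetical model of the switch in the proposed vvar_fault().
 * has_timens_page stands in for a non-NULL find_timens_vvar_page() result.
 */
static enum model_backing model_vvar_fault(unsigned long pgoff,
					   bool has_timens_page)
{
	switch (pgoff) {
	case 0:	/* VVAR_DATA_PAGE_OFFSET */
		return has_timens_page ? MODEL_MAP_TIMENS_PAGE
				       : MODEL_MAP_VDSO_DATA;
	case 1:	/* VVAR_TIMENS_PAGE_OFFSET */
		return has_timens_page ? MODEL_MAP_VDSO_DATA : MODEL_SIGBUS;
	default:
		return MODEL_SIGBUS;
	}
}
```

A task inside a time namespace therefore sees its namespace page at offset 0 and the real data at offset 1, while a task outside any time namespace sees the real data at offset 0 and faults on offset 1.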
Hi, Tiezhu,

On Sat, May 20, 2023 at 6:35 PM Tiezhu Yang <yangtiezhu@loongson.cn> wrote:
> Please review the following changes based on the current patch,
> modify the layout of vvar to expand a page size for timens_data,
> and also map it to zero pfn before creating time namespace, then
> the last thing is to add the callback function vvar_fault().
>
[...]
> +/*
> + * The layout of vvar:
> + *
> + *                 high
> + * +----------------+----------------+
> + * | timens_data    | PAGE_SIZE      |
> + * +----------------+----------------+
> + * | vdso_data      |                |
> + * | vdso_pcpu_data | VDSO_DATA_SIZE |
> + * +----------------+----------------+
> + *                 low
> + */
> +#define VVAR_SIZE (VDSO_DATA_SIZE + PAGE_SIZE)
> +
> +enum vvar_pages {
> +	VVAR_DATA_PAGE_OFFSET,
> +	VVAR_TIMENS_PAGE_OFFSET,
> +};
You assume that vdso_data + vdso_pcpu_data can fit in one page, but
this isn't always the case.

[...]
>  struct loongarch_vdso_info vdso_info = {
>  	.vdso = vdso_start,
>  	.size = PAGE_SIZE,
> @@ -52,6 +106,7 @@ struct loongarch_vdso_info vdso_info = {
>  	},
>  	.data_mapping = {
>  		.name = "[vvar]",
> +		.fault = vvar_fault,
I prefer pre-allocation to page faults if possible.

Huacai
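The one-page concern can be made concrete. In the proposed layout the timens page sits directly above the page-aligned data area, so its page index within the vvar mapping depends on how many pages that area spans, whereas the enum fixes it at 1. The helpers below are a hedged user-space sketch with hypothetical names and sizes, not values taken from the LoongArch headers.

```c
#include <stddef.h>

/* Round size up to a multiple of page_size (page_size must be a power of two). */
static size_t model_page_align(size_t size, size_t page_size)
{
	return (size + page_size - 1) & ~(page_size - 1);
}

/*
 * Hypothetical helper: page index of the timens page when it is placed
 * directly above a data area of data_bytes bytes.
 */
static size_t model_timens_pgoff(size_t data_bytes, size_t page_size)
{
	return model_page_align(data_bytes, page_size) / page_size;
}
```

For a data area that fits in one page the index is 1, matching the enum; once the per-CPU data pushes the area past one page the index becomes 2 or more, and a fixed offset of 1 would point into the data area itself.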
diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index d38b066..93b167f 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -80,6 +80,7 @@ config LOONGARCH
 	select GENERIC_SCHED_CLOCK
 	select GENERIC_SMP_IDLE_THREAD
 	select GENERIC_TIME_VSYSCALL
+	select GENERIC_VDSO_TIME_NS
 	select GPIOLIB
 	select HAS_IOPORT
 	select HAVE_ARCH_AUDITSYSCALL
diff --git a/arch/loongarch/include/asm/vdso/gettimeofday.h b/arch/loongarch/include/asm/vdso/gettimeofday.h
index 7b2cd37..1af88ac 100644
--- a/arch/loongarch/include/asm/vdso/gettimeofday.h
+++ b/arch/loongarch/include/asm/vdso/gettimeofday.h
@@ -94,6 +94,13 @@ static __always_inline const struct vdso_data *__arch_get_vdso_data(void)
 	return get_vdso_data();
 }

+#ifdef CONFIG_TIME_NS
+static __always_inline
+const struct vdso_data *__arch_get_timens_vdso_data(const struct vdso_data *vd)
+{
+	return get_vdso_data() + PAGE_SIZE;
+}
+#endif
 #endif /* !__ASSEMBLY__ */

 #endif /* __ASM_VDSO_GETTIMEOFDAY_H */
diff --git a/arch/loongarch/kernel/vdso.c b/arch/loongarch/kernel/vdso.c
index eaebd2e..cf62103 100644
--- a/arch/loongarch/kernel/vdso.c
+++ b/arch/loongarch/kernel/vdso.c
@@ -14,6 +14,7 @@
 #include <linux/random.h>
 #include <linux/sched.h>
 #include <linux/slab.h>
+#include <linux/time_namespace.h>
 #include <linux/timekeeper_internal.h>

 #include <asm/page.h>
@@ -73,6 +74,37 @@ static int __init init_vdso(void)
 }
 subsys_initcall(init_vdso);

+#ifdef CONFIG_TIME_NS
+struct vdso_data *arch_get_vdso_data(void *vvar_page)
+{
+	return (struct vdso_data *)(vvar_page);
+}
+
+/*
+ * The vvar mapping contains data for a specific time namespace, so when a
+ * task changes namespace we must unmap its vvar data for the old namespace.
+ * Subsequent faults will map in data for the new namespace.
+ *
+ * For more details see timens_setup_vdso_data().
+ */
+int vdso_join_timens(struct task_struct *task, struct time_namespace *ns)
+{
+	struct mm_struct *mm = task->mm;
+	struct vm_area_struct *vma;
+
+	VMA_ITERATOR(vmi, mm, 0);
+
+	mmap_read_lock(mm);
+	for_each_vma(vmi, vma) {
+		if (vma_is_special_mapping(vma, &vdso_info.data_mapping))
+			zap_vma_pages(vma);
+	}
+	mmap_read_unlock(mm);
+
+	return 0;
+}
+#endif
+
 static unsigned long vdso_base(void)
 {
 	unsigned long base = STACK_TOP;