Message ID | 20230616085038.4121892-7-rppt@kernel.org |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp1181892vqr; Fri, 16 Jun 2023 01:55:35 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ6qWYedK0A4hJTe662qGTDKQJkY3VhkB/ESossZPJMqVkuQ+Dzpm2HNrLT6hpZlkvyKuITo X-Received: by 2002:a17:902:c106:b0:1ac:3780:3a76 with SMTP id 6-20020a170902c10600b001ac37803a76mr1241121pli.4.1686905734966; Fri, 16 Jun 2023 01:55:34 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1686905734; cv=none; d=google.com; s=arc-20160816; b=bmKRRv/m8cCMcHialBDeaW+FJz5cYd6DkKlX4z4YUEkGu6r5N811p82WVRjTUmkKZu ngh+vF6090qDOVvHflM2Dx18Yfm5tu0Dm4uzkNYHWhPD2VNU4M/mhwXHKXIVrleBMILV bxNzdyXVhbJjDWu1Q0Yj5i4JTSLeQTO94Xd38VzgU7PrVK9bEdZE8K6Gj2flvshlrRQ5 mUJAsKJoHhxCSxrSRNCuCfcKmRdeW36jRTIWLgNplr0mx2pwBEVNXv/aT/mERa2YeK6g UcoGUNhhRzEfudSa23QQFmojlShuMyRdbi0JMuTRBvSFQCbEM8k5JF5UzYLlsB+LXSxX QmqA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=o7OUswSBBgyHs9HCP4EVel3dfjx/MF405XRTUuvhTk4=; b=VN7HDFgbpxgZNgLkeRUjhUZ7VpnusY44iaaDsxIirJzkgpTH/c4jKHnH5cxF8KRe4b yHdZZOsLj8FUlt6fPTiicFDlT0ncGpqCnnpulvm3ZDO41HjOEpZvBVr/z7DA0QrHaFWf xIrSqsIldYe37K54RsmdyJFBAaygNDWm56IA/rCDviHQYcZ9bvF3FuCS4b+TcphrPHal Wn3gk8QmyRFgdDuPpJQLrodamGV5lFZjU9wKIkfHCQ5B9J2ODNnUeXHbDyUV8BTo6Ik0 ue+lmwehQjEkAd0gg7jz2LtvWaUjJjuLWkpOFvklAEdRQXW23PEtArmB4CMajyh7fSRm H9cg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=tjvByEZn; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id z10-20020a170903018a00b001b04c325d66si15401838plg.565.2023.06.16.01.55.21; Fri, 16 Jun 2023 01:55:34 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=tjvByEZn; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243196AbjFPIwg (ORCPT <rfc822;maxin.john@gmail.com> + 99 others); Fri, 16 Jun 2023 04:52:36 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46040 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1343929AbjFPIwO (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Fri, 16 Jun 2023 04:52:14 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 040DD1FEC; Fri, 16 Jun 2023 01:51:59 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 7A5996312B; Fri, 16 Jun 2023 08:51:58 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 82EC5C433CB; Fri, 16 Jun 2023 08:51:48 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1686905517; bh=B2AS/WR4xFCKseXYMdTiKKeL7nu8UkrYZoJGS68ejsY=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=tjvByEZnMSDWipjUeKQ3Rwqa6Bnc8WsZduBUz1eLRzqSPfqzWN106gYEQSU//XIf1 l67itImeq52R36luGSQzNUsJ68sHZ/xmHrg629NgZZvqXUYlmGO3JQJj/bd/fFeFKl +dlGyqNrFRTLxzqxLnHpKHTUupn5FXGtQixdRqD3nGt1Qtirqs9PgPm+ysYwFN3qmo rha25n6IyjNvdT8LE44jnT/YRZtPrYlLNH+pSNZeEXCW+59woYzsm3x66jgLj3+4sx 07zO1nSgtuc4hukUEEfOmql504pMwv3/412mDRnU0Gqh/RXmgVKHo6runZks71Y/Zs BiMhGWUvxJ+AQ== From: Mike Rapoport <rppt@kernel.org> To: linux-kernel@vger.kernel.org Cc: Andrew Morton <akpm@linux-foundation.org>, Catalin Marinas <catalin.marinas@arm.com>, Christophe Leroy <christophe.leroy@csgroup.eu>, "David S. Miller" <davem@davemloft.net>, Dinh Nguyen <dinguyen@kernel.org>, Heiko Carstens <hca@linux.ibm.com>, Helge Deller <deller@gmx.de>, Huacai Chen <chenhuacai@kernel.org>, Kent Overstreet <kent.overstreet@linux.dev>, Luis Chamberlain <mcgrof@kernel.org>, Mark Rutland <mark.rutland@arm.com>, Michael Ellerman <mpe@ellerman.id.au>, Mike Rapoport <rppt@kernel.org>, Nadav Amit <nadav.amit@gmail.com>, "Naveen N. Rao" <naveen.n.rao@linux.ibm.com>, Palmer Dabbelt <palmer@dabbelt.com>, Puranjay Mohan <puranjay12@gmail.com>, Rick Edgecombe <rick.p.edgecombe@intel.com>, Russell King <linux@armlinux.org.uk>, Song Liu <song@kernel.org>, Steven Rostedt <rostedt@goodmis.org>, Thomas Bogendoerfer <tsbogend@alpha.franken.de>, Thomas Gleixner <tglx@linutronix.de>, Will Deacon <will@kernel.org>, bpf@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-mips@vger.kernel.org, linux-mm@kvack.org, linux-modules@vger.kernel.org, linux-parisc@vger.kernel.org, linux-riscv@lists.infradead.org, linux-s390@vger.kernel.org, linux-trace-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, loongarch@lists.linux.dev, netdev@vger.kernel.org, sparclinux@vger.kernel.org, x86@kernel.org Subject: [PATCH v2 06/12] mm/execmem: introduce execmem_data_alloc() Date: Fri, 16 Jun 2023 11:50:32 +0300 Message-Id: <20230616085038.4121892-7-rppt@kernel.org> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20230616085038.4121892-1-rppt@kernel.org> References: <20230616085038.4121892-1-rppt@kernel.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1768848868122125559?= X-GMAIL-MSGID: =?utf-8?q?1768848868122125559?= |
Series |
mm: jit/text allocator
|
|
Commit Message
Mike Rapoport
June 16, 2023, 8:50 a.m. UTC
From: "Mike Rapoport (IBM)" <rppt@kernel.org> Data related to code allocations, such as module data section, need to comply with architecture constraints for its placement and its allocation right now was done using execmem_text_alloc(). Create a dedicated API for allocating data related to code allocations and allow architectures to define address ranges for data allocations. Since currently this is only relevant for powerpc variants that use the VMALLOC address space for module data allocations, automatically reuse address ranges defined for text unless address range for data is explicitly defined by an architecture. With separation of code and data allocations, data sections of the modules are now mapped as PAGE_KERNEL rather than PAGE_KERNEL_EXEC which was a default on many architectures. Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org> --- arch/powerpc/kernel/module.c | 8 +++++++ include/linux/execmem.h | 18 +++++++++++++++ kernel/module/main.c | 15 +++---------- mm/execmem.c | 43 ++++++++++++++++++++++++++++++++++++ 4 files changed, 72 insertions(+), 12 deletions(-)
Comments
On Fri, 2023-06-16 at 11:50 +0300, Mike Rapoport wrote: > From: "Mike Rapoport (IBM)" <rppt@kernel.org> > > Data related to code allocations, such as module data section, need > to > comply with architecture constraints for its placement and its > allocation right now was done using execmem_text_alloc(). > > Create a dedicated API for allocating data related to code > allocations > and allow architectures to define address ranges for data > allocations. Right now the cross-arch way to specify kernel memory permissions is encoded in the function names of all the set_memory_foo()'s. You can't just have unified prot names because some arch's have NX and some have X bits, etc. CPA wouldn't know if it needs to set or unset a bit if you pass in a PROT. But then you end up with a new function for *each* combination (i.e. set_memory_rox()). I wish CPA has flags like mmap() does, and I wonder if it makes sense here instead of execmem_data_alloc(). Maybe that is an overhaul for another day though...
On Fri, Jun 16, 2023 at 1:51 AM Mike Rapoport <rppt@kernel.org> wrote: > > From: "Mike Rapoport (IBM)" <rppt@kernel.org> > > Data related to code allocations, such as module data section, need to > comply with architecture constraints for its placement and its > allocation right now was done using execmem_text_alloc(). > > Create a dedicated API for allocating data related to code allocations > and allow architectures to define address ranges for data allocations. > > Since currently this is only relevant for powerpc variants that use the > VMALLOC address space for module data allocations, automatically reuse > address ranges defined for text unless address range for data is > explicitly defined by an architecture. > > With separation of code and data allocations, data sections of the > modules are now mapped as PAGE_KERNEL rather than PAGE_KERNEL_EXEC which > was a default on many architectures. > > Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org> [...] > static void free_mod_mem(struct module *mod) > diff --git a/mm/execmem.c b/mm/execmem.c > index a67acd75ffef..f7bf496ad4c3 100644 > --- a/mm/execmem.c > +++ b/mm/execmem.c > @@ -63,6 +63,20 @@ void *execmem_text_alloc(size_t size) > fallback_start, fallback_end, kasan); > } > > +void *execmem_data_alloc(size_t size) > +{ > + unsigned long start = execmem_params.modules.data.start; > + unsigned long end = execmem_params.modules.data.end; > + pgprot_t pgprot = execmem_params.modules.data.pgprot; > + unsigned int align = execmem_params.modules.data.alignment; > + unsigned long fallback_start = execmem_params.modules.data.fallback_start; > + unsigned long fallback_end = execmem_params.modules.data.fallback_end; > + bool kasan = execmem_params.modules.flags & EXECMEM_KASAN_SHADOW; > + > + return execmem_alloc(size, start, end, align, pgprot, > + fallback_start, fallback_end, kasan); > +} > + > void execmem_free(void *ptr) > { > /* > @@ -101,6 +115,28 @@ static bool execmem_validate_params(struct execmem_params *p) > return true; > } > > +static void execmem_init_missing(struct execmem_params *p) Shall we call this execmem_default_init_data? > +{ > + struct execmem_modules_range *m = &p->modules; > + > + if (!pgprot_val(execmem_params.modules.data.pgprot)) > + execmem_params.modules.data.pgprot = PAGE_KERNEL; Do we really need to check each of these? IOW, can we do: if (!pgprot_val(execmem_params.modules.data.pgprot)) { execmem_params.modules.data.pgprot = PAGE_KERNEL; execmem_params.modules.data.alignment = m->text.alignment; execmem_params.modules.data.start = m->text.start; execmem_params.modules.data.end = m->text.end; execmem_params.modules.data.fallback_start = m->text.fallback_start; execmem_params.modules.data.fallback_end = m->text.fallback_end; } Thanks, Song [...]
On Fri, Jun 16, 2023 at 04:55:53PM +0000, Edgecombe, Rick P wrote: > On Fri, 2023-06-16 at 11:50 +0300, Mike Rapoport wrote: > > From: "Mike Rapoport (IBM)" <rppt@kernel.org> > > > > Data related to code allocations, such as module data section, need > > to > > comply with architecture constraints for its placement and its > > allocation right now was done using execmem_text_alloc(). > > > > Create a dedicated API for allocating data related to code > > allocations > > and allow architectures to define address ranges for data > > allocations. > > Right now the cross-arch way to specify kernel memory permissions is > encoded in the function names of all the set_memory_foo()'s. You can't > just have unified prot names because some arch's have NX and some have > X bits, etc. CPA wouldn't know if it needs to set or unset a bit if you > pass in a PROT. The idea is that allocator never uses CPA. It allocates with the permissions defined by architecture at the first place and then the callers might use CPA to change them. Ideally, that shouldn't be needed for anything but ro_after_init, but we are far from there. > But then you end up with a new function for *each* combination (i.e. > set_memory_rox()). I wish CPA has flags like mmap() does, and I wonder > if it makes sense here instead of execmem_data_alloc(). I don't think execmem should handle all the combinations. The code is always allocated as ROX for architectures that support text poking and as RWX for those that don't. Maybe execmem_data_alloc() will need to able to handle RW and RO data differently at some point, but that's the only variant I can think of, but even then I don't expect CPA will be used by execmem. We also might move resetting the permissions to default from vmalloc to execmem, but again, we are far from there. > Maybe that is an overhaul for another day though... CPA surely needs some love :)
On Fri, Jun 16, 2023 at 01:01:08PM -0700, Song Liu wrote: > On Fri, Jun 16, 2023 at 1:51 AM Mike Rapoport <rppt@kernel.org> wrote: > > > > From: "Mike Rapoport (IBM)" <rppt@kernel.org> > > > > Data related to code allocations, such as module data section, need to > > comply with architecture constraints for its placement and its > > allocation right now was done using execmem_text_alloc(). > > > > Create a dedicated API for allocating data related to code allocations > > and allow architectures to define address ranges for data allocations. > > > > Since currently this is only relevant for powerpc variants that use the > > VMALLOC address space for module data allocations, automatically reuse > > address ranges defined for text unless address range for data is > > explicitly defined by an architecture. > > > > With separation of code and data allocations, data sections of the > > modules are now mapped as PAGE_KERNEL rather than PAGE_KERNEL_EXEC which > > was a default on many architectures. > > > > Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org> > [...] > > static void free_mod_mem(struct module *mod) > > diff --git a/mm/execmem.c b/mm/execmem.c > > index a67acd75ffef..f7bf496ad4c3 100644 > > --- a/mm/execmem.c > > +++ b/mm/execmem.c > > @@ -63,6 +63,20 @@ void *execmem_text_alloc(size_t size) > > fallback_start, fallback_end, kasan); > > } > > > > +void *execmem_data_alloc(size_t size) > > +{ > > + unsigned long start = execmem_params.modules.data.start; > > + unsigned long end = execmem_params.modules.data.end; > > + pgprot_t pgprot = execmem_params.modules.data.pgprot; > > + unsigned int align = execmem_params.modules.data.alignment; > > + unsigned long fallback_start = execmem_params.modules.data.fallback_start; > > + unsigned long fallback_end = execmem_params.modules.data.fallback_end; > > + bool kasan = execmem_params.modules.flags & EXECMEM_KASAN_SHADOW; > > + > > + return execmem_alloc(size, start, end, align, pgprot, > > + fallback_start, fallback_end, kasan); > > +} > > + > > void execmem_free(void *ptr) > > { > > /* > > @@ -101,6 +115,28 @@ static bool execmem_validate_params(struct execmem_params *p) > > return true; > > } > > > > +static void execmem_init_missing(struct execmem_params *p) > > Shall we call this execmem_default_init_data? This also fills in jit.text (next patch), so _data doesn't work here :) And it's not really a default, the defaults are set explicitly for arches that don't have execmem_arch_params. > > +{ > > + struct execmem_modules_range *m = &p->modules; > > + > > + if (!pgprot_val(execmem_params.modules.data.pgprot)) > > + execmem_params.modules.data.pgprot = PAGE_KERNEL; > > Do we really need to check each of these? IOW, can we do: > > if (!pgprot_val(execmem_params.modules.data.pgprot)) { > execmem_params.modules.data.pgprot = PAGE_KERNEL; > execmem_params.modules.data.alignment = m->text.alignment; > execmem_params.modules.data.start = m->text.start; > execmem_params.modules.data.end = m->text.end; > execmem_params.modules.data.fallback_start = m->text.fallback_start; > execmem_params.modules.data.fallback_end = m->text.fallback_end; > } Yes, we can have a single check here. > Thanks, > Song > > [...]
Mike! Sorry for being late on this ... On Fri, Jun 16 2023 at 11:50, Mike Rapoport wrote: > > +void *execmem_data_alloc(size_t size) > +{ > + unsigned long start = execmem_params.modules.data.start; > + unsigned long end = execmem_params.modules.data.end; > + pgprot_t pgprot = execmem_params.modules.data.pgprot; > + unsigned int align = execmem_params.modules.data.alignment; > + unsigned long fallback_start = execmem_params.modules.data.fallback_start; > + unsigned long fallback_end = execmem_params.modules.data.fallback_end; > + bool kasan = execmem_params.modules.flags & EXECMEM_KASAN_SHADOW; While I know for sure that you read up on the discussion I had with Song about data structures, it seems you completely failed to understand it. > + return execmem_alloc(size, start, end, align, pgprot, > + fallback_start, fallback_end, kasan); Having _seven_ intermediate variables to fill _eight_ arguments of a function instead of handing in @size and a proper struct pointer is tasteless and disgusting at best. Six out of those seven parameters are from: execmem_params.module.data while the KASAN shadow part is retrieved from execmem_params.module.flags So what prevents you from having a uniform data structure, which is extensible and decribes _all_ types of allocations? Absolutely nothing. The flags part can either be in the type dependend part or you make the type configs an array as I had suggested originally and then execmem_alloc() becomes: void *execmem_alloc(type, size) and static inline void *execmem_data_alloc(size_t size) { return execmem_alloc(EXECMEM_TYPE_DATA, size); } which gets the type independent parts from @execmem_param. Just read through your own series and watch the evolution of execmem_alloc(): static void *execmem_alloc(size_t size) static void *execmem_alloc(size_t size, unsigned long start, unsigned long end, unsigned int align, pgprot_t pgprot) static void *execmem_alloc(size_t len, unsigned long start, unsigned long end, unsigned int align, pgprot_t pgprot, unsigned long fallback_start, unsigned long fallback_end, bool kasan) In a month from now this function will have _ten_ parameters and tons of horrible wrappers which convert an already existing data structure into individual function arguments. Seriously? If you want this function to be [ab]used outside of the exec_param configuration space for whatever non-sensical reasons then this still can be either: void *execmem_alloc(params, type, size) static inline void *execmem_data_alloc(size_t size) { return execmem_alloc(&exec_param, EXECMEM_TYPE_DATA, size); } or void *execmem_alloc(type_params, size); static inline void *execmem_data_alloc(size_t size) { return execmem_alloc(&exec_param.data, size); } which both allows you to provide alternative params, right? Coming back to my conversation with Song: "Bad programmers worry about the code. Good programmers worry about data structures and their relationships." You might want to reread: https://lore.kernel.org/r/87lenuukj0.ffs@tglx and the subsequent thread. The fact that my suggestions had a 'mod_' namespace prefix does not make any of my points moot. Song did an extremly good job in abstracting things out, but you decided to ditch his ground work instead of building on it and keeping the good parts. That's beyond sad. Worse, you went the full 'not invented here' path with an outcome which is even worse than the original hackery from Song which started the above referenced thread. I don't know what caused you to decide that ad hoc hackery is better than proper data structure based design patterns. I actually don't want to know. As much as my voice counts: NAK-ed-by: Thomas Gleixner <tglx@linutronix.de> Thanks, tglx
On Mon, Jun 19, 2023 at 12:32:55AM +0200, Thomas Gleixner wrote: > Mike! > > Sorry for being late on this ... > > On Fri, Jun 16 2023 at 11:50, Mike Rapoport wrote: > > > > +void *execmem_data_alloc(size_t size) > > +{ > > + unsigned long start = execmem_params.modules.data.start; > > + unsigned long end = execmem_params.modules.data.end; > > + pgprot_t pgprot = execmem_params.modules.data.pgprot; > > + unsigned int align = execmem_params.modules.data.alignment; > > + unsigned long fallback_start = execmem_params.modules.data.fallback_start; > > + unsigned long fallback_end = execmem_params.modules.data.fallback_end; > > + bool kasan = execmem_params.modules.flags & EXECMEM_KASAN_SHADOW; > > While I know for sure that you read up on the discussion I had with Song > about data structures, it seems you completely failed to understand it. > > > + return execmem_alloc(size, start, end, align, pgprot, > > + fallback_start, fallback_end, kasan); > > Having _seven_ intermediate variables to fill _eight_ arguments of a > function instead of handing in @size and a proper struct pointer is > tasteless and disgusting at best. > > Six out of those seven parameters are from: > > execmem_params.module.data > > while the KASAN shadow part is retrieved from > > execmem_params.module.flags > > So what prevents you from having a uniform data structure, which is > extensible and decribes _all_ types of allocations? > > Absolutely nothing. The flags part can either be in the type dependend > part or you make the type configs an array as I had suggested originally > and then execmem_alloc() becomes: > > void *execmem_alloc(type, size) > > and > > static inline void *execmem_data_alloc(size_t size) > { > return execmem_alloc(EXECMEM_TYPE_DATA, size); > } > > which gets the type independent parts from @execmem_param. > > Just read through your own series and watch the evolution of > execmem_alloc(): > > static void *execmem_alloc(size_t size) > > static void *execmem_alloc(size_t size, unsigned long start, > unsigned long end, unsigned int align, > pgprot_t pgprot) > > static void *execmem_alloc(size_t len, unsigned long start, > unsigned long end, unsigned int align, > pgprot_t pgprot, > unsigned long fallback_start, > unsigned long fallback_end, > bool kasan) > > In a month from now this function will have _ten_ parameters and tons of > horrible wrappers which convert an already existing data structure into > individual function arguments. > > Seriously? > > If you want this function to be [ab]used outside of the exec_param > configuration space for whatever non-sensical reasons then this still > can be either: > > void *execmem_alloc(params, type, size) > > static inline void *execmem_data_alloc(size_t size) > { > return execmem_alloc(&exec_param, EXECMEM_TYPE_DATA, size); > } > > or > > void *execmem_alloc(type_params, size); > > static inline void *execmem_data_alloc(size_t size) > { > return execmem_alloc(&exec_param.data, size); > } > > which both allows you to provide alternative params, right? > > Coming back to my conversation with Song: > > "Bad programmers worry about the code. Good programmers worry about > data structures and their relationships." Thomas, you're confusing an internal interface with external, I made the same mistake reviewing Song's patchset...
Kent! On Sun, Jun 18 2023 at 19:14, Kent Overstreet wrote: > On Mon, Jun 19, 2023 at 12:32:55AM +0200, Thomas Gleixner wrote: > > Thomas, you're confusing an internal interface with external No. I am not. Whether that's an internal function or not does not make any difference at all. Having a function growing _eight_ parameters where _six_ of them are derived from a well defined data structure is a clear sign of design fail. It's not rocket science to do: struct generic_allocation_info { .... }; struct execmem_info { .... struct generic_allocation_info alloc_info; ; struct execmem_param { ... struct execmem_info[NTYPES]; }; and having a function which can either operate on execmem_param and type or on generic_allocation_info itself. It does not matter as long as the data structure which is handed into this internal function is describing it completely or needs a supplementary argument, i.e. flags. Having tons of wrappers which do: a = generic_info.a; b = generic_info.b; .... n = generic_info.n; internal_func(a, b, ....,, n); is just hillarious and to repeat myself tasteless and therefore disgusting. That's CS course first semester hackery, but TBH, I can only tell from imagination because I did not take CS courses - maybe that's the problem... Data structure driven design works not from the usage site down to the internals. It's the other way round: 1) Define a data structure which describes what the internal function needs to know 2) Implement use case specific variants which describe that 3) Hand the use case specific variant to the internal function eventually with some minimal supplementary information. Object oriented basics, right? There is absolutely nothing wrong with having internal_func(generic_info *, modifier); but having internal_func(a, b, ... n) is fundamentally wrong in the context of an extensible and future proof internal function. Guess what. Today it's sufficient to have _eight_ arguments and we are happy to have 10 nonsensical wrappers around this internal function. Tomorrow there happens to be a use case which needs another argument so you end up: Changing 10 wrappers plus the function declaration and definition in one go instead of adding One new naturally 0 initialized member to generic_info and be done with it. Look at the evolution of execmem_alloc() in this very pathset which I pointed out. That very patchset covers _two_ of at least _six_ cases Song and myself identified. It already had _three_ steps of evolution from _one_ to _five_ to _eight_ parameters. C is not like e.g. python where you can "solve" that problem by simply doing: - internal_func(a, b, c): + internal_func(a, b, c, d=None, ..., n=None): But that's not a solution either. That's a horrible workaround even in python once your parameter space gets sufficiently large. The way out in python is to have **kwargs. But that's not an option in C, and not necessarily the best option for python either. Even in python or any other object oriented language you get to the point where you have to rethink your approach, go back to the drawing board and think about data representation. But creating a new interface based on "let's see what we need over time and add parameters as we see fit" is simply wrong to begin with independent of the programming language. Even if the _eight_ parameters are the end of the range, then they are beyond justifyable because that's way beyond the natural register argument space of any architecture and you are offloading your lazyness to wrappers and the compiler to emit pointlessly horrible code. There is a reason why new syscalls which need more than a few parameters are based on 'struct DATA_WHICH_I_NEED_TO_KNOW' and 'flags'. We've got burned on the non-extensibilty often enough. Why would a new internal function have any different requirements especially as it is neither implemented to the full extent nor a hotpath function? Now you might argue that it _is_ a "hotpath" due to the BPF usage, but then even more so as any intermediate wrapper which converts from one data representation to another data representation is not going to increase performance, right? > ... I made the same mistake reviewing Song's patchset... Songs series had rough edges, but was way more data structure driven and palatable than this hackery. The fact that you made a mistake while reviewing Songs series has absolutely nothing to do with the above or my previous reply to Mike. Thanks, tglx
On Mon, Jun 19, 2023 at 02:43:58AM +0200, Thomas Gleixner wrote: > Kent! Hi Thomas :) > No. I am not. Ok. > Whether that's an internal function or not does not make any difference > at all. Well, at the risk of this discussion going completely off the rails, I have to disagree with you there. External interfaces and high level semantics are more important to get right from the outset, internal implementation details can be cleaned up later, within reason. And the discussion on this patchset has been more focused on those external interfaces, which seems like the right approach to me. > > ... I made the same mistake reviewing Song's patchset... > > Songs series had rough edges, but was way more data structure driven > and palatable than this hackery. I liked that aspect of Song's patchset too, and I'm actually inclined to agree with you that this patchset might get a bit cleaner with more of that, but really, this semes like just quibbling over calling convention for an internal helper function.
On Mon, Jun 19, 2023 at 12:32:55AM +0200, Thomas Gleixner wrote: > Mike! > > Sorry for being late on this ... > > On Fri, Jun 16 2023 at 11:50, Mike Rapoport wrote: > > The fact that my suggestions had a 'mod_' namespace prefix does not make > any of my points moot. The prefix does not matter. What matters is what we are trying to abstract. Your suggestion is based of the memory used by modules. I'm abstracting address spaces for different types of executable and related memory. They are similar, yes, but they are not the same. The TEXT, INIT_TEXT and *_DATA do not match to what we have from arch POV. They have modules with text, rw data, ro data and ro after init data and the memory for the generated code. The memory for modules and memory for other users have different restrictions for their placement, so using a single TEXT type for them is semantically wrong. BPF and kprobes do not necessarily must be at the same address range as modules and init text does not differ from normal text. > Song did an extremly good job in abstracting things out, but you decided > to ditch his ground work instead of building on it and keeping the good > parts. That's beyond sad. Actually not. The core idea to describe address range suitable for code allocations with a structure and have arch code initialize this structure at boot and be done with it is the same. But I don't think vmalloc parameters belong there, they should be completely encapsulated in the allocator. Having fallback range named explicitly is IMO clearer than an array of address spaces. I accept your point that the structures describing ranges for different types should be unified and I've got carried away with making the wrappers to convert that structure to parameters to the core allocation function. I've chosen to define ranges as fields in the containing structure rather than enum with types and an array because I strongly feel that the callers should not care about these parameters. These parameters are defined by architecture and the callers should not need to know how each and every arch defines restrictions suitable for modules, bpf or kprobes. That's also the reason to have different names for API calls, exactly to avoid having alloc(KPROBES,...), alloc(BPF, ...), alloc(MODULES, ...) an so on. All in all, if I filter all the ranting, this boils down to having a unified structure for all the address ranges and passing this structure from the wrappers to the core alloc as is rather that translating it to separate parameters, with which I agree. > Thanks, > > tglx
On Mon, 19 Jun 2023 02:43:58 +0200 Thomas Gleixner <tglx@linutronix.de> wrote: > Now you might argue that it _is_ a "hotpath" due to the BPF usage, but > then even more so as any intermediate wrapper which converts from one > data representation to another data representation is not going to > increase performance, right? Just as a side note. BPF can not attach its return calling code to functions that have more than 6 parameters (3 on 32 bit x86), because of the way BPF return path trampoline works. It is a requirement that all parameters live in registers, and none on the stack. -- Steve
On Tue, Jun 20, 2023 at 7:51 AM Steven Rostedt <rostedt@goodmis.org> wrote: > > On Mon, 19 Jun 2023 02:43:58 +0200 > Thomas Gleixner <tglx@linutronix.de> wrote: > > > Now you might argue that it _is_ a "hotpath" due to the BPF usage, but > > then even more so as any intermediate wrapper which converts from one > > data representation to another data representation is not going to > > increase performance, right? > > Just as a side note. BPF can not attach its return calling code to > functions that have more than 6 parameters (3 on 32 bit x86), because of > the way BPF return path trampoline works. It is a requirement that all > parameters live in registers, and none on the stack. It's actually 7 and that restriction is being lifted. The patch set to attach to <= 12 is being discussed.
diff --git a/arch/powerpc/kernel/module.c b/arch/powerpc/kernel/module.c index ba7abff77d98..4c6c15bf3947 100644 --- a/arch/powerpc/kernel/module.c +++ b/arch/powerpc/kernel/module.c @@ -103,6 +103,10 @@ struct execmem_params __init *execmem_arch_params(void) { pgprot_t prot = strict_module_rwx_enabled() ? PAGE_KERNEL : PAGE_KERNEL_EXEC; + /* + * BOOK3S_32 and 8xx define MODULES_VADDR for text allocations and + * allow allocating data in the entire vmalloc space + */ #ifdef MODULES_VADDR unsigned long limit = (unsigned long)_etext - SZ_32M; @@ -116,6 +120,10 @@ struct execmem_params __init *execmem_arch_params(void) execmem_params.modules.text.start = MODULES_VADDR; execmem_params.modules.text.end = MODULES_END; } + execmem_params.modules.data.start = VMALLOC_START; + execmem_params.modules.data.end = VMALLOC_END; + execmem_params.modules.data.pgprot = PAGE_KERNEL; + execmem_params.modules.data.alignment = 1; #else execmem_params.modules.text.start = VMALLOC_START; execmem_params.modules.text.end = VMALLOC_END; diff --git a/include/linux/execmem.h b/include/linux/execmem.h index b9a97fcdf3c5..2e1221310d13 100644 --- a/include/linux/execmem.h +++ b/include/linux/execmem.h @@ -44,10 +44,12 @@ enum execmem_module_flags { * space * @flags: options for module memory allocations * @text: address range for text allocations + * @data: address range for data allocations */ struct execmem_modules_range { enum execmem_module_flags flags; struct execmem_range text; + struct execmem_range data; }; /** @@ -89,6 +91,22 @@ struct execmem_params *execmem_arch_params(void); */ void *execmem_text_alloc(size_t size); +/** + * execmem_data_alloc - allocate memory for data coupled to code + * @size: how many bytes of memory are required + * + * Allocates memory that will contain data copupled with executable code, + * like data sections in kernel modules. + * + * The memory will have protections defined by architecture. + * + * The allocated memory will reside in an area that does not impose + * restrictions on the addressing modes. + * + * Return: a pointer to the allocated memory or %NULL + */ +void *execmem_data_alloc(size_t size); + /** * execmem_free - free executable memory * @ptr: pointer to the memory that should be freed diff --git a/kernel/module/main.c b/kernel/module/main.c index b445c5ad863a..d6582bfec1f6 100644 --- a/kernel/module/main.c +++ b/kernel/module/main.c @@ -1195,25 +1195,16 @@ void __weak module_arch_freeing_init(struct module *mod) { } -static bool mod_mem_use_vmalloc(enum mod_mem_type type) -{ - return IS_ENABLED(CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC) && - mod_mem_type_is_core_data(type); -} - static void *module_memory_alloc(unsigned int size, enum mod_mem_type type) { - if (mod_mem_use_vmalloc(type)) - return vzalloc(size); + if (mod_mem_type_is_data(type)) + return execmem_data_alloc(size); return execmem_text_alloc(size); } static void module_memory_free(void *ptr, enum mod_mem_type type) { - if (mod_mem_use_vmalloc(type)) - vfree(ptr); - else - execmem_free(ptr); + execmem_free(ptr); } static void free_mod_mem(struct module *mod) diff --git a/mm/execmem.c b/mm/execmem.c index a67acd75ffef..f7bf496ad4c3 100644 --- a/mm/execmem.c +++ b/mm/execmem.c @@ -63,6 +63,20 @@ void *execmem_text_alloc(size_t size) fallback_start, fallback_end, kasan); } +void *execmem_data_alloc(size_t size) +{ + unsigned long start = execmem_params.modules.data.start; + unsigned long end = execmem_params.modules.data.end; + pgprot_t pgprot = execmem_params.modules.data.pgprot; + unsigned int align = execmem_params.modules.data.alignment; + unsigned long fallback_start = execmem_params.modules.data.fallback_start; + unsigned long fallback_end = execmem_params.modules.data.fallback_end; + bool kasan = execmem_params.modules.flags & EXECMEM_KASAN_SHADOW; + + return execmem_alloc(size, start, end, align, pgprot, + fallback_start, fallback_end, kasan); +} + void execmem_free(void *ptr) { /* @@ -101,6 +115,28 @@ static bool execmem_validate_params(struct execmem_params *p) return true; } +static void execmem_init_missing(struct execmem_params *p) +{ + struct execmem_modules_range *m = &p->modules; + + if (!pgprot_val(execmem_params.modules.data.pgprot)) + execmem_params.modules.data.pgprot = PAGE_KERNEL; + + if (!execmem_params.modules.data.alignment) + execmem_params.modules.data.alignment = m->text.alignment; + + if (!execmem_params.modules.data.start) { + execmem_params.modules.data.start = m->text.start; + execmem_params.modules.data.end = m->text.end; + } + + if (!execmem_params.modules.data.fallback_start && + execmem_params.modules.text.fallback_start) { + execmem_params.modules.data.fallback_start = m->text.fallback_start; + execmem_params.modules.data.fallback_end = m->text.fallback_end; + } +} + void __init execmem_init(void) { struct execmem_params *p = execmem_arch_params(); @@ -112,6 +148,11 @@ void __init execmem_init(void) p->modules.text.pgprot = PAGE_KERNEL_EXEC; p->modules.text.alignment = 1; + p->modules.data.start = VMALLOC_START; + p->modules.data.end = VMALLOC_END; + p->modules.data.pgprot = PAGE_KERNEL; + p->modules.data.alignment = 1; + return; } @@ -119,4 +160,6 @@ void __init execmem_init(void) return; execmem_params = *p; + + execmem_init_missing(p); }