Message ID | 20221025200656.951281799@infradead.org |
---|---|
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:6687:0:0:0:0:0 with SMTP id l7csp1196197wru; Tue, 25 Oct 2022 13:13:08 -0700 (PDT) X-Google-Smtp-Source: AMsMyM6eBBGdJWLo0ngqPzv2Rmjc0o5I/plfs8kGRZOJRC5UDIXZC3v68K5+BjaBj6mDwSjCFCk1 X-Received: by 2002:a63:234c:0:b0:46f:1b7:438b with SMTP id u12-20020a63234c000000b0046f01b7438bmr11031917pgm.516.1666728787741; Tue, 25 Oct 2022 13:13:07 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1666728787; cv=none; d=google.com; s=arc-20160816; b=smTRxk3HUKHMOk6jjklrCDY0PuJEA3NIKA4P0gR7eCiRRfgthFiiqyPbQy+ghAhh/6 MEURzVpbeg6SHFKrgHKzS9GItzl9NzM4aQVhme2d43UOELoV3Mht313/oV2y7OZnL5ua Yy/6pa0xR1xSwGX9s/EMI0xjoTW/w1xRgqieJHFKO4BB3C+Tjr2jV71PCuVexjyYe2Iz g0s16xjM88nfCsDQXt14c3l3B86dHRNTvEKxcu3UdNv4dBK2Th0iTqo0LCc2gQbmReCx fnHFvdbm4c9Hm7x3mXMgnr3/3shKcuo5hXNJUjnh5LfiEpkDx3OsDqQhvx74FMzWAXWu 6yJA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:subject:cc:to:from:date:user-agent:message-id :dkim-signature; bh=arjjn6azk9uuxv9qRJ+TWeQwrm7jAh/OWTTw4STalbE=; b=WO3qM+6Xa8IFuBfka9sg4HRDMzf0QMINlvyTZNFPxIm1eRE4hwmrfUeMO+zouMv2CU wYeMuy3rMA4w8/86pDEqc6ViHWENG14Uh6bP95mKp3SPlGPD0aWQlV1Zk39i6jFifBvP xBRX+nmmakBv6gkjsHSdq7PARLc1ZqfTpLkNZi7/0vAWtGlluRlnnCb9+eP3vnkxw7CW Vaj9A3ecwhxchzJ4iYd3+ZMQSU13z69dv/PLXAEzAIC7VQuXqPgo0aFE63ai8W3cw7Ll B4RVSPf5mw/JgX2Py1bEopIir0/c50jzZ6StxYq8fy+e9MXVmjAV0ke6uSlqsxy8KQo+ kacg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@infradead.org header.s=casper.20170209 header.b=SoA0yvjL; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id s64-20020a632c43000000b0046b1a7c6679si4198121pgs.414.2022.10.25.13.12.35; Tue, 25 Oct 2022 13:13:07 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@infradead.org header.s=casper.20170209 header.b=SoA0yvjL; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231549AbiJYULn (ORCPT <rfc822;pwkd43@gmail.com> + 99 others); Tue, 25 Oct 2022 16:11:43 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57072 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231327AbiJYULg (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Tue, 25 Oct 2022 16:11:36 -0400 Received: from casper.infradead.org (casper.infradead.org [IPv6:2001:8b0:10b:1236::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 31FD679EC7 for <linux-kernel@vger.kernel.org>; Tue, 25 Oct 2022 13:11:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Subject:Cc:To:From:Date:Message-ID: Sender:Reply-To:MIME-Version:Content-Type:Content-Transfer-Encoding: Content-ID:Content-Description:In-Reply-To:References; bh=arjjn6azk9uuxv9qRJ+TWeQwrm7jAh/OWTTw4STalbE=; b=SoA0yvjL6GoYB+2zKr2iW2n+CW Y00diumhJOcA6Du8eS/vHG6P2v8xueThdNVEx7uS3ftVAkGE38n2+AWibbqKg9wC69ms04V6Et/e3 Bb0rMCRjSeSnu2ary0XsvmDIzaCrqqIzsAy1m9Rexuzp6GUn1wAKnFk+ALwqsjP/Ox2wKM2JTsk/X 6kuGM3yL27VqqWJ2hrwHNbpLh9rmr2QBV/ZMEqvaCabyJK5Asf+SjID8CG/2th5wS8OYj2Xph+Ypg i6hZjEklqfE7N5K6GEhwAufY6ylbQnVAu7Vv3iqViGmctp9PD+B7yIJE3ssF4MMoeVqdBmHj/i6Rm LZ3sIbSw==; Received: from j130084.upc-j.chello.nl ([24.132.130.84] helo=noisy.programming.kicks-ass.net) by casper.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1onQGm-00GWF5-EP; Tue, 25 Oct 2022 20:11:28 +0000 Received: from hirez.programming.kicks-ass.net (hirez.programming.kicks-ass.net [192.168.1.225]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (Client did not present a certificate) by noisy.programming.kicks-ass.net (Postfix) with ESMTPS id 9F04A30008D; Tue, 25 Oct 2022 22:11:22 +0200 (CEST) Received: by hirez.programming.kicks-ass.net (Postfix, from userid 0) id 8805D2C450286; Tue, 25 Oct 2022 22:11:22 +0200 (CEST) Message-ID: <20221025200656.951281799@infradead.org> User-Agent: quilt/0.66 Date: Tue, 25 Oct 2022 22:06:56 +0200 From: Peter Zijlstra <peterz@infradead.org> To: torvalds@linux-foundation.org, rostedt@goodmis.org, dave.hansen@intel.com Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, x86@kernel.org, keescook@chromium.org, seanjc@google.com Subject: [PATCH 0/5] x86/ftrace: Cure boot time W+X mapping X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED,SPF_HELO_NONE, SPF_NONE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: 
=?utf-8?q?1747691805512204758?= X-GMAIL-MSGID: =?utf-8?q?1747691805512204758?= |
Series | x86/ftrace: Cure boot time W+X mapping |
Message
Peter Zijlstra
Oct. 25, 2022, 8:06 p.m. UTC
Hi,

These few patches re-work and re-order boot things enough to avoid ftrace
creating boot time W+X maps.

The patches compile and boot for the one config I tested things on (with
ftrace=function enabled; *slooooow*).

I've pushed them out for the robots to have a go at here:

  git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git x86/mm.poke_me
Comments
On Tue, Oct 25, 2022 at 1:11 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> These few patches re-work and re-order boot things enough to avoid ftrace
> creating boot time W+X maps.

Thanks, looks fine.

> The patches compile and boot for the one config I tested things on (with
> ftrace=function enabled; *slooooow*).

So this might be just tracing overhead, but it might also be that you
slowed down text_poke() at bootup a _lot_.

The only part that the W^X checking cared about was that

-	if (likely(system_state != SYSTEM_BOOTING))
-		set_memory_ro((unsigned long)trampoline, npages);
+	set_memory_ro((unsigned long)trampoline, npages);
 	set_memory_x((unsigned long)trampoline, npages);

for the create_trampoline(), because without the 'set_memory_ro()', the
'set_memory_x()' will complain.

It does strike me that it's stupid to make those be two calls that do
exactly the same thing, and we should have a combined "set it
read-only and executable" function, but that's a separate issue.

The slowness is probably not the trampolines, but just the regular
"text_poke of kernel text" that we probably want to keep special just
because otherwise it's _so_ slow to do for every alternative etc.

                Linus
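The "complain" here is the W^X sanity check in the x86 CPA (change_page_attr)
code, which warns when a permission change would leave a range both writable
and executable. A simplified sketch of its shape follows; the kernel's real
check (verify_rwx() in arch/x86/mm/pat/set_memory.c) takes more arguments and
handles more cases, so treat this as an approximation rather than the actual
code:

	/*
	 * Simplified sketch of the CPA W^X check; the real verify_rwx()
	 * carries more context (address range, PFN) and more special cases.
	 */
	static pgprot_t verify_rwx_sketch(pgprot_t old, pgprot_t new)
	{
		/*
		 * _PAGE_RW set and _PAGE_NX clear means writable AND
		 * executable -- exactly what set_memory_x() produces when
		 * the range was never first made read-only.
		 */
		if ((pgprot_val(new) & (_PAGE_RW | _PAGE_NX)) == _PAGE_RW)
			WARN_ONCE(1, "CPA detected W^X violation: %016llx -> %016llx\n",
				  (unsigned long long)pgprot_val(old),
				  (unsigned long long)pgprot_val(new));

		return new;
	}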
On Tue, 25 Oct 2022 16:07:25 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> The slowness is probably not the trampolines, but just the regular
> "text_poke of kernel text" that we probably want to keep special just
> because otherwise it's _so_ slow to do for every alternative etc.

Yes. That's why I recommended changing patch 4 to:

	if (unlikely(system_state == SYSTEM_BOOTING) &&
	    core_kernel_text((unsigned long)addr)) {
		text_poke_early(addr, opcode, len);
		return;
	}

The only special case we had was the ftrace trampoline, which is
dynamically allocated, and there are paths that require updating it
(when you add two function callbacks to the same location).

-- Steve
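A sketch of where Steve's check might sit in the text-patching path; the
function wrapped around it here is an assumption for illustration, not the
actual contents of patch 4:

	/*
	 * Illustrative only: the function around the check is assumed,
	 * not taken from patch 4.
	 */
	static void text_poke_sketch(void *addr, const void *opcode, size_t len)
	{
		/*
		 * During early boot, core kernel text is not yet mapped
		 * read-only, so it can be patched directly and cheaply.
		 * Dynamically allocated ftrace trampolines fail the
		 * core_kernel_text() test on purpose and keep going
		 * through the full (slower, but safe) patching machinery.
		 */
		if (unlikely(system_state == SYSTEM_BOOTING) &&
		    core_kernel_text((unsigned long)addr)) {
			text_poke_early(addr, opcode, len);
			return;
		}

		/* ... regular cross-CPU text patching goes here ... */
	}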
On Tue, Oct 25, 2022 at 04:07:25PM -0700, Linus Torvalds wrote:

> It does strike me that it's stupid to make those be two calls that do
> exactly the same thing, and we should have a combined "set it
> read-only and executable" function, but that's a separate issue.

Right, and we have it all over the place. Something like the below
perhaps? I'll feed it to the robots, see if something breaks.

> The slowness is probably not the trampolines, but just the regular
> "text_poke of kernel text" that we probably want to keep special just
> because otherwise it's _so_ slow to do for every alternative etc.

I tried with and without the patches; it's dead slow either way and I
couldn't spot a noticeable difference between the two -- so I'm assuming
it's simply the trace overhead, not the trace-enable time.

---
--- a/arch/arm/mach-omap1/sram-init.c
+++ b/arch/arm/mach-omap1/sram-init.c
@@ -74,8 +74,7 @@ void *omap_sram_push(void *funcp, unsign
 
 	dst = fncpy(sram, funcp, size);
 
-	set_memory_ro(base, pages);
-	set_memory_x(base, pages);
+	set_memory_rox(base, pages);
 
 	return dst;
 }
@@ -126,8 +125,7 @@ static void __init omap_detect_and_map_s
 	base = (unsigned long)omap_sram_base;
 	pages = PAGE_ALIGN(omap_sram_size) / PAGE_SIZE;
 
-	set_memory_ro(base, pages);
-	set_memory_x(base, pages);
+	set_memory_rox(base, pages);
 }
 
 static void (*_omap_sram_reprogram_clock)(u32 dpllctl, u32 ckctl);
--- a/arch/arm/mach-omap2/sram.c
+++ b/arch/arm/mach-omap2/sram.c
@@ -96,8 +96,7 @@ void *omap_sram_push(void *funcp, unsign
 
 	dst = fncpy(sram, funcp, size);
 
-	set_memory_ro(base, pages);
-	set_memory_x(base, pages);
+	set_memory_rox(base, pages);
 
 	return dst;
 }
@@ -217,8 +216,7 @@ static void __init omap2_map_sram(void)
 	base = (unsigned long)omap_sram_base;
 	pages = PAGE_ALIGN(omap_sram_size) / PAGE_SIZE;
 
-	set_memory_ro(base, pages);
-	set_memory_x(base, pages);
+	set_memory_rox(base, pages);
 }
 
 static void (*_omap2_sram_ddr_init)(u32 *slow_dll_ctrl, u32 fast_dll_ctrl,
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -134,10 +134,9 @@ void *alloc_insn_page(void)
 	if (!page)
 		return NULL;
 
-	if (strict_module_rwx_enabled()) {
-		set_memory_ro((unsigned long)page, 1);
-		set_memory_x((unsigned long)page, 1);
-	}
+	if (strict_module_rwx_enabled())
+		set_memory_rox((unsigned long)page, 1);
+
 	return page;
 }
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -415,8 +415,7 @@ create_trampoline(struct ftrace_ops *ops
 
 	set_vm_flush_reset_perms(trampoline);
 
-	set_memory_ro((unsigned long)trampoline, npages);
-	set_memory_x((unsigned long)trampoline, npages);
+	set_memory_rox((unsigned long)trampoline, npages);
 	return (unsigned long)trampoline;
 fail:
 	tramp_free(trampoline);
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -415,17 +415,12 @@ void *alloc_insn_page(void)
 		return NULL;
 	set_vm_flush_reset_perms(page);
 
-	/*
-	 * First make the page read-only, and only then make it executable to
-	 * prevent it from being W+X in between.
-	 */
-	set_memory_ro((unsigned long)page, 1);
 	/*
 	 * TODO: Once additional kernel code protection mechanisms are set, ensure
 	 * that the page was not maliciously altered and it is still zeroed.
 	 */
-	set_memory_x((unsigned long)page, 1);
+	set_memory_rox((unsigned long)page, 1);
 
 	return page;
 }
--- a/drivers/misc/sram-exec.c
+++ b/drivers/misc/sram-exec.c
@@ -106,10 +106,7 @@ void *sram_exec_copy(struct gen_pool *po
 
 	dst_cpy = fncpy(dst, src, size);
 
-	ret = set_memory_ro((unsigned long)base, pages);
-	if (ret)
-		goto error_out;
-	ret = set_memory_x((unsigned long)base, pages);
+	ret = set_memory_rox((unsigned long)base, pages);
 	if (ret)
 		goto error_out;
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -860,8 +860,7 @@ static inline void bpf_prog_lock_ro(stru
 static inline void bpf_jit_binary_lock_ro(struct bpf_binary_header *hdr)
 {
 	set_vm_flush_reset_perms(hdr);
-	set_memory_ro((unsigned long)hdr, hdr->size >> PAGE_SHIFT);
-	set_memory_x((unsigned long)hdr, hdr->size >> PAGE_SHIFT);
+	set_memory_rox((unsigned long)hdr, hdr->size >> PAGE_SHIFT);
 }
 
 int sk_filter_trim_cap(struct sock *sk, struct sk_buff *skb, unsigned int cap);
--- a/include/linux/set_memory.h
+++ b/include/linux/set_memory.h
@@ -14,6 +14,14 @@ static inline int set_memory_x(unsigned
 static inline int set_memory_nx(unsigned long addr, int numpages) { return 0; }
 #endif
 
+static inline int set_memory_rox(unsigned long addr, int numpages)
+{
+	int ret = set_memory_ro(addr, numpages);
+	if (ret)
+		return ret;
+	return set_memory_x(addr, numpages);
+}
+
 #ifndef CONFIG_ARCH_HAS_SET_DIRECT_MAP
 static inline int set_direct_map_invalid_noflush(struct page *page)
--- a/kernel/bpf/bpf_struct_ops.c
+++ b/kernel/bpf/bpf_struct_ops.c
@@ -494,8 +494,7 @@ static int bpf_struct_ops_map_update_ele
 	refcount_set(&kvalue->refcnt, 1);
 	bpf_map_inc(map);
 
-	set_memory_ro((long)st_map->image, 1);
-	set_memory_x((long)st_map->image, 1);
+	set_memory_rox((long)st_map->image, 1);
 	err = st_ops->reg(kdata);
 	if (likely(!err)) {
 		/* Pair with smp_load_acquire() during lookup_elem().
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -864,8 +864,7 @@ static struct bpf_prog_pack *alloc_new_p
 	list_add_tail(&pack->list, &pack_list);
 
 	set_vm_flush_reset_perms(pack->ptr);
-	set_memory_ro((unsigned long)pack->ptr, BPF_PROG_PACK_SIZE / PAGE_SIZE);
-	set_memory_x((unsigned long)pack->ptr, BPF_PROG_PACK_SIZE / PAGE_SIZE);
+	set_memory_rox((unsigned long)pack->ptr, BPF_PROG_PACK_SIZE / PAGE_SIZE);
 	return pack;
 }
 
@@ -883,8 +882,7 @@ void *bpf_prog_pack_alloc(u32 size, bpf_
 		if (ptr) {
 			bpf_fill_ill_insns(ptr, size);
 			set_vm_flush_reset_perms(ptr);
-			set_memory_ro((unsigned long)ptr, size / PAGE_SIZE);
-			set_memory_x((unsigned long)ptr, size / PAGE_SIZE);
+			set_memory_rox((unsigned long)ptr, size / PAGE_SIZE);
 		}
 		goto out;
 	}
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -468,8 +468,7 @@ static int bpf_trampoline_update(struct
 	if (err < 0)
 		goto out;
 
-	set_memory_ro((long)im->image, 1);
-	set_memory_x((long)im->image, 1);
+	set_memory_rox((long)im->image, 1);
 
 	WARN_ON(tr->cur_image && tr->selector == 0);
 	WARN_ON(!tr->cur_image && tr->selector);
--- a/net/bpf/bpf_dummy_struct_ops.c
+++ b/net/bpf/bpf_dummy_struct_ops.c
@@ -124,8 +124,7 @@ int bpf_struct_ops_test_run(struct bpf_p
 	if (err < 0)
 		goto out;
 
-	set_memory_ro((long)image, 1);
-	set_memory_x((long)image, 1);
+	set_memory_rox((long)image, 1);
 	prog_ret = dummy_ops_call_op(image, args);
 
 	err = dummy_ops_copy_args(args);
On Wed, Oct 26, 2022 at 12:15 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> Right, and we have it all over the place. Something like the below
> perhaps? I'll feed it to the robots, see if something breaks.

I was nodding along with the patches like this:

> -	set_memory_ro(base, pages);
> -	set_memory_x(base, pages);
> +	set_memory_rox(base, pages);

but then I got to this part:

> +static inline int set_memory_rox(unsigned long addr, int numpages)
> +{
> +	int ret = set_memory_ro(addr, numpages);
> +	if (ret)
> +		return ret;
> +	return set_memory_x(addr, numpages);
> +}

and that's when I went "no, I really meant make it one single call".

set_memory_ro() and set_memory_x() basically end up doing the exact
same thing, just with different bits. So it's not only silly to have
the callers do two different calls, it's silly to have the
*implementation* do two different scans of the page tables and page
merging/splitting.

I think in the case of x86, the set_memory_rox() function would
basically just be

	int set_memory_rox(unsigned long addr, int numpages)
	{
		pgprot_t clr = __pgprot(_PAGE_RW);
		pgprot_t set = { 0 };

		if (__supported_pte_mask & _PAGE_NX)
			set.pgprot |= _PAGE_NX;

		return change_page_attr_set_clr(&addr, numpages, set, clr, 0, NULL);
	}

or something close to that.

(NOTE! The above was cobbled together in the MUA, hasn't seen a
compiler much less been booted, and might be completely broken for some
reason - it's meant to be the concept, not some kind of final word).

IOW, the whole "set_rox" is really just a _single_
change_page_attr_set_clr() call.

Maybe you meant to do that, and this patch was just prep-work for the
arch code being the second stage?

                Linus
On Wed, Oct 26, 2022 at 10:59:29AM -0700, Linus Torvalds wrote:

> Maybe you meant to do that, and this patch was just prep-work for the
> arch code being the second stage?

Yeah; also, since this is cross arch, we need a fallback. Anyway;
robots hated on me for missing a few includes. I'll go prod at this
more.
On Thu, Oct 27, 2022 at 08:59:45AM +0200, Peter Zijlstra wrote:
> On Wed, Oct 26, 2022 at 10:59:29AM -0700, Linus Torvalds wrote:
>
> > Maybe you meant to do that, and this patch was just prep-work for the
> > arch code being the second stage?
>
> Yeah; also, since this is cross arch, we need a fallback. Anyway;
> robots hated on me for missing a few includes. I'll go prod at this
> more.

Got around to it; I added the below patch on top and things seem to
still boot, so it must be good :-)

---
Subject: x86/mm: Implement native set_memory_rox()
From: Peter Zijlstra <peterz@infradead.org>
Date: Sat Oct 29 13:19:31 CEST 2022

Provide a native implementation of set_memory_rox(), avoiding the
double set_memory_ro();set_memory_x(); calls.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/set_memory.h |    3 +++
 arch/x86/mm/pat/set_memory.c      |   10 ++++++++++
 include/linux/set_memory.h        |    2 ++
 3 files changed, 15 insertions(+)

--- a/arch/x86/include/asm/set_memory.h
+++ b/arch/x86/include/asm/set_memory.h
@@ -6,6 +6,9 @@
 #include <asm/page.h>
 #include <asm-generic/set_memory.h>
 
+#define set_memory_rox set_memory_rox
+int set_memory_rox(unsigned long addr, int numpages);
+
 /*
  * The set_memory_* API can be used to change various attributes of a virtual
  * address range. The attributes include:
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -2025,6 +2025,16 @@ int set_memory_ro(unsigned long addr, in
 	return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_RW), 0);
 }
 
+int set_memory_rox(unsigned long addr, int numpages)
+{
+	pgprot_t clr = __pgprot(_PAGE_RW);
+
+	if (__supported_pte_mask & _PAGE_NX)
+		clr.pgprot |= _PAGE_NX;
+
+	return change_page_attr_clear(&addr, numpages, clr, 0);
+}
+
 int set_memory_rw(unsigned long addr, int numpages)
 {
 	return change_page_attr_set(&addr, numpages, __pgprot(_PAGE_RW), 0);
--- a/include/linux/set_memory.h
+++ b/include/linux/set_memory.h
@@ -14,6 +14,7 @@ static inline int set_memory_x(unsigned
 static inline int set_memory_nx(unsigned long addr, int numpages) { return 0; }
 #endif
 
+#ifndef set_memory_rox
 static inline int set_memory_rox(unsigned long addr, int numpages)
 {
 	int ret = set_memory_ro(addr, numpages);
@@ -21,6 +22,7 @@ static inline int set_memory_rox(unsigne
 	if (ret)
 		return ret;
 	return set_memory_x(addr, numpages);
 }
+#endif
 
 #ifndef CONFIG_ARCH_HAS_SET_DIRECT_MAP
 static inline int set_direct_map_invalid_noflush(struct page *page)
On Sat, Oct 29, 2022 at 4:30 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> Got around to it; I added the below patch on top and things seem to
> still boot, so it must be good :-)

Thanks, looks good to me, and as I see your simpler version, I realized
that my broken MUA version wasn't "rox", it was "ronx".

                Linus
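The "rox" vs. "ronx" difference comes down to which CPA mask _PAGE_NX lands
in: on x86 a set NX bit means *no-execute*, so Linus's MUA sketch, which put
_PAGE_NX in the set-mask, would have made the range read-only and
non-executable, while Peter's patch clears it. A minimal side-by-side sketch,
using the helpers as they appear in the two mails above (illustrative only,
not the final kernel code):

	/* The MUA version: _PAGE_NX lands in the SET mask, so the range
	 * becomes read-only AND non-executable -- "ronx", not "rox". */
	int set_memory_ronx_sketch(unsigned long addr, int numpages)
	{
		pgprot_t clr = __pgprot(_PAGE_RW);
		pgprot_t set = __pgprot(_PAGE_NX);

		return change_page_attr_set_clr(&addr, numpages, set, clr, 0, NULL);
	}

	/* The final version: _PAGE_NX joins _PAGE_RW in the CLEAR mask, so
	 * the range becomes read-only and executable -- the intended "rox". */
	int set_memory_rox_sketch(unsigned long addr, int numpages)
	{
		pgprot_t clr = __pgprot(_PAGE_RW | _PAGE_NX);

		return change_page_attr_clear(&addr, numpages, clr, 0);
	}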