From patchwork Thu Oct 5 07:20:16 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: tip-bot2 for Thomas Gleixner X-Patchwork-Id: 148912 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:612c:2016:b0:403:3b70:6f57 with SMTP id fe22csp427039vqb; Thu, 5 Oct 2023 09:44:08 -0700 (PDT) X-Google-Smtp-Source: AGHT+IE0ZFkyK3tEApQH5a871W0ueViYZEGqjZiftOg1JXAKNSGcfKjUSBPrOQ+EsOk5M8Gs6QKX X-Received: by 2002:a05:6358:5e12:b0:13e:d853:cc22 with SMTP id q18-20020a0563585e1200b0013ed853cc22mr4732676rwn.4.1696524248174; Thu, 05 Oct 2023 09:44:08 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1696524248; cv=none; d=google.com; s=arc-20160816; b=qsyXSsTSwRNzYFPGT9+CDz6wxI890l0TyhIXfcAQ0M0+bU0lHbtI6sVAik3nNNUvPi DsWHn+Fkml2DDEacyb+HLyOHeKSlzqrL/jopsEokeCXlquW1KNLPCfzSvHgLyVoLJPE2 25/bMh1LnWnnaaA9gPL8/UDSFqYyoAuDjKRZiT8e+ayX7kFFRsOkOafQiRElfnuXi8Df +yNi+sXHs/nUEtxm3zpsWYf07n3T43nsIiAKBDJk7AOSp4n9sElHMvJGVZyl27vO5MPB gsB0dvnhKTbsC1uJ6MHkJpmoYimWBW+KFFHlG+sKX1EivTSh7ke9FVEfP0X8+aJnqYBJ d0qQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:robot-unsubscribe :robot-id:message-id:mime-version:references:in-reply-to:cc:subject :to:reply-to:sender:from:dkim-signature:dkim-signature:date; bh=RCaK47zDCHa+WUW6K8uJlvzYxhvz2WFUP53che6f/SQ=; fh=Jx3zaC5Gf5qk8UrUQDt8JpzrCBJuUiWdRlWrlpLNX5w=; b=EPBszlqFoSSLZkuawWzoGz8tC9Az5NZbs4m7jRWxPDmIbcTUKbZVNOeaYViRMyBQX+ AnPKRhGu0h3E1uNqENGNJC8nXiCSFeCBTnEQFvnQkPCXFFVL5gDEF5rZs8+mQsNFmWCC cTdf6PHqQefm5oFRbL5O42p+YcQHRooHA16HWd5orxRjWhpesYejkptTsZAwIxUNXy8N Pu9wpxzcjWepNT/ZYv8vD71dOf9FF7+pul1N2hrJNcalqmdVod9WxTPcSBhcIn90K2Mm 3pZOJt0hLh1i/Fkxgw8YlEBOPp1RVn/ABlu40RXcvNeLDdbAu5bQ+E1SJThlueesQCm+ Gdyw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linutronix.de header.s=2020 header.b=04YeYcBO; dkim=neutral (no key) header.i=@linutronix.de; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.36 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=linutronix.de Received: from pete.vger.email (pete.vger.email. [23.128.96.36]) by mx.google.com with ESMTPS id p6-20020a63f446000000b00578a9ea5686si1753925pgk.660.2023.10.05.09.44.07 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 05 Oct 2023 09:44:08 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.36 as permitted sender) client-ip=23.128.96.36; Authentication-Results: mx.google.com; dkim=pass header.i=@linutronix.de header.s=2020 header.b=04YeYcBO; dkim=neutral (no key) header.i=@linutronix.de; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.36 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=linutronix.de Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by pete.vger.email (Postfix) with ESMTP id B6E5784D1220; Thu, 5 Oct 2023 09:43:38 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at pete.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229541AbjJEQit (ORCPT + 19 others); Thu, 5 Oct 2023 12:38:49 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40690 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231516AbjJEQiS (ORCPT ); Thu, 5 Oct 2023 12:38:18 -0400 Received: from galois.linutronix.de (Galois.linutronix.de [IPv6:2a0a:51c0:0:12e:550::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5C1217A8F; Thu, 5 Oct 2023 00:20:21 -0700 (PDT) Date: Thu, 05 Oct 2023 07:20:16 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1696490417; h=from:from:sender:sender:reply-to:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=RCaK47zDCHa+WUW6K8uJlvzYxhvz2WFUP53che6f/SQ=; b=04YeYcBOoWjyk8/OvJFvsZDHE/6k7k9YhetsTLLGy7GxKFgecly1oRh7BpgrwA3CM081JO +rhYbyOslujZxT0wx+J025OdaWaMx+LyrpE8Dx9Db5AsORAilcUaxOwBKFyHwAolz81Qva x9CyOmeFl0ZLacVYv0bTZjTM50uDpCzl9yVgE6Qq01QuRjktKIWkBM47MO3pZH0YhGGGhy k/9fR2CtEKTe1fy7PlmJCNyMKZDo/nziAP8a3oaw2yDhwC3sjpHhyqLhbpBV5QCgvVjxSl euyezZzavAGBrW01a0/U1+iWUy6IruMVT1/CPmKjcqdOr3yi43Kf8Qy6WKxmrQ== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1696490417; h=from:from:sender:sender:reply-to:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=RCaK47zDCHa+WUW6K8uJlvzYxhvz2WFUP53che6f/SQ=; b=bdBu45+tykw16T2IiPNtLkS4ex8eSEjydy9tgHvUZPnsIUzy+xkDKUAov+UFub+EdVKQZJ 2qxiokcOL2nLFdCg== From: "tip-bot2 for Uros Bizjak" Sender: tip-bot2@linutronix.de Reply-to: linux-kernel@vger.kernel.org To: linux-tip-commits@vger.kernel.org Subject: [tip: x86/percpu] x86/percpu: Use C for percpu read/write accessors Cc: Nadav Amit , Uros Bizjak , Ingo Molnar , Andy Lutomirski , Brian Gerst , Denys Vlasenko , "H. Peter Anvin" , Linus Torvalds , Peter Zijlstra , Thomas Gleixner , Josh Poimboeuf , x86@kernel.org, linux-kernel@vger.kernel.org In-Reply-To: <20231004192404.31733-1-ubizjak@gmail.com> References: <20231004192404.31733-1-ubizjak@gmail.com> MIME-Version: 1.0 Message-ID: <169649041650.3135.17524840710193792958.tip-bot2@tip-bot2> Robot-ID: Robot-Unsubscribe: Contact to get blacklisted from these emails X-Spam-Status: No, score=-0.8 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on pete.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (pete.vger.email [0.0.0.0]); Thu, 05 Oct 2023 09:43:38 -0700 (PDT) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1778836974545343812 X-GMAIL-MSGID: 1778934610428037323 The following commit has been merged into the x86/percpu branch of tip: Commit-ID: ca4256348660cb2162668ec3d13d1f921d05374a Gitweb: https://git.kernel.org/tip/ca4256348660cb2162668ec3d13d1f921d05374a Author: Uros Bizjak AuthorDate: Wed, 04 Oct 2023 21:23:08 +02:00 Committer: Ingo Molnar CommitterDate: Thu, 05 Oct 2023 09:01:53 +02:00 x86/percpu: Use C for percpu read/write accessors The percpu code mostly uses inline assembly. Using segment qualifiers allows to use C code instead, which enables the compiler to perform various optimizations (e.g. propagation of memory arguments). Convert percpu read and write accessors to C code, so the memory argument can be propagated to the instruction that uses this argument. Some examples of propagations: a) into sign/zero extensions: the code improves from: 65 8a 05 00 00 00 00 mov %gs:0x0(%rip),%al 0f b6 c0 movzbl %al,%eax to: 65 0f b6 05 00 00 00 movzbl %gs:0x0(%rip),%eax 00 and in a similar way for: movzbl %gs:0x0(%rip),%edx movzwl %gs:0x0(%rip),%esi movzbl %gs:0x78(%rbx),%eax movslq %gs:0x0(%rip),%rdx movslq %gs:(%rdi),%rbx b) into compares: the code improves from: 65 8b 05 00 00 00 00 mov %gs:0x0(%rip),%eax a9 00 00 0f 00 test $0xf0000,%eax to: 65 f7 05 00 00 00 00 testl $0xf0000,%gs:0x0(%rip) 00 00 0f 00 and in a similar way for: testl $0xf0000,%gs:0x0(%rip) testb $0x1,%gs:0x0(%rip) testl $0xff00,%gs:0x0(%rip) cmpb $0x0,%gs:0x0(%rip) cmp %gs:0x0(%rip),%r14d cmpw $0x8,%gs:0x0(%rip) cmpb $0x0,%gs:(%rax) c) into other insns: the code improves from: 1a355: 83 fa ff cmp $0xffffffff,%edx 1a358: 75 07 jne 1a361 <...> 1a35a: 65 8b 15 00 00 00 00 mov %gs:0x0(%rip),%edx 1a361: to: 1a35a: 83 fa ff cmp $0xffffffff,%edx 1a35d: 65 0f 44 15 00 00 00 cmove %gs:0x0(%rip),%edx 1a364: 00 The above propagations result in the following code size improvements for current mainline kernel (with the default config), compiled with: # gcc (GCC) 12.3.1 20230508 (Red Hat 12.3.1-1) text data bss dec filename 25508862 4386540 808388 30703790 vmlinux-vanilla.o 25500922 4386532 808388 30695842 vmlinux-new.o Co-developed-by: Nadav Amit Signed-off-by: Nadav Amit Signed-off-by: Uros Bizjak Signed-off-by: Ingo Molnar Cc: Andy Lutomirski Cc: Brian Gerst Cc: Denys Vlasenko Cc: H. Peter Anvin Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Josh Poimboeuf Link: https://lore.kernel.org/r/20231004192404.31733-1-ubizjak@gmail.com --- arch/x86/include/asm/percpu.h | 65 ++++++++++++++++++++++++++++------ 1 file changed, 54 insertions(+), 11 deletions(-) diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h index da45120..60ea775 100644 --- a/arch/x86/include/asm/percpu.h +++ b/arch/x86/include/asm/percpu.h @@ -400,13 +400,66 @@ do { \ #define this_cpu_read_stable_8(pcp) percpu_stable_op(8, "mov", pcp) #define this_cpu_read_stable(pcp) __pcpu_size_call_return(this_cpu_read_stable_, pcp) +#ifdef CONFIG_USE_X86_SEG_SUPPORT + +#define __raw_cpu_read(qual, pcp) \ +({ \ + *(qual __my_cpu_type(pcp) *)__my_cpu_ptr(&(pcp)); \ +}) + +#define __raw_cpu_write(qual, pcp, val) \ +do { \ + *(qual __my_cpu_type(pcp) *)__my_cpu_ptr(&(pcp)) = (val); \ +} while (0) + +#define raw_cpu_read_1(pcp) __raw_cpu_read(, pcp) +#define raw_cpu_read_2(pcp) __raw_cpu_read(, pcp) +#define raw_cpu_read_4(pcp) __raw_cpu_read(, pcp) +#define raw_cpu_write_1(pcp, val) __raw_cpu_write(, pcp, val) +#define raw_cpu_write_2(pcp, val) __raw_cpu_write(, pcp, val) +#define raw_cpu_write_4(pcp, val) __raw_cpu_write(, pcp, val) + +#define this_cpu_read_1(pcp) __raw_cpu_read(volatile, pcp) +#define this_cpu_read_2(pcp) __raw_cpu_read(volatile, pcp) +#define this_cpu_read_4(pcp) __raw_cpu_read(volatile, pcp) +#define this_cpu_write_1(pcp, val) __raw_cpu_write(volatile, pcp, val) +#define this_cpu_write_2(pcp, val) __raw_cpu_write(volatile, pcp, val) +#define this_cpu_write_4(pcp, val) __raw_cpu_write(volatile, pcp, val) + +#ifdef CONFIG_X86_64 +#define raw_cpu_read_8(pcp) __raw_cpu_read(, pcp) +#define raw_cpu_write_8(pcp, val) __raw_cpu_write(, pcp, val) + +#define this_cpu_read_8(pcp) __raw_cpu_read(volatile, pcp) +#define this_cpu_write_8(pcp, val) __raw_cpu_write(volatile, pcp, val) +#endif + +#else /* CONFIG_USE_X86_SEG_SUPPORT */ + #define raw_cpu_read_1(pcp) percpu_from_op(1, , "mov", pcp) #define raw_cpu_read_2(pcp) percpu_from_op(2, , "mov", pcp) #define raw_cpu_read_4(pcp) percpu_from_op(4, , "mov", pcp) - #define raw_cpu_write_1(pcp, val) percpu_to_op(1, , "mov", (pcp), val) #define raw_cpu_write_2(pcp, val) percpu_to_op(2, , "mov", (pcp), val) #define raw_cpu_write_4(pcp, val) percpu_to_op(4, , "mov", (pcp), val) + +#define this_cpu_read_1(pcp) percpu_from_op(1, volatile, "mov", pcp) +#define this_cpu_read_2(pcp) percpu_from_op(2, volatile, "mov", pcp) +#define this_cpu_read_4(pcp) percpu_from_op(4, volatile, "mov", pcp) +#define this_cpu_write_1(pcp, val) percpu_to_op(1, volatile, "mov", (pcp), val) +#define this_cpu_write_2(pcp, val) percpu_to_op(2, volatile, "mov", (pcp), val) +#define this_cpu_write_4(pcp, val) percpu_to_op(4, volatile, "mov", (pcp), val) + +#ifdef CONFIG_X86_64 +#define raw_cpu_read_8(pcp) percpu_from_op(8, , "mov", pcp) +#define raw_cpu_write_8(pcp, val) percpu_to_op(8, , "mov", (pcp), val) + +#define this_cpu_read_8(pcp) percpu_from_op(8, volatile, "mov", pcp) +#define this_cpu_write_8(pcp, val) percpu_to_op(8, volatile, "mov", (pcp), val) +#endif + +#endif /* CONFIG_USE_X86_SEG_SUPPORT */ + #define raw_cpu_add_1(pcp, val) percpu_add_op(1, , (pcp), val) #define raw_cpu_add_2(pcp, val) percpu_add_op(2, , (pcp), val) #define raw_cpu_add_4(pcp, val) percpu_add_op(4, , (pcp), val) @@ -432,12 +485,6 @@ do { \ #define raw_cpu_xchg_2(pcp, val) raw_percpu_xchg_op(pcp, val) #define raw_cpu_xchg_4(pcp, val) raw_percpu_xchg_op(pcp, val) -#define this_cpu_read_1(pcp) percpu_from_op(1, volatile, "mov", pcp) -#define this_cpu_read_2(pcp) percpu_from_op(2, volatile, "mov", pcp) -#define this_cpu_read_4(pcp) percpu_from_op(4, volatile, "mov", pcp) -#define this_cpu_write_1(pcp, val) percpu_to_op(1, volatile, "mov", (pcp), val) -#define this_cpu_write_2(pcp, val) percpu_to_op(2, volatile, "mov", (pcp), val) -#define this_cpu_write_4(pcp, val) percpu_to_op(4, volatile, "mov", (pcp), val) #define this_cpu_add_1(pcp, val) percpu_add_op(1, volatile, (pcp), val) #define this_cpu_add_2(pcp, val) percpu_add_op(2, volatile, (pcp), val) #define this_cpu_add_4(pcp, val) percpu_add_op(4, volatile, (pcp), val) @@ -476,8 +523,6 @@ do { \ * 32 bit must fall back to generic operations. */ #ifdef CONFIG_X86_64 -#define raw_cpu_read_8(pcp) percpu_from_op(8, , "mov", pcp) -#define raw_cpu_write_8(pcp, val) percpu_to_op(8, , "mov", (pcp), val) #define raw_cpu_add_8(pcp, val) percpu_add_op(8, , (pcp), val) #define raw_cpu_and_8(pcp, val) percpu_to_op(8, , "and", (pcp), val) #define raw_cpu_or_8(pcp, val) percpu_to_op(8, , "or", (pcp), val) @@ -486,8 +531,6 @@ do { \ #define raw_cpu_cmpxchg_8(pcp, oval, nval) percpu_cmpxchg_op(8, , pcp, oval, nval) #define raw_cpu_try_cmpxchg_8(pcp, ovalp, nval) percpu_try_cmpxchg_op(8, , pcp, ovalp, nval) -#define this_cpu_read_8(pcp) percpu_from_op(8, volatile, "mov", pcp) -#define this_cpu_write_8(pcp, val) percpu_to_op(8, volatile, "mov", (pcp), val) #define this_cpu_add_8(pcp, val) percpu_add_op(8, volatile, (pcp), val) #define this_cpu_and_8(pcp, val) percpu_to_op(8, volatile, "and", (pcp), val) #define this_cpu_or_8(pcp, val) percpu_to_op(8, volatile, "or", (pcp), val)