Message ID | 20231004145137.86537-5-ubizjak@gmail.com |
---|---|
State | New |
Headers | From: Uros Bizjak <ubizjak@gmail.com>
To: x86@kernel.org, linux-kernel@vger.kernel.org
Cc: Uros Bizjak <ubizjak@gmail.com>, Andy Lutomirski <luto@kernel.org>, Ingo Molnar <mingo@kernel.org>, Nadav Amit <namit@vmware.com>, Brian Gerst <brgerst@gmail.com>, Denys Vlasenko <dvlasenk@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>, Linus Torvalds <torvalds@linux-foundation.org>, Peter Zijlstra <peterz@infradead.org>, Thomas Gleixner <tglx@linutronix.de>, Borislav Petkov <bp@alien8.de>, Josh Poimboeuf <jpoimboe@redhat.com>
Subject: [PATCH 4/4] x86/percpu: Use C for percpu read/write accessors
Date: Wed, 4 Oct 2023 16:49:44 +0200
Message-ID: <20231004145137.86537-5-ubizjak@gmail.com>
In-Reply-To: <20231004145137.86537-1-ubizjak@gmail.com>
References: <20231004145137.86537-1-ubizjak@gmail.com> |
Series | x86/percpu: Use segment qualifiers |
Commit Message
Uros Bizjak
Oct. 4, 2023, 2:49 p.m. UTC
The percpu code mostly uses inline assembly. Using segment qualifiers
allows the use of C code instead, which enables the compiler to perform
various optimizations (e.g. propagation of memory arguments). Convert
percpu read and write accessors to C code, so the memory argument can
be propagated to the instruction that uses this argument.

Some examples of propagations:

a) into sign/zero extensions:

 110b54:       65 0f b6 05 00 00 00    movzbl %gs:0x0(%rip),%eax
 11ab90:       65 0f b6 15 00 00 00    movzbl %gs:0x0(%rip),%edx
 14484a:       65 0f b7 35 00 00 00    movzwl %gs:0x0(%rip),%esi
 1a08a9:       65 0f b6 43 78          movzbl %gs:0x78(%rbx),%eax
 1a08f9:       65 0f b6 43 78          movzbl %gs:0x78(%rbx),%eax

 4ab29a:       65 48 63 15 00 00 00    movslq %gs:0x0(%rip),%rdx
 4be128:       65 4c 63 25 00 00 00    movslq %gs:0x0(%rip),%r12
 547468:       65 48 63 1f             movslq %gs:(%rdi),%rbx
 5474e7:       65 48 63 0a             movslq %gs:(%rdx),%rcx
 54d05d:       65 48 63 0d 00 00 00    movslq %gs:0x0(%rip),%rcx

b) into compares:

 b40804:       65 f7 05 00 00 00 00    testl  $0xf0000,%gs:0x0(%rip)
 b487e8:       65 f7 05 00 00 00 00    testl  $0xf0000,%gs:0x0(%rip)
 b6f14c:       65 f6 05 00 00 00 00    testb  $0x1,%gs:0x0(%rip)
 bac1b8:       65 f6 05 00 00 00 00    testb  $0x1,%gs:0x0(%rip)
 df2244:       65 f7 05 00 00 00 00    testl  $0xff00,%gs:0x0(%rip)

 9a7517:       65 80 3d 00 00 00 00    cmpb   $0x0,%gs:0x0(%rip)
 b282ba:       65 44 3b 35 00 00 00    cmp    %gs:0x0(%rip),%r14d
 b48f61:       65 66 83 3d 00 00 00    cmpw   $0x8,%gs:0x0(%rip)
 b493fe:       65 80 38 00             cmpb   $0x0,%gs:(%rax)
 b73867:       65 66 83 3d 00 00 00    cmpw   $0x8,%gs:0x0(%rip)

c) into other insns:

 65ec02:       65 0f 44 15 00 00 00    cmove  %gs:0x0(%rip),%edx
 6c98ac:       65 0f 44 15 00 00 00    cmove  %gs:0x0(%rip),%edx
 9aafaf:       65 0f 44 15 00 00 00    cmove  %gs:0x0(%rip),%edx
 b45868:       65 0f 48 35 00 00 00    cmovs  %gs:0x0(%rip),%esi
 d276f8:       65 0f 44 15 00 00 00    cmove  %gs:0x0(%rip),%edx

The above propagations result in the following code size improvements
for the current mainline kernel (with the default config), compiled with
gcc (GCC) 12.3.1 20230508 (Red Hat 12.3.1-1):

     text    data     bss      dec     hex filename
 25508862 4386540  808388 30703790 1d480ae vmlinux-vanilla.o
 25500922 4386532  808388 30695842 1d461a2 vmlinux-new.o

The conversion of other read-modify-write instructions does not bring
us any benefits; the compiler has some problems when constructing RMW
instructions from the generic code and easily misses some opportunities.

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Co-developed-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
---
 arch/x86/include/asm/percpu.h | 65 +++++++++++++++++++++++++++++------
 1 file changed, 54 insertions(+), 11 deletions(-)
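[Editorial note: to make the propagation effect concrete, here is a minimal
codegen sketch, not taken from the patch. It assumes x86-64 and GCC >= 6
(or a clang with named-address-space support); the identifiers are
illustrative, and the code is meant for inspecting -O2 assembly output,
not for execution, since %gs must point at a real per-CPU area first.]

/* Stand-in for a per-cpu variable. */
unsigned char counter;

unsigned int read_asm(void)
{
	unsigned char v;

	/* The asm is opaque to the compiler: a MOVB is forced... */
	asm("movb %%gs:%1, %0" : "=q" (v) : "m" (counter));
	/* ...and the widening MOVZBL is emitted as a separate insn. */
	return v;
}

unsigned int read_c(void)
{
	/* A visible load lets the compiler fold the access and the
	 * widening into a single "movzbl %gs:counter(%rip),%eax". */
	return *(__seg_gs unsigned char *)(unsigned long)&counter;
}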
Comments
* Uros Bizjak <ubizjak@gmail.com> wrote:

> The percpu code mostly uses inline assembly. Using segment qualifiers
> allows the use of C code instead, which enables the compiler to perform
> various optimizations (e.g. propagation of memory arguments). Convert
> percpu read and write accessors to C code, so the memory argument can
> be propagated to the instruction that uses this argument.
>
> Some examples of propagations:
>
> a) into sign/zero extensions:
>
>  110b54:       65 0f b6 05 00 00 00    movzbl %gs:0x0(%rip),%eax
>  11ab90:       65 0f b6 15 00 00 00    movzbl %gs:0x0(%rip),%edx
>  14484a:       65 0f b7 35 00 00 00    movzwl %gs:0x0(%rip),%esi
>  1a08a9:       65 0f b6 43 78          movzbl %gs:0x78(%rbx),%eax
>  1a08f9:       65 0f b6 43 78          movzbl %gs:0x78(%rbx),%eax
>
>  4ab29a:       65 48 63 15 00 00 00    movslq %gs:0x0(%rip),%rdx
>  4be128:       65 4c 63 25 00 00 00    movslq %gs:0x0(%rip),%r12
>  547468:       65 48 63 1f             movslq %gs:(%rdi),%rbx
>  5474e7:       65 48 63 0a             movslq %gs:(%rdx),%rcx
>  54d05d:       65 48 63 0d 00 00 00    movslq %gs:0x0(%rip),%rcx

Could you please also quote a 'before' assembly sequence, at least once
per group of propagations? Ie. readers will be able to see what kind of
code generation changes result in this kind of text size reduction:

>      text    data     bss      dec     hex filename
>  25508862 4386540  808388 30703790 1d480ae vmlinux-vanilla.o
>  25500922 4386532  808388 30695842 1d461a2 vmlinux-new.o

Thanks,

	Ingo
* Ingo Molnar <mingo@kernel.org> wrote:

> * Uros Bizjak <ubizjak@gmail.com> wrote:
>
> > The percpu code mostly uses inline assembly. Using segment qualifiers
> > allows the use of C code instead, which enables the compiler to perform
> > various optimizations (e.g. propagation of memory arguments). Convert
> > percpu read and write accessors to C code, so the memory argument can
> > be propagated to the instruction that uses this argument.
> >
> > Some examples of propagations:
> >
> > a) into sign/zero extensions:
> >
> >  110b54:       65 0f b6 05 00 00 00    movzbl %gs:0x0(%rip),%eax
> >  11ab90:       65 0f b6 15 00 00 00    movzbl %gs:0x0(%rip),%edx
> >  14484a:       65 0f b7 35 00 00 00    movzwl %gs:0x0(%rip),%esi
> >  1a08a9:       65 0f b6 43 78          movzbl %gs:0x78(%rbx),%eax
> >  1a08f9:       65 0f b6 43 78          movzbl %gs:0x78(%rbx),%eax
> >
> >  4ab29a:       65 48 63 15 00 00 00    movslq %gs:0x0(%rip),%rdx
> >  4be128:       65 4c 63 25 00 00 00    movslq %gs:0x0(%rip),%r12
> >  547468:       65 48 63 1f             movslq %gs:(%rdi),%rbx
> >  5474e7:       65 48 63 0a             movslq %gs:(%rdx),%rcx
> >  54d05d:       65 48 63 0d 00 00 00    movslq %gs:0x0(%rip),%rcx
>
> Could you please also quote a 'before' assembly sequence, at least once
> per group of propagations?

Ie. for any changes to x86 code generation, please follow the changelog
format of:

  7c097ca50d2b ("x86/percpu: Do not clobber %rsi in percpu_{try_,}cmpxchg{64,128}_op")

    ...
    Move the load of %rsi outside inline asm, so the compiler can reuse
    the value. The code in slub.o improves from:

	55ac:   49 8b 3c 24             mov    (%r12),%rdi
	55b0:   48 8d 4a 40             lea    0x40(%rdx),%rcx
	55b4:   49 8b 1c 07             mov    (%r15,%rax,1),%rbx
	55b8:   4c 89 f8                mov    %r15,%rax
	55bb:   48 8d 37                lea    (%rdi),%rsi
	55be:   e8 00 00 00 00          callq  55c3 <...>
			55bf: R_X86_64_PLT32    this_cpu_cmpxchg16b_emu-0x4
	55c3:   75 a3                   jne    5568 <...>
	55c5:   ...

	0000000000000000 <.altinstr_replacement>:
	5:      65 48 0f c7 0f          cmpxchg16b %gs:(%rdi)

    to:

	55ac:   49 8b 34 24             mov    (%r12),%rsi
	55b0:   48 8d 4a 40             lea    0x40(%rdx),%rcx
	55b4:   49 8b 1c 07             mov    (%r15,%rax,1),%rbx
	55b8:   4c 89 f8                mov    %r15,%rax
	55bb:   e8 00 00 00 00          callq  55c0 <...>
			55bc: R_X86_64_PLT32    this_cpu_cmpxchg16b_emu-0x4
	55c0:   75 a6                   jne    5568 <...>
	55c2:   ...

    Where the alternative replacement instruction now uses %rsi:

	0000000000000000 <.altinstr_replacement>:
	5:      65 48 0f c7 0e          cmpxchg16b %gs:(%rsi)

    The instruction (effectively a reg-reg move) at 55bb: in the
    original assembly is removed. Also, both the CALL and replacement
    CMPXCHG16B are 5 bytes long, removing the need for NOPs in the asm
    code.
    ...

Thanks,

	Ingo
On Wed, 4 Oct 2023 at 07:51, Uros Bizjak <ubizjak@gmail.com> wrote:
>
> The percpu code mostly uses inline assembly. Using segment qualifiers
> allows the use of C code instead, which enables the compiler to perform
> various optimizations (e.g. propagation of memory arguments). Convert
> percpu read and write accessors to C code, so the memory argument can
> be propagated to the instruction that uses this argument.

So apparently this causes boot failures.

It might be worth testing a version where this:

> +#define raw_cpu_read_1(pcp)		__raw_cpu_read(, pcp)
> +#define raw_cpu_read_2(pcp)		__raw_cpu_read(, pcp)
> +#define raw_cpu_read_4(pcp)		__raw_cpu_read(, pcp)
> +#define raw_cpu_write_1(pcp, val)	__raw_cpu_write(, pcp, val)
> +#define raw_cpu_write_2(pcp, val)	__raw_cpu_write(, pcp, val)
> +#define raw_cpu_write_4(pcp, val)	__raw_cpu_write(, pcp, val)

and this

> +#ifdef CONFIG_X86_64
> +#define raw_cpu_read_8(pcp)		__raw_cpu_read(, pcp)
> +#define raw_cpu_write_8(pcp, val)	__raw_cpu_write(, pcp, val)

was all using 'volatile' in the qualifier argument and see if that
makes the boot failure go away.

Because while the old code wasn't "asm volatile", even just a *plain*
asm() is certainly a lot more serialized than a normal access.

For example, the asm() version of raw_cpu_write() used "+m" for the
destination modifier, which means that if you did multiple percpu
writes to the same variable, gcc would output multiple asm calls,
because it would see the subsequent ones as reading the old value
(even if they don't *actually* do so).

That's admittedly really just because it uses a common macro for
raw_cpu_write() and the updates (like the percpu_add() code), so the
fact that it uses "+m" instead of "=m" is just a random odd artifact
of the inline asm version, but maybe we have code that ends up working
just by accident.

Also, I'm not sure gcc re-orders asms wrt each other - even when they
aren't marked volatile.

So it might be worth at least a trivial "make everything volatile"
test to see if that affects anything.

	Linus
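[Editorial note: the experiment Linus proposes amounts to a one-token
change per accessor - pass volatile as the qual argument of the new
__raw_cpu_read()/__raw_cpu_write() macros from the patch, so even the
raw_ variants get volatile access semantics. A hypothetical sketch of
that test build, not part of the posted series:]

/* "Make everything volatile" test: qualify the raw_ accessors too, so
 * every percpu load/store keeps the ordering the old asm() versions
 * implicitly provided. */
#define raw_cpu_read_1(pcp)		__raw_cpu_read(volatile, pcp)
#define raw_cpu_read_2(pcp)		__raw_cpu_read(volatile, pcp)
#define raw_cpu_read_4(pcp)		__raw_cpu_read(volatile, pcp)
#define raw_cpu_write_1(pcp, val)	__raw_cpu_write(volatile, pcp, val)
#define raw_cpu_write_2(pcp, val)	__raw_cpu_write(volatile, pcp, val)
#define raw_cpu_write_4(pcp, val)	__raw_cpu_write(volatile, pcp, val)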
On Sun, Oct 8, 2023 at 8:00 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Wed, 4 Oct 2023 at 07:51, Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > The percpu code mostly uses inline assembly. Using segment qualifiers
> > allows the use of C code instead, which enables the compiler to perform
> > various optimizations (e.g. propagation of memory arguments). Convert
> > percpu read and write accessors to C code, so the memory argument can
> > be propagated to the instruction that uses this argument.
>
> So apparently this causes boot failures.
>
> It might be worth testing a version where this:
>
> > +#define raw_cpu_read_1(pcp)		__raw_cpu_read(, pcp)
> > +#define raw_cpu_read_2(pcp)		__raw_cpu_read(, pcp)
> > +#define raw_cpu_read_4(pcp)		__raw_cpu_read(, pcp)
> > +#define raw_cpu_write_1(pcp, val)	__raw_cpu_write(, pcp, val)
> > +#define raw_cpu_write_2(pcp, val)	__raw_cpu_write(, pcp, val)
> > +#define raw_cpu_write_4(pcp, val)	__raw_cpu_write(, pcp, val)
>
> and this
>
> > +#ifdef CONFIG_X86_64
> > +#define raw_cpu_read_8(pcp)		__raw_cpu_read(, pcp)
> > +#define raw_cpu_write_8(pcp, val)	__raw_cpu_write(, pcp, val)
>
> was all using 'volatile' in the qualifier argument and see if that
> makes the boot failure go away.
>
> Because while the old code wasn't "asm volatile", even just a *plain*
> asm() is certainly a lot more serialized than a normal access.
>
> For example, the asm() version of raw_cpu_write() used "+m" for the
> destination modifier, which means that if you did multiple percpu
> writes to the same variable, gcc would output multiple asm calls,
> because it would see the subsequent ones as reading the old value
> (even if they don't *actually* do so).
>
> That's admittedly really just because it uses a common macro for
> raw_cpu_write() and the updates (like the percpu_add() code), so the
> fact that it uses "+m" instead of "=m" is just a random odd artifact
> of the inline asm version, but maybe we have code that ends up working
> just by accident.
>
> Also, I'm not sure gcc re-orders asms wrt each other - even when they
> aren't marked volatile.
>
> So it might be worth at least a trivial "make everything volatile"
> test to see if that affects anything.

I have managed to reproduce the bug, and I'm trying the following path:
scrap the last patch and just add:

#define __raw_cpu_read_new(size, qual, pcp)				\
({									\
	*(qual __my_cpu_type(pcp) *)__my_cpu_ptr(&(pcp));		\
})

#define __raw_cpu_read(size, qual, _var)				\
({									\
	__pcpu_type_##size pfo_val__;					\
	asm qual (__pcpu_op2_##size("mov", __percpu_arg([var]), "%[val]") \
	    : [val] __pcpu_reg_##size("=", pfo_val__)			\
	    : [var] "m" (__my_cpu_var(_var)));				\
	(typeof(_var))(unsigned long) pfo_val__;			\
})

Then changing *only*

#define raw_cpu_read_8(pcp)		__raw_cpu_read_new(8, , pcp)

brings the boot process far enough to report:

[    0.463711][    T0] Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes, linear)
[    0.465859][    T0] Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes, linear)
PANIC: early exception 0x0d IP 10:ffffffff810c4cb9 error 0 cr2 0xffff8881ab1ff000
[    0.469084][    T0] CPU: 0 PID: 0 Comm: swapper Not tainted 6.5.0-11417-gca4256348660-dirty #7
[    0.470756][    T0] RIP: 0010:cpu_init_exception_handling+0x179/0x740
[    0.472045][    T0] Code: be 0f 00 00 00 4a 03 04 ed 40 19 15 85 48 89 c7 e8 9c bb ff ff 48 c7 c0 10 73 02 00 48 ba 00 00 00 00 00 fc ff df 48 c1 e8 03 <80> 3c 10 00 0f 85 21 05 00 00 65 48 8b 05 45 26 f6 7e 48 8d 7b 24
[    0.475784][    T0] RSP: 0000:ffffffff85207e38 EFLAGS: 00010002 ORIG_RAX: 0000000000000000
[    0.477384][    T0] RAX: 0000000000004e62 RBX: ffff88817700a000 RCX: 0000000000000010
[    0.479093][    T0] RDX: dffffc0000000000 RSI: ffffffff85207e60 RDI: ffff88817700f078
[    0.481178][    T0] RBP: 000000000000f000 R08: 0040f50000000000 R09: 0040f50000000000
[    0.482655][    T0] R10: ffff8881ab02a000 R11: 0000000000000000 R12: 1ffffffff0a40fc8
[    0.484128][    T0] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff85151940
[    0.485604][    T0] FS:  0000000000000000(0000) GS:ffff888177000000(0000) knlGS:0000000000000000
[    0.487246][    T0] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.488515][    T0] CR2: ffff8881ab1ff000 CR3: 00000000052d7000 CR4: 00000000000000b0
[    0.490002][    T0] Call Trace:
[    0.490600][    T0]  <TASK>
[    0.491145][    T0]  ? early_fixup_exception+0x10e/0x280
[    0.492176][    T0]  ? early_idt_handler_common+0x2f/0x40
[    0.493222][    T0]  ? cpu_init_exception_handling+0x179/0x740
[    0.494348][    T0]  ? cpu_init_exception_handling+0x164/0x740
[    0.495472][    T0]  ? syscall_init+0x1c0/0x1c0
[    0.496351][    T0]  ? per_cpu_ptr_to_phys+0x1ca/0x2c0
[    0.497336][    T0]  ? setup_cpu_entry_areas+0x138/0x980
[    0.498365][    T0]  trap_init+0xa/0x40

Let me see what happens here. I have changed *only* raw_cpu_read_8,
but the GP fault is reported in cpu_init_exception_handling, which
uses this_cpu_ptr. Please note that all per-cpu initializations go
through the existing {this|raw}_cpu_write.

void cpu_init_exception_handling(void)
{
	struct tss_struct *tss = this_cpu_ptr(&cpu_tss_rw);
	int cpu = raw_smp_processor_id();
	...

I have tried the trick with all reads volatile (and writes as they
were before the patch), but it didn't make a difference. Also, the
kernel from the git/tip branch works OK for the default config, so I
think there is some config option that somehow disturbs the named-as
enabled kernel.

Uros.
On Sun, 8 Oct 2023 at 12:18, Uros Bizjak <ubizjak@gmail.com> wrote:
>
> Let me see what happens here. I have changed *only* raw_cpu_read_8,
> but the GP fault is reported in cpu_init_exception_handling, which
> uses this_cpu_ptr. Please note that all per-cpu initializations go
> through the existing {this|raw}_cpu_write.

I think it's an ordering issue, and I think you may hit some issue
with loading TR or the GDT or whatever.

For example, we have this

	set_tss_desc(cpu, &get_cpu_entry_area(cpu)->tss.x86_tss);

followed by

	asm volatile("ltr %w0"::"q" (GDT_ENTRY_TSS*8));

in native_load_tr_desc(), and I think we might want to add a "memory"
clobber to it to make sure it is serialized with any stores to the GDT
entries in question.

I don't think *that* particular thing is the issue (because you kept
the writes as-is and still hit things), but I think it's an example of
some lazy inline asm constraints that could possibly cause problems if
the ordering changes.

And yes, this code ends up depending on things like CONFIG_PARAVIRT_XXL
for whether it uses the native TR loading or uses some paravirt
version, so config options can make a difference.

Again: I don't think it's that "ltr" instruction. I'm calling it out
just as a "that function does some funky things", and the load TR is
*one* of the funky things, and it looks like it could be the same type
of thing that then causes issues.

Things like CONFIG_SMP might also matter, because the percpu setup is
different. On UP, the *segment* use goes away, but I think the whole
"use inline asm vs regular memory ops" remains (admittedly I did *not*
verify that, I might be speaking out of my *ss).

Your dump does end up being close to a %gs access:

   0:	4a 03 04 ed 40 19 15 85	add    -0x7aeae6c0(,%r13,8),%rax
   8:	48 89 c7             	mov    %rax,%rdi
   b:	e8 9c bb ff ff       	call   0xffffffffffffbbac
  10:	48 c7 c0 10 73 02 00 	mov    $0x27310,%rax
  17:	48 ba 00 00 00 00 00 fc ff df	movabs $0xdffffc0000000000,%rdx
  21:	48 c1 e8 03          	shr    $0x3,%rax
  25:*	80 3c 10 00          	cmpb   $0x0,(%rax,%rdx,1)	<-- trapping instruction
  29:	0f 85 21 05 00 00    	jne    0x550
  2f:	65 48 8b 05 45 26 f6 7e	mov    %gs:0x7ef62645(%rip),%rax	# 0x7ef6267c
  37:	48 8d 7b 24          	lea    0x24(%rbx),%rdi

but I don't know what the "call" before is, so I wasn't able to match
it up with any obvious code in there.

	Linus
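[Editorial note: in concrete terms, the "memory" clobber Linus suggests
for native_load_tr_desc() would look something like the line below - a
hypothetical illustration, not a posted patch. The clobber tells the
compiler the asm may observe earlier stores (here, the just-written GDT
entry), so those stores cannot be reordered past or sunk below the LTR.]

/* Hypothetical: serialize LTR against preceding GDT writes. */
asm volatile("ltr %w0" :: "q" (GDT_ENTRY_TSS*8) : "memory");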
On Sun, 8 Oct 2023 at 13:13, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Your dump does end up being close to a %gs access:

Bah. I should have looked closer at the instructions before the oops.

Because I think that's exactly the problem here. That's the KASAN
checks that have been added, and we have this insane code:

>   10:	48 c7 c0 10 73 02 00 	mov    $0x27310,%rax
>   17:	48 ba 00 00 00 00 00 fc ff df	movabs $0xdffffc0000000000,%rdx
>   21:	48 c1 e8 03          	shr    $0x3,%rax
>   25:*	80 3c 10 00          	cmpb   $0x0,(%rax,%rdx,1)	<-- trapping instruction

Look how both %rax and %rdx are constants, yet then gcc has generated
that crazy "shift a constant value right by three bits, and then use
an addressing mode to add it to another constant".

And that 0xdffffc0000000000 constant is KASAN_SHADOW_OFFSET.

So what I think is going on is trivial - and has nothing to do with
ordering.

I think gcc is simply doing a KASAN check on a percpu address.

Which it shouldn't do, and didn't use to do because we did the access
using inline asm.

But now that gcc does the accesses as normal (albeit special address
space) memory accesses, the KASAN code triggers on them too, and it
all goes to hell in a handbasket very quickly.

End result: those percpu accessor functions need to disable any KASAN
checking or other sanitizer checking. Not on the percpu address,
because that's not a "real" address, it's obviously just the offset
from the segment register.

We have some other cases like that, see __read_once_word_nocheck().

And gcc should probably not have generated such code in the first
place, so arguably this is a bug with -fsanitize=kernel-address. How
does gcc handle the thread pointers with address sanitizer? Does it
convert them into real pointers first, and didn't realize that it
can't do it for __seg_gs?

	Linus
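[Editorial note: to spell out why the dumped sequence faults, generic
KASAN instruments each access with a shadow-byte lookup, roughly as in
the sketch below. KASAN_SHADOW_OFFSET is the constant visible in the
dump; report_poison() is a hypothetical stand-in for the real report
path. Fed a %gs-relative percpu "address" such as 0x27310 - which is
only an offset from the segment base, not a real pointer - the computed
shadow address lands on an unmapped page, hence the early exception.]

#define KASAN_SHADOW_OFFSET	0xdffffc0000000000UL

extern void report_poison(unsigned long addr);	/* hypothetical */

static inline void kasan_check_byte(unsigned long addr)
{
	/* mov $addr,%rax; movabs $KASAN_SHADOW_OFFSET,%rdx; shr $3,%rax */
	unsigned char shadow =
		*(unsigned char *)((addr >> 3) + KASAN_SHADOW_OFFSET);

	/* cmpb $0x0,(%rax,%rdx,1); jne <report path> */
	if (shadow)
		report_poison(addr);
}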
On Sun, Oct 8, 2023 at 10:48 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Sun, 8 Oct 2023 at 13:13, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > Your dump does end up being close to a %gs access:
>
> Bah. I should have looked closer at the instructions before the oops.
>
> Because I think that's exactly the problem here. That's the KASAN
> checks that have been added, and we have this insane code:
>
> >   10:	48 c7 c0 10 73 02 00 	mov    $0x27310,%rax
> >   17:	48 ba 00 00 00 00 00 fc ff df	movabs $0xdffffc0000000000,%rdx
> >   21:	48 c1 e8 03          	shr    $0x3,%rax
> >   25:*	80 3c 10 00          	cmpb   $0x0,(%rax,%rdx,1)	<-- trapping instruction
>
> Look how both %rax and %rdx are constants, yet then gcc has generated
> that crazy "shift a constant value right by three bits, and then use
> an addressing mode to add it to another constant".

Hm, the compiler knows perfectly well how to make compound addresses,
but all this KASAN stuff is a bit special.

> And that 0xdffffc0000000000 constant is KASAN_SHADOW_OFFSET.
>
> So what I think is going on is trivial - and has nothing to do with
> ordering.
>
> I think gcc is simply doing a KASAN check on a percpu address.
>
> Which it shouldn't do, and didn't use to do because we did the access
> using inline asm.
>
> But now that gcc does the accesses as normal (albeit special address
> space) memory accesses, the KASAN code triggers on them too, and it
> all goes to hell in a handbasket very quickly.

Yes, I can confirm that. The failing .config from the Linux Kernel
Test project works perfectly well after KASAN has been switched off.

So, the patch to fix the issue could be as simple as the one attached
to this message.

> End result: those percpu accessor functions need to disable any KASAN
> checking or other sanitizer checking. Not on the percpu address,
> because that's not a "real" address, it's obviously just the offset
> from the segment register.
>
> We have some other cases like that, see __read_once_word_nocheck().
>
> And gcc should probably not have generated such code in the first
> place, so arguably this is a bug with -fsanitize=kernel-address. How
> does gcc handle the thread pointers with address sanitizer? Does it
> convert them into real pointers first, and didn't realize that it
> can't do it for __seg_gs?

I don't know this part of the compiler well, but it should not touch
non-default namespaces. I'll file a bug report there.

Thanks,
Uros.

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index ecb256954351..1edf4a5b93ca 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2393,7 +2393,7 @@ config CC_HAS_NAMED_AS

 config USE_X86_SEG_SUPPORT
 	def_bool y
-	depends on CC_HAS_NAMED_AS && SMP
+	depends on CC_HAS_NAMED_AS && SMP && !KASAN

 config CC_HAS_SLS
 	def_bool $(cc-option,-mharden-sls=all)
* Uros Bizjak <ubizjak@gmail.com> wrote:

> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index ecb256954351..1edf4a5b93ca 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -2393,7 +2393,7 @@ config CC_HAS_NAMED_AS
>
>  config USE_X86_SEG_SUPPORT
>  	def_bool y
> -	depends on CC_HAS_NAMED_AS && SMP
> +	depends on CC_HAS_NAMED_AS && SMP && !KASAN

So I'd rather express this as a Kconfig quirk line, and explain each
quirk.

Something like:

	depends on CC_HAS_NAMED_AS
	depends on SMP
	#
	# -fsanitize=kernel-address (KASAN) is at the moment incompatible
	# with named address spaces - see GCC bug #12345.
	#
	depends on !KASAN

... or so. BTW., please also document the reason why !SMP is excluded.

Thanks,

	Ingo
* Uros Bizjak <ubizjak@gmail.com> wrote:

> I have tried the trick with all reads volatile (and writes as they
> were before the patch), but it didn't make a difference. Also, the
> kernel from the git/tip branch works OK for the default config, so I
> think there is some config option that somehow disturbs the named-as
> enabled kernel.

Yeah, I made sure tip:x86/percpu boots & works fine on a number of
systems - but that testing wasn't comprehensive at all, and I didn't
have KASAN enabled either, which generates pretty intrusive
instrumentation.

Thanks,

	Ingo
* Ingo Molnar <mingo@kernel.org> wrote:

> * Uros Bizjak <ubizjak@gmail.com> wrote:
>
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index ecb256954351..1edf4a5b93ca 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -2393,7 +2393,7 @@ config CC_HAS_NAMED_AS
> >
> >  config USE_X86_SEG_SUPPORT
> >  	def_bool y
> > -	depends on CC_HAS_NAMED_AS && SMP
> > +	depends on CC_HAS_NAMED_AS && SMP && !KASAN
>
> So I'd rather express this as a Kconfig quirk line, and explain each
> quirk.
>
> Something like:
>
> 	depends on CC_HAS_NAMED_AS
> 	depends on SMP
> 	#
> 	# -fsanitize=kernel-address (KASAN) is at the moment incompatible
> 	# with named address spaces - see GCC bug #12345.
> 	#
> 	depends on !KASAN
>
> ... or so.

BTW., while this is OK for testing, this is too heavy handed for
release purposes, so please only disable the KASAN instrumentation for
the affected percpu accessors.

See the various __no_sanitize* attributes available.

I'd even suggest introducing a new attribute variant, specific to x86,
prefixed with __no_sanitize_x86_seg or so, which would allow the
eventual KASAN-instrumentation of the percpu accessors once the
underlying GCC bug is fixed.

Thanks,

	Ingo
On Mon, Oct 9, 2023 at 1:51 PM Ingo Molnar <mingo@kernel.org> wrote:
>
> * Ingo Molnar <mingo@kernel.org> wrote:
>
> > * Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > > index ecb256954351..1edf4a5b93ca 100644
> > > --- a/arch/x86/Kconfig
> > > +++ b/arch/x86/Kconfig
> > > @@ -2393,7 +2393,7 @@ config CC_HAS_NAMED_AS
> > >
> > >  config USE_X86_SEG_SUPPORT
> > >  	def_bool y
> > > -	depends on CC_HAS_NAMED_AS && SMP
> > > +	depends on CC_HAS_NAMED_AS && SMP && !KASAN
> >
> > So I'd rather express this as a Kconfig quirk line, and explain each
> > quirk.
> >
> > Something like:
> >
> > 	depends on CC_HAS_NAMED_AS
> > 	depends on SMP
> > 	#
> > 	# -fsanitize=kernel-address (KASAN) is at the moment incompatible
> > 	# with named address spaces - see GCC bug #12345.
> > 	#
> > 	depends on !KASAN
> >
> > ... or so.
>
> BTW., while this is OK for testing, this is too heavy handed for
> release purposes, so please only disable the KASAN instrumentation for
> the affected percpu accessors.
>
> See the various __no_sanitize* attributes available.

These attributes are for function declarations. The percpu casts
cannot be implemented with separate static inline functions. Also,
__no_sanitize_address is mutually exclusive with __always_inline.

Uros.

> I'd even suggest introducing a new attribute variant, specific to x86,
> prefixed with __no_sanitize_x86_seg or so, which would allow the
> eventual KASAN-instrumentation of the percpu accessors once the
> underlying GCC bug is fixed.
>
> Thanks,
>
> 	Ingo
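[Editorial note: a hypothetical sketch of the constraint Uros names, not
proposed code. The attribute can only annotate a function, so the
type-generic percpu macros would have to funnel through fixed-type
helpers like the one below, one per access size - and, as noted above,
__no_sanitize_address cannot be combined with __always_inline.]

/* Hypothetical helper shape -- this is what *cannot* be used here: */
static __attribute__((no_sanitize_address))	/* not __always_inline */
unsigned long percpu_read_8_nocheck(const unsigned long __seg_gs *ptr)
{
	return *ptr;	/* this load would not be KASAN-instrumented */
}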
* Uros Bizjak <ubizjak@gmail.com> wrote:

> On Mon, Oct 9, 2023 at 1:51 PM Ingo Molnar <mingo@kernel.org> wrote:
> >
> > * Ingo Molnar <mingo@kernel.org> wrote:
> >
> > > * Uros Bizjak <ubizjak@gmail.com> wrote:
> > >
> > > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > > > index ecb256954351..1edf4a5b93ca 100644
> > > > --- a/arch/x86/Kconfig
> > > > +++ b/arch/x86/Kconfig
> > > > @@ -2393,7 +2393,7 @@ config CC_HAS_NAMED_AS
> > > >
> > > >  config USE_X86_SEG_SUPPORT
> > > >  	def_bool y
> > > > -	depends on CC_HAS_NAMED_AS && SMP
> > > > +	depends on CC_HAS_NAMED_AS && SMP && !KASAN
> > >
> > > So I'd rather express this as a Kconfig quirk line, and explain
> > > each quirk.
> > >
> > > Something like:
> > >
> > > 	depends on CC_HAS_NAMED_AS
> > > 	depends on SMP
> > > 	#
> > > 	# -fsanitize=kernel-address (KASAN) is at the moment incompatible
> > > 	# with named address spaces - see GCC bug #12345.
> > > 	#
> > > 	depends on !KASAN
> > >
> > > ... or so.
> >
> > BTW., while this is OK for testing, this is too heavy handed for
> > release purposes, so please only disable the KASAN instrumentation
> > for the affected percpu accessors.
> >
> > See the various __no_sanitize* attributes available.
>
> These attributes are for function declarations. The percpu casts
> cannot be implemented with separate static inline functions. Also,
> __no_sanitize_address is mutually exclusive with __always_inline.

Sigh - I guess the Kconfig toggle is the only solution then?

Thanks,

	Ingo
> On Oct 9, 2023, at 3:00 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
>
> !! External Email
>
> On Mon, Oct 9, 2023 at 1:51 PM Ingo Molnar <mingo@kernel.org> wrote:
>>
>> BTW., while this is OK for testing, this is too heavy handed for
>> release purposes, so please only disable the KASAN instrumentation
>> for the affected percpu accessors.
>>
>> See the various __no_sanitize* attributes available.
>
> These attributes are for function declarations. The percpu casts
> cannot be implemented with separate static inline functions. Also,
> __no_sanitize_address is mutually exclusive with __always_inline.

Right, but for GCC you may be able to do something like:

#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-fsanitize=address"

// Your code here...

#pragma GCC diagnostic pop

Not sure if there is something equivalent in CLANG, and it should be
done with the kernel's _Pragma.
On Mon, Oct 9, 2023 at 1:41 PM Ingo Molnar <mingo@kernel.org> wrote:
>
> * Uros Bizjak <ubizjak@gmail.com> wrote:
>
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index ecb256954351..1edf4a5b93ca 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -2393,7 +2393,7 @@ config CC_HAS_NAMED_AS
> >
> >  config USE_X86_SEG_SUPPORT
> >  	def_bool y
> > -	depends on CC_HAS_NAMED_AS && SMP
> > +	depends on CC_HAS_NAMED_AS && SMP && !KASAN
>
> So I'd rather express this as a Kconfig quirk line, and explain each
> quirk.
>
> Something like:
>
> 	depends on CC_HAS_NAMED_AS
> 	depends on SMP
> 	#
> 	# -fsanitize=kernel-address (KASAN) is at the moment incompatible
> 	# with named address spaces - see GCC bug #12345.
> 	#
> 	depends on !KASAN
>
> ... or so.
>
> BTW., please also document the reason why !SMP is excluded.

Eh, thanks for pointing it out - it is not needed at all; the code
also works for !SMP. Will fix in a Kconfig patch.

Thanks,
Uros.
On Mon, Oct 9, 2023 at 2:21 PM Nadav Amit <namit@vmware.com> wrote:
>
> > On Oct 9, 2023, at 3:00 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > !! External Email
> >
> > On Mon, Oct 9, 2023 at 1:51 PM Ingo Molnar <mingo@kernel.org> wrote:
> >>
> >> BTW., while this is OK for testing, this is too heavy handed for
> >> release purposes, so please only disable the KASAN instrumentation
> >> for the affected percpu accessors.
> >>
> >> See the various __no_sanitize* attributes available.
> >
> > These attributes are for function declarations. The percpu casts
> > cannot be implemented with separate static inline functions. Also,
> > __no_sanitize_address is mutually exclusive with __always_inline.
>
> Right, but for GCC you may be able to do something like:
>
> #pragma GCC diagnostic push
> #pragma GCC diagnostic ignored "-fsanitize=address"
>
> // Your code here...
>
> #pragma GCC diagnostic pop
>
> Not sure if there is something equivalent in CLANG, and it should be
> done with the kernel's _Pragma.

Unfortunately, this is only for diagnostics and expects "-W..." to
suppress warnings. Here we want to disable the kernel sanitizer just
for the enclosing access, and I'm sure it won't work with diagnostic
pragmas. I don't think "-fsanitize=..." is included in the target or
optimization options allowed in a Pragma.

Uros.
> On Oct 9, 2023, at 3:42 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
>
> !! External Email
>
> On Mon, Oct 9, 2023 at 2:21 PM Nadav Amit <namit@vmware.com> wrote:
>>
>>> On Oct 9, 2023, at 3:00 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
>>>
>>> !! External Email
>>>
>>> On Mon, Oct 9, 2023 at 1:51 PM Ingo Molnar <mingo@kernel.org> wrote:
>>>>
>>>> BTW., while this is OK for testing, this is too heavy handed for
>>>> release purposes, so please only disable the KASAN instrumentation
>>>> for the affected percpu accessors.
>>>>
>>>> See the various __no_sanitize* attributes available.
>>>
>>> These attributes are for function declarations. The percpu casts
>>> cannot be implemented with separate static inline functions. Also,
>>> __no_sanitize_address is mutually exclusive with __always_inline.
>>
>> Right, but for GCC you may be able to do something like:
>>
>> #pragma GCC diagnostic push
>> #pragma GCC diagnostic ignored "-fsanitize=address"
>>
>> // Your code here...
>>
>> #pragma GCC diagnostic pop
>>
>> Not sure if there is something equivalent in CLANG, and it should be
>> done with the kernel's _Pragma.
>
> Unfortunately, this is only for diagnostics and expects "-W..." to
> suppress warnings. Here we want to disable the kernel sanitizer just
> for the enclosing access, and I'm sure it won't work with diagnostic
> pragmas. I don't think "-fsanitize=..." is included in the target or
> optimization options allowed in a Pragma.

Ugh. Sorry for the noise. You seem to be right.
On Mon, Oct 9, 2023 at 1:41 PM Ingo Molnar <mingo@kernel.org> wrote:
>
> * Uros Bizjak <ubizjak@gmail.com> wrote:
>
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index ecb256954351..1edf4a5b93ca 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -2393,7 +2393,7 @@ config CC_HAS_NAMED_AS
> >
> >  config USE_X86_SEG_SUPPORT
> >  	def_bool y
> > -	depends on CC_HAS_NAMED_AS && SMP
> > +	depends on CC_HAS_NAMED_AS && SMP && !KASAN
>
> So I'd rather express this as a Kconfig quirk line, and explain each
> quirk.
>
> Something like:
>
> 	depends on CC_HAS_NAMED_AS
> 	depends on SMP
> 	#
> 	# -fsanitize=kernel-address (KASAN) is at the moment incompatible
> 	# with named address spaces - see GCC bug #12345.
> 	#
> 	depends on !KASAN

This is now PR sanitizer/111736 [1], but perhaps the KASAN people
[CC'd] also want to be notified about this problem.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111736

Thanks,
Uros.
On Sun, Oct 8, 2023 at 8:00 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Wed, 4 Oct 2023 at 07:51, Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > The percpu code mostly uses inline assembly. Using segment qualifiers
> > allows the use of C code instead, which enables the compiler to perform
> > various optimizations (e.g. propagation of memory arguments). Convert
> > percpu read and write accessors to C code, so the memory argument can
> > be propagated to the instruction that uses this argument.
>
> So apparently this causes boot failures.
>
> It might be worth testing a version where this:
>
> > +#define raw_cpu_read_1(pcp)		__raw_cpu_read(, pcp)
> > +#define raw_cpu_read_2(pcp)		__raw_cpu_read(, pcp)
> > +#define raw_cpu_read_4(pcp)		__raw_cpu_read(, pcp)
> > +#define raw_cpu_write_1(pcp, val)	__raw_cpu_write(, pcp, val)
> > +#define raw_cpu_write_2(pcp, val)	__raw_cpu_write(, pcp, val)
> > +#define raw_cpu_write_4(pcp, val)	__raw_cpu_write(, pcp, val)
>
> and this
>
> > +#ifdef CONFIG_X86_64
> > +#define raw_cpu_read_8(pcp)		__raw_cpu_read(, pcp)
> > +#define raw_cpu_write_8(pcp, val)	__raw_cpu_write(, pcp, val)
>
> was all using 'volatile' in the qualifier argument and see if that
> makes the boot failure go away.
>
> Because while the old code wasn't "asm volatile", even just a *plain*
> asm() is certainly a lot more serialized than a normal access.
>
> For example, the asm() version of raw_cpu_write() used "+m" for the
> destination modifier, which means that if you did multiple percpu
> writes to the same variable, gcc would output multiple asm calls,
> because it would see the subsequent ones as reading the old value
> (even if they don't *actually* do so).
>
> That's admittedly really just because it uses a common macro for
> raw_cpu_write() and the updates (like the percpu_add() code), so the
> fact that it uses "+m" instead of "=m" is just a random odd artifact
> of the inline asm version, but maybe we have code that ends up working
> just by accident.

FYI: While the emitted asm code is correct, the program flow depends
on an uninitialized value. The compiler is free to remove the whole
insn stream in this case. Admittedly, we have asm here, so the
compiler is a bit more forgiving, but it is a slippery slope
nevertheless.

Uros.
diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index da451202a1b9..60ea7755c0fe 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -400,13 +400,66 @@ do { \
 #define this_cpu_read_stable_8(pcp)	percpu_stable_op(8, "mov", pcp)
 #define this_cpu_read_stable(pcp)	__pcpu_size_call_return(this_cpu_read_stable_, pcp)
 
+#ifdef CONFIG_USE_X86_SEG_SUPPORT
+
+#define __raw_cpu_read(qual, pcp)					\
+({									\
+	*(qual __my_cpu_type(pcp) *)__my_cpu_ptr(&(pcp));		\
+})
+
+#define __raw_cpu_write(qual, pcp, val)					\
+do {									\
+	*(qual __my_cpu_type(pcp) *)__my_cpu_ptr(&(pcp)) = (val);	\
+} while (0)
+
+#define raw_cpu_read_1(pcp)		__raw_cpu_read(, pcp)
+#define raw_cpu_read_2(pcp)		__raw_cpu_read(, pcp)
+#define raw_cpu_read_4(pcp)		__raw_cpu_read(, pcp)
+#define raw_cpu_write_1(pcp, val)	__raw_cpu_write(, pcp, val)
+#define raw_cpu_write_2(pcp, val)	__raw_cpu_write(, pcp, val)
+#define raw_cpu_write_4(pcp, val)	__raw_cpu_write(, pcp, val)
+
+#define this_cpu_read_1(pcp)		__raw_cpu_read(volatile, pcp)
+#define this_cpu_read_2(pcp)		__raw_cpu_read(volatile, pcp)
+#define this_cpu_read_4(pcp)		__raw_cpu_read(volatile, pcp)
+#define this_cpu_write_1(pcp, val)	__raw_cpu_write(volatile, pcp, val)
+#define this_cpu_write_2(pcp, val)	__raw_cpu_write(volatile, pcp, val)
+#define this_cpu_write_4(pcp, val)	__raw_cpu_write(volatile, pcp, val)
+
+#ifdef CONFIG_X86_64
+#define raw_cpu_read_8(pcp)		__raw_cpu_read(, pcp)
+#define raw_cpu_write_8(pcp, val)	__raw_cpu_write(, pcp, val)
+
+#define this_cpu_read_8(pcp)		__raw_cpu_read(volatile, pcp)
+#define this_cpu_write_8(pcp, val)	__raw_cpu_write(volatile, pcp, val)
+#endif
+
+#else /* CONFIG_USE_X86_SEG_SUPPORT */
+
 #define raw_cpu_read_1(pcp)		percpu_from_op(1, , "mov", pcp)
 #define raw_cpu_read_2(pcp)		percpu_from_op(2, , "mov", pcp)
 #define raw_cpu_read_4(pcp)		percpu_from_op(4, , "mov", pcp)
-
 #define raw_cpu_write_1(pcp, val)	percpu_to_op(1, , "mov", (pcp), val)
 #define raw_cpu_write_2(pcp, val)	percpu_to_op(2, , "mov", (pcp), val)
 #define raw_cpu_write_4(pcp, val)	percpu_to_op(4, , "mov", (pcp), val)
+
+#define this_cpu_read_1(pcp)		percpu_from_op(1, volatile, "mov", pcp)
+#define this_cpu_read_2(pcp)		percpu_from_op(2, volatile, "mov", pcp)
+#define this_cpu_read_4(pcp)		percpu_from_op(4, volatile, "mov", pcp)
+#define this_cpu_write_1(pcp, val)	percpu_to_op(1, volatile, "mov", (pcp), val)
+#define this_cpu_write_2(pcp, val)	percpu_to_op(2, volatile, "mov", (pcp), val)
+#define this_cpu_write_4(pcp, val)	percpu_to_op(4, volatile, "mov", (pcp), val)
+
+#ifdef CONFIG_X86_64
+#define raw_cpu_read_8(pcp)		percpu_from_op(8, , "mov", pcp)
+#define raw_cpu_write_8(pcp, val)	percpu_to_op(8, , "mov", (pcp), val)
+
+#define this_cpu_read_8(pcp)		percpu_from_op(8, volatile, "mov", pcp)
+#define this_cpu_write_8(pcp, val)	percpu_to_op(8, volatile, "mov", (pcp), val)
+#endif
+
+#endif /* CONFIG_USE_X86_SEG_SUPPORT */
+
 #define raw_cpu_add_1(pcp, val)		percpu_add_op(1, , (pcp), val)
 #define raw_cpu_add_2(pcp, val)		percpu_add_op(2, , (pcp), val)
 #define raw_cpu_add_4(pcp, val)		percpu_add_op(4, , (pcp), val)
@@ -432,12 +485,6 @@ do { \
 #define raw_cpu_xchg_2(pcp, val)	raw_percpu_xchg_op(pcp, val)
 #define raw_cpu_xchg_4(pcp, val)	raw_percpu_xchg_op(pcp, val)
 
-#define this_cpu_read_1(pcp)		percpu_from_op(1, volatile, "mov", pcp)
-#define this_cpu_read_2(pcp)		percpu_from_op(2, volatile, "mov", pcp)
-#define this_cpu_read_4(pcp)		percpu_from_op(4, volatile, "mov", pcp)
-#define this_cpu_write_1(pcp, val)	percpu_to_op(1, volatile, "mov", (pcp), val)
-#define this_cpu_write_2(pcp, val)	percpu_to_op(2, volatile, "mov", (pcp), val)
-#define this_cpu_write_4(pcp, val)	percpu_to_op(4, volatile, "mov", (pcp), val)
 #define this_cpu_add_1(pcp, val)	percpu_add_op(1, volatile, (pcp), val)
 #define this_cpu_add_2(pcp, val)	percpu_add_op(2, volatile, (pcp), val)
 #define this_cpu_add_4(pcp, val)	percpu_add_op(4, volatile, (pcp), val)
@@ -476,8 +523,6 @@ do { \
  * 32 bit must fall back to generic operations.
  */
 #ifdef CONFIG_X86_64
-#define raw_cpu_read_8(pcp)		percpu_from_op(8, , "mov", pcp)
-#define raw_cpu_write_8(pcp, val)	percpu_to_op(8, , "mov", (pcp), val)
 #define raw_cpu_add_8(pcp, val)		percpu_add_op(8, , (pcp), val)
 #define raw_cpu_and_8(pcp, val)		percpu_to_op(8, , "and", (pcp), val)
 #define raw_cpu_or_8(pcp, val)		percpu_to_op(8, , "or", (pcp), val)
@@ -486,8 +531,6 @@ do { \
 #define raw_cpu_cmpxchg_8(pcp, oval, nval)	percpu_cmpxchg_op(8, , pcp, oval, nval)
 #define raw_cpu_try_cmpxchg_8(pcp, ovalp, nval)	percpu_try_cmpxchg_op(8, , pcp, ovalp, nval)
 
-#define this_cpu_read_8(pcp)		percpu_from_op(8, volatile, "mov", pcp)
-#define this_cpu_write_8(pcp, val)	percpu_to_op(8, volatile, "mov", (pcp), val)
 #define this_cpu_add_8(pcp, val)	percpu_add_op(8, volatile, (pcp), val)
 #define this_cpu_and_8(pcp, val)	percpu_to_op(8, volatile, "and", (pcp), val)
 #define this_cpu_or_8(pcp, val)		percpu_to_op(8, volatile, "or", (pcp), val)
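[Editorial note: as a reading aid, with CONFIG_USE_X86_SEG_SUPPORT
enabled, a read such as this_cpu_read_4(x) expands through
__raw_cpu_read() and the __my_cpu_type()/__my_cpu_ptr() helpers
introduced earlier in this series into an ordinary qualified load.
A hand-expanded sketch, assuming the 64-bit case where the per-CPU
address-space override is __seg_gs; the exact helper spellings live in
the earlier patches:]

/* Roughly what this_cpu_read_4(x) becomes after macro expansion: */
val = *(volatile __seg_gs typeof(x) *)(uintptr_t)&(x);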