Message ID | 169477710252.27769.14094735545135203449.tip-bot2@tip-bot2 |
---|---|
State | New |
Headers |
From: "tip-bot2 for Uros Bizjak" <tip-bot2@linutronix.de>
Date: Fri, 15 Sep 2023 11:25:02 -0000
Subject: [tip: x86/asm] x86/percpu: Define {raw,this}_cpu_try_cmpxchg{64,128}
To: linux-tip-commits@vger.kernel.org
Cc: Uros Bizjak <ubizjak@gmail.com>, Ingo Molnar <mingo@kernel.org>,
    Linus Torvalds <torvalds@linux-foundation.org>,
    Peter Zijlstra <peterz@infradead.org>,
    x86@kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20230906185941.53527-1-ubizjak@gmail.com>
References: <20230906185941.53527-1-ubizjak@gmail.com>
Message-ID: <169477710252.27769.14094735545135203449.tip-bot2@tip-bot2>
Series | [tip: x86/asm] x86/percpu: Define {raw,this}_cpu_try_cmpxchg{64,128} |
Commit Message
tip-bot2 for Uros Bizjak
Sept. 15, 2023, 11:25 a.m. UTC
The following commit has been merged into the x86/asm branch of tip:

Commit-ID:     54cd971c6f4461fb6b178579751788bf4f64dfca
Gitweb:        https://git.kernel.org/tip/54cd971c6f4461fb6b178579751788bf4f64dfca
Author:        Uros Bizjak <ubizjak@gmail.com>
AuthorDate:    Wed, 06 Sep 2023 20:58:44 +02:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Fri, 15 Sep 2023 13:16:35 +02:00

x86/percpu: Define {raw,this}_cpu_try_cmpxchg{64,128}

Define target-specific {raw,this}_cpu_try_cmpxchg64() and
{raw,this}_cpu_try_cmpxchg128() macros. These definitions override
the generic fallback definitions and enable target-specific
optimized implementations.

Several places in mm/slub.o improve from e.g.:

    53bc:  48 8d 4f 40           lea    0x40(%rdi),%rcx
    53c0:  48 89 fa              mov    %rdi,%rdx
    53c3:  49 8b 5c 05 00        mov    0x0(%r13,%rax,1),%rbx
    53c8:  4c 89 e8              mov    %r13,%rax
    53cb:  49 8d 30              lea    (%r8),%rsi
    53ce:  e8 00 00 00 00        call   53d3 <...>
                        53cf: R_X86_64_PLT32  this_cpu_cmpxchg16b_emu-0x4
    53d3:  48 31 d7              xor    %rdx,%rdi
    53d6:  4c 31 e8              xor    %r13,%rax
    53d9:  48 09 c7              or     %rax,%rdi
    53dc:  75 ae                 jne    538c <...>

to:

    53bc:  48 8d 4a 40           lea    0x40(%rdx),%rcx
    53c0:  49 8b 1c 07           mov    (%r15,%rax,1),%rbx
    53c4:  4c 89 f8              mov    %r15,%rax
    53c7:  48 8d 37              lea    (%rdi),%rsi
    53ca:  e8 00 00 00 00        call   53cf <...>
                        53cb: R_X86_64_PLT32  this_cpu_cmpxchg16b_emu-0x4
    53cf:  75 bb                 jne    538c <...>

reducing the size of mm/slub.o by 80 bytes:

   text    data     bss     dec     hex filename
  39758    5337    4208   49303    c097 slub-new.o
  39838    5337    4208   49383    c0e7 slub-old.o

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230906185941.53527-1-ubizjak@gmail.com
---
 arch/x86/include/asm/percpu.h | 67 ++++++++++++++++++++++++++++++++++-
 1 file changed, 67 insertions(+)
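The win comes from the shape of the retry loop: a plain cmpxchg returns the
old value, so the caller must compare it against the expected value itself
(the xor/xor/or/jne tail in the "before" disassembly above), while a
try_cmpxchg returns the ZF result of the cmpxchg instruction directly and
refreshes the expected value on failure. A minimal user-space analogy using
GCC's atomic builtins — a sketch only; the kernel per-CPU macros defined by
the patch, not these builtins, are the real interface, and 'counter' and
the function names are invented:

--cut here--
#include <stdbool.h>
#include <stdint.h>

static uint64_t counter;

/* cmpxchg style: the primitive returns the old value, so the loop
   must re-compare it against the expected value -- this is the
   compare tail that the patch eliminates. */
static void inc_cmpxchg(void)
{
	uint64_t old, cur = counter;

	do {
		old = cur;
		cur = __sync_val_compare_and_swap(&counter, old, old + 1);
	} while (cur != old);
}

/* try_cmpxchg style: the primitive returns a success flag (the ZF
   set by the cmpxchg instruction) and refreshes 'old' on failure,
   so no separate comparison is emitted. */
static void inc_try_cmpxchg(void)
{
	uint64_t old = counter;

	while (!__atomic_compare_exchange_n(&counter, &old, old + 1,
					    false, __ATOMIC_RELAXED,
					    __ATOMIC_RELAXED))
		;
}
--cut here--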
Comments
On Fri, Sep 15, 2023 at 6:45 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Fri, 15 Sept 2023 at 04:25, tip-bot2 for Uros Bizjak
> <tip-bot2@linutronix.de> wrote:
> >
> > Several places in mm/slub.o improve from e.g.:
>
> [...]
>
> > to:
> >
> >    53bc:  48 8d 4a 40           lea    0x40(%rdx),%rcx
> >    53c0:  49 8b 1c 07           mov    (%r15,%rax,1),%rbx
> >    53c4:  4c 89 f8              mov    %r15,%rax
> >    53c7:  48 8d 37              lea    (%rdi),%rsi
> >    53ca:  e8 00 00 00 00        call   53cf <...>
> >                        53cb: R_X86_64_PLT32  this_cpu_cmpxchg16b_emu-0x4
> >    53cf:  75 bb                 jne    538c <...>
>
> Honestly, if you care deeply about this code sequence, I think you
> should also move the "lea" out of the inline asm.

I have to say that the above asm code was shown mostly as an example
of the improvement, to illustrate how the compare sequence at the end
of the cmpxchg loop gets eliminated. Being a fairly mechanical change,
I didn't put much thought into the surrounding code.

> Both
>
>         call this_cpu_cmpxchg16b_emu
>
> and
>
>         cmpxchg16b %gs:(%rsi)
>
> are 5 bytes, and I suspect it's easiest to just always put the address
> in %rsi - whether you call the function or not.
>
> It doesn't really make the code generation for the non-call sequence
> worse, and it gives the compiler more information (ie instead of
> clobbering %rsi, the compiler knows what %rsi contains).
>
> IOW, something like this:
>
> -	asm qual (ALTERNATIVE("leaq %P[var], %%rsi; call this_cpu_cmpxchg16b_emu", \
> +	asm qual (ALTERNATIVE("call this_cpu_cmpxchg16b_emu", \
>     ...
> -		    "c" (new__.high) \
> -		  : "memory", "rsi"); \
> +		    "c" (new__.high), \
> +		    "S" (&_var) \
> +		  : "memory"); \
>
> should do it.

Yes, and the above change improves slub.o assembly from (current tip
tree with try_cmpxchg patch applied):

    53b3:  41 8b 44 24 28        mov    0x28(%r12),%eax
    53b8:  49 8b 3c 24           mov    (%r12),%rdi
    53bc:  48 8d 4a 40           lea    0x40(%rdx),%rcx
    53c0:  49 8b 1c 07           mov    (%r15,%rax,1),%rbx
    53c4:  4c 89 f8              mov    %r15,%rax
    53c7:  48 8d 37              lea    (%rdi),%rsi
    53ca:  e8 00 00 00 00        call   53cf <kmem_cache_alloc+0x9f>
                        53cb: R_X86_64_PLT32  this_cpu_cmpxchg16b_emu-0x4
    53cf:  75 bb                 jne    538c <kmem_cache_alloc+0x5c>

to:

    53b3:  41 8b 44 24 28        mov    0x28(%r12),%eax
    53b8:  49 8b 34 24           mov    (%r12),%rsi
    53bc:  48 8d 4a 40           lea    0x40(%rdx),%rcx
    53c0:  49 8b 1c 07           mov    (%r15,%rax,1),%rbx
    53c4:  4c 89 f8              mov    %r15,%rax
    53c7:  e8 00 00 00 00        call   53cc <kmem_cache_alloc+0x9c>
                        53c8: R_X86_64_PLT32  this_cpu_cmpxchg16b_emu-0x4
    53cc:  75 be                 jne    538c <kmem_cache_alloc+0x5c>

where an effective reg-reg move "lea (%rdi), %rsi" at 53c7 gets
removed. And indeed, GCC figures out that %rsi holds the address of
the variable and emits:

    5:  65 48 0f c7 0e        cmpxchg16b %gs:(%rsi)

as the alternative replacement.

Now, here comes the best part: we can get rid of the %P modifier. With
named address spaces (__seg_gs), older GCCs had some problems with %P
and emitted "%gs:foo" instead of foo, resulting in a "Warning: segment
override on `lea' is ineffectual" assembly warning. With the proposed
change, we use:

--cut here--
int __seg_gs g;

void foo (void)
{
  asm ("%0 %1" :: "m"(g), "S"(&g));
}
--cut here--

and get the desired assembly:

        movl    $g, %esi
        %gs:g(%rip) %rsi

The above is also in line with [1], where it is said that
"[__seg_gs/__seg_fs] address spaces are not considered to be subspaces
of the generic (flat) address space." So, cmpxchg16b_emu.S must use
%gs to apply the segment base offset, which it does.

> Note that I think this is particularly true of the slub code, because
> afaik, the slub code will *only* use the slow call-out.
>
> Why? Because if the CPU actually supports the cmpxchg16b instruction,
> then the slub code won't even take this path at all - it will do the
> __CMPXCHG_DOUBLE path, which does an unconditional locked cmpxchg16b.
>
> Maybe I'm misreading it. And no, none of this matters. But since I saw
> the patch fly by, and slub.o mentioned, I thought I'd point out how
> silly this all is. It's optimizing a code-path that is basically never
> taken, and when it *is* taken, it can be improved further, I think.

True, but as mentioned above, the slub.o code was used to illustrate
the effect of the patch. The new locking primitive should be usable in
a general way and could also be used in other places.

[1] https://gcc.gnu.org/onlinedocs/gcc/Named-Address-Spaces.html#x86-Named-Address-Spaces

Uros.
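To make the constraint change concrete outside the kernel macros, here is a
stand-alone before/after sketch; 'var', the function names, and the mostly
empty asm templates are invented for illustration. The old form computes
the address inside the template and must list %rsi as an opaque clobber;
the new form hands &var to the compiler through the "S" constraint, so the
register allocator knows what %rsi contains and can reuse it:

--cut here--
static unsigned long var;

/* before: address formed inside the template; %rsi is a blind clobber */
static void addr_in_asm(void)
{
	asm volatile ("leaq %0, %%rsi  # ... call-out would use (%%rsi)"
		      : "+m" (var)
		      :
		      : "rsi", "memory");
}

/* after: the compiler materializes &var in %rsi itself via "S" */
static void addr_via_S(void)
{
	asm volatile ("# ... cmpxchg16b would use %%gs:(%%rsi) here"
		      : "+m" (var)
		      : "S" (&var)
		      : "memory");
}
--cut here--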
Now also with the patch attached.

Uros.

On Sun, Sep 17, 2023 at 8:31 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Fri, Sep 15, 2023 at 6:45 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> [full quote of the previous message snipped]

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index a87db6140fe2..331a9d4dce82 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -242,14 +242,15 @@ do { \
 	old__.var = _oval; \
 	new__.var = _nval; \
 \
-	asm qual (ALTERNATIVE("leal %P[var], %%esi; call this_cpu_cmpxchg8b_emu", \
+	asm qual (ALTERNATIVE("call this_cpu_cmpxchg8b_emu", \
 			      "cmpxchg8b " __percpu_arg([var]), X86_FEATURE_CX8) \
 		  : [var] "+m" (_var), \
 		    "+a" (old__.low), \
 		    "+d" (old__.high) \
 		  : "b" (new__.low), \
-		    "c" (new__.high) \
-		  : "memory", "esi"); \
+		    "c" (new__.high), \
+		    "S" (&_var) \
+		  : "memory"); \
 \
 	old__.var; \
 })
@@ -271,7 +272,7 @@ do { \
 	old__.var = *_oval; \
 	new__.var = _nval; \
 \
-	asm qual (ALTERNATIVE("leal %P[var], %%esi; call this_cpu_cmpxchg8b_emu", \
+	asm qual (ALTERNATIVE("call this_cpu_cmpxchg8b_emu", \
 			      "cmpxchg8b " __percpu_arg([var]), X86_FEATURE_CX8) \
 		  CC_SET(z) \
 		  : CC_OUT(z) (success), \
@@ -279,8 +280,9 @@ do { \
 		    "+a" (old__.low), \
 		    "+d" (old__.high) \
 		  : "b" (new__.low), \
-		    "c" (new__.high) \
-		  : "memory", "esi"); \
+		    "c" (new__.high), \
+		    "S" (&_var) \
+		  : "memory"); \
 	if (unlikely(!success)) \
 		*_oval = old__.var; \
 	likely(success); \
@@ -309,14 +311,15 @@ do { \
 	old__.var = _oval; \
 	new__.var = _nval; \
 \
-	asm qual (ALTERNATIVE("leaq %P[var], %%rsi; call this_cpu_cmpxchg16b_emu", \
+	asm qual (ALTERNATIVE("call this_cpu_cmpxchg16b_emu", \
 			      "cmpxchg16b " __percpu_arg([var]), X86_FEATURE_CX16) \
 		  : [var] "+m" (_var), \
 		    "+a" (old__.low), \
 		    "+d" (old__.high) \
 		  : "b" (new__.low), \
-		    "c" (new__.high) \
-		  : "memory", "rsi"); \
+		    "c" (new__.high), \
+		    "S" (&_var) \
+		  : "memory"); \
 \
 	old__.var; \
 })
@@ -338,7 +341,7 @@ do { \
 	old__.var = *_oval; \
 	new__.var = _nval; \
 \
-	asm qual (ALTERNATIVE("leaq %P[var], %%rsi; call this_cpu_cmpxchg16b_emu", \
+	asm qual (ALTERNATIVE("call this_cpu_cmpxchg16b_emu", \
 			      "cmpxchg16b " __percpu_arg([var]), X86_FEATURE_CX16) \
 		  CC_SET(z) \
 		  : CC_OUT(z) (success), \
@@ -346,8 +349,9 @@ do { \
 		    "+a" (old__.low), \
 		    "+d" (old__.high) \
 		  : "b" (new__.low), \
-		    "c" (new__.high) \
-		  : "memory", "rsi"); \
+		    "c" (new__.high), \
+		    "S" (&_var) \
+		  : "memory"); \
 	if (unlikely(!success)) \
 		*_oval = old__.var; \
 	likely(success); \
* Uros Bizjak <ubizjak@gmail.com> wrote:

> Now also with the patch attached.
>
> Uros.

> diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
> index a87db6140fe2..331a9d4dce82 100644
> --- a/arch/x86/include/asm/percpu.h
> +++ b/arch/x86/include/asm/percpu.h

Assuming it boots & works, mind sending a fully changelogged patch with a
SOB and a 'Suggested-by: Linus' tag or so?

Looks like a nice v6.7 addition.

Thanks,

	Ingo
diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 34734d7..4c36419 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -237,12 +237,47 @@ do { \
 #define raw_cpu_cmpxchg64(pcp, oval, nval)	percpu_cmpxchg64_op(8, , pcp, oval, nval)
 #define this_cpu_cmpxchg64(pcp, oval, nval)	percpu_cmpxchg64_op(8, volatile, pcp, oval, nval)
+
+#define percpu_try_cmpxchg64_op(size, qual, _var, _ovalp, _nval)	\
+({									\
+	bool success;							\
+	u64 *_oval = (u64 *)(_ovalp);					\
+	union {								\
+		u64 var;						\
+		struct {						\
+			u32 low, high;					\
+		};							\
+	} old__, new__;							\
+									\
+	old__.var = *_oval;						\
+	new__.var = _nval;						\
+									\
+	asm qual (ALTERNATIVE("leal %P[var], %%esi; call this_cpu_cmpxchg8b_emu", \
+			      "cmpxchg8b " __percpu_arg([var]), X86_FEATURE_CX8) \
+		  CC_SET(z)						\
+		  : CC_OUT(z) (success),				\
+		    [var] "+m" (_var),					\
+		    "+a" (old__.low),					\
+		    "+d" (old__.high)					\
+		  : "b" (new__.low),					\
+		    "c" (new__.high)					\
+		  : "memory", "esi");					\
+	if (unlikely(!success))						\
+		*_oval = old__.var;					\
+	likely(success);						\
+})
+
+#define raw_cpu_try_cmpxchg64(pcp, ovalp, nval)		percpu_try_cmpxchg64_op(8, , pcp, ovalp, nval)
+#define this_cpu_try_cmpxchg64(pcp, ovalp, nval)	percpu_try_cmpxchg64_op(8, volatile, pcp, ovalp, nval)
 #endif

 #ifdef CONFIG_X86_64
 #define raw_cpu_cmpxchg64(pcp, oval, nval)	percpu_cmpxchg_op(8, , pcp, oval, nval);
 #define this_cpu_cmpxchg64(pcp, oval, nval)	percpu_cmpxchg_op(8, volatile, pcp, oval, nval);

+#define raw_cpu_try_cmpxchg64(pcp, ovalp, nval)		percpu_try_cmpxchg_op(8, , pcp, ovalp, nval);
+#define this_cpu_try_cmpxchg64(pcp, ovalp, nval)	percpu_try_cmpxchg_op(8, volatile, pcp, ovalp, nval);
+
 #define percpu_cmpxchg128_op(size, qual, _var, _oval, _nval)		\
 ({									\
 	union {								\
@@ -269,6 +304,38 @@ do { \
 #define raw_cpu_cmpxchg128(pcp, oval, nval)	percpu_cmpxchg128_op(16, , pcp, oval, nval)
 #define this_cpu_cmpxchg128(pcp, oval, nval)	percpu_cmpxchg128_op(16, volatile, pcp, oval, nval)
+
+#define percpu_try_cmpxchg128_op(size, qual, _var, _ovalp, _nval)	\
+({									\
+	bool success;							\
+	u128 *_oval = (u128 *)(_ovalp);					\
+	union {								\
+		u128 var;						\
+		struct {						\
+			u64 low, high;					\
+		};							\
+	} old__, new__;							\
+									\
+	old__.var = *_oval;						\
+	new__.var = _nval;						\
+									\
+	asm qual (ALTERNATIVE("leaq %P[var], %%rsi; call this_cpu_cmpxchg16b_emu", \
+			      "cmpxchg16b " __percpu_arg([var]), X86_FEATURE_CX16) \
+		  CC_SET(z)						\
+		  : CC_OUT(z) (success),				\
+		    [var] "+m" (_var),					\
+		    "+a" (old__.low),					\
+		    "+d" (old__.high)					\
+		  : "b" (new__.low),					\
+		    "c" (new__.high)					\
+		  : "memory", "rsi");					\
+	if (unlikely(!success))						\
+		*_oval = old__.var;					\
+	likely(success);						\
+})
+
+#define raw_cpu_try_cmpxchg128(pcp, ovalp, nval)	percpu_try_cmpxchg128_op(16, , pcp, ovalp, nval)
+#define this_cpu_try_cmpxchg128(pcp, ovalp, nval)	percpu_try_cmpxchg128_op(16, volatile, pcp, ovalp, nval)
 #endif

 /*
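For completeness, a sketch of how a caller might use the new primitive. The
per-CPU variable 'pcp_counter' and the pcp_add() helper are invented for
illustration and are not part of the patch; only the
this_cpu_try_cmpxchg64() interface itself comes from the commit above:

--cut here--
#include <linux/percpu.h>

static DEFINE_PER_CPU(u64, pcp_counter);

static void pcp_add(u64 delta)
{
	u64 old = this_cpu_read(pcp_counter);

	/* On failure, 'old' is refreshed with the current value, so
	   the loop needs no separate reload or comparison -- exactly
	   the codegen win shown in the commit message. */
	while (!this_cpu_try_cmpxchg64(pcp_counter, &old, old + delta))
		;
}
--cut here--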