From patchwork Thu Dec 1 08:09:51 2022
X-Patchwork-Submitter: Jakub Jelinek
X-Patchwork-Id: 28230
Date: Thu, 1 Dec 2022 09:09:51 +0100
To: Uros Bizjak, Roger Sayle
Cc: gcc-patches@gcc.gnu.org
Subject: [PATCH] i386: Improve *concat3_{1,2,3,4} patterns [PR107627]
From: Jakub Jelinek

Hi!

On the first testcase we've regressed since 12 at -O2:
-	movq	8(%rsi), %rax
-	movq	%rdi, %r8
-	movq	(%rsi), %rdi
+	movq	(%rsi), %rax
+	movq	8(%rsi), %r8
 	movl	%edx, %ecx
-	shrdq	%rdi, %rax
-	movq	%rax, (%r8)
+	xorl	%r9d, %r9d
+	movq	%rax, %rdx
+	xorl	%eax, %eax
+	orq	%r8, %rax
+	orq	%r9, %rdx
+	shrdq	%rdx, %rax
+	movq	%rax, (%rdi)
On the second testcase we've emitted such terrible code
with the useless xors and ors for a long time.

For PR91681 the *concat3_{1,2,3,4} patterns have been added,
but they allow just register inputs and a register or offsettable
memory output.

The following patch fixes this by also allowing memory inputs on those
patterns, because the pattern is then split into 0-2 emit_move_insns or
one xchg, and those can handle loads from memory just fine.
So that we don't narrow memory loads (the source has a 128-bit (or for
ia32 64-bit) load and we would make a 64-bit (or for ia32 32-bit) load
out of it), the register_operand -> nonimmediate_operand change is done
only for operands in zero_extend arguments.

The o <- m, m, o <- m, r and o <- r, m alternatives aren't used, because
we'd lack registers to perform the moves.  What is supported in addition
to the current ro <- r, r are r <- m, r and r <- r, m (in that case we
just need to be careful about corner cases: see which emit_move_insn
we'd call and whether we would clobber registers used in m's address
before loading; split_double_concat handles that now) and &r <- m, m
(in that case I think the early clobber is the easiest solution).
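
To illustrate the first corner case above (dlo is the same register as
hi, and lo's address uses dhi), here is a minimal sketch on a toy
register machine.  It is not GCC code: struct machine, load, split_naive,
split_fixed and initial_state are hypothetical names, and plain array
accesses stand in for emit_move_insn and xchg.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model, not GCC code: a few registers and a small word-addressed
   memory.  All names are hypothetical.  */
enum { NREGS = 4, MEMWORDS = 8 };

struct machine
{
  uint32_t reg[NREGS];
  uint32_t mem[MEMWORDS];
};

/* Load the word whose address is held in register BASE.  The modulo only
   keeps the toy memory access in bounds.  */
uint32_t
load (const struct machine *m, int base)
{
  return m->mem[m->reg[base] % MEMWORDS];
}

/* Situation: hi already lives in register DLO, and lo is in memory at the
   address held in register DHI.  Wanted result: DLO = lo, DHI = hi.  */

/* Naive order: dhi <- hi, then dlo <- lo.  The first move overwrites the
   register that lo's address depends on.  */
void
split_naive (struct machine *m, int dlo, int dhi)
{
  m->reg[dhi] = m->reg[dlo];    /* dhi <- hi, clobbers lo's address.  */
  m->reg[dlo] = load (m, dhi);  /* Loads from the wrong address.  */
}

/* Fixed order: load lo into dhi while the address is still intact, then
   swap the two registers (the xchg case).  */
void
split_fixed (struct machine *m, int dlo, int dhi)
{
  m->reg[dhi] = load (m, dhi);  /* dhi <- lo.  */
  uint32_t tmp = m->reg[dlo];   /* Swap dlo and dhi.  */
  m->reg[dlo] = m->reg[dhi];
  m->reg[dhi] = tmp;
}

/* Starting state: reg 0 (dlo) holds hi == 5, reg 1 (dhi) holds the
   address 3, memory word 3 holds lo == 111 and word 5 holds 999.  */
struct machine
initial_state (void)
{
  struct machine m = { { 5, 3, 0, 0 }, { 0, 0, 0, 111, 0, 999, 0, 0 } };
  return m;
}
```

Starting from initial_state (), split_naive (&m, 0, 1) leaves reg 0
holding 999 (read from the clobbered address), while split_fixed (&m, 0, 1)
leaves reg 0 holding 111 and reg 1 holding 5, which is the wanted result.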

The first testcase then on 12 -> patched trunk at -O2 changes:
-	movq	8(%rsi), %rax
-	movq	%rdi, %r8
-	movq	(%rsi), %rdi
+	movq	8(%rsi), %r9
+	movq	(%rsi), %r10
 	movl	%edx, %ecx
-	shrdq	%rdi, %rax
-	movq	%rax, (%r8)
+	movq	%r9, %rax
+	shrdq	%r10, %rax
+	movq	%rax, (%rdi)
so the same number of instructions, and the second testcase
12 -> patched trunk at -O2 -m32 changes:
-	pushl	%edi
-	xorl	%edi, %edi
 	pushl	%esi
-	movl	16(%esp), %esi
+	pushl	%ebx
+	movl	16(%esp), %eax
 	movl	20(%esp), %ecx
-	movl	(%esi), %eax
-	movl	4(%esi), %esi
-	movl	%eax, %edx
-	movl	$0, %eax
-	orl	%edi, %edx
-	orl	%esi, %eax
-	shrdl	%edx, %eax
 	movl	12(%esp), %edx
+	movl	4(%eax), %ebx
+	movl	(%eax), %esi
+	movl	%ebx, %eax
+	shrdl	%esi, %eax
 	movl	%eax, (%edx)
+	popl	%ebx
 	popl	%esi
-	popl	%edi

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

BTW, I wonder if we couldn't add additional patterns which would catch
the case where one of the operands is constant.  I also wonder how this
interacts with the stv pass in 32-bit mode, where I think stv is right
after combine; if we match these patterns, perhaps it would be nice to
handle them in stv (unless they are handled there already).

2022-12-01  Jakub Jelinek

	PR target/107627
	* config/i386/i386.md (*concat3_1, *concat3_2): For operands
	which are zero_extend arguments allow memory if output operand
	is a register.
	(*concat3_3, *concat3_4): Likewise.  If both input operands are
	memory, use early clobber on output operand.
	* config/i386/i386-expand.cc (split_double_concat): Deal with
	corner cases where one input is memory and the other is not
	and the address of the memory input uses a register we'd
	overwrite before loading the memory into a register.

	* gcc.target/i386/pr107627-1.c: New test.
	* gcc.target/i386/pr107627-2.c: New test.

	Jakub

--- gcc/config/i386/i386.md.jj	2022-11-28 10:13:17.758656933 +0100
+++ gcc/config/i386/i386.md	2022-11-30 12:11:55.724474793 +0100
@@ -11396,11 +11396,12 @@ (define_insn "*xorqi_ext_1_cc"
 ;; Split DST = (HI<<32)|LO early to minimize register usage.
 (define_code_iterator any_or_plus [plus ior xor])
 (define_insn_and_split "*concat3_1"
-  [(set (match_operand: 0 "nonimmediate_operand" "=ro")
+  [(set (match_operand: 0 "nonimmediate_operand" "=ro,r")
 	(any_or_plus:
-	  (ashift: (match_operand: 1 "register_operand" "r")
+	  (ashift: (match_operand: 1 "register_operand" "r,r")
 		   (match_operand: 2 "const_int_operand"))
-	  (zero_extend: (match_operand:DWIH 3 "register_operand" "r"))))]
+	  (zero_extend:
+	    (match_operand:DWIH 3 "nonimmediate_operand" "r,m"))))]
   "INTVAL (operands[2]) == * BITS_PER_UNIT"
   "#"
   "&& reload_completed"
@@ -11412,10 +11413,11 @@ (define_insn_and_split "*concat3_2"
-  [(set (match_operand: 0 "nonimmediate_operand" "=ro")
+  [(set (match_operand: 0 "nonimmediate_operand" "=ro,r")
 	(any_or_plus:
-	  (zero_extend: (match_operand:DWIH 1 "register_operand" "r"))
-	  (ashift: (match_operand: 2 "register_operand" "r")
+	  (zero_extend:
+	    (match_operand:DWIH 1 "nonimmediate_operand" "r,m"))
+	  (ashift: (match_operand: 2 "register_operand" "r,r")
 		   (match_operand: 3 "const_int_operand"))))]
   "INTVAL (operands[3]) == * BITS_PER_UNIT"
   "#"
@@ -11428,12 +11430,14 @@ (define_insn_and_split "*concat3_3"
-  [(set (match_operand: 0 "nonimmediate_operand" "=ro")
+  [(set (match_operand: 0 "nonimmediate_operand" "=ro,r,r,&r")
 	(any_or_plus:
 	  (ashift:
-	    (zero_extend: (match_operand:DWIH 1 "register_operand" "r"))
+	    (zero_extend:
+	      (match_operand:DWIH 1 "nonimmediate_operand" "r,m,r,m"))
 	    (match_operand: 2 "const_int_operand"))
-	  (zero_extend: (match_operand:DWIH 3 "register_operand" "r"))))]
+	  (zero_extend:
+	    (match_operand:DWIH 3 "nonimmediate_operand" "r,r,m,m"))))]
   "INTVAL (operands[2]) == * BITS_PER_UNIT"
   "#"
   "&& reload_completed"
@@ -11444,11 +11448,13 @@ (define_insn_and_split "*concat3_4"
-  [(set (match_operand: 0 "nonimmediate_operand" "=ro")
+  [(set (match_operand: 0 "nonimmediate_operand" "=ro,r,r,&r")
 	(any_or_plus:
-	  (zero_extend: (match_operand:DWIH 1 "register_operand" "r"))
+	  (zero_extend:
+	    (match_operand:DWIH 1 "nonimmediate_operand" "r,m,r,m"))
 	  (ashift:
-	    (zero_extend: (match_operand:DWIH 2 "register_operand" "r"))
+	    (zero_extend:
+	      (match_operand:DWIH 2 "nonimmediate_operand" "r,r,m,m"))
 	    (match_operand: 3 "const_int_operand"))))]
   "INTVAL (operands[3]) == * BITS_PER_UNIT"
   "#"
--- gcc/config/i386/i386-expand.cc.jj	2022-11-28 10:13:17.703657740 +0100
+++ gcc/config/i386/i386-expand.cc	2022-11-30 13:27:44.851737861 +0100
@@ -173,6 +173,33 @@ split_double_concat (machine_mode mode,
   rtx dlo, dhi;
   int deleted_move_count = 0;
   split_double_mode (mode, &dst, 1, &dlo, &dhi);
+  /* Constraints ensure that if both lo and hi are MEMs, then
+     dst has early-clobber and thus addresses of MEMs don't use
+     dlo/dhi registers.  Otherwise if at least one of lo and hi are MEMs,
+     dlo/dhi are registers.  */
+  if (MEM_P (lo)
+      && rtx_equal_p (dlo, hi)
+      && reg_overlap_mentioned_p (dhi, lo))
+    {
+      /* If dlo is same as hi and lo's address uses dhi register,
+	 code below would first emit_move_insn (dhi, hi)
+	 and then emit_move_insn (dlo, lo).  But the former
+	 would invalidate lo's address.  Load into dhi first,
+	 then swap.  */
+      emit_move_insn (dhi, lo);
+      lo = dhi;
+    }
+  else if (MEM_P (hi)
+	   && !MEM_P (lo)
+	   && !rtx_equal_p (dlo, lo)
+	   && reg_overlap_mentioned_p (dlo, hi))
+    {
+      /* In this case, code below would first emit_move_insn (dlo, lo)
+	 and then emit_move_insn (dhi, hi).  But the former would
+	 invalidate hi's address.  Load into dhi first.  */
+      emit_move_insn (dhi, hi);
+      hi = dhi;
+    }
   if (!rtx_equal_p (dlo, hi))
     {
       if (!rtx_equal_p (dlo, lo))
--- gcc/testsuite/gcc.target/i386/pr107627-1.c.jj	2022-11-30 13:52:11.654818924 +0100
+++ gcc/testsuite/gcc.target/i386/pr107627-1.c	2022-11-30 13:53:40.288496872 +0100
@@ -0,0 +1,22 @@
+/* PR target/107627 */
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2 -masm=att" } */
+/* { dg-final { scan-assembler-not "\torq\t" } } */
+
+static inline unsigned __int128
+foo (unsigned long long x, unsigned long long y)
+{
+  return ((unsigned __int128) x << 64) | y;
+}
+
+static inline unsigned long long
+bar (unsigned long long x, unsigned long long y, unsigned z)
+{
+  return foo (x, y) >> (z % 64);
+}
+
+void
+baz (unsigned long long *x, const unsigned long long *y, unsigned z)
+{
+  x[0] = bar (y[0], y[1], z);
+}
--- gcc/testsuite/gcc.target/i386/pr107627-2.c.jj	2022-11-30 13:52:14.890770658 +0100
+++ gcc/testsuite/gcc.target/i386/pr107627-2.c	2022-11-30 13:53:32.863607618 +0100
@@ -0,0 +1,22 @@
+/* PR target/107627 */
+/* { dg-do compile { target ia32 } } */
+/* { dg-options "-O2 -masm=att" } */
+/* { dg-final { scan-assembler-not "\torl\t" } } */
+
+static inline unsigned long long
+qux (unsigned int x, unsigned int y)
+{
+  return ((unsigned long long) x << 32) | y;
+}
+
+static inline unsigned int
+corge (unsigned int x, unsigned int y, unsigned z)
+{
+  return qux (x, y) >> (z % 32);
+}
+
+void
+garply (unsigned int *x, const unsigned int *y, unsigned z)
+{
+  x[0] = corge (y[0], y[1], z);
+}
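
For reference, the computation in pr107627-2.c can be modelled in
portable C.  This is only a sketch under stated assumptions, not part of
the patch: concat_shift mirrors qux/corge from the test,
funnel_shift_right is a hypothetical reference for the double-word right
shift that a single shrdl performs, and concat_ops_agree checks that
ior, plus and xor give the same concatenation (the two halves occupy
disjoint bits), which is why the any_or_plus iterator covers all three.

```c
#include <assert.h>
#include <stdint.h>

/* Same computation as qux/corge in the test: concatenate HI and LO into
   a 64-bit value, shift right by Z % 32, keep the low 32 bits.  */
uint32_t
concat_shift (uint32_t hi, uint32_t lo, unsigned z)
{
  uint64_t wide = ((uint64_t) hi << 32) | lo;
  return (uint32_t) (wide >> (z % 32));
}

/* Hypothetical reference for the double-word shift: LO shifted right,
   with the vacated high bits filled from the low bits of HI.  */
uint32_t
funnel_shift_right (uint32_t hi, uint32_t lo, unsigned z)
{
  unsigned s = z % 32;
  if (s == 0)
    return lo;			/* Avoid a shift by 32, undefined in C.  */
  return (lo >> s) | (hi << (32 - s));
}

/* The shifted high half and the zero-extended low half share no bits,
   so |, + and ^ all produce the same concatenation.  */
int
concat_ops_agree (uint32_t hi, uint32_t lo)
{
  uint64_t h = (uint64_t) hi << 32;
  return (h | lo) == (h + lo) && (h | lo) == (h ^ lo);
}
```

With hi == 1, lo == 0x80000000 and z == 4, both concat_shift and
funnel_shift_right return 0x18000000, so no separate or instructions
should be needed to form the concatenation first.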